Re: OpenBSD router stops functioning but still send CARP advertisements

2009-06-01 Thread Simon Morvan
On 27/05/2009 15:38, Stuart Henderson wrote:
 Simon Morvan gar...@zone84.net wrote:

 After a couple of hours/days one of the boxes stops functioning properly:
 no ping, no more SSH access, but I still capture CARP advertisements on the
 network segments (when it occurs on the master). As a result, when it
 happens on the master, the slave does not take over.
  

 A few ideas...

 Do you have any different hardware you can try instead to rule out
 some incompatibility with the machines?  Have you checked for BIOS updates
 etc that might help?

 Can you break into DDB when this happens? (You'll need to set ddb.console=1
 in sysctl.conf and reboot if it's not already set). If you can, trace/ps might
 be useful. If not it's a useful data point. (make sure you can trigger it
 correctly while the system is running normally; ctrl+alt+esc on glass console,
 or BREAK on serial console; then you can 'c'ontinue).


For what it's worth, I haven't had any problems in the 5 days since I swapped
the roles of em0 and re0. I can't tell whether it's related to the NICs themselves.
I wish I could run further tests, but this is a production
platform... If I manage to get that type of hardware again, or a
comfortable maintenance window, I'll run a new stress test and let you know.

-- 
Simon.



Re: OpenBSD router stops functioning but still send CARP advertisements

2009-05-27 Thread Simon Morvan

On 27/05/2009 01:52, Samiuela LV Taufa wrote:

Simon Morvan wrote the following on 27/05/2009 2:28 AM:

Hello all,

I've set up two OpenBSD boxes to act as redundant firewalls in front of
our network and I'm seeing some strange behavior:

After a couple of hours/days one of the boxes stops functioning properly:
no ping, no more SSH access, but I still capture CARP advertisements on the
network segments (when it occurs on the master). As a result, when it
happens on the master, the slave does not take over.
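
The failure mode described above can be illustrated with a small model. This is only a sketch and not CARP's actual implementation: the function name and the 3-second timeout are assumptions chosen for illustration. It shows why a backup that decides purely on received advertisements would never take over from a master that is frozen but still advertising.

```python
def backup_should_take_over(seconds_since_last_advert: float,
                            master_down_timeout: float = 3.0) -> bool:
    """Return True when the backup has heard no advertisement for too long.

    Hypothetical model: the 3.0 s default is an illustrative assumption,
    not a value taken from the thread or from the CARP source.
    """
    return seconds_since_last_advert > master_down_timeout


# A frozen master that still emits advertisements regularly: the backup
# keeps hearing them, so it stays in backup state.
frozen_but_advertising = backup_should_take_over(seconds_since_last_advert=1.0)

# A master that is really gone (power loss, cable pulled): adverts stop.
truly_dead = backup_should_take_over(seconds_since_last_advert=10.0)

print(frozen_but_advertising, truly_dead)
```

In this model only the second case triggers a failover, which matches the symptom reported here.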

When it happens on the slave, the switch intermittently sees the
virtual CARP MAC on the slave's port, which disturbs the master's routing
operations.

When I hook up a screen on the machine, I get back the login screen but
everything is frozen.

I really don't know where I should start looking to troubleshoot the
issue.

Here's the dmesg; the two boxes are identical. I do VLAN routing on em0
and pfsync on re0 (@ 100BaseFD to be sure there's no issue with the
re(4) driver):

OpenBSD 4.5 (GENERIC) #1749: Sat Feb 28 14:51:18 MST 2009
  dera...@i386.openbsd.org:/usr/src/sys/arch/i386/compile/GENERIC
RTC BIOS diagnostic error 80clock_battery
cpu0: Intel(R) Atom(TM) CPU 330 @ 1.60GHz (GenuineIntel 686-class)
1.60 GHz
cpu0:
FPU,V86,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,SBF,SSE3,MWAIT,DS-CPL,TM2,CX16,xTPR
real mem  = 213588 (2036MB)
avail mem = 2056806400 (1961MB)
RTC BIOS diagnostic error 80clock_battery
mainbus0 at root
bios0 at mainbus0: AT/286+ BIOS, date 12/31/08, SMBIOS rev. 2.4 @
0xe3590 (23 entries)
bios0: vendor Intel Corp. version LF94510J.86A.0140.2008.1231.0012
date 12/31/2008
bios0: Intel Corporation D945GCLF2
acpi0 at bios0: rev 0
acpi0: tables DSDT FACP APIC WDDT MCFG ASF!
acpi0: wakeup devices SLPB(S4) P32_(S4) UAR1(S4) UAR2(S4) PEX0(S4)
PEX1(S4) PEX2(S4) PEX3(S4) PEX4(S4) PEX5(S4) UHC1(S3) UHC2(S3) UHC3(S3)
UHC4(S3) EHCI(S3) AC9M(S4) AZAL(S4)
acpitimer0 at acpi0: 3579545 Hz, 24 bits
acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: apic clock running at 134MHz
cpu at mainbus0: not configured
cpu at mainbus0: not configured
cpu at mainbus0: not configured
ioapic0 at mainbus0: apid 2 pa 0xfec0, version 20, 24 pins
ioapic0: misconfigured as apic 0, remapped to apid 2
acpiprt0 at acpi0: bus 0 (PCI0)
acpiprt1 at acpi0: bus 4 (P32_)
acpiprt2 at acpi0: bus 1 (PEX0)
acpiprt3 at acpi0: bus -1 (PEX1)
acpiprt4 at acpi0: bus 2 (PEX2)
acpiprt5 at acpi0: bus 3 (PEX3)
acpiprt6 at acpi0: bus -1 (PEX4)
acpiprt7 at acpi0: bus -1 (PEX5)
acpicpu0 at acpi0
acpibtn0 at acpi0: SLPB
bios0: ROM list: 0xc/0xae00! 0xcb000/0x1000 0xcc000/0x1000
pci0 at mainbus0 bus 0: configuration mode 1 (bios)
pchb0 at pci0 dev 0 function 0 Intel 82945G Host rev 0x02
vga1 at pci0 dev 2 function 0 Intel 82945G Video rev 0x02
wsdisplay0 at vga1 mux 1: console (80x25, vt100 emulation)
wsdisplay0: screen 1-5 added (80x25, vt100 emulation)
intagp0 at vga1
agp0 at intagp0: aperture at 0x8000, size 0x1000
inteldrm0 at vga1: apic 2 int 16 (irq 11)
drm0 at inteldrm0
azalia0 at pci0 dev 27 function 0 Intel 82801GB HD Audio rev 0x01:
apic 2 int 22 (irq 9)
azalia0: codecs: Realtek ALC662
audio0 at azalia0
ppb0 at pci0 dev 28 function 0 Intel 82801GB PCIE rev 0x01: apic 2 int
17 (irq 255)
pci1 at ppb0 bus 1
re0 at pci1 dev 0 function 0 Realtek 8168 rev 0x02: RTL8168C/8111C
(0x3c00), apic 2 int 16 (irq 11), address 00:1c:c0:c3:40:fa
rgephy0 at re0 phy 7: RTL8169S/8110S PHY, rev. 2
ppb1 at pci0 dev 28 function 2 Intel 82801GB PCIE rev 0x01: apic 2 int
18 (irq 255)
pci2 at ppb1 bus 2
ppb2 at pci0 dev 28 function 3 Intel 82801GB PCIE rev 0x01: apic 2 int
19 (irq 255)
pci3 at ppb2 bus 3
uhci0 at pci0 dev 29 function 0 Intel 82801GB USB rev 0x01: apic 2 int
23 (irq 10)
uhci1 at pci0 dev 29 function 1 Intel 82801GB USB rev 0x01: apic 2 int
19 (irq 11)
uhci2 at pci0 dev 29 function 2 Intel 82801GB USB rev 0x01: apic 2 int
18 (irq 9)
uhci3 at pci0 dev 29 function 3 Intel 82801GB USB rev 0x01: apic 2 int
16 (irq 11)
ehci0 at pci0 dev 29 function 7 Intel 82801GB USB rev 0x01: apic 2 int
23 (irq 10)
usb0 at ehci0: USB revision 2.0
uhub0 at usb0 Intel EHCI root hub rev 2.00/1.00 addr 1
ppb3 at pci0 dev 30 function 0 Intel 82801BA Hub-to-PCI rev 0xe1
pci4 at ppb3 bus 4
em0 at pci4 dev 0 function 0 Intel PRO/1000GT (82541GI) rev 0x05: apic
2 int 21 (irq 10), address 00:1b:21:38:77:25
ichpcib0 at pci0 dev 31 function 0 Intel 82801GB LPC rev 0x01: PM disabled
pciide0 at pci0 dev 31 function 1 Intel 82801GB IDE rev 0x01: DMA,
channel 0 configured to compatibility, channel 1 configured to compatibility
pciide0: channel 0 disabled (no drives)
pciide0: channel 1 ignored (disabled)
pciide1 at pci0 dev 31 function 2 Intel 82801GB SATA rev 0x01: DMA,
channel 0 configured to native-PCI, channel 1 configured to native-PCI
pciide1: using apic 2 int 19 (irq 11) for native-PCI interrupt
wd0 at pciide1 channel 0 drive 0:TS32GSSD25S-M
wd0: 1-sector 

Re: OpenBSD router stops functioning but still send CARP advertisements

2009-05-27 Thread Jussi Peltola
I'd rather run pfsync in its own vlan than over a realtek card. It's
probably not any slower (what could be slower than a realtek...) and
it's not really any less reliable (what use is pfsync if your business
network goes down?)



Re: OpenBSD router stops functioning but still send CARP advertisements

2009-05-27 Thread Henning Brauer
* Jussi Peltola pe...@pelzi.net [2009-05-27 12:11]:
 I'd rather run pfsync in its own vlan than over a realtek card. It's
 probably not any slower (what could be slower than a realtek...) and
 it's not really any less reliable (what use is pfsync if your business
 network goes down?)

oh cut the crap. re(4) cards are ok.
I would not exactly run my performance critical core routers on them,
but that is not their purpose. re is not rl.

-- 
Henning Brauer, h...@bsws.de, henn...@openbsd.org
BS Web Services, http://bsws.de
Full-Service ISP - Secure Hosting, Mail and DNS Services
Dedicated Servers, Rootservers, Application Hosting - Hamburg & Amsterdam



Re: OpenBSD router stops functioning but still send CARP advertisements

2009-05-27 Thread Simon Morvan

On 27/05/2009 12:08, Jussi Peltola wrote:

I'd rather run pfsync in its own vlan than over a realtek card. It's
probably not any slower (what could be slower than a realtek...) and
it's not really any less reliable (what use is pfsync if your business
network goes down?)

   
I thought I'd better run pfsync over a direct connection rather than
through the switches. If a switch fails, the sync has a
chance to complete and the failover to be cleaner, but maybe I'm wrong...




Re: OpenBSD router stops functioning but still send CARP advertisements

2009-05-27 Thread Stuart Henderson
Simon Morvan gar...@zone84.net wrote:
 After a couple of hours/days one of the boxes stops functioning properly:
 no ping, no more SSH access, but I still capture CARP advertisements on the
 network segments (when it occurs on the master). As a result, when it
 happens on the master, the slave does not take over.

A few ideas...

Do you have any different hardware you can try instead to rule out
some incompatibility with the machines?  Have you checked for BIOS updates
etc that might help?

Can you break into DDB when this happens? (You'll need to set ddb.console=1
in sysctl.conf and reboot if it's not already set). If you can, trace/ps might
be useful. If not it's a useful data point. (make sure you can trigger it
correctly while the system is running normally; ctrl+alt+esc on glass console,
or BREAK on serial console; then you can 'c'ontinue).
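
Before relying on DDB during a hang, it may help to confirm that the setting is actually present in the configuration file. The following is a minimal sketch, assuming sysctl.conf uses simple `name=value` lines with `#` comments and that the last assignment wins; the function name and the sample file content are made up for illustration.

```python
def ddb_console_enabled(sysctl_conf_text: str) -> bool:
    """Return True if the last ddb.console assignment in the text is 1."""
    value = None
    for raw_line in sysctl_conf_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments and padding
        if not line or "=" not in line:
            continue
        key, _, val = line.partition("=")
        if key.strip() == "ddb.console":
            value = val.strip()  # assumption: a later line overrides an earlier one
    return value == "1"


# Made-up sample content, not taken from the poster's machines.
sample = """\
# example sysctl.conf content for this sketch
net.inet.ip.forwarding=1
ddb.console=1
"""

print(ddb_console_enabled(sample))
```

This only checks the file; the running kernel picks the value up after the reboot mentioned above.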


 On 27/05/2009 12:08, Jussi Peltola wrote:
 I'd rather run pfsync in its own vlan than over a realtek card. It's
 probably not any slower (what could be slower than a realtek...) and

Plenty of 100Mb only cards are slower than a realtek. re(4) here is good
for about 550Mb/s of large packets (via tcpbench on a Core2 system), or
about 50Mb/s of small-ish datagrams before it starts dropping too many
on the floor.
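
To put those two figures side by side, they can be converted into rough packet rates. This is a back-of-the-envelope sketch: the packet sizes (1500 bytes for "large", 64 bytes for "small-ish") are assumptions, since the message does not give them, and Ethernet framing overhead is ignored.

```python
def packets_per_second(megabits_per_second: float, packet_bytes: int) -> float:
    """Convert a throughput in Mb/s into packets per second for one packet size."""
    return megabits_per_second * 1_000_000 / (packet_bytes * 8)


LARGE_PACKET_BYTES = 1500  # assumed size of a "large" packet
SMALL_PACKET_BYTES = 64    # assumed size of a "small-ish" datagram

large_pps = packets_per_second(550, LARGE_PACKET_BYTES)
small_pps = packets_per_second(50, SMALL_PACKET_BYTES)

print(f"large packets: {large_pps:.0f} pps")
print(f"small packets: {small_pps:.0f} pps")
```

Under these assumed sizes, the small-datagram case carries roughly a tenth of the bandwidth but about twice as many packets per second, which would be consistent with per-packet cost being the limiting factor.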

 it's not really any less reliable (what use is pfsync if your business
 network goes down?)

 I thought I'd better run pfsync over a direct connection rather than
 through the switches. If a switch fails, the sync has a
 chance to complete and the failover to be cleaner, but maybe I'm wrong...

If your firewalls are connected to different switches, that does make
sense (unless your CPUs are saturated, in which case em(4) might indeed
be a bit better).



Re: OpenBSD router stops functioning but still send CARP advertisements

2009-05-27 Thread Simon Morvan
On 27/05/2009 15:38, Stuart Henderson wrote:
 I thought I'd better run pfsync over a direct connection rather than
 through the switches. If a switch fails, the sync has a
 chance to complete and the failover to be cleaner, but maybe I'm wrong...
  

 If your firewalls are connected to different switches, that does make
 sense (unless your CPUs are saturated, in which case em(4) might indeed
 be a bit better).


Does the pfsync traffic lead to CPU overload before the business traffic
does?



Re: OpenBSD router stops functioning but still send CARP advertisements

2009-05-27 Thread Stuart Henderson
On 2009/05/27 16:09, Simon Morvan wrote:
 On 27/05/2009 15:38, Stuart Henderson wrote:
 I thought I'd better run pfsync over a direct connection rather than
 through the switches. If a switch fails, the sync has a
 chance to complete and the failover to be cleaner, but maybe I'm wrong...
  

 If your firewalls are connected to different switches, that does make
 sense (unless your CPUs are saturated, in which case em(4) might indeed
 be a bit better).


 Does the pfsync traffic lead to CPU overload before the business
 traffic does?

I think that would depend on the specific interfaces and the traffic
characteristics.

In your case, since you're limiting pfsync to 100 Mb/s by hardcoding
the port speed, I don't think you'll max out the cpu with pfsync
traffic even on an Atom.