Re: OpenBSD router stops functioning but still send CARP advertisements
On 27/05/2009 15:38, Stuart Henderson wrote:
> Simon Morvan gar...@zone84.net wrote:
>> After a couple of hours/days one of the boxes stops functioning properly: no ping, no more SSH access, but I still capture CARP advertisements on the network segments (when it occurs on the master). As a result, when it happens on the master, the slave does not take over.
>
> A few ideas...
>
> Do you have any different hardware you can try instead, to rule out some incompatibility with the machines? Have you checked for BIOS updates etc. that might help?
>
> Can you break into DDB when this happens? (You'll need to set ddb.console=1 in sysctl.conf and reboot if it's not already set.) If you can, trace/ps might be useful. If not, it's a useful data point. (Make sure you can trigger it correctly while the system is running normally: ctrl+alt+esc on glass console, or BREAK on serial console; then you can 'c'ontinue.)

For what it's worth, I haven't had any problems in the 5 days since I switched the roles of em0 and re0. I can't tell if it's related to the NICs themselves. I wish I could run further tests, but this is a production platform... If I manage to get that type of hardware again, or a comfortable maintenance window, I'll run a new stress test and let you know.

--
Simon.
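The DDB preparation suggested above could look roughly like the following on each firewall. This is only a sketch: the sysctl name is the one given in the message, and the debugger commands are shown as comments because they are typed at the ddb> prompt on the console, not in a shell.

```sh
# Check the current value (1 means the console break sequence is honoured)
sysctl ddb.console

# Add the setting for the next boot if it is not already there
grep -q '^ddb.console=1' /etc/sysctl.conf || echo 'ddb.console=1' >> /etc/sysctl.conf

# After the reboot, try it once while the system is still healthy:
#   Ctrl+Alt+Esc (glass console) or BREAK (serial console) drops to ddb>
#   ddb> trace      # stack trace of the current context
#   ddb> ps         # process list
#   ddb> c          # continue normal operation
```

If the break sequence works on a healthy system but not on a frozen one, that difference is itself the data point mentioned above.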
Re: OpenBSD router stops functioning but still send CARP advertisements
On 27/05/2009 01:52, Samiuela LV Taufa wrote:
> Simon Morvan wrote the following on 27/05/2009 2:28 AM:
>> Hello all,
>>
>> I've set up two OpenBSD boxes to act as redundant firewalls in front of our network and I'm seeing strange behavior:
>>
>> After a couple of hours/days one of the boxes stops functioning properly: no ping, no more SSH access, but I still capture CARP advertisements on the network segments (when it occurs on the master). As a result, when it happens on the master, the slave does not take over. When it happens on the slave, the switch intermittently sees the virtual CARP MAC on the slave's port, so it disturbs the master's routing operations.
>>
>> When I hook up a screen to the machine, I get the login screen back but everything is frozen.
>>
>> I really don't know where I should start looking to troubleshoot the issue.
>>
>> Here's the dmesg; the two boxes are identical. I do VLAN routing on em0 and pfsync on re0 (@ 100BaseFD, to be sure there's no issue with the re(4) driver):
>>
>> OpenBSD 4.5 (GENERIC) #1749: Sat Feb 28 14:51:18 MST 2009
>> dera...@i386.openbsd.org:/usr/src/sys/arch/i386/compile/GENERIC
>> RTC BIOS diagnostic error 80clock_battery
>> cpu0: Intel(R) Atom(TM) CPU 330 @ 1.60GHz (GenuineIntel 686-class) 1.60 GHz
>> cpu0: FPU,V86,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,SBF,SSE3,MWAIT,DS-CPL,TM2,CX16,xTPR
>> real mem = 213588 (2036MB)
>> avail mem = 2056806400 (1961MB)
>> RTC BIOS diagnostic error 80clock_battery
>> mainbus0 at root
>> bios0 at mainbus0: AT/286+ BIOS, date 12/31/08, SMBIOS rev. 2.4 @ 0xe3590 (23 entries)
>> bios0: vendor Intel Corp. version LF94510J.86A.0140.2008.1231.0012 date 12/31/2008
>> bios0: Intel Corporation D945GCLF2
>> acpi0 at bios0: rev 0
>> acpi0: tables DSDT FACP APIC WDDT MCFG ASF!
>> acpi0: wakeup devices SLPB(S4) P32_(S4) UAR1(S4) UAR2(S4) PEX0(S4) PEX1(S4) PEX2(S4) PEX3(S4) PEX4(S4) PEX5(S4) UHC1(S3) UHC2(S3) UHC3(S3) UHC4(S3) EHCI(S3) AC9M(S4) AZAL(S4)
>> acpitimer0 at acpi0: 3579545 Hz, 24 bits
>> acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
>> cpu0 at mainbus0: apid 0 (boot processor)
>> cpu0: apic clock running at 134MHz
>> cpu at mainbus0: not configured
>> cpu at mainbus0: not configured
>> cpu at mainbus0: not configured
>> ioapic0 at mainbus0: apid 2 pa 0xfec0, version 20, 24 pins
>> ioapic0: misconfigured as apic 0, remapped to apid 2
>> acpiprt0 at acpi0: bus 0 (PCI0)
>> acpiprt1 at acpi0: bus 4 (P32_)
>> acpiprt2 at acpi0: bus 1 (PEX0)
>> acpiprt3 at acpi0: bus -1 (PEX1)
>> acpiprt4 at acpi0: bus 2 (PEX2)
>> acpiprt5 at acpi0: bus 3 (PEX3)
>> acpiprt6 at acpi0: bus -1 (PEX4)
>> acpiprt7 at acpi0: bus -1 (PEX5)
>> acpicpu0 at acpi0
>> acpibtn0 at acpi0: SLPB
>> bios0: ROM list: 0xc/0xae00! 0xcb000/0x1000 0xcc000/0x1000
>> pci0 at mainbus0 bus 0: configuration mode 1 (bios)
>> pchb0 at pci0 dev 0 function 0 Intel 82945G Host rev 0x02
>> vga1 at pci0 dev 2 function 0 Intel 82945G Video rev 0x02
>> wsdisplay0 at vga1 mux 1: console (80x25, vt100 emulation)
>> wsdisplay0: screen 1-5 added (80x25, vt100 emulation)
>> intagp0 at vga1
>> agp0 at intagp0: aperture at 0x8000, size 0x1000
>> inteldrm0 at vga1: apic 2 int 16 (irq 11)
>> drm0 at inteldrm0
>> azalia0 at pci0 dev 27 function 0 Intel 82801GB HD Audio rev 0x01: apic 2 int 22 (irq 9)
>> azalia0: codecs: Realtek ALC662
>> audio0 at azalia0
>> ppb0 at pci0 dev 28 function 0 Intel 82801GB PCIE rev 0x01: apic 2 int 17 (irq 255)
>> pci1 at ppb0 bus 1
>> re0 at pci1 dev 0 function 0 Realtek 8168 rev 0x02: RTL8168C/8111C (0x3c00), apic 2 int 16 (irq 11), address 00:1c:c0:c3:40:fa
>> rgephy0 at re0 phy 7: RTL8169S/8110S PHY, rev. 2
>> ppb1 at pci0 dev 28 function 2 Intel 82801GB PCIE rev 0x01: apic 2 int 18 (irq 255)
>> pci2 at ppb1 bus 2
>> ppb2 at pci0 dev 28 function 3 Intel 82801GB PCIE rev 0x01: apic 2 int 19 (irq 255)
>> pci3 at ppb2 bus 3
>> uhci0 at pci0 dev 29 function 0 Intel 82801GB USB rev 0x01: apic 2 int 23 (irq 10)
>> uhci1 at pci0 dev 29 function 1 Intel 82801GB USB rev 0x01: apic 2 int 19 (irq 11)
>> uhci2 at pci0 dev 29 function 2 Intel 82801GB USB rev 0x01: apic 2 int 18 (irq 9)
>> uhci3 at pci0 dev 29 function 3 Intel 82801GB USB rev 0x01: apic 2 int 16 (irq 11)
>> ehci0 at pci0 dev 29 function 7 Intel 82801GB USB rev 0x01: apic 2 int 23 (irq 10)
>> usb0 at ehci0: USB revision 2.0
>> uhub0 at usb0 Intel EHCI root hub rev 2.00/1.00 addr 1
>> ppb3 at pci0 dev 30 function 0 Intel 82801BA Hub-to-PCI rev 0xe1
>> pci4 at ppb3 bus 4
>> em0 at pci4 dev 0 function 0 Intel PRO/1000GT (82541GI) rev 0x05: apic 2 int 21 (irq 10), address 00:1b:21:38:77:25
>> ichpcib0 at pci0 dev 31 function 0 Intel 82801GB LPC rev 0x01: PM disabled
>> pciide0 at pci0 dev 31 function 1 Intel 82801GB IDE rev 0x01: DMA, channel 0 configured to compatibility, channel 1 configured to compatibility
>> pciide0: channel 0 disabled (no drives)
>> pciide0: channel 1 ignored (disabled)
>> pciide1 at pci0 dev 31 function 2 Intel 82801GB SATA rev 0x01: DMA, channel 0 configured to native-PCI, channel 1 configured to native-PCI
>> pciide1: using apic 2 int 19 (irq 11) for native-PCI interrupt
>> wd0 at pciide1 channel 0 drive 0:TS32GSSD25S-M
>> wd0: 1-sector
Re: OpenBSD router stops functioning but still send CARP advertisements
I'd rather run pfsync in its own vlan than over a realtek card. It's probably not any slower (what could be slower than a realtek...) and it's not really any less reliable (what use is pfsync if your business network goes down?)
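For illustration, carrying pfsync on its own VLAN over the existing trunk interface might be set up along these lines. This is a sketch under assumptions, not the poster's configuration: the VLAN tag 100 and the 10.10.10.0/24 addresses are made up, and the `vlan`/`vlandev` keywords are the ones used by ifconfig(8) in releases of that era (newer releases use different keywords, so check the man page for your version).

```sh
# Hypothetical VLAN 100 on top of em0, used only for state synchronisation
ifconfig vlan100 vlan 100 vlandev em0
ifconfig vlan100 inet 10.10.10.1 netmask 255.255.255.0

# Point pfsync at that VLAN interface and bring it up
ifconfig pfsync0 syncdev vlan100 up
```

The second firewall would get the same commands with its own address (for example 10.10.10.2).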
Re: OpenBSD router stops functioning but still send CARP advertisements
* Jussi Peltola pe...@pelzi.net [2009-05-27 12:11]:
> I'd rather run pfsync in its own vlan than over a realtek card. It's probably not any slower (what could be slower than a realtek...) and it's not really any less reliable (what use is pfsync if your business network goes down?)

oh cut the crap. re(4) cards are ok. I would not exactly run my performance critical core routers on them, but that is not their purpose. re is not rl.

--
Henning Brauer, h...@bsws.de, henn...@openbsd.org
BS Web Services, http://bsws.de
Full-Service ISP - Secure Hosting, Mail and DNS Services
Dedicated Servers, Rootservers, Application Hosting - Hamburg Amsterdam
Re: OpenBSD router stops functioning but still send CARP advertisements
On 27/05/2009 12:08, Jussi Peltola wrote:
> I'd rather run pfsync in its own vlan than over a realtek card. It's probably not any slower (what could be slower than a realtek...) and it's not really any less reliable (what use is pfsync if your business network goes down?)

I thought I'd better run pfsync over a direct connection rather than through the switches. If a switch fails, the sync has a chance to complete and the failover is cleaner, but maybe I'm wrong...
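A direct-link setup of the kind described here, with re0 pinned to 100 Mb/s full duplex as mentioned in the original post, could be sketched as follows. The addresses are placeholders chosen for the example; only the interface names come from the thread.

```sh
# Back-to-back link between the two firewalls on re0, media fixed at 100BaseTX full duplex
ifconfig re0 inet 10.10.10.1 netmask 255.255.255.0 media 100baseTX mediaopt full-duplex

# Send and receive pfsync state updates over that link only
ifconfig pfsync0 syncdev re0 up
```

To make this persistent, the same options would normally go into /etc/hostname.re0 and /etc/hostname.pfsync0 on each box, with the peer using a different address on the same subnet.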
Re: OpenBSD router stops functioning but still send CARP advertisements
Simon Morvan gar...@zone84.net wrote:
> After a couple of hours/days one of the boxes stops functioning properly: no ping, no more SSH access, but I still capture CARP advertisements on the network segments (when it occurs on the master). As a result, when it happens on the master, the slave does not take over.

A few ideas...

Do you have any different hardware you can try instead, to rule out some incompatibility with the machines? Have you checked for BIOS updates etc. that might help?

Can you break into DDB when this happens? (You'll need to set ddb.console=1 in sysctl.conf and reboot if it's not already set.) If you can, trace/ps might be useful. If not, it's a useful data point. (Make sure you can trigger it correctly while the system is running normally: ctrl+alt+esc on glass console, or BREAK on serial console; then you can 'c'ontinue.)

> On 27/05/2009 12:08, Jussi Peltola wrote:
>> I'd rather run pfsync in its own vlan than over a realtek card. It's probably not any slower (what could be slower than a realtek...) and

Plenty of 100Mb-only cards are slower than a realtek. re(4) here is good for about 550Mb/s of large packets (via tcpbench on a Core2 system), or about 50Mb/s of small-ish datagrams before it starts dropping too many on the floor.

>> it's not really any less reliable (what use is pfsync if your business network goes down?)
>
> I thought I'd better run pfsync over a direct connection rather than through the switches. If a switch fails, the sync has a chance to complete and the failover is cleaner, but maybe I'm wrong...

If your firewalls are connected to different switches, that does make sense (unless your CPUs are saturated, in which case em(4) might indeed be a bit better).
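For anyone wanting to repeat the kind of re(4) throughput measurement quoted above, a basic tcpbench(1) run looks roughly like this. The host name fw2 is a placeholder, and the numbers will depend heavily on the CPU and the rest of the system, so they are not expected to match the Core2 figures.

```sh
# On the receiving box: start tcpbench in server mode
tcpbench -s

# On the sending box: connect to the receiver (placeholder name) and
# watch the periodic throughput reports
tcpbench fw2
```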
Re: OpenBSD router stops functioning but still send CARP advertisements
On 27/05/2009 15:38, Stuart Henderson wrote:
>> I thought I'd better run pfsync over a direct connection rather than through the switches. If a switch fails, the sync has a chance to complete and the failover is cleaner, but maybe I'm wrong...
>
> If your firewalls are connected to different switches, that does make sense (unless your CPUs are saturated, in which case em(4) might indeed be a bit better).

Does the pfsync traffic lead to CPU overload before the business traffic does?
Re: OpenBSD router stops functioning but still send CARP advertisements
On 2009/05/27 16:09, Simon Morvan wrote:
> On 27/05/2009 15:38, Stuart Henderson wrote:
>>> I thought I'd better run pfsync over a direct connection rather than through the switches. If a switch fails, the sync has a chance to complete and the failover is cleaner, but maybe I'm wrong...
>>
>> If your firewalls are connected to different switches, that does make sense (unless your CPUs are saturated, in which case em(4) might indeed be a bit better).
>
> Does the pfsync traffic lead to CPU overload before the business traffic does?

I think that would depend on the specific interfaces and the traffic characteristics. In your case, since you're limiting pfsync to 100 Mb/s by hardcoding the port speed, I don't think you'll max out the CPU with pfsync traffic even on an Atom.
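To put a rough number on that bound: a port fixed at 100 Mb/s can only carry a limited number of frames per second, whatever pfsync tries to send. The sketch below works out that ceiling, assuming standard Ethernet framing (each frame costs its own length plus 20 bytes of preamble and inter-frame gap on the wire). The function name max_pps is made up for the example, and the result says nothing about how much CPU each packet costs on a given machine.

```sh
#!/bin/sh
# Upper bound on frames per second for a given link speed and frame size.
# Assumption: every Ethernet frame occupies its own length (including FCS)
# plus 8 bytes of preamble/SFD and 12 bytes of inter-frame gap on the wire.
max_pps() {
    link_bps=$1      # link speed in bits per second
    frame_bytes=$2   # frame length in bytes, including FCS
    echo $(( link_bps / ((frame_bytes + 20) * 8) ))
}

max_pps 100000000 64     # minimum-size frames: prints 148809
max_pps 100000000 1518   # full-size frames: prints 8127
```

So the sync link tops out somewhere between about 8 thousand and about 149 thousand frames per second, depending on frame size.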