Re: Firewall partially failing with high traffic (Updated)
Just building off my last message. Answering Ryan's questions first:

> - Do you have dedicated addresses on the carp parent interfaces?
For sure.

> - Are all the carp devices on the master firewall MASTER; what about the backup?
Before and after the network dies, primary firewall is all MASTER, secondary stays as BACKUP.

> - Can you reach the 'disappearing' network from the backup firewall?
Yes.

> - Is preemption enabled? (sysctl net.inet.carp.preempt=1)
Yes.

> - What is the output of 'netstat -sp carp' on both the master and backup firewalls?
Have it below.

> - What about the output of 'netstat -i'? Are there output errors on the offending interface?
Exact output below, but no errors in or out, before or after.

> - Have you tried running with carp debugging turned on? (sysctl net.inet.carp.log=1)
Did this on both firewalls, didn't see output one way or the other. Restarted with it in sysctl.conf just to be sure, but didn't see anything.

What further I know:

- set debug loud, lots of output, nothing looks different while the problem is present.
- From the dead network, if I ping the firewall, tcpdump shows the firewall making an arp request for the originating machine.

    18:17:50.015307 arp who-has 172.168.120.50 tell 172.168.120.2

  172.168.120.50 is the machine on the dead network, which was trying to ping the firewall. This would lead me to believe the firewall saw -something-. Lots of traffic is trying to go to that network, but none comes back from it.
- I can ping the dead interface locally.
- Bringing the interface down and up doesn't help.
- From the firewall itself, I can hang that interface. Before I was doing it from my desktop, through the firewall.

Ifconfig explanation:

gem0 - external
gem1 - 120.x - network that disappears
hme0 - 0.x - pfsync traffic
hme1 - 121.x - Network my terminal is on
hme2 - 119.x

My ifconfig -A output from the master firewall:

$ ifconfig -A
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> mtu 33192
        groups: lo
        inet 127.0.0.1 netmask 0xff00
        inet6 ::1 prefixlen 128
        inet6 fe80::1%lo0 prefixlen 64 scopeid 0xa
gem0: flags=8b63<UP,BROADCAST,NOTRAILERS,RUNNING,PROMISC,ALLMULTI,SIMPLEX,MULTICAST> mtu 1500
        lladdr 00:03:ba:f2:bc:1c
        groups: egress
        media: Ethernet autoselect (100baseTX full-duplex)
        status: active
        inet 216.2.22.123 netmask 0xffe0 broadcast 216.82.41.127
        inet6 fe80::203:baff:fef2:bc1c%gem0 prefixlen 64 scopeid 0x1
gem1: flags=8b63<UP,BROADCAST,NOTRAILERS,RUNNING,PROMISC,ALLMULTI,SIMPLEX,MULTICAST> mtu 1500
        lladdr 00:03:ba:f2:bc:1d
        media: Ethernet autoselect (100baseTX full-duplex)
        status: active
        inet 172.168.120.2 netmask 0xff00 broadcast 172.168.120.255
        inet6 fe80::203:baff:fef2:bc1d%gem1 prefixlen 64 scopeid 0x2
hme0: flags=8863<UP,BROADCAST,NOTRAILERS,RUNNING,SIMPLEX,MULTICAST> mtu 1500
        lladdr 08:00:20:ee:66:60
        media: Ethernet autoselect (100baseTX full-duplex)
        status: active
        inet 10.0.0.1 netmask 0xff00 broadcast 10.0.0.255
        inet6 fe80::a00:20ff:feee:6660%hme0 prefixlen 64 scopeid 0x3
hme1: flags=8b63<UP,BROADCAST,NOTRAILERS,RUNNING,PROMISC,ALLMULTI,SIMPLEX,MULTICAST> mtu 1500
        lladdr 08:00:20:ee:66:61
        media: Ethernet autoselect (100baseTX full-duplex)
        status: active
        inet 172.168.121.2 netmask 0xff00 broadcast 172.168.121.255
        inet6 fe80::a00:20ff:feee:6661%hme1 prefixlen 64 scopeid 0x4
hme2: flags=8b63<UP,BROADCAST,NOTRAILERS,RUNNING,PROMISC,ALLMULTI,SIMPLEX,MULTICAST> mtu 1500
        lladdr 08:00:20:ee:66:62
        media: Ethernet autoselect (100baseTX full-duplex)
        status: active
        inet 172.168.119.2 netmask 0xff00 broadcast 172.168.119.255
        inet6 fe80::a00:20ff:feee:6662%hme2 prefixlen 64 scopeid 0x5
hme3: flags=8822<BROADCAST,NOTRAILERS,SIMPLEX,MULTICAST> mtu 1500
        lladdr 08:00:20:ee:66:63
        media: Ethernet autoselect
pflog0: flags=141<UP,RUNNING,PROMISC> mtu 33192
pfsync0: flags=41<UP,RUNNING> mtu 1348
        pfsync: syncdev: hme0 maxupd: 128
enc0: flags=0<> mtu 1536
tun0: flags=8051<UP,POINTOPOINT,RUNNING,MULTICAST> mtu 1500
        groups: tun
        inet 172.168.123.1 --> 172.168.123.2 netmask 0x
carp0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
        carp: MASTER carpdev gem0 vhid 1 advbase 1 advskew 0
        groups: carp
        inet 216.82.41.116 netmask 0xffe0 broadcast 216.82.41.127
        inet 216.82.41.97 netmask 0xffe0 broadcast 216.82.41.127
        inet 216.82.41.98 netmask 0xffe0 broadcast 216.82.41.127
        inet 216.82.41.117 netmask 0xffe0 broadcast 216.82.41.127
        inet 216.82.41.118 netmask 0xffe0 broadcast 216.82.41.127
        inet 216.82.41.119 netmask 0xffe0 broadcast 216.82.41.127
        inet 216.82.41.120 netmask 0xffe0 broadcast 216.82.41.127
        inet 216.82.41.125 netmask 0xffe0 broadcast 216.82.41.127
        inet

Firewall partially failing with high traffic
I have a 3.8 PF/CARP setup that I can reproducibly screw up simply by cat'ing lots of text over a telnet session. It has several subnets, and several NICs, but only 1 subnet becomes unavailable. Everything else continues to work.

There are no errors in messages, daemon, with PF debug set to misc. Counters all look normal, same with state table and netstat -m output. The only reason I believe it's the firewall is restarting it will bring the network back up.

I can't (easily) give direct output from things like ifconfig or pf.conf as they're both huge and contain information I've been told we don't want to send out. Hopefully this doesn't prevent anyone from helping me out.

gem0 - external
gem1 - 120.x
hme0 - 0.x
hme1 - 121.x
hme2 - 119.x

Coming in on hme1 routed through gem1, I can cause everything off gem1 to stop working. The interface shows as up, but nothing works. All other interfaces work fine. PF continues to work as NAT and external firewalling still operates. No errors anywhere, even with debugging turned on in PF. netstat -m looks the same before and after.

I'm hoping someone can give me a better way to debug this, considering I can reproduce it. I don't believe it's PF as I can disable and re-enable it with no effect.

I've disabled ohci using config -e as those were the only errors I was seeing. Specifically:

ohci0: 1 scheduling overruns

However they didn't happen anywhere near this problem.

dmesg (out of messages):

syncing disks... done o arpresolve
console is /[EMAIL PROTECTED],0/[EMAIL PROTECTED],1/[EMAIL PROTECTED]/[EMAIL PROTECTED],3f8
Copyright (c) 1982, 1986, 1989, 1991, 1993
        The Regents of the University of California. All rights reserved.
Copyright (c) 1995-2005 OpenBSD. All rights reserved. http://www.OpenBSD.org

OpenBSD 3.8 (CARP) #0: Fri Feb 24 15:29:15 MST 2006
    [EMAIL PROTECTED]:/usr/src/sys/arch/sparc64/compile/CARP
total memory = 1073741824
avail memory = 969023488
using 6553 buffers containing 53682176 bytes of memory
bootpath: /[EMAIL PROTECTED],0/[EMAIL PROTECTED],0/[EMAIL PROTECTED],0/[EMAIL PROTECTED],0
mainbus0 (root): Sun Fire V120 (UltraSPARC-IIe 648MHz)
cpu0 at mainbus0: SUNW,UltraSPARC-IIe @ 648 MHz, version 0 FPU
cpu0: physical 32K instruction (32 b/l), 16K data (32 b/l), 2048K external (64 b/l)
psycho0 at mainbus0
SUNW,sabre: impl 0, version 0: ign 7c0 bus range 0 to 3; PCI bus 0
DVMA map: c000 to e000
IOTDB: 4d0a000 to 4d8a000
pci0 at psycho0
ppb0 at pci0 dev 1 function 1 Sun Simba PCI-PCI rev 0x13
pci1 at ppb0 bus 1
ebus0 at pci1 dev 12 function 0 Sun PCIO Ebus2 (US III) rev 0x01
flashprom at ebus0 addr 0-f not configured
clock1 at ebus0 addr 0-1fff: mk48t59: hostid 83f2bc1c
ebus_attach: idprom: incomplete
SUNW,lomh at ebus0 addr 20-23 ipl 42 not configured
gem0 at pci1 dev 12 function 1 Sun ERI Ether rev 0x01: ivec 3006, address 00:03:ba:f2:bc:1c
bmtphy0 at gem0 phy 1: BCM5221 100baseTX PHY, rev. 4
ohci0 at pci1 dev 12 function 3 Sun USB rev 0x01: ivec 24, version 1.0, legacy support
usb0 at ohci0: USB revision 1.0
uhub0 at usb0
uhub0: Sun OHCI root hub, rev 1.00/1.00, addr 1
uhub0: 4 ports with 4 removable, self powered
Acer Labs M7101 Power rev 0x00 at pci1 dev 3 function 0 not configured
Acer Labs M7101 Power rev 0x00 at pci1 dev 3 function 0 not configured
ebus1 at pci1 dev 7 function 0 Acer Labs M1533 ISA rev 0x00
power at ebus1 addr 800-82f ipl 37 not configured
com0 at ebus1 addr 3f8-3ff ipl 43: ns16550a, 16 byte fifo
com0: console
com1 at ebus1 addr 2e8-2ef ipl 43: ns16550a, 16 byte fifo
pciide0 at pci1 dev 13 function 0 Acer Labs M5229 UDMA IDE rev 0xc3: DMA, channel 0 configured to native-PCI, channel 1 configured to native-PCI
pciide0: using ivec 180c for native-PCI interrupt
pciide0: channel 0 disabled (no drives)
pciide0: channel 1 disabled (no drives)
gem1 at pci1 dev 5 function 1 Sun ERI Ether rev 0x01: ivec 301c, address 00:03:ba:f2:bc:1d
bmtphy1 at gem1 phy 1: BCM5221 100baseTX PHY, rev. 4
ohci1 at pci1 dev 5 function 3 Sun USB rev 0x01: ivec 26, version 1.0, legacy support
usb1 at ohci1: USB revision 1.0
uhub1 at usb1
uhub1: Sun OHCI root hub, rev 1.00/1.00, addr 1
uhub1: 4 ports with 4 removable, self powered
ppb1 at pci0 dev 1 function 0 Sun Simba PCI-PCI rev 0x13
pci2 at ppb1 bus 2
siop0 at pci2 dev 8 function 0 Symbios Logic 53c896 rev 0x07: ivec 1820, using 8K of on-board RAM
scsibus0 at siop0: 16 targets
sd0 at scsibus0 targ 0 lun 0: FUJITSU, MAT3073N SUN72G, 0602 SCSI4 0/direct fixed
sd0: 70007MB, 14100 cyl, 24 head, 423 sec, 512 bytes/sec, 143374738 sec total
sd1 at scsibus0 targ 1 lun 0: FUJITSU, MAT3073N SUN72G, 0602 SCSI4 0/direct fixed
sd1: 70007MB, 14100 cyl, 24 head, 423 sec, 512 bytes/sec, 143374738 sec total
siop1 at pci2 dev 8 function 1 Symbios Logic 53c896 rev 0x07: ivec 1820, using 8K of on-board RAM
scsibus1 at siop1: 16 targets
ppb2 at pci2 dev 5 function 0 Intel S21154AE/BE PCI-PCI rev 0x00
pci3 at ppb2 bus

Re: Firewall partially failing with high traffic
In article [EMAIL PROTECTED], Chris Cameron wrote:
> I have a 3.8 PF/CARP setup that I can reproducibly screw up simply by
> cat'ing lots of text over a telnet session.

Chances are that you're hitting some bug in 3.8 that has likely been fixed in 3.9 or 4.0. Or the rule you're using to pass the traffic is wrong. Are you using keep state? Are you using 'flags S/SA' on that rule?

With the amount of information you've given, it is hard to even theorize what could be wrong. People would need more information.

--Toby.

Re: Firewall partially failing with high traffic
On Tue, Nov 14, 2006 at 09:28:47AM -0700, Chris Cameron wrote:
> Upgrading isn't an option. I mean it is, but as soon as I say "Don't
> know, let's just upgrade", that's a major hit to something that was
> tough to get in in the first place. This will be a Firewall-1 shop
> again quite quickly and any future thing I recommend isn't going to
> have much weight.

You need to upgrade anyway to properly keep up with security updates. You're now running a system that is no longer supported; upgrading to a supported system is a Good Thing regardless of the issue you're currently dealing with.

As a bonus, things generally get better and 'more fixed' with each new version and, as Tobias says, there's a good chance the problem you're running up against is resolved.

--
o--{ Will Maier }--o
| web:...http://www.lfod.us/ | [EMAIL PROTECTED] |
*--[ BSD Unix: Live Free or Die ]--*

Re: Firewall partially failing with high traffic
Hi,

On 11/14/06, Chris Cameron [EMAIL PROTECTED] wrote:
> I have a 3.8 PF/CARP setup that I can reproducibly screw up simply by
> cat'ing lots of text over a telnet session.

can you post `pfctl -s info` and `pfctl -s memory`?

Best regards,
Carlos.

--
nick grah windows just crashed again, unstable crap.
yukito Windows isn't unstable, it's just spontaneous.

Re: Firewall partially failing with high traffic
This is while it's working. I'll repost this tonight when I'm able to hang it.

Status: Enabled for 0 days 16:47:54           Debug: Urgent

Interface Stats for gem0              IPv4             IPv6
  Bytes In                      1560279475              272
  Bytes Out                     1464940667              352
  Packets In
    Passed                        23485100
    Blocked                         883254
  Packets Out
    Passed                        23883682
    Blocked                            213

State Table                          Total             Rate
  current entries                      784
  searches                        18122501          299.7/s
  inserts                           106940            1.8/s
  removals                          106156            1.8/s
Counters
  match                             304496            5.0/s
  bad-offset                             0            0.0/s
  fragment                               2            0.0/s
  short                                  0            0.0/s
  normalize                              0            0.0/s
  memory                                 0            0.0/s
  bad-timestamp                          0            0.0/s
  congestion                           129            0.0/s
  ip-option                              0            0.0/s
  proto-cksum                          301            0.0/s
  state-mismatch                      1519            0.0/s
  state-insert                         903            0.0/s
  state-limit                            0            0.0/s
  src-limit                              0            0.0/s
  synproxy                               0            0.0/s

$ sudo pfctl -s memory
states        hard limit        1
src-nodes     hard limit        1
frags         hard limit     5000
tables        hard limit     1000
table-entries hard limit       10
$

Chris

On Tue, 2006-11-14 at 13:05 -0500, Carlos A. Carnero Delgado wrote:
> Hi,
>
> On 11/14/06, Chris Cameron [EMAIL PROTECTED] wrote:
> > I have a 3.8 PF/CARP setup that I can reproducibly screw up simply by
> > cat'ing lots of text over a telnet session.
>
> can you post `pfctl -s info` and `pfctl -s memory`?
>
> Best regards,
> Carlos.

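The state-table figures in the message above can be cross-checked with a little arithmetic: inserts minus removals should equal the current entry count, and searches divided by the uptime should land near the reported rate. The sketch below assumes the counters read as 106940 inserts, 106156 removals, and 18122501 searches over an uptime of 16:47:54; it is only a sanity check on the numbers, not a diagnosis of the hang.

```shell
#!/bin/sh
# Figures taken from the pfctl -s info output quoted in the thread.
inserts=106940
removals=106156
searches=18122501

# Uptime "0 days 16:47:54" converted to seconds.
uptime_s=$((16 * 3600 + 47 * 60 + 54))

# States still in the table = states inserted - states removed.
echo "current entries: $((inserts - removals))"

# Integer searches per second (sh arithmetic has no floating point).
echo "searches/s (integer): $((searches / uptime_s))"
```

Both results agree with the reported 784 current entries and a search rate just under 300/s, so at least while the network is working the state table does not appear to be leaking entries.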
Re: Firewall partially failing with high traffic
On Tue, Nov 14, 2006 at 06:03:51AM -0700, Chris Cameron wrote:
> I have a 3.8 PF/CARP setup that I can reproducibly screw up simply by
> cat'ing lots of text over a telnet session. It has several subnets,
> and several NICs, but only 1 subnet becomes unavailable. Everything
> else continues to work.
>
> There are no errors in messages, daemon, with PF debug set to misc.
> Counters all look normal, same with state table and netstat -m output.
> The only reason I believe it's the firewall is restarting it will
> bring the network back up.
>
> gem0 - external
> gem1 - 120.x
> hme0 - 0.x
> hme1 - 121.x
> hme2 - 119.x
>
> Coming in on hme1 routed through gem1, I can cause everything off gem1
> to stop working. The interface shows as up, but nothing works. All
> other interfaces work fine. PF continues to work as NAT and external
> firewalling still operates. No errors anywhere, even with debugging
> turned on in PF. netstat -m looks the same before and after.
>
> I'm hoping someone can give me a better way to debug this, considering
> I can reproduce it. I don't believe it's PF as I can disable and
> re-enable it with no effect.

What happens when you send the same data from the firewall?

> I've disabled ohci using config -e as those were the only errors I was
> seeing. Specifically:
> ohci0: 1 scheduling overruns
> However they didn't happen anywhere near this problem.

That does not look like a likely culprit, no.

Are you sure it's not just bad hardware?

Joachim

Re: Firewall partially failing with high traffic
At 2006-11-14 13:03:51, Chris Cameron wrote:
> I can't (easily) give direct output from things like ifconfig or
> pf.conf as they're both huge and contain information I've been told we
> don't want to send out. Hopefully this doesn't prevent anyone from
> helping me out.

If it's a problem with carp, it's going to be really difficult to resolve without seeing the ifconfig output, but here are some questions that you might want to consider...

- Do you have dedicated addresses on the carp parent interfaces?
- Are all the carp devices on the master firewall MASTER; what about the backup?
- Can you reach the 'disappearing' network from the backup firewall?
- Is preemption enabled? (sysctl net.inet.carp.preempt=1)
- What is the output of 'netstat -sp carp' on both the master and backup firewalls?
- What about the output of 'netstat -i'? Are there output errors on the offending interface?
- Have you tried running with carp debugging turned on? (sysctl net.inet.carp.log=1)
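
The checklist above could be collected in one pass on each firewall, before and after the network hangs, so the two captures can be compared side by side. This is a minimal sketch: the function name `carp_diag` and the `DRY_RUN` switch are made up for illustration, and it uses only the commands already named in this thread (`sysctl`, `netstat -sp carp`, `netstat -i`, `ifconfig -A`).

```shell
#!/bin/sh
# carp_diag (hypothetical helper): run each diagnostic from the checklist
# and print a "== command" header before its output.
# With DRY_RUN=1 it only prints the headers and runs nothing.
carp_diag() {
    for cmd in \
        "sysctl net.inet.carp.preempt" \
        "sysctl net.inet.carp.log" \
        "netstat -sp carp" \
        "netstat -i" \
        "ifconfig -A"
    do
        echo "== $cmd"
        if [ "${DRY_RUN:-0}" != "1" ]; then
            # Unquoted on purpose so the string splits into command + args.
            $cmd 2>&1
        fi
    done
}

# Example use on each firewall (file names are only a suggestion):
#   carp_diag > before.txt    # while the network works
#   carp_diag > after.txt     # once the gem1 network has hung
#   diff -u before.txt after.txt
```

Comparing the two captures with `diff` makes it easier to spot a counter or CARP state that changes only once the problem is present.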