> We are using OpenBSD 3.7 with carp preemption and we have checked that
> all interfaces are connected while booting. Carp preemptive failover
> works perfectly: we tested it unplugging the ethernet cable from the
> nics which are used for carp.
>
> We also experienced that ARP thing during the migration of our
> firewalls. For a long period of time (about two hours) we weren't able
> to reach Internet. We think this was because of the ARP cache of our
> ISP's router, which was CISCO. We also think that everything worked
> again magically after a couple of hours because the ARP cache of that
> CISCO router expired (it was late night and we hadn't access to the
> room where that router was). Anyway, we still see those ARP failures
> when we reboot any of the firewalls, but it doesn't represent any
> problem as everything keeps working well.
>
OK, interesting. You should NOT see those errors when you reboot! But I
bet you don't have two Ciscos running HSRP, as we do at my web hosting
customer. However, it would probably go away (as it did for us) if you
put IP addresses on the physical NICs instead of using only the virtual
ones, OR if you install "arp ping" from ports and run it from cron every
5 minutes. If you test this, I would really like to hear your results.

> Here's a tcpdump at the moment when a download of an ISO stalls. We
> were monitoring carp activity for our external and internal interfaces
> and noticed nothing wrong with carp.
>
> ------
> Feb 13 18:43:29.405778 0:40:f4:7a:42:72 0:3:40:9c:3:b1 0800 66:
> x.x.x.x.59877 > 193.1.193.69.80: . ack 40626201 win 32767
> <nop,nop,timestamp 2597008 157620462>
> Feb 13 18:43:29.407996 0:3:40:9c:3:b1 0:0:5e:0:1:1 0800 1364:
> 193.1.193.69.80 > x.x.x.x.59877: . 40626201:40627499(1298) ack 160 win
> 140 <nop,nop,timestamp 157620475 2596660> (DF)
> Feb 13 18:43:29.410776 0:3:40:9c:3:b1 0:0:5e:0:1:1 0800 1364:
> 193.1.193.69.80 > x.x.x.x.59877: . 40627499:40628797(1298) ack 160 win
> 140 <nop,nop,timestamp 157620475 2596660> (DF)
> Feb 13 18:43:29.413217 0:40:f4:7a:42:72 0:3:40:9c:3:b1 0800 66:
> x.x.x.x.59877 > 193.1.193.69.80: . ack 40628797 win 32767
> <nop,nop,timestamp 2597016 157620475>
> Feb 13 18:43:29.413504 0:3:40:9c:3:b1 0:0:5e:0:1:1 0800 1364:
> 193.1.193.69.80 > x.x.x.x.59877: . 40628797:40630095(1298) ack 160 win
> 140 <nop,nop,timestamp 157620475 2596670> (DF)
> Feb 13 18:43:29.419252 0:3:40:9c:3:b1 0:0:5e:0:1:1 0800 1364:
> 193.1.193.69.80 > x.x.x.x.59877: . 40630095:40631393(1298) ack 160 win
> 140 <nop,nop,timestamp 157620475 2596670> (DF)
> Feb 13 18:43:29.423734 0:40:f4:7a:42:72 0:3:40:9c:3:b1 0800 66:
> x.x.x.x.59877 > 193.1.193.69.80: . ack 40631393 win 32767
> <nop,nop,timestamp 2597025 157620475>
> ------
>
> I saved some logs for pfsync traffic.
> The only strange things that we found on these logs are like these:
>
> Feb 13 18:43:52.804942 0:40:f4:50:f9:99 1:0:5e:0:0:f0 0800 118:
> 10.0.0.2: \
> PFSYNCv2 count 5: DEL ST COMP:
> id: 436c1ee101e566c5 creatorid: 49885306
> id: 436c1ee101e566c7 creatorid: 49885306
> id: 436c1ee101e566c8 creatorid: 49885306
>
> Feb 13 18:43:59.496957 0:40:f4:50:f9:99 0:40:f4:7a:3d:a7 0800 60:
> 10.0.0.2: \
> PFSYNCv2 count 1: UPD REQ:
> id: 436c1ee101e56a40 creatorid: 49885306
>
> What does DEL ST COMP stand for? Might be Delete State Completed?
>
> We are going to try disabling carp preemption and we will report if
> this solves the problem, although we find carp preemption desirable in
> our firewall architecture.
>
> Thanks.
>

What type of NICs do you have? There are fixes for some NICs whose
drivers could cause problems like this, for example the xl (3Com) NICs.
If you have an unpatched xl driver on the pfsync interfaces, too bad.
And I am talking about 3.7 here. A 3.7-stable checkout could fix it if
you use xl. Check CVS and see.
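
If you want to check quickly which MAC addresses the frames in a saved
capture are being sent to, here is a rough sketch in Python. It assumes
the text looks like the tcpdump output with link-level headers quoted
above (time, source MAC, destination MAC). The names FRAME_RE,
normalize_mac, carp_vhid and count_destinations are my own, made up for
illustration. It relies on CARP using the virtual MAC 00:00:5e:00:01:xx
for IPv4, where xx is the vhid.

```python
import re
from collections import Counter

# Link-level part of a tcpdump line printed with MAC addresses:
# "<hh:mm:ss.frac> <source MAC> <destination MAC> ..."
# Assumption: octets may be printed without leading zeros (0:0:5e:0:1:1).
MAC = r"[0-9a-f]{1,2}(?::[0-9a-f]{1,2}){5}"
FRAME_RE = re.compile(
    r"(?P<time>\d{2}:\d{2}:\d{2}\.\d+)\s+(?P<src>" + MAC + r")\s+(?P<dst>" + MAC + r")\s",
    re.IGNORECASE,
)

CARP_PREFIX = "00:00:5e:00:01:"  # CARP virtual MAC prefix (IPv4)


def normalize_mac(mac):
    """Pad every octet to two hex digits, e.g. 0:0:5e:0:1:1 -> 00:00:5e:00:01:01."""
    return ":".join(part.zfill(2) for part in mac.lower().split(":"))


def carp_vhid(mac):
    """Return the vhid if mac is a CARP virtual MAC, otherwise None."""
    mac = normalize_mac(mac)
    if mac.startswith(CARP_PREFIX):
        return int(mac[len(CARP_PREFIX):], 16)
    return None


def count_destinations(lines):
    """Count frames per destination MAC; lines without a link-level header are skipped."""
    counts = Counter()
    for line in lines:
        match = FRAME_RE.search(line)
        if match:
            counts[normalize_mac(match.group("dst"))] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage (hypothetical file name): python macs.py < capture.txt
    for mac, n in count_destinations(sys.stdin).most_common():
        vhid = carp_vhid(mac)
        label = "CARP vhid %d" % vhid if vhid is not None else "physical"
        print("%s  %5d frames  (%s)" % (mac, n, label))
```

If frames from the ISP router keep going to a physical MAC of one of the
firewalls instead of the CARP virtual MAC, that would suggest a stale
ARP entry upstream rather than a carp problem.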
