We are using OpenBSD 3.7 with carp preemption enabled, and we have checked
that all interfaces are connected at boot. Preemptive carp failover
works perfectly: we tested it by unplugging the Ethernet cable from the
NICs that are used for carp.
We also ran into that ARP issue during the migration of our
firewalls. For about two hours we weren't able to reach the
Internet. We think this was because of the ARP cache of our
ISP's router, which was a Cisco. We also think that everything started
working again after a couple of hours because the ARP cache entry on
that Cisco router expired (it was late at night and we had no access to
the room where the router was). Anyway, we still see those ARP failures
when we reboot any of the firewalls, but they don't cause any
problem, as everything keeps working well.
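
For reference, the entry the upstream router has to learn is the carp
virtual MAC, not the MAC of either firewall's NIC. A minimal sketch,
assuming the usual 00:00:5e:00:01:<vhid> layout for IPv4 carp and assuming
vhid 1 (our vhid is not stated above; it is only inferred from the
destination MAC 0:0:5e:0:1:1 in the capture further down). The helper names
are hypothetical:

```python
def carp_virtual_mac(vhid):
    """Build the IPv4 carp virtual MAC for a virtual host ID (1..255)."""
    if not 1 <= vhid <= 255:
        raise ValueError("vhid must be between 1 and 255")
    return "00:00:5e:00:01:%02x" % vhid


def normalize_mac(mac):
    """Zero-pad the short MAC form that tcpdump prints (0:0:5e:0:1:1)."""
    return ":".join("%02x" % int(part, 16) for part in mac.split(":"))


# Destination MAC of the inbound frames in the capture below.
seen_in_capture = "0:0:5e:0:1:1"

# vhid 1 is an assumption inferred from the last octet of that MAC.
print(normalize_mac(seen_in_capture) == carp_virtual_mac(1))
```

If the router keeps a stale entry pointing at an old hardware address for
the shared IP, inbound traffic would be sent to the wrong MAC until that
entry expires, which would fit what we saw.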
Here's a tcpdump capture from the moment when a download of an ISO
stalls. We were monitoring carp activity on our external and internal
interfaces and noticed nothing wrong with carp.
------
Feb 13 18:43:29.405778 0:40:f4:7a:42:72 0:3:40:9c:3:b1 0800 66:
x.x.x.x.59877 > 193.1.193.69.80: . ack 40626201 win 32767
<nop,nop,timestamp 2597008 157620462>
Feb 13 18:43:29.407996 0:3:40:9c:3:b1 0:0:5e:0:1:1 0800 1364:
193.1.193.69.80 > x.x.x.x.59877: . 40626201:40627499(1298) ack 160 win
140 <nop,nop,timestamp 157620475 2596660> (DF)
Feb 13 18:43:29.410776 0:3:40:9c:3:b1 0:0:5e:0:1:1 0800 1364:
193.1.193.69.80 > x.x.x.x.59877: . 40627499:40628797(1298) ack 160 win
140 <nop,nop,timestamp 157620475 2596660> (DF)
Feb 13 18:43:29.413217 0:40:f4:7a:42:72 0:3:40:9c:3:b1 0800 66:
x.x.x.x.59877 > 193.1.193.69.80: . ack 40628797 win 32767
<nop,nop,timestamp 2597016 157620475>
Feb 13 18:43:29.413504 0:3:40:9c:3:b1 0:0:5e:0:1:1 0800 1364:
193.1.193.69.80 > x.x.x.x.59877: . 40628797:40630095(1298) ack 160 win
140 <nop,nop,timestamp 157620475 2596670> (DF)
Feb 13 18:43:29.419252 0:3:40:9c:3:b1 0:0:5e:0:1:1 0800 1364:
193.1.193.69.80 > x.x.x.x.59877: . 40630095:40631393(1298) ack 160 win
140 <nop,nop,timestamp 157620475 2596670> (DF)
Feb 13 18:43:29.423734 0:40:f4:7a:42:72 0:3:40:9c:3:b1 0800 66:
x.x.x.x.59877 > 193.1.193.69.80: . ack 40631393 win 32767
<nop,nop,timestamp 2597025 157620475>
------
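
To double-check the excerpt above, here is a small sketch that parses it
and verifies that the data segments are contiguous and that the last ACK
covers the last segment. The excerpt is copied from the capture with the
TCP option lines left out for brevity; the function names and the two-line
record layout are assumptions made for this example only.

```python
import re

# Capture excerpt from above (timestamp options and the DF flag omitted).
TRACE = """\
Feb 13 18:43:29.405778 0:40:f4:7a:42:72 0:3:40:9c:3:b1 0800 66:
x.x.x.x.59877 > 193.1.193.69.80: . ack 40626201 win 32767
Feb 13 18:43:29.407996 0:3:40:9c:3:b1 0:0:5e:0:1:1 0800 1364:
193.1.193.69.80 > x.x.x.x.59877: . 40626201:40627499(1298) ack 160 win 140
Feb 13 18:43:29.410776 0:3:40:9c:3:b1 0:0:5e:0:1:1 0800 1364:
193.1.193.69.80 > x.x.x.x.59877: . 40627499:40628797(1298) ack 160 win 140
Feb 13 18:43:29.413217 0:40:f4:7a:42:72 0:3:40:9c:3:b1 0800 66:
x.x.x.x.59877 > 193.1.193.69.80: . ack 40628797 win 32767
Feb 13 18:43:29.413504 0:3:40:9c:3:b1 0:0:5e:0:1:1 0800 1364:
193.1.193.69.80 > x.x.x.x.59877: . 40628797:40630095(1298) ack 160 win 140
Feb 13 18:43:29.419252 0:3:40:9c:3:b1 0:0:5e:0:1:1 0800 1364:
193.1.193.69.80 > x.x.x.x.59877: . 40630095:40631393(1298) ack 160 win 140
Feb 13 18:43:29.423734 0:40:f4:7a:42:72 0:3:40:9c:3:b1 0800 66:
x.x.x.x.59877 > 193.1.193.69.80: . ack 40631393 win 32767
"""

HEADER = re.compile(r"^\w{3} +\d+ [\d:.]+ (\S+) (\S+) 0800 \d+:$")
DATA = re.compile(r": \. (\d+):(\d+)\((\d+)\) ack (\d+)")
ACK = re.compile(r": \. ack (\d+)")


def parse(trace):
    """Turn header/body line pairs into a list of packet dicts."""
    lines = trace.splitlines()
    packets = []
    for header, body in zip(lines[0::2], lines[1::2]):
        src_mac, dst_mac = HEADER.match(header).groups()
        data = DATA.search(body)
        if data:
            start, end, length, _ack = map(int, data.groups())
            packets.append({"kind": "data", "src": src_mac, "dst": dst_mac,
                            "start": start, "end": end, "len": length})
        else:
            ack = int(ACK.search(body).group(1))
            packets.append({"kind": "ack", "src": src_mac, "dst": dst_mac,
                            "ack": ack})
    return packets


def sequence_gaps(packets):
    """Return (expected, got) pairs where a data segment skips ahead or back."""
    gaps = []
    expected = None
    for p in packets:
        if p["kind"] != "data":
            continue
        if expected is not None and p["start"] != expected:
            gaps.append((expected, p["start"]))
        expected = p["end"]
    return gaps


packets = parse(TRACE)
data = [p for p in packets if p["kind"] == "data"]
acks = [p["ack"] for p in packets if p["kind"] == "ack"]

print("data segments:", len(data), "bytes:", sum(p["len"] for p in data))
print("sequence gaps:", sequence_gaps(packets))
print("last ACK covers last segment:", acks[-1] == data[-1]["end"])
print("destination MACs of inbound data:", sorted({p["dst"] for p in data}))
```

On this excerpt it reports four segments (5192 bytes), no gaps, and a final
ACK that matches the end of the last segment, so nothing is lost inside
this window; whatever causes the stall is not visible in these seven
packets.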
I saved some logs of pfsync traffic. The only strange entries that we
found in these logs look like this:
Feb 13 18:43:52.804942 0:40:f4:50:f9:99 1:0:5e:0:0:f0 0800 118: 10.0.0.2: \
PFSYNCv2 count 5: DEL ST COMP:
id: 436c1ee101e566c5 creatorid: 49885306
id: 436c1ee101e566c7 creatorid: 49885306
id: 436c1ee101e566c8 creatorid: 49885306
Feb 13 18:43:59.496957 0:40:f4:50:f9:99 0:40:f4:7a:3d:a7 0800 60: 10.0.0.2: \
PFSYNCv2 count 1: UPD REQ:
id: 436c1ee101e56a40 creatorid: 49885306
What does DEL ST COMP stand for? Could it be "Delete State Completed"?
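
In case it helps anyone compare logs, here is a rough sketch that tallies
the pfsync messages in the excerpt above by action and compares the
announced count with the number of ids actually listed. The regular
expression and the `tally` helper are assumptions for illustration, based
only on the line layout shown here.

```python
import re

# pfsync excerpt copied from above.
LOG = r"""
Feb 13 18:43:52.804942 0:40:f4:50:f9:99 1:0:5e:0:0:f0 0800 118: 10.0.0.2: \
PFSYNCv2 count 5: DEL ST COMP:
id: 436c1ee101e566c5 creatorid: 49885306
id: 436c1ee101e566c7 creatorid: 49885306
id: 436c1ee101e566c8 creatorid: 49885306
Feb 13 18:43:59.496957 0:40:f4:50:f9:99 0:40:f4:7a:3d:a7 0800 60: 10.0.0.2: \
PFSYNCv2 count 1: UPD REQ:
id: 436c1ee101e56a40 creatorid: 49885306
"""

ACTION = re.compile(r"PFSYNCv2 count (\d+): ([A-Z ]+):")


def tally(log):
    """Group the 'id:' lines under the pfsync message that precedes them."""
    messages = []
    for line in log.splitlines():
        match = ACTION.search(line)
        if match:
            messages.append({"action": match.group(2),
                             "count": int(match.group(1)),
                             "ids": []})
        elif line.startswith("id: ") and messages:
            messages[-1]["ids"].append(line.split()[1])
    return messages


for msg in tally(LOG):
    print(msg["action"], "announced:", msg["count"], "listed:", len(msg["ids"]))
```

For the first message the announced count (5) is larger than the number of
ids listed (3); that may simply be the capture length cutting the packet
short, but we have not verified this.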
We are going to try disabling carp preemption and will report whether
this solves the problem, although we find carp preemption desirable in
our firewall architecture.
Thanks.