On Sat, Feb 23, 2013 at 11:14 AM, sven falempin <[email protected]> wrote:
>
> On Sat, Feb 23, 2013 at 10:10 AM, sangdrax8 <[email protected]> wrote:
>>
>> On Sat, Feb 23, 2013 at 2:14 AM, Todd T. Fries <[email protected]> wrote:
>> > Penned by sven falempin on 20130222 17:05.33, we have:
>> > | On Fri, Feb 22, 2013 at 2:29 PM, sangdrax8 <[email protected]> wrote:
>> > |
>> > | > I am new to OpenBSD, but would like to take advantage of a redundant
>> > | > setup with ipsec/carp/sasyncd.  I have run into a situation which
>> > | > seems to be a bug, and was directed to post to tech with config files.
>> > | >
>> > | > I believe my problem is that the phase 1 of an ipsec negotiation is
>> > | > not being synced with sasyncd, causing a repeatable condition where
>> > | > tunnels die for extended periods of time.  I have tried the following
>> > | > with all three machines running 5.1-stable, 5.2-stable, and 5.2-stable
>> > | > with a snapshot kernel from 2/17/2013.  My main problem exists across
>> > | > all three setup types.  I am running 5.2 with the snapshot kernel now
>> > | > as it provides the lifetime setting in ipsec phase 2 to make the
>> > | > testing faster.
>> > | >
>> > | >
>> > | > ####### Setup Description ######
>> > | >
>> > | > 172.16.10.0/24 behind the carp devices on vlan 2
>> > | > 172.16.20.0/24 the other side of the tunnel no vlan
>> > | > 1.1.1.0/24 is used for the internet
>> > | >
>> > | > vlan 3 is tagged on Fw's, untagged to the lab1 box connected with a
>> > | > switch between them
>> > | >
>> > | > fw boxes use trunk ports as follows
>> > | > em0 + em1 = trunk0
>> > | > em2 + em3 = trunk1
>> > | >
>> > | >
>> > | > ####### Setup Drawing ######
>> > | >
>> > | >
>> > | >         172.16.10.0/24
>> > | >         ................
>> > | >         .              .    Vlan 2
>> > | >         . .3           . .7
>> > | >    *****.****      ****.*****
>> > | >    *  fw1   *      *  fw2   *
>> > | >    *        *      *        *
>> > | >    *****.****      ****.*****
>> > | >         . 1.1.1.2      . 1.1.1.3
>> > | >         .              .
>> > | >         ................    Vlan 3 to switch
>> > | >                . 1.1.1.1
>> > | >                .
>> > | >                .
>> > | >                .
>> > | >                .
>> > | >                .
>> > | >                . 1.1.1.5
>> > | >          ******.******
>> > | >          *   Lab1    *
>> > | >          *           *
>> > | >          ******.******
>> > | >                .
>> > | >          172.16.20.0/24
>> > | >
>> > | >
>> > | >
>> > | > ###### How to re-create the problem #####
>> > | >
>> > | > Bring all machines up, and allow ipsec to come up (ensuring the fw1
>> > | > is the master)
>> > | >
>> > | > start ping from 172.16.10.0/24 net to 172.16.20.0/24 net
>> > | >
>> > | > tcpdump on vlan3 on both fw1 and fw2 (only fw1 should show active
>> > | > esp traffic).  Note the spi's seen.  this is spi set 1
>> > | >
>> > | > carp demote fw1 'ifconfig -g carp carpdemote 128'.
>> > | >
>> > | > tcpdump on fw2 should now show the esp (same spi's as before, spi
>> > | > set 1), and a large increase in sequence numbers
>> > | >
>> > | > soon after transfer fw2 will do a full phase1 and phase2
>> > | > re-negotiation (can be seen on the tcpdump).  Spi's will change
>> > | > (referring to this as spi set 2), sequence numbers will reset, and
>> > | > no pings are lost.  This is where I believe the phase 1 is now
>> > | > renegotiated between fw2 and lab1 because it was not synced from fw1.
>> > | >
>> > | > recover fw1 as carp master 'ifconfig -g carp -carpdemote 128'.
>> > | >
>> > | > tcpdump on fw1 should now show the esp packets (spi's now from set
>> > | > 2), and a large increase in sequence numbers
>> > | >
>> > | > sometimes soon after transfer fw1 will attempt a phase 2 re-key and
>> > | > be denied.  even if it doesn't do it quickly, when the phase 2 begins
>> > | > to time out it will attempt to re-key and be denied at that time.  I
>> > | > have reduced phase 2 to 5 minutes in my tests to allow this to happen
>> > | > more quickly.
>> > | >
>> > | > when phase 2 times out, the pings through the tunnel fail and the
>> > | > tunnel is down.
>> > | >
>> > | > You can fail back to fw2, and a new phase 2 negotiation will take
>> > | > place to resume traffic, otherwise fw1 will not be able to re-build
>> > | > the tunnel until the phase 1 times out (I believe 8 hours default)
>> > | >
>> > | > As a note, if you fail a firewall by actually rebooting it, this
>> > | > problem goes undetected as this clears the SA's.
>> > | >
>> > | > I know this is a long e-mail, but I have tried to provide all the
>> > | > details and configurations that could be needed to re-create this.
>> > | > I have been able to consistently re-create this issue every time
>> > | > across multiple versions.  If there is anything I have left off,
>> > | > please let me know.
>> > | >
>> > | >
>> > | > #######################################################
>> > | > ############## Configuration Files Below ##############
>> > | > #######################################################
>> > | >
>> > | >
>> > | > ##### fw1 configs #####
>> > | >
>> > | > ==> sasyncd.conf <==
>> > | > interface carp3
>> > | > group carp
>> > | > peer 172.16.10.7
>> > | > sharedkey 0xf04c0d7fada85a2c0f3fec1db4e52e6d6cbd360936b163133df4917566308bd3
>> > | >
>> > | >
>> > | > ==> hostname.carp2 <==
>> > | > up
>> > | > inet 172.16.10.1 255.255.255.0 172.16.10.255 vhid 2 pass password carpdev vlan2
>> > | >
>> > | > ==> hostname.carp3 <==
>> > | > up
>> > | > inet 1.1.1.1 255.255.255.0 1.1.1.255 vhid 3 pass password carpdev vlan3
>> > | >
>> > | > ==> hostname.em0 <==
>> > | > up
>> > | >
>> > | > ==> hostname.em1 <==
>> > | > up
>> > | >
>> > | > ==> hostname.em2 <==
>> > | > up
>> > | >
>> > | > ==> hostname.em3 <==
>> > | > up
>> > | >
>> > | > ==> hostname.enc0 <==
>> > | > up
>> > | >
>> > | > ==> hostname.gif1 <==
>> > | > create
>> > | > tunnel 172.16.10.1 172.16.20.1
>> > | > 10.10.10.1 10.10.20.1 netmask 255.255.255.252
>> > | > mtu 1426
>> > | > up
>> > | > !route add 172.16.20.0/24 10.10.20.1
>> > | >
>> > | > ==> hostname.pfsync0 <==
>> > | > up syncdev vlan2 syncpeer 172.16.10.7
>> > | >
>> > | > ==> hostname.trunk0 <==
>> > | > up
>> > | > trunkproto failover trunkport em0 trunkport em1
>> > | >
>> > | > ==> hostname.trunk1 <==
>> > | > up
>> > | > trunkproto failover trunkport em2 trunkport em3
>> > | >
>> > | > ==> hostname.vlan2 <==
>> > | > up
>> > | > inet 172.16.10.3 255.255.255.0 NONE vlan 2 vlandev trunk0
>> > | >
>> > | > ==> hostname.vlan3 <==
>> > | > up
>> > | > inet 1.1.1.2 255.255.255.0 NONE vlan 3 vlandev trunk1
>> > | >
>> > | > ==> ipsec.conf <==
>> > | > fw_gw = "1.1.1.1"
>> > | > fw_gif = "172.16.10.1"
>> > | > fw_net = "172.16.10.0/24"
>> > | >
>> > | > lab_gw = "1.1.1.5"
>> > | > lab_gif = "172.16.20.1"
>> > | > lab_net = "172.16.20.0/24"
>> > | >
>> > | > ike esp from $fw_gif to $lab_gif \
>> > | >     local $fw_gw peer $lab_gw \
>> > | >     main auth hmac-sha1 enc aes-256 group modp1024 \
>> > | >     quick auth hmac-sha1 enc aes-256 group modp1024 lifetime 5m \
>> > | >     psk "password"
>> > | >
>> > | >
>> > | >
>> > | >
>> > | > ##### fw2 configs #####
>> > | >
>> > | > ==> sasyncd.conf <==
>> > | > interface carp3
>> > | > group carp
>> > | > peer 172.16.10.3
>> > | > sharedkey 0xf04c0d7fada85a2c0f3fec1db4e52e6d6cbd360936b163133df4917566308bd3
>> > | >
>> > | > ==> hostname.carp2 <==
>> > | > up
>> > | > inet 172.16.10.1 255.255.255.0 172.16.10.255 vhid 2 pass password carpdev vlan2 advskew 128
>> > | >
>> > | > ==> hostname.carp3 <==
>> > | > up
>> > | > inet 1.1.1.1 255.255.255.0 1.1.1.255 vhid 3 pass password carpdev vlan3 advskew 128
>> > | >
>> > | > ==> hostname.em0 <==
>> > | > up
>> > | >
>> > | > ==> hostname.em1 <==
>> > | > up
>> > | >
>> > | > ==> hostname.em2 <==
>> > | > up
>> > | >
>> > | > ==> hostname.em3 <==
>> > | > up
>> > | >
>> > | > ==> hostname.enc0 <==
>> > | > up
>> > | >
>> > | > ==> hostname.gif1 <==
>> > | > create
>> > | > tunnel 172.16.10.1 172.16.20.1
>> > | > 10.10.10.1 10.10.20.1 netmask 255.255.255.252
>> > | > mtu 1426
>> > | > up
>> > | > !route add 172.16.20.0/24 10.10.20.1
>> > | >
>> > | > ==> hostname.pfsync0 <==
>> > | > up syncdev vlan2 syncpeer 172.16.10.3
>> > | >
>> > | > ==> hostname.trunk0 <==
>> > | > up
>> > | > trunkproto failover trunkport em0 trunkport em1
>> > | >
>> > | > ==> hostname.trunk1 <==
>> > | > up
>> > | > trunkproto failover trunkport em2 trunkport em3
>> > | >
>> > | > ==> hostname.vlan2 <==
>> > | > up
>> > | > inet 172.16.10.7 255.255.255.0 NONE vlan 2 vlandev trunk0
>> > | >
>> > | > ==> hostname.vlan3 <==
>> > | > up
>> > | > inet 1.1.1.3 255.255.255.0 NONE vlan 3 vlandev trunk1
>> > | >
>> > | > ==> ipsec.conf <==
>> > | > fw_gw = "1.1.1.1"
>> > | > fw_gif = "172.16.10.1"
>> > | > fw_net = "172.16.10.0/24"
>> > | >
>> > | > lab_gw = "1.1.1.5"
>> > | > lab_gif = "172.16.20.1"
>> > | > lab_net = "172.16.20.0/24"
>> > | >
>> > | > ike esp from $fw_gif to $lab_gif \
>> > | >     local $fw_gw peer $lab_gw \
>> > | >     main auth hmac-sha1 enc aes-256 group modp1024 \
>> > | >     quick auth hmac-sha1 enc aes-256 group modp1024 lifetime 5m \
>> > | >     psk "password"
>> > | >
>> > | > ###### lab1 ######
>> > | >
>> > | > ==> hostname.em0 <==
>> > | > up
>> > | > inet 1.1.1.5 255.255.255.0
>> > | >
>> > | > ==> hostname.em2 <==
>> > | > up
>> > | > inet 172.16.20.1 255.255.255.0
>> > | >
>> > | > ==> hostname.enc0 <==
>> > | > up
>> > | >
>> > | > ==> hostname.gif0 <==
>> > | > create
>> > | > tunnel 172.16.20.1 172.16.10.1
>> > | > 10.10.20.1 10.10.10.1 netmask 255.255.255.252
>> > | > mtu 1426
>> > | > up
>> > | > !route add 172.16.10.0/24 10.10.10.1
>> > | >
>> > | So fw1 is not ready if you manually turn it off, but comes back if you
>> > | reboot it!
>> > |
>> > | I guess a MASTER that fails needs maintenance ;-)
>> > |
>> > | Maybe it is a misbehavior, but does it actually happen in real use
>> > | scenarios?
>> >
>> > Yes.
>> >
>> > --
>> > Todd Fries .. [email protected]
>> >
>> >  ____________________________________________
>> > |                                            \  1.636.410.0632 (voice)
>> > | Free Daemon Consulting, LLC                \  1.405.227.9094 (voice)
>> > | http://FreeDaemonConsulting.com            \  1.866.792.3418 (FAX)
>> > | PO Box 16169, Oklahoma City, OK 73113      \  sip:[email protected]
>> > | "..in support of free software solutions." \  sip:[email protected]
>> >  \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\
>> >
>> >               37E7 D3EB 74D0 8D66 A68D  B866 0326 204E 3F42 004A
>> >                         http://todd.fries.net/pgp.txt
>> >
>>
>>
>> Yes, this is a real world scenario.  The only thing required for this
>> to happen is the backup firewall taking over for any reason while
>> the primary is still powered on (aka didn't lose the phase1)
>>
>> 1) you use the carp demote because... <any reason>
>>
>> 2) a switch reboots and the backup firewall takes over for a few
>> minutes until the primary is once again available
>>
>> 3) a cable is loose and while replacing it the backup firewall takes over
>>
>> There is even a case where nothing goes wrong and this happens.  If
>> both boxes are booting up for the first time one of them will come up
>> first.  If this happens to be fw2, it will be master until fw1 finishes.
>> It will negotiate phase 1 and 2, and bring up the tunnel.  Once fw1
>> finishes it will become the master, and will re-do the phase1.
>>
>> So for the next 8 hours (lifetime of phase1) your backup will be in this
>> state and you do not have redundant devices.  If fw1 dies for any reason
>> then fw2 will have an old valid phase 1 and your tunnel is down.
>>
>
> number 2 is convincing
>
>
> --
> ---------------------------------------------------------------------
> () ascii ribbon campaign - against html e-mail
> /\

Is there any chance someone has the time and knowledge to squash this bug?  I would like to deploy a pair of syncing firewall/VPN devices, but as they behave right now I can't put them into my production environment.
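
In case it helps whoever picks this up, the failover steps from my original report can be sketched as a small script to run on fw1. This is only a rough sketch under my own assumptions: the `run` wrapper, the `DRY_RUN` switch and the `HOLD` delay are hypothetical helpers I added for illustration, and only the two `ifconfig -g carp` commands come from the report itself. I use `ipsecctl -s sa` to note the SPIs before and after.

```shell
#!/bin/sh
# Sketch of the failover cycle from the report, meant for fw1.
# Dry-run by default: commands are printed, not executed.
# Set DRY_RUN=0 (as root, on fw1) to really run them.
DRY_RUN="${DRY_RUN:-1}"
DEMOTE="${DEMOTE:-128}"   # demotion value used in the report
HOLD="${HOLD:-60}"        # hypothetical: seconds to leave fw2 as master

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

failover_cycle() {
    run ipsecctl -s sa                          # note the SPIs (set 1)
    run ifconfig -g carp carpdemote "$DEMOTE"   # hand master over to fw2
    run sleep "$HOLD"                           # give fw2 time to renegotiate
    run ifconfig -g carp -carpdemote "$DEMOTE"  # take master back on fw1
    run ipsecctl -s sa                          # compare the SPIs with set 1
}

failover_cycle
```

After the cycle, keep the ping and the tcpdump on vlan3 running until the 5 minute phase 2 lifetime expires; in my tests that is the point where fw1 is denied its re-key and the tunnel drops.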
