Re: sasync phase 1 issue

sangdrax8 Sat, 23 Feb 2013 07:11:47 -0800

On Sat, Feb 23, 2013 at 2:14 AM, Todd T. Fries <[email protected]> wrote:
> Penned by sven falempin on 20130222 17:05.33, we have:
> | On Fri, Feb 22, 2013 at 2:29 PM, sangdrax8 <[email protected]> wrote:
> |
> | > I am new to OpenBSD, but would like to take advantage of a redundant
> | > setup with ipsec/carp/sasyncd.  I have run into a situation which seems
> | > to be a bug, and was directed to post to tech with config files.
> | >
> | > I believe my problem is that the phase 1 of an ipsec negotiation is not
> | > being synced with sasyncd, causing a repeatable condition where tunnels
> | > die for extended periods of time.  I have tried the following with all
> | > three machines running 5.1-stable, 5.2-stable, and 5.2-stable with a
> | > snapshot kernel from 2/17/2013.  My main problem exists across all three
> | > setup types.  I am running 5.2 with the snapshot kernel now as it
> | > provides the lifetime setting in ipsec phase 2 to make the testing
> | > faster.
> | >
> | >
> | > ####### Setup Description ######
> | >
> | > 172.16.10.0/24 behind the carp devices on vlan 2
> | > 172.16.20.0/24 the other side of the tunnel no vlan
> | > 1.1.1.0/24 is used for the internet
> | >
> | > vlan 3 is tagged on Fw's, untagged to the lab1 box connected with a
> | > switch between them
> | >
> | > fw boxes use trunk ports as follows
> | > em0 + em1 = trunk0
> | > em2 + em3 = trunk1
> | >
> | >
> | > ####### Setup Drawing ######
> | >
> | >
> | >                     172.16.10.0/24
> | >                 ................
> | >                 .              .   Vlan 2
> | >                 . .3           . .7
> | >            *****.****      ****.*****
> | >            *  fw1   *      *  fw2   *
> | >            *        *      *        *
> | >            *****.****      ****.*****
> | >                 . 1.1.1.2      .  1.1.1.3
> | >                 .              .
> | >                 ................   Vlan 3 to switch
> | >                        . 1.1.1.1
> | >                        .
> | >                        .
> | >                        .
> | >                        .
> | >                        .
> | >                        . 1.1.1.5
> | >                  ******.******
> | >                  *  Lab1     *
> | >                  *           *
> | >                  ******.******
> | >                        .
> | >                     172.16.20.0/24
> | >
> | >
> | >
> | > ###### How to re-create the problem #####
> | >
> | > Bring all machines up, and allow ipsec to come up (ensuring the fw1 is
> | > the master)
> | >
> | > start ping from 172.16.10.0/24 net to 172.16.20.0/24 net
> | >
> | > tcpdump on vlan3 on both fw1 and fw2 (only fw1 should show active esp
> | > traffic).  Note the spi's seen.  this is spi set 1
> | >
> | > carp demote fw1 'ifconfig -g carp carpdemote 128'.
> | >
> | > tcpdump on fw2 should now show the esp (same spi's as before, spi set
> | > 1), and a large increase in sequence numbers
> | >
> | > soon after transfer fw2 will do a full phase1 and phase2 re-negotiation
> | > (can be seen on the tcpdump).  Spi's will change (referring to this as
> | > spi set 2), sequence numbers will reset, and no pings are lost.  This is
> | > where I believe the phase 1 is now renegotiated between fw2 and lab1
> | > because it was not synced from fw1.
> | >
> | > recover fw1 as carp master 'ifconfig -g carp -carpdemote 128'.
> | >
> | > tcpdump on fw1 should now show the esp packets (spi's now from set 2),
> | > and a large increase in sequence numbers
> | >
> | > sometimes soon after transfer fw1 will attempt a phase 2 re-key and be
> | > denied.  even if it doesn't do it quickly, when the phase 2 begins to
> | > time out it will attempt to re-key and be denied at that time.  I have
> | > reduced phase 2 to 5 minutes in my tests to allow this to happen more
> | > quickly.
> | >
> | > when phase 2 times out, the pings through the tunnel fail and the tunnel
> | > is down.
> | >
> | > You can fail back to fw2, and a new phase 2 negotiation will take place
> | > to resume traffic, otherwise fw1 will not be able to re-build the tunnel
> | > until the phase 1 times out (I believe 8 hours default)
> | >
> | > As a note, if you fail a firewall by actually rebooting it, this problem
> | > goes undetected as this clears the SA's.
> | >
> | > I know this is a long e-mail, but I have tried to provide all the
> | > details and configurations that could be needed to re-create this.  I
> | > have been able to consistently re-create this issue every time across
> | > multiple versions.  If there is anything I have left off, please let me
> | > know.
> | >
> | >
> | > #######################################################
> | > ############## Configuration Files Below ##############
> | > #######################################################
> | >
> | >
> | > ##### fw1 configs #####
> | >
> | > ==> sasyncd.conf <==
> | > interface carp3
> | > group carp
> | > peer 172.16.10.7
> | > sharedkey
> | > 0xf04c0d7fada85a2c0f3fec1db4e52e6d6cbd360936b163133df4917566308bd3
> | >
> | >
> | > ==> hostname.carp2 <==
> | > up
> | > inet 172.16.10.1 255.255.255.0 172.16.10.255 vhid 2 pass password
> | > carpdev vlan2
> | >
> | > ==> hostname.carp3 <==
> | > up
> | > inet 1.1.1.1 255.255.255.0 1.1.1.255 vhid 3 pass password carpdev vlan3
> | >
> | > ==> hostname.em0 <==
> | > up
> | >
> | > ==> hostname.em1 <==
> | > up
> | >
> | > ==> hostname.em2 <==
> | > up
> | >
> | > ==> hostname.em3 <==
> | > up
> | >
> | > ==> hostname.enc0 <==
> | > up
> | >
> | > ==> hostname.gif1 <==
> | > create
> | > tunnel 172.16.10.1 172.16.20.1
> | > 10.10.10.1 10.10.20.1 netmask 255.255.255.252
> | > mtu 1426
> | > up
> | > !route add 172.16.20.0/24 10.10.20.1
> | >
> | > ==> hostname.pfsync0 <==
> | > up syncdev vlan2 syncpeer 172.16.10.7
> | >
> | > ==> hostname.trunk0 <==
> | > up
> | > trunkproto failover trunkport em0 trunkport em1
> | >
> | > ==> hostname.trunk1 <==
> | > up
> | > trunkproto failover trunkport em2 trunkport em3
> | >
> | > ==> hostname.vlan2 <==
> | > up
> | > inet 172.16.10.3 255.255.255.0 NONE vlan 2 vlandev trunk0
> | >
> | > ==> hostname.vlan3 <==
> | > up
> | > inet 1.1.1.2 255.255.255.0 NONE vlan 3 vlandev trunk1
> | >
> | > ==> ipsec.conf <==
> | > fw_gw = "1.1.1.1"
> | > fw_gif = "172.16.10.1"
> | > fw_net = "172.16.10.0/24"
> | >
> | > lab_gw = "1.1.1.5"
> | > lab_gif = "172.16.20.1"
> | > lab_net = "172.16.20.0/24"
> | >
> | > ike esp from $fw_gif to $lab_gif \
> | >         local $fw_gw peer $lab_gw \
> | >         main auth hmac-sha1 enc aes-256 group modp1024 \
> | >         quick auth hmac-sha1 enc aes-256 group modp1024 lifetime 5m \
> | >         psk "password"
> | >
> | >
> | >
> | >
> | > ##### fw2 configs #####
> | >
> | > ==> sasyncd.conf <==
> | > interface carp3
> | > group carp
> | > peer 172.16.10.3
> | > sharedkey
> | > 0xf04c0d7fada85a2c0f3fec1db4e52e6d6cbd360936b163133df4917566308bd3
> | >
> | > ==> hostname.carp2 <==
> | > up
> | > inet 172.16.10.1 255.255.255.0 172.16.10.255 vhid 2 pass password
> | > carpdev vlan2 advskew 128
> | >
> | > ==> hostname.carp3 <==
> | > up
> | > inet 1.1.1.1 255.255.255.0 1.1.1.255 vhid 3 pass password carpdev vlan3
> | > advskew 128
> | >
> | > ==> hostname.em0 <==
> | > up
> | >
> | > ==> hostname.em1 <==
> | > up
> | >
> | > ==> hostname.em2 <==
> | > up
> | >
> | > ==> hostname.em3 <==
> | > up
> | >
> | > ==> hostname.enc0 <==
> | > up
> | >
> | > ==> hostname.gif1 <==
> | > create
> | > tunnel 172.16.10.1 172.16.20.1
> | > 10.10.10.1 10.10.20.1 netmask 255.255.255.252
> | > mtu 1426
> | > up
> | > !route add 172.16.20.0/24 10.10.20.1
> | >
> | > ==> hostname.pfsync0 <==
> | > up syncdev vlan2 syncpeer 172.16.10.3
> | >
> | > ==> hostname.trunk0 <==
> | > up
> | > trunkproto failover trunkport em0 trunkport em1
> | >
> | > ==> hostname.trunk1 <==
> | > up
> | > trunkproto failover trunkport em2 trunkport em3
> | >
> | > ==> hostname.vlan2 <==
> | > up
> | > inet 172.16.10.7 255.255.255.0 NONE vlan 2 vlandev trunk0
> | >
> | > ==> hostname.vlan3 <==
> | > up
> | > inet 1.1.1.3 255.255.255.0 NONE vlan 3 vlandev trunk1
> | >
> | > ==> ipsec.conf <==
> | > fw_gw = "1.1.1.1"
> | > fw_gif = "172.16.10.1"
> | > fw_net = "172.16.10.0/24"
> | >
> | > lab_gw = "1.1.1.5"
> | > lab_gif = "172.16.20.1"
> | > lab_net = "172.16.20.0/24"
> | >
> | > ike esp from $fw_gif to $lab_gif \
> | >         local $fw_gw peer $lab_gw \
> | >         main auth hmac-sha1 enc aes-256 group modp1024 \
> | >         quick auth hmac-sha1 enc aes-256 group modp1024 lifetime 5m \
> | >         psk "password"
> | >
> | > ###### lab1 ######
> | >
> | > ==> hostname.em0 <==
> | > up
> | > inet 1.1.1.5 255.255.255.0
> | >
> | > ==> hostname.em2 <==
> | > up
> | > inet 172.16.20.1 255.255.255.0
> | >
> | > ==> hostname.enc0 <==
> | > up
> | >
> | > ==> hostname.gif0 <==
> | > create
> | > tunnel 172.16.20.1 172.16.10.1
> | > 10.10.20.1 10.10.10.1 netmask 255.255.255.252
> | > mtu 1426
> | > up
> | > !route add 172.16.10.0/24 10.10.10.1
> | >
> | So fw1 is not ready if you manually turn it off, but comes back if you
> | reboot it!
> |
> | I guess a MASTER that fails needs maintenance ;-)
> |
> | Maybe it is a misbehavior, but does it actually happen in real use
> | scenarios?
>
> Yes.
>
> --
> Todd Fries .. [email protected]
>
>  ____________________________________________
> |                                            \  1.636.410.0632 (voice)
> | Free Daemon Consulting, LLC                \  1.405.227.9094 (voice)
> | http://FreeDaemonConsulting.com            \  1.866.792.3418 (FAX)
> | PO Box 16169, Oklahoma City, OK 73113      \  sip:[email protected]
> | "..in support of free software solutions." \  sip:[email protected]
>  \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\
>
>               37E7 D3EB 74D0 8D66 A68D  B866 0326 204E 3F42 004A
>                         http://todd.fries.net/pgp.txt
>



Yes, this is a real-world scenario.  The only thing required for this
to happen is the backup firewall taking over for any reason while
the primary is still powered on (i.e. it did not lose its phase 1).

1) you use carp demote for any reason

2) a switch reboots and the backup firewall takes over for a few
minutes until the primary is once again available

3) a cable is loose, and while replacing it the backup firewall takes over

There is even a case where nothing goes wrong and this still happens.
If both boxes are booting up for the first time, one of them will come
up first.  If this happens to be fw2, it will be master until fw1
finishes booting.  It will negotiate phase 1 and 2, and bring up the
tunnel.  Once fw1 finishes booting, it will become the master and will
re-do the phase 1.

So for the next 8 hours (the lifetime of phase 1), your backup will be
in this state and you do not have redundant devices.  If fw1 dies for
any reason, fw2 will have an old but still valid phase 1 and your
tunnel is down.
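
The sequence above can be reasoned about with a small toy model in
Python.  This is only a sketch under stated assumptions, not how
isakmpd or sasyncd are implemented: the names Firewall, Peer, take_over
and reboot are made up for illustration, and the model assumes that
(1) phase 1 state is never synced between the firewalls, (2) the remote
end only honors the most recently negotiated phase 1, and (3) a
firewall with no phase 1 negotiates a fresh one when it becomes master.

```python
import itertools
from dataclasses import dataclass
from typing import Optional

# Hypothetical counter standing in for distinct phase 1 negotiations.
_phase1_ids = itertools.count(1)


@dataclass
class Peer:
    """Remote gateway (lab1). Assumed to honor only the newest phase 1."""
    phase1: Optional[int] = None


@dataclass
class Firewall:
    """One CARP member. Its phase 1 is local state and is never synced."""
    name: str
    phase1: Optional[int] = None


def take_over(fw: Firewall, peer: Peer) -> bool:
    """fw becomes CARP master and must re-key phase 2.

    Returns True if the re-key is accepted (tunnel stays up).
    """
    if fw.phase1 is None:
        # No local phase 1: negotiate a new one, replacing the peer's.
        fw.phase1 = next(_phase1_ids)
        peer.phase1 = fw.phase1
        return True
    # A local phase 1 exists: the re-key works only if the peer still honors it.
    return fw.phase1 == peer.phase1


def reboot(fw: Firewall) -> None:
    """A reboot clears the SAs, including the local phase 1."""
    fw.phase1 = None


fw1, fw2, lab1 = Firewall("fw1"), Firewall("fw2"), Peer()
results = [
    ("fw1 boots as master", take_over(fw1, lab1)),
    ("fw1 demoted, fw2 takes over", take_over(fw2, lab1)),
    ("fw1 recovers as master", take_over(fw1, lab1)),
]
reboot(fw1)
results.append(("fw1 rebooted, takes over", take_over(fw1, lab1)))
results.append(("fw1 dies, fw2 takes over", take_over(fw2, lab1)))

for step, ok in results:
    print(f"{step}: {'tunnel up' if ok else 'phase 2 re-key denied'}")
```

In this model the third and fifth steps are denied: whichever box did
not negotiate the most recent phase 1 is left holding a stale one, and
only a reboot (or the phase 1 lifetime expiring) gets it out of that
state.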
