Re: sasync phase 1 issue

sangdrax8 Mon, 25 Feb 2013 11:41:44 -0800

On Sat, Feb 23, 2013 at 11:14 AM, sven falempin <[email protected]> wrote:
>
>
> On Sat, Feb 23, 2013 at 10:10 AM, sangdrax8 <[email protected]> wrote:
>>
>> On Sat, Feb 23, 2013 at 2:14 AM, Todd T. Fries <[email protected]> wrote:
>> > Penned by sven falempin on 20130222 17:05.33, we have:
>> > | On Fri, Feb 22, 2013 at 2:29 PM, sangdrax8 <[email protected]> wrote:
>> > |
>> > | > I am new to OpenBSD, but would like to take advantage of a redundant
>> > | > setup with ipsec/carp/sasyncd.  I have run into a situation which
>> > | > seems to be a bug, and was directed to post to tech with config
>> > | > files.
>> > | >
>> > | > I believe my problem is that the phase 1 of an ipsec negotiation is
>> > | > not being synced with sasyncd, causing a repeatable condition where
>> > | > tunnels die for extended periods of time.  I have tried the following
>> > | > with all three machines running 5.1-stable, 5.2-stable, and
>> > | > 5.2-stable with a snapshot kernel from 2/17/2013.  My main problem
>> > | > exists across all three setup types.  I am running 5.2 with the
>> > | > snapshot kernel now as it provides the lifetime setting in ipsec
>> > | > phase 2 to make the testing faster.
>> > | >
>> > | >
>> > | > ####### Setup Description ######
>> > | >
>> > | > 172.16.10.0/24 behind the carp devices on vlan 2
>> > | > 172.16.20.0/24 the other side of the tunnel no vlan
>> > | > 1.1.1.0/24 is used for the internet
>> > | >
>> > | > vlan 3 is tagged on Fw's, untagged to the lab1 box connected with a
>> > | > switch between them
>> > | >
>> > | > fw boxes use trunk ports as follows
>> > | > em0 + em1 = trunk0
>> > | > em2 + em3 = trunk1
>> > | >
>> > | >
>> > | > ####### Setup Drawing ######
>> > | >
>> > | >
>> > | >                     172.16.10.0/24
>> > | >                 ................
>> > | >                 .              .   Vlan 2
>> > | >                 . .3           . .7
>> > | >            *****.****      ****.*****
>> > | >            *  fw1   *      *  fw2   *
>> > | >            *        *      *        *
>> > | >            *****.****      ****.*****
>> > | >                 . 1.1.1.2      .  1.1.1.3
>> > | >                 .              .
>> > | >                 ................   Vlan 3 to switch
>> > | >                        . 1.1.1.1
>> > | >                        .
>> > | >                        .
>> > | >                        .
>> > | >                        .
>> > | >                        .
>> > | >                        . 1.1.1.5
>> > | >                  ******.******
>> > | >                  *  Lab1     *
>> > | >                  *           *
>> > | >                  ******.******
>> > | >                        .
>> > | >                     172.16.20.0/24
>> > | >
>> > | >
>> > | >
>> > | > ###### How to re-create the problem #####
>> > | >
>> > | > Bring all machines up, and allow ipsec to come up (ensuring the fw1
>> > | > is the master)
>> > | >
>> > | > start ping from 172.16.10.0/24 net to 172.16.20.0/24 net
>> > | >
>> > | > tcpdump on vlan3 on both fw1 and fw2 (only fw1 should show active
>> > | > esp traffic).  Note the spi's seen.  this is spi set 1
>> > | >
>> > | > carp demote fw1 'ifconfig -g carp carpdemote 128'.
>> > | >
>> > | > tcpdump on fw2 should now show the esp (same spi's as before, spi
>> > | > set 1), and a large increase in sequence numbers
>> > | >
>> > | > soon after transfer fw2 will do a full phase1 and phase2
>> > | > re-negotiation (can be seen on the tcpdump).  Spi's will change
>> > | > (referring to this as spi set 2), sequence numbers will reset, and
>> > | > no pings are lost.  This is where I believe the phase 1 is now
>> > | > renegotiated between fw2 and lab1 because it was not synced from
>> > | > fw1.
>> > | >
>> > | > recover fw1 as carp master 'ifconfig -g carp -carpdemote 128'.
>> > | >
>> > | > tcpdump on fw1 should now show the esp packets (spi's now from set
>> > | > 2), and a large increase in sequence numbers
>> > | >
>> > | > sometimes soon after transfer fw1 will attempt a phase 2 re-key and
>> > | > be denied.  even if it doesn't do it quickly, when the phase 2
>> > | > begins to time out it will attempt to re-key and be denied at that
>> > | > time.  I have reduced phase 2 to 5 minutes in my tests to allow this
>> > | > to happen more quickly.
>> > | >
>> > | > when phase 2 times out, the pings through the tunnel fail and the
>> > | > tunnel is down.
>> > | >
>> > | > You can fail back to fw2, and a new phase 2 negotiation will take
>> > | > place to resume traffic, otherwise fw1 will not be able to re-build
>> > | > the tunnel until the phase 1 times out (I believe 8 hours default)
>> > | >
>> > | > As a note, if you fail a firewall by actually rebooting it, this
>> > | > problem goes undetected as this clears the SA's.
>> > | >
>> > | > I know this is a long e-mail, but I have tried to provide all the
>> > | > details and configurations that could be needed to re-create this.
>> > | > I have been able to consistently re-create this issue every time
>> > | > across multiple versions.  If there is anything I have left off,
>> > | > please let me know.
>> > | >
>> > | >
>> > | > #######################################################
>> > | > ############## Configuration Files Below ##############
>> > | > #######################################################
>> > | >
>> > | >
>> > | > ##### fw1 configs #####
>> > | >
>> > | > ==> sasyncd.conf <==
>> > | > interface carp3
>> > | > group carp
>> > | > peer 172.16.10.7
>> > | > sharedkey 0xf04c0d7fada85a2c0f3fec1db4e52e6d6cbd360936b163133df4917566308bd3
>> > | >
>> > | >
>> > | > ==> hostname.carp2 <==
>> > | > up
>> > | > inet 172.16.10.1 255.255.255.0 172.16.10.255 vhid 2 pass password carpdev vlan2
>> > | >
>> > | > ==> hostname.carp3 <==
>> > | > up
>> > | > inet 1.1.1.1 255.255.255.0 1.1.1.255 vhid 3 pass password carpdev vlan3
>> > | >
>> > | > ==> hostname.em0 <==
>> > | > up
>> > | >
>> > | > ==> hostname.em1 <==
>> > | > up
>> > | >
>> > | > ==> hostname.em2 <==
>> > | > up
>> > | >
>> > | > ==> hostname.em3 <==
>> > | > up
>> > | >
>> > | > ==> hostname.enc0 <==
>> > | > up
>> > | >
>> > | > ==> hostname.gif1 <==
>> > | > create
>> > | > tunnel 172.16.10.1 172.16.20.1
>> > | > 10.10.10.1 10.10.20.1 netmask 255.255.255.252
>> > | > mtu 1426
>> > | > up
>> > | > !route add 172.16.20.0/24 10.10.20.1
>> > | >
>> > | > ==> hostname.pfsync0 <==
>> > | > up syncdev vlan2 syncpeer 172.16.10.7
>> > | >
>> > | > ==> hostname.trunk0 <==
>> > | > up
>> > | > trunkproto failover trunkport em0 trunkport em1
>> > | >
>> > | > ==> hostname.trunk1 <==
>> > | > up
>> > | > trunkproto failover trunkport em2 trunkport em3
>> > | >
>> > | > ==> hostname.vlan2 <==
>> > | > up
>> > | > inet 172.16.10.3 255.255.255.0 NONE vlan 2 vlandev trunk0
>> > | >
>> > | > ==> hostname.vlan3 <==
>> > | > up
>> > | > inet 1.1.1.2 255.255.255.0 NONE vlan 3 vlandev trunk1
>> > | >
>> > | > ==> ipsec.conf <==
>> > | > fw_gw = "1.1.1.1"
>> > | > fw_gif = "172.16.10.1"
>> > | > fw_net = "172.16.10.0/24"
>> > | >
>> > | > lab_gw = "1.1.1.5"
>> > | > lab_gif = "172.16.20.1"
>> > | > lab_net = "172.16.20.0/24"
>> > | >
>> > | > ike esp from $fw_gif to $lab_gif \
>> > | >         local $fw_gw peer $lab_gw \
>> > | >         main auth hmac-sha1 enc aes-256 group modp1024 \
>> > | >         quick auth hmac-sha1 enc aes-256 group modp1024 lifetime 5m \
>> > | >         psk "password"
>> > | >
>> > | >
>> > | >
>> > | >
>> > | > ##### fw2 configs #####
>> > | >
>> > | > ==> sasyncd.conf <==
>> > | > interface carp3
>> > | > group carp
>> > | > peer 172.16.10.3
>> > | > sharedkey 0xf04c0d7fada85a2c0f3fec1db4e52e6d6cbd360936b163133df4917566308bd3
>> > | >
>> > | > ==> hostname.carp2 <==
>> > | > up
>> > | > inet 172.16.10.1 255.255.255.0 172.16.10.255 vhid 2 pass password carpdev vlan2 advskew 128
>> > | >
>> > | > ==> hostname.carp3 <==
>> > | > up
>> > | > inet 1.1.1.1 255.255.255.0 1.1.1.255 vhid 3 pass password carpdev vlan3 advskew 128
>> > | >
>> > | > ==> hostname.em0 <==
>> > | > up
>> > | >
>> > | > ==> hostname.em1 <==
>> > | > up
>> > | >
>> > | > ==> hostname.em2 <==
>> > | > up
>> > | >
>> > | > ==> hostname.em3 <==
>> > | > up
>> > | >
>> > | > ==> hostname.enc0 <==
>> > | > up
>> > | >
>> > | > ==> hostname.gif1 <==
>> > | > create
>> > | > tunnel 172.16.10.1 172.16.20.1
>> > | > 10.10.10.1 10.10.20.1 netmask 255.255.255.252
>> > | > mtu 1426
>> > | > up
>> > | > !route add 172.16.20.0/24 10.10.20.1
>> > | >
>> > | > ==> hostname.pfsync0 <==
>> > | > up syncdev vlan2 syncpeer 172.16.10.3
>> > | >
>> > | > ==> hostname.trunk0 <==
>> > | > up
>> > | > trunkproto failover trunkport em0 trunkport em1
>> > | >
>> > | > ==> hostname.trunk1 <==
>> > | > up
>> > | > trunkproto failover trunkport em2 trunkport em3
>> > | >
>> > | > ==> hostname.vlan2 <==
>> > | > up
>> > | > inet 172.16.10.7 255.255.255.0 NONE vlan 2 vlandev trunk0
>> > | >
>> > | > ==> hostname.vlan3 <==
>> > | > up
>> > | > inet 1.1.1.3 255.255.255.0 NONE vlan 3 vlandev trunk1
>> > | >
>> > | > ==> ipsec.conf <==
>> > | > fw_gw = "1.1.1.1"
>> > | > fw_gif = "172.16.10.1"
>> > | > fw_net = "172.16.10.0/24"
>> > | >
>> > | > lab_gw = "1.1.1.5"
>> > | > lab_gif = "172.16.20.1"
>> > | > lab_net = "172.16.20.0/24"
>> > | >
>> > | > ike esp from $fw_gif to $lab_gif \
>> > | >         local $fw_gw peer $lab_gw \
>> > | >         main auth hmac-sha1 enc aes-256 group modp1024 \
>> > | >         quick auth hmac-sha1 enc aes-256 group modp1024 lifetime 5m \
>> > | >         psk "password"
>> > | >
>> > | > ###### lab1 ######
>> > | >
>> > | > ==> hostname.em0 <==
>> > | > up
>> > | > inet 1.1.1.5 255.255.255.0
>> > | >
>> > | > ==> hostname.em2 <==
>> > | > up
>> > | > inet 172.16.20.1 255.255.255.0
>> > | >
>> > | > ==> hostname.enc0 <==
>> > | > up
>> > | >
>> > | > ==> hostname.gif0 <==
>> > | > create
>> > | > tunnel 172.16.20.1 172.16.10.1
>> > | > 10.10.20.1 10.10.10.1 netmask 255.255.255.252
>> > | > mtu 1426
>> > | > up
>> > | > !route add 172.16.10.0/24 10.10.10.1
>> > | >
>> > | So fw1 is not ready if you manually turn it off, but comes back if you
>> > | reboot it!
>> > |
>> > | I guess a MASTER that fails needs maintenance ;-)
>> > |
>> > | Maybe it is a misbehavior, but does it actually happen in real-use
>> > | scenarios?
>> >
>> > Yes.
>> >
>> > --
>> > Todd Fries .. [email protected]
>> >
>> >  ____________________________________________
>> > |                                            \  1.636.410.0632 (voice)
>> > | Free Daemon Consulting, LLC                \  1.405.227.9094 (voice)
>> > | http://FreeDaemonConsulting.com            \  1.866.792.3418 (FAX)
>> > | PO Box 16169, Oklahoma City, OK 73113      \  sip:[email protected]
>> > | "..in support of free software solutions." \  sip:[email protected]
>> >  \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\
>> >
>> >               37E7 D3EB 74D0 8D66 A68D  B866 0326 204E 3F42 004A
>> >                         http://todd.fries.net/pgp.txt
>> >
>>
>>
>> Yes, this is a real-world scenario.  The only thing required for this
>> to happen is the backup firewall taking over for any reason while
>> the primary is still powered on (i.e. it didn't lose the phase 1)
>>
>> 1) you use the carp demote because... <any reason>
>>
>> 2) a switch reboots and the backup firewall takes over for a few
>> minutes until the primary is once again available
>>
>> 3) a cable is loose and while replacing it the backup firewall takes over
>>
>> There is even a case where nothing goes wrong and this happens.  If
>> both boxes are booting up for the first time one of them will come up
>> first.  If this happens to be fw2, it will be master until fw1 finishes.
>> It will negotiate phase 1 and 2, and bring up the tunnel. Once fw1
>> finishes it will become the master, and will re-do the phase1.
>>
>> So for the next 8 hours (lifetime of phase1) your backup will be in this
>> state and you do not have redundant devices.  If fw1 dies for any reason
>> then fw2 will have an old valid phase 1 and your tunnel is down.
>>
>
> number 2 is convincing
>
>
> --
> ---------------------------------------------------------------------------------------------------------------------
> () ascii ribbon campaign - against html e-mail
> /\



Any chance someone has the time/knowledge to squash this bug? I
would like to deploy some syncing firewall/vpn devices, but as things
stand I can't put this setup into my production environment.
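
For anyone reproducing this, the "note the spi's seen" step from the
original report could be automated roughly as below. This is only a
sketch under assumptions: it expects each captured ESP line to contain
"spi 0x<hex>" or "spi=0x<hex>" (adjust the pattern to whatever your
tcpdump actually prints), and the function names spi_set and
renegotiated are made up for illustration.

```python
import re

# Assumption: every ESP line carries the SPI as "spi 0x<hex>" or
# "spi=0x<hex>". Change this pattern if your tcpdump output differs.
SPI_RE = re.compile(r"spi[ =](0x[0-9a-fA-F]+)")


def spi_set(lines):
    """Return the set of SPIs (as integers) found in captured lines."""
    found = set()
    for line in lines:
        match = SPI_RE.search(line)
        if match:
            found.add(int(match.group(1), 16))
    return found


def renegotiated(before, after):
    """True if the capture taken after failover shows SPIs that were
    not present before it, i.e. new SAs were negotiated rather than
    the synced ones being reused."""
    return bool(spi_set(after) - spi_set(before))
```

Feed it the tcpdump lines saved before and after the carp demote:
False means the new master is still using the synced SAs (spi set 1),
True means a fresh negotiation produced a new set (spi set 2).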
