Thanks Martin
I checked the configuration settings - in fact they were
keylife=20m
rekeymargin=3m
rekeyfuzz=100%
As far as I can see this means that rekey event will occur between 14 and 17
minutes. I guess over the course of a day this increases the likelihood of a
collision.
Here are the log entries for the 2 sites for the single second where all the
action occurred. The outcome is that we cannot ping through the tunnel.
If you cant see any problem in the log, I'll gather some more data next time it
happens. Is there a log level which I should set to get better information?
Regards
Steve
Site 1
Jul 28 10:31:48 zsitegate-01 charon: 02[KNL] creating rekey job for ESP
CHILD_SA with SPI cfb451b5 and reqid {1}
Jul 28 10:31:48 zsitegate-01 charon: 02[IKE] establishing CHILD_SA
0000-0129-private-to-0000-0121-private{1}
Jul 28 10:31:48 zsitegate-01 charon: 02[ENC] generating CREATE_CHILD_SA request
16 [ N(REKEY_SA) SA No TSi TSr ]
Jul 28 10:31:48 zsitegate-01 charon: 02[NET] sending packet: from
10.1.41.177[4500] to 212.126.133.206[4500] (348 bytes)
Jul 28 10:31:48 zsitegate-01 charon: 16[NET] received packet: from
212.126.133.206[4500] to 10.1.41.177[4500] (348 bytes)
Jul 28 10:31:48 zsitegate-01 charon: 16[ENC] parsed CREATE_CHILD_SA request 23
[ N(REKEY_SA) SA No TSi TSr ]
Jul 28 10:31:48 zsitegate-01 charon: 16[IKE] CHILD_SA
0000-0129-private-to-0000-0121-private{1} established with SPIs c7ba6228_i
c70845f3_o and TS 10.1.41.128/25 === 10.1.33.128/25
Jul 28 10:31:48 zsitegate-01 charon: 16[IKE] detected CHILD_REKEY collision
with CHILD_REKEY
Jul 28 10:31:48 zsitegate-01 charon: 16[ENC] generating CREATE_CHILD_SA
response 23 [ SA No TSi TSr ]
Jul 28 10:31:48 zsitegate-01 charon: 16[NET] sending packet: from
10.1.41.177[4500] to 212.126.133.206[4500] (204 bytes)
Jul 28 10:31:48 zsitegate-01 charon: 15[NET] received packet: from
212.126.133.206[4500] to 10.1.41.177[4500] (204 bytes)
Jul 28 10:31:48 zsitegate-01 charon: 15[ENC] parsed CREATE_CHILD_SA response 16
[ SA No TSi TSr ]
Jul 28 10:31:48 zsitegate-01 charon: 15[IKE] CHILD_SA
0000-0129-private-to-0000-0121-private{1} established with SPIs c738e8d4_i
cea4e8aa_o and TS 10.1.41.128/25 === 10.1.33.128/25
Jul 28 10:31:48 zsitegate-01 charon: 15[IKE] CHILD_SA rekey collision lost,
deleting rekeyed child
Jul 28 10:31:48 zsitegate-01 charon: 15[IKE] closing CHILD_SA
0000-0129-private-to-0000-0121-private{1} with SPIs c738e8d4_i (0 bytes)
cea4e8aa_o (0 bytes) and TS 10.1.41.128/25 === 10.1.33.128/25 c7ba6228
Jul 28 10:31:48 zsitegate-01 charon: 15[IKE] sending DELETE for ESP CHILD_SA
with SPI c738e8d4
Jul 28 10:31:48 zsitegate-01 charon: 15[ENC] generating INFORMATIONAL request
17 [ D ]
Jul 28 10:31:48 zsitegate-01 charon: 15[NET] sending packet: from
10.1.41.177[4500] to 212.126.133.206[4500] (76 bytes)
Jul 28 10:31:48 zsitegate-01 charon: 13[NET] received packet: from
212.126.133.206[4500] to 10.1.41.177[4500] (76 bytes)
Jul 28 10:31:48 zsitegate-01 charon: 13[ENC] parsed INFORMATIONAL request 24 [
D ]
Jul 28 10:31:48 zsitegate-01 charon: 13[IKE] received DELETE for ESP CHILD_SA
with SPI cc55cd29
Jul 28 10:31:48 zsitegate-01 charon: 13[IKE] closing CHILD_SA
0000-0129-private-to-0000-0121-private{1} with SPIs cfb451b5_i (191384 bytes)
cc55cd29_o (215176 bytes) and TS 10.1.41.128/25 === 10.1.33.128/25
Jul 28 10:31:48 zsitegate-01 charon: 13[IKE] sending DELETE for ESP CHILD_SA
with SPI cfb451b5
Jul 28 10:31:48 zsitegate-01 charon: 13[IKE] CHILD_SA closed
Jul 28 10:31:48 zsitegate-01 charon: 13[IKE] detected CHILD_REKEY collision
with CHILD_DELETE
Jul 28 10:31:48 zsitegate-01 charon: 13[ENC] generating INFORMATIONAL response
24 [ D ]
Jul 28 10:31:48 zsitegate-01 charon: 13[NET] sending packet: from
10.1.41.177[4500] to 212.126.133.206[4500] (76 bytes)
Jul 28 10:31:48 zsitegate-01 charon: 01[NET] received packet: from
212.126.133.206[4500] to 10.1.41.177[4500] (76 bytes)
Jul 28 10:31:48 zsitegate-01 charon: 01[ENC] parsed INFORMATIONAL response
c7ba622817 [ D ]
Jul 28 10:31:48 zsitegate-01 charon: 01[IKE] received DELETE for ESP CHILD_SA
with SPI cea4e8aa
Jul 28 10:31:48 zsitegate-01 charon: 01[IKE] CHILD_SA closed
Jul 28 10:31:48 zsitegate-01 vpn: - CN=intersite-customer-0000-0121, OU=HAP
Customers 10.1.33.128/25 == 212.126.133.206 -- 10.1.41.177 == 10.1.41.128/25
Site2
Jul 28 10:31:48 zsitegate-00 charon: 06[KNL] creating rekey job for ESP
CHILD_SA with SPI cfb451b5 and reqid {1}
Jul 28 10:31:48 zsitegate-00 charon: 06[IKE] establishing CHILD_SA
0000-0121-private-to-0000-0129-private{1}
Jul 28 10:31:48 zsitegate-00 charon: 06[ENC] generating CREATE_CHILD_SA request
23 [ N(REKEY_SA) SA No TSi TSr ]
Jul 28 10:31:48 zsitegate-00 charon: 06[NET] sending packet: from
10.1.33.177[4500] to 85.13.227.242[4500] (348 bytes)
Jul 28 10:31:48 zsitegate-00 charon: 14[NET] received packet: from
85.13.227.242[4500] to 10.1.33.177[4500] (348 bytes)
Jul 28 10:31:48 zsitegate-00 charon: 14[ENC] parsed CREATE_CHILD_SA request 16
[ N(REKEY_SA) SA No TSi TSr ]
Jul 28 10:31:48 zsitegate-00 charon: 14[IKE] CHILD_SA
0000-0121-private-to-0000-0129-private{1} established with SPIs cea4e8aa_i
c738e8d4_o and TS 10.1.33.128/25 === 10.1.41.128/25
Jul 28 10:31:48 zsitegate-00 charon: 14[IKE] detected CHILD_REKEY collision
with CHILD_REKEY
Jul 28 10:31:48 zsitegate-00 charon: 14[ENC] generating CREATE_CHILD_SA
response 16 [ SA No TSi TSr ]
Jul 28 10:31:48 zsitegate-00 charon: 14[NET] sending packet: from
10.1.33.177[4500] to 85.13.227.242[4500] (204 bytes)
Jul 28 10:31:48 zsitegate-00 charon: 13[NET] received packet: from
85.13.227.242[4500] to 10.1.33.177[4500] (204 bytes)
Jul 28 10:31:48 zsitegate-00 charon: 13[ENC] parsed CREATE_CHILD_SA response 23
[ SA No TSi TSr ]
Jul 28 10:31:48 zsitegate-00 charon: 13[IKE] CHILD_SA
0000-0121-private-to-0000-0129-private{1} established with SPIs c70845f3_i
c7ba6228_o and TS 10.1.33.128/25 === 10.1.41.128/25
Jul 28 10:31:48 zsitegate-00 charon: 13[IKE] CHILD_SA rekey collision won,
deleting old child
Jul 28 10:31:48 zsitegate-00 charon: 13[IKE] closing CHILD_SA
0000-0121-private-to-0000-0129-private{1} with SPIs cc55cd29_i (213705 bytes)
cfb451b5_o (193578 bytes) and TS 10.1.33.128/25 === 10.1.41.128/25
Jul 28 10:31:48 zsitegate-00 charon: 13[IKE] sending DELETE for ESP CHILD_SA
with SPI cc55cd29
Jul 28 10:31:48 zsitegate-00 charon: 13[ENC] generating INFORMATIONAL request
24 [ D ]
Jul 28 10:31:48 zsitegate-00 charon: 13[NET] sending packet: from
10.1.33.177[4500] to 85.13.227.242[4500] (76 bytes)
Jul 28 10:31:48 zsitegate-00 charon: 01[NET] received packet: from
85.13.227.242[4500] to 10.1.33.177[4500] (76 bytes)
Jul 28 10:31:48 zsitegate-00 charon: 01[ENC] parsed INFORMATIONAL request 17 [
D ]
Jul 28 10:31:48 zsitegate-00 charon: 01[IKE] received DELETE for ESP CHILD_SA
with SPI c738e8d4
Jul 28 10:31:48 zsitegate-00 charon: 01[IKE] closing CHILD_SA
0000-0121-private-to-0000-0129-private{1} with SPIs cea4e8aa_i (0 bytes)
c738e8d4_o (0 bytes) and TS 10.1.33.128/25 === 10.1.41.128/25
Jul 28 10:31:48 zsitegate-00 charon: 01[IKE] sending DELETE for ESP CHILD_SA
with SPI cea4e8aa
Jul 28 10:31:48 zsitegate-00 charon: 01[IKE] CHILD_SA closed
Jul 28 10:31:48 zsitegate-00 vpn: - CN=intersite-customer-0000-0121, OU=HAP
Customers 10.1.41.128/25 == 85.13.227.242 -- 10.1.33.177 == 10.1.33.128/25
Jul 28 10:31:48 zsitegate-00 charon: 01[ENC] generating INFORMATIONAL response
17 [ D ]
Jul 28 10:31:48 zsitegate-00 charon: 01[NET] sending packet: from
10.1.33.177[4500] to 85.13.227.242[4500] (76 bytes)
Jul 28 10:31:48 zsitegate-00 charon: 02[NET] received packet: from
85.13.227.242[4500] to 10.1.33.177[4500] (76 bytes)
Jul 28 10:31:48 zsitegate-00 charon: 02[ENC] parsed INFORMATIONAL response 24 [
D ]
Jul 28 10:31:48 zsitegate-00 charon: 02[IKE] received DELETE for ESP CHILD_SA
with SPI cfb451b5
Jul 28 10:31:48 zsitegate-00 charon: 02[IKE] CHILD_SA closed
________________________________________
From: Martin Willi <[email protected]>
Sent: 29 July 2014 13:33
To: Steve Lee
Cc: [email protected]
Subject: Re: [strongSwan] Rekey Collisions
Hi Steve,
> one or more tunnels seem to stop working
What does that exactly mean? What IKE version are you using?
> After some investigation, it seemed that these coincide with a rekey
> collision where both sides create a rekey jobs at (to the nearest
> second) the same time. When this happens I dont see any specific errors
> in the logs.
If charon detects rekey collisions, it should log that fact. But there
are many different collision scenarios; an excerpt from your log could
certainly help to analyze the issue.
> 1. Why do we keep seeing the collisions, surely the rekeyfuzz would
> make this pretty unlikely or does the way the host were built and/or
> time sync affect the randomness of rekeyfuzz?
Your system time should not have any effect; on most systems charon does
not use the system time anyway to schedule such events.
keylife=20m
rekeymargin=3m
keyingtries=%forever
rekeyfuzz=100%
With your rekeymargin of 9m and 100% fuzz, collisions should be in fact
be very (very) rare. If this is reproducible, something is seriously
wrong.
For these non-cryptographic operations, charon relies on a getpid() +
time() initialized random() calls. Not sure how your Hypervisor handles
that?
> 2. When we get a collision why dont we see an error and why doesnt it
> retry given the keyingtries parameter?
keyingtries has no effect when handling rekey collisions. I think with
5.1.1 these collisions should be handled properly.
> 3. Is it recommended that only one side should do rekeying (i.e. set
> rekey=no on the other)?
Usually it is not required, as with a sane configuration collisions are
unlikely, and even if they happen should be handled gracefully, at least
between two strongSwan hosts.
Regards
Martin
_______________________________________________
Users mailing list
[email protected]
https://lists.strongswan.org/mailman/listinfo/users