Forgot to note the configuration in this thread as well:

The configuration looks like follows:

[root@ams05-004 127.0.0.1-2014-09-17-21:36:11]# cat
/etc/strongswan/ipsec.conf
config setup
        uniqueids=no

conn %default
        ikelifetime=180m
        keylife=90m
        rekeymargin=5m
        keyingtries=%forever
        keyexchange=ikev2
        mobike=no
        reauth=no
        dpdaction=restart
        closeaction=restart

include /etc/strongswan/ipsec.d/*.conf

One of the connection file looks like this:
conn tch017
        left=5.45.X.Y
        right=5.45.Z.W
        authby=secret
        type=transport
        auto=route


Also, looking at one of the affected node on the left side of the
tunnel(tch019 == 5.45.C.D host), the charon log is flooded by similar
messages (only one particular connection grepped out):

Sep 18 11:25:19 08[IKE] <jsc065|31627> received DELETE for ESP CHILD_SA
with SPI c87d949a
Sep 18 11:25:19 08[IKE] <jsc065|31627> CHILD_SA closed
Sep 18 11:25:19 08[IKE] <jsc065|31627> establishing CHILD_SA jsc065
Sep 18 11:25:19 08[ENC] <jsc065|31627> generating CREATE_CHILD_SA
request 23257 [ N(USE_TRANSP) SA No TSi TSr ]
Sep 18 11:25:19 08[NET] <jsc065|31627> sending packet: from
5.45.C.D[500] to 5.45.A.X[500] (348 bytes)
Sep 18 11:25:19 04[NET] <jsc065|31627> received packet: from
5.45.A.X[500] to 5.45.C.D[500] (348 bytes)
Sep 18 11:25:19 04[ENC] <jsc065|31627> parsed CREATE_CHILD_SA request
23216 [ N(USE_TRANSP) SA No TSi TSr ]
Sep 18 11:25:19 04[IKE] <jsc065|31627> CHILD_SA jsc065{65} established
with SPIs c7d624c8_i c48ca083_o and TS 5.45.C.D/32 === 5.45.A.X/32
Sep 18 11:25:19 04[ENC] <jsc065|31627> generating CREATE_CHILD_SA
response 23216 [ N(USE_TRANSP) SA No TSi TSr ]
Sep 18 11:25:19 04[NET] <jsc065|31627> sending packet: from
5.45.C.D[500] to 5.45.A.X[500] (204 bytes)
Sep 18 11:25:19 10[NET] <jsc065|31627> received packet: from
5.45.A.X[500] to 5.45.C.D[500] (204 bytes)
Sep 18 11:25:19 10[ENC] <jsc065|31627> parsed CREATE_CHILD_SA response
23257 [ N(USE_TRANSP) SA No TSi TSr ]
Sep 18 11:25:19 10[IKE] <jsc065|31627> CHILD_SA jsc065{65} established
with SPIs c6b2b2ef_i c1fb6e97_o and TS 5.45.C.D/32 === 5.45.A.X/32
Sep 18 11:25:19 10[ENC] <jsc065|31627> generating INFORMATIONAL request
23258 [ ]
Sep 18 11:25:19 10[NET] <jsc065|31627> sending packet: from
5.45.C.D[500] to 5.45.A.X[500] (76 bytes)
Sep 18 11:25:19 08[NET] <jsc065|31627> received packet: from
5.45.A.X[500] to 5.45.C.D[500] (76 bytes)
Sep 18 11:25:19 08[ENC] <jsc065|31627> parsed INFORMATIONAL request
23217 [ ]
Sep 18 11:25:19 08[ENC] <jsc065|31627> generating INFORMATIONAL response
23217 [ ]
Sep 18 11:25:19 08[NET] <jsc065|31627> sending packet: from
5.45.C.D[500] to 5.45.A.X[500] (76 bytes)
Sep 18 11:25:19 10[NET] <jsc065|31627> received packet: from
5.45.A.X[500] to 5.45.C.D[500] (76 bytes)
Sep 18 11:25:19 10[ENC] <jsc065|31627> parsed INFORMATIONAL response
23258 [ ]
Sep 18 11:25:19 10[IKE] <jsc065|31627> closing expired CHILD_SA
jsc065{65} with SPIs c3bca161_i c076627e_o and TS 5.45.C.D/32 ===
5.45.A.X/32
Sep 18 11:25:19 10[IKE] <jsc065|31627> sending DELETE for ESP CHILD_SA
with SPI c3bca161
Sep 18 11:25:19 10[IKE] <jsc065|31627> scheduling CHILD_SA recreate
after hard expire
Sep 18 11:25:19 10[ENC] <jsc065|31627> generating INFORMATIONAL request
23259 [ D ]
Sep 18 11:25:19 10[NET] <jsc065|31627> sending packet: from
5.45.C.D[500] to 5.45.A.X[500] (76 bytes)
Sep 18 11:25:19 05[NET] <jsc065|31627> received packet: from
5.45.A.X[500] to 5.45.C.D[500] (76 bytes)
Sep 18 11:25:19 05[ENC] <jsc065|31627> parsed INFORMATIONAL request
23218 [ ]
Sep 18 11:25:19 05[ENC] <jsc065|31627> generating INFORMATIONAL response
23218 [ ]
Sep 18 11:25:19 05[NET] <jsc065|31627> sending packet: from
5.45.C.D[500] to 5.45.A.X[500] (76 bytes)
Sep 18 11:25:19 03[NET] <jsc065|31627> received packet: from
5.45.A.X[500] to 5.45.C.D[500] (76 bytes)
Sep 18 11:25:19 03[ENC] <jsc065|31627> parsed INFORMATIONAL response
23259 [ D ]
Sep 18 11:25:19 03[IKE] <jsc065|31627> received DELETE for ESP CHILD_SA
with SPI c076627e
Sep 18 11:25:19 03[IKE] <jsc065|31627> CHILD_SA closed
Sep 18 11:25:19 03[IKE] <jsc065|31627> establishing CHILD_SA jsc065
Sep 18 11:25:19 03[ENC] <jsc065|31627> generating CREATE_CHILD_SA
request 23260 [ N(USE_TRANSP) SA No TSi TSr ]
Sep 18 11:25:19 03[NET] <jsc065|31627> sending packet: from
5.45.C.D[500] to 5.45.A.X[500] (348 bytes)
Sep 18 11:25:19 08[NET] <jsc065|31627> received packet: from
5.45.A.X[500] to 5.45.C.D[500] (348 bytes)
Sep 18 11:25:19 08[ENC] <jsc065|31627> parsed CREATE_CHILD_SA request
23219 [ N(USE_TRANSP) SA No TSi TSr ]

So it seems that it keeps recreating CHILD_SA over and over. Did I hit
any kind of race condition? What could cause that? Restarting of
strongswan on tch019 helps to resolve the problem temporarily, but it
reappears on some other node.

Regards
Jirka H.
 
On 09/18/2014 10:25 AM, Jiri Horky wrote:
> Hi list,
>
> we have a rather specific setup with 102x102 host2host tunnels, each
> host from left side conneting to all 102 hosts on the other side. We use
> strongswan-5.2.0-2, we are on CentOS 6.5 with long term support kernel
> v3.14.13. 
>
> The problem we are seeing is that from time to time, a lot of SPIs are 
> created:
>
> [root@ams02-001 ~]# ip xfrm state | grep -c ^src
> 7857
>
> Instead of expected ~ 102*2=204 SPIs. Also, we see a alot. Usually,
>
> Some of the SPIs are in a stale state, on the affected nodes, we see this in 
> charon.log:
>
>
> Sep 18 10:15:57 13[KNL] <tch059|49961> querying SAD entry with SPI c663d950 
> failed: No such process (3)
> Sep 18 10:15:57 13[KNL] <tch059|49961> querying SAD entry with SPI c5265bed 
> failed: No such process (3)
> Sep 18 10:15:57 13[KNL] <tch059|49961> querying SAD entry with SPI cea44e84 
> failed: No such process (3)
> Sep 18 10:15:57 05[KNL] <tch059|49972> querying SAD entry with SPI c218d962 
> failed: No such process (3)
> Sep 18 10:15:57 07[KNL] <tch059|50015> querying SAD entry with SPI c08f7af3 
> failed: No such process (3)
> Sep 18 10:15:57 13[KNL] <tch059|49961> querying SAD entry with SPI ca3eb959 
> failed: No such process (3)
> Sep 18 10:15:57 05[KNL] <tch059|49972> querying SAD entry with SPI ccd82ddf 
> failed: No such process (3)
> Sep 18 10:15:57 07[KNL] <tch059|50015> querying SAD entry with SPI c5e8e50d 
> failed: No such process (3)
> Sep 18 10:15:57 13[KNL] <tch059|49961> querying SAD entry with SPI c7c6f94f 
> failed: No such process (3)
> Sep 18 10:15:57 07[KNL] <tch059|50015> querying SAD entry with SPI cb94f74a 
> failed: No such process (3)
> Sep 18 10:15:57 13[KNL] <tch059|49961> querying SAD entry with SPI c2ebe4df 
> failed: No such process (3)
> Sep 18 10:15:57 05[KNL] <tch059|49972> querying SAD entry with SPI cc64998e 
> failed: No such process (3)
> Sep 18 10:15:57 07[KNL] <tch059|50015> querying SAD entry with SPI c68e87ce 
> failed: No such process (3)
> Sep 18 10:15:57 12[KNL] <tch059|50021> querying SAD entry with SPI cc044f83 
> failed: No such process (3)
> Sep 18 10:15:57 05[KNL] <tch059|49972> querying SAD entry with SPI c38d0c03 
> failed: No such process (3)
> Sep 18 10:15:57 07[KNL] <tch059|50015> querying SAD entry with SPI c433c8d0 
> failed: No such process (3)
> Sep 18 10:15:57 12[KNL] <tch059|50021> querying SAD entry with SPI cd2d20c0 
> failed: No such process (3)
> Sep 18 10:15:57 05[KNL] <tch059|49972> querying SAD entry with SPI c01a3729 
> failed: No such process (3)
> Sep 18 10:15:57 07[KNL] <tch059|50015> querying SAD entry with SPI cb6871db 
> failed: No such process (3)
> Sep 18 10:15:57 12[KNL] <tch059|50021> querying SAD entry with SPI c5425a67 
> failed: No such process (3)
> Sep 18 10:15:57 05[KNL] <tch059|49972> querying SAD entry with SPI c546ed9d 
> failed: No such process (3)
> Sep 18 10:15:57 07[KNL] <tch059|50015> querying SAD entry with SPI cab7ddd0 
> failed: No such process (3)
> Sep 18 10:15:57 12[KNL] <tch059|50021> querying SAD entry with SPI c86040c9 
> failed: No such process (3)
> Sep 18 10:15:57 12[KNL] <tch059|50021> querying SAD entry with SPI cc54935a 
> failed: No such process (3)
> Sep 18 10:15:57 12[KNL] <tch059|50021> querying SAD entry with SPI cb459320 
> failed: No such process (3)
> Sep 18 10:15:57 12[KNL] <tch059|50021> querying SAD entry with SPI c471d8de 
> failed: No such process (3)
> Sep 18 10:15:57 06[KNL] <tch059|50061> querying SAD entry with SPI c4e7124f 
> failed: No such process (3)
> Sep 18 10:15:57 06[KNL] <tch059|50061> querying SAD entry with SPI c2c4b801 
> failed: No such process (3)
>
> Some of the point to point assosiations are there multiple times:
>
> ip xfrm state  | grep ^src | sort | uniq -c | sort -n
>
> [root@ams02-001 ~]# ip xfrm state  | grep ^src | sort | uniq -c | sort -rn | 
> head -n10
>    3528 src 5.45.A.B dst 5.45.C.D
>    3528 src 5.45.C.D dst 5.45.A.B
>      88 src 5.45.A.B dst 5.45.E.F
>      88 src 5.45.E.F dst 5.45.A.B
>
> This might related to the kernel crashes we see which I described in thread 
> "Occasional kernel crash at __xfrm_state_lookup".
>
> Any help would be appreciated.
>
> Jiri Horky
>

_______________________________________________
Users mailing list
[email protected]
https://lists.strongswan.org/mailman/listinfo/users

Reply via email to