Hi folks,

Long time listener, first time caller. I've been a happy strongSwan user for years. I recently moved my gateways to the latest Linux 4.14.77-80 and strongSwan U5.7.1, and now if I leave my connection idle, it can never pass traffic again until I restart the connection.
I have two gateways connecting to a Check Point vpn concentrator I don't manage. Identical configuration, bone stock default strongswan.conf. I constantly ping the remote gateway on one of my gateways and it's been up for more than a day, the other I let idle and now I can't reach the remote IPs any more.

My ipsec.conf is basic:

config setup
    uniqueids = yes

conn %default
    inactive=15m
    ikelifetime=1h
    lifetime=31m
    margintime=3m
    rekeyfuzz=100%
    rekey=yes

conn REMOTE
    authby=psk
    auto=route
    keyexchange=ikev1

What detail would help troubleshoot this?

ip route list table 220 is the same on the gw that is currently working and the gw that has timed out.

/proc/net/xfrm_stat is all zeroes EXCEPT XfrmOutNoStates which keeps increasing with every ping I send and never get a response to.

ip xfrm state:

WORKING:
src LOCAL dst REMOTEGW
    proto esp spi 0xff09c486 reqid 1 mode tunnel
    replay-window 0 flag af-unspec
    auth-trunc hmac(sha256) 0xREDACTED 128
    enc cbc(aes) 0xREDACTED
    anti-replay context: seq 0x0, oseq 0x2, bitmap 0x00000000
src REMOTEGW dst LOCAL
    proto esp spi 0xc757e42f reqid 1 mode tunnel
    replay-window 32 flag af-unspec
    auth-trunc hmac(sha256) 0xREDACTED 128
    enc cbc(aes) 0xREDACTED
    anti-replay context: seq 0x2, oseq 0x0, bitmap 0x00000003

TIMED OUT:
src LOCAL dst REMOTEGW
    proto esp spi 0x00000000 reqid 1 mode tunnel
    replay-window 0
    anti-replay context: seq 0x0, oseq 0x0, bitmap 0x00000000
    sel src LOCALIP/32 dst REMOTEIP/32 proto udp sport 33053 dport 1025 dev eth0

Comparison of working and idled out statuses:

WORKING:
Routed Connections:
    REMOTE{1}:  ROUTED, TUNNEL, reqid 1
    REMOTE{1}:   LOCALSUBNET/32 === REMOTESUBNET/32
Security Associations (1 up, 0 connecting):
    REMOTE[49]: ESTABLISHED 9 minutes ago, redacted
    REMOTE{66}:  INSTALLED, TUNNEL, reqid 1, ESP SPIs: c894097b_i ed7e44f7_o
    REMOTE{66}:   LOCALSUBNET/32 === REMOTESUBNET/32

TIMED OUT:
Routed Connections:
    REMOTE{1}:  ROUTED, TUNNEL, reqid 1
    REMOTE{1}:   LOCALSUBNET/32 === REMOTESUBNET/32
Security Associations (1 up, 0 connecting):
    REMOTE[16]: ESTABLISHED 6 minutes ago, redacted

I have charon debug running with most everything set to 2, here are some state changes from the one that timed out:

08:13 15[IKE] <15> IKE_SA (unnamed)[15] state change: CREATED => CONNECTING
08:13 16[IKE] <REMOTE|15> IKE_SA REMOTE[15] state change: CONNECTING => ESTABLISHED
08:13 06[CHD] <REMOTE|14> CHILD_SA REMOTE{7} state change: CREATED => DESTROYING
08:13 06[IKE] <REMOTE|14> IKE_SA REMOTE[14] state change: ESTABLISHED => DELETING
08:13 06[IKE] <REMOTE|14> IKE_SA REMOTE[14] state change: DELETING => DESTROYING
08:28 16[IKE] <16> IKE_SA (unnamed)[16] state change: CREATED => CONNECTING
08:28 06[IKE] <REMOTE|16> IKE_SA REMOTE[16] state change: CONNECTING => ESTABLISHED
08:28 07[CHD] <REMOTE|15> CHILD_SA REMOTE{8} state change: CREATED => DESTROYING
08:28 07[IKE] <REMOTE|15> IKE_SA REMOTE[15] state change: ESTABLISHED => DELETING
08:28 07[IKE] <REMOTE|15> IKE_SA REMOTE[15] state change: DELETING => DESTROYING

I'm guessing my side thinks the tunnel is up, remote thinks tunnel is down. How can I get it to automatically reset in this case?

Thanks in advance!
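
For anyone who wants to reproduce the XfrmOutNoStates observation, here is a minimal Python sketch for sampling /proc/net/xfrm_stat twice and printing which counters moved. It assumes the file has one "Name value" pair per line, which is how it looks on these gateways; the function names (parse_xfrm_stat, changed_counters, sample_delta) are made up for illustration and are not part of strongSwan or the kernel.

```python
import time
from pathlib import Path


def parse_xfrm_stat(text):
    """Parse 'Name value' lines into a dict of integer counters."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        # Skip anything that is not exactly a name followed by a number.
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def changed_counters(before, after):
    """Return {name: delta} for every counter whose value changed."""
    return {
        name: value - before.get(name, 0)
        for name, value in after.items()
        if value != before.get(name, 0)
    }


def sample_delta(path="/proc/net/xfrm_stat", interval=5.0):
    """Read the counters, wait `interval` seconds, read again, return deltas."""
    stat_file = Path(path)
    before = parse_xfrm_stat(stat_file.read_text())
    time.sleep(interval)
    after = parse_xfrm_stat(stat_file.read_text())
    return changed_counters(before, after)
```

Calling sample_delta() on the idled-out gateway while a ping is running should, if the assumption about the file format holds, return only an XfrmOutNoStates entry.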