Re: [strongSwan] High availability failover problem
On 2015-03-11 10:35, Martin Willi wrote:
> Hi,
>
>> Is it essential for both nodes to receive all the ESP packets?
>
> Yes.
>
>> Can't ESP sequence numbers be synchronized through the HA plugin?
>
> No, this is not how the HA plugin works. ESP sequence numbers move very
> fast, making synchronization in userland difficult. You may try to
> synchronize sequence numbers by other means, but we don't provide any
> solution beyond our ClusterIP patches.
>
> Regards
> Martin

Thanks for your help, Martin. Now it's clear to me.

--
With kind regards,
Aleksey
___
Users mailing list
Users@lists.strongswan.org
https://lists.strongswan.org/mailman/listinfo/users
Re: [strongSwan] High availability failover problem
Hi,

> Is it essential for both nodes to receive all the ESP packets?

Yes.

> Can't ESP sequence numbers be synchronized through the HA plugin?

No, this is not how the HA plugin works. ESP sequence numbers move very
fast, making synchronization in userland difficult. You may try to
synchronize sequence numbers by other means, but we don't provide any
solution beyond our ClusterIP patches.

Regards
Martin
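Editor's note: the reply above turns on ESP anti-replay behaviour. The following is a minimal Python sketch of a sliding replay window in the style of RFC 4303, not the Linux kernel's implementation. The class `ReplayWindow`, the function `simulate_failover`, the 32-packet window size and the packet counts are all hypothetical names and values chosen for illustration. It shows one plausible reading of the symptom in this thread: if the standby node starts sending with an unsynchronized (low) sequence counter, a receiver whose window has already advanced rejects every packet until a rekey resets both sides.

```python
class ReplayWindow:
    """Simplified ESP anti-replay window (32-bit sequence numbers, no ESN)."""

    def __init__(self, size=32):
        self.size = size
        self.top = 0      # highest sequence number accepted so far
        self.bitmap = 0   # bit i set means (top - i) was already seen

    def accept(self, seq):
        """Return True if the packet passes the replay check."""
        if seq == 0:
            return False              # sequence number 0 is never valid
        if seq > self.top:
            # Packet is ahead of the window: slide the window forward.
            shift = seq - self.top
            mask = (1 << self.size) - 1
            self.bitmap = ((self.bitmap << shift) | 1) & mask
            self.top = seq
            return True
        offset = self.top - seq
        if offset >= self.size:
            return False              # older than the window: dropped
        if self.bitmap & (1 << offset):
            return False              # already seen: replay, dropped
        self.bitmap |= 1 << offset
        return True


def simulate_failover(sent_before, sent_after, standby_start=0):
    """Count packets the receiver accepts after a failover.

    sent_before:   packets sent by the active node before failover
    sent_after:    packets sent by the standby node after failover
    standby_start: sequence counter of the standby node at takeover
                   (0 models a counter that was never synchronized)
    """
    window = ReplayWindow()
    for seq in range(1, sent_before + 1):
        window.accept(seq)
    return sum(
        window.accept(standby_start + i) for i in range(1, sent_after + 1)
    )
```

With these assumed numbers, `simulate_failover(5000, 10)` models an unsynchronized standby and yields no accepted packets, while `simulate_failover(5000, 10, standby_start=5000)` models a counter kept in sync and accepts all ten.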
Re: [strongSwan] High availability failover problem
On 2015-03-10 15:24, Martin Willi wrote:
>> Then you should check if ClusterIP works as expected, and both on the
>> inbound and outbound paths the ESP packets hit both nodes.
>
> To clarify, on the outbound path this of course is plain traffic subject
> to ESP encapsulation.
>
> Regards
> Martin

Thanks, Martin!

I've probably misunderstood something, but I don't use ClusterIP in my
setup, so it is not an active/active setup but rather active/standby with
VRRP (there are some issues with unicast IP to multicast MAC bindings). I
had a conversation with some people here on the list, and they told me
that I can use the HA plugin in active/standby mode without using
ClusterIP.

Is it essential for both nodes to receive all the ESP packets? Can't ESP
sequence numbers be synchronized through the HA plugin?

--
With kind regards,
Aleksey
Re: [strongSwan] High availability failover problem
> Then you should check if ClusterIP works as expected, and both on the
> inbound and outbound paths the ESP packets hit both nodes.

To clarify, on the outbound path this of course is plain traffic subject
to ESP encapsulation.

Regards
Martin
Re: [strongSwan] High availability failover problem
Aleksey,

> when I test failover [...], traffic won't flow through standby
> node until rekey on child SA is done

To me this sounds like an ESP sequence number issue.

I assume you have patched your kernel to include our ClusterIP IPsec
extensions, as discussed at [1]. You may find some newer patches in the
ha-* tags/branches at [2].

Then you should check if ClusterIP works as expected, and both on the
inbound and outbound paths the ESP packets hit both nodes. If this is the
case, ClusterIP can keep ESP sequence numbers in sync on the passive node.

If that all works as expected, try to compare the sequence numbers before
and after failover. Linux silently drops packets with an already used
sequence number, but /proc/net/xfrm_stats (requires
CONFIG_XFRM_STATISTICS) has some counters that can help in analyzing why
packets get dropped.

Regards
Martin

[1] https://wiki.strongswan.org/projects/strongswan/wiki/HighAvailability
[2] http://git.strongswan.org/?p=linux-dumm.git;a=summary
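Editor's note: one way to follow the /proc/net/xfrm_stats suggestion above is to snapshot the counters before and after a failover test and print only those that changed. The Python sketch below assumes the usual layout of that file (one counter name and one integer per line). The function names are hypothetical, and the script is an illustration rather than a strongSwan tool. A rising `XfrmInStateSeqError` counter on the node that drops the packets would point at sequence numbers falling outside the replay window.

```python
def parse_xfrm_stats(text):
    """Parse the contents of /proc/net/xfrm_stats into {name: value}."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        # Each valid line is "<CounterName> <integer>"; skip anything else.
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def diff_counters(before, after):
    """Return only the counters whose value changed, with their delta."""
    return {
        name: value - before.get(name, 0)
        for name, value in after.items()
        if value != before.get(name, 0)
    }


def read_xfrm_stats(path="/proc/net/xfrm_stats"):
    """Read and parse the live counters (needs CONFIG_XFRM_STATISTICS)."""
    with open(path) as stats_file:
        return parse_xfrm_stats(stats_file.read())
```

A typical session would call `read_xfrm_stats()` once before triggering the failover, once after a few pings have been lost, and then pass both results to `diff_counters()`.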
[strongSwan] High availability failover problem
Hi guys!

I'm trying to make an HA setup work but face some problems during testing
(both HA nodes, which I'll call the local side, run strongswan 5.2.1
installed from wheezy-backports on Debian 7.8).

I'm using HA in active/standby mode. The IPs from which the tunnel is
initiated are bound to virtual VLAN interfaces on both HA nodes. The IP
which the remote side uses to reach the virtual IP on the VLAN is handled
by VRRP and is present only on the active node.

So, the tunnel initiates OK, traffic flows, and ipsec statusall on the
standby node shows the tunnel state as PASSIVE. However, when I test
failover (I shut down the VRRP service, which shuts down strongswan on the
formerly active node), traffic won't flow through standby node until rekey
on child SA is done, either by waiting for it to rekey on its own, by
forcing a rekey with
echo "*1" > /var/run/charon.ha, echo "*2" > /var/run/charon.ha
or by bringing the ipsec service back up on the primary node (which also
causes a rekey of the child SA). After the rekey, traffic flows OK.

Before the failover, I launch a ping from the remote side (which is also a
Debian machine with strongswan 5.2.1; IKEv2 is used for key exchange) from
an IP in the leftsubnet to an IP in the rightsubnet, so the traffic flows
inside the tunnel. The ping flows OK and then stops when I initiate the
failover.

After the failover, I can see packets hitting the outbound child SA on the
remote node, but no packets hit the inbound one. The IKE and child SA
numbers match on the remote node and on the local standby node. On the
local standby node I can see that the traffic does flow: both the inbound
and outbound child SAs are hit, and the packet counters increment
simultaneously on the remote node's outbound child SA and on the local
node's inbound and outbound SAs. So the ICMP request is forwarded through
the tunnel from the remote node, processed by the local one, and the ICMP
reply is sent back. For some reason, however, the reply does not hit the
inbound child SA on the remote side.

Could anyone point out what's going wrong?

Thanks in advance.

--
With kind regards,
Aleksey