Hello, first time writing to the list.  Let me know if this is going to the
wrong place.

My overall goal is for a peer running a pair of HA tunnels to terminate at my 
libreswan node (so my node has 2 tunnels using the same right/left-subnets in 
their .conf, but different marks).  My local "switching" to implement the HA 
behavior is an updown script - on up/route it writes iptables connmark rules to 
send packets to the mark of the tunnel referenced in the updown invocation.  (I 
have no preferred "primary".)  On down/unroute it removes rules for the tunnel 
referenced in the invocation, leaving rules to the remaining tunnel mark only.
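
A minimal sketch of that kind of updown logic, for readers who want something
concrete.  It is not the actual script: the mark values, the "p2_to_n" name,
the mangle/PREROUTING placement and the exact CONNMARK rule are all assumptions
for illustration.  Only the PLUTO_* environment variables are what pluto passes
to an updown script.  The command is printed rather than executed.

```python
#!/usr/bin/env python3
"""Sketch of an updown helper that steers traffic by mark (names hypothetical)."""
import os

# Hypothetical mapping from connection name to the mark set in its .conf.
TUNNEL_MARKS = {"p1_to_n": 0x1, "p2_to_n": 0x2}

# Verbs that install or remove the steering rule; the -host variants and
# prepare-* verbs are ignored in this sketch.
ADD_VERBS = {"up-client", "route-client"}
DEL_VERBS = {"down-client", "unroute-client"}


def rule_spec(local_net, remote_net, mark):
    """Match part plus target, shared by the add and delete forms."""
    return ["-s", local_net, "-d", remote_net,
            "-j", "CONNMARK", "--set-mark", hex(mark)]


def build_command(env):
    """Return an iptables argv for this invocation, or None if nothing to do."""
    verb = env.get("PLUTO_VERB", "")
    mark = TUNNEL_MARKS.get(env.get("PLUTO_CONNECTION", ""))
    if mark is None:
        return None
    if verb in ADD_VERBS:
        action = "-A"
    elif verb in DEL_VERBS:
        action = "-D"
    else:
        return None
    spec = rule_spec(env["PLUTO_MY_CLIENT"], env["PLUTO_PEER_CLIENT"], mark)
    return ["iptables", "-t", "mangle", action, "PREROUTING"] + spec


if __name__ == "__main__":
    cmd = build_command(os.environ)
    # Printed rather than executed, so the sketch is safe to run anywhere.
    if cmd:
        print(" ".join(cmd))
```

Because the delete form reuses the same rule spec as the add form, a down or
unroute for one tunnel removes only that tunnel's rule and leaves the other
tunnel's rule in place.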

I have a test setup in a cloud space with end-users A and B who talk between 
their separate VPCs through the tunnels.  I have a pair of endpoints to mock up 
my peers (call them P1 and P2) in A's VPC.  And I have a node (N) in B's VPC 
where one tunnel each from P1 and P2 terminates.  (For cost, P1 and P2 in this 
test setup are also Libreswan nodes.)  All router boxes are Libreswan 3.23 on 
Ubuntu 18.04.5 LTS, running in the AWS free tier.

The setup passes traffic and the HA switching behavior works as intended if I 
issue "ipsec auto --delete p1_to_n" while on P1.
P1 sends a terminator message to N.  N calls my updown script while downing and 
unrouting the tunnel.  The script removes the iptables rules directing traffic 
into P1's xfrm tunnel, but the rules for P2 are still in place, so traffic 
immediately fails over to P2.  (The routing switch on the other side for A 
-> P2 is handled and not relevant here.)  I can re-up and down P1 and P2 at 
will and see the traffic is not interrupted between A and B (as long as there 
is at least one tunnel up).

But, where I am having trouble is when I try to make this more realistic by 
suddenly blocking traffic instead of issuing a --delete.  My expectation for 
this scenario was that DPD would detect the disconnect, down the tunnel (as 
suggested in libreswan DPD code tests) and call my updown script; but that has 
not been the case.  I see NAT-T packets go out, but no DPD, and lastdpd=-1 
never changes.  If I disable NAT-T (which may cause me other problems with AWS 
public addressing) I do see an R_U_THERE and R_U_THERE_ACK, but only once.  
After that first DPD exchange with NAT-T disabled, I see "DPD: no need to send 
or schedule DPD for replaced IPsec SA" repeatedly (every 30 seconds, matching 
my dpdtimeout), but I never see another DPD exchange.  I used iptables DROP on 
in/out to model a disconnect, but also tried AWS ACLs in case there was some 
difference. 
(Netlink seems to recognize the inability to send when the drop rules are in 
place.)
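
For anyone wanting to reproduce the disconnect, here is a rough sketch of the
in/out DROP approach.  The exact rules used in the test are not shown above, so
the INPUT/OUTPUT chains, the match on peer address only, and the example
address (from the documentation range) are assumptions.  The commands are
printed rather than executed.

```python
#!/usr/bin/env python3
"""Sketch: build iptables commands that blackhole one peer (values hypothetical)."""


def blackhole_commands(peer_ip, remove=False):
    """Return iptables argvs that drop all traffic to and from peer_ip.

    With remove=True the same rules are deleted, restoring connectivity.
    """
    action = "-D" if remove else "-I"
    return [
        ["iptables", action, "INPUT", "-s", peer_ip, "-j", "DROP"],
        ["iptables", action, "OUTPUT", "-d", peer_ip, "-j", "DROP"],
    ]


if __name__ == "__main__":
    peer = "203.0.113.10"  # placeholder address, not from the test setup
    for cmd in blackhole_commands(peer):
        print(" ".join(cmd))
    for cmd in blackhole_commands(peer, remove=True):
        print(" ".join(cmd))
```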

I've done quite a bit of digging to see what's happening and am happy to post 
my digest and/or raw logs here, but as a new user I first wanted to check 
if I'm just missing something entirely before I completely word vomit on the 
mailing list.

Thank you in advance,
-Mike


_______________________________________________
Swan mailing list
[email protected]
https://lists.libreswan.org/mailman/listinfo/swan
