On Fri, Aug 12, 2016 at 09:56:41PM +0200, fRANz wrote:
> On Sat, Aug 6, 2016 at 2:18 AM, William Ahern
> <[email protected]> wrote:
<snip>
> > isakmpd unconditionally sends NAT-T keepalive messages every 30 seconds,
> > whereas iked's ikev2_ike_sa_alive only sends a keepalive message iff
> > `!foundin && foundout`. But that presumes that the SA initiator is also the
> > initiator of traffic, which definitely isn't the case in my situation, and
> > seems dubious and unreliable even for real road warriors.
> > ...
>
> > I'd be happy to create a proper patch if someone could explain the purpose
> > of the conditional logic. I wouldn't want to accidentally break something.
> >
> > I also wouldn't mind making the keepalive interval configurable--rather than
> > a compile-time constant--so users could deal with NAT gateways which
> > aggressively flush state.
>
> Hello William,
> I did the same switch (from isakmpd to iked) with a lot of problems,
> maybe the same that you're reported.
> Did you receive any feedback from OpenBSD staff, catching the occasion
> of the 6.0 release ready to go?
> Regards,
> -f

No feedback yet, but soon after posting I realized a few things:

1) My hack makes the tunnel much more stable, but not nearly as stable as
   with isakmpd. I think it's because with isakmpd both peers send a
   keepalive every 30 seconds, whereas I only applied the hack I posted to
   the active, behind-the-NAT peer. See point #3, below.

2) The logic of ikev2_ike_sa_alive is intended, I think, to preserve the
   limited lifetime of SAs. Otherwise, by unconditionally sending a
   keepalive and not distinguishing keepalive traffic, the SA might never
   expire.

   I'm not sure why iked isn't using the standard NAT-T keepalive message
   format and protocol like isakmpd does. AFAICT it's still defined by
   IKEv2. Maybe iked is using a hybrid keepalive/dead peer detection
   solution, but the author forgot to account for different effective
   behavior in some scenarios; or maybe it was just more expedient than
   implementing NAT-T keepalive messages. Figuring that out will probably
   help me answer what a proper solution looks like.

3) I'm fairly certain the keepalive interval should be configurable. The
   default UDP NAT state expiration on OpenBSD, for example, is 30 seconds.
   The compile-time constant interval for keepalives in isakmpd and iked is
   also 30 seconds--the recommended period in the RFCs. Over time it's
   inevitable for a peer's keepalive packet to miss the window for
   preserving NAT state, especially considering that the peer's and
   router's timers are going to be synchronized.

   That would explain why even with isakmpd I still need a cronjob to ping
   the passive host. And it explains why isakmpd is more stable than my
   hacked iked--isakmpd sends keepalives independently from both sides, so
   you have two shots at making the NAT expiration window. iked's keepalive
   is a round-trip message; also two packets, but the timing of the first
   packet is all that matters.

Of course, the NAT state could expire before an IKE keepalive for many
reasons, but a keepalive interval at least a few seconds shorter than the
router's NAT expiration should keep the connection stable for longer
periods of time. And rather than having a cron job run every minute, the
child SA lifetime could be set to something smallish. If and when NAT state
does expire, the active peer behind NAT will rekey the SA within a
tolerable period, reestablishing NAT state and restoring reverse traffic.
Lowering both the keepalive interval and the child SA lifetime should make
these types of tunnels much more stable and reliable, without recourse to
external hacks.
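
To illustrate the timing argument in point 3, here is a rough simulation
sketch in Python. It is not iked or isakmpd code. The function name, the
jitter model (a small random scheduling delay on each send), and all
parameter values are assumptions I made up for illustration. It counts how
often a keepalive reaches the NAT gateway after the state has already timed
out.

```python
import random


def count_expirations(interval, nat_timeout, jitter, duration, seed=0):
    """Count keepalives that arrive after the NAT state already expired.

    Hypothetical model: one peer sends a keepalive every `interval`
    seconds, each delayed by a random 0..`jitter` seconds. The gateway
    drops the state once `nat_timeout` seconds pass without a packet.
    """
    rng = random.Random(seed)
    last_seen = 0.0  # last time the gateway saw a packet for this flow
    expirations = 0
    tick = 1
    while tick * interval <= duration:
        # Nominal send time plus a small non-negative scheduling delay.
        arrival = tick * interval + rng.uniform(0.0, jitter)
        if arrival - last_seen > nat_timeout:
            # State was gone before this packet arrived; reverse traffic
            # would have been dropped in the meantime.
            expirations += 1
        # Outbound traffic (re)creates the state either way.
        last_seen = arrival
        tick += 1
    return expirations


if __name__ == "__main__":
    # Interval equal to the NAT timeout: any added delay misses the window.
    print("30s keepalive:", count_expirations(30, 30, 0.5, 3600))
    # Interval a few seconds shorter: the worst-case gap is 25.5s.
    print("25s keepalive:", count_expirations(25, 30, 0.5, 3600))
```

Under this model, a 25 second interval never misses a 30 second window,
because the largest possible gap between packets is the interval plus the
jitter. With the interval equal to the timeout, every send that is delayed
more than the previous one arrives too late. Real gateways and real
scheduling delays will behave differently, so treat the numbers as
illustrative only.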
