Hi,

While I agree with the sentiment here, I have personally been in situations where application programmers could not modify whatever was running to implement a keepalive protocol, at least not in a timely manner. In those cases, turning on TCP keepalives was a very easy change that yielded immediate operational benefits.

So I'd like the text to recommend doing the aliveness check as "high up" in the stack as possible, while still not putting people off turning on TCP keepalives "because the IETF doesn't recommend that", with the result that they do nothing at all and the problem persists.

Also, should we give recommendations for what these timers should be? In my experience, values from tens of seconds up to 5-10 minutes make sense for Internet use. Anything shorter might interrupt the connection prematurely; anything longer makes it take too long to detect a problem. Of course it's up to the application/environment to choose the best value for each use case, but some text on this might be worthwhile?
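For anyone wanting to try the "easy" TCP-level option, here is a minimal Python sketch. It assumes a Linux-like platform where the standard `socket` module exposes `TCP_KEEPIDLE`, `TCP_KEEPINTVL` and `TCP_KEEPCNT`; other platforms may name or omit these, so the code skips whatever is missing. The 60 s / 15 s / 4 probes values are illustrative only, chosen to fall inside the tens-of-seconds-to-minutes range above, and the helper names are hypothetical.

```python
import socket

def enable_tcp_keepalive(sock: socket.socket, idle: int = 60,
                         interval: int = 15, count: int = 4) -> None:
    """Turn on TCP keepalives and, where exposed, tune the per-socket timers.

    idle:     seconds of inactivity before the first probe is sent
    interval: seconds between unanswered probes
    count:    unanswered probes before the connection is declared dead
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Timer options are platform-specific; skip any this build lacks.
    if hasattr(socket, "TCP_KEEPIDLE"):        # Linux name for the idle time
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
    elif hasattr(socket, "TCP_KEEPALIVE"):     # macOS name for the idle time
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPALIVE, idle)
    if hasattr(socket, "TCP_KEEPINTVL"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
    if hasattr(socket, "TCP_KEEPCNT"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, count)

def worst_case_detection(idle: int, interval: int, count: int) -> int:
    """Approximate seconds from the last traffic until a dead peer is noticed."""
    return idle + interval * count
```

With these example values a dead peer is noticed roughly 60 + 4 * 15 = 120 seconds after the last traffic, which is the kind of trade-off between premature teardown and slow detection described above.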

On Fri, 13 Jul 2018, Kent Watsen wrote:


Dear TSVAREA,

The folks working with the BBF asked the NETMOD WG to consider modifying 
draft-ietf-netconf-netconf-client-server to support TCP keepalives [1].  However, it 
is unclear what the IETF's position is on the use of keepalives, especially with regard 
to keepalives provided in protocol stacks (e.g., <some-app> over HTTP over TLS 
over TCP).

After some discussion with the Transport ADs (Spencer and Mirja) and the TLS ADs 
(Eric and Ben), the following draft statement has been crafted.  Spencer and 
Mirja have requested that TSVAREA critique it before, perhaps, developing a 
consensus document around it in TSVWG.

It would be greatly appreciated if folks here could review and provide comments 
on the draft statement below.  The scope of the statement can be increased or 
reduced as deemed appropriate.

[1] https://mailarchive.ietf.org/arch/msg/netconf/MOzcZKp2rSxPVMTGdmmrVInwx2M

Thanks,
Kent (and Mahesh) // NETCONF chairs


===== STATEMENT =====

When the initiator of a networking session needs to maintain a persistent 
connection [1], it is necessary for it to periodically test the aliveness of 
the remote peer.  In such cases, it is RECOMMENDED that the aliveness check 
happen at the highest protocol layer that is meaningful to the application, 
to maximize the depth of the aliveness check.

E.g., for an HTTPS connection to a simple webserver, HTTP-level keepalives 
would exercise more of the stack than TLS-level keepalives.  However, for a webserver 
that is accessed via a load-balancer that terminates TLS connections, TLS-level 
aliveness checks may be the most meaningful check that could be performed.
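An HTTP-level check can be as simple as a `HEAD` request sent over the connection the application already uses. The Python sketch below relies only on the standard `http.client` module; the helper name, the default path `/`, and the rule that a 5xx status counts as "not alive" are assumptions for illustration. Passing an `http.client.HTTPSConnection` (a subclass of `HTTPConnection`) sends the check inside TLS, so it reaches the HTTP server itself, whereas a TLS-level check only reaches whatever terminates TLS, such as the load-balancer above.

```python
import http.client

def http_alive(conn: http.client.HTTPConnection, path: str = "/") -> bool:
    """Send a HEAD request over an existing, possibly persistent, connection.

    Any response shows the HTTP layer on the far side is answering; treating
    a 5xx status as "not alive" is an assumption of this sketch.
    """
    try:
        conn.request("HEAD", path)
        resp = conn.getresponse()
        resp.read()              # drain the response so the connection can be reused
        return resp.status < 500
    except (OSError, http.client.HTTPException):
        conn.close()             # a later request will reopen the connection
        return False
```

A function like this could serve as the `probe` for a periodic checker, with the caller deciding how many failures to tolerate.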

In order to ensure that aliveness checks can always occur at the highest protocol 
layer, it is RECOMMENDED that protocol designers always include an aliveness check 
mechanism in the protocol and, for client/server protocols, that either peer be able 
to initiate the aliveness check, as sometimes the "server" is the initiator of the 
underlying networking connection (e.g., RFC 8071).
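To illustrate the "either peer can initiate" requirement, here is a toy Python sketch of a line-based PING/PONG exchange. The message format and the `Peer` class are invented for illustration and do not come from NETCONF, RFC 8071, or any other specification. The point is only that the responder logic is identical on both ends, so it makes no difference which side opened the underlying connection.

```python
import socket
from typing import Optional

class Peer:
    """Hypothetical symmetric aliveness check: either end may send
    "PING <n>", and the receiver must answer "PONG <n>", regardless of
    whether it is the client, the server, or the connection initiator."""

    def __init__(self, sock: socket.socket) -> None:
        self.sock = sock
        self.reader = sock.makefile("r", encoding="ascii")

    def ping(self, seq: int) -> None:
        self.sock.sendall(f"PING {seq}\n".encode("ascii"))

    def process_one(self) -> Optional[int]:
        """Read one message; answer a PING, or return the seq of a PONG."""
        kind, _, seq = self.reader.readline().strip().partition(" ")
        if kind == "PING":
            self.sock.sendall(f"PONG {seq}\n".encode("ascii"))
        elif kind == "PONG":
            return int(seq)
        return None
```

In a real protocol these messages would be carried inside the secure transport layer rather than in the clear.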

Some protocol stacks have a secure transport protocol layer (e.g., TLS, SSH, 
DTLS) that sits on top of a cleartext protocol layer (e.g., TCP, UDP).  In such 
cases, it is RECOMMENDED that the aliveness check occur within the protection 
envelope afforded by the secure transport protocol layer.  The aliveness 
checks SHOULD NOT occur via the cleartext protocol layer, as an adversary can 
block genuine aliveness check messages and send fake ones in either direction.

[1] Reasons may vary for why the initiator of a networking session feels 
compelled to maintain a persistent connection.  However, if the session is 
primarily quiet, and the use case can cope with the additional latency of 
starting a new connection, it is RECOMMENDED to use short-lived connections 
instead of maintaining a long-lived persistent connection using aliveness checks.




--
Mikael Abrahamsson    email: swm...@swm.pp.se
