Hi,
While I agree with the sentiment here, I have personally been in positions
where application programmers were unable to modify whatever was running,
in a timely manner, to implement a keepalive protocol. In that case,
turning on TCP keepalives was a very easy thing to do, and it immediately
yielded operational benefits.
So I'd like the text to recommend doing it as "high up" in the stack as
possible, but without putting people off turning on TCP keepalives
"because the IETF doesn't recommend that", so that they end up doing
nothing at all and the problem just persists.
Also, should we talk about recommendations for what these timers should
be? In my experience, values from tens of seconds up to 5-10 minutes are
what typically make sense for Internet use. Shorter timers might tear down
the connection prematurely; longer ones mean it takes too long to detect
a problem. Of course it's up to the application/environment to choose the
best value for each use case, but some text on this might be worthwhile
to have there?
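
To make the "very easy thing to do" concrete, here is a minimal sketch of
turning on TCP keepalives on an existing socket and tuning the timers. The
function names and the default values (60 s idle, 10 s interval, 5 probes)
are my own assumptions for illustration, not recommendations from the draft
statement. The per-socket timer options are platform-specific, so the sketch
skips any that the local Python build does not expose.

```python
import socket


def enable_tcp_keepalive(sock, idle_s=60, interval_s=10, probes=5):
    """Enable TCP keepalives on sock and tune the timers where supported.

    The defaults are illustrative assumptions only.
    """
    # SO_KEEPALIVE is the portable switch that turns keepalive probing on.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)

    # The timer options below exist on Linux; other platforms may lack
    # some of them, in which case the system-wide defaults apply.
    tunables = (
        ("TCP_KEEPIDLE", idle_s),       # idle time before the first probe
        ("TCP_KEEPINTVL", interval_s),  # time between unanswered probes
        ("TCP_KEEPCNT", probes),        # unanswered probes before giving up
    )
    for name, value in tunables:
        option = getattr(socket, name, None)
        if option is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)


def approx_detection_time_s(idle_s, interval_s, probes):
    """Rough time to notice a dead peer on an otherwise idle connection."""
    return idle_s + interval_s * probes
```

With the assumed defaults, a dead peer on an idle connection would be
noticed after roughly 110 seconds, which falls inside the range above.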
On Fri, 13 Jul 2018, Kent Watsen wrote:
Dear TSVAREA,
The folks working with the BBF asked the NETMOD WG to consider modifying
draft-ietf-netconf-netconf-client-server to support TCP keepalives [1]. However, it
is unclear what IETF's position is on the use of keepalives, especially with regard
to keepalives provided in protocol stacks (e.g., <some-app> over HTTP over TLS
over TCP).
After some discussion with Transport ADs (Spencer and Mirja) and the TLS ADs
(Eric and Ben), the following draft statement has been crafted. Spencer and
Mirja have requested TSVAREA critique it before, perhaps, developing a
consensus document around it in TSVWG.
It would be greatly appreciated if folks here could review and provide comments
on the draft statement below. The scope of the statement can be increased or
reduced as deemed appropriate.
[1] https://mailarchive.ietf.org/arch/msg/netconf/MOzcZKp2rSxPVMTGdmmrVInwx2M
Thanks,
Kent (and Mahesh) // NETCONF chairs
===== STATEMENT =====
When the initiator of a networking session needs to maintain a persistent
connection [1], it is necessary for it to periodically test the aliveness of
the remote peer. In such cases, it is RECOMMENDED that the aliveness check
happens at the highest protocol layer possible that is most meaningful to the
application, to maximize the depth of the aliveness check.
E.g., for an HTTPS connection to a simple webserver, HTTP-level keepalives
would test more aliveness than TLS-level keepalives. However, for a webserver
that is accessed via a load-balancer that terminates TLS connections, TLS-level
aliveness checks may be the most meaningful check that could be performed.
In order to ensure aliveness checks can always occur at the highest protocol layer, it is
RECOMMENDED that protocol designers always include an aliveness check mechanism in the
protocol and, for client/server protocols, that the aliveness check can be initiated from
either peer, as sometimes the "server" is the initiator of the underlying
networking connection (e.g., RFC 8071).
Some protocol stacks have a secure transport protocol layer (e.g., TLS, SSH,
DTLS) that sits on top of a cleartext protocol layer (e.g., TCP, UDP). In such
cases, it is RECOMMENDED that the aliveness check occurs within the protection
envelope afforded by the secure transport protocol layer. In such cases, the
aliveness checks SHOULD NOT occur via the cleartext protocol layer, as an
adversary can block aliveness check messages in either direction and send fake
aliveness check messages in either direction.
[1] Reasons may vary for why the initiator of a networking session feels
compelled to maintain a persistent connection. However, if the session is
primarily quiet, and the use case can cope with the additional latency of
starting a new connection, it is RECOMMENDED to use short-lived connections,
instead of maintaining a long-lived persistent connection using aliveness checks.
--
Mikael Abrahamsson email: swm...@swm.pp.se