Below is an updated version of some text that we might roll into
a statement or an I-D of some sort. Kindly review and provide
suggestions for improvement, or support for the text as is, if
that is the case. ;)
This update accommodates comments from:
  - Wesley Eddy & David Black
    - removed "layers of functionality" verbiage
    - moved footnote into the body of the document (this had
      a cascading effect, which is why it looks so different now)
  - Joe Touch
    - keepalives should occur at *all* layers that benefit
    - keepalives at a layer should be suppressed in the
      presence of sufficient traffic from higher layers
    - keepalives at a layer should not be interpreted as
      implying state at any other layer
This update does not accommodate comments from:
  - Michael Abrahamsson & Tom Herbert
    - no statement added to promote TCP keepalives
      * note: I believe this to be unnecessary because
        the current text never says not to use TCP.
    - no statement added for tuning params (e.g., timeouts).
      * note: we could add this, but it would increase the
        scope of the document - do we want to do this?
Cheers!
Kent
===== START =====
# Connection Strategies for Long-lived Connections
A networked device may have an ongoing need to interact with a remote
device. Sometimes the need arises from wanting to push data to the
remote device; other times it arises from wanting to check whether
the remote device has any data pending to deliver to it.
There are two fundamental network connection strategies that can be
used to accomplish this goal: 1) a single long-lived connection and
2) a sequence of short-lived connections.
A single long-lived connection is most common, as it is
straightforward to implement and directly answers the question of
whether the "connection" is established. However, long-lived
connections require more system resources, which may affect
scalability, and require the initiator of the connection to
periodically test the aliveness of the remote device, as discussed
in the next section.
A sequence of short-lived connections is less common, as it requires
additional implementation effort and raises concerns such as: 1) the
delay incurred while the remote device waits for the connection to be
reestablished before it can deliver pending data, and 2) the
additional latency of establishing each new connection, especially
when cryptographic handshakes are involved. However, short-lived
connections do not require keepalives and are arguably more secure,
as each device is forced to re-authenticate the other and reload all
related access-control policies on each connection.
When networking sessions are primarily quiet and the use case can
tolerate the additional latency of waiting for and establishing new
connections, it is RECOMMENDED to use a sequence of short-lived
connections instead of maintaining a single long-lived connection
with aliveness checks.
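A minimal Python sketch of the short-lived strategy, assuming a hypothetical `connect()` callable that performs the full handshake (including mutual authentication) and returns a session object with hypothetical `fetch_pending()` and `close()` methods. Each round opens a fresh connection, drains any pending data, and closes it, so nothing stays open long enough to need keepalives; the cost is that data the remote device queues between rounds waits up to `interval` seconds.

```python
import time
from typing import Callable, List, Optional, Protocol


class Session(Protocol):
    """Hypothetical per-connection session interface."""

    def fetch_pending(self) -> List[bytes]: ...
    def close(self) -> None: ...


def poll_with_short_lived_connections(
    connect: Callable[[], Session],
    interval: float,
    rounds: int,
    sleep: Callable[[float], None] = time.sleep,
) -> List[bytes]:
    """Open a fresh connection per round, collect pending data, close it."""
    received: List[bytes] = []
    for i in range(rounds):
        session: Optional[Session]
        try:
            # Full handshake each time: peers re-authenticate and reload ACLs.
            session = connect()
        except OSError:
            session = None  # peer unreachable; simply try again next round
        if session is not None:
            try:
                received.extend(session.fetch_pending())
            finally:
                session.close()  # nothing stays open, so no keepalives needed
        if i < rounds - 1:
            sleep(interval)  # remote data waits here until the next round
    return received
```

Passing `time.sleep` by default and allowing a substitute keeps the loop easy to exercise without real delays.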
# Keepalives for Persistent Connections
When the initiator of a networking session needs to maintain a
long-lived connection, it has to periodically test the aliveness of
the remote device. In such cases, it is RECOMMENDED that the aliveness
check occur at the highest protocol layer that is meaningful to the
application, in order to maximize the depth of the aliveness check.
For example, for an HTTPS connection to a simple webserver,
HTTP-level keepalives would test more layers of functionality than
TLS-level keepalives. However, for a webserver that is accessed via a
load-balancer that terminates TLS connections, TLS-level aliveness
checks may be the most meaningful checks that can be performed.
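As a rough illustration of an HTTP-level check, here is a Python sketch using the standard library's `http.client`. It assumes the server answers a `HEAD` request on a hypothetical health-check path and treats any 2xx or 3xx status as "alive". It accepts an `http.client.HTTPSConnection` as well, since that class derives from `HTTPConnection`; a successful reply has passed through the TCP, TLS, and HTTP processing on the responding side.

```python
import http.client


def http_aliveness_check(conn: http.client.HTTPConnection, path: str = "/") -> bool:
    """Probe a persistent HTTP(S) connection with a HEAD request.

    `path` is a hypothetical endpoint; a deployment would use whatever
    lightweight resource the server designates for this purpose.
    """
    try:
        conn.request("HEAD", path)
        response = conn.getresponse()
        response.read()  # drain the (empty) body so the connection can be reused
    except (OSError, http.client.HTTPException):
        return False  # reset, timeout, or malformed reply: treat as not alive
    return 200 <= response.status < 400
```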
More generally, it is RECOMMENDED that applications be able to
perform aliveness checks at every protocol layer that benefits from
them, but suppress aliveness checks at lower protocol layers when
there is sufficient activity at higher protocol layers.
Keepalives at a layer SHOULD NOT be interpreted as implying state at
any other layer.
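The suppression rule can be sketched as a timer that sends a lower-layer probe only after a quiet period. This Python sketch is hypothetical: `send_probe` stands in for whatever emits the layer's keepalive, `idle_interval` is an assumed tuning value in seconds, and the injectable `clock` exists only to make the logic easy to test.

```python
import time
from typing import Callable


class KeepaliveTimer:
    """Lower-layer keepalive that stays silent while higher layers are busy."""

    def __init__(
        self,
        send_probe: Callable[[], None],
        idle_interval: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._send_probe = send_probe
        self._idle_interval = idle_interval
        self._clock = clock
        self._last_activity = clock()

    def note_activity(self) -> None:
        """Call whenever a higher layer sends or receives traffic."""
        self._last_activity = self._clock()

    def tick(self) -> bool:
        """Call periodically; sends a probe only after a quiet period.

        A successful probe says nothing about state at any other layer.
        """
        now = self._clock()
        if now - self._last_activity >= self._idle_interval:
            self._send_probe()
            self._last_activity = now  # the probe itself restarts the timer
            return True
        return False
```

In use, the higher layer calls `note_activity()` from its send and receive paths, and an event loop calls `tick()` every few seconds; under steady traffic the lower layer never probes.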
In order to ensure that aliveness checks can occur at any given
protocol layer, it is RECOMMENDED that protocol designers always
include an aliveness check mechanism in the protocol and, for
client/server protocols, that either device be able to initiate the
aliveness check, as sometimes the "server" is the initiator of the
underlying networking connection (e.g., RFC 8071).
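A symmetric aliveness mechanism can be sketched as a PING/PONG exchange that both endpoints handle identically. In this hypothetical Python sketch, messages are plain dicts with `type` and `id` keys, and `send` is an assumed callback that writes onto the connection. Because `Peer` has no client or server role, a device that dialed out (as in RFC 8071) can both start probes and answer them exactly like its counterpart.

```python
from typing import Callable, Dict

# Hypothetical message types; a real protocol defines its own encoding.
PING = "ping"
PONG = "pong"


class Peer:
    """An endpoint that can both initiate and answer aliveness checks."""

    def __init__(self, send: Callable[[dict], None]) -> None:
        self._send = send  # assumed callback that writes to the other peer
        self._next_id = 0
        self.confirmed: Dict[int, bool] = {}  # probe id -> reply received?

    def start_check(self) -> int:
        """Send a probe and return its id so the caller can track it."""
        self._next_id += 1
        self.confirmed[self._next_id] = False
        self._send({"type": PING, "id": self._next_id})
        return self._next_id

    def handle(self, msg: dict) -> None:
        """Process an incoming message, regardless of who opened the connection."""
        if msg["type"] == PING:
            self._send({"type": PONG, "id": msg["id"]})  # echo the probe id
        elif msg["type"] == PONG and msg["id"] in self.confirmed:
            self.confirmed[msg["id"]] = True  # the other side answered
```

Wiring two `Peer` objects back to back, with each one's `send` appending to the other's inbox, lets either side call `start_check()` and see its probe id flip to `True` once the reply is handled.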
Some protocol stacks have a secure transport protocol layer (e.g.,
TLS, SSH, DTLS) that sits on top of a cleartext protocol layer (e.g.,
TCP, UDP). In such cases, it is RECOMMENDED that the aliveness check
occur within the protection envelope afforded by the secure transport
protocol layer; the aliveness checks SHOULD NOT occur via the
underlying cleartext protocol layer, as an adversary can block
aliveness check messages, and inject fake ones, in either direction.
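To make the last point concrete, here is a minimal Python sketch assuming a hypothetical application-level `PING`/`PONG` exchange. The probe is written to an `ssl.SSLSocket`, so it is carried inside TLS records, and the helper refuses to run on a plain socket. By contrast, enabling `SO_KEEPALIVE` on the underlying TCP socket would produce probes outside that envelope.

```python
import socket
import ssl

# Hypothetical application-level probe and reply; a real protocol would
# define its own message format.
PING = b"PING\n"
PONG = b"PONG\n"


def protected_aliveness_check(sock: socket.socket, timeout: float = 5.0) -> bool:
    """Send the probe inside the TLS session and wait for the reply."""
    # Refuse cleartext sockets: a probe sent there could be blocked or
    # forged by an adversary, which is what the text warns against.
    if not isinstance(sock, ssl.SSLSocket):
        raise ValueError("aliveness checks must run inside the TLS session")
    sock.settimeout(timeout)
    try:
        sock.sendall(PING)
        reply = b""
        while len(reply) < len(PONG):
            chunk = sock.recv(len(PONG) - len(reply))
            if not chunk:  # peer closed the TLS session
                return False
            reply += chunk
    except OSError:  # covers ssl.SSLError and timeouts
        return False
    return reply == PONG
```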