Hi, Kent,
I think the recommendations miss a few aspects of my suggestions:
- there is NEVER a good reason to assume that keepalives should happen at the
“highest level” of anything;
keepalives are needed *at EVERY level* where endpoint state needs to be
actively (rather than passively) maintained
- I agree it’s not helpful to assume that layers can coordinate on keepalives,
but they don’t need to; keepalives at lower levels simply wouldn’t be triggered
if there is sufficient traffic at those layers driven by upper layer
keepalives. Specifically, this means that there is NEVER a good reason to avoid
implementing keepalives at a layer where they are needed, e.g., because of
potential interaction with higher level keepalives. Such interaction is
resolved automatically.
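
To make the "resolved automatically" point concrete, here is a minimal sketch
in Python. It assumes each layer runs its own idle timer and that an upper
layer keepalive shows up as ordinary traffic at the layer below. The names
(IdleKeepalive, simulate) and the one-second tick are made up for illustration
and are not taken from any real protocol stack.

```python
class IdleKeepalive:
    """Per-layer idle timer: emit a keepalive only after `interval` idle seconds."""

    def __init__(self, interval):
        self.interval = interval
        self.last_activity = 0
        self.sent = 0

    def on_traffic(self, now):
        # Any traffic seen at this layer resets the idle timer, including
        # traffic that exists only because a higher layer sent a keepalive.
        self.last_activity = now

    def tick(self, now):
        # Returns True if this layer had to send its own keepalive.
        if now - self.last_activity >= self.interval:
            self.sent += 1
            self.last_activity = now
            return True
        return False


def simulate(upper_interval, lower_interval, duration):
    """Run an otherwise silent connection; return (upper_sent, lower_sent).

    upper_interval=None models an upper layer that sends no keepalives.
    """
    lower = IdleKeepalive(lower_interval)
    upper = IdleKeepalive(upper_interval) if upper_interval is not None else None
    for now in range(1, duration + 1):
        # An upper layer keepalive is carried by the lower layer, so the
        # lower layer sees it as traffic. No explicit coordination is needed.
        if upper is not None and upper.tick(now):
            lower.on_traffic(now)
        lower.tick(now)
    upper_sent = upper.sent if upper is not None else 0
    return upper_sent, lower.sent


if __name__ == "__main__":
    print(simulate(10, 30, 120))    # frequent upper keepalives
    print(simulate(None, 30, 120))  # no upper keepalives at all
    print(simulate(60, 30, 120))    # upper keepalives too sparse for the lower layer
```

In this toy model the lower layer stays silent whenever the upper layer is
chatty enough, and fills in on its own when it is not, without either layer
knowing about the other's timer.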
So the point, IMO, is that:
- EACH layer that needs keepalives MUST implement them for itself
- there is NEVER a reason to disable or suppress keepalives at any
layer to “reduce traffic” due to keepalives at higher layers
- although keepalives can be useful for state that decays when that
state matters, keep in mind that not all state decays and not all such state
matters
It’s often still a surprise to many that TCP connections aren’t
“cleaned up” when not in use; they’re cleaned up ONLY when old state is in the
way of new state.
That’s a feature, not a bug.
As others have pointed out, there’s also no reason to jump to the conclusion
that short, restarted connections are better - or worse - than keepalives. The
difference depends on the amount of effort required to maintain state vs
re-establishing it (including the need to recycle connection identifiers).
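
As a rough illustration of that trade-off, the sketch below compares the two
strategies for one idle period. It is Python, and every parameter is a
hypothetical cost in arbitrary units. It deliberately ignores things like the
memory held by idle state and the cost of recycling connection identifiers,
which a real comparison would have to include.

```python
def cheaper_strategy(idle_seconds, keepalive_interval, keepalive_cost, reconnect_cost):
    """Compare keeping a connection alive across an idle period vs. re-establishing it.

    All costs are hypothetical and in the same arbitrary unit (e.g., messages
    or CPU time); pick whatever matters for the deployment in question.
    """
    # Number of keepalives needed to bridge the idle period.
    keepalives_needed = int(idle_seconds // keepalive_interval)
    maintain_cost = keepalives_needed * keepalive_cost
    if maintain_cost < reconnect_cost:
        return "keepalive"
    if maintain_cost > reconnect_cost:
        return "reconnect"
    return "either"


if __name__ == "__main__":
    # Short idle gap, expensive handshake: maintaining state is cheaper.
    print(cheaper_strategy(300, 30, 1, 50))
    # Ten idle hours with the same costs: re-establishing is cheaper.
    print(cheaper_strategy(36000, 30, 1, 50))
```

The answer flips purely on the numbers plugged in, which is the point: neither
approach is better in general.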
Joe
> On Aug 15, 2018, at 10:35 AM, Kent Watsen <[email protected]> wrote:
>
>
> Below is an updated version of some text that we might roll into
> a statement or an I-D of some sort. Kindly review and provide
> suggestions for improvement, or support for the text as is, if
> that is the case. ;)
>
> This update accommodates comments from:
> - Wesley Eddy & David Black
> - removed "layers of functionality" verbiage
> - moved footnote into the body of the document (this had
> a cascading effect, and why it looks so different now)
> - Joe Touch
> - keepalives should occur at *all* layers that benefit
> - keepalives at a layer should be suppressed in the
> presence of sufficient traffic from higher layers
> - keepalives at a layer should not be interpreted as
> implying state at any other layer
>
> This update does not accommodate comments from:
> - Michael Abrahamsson & Tom Herbert
> - no statement added to promote TCP keepalives
> * note: I believe this to be unnecessary because
> the current text doesn't ever say to not use TCP.
> - no statement added for tuning params (e.g., timeouts).
> * note: we could add this, but it will increase the
> scope of the document - do we want to do this?
>
> Cheers!
> Kent
>
>
> ===== START =====
>
> # Connection Strategies for Long-lived Connections
>
> A networked device may have an ongoing need to interact with a remote
> device. Sometimes the need arises from wanting to push data to the
> remote device, and sometimes the need arises from wanting to check if
> there is any data the remote device may have pending to deliver to
> it.
>
> There are two fundamental network connection strategies that can be
> used to accomplish this goal: 1) a single long-lived connection and
> 2) a sequence of short-lived connections.
>
> A single long-lived connection is most common, as it is
> straightforward to implement and directly answers the question of
> whether the "connection" is established. However, long-lived connections
> require more system resources, which may affect scalability, and
> require the initiator of the connection to periodically test the
> aliveness of the remote device, discussed further in the next
> section.
>
> A sequence of short-lived connections is less common, as there is an
> additional implementation effort, as well as concerns such as: 1) the
> delay of the remote device needing to wait until the connection is
> reestablished in order to deliver pending data, and 2) the additional
> latency incurred from starting new connections, especially when
> cryptography is involved. However, short-lived connections do not
> require keepalives and are arguably more secure, as each device is
> forced to re-authenticate the other and reload all related
> access-control policies on each connection.
>
> For networking sessions that are primarily quiet, and the use case
> can cope with the additional latency of waiting for and starting new
> connections, it is RECOMMENDED to use a sequence of short-lived
> connections, instead of maintaining a single long-lived connection
> using aliveness checks.
>
>
> # Keepalives for Persistent Connections
>
> When the initiator of a networking session needs to maintain a
> long-lived connection, it is necessary for it to periodically test
> the aliveness of the remote device. In such cases, it is RECOMMENDED
> that the aliveness check happens at the highest protocol layer
> possible that is meaningful to the application, in order to maximize
> the depth of the aliveness check.
>
> For example, for an HTTPS connection to a simple webserver,
> HTTP-level keepalives would test more layers of functionality than
> TLS-level keepalives. However, for a webserver that is accessed via a
> load-balancer that terminates TLS connections, TLS-level aliveness
> checks may be the most meaningful check that can be performed.
>
> More generally, it is RECOMMENDED that applications be able to
> perform the aliveness checks at all protocol levels that benefit, but
> suppress the aliveness checks at lower protocol layers from occurring
> when there is sufficient activity at higher protocol layers.
> Keepalives at a layer SHOULD NOT be interpreted as implying state at
> any other layer.
>
> In order to ensure aliveness checks can occur at any given protocol
> layer, it is RECOMMENDED that protocol designers always include an
> aliveness check mechanism in the protocol and, for client/server
> protocols, that the aliveness check can be initiated from either
> device, as sometimes the "server" is the initiator of the underlying
> networking connection (e.g., RFC 8071).
>
> Some protocol stacks have a secure transport protocol layer (e.g.,
> TLS, SSH, DTLS) that sits on top of a cleartext protocol layer (e.g.,
> TCP, UDP). In such cases, it is RECOMMENDED that the aliveness check
> occurs within the protection envelope afforded by the secure transport
> protocol layer; the aliveness checks SHOULD NOT occur via the
> underlying cleartext protocol layer, as an adversary can block
> aliveness check messages in either direction and send fake aliveness
> check messages in either direction.
>
>