Hi, Kent,

I think the recommendations miss a few aspects of my suggestions:

- there is NEVER a good reason to assume that keepalives should happen at the 
“highest level” of anything;
keepalives are needed *at EVERY level* where endpoint state needs to be 
actively (rather than passively) maintained

- I agree it’s not helpful to assume that layers can coordinate on keepalives, 
but they don’t need to; keepalives at lower levels simply wouldn’t be triggered 
if there is sufficient traffic at those layers driven by upper layer 
keepalives. Specifically, this means that there is NEVER a good reason to avoid 
implementing keepalives at a layer where they are needed, e.g., because of 
potential interaction with higher level keepalives. Such interaction is 
resolved automatically.
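
A minimal sketch of why no coordination is needed, assuming each layer keeps
its own idle timer and that any traffic sent at a layer also passes through
every layer below it. The Layer class, the timeout values, and the one-second
clock are illustrative assumptions, not taken from any real protocol stack:

```python
class Layer:
    """One protocol layer with its own idle-triggered keepalive timer."""

    def __init__(self, name, idle_timeout, lower=None):
        self.name = name
        self.idle_timeout = idle_timeout  # seconds of silence before a keepalive
        self.lower = lower                # next layer down, or None
        self.last_activity = 0.0
        self.keepalives_sent = 0

    def send(self, now):
        """Carry traffic at this layer; it also traverses every layer below."""
        self.last_activity = now
        if self.lower is not None:
            self.lower.send(now)

    def tick(self, now):
        """Send a keepalive only if this layer has been idle for too long."""
        if now - self.last_activity >= self.idle_timeout:
            self.keepalives_sent += 1
            self.send(now)


def simulate(duration, layers):
    """Advance a one-second clock and let every layer check its own timer."""
    for now in range(1, duration + 1):
        for layer in layers:
            layer.tick(float(now))


# Hypothetical timeouts: the upper layer probes more often than the lower one.
tcp = Layer("tcp", idle_timeout=60)
tls = Layer("tls", idle_timeout=30, lower=tcp)
simulate(600, [tls, tcp])
print(tls.keepalives_sent, tcp.keepalives_sent)  # prints "20 0"
```

In this run the lower layer never sends a keepalive of its own, because the
upper layer's keepalives keep resetting its idle timer. Nothing was disabled
and the layers never talked to each other about it. If the upper layer's
timeout is made longer than the lower layer's, the lower layer simply starts
sending its own keepalives in the gaps.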

So the point, IMO, is that:
        - EACH layer that needs keepalives MUST implement them for itself
        - there is NEVER a reason to disable or suppress keepalives at any 
layer to “reduce traffic” due to keepalives at higher layers
        - although keepalives can be useful for state that decays when that 
state matters, keep in mind that not all state decays and not all such state 
matters
                It’s often still a surprise to many that TCP connections aren’t 
“cleaned up” when not in use; they’re cleaned up ONLY when old state is in the 
way of new state
                That’s a feature, not a bug.

As others have pointed out, there’s also no reason to jump to the conclusion 
that short, restarted connections are better - or worse - than keepalives. The 
difference depends on the amount of effort required to maintain state vs 
re-establishing it (including the need to recycle connection identifiers).
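
As a rough illustration of that tradeoff, the comparison can be written as
simple arithmetic. This is only a sketch: the function names, the cost
parameters, and their arbitrary units are all assumptions made for the
example, and real costs (including identifier recycling, folded here into
teardown_cost) would have to be measured for a given protocol:

```python
import math


def keepalive_cost(idle_seconds, interval, cost_per_probe,
                   holding_cost_per_second=0.0):
    """Cost of holding a connection open across an idle period."""
    probes = math.floor(idle_seconds / interval)
    return probes * cost_per_probe + idle_seconds * holding_cost_per_second


def reconnect_cost(setup_cost, teardown_cost=0.0):
    """Cost of dropping the connection and re-establishing it later."""
    return setup_cost + teardown_cost


def cheaper_strategy(idle_seconds, interval, cost_per_probe, setup_cost,
                     holding_cost_per_second=0.0, teardown_cost=0.0):
    """Return which strategy costs less for one idle period."""
    keep = keepalive_cost(idle_seconds, interval, cost_per_probe,
                          holding_cost_per_second)
    redo = reconnect_cost(setup_cost, teardown_cost)
    if keep < redo:
        return "keepalive"
    if keep > redo:
        return "reconnect"
    return "either"


# Hypothetical numbers: probes cost 1 unit, a fresh handshake costs 20 units.
print(cheaper_strategy(300, 60, 1.0, 20.0))    # short idle gap
print(cheaper_strategy(86400, 60, 1.0, 20.0))  # day-long idle gap
```

With these made-up numbers the short gap favors keepalives and the day-long
gap favors reconnecting, which is the point: neither approach wins in
general.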

Joe

> On Aug 15, 2018, at 10:35 AM, Kent Watsen <kwat...@juniper.net> wrote:
> 
> 
> Below is an updated version of some text that we might roll into
> a statement or an I-D of some sort.  Kindly review and provide 
> suggestions for improvement, or support for the text as is, if
> that is the case.  ;)
> 
> This update accommodates comments from:
>  - Wesley Eddy & David Black
>     - removed "layers of functionality" verbiage
>     - moved footnote into the body of the document (this had
>       a cascading effect, and why it looks so different now)
>  - Joe Touch
>     - keepalives should occur at *all* layers that benefit
>     - keepalives at a layer should be suppressed in the 
>       presence of sufficient traffic from higher layers
>     - keepalives at a layer should not be interpreted as
>       implying state at any other layer
> 
> This update does not accommodate comments from:
>  - Michael Abrahamsson & Tom Herbert
>     - no statement added to promote TCP keepalives
>        * note: I believe this to be unnecessary because 
>          the current text doesn't ever say to not use TCP.
>     - no statement added for tuning params (e.g., timeouts).
>        * note: we could add this, but it will increase the
>          scope of the document - do we want to do this?
> 
> Cheers!
> Kent
> 
> 
> ===== START =====
> 
> # Connection Strategies for Long-lived Connections
> 
> A networked device may have an ongoing need to interact with a remote
> device. Sometimes the need arises from wanting to push data to the
> remote device, and sometimes the need arises from wanting to check if
> there is any data the remote device may have pending to deliver to
> it.
> 
> There are two fundamental network connection strategies that can be
> used to accomplish this goal: 1) a single long-lived connection and
> 2) a sequence of short-lived connections.
> 
> A single long-lived connection is most common, as it is
> straightforward to implement and directly answers the question of 
> whether the "connection" is established. However, long-lived connections
> require more system resources, which may affect scalability, and
> require the initiator of the connection to periodically test the
> aliveness of the remote device, discussed further in the next 
> section.
> 
> A sequence of short-lived connections is less common, as there is an
> additional implementation effort, as well as concerns such as: 1) the
> delay of the remote device needing to wait until the connection is
> reestablished in order to deliver pending data, and 2) the additional
> latency incurred from starting new connections, especially when
> cryptology is involved. However, short-lived connections do not
> require keepalives and are arguably more secure, as each device is
> forced to re-authenticate the other and reload all related
> access-control policies on each connection.
> 
> For networking sessions that are primarily quiet, and the use case
> can cope with the additional latency of waiting for and starting new
> connections, it is RECOMMENDED to use a sequence of short-lived
> connections, instead of maintaining a single long-lived connection
> using aliveness checks.
> 
> 
> # Keepalives for Persistent Connections
> 
> When the initiator of a networking session needs to maintain a
> long-lived connection, it is necessary for it to periodically test
> the aliveness of the remote device. In such cases, it is RECOMMENDED
> that the aliveness check happens at the highest protocol layer
> possible that is meaningful to the application, in order to maximize
> the depth of the aliveness check.
> 
> For example, for an HTTPS connection to a simple webserver,
> HTTP-level keepalives would test more layers of functionality than
> TLS-level keepalives. However, for a webserver that is accessed via a
> load-balancer that terminates TLS connections, TLS-level aliveness
> checks may be the most meaningful check that can be performed.
> 
> More generally, it is RECOMMENDED that applications be able to
> perform the aliveness checks at all protocol levels that benefit, but
> suppress the aliveness checks at lower protocol layers from occurring
> when there is sufficient activity at higher protocol layers.
> Keepalives at a layer SHOULD NOT be interpreted as implying state at
> any other layer.
> 
> In order to ensure aliveness checks can occur at any given protocol
> layer, it is RECOMMENDED that protocol designers always include an
> aliveness check mechanism in the protocol and, for client/server
> protocols, that the aliveness check can be initiated from either
> device, as sometimes the "server" is the initiator of the underlying
> networking connection (e.g., RFC 8071).
> 
> Some protocol stacks have a secure transport protocol layer (e.g.,
> TLS, SSH, DTLS) that sits on top of a cleartext protocol layer (e.g.,
> TCP, UDP). In such cases, it is RECOMMENDED that the aliveness check
> occurs within the protection envelope afforded by the secure transport
> protocol layer; the aliveness checks SHOULD NOT occur via the
> underlying cleartext protocol layer, as an adversary can block
> aliveness check messages in either direction and send fake aliveness
> check messages in either direction.
> 
> 
