On Wed, Aug 15, 2018 at 10:35 AM, Kent Watsen <kwat...@juniper.net> wrote:
>
> Below is an updated version of some text that we might roll into
> a statement or an I-D of some sort.  Kindly review and provide
> suggestions for improvement, or support for the text as is, if
> that is the case.  ;)
>
> This update accommodates comments from:
>   - Wesley Eddy & David Black
>      - removed "layers of functionality" verbiage
>      - moved footnote into the body of the document (this had
>        a cascading effect, and why it looks so different now)
>   - Joe Touch
>      - keepalives should occur at *all* layers that benefit
>      - keepalives at a layer should be suppressed in the
>        presence of sufficient traffic from higher layers
>      - keepalives at a layer should not be interpreted as
>        implying state at any other layer
>
> This update does not accommodate comments from:
>   - Michael Abrahamsson & Tom Herbert
>      - no statement added to promote TCP keepalives
>         * note: I believe this to be unnecessary because
>           the current text doesn't ever say to not use TCP.
>      - no statement added for tuning params (e.g., timeouts).
>         * note: we could add this, but it will increase the
>           scope of the document - do we want to do this?
>
> Cheers!
> Kent
>
>
> ===== START =====
>
> # Connection Strategies for Long-lived Connections
>
> A networked device may have an ongoing need to interact with a remote
> device. Sometimes the need arises from wanting to push data to the
> remote device, and sometimes the need arises from wanting to check if
> there is any data the remote device may have pending to deliver to
> it.
>
> There are two fundamental network connection strategies that can be
> used to accomplish this goal: 1) a single long-lived connection and
> 2) a sequence of short-lived connections.
>
> A single long-lived connection is most common, as it is
> straightforward to implement and directly answers the question of
> whether the "connection" is established. However, long-lived connections
> require more system resources, which may affect scalability, and
> require the initiator of the connection to periodically test the
> aliveness of the remote device, discussed further in the next
> section.
>
> A sequence of short-lived connections is less common, as there is an
> additional implementation effort, as well as concerns such as: 1) the
> delay of the remote device needing to wait until the connection is
> reestablished in order to deliver pending data, and 2) the additional
> latency incurred from starting new connections, especially when
> cryptography is involved. However, short-lived connections do not
> require keepalives and are arguably more secure, as each device is
> forced to re-authenticate the other and reload all related
> access-control policies on each connection.
>
> For networking sessions that are primarily quiet, and the use case
> can cope with the additional latency of waiting for and starting new
> connections, it is RECOMMENDED to use a sequence of short-lived
> connections, instead of maintaining a single long-lived connection
> using aliveness checks.
>
>
> # Keepalives for Persistent Connections
>
> When the initiator of a networking session needs to maintain a
> long-lived connection, it is necessary for it to periodically test
> the aliveness of the remote device. In such cases, it is RECOMMENDED
> that the aliveness check happens at the highest protocol layer
> possible that is meaningful to the application, in order to maximize
> the depth of the aliveness check.
>
> For example, for an HTTPS connection to a simple webserver,
> HTTP-level keepalives would test more layers of functionality than
> TLS-level keepalives. However, for a webserver that is accessed via a
> load-balancer that terminates TLS connections, TLS-level aliveness
> checks may be the most meaningful check that can be performed.
>
> More generally, it is RECOMMENDED that applications be able to
> perform the aliveness checks at all protocol levels that benefit, but
> suppress the aliveness checks at lower protocol layers from occurring
> when there is sufficient activity at higher protocol layers.
> Keepalives at a layer SHOULD NOT be interpreted as implying state at
> any other layer.
>
> In order to ensure aliveness checks can occur at any given protocol
> layer, it is RECOMMENDED that protocol designers always include an
> aliveness check mechanism in the protocol and, for client/server
> protocols, that the aliveness check can be initiated from either
> device, as sometimes the "server" is the initiator of the underlying
> networking connection (e.g., RFC 8071).
>
> Some protocol stacks have a secure transport protocol layer (e.g.,
> TLS, SSH, DTLS) that sits on top of a cleartext protocol layer (e.g.,
> TCP, UDP). In such cases, it is RECOMMENDED that the aliveness check
> occurs within the protection envelope afforded by the secure transport
> protocol layer; the aliveness checks SHOULD NOT occur via the
> underlying cleartext protocol layer, as an adversary can block
> aliveness check messages in either direction and send fake aliveness
> check messages in either direction.
>
I think the statement is missing a primary purpose of keepalives,
maybe the most important one, which is to maintain flow state in NATs
and firewalls and prevent eviction by timeout or LRU.
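
To make the NAT case concrete, here is a minimal sketch of turning on
TCP keepalives for a socket so that an idle flow still produces
periodic packets. The function name and the default timer values are
my own assumptions for illustration (there is no standard NAT timeout
to derive them from), and the per-socket timer options are
platform-specific, so each one is guarded.

```python
import socket


def enable_tcp_keepalive(sock, idle_s=60, interval_s=10, probes=3):
    """Turn on TCP keepalives; tune timers where the platform exposes them.

    The defaults are illustrative only, not recommended values.
    """
    # SO_KEEPALIVE itself is widely available.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # The timer knobs below are not available everywhere, so check first.
    if hasattr(socket, "TCP_KEEPIDLE"):
        # Seconds of idle time before the first probe is sent.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_s)
    if hasattr(socket, "TCP_KEEPINTVL"):
        # Seconds between unanswered probes.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval_s)
    if hasattr(socket, "TCP_KEEPCNT"):
        # Unanswered probes before the connection is declared dead.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)
    return sock
```

Where the timer options are missing, the socket falls back to the
system-wide keepalive defaults, which may well be longer than the
timeout of a middlebox on the path.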

Also, any meaningful discussion or statement about keepalives should
include considerations on the frequency of keepalives and their cost.

Keepalives themselves carry no meaningful end-user data; they are
purely management overhead. The higher the frequency of keepalives,
the higher the overhead and hence the more network resources they
consume. At some point they can become a source of congestion,
especially when keepalive timers become synchronized across a network
as I previously pointed out. Unfortunately, there is no standard for
how NAT state eviction is done and no standard NAT timeout, so the
frequency of keepalives to prevent NAT state eviction is probably
higher than it should be (hence more network overhead).
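
As a rough sketch of the two points above, the following shows one way
to randomize keepalive timers so they do not synchronize, plus a
back-of-the-envelope estimate of the aggregate load. The function
names, the 20% jitter default, and the uniform distribution are
assumptions made for illustration; this is not taken from any
particular stack. The jitter only shortens the interval, so a delay
never exceeds the base value chosen to stay under a middlebox timeout.

```python
import random


def next_keepalive_delay(base_s, jitter=0.2, rng=random):
    """Return a randomized delay between base_s * (1 - jitter) and base_s."""
    if not 0.0 <= jitter < 1.0:
        raise ValueError("jitter must be in [0, 1)")
    # rng.random() is in [0, 1), so the delay is never longer than base_s.
    return base_s * (1.0 - jitter * rng.random())


def keepalive_load_bps(connections, bytes_per_exchange, interval_s):
    """Average keepalive traffic in bits per second across all connections."""
    return connections * bytes_per_exchange * 8 / interval_s
```

With made-up numbers, a million connections each exchanging 100 bytes
every 30 seconds works out to roughly 26.7 Mbit/s of traffic that
carries no user data, and halving the interval doubles it.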

In terms of cost, consider the effects of waking up the transmitter on
a smart phone periodically just for the purpose of keeping connections
up. With a high enough frequency this will drain the battery quickly.
In fact, one of the touted benefits of IPv6 was supposed to be that
NAT isn't present so there is no need for periodic keepalives to
maintain NAT state and hence this would conserve power on mobile
devices. Use of keepalives in power-constrained devices is a real
issue.

Tom

>
