Re: statement regarding keepalives

Gorry Fairhurst Thu, 16 Aug 2018 00:55:19 -0700

Adding some comments here. I'm playing catch-up, so I may have comments on some things that have been fixed, and missed others.


On 16/08/2018, 08:28, Mikael Abrahamsson wrote:

On Wed, 15 Aug 2018, Kent Watsen wrote:
You bring up an interesting point, it goes to the motivation for wanting to do keepalives in the first place. The text doesn't yet mention maintaining flow state as a motivation.
It's not only to maintain flow state, it's also to close the connection when the network goes down and doesn't work anymore, and "give up" on connections that don't work anymore (for some definition of "anymore").
I have operationally been in the situation where a server/client application was implemented so that the server could only handle 256 connections (some file descriptor limit). Every time the firewall was rebooted and lost state, the connections hung around forever. So the server administrators had to go in and restart the process to clear these connections, otherwise there were 256 hung connections and no new connections could be established.
Sometimes the other endpoint goes down, and doesn't come back. We will for instance deploy home gateways probably keeping netconf-call-home sessions to an NMS, and we want them to be around forever, as long as they work. TCP level keepalives would solve this, as if the customer just powers off the device, after a while the session will be cleared. Using TCP keepalives here means you get this kind of behaviour even if the upper-layer application doesn't support it (netconf might have been a bad example here). It's a single socket option to set, so it's very easy to do.
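As an illustration of the "single socket option" mentioned above, here is a minimal Python sketch. The helper name enable_keepalive is hypothetical, and the probe timers are left at the operating system's defaults, which on many systems are far longer than the intervals discussed in this thread.

```python
import socket


def enable_keepalive(sock: socket.socket) -> None:
    """Turn on TCP keepalive probes for an existing socket.

    Only SO_KEEPALIVE is set here; idle time, probe interval and
    probe count remain whatever the OS defaults are.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)


# Usage: the option can be set before or after connect().
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
enable_keepalive(sock)
# Some platforms report a non-zero value other than 1, so compare with 0.
keepalive_on = sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0
sock.close()
```

Because the option lives on the socket, the application protocol above it needs no changes, which is the point made in the quoted text.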

Agree. I think, looking at the transport layer, that allowing a flow to continue to use existing "network" state (in various forms) is an important aspect - there are NATs, Firewalls, QoS Classifiers, etc. as well as load balancers, and layer 2/3's that take resource decisions at the flow level. Normally all of these do the correct thing when there is a continuous flow of packets.

Somewhere in the thread I also saw a statement that suggested that associations should be short-lived - if that advice is carried to the transport layer, I would expect it to have serious impact on the performance for some paths! (There are important trade-offs here, and we should not make sweeping assumptions).

From knowing approximately what settings people have in their NAT44 and firewalls etc, I'd say the recommendation should be that keepalives are set to around a 60-300 second interval, and that the connection is killed if no traffic has passed in 3-5 of these intervals. Otherwise TCP will have backed off so far anyway, that it's probably faster to just re-try the connection instead of waiting for TCP to re-send the packet.
I have seen so many times in my 20 years working in networking where lack of keepalives has caused all kinds of problems. I wish everybody would turn it on and keep it on.
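One possible mapping of the suggested numbers onto per-socket TCP options is sketched below in Python. The function names and the default values (60 s idle, 60 s between probes, 4 probes) are assumptions picked from within the quoted ranges, not a recommendation from this thread. TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT are not available on every platform, so the sketch only sets the ones the local socket module exposes.

```python
import socket


def configure_keepalive(sock, idle_s=60, interval_s=60, probes=4):
    """Enable keepalives and set the timers where the platform allows.

    Returns a dict of the timer options that were actually applied.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    applied = {}
    wanted = (
        ("TCP_KEEPIDLE", idle_s),       # idle time before the first probe
        ("TCP_KEEPINTVL", interval_s),  # time between unanswered probes
        ("TCP_KEEPCNT", probes),        # unanswered probes before giving up
    )
    for name, value in wanted:
        opt = getattr(socket, name, None)
        if opt is not None:  # skip options this platform does not define
            sock.setsockopt(socket.IPPROTO_TCP, opt, value)
            applied[name] = value
    return applied


def approx_detection_time_s(idle_s, interval_s, probes):
    """Rough time from last traffic until a dead peer is given up on."""
    return idle_s + interval_s * probes


sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
applied = configure_keepalive(sock)
sock.close()
```

With these assumed values a silent peer is given up on after roughly 60 + 4 * 60 = 300 seconds, i.e. within "3-5 intervals" of a 60 second keepalive.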

I agree. I have the feeling that this is not at all easy advice to get correct in a general way (and this thread isn't quite there yet). E.g., RFC5245 set lower limits for timers - because that was thought important.

I don't agree that protocol stacks with a secure transport protocol layer (e.g., TLS, SSH, DTLS) that sits on top of a cleartext protocol layer (e.g., TCP, UDP) should be advised to do the aliveness check only within the protection envelope afforded by the secure transport protocol layer - to me that seems entirely wrong - it has the same "issue" as above: it depends on the function of the aliveness check and the way this is used by the layer's protocol machine. In many cases it is absolutely desirable to do this within the layer that needs this information. Passing the detailed state down between layers can be most awkward. Higher layers can make their own decisions - and suppress keep-alives or reaffirm state.

Guidance from the transport perspective on timers is in RFC8085, section 3.1.1; there is also more advice in the "behave" RFCs and a summary of the mechanisms in RFC8085 3.5 (noted by Lars) .... The vulnerabilities are also noted in RFC8085, and I think we should be clear to differentiate between on-path versus off-path knowledge when understanding this.


Gorry
