> On 16 Aug 2018, at 09:28, Mikael Abrahamsson <[email protected]> wrote:
>
> On Wed, 15 Aug 2018, Kent Watsen wrote:
>
>> You bring up an interesting point, it goes to the motivation for wanting to
>> do keepalives in the first place. The text doesn't yet mention maintain
>> flow state as a motivation.
>
> It's not only to maintain flow state, it's also to close the connection when
> the network goes down and doesn't work anymore, and "give up" on connections
> that doesn't work anymore (for some definition of "anymore").
>
> I have operationally been in the situation where a server/client application
> was implemented so that the server could only handle 256 connections (some
> filedescriptor limit). Every time the firewall was rebooted, lost state, the
> connection hung around forever. So the server administrators had to go in and
> restart the process to clear these connections, otherwise there were 256 hung
> connections and no new connections could be established.
>
> Sometimes the other endpoint goes down, and doesn't come back. We will for
> instance deploy home gateways probably keeping netconf-call-home sessions to
> an NMS, and we want them to be around forever, as long as they work. TCP
> level keepalives would solve this, as if the customer just powers off the
> device, after a while the session will be cleared. Using TCP keepalives here
> means you get this kind of behaviour even if the upper-layer application
> doesn't support it (netconf might have been a bad example here). It's a
> single socket option to set, so it's very easy to do.
>
> From knowing approximately what settings people have in their NAT44 and
> firewalls etc, I'd say the recommendation should be that keepalives are set
> to around 60-300 second interval, and then kill the connection if no traffic
> has passed in 3-5 of these intervals, kill the connection. Otherwise TCP will
> have backed off so far anyway, that it's probably faster to just re-try the
> connection instead of waiting for TCP to re-send the packet.
>
> I have seen so many times in my 20 years working in networking where lack of
> keepalives have caused all kinds of problems. I wish everybody would turn it
> on and keep it on.
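
For anyone who wants to try the "single socket option" route, here is a rough Python sketch of what that could look like. The helper name enable_keepalive and the defaults (60 s idle, 60 s between probes, 4 probes) are only my assumptions, picked from inside the 60-300 second / 3-5 interval range suggested above. The TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT names are what Linux exposes; other platforms may lack some of them, in which case the sketch falls back to the system-wide timers.

```python
import socket


def enable_keepalive(sock, idle=60, interval=60, count=4):
    """Enable TCP keepalives on sock and tune the timers where possible.

    idle, interval and count are illustrative defaults, not values
    taken from any specification.
    """
    # The portable part: one socket option switches keepalives on.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)

    # The per-socket timers are platform specific. These are the Linux
    # names; skip any that this platform's socket module does not expose.
    tuning = (
        ("TCP_KEEPIDLE", idle),       # seconds of silence before the first probe
        ("TCP_KEEPINTVL", interval),  # seconds between unanswered probes
        ("TCP_KEEPCNT", count),       # unanswered probes before the connection is dropped
    )
    for name, value in tuning:
        option = getattr(socket, name, None)
        if option is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)
    return sock


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    enable_keepalive(s)
    print("keepalive enabled:", bool(s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE)))
    s.close()
```

With these assumed values a peer that has silently disappeared should be detected after roughly idle + count * interval = 300 seconds, which keeps hung sessions from piling up against a file descriptor limit.
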

As more and more connections flow over mobile networks, keepalives seem more and more important, even for flows where you did not expect to need them. I have to send keepalives over IPv6 connections: not for NAT, as on IPv4, but for middlebox devices that have an interesting approach and attitude towards connection management. ;-)

The SIP Outbound RFC has a lot of reasoning behind keep-alives for connection failover and may be good input here.

https://tools.ietf.org/html/rfc5626

/O
