The default is set to 2 hours for good reasons.
First, only stateful firewalls and NAT gateways care about timing out
idle TCP connections. There are no such devices in the core Internet
routing infrastructure. They are only found very close to the endpoints
on both sides of the TCP connection, and the local network
administrators should have full control over their configuration. You
will only run into the problem of TCP connections timing out when these
devices are either misconfigured or the load requires them to be
configured that way.
Second, the default avoids wasting bandwidth and system resources on
servers with a large number of open and mostly idle TCP connections with
the SO_KEEPALIVE option enabled. For example, if a server has 100,000
such TCP connections, and tcp_keepalive_time is set to 10 minutes on
the server, there will be over 166 keep-alive packets per second. That's
a huge waste of system resources and network bandwidth.
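
As a rough check of the figure above, here is a minimal sketch of the
arithmetic. It assumes every connection stays idle and that probes are
spread evenly over time; the function name is made up for illustration,
and only probes sent by the server are counted, not the replies.

```python
def keepalive_probes_per_second(idle_connections: int,
                                keepalive_time_s: float) -> float:
    """Average number of keep-alive probes the server sends per second.

    Each idle connection triggers one probe every keepalive_time_s
    seconds, so the aggregate rate is connections / interval.
    """
    if keepalive_time_s <= 0:
        raise ValueError("keepalive_time_s must be positive")
    return idle_connections / keepalive_time_s


if __name__ == "__main__":
    # Compare the 10-minute setting from the example with the 2-hour default.
    for seconds in (600, 7200):
        rate = keepalive_probes_per_second(100_000, seconds)
        print(f"tcp_keepalive_time={seconds}s: {rate:.1f} probes/s")
```

With the 2-hour default, the same 100,000 connections produce roughly a
twelfth of that rate.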
Third, it conserves power on battery-powered devices, such as
network-enabled sensors and smartphones. Every time a network packet
arrives at these devices, they have to wake up, process the packet and
then go back to sleep. Waking a smartphone up once every 2 hours isn't a
big deal; that's only 12 times a day, probably fewer than the number of
times most people check their phones per day. However, if the phone has
to wake up every minute, its battery will struggle to make it to
lunchtime.
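
The wake-up counts can be worked out the same way. This is only a
sketch for a single idle connection, and the helper name is an
assumption made for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def wakeups_per_day(keepalive_time_s: float) -> float:
    """Keep-alive induced wake-ups per day for one idle connection."""
    if keepalive_time_s <= 0:
        raise ValueError("keepalive_time_s must be positive")
    return SECONDS_PER_DAY / keepalive_time_s


if __name__ == "__main__":
    print(wakeups_per_day(7200))  # 2-hour default
    print(wakeups_per_day(60))    # one probe per minute
```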
For the above reasons, my recommendation is that if you ever run into a
TCP connection timeout issue, first look at whether you can fix the
problem at the source - the firewall and NAT. Talk to your local
network admin, and ensure the timeouts on these devices are not
misconfigured to an unnecessarily low value. If the (peak) load allows
the timeout to be increased, you should always prefer to do this. Only
decrease tcp_keepalive_time on the server if that's not possible,
because this is not a free lunch.
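
If a shorter keep-alive time really is needed, one way to limit the
cost is to set it only on the sockets that cross the firewall instead
of changing the system-wide sysctl. The following is a sketch of that
idea in Python on Linux, not something taken from the thread: the
function name and the 600-second value are assumptions, and whether
your application exposes its sockets this way is a separate question.

```python
import socket


def enable_keepalive(sock: socket.socket, idle_s: int = 600,
                     interval_s: int = 75, probes: int = 9) -> None:
    """Enable TCP keep-alive on one socket with its own timing.

    idle_s is the per-socket counterpart of tcp_keepalive_time.
    interval_s and probes mirror the Linux defaults for
    tcp_keepalive_intvl and tcp_keepalive_probes.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # The TCP_KEEP* constants are platform-specific (present on Linux),
    # so each one is applied only where the socket module defines it.
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_s)
    if hasattr(socket, "TCP_KEEPINTVL"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval_s)
    if hasattr(socket, "TCP_KEEPCNT"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    enable_keepalive(s, idle_s=600)
    print(s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE))
    s.close()
```

Sockets that do not call this keep following the system-wide
tcp_keepalive_time, so the extra probes are only paid for where they
are needed.
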
On 30/01/2022 15:40, Troels Arvin wrote:
Hello,
Manish Khandelwal wrote:
/The issue was *tcp_keepalive_time* has the default value
(7200 seconds). So once the idle connection is broken by the
firewall, the application (Cassandra node) was getting notified
very late. Thus we were seeing one node sending merkle tree and
other not receiving it. Reducing it to 60 solved the problem./
Thanks for following up with a solution.
I find it somewhat frustrating that Linux keeps having such a strange
default /tcp_keepalive_time/ value. I've tried suggesting that it be
changed, but without luck:
https://lists.openwall.net/netdev/2021/05/17/61
--
Regards,
Troels Arvin