The default is set to 2 hours for good reasons.

First, only stateful firewalls and NAT gateways care about timing out idle TCP connections. There are no such devices in the core Internet routing infrastructure. They are found only very close to the endpoints on both sides of the TCP connection, and the local network administrators should have full control over their configuration. You will only run into TCP connections timing out when these devices are either misconfigured or under enough load that they have to be configured that way.

Second, it avoids wasting bandwidth and system resources on servers with a large number of open and mostly idle TCP connections that have the SO_KEEPALIVE option enabled. For example, if a server has 100,000 such TCP connections and tcp_keepalive_time is set to 10 minutes on the server, it will send over 166 keep-alive packets per second. That's a huge waste of system resources and network bandwidth.
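The arithmetic behind that figure can be sketched as follows. This is a rough estimate, assuming every connection is completely idle and sends exactly one probe per tcp_keepalive_time interval; it counts only the probes sent by the server, not the ACKs that come back. The function name is made up for illustration.

```python
def keepalive_probes_per_second(idle_connections, keepalive_time_s):
    """Average probes sent per second, assuming one probe per idle
    connection per keepalive interval (an idealised model)."""
    return idle_connections / keepalive_time_s


# 100,000 idle connections, tcp_keepalive_time = 10 minutes
print(keepalive_probes_per_second(100_000, 10 * 60))    # roughly 166.7

# Same server with the 2-hour default
print(keepalive_probes_per_second(100_000, 2 * 60 * 60))  # roughly 13.9
```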

Third, it conserves power on battery-powered devices, such as network-enabled sensors and smartphones. Every time a network packet arrives at one of these devices, it has to wake up, process the packet and then go back to sleep. Waking a smartphone up once every 2 hours isn't a big deal; that's only 12 times a day, probably fewer than the number of times most people check their phones per day. However, if the phone has to wake up every minute, its battery will struggle to make it to lunchtime.
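To put numbers on the two cases, here is a small sketch that counts wake-ups per day for a single idle connection. It assumes one wake-up per keepalive interval and nothing else waking the device; the function name is hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def wakeups_per_day(keepalive_time_s):
    """Wake-ups per day caused by keepalive probes on one idle
    connection, assuming one wake-up per keepalive interval."""
    return SECONDS_PER_DAY // keepalive_time_s


print(wakeups_per_day(2 * 60 * 60))  # 2-hour default: 12
print(wakeups_per_day(60))           # 1 minute: 1440
```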

Because of the above reasons, my recommendation is that if you ever run into TCP connection timeout issues, first look at whether you can fix the problem at the source: the firewall and NAT. Talk to your local network admin, and ensure the timeouts on these devices are not misconfigured to unnecessarily low values. If the (peak) load allows the timeout to be increased, you should always prefer to do this. Only decrease tcp_keepalive_time on the server if that's not possible, because this is not a free lunch.
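If the timeout really has to come down, one way to limit the cost is to lower it only on the sockets that need it rather than system-wide. Below is a minimal sketch, assuming the application controls its own sockets and runs on Linux, where the TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT socket options override the sysctl defaults per connection. The helper name and the example values are mine, not a recommendation for any particular application.

```python
import socket


def enable_keepalive(sock, idle_s=600, interval_s=75, probes=9):
    """Turn on SO_KEEPALIVE for one socket and, where the platform
    exposes them, override the keepalive timers for that socket only.
    The default values here are illustrative assumptions."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # These three options exist on Linux; other platforms may lack them.
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_s)
    if hasattr(socket, "TCP_KEEPINTVL"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval_s)
    if hasattr(socket, "TCP_KEEPCNT"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)


# Example: apply the settings to a socket before connecting.
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
enable_keepalive(s, idle_s=600)
s.close()
```

Connections that do not pass through the firewall or NAT in question keep the system default and pay no extra cost.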


On 30/01/2022 15:40, Troels Arvin wrote:

Hello,

Manish Khandelwal wrote:

    /The issue was //*tcp_keepalive_time*// has the default value
    (7200 seconds). So once the idle connection is broken by the
    firewall, the application (Cassandra node) was getting notified
    very late.  Thus we were seeing one node sending merkle tree and
    other not receiving it. Reducing it to 60 solved the problem./

Thanks for following up with a solution.

I find it somewhat frustrating that Linux keeps having such a strange default /tcp_keepalive_time/ value. I've tried suggesting that it be changed, but without luck:

https://lists.openwall.net/netdev/2021/05/17/61

--
Regards,
Troels Arvin
