Re: Hanging repairs in Cassandra

Bowen Song Sun, 30 Jan 2022 10:08:46 -0800

The default is set to 2 hours for good reasons.

First, only stateful firewalls and NAT gateways care about timing out idle TCP connections. There are no such devices in the core Internet routing infrastructure. They are found only very close to the endpoints on either side of a TCP connection, and the local network administrators should have full control over their configuration. You will only run into the TCP connection timeout problem when these devices are either misconfigured or the load requires them to be configured that way.

Second, to avoid wasting bandwidth and system resources on servers with a large number of open, mostly idle TCP connections that have the SO_KEEPALIVE option enabled. For example, if a server has 100,000 such TCP connections and tcp_keepalive_time is set to 10 minutes on the server, there will be over 166 keep-alive packets per second. That's a huge waste of system resources and network bandwidth.
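The packet-rate figure above is simple division: each idle connection sends one keep-alive probe per tcp_keepalive_time interval. A minimal sketch in Python, assuming every connection stays fully idle and counting only the probes the server sends (not the peers' ACKs); the function name is hypothetical:

```python
def keepalive_probes_per_second(connections: int, keepalive_time_s: float) -> float:
    """Approximate keep-alive probes sent per second by one host.

    Assumes every connection stays idle, so each one emits a single probe
    per keepalive_time_s interval; the peers' ACKs are not counted.
    """
    return connections / keepalive_time_s


# 100,000 idle connections with tcp_keepalive_time = 10 minutes (600 s)
print(f"{keepalive_probes_per_second(100_000, 600):.1f}")   # 166.7
# The same connections with the Linux default of 2 hours (7200 s)
print(f"{keepalive_probes_per_second(100_000, 7200):.1f}")  # 13.9
```

Counting the ACK coming back for each probe roughly doubles the packet count on the wire.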

Third, to conserve power on battery-powered devices, such as network-enabled sensors and smartphones. Every time a network packet arrives at one of these devices, it has to wake up, process the packet and then go back to sleep. Waking a smartphone up once every 2 hours isn't a big deal; that's only 12 times a day, probably fewer than the number of times most people check their phones per day. However, if the phone has to wake up every minute, its battery will struggle to make it to lunchtime.
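The wake-up counts follow from dividing the seconds in a day by the keep-alive interval. A small sketch, assuming the device is otherwise idle and that each keep-alive exchange wakes it exactly once (the function name is hypothetical):

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def wakeups_per_day(keepalive_interval_s: float) -> float:
    """Wake-ups per day caused by keep-alive traffic alone.

    Assumes the device is otherwise idle and every keep-alive exchange
    wakes it exactly once.
    """
    return SECONDS_PER_DAY / keepalive_interval_s


print(wakeups_per_day(7200))  # 12.0   (default: every 2 hours)
print(wakeups_per_day(60))    # 1440.0 (every minute)
```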

For the above reasons, my recommendation is that if you ever run into a TCP connection timeout issue, you first look at whether you can fix the problem at its source, the firewall or NAT gateway. Talk to your local network admin and ensure the timeout on these devices is not misconfigured to an unnecessarily low value. If the (peak) load allows the timeout to be increased, you should always prefer to do this. Only decrease tcp_keepalive_time on the server if that's not possible, because this is not a free lunch.
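On Linux the system-wide setting is the net.ipv4.tcp_keepalive_time sysctl, with net.ipv4.tcp_keepalive_intvl (default 75 s) and net.ipv4.tcp_keepalive_probes (default 9) controlling the retries after the first unanswered probe. An application can instead override these per socket with TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT, which leaves every other connection on the host at the default. The Python sketch below is Linux-oriented and illustrative only (Cassandra is a Java application and is not configured this way); the helper names are hypothetical, and the detection-time formula is a rough approximation that ignores connections with unacknowledged data in flight, where TCP retransmission timeouts apply instead.

```python
import socket


def enable_keepalive(sock: socket.socket, idle_s: int, interval_s: int, probes: int) -> None:
    """Turn on TCP keep-alive for one socket with per-socket timings (Linux)."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Idle time before the first probe (overrides net.ipv4.tcp_keepalive_time).
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_s)
    # Gap between unanswered probes (overrides net.ipv4.tcp_keepalive_intvl).
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval_s)
    # Unanswered probes before the connection is reset (overrides net.ipv4.tcp_keepalive_probes).
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)


def worst_case_detection_s(idle_s: int, interval_s: int, probes: int) -> int:
    """Roughly how long a silently dropped idle connection goes unnoticed."""
    return idle_s + interval_s * probes


# Linux defaults: 7200 + 75 * 9 = 7875 s (over 2 hours)
print(worst_case_detection_s(7200, 75, 9))  # 7875
# tcp_keepalive_time lowered to 60 s, other defaults kept: 60 + 675 = 735 s
print(worst_case_detection_s(60, 75, 9))    # 735

# TCP_KEEPIDLE is a Linux option and may be missing on other platforms.
if hasattr(socket, "TCP_KEEPIDLE"):
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        enable_keepalive(s, idle_s=60, interval_s=75, probes=9)
        print(s.getsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE))  # 60
```

The per-socket route keeps the extra keep-alive traffic confined to the connections that actually cross the problematic firewall, in line with the "not a free lunch" caveat above.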



On 30/01/2022 15:40, Troels Arvin wrote:


Hello,

Manish Khandelwal wrote:


    The issue was that tcp_keepalive_time had the default value
    (7200 seconds). So once the idle connection was broken by the
    firewall, the application (Cassandra node) was notified
    very late. Thus we were seeing one node sending the Merkle tree and
    the other not receiving it. Reducing it to 60 solved the problem.

Thanks for following up with a solution.

I find it somewhat frustrating that Linux keeps such a strange default tcp_keepalive_time value. I've tried suggesting that it be changed, but without luck:


https://lists.openwall.net/netdev/2021/05/17/61

--
Regards,
Troels Arvin
