Hi folks, I've been seeing a lot of "client session timed out" messages in my server logs recently. Normally those wouldn't concern me too much, since there are occasional network outages. However, some of the session timeout values are abnormally large.
I've been using 40s as the session timeout for my ZooKeeper sessions, and the client's session establishment logs confirm 40s as the negotiated timeout. However, I sometimes see timeout logs reporting times far longer than 40s, e.g. the one in the title.

Reading ZooKeeper's source code (I'm using v3.4.4), it seems there's no way the clientCnxnSocket.getIdleRecv() value could lead to a reported session timeout of more than 2/3 * sessionTimeout (which is 26.66s in my case). My theory is this: say the ZooKeeper client receives a ping from the server at time t, and ClientCnxn.SendThread schedules the next doTransport() at t + 26.66s. Then the worst that could happen is that nothing arrives from the server for 26.66s, and I'd get a session timed out message with something like 26667ms, which is quite common during network outages.

However, sometimes I'm getting these >40s and even >100s timeouts, and I just can't understand them. Any clues on how these can happen?

Best Regards,
Martin Kou
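
P.S. To make the arithmetic concrete, here is a minimal sketch of how I understand the client-side check. This is my own simplified model, not ZooKeeper's actual code: the class and method names (ReadTimeoutSketch, readTimeoutMs, isTimedOut) are made up for illustration, and I'm assuming the read timeout is computed with integer arithmetic as sessionTimeout * 2 / 3 and then compared against the idle time that getIdleRecv() reports.

```java
// Simplified model of the read-timeout check as I understand it.
// All names here are hypothetical; this is not ZooKeeper's real code.
final class ReadTimeoutSketch {

    // Assumption: the read timeout is 2/3 of the negotiated session
    // timeout, computed with integer (truncating) arithmetic.
    static int readTimeoutMs(int negotiatedSessionTimeoutMs) {
        return negotiatedSessionTimeoutMs * 2 / 3;
    }

    // Assumption: the client gives up once the time since the last
    // received packet (idleRecvMs) reaches the read timeout.
    static boolean isTimedOut(int readTimeoutMs, int idleRecvMs) {
        return readTimeoutMs - idleRecvMs <= 0;
    }

    public static void main(String[] args) {
        int sessionTimeoutMs = 40_000; // my negotiated session timeout
        int readTimeoutMs = readTimeoutMs(sessionTimeoutMs);
        System.out.println("readTimeout = " + readTimeoutMs + "ms"); // 26666ms

        // Idle times (ms) to test against the read timeout.
        int[] idleSamples = {10_000, 26_665, 26_667};
        for (int idle : idleSamples) {
            System.out.println("idle " + idle + "ms -> timed out: "
                    + isTimedOut(readTimeoutMs, idle));
        }
    }
}
```

Under this model, the check trips as soon as the idle time reaches about 26.67s, so I would expect the logged value to sit just above that, not above 40s.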
