Hi
I am observing the following error in my app based on the C zookeeper client.
The app is completely idle at all times and it has just created a connection
with the zookeeper server running on the same machine with a session timeout of
5000ms.
2010-12-04 10:14:45,593:8710(0x7f641136d910):zoo_w...@zookeeper_interest@1461:
Exceeded deadline by 25317ms
2010-12-04
10:14:45,593:8710(0x7f641136d910):zoo_er...@handle_socket_error_msg@1528:
Socket
[127.0.0.1:9876] zk retcode=-7, errno=110(Connection timed out): connection
timed out (exceeded timeout by 23651ms)
On adding some debug logs, I found that the delay happens in the poll() system
call used in the zookeeper I/O thread (mt_zookeeper.c). I am using a session
timeout of 5 seconds. Thus poll gets called with a timeout of ~1666 ms. However
after running the app for a while, poll gets stuck for a duration much higher
than the specified timeout. Here are the logs which illustrate this problem:
2010-12-04 23:45:50,487:3813(0x7f356fc54910):zoo_de...@do_io@303: called zk
interest
2010-12-04 23:45:50,487:3813(0x7f356fc54910):zoo_de...@do_io@311: calling poll
with timeout 1666ms
2010-12-04 23:46:16,619:3813(0x7f356fc54910):zoo_de...@do_io@313: out of poll
2010-12-04 23:46:16,619:3813(0x7f356fc54910):zoo_de...@do_io@325: calling
zookeeper_process
2010-12-04 23:46:16,619:3813(0x7f356fc54910):zoo_de...@do_io@327: out of
zookeeper_process
2010-12-04 23:46:16,619:3813(0x7f356fc54910):zoo_de...@do_io@301: calling zk
interest
As you can see the poll call returned after ~26 seconds while it was passed a
timeout of 1.6s.
To see if this problem was related to poll() I replaced it with select calls.
The app ran for a much longer time after this change., however I eventually hit
the same problem again:
2010-12-08
10:59:13,718:20624(0x7f74abe18910):zoo_de...@zookeeper_interest@1468:
Time since last receive: 0ms
2010-12-08 10:59:13,718:20624(0x7f74abe18910):zoo_de...@do_io@306: calling
select with timeout 1666ms
2010-12-08 10:59:18,065:20624(0x7f74abe18910):zoo_de...@do_io@308: out of
select
2010-12-08 10:59:18,065:20624(0x7f74abe18910):zoo_de...@do_io@310: select
returned with 0 ready fds
2010-12-08 10:59:18,065:20624(0x7f74abe18910):zoo_de...@do_io@327: calling
zookeeper_process
2010-12-08 10:59:18,065:20624(0x7f74abe18910):zoo_de...@do_io@329: out of
zookeeper_process
2010-12-08 10:59:18,065:20624(0x7f74abe18910):zoo_w...@zookeeper_interest@1465:
Exceeded deadline by 2682ms
2010-12-08
10:59:18,065:20624(0x7f74abe18910):zoo_de...@zookeeper_interest@1468:
Time since last receive: 4348ms
2010-12-08 10:59:18,065:20624(0x7f74abe18910):zoo_w...@zookeeper_interest@1472:
Nothing received from server since last: 4348ms
2010-12-08
10:59:18,065:20624(0x7f74abe18910):zoo_er...@handle_socket_error_msg@1540:
Socket [127.0.0.1:9876] zk retcode=-7, errno=110(Connection timed out):
connection timed out (exceeded timeout by 1015ms)
In the above run, select was stuck for ~5 seconds inspite of passing a timeout
of 1.6s.
I am running this on Ubuntu 10.10 with kernel version 2.6.35. Has anyone else
observed anything like this? I was looking around for any posts related to
issues with poll/select on linux, but havent found anything.
Thanks for your help!
Regards,
Tabrez