Hi

I am observing the following error in my app based on the C zookeeper client. 
The app is completely idle at all times and it has just created a connection 
with the zookeeper server running on the same machine with a session timeout of 
5000ms.

2010-12-04 10:14:45,593:8710(0x7f641136d910):zoo_w...@zookeeper_interest@1461: 
Exceeded deadline by 25317ms
2010-12-04 
10:14:45,593:8710(0x7f641136d910):zoo_er...@handle_socket_error_msg@1528: 
Socket 
[127.0.0.1:9876] zk retcode=-7, errno=110(Connection timed out): connection 
timed out (exceeded timeout by 23651ms)

On adding some debug logs, I found that the delay happens in the poll() system 
call used in the zookeeper I/O thread (mt_zookeeper.c). I am using a session 
timeout of 5 seconds. Thus poll gets called with a timeout of ~1666 ms. However 
after running the app for a while, poll gets stuck for a duration much higher 
than the specified timeout. Here are the logs which illustrate this problem:

2010-12-04 23:45:50,487:3813(0x7f356fc54910):zoo_de...@do_io@303: called zk 
interest
2010-12-04 23:45:50,487:3813(0x7f356fc54910):zoo_de...@do_io@311: calling poll 
with timeout 1666ms
2010-12-04 23:46:16,619:3813(0x7f356fc54910):zoo_de...@do_io@313: out of poll
2010-12-04 23:46:16,619:3813(0x7f356fc54910):zoo_de...@do_io@325: calling 
zookeeper_process
2010-12-04 23:46:16,619:3813(0x7f356fc54910):zoo_de...@do_io@327: out of 
zookeeper_process
2010-12-04 23:46:16,619:3813(0x7f356fc54910):zoo_de...@do_io@301: calling zk 
interest

As you can see the poll call returned after ~26 seconds while it was passed a 
timeout of 1.6s.

To see if this problem was related to poll() I replaced it with select calls. 
The app ran for a much longer time after this change., however I eventually hit 
the same problem again:

2010-12-08 
10:59:13,718:20624(0x7f74abe18910):zoo_de...@zookeeper_interest@1468: 
Time since last receive: 0ms 
2010-12-08 10:59:13,718:20624(0x7f74abe18910):zoo_de...@do_io@306: calling 
select with timeout 1666ms 
2010-12-08 10:59:18,065:20624(0x7f74abe18910):zoo_de...@do_io@308: out of 
select 
2010-12-08 10:59:18,065:20624(0x7f74abe18910):zoo_de...@do_io@310: select 
returned with 0 ready fds 
2010-12-08 10:59:18,065:20624(0x7f74abe18910):zoo_de...@do_io@327: calling 
zookeeper_process 
2010-12-08 10:59:18,065:20624(0x7f74abe18910):zoo_de...@do_io@329: out of 
zookeeper_process 
2010-12-08 10:59:18,065:20624(0x7f74abe18910):zoo_w...@zookeeper_interest@1465: 
Exceeded deadline by 2682ms 
2010-12-08 
10:59:18,065:20624(0x7f74abe18910):zoo_de...@zookeeper_interest@1468: 
Time since last receive: 4348ms 
2010-12-08 10:59:18,065:20624(0x7f74abe18910):zoo_w...@zookeeper_interest@1472: 
Nothing received from server since last: 4348ms 
2010-12-08 
10:59:18,065:20624(0x7f74abe18910):zoo_er...@handle_socket_error_msg@1540: 
Socket [127.0.0.1:9876] zk retcode=-7, errno=110(Connection timed out): 
connection timed out (exceeded timeout by 1015ms) 

In the above run, select was stuck for ~5 seconds inspite of passing a timeout 
of 1.6s.

I am running this on Ubuntu 10.10 with kernel version 2.6.35. Has anyone else 
observed anything like this? I was looking around for any posts related to 
issues with poll/select on linux, but havent found anything.

Thanks for your help!
Regards,
Tabrez


      

Reply via email to