Also, Which OS is this running on? Thanks mahadev
On 12/20/10 4:57 AM, "Henry Robinson" <[email protected]> wrote: > Hi Tabrez - > > Just wondering - are the number of file descriptors that are being polled > going up rapidly? Are you able to capture that data? > > Henry > > On 8 December 2010 23:42, Tabrez Memon <[email protected]> wrote: > >> Hi >> >> I am observing the following error in my app based on the C zookeeper >> client. >> The app is completely idle at all times and it has just created a >> connection >> with the zookeeper server running on the same machine with a session >> timeout of >> 5000ms. >> >> 2010-12-04 10:14:45,593:8710(0x7f641136d910):zoo_w...@zookeeper_interest >> @1461: >> Exceeded deadline by 25317ms >> 2010-12-04 >> 10:14:45,593:8710(0x7f641136d910):zoo_er...@handle_socket_error_msg@1528: >> Socket >> [127.0.0.1:9876] zk retcode=-7, errno=110(Connection timed out): >> connection >> timed out (exceeded timeout by 23651ms) >> >> On adding some debug logs, I found that the delay happens in the poll() >> system >> call used in the zookeeper I/O thread (mt_zookeeper.c). I am using a >> session >> timeout of 5 seconds. Thus poll gets called with a timeout of ~1666 ms. >> However >> after running the app for a while, poll gets stuck for a duration much >> higher >> than the specified timeout. Here are the logs which illustrate this >> problem: >> >> 2010-12-04 23:45:50,487:3813(0x7f356fc54910):zoo_de...@do_io@303: called >> zk >> interest >> 2010-12-04 23:45:50,487:3813(0x7f356fc54910):zoo_de...@do_io@311: calling >> poll >> with timeout 1666ms >> 2010-12-04 23:46:16,619:3813(0x7f356fc54910):zoo_de...@do_io@313: out of >> poll >> 2010-12-04 23:46:16,619:3813(0x7f356fc54910):zoo_de...@do_io@325: calling >> zookeeper_process >> 2010-12-04 23:46:16,619:3813(0x7f356fc54910):zoo_de...@do_io@327: out of >> zookeeper_process >> 2010-12-04 23:46:16,619:3813(0x7f356fc54910):zoo_de...@do_io@301: calling >> zk >> interest >> >> As you can see the poll call returned after ~26 seconds while it was passed >> a >> timeout of 1.6s. >> >> To see if this problem was related to poll() I replaced it with select >> calls. >> The app ran for a much longer time after this change., however I eventually >> hit >> the same problem again: >> >> 2010-12-08 10:59:13,718:20624(0x7f74abe18910):zoo_de...@zookeeper_interest >> @1468: >> Time since last receive: 0ms >> 2010-12-08 10:59:13,718:20624(0x7f74abe18910):zoo_de...@do_io@306: calling >> select with timeout 1666ms >> 2010-12-08 10:59:18,065:20624(0x7f74abe18910):zoo_de...@do_io@308: out of >> select >> 2010-12-08 10:59:18,065:20624(0x7f74abe18910):zoo_de...@do_io@310: select >> returned with 0 ready fds >> 2010-12-08 10:59:18,065:20624(0x7f74abe18910):zoo_de...@do_io@327: calling >> zookeeper_process >> 2010-12-08 10:59:18,065:20624(0x7f74abe18910):zoo_de...@do_io@329: out of >> zookeeper_process >> 2010-12-08 10:59:18,065:20624(0x7f74abe18910):zoo_w...@zookeeper_interest >> @1465: >> Exceeded deadline by 2682ms >> 2010-12-08 10:59:18,065:20624(0x7f74abe18910):zoo_de...@zookeeper_interest >> @1468: >> Time since last receive: 4348ms >> 2010-12-08 10:59:18,065:20624(0x7f74abe18910):zoo_w...@zookeeper_interest >> @1472: >> Nothing received from server since last: 4348ms >> 2010-12-08 >> 10:59:18,065:20624(0x7f74abe18910):zoo_er...@handle_socket_error_msg@1540: >> Socket [127.0.0.1:9876] zk retcode=-7, errno=110(Connection timed out): >> connection timed out (exceeded timeout by 1015ms) >> >> In the above run, select was stuck for ~5 seconds inspite of passing a >> timeout >> of 1.6s. >> >> I am running this on Ubuntu 10.10 with kernel version 2.6.35. Has anyone >> else >> observed anything like this? I was looking around for any posts related to >> issues with poll/select on linux, but havent found anything. >> >> Thanks for your help! >> Regards, >> Tabrez >> >> >> > > > > > -- > Henry Robinson > Software Engineer > Cloudera > 415-994-6679 >
