Also,
 Which OS is this running on?

Thanks
mahadev


On 12/20/10 4:57 AM, "Henry Robinson" <[email protected]> wrote:

> Hi Tabrez -
> 
> Just wondering - are the number of file descriptors that are being polled
> going up rapidly? Are you able to capture that data?
> 
> Henry
> 
> On 8 December 2010 23:42, Tabrez Memon <[email protected]> wrote:
> 
>> Hi
>> 
>> I am observing the following error in my app based on the C zookeeper
>> client.
>> The app is completely idle at all times and it has just created a
>> connection
>> with the zookeeper server running on the same machine with a session
>> timeout of
>> 5000ms.
>> 
>> 2010-12-04 10:14:45,593:8710(0x7f641136d910):zoo_w...@zookeeper_interest
>> @1461:
>> Exceeded deadline by 25317ms
>> 2010-12-04
>> 10:14:45,593:8710(0x7f641136d910):zoo_er...@handle_socket_error_msg@1528:
>> Socket
>> [127.0.0.1:9876] zk retcode=-7, errno=110(Connection timed out):
>> connection
>> timed out (exceeded timeout by 23651ms)
>> 
>> On adding some debug logs, I found that the delay happens in the poll()
>> system
>> call used in the zookeeper I/O thread (mt_zookeeper.c). I am using a
>> session
>> timeout of 5 seconds. Thus poll gets called with a timeout of ~1666 ms.
>> However
>> after running the app for a while, poll gets stuck for a duration much
>> higher
>> than the specified timeout. Here are the logs which illustrate this
>> problem:
>> 
>> 2010-12-04 23:45:50,487:3813(0x7f356fc54910):zoo_de...@do_io@303: called
>> zk
>> interest
>> 2010-12-04 23:45:50,487:3813(0x7f356fc54910):zoo_de...@do_io@311: calling
>> poll
>> with timeout 1666ms
>> 2010-12-04 23:46:16,619:3813(0x7f356fc54910):zoo_de...@do_io@313: out of
>> poll
>> 2010-12-04 23:46:16,619:3813(0x7f356fc54910):zoo_de...@do_io@325: calling
>> zookeeper_process
>> 2010-12-04 23:46:16,619:3813(0x7f356fc54910):zoo_de...@do_io@327: out of
>> zookeeper_process
>> 2010-12-04 23:46:16,619:3813(0x7f356fc54910):zoo_de...@do_io@301: calling
>> zk
>> interest
>> 
>> As you can see the poll call returned after ~26 seconds while it was passed
>> a
>> timeout of 1.6s.
>> 
>> To see if this problem was related to poll() I replaced it with select
>> calls.
>> The app ran for a much longer time after this change., however I eventually
>> hit
>> the same problem again:
>> 
>> 2010-12-08 10:59:13,718:20624(0x7f74abe18910):zoo_de...@zookeeper_interest
>> @1468:
>> Time since last receive: 0ms
>> 2010-12-08 10:59:13,718:20624(0x7f74abe18910):zoo_de...@do_io@306: calling
>> select with timeout 1666ms
>> 2010-12-08 10:59:18,065:20624(0x7f74abe18910):zoo_de...@do_io@308: out of
>> select
>> 2010-12-08 10:59:18,065:20624(0x7f74abe18910):zoo_de...@do_io@310: select
>> returned with 0 ready fds
>> 2010-12-08 10:59:18,065:20624(0x7f74abe18910):zoo_de...@do_io@327: calling
>> zookeeper_process
>> 2010-12-08 10:59:18,065:20624(0x7f74abe18910):zoo_de...@do_io@329: out of
>> zookeeper_process
>> 2010-12-08 10:59:18,065:20624(0x7f74abe18910):zoo_w...@zookeeper_interest
>> @1465:
>> Exceeded deadline by 2682ms
>> 2010-12-08 10:59:18,065:20624(0x7f74abe18910):zoo_de...@zookeeper_interest
>> @1468:
>> Time since last receive: 4348ms
>> 2010-12-08 10:59:18,065:20624(0x7f74abe18910):zoo_w...@zookeeper_interest
>> @1472:
>> Nothing received from server since last: 4348ms
>> 2010-12-08
>> 10:59:18,065:20624(0x7f74abe18910):zoo_er...@handle_socket_error_msg@1540:
>> Socket [127.0.0.1:9876] zk retcode=-7, errno=110(Connection timed out):
>> connection timed out (exceeded timeout by 1015ms)
>> 
>> In the above run, select was stuck for ~5 seconds inspite of passing a
>> timeout
>> of 1.6s.
>> 
>> I am running this on Ubuntu 10.10 with kernel version 2.6.35. Has anyone
>> else
>> observed anything like this? I was looking around for any posts related to
>> issues with poll/select on linux, but havent found anything.
>> 
>> Thanks for your help!
>> Regards,
>> Tabrez
>> 
>> 
>> 
> 
> 
> 
> 
> --
> Henry Robinson
> Software Engineer
> Cloudera
> 415-994-6679
> 

Reply via email to