Hi Craig,

I am sorry you guys have been running into trouble with ZooKeeper. Have you filed a JIRA ticket where we can track the issues you are seeing? That is how we track and schedule (human) resources for bug fixing :)

Thanks!
Niklas

On 4 March 2015 at 13:18, <[email protected]> wrote:

> hi again mesos users and devs,
> In the prior post i left with description of hanging program with mesos
> zookeeper c++ api and wondered about enhancement to not wait indefinitely
> when underlying zookeeper responses dont occur.
> At that time i thought perhaps the underlying zookeeper and/or its C
> binding might not be responding up to the mesos api callers.
> So, while the question is still outstanding, I now see that potentially
> the hanging issue is with the mesos implementation over zookeeper c binding.
> In particular i've now tried a similar scenario just with zookeeper c
> binding api.
> That is, do zk aget/complete from within a watcher for events for the
> CHANGED event from a prior aset/complete.
> i dont see any blocking indefinitely and both the aget and aset
> completions are invoked and finish.
>
> Unless i'm not reproducing this properly, what i determine is a bad
> behavior from the mesos c++ api.
> Somehow the mesos c++ zookeeper api implementation is getting itself into
> pthread condition waits with nothing to notify and break the waits.
> this seems to occur with get calls from a Watcher on CHANGED events.
>
> craig
>
>
> -------- Original Message --------
> From: [email protected]
> Apparently from: [email protected]
> To: [email protected]
> Subject: mesos c++ zookeeper blocks indefinately -- any plans to enhance?
> Date: Wed, 4 Mar 2015 10:05:54 -0500
>
> > hi mesos users and devs,
> > We've observed that that the mesos 0.22.0-rc1 c++ zookeeper code appears
> > to allow indefinite waits on responses.
> > This leads to application hangs blocked inside mesos zookeeper calls.
> > This can happen with a properly running zookeeper presumably able to
> > make all responses.
> >
> > Heres how we hung it for eg.
> > We issue a mesos zk set via
> >
> > int ZooKeeper::set ( const std::string & path,
> >                      const std::string & data,
> >                      int version
> >                    )
> >
> > then inside a Watcher we process on CHANGED event to issue a mesos zk
> > get on the same path via
> >
> > int ZooKeeper::get ( const std::string & path,
> >                      bool watch,
> >                      std::string * result,
> >                      Stat * stat
> >                    )
> >
> > we end up with two threads in the process both in pthread_cond_waits
> > #0 0x000000334e20b43c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
> > #1 0x00007f6664ee1cf5 in Gate::arrive (this=0x7f6140, old=0)
> >    at ../../../3rdparty/libprocess/src/gate.hpp:82
> > #2 0x00007f6664ecef6e in process::ProcessManager::wait (this=0x7f02e0, pid=...)
> >    at ../../../3rdparty/libprocess/src/process.cpp:2476
> > #3 0x00007f6664ed2ce9 in process::wait (pid=..., duration=...)
> >    at ../../../3rdparty/libprocess/src/process.cpp:2958
> > #4 0x00007f6664e90558 in process::Latch::await (this=0x7f6ba0, duration=...)
> >    at ../../../3rdparty/libprocess/src/latch.cpp:49
> > #5 0x00007f66649452cc in process::Future<int>::await (this=0x7fffa0fd9040, duration=...)
> >    at ../../3rdparty/libprocess/include/process/future.hpp:1156
> > #6 0x00007f666493a04d in process::Future<int>::get (this=0x7fffa0fd9040)
> >    at ../../3rdparty/libprocess/include/process/future.hpp:1167
> > #7 0x00007f6664ab1aac in ZooKeeper::set (this=0x803ce0, path="/craig/mo", data=
> > ...
> >
> > and
> > #0 0x000000334e20b43c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
> > #1 0x00007f6664ee1cf5 in Gate::arrive (this=0x7f66380013f0, old=0)
> >    at ../../../3rdparty/libprocess/src/gate.hpp:82
> > #2 0x00007f6664ecef6e in process::ProcessManager::wait (this=0x7f02e0, pid=...)
> >    at ../../../3rdparty/libprocess/src/process.cpp:2476
> > #3 0x00007f6664ed2ce9 in process::wait (pid=..., duration=...)
> >    at ../../../3rdparty/libprocess/src/process.cpp:2958
> > #4 0x00007f6664e90558 in process::Latch::await (this=0x7f6638000d00, duration=...)
> >    at ../../../3rdparty/libprocess/src/latch.cpp:49
> > #5 0x00007f66649452cc in process::Future<int>::await (this=0x7f66595fb6f0, duration=...)
> >    at ../../3rdparty/libprocess/include/process/future.hpp:1156
> > #6 0x00007f666493a04d in process::Future<int>::get (this=0x7f66595fb6f0)
> >    at ../../3rdparty/libprocess/include/process/future.hpp:1167
> > #7 0x00007f6664ab18d3 in ZooKeeper::get (this=0x803ce0, path="/craig/mo", watch=false,
> > ....
> >
> > So, really we are asking whether the mesos zk c++ api will be enhanced
> > to not block indefinitely when results are beyond a time bound.
> >
> > cheers
> > craig
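
For reference, the kind of bounded wait asked about in the quoted message could look roughly like the sketch below. It is only an illustration and makes several assumptions: it uses std::future from the C++ standard library rather than libprocess, and the helper name getWithTimeout is hypothetical, not part of the Mesos ZooKeeper API. It bounds how long a caller blocks. It does not address why the completion never arrives when get is called from inside a Watcher.

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <utility>

// Hypothetical helper, not part of Mesos or libprocess.
// Waits for `pending` for at most `timeout`. On success the value is moved
// into `*out` and true is returned; on timeout `*out` is left untouched and
// false is returned, so the caller can retry, log, or report an error.
template <typename T>
bool getWithTimeout(std::future<T>& pending,
                    std::chrono::milliseconds timeout,
                    T* out) {
  assert(out != nullptr);
  assert(timeout.count() >= 0);

  if (pending.wait_for(timeout) != std::future_status::ready) {
    return false;  // Timed out: give up instead of blocking forever.
  }

  *out = std::move(pending.get());
  return true;
}
```

A caller would pass the future for the pending operation together with a time bound, for example getWithTimeout(f, std::chrono::seconds(10), &rc), and treat a false return as "no response within the bound" rather than hanging in a condition wait.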

