I think this is the ticket: https://issues.apache.org/jira/browse/MESOS-2451
and it ended up not being a Mesos bug.

--
Jiang Yan Xu <[email protected]> @xujyan <http://twitter.com/xujyan>

On Mon, Mar 16, 2015 at 11:33 AM, Niklas Nielsen <[email protected]>
wrote:

> Hi Craig,
>
> I am sorry you guys have been running into trouble with Zookeeper.
> Have you filed a JIRA ticket where we can track the issues you are seeing?
> That is how we track and schedule (human) resources for bug fixing :)
>
> Thanks!
> Niklas
>
> On 4 March 2015 at 13:18, <[email protected]> wrote:
>
>> Hi again Mesos users and devs,
>> In my prior post I described a program hanging in the Mesos
>> ZooKeeper C++ API and asked about an enhancement to not wait indefinitely
>> when the underlying ZooKeeper responses don't occur.
>> At that time I thought perhaps the underlying ZooKeeper and/or its C
>> binding might not be delivering responses up to the Mesos API callers.
>> So, while the question is still outstanding, I now see that the hang
>> is potentially in the Mesos implementation over the ZooKeeper C binding.
>> In particular, I've now tried a similar scenario with just the ZooKeeper C
>> binding API.
>> That is, I do a zk aget/complete from within a watcher handling the
>> CHANGED event from a prior aset/complete.
>> I don't see any indefinite blocking, and both the aget and aset
>> completions are invoked and finish.
>>
>> Unless I'm not reproducing this properly, what I'm seeing is bad
>> behavior from the Mesos C++ API.
>> Somehow the Mesos C++ ZooKeeper API implementation is getting itself into
>> pthread condition waits with nothing to notify and break the waits.
>> This seems to occur with get calls from a Watcher on CHANGED events.
>>
>> craig
>>
>>
>>
>>
>> -------- Original Message --------
>> From: [email protected]
>> Apparently from: [email protected]
>> To: [email protected]
>> Subject: mesos c++ zookeeper blocks indefinately -- any plans to enhance?
>> Date: Wed, 4 Mar 2015 10:05:54 -0500
>>
>> > Hi Mesos users and devs,
>> > We've observed that the Mesos 0.22.0-rc1 C++ ZooKeeper code
>> > appears to allow indefinite waits on responses.
>> > This leads to application hangs blocked inside Mesos ZooKeeper calls.
>> > This can happen with a properly running ZooKeeper presumably able to
>> > make all responses.
>> >
>> > Here's how we hung it, for example.
>> > We issue a Mesos zk set via
>> >
>> > int ZooKeeper::set(const std::string& path,
>> >                    const std::string& data,
>> >                    int version)
>> >
>> > then inside a Watcher, on the CHANGED event, we issue a Mesos zk
>> > get on the same path via
>> >
>> > int ZooKeeper::get(const std::string& path,
>> >                    bool watch,
>> >                    std::string* result,
>> >                    Stat* stat)
>> >
>> > We end up with two threads in the process, both in pthread_cond_wait:
>> > #0  0x000000334e20b43c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
>> > #1  0x00007f6664ee1cf5 in Gate::arrive (this=0x7f6140, old=0)
>> >     at ../../../3rdparty/libprocess/src/gate.hpp:82
>> > #2  0x00007f6664ecef6e in process::ProcessManager::wait (this=0x7f02e0, pid=...)
>> >     at ../../../3rdparty/libprocess/src/process.cpp:2476
>> > #3  0x00007f6664ed2ce9 in process::wait (pid=..., duration=...)
>> >     at ../../../3rdparty/libprocess/src/process.cpp:2958
>> > #4  0x00007f6664e90558 in process::Latch::await (this=0x7f6ba0, duration=...)
>> >     at ../../../3rdparty/libprocess/src/latch.cpp:49
>> > #5  0x00007f66649452cc in process::Future<int>::await (this=0x7fffa0fd9040, duration=...)
>> >     at ../../3rdparty/libprocess/include/process/future.hpp:1156
>> > #6  0x00007f666493a04d in process::Future<int>::get (this=0x7fffa0fd9040)
>> >     at ../../3rdparty/libprocess/include/process/future.hpp:1167
>> > #7  0x00007f6664ab1aac in ZooKeeper::set (this=0x803ce0, path="/craig/mo", data=
>> > ...
>> >
>> > and
>> > #0  0x000000334e20b43c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
>> > #1  0x00007f6664ee1cf5 in Gate::arrive (this=0x7f66380013f0, old=0)
>> >     at ../../../3rdparty/libprocess/src/gate.hpp:82
>> > #2  0x00007f6664ecef6e in process::ProcessManager::wait (this=0x7f02e0, pid=...)
>> >     at ../../../3rdparty/libprocess/src/process.cpp:2476
>> > #3  0x00007f6664ed2ce9 in process::wait (pid=..., duration=...)
>> >     at ../../../3rdparty/libprocess/src/process.cpp:2958
>> > #4  0x00007f6664e90558 in process::Latch::await (this=0x7f6638000d00, duration=...)
>> >     at ../../../3rdparty/libprocess/src/latch.cpp:49
>> > #5  0x00007f66649452cc in process::Future<int>::await (this=0x7f66595fb6f0, duration=...)
>> >     at ../../3rdparty/libprocess/include/process/future.hpp:1156
>> > #6  0x00007f666493a04d in process::Future<int>::get (this=0x7f66595fb6f0)
>> >     at ../../3rdparty/libprocess/include/process/future.hpp:1167
>> > #7  0x00007f6664ab18d3 in ZooKeeper::get (this=0x803ce0, path="/craig/mo", watch=false,
>> > ....
>> >
>> > So, really we are asking whether the Mesos zk C++ API will be enhanced
>> > to not block indefinitely when results are beyond a time bound.
>> >
>> > cheers
>> > craig
>>
>
>

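For readers of this thread, here is a minimal sketch of the kind of time-bounded wait the original message asks about. It uses only the C++ standard library (std::promise and std::future), not libprocess or the Mesos ZooKeeper wrapper. The helper name get_within and the two demo functions are hypothetical and exist only for illustration. It is not how Mesos implements or would implement this.

```cpp
#include <cassert>
#include <chrono>
#include <future>

// Hypothetical helper (not part of Mesos, libprocess, or ZooKeeper):
// wait for `result` for at most `timeout`. Returns true and stores the
// value in `*out` if it arrived in time, false otherwise.
template <typename T>
bool get_within(std::future<T>& result,
                std::chrono::milliseconds timeout,
                T* out) {
  assert(out != nullptr);
  assert(timeout.count() >= 0);
  if (result.wait_for(timeout) != std::future_status::ready) {
    return false;  // No response yet; the caller can retry or give up.
  }
  *out = result.get();
  return true;
}

// Simulates a response that never arrives: the promise stays alive
// but is never fulfilled, so an unbounded get() would block forever.
inline bool times_out_when_no_response() {
  std::promise<int> never_fulfilled;
  std::future<int> result = never_fulfilled.get_future();
  int code = -1;
  return !get_within(result, std::chrono::milliseconds(50), &code);
}

// Simulates a response that has already been delivered.
inline int value_when_response_arrived() {
  std::promise<int> response;
  std::future<int> result = response.get_future();
  response.set_value(0);  // Pretend the operation completed with code 0.
  int code = -1;
  bool ok = get_within(result, std::chrono::seconds(5), &code);
  return ok ? code : -1;
}
```

With a bound like this, a caller whose response is never delivered gets control back after the timeout instead of sitting in a condition wait. Whether a timeout is the right fix for the hang reported here is a separate question, since it only bounds the wait and does not explain why no notification arrived.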