InterProcessMutex acquire times out and then _succeeds_?

Chris Jeris Thu, 17 Oct 2013 07:48:32 -0700

We have run into a knotty problem with InterProcessMutex, where calls to
.acquire will expend their full timeout and then _succeed_ in acquiring the
lock.  This generally happens when a different server process was the last
one to hold the lock, but it does not happen every time that is the case.


Each server process (one per server machine) holds a single
InterProcessMutex object, and all of them use the same persistent
ZooKeeper node name (the lock guards a single piece of shared state, whose
data is not itself stored in ZK).  The problem arises in the context of a test suite
where requests to our server cluster are issued serially, so there is
basically no competing traffic on the lock, although there is traffic on
the ZK cluster from other applications.  The frequency of acquires on this
lock is not excessive (order 1 per second), and we are reasonably certain
our client code is not holding the lock longer than it should.

The problem does not seem to be sensitive to the exact value of the
timeout.  If we set it to 15 seconds, we see lock acquires taking 15
seconds and then succeeding; if we set it to 60 seconds, we see them taking
60 seconds and then succeeding.
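
To make the symptom concrete, here is a minimal sketch (not our actual
code) of how a timed acquire can be measured so that "succeeded only after
the full timeout" is distinguishable from an ordinary fast acquire.  The
names TimedAcquire, Result, timeAcquire and the slack value are
hypothetical; the only assumption about Curator is that
InterProcessMutex.acquire(long, TimeUnit) returns a boolean.  A
java.util.concurrent ReentrantLock stands in for the Curator lock so the
sketch runs without a ZK cluster.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

class AcquireTiming {
    /** Same shape as a timed acquire: wait up to the timeout, return true on success. */
    @FunctionalInterface
    interface TimedAcquire {
        boolean acquire(long time, TimeUnit unit) throws Exception;
    }

    /** Outcome of one timed acquire attempt. */
    static final class Result {
        final boolean acquired;
        final long elapsedMillis;

        Result(boolean acquired, long elapsedMillis) {
            this.acquired = acquired;
            this.elapsedMillis = elapsedMillis;
        }

        /** True when the call succeeded, but only after (nearly) the whole timeout had passed. */
        boolean succeededAtTimeout(long timeoutMillis, long slackMillis) {
            return acquired && elapsedMillis >= timeoutMillis - slackMillis;
        }
    }

    /** Runs one timed acquire and records how long the call took. */
    static Result timeAcquire(TimedAcquire lock, long timeout, TimeUnit unit) throws Exception {
        long start = System.nanoTime();
        boolean ok = lock.acquire(timeout, unit);
        long elapsedMillis = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        return new Result(ok, elapsedMillis);
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for the Curator lock; with Curator one would pass mutex::acquire instead.
        ReentrantLock standIn = new ReentrantLock();
        Result r = timeAcquire(standIn::tryLock, 15, TimeUnit.SECONDS);
        System.out.println("acquired=" + r.acquired
                + " elapsedMs=" + r.elapsedMillis
                + " succeededAtTimeout=" + r.succeededAtTimeout(15_000, 500));
        if (r.acquired) {
            standIn.unlock();
        }
    }
}
```

In our case the equivalent measurement would report acquired=true with an
elapsed time equal to the configured timeout, whether that is 15 or 60
seconds.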

Right now we observe the problem with Curator 2.1.0 against both ZK 3.3.6
and 3.4.5.

Is this a known or familiar issue?  Does it sound like we're doing
something wrong?

thanks, Chris Jeris
-- 
Chris Jeris
[email protected]
freenode/twitter/github: ystael
