Re: Can Curator's recipes for synchronization be used when the releasing entity is not the locking entity?

Foolish Ewe Mon, 30 Jan 2017 13:35:14 -0800

Hello Jordan:


Thank you for your thoughtful reply, and thanks also to Vitalii Tymchyshyn, 
whose response may address some of my questions.  TL;DR: if I understand 
correctly, the Curator API design constrains the client that unlocks or 
returns a lease to be the same client (and hence the same Java process) that 
acquired the lock/lease.


Let's consider the problem and try to develop some intuition and if needed 
formalism. First let's consider the problem outside the Curator context and 
then ask if we can express it in Curator/Zookeeper.


Suppose we have the following logic before we decorate it with 
synchronization/mutual exclusion: we are given a collection of parallel 
workflows that all do the following:



Step B) update SharedResource

Step C) read SharedResource (and other inputs) and Write Computed Results (to 
HDFS)

Step E) ProcessResults


It happens that, for our use case, Step C) takes considerable time, and if 
some workflow, say i, is in Step B) or Step C) while another workflow, say j, 
does Step B), then job i will either fail and stop (if we are lucky) or 
produce (potentially undetectably) corrupted output.

Thus we would like to guard the critical section, which is Step B) and 
Step C), with mutual exclusion/synchronization.  Let w denote the workflow 
id; the revised workflow would then look like the following:

Step A) Acquire exclusive access to the Shared resource for workflow w 
(reserve/lock the shared resource)

Step B) update SharedResource

Step C) read SharedResource (and other inputs) and Write Computed Results (to 
HDFS)

Step D) Release/unlock the reservation of the Shared Resource of workflow w, 
making the Shared Resource available for access by other workflows

Step E) ProcessResults

Since we aren't asking all the workflows to reach a particular point in 
execution, it is unclear why I would try a synchronization barrier. To me, 
this looks like a traditional mutual exclusion problem (i.e. at most one 
workflow is active in the critical section of Step B or Step C).
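
To make the revised workflow concrete, here is a minimal single-JVM sketch. 
It is only an analogy: each step runs in its own short-lived thread, standing 
in for a separate Yarn job, and a plain java.util.concurrent.Semaphore stands 
in for the distributed reservation. None of this is Curator API, and the 
names WorkflowSketch, runWorkflows and runStep are hypothetical.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

class WorkflowSketch {
    // Runs `count` workflows in parallel and returns the largest number of
    // workflows ever observed inside the critical section at the same time.
    static int runWorkflows(int count) {
        Semaphore reservation = new Semaphore(1, true); // fair: waiters served in arrival order
        AtomicInteger inside = new AtomicInteger();
        AtomicInteger maxInside = new AtomicInteger();
        Thread[] workflows = new Thread[count];
        for (int w = 0; w < count; w++) {
            workflows[w] = new Thread(() -> {
                runStep(reservation::acquireUninterruptibly);   // Step A: reserve
                runStep(() -> {                                 // Steps B and C: critical section
                    int now = inside.incrementAndGet();
                    maxInside.accumulateAndGet(now, Math::max);
                    inside.decrementAndGet();
                });
                runStep(reservation::release);                  // Step D: release from another thread
                // Step E (ProcessResults) would run here, outside the critical section.
            });
            workflows[w].start();
        }
        for (Thread workflow : workflows) {
            join(workflow);
        }
        return maxInside.get();
    }

    // Each step gets its own thread, so no step shares a thread with another step.
    static void runStep(Runnable step) {
        Thread job = new Thread(step);
        job.start();
        join(job);
    }

    static void join(Thread thread) {
        try {
            thread.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("max workflows in critical section: " + runWorkflows(4));
    }
}
```

The point of the sketch is that the thread performing Step D never performed 
Step A, yet the invariant (at most one workflow between Step A and Step D) 
still holds.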

The twist in my use case is that Step B) and Step C) are collections of one or 
more different jobs scheduled by Yarn, so we don't currently support a 
continuously running client side process that can host a listener.  I was 
looking to see if the off-the-shelf recipes in Curator support this.  My 
current understanding (if I read Vitalii's remarks and the documentation 
correctly) is that Curator's design assumes that the locking entity is in the 
same Java process as the unlocking entity, and that it advocates for a client 
side process running with a listener for correctness (e.g. recovery in the 
case of client failure, perhaps other cases too?).  But in our current system, 
Step A) and Step D) are different jobs and share no JVMs (i.e. are distinct 
Java processes), and I was looking for an appropriate approach for the 
unlock/returnLease in Step D) given that constraint.

Please correct me if I'm wrong, but I looked at the following candidate 
approaches under the constraint of not having a continuously running Java 
process that both acquires and releases a lock (or acquires and returns a 
semaphore lease):


  *   Revocation - My understanding is that the lock holder needs to be 
listening for revocation requests and then needs to release its lock. 
Revocation appears to be cooperative, so I would need a client side listener 
in the locking entity's Java process, which would require some (potentially 
non-trivial) refactoring of the workflow in order to have correct revocation 
request detection followed by lock release.
  *   http://curator.apache.org/curator-recipes/shared-reentrant-lock.html - 
The unlock mechanism requires that the JVM has a valid InterProcessMutex that 
has already acquired the lock before doing a release() operation. So we have a 
chicken and egg situation here.
  *   http://curator.apache.org/curator-recipes/shared-semaphore.html - The 
Lease parameter of the returnLease method (obtained via the acquire method) 
appears, at first glance, to require that the same Java process perform both 
the locking and unlocking (unless the lease can be serialized and transmitted 
from the locking entity and received and deserialized by the unlocking 
entity). Although the lease provides a way to mitigate crashed locking 
entities, there appears to be a tradeoff: the lease improves recovery from 
crashed or failed clients but makes the Curator semaphores seem less 
expressive than the traditional semaphore, whose definition has no analog of 
the lease. E.g. in producer/consumer problems, the unlocking entity is 
distinct from the locking entity (which is why I mentioned it as a motivating 
example).
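
The distinction between an owner-bound lock and a traditional semaphore can 
be seen inside a single JVM with the JDK's own classes. The sketch below is 
only an analogy for the distributed case (threads stand in for processes, 
and nothing here touches Curator or ZooKeeper); the class and method names 
are hypothetical. A java.util.concurrent.Semaphore permit can be released by 
a thread that never acquired it, whereas ReentrantLock rejects an unlock 
from a non-owner with IllegalMonitorStateException.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.locks.ReentrantLock;

class OwnershipDemo {
    // Acquire a permit on the calling thread and release it from another thread.
    // Returns the number of available permits afterwards.
    static int semaphoreReleasedByOtherThread() {
        Semaphore semaphore = new Semaphore(1);
        semaphore.acquireUninterruptibly();                 // "locking entity"
        Thread releaser = new Thread(semaphore::release);   // "unlocking entity"
        releaser.start();
        join(releaser);
        return semaphore.availablePermits();
    }

    // Lock on the calling thread, then attempt the unlock from another thread.
    // Returns true if the foreign unlock was rejected.
    static boolean lockRejectsForeignUnlock() {
        ReentrantLock lock = new ReentrantLock();
        lock.lock();
        boolean[] rejected = {false};
        Thread releaser = new Thread(() -> {
            try {
                lock.unlock();                               // not the owner
            } catch (IllegalMonitorStateException e) {
                rejected[0] = true;
            }
        });
        releaser.start();
        join(releaser);                                      // join makes rejected[0] visible here
        lock.unlock();                                       // the owner can still release
        return rejected[0];
    }

    static void join(Thread thread) {
        try {
            thread.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("permits after foreign release: " + semaphoreReleasedByOtherThread());
        System.out.println("lock rejected foreign unlock: " + lockRejectsForeignUnlock());
    }
}
```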

This seems to imply that I need to look at the cost of modifying the workflow 
design and see if I can meet the constraint or consider other approaches.

With best regards:

Bill


________________________________
From: Jordan Zimmerman <[email protected]>
Sent: Thursday, January 26, 2017 5:05 AM
To: [email protected]
Subject: Re: Can Curator's recipes for synchronization be used when the 
releasing entity is not the locking entity?

I read the description several times and, sadly, don’t understand. Maybe 
someone else? At first blush it almost sounds like a barrier or double barrier: 
http://curator.apache.org/curator-recipes/barrier.html or 
http://curator.apache.org/curator-recipes/double-barrier.html. But, then, I 
don’t totally understand. Another thing: Curator InterProcessMutex can be 
revoked from another process. See 
http://curator.apache.org/curator-recipes/shared-reentrant-lock.html “Revoking” 
- maybe that’s what you want? Other than that, maybe you can restate the 
problem or give more details.



-Jordan

On Jan 25, 2017, at 6:03 PM, Foolish Ewe 
<[email protected]<mailto:[email protected]>> wrote:

Hello All:

I would like to use Curator to synchronize mutually exclusive access to a 
shared resource, however the entity that wants to release a lock is distinct 
from the locking entity (i.e. they are in different JVMs on different 
machines).    Such cases can occur in practice (e.g. producer/consumer 
synchronization, but this isn't quite my use case).   Informally I would like 
to have operations that behave like the following in a JVM based language:

  1.  Strict requirements:
     *   acquire(resourceId, taskId) - Have the task waiting for the resource 
suspend until it has mutually exclusive access (i.e. acquires the lock) or 
throw an exception if the request is somehow invalid (i.e. bad resource Id, bad 
task Id, internal error, etc).
     *   release(resourceId) - Given a resource, if there is an acquired lock, 
release that lock and wake up the next task (in FCFS order) waiting to acquire 
the lock, if one exists.
  2.  Nice to have (useful for maintenance, etc).
     *   status(resourceId) - Report if the resource is locked, the current 
taskId of the acquirer if the lock is acquired and the (potentially empty)  
FCFS list of tasks waiting to acquire the lock.
     *   releaseAll(resourceId)  - remove all pending locks on this resource
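
To pin down the intended semantics, the operations above can be sketched as 
a single-JVM, in-memory model. This is not a distributed implementation and 
uses no Curator or ZooKeeper calls; the class name FcfsResourceLocks, the 
demo method, and the choice to fail waiters with an exception on releaseAll 
are all assumptions made for illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class FcfsResourceLocks {
    // Per resource: FCFS queue of task ids. The head of the queue holds the lock.
    private final Map<String, Deque<String>> queues = new HashMap<>();

    // Blocks until taskId reaches the head of the queue for resourceId.
    public synchronized void acquire(String resourceId, String taskId) {
        if (resourceId == null || taskId == null) {
            throw new IllegalArgumentException("resourceId and taskId are required");
        }
        Deque<String> queue = queues.computeIfAbsent(resourceId, id -> new ArrayDeque<>());
        queue.addLast(taskId);
        try {
            while (!taskId.equals(queue.peekFirst())) {
                if (!queue.contains(taskId)) {
                    throw new IllegalStateException("request removed by releaseAll");
                }
                wait();
            }
        } catch (InterruptedException e) {
            queue.remove(taskId);
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
    }

    // Any caller may release: there is deliberately no ownership check.
    public synchronized void release(String resourceId) {
        Deque<String> queue = queues.get(resourceId);
        if (queue != null && !queue.isEmpty()) {
            queue.removeFirst();
        }
        notifyAll(); // wake waiters so the new head can proceed
    }

    // Snapshot: first element is the current holder, the rest wait in FCFS order.
    public synchronized List<String> status(String resourceId) {
        Deque<String> queue = queues.get(resourceId);
        if (queue == null) {
            return new ArrayList<>();
        }
        return new ArrayList<>(queue);
    }

    // Drops the holder and all waiters; blocked acquire calls fail with an exception.
    public synchronized void releaseAll(String resourceId) {
        Deque<String> queue = queues.get(resourceId);
        if (queue != null) {
            queue.clear();
        }
        notifyAll();
    }

    // Scenario: workflow-1 holds the lock, workflow-2 waits, a third thread releases.
    static List<String> demo() {
        FcfsResourceLocks locks = new FcfsResourceLocks();
        locks.acquire("shared", "workflow-1");
        Thread waiter = new Thread(() -> locks.acquire("shared", "workflow-2"));
        waiter.start();
        while (locks.status("shared").size() < 2) {
            Thread.yield(); // wait until workflow-2 is queued
        }
        Thread releaser = new Thread(() -> locks.release("shared"));
        releaser.start();
        try {
            releaser.join();
            waiter.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return locks.status("shared");
    }

    public static void main(String[] args) {
        System.out.println("holder after handoff: " + demo());
    }
}
```

In the demo scenario the release is issued by a thread that never called 
acquire, which is the behavior I am asking about for the distributed case.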

However, the semantics of the recipes I've looked at seem to indicate that the 
releasing entity must have a handle (either explicit or implicit) of the 
lease/lock, e.g.


  *   http://curator.apache.org/curator-recipes/shared-reentrant-lock.html 
states:

public void release()
Perform one release of the mutex if the calling thread is the same thread that 
acquired it. If the
thread had made multiple calls to acquire, the mutex will still be held when 
this method returns.



  *   http://curator.apache.org/curator-recipes/shared-semaphore.html states:
  *   Lease instances can either be closed directly or you can use these 
convenience methods:

public void returnAll(Collection<Lease> leases)
public void returnLease(Lease lease)

So it appears on the surface that the same entity that acquires a mutex or a 
semaphore lease is expected to release the mutex or return the lease.

My questions are:

  1.  Am I misunderstanding how Curator works?
  2.  Is there a more appropriate abstraction in Curator for my use case?
  3.  Can I use one of the existing recipes?  Could a releasing entity return a 
lease if they had a serialized copy of the lease but weren't the entity 
acquiring the lease?
  4.  If I need to roll my own, should the Curator Framework be able to help 
here or should I work at the raw zookeeper level for this use case?

Thanks for your help with this:

Bill
