Re: Can Curator's recipes for synchronization be used when the releasing entity is not the locking entity?

Jordan Zimmerman Tue, 31 Jan 2017 07:00:00 -0800

There is something that is continually confusing to me about your explanations. 
What JVMs are acquiring the exclusive access? Is it a single thread? What does 
“exclusive access” mean here? If, as you assert, “this looks like a traditional 
mutual exclusion” you’d have to identify which thread or threads in your system 
are getting the critical section.


> Curator's design assumes that locking entity should be in the same Java 
> process as the unlocking entity
This is not merely Curator’s design - nothing else would make sense. A thread 
holds a lock. Who else could release that lock other than the thread holding 
it? These kinds of comments just confuse me I’m afraid.

> Step A) and Step D) are different jobs and share no JVMs (i.e. are distinct 
> Java processes) and I was looking for an appropriate approach for the 
> unlock/returnLease in Step D) given that constraint.
Then what does “lock” mean in this context? Who is the lock holder? Let’s be 
clear - threads hold locks - nothing else. If you’re looking for code that 
allows 1 thread to hold a lock then that’s a Curator InterProcessMutex. If 
you’re looking for something that allows multiple threads to hold the same lock 
then that’s a Curator InterProcessSemaphoreV2. However, you keep referring to 
locking the SharedResource and that confuses me.
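
To illustrate the ownership point with plain JDK classes: the sketch below is an in-JVM analogy only, using java.util.concurrent rather than Curator, and the class and method names are made up for illustration. It shows that a lock is tied to the thread that acquired it, while a classic counting semaphore permit has no owner and can be released by any thread.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.ReentrantLock;

// In-JVM analogy only: JDK primitives stand in for the distributed recipes.
// Nothing here is Curator API; all names are hypothetical.
class LockOwnershipDemo {

    // Runs the task on a fresh thread and waits for it to finish.
    static void runOnOtherThread(Runnable task) {
        Thread t = new Thread(task);
        t.start();
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    // A ReentrantLock is owned by the acquiring thread; a different thread
    // that calls unlock() gets an IllegalMonitorStateException.
    static boolean otherThreadCanUnlock() {
        ReentrantLock lock = new ReentrantLock();
        lock.lock();
        AtomicBoolean unlocked = new AtomicBoolean(false);
        try {
            runOnOtherThread(() -> {
                try {
                    lock.unlock();
                    unlocked.set(true);
                } catch (IllegalMonitorStateException notTheOwner) {
                    // Expected: this thread never acquired the lock.
                }
            });
        } finally {
            lock.unlock(); // the owning thread releases normally
        }
        return unlocked.get();
    }

    // A Semaphore permit has no owner; any thread may release it.
    static boolean otherThreadCanReleasePermit() {
        Semaphore semaphore = new Semaphore(1);
        semaphore.acquireUninterruptibly();
        runOnOtherThread(semaphore::release);
        return semaphore.availablePermits() == 1;
    }

    public static void main(String[] args) {
        System.out.println("other thread can unlock: " + otherThreadCanUnlock());
        System.out.println("other thread can release permit: " + otherThreadCanReleasePermit());
    }
}
```

Whether the same distinction carries over to the Curator recipes should be checked against their documentation; this only demonstrates the thread-level semantics being discussed.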

It sounds like you’re asking if there is a single Curator class that does 
everything you want. It seems not. But, you might use a combination of Curator 
locks and caches, etc. Perhaps each process that wants to participate in this 
workflow could create a PersistentNode using EPHEMERAL_SEQUENTIAL. Any 
participant in the workflow can know that a given SharedResource is being 
operated on. Or you could use a LeaderSelector whereby one process leads the 
workflow but the others can know that they are participants. Or, maybe you 
could combine them: use a LeaderSelector to control the master in the workflow 
but also use a PersistentNode to denote that a given SharedResource is in 
process.
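
As a rough picture of the "marker per SharedResource" idea, here is a minimal in-memory sketch. It is an assumption-laden stand-in, not Curator code: a ConcurrentHashMap plays the role of the marker nodes, and the class and method names (ResourceMarkerSketch, mark, holder, clear) are hypothetical. A real version would keep the marker in ZooKeeper.

```java
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical in-memory stand-in for "one marker per SharedResource".
// None of these names are Curator API.
class ResourceMarkerSketch {
    // resourceId -> id of the workflow currently operating on it
    private final ConcurrentMap<String, String> markers = new ConcurrentHashMap<>();

    // Atomically claims the resource; returns false if a marker already exists.
    boolean mark(String resourceId, String workflowId) {
        return markers.putIfAbsent(resourceId, workflowId) == null;
    }

    // Any participant can ask which workflow, if any, holds the resource.
    Optional<String> holder(String resourceId) {
        return Optional.ofNullable(markers.get(resourceId));
    }

    // Removes the marker only if it still belongs to the given workflow,
    // regardless of which caller asks.
    boolean clear(String resourceId, String workflowId) {
        return markers.remove(resourceId, workflowId);
    }

    public static void main(String[] args) {
        ResourceMarkerSketch sketch = new ResourceMarkerSketch();
        System.out.println("wf-1 marks r1: " + sketch.mark("r1", "wf-1"));
        System.out.println("wf-2 marks r1: " + sketch.mark("r1", "wf-2"));
        System.out.println("holder of r1: " + sketch.holder("r1"));
        System.out.println("clear r1 for wf-1: " + sketch.clear("r1", "wf-1"));
    }
}
```

Note one deliberate difference from the suggestion above: an ephemeral node goes away when the session of the process that created it ends, whereas the marker in this sketch stays until someone clears it. That tradeoff (crash cleanup versus release by a different process) is exactly the design question in this thread.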

FWIW: I wrote a distributed task/workflow system a while back. Maybe you could 
look at it for some ideas? https://github.com/NirmataOSS/workflow

-Jordan

> On Jan 30, 2017, at 4:33 PM, Foolish Ewe <[email protected]> wrote:
> 
> Hello Jordan:
> 
> Thank you for your thoughtful reply, and thanks also to Vitalii Tymchyshyn, 
> whose response may be addressing some of my questions. TL;DR: if I 
> understand correctly, the Curator API design constrains the client Java 
> process that unlocks or returns a lease to be the same client (and hence in 
> the same Java process) that acquired the lock/lease.
> 
> Let's consider the problem and try to develop some intuition and if needed 
> formalism. First let's consider the problem outside the Curator context and 
> then ask if we can express it in Curator/Zookeeper.
> 
> Suppose we have the following logic before we decorate it with 
> synchronization/mutual exclusion: we are given a collection of parallel 
> workflows that all do the following:
> 
> 
> Step B) update SharedResource
> Step C) read SharedResource (and other inputs) and Write Computed Results (to 
> HDFS)
> Step E) ProcessResults
> 
> 
> It happens that, for our use case, Step C) takes considerable time, and if 
> some workflow, say i, is in Step B) or Step C) while another workflow, say 
> j, does Step B), then job i will either fail and stop (if we are lucky) or 
> have (potentially undetectable) corrupted output.
> 
> Thus we would like to guard the critical section, which is Step B) and 
> Step C), with mutual exclusion/synchronization. Let w denote the workflow 
> id; then the revised job workflow would look like the following:
> 
> Step A) Acquire exclusive access to the Shared resource for workflow w 
> (reserve/lock the shared resource)
> Step B) update SharedResource
> Step C) read SharedResource (and other inputs) and Write Computed Results (to 
> HDFS)
> Step D) Release/unlock the reservation of the Shared Resource of workflow w 
> making the Shared Resource available for access by other workflows
> Step E) ProcessResults
> 
> Since we aren't asking all the workflows to reach a particular point in 
> execution, it is unclear why I would try a synchronization barrier. To me, 
> this looks like a traditional mutual exclusion problem (i.e. at most one 
> workflow is active in the critical section of Step B or Step C).
> 
> The twist in my use case is that Step B) and Step C) are collections of one 
> or more different jobs scheduled by Yarn, so we don't currently support a 
> continuously running client-side process that can host a listener for our 
> use case. I was looking to see if the off-the-shelf recipes in Curator 
> support this. My current understanding (if I understand Vitalii's remarks 
> and the documentation) is that Curator's design assumes that the locking 
> entity should be in the same Java process as the unlocking entity, and that 
> the Curator design advocates for a client-side process running with a 
> listener for correctness (e.g. recovery in the case of client failure, 
> perhaps other cases too?). But in our current system, Step A) and Step D) 
> are different jobs and share no JVMs (i.e. are distinct Java processes), and 
> I was looking for an appropriate approach for the unlock/returnLease in 
> Step D) given that constraint.
> 
> Please correct me if I'm wrong, but I looked at the following candidate 
> approaches under the constraint of not having a continuously running Java 
> process that both acquires and releases a lock (or acquires and returns a 
> semaphore lease):
> 
> - Revocation: my understanding is that the lock holder needs to be 
>   listening for revocation requests and then needs to release its lock. 
>   Revocation appears to be cooperative, so I would need a client-side 
>   listener in the locking entity's Java process, which would require some 
>   (potentially non-trivial) refactoring of the workflow in order to have 
>   correct revocation request detection followed by lock release.
> - http://curator.apache.org/curator-recipes/shared-reentrant-lock.html - 
>   The unlock mechanism requires that the JVM has a valid InterProcessMutex 
>   that has already acquired the lock before doing a release() operation. 
>   So we have a chicken-and-egg situation here.
> - http://curator.apache.org/curator-recipes/shared-semaphore.html - The 
>   Lease parameter (obtained via the acquire method) of the returnLease 
>   method appears, at first glance, to require that the same Java process 
>   perform both the locking and unlocking (unless the lease can be 
>   serialized and transmitted from the locking entity and received and 
>   deserialized by the unlocking entity). Although the lease provides a way 
>   to mitigate crashed locking entities, there appears to be a tradeoff: 
>   the lease improves recovery from crashed or failed clients but makes the 
>   Curator semaphores seem less expressive than the traditional semaphore 
>   definition, which does not have any analog of the lease. E.g. in 
>   producer/consumer problems, the unlocking entity is distinct from the 
>   locking entity (which is why I mentioned it as a motivating example).
> 
> This seems to imply that I need to look at the cost of modifying the 
> workflow design and see if I can meet the constraint or consider other 
> approaches.
> 
> 
> With best regards:
> 
> 
> Bill
> 
> 
> From: Jordan Zimmerman <[email protected] 
> <mailto:[email protected]>>
> Sent: Thursday, January 26, 2017 5:05 AM
> To: [email protected] <mailto:[email protected]>
> Subject: Re: Can Curator's recipes for synchronization be used when the 
> releasing entity is not the locking entity?
>  
> I read the description several times and, sadly, don’t understand. Maybe 
> someone else? At first blush it almost sounds like a barrier or double 
> barrier: http://curator.apache.org/curator-recipes/barrier.html or 
> http://curator.apache.org/curator-recipes/double-barrier.html. But, then, I 
> don’t totally understand. Another thing: Curator InterProcessMutex can be 
> revoked from another process. See 
> http://curator.apache.org/curator-recipes/shared-reentrant-lock.html 
> “Revoking” - maybe that’s what you want? Other than that, maybe you can 
> restate the problem or give more details.
> 
> -Jordan
> 
>> On Jan 25, 2017, at 6:03 PM, Foolish Ewe <[email protected] 
>> <mailto:[email protected]>> wrote:
>> 
>> Hello All:
>> 
>> I would like to use Curator to synchronize mutually exclusive access to a 
>> shared resource; however, the entity that wants to release a lock is 
>> distinct from the locking entity (i.e. they are in different JVMs on 
>> different machines). Such cases can occur in practice (e.g. 
>> producer/consumer synchronization, but this isn't quite my use case). 
>> Informally, I would like to have operations that behave like the following 
>> in a JVM-based language:
>> 
>> Strict requirements:
>> - acquire(resourceId, taskId) - Have the task waiting for the resource 
>>   suspend until it has mutually exclusive access (i.e. acquires the lock) 
>>   or throw an exception if the request is somehow invalid (i.e. bad 
>>   resource Id, bad task Id, internal error, etc.).
>> - release(resourceId) - Given a resource, if there is an acquired lock, 
>>   release that lock and wake up the next task (in FCFS order) waiting to 
>>   acquire the lock, if one exists.
>> 
>> Nice to have (useful for maintenance, etc.):
>> - status(resourceId) - Report whether the resource is locked, the current 
>>   taskId of the acquirer if the lock is acquired, and the (potentially 
>>   empty) FCFS list of tasks waiting to acquire the lock.
>> - releaseAll(resourceId) - Remove all pending locks on this resource.
>> 
>> However, the semantics of the recipes I've looked at seem to indicate that 
>> the releasing entity must have a handle (either explicit or implicit) to 
>> the lease/lock, e.g.
>> 
>> http://curator.apache.org/curator-recipes/shared-reentrant-lock.html 
>> states:
>> public void release()
>> Perform one release of the mutex if the calling thread is the same thread 
>> that acquired it. If the
>> thread had made multiple calls to acquire, the mutex will still be held when 
>> this method returns.
>> 
>> http://curator.apache.org/curator-recipes/shared-semaphore.html states:
>> Lease instances can either be closed directly or you can use these 
>> convenience methods:
>> 
>> public void returnAll(Collection<Lease> leases)
>> public void returnLease(Lease lease)
>> 
>> So it appears on the surface that the same entity that acquires a mutex or 
>> a semaphore lease is expected to release the mutex or return the lease.
>> 
>> My questions are:
>> 1. Am I misunderstanding how Curator works?
>> 2. Is there a more appropriate abstraction in Curator for my use case?
>> 3. Can I use one of the existing recipes? Could a releasing entity return a 
>>    lease if they had a serialized copy of the lease but weren't the entity 
>>    acquiring the lease?
>> 4. If I need to roll my own, should the Curator Framework be able to help 
>>    here, or should I work at the raw ZooKeeper level for this use case?
>> 
>> Thanks for your help with this:
>> 
>> Bill
> 
>
