There is something that is continually confusing to me about your explanations. What JVMs are acquiring the exclusive access? Is it a single thread? What does “exclusive access” mean here? If, as you assert, “this looks like a traditional mutual exclusion”, you’d have to identify which thread or threads in your system are getting the critical section.
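
The ownership question can be made concrete with plain JDK classes. The sketch below is only an illustration and is not Curator code; `LockOwnershipDemo` and its method names are made up for this example. It shows that a `java.util.concurrent.locks.ReentrantLock` refuses an unlock from a thread that does not hold it, whereas a `java.util.concurrent.Semaphore` permit may be released by any thread.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical demo class: JDK only, no Curator or ZooKeeper involved.
class LockOwnershipDemo {

    // Runs the action on a second thread and waits for it to finish.
    private static void runInOtherThread(Runnable action) {
        Thread other = new Thread(action);
        other.start();
        try {
            other.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for helper thread", e);
        }
    }

    // Returns true if a non-owning thread is refused when it tries to unlock.
    static boolean foreignUnlockIsRejected() {
        ReentrantLock lock = new ReentrantLock();
        AtomicBoolean rejected = new AtomicBoolean(false);
        lock.lock(); // the calling thread is now the owner
        try {
            runInOtherThread(() -> {
                try {
                    lock.unlock(); // this thread does not hold the lock
                } catch (IllegalMonitorStateException e) {
                    rejected.set(true);
                }
            });
        } finally {
            lock.unlock(); // only the owner can release
        }
        return rejected.get();
    }

    // Returns true if a permit acquired by one thread can be released by another.
    static boolean foreignReleaseIsAllowed() {
        Semaphore permit = new Semaphore(1);
        permit.acquireUninterruptibly(); // taken by the calling thread
        runInOtherThread(permit::release); // given back by a different thread
        return permit.availablePermits() == 1;
    }

    public static void main(String[] args) {
        System.out.println("foreign unlock rejected: " + foreignUnlockIsRejected());
        System.out.println("foreign release allowed: " + foreignReleaseIsAllowed());
    }
}
```

Both checks print `true`. This only models in-process semantics; whether the same distinction carries over to a given distributed recipe depends on that recipe's documentation.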

> Curator's design assumes that locking entity should be in the same Java
> process as the unlocking entity

This is not merely Curator’s design - nothing else would make sense. A thread holds a lock. Who else could release that lock other than the thread holding it? These kinds of comments just confuse me, I’m afraid.

> Step A) and Step D) are different jobs and share no JVMs (i.e. are distinct
> Java processes) and I was looking for an appropriate approach for the
> unlock/returnLease in Step D) given that constraint.

Then what does “lock” mean in this context? Who is the lock holder? Let’s be clear - threads hold locks - nothing else. If you’re looking for code that allows one thread to hold a lock, then that’s a Curator InterProcessMutex. If you’re looking for something that allows multiple threads to hold the same lock, then that’s a Curator InterProcessSemaphoreV2. However, you keep referring to locking the SharedResource, and that confuses me.

It sounds like you’re asking if there is a single Curator class that does everything you want. It seems not. But you might use a combination of Curator locks, caches, etc. Perhaps each process that wants to participate in this workflow could create a PersistentNode using EPHEMERAL_SEQUENTIAL. Any participant in the workflow can then know that a given SharedResource is being operated on. Or you could use a LeaderSelector, whereby one process leads the workflow but the others can know that they are participants. Or maybe you could combine them: use a LeaderSelector to control the master in the workflow but also use a PersistentNode to denote that a given SharedResource is in process.

FWIW: I wrote a distributed task/workflow system a while back. Maybe you could look at it for some ideas? https://github.com/NirmataOSS/workflow

-Jordan

> On Jan 30, 2017, at 4:33 PM, Foolish Ewe <[email protected]> wrote:
>
> Hello Jordan:
>
> Thank you for your thoughtful reply and also thanks to Vitalii Tymchyshyn,
> whose response may be addressing some of my questions. TL;DR: if I
> understand correctly, the Curator API design constrains the client Java
> process that unlocks or returns a lease to be the same client (and hence in
> the same Java process) that acquired the lock/lease.
>
> Let's consider the problem and try to develop some intuition and, if needed,
> formalism. First let's consider the problem outside the Curator context and
> then ask if we can express it in Curator/ZooKeeper.
>
> Suppose we have the following logic before we decorate it with
> synchronization/mutual exclusion. We are given a collection of parallel
> workflows where they all do:
>
> Step B) update SharedResource
> Step C) read SharedResource (and other inputs) and write computed results
>         (to HDFS)
> Step E) ProcessResults
>
> It happens that for our use case, Step C) takes considerable time, and if
> some workflow, say i, is in Step B) or Step C) while another workflow, say
> j, does Step B), then job i will either fail and stop (if we are lucky) or
> have (potentially undetectable) corrupted output.
>
> Thus we would like to guard the critical section, which is Step B) and
> Step C), with mutual exclusion/synchronization. Let w denote the workflow
> id; then the revised job workflow would seem to look like the following:
>
> Step A) Acquire exclusive access to the Shared Resource for workflow w
>         (reserve/lock the shared resource)
> Step B) update SharedResource
> Step C) read SharedResource (and other inputs) and write computed results
>         (to HDFS)
> Step D) Release/unlock the reservation of the Shared Resource of workflow w,
>         making the Shared Resource available for access by other workflows
> Step E) ProcessResults
>
> Since we aren't asking all the workflows to reach a particular point in
> execution, it is unclear why I would try a synchronization barrier. To me,
> this looks like a traditional mutual exclusion problem (i.e. at most one
> workflow is active in the critical section of Step B or Step C).
>
> The twist in my use case is that Step B) and Step C) are collections of one
> or more different jobs scheduled by Yarn, so we don't currently support a
> continuously running client-side process that can host a listener for our
> use case. I was looking to see if the off-the-shelf recipes in Curator
> support this. My current understanding (if I understand Vitalii's remarks
> and the documentation) is that Curator's design assumes that the locking
> entity should be in the same Java process as the unlocking entity and that
> the Curator design advocates for a client-side process running with a
> listener for correctness (e.g. recovery in the case of client failure,
> perhaps other cases too?). But in our current system, Step A) and Step D)
> are different jobs and share no JVMs (i.e. are distinct Java processes) and
> I was looking for an appropriate approach for the unlock/returnLease in
> Step D) given that constraint.
>
> I looked at the following candidate approaches with the constraint of not
> having a continuously running Java process that both acquires and releases
> a lock (or acquires and returns a semaphore lease):
>
> Please correct me if I'm wrong, but my understanding is that for revocation,
> the lock holder needs to be listening for revocation requests and then needs
> to release its lock. Revocation appears to be cooperative, so I would need a
> client-side listener in the locking entity's Java process, which would
> require some (potentially non-trivial) refactoring of the workflow to
> accommodate this, in order to have correct revocation request detection
> followed by lock release.
>
> http://curator.apache.org/curator-recipes/shared-reentrant-lock.html
> - The unlock mechanism requires that the JVM has a valid InterProcessMutex
>   that has already acquired the lock before doing a release() operation. So
>   we have a chicken-and-egg situation here.
>
> http://curator.apache.org/curator-recipes/shared-semaphore.html
> - The Lease (obtained via the acquire method) parameter in the returnLease
>   method (at first glance) appears to require that the same Java process
>   perform both the locking and unlocking (unless the lease can be serialized
>   and transmitted from the locking entity and received and deserialized by
>   the unlocking entity). Although the lease provides a way to mitigate
>   crashed locking entities, there appears to be a tradeoff, where the lease
>   improves recovery from crashed or failed clients but makes the Curator
>   semaphores seem less expressive than the traditional semaphore definition,
>   which does not have any analog of the lease. E.g. in producer/consumer
>   problems, the unlocking entity is distinct from the locking entity (which
>   is why I mentioned it as a motivating example).
>
> This seems to imply that I need to look at the cost of modifying the
> workflow design and see if I can meet the constraint or consider other
> approaches.
>
> With best regards:
>
> Bill
>
> From: Jordan Zimmerman <[email protected]>
> Sent: Thursday, January 26, 2017 5:05 AM
> To: [email protected]
> Subject: Re: Can Curator's recipes for synchronization be used when the
> releasing entity is not the locking entity?
>
> I read the description several times and, sadly, don’t understand. Maybe
> someone else? At first blush it almost sounds like a barrier or double
> barrier: http://curator.apache.org/curator-recipes/barrier.html or
> http://curator.apache.org/curator-recipes/double-barrier.html. But, then, I
> don’t totally understand. Another thing: Curator InterProcessMutex can be
> revoked from another process. See
> http://curator.apache.org/curator-recipes/shared-reentrant-lock.html
> “Revoking” - maybe that’s what you want? Other than that, maybe you can
> restate the problem or give more details.
>
> -Jordan
>
>> On Jan 25, 2017, at 6:03 PM, Foolish Ewe <[email protected]> wrote:
>>
>> Hello All:
>>
>> I would like to use Curator to synchronize mutually exclusive access to a
>> shared resource, however the entity that wants to release a lock is distinct
>> from the locking entity (i.e. they are in different JVMs on different
>> machines). Such cases can occur in practice (e.g. producer/consumer
>> synchronization, but this isn't quite my use case). Informally I would
>> like to have operations that behave like the following in a JVM-based
>> language:
>>
>> Strict requirements:
>> acquire(resourceId, taskId) - Have the task waiting for the resource suspend
>> until it has mutually exclusive access (i.e. acquires the lock) or throw an
>> exception if the request is somehow invalid (i.e. bad resource Id, bad task
>> Id, internal error, etc).
>> release(resourceId) - Given a resource, if there is an acquired lock,
>> release that lock and wake up the next task (in FCFS order) waiting to
>> acquire the lock if it exists
>>
>> Nice to have (useful for maintenance, etc).
>> status(resourceId) - Report if the resource is locked, the current taskId of
>> the acquirer if the lock is acquired and the (potentially empty) FCFS list
>> of tasks waiting to acquire the lock.
>> releaseAll(resourceId) - remove all pending locks on this resource
>>
>> However, the semantics of the recipes I've looked at seem to indicate that
>> the releasing entity must have a handle (either explicit or implicit) of the
>> lease/lock, e.g.
>>
>> http://curator.apache.org/curator-recipes/shared-reentrant-lock.html
>> states:
>> public void release()
>> Perform one release of the mutex if the calling thread is the same thread
>> that acquired it. If the thread had made multiple calls to acquire, the
>> mutex will still be held when this method returns.
>>
>> http://curator.apache.org/curator-recipes/shared-semaphore.html states:
>> Lease instances can either be closed directly or you can use these
>> convenience methods:
>>
>> public void returnAll(Collection<Lease> leases)
>> public void returnLease(Lease lease)
>>
>> So it appears on the surface that the expectation is that the same entity
>> that acquires a mutex or a semaphore lease is expected to release the mutex
>> or return the lease.
>>
>> My questions are:
>> Am I misunderstanding how Curator works?
>> Is there a more appropriate abstraction in Curator for my use case?
>> Can I use one of the existing recipes? Could a releasing entity return a
>> lease if they had a serialized copy of the lease but weren't the entity
>> acquiring the lease?
>> If I need to roll my own, should the Curator Framework be able to help here
>> or should I work at the raw zookeeper level for this use case?
>>
>> Thanks for your help with this:
>>
>> Bill
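
The four operations requested in the original message (acquire, release, status, releaseAll) can be pinned down with a small in-memory model. This is a sketch of the requested semantics only, under loud assumptions: it runs inside one JVM, uses no Curator or ZooKeeper API, has no crash recovery, and the class name `ResourceLockTable` is hypothetical. Its one relevant property is that `release` may be called by any thread, not just the one that called `acquire`.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model: single JVM, no ZooKeeper, no crash recovery.
class ResourceLockTable {
    // Per resource: the head of the deque is the holder, the rest wait in FCFS order.
    private final Map<String, Deque<String>> queues = new HashMap<>();

    private Deque<String> queueFor(String resourceId) {
        if (resourceId == null) throw new IllegalArgumentException("resourceId is required");
        return queues.computeIfAbsent(resourceId, id -> new ArrayDeque<>());
    }

    /** Blocks until taskId reaches the head of the queue for resourceId. */
    public synchronized void acquire(String resourceId, String taskId) {
        if (taskId == null) throw new IllegalArgumentException("taskId is required");
        Deque<String> queue = queueFor(resourceId);
        if (queue.contains(taskId)) throw new IllegalArgumentException("already queued: " + taskId);
        queue.addLast(taskId);
        try {
            while (!taskId.equals(queue.peekFirst())) {
                if (!queue.contains(taskId)) {
                    throw new IllegalStateException("request removed by releaseAll: " + taskId);
                }
                wait(); // woken by release or releaseAll
            }
        } catch (InterruptedException e) {
            queue.remove(taskId); // give up our place in line
            notifyAll();
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
    }

    /** Releases the current holder, whoever calls this. Returns the released taskId or null. */
    public synchronized String release(String resourceId) {
        String released = queueFor(resourceId).pollFirst();
        notifyAll(); // let the new head of the queue proceed
        return released;
    }

    /** Snapshot: first element is the holder, the remainder are waiters in FCFS order. */
    public synchronized List<String> status(String resourceId) {
        return new ArrayList<>(queueFor(resourceId));
    }

    /** Drops the holder and every pending request for resourceId. */
    public synchronized void releaseAll(String resourceId) {
        queueFor(resourceId).clear();
        notifyAll(); // waiters notice they are gone and fail
    }
}
```

A caller would do `table.acquire("resource-1", "task-A")` in one job and `table.release("resource-1")` in another thread. Making this work across JVMs means keeping the queue somewhere shared and deciding what happens when a holder dies without releasing, which is the part the lease and ephemeral-node designs discussed above are meant to address.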
