Hi. So, basically you are saying you don't need failure recovery. At least I did not see any failure recovery scenarios. In that case you need something much simpler than a lock. What you need is a simple CAS operation with notifications, and this is basic ZooKeeper functionality; you don't need Curator to do this (though it still has a nicer API to use).
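To make the idea concrete, here is a minimal sketch of the protocol, assuming the holder's workflow id is stored as the data of a persistent node so that any process can release it. Everything named here (InMemoryZnodes, WorkflowLock, the /locks/ prefix) is hypothetical and only simulates the semantics in memory so the example runs on its own. With a real client the same three steps would presumably map onto ZooKeeper.create (with CreateMode.PERSISTENT), ZooKeeper.exists(path, watcher) and ZooKeeper.delete(path, version). It provides neither FCFS ordering nor crash recovery.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for a ZooKeeper namespace, for illustration only.
class InMemoryZnodes {
    private final Map<String, String> nodes = new HashMap<>();
    private final Map<String, List<Runnable>> deleteWatches = new HashMap<>();

    // Like creating a persistent znode: succeeds only if the path does not exist yet.
    synchronized boolean createIfAbsent(String path, String data) {
        return nodes.putIfAbsent(path, data) == null;
    }

    // Compare-and-delete: removes the node only if it still holds the expected data.
    synchronized boolean deleteIfEquals(String path, String expected) {
        if (!nodes.remove(path, expected)) {
            return false;
        }
        List<Runnable> watches = deleteWatches.remove(path);
        if (watches != null) {
            watches.forEach(Runnable::run); // one-shot notifications, like ZooKeeper watches
        }
        return true;
    }

    synchronized String read(String path) {
        return nodes.get(path);
    }

    // Registers a one-shot callback that fires when the node at path is deleted.
    synchronized void watchDelete(String path, Runnable callback) {
        deleteWatches.computeIfAbsent(path, p -> new ArrayList<>()).add(callback);
    }
}

// Lock whose holder is recorded in the node data, so any process can release it.
class WorkflowLock {
    private final InMemoryZnodes store;
    private final String path;

    WorkflowLock(InMemoryZnodes store, String resourceId) {
        this.store = store;
        this.path = "/locks/" + resourceId;
    }

    // Step A: returns true if workflowId now holds the resource.
    boolean tryAcquire(String workflowId) {
        return store.createIfAbsent(path, workflowId);
    }

    // Step D: may run in a different process than Step A; only the recorded holder id releases.
    boolean release(String workflowId) {
        return store.deleteIfEquals(path, workflowId);
    }

    // Returns the current holder's workflow id, or null if the resource is free.
    String holder() {
        return store.read(path);
    }

    // Asks to be notified once when the current holder releases the resource.
    void onRelease(Runnable callback) {
        store.watchDelete(path, callback);
    }
}

class WorkflowLockDemo {
    public static void main(String[] args) {
        InMemoryZnodes store = new InMemoryZnodes();
        // Two separate objects stand in for the Step A job and the Step D job.
        WorkflowLock stepAJob = new WorkflowLock(store, "shared-resource");
        WorkflowLock stepDJob = new WorkflowLock(store, "shared-resource");

        System.out.println(stepAJob.tryAcquire("workflow-i")); // true
        System.out.println(stepAJob.tryAcquire("workflow-j")); // false: already held
        // workflow-j asks to be notified, then retries the acquire from the callback.
        stepAJob.onRelease(() -> System.out.println("retry: " + stepAJob.tryAcquire("workflow-j")));
        System.out.println(stepDJob.release("workflow-i"));    // prints "retry: true", then true
        System.out.println(stepDJob.holder());                 // workflow-j
    }
}
```

The important design choice is that the lock node is persistent rather than ephemeral: it survives the end of the session that created it, which is what lets a different job release it, and also why a crashed holder would leave the resource locked until someone cleans up by hand.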
Best regards, Vitalii Tymchyshyn

On Mon, Jan 30, 2017, 4:33 PM Foolish Ewe <[email protected]> wrote:

> Hello Jordan:
>
> Thank you for your thoughtful reply, and thanks also to Vitalii
> Tymchyshyn, whose response may be addressing some of my questions.
> Tl;dr: if I understand correctly, the Curator API design constrains the
> client Java process that unlocks or returns a lease to be the same client
> (and hence the same Java process) that acquired the lock/lease.
>
> Let's consider the problem and try to develop some intuition and, if
> needed, formalism. First let's consider the problem outside the Curator
> context and then ask if we can express it in Curator/ZooKeeper.
>
> Suppose we have the following logic before we decorate it with
> synchronization/mutual exclusion. We are given a collection of parallel
> workflows that all do:
>
> Step B) update SharedResource
>
> Step C) read SharedResource (and other inputs) and write computed results
> (to HDFS)
>
> Step E) ProcessResults
>
> It happens that, for our use case, Step C) takes considerable time, and if
> some workflow, say i, is in Step B) or Step C) while another workflow,
> say j, does Step B), then job i will either fail and stop (if we are
> lucky) or have (potentially undetectable) corrupted output.
>
> Thus we would like to guard the critical section, which is Step B) and
> Step C), with mutual exclusion/synchronization.
> Let w denote the workflow id; then the revised job workflow would seem
> to look like the following:
>
> Step A) Acquire exclusive access to the shared resource for workflow w
> (reserve/lock the shared resource)
>
> Step B) update SharedResource
>
> Step C) read SharedResource (and other inputs) and write computed results
> (to HDFS)
>
> Step D) Release/unlock the reservation of the shared resource for workflow
> w, making the shared resource available for access by other workflows
>
> Step E) ProcessResults
>
> Since we aren't asking all the workflows to reach a particular point in
> execution, it is unclear why I would try a synchronization barrier. To me,
> this looks like a traditional mutual exclusion problem (i.e. at most one
> workflow is active in the critical section of Step B or Step C).
>
> The twist in my use case is that Step B) and Step C) are collections of
> one or more different jobs scheduled by YARN, so we don't currently
> support a continuously running client-side process that can host a
> listener. I was looking to see if the off-the-shelf recipes in Curator
> support this. My current understanding (if I understand Vitalii's remarks
> and the documentation) is that Curator's design assumes that the locking
> entity is in the same Java process as the unlocking entity, and that
> the Curator design advocates a client-side process running with a
> listener for correctness (e.g. recovery in the case of client failure,
> perhaps other cases too?). But in our current system, Step A) and Step D)
> are different jobs and share no JVMs (i.e. are distinct Java processes),
> and I was looking for an appropriate approach for the unlock/returnLease
> in Step D) given that constraint.
>
> I looked at the following candidate approaches with the constraint of not
> having a continuously running Java process that both acquires and
> releases a lock (or acquires and returns a semaphore lease):
>
> - Revocation: please correct me if I'm wrong, but my understanding is
>   that the lock holder needs to be listening for revocation requests and
>   then needs to release its lock. Revocation appears to be cooperative,
>   so I would need a client-side listener in the locking entity's Java
>   process, which would require some (potentially non-trivial)
>   refactoring of the workflow to get correct revocation request
>   detection followed by lock release.
> - http://curator.apache.org/curator-recipes/shared-reentrant-lock.html -
>   The unlock mechanism requires that the JVM has a valid
>   InterProcessMutex that has already acquired the lock before doing a
>   release() operation. So we have a chicken-and-egg situation here.
> - http://curator.apache.org/curator-recipes/shared-semaphore.html -
>   The Lease parameter (obtained via the acquire method) of the
>   returnLease method appears, at first glance, to require that the same
>   Java process perform both the locking and the unlocking (unless the
>   lease can be serialized and transmitted from the locking entity and
>   received and deserialized by the unlocking entity). Although the lease
>   provides a way to mitigate crashed locking entities, there appears to
>   be a tradeoff: the lease improves recovery from crashed or failed
>   clients but makes Curator semaphores less expressive than the
>   traditional semaphore, whose definition has no analog of the lease.
>   E.g. in producer/consumer problems, the unlocking entity is distinct
>   from the locking entity (which is why I mentioned it as a motivating
>   example).
>
> This seems to imply that I need to look at the cost of modifying the
> workflow design and see if I can meet the constraint or consider other
> approaches.
>
> With best regards:
>
> Bill
>
> ------------------------------
> *From:* Jordan Zimmerman <[email protected]>
> *Sent:* Thursday, January 26, 2017 5:05 AM
> *To:* [email protected]
> *Subject:* Re: Can Curator's recipes for synchronization be used when the
> releasing entity is not the locking entity?
>
> I read the description several times and, sadly, don’t understand. Maybe
> someone else? At first blush it almost sounds like a barrier or double
> barrier: http://curator.apache.org/curator-recipes/barrier.html or
> http://curator.apache.org/curator-recipes/double-barrier.html. But, then,
> I don’t totally understand. Another thing: Curator InterProcessMutex can be
> revoked from another process. See
> http://curator.apache.org/curator-recipes/shared-reentrant-lock.html
> “Revoking” - maybe that’s what you want? Other than that, maybe you can
> restate the problem or give more details.
>
> -Jordan
>
> On Jan 25, 2017, at 6:03 PM, Foolish Ewe <[email protected]> wrote:
>
> Hello All:
>
> I would like to use Curator to synchronize mutually exclusive access to a
> shared resource; however, the entity that wants to release a lock is
> distinct from the locking entity (i.e. they are in different JVMs on
> different machines). Such cases can occur in practice (e.g.
> producer/consumer synchronization, but this isn't quite my use case).
> Informally, I would like to have operations that behave like the following
> in a JVM-based language:
>
> 1. Strict requirements:
>    1. acquire(resourceId, taskId) - Have the task waiting for the
>       resource suspend until it has mutually exclusive access (i.e.
>       acquires the lock), or throw an exception if the request is somehow
>       invalid (i.e. bad resource id, bad task id, internal error, etc.).
>    2. release(resourceId) - Given a resource, if there is an acquired
>       lock, release that lock and wake up the next task (in FCFS order)
>       waiting to acquire the lock, if one exists.
> 2. Nice to have (useful for maintenance, etc.):
>    1. status(resourceId) - Report if the resource is locked, the
>       current taskId of the acquirer if the lock is acquired, and the
>       (potentially empty) FCFS list of tasks waiting to acquire the lock.
>    2. releaseAll(resourceId) - Remove all pending locks on this resource.
>
> However, the semantics of the recipes I've looked at seem to indicate that
> the releasing entity must have a handle (either explicit or implicit) on
> the lease/lock, e.g.
>
> - http://curator.apache.org/curator-recipes/shared-reentrant-lock.html
>   states:
>
>   public void release()
>   Perform one release of the mutex if the calling thread is the same
>   thread that acquired it. If the thread had made multiple calls to
>   acquire, the mutex will still be held when this method returns.
>
> - http://curator.apache.org/curator-recipes/shared-semaphore.html
>   states:
>
>   Lease instances can either be closed directly or you can use these
>   convenience methods:
>
>   public void returnAll(Collection<Lease> leases)
>   public void returnLease(Lease lease)
>
> So it appears on the surface that the expectation is that the same entity
> that acquires a mutex or a semaphore lease releases the mutex or returns
> the lease.
>
> My questions are:
>
> 1. Am I misunderstanding how Curator works?
> 2. Is there a more appropriate abstraction in Curator for my use case?
> 3. Can I use one of the existing recipes? Could a releasing entity
>    return a lease if they had a serialized copy of the lease but weren't
>    the entity acquiring the lease?
> 4. If I need to roll my own, should the Curator Framework be able to
>    help here or should I work at the raw ZooKeeper level for this use
>    case?
>
> Thanks for your help with this:
>
> Bill
