Thanks! I agree with you that Client1 will eventually know that the lock is invalid by tracking disconnection and time.
But:

1. Time cannot be precisely synchronized between servers. It is likely that Client1 will detect the session timeout (via its timer thread) only after the server has already treated Client1's session as timed out and Client2 already thinks it holds the lock. So, within a small time window, more than one client may believe it holds the lock.
2. Thus the lock protocol still cannot guarantee exclusiveness. Is it... er... broken?

(Rough sketches of the quoted recipe and of that timer-and-flag approach follow below the quoted thread.)

On Fri, Jan 11, 2013 at 10:48 PM, Andrey Stepachev <[email protected]> wrote:

> Hi,
>
> Yes, this scenario is very likely.
> But it will only happen for long-running tasks (longer than the session
> timeout); for short-lived tasks the lock will be released before the
> session timeout, surely.
>
> In the case of long-lived locks, Client1 should track disconnection from
> the zk cluster and assume that the lock has been abandoned (and somehow
> notify the lock owner about that). The client knows the value of the
> session timeout, so it can spawn a timer and act according to its program
> logic. For example, it can interrupt the thread that created the lock and
> raise a flag, so the long-running task can know that the lock is no
> longer valid.
>
>
> On Fri, Jan 11, 2013 at 5:46 PM, Zhao Boran <[email protected]> wrote:
>
> > While reading the zookeeper's recipe for
> > locks <http://zookeeper.apache.org/doc/trunk/recipes.html#sc_recipes_Locks>,
> > I get confused:
> >
> > It seems that this recipe for distributed locks cannot guarantee that
> > *"at any snapshot in time no two clients think they hold the same lock"*.
> >
> > But since zookeeper is so widely adopted, if there were such a mistake
> > in the reference doc, someone should have pointed it out long ago.
> >
> > So, what did I misunderstand? Please help me!
> >
> > Recipe for distributed locks (from
> > http://zookeeper.apache.org/doc/trunk/recipes.html#sc_recipes_Locks):
> >
> > Locks
> >
> > Fully distributed locks that are globally synchronous, *meaning at any
> > snapshot in time no two clients think they hold the same lock*. These
> > can be implemented using ZooKeeper. As with priority queues, first
> > define a lock node.
> >
> > 1. Call create() with a pathname of "*locknode*/guid-lock-" and the
> >    sequence and ephemeral flags set.
> > 2. Call getChildren() on the lock node without setting the watch flag
> >    (this is important to avoid the herd effect).
> > 3. If the pathname created in step 1 has the lowest sequence number
> >    suffix, the client has the lock and the client exits the protocol.
> > 4. The client calls exists() with the watch flag set on the path in the
> >    lock directory with the next lowest sequence number.
> > 5. If exists() returns false, go to step 2. Otherwise, wait for a
> >    notification for the pathname from the previous step before going to
> >    step 2.
> >
> > Consider the following case:
> >
> > - Client1 successfully acquired the lock (in step 3), with zk node
> >   "locknode/guid-lock-0";
> > - Client2 created node "locknode/guid-lock-1", failed to acquire the
> >   lock, and is watching "locknode/guid-lock-0";
> > - Later, for some reason (network congestion?), Client1 fails to send
> >   heartbeat messages to the zk cluster on time, but Client1 is still
> >   working perfectly and assumes it still holds the lock.
> > - But ZooKeeper may consider Client1's session timed out, and then
> >   1. deletes "locknode/guid-lock-0",
> >   2. sends a notification to Client2 (or sends the notification first?),
> >   3. but cannot send a "session timeout" notification to Client1 in
> >      time (due to the network congestion?).
> > - Client2 gets the notification, goes to step 2, gets the only node
> >   "locknode/guid-lock-1", which was created by itself; thus, Client2
> >   assumes it holds the lock.
> > - But at the same time, Client1 assumes it holds the lock.
> >
> > Is this a valid scenario?
> >
> > Thanks a lot!
> >
>
>
> --
> Andrey.
>
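For reference, here is a minimal sketch of the quoted recipe as I read it, using the standard org.apache.zookeeper Java client. It assumes a pre-created "/locknode" parent znode; the class name LockRecipeSketch and the connection setup are my own and only for illustration, not code from the docs.

    import java.util.Collections;
    import java.util.List;
    import java.util.concurrent.CountDownLatch;

    import org.apache.zookeeper.CreateMode;
    import org.apache.zookeeper.ZooDefs;
    import org.apache.zookeeper.ZooKeeper;
    import org.apache.zookeeper.data.Stat;

    public class LockRecipeSketch {
        private final ZooKeeper zk;
        private String myNode;  // e.g. "guid-lock-0000000003"

        public LockRecipeSketch(ZooKeeper zk) {
            this.zk = zk;
        }

        public void lock() throws Exception {
            // Step 1: create an ephemeral, sequential child of the lock node.
            String path = zk.create("/locknode/guid-lock-", new byte[0],
                    ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);
            myNode = path.substring("/locknode/".length());

            while (true) {
                // Step 2: list children WITHOUT a watch (avoids the herd effect).
                List<String> children = zk.getChildren("/locknode", false);
                Collections.sort(children);

                // Step 3: lowest sequence number -> this client holds the lock.
                int myIndex = children.indexOf(myNode);
                if (myIndex == 0) {
                    return;
                }

                // Step 4: watch only the child with the next lowest sequence number.
                String previous = "/locknode/" + children.get(myIndex - 1);
                CountDownLatch gone = new CountDownLatch(1);
                Stat stat = zk.exists(previous, event -> gone.countDown());

                // Step 5: if it has already vanished, loop back to step 2;
                // otherwise wait for the watch to fire, then loop.
                if (stat != null) {
                    gone.await();
                }
            }
        }
    }

The discussion above is exactly about the gap this code does not cover: once lock() returns, nothing here tells the caller that the session (and therefore the ephemeral znode) may have gone away behind its back.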
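And here is a sketch of the mitigation you describe: track disconnection, arm a local timer for the (client-known) session timeout, and raise a flag / interrupt the lock-holding thread so a long-running task can notice that its lock may no longer be valid. This is only my illustration under those assumptions; LockValidityTracker and its fields are invented names, and the instance would be registered as the default watcher when constructing the ZooKeeper handle (new ZooKeeper(connectString, sessionTimeoutMs, tracker)).

    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.ScheduledFuture;
    import java.util.concurrent.TimeUnit;
    import java.util.concurrent.atomic.AtomicBoolean;

    import org.apache.zookeeper.WatchedEvent;
    import org.apache.zookeeper.Watcher;

    public class LockValidityTracker implements Watcher {
        private final AtomicBoolean lockPossiblyLost = new AtomicBoolean(false);
        private final ScheduledExecutorService timer =
                Executors.newSingleThreadScheduledExecutor();
        private final Thread worker;          // thread that acquired the lock
        private final long sessionTimeoutMs;  // session timeout known to the client
        private volatile ScheduledFuture<?> pending;

        public LockValidityTracker(Thread worker, long sessionTimeoutMs) {
            this.worker = worker;
            this.sessionTimeoutMs = sessionTimeoutMs;
        }

        // Long-running task checks this flag periodically.
        public boolean lockPossiblyLost() {
            return lockPossiblyLost.get();
        }

        @Override
        public void process(WatchedEvent event) {
            switch (event.getState()) {
                case Disconnected:
                    // Can no longer hear from ZooKeeper: assume the worst once
                    // the session timeout has elapsed on the local clock.
                    pending = timer.schedule(this::giveUp,
                            sessionTimeoutMs, TimeUnit.MILLISECONDS);
                    break;
                case SyncConnected:
                    // Reconnected before the session expired: cancel the pessimistic timer.
                    if (pending != null) pending.cancel(false);
                    break;
                case Expired:
                    giveUp();
                    break;
                default:
                    break;
            }
        }

        private void giveUp() {
            lockPossiblyLost.set(true);
            worker.interrupt();  // task should treat interrupt / flag as "lock lost"
        }
    }

Which brings me back to my point: the flag flips on Client1's clock only after its locally measured session timeout, while the server may expire the session (and notify Client2) earlier. So this narrows the window but cannot close it.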
