It does look like a special case of that JIRA item. I read back through the Chubby paper and it sounds like they solve this problem using a similar mechanism. They just block the client until either they manage to re-establish a session or until the session timeout expires (at which case they return an error to the application). That seems like the right thing to do here as well.
I can solve this problem for myself by just treating a CONNECTION LOSS event (while holding a lock) as the end of a session and clearing my application state. It isn't ideal, but it will do the job in a safe way while guaranteeing progress on the locks. CG. On Mon, May 31, 2010 at 1:54 PM, Ted Dunning <ted.dunn...@gmail.com> wrote: > > Isn't this a special case of > https://issues.apache.org/jira/browse/ZOOKEEPER-22 ? > > Is there any progress on this? > > > On Mon, May 31, 2010 at 12:34 PM, Patrick Hunt <ph...@apache.org> wrote: > >> Hi Charles, any luck with this? Re the issues you found with the recipes >> please enter a JIRA, it would be good to address the problem(s) you found. >> https://issues.apache.org/jira/browse/ZOOKEEPER >> >> re use of session/thread id, might you use some sort of unique token >> that's dynamically assigned to the thread making a request on the shared >> session? The calling code could then be identified by that token in recovery >> cases. >> >> Patrick >> >> On 05/28/2010 08:28 AM, Charles Gordon wrote: >> >>> Hello, >>> >>> I am new to using Zookeeper and I have a quick question about the locking >>> recipe that can be found here: >>> >>> >>> http://hadoop.apache.org/zookeeper/docs/r3.1.2/recipes.html#sc_recipes_Locks >>> >>> It appears to me that there is a flaw in this algorithm related to >>> partial >>> failure, and I am curious to know how to fix it. >>> >>> The algorithm follows these steps: >>> >>> 1. Call "create()" with a pathname like >>> "/some/path/to/parent/child-lock-". >>> 2. Call "getChildren()" on the lock node without the watch flag set. >>> 3. If the path created in step (1) has the lowest sequence number, you >>> are >>> the master (skip the next steps). >>> 4. Otherwise, call "exists()" with the watch flag set on the child with >>> the >>> next lowest sequence number. >>> 5. If "exists()" returns false, go to step (2), otherwise wait for a >>> notification from the path, then go to step (2). >>> >>> The scenario that seems to be faulty is a partial failure in step (1). >>> Assume that my client program follows step (1) and calls "create()". >>> Assume >>> that the call succeeds on the Zookeeper server, but there is a >>> ConnectionLoss event right as the server sends the response (e.g., a >>> network >>> partition, some dropped packets, the ZK server goes down, etc). Assume >>> further that the client immediately reconnects, so the session is not >>> timed >>> out. At this point there is a child node that was created by my client, >>> but >>> that my client does not know about (since it never received the >>> response). >>> Since my client doesn't know about the child, it won't know to watch the >>> previous child to it, and it also won't know to delete it. That means all >>> clients using that lock will fail to make progress as soon as the >>> orphaned >>> child is the lowest sequence number. This state will continue until my >>> client closes it's session (which may be a while if I have a long lived >>> session, as I would like to have). Correctness is maintained here, but >>> live-ness is not. >>> >>> The only good solution I have found for this problem is to establish a >>> new >>> session with Zookeeper before acquiring a lock, and to close that session >>> immediately upon any connection loss in step (1). If everything works, >>> the >>> session could be re-used, but you'd need to guarantee that the session >>> was >>> closed if there was a failure during creation of the child node. Are >>> there >>> other good solutions? >>> >>> I looked at the sample code that comes with the Zookeeper distribution >>> (I'm >>> using 3.2.2 right now), and it uses the current session ID as part of the >>> child node name. Then, if there is a failure during creation, it tries to >>> look up the child using that session ID. This isn't really helpful in the >>> environment I'm using, where a single session could be shared by multiple >>> threads, any of which could request a lock (so I can't uniquely identify >>> a >>> lock by session ID). I could use thread ID, but then I run the risk of a >>> thread being reused and getting the wrong lock. In any case, there is >>> also >>> the risk that a second failure prevents me from looking up the lock after >>> a >>> connection loss, so I'm right back to an orphaned lock child, as above. I >>> could, presumably, be careful enough with try/catch logic to prevent even >>> that case, but it makes for pretty bug-prone code. Also, as a side note, >>> that code appears to be sorting the child nodes by the session ID first, >>> then the sequence number, which could cause locks to be ordered >>> incorrectly. >>> >>> Thanks for any help you can provide! >>> >>> Charles Gordon >>> >>> >