Re: Locking and Partial Failure
It does look like a special case of that JIRA item. I read back through the Chubby paper, and it sounds like they solve this problem with a similar mechanism: they block the client until either they manage to re-establish a session or the session timeout expires (at which point they return an error to the application). That seems like the right thing to do here as well.

I can solve this problem for myself by treating a CONNECTION LOSS event (while holding a lock) as the end of a session and clearing my application state. It isn't ideal, but it does the job safely while guaranteeing progress on the locks.

CG.

On Mon, May 31, 2010 at 1:54 PM, Ted Dunning wrote:
> Isn't this a special case of
> https://issues.apache.org/jira/browse/ZOOKEEPER-22 ?
>
> Is there any progress on this?
>
> [...]
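The "treat connection loss as session end" policy above can be sketched as a small state machine. This is a minimal illustration of the policy, not any ZooKeeper client API; the class and method names are invented for the example.

```python
class LockSession:
    """Sketch: on a connection loss while holding (or requesting) a lock,
    pessimistically treat the session as dead and clear local lock state,
    rather than trying to resume it."""

    def __init__(self):
        self.holding_lock = False
        self.session_valid = True

    def acquire(self):
        if not self.session_valid:
            raise RuntimeError("session invalidated; establish a new one first")
        self.holding_lock = True

    def on_connection_loss(self):
        # The pessimistic policy: if a lock was involved, abandon the
        # whole session instead of guessing whether the create() landed.
        if self.holding_lock:
            self.session_valid = False
            self.holding_lock = False  # clear application lock state

    def reconnect(self):
        # A brand-new session; the old lock state was already cleared,
        # so the orphaned node (if any) dies with the old session.
        self.session_valid = True
```

This gives up throughput on every connection blip, but it guarantees the client never silently depends on a lock node it does not know it owns.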
Re: Locking and Partial Failure
Isn't this a special case of
https://issues.apache.org/jira/browse/ZOOKEEPER-22 ?

Is there any progress on this?

On Mon, May 31, 2010 at 12:34 PM, Patrick Hunt wrote:
> Hi Charles, any luck with this? Re the issues you found with the recipes
> please enter a JIRA, it would be good to address the problem(s) you found.
> https://issues.apache.org/jira/browse/ZOOKEEPER
>
> re use of session/thread id, might you use some sort of unique token
> that's dynamically assigned to the thread making a request on the shared
> session? The calling code could then be identified by that token in
> recovery cases.
>
> Patrick
>
> [...]
Re: Locking and Partial Failure
Hi Charles, any luck with this? Re the issues you found with the recipes, please enter a JIRA; it would be good to address the problem(s) you found. https://issues.apache.org/jira/browse/ZOOKEEPER

Re use of session/thread id: might you use some sort of unique token that's dynamically assigned to the thread making a request on the shared session? The calling code could then be identified by that token in recovery cases.

Patrick

On 05/28/2010 08:28 AM, Charles Gordon wrote:
> [...]
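The per-request token idea above can be sketched as follows: each lock attempt generates its own token (independent of the shared session ID), embeds it in the child node name, and after a connection loss scans the children for that token to find any node the server may have created. The node-name layout ("lock-<token>-<seq>") is illustrative, not the recipe's actual naming.

```python
import uuid

def make_lock_prefix(token: str) -> str:
    # The server would append the sequence number to this prefix.
    return f"lock-{token}-"

def find_own_node(children, token):
    """After a connection loss, recover the node we may have created
    by looking for our unique request token among the children."""
    matches = [c for c in children if token in c]
    return matches[0] if matches else None

# Each lock request gets a fresh token, even on a shared session.
token = str(uuid.uuid4())
prefix = make_lock_prefix(token)

# Suppose the server assigned sequence 0000000007 before the response was lost:
children = ["lock-aaaa-0000000003", prefix + "0000000007"]
```

Because the token is unique per request rather than per session or per thread, two threads on the same session (or a reused thread) can never be confused with each other during recovery.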
Locking and Partial Failure
Hello,

I am new to using Zookeeper and I have a quick question about the locking recipe that can be found here:

http://hadoop.apache.org/zookeeper/docs/r3.1.2/recipes.html#sc_recipes_Locks

It appears to me that there is a flaw in this algorithm related to partial failure, and I am curious to know how to fix it.

The algorithm follows these steps:

1. Call "create()" with a pathname like "/some/path/to/parent/child-lock-".
2. Call "getChildren()" on the lock node without the watch flag set.
3. If the path created in step (1) has the lowest sequence number, you are the master (skip the next steps).
4. Otherwise, call "exists()" with the watch flag set on the child with the next lowest sequence number.
5. If "exists()" returns false, go to step (2); otherwise, wait for a notification from the path, then go to step (2).

The scenario that seems to be faulty is a partial failure in step (1). Assume that my client program follows step (1) and calls "create()". Assume that the call succeeds on the Zookeeper server, but there is a ConnectionLoss event right as the server sends the response (e.g., a network partition, some dropped packets, the ZK server goes down, etc.). Assume further that the client immediately reconnects, so the session does not time out. At this point there is a child node that was created by my client, but that my client does not know about (since it never received the response). Since my client doesn't know about the child, it won't know to watch the child preceding it, and it also won't know to delete it. That means all clients using that lock will fail to make progress as soon as the orphaned child has the lowest sequence number. This state will continue until my client closes its session (which may be a while, since I would like to have a long-lived session). Correctness is maintained here, but liveness is not.
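The ordering rule in step (3) and the orphaned-node scenario can be shown with a small in-memory simulation. No real ZooKeeper calls are made; the "child-lock-<seq>" names just mirror the recipe's naming above.

```python
def seq(child: str) -> int:
    """Sequence number the server appended, e.g. 'child-lock-0000000005' -> 5."""
    return int(child.rsplit("-", 1)[1])

def holder(children):
    """Step (3): the client whose node has the lowest sequence number
    holds the lock; everyone else watches the node just ahead of them."""
    return min(children, key=seq) if children else None

# Normal case: seq 3 holds the lock, seq 4 watches it.
children = ["child-lock-0000000003", "child-lock-0000000004"]

# Partial failure: a create() succeeded with seq 2, but its owner never
# saw the response.  The orphan is now the lowest sequence number, so
# every live client blocks behind a node nobody will ever delete.
children.append("child-lock-0000000002")
```

Once the orphan is at the head of the queue, no watch ever fires for the live clients, which is exactly the liveness failure described above.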
The only good solution I have found for this problem is to establish a new session with Zookeeper before acquiring a lock, and to close that session immediately upon any connection loss in step (1). If everything works, the session could be re-used, but you'd need to guarantee that the session was closed if there was a failure during creation of the child node. Are there other good solutions?

I looked at the sample code that comes with the Zookeeper distribution (I'm using 3.2.2 right now), and it uses the current session ID as part of the child node name. Then, if there is a failure during creation, it tries to look up the child using that session ID. This isn't really helpful in the environment I'm using, where a single session can be shared by multiple threads, any of which could request a lock (so I can't uniquely identify a lock by session ID). I could use the thread ID, but then I run the risk of a thread being reused and getting the wrong lock. In any case, there is also the risk that a second failure prevents me from looking up the lock after a connection loss, so I'm right back to an orphaned lock child, as above. I could, presumably, be careful enough with try/catch logic to prevent even that case, but it makes for pretty bug-prone code. Also, as a side note, that code appears to sort the child nodes by session ID first, then by sequence number, which could cause locks to be ordered incorrectly.

Thanks for any help you can provide!

Charles Gordon
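The sorting side note above is easy to demonstrate: if children are named with the session ID before the sequence number and compared as plain strings, the session ID dominates the comparison and a later request can sort ahead of an earlier one. The "x-<sessionId>-<seq>" layout here is illustrative of that naming style, not a claim about the exact sample code.

```python
# Two lock nodes: session 200 created its node first (seq 1),
# session 100 created its node second (seq 2).
children = ["x-200-0000000001", "x-100-0000000002"]

# Naive lexicographic sort compares the session id field first.
naive = sorted(children)

# Correct ordering must compare only the server-assigned sequence number.
correct = sorted(children, key=lambda c: int(c.rsplit("-", 1)[1]))
```

With the naive sort, session 100's node (created second) jumps to the front of the queue, so lock acquisition order would no longer match creation order.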