Re: zookeeper crash

2010-06-02 Thread Ted Dunning
This looks a bit like a small bobble we had when we upgraded a while back.

I THINK that the answer here is to mind-wipe the misbehaving node and have
it resync from scratch from the other nodes.

Wait for confirmation from somebody real.

On Wed, Jun 2, 2010 at 11:11 AM, Charity Majors char...@shopkick.com wrote:

 I upgraded my zookeeper cluster last week from 3.2.1 to 3.3.1, in an
 attempt to get away from a client bug that was crashing my backend services.

 Unfortunately, this morning I had a server crash, and it brought down my
 entire cluster.  I don't have the logs leading up to the crash, because --
 argghffbuggle -- log4j wasn't set up correctly.  But I restarted all three
 nodes, and nodes two and three came back up and formed a quorum.

 Node one, meanwhile, does this:

 2010-06-02 17:04:56,446 - INFO  [QuorumPeer:/0:0:0:0:0:0:0:0:2181:quorump...@620] - LOOKING
 2010-06-02 17:04:56,446 - INFO  [QuorumPeer:/0:0:0:0:0:0:0:0:2181:files...@82] - Reading snapshot /services/zookeeper/data/zookeeper/version-2/snapshot.a0045
 2010-06-02 17:04:56,476 - INFO  [QuorumPeer:/0:0:0:0:0:0:0:0:2181:fastleaderelect...@649] - New election. My id =  1, Proposed zxid = 47244640287
 2010-06-02 17:04:56,486 - INFO  [QuorumPeer:/0:0:0:0:0:0:0:0:2181:fastleaderelect...@689] - Notification: 1, 47244640287, 4, 1, LOOKING, LOOKING, 1
 2010-06-02 17:04:56,486 - INFO  [QuorumPeer:/0:0:0:0:0:0:0:0:2181:fastleaderelect...@799] - Notification: 3, 38654707048, 3, 1, LOOKING, LEADING, 3
 2010-06-02 17:04:56,486 - INFO  [QuorumPeer:/0:0:0:0:0:0:0:0:2181:fastleaderelect...@799] - Notification: 3, 38654707048, 3, 1, LOOKING, FOLLOWING, 2
 2010-06-02 17:04:56,486 - INFO  [QuorumPeer:/0:0:0:0:0:0:0:0:2181:quorump...@642] - FOLLOWING
 2010-06-02 17:04:56,486 - INFO  [QuorumPeer:/0:0:0:0:0:0:0:0:2181:zookeeperser...@151] - Created server with tickTime 2000 minSessionTimeout 4000 maxSessionTimeout 4 datadir /services/zookeeper/data/zookeeper/version-2 snapdir /services/zookeeper/data/zookeeper/version-2
 2010-06-02 17:04:56,486 - FATAL [QuorumPeer:/0:0:0:0:0:0:0:0:2181:follo...@71] - Leader epoch a is less than our epoch b
 2010-06-02 17:04:56,486 - WARN  [QuorumPeer:/0:0:0:0:0:0:0:0:2181:follo...@82] - Exception when following the leader
 java.io.IOException: Error: Epoch of leader is lower
     at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:73)
     at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:644)
 2010-06-02 17:04:56,486 - INFO  [QuorumPeer:/0:0:0:0:0:0:0:0:2181:follo...@166] - shutdown called
 java.lang.Exception: shutdown Follower
     at org.apache.zookeeper.server.quorum.Follower.shutdown(Follower.java:166)
     at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:648)



 All I can find is this,
 http://www.mail-archive.com/zookeeper-comm...@hadoop.apache.org/msg00449.html,
 which implies that this state should never happen.

 Any suggestions?  If it happens again, I'll just have to roll everything
 back to 3.2.1 and live with the client crashes.







Re: zookeeper crash

2010-06-02 Thread Ted Dunning
I knew Patrick would remember to add an important detail.

On Wed, Jun 2, 2010 at 11:49 AM, Patrick Hunt ph...@apache.org wrote:

 As Ted suggested you can remove the datadir -- *only on the affected
 server* -- and then restart it.


Re: zookeeper crash

2010-06-02 Thread Charity Majors
Thanks.  That worked for me.  I'm a little confused about why it threw the 
entire cluster into an unusable state, though.

I said before that we restarted all three nodes, but tracing back, we actually 
didn't.  The zookeeper cluster was refusing all connections until we restarted 
node one.  But once node one had been dropped from the cluster, the other two 
nodes formed a quorum and started responding to queries on their own.

Is that expected as well?  I didn't see it in ZOOKEEPER-335, so thought I'd 
mention it.



On Jun 2, 2010, at 11:49 AM, Patrick Hunt wrote:

 Hi Charity, unfortunately this is a known issue not specific to 3.3 that 
 we are working to address. See this thread for some background:
 
 http://zookeeper-user.578899.n2.nabble.com/odd-error-message-td4933761.html
 
 I've raised the JIRA level to blocker to ensure we address this asap.
 
 As Ted suggested you can remove the datadir -- only on the affected 
 server -- and then restart it. That should resolve the issue (the server 
 will d/l a snapshot of the current db from the leader).
 
 Patrick
 



Re: zookeeper crash

2010-06-02 Thread Flavio Junqueira
Hi Charity, this is certainly not expected. It would be very useful if you
could provide us with as much information about your issue as possible. I
would suggest that either you create a new JIRA and link it to ZOOKEEPER-335,
or that you add to 335 directly.

We'll be looking further into why you have seen this problem and working on a
fix.


Cheers,
-Flavio


Re: zookeeper crash

2010-06-02 Thread Benjamin Reed
Charity, do you mind going through your scenario again to give a
timeline for the failure? I'm a bit confused as to what happened.


ben





Re: zookeeper crash

2010-06-02 Thread Charity Majors
Sure thing.

We got paged this morning because backend services were not able to write to 
the database.  Each server discovers the DB master using zookeeper, so when 
zookeeper goes down, they assume they no longer know who the DB master is and 
stop working.
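A minimal sketch of that discovery pattern, for context (the /db/master znode path and the fail-stop behaviour are assumptions for illustration, not the actual shopkick setup):

    import org.apache.zookeeper.KeeperException;
    import org.apache.zookeeper.ZooKeeper;
    import org.apache.zookeeper.data.Stat;

    // Sketch of "discover the DB master via ZooKeeper"; path and behaviour are
    // assumed, not taken from Charity's code.
    class DbMasterLookup {
        private final ZooKeeper zk;

        DbMasterLookup(ZooKeeper zk) {
            this.zk = zk;
        }

        String currentMaster() throws InterruptedException {
            try {
                Stat stat = new Stat();
                // Read the master's address and leave a watch for changes.
                byte[] data = zk.getData("/db/master", true, stat);
                return new String(data, java.nio.charset.StandardCharsets.UTF_8);
            } catch (KeeperException e) {
                // ZooKeeper unreachable or node missing: callers stop writing
                // rather than guess at the master.
                return null;
            }
        }
    }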

When we realized there were no problems with the database, we logged in to the 
zookeeper nodes.  We weren't able to connect to zookeeper using zkCli.sh from 
any of the three nodes, so we decided to restart them all, starting with node 
one.  However, after restarting node one, the cluster started responding 
normally again. 

(The timestamps on the zookeeper processes on nodes two and three *are* dated 
today, but none of us restarted them.  We checked shell histories and sudo 
logs, and they seem to back us up.)

We tried getting node one to come back up and join the cluster, but that's when 
we realized we weren't getting any logs, because log4j.properties was in the 
wrong location.  Sorry -- I REALLY wish I had those logs for you.  We put log4j 
back in place, and that's when we saw the spew I pasted in my first message.

I'll tack this on to ZK-335.




Re: Locking and Partial Failure

2010-06-02 Thread Charles Gordon
It does look like a special case of that JIRA item. I read back through the
Chubby paper and it sounds like they solve this problem using a similar
mechanism. They just block the client until either they manage to re-establish
a session or until the session timeout expires (at which point they return an
error to the application). That seems like the right thing to do here as well.

I can solve this problem for myself by just treating a CONNECTION LOSS event
(while holding a lock) as the end of a session and clearing my application
state. It isn't ideal, but it will do the job in a safe way while
guaranteeing progress on the locks.

CG.
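A minimal sketch of that workaround, assuming a wrapper of your own around the shared ZooKeeper handle (acquireLockRecipe() and clearApplicationState() below are placeholder hooks, not ZooKeeper API):

    import org.apache.zookeeper.KeeperException;
    import org.apache.zookeeper.ZooKeeper;

    // Sketch only: treat a connection loss during lock acquisition as the end
    // of the session, so any possibly-orphaned ephemeral lock node dies with it.
    public class ConservativeLocker {
        private final ZooKeeper zk;

        public ConservativeLocker(ZooKeeper zk) {
            this.zk = zk;
        }

        public boolean tryLock() throws InterruptedException {
            try {
                acquireLockRecipe();   // run the create()/getChildren()/exists() recipe
                return true;
            } catch (KeeperException.ConnectionLossException e) {
                // We cannot tell whether the create() reached the server, so
                // close the whole session: the server reaps any ephemeral lock
                // node it may have created for us.
                zk.close();
                clearApplicationState();
                return false;
            } catch (KeeperException e) {
                return false;
            }
        }

        private void acquireLockRecipe() throws KeeperException, InterruptedException {
            // placeholder: the lock recipe steps go here
        }

        private void clearApplicationState() {
            // placeholder: drop any cached belief about locks held via this session
        }
    }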

On Mon, May 31, 2010 at 1:54 PM, Ted Dunning ted.dunn...@gmail.com wrote:


 Isn't this a special case of
 https://issues.apache.org/jira/browse/ZOOKEEPER-22 ?

 Is there any progress on this?


 On Mon, May 31, 2010 at 12:34 PM, Patrick Hunt ph...@apache.org wrote:

 Hi Charles, any luck with this? Re the issues you found with the recipes,
 please enter a JIRA; it would be good to address the problem(s) you found.
 https://issues.apache.org/jira/browse/ZOOKEEPER

 Re use of session/thread id: might you use some sort of unique token that's
 dynamically assigned to the thread making a request on the shared session?
 The calling code could then be identified by that token in recovery cases.

 Patrick
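One hedged reading of Patrick's token suggestion above, embedding a per-request UUID in the lock node's name so it can be found again after a connection loss (the naming scheme is illustrative, not the recipe's actual convention):

    import java.util.UUID;
    import org.apache.zookeeper.CreateMode;
    import org.apache.zookeeper.KeeperException;
    import org.apache.zookeeper.ZooDefs;
    import org.apache.zookeeper.ZooKeeper;

    // Sketch: tag each lock request with a fresh token so the node can be
    // located after a ConnectionLoss, even on a session shared by many threads.
    class TokenLockHelper {
        static String createLockNode(ZooKeeper zk, String parent)
                throws KeeperException, InterruptedException {
            String token = UUID.randomUUID().toString();
            try {
                return zk.create(parent + "/lock-" + token + "-", new byte[0],
                        ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);
            } catch (KeeperException.ConnectionLossException e) {
                // Recovery: the create may or may not have happened; look for
                // our token once reconnected (this lookup can itself fail and
                // would need retrying, as the thread below notes).
                for (String child : zk.getChildren(parent, false)) {
                    if (child.contains(token)) {
                        return parent + "/" + child;   // it did succeed earlier
                    }
                }
                throw e;   // genuinely not created; caller can retry
            }
        }
    }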

 On 05/28/2010 08:28 AM, Charles Gordon wrote:

 Hello,

 I am new to using Zookeeper and I have a quick question about the locking
 recipe that can be found here:


 http://hadoop.apache.org/zookeeper/docs/r3.1.2/recipes.html#sc_recipes_Locks

 It appears to me that there is a flaw in this algorithm related to partial
 failure, and I am curious to know how to fix it.

 The algorithm follows these steps:

  1. Call create() with a pathname like /some/path/to/parent/child-lock-.
  2. Call getChildren() on the lock node without the watch flag set.
  3. If the path created in step (1) has the lowest sequence number, you are
     the master (skip the next steps).
  4. Otherwise, call exists() with the watch flag set on the child with the
     next lowest sequence number.
  5. If exists() returns false, go to step (2); otherwise wait for a
     notification from the path, then go to step (2).
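A rough Java sketch of those five steps against the plain org.apache.zookeeper.ZooKeeper client, using an ephemeral sequential node as the published recipe does (class and path names here are illustrative, not the shipped recipe code, and the ConnectionLoss problem discussed below is deliberately ignored):

    import java.util.Collections;
    import java.util.List;
    import java.util.concurrent.CountDownLatch;
    import org.apache.zookeeper.CreateMode;
    import org.apache.zookeeper.KeeperException;
    import org.apache.zookeeper.ZooDefs;
    import org.apache.zookeeper.ZooKeeper;

    // Illustrative sketch of the recipe steps; not the recipe's shipped code.
    public class SimpleLock {
        private final ZooKeeper zk;
        private final String parent;   // e.g. "/some/path/to/parent"
        private String myNode;         // full path of the node created in step 1

        public SimpleLock(ZooKeeper zk, String parent) {
            this.zk = zk;
            this.parent = parent;
        }

        public void acquire() throws KeeperException, InterruptedException {
            // Step 1: create an ephemeral sequential child under the lock node.
            myNode = zk.create(parent + "/child-lock-", new byte[0],
                    ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);
            String myName = myNode.substring(parent.length() + 1);
            while (true) {
                // Step 2: list the children without setting a watch.
                List<String> children = zk.getChildren(parent, false);
                Collections.sort(children);   // shared prefix + padded counter sorts correctly
                int myIndex = children.indexOf(myName);
                if (myIndex == 0) {
                    return;                    // Step 3: lowest sequence number, we hold the lock
                }
                // Step 4: watch only the child immediately below us.
                String previous = parent + "/" + children.get(myIndex - 1);
                final CountDownLatch gone = new CountDownLatch(1);
                if (zk.exists(previous, event -> gone.countDown()) == null) {
                    continue;                  // Step 5: it already vanished, re-check
                }
                gone.await();                  // Step 5: wait for the notification, then loop
            }
        }

        public void release() throws KeeperException, InterruptedException {
            zk.delete(myNode, -1);             // release by deleting our child node
        }
    }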

 The scenario that seems to be faulty is a partial failure in step (1). Assume
 that my client program follows step (1) and calls create(). Assume that the
 call succeeds on the Zookeeper server, but there is a ConnectionLoss event
 right as the server sends the response (e.g., a network partition, some
 dropped packets, the ZK server goes down, etc). Assume further that the
 client immediately reconnects, so the session is not timed out. At this point
 there is a child node that was created by my client, but that my client does
 not know about (since it never received the response). Since my client
 doesn't know about the child, it won't know to watch the previous child to
 it, and it also won't know to delete it. That means all clients using that
 lock will fail to make progress as soon as the orphaned child is the lowest
 sequence number. This state will continue until my client closes its session
 (which may be a while if I have a long-lived session, as I would like to
 have). Correctness is maintained here, but liveness is not.

 The only good solution I have found for this problem is to establish a new
 session with Zookeeper before acquiring a lock, and to close that session
 immediately upon any connection loss in step (1). If everything works, the
 session could be re-used, but you'd need to guarantee that the session was
 closed if there was a failure during creation of the child node. Are there
 other good solutions?

 I looked at the sample code that comes with the Zookeeper distribution (I'm
 using 3.2.2 right now), and it uses the current session ID as part of the
 child node name. Then, if there is a failure during creation, it tries to
 look up the child using that session ID. This isn't really helpful in the
 environment I'm using, where a single session could be shared by multiple
 threads, any of which could request a lock (so I can't uniquely identify a
 lock by session ID). I could use thread ID, but then I run the risk of a
 thread being reused and getting the wrong lock. In any case, there is also
 the risk that a second failure prevents me from looking up the lock after a
 connection loss, so I'm right back to an orphaned lock child, as above. I
 could, presumably, be careful enough with try/catch logic to prevent even
 that case, but it makes for pretty bug-prone code. Also, as a side note, that
 code appears to be sorting the child nodes by the session ID first, then the
 sequence number, which could cause locks to be ordered incorrectly.
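For comparison, ordering strictly by the 10-digit counter that ZooKeeper appends to sequential znodes, regardless of whatever session or token prefix is embedded in the name, could look like this sketch:

    import java.util.Comparator;
    import java.util.List;

    // Sketch: sort lock candidates by the sequence suffix only, so the prefix
    // (session id, token, etc.) cannot reorder the queue.
    class SequenceOrder {
        static void sortBySequence(List<String> children) {
            children.sort(Comparator.comparingLong(SequenceOrder::sequenceOf));
        }

        static long sequenceOf(String name) {
            // Sequential znodes end in a 10-digit, zero-padded counter.
            return Long.parseLong(name.substring(name.length() - 10));
        }
    }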

 Thanks for any help you can provide!

 Charles Gordon