Thanks.  That worked for me.  I'm a little confused about why it threw the 
entire cluster into an unusable state, though.
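
Side note, in case it saves the next person some head-scratching: if I'm reading 
the zxids right, the epoch lives in the high 32 bits of the zxid, so node one's 
proposed zxid in the log below lines up with the "our epoch b" in the FATAL 
line.  Rough back-of-the-envelope decode, nothing authoritative:

    # Assumes the usual zxid layout: epoch in the high 32 bits, counter in the low 32.
    zxid = 47244640287            # "Proposed zxid" from node one's election log below
    epoch = zxid >> 32            # 11 == 0xb -- matches "our epoch b" in the FATAL line
    counter = zxid & 0xFFFFFFFF   # 31
    print(hex(epoch), counter)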

I said before that we restarted all three nodes, but tracing back, we actually 
didn't.  The zookeeper cluster was refusing all connections until we restarted 
node one.  But once node one had been dropped from the cluster, the other two 
nodes formed a quorum and started responding to queries on their own.

Is that expected as well?  I didn't see it in ZOOKEEPER-335, so thought I'd 
mention it.
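
For the archives, the fix Patrick describes below boiled down to roughly the 
following on the broken node -- stop it, set its snapshot/txnlog directory 
aside, and restart so it pulls a fresh snapshot from the leader.  The paths 
and init script are specific to our boxes, so treat this as a loose sketch 
rather than a recipe:

    import shutil
    import subprocess
    from pathlib import Path

    # Path taken from the log below; adjust for your own install.  Only the
    # version-2 subdirectory is moved aside, so the myid file in the parent
    # dataDir stays put.
    DATA_DIR = Path("/services/zookeeper/data/zookeeper/version-2")

    def recover_affected_server():
        # Stop the broken server first (init script location is ours, not ZooKeeper's).
        subprocess.run(["/etc/init.d/zookeeper", "stop"], check=True)
        # Move the data aside rather than deleting it, in case anyone wants to inspect it.
        shutil.move(str(DATA_DIR), str(DATA_DIR) + ".broken")
        # On restart the server rejoins and downloads a current snapshot from the leader.
        subprocess.run(["/etc/init.d/zookeeper", "start"], check=True)

    if __name__ == "__main__":
        recover_affected_server()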



On Jun 2, 2010, at 11:49 AM, Patrick Hunt wrote:

> Hi Charity, unfortunately this is a known issue not specific to 3.3 that 
> we are working to address. See this thread for some background:
> 
> http://zookeeper-user.578899.n2.nabble.com/odd-error-message-td4933761.html
> 
> I've raised the JIRA level to "blocker" to ensure we address this asap.
> 
> As Ted suggested you can remove the datadir -- only on the affected 
> server -- and then restart it. That should resolve the issue (the server 
> will download a snapshot of the current db from the leader).
> 
> Patrick
> 
> On 06/02/2010 11:11 AM, Charity Majors wrote:
>> I upgraded my zookeeper cluster last week from 3.2.1 to 3.3.1, in an attempt 
>> to get away from a client bug that was crashing my backend services.
>> 
>> Unfortunately, this morning I had a server crash, and it brought down my 
>> entire cluster.  I don't have the logs leading up to the crash, because -- 
>> argghffbuggle -- log4j wasn't set up correctly.  But I restarted all three 
>> nodes, and nodes two and three came back up and formed a quorum.
>> 
>> Node one, meanwhile, does this:
>> 
>> 2010-06-02 17:04:56,446 - INFO  
>> [QuorumPeer:/0:0:0:0:0:0:0:0:2181:quorump...@620] - LOOKING
>> 2010-06-02 17:04:56,446 - INFO  
>> [QuorumPeer:/0:0:0:0:0:0:0:0:2181:files...@82] - Reading snapshot 
>> /services/zookeeper/data/zookeeper/version-2/snapshot.a00000045
>> 2010-06-02 17:04:56,476 - INFO  
>> [QuorumPeer:/0:0:0:0:0:0:0:0:2181:fastleaderelect...@649] - New election. My 
>> id =  1, Proposed zxid = 47244640287
>> 2010-06-02 17:04:56,486 - INFO  
>> [QuorumPeer:/0:0:0:0:0:0:0:0:2181:fastleaderelect...@689] - Notification: 1, 
>> 47244640287, 4, 1, LOOKING, LOOKING, 1
>> 2010-06-02 17:04:56,486 - INFO  
>> [QuorumPeer:/0:0:0:0:0:0:0:0:2181:fastleaderelect...@799] - Notification: 3, 
>> 38654707048, 3, 1, LOOKING, LEADING, 3
>> 2010-06-02 17:04:56,486 - INFO  
>> [QuorumPeer:/0:0:0:0:0:0:0:0:2181:fastleaderelect...@799] - Notification: 3, 
>> 38654707048, 3, 1, LOOKING, FOLLOWING, 2
>> 2010-06-02 17:04:56,486 - INFO  
>> [QuorumPeer:/0:0:0:0:0:0:0:0:2181:quorump...@642] - FOLLOWING
>> 2010-06-02 17:04:56,486 - INFO  
>> [QuorumPeer:/0:0:0:0:0:0:0:0:2181:zookeeperser...@151] - Created server with 
>> tickTime 2000 minSessionTimeout 4000 maxSessionTimeout 40000 datadir 
>> /services/zookeeper/data/zookeeper/version-2 snapdir 
>> /services/zookeeper/data/zookeeper/version-2
>> 2010-06-02 17:04:56,486 - FATAL 
>> [QuorumPeer:/0:0:0:0:0:0:0:0:2181:follo...@71] - Leader epoch a is less than 
>> our epoch b
>> 2010-06-02 17:04:56,486 - WARN  
>> [QuorumPeer:/0:0:0:0:0:0:0:0:2181:follo...@82] - Exception when following 
>> the leader
>> java.io.IOException: Error: Epoch of leader is lower
>>        at 
>> org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:73)
>>        at 
>> org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:644)
>> 2010-06-02 17:04:56,486 - INFO  
>> [QuorumPeer:/0:0:0:0:0:0:0:0:2181:follo...@166] - shutdown called
>> java.lang.Exception: shutdown Follower
>>        at 
>> org.apache.zookeeper.server.quorum.Follower.shutdown(Follower.java:166)
>>        at 
>> org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:648)
>> 
>> 
>> 
>> All I can find is this, 
>> http://www.mail-archive.com/zookeeper-comm...@hadoop.apache.org/msg00449.html,
>>  which implies that this state should never happen.
>> 
>> Any suggestions?  If it happens again, I'll just have to roll everything 
>> back to 3.2.1 and live with the client crashes.
>> 
>> 
>> 
>> 
