Re: zookeeper crash

Ted Dunning Wed, 02 Jun 2010 11:21:03 -0700

This looks a bit like a small bobble we had when upgrading a bit ago.

I THINK that the answer here is to mind-wipe the misbehaving node and have
it resynch from scratch from the other nodes.
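
If it helps, here is roughly what I mean, as a sketch rather than a tested
procedure. It assumes the node is already stopped, that the data directory is
the parent of the version-2 path in the log below, and that moving version-2
aside (instead of deleting it) is enough to make the node pull a fresh
snapshot from the leader on restart. The class and method names are made up
for illustration, and the myid file in the data directory is left alone.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of ZooKeeper.
class SetAsideSnapshotDir {

    // Renames <dataDir>/version-2 to <dataDir>/version-2.bak-<suffix>
    // and returns the new location. Nothing else in dataDir is touched.
    static Path setAside(Path dataDir, String suffix) throws IOException {
        Path versionDir = dataDir.resolve("version-2");
        if (!Files.isDirectory(versionDir)) {
            throw new IOException("no version-2 directory under " + dataDir);
        }
        Path backup = dataDir.resolve("version-2.bak-" + suffix);
        // A plain rename within the same directory; fails if backup exists.
        return Files.move(versionDir, backup);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: SetAsideSnapshotDir <dataDir>");
            System.exit(1);
        }
        // Assumption: the server process on this node is already stopped.
        Path moved = setAside(Paths.get(args[0]),
                String.valueOf(System.currentTimeMillis()));
        System.out.println("moved old state to " + moved);
    }
}
```

Run it with the data directory as the only argument, then start the node
again. Keeping the old directory around means you can put it back if this
turns out to be the wrong call.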


Wait for confirmation from somebody real.
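
In the meantime, the numbers in the election notifications can be decoded by
hand. A zxid is a 64-bit value with the epoch in the high 32 bits and a
counter in the low 32 bits. A small sketch (the class and method names are
mine, not ZooKeeper's):

```java
// Hypothetical helper for reading zxids out of the log, not ZooKeeper code.
class ZxidDecode {

    // High 32 bits of the zxid: the leader epoch.
    static long epochOf(long zxid) {
        return zxid >>> 32;
    }

    // Low 32 bits of the zxid: the counter within that epoch.
    static long counterOf(long zxid) {
        return zxid & 0xffffffffL;
    }

    static String describe(long zxid) {
        return "epoch=" + Long.toHexString(epochOf(zxid))
                + " counter=" + Long.toHexString(counterOf(zxid));
    }

    public static void main(String[] args) {
        // Zxid proposed by node one in the quoted log.
        System.out.println("node one: " + describe(47244640287L));
        // prints "node one: epoch=b counter=1f"

        // Zxid in the notifications from nodes two and three.
        System.out.println("quorum:   " + describe(38654707048L));
        // prints "quorum:   epoch=9 counter=568"
    }
}
```

Node one's proposed zxid comes out as epoch b, which matches the "our epoch b"
in the FATAL line. The zxid that nodes two and three voted for is in epoch 9,
and my assumption (not checked against the 3.3.1 source) is that the new
leader then moves to the next epoch, a. Either way, node one believes it has
seen a newer epoch than the rest of the cluster.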

On Wed, Jun 2, 2010 at 11:11 AM, Charity Majors <char...@shopkick.com> wrote:

> I upgraded my zookeeper cluster last week from 3.2.1 to 3.3.1, in an
> attempt to get away from a client bug that was crashing my backend services.
>
> Unfortunately, this morning I had a server crash, and it brought down my
> entire cluster.  I don't have the logs leading up to the crash, because --
> argghffbuggle -- log4j wasn't set up correctly.  But I restarted all three
> nodes, and nodes two and three came back up and formed a quorum.
>
> Node one, meanwhile, does this:
>
> 2010-06-02 17:04:56,446 - INFO
>  [QuorumPeer:/0:0:0:0:0:0:0:0:2181:quorump...@620] - LOOKING
> 2010-06-02 17:04:56,446 - INFO
>  [QuorumPeer:/0:0:0:0:0:0:0:0:2181:files...@82] - Reading snapshot
> /services/zookeeper/data/zookeeper/version-2/snapshot.a00000045
> 2010-06-02 17:04:56,476 - INFO
>  [QuorumPeer:/0:0:0:0:0:0:0:0:2181:fastleaderelect...@649] - New election.
> My id =  1, Proposed zxid = 47244640287
> 2010-06-02 17:04:56,486 - INFO
>  [QuorumPeer:/0:0:0:0:0:0:0:0:2181:fastleaderelect...@689] - Notification:
> 1, 47244640287, 4, 1, LOOKING, LOOKING, 1
> 2010-06-02 17:04:56,486 - INFO
>  [QuorumPeer:/0:0:0:0:0:0:0:0:2181:fastleaderelect...@799] - Notification:
> 3, 38654707048, 3, 1, LOOKING, LEADING, 3
> 2010-06-02 17:04:56,486 - INFO
>  [QuorumPeer:/0:0:0:0:0:0:0:0:2181:fastleaderelect...@799] - Notification:
> 3, 38654707048, 3, 1, LOOKING, FOLLOWING, 2
> 2010-06-02 17:04:56,486 - INFO
>  [QuorumPeer:/0:0:0:0:0:0:0:0:2181:quorump...@642] - FOLLOWING
> 2010-06-02 17:04:56,486 - INFO
>  [QuorumPeer:/0:0:0:0:0:0:0:0:2181:zookeeperser...@151] - Created server
> with tickTime 2000 minSessionTimeout 4000 maxSessionTimeout 40000 datadir
> /services/zookeeper/data/zookeeper/version-2 snapdir
> /services/zookeeper/data/zookeeper/version-2
> 2010-06-02 17:04:56,486 - FATAL
> [QuorumPeer:/0:0:0:0:0:0:0:0:2181:follo...@71] - Leader epoch a is less
> than our epoch b
> 2010-06-02 17:04:56,486 - WARN
>  [QuorumPeer:/0:0:0:0:0:0:0:0:2181:follo...@82] - Exception when following
> the leader
> java.io.IOException: Error: Epoch of leader is lower
>       at
> org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:73)
>       at
> org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:644)
> 2010-06-02 17:04:56,486 - INFO
>  [QuorumPeer:/0:0:0:0:0:0:0:0:2181:follo...@166] - shutdown called
> java.lang.Exception: shutdown Follower
>       at
> org.apache.zookeeper.server.quorum.Follower.shutdown(Follower.java:166)
>       at
> org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:648)
>
>
>
> All I can find is this,
> http://www.mail-archive.com/zookeeper-comm...@hadoop.apache.org/msg00449.html,
> which implies that this state should never happen.
>
> Any suggestions?  If it happens again, I'll just have to roll everything
> back to 3.2.1 and live with the client crashes.
>
>
>
>
>
