I have a 5-node Zookeeper 3.4.6 cluster across 3 data centres (2 zookeepers in
each “main” DC, and a 5th in a 3rd DC for quorum). I see that the two nodes in
one DC have regular “issues” where they get kicked out of the cluster and the
ZooKeeperServer process stops for a few minutes until the node rejoins. I’d
like to know a couple of things, if someone could please point me in the
direction of the relevant docs I’d greatly appreciate it.
1.) Is it expected behaviour that when a node is kicked from the cluster, it
will not be allowed to re-join for a period ? From the logs below I can see
that re-establishing a valid cluster took around 15 minutes.
2.) It appears that the leader closes connections to the affected followers
after a “transaction timeout” occurs. Where would I find out what this timeout
is ? Is this the same thing as a session timout (e.g. The default of 20 *
tickTime) ?
3.) Where can I find the definition of the different fields in the election log
messages (I.e. What are “n.round”, “n.zxid”, “n.state” and so on) ? I can make
an educated guess, which seems to point towards a possible network partition
event (see below). This should help me start to narrow down the issue. For
reference, below are some extracts from the relevant logs:
On the leader, I see a message similar to the following :
2015-06-24 04:18:40,152 [myid:5] - WARN
[QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:LearnerHandler@687] - Closing
connection to peer due to transaction timeout.
2015-06-24 04:18:40,152 [myid:5] - WARN
[LearnerHandler-/10.93.167.74:36874:LearnerHandler@646] - ******* GOODBYE
/10.93.167.74:36874 ********
2015-06-24 04:18:40,153 [myid:5] - WARN
[LearnerHandler-/10.93.167.74:36874:LearnerHandler@658] - Ignoring unexpected
exception
java.lang.InterruptedException
at
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219)
at
java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340)
at
java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338)
at
org.apache.zookeeper.server.quorum.LearnerHandler.shutdown(LearnerHandler.java:656)
at
org.apache.zookeeper.server.quorum.LearnerHandler.run(LearnerHandler.java:649)
And on the affected follower :
2015-06-24 04:18:40,160 [myid:3] - WARN
[QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:2181:Follower@89] - Exception when
following the leader
java.io.EOFException
at java.io.DataInputStream.readInt(DataInputStream.java:392)
at
org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
at
org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:83)
at
org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:103)
at
org.apache.zookeeper.server.quorum.Learner.readPacket(Learner.java:153)
at
org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:85)
at
org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:786)
2015-06-24 04:18:40,160 [myid:3] - INFO
[QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:2181:Follower@166] - shutdown called
java.lang.Exception: shutdown Follower
at
org.apache.zookeeper.server.quorum.Follower.shutdown(Follower.java:166)
at
org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:790)
Following this, the follower has a lot of the following :
2015-06-24 04:26:24,615 [myid:3] - INFO
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted
socket connection from /0:0:0:0:0:0:0:1:42079
2015-06-24 04:26:24,616 [myid:3] - WARN
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@362] - Exception
causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not
running
And a few election messages. These are interesting as (from a guess of what the
fields are in these messages), I see what looks like a network partition issue :
2015-06-24 04:26:27,689 [myid:3] - INFO
[WorkerReceiver[myid=3]:FastLeaderElection@597] - Notification: 1 (message
format version), 3 (n.leader), 0x10009b227 (n.zxid), 0x2 (n.round), LOOKING
(n.state), 3 (n.sid), 0x1 (n.peerEpoch) LOOKING (my state)
2015-06-24 04:26:27,691 [myid:3] - INFO
[WorkerReceiver[myid=3]:FastLeaderElection@597] - Notification: 1 (message
format version), 5 (n.leader), 0x0 (n.zxid), 0x1 (n.round), FOLLOWING
(n.state), 4 (n.sid), 0x1 (n.peerEpoch) LOOKING (my state)
Assuming that “n.leader” means what this zookeeper thinks the leader is, it
looks like there are two leaders – 3 (the affected follower) and 5 (the actual
leader that the rest of the cluster recognises) . And a little while later on
the follower, I see a successful election. It looks like there’s also some
shaking out of quorum here – almost as if it’s trying to recover from a near
split-brain scenario (of course, it’s possible my interpretation of these logs
is way off…):
2015-06-24 04:34:27,707 [myid:3] - INFO
[QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@849] - Notification
time out: 60000
2015-06-24 04:34:27,712 [myid:3] - INFO
[WorkerReceiver[myid=3]:FastLeaderElection@597] - Notification: 1 (message
format version), 3 (n.leader), 0x10009b227 (n.zxid), 0x2 (n.round), LOOKING
(n.state), 3 (n.sid), 0x1 (n.peerEpoch) LOOKING (my state)
2015-06-24 04:34:27,714 [myid:3] - INFO
[WorkerReceiver[myid=3]:FastLeaderElection@597] - Notification: 1 (message
format version), 5 (n.leader), 0x0 (n.zxid), 0x1 (n.round), FOLLOWING
(n.state), 4 (n.sid), 0x1 (n.peerEpoch) LOOKING (my state)
2015-06-24 04:34:27,719 [myid:3] - INFO
[WorkerReceiver[myid=3]:FastLeaderElection@597] - Notification: 1 (message
format version), 5 (n.leader), 0x0 (n.zxid), 0x1 (n.round), FOLLOWING
(n.state), 1 (n.sid), 0x1 (n.peerEpoch) LOOKING (my state)
2015-06-24 04:34:27,719 [myid:3] - INFO
[WorkerReceiver[myid=3]:FastLeaderElection@597] - Notification: 1 (message
format version), 5 (n.leader), 0x0 (n.zxid), 0x1 (n.round), FOLLOWING
(n.state), 1 (n.sid), 0x1 (n.peerEpoch) LOOKING (my state)
2015-06-24 04:34:27,721 [myid:3] - INFO
[WorkerReceiver[myid=3]:FastLeaderElection@597] - Notification: 1 (message
format version), 5 (n.leader), 0x0 (n.zxid), 0x1 (n.round), FOLLOWING
(n.state), 2 (n.sid), 0x1 (n.peerEpoch) LOOKING (my state)
2015-06-24 04:34:27,722 [myid:3] - INFO
[WorkerReceiver[myid=3]:FastLeaderElection@597] - Notification: 1 (message
format version), 5 (n.leader), 0x0 (n.zxid), 0x1 (n.round), FOLLOWING
(n.state), 2 (n.sid), 0x1 (n.peerEpoch) LOOKING (my state)
2015-06-24 04:34:27,728 [myid:3] - INFO
[WorkerSender[myid=3]:QuorumCnxManager@193] - Have smaller server identifier,
so dropping the connection: (5, 3)
2015-06-24 04:34:27,762 [myid:3] - INFO
[nftserentzk01.slu.skycdc.com/10.93.167.74:3888:QuorumCnxManager$Listener@511]
- Received connection request /10.105.6.117:38358
2015-06-24 04:34:27,764 [myid:3] - INFO
[WorkerReceiver[myid=3]:FastLeaderElection@597] - Notification: 1 (message
format version), 5 (n.leader), 0x0 (n.zxid), 0x1 (n.round), LEADING (n.state),
5 (n.sid), 0x1 (n.peerEpoch) LOOKING (my state)
2015-06-24 04:34:27,765 [myid:3] - INFO
[QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:2181:QuorumPeer@784] - FOLLOWING
2015-06-24 04:34:27,765 [myid:3] - INFO
[QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:2181:ZooKeeperServer@162] - Created server
with tickTime 2000 minSessionTimeout 4000 maxSessionTimeout 40000 datadir
/zkdata/version-2 snapdir /zkdata/version-2
2015-06-24 04:34:27,765 [myid:3] - INFO
[QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:2181:Follower@63] - FOLLOWING - LEADER
ELECTION TOOK - 943889
Are my assumptions correct here, or am I barking up the wrong tree ?
Many thanks in advance,
Mark Dastmalchi-Round
DevOps Engineer, Sky Search
BSkyB | Great West House | GW1 - 4th floor
Information in this email including any attachments may be privileged,
confidential and is intended exclusively for the addressee. The views expressed
may not be official policy, but the personal views of the originator. If you
have received it in error, please notify the sender by return e-mail and delete
it from your system. You should not reproduce, distribute, store, retransmit,
use or disclose its contents to anyone. Please note we reserve the right to
monitor all e-mail communication through our internal and external networks.
SKY and the SKY marks are trademarks of Sky plc and Sky International AG and
are used under licence. Sky UK Limited (Registration No. 2906991), Sky-In-Home
Service Limited (Registration No. 2067075) and Sky Subscribers Services Limited
(Registration No. 2340150) are direct or indirect subsidiaries of Sky plc
(Registration No. 2247735). All of the companies mentioned in this paragraph
are incorporated in England and Wales and share the same registered office at
Grant Way, Isleworth, Middlesex TW7 5QD.