Hi All,
We are using Apache Storm with Apache Zookeepers.
In our configuration we have Zookeeper cluster with 3 nodes in it. Everything
was well. But recently we faced issue that zookeepers got restarted during
night, although there are no calculations or any load during at that period.
And it continuously happening on each night.
Zookeeper version is 3.4.6.
Here are the logs for time of failure.
On follower:
2015-05-02 08:20:52,603 [myid:3] - INFO
[PurgeTask:DatadirCleanupManager$PurgeTask@138] - Purge task started.
2015-05-02 08:20:53,044 [myid:3] - INFO
[PurgeTask:DatadirCleanupManager$PurgeTask@144] - Purge task completed.
2015-05-02 08:45:03,416 [myid:3] - WARN
[QuorumPeer[myid=3]/0.0.0.0:2181:Follower@89] - Exception when following the
leader
java.io.EOFException
at java.io.DataInputStream.readInt(DataInputStream.java:392)
at
org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
at
org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:83)
at
org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:103)
at
org.apache.zookeeper.server.quorum.Learner.readPacket(Learner.java:153)
at
org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:85)
at
org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:786)
2015-05-02 08:45:03,417 [myid:3] - INFO
[QuorumPeer[myid=3]/0.0.0.0:2181:Follower@166] - shutdown called
java.lang.Exception: shutdown Follower
at
org.apache.zookeeper.server.quorum.Follower.shutdown(Follower.java:166)
at
org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:790)
2015-05-02 08:45:03,418 [myid:3] - INFO
[QuorumPeer[myid=3]/0.0.0.0:2181:NIOServerCnxn@1007] - Closed socket connection
for client /10.49.5.15:41692 which had sessionid 0x34cef2f0370017c
2015-05-02 08:45:03,418 [myid:3] - INFO
[QuorumPeer[myid=3]/0.0.0.0:2181:NIOServerCnxn@1007] - Closed socket connection
for client /10.49.5.15:41687 which had sessionid 0x34cef2f0370017a
2015-05-02 08:45:03,419 [myid:3] - INFO
[QuorumPeer[myid=3]/0.0.0.0:2181:NIOServerCnxn@1007] - Closed socket connection
for client /10.49.5.19:40478 which had sessionid 0x34cef2f0370017d
2015-05-02 08:45:03,419 [myid:3] - INFO
[QuorumPeer[myid=3]/0.0.0.0:2181:NIOServerCnxn@1007] - Closed socket connection
for client /10.49.5.19:40474 which had sessionid 0x34cef2f0370017b
2015-05-02 08:45:03,420 [myid:3] - INFO
[QuorumPeer[myid=3]/0.0.0.0:2181:NIOServerCnxn@1007] - Closed socket connection
for client /10.49.5.17:55431 which had sessionid 0x34cef2f03700179
2015-05-02 08:45:03,420 [myid:3] - INFO
[QuorumPeer[myid=3]/0.0.0.0:2181:FollowerZooKeeperServer@139] - Shutting down
2015-05-02 08:45:03,420 [myid:3] - INFO
[QuorumPeer[myid=3]/0.0.0.0:2181:ZooKeeperServer@441] - shutting down
2015-05-02 08:45:03,420 [myid:3] - INFO
[QuorumPeer[myid=3]/0.0.0.0:2181:FollowerRequestProcessor@105] - Shutting down
2015-05-02 08:45:03,421 [myid:3] - INFO
[FollowerRequestProcessor:3:FollowerRequestProcessor@95] -
FollowerRequestProcessor exited loop!
2015-05-02 08:45:03,421 [myid:3] - INFO
[QuorumPeer[myid=3]/0.0.0.0:2181:CommitProcessor@181] - Shutting down
2015-05-02 08:45:03,422 [myid:3] - INFO
[QuorumPeer[myid=3]/0.0.0.0:2181:FinalRequestProcessor@415] - shutdown of
request processor complete
2015-05-02 08:45:03,422 [myid:3] - INFO
[CommitProcessor:3:CommitProcessor@150] - CommitProcessor exited loop!
2015-05-02 08:45:03,422 [myid:3] - INFO
[QuorumPeer[myid=3]/0.0.0.0:2181:SyncRequestProcessor@209] - Shutting down
On leader:
2015-05-02 08:20:45,078 [myid:2] - INFO
[PurgeTask:DatadirCleanupManager$PurgeTask@138] - Purge task started.
2015-05-02 08:20:45,403 [myid:2] - INFO
[PurgeTask:DatadirCleanupManager$PurgeTask@144] - Purge task completed.
2015-05-02 08:45:03,415 [myid:2] - WARN
[QuorumPeer[myid=2]/0.0.0.0:2181:LearnerHandler@687] - Closing connection to
peer due to transaction timeout.
2015-05-02 08:45:03,415 [myid:2] - WARN
[QuorumPeer[myid=2]/0.0.0.0:2181:LearnerHandler@687] - Closing connection to
peer due to transaction timeout.
2015-05-02 08:45:03,416 [myid:2] - WARN
[LearnerHandler-/148.112.48.199:43202:LearnerHandler@646] - ******* GOODBYE
/148.112.48.199:43202 ********
2015-05-02 08:45:03,417 [myid:2] - WARN
[LearnerHandler-/148.112.48.199:43202:LearnerHandler@658] - Ignoring unexpected
exception
java.lang.InterruptedException
at
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219)
at
java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340)
at
java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338)
at
org.apache.zookeeper.server.quorum.LearnerHandler.shutdown(LearnerHandler.java:656)
at
org.apache.zookeeper.server.quorum.LearnerHandler.run(LearnerHandler.java:649)
2015-05-02 08:45:03,417 [myid:2] - WARN
[LearnerHandler-/148.112.48.201:33870:LearnerHandler@646] - ******* GOODBYE
/148.112.48.201:33870 ********
2015-05-02 08:45:03,417 [myid:2] - WARN
[LearnerHandler-/148.112.48.201:33870:LearnerHandler@658] - Ignoring unexpected
exception
java.lang.InterruptedException
at
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219)
at
java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340)
at
java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338)
at
org.apache.zookeeper.server.quorum.LearnerHandler.shutdown(LearnerHandler.java:656)
at
org.apache.zookeeper.server.quorum.LearnerHandler.run(LearnerHandler.java:649)
2015-05-02 08:45:03,536 [myid:2] - INFO
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted
socket connection from /10.49.5.16:53812
2015-05-02 08:45:03,539 [myid:2] - INFO
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@861] - Client
attempting to renew session 0x14cef2ee4c80171 at /10.49.5.16:53812
2015-05-02 08:45:03,539 [myid:2] - INFO
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@617] - Established
session 0x14cef2ee4c80171 with negotiated timeout 20000 for client
/10.49.5.16:53812
2015-05-02 08:45:03,582 [myid:2] - INFO
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted
socket connection from /10.49.5.19:37378
2015-05-02 08:45:03,585 [myid:2] - INFO
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@861] - Client
attempting to renew session 0x14cef2ee4c80173 at /10.49.5.19:37378
2015-05-02 08:45:03,586 [myid:2] - INFO
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@617] - Established
session 0x14cef2ee4c80173 with negotiated timeout 20000 for client
/10.49.5.19:37378
2015-05-02 08:45:03,606 [myid:2] - INFO
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted
socket connection from /10.49.5.17:40379
............
2015-05-02 08:45:04,379 [myid:2] - INFO
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@617] - Established
session 0x14cef2ee4c8016f with negotiated timeout 20000 for client
/10.49.5.18:52703
2015-05-02 08:45:04,416 [myid:2] - INFO
[QuorumPeer[myid=2]/0.0.0.0:2181:Leader@493] - Shutting down
2015-05-02 08:45:04,416 [myid:2] - INFO
[QuorumPeer[myid=2]/0.0.0.0:2181:Leader@499] - Shutdown called
java.lang.Exception: shutdown Leader! reason: Not sufficient followers synced,
only synced with sids: [ 2 ]
at
org.apache.zookeeper.server.quorum.Leader.shutdown(Leader.java:499)
at
org.apache.zookeeper.server.quorum.Leader.lead(Leader.java:474)
at
org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:799)
2015-05-02 08:45:04,417 [myid:2] - INFO
[QuorumPeer[myid=2]/0.0.0.0:2181:NIOServerCnxn@1007] - Closed socket connection
for client /10.49.5.17:53780 which had sessionid 0x24cef2ee4cf016a
I don't understand why it's happening during nights, when there are no load.
And everything running ok during day.
Can you help me to understand the root cause.
Will increasing syncLimit parameter? resolve this issue?
Thanks.