Hi All,


We are using Apache Storm with Apache Zookeepers.

In our configuration we have Zookeeper cluster with 3 nodes in it. Everything 
was well. But recently we faced issue that zookeepers got restarted during 
night, although there are no calculations or any load during at that period. 
And it continuously happening on each night.

Zookeeper version is 3.4.6.



Here are the logs for time of failure.



On follower:

2015-05-02 08:20:52,603 [myid:3] - INFO  
[PurgeTask:DatadirCleanupManager$PurgeTask@138] - Purge task started.

2015-05-02 08:20:53,044 [myid:3] - INFO  
[PurgeTask:DatadirCleanupManager$PurgeTask@144] - Purge task completed.

2015-05-02 08:45:03,416 [myid:3] - WARN  
[QuorumPeer[myid=3]/0.0.0.0:2181:Follower@89] - Exception when following the 
leader

java.io.EOFException

                at java.io.DataInputStream.readInt(DataInputStream.java:392)

                at 
org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)

                at 
org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:83)

                at 
org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:103)

                at 
org.apache.zookeeper.server.quorum.Learner.readPacket(Learner.java:153)

                at 
org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:85)

                at 
org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:786)

2015-05-02 08:45:03,417 [myid:3] - INFO  
[QuorumPeer[myid=3]/0.0.0.0:2181:Follower@166] - shutdown called

java.lang.Exception: shutdown Follower

                at 
org.apache.zookeeper.server.quorum.Follower.shutdown(Follower.java:166)

                at 
org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:790)

2015-05-02 08:45:03,418 [myid:3] - INFO  
[QuorumPeer[myid=3]/0.0.0.0:2181:NIOServerCnxn@1007] - Closed socket connection 
for client /10.49.5.15:41692 which had sessionid 0x34cef2f0370017c

2015-05-02 08:45:03,418 [myid:3] - INFO  
[QuorumPeer[myid=3]/0.0.0.0:2181:NIOServerCnxn@1007] - Closed socket connection 
for client /10.49.5.15:41687 which had sessionid 0x34cef2f0370017a

2015-05-02 08:45:03,419 [myid:3] - INFO  
[QuorumPeer[myid=3]/0.0.0.0:2181:NIOServerCnxn@1007] - Closed socket connection 
for client /10.49.5.19:40478 which had sessionid 0x34cef2f0370017d

2015-05-02 08:45:03,419 [myid:3] - INFO  
[QuorumPeer[myid=3]/0.0.0.0:2181:NIOServerCnxn@1007] - Closed socket connection 
for client /10.49.5.19:40474 which had sessionid 0x34cef2f0370017b

2015-05-02 08:45:03,420 [myid:3] - INFO  
[QuorumPeer[myid=3]/0.0.0.0:2181:NIOServerCnxn@1007] - Closed socket connection 
for client /10.49.5.17:55431 which had sessionid 0x34cef2f03700179

2015-05-02 08:45:03,420 [myid:3] - INFO  
[QuorumPeer[myid=3]/0.0.0.0:2181:FollowerZooKeeperServer@139] - Shutting down

2015-05-02 08:45:03,420 [myid:3] - INFO  
[QuorumPeer[myid=3]/0.0.0.0:2181:ZooKeeperServer@441] - shutting down

2015-05-02 08:45:03,420 [myid:3] - INFO  
[QuorumPeer[myid=3]/0.0.0.0:2181:FollowerRequestProcessor@105] - Shutting down

2015-05-02 08:45:03,421 [myid:3] - INFO  
[FollowerRequestProcessor:3:FollowerRequestProcessor@95] - 
FollowerRequestProcessor exited loop!

2015-05-02 08:45:03,421 [myid:3] - INFO  
[QuorumPeer[myid=3]/0.0.0.0:2181:CommitProcessor@181] - Shutting down

2015-05-02 08:45:03,422 [myid:3] - INFO  
[QuorumPeer[myid=3]/0.0.0.0:2181:FinalRequestProcessor@415] - shutdown of 
request processor complete

2015-05-02 08:45:03,422 [myid:3] - INFO  
[CommitProcessor:3:CommitProcessor@150] - CommitProcessor exited loop!

2015-05-02 08:45:03,422 [myid:3] - INFO  
[QuorumPeer[myid=3]/0.0.0.0:2181:SyncRequestProcessor@209] - Shutting down





On leader:



2015-05-02 08:20:45,078 [myid:2] - INFO  
[PurgeTask:DatadirCleanupManager$PurgeTask@138] - Purge task started.

2015-05-02 08:20:45,403 [myid:2] - INFO  
[PurgeTask:DatadirCleanupManager$PurgeTask@144] - Purge task completed.

2015-05-02 08:45:03,415 [myid:2] - WARN  
[QuorumPeer[myid=2]/0.0.0.0:2181:LearnerHandler@687] - Closing connection to 
peer due to transaction timeout.

2015-05-02 08:45:03,415 [myid:2] - WARN  
[QuorumPeer[myid=2]/0.0.0.0:2181:LearnerHandler@687] - Closing connection to 
peer due to transaction timeout.

2015-05-02 08:45:03,416 [myid:2] - WARN  
[LearnerHandler-/148.112.48.199:43202:LearnerHandler@646] - ******* GOODBYE 
/148.112.48.199:43202 ********

2015-05-02 08:45:03,417 [myid:2] - WARN  
[LearnerHandler-/148.112.48.199:43202:LearnerHandler@658] - Ignoring unexpected 
exception

java.lang.InterruptedException

                at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219)

                at 
java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340)

                at 
java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338)

                at 
org.apache.zookeeper.server.quorum.LearnerHandler.shutdown(LearnerHandler.java:656)

                at 
org.apache.zookeeper.server.quorum.LearnerHandler.run(LearnerHandler.java:649)

2015-05-02 08:45:03,417 [myid:2] - WARN  
[LearnerHandler-/148.112.48.201:33870:LearnerHandler@646] - ******* GOODBYE 
/148.112.48.201:33870 ********

2015-05-02 08:45:03,417 [myid:2] - WARN  
[LearnerHandler-/148.112.48.201:33870:LearnerHandler@658] - Ignoring unexpected 
exception

java.lang.InterruptedException

                at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219)

                at 
java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340)

                at 
java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338)

                at 
org.apache.zookeeper.server.quorum.LearnerHandler.shutdown(LearnerHandler.java:656)

                at 
org.apache.zookeeper.server.quorum.LearnerHandler.run(LearnerHandler.java:649)

2015-05-02 08:45:03,536 [myid:2] - INFO  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted 
socket connection from /10.49.5.16:53812

2015-05-02 08:45:03,539 [myid:2] - INFO  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@861] - Client 
attempting to renew session 0x14cef2ee4c80171 at /10.49.5.16:53812

2015-05-02 08:45:03,539 [myid:2] - INFO  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@617] - Established 
session 0x14cef2ee4c80171 with negotiated timeout 20000 for client 
/10.49.5.16:53812

2015-05-02 08:45:03,582 [myid:2] - INFO  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted 
socket connection from /10.49.5.19:37378

2015-05-02 08:45:03,585 [myid:2] - INFO  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@861] - Client 
attempting to renew session 0x14cef2ee4c80173 at /10.49.5.19:37378

2015-05-02 08:45:03,586 [myid:2] - INFO  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@617] - Established 
session 0x14cef2ee4c80173 with negotiated timeout 20000 for client 
/10.49.5.19:37378

2015-05-02 08:45:03,606 [myid:2] - INFO  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted 
socket connection from /10.49.5.17:40379

............

2015-05-02 08:45:04,379 [myid:2] - INFO  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@617] - Established 
session 0x14cef2ee4c8016f with negotiated timeout 20000 for client 
/10.49.5.18:52703

2015-05-02 08:45:04,416 [myid:2] - INFO  
[QuorumPeer[myid=2]/0.0.0.0:2181:Leader@493] - Shutting down

2015-05-02 08:45:04,416 [myid:2] - INFO  
[QuorumPeer[myid=2]/0.0.0.0:2181:Leader@499] - Shutdown called

java.lang.Exception: shutdown Leader! reason: Not sufficient followers synced, 
only synced with sids: [ 2 ]

                at 
org.apache.zookeeper.server.quorum.Leader.shutdown(Leader.java:499)

                at 
org.apache.zookeeper.server.quorum.Leader.lead(Leader.java:474)

                at 
org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:799)

2015-05-02 08:45:04,417 [myid:2] - INFO  
[QuorumPeer[myid=2]/0.0.0.0:2181:NIOServerCnxn@1007] - Closed socket connection 
for client /10.49.5.17:53780 which had sessionid 0x24cef2ee4cf016a





I don't understand why it's happening during nights, when there are no load. 
And everything running ok during day.

Can you help me to understand the root cause.

Will increasing syncLimit parameter? resolve this issue?



Thanks.

Reply via email to