Re: Follower never recovers and keeps saying ZooKeeperServer not running
I've seen this bur can't remember what caused it. What's your zNode size? steph On Tue, Jul 3, 2018, 9:17 PM Benjamin Jaton wrote: > Hello, > > I'm wondering what can cause a ZK follower to check out like this: > > 2018-07-03T13:43:28,814 [myid:] - ERROR [LearnerHandler-/10.0.0.248:40282 > :LearnerHandler@620] - Unexpected exception causing shutdown while sock > still open > java.io.EOFException: null > at java.io.DataInputStream.readInt(DataInputStream.java:392) > ~[?:1.8.0_171] > at > org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63) > ~[zookeeper-3.5.0-alpha.jar:3.5.2-alpha-SNAPSHOT-1702518] > at > > org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:83) > ~[zookeeper-3.5.0-alpha.jar:?] > at > org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:99) > ~[zookeeper-3.5.0-alpha.jar:3.5.2-alpha-SNAPSHOT-1702518] > at > > org.apache.zookeeper.server.quorum.LearnerHandler.run(LearnerHandler.java:514) > [zookeeper-3.5.0-alpha.jar:?] > 2018-07-03T13:43:28,814 [myid:] - WARN [LearnerHandler-/10.0.0.248:40282 > :LearnerHandler@635] - *** GOODBYE /10.0.0.248:40282 > 2018-07-03T13:43:28,814 [myid:] - ERROR [LearnerHandler-/10.0.0.237:54408 > :LearnerHandler@620] - Unexpected exception causing shutdown while sock > still open > java.io.EOFException: null > at java.io.DataInputStream.readInt(DataInputStream.java:392) > ~[?:1.8.0_171] > at > org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63) > ~[zookeeper-3.5.0-alpha.jar:3.5.2-alpha-SNAPSHOT-1702518] > at > > org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:83) > ~[zookeeper-3.5.0-alpha.jar:?] > at > org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:99) > ~[zookeeper-3.5.0-alpha.jar:3.5.2-alpha-SNAPSHOT-1702518] > at > > org.apache.zookeeper.server.quorum.LearnerHandler.run(LearnerHandler.java:514) > [zookeeper-3.5.0-alpha.jar:?] > 2018-07-03T13:43:28,814 [myid:] - INFO > [NIOServerCxnFactory.AcceptThread:/0.0.0.0:2181 > :NIOServerCnxnFactory$AcceptThread@296] - Accepted socket connection from > / > 10.0.0.237:46754 > 2018-07-03T13:43:28,814 [myid:] - WARN [LearnerHandler-/10.0.0.237:54408 > :LearnerHandler@635] - *** GOODBYE /10.0.0.237:54408 > 2018-07-03T13:43:28,814 [myid:] - INFO > [ConnnectionExpirer:MBeanRegistry@128] - Unregister MBean > > [org.apache.ZooKeeperService:name0=ReplicatedServer_id2,name1=replica.2,name2=Leader,name3=Connections,name4=10.0.0.111,name5=0x25f3766] > 2018-07-03T13:43:28,815 [myid:] - INFO > [NIOServerCxnFactory.AcceptThread:/0.0.0.0:2181 > :NIOServerCnxnFactory$AcceptThread@296] - Accepted socket connection from > / > 10.0.0.248:33198 > 2018-07-03T13:43:28,815 [myid:] - INFO > [NIOServerCxnFactory.AcceptThread:/0.0.0.0:2181 > :NIOServerCnxnFactory$AcceptThread@296] - Accepted socket connection from > / > 10.0.0.237:46780 > 2018-07-03T13:43:28,815 [myid:] - INFO > [NIOServerCxnFactory.AcceptThread:/0.0.0.0:2181 > :NIOServerCnxnFactory$AcceptThread@296] - Accepted socket connection from > / > 10.0.0.248:33268 > 2018-07-03T13:43:28,815 [myid:] - INFO > [ConnnectionExpirer:NIOServerCnxn@606] - Closed socket connection for > client /10.0.0.111:39162 which had sessionid 0x25f3766 > 2018-07-03T13:43:28,815 [myid:] - INFO > [NIOServerCxnFactory.AcceptThread:/0.0.0.0:2181 > :NIOServerCnxnFactory$AcceptThread@296] - Accepted socket connection from > / > 10.0.0.237:46810 > 2018-07-03T13:43:28,815 [myid:] - INFO > [NIOServerCxnFactory.AcceptThread:/0.0.0.0:2181 > :NIOServerCnxnFactory$AcceptThread@296] - Accepted socket connection from > / > 10.0.0.237:46824 > 2018-07-03T13:43:28,815 [myid:] - INFO > [NIOServerCxnFactory.AcceptThread:/0.0.0.0:2181 > :NIOServerCnxnFactory$AcceptThread@296] - Accepted socket connection from > / > 10.0.0.237:46826 > 2018-07-03T13:43:28,815 [myid:] - INFO > > [QuorumPeer[myid=2](plain=/0:0:0:0:0:0:0:0:2181)(secure=disabled):Leader@613 > ] > - Shutting down > 2018-07-03T13:43:28,815 [myid:] - INFO > [NIOServerCxnFactory.AcceptThread:/0.0.0.0:2181 > :NIOServerCnxnFactory$AcceptThread@296] - Accepted socket connection from > / > 10.0.0.248:33316 > 2018-07-03T13:43:28,815 [myid:] - INFO > [NIOServerCxnFactory.AcceptThread:/0.0.0.0:2181 > :NIOServerCnxnFactory$AcceptThread@296] - Accepted socket connection from > / > 10.0.0.248:33548 > 2018-07-03T13:43:28,815 [myid:] - INFO > > [QuorumPeer[myid=2](plain=/0:0:0:0:0:0:0:0:2181)(secure=disabled):Leader@619 > ] > - Shutdown called > java.lang.Exception: shutdown Leader! reason: Not sufficient followers > synced, only synced with sids: [ [2] ] > at > org.apache.zookeeper.server.quorum.Leader.shutdown(Leader.java:619) > [zookeeper-3.5.0-alpha.jar:?] > at org.apache.zookeeper.server.quorum.Leader.lead(Leader.java:590) > [zookeeper-3.5.0-alpha.jar:?] > at >
Follower never recovers and keeps saying ZooKeeperServer not running
Hello, I'm wondering what can cause a ZK follower to check out like this: 2018-07-03T13:43:28,814 [myid:] - ERROR [LearnerHandler-/10.0.0.248:40282 :LearnerHandler@620] - Unexpected exception causing shutdown while sock still open java.io.EOFException: null at java.io.DataInputStream.readInt(DataInputStream.java:392) ~[?:1.8.0_171] at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63) ~[zookeeper-3.5.0-alpha.jar:3.5.2-alpha-SNAPSHOT-1702518] at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:83) ~[zookeeper-3.5.0-alpha.jar:?] at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:99) ~[zookeeper-3.5.0-alpha.jar:3.5.2-alpha-SNAPSHOT-1702518] at org.apache.zookeeper.server.quorum.LearnerHandler.run(LearnerHandler.java:514) [zookeeper-3.5.0-alpha.jar:?] 2018-07-03T13:43:28,814 [myid:] - WARN [LearnerHandler-/10.0.0.248:40282 :LearnerHandler@635] - *** GOODBYE /10.0.0.248:40282 2018-07-03T13:43:28,814 [myid:] - ERROR [LearnerHandler-/10.0.0.237:54408 :LearnerHandler@620] - Unexpected exception causing shutdown while sock still open java.io.EOFException: null at java.io.DataInputStream.readInt(DataInputStream.java:392) ~[?:1.8.0_171] at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63) ~[zookeeper-3.5.0-alpha.jar:3.5.2-alpha-SNAPSHOT-1702518] at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:83) ~[zookeeper-3.5.0-alpha.jar:?] at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:99) ~[zookeeper-3.5.0-alpha.jar:3.5.2-alpha-SNAPSHOT-1702518] at org.apache.zookeeper.server.quorum.LearnerHandler.run(LearnerHandler.java:514) [zookeeper-3.5.0-alpha.jar:?] 2018-07-03T13:43:28,814 [myid:] - INFO [NIOServerCxnFactory.AcceptThread:/0.0.0.0:2181 :NIOServerCnxnFactory$AcceptThread@296] - Accepted socket connection from / 10.0.0.237:46754 2018-07-03T13:43:28,814 [myid:] - WARN [LearnerHandler-/10.0.0.237:54408 :LearnerHandler@635] - *** GOODBYE /10.0.0.237:54408 2018-07-03T13:43:28,814 [myid:] - INFO [ConnnectionExpirer:MBeanRegistry@128] - Unregister MBean [org.apache.ZooKeeperService:name0=ReplicatedServer_id2,name1=replica.2,name2=Leader,name3=Connections,name4=10.0.0.111,name5=0x25f3766] 2018-07-03T13:43:28,815 [myid:] - INFO [NIOServerCxnFactory.AcceptThread:/0.0.0.0:2181 :NIOServerCnxnFactory$AcceptThread@296] - Accepted socket connection from / 10.0.0.248:33198 2018-07-03T13:43:28,815 [myid:] - INFO [NIOServerCxnFactory.AcceptThread:/0.0.0.0:2181 :NIOServerCnxnFactory$AcceptThread@296] - Accepted socket connection from / 10.0.0.237:46780 2018-07-03T13:43:28,815 [myid:] - INFO [NIOServerCxnFactory.AcceptThread:/0.0.0.0:2181 :NIOServerCnxnFactory$AcceptThread@296] - Accepted socket connection from / 10.0.0.248:33268 2018-07-03T13:43:28,815 [myid:] - INFO [ConnnectionExpirer:NIOServerCnxn@606] - Closed socket connection for client /10.0.0.111:39162 which had sessionid 0x25f3766 2018-07-03T13:43:28,815 [myid:] - INFO [NIOServerCxnFactory.AcceptThread:/0.0.0.0:2181 :NIOServerCnxnFactory$AcceptThread@296] - Accepted socket connection from / 10.0.0.237:46810 2018-07-03T13:43:28,815 [myid:] - INFO [NIOServerCxnFactory.AcceptThread:/0.0.0.0:2181 :NIOServerCnxnFactory$AcceptThread@296] - Accepted socket connection from / 10.0.0.237:46824 2018-07-03T13:43:28,815 [myid:] - INFO [NIOServerCxnFactory.AcceptThread:/0.0.0.0:2181 :NIOServerCnxnFactory$AcceptThread@296] - Accepted socket connection from / 10.0.0.237:46826 2018-07-03T13:43:28,815 [myid:] - INFO [QuorumPeer[myid=2](plain=/0:0:0:0:0:0:0:0:2181)(secure=disabled):Leader@613] - Shutting down 2018-07-03T13:43:28,815 [myid:] - INFO [NIOServerCxnFactory.AcceptThread:/0.0.0.0:2181 :NIOServerCnxnFactory$AcceptThread@296] - Accepted socket connection from / 10.0.0.248:33316 2018-07-03T13:43:28,815 [myid:] - INFO [NIOServerCxnFactory.AcceptThread:/0.0.0.0:2181 :NIOServerCnxnFactory$AcceptThread@296] - Accepted socket connection from / 10.0.0.248:33548 2018-07-03T13:43:28,815 [myid:] - INFO [QuorumPeer[myid=2](plain=/0:0:0:0:0:0:0:0:2181)(secure=disabled):Leader@619] - Shutdown called java.lang.Exception: shutdown Leader! reason: Not sufficient followers synced, only synced with sids: [ [2] ] at org.apache.zookeeper.server.quorum.Leader.shutdown(Leader.java:619) [zookeeper-3.5.0-alpha.jar:?] at org.apache.zookeeper.server.quorum.Leader.lead(Leader.java:590) [zookeeper-3.5.0-alpha.jar:?] at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:1077) [zookeeper-3.5.0-alpha.jar:?] 2018-07-03T13:43:28,815 [myid:] - INFO [QuorumPeer[myid=2](plain=/0:0:0:0:0:0:0:0:2181)(secure=disabled):NIOServerCnxn@606] - Closed socket connection for client /10.0.0.248:33198 (no session established for client) 2018-07-03T13:43:28,816 [myid:] - INFO
Re: Observer went down with Read timed out exception
Andor, Zk version that i use is zk_version 3.4.5-1392090, built on 09/30/2012 17:52 GMT No Auth or encryption config None my of network graphs showing any dip or unusual pattern thats why i am thinking there may not be any n/w issue. I have those nodes in cloud so checking with them to see if any n/w issue between regions. Thanks, Ram On Tue, Jul 3, 2018 at 6:29 AM Andor Molnar wrote: > Hi Rammohan, > > Would you please elaborate on the details of your cluster setup? > Which ZooKeeper version do you use? > Do you use authentication / encryption? > Would you please attach config files and log files of other nodes like > leader and followers? > > How did you make sure that there was no network problem at the time when > issue happened? > Would you please attach graphs / diagrams on the network traffic including > latency and bandwidth usage between the affected data centers? > > Regards, > Andor > > > > > On Tue, Jul 3, 2018 at 2:56 PM, rammohan ganapavarapu < > rammohanga...@gmail.com> wrote: > > > Yes I am sure there is no network issues, if leader is busy in GC > followers > > on the same DC would have been shutdown as we right but it wasn't the > case. > > > > On Tue, Jul 3, 2018, 1:56 AM Norbert Kalmar > > > wrote: > > > > > Hi Ram, > > > > > > Are you sure there were no network error? For me, this looks like it > > could > > > be due to failed heartbeats (as shutdown was called after the timeout). > > > > > > It is also possible the leader was busy (maybe garbage collection > caused > > > pause?) - especially if you store big(ish) chunks of data in ZooKeeper. > > > (There is plan to integrate JVMPauseMonitor to ZooKeeper for this > reason > > > actually). > > > > > > Regards, > > > Norbert > > > > > > On Mon, Jul 2, 2018 at 9:13 PM rammohan ganapavarapu < > > > rammohanga...@gmail.com> wrote: > > > > > > > All, > > > > > > > > I have multi data-center ldap cluster setup with other data-center > with > > > all > > > > observers all of sudden all the observer threads went down with the > > > > following message, any idea why they went down? We don't see any > > network > > > > related issues between data-centers. > > > > > > > > > > > > 2018-06-29 05:32:59,036 [myid:222] - WARN > > > > [QuorumPeer[myid=222]/0:0:0:0:0:0:0:0:2181:Observer@79] - Exception > > when > > > > observing the leader > > > > java.net.SocketTimeoutException: Read timed out > > > > at java.net.SocketInputStream.socketRead0(Native Method) > > > > at java.net.SocketInputStream.socketRead(SocketInputStream.java:116) > > > > at java.net.SocketInputStream.read(SocketInputStream.java:170) > > > > at java.net.SocketInputStream.read(SocketInputStream.java:141) > > > > at java.io.BufferedInputStream.fill(BufferedInputStream.java:246) > > > > at java.io.BufferedInputStream.read(BufferedInputStream.java:265) > > > > at java.io.DataInputStream.readInt(DataInputStream.java:387) > > > > at org.apache.jute.BinaryInputArchive.readInt( > > BinaryInputArchive.java:63) > > > > at > > > > > > > > > > > org.apache.zookeeper.server.quorum.QuorumPacket. > > deserialize(QuorumPacket.java:83) > > > > at > > > > > > > org.apache.jute.BinaryInputArchive.readRecord( > > BinaryInputArchive.java:108) > > > > at > > > org.apache.zookeeper.server.quorum.Learner.readPacket(Learner.java:152) > > > > at > > > > > > > org.apache.zookeeper.server.quorum.Observer.observeLeader( > > Observer.java:75) > > > > at org.apache.zookeeper.server.quorum.QuorumPeer.run( > > QuorumPeer.java:727) > > > > 2018-06-29 05:32:59,244 [myid:222] - INFO > > > > [QuorumPeer[myid=222]/0:0:0:0:0:0:0:0:2181:Observer@137] - shutdown > > > called > > > > java.lang.Exception: shutdown Observer > > > > at > > > org.apache.zookeeper.server.quorum.Observer.shutdown(Observer.java:137) > > > > at org.apache.zookeeper.server.quorum.QuorumPeer.run( > > QuorumPeer.java:731) > > > > > > > > > > > > Thanks, > > > > Ram > > > > > > > > > >
Re: Observer went down with Read timed out exception
Yes I am sure there is no network issues, if leader is busy in GC followers on the same DC would have been shutdown as we right but it wasn't the case. On Tue, Jul 3, 2018, 1:56 AM Norbert Kalmar wrote: > Hi Ram, > > Are you sure there were no network error? For me, this looks like it could > be due to failed heartbeats (as shutdown was called after the timeout). > > It is also possible the leader was busy (maybe garbage collection caused > pause?) - especially if you store big(ish) chunks of data in ZooKeeper. > (There is plan to integrate JVMPauseMonitor to ZooKeeper for this reason > actually). > > Regards, > Norbert > > On Mon, Jul 2, 2018 at 9:13 PM rammohan ganapavarapu < > rammohanga...@gmail.com> wrote: > > > All, > > > > I have multi data-center ldap cluster setup with other data-center with > all > > observers all of sudden all the observer threads went down with the > > following message, any idea why they went down? We don't see any network > > related issues between data-centers. > > > > > > 2018-06-29 05:32:59,036 [myid:222] - WARN > > [QuorumPeer[myid=222]/0:0:0:0:0:0:0:0:2181:Observer@79] - Exception when > > observing the leader > > java.net.SocketTimeoutException: Read timed out > > at java.net.SocketInputStream.socketRead0(Native Method) > > at java.net.SocketInputStream.socketRead(SocketInputStream.java:116) > > at java.net.SocketInputStream.read(SocketInputStream.java:170) > > at java.net.SocketInputStream.read(SocketInputStream.java:141) > > at java.io.BufferedInputStream.fill(BufferedInputStream.java:246) > > at java.io.BufferedInputStream.read(BufferedInputStream.java:265) > > at java.io.DataInputStream.readInt(DataInputStream.java:387) > > at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63) > > at > > > > > org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:83) > > at > > > org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:108) > > at > org.apache.zookeeper.server.quorum.Learner.readPacket(Learner.java:152) > > at > > > org.apache.zookeeper.server.quorum.Observer.observeLeader(Observer.java:75) > > at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:727) > > 2018-06-29 05:32:59,244 [myid:222] - INFO > > [QuorumPeer[myid=222]/0:0:0:0:0:0:0:0:2181:Observer@137] - shutdown > called > > java.lang.Exception: shutdown Observer > > at > org.apache.zookeeper.server.quorum.Observer.shutdown(Observer.java:137) > > at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:731) > > > > > > Thanks, > > Ram > > >
Re: Observer went down with Read timed out exception
Hi Ram, Are you sure there were no network error? For me, this looks like it could be due to failed heartbeats (as shutdown was called after the timeout). It is also possible the leader was busy (maybe garbage collection caused pause?) - especially if you store big(ish) chunks of data in ZooKeeper. (There is plan to integrate JVMPauseMonitor to ZooKeeper for this reason actually). Regards, Norbert On Mon, Jul 2, 2018 at 9:13 PM rammohan ganapavarapu < rammohanga...@gmail.com> wrote: > All, > > I have multi data-center ldap cluster setup with other data-center with all > observers all of sudden all the observer threads went down with the > following message, any idea why they went down? We don't see any network > related issues between data-centers. > > > 2018-06-29 05:32:59,036 [myid:222] - WARN > [QuorumPeer[myid=222]/0:0:0:0:0:0:0:0:2181:Observer@79] - Exception when > observing the leader > java.net.SocketTimeoutException: Read timed out > at java.net.SocketInputStream.socketRead0(Native Method) > at java.net.SocketInputStream.socketRead(SocketInputStream.java:116) > at java.net.SocketInputStream.read(SocketInputStream.java:170) > at java.net.SocketInputStream.read(SocketInputStream.java:141) > at java.io.BufferedInputStream.fill(BufferedInputStream.java:246) > at java.io.BufferedInputStream.read(BufferedInputStream.java:265) > at java.io.DataInputStream.readInt(DataInputStream.java:387) > at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63) > at > > org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:83) > at > org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:108) > at org.apache.zookeeper.server.quorum.Learner.readPacket(Learner.java:152) > at > org.apache.zookeeper.server.quorum.Observer.observeLeader(Observer.java:75) > at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:727) > 2018-06-29 05:32:59,244 [myid:222] - INFO > [QuorumPeer[myid=222]/0:0:0:0:0:0:0:0:2181:Observer@137] - shutdown called > java.lang.Exception: shutdown Observer > at org.apache.zookeeper.server.quorum.Observer.shutdown(Observer.java:137) > at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:731) > > > Thanks, > Ram >