Hi Folks,

OK, the following works: the cluster starts up, and if any of the three instances is restarted it rejoins the quorum. (A consolidated compose sketch follows; the numbered steps after it give the individual pieces.)
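For anyone reproducing this later, here is a minimal docker-compose sketch of the working setup. It assumes an image whose entrypoint renders ZK_ID and ZK_CLUSTER into myid and zoo.cfg, as the containers in this thread do; the image name and tag are placeholders, while the service names match the hostnames used throughout the thread:

version: "3"
services:
  zookeeper1:
    image: my-zookeeper:3.5.7   # placeholder; any image that maps ZK_ID/ZK_CLUSTER into zoo.cfg
    hostname: zookeeper1
    environment:
      ZK_ID: 1
      ZK_CLUSTER: server.1=zookeeper1:2888:3888 server.2=zookeeper2:2888:3888 server.3=zookeeper3:2888:3888
  zookeeper2:
    image: my-zookeeper:3.5.7
    hostname: zookeeper2
    environment:
      ZK_ID: 2
      ZK_CLUSTER: server.1=zookeeper1:2888:3888 server.2=zookeeper2:2888:3888 server.3=zookeeper3:2888:3888
  zookeeper3:
    image: my-zookeeper:3.5.7
    hostname: zookeeper3
    environment:
      ZK_ID: 3
      ZK_CLUSTER: server.1=zookeeper1:2888:3888 server.2=zookeeper2:2888:3888 server.3=zookeeper3:2888:3888

Note that quorumListenOnAllIPs=true still has to be set in each container's zoo.cfg (step 2 below); it is not part of the compose file itself.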
1) Versions:

openjdk version "1.8.0_111-internal"
OpenJDK Runtime Environment (build 1.8.0_111-internal-alpine-r0-b14)
OpenJDK 64-Bit Server VM (build 25.111-b14, mixed mode)

Zookeeper version: 3.5.7-f0fdd52973d373ffd9c86b81d99842dc2c7f660e, built on 02/10/2020 11:30 GMT

2) Add the following to zoo.cfg:

quorumListenOnAllIPs=true

3) Use the following docker-compose config:

environment:
  ZK_ID: 1
  ZK_CLUSTER: server.1=zookeeper1:2888:3888 server.2=zookeeper2:2888:3888 server.3=zookeeper3:2888:3888
--
environment:
  ZK_ID: 2
  ZK_CLUSTER: server.1=zookeeper1:2888:3888 server.2=zookeeper2:2888:3888 server.3=zookeeper3:2888:3888
--
environment:
  ZK_ID: 3
  ZK_CLUSTER: server.1=zookeeper1:2888:3888 server.2=zookeeper2:2888:3888 server.3=zookeeper3:2888:3888

Sorry, I misunderstood yesterday: I did 2) but not 3).

Thanks,
Steve

On 2/20/20, 8:16 AM, "Steve Jerman" <st...@kloudspot.com> wrote:

A few points:

1) The containers are all running:

bash-4.3# java -version
openjdk version "1.8.0_111-internal"
OpenJDK Runtime Environment (build 1.8.0_111-internal-alpine-r0-b14)
OpenJDK 64-Bit Server VM (build 25.111-b14, mixed mode)

2) The containers are configured like this:

environment:
  ZK_ID: 1
  ZK_CLUSTER: server.1=0.0.0.0:2888:3888 server.2=zookeeper2:2888:3888 server.3=zookeeper3:2888:3888
--
environment:
  ZK_ID: 2
  ZK_CLUSTER: server.1=zookeeper1:2888:3888 server.2=0.0.0.0:2888:3888 server.3=zookeeper3:2888:3888
--
environment:
  ZK_ID: 3
  ZK_CLUSTER: server.1=zookeeper1:2888:3888 server.2=zookeeper2:2888:3888 server.3=0.0.0.0:2888:3888

Reading below, I see that you suggest I should do the following:

environment:
  ZK_ID: 1
  quorumListenOnAllIPs: true
  ZK_CLUSTER: server.1=zookeeper1:2888:3888 server.2=zookeeper2:2888:3888 server.3=zookeeper3:2888:3888
--
environment:
  ZK_ID: 2
  quorumListenOnAllIPs: true
  ZK_CLUSTER: server.1=zookeeper1:2888:3888 server.2=zookeeper2:2888:3888 server.3=zookeeper3:2888:3888
--
environment:
  ZK_ID: 3
  quorumListenOnAllIPs: true
  ZK_CLUSTER: server.1=zookeeper1:2888:3888 server.2=zookeeper2:2888:3888 server.3=zookeeper3:2888:3888

Will try...
Steve

On 2/20/20, 2:56 AM, "Jörn Franke" <jornfra...@gmail.com> wrote:

Thanks. It is strange that JDK 11.0.6 has a backwards-incompatible change. However, it would be sad if we were stuck on JDK 11.0.5 indefinitely.

> Am 20.02.2020 um 10:53 schrieb Szalay-Bekő Máté <szalay.beko.m...@gmail.com>:
>
> Hi Guys,
>
> I think the 'reverse order startup failure' actually has the very same root cause as the 0.0.0.0 issue discussed in ZOOKEEPER-2164.
>
> Downgrading to 3.4 for now should solve these problems, I think.
>
> Still, I am a bit confused... I just want to understand if we really miss something in the ZooKeeper configuration model.
>
> Assuming that myid=1 (we are talking about the zoo.cfg on server 1), in the 'server.1=...' line you put an address which can be used by the other servers to talk back to server 1. This will be the 'advertised address' used by ZooKeeper in 3.5. Putting 0.0.0.0 here will not work with 3.5 (until we fix it with ZOOKEEPER-2164), as server 2 will not be able to use 0.0.0.0 to talk back to server 1. But if you put a valid address in the 'server.1=...' config line while having quorumListenOnAllIPs=true set, you should still be able to tell ZooKeeper to bind on 0.0.0.0, no matter what IP/hostname you put in the 'server.1=...' configs.
> (There is a similar config to set which IP the client port binds to as well, if you need to bind the client port to 0.0.0.0 too.)
> @Jorn:
>> This might be a long shot and I did not see exactly the same error, but with Corretto JDK 11.0.6 I also had an issue where ZK could not form a quorum. I downgraded to 11.0.5 and it did not have any issues. This was on ZK 3.5.5 with Kerberos authentication and authorization.
>
> In the recent JDK versions (8u242, or 11.0.6) there are some backward-incompatible Kerberos-related changes affecting basically the whole Hadoop stack, not only ZooKeeper. I think it is not recommended to use these JDK versions with Hadoop. I am not deeply involved in this (maybe there is some workaround already, but I am not aware of it).
>
> Kind regards,
> Mate
>
>> On Wed, Feb 19, 2020 at 11:49 PM Steve Jerman <st...@kloudspot.com> wrote:
>>
>> Ok,
>>
>> Just to confirm: rolling back to 3.4.14 fixes the issue. The quorum starts up, and restarting any of the instances works....
>>
>> Are there any issues with using the 3.5 client against a 3.4 server?
>>
>> Steve
>>
>> On 2/19/20, 9:02 AM, "Steve Jerman" <st...@kloudspot.com> wrote:
>>
>> OK, that explains it. I will see if 3.4.14 fixes the issue for the time being...
>>
>> Thanks
>> Steve
>>
>> On 2/19/20, 8:57 AM, "Jan Kosecki" <jan.koseck...@gmail.com> wrote:
>>
>> Hi Steve,
>>
>> It's possible that the quorum state depends on the order your nodes start in. In my Kubernetes environment I've had a similar issue, and I've noticed that starting the brokers one by one, following the order from the configuration file, allows all 3 to join the quorum, but a reverse order would keep the server started last outside of the quorum. I was also using 0.0.0.0 in the configuration and didn't try a full address due to the readiness check configuration.
>>
>> Unfortunately I didn't have time to debug it any further, so I've downgraded back to 3.4 for the time being.
>>
>> Hope you manage to find a solution,
>>
>> Best,
>> Jan
>>
>> On Wed, 19 Feb 2020, 15:47 Steve Jerman, <st...@kloudspot.com> wrote:
>>
>>> Hi,
>>>
>>> I've just been testing restarts ... I restarted one of the instances (id 1) ... and it doesn't join the quorum ... same error.
>>>
>>> Odd that the system started fine but can't handle a restart....
>>>
>>> Steve
>>>
>>>> On 2/19/20, 7:45 AM, "Steve Jerman" <st...@kloudspot.com> wrote:
>>>
>>> Thank you Mate,
>>>
>>> That fixed it. Unfortunately I can't easily avoid using 0.0.0.0.
>>>
>>> My configuration is built using Docker Swarm and it doesn't let you bind to a host name...
>>>
>>> Steve
>>>
>>> On 2/19/20, 5:27 AM, "Szalay-Bekő Máté" <szalay.beko.m...@gmail.com> wrote:
>>>
>>> Hi Steve!
>>>
>>> If you are using a ZooKeeper newer than 3.5.0, then this might be the issue we are just discussing / trying to fix in ZOOKEEPER-2164. Can you test your setup with a config where you don't use 0.0.0.0 in the server addresses?
>>>
>>> If you need to bind to 0.0.0.0 locally, then please set the 'quorumListenOnAllIPs' config property to true.
>>>
>>> like:
>>>
>>> # usually you don't really need this, unless you actually need to bind to multiple IPs
>>> quorumListenOnAllIPs=true
>>>
>>> # it is best if all the zoo.cfg files contain the same address settings, and we don't use 0.0.0.0 here
>>> server.1=zookeeper1:2888:3888
>>> server.2=zookeeper2:2888:3888
>>> server.3=zookeeper3:2888:3888
>>>
>>> Kind regards,
>>> Mate
>>>
>>> On Wed, Feb 19, 2020 at 6:08 AM Steve Jerman <st...@kloudspot.com> wrote:
>>>
>>>> Hello folks,
>>>>
>>>> I wonder if anyone can help me. I suspect it must be something simple, but I can't see it. Any suggestions about how to diagnose this would be gratefully received.
>>>>
>>>> I have a three-node ZK cluster; when it starts up, only two of the nodes form a quorum. If I restart the leader, the quorum reforms with the other two nodes…
>>>>
>>>> Thanks in advance for any help,
>>>> Steve
>>>>
>>>> This is the ‘stat’ output for the leader and followers…
>>>>
>>>> bash-5.0$ echo stat | nc zookeeper1 2181
>>>> This ZooKeeper instance is not currently serving requests
>>>>
>>>> bash-5.0$ echo stat | nc zookeeper2 2181
>>>> Zookeeper version: 3.5.7-f0fdd52973d373ffd9c86b81d99842dc2c7f660e, built on 02/10/2020 11:30 GMT
>>>> Clients:
>>>>  /10.0.1.152:44910[1](queued=0,recved=151,sent=151)
>>>>  /10.0.1.140:53138[1](queued=0,recved=187,sent=187)
>>>>  /10.0.1.143:57422[1](queued=0,recved=151,sent=151)
>>>>  /10.0.1.152:59242[0](queued=0,recved=1,sent=0)
>>>>  /10.0.1.143:40826[1](queued=0,recved=1139,sent=1139)
>>>>  /10.0.1.152:49188[1](queued=0,recved=200,sent=203)
>>>>  /10.0.1.152:59548[1](queued=0,recved=1157,sent=1159)
>>>>  /10.0.1.140:36624[1](queued=0,recved=151,sent=151)
>>>>
>>>> Latency min/avg/max: 0/0/5
>>>> Received: 3338
>>>> Sent: 3342
>>>> Connections: 8
>>>> Outstanding: 0
>>>> Zxid: 0xc000000f3
>>>> Mode: follower
>>>> Node count: 181
>>>>
>>>> bash-5.0$ echo stat | nc zookeeper3 2181
>>>> Zookeeper version: 3.5.7-f0fdd52973d373ffd9c86b81d99842dc2c7f660e, built on 02/10/2020 11:30 GMT
>>>> Clients:
>>>>  /10.0.1.152:49428[0](queued=0,recved=1,sent=0)
>>>>  /10.0.1.140:32912[1](queued=0,recved=1426,sent=1429)
>>>>
>>>> Latency min/avg/max: 0/0/4
>>>> Received: 1684
>>>> Sent: 1686
>>>> Connections: 2
>>>> Outstanding: 0
>>>> Zxid: 0xc000000f4
>>>> Mode: leader
>>>> Node count: 181
>>>> Proposal sizes last/min/max: 32/32/406
>>>> bash-5.0$
>>>>
>>>> The trace for the failing node is:
>>>>
>>>> server.1=0.0.0.0:2888:3888
>>>> server.2=zookeeper2:2888:3888
>>>> server.3=zookeeper3:2888:3888
>>>> ZooKeeper JMX enabled by default
>>>> Using config: /opt/zookeeper/bin/../conf/zoo.cfg
>>>> 2020-02-19 04:23:27,759 [myid:] - INFO [main:QuorumPeerConfig@135] - Reading configuration from: /opt/zookeeper/bin/../conf/zoo.cfg
>>>> 2020-02-19 04:23:27,764 [myid:] - INFO [main:QuorumPeerConfig@387] - clientPortAddress is 0.0.0.0:2181
>>>> 2020-02-19 04:23:27,764 [myid:] - INFO [main:QuorumPeerConfig@391] - secureClientPort is not set
>>>> 2020-02-19 04:23:27,771 [myid:1] - INFO [main:DatadirCleanupManager@78] - autopurge.snapRetainCount set to 3
>>>> 2020-02-19 04:23:27,772 [myid:1] - INFO [main:DatadirCleanupManager@79] - autopurge.purgeInterval set to 24
>>>> 2020-02-19 04:23:27,772 [myid:1] - INFO [PurgeTask:DatadirCleanupManager$PurgeTask@138] - Purge task started.
>>>> 2020-02-19 04:23:27,773 [myid:1] - INFO [main:ManagedUtil@46] - Log4j found with jmx enabled.
>>>> 2020-02-19 04:23:27,774 [myid:1] - INFO [PurgeTask:FileTxnSnapLog@115] - zookeeper.snapshot.trust.empty : false
>>>> 2020-02-19 04:23:27,780 [myid:1] - INFO [PurgeTask:DatadirCleanupManager$PurgeTask@144] - Purge task completed.
>>>> 2020-02-19 04:23:27,781 [myid:1] - INFO [main:QuorumPeerMain@141] - Starting quorum peer
>>>> 2020-02-19 04:23:27,786 [myid:1] - INFO [main:ServerCnxnFactory@135] - Using org.apache.zookeeper.server.NIOServerCnxnFactory as server connection factory
>>>> 2020-02-19 04:23:27,788 [myid:1] - INFO [main:NIOServerCnxnFactory@673] - Configuring NIO connection handler with 10s sessionless connection timeout, 2 selector thread(s), 32 worker threads, and 64 kB direct buffers.
>>>> 2020-02-19 04:23:27,791 [myid:1] - INFO [main:NIOServerCnxnFactory@686] - binding to port 0.0.0.0/0.0.0.0:2181
>>>> 2020-02-19 04:23:27,809 [myid:1] - INFO [main:Log@169] - Logging initialized @249ms to org.eclipse.jetty.util.log.Slf4jLog
>>>> 2020-02-19 04:23:27,913 [myid:1] - WARN [main:ContextHandler@1520] - o.e.j.s.ServletContextHandler@5abca1e0{/,null,UNAVAILABLE} contextPath ends with /*
>>>> 2020-02-19 04:23:27,913 [myid:1] - WARN [main:ContextHandler@1531] - Empty contextPath
>>>> 2020-02-19 04:23:27,922 [myid:1] - INFO [main:X509Util@79] - Setting -Djdk.tls.rejectClientInitiatedRenegotiation=true to disable client-initiated TLS renegotiation
>>>> 2020-02-19 04:23:27,923 [myid:1] - INFO [main:FileTxnSnapLog@115] - zookeeper.snapshot.trust.empty : false
>>>> 2020-02-19 04:23:27,923 [myid:1] - INFO [main:QuorumPeer@1470] - Local sessions disabled
>>>> 2020-02-19 04:23:27,923 [myid:1] - INFO [main:QuorumPeer@1481] - Local session upgrading disabled
>>>> 2020-02-19 04:23:27,923 [myid:1] - INFO [main:QuorumPeer@1448] - tickTime set to 2000
>>>> 2020-02-19 04:23:27,923 [myid:1] - INFO [main:QuorumPeer@1492] - minSessionTimeout set to 4000
>>>> 2020-02-19 04:23:27,924 [myid:1] - INFO [main:QuorumPeer@1503] - maxSessionTimeout set to 40000
>>>> 2020-02-19 04:23:27,924 [myid:1] - INFO [main:QuorumPeer@1518] - initLimit set to 30
>>>> 2020-02-19 04:23:27,932 [myid:1] - INFO [main:ZKDatabase@117] - zookeeper.snapshotSizeFactor = 0.33
>>>> 2020-02-19 04:23:27,933 [myid:1] - INFO [main:QuorumPeer@1763] - Using insecure (non-TLS) quorum communication
>>>> 2020-02-19 04:23:27,933 [myid:1] - INFO [main:QuorumPeer@1769] - Port unification disabled
>>>> 2020-02-19 04:23:27,933 [myid:1] - INFO [main:QuorumPeer@2136] - QuorumPeer communication is not secured! (SASL auth disabled)
>>>> 2020-02-19 04:23:27,933 [myid:1] - INFO [main:QuorumPeer@2165] - quorum.cnxn.threads.size set to 20
>>>> 2020-02-19 04:23:27,934 [myid:1] - INFO [main:FileSnap@83] - Reading snapshot /opt/zookeeper/data/version-2/snapshot.90000043e
>>>> 2020-02-19 04:23:27,963 [myid:1] - INFO [main:Server@359] - jetty-9.4.24.v20191120; built: 2019-11-20T21:37:49.771Z; git: 363d5f2df3a8a28de40604320230664b9c793c16; jvm 1.8.0_111-internal-alpine-r0-b14
>>>> 2020-02-19 04:23:27,989 [myid:1] - INFO [main:DefaultSessionIdManager@333] - DefaultSessionIdManager workerName=node0
>>>> 2020-02-19 04:23:27,989 [myid:1] - INFO [main:DefaultSessionIdManager@338] - No SessionScavenger set, using defaults
>>>> 2020-02-19 04:23:27,990 [myid:1] - INFO [main:HouseKeeper@140] - node0 Scavenging every 660000ms
>>>> 2020-02-19 04:23:27,997 [myid:1] - INFO [main:ContextHandler@825] - Started o.e.j.s.ServletContextHandler@5abca1e0{/,null,AVAILABLE}
>>>> 2020-02-19 04:23:28,004 [myid:1] - INFO [main:AbstractConnector@330] - Started ServerConnector@2b98378d{HTTP/1.1,[http/1.1]}{0.0.0.0:8080}
>>>> 2020-02-19 04:23:28,004 [myid:1] - INFO [main:Server@399] - Started @444ms
>>>> 2020-02-19 04:23:28,004 [myid:1] - INFO [main:JettyAdminServer@112] - Started AdminServer on address 0.0.0.0, port 8080 and command URL /commands
>>>> 2020-02-19 04:23:28,007 [myid:1] - INFO [main:QuorumCnxManager$Listener@867] - Election port bind maximum retries is 1000
>>>> 2020-02-19 04:23:28,007 [myid:1] - INFO [QuorumPeerListener:QuorumCnxManager$Listener@917] - My election bind port: /0.0.0.0:3888
>>>> 2020-02-19 04:23:28,014 [myid:1] - INFO [QuorumPeer[myid=1](plain=0.0.0.0:2181)(secure=disabled):QuorumPeer@1175] - LOOKING
>>>> 2020-02-19 04:23:28,015 [myid:1] - INFO [QuorumPeer[myid=1](plain=0.0.0.0:2181)(secure=disabled):FastLeaderElection@885] - New election. My id = 1, proposed zxid=0xa000000fb
>>>> 2020-02-19 04:23:28,018 [myid:1] - INFO [WorkerSender[myid=1]:QuorumCnxManager@438] - Have smaller server identifier, so dropping the connection: (2, 1)
>>>> 2020-02-19 04:23:28,019 [myid:1] - INFO [WorkerSender[myid=1]:QuorumCnxManager@438] - Have smaller server identifier, so dropping the connection: (3, 1)
>>>> 2020-02-19 04:23:28,019 [myid:1] - INFO [WorkerReceiver[myid=1]:FastLeaderElection@679] - Notification: 2 (message format version), 1 (n.leader), 0xa000000fb (n.zxid), 0x1 (n.round), LOOKING (n.state), 1 (n.sid), 0xa (n.peerEPoch), LOOKING (my state)0 (n.config version)
>>>> 2020-02-19 04:23:28,221 [myid:1] - INFO [QuorumPeer[myid=1](plain=0.0.0.0:2181)(secure=disabled):QuorumCnxManager@438] - Have smaller server identifier, so dropping the connection: (2, 1)
>>>> 2020-02-19 04:23:28,222 [myid:1] - INFO [QuorumPeer[myid=1](plain=0.0.0.0:2181)(secure=disabled):QuorumCnxManager@438] - Have smaller server identifier, so dropping the connection: (3, 1)
>>>> 2020-02-19 04:23:28,222 [myid:1] - INFO [QuorumPeer[myid=1](plain=0.0.0.0:2181)(secure=disabled):FastLeaderElection@919] - Notification time out: 400
>>>> 2020-02-19 04:23:28,623 [myid:1] - INFO [QuorumPeer[myid=1](plain=0.0.0.0:2181)(secure=disabled):QuorumCnxManager@438] - Have smaller server identifier, so dropping the connection: (2, 1)
>>>> 2020-02-19 04:23:28,624 [myid:1] - INFO [QuorumPeer[myid=1](plain=0.0.0.0:2181)(secure=disabled):QuorumCnxManager@438] - Have smaller server identifier, so dropping the connection: (3, 1)
>>>> 2020-02-19 04:23:28,624 [myid:1] - INFO [QuorumPeer[myid=1](plain=0.0.0.0:2181)(secure=disabled):FastLeaderElection@919] - Notification time out: 800
>>>> …
>>>>
>>>> And for the leader:
>>>> ….
>>>> 2020-02-19 05:02:10,341 [myid:3] - INFO [WorkerReceiver[myid=3]:FastLeaderElection@679] - Notification: 2 (message format version), 3 (n.leader), 0xb0000018c (n.zxid), 0x1 (n.round), LOOKING (n.state), 3 (n.sid), 0xb (n.peerEPoch), LEADING (my state)0 (n.config version)
>>>> 2020-02-19 05:02:10,341 [myid:3] - INFO [WorkerReceiver[myid=3]:FastLeaderElection@679] - Notification: 2 (message format version), 3 (n.leader), 0xb0000018c (n.zxid), 0x1 (n.round), LEADING (n.state), 3 (n.sid), 0xc (n.peerEPoch), LEADING (my state)0 (n.config version)
>>>> 2020-02-19 05:02:33,640 [myid:3] - WARN [NIOWorkerThread-4:NIOServerCnxn@366] - Unable to read additional data from client sessionid 0x30002ba40710018, likely client has closed socket
>>>> 2020-02-19 05:02:39,047 [myid:3] - INFO [SessionTracker:ZooKeeperServer@398] - Expiring session 0x20002ba2fd6001a, timeout of 40000ms exceeded
>>>> 2020-02-19 05:02:39,048 [myid:3] - INFO [SessionTracker:QuorumZooKeeperServer@157] - Submitting global closeSession request for session 0x20002ba2fd6001a
>>>> 2020-02-19 05:03:10,340 [myid:3] - INFO [/0.0.0.0:3888:QuorumCnxManager$Listener@924] - Received connection request 10.0.1.152:52492
>>>> 2020-02-19 05:03:10,340 [myid:3] - WARN [SendWorker:1:QuorumCnxManager$SendWorker@1143] - Interrupted while waiting for message on queue
>>>> java.lang.InterruptedException
>>>>         at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2014)
>>>>         at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2088)
>>>>         at java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:418)
>>>>         at org.apache.zookeeper.server.quorum.QuorumCnxManager.pollSendQueue(QuorumCnxManager.java:1294)
>>>>         at org.apache.zookeeper.server.quorum.QuorumCnxManager.access$700(QuorumCnxManager.java:82)
>>>>         at org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:1131)
>>>> 2020-02-19 05:03:10,340 [myid:3] - WARN [SendWorker:1:QuorumCnxManager$SendWorker@1153] - Send worker leaving thread id 1 my id = 3
>>>> 2020-02-19 05:03:10,340 [myid:3] - WARN [RecvWorker:1:QuorumCnxManager$RecvWorker@1227] - Connection broken for id 1, my id = 3, error =
>>>> java.net.SocketException: Socket closed
>>>>         at java.net.SocketInputStream.socketRead0(Native Method)
>>>>         at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
>>>>         at java.net.SocketInputStream.read(SocketInputStream.java:170)
>>>>         at java.net.SocketInputStream.read(SocketInputStream.java:141)
>>>>         at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
>>>>         at java.io.BufferedInputStream.read(BufferedInputStream.java:265)
>>>>         at java.io.DataInputStream.readInt(DataInputStream.java:387)
>>>>         at org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:1212)
>>>> 2020-02-19 05:03:10,341 [myid:3] - WARN [RecvWorker:1:QuorumCnxManager$RecvWorker@1230] - Interrupting SendWorker
>>>> 2020-02-19 05:03:10,340 [myid:3] - WARN [RecvWorker:3:QuorumCnxManager$RecvWorker@1227] - Connection broken for id 3, my id = 3, error =
>>>> java.io.EOFException
>>>>         at java.io.DataInputStream.readInt(DataInputStream.java:392)
>>>>         at org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:1212)
>>>> 2020-02-19 05:03:10,341 [myid:3] - WARN [RecvWorker:3:QuorumCnxManager$RecvWorker@1230] - Interrupting SendWorker
>>>> 2020-02-19 05:03:10,341 [myid:3] - INFO [/0.0.0.0:3888:QuorumCnxManager$Listener@924] - Received connection request 10.0.1.142:46326
>>>> 2020-02-19 05:03:10,343 [myid:3] - WARN [SendWorker:3:QuorumCnxManager$SendWorker@1143] - Interrupted while waiting for message on queue
>>>> java.lang.InterruptedException
>>>>         at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2014)
>>>>         at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2088)
>>>>         at java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:418)
>>>>         at org.apache.zookeeper.server.quorum.QuorumCnxManager.pollSendQueue(QuorumCnxManager.java:1294)
>>>>         at org.apache.zookeeper.server.quorum.QuorumCnxManager.access$700(QuorumCnxManager.java:82)
>>>>         at org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:1131)
>>>> 2020-02-19 05:03:10,344 [myid:3] - WARN [SendWorker:3:QuorumCnxManager$SendWorker@1153] - Send worker leaving thread id 3 my id = 3
>>>> 2020-02-19 05:03:10,344 [myid:3] - INFO [WorkerReceiver[myid=3]:FastLeaderElection@679] - Notification: 2 (message format version), 3 (n.leader), 0xb0000018c (n.zxid), 0x1 (n.round), LOOKING (n.state), 3 (n.sid), 0xb (n.peerEPoch), LEADING (my state)0 (n.config version)
>>>> 2020-02-19 05:03:10,345 [myid:3] - INFO [WorkerReceiver[myid=3]:FastLeaderElection@679] - Notification: 2 (message format version), 3 (n.leader), 0xb0000018c (n.zxid), 0x1 (n.round), LEADING (n.state), 3 (n.sid), 0xc (n.peerEPoch), LEADING (my state)0 (n.config version)
>>>> 2020-02-19 05:03:11,048 [myid:3] - INFO [SessionTracker:ZooKeeperServer@398] - Expiring session 0x30002ba40710018, timeout of 40000ms exceeded
>>>> 2020-02-19 05:03:11,048 [myid:3] - INFO [SessionTracker:QuorumZooKeeperServer@157] - Submitting global closeSession request for session 0x30002ba40710018
>>>>
>>>> All of the instances have a similar zoo.cfg:
>>>>
>>>> bash-4.3# cat conf/zoo.cfg
>>>> # The number of milliseconds of each tick
>>>> tickTime=2000
>>>> # The number of ticks that the initial
>>>> # synchronization phase can take
>>>> initLimit=30
>>>> # The number of ticks that can pass between
>>>> # sending a request and getting an acknowledgement
>>>> syncLimit=5
>>>> # Purge every 24 hours
>>>> autopurge.purgeInterval=24
>>>> # the directory where the snapshot is stored.
>>>> # do not use /tmp for storage, /tmp here is just
>>>> # example sakes.
>>>> dataDir=/opt/zookeeper/data
>>>> # the port at which the clients will connect
>>>> clientPort=2181
>>>> # Append other config...
>>>> electionPortBindRetry=1000
>>>> 4lw.commands.whitelist=stat, ruok, conf, mntr
>>>>
>>>> server.1=zookeeper1:2888:3888
>>>> server.2=zookeeper2:2888:3888
>>>> server.3=0.0.0.0:2888:3888
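Putting the thread's conclusion together: until ZOOKEEPER-2164 is fixed, the server.N line for the local node in 3.5.x must carry a real advertised address that the other servers can dial back, and binding to all interfaces is requested separately. A sketch of the corrected tail of the zoo.cfg above, identical on all three nodes as Mate recommends (the rest of the file unchanged):

# bind the quorum/election ports on all local IPs instead of the advertised address
quorumListenOnAllIPs=true

# identical on every node; no 0.0.0.0 anywhere
server.1=zookeeper1:2888:3888
server.2=zookeeper2:2888:3888
server.3=zookeeper3:2888:3888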
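A quick way to confirm the quorum actually formed after the change, using only the four-letter-word commands already whitelisted in the zoo.cfg above (hostnames assumed to match the compose service names):

for h in zookeeper1 zookeeper2 zookeeper3; do
  echo "== $h =="
  echo ruok | nc "$h" 2181; echo        # a healthy server answers "imok"
  echo stat | nc "$h" 2181 | grep Mode  # expect one "Mode: leader" and two "Mode: follower"
done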