A few points:
1) The containers are all running:
bash-4.3# java -version
openjdk version "1.8.0_111-internal"
OpenJDK Runtime Environment (build 1.8.0_111-internal-alpine-r0-b14)
OpenJDK 64-Bit Server VM (build 25.111-b14, mixed mode)
2) The containers are configured like this:
environment:
ZK_ID: 1
ZK_CLUSTER: server.1=0.0.0.0:2888:3888 server.2=zookeeper2:2888:3888 server.3=zookeeper3:2888:3888
--
environment:
ZK_ID: 2
ZK_CLUSTER: server.1=zookeeper1:2888:3888 server.2=0.0.0.0:2888:3888 server.3=zookeeper3:2888:3888
--
environment:
ZK_ID: 3
ZK_CLUSTER: server.1=zookeeper1:2888:3888 server.2=zookeeper2:2888:3888 server.3=0.0.0.0:2888:3888
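(My working assumption is that the image's entrypoint just splits ZK_CLUSTER on whitespace and appends each server.N entry to zoo.cfg; a rough, hypothetical sketch of that expansion, not the image's actual code:)

```python
# Hypothetical sketch of the entrypoint's ZK_CLUSTER expansion -- this is
# illustrative only, not the image's actual code.
zk_cluster = (
    "server.1=0.0.0.0:2888:3888 "
    "server.2=zookeeper2:2888:3888 "
    "server.3=zookeeper3:2888:3888"
)
# Each whitespace-separated token becomes one server.N line in zoo.cfg.
config = "\n".join(zk_cluster.split())
print(config)
```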
Reading below, I see that you suggest I should do the following:
environment:
ZK_ID: 1
quorumListenOnAllIPs: true
ZK_CLUSTER: server.1=zookeeper1:2888:3888 server.2=zookeeper2:2888:3888 server.3=zookeeper3:2888:3888
--
environment:
ZK_ID: 2
quorumListenOnAllIPs: true
ZK_CLUSTER: server.1=zookeeper1:2888:3888 server.2=zookeeper2:2888:3888 server.3=zookeeper3:2888:3888
--
environment:
ZK_ID: 3
quorumListenOnAllIPs: true
ZK_CLUSTER: server.1=zookeeper1:2888:3888 server.2=zookeeper2:2888:3888 server.3=zookeeper3:2888:3888
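If I understand the suggestion correctly, each node's rendered zoo.cfg should then contain something like this (assuming the entrypoint also passes quorumListenOnAllIPs through to zoo.cfg; the hostnames are my compose service names):

```
quorumListenOnAllIPs=true
server.1=zookeeper1:2888:3888
server.2=zookeeper2:2888:3888
server.3=zookeeper3:2888:3888
```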
Will try...
Steve
On 2/20/20, 2:56 AM, "Jörn Franke" <[email protected]> wrote:
Thanks. It is strange that JDK 11.0.6 has a backwards-incompatible change.
However, it would be sad if we are stuck on JDK 11.0.5 forever.
> Am 20.02.2020 um 10:53 schrieb Szalay-Bekő Máté
<[email protected]>:
>
> Hi Guys,
>
> I think the 'reverse order startup failure' actually has the very same
> root cause as the 0.0.0.0 issue discussed in ZOOKEEPER-2164.
>
> Downgrading to 3.4 for now should solve these problems I think.
>
> Still I am a bit confused... I just want to understand if we are really
> missing something in the ZooKeeper configuration model.
>
> Assuming that myid=1 (we are talking about the zoo.cfg on server 1), in
> the 'server.1=...' line you put an address which can be used by the other
> servers to talk back to server 1. This will be the 'advertised address'
> used by ZooKeeper in 3.5. Putting 0.0.0.0 here will not work with 3.5
> (until we fix it with ZOOKEEPER-2164), as server 2 will not be able to
> use 0.0.0.0 to talk back to server 1. But if you put a valid address in
> the 'server.1=...' config line while having quorumListenOnAllIPs=true
> set, you can still tell ZooKeeper to bind to 0.0.0.0, no matter what
> IP/hostname you put in the 'server.1=...' configs.
> (There is a similar config to set which IP to bind the client port to as
> well, if you need to bind the client port to 0.0.0.0 too.)
>
>
> @Jorn:
>> This might be a long shot and I did not see exactly the same error, but
>> with Corretto JDK 11.0.6 I also had an issue where ZK could not form a
>> quorum. I downgraded to 11.0.5 and had no issues. This was on ZK 3.5.5
>> with Kerberos authentication and authorization.
>
> In the recent JDK versions (8u242, or 11.0.6) there are some
> backward-incompatible Kerberos-related changes affecting basically the
> whole Hadoop stack, not only ZooKeeper. I think it is not recommended to
> use these JDK versions with Hadoop. I am not deeply involved in this
> (maybe there is some workaround already, but I am not aware of it).
>
> Kind regards,
> Mate
>
>> On Wed, Feb 19, 2020 at 11:49 PM Steve Jerman <[email protected]>
wrote:
>>
>> Ok,
>>
>> Just to confirm. Rolling back to 3.4.14 fixes the issue. The quorum
>> starts up and restarting any of the instances works....
>>
>> Are there any issues with using the 3.5 client with 3.4 server?
>>
>> Steve
>>
>> On 2/19/20, 9:02 AM, "Steve Jerman" <[email protected]> wrote:
>>
>> OK, that explains it. I will see if 3.4.14 fixes the issue for the
>> time being...
>>
>> Thanks
>> Steve
>>
>> On 2/19/20, 8:57 AM, "Jan Kosecki" <[email protected]> wrote:
>>
>> Hi Steve,
>>
>> it's possible that the quorum state depends on the order in which your
>> nodes start.
>> In my Kubernetes environment I've had a similar issue, and I've noticed
>> that starting the brokers one by one, following the order from the
>> configuration file, allows all 3 to join the quorum, but a reverse order
>> would keep the last-started server outside of the quorum. I was also
>> using 0.0.0.0 in the configuration and didn't try a full address due to
>> the readiness check configuration.
>>
>> Unfortunately I didn't have time to debug it any further, so I've
>> downgraded back to 3.4 for the time being.
>>
>> Hope you manage to find a solution,
>>
>> Best,
>> Jan
>>
>> On Wed, 19 Feb 2020, 15:47 Steve Jerman, <[email protected]>
>> wrote:
>>
>>> Hi,
>>>
>>> I've just been testing restarts ... I restarted one of the instances
>>> (id 1) ... and it doesn't join the quorum ... same error.
>>>
>>> Odd that the system started fine but can't handle a restart....
>>>
>>> Steve
>>>
>>>> On 2/19/20, 7:45 AM, "Steve Jerman" <[email protected]> wrote:
>>>
>>> Thank You Mate,
>>>
>>> That fixed it. Unfortunately I can't easily avoid using 0.0.0.0.
>>>
>>> My configuration is built using Docker Swarm and it doesn't let you
>>> bind to a host name...
>>>
>>> Steve
>>>
>>> On 2/19/20, 5:27 AM, "Szalay-Bekő Máté" <
>> [email protected]>
>>> wrote:
>>>
>>> Hi Steve!
>>>
>>> If you are using ZooKeeper newer than 3.5.0, then this might be the
>>> issue we are just discussing / trying to fix in ZOOKEEPER-2164.
>>> Can you test your setup with a config where you don't use 0.0.0.0 in
>>> the server addresses?
>>>
>>> If you need to bind to 0.0.0.0 locally, then please set the
>>> 'quorumListenOnAllIPs' config property to true.
>>>
>>> like:
>>>
>>> # usually you don't really need this, unless you actually need to
>>> # bind to multiple IPs
>>> quorumListenOnAllIPs=true
>>>
>>> # it is best if all the zoo.cfg files contain the same address
>>> # settings, and we don't use 0.0.0.0 here
>>> server.1=zookeeper1:2888:3888
>>> server.2=zookeeper2:2888:3888
>>> server.3=zookeeper3:2888:3888
>>>
>>> Kind regards,
>>> Mate
>>>
>>> On Wed, Feb 19, 2020 at 6:08 AM Steve Jerman <
>> [email protected]>
>>> wrote:
>>>
>>>> Hello folks,
>>>>
>>>> Wonder if anyone can help me. I suspect it must be something simple
>>>> but I can't see it. Any suggestions about how to diagnose would be
>>>> gratefully received.
>>>>
>>>> I have a three-node ZK cluster; when it starts up, only two of the
>>>> nodes form a quorum. If I restart the leader, the quorum reforms with
>>>> the other two nodes…
>>>>
>>>> Thanks in advance for any help
>>>> Steve
>>>>
>>>> This is the ‘stat’ for the leader/follower…
>>>>
>>>> bash-5.0$ echo stat | nc zookeeper1 2181
>>>> This ZooKeeper instance is not currently serving
>> requests
>>>>
>>>> bash-5.0$ echo stat | nc zookeeper2 2181
>>>> Zookeeper version:
>>> 3.5.7-f0fdd52973d373ffd9c86b81d99842dc2c7f660e, built
>>>> on 02/10/2020 11:30 GMT
>>>> Clients:
>>>> /10.0.1.152:44910[1](queued=0,recved=151,sent=151)
>>>> /10.0.1.140:53138[1](queued=0,recved=187,sent=187)
>>>> /10.0.1.143:57422[1](queued=0,recved=151,sent=151)
>>>> /10.0.1.152:59242[0](queued=0,recved=1,sent=0)
>>>> /10.0.1.143:40826[1](queued=0,recved=1139,sent=1139)
>>>> /10.0.1.152:49188[1](queued=0,recved=200,sent=203)
>>>> /10.0.1.152:59548[1](queued=0,recved=1157,sent=1159)
>>>> /10.0.1.140:36624[1](queued=0,recved=151,sent=151)
>>>>
>>>> Latency min/avg/max: 0/0/5
>>>> Received: 3338
>>>> Sent: 3342
>>>> Connections: 8
>>>> Outstanding: 0
>>>> Zxid: 0xc000000f3
>>>> Mode: follower
>>>> Node count: 181
>>>>
>>>> bash-5.0$ echo stat | nc zookeeper3 2181
>>>> Zookeeper version:
>>> 3.5.7-f0fdd52973d373ffd9c86b81d99842dc2c7f660e, built
>>>> on 02/10/2020 11:30 GMT
>>>> Clients:
>>>> /10.0.1.152:49428[0](queued=0,recved=1,sent=0)
>>>> /10.0.1.140:32912[1](queued=0,recved=1426,sent=1429)
>>>>
>>>> Latency min/avg/max: 0/0/4
>>>> Received: 1684
>>>> Sent: 1686
>>>> Connections: 2
>>>> Outstanding: 0
>>>> Zxid: 0xc000000f4
>>>> Mode: leader
>>>> Node count: 181
>>>> Proposal sizes last/min/max: 32/32/406
>>>> bash-5.0$
>>>>
>>>> The trace for the failing node is:
>>>>
>>>> server.1=0.0.0.0:2888:3888
>>>> server.2=zookeeper2:2888:3888
>>>> server.3=zookeeper3:2888:3888
>>>> ZooKeeper JMX enabled by default
>>>> Using config: /opt/zookeeper/bin/../conf/zoo.cfg
>>>> 2020-02-19 04:23:27,759 [myid:] - INFO
>>> [main:QuorumPeerConfig@135] -
>>>> Reading configuration from:
>> /opt/zookeeper/bin/../conf/zoo.cfg
>>>> 2020-02-19 04:23:27,764 [myid:] - INFO
>>> [main:QuorumPeerConfig@387] -
>>>> clientPortAddress is 0.0.0.0:2181
>>>> 2020-02-19 04:23:27,764 [myid:] - INFO
>>> [main:QuorumPeerConfig@391] -
>>>> secureClientPort is not set
>>>> 2020-02-19 04:23:27,771 [myid:1] - INFO
>>> [main:DatadirCleanupManager@78]
>>>> - autopurge.snapRetainCount set to 3
>>>> 2020-02-19 04:23:27,772 [myid:1] - INFO
>>> [main:DatadirCleanupManager@79]
>>>> - autopurge.purgeInterval set to 24
>>>> 2020-02-19 04:23:27,772 [myid:1] - INFO
>>>> [PurgeTask:DatadirCleanupManager$PurgeTask@138] -
>> Purge task
>>> started.
>>>> 2020-02-19 04:23:27,773 [myid:1] - INFO
>> [main:ManagedUtil@46]
>>> - Log4j
>>>> found with jmx enabled.
>>>> 2020-02-19 04:23:27,774 [myid:1] - INFO
>>> [PurgeTask:FileTxnSnapLog@115] -
>>>> zookeeper.snapshot.trust.empty : false
>>>> 2020-02-19 04:23:27,780 [myid:1] - INFO
>>>> [PurgeTask:DatadirCleanupManager$PurgeTask@144] -
>> Purge task
>>> completed.
>>>> 2020-02-19 04:23:27,781 [myid:1] - INFO
>> [main:QuorumPeerMain@141]
>>> -
>>>> Starting quorum peer
>>>> 2020-02-19 04:23:27,786 [myid:1] - INFO
>>> [main:ServerCnxnFactory@135] -
>>>> Using org.apache.zookeeper.server.NIOServerCnxnFactory
>> as server
>>> connection
>>>> factory
>>>> 2020-02-19 04:23:27,788 [myid:1] - INFO
>>> [main:NIOServerCnxnFactory@673]
>>>> - Configuring NIO connection handler with 10s
>> sessionless
>>> connection
>>>> timeout, 2 selector thread(s), 32 worker threads, and
>> 64 kB
>>> direct buffers.
>>>> 2020-02-19 04:23:27,791 [myid:1] - INFO
>>> [main:NIOServerCnxnFactory@686]
>>>> - binding to port 0.0.0.0/0.0.0.0:2181
>>>> 2020-02-19 04:23:27,809 [myid:1]
>>>> - INFO [main:Log@169] - Logging initialized @249ms to
>>>> org.eclipse.jetty.util.log.Slf4jLog
>>>> 2020-02-19 04:23:27,913 [myid:1] - WARN
>>> [main:ContextHandler@1520] -
>>>> o.e.j.s.ServletContextHandler@5abca1e0
>> {/,null,UNAVAILABLE}
>>> contextPath
>>>> ends with /*
>>>> 2020-02-19 04:23:27,913 [myid:1] - WARN
>>> [main:ContextHandler@1531] -
>>>> Empty contextPath
>>>> 2020-02-19 04:23:27,922 [myid:1] - INFO
>> [main:X509Util@79] -
>>> Setting -D
>>>> jdk.tls.rejectClientInitiatedRenegotiation=true to
>> disable
>>> client-initiated
>>>> TLS renegotiation
>>>> 2020-02-19 04:23:27,923 [myid:1] - INFO
>> [main:FileTxnSnapLog@115]
>>> -
>>>> zookeeper.snapshot.trust.empty : false
>>>> 2020-02-19 04:23:27,923 [myid:1] - INFO
>> [main:QuorumPeer@1470]
>>> - Local
>>>> sessions disabled
>>>> 2020-02-19 04:23:27,923 [myid:1] - INFO
>> [main:QuorumPeer@1481]
>>> - Local
>>>> session upgrading disabled
>>>> 2020-02-19 04:23:27,923 [myid:1] - INFO
>> [main:QuorumPeer@1448]
>>> -
>>>> tickTime set to 2000
>>>> 2020-02-19 04:23:27,923 [myid:1] - INFO
>> [main:QuorumPeer@1492]
>>> -
>>>> minSessionTimeout set to 4000
>>>> 2020-02-19 04:23:27,924 [myid:1] - INFO
>> [main:QuorumPeer@1503]
>>> -
>>>> maxSessionTimeout set to 40000
>>>> 2020-02-19 04:23:27,924 [myid:1] - INFO
>> [main:QuorumPeer@1518]
>>> -
>>>> initLimit set to 30
>>>> 2020-02-19 04:23:27,932 [myid:1] - INFO
>> [main:ZKDatabase@117] -
>>>> zookeeper.snapshotSizeFactor = 0.33
>>>> 2020-02-19 04:23:27,933 [myid:1] - INFO
>> [main:QuorumPeer@1763]
>>> - Using
>>>> insecure (non-TLS) quorum communication
>>>> 2020-02-19 04:23:27,933 [myid:1] - INFO
>> [main:QuorumPeer@1769]
>>> - Port
>>>> unification disabled
>>>> 2020-02-19 04:23:27,933 [myid:1] - INFO
>> [main:QuorumPeer@2136]
>>> -
>>>> QuorumPeer communication is not secured! (SASL auth
>> disabled)
>>>> 2020-02-19 04:23:27,933 [myid:1] - INFO
>> [main:QuorumPeer@2165]
>>> -
>>>> quorum.cnxn.threads.size set to 20
>>>> 2020-02-19 04:23:27,934 [myid:1] - INFO
>> [main:FileSnap@83] -
>>> Reading
>>>> snapshot
>> /opt/zookeeper/data/version-2/snapshot.90000043e
>>>> 2020-02-19 04:23:27,963 [myid:1] - INFO
>> [main:Server@359] -
>>>> jetty-9.4.24.v20191120; built:
>> 2019-11-20T21:37:49.771Z; git:
>>>> 363d5f2df3a8a28de40604320230664b9c793c16; jvm
>>>> 1.8.0_111-internal-alpine-r0-b14
>>>> 2020-02-19 04:23:27,989 [myid:1] - INFO
>>> [main:DefaultSessionIdManager@333]
>>>> - DefaultSessionIdManager workerName=node0
>>>> 2020-02-19 04:23:27,989 [myid:1] - INFO
>>> [main:DefaultSessionIdManager@338]
>>>> - No SessionScavenger set, using defaults
>>>> 2020-02-19 04:23:27,990 [myid:1] - INFO
>> [main:HouseKeeper@140]
>>> - node0
>>>> Scavenging every 660000ms
>>>> 2020-02-19 04:23:27,997 [myid:1] - INFO
>> [main:ContextHandler@825]
>>> -
>>>> Started o.e.j.s.ServletContextHandler@5abca1e0
>> {/,null,AVAILABLE}
>>>> 2020-02-19 04:23:28,004 [myid:1] - INFO
>>> [main:AbstractConnector@330] -
>>>> Started ServerConnector@2b98378d{HTTP/1.1,[http/1.1]}{
>>> 0.0.0.0:8080}
>>>> 2020-02-19 04:23:28,004 [myid:1] - INFO
>> [main:Server@399] -
>>> Started
>>>> @444ms
>>>> 2020-02-19 04:23:28,004 [myid:1] - INFO
>>> [main:JettyAdminServer@112] -
>>>> Started AdminServer on address 0.0.0.0, port 8080 and
>> command
>>> URL /commands
>>>> 2020-02-19 04:23:28,007 [myid:1] - INFO
>>>> [main:QuorumCnxManager$Listener@867] - Election port
>> bind
>>> maximum retries
>>>> is 1000
>>>> 2020-02-19 04:23:28,007 [myid:1] - INFO
>>>> [QuorumPeerListener:QuorumCnxManager$Listener@917] -
>> My
>>> election bind
>>>> port: /0.0.0.0:3888
>>>> 2020-02-19 04:23:28,014 [myid:1] - INFO
>>>> [QuorumPeer[myid=1](plain=0.0.0.0:2181
>>> )(secure=disabled):QuorumPeer@1175]
>>>> - LOOKING
>>>> 2020-02-19 04:23:28,015 [myid:1] - INFO
>>>> [QuorumPeer[myid=1](plain=0.0.0.0:2181
>>>> )(secure=disabled):FastLeaderElection@885] - New
>> election. My
>>> id = 1,
>>>> proposed zxid=0xa000000fb
>>>> 2020-02-19 04:23:28,018 [myid:1] - INFO
>>>> [WorkerSender[myid=1]:QuorumCnxManager@438] - Have
>> smaller
>>> server
>>>> identifier, so dropping the connection: (2, 1)
>>>> 2020-02-19 04:23:28,019 [myid:1] - INFO
>>>> [WorkerSender[myid=1]:QuorumCnxManager@438] - Have
>> smaller
>>> server
>>>> identifier, so dropping the connection: (3, 1)
>>>> 2020-02-19 04:23:28,019 [myid:1] - INFO
>>>> [WorkerReceiver[myid=1]:FastLeaderElection@679] -
>> Notification:
>>> 2
>>>> (message format version), 1 (n.leader), 0xa000000fb
>> (n.zxid), 0x1
>>>> (n.round), LOOKING (n.state), 1 (n.sid), 0xa
>> (n.peerEPoch),
>>> LOOKING (my
>>>> state)0 (n.config version)
>>>> 2020-02-19 04:23:28,221 [myid:1] - INFO
>>>> [QuorumPeer[myid=1](plain=0.0.0.0:2181
>>>> )(secure=disabled):QuorumCnxManager@438] - Have
>> smaller server
>>>> identifier, so dropping the connection: (2, 1)
>>>> 2020-02-19 04:23:28,222 [myid:1] - INFO
>>>> [QuorumPeer[myid=1](plain=0.0.0.0:2181
>>>> )(secure=disabled):QuorumCnxManager@438] - Have
>> smaller server
>>>> identifier, so dropping the connection: (3, 1)
>>>> 2020-02-19 04:23:28,222 [myid:1] - INFO
>>>> [QuorumPeer[myid=1](plain=0.0.0.0:2181
>>>> )(secure=disabled):FastLeaderElection@919] -
>> Notification time
>>> out: 400
>>>> 2020-02-19 04:23:28,623 [myid:1] - INFO
>>>> [QuorumPeer[myid=1](plain=0.0.0.0:2181
>>>> )(secure=disabled):QuorumCnxManager@438] - Have
>> smaller server
>>>> identifier, so dropping the connection: (2, 1)
>>>> 2020-02-19 04:23:28,624 [myid:1] - INFO
>>>> [QuorumPeer[myid=1](plain=0.0.0.0:2181
>>>> )(secure=disabled):QuorumCnxManager@438] - Have
>> smaller server
>>>> identifier, so dropping the connection: (3, 1)
>>>> 2020-02-19 04:23:28,624 [myid:1] - INFO
>>>> [QuorumPeer[myid=1](plain=0.0.0.0:2181
>>>> )(secure=disabled):FastLeaderElection@919] -
>> Notification time
>>> out: 800
>>>> …
>>>>
>>>> And for the leader:
>>>> ….
>>>> 2020-02-19 05:02:10,341 [myid:3] - INFO
>>>> [WorkerReceiver[myid=3]:FastLeaderElection@679] -
>> Notification:
>>> 2
>>>> (message format version), 3 (n.leader), 0xb0000018c
>> (n.zxid), 0x1
>>>> (n.round), LOOKING (n.state), 3 (n.sid), 0xb
>> (n.peerEPoch),
>>> LEADING (my
>>>> state)0 (n.config version)
>>>> 2020-02-19 05:02:10,341 [myid:3] - INFO
>>>> [WorkerReceiver[myid=3]:FastLeaderElection@679] -
>> Notification:
>>> 2
>>>> (message format version), 3 (n.leader), 0xb0000018c
>> (n.zxid), 0x1
>>>> (n.round), LEADING (n.state), 3 (n.sid), 0xc
>> (n.peerEPoch),
>>> LEADING (my
>>>> state)0 (n.config version)
>>>> 2020-02-19 05:02:33,640 [myid:3] - WARN
>>>> [NIOWorkerThread-4:NIOServerCnxn@366] - Unable to read
>>> additional data
>>>> from client sessionid 0x30002ba40710018, likely client
>> has
>>> closed socket
>>>> 2020-02-19 05:02:39,047 [myid:3] - INFO
>>>> [SessionTracker:ZooKeeperServer@398] - Expiring
>> session
>>>> 0x20002ba2fd6001a, timeout of 40000ms exceeded
>>>> 2020-02-19 05:02:39,048 [myid:3] - INFO
>>>> [SessionTracker:QuorumZooKeeperServer@157] -
>> Submitting global
>>>> closeSession request for session 0x20002ba2fd6001a
>>>> 2020-02-19 05:03:10,340 [myid:3] - INFO [/
>> 0.0.0.0:3888
>>>> :QuorumCnxManager$Listener@924] - Received connection
>> request
>>>> 10.0.1.152:52492
>>>> 2020-02-19 05:03:10,340 [myid:3] - WARN
>>>> [SendWorker:1:QuorumCnxManager$SendWorker@1143] -
>> Interrupted
>>> while
>>>> waiting for message on queue
>>>> java.lang.InterruptedException
>>>> at
>>>>
>>>
>>
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2014)
>>>> at
>>>>
>>>
>>
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2088)
>>>> at
>>>>
>>>
>> java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:418)
>>>> at
>>>>
>>>
>>
org.apache.zookeeper.server.quorum.QuorumCnxManager.pollSendQueue(QuorumCnxManager.java:1294)
>>>> at
>>>>
>>>
>>
org.apache.zookeeper.server.quorum.QuorumCnxManager.access$700(QuorumCnxManager.java:82)
>>>> at
>>>>
>>>
>>
org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:1131)
>>>> 2020-02-19 05:03:10,340 [myid:3] - WARN
>>>> [SendWorker:1:QuorumCnxManager$SendWorker@1153] -
>> Send worker
>>> leaving
>>>> thread id 1 my id = 3
>>>> 2020-02-19 05:03:10,340 [myid:3] - WARN
>>>> [RecvWorker:1:QuorumCnxManager$RecvWorker@1227] -
>> Connection
>>> broken for
>>>> id 1, my id = 3, error =
>>>> java.net.SocketException: Socket closed
>>>> at
>> java.net.SocketInputStream.socketRead0(Native
>>> Method)
>>>> at
>>>>
>> java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
>>>> at
>>>>
>> java.net.SocketInputStream.read(SocketInputStream.java:170)
>>>> at
>>>>
>> java.net.SocketInputStream.read(SocketInputStream.java:141)
>>>> at
>>>>
>> java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
>>>> at
>>>>
>> java.io.BufferedInputStream.read(BufferedInputStream.java:265)
>>>> at
>>>>
>> java.io.DataInputStream.readInt(DataInputStream.java:387)
>>>> at
>>>>
>>>
>>
org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:1212)
>>>> 2020-02-19 05:03:10,341 [myid:3] - WARN
>>>> [RecvWorker:1:QuorumCnxManager$RecvWorker@1230] -
>> Interrupting
>>> SendWorker
>>>> 2020-02-19 05:03:10,340 [myid:3] - WARN
>>>> [RecvWorker:3:QuorumCnxManager$RecvWorker@1227] -
>> Connection
>>> broken for
>>>> id 3, my id = 3, error =
>>>> java.io.EOFException
>>>> at
>>>>
>> java.io.DataInputStream.readInt(DataInputStream.java:392)
>>>> at
>>>>
>>>
>>
org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:1212)
>>>> 2020-02-19 05:03:10,341 [myid:3] - WARN
>>>> [RecvWorker:3:QuorumCnxManager$RecvWorker@1230] -
>> Interrupting
>>> SendWorker
>>>> 2020-02-19 05:03:10,341 [myid:3] - INFO [/
>> 0.0.0.0:3888
>>>> :QuorumCnxManager$Listener@924] - Received connection
>> request
>>>> 10.0.1.142:46326
>>>> 2020-02-19 05:03:10,343 [myid:3] - WARN
>>>> [SendWorker:3:QuorumCnxManager$SendWorker@1143] -
>> Interrupted
>>> while
>>>> waiting for message on queue
>>>> java.lang.InterruptedException
>>>> at
>>>>
>>>
>>
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2014)
>>>> at
>>>>
>>>
>>
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2088)
>>>> at
>>>>
>>>
>> java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:418)
>>>> at
>>>>
>>>
>>
org.apache.zookeeper.server.quorum.QuorumCnxManager.pollSendQueue(QuorumCnxManager.java:1294)
>>>> at
>>>>
>>>
>>
org.apache.zookeeper.server.quorum.QuorumCnxManager.access$700(QuorumCnxManager.java:82)
>>>> at
>>>>
>>>
>>
org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:1131)
>>>> 2020-02-19 05:03:10,344 [myid:3] - WARN
>>>> [SendWorker:3:QuorumCnxManager$SendWorker@1153] -
>> Send worker
>>> leaving
>>>> thread id 3 my id = 3
>>>> 2020-02-19 05:03:10,344 [myid:3] - INFO
>>>> [WorkerReceiver[myid=3]:FastLeaderElection@679] -
>> Notification:
>>> 2
>>>> (message format version), 3 (n.leader), 0xb0000018c
>> (n.zxid), 0x1
>>>> (n.round), LOOKING (n.state), 3 (n.sid), 0xb
>> (n.peerEPoch),
>>> LEADING (my
>>>> state)0 (n.config version)
>>>> 2020-02-19 05:03:10,345 [myid:3] - INFO
>>>> [WorkerReceiver[myid=3]:FastLeaderElection@679] -
>> Notification:
>>> 2
>>>> (message format version), 3 (n.leader), 0xb0000018c
>> (n.zxid), 0x1
>>>> (n.round), LEADING (n.state), 3 (n.sid), 0xc
>> (n.peerEPoch),
>>> LEADING (my
>>>> state)0 (n.config version)
>>>> 2020-02-19 05:03:11,048 [myid:3] - INFO
>>>> [SessionTracker:ZooKeeperServer@398] - Expiring
>> session
>>>> 0x30002ba40710018, timeout of 40000ms exceeded
>>>> 2020-02-19 05:03:11,048 [myid:3] - INFO
>>>> [SessionTracker:QuorumZooKeeperServer@157] -
>> Submitting global
>>>> closeSession request for session 0x30002ba40710018
>>>>
>>>> All of the instances have a similar zoo.cfg:
>>>>
>>>> bash-4.3# cat conf/zoo.cfg
>>>> # The number of milliseconds of each tick
>>>> tickTime=2000
>>>> # The number of ticks that the initial
>>>> # synchronization phase can take
>>>> initLimit=30
>>>> # The number of ticks that can pass between
>>>> # sending a request and getting an acknowledgement
>>>> syncLimit=5
>>>> # Purge every 24 hours
>>>> autopurge.purgeInterval=24
>>>> # the directory where the snapshot is stored.
>>>> # do not use /tmp for storage, /tmp here is just
>>>> # example sakes.
>>>> dataDir=/opt/zookeeper/data
>>>> # the port at which the clients will connect
>>>> clientPort=2181
>>>> # Append other config...
>>>> electionPortBindRetry=1000
>>>> 4lw.commands.whitelist=stat, ruok, conf, mntr
>>>>
>>>> server.1=zookeeper1:2888:3888
>>>> server.2=zookeeper2:2888:3888
>>>> server.3=0.0.0.0:2888:3888
>>>>
>>>
>>>
>>>
>>>
>>>
>>
>>
>>
>>
>>