Hi Steve,

This is great news!
I also found the same during my tests, it is good to see that it works in
your environment as well.
Thanks for taking the time on testing and also sharing your findings!

Still, I think it worths to push the PRs in ZOOKEEPER-2164 just for
backward-compatibility reasons, so that not everyone affected by this needs
to change their configs when upgrading from 3.4 to 3.5.

Kind regards,
Mate

On Thu, Feb 20, 2020 at 4:40 PM Steve Jerman <st...@kloudspot.com> wrote:

> Hi Folks,
>
> OK, the following works. It will start up and if any of the three are
> restarted they will rejoin the quorum.
>
> 1) Versions:
> openjdk version "1.8.0_111-internal"
>     OpenJDK Runtime Environment (build 1.8.0_111-internal-alpine-r0-b14)
>     OpenJDK 64-Bit Server VM (build 25.111-b14, mixed mode)
> Zookeeper version: 3.5.7-f0fdd52973d373ffd9c86b81d99842dc2c7f660e, built
> on 02/10/2020 11:30 GMT
>
> 2) Add following to zoo.cfg:
> quorumListenOnAllIPs=true
>
> 3) Use the follow docker-compose config:
>     environment:
>       ZK_ID: 1
>       ZK_CLUSTER: server.1=zookeeper1:2888:3888
> server.2=zookeeper2:2888:3888 server.3=zookeeper3:2888:3888
> --
>     environment:
>       ZK_ID: 2
>       ZK_CLUSTER: server.1=zookeeper1:2888:3888
> server.2=zookeeper2:2888:3888 server.3=zookeeper3:2888:3888
> --
>     environment:
>       ZK_ID: 3
>       ZK_CLUSTER: server.1=zookeeper1:2888:3888
> server.2=zookeeper2:2888:3888 server.3=zookeeper3:2888:3888
>
>
> Sorry, I misunderstood yesterday, I did 2) but not 3)
>
> Thanks.
>
> Steve
>
>
> On 2/20/20, 8:16 AM, "Steve Jerman" <st...@kloudspot.com> wrote:
>
>     Few points:
>
>     1) The containers all are running:
>
>     bash-4.3# java -version
>     openjdk version "1.8.0_111-internal"
>     OpenJDK Runtime Environment (build 1.8.0_111-internal-alpine-r0-b14)
>     OpenJDK 64-Bit Server VM (build 25.111-b14, mixed mode)
>
>     2) The containers are configured like this:
>         environment:
>           ZK_ID: 1
>           ZK_CLUSTER: server.1=0.0.0.0:2888:3888
> server.2=zookeeper2:2888:3888 server.3=zookeeper3:2888:3888
>     --
>         environment:
>           ZK_ID: 2
>           ZK_CLUSTER: server.1=zookeeper1:2888:3888 server.2=0.0.0.0:2888:3888
> server.3=zookeeper3:2888:3888
>     --
>         environment:
>           ZK_ID: 3
>           ZK_CLUSTER: server.1=zookeeper1:2888:3888
> server.2=zookeeper2:2888:3888 server.3=0.0.0.0:2888:3888
>
>     Reading below, I see that you suggest I should do the following:
>
>         environment:
>           ZK_ID: 1
>          quorumListenOnAllIPs: true
>           ZK_CLUSTER: server.1=zookeeper1:2888:3888
> server.2=zookeeper2:2888:3888 server.3=zookeeper3:2888:3888
>     --
>         environment:
>           ZK_ID: 2
>          quorumListenOnAllIPs: true
>           ZK_CLUSTER: server.1=zookeeper1:2888:3888
> server.2=zookeeper2:2888:3888 server.3=zookeeper3:2888:3888
>     --
>         environment:
>           ZK_ID: 3
>          quorumListenOnAllIPs: true
>           ZK_CLUSTER: server.1=zookeeper1:2888:3888
> server.2=zookeeper2:2888:3888 server.3=zookeeper3:2888:3888
>
>     Will try...
>
>     Steve
>
>     On 2/20/20, 2:56 AM, "Jörn Franke" <jornfra...@gmail.com> wrote:
>
>         Thanks . It is strange that JDK 11.0.6 has a backwards
> incompatible change. However, it would be sad if we are stuck all the time
> with JDK 11.0.5.
>
>
>         > Am 20.02.2020 um 10:53 schrieb Szalay-Bekő Máté <
> szalay.beko.m...@gmail.com>:
>         >
>         > Hi Guys,
>         >
>         > I think the 'reverse order startup failure' actually has the
> very same root
>         > cause than the 0.0.0.0 issue discussed in ZOOKEEPER-2164.
>         >
>         > Downgrading to 3.4 for now should solve these problems I think.
>         >
>         > Still I am a bit confused... I just want to understand if we
> really miss
>         > something in the ZooKeeper configuration model.
>         >
>         > Assuming that myid=1 (we are talking about the zoo.cfg in server
> 1), to the
>         > 'server.1=...' line you can put address which can be used by
> other servers
>         > to talk back to server 1. This will be the 'advertised address'
> used by
>         > ZooKeeper in 3.5. Putting here 0.0.0.0 will not work with 3.5
> (until we fix
>         > it with ZOOKEEPER-2164), as server 2 will not be able to use
> 0.0.0.0 to
>         > talk back to server 1. But if you put a valid address to the
> 'server.1=...'
>         > config line while having quorumListenOnAllIPs=true set, you
> should still be
>         > able tell ZooKeeper to bind on 0.0.0.0, no matter what
> IP/hostname you put
>         > to the 'server.1=...' configs.
>         > (there is a similar config to set which IP to bind with the
> client port as
>         > well, if you would need to bind to 0.0.0.0 with the client port
> too.)
>         >
>         >
>         > @Jorn:
>         >> This might be a wide shot and I did not see exactly the same
> error, but
>         > with corretto jdk 11.0.6 I had also issue that ZK could not a
> quorum. I
>         > downgraded to 11.0.5 and it did not have an issues. This was on
> ZK 3.5.5
>         > with Kerberos authentication and authorization.
>         >
>         > In the recent JDK versions (8u424, or 11.0.6) there are some
>         > backward-incompatible kerbersos related changes affecting
> basically the
>         > whole Hadoop stack, not only ZooKeeper. I think it is not
> recommended to
>         > use these JDK versions with Hadoop. I am not involved deep in
> this (maybe
>         > there is some workaround already, but I am not aware of it).
>         >
>         > Kind regards,
>         > Mate
>         >
>         >> On Wed, Feb 19, 2020 at 11:49 PM Steve Jerman <
> st...@kloudspot.com> wrote:
>         >>
>         >> Ok,
>         >>
>         >> Just to confirm. Rolling back to 3.4.14 fixes the issue.  The
> quorum
>         >> starts up and restarting any of the instances works....
>         >>
>         >> Are there any issues with using the 3.5 client with 3.4 server?
>         >>
>         >> Steve
>         >>
>         >> On 2/19/20, 9:02 AM, "Steve Jerman" <st...@kloudspot.com>
> wrote:
>         >>
>         >>    OK, that explains it.  I will see if 3.4.14 fixes the issue
> for the
>         >> time being...
>         >>
>         >>    Thanks
>         >>    Steve
>         >>
>         >>    On 2/19/20, 8:57 AM, "Jan Kosecki" <jan.koseck...@gmail.com>
> wrote:
>         >>
>         >>        Hi Steve,
>         >>
>         >>        it's possible that the quorum state depends on the order
> your
>         >> nodes start.
>         >>        In my kubernetes environment I've had a similar issue
> and I've
>         >> noticed that
>         >>        starting brokers 1 by 1, following the order from
> configuration
>         >> file allows
>         >>        all 3 to join the quorum but a reverse order would keep
> server
>         >> started as
>         >>        the last outside of the quorum. I was also using 0.0.0.0
> in the
>         >>        configuration and didn't try a full address due to
> readiness check
>         >>        configuration.
>         >>
>         >>        Unfortunately I didn't have time to debug it any further
> so I've
>         >> downgraded
>         >>        back to 3.4 for the time being.
>         >>
>         >>        Hope you manage to find a solution,
>         >>
>         >>        Best,
>         >>        Jan
>         >>
>         >>        On Wed, 19 Feb 2020, 15:47 Steve Jerman, <
> st...@kloudspot.com>
>         >> wrote:
>         >>
>         >>> Hi,
>         >>>
>         >>> I've just been testing restarts ... I restarted one of the
>         >> instances (id
>         >>> 1) ... and it doesn't join the quorum ... same error.
>         >>>
>         >>> Odd that the system started fine but can't handle a restart....
>         >>>
>         >>> Steve
>         >>>
>         >>>> On 2/19/20, 7:45 AM, "Steve Jerman" <st...@kloudspot.com>
> wrote:
>         >>>
>         >>>    Thank You Mate,
>         >>>
>         >>>    That fixed it. Unfortunately I can't easily avoid using
>         >> 0.0.0.0
>         >>>
>         >>>    My configuration is built using Docker Storm and it doesn't
>         >> let you
>         >>> bind to a host name...
>         >>>
>         >>>    Steve
>         >>>
>         >>>    On 2/19/20, 5:27 AM, "Szalay-Bekő Máté" <
>         >> szalay.beko.m...@gmail.com>
>         >>> wrote:
>         >>>
>         >>>        Hi Steve!
>         >>>
>         >>>        If you are using ZooKeeper newer than 3.5.0, then this
>         >> might be
>         >>> the issue
>         >>>        we are just discussing / trying to fix in
> ZOOKEEPER-2164.
>         >>>        Can you test your setup with a config where you don't
>         >> use 0.0.0.0
>         >>> in the
>         >>>        server addresses?
>         >>>
>         >>>        If you need to bind to the 0.0.0.0 locally, then please
>         >> set the
>         >>>        'quorumListenOnAllIPs' config property to true.
>         >>>
>         >>>        like:
>         >>>
>         >>>        # usually you don't really need this, unless if you
>         >> actually need
>         >>> to bind
>         >>>        to multiple IPs
>         >>>        quorumListenOnAllIPs=true
>         >>>
>         >>>        # it is best if all the zoo.cfg files contain the same
>         >> address
>         >>> settings,
>         >>>        and we don't use 0.0.0.0 here
>         >>>        server.1=zookeeper1:2888:3888
>         >>>        server.2=zookeeper2:2888:3888
>         >>>        server.3=zookeeper3:2888:3888
>         >>>
>         >>>        Kind regards,
>         >>>        Mate
>         >>>
>         >>>        On Wed, Feb 19, 2020 at 6:08 AM Steve Jerman <
>         >> st...@kloudspot.com>
>         >>> wrote:
>         >>>
>         >>>> Hello folks,
>         >>>>
>         >>>> Wonder if anyone can help me. Suspect it must be
>         >> something
>         >>> simple but I
>         >>>> cant see it. Any suggestions about how to diagnose
>         >> would be
>         >>> gratefully
>         >>>> received.
>         >>>>
>         >>>> I have a three node ZK cluster, when it starts up only
>         >> two of
>         >>> the nodes
>         >>>> form a quorum. If I restart the leader the quorum
>         >> reforms with
>         >>> the other
>         >>>> two node…
>         >>>>
>         >>>> Thanks in advance for any  help
>         >>>> Steve
>         >>>>
>         >>>> This is the ‘stat’ for the leader/follow…
>         >>>>
>         >>>> bash-5.0$ echo stat | nc zookeeper1 2181
>         >>>> This ZooKeeper instance is not currently serving
>         >> requests
>         >>>>
>         >>>> bash-5.0$ echo stat | nc zookeeper2 2181
>         >>>> Zookeeper version:
>         >>> 3.5.7-f0fdd52973d373ffd9c86b81d99842dc2c7f660e, built
>         >>>> on 02/10/2020 11:30 GMT
>         >>>> Clients:
>         >>>> /10.0.1.152:44910[1](queued=0,recved=151,sent=151)
>         >>>> /10.0.1.140:53138[1](queued=0,recved=187,sent=187)
>         >>>> /10.0.1.143:57422[1](queued=0,recved=151,sent=151)
>         >>>> /10.0.1.152:59242[0](queued=0,recved=1,sent=0)
>         >>>> /10.0.1.143:40826[1](queued=0,recved=1139,sent=1139)
>         >>>> /10.0.1.152:49188[1](queued=0,recved=200,sent=203)
>         >>>> /10.0.1.152:59548[1](queued=0,recved=1157,sent=1159)
>         >>>> /10.0.1.140:36624[1](queued=0,recved=151,sent=151)
>         >>>>
>         >>>> Latency min/avg/max: 0/0/5
>         >>>> Received: 3338
>         >>>> Sent: 3342
>         >>>> Connections: 8
>         >>>> Outstanding: 0
>         >>>> Zxid: 0xc000000f3
>         >>>> Mode: follower
>         >>>> Node count: 181
>         >>>>
>         >>>> bash-5.0$ echo stat | nc zookeeper3 2181
>         >>>> Zookeeper version:
>         >>> 3.5.7-f0fdd52973d373ffd9c86b81d99842dc2c7f660e, built
>         >>>> on 02/10/2020 11:30 GMT
>         >>>> Clients:
>         >>>> /10.0.1.152:49428[0](queued=0,recved=1,sent=0)
>         >>>> /10.0.1.140:32912[1](queued=0,recved=1426,sent=1429)
>         >>>>
>         >>>> Latency min/avg/max: 0/0/4
>         >>>> Received: 1684
>         >>>> Sent: 1686
>         >>>> Connections: 2
>         >>>> Outstanding: 0
>         >>>> Zxid: 0xc000000f4
>         >>>> Mode: leader
>         >>>> Node count: 181
>         >>>> Proposal sizes last/min/max: 32/32/406
>         >>>> bash-5.0$
>         >>>>
>         >>>> The trace for the failing node is:
>         >>>>
>         >>>> server.1=0.0.0.0:2888:3888
>         >>>> server.2=zookeeper2:2888:3888
>         >>>> server.3=zookeeper3:2888:3888
>         >>>> ZooKeeper JMX enabled by default
>         >>>> Using config: /opt/zookeeper/bin/../conf/zoo.cfg
>         >>>> 2020-02-19 04:23:27,759 [myid:] - INFO
>         >>> [main:QuorumPeerConfig@135] -
>         >>>> Reading configuration from:
>         >> /opt/zookeeper/bin/../conf/zoo.cfg
>         >>>> 2020-02-19 04:23:27,764 [myid:] - INFO
>         >>> [main:QuorumPeerConfig@387] -
>         >>>> clientPortAddress is 0.0.0.0:2181
>         >>>> 2020-02-19 04:23:27,764 [myid:] - INFO
>         >>> [main:QuorumPeerConfig@391] -
>         >>>> secureClientPort is not set
>         >>>> 2020-02-19 04:23:27,771 [myid:1] - INFO
>         >>> [main:DatadirCleanupManager@78]
>         >>>> - autopurge.snapRetainCount set to 3
>         >>>> 2020-02-19 04:23:27,772 [myid:1] - INFO
>         >>> [main:DatadirCleanupManager@79]
>         >>>> - autopurge.purgeInterval set to 24
>         >>>> 2020-02-19 04:23:27,772 [myid:1] - INFO
>         >>>> [PurgeTask:DatadirCleanupManager$PurgeTask@138] -
>         >> Purge task
>         >>> started.
>         >>>> 2020-02-19 04:23:27,773 [myid:1] - INFO
>         >> [main:ManagedUtil@46]
>         >>> - Log4j
>         >>>> found with jmx enabled.
>         >>>> 2020-02-19 04:23:27,774 [myid:1] - INFO
>         >>> [PurgeTask:FileTxnSnapLog@115] -
>         >>>> zookeeper.snapshot.trust.empty : false
>         >>>> 2020-02-19 04:23:27,780 [myid:1] - INFO
>         >>>> [PurgeTask:DatadirCleanupManager$PurgeTask@144] -
>         >> Purge task
>         >>> completed.
>         >>>> 2020-02-19 04:23:27,781 [myid:1] - INFO
>         >> [main:QuorumPeerMain@141]
>         >>> -
>         >>>> Starting quorum peer
>         >>>> 2020-02-19 04:23:27,786 [myid:1] - INFO
>         >>> [main:ServerCnxnFactory@135] -
>         >>>> Using org.apache.zookeeper.server.NIOServerCnxnFactory
>         >> as server
>         >>> connection
>         >>>> factory
>         >>>> 2020-02-19 04:23:27,788 [myid:1] - INFO
>         >>> [main:NIOServerCnxnFactory@673]
>         >>>> - Configuring NIO connection handler with 10s
>         >> sessionless
>         >>> connection
>         >>>> timeout, 2 selector thread(s), 32 worker threads, and
>         >> 64 kB
>         >>> direct buffers.
>         >>>> 2020-02-19 04:23:27,791 [myid:1] - INFO
>         >>> [main:NIOServerCnxnFactory@686]
>         >>>> - binding to port 0.0.0.0/0.0.0.0:2181
>         >>>> 2020-02-19 <http://0.0.0.0/0.0.0.0:21812020-02-19>
>         >> 04:23:27,809
>         >>> [myid:1]
>         >>>> - INFO  [main:Log@169] - Logging initialized @249ms to
>         >>>> org.eclipse.jetty.util.log.Slf4jLog
>         >>>> 2020-02-19 04:23:27,913 [myid:1] - WARN
>         >>> [main:ContextHandler@1520] -
>         >>>> o.e.j.s.ServletContextHandler@5abca1e0
>         >> {/,null,UNAVAILABLE}
>         >>> contextPath
>         >>>> ends with /*
>         >>>> 2020-02-19 04:23:27,913 [myid:1] - WARN
>         >>> [main:ContextHandler@1531] -
>         >>>> Empty contextPath
>         >>>> 2020-02-19 04:23:27,922 [myid:1] - INFO
>         >> [main:X509Util@79] -
>         >>> Setting -D
>         >>>> jdk.tls.rejectClientInitiatedRenegotiation=true to
>         >> disable
>         >>> client-initiated
>         >>>> TLS renegotiation
>         >>>> 2020-02-19 04:23:27,923 [myid:1] - INFO
>         >> [main:FileTxnSnapLog@115]
>         >>> -
>         >>>> zookeeper.snapshot.trust.empty : false
>         >>>> 2020-02-19 04:23:27,923 [myid:1] - INFO
>         >> [main:QuorumPeer@1470]
>         >>> - Local
>         >>>> sessions disabled
>         >>>> 2020-02-19 04:23:27,923 [myid:1] - INFO
>         >> [main:QuorumPeer@1481]
>         >>> - Local
>         >>>> session upgrading disabled
>         >>>> 2020-02-19 04:23:27,923 [myid:1] - INFO
>         >> [main:QuorumPeer@1448]
>         >>> -
>         >>>> tickTime set to 2000
>         >>>> 2020-02-19 04:23:27,923 [myid:1] - INFO
>         >> [main:QuorumPeer@1492]
>         >>> -
>         >>>> minSessionTimeout set to 4000
>         >>>> 2020-02-19 04:23:27,924 [myid:1] - INFO
>         >> [main:QuorumPeer@1503]
>         >>> -
>         >>>> maxSessionTimeout set to 40000
>         >>>> 2020-02-19 04:23:27,924 [myid:1] - INFO
>         >> [main:QuorumPeer@1518]
>         >>> -
>         >>>> initLimit set to 30
>         >>>> 2020-02-19 04:23:27,932 [myid:1] - INFO
>         >> [main:ZKDatabase@117] -
>         >>>> zookeeper.snapshotSizeFactor = 0.33
>         >>>> 2020-02-19 04:23:27,933 [myid:1] - INFO
>         >> [main:QuorumPeer@1763]
>         >>> - Using
>         >>>> insecure (non-TLS) quorum communication
>         >>>> 2020-02-19 04:23:27,933 [myid:1] - INFO
>         >> [main:QuorumPeer@1769]
>         >>> - Port
>         >>>> unification disabled
>         >>>> 2020-02-19 04:23:27,933 [myid:1] - INFO
>         >> [main:QuorumPeer@2136]
>         >>> -
>         >>>> QuorumPeer communication is not secured! (SASL auth
>         >> disabled)
>         >>>> 2020-02-19 04:23:27,933 [myid:1] - INFO
>         >> [main:QuorumPeer@2165]
>         >>> -
>         >>>> quorum.cnxn.threads.size set to 20
>         >>>> 2020-02-19 04:23:27,934 [myid:1] - INFO
>         >> [main:FileSnap@83] -
>         >>> Reading
>         >>>> snapshot
>         >> /opt/zookeeper/data/version-2/snapshot.90000043e
>         >>>> 2020-02-19 04:23:27,963 [myid:1] - INFO
>         >> [main:Server@359] -
>         >>>> jetty-9.4.24.v20191120; built:
>         >> 2019-11-20T21:37:49.771Z; git:
>         >>>> 363d5f2df3a8a28de40604320230664b9c793c16; jvm
>         >>>> 1.8.0_111-internal-alpine-r0-b14
>         >>>> 2020-02-19 04:23:27,989 [myid:1] - INFO
>         >>> [main:DefaultSessionIdManager@333]
>         >>>> - DefaultSessionIdManager workerName=node0
>         >>>> 2020-02-19 04:23:27,989 [myid:1] - INFO
>         >>> [main:DefaultSessionIdManager@338]
>         >>>> - No SessionScavenger set, using defaults
>         >>>> 2020-02-19 04:23:27,990 [myid:1] - INFO
>         >> [main:HouseKeeper@140]
>         >>> - node0
>         >>>> Scavenging every 660000ms
>         >>>> 2020-02-19 04:23:27,997 [myid:1] - INFO
>         >> [main:ContextHandler@825]
>         >>> -
>         >>>> Started o.e.j.s.ServletContextHandler@5abca1e0
>         >> {/,null,AVAILABLE}
>         >>>> 2020-02-19 04:23:28,004 [myid:1] - INFO
>         >>> [main:AbstractConnector@330] -
>         >>>> Started ServerConnector@2b98378d{HTTP/1.1,[http/1.1]}{
>         >>> 0.0.0.0:8080}
>         >>>> 2020-02-19 04:23:28,004 [myid:1] - INFO
>         >> [main:Server@399] -
>         >>> Started
>         >>>> @444ms
>         >>>> 2020-02-19 04:23:28,004 [myid:1] - INFO
>         >>> [main:JettyAdminServer@112] -
>         >>>> Started AdminServer on address 0.0.0.0, port 8080 and
>         >> command
>         >>> URL /commands
>         >>>> 2020-02-19 04:23:28,007 [myid:1] - INFO
>         >>>> [main:QuorumCnxManager$Listener@867] - Election port
>         >> bind
>         >>> maximum retries
>         >>>> is 1000
>         >>>> 2020-02-19 04:23:28,007 [myid:1] - INFO
>         >>>> [QuorumPeerListener:QuorumCnxManager$Listener@917] -
>         >> My
>         >>> election bind
>         >>>> port: /0.0.0.0:3888
>         >>>> 2020-02-19 04:23:28,014 [myid:1] - INFO
>         >>>> [QuorumPeer[myid=1](plain=0.0.0.0:2181
>         >>> )(secure=disabled):QuorumPeer@1175]
>         >>>> - LOOKING
>         >>>> 2020-02-19 04:23:28,015 [myid:1] - INFO
>         >>>> [QuorumPeer[myid=1](plain=0.0.0.0:2181
>         >>>> )(secure=disabled):FastLeaderElection@885] - New
>         >> election. My
>         >>> id =  1,
>         >>>> proposed zxid=0xa000000fb
>         >>>> 2020-02-19 04:23:28,018 [myid:1] - INFO
>         >>>> [WorkerSender[myid=1]:QuorumCnxManager@438] - Have
>         >> smaller
>         >>> server
>         >>>> identifier, so dropping the connection: (2, 1)
>         >>>> 2020-02-19 04:23:28,019 [myid:1] - INFO
>         >>>> [WorkerSender[myid=1]:QuorumCnxManager@438] - Have
>         >> smaller
>         >>> server
>         >>>> identifier, so dropping the connection: (3, 1)
>         >>>> 2020-02-19 04:23:28,019 [myid:1] - INFO
>         >>>> [WorkerReceiver[myid=1]:FastLeaderElection@679] -
>         >> Notification:
>         >>> 2
>         >>>> (message format version), 1 (n.leader), 0xa000000fb
>         >> (n.zxid), 0x1
>         >>>> (n.round), LOOKING (n.state), 1 (n.sid), 0xa
>         >> (n.peerEPoch),
>         >>> LOOKING (my
>         >>>> state)0 (n.config version)
>         >>>> 2020-02-19 04:23:28,221 [myid:1] - INFO
>         >>>> [QuorumPeer[myid=1](plain=0.0.0.0:2181
>         >>>> )(secure=disabled):QuorumCnxManager@438] - Have
>         >> smaller server
>         >>>> identifier, so dropping the connection: (2, 1)
>         >>>> 2020-02-19 04:23:28,222 [myid:1] - INFO
>         >>>> [QuorumPeer[myid=1](plain=0.0.0.0:2181
>         >>>> )(secure=disabled):QuorumCnxManager@438] - Have
>         >> smaller server
>         >>>> identifier, so dropping the connection: (3, 1)
>         >>>> 2020-02-19 04:23:28,222 [myid:1] - INFO
>         >>>> [QuorumPeer[myid=1](plain=0.0.0.0:2181
>         >>>> )(secure=disabled):FastLeaderElection@919] -
>         >> Notification time
>         >>> out: 400
>         >>>> 2020-02-19 04:23:28,623 [myid:1] - INFO
>         >>>> [QuorumPeer[myid=1](plain=0.0.0.0:2181
>         >>>> )(secure=disabled):QuorumCnxManager@438] - Have
>         >> smaller server
>         >>>> identifier, so dropping the connection: (2, 1)
>         >>>> 2020-02-19 04:23:28,624 [myid:1] - INFO
>         >>>> [QuorumPeer[myid=1](plain=0.0.0.0:2181
>         >>>> )(secure=disabled):QuorumCnxManager@438] - Have
>         >> smaller server
>         >>>> identifier, so dropping the connection: (3, 1)
>         >>>> 2020-02-19 04:23:28,624 [myid:1] - INFO
>         >>>> [QuorumPeer[myid=1](plain=0.0.0.0:2181
>         >>>> )(secure=disabled):FastLeaderElection@919] -
>         >> Notification time
>         >>> out: 800
>         >>>> …
>         >>>>
>         >>>> And for the leader:
>         >>>> ….
>         >>>> 2020-02-19 05:02:10,341 [myid:3] - INFO
>         >>>> [WorkerReceiver[myid=3]:FastLeaderElection@679] -
>         >> Notification:
>         >>> 2
>         >>>> (message format version), 3 (n.leader), 0xb0000018c
>         >> (n.zxid), 0x1
>         >>>> (n.round), LOOKING (n.state), 3 (n.sid), 0xb
>         >> (n.peerEPoch),
>         >>> LEADING (my
>         >>>> state)0 (n.config version)
>         >>>> 2020-02-19 05:02:10,341 [myid:3] - INFO
>         >>>> [WorkerReceiver[myid=3]:FastLeaderElection@679] -
>         >> Notification:
>         >>> 2
>         >>>> (message format version), 3 (n.leader), 0xb0000018c
>         >> (n.zxid), 0x1
>         >>>> (n.round), LEADING (n.state), 3 (n.sid), 0xc
>         >> (n.peerEPoch),
>         >>> LEADING (my
>         >>>> state)0 (n.config version)
>         >>>> 2020-02-19 05:02:33,640 [myid:3] - WARN
>         >>>> [NIOWorkerThread-4:NIOServerCnxn@366] - Unable to read
>         >>> additional data
>         >>>> from client sessionid 0x30002ba40710018, likely client
>         >> has
>         >>> closed socket
>         >>>> 2020-02-19 05:02:39,047 [myid:3] - INFO
>         >>>> [SessionTracker:ZooKeeperServer@398] - Expiring
>         >> session
>         >>>> 0x20002ba2fd6001a, timeout of 40000ms exceeded
>         >>>> 2020-02-19 05:02:39,048 [myid:3] - INFO
>         >>>> [SessionTracker:QuorumZooKeeperServer@157] -
>         >> Submitting global
>         >>>> closeSession request for session 0x20002ba2fd6001a
>         >>>> 2020-02-19 05:03:10,340 [myid:3] - INFO  [/
>         >> 0.0.0.0:3888
>         >>>> :QuorumCnxManager$Listener@924] - Received connection
>         >> request
>         >>>> 10.0.1.152:52492
>         >>>> 2020-02-19 05:03:10,340 [myid:3] - WARN
>         >>>> [SendWorker:1:QuorumCnxManager$SendWorker@1143] -
>         >> Interrupted
>         >>> while
>         >>>> waiting for message on queue
>         >>>> java.lang.InterruptedException
>         >>>>                at
>         >>>>
>         >>>
>         >>
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2014)
>         >>>>                at
>         >>>>
>         >>>
>         >>
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2088)
>         >>>>                at
>         >>>>
>         >>>
>         >>
> java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:418)
>         >>>>                at
>         >>>>
>         >>>
>         >>
> org.apache.zookeeper.server.quorum.QuorumCnxManager.pollSendQueue(QuorumCnxManager.java:1294)
>         >>>>                at
>         >>>>
>         >>>
>         >>
> org.apache.zookeeper.server.quorum.QuorumCnxManager.access$700(QuorumCnxManager.java:82)
>         >>>>                at
>         >>>>
>         >>>
>         >>
> org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:1131)
>         >>>> 2020-02-19 05:03:10,340 [myid:3] - WARN
>         >>>> [SendWorker:1:QuorumCnxManager$SendWorker@1153] -
>         >> Send worker
>         >>> leaving
>         >>>> thread  id 1 my id = 3
>         >>>> 2020-02-19 05:03:10,340 [myid:3] - WARN
>         >>>> [RecvWorker:1:QuorumCnxManager$RecvWorker@1227] -
>         >> Connection
>         >>> broken for
>         >>>> id 1, my id = 3, error =
>         >>>> java.net.SocketException: Socket closed
>         >>>>                at
>         >> java.net.SocketInputStream.socketRead0(Native
>         >>> Method)
>         >>>>                at
>         >>>>
>         >>
> java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
>         >>>>                at
>         >>>>
>         >> java.net.SocketInputStream.read(SocketInputStream.java:170)
>         >>>>                at
>         >>>>
>         >> java.net.SocketInputStream.read(SocketInputStream.java:141)
>         >>>>                at
>         >>>>
>         >> java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
>         >>>>                at
>         >>>>
>         >> java.io.BufferedInputStream.read(BufferedInputStream.java:265)
>         >>>>                at
>         >>>>
>         >> java.io.DataInputStream.readInt(DataInputStream.java:387)
>         >>>>                at
>         >>>>
>         >>>
>         >>
> org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:1212)
>         >>>> 2020-02-19 05:03:10,341 [myid:3] - WARN
>         >>>> [RecvWorker:1:QuorumCnxManager$RecvWorker@1230] -
>         >> Interrupting
>         >>> SendWorker
>         >>>> 2020-02-19 05:03:10,340 [myid:3] - WARN
>         >>>> [RecvWorker:3:QuorumCnxManager$RecvWorker@1227] -
>         >> Connection
>         >>> broken for
>         >>>> id 3, my id = 3, error =
>         >>>> java.io.EOFException
>         >>>>                at
>         >>>>
>         >> java.io.DataInputStream.readInt(DataInputStream.java:392)
>         >>>>                at
>         >>>>
>         >>>
>         >>
> org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:1212)
>         >>>> 2020-02-19 05:03:10,341 [myid:3] - WARN
>         >>>> [RecvWorker:3:QuorumCnxManager$RecvWorker@1230] -
>         >> Interrupting
>         >>> SendWorker
>         >>>> 2020-02-19 05:03:10,341 [myid:3] - INFO  [/
>         >> 0.0.0.0:3888
>         >>>> :QuorumCnxManager$Listener@924] - Received connection
>         >> request
>         >>>> 10.0.1.142:46326
>         >>>> 2020-02-19 05:03:10,343 [myid:3] - WARN
>         >>>> [SendWorker:3:QuorumCnxManager$SendWorker@1143] -
>         >> Interrupted
>         >>> while
>         >>>> waiting for message on queue
>         >>>> java.lang.InterruptedException
>         >>>>                at
>         >>>>
>         >>>
>         >>
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2014)
>         >>>>                at
>         >>>>
>         >>>
>         >>
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2088)
>         >>>>                at
>         >>>>
>         >>>
>         >>
> java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:418)
>         >>>>                at
>         >>>>
>         >>>
>         >>
> org.apache.zookeeper.server.quorum.QuorumCnxManager.pollSendQueue(QuorumCnxManager.java:1294)
>         >>>>                at
>         >>>>
>         >>>
>         >>
> org.apache.zookeeper.server.quorum.QuorumCnxManager.access$700(QuorumCnxManager.java:82)
>         >>>>                at
>         >>>>
>         >>>
>         >>
> org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:1131)
>         >>>> 2020-02-19 05:03:10,344 [myid:3] - WARN
>         >>>> [SendWorker:3:QuorumCnxManager$SendWorker@1153] -
>         >> Send worker
>         >>> leaving
>         >>>> thread  id 3 my id = 3
>         >>>> 2020-02-19 05:03:10,344 [myid:3] - INFO
>         >>>> [WorkerReceiver[myid=3]:FastLeaderElection@679] -
>         >> Notification:
>         >>> 2
>         >>>> (message format version), 3 (n.leader), 0xb0000018c
>         >> (n.zxid), 0x1
>         >>>> (n.round), LOOKING (n.state), 3 (n.sid), 0xb
>         >> (n.peerEPoch),
>         >>> LEADING (my
>         >>>> state)0 (n.config version)
>         >>>> 2020-02-19 05:03:10,345 [myid:3] - INFO
>         >>>> [WorkerReceiver[myid=3]:FastLeaderElection@679] -
>         >> Notification:
>         >>> 2
>         >>>> (message format version), 3 (n.leader), 0xb0000018c
>         >> (n.zxid), 0x1
>         >>>> (n.round), LEADING (n.state), 3 (n.sid), 0xc
>         >> (n.peerEPoch),
>         >>> LEADING (my
>         >>>> state)0 (n.config version)
>         >>>> 2020-02-19 05:03:11,048 [myid:3] - INFO
>         >>>> [SessionTracker:ZooKeeperServer@398] - Expiring
>         >> session
>         >>>> 0x30002ba40710018, timeout of 40000ms exceeded
>         >>>> 2020-02-19 05:03:11,048 [myid:3] - INFO
>         >>>> [SessionTracker:QuorumZooKeeperServer@157] -
>         >> Submitting global
>         >>>> closeSession request for session 0x30002ba40710018
>         >>>>
>         >>>> All of the instances have a similar zoo.cfg:
>         >>>>
>         >>>> bash-4.3# cat conf/zoo.cfg
>         >>>> # The number of milliseconds of each tick
>         >>>> tickTime=2000
>         >>>> # The number of ticks that the initial
>         >>>> # synchronization phase can take
>         >>>> initLimit=30
>         >>>> # The number of ticks that can pass between
>         >>>> # sending a request and getting an acknowledgement
>         >>>> syncLimit=5
>         >>>> # Purge every 24 hours
>         >>>> autopurge.purgeInterval=24
>         >>>> # the directory where the snapshot is stored.
>         >>>> # do not use /tmp for storage, /tmp here is just
>         >>>> # example sakes.
>         >>>> dataDir=/opt/zookeeper/data
>         >>>> # the port at which the clients will connect
>         >>>> clientPort=2181
>         >>>> #Append other confg...
>         >>>> electionPortBindRetry=1000
>         >>>> 4lw.commands.whitelist=stat, ruok, conf, mntr
>         >>>>
>         >>>> server.1=zookeeper1:2888:3888
>         >>>> server.2=zookeeper2:2888:3888
>         >>>> server.3=0.0.0.0:2888:3888
>         >>>>
>         >>>
>         >>>
>         >>>
>         >>>
>         >>>
>         >>
>         >>
>         >>
>         >>
>         >>
>
>
>
>
>

Reply via email to