Hi Folks,

OK, the following works: the cluster starts up, and if any of the three instances is restarted it rejoins the quorum. (A consolidated compose sketch follows; the numbered steps after it give the individual pieces.)
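For anyone reproducing this later, here is a minimal docker-compose sketch of the working setup. It assumes an image whose entrypoint renders ZK_ID and ZK_CLUSTER into myid and zoo.cfg, as the containers in this thread do; the image name and tag are placeholders, while the service names match the hostnames used throughout the thread:

version: "3"
services:
  zookeeper1:
    image: my-zookeeper:3.5.7   # placeholder; any image that maps ZK_ID/ZK_CLUSTER into zoo.cfg
    hostname: zookeeper1
    environment:
      ZK_ID: 1
      ZK_CLUSTER: server.1=zookeeper1:2888:3888 server.2=zookeeper2:2888:3888 server.3=zookeeper3:2888:3888
  zookeeper2:
    image: my-zookeeper:3.5.7
    hostname: zookeeper2
    environment:
      ZK_ID: 2
      ZK_CLUSTER: server.1=zookeeper1:2888:3888 server.2=zookeeper2:2888:3888 server.3=zookeeper3:2888:3888
  zookeeper3:
    image: my-zookeeper:3.5.7
    hostname: zookeeper3
    environment:
      ZK_ID: 3
      ZK_CLUSTER: server.1=zookeeper1:2888:3888 server.2=zookeeper2:2888:3888 server.3=zookeeper3:2888:3888

Note that quorumListenOnAllIPs=true still has to be set in each container's zoo.cfg (step 2 below); it is not part of the compose file itself.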
1) Versions:

openjdk version "1.8.0_111-internal"
OpenJDK Runtime Environment (build 1.8.0_111-internal-alpine-r0-b14)
OpenJDK 64-Bit Server VM (build 25.111-b14, mixed mode)

Zookeeper version: 3.5.7-f0fdd52973d373ffd9c86b81d99842dc2c7f660e, built on 02/10/2020 11:30 GMT

2) Add the following to zoo.cfg:

quorumListenOnAllIPs=true

3) Use the following docker-compose config:

environment:
  ZK_ID: 1
  ZK_CLUSTER: server.1=zookeeper1:2888:3888 server.2=zookeeper2:2888:3888 server.3=zookeeper3:2888:3888
--
environment:
  ZK_ID: 2
  ZK_CLUSTER: server.1=zookeeper1:2888:3888 server.2=zookeeper2:2888:3888 server.3=zookeeper3:2888:3888
--
environment:
  ZK_ID: 3
  ZK_CLUSTER: server.1=zookeeper1:2888:3888 server.2=zookeeper2:2888:3888 server.3=zookeeper3:2888:3888

Sorry, I misunderstood yesterday: I did 2) but not 3).

Thanks,
Steve

On 2/20/20, 8:16 AM, "Steve Jerman" <st...@kloudspot.com> wrote:

A few points:

1) The containers are all running:

bash-4.3# java -version
openjdk version "1.8.0_111-internal"
OpenJDK Runtime Environment (build 1.8.0_111-internal-alpine-r0-b14)
OpenJDK 64-Bit Server VM (build 25.111-b14, mixed mode)

2) The containers are configured like this:

environment:
  ZK_ID: 1
  ZK_CLUSTER: server.1=0.0.0.0:2888:3888 server.2=zookeeper2:2888:3888 server.3=zookeeper3:2888:3888
--
environment:
  ZK_ID: 2
  ZK_CLUSTER: server.1=zookeeper1:2888:3888 server.2=0.0.0.0:2888:3888 server.3=zookeeper3:2888:3888
--
environment:
  ZK_ID: 3
  ZK_CLUSTER: server.1=zookeeper1:2888:3888 server.2=zookeeper2:2888:3888 server.3=0.0.0.0:2888:3888

Reading below, I see that you suggest I should do the following:

environment:
  ZK_ID: 1
  quorumListenOnAllIPs: true
  ZK_CLUSTER: server.1=zookeeper1:2888:3888 server.2=zookeeper2:2888:3888 server.3=zookeeper3:2888:3888
--
environment:
  ZK_ID: 2
  quorumListenOnAllIPs: true
  ZK_CLUSTER: server.1=zookeeper1:2888:3888 server.2=zookeeper2:2888:3888 server.3=zookeeper3:2888:3888
--
environment:
  ZK_ID: 3
  quorumListenOnAllIPs: true
  ZK_CLUSTER: server.1=zookeeper1:2888:3888 server.2=zookeeper2:2888:3888 server.3=zookeeper3:2888:3888

Will try...
Steve

On 2/20/20, 2:56 AM, "Jörn Franke" <jornfra...@gmail.com> wrote:

Thanks. It is strange that JDK 11.0.6 has a backwards-incompatible change. However, it would be sad if we were stuck on JDK 11.0.5 indefinitely.

> Am 20.02.2020 um 10:53 schrieb Szalay-Bekő Máté <szalay.beko.m...@gmail.com>:
>
> Hi Guys,
>
> I think the 'reverse order startup failure' actually has the very same root cause as the 0.0.0.0 issue discussed in ZOOKEEPER-2164.
>
> Downgrading to 3.4 for now should solve these problems, I think.
>
> Still, I am a bit confused... I just want to understand if we really miss something in the ZooKeeper configuration model.
>
> Assuming that myid=1 (we are talking about the zoo.cfg on server 1), in the 'server.1=...' line you put an address which can be used by the other servers to talk back to server 1. This will be the 'advertised address' used by ZooKeeper in 3.5. Putting 0.0.0.0 here will not work with 3.5 (until we fix it with ZOOKEEPER-2164), as server 2 will not be able to use 0.0.0.0 to talk back to server 1. But if you put a valid address in the 'server.1=...' config line while having quorumListenOnAllIPs=true set, you should still be able to tell ZooKeeper to bind on 0.0.0.0, no matter what IP/hostname you put in the 'server.1=...' configs.
> (There is a similar config to set which IP the client port binds to as well, if you need to bind the client port to 0.0.0.0 too.)
> @Jorn:
>> This might be a long shot and I did not see exactly the same error, but with Corretto JDK 11.0.6 I also had an issue where ZK could not form a quorum. I downgraded to 11.0.5 and it did not have any issues. This was on ZK 3.5.5 with Kerberos authentication and authorization.
>
> In the recent JDK versions (8u242, or 11.0.6) there are some backward-incompatible Kerberos-related changes affecting basically the whole Hadoop stack, not only ZooKeeper. I think it is not recommended to use these JDK versions with Hadoop. I am not deeply involved in this (maybe there is some workaround already, but I am not aware of it).
>
> Kind regards,
> Mate
>
>> On Wed, Feb 19, 2020 at 11:49 PM Steve Jerman <st...@kloudspot.com> wrote:
>>
>> Ok,
>>
>> Just to confirm: rolling back to 3.4.14 fixes the issue. The quorum starts up, and restarting any of the instances works....
>>
>> Are there any issues with using the 3.5 client against a 3.4 server?
>>
>> Steve
>>
>> On 2/19/20, 9:02 AM, "Steve Jerman" <st...@kloudspot.com> wrote:
>>
>> OK, that explains it. I will see if 3.4.14 fixes the issue for the time being...
>>
>> Thanks
>> Steve
>>
>> On 2/19/20, 8:57 AM, "Jan Kosecki" <jan.koseck...@gmail.com> wrote:
>>
>> Hi Steve,
>>
>> It's possible that the quorum state depends on the order your nodes start in. In my Kubernetes environment I've had a similar issue, and I've noticed that starting the brokers one by one, following the order from the configuration file, allows all 3 to join the quorum, but a reverse order would keep the server started last outside of the quorum. I was also using 0.0.0.0 in the configuration and didn't try a full address due to the readiness check configuration.
>>
>> Unfortunately I didn't have time to debug it any further, so I've downgraded back to 3.4 for the time being.
>>
>> Hope you manage to find a solution,
>>
>> Best,
>> Jan
>>
>> On Wed, 19 Feb 2020, 15:47 Steve Jerman, <st...@kloudspot.com> wrote:
>>
>>> Hi,
>>>
>>> I've just been testing restarts ... I restarted one of the instances (id 1) ... and it doesn't join the quorum ... same error.
>>>
>>> Odd that the system started fine but can't handle a restart....
>>>
>>> Steve
>>>
>>>> On 2/19/20, 7:45 AM, "Steve Jerman" <st...@kloudspot.com> wrote:
>>>
>>> Thank you Mate,
>>>
>>> That fixed it. Unfortunately I can't easily avoid using 0.0.0.0.
>>>
>>> My configuration is built using Docker Swarm and it doesn't let you bind to a host name...
>>>
>>> Steve
>>>
>>> On 2/19/20, 5:27 AM, "Szalay-Bekő Máté" <szalay.beko.m...@gmail.com> wrote:
>>>
>>> Hi Steve!
>>>
>>> If you are using a ZooKeeper newer than 3.5.0, then this might be the issue we are just discussing / trying to fix in ZOOKEEPER-2164. Can you test your setup with a config where you don't use 0.0.0.0 in the server addresses?
>>>
>>> If you need to bind to 0.0.0.0 locally, then please set the 'quorumListenOnAllIPs' config property to true.
>>>
>>> like:
>>>
>>> # usually you don't really need this, unless you actually need to bind to multiple IPs
>>> quorumListenOnAllIPs=true
>>>
>>> # it is best if all the zoo.cfg files contain the same address settings, and we don't use 0.0.0.0 here
>>> server.1=zookeeper1:2888:3888
>>> server.2=zookeeper2:2888:3888
>>> server.3=zookeeper3:2888:3888
>>>
>>> Kind regards,
>>> Mate
>>>
>>> On Wed, Feb 19, 2020 at 6:08 AM Steve Jerman <st...@kloudspot.com> wrote:
>>>
>>>> Hello folks,
>>>>
>>>> I wonder if anyone can help me. I suspect it must be something simple, but I can't see it. Any suggestions about how to diagnose this would be gratefully received.
>>>>
>>>> I have a three-node ZK cluster; when it starts up, only two of the nodes form a quorum. If I restart the leader, the quorum reforms with the other two nodes…
>>>>
>>>> Thanks in advance for any help,
>>>> Steve
>>>>
>>>> This is the ‘stat’ output for the leader and followers…
>>>>
>>>> bash-5.0$ echo stat | nc zookeeper1 2181
>>>> This ZooKeeper instance is not currently serving requests
>>>>
>>>> bash-5.0$ echo stat | nc zookeeper2 2181
>>>> Zookeeper version: 3.5.7-f0fdd52973d373ffd9c86b81d99842dc2c7f660e, built on 02/10/2020 11:30 GMT
>>>> Clients:
>>>>  /10.0.1.152:44910[1](queued=0,recved=151,sent=151)
>>>>  /10.0.1.140:53138[1](queued=0,recved=187,sent=187)
>>>>  /10.0.1.143:57422[1](queued=0,recved=151,sent=151)
>>>>  /10.0.1.152:59242[0](queued=0,recved=1,sent=0)
>>>>  /10.0.1.143:40826[1](queued=0,recved=1139,sent=1139)
>>>>  /10.0.1.152:49188[1](queued=0,recved=200,sent=203)
>>>>  /10.0.1.152:59548[1](queued=0,recved=1157,sent=1159)
>>>>  /10.0.1.140:36624[1](queued=0,recved=151,sent=151)
>>>>
>>>> Latency min/avg/max: 0/0/5
>>>> Received: 3338
>>>> Sent: 3342
>>>> Connections: 8
>>>> Outstanding: 0
>>>> Zxid: 0xc000000f3
>>>> Mode: follower
>>>> Node count: 181
>>>>
>>>> bash-5.0$ echo stat | nc zookeeper3 2181
>>>> Zookeeper version: 3.5.7-f0fdd52973d373ffd9c86b81d99842dc2c7f660e, built on 02/10/2020 11:30 GMT
>>>> Clients:
>>>>  /10.0.1.152:49428[0](queued=0,recved=1,sent=0)
>>>>  /10.0.1.140:32912[1](queued=0,recved=1426,sent=1429)
>>>>
>>>> Latency min/avg/max: 0/0/4
>>>> Received: 1684
>>>> Sent: 1686
>>>> Connections: 2
>>>> Outstanding: 0
>>>> Zxid: 0xc000000f4
>>>> Mode: leader
>>>> Node count: 181
>>>> Proposal sizes last/min/max: 32/32/406
>>>> bash-5.0$
>>>>
>>>> The trace for the failing node is:
>>>>
>>>> server.1=0.0.0.0:2888:3888
>>>> server.2=zookeeper2:2888:3888
>>>> server.3=zookeeper3:2888:3888
>>>> ZooKeeper JMX enabled by default
>>>> Using config: /opt/zookeeper/bin/../conf/zoo.cfg
>>>> 2020-02-19 04:23:27,759 [myid:] - INFO [main:QuorumPeerConfig@135] - Reading configuration from: /opt/zookeeper/bin/../conf/zoo.cfg
>>>> 2020-02-19 04:23:27,764 [myid:] - INFO [main:QuorumPeerConfig@387] - clientPortAddress is 0.0.0.0:2181
>>>> 2020-02-19 04:23:27,764 [myid:] - INFO [main:QuorumPeerConfig@391] - secureClientPort is not set
>>>> 2020-02-19 04:23:27,771 [myid:1] - INFO [main:DatadirCleanupManager@78] - autopurge.snapRetainCount set to 3
>>>> 2020-02-19 04:23:27,772 [myid:1] - INFO [main:DatadirCleanupManager@79] - autopurge.purgeInterval set to 24
>>>> 2020-02-19 04:23:27,772 [myid:1] - INFO [PurgeTask:DatadirCleanupManager$PurgeTask@138] - Purge task started.
>>>> 2020-02-19 04:23:27,773 [myid:1] - INFO [main:ManagedUtil@46] - Log4j found with jmx enabled.
>>>> 2020-02-19 04:23:27,774 [myid:1] - INFO [PurgeTask:FileTxnSnapLog@115] - zookeeper.snapshot.trust.empty : false
>>>> 2020-02-19 04:23:27,780 [myid:1] - INFO [PurgeTask:DatadirCleanupManager$PurgeTask@144] - Purge task completed.
>>>> 2020-02-19 04:23:27,781 [myid:1] - INFO [main:QuorumPeerMain@141] - Starting quorum peer
>>>> 2020-02-19 04:23:27,786 [myid:1] - INFO [main:ServerCnxnFactory@135] - Using org.apache.zookeeper.server.NIOServerCnxnFactory as server connection factory
>>>> 2020-02-19 04:23:27,788 [myid:1] - INFO [main:NIOServerCnxnFactory@673] - Configuring NIO connection handler with 10s sessionless connection timeout, 2 selector thread(s), 32 worker threads, and 64 kB direct buffers.
>>>> 2020-02-19 04:23:27,791 [myid:1] - INFO [main:NIOServerCnxnFactory@686] - binding to port 0.0.0.0/0.0.0.0:2181
>>>> 2020-02-19 04:23:27,809 [myid:1] - INFO [main:Log@169] - Logging initialized @249ms to org.eclipse.jetty.util.log.Slf4jLog
>>>> 2020-02-19 04:23:27,913 [myid:1] - WARN [main:ContextHandler@1520] - o.e.j.s.ServletContextHandler@5abca1e0{/,null,UNAVAILABLE} contextPath ends with /*
>>>> 2020-02-19 04:23:27,913 [myid:1] - WARN [main:ContextHandler@1531] - Empty contextPath
>>>> 2020-02-19 04:23:27,922 [myid:1] - INFO [main:X509Util@79] - Setting -Djdk.tls.rejectClientInitiatedRenegotiation=true to disable client-initiated TLS renegotiation
>>>> 2020-02-19 04:23:27,923 [myid:1] - INFO [main:FileTxnSnapLog@115] - zookeeper.snapshot.trust.empty : false
>>>> 2020-02-19 04:23:27,923 [myid:1] - INFO [main:QuorumPeer@1470] - Local sessions disabled
>>>> 2020-02-19 04:23:27,923 [myid:1] - INFO [main:QuorumPeer@1481] - Local session upgrading disabled
>>>> 2020-02-19 04:23:27,923 [myid:1] - INFO [main:QuorumPeer@1448] - tickTime set to 2000
>>>> 2020-02-19 04:23:27,923 [myid:1] - INFO [main:QuorumPeer@1492] - minSessionTimeout set to 4000
>>>> 2020-02-19 04:23:27,924 [myid:1] - INFO [main:QuorumPeer@1503] - maxSessionTimeout set to 40000
>>>> 2020-02-19 04:23:27,924 [myid:1] - INFO [main:QuorumPeer@1518] - initLimit set to 30
>>>> 2020-02-19 04:23:27,932 [myid:1] - INFO [main:ZKDatabase@117] - zookeeper.snapshotSizeFactor = 0.33
>>>> 2020-02-19 04:23:27,933 [myid:1] - INFO [main:QuorumPeer@1763] - Using insecure (non-TLS) quorum communication
>>>> 2020-02-19 04:23:27,933 [myid:1] - INFO [main:QuorumPeer@1769] - Port unification disabled
>>>> 2020-02-19 04:23:27,933 [myid:1] - INFO [main:QuorumPeer@2136] - QuorumPeer communication is not secured! (SASL auth disabled)
>>>> 2020-02-19 04:23:27,933 [myid:1] - INFO [main:QuorumPeer@2165] - quorum.cnxn.threads.size set to 20
>>>> 2020-02-19 04:23:27,934 [myid:1] - INFO [main:FileSnap@83] - Reading snapshot /opt/zookeeper/data/version-2/snapshot.90000043e
>>>> 2020-02-19 04:23:27,963 [myid:1] - INFO [main:Server@359] - jetty-9.4.24.v20191120; built: 2019-11-20T21:37:49.771Z; git: 363d5f2df3a8a28de40604320230664b9c793c16; jvm 1.8.0_111-internal-alpine-r0-b14
>>>> 2020-02-19 04:23:27,989 [myid:1] - INFO [main:DefaultSessionIdManager@333] - DefaultSessionIdManager workerName=node0
>>>> 2020-02-19 04:23:27,989 [myid:1] - INFO [main:DefaultSessionIdManager@338] - No SessionScavenger set, using defaults
>>>> 2020-02-19 04:23:27,990 [myid:1] - INFO [main:HouseKeeper@140] - node0 Scavenging every 660000ms
>>>> 2020-02-19 04:23:27,997 [myid:1] - INFO [main:ContextHandler@825] - Started o.e.j.s.ServletContextHandler@5abca1e0{/,null,AVAILABLE}
>>>> 2020-02-19 04:23:28,004 [myid:1] - INFO [main:AbstractConnector@330] - Started ServerConnector@2b98378d{HTTP/1.1,[http/1.1]}{0.0.0.0:8080}
>>>> 2020-02-19 04:23:28,004 [myid:1] - INFO [main:Server@399] - Started @444ms
>>>> 2020-02-19 04:23:28,004 [myid:1] - INFO [main:JettyAdminServer@112] - Started AdminServer on address 0.0.0.0, port 8080 and command URL /commands
>>>> 2020-02-19 04:23:28,007 [myid:1] - INFO [main:QuorumCnxManager$Listener@867] - Election port bind maximum retries is 1000
>>>> 2020-02-19 04:23:28,007 [myid:1] - INFO [QuorumPeerListener:QuorumCnxManager$Listener@917] - My election bind port: /0.0.0.0:3888
>>>> 2020-02-19 04:23:28,014 [myid:1] - INFO [QuorumPeer[myid=1](plain=0.0.0.0:2181)(secure=disabled):QuorumPeer@1175] - LOOKING
>>>> 2020-02-19 04:23:28,015 [myid:1] - INFO [QuorumPeer[myid=1](plain=0.0.0.0:2181)(secure=disabled):FastLeaderElection@885] - New election. My id = 1, proposed zxid=0xa000000fb
>>>> 2020-02-19 04:23:28,018 [myid:1] - INFO [WorkerSender[myid=1]:QuorumCnxManager@438] - Have smaller server identifier, so dropping the connection: (2, 1)
>>>> 2020-02-19 04:23:28,019 [myid:1] - INFO [WorkerSender[myid=1]:QuorumCnxManager@438] - Have smaller server identifier, so dropping the connection: (3, 1)
>>>> 2020-02-19 04:23:28,019 [myid:1] - INFO [WorkerReceiver[myid=1]:FastLeaderElection@679] - Notification: 2 (message format version), 1 (n.leader), 0xa000000fb (n.zxid), 0x1 (n.round), LOOKING (n.state), 1 (n.sid), 0xa (n.peerEPoch), LOOKING (my state)0 (n.config version)
>>>> 2020-02-19 04:23:28,221 [myid:1] - INFO [QuorumPeer[myid=1](plain=0.0.0.0:2181)(secure=disabled):QuorumCnxManager@438] - Have smaller server identifier, so dropping the connection: (2, 1)
>>>> 2020-02-19 04:23:28,222 [myid:1] - INFO [QuorumPeer[myid=1](plain=0.0.0.0:2181)(secure=disabled):QuorumCnxManager@438] - Have smaller server identifier, so dropping the connection: (3, 1)
>>>> 2020-02-19 04:23:28,222 [myid:1] - INFO [QuorumPeer[myid=1](plain=0.0.0.0:2181)(secure=disabled):FastLeaderElection@919] - Notification time out: 400
>>>> 2020-02-19 04:23:28,623 [myid:1] - INFO [QuorumPeer[myid=1](plain=0.0.0.0:2181)(secure=disabled):QuorumCnxManager@438] - Have smaller server identifier, so dropping the connection: (2, 1)
>>>> 2020-02-19 04:23:28,624 [myid:1] - INFO [QuorumPeer[myid=1](plain=0.0.0.0:2181)(secure=disabled):QuorumCnxManager@438] - Have smaller server identifier, so dropping the connection: (3, 1)
>>>> 2020-02-19 04:23:28,624 [myid:1] - INFO [QuorumPeer[myid=1](plain=0.0.0.0:2181)(secure=disabled):FastLeaderElection@919] - Notification time out: 800
>>>> …
>>>>
>>>> And for the leader:
>>>> ….
>>>> 2020-02-19 05:02:10,341 [myid:3] - INFO [WorkerReceiver[myid=3]:FastLeaderElection@679] - Notification: 2 (message format version), 3 (n.leader), 0xb0000018c (n.zxid), 0x1 (n.round), LOOKING (n.state), 3 (n.sid), 0xb (n.peerEPoch), LEADING (my state)0 (n.config version)
>>>> 2020-02-19 05:02:10,341 [myid:3] - INFO [WorkerReceiver[myid=3]:FastLeaderElection@679] - Notification: 2 (message format version), 3 (n.leader), 0xb0000018c (n.zxid), 0x1 (n.round), LEADING (n.state), 3 (n.sid), 0xc (n.peerEPoch), LEADING (my state)0 (n.config version)
>>>> 2020-02-19 05:02:33,640 [myid:3] - WARN [NIOWorkerThread-4:NIOServerCnxn@366] - Unable to read additional data from client sessionid 0x30002ba40710018, likely client has closed socket
>>>> 2020-02-19 05:02:39,047 [myid:3] - INFO [SessionTracker:ZooKeeperServer@398] - Expiring session 0x20002ba2fd6001a, timeout of 40000ms exceeded
>>>> 2020-02-19 05:02:39,048 [myid:3] - INFO [SessionTracker:QuorumZooKeeperServer@157] - Submitting global closeSession request for session 0x20002ba2fd6001a
>>>> 2020-02-19 05:03:10,340 [myid:3] - INFO [/0.0.0.0:3888:QuorumCnxManager$Listener@924] - Received connection request 10.0.1.152:52492
>>>> 2020-02-19 05:03:10,340 [myid:3] - WARN [SendWorker:1:QuorumCnxManager$SendWorker@1143] - Interrupted while waiting for message on queue
>>>> java.lang.InterruptedException
>>>>         at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2014)
>>>>         at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2088)
>>>>         at java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:418)
>>>>         at org.apache.zookeeper.server.quorum.QuorumCnxManager.pollSendQueue(QuorumCnxManager.java:1294)
>>>>         at org.apache.zookeeper.server.quorum.QuorumCnxManager.access$700(QuorumCnxManager.java:82)
>>>>         at org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:1131)
>>>> 2020-02-19 05:03:10,340 [myid:3] - WARN [SendWorker:1:QuorumCnxManager$SendWorker@1153] - Send worker leaving thread id 1 my id = 3
>>>> 2020-02-19 05:03:10,340 [myid:3] - WARN [RecvWorker:1:QuorumCnxManager$RecvWorker@1227] - Connection broken for id 1, my id = 3, error =
>>>> java.net.SocketException: Socket closed
>>>>         at java.net.SocketInputStream.socketRead0(Native Method)
>>>>         at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
>>>>         at java.net.SocketInputStream.read(SocketInputStream.java:170)
>>>>         at java.net.SocketInputStream.read(SocketInputStream.java:141)
>>>>         at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
>>>>         at java.io.BufferedInputStream.read(BufferedInputStream.java:265)
>>>>         at java.io.DataInputStream.readInt(DataInputStream.java:387)
>>>>         at org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:1212)
>>>> 2020-02-19 05:03:10,341 [myid:3] - WARN [RecvWorker:1:QuorumCnxManager$RecvWorker@1230] - Interrupting SendWorker
>>>> 2020-02-19 05:03:10,340 [myid:3] - WARN [RecvWorker:3:QuorumCnxManager$RecvWorker@1227] - Connection broken for id 3, my id = 3, error =
>>>> java.io.EOFException
>>>>         at java.io.DataInputStream.readInt(DataInputStream.java:392)
>>>>         at org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:1212)
>>>> 2020-02-19 05:03:10,341 [myid:3] - WARN [RecvWorker:3:QuorumCnxManager$RecvWorker@1230] - Interrupting SendWorker
>>>> 2020-02-19 05:03:10,341 [myid:3] - INFO [/0.0.0.0:3888:QuorumCnxManager$Listener@924] - Received connection request 10.0.1.142:46326
>>>> 2020-02-19 05:03:10,343 [myid:3] - WARN [SendWorker:3:QuorumCnxManager$SendWorker@1143] - Interrupted while waiting for message on queue
>>>> java.lang.InterruptedException
>>>>         at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2014)
>>>>         at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2088)
>>>>         at java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:418)
>>>>         at org.apache.zookeeper.server.quorum.QuorumCnxManager.pollSendQueue(QuorumCnxManager.java:1294)
>>>>         at org.apache.zookeeper.server.quorum.QuorumCnxManager.access$700(QuorumCnxManager.java:82)
>>>>         at org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:1131)
>>>> 2020-02-19 05:03:10,344 [myid:3] - WARN [SendWorker:3:QuorumCnxManager$SendWorker@1153] - Send worker leaving thread id 3 my id = 3
>>>> 2020-02-19 05:03:10,344 [myid:3] - INFO [WorkerReceiver[myid=3]:FastLeaderElection@679] - Notification: 2 (message format version), 3 (n.leader), 0xb0000018c (n.zxid), 0x1 (n.round), LOOKING (n.state), 3 (n.sid), 0xb (n.peerEPoch), LEADING (my state)0 (n.config version)
>>>> 2020-02-19 05:03:10,345 [myid:3] - INFO [WorkerReceiver[myid=3]:FastLeaderElection@679] - Notification: 2 (message format version), 3 (n.leader), 0xb0000018c (n.zxid), 0x1 (n.round), LEADING (n.state), 3 (n.sid), 0xc (n.peerEPoch), LEADING (my state)0 (n.config version)
>>>> 2020-02-19 05:03:11,048 [myid:3] - INFO [SessionTracker:ZooKeeperServer@398] - Expiring session 0x30002ba40710018, timeout of 40000ms exceeded
>>>> 2020-02-19 05:03:11,048 [myid:3] - INFO [SessionTracker:QuorumZooKeeperServer@157] - Submitting global closeSession request for session 0x30002ba40710018
>>>>
>>>> All of the instances have a similar zoo.cfg:
>>>>
>>>> bash-4.3# cat conf/zoo.cfg
>>>> # The number of milliseconds of each tick
>>>> tickTime=2000
>>>> # The number of ticks that the initial
>>>> # synchronization phase can take
>>>> initLimit=30
>>>> # The number of ticks that can pass between
>>>> # sending a request and getting an acknowledgement
>>>> syncLimit=5
>>>> # Purge every 24 hours
>>>> autopurge.purgeInterval=24
>>>> # the directory where the snapshot is stored.
>>>> # do not use /tmp for storage, /tmp here is just
>>>> # example sakes.
>>>> dataDir=/opt/zookeeper/data
>>>> # the port at which the clients will connect
>>>> clientPort=2181
>>>> # Append other config...
>>>> electionPortBindRetry=1000
>>>> 4lw.commands.whitelist=stat, ruok, conf, mntr
>>>>
>>>> server.1=zookeeper1:2888:3888
>>>> server.2=zookeeper2:2888:3888
>>>> server.3=0.0.0.0:2888:3888
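Putting the thread's conclusion together: until ZOOKEEPER-2164 is fixed, the server.N line for the local node in 3.5.x must carry a real advertised address that the other servers can dial back, and binding to all interfaces is requested separately. A sketch of the corrected tail of the zoo.cfg above, identical on all three nodes as Mate recommends (the rest of the file unchanged):

# bind the quorum/election ports on all local IPs instead of the advertised address
quorumListenOnAllIPs=true

# identical on every node; no 0.0.0.0 anywhere
server.1=zookeeper1:2888:3888
server.2=zookeeper2:2888:3888
server.3=zookeeper3:2888:3888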
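A quick way to confirm the quorum actually formed after the change, using only the four-letter-word commands already whitelisted in the zoo.cfg above (hostnames assumed to match the compose service names):

for h in zookeeper1 zookeeper2 zookeeper3; do
  echo "== $h =="
  echo ruok | nc "$h" 2181; echo        # a healthy server answers "imok"
  echo stat | nc "$h" 2181 | grep Mode  # expect one "Mode: leader" and two "Mode: follower"
done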