rld...@cornell.edu Hi, I'm trying to get a 3 node Zookeeper cluster up and running. I am doing an initial install with Ansible from https://galaxy.ansible.com/lean_delivery/zookeeper which installs version 3.6.0. This gets launched with systemd which is just calling bin/zkServer.sh start-foreground.
The Ansible role sets the myid's and has a zoo.cfg.dynamic file that is called in zoo.cfg. In my efforts to get this to work I've removed the reference to the dynamic file. My zoo.cfg looks like this now. tickTime=2000 dataDir=/var/lib/zookeeper dataLogDir=/var/log/zookeeper initLimit=30 syncLimit=15 clientPort=2181 server.1=0.0.0.0:2888:3888 server.2=< fqdn of server 2 >:2888:3888 server.3=< fqdn of server 3 >:2888:3888 4lw.commands.whitelist=* For each server 0.0.0.0 is swapped out in the appropriate place and the myid's are correct. I upped the initLimit and syncLimit at some point to see if that would make a difference. I've made sure to start the Zookeeper services on each server in the correct order. But with all that when I try bin/zkServer.sh status I always get the following error /bin/java ZooKeeper JMX enabled by default Using config: /opt/zookeeper-3.6.0/bin/../conf/zoo.cfg Client port found: 2181. Client address: localhost. Error contacting service. It is probably not running. I think Zookeeper is supposed to be listening on 2888 but am not seeing that when I try netstat -tulpn | grep LISTEN tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN 1260/sshd tcp 0 0 127.0.0.1:25 0.0.0.0:* LISTEN 1222/master tcp 0 0 0.0.0.0:111 0.0.0.0:* LISTEN 1/systemd tcp6 0 0 :::22 :::* LISTEN 1260/sshd tcp6 0 0 :::8983 :::* LISTEN 1129/java tcp6 0 0 ::1:25 :::* LISTEN 1222/master tcp6 0 0 :::443 :::* LISTEN 1028/httpd tcp6 0 0 :::40093 :::* LISTEN 3354/java tcp6 0 0 :::2181 :::* LISTEN 3354/java tcp6 0 0 127.0.0.1:7983 :::* LISTEN 1129/java tcp6 0 0 :::111 :::* LISTEN 1/systemd tcp6 0 0 :::3888 :::* LISTEN 3354/java tcp6 0 0 :::8080 :::* LISTEN 3354/java tcp6 0 0 :::80 :::* LISTEN 1028/httpd I'm seeing this in my logs. 2020-03-23 14:38:38,738 [myid:1] - INFO [QuorumPeer[myid=1](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled):ZooKeeperServer@1470] - The max bytes for all large requests are set to 104857600 2020-03-23 14:38:38,738 [myid:1] - INFO [QuorumPeer[myid=1](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled):ZooKeeperServer@1484] - The large request threshold is set to -1 2020-03-23 14:38:38,738 [myid:1] - INFO [QuorumPeer[myid=1](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled):ZooKeeperServer@329] - Created server with tickTime 2000 minSessionTimeout 4000 maxSessionTimeout 40000 clientPortListenBacklog -1 datadir /var/log/zookeeper/version-2 snapdir /var/lib/zookeeper/version-2 2020-03-23 14:38:38,738 [myid:1] - INFO [QuorumPeer[myid=1](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled):Follower@75] - FOLLOWING - LEADER ELECTION TOOK - 59223 MS 2020-03-23 14:38:38,738 [myid:1] - INFO [QuorumPeer[myid=1](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled):QuorumPeer@863] - Peer state changed: following - discovery 2020-03-23 14:38:39,740 [myid:1] - WARN [QuorumPeer[myid=1](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled):Follower@129] - Exception when following the leader java.lang.IllegalArgumentException at java.util.concurrent.ThreadPoolExecutor.<init>(ThreadPoolExecutor.java:1314) at java.util.concurrent.ThreadPoolExecutor.<init>(ThreadPoolExecutor.java:1202) at java.util.concurrent.Executors.newFixedThreadPool(Executors.java:89) at org.apache.zookeeper.server.quorum.Learner.connectToLeader(Learner.java:275) at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:87) at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:1455) 2020-03-23 14:38:39,740 [myid:1] - INFO [QuorumPeer[myid=1](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled):Follower@292] - shutdown Follower 2020-03-23 14:38:39,740 [myid:1] - INFO [QuorumPeer[myid=1](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled):QuorumPeer@863] - Peer state changed: looking 2020-03-23 14:38:39,740 [myid:1] - WARN [QuorumPeer[myid=1](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled):QuorumPeer@1501] - PeerState set to LOOKING 2020-03-23 14:38:39,741 [myid:1] - INFO [QuorumPeer[myid=1](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled):QuorumPeer@1371] - LOOKING 2020-03-23 14:38:39,741 [myid:1] - INFO [QuorumPeer[myid=1](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled):FastLeaderElection@931] - New election. My id = 1, proposed zxid=0x0 2020-03-23 14:38:39,741 [myid:1] - INFO [WorkerSender[myid=1]:QuorumCnxManager@462] - Have smaller server identifier, so dropping the connection: (2, 1) 2020-03-23 14:38:39,742 [myid:1] - INFO [WorkerReceiver[myid=1]:FastLeaderElection$Messenger$WorkerReceiver@376] - Notification: my state:LOOKING; n.sid:1, n.state:LOOKING, n.leader:1, n.round:0x27, n.peerEpoch:0x0, n.zxid:0x0, message format version:0x2, n.config version:0x0 2020-03-23 15:08:48,614 [myid:2] - INFO [WorkerReceiver[myid=2]:FastLeaderElection$Messenger$WorkerReceiver@376] - Notification: my state:LOOKING; n.sid:3, n.state:LEADING, n.leader:3, n.round:0x44, n.peerEpoch:0x0, n.zxid:0x0, message format version:0x2, n.config version:0x0 2020-03-23 15:08:49,406 [myid:2] - INFO [ListenerHandler-/0.0.0.0:3888:QuorumCnxManager$Listener$ListenerHandler@1052] - Received connection request /10.92.108.38:51428 2020-03-23 15:08:49,407 [myid:2] - WARN [RecvWorker:1:QuorumCnxManager$RecvWorker@1374] - Connection broken for id 1, my id = 2 java.net.SocketException: Socket closed at java.net.SocketInputStream.socketRead0(Native Method) at java.net.SocketInputStream.socketRead(SocketInputStream.java:116) at java.net.SocketInputStream.read(SocketInputStream.java:171) at java.net.SocketInputStream.read(SocketInputStream.java:141) at java.io.BufferedInputStream.fill(BufferedInputStream.java:246) at java.io.BufferedInputStream.read(BufferedInputStream.java:265) at java.io.DataInputStream.readInt(DataInputStream.java:387) at org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:1362) 2020-03-23 15:08:49,407 [myid:2] - WARN [RecvWorker:1:QuorumCnxManager$RecvWorker@1380] - Interrupting SendWorker 2020-03-23 15:08:49,407 [myid:2] - WARN [SendWorker:1:QuorumCnxManager$SendWorker@1262] - Interrupted while waiting for message on queue java.lang.InterruptedException at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2014) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2088) at org.apache.zookeeper.util.CircularBlockingQueue.poll(CircularBlockingQueue.java:105) at org.apache.zookeeper.server.quorum.QuorumCnxManager.pollSendQueue(QuorumCnxManager.java:1425) at org.apache.zookeeper.server.quorum.QuorumCnxManager.access$900(QuorumCnxManager.java:97) at org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:1251) 2020-03-23 15:08:49,407 [myid:2] - WARN [SendWorker:1:QuorumCnxManager$SendWorker@1274] - Send worker leaving thread id 1 my id = 2 2020-03-23 15:08:49,407 [myid:2] - WARN [RecvWorker:2:QuorumCnxManager$RecvWorker@1374] - Connection broken for id 2, my id = 2 java.io.EOFException at java.io.DataInputStream.readInt(DataInputStream.java:392) at org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:1362) 2020-03-23 15:08:49,407 [myid:2] - WARN [RecvWorker:2:QuorumCnxManager$RecvWorker@1380] - Interrupting SendWorker 2020-03-23 15:08:49,410 [myid:2] - INFO [ListenerHandler-/0.0.0.0:3888:QuorumCnxManager$Listener$ListenerHandler@1052] - Received connection request /10.92.108.99:41828 2020-03-23 15:08:49,420 [myid:2] - WARN [SendWorker:2:QuorumCnxManager$SendWorker@1262] - Interrupted while waiting for message on queue java.lang.InterruptedException at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2014) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2088) at org.apache.zookeeper.util.CircularBlockingQueue.poll(CircularBlockingQueue.java:105) at org.apache.zookeeper.server.quorum.QuorumCnxManager.pollSendQueue(QuorumCnxManager.java:1425) at org.apache.zookeeper.server.quorum.QuorumCnxManager.access$900(QuorumCnxManager.java:97) at org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:1251) 2020-03-23 15:08:49,420 [myid:2] - WARN [SendWorker:2:QuorumCnxManager$SendWorker@1274] - Send worker leaving thread id 2 my id = 2 2020-03-23 15:08:49,421 [myid:2] - INFO [WorkerReceiver[myid=2]:FastLeaderElection$Messenger$WorkerReceiver@376] - Notification: my state:LOOKING; n.sid:2, n.state:LOOKING, n.leader:2, n.round:0x45, n.peerEpoch:0x0, n.zxid:0x0, message format version:0x2, n.config version:0x0 2020-03-23 15:08:51,021 [myid:2] - INFO [QuorumPeer[myid=2](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled):FastLeaderElection@966] - Notification time out: 3200 2020-03-23 15:08:51,021 [myid:2] - INFO [WorkerReceiver[myid=2]:FastLeaderElection$Messenger$WorkerReceiver@376] - Notification: my state:LOOKING; n.sid:2, n.state:LOOKING, n.leader:2, n.round:0x45, n.peerEpoch:0x0, n.zxid:0x0, message format version:0x2, n.config version:0x0 2020-03-23 15:08:51,021 [myid:2] - INFO [WorkerReceiver[myid=2]:FastLeaderElection$Messenger$WorkerReceiver@376] - Notification: my state:LOOKING; n.sid:2, n.state:LOOKING, n.leader:2, n.round:0x45, n.peerEpoch:0x0, n.zxid:0x0, message format version:0x2, n.config version:0x0 2020-03-23 15:08:51,022 [myid:2] - INFO [WorkerReceiver[myid=2]:FastLeaderElection$Messenger$WorkerReceiver@376] - Notification: my state:LOOKING; n.sid:3, n.state:LEADING, n.leader:3, n.round:0x44, n.peerEpoch:0x0, n.zxid:0x0, message format version:0x2, n.config version:0x0 2020-03-23 15:08:37,187 [myid:3] - INFO [WorkerReceiver[myid=3]:FastLeaderElection$Messenger$WorkerReceiver@376] - Notification: my state:LEADING; n.sid:1, n.state:LOOKING, n.leader:1, n.round:0x44, n.peerEpoch:0x0, n.zxid:0x0, message format version:0x2, n.config version:0x0 2020-03-23 15:08:45,183 [myid:3] - WARN [QuorumPeer[myid=3](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled):QuorumPeer@1471] - Unexpected exception java.lang.InterruptedException: Timeout while waiting for epoch from quorum at org.apache.zookeeper.server.quorum.Leader.getEpochToPropose(Leader.java:1425) at org.apache.zookeeper.server.quorum.Leader.lead(Leader.java:599) at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:1468) 2020-03-23 15:08:45,183 [myid:3] - INFO [QuorumPeer[myid=3](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled):Leader@792] - Shutting down 2020-03-23 15:08:45,183 [myid:3] - INFO [QuorumPeer[myid=3](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled):Leader@798] - Shutdown called. For the reason Forcing shutdown 2020-03-23 15:08:45,183 [myid:3] - INFO [QuorumPeer[myid=3](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled):QuorumPeer@863] - Peer state changed: looking 2020-03-23 15:08:45,183 [myid:3] - WARN [QuorumPeer[myid=3](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled):QuorumPeer@1501] - PeerState set to LOOKING 2020-03-23 15:08:45,183 [myid:3] - INFO [QuorumPeer[myid=3](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled):QuorumPeer@1371] - LOOKING 2020-03-23 15:08:45,183 [myid:3] - INFO [QuorumPeer[myid=3](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled):FastLeaderElection@931] - New election. My id = 3, proposed zxid=0x0 2020-03-23 15:08:45,184 [myid:3] - INFO [WorkerReceiver[myid=3]:FastLeaderElection$Messenger$WorkerReceiver@376] - Notification: my state:LOOKING; n.sid:3, n.state:LOOKING, n.leader:3, n.round:0x44, n.peerEpoch:0x0, n.zxid:0x0, message format version:0x2, n.config version:0x0 2020-03-23 15:08:45,187 [myid:3] - INFO [WorkerReceiver[myid=3]:FastLeaderElection$Messenger$WorkerReceiver@376] - Notification: my state:LOOKING; n.sid:2, n.state:LOOKING, n.leader:3, n.round:0x44, n.peerEpoch:0x0, n.zxid:0x0, message format version:0x2, n.config version:0x0 2020-03-23 15:08:45,188 [myid:3] - INFO [WorkerReceiver[myid=3]:FastLeaderElection$Messenger$WorkerReceiver@376] - Notification: my state:LOOKING; n.sid:1, n.state:LOOKING, n.leader:3, n.round:0x44, n.peerEpoch:0x0, n.zxid:0x0, message format version:0x2, n.config version:0x0 2020-03-23 15:08:45,189 [myid:3] - WARN [LearnerCnxAcceptorHandler-/0.0.0.0:2888:Leader$LearnerCnxAcceptor$LearnerCnxAcceptorHandler@523] - Exception while shutting down acceptor. java.net.SocketException: Socket closed at java.net.PlainSocketImpl.socketAccept(Native Method) at java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:409) at java.net.ServerSocket.implAccept(ServerSocket.java:560) at java.net.ServerSocket.accept(ServerSocket.java:528) at org.apache.zookeeper.server.quorum.Leader$LearnerCnxAcceptor$LearnerCnxAcceptorHandler.acceptConnections(Leader.java:510) at org.apache.zookeeper.server.quorum.Leader$LearnerCnxAcceptor$LearnerCnxAcceptorHandler.run(Leader.java:493) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) 2020-03-23 15:08:45,389 [myid:3] - INFO [QuorumPeer[myid=3](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled):QuorumPeer@857] - Peer state changed: leading 2020-03-23 15:08:45,389 [myid:3] - INFO [QuorumPeer[myid=3](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled):QuorumPeer@1465] - LEADING Don't know what to do here. Any help would be greatly appreciated. -- Sent from: http://zookeeper-user.578899.n2.nabble.com/