Hi, Some more information that we just got from our installation.
- To check the EOF theory, we cleared all the version-2 data directories and restarted all vehicles - 2 of the 8 vehicles were unavailable at that time - The remaining 6 vehicles did not succeed in finishing leader election phase ! A log extract can be found at the end of this e-mail. - We then removed the 2 vehicles from the configuration just for testing purposes - Now leader election succeed without any problems ! Any insights are very helpful ! Filip Logging during failing leader election phase : 2015-06-25 16:29:30,965 [myid:8] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1007] - Closed socket connection for client /127.0.0.1:65133 (no session established for client) 2015-06-25 16:29:31,171 [myid:8] - WARN [WorkerSender[myid=8]:QuorumCnxManager@382] - Cannot open channel to 5 at election address /172.17.35.15:3888 java.net.SocketTimeoutException: connect timed out at java.net.DualStackPlainSocketImpl.waitForConnect(Native Method) at java.net.DualStackPlainSocketImpl.socketConnect(DualStackPlainSocketImpl.java:85) at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:345) at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206) at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188) at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:172) at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392) at java.net.Socket.connect(Socket.java:589) at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:368) at org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:341) at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:449) at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.run(FastLeaderElection.java:430) at java.lang.Thread.run(Thread.java:745) 2015-06-25 16:29:31,189 [myid:8] - DEBUG [WorkerSender[myid=8]:QuorumCnxManager@364] - Opening channel to server 6 2015-06-25 16:29:31,464 [myid:8] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /0:0:0:0:0:0:0:1:65145 2015-06-25 16:29:31,466 [myid:8] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@362] - Exception causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not running 2015-06-25 16:29:31,467 [myid:8] - DEBUG [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@366] - IOException stack trace java.io.IOException: ZooKeeperServer not running at org.apache.zookeeper.server.NIOServerCnxn.readLength(NIOServerCnxn.java:931) at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:237) at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208) at java.lang.Thread.run(Thread.java:745) 2015-06-25 16:29:31,467 [myid:8] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1007] - Closed socket connection for client /0:0:0:0:0:0:0:1:65145 (no session established for client) 2015-06-25 16:29:32,370 [myid:8] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /127.0.0.1:65163 2015-06-25 16:29:32,372 [myid:8] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@362] - Exception causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not running 2015-06-25 16:29:32,374 [myid:8] - DEBUG [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@366] - IOException stack trace java.io.IOException: ZooKeeperServer not running at org.apache.zookeeper.server.NIOServerCnxn.readLength(NIOServerCnxn.java:931) at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:237) at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208) at java.lang.Thread.run(Thread.java:745) 2015-06-25 16:29:32,374 [myid:8] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1007] - Closed socket connection for client /127.0.0.1:65163 (no session established for client) 2015-06-25 16:29:32,585 [myid:8] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /0:0:0:0:0:0:0:1:65167 2015-06-25 16:29:32,587 [myid:8] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@362] - Exception causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not running 2015-06-25 16:29:32,588 [myid:8] - DEBUG [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@366] - IOException stack trace java.io.IOException: ZooKeeperServer not running at org.apache.zookeeper.server.NIOServerCnxn.readLength(NIOServerCnxn.java:931) at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:237) at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208) at java.lang.Thread.run(Thread.java:745) 2015-06-25 16:29:32,589 [myid:8] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1007] - Closed socket connection for client /0:0:0:0:0:0:0:1:65167 (no session established for client) 2015-06-25 16:29:32,800 [myid:8] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /127.0.0.1:65173 2015-06-25 16:29:32,805 [myid:8] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@362] - Exception causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not running 2015-06-25 16:29:32,809 [myid:8] - DEBUG [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@366] - IOException stack trace java.io.IOException: ZooKeeperServer not running at org.apache.zookeeper.server.NIOServerCnxn.readLength(NIOServerCnxn.java:931) at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:237) at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208) at java.lang.Thread.run(Thread.java:745) 2015-06-25 16:29:32,810 [myid:8] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1007] - Closed socket connection for client /127.0.0.1:65173 (no session established for client) 2015-06-25 16:29:33,543 [myid:8] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /0:0:0:0:0:0:0:1:65190 2015-06-25 16:29:33,544 [myid:8] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@362] - Exception causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not running 2015-06-25 16:29:33,546 [myid:8] - DEBUG [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@366] - IOException stack trace java.io.IOException: ZooKeeperServer not running at org.apache.zookeeper.server.NIOServerCnxn.readLength(NIOServerCnxn.java:931) at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:237) at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208) at java.lang.Thread.run(Thread.java:745) 2015-06-25 16:29:33,546 [myid:8] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1007] - Closed socket connection for client /0:0:0:0:0:0:0:1:65190 (no session established for client) 2015-06-25 16:29:33,983 [myid:8] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /127.0.0.1:65200 2015-06-25 16:29:33,985 [myid:8] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@362] - Exception causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not running 2015-06-25 16:29:33,986 [myid:8] - DEBUG [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@366] - IOException stack trace java.io.IOException: ZooKeeperServer not running at org.apache.zookeeper.server.NIOServerCnxn.readLength(NIOServerCnxn.java:931) at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:237) at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208) at java.lang.Thread.run(Thread.java:745) 2015-06-25 16:29:33,986 [myid:8] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1007] - Closed socket connection for client /127.0.0.1:65200 (no session established for client) 2015-06-25 16:29:34,527 [myid:8] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /0:0:0:0:0:0:0:1:65212 2015-06-25 16:29:34,529 [myid:8] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@362] - Exception causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not running 2015-06-25 16:29:34,530 [myid:8] - DEBUG [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@366] - IOException stack trace java.io.IOException: ZooKeeperServer not running at org.apache.zookeeper.server.NIOServerCnxn.readLength(NIOServerCnxn.java:931) at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:237) at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208) at java.lang.Thread.run(Thread.java:745) 2015-06-25 16:29:34,530 [myid:8] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1007] - Closed socket connection for client /0:0:0:0:0:0:0:1:65212 (no session established for client) 2015-06-25 16:29:35,108 [myid:8] - DEBUG [WorkerReceiver[myid=8]:FastLeaderElection$Messenger$WorkerReceiver@270] - Receive new notification message. My id = 8 2015-06-25 16:29:35,109 [myid:8] - INFO [WorkerReceiver[myid=8]:FastLeaderElection@597] - Notification: 1 (message format version), 2 (n.leader), 0x1ae0000019c (n.zxid), 0x2e (n.round), LOOKING (n.state), 4 (n.sid), 0x1ae (n.peerEpoch) LOOKING (my state) 2015-06-25 16:29:35,111 [myid:8] - DEBUG [QuorumPeer[myid=8]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@609] - id: 2, proposed id: 8, zxid: 0x1ae0000019c, proposed zxid: 0x1ae0000019c 2015-06-25 16:29:35,113 [myid:8] - DEBUG [QuorumPeer[myid=8]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@885] - Adding vote: from=4, proposed leader=2, proposed zxid=0x1ae0000019c, proposed election epoch=0x2e 2015-06-25 16:29:35,113 [myid:8] - DEBUG [WorkerReceiver[myid=8]:FastLeaderElection$Messenger$WorkerReceiver@270] - Receive new notification message. My id = 8 2015-06-25 16:29:35,113 [myid:8] - INFO [WorkerReceiver[myid=8]:FastLeaderElection@597] - Notification: 1 (message format version), 8 (n.leader), 0x1ae0000019c (n.zxid), 0x2e (n.round), LOOKING (n.state), 2 (n.sid), 0x1ae (n.peerEpoch) LOOKING (my state) 2015-06-25 16:29:35,115 [myid:8] - DEBUG [QuorumPeer[myid=8]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@609] - id: 8, proposed id: 8, zxid: 0x1ae0000019c, proposed zxid: 0x1ae0000019c 2015-06-25 16:29:35,115 [myid:8] - DEBUG [QuorumPeer[myid=8]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@885] - Adding vote: from=2, proposed leader=8, proposed zxid=0x1ae0000019c, proposed election epoch=0x2e 2015-06-25 16:29:35,333 [myid:8] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /127.0.0.1:65229 2015-06-25 16:29:35,335 [myid:8] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@362] - Exception causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not running 2015-06-25 16:29:35,336 [myid:8] - DEBUG [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@366] - IOException stack trace java.io.IOException: ZooKeeperServer not running at org.apache.zookeeper.server.NIOServerCnxn.readLength(NIOServerCnxn.java:931) at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:237) at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208) at java.lang.Thread.run(Thread.java:745) 2015-06-25 16:29:35,336 [myid:8] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1007] - Closed socket connection for client /127.0.0.1:65229 (no session established for client) 2015-06-25 16:29:35,369 [myid:8] - DEBUG [WorkerReceiver[myid=8]:FastLeaderElection$Messenger$WorkerReceiver@270] - Receive new notification message. My id = 8 2015-06-25 16:29:35,369 [myid:8] - INFO [WorkerReceiver[myid=8]:FastLeaderElection@597] - Notification: 1 (message format version), 2 (n.leader), 0x1ae0000019c (n.zxid), 0x2e (n.round), LOOKING (n.state), 1 (n.sid), 0x1ae (n.peerEpoch) LOOKING (my state) 2015-06-25 16:29:35,371 [myid:8] - DEBUG [QuorumPeer[myid=8]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@609] - id: 2, proposed id: 8, zxid: 0x1ae0000019c, proposed zxid: 0x1ae0000019c 2015-06-25 16:29:35,371 [myid:8] - DEBUG [QuorumPeer[myid=8]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@885] - Adding vote: from=1, proposed leader=2, proposed zxid=0x1ae0000019c, proposed election epoch=0x2e 2015-06-25 16:29:35,776 [myid:8] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /127.0.0.1:65239 2015-06-25 16:29:35,779 [myid:8] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@362] - Exception causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not running 2015-06-25 16:29:35,780 [myid:8] - DEBUG [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@366] - IOException stack trace java.io.IOException: ZooKeeperServer not running at org.apache.zookeeper.server.NIOServerCnxn.readLength(NIOServerCnxn.java:931) at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:237) at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208) at java.lang.Thread.run(Thread.java:745) 2015-06-25 16:29:35,781 [myid:8] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1007] - Closed socket connection for client /127.0.0.1:65239 (no session established for client) 2015-06-25 16:29:36,052 [myid:8] - DEBUG [WorkerReceiver[myid=8]:FastLeaderElection$Messenger$WorkerReceiver@270] - Receive new notification message. My id = 8 2015-06-25 16:29:36,052 [myid:8] - INFO [WorkerReceiver[myid=8]:FastLeaderElection@597] - Notification: 1 (message format version), 8 (n.leader), 0x1ae0000019c (n.zxid), 0x2e (n.round), LOOKING (n.state), 3 (n.sid), 0x1ae (n.peerEpoch) LOOKING (my state) 2015-06-25 16:29:36,054 [myid:8] - DEBUG [QuorumPeer[myid=8]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@609] - id: 8, proposed id: 8, zxid: 0x1ae0000019c, proposed zxid: 0x1ae0000019c 2015-06-25 16:29:36,054 [myid:8] - DEBUG [QuorumPeer[myid=8]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@885] - Adding vote: from=3, proposed leader=8, proposed zxid=0x1ae0000019c, proposed election epoch=0x2e 2015-06-25 16:29:36,064 [myid:8] - DEBUG [WorkerReceiver[myid=8]:FastLeaderElection$Messenger$WorkerReceiver@270] - Receive new notification message. My id = 8 2015-06-25 16:29:36,064 [myid:8] - INFO [WorkerReceiver[myid=8]:FastLeaderElection@597] - Notification: 1 (message format version), 7 (n.leader), 0x1ae00000196 (n.zxid), 0x2e (n.round), LOOKING (n.state), 7 (n.sid), 0x1ae (n.peerEpoch) LOOKING (my state) 2015-06-25 16:29:36,066 [myid:8] - DEBUG [QuorumPeer[myid=8]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@609] - id: 7, proposed id: 8, zxid: 0x1ae00000196, proposed zxid: 0x1ae0000019c 2015-06-25 16:29:36,066 [myid:8] - DEBUG [QuorumPeer[myid=8]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@885] - Adding vote: from=7, proposed leader=7, proposed zxid=0x1ae00000196, proposed election epoch=0x2e 2015-06-25 16:29:36,189 [myid:8] - WARN [WorkerSender[myid=8]:QuorumCnxManager@382] - Cannot open channel to 6 at election address /172.17.35.16:3888 java.net.SocketTimeoutException: connect timed out at java.net.DualStackPlainSocketImpl.waitForConnect(Native Method) at java.net.DualStackPlainSocketImpl.socketConnect(DualStackPlainSocketImpl.java:85) at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:345) at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206) at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188) at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:172) at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392) at java.net.Socket.connect(Socket.java:589) at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:368) at org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:341) at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:449) at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.run(FastLeaderElection.java:430) at java.lang.Thread.run(Thread.java:745) 2015-06-25 16:29:36,198 [myid:8] - DEBUG [WorkerSender[myid=8]:QuorumCnxManager@387] - There is a connection already for server 7 2015-06-25 16:29:36,198 [myid:8] - DEBUG [WorkerSender[myid=8]:QuorumCnxManager@387] - There is a connection already for server 1 2015-06-25 16:29:36,198 [myid:8] - DEBUG [WorkerSender[myid=8]:QuorumCnxManager@387] - There is a connection already for server 2 2015-06-25 16:29:36,198 [myid:8] - DEBUG [WorkerSender[myid=8]:QuorumCnxManager@387] - There is a connection already for server 3 2015-06-25 16:29:36,198 [myid:8] - DEBUG [WorkerSender[myid=8]:QuorumCnxManager@387] - There is a connection already for server 4 2015-06-25 16:29:36,198 [myid:8] - DEBUG [WorkerSender[myid=8]:QuorumCnxManager@364] - Opening channel to server 5 2015-06-25 16:29:36,200 [myid:8] - DEBUG [WorkerReceiver[myid=8]:FastLeaderElection$Messenger$WorkerReceiver@270] - Receive new notification message. My id = 8 2015-06-25 16:29:36,201 [myid:8] - INFO [WorkerReceiver[myid=8]:FastLeaderElection@597] - Notification: 1 (message format version), 8 (n.leader), 0x1ae0000019c (n.zxid), 0x2e (n.round), LOOKING (n.state), 8 (n.sid), 0x1ae (n.peerEpoch) LOOKING (my state) 2015-06-25 16:29:36,215 [myid:8] - DEBUG [QuorumPeer[myid=8]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@609] - id: 8, proposed id: 8, zxid: 0x1ae0000019c, proposed zxid: 0x1ae0000019c 2015-06-25 16:29:36,215 [myid:8] - DEBUG [QuorumPeer[myid=8]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@885] - Adding vote: from=8, proposed leader=8, proposed zxid=0x1ae0000019c, proposed election epoch=0x2e 2015-06-25 16:29:36,309 [myid:8] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /0:0:0:0:0:0:0:1:65251 2015-06-25 16:29:36,315 [myid:8] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@362] - Exception causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not running 2015-06-25 16:29:36,344 [myid:8] - DEBUG [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@366] - IOException stack trace java.io.IOException: ZooKeeperServer not running at org.apache.zookeeper.server.NIOServerCnxn.readLength(NIOServerCnxn.java:931) at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:237) at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208) at java.lang.Thread.run(Thread.java:745) 2015-06-25 16:29:36,345 [myid:8] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1007] - Closed socket connection for client /0:0:0:0:0:0:0:1:65251 (no session established for client) 2015-06-25 16:29:36,369 [myid:8] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /0:0:0:0:0:0:0:1:65254 2015-06-25 16:29:36,371 [myid:8] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@362] - Exception causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not running 2015-06-25 16:29:36,372 [myid:8] - DEBUG [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@366] - IOException stack trace java.io.IOException: ZooKeeperServer not running at org.apache.zookeeper.server.NIOServerCnxn.readLength(NIOServerCnxn.java:931) at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:237) at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208) at java.lang.Thread.run(Thread.java:745) 2015-06-25 16:29:36,373 [myid:8] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1007] - Closed socket connection for client /0:0:0:0:0:0:0:1:65254 (no session established for client) 2015-06-25 16:29:38,053 [myid:8] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /127.0.0.1:65288 2015-06-25 16:29:38,056 [myid:8] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@362] - Exception causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not running 2015-06-25 16:29:38,057 [myid:8] - DEBUG [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@366] - IOException stack trace java.io.IOException: ZooKeeperServer not running at org.apache.zookeeper.server.NIOServerCnxn.readLength(NIOServerCnxn.java:931) at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:237) at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208) at java.lang.Thread.run(Thread.java:745) 2015-06-25 16:29:38,058 [myid:8] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1007] - Closed socket connection for client /127.0.0.1:65288 (no session established for client) 2015-06-25 16:29:38,328 [myid:8] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /127.0.0.1:65294 2015-06-25 16:29:38,330 [myid:8] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@362] - Exception causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not running 2015-06-25 16:29:38,331 [myid:8] - DEBUG [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@366] - IOException stack trace java.io.IOException: ZooKeeperServer not running at org.apache.zookeeper.server.NIOServerCnxn.readLength(NIOServerCnxn.java:931) at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:237) at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208) at java.lang.Thread.run(Thread.java:745) 2015-06-25 16:29:38,332 [myid:8] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1007] - Closed socket connection for client /127.0.0.1:65294 (no session established for client) 2015-06-25 16:29:38,679 [myid:8] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /0:0:0:0:0:0:0:1:65302 2015-06-25 16:29:38,686 [myid:8] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@362] - Exception causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not running 2015-06-25 16:29:38,688 [myid:8] - DEBUG [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@366] - IOException stack trace java.io.IOException: ZooKeeperServer not running at org.apache.zookeeper.server.NIOServerCnxn.readLength(NIOServerCnxn.java:931) at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:237) at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208) at java.lang.Thread.run(Thread.java:745) 2015-06-25 16:29:38,689 [myid:8] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1007] - Closed socket connection for client /0:0:0:0:0:0:0:1:65302 (no session established for client) 2015-06-25 16:29:39,385 [myid:8] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /0:0:0:0:0:0:0:1:65318 2015-06-25 16:29:39,393 [myid:8] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@362] - Exception causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not running 2015-06-25 16:29:39,400 [myid:8] - DEBUG [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@366] - IOException stack trace java.io.IOException: ZooKeeperServer not running at org.apache.zookeeper.server.NIOServerCnxn.readLength(NIOServerCnxn.java:931) at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:237) at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208) at java.lang.Thread.run(Thread.java:745) 2015-06-25 16:29:39,401 [myid:8] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1007] - Closed socket connection for client /0:0:0:0:0:0:0:1:65318 (no session established for client) 2015-06-25 16:29:39,973 [myid:8] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /127.0.0.1:65331 2015-06-25 16:29:39,975 [myid:8] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@362] - Exception causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not running 2015-06-25 16:29:39,977 [myid:8] - DEBUG [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@366] - IOException stack trace java.io.IOException: ZooKeeperServer not running at org.apache.zookeeper.server.NIOServerCnxn.readLength(NIOServerCnxn.java:931) at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:237) at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208) at java.lang.Thread.run(Thread.java:745) 2015-06-25 16:29:39,977 [myid:8] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1007] - Closed socket connection for client /127.0.0.1:65331 (no session established for client) 2015-06-25 16:29:40,510 [myid:8] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /127.0.0.1:65343 2015-06-25 16:29:40,512 [myid:8] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@362] - Exception causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not running 2015-06-25 16:29:40,514 [myid:8] - DEBUG [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@366] - IOException stack trace java.io.IOException: ZooKeeperServer not running at org.apache.zookeeper.server.NIOServerCnxn.readLength(NIOServerCnxn.java:931) at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:237) at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208) at java.lang.Thread.run(Thread.java:745) 2015-06-25 16:29:40,514 [myid:8] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1007] - Closed socket connection for client /127.0.0.1:65343 (no session established for client) 2015-06-25 16:29:40,973 [myid:8] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /0:0:0:0:0:0:0:1:65352 2015-06-25 16:29:40,975 [myid:8] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@362] - Exception causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not running 2015-06-25 16:29:40,977 [myid:8] - DEBUG [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@366] - IOException stack trace java.io.IOException: ZooKeeperServer not running at org.apache.zookeeper.server.NIOServerCnxn.readLength(NIOServerCnxn.java:931) at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:237) at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208) at java.lang.Thread.run(Thread.java:745) 2015-06-25 16:29:40,977 [myid:8] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1007] - Closed socket connection for client /0:0:0:0:0:0:0:1:65352 (no session established for client) 2015-06-25 16:29:41,200 [myid:8] - WARN [WorkerSender[myid=8]:QuorumCnxManager@382] - Cannot open channel to 5 at election address /172.17.35.15:3888 java.net.SocketTimeoutException: connect timed out at java.net.DualStackPlainSocketImpl.waitForConnect(Native Method) at java.net.DualStackPlainSocketImpl.socketConnect(DualStackPlainSocketImpl.java:85) at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:345) at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206) at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188) at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:172) at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392) at java.net.Socket.connect(Socket.java:589) at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:368) at org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:341) at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:449) at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.run(FastLeaderElection.java:430) at java.lang.Thread.run(Thread.java:745) 2015-06-25 16:29:41,211 [myid:8] - DEBUG [WorkerSender[myid=8]:QuorumCnxManager@364] - Opening channel to server 6 2015-06-25 16:29:41,564 [myid:8] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /0:0:0:0:0:0:0:1:65364 2015-06-25 16:29:41,566 [myid:8] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@362] - Exception causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not running 2015-06-25 16:29:41,567 [myid:8] - DEBUG [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@366] - IOException stack trace java.io.IOException: ZooKeeperServer not running at org.apache.zookeeper.server.NIOServerCnxn.readLength(NIOServerCnxn.java:931) at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:237) at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208) at java.lang.Thread.run(Thread.java:745) 2015-06-25 16:29:41,567 [myid:8] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1007] - Closed socket connection for client /0:0:0:0:0:0:0:1:65364 (no session established for client) 2015-06-25 16:29:42,448 [myid:8] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /127.0.0.1:65384 2015-06-25 16:29:42,465 [myid:8] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@362] - Exception causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not running 2015-06-25 16:29:42,467 [myid:8] - DEBUG [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@366] - IOException stack trace > On 25 Jun 2015, at 16:34, Filip Deleersnijder <fi...@motum.be> wrote: > > Hi, > > I can see that all of our logs contain the following log-statements pretty > often. > > 2015-06-22 12:02:00,752 [myid:2] - DEBUG [main:DataTree@949][] - Ignoring > processTxn failure hdr: -1 : error: -2 > 2015-06-22 12:02:00,753 [myid:2] - DEBUG [main:DataTree@949][] - Ignoring > processTxn failure hdr: 14 : error: -101 > > 2015-06-25 14:02:39,505 [myid:3] - DEBUG > [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:2181:FileTxnLog$FileTxnIterator@636] - > EOF excepton java.io.EOFException: Failed to read > c:\motum\config\MASS\ZK\version-2\log.1aa00000001 > Since we don’t properly shut the ZK process down ( we just shutdown windows > ), this properly can cause corruption of files. > > Is there somebody that has a clear idea about whether the “EOF” or the > “Ignoring processTxn” problems could cause frequent and long during Leader > Elections ? > > Any help is greatly appreciated, > > Filip > > > >> On 25 Jun 2015, at 11:51, Guy Moshkowich <guy.moshkow...@gmail.com >> <mailto:guy.moshkow...@gmail.com>> wrote: >> >> Are you using ZK client on your vehicles or ZK servers? >> You mentioned below 8 vehicles and i see 8 servers defined in the config. >> I would expect you have 8 client(running on your vehicles) communicating >> against 1 or 3 ZK servers as this will be more than enough for 8 clients. >> Guy >> >> On Thursday, June 25, 2015, Filip Deleersnijder <fi...@motum.be >> <mailto:fi...@motum.be>> wrote: >> >>> Hi, >>> >>> Thanks for your response. >>> >>> Our application consists of 8 automatic vehicles in a warehouse setting. >>> Those vehicles need some consensus decisions, and that is what we use >>> Zookeeper for. >>> Because vehicles can come and go at random, we installed a ZK participant >>> on every vehicle. The ZK client is some other piece of software that is >>> also running on the vehicles. >>> >>> Therefor : >>> - We can not choose the number of ZK-participants because it just >>> depends on the number of vehicles. >>> - The participants communicate over Wifi >>> - The client is running on the same machine, so it communicates >>> over the local network >>> >>> We are running Zookeeper version 3.4.6 >>> >>> Our zoo.cfg can be found below this e-mail. >>> >>> Thanks in advance ! >>> >>> Filip >>> >>> # The number of milliseconds of each tick >>> tickTime=2000 >>> # The number of ticks that the initial >>> # synchronization phase can take >>> initLimit=10 >>> # The number of ticks that can pass between >>> # sending a request and getting an acknowledgement >>> syncLimit=5 >>> # the directory where the snapshot is stored. >>> # do not use /tmp for storage, /tmp here is just >>> # example sakes. >>> dataDir=c:/motum/config/MASS/ZK >>> # the port at which the clients will connect >>> clientPort=2181 >>> >>> server.1=172.17.35.11:2888:3888 >>> server.2=172.17.35.12:2888:3888 >>> server.3=172.17.35.13:2888:3888 >>> server.4=172.17.35.14:2888:3888 >>> server.5=172.17.35.15:2888:3888 >>> server.6=172.17.35.16:2888:3888 >>> server.7=172.17.35.17:2888:3888 >>> server.8=172.17.35.18:2888:3888 >>> >>> # The number of snapshots to retain in dataDir >>> # Purge task interval in hours >>> # Set to "0" to disable auto purge feature >>> autopurge.snapRetainCount=3 >>> autopurge.purgeInterval=1 >>> >>> >>> >>>> On 24 Jun 2015, at 18:54, Raúl Gutiérrez Segalés <r...@itevenworks.net >>>> <mailto:r...@itevenworks.net> >>> <javascript:;>> wrote: >>>> >>>> Hi, >>>> >>>> On 24 June 2015 at 06:05, Filip Deleersnijder <fi...@motum.be >>>> <mailto:fi...@motum.be> >>> <javascript:;>> wrote: >>>> >>>>> Hi, >>>>> >>>>> Let’s start with some description of our system : >>>>> >>>>> - We our using a Zookeeper cluster with 8 participants for an >>> application >>>>> with mobile nodes ( connected over Wifi ). >>>>> >>>> >>>> You mean the participants talk over wifi or the clients? >>>> >>>> >>>>> ( Ip of the different nodes are according to the following structure : >>>>> Node X has IP : 172.17.35.1X ) >>>>> >>>> >>>> Why 8 and not an odd number of machines (i.e.: >>>> >>> http://zookeeper.apache.org/doc/trunk/zookeeperAdmin.html#sc_zkMulitServerSetup >>> >>> <http://zookeeper.apache.org/doc/trunk/zookeeperAdmin.html#sc_zkMulitServerSetup> >>>> )? >>>> >>>> - It is not that unusual to have a node being shut-down or restarted >>>>> - We haven’t benchmarked the number of write operations yet, but I would >>>>> estimate that it would be less than 10 writes / second >>>>> >>>> >>>> What version of ZK are you using? >>>> >>>> >>>>> >>>>> The problem we are having however is that sometimes(*), some instances >>>>> seem to be having problems with leader election. >>>>> Under the header “Attachment 1” below, you can find the leader election >>>>> times that were needed over 24h ( from 1 node ). One average it took >>> more >>>>> than 1 minute ! >>>>> I assume that this is not normal behaviour ? ( If somebody could confirm >>>>> that in a 8-node cluster, these are not normal leader election times, >>> that >>>>> would be nice ) >>>>> >>>>> In attachement 2 : I included an extract from the logging during a >>> leader >>>>> election that took 101874ms for 1 node ( server 2 ). >>>>> >>>>> Any help is greatly appreciated. >>>>> If further or more specific logging is required, please ask ! >>>>> >>>>> >>>> Do you mind sharing a copy of your config file (zoo.cfg)? Thanks! >>>> >>>> >>>> -rgs >