Dear ZooKeeper user group members,

I have some questions.
We are currently running ZooKeeper 3.4.5 as a 3-node cluster. The ZooKeeper service stopped all of a sudden, so clients were unable to connect to the ZooKeeper servers. In that state, the servers could not elect a leader among themselves. I then restarted the ZooKeeper service on all nodes, but they still could not elect a leader or become followers. I rebooted Linux as well, but the same thing happened. (I lost the ZooKeeper log at this point t.t)

When I removed the snapshot files from the data directory, ZooKeeper worked fine again. I have uploaded my ZooKeeper snapshot here:
https://s3-ap-northeast-1.amazonaws.com/zookeeper-logs/data_org_b1.tar

If I put the snapshot back into the data directory, the cluster failure reappears.

My questions are:
1. Why was the snapshot corrupted all of a sudden?
2. Is there any way I can avoid this snapshot corruption issue?

I've attached zoo.cfg and part of the error log. I'd be happy to get any opinions.

Thank you.

Best regards,
Youngseok Jung

#zoo.cfg (pretty much default setting)
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/home/zookeeper/data
clientPort=2181
server.1=192.168.33.1:2888:3888
server.2=192.168.33.129:2888:3888
server.3=192.168.161.1:2888:3888
autopurge.snapRetainCount=3
autopurge.purgeInterval=1

#Some of error log
2014-03-19 17:56:24,737 [myid:1] - INFO [WorkerReceiver[myid=1]:FastLeaderElection@542] - Notification: 2 (n.leader), 0xc600000001 (n.zxid), 0x144 (n.round), LEADING (n.state), 2 (n.sid), 0xc6 (n.peerEPoch), LOOKING (my state)
2014-03-19 17:56:24,737 [myid:1] - WARN [WorkerSender[myid=1]:QuorumCnxManager@368] - Cannot open channel to 3 at election address /10.0.161.1:3888
java.net.ConnectException: Connection refused
    at java.net.PlainSocketImpl.socketConnect(Native Method)
    at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
    at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
    at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
    at java.net.Socket.connect(Socket.java:579)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:354)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:327)
    at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:393)
    at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.run(FastLeaderElection.java:365)
    at java.lang.Thread.run(Thread.java:724)
2014-03-19 17:56:25,537 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@774] - Notification time out: 1600
2014-03-19 17:56:25,538 [myid:1] - INFO [WorkerReceiver[myid=1]:FastLeaderElection@542] - Notification: 1 (n.leader), 0xc200000001 (n.zxid), 0x145 (n.round), LOOKING (n.state), 1 (n.sid), 0xc6 (n.peerEPoch), LOOKING (my state)
2014-03-19 17:56:25,540 [myid:1] - INFO [WorkerReceiver[myid=1]:FastLeaderElection@542] - Notification: 2 (n.leader), 0xc600000001 (n.zxid), 0x144 (n.round), LEADING (n.state), 2 (n.sid), 0xc6 (n.peerEPoch), LOOKING (my state)
2014-03-19 17:56:25,540 [myid:1] - WARN [WorkerSender[myid=1]:QuorumCnxManager@368] - Cannot open channel to 3 at election address /10.0.161.1:3888
java.net.ConnectException: Connection refused
    at java.net.PlainSocketImpl.socketConnect(Native Method)
    at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
    at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
    at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
    at java.net.Socket.connect(Socket.java:579)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:354)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:327)
    at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:393)
    at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.run(FastLeaderElection.java:365)
    at java.lang.Thread.run(Thread.java:724)
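
In case it helps anyone reading the notifications above: a zxid is a 64-bit value whose high 32 bits are the epoch and whose low 32 bits are a counter. Below is a minimal sketch for pulling those fields out of the Notification lines. The names NOTIFICATION, split_zxid and parse_notification are made up for this post and are not part of ZooKeeper, and the pattern is only an assumption based on the log lines shown here.

```python
import re

# Assumption: pattern derived only from the "Notification:" lines in the
# log excerpt above; other ZooKeeper versions may format them differently.
NOTIFICATION = re.compile(
    r"Notification: (?P<leader>\d+) \(n\.leader\), "
    r"(?P<zxid>0x[0-9a-fA-F]+) \(n\.zxid\), "
    r"(?P<round>0x[0-9a-fA-F]+) \(n\.round\), "
    r"(?P<state>\w+) \(n\.state\), "
    r"(?P<sid>\d+) \(n\.sid\), "
    r"(?P<peer_epoch>0x[0-9a-fA-F]+) \(n\.peerEPoch\)"
)


def split_zxid(zxid):
    """Return (epoch, counter): the high and low 32 bits of a 64-bit zxid."""
    return zxid >> 32, zxid & 0xFFFFFFFF


def parse_notification(line):
    """Parse one election notification log line; return None if it does not match."""
    m = NOTIFICATION.search(line)
    if m is None:
        return None
    epoch, counter = split_zxid(int(m.group("zxid"), 16))
    return {
        "leader": int(m.group("leader")),
        "sid": int(m.group("sid")),
        "state": m.group("state"),
        "zxid_epoch": epoch,
        "zxid_counter": counter,
        "peer_epoch": int(m.group("peer_epoch"), 16),
    }
```

Applied to the excerpt, the notification from sid 1 has zxid epoch 0xc2 with peer epoch 0xc6, while the one from sid 2 has zxid epoch 0xc6 with peer epoch 0xc6.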
