As I said before, I cannot even restart a single server; another process is
brought up automatically. I tried specifically setting the PID:

    ps -aef | grep -i zoo
    vim /var/lib/zookeeper/zookeeper_server.pid
    sudo /usr/share/zookeeper/bin/zkServer.sh restart

I tried both restart and stop; neither works. Is there a setting to shut down
ZooKeeper and bring the servers up one by one in a 3-node cluster?

On Fri, May 13, 2016 at 12:57 PM, R Krishna <[email protected]> wrote:

> I have a fairly simple config file (below). I tried to reboot the machine,
> but server 75 never restarts properly: it does not expose a LISTEN port on
> 3888, so I obviously get:
>
> 2016-05-13 12:54:58,555 - WARN [WorkerSender[myid=3]:QuorumCnxManager@368] - Cannot open channel to 1 at election address /172.28.84.75:3888.
>
> Meanwhile, 75 is unable to expose 3888 and unable to connect to the other
> servers, with the exceptions shown before.
>
> Yes, I chose a distinct id (1 to 3) for each server. How do you do a rolling
> restart? And where do you tell it to take it easy if it cannot find all
> servers?
>
> # The number of milliseconds of each tick
> tickTime=2000
> # The number of ticks that the initial
> # synchronization phase can take
> initLimit=10
> # The number of ticks that can pass between
> # sending a request and getting an acknowledgement
> syncLimit=5
> # the directory where the snapshot is stored.
> dataDir=/var/lib/zookeeper
> # Place the dataLogDir to a separate physical disc for better performance
> # dataLogDir=/disk2/zookeeper
>
> # the port at which the clients will connect
> clientPort=2181
>
> # specify all zookeeper servers
> # The first port is used by followers to connect to the leader
> # The second one is used for leader election
> server.1=X.Y.Z.75:2888:3888
> server.2=X.Y.Z.76:2888:3888
> server.3=X.Y.Z.98:2888:3888
>
>
> On Fri, May 13, 2016 at 3:51 AM, Flavio Junqueira <[email protected]> wrote:
>
>> Hi there,
>>
>> The myid needs to contain the id for each server in the ensemble, so each
>> server will have a distinct value in its myid file.
>>
>> The problem might be with your configuration file.
>> I think you say that you have specified the servers in the config file of
>> each server, but perhaps you want to have a look at the documentation to
>> see if there is anything you're missing. If you're not sure, please post
>> it here.
>>
>> In the 3.4 branch of ZK, you have to do a rolling upgrade of the servers.
>>
>> -Flavio
>>
>> > On 13 May 2016, at 11:15, R Krishna <[email protected]> wrote:
>> >
>> > Just tried to set up a 2-server ZooKeeper cluster for the first time, one
>> > for each broker in my 2-broker Kafka cluster, and came across the
>> > following issues:
>> > 1. Do we have to specify a separate value in vim ./var/lib/zookeeper/myid
>> > although they are separate machine instances?
>> > 2. I kept seeing Mode: standalone on both servers although I saw
>> > connectivity between the two. After restarts, I saw them go to
>> > Follower/Leader.
>> > /usr/share/zookeeper/bin/zkServer.sh status
>> > JMX enabled by default
>> > Using config: /etc/zookeeper/conf/zoo.cfg
>> > Mode: standalone
>> > 3. The data was completely inconsistent. I was able to connect to each one
>> > and run all the netcat status commands from the other server without any
>> > issue. However, Kafka broker data was inconsistent and kept failing. Is
>> > there a way to confirm that both nodes are in sync and part of the same
>> > cluster?
>> > org.I0Itec.zkclient.exception.ZkNoNodeException: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /config/changes
>> >
>> > 4. Whenever I update the .cfg file, I cannot do a
>> > sudo /usr/share/zookeeper/bin/zkServer.sh restart; I have to force kill
>> > the pid, in which case it brings up another process reading the latest
>> > .cfg. Why is this so?
>> >
>> > 5. I realized we need at least 3 to make an ensemble, so I created and
>> > added another ZK host, updated the .cfg, and force killed the process so
>> > it reads the latest config, and started getting these exceptions.
>> > Yes, this probably means I have run out of connections.
>> >
>> > *And finally, how do I safely restart such a cluster when adding new nodes
>> > and then force them to sync data?*
>> >
>> > MASTER: 75: ::::::::::::::::::::::::::::::::::::
>> > 3 09:56:03,823 - INFO [main:FileSnap@83] - Reading snapshot /var/lib/zookeeper/version-2/snapshot.30
>> > 2016-05-13 09:56:03,860 - ERROR [main:FileTxnSnapLog@210] - Parent /brokers/ids missing for /brokers/ids/2
>> > 2016-05-13 09:56:03,862 - ERROR [main:QuorumPeer@453] - Unable to load database on disk
>> > java.io.IOException: Failed to process transaction type: 1 error: KeeperErrorCode = NoNode for /brokers/ids
>> >     at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:153)
>> >     at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223)
>> >     at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:417)
>> >     at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:409)
>> >     at org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:151)
>> >     at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:111)
>> >     at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
>> > Caused by: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /brokers/ids
>> >     at org.apache.zookeeper.server.persistence.FileTxnSnapLog.processTransaction(FileTxnSnapLog.java:211)
>> >     at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:151)
>> >     ... 6 more
>> > 2016-05-13 09:56:03,865 - ERROR [main:QuorumPeerMain@89] - Unexpected exception, exiting abnormally
>> > java.lang.RuntimeException: Unable to run quorum server
>> >     at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:454)
>> >     at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:409)
>> >     at org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:151)
>> >     at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:111)
>> >     at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
>> > Caused by: java.io.IOException: Failed to process transaction type: 1 error: KeeperErrorCode = NoNode for /brokers/ids
>> >     at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:153)
>> >     at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223)
>> >     at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:417)
>> >     ... 4 more
>> > Caused by: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /brokers/ids
>> >     at org.apache.zookeeper.server.persistence.FileTxnSnapLog.processTransaction(FileTxnSnapLog.java:211)
>> >     at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:151)
>> >     ... 6 more
>> >
>> >
>> > 2016-05-13 09:57:29,084 - ERROR [main:QuorumPeerMain@89] - Unexpected exception, exiting abnormally
>> > java.lang.RuntimeException: Unable to run quorum server
>> >     at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:454)
>> >     at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:409)
>> >     at org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:151)
>> >     at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:111)
>> >     at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
>> > Caused by: java.io.IOException: Failed to process transaction type: 1 error: KeeperErrorCode = NoNode for /brokers/ids
>> >     at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:153)
>> >     at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223)
>> >     at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:417)
>> >     ... 4 more
>> > Caused by: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /brokers/ids
>> >     at org.apache.zookeeper.server.persistence.FileTxnSnapLog.processTransaction(FileTxnSnapLog.java:211)
>> >     at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:151)
>> >     ... 6 more
>> >
>> >
>> > SECOND: 76 :::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
>> > ING (n.state), 3 (n.sid), 0x1 (n.peerEPoch), LOOKING (my state)
>> > 2016-05-13 09:42:40,650 - WARN [RecvWorker:1:QuorumCnxManager$RecvWorker@762] - Connection broken for id 1, my id = 2, error =
>> > java.io.EOFException
>> >     at java.io.DataInputStream.readInt(DataInputStream.java:392)
>> >     at org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:747)
>> > 2016-05-13 09:42:40,650 - WARN [RecvWorker:1:QuorumCnxManager$RecvWorker@765] - Interrupting SendWorker
>> > 2016-05-13 09:42:40,651 - WARN [SendWorker:1:QuorumCnxManager$SendWorker@679] - Interrupted while waiting for message on queue
>> > java.lang.InterruptedException
>> >     at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2017)
>> >     at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2095)
>> >     at java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:389)
>> >     at org.apache.zookeeper.server.quorum.QuorumCnxManager.pollSendQueue(QuorumCnxManager.java:831)
>> >     at org.apache.zookeeper.server.quorum.QuorumCnxManager.access$500(QuorumCnxManager.java:62)
>> >     at org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:667)
>> > 2016-05-13 09:42:40,651 - WARN [SendWorker:1:QuorumCnxManager$SendWorker@688] - Send worker leaving threa
>> >
>> >
>> > ..... then these ...............
>> >
>> > ==> /var/log/zookeeper/zookeeper.log <==
>> > 2016-05-13 10:01:20,334 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /X.Y.Z.75:58954
>> > 2016-05-13 10:01:20,334 - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@354] - Exception causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not running
>> > 2016-05-13 10:01:20,335 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1001] - Closed socket connection for client /X.Y.Z.75:58954 (no session established for client)
>> >
>> > ==> /home/kafka/kafka/kafka.log <==
>> > [2016-05-13 10:01:20,412] INFO Opening socket connection to server X.Y.Z.75/X.Y.Z.75:2181. Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn)
>> > [2016-05-13 10:01:20,413] INFO Socket connection established to X.Y.Z.75/X.Y.Z.75:2181, initiating session (org.apache.zookeeper.ClientCnxn)
>> > [2016-05-13 10:01:20,637] WARN Session 0x254a9245fc00000 for server X.Y.Z.75/X.Y.Z.75:2181, unexpected error, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
>> > java.io.IOException: Connection reset by peer
>> >     at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
>> >     at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
>> >     at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
>> >     at sun.nio.ch.IOUtil.read(IOUtil.java:192)
>> >     at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:384)
>> >     at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:68)
>> >     at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:366)
>> >     at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
>> > [2016-05-13 10:01:21,782] INFO Opening socket connection to server X.Y.Z.76/X.Y.Z.76:2181. Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn)
>> >
>> >
>> > THIRD - added last::::::::::::::::::::::::::::::::::::::::
>> >
>> > LOWING (n.state), 2 (n.sid), 0x1 (n.peerEPoch), LOOKING (my state)
>> > 2016-05-13 03:03:39,540 - INFO [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@774] - Notification time out: 25600
>> > 2016-05-13 03:03:39,569 - WARN [WorkerSender[myid=3]:QuorumCnxManager@368] - Cannot open channel to 1 at election address /X.Y.Z.75:3888
>> > java.net.ConnectException: Connection refused
>> >     at java.net.PlainSocketImpl.socketConnect(Native Method)
>> >     at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
>> >     at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
>> >     at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
>> >     at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
>> >     at java.net.Socket.connect(Socket.java:579)
>> >     at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:354)
>> >     at org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:327)
>> >     at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:393)
>> >     at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.run(FastLeaderElection.java:365)
>> >     at java.lang.Thread.run(Thread.java:745)
>> > 2016-05-13 03:03:39,570 - INFO [WorkerReceiver[myid=3]:FastLeaderElection@542] - Notification: 3 (n.leader), 0x100000052 (n.zxid), 0x108d2 (n.round), LOOKING (n.state), 3 (n.sid), 0x1 (n.peerEPoch), LOOKING (my state)
>> > 2016-05-13 03:03:39,596 - INFO [WorkerReceiver[myid=3]:FastLeaderElection@542] - Notification: 3 (n.leader), 0x100000052 (n.zxid), 0x108d1 (n.round), FOLLOWING (n.state), 2 (n.sid), 0x1 (n.peerEPoch), LOOKING (my state)
>> > 2016-05-13 03:03:47,801 - INFO [WorkerReceiver[myid=3]:FastLeaderElection@542] - Notification: 2 (n.leader), 0x100000052 (n.zxid), 0x108d2 (n.round), LOOKING (n.state), 2 (n.sid), 0x1 (n.peerEPoch), LOOKING (my state)
>> > 2016-05-13 03:03:48,013 - INFO [WorkerReceiver[myid=3]:FastLeaderElection@542] - Notification: 2 (n.leader), 0x100000052 (n.zxid), 0x108d2 (n.round), LOOKING (n.state), 2 (n.sid), 0x1 (n.peerEPoch), LOOKING (my state)
>> > 2016-05-13 03:03:48,415 - INFO [WorkerReceiver[myid=3]:FastLeaderElection@542] - Notification: 2 (n.leader), 0x100000052 (n.zxid), 0x108d2 (n.round), LOOKING (n.state), 2 (n.sid), 0x1 (n.peerEPoch), LOOKING (my state)
>> > 2016-05-13 03:03:49,216 - INFO [WorkerReceiver[myid=3]:FastLeaderElection@542] - Notification: 2 (n.leader), 0x100000052 (n.zxid), 0x108d2 (n.round), LOOKING (n.state), 2 (n.sid), 0x1 (n.peerEPoch), LOOKING (my state)
>> > 2016-05-13 03:03:50,818 - INFO [WorkerReceiver[myid=3]:FastLeaderElection@542] - Notification: 2 (n.leader), 0x100000052 (n.zxid), 0x108d2 (n.round), LOOKING (n.state), 2 (n.sid), 0x1 (n.peerEPoch), LOOKING (my state)
>>
>
>
> --
> Radha Krishna, Proddaturi
> 253-234-5657
>

--
Radha Krishna, Proddaturi
253-234-5657
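
On the question in item 3 about confirming that the nodes belong to the same
cluster, here is a minimal sketch. It assumes nc (netcat) is installed and that
each server answers the "srvr" four-letter command on clientPort 2181, as in the
zoo.cfg quoted in this thread. The function names parse_mode and server_mode are
made up for illustration and are not part of ZooKeeper.

```shell
#!/usr/bin/env bash
# Sketch only: report each server's role using the "srvr" four-letter command.
# Assumptions: nc is installed; clientPort is 2181 unless a port is passed in.

# Read "srvr" output on stdin and print the value of its "Mode:" line.
parse_mode() {
  awk -F': ' '/^Mode:/ { print $2 }'
}

# Ask one server for its role; prints "unreachable" if no Mode line comes back.
server_mode() {
  local host="$1" port="${2:-2181}" mode
  mode="$(echo srvr | nc -w 2 "$host" "$port" 2>/dev/null | parse_mode)"
  echo "${mode:-unreachable}"
}

# Example loop (host addresses are the placeholders used in this thread):
#   for h in X.Y.Z.75 X.Y.Z.76 X.Y.Z.98; do
#     echo "$h: $(server_mode "$h")"
#   done
```

A healthy three-server ensemble should show one leader and two followers. A
server that reports standalone did not start in quorum mode, which usually
points back at the server.N lines in the zoo.cfg it actually loaded.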

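
On the rolling-restart question, one possible approach is to restart a single
server at a time and wait until it reports leader or follower again before
touching the next one, so that two of the three servers stay up and quorum is
kept. This is a rough sketch, not a tested procedure. It assumes passwordless
ssh and sudo on each host, nc installed locally, clientPort 2181, and the
zkServer.sh path used in this thread. in_quorum, wait_for_quorum and
rolling_restart are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Sketch only: restart ZooKeeper servers one at a time, waiting for each
# to rejoin the ensemble before moving on to the next.

# Succeed when a role string means the server is part of a quorum.
in_quorum() {
  case "$1" in
    leader|follower) return 0 ;;
    *) return 1 ;;
  esac
}

# wait_for_quorum HOST [TRIES]: poll "srvr" every 2 seconds until the
# server reports leader or follower, or give up after TRIES attempts.
wait_for_quorum() {
  local host="$1" tries="${2:-30}" mode
  while [ "$tries" -gt 0 ]; do
    mode="$(echo srvr | nc -w 2 "$host" 2181 2>/dev/null \
            | awk -F': ' '/^Mode:/ { print $2 }')"
    if in_quorum "$mode"; then
      return 0
    fi
    tries=$((tries - 1))
    sleep 2
  done
  return 1
}

# rolling_restart HOST...: stop at the first server that fails to come back.
rolling_restart() {
  local host
  for host in "$@"; do
    ssh "$host" sudo /usr/share/zookeeper/bin/zkServer.sh restart || return 1
    if ! wait_for_quorum "$host"; then
      echo "$host did not rejoin the ensemble" >&2
      return 1
    fi
  done
}

# Example (placeholder addresses from this thread):
#   rolling_restart X.Y.Z.76 X.Y.Z.98 X.Y.Z.75
```

Stopping at the first failure is deliberate: restarting a second server while
the first is still down would leave only one of three running, which is below
quorum.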