Re: What does this mean?
On Mon, Oct 11, 2010 at 4:16 PM, Avinash Lakshman avinash.laksh...@gmail.com wrote:

tickTime = 2000, initLimit = 3000, and the data is around 11GB (this is log + snapshot). So if I need to add a new observer, can I transfer state from the ensemble manually before starting it? If so, which files do I need to transfer?

You can't really do it manually. As part of the bring-up process, a server communicates with the current leader and downloads the appropriate data (either a diff of the recent changes, or a full snapshot if it is too far behind).

Try increasing your initLimit to 15 or so (btw, that's in ticks, not milliseconds, so if you have 3000 now that's probably not the issue ;-) ). You might also want to increase the syncLimit at the same time. Here's the relevant part of the sample conf that ships with the release:

    # The number of ticks that the initial
    # synchronization phase can take
    initLimit=10
    # The number of ticks that can pass between
    # sending a request and getting an acknowledgement
    syncLimit=5

Patrick

Thanks

On Mon, Oct 11, 2010 at 10:16 AM, Benjamin Reed br...@yahoo-inc.com wrote:

how big is your data? you may be running into the problem where it takes too long to do the state transfer and times out. check the initLimit and the size of your data.

ben

On 10/10/2010 08:57 AM, Avinash Lakshman wrote:

Thanks Ben. I am not mixing processes of different clusters; I just double-checked that. I have ZK deployed in a 5-node cluster, and I have 20 observers. I just started the 5-node cluster w/o starting the observers, and I still see the same issue. Now my cluster won't start up. What is the correct workaround to get this going? How can I find out who the leader is and who the followers are, to get more insight?

Thanks
A

On Sun, Oct 10, 2010 at 8:33 AM, Benjamin Reed br...@yahoo-inc.com wrote:

this usually happens when a follower closes its connection to the leader. it is usually caused by the follower shutting down or failing. you may get further insight by looking at the follower logs. you should really run with timestamps on so that you can correlate the logs of the leader and follower. one thing that is strange is the wide divergence between the zxids of the follower and the leader. are you mixing processes of different clusters?

ben

From: Avinash Lakshman [avinash.laksh...@gmail.com]
Sent: Sunday, October 10, 2010 8:18 AM
To: zookeeper-user
Subject: What does this mean?

I see this exception, and the servers are not doing anything:

    java.io.IOException: Channel eof
        at org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:630)
    ERROR - 124554051584(higestZxid) 21477836646(next log) for type -11
    WARN - Sending snapshot last zxid of peer is 0xe zxid of leader is 0x1e
    WARN - Sending snapshot last zxid of peer is 0x18 zxid of leader is 0x1eg
    WARN - Sending snapshot last zxid of peer is 0x5002dc766 zxid of leader is 0x1e
    WARN - Sending snapshot last zxid of peer is 0x1c zxid of leader is 0x1e
    ERROR - Unexpected exception causing shutdown while sock still open
    java.net.SocketException: Broken pipe
        at java.net.SocketOutputStream.socketWrite0(Native Method)
        at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
        at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
        at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
        at java.io.BufferedOutputStream.write(BufferedOutputStream.java:78)
        at java.io.DataOutputStream.writeInt(DataOutputStream.java:180)
        at org.apache.jute.BinaryOutputArchive.writeInt(BinaryOutputArchive.java:55)
        at org.apache.zookeeper.data.StatPersisted.serialize(StatPersisted.java:116)
        at org.apache.zookeeper.server.DataNode.serialize(DataNode.java:167)
        at org.apache.jute.BinaryOutputArchive.writeRecord(BinaryOutputArchive.java:123)
        at org.apache.zookeeper.server.DataTree.serializeNode(DataTree.java:967)
        at org.apache.zookeeper.server.DataTree.serializeNode(DataTree.java:982)
        at org.apache.zookeeper.server.DataTree.serializeNode(DataTree.java:982)
        at org.apache.zookeeper.server.DataTree.serializeNode(DataTree.java:982)
        at org.apache.zookeeper.server.DataTree.serialize(DataTree.java:1031)
        at org.apache.zookeeper.server.util.SerializeUtils.serializeSnapshot(SerializeUtils.java:104)
        at org.apache.zookeeper.server.ZKDatabase.serializeSnapshot(ZKDatabase.java:426)
        at org.apache.zookeeper.server.quorum.LearnerHandler.run(LearnerHandler.java:331)
    WARN - *** GOODBYE /10.138.34.212:33272

Avinash
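Patrick's point that initLimit and syncLimit are counted in ticks comes down to one multiplication: the wall-clock limit is the tick count times tickTime (in milliseconds). The sketch below only illustrates that arithmetic. The helper name `limit_ms` is hypothetical, the values are the ones quoted in this thread, and applying tickTime = 2000 to the sample-conf limits is an assumption.

```python
# Minimal sketch: initLimit and syncLimit are expressed in ticks, so the
# wall-clock limit is ticks * tickTime, where tickTime is in milliseconds.


def limit_ms(ticks: int, tick_time_ms: int) -> int:
    """Hypothetical helper: convert a tick-based limit to milliseconds."""
    return ticks * tick_time_ms


TICK_TIME_MS = 2000  # tickTime reported in this thread

# Avinash's setting: initLimit = 3000 ticks
avinash_init = limit_ms(3000, TICK_TIME_MS)
# Sample-conf values quoted by Patrick (assuming the same tickTime of 2000 ms)
sample_init = limit_ms(10, TICK_TIME_MS)
sample_sync = limit_ms(5, TICK_TIME_MS)

for name, ms in [("initLimit=3000", avinash_init),
                 ("initLimit=10", sample_init),
                 ("syncLimit=5", sample_sync)]:
    print(f"{name}: {ms} ms = {ms / 1000:g} s")
```

With initLimit = 3000 and tickTime = 2000, the initial synchronization window is 6,000,000 ms, about 100 minutes. That is why Patrick thinks the limit itself is probably not the problem.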
Re: What does this mean?
how big is your data? you may be running into the problem where it takes too long to do the state transfer and times out. check the initLimit and the size of your data.

ben
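Ben's suggestion can be approximated by comparing the time needed to ship the data with the initLimit window. The sketch below rests on several assumptions. The 50 MiB/s throughput is a hypothetical illustration, not a measurement from this thread. The full "around 11GB" (log + snapshot) is treated as the payload, although a snapshot alone is smaller. Serialization and disk overhead are ignored. The helper names are hypothetical.

```python
# Rough sketch under stated assumptions: does a full state transfer fit in
# the initLimit window? The throughput below is a hypothetical value.


def transfer_seconds(size_bytes: int, throughput_bytes_per_s: float) -> float:
    """Time to push size_bytes at a sustained throughput (no overhead modeled)."""
    return size_bytes / throughput_bytes_per_s


def fits_in_init_limit(size_bytes: int, throughput_bytes_per_s: float,
                       init_limit_ticks: int, tick_time_ms: int) -> bool:
    """Compare the estimated transfer time with initLimit * tickTime."""
    window_s = init_limit_ticks * tick_time_ms / 1000
    return transfer_seconds(size_bytes, throughput_bytes_per_s) <= window_s


MIB = 1024 ** 2
GIB = 1024 ** 3
data_size = 11 * GIB     # "around 11GB" from the thread (log + snapshot)
assumed_rate = 50 * MIB  # hypothetical: 50 MiB/s sustained

print(f"estimated transfer: {transfer_seconds(data_size, assumed_rate):.0f} s")
print("fits initLimit=3000:", fits_in_init_limit(data_size, assumed_rate, 3000, 2000))
print("fits initLimit=10:  ", fits_in_init_limit(data_size, assumed_rate, 10, 2000))
```

Under these assumptions, initLimit = 3000 ticks leaves ample room (about 225 s of transfer against a 6000 s window), but a 20 s window would not. A real check should use throughput measured between the actual servers.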
RE: What does this mean?
this usually happens when a follower closes its connection to the leader. it is usually caused by the follower shutting down or failing. you may get further insight by looking at the follower logs. you should really run with timestamps on so that you can correlate the logs of the leader and follower. one thing that is strange is the wide divergence between the zxids of the follower and the leader. are you mixing processes of different clusters?

ben
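Ben's remark about the wide zxid divergence is easier to see once each zxid is split into two halves. A ZooKeeper zxid is a 64-bit number: the high 32 bits hold the leader epoch and the low 32 bits hold a counter within that epoch. The sketch below decodes values copied from the log in the original post. The helper names are hypothetical.

```python
# Minimal sketch: split a 64-bit ZooKeeper zxid into (epoch, counter).
# High 32 bits = leader epoch, low 32 bits = counter within that epoch.
from typing import Tuple


def split_zxid(zxid: int) -> Tuple[int, int]:
    """Return (epoch, counter) for a zxid."""
    return zxid >> 32, zxid & 0xFFFFFFFF


def describe(label: str, zxid: int) -> str:
    """Format a zxid as hex together with its epoch and counter."""
    epoch, counter = split_zxid(zxid)
    return f"{label}: 0x{zxid:x} -> epoch {epoch} (0x{epoch:x}), counter {counter}"


# Values copied from the log in the original post.
print(describe("higestZxid", 124554051584))
print(describe("next log", 21477836646))
print(describe("peer last zxid", 0x5002DC766))
```

Decoded this way, 124554051584 is epoch 29, counter 0. The decimal 21477836646 printed as "next log" is the same number as the peer zxid 0x5002dc766, which is epoch 5, counter 3000166. The ERROR line therefore compares a zxid from epoch 29 with one from epoch 5, and that is the gap Ben points at.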
Re: What does this mean?
Thanks Ben. I am not mixing processes of different clusters; I just double-checked that. I have ZK deployed in a 5-node cluster, and I have 20 observers. I just started the 5-node cluster w/o starting the observers, and I still see the same issue. Now my cluster won't start up. What is the correct workaround to get this going? How can I find out who the leader is and who the followers are, to get more insight?

Thanks
A
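To find out which server is the leader, you can query each server's client port. ZooKeeper servers answer the `stat` four-letter command with a status report that includes a `Mode:` line (for example leader, follower, or observer). The by-hand equivalent is `echo stat | nc host 2181`. The sketch below is illustrative and makes several assumptions. Port 2181 is assumed to be the clientPort. `stat` is assumed to be enabled (newer releases restrict which four-letter commands are allowed). The helper names are hypothetical.

```python
# Sketch only. Assumptions: each server's clientPort is 2181 and the
# "stat" four-letter command is enabled. Helper names are hypothetical.
import socket
from typing import Iterable, Optional


def parse_mode(stat_output: str) -> Optional[str]:
    """Return the value of the 'Mode:' line in stat output, or None if absent."""
    for line in stat_output.splitlines():
        if line.startswith("Mode:"):
            return line.split(":", 1)[1].strip()
    return None


def server_mode(host: str, port: int = 2181, timeout: float = 5.0) -> Optional[str]:
    """Send 'stat' to one server and return the mode it reports."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"stat")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection after answering
                break
            chunks.append(data)
    return parse_mode(b"".join(chunks).decode("utf-8", errors="replace"))


def report_modes(hosts: Iterable[str], port: int = 2181) -> None:
    """Print each host's mode, or the connection error if it is unreachable."""
    for host in hosts:
        try:
            print(f"{host}: {server_mode(host, port)}")
        except OSError as exc:  # includes socket timeouts and refused connections
            print(f"{host}: unreachable ({exc})")
```

For example, `report_modes(["zk1", "zk2", "zk3"])`, with your own server names in place of these hypothetical ones, prints one line per server. A server that is still in leader election may not report a mode, in which case `server_mode` returns None.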