Re: What does this mean?

2010-10-13 Thread Patrick Hunt
On Mon, Oct 11, 2010 at 4:16 PM, Avinash Lakshman 
avinash.laksh...@gmail.com wrote:

 tickTime = 2000, initLimit = 3000, and the data is around 11GB (log +
 snapshot). So if I need to add a new observer, can I transfer state from the
 ensemble manually before starting it? If so, which files do I need to
 transfer?


You can't really do it manually. As part of the bring-up process, a server
communicates with the current leader and downloads the appropriate data
(either a diff of the recent changes or a full snapshot if it is too far
behind). Try increasing your initLimit to 15 or so (btw, that's in ticks,
not milliseconds, so if you have 3000 now that's probably not the issue ;-)
). You might also want to increase the syncLimit at the same time. Here's
the relevant part of the sample conf that ships with the release:

# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
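
To make the tick arithmetic concrete, here is a minimal sketch (not part of the original message). It multiplies a limit given in ticks by tickTime to get milliseconds, using the tickTime of 2000 reported in the question. The helper name limit_ms is made up for illustration.

```python
def limit_ms(limit_ticks: int, tick_time_ms: int) -> int:
    """Convert a limit expressed in ticks to milliseconds."""
    return limit_ticks * tick_time_ms


TICK_TIME_MS = 2000  # tickTime reported in the question

for name, ticks in [
    ("initLimit=10 (sample conf)", 10),
    ("initLimit=15 (suggested)", 15),
    ("initLimit=3000 (reported)", 3000),
    ("syncLimit=5 (sample conf)", 5),
]:
    ms = limit_ms(ticks, TICK_TIME_MS)
    print(f"{name}: {ms} ms ({ms / 1000:.0f} s)")
```

With these numbers, initLimit=15 allows 30 seconds for the initial synchronization, while initLimit=3000 already allows 6,000,000 ms (100 minutes), which is why a value of 3000 is unlikely to be the limiting factor.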

Patrick



 Thanks




Re: What does this mean?

2010-10-11 Thread Benjamin Reed
 how big is your data? you may be running into the problem where it 
takes too long to do the state transfer and times out. check the 
initLimit and the size of your data.
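
As a rough way to check whether the data size and initLimit are compatible, the sketch below estimates the smallest initLimit whose window covers a state transfer of a given size. It is a back-of-the-envelope estimate only: the throughput figure is an assumption, the function name min_init_limit is made up, and the real synchronization phase involves more than raw network transfer (serialization on the leader, loading on the learner), so treat the result as a lower bound.

```python
import math


def min_init_limit(data_bytes: int, bytes_per_sec: float, tick_time_ms: int) -> int:
    """Smallest initLimit (in ticks) whose window covers the estimated transfer time."""
    transfer_ms = data_bytes / bytes_per_sec * 1000
    return math.ceil(transfer_ms / tick_time_ms)


GIB = 1024 ** 3
MIB = 1024 ** 2

# Assumed inputs: 11 GiB of state, 50 MiB/s effective throughput, tickTime=2000.
# Only the snapshot is transferred, so the real size may be smaller than
# a figure that includes transaction logs.
print(min_init_limit(11 * GIB, 50 * MIB, 2000))
```

Plugging in a measured snapshot size and a measured throughput between the servers gives a more useful number than the assumed values above.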


ben






RE: What does this mean?

2010-10-10 Thread Benjamin Reed
this usually happens when a follower closes its connection to the leader. it is 
usually caused by the follower shutting down or failing. you may get further 
insight by looking at the follower logs. you should really run with timestamps 
on so that you can correlate the logs of the leader and follower.
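
Once timestamps are enabled, correlating the two logs amounts to interleaving them in time order. The sketch below shows one way to do that. It assumes every line starts with a timestamp of the form "2010-10-10 08:18:01,123"; that format, the helper names, and the sample lines in the usage note are all assumptions for illustration, since the log pasted in this thread has no timestamps.

```python
import heapq
from datetime import datetime

TS_FORMAT = "%Y-%m-%d %H:%M:%S,%f"  # assumed timestamp prefix
TS_LEN = 23                         # len("2010-10-10 08:18:01,123")


def parse_ts(line: str) -> datetime:
    """Parse the assumed timestamp prefix of a log line."""
    return datetime.strptime(line[:TS_LEN], TS_FORMAT)


def tag(name, lines):
    """Yield (timestamp, source name, line) for each line of one log."""
    for line in lines:
        yield parse_ts(line), name, line


def merge_logs(logs):
    """Interleave several logs in time order.

    `logs` maps a source name to an iterable of lines; each log must
    already be in time order, as a single server's log normally is.
    """
    return heapq.merge(*(tag(name, lines) for name, lines in logs.items()))
```

For example, calling merge_logs({"leader": leader_lines, "follower": follower_lines}) and printing the source name next to each line shows which server logged what, and in which order. Multi-line entries such as stack traces would need extra handling, because their continuation lines carry no timestamp.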

one thing that is strange is the wide divergence between the zxids of the 
follower and the leader. are you mixing processes of different clusters?
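
To see how wide the divergence is, it helps to split each zxid into its two parts: a zxid is a 64-bit number whose high 32 bits are the epoch and whose low 32 bits are a counter. The sketch below decodes the values that appear in the log quoted in this thread; the function name split_zxid is made up for illustration.

```python
from typing import Tuple


def split_zxid(zxid: int) -> Tuple[int, int]:
    """Split a 64-bit zxid into (epoch, counter): high and low 32 bits."""
    return zxid >> 32, zxid & 0xFFFFFFFF


# Values taken from the log in this thread.
for label, zxid in [
    ("peer", 0x5002DC766),
    ("peer", 0x1C),
    ("leader", 0x1E),
    ("higestZxid", 124554051584),
    ("next log", 21477836646),
]:
    epoch, counter = split_zxid(zxid)
    print(f"{label:>10}: zxid={zxid:#x} epoch={epoch} counter={counter}")
```

Decoded this way, the peer reporting 0x5002dc766 is at epoch 5, while the leader at 0x1e is at epoch 0 with a counter of 30. The two decimal values on the ERROR line decode to epoch 29 with counter 0, and to 0x5002dc766 again.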

ben


From: Avinash Lakshman [avinash.laksh...@gmail.com]
Sent: Sunday, October 10, 2010 8:18 AM
To: zookeeper-user
Subject: What does this mean?

I see this exception and the servers not doing anything.

java.io.IOException: Channel eof
    at org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:630)
ERROR - 124554051584(higestZxid)  21477836646(next log) for type -11
WARN - Sending snapshot last zxid of peer is 0xe  zxid of leader is 0x1e
WARN - Sending snapshot last zxid of peer is 0x18  zxid of leader is 0x1eg
WARN - Sending snapshot last zxid of peer is 0x5002dc766  zxid of leader is 0x1e
WARN - Sending snapshot last zxid of peer is 0x1c  zxid of leader is 0x1e
ERROR - Unexpected exception causing shutdown while sock still open
java.net.SocketException: Broken pipe
    at java.net.SocketOutputStream.socketWrite0(Native Method)
    at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
    at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
    at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
    at java.io.BufferedOutputStream.write(BufferedOutputStream.java:78)
    at java.io.DataOutputStream.writeInt(DataOutputStream.java:180)
    at org.apache.jute.BinaryOutputArchive.writeInt(BinaryOutputArchive.java:55)
    at org.apache.zookeeper.data.StatPersisted.serialize(StatPersisted.java:116)
    at org.apache.zookeeper.server.DataNode.serialize(DataNode.java:167)
    at org.apache.jute.BinaryOutputArchive.writeRecord(BinaryOutputArchive.java:123)
    at org.apache.zookeeper.server.DataTree.serializeNode(DataTree.java:967)
    at org.apache.zookeeper.server.DataTree.serializeNode(DataTree.java:982)
    at org.apache.zookeeper.server.DataTree.serializeNode(DataTree.java:982)
    at org.apache.zookeeper.server.DataTree.serializeNode(DataTree.java:982)
    at org.apache.zookeeper.server.DataTree.serialize(DataTree.java:1031)
    at org.apache.zookeeper.server.util.SerializeUtils.serializeSnapshot(SerializeUtils.java:104)
    at org.apache.zookeeper.server.ZKDatabase.serializeSnapshot(ZKDatabase.java:426)
    at org.apache.zookeeper.server.quorum.LearnerHandler.run(LearnerHandler.java:331)
WARN - *** GOODBYE /10.138.34.212:33272

Avinash


Re: What does this mean?

2010-10-10 Thread Avinash Lakshman
Thanks Ben. I am not mixing processes of different clusters. I just double
checked that. I have ZK deployed in a 5 node cluster and I have 20
observers. I just started the 5 node cluster w/o starting the observers. I
still see the same issue. Now my cluster won't start up. So what is the correct
workaround to get this going? How can I find out who the leader is and who
the followers are, to get more insight?
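
One way to see each server's role is the "stat" four-letter command, which a ZooKeeper server answers on its client port with a short status report that includes a "Mode:" line (leader, follower, observer, or standalone). The sketch below sends the command over a plain socket and extracts that line. The function names and the host names in the usage comment are hypothetical; 2181 is only the conventional client port. A server that has not joined a quorum may not report a mode at all, in which case parse_mode returns None.

```python
import socket
from typing import Optional


def four_letter_word(host: str, port: int, cmd: bytes = b"stat",
                     timeout: float = 5.0) -> str:
    """Send a four-letter command and return the server's full reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(cmd)
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_mode(stat_output: str) -> Optional[str]:
    """Return the value of the 'Mode:' line, or None if there is none."""
    for line in stat_output.splitlines():
        if line.startswith("Mode:"):
            return line.split(":", 1)[1].strip()
    return None


# Usage (hypothetical host names):
# for host in ["zk1.example.com", "zk2.example.com"]:
#     print(host, parse_mode(four_letter_word(host, 2181)))
```

Running this against every server in the ensemble shows which one, if any, currently considers itself the leader.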

Thanks
A
