On 04/19/2010 11:55 AM, Travis Crawford wrote:
To double-check, is the best way to tell a ZK instance is up-to-date
by looking at its ``LastZxid`` value? For example:

$ java -jar /home/travis/cmdline-jmxclient-0.10.5.jar - localhost:8081
04/19/2010 18:42:45 +0000 org.archive.jmx.Client LastZxid: 0xf000420ad

I believe the ``LastZxid`` for each ZK instance needs to be compared
to the leader to see how far behind it is.

Well, the server will only be "active" once it joins the quorum (usually as a follower), so if it's having trouble joining, that data might not be available. But yes, once the server is active you could examine the LastZxid to determine if, and by how much, it's lagging the leader (quorum).
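
As a rough illustration of the comparison described above, here is a minimal Python sketch. It relies on the fact that a zxid is a 64-bit value whose high 32 bits are the epoch and whose low 32 bits are a counter within that epoch. The function names (split_zxid, zxid_lag) are hypothetical and not part of ZooKeeper, and the follower value in the usage note is made up for illustration.

```python
def split_zxid(zxid_hex):
    """Split a hex zxid string such as '0xf000420ad' into (epoch, counter)."""
    zxid = int(zxid_hex, 16)
    return zxid >> 32, zxid & 0xFFFFFFFF


def zxid_lag(leader_hex, follower_hex):
    """Return how many transactions the follower trails the leader by.

    Returns None when the two zxids are in different epochs, because the
    counters are then not directly comparable.
    """
    leader_epoch, leader_counter = split_zxid(leader_hex)
    follower_epoch, follower_counter = split_zxid(follower_hex)
    if leader_epoch != follower_epoch:
        return None
    return leader_counter - follower_counter
```

For example, zxid_lag("0xf000420ad", "0xf000420a0") reports a lag of 13 transactions within epoch 15. A monitoring check could alert when the lag stays above some threshold, or when the epochs differ for longer than a leader election should take.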

It would be a lot easier from the operations perspective if the leader
explicitly published some health stats:

(a) Count of instances in the ensemble.
(b) Count of up-to-date instances in the ensemble.

This would greatly simplify monitoring & alerting - when an instance
falls behind one could configure their monitoring system to let
someone know and take a look at the logs.

That's a great idea. Please enter a JIRA for this - a new 4 letter word and JMX support. It would also be a great starter project for someone interested in becoming more familiar with the server code.



On Mon, Apr 19, 2010 at 10:14 AM, Patrick Hunt <ph...@apache.org> wrote:
Usually the server logs will shed light on such issues. If we had access to
them it might be easier to speculate.


On 04/19/2010 09:22 AM, Mahadev Konar wrote:

Hi Hao,
   As Vishal already asked, how are you determining if the writes are being
  Also, what was the status of C2 when you checked for these writes? Do you
have the output of echo "stat" | nc localhost port?

How long did you wait when you say that C2 did not receive the writes? What
was the status of C2 (again echo "stat" | nc localhost port) when you saw
that C2 had received the writes?
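
For anyone who wants to script the "stat" check rather than run it by hand, the following is a minimal Python sketch of the same idea: open a TCP connection to the client port, send the four letter word, and read until the server closes the connection. The helper names (four_letter_word, parse_stat) are hypothetical, and the parser assumes the reply contains lines of the form "Mode: ..." and "Zxid: ...", which should be checked against the actual output of your server version.

```python
import socket


def four_letter_word(host, port, word="stat", timeout=5.0):
    """Send a four letter word to a ZooKeeper server and return its reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(word.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_stat(text):
    """Extract the Mode and Zxid fields from a 'stat' reply (assumed format)."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() in ("Mode", "Zxid"):
            fields[key.strip()] = value.strip()
    return fields
```

Calling parse_stat(four_letter_word("localhost", port)) on each node, with port set to that node's client port, gives the role and last zxid of every server, which can then be compared across the ensemble.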


On 4/18/10 10:54 PM, "Dr Hao He" <h...@softtouchit.com> wrote:

I have zookeeper cluster E1 with 3 nodes A,B, and C.

I stopped C and did some writes on E1.  Both A and B received the writes.
I then started C and after a short while, C also received the writes.

All seemed to go well so I replicated the setup to another cluster E2 with
exactly 3 nodes: A2, B2, and C2.

I stopped C2 and did some writes on E2.  A2 received the writes.  I then
started C2.  However, no matter how long I waited, C2 never received the writes.

I then did more writes on E2.  Then C2 can receive all the writes
the old writes when it was down.

How do I find out what was wrong with the E2 setup?

I am running 3.2.2 on all nodes.


Dr Hao He

XPE - the truly SOA platform

