On Mon, Apr 19, 2010 at 12:10 PM, Patrick Hunt <ph...@apache.org> wrote:
>
> On 04/19/2010 11:55 AM, Travis Crawford wrote:
>>
>> To double-check, is the best way to tell a ZK instance is up-to-date
>> by looking at its ``LastZxid`` value? For example:
>>
>> $ java -jar /home/travis/cmdline-jmxclient-0.10.5.jar - localhost:8081
>>
>> org.apache.ZooKeeperService:name0=ReplicatedServer_id1,name1=replica.1,name2=Follower,name3=InMemoryDataTree
>> LastZxid
>> 04/19/2010 18:42:45 +0000 org.archive.jmx.Client LastZxid: 0xf000420ad
>>
>> I believe the ``LastZxid`` for each ZK instance needs to be compared
>> to the leader to see how far behind it is.
>
> Well the server will only be "active" once it joins the quorum (usually as a
> follower) so if it's having trouble joining that data might not be
> available. But yes, once the server is active then you could examine the
> lastzxid to determine if/how much it's lagging the leader (quorum).
>
>>
>> It would be a lot easier from the operations perspective if the leader
>> explicitly published some health stats:
>>
>> (a) Count of instances in the ensemble.
>> (b) Count of up-to-date instances in the ensemble.
>>
>> This would greatly simplify monitoring & alerting - when an instance
>> falls behind one could configure their monitoring system to let
>> someone know and take a look at the logs.
>
> That's a great idea. Please enter a JIRA for this - a new 4 letter word and
> JMX support. It would also be a great starter project for someone interested
> in becoming more familiar with the server code.

Filed: https://issues.apache.org/jira/browse/ZOOKEEPER-744

Attached is a screenshot of some JMX output in Ganglia - it's currently
implemented using a -javaagent tool I happened to find. Having a simple
non-Java way to fetch monitoring stats and publish to an external
monitoring system would be awesome, and probably reusable by others.

--travis

>
> Patrick
>
>>
>> --travis
>>
>>
>> On Mon, Apr 19, 2010 at 10:14 AM, Patrick Hunt <ph...@apache.org> wrote:
>>>
>>> Usually the server logs will shed light on such issues. If we had access to
>>> them it might be easier to speculate.
>>>
>>> Patrick
>>>
>>> On 04/19/2010 09:22 AM, Mahadev Konar wrote:
>>>>
>>>> Hi Hao,
>>>> As Vishal already asked, how are you determining if the writes are being
>>>> received?
>>>> Also, what was the status of C2 when you checked for these writes? Do you
>>>> have the output of echo "stat" | nc localhost port?
>>>>
>>>> How long did you wait when you say that C2 did not receive the writes? What
>>>> was the status of C2 (again echo "stat" | nc localhost port) when you saw
>>>> that C2 had received the writes?
>>>>
>>>> Thanks
>>>> mahadev
>>>>
>>>>
>>>> On 4/18/10 10:54 PM, "Dr Hao He" <h...@softtouchit.com> wrote:
>>>>
>>>>> I have zookeeper cluster E1 with 3 nodes A, B, and C.
>>>>>
>>>>> I stopped C and did some writes on E1. Both A and B received the writes. I
>>>>> then started C and after a short while, C also received the writes.
>>>>>
>>>>> All seemed to go well so I replicated the setup to another cluster E2 with
>>>>> exactly 3 nodes: A2, B2, and C2.
>>>>>
>>>>> I stopped C2 and did some writes on E2. A2 received the writes. I then
>>>>> started C2. However, no matter how long I wait, C2 never received the
>>>>> writes.
>>>>>
>>>>> I then did more writes on E2. Then C2 can receive all the writes including
>>>>> the old writes when it was down.
>>>>>
>>>>> How do I find out what was wrong with the E2 setup?
>>>>>
>>>>> I am running 3.2.2 on all nodes.
>>>>>
>>>>> Regards,
>>>>>
>>>>> Dr Hao He
>>>>>
>>>>> XPE - the truly SOA platform
>>>>>
>>>>> h...@softtouchit.com
>>>>> http://softtouchit.com
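
P.S. For anyone scripting the LastZxid comparison discussed at the top of this thread, here is a minimal Python sketch. It assumes the usual ZooKeeper zxid layout (a 64-bit value whose high 32 bits are the leader epoch and whose low 32 bits are a per-epoch counter), and that you have already fetched the hex strings yourself, for example via JMX as in the jmxclient example. The names split_zxid and zxid_lag are hypothetical helpers for illustration, not part of ZooKeeper.

```python
from typing import Optional, Tuple


def split_zxid(zxid_hex: str) -> Tuple[int, int]:
    """Split a hex zxid string such as '0xf000420ad' into (epoch, counter).

    Assumption: high 32 bits are the epoch, low 32 bits are the counter.
    """
    zxid = int(zxid_hex, 16)  # int() accepts the '0x' prefix when base is 16
    return zxid >> 32, zxid & 0xFFFFFFFF


def zxid_lag(leader_hex: str, follower_hex: str) -> Optional[int]:
    """Return how many transactions the follower trails the leader by.

    Returns None when the epochs differ, because the counters are then
    not directly comparable and the server needs a closer look.
    """
    leader_epoch, leader_counter = split_zxid(leader_hex)
    follower_epoch, follower_counter = split_zxid(follower_hex)
    if leader_epoch != follower_epoch:
        return None
    return leader_counter - follower_counter


if __name__ == "__main__":
    # The follower value is the one from the JMX output quoted above;
    # the leader value is made up purely for illustration.
    print(split_zxid("0xf000420ad"))
    print(zxid_lag("0xf000420b0", "0xf000420ad"))
```

A monitoring check could alert when zxid_lag returns None or stays above some threshold for several polls in a row. As Patrick notes above, the value is only meaningful once the server has actually joined the quorum.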