To double-check, is the best way to tell whether a ZK instance is up-to-date to look at its ``LastZxid`` value? For example:

$ java -jar /home/travis/cmdline-jmxclient-0.10.5.jar - localhost:8081 org.apache.ZooKeeperService:name0=ReplicatedServer_id1,name1=replica.1,name2=Follower,name3=InMemoryDataTree LastZxid
04/19/2010 18:42:45 +0000 org.archive.jmx.Client LastZxid: 0xf000420ad

I believe the ``LastZxid`` for each ZK instance needs to be compared to the leader to see how far behind it is.

It would be a lot easier from the operations perspective if the leader explicitly published some health stats:

(a) Count of instances in the ensemble.
(b) Count of up-to-date instances in the ensemble.

This would greatly simplify monitoring & alerting - when an instance falls behind one could configure their monitoring system to let someone know and take a look at the logs.

--travis

On Mon, Apr 19, 2010 at 10:14 AM, Patrick Hunt <ph...@apache.org> wrote:
> Usually the server logs will shed light on such issues. If we had access to
> them it might be easier to speculate.
>
> Patrick
>
> On 04/19/2010 09:22 AM, Mahadev Konar wrote:
>>
>> Hi Hao,
>> As Vishal already asked, how are you determining if the writes are being
>> received?
>> Also, what was the status of C2 when you checked for these writes? Do you
>> have the output of echo "stat" | nc localhost port?
>>
>> How long did you wait when you say that C2 did not received the writes? What
>> was the status of C2 (again echo "stat" | nc localhost port) when you saw
>> the C2 had received the writes?
>>
>> Thanks
>> mahadev
>>
>>
>> On 4/18/10 10:54 PM, "Dr Hao He"<h...@softtouchit.com> wrote:
>>
>>> I have zookeeper cluster E1 with 3 nodes A,B, and C.
>>>
>>> I stopped C and did some writes on E1. Both A and B received the writes. I
>>> then started C and after a short while, C also received the writes.
>>>
>>> All seem to go well so I replicated the setup to another cluster E2 with
>>> exactly 3 nodes: A2, B2, and C2.
>>>
>>> I stopped C2 and did some writes on E2. A2 received the writes. I then
>>> started C2. However, no matter how long I wait, C2 never received the
>>> writes.
>>>
>>> I then did more writes on E2. Then C2 can receive all the writes including
>>> the old writes when it was down.
>>>
>>> How do I find out what was wrong withe E2 setup?
>>>
>>> I am running 3.2.2 on all nodes.
>>>
>>> Regards,
>>>
>>> Dr Hao He
>>>
>>> XPE - the truly SOA platform
>>>
>>> h...@softtouchit.com
>>> http://softtouchit.com
>>>
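
For reference, the ``LastZxid`` comparison described at the top of this message could be sketched roughly as follows. This is only an illustration, not an existing ZooKeeper tool: the function names are hypothetical, and it assumes the usual zxid layout (epoch in the high 32 bits, transaction counter in the low 32 bits). The follower value is the one from the JMX output above; the leader value is made up for the example.

```python
def split_zxid(zxid_hex):
    """Split a zxid string such as '0xf000420ad' into (epoch, counter).

    Assumes the zxid is a 64-bit value: high 32 bits = epoch,
    low 32 bits = per-epoch transaction counter.
    """
    zxid = int(zxid_hex, 16)
    return zxid >> 32, zxid & 0xFFFFFFFF


def transactions_behind(leader_hex, follower_hex):
    """Return how many transactions the follower trails the leader by.

    Returns None when the two zxids are in different epochs, because
    the counters are then not directly comparable.
    """
    leader_epoch, leader_counter = split_zxid(leader_hex)
    follower_epoch, follower_counter = split_zxid(follower_hex)
    if leader_epoch != follower_epoch:
        return None
    return leader_counter - follower_counter


if __name__ == "__main__":
    follower = "0xf000420ad"   # LastZxid reported by the follower above
    leader = "0xf000420b0"     # hypothetical leader value, for illustration only
    print(split_zxid(follower))
    print(transactions_behind(leader, follower))
```

A monitoring check built on this would still have to fetch ``LastZxid`` from every instance (for example over JMX as above) and decide what lag is acceptable, which is why having the leader publish these counts directly would be simpler.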