I'd certainly like to understand the fundamental problem you're seeing: why would a server be unable to enter quorum for any length of time without being partitioned, etc.? Is there a ticket open for this, or do you think it's somehow specific to your environment?
As for the larger question, why not run the check at a regular interval and fail if the server isn't in quorum for more than N consecutive checks? We could add a 4lw, but it seems like you should be able to figure this out in other ways.

C

On Tue, Feb 18, 2014 at 5:51 PM, Deepak Jagtap <[email protected]> wrote:

> Hi All,
>
> I came across couple of instances where one zookeeper server was falling
> out from the quorum due to some bug/issue with leader election not
> completing successfully.
>
> We are trying to mitigate this problem by monitoring status of zookeeper
> server to check if it is part of the quorum.
> If it's not part of the quorum for very long time we restart zookeeper
> server so that it can join the quorum again.
>
> Currently there is no way to check if server is part of quorum :
> 'ruok' returns 'imok' even if zookeeper server is running and is not part
> of quorum(i.e it might be continuously running leader election)
> 'mntr' command reports this information but it doesn't report how long
> server is in that state.
>
> I want to restart zookeeper server only if out of quorum for certain amount
> of time (say: 2 minutes).
> Do I need to add a new four letter word command to report this info or is
> there any other way I can achieve this?
>
> I would be more than happy to add this to zookeeper if its helpful for
> other zookeeper users.
>
> Thanks & Regards,
> Deepak
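
To make the "fail after more than N checks" suggestion concrete, here is a rough Python sketch of an external monitor. It is only an illustration, not something that ships with ZooKeeper: the names `four_letter_word`, `in_quorum` and `QuorumWatch` are made up for this example, and it assumes that `mntr` prints a tab-separated `zk_server_state` line (leader, follower or observer) when the server is serving and no such line when it is not. Treating `standalone` as "not in quorum" is also an assumption.

```python
import socket

# Assumption: these are the zk_server_state values that mean the server is
# serving as part of an ensemble. "standalone" is deliberately left out.
IN_QUORUM_STATES = {"leader", "follower", "observer"}


def four_letter_word(host, port, cmd, timeout=5.0):
    """Send a four-letter command such as 'mntr' and return the raw reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(cmd.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def in_quorum(mntr_reply):
    """Return True if the reply has a zk_server_state line with a quorum role."""
    for line in mntr_reply.splitlines():
        key, _, value = line.partition("\t")
        if key.strip() == "zk_server_state":
            return value.strip() in IN_QUORUM_STATES
    # No state line at all (for example a "not serving" message): not in quorum.
    return False


class QuorumWatch:
    """Counts consecutive failed checks; any successful check resets the count."""

    def __init__(self, max_failures):
        self.max_failures = max_failures
        self.failures = 0

    def record(self, ok):
        """Record one check; return True once failures exceed max_failures."""
        self.failures = 0 if ok else self.failures + 1
        return self.failures > self.max_failures
```

Run from cron or a simple loop, each pass would call `in_quorum(four_letter_word(host, port, "mntr"))`, count a connection error as a failed check, and feed the result to `record()`. Polling every 10 seconds with `max_failures=12` gives roughly the two-minute window mentioned in the original message; the restart itself is left to whatever supervises the process.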
