I have a related question: what's the behavior of a cluster of 3 when
one is down? I've tried it and a leader is elected, but are there any
other caveats for this situation?
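
For reference, here is the majority arithmetic as I understand it, as a
rough sketch. The function names are my own (nothing from the ZooKeeper
API), and it assumes the default simple-majority quorum:

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest strict majority of the ensemble (assumes majority quorums)."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """Number of servers that can be down while a majority is still up."""
    return ensemble_size - quorum_size(ensemble_size)


if __name__ == "__main__":
    # Ensemble sizes mentioned in this thread: 3, 5 and 12.
    for n in (3, 5, 12):
        print(f"servers={n} quorum={quorum_size(n)} "
              f"can_lose={tolerated_failures(n)}")
```

If that is right, a cluster of 3 with one server down still has a majority
(2 of 3) but cannot lose another, and 12 servers with 5 down is in the
same position (7 of 12).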
On Tue, Jan 12, 2010 at 2:40 PM, Patrick Hunt <ph...@apache.org> wrote:
> 12 servers? That's a lot. If you don't mind my asking, why so many? Typically
> we recommend 5 - that way you can have one down for maintenance and still
> have a failure that doesn't bring down the cluster.
> The "electing a leader" is probably the restarted machine attempting to
> re-join the ensemble (it should join as a follower if you have a leader
> already elected, given that its zxid is behind the existing leader.) Hard to
> tell though without the logs.
> You might also be seeing initLimit exceeded. Is the data you are storing
> in ZK large? Or perhaps network connectivity is slow?
> Again, the logs would give some insight on this.
> Nick Bailey wrote:
>> We are running zookeeper 3.1.0
>> Recently we noticed the cpu usage on our machines becoming
>> increasingly high and we believe the cause is
>> However, our solution when we noticed the problem was to kill the
>> zookeeper process and restart it.
>> After doing that, though, it looks like the newly restarted zookeeper
>> server is continually attempting to elect a leader even though one
>> already exists.
>> The process responds with 'imok' when asked, but the stat command
>> returns 'ZooKeeperServer not running'.
>> I believe that killing the current leader should trigger all servers
>> to do an election and solve the problem, but I'm not sure. Should
>> that be the course of action in this situation?
>> Also we have 12 servers, but 5 are currently not running according to
>> stat. So I guess this isn't a problem unless we lose another one.
>> We have plans to upgrade zookeeper to solve the cpu issue but haven't
>> been able to do that yet.
>> Any help appreciated, Nick Bailey