Re: Killing a zookeeper server

Adam Rosien Tue, 12 Jan 2010 15:58:47 -0800

Doh - that makes total sense. For whatever reason I thought with 2
servers you couldn't get a majority :P


On Tue, Jan 12, 2010 at 3:17 PM, Henry Robinson <he...@cloudera.com> wrote:
> Hi Adam -
>
> As long as a quorum of servers is running, ZK will be live. With majority
> quorums, 2/3 is enough to keep going. In general, if fewer than half your
> nodes have failed, ZK will keep on keeping on.
>
> The main concern with a cluster of 2/3 machines is that a single further
> failure will bring down the whole cluster.
>
> Henry
>
> 2010/1/12 Adam Rosien <a...@rosien.net>
>
>> I have a related question: what's the behavior of a cluster of 3 when
>> one is down? I've tried it and a leader is elected, but are there any
>> other caveats for this situation?
>>
>> .. Adam
>>
>> On Tue, Jan 12, 2010 at 2:40 PM, Patrick Hunt <ph...@apache.org> wrote:
>> > 12 servers? That's a lot; if you don't mind my asking, why so many?
>> > Typically we recommend 5 - that way you can have one down for
>> > maintenance and still survive a failure without bringing down the
>> > cluster.
>> >
>> > The "electing a leader" is probably the restarted machine attempting to
>> > re-join the ensemble (it should join as a follower if you have a leader
>> > already elected, given that its zxid is behind the existing leader's).
>> > Hard to tell though without the logs.
>> >
>> > You might also be seeing initLimit being exceeded. Is the data you are
>> > storing in ZK large? Or perhaps network connectivity is slow?
>> >
>> > http://hadoop.apache.org/zookeeper/docs/current/zookeeperAdmin.html#sc_clusterOptions
>> >
>> > Again, the logs would give some insight on this.
>> >
>> >
>> > Patrick
>> >
>> > Nick Bailey wrote:
>> >>
>> >> We are running zookeeper 3.1.0
>> >>
>> >> Recently we noticed the cpu usage on our machines becoming
>> >> increasingly high and we believe the cause is
>> >>
>> >> https://issues.apache.org/jira/browse/ZOOKEEPER-427
>> >>
>> >> However our solution when we noticed the problem was to kill the
>> >> zookeeper process and restart it.
>> >>
>> >> After doing that though it looks like the newly restarted zookeeper
>> >> server is continually attempting to elect a leader even though one
>> >> already exists.
>> >>
>> >> The process responds with 'imok' when asked, but the stat command
>> >> returns 'ZooKeeperServer not running'.
>> >>
>> >> I believe that killing the current leader should trigger all servers
>> >> to do an election and solve the problem, but I'm not sure. Should
>> >> that be the course of action in this situation?
>> >>
>> >> Also we have 12 servers, but 5 are currently not running according to
>> >> stat.  So I guess this isn't a problem unless we lose another one.
>> >> We have plans to upgrade zookeeper to solve the cpu issue but haven't
>> >> been able to do that yet.
>> >>
>> >> Any help appreciated, Nick Bailey
>> >>
>> >
>>
>
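
For anyone reading this thread later, the majority-quorum arithmetic
discussed above (2 of 3, the recommended 5, and Nick's 12 with 5 down)
can be spelled out in a few lines. This is only a rough Python sketch of
the counting rule Henry describes; the function names are made up for
illustration and are not part of ZooKeeper.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest strict majority of the ensemble (hypothetical helper)."""
    if ensemble_size < 1:
        raise ValueError("ensemble_size must be at least 1")
    return ensemble_size // 2 + 1


def has_quorum(ensemble_size: int, servers_up: int) -> bool:
    """True if the running servers still form a majority."""
    return servers_up >= quorum_size(ensemble_size)


def failures_tolerated(ensemble_size: int) -> int:
    """How many servers can be down while a majority remains."""
    return ensemble_size - quorum_size(ensemble_size)


# Cases taken from the thread: (ensemble size, servers currently up).
for total, up in [(3, 2), (5, 4), (12, 7), (12, 6)]:
    print(
        f"{total} servers, {up} up: "
        f"quorum={quorum_size(total)}, live={has_quorum(total, up)}, "
        f"tolerates={failures_tolerated(total)} failures"
    )
```

Under this rule, 12 servers tolerate 5 failures, the same as 11 would,
so 12 with 5 down is still live but one more failure loses the quorum.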

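The 'imok' and stat checks Nick mentions are ZooKeeper's four-letter
commands ("ruok" and "stat"), sent as plain text over the client port.
Below is a minimal Python sketch of scripting that check across an
ensemble. The helper names, the timeout, and the use of port 2181 in the
usage comment are assumptions for illustration; the 'not running' string
is simply the one quoted in this thread and may differ between versions.

```python
import socket


def four_letter_word(host: str, port: int, command: str,
                     timeout: float = 5.0) -> str:
    """Send a four-letter command and return the server's full reply.

    Reads until the remote side closes the connection.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(command.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def alive_but_not_serving(ruok_reply: str, stat_reply: str) -> bool:
    """Hypothetical check for the state described in this thread:
    the process answers 'imok' but stat reports it is not running.
    """
    return ruok_reply.strip() == "imok" and "not running" in stat_reply


# Usage (assumes a server listening on localhost:2181):
#   ruok = four_letter_word("localhost", 2181, "ruok")
#   stat = four_letter_word("localhost", 2181, "stat")
#   print(alive_but_not_serving(ruok, stat))
```

Running this against each server gives a quick count of how many are
actually serving, which is the number that matters for the quorum.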