Hi Adam -

As long as a quorum of servers is running, ZK will stay live. With majority quorums, 2 of 3 servers is enough to keep going. In general, as long as fewer than half of your nodes have failed, ZK will keep on keeping on.
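
To make the majority rule concrete, here is a minimal Python sketch of the arithmetic. It is not ZooKeeper code: the function names (`quorum_size`, `has_quorum`, `failures_tolerated`) are made up for illustration, and it assumes plain majority quorums where every server has one vote.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest strict majority of the ensemble (assumes simple majority quorums)."""
    return ensemble_size // 2 + 1


def has_quorum(ensemble_size: int, servers_up: int) -> bool:
    """True if the servers still running form a majority of the full ensemble."""
    return servers_up >= quorum_size(ensemble_size)


def failures_tolerated(ensemble_size: int, servers_up: int) -> int:
    """How many more servers can fail before the quorum is lost."""
    return max(servers_up - quorum_size(ensemble_size), 0)


if __name__ == "__main__":
    # (ensemble size, servers currently up) for the cases raised in this thread
    for total, up in [(3, 2), (5, 4), (12, 7)]:
        print(
            f"{up}/{total} up: quorum={quorum_size(total)}, "
            f"live={has_quorum(total, up)}, "
            f"further failures tolerated={failures_tolerated(total, up)}"
        )
```

For the cases in this thread, 2 of 3 and 7 of 12 both still have a quorum, but neither can absorb another failure.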
The main concern with a 3-server cluster that has only 2 machines running is that a single further failure will bring down the whole cluster.

Henry

2010/1/12 Adam Rosien <a...@rosien.net>

> I have a related question: what's the behavior of a cluster of 3 when
> one is down? I've tried it and a leader is elected, but are there any
> other caveats for this situation?
>
> .. Adam
>
> On Tue, Jan 12, 2010 at 2:40 PM, Patrick Hunt <ph...@apache.org> wrote:
> > 12 servers? That's a lot, if you don't mind my asking why so many? Typically
> > we recommend 5 - that way you can have one down for maintenance and still
> > have a failure that doesn't bring down the cluster.
> >
> > The "electing a leader" is probably the restarted machine attempting to
> > re-join the ensemble (it should join as a follower if you have a leader
> > already elected, given that its xid is behind the existing leader.) Hard to
> > tell though without the logs.
> >
> > You might also be seeing the initLimit exceeded, is the data you are storing
> > in ZK large? Or perhaps network connectivity is slow?
> >
> > http://hadoop.apache.org/zookeeper/docs/current/zookeeperAdmin.html#sc_clusterOptions
> >
> > again the logs would give some insight on this.
> >
> > Patrick
> >
> > Nick Bailey wrote:
> >>
> >> We are running zookeeper 3.1.0
> >>
> >> Recently we noticed the cpu usage on our machines becoming
> >> increasingly high and we believe the cause is
> >>
> >> https://issues.apache.org/jira/browse/ZOOKEEPER-427
> >>
> >> However our solution when we noticed the problem was to kill the
> >> zookeeper process and restart it.
> >>
> >> After doing that though it looks like the newly restarted zookeeper
> >> server is continually attempting to elect a leader even though one
> >> already exists.
> >>
> >> The process responds with 'imok' when asked, but the stat command
> >> returns 'ZooKeeperServer not running'.
> >>
> >> I believe that killing the current leader should trigger all servers
> >> to do an election and solve the problem, but I'm not sure. Should
> >> that be the course of action in this situation?
> >>
> >> Also we have 12 servers, but 5 are currently not running according to
> >> stat. So I guess this isn't a problem unless we lose another one.
> >> We have plans to upgrade zookeeper to solve the cpu issue but haven't
> >> been able to do that yet.
> >>
> >> Any help appreciated, Nick Bailey