Hmm... Interesting perspective.  From my point of view, if you add a node to an 
odd-numbered set, the reliability of the system will decrease.  This is easiest 
to see between 1 and 2 nodes.  For 2 nodes, if either node fails the whole 
system dies.  P(A and B) <= P(A) -- adding the second node can only decrease 
your reliability.  The same argument holds for any odd-numbered set: going from 
2k+1 to 2k+2 nodes raises the required majority from k+1 to k+2, so the 
ensemble still tolerates only k failures but now has one more node that can 
fail.
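
To put rough numbers on this, here is a minimal sketch in Python.  It assumes 
every node is up independently with the same probability p, which is a 
simplification (real failures are often correlated), and the function names and 
the value p = 0.9 are only illustrative.  It computes the chance that a 
majority of n nodes is alive.

```python
from math import comb


def quorum_size(n):
    """Smallest strict majority of an n-node ensemble."""
    return n // 2 + 1


def availability(n, p):
    """Probability that at least a majority of n nodes is up.

    Assumes each node is up independently with probability p
    (an illustrative model, not a measured property of any cluster).
    """
    q = quorum_size(n)
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(q, n + 1))


if __name__ == "__main__":
    p = 0.9  # hypothetical per-node uptime
    for n in range(1, 7):
        print(n, quorum_size(n), availability(n, p))
```

Under these assumptions, each even-sized ensemble comes out less available 
than the odd-sized one just below it.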

So, no, it is not a *requirement* that ZooKeeper have an odd number of nodes.  
It is just self-defeating to deploy an even number.  Although, if 
you did happen to have a 200 node zookeeper system, it is not going to be that 
much less resilient than a 199 node system.  But that would be silly -- at that 
scale you have other issues that are much more pressing.

Dave

-----Original Message-----
From: Joe Pallas [mailto:[email protected]] On Behalf Of Joe Pallas
Sent: Wednesday, September 21, 2011 9:34 AM
To: [email protected]
Subject: Re: Queries on Zookeeper failure and RegionServer restartup


On Sep 20, 2011, at 8:37 AM, Buttler, David wrote:

> Wait, you do realize that you have to have a majority of zookeeper nodes 
> alive for zookeeper to work, right?  That means that you get lower 
> reliability with two nodes than one node: if either node goes down, zookeeper 
> will give up.  This also implies that you need to have an odd number of nodes 
> in your zookeeper ensemble.

I keep seeing this last part repeated and I believe it is a misunderstanding.  
There is no requirement for an odd number of nodes, although adding a node to 
an odd-numbered set will not increase reliability.  "For this reason, ZooKeeper 
deployments are usually made up of an odd number of machines." 
<http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_CrossMachineRequirements>
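
As a small illustration of the majority rule, the sketch below (Python, with 
a hypothetical helper name) counts how many node failures an ensemble of n 
can survive while still keeping a strict majority alive.

```python
def tolerated_failures(n):
    """Failures an n-node ensemble can absorb while a strict majority survives."""
    quorum = n // 2 + 1  # strict majority
    return n - quorum


if __name__ == "__main__":
    for n in range(1, 8):
        print(n, tolerated_failures(n))
```

Going from 3 to 4 nodes, or from 5 to 6, leaves the count unchanged; it only 
goes up when the ensemble reaches the next odd size.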

joe
