Re: Achieving quorum with only half of the nodes

2010-07-15 Thread Sergei Babovich
Thanks, Ted. This would solve the power feed problem. Unfortunately we have a similar situation with the switches: failure of one will bring half of the servers down. It is understood that changing or adjusting the hardware infrastructure can solve the problem. Unfortunately this is not under my control. S

Re: Achieving quorum with only half of the nodes

2010-07-15 Thread Ted Dunning
A small rack-mounted UPS doesn't require a full-scale rebuild of infrastructure and would get you through almost all power failure scenarios. If you have 5 ZK servers, put 3 on one power source and give one of them the UPS. Then put the other 2 on the second power source. If power source A fai
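The arithmetic behind this layout can be checked with a small sketch. It assumes plain majority quorums and that the UPS-backed server stays up when its feed fails; the server names (`zk1` to `zk5`) and feed labels are hypothetical and not taken from the thread.

```python
# Hypothetical layout: names and feed labels are illustrative only.
SERVERS = {
    "zk1": {"feed": "A", "ups": False},
    "zk2": {"feed": "A", "ups": False},
    "zk3": {"feed": "A", "ups": True},   # the one server given the UPS
    "zk4": {"feed": "B", "ups": False},
    "zk5": {"feed": "B", "ups": False},
}

def survivors(servers, failed_feed):
    """Servers still running after failed_feed goes away.

    Assumption: a UPS-backed server keeps running through the outage.
    """
    return [name for name, s in servers.items()
            if s["feed"] != failed_feed or s["ups"]]

def has_majority(alive_count, ensemble_size):
    """Majority quorum: strictly more than half of the ensemble."""
    return alive_count > ensemble_size // 2

for feed in ("A", "B"):
    alive = survivors(SERVERS, feed)
    print(feed, alive, has_majority(len(alive), len(SERVERS)))
```

Under these assumptions, losing either feed leaves three of the five servers running, which is still a majority.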

Re: Achieving quorum with only half of the nodes

2010-07-15 Thread Sergei Babovich
Thanks, Flavio, I appreciate your feedback. Three power sources obviously would solve the problem. Unfortunately at this moment it does not seem to be feasible (we would need to rebuild the whole existing infrastructure). This is the main reason why I am exploring a possible alternative (besides t

Re: Achieving quorum with only half of the nodes

2010-07-15 Thread Flavio Junqueira
Your EC2 suggestion sounds reasonable. If your deployment is able to form a local quorum most of the time, then you would be able to get a quorum of acks most of the time. One concern is that the EC2 replica might lag behind badly, which may force the leader to either slow down or to drop the conne

Re: Achieving quorum with only half of the nodes

2010-07-14 Thread Ted Dunning
On Wed, Jul 14, 2010 at 2:16 PM, Sergei Babovich wrote:
> Yep... I see. This is a problem. Any better idea?
I think that the production of slightly elaborate quorum rules to handle specific failure modes isn't a reasonable thing. What you need to do in conjunction is to estimate likelihoods of

Re: Achieving quorum with only half of the nodes

2010-07-14 Thread Sergei Babovich
Thanks, Flavio, Yep... I see. This is a problem. Any better idea? As an alternative option we could probably consider running a single ZK node on EC2 - only in order to handle this specific case. Does it make sense to you? Is it feasible? Would it result in a considerable performance impact due to

Re: Achieving quorum with only half of the nodes

2010-07-14 Thread Sergei Babovich
Just another implementation of QuorumVerifier (based on an existing implementation: either majority or hierarchical quorums). The hierarchical quorum is probably the simplest to adjust - it already has a notion of groups, etc.
On 07/14/2010 04:46 PM, Benjamin Reed wrote: by custom QuorumVerifier are you r

Re: Achieving quorum with only half of the nodes

2010-07-14 Thread Flavio Junqueira
Hi Sergei, I'm not sure what the implementation of QuorumVerifier you have in mind would look like to make your setting work. Even if you don't have partitions, variation in message delays can cause inconsistencies in your ZooKeeper cluster. Keep in mind that we make the assumption that quorums int
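The intersection assumption can be illustrated with a brute-force check. This is only a sketch on a hypothetical 4-node ensemble (node names `n1` to `n4` are made up): it compares majority-sized quorums with quorums made of exactly half of the nodes, which is what the subject line asks for.

```python
from itertools import combinations

def quorums(nodes, size):
    """All node subsets of exactly `size` members (the minimal quorums)."""
    return [set(c) for c in combinations(nodes, size)]

def all_intersect(candidate_quorums):
    """True if every pair of quorums shares at least one node."""
    return all(a & b for a, b in combinations(candidate_quorums, 2))

nodes = ["n1", "n2", "n3", "n4"]  # hypothetical ensemble

# A majority of 4 is 3: any two 3-node subsets overlap.
print(all_intersect(quorums(nodes, 3)))

# Half of 4 is 2: {n1, n2} and {n3, n4} are disjoint, so two groups
# could each believe they hold a quorum at the same time.
print(all_intersect(quorums(nodes, 2)))
```

Checking only the minimal quorums is enough here, since adding nodes to a quorum can only enlarge its overlap with the others.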

Re: Achieving quorum with only half of the nodes

2010-07-14 Thread Benjamin Reed
by custom QuorumVerifier are you referring to http://hadoop.apache.org/zookeeper/docs/r3.3.1/zookeeperHierarchicalQuorums.html ?
ben
On 07/14/2010 12:43 PM, Sergei Babovich wrote: Hi, We are currently evaluating use of ZK in our infrastructure. In our setup we have a set of servers running fr

Achieving quorum with only half of the nodes

2010-07-14 Thread Sergei Babovich
Hi, We are currently evaluating use of ZK in our infrastructure. In our setup we have a set of servers running from two different power feeds. If one power feed goes away, so do half of the servers. This makes it problematic to configure a ZK ensemble that would tolerate such an outage. The network p
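Why the two-feed setup is awkward for a majority-based ensemble can be seen from a short enumeration. This sketch assumes plain majority quorums, no UPS, and every server attached to exactly one of two feeds; the function names are illustrative.

```python
def has_majority(alive_count, ensemble_size):
    """Majority quorum: strictly more than half of the ensemble."""
    return alive_count > ensemble_size // 2

def worst_case_survivors(on_feed_a, on_feed_b):
    """Losing the feed that powers more servers leaves the smaller group."""
    return min(on_feed_a, on_feed_b)

def tolerant_splits(ensemble_size):
    """Splits (a, b) across two feeds that keep a majority if either feed fails."""
    return [(a, ensemble_size - a)
            for a in range(ensemble_size + 1)
            if has_majority(worst_case_survivors(a, ensemble_size - a),
                            ensemble_size)]

for n in range(3, 8):
    print(n, tolerant_splits(n))
```

The list comes out empty for every ensemble size tried: the smaller group never exceeds half of the ensemble, so no way of dividing the servers between two feeds survives the loss of either feed under these assumptions.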