On Jul 6, 2009, at 15:40, Henry Robinson wrote:
This is an interesting way of doing things. It seems like there is a
correctness issue: if a majority of servers fail, with the remaining
minority lagging the leader for some reason, won't the ensemble's
state be forever lost? This is akin to a majority of servers failing
and never recovering. ZK relies on the eventual liveness of a majority
of servers; with EC2 it seems possible that that property might not hold.
I think you are absolutely correct. However, my understanding of EC2
failure modes is that even though there is no guarantee that a
particular instance's disk will survive a failure, it is entirely possible
to observe EC2 nodes that "fail" temporarily (such as by rebooting). In
those cases the instance's disk typically does survive, and when the node
comes back it will have the same contents. It is only with "permanent" EC2
failures that the disk is gone (e.g. a hardware failure, or Amazon
decides to pull the instance for some other reason).
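
To make the "majority" arithmetic concrete, here is a tiny illustrative
snippet (not anything from ZooKeeper itself) that checks whether a set of
surviving servers can still form a quorum; the ensemble size and survivor
counts are just example numbers:

def quorum_size(ensemble_size):
    # A ZooKeeper ensemble of N servers needs a strict majority
    # (floor(N/2) + 1) of them alive to elect a leader and commit writes.
    return ensemble_size // 2 + 1

def can_make_progress(ensemble_size, survivors):
    return survivors >= quorum_size(ensemble_size)

if __name__ == "__main__":
    # A 5-node ensemble tolerates 2 permanent failures...
    print(can_make_progress(5, 3))   # True
    # ...but loses liveness (and, if the failed disks are gone for good,
    # possibly committed state) once only 2 nodes survive.
    print(can_make_progress(5, 2))   # False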
Thus, this looks a lot like running your own machines in your own data
center to me. Soft failures will recover, hardware failures won't. The
only difference is that if you were running the machines yourself, and
you ran into some weird issue where you had hardware failures across a
majority of your Zookeeper ensemble, you could physically move the
disks to recover the state. If this happens in EC2, you will have to
do some sort of "manual" repair where you forcibly restart Zookeeper
using the state of one of the surviving members. Some Zookeeper
operations may be lost in this case.
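
For what it's worth, here is a rough sketch of what that manual repair
could look like: copy the dataDir from a surviving member, bring it up as
a standalone server, then grow the ensemble back out. The paths are made
up and the zoo.cfg values are only typical defaults, so treat this as an
illustration rather than a recipe:

import shutil
from pathlib import Path

# Hypothetical locations; adjust for your deployment.
SURVIVOR_DATADIR = Path("/var/zookeeper/data")        # snapshot + txn logs of the surviving member
RECOVERY_DATADIR = Path("/var/zookeeper/recovered")   # dataDir for the rebuilt, standalone server
RECOVERY_CFG     = Path("/etc/zookeeper/zoo.cfg")

def build_recovery_config(datadir: Path) -> str:
    # Minimal standalone zoo.cfg: with no server.N lines, ZooKeeper runs
    # as a single node, so no quorum is needed to come back up.
    return (
        "tickTime=2000\n"
        "initLimit=10\n"
        "syncLimit=5\n"
        f"dataDir={datadir}\n"
        "clientPort=2181\n"
    )

def recover_from_survivor() -> None:
    # 1. Copy the surviving member's data directory; its last snapshot and
    #    transaction logs are the most recent state we can recover.
    shutil.copytree(SURVIVOR_DATADIR, RECOVERY_DATADIR)
    # 2. Write a standalone config pointing at the copied data.
    RECOVERY_CFG.write_text(build_recovery_config(RECOVERY_DATADIR))
    # 3. Start ZooKeeper against this config, verify the data looks sane,
    #    then add the other servers back with a new ensemble config. Any
    #    writes committed by the lost majority that this survivor never
    #    saw are gone for good.

if __name__ == "__main__":
    recover_from_survivor()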
However, we are talking about a situation that seems exceedingly rare.
No matter what kind of system you are running, serious non-recoverable
failures will happen, so I don't see this as an impediment to
running Zookeeper or other quorum systems in EC2.
That said, I haven't run enough EC2 instances for a long enough period
of time to observe any serious failures or recoveries. If anyone has
more detailed information, I would love to hear about it.