We have used EC2 quite a bit for ZK.
The basic lessons that I have learned include:
a) EC2's biggest advantage after scaling and elasticity was conformity of
configuration. Since you are bringing machines up and down all the time,
they begin to act more like programs and you wind up with boot scripts that
give you a very predictable environment. Nice.
b) EC2 interconnect has a lot more going on than in a dedicated VLAN. That
can make the ZK servers appear a bit less connected. You have to plan for
that.
c) for highest reliability, I switched to large instances. On reflection, I
think that was helpful, but less important than I thought at the time.
d) increasing and decreasing cluster size is nearly painless and is easily
scriptable. To decrease, do a rolling update on the survivors to update
their configuration. Then take down the instance you want to lose. To
increase, do a rolling update starting with the new instances to update the
configuration to include all of the machines. The rolling update should
bounce each ZK server in turn, with several seconds between bounces.
Rescaling the cluster takes less than a minute, which makes it comparable
to EC2 instance boot time (about 30 seconds for the Alestic Ubuntu instance
that we used plus about 20 seconds for additional configuration).
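
To make (d) concrete, here is a rough Python sketch of the rescaling order. It
is not the script we actually ran. The callbacks push_config_and_bounce and
terminate are hypothetical placeholders for whatever your boot scripts do, the
5 second pause is an assumed value for "several seconds", and ports 2888/3888
are simply the conventional ones from the ZK docs for server.N lines.

```python
import time
from typing import Callable, Dict, List, Tuple


def server_lines(members: Dict[int, str],
                 peer_port: int = 2888,
                 election_port: int = 3888) -> List[str]:
    """Build the server.N lines of zoo.cfg for the given {id: host} map."""
    return [f"server.{sid}={host}:{peer_port}:{election_port}"
            for sid, host in sorted(members.items())]


def plan_resize(current: Dict[int, str],
                target: Dict[int, str]) -> Tuple[List[int], List[int]]:
    """Return (bounce_order, ids_to_terminate).

    New instances are bounced first, then the survivors. Instances that
    are leaving the cluster are never bounced, only terminated afterwards.
    """
    new_ids = sorted(set(target) - set(current))
    survivors = sorted(set(target) & set(current))
    removed = sorted(set(current) - set(target))
    return new_ids + survivors, removed


def rolling_update(current: Dict[int, str],
                   target: Dict[int, str],
                   push_config_and_bounce: Callable[[int, str, List[str]], None],
                   terminate: Callable[[int, str], None],
                   pause_seconds: float = 5.0,  # assumed "several seconds"
                   sleep: Callable[[float], None] = time.sleep) -> None:
    """Bounce each server with the new config, pausing between bounces,
    then take down the instances that are no longer in the target set."""
    order, removed = plan_resize(current, target)
    lines = server_lines(target)
    for i, sid in enumerate(order):
        push_config_and_bounce(sid, target[sid], lines)
        if i < len(order) - 1:
            sleep(pause_seconds)
    for sid in removed:
        terminate(sid, current[sid])
```

Growing from 3 to 5 servers bounces the two new ones first and then the
original three. Shrinking from 5 to 3 bounces only the three survivors and
then terminates the other two.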
On Mon, Jul 6, 2009 at 4:45 AM, David Graf <david.g...@28msec.com> wrote:
> I wanna set up a zookeeper ensemble on amazon's ec2 service. In my system,
> zookeeper is used to run a locking service and to generate unique id's.
> Currently, for testing purposes, I am only running one instance. Now, I need
> to set up an ensemble to protect my system against crashes.
> The ec2 service has some differences from a normal server farm. E.g. the
> data saved on the file system of an ec2 instance is lost if the instance
> crashes. In the documentation of zookeeper, I have read that zookeeper saves
> snapshots of the in-memory data in the file system. Is that needed for
> recovery? Logically, it would be much easier for me if this is not the case.
> Additionally, ec2 brings the advantage that servers can be switched on and
> off dynamically depending on the load, traffic, etc. Can this advantage be
> utilized for a zookeeper ensemble? Is it possible to add a zookeeper server
> dynamically to an ensemble? E.g. depending on the in-memory load?