Hi, I would like to discuss deployment scenarios for apps and ZooKeeper in AWS. I have been trying to find information about this, but haven't found much yet.
We have been working on better redundancy for our apps - hundreds of VMs running 24x7 - and ZooKeeper is one component we introduced last year. While it is working fine, there are some little tricks and missing pieces in the current setup. I would really like to hear how others are configuring their apps and ZooKeeper in AWS.

Trying to summarise our current setup:

- One autoscaling group (ASG) for the 5 ZooKeeper servers per application; the ASG will replace an instance with a fresh one if it goes bad.
- The ZooKeeper servers each have their assigned elastic IP, and their zoo.cfg lists these elastic IPs in the server.N lines - the IP addresses directly, not the names. We can swap a ZooKeeper VM by terminating it and letting the ASG create a new one; once it is assigned the freed elastic IP, it joins the ZooKeeper cluster.
- The ZooKeeper security group explicitly allows those 5 elastic IPs on ports 2888 and 3888, plus the SGs of our app servers.
- The image we use for the ZooKeeper ASG contains a small extra service which takes care of automatically assigning the configured elastic IPs to its ASG members. So when a new server boots up, the remaining ASG members will assign the missing elastic IP to the new instance, and it will start ZooKeeper and join the cluster. The same image is used for all app deployments, with the user data of the ASG specifying the elastic IPs and some other details. One image gives us a redundant, self-healing ZooKeeper cluster per application.
- The application servers are spread across different SGs depending on their needs and roles, and the connect string is configured with logical names like zookeeperX.app.domain.org for X=1,2,3,4,5. We manually added mappings from these to the EC2 public hostname of the elastic IPs - like ec2-A-B-C-D.compute-1.amazonaws.com, with A.B.C.D being the corresponding elastic IP.
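To make the zoo.cfg bullet above concrete, here is a minimal sketch of such a config. The elastic IPs are placeholders from the documentation range (203.0.113.x), and the timing values are common defaults, not our exact settings:

```
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/var/lib/zookeeper
clientPort=2181
# server.N lines list the elastic IPs directly, not DNS names
server.1=203.0.113.11:2888:3888
server.2=203.0.113.12:2888:3888
server.3=203.0.113.13:2888:3888
server.4=203.0.113.14:2888:3888
server.5=203.0.113.15:2888:3888
```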
This has the great benefit that our application VMs, when looking up these logical zookeeperX.app.domain.org names, resolve them to the current private IP of that ZooKeeper server instance, and when connecting, the SG will let them through. (If we used A.B.C.D directly, we would need to provision each application VM explicitly in the SG of the ZooKeeper cluster - hundreds of servers which change somewhat from week to week.)

- We use Curator for leader election to pick which server is doing which role, and we run some 5-10% more servers than the roles we need. Each server holds on to its role until it loses its session, and a spare server jumps in to take over. So if an app server goes bad (e.g. EBS, networking, or it just disappears), one of the others takes over.
- We changed Curator's LeaderLatch somewhat to hang on to the leadership during suspend events, waiting for the reconnect or lost events. A leadership role is an expensive thing due to the large amount of state and data caching in each server, which is needed for performance. This means that when one of the ZooKeeper servers goes bad, it is not the case that about one fifth of our servers lose their role - they have some 30 seconds to reconnect to the remaining servers and continue their session there.

The current issues we have are the following:

- A while back there was a networking issue in AWS which caused traffic between the ZooKeeper servers to be partially blocked for some minutes. The ZooKeeper cluster lost its leader, and re-election failed. The app came to a grinding halt. Not good. We have been working on adding keep-alive packets on the election ports between the servers, which we identified as a working solution for that issue. We simulate the problems via iptables. We hope to get that patch submitted in the near future for consideration. This has been reported a while back, with discussions on the best way forward, e.g.
https://issues.apache.org/jira/browse/ZOOKEEPER-1748 (we would prefer application-level keepalive packets instead of the lower-level TCP keepalive socket options).
- While the replacement of a ZooKeeper VM instance works great, there is one remaining issue: how do the application VMs learn about the changed name-to-IP mapping? zookeeperX.app.domain.org no longer maps to the same private IP - the replacement VM has a different one. We tried to work around this by changing the connect string to a shuffled replacement, but that expired the sessions and thus caused the leader latches to close, and in some cases servers could not get their old role back because some spares got there first. We now have a prototype working where we use a special HostProvider implementation which resolves from name to IP when next() is called, instead of at construction time as the default StaticHostProvider does. This means that after the mapping changes, the ZooKeeper client has the new private IP address to connect to. In addition, this does not end the ZooKeeper session, so the leader latches remain (we use a session timeout of about 1-2 minutes). This solution requires a small fix and addition to the ZooKeeper class to enable passing a custom HostProvider; see: https://issues.apache.org/jira/browse/ZOOKEEPER-2107

Hope this helps others running on AWS, and please share your experiences!

thanks
Robert
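PS: for anyone curious, the core idea of that HostProvider can be sketched in plain Java as below. This is only an illustration with made-up names (ReResolvingHostList, etc.), not the actual patch attached to ZOOKEEPER-2107 - the point is just that the DNS lookup happens on every next() call rather than once at construction:

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.UnknownHostException;
import java.util.List;

// Illustrative sketch: resolve hostnames on every next() call instead of
// once at construction (as StaticHostProvider does), so a replacement
// ZooKeeper VM behind the same DNS name gets picked up automatically.
// Class and method names here are ours, not the real ZOOKEEPER-2107 patch.
class ReResolvingHostList {
    private final List<String> hosts;  // e.g. "zookeeper1.app.domain.org"
    private final int port;            // client port, e.g. 2181
    private int index = -1;

    ReResolvingHostList(List<String> hosts, int port) {
        this.hosts = hosts;
        this.port = port;
    }

    // Round-robin over the configured servers, doing a fresh DNS lookup
    // each time so a changed name-to-IP mapping is seen on the next attempt.
    InetSocketAddress next() throws UnknownHostException {
        index = (index + 1) % hosts.size();
        InetAddress current = InetAddress.getByName(hosts.get(index));
        return new InetSocketAddress(current, port);
    }
}
```

Because the session is kept alive by the client, swapping the underlying IP this way does not expire the leader latches.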
