Hi Todd,
  I would strongly recommend 3.1.1 for now. 3.2 has quite a few bugs. 3.2.1
should be out in the next week or so but again, I would suggest going with
3.1.1 on production for now, since we already have a bunch of users running
3.1.1 on production clusters. Upgrading from 3.1.1 to 3.2.1 should be
seamless.

Thanks
mahadev


On 8/6/09 3:53 PM, "Todd Greenwood" <to...@audiencescience.com> wrote:

> Considering that we're opting for a WAN deployment that is not going to
> use groups, weights, etc., and that we are going to implement an
> ensemble-to-ensemble sync mechanism... what version of ZooKeeper do you
> recommend?
> 
>> -----Original Message-----
>> From: Todd Greenwood
>> Sent: Wednesday, August 05, 2009 2:21 PM
>> To: 'zookeeper-dev@hadoop.apache.org'
>> Subject: RE: Optimized WAN ZooKeeper Config : Multi-Ensemble configuration
>> 
>> Mahadev, comments inline:
>> 
>>> -----Original Message-----
>>> From: Mahadev Konar [mailto:maha...@yahoo-inc.com]
>>> Sent: Wednesday, August 05, 2009 1:47 PM
>>> To: zookeeper-dev@hadoop.apache.org
>>> Subject: Re: Optimized WAN ZooKeeper Config : Multi-Ensemble configuration
>>> 
>>> Todd,
>>>  Comments inline:
>>> 
>>> 
>>> On 8/5/09 12:10 PM, "Todd Greenwood" <to...@audiencescience.com> wrote:
>>> 
>>>> Flavio/Patrick/Mahadev -
>>>> 
>>>> Thanks for your support to date. As I understand it, the sticky points
>>>> w/ respect to WAN deployments are:
>>>> 
>>>> 1. Leader Election:
>>>> 
>>>> Leader election in the WAN config (pod ZK server weight = 0) is a bit
>>>> troublesome (ZOOKEEPER-498).
>>> Yes, until ZOOKEEPER-498 is fixed, you won't be able to use it with
>>> groups and zero weight.
>>> 
>>>> 
>>>> 2. Network Connectivity Required:
>>>> 
>>>> ZooKeeper clients cannot read from or write to a ZK server if that
>>>> server does not have network connectivity to the quorum. In short,
>>>> network connectivity is a hard requirement for clients to access the
>>>> shared memory graph in ZK.
>>> Yes
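>>> 
>>> (To make that concrete: a client talking to a pod server that has lost
>>> connectivity to the quorum mostly sees ConnectionLossException until the
>>> link comes back. A rough, untested sketch with the standard Java client;
>>> the hostname and path are just placeholders:)
>>> 
>>> import org.apache.zookeeper.*;
>>> 
>>> public class DisconnectedRead {
>>>     public static void main(String[] args) throws Exception {
>>>         ZooKeeper zk = new ZooKeeper("pod-zk:2181", 30000, event -> {});
>>>         try {
>>>             // Fails while the pod server cannot reach a quorum; the client
>>>             // library retries the connection, not the operation itself.
>>>             byte[] data = zk.getData("/some/znode", false, null);
>>>             System.out.println(new String(data));
>>>         } catch (KeeperException.ConnectionLossException e) {
>>>             // Caller decides: retry, back off, or serve stale local data.
>>>         }
>>>         zk.close();
>>>     }
>>> }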
>>> 
>>>> 
>>>> Alternative
>>>> -----------
>>>> 
>>>> I have seen some discussion in the past re: multi-ensemble solutions.
>>>> Essentially, put one ensemble in each physical location (POD), and
>>>> another in your DC, and have a fairly simple process coordinate the
>>>> synchronization of the various ensembles. If the POD writes can be
>>>> confined to a sub-tree in the master graph, then this should be fairly
>>>> simple. I'm imagining the following:
>>>> 
>>>> DC (master) graph:
>>>> /root/pods/1/data/item1
>>>> /root/pods/1/data/item2
>>>> /root/pods/1/data/item3
>>>> /root/pods/2
>>>> /root/pods/3
>>>> ...etc
>>>> /root/shared/allpods/readonly/data/item1
>>>> /root/shared/allpods/readonly/data/item2
>>>> ...etc
>>>> 
>>>> This has the advantage of minimizing cross-pod traffic, which could be
>>>> a real perf killer in a WAN. It also provides transacted writes in the
>>>> PODs, even in the disconnected state. Clearly, another portion of the
>>>> business logic has to reconcile the DC (master) graph such that each of
>>>> the pods' data items are processed, etc.
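>>>> 
>>>> (Just to make the layout above concrete, a rough, untested sketch of
>>>> seeding the DC master graph with the standard Java client; the hostname
>>>> is a placeholder:)
>>>> 
>>>> import org.apache.zookeeper.*;
>>>> import org.apache.zookeeper.ZooDefs.Ids;
>>>> 
>>>> public class SeedDcGraph {
>>>>     // Create a path and any missing parents, ignoring "already exists".
>>>>     static void ensurePath(ZooKeeper zk, String path) throws Exception {
>>>>         StringBuilder p = new StringBuilder();
>>>>         for (String part : path.substring(1).split("/")) {
>>>>             p.append('/').append(part);
>>>>             try {
>>>>                 zk.create(p.toString(), new byte[0],
>>>>                           Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
>>>>             } catch (KeeperException.NodeExistsException ignored) {}
>>>>         }
>>>>     }
>>>> 
>>>>     public static void main(String[] args) throws Exception {
>>>>         ZooKeeper dc = new ZooKeeper("dc-zk:2181", 30000, event -> {});
>>>>         for (int pod = 1; pod <= 3; pod++) {
>>>>             ensurePath(dc, "/root/pods/" + pod + "/data");
>>>>         }
>>>>         ensurePath(dc, "/root/shared/allpods/readonly/data");
>>>>         dc.close();
>>>>     }
>>>> }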
>>>> 
>>>> Does anyone have any experience with this (pitfalls, suggestions,
>>>> etc.)?
>>> As far as I understand, you mean that you have a master cluster, with
>>> another one in a different data center syncing with the master (just a
>>> subtree)? Is that correct?
>>> 
>>> If yes, this is what one of our users in Yahoo! Search does. They have a
>>> master cluster and a smaller cluster in a different datacenter, and a
>>> bridge that copies data from the master cluster (only a subtree) to the
>>> smaller one and keeps them in sync.
>>> 
>> 
>> Yes, this is exactly what I'm proposing, with the addition that I'll sync
>> subtrees in both directions and have a separate process reconcile data
>> from the various pods, like so:
>> 
>> #pod1 ensemble
>> /root/a/b
>> 
>> #pod2 ensemble
>> /root/a/b
>> 
>> #dc ensemble
>> /root/shared/foo/bar
>> 
>> # Mapping (modeled after perforce client config)
>> # [ensemble]:[path] [ensemble]:[path]
>> # sync pods to dc
>> [POD1]:/root/... [DC]:/root/pods/POD1/...
>> [POD2]:/root/... [DC]:/root/pods/POD2/...
>> # sync dc to pods
>> [DC]:/root/shared/... [POD1]:/shared/...
>> [DC]:/root/shared/... [POD2]:/shared/...
>> [DC]:/root/shared/... [POD3]:/shared/...
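>> 
>> (For the sync process itself, a rough, untested sketch of a one-way copy
>> for a single mapping entry, using the standard Java client; the hostnames
>> are placeholders:)
>> 
>> import java.util.List;
>> import org.apache.zookeeper.*;
>> import org.apache.zookeeper.ZooDefs.Ids;
>> 
>> public class SubtreeBridge {
>>     // Recursively copy the subtree at srcPath on 'src' to dstPath on 'dst'.
>>     // Assumes the parent of dstPath already exists on the target ensemble.
>>     static void copy(ZooKeeper src, String srcPath,
>>                      ZooKeeper dst, String dstPath) throws Exception {
>>         byte[] data = src.getData(srcPath, false, null);
>>         try {
>>             dst.create(dstPath, data, Ids.OPEN_ACL_UNSAFE,
>>                        CreateMode.PERSISTENT);
>>         } catch (KeeperException.NodeExistsException e) {
>>             dst.setData(dstPath, data, -1); // overwrite on re-sync
>>         }
>>         List<String> children = src.getChildren(srcPath, false);
>>         for (String child : children) {
>>             copy(src, srcPath + "/" + child, dst, dstPath + "/" + child);
>>         }
>>     }
>> 
>>     public static void main(String[] args) throws Exception {
>>         ZooKeeper pod1 = new ZooKeeper("pod1-zk:2181", 30000, e -> {});
>>         ZooKeeper dc   = new ZooKeeper("dc-zk:2181", 30000, e -> {});
>>         // [POD1]:/root/... -> [DC]:/root/pods/POD1/...
>>         copy(pod1, "/root", dc, "/root/pods/POD1");
>>         pod1.close();
>>         dc.close();
>>     }
>> }
>> 
>> (A real bridge would also need to handle deletes and pick up changes via
>> watches or periodic polling, but this is the basic shape.)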
>> 
>> Now, for our needs, we'd like the DC data aggregated, so I'll have another
>> process handle aggregating the pod-specific data like so:
>> 
>> POD Data Aggregator: aggregate data in [DC]:/root/pods/POD(N) to
>> [DC]:/root/aggregated/data.
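>> 
>> (Likewise, a rough, untested sketch of what that aggregator might look
>> like; it just copies each pod's data items into /root/aggregated/data,
>> qualified by pod name, and assumes that path already exists:)
>> 
>> import org.apache.zookeeper.*;
>> import org.apache.zookeeper.ZooDefs.Ids;
>> 
>> public class PodDataAggregator {
>>     public static void main(String[] args) throws Exception {
>>         ZooKeeper dc = new ZooKeeper("dc-zk:2181", 30000, e -> {});
>>         for (String pod : dc.getChildren("/root/pods", false)) {
>>             String dataPath = "/root/pods/" + pod + "/data";
>>             if (dc.exists(dataPath, false) == null) continue;
>>             for (String item : dc.getChildren(dataPath, false)) {
>>                 byte[] data = dc.getData(dataPath + "/" + item, false, null);
>>                 // Qualify the item with its pod so pods don't collide.
>>                 String target = "/root/aggregated/data/" + pod + "-" + item;
>>                 try {
>>                     dc.create(target, data, Ids.OPEN_ACL_UNSAFE,
>>                               CreateMode.PERSISTENT);
>>                 } catch (KeeperException.NodeExistsException ex) {
>>                     dc.setData(target, data, -1);
>>                 }
>>             }
>>         }
>>         dc.close();
>>     }
>> }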
>> 
>> This is just off the top of my head.
>> 
>> -Todd
>> 
>>> 
>>> Thanks
>>> mahadev
>>>> 
>>>> -Todd
> 
