RE: Optimized WAN ZooKeeper Config : Multi-Ensemble configuration

Todd Greenwood Thu, 06 Aug 2009 16:02:22 -0700
Thanks! We'll proceed w/ 3.1.1. -Todd

> -----Original Message-----
> From: Mahadev Konar [mailto:[email protected]]
> Sent: Thursday, August 06, 2009 3:56 PM
> To: [email protected]
> Subject: Re: Optimized WAN ZooKeeper Config : Multi-Ensemble
configuration
> 
> Hi Todd,
>   I would strongly recommend 3.1.1 for now. 3.2 has quite a few bugs.
> 3.2.1
> should be out in the next week or so but again, I would suggest to go
with
> 3.1.1 for now on production, since we already have bunch of users
using
> 3.1.1 on production clusters. Upgrading from 3.1.1 to 3.2.1 should be
> seemless.
> 
> Thanks
> mahadev
> 
> 
> On 8/6/09 3:53 PM, "Todd Greenwood" <[email protected]> wrote:
> 
> > Considering that we're opting for a WAN deployment that is not going
to
> > use groups, weights, etc., and that we are going to implement an
> > ensemble-to-ensemble sync mechanism...what version of zookeeper do
you
> > recommend?
> >
> >> -----Original Message-----
> >> From: Todd Greenwood
> >> Sent: Wednesday, August 05, 2009 2:21 PM
> >> To: '[email protected]'
> >> Subject: RE: Optimized WAN ZooKeeper Config : Multi-Ensemble
> > configuration
> >>
> >> Mahadev, comments inline:
> >>
> >>> -----Original Message-----
> >>> From: Mahadev Konar [mailto:[email protected]]
> >>> Sent: Wednesday, August 05, 2009 1:47 PM
> >>> To: [email protected]
> >>> Subject: Re: Optimized WAN ZooKeeper Config : Multi-Ensemble
> >> configuration
> >>>
> >>> Todd,
> >>>  Comments in line:
> >>>
> >>>
> >>> On 8/5/09 12:10 PM, "Todd Greenwood" <[email protected]>
> > wrote:
> >>>
> >>>> Flavio/Patrick/Mahadev -
> >>>>
> >>>> Thanks for your support to date. As I understand it, the sticky
> > points
> >>>> w/ respect to WAN deployments are:
> >>>>
> >>>> 1. Leader Election:
> >>>>
> >>>> Leader elections in the WAN config (pod zk server weight = 0) is
a
> > bit
> >>>> troublesome (ZOOKEEPER-498)
> >>> Yes, until ZOOKEEPER-498 is fixed, you wont be able to use it with
> >> groups
> >>> and zero weight.
> >>>
> >>>>
> >>>> 2. Network Connectivity Required:
> >>>>
> >>>> ZooKeeper clients cannot read/write to ZK Servers if the Server
> > does
> >> not
> >>>> have network connectivity to the quorum. In short, there is a
hard
> >>>> requirement to have network connectivity in order for the clients
> > to
> >>>> access the shared memory graph in ZK.
> >>> Yes
> >>>
> >>>>
> >>>> Alternative
> >>>> -----------
> >>>>
> >>>> I have seen some discussion about in the past re: multi-ensemble
> >>>> solutions. Essentially, put one ensemble in each physical
location
> >>>> (POD), and another in your DC, and have a fairly simple process
> >>>> coordinate synchronizing the various ensembles. If the POD writes
> > can
> >> be
> >>>> confined to a sub-tree in the master graph, then this should be
> > fairly
> >>>> simple. I'm imagining the following:
> >>>>
> >>>> DC (master) graph:
> >>>> /root/pods/1/data/item1
> >>>> /root/pods/1/data/item2
> >>>> /root/pods/1/data/item3
> >>>> /root/pods/2
> >>>> /root/pods/3
> >>>> ...etc
> >>>> /root/shared/allpods/readonly/data/item1
> >>>> /root/shared/allpods/readonly/data/item2
> >>>> ...etc
> >>>>
> >>>> This has the advantage of minimizing cross pod traffic, which
> > could be
> >> a
> >>>> real perf killer in an WAN. It also provides transacted writes in
> > the
> >>>> PODs, even in the disconnected state. Clearly, another portion of
> > the
> >>>> business logic has to reconcile the DC (master) graph such that
> > each
> >> of
> >>>> the pods data items are processed, etc.
> >>>>
> >>>> Does anyone have any experience with this (pitfalls, suggestions,
> >> etc.?)
> >>> As far as I understand is that you mean that have a master Cluster
> > with
> >>> other in a different data center syncing with the master (just a
> >> subtree)?
> >>> Is that correct?
> >>>
> >>> If yes, this is what one of our users in Yahoo! Search do. They
have
> > a
> >>> master cluster and a smaller cluster in a different datacenter and
a
> >>> brdige
> >>> that copies data from the master cluster (only a subtree) to the
> > smaller
> >>> one
> >>> and keeps them in syncs.
> >>>
> >>
> >> Yes, this is exactly what I'm proposing. With the addition that
I'll
> > sync
> >> subtrees in both directions, and have a separate process reconcile
> > data
> >> from the various pods, like so:
> >>
> >> #pod1 ensemble
> >> /root/a/b
> >>
> >> #pod2 ensemble
> >> /root/a/b
> >>
> >> #dc ensemble
> >> /root/shared/foo/bar
> >>
> >> # Mapping (modeled after perforce client config)
> >> # [ensemble]:[path] [ensemble]:[path]
> >> # sync pods to dc
> >> [POD1]:/root/... [DC]:/root/pods/POD1/...
> >> [POD2]:/root/... [DC]:/root/pods/POD2/...
> >> # sync dc to pods
> >> [DC]:/root/shared/... [POD1]:/shared/...
> >> [DC]:/root/shared/... [POD2]:/shared/...
> >> [DC]:/root/shared/... [POD3]:/shared/...
> >>
> >> Now, for our needs, we'd like the DC data aggregated, so I'll have
> > another
> >> process handle aggregating the pod specific data like so:
> >>
> >> POD Data Aggregator: aggregate data in [DC]:/root/pods/POD(N) to
> >> [DC]:/root/aggregated/data.
> >>
> >> This is just off the top of my head.
> >>
> >> -Todd
> >>
> >>>
> >>> Thanks
> >>> mahadev
> >>>>
> >>>> -Todd
> >
RE: Optimized WAN ZooKeeper Config : Multi-Ensemble configuration

Reply via email to