Re: Partitioned Zookeeper

Pramod Biligiri Sun, 18 May 2014 21:37:23 -0700

Hi Ted,
I was only vaguely familiar with multi, so I just took a look at that part
of the code. It seems to provide a way to implement transactions.


I guess you mean that you can't parallelize the workload because a multi
command might require locking all the containers? Let me know if I'm
missing something.
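
To check my understanding, here is a rough sketch in Python (not ZooKeeper
code; container_for, PartitionedStore and its methods are names I made up
for illustration). It routes each path to a container by its top-level
prefix, and shows that a multi touching /app1 and /app2 has to hold both
containers' locks at once. It only models plain writes, whereas the real
multi supports other operation types as well.

```python
import threading
from contextlib import ExitStack


def container_for(path: str) -> str:
    """Map a znode path to its top-level prefix, e.g. '/app1/child1' -> '/app1'."""
    if not path.startswith("/"):
        raise ValueError(f"znode paths must be absolute: {path!r}")
    return "/" + path.split("/")[1]


class PartitionedStore:
    """Toy stand-in for one tree per top-level node (hypothetical, not ZooKeeper code)."""

    def __init__(self):
        self._trees = {}  # container -> {path: data}
        self._locks = {}  # container -> lock guarding that container's tree
        self._registry_lock = threading.Lock()

    def _lock_for(self, container):
        # Create the tree and its lock lazily, guarded by a registry lock.
        with self._registry_lock:
            self._trees.setdefault(container, {})
            return self._locks.setdefault(container, threading.Lock())

    def set_data(self, path, data):
        # A single-path write only needs the lock of its own container.
        container = container_for(path)
        with self._lock_for(container):
            self._trees[container][path] = data

    def multi(self, ops):
        """Apply every (path, data) write while holding all involved containers."""
        containers = sorted({container_for(path) for path, _ in ops})
        with ExitStack() as stack:
            # Acquire in sorted order so two concurrent multis cannot deadlock.
            for container in containers:
                stack.enter_context(self._lock_for(container))
            for path, data in ops:
                self._trees[container_for(path)][path] = data
        return containers  # the containers this multi had to lock

    def get_data(self, path):
        return self._trees.get(container_for(path), {}).get(path)


store = PartitionedStore()
store.set_data("/app1/a", b"1")
locked = store.multi([("/app2/x", b"2"), ("/app1/a", b"3")])
```

With this layout, single-path writes to /app1 and /app2 never contend, but a
multi that spans both has to serialize against writers in both containers.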

Thanks,
Pramod


On Sun, May 18, 2014 at 9:12 PM, Ted Dunning <[email protected]> wrote:

> Pramod,
>
> Have you looked at the multi command?  That might cause you some serious
> heartburn.
>
>
>
>
> On Sun, May 18, 2014 at 8:25 PM, Pramod Biligiri
> <[email protected]> wrote:
>
> > Hi,
> > [Let me know if you want this thread moved to the Dev list (or even to
> > JIRA). I was only seeing automated mails there, so I thought I'd go ahead
> > and post here.]
> >
> > I have been looking at the codebase for the last couple of days (see my
> > notes here:
> > https://docs.google.com/document/d/1TcohOWsUBXehS-E50bYY77p8SnGsF3IrBtu_LleP80U/edit
> > ).
> >
> > We are planning to do a proof-of-concept for the partitioning concept as
> > part of a class project, and measure any possible performance gains. Since
> > we're new to Zookeeper and short on time, it may not be the *right* way to
> > do it, but I hope it can give some pointers for the future.
> >
> > Design approaches to implement a partitioned Zookeeper
> >
> > For starters, let's assume we only parallelize accesses to paths starting
> > with different top-level prefixes, e.g. /app1/child1, /app2/child1,
> > /config, etc.
> >
> > Possible approach:
> >
> > Have a different tree object for each top-level node (/app1, /app2 etc).
> > This loosely corresponds to a container in the Wiki page [1], and
> > corresponds to the DataTree class in the codebase
> >
> > - As soon as a request comes in, associate it with one of the trees. Since
> > each request necessarily has a path associated with it, this is possible.
> >
> > - Then, all the queues that are used to process requests should operate
> > in parallel on these different trees. This can be done by having multiple
> > queues, one for each container.
> >
> > Potential issues:
> >
> > - Whether ZK code is designed to work with multiple trees instead of just
> > one
> >
> > - Whether the queuing process (which uses RequestProcessors) is designed to
> > handle multiple queues
> >
> > - Make sure performance actually improves, and does not degrade!
> >
> > Discussion:
> >
> > - Where is the performance benefit actually going to come from?
> >
> > Intuitively, we might think that parallel trees might give a benefit, but
> > since each node logs all change records to disk before applying them, isn't
> > disk the throughput bottleneck? If I remember right, the ZK paper says that
> > with proper configs, they are able to make ZK I/O bound.
> >
> > So along with having separate trees and associated processing, should we
> > also have separate logging to disk for each tree? Will this actually help
> > in improving write speeds to disk?
> >
> > References:
> >
> > 1. The wiki page:
> > http://wiki.apache.org/hadoop/ZooKeeper/PartitionedZookeeper
> >
> > 2. The JIRA discussion:
> > https://issues.apache.org/jira/browse/ZOOKEEPER-646
> >
> > 3. In this blog post, see the section called Scalability and Hashing
> > Zookeeper clusters:
> > http://ria101.wordpress.com/2010/05/12/locking-and-transactions-over-cassandra-using-cage
> >
> > Thanks,
> > Pramod
> > --
> > http://twitter.com/pramodbiligiri
> >
> >
> > On Fri, May 16, 2014 at 10:56 PM, Pramod Biligiri
> > <[email protected]> wrote:
> >
> > > Thanks Michi,
> > > That was a very useful link! :)
> > >
> > > Pramod
> > >
> > >
> > > On Fri, May 16, 2014 at 3:37 PM, Michi Mutsuzaki <[email protected]>
> > > wrote:
> > >
> > >> Hi Pramod,
> > >>
> > >> No it has not been implemented, and I'm not aware of any recipes.
> > >> There is an open JIRA for this feature.
> > >>
> > >> https://issues.apache.org/jira/browse/ZOOKEEPER-646
> > >>
> > >> On Thu, May 15, 2014 at 12:59 PM, Pramod Biligiri
> > >> <[email protected]> wrote:
> > >> > Hi,
> > >> > The Zookeeper wiki talks about Partitioned Zookeeper:
> > >> >
> > >> > https://cwiki.apache.org/confluence/display/ZOOKEEPER/PartitionedZooKeeper
> > >> >
> > >> > I wanted to know if that has already been implemented or not. If not,
> > >> > are there some recipes which can make Zookeeper behave in that way?
> > >> >
> > >> > Thanks.
> > >> >
> > >> > Pramod
> > >>
> > >
> > >
> >
>
