Pramod,

Have you looked at the multi command? That might cause you some serious heartburn.
On Sun, May 18, 2014 at 8:25 PM, Pramod Biligiri <[email protected]> wrote:
> Hi,
> [Let me know if you want this thread moved to the Dev list (or even to
> JIRA). I was only seeing automated mails there, so I thought I'd go ahead
> and post here.]
>
> I have been looking at the codebase for the last couple of days (see my
> notes regarding the same here:
> https://docs.google.com/document/d/1TcohOWsUBXehS-E50bYY77p8SnGsF3IrBtu_LleP80U/edit).
>
> We are planning to do a proof of concept for the partitioning concept as
> part of a class project, and to measure any possible performance gains.
> Since we're new to ZooKeeper and short on time, it may not be the *right*
> way to do it, but I hope it can give some pointers for the future.
>
> Design approaches to implement a partitioned ZooKeeper:
>
> For starters, let's assume we only parallelize accesses to paths starting
> with different top-level prefixes, i.e. /app1/child1, /app2/child1,
> /config, etc.
>
> Possible approach:
>
> - Have a different tree object for each top-level node (/app1, /app2,
> etc.). This loosely corresponds to a container in the wiki page [1], and
> to the DataTree class in the codebase.
>
> - As soon as a request comes in, associate it with one of the trees. Since
> each request necessarily has a path associated with it, this is possible.
>
> - Then, all the queues that are used to process requests should operate in
> parallel on these different trees. This can be done by having multiple
> queues - one for each container.
>
> Potential issues:
>
> - Whether ZK code is designed to work with multiple trees instead of just
> one.
>
> - Whether the queuing process (which uses RequestProcessors) is designed
> to handle multiple queues.
>
> - Making sure performance actually improves, and does not degrade!
>
> Discussion:
>
> - Where is the performance benefit actually going to come from?
>
> Intuitively, we might think that parallel trees would give a benefit, but
> since each node logs all change records to disk before applying them,
> isn't disk the throughput bottleneck? If I remember right, the ZK paper
> says that with proper configs they are able to make ZK I/O bound.
>
> So along with having separate trees and associated processing, should we
> also have separate logging to disk for each tree? Will this actually help
> in improving write speeds to disk?
>
> References:
>
> 1. The wiki page:
> http://wiki.apache.org/hadoop/ZooKeeper/PartitionedZookeeper
>
> 2. The JIRA discussion:
> https://issues.apache.org/jira/browse/ZOOKEEPER-646
>
> 3. In this blog post, see the section called "Scalability and Hashing
> Zookeeper clusters":
> http://ria101.wordpress.com/2010/05/12/locking-and-transactions-over-cassandra-using-cage
>
> Thanks,
> Pramod
> --
> http://twitter.com/pramodbiligiri
>
>
> On Fri, May 16, 2014 at 10:56 PM, Pramod Biligiri
> <[email protected]> wrote:
>
> > Thanks Michi,
> > That was a very useful link! :)
> >
> > Pramod
> >
> >
> > On Fri, May 16, 2014 at 3:37 PM, Michi Mutsuzaki
> > <[email protected]> wrote:
> >
> >> Hi Pramod,
> >>
> >> No, it has not been implemented, and I'm not aware of any recipes.
> >> There is an open JIRA for this feature.
> >>
> >> https://issues.apache.org/jira/browse/ZOOKEEPER-646
> >>
> >> On Thu, May 15, 2014 at 12:59 PM, Pramod Biligiri
> >> <[email protected]> wrote:
> >> > Hi,
> >> > The ZooKeeper wiki talks about Partitioned ZooKeeper:
> >> >
> >> > https://cwiki.apache.org/confluence/display/ZOOKEEPER/PartitionedZooKeeper
> >> >
> >> > I wanted to know if that has already been implemented or not. If not,
> >> > are there some recipes which can make ZooKeeper behave in that way?
> >> >
> >> > Thanks.
> >> >
> >> > Pramod
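[Editor's note: the routing step proposed in the thread above (associate each incoming request with one tree by its top-level path prefix, then enqueue it on that tree's queue) can be sketched roughly as below. The names `PartitionRouter`, `partitionOf`, and `submit` are hypothetical and not taken from the ZooKeeper codebase; requests are reduced to bare path strings for illustration.]

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch of per-partition request routing.
public class PartitionRouter {
    // One queue per top-level node (/app1, /app2, /config, ...). In the
    // proposed design, each queue would feed its own RequestProcessor
    // pipeline operating on its own DataTree-like structure.
    private final ConcurrentMap<String, BlockingQueue<String>> queues =
            new ConcurrentHashMap<>();

    // Extract the top-level prefix, e.g. "/app1/child1" -> "/app1".
    public static String partitionOf(String path) {
        int second = path.indexOf('/', 1);
        return second == -1 ? path : path.substring(0, second);
    }

    // Every request carries a path, so routing it to a partition's queue
    // is a constant-time lookup, as the thread observes.
    public void submit(String requestPath) {
        queues.computeIfAbsent(partitionOf(requestPath),
                k -> new LinkedBlockingQueue<>()).add(requestPath);
    }

    // How many requests are pending for a given partition.
    public int queuedFor(String partition) {
        BlockingQueue<String> q = queues.get(partition);
        return q == null ? 0 : q.size();
    }
}
```

One wrinkle this sketch sidesteps is the one raised at the top of the thread: a multi request may touch several prefixes at once and therefore cannot be routed to a single queue.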

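[Editor's note: the per-tree logging question above could be prototyped as one append-only log file per top-level tree, so syncs for different partitions need not serialize behind a single shared transaction log. This is a toy sketch with hypothetical names (`PartitionedLog`, `append`), not ZooKeeper's actual log format; whether it improves write speed depends heavily on whether the logs live on separate devices.]

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical sketch: one append-only change log per partition.
public class PartitionedLog {
    private final Path dir;

    public PartitionedLog(Path dir) {
        this.dir = dir;
    }

    // Map a partition such as "/app1" to its own log file, dir/app1.log.
    Path logFileFor(String partition) {
        return dir.resolve(partition.substring(1) + ".log");
    }

    // Append one change record for a partition. A real server would batch
    // records and fsync; on a single spindle the separate logs still
    // contend for the same disk head, so the benefit mainly appears with
    // separate devices.
    public void append(String partition, String record) throws IOException {
        Files.writeString(logFileFor(partition), record + "\n",
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }
}
```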