Hi,
[Let me know if you want this thread moved to the Dev list (or even to JIRA). I was only seeing automated mails there, so I thought I'd go ahead and post here.]
I have been looking at the codebase for the last couple of days (see my notes here: https://docs.google.com/document/d/1TcohOWsUBXehS-E50bYY77p8SnGsF3IrBtu_LleP80U/edit ). We are planning to do a proof-of-concept of the partitioning concept as part of a class project, and to measure any performance gains. Since we're new to ZooKeeper and short on time, it may not be the *right* way to do it, but I hope it can give some pointers for the future.

Design approaches to implement a partitioned ZooKeeper

For starters, let's assume we only parallelize accesses to paths starting with different top-level prefixes, i.e. /app1/child1, /app2/child1, /config, etc.

Possible approach: Have a different tree object for each top-level node (/app1, /app2, etc.). This loosely corresponds to a container in the wiki page [1], and to the DataTree class in the codebase.
- As soon as a request comes in, associate it with one of the trees. Since each request necessarily has a path associated with it, this is possible.
- Then, all the queues used to process requests should operate in parallel on these different trees. This can be done by having multiple queues - one for each container.

Potential issues:
- Whether the ZK code is designed to work with multiple trees instead of just one
- Whether the queuing process (which uses RequestProcessors) is designed to handle multiple queues
- Making sure performance actually improves, and does not degrade!

Discussion:
- Where is the performance benefit actually going to come from? Intuitively, we might think that parallel trees would give a benefit, but since each node logs all change records to disk before applying them, isn't the disk the throughput bottleneck? If I remember right, the ZK paper says that with proper configuration they are able to make ZK I/O bound. So along with having separate trees and associated processing, should we also have separate logging to disk for each tree?
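To make the routing idea concrete, here is a rough sketch of what the "associate each request with a tree" step could look like. None of these names (PartitionRouter, topLevelPrefix, the per-container queues) exist in the ZK codebase; this is just a standalone illustration of prefix-based dispatch, with the container queues standing in for per-tree RequestProcessor pipelines:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical sketch: route each request to a per-container queue based on
// the top-level component of its path, e.g. /app1/child1 -> container /app1.
public class PartitionRouter {
    // One queue per container; in the real design each queue would feed a
    // separate RequestProcessor chain operating on its own DataTree.
    private final ConcurrentMap<String, BlockingQueue<String>> queues =
            new ConcurrentHashMap<>();

    // Extract the top-level prefix: "/app1/child1" -> "/app1", "/config" -> "/config".
    static String topLevelPrefix(String path) {
        int second = path.indexOf('/', 1);
        return second == -1 ? path : path.substring(0, second);
    }

    // Enqueue the request on its container's queue, creating the queue lazily.
    public void route(String requestPath) throws InterruptedException {
        String prefix = topLevelPrefix(requestPath);
        queues.computeIfAbsent(prefix, p -> new ArrayBlockingQueue<>(1024))
              .put(requestPath);
    }

    public static void main(String[] args) throws Exception {
        PartitionRouter r = new PartitionRouter();
        r.route("/app1/child1");
        r.route("/app2/child1");
        r.route("/config");
        System.out.println(topLevelPrefix("/app1/child1")); // /app1
        System.out.println(r.queues.size());               // 3
    }
}
```

Since every request carries a path, this dispatch is cheap and deterministic; the open question is whether everything downstream of it (queues, trees, and possibly the transaction log) can really run independently per container.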
Will this actually help in improving write speeds to disk?

References:
1. The wiki page: http://wiki.apache.org/hadoop/ZooKeeper/PartitionedZookeeper
2. The JIRA discussion: https://issues.apache.org/jira/browse/ZOOKEEPER-646
3. In this blog post, see the section called "Scalability and Hashing Zookeeper clusters": http://ria101.wordpress.com/2010/05/12/locking-and-transactions-over-cassandra-using-cage

Thanks,
Pramod
--
http://twitter.com/pramodbiligiri

On Fri, May 16, 2014 at 10:56 PM, Pramod Biligiri <[email protected]> wrote:
> Thanks Michi,
> That was a very useful link! :)
>
> Pramod
>
>
> On Fri, May 16, 2014 at 3:37 PM, Michi Mutsuzaki <[email protected]> wrote:
>
>> Hi Pramod,
>>
>> No it has not been implemented, and I'm not aware of any recipes.
>> There is an open JIRA for this feature.
>>
>> https://issues.apache.org/jira/browse/ZOOKEEPER-646
>>
>> On Thu, May 15, 2014 at 12:59 PM, Pramod Biligiri
>> <[email protected]> wrote:
>> > Hi,
>> > The Zookeeper wiki talks about Partitioned Zookeeper:
>> > https://cwiki.apache.org/confluence/display/ZOOKEEPER/PartitionedZooKeeper
>> >
>> > I wanted to know if that has already been implemented or not. If not, are
>> > there some recipes which can make Zookeeper behave in that way?
>> >
>> > Thanks.
>> >
>> > Pramod
