ZK is sensitive to IO starvation, which is why it is recommended to keep it on a separate node or a separate disk. In most cases, giving ZK its own disk is sufficient, and dedicated ZK node(s) are unnecessary.
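On sizing: a ZooKeeper ensemble keeps serving requests only while a strict majority of its servers are up. The rough Python sketch below works out that arithmetic for a few ensemble sizes. The function names are made up for illustration and are not part of any ZooKeeper or HBase API. It shows why a single ZK server tolerates zero failures, the same as a single NameNode.

```python
def zk_quorum(ensemble_size: int) -> int:
    """Servers that must be up for the ensemble to serve requests (strict majority)."""
    if ensemble_size < 1:
        raise ValueError("an ensemble needs at least one server")
    return ensemble_size // 2 + 1


def zk_failures_tolerated(ensemble_size: int) -> int:
    """Servers that can fail while a strict majority is still up."""
    return ensemble_size - zk_quorum(ensemble_size)


if __name__ == "__main__":
    # Hypothetical ensemble sizes, chosen only to show the pattern.
    for n in (1, 3, 4, 5):
        print(f"{n} ZK server(s): quorum={zk_quorum(n)}, "
              f"survives {zk_failures_tolerated(n)} failure(s)")
```

A 4-server ensemble survives no more failures than a 3-server one, which is why ensembles are usually sized at odd numbers when you do grow past one.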
On small clusters (around 10 nodes), I would recommend starting with a single ZK node co-located with your NameNode and HMaster, but with a disk dedicated to ZK. Since the NN is already a SPOF, running one ZK doesn't really lower your fault tolerance, except that the ZK disk may not be RAIDed. I encourage RAID for the NN and ZK, and JBOD for the DataNodes/RegionServers.

JG

> -----Original Message-----
> From: [email protected] [mailto:[email protected]]
> Sent: Thursday, July 08, 2010 4:20 PM
> To: [email protected]
> Subject: zookeeper & HBase
>
> I'm trying to have our deployment layout..I read one of the
> articles/FAQ (probably JG's)...that it's better to
> have zookeeper on separate cluster/separate sets of machine..I'm
> assuming that is the right approach..
>
> All our transactions are HBase (inserts, mapreduce-table as input,
> another table as output, other queries,..)
> Based on other thread on locality..RegionServer & Datanode i'll put on
> same hosts..
>
> If these boxes have enough capacity, do we need to put zookeeper on
> separate cluster?
> If it is on a separate cluster, my understanding is zookeper has much
> smaller memory footprint compared
> to HRegionServer/Datanodes..& it shld need that much CPU as
> well..correct?
>
> Is there any suggested guidance on number of zookeeper vs number of
> regionservers?..looking for some ratio..say 10 node cluster..
> how many zookeeper..?
>
> Please ignore responding to this ..if this is outside the etiquette
> thanks
> venkatesh
