On 07/08/2010 06:45 PM, Jean-Daniel Cryans wrote:
It's not IO intense, it's IO latency sensitive, e.g. if other processes
are sucking up most of the IO bandwidth then ZK will have a hard time
making quorum decisions.

ZK disk activity is pretty low - really the only time we write to disk is when a client asks us to (in this case typically an HBase RS). We need to fsync data to disk before returning a "success" indication to the client; this is required for our durability guarantees.

The issue here is that if you are on a saturated disk (say, colocated with a DataNode/RS), the disk might be 100% saturated for minutes at a time. Coupled with Linux/ext3 fsync issues, it might be minutes before our fsync call returns, in which case the ZooKeeper clients (HBase RS) will time out, similar to a network partition or service failure.
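
To get a rough feel for the fsync behaviour described above, a small probe along the following lines could be run against a candidate disk. This is only a sketch: the function name `measure_fsync_latency`, the sample count, and the 512-byte payload are illustrative choices, not anything ZooKeeper itself uses, and the system temp directory is a stand-in for whichever disk is being considered.

```python
import os
import tempfile
import time


def measure_fsync_latency(directory, samples=20, payload=b"x" * 512):
    """Return fsync latencies (in seconds) for small appends made in `directory`.

    Hypothetical helper for illustration; sample count and payload size are
    arbitrary assumptions, not ZooKeeper defaults.
    """
    latencies = []
    # Scratch file on the disk under test; it is removed when the block exits.
    with tempfile.NamedTemporaryFile(dir=directory) as f:
        for _ in range(samples):
            f.write(payload)
            f.flush()  # hand the buffered bytes to the operating system
            start = time.perf_counter()
            os.fsync(f.fileno())  # block until the OS reports the data is on disk
            latencies.append(time.perf_counter() - start)
    return latencies


if __name__ == "__main__":
    # Stand-in path: replace with a directory on the disk you want to probe.
    results = measure_fsync_latency(tempfile.gettempdir())
    print(f"max fsync latency:  {max(results) * 1000:.2f} ms")
    print(f"mean fsync latency: {sum(results) / len(results) * 1000:.2f} ms")
```

Running it once on an idle disk and again while a DataNode/RS workload is hammering the same disk should show whether fsync stalls of the kind described here are likely. Pointing it at the directory configured as `dataLogDir` in zoo.cfg probes the disk that would hold the ZK transaction log.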

Patrick


On Thu, Jul 8, 2010 at 5:38 PM, Arun Ramakrishnan
<[email protected]>  wrote:
Good to know ZK is IO intense.
Since ZK does not require much disk space and is IO intense, has anyone played
with using solid state drives for ZK?
We have a 20 node cluster. It would be feasible to have a 3 node ZK ensemble, all
configured with solid state drives.

Thanks
Arun

-----Original Message-----
From: Jonathan Gray [mailto:[email protected]]
Sent: Thursday, July 08, 2010 4:25 PM
To: [email protected]
Subject: RE: zookeeper & HBase

ZK is sensitive to IO starvation which is why it is recommended to keep it on a 
separate node or separate disk.  In most cases, giving ZK its own disk is 
sufficient and dedicated node(s) are unnecessary.

On smallish clusters like 10 nodes, I would recommend starting with just 1 ZK 
node co-located with your NameNode and HMaster, but with a dedicated disk just 
for ZK.  Since the NN is a SPOF, having one ZK doesn't really lower your fault 
tolerance, except that it may be on a non-raided disk.  I encourage RAID usage 
for NN and ZK.  JBOD for DN/RS.
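
The fault-tolerance trade-off between one ZK node and a larger ensemble comes down to majority arithmetic. A minimal sketch, assuming ZooKeeper's default majority quorum (the helper names `quorum_size` and `tolerated_failures` are made up for illustration):

```python
def quorum_size(ensemble_size):
    """Smallest strict majority of an ensemble with `ensemble_size` servers."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size):
    """How many servers can be lost while a majority is still available."""
    return ensemble_size - quorum_size(ensemble_size)


if __name__ == "__main__":
    for n in (1, 3, 5):
        print(f"{n} server(s): quorum={quorum_size(n)}, "
              f"tolerates {tolerated_failures(n)} failure(s)")
```

A single server tolerates no failures, three tolerate one, and five tolerate two. Under this rule an even-sized ensemble tolerates no more failures than the odd size just below it, which is why odd sizes are the usual choice.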

JG

-----Original Message-----
From: [email protected] [mailto:[email protected]]
Sent: Thursday, July 08, 2010 4:20 PM
To: [email protected]
Subject: zookeeper & HBase


I'm trying to work out our deployment layout. I read in one of the
articles/FAQs (probably JG's) that it's better to
have ZooKeeper on a separate cluster/separate set of machines. I'm
assuming that is the right approach.


All our transactions are HBase (inserts, MapReduce with one table as input and
another table as output, other queries, ...).
Based on the other thread on locality, I'll put RegionServer & DataNode on the
same hosts.

If these boxes have enough capacity, do we need to put ZooKeeper on a
separate cluster?
If it is on a separate cluster, my understanding is that ZooKeeper has a much
smaller memory footprint compared
to HRegionServer/DataNodes, and it shouldn't need that much CPU
either. Correct?

Is there any suggested guidance on the number of ZooKeeper nodes vs. the number of
RegionServers? I'm looking for some ratio. Say, for a 10 node cluster,
how many ZooKeeper nodes?

Please ignore this if it is outside the etiquette.
thanks
venkatesh


