Thank you, JG. I did not realize ZK is sensitive to IO, and I also didn't plan for 
more than normal disk space.
I was thinking of around 5 ZK nodes on relatively cheap hardware, just to support our 
insert/put rates (roughly 300 million per day),
with NN/HM/DN/HM all on much more powerful machines (Penguin-like hardware).
Does that sound OK?

Besides scaling, I have a question regarding retries:
hbase.client.retries.number
zookeeper.retries

While testing from the HBase client (Tomcat in our case), by default the client keeps 
trying forever to connect to ZooKeeper 
if all ZK nodes are down. It doesn't abort after 10 tries (hbase.client.retries.number) or 
after zookeeper.retries (I think the default is 5).

The only way I can stop the retrying is to set hbase.client.retries.number = 0 
(zookeeper.retries can be any number).
From the config comment, I see that there is an exponential backoff algorithm. Could 
you please shed some light on retries?
I'll read about the algorithm as well.

Is there a way to completely stop retrying after X absolute tries?
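
To make the question concrete, here is a minimal sketch of what "stop after X absolute tries" combined with exponential backoff could look like. It is not HBase's or ZooKeeper's actual retry code; the class name BoundedRetry, its methods, and all parameters are hypothetical and only illustrate the behaviour being asked about.

```java
import java.util.concurrent.Callable;

// Hypothetical illustration only; not part of the HBase or ZooKeeper client API.
public class BoundedRetry {

    // Delay before retry number `attempt` (0-based): baseMillis * 2^attempt, capped at maxMillis.
    public static long backoffMillis(int attempt, long baseMillis, long maxMillis) {
        // Cap the shift so that small base delays cannot overflow a long.
        long delay = baseMillis << Math.min(attempt, 20);
        return Math.min(delay, maxMillis);
    }

    // Runs `op` at most maxAttempts times in total, sleeping between failed attempts.
    // After the last failed attempt it gives up and rethrows that failure.
    public static <T> T callWithRetries(Callable<T> op, int maxAttempts,
                                        long baseMillis, long maxMillis) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Exception last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                return op.call();
            } catch (Exception e) {
                last = e;
                // Sleep only if another attempt will follow.
                if (attempt < maxAttempts - 1) {
                    Thread.sleep(backoffMillis(attempt, baseMillis, maxMillis));
                }
            }
        }
        throw last;
    }
}
```

With a wrapper like this, a caller that cannot reach any server would see an exception after maxAttempts tries instead of blocking indefinitely.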

thanks
venkatesh

 
PS: I'll get on IRC after my critical questions :). Sorry.

-----Original Message-----
From: Jonathan Gray <[email protected]>
To: [email protected] <[email protected]>
Sent: Thu, Jul 8, 2010 7:24 pm
Subject: RE: zookeeper & HBase


ZK is sensitive to IO starvation which is why it is recommended to keep it on a 
separate node or separate disk.  In most cases, giving ZK its own disk is 
sufficient and dedicated node(s) are unnecessary.

On smallish clusters like 10 nodes, I would recommend starting with just 1 ZK 
node co-located with your NameNode and HMaster, but with a dedicated disk just 
for ZK.  Since the NN is a SPOF, having one ZK doesn't really lower your fault 
tolerance, except that it may be on a non-raided disk.  I encourage RAID usage 
for NN and ZK.  JBOD for DN/RS.

JG



> -----Original Message-----
> From: [email protected] [mailto:[email protected]]
> Sent: Thursday, July 08, 2010 4:20 PM
> To: [email protected]
> Subject: zookeeper & HBase
> 
> I'm trying to plan our deployment layout. I read in one of the
> articles/FAQ (probably JG's) that it's better to
> have ZooKeeper on a separate cluster/separate set of machines. I'm
> assuming that is the right approach.
> 
> All our transactions are HBase (inserts, MapReduce with one table as input and
> another table as output, other queries, ...).
> Based on the other thread on locality, I'll put the RegionServer & DataNode on the
> same hosts.
> 
> If these boxes have enough capacity, do we need to put ZooKeeper on a
> separate cluster?
> If it is on a separate cluster, my understanding is that ZooKeeper has a much
> smaller memory footprint compared
> to HRegionServer/DataNodes, & it shouldn't need that much CPU
> either. Correct?
> 
> Is there any suggested guidance on the number of ZooKeeper nodes vs. the number of
> RegionServers? I'm looking for some ratio. Say, for a 10-node cluster,
> how many ZooKeeper nodes?
> 
> Please ignore this if it is outside the etiquette.
> thanks
> venkatesh


 
