Re: zookeeper & HBase

Jean-Daniel Cryans Thu, 08 Jul 2010 17:49:07 -0700

The client's retry policies need more thought; currently they're
hard to manage.


https://issues.apache.org/jira/browse/HBASE-2445
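
For anyone who needs a hard cap on retries, here is a minimal sketch in
plain Java (no HBase or ZooKeeper classes) of running a call at most a
fixed number of times with exponential backoff between attempts. The names
BoundedRetry, call, and backoffMillis are hypothetical and invented for
illustration. This is not how the HBase client implements
hbase.client.retries.number or zookeeper.retries; it only shows one way an
application could wrap its own connection attempt so that it gives up after
X absolute tries.

```java
import java.util.concurrent.Callable;

/** Hypothetical helper for illustration; not part of HBase or ZooKeeper. */
final class BoundedRetry {

    private BoundedRetry() {}

    /** Delay before the retry that follows 0-based attempt: base * 2^attempt, capped at maxMillis. */
    static long backoffMillis(long baseMillis, int attempt, long maxMillis) {
        // Clamp the shift so the multiplication stays well inside a long for sane base values.
        int shift = Math.min(attempt, 20);
        return Math.min(maxMillis, baseMillis * (1L << shift));
    }

    /** Runs op at most maxAttempts times, then rethrows the last failure. */
    static <T> T call(Callable<T> op, int maxAttempts, long baseMillis, long maxMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Exception last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                return op.call();
            } catch (InterruptedException e) {
                // Do not retry an interrupted call; restore the flag and give up.
                Thread.currentThread().interrupt();
                throw e;
            } catch (Exception e) {
                last = e;
                // Sleep only if another attempt is still allowed.
                if (attempt + 1 < maxAttempts) {
                    Thread.sleep(backoffMillis(baseMillis, attempt, maxMillis));
                }
            }
        }
        throw last; // non-null here because maxAttempts >= 1 and every attempt failed
    }
}
```

A caller would pass its own connect-or-fail operation as the Callable. Once
the attempts are exhausted the last exception propagates, so a web request
can fail fast instead of waiting indefinitely.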

J-D

On Thu, Jul 8, 2010 at 5:17 PM,  <[email protected]> wrote:
> Thank you, JG. I did not realize ZK is sensitive to IO, and I also didn't plan for
> more than normal disk space.
> I was thinking of around 5 ZK nodes on relatively cheap hardware, just to support our
> insert/put rates (roughly 300 million per day),
> with NN/HM/DN/HM all on much more powerful machines (penguin-like hardware).
> Does that sound ok?
>
> Besides scaling, I have a question regarding retries:
> hbase.client.retries.number
> zookeeper.retries
>
> While testing from an HBase client (Tomcat in our case), the client by default keeps
> trying forever to connect to ZooKeeper
> if all ZK nodes are down. It doesn't abort after 10 tries (hbase.client.retries.number) or
> after zookeeper.retries (I think the default is 5).
>
> The only way I can stop the retrying is to set hbase.client.retries.number = 0
> (zookeeper.retries can be any number).
> From the config comment, I see that there is an exponential backoff algorithm. Could
> you please shed some light on retries?
> I'll read about the algorithm as well.
>
> Is there a way to completely stop retrying after X absolute tries?
>
> thanks
> venkatesh
>
>
> PS: I'll get on IRC after my critical questions :). Sorry.
>
>
>
>
>
>
> -----Original Message-----
> From: Jonathan Gray <[email protected]>
> To: [email protected] <[email protected]>
> Sent: Thu, Jul 8, 2010 7:24 pm
> Subject: RE: zookeeper & HBase
>
>
> ZK is sensitive to IO starvation, which is why it is recommended to keep it on a
> separate node or separate disk.  In most cases, giving ZK its own disk is
> sufficient and dedicated node(s) are unnecessary.
>
> On smallish clusters like 10 nodes, I would recommend starting with just 1 ZK
> node co-located with your NameNode and HMaster, but with a dedicated disk just
> for ZK.  Since the NN is a SPOF, having one ZK doesn't really lower your fault
> tolerance, except that it may be on a non-raided disk.  I encourage RAID usage
> for NN and ZK.  JBOD for DN/RS.
>
>
>
> JG
>
>
>
>> -----Original Message-----
>> From: [email protected] [mailto:[email protected]]
>> Sent: Thursday, July 08, 2010 4:20 PM
>> To: [email protected]
>> Subject: zookeeper & HBase
>>
>> I'm trying to work out our deployment layout. I read in one of the
>> articles/FAQs (probably JG's) that it's better to
>> have ZooKeeper on a separate cluster/separate set of machines. I'm
>> assuming that is the right approach.
>>
>> All our transactions are HBase (inserts, MapReduce with one table as input and
>> another table as output, other queries, ...).
>> Based on the other thread on locality, I'll put the RegionServer and DataNode on
>> the same hosts.
>>
>> If these boxes have enough capacity, do we need to put ZooKeeper on a
>> separate cluster?
>> If it is on a separate cluster, my understanding is that ZooKeeper has a much
>> smaller memory footprint compared
>> to HRegionServers/DataNodes, and it shouldn't need that much CPU
>> either. Correct?
>>
>> Is there any suggested guidance on the number of ZooKeeper nodes vs. the number of
>> RegionServers? I'm looking for some ratio. Say, for a 10-node cluster,
>> how many ZooKeeper nodes?
>>
>> Please ignore this if it is outside the etiquette.
>> thanks
>> venkatesh
