I think we need a little more than just stating "1000 per RS" or
"100-200 and monitor the tables". The reason is simple: it is very difficult
for IT to work with such a statement, especially if we need to sell a product
based on HBase technology. For a specific deployment, it is important to
figure out:
* How many RSs are needed
* How much memory to assign to each RS
* How many regions per RS
* How many connections an HBase cluster can handle
I'm not an IT guy, but I know that IT will have to answer such questions.
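
To make the numbers quoted in this thread easier to compare, here is a rough
sketch of the arithmetic in Python. The function names are my own and purely
illustrative. It assumes 1 TB = 1000 GB (as the figures below seem to do) and
ignores HDFS replication and compression, so treat it as a starting point for
a discussion with IT, not as a sizing tool.

```python
import math


def regions_per_server(data_per_server_gb: float, region_size_gb: float) -> int:
    """Regions one RS must host to cover its share of data (hypothetical helper)."""
    return math.ceil(data_per_server_gb / region_size_gb)


def region_size_for_target(data_per_server_gb: float, target_regions: int) -> float:
    """Region size (GB) needed to stay at a target region count per RS."""
    return data_per_server_gb / target_regions


def cluster_regions(servers: int, regions_per_rs: int) -> int:
    """Total regions the master has to assign across the cluster."""
    return servers * regions_per_rs


if __name__ == "__main__":
    # Sujee's nodes: 24 TB of disk with 1 GB regions.
    print(regions_per_server(24_000, 1))       # regions per RS to fill the disk
    # Lars's examples: 20 regions per RS on 6 TB and on 24 TB.
    print(region_size_for_target(6_000, 20))   # region size in GB
    print(region_size_for_target(24_000, 20))  # region size in GB
    # Nicolas's topology: 5 racks x 20 nodes, 20 regions per RS.
    print(cluster_regions(5 * 20, 20))         # regions in the cluster
```

With these inputs, filling 24 TB with 1 GB regions needs 24,000 regions per
RS, far above the old 1000-region limit, while holding 20 regions per RS on
the same disk needs regions of about 1.2 TB.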

Mikael.S


On Fri, Nov 4, 2011 at 1:37 PM, Michel Segel <[email protected]> wrote:

> The funny thing about tuning... What works for one situation may not work
> well for others.
> Using the old recommendation of never exceeding 1000 regions per RS,
> keeping the count low (around 100-200), monitoring tables, and changing the
> region size on a table-by-table basis, we are doing OK.
> (Of course, there are other nasty bugs that kill us... but that's a
> different thread...)
>
> The point is that you need to decide what makes sense for you and what
> trade-offs you can live with...
>
> Just my two cents...
>
> Sent from a remote device. Please excuse any typos...
>
> Mike Segel
>
> On Nov 2, 2011, at 9:10 PM, lars hofhansl <[email protected]> wrote:
>
> > Do we know what would need to change in HBase in order to be able to
> > manage more regions per regionserver?
> > With 20 regions per server, one would need 300G regions to just utilize
> > 6T of drive space.
> >
> >
> > To utilize a regionserver/datanode with 24T drive space the region size
> > would be an insane 1T.
> >
> > -- Lars
> >
> > ________________________________
> > From: Nicolas Spiegelberg <[email protected]>
> > To: "[email protected]" <[email protected]>
> > Cc: Karthik Ranganathan <[email protected]>; Kannan Muthukkaruppan <
> [email protected]>
> > Sent: Tuesday, November 1, 2011 3:57 PM
> > Subject: Re: region size/count per regionserver
> >
> > Simple answer
> > -------------
> > 20 regions/server & <2000 regions/cluster is a good rule of thumb if you
> > can't profile your workload yet.  Two things drive these numbers:
> >
> > 1) You need to limit the regions/cluster so the master can have a
> > reasonable startup time & can handle all the region state transitions via
> > ZK.  Most bigger companies are running 2,000 in production and achieve
> > reasonable startup times (< 2 minutes for region assignment on cold
> > start).  If you want to test the scalability of that algorithm beyond
> > what other companies need, admin beware.
> > 2) The more regions/server you have, the faster recovery can happen
> > after RS death, because you can currently parallelize recovery at
> > region granularity.  Too many regions/server and #1 starts to be a
> > problem.
> >
> >
> >
> > Complicated answer
> > ------------------
> > More information lets you optimize this formula.  Additional considerations:
> >
> > 1) Are you IO-bound or CPU-bound?
> > 2) What is your grid topology like?
> > 3) What is your network hardware like?
> > 4) How many disks do you have (not just size)?
> > 5) What is the data locality between RegionServer & DataNode?
> >
> > In the Facebook case, we have 5 racks with 20 nodes each.  Servers in the
> > rack are connected by 1G Eth to a switch with a 10G uplink.  We are
> > network bound.  Our saturation point is most commonly on the
> > top-of-rack switch.  With 20 regions/server, we can roughly parallelize
> > our distributed log splitting within a single rack on RS death (although
> > 2 regions do split off-rack).  This minimizes top-of-rack traffic and
> > optimizes our recovery time.  Even if you are CPU-bound, log splitting
> > (hence recovery time) is an IO-bound operation.  A lot of our work on
> > region assignment is about maximizing data locality, even on RS death, so
> > we avoid top-of-rack saturation.
> >
> >
> > On 11/1/11 10:54 AM, "Sujee Maniyam" <[email protected]> wrote:
> >
> >> Hi all,
> >> My HBase cluster is 10 nodes; each node has 12 cores, 48G RAM, 24TB disk,
> >> and 10G Ethernet.
> >> My region size is 1GB.
> >>
> >> Any guidelines on how many regions an RS can handle comfortably?
> >> I vaguely remember reading somewhere to have no more than 1000 regions /
> >> server; that comes to 1TB / server.  Seems pretty low for the current
> >> hardware config.
> >>
> >> Any rules of thumb?  Experiences?
> >>
> >> thanks
> >> Sujee
> >>
> >> http://sujee.net
> >
>



-- 
Mikael Sitruk
