I think we need a little more than just stating "1000 regions per RS" or "100-200 and monitor the tables". The reason is simple: it is very difficult for IT to work with such a statement, especially if we need to sell a product based on HBase technology. For a specific deployment, it is important to figure out:

* How many RSs are needed
* How much memory to assign to each RS
* How many regions per RS
* How many connections an HBase cluster can handle

I'm not an IT guy, but I know that such questions will have to be answered.
Mikael.S

On Fri, Nov 4, 2011 at 1:37 PM, Michel Segel <[email protected]> wrote:

> The funny thing about tuning... What works for one situation may not work
> well for others.
> Using the old recommendation of never exceeding 1000 regions per RS, keeping it
> low around 100-200, and monitoring tables and changing the region size on a
> table-by-table basis, we are doing OK.
> (Of course there are other nasty bugs that kill us... But that's a
> different thread...)
>
> The point is that you need to decide what makes sense for you and what
> trade-offs you can live with...
>
> Just my two cents...
>
> Sent from a remote device. Please excuse any typos...
>
> Mike Segel
>
> On Nov 2, 2011, at 9:10 PM, lars hofhansl <[email protected]> wrote:
>
> > Do we know what would need to change in HBase in order to be able to
> > manage more regions per regionserver?
> > With 20 regions per server, one would need 300G regions to just utilize
> > 6T of drive space.
> >
> > To utilize a regionserver/datanode with 24T drive space the region size
> > would be an insane 1T.
> >
> > -- Lars
> >
> > ________________________________
> > From: Nicolas Spiegelberg <[email protected]>
> > To: "[email protected]" <[email protected]>
> > Cc: Karthik Ranganathan <[email protected]>; Kannan Muthukkaruppan <[email protected]>
> > Sent: Tuesday, November 1, 2011 3:57 PM
> > Subject: Re: region size/count per regionserver
> >
> > Simple answer
> > -------------
> > 20 regions/server & <2000 regions/cluster is a good rule of thumb if you
> > can't profile your workload yet. You really want to ensure that
> >
> > 1) You need to limit the regions/cluster so the master can have a
> > reasonable startup time & can handle all the region state transitions via
> > ZK. Most bigger companies are running 2,000 in production and achieve
> > reasonable startup times (< 2 minutes for region assignment on cold
> > start). If you want to test the scalability of that algorithm beyond what
> > other companies need, admin beware.
> > 2) The more regions/server you have, the faster that recovery can happen
> > after RS death because you can currently parallelize recovery on a
> > region-granularity. Too many regions/server and #1 starts to be a problem.
> >
> > Complicated answer
> > ------------------
> > More information is needed to optimize this formula. Additional considerations:
> >
> > 1) Are you IO-bound or CPU-bound
> > 2) What is your grid topology like
> > 3) What is your network hardware like
> > 4) How many disks (not just size)
> > 5) What is the data locality between RegionServer & DataNode
> >
> > In the Facebook case, we have 5 racks with 20 nodes each. Servers in the
> > rack are connected by 1G Eth to a switch with a 10G uplink. We are
> > network bound. Our saturation point is most commonly on the top-of-rack
> > switch. With 20 regions/server, we can roughly parallelize our
> > distributed log splitting within a single rack on RS death (although 2
> > regions do split off-rack). This minimizes top-of-rack traffic and
> > optimizes our recovery time. Even if you are CPU-bound, log splitting
> > (hence recovery time) is an IO-bound operation. A lot of our work on
> > region assignment is about maximizing data locality, even on RS death, so
> > we avoid top-of-rack saturation.
> >
> > On 11/1/11 10:54 AM, "Sujee Maniyam" <[email protected]> wrote:
> >
> >> Hi all,
> >> My HBase cluster is 10 nodes, each node has 12 cores, 48G RAM, 24TB disk,
> >> 10G Ethernet.
> >> My region size is 1GB.
> >>
> >> Any guidelines on how many regions can a RS handle comfortably?
> >> I vaguely remember reading somewhere to have no more than 1000 regions /
> >> server; that comes to 1TB / server. Seems pretty low for the current
> >> hardware config.
> >>
> >> Any rules of thumb? experiences?
> >>
> >> thanks
> >> Sujee
> >>
> >> http://sujee.net
> >
>

--
Mikael Sitruk
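
For anyone who wants to redo the arithmetic quoted in this thread for their own hardware, here is a rough sketch in Python. It is only back-of-the-envelope math, not anything taken from HBase itself: the function names are made up for illustration, 1 TB is assumed to be 1000 GB, and HDFS replication and compression are ignored (as they are in the figures quoted above).

```python
# Back-of-the-envelope sizing, using the figures quoted in the thread.
# Assumptions (not from HBase): 1 TB = 1000 GB; HDFS replication and
# compression are ignored; all function names are hypothetical.

def region_size_gb(disk_tb_per_server, regions_per_server):
    """Region size (GB) needed to fill the given disk space on one server."""
    return disk_tb_per_server * 1000 / regions_per_server


def data_per_server_tb(regions_per_server, region_size_in_gb):
    """Data (TB) one server holds for a given region count and region size."""
    return regions_per_server * region_size_in_gb / 1000


def regions_per_cluster(servers, regions_per_server):
    """Total regions the master has to track across the cluster."""
    return servers * regions_per_server


if __name__ == "__main__":
    # Lars: 20 regions/server on 6T and 24T of drive space.
    print(region_size_gb(6, 20))         # 300.0 GB per region
    print(region_size_gb(24, 20))        # 1200.0 GB, i.e. roughly the "insane 1T"
    # Sujee: 1000 regions/server at 1GB per region.
    print(data_per_server_tb(1000, 1))   # 1.0 TB per server
    # Nicolas: 5 racks x 20 nodes, 20 regions/server.
    print(regions_per_cluster(5 * 20, 20))  # 2000 regions in the cluster
```

This only covers the disk-versus-region-count trade-off. It says nothing about memory per RS or the number of connections, which depend on the workload and have to be measured.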
