With beefy nodes, don't be afraid to use bigger regions... and LZO.
At StumbleUpon we have a 1GB maxfilesize on our >13B-row table and
LZO enabled on every table. The number of regions per node depends on
so many things... row size, access pattern, hardware, etc. FWIW, I
would say that you should definitely not try to host as much data as
your disks can hold per machine, not even 50%. Instead, do the
calculation of how much data you think you need to serve from the
block cache. By default it can grow to as much as 20% of the heap
given to HBase, so in your case a bit less than 1GB per region
server, times 17 region servers, or roughly 13-14GB available
cluster-wide. You can tweak hfile.block.cache.size for more caching.
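If you want more cache and bigger regions, a minimal hbase-site.xml
sketch (real property names; the values below are only illustrative,
not a recommendation for your workload):

  <property>
    <name>hfile.block.cache.size</name>
    <!-- fraction of the region server heap used for the block cache; default 0.2 -->
    <value>0.4</value>
  </property>
  <property>
    <name>hbase.hregion.max.filesize</name>
    <!-- max store file size before a region splits, in bytes (1GB here) -->
    <value>1073741824</value>
  </property>

Bigger regions mean fewer regions to manage for the same amount of
data, at the cost of coarser split and compaction units.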

J-D

On Thu, May 27, 2010 at 12:09 PM, Jacob Isaac <jacobpis...@gmail.com> wrote:
> Hi
>
> Wanted to get the group's experience with HBase performance as the
> number of regions per node increases.
> Also wanted to find out if there is an optimal number of regions one
> should aim for.
>
> We are currently using
>
> 17-node HBase (0.20.4) cluster on a 20-node Hadoop (0.20.2) cluster
>
> 16G RAM per node, 4G RAM for HBase
> space available for (Hadoop + HBase): ~1.5T per node
>
>
>
> We are currently loading 2 tables, each with ~100m rows, resulting in
> ~4000 regions (using the default hbase.hregion.max.filesize=256m),
> and half that number of regions when we double
> hbase.hregion.max.filesize to 512m.
> The two runs did not differ in the time taken (~9 hrs).
>
> With the current load we are only using 10% of the available disk space;
> full utilization would result in a much larger number of regions,
> hence we wanted to hear the group's experience/suggestions in this regard.
>
> ~Jacob
>
