Thanks J-D.

Currently we are trying to find/optimize our load/write times, although in
prod we expect a 25/75 (writes/reads) ratio. We are using a tall-table
model with only one column; row size is typically ~4-5k.
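(For reference, a rough sketch of what that tall-table layout looks like
via the 0.20 client API - the table/family/qualifier names below are
made-up placeholders, not our actual schema:)

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class TallTableSketch {
  public static void main(String[] args) throws Exception {
    HBaseConfiguration conf = new HBaseConfiguration();

    // One table, one column family -- names are placeholders.
    HTableDescriptor desc = new HTableDescriptor("tall_table");
    desc.addFamily(new HColumnDescriptor("d"));
    new HBaseAdmin(conf).createTable(desc);

    // Tall model: each record is its own row carrying a single
    // ~4-5k value, rather than many columns on a few wide rows.
    HTable table = new HTable(conf, "tall_table");
    Put put = new Put(Bytes.toBytes("row-00000001"));
    put.add(Bytes.toBytes("d"), Bytes.toBytes("v"), new byte[4500]);
    table.put(put);
    table.flushCommits();
  }
}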
As to your suggestion on not using even 50% of disk space - I agree, and
was planning to use only ~30-40% (1.5T of 4T) for HDFS. As I reported
earlier, 4000 regions @ 256m per region (with 3 replications) on 20 nodes
== 150G per node == 10% utilization.

While using 1GB as maxfilesize, did you have to adjust other params such
as hbase.hstore.compactionThreshold and hbase.hregion.memstore.flush.size?
(I've sketched the specific knobs I mean at the bottom of this mail.)

There is an interesting observation by Jonathan Gray documented/reported
in HBASE-2375 - wondering whether that issue gets compounded when using
1G as the hbase.hregion.max.filesize.

Thx
Jacob

On Thu, May 27, 2010 at 1:27 PM, Jean-Daniel Cryans <jdcry...@apache.org> wrote:
> With beefy nodes, don't be afraid of using bigger regions... and LZO.
> At StumbleUpon we have 1GB maxfilesize on our >13B rows table and LZO
> enabled on every table. The number of regions per node is a factor of
> so many things... size of rows, access pattern, hardware, etc. FWIW, I
> would say that you should definitely not try to host as much data as
> available per machine, not even 50%. In fact, do the calculation of
> how much data you think you need to serve from the block cache. By
> default, it can grow to as much as 20% of the available RAM, so in your
> case it's a bit less than 1GB, times the number of region servers, so
> ~15GB available cluster-wide. You can tweak hfile.block.cache.size for
> more caching.
>
> J-D
>
> On Thu, May 27, 2010 at 12:09 PM, Jacob Isaac <jacobpis...@gmail.com> wrote:
> > Hi
> >
> > Wanted to find the group's experience on HBase performance with
> > increasing number of regions/node.
> > Also wanted to find out if there is an optimal number of regions one
> > should aim for?
> >
> > We are currently using:
> >
> > a 17-node HBase (0.20.4) cluster on a 20-node Hadoop (0.20.2) cluster
> > 16G RAM per node, 4G RAM for HBase
> > space available for (Hadoop + HBase): ~1.5T per node
> >
> > We are currently loading 2 tables, each with ~100m rows, resulting in
> > ~4000 regions (using the default hbase.hregion.max.filesize=256m),
> > and half that number of regions when we double
> > hbase.hregion.max.filesize to 512m.
> > The two runs did not differ in the time taken (~9 hrs), though.
> >
> > With the current load we are only using 10% of the available disk
> > space; full utilization would result in an increased number of
> > regions, hence I wanted to get the group's experience/suggestions
> > in this regard.
> >
> > ~Jacob
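P.S. To make the question above concrete, the hbase-site.xml knobs I'm
asking about are sketched below. The values are purely illustrative, not
recommendations - the 1G max.filesize is what J-D described, and the rest
are just the stock 0.20 defaults I'm wondering whether to retune alongside
it:

  <!-- Illustrative sketch only; values are not recommendations. -->
  <property>
    <name>hbase.hregion.max.filesize</name>
    <value>1073741824</value> <!-- 1G, per J-D's setup -->
  </property>
  <property>
    <name>hbase.hstore.compactionThreshold</name>
    <value>3</value> <!-- 0.20 default; raise it with bigger regions? -->
  </property>
  <property>
    <name>hbase.hregion.memstore.flush.size</name>
    <value>67108864</value> <!-- 64m, the 0.20 default -->
  </property>
  <property>
    <name>hfile.block.cache.size</name>
    <value>0.2</value> <!-- default: 20% of heap; tweak for more caching -->
  </property>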