Hi all,

The HBase instance I'm managing has grown to the point where it has far too
many regions per server: 5 region servers with 1010 regions each, on HBase
0.90.4-cdh3u2. I want to bring this region count under control. The
cluster is currently running with the default region size of 256 MB, and
the data is spread across 17 tables. I've turned on compression for all
the column families, which is great, as my region count is growing much
more slowly now. I've looked through HDFS at the individual regions, and
they seem rather small, 40-50 MB each, which is not surprising given the
major compactions that ran after I enabled compression. My total hbase
folder size in HDFS (hadoop fs -dus /hbase) is 926,939,501,499 bytes.
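
For reference, here is a back-of-the-envelope sketch of what those numbers
imply. It is only an estimate: it assumes the /hbase total is mostly region
data (it also includes WALs and other non-region files), and it assumes
every region would fill completely to the 2 GB target, which they won't in
practice. The variable names are just for illustration.

```python
import math

# Figures taken from the cluster description above.
total_bytes = 926939501499          # hadoop fs -dus /hbase
region_servers = 5
regions_per_server_now = 1010

# Assumption: a 2 GB target for hbase.hregion.max.filesize.
target_region_bytes = 2 * 1024 ** 3

current_regions = region_servers * regions_per_server_now

# Average on-disk size per region today, in MiB (upper bound, since the
# /hbase total also contains non-region data such as WALs).
avg_region_mib = total_bytes / float(current_regions) / (1024 ** 2)

# Lower bound on region count if every region were filled to the target.
target_regions = int(math.ceil(total_bytes / float(target_region_bytes)))
target_per_server = int(math.ceil(target_regions / float(region_servers)))

print("current regions:        %d" % current_regions)
print("avg region size (MiB):  %.1f" % avg_region_mib)
print("regions at 2 GB target: %d" % target_regions)
print("per server at target:   %d" % target_per_server)
```

The average here comes out well above the 40-50 MB I saw when browsing, so
either my sample wasn't representative or a good share of the total sits
outside the region directories.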

My question is: what's the best strategy for handling this?

What I assume from reading the docs:

1. Increase hbase.hregion.max.filesize to something more reasonable,
like 2 GB.
2. Bring the cluster offline and merge regions.

Is there a good way to determine the actual region sizes, other than
checking each one manually? I'd like to plan the merges so that I end up
with the most efficient regions, size-wise.
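
To make the question concrete, this is roughly the kind of thing I have in
mind: a minimal sketch that shells out to "hadoop fs -du" on a table
directory and sorts the entries by size. It assumes each output line is
"<bytes> <path>" (which is what I see on this Hadoop version) and that each
region is a subdirectory of /hbase/<table>. The function names are my own,
not part of any HBase tooling.

```python
import subprocess


def parse_du(text):
    """Parse `hadoop fs -du` output into a {path: size_in_bytes} dict."""
    sizes = {}
    for line in text.splitlines():
        parts = line.split()
        # Skip headers such as "Found 3 items" and any malformed lines.
        if len(parts) < 2 or not parts[0].isdigit():
            continue
        sizes[parts[-1]] = int(parts[0])
    return sizes


def region_sizes(table_dir):
    """Run `hadoop fs -du` on a table directory and parse the result.

    Note: the listing also includes non-region entries in the table
    directory (for example .tableinfo), which show up as small items.
    """
    out = subprocess.check_output(["hadoop", "fs", "-du", table_dir])
    return parse_du(out.decode("utf-8"))


def print_report(table_dir):
    """Print entries smallest first, so merge candidates are at the top."""
    by_size = sorted(region_sizes(table_dir).items(), key=lambda kv: kv[1])
    for path, size in by_size:
        print("%12d  %s" % (size, path))
```

Calling print_report("/hbase/some_table") (table name hypothetical) would
list the smallest entries first. It only reports on-disk size, so it says
nothing about which regions are adjacent in key order, which merging would
also need.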

At what point is it a good idea to turn off automatic region splits and
manually manage them?

Thanks,

Rob Roland
Senior Software Engineer
Simply Measured, Inc.
