It can be reasonable to turn off automatic region splitting if you know your rowkey distribution well and can easily keep the load well parallelized across your region servers (i.e. by splitting manually or through the HBase API). Sometimes it is even the best way to keep the number of regions to a minimum (many companies do this). There is an example of pre-splitting regions in the Reference Guide.
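To make the pre-splitting idea a bit more concrete, here is a minimal sketch of how split points could be computed ahead of time. It assumes, purely for illustration, that rowkeys start with a uniformly distributed, zero-padded hex prefix (for example a hash of the natural key); the function name `hex_split_keys` and the `prefix_len` parameter are hypothetical and not part of HBase. If your keys are distributed differently, the split points have to come from your real key distribution instead.

```python
def hex_split_keys(num_regions, prefix_len=2):
    """Return num_regions - 1 evenly spaced hex split points.

    Assumption (hypothetical): rowkeys begin with a uniformly
    distributed, zero-padded hex prefix of prefix_len characters.
    """
    if num_regions < 2:
        # A single region needs no split point.
        return []
    keyspace = 16 ** prefix_len
    if num_regions > keyspace:
        raise ValueError("prefix too short for that many regions")
    # Cut the prefix space into num_regions equal slices and
    # return the boundaries between consecutive slices.
    return [
        format(i * keyspace // num_regions, "0{}x".format(prefix_len))
        for i in range(1, num_regions)
    ]


if __name__ == "__main__":
    # Example: 10 initial regions, i.e. 2 per server on a 5-server cluster.
    print(hex_split_keys(10))
```

The resulting list can then be supplied as the split keys when the table is created, for instance through the SPLITS option of the HBase shell's create command or the split-keys argument of the admin API's createTable call.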
As for your region size, consider increasing it to 2 GB or even more; that will help reduce the number of regions and StoreFiles.

On Fri, Jul 13, 2012 at 10:31 PM, Rob Roland <[email protected]> wrote:
> Hi all,
>
> The HBase instance I'm managing has grown to the point that it has way too
> many regions per server - 5 region servers with 1010 regions each on HBase
> 0.90.4-cdh3u2. I want to bring this region count under control. The
> cluster is currently running with the default region size of 256 mb, and
> the data is spread across 17 tables. I've turned on compression for all
> the column families, which is great, as my region count is growing much
> slower now. I've looked through HDFS at the individual regions, and they
> seem rather small - 40-50 mb - which is not surprising due to major
> compactions after enabling compression. My total hbase folder size in HDFS
> (hadoop fs -dus /hbase) is 926,939,501,499 bytes.
>
> My question is - what's the best strategy for handling this?
>
> What I assume from reading the docs:
>
> 1. Increase the hbase.hregion.max.filesize to something more reasonable,
> like 2 GB.
> 2. Bring the cluster offline and merge regions.
>
> Is there a good way to determine the actual region sizes, other than
> manually, that way I can do the merges to end up with the most efficient
> regions, size-wise?
>
> At what point is it a good idea to turn off automatic region splits and
> manually manage them?
>
> Thanks,
>
> Rob Roland
> Senior Software Engineer
> Simply Measured, Inc.
>

--
Adrien Mogenet
06.59.16.64.22
http://www.mogenet.me
