Hi all,

The HBase instance I'm managing has grown to the point where it has far too many regions per server: 5 region servers with 1010 regions each, on HBase 0.90.4-cdh3u2. I want to bring this region count under control.

The cluster is currently running with the default region size of 256 MB, and the data is spread across 17 tables. I've turned on compression for all the column families, which is great, as my region count is now growing much more slowly. I've looked through HDFS at the individual regions, and they seem rather small (40-50 MB), which is not surprising given the major compactions that ran after enabling compression. My total hbase folder size in HDFS (hadoop fs -dus /hbase) is 926,939,501,499 bytes.

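For a sense of scale, here is a rough back-of-envelope sketch in Python using only the figures quoted above. It assumes "2 GB" means 2 GiB and that the whole /hbase total is region data, which is probably not exact (the folder may also hold logs and other files), so treat the results as a rough lower bound rather than a plan. The function and constant names are made up for illustration.

```python
import math

# Figures quoted in this post.
REGION_SERVERS = 5
REGIONS_PER_SERVER = 1010
TOTAL_BYTES = 926_939_501_499  # from: hadoop fs -dus /hbase

# Assumption: a candidate hbase.hregion.max.filesize of 2 GB, read as 2 GiB.
TARGET_REGION_BYTES = 2 * 1024**3

MIB = 1024**2


def summarize(total_bytes, servers, regions_per_server, target_bytes):
    """Return rough region-count figures for a given target region size."""
    total_regions = servers * regions_per_server
    # Fewest regions possible if every region were filled to the target size.
    min_regions = math.ceil(total_bytes / target_bytes)
    return {
        "total_regions": total_regions,
        "avg_region_mib": total_bytes / total_regions / MIB,
        "min_regions_at_target": min_regions,
        "min_regions_per_server": math.ceil(min_regions / servers),
    }


if __name__ == "__main__":
    stats = summarize(
        TOTAL_BYTES, REGION_SERVERS, REGIONS_PER_SERVER, TARGET_REGION_BYTES
    )
    for name, value in stats.items():
        print(name, value)
```

The average this yields per region is noticeably higher than the 40-50 MB seen when browsing individual regions, so either the regions sampled were not representative or the /hbase total includes more than region store files. Real regions will also not be packed full, so the practical count after merging would sit above the minimum computed here.
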
My question is: what's the best strategy for handling this? What I assume from reading the docs:

1. Increase hbase.hregion.max.filesize to something more reasonable, like 2 GB.
2. Bring the cluster offline and merge regions.

Is there a good way to determine the actual region sizes, other than manually, so that I can do the merges in a way that ends up with the most efficient regions, size-wise?

At what point is it a good idea to turn off automatic region splits and manage them manually?

Thanks,

Rob Roland
Senior Software Engineer
Simply Measured, Inc.
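
On the region-size question, one possible approach is to script the HDFS listing instead of browsing it by hand. The sketch below is a minimal Python example that assumes each data line of `hadoop fs -du <table dir>` starts with a byte count and ends with the path, and that region directories are the entries whose names do not start with a dot. Both are assumptions about the output and layout, not something confirmed for this exact version. The function names are hypothetical.

```python
import subprocess
import sys


def parse_du(text):
    """Parse `hadoop fs -du` output into (size_bytes, path) pairs.

    Assumes each data line begins with a byte count and ends with the path.
    Lines that do not start with a number (e.g. "Found N items") are skipped.
    """
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2 or not fields[0].isdigit():
            continue
        entries.append((int(fields[0]), fields[-1]))
    return entries


def region_entries(entries):
    """Drop dot-prefixed entries (assumed non-region) and sort smallest first."""
    regions = [
        (size, path)
        for size, path in entries
        if not path.rstrip("/").rsplit("/", 1)[-1].startswith(".")
    ]
    return sorted(regions)


def du_table(table_dir):
    """Run `hadoop fs -du` on a table directory and return sorted region sizes."""
    out = subprocess.check_output(["hadoop", "fs", "-du", table_dir])
    return region_entries(parse_du(out.decode("utf-8")))


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python region_sizes.py /hbase/<table name>
    for size, path in du_table(sys.argv[1]):
        print("%8.1f MiB  %s" % (size / 1024.0**2, path))
```

This only reports sizes per region directory. It says nothing about row-key order, so which regions sit next to each other would still have to come from the region metadata.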
