Hey all,

We are running on cdh3u2 (soon to upgrade to 3u3), and we notice that regions are balanced solely based on the number of regions per region server, with no regard for how each table's regions are spread across servers. This was mostly fine with a small number of regions, but as our cluster reaches thousands of regions we often find an entire table (or a large part of one) on a single region server. This seems suboptimal.

We were looking into options for this and noticed that it is fixed in 0.94 (possibly 0.92?), but we want to stick with CDH for now. With that in mind, we needed alternatives, and found the HBaseAdmin move(byte[], byte[]) function <http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HBaseAdmin.html#move(byte[], byte[])>. The documentation doesn't say, but I'm wondering whether using this function ruins locality.

Without the locality problem, I was thinking of creating a utility that lets us scramble the regions of a table and then calls balance(), which would hopefully result in a better spread of regions for a table. However, I don't want to ruin our performance by ruining the locality.

The HBase book mentions that locality is achieved through major compactions. If I have the opportunity to take some downtime, would it be feasible to scramble all of the regions, run balance() to make sure all regionservers have about the same number, then run a major compaction to fix locality?

Thanks!
Bryan
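
P.S. To make the "scramble" idea concrete, here is a rough sketch of the planning step I have in mind. It is only an illustration under my own assumptions: the class TableRegionSpreader, the RegionMover interface, and the plan/apply methods are hypothetical names I made up, not HBase APIs. In a real utility the RegionMover would presumably wrap HBaseAdmin.move(byte[] encodedRegionName, byte[] destServerName), and the region and server lists would come from the cluster rather than being passed in by hand. It does nothing about locality, and I have not verified whether a later balance() call would keep or undo the spread.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Hypothetical helper: plans a per-table spread of regions over region servers.
class TableRegionSpreader {

    // Hypothetical hook. A real implementation would likely call
    // HBaseAdmin.move(Bytes.toBytes(encodedRegionName), Bytes.toBytes(destServerName)),
    // where the server name has the form "host,port,startcode".
    interface RegionMover {
        void move(String encodedRegionName, String destServerName);
    }

    // Shuffle one table's regions, then deal them out round-robin so that no
    // server ends up with more than one region above any other for this table.
    static Map<String, String> plan(List<String> regions, List<String> servers, long seed) {
        if (servers.isEmpty()) {
            throw new IllegalArgumentException("no region servers given");
        }
        List<String> shuffled = new ArrayList<String>(regions);
        Collections.shuffle(shuffled, new Random(seed));

        // LinkedHashMap keeps the planned move order stable for logging and replay.
        Map<String, String> plan = new LinkedHashMap<String, String>();
        for (int i = 0; i < shuffled.size(); i++) {
            plan.put(shuffled.get(i), servers.get(i % servers.size()));
        }
        return plan;
    }

    // Issue one move per planned (region, server) pair through the supplied hook.
    static void apply(Map<String, String> plan, RegionMover mover) {
        for (Map.Entry<String, String> e : plan.entrySet()) {
            mover.move(e.getKey(), e.getValue());
        }
    }
}
```

Keeping the plan separate from the hook that performs the moves means the plan can be printed and reviewed as a dry run before anything is actually moved on the cluster.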
