This doesn't address your question on move(), but regarding locality, see section 8.7.3 here:

http://hbase.apache.org/book.html#regions.arch

It's not just major compactions: any write of a StoreFile (a flush, a minor compaction, or a major compaction) affects locality.

On 3/5/12 11:02 AM, "Bryan Beaudreault" <[email protected]> wrote:

>Hey all,
>
>We are running on cdh3u2 (soon to upgrade to 3u3), and we notice that
>regions are balanced solely based on the number of regions per region
>server, with no regard for horizontal scaling of tables. This was mostly
>fine with a small number of regions, but as our cluster reaches thousands
>of regions we often find an entire table (or a large part of one) on a
>single region server. This seems suboptimal.
>
>We were looking into options for this, and noticed that it is fixed in
>0.94 (possibly 0.92?), but we want to stick with CDH for now. With that
>in mind, we needed alternatives, and found the HBaseAdmin move(byte[],
>byte[]) function
><http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HBaseAdmin.html#move(byte[], byte[])>.
>The documentation doesn't say, but I'm wondering if using this function
>ruins locality. Without the locality problem, I was thinking of creating
>a utility that allowed us to scramble the regions of a table and then
>call balance(), which would hopefully result in a better spread of
>regions for a table. However, I don't want to hurt our performance by
>ruining locality.
>
>The HBase book mentions that locality is achieved through major
>compactions. If I have the opportunity to take some downtime, would it be
>feasible to scramble all of the regions, run balance() to make sure all
>regionservers have about the same number, then run a major compaction to
>fix locality?
>
>Thanks!
>
>Bryan
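To make the locality point concrete, here is a toy Java model. It is not HBase code, and the class, method, and host names are all hypothetical. It assumes HDFS's default placement (first replica on the writing node) and measures a region's locality as the fraction of its blocks that have a replica on the region server hosting it. It also assumes the new server holds no replica of the old blocks. Under those assumptions, move() by itself rewrites no data, so locality on the new server starts low. Each StoreFile write made there wins some of it back, and a major compaction, which rewrites every file, restores it fully.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy model (hypothetical, not HBase code) of region/RegionServer locality. */
public class LocalitySketch {

    /** A store file, modeled only as the set of replica hosts for each of its blocks. */
    static final class StoreFile {
        final List<Set<String>> blocks = new ArrayList<>();
    }

    /**
     * Models any store file write (flush, minor or major compaction) by a region server.
     * Assumption: HDFS default placement puts the first replica on the writer itself.
     */
    static StoreFile write(String writer, int numBlocks, String... otherReplicas) {
        StoreFile f = new StoreFile();
        for (int i = 0; i < numBlocks; i++) {
            Set<String> hosts = new HashSet<>();
            hosts.add(writer);
            for (String h : otherReplicas) hosts.add(h);
            f.blocks.add(hosts);
        }
        return f;
    }

    /** Fraction of all blocks that have a replica on `host` (1.0 if there are no blocks). */
    static double locality(List<StoreFile> files, String host) {
        int total = 0;
        int local = 0;
        for (StoreFile f : files) {
            for (Set<String> replicas : f.blocks) {
                total++;
                if (replicas.contains(host)) local++;
            }
        }
        return total == 0 ? 1.0 : (double) local / total;
    }

    public static void main(String[] args) {
        // Region's files were written while it lived on rs1 (hypothetical host names).
        List<StoreFile> files = new ArrayList<>();
        files.add(write("rs1", 4, "dn7", "dn9"));
        files.add(write("rs1", 2, "dn7", "dn9"));
        System.out.println("on rs1: " + locality(files, "rs1"));

        // move() to rs2 rewrites nothing, so no existing block has a replica on rs2.
        System.out.println("after move to rs2: " + locality(files, "rs2"));

        // A flush on rs2 adds one new local file: locality improves only partially.
        files.add(write("rs2", 2, "dn7", "dn9"));
        System.out.println("after flush on rs2: " + locality(files, "rs2"));

        // A major compaction on rs2 rewrites all files into one file written by rs2
        // (block count kept the same here for simplicity).
        int totalBlocks = files.stream().mapToInt(f -> f.blocks.size()).sum();
        files = new ArrayList<>(List.of(write("rs2", totalBlocks, "dn7", "dn9")));
        System.out.println("after major compaction on rs2: " + locality(files, "rs2"));
    }
}
```

Running main prints locality values of 1.0, 0.0, 0.25, and 1.0 for the four steps. This is why a major compaction after rebalancing should bring locality back, while flushes and minor compactions only restore it gradually.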
