Re: HBase Region move() and Data Locality

Jean-Daniel Cryans Mon, 05 Mar 2012 10:20:01 -0800

So what's going on on that cluster exactly? You have a lot of tables
of various sizes and they tend to grow on only one machine?


One simple trick to get good balancing for a table is to disable it,
balance the cluster, then re-enable it. It will be distributed
properly at that point.

J-D

On Mon, Mar 5, 2012 at 8:02 AM, Bryan Beaudreault
<[email protected]> wrote:
> Hey all,
>
> We are running on cdh3u2 (soon to upgrade to 3u3), and we notice that
> regions are balanced solely based on the number of regions per region
> server, with no regard for horizontal scaling of tables.  This was mostly
> fine with a small number of regions, but as our cluster reaches thousands
> of regions we are often finding an entire table (or large part of one) on a
> single region server.  This seems suboptimal.
>
> We were looking into options for this, and noticed that it is fixed in 0.94
> (possibly 0.92?), but we want to stick with CDH for now.  With that in
> mind, we needed alternatives, and found the HBaseAdmin move(byte[], byte[])
> function<http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HBaseAdmin.html#move(byte[],
> byte[])>.  The documentation doesn't say, but I'm wondering whether using
> this function ruins locality.  If locality were not a problem, I was thinking
> of creating a utility that scrambles the regions of a table and
> then calls balance(), which would hopefully result in a better spread of
> regions for a table.  However, I don't want to ruin our performance by
> ruining the locality.
>
> The HBase book mentions that locality is achieved through major
> compactions.  If I have the opportunity to take some downtime, would it be
> feasible to scramble all of the regions, run balance() to make sure all
> regionservers have about the same number, then a major compaction to fix
> locality?
>
> Thanks!
>
> Bryan
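
For illustration only, here is a minimal sketch of the planning half of the
utility described in the quoted message. It is plain Java with no HBase
dependency. RegionSpreadPlanner, plan() and countPerServer() are hypothetical
names, not part of HBase, and the sketch assumes the table's encoded region
names and the live server names have already been fetched from the cluster.
It assigns one table's regions round-robin so that no server ends up with
more than one region above any other for that table.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of HBase: plans an even, round-robin
// spread of a single table's regions across the given region servers.
final class RegionSpreadPlanner {

    // Returns a map of encoded region name -> destination server name.
    // Insertion order follows the order of encodedRegionNames.
    static Map<String, String> plan(List<String> encodedRegionNames,
                                    List<String> serverNames) {
        if (serverNames.isEmpty()) {
            throw new IllegalArgumentException("no region servers given");
        }
        Map<String, String> moves = new LinkedHashMap<>();
        for (int i = 0; i < encodedRegionNames.size(); i++) {
            // Region i goes to server (i mod number of servers).
            moves.put(encodedRegionNames.get(i),
                      serverNames.get(i % serverNames.size()));
        }
        return moves;
    }

    // Counts planned regions per destination server, to check the spread
    // before anything is actually moved.
    static Map<String, Integer> countPerServer(Map<String, String> moves) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String server : moves.values()) {
            Integer current = counts.get(server);
            counts.put(server, current == null ? 1 : current + 1);
        }
        return counts;
    }
}
```

Each entry of the plan would then presumably be handed to the
HBaseAdmin move(byte[], byte[]) call mentioned above, with the encoded region
name and the destination server name converted to bytes. Whether that hurts
locality until the next major compaction is exactly the open question in this
thread, so treat this as a starting point and not a tested procedure.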
