Hi,

HBase handles balancing regions between RegionServers (RSs) correctly: each RS ends up managing roughly the same number of regions (the balancer takes care of this).

Unfortunately, balancing all the regions of one particular table across the RSs of your cluster is not always easy, because HBase (as of 0.90.3) always creates the new regions on the same RS when it splits a region. This means that if you start with a table that has only one region and then insert lots of data into it, the new regions will always be created on that same RS (and if your insert is an M/R job, you saturate this RS). Eventually the balancer will decide to move some of these regions to other RSs, which limits the issue, but you cannot control when that happens.

Here at Capptain, we solved this problem by developing a Python script, based on the HBase shell, that balances all the regions of all tables across all RSs. It ensures that the regions of each table are uniformly distributed over all RSs of the cluster, with a minimal number of region transitions.

It is fast, and even though it can trigger a lot of region transitions, it has very little impact at runtime and can be run safely.
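For illustration only, here is a minimal sketch of one way such a per-table plan could be computed. It is not the Capptain script (which was not shared in this thread), and `plan_moves` and its input format are hypothetical. It assumes you already have a mapping from (table, encoded region name) to the server hosting that region. For each table it aims at floor(n/k) or ceil(n/k) regions per server, gives the leftover "+1" slots to the servers that already host the most regions, and only moves regions above a server's target, which keeps the number of transitions small for that target.

```python
from collections import defaultdict


def plan_moves(assignment, servers):
    """Plan region moves so each table's regions are spread evenly over `servers`.

    assignment: dict mapping (table, region) -> name of the server hosting it
                (hypothetical input format, e.g. built from .META.).
    servers: names of all live RegionServers.
    Returns a list of (table, region, from_server, to_server) tuples.
    """
    # Group regions by table, then by the server currently hosting them.
    by_table = defaultdict(lambda: defaultdict(list))
    for (table, region), server in sorted(assignment.items()):
        by_table[table][server].append(region)

    moves = []
    for table in sorted(by_table):
        hosted = by_table[table]
        total = sum(len(regions) for regions in hosted.values())
        base, extra = divmod(total, len(servers))
        # Every server gets `base` regions; the `extra` leftover slots go to the
        # servers that already host the most regions, so fewer regions move.
        ranked = sorted(servers, key=lambda s: (-len(hosted.get(s, [])), s))
        target = {s: base + (1 if i < extra else 0) for i, s in enumerate(ranked)}

        # Regions above a server's target are surplus. Servers not listed in
        # `servers` (e.g. dead ones) have a target of 0, so all their regions move.
        surplus = []
        for server, regions in hosted.items():
            surplus.extend((region, server) for region in regions[target.get(server, 0):])

        # Hand surplus regions to the servers that are below their target.
        for server in ranked:
            for _ in range(target[server] - len(hosted.get(server, []))):
                region, source = surplus.pop()
                moves.append((table, region, source, server))
    return moves
```

Each resulting tuple could then be applied with the HBase shell `move` command (`move 'ENCODED_REGIONNAME', 'SERVER_NAME'`). It is probably wise to switch the automatic balancer off (`balance_switch false`) while doing so, otherwise it may move regions around again. Note that this sketch balances each table on its own; it does not additionally try to even out the total number of regions per server.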

If you are interested, just let me know and I can share it.

Regards,

On 04/09/12 23:42, David Koch wrote:
Hello,

Thank you for your replies. We are using CDH4 HBase 0.92. Good call on the
web interface. The port is blocked, so I never really got a chance to test
it. As far as manual re-balancing is concerned, I will check the book.

/David


On Tue, Sep 4, 2012 at 5:34 PM, Guillaume Gardey wrote:

Hello,

a) What is the easiest way to get an overview of how a table is distributed
across the regions of a cluster? I guess I could search .META., but I haven't
figured out how to use filters from the shell.
b) What constitutes a "badly distributed" table, and how can I re-balance it
manually?
c) Is b) needed at all? I know that HBase does its balancing automatically
behind the scenes.
I have found http://bobcopeland.com/blog/2012/04/graphing-hbase-splits/ to be
a good source of information and tools for looking at region balancing in the
cluster and investigating it.
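Regarding a) and b): once you have the region-to-server pairs for a table (for instance from the info:server column of .META.), a per-server count is easy to compute. The sketch below is only illustrative: `table_distribution` and `is_badly_distributed` are hypothetical helper names, and "badly distributed" is defined here, as an assumption, as the busiest and idlest servers differing by more than one region of that table.

```python
from collections import Counter


def table_distribution(assignment, table, servers):
    """Count the regions of `table` hosted by each server in `servers` (0 if none).

    assignment: dict mapping (table, region) -> name of the hosting server
                (hypothetical input format).
    """
    counts = Counter(server for (t, _region), server in assignment.items() if t == table)
    return {server: counts.get(server, 0) for server in servers}


def is_badly_distributed(distribution, tolerance=1):
    """Assumed criterion: busiest and idlest servers differ by more than `tolerance`."""
    values = list(distribution.values())
    return max(values) - min(values) > tolerance


if __name__ == "__main__":
    # Made-up example: all three regions of table "t" ended up on rs1 after splits.
    assignment = {("t", "a"): "rs1", ("t", "b"): "rs1", ("t", "c"): "rs1"}
    dist = table_distribution(assignment, "t", ["rs1", "rs2", "rs3"])
    print(dist, is_badly_distributed(dist))  # {'rs1': 3, 'rs2': 0, 'rs3': 0} True
```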

As for a) I tried running this script:

https://github.com/Mendeley/hbase-scripts/blob/master/list_regions.rb

like so:

hbase org.jruby.Main ./list_regions.rb <_my_table>

but I get

ArgumentError: wrong number of arguments (1 for 2)
  (root) at ./list_regions.rb:60

If someone more proficient notices an obvious fix, I'd be glad to hear
about it.
Concerning https://github.com/Mendeley/hbase-scripts, I am afraid that this
repository is no longer maintained and was written for old releases of HBase
(CDH2, I believe). There are no plans to upgrade it to newer releases.

Cheers
---
Guillaume
