> a) What is the easiest way to get an overview of how a table is distributed 
> across regions of a cluster?

I usually check via the master's web interface (host:60010).
Click on a table and scroll down; it shows the region count for that table
across the cluster.
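If you prefer the shell, you can also scan .META. with a prefix filter on the table name, since .META. row keys start with "<table>,". A sketch (I have not run this exact line; "my_table" is a placeholder and the filter-string syntax assumes a 0.92-era shell):

```shell
# From the HBase shell: list the server hosting each region of "my_table".
# The PrefixFilter narrows the .META. scan to that table's region rows.
scan '.META.', {COLUMNS => ['info:server'], FILTER => "PrefixFilter('my_table,')"}
```

Counting how many times each server appears in the output gives you the per-server distribution for that one table.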

> b) What constitutes a "badly distributed" table and how can I re-balance 
> manually?

I think the answer to this question is manual splitting. There is a chapter in
the book that covers it.
I am looking forward to an answer from the more experienced folks ;)
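Besides splitting, the shell also has commands for nudging the balancer and moving individual regions by hand. A rough sketch (the region and server names below are placeholders, not real values):

```shell
# From the HBase shell:
# Make sure the balancer is enabled, then trigger a run
# (balancer returns false if it could not run):
balance_switch true
balancer

# Move one region to a specific server; the first argument is the region's
# encoded name, the second is "host,port,startcode" (both placeholders here):
move 'ENCODED_REGION_NAME', 'HOST,PORT,STARTCODE'

# Force a split of a table, optionally at an explicit split key:
split 'my_table', 'somesplitkey'
```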

> c) Is b) needed at all? I know that HBase does its balancing automatically 
> behind the scenes.

From my experience, yes. HBase does not balance as much as I need. In the worst
case I have seen a difference of 16 regions (32 versus 48) on a 10-machine cluster.
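To put a number on "badly distributed": a simple measure is the spread between the most- and least-loaded servers versus the ideal even share. A minimal sketch in plain Python, using hypothetical counts modeled on the 32-versus-48 case above:

```python
def region_skew(counts):
    """Given {server: region_count}, return (max - min, ideal per-server count)."""
    ideal = sum(counts.values()) / float(len(counts))
    return max(counts.values()) - min(counts.values()), ideal

# Hypothetical 10-node cluster: one node holds 48 regions, another only 32.
counts = {"node%d" % i: 40 for i in range(10)}
counts["node0"], counts["node9"] = 48, 32

skew, ideal = region_skew(counts)
print(skew, ideal)  # skew of 16 regions around an ideal of 40 per server
```

Anything where the skew is a large fraction of the ideal share is a candidate for manual moves or splits.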

Hoping for a great answer so I don't have to do manual splits ;)

Regards,
Pablo

-----Original Message-----
From: David Koch [mailto:[email protected]] 
Sent: Tuesday, 4 September 2012 11:56
To: [email protected]
Subject: Fixing badly distributed table manually.

Hello,

A couple of questions regarding balancing of a table's data in HBase.

a) What is the easiest way to get an overview of how a table is distributed 
across regions of a cluster? I guess I could search .META. but I haven't 
figured out how to use filters from the shell.
b) What constitutes a "badly distributed" table and how can I re-balance 
manually?
c) Is b) needed at all? I know that HBase does its balancing automatically 
behind the scenes.

As for a) I tried running this script:

https://github.com/Mendeley/hbase-scripts/blob/master/list_regions.rb

like so:

hbase org.jruby.Main ./list_regions.rb <_my_table>

but I get

ArgumentError: wrong number of arguments (1 for 2)
  (root) at ./list_regions.rb:60

If someone more proficient notices an obvious fix, I'd be glad to hear about it.

Why do I ask? I have the impression that one of the tables on our HBase cluster 
is not well distributed. When running a Map Reduce job on this table, the load 
average on a single node is very high, whereas all other nodes are almost 
idling. It is the only table where this behavior is observed. Other Map Reduce 
jobs result in slightly elevated load averages on several machines.

Thank you,

/David
