On Mon, Dec 24, 2012 at 8:27 AM, Ivan Balashov <[email protected]> wrote:
>
> Vincent Barat <vbarat@...> writes:
>
> >
> > Hi,
> >
> > Balancing regions between RSs is handled correctly by HBase: I mean
> > that your RSs always manage the same number of regions (the balancer
> > takes care of it).
> >
> > Unfortunately, balancing all the regions of one particular table
> > between the RSs of your cluster is not always easy, since HBase (as
> > of 0.90.3), when it splits a region, always creates the new one on
> > the same RS. This means that if you start with a single-region
> > table and then insert lots of data into it, new regions will always
> > be created on the same RS (if the insert is an M/R job, you saturate
> > this RS). Eventually, the balancer will decide to move some of these
> > regions to other RSs, limiting the issue, but this is not
> > controllable.
> >
> > Here at Capptain, we solved this problem by developing a special
> > Python script, based on the HBase shell, that balances all the
> > regions of all tables across all RSs. It ensures that the regions
> > of each table are uniformly deployed on all RSs of the cluster,
> > with a minimum of region transitions.
> >

Is it possible to describe the logic at a high level on what you did?

> > It is fast, and even if it can trigger a lot of region transitions,
> > there is very little impact at runtime and it can be run safely.
> >
> > If you are interested, just let me know, I can share it.
> >
> > Regards,
> >
>
> Vincent,
>
> I would very much like to see and possibly use the script that you
> mentioned. We've just run into the same issue (after the table
> was truncated, it was re-created with only 1 region, and after data
> loading and manual splits we ended up with all regions on the
> same RS).
>
> If you could share the script, it would be really appreciated,
> I believe not only by me.
>
> Thanks,
> Ivan
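
For reference, here is one way a per-table balancing pass of the kind
described above could be planned. This is only a minimal sketch and not
the Capptain script, which is not shown in this thread. The function name
plan_table_balance, the shape of the assignments dictionary and the
server names are all hypothetical, and the sketch assumes every server
that currently hosts a region appears in the servers list. It only
computes a list of moves; it does not talk to HBase.

```python
from collections import defaultdict


def plan_table_balance(assignments, servers):
    """Return a list of (region, source, destination) moves.

    assignments: dict mapping region name -> (table name, server name).
    servers: list of all region server names (hypothetical identifiers).
    Assumes every hosting server in `assignments` is listed in `servers`.
    """
    # Group regions by table, then by the server currently hosting them.
    by_table = defaultdict(lambda: defaultdict(list))
    for region, (table, server) in sorted(assignments.items()):
        by_table[table][server].append(region)

    moves = []
    for table in sorted(by_table):
        hosted = by_table[table]
        total = sum(len(regions) for regions in hosted.values())
        base, extra = divmod(total, len(servers))

        # Give the "base + 1" quotas to the servers that already hold the
        # most regions of this table, so that fewer regions have to move.
        ranked = sorted(servers, key=lambda s: (-len(hosted[s]), s))
        target = {s: base + (1 if i < extra else 0)
                  for i, s in enumerate(ranked)}

        # Collect surplus regions from servers above their quota...
        surplus = []
        for s in ranked:
            while len(hosted[s]) > target[s]:
                surplus.append((hosted[s].pop(), s))
        # ...and hand them to the servers that are below their quota.
        for s in ranked:
            while len(hosted[s]) < target[s]:
                region, source = surplus.pop()
                hosted[s].append(region)
                moves.append((region, source, s))
    return moves
```

Each planned move would then have to be applied separately, for example
with the HBase shell's move command if your version provides it. Note
that balancing each table on its own does not by itself guarantee that
the total region count per RS stays even.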
