Vincent Barat <vbarat@...> writes:

> Hi,
>
> Balancing regions between RSs is handled correctly by HBase: your
> RSs always manage the same number of regions (the balancer takes
> care of it).
>
> Unfortunately, balancing all the regions of one particular table
> across the RSs of your cluster is not always easy, since HBase (as
> of 0.90.3), when it splits a region, always creates the new one on
> the same RS. This means that if you start with a single-region
> table and then insert lots of data into it, new regions will always
> be created on the same RS (if your insert is an M/R job, you
> saturate this RS). Eventually the balancer will decide to move some
> of these regions to other RSs, which limits the issue, but this is
> not controllable.
>
> Here at Capptain, we solved this problem by developing a special
> Python script, based on the HBase shell, that balances all the
> regions of all tables across all RSs. It ensures that the regions
> of each table are uniformly deployed on all RSs of the cluster,
> with a minimum of region transitions.
>
> It is fast, and even though it can trigger a lot of region
> transitions, it has very little impact at runtime and can be run
> safely.
>
> If you are interested, just let me know and I can share it.
>
> Regards,
>
Vincent,

I would very much like to see, and possibly use, the script you
mentioned. We have just run into the same issue: after a table was
truncated, it was re-created with only one region, and after data
loading and manual splits all of its regions ended up on the same RS.

If you could share the script, it would be really appreciated, and I
believe not only by me.

Thanks,
Ivan
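
For anyone following the thread, the per-table balancing described in the quoted message can be sketched roughly as below. This is not Vincent's script, which has not been posted here. It is a minimal sketch under my own assumptions: the function names `plan_table_moves` and `to_shell_commands` are hypothetical, the region and server names are placeholders, and the current region-to-server mapping for one table is assumed to have been collected already (how you collect it is left open). The output lines use the HBase shell `move 'ENCODED_REGIONNAME', 'SERVER_NAME'` form, assuming that command is available in your version.

```python
def plan_table_moves(region_to_server, servers):
    """Plan moves that spread one table's regions evenly over the servers.

    region_to_server: dict mapping region name -> server currently hosting it.
    servers: all region servers that may host regions of this table.
    Returns a list of (region, source_server, target_server) tuples.
    """
    by_server = {s: [] for s in servers}
    for region, server in sorted(region_to_server.items()):
        by_server.setdefault(server, []).append(region)

    base, extra = divmod(len(region_to_server), len(by_server))
    # Servers already holding the most regions receive the "+1" quotas,
    # so as few regions as possible have to change servers.
    ranked = sorted(by_server, key=lambda s: (-len(by_server[s]), s))
    quota = {s: base + (1 if i < extra else 0) for i, s in enumerate(ranked)}

    # Collect regions from servers that are over quota.
    surplus = []
    for s in ranked:
        while len(by_server[s]) > quota[s]:
            surplus.append((by_server[s].pop(), s))

    # Hand the surplus regions to servers that are under quota.
    moves = []
    for s in ranked:
        while len(by_server[s]) < quota[s]:
            region, source = surplus.pop()
            by_server[s].append(region)
            moves.append((region, source, s))
    return moves


def to_shell_commands(moves):
    """Render planned moves as HBase shell 'move' command lines."""
    return ["move '%s', '%s'" % (region, target)
            for region, _source, target in moves]


if __name__ == "__main__":
    # Placeholder names: a five-region table sitting entirely on one RS.
    current = {"region-%d" % i: "rs1" for i in range(1, 6)}
    for line in to_shell_commands(plan_table_moves(current, ["rs1", "rs2", "rs3"])):
        print(line)
```

Running the planner once per table and feeding the resulting lines to the HBase shell would approximate the behaviour described above. A real script would also need to read the live assignments and cope with regions that split or move while it runs.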
