Just had to say, https://issues.apache.org/jira/browse/HBASE-13103 looks *AWESOME*
On Thu, Jun 18, 2015 at 5:00 PM Mikhail Antonov <olorinb...@gmail.com> wrote:

> Yeah, I could see 2 reasons for the few remaining regions to take a
> disproportionately long time: 1) those regions are disproportionately
> large (you should be able to quickly confirm it), and 2) they happen
> to be hosted on really slow/overloaded machine(s). #1 seems far more
> likely to me.
>
> And as Nick said, there's an ongoing effort to provide exactly what
> you've described: centralized periodic analysis of region sizes and
> equalization as needed (somewhat complementary to balancing), and any
> feedback (especially from folks experiencing real issues with unequal
> region sizes) is much appreciated.
>
> -Mikhail
>
> On Thu, Jun 18, 2015 at 10:07 AM, Nick Dimiduk <ndimi...@gmail.com> wrote:
>
> > If you're interested in region size balancing, please have a look at
> > https://issues.apache.org/jira/browse/HBASE-13103 . Please provide
> > feedback as we're hoping to have an early version available in 1.2.
> >
> > Which reminds me, I owe Mikhail another review...
> >
> > On Thu, Jun 18, 2015 at 9:39 AM, Elliott Clark <ecl...@apache.org> wrote:
> >
> >> The balancer is not responsible for region size decisions. The
> >> balancer is only responsible for deciding which regionservers
> >> should host which regions.
> >> Splits are determined by the data size of a region. See max store
> >> file size.
> >>
> >> On Thu, Jun 18, 2015 at 7:50 AM, Nasron Cheong <nas...@gmail.com> wrote:
> >>
> >> > Hi,
> >> >
> >> > I've noticed there are two settings available when using the HBase
> >> > balancer (specifically the default stochastic balancer)
> >> >
> >> > hbase.master.balancer.stochastic.tableSkewCost
> >> >
> >> > hbase.master.loadbalance.bytable
> >> >
> >> > How do these two settings relate? The documentation indicates when
> >> > using the stochastic balancer that 'bytable' should be set to false?
> >> >
> >> > Our deployment relies on very few, very large tables, and I've
> >> > noticed bad distribution when accessing some of the tables. E.g.
> >> > there are 443 regions for a single table, but when doing a MR job
> >> > over a full scan of the table, the first 426 regions scan quickly
> >> > (minutes), but the remaining 17 regions take significantly longer
> >> > (hours)
> >> >
> >> > My expectation is to have the balancer equalize the size of the
> >> > regions for each table.
> >> >
> >> > Thanks!
> >> >
> >> > - Nasron
>
> --
> Thanks,
> Michael Antonov
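
A quick way to check the first possibility raised in the thread (a few regions being much larger than the rest) is to compare each region's size against the table's median. The sketch below is only an illustration: it assumes you have already collected per-region sizes by whatever means you prefer, and the function name, the 3x threshold, and the sample numbers are all hypothetical rather than anything taken from HBase itself.

```python
from statistics import median


def find_oversized_regions(region_sizes_mb, factor=3.0):
    """Return (name, size) pairs whose size exceeds factor * median size.

    region_sizes_mb: dict mapping a region name to its size in MB
                     (collected separately; this sketch does not talk to HBase).
    factor:          hypothetical cutoff; tune it for your own tables.
    """
    if not region_sizes_mb:
        return []
    threshold = factor * median(region_sizes_mb.values())
    oversized = [
        (name, size)
        for name, size in region_sizes_mb.items()
        if size > threshold
    ]
    # Largest regions first, since those are the likely slow scan tasks.
    return sorted(oversized, key=lambda pair: pair[1], reverse=True)


# Made-up sizes in MB, for illustration only.
sample = {
    "region-001": 950,
    "region-002": 1020,
    "region-003": 980,
    "region-004": 9800,
    "region-005": 1010,
}

for name, size in find_oversized_regions(sample):
    print(f"{name}: {size} MB")
```

If the slow map tasks line up with the regions this flags, the skew is in region size rather than in placement, which matches the point made above that the balancer only decides where regions are hosted.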