I looked at the commit history of StochasticLoadBalancer.java.
0.98.10 should have most of the recent fixes.

Can you capture a few jstack dumps while the load balancer is doing its
computation and pastebin them?
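
A minimal sketch of one way to collect the dumps, assuming the JDK's jstack
is on the PATH and the script runs as the same user as the HMaster JVM. The
function name, file naming, and default count/interval are hypothetical, not
anything HBase provides:

```python
import subprocess
import time
from pathlib import Path


def capture_jstacks(pid, count=5, interval_s=10.0, out_dir=".",
                    run=subprocess.run, sleep=time.sleep):
    """Run `jstack <pid>` `count` times, `interval_s` seconds apart.

    Each dump is written to its own file; the list of paths is returned.
    `run` and `sleep` are injectable only so the function can be exercised
    without a live JVM.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for i in range(count):
        # check=True raises if jstack exits non-zero (e.g. wrong pid or user).
        result = run(["jstack", str(pid)],
                     capture_output=True, text=True, check=True)
        path = out / f"jstack-{pid}-{i:02d}.txt"
        path.write_text(result.stdout)
        paths.append(path)
        if i < count - 1:
            sleep(interval_s)  # space the dumps out to see what changes
    return paths
```

For example, capture_jstacks(12345, count=5, interval_s=10) with 12345
replaced by the actual master pid, started once the BalancerChore begins
running.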

Please start a new thread, since the recent discussion is no longer about
the shared filesystem.

If you can take a look at TestStochasticLoadBalancer and add a test which
reproduces what you saw, that would help us troubleshoot.

Cheers

On Thu, Aug 27, 2015 at 6:56 AM, donmai <[email protected]> wrote:

> Very unbalanced due to the addition of a few nodes at 0 regions each. When
> I ran the balancer in hbase shell without these nodes, on a balanced
> cluster (+- 3 regions per node), it ran very quickly, in around 3 seconds.
>
> On Thu, Aug 27, 2015 at 9:50 AM, Ted Yu <[email protected]> wrote:
>
> > How balanced are the table regions in your cluster?
> >
> > Cheers
> >
> > On Thu, Aug 27, 2015 at 6:15 AM, donmai <[email protected]> wrote:
> >
> > > I figured out the issue - the reason wasn't actually region movement
> > > taking a while, the balancer is actually the thing taking forever:
> > >
> > > 2015-08-27 12:50:13,582 DEBUG
> > > [hostname,60000,1440642755872-BalancerChore]
> > > balancer.StochasticLoadBalancer: Could not find a better load balance
> > > plan.  Tried 0 different configurations in 2211294ms, and did not find
> > > anything with a computed cost less than 54.18640355329625
> > >
> > > After waiting for half an hour to an hour, only one region is ever
> > > moved by the balancer and this process is repeated. I'm using default
> > > settings with regard to slop / overall balancing...any idea why it's
> > > taking so long?
> > > Thanks!
> > >
> > > On Tue, Aug 25, 2015 at 4:46 PM, anil gupta <[email protected]> wrote:
> > >
> > > > AFAIK, region movement does not move the data of a region on the
> > > > (distributed) FileSystem. It should only update HBase metadata.
> > > > Did you check disk I/O stats during region movement?
> > > >
> > > > On Tue, Aug 25, 2015 at 10:40 AM, Ted Yu <[email protected]> wrote:
> > > >
> > > > > Please see http://hbase.apache.org/book.html#regions.arch.assignment
> > > > >
> > > > > On Tue, Aug 25, 2015 at 10:37 AM, donmai <[email protected]> wrote:
> > > > >
> > > > > > NFS
> > > > > > 0.98.10
> > > > > > Will get to you as soon as I am able, on travel
> > > > > >
> > > > > > Is my general understanding correct, though, that there shouldn't
> > > > > > be any data movement from a region reassignment?
> > > > > >
> > > > > > On Tue, Aug 25, 2015 at 12:40 PM, Ted Yu <[email protected]> wrote:
> > > > > >
> > > > > > > Can you give a bit more information:
> > > > > > >
> > > > > > > which filesystem you use
> > > > > > > which hbase release you use
> > > > > > > master log snippet for the long region assignment
> > > > > > >
> > > > > > > Thanks
> > > > > > >
> > > > > > > On Tue, Aug 25, 2015 at 9:30 AM, donmai <[email protected]> wrote:
> > > > > > >
> > > > > > > > Hi,
> > > > > > > >
> > > > > > > > I'm curious about how exactly region movement works with
> > > > > > > > regard to data transfer. To my understanding from the docs,
> > > > > > > > given an HDFS-backed cluster, a region movement / transition
> > > > > > > > involves changing things in meta only; all data movement for
> > > > > > > > locality is handled by HDFS. In the case where rootdir is a
> > > > > > > > shared file system, there shouldn't be any data movement
> > > > > > > > with a region reassignment, correct? I'm running into
> > > > > > > > performance issues where region assignment takes a very long
> > > > > > > > time and I'm trying to figure out why.
> > > > > > > >
> > > > > > > > Thanks!
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > Thanks & Regards,
> > > > Anil Gupta
> > > >
> > >
> >
>
