I looked at commit history of StochasticLoadBalancer.java. 0.98.10 should have most of the recent fixes.
Can you capture a few jstack's when load balancer does the computation and
pastebin them ?

Please start a new thread since recent discussion is no longer about shared
filesystem.

If you can take a look at TestStochasticLoadBalancer and add a test which
reproduces what you saw, that would help us troubleshoot.

Cheers

On Thu, Aug 27, 2015 at 6:56 AM, donmai <[email protected]> wrote:

> Very unbalanced due to the addition of a few nodes at 0 regions each. When
> I ran balancer in hbase shell without these nodes and a balanced cluster
> (+- 3 regions per node), balancer ran very quickly, around 3 seconds.
>
> On Thu, Aug 27, 2015 at 9:50 AM, Ted Yu <[email protected]> wrote:
>
> > How balanced are the table regions in your cluster ?
> >
> > Cheers
> >
> > On Thu, Aug 27, 2015 at 6:15 AM, donmai <[email protected]> wrote:
> >
> > > I figured out the issue - the reason wasn't actually region movement
> > > taking a while, the balancer is actually the thing taking forever:
> > >
> > > 2015-08-27 12:50:13,582 DEBUG
> > > [hostname,60000,1440642755872-BalancerChore]
> > > balancer.StochasticLoadBalancer: Could not find a better load balance
> > > plan. Tried 0 different configurations in 2211294ms, and did not find
> > > anything with a computed cost less than 54.18640355329625
> > >
> > > After waiting for half an hour to an hour, only one region is ever
> > > moved by the balancer and this process is repeated. I'm using default
> > > settings with regard to slop / overall balancing...any idea why it's
> > > taking so long?
> > >
> > > Thanks!
> > >
> > > On Tue, Aug 25, 2015 at 4:46 PM, anil gupta <[email protected]> wrote:
> > >
> > > > AFAIK, region movement does not moves the data of region on the
> > > > (distributed)FileSystem. It should only, update metadata of HBase.
> > > > Did you check diskio stats during region movement?
> > > >
> > > > On Tue, Aug 25, 2015 at 10:40 AM, Ted Yu <[email protected]> wrote:
> > > >
> > > > > Please see
> > > > > http://hbase.apache.org/book.html#regions.arch.assignment
> > > > >
> > > > > On Tue, Aug 25, 2015 at 10:37 AM, donmai <[email protected]> wrote:
> > > > >
> > > > > > NFS
> > > > > > 0.98.10
> > > > > > Will get to you as soon as I am able, on travel
> > > > > >
> > > > > > Is my general understanding correct, though, that there
> > > > > > shouldn't be any data movement from a region reassignment?
> > > > > >
> > > > > > On Tue, Aug 25, 2015 at 12:40 PM, Ted Yu <[email protected]> wrote:
> > > > > >
> > > > > > > Can you give a bit more information:
> > > > > > >
> > > > > > > which filesystem you use
> > > > > > > which hbase release you use
> > > > > > > master log snippet for the long region assignment
> > > > > > >
> > > > > > > Thanks
> > > > > > >
> > > > > > > On Tue, Aug 25, 2015 at 9:30 AM, donmai <[email protected]> wrote:
> > > > > > >
> > > > > > > > Hi,
> > > > > > > >
> > > > > > > > I'm curious about how exactly region movement works with
> > > > > > > > regard to data transfer. To my understanding from the docs
> > > > > > > > given an HDFS-backed cluster, a region movement / transition
> > > > > > > > involves changing things in meta only, all data movement for
> > > > > > > > locality is handled by HDFS. In the case where rootdir is a
> > > > > > > > shared file system, there shouldn't be any data movement
> > > > > > > > with a region reassignment, correct? I'm running into
> > > > > > > > performance issues where region assignment takes a very long
> > > > > > > > time and I'm trying to figure out why.
> > > > > > > >
> > > > > > > > Thanks!
> > > >
> > > > --
> > > > Thanks & Regards,
> > > > Anil Gupta
