By trunk, you mean 0.89 or 0.20.6?

-Jack

On Mon, Oct 4, 2010 at 10:59 PM, Jack Levin <[email protected]> wrote:
> Full stop of all region servers, restart of master, is what brings it all
> back:
>
> Please see attached. Lots of data there, search for 'Shedding'.
>
> -Jack
>
> On Mon, Oct 4, 2010 at 9:42 PM, Stack <[email protected]> wrote:
>> So, required a start/stop to fix balance issue?
>>
>> Can I see master log from around problematic time?
>>
>> (The load balancer has been completely redone in TRUNK)
>>
>> St.Ack
>>
>> On Mon, Oct 4, 2010 at 6:23 PM, Jack Levin <[email protected]> wrote:
>>> http://pastebin.com/suw2QVYg this is OOME event.
>>>
>>> When I started it up, the master eventually stopped shedding to 14
>>> regions each (used to be 700 on 10 servers), and stayed there for a
>>> while, I waited 10 minutes, and stopped/started all region servers,
>>> and they came up in 5 minutes.
>>>
>>> -Jack
>>>
>>> On Mon, Oct 4, 2010 at 5:48 PM, Jack Levin <[email protected]> wrote:
>>>> 2010-10-04 17:47:25,449 DEBUG
>>>> org.apache.hadoop.hbase.master.RegionManager: Server(s) are carrying
>>>> only 2 regions. Server mtab5.prod.imageshack.com,60020,1285878100774
>>>> is most loaded (290). Shedding 32 regions to pass to least loaded
>>>> (numMoveToLowLoaded=177)
>>>>
>>>>
>>>> I observe that the number of loaded regions sheds pretty much to zero
>>>> before starting back up (taking a long time in the process), even though
>>>> the server that OOME'ed had started up again. It seems there
>>>> might be a bug in the rebalancing logic?
>>>>
>>>> -Jack
>>>>
>>>
>>
>
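
For anyone searching the master log for 'Shedding' as suggested above, here is a minimal sketch of pulling the numbers out of those DEBUG lines. It assumes each event sits on a single log line in exactly the format quoted in the thread (the wrapping above comes from the mail client). `SHED_RE` and `parse_shedding` are hypothetical helper names, not part of HBase.

```python
import re

# Pattern modeled on the one RegionManager DEBUG line quoted in this thread.
# Assumption: other HBase versions may word this message differently.
SHED_RE = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) DEBUG "
    r"org\.apache\.hadoop\.hbase\.master\.RegionManager: "
    r"Server\(s\) are carrying only (?P<least>\d+) regions\. "
    r"Server (?P<server>\S+) is most loaded \((?P<most>\d+)\)\. "
    r"Shedding (?P<shed>\d+) regions to pass to least loaded "
    r"\(numMoveToLowLoaded=(?P<move>\d+)\)"
)


def parse_shedding(lines):
    """Return one dict per 'Shedding' event found in an iterable of log lines."""
    events = []
    for line in lines:
        m = SHED_RE.search(line)
        if m is None:
            continue  # not a shedding message, skip it
        events.append({
            "ts": m.group("ts"),
            "server": m.group("server"),
            "least": int(m.group("least")),  # regions on least loaded server(s)
            "most": int(m.group("most")),    # regions on the most loaded server
            "shed": int(m.group("shed")),    # regions being shed in this pass
            "move": int(m.group("move")),    # numMoveToLowLoaded value
        })
    return events


# The log line from this thread, joined back onto one line.
sample = (
    "2010-10-04 17:47:25,449 DEBUG "
    "org.apache.hadoop.hbase.master.RegionManager: Server(s) are carrying "
    "only 2 regions. Server mtab5.prod.imageshack.com,60020,1285878100774 "
    "is most loaded (290). Shedding 32 regions to pass to least loaded "
    "(numMoveToLowLoaded=177)"
)

if __name__ == "__main__":
    for event in parse_shedding([sample]):
        print(event)
```

Running `parse_shedding` over an open log file and plotting `most` against `ts` per server should make it easier to see whether the load really drains toward zero before regions are reassigned.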
