Re: shedding regions when one region server dies

Jack Levin Mon, 04 Oct 2010 23:01:30 -0700

By trunk, you mean 0.89 or 0.20.6?

-Jack


On Mon, Oct 4, 2010 at 10:59 PM, Jack Levin <[email protected]> wrote:
> Full stop of all region servers, restart of master, is what brings it all 
> back:
>
> Please see attached.  Lots of data there; search for 'Shedding'.
>
> -Jack
>
> On Mon, Oct 4, 2010 at 9:42 PM, Stack <[email protected]> wrote:
>> So, required a start/stop to fix balance issue?
>>
>> Can I see master log from around problematic time?
>>
>> (The load balancer has been completely redone in TRUNK)
>>
>> St.Ack
>>
>> On Mon, Oct 4, 2010 at 6:23 PM, Jack Levin <[email protected]> wrote:
>>> http://pastebin.com/suw2QVYg this is the OOME event.
>>>
>>> When I started it up, the master eventually stopped shedding at 14
>>> regions each (used to be 700 on 10 servers) and stayed there for a
>>> while. I waited 10 minutes, then stopped/started all region servers,
>>> and they came up in 5 minutes.
>>>
>>> -Jack
>>>
>>> On Mon, Oct 4, 2010 at 5:48 PM, Jack Levin <[email protected]> wrote:
>>>> 2010-10-04 17:47:25,449 DEBUG
>>>> org.apache.hadoop.hbase.master.RegionManager: Server(s) are carrying
>>>> only 2 regions. Server mtab5.prod.imageshack.com,60020,1285878100774
>>>> is most loaded (290). Shedding 32 regions to pass to  least loaded
>>>> (numMoveToLowLoaded=177)
>>>>
>>>>
>>>> I observe that the number of loaded regions sheds pretty much to zero
>>>> before starting back up (taking a long time in the process), even though
>>>> I had the server that OOME'ed started up again.  It seems to me there
>>>> might be a bug in the rebalancing logic?
>>>>
>>>> -Jack
>>>>
>>>
>>
>
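
For anyone searching a master log for 'Shedding' as suggested above, here is a minimal sketch of pulling the numbers out of those messages. The regex is modelled only on the single RegionManager DEBUG line quoted in this thread, so the exact wording in other HBase versions is an assumption, and the names `SHEDDING_RE` and `parse_shedding` are hypothetical helpers, not part of HBase.

```python
import re

# Pattern modelled on the one RegionManager DEBUG line quoted in this thread.
# Assumption: other HBase versions may word the message differently.
SHEDDING_RE = re.compile(
    r"Server (?P<server>\S+)\s+is most loaded \((?P<load>\d+)\)\.\s+"
    r"Shedding (?P<shed>\d+) regions to pass to\s+least loaded\s+"
    r"\(numMoveToLowLoaded=(?P<to_low>\d+)\)"
)


def parse_shedding(text):
    """Return one dict per 'Shedding' message found in text."""
    # Collapse all whitespace so a message wrapped across lines still matches.
    flat = " ".join(text.split())
    return [
        {
            "server": m["server"],
            "load": int(m["load"]),
            "shed": int(m["shed"]),
            "to_low": int(m["to_low"]),
        }
        for m in SHEDDING_RE.finditer(flat)
    ]


# The log message from this thread, wrapped as it appears in the email.
SAMPLE = """2010-10-04 17:47:25,449 DEBUG
org.apache.hadoop.hbase.master.RegionManager: Server(s) are carrying
only 2 regions. Server mtab5.prod.imageshack.com,60020,1285878100774
is most loaded (290). Shedding 32 regions to pass to  least loaded
(numMoveToLowLoaded=177)"""

if __name__ == "__main__":
    for event in parse_shedding(SAMPLE):
        print(event)
```

Running it over a whole master log and watching how `load` and `shed` change from one message to the next would show whether the most loaded server really drains toward zero before regions are handed back.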
