Hi Jan,

Thank you. I hope this did not come across as derogatory; I really meant it
in a friendly way (emails sometimes - errr often - do not convey this right).
Lars

On Tue, Dec 14, 2010 at 5:00 PM, Jan Lukavský <[email protected]> wrote:
> Hi Lars,
>
> sure, I understand this. :-)
>
> Thanks.
>
> On 14.12.2010 16:17, Lars George wrote:
>>
>> Hi Jan,
>>
>> Any day now!
>>
>> Really, there are just a few little road bumps, but nothing major, and
>> once they are resolved it will be released. Rushing it just for the sake
>> of releasing it will not make anyone happy (if we then find issues right
>> afterwards). Please bear with us!
>>
>> Lars
>>
>> On Tue, Dec 14, 2010 at 10:20 AM, Jan Lukavský
>> <[email protected]> wrote:
>>>
>>> Hi Daniel,
>>>
>>> I thought that version 0.90.0 would have major rewrites in this area.
>>> Could you give a rough estimate of when the new version will be out?
>>>
>>> Thanks,
>>> Jan
>>>
>>> On 13.12.2010 20:43, Jean-Daniel Cryans wrote:
>>>>
>>>> Hi Jan,
>>>>
>>>> That area of HBase was reworked a lot in the upcoming 0.90.0, and
>>>> region opening and closing can now be done in parallel for multiple
>>>> regions.
>>>>
>>>> Also, the balancer works differently and may not assign even a single
>>>> region to a new region server (or a dead one that was restarted) until
>>>> the balancer runs (it now runs every 5 minutes).
>>>>
>>>> Those behaviors are completely new, so they will probably need better
>>>> tuning, and there's still a lot to do regarding region balancing in
>>>> general, but it's probably worth trying it out.
>>>>
>>>> Regarding limiting the number of regions, you probably want to use LZO
>>>> (99% of the time it's faster for your tables) and set MAX_FILESIZE to
>>>> something like 1GB, since the default is pretty low.
>>>>
>>>> Maybe your new config would be useful in the new master too; I have to
>>>> give it more thought.
>>>>
>>>> J-D
>>>>
>>>> On Mon, Dec 13, 2010 at 8:36 AM, Jan Lukavský
>>>> <[email protected]> wrote:
>>>>>
>>>>> Hi all,
>>>>>
>>>>> we are using HBase 0.20.6 on a cluster of about 25 nodes with about
>>>>> 30k regions, and we are experiencing an issue which causes running
>>>>> M/R jobs to fail. When we restart a single RegionServer, the
>>>>> following happens:
>>>>> 1) all regions of that RS get reassigned to the remaining (say 24)
>>>>> nodes
>>>>> 2) when the restarted RegionServer comes up, the HMaster closes about
>>>>> 60 regions on each of the 24 nodes and assigns them back to the
>>>>> restarted node
>>>>>
>>>>> Now, step 1) is usually very quick (if we can assign 10 regions per
>>>>> heartbeat, we get 240 regions per heartbeat on the whole cluster).
>>>>> Step 2) seems problematic, because first about 1200 regions get
>>>>> unassigned, and then they get slowly assigned to the single RS (again
>>>>> at 10 regions per heartbeat). During this time, the clients in the
>>>>> map tasks connected to those regions throw RetriesExhaustedException.
>>>>>
>>>>> I'm aware that we can limit the number of regions closed per
>>>>> RegionServer heartbeat with hbase.regions.close.max, but this config
>>>>> option seems a bit unsatisfactory, because as we increase the size of
>>>>> the cluster, we will get more and more regions unassigned in a single
>>>>> cluster heartbeat (say we limit this to 1, then we get 24 unassigned
>>>>> regions, but only 10 assigned per heartbeat). This led us to a
>>>>> solution which seems quite simple. We have introduced a new config
>>>>> option which is used to limit the number of regions in transition.
>>>>> When regionsInTransition.size() crosses the boundary, we temporarily
>>>>> stop the load balancer.
>>>>> This seems to resolve our issue, because no region stays unassigned
>>>>> for a long time, and clients manage to recover within their number
>>>>> of retries.
>>>>>
>>>>> My question is: is this a general issue for which a new config option
>>>>> should be proposed, or am I missing something, and we could have
>>>>> resolved the issue with some other config option tuning?
>>>>>
>>>>> Thanks,
>>>>> Jan
>>>>>
>>>>>
>>>
>>> --
>>>
>>> Jan Lukavský
>>> programmer
>>> Seznam.cz, a.s.
>>> Radlická 608/2
>>> 15000, Praha 5
>>>
>>> [email protected]
>>> http://www.seznam.cz
>>>
>>>
>
>
> --
>
> Jan Lukavský
> programmer
> Seznam.cz, a.s.
> Radlická 608/2
> 15000, Praha 5
>
> [email protected]
> http://www.seznam.cz
>
>
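For reference, J-D's suggestion above (LZO compression plus a MAX_FILESIZE of
about 1GB) could be applied with the 0.90-era Java client along these lines.
This is only a sketch: the table name "mytable" is a placeholder, and LZO must
already be installed on every node before enabling it:

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.client.HBaseAdmin;
    import org.apache.hadoop.hbase.io.hfile.Compression;
    import org.apache.hadoop.hbase.util.Bytes;

    public class RegionSizeTuning {
      public static void main(String[] args) throws Exception {
        HBaseAdmin admin = new HBaseAdmin(HBaseConfiguration.create());
        byte[] table = Bytes.toBytes("mytable"); // placeholder table name

        HTableDescriptor desc = admin.getTableDescriptor(table);
        // Split regions at ~1GB instead of the small default.
        desc.setMaxFileSize(1024L * 1024L * 1024L);
        // Compress every column family's store files with LZO.
        for (HColumnDescriptor family : desc.getColumnFamilies()) {
          family.setCompressionType(Compression.Algorithm.LZO);
        }

        admin.disableTable(table); // table must be offline to modify it
        admin.modifyTable(table, desc);
        admin.enableTable(table);
      }
    }

Fewer, larger regions also shrink the reassignment storm Jan describes, since
there is simply less to move when a RegionServer restarts.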
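The options Jan mentions live in hbase-site.xml. A sketch with illustrative
values only: hbase.regions.close.max is the 0.20.x option he names, and
hbase.client.retries.number is the standard client retry count that determines
whether the map tasks survive the reassignment window:

    <property>
      <name>hbase.regions.close.max</name>
      <!-- regions closed per RegionServer per master heartbeat -->
      <value>1</value>
    </property>
    <property>
      <name>hbase.client.retries.number</name>
      <!-- raising this helps clients outlive long reassignments -->
      <value>20</value>
    </property>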
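Jan's balancer throttle is not part of any released HBase, but the idea boils
down to a check like the sketch below. The class name, the threshold, and the
plain map standing in for the master's regionsInTransition structure are all
illustrative, not his actual patch:

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    public class BalancerThrottleSketch {
      // Stand-in for the master's map of regions currently opening or closing.
      private final Map<String, String> regionsInTransition =
          new ConcurrentHashMap<String, String>();

      // Hypothetical limit; the option name in Jan's patch is not given.
      private final int maxRegionsInTransition = 250;

      // Consulted once per balance cycle before the balancer may run.
      boolean balancerMayRun() {
        // While too many regions are in flight, skip rebalancing so that
        // already-unassigned regions are reopened before new ones are closed.
        return regionsInTransition.size() < maxRegionsInTransition;
      }
    }

The appeal of gating on regionsInTransition.size() rather than on a per-node
close limit is exactly what Jan points out: it stays bounded as the cluster
grows, instead of scaling with the number of RegionServers.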
