If you made the change, can you share your experience/results?

On Wed, Dec 15, 2010 at 12:04 AM, Jan Lukavský <[email protected]> wrote:

> We can give it a try. Currently we use 512 MiB per region; is there an
> upper bound for this value that we shouldn't cross? Are there any
> side effects we should expect if we set this value to, say, 1 GiB? I
> suppose at least slightly longer random gets?
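>
> I assume this means bumping hbase.hregion.max.filesize in the server-side
> hbase-site.xml, something like the following (the 1 GiB figure is just an
> example, not a recommendation):
>
>   <property>
>     <name>hbase.hregion.max.filesize</name>
>     <!-- split a region once any of its stores grows past 1 GiB -->
>     <value>1073741824</value>
>   </property>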
>
> Thanks,
>  Jan
>
>
> On 14.12.2010 18:50, Stack wrote:
>
>> Can you do w/ fewer regions?  1k-plus per server is pushing it, I'd say.
>>  Can you up your region sizes, for instance?
>> St.Ack
>>
>> On Mon, Dec 13, 2010 at 8:36 AM, Jan Lukavský <[email protected]> wrote:
>>
>>> Hi all,
>>>
>>> we are using HBase 0.20.6 on a cluster of about 25 nodes with about 30k
>>> regions and are experiencing an issue which causes running M/R jobs to
>>> fail. When we restart a single RegionServer, the following happens:
>>>  1) all regions of that RS get reassigned to the remaining (say 24) nodes
>>>  2) when the restarted RegionServer comes back up, the HMaster closes
>>> about 60 regions on each of the 24 nodes and assigns them back to the
>>> restarted node
>>>
>>> Now, step 1) is usually very quick (if we can assign 10 regions per
>>> heartbeat per server, that makes 240 regions per heartbeat across the
>>> whole cluster: 24 remaining servers x 10). Step 2) seems problematic,
>>> because first about 1200 regions get unassigned, and then they are slowly
>>> assigned back to the single RS (again at 10 regions per heartbeat). During
>>> this time the clients of the map tasks connected to those regions throw
>>> RetriesExhaustedException.
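>>>
>>> We could of course give the clients more headroom via the standard
>>> hbase.client.retries.number and hbase.client.pause settings in the
>>> client-side hbase-site.xml, e.g. (the values shown are only illustrative):
>>>
>>>   <property>
>>>     <name>hbase.client.retries.number</name>
>>>     <value>20</value>
>>>   </property>
>>>   <property>
>>>     <name>hbase.client.pause</name>
>>>     <!-- milliseconds between client retries -->
>>>     <value>2000</value>
>>>   </property>
>>>
>>> but that only papers over the long window while regions sit unassigned.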
>>>
>>> I'm aware that we can limit the number of regions closed per RegionServer
>>> heartbeat with hbase.regions.close.max, but this config option seems a bit
>>> unsatisfactory, because as we increase the size of the cluster we will get
>>> more and more regions unassigned in a single cluster heartbeat (say we
>>> limit it to 1; we then get 24 regions unassigned, but only 10 assigned,
>>> per heartbeat). This led us to a solution which seems quite simple: we
>>> have introduced a new config option that limits the number of regions in
>>> transition. When regionsInTransition.size() crosses that boundary, we
>>> temporarily stop the load balancer. This seems to resolve our issue,
>>> because no region stays unassigned for a long time and the clients manage
>>> to recover within their number of retries.
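>>>
>>> Roughly, the idea looks like this (a simplified sketch only; the class and
>>> method names are just for illustration, not the actual patch):
>>>
>>>   // Guard consulted by the master before it moves more regions around:
>>>   // skip load balancing while too many regions are already in transition.
>>>   public class TransitionGuard {
>>>     private final int maxRegionsInTransition;
>>>
>>>     public TransitionGuard(int maxRegionsInTransition) {
>>>       this.maxRegionsInTransition = maxRegionsInTransition;
>>>     }
>>>
>>>     /** Returns true if the balancer may run this round. */
>>>     public boolean shouldBalance(int regionsInTransition) {
>>>       return regionsInTransition < maxRegionsInTransition;
>>>     }
>>>   }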
>>>
>>> My question is: is this a general issue for which a new config option
>>> should be proposed, or am I missing something and could we have resolved
>>> the issue with some other config option tuning?
>>>
>>> Thanks.
>>>  Jan
>>>
>>>
>>>
>
> --
>
> Jan Lukavský
> programmer
> Seznam.cz, a.s.
> Radlická 608/2
> 15000, Praha 5
>
> [email protected]
> http://www.seznam.cz
>
>
