Re: help diagnosing issue

Vladimir Rodionov Tue, 01 Sep 2015 15:48:19 -0700

OK, from the beginning.

1. RegionTooBusy is thrown when the memstore size exceeds the region flush
size multiplied by the memstore block multiplier. This is a sign of a great
imbalance on the write path: either some regions are much hotter than
others, or compaction cannot keep up with the load, so you hit the blocking
store file count and flushes (and therefore writes) are blocked for 90
seconds by default. Which one is your case?


2. Your region load is unbalanced because the default region split
algorithm does not do its job well. Try to presplit (salt) into more than
40 buckets. Can you do 256?
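
As a rough illustration of both points, here is a minimal Python sketch.
It makes several assumptions. The flush size (128 MB) and multiplier (2)
are illustrative values only; the real ones come from
hbase.hregion.memstore.flush.size and
hbase.hregion.memstore.block.multiplier in your hbase-site.xml. The
salt_bucket function is a hypothetical stand-in (MD5 modulo the bucket
count) and is not the hash Phoenix actually uses. The row keys are
synthetic.

```python
import hashlib
from collections import Counter


def blocking_memstore_bytes(flush_size_bytes: int, block_multiplier: int) -> int:
    """Memstore size above which a region rejects writes (point 1)."""
    return flush_size_bytes * block_multiplier


def salt_bucket(row_key: bytes, buckets: int) -> int:
    """Hypothetical salt: stable hash of the row key modulo the bucket count.

    This is NOT Phoenix's real hash function; it only illustrates the idea.
    """
    digest = hashlib.md5(row_key).digest()
    return int.from_bytes(digest[:4], "big") % buckets


# Illustrative values only; check your own hbase-site.xml.
flush_size = 128 * 1024 * 1024  # hbase.hregion.memstore.flush.size
multiplier = 2                  # hbase.hregion.memstore.block.multiplier
limit = blocking_memstore_bytes(flush_size, multiplier)
print(limit // (1024 * 1024), "MB")  # 256 MB

# Synthetic keys: compare how evenly 40 and 256 buckets spread the writes.
keys = [f"row-{i}".encode() for i in range(100_000)]
for buckets in (40, 256):
    counts = Counter(salt_bucket(k, buckets) for k in keys)
    print(buckets, "buckets: min", min(counts.values()),
          "max", max(counts.values()))
```

With more buckets, each bucket receives a smaller share of the writes, so
a single memstore is less likely to reach the blocking limit. This only
holds if the keys hash evenly, which the sketch assumes.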

-Vlad

On Tue, Sep 1, 2015 at 3:29 PM, Samarth Jain <[email protected]> wrote:

> Ralph,
>
> Couple of questions.
>
> Do you have phoenix stats enabled?
>
> Can you send us a stack trace of the RegionTooBusy exception? Looking at
> the HBase code, it is thrown in a few places. It would be good to check
> where the resource crunch is occurring.
>
>
>
> On Tue, Sep 1, 2015 at 2:26 PM, Perko, Ralph J <[email protected]>
> wrote:
>
>> Hi, I have run into an issue several times now and could really use some
>> help diagnosing the problem.
>>
>> Environment:
>> phoenix 4.4
>> hbase 0.98
>> 34 node cluster
>> Tables are defined with 40 salt buckets
>> We are continuously loading large bz2-compressed CSV files into Phoenix
>> via Pig. The data is in the hundreds of TBs per month.
>>
>> The process runs well for a few weeks but as the regions split and the
>> number of regions gets into the hundreds per table we begin to get
>> “RegionTooBusy” exceptions around Phoenix write code when the Pig jobs run.
>>
>> Something else I have noticed is that the number of requests on the
>> regions becomes really unbalanced.  While the number of regions is around
>> 40, 80, or 120, the number of requests per region (via the hbase master
>> site) is pretty well balanced.  But as the number gets into the 200s, many
>> of the regions have 0 requests while the other regions have hundreds of
>> millions of requests.
>>
>> If I drop the tables and start over the issue goes away.  But we are
>> approaching a production deadline and this is no longer an option.
>>
>> The cluster is on a closed network so sending log files is not possible
>> although I can send scanned images of logs and answer specific questions.
>>
>> Can you please help me diagnose this issue?
>>
>> Thanks!
>> Ralph
>>
>>
>
