Re: uneven region distribution

divye sheth Fri, 14 Feb 2014 19:56:19 -0800

Typo: I meant the total load on the HBase cluster, not on the machine.
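
For reference, here is a quick check using the numbers from the status 'simple' output quoted below, condensed to one line per live server. The four live servers' numberOfOnlineRegions add up to exactly the "regions: 2417" figure, and their requestsPerSecond add up to the "Aggregate load: 1272" figure. Both numbers are therefore cluster-wide totals and say nothing about server5 specifically. The parsing helper is only an illustration and is not part of HBase.

```python
import re

# Live-server lines from the "status 'simple'" output in this thread,
# condensed to one line per server (host:port plus the load figures).
STATUS_LINES = [
    "server7:60020 requestsPerSecond=0, numberOfOnlineRegions=419",
    "server4:60020 requestsPerSecond=843, numberOfOnlineRegions=379",
    "server3:60020 requestsPerSecond=429, numberOfOnlineRegions=653",
    "server6:60020 requestsPerSecond=0, numberOfOnlineRegions=966",
]


def parse_counts(lines):
    """Map each server to (requestsPerSecond, numberOfOnlineRegions)."""
    counts = {}
    for line in lines:
        server = line.split()[0]
        rps = int(re.search(r"requestsPerSecond=(\d+)", line).group(1))
        regions = int(re.search(r"numberOfOnlineRegions=(\d+)", line).group(1))
        counts[server] = (rps, regions)
    return counts


counts = parse_counts(STATUS_LINES)
total_rps = sum(rps for rps, _ in counts.values())
total_regions = sum(regions for _, regions in counts.values())
print(total_regions, total_rps)  # 2417 1272
```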

Thanks
D
On Feb 15, 2014 9:24 AM, "divye sheth" <[email protected]> wrote:


> The 2417 is the total load on the machine. When a region server crashes,
> the master automatically rebalances its regions.
>
> Also, when you run the balancer manually, note that the balancer works on
> one table at a time across the region servers. So if a table has 20 regions
> in total, in your case the mean would be 4 per server. Check in the HBase UI
> whether each table has (average +/- 1) regions on every server.
>
> Thanks
> D
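
The per-table check described above can be sketched as follows. This is a simplified version of the "average +/- 1" rule of thumb from this thread and not the balancer's actual algorithm. The per-table region counts are hypothetical numbers of the kind you would read off the HBase UI for a single table.

```python
def out_of_band(regions_per_server, slack=1):
    """Return the servers whose region count for ONE table falls outside
    mean +/- slack, where the mean is taken over the servers listed."""
    avg = sum(regions_per_server.values()) / len(regions_per_server)
    return {
        server: count
        for server, count in regions_per_server.items()
        if not (avg - slack <= count <= avg + slack)
    }


# Hypothetical per-server region counts for two tables of 20 regions each,
# spread over 5 servers, so the mean is 4 and the acceptable band is 3..5.
table_a = {"server3": 4, "server4": 5, "server5": 3, "server6": 4, "server7": 4}
table_b = {"server3": 3, "server4": 4, "server5": 3, "server6": 7, "server7": 3}

print(out_of_band(table_a))  # {}
print(out_of_band(table_b))  # {'server6': 7}
```

The mean depends on which servers you pass in. With server5 down, you would list only the four live servers, which moves the mean for a 20-region table from 4 to 5.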
> On Feb 15, 2014 9:13 AM, "Ted Yu" <[email protected]> wrote:
>
>> Please take a look at http://hbase.apache.org/book.html#hbase_metrics.
>>
>> You should pay attention to callQueueLength, compactionQueueLength,
>> readRequestsCount and writeRequestsCount.
>>
>> Cheers
>>
>>
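
One way to capture those metrics for later comparison is to take periodic snapshots and convert the request counters into rates. This minimal sketch assumes that readRequestsCount and writeRequestsCount are cumulative counters, while callQueueLength and compactionQueueLength are point-in-time gauges. It does not cover how the samples are collected. The Sample type and all of the numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One hypothetical snapshot of a region server's metrics."""
    t_seconds: float
    read_requests: int         # readRequestsCount (assumed cumulative)
    write_requests: int        # writeRequestsCount (assumed cumulative)
    call_queue_len: int        # callQueueLength (assumed point-in-time gauge)
    compaction_queue_len: int  # compactionQueueLength (assumed gauge)


def summarize(prev, cur):
    """Turn two consecutive samples into request rates plus current queue sizes."""
    dt = cur.t_seconds - prev.t_seconds
    if dt <= 0:
        raise ValueError("samples must be in increasing time order")
    return {
        "reads_per_s": (cur.read_requests - prev.read_requests) / dt,
        "writes_per_s": (cur.write_requests - prev.write_requests) / dt,
        "call_queue": cur.call_queue_len,
        "compaction_queue": cur.compaction_queue_len,
    }


# Two made-up samples taken 60 seconds apart.
a = Sample(0, 1_000, 5_000, 0, 2)
b = Sample(60, 4_000, 65_000, 12, 9)
print(summarize(a, b))
# {'reads_per_s': 50.0, 'writes_per_s': 1000.0, 'call_queue': 12, 'compaction_queue': 9}
```

If a counter goes backwards between two samples, the region server has presumably restarted, and you should discard that pair of samples.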
>> On Fri, Feb 14, 2014 at 7:13 PM, Rohit Kelkar <[email protected]>
>> wrote:
>>
>> > It could have been under load because I am not salting the keys. If I
>> > were in a position to replicate this issue, what metrics should I capture
>> > to find out whether it was under load?
>> >
>> > - R
>> >
>> > On Friday, February 14, 2014, Ted Yu <[email protected]> wrote:
>> >
>> > > From the region server log: was server5 under heavy load?
>> > >
>> > >
>> > >    2014-02-14 16:06:05,700 WARN org.apache.hadoop.hbase.util.Sleeper: We
>> > >    slept 99984ms instead of 3000ms, this is likely due to a long garbage
>> > >    collecting pause and it's usually bad, see
>> > >    http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
>> > >    ...
>> > >    2014-02-14 16:06:05,783 FATAL
>> > >    org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region
>> > >    server server5,60020,1392355987269: Unhandled exception:
>> > >    org.apache.hadoop.hbase.YouAreDeadException: Server REPORT rejected;
>> > >    currently processing server5,60020,1392355987269 as dead server
>> > >
>> > >
>> > >
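
The first log line comes from a pause detector. A thread asked to sleep for 3000 ms, but 99984 ms elapsed, so the process stalled for roughly 97 seconds. A stall that long can outlast the ZooKeeper session, after which the master treats the server as dead. That matches the YouAreDeadException that follows. The sketch below shows the same sleep-and-measure idea. It is not HBase's Sleeper code, and the function names and the warning threshold are hypothetical.

```python
import time


def pause_overshoot_ms(expected_ms, slept_ms):
    """How much longer than requested a sleep took (0 if it was on time)."""
    return max(0, slept_ms - expected_ms)


def watch_for_pauses(period_ms=3000, warn_over_ms=1000, iterations=3,
                     sleep=time.sleep, clock=time.monotonic):
    """Sleep repeatedly and report every sleep that overshot by more than
    warn_over_ms (a symptom of a GC or other process-wide stall)."""
    warnings = []
    for _ in range(iterations):
        start = clock()
        sleep(period_ms / 1000)
        slept_ms = (clock() - start) * 1000
        if pause_overshoot_ms(period_ms, slept_ms) > warn_over_ms:
            warnings.append(f"slept {slept_ms:.0f}ms instead of {period_ms}ms")
    return warnings


# The overshoot from the log line above: 99984 ms slept vs 3000 ms requested.
print(pause_overshoot_ms(3000, 99984))  # 96984
```

Passing sleep and clock as parameters keeps the check testable. In a real process you would run watch_for_pauses on a background thread with the defaults.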
>> > > On Fri, Feb 14, 2014 at 5:00 PM, Rohit Kelkar <[email protected]>
>> > > wrote:
>> > >
>> > > > Thanks for your inputs,
>> > > > I am sharing the master log - http://pastebin.com/Xi9P6Ykr
>> > > > and the region server log of the failed region server -
>> > > > http://pastebin.com/1munghDv
>> > > >
>> > > > - R
>> > > >
>> > > >
>> > > > On Fri, Feb 14, 2014 at 6:24 PM, Ted Yu <[email protected]>
>> > > > wrote:
>> > > >
>> > > > > Looking at the bug fixes since 0.94.2, I wonder if you are
>> > > > > experiencing the following, which went into 0.94.10:
>> > > > > HBASE-8432 a table with unbalanced regions will balance indefinitely
>> > > > >
>> > > > > Master log would tell us more.
>> > > > >
>> > > > >
>> > > > > On Fri, Feb 14, 2014 at 4:18 PM, Rohit Kelkar <[email protected]>
>> > > > > wrote:
>> > > > >
>> > > > > > Sorry, I misstated the version; it's 0.94.2.
>> > > > > >
>> > > > > > - R
>> > > > > >
>> > > > > >
>> > > > > > On Fri, Feb 14, 2014 at 5:59 PM, Ted Yu <[email protected]>
>> > > > > > wrote:
>> > > > > >
>> > > > > > > bq.  it does not change the status of the assignments.
>> > > > > > >
>> > > > > > > Can you check / pastebin the master log to see what caused the
>> > > > > > > balancing to stop?
>> > > > > > >
>> > > > > > > bq. attributing the region server crash to the disproportionately
>> > > > > > > high number of regions on that server?
>> > > > > > >
>> > > > > > > Checking the region server log on server5 should give us more
>> > > > > > > clues.
>> > > > > > >
>> > > > > > > bq. 0.92.4
>> > > > > > >
>> > > > > > > please consider upgrading :-)
>> > > > > > >
>> > > > > > >
>> > > > > > > On Fri, Feb 14, 2014 at 3:52 PM, Rohit Kelkar <[email protected]>
>> > > > > > > wrote:
>> > > > > > >
>> > > > > > > > I am using HBase version 0.92.4 on a 5-node cluster. I am seeing
>> > > > > > > > that a particular region server often crashes. Running
>> > > > > > > > status 'simple' in the HBase shell gives the following stats:
>> > > > > > > >
>> > > > > > > >
>> > > > > > > > HBase Shell; enter 'help<RETURN>' for list of supported commands.
>> > > > > > > > Type "exit<RETURN>" to leave the HBase Shell
>> > > > > > > > Version 0.94.2, r1395367, Sun Oct 7 19:11:01 UTC 2012
>> > > > > > > >
>> > > > > > > > status 'simple'
>> > > > > > > > 4 live servers
>> > > > > > > >     server7:60020 1392017875910
>> > > > > > > >         requestsPerSecond=0, numberOfOnlineRegions=419, usedHeapMB=3315, maxHeapMB=6127
>> > > > > > > >     server4:60020 1392300859332
>> > > > > > > >         requestsPerSecond=843, numberOfOnlineRegions=379, usedHeapMB=2070, maxHeapMB=6127
>> > > > > > > >     server3:60020 1391583646998
>> > > > > > > >         requestsPerSecond=429, numberOfOnlineRegions=653, usedHeapMB=3198, maxHeapMB=6127
>> > > > > > > >     server6:60020 1391583647588
>> > > > > > > >         requestsPerSecond=0, numberOfOnlineRegions=966, usedHeapMB=2975, maxHeapMB=6127
>> > > > > > > > 1 dead servers
>> > > > > > > >     server5,60020,1392108515637
>> > > > > > > > Aggregate load: 1272, regions: 2417
>> > > > > > > >
>> > > > > > > > The dead region server has 2417 regions, as opposed to 419, 379,
>> > > > > > > > 653 and 966 regions on the other servers. Am I right in
>> > > > > > > > attributing the region server crash to the disproportionately
>> > > > > > > > high number of regions on that server?
>> > > > > > > >
>> > > > > > > > If I invoke the balancer from the HBase shell using the "balancer"
>> > > > > > > > command, it returns true, but it does not change the status of
>> > > > > > > > the assignments.
>> > > > > > > >
>> > > > > > > > - R
>> > > > > > > >
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>>
>
