Typo. Not the total load on the machine but on the hbase cluster.

Thanks
D

On Feb 15, 2014 9:24 AM, "divye sheth" <[email protected]> wrote:
> The 2417 is the total load on the machine. When the regionserver crashes
> the master autobalances the regions.
>
> Also, when you run the balancer externally, one thing you should note is
> that the balancer runs on a table in a RS. So if the total regions for a
> table are 20 then in your case the mean would be 4. Check using the hbase
> ui if any table has regions equal to (average +- 1)
>
> Thanks
> D
> On Feb 15, 2014 9:13 AM, "Ted Yu" <[email protected]> wrote:
>
>> Please take a look at http://hbase.apache.org/book.html#hbase_metrics.
>>
>> You should pay attention to callQueueLength, compactionQueueLength,
>> readRequestsCount and writeRequestsCount.
>>
>> Cheers
>>
>>
>> On Fri, Feb 14, 2014 at 7:13 PM, Rohit Kelkar <[email protected]> wrote:
>>
>> > It could have been under load because I am not salting the keys. If I
>> > were in a position to replicate this issue, what metrics should I
>> > capture so that I can find out whether it was under load?
>> >
>> > - R
>> >
>> > On Friday, February 14, 2014, Ted Yu <[email protected]> wrote:
>> >
>> > > From region server log - was server5 under heavy load ?
>> > >
>> > >
>> > > 1. 2014-02-14 16:06:05,700 WARN org.apache.hadoop.hbase.util.Sleeper:
>> > >    We slept 99984ms instead of 3000ms, this is likely due to a long
>> > >    garbage collecting pause and it's usually bad, see
>> > >    http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
>> > > 2. ...
>> > > 3. 2014-02-14 16:06:05,783 FATAL
>> > >    org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region
>> > >    server server5,60020,1392355987269: Unhandled exception:
>> > >    org.apache.hadoop.hbase.YouAreDeadException: Server REPORT rejected;
>> > >    currently processing server5,60020,1392355987269 as dead server
>> > >
>> > >
>> > >
>> > > On Fri, Feb 14, 2014 at 5:00 PM, Rohit Kelkar <[email protected]> wrote:
>> > >
>> > > > Thanks for your inputs,
>> > > > I am sharing the master log - http://pastebin.com/Xi9P6Ykr
>> > > > and the region server log of the failed region server -
>> > > > http://pastebin.com/1munghDv
>> > > >
>> > > > - R
>> > > >
>> > > >
>> > > > On Fri, Feb 14, 2014 at 6:24 PM, Ted Yu <[email protected]> wrote:
>> > > >
>> > > > > Looking at bug fixes since 0.94.2, I wonder if you are experiencing
>> > > > > the following which went into 0.94.10 :
>> > > > > HBASE-8432 a table with unbalanced regions will balance indefinitely
>> > > > >
>> > > > > Master log would tell us more.
>> > > > >
>> > > > >
>> > > > > On Fri, Feb 14, 2014 at 4:18 PM, Rohit Kelkar <[email protected]>
>> > > > > wrote:
>> > > > >
>> > > > > > Sorry, mis-stated the version, it's 0.94.2
>> > > > > >
>> > > > > > - R
>> > > > > >
>> > > > > >
>> > > > > > On Fri, Feb 14, 2014 at 5:59 PM, Ted Yu <[email protected]> wrote:
>> > > > > >
>> > > > > > > bq. it does not change the status of the assignments.
>> > > > > > >
>> > > > > > > Can you check / pastebin master log to see what caused the
>> > > > > > > balancing to stop ?
>> > > > > > >
>> > > > > > > bq. attributing the region server crash to the disproportionately
>> > > > > > > high number of regions on that server?
>> > > > > > >
>> > > > > > > Checking region server log on server5 should give us more clues.
>> > > > > > >
>> > > > > > > bq. 0.92.4
>> > > > > > >
>> > > > > > > please consider upgrading :-)
>> > > > > > >
>> > > > > > >
>> > > > > > > On Fri, Feb 14, 2014 at 3:52 PM, Rohit Kelkar <[email protected]>
>> > > > > > > wrote:
>> > > > > > >
>> > > > > > > > I am using hbase version 0.92.4 on a 5 node cluster. I am
>> > > > > > > > seeing that a particular region server often crashes. A
>> > > > > > > > status 'simple' on hbase shell gives the following stats
>> > > > > > > >
>> > > > > > > >
>> > > > > > > > HBase Shell; enter 'help<RETURN>' for list of supported commands.
>> > > > > > > > Type "exit<RETURN>" to leave the HBase Shell
>> > > > > > > > Version 0.94.2, r1395367, Sun Oct 7 19:11:01 UTC 2012
>> > > > > > > > status 'simple'
>> > > > > > > > 4 live servers
>> > > > > > > > server7:60020 1392017875910 requestsPerSecond=0,
>> > > > > > > > numberOfOnlineRegions=419, usedHeapMB=3315, maxHeapMB=6127
>> > > > > > > > server4:60020 1392300859332 requestsPerSecond=843,
>> > > > > > > > numberOfOnlineRegions=379, usedHeapMB=2070, maxHeapMB=6127
>> > > > > > > > server3:60020 1391583646998 requestsPerSecond=429,
>> > > > > > > > numberOfOnlineRegions=653, usedHeapMB=3198, maxHeapMB=6127
>> > > > > > > > server6:60020 1391583647588 requestsPerSecond=0,
>> > > > > > > > numberOfOnlineRegions=966, usedHeapMB=2975, maxHeapMB=6127
>> > > > > > > > 1 dead servers
>> > > > > > > > server5,60020,1392108515637
>> > > > > > > > Aggregate load: 1272, regions: 2417
>> > > > > > > >
>> > > > > > > > The dead region server has 2417 regions as opposed to 419, 379,
>> > > > > > > > 653, 966 regions on other servers. Am I right in attributing
>> > > > > > > > the region server crash to the disproportionately high number
>> > > > > > > > of regions on that server?
>> > > > > > > >
>> > > > > > > > If I invoke the balancer on hbase shell using the "balancer"
>> > > > > > > > command it returns true. But it does not change the status of
>> > > > > > > > the assignments.
>> > > > > > > >
>> > > > > > > > - R
>> > > > > > > >
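
A quick arithmetic check of the figures quoted above, as a minimal Python sketch. The region counts are copied from the status 'simple' output in this thread. The function name summarize and the slop parameter are made up for illustration, and the (average +- 1) tolerance is the rule of thumb mentioned above, not something read from the HBase balancer code. It only looks at cluster-wide counts; per-table counts are not in the thread and would have to be read from the hbase ui.

```python
# Region counts per live server, copied from the quoted `status 'simple'` output.
live_regions = {
    "server7": 419,
    "server4": 379,
    "server3": 653,
    "server6": 966,
}


def summarize(regions, slop=1):
    """Return (total, mean, outliers) for a {server: region_count} mapping.

    `slop` is the assumed tolerance around the mean (the "average +- 1"
    rule of thumb from the thread); servers further away are reported.
    """
    total = sum(regions.values())
    mean = total / len(regions)
    outliers = {s: n for s, n in regions.items() if abs(n - mean) > slop}
    return total, mean, outliers


total, mean, outliers = summarize(live_regions)
print("total regions on live servers:", total)
print("mean regions per live server:", mean)
for server, count in sorted(outliers.items()):
    print(f"{server}: {count} regions, {count - mean:+.2f} from the mean")
```

The four live servers add up to 2417, which matches the "regions: 2417" figure in the status output, so that number reads as the cluster total rather than the region count of server5.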
