I am using HBase version 0.94.2 on a 5-node cluster. I am seeing that a particular region server often crashes. Running status 'simple' in the HBase shell gives the following stats:

    HBase Shell; enter 'help<RETURN>' for list of supported commands.
    Type "exit<RETURN>" to leave the HBase Shell
    Version 0.94.2, r1395367, Sun Oct 7 19:11:01 UTC 2012

    status 'simple'
    4 live servers
        server7:60020 1392017875910
            requestsPerSecond=0, numberOfOnlineRegions=419, usedHeapMB=3315, maxHeapMB=6127
        server4:60020 1392300859332
            requestsPerSecond=843, numberOfOnlineRegions=379, usedHeapMB=2070, maxHeapMB=6127
        server3:60020 1391583646998
            requestsPerSecond=429, numberOfOnlineRegions=653, usedHeapMB=3198, maxHeapMB=6127
        server6:60020 1391583647588
            requestsPerSecond=0, numberOfOnlineRegions=966, usedHeapMB=2975, maxHeapMB=6127
    1 dead servers
        server5,60020,1392108515637
    Aggregate load: 1272, regions: 2417

The dead region server has 2417 regions, as opposed to 419, 379, 653, and 966 regions on the other servers. Am I right in attributing the region server crash to the disproportionately high number of regions on that server?

If I invoke the balancer from the HBase shell using the "balancer" command, it returns true, but the region assignments do not change.

- R
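
As a sanity check on the figures in the status output above, here is a minimal Python sketch that sums the per-server region counts and compares the total with the aggregate line. The function name `summarize_regions` and the regular expressions are assumptions made for illustration; they are not part of HBase, and the text is simply the pasted shell output.

```python
import re

# Pasted from the status 'simple' output above (banner lines omitted).
STATUS_OUTPUT = """\
4 live servers
    server7:60020 1392017875910
        requestsPerSecond=0, numberOfOnlineRegions=419, usedHeapMB=3315, maxHeapMB=6127
    server4:60020 1392300859332
        requestsPerSecond=843, numberOfOnlineRegions=379, usedHeapMB=2070, maxHeapMB=6127
    server3:60020 1391583646998
        requestsPerSecond=429, numberOfOnlineRegions=653, usedHeapMB=3198, maxHeapMB=6127
    server6:60020 1391583647588
        requestsPerSecond=0, numberOfOnlineRegions=966, usedHeapMB=2975, maxHeapMB=6127
1 dead servers
    server5,60020,1392108515637
Aggregate load: 1272, regions: 2417
"""


def summarize_regions(status_text):
    """Return (per-server region counts, their sum, aggregate region count).

    Hypothetical helper: it only parses text, it does not talk to HBase.
    """
    # One numberOfOnlineRegions=... entry appears per live server.
    per_server = [
        int(n) for n in re.findall(r"numberOfOnlineRegions=(\d+)", status_text)
    ]
    # The aggregate line is reported once at the end of the output.
    match = re.search(r"Aggregate load: \d+, regions: (\d+)", status_text)
    aggregate = int(match.group(1)) if match else None
    return per_server, sum(per_server), aggregate


if __name__ == "__main__":
    per_server, total, aggregate = summarize_regions(STATUS_OUTPUT)
    print("per-server regions:", per_server)
    print("sum over live servers:", total)
    print("aggregate regions reported:", aggregate)
```

For the output above, the counts on the four live servers sum to 2417, which is the same number as the aggregate "regions" figure.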
