Can you try one thing? When you stop the services, do it in this order: stop the RS first, then stop the master.
I feel that order is always better.

Regards,
Ram

On Fri, Mar 15, 2013 at 6:49 AM, Time Less <[email protected]> wrote:

> We have a 15-node HBase cluster with RS on same nodes as HDFS DN. We do a
> full restart of HBase[1]. Sometimes this works. But sometimes several of
> the RS have this in their logs:
>
> """
> regionserverHostname: 2013-03-12 16:48:03,396 DEBUG
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation:
> locateRegionInMeta parentTable=-ROOT-,
> metaLocation={region=-ROOT-,,0.70236052, hostname=hbaseMasterHostname,
> port=60020}, attempt=25 of 100 failed; retrying after sleep of 32000
> because: org.apache.hadoop.hbase.NotServingRegionException: Region is not
> online: -ROOT-,,0"
> """
>
> The HMaster will be failing to find -ROOT- region[2] and will be stalled
> starting up.
>
> The above counter from the logs will continue to increment to attempt
> 100/100, then go back down to attempt 1/100 again. This will continue
> forever until we delete the stale ZK entry /hbase/root-region-server. As
> soon as we do, all RS get back to normal, HBase Master comes up, and life
> is good.
>
> I searched JIRA and mailing lists and didn't find what appeared to be a
> precise match. Does anyone have matching experience?
>
> HBase version: 0.92.1 (CDH4).
>
> [1] Stop Thrift. Stop HBase Master. Stop all RS. Stop Zookeeper. Reverse
> this order for starting.
> [2] I forget the precise verbiage from the HBase web UI. I will discover it
> next time this happens if it's important, but it seems rather generic.
>
> --
> *Tim Ellis: *Fifth Sigma, Inc. Multimedia and Technology++
>
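
To make the suggested ordering concrete, here is a minimal sketch that only computes and prints the sequence; it does not stop or start anything. The service labels are hypothetical placeholders, not real init-script names. Only "RS before master" comes from my suggestion above; keeping Thrift first and ZooKeeper last, and reversing the list for startup, are assumptions carried over from footnote [1] of the quoted message.

```python
# Sketch only: computes an ordering, it does not touch any running service.
# The labels below are hypothetical placeholders, not real init-script names.

# Order described in footnote [1] of the quoted message (master before RS).
ORDER_FROM_FOOTNOTE = ["thrift", "master", "regionserver", "zookeeper"]

# Assumed ranking: RS before master (the suggestion in this reply);
# Thrift first and ZooKeeper last are kept as in footnote [1].
_STOP_RANK = {"thrift": 0, "regionserver": 1, "master": 2, "zookeeper": 3}


def suggested_stop_order(services):
    """Return the services sorted so that region servers stop before the master."""
    return sorted(services, key=lambda name: _STOP_RANK[name])


def start_order(stop_order):
    """Startup is assumed to be the exact reverse of the stop order."""
    return list(reversed(stop_order))


if __name__ == "__main__":
    stop = suggested_stop_order(ORDER_FROM_FOOTNOTE)
    print("stop: ", " -> ".join(stop))
    print("start:", " -> ".join(start_order(stop)))
```

Map each label to whatever commands your installation actually uses to stop and start that daemon.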
