I do thank you for the advice, and I will try it. Is there a quick two- or three-sentence summary about why this is the proper order?
I would have thought since the -ROOT- and .META. are on RS, that you'd want to stop the master first before stopping the RS. Perhaps I'm thinking of services incorrectly, but I always imagine that a supporting function should be stopped after the function that it supports. For example, close all files before unmounting filesystem. Unmount all filesystems before powering down. Thus, perhaps I'm misunderstanding the dependencies between RS and HMaster. Is HMaster supporting RS or vice-versa?

On Fri, Mar 15, 2013 at 12:43 AM, ramkrishna vasudevan <[email protected]> wrote:

> Can you do one thing.
> When you stop the services do this way
> -> Stop the RS
> -> Then stop the master.
>
> That is always better i feel.
>
> REgards
> Ram
>
> On Fri, Mar 15, 2013 at 6:49 AM, Time Less <[email protected]> wrote:
>
> > We have a 15-node HBase cluster with RS on same nodes as HDFS DN. We do a
> > full restart of HBase[1]. Sometimes this works. But sometimes several of
> > the RS have this in their logs:
> >
> > """
> > regionserverHostname: 2013-03-12 16:48:03,396 DEBUG
> > org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation:
> > locateRegionInMeta parentTable=-ROOT-,
> > metaLocation={region=-ROOT-,,0.70236052, hostname=hbaseMasterHostname,
> > port=60020}, attempt=25 of 100 failed; retrying after sleep of 32000
> > because: org.apache.hadoop.hbase.NotServingRegionException: Region is not
> > online: -ROOT-,,0"
> > """
> >
> > The HMaster will be failing to find -ROOT- region[2] and will be stalled
> > starting up.
> >
> > The above counter from the logs will continue to increment to attempt
> > 100/100, then go back down to attempt 1/100 again. This will continue
> > forever until we delete the stale ZK entry /hbase/root-region-server. As
> > soon as we do, all RS get back to normal, HBase Master comes up, and life
> > is good.
> >
> > I searched JIRA and mailing lists and didn't find what appeared to be a
> > precise match. Does anyone have matching experience?
> >
> > HBase version: 0.92.1 (CDH4).
> >
> > [1] Stop Thrift. Stop HBase Master. Stop all RS. Stop Zookeeper. Reverse
> > this order for starting.
> > [2] I forget the precise verbiage from the HBase web UI. I will discover it
> > next time this happens if it's important, but it seems rather generic.
> >
> > --
> > *Tim Ellis: *Fifth Sigma, Inc. Multimedia and Technology++
> >

--
*Tim Ellis: *Fifth Sigma, Inc. Multimedia and Technology++
*Contact: *[email protected], 510-761-6610
*Urgent Contact:* [email protected] (gtalk preferred. if email, CC no-one)
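To make the two stop orders in this thread concrete, here is a minimal Python sketch. It only manipulates labels and runs nothing against a cluster. The service names are hypothetical placeholders, not real init-script or command names, and the assumption that the start order stays the exact reverse of the stop order comes from footnote [1], not from Ram's reply.

```python
# Hypothetical service labels taken from footnote [1]; these are NOT real
# command or init-script names, just placeholders for a dry run.
ORIGINAL_STOP_ORDER = ["thrift", "master", "regionservers", "zookeeper"]


def suggested_stop_order(order):
    """Return a copy of `order` with the RS stopped just before the master.

    This mirrors the suggestion quoted in the thread: stop the RS first,
    then stop the master. All other services keep their relative position.
    """
    result = [name for name in order if name != "regionservers"]
    result.insert(result.index("master"), "regionservers")
    return result


def start_order(stop_order):
    """Assumption: services are started in the reverse of the stop order."""
    return list(reversed(stop_order))


if __name__ == "__main__":
    stop = suggested_stop_order(ORIGINAL_STOP_ORDER)
    print("stop: ", " -> ".join(stop))
    print("start:", " -> ".join(start_order(stop)))
```

The sketch deliberately says nothing about why one order is safer than the other; that is the open question of this thread.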
