Ah, then doing "bin/hbase-daemon.sh stop master" on the master node is the equivalent, but don't stop the region servers themselves, as the master will take care of it. Doing a stop on both the master and the region servers will screw things up.
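A minimal sketch of that sequence, assuming HBase lives under /usr/lib/hbase (the CDH package default; adjust paths for your layout, and note the HDFS service commands shown are illustrative):

```shell
# On the HBase master node only -- the master asks each region server
# to shut down cleanly, so do NOT stop the region servers yourself.
/usr/lib/hbase/bin/hbase-daemon.sh stop master

# Once HBase is fully down, stop HDFS on the name node. Plain Hadoop
# ships stop-dfs.sh; CDH packaged installs use init scripts instead,
# e.g. `service hadoop-hdfs-namenode stop`.
stop-dfs.sh

# Bring things back in the reverse order: HDFS first, then HBase.
start-dfs.sh
/usr/lib/hbase/bin/hbase-daemon.sh start master
```

This avoids stop-hbase.sh's requirement that the master can SSH to every node, since each command runs locally on the host that owns the daemon (e.g. driven by capistrano).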
J-D

On Fri, Aug 2, 2013 at 3:28 PM, Patrick Schless <[email protected]> wrote:
> Doesn't stop-hbase.sh (and its ilk) require the server to be able to manage
> the clients (using unpassworded SSH keys, for instance)? I don't have that
> set up (for security reasons). I use capistrano for all these sorts of
> coordination tasks.
>
> On Fri, Aug 2, 2013 at 12:07 PM, Jean-Daniel Cryans <[email protected]> wrote:
>> Doing a bin/stop-hbase.sh is the way to go, then on the Hadoop side
>> you do stop-all.sh. I think your ordering is correct but I'm not sure
>> you are using the right commands.
>>
>> J-D
>>
>> On Fri, Aug 2, 2013 at 8:27 AM, Patrick Schless <[email protected]> wrote:
>>> Ah, I bet the issue is that I'm stopping the HMaster, but not the Region
>>> Servers, then restarting HDFS. What's the correct order of operations for
>>> bouncing everything?
>>>
>>> On Thu, Aug 1, 2013 at 5:21 PM, Jean-Daniel Cryans <[email protected]> wrote:
>>>> Can you follow the life of one of those blocks through the Namenode and
>>>> Datanode logs? I'd suggest you start by doing an fsck on one of those
>>>> files with the option that gives the block locations first.
>>>>
>>>> By the way, why do you have split logs? Are region servers dying every
>>>> time you try out something?
>>>>
>>>> On Thu, Aug 1, 2013 at 3:16 PM, Patrick Schless <[email protected]> wrote:
>>>>> Yup, 14 datanodes, all check back in. However, all of the corrupt
>>>>> files seem to be splitlogs from data05. This is true even though I've
>>>>> done several restarts (each restart adding a few missing blocks).
>>>>> There's nothing special about data05, and it seems to be in the
>>>>> cluster, the same as anyone else.
>>>>>
>>>>> On Thu, Aug 1, 2013 at 5:04 PM, Jean-Daniel Cryans <[email protected]> wrote:
>>>>>> I can't think of a way your missing blocks would be related to
>>>>>> HBase replication; there's something else going on. Are all the
>>>>>> datanodes checking back in?
>>>>>>
>>>>>> J-D
>>>>>>
>>>>>> On Thu, Aug 1, 2013 at 2:17 PM, Patrick Schless <[email protected]> wrote:
>>>>>>> I'm running:
>>>>>>> CDH4.1.2
>>>>>>> HBase 0.92.1
>>>>>>> Hadoop 2.0.0
>>>>>>>
>>>>>>> Is there an issue with restarting a standby cluster with replication
>>>>>>> running? I am doing the following on the standby cluster:
>>>>>>>
>>>>>>> - stop hmaster
>>>>>>> - stop name_node
>>>>>>> - start name_node
>>>>>>> - start hmaster
>>>>>>>
>>>>>>> When the name node comes back up, it's reliably missing blocks. I
>>>>>>> started with 0 missing blocks, and have run through this scenario a
>>>>>>> few times, and am up to 46 missing blocks, all from the table that is
>>>>>>> the standby for our production table (in a different datacenter).
>>>>>>> The missing blocks are all from the same table, and look like:
>>>>>>>
>>>>>>> blk_-2036986832155369224 /hbase/splitlog/data01.sea01.staging.tdb.com,60020,1372703317824_hdfs%3A%2F%2Fname-node.sea01.staging.tdb.com%3A8020%2Fhbase%2F.logs%2Fdata05.sea01.staging.tdb.com%2C60020%2C1373557074890-splitting%2Fdata05.sea01.staging.tdb.com%252C60020%252C1373557074890.1374960698485/tempodb-data/c9cdd64af0bfed70da154c219c69d62d/recovered.edits/0000000001366319450.temp
>>>>>>>
>>>>>>> Do I have to stop replication before restarting the standby?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Patrick
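For the fsck that J-D suggests earlier in the thread, a sketch of the invocation that prints block locations (the target path here is illustrative; point it at one of the corrupt files the name node reports):

```shell
# List the files under the splitlog directory, the blocks that make up
# each file, and which datanodes hold each block replica.
hdfs fsck /hbase/splitlog -files -blocks -locations

# A plain fsck of the namespace summarizes any corrupt/missing blocks.
# On CDH4 / Hadoop 2.0.0 this can equivalently be run as `hadoop fsck`.
hdfs fsck /
```

Comparing the block IDs fsck reports against the same IDs in the Namenode and Datanode logs is the "follow the life of one of those blocks" step from the thread.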
