Re: HDFS Restart with Replication

2013-08-08 Thread Patrick Schless
Hi Asaf, Thanks for the info. I tried this, but it didn't work for me (the region servers never shut down). Any idea how long it should take to pick it up? I let it sit for several minutes, and all I saw in the RS logs was: 2013-08-08 13:41:55,303 INFO

Re: HDFS Restart with Replication

2013-08-06 Thread Patrick Schless
Hi J-D, Thanks for the help. I tried your suggestion (hbase-daemon.sh stop master), and this leaves all the region servers running. This seems the same as the problematic case I was in when I was stopping only the HMaster, and not the region servers, and then bouncing HDFS. It seems like I want

Re: HDFS Restart with Replication

2013-08-06 Thread Asaf Mesika
Yep, that's a confusing one. When running /hbase stop master, it sets the shutdown flag in ZK. The region servers listen for this flag, and once they see it set, they shut themselves down. Once they are all down, the master goes down as well. On Saturday, August 3, 2013, Jean-Daniel Cryans wrote: Ah then
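A quick way to see this mechanism in action is to watch ZooKeeper directly. This is only a sketch: the znode names below are the 0.92-era defaults (`zookeeper.znode.parent` = `/hbase`, cluster-state znode `shutdown`), which vary by version, and `zkhost:2181` is a placeholder for your quorum.

```shell
# Inspect the cluster-state flag the region servers watch
# (znode name is version-dependent; /hbase/shutdown is the 0.92-era default):
zkCli.sh -server zkhost:2181 get /hbase/shutdown

# While waiting for a shutdown, see which region servers are still registered:
zkCli.sh -server zkhost:2181 ls /hbase/rs
```

If the `/hbase/rs` children never go away, the region servers are not reacting to the flag, which matches the symptom described above.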

Re: HDFS Restart with Replication

2013-08-02 Thread Jean-Daniel Cryans
Doing a bin/stop-hbase.sh is the way to go; then on the Hadoop side you do stop-all.sh. I think your ordering is correct, but I'm not sure you are using the right commands. J-D On Fri, Aug 2, 2013 at 8:27 AM, Patrick Schless patrick.schl...@gmail.com wrote: Ah, I bet the issue is that I'm
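A sketch of the ordering J-D describes, assuming tarball-style HBASE_HOME/HADOOP_HOME installs (CDH package layouts differ):

```shell
# Shut down: HBase first, then HDFS.
$HBASE_HOME/bin/stop-hbase.sh    # stops the master and all region servers
$HADOOP_HOME/bin/stop-all.sh     # then stop Hadoop (stop-dfs.sh on Hadoop 2 installs)

# Bring it back in the reverse order: HDFS first, then HBase.
$HADOOP_HOME/bin/start-all.sh
$HBASE_HOME/bin/start-hbase.sh
```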

Re: HDFS Restart with Replication

2013-08-02 Thread Patrick Schless
Doesn't stop-hbase.sh (and its ilk) require the server to be able to manage the clients (using unpassworded SSH keys, for instance)? I don't have that set up (for security reasons). I use Capistrano for all these sorts of coordination tasks. On Fri, Aug 2, 2013 at 12:07 PM, Jean-Daniel Cryans

Re: HDFS Restart with Replication

2013-08-02 Thread Jean-Daniel Cryans
Ah, then doing bin/hbase-daemon.sh stop master on the master node is the equivalent, but don't stop the region servers themselves, as the master will take care of it. Doing a stop on both the master and the region servers will screw things up. J-D On Fri, Aug 2, 2013 at 3:28 PM, Patrick Schless
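In other words, run this on the master node only (a sketch; the path assumes a tarball install):

```shell
# Stop just the master daemon; via the ZK shutdown flag it takes the
# region servers down with it. Do NOT also run
# 'hbase-daemon.sh stop regionserver' on the slaves.
$HBASE_HOME/bin/hbase-daemon.sh stop master
```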

HDFS Restart with Replication

2013-08-01 Thread Patrick Schless
I'm running: CDH4.1.2 (HBase 0.92.1, Hadoop 2.0.0). Is there an issue with restarting a standby cluster with replication running? I am doing the following on the standby cluster:
- stop hmaster
- stop name_node
- start name_node
- start hmaster
When the name node comes back up, it's reliably
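That sequence, sketched with per-daemon scripts (the CDH service wrappers differ, so treat the exact script paths as assumptions):

```shell
# Restart sequence on the standby cluster, one daemon at a time:
$HBASE_HOME/bin/hbase-daemon.sh stop master
$HADOOP_HOME/sbin/hadoop-daemon.sh stop namenode
$HADOOP_HOME/sbin/hadoop-daemon.sh start namenode
$HBASE_HOME/bin/hbase-daemon.sh start master
```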

Re: HDFS Restart with Replication

2013-08-01 Thread Jean-Daniel Cryans
I can't think of how your missing blocks would be related to HBase replication; there's something else going on. Are all the datanodes checking back in? J-D On Thu, Aug 1, 2013 at 2:17 PM, Patrick Schless patrick.schl...@gmail.com wrote: I'm running: CDH4.1.2 HBase 0.92.1 Hadoop 2.0.0
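One way to check whether every datanode has checked back in after the restart (Hadoop 2 syntax; older installs use `hadoop dfsadmin -report`):

```shell
# Summarizes live/dead datanodes as seen by the NameNode:
hdfs dfsadmin -report
```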

Re: HDFS Restart with Replication

2013-08-01 Thread Patrick Schless
Yup, 14 datanodes, and they all check back in. However, all of the corrupt files seem to be split logs from data05. This is true even though I've done several restarts (each restart adding a few missing blocks). There's nothing special about data05, and it seems to be in the cluster the same as anyone
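To confirm that the corrupt files really are all data05 split logs, HDFS can list them directly (the path below scans the whole namespace; narrow it to the HBase root dir if preferred):

```shell
# Print every file that has a corrupt or missing block:
hadoop fsck / -list-corruptfileblocks
```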

Re: HDFS Restart with Replication

2013-08-01 Thread Jean-Daniel Cryans
Can you follow the life of one of those blocks through the Namenode and Datanode logs? I'd suggest you start by doing an fsck on one of those files, with the option that gives the block locations, first. By the way, why do you have split logs? Are region servers dying every time you try something out?
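For example (the file path here is hypothetical; substitute one of the corrupt files reported by fsck):

```shell
# Show each block of the file, with its block ID and datanode locations,
# so the block IDs can then be grepped out of the NameNode/DataNode logs:
hadoop fsck /path/to/corrupt-file -files -blocks -locations -racks
```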