Replication needs to be higher than 1. If you have a node that is running both a DataNode and an HRegionServer and you shut it down, you WILL lose all the data that the DataNode was holding, because no one else on the cluster has it. HBase relies on HDFS for the replication of data and does NOT have its own data replication mechanism, unlike Cassandra or Voldemort. If you set the HDFS replication factor to 3, then when you shut down your node, two other nodes will still have the data and HBase will be able to serve it for you.
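As a sketch (this is the standard Hadoop property, not something from your config below; on cdh3 it normally lives in hdfs-site.xml), the change would look like:

```xml
<!-- hdfs-site.xml: keep 3 copies of each block so losing
     one DataNode/HRegionServer node doesn't lose data -->
<property>
  <name>dfs.replication</name>
  <value>3</value>
</property>
```

One caveat: dfs.replication is applied when a file is written, so blocks already on the cluster keep their old replication factor; you can raise it for existing files with `hadoop fs -setrep`.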
You can think of each DataNode as a hard drive. Having a replication factor of 1 means the data is only on one hard drive, and if you unplug that hard drive the data is lost. Having a replication factor greater than 1 is like having multiple hard drives in a RAID 1 (mirrored) array: if you unplug one of the drives, the data is still on the others and nothing is lost.

~Jeff

On 7/27/2011 10:35 AM, 吴限 wrote:
> Here is my hbase-site.xml:
> <configuration>
> <property>
> <name>hbase.cluster.distributed</name>
> <value>true</value>
> </property>
> <property>
> <name>hbase.rootdir</name>
> <value>hdfs://server3.yun.com:54310/hbase</value>
> <description>The directory shared by region servers.
> </description>
> </property>
> <property>
> <name>hbase.zookeeper.quorum</name>
> <value>server3.yun.com</value>
> </property>
> <property>
> <name>dfs.replication</name>
> <value>1</value>
> </property>
>
>
> 2011/7/28 Stack <[email protected]>
>
>> On Wed, Jul 27, 2011 at 8:58 AM, 吴限 <[email protected]> wrote:
>>> Setup:
>>> - cdh3u0
>>> - Hadoop 0.20.2
>> You are using the hadoop from cdh3u0?
>>
>>
>>> - dfs.replication is set to 1
>>>
>> You will lose data if a machine goes away. You have two machines but
>> only one instance of each data block; think of it as half of your data
>> on one node and the rest on another. If you kill one machine, half
>> your data is gone.
>>
>>
>>> After I restarted the regionserver which I had rebooted and checked again,
>>> I found that some of the missing data came back, but there still existed
>>> some data which hadn't been found yet.
>>
>> I wonder what was going on here that we didn't see it all restored.
>>
>>
>>> This is problematic since we are supposed to
>>> replicate at x1, so at least one other node should be able to
>>> theoretically serve the *data* that the downed regionserver can't.
>>>
>> No. The behavior you describe would come with replication of 2, not 1.
>>
>> St.Ack

-- 
Jeff Whiting
Qualtrics Senior Software Engineer
[email protected]
