Running with 1 replica is unusual -- and there is little motivation for running with this configuration, since it means data loss -- so few have experience with it. St.Ack
2011/7/28 Xian Woo <[email protected]>:
> Thanks, everybody. I really appreciate what you guys have done with my
> question. Indeed, the situation I came across is too complicated and
> too strange for me, so I've decided to re-install the hbase tool and
> change the related configuration files. Hope it will get better this
> time. Thanks again!
> Best wishes~
> Woo.
>
> On 28 Jul 2011 at 1:50 PM, Nico Guba <[email protected]> wrote:
>
>> Very interesting. What is a good value where there is not too much of a
>> trade-off in performance?
>>
>> I'd imagine that setting this too high could create a very 'chatty'
>> cluster.
>>
>> On 28 Jul 2011, at 00:33, Jeff Whiting wrote:
>>
>> > Replication needs to be higher than 1. If you have a node which is
>> > running both DataNode and HRegionServer and then shut it down, you
>> > WILL lose all the data that the DataNode was holding, because no one
>> > else on the cluster has it. HBase relies on HDFS for the replication
>> > of data and does NOT have its own data replication mechanism, unlike
>> > Cassandra or Voldemort. If you set the HDFS replication factor to 3,
>> > then when you shut down your node, 2 other nodes will have the data
>> > and HBase will be able to serve that data for you.
>> >
>> > You can think of each DataNode as a hard drive. Having a replication
>> > factor of 1 means the data is only on one hard drive, and if you
>> > unplug the hard drive that data will be lost. Having a replication
>> > factor greater than 1 is like having multiple hard drives in a RAID 1
>> > (mirrored) array. If you unplug one of the hard drives, the data is
>> > still on the other ones and nothing is lost.
>> >
>> > ~Jeff
>> >
>> > On 7/27/2011 10:35 AM, 吴限 wrote:
>> >> Here is my hbase-site.xml:
>> >> <configuration>
>> >> <property>
>> >> <name>hbase.cluster.distributed</name>
>> >> <value>true</value>
>> >> </property>
>> >> <property>
>> >> <name>hbase.rootdir</name>
>> >> <value>hdfs://server3.yun.com:54310/hbase</value>
>> >> <description>The directory shared by region servers.
>> >> </description>
>> >> </property>
>> >> <property>
>> >> <name>hbase.zookeeper.quorum</name>
>> >> <value>server3.yun.com</value>
>> >> </property>
>> >> <property>
>> >> <name>dfs.replication</name>
>> >> <value>1</value>
>> >> </property>
>> >>
>> >>
>> >> 2011/7/28 Stack <[email protected]>
>> >>
>> >>> On Wed, Jul 27, 2011 at 8:58 AM, 吴限 <[email protected]> wrote:
>> >>>> Setup:
>> >>>> - cdh3u0
>> >>>> - Hadoop 0.20.2
>> >>> You are using the hadoop from cdh3u0?
>> >>>
>> >>>
>> >>>> - dfs.replication is set to 1
>> >>>>
>> >>> You will lose data if a machine goes away. You have two machines but
>> >>> only one instance of each data block; think of it as half of your
>> >>> data on one node and the rest on another. If you kill one machine,
>> >>> half your data is gone.
>> >>>
>> >>>
>> >>>> After I restarted the regionserver which I had rebooted and checked
>> >>>> again, I found that some of the missing data came back, but there
>> >>>> still existed some data which hadn't been found yet.
>> >>>
>> >>> I wonder what was going on here that we didn't see it all restored.
>> >>>
>> >>>
>> >>>> This is problematic since we are supposed to
>> >>>> replicate at x1, so at least one other node should be able to
>> >>>> theoretically serve the *data* that the downed regionserver can't.
>> >>>>
>> >>> No. The behavior you describe would come with replication of 2, not 1.
>> >>>
>> >>> St.Ack
>> >>>
>> >
>> > --
>> > Jeff Whiting
>> > Qualtrics Senior Software Engineer
>> > [email protected]
>> >
>>
>
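The fix the thread converges on can be sketched as a one-property config change, using the hbase-site.xml quoted above as the template: raise dfs.replication from 1 to the usual HDFS default of 3 (the exact value is a deployment choice; this is a sketch, not a verified configuration for this cluster).

```xml
<property>
  <name>dfs.replication</name>
  <value>3</value>
  <description>Block replication factor. With 3, two other DataNodes
  hold a copy of each block, so one node can fail without data loss.
  </description>
</property>
```

Note that changing the configuration only affects files written afterwards; the replication of existing files can be raised with the standard HDFS shell command `hadoop fs -setrep -R 3 /hbase`.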

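Jeff's hard-drive analogy can be illustrated with a small, hypothetical simulation (plain Python, not HBase or HDFS code; the round-robin placement is a toy stand-in for HDFS's real block placement): blocks survive the loss of one node only when each block has a copy somewhere else.

```python
import itertools

def place_blocks(blocks, nodes, replication):
    """Assign each block to `replication` distinct nodes, round-robin,
    as a loose stand-in for how HDFS spreads replicas."""
    placement = {}
    node_cycle = itertools.cycle(nodes)
    for b in blocks:
        placement[b] = {next(node_cycle) for _ in range(replication)}
    return placement

def surviving_blocks(placement, dead_node):
    """Blocks still readable after one DataNode goes away."""
    return {b for b, holders in placement.items() if holders - {dead_node}}

blocks = [f"blk_{i}" for i in range(6)]
nodes = ["server1", "server2", "server3"]

# replication = 1: unplugging one "drive" loses the blocks it held
p1 = place_blocks(blocks, nodes, 1)
print(len(surviving_blocks(p1, "server1")))  # 4 of 6 blocks survive

# replication = 3: every block also lives on 2 other nodes; nothing is lost
p3 = place_blocks(blocks, nodes, 3)
print(len(surviving_blocks(p3, "server1")))  # all 6 survive
```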