Replication needs to be higher than 1. If you have a node that is running both a DataNode and an HRegionServer and you shut it down, you WILL lose all the data that the DataNode was holding, because no one else on the cluster has it. HBase relies on HDFS for the replication of data and does NOT have its own data replication mechanism, unlike Cassandra or Voldemort. If you set the HDFS replication factor to 3, then when you shut down your node, two other nodes will still have the data and HBase will be able to serve it for you.
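As a sketch (this is the standard Hadoop property, not something from your config below; on cdh3 it normally lives in hdfs-site.xml), the change would look like:

```xml
<!-- hdfs-site.xml: keep 3 copies of each block so losing
     one DataNode/HRegionServer node doesn't lose data -->
<property>
  <name>dfs.replication</name>
  <value>3</value>
</property>
```

One caveat: dfs.replication is applied when a file is written, so blocks already on the cluster keep their old replication factor; you can raise it for existing files with `hadoop fs -setrep`.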
You can think of each DataNode as a hard drive. Having a replication factor of 1 means the data is only on one hard drive, and if you unplug that hard drive the data is lost. Having a replication factor greater than 1 is like having multiple hard drives in a RAID 1 (mirrored) array: if you unplug one of the drives, the data is still on the others and nothing is lost.

~Jeff

On 7/27/2011 10:35 AM, 吴限 wrote:
> Here is my hbase-site.xml:
> <configuration>
> <property>
> <name>hbase.cluster.distributed</name>
> <value>true</value>
> </property>
> <property>
> <name>hbase.rootdir</name>
> <value>hdfs://server3.yun.com:54310/hbase</value>
> <description>The directory shared by region servers.
> </description>
> </property>
> <property>
> <name>hbase.zookeeper.quorum</name>
> <value>server3.yun.com</value>
> </property>
> <property>
> <name>dfs.replication</name>
> <value>1</value>
> </property>
>
>
> 2011/7/28 Stack <[email protected]>
>
>> On Wed, Jul 27, 2011 at 8:58 AM, 吴限 <[email protected]> wrote:
>>> Setup:
>>> - cdh3u0
>>> - Hadoop 0.20.2
>> You are using the hadoop from cdh3u0?
>>
>>
>>> - dfs.replication is set to 1
>>>
>> You will lose data if a machine goes away. You have two machines but
>> only one instance of each data block; think of it as half of your data
>> on one node and the rest on another. If you kill one machine, half
>> your data is gone.
>>
>>
>>> After I restarted the regionserver which I had rebooted and checked again,
>>> I found that some of the missing data came back, but there still existed
>>> some data which hadn't been found yet.
>>
>> I wonder what was going on here that we didn't see it all restored.
>>
>>
>>> This is problematic since we are supposed to
>>> replicate at x1, so at least one other node should be able to
>>> theoretically serve the *data* that the downed regionserver can't.
>>>
>> No. The behavior you describe would come with replication of 2, not 1.
>>
>> St.Ack

-- 
Jeff Whiting
Qualtrics Senior Software Engineer
[email protected]
