Re: data loss due to regionserver going down

Chris Tarnas Wed, 27 Jul 2011 09:17:04 -0700

Replication of 1x means no replication. 2x would mean the data exists in two 
locations (what it looks like you want). Running with a replication of 1x is a 
very bad idea and is pretty much a guaranteed way to get data loss.
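
To make the point concrete, here is a minimal toy sketch (not HDFS's real block placement policy; the function names, node names, and block count are all made up for illustration). It assigns blocks to nodes at a given replication factor, then checks which blocks still have a live copy when one node is down.

```python
import random


def place_blocks(num_blocks, nodes, replication, seed=0):
    """Assign each block to `replication` distinct nodes.

    Toy placement for illustration only; HDFS uses its own policy.
    """
    rng = random.Random(seed)
    return {b: set(rng.sample(nodes, replication)) for b in range(num_blocks)}


def readable_blocks(placement, down_nodes):
    """Return the blocks that still have at least one replica on a live node."""
    down = set(down_nodes)
    return [b for b, holders in placement.items() if holders - down]


if __name__ == "__main__":
    nodes = ["node1", "node2"]  # hypothetical two-datanode cluster
    for replication in (1, 2):
        placement = place_blocks(100, nodes, replication)
        alive = readable_blocks(placement, down_nodes=["node1"])
        print(f"replication={replication}: "
              f"{len(alive)}/100 blocks readable with node1 down")
```

With two nodes and a replication of 2, every block has a copy on the surviving node. With a replication of 1, any block whose only copy sits on the downed node cannot be read until that node returns.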


-chris

On Jul 27, 2011, at 8:58 AM, 吴限 wrote:

> Hi everyone. I'd like to run the following data loss scenario by you to see
> if we are doing something obviously wrong with our setup here.
> 
> Setup:
>   - cdh3u0
>   - Hadoop 0.20.2
>   - HBase 0.90.1
>   - 1 Master Node running as NameNode & JobTracker
>   - ZooKeeper quorum
>   - 2 child nodes running as Datanode, TaskTracker and RegionServer each
>   - dfs.replication is set to 1
> 
> First, I inserted some data into HBase a few hours ago. Then, after a
> while, I rebooted one of the region servers and waited until the master
> responded to that. However, when I checked the table with the hbase shell
> (using the "count" command), I noticed that a large amount of data was
> missing.
> After I restarted the regionserver that I had rebooted and checked again,
> I found that some of the missing data had come back, but some data was
> still missing.
> Finally, after I disabled the table and then re-enabled it, I found that
> all of the data was present in the cluster and nothing had been lost.
> 
> This is problematic since we are supposed to replicate at 1x, so at least
> one other node should theoretically be able to serve the data that the
> downed regionserver can't.
> 
> Questions:
> 
>   - How can you guys explain this weird situation?
>   - Is there a way to recover such lost data?
> 
> Any tips here are definitely appreciated. I'll be happy to provide more
> information as well.
