Hey,

The problem is that stock 0.20 Hadoop won't let you read from a non-closed file: it will report the length as 0. So if a regionserver crashes, the last WAL log, which is still open, becomes 0 length and the data within it is unreadable. That, specifically, is the cause of the data loss. You could always make it so your regionservers rarely crash - this is possible, btw, and I did it for over a year.
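To make that failure mode concrete, here is a toy simulation of the semantics - the class and method names are made up for illustration and are not HBase or HDFS APIs, but the behavior mirrors what a pre-append 0.20 HDFS does with a file whose writer died before closing it:

```python
class ToyHdfsFile:
    """Toy model of a pre-append-branch HDFS file: written bytes only
    become visible to readers (i.e. counted in the file length) once
    the file is cleanly closed. Illustrative only, not a real API."""

    def __init__(self):
        self._pending = []   # written by the regionserver, not yet published
        self._visible = []   # what a reader (e.g. log splitting) can see
        self.closed = False

    def write(self, record):
        # Edits are appended to the open WAL file.
        self._pending.append(record)

    def close(self):
        # Only a clean close publishes the final length, making the
        # pending bytes readable.
        self._visible.extend(self._pending)
        self._pending = []
        self.closed = True

    def reader_length(self):
        # A reader of a never-closed file sees length 0.
        return len(self._visible)


wal = ToyHdfsFile()
wal.write("put row1")
wal.write("put row2")

# Regionserver crashes here: close() never runs, so a reader sees
# a 0-length WAL and both edits are unrecoverable.
print(wal.reader_length())  # -> 0
```

The append/sync patches in CDH3 and the append branch effectively change the model so that synced edits become readable even without a clean close.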
But you will want to run CDH3 or the append-branch releases to get the series of patches that fix this hole. It also happens that only 0.89 runs on it. I would like to avoid the Hadoop "everyone uses 0.20 forever" problem and talk about what we could do to help you get onto 0.89. Over here at SU we've made a commitment to the future of 0.89 and are running it in production. Let us know what else you'd need.

-ryan

On Mon, Sep 20, 2010 at 12:39 PM, George P. Stathis <[email protected]> wrote:
> Thanks Todd. We are not quite ready to move to 0.89 yet. We have made custom
> modifications to the transactional contrib sources, which have now been taken
> out of 0.89. We are planning on moving to 0.90 when it comes out and at that
> point either migrating our customizations or moving back to the out-of-the-box
> features (which will require a rewrite of our code).
>
> We are well aware of the CDH distros, but at the time we started with HBase,
> there was none that included HBase. I think CDH3 is the first one to include
> HBase, correct? And is 0.89 the only version supported?
>
> Moreover, are we saying that there is no way to prevent stock HBase 0.20.6
> and Hadoop 0.20.2 from losing data when a single node goes down? It does not
> matter if the data is replicated; it will still get lost?
>
> -GS
>
> On Sun, Sep 19, 2010 at 5:58 PM, Todd Lipcon <[email protected]> wrote:
>
>> Hi George,
>>
>> The data loss problems you mentioned below are known issues when running on
>> stock Apache 0.20.x Hadoop.
>>
>> You should consider upgrading to CDH3b2, which includes a number of HDFS
>> patches that allow HBase to durably store data. You'll also have to upgrade
>> to HBase 0.89 - we ship a version as part of CDH that will work well.
>>
>> Thanks,
>> -Todd
>>
>> On Sun, Sep 19, 2010 at 6:57 AM, George P. Stathis <[email protected]> wrote:
>>
>> > Hi folks. I'd like to run the following data loss scenario by you to see if
>> > we are doing something obviously wrong with our setup here.
>> >
>> > Setup:
>> >
>> > - Hadoop 0.20.1
>> > - HBase 0.20.3
>> > - 1 master node running the NameNode, SecondaryNameNode, JobTracker,
>> > HMaster and 1 Zookeeper (no zookeeper quorum right now)
>> > - 4 child nodes, each running a DataNode, TaskTracker and RegionServer
>> > - dfs.replication is set to 2
>> > - Host: Amazon EC2
>> >
>> > Up until yesterday, we were frequently experiencing
>> > HBASE-2077 <https://issues.apache.org/jira/browse/HBASE-2077>,
>> > which kept bringing our regionservers down. What we realized, though, is
>> > that we were losing data (a few hours' worth) with just one out of four
>> > regionservers going down. This is problematic since we are supposed to
>> > replicate at x2 across 4 nodes, so at least one other node should
>> > theoretically be able to serve the data that the downed regionserver can't.
>> >
>> > Questions:
>> >
>> > - When a regionserver goes down unexpectedly, the only data that
>> > theoretically gets lost is whatever didn't make it to the WAL, right? Or
>> > wrong? E.g.
>> > http://www.larsgeorge.com/2010/01/hbase-architecture-101-write-ahead-log.html
>> > - We ran a hadoop fsck on our cluster and verified the replication factor
>> > as well as that there were no under-replicated blocks. So why was our data
>> > not available from another node?
>> > - If the log gets rolled every 60 minutes by default (we haven't touched
>> > the defaults), how can we lose data from up to 24 hours ago?
>> > - When the downed regionserver comes back up, shouldn't that data be
>> > available again? Ours wasn't.
>> > - In such scenarios, is there a recommended approach for restoring a
>> > regionserver that goes down? We first brought them back up by logging on to
>> > the node itself and manually restarting them. Now we have automated crons
>> > that listen for their ports and restart them within two minutes if they go
>> > down.
>> > - Is there a way to recover such lost data?
>> > - Are versions 0.89 / 0.90 addressing any of these issues?
>> > - Curiosity question: when a regionserver goes down, does the master try
>> > to replicate that node's data on another node to satisfy the
>> > dfs.replication ratio?
>> >
>> > For now, we have upgraded our HBase to 0.20.6, which is supposed to
>> > contain the HBASE-2077 <https://issues.apache.org/jira/browse/HBASE-2077>
>> > fix (but no one has verified it yet). Lars' blog also suggests that Hadoop
>> > 0.21.0 is the way to go to avoid the file-append issues, but it's not
>> > production-ready yet. Should we stick with Hadoop 0.20.1? Upgrade to
>> > 0.20.2?
>> >
>> > Any tips here are definitely appreciated. I'll be happy to provide more
>> > information as well.
>> >
>> > -GS
>>
>>
>> --
>> Todd Lipcon
>> Software Engineer, Cloudera
