Hey,

The problem is that stock 0.20 Hadoop won't let you read from a non-closed file: it will report the length as 0. So if a regionserver crashes, the last WAL log, which is still open, becomes 0 length and the data within it is unreadable. That, specifically, is the cause of the data loss. You could always make it so your regionservers rarely crash - this is possible, btw, and I did it for over a year.
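To make that failure mode concrete, here is a toy simulation of the semantics - the class and method names are made up for illustration and are not HBase or HDFS APIs, but the behavior mirrors what a pre-append 0.20 HDFS does with a file whose writer died before closing it:

```python
class ToyHdfsFile:
    """Toy model of a pre-append-branch HDFS file: written bytes only
    become visible to readers (i.e. counted in the file length) once
    the file is cleanly closed. Illustrative only, not a real API."""

    def __init__(self):
        self._pending = []   # written by the regionserver, not yet published
        self._visible = []   # what a reader (e.g. log splitting) can see
        self.closed = False

    def write(self, record):
        # Edits are appended to the open WAL file.
        self._pending.append(record)

    def close(self):
        # Only a clean close publishes the final length, making the
        # pending bytes readable.
        self._visible.extend(self._pending)
        self._pending = []
        self.closed = True

    def reader_length(self):
        # A reader of a never-closed file sees length 0.
        return len(self._visible)


wal = ToyHdfsFile()
wal.write("put row1")
wal.write("put row2")

# Regionserver crashes here: close() never runs, so a reader sees
# a 0-length WAL and both edits are unrecoverable.
print(wal.reader_length())  # -> 0
```

The append/sync patches in CDH3 and the append branch effectively change the model so that synced edits become readable even without a clean close.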
But you will want to run CDH3 or the append-branch releases to get the series of patches that fix this hole. It also happens that only 0.89 runs on it. I would like to avoid the Hadoop "everyone uses 0.20 forever" problem and talk about what we could do to help you get onto 0.89. Over here at SU we've made a commitment to the future of 0.89 and are running it in production. Let us know what else you'd need.

-ryan

On Mon, Sep 20, 2010 at 12:39 PM, George P. Stathis <[email protected]> wrote:
> Thanks Todd. We are not quite ready to move to 0.89 yet. We have made custom
> modifications to the transactional contrib sources, which have now been taken
> out of 0.89. We are planning on moving to 0.90 when it comes out and at that
> point either migrating our customizations or moving back to the out-of-the-box
> features (which will require a rewrite of our code).
>
> We are well aware of the CDH distros, but at the time we started with HBase,
> there was none that included HBase. I think CDH3 is the first one to include
> HBase, correct? And is 0.89 the only version supported?
>
> Moreover, are we saying that there is no way to prevent stock HBase 0.20.6
> and Hadoop 0.20.2 from losing data when a single node goes down? It does not
> matter if the data is replicated; it will still get lost?
>
> -GS
>
> On Sun, Sep 19, 2010 at 5:58 PM, Todd Lipcon <[email protected]> wrote:
>
>> Hi George,
>>
>> The data loss problems you mentioned below are known issues when running on
>> stock Apache 0.20.x Hadoop.
>>
>> You should consider upgrading to CDH3b2, which includes a number of HDFS
>> patches that allow HBase to durably store data. You'll also have to upgrade
>> to HBase 0.89 - we ship a version as part of CDH that will work well.
>>
>> Thanks,
>> -Todd
>>
>> On Sun, Sep 19, 2010 at 6:57 AM, George P. Stathis <[email protected]> wrote:
>>
>> > Hi folks. I'd like to run the following data loss scenario by you to see if
>> > we are doing something obviously wrong with our setup here.
>> >
>> > Setup:
>> >
>> > - Hadoop 0.20.1
>> > - HBase 0.20.3
>> > - 1 master node running the NameNode, SecondaryNameNode, JobTracker,
>> > HMaster and 1 Zookeeper (no zookeeper quorum right now)
>> > - 4 child nodes, each running a DataNode, TaskTracker and RegionServer
>> > - dfs.replication is set to 2
>> > - Host: Amazon EC2
>> >
>> > Up until yesterday, we were frequently experiencing
>> > HBASE-2077 <https://issues.apache.org/jira/browse/HBASE-2077>,
>> > which kept bringing our regionservers down. What we realized, though, is
>> > that we were losing data (a few hours' worth) with just one out of four
>> > regionservers going down. This is problematic since we are supposed to
>> > replicate at x2 across 4 nodes, so at least one other node should
>> > theoretically be able to serve the data that the downed regionserver can't.
>> >
>> > Questions:
>> >
>> > - When a regionserver goes down unexpectedly, the only data that
>> > theoretically gets lost is whatever didn't make it to the WAL, right? Or
>> > wrong? E.g.
>> > http://www.larsgeorge.com/2010/01/hbase-architecture-101-write-ahead-log.html
>> > - We ran a hadoop fsck on our cluster and verified the replication factor
>> > as well as that there were no under-replicated blocks. So why was our data
>> > not available from another node?
>> > - If the log gets rolled every 60 minutes by default (we haven't touched
>> > the defaults), how can we lose data from up to 24 hours ago?
>> > - When the downed regionserver comes back up, shouldn't that data be
>> > available again? Ours wasn't.
>> > - In such scenarios, is there a recommended approach for restoring a
>> > regionserver that goes down? We first brought them back up by logging on to
>> > the node itself and manually restarting them. Now we have automated crons
>> > that listen for their ports and restart them within two minutes if they go
>> > down.
>> > - Is there a way to recover such lost data?
>> > - Are versions 0.89 / 0.90 addressing any of these issues?
>> > - Curiosity question: when a regionserver goes down, does the master try
>> > to replicate that node's data on another node to satisfy the
>> > dfs.replication ratio?
>> >
>> > For now, we have upgraded our HBase to 0.20.6, which is supposed to
>> > contain the HBASE-2077 <https://issues.apache.org/jira/browse/HBASE-2077>
>> > fix (but no one has verified it yet). Lars' blog also suggests that Hadoop
>> > 0.21.0 is the way to go to avoid the file-append issues, but it's not
>> > production-ready yet. Should we stick with Hadoop 0.20.1? Upgrade to
>> > 0.20.2?
>> >
>> > Any tips here are definitely appreciated. I'll be happy to provide more
>> > information as well.
>> >
>> > -GS
>>
>>
>> --
>> Todd Lipcon
>> Software Engineer, Cloudera
