First of all, thank you all for the answers. I appreciate it!

To recap:
- 0.20.4 is known to be "fragile"
- upgrading to 0.89 (cdh3b3) would improve stability
- GC should be monitored and the system tuned if necessary (not sure how to do that - yet :)
- memory should be at least 2GB, better 4GB+ (we can't go that far)
- more nodes would help with stability issues
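For the GC-monitoring and memory points above, a minimal sketch of what might go into hbase-env.sh (the flags assume a Sun/Oracle HotSpot JVM of that era, and the heap size and log path are example values, not a recommendation for any particular cluster):

```shell
# hbase-env.sh - give the HBase daemons more heap (value is in MB)
export HBASE_HEAPSIZE=2000

# Turn on GC logging so long pauses can be spotted after the fact.
# The log path here is only an example.
export HBASE_OPTS="$HBASE_OPTS -verbose:gc -XX:+PrintGCDetails \
  -XX:+PrintGCTimeStamps -Xloggc:/var/log/hbase/gc-hbase.log"

# CMS was the collector commonly suggested for HBase at the time,
# to keep stop-the-world pauses short.
export HBASE_OPTS="$HBASE_OPTS -XX:+UseConcMarkSweepGC \
  -XX:CMSInitiatingOccupancyFraction=70"
```

Long GC pauses can cause a region server to lose its ZooKeeper session and be declared dead, so catching them in the GC log is usually the first step of tuning.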
@Jonathan: yes, we are using 2 nodes that run both Hadoop (namenode, sec. namenode, datanodes, jobtracker, tasktrackers) and HBase. The reason is that performance-wise we don't need more than that yet, but we have plans to make the operation much larger in the future. So while this is in production, it is really a test case for a much larger system.

However, Hadoop runs reliably, even under pressure (I do understand it is a much more mature project, though). I would expect HBase to be written with the mantra "any machine may fail at any time" in mind - and with error recovery in that spirit. In our experience, with 0.20.4 this just isn't the case (losing a few hours' worth of Put()s is very common when it crashes). But I really hope we can make it work reliably; we have put a lot of work into building a system around it... We'll see how it goes with 0.89 (fingers crossed :).

Again, thank you all for the answers!

Anze

On Monday 13 December 2010, Geoff Hendrey wrote:
> We were having no end of a "buffet" of errors and stability problems with
> 20.3 when we ran big mapreduce jobs to insert data. We upgraded to 20.6
> last week and have not seen any instability. Just my anecdotal
> experience.
>
> -geoff
>
> -----Original Message-----
> From: Anze [mailto:[email protected]]
> Sent: Monday, December 13, 2010 2:41 AM
> To: [email protected]
> Subject: HBase stability
>
> Hi all!
>
> We have been using HBase 0.20.4 (cdh3b1) in production on 2 nodes for a few
> months now and we are having constant issues with it. We fell into all the
> standard traps (like "Too many open files", network configuration
> problems, ...). All in all, we had about one crash every week or so.
> Fortunately we are still using it just for background processing, so our
> service didn't suffer directly, but we have lost huge amounts of time just
> fixing the data errors that resulted from data not being written to
> permanent storage. Not to mention fixing the issues themselves.
> As you can probably understand, we are very frustrated with this and are
> seriously considering moving to another Bigtable implementation.
>
> Right now, HBase crashes whenever we run a very intensive rebuild of a
> secondary index (a normal table, but we use it as a secondary index) on a
> huge table. I have found this:
> http://wiki.apache.org/hadoop/Hbase/Troubleshooting
> (see problem 9)
> One of the lines reads:
> "Make sure you give plenty of RAM (in hbase-env.sh), the default of 1GB won't
> be able to sustain long running imports."
>
> So, if I understand correctly, no matter how HBase is set up, if I run an
> intensive enough application, it will choke? I would expect it to be slower
> when under (too much) pressure, but not to crash.
>
> Of course, we will somehow solve this issue (working on it), but... :(
>
> What are your experiences with HBase? Is it stable? Is it just us and the way
> we set it up?
>
> Also, would upgrading to 0.89 (cdh3b3) help?
>
> Thanks,
>
> Anze
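The "Too many open files" trap mentioned in the quoted message was commonly addressed with OS and HDFS settings along these lines (the user name and values are typical examples from the period, not taken from this thread):

```shell
# Check the current per-process open-file limit for the daemon user:
ulimit -n

# Raise it by adding lines like these to /etc/security/limits.conf
# ("hadoop" is an example user name for the Hadoop/HBase daemons):
#   hadoop  soft  nofile  32768
#   hadoop  hard  nofile  32768

# HBase also needs the HDFS DataNode transceiver limit raised, e.g. in
# hdfs-site.xml (the misspelling "xcievers" is the actual property name):
#   dfs.datanode.max.xcievers = 4096
```

Both limits are per-daemon preconditions; leaving either at its default was a frequent cause of region server and datanode failures under heavy import load.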
