On Tue, Dec 14, 2010 at 1:46 AM, Anze <[email protected]> wrote:
>
> First of all, thank you all for the answers. I appreciate it!
>
> To recap:
> - 0.20.4 is known to be "fragile"

Yes. It had a bug that would cause deadlock.

> - upgrade to 0.89 (cdh3b3) would improve stability
> - GC should be monitored and system tuned if necessary (not sure how to do
> that - yet :)
> - memory should be at least 2GB, better 4GB+ (we can't go that far)

Yes, more memory would help, though that said, 0.90 seems fine w/ 1G
heaps (caveat: the more memory you have, the more cache you have and
the faster your reads will be).

> - more nodes would help with stability issues
>
> @Jonathan: yes, we are using 2 nodes that run both Hadoop (namenode, sec.
> namenode, datanodes, jobtracker, tasktrackers) and Hbase. The reason is that
> performance-wise we don't need more than that yet, but we have plans to make
> operation much larger in future. So while this is in production, it is really
> a test-case for much larger system.

Two nodes is a particularly 'bad' number. It probably runs slower than
1 node. Go to 3, 4 or even 5? Life will be rosier.

> However, Hadoop runs reliably, even under pressure (I do understand it is much
> more mature project though).

Hadoop has ten-minute timeouts and retries each task up to 4 times, as
opposed to HBase, which has much shorter timeouts, etc.

> I would expect HBase to be written with the
> mantra "any machine may fail at any time" in mind - and with error recovery in
> that spirit. In our experience, with 0.20.4 this just isn't the case (data
> loss of a few hours' worth of Put()-s is very common when it crashes).

Sorry if you picked up the wrong impression. In 0.20.x HBase, you will
lose data. The Hadoop it runs on does not have a working sync.

> But I
> really hope we can make it work reliably, we have put a lot of work in
> building a system around it... We'll see how it goes with 0.89 (fingers
> crossed :).
>

Please come back to the list if you have issues getting it all going.

St.Ack
