Drifting off topic a bit …

On Sep 1, 2011, at 12:12 PM, Ryan Rawson wrote:
>> First, you have to learn:
>> 1) Linux HA
>> 2) DRDB
>>
>> Right out of the gate just to have a redundant name node.
>
> Eh, no one would do that. If you want a redundant name node your only
> choice is to use Mapr, which I would def recommend since you get a
> better nn "fail-over" w/o service interruption and significantly
> higher performance than hdfs.

Really? People running offline analytics may be fine with an hour of downtime [<http://hadoopblog.blogspot.com/2010/02/hadoop-namenode-high-availability.html> <http://www.hortonworks.com/data-integrity-and-availability-in-apache-hadoop-hdfs/>] for their M/R jobs, but people running interactive services do not find that acceptable. Is my only option to avoid significant downtime in the event of a name node failure a closed-source offering that has already demonstrated at least one serious data-loss issue <http://answers.mapr.com/questions/415/hbase-table-disappear-after-failover-attempt-and-fall-back>?

I don’t really mean to criticize MapR: they were victims of a hidden dependency, but that’s what happens when you replace part of an integrated stack. And that is why I find your suggestion that I should not expect to use the integrated stack a little unnerving, because I'm looking at HBase for an online application.

joe