On Sat, Oct 29, 2011 at 1:34 PM, lars hofhansl <[email protected]> wrote:
> This is more of a "theoretical problem" really.
> Yahoo and others claim they lost far more data due to human error than any
> HDFS problems (including Namenode failures).
>

Actually it is not theoretical at all.

SPOF != data-loss. Data-loss can occur even if you don't have any SPOFs. Vice versa, many SPOF systems do not have data-loss (e.g., a single NetApp).

SPOF == lack of high-availability. Which is indeed the case with HDFS, even at Y! For example, when a cluster is upgraded it becomes unavailable.

@Mark: the Avatar-node is not for the faint-hearted. AFAIK, only FB runs it. Konstantin Shvachko and co at eBay have a much better NN-SPOF solution in 0.22 that was just released. I recommend you try that.

> You can prevent data loss by having the namenode write the metadata to
> another machine (via NFS or DRBD or if you have a SAN).
> You'll still have an outage while switching over to a different machine,
> but at least you won't lose any data.
>
>
> Facebook has a partial solution (Avatarnode) and the HDFS folks are
> working on a solution (which like Avatarnode mainly involves keeping
> a hot copy of the Namenode so that failover is "instantaneous" - 1 or 2
> minutes at most).
>
>
> ----- Original Message -----
> From: Mark <[email protected]>
> To: [email protected]
> Cc:
> Sent: Saturday, October 29, 2011 11:46 AM
> Subject: Dealing with single point of failure
>
> How does one deal with the fact that HBase has a single point of failure,
> namely the namenode? What steps can be taken to eliminate and/or minimize
> the impact of a namenode failure? In a situation where reliability is
> of utmost importance, should one choose an alternative technology, e.g.
> Cassandra?
>
> Thanks
>
>
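To make the quoted suggestion about writing the namenode metadata to another machine a bit more concrete, here is a minimal hdfs-site.xml sketch. It assumes a 0.20-era release, where the namenode's storage directories are set with dfs.name.dir (newer releases rename the key to dfs.namenode.name.dir). The namenode writes its image and edit log to every directory in the comma-separated list. Both paths are hypothetical, and /mnt/nfs/namenode is assumed to be an NFS mount exported by a second machine.

```xml
<!-- hdfs-site.xml (sketch; paths are hypothetical, adjust for your cluster) -->
<configuration>
  <property>
    <!-- Assumption: 0.20-style key; later releases use dfs.namenode.name.dir -->
    <name>dfs.name.dir</name>
    <!-- First entry: local disk. Second entry: assumed NFS mount from another host. -->
    <value>/data/1/dfs/nn,/mnt/nfs/namenode</value>
  </property>
</configuration>
```

This only protects the metadata. As noted above, the namenode is still a SPOF for availability, and bringing up a replacement from the remote copy is a manual step.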
