Re: Dealing with single point of failure

M. C. Srivas Tue, 13 Dec 2011 23:29:01 -0800

On Sat, Oct 29, 2011 at 1:34 PM, lars hofhansl <[email protected]> wrote:


> This is more of a "theoretical problem" really.
> Yahoo and others claim they lost far more data due to human error than any
> HDFS problems (including Namenode failures).
>

Actually it is not theoretical at all.

SPOF  !=  data-loss.

Data-loss can occur even if you don't have any SPOFs.  Conversely, many
systems with a SPOF do not lose data (e.g., a single NetApp).

SPOF == lack of high-availability.

Which is indeed the case with HDFS, even at Y!  For example, when a cluster
is upgraded it becomes unavailable.

@Mark:
 the Avatar-node is not for the faint-hearted. AFAIK, only FB runs it.
Konstantin Shvachko and co at eBay have a much better NN-SPOF solution in
0.22 that was just released. I recommend you try that.

> You can prevent data loss by having the namenode write the metadata to
> another machine (via NFS or DRBD or if you have a SAN).
> You'll still have an outage while switching over to a different machine,
> but at least you won't lose any data.
>
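
One way to set up the metadata redundancy described above, sketched as an
hdfs-site.xml fragment. On Hadoop releases of this era the NameNode's
dfs.name.dir property accepts a comma-separated list of directories, and the
name table is written to each of them. Both paths below are hypothetical; the
second is assumed to be an NFS mount exported from another machine. Later
releases rename the property to dfs.namenode.name.dir, so check the
hdfs-default.xml that ships with your version.

```xml
<!-- hdfs-site.xml (sketch; both paths are assumptions, not from this thread) -->
<configuration>
  <property>
    <name>dfs.name.dir</name>
    <!-- local disk first, then a directory on an NFS mount from another host -->
    <value>/data/1/dfs/nn,/mnt/nfs/dfs/nn</value>
  </property>
</configuration>
```

This protects the metadata, not availability: if the NameNode host dies, a
NameNode still has to be started elsewhere against the surviving copy.
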
>
> Facebook has a partial solution (Avatarnode) and the HDFS folks are
> working on a solution (which, like Avatarnode, mainly involves keeping
> a hot copy of the Namenode so that failover is "instantaneous": 1 or 2
> minutes at most).
>
>
> ----- Original Message -----
> From: Mark <[email protected]>
> To: [email protected]
> Cc:
> Sent: Saturday, October 29, 2011 11:46 AM
> Subject: Dealing with single point of failure
>
> How does one deal with the fact that HBase has a single point of failure,
> namely the namenode? What steps can be taken to eliminate and/or minimize
> the impact of a namenode failure? In a situation where reliability is
> of utmost importance, should one choose an alternative technology, e.g.
> Cassandra?
>
> Thanks
>
>
