HBase/HDFS maintain block checksums, so presumably a corrupted block would fail checksum validation when it is read. Increasing the number of replicas increases the odds that you'll still have at least one valid copy. I'm not an HDFS expert, but I would be very surprised if HDFS validated a "questionable" block via byte-wise comparison over the network among the replica peers.
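If you want to see what HDFS already knows about corrupt replicas, fsck will report them. A rough sketch (fsck reads NameNode metadata; the actual checksum verification happens on client reads and in the DataNode block scanner, and /hbase is just the default 0.94 root dir, so adjust the path to your layout):

    # list files with blocks the NameNode has already marked corrupt
    hdfs fsck / -list-corruptfileblocks

    # show per-file block and replica placement under the HBase root
    hdfs fsck /hbase -files -blocks -locations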
On Mon, Feb 23, 2015 at 12:25 PM, Michael Segel <[email protected]> wrote:

>
> On Feb 23, 2015, at 1:47 AM, Arinto Murdopo <[email protected]> wrote:
>
> We're running HBase (0.94.15-cdh4.6.0) on top of HDFS (Hadoop 2.0.0-cdh4.6.0).
> For all of our tables, we set the replication factor to 1 (dfs.replication
> = 1 in hbase-site.xml). We set it to 1 because we want to minimize HDFS
> usage (now we realize we should set this value to at least 2, because
> "failure is a norm" in distributed systems).
>
>
> Sorry, but you really want this to be a replication value of at least 3,
> not 2.
>
> Suppose you have corruption but not a lost block. Which copy of the two is
> right?
> With 3, you can compare the three and hopefully 2 of the 3 will match.
>
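For reference, bumping the factor back up would look roughly like this in hbase-site.xml, following the setup described above (this only affects files written from then on; existing HBase files keep their current replication until you change it, e.g. with `hdfs dfs -setrep -w 3 /hbase`):

    <!-- replication factor for new HBase files written to HDFS -->
    <property>
      <name>dfs.replication</name>
      <value>3</value>
    </property>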
