Thank you for your reply, Anoop. Unfortunately, I'm still confused, because of the following passage (from http://hbase.apache.org/book.html#perf.hdfs.configs.localread):
"For optimal performance when short-circuit reads are enabled, it is recommended that HDFS checksums are disabled. To maintain data integrity with HDFS checksums disabled, HBase can be configured to write its own checksums into its datablocks and verify against these."

To me this implies that HDFS checksums need to be disabled, meaning that HDFS wouldn't write checksums for its data blocks. HBase data would still be protected because HBase writes its own checksums, but other data would not.

On 29 April 2014 09:32, Anoop John wrote:

> HBase using its own checksum handling doesn't directly affect HDFS. HDFS
> will still maintain checksum info. The difference is at read time: HBase
> will open the reader with checksum validation set to false and will do
> checksum validation on its own. So using HBase-handled checksums in a
> cluster should not affect other data.
>
> Does that solve your doubt?
>
> -Anoop-
>
> On Tue, Apr 29, 2014 at 1:58 PM, Krishna Rao wrote:
>
> > Hi Ted,
> >
> > I had read those, but I'm confused about how this will affect non-HBase
> > HDFS data. With HDFS checksumming off, won't data integrity suffer?
> >
> > Krishna
> >
> > On 24 April 2014 15:54, Ted Yu wrote:
> >
> > > Please take a look at the following:
> > >
> > > http://hbase.apache.org/book.html#perf.hdfs.configs.localread
> > > http://hbase.apache.org/book.html#hbase.regionserver.checksum.verify
> > >
> > > On Thu, Apr 24, 2014 at 5:55 AM, Krishna Rao wrote:
> > >
> > > > Hi all,
> > > >
> > > > I understand that there is a significant performance gain from
> > > > turning on short-circuit reads, and an additional gain from having
> > > > HBase do checksums rather than HDFS.
> > > >
> > > > However, I'm a little confused by this: do I need to turn off
> > > > checksums within HDFS for the entire file system? We don't just use
> > > > HBase on our cluster, so this would seem to be a bad idea, right?
> > > >
> > > > Cheers,
> > > >
> > > > Krishna
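The read path Anoop describes can be illustrated with a toy model. This is a minimal Python sketch under stated assumptions, not HBase or HDFS code: `FakeHdfs`, `write_hbase_block`, `read_hbase_block`, the 512-byte chunk size, and the use of `zlib.crc32` are all hypothetical stand-ins. The storage layer always computes and keeps checksums on write. Only the HBase-style reader turns off storage-level verification and instead checks a CRC it embedded in its own block.

```python
import zlib

CHUNK = 512  # bytes covered by each storage-level checksum (illustrative value)


class FakeHdfs:
    """Toy filesystem that ALWAYS stores per-chunk checksums on write (hypothetical)."""

    def __init__(self):
        self.files = {}  # path -> (bytearray of data, list of per-chunk CRCs)

    def write(self, path, data):
        # Checksums are computed and stored no matter how readers behave later.
        sums = [zlib.crc32(data[i:i + CHUNK]) for i in range(0, len(data), CHUNK)]
        self.files[path] = (bytearray(data), sums)

    def read(self, path, verify_checksum=True):
        data, sums = self.files[path]
        if verify_checksum:
            # Storage-level verification: compare every chunk with its stored CRC.
            for n, i in enumerate(range(0, len(data), CHUNK)):
                if zlib.crc32(bytes(data[i:i + CHUNK])) != sums[n]:
                    raise IOError(f"storage checksum mismatch in chunk {n}")
        return bytes(data)


def write_hbase_block(fs, path, payload):
    # The application appends its own 4-byte CRC to the block it writes.
    fs.write(path, payload + zlib.crc32(payload).to_bytes(4, "big"))


def read_hbase_block(fs, path):
    # Skip the storage-level check, then verify the application's embedded CRC.
    raw = fs.read(path, verify_checksum=False)
    payload, stored = raw[:-4], int.from_bytes(raw[-4:], "big")
    if zlib.crc32(payload) != stored:
        raise IOError("application checksum mismatch")
    return payload


fs = FakeHdfs()
write_hbase_block(fs, "/hbase/block", b"row1:value1")
fs.write("/other/data.txt", b"non-HBase data")
print(read_hbase_block(fs, "/hbase/block"))  # checked by the application CRC
print(fs.read("/other/data.txt"))  # checked by the storage layer, as before
```

Because the storage-level checksums are still written in this model, a non-HBase client that reads with verification on (the default here) is unaffected by the HBase-side choice.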
