On Tue, Apr 29, 2014 at 1:54 AM, Krishna Rao <[email protected]> wrote:
> Thank you for your reply Anoop.
>
> However, the confusion is, unfortunately, still there because of the
> following (from
> here<http://hbase.apache.org/book.html#perf.hdfs.configs.localread>):
>
> "For optimal performance when short-circuit reads are enabled, it is
> recommended that HDFS checksums are disabled. To maintain data integrity
> with HDFS checksums disabled, HBase can be configured to write its own
> checksums into its datablocks and verify against these"
>

The text is confusing. If you read the next sentence and click on the
description under
hbase.regionserver.checksum.verify<http://hbase.apache.org/book.html#hbase.regionserver.checksum.verify>
it should be a little more clear.

The confusion comes from the little configuration dance that is necessary
around hbase optionally writing checksums inline into hfiles, so they are
available inline at read time, and the interaction w/ native hdfs
checksumming. When running with hbase checksumming of hfiles, we want a
means of telling HDFS to NOT validate the checksum as well (that would be
double checksumming), because hbase will be doing it (unless there is an
error, and then we'll fall back to HDFS validation).

Let me try and clean up the docs.

St.Ack

> To me it implies that HDFS checksum needs to be disabled, meaning that HDFS
> wouldn't write checksums into its datablocks. But HBase would be fine by
> writing its own checksum.
>
>
> On 29 April 2014 09:32, Anoop John <[email protected]> wrote:
>
> > HBase using its own checksum handling doesn't directly affect HDFS. It will
> > still maintain checksum info. The diff is at the read time.. HBase will
> > open reader with checksum validation false and it will do checksum
> > validation on its own. So using hbase handled checksum in a cluster
> > should not affect other data.. Does that solve your doubt?
> >
> > -Anoop-
> >
> > On Tue, Apr 29, 2014 at 1:58 PM, Krishna Rao <[email protected]> wrote:
> >
> > > Hi Ted,
> > >
> > > I had read those, but I'm confused about how this will affect non-HBase
> > > HDFS data. With HDFS checksumming off won't it affect data integrity?
> > >
> > > Krishna
> > >
> > >
> > > On 24 April 2014 15:54, Ted Yu <[email protected]> wrote:
> > >
> > > > Please take a look at the following:
> > > >
> > > > http://hbase.apache.org/book.html#perf.hdfs.configs.localread
> > > > http://hbase.apache.org/book.html#hbase.regionserver.checksum.verify
> > > >
> > > >
> > > > On Thu, Apr 24, 2014 at 5:55 AM, Krishna Rao <[email protected]> wrote:
> > > >
> > > > > Hi all,
> > > > >
> > > > > I understand that there is a significant performance gain when turning on
> > > > > short circuit reads, and additionally by setting HBase to do checksums
> > > > > rather than HDFS.
> > > > >
> > > > > However, I'm a little confused by this, do I need to turn off checksums
> > > > > within HDFS for the entire file system? We don't just use HBase on our
> > > > > cluster, so this would seem to be a bad idea right?
> > > > >
> > > > > Cheers,
> > > > >
> > > > > Krishna
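
To make the read path described in this thread concrete, here is a toy sketch in Python. It is not HBase or HDFS code: ToyHdfs, hbase_write and hbase_read are hypothetical names invented for illustration, and CRC32 via zlib merely stands in for whatever checksum the real systems use. It only models the behaviour stated above: the file system keeps its own checksums for every writer, the HBase-style reader opens the data with file-system validation off and checks an inline checksum itself, and it falls back to a file-system-validated read when the inline check fails.

```python
import zlib


class ToyHdfs:
    """Hypothetical stand-in for HDFS: always stores a checksum per block."""

    def __init__(self):
        self.blocks = {}

    def write(self, key, data):
        # The file system records its own checksum regardless of who writes.
        self.blocks[key] = (data, zlib.crc32(data))

    def read(self, key, verify_checksum=True):
        data, stored = self.blocks[key]
        if verify_checksum and zlib.crc32(data) != stored:
            raise IOError("file-system checksum mismatch for " + key)
        return data


def hbase_write(fs, key, payload):
    # Assumption for illustration: the inline checksum is 4 bytes appended
    # to the payload. The real hfile layout is different.
    fs.write(key, payload + zlib.crc32(payload).to_bytes(4, "big"))


def hbase_read(fs, key):
    # Fast path: skip file-system validation and verify the inline checksum.
    raw = fs.read(key, verify_checksum=False)
    payload, inline = raw[:-4], int.from_bytes(raw[-4:], "big")
    if zlib.crc32(payload) == inline:
        return payload, "hbase"
    # Inline check failed: fall back to a read with file-system validation on.
    raw = fs.read(key, verify_checksum=True)
    return raw[:-4], "hdfs"


fs = ToyHdfs()
hbase_write(fs, "hfile-block", b"row1:value1")
fs.write("other-app-file", b"non-HBase data")

payload, verified_by = hbase_read(fs, "hfile-block")
other = fs.read("other-app-file")  # still validated by the file system
```

In this model the non-HBase file is read with validation on as usual, which is the point Anoop makes: skipping validation is a choice made by one reader, not a setting for the whole file system.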
