Doing HBase-level checksums (as opposed to HDFS-level) will mostly benefit random gets. Scans (like row counting and similar) will probably see only a negligible improvement.
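To make the random-get vs. scan difference concrete, here is a back-of-envelope sketch in Python. It assumes, as explained in the next paragraph, that HDFS-level checksums cost one extra IO per block read (the separate checksum file) while HBase-level checksums cost none, and it models each IO as one seek plus sequential transfer. The seek time (10 ms), throughput (100 MB/s) and read sizes (a 64 KiB HFile block for a get, a 64 MiB HDFS block for a scan) are illustrative assumptions, not measurements, and the transfer of the checksum bytes themselves is ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Disk:
    # Illustrative assumptions, not measured values.
    seek_s: float = 0.010         # average seek + rotational latency, seconds
    bandwidth_bps: float = 100e6  # sequential throughput, bytes per second


def read_time(disk: Disk, nbytes: int, hbase_checksum: bool) -> float:
    """Estimated seconds to read `nbytes` contiguous bytes of one block.

    HDFS-level checksums: one seek for the block file plus one for its
    separate checksum file (2 IOs). HBase-level checksums: the checksums
    live inside the HFile blocks, so a single seek (1 IO) suffices.
    Transfer time of the checksum bytes themselves is ignored.
    """
    seeks = 1 if hbase_checksum else 2
    return seeks * disk.seek_s + nbytes / disk.bandwidth_bps


def saving(disk: Disk, nbytes: int) -> float:
    """Fraction of read time saved by switching to HBase-level checksums."""
    hdfs = read_time(disk, nbytes, hbase_checksum=False)
    hbase = read_time(disk, nbytes, hbase_checksum=True)
    return (hdfs - hbase) / hdfs


if __name__ == "__main__":
    disk = Disk()
    get = saving(disk, 64 * 1024)          # random get: one 64 KiB HFile block
    scan = saving(disk, 64 * 1024 * 1024)  # scan: a whole 64 MiB HDFS block
    print(f"random get: {get:.0%} less read time, scan: {scan:.1%} less read time")
```

Under these assumptions the get saves about 48% of its read time and the scan about 1.4%: the extra seek dominates a small random read but is amortized over a large sequential one.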
In HDFS, a block and its checksum are stored in different local files on each datanode, so loading a block requires 2 IOs. With the checksum handled by HBase, only one IO is needed per block.

________________________________
From: Robert Dyer <[email protected]>
To: Hbase-User <[email protected]>
Sent: Friday, February 1, 2013 11:37 AM
Subject: Re: HBase Checksum

Yes, that log is a debug-level log, as I saw in the source. But I too enabled DEBUG and still never saw that log message. But I, unlike you, see absolutely no change in performance.

One test I did, however, makes me think it is actually enabled: if I submit from another user, I start getting security warnings about that user not having permission for shortcircuit. So perhaps it is working, but I have no clue why that log fails to show anywhere.

Regarding enabling checksums, that is an interesting question. Do I have to do a major compaction after enabling so HBase writes the checksum? Or will it detect the setting change and do that automatically? What if I disable it; will it remove the checksums?

On Fri, Feb 1, 2013 at 6:30 AM, Jean-Marc Spaggiari <[email protected]> wrote:
> Hi Robert,
>
> That's perfectly fine, it was my next question ;)
>
>
> Anoop, I saw a 5% performance increase by activating HBase Checksum.
> Can I disable it again to retry the baseline and see the difference?
> Or now that it's there, is it too late?
>
> Also, regarding BlockReaderLocal, I don't find that in my logs, but
> after I activated shortcircuit, I saw a 41% performance
> increase, so I'm almost sure it's working, but I don't know either how
> to check that.
>
> What's the best way to see that in the logs? It's not displayed when
> HBase is starting. It's not even displayed when I'm doing major
> compactions.
>
> I turned the org.apache.hadoop.hdfs.BlockReaderLocal log level to debug and
> still can't see anything. Not in the region server, and not in the
> datanode.
>
> Also, to check with HDFS-level logs whether the checksum meta file is
> being read by the DFS client, I'm not really sure how to achieve
> that.
>
> JM
>
> 2013/2/1, Robert Dyer <[email protected]>:
> > Ok, grepping the RS logs I see nothing with 'local' in any of them. Thanks
> > for that hint.
> >
> > For the test I was using, I know it is data local. Every map task launched
> > data local, and no regions were moving recently.
> >
> > I think I've hijacked this thread enough, I'll move my issues to another.
> > ;-)
> >
> >
> > On Thu, Jan 31, 2013 at 11:51 PM, Anoop Sam John <[email protected]>
> > wrote:
> >
> >> Hi Robert
> >> When HDFS is doing the local short-circuit read, it will use the
> >> BlockReaderLocal class for reading. There should be some logs on the DFS
> >> client side (RS) about creating a new BlockReaderLocal. If you
> >> can see this, then the local read is definitely happening.
> >>
> >> Also check the DN log. If local reads are happening, then you will not see
> >> read-request-related logs for the HFile on the DN side.
> >> Check your number of HFiles and their names when checking the logs.
> >>
> >> Are you sure that when you tested, you had data locality? Region movements
> >> across RSs can break full data locality.
> >>
> >> -Anoop-
> >> ________________________________________
> >> From: Robert Dyer [[email protected]]
> >> Sent: Friday, February 01, 2013 11:10 AM
> >> To: Hbase-User
> >> Subject: Re: HBase Checksum
> >>
> >> Not trying to hijack your thread here...
> >>
> >> But can you verify via logs that the shortcircuit is working? Because I
> >> enabled shortcircuit but I sure didn't see any performance increase.
> >>
> >> I haven't tried enabling hbase checksum yet but I'd like to be able to
> >> verify that works too.
> >>
> >>
> >> On Thu, Jan 31, 2013 at 9:55 PM, Anoop Sam John <[email protected]>
> >> wrote:
> >>
> >> > Can you check with HDFS-level logs whether the checksum meta file is
> >> > being read by the DFS client?
> >> > With the HBase-handled checksum, this should not happen.
> >> > Have you noticed any perf gain when you configure the HBase-handled
> >> > checksum option?
> >> >
> >> > -Anoop-
> >> > ________________________________________
> >> > From: Jean-Marc Spaggiari [[email protected]]
> >> > Sent: Friday, February 01, 2013 4:16 AM
> >> > To: user
> >> > Subject: HBase Checksum
> >> >
> >> > Hi,
> >> >
> >> > I have activated shortcircuit and checksum and I would like to get a
> >> > confirmation that it's working fine.
> >> >
> >> > So I activated short circuit first and saw a 40% improvement in
> >> > the MR rowcount job. So I guess it's working fine.
> >> >
> >> > Now, I'm configuring the checksum option, and I'm wondering how I can
> >> > validate that it's taken into consideration and used, or not. Is
> >> > there a way to see that?
> >> >
> >> > Thanks,
> >> >
> >> > JM
> >> >
> >
> > --
> >
> > Robert Dyer
> > [email protected]
> >
>
--
Robert Dyer
[email protected]
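Anoop's suggested checks (look for BlockReaderLocal messages on the RegionServer side, and confirm the DataNode log shows no read requests for the table's files) can be scripted. The following is a minimal Python sketch, not a tool from this thread: the function name is hypothetical, it assumes the client-side DEBUG message contains the string `BlockReaderLocal` (exact wording varies across Hadoop versions), and it does plain substring matching on identifiers you supply, such as HFile names or HDFS block IDs. DataNode logs also mention blocks during writes and compactions, so feed it only the log window covering your read test.

```python
from typing import Iterable, List, TypedDict


class LocalReadReport(TypedDict):
    local_reader_lines: int
    ids_seen_in_dn_log: List[str]


# Assumption: the DFS client's DEBUG message for a short-circuit reader
# mentions the class name; the exact text differs between Hadoop versions.
LOCAL_READER_MARKER = "BlockReaderLocal"


def local_read_report(rs_log: Iterable[str], dn_log: Iterable[str],
                      ids: Iterable[str]) -> LocalReadReport:
    """Summarise log evidence for short-circuit (local) reads.

    rs_log: lines from the RegionServer log (the DFS client side).
    dn_log: lines from the DataNode log on the same host, ideally only
            covering the time window of the read test.
    ids:    HFile names or HDFS block IDs of the table being read.
            Matching is a plain substring test, so "blk_1" also
            matches "blk_10".
    """
    local_hits = sum(1 for line in rs_log if LOCAL_READER_MARKER in line)
    dn_lines = list(dn_log)
    seen = sorted({i for i in ids if any(i in line for line in dn_lines)})
    return {
        # > 0 suggests the client is creating local block readers.
        "local_reader_lines": local_hits,
        # Should be empty if all reads bypass the DataNode.
        "ids_seen_in_dn_log": seen,
    }
```

To use it, pass open file handles for the two logs (their paths are site-specific) plus the identifiers to look for; `hadoop fsck <path> -files -blocks` lists the block IDs behind each HFile.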
