I do not know much about Hive, sorry. It all depends on where Hive creates the ClientScanner object. Normally you would call HTable.getScanner(Scan) to get a scanner. ClientScanner checks whether the caching value on the passed Scan object is > 0; if so, it uses that value. Otherwise it looks up hbase.client.scanner.caching in the environment's Configuration, and defaults to 1 if that is not set.
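The lookup order described above can be modeled in a few lines. This is a hedged, self-contained sketch, not the HBase ClientScanner source: a plain Map stands in for the Hadoop Configuration so it runs without HBase on the classpath, and the class and method names (ScannerCachingResolution, resolveCaching) are hypothetical. It only shows the precedence: an explicit Scan caching value > 0 wins, then hbase.client.scanner.caching from the Configuration, then the default of 1.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical model of the scanner-caching lookup described in the reply.
 * NOT the HBase source: a Map stands in for org.apache.hadoop.conf.Configuration.
 */
public class ScannerCachingResolution {

    static final String CACHING_KEY = "hbase.client.scanner.caching";
    static final int DEFAULT_CACHING = 1; // default cited in the reply (HBase 0.94)

    /** Returns the number of rows the scanner would fetch per round trip. */
    static int resolveCaching(int scanCaching, Map<String, String> conf) {
        if (scanCaching > 0) {
            // An explicit caching value on the Scan takes precedence.
            return scanCaching;
        }
        // Otherwise fall back to the Configuration the HTable was created with.
        String value = conf.get(CACHING_KEY);
        return value != null ? Integer.parseInt(value.trim()) : DEFAULT_CACHING;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        // Scan caching left at -1, as in the Hive code path described in the thread.
        System.out.println(resolveCaching(-1, conf));   // nothing configured: 1
        conf.put(CACHING_KEY, "500");
        System.out.println(resolveCaching(-1, conf));   // taken from the config: 500
        System.out.println(resolveCaching(100, conf));  // explicit Scan value: 100
    }
}
```

In the Hive case discussed in this thread, the Scan arrives with caching -1, so under this model the effective value comes entirely from the Configuration the HTable was built with.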
So it all depends on what Configuration Hive sees.

-- Lars

________________________________
From: java8964 <[email protected]>
To: "[email protected]" <[email protected]>
Sent: Monday, February 10, 2014 2:33 PM
Subject: RE: Hive + Hbase scanning performance

Hi, Lars:

Is there any logging I can enable to verify this? I am not questioning your knowledge, but in my performance testing I really didn't see any effect.

I read org.apache.hadoop.hbase.client.Scan in HBase 0.94.3 and didn't see any logging I could use to check what value the caching is set to. In the Hive code, org.apache.hadoop.hive.hbase.HiveHBaseTableInputFormat creates a Scan object with the default caching value (-1) and sets this Scan on its base class, org.apache.hadoop.hbase.mapreduce.TableInputFormatBase. I believe this Scan is then serialized to the server, and I didn't find any place where its caching value is reset based on the Configuration. Of course, I may have missed it, since I have only just started reading the HBase codebase and don't know much about it.

Is there any server-side log that can show the caching value if I change a log level? If so, how?

Also, can you comment on the Hive JIRA https://issues.apache.org/jira/browse/HIVE-3603? In fact, I have the same question as the second-to-last comment in the ticket, but no one ever answered it. Quoted:

Swarnim Kulkarni added a comment - 26/Aug/13 19:28
Edward Capriolo Thanks! Also how is setting this property different than directly setting the "hbase.client.scanner.caching" property in hive-site.xml without this enhancement? Wouldn't they have the same effect?

Thanks
Yong

> Date: Mon, 10 Feb 2014 12:37:07 -0800
> From: [email protected]
> Subject: Re: Hive + Hbase scanning performance
> To: [email protected]
>
> The block caching won't buy you much in terms of performance.
> You *must* set the scanner caching.
>
> Note that hbase.client.scanner.caching is a global config option (see
> HTable.getScanner(...)), so as long as that option is set on the
> Configuration of the HTable that Hive uses to create the scanner, it
> should work.
>
> -- Lars
>
> ________________________________
> From: java8964 <[email protected]>
> To: "[email protected]" <[email protected]>
> Sent: Monday, February 10, 2014 12:19 PM
> Subject: Re: Hive + Hbase scanning performance
>
> Hi, Ted:
> Our environment uses a distribution from a vendor, so it is not easy to
> patch it myself. But I can ask whether the vendor is willing to patch it
> in the next release.
> Before I do that, I want to make sure patching the code is the ONLY
> solution.
> I read the source code of HiveHBaseTableInputFormat in Hive 0.9.0 and didn't
> see any place where it invokes scan.setCaching(), so I don't think "set
> hbase.client.scanner.caching" in the Hive session will work, but that is just
> my guess. Quite a lot of messages on the internet say that it will work
> in this case, which confused me.
> What I want to confirm is whether "set hbase.client.scanner.caching" in fact
> doesn't work in Hive for scan.setCaching(). Is that true?
> Thanks
> Yong
>
> Date: Mon, 13 Jan 2014 19:31:38 -0800
> Subject: Re: Hive + Hbase scanning performance
> From: [email protected]
> To: [email protected]
>
> You can patch HIVE-3603 into your deployment so that you can make use of
> scan.setCacheBlocks(false).
>
> Cheers
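Lars's point that scanner caching matters far more than block caching for a full scan can be made concrete with a rough count of round trips. This sketch assumes a simplified model in which each scanner next() RPC returns up to `caching` rows, so a scan over N rows needs about ceil(N / caching) calls; it ignores region boundaries and any per-call size limits. The class name ScanRoundTrips and the one-million-row table are hypothetical, not measurements from this thread.

```java
/**
 * Rough, simplified estimate of scanner round trips: each next() RPC returns
 * up to `caching` rows, so scanning `rows` rows takes ceil(rows / caching) calls.
 * Ignores region boundaries and per-call size limits.
 */
public class ScanRoundTrips {

    static long roundTrips(long rows, int caching) {
        if (caching <= 0) {
            throw new IllegalArgumentException("caching must be positive");
        }
        // Integer ceiling division: (a + b - 1) / b
        return (rows + caching - 1) / caching;
    }

    public static void main(String[] args) {
        long rows = 1_000_000L; // hypothetical table size
        for (int caching : new int[] {1, 100, 1000}) {
            System.out.printf("caching=%d -> %d round trips%n", caching, roundTrips(rows, caching));
        }
    }
}
```

With the default of 1 cited in this thread, every row costs a separate round trip; raising the caching to 1000 cuts the same scan to about a thousand calls under this model.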
