The block caching won't buy you much in terms of performance. You *must* set the scanner caching.
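
As a rough illustration of one place the scanner caching can be set, here is a client-side hbase-site.xml fragment. The value 1000 is only an assumed example, not a recommendation; the right number depends on row size and available client memory. Whether Hive actually passes this property through to the Configuration its HTable sees depends on the Hive version and deployment.

```xml
<!-- Client-side hbase-site.xml fragment (illustrative sketch only). -->
<property>
  <name>hbase.client.scanner.caching</name>
  <!-- Rows fetched per scanner RPC; 1000 is an assumed example value. -->
  <value>1000</value>
</property>
```

A larger value means fewer round trips to the region server per scan, at the cost of more memory held per scanner on the client.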
Note that hbase.client.scanner.caching is a global config option (see HTable.getScanner(...)), so as long as that option is set on the Configuration seen by the HTable that Hive uses to create the scanner, it should work.

-- Lars

________________________________
From: java8964 <[email protected]>
To: "[email protected]" <[email protected]>
Sent: Monday, February 10, 2014 12:19 PM
Subject: Re: Hive + Hbase scanning performance

Hi, Ted:

Our environment uses a distribution from a vendor, so it is not easy to patch it myself. But I can ask whether the vendor is willing to patch it in the next release. Before I do that, I just want to make sure patching the code is the ONLY solution.

I read the source code of HiveHBaseTableInputFormat in Hive 0.9.0. I didn't see any place where it invokes scan.setCaching(), so I don't think "set hbase.client.scanner.caching" in the Hive session will work, but that is just my guess. There are quite a lot of messages on the internet saying that it will work in this case, which confused me.

What I want to confirm is that "set hbase.client.scanner.caching" in fact doesn't work in Hive for scan.setCaching(). Is that true?

Thanks

Yong

Date: Mon, 13 Jan 2014 19:31:38 -0800
Subject: Re: Hive + Hbase scanning performance
From: [email protected]
To: [email protected]

You can patch HIVE-3603 into your deployment so that you can make use of scan.setCacheBlocks(false).

Cheers
