Short answer: yes. The default caching value is now Integer.MAX_VALUE, which is a big difference from 0.98. This was changed by HBASE-11544, and you can find the statement below at http://hbase.apache.org/book.html:

hbase.client.scanner.caching

Description
    Number of rows that we try to fetch when calling next on a scanner if it is not served from (local, client) memory. This configuration works together with hbase.client.scanner.max.result.size to try and use the network efficiently. The default value is Integer.MAX_VALUE by default so that the network will fill the chunk size defined by hbase.client.scanner.max.result.size rather than be limited by a particular number of rows since the size of rows varies table to table. If you know ahead of time that you will not require more than a certain number of rows from a scan, this configuration should be set to that row limit via Scan#setCaching. Higher caching values will enable faster scanners but will eat up more memory and some calls of next may take longer and longer times when the cache is empty. Do not set this value such that the time between invocations is greater than the scanner timeout; i.e. hbase.client.scanner.timeout.period

Default
    2147483647

Users will be able to control the time limit of each call from the client configuration after HBASE-15593, but only once 1.3.0 is released. (Sorry, but for all existing releases this can only be controlled through server-side configuration, say half of hbase.client.scanner.timeout.period.)

We have been discussing this recently in https://issues.apache.org/jira/browse/HBASE-16973; you can get more details there. Small world, isn't it? (Smile)

Best Regards,
Yu

On 1 November 2016 at 13:10, Sachin Jain <[email protected]> wrote:

> Hi,
>
> I am using HBase v1.1.2. I have few questions regarding full table scan:-
>
> 1. When we instantiate a Scanner and do not set any caching on it. What is
> the value it picks by default.
> - By looking at the code, I have found the following:
>
> From documentation on the top in Scan.java class
>
> * To modify scanner caching for just this scan, use {@link
> #setCaching(int) setCaching}.
> * If caching is NOT set, we will use the caching value of the hosting
> {@link Table}.
>
> And
>
> /**
> * Set the number of rows for caching that will be passed to scanners.
> * If not set, the Configuration setting {@link
> HConstants#HBASE_CLIENT_SCANNER_CACHING} will
> * apply.
> * Higher caching values will enable faster scanners but will use more
> memory.
> * @param caching the number of rows for caching
> */
> public Scan setCaching(int caching) {
> this.caching = caching;
> return this;
> }
>
> And, default value in HConstants file is
>
> public static final String HBASE_CLIENT_SCANNER_CACHING =
> "hbase.client.scanner.caching";
> public static final int DEFAULT_HBASE_CLIENT_SCANNER_CACHING = 2147483647;
>
>
> Does that mean the default value viz number of records read per scan is
> 2147483647.
> Can someone please clarify this ?
>
> 2. Another question is: I assume we have to set the caching value higher so
> that we can reduce the number of RPC calls between client and region
> server.
> So if we increase the caching value, should we also increase the RPC
> timeout and scannerTimeout values otherwise we may reach that threshold for
> the new cache value.
>
> Thanks
> -Sachin
>
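
To illustrate how the row-count setting (hbase.client.scanner.caching) and the size setting (hbase.client.scanner.max.result.size) interact, here is a rough sketch of the arithmetic in plain Java. It does not call the HBase API. The class name, the method name, the 2 MB size limit, and the 1 KB average row size are all assumptions made for illustration, and the real server-side accounting of result size is more involved than a simple division.

```java
public class ScanBatchEstimate {
    // Rough estimate of rows returned by one scanner RPC: the batch ends when
    // either the caching row count or the result-size limit is reached first.
    // avgRowBytes must be greater than zero.
    static long rowsPerRpc(int caching, long maxResultSizeBytes, long avgRowBytes) {
        // Rows needed to reach the size limit (ceiling division, at least one row).
        long rowsBySize = Math.max(1L, (maxResultSizeBytes + avgRowBytes - 1) / avgRowBytes);
        return Math.min((long) caching, rowsBySize);
    }

    public static void main(String[] args) {
        long maxResultSize = 2L * 1024 * 1024; // assumed 2 MB size limit
        long avgRowBytes = 1024;               // hypothetical average row size

        // Caching left at Integer.MAX_VALUE: the size limit decides the batch.
        System.out.println(rowsPerRpc(Integer.MAX_VALUE, maxResultSize, avgRowBytes)); // 2048

        // Caching set to a small row limit: the row count decides the batch.
        System.out.println(rowsPerRpc(100, maxResultSize, avgRowBytes)); // 100
    }
}
```

Under these assumptions, a caching value of Integer.MAX_VALUE does not mean two billion rows are fetched per call; the size limit cuts each batch off first, and an explicit Scan#setCaching value only matters when it is smaller than the number of rows that fit in that size.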
