Thanks Yu!! This is very helpful.

On Tue, Nov 1, 2016 at 2:45 PM, Yu Li <[email protected]> wrote:
> A brief answer yes, by default the caching size is Integer.MAX_VALUE now
> and it's a big difference from 0.98. This is changed by HBASE-11544 and you
> could find below statement on http://hbase.apache.org/book.html:
>
> hbase.client.scanner.caching
> Description
>
> Number of rows that we try to fetch when calling next on a scanner if it is
> not served from (local, client) memory. This configuration works together
> with hbase.client.scanner.max.result.size to try and use the network
> efficiently. The default value is Integer.MAX_VALUE by default so that the
> network will fill the chunk size defined by
> hbase.client.scanner.max.result.size rather than be limited by a particular
> number of rows since the size of rows varies table to table. If you know
> ahead of time that you will not require more than a certain number of rows
> from a scan, this configuration should be set to that row limit via
> Scan#setCaching. Higher caching values will enable faster scanners but will
> eat up more memory and some calls of next may take longer and longer times
> when the cache is empty. Do not set this value such that the time between
> invocations is greater than the scanner timeout; i.e.
> hbase.client.scanner.timeout.period
> Default
>
> 2147483647
>
> And user will be able to control the time limit of each call from client
> configuration after HBASE-15593, but only after 1.3.0 get released (sorry
> but for all existing release we could only control this by server side
> configuration, say half of hbase.client.scanner.timeout.period)
>
> We're discussing about this in
> https://issues.apache.org/jira/browse/HBASE-16973 recently, you can get
> more details there.
>
> Small world, isn't it? (Smile)
>
> Best Regards,
> Yu
>
> On 1 November 2016 at 13:10, Sachin Jain <[email protected]> wrote:
>
> > Hi,
> >
> > I am using HBase v1.1.2. I have few questions regarding full table scan:-
> >
> > 1. When we instantiate a Scanner and do not set any caching on it. What is
> > the value it picks by default.
> > - By looking at the code, I have found the following:
> >
> > From documentation on the top in Scan.java class
> >
> >  * To modify scanner caching for just this scan, use
> >  * {@link #setCaching(int) setCaching}.
> >  * If caching is NOT set, we will use the caching value of the hosting
> >  * {@link Table}.
> >
> > And
> >
> >   /**
> >    * Set the number of rows for caching that will be passed to scanners.
> >    * If not set, the Configuration setting
> >    * {@link HConstants#HBASE_CLIENT_SCANNER_CACHING} will apply.
> >    * Higher caching values will enable faster scanners but will use more
> >    * memory.
> >    * @param caching the number of rows for caching
> >    */
> >   public Scan setCaching(int caching) {
> >     this.caching = caching;
> >     return this;
> >   }
> >
> > And, default value in HConstants file is
> >
> >   public static final String HBASE_CLIENT_SCANNER_CACHING =
> >       "hbase.client.scanner.caching";
> >   public static final int DEFAULT_HBASE_CLIENT_SCANNER_CACHING = 2147483647;
> >
> > Does that mean the default value viz number of records read per scan is
> > 2147483647.
> > Can someone please clarify this ?
> >
> > 2. Another question is: I assume we have to set the caching value higher so
> > that we can reduce the number of RPC calls between client and region
> > server.
> > So if we increase the caching value, should we also increase the RPC
> > timeout and scannerTimeout values otherwise we may reach that threshold for
> > the new cache value.
> >
> > Thanks
> > -Sachin
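
For anyone reading this thread later, here is a rough sketch in plain Java of how the two settings discussed above seem to interact. It is only a simplified model and not the HBase client code: the class and method names (ScannerCachingSketch, effectiveCaching, approxRowsPerNext) are made up for illustration, the "negative means unset" convention is an assumption, and the 2 MB figure for hbase.client.scanner.max.result.size and the average row size are assumed values. The point it tries to show is that a caching value of Integer.MAX_VALUE does not mean 2147483647 rows per call; the size limit is what caps each fetch.

```java
/** Simplified model only; this is NOT HBase's actual client logic. */
final class ScannerCachingSketch {
    // Default quoted in the thread for hbase.client.scanner.caching.
    static final int DEFAULT_CACHING = Integer.MAX_VALUE; // 2147483647

    // Assumption: 2 MB stands in for hbase.client.scanner.max.result.size.
    static final long ASSUMED_MAX_RESULT_SIZE = 2L * 1024 * 1024;

    // Assumption: a non-positive value means Scan#setCaching was never called.
    static final int UNSET = -1;

    /** Scan-level caching wins, then the configured value, then the default. */
    static int effectiveCaching(int scanCaching, Integer configuredCaching) {
        if (scanCaching > 0) {
            return scanCaching;
        }
        if (configuredCaching != null) {
            return configuredCaching;
        }
        return DEFAULT_CACHING;
    }

    /** Rough rows per next() fetch: capped by row count AND by chunk size. */
    static long approxRowsPerNext(int caching, long maxResultSizeBytes, long avgRowBytes) {
        long rowsThatFit = Math.max(1, maxResultSizeBytes / avgRowBytes);
        return Math.min(caching, rowsThatFit);
    }

    public static void main(String[] args) {
        int caching = effectiveCaching(UNSET, null);
        // With the default caching, the size limit decides the batch.
        System.out.println(approxRowsPerNext(caching, ASSUMED_MAX_RESULT_SIZE, 1024));
        // With an explicit small caching value, the row limit decides it.
        System.out.println(approxRowsPerNext(100, ASSUMED_MAX_RESULT_SIZE, 1024));
    }
}
```

With 1 KB rows, the first call in main is capped by the assumed 2 MB chunk and the second by the explicit row limit of 100. In a real client the row limit would be set through Scan#setCaching or hbase.client.scanner.caching, as described in the quoted documentation.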
