I would suggest looking at the full context of that sentence:
*Higher caching values will enable faster scanners but will eat up more memory and some calls of next may take longer and longer times when the cache is empty*
When the caching value is large, a call to next() that finds the cache empty has to block while the cache is refilled, and a large cache takes a while to fill. The rest of the calls to next() are served from the cache and are very quick.
Conversely, a smaller value for this property means more calls to next() actually require a reload of the cache, but each of those calls takes less time because the cache is smaller.
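To make the trade-off concrete, here is a rough sketch. It is a toy model only, not the HBase client code: the class ScannerCacheSketch and its methods are made up for illustration, and the `caching` field just stands in for hbase.client.scanner.caching (or Scan#setCaching). It counts how many calls to next() find the cache empty and have to refill it.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy model of a client-side scanner cache. Hypothetical, NOT the HBase implementation.
public class ScannerCacheSketch {
    private final int caching;    // rows fetched per refill (stand-in for the caching setting)
    private final int totalRows;  // rows available on the pretend server
    private final Deque<Integer> cache = new ArrayDeque<>();
    private int nextServerRow = 0;
    private int refills = 0;

    public ScannerCacheSketch(int caching, int totalRows) {
        this.caching = caching;
        this.totalRows = totalRows;
    }

    // Returns the next row id, or null once the scan is exhausted.
    public Integer next() {
        if (cache.isEmpty()) {
            refill(); // the slow call: in a real client this would block on the network
        }
        return cache.poll(); // every other call is served straight from memory
    }

    // Pulls up to `caching` rows from the pretend server into the cache.
    private void refill() {
        int end = Math.min(nextServerRow + caching, totalRows);
        if (nextServerRow >= end) {
            return; // nothing left to fetch
        }
        while (nextServerRow < end) {
            cache.add(nextServerRow++);
        }
        refills++;
    }

    // Scans every row and reports how many next() calls had to refill the cache.
    public static int countRefills(int caching, int totalRows) {
        ScannerCacheSketch scanner = new ScannerCacheSketch(caching, totalRows);
        while (scanner.next() != null) {
            // consume the row
        }
        return scanner.refills;
    }

    public static void main(String[] args) {
        System.out.println("caching=10:  " + countRefills(10, 1000) + " refills");
        System.out.println("caching=500: " + countRefills(500, 1000) + " refills");
    }
}
```

With 1000 rows, a caching value of 10 gives 100 refills that each fetch 10 rows, while a value of 500 gives only 2 refills that each fetch 500 rows. Fewer calls are slow, but each slow call has more work to do and the client holds more rows in memory.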
Rajeshkumar J wrote:
Hi,

hbase.client.scanner.caching

Description
Number of rows that we try to fetch when calling next on a scanner if it is not served from (local, client) memory. This configuration works together with hbase.client.scanner.max.result.size to try and use the network efficiently. The default value is Integer.MAX_VALUE by default so that the network will fill the chunk size defined by hbase.client.scanner.max.result.size rather than be limited by a particular number of rows since the size of rows varies table to table. If you know ahead of time that you will not require more than a certain number of rows from a scan, this configuration should be set to that row limit via Scan#setCaching. Higher caching values will enable faster scanners but will eat up more memory and *some calls of next may take longer and longer times when the cache is empty*. Do not set this value such that the time between invocations is greater than the scanner timeout; i.e. hbase.client.scanner.timeout.period

Default
2147483647

Can anyone explain the line below?

*some calls of next may take longer and longer times when the cache is empty*

Thanks
