Hi Sam,

The idea is that the entire result of the scan will not fit into the cache if the scan covers a "reasonable" number of cells, and hence it is unlikely that another scan will hit the cached blocks before they get evicted, especially with an LRU cache. Meanwhile, caching those blocks evicts blocks that frequently accessed readers actually did need, so for a full table scan the block cache gets all of the churn and none of the benefit.
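To make that concrete, here is a toy simulation (plain Java, using an access-ordered `LinkedHashMap` as a stand-in for the region server's LRU block cache; the class name, block counts, and cache size are made up for illustration). A scan that touches more blocks than the cache can hold gets zero hits even when repeated back to back, because each block is evicted before the scan comes around to it again:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class LruScanDemo {

    // An access-ordered LinkedHashMap with removeEldestEntry overridden
    // is a minimal LRU cache: the least recently used entry is evicted
    // once the capacity is exceeded.
    static Map<Long, byte[]> newLruCache(final int capacity) {
        return new LinkedHashMap<Long, byte[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Long, byte[]> eldest) {
                return size() > capacity;
            }
        };
    }

    // Sequentially "read" blocks 0..numBlocks-1, caching each one.
    // Returns how many reads were served from the cache.
    static int scan(Map<Long, byte[]> cache, long numBlocks) {
        int hits = 0;
        for (long b = 0; b < numBlocks; b++) {
            if (cache.containsKey(b)) {
                hits++;                 // block was still cached
            }
            cache.put(b, new byte[0]);  // "read from disk" and cache the block
        }
        return hits;
    }

    public static void main(String[] args) {
        // Scan is 10x larger than the cache: no pass ever hits.
        Map<Long, byte[]> cache = newLruCache(100);
        System.out.println("first pass hits:  " + scan(cache, 1000)); // 0
        System.out.println("second pass hits: " + scan(cache, 1000)); // 0
    }
}
```

By contrast, a working set that fits in the cache (say 50 blocks in a 100-block cache) gets 100% hits on the second pass. That asymmetry is the whole argument behind `scan.setCacheBlocks(false)` for MapReduce scans: the scan cannot benefit from the cache, but it can still flush out the hot rows of other readers.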
-- Lars

----- Original Message -----
From: Sam Seigal <[email protected]>
To: [email protected]
Cc:
Sent: Thursday, November 17, 2011 1:44 PM
Subject: block caching

I have a table that I only use for generating indexes. It will rarely have random reads, but will have M/R jobs running against it constantly to generate indexes. Even on the index table, random reads will be rare; it will mostly be used for scanning blocks of data.

According to "HBase: The Definitive Guide":

"As HBase reads entire blocks of data for efficient IO usage it retains these blocks in an in-memory cache, so that subsequent reads do not need any disk operation. The default of true enables the block cache for every read operation. But if your use-case only ever has sequential reads on a particular column family it is advisable to disable it from polluting the block cache by setting the block cache enabled flag to false."

"There are other options you can use to influence how the block cache is used, for example during a scan operation. This is useful during full table scans so that you do not cause a major churn on the cache. See the section called 'Configuration' for more information about this feature."

"Scan instances can be set to use the block cache in the region server via the setCacheBlocks() method. For scans used with MapReduce jobs, this should be false. For frequently accessed rows, it is advisable to use the block cache."

What is the reasoning behind the above? Why is using the block cache a bad idea for M/R jobs that do full table scans?
