Hi, thanks for the responses. Ted - when I said "scan.setCaching", I meant "scan.setCacheBlocks(false)". That's what I get for not copying/pasting directly from code :)
I added a link to the graphs here:
https://drive.google.com/file/d/0B3ZQ0nMNMFxCOHZNZVFsWEhCOUU/edit?usp=sharing

Bryan - I believe you're right, but wanted to confirm.

Thanks,
-Matt

On Mon, Jun 2, 2014 at 4:09 PM, Ted Yu <[email protected]> wrote:

> Have you added the following when passing Scan to your job?
>
> scan.setCacheBlocks(false);
>
> BTW image didn't go through.
> Consider putting image on third-party site.
>
> On Mon, Jun 2, 2014 at 12:55 PM, Matt K <[email protected]> wrote:
>
> > Hi all,
> >
> > We are running a number of Map/Reduce jobs on top of HBase. We are not
> > using HBase for any of its realtime capabilities, only for
> > batch-processing. So we aren't doing lookups, just scans.
> >
> > Each one of our jobs has *scan.setCaching(false)* to turn off
> > block-caching, since each block will only be accessed once.
> >
> > We recently started using Cloudera Manager, and I'm seeing something that
> > doesn't add up. See image below. It's clear from the graphs that Block
> > Cache is being used currently, and blocks are being cached and evicted.
> >
> > We do have *hfile.block.cache.size* set to 0.4 (default), but my
> > understanding is that the jobs setting scan.setCaching(false) should
> > override this. Since it's set in every job, there should be no blocks
> > being cached.
> >
> > Can anyone help me understand what we're seeing?
> >
> > Thanks,
> >
> > -Matt
> >
> > [image: Inline image 1]

--
www.calcmachine.com - easy online calculator.
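[Editor's note] A sketch of the setup Ted is describing may help, since the thread hinges on two easily-confused Scan knobs: `setCacheBlocks(false)` tells region servers not to populate the block cache for this scan, while `setCaching(n)` is an unrelated setting for how many rows each scanner RPC fetches. Note that even with caching disabled on every job's scan, the memory reserved by `hfile.block.cache.size` stays allocated, and any other reads can still fill it. Everything below other than `scan.setCacheBlocks(false)` is illustrative: the class names, table name, and job wiring are assumptions, not taken from the thread.

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

public class BatchScanJob {

    // Hypothetical mapper; a real job would do its batch processing here.
    static class ScanMapper extends TableMapper<ImmutableBytesWritable, Result> {
        @Override
        protected void map(ImmutableBytesWritable row, Result value, Context ctx) {
            // process one row of the scan...
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(HBaseConfiguration.create(), "batch-scan");
        job.setJarByClass(BatchScanJob.class);

        Scan scan = new Scan();
        scan.setCacheBlocks(false); // don't populate the block cache for this scan
        scan.setCaching(500);       // rows per scanner RPC; separate knob, takes an int

        TableMapReduceUtil.initTableMapperJob(
            "my_table",             // hypothetical table name
            scan,
            ScanMapper.class,
            ImmutableBytesWritable.class,
            Result.class,
            job);
        job.setOutputFormatClass(NullOutputFormat.class);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

This also shows why the original `scan.setCaching(false)` line could not have compiled as written: `setCaching` takes an int, which is likely how the mix-up in the first message arose.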
