Cool. Thanks!
Just to dig deeper: is this because the BloomFilter is part of Meta, and Meta
blocks are always cached no matter what?
Or is it because the BloomFilter is in the upper levels of the search tree in
the code path I pasted? I guess that code path is actually for data blocks,
not meta blocks?

    // Call HFile's caching block reader API. We always cache index
    // blocks, otherwise we might get terrible performance.
    boolean shouldCache = cacheBlocks || (lookupLevel < searchTreeLevel);
    BlockType expectedBlockType;
    if (lookupLevel < searchTreeLevel - 1) {
      expectedBlockType = BlockType.INTERMEDIATE_INDEX;
    } else if (lookupLevel == searchTreeLevel - 1) {
      expectedBlockType = BlockType.LEAF_INDEX;
    } else {
      // this also accounts for ENCODED_DATA
      expectedBlockType = BlockType.DATA;
    }

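To make my reading of that snippet concrete, here is a minimal standalone sketch of the same decision logic. The class name IndexCacheSketch, the nested enum Kind, and the searchTreeLevel value of 2 are my own stand-ins for illustration, not the real HFile classes or a real file's index depth. It only models index and data blocks, so it says nothing about where BloomFilter blocks fall.

```java
// Standalone sketch of the index-level caching decision quoted above.
// IndexCacheSketch and Kind are hypothetical stand-ins, not HBase classes.
final class IndexCacheSketch {
    /** Simplified stand-in for the three block types named in the snippet. */
    enum Kind { INTERMEDIATE_INDEX, LEAF_INDEX, DATA }

    private IndexCacheSketch() {}

    /** Mirrors: cacheBlocks || (lookupLevel < searchTreeLevel). */
    static boolean shouldCache(boolean cacheBlocks, int lookupLevel, int searchTreeLevel) {
        return cacheBlocks || (lookupLevel < searchTreeLevel);
    }

    /** Mirrors the if/else chain that picks the expected block type. */
    static Kind expectedBlockType(int lookupLevel, int searchTreeLevel) {
        if (lookupLevel < searchTreeLevel - 1) {
            return Kind.INTERMEDIATE_INDEX;
        } else if (lookupLevel == searchTreeLevel - 1) {
            return Kind.LEAF_INDEX;
        } else {
            // In the original code this branch also covers ENCODED_DATA.
            return Kind.DATA;
        }
    }

    public static void main(String[] args) {
        int searchTreeLevel = 2;     // assumed index depth, for illustration only
        boolean cacheBlocks = false; // what setCacheBlocks(false) would request
        for (int level = 0; level <= searchTreeLevel; level++) {
            System.out.println("level " + level
                + " type=" + expectedBlockType(level, searchTreeLevel)
                + " shouldCache=" + shouldCache(cacheBlocks, level, searchTreeLevel));
        }
    }
}
```

If I read it right, with cacheBlocks off only the levels below searchTreeLevel (the index blocks) stay cacheable, and the final data-block level does not.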
On Wed, Apr 16, 2014 at 4:59 PM, Ted Yu <[email protected]> wrote:
> bq. it is always cached on read even when per-family/per-query cacheBlocks
> is turned off.
>
> True.
>
>
> On Wed, Apr 16, 2014 at 4:41 PM, Tianying Chang <[email protected]> wrote:
>
> > Hi,
> >
> > We have a use case where some data is mostly random-read, so it pollutes the
> > cache and causes heavy GC. It is better to turn off the block cache for that
> > data, so we are going to call setCacheBlocks(false) for those get() calls. We
> > know that the index will still be cached based on the code path below, so we
> > are safe there. But it is not clear whether the BloomFilter belongs to the
> > levels < searchTreeLevel and therefore gets cached as well.
> >
> >   // Call HFile's caching block reader API. We always cache index
> >   // blocks, otherwise we might get terrible performance.
> >   boolean shouldCache = cacheBlocks || (lookupLevel < searchTreeLevel);
> >   BlockType expectedBlockType;
> >   if (lookupLevel < searchTreeLevel - 1) {
> >     expectedBlockType = BlockType.INTERMEDIATE_INDEX;
> >   } else if (lookupLevel == searchTreeLevel - 1) {
> >     expectedBlockType = BlockType.LEAF_INDEX;
> >   } else {
> >     // this also accounts for ENCODED_DATA
> >     expectedBlockType = BlockType.DATA;
> >   }
> >
> > Or is it that, because the BloomFilter is part of the Meta data, it is always
> > cached on read even when per-family/per-query cacheBlocks is turned off? Am I
> > right?
> >
> > Thanks
> > Tian-Ying
> >
>