Hello Thomas Tauber-Marshall, Sahil Takiar, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/15306

to look at the new patch set (#19).

Change subject: IMPALA-8690: Add LIRS cache eviction algorithm
......................................................................

IMPALA-8690: Add LIRS cache eviction algorithm

One concern for the data cache is that the LRU eviction
algorithm is susceptible to being flushed by large
scans of low priority data. This change implements the LIRS
algorithm described in "LIRS: An Efficient Low Inter-reference
Recency Set Replacement Policy to Improve Buffer Cache Performance"
by Song Jiang and Xiaodong Zhang (2002). LIRS is a scan-resistant
eviction algorithm with a low performance penalty relative to LRU.
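
To make the scan concern concrete, here is a minimal sketch of a toy LRU
cache being flushed by a one-pass scan. It is illustrative only: ToyLru and
HotKeysSurvivingScan are hypothetical names, the keys are plain integers, and
none of this is taken from Impala's cache code.

```cpp
#include <cstddef>
#include <list>
#include <unordered_map>

// Toy LRU set of integer keys (hypothetical, not Impala's implementation).
class ToyLru {
 public:
  explicit ToyLru(size_t capacity) : capacity_(capacity) {}

  // Returns true on a hit. On a miss the key is inserted, evicting the
  // least recently used key if the cache is already full.
  bool Access(int key) {
    auto it = index_.find(key);
    if (it != index_.end()) {
      // Hit: move the entry to the front (most recently used position).
      order_.splice(order_.begin(), order_, it->second);
      return true;
    }
    if (capacity_ == 0) return false;
    if (index_.size() == capacity_) {
      index_.erase(order_.back());
      order_.pop_back();
    }
    order_.push_front(key);
    index_[key] = order_.begin();
    return false;
  }

  bool Contains(int key) const { return index_.count(key) > 0; }

 private:
  size_t capacity_;
  std::list<int> order_;  // front = most recently used
  std::unordered_map<int, std::list<int>::iterator> index_;
};

// Warms the cache with hot keys 0..num_hot-1 (touched three times each),
// then touches scan_len cold keys exactly once, and returns how many hot
// keys are still cached. Assumes num_hot < 1000 so the key ranges are disjoint.
int HotKeysSurvivingScan(size_t capacity, int num_hot, int scan_len) {
  ToyLru cache(capacity);
  for (int round = 0; round < 3; ++round) {
    for (int k = 0; k < num_hot; ++k) cache.Access(k);
  }
  for (int k = 0; k < scan_len; ++k) cache.Access(1000 + k);
  int survivors = 0;
  for (int k = 0; k < num_hot; ++k) {
    if (cache.Contains(k)) ++survivors;
  }
  return survivors;
}
```

With a capacity of 4 and 4 hot keys, a scan of 4 cold keys leaves no hot key
in the toy cache, even though each cold key was touched only once. A
scan-resistant policy such as LIRS is meant to avoid this outcome.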

This introduces the startup flag data_cache_eviction_policy to
control which eviction policy to use. The only two options are
LRU and LIRS, with the default continuing to be LRU.
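
As a rough sketch of how a two-valued policy flag could be mapped to an
internal setting: the enum and the ParseEvictionPolicy function below are
hypothetical names, and exact upper-case matching is an assumption rather
than something this change states.

```cpp
#include <optional>
#include <string>

// Hypothetical enum; the real code may represent the policy differently.
enum class EvictionPolicy { LRU, LIRS };

// Maps a flag value to a policy. Returns std::nullopt for anything other
// than the two documented options so the caller can reject the flag.
// Assumption: matching is exact and upper-case.
std::optional<EvictionPolicy> ParseEvictionPolicy(const std::string& value) {
  if (value == "LRU") return EvictionPolicy::LRU;
  if (value == "LIRS") return EvictionPolicy::LIRS;
  return std::nullopt;
}
```

Returning an empty optional rather than silently falling back to LRU keeps a
mistyped flag value visible at startup.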

To accommodate the new algorithm and associated tests, some
code moved around:
1. The RLCacheShard implementation moved from util/cache/cache.cc
   to util/cache/rl-cache.cc.
2. The backend cache tests were split into multiple files.
   util/cache/cache-test.h contains shared cache testing code.
   util/cache/cache-test.cc contains generic tests that should
   work for any algorithm.
   util/cache/rl-cache-test.cc contains RLCacheShard-specific tests.
   util/cache/lirs-cache-test.cc contains LIRS-specific tests.
3. To make it easy for clients of the cache code to customize
   the cache eviction algorithm, the public interface changed
   from using a template to taking the policy as an argument.
4. Cache::MemoryType is removed.
5. Cache adds an Init() method to verify the validity of
   startup flags.
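
Items 3 and 5 can be pictured with the following sketch. It assumes a
simplified interface: the class names, the NewCache signature, the
PolicyName method, and the capacity check inside Init() are all hypothetical
stand-ins, not the actual declarations in util/cache/cache.h.

```cpp
#include <cstddef>
#include <memory>
#include <string>

enum class EvictionPolicy { LRU, LIRS };  // hypothetical enum

class Cache {
 public:
  virtual ~Cache() = default;
  // Validates the configuration after construction. Returns false and fills
  // *error on failure, so a bad setting can be reported instead of crashing.
  virtual bool Init(std::string* error) = 0;
  virtual const char* PolicyName() const = 0;
};

// Stand-in for a policy-specific implementation. Eviction logic is omitted.
class SketchCache : public Cache {
 public:
  SketchCache(EvictionPolicy policy, size_t capacity)
      : policy_(policy), capacity_(capacity) {}

  bool Init(std::string* error) override {
    // Illustrative check only; the real Init() validates startup flags.
    if (capacity_ == 0) {
      *error = "cache capacity must be greater than zero";
      return false;
    }
    return true;
  }

  const char* PolicyName() const override {
    return policy_ == EvictionPolicy::LIRS ? "LIRS" : "LRU";
  }

 private:
  EvictionPolicy policy_;
  size_t capacity_;
};

// The policy is a runtime argument rather than a template parameter, so a
// caller can choose it from a flag without instantiating different types.
std::unique_ptr<Cache> NewCache(EvictionPolicy policy, size_t capacity) {
  return std::make_unique<SketchCache>(policy, capacity);
}
```

A caller would construct the cache with the chosen policy and then call
Init() before use, treating a false return as a configuration error.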

Testing:
 - Added LIRS specific backend cache tests (lirs-cache-test)
 - Ran TPC-DS with a very small cache and concurrency to test
   corner cases with the LIRS eviction policy
 - Parameterized data-cache-test to run for both LRU and LIRS
 - Added LIRS equivalents for tests in custom_cluster/test_data_cache.py
 - Ran cache-bench with LRU and LIRS. The results are:
   Test case           | Algorithm | Lookups / sec | Hit rate
   ZIPFIAN ratio=1.00x | LRU       | 11.31M        | 99.9%
   ZIPFIAN ratio=1.00x | LIRS      | 10.09M        | 99.8%
   ZIPFIAN ratio=3.00x | LRU       | 11.36M        | 95.9%
   ZIPFIAN ratio=3.00x | LIRS      |  9.27M        | 96.4%
   UNIFORM ratio=1.00x | LRU       |  7.46M        | 99.8%
   UNIFORM ratio=1.00x | LIRS      |  6.93M        | 99.8%
   UNIFORM ratio=3.00x | LRU       |  5.63M        | 33.3%
   UNIFORM ratio=3.00x | LIRS      |  3.24M        | 33.3%
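   The takeaway is that LIRS is somewhat slower on lookups and
   noticeably slower on inserts (the gap is largest in the
   UNIFORM ratio=3.00x case, where the hit rate is only 33.3%).
   However, both algorithms still sustain millions of operations
   per second, so the eviction policy should not be a bottleneck
   for the data cache.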
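
The relative gap can be worked out directly from the table. The sketch below
uses the lookups/sec figures reported above (in millions); the helper names
are hypothetical and this is not part of cache-bench.

```cpp
#include <cassert>
#include <cstdio>

// Percentage by which `candidate` throughput falls short of `baseline`.
// Both arguments must be in the same units (here: millions of lookups/sec).
double SlowdownPercent(double baseline, double candidate) {
  assert(baseline > 0.0);
  return (1.0 - candidate / baseline) * 100.0;
}

struct BenchRow {
  const char* test_case;
  double lru_mlookups;   // LRU lookups/sec, in millions
  double lirs_mlookups;  // LIRS lookups/sec, in millions
};

// Figures copied from the results table in this commit message.
const BenchRow kRows[] = {
    {"ZIPFIAN ratio=1.00x", 11.31, 10.09},
    {"ZIPFIAN ratio=3.00x", 11.36, 9.27},
    {"UNIFORM ratio=1.00x", 7.46, 6.93},
    {"UNIFORM ratio=3.00x", 5.63, 3.24},
};

// Prints the LIRS slowdown relative to LRU for each test case.
void PrintSlowdowns() {
  for (const BenchRow& row : kRows) {
    std::printf("%s: LIRS is %.1f%% slower than LRU\n", row.test_case,
                SlowdownPercent(row.lru_mlookups, row.lirs_mlookups));
  }
}
```

By this arithmetic LIRS is roughly 7% to 18% slower in the three cases with a
high hit rate, and roughly 42% slower in the UNIFORM ratio=3.00x case.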

Change-Id: I670fa4b2b7c93998130dc4e8b2546bb93e9a84f8
---
M be/src/runtime/io/data-cache-test.cc
M be/src/runtime/io/data-cache.cc
M be/src/runtime/io/data-cache.h
M be/src/util/cache/CMakeLists.txt
M be/src/util/cache/cache-bench.cc
M be/src/util/cache/cache-internal.h
M be/src/util/cache/cache-test.cc
A be/src/util/cache/cache-test.h
M be/src/util/cache/cache.cc
M be/src/util/cache/cache.h
A be/src/util/cache/lirs-cache-test.cc
A be/src/util/cache/lirs-cache.cc
A be/src/util/cache/rl-cache-test.cc
A be/src/util/cache/rl-cache.cc
M bin/rat_exclude_files.txt
M tests/custom_cluster/test_data_cache.py
16 files changed, 2,665 insertions(+), 844 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/06/15306/19
--
To view, visit http://gerrit.cloudera.org:8080/15306
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I670fa4b2b7c93998130dc4e8b2546bb93e9a84f8
Gerrit-Change-Number: 15306
Gerrit-PatchSet: 19
Gerrit-Owner: Joe McDonnell <joemcdonn...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <joemcdonn...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <stak...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tmarsh...@cloudera.com>
