Hello Joe McDonnell, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/19475

to look at the new patch set (#10).

Change subject: IMPALA-11886: Data cache should support asynchronous writes
......................................................................

IMPALA-11886: Data cache should support asynchronous writes

This patch implements asynchronous writes to the data cache to improve
scan performance when a cache miss happens.
Previously, writes to the data cache were synchronous with HDFS file
reads, and both were handled by the remote HDFS IO threads. In other
words, when a cache miss occurs, the IO thread also has to perform the
cache write, which degrades scan performance.
This patch uses a thread pool for asynchronous writes; the number of
threads in the pool is set by the new configuration
'data_cache_num_write_threads'. In asynchronous write mode, the IO
thread only needs to copy the data into a temporary buffer when storing
it into the data cache. The additional memory consumed by these
temporary buffers can be capped with the new configuration
'data_cache_write_buffer_limit'.
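A minimal C++ sketch of the asynchronous write path described above, assuming a simple queue plus worker-thread design. `AsyncCacheWriter`, `Store()`, and the constructor parameters (standing in for 'data_cache_num_write_threads' and 'data_cache_write_buffer_limit') are hypothetical names, not the classes in data-cache.cc. Dropping a write that would exceed the buffer limit is also an assumption about how the limit is enforced.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <functional>
#include <mutex>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical sketch, not Impala's implementation.
class AsyncCacheWriter {
 public:
  // 'sink' performs the slow cache write (e.g. appending to a cache file).
  using Sink = std::function<void(const std::string&, const uint8_t*, size_t)>;

  AsyncCacheWriter(int num_write_threads, size_t write_buffer_limit, Sink sink)
      : limit_(write_buffer_limit), sink_(std::move(sink)) {
    for (int i = 0; i < num_write_threads; ++i) {
      workers_.emplace_back([this] { WorkerLoop(); });
    }
  }

  // Drains any queued writes, then joins the worker threads.
  ~AsyncCacheWriter() {
    {
      std::lock_guard<std::mutex> l(mu_);
      shutdown_ = true;
    }
    cv_.notify_all();
    for (auto& t : workers_) t.join();
  }

  // Called by the IO thread: copies 'data' into a temporary buffer and returns
  // at once. Returns false (write dropped, assumed policy) if the copy would
  // push buffered bytes past the limit.
  bool Store(const std::string& key, const uint8_t* data, size_t len) {
    {
      std::lock_guard<std::mutex> l(mu_);
      if (shutdown_ || buffered_bytes_ + len > limit_) return false;
      buffered_bytes_ += len;  // reserve budget before copying
    }
    Task task{key, std::vector<uint8_t>(data, data + len)};  // copy outside the lock
    {
      std::lock_guard<std::mutex> l(mu_);
      queue_.push_back(std::move(task));
    }
    cv_.notify_one();
    return true;
  }

  size_t buffered_bytes() {
    std::lock_guard<std::mutex> l(mu_);
    return buffered_bytes_;
  }

 private:
  struct Task {
    std::string key;
    std::vector<uint8_t> buf;
  };

  void WorkerLoop() {
    while (true) {
      Task task;
      {
        std::unique_lock<std::mutex> l(mu_);
        cv_.wait(l, [this] { return shutdown_ || !queue_.empty(); });
        if (queue_.empty()) return;  // shutting down and nothing left to drain
        task = std::move(queue_.front());
        queue_.pop_front();
      }
      sink_(task.key, task.buf.data(), task.buf.size());
      std::lock_guard<std::mutex> l(mu_);
      buffered_bytes_ -= task.buf.size();  // release budget after the write
    }
  }

  std::mutex mu_;
  std::condition_variable cv_;
  std::deque<Task> queue_;
  size_t buffered_bytes_ = 0;
  bool shutdown_ = false;
  const size_t limit_;
  const Sink sink_;
  std::vector<std::thread> workers_;  // declared last: started after other members
};
```

Reserving budget under the lock but copying outside it keeps IO threads from serializing on the copy. Because the budget is released only after the worker finishes the cache write, the limit covers both queued and in-flight buffers.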

Testing:
- Add test cases for asynchronous data writing to the original
DataCacheTest using different numbers of threads.
- Add DataCacheTest#OutOfWriteBufferLimit to test the limit on memory
consumed by temporary buffers in asynchronous write mode.
- Add a timer to the MultiThreadedReadWrite function to measure the
average time of multithreaded writes. The following test cases show a
significant difference between synchronous and asynchronous writes:
Test case                | Policy | Sync/Async | Avg write time (ms)
MultiThreadedNoMisses    | LRU    | Sync       |   12.20
MultiThreadedNoMisses    | LRU    | Async      |   20.74
MultiThreadedNoMisses    | LIRS   | Sync       |    9.42
MultiThreadedNoMisses    | LIRS   | Async      |   16.75
MultiThreadedWithMisses  | LRU    | Sync       |  510.87
MultiThreadedWithMisses  | LRU    | Async      |   10.06
MultiThreadedWithMisses  | LIRS   | Sync       | 1872.11
MultiThreadedWithMisses  | LIRS   | Async      |   11.02
MultiPartitions          | LRU    | Sync       |    1.20
MultiPartitions          | LRU    | Async      |    5.23
MultiPartitions          | LIRS   | Sync       |    1.26
MultiPartitions          | LIRS   | Async      |    7.91
AccessTraceAnonymization | LRU    | Sync       | 1963.89
AccessTraceAnonymization | LRU    | Sync       | 2073.62
AccessTraceAnonymization | LRU    | Async      |    9.43
AccessTraceAnonymization | LRU    | Async      |   13.13
AccessTraceAnonymization | LIRS   | Sync       | 1663.93
AccessTraceAnonymization | LIRS   | Sync       | 1501.86
AccessTraceAnonymization | LIRS   | Async      |   12.83
AccessTraceAnonymization | LIRS   | Async      |   12.74
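One plausible way to compute an average multithreaded write time is sketched below in C++. `AverageWriteTimeMs` and its parameters are hypothetical; the timer added to MultiThreadedReadWrite may instead measure per-operation time or use a different clock.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical helper: runs 'num_threads' threads, each calling 'write_op'
// 'ops_per_thread' times, and returns the mean per-thread wall time in ms.
double AverageWriteTimeMs(int num_threads, int ops_per_thread,
                          const std::function<void(int, int)>& write_op) {
  if (num_threads <= 0) return 0.0;
  std::vector<double> per_thread_ms(num_threads, 0.0);
  std::vector<std::thread> threads;
  threads.reserve(num_threads);
  for (int t = 0; t < num_threads; ++t) {
    threads.emplace_back([&, t] {
      auto start = std::chrono::steady_clock::now();
      for (int i = 0; i < ops_per_thread; ++i) write_op(t, i);
      auto end = std::chrono::steady_clock::now();
      // Each thread writes only its own slot, so no synchronization is needed.
      per_thread_ms[t] =
          std::chrono::duration<double, std::milli>(end - start).count();
    });
  }
  for (auto& th : threads) th.join();
  double total = 0.0;
  for (double ms : per_thread_ms) total += ms;
  return total / num_threads;
}
```

Averaging per-thread wall time shows how long each writer thread is blocked inside the write calls, which is the cost that handing writes off to a background pool is meant to shrink.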

Change-Id: I878f7486d485b6288de1a9145f49576b7155d312
---
M be/src/runtime/io/data-cache-test.cc
M be/src/runtime/io/data-cache-trace.cc
M be/src/runtime/io/data-cache.cc
M be/src/runtime/io/data-cache.h
M be/src/runtime/io/disk-io-mgr.cc
M be/src/util/impalad-metrics.cc
M be/src/util/impalad-metrics.h
M bin/run-all-tests.sh
M bin/start-impala-cluster.py
M common/thrift/metrics.json
10 files changed, 427 insertions(+), 52 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/75/19475/10
--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I878f7486d485b6288de1a9145f49576b7155d312
Gerrit-Change-Number: 19475
Gerrit-PatchSet: 10
Gerrit-Owner: Anonymous Coward <[email protected]>
Gerrit-Reviewer: Anonymous Coward <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Joe McDonnell <[email protected]>
