Joe McDonnell has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/19475 )
Change subject: IMPALA-11886: Data cache should support asynchronous writes
......................................................................

IMPALA-11886: Data cache should support asynchronous writes

This patch implements asynchronous writes to the data cache to improve
scan performance when a cache miss happens.
Previously, writes to the data cache were synchronous with HDFS file
reads, and both were handled by the remote HDFS IO threads. In other
words, on a cache miss the IO thread also had to perform the cache
write, which degraded scan performance.
This patch uses a thread pool for asynchronous writes, and the number
of threads in the pool is determined by the new configuration
'data_cache_num_write_threads'. In asynchronous write mode, the IO
thread only needs to copy data into a temporary buffer when storing
data into the data cache. The additional memory consumed by temporary
buffers can be limited by the new configuration
'data_cache_write_buffer_limit'.

Testing:
- Add test cases for asynchronous data writing to the original
  DataCacheTest using different numbers of threads.
- Add DataCacheTest,#OutOfWriteBufferLimit, used to test the limit on
  memory consumed by temporary buffers in the case of asynchronous
  writes.
- Add a timer to the MultiThreadedReadWrite function to get the average
  time of multithreaded writes.
Here are some test cases whose write times differ significantly between
synchronous and asynchronous mode:

Test case                | Policy | Sync/Async | Write time (ms)
MultiThreadedNoMisses    | LRU    | Sync       |   12.20
MultiThreadedNoMisses    | LRU    | Async      |   20.74
MultiThreadedNoMisses    | LIRS   | Sync       |    9.42
MultiThreadedNoMisses    | LIRS   | Async      |   16.75
MultiThreadedWithMisses  | LRU    | Sync       |  510.87
MultiThreadedWithMisses  | LRU    | Async      |   10.06
MultiThreadedWithMisses  | LIRS   | Sync       | 1872.11
MultiThreadedWithMisses  | LIRS   | Async      |   11.02
MultiPartitions          | LRU    | Sync       |    1.20
MultiPartitions          | LRU    | Async      |    5.23
MultiPartitions          | LIRS   | Sync       |    1.26
MultiPartitions          | LIRS   | Async      |    7.91
AccessTraceAnonymization | LRU    | Sync       | 1963.89
AccessTraceAnonymization | LRU    | Sync       | 2073.62
AccessTraceAnonymization | LRU    | Async      |    9.43
AccessTraceAnonymization | LRU    | Async      |   13.13
AccessTraceAnonymization | LIRS   | Sync       | 1663.93
AccessTraceAnonymization | LIRS   | Sync       | 1501.86
AccessTraceAnonymization | LIRS   | Async      |   12.83
AccessTraceAnonymization | LIRS   | Async      |   12.74

Change-Id: I878f7486d485b6288de1a9145f49576b7155d312
Reviewed-on: http://gerrit.cloudera.org:8080/19475
Reviewed-by: Joe McDonnell <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>
---
M be/src/runtime/io/data-cache-test.cc
M be/src/runtime/io/data-cache-trace.cc
M be/src/runtime/io/data-cache.cc
M be/src/runtime/io/data-cache.h
M be/src/runtime/io/disk-io-mgr.cc
M be/src/util/impalad-metrics.cc
M be/src/util/impalad-metrics.h
M bin/run-all-tests.sh
M bin/start-impala-cluster.py
M common/thrift/metrics.json
10 files changed, 427 insertions(+), 52 deletions(-)

Approvals:
  Joe McDonnell: Looks good to me, approved
  Impala Public Jenkins: Verified

--
To view, visit http://gerrit.cloudera.org:8080/19475
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I878f7486d485b6288de1a9145f49576b7155d312
Gerrit-Change-Number: 19475
Gerrit-PatchSet: 13
Gerrit-Owner: Anonymous Coward <[email protected]>
Gerrit-Reviewer: Anonymous Coward <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Joe McDonnell <[email protected]>
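
Illustrative sketch (not part of the change): the asynchronous write path
described in the commit message can be pictured roughly as below. This is
a minimal standalone C++ sketch, not the code in data-cache.cc. The names
AsyncCacheWriter, Submit and StoreFn are hypothetical, the two constructor
arguments only play the roles of 'data_cache_num_write_threads' and
'data_cache_write_buffer_limit', and skipping the write when the buffer
limit would be exceeded is an assumption of this sketch.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical sketch of an asynchronous cache writer; not Impala's DataCache.
class AsyncCacheWriter {
 public:
  // Stands in for the slow, synchronous write into the cache backing store.
  using StoreFn = std::function<void(const std::vector<uint8_t>&)>;

  AsyncCacheWriter(int num_write_threads, size_t write_buffer_limit, StoreFn store)
    : buffer_limit_(write_buffer_limit), store_(std::move(store)) {
    for (int i = 0; i < num_write_threads; ++i) {
      workers_.emplace_back([this] { WorkerLoop(); });
    }
  }

  // Drains all queued writes, then stops the worker threads.
  ~AsyncCacheWriter() {
    {
      std::lock_guard<std::mutex> l(lock_);
      shutdown_ = true;
    }
    work_available_.notify_all();
    for (std::thread& t : workers_) t.join();
    assert(buffered_bytes_ == 0);
  }

  // Called by the IO thread: copies 'data' into a temporary buffer and returns
  // immediately. Returns false (write skipped) if the copy would push the
  // total size of temporary buffers past the limit. Assumption of this sketch.
  bool Submit(const uint8_t* data, size_t len) {
    {
      std::lock_guard<std::mutex> l(lock_);
      if (shutdown_ || buffered_bytes_ + len > buffer_limit_) return false;
      buffered_bytes_ += len;  // Reserve space before copying.
    }
    std::vector<uint8_t> copy(data, data + len);
    {
      std::lock_guard<std::mutex> l(lock_);
      pending_.push(std::move(copy));
    }
    work_available_.notify_one();
    return true;
  }

 private:
  void WorkerLoop() {
    std::unique_lock<std::mutex> l(lock_);
    while (true) {
      work_available_.wait(l, [this] { return shutdown_ || !pending_.empty(); });
      if (pending_.empty()) return;  // Shutting down and nothing left to write.
      std::vector<uint8_t> buf = std::move(pending_.front());
      pending_.pop();
      l.unlock();
      store_(buf);  // The slow write happens off the IO thread.
      l.lock();
      buffered_bytes_ -= buf.size();  // Release the reservation after the write.
    }
  }

  const size_t buffer_limit_;
  StoreFn store_;
  std::mutex lock_;
  std::condition_variable work_available_;
  std::queue<std::vector<uint8_t>> pending_;
  size_t buffered_bytes_ = 0;
  bool shutdown_ = false;
  std::vector<std::thread> workers_;
};
```

In this sketch the IO thread pays only for the memcpy in Submit(), which is
in line with the much lower "with misses" write times in the table above,
while the reservation counter keeps the memory held by temporary buffers
under the configured limit until each write finishes.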
