[email protected] has posted comments on this change. ( http://gerrit.cloudera.org:8080/19475 )
Change subject: IMPALA-11886: Data cache should support asynchronous writes
......................................................................


Patch Set 7:

(14 comments)

Thank you for your suggestion! I have updated the code and commit message, and added elapsed-time logging to the test code so that the test results are easier to reproduce.

http://gerrit.cloudera.org:8080/#/c/19475/4//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19475/4//COMMIT_MSG@24
PS4, Line 24: Testing:
> Thank you for the performance numbers. This is interesting enough that we s
Done


http://gerrit.cloudera.org:8080/#/c/19475/7//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19475/7//COMMIT_MSG@9
PS7, Line 9: write
> Nit: writes
Done


http://gerrit.cloudera.org:8080/#/c/19475/7//COMMIT_MSG@10
PS7, Line 10: when cache miss happens
> Nit: "when a cache miss happens"
Done


http://gerrit.cloudera.org:8080/#/c/19475/7//COMMIT_MSG@11
PS7, Line 11: synchronized
> Nit: synchronous
Done


http://gerrit.cloudera.org:8080/#/c/19475/7/be/src/runtime/io/data-cache-test.cc
File be/src/runtime/io/data-cache-test.cc:

http://gerrit.cloudera.org:8080/#/c/19475/7/be/src/runtime/io/data-cache-test.cc@822
PS7, Line 822: EXPECT_LT(count, NUM_CACHE_ENTRIES_NO_EVICT);
> On our test jobs, this test fails on this assert. I think we might
> want to use a lower number for data_cache_async_write_threads for
> this test (e.g. 1 or 2).
Sorry, this was caused by code I missed. To avoid interfering with the DataCache in TraceReplayer, I made the number of asynchronous write threads a constructor parameter of DataCache, but I forgot to update the test code accordingly. This is now fixed and all test cases should pass.

http://gerrit.cloudera.org:8080/#/c/19475/7/be/src/runtime/io/data-cache.cc
File be/src/runtime/io/data-cache.cc:

http://gerrit.cloudera.org:8080/#/c/19475/7/be/src/runtime/io/data-cache.cc@128
PS7, Line 128: const int MAX_STORE_TASK_QUEUE_SIZE = 1 << 20;
> Let's add a comment here saying that this large value for the queue size is
Done


http://gerrit.cloudera.org:8080/#/c/19475/7/be/src/runtime/io/data-cache.cc@385
PS7, Line 385: abstruct
> Nit: abstract
Done


http://gerrit.cloudera.org:8080/#/c/19475/7/be/src/runtime/io/data-cache.cc@393
PS7, Line 393: explicit StoreTask(const std::string& filename, int64_t mtime, int64_t offset,
             : const uint8_t* buffer, int64_t buffer_len, AtomicInt64& total_size) :
             : key_(filename, mtime, offset),
> I would like to make StoreTask a simple struct with minimal logic, and move
Done


http://gerrit.cloudera.org:8080/#/c/19475/7/be/src/runtime/io/data-cache.cc@975
PS7, Line 975: if (buffer_limit < PAGE_SIZE) {
             : return Status(Substitute("Configured data cache write buffer limit $0 is too small",
             : FLAGS_data_cache_async_write_buffer_limit));
             : }
> Let's require the limit to be higher. I think the minimum should be 8MB. If
Done


http://gerrit.cloudera.org:8080/#/c/19475/7/be/src/runtime/io/data-cache.cc@1059
PS7, Line 1059: if (UNLIKELY(current_buffer_size_.Load() + buffer_len > store_buffer_capacity_)) {
              : VLOG(2) << Substitute("Failed to create store task due to buffer size limitation, "
              : "current buffer size: $0 size limitation: $1 require: $2",
              : current_buffer_size_.Load(), store_buffer_capacity_, buffer_len);
              : ImpaladMetrics::IO_MGR_REMOTE_DATA_CACHE_ASYNC_WRITES_DROPPED_BYTES->
              : Increment(buffer_len);
              : ImpaladMetrics::IO_MGR_REMOTE_DATA_CACHE_ASYNC_WRITES_DROPPED_ENTRIES->Increment(1);
              : return false;
              : }
> This would become a CompareAndSwap loop where either we are at the limit an
Done


http://gerrit.cloudera.org:8080/#/c/19475/7/be/src/runtime/io/data-cache.cc@1069
PS7, Line 1069: const StoreTask* task = new StoreTask(filename, mtime, offset, buffer, buffer_len,
              : current_buffer_size_);
> Move the logic from the StoreTask constructor (incrementing counters, alloc
Done


http://gerrit.cloudera.org:8080/#/c/19475/7/be/src/runtime/io/data-cache.cc@1074
PS7, Line 1074:
> We would also add a CompleteStoreTask() function here that would be called
Done


http://gerrit.cloudera.org:8080/#/c/19475/7/be/src/runtime/io/disk-io-mgr.cc
File be/src/runtime/io/disk-io-mgr.cc:

http://gerrit.cloudera.org:8080/#/c/19475/7/be/src/runtime/io/disk-io-mgr.cc@80
PS7, Line 80: Write threads need to bound the extra memory consumption for holding the "
            : "temporary buffer though.
> Nit: Let's change this sentence to say that the extra memory for temporary
Done


http://gerrit.cloudera.org:8080/#/c/19475/7/common/thrift/metrics.json
File common/thrift/metrics.json:

http://gerrit.cloudera.org:8080/#/c/19475/7/common/thrift/metrics.json@653
PS7, Line 653: bytes async
> Nit: "bytes of async"
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/19475
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I878f7486d485b6288de1a9145f49576b7155d312
Gerrit-Change-Number: 19475
Gerrit-PatchSet: 7
Gerrit-Owner: Anonymous Coward <[email protected]>
Gerrit-Reviewer: Anonymous Coward <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Joe McDonnell <[email protected]>
Gerrit-Comment-Date: Tue, 14 Mar 2023 08:14:22 +0000
Gerrit-HasComments: Yes
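
For reference, regarding the CompareAndSwap loop requested at data-cache.cc@1059: the PS7 code loads current_buffer_size_, compares it against the limit, and only increments it later, so two threads could both pass the check and together exceed the limit. Below is a minimal standalone sketch of the reserve-or-reject pattern. It is not the code in the patch: the class StoreBufferBudget and its methods are hypothetical names, and it uses std::atomic rather than Impala's AtomicInt64 wrapper.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical stand-in for the store-buffer accounting discussed in the
// review. Illustrative only; not the DataCache implementation.
class StoreBufferBudget {
 public:
  explicit StoreBufferBudget(int64_t capacity) : capacity_(capacity), used_(0) {}

  // Atomically reserves 'len' bytes. Returns false, leaving the counter
  // unchanged, if the reservation would exceed the capacity.
  bool TryReserve(int64_t len) {
    int64_t current = used_.load();
    while (true) {
      // Reject when the limit would be exceeded; a caller would drop the
      // write and bump its "dropped" metrics at this point.
      if (current + len > capacity_) return false;
      // On failure, compare_exchange_weak stores the latest value into
      // 'current', so the limit check is redone against fresh state.
      if (used_.compare_exchange_weak(current, current + len)) return true;
    }
  }

  // Returns 'len' bytes to the budget once a store task completes.
  void Release(int64_t len) { used_.fetch_sub(len); }

  int64_t used() const { return used_.load(); }

 private:
  const int64_t capacity_;
  std::atomic<int64_t> used_;
};
```

With this shape, the check and the increment happen as one atomic step, so the counter never exceeds the capacity, and the matching Release() call would belong in whatever function completes the store task.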
