Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/16318 )
Change subject: IMPALA-9867: Add Support for Spilling to S3: Milestone 1 ...................................................................... Patch Set 27: (19 comments) Another round of comments. I understand the local buffer file logic now and I think it makes sense, just need to do another pass to look for bugs. http://gerrit.cloudera.org:8080/#/c/16318/27//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/16318/27//COMMIT_MSG@77 PS27, Line 77: The first available local directory is used for the local buffer Can you give an example of how you might set this up in practice to allow both local and remote spilling? Would you configure two local directories with different limits so that one was the local buffer directory and the other was the main scratch directory? http://gerrit.cloudera.org:8080/#/c/16318/27//COMMIT_MSG@91 PS27, Line 91: * Unit Tests added to tmp-file-mgr-test/disk-io-mgr-test. I think it'd be good to add a buffer pool test that exercises the remote spilling code paths. We have some randomized tests like BufferPoolTest::TestRandomInternalMulti() - I think you could add another one with a FileGroup configured to use remote scratch. http://gerrit.cloudera.org:8080/#/c/16318/27/be/src/runtime/io/disk-file.h File be/src/runtime/io/disk-file.h: http://gerrit.cloudera.org:8080/#/c/16318/27/be/src/runtime/io/disk-file.h@137 PS27, Line 137: void SetActualFileSize(int64_t size) { This is only called once, right? Can we check that invariant: DCHECK_EQ(0, actual_file_size_.Load()); http://gerrit.cloudera.org:8080/#/c/16318/27/be/src/runtime/io/disk-file.h@138 PS27, Line 138: DCHECK nit: use DCHECK_LE - it automatically prints the two values if it fails so it's a little nicer for debugging. http://gerrit.cloudera.org:8080/#/c/16318/27/be/src/runtime/io/disk-io-mgr.cc File be/src/runtime/io/disk-io-mgr.cc: http://gerrit.cloudera.org:8080/#/c/16318/27/be/src/runtime/io/disk-io-mgr.cc@252 PS27, Line 252: is_full_ = written_bytes != 0 && written_bytes == disk_file_->actual_file_size(); Add DCHECK_LE(written_bytes, disk_file_->actual_file_size()) Just so we catch it in testing if there's a bug. http://gerrit.cloudera.org:8080/#/c/16318/27/be/src/runtime/io/disk-io-mgr.cc@329 PS27, Line 329: int bytes = (file_size - offset < buffer_size) ? (file_size - offset) : buffer_size; Maybe use min() instead? http://gerrit.cloudera.org:8080/#/c/16318/27/be/src/runtime/io/request-context.cc File be/src/runtime/io/request-context.cc: http://gerrit.cloudera.org:8080/#/c/16318/27/be/src/runtime/io/request-context.cc@243 PS27, Line 243: (static_cast<WriteRange*>(range))->callback()(status); nit: I think the parentheses around static_cast here and on l246 are not needed. http://gerrit.cloudera.org:8080/#/c/16318/27/be/src/runtime/tmp-file-mgr-internal.h File be/src/runtime/tmp-file-mgr-internal.h: http://gerrit.cloudera.org:8080/#/c/16318/27/be/src/runtime/tmp-file-mgr-internal.h@326 PS27, Line 326: /// The number of the tmp files in the pool. For debug and test. We don't seem to use this variable any more http://gerrit.cloudera.org:8080/#/c/16318/27/be/src/runtime/tmp-file-mgr.h File be/src/runtime/tmp-file-mgr.h: http://gerrit.cloudera.org:8080/#/c/16318/27/be/src/runtime/tmp-file-mgr.h@194 PS27, Line 194: std::unique_ptr<TmpFileBufferPool> tmp_file_pool_; When is this null or non-null? http://gerrit.cloudera.org:8080/#/c/16318/24/be/src/runtime/tmp-file-mgr.cc File be/src/runtime/tmp-file-mgr.cc: http://gerrit.cloudera.org:8080/#/c/16318/24/be/src/runtime/tmp-file-mgr.cc@427 PS24, Line 427: *need_local_buffer_dir = false; > Currently local buffer directory won't be inserted into tmp_dirs_, the dire I left a comment on the commit message. It seems like it would be helpful to have an example of how to configure this in a reasonable way. http://gerrit.cloudera.org:8080/#/c/16318/27/be/src/runtime/tmp-file-mgr.cc File be/src/runtime/tmp-file-mgr.cc: http://gerrit.cloudera.org:8080/#/c/16318/27/be/src/runtime/tmp-file-mgr.cc@512 PS27, Line 512: if (tmp_dirs_remote_ctrl_.tmp_file_pool_ == nullptr) { What [ http://gerrit.cloudera.org:8080/#/c/16318/27/be/src/runtime/tmp-file-mgr.cc@917 PS27, Line 917: if (tmp_file_mgr_->tmp_dirs_remote_ctrl_.tmp_file_pool_ != nullptr) { Which case is this? Can we add a boolean helper method to TmpDirsRemoteCtrl() that makes it self-describing? It's sometimes a little hard to follow code when null/non-null implies certain modes. http://gerrit.cloudera.org:8080/#/c/16318/27/be/src/runtime/tmp-file-mgr.cc@967 PS27, Line 967: if (!tmp_files_remote_.empty()) { It would be helpful to add a couple of comments to explain the code flow. E.g. // First, check to see if we can allocate more space in the previous file. then below // No space in previous file, we need to set up a new remote file. http://gerrit.cloudera.org:8080/#/c/16318/27/be/src/runtime/tmp-file-mgr.cc@1019 PS27, Line 1019: return Status("Page size can't be larger than the temporary file size."); I think this error isn't accurate any more? http://gerrit.cloudera.org:8080/#/c/16318/27/be/src/runtime/tmp-file-mgr.cc@1073 PS27, Line 1073: if ((*tmp_file)->is_blacklisted()) { nit: there seems to be some unnecessary formatting changes here that it would be better to avoid. http://gerrit.cloudera.org:8080/#/c/16318/27/be/src/runtime/tmp-file-mgr.cc@1112 PS27, Line 1112: if (!alloc_full || !status.ok()) return status; nit: maybe reverse the condition so status.ok() comes first. It's basically equivalent but I prefer this pattern because in other places there are functions where the values of output arguments are undefined if they return a non-error status. http://gerrit.cloudera.org:8080/#/c/16318/27/be/src/runtime/tmp-file-mgr.cc@1705 PS27, Line 1705: if (cur_write_range_ == nullptr) { : DCHECK(shut_down_); : return; : } Couldn't we just return on line 1699 in this case? http://gerrit.cloudera.org:8080/#/c/16318/27/be/src/runtime/tmp-file-mgr.cc@1753 PS27, Line 1753: auto key_range_it = find(write_ranges_.begin(), write_ranges_.end(), range); I think this ends up being O(n^2) in the number of write ranges, which isn't ideal. Generally I like to avoid premature optimization, but it would be good to avoid the quadratic behaviour. Can we store an iterator from write_ranges_ in write_ranges_to_add_ instead? std::list iterators are stable and remain valid as long as the item is present in the list. http://gerrit.cloudera.org:8080/#/c/16318/27/be/src/runtime/tmp-file-mgr.cc@1862 PS27, Line 1862: DCHECK nit: DCHECK_EQ -- To view, visit http://gerrit.cloudera.org:8080/16318 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I419b1d5dbbfe35334d9f964c4b65e553579fdc89 Gerrit-Change-Number: 16318 Gerrit-PatchSet: 27 Gerrit-Owner: Yida Wu <[email protected]> Gerrit-Reviewer: Abhishek Rawat <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Sahil Takiar <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-Reviewer: Yida Wu <[email protected]> Gerrit-Comment-Date: Fri, 15 Jan 2021 08:03:49 +0000 Gerrit-HasComments: Yes
