Abhishek Rawat has posted comments on this change. ( http://gerrit.cloudera.org:8080/16318 )
Change subject: WIP: IMPALA-9867: Add Support for Spilling to S3: Milestone 1 ...................................................................... Patch Set 10: (17 comments) http://gerrit.cloudera.org:8080/#/c/16318/10//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/16318/10//COMMIT_MSG@43 PS10, Line 43: A local buffer file can be evicted if it is in status REMOTE or it typo REMOTE -> UPLOADED Also, maybe better to say "its all pages have been pinned" http://gerrit.cloudera.org:8080/#/c/16318/10//COMMIT_MSG@51 PS10, Line 51: Remote scratch space uses the highest priority local scratch dir : as its buffer. We could look into enabling the multiple directories per device startup option by default (allow_multiple_scratch_dirs_per_device) so users can configure multiple directories in the local file system. This option is disabled by default, but it might make sense to enable it since we want to reserve a local directory as a buffer for remote scratch dirs. Also, in case no local scratch directory has been configured, then that should be a configuration error. If, only one local scratch directory is configured then we should use that as a buffer for remote. http://gerrit.cloudera.org:8080/#/c/16318/10/be/src/runtime/hdfs-fs-cache.h File be/src/runtime/hdfs-fs-cache.h: http://gerrit.cloudera.org:8080/#/c/16318/10/be/src/runtime/hdfs-fs-cache.h@57 PS10, Line 57: Maybe use a typedef for the vector<std::pair<string, string>> to improve the readability. http://gerrit.cloudera.org:8080/#/c/16318/10/be/src/runtime/hdfs-fs-cache.cc File be/src/runtime/hdfs-fs-cache.cc: http://gerrit.cloudera.org:8080/#/c/16318/10/be/src/runtime/hdfs-fs-cache.cc@103 PS10, Line 103: if (options != nullptr && !options->empty()) { Should we also use the local cache for these connections? Also, do you also need to create a new instance of the filesystem object like we do in L99 above? Seems like you would have to? http://gerrit.cloudera.org:8080/#/c/16318/7/be/src/runtime/io/disk-io-mgr.cc File be/src/runtime/io/disk-io-mgr.cc: http://gerrit.cloudera.org:8080/#/c/16318/7/be/src/runtime/io/disk-io-mgr.cc@110 PS7, Line 110: DEFINE_int32(num_remote_hdfs_file_oper_io_threads, 2, I think as a starting point, we should probably use the same number of threads as num_remote_hdfs_io_threads http://gerrit.cloudera.org:8080/#/c/16318/7/be/src/runtime/io/disk-io-mgr.cc@122 PS7, Line 122: DEFINE_int32(num_s3_file_oper_io_threads, 2, "Number of S3 file operations I/O threads"); Probably good to use the same default number of threads as num_s3_io_threads. http://gerrit.cloudera.org:8080/#/c/16318/10/be/src/runtime/io/disk-io-mgr.cc File be/src/runtime/io/disk-io-mgr.cc: http://gerrit.cloudera.org:8080/#/c/16318/10/be/src/runtime/io/disk-io-mgr.cc@110 PS10, Line 110: DEFINE_int32(num_remote_hdfs_file_oper_io_threads, 2, Add a comment for the new startup flag. http://gerrit.cloudera.org:8080/#/c/16318/10/be/src/runtime/io/disk-io-mgr.cc@240 PS10, Line 240: ScopedHistogramTimer Should the write timer also involve the lock acquisition delays? http://gerrit.cloudera.org:8080/#/c/16318/10/be/src/runtime/io/disk-io-mgr.cc@285 PS10, Line 285: offset_ = file_offset; : disk_id_ = disk_id; : tmp_file_ = tmp_file; : io_mgr_ = io_mgr; We could move all these to the initializer's list in the constructor. http://gerrit.cloudera.org:8080/#/c/16318/10/be/src/runtime/io/disk-io-mgr.cc@294 PS10, Line 294: disk_id_ = disk_id; : file_path_ = file_path; : io_mgr_ = io_mgr; Same here, better to move to initializer's list. http://gerrit.cloudera.org:8080/#/c/16318/10/be/src/runtime/io/request-context.h File be/src/runtime/io/request-context.h: http://gerrit.cloudera.org:8080/#/c/16318/10/be/src/runtime/io/request-context.h@296 PS10, Line 296: void RemoteOperDone(RemoteOperRange* oper_range, const Status& write_status); Would be good to add comments here. http://gerrit.cloudera.org:8080/#/c/16318/10/be/src/runtime/io/request-ranges.h File be/src/runtime/io/request-ranges.h: http://gerrit.cloudera.org:8080/#/c/16318/10/be/src/runtime/io/request-ranges.h@372 PS10, Line 372: bool Adding comments here for the new functions would be good. http://gerrit.cloudera.org:8080/#/c/16318/10/be/src/runtime/tmp-file-mgr-internal.h File be/src/runtime/tmp-file-mgr-internal.h: http://gerrit.cloudera.org:8080/#/c/16318/10/be/src/runtime/tmp-file-mgr-internal.h@38 PS10, Line 38: DUMPED Not sure what's the best name for the new states. I do find myself thinking about this all the time I come across this code. Ideally we want something which is self explanatory: This is just a suggestion, but something like following maybe? INWRITING -> WRITING_LOCAL / SPILLING_LOCAL DUMPED -> WRITTEN_LOCAL / SPILLED_LOCAL INFETCHING -> READING_REMOTE / LOADING_REMOTE UPLOADED -> WRITTEN_REMOTE / SPILLED_REMOTE DUMPED_UPLOADED -> WRITTEN_LOCAL_REMOTE / SPILLED_LOCAL_REMOTE DELETED -> DELETED http://gerrit.cloudera.org:8080/#/c/16318/10/be/src/runtime/tmp-file-mgr-internal.h@198 PS10, Line 198: void SetDumped(bool dumped = true); Do we need multiple setters? Since they are updating the same file_status_ variable we could simply pass type TmpFileStatus as a param and set file_status_. We could do a special case for Delete in the common setter. http://gerrit.cloudera.org:8080/#/c/16318/10/be/src/runtime/tmp-file-mgr.cc File be/src/runtime/tmp-file-mgr.cc: http://gerrit.cloudera.org:8080/#/c/16318/10/be/src/runtime/tmp-file-mgr.cc@134 PS10, Line 134: const If this is a test only const then we should move it to the test file. http://gerrit.cloudera.org:8080/#/c/16318/10/be/src/runtime/tmp-file-mgr.cc@152 PS10, Line 152: Don't think we're using this variable? http://gerrit.cloudera.org:8080/#/c/16318/10/be/src/runtime/tmp-file-mgr.cc@391 PS10, Line 391: s3a_options_ Should we set s3a_options_ only for S3 and not for hdfs? -- To view, visit http://gerrit.cloudera.org:8080/16318 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I419b1d5dbbfe35334d9f964c4b65e553579fdc89 Gerrit-Change-Number: 16318 Gerrit-PatchSet: 10 Gerrit-Owner: Yida Wu <[email protected]> Gerrit-Reviewer: Abhishek Rawat <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Comment-Date: Fri, 11 Sep 2020 19:52:11 +0000 Gerrit-HasComments: Yes
