Yida Wu has posted comments on this change. ( http://gerrit.cloudera.org:8080/16318 )
Change subject: IMPALA-9867: Add Support for Spilling to S3: Milestone 1 ...................................................................... Patch Set 13: (5 comments) http://gerrit.cloudera.org:8080/#/c/16318/13/be/src/runtime/hdfs-fs-cache.cc File be/src/runtime/hdfs-fs-cache.cc: http://gerrit.cloudera.org:8080/#/c/16318/13/be/src/runtime/hdfs-fs-cache.cc@103 PS13, Line 103: if (options != nullptr && !options->empty()) { : for (auto option : *options) { : hdfsBuilderConfSetStr( : hdfs_builder, option.first.c_str(), option.second.c_str()); : } : } > does this actually work? were you able to confirm that non-default configs Modified. http://gerrit.cloudera.org:8080/#/c/16318/13/be/src/runtime/io/disk-io-mgr.h File be/src/runtime/io/disk-io-mgr.h: http://gerrit.cloudera.org:8080/#/c/16318/13/be/src/runtime/io/disk-io-mgr.h@370 PS13, Line 370: REMOTE_DFS_DISK_FILE_OPER_OFFSET, : REMOTE_S3_DISK_FILE_OPER_OFFSET, > you cover this a bit in the commit message, but is the reason they were add The purpose of the "DISK_FILE_OPER" queues is to isolate the file operations from normal read/write IO operations. It may increase the efficiency of file operations and provide a separate control of the threads working on file operation jobs. Have added some comments in the code and the design document "Queue" chapter. http://gerrit.cloudera.org:8080/#/c/16318/13/be/src/runtime/io/disk-io-mgr.cc File be/src/runtime/io/disk-io-mgr.cc: http://gerrit.cloudera.org:8080/#/c/16318/13/be/src/runtime/io/disk-io-mgr.cc@30 PS13, Line 30: #include "runtime/tmp-file-mgr-internal.h" > Why are we including the tmp-file-mgr here? I see a lot of references to th Thank you for the comment. Have decoupled the use of TmpFile from disk-io-mgr. Created a DiskFile in the io layer. The tmp-file-mgr would pass the DiskFiles to the disk-io-mgr for IO operations. http://gerrit.cloudera.org:8080/#/c/16318/13/be/src/runtime/io/disk-io-mgr.cc@241 PS13, Line 241: goto end; > I think we generally try to avoid 'goto' statements because they make the c Done http://gerrit.cloudera.org:8080/#/c/16318/13/be/src/runtime/io/request-ranges.h File be/src/runtime/io/request-ranges.h: http://gerrit.cloudera.org:8080/#/c/16318/13/be/src/runtime/io/request-ranges.h@113 PS13, Line 113: READ, : WRITE, : FETCH, : UPLOAD, > not sure I understand the difference between all of these. fetch seems like FETCH and UPLOAD are for the operations of the entire file, while READ and WRITE are related to a certain range of a file (page). Have changed the name to FILE_FETCH and FILE_UPLOAD, and added some comments in the code. -- To view, visit http://gerrit.cloudera.org:8080/16318 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I419b1d5dbbfe35334d9f964c4b65e553579fdc89 Gerrit-Change-Number: 16318 Gerrit-PatchSet: 13 Gerrit-Owner: Yida Wu <[email protected]> Gerrit-Reviewer: Abhishek Rawat <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Sahil Takiar <[email protected]> Gerrit-Reviewer: Yida Wu <[email protected]> Gerrit-Comment-Date: Thu, 08 Oct 2020 16:51:01 +0000 Gerrit-HasComments: Yes
