Abhishek Rawat has posted comments on this change. ( http://gerrit.cloudera.org:8080/16318 )
Change subject: IMPALA-9867: Add Support for Spilling to S3: Milestone 1 ...................................................................... Patch Set 17: (11 comments) http://gerrit.cloudera.org:8080/#/c/16318/17/be/src/runtime/io/disk-io-mgr.cc File be/src/runtime/io/disk-io-mgr.cc: http://gerrit.cloudera.org:8080/#/c/16318/17/be/src/runtime/io/disk-io-mgr.cc@280 PS17, Line 280: i block_size_ and disk_id_ could also be moved to constructor's initializer's list. http://gerrit.cloudera.org:8080/#/c/16318/17/be/src/runtime/io/disk-io-mgr.cc@314 PS17, Line 314: // If it is not persisted, the only case currently should be I am not sure I understand this comment. I think the local file's status is PERSISTED when it's full? So if we find that a local file, which we are trying to upload , is not full then that likely means some of the pages were pinned? Why does it have to be "all pages pinned", is the part I don't understand. http://gerrit.cloudera.org:8080/#/c/16318/17/be/src/runtime/io/disk-io-mgr.cc@341 PS17, Line 341: // The file should be deleted by the thread who who -> which http://gerrit.cloudera.org:8080/#/c/16318/17/be/src/runtime/io/disk-io-mgr.cc@362 PS17, Line 362: offset += buffer_size; Is file_size always a multiple of buffer_size? If yes then we will exit the while loop at L337 once we've read the entire file buffer_size chunk at a time. Please add a DCHECK: `file_size % buffer_size == 0` However, if file_size is not a multiple of buffer_size then we will try to read beyond the file_size and I believe Fread will return an error in that case. http://gerrit.cloudera.org:8080/#/c/16318/17/be/src/runtime/io/disk-io-mgr.cc@421 PS17, Line 421: return Status("File has been deleted."); Maybe add more context in the error message such as the "remote_file_path". http://gerrit.cloudera.org:8080/#/c/16318/17/be/src/runtime/io/disk-io-mgr.cc@441 PS17, Line 441: status = Status("File is to be deleted."); Maybe better to add more context, such as the file name. http://gerrit.cloudera.org:8080/#/c/16318/17/be/src/runtime/io/disk-io-mgr.cc@455 PS17, Line 455: offset += buffer_size; Same comment as for DoUpload(). Is file_size always a multiple of buffer_size? Please add a DCHECK for the same at the beginning of this function. http://gerrit.cloudera.org:8080/#/c/16318/17/be/src/runtime/io/disk-io-mgr.cc@476 PS17, Line 476: Status rm_status = FileSystemUtil::RemovePaths({local_file_path}); Nit: The RemovePaths() could probably be called before acquiring the status lock? We already have the file lock in L413 above and the status lock is probably only needed for updating status. Also, would be good to log a warning and dump the status message, if rm_status is non zero. http://gerrit.cloudera.org:8080/#/c/16318/17/be/src/runtime/io/disk-io-mgr.cc@837 PS17, Line 837: } break; I believe the standard is to put break inside the scope of the case statement. Same comments for the rest of the cases. ``` switch (var) { case 0: { // 2 space indent ... // 4 space indent break; } case 1: { ... break; } default: { assert(false); } } ``` http://gerrit.cloudera.org:8080/#/c/16318/17/be/src/runtime/io/disk-io-mgr.cc@858 PS17, Line 858: DCHECK(false); Would be good to also log the request type ```DCHECK(false) << range->request_type()``` http://gerrit.cloudera.org:8080/#/c/16318/17/be/src/runtime/io/request-ranges.h File be/src/runtime/io/request-ranges.h: http://gerrit.cloudera.org:8080/#/c/16318/17/be/src/runtime/io/request-ranges.h@671 PS17, Line 671: bool is_ready() const { return is_ready_; } Not sure this function and the member 'is_ready' belongs in the WriteRange class. I think this should be property of the related DiskFile. -- To view, visit http://gerrit.cloudera.org:8080/16318 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I419b1d5dbbfe35334d9f964c4b65e553579fdc89 Gerrit-Change-Number: 16318 Gerrit-PatchSet: 17 Gerrit-Owner: Yida Wu <[email protected]> Gerrit-Reviewer: Abhishek Rawat <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Sahil Takiar <[email protected]> Gerrit-Reviewer: Yida Wu <[email protected]> Gerrit-Comment-Date: Wed, 21 Oct 2020 19:53:20 +0000 Gerrit-HasComments: Yes
