Yida Wu has posted comments on this change. ( http://gerrit.cloudera.org:8080/16318 )
Change subject: IMPALA-9867: Add Support for Spilling to S3: Milestone 1 ...................................................................... Patch Set 17: (37 comments) http://gerrit.cloudera.org:8080/#/c/16318/13//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/16318/13//COMMIT_MSG@18 PS13, Line 18: * TmpFile is separated into two types of implementation, TmpFileLocal > nit: A new Done http://gerrit.cloudera.org:8080/#/c/16318/13//COMMIT_MSG@32 PS13, Line 32: DiskFile status of the buffer file would become INWRITING, and then > nit: at the same time Done http://gerrit.cloudera.org:8080/#/c/16318/13//COMMIT_MSG@34 PS13, Line 34: > nit: space after comma Done http://gerrit.cloudera.org:8080/#/c/16318/13//COMMIT_MSG@43 PS13, Line 43: : to specify the imple > Maybe better to say: Done http://gerrit.cloudera.org:8080/#/c/16318/13//COMMIT_MSG@54 PS13, Line 54: ease the efficiency of the file op > Better to say Done http://gerrit.cloudera.org:8080/#/c/16318/13//COMMIT_MSG@65 PS13, Line 65: of > "Unit" Done http://gerrit.cloudera.org:8080/#/c/16318/17/be/src/runtime/hdfs-fs-cache.h File be/src/runtime/hdfs-fs-cache.h: http://gerrit.cloudera.org:8080/#/c/16318/17/be/src/runtime/hdfs-fs-cache.h@59 PS17, Line 59: NULL > nit: use nullptr instead of NULL Done http://gerrit.cloudera.org:8080/#/c/16318/17/be/src/runtime/io/disk-file.h File be/src/runtime/io/disk-file.h: http://gerrit.cloudera.org:8080/#/c/16318/17/be/src/runtime/io/disk-file.h@69 PS17, Line 69: /// or rewriting after deleted. > "rewriting after deleted" is not very clear. Yes, have refined the comment. http://gerrit.cloudera.org:8080/#/c/16318/17/be/src/runtime/io/disk-file.h@123 PS17, Line 123: volatile > why volatile? I've generally found the semantics of volatile in C++ to be q Done http://gerrit.cloudera.org:8080/#/c/16318/17/be/src/runtime/io/disk-file.h@133 PS17, Line 133: // Every time to query or modify the status, or need to gurrantee working under > typo: guarantee Done http://gerrit.cloudera.org:8080/#/c/16318/17/be/src/runtime/io/disk-file.cc File be/src/runtime/io/disk-file.cc: http://gerrit.cloudera.org:8080/#/c/16318/17/be/src/runtime/io/disk-file.cc@24 PS17, Line 24: void DiskFile::SetPersisted() { > I think these set function definitions could also be moved inside class bod Done http://gerrit.cloudera.org:8080/#/c/16318/17/be/src/runtime/io/disk-io-mgr.cc File be/src/runtime/io/disk-io-mgr.cc: http://gerrit.cloudera.org:8080/#/c/16318/17/be/src/runtime/io/disk-io-mgr.cc@109 PS17, Line 109: // The maximum number of HDFS I/O threads for doing file operations, such a file > typo "such a file" -> "such as file" Done http://gerrit.cloudera.org:8080/#/c/16318/17/be/src/runtime/io/disk-io-mgr.cc@110 PS17, Line 110: // uploading. The default value of 8 was chosen emperically to maximize HDFS throughput. > typo: "emperically" -> "empirically" Done http://gerrit.cloudera.org:8080/#/c/16318/17/be/src/runtime/io/disk-io-mgr.cc@120 PS17, Line 120: // The maximum number of S3 I/O threads for doing file operations, such a file > typo: "such a file" -> "such as file" Done http://gerrit.cloudera.org:8080/#/c/16318/17/be/src/runtime/io/disk-io-mgr.cc@121 PS17, Line 121: // uploading. The default value of 16 was chosen emperically to maximize S3 throughput. > typo: "emperically" -> "empirically" Done http://gerrit.cloudera.org:8080/#/c/16318/17/be/src/runtime/io/disk-io-mgr.cc@233 PS17, Line 233: // DoWrite is using for Spilling. For spilling to local, we will > typo: using -> used Done http://gerrit.cloudera.org:8080/#/c/16318/17/be/src/runtime/io/disk-io-mgr.cc@246 PS17, Line 246: ret_status = file_writer->Write(this, &is_ready); > I think "is_full" might make more sense here rather than "is_ready". Done http://gerrit.cloudera.org:8080/#/c/16318/17/be/src/runtime/io/disk-io-mgr.cc@253 PS17, Line 253: // The file handle is closed, set the file to persisted status. : lock_guard<SpinLock> l(disk_file_->status_lock_); : disk_file_->SetPersisted(); > bit of an anti-pattern to acquire a external classes internal locks. its pr Added a mode for SetPersisted to acquire the lock inside the function. http://gerrit.cloudera.org:8080/#/c/16318/17/be/src/runtime/io/disk-io-mgr.cc@280 PS17, Line 280: i > block_size_ and disk_id_ could also be moved to constructor's initializer's Done http://gerrit.cloudera.org:8080/#/c/16318/17/be/src/runtime/io/disk-io-mgr.cc@314 PS17, Line 314: // If it is not persisted, the only case currently should be > I am not sure I understand this comment. I think the local file's status is The file must be full and is/was persisted, however there could be a time delay for doing the upload, because the upload job is in the disk queue, so during the interval, it is possible all of the pages have been pinned and the local buffer file has been evicted. So, when the upload job starts to work, it is possible the local buffer file has gone. There could be another case that the query is cancelled or finished, so that the file is deleted. http://gerrit.cloudera.org:8080/#/c/16318/17/be/src/runtime/io/disk-io-mgr.cc@341 PS17, Line 341: // The file should be deleted by the thread who > who -> which Done http://gerrit.cloudera.org:8080/#/c/16318/17/be/src/runtime/io/disk-io-mgr.cc@362 PS17, Line 362: offset += buffer_size; > Is file_size always a multiple of buffer_size? If yes then we will exit the Thanks for pointing out this, it could be a bug here when the file size is not a multiple of the buffer size. I think we can restrict the default file size and the buffer size are the power of 2. For another part, I added a padding function in the local file writer to guarantee the actual local buffer file size would be the default file size, and since it would be rare to do the padding and the padding size is less than a page, so it won't be a big issue for the performance. http://gerrit.cloudera.org:8080/#/c/16318/17/be/src/runtime/io/disk-io-mgr.cc@421 PS17, Line 421: return Status("File has been deleted."); > Maybe add more context in the error message such as the "remote_file_path". Done http://gerrit.cloudera.org:8080/#/c/16318/17/be/src/runtime/io/disk-io-mgr.cc@441 PS17, Line 441: status = Status("File is to be deleted."); > Maybe better to add more context, such as the file name. Done http://gerrit.cloudera.org:8080/#/c/16318/17/be/src/runtime/io/disk-io-mgr.cc@455 PS17, Line 455: offset += buffer_size; > Same comment as for DoUpload(). Is file_size always a multiple of buffer_si Added. http://gerrit.cloudera.org:8080/#/c/16318/17/be/src/runtime/io/disk-io-mgr.cc@476 PS17, Line 476: Status rm_status = FileSystemUtil::RemovePaths({local_file_path}); > Nit: The RemovePaths() could probably be called before acquiring the status Done http://gerrit.cloudera.org:8080/#/c/16318/17/be/src/runtime/io/disk-io-mgr.cc@837 PS17, Line 837: } break; > I believe the standard is to put break inside the scope of the case stateme Done http://gerrit.cloudera.org:8080/#/c/16318/17/be/src/runtime/io/disk-io-mgr.cc@858 PS17, Line 858: DCHECK(false); > Would be good to also log the request type Done http://gerrit.cloudera.org:8080/#/c/16318/17/be/src/runtime/io/file-writer.h File be/src/runtime/io/file-writer.h: http://gerrit.cloudera.org:8080/#/c/16318/17/be/src/runtime/io/file-writer.h@52 PS17, Line 52: virtual Status WriteOne(WriteRange*) = 0; > not sure I understand the difference between WriteOne and Write The function of WriteOne is used for random range writing(mainly for spilling to local, since we might reuse the write range), it opens its file handle and close it for each range writing. The Write function does the range writing without opening a file handle or closing (using a shared file handle), which could be good for a sequential writing(mainly for spilling to remote, ranges are to be used once). For the later case, the file handle is opened or closed only once per file and shared by WriteRanges of the same file. Added some comments. http://gerrit.cloudera.org:8080/#/c/16318/17/be/src/runtime/io/file-writer.h@68 PS17, Line 68: std::mutex lock_; > would be nice to explain the thread-safety semantics of this class Added comment: The lock_ is to guarantee only one thread is using the file handle. Ideally, Open() and Close() would be called once for each file writer, so the lock_ is a exclusive lock mainly for running the Write(). WriteOne() doesn't need the lock_ because it opens its own file handle, and doesn't share it with other WriteRanges. http://gerrit.cloudera.org:8080/#/c/16318/17/be/src/runtime/io/file-writer.h@67 PS17, Line 67: DiskIoMgr* io_mgr_ = nullptr; : std::mutex lock_; : int64_t written_bytes_ = 0; : const char* file_path_; : const int64_t file_size_; : const int64_t max_page_size_; > nit: would be nice to have some docs on each of these Done http://gerrit.cloudera.org:8080/#/c/16318/17/be/src/runtime/io/hdfs-file-reader.cc File be/src/runtime/io/hdfs-file-reader.cc: http://gerrit.cloudera.org:8080/#/c/16318/17/be/src/runtime/io/hdfs-file-reader.cc@98 PS17, Line 98: Status status = Status::OK(); > nit: is this change necessary? Done http://gerrit.cloudera.org:8080/#/c/16318/17/be/src/runtime/io/request-ranges.h File be/src/runtime/io/request-ranges.h: http://gerrit.cloudera.org:8080/#/c/16318/17/be/src/runtime/io/request-ranges.h@671 PS17, Line 671: bool is_ready() const { return is_ready_; } > Not sure this function and the member 'is_ready' belongs in the WriteRange Done http://gerrit.cloudera.org:8080/#/c/16318/17/be/src/runtime/tmp-file-mgr-internal.h File be/src/runtime/tmp-file-mgr-internal.h: http://gerrit.cloudera.org:8080/#/c/16318/17/be/src/runtime/tmp-file-mgr-internal.h@46 PS17, Line 46: /// Runtime Error during creation of the TmpFile. : class ResourceError : public std::runtime_error { : public: : explicit ResourceError(const Status& status) : : runtime_error(status.msg().msg()), status_(status) {} : virtual ~ResourceError() {} : Status& GetStatus() { return status_; } : : private: : Status status_; : }; > see other comment about C++ errors, probably should be removed. should be a Will delete. http://gerrit.cloudera.org:8080/#/c/16318/17/be/src/runtime/tmp-file-mgr.cc File be/src/runtime/tmp-file-mgr.cc: http://gerrit.cloudera.org:8080/#/c/16318/17/be/src/runtime/tmp-file-mgr.cc@236 PS17, Line 236: s3a_options_ = {make_pair("fs.s3a.fast.upload", "true"), : make_pair("fs.s3a.fast.upload.buffer", "disk")}; > is the implementation of this feature dependent on setting these values? I Typically, the s3 options are set in the xml, but I am afraid that if the options of s3 connections used by "spilling to s3" can be changed by the user through the xml, it might be changed to some other option that we don't want for "spilling", for example, from "disk" to "bytebuffer". So for safety, I set the options in the code to guarantee the connection is something we want for spilling. http://gerrit.cloudera.org:8080/#/c/16318/17/be/src/runtime/tmp-file-mgr.cc@728 PS17, Line 728: throw TmpFile::ResourceError( : Status(GetHdfsErrorMsg("Failed to connect to FS: ", hdfs_url))); > we try to avoid C++ errors. Impala uses 'Status' objects instead. Done http://gerrit.cloudera.org:8080/#/c/16318/17/be/src/util/hdfs-util.h File be/src/util/hdfs-util.h: http://gerrit.cloudera.org:8080/#/c/16318/17/be/src/util/hdfs-util.h@31 PS17, Line 31: ABFS_1 > nit: ABFSS or ABFS_SECURE would probably be a better name Done -- To view, visit http://gerrit.cloudera.org:8080/16318 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I419b1d5dbbfe35334d9f964c4b65e553579fdc89 Gerrit-Change-Number: 16318 Gerrit-PatchSet: 17 Gerrit-Owner: Yida Wu <[email protected]> Gerrit-Reviewer: Abhishek Rawat <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Sahil Takiar <[email protected]> Gerrit-Reviewer: Yida Wu <[email protected]> Gerrit-Comment-Date: Mon, 26 Oct 2020 18:29:35 +0000 Gerrit-HasComments: Yes
