[Impala-ASF-CR] WIP: IMPALA-9867: Add Support for Spilling to S3: Milestone 1
Yida Wu has posted comments on this change. ( http://gerrit.cloudera.org:8080/16318 )

Change subject: WIP: IMPALA-9867: Add Support for Spilling to S3: Milestone 1

Patch Set 10:

(10 comments)

http://gerrit.cloudera.org:8080/#/c/16318/10//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/16318/10//COMMIT_MSG@43
PS10, Line 43: A local buffer file can be evicted if it is in status REMOTE or it
> typo REMOTE -> UPLOADED
Done

http://gerrit.cloudera.org:8080/#/c/16318/10/be/src/runtime/hdfs-fs-cache.h
File be/src/runtime/hdfs-fs-cache.h:

http://gerrit.cloudera.org:8080/#/c/16318/10/be/src/runtime/hdfs-fs-cache.h@57
PS10, Line 57:
> Maybe use a typedef for the vector> to improve th
Done

http://gerrit.cloudera.org:8080/#/c/16318/10/be/src/runtime/io/disk-io-mgr.cc
File be/src/runtime/io/disk-io-mgr.cc:

http://gerrit.cloudera.org:8080/#/c/16318/10/be/src/runtime/io/disk-io-mgr.cc@110
PS10, Line 110: DEFINE_int32(num_remote_hdfs_file_oper_io_threads, 2,
> Add a comment for the new startup flag.
Done. Also modified the default value for the file operation I/O threads.

http://gerrit.cloudera.org:8080/#/c/16318/10/be/src/runtime/io/disk-io-mgr.cc@240
PS10, Line 240: ScopedHistogramTimer
> Should the write timer also involve the lock acquisition delays?
Done. Yes, it is better to include the lock delays.

http://gerrit.cloudera.org:8080/#/c/16318/10/be/src/runtime/io/disk-io-mgr.cc@285
PS10, Line 285: offset_ = file_offset;
              : disk_id_ = disk_id;
              : tmp_file_ = tmp_file;
              : io_mgr_ = io_mgr;
> We could move all these to the initializer's list in the constructor.
Done

http://gerrit.cloudera.org:8080/#/c/16318/10/be/src/runtime/io/disk-io-mgr.cc@294
PS10, Line 294: disk_id_ = disk_id;
              : file_path_ = file_path;
              : io_mgr_ = io_mgr;
> Same here, better to move to initializer's list.
Done

http://gerrit.cloudera.org:8080/#/c/16318/10/be/src/runtime/io/request-context.h
File be/src/runtime/io/request-context.h:

http://gerrit.cloudera.org:8080/#/c/16318/10/be/src/runtime/io/request-context.h@296
PS10, Line 296: void RemoteOperDone(RemoteOperRange* oper_range, const Status& write_status);
> Would be good to add comments here.
Done

http://gerrit.cloudera.org:8080/#/c/16318/10/be/src/runtime/io/request-ranges.h
File be/src/runtime/io/request-ranges.h:

http://gerrit.cloudera.org:8080/#/c/16318/10/be/src/runtime/io/request-ranges.h@372
PS10, Line 372: bool
> Adding comments here for the new functions would be good.
Done

http://gerrit.cloudera.org:8080/#/c/16318/10/be/src/runtime/tmp-file-mgr.cc
File be/src/runtime/tmp-file-mgr.cc:

http://gerrit.cloudera.org:8080/#/c/16318/10/be/src/runtime/tmp-file-mgr.cc@152
PS10, Line 152:
> Don't think we're using this variable?
Done

http://gerrit.cloudera.org:8080/#/c/16318/10/be/src/runtime/tmp-file-mgr.cc@391
PS10, Line 391: s3a_options_
> Should we set s3a_options_ only for S3 and not for hdfs?
Done. Moved to the S3 path logic.

--
To view, visit http://gerrit.cloudera.org:8080/16318
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I419b1d5dbbfe35334d9f964c4b65e553579fdc89
Gerrit-Change-Number: 16318
Gerrit-PatchSet: 10
Gerrit-Owner: Yida Wu
Gerrit-Reviewer: Abhishek Rawat
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Yida Wu
Gerrit-Comment-Date: Wed, 16 Sep 2020 14:33:32 +
Gerrit-HasComments: Yes
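
For readers unfamiliar with the "initializer's list" comments on disk-io-mgr.cc lines 285 and 294, the following is a minimal sketch of the pattern being requested. The member names come from the quoted diff; the class names, constructor signatures, and the empty DiskIoMgr/TmpFile stand-ins are assumptions made only so the example compiles on its own, and they are not the code in the patch.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>

// Empty stand-ins for the real Impala types (assumption: only pointers are stored).
struct DiskIoMgr {};
struct TmpFile {};

// Hypothetical range type. Members are set in the member initializer list
// instead of being assigned in the constructor body.
class WriteRangeSketch {
 public:
  WriteRangeSketch(int64_t file_offset, int disk_id, TmpFile* tmp_file, DiskIoMgr* io_mgr)
    : offset_(file_offset), disk_id_(disk_id), tmp_file_(tmp_file), io_mgr_(io_mgr) {}

  int64_t offset() const { return offset_; }
  int disk_id() const { return disk_id_; }
  TmpFile* tmp_file() const { return tmp_file_; }
  DiskIoMgr* io_mgr() const { return io_mgr_; }

 private:
  // Members are initialized in declaration order, so the initializer list
  // above is written in the same order.
  int64_t offset_;
  int disk_id_;
  TmpFile* tmp_file_;
  DiskIoMgr* io_mgr_;
};

// Hypothetical counterpart for the second comment (line 294).
class RemoteOperRangeSketch {
 public:
  RemoteOperRangeSketch(int disk_id, std::string file_path, DiskIoMgr* io_mgr)
    : disk_id_(disk_id), file_path_(std::move(file_path)), io_mgr_(io_mgr) {}

  int disk_id() const { return disk_id_; }
  const std::string& file_path() const { return file_path_; }
  DiskIoMgr* io_mgr() const { return io_mgr_; }

 private:
  int disk_id_;
  std::string file_path_;
  DiskIoMgr* io_mgr_;
};
```

With an initializer list, a class-type member such as the path string is constructed once from its argument rather than default-constructed and then assigned, and members can be declared const if desired.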
[Impala-ASF-CR] WIP: IMPALA-9867: Add Support for Spilling to S3: Milestone 1
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16318 ) Change subject: WIP: IMPALA-9867: Add Support for Spilling to S3: Milestone 1 .. Patch Set 11: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/7186/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16318 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I419b1d5dbbfe35334d9f964c4b65e553579fdc89 Gerrit-Change-Number: 16318 Gerrit-PatchSet: 11 Gerrit-Owner: Yida Wu Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Wed, 16 Sep 2020 14:31:32 + Gerrit-HasComments: No
[Impala-ASF-CR] WIP: IMPALA-9867: Add Support for Spilling to S3: Milestone 1
Yida Wu has uploaded a new patch set (#11). ( http://gerrit.cloudera.org:8080/16318 )

Change subject: WIP: IMPALA-9867: Add Support for Spilling to S3: Milestone 1

WIP: IMPALA-9867: Add Support for Spilling to S3: Milestone 1

Major Features
1) Local files as buffers for spilling to S3.
2) Async Upload and Sync Fetching of remote files.
3) Sync remote files deletion after query ends.
4) Local buffer files management.
5) Compatibility of spilling to local and remote.
6) All the errors from hdfs/s3 should terminate the query.

Implementation Details:
1) A new enum type is added to specify the disk type of local files,
including TmpFileDiskType::LOCAL/DFS/S3.
Also, startup option remote_tmp_file_read_by_file is added to specify
the implementation of reading pages from the remote. If set to true,
the entire file would be fetched to the local buffer during reading
(pinning) if it was evicted. If set to false, only a page is read for
each reading.
2) Two disk queues have been added to do the file operation jobs.
Queue names: RemoteS3DiskFileOper/RemoteDfsDiskFileOper
File operations on the remote disk, like upload and fetch, should be
done in these queues. The purpose of the queues is to separate
long-running operations from short ones, and also to have more
accurate control over the number of threads working on these file
operation jobs, since sometimes we might not want too many upload and
fetch jobs running at the same time.
RemoteOperRange is the new type to carry the file operation jobs.
Previously, we had request types of READ and WRITE.
Now FETCH/UPLOAD have been added.
3) The tmp files are deleted when the tmp file group is destructed.
For remote files, the entire directory would be deleted.
4) The local buffer files management is to control the total size of
local buffer files and evict files if needed.
There are basically six statuses of a remote tmp file:
IN_WRITING/DUMPED/IN_FETCHING/UPLOADED/DUMPED_UPLOADED/DELETED.
A local buffer file can be evicted if it is in status UPLOADED or it
has been all pinned.
There are two modes to decide the sequence of choosing files to be
evicted. Default is LIFO, the other is FIFO. It can be controlled by
startup option remote_tmp_files_avail_pool_lifo.
5) Spilling to local has higher priority than spilling to remote.
If no local scratch space is available, temporary data will be spilled
to remote.
Remote scratch space uses the highest priority local scratch dir
as its buffer. If no local scratch space or only one has been
configured, a default local buffer should be used.
The purpose of the design is to simplify the implementation in
milestone 1 with fewer changes to the configuration.

Limitations:
* Only one remote scratch dir is supported.
* The highest priority local scratch dir is used for the buffer of
remote scratch space if remote scratch dir exists.

Testcases:
* Ran Unit Tests:
$IMPALA_HOME/be/build/debug/runtime/buffered-tuple-stream-test
$IMPALA_HOME/be/build/debug/runtime/tmp-file-mgr-test
$IMPALA_HOME/be/build/debug/runtime/bufferpool/buffer-pool-test
$IMPALA_HOME/be/build/debug/runtime/io/disk-io-mgr-test
* Some new testcases have been added to tmp-file-mgr-test.

TODO:
- New Testcases for Spilling to S3.
- Upper and lower bounds of new options related to size.
- Preserve memory buffer for block buffers on file upload and fetch.
- Add some new metrics, like the rate of accessing local buffer.

Change-Id: I419b1d5dbbfe35334d9f964c4b65e553579fdc89
---
M be/src/runtime/hdfs-fs-cache.cc
M be/src/runtime/hdfs-fs-cache.h
M be/src/runtime/io/CMakeLists.txt
M be/src/runtime/io/disk-io-mgr-test.cc
M be/src/runtime/io/disk-io-mgr.cc
M be/src/runtime/io/disk-io-mgr.h
A be/src/runtime/io/file-writer.h
M be/src/runtime/io/hdfs-file-reader.cc
A be/src/runtime/io/hdfs-file-writer.cc
A be/src/runtime/io/hdfs-file-writer.h
M be/src/runtime/io/local-file-system.cc
M be/src/runtime/io/local-file-system.h
A be/src/runtime/io/local-file-writer.cc
A be/src/runtime/io/local-file-writer.h
M be/src/runtime/io/request-context.cc
M be/src/runtime/io/request-context.h
M be/src/runtime/io/request-ranges.h
M be/src/runtime/io/scan-range.cc
M be/src/runtime/query-state.cc
M be/src/runtime/tmp-file-mgr-internal.h
M be/src/runtime/tmp-file-mgr-test.cc
M be/src/runtime/tmp-file-mgr.cc
M be/src/runtime/tmp-file-mgr.h
M be/src/util/hdfs-util.cc
M be/src/util/hdfs-util.h
M common/thrift/metrics.json
26 files changed, 2,929 insertions(+), 237 deletions(-)

git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/18/16318/11

--
To view, visit http://gerrit.cloudera.org:8080/16318
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I419b1d5dbbfe35334d9f964c4b65e553579fdc89
Gerrit-Change-Number: 16318
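
The eviction rule and the LIFO/FIFO choice described in item 4 of the patch set 11 commit message can be sketched roughly as follows. This is an illustration only: the enum values are the six statuses listed in the message, but LocalBufferFile, IsEvictable, AvailPool, and the page counters are hypothetical names, and "all pinned" is read here as "every page of the file is pinned", following the reviewer's suggested wording.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>
#include <utility>

// The six statuses named in the commit message.
enum class TmpFileStatus {
  IN_WRITING, DUMPED, IN_FETCHING, UPLOADED, DUMPED_UPLOADED, DELETED
};

// Hypothetical description of one local buffer file.
struct LocalBufferFile {
  std::string path;
  TmpFileStatus status;
  std::size_t total_pages;
  std::size_t pinned_pages;
};

// Evictable if the file is UPLOADED or all of its pages are pinned
// (assumed reading of "it has been all pinned").
inline bool IsEvictable(const LocalBufferFile& f) {
  return f.status == TmpFileStatus::UPLOADED || f.pinned_pages == f.total_pages;
}

// Hypothetical pool of evictable files. The lifo flag plays the role of the
// remote_tmp_files_avail_pool_lifo startup option.
class AvailPool {
 public:
  explicit AvailPool(bool lifo) : lifo_(lifo) {}

  // Returns false and ignores the file if it is not evictable.
  bool Add(LocalBufferFile f) {
    if (!IsEvictable(f)) return false;
    files_.push_back(std::move(f));
    return true;
  }

  // Removes the next eviction victim into *out. Returns false if the pool is empty.
  bool PopVictim(LocalBufferFile* out) {
    if (files_.empty()) return false;
    if (lifo_) {
      *out = std::move(files_.back());   // most recently added first
      files_.pop_back();
    } else {
      *out = std::move(files_.front());  // oldest first
      files_.pop_front();
    }
    return true;
  }

 private:
  bool lifo_;
  std::deque<LocalBufferFile> files_;
};
```

A deque keeps both ends cheap to pop, so the same container serves either mode and the flag only selects which end is used.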
[Impala-ASF-CR] WIP: IMPALA-9867: Add Support for Spilling to S3: Milestone 1
Abhishek Rawat has posted comments on this change. ( http://gerrit.cloudera.org:8080/16318 )

Change subject: WIP: IMPALA-9867: Add Support for Spilling to S3: Milestone 1

Patch Set 10:

(17 comments)

http://gerrit.cloudera.org:8080/#/c/16318/10//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/16318/10//COMMIT_MSG@43
PS10, Line 43: A local buffer file can be evicted if it is in status REMOTE or it
typo REMOTE -> UPLOADED

Also, maybe better to say "its all pages have been pinned"

http://gerrit.cloudera.org:8080/#/c/16318/10//COMMIT_MSG@51
PS10, Line 51: Remote scratch space uses the highest priority local scratch dir
             : as its buffer.
We could look into enabling the multiple directories per device startup option by default (allow_multiple_scratch_dirs_per_device) so users can configure multiple directories in the local file system. This option is disabled by default, but it might make sense to enable it since we want to reserve a local directory as a buffer for remote scratch dirs.

Also, in case no local scratch directory has been configured, then that should be a configuration error. If only one local scratch directory is configured, then we should use that as a buffer for remote.

http://gerrit.cloudera.org:8080/#/c/16318/10/be/src/runtime/hdfs-fs-cache.h
File be/src/runtime/hdfs-fs-cache.h:

http://gerrit.cloudera.org:8080/#/c/16318/10/be/src/runtime/hdfs-fs-cache.h@57
PS10, Line 57:
Maybe use a typedef for the vector> to improve the readability.

http://gerrit.cloudera.org:8080/#/c/16318/10/be/src/runtime/hdfs-fs-cache.cc
File be/src/runtime/hdfs-fs-cache.cc:

http://gerrit.cloudera.org:8080/#/c/16318/10/be/src/runtime/hdfs-fs-cache.cc@103
PS10, Line 103: if (options != nullptr && !options->empty()) {
Should we also use the local cache for these connections? Also, do you also need to create a new instance of the filesystem object like we do in L99 above? Seems like you would have to?

http://gerrit.cloudera.org:8080/#/c/16318/7/be/src/runtime/io/disk-io-mgr.cc
File be/src/runtime/io/disk-io-mgr.cc:

http://gerrit.cloudera.org:8080/#/c/16318/7/be/src/runtime/io/disk-io-mgr.cc@110
PS7, Line 110: DEFINE_int32(num_remote_hdfs_file_oper_io_threads, 2,
I think as a starting point, we should probably use the same number of threads as num_remote_hdfs_io_threads

http://gerrit.cloudera.org:8080/#/c/16318/7/be/src/runtime/io/disk-io-mgr.cc@122
PS7, Line 122: DEFINE_int32(num_s3_file_oper_io_threads, 2, "Number of S3 file operations I/O threads");
Probably good to use the same default number of threads as num_s3_io_threads.

http://gerrit.cloudera.org:8080/#/c/16318/10/be/src/runtime/io/disk-io-mgr.cc
File be/src/runtime/io/disk-io-mgr.cc:

http://gerrit.cloudera.org:8080/#/c/16318/10/be/src/runtime/io/disk-io-mgr.cc@110
PS10, Line 110: DEFINE_int32(num_remote_hdfs_file_oper_io_threads, 2,
Add a comment for the new startup flag.

http://gerrit.cloudera.org:8080/#/c/16318/10/be/src/runtime/io/disk-io-mgr.cc@240
PS10, Line 240: ScopedHistogramTimer
Should the write timer also involve the lock acquisition delays?

http://gerrit.cloudera.org:8080/#/c/16318/10/be/src/runtime/io/disk-io-mgr.cc@285
PS10, Line 285: offset_ = file_offset;
              : disk_id_ = disk_id;
              : tmp_file_ = tmp_file;
              : io_mgr_ = io_mgr;
We could move all these to the initializer's list in the constructor.

http://gerrit.cloudera.org:8080/#/c/16318/10/be/src/runtime/io/disk-io-mgr.cc@294
PS10, Line 294: disk_id_ = disk_id;
              : file_path_ = file_path;
              : io_mgr_ = io_mgr;
Same here, better to move to initializer's list.

http://gerrit.cloudera.org:8080/#/c/16318/10/be/src/runtime/io/request-context.h
File be/src/runtime/io/request-context.h:

http://gerrit.cloudera.org:8080/#/c/16318/10/be/src/runtime/io/request-context.h@296
PS10, Line 296: void RemoteOperDone(RemoteOperRange* oper_range, const Status& write_status);
Would be good to add comments here.

http://gerrit.cloudera.org:8080/#/c/16318/10/be/src/runtime/io/request-ranges.h
File be/src/runtime/io/request-ranges.h:

http://gerrit.cloudera.org:8080/#/c/16318/10/be/src/runtime/io/request-ranges.h@372
PS10, Line 372: bool
Adding comments here for the new functions would be good.

http://gerrit.cloudera.org:8080/#/c/16318/10/be/src/runtime/tmp-file-mgr-internal.h
File be/src/runtime/tmp-file-mgr-internal.h:

http://gerrit.cloudera.org:8080/#/c/16318/10/be/src/runtime/tmp-file-mgr-internal.h@38
PS10, Line 38: DUMPED
Not sure what's the best name for the new states. I do find myself thinking about this every time I come across this code. Ideally we want something which is self-explanatory.

This is just a suggestion, but something like the following maybe?
INWRITING -> WRITING_LOCAL / SPILLING_LOCAL
DUMPED -> WRITTEN_LOCAL / SPILLED_LOCAL
INFETCHING ->
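
On the question raised at disk-io-mgr.cc line 240 (whether the write timer should cover lock acquisition), the difference comes down to where the scoped timer is constructed relative to the lock. Below is a minimal sketch, assuming a hand-rolled ScopedElapsedTimer and a TimedWriter type that are both hypothetical and stand in for Impala's ScopedHistogramTimer and the real write path.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <vector>

// Hypothetical scoped timer: adds the elapsed nanoseconds to a counter when it
// goes out of scope. It is not Impala's ScopedHistogramTimer.
class ScopedElapsedTimer {
 public:
  explicit ScopedElapsedTimer(std::atomic<int64_t>* total_ns)
    : total_ns_(total_ns), start_(std::chrono::steady_clock::now()) {}

  ~ScopedElapsedTimer() {
    auto elapsed = std::chrono::steady_clock::now() - start_;
    total_ns_->fetch_add(
        std::chrono::duration_cast<std::chrono::nanoseconds>(elapsed).count());
  }

  ScopedElapsedTimer(const ScopedElapsedTimer&) = delete;
  ScopedElapsedTimer& operator=(const ScopedElapsedTimer&) = delete;

 private:
  std::atomic<int64_t>* total_ns_;
  std::chrono::steady_clock::time_point start_;
};

// Hypothetical writer whose timed section includes the wait for the lock.
struct TimedWriter {
  std::mutex lock;
  std::vector<char> data;
  std::atomic<int64_t> write_time_ns{0};

  void Write(const char* buf, std::size_t len) {
    // The timer is constructed before the lock is taken, so time spent
    // blocked on the mutex is part of the measurement.
    ScopedElapsedTimer timer(&write_time_ns);
    std::lock_guard<std::mutex> guard(lock);
    data.insert(data.end(), buf, buf + len);
  }  // guard is destroyed first (unlock), then timer records the elapsed time
};
```

Swapping the two declarations in Write would exclude the lock wait. The counter is atomic because the timer's destructor runs after the mutex has been released.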
[Impala-ASF-CR] WIP: IMPALA-9867: Add Support for Spilling to S3: Milestone 1
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16318 ) Change subject: WIP: IMPALA-9867: Add Support for Spilling to S3: Milestone 1 .. Patch Set 10: Build Failed https://jenkins.impala.io/job/gerrit-code-review-checks/7110/ : Initial code review checks failed. See linked job for details on the failure. -- To view, visit http://gerrit.cloudera.org:8080/16318 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I419b1d5dbbfe35334d9f964c4b65e553579fdc89 Gerrit-Change-Number: 16318 Gerrit-PatchSet: 10 Gerrit-Owner: Yida Wu Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Fri, 04 Sep 2020 17:45:55 + Gerrit-HasComments: No
[Impala-ASF-CR] WIP: IMPALA-9867: Add Support for Spilling to S3: Milestone 1
Yida Wu has uploaded a new patch set (#10). ( http://gerrit.cloudera.org:8080/16318 )

Change subject: WIP: IMPALA-9867: Add Support for Spilling to S3: Milestone 1

WIP: IMPALA-9867: Add Support for Spilling to S3: Milestone 1

Major Features
1) Local files as buffers for spilling to S3.
2) Async Upload and Sync Fetching of remote files.
3) Sync remote files deletion after query ends.
4) Local buffer files management.
5) Compatibility of spilling to local and remote.
6) All the errors from hdfs/s3 should terminate the query.

Implementation Details:
1) A new enum type is added to specify the disk type of local files,
including TmpFileDiskType::LOCAL/DFS/S3.
Also, startup option remote_tmp_file_read_by_file is added to specify
the implementation of reading pages from the remote. If set to true,
the entire file would be fetched to the local buffer during reading
(pinning) if it was evicted. If set to false, only a page is read for
each reading.
2) Two disk queues have been added to do the file operation jobs.
Queue names: RemoteS3DiskFileOper/RemoteDfsDiskFileOper
File operations on the remote disk, like upload and fetch, should be
done in these queues. The purpose of the queues is to separate
long-running operations from short ones, and also to have more
accurate control over the number of threads working on these file
operation jobs, since sometimes we might not want too many upload and
fetch jobs running at the same time.
RemoteOperRange is the new type to carry the file operation jobs.
Previously, we had request types of READ and WRITE.
Now FETCH/UPLOAD have been added.
3) The tmp files are deleted when the tmp file group is destructed.
For remote files, the entire directory would be deleted.
4) The local buffer files management is to control the total size of
local buffer files and evict files if needed.
There are basically six statuses of a remote tmp file:
IN_WRITING/DUMPED/IN_FETCHING/UPLOADED/DUMPED_UPLOADED/DELETED.
A local buffer file can be evicted if it is in status REMOTE or it
has been all pinned.
There are two modes to decide the sequence of choosing files to be
evicted. Default is LIFO, the other is FIFO. It can be controlled by
startup option remote_tmp_files_avail_pool_lifo.
5) Spilling to local has higher priority than spilling to remote.
If no local scratch space is available, temporary data will be spilled
to remote.
Remote scratch space uses the highest priority local scratch dir
as its buffer. If no local scratch space or only one has been
configured, a default local buffer should be used.
The purpose of the design is to simplify the implementation in
milestone 1 with fewer changes to the configuration.

Limitations:
* Only one remote scratch dir is supported.
* The highest priority local scratch dir is used for the buffer of
remote scratch space if remote scratch dir exists.

Testcases:
* Ran Unit Tests:
$IMPALA_HOME/be/build/debug/runtime/buffered-tuple-stream-test
$IMPALA_HOME/be/build/debug/runtime/tmp-file-mgr-test
$IMPALA_HOME/be/build/debug/runtime/bufferpool/buffer-pool-test
$IMPALA_HOME/be/build/debug/runtime/io/disk-io-mgr-test
* Some new testcases have been added to tmp-file-mgr-test.

TODO:
- New Testcases for Spilling to S3.
- Upper and lower bounds of new options related to size.
- Preserve memory buffer for block buffers on file upload and fetch.
- Add some new metrics, like the rate of accessing local buffer.

Change-Id: I419b1d5dbbfe35334d9f964c4b65e553579fdc89
---
M be/src/runtime/hdfs-fs-cache.cc
M be/src/runtime/hdfs-fs-cache.h
M be/src/runtime/io/CMakeLists.txt
M be/src/runtime/io/disk-io-mgr-test.cc
M be/src/runtime/io/disk-io-mgr.cc
M be/src/runtime/io/disk-io-mgr.h
A be/src/runtime/io/file-writer.h
M be/src/runtime/io/hdfs-file-reader.cc
A be/src/runtime/io/hdfs-file-writer.cc
A be/src/runtime/io/hdfs-file-writer.h
M be/src/runtime/io/local-file-system.cc
M be/src/runtime/io/local-file-system.h
A be/src/runtime/io/local-file-writer.cc
A be/src/runtime/io/local-file-writer.h
M be/src/runtime/io/request-context.cc
M be/src/runtime/io/request-context.h
M be/src/runtime/io/request-ranges.h
M be/src/runtime/io/scan-range.cc
M be/src/runtime/query-state.cc
M be/src/runtime/tmp-file-mgr-internal.h
M be/src/runtime/tmp-file-mgr-test.cc
M be/src/runtime/tmp-file-mgr.cc
M be/src/runtime/tmp-file-mgr.h
M be/src/util/hdfs-util.cc
M be/src/util/hdfs-util.h
M common/thrift/metrics.json
26 files changed, 2,754 insertions(+), 237 deletions(-)

git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/18/16318/10

--
To view, visit http://gerrit.cloudera.org:8080/16318
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I419b1d5dbbfe35334d9f964c4b65e553579fdc89
Gerrit-Change-Number: 16318
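
Item 5 of the commit message (local spilling takes priority, remote is used when no local scratch space is available) can be illustrated with a small selection function. Everything here is a sketch under stated assumptions: ScratchDir and ChooseSpillTarget are hypothetical names, a lower priority number is assumed to mean a higher priority, the remote directory is assumed to always have room, and the role of the highest-priority local dir as the remote buffer is deliberately left out.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical description of one configured scratch directory.
struct ScratchDir {
  std::string path;
  int priority;        // assumption: lower value means higher priority
  int64_t bytes_free;  // ignored for remote dirs in this sketch
  bool is_remote;
};

// Picks the highest-priority local dir with enough free space. Falls back to
// the (single) remote dir if no local dir qualifies. Returns nullptr if there
// is neither a usable local dir nor a remote dir.
inline const ScratchDir* ChooseSpillTarget(
    const std::vector<ScratchDir>& dirs, int64_t bytes_needed) {
  const ScratchDir* best_local = nullptr;
  const ScratchDir* remote = nullptr;
  for (const ScratchDir& d : dirs) {
    if (d.is_remote) {
      if (remote == nullptr) remote = &d;  // only one remote dir is supported
      continue;
    }
    if (d.bytes_free < bytes_needed) continue;
    if (best_local == nullptr || d.priority < best_local->priority) best_local = &d;
  }
  return best_local != nullptr ? best_local : remote;
}
```

The paths used to exercise such a function (for example /data0/scratch or s3a://bucket/scratch) would likewise be made up for illustration.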
[Impala-ASF-CR] WIP: IMPALA-9867: Add Support for Spilling to S3: Milestone 1
Yida Wu has removed Abhishek Rawat from this change. ( http://gerrit.cloudera.org:8080/16318 ) Change subject: WIP: IMPALA-9867: Add Support for Spilling to S3: Milestone 1 .. Removed reviewer Abhishek Rawat. -- To view, visit http://gerrit.cloudera.org:8080/16318 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: deleteReviewer Gerrit-Change-Id: I419b1d5dbbfe35334d9f964c4b65e553579fdc89 Gerrit-Change-Number: 16318 Gerrit-PatchSet: 7 Gerrit-Owner: Yida Wu Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] WIP: IMPALA-9867: Add Support for Spilling to S3: Milestone 1
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16318 ) Change subject: WIP: IMPALA-9867: Add Support for Spilling to S3: Milestone 1 .. Patch Set 7: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/6945/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16318 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I419b1d5dbbfe35334d9f964c4b65e553579fdc89 Gerrit-Change-Number: 16318 Gerrit-PatchSet: 7 Gerrit-Owner: Yida Wu Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Mon, 17 Aug 2020 14:37:46 + Gerrit-HasComments: No
[Impala-ASF-CR] WIP: IMPALA-9867: Add Support for Spilling to S3: Milestone 1
Yida Wu has uploaded a new patch set (#7). ( http://gerrit.cloudera.org:8080/16318 )

Change subject: WIP: IMPALA-9867: Add Support for Spilling to S3: Milestone 1

WIP: IMPALA-9867: Add Support for Spilling to S3: Milestone 1

Major Features
1) Local files as buffers for spilling to S3.
2) Async Upload and Sync Fetching of remote files.
3) Sync remote files deletion after query ends.
4) Local buffer files management.
5) Compatibility of spilling to local and remote.
6) All the errors from hdfs/s3 should terminate the query.

Implementation Details:
1) A new enum type is added to specify the disk type of local files,
including TmpFileDiskType::LOCAL/DFS/S3.
Also, startup option remote_tmp_file_read_by_file is added to specify
the implementation of reading pages from the remote. If set to true,
the entire file would be fetched to the local buffer during reading
(pinning) if it was evicted. If set to false, only a page is read for
each reading.
2) Two disk queues have been added to do the file operation jobs.
Queue names: RemoteS3DiskFileOper/RemoteDfsDiskFileOper
File operations on the remote disk, like upload and fetch, should be
done in these queues. The purpose of the queues is to separate
long-running operations from short ones, and also to have more
accurate control over the number of threads working on these file
operation jobs, since sometimes we might not want too many upload and
fetch jobs running at the same time.
RemoteOperRange is the new type to carry the file operation jobs.
Previously, we had request types of READ and WRITE.
Now FETCH/UPLOAD have been added.
3) The tmp files are deleted when the tmp file group is destructed.
For remote files, the entire directory would be deleted.
4) The local buffer files management is to control the total size of
local buffer files and evict files if needed.
There are basically six statuses of a remote tmp file:
IN_WRITING/DUMPED/IN_FETCHING/UPLOADED/DUMPED_UPLOADED/DELETED.
A local buffer file can be evicted if it is in status REMOTE or it
has been all pinned.
There are two modes to decide the sequence of choosing files to be
evicted. Default is LIFO, the other is FIFO. It can be controlled by
startup option remote_tmp_files_avail_pool_lifo.
5) Spilling to local has higher priority than spilling to remote.
If no local scratch space is available, temporary data will be spilled
to remote.
Remote scratch space uses the highest priority local scratch dir
as its buffer. If no local scratch space or only one has been
configured, a default local buffer should be used.
The purpose of the design is to simplify the implementation in
milestone 1 with fewer changes to the configuration.

Limitations:
* Only one remote scratch dir is supported.
* The highest priority local scratch dir is used for the buffer of
remote scratch space if remote scratch dir exists.

Testcases:
* Ran Unit Tests:
$IMPALA_HOME/be/build/debug/runtime/buffered-tuple-stream-test
$IMPALA_HOME/be/build/debug/runtime/tmp-file-mgr-test
$IMPALA_HOME/be/build/debug/runtime/bufferpool/buffer-pool-test
$IMPALA_HOME/be/build/debug/runtime/io/disk-io-mgr-test
* Some new testcases have been added to tmp-file-mgr-test.

TODO:
- New Testcases for Spilling to S3.
- Upper and lower bounds of new options related to size.
- Preserve memory buffer for block buffers on file upload and fetch.
- Add some new metrics, like the rate of accessing local buffer.

Change-Id: I419b1d5dbbfe35334d9f964c4b65e553579fdc89
---
M be/src/runtime/io/CMakeLists.txt
M be/src/runtime/io/disk-io-mgr-test.cc
M be/src/runtime/io/disk-io-mgr.cc
M be/src/runtime/io/disk-io-mgr.h
A be/src/runtime/io/file-writer.h
M be/src/runtime/io/hdfs-file-reader.cc
A be/src/runtime/io/hdfs-file-writer.cc
A be/src/runtime/io/hdfs-file-writer.h
M be/src/runtime/io/local-file-system.cc
M be/src/runtime/io/local-file-system.h
A be/src/runtime/io/local-file-writer.cc
A be/src/runtime/io/local-file-writer.h
M be/src/runtime/io/request-context.cc
M be/src/runtime/io/request-context.h
M be/src/runtime/io/request-ranges.h
M be/src/runtime/io/scan-range.cc
M be/src/runtime/query-state.cc
M be/src/runtime/tmp-file-mgr-internal.h
M be/src/runtime/tmp-file-mgr-test.cc
M be/src/runtime/tmp-file-mgr.cc
M be/src/runtime/tmp-file-mgr.h
M be/src/util/hdfs-util.cc
M be/src/util/hdfs-util.h
M common/thrift/metrics.json
24 files changed, 2,753 insertions(+), 232 deletions(-)

git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/18/16318/7

--
To view, visit http://gerrit.cloudera.org:8080/16318
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I419b1d5dbbfe35334d9f964c4b65e553579fdc89
Gerrit-Change-Number: 16318
Gerrit-PatchSet: 7
Gerrit-Owner: Yida Wu
Gerrit-Reviewer: Abhishek Rawat
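
The two read strategies selected by remote_tmp_file_read_by_file (item 1 of the commit message) can be summarized as a small decision function. This is only a sketch of the described behavior: ReadAction, ReadPlan, and PlanPageRead are hypothetical names, and the case where the local buffer is still present is an assumption added to make the function total.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical outcomes of planning a page read.
enum class ReadAction {
  READ_LOCAL_BUFFER,  // local buffer file still present (assumed case)
  FETCH_WHOLE_FILE,   // bring the entire remote file back to the local buffer
  READ_REMOTE_PAGE    // read just the requested page from the remote file
};

struct ReadPlan {
  ReadAction action;
  int64_t offset;  // byte offset of the range to transfer
  int64_t len;     // number of bytes to transfer
};

// read_by_file stands in for the remote_tmp_file_read_by_file startup option.
inline ReadPlan PlanPageRead(bool read_by_file, bool local_buffer_evicted,
                             int64_t page_offset, int64_t page_len, int64_t file_len) {
  if (!local_buffer_evicted) {
    return {ReadAction::READ_LOCAL_BUFFER, page_offset, page_len};
  }
  if (read_by_file) {
    // Option set to true: fetch the entire file when pinning an evicted page.
    return {ReadAction::FETCH_WHOLE_FILE, 0, file_len};
  }
  // Option set to false: only the one page is read per read request.
  return {ReadAction::READ_REMOTE_PAGE, page_offset, page_len};
}
```

Fetching the whole file costs more on the first read but lets later pins of neighboring pages be served locally, while the per-page mode transfers less data for each individual read.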
[Impala-ASF-CR] WIP: IMPALA-9867: Add Support for Spilling to S3: Milestone 1
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16318 ) Change subject: WIP: IMPALA-9867: Add Support for Spilling to S3: Milestone 1 .. Patch Set 3: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/6914/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16318 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I419b1d5dbbfe35334d9f964c4b65e553579fdc89 Gerrit-Change-Number: 16318 Gerrit-PatchSet: 3 Gerrit-Owner: Yida Wu Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Thu, 13 Aug 2020 14:54:33 + Gerrit-HasComments: No
[Impala-ASF-CR] WIP: IMPALA-9867: Add Support for Spilling to S3: Milestone 1
Yida Wu has uploaded a new patch set (#3). ( http://gerrit.cloudera.org:8080/16318 )

Change subject: WIP: IMPALA-9867: Add Support for Spilling to S3: Milestone 1

WIP: IMPALA-9867: Add Support for Spilling to S3: Milestone 1

Major Features
1) Local files as buffers for spilling to S3.
2) Async Upload and Sync Fetching of remote files.
3) Sync remote files deletion after query ends.
4) Local buffer files management.
5) Compatibility of spilling to local and remote.
6) All the errors from hdfs/s3 should terminate the query.

Implementation Details:
1) A new enum type is added to specify the function of local files:
LocalFileMode::BUFFER and LocalFileMode::FILE.
LocalFileMode::BUFFER indicates that the local file is used as a
buffer for remote operations.
LocalFileMode::FILE indicates the local file is used for spilling to
local.
Also, startup option remote_tmp_file_local_buff_mode is added to
specify the implementation of reading pages from the remote. If set
to true, the entire file would be fetched to the local buffer during
reading (pinning) if it was evicted. If set to false, only a page is
read for each reading.
2) Two disk queues have been added to do the file operation jobs.
Queue names: RemoteS3DiskFileOper/RemoteDfsDiskFileOper
File operations on the remote disk, like upload and fetch, should be
done in these queues. The purpose of the queues is to separate
long-running operations from short ones, and also to have more
accurate control over the number of threads working on these file
operation jobs, since sometimes we might not want too many upload and
fetch jobs running at the same time.
RemoteOperRange is the new type to carry the file operation jobs.
Previously, we had request types of READ and WRITE.
Now FETCH/UPLOAD/EVICT have been added.
3) The tmp files are deleted when the tmp file group is destructed.
4) The local buffer files management is to control the total size of
local buffer files and evict files if needed.
There are basically six statuses of a remote tmp file:
IN_WRITING/DUMPED/IN_DUMPING/UPLOADED/DUMPED_UPLOADED/DELETED.
A local buffer file can be evicted if it is in status REMOTE or it
has been all pinned. An EVICT job is sent to the local disk queue if
a file is chosen to be evicted.
There are two modes to decide the sequence of choosing files to be
evicted. Default is LIFO, the other is FIFO. It can be decided by
startup option remote_tmp_files_avail_pool_lifo.
5) Spilling to local has higher priority than spilling to remote.
If no local scratch space is available, temporary data will be spilled
to remote.
Remote scratch space uses the highest priority local scratch dir
as its buffer. If no local scratch space or only one has been
configured, a default local buffer should be used.
The purpose of the design is to simplify the implementation in
milestone 1 with fewer changes to the configuration.

Limitations:
* Only one remote scratch dir is supported.
* The highest priority local scratch dir is used for the buffer of
remote scratch space if remote scratch dir exists.

Testcases:
* Ran Unit Tests:
$IMPALA_HOME/be/build/debug/runtime/buffered-tuple-stream-test
$IMPALA_HOME/be/build/debug/runtime/tmp-file-mgr-test
$IMPALA_HOME/be/build/debug/runtime/bufferpool/buffer-pool-test
$IMPALA_HOME/be/build/debug/runtime/io/disk-io-mgr-test
* Some new testcases have been added to tmp-file-mgr-test.

TODO:
- New Testcases for Spilling to S3.
- Upper and lower bounds of new options related to size.
- Preserve memory buffer for block buffers on file upload and fetch.
- Add some new metrics, like the rate of accessing local buffer.

Change-Id: I419b1d5dbbfe35334d9f964c4b65e553579fdc89
---
M be/src/runtime/io/CMakeLists.txt
M be/src/runtime/io/disk-io-mgr-test.cc
M be/src/runtime/io/disk-io-mgr.cc
M be/src/runtime/io/disk-io-mgr.h
A be/src/runtime/io/file-writer.h
M be/src/runtime/io/hdfs-file-reader.cc
A be/src/runtime/io/hdfs-file-writer.cc
A be/src/runtime/io/hdfs-file-writer.h
M be/src/runtime/io/local-file-system.cc
M be/src/runtime/io/local-file-system.h
A be/src/runtime/io/local-file-writer.cc
A be/src/runtime/io/local-file-writer.h
M be/src/runtime/io/request-context.cc
M be/src/runtime/io/request-context.h
M be/src/runtime/io/request-ranges.h
M be/src/runtime/io/scan-range.cc
M be/src/runtime/tmp-file-mgr-internal.h
M be/src/runtime/tmp-file-mgr-test.cc
M be/src/runtime/tmp-file-mgr.cc
M be/src/runtime/tmp-file-mgr.h
M be/src/util/hdfs-util.cc
M be/src/util/hdfs-util.h
M common/thrift/metrics.json
23 files changed, 2,675 insertions(+), 211 deletions(-)

git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/18/16318/3

--
To view, visit http://gerrit.cloudera.org:8080/16318
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
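
As a rough picture of the queue separation in patch set 3 (items 2 and 4 of its commit message), the routing of request types to queues might look like the sketch below. The two FileOper queue names are taken from the commit message; RequestType, RemoteFs, QueueFor, and the "LocalDisk" and "PageIo" queue names are hypothetical, since the message does not name the queues used for EVICT, READ, or WRITE.

```cpp
#include <cassert>
#include <string>

// Request types as listed for patch set 3: READ and WRITE existed before,
// FETCH/UPLOAD/EVICT are the additions.
enum class RequestType { READ, WRITE, FETCH, UPLOAD, EVICT };

// Hypothetical tag for the remote filesystem a tmp file lives on.
enum class RemoteFs { DFS, S3 };

// Returns the name of the queue a request would be placed on.
inline std::string QueueFor(RequestType type, RemoteFs fs) {
  switch (type) {
    case RequestType::FETCH:
    case RequestType::UPLOAD:
      // Long-running whole-file operations get their own queue per remote
      // filesystem, so their thread count can be tuned separately.
      return fs == RemoteFs::S3 ? "RemoteS3DiskFileOper" : "RemoteDfsDiskFileOper";
    case RequestType::EVICT:
      // "An EVICT job is sent to the local disk queue"; the name is assumed.
      return "LocalDisk";
    case RequestType::READ:
    case RequestType::WRITE:
      // Placeholder name for the pre-existing page read/write queues.
      return "PageIo";
  }
  return "PageIo";  // not reached; keeps compilers that warn on fallthrough quiet
}
```

Keeping uploads and fetches on dedicated queues means a burst of multi-megabyte transfers does not occupy the threads that serve short page reads and writes.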
[Impala-ASF-CR] WIP: IMPALA-9867: Add Support for Spilling to S3: Milestone 1
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16318 ) Change subject: WIP: IMPALA-9867: Add Support for Spilling to S3: Milestone 1 .. Patch Set 2: Build Failed https://jenkins.impala.io/job/gerrit-code-review-checks/6902/ : Initial code review checks failed. See linked job for details on the failure. -- To view, visit http://gerrit.cloudera.org:8080/16318 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I419b1d5dbbfe35334d9f964c4b65e553579fdc89 Gerrit-Change-Number: 16318 Gerrit-PatchSet: 2 Gerrit-Owner: Yida Wu Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Thu, 13 Aug 2020 03:05:57 + Gerrit-HasComments: No
[Impala-ASF-CR] WIP: IMPALA-9867: Add Support for Spilling to S3: Milestone 1
Yida Wu has uploaded a new patch set (#2). ( http://gerrit.cloudera.org:8080/16318 ) Change subject: WIP: IMPALA-9867: Add Support for Spilling to S3: Milestone 1 ..
WIP: IMPALA-9867: Add Support for Spilling to S3: Milestone 1

Major Features:
1) Local files as buffers for spilling to S3.
2) Async Upload and Sync Fetching of remote files.
3) Sync remote files deletion after query ends.
4) Local buffer files management.
5) Compatibility of spilling to local and remote.
6) All the errors from hdfs/s3 should terminate the query.

Implementation Details:
1) A new enum type is added to specify the function of local files: LocalFileMode::BUFFER and LocalFileMode::FILE. LocalFileMode::BUFFER indicates that the local file is used as a buffer for remote operations. LocalFileMode::FILE indicates that the local file is used for spilling to local. Also, the startup option remote_tmp_file_local_buff_mode is added to specify how pages are read from the remote. If set to true, the entire file is fetched to the local buffer during reading (pinning) if it was evicted. If set to false, only a page is read for each read.
2) Two disk queues have been added to do the file operation jobs. Queue names: RemoteS3DiskFileOper/RemoteDfsDiskFileOper. File operations on the remote disk, like upload and fetch, should be done in these queues. The purpose of the queues is to separate long-running operations from short ones, and also to have more accurate control over the number of threads working on these file operation jobs, since we might not want too many upload and fetch jobs running at the same time. RemoteOperRange is the new type that carries the file operation jobs. Previously, we had request types of READ and WRITE. Now FETCH/UPLOAD/EVICT have been added.
3) The tmp files are deleted when the tmp file group is destructed.
4) The local buffer file management controls the total size of local buffer files and evicts files if needed. There are basically six statuses of a remote tmp file: IN_WRITING/DUMPED/IN_DUMPING/UPLOADED/DUMPED_UPLOADED/DELETED. A local buffer file can be evicted if it is in status UPLOADED or it has been entirely pinned. An EVICT job is sent to the local disk queue when a file is chosen for eviction. Two modes decide the order in which files are chosen for eviction: LIFO (the default) and FIFO. The mode is set by the startup option remote_tmp_files_avail_pool_lifo.
5) Spilling to local has higher priority than spilling to remote. If no local scratch space is available, temporary data is spilled to remote. Remote scratch space uses the highest-priority local scratch dir as its buffer. If no local scratch space or only one has been configured, a default local buffer should be used. The purpose of the design is to simplify the implementation in milestone 1 with fewer changes to the configuration.

Limitations:
* Only one remote scratch dir is supported.
* The highest-priority local scratch dir is used as the buffer for remote scratch space if a remote scratch dir exists.

Testcases:
* Ran Unit Tests:
  $IMPALA_HOME/be/build/debug/runtime/buffered-tuple-stream-test
  $IMPALA_HOME/be/build/debug/runtime/tmp-file-mgr-test
  $IMPALA_HOME/be/build/debug/runtime/bufferpool/buffer-pool-test
  $IMPALA_HOME/be/build/debug/runtime/io/disk-io-mgr-test
* Some new testcases have been added to tmp-file-mgr-test.

TODO:
- New testcases for spilling to S3.
- Upper and lower bounds of new options related to size.
- Preserve memory buffer for block buffers on file upload and fetch.
- Add some new metrics, like the rate of accessing the local buffer.
Change-Id: I419b1d5dbbfe35334d9f964c4b65e553579fdc89 --- M be/src/runtime/io/CMakeLists.txt M be/src/runtime/io/disk-io-mgr-test.cc M be/src/runtime/io/disk-io-mgr.cc M be/src/runtime/io/disk-io-mgr.h A be/src/runtime/io/file-writer.h M be/src/runtime/io/hdfs-file-reader.cc A be/src/runtime/io/hdfs-file-writer.cc A be/src/runtime/io/hdfs-file-writer.h M be/src/runtime/io/local-file-system.cc M be/src/runtime/io/local-file-system.h A be/src/runtime/io/local-file-writer.cc A be/src/runtime/io/local-file-writer.h M be/src/runtime/io/request-context.cc M be/src/runtime/io/request-context.h M be/src/runtime/io/request-ranges.h M be/src/runtime/io/scan-range.cc M be/src/runtime/tmp-file-mgr-internal.h M be/src/runtime/tmp-file-mgr-test.cc M be/src/runtime/tmp-file-mgr.cc M be/src/runtime/tmp-file-mgr.h M be/src/util/hdfs-util.cc M be/src/util/hdfs-util.h M common/thrift/metrics.json M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java 24 files changed, 2,680 insertions(+), 215 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/18/16318/2 -- To view, visit http://gerrit.cloudera.org:8080/16318 To unsubscribe, visit
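As a rough illustration of how the new request types could be routed to the queues named in the commit message, consider the sketch below. The function QueueFor and the enums are hypothetical and not taken from the patch. The queue names RemoteS3DiskFileOper and RemoteDfsDiskFileOper come from the message; "LocalDisk" and "Unspecified" are placeholders, since the message does not name the local queue or say how READ and WRITE are routed.

```cpp
#include <cassert>
#include <string>

// Request types listed in the commit message.
enum class RequestType { READ, WRITE, FETCH, UPLOAD, EVICT };

// Hypothetical tag for the kind of remote filesystem behind the scratch dir.
enum class RemoteFs { S3, DFS };

// Returns the name of the queue a request would be placed on in this sketch.
std::string QueueFor(RequestType type, RemoteFs fs) {
  switch (type) {
    case RequestType::FETCH:
    case RequestType::UPLOAD:
      // Long-running remote file operations get their own queue per filesystem.
      return fs == RemoteFs::S3 ? "RemoteS3DiskFileOper" : "RemoteDfsDiskFileOper";
    case RequestType::EVICT:
      // The message says EVICT jobs go to the local disk queue; name is a placeholder.
      return "LocalDisk";
    case RequestType::READ:
    case RequestType::WRITE:
      break;
  }
  // Routing of READ/WRITE is not described in the message.
  return "Unspecified";
}
```

Keeping FETCH and UPLOAD on dedicated queues is what lets the number of threads working on them be limited independently of short local operations.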
[Impala-ASF-CR] WIP: IMPALA-9867: Add Support for Spilling to S3: Milestone 1
Yida Wu has abandoned this change. ( http://gerrit.cloudera.org:8080/16264 ) Change subject: WIP: IMPALA-9867: Add Support for Spilling to S3: Milestone 1 .. Abandoned -- To view, visit http://gerrit.cloudera.org:8080/16264 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: abandon Gerrit-Change-Id: Ia5aa4036b4c72656b4297f9fbe42e21d2796a495 Gerrit-Change-Number: 16264 Gerrit-PatchSet: 1 Gerrit-Owner: Yida Wu Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] WIP: IMPALA-9867: Add Support for Spilling to S3: Milestone 1 Major Features: 1) Local files as buffers for spilling to S3. 2) Async Upload and Sync Fetching of remote files. 3) Sync r
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16318 ) Change subject: WIP: IMPALA-9867: Add Support for Spilling to S3: Milestone 1 Major Features: 1) Local files as buffers for spilling to S3. 2) Async Upload and Sync Fetching of remote files. 3) Sync remote files deletion after query ends. 4) Local buffer files management .. Patch Set 1: Build Failed https://jenkins.impala.io/job/gerrit-code-review-checks/6861/ : Initial code review checks failed. See linked job for details on the failure. -- To view, visit http://gerrit.cloudera.org:8080/16318 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I419b1d5dbbfe35334d9f964c4b65e553579fdc89 Gerrit-Change-Number: 16318 Gerrit-PatchSet: 1 Gerrit-Owner: Yida Wu Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Mon, 10 Aug 2020 20:31:23 + Gerrit-HasComments: No
[Impala-ASF-CR] WIP: IMPALA-9867: Add Support for Spilling to S3: Milestone 1 Major Features: 1) Local files as buffers for spilling to S3. 2) Async Upload and Sync Fetching of remote files. 3) Sync r
Yida Wu has uploaded this change for review. ( http://gerrit.cloudera.org:8080/16318 ) Change subject: WIP: IMPALA-9867: Add Support for Spilling to S3: Milestone 1 Major Features: 1) Local files as buffers for spilling to S3. 2) Async Upload and Sync Fetching of remote files. 3) Sync remote files deletion after query ends. 4) Local buffer files management ..
WIP: IMPALA-9867: Add Support for Spilling to S3: Milestone 1

Major Features:
1) Local files as buffers for spilling to S3.
2) Async Upload and Sync Fetching of remote files.
3) Sync remote files deletion after query ends.
4) Local buffer files management.
5) Compatibility of spilling to local and remote.
6) All the errors from hdfs/s3 should terminate the query.

Implementation Details:
1) A new enum type is added to specify the function of local files: LocalFileMode::BUFFER and LocalFileMode::FILE. LocalFileMode::BUFFER indicates that the local file is used as a buffer for remote operations. LocalFileMode::FILE indicates that the local file is used for spilling to local. Also, the startup option remote_tmp_file_local_buff_mode is added to specify how pages are read from the remote. If set to true, the entire file is fetched to the local buffer during reading (pinning) if it was evicted. If set to false, only a page is read for each read.
2) Two disk queues have been added to do the file operation jobs. Queue names: RemoteS3DiskFileOper/RemoteDfsDiskFileOper. File operations on the remote disk, like upload and fetch, should be done in these queues. The purpose of the queues is to separate long-running operations from short ones, and also to have more accurate control over the number of threads working on these file operation jobs, since we might not want too many upload and fetch jobs running at the same time. RemoteOperRange is the new type that carries the file operation jobs. Previously, we had request types of READ and WRITE. Now FETCH/UPLOAD/EVICT have been added.
3) The tmp files are deleted when the tmp file group is destructed.
4) The local buffer file management controls the total size of local buffer files and evicts files if needed. There are basically six statuses of a remote tmp file: IN_WRITING/DUMPED/IN_DUMPING/UPLOADED/DUMPED_UPLOADED/DELETED. A local buffer file can be evicted if it is in status UPLOADED or it has been entirely pinned. An EVICT job is sent to the local disk queue when a file is chosen for eviction. Two modes decide the order in which files are chosen for eviction: LIFO (the default) and FIFO. The mode is set by the startup option remote_tmp_files_avail_pool_lifo.
5) Spilling to local has higher priority than spilling to remote. If no local scratch space is available, temporary data is spilled to remote. Remote scratch space uses the highest-priority local scratch dir as its buffer. If no local scratch space or only one has been configured, a default local buffer should be used. The purpose of the design is to simplify the implementation in milestone 1 with fewer changes to the configuration.

Limitations:
* Only one remote scratch dir is supported.
* The highest-priority local scratch dir is used as the buffer for remote scratch space if a remote scratch dir exists.

TODO:
- New testcases for spilling to S3.
- Upper and lower bounds of new options related to size.
- Preserve memory buffer for block buffers on file upload and fetch.
- Add some new metrics, like the rate of accessing the local buffer.
- Efficiency issue when mixing local and remote scratch space.
Change-Id: I419b1d5dbbfe35334d9f964c4b65e553579fdc89 --- M be/src/runtime/hdfs-fs-cache.cc M be/src/runtime/io/CMakeLists.txt M be/src/runtime/io/disk-io-mgr-test.cc M be/src/runtime/io/disk-io-mgr.cc M be/src/runtime/io/disk-io-mgr.h A be/src/runtime/io/file-writer.h M be/src/runtime/io/hdfs-file-reader.cc A be/src/runtime/io/hdfs-file-writer.cc A be/src/runtime/io/hdfs-file-writer.h M be/src/runtime/io/local-file-system.cc M be/src/runtime/io/local-file-system.h A be/src/runtime/io/local-file-writer.cc A be/src/runtime/io/local-file-writer.h M be/src/runtime/io/request-context.cc M be/src/runtime/io/request-context.h M be/src/runtime/io/request-ranges.h M be/src/runtime/io/scan-range.cc M be/src/runtime/tmp-file-mgr-internal.h M be/src/runtime/tmp-file-mgr.cc M be/src/runtime/tmp-file-mgr.h M be/src/util/hdfs-util.cc M be/src/util/hdfs-util.h M common/thrift/metrics.json 23 files changed, 2,376 insertions(+), 197 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/18/16318/1 -- To view, visit http://gerrit.cloudera.org:8080/16318 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id:
[Impala-ASF-CR] WIP: IMPALA-9867: Add Support for Spilling to S3: Milestone 1
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16264 ) Change subject: WIP: IMPALA-9867: Add Support for Spilling to S3: Milestone 1 .. Patch Set 1: Build Failed https://jenkins.impala.io/job/gerrit-code-review-checks/6746/ : Initial code review checks failed. See linked job for details on the failure. -- To view, visit http://gerrit.cloudera.org:8080/16264 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia5aa4036b4c72656b4297f9fbe42e21d2796a495 Gerrit-Change-Number: 16264 Gerrit-PatchSet: 1 Gerrit-Owner: Yida Wu Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Fri, 31 Jul 2020 02:30:53 + Gerrit-HasComments: No
[Impala-ASF-CR] WIP: IMPALA-9867: Add Support for Spilling to S3: Milestone 1
Yida Wu has uploaded this change for review. ( http://gerrit.cloudera.org:8080/16264 ) Change subject: WIP: IMPALA-9867: Add Support for Spilling to S3: Milestone 1 ..
WIP: IMPALA-9867: Add Support for Spilling to S3: Milestone 1

WIP: IMPALA-9867: Add Support for Spilling to S3: Milestone 1

Major Features:
1) Local files as buffers for spilling to S3.
2) Async Upload and Sync Fetching of remote files.
3) Sync remote files deletion after query ends.
4) Local buffer files management.
5) Compatibility of spilling to local and remote.
6) All the errors from hdfs/s3 should terminate the query.

Implementation Details:
1) A new enum type is added to specify the function of local files: LocalFileMode::BUFFER and LocalFileMode::FILE. LocalFileMode::BUFFER indicates that the local file is used as a buffer for remote operations. LocalFileMode::FILE indicates that the local file is used for spilling to local. Also, the startup option "remote_tmp_file_local_buff_mode" is added to specify how pages are read from the remote. If set to true, the whole file is fetched to the local buffer during reading. If set to false, only a page is read for each read.
2) Two disk queues have been added to do the file operation jobs. Queue names: RemoteS3DiskFileOper/RemoteDfsDiskFileOper. File operations on the remote disk, like upload and fetch, should be done in these queues. The purpose of the queues is to separate long-running operations from short ones, and also to have more accurate control over the number of threads working on these file operation jobs, since we might not want too many upload and fetch jobs running at the same time. RemoteOperRange is the new type that carries the file operation jobs. Previously, we had request types of READ and WRITE. Now FETCH/UPLOAD/EVICT have been added.
3) The tmp files are deleted when the tmp file group is destructed.
4) The local buffer file management controls the total size of local buffer files and evicts files if needed. There are basically five statuses of a remote tmp file: IN_WRITING/DUMPED/IN_DUMPING/REMOTE/TO_DELETE. A local buffer file can be evicted only if it is in status REMOTE. An EVICT job is sent to the local disk queue when a file is chosen for eviction. Two modes decide the order in which files are chosen for eviction: LIFO (the default) and FIFO. The mode is set by the startup option "remote_tmp_files_avail_pool_lifo".
5) Spilling to local has higher priority than spilling to remote. If no local scratch space is available, temporary data is spilled to remote. Remote scratch space uses the highest-priority local scratch dir as its buffer. If no local scratch space or only one has been configured, a default local buffer should be used. The purpose of the design is to simplify the implementation in milestone 1 with fewer changes to the configuration.

Limitations:
* Only one remote scratch dir is supported.
* The highest-priority local scratch dir is used as the buffer for remote scratch space if a remote scratch dir exists.

TODO:
- Testcases
- Refine the naming of the remote scratch dir and files.
- Upper and lower bounds of new options related to size.
- More accurate error codes and error handling.
- Preserve memory buffer for block buffers on file upload and fetch.
- Job cancellation for new disk queues.
- Some metrics might need to be added.
- Efficiency issue when mixing local and remote scratch space.
Change-Id: Ia5aa4036b4c72656b4297f9fbe42e21d2796a495 --- M be/src/runtime/hdfs-fs-cache.cc M be/src/runtime/io/CMakeLists.txt M be/src/runtime/io/disk-io-mgr.cc M be/src/runtime/io/disk-io-mgr.h A be/src/runtime/io/file-writer.h M be/src/runtime/io/hdfs-file-reader.cc A be/src/runtime/io/hdfs-file-writer.cc A be/src/runtime/io/hdfs-file-writer.h M be/src/runtime/io/local-file-system.cc M be/src/runtime/io/local-file-system.h A be/src/runtime/io/local-file-writer.cc A be/src/runtime/io/local-file-writer.h M be/src/runtime/io/request-context.cc M be/src/runtime/io/request-context.h M be/src/runtime/io/request-ranges.h M be/src/runtime/io/scan-range.cc M be/src/runtime/tmp-file-mgr-internal.h M be/src/runtime/tmp-file-mgr.cc M be/src/runtime/tmp-file-mgr.h M be/src/util/hdfs-util.cc M be/src/util/hdfs-util.h M common/thrift/metrics.json 22 files changed, 2,065 insertions(+), 211 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/64/16264/1 -- To view, visit http://gerrit.cloudera.org:8080/16264 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Ia5aa4036b4c72656b4297f9fbe42e21d2796a495 Gerrit-Change-Number: 16264 Gerrit-PatchSet: 1 Gerrit-Owner: Yida Wu
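The effect of the remote_tmp_file_local_buff_mode option on reads can be sketched as a small decision function. This is only an illustration under assumptions: ReadAction and PlanRead are hypothetical names, and the sketch assumes that a file still present in the local buffer is read locally in both modes, which the commit message does not state explicitly.

```cpp
#include <cassert>

// Hypothetical outcomes of planning a page read for a remote tmp file.
enum class ReadAction {
  READ_LOCAL_PAGE,   // the local buffer file still exists; read the page from it
  FETCH_WHOLE_FILE,  // bring the entire remote file back into the local buffer
  READ_REMOTE_PAGE   // read just the requested page from the remote file
};

// 'local_buff_mode' mirrors the remote_tmp_file_local_buff_mode startup option.
// 'evicted' says whether the local buffer copy of the file has been evicted.
ReadAction PlanRead(bool local_buff_mode, bool evicted) {
  // Assumption: without eviction the local copy is always usable.
  if (!evicted) return ReadAction::READ_LOCAL_PAGE;
  return local_buff_mode ? ReadAction::FETCH_WHOLE_FILE
                         : ReadAction::READ_REMOTE_PAGE;
}
```

Fetching the whole file trades a larger one-time transfer for cheaper subsequent reads of the same file, while page-at-a-time reads avoid refilling the local buffer.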