[Impala-ASF-CR] IMPALA-10147: Avoid getting a file handle for data cache hits
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/16963 ) Change subject: IMPALA-10147: Avoid getting a file handle for data cache hits .. IMPALA-10147: Avoid getting a file handle for data cache hits When reading from the data cache, the disk IO thread first gets a file handle, then it checks the data cache for a hit. The file handle is only used if there is a data cache miss. It is not used when data cache hit and in turns becomes an overhead. This patch move the file handle retrieval later when data cache miss hapens. Testing: - Add custom cluster test test_no_fd_caching_on_cached_data. - Pass core tests. Change-Id: Icc68f233518f862454e87bcbbef14d65fcdb7c91 Reviewed-on: http://gerrit.cloudera.org:8080/16963 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins --- M be/src/runtime/io/hdfs-file-reader.cc M tests/custom_cluster/test_hdfs_fd_caching.py 2 files changed, 71 insertions(+), 21 deletions(-) Approvals: Impala Public Jenkins: Looks good to me, approved; Verified -- To view, visit http://gerrit.cloudera.org:8080/16963 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: Icc68f233518f862454e87bcbbef14d65fcdb7c91 Gerrit-Change-Number: 16963 Gerrit-PatchSet: 4 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-10147: Avoid getting a file handle for data cache hits
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16963 ) Change subject: IMPALA-10147: Avoid getting a file handle for data cache hits .. Patch Set 3: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/16963 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Icc68f233518f862454e87bcbbef14d65fcdb7c91 Gerrit-Change-Number: 16963 Gerrit-PatchSet: 3 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Mon, 25 Jan 2021 16:04:59 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10147: Avoid getting a file handle for data cache hits
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16963 ) Change subject: IMPALA-10147: Avoid getting a file handle for data cache hits .. Patch Set 3: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6852/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/16963 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Icc68f233518f862454e87bcbbef14d65fcdb7c91 Gerrit-Change-Number: 16963 Gerrit-PatchSet: 3 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Mon, 25 Jan 2021 10:28:38 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10147: Avoid getting a file handle for data cache hits
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16963 ) Change subject: IMPALA-10147: Avoid getting a file handle for data cache hits .. Patch Set 3: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/16963 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Icc68f233518f862454e87bcbbef14d65fcdb7c91 Gerrit-Change-Number: 16963 Gerrit-PatchSet: 3 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Mon, 25 Jan 2021 10:28:37 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10147: Avoid getting a file handle for data cache hits
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16963 ) Change subject: IMPALA-10147: Avoid getting a file handle for data cache hits .. Patch Set 2: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6859/ DRY_RUN=true -- To view, visit http://gerrit.cloudera.org:8080/16963 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Icc68f233518f862454e87bcbbef14d65fcdb7c91 Gerrit-Change-Number: 16963 Gerrit-PatchSet: 2 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 22 Jan 2021 06:43:06 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10147: Avoid getting a file handle for data cache hits
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/16963 ) Change subject: IMPALA-10147: Avoid getting a file handle for data cache hits .. Patch Set 2: Code-Review+2 Looks good! Thanks! -- To view, visit http://gerrit.cloudera.org:8080/16963 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Icc68f233518f862454e87bcbbef14d65fcdb7c91 Gerrit-Change-Number: 16963 Gerrit-PatchSet: 2 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 21 Jan 2021 23:49:43 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10147: Avoid getting a file handle for data cache hits
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/16963 ) Change subject: IMPALA-10147: Avoid getting a file handle for data cache hits .. Patch Set 2: (2 comments) Thank you for the review. Patch set 2 move the handle variable and scope exit trigger closer around the file handle retrieval. http://gerrit.cloudera.org:8080/#/c/16963/1/be/src/runtime/io/hdfs-file-reader.cc File be/src/runtime/io/hdfs-file-reader.cc: http://gerrit.cloudera.org:8080/#/c/16963/1/be/src/runtime/io/hdfs-file-reader.cc@99 PS1, Line 99: Status status = Status::OK(); : { : ScopedTimer req_context_read_timer( : scan_range_->reader_->read_timer_); : bool logged_slow_read = false; // True if we already logged about a slow read. : : // Try reading from the remote data cache if it's enabled for the scan range. : DataCache* remote_data_cache = io_mgr->remote_data_cache(); : bool try_data_cache = scan_range_->UseDataCache() && remote_data_cache != nullptr; : i > Nit: Since these are now initialized later in the function, I would also mo Done http://gerrit.cloudera.org:8080/#/c/16963/1/be/src/runtime/io/hdfs-file-reader.cc@126 PS1, Line 126: hdfsFile hdfs_file; > For the clang-tidy issue you are seeing here: Thank you. I didn't read thoroughly to realize that the logic later won't execute if all bytes successfully read from data cache. -- To view, visit http://gerrit.cloudera.org:8080/16963 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Icc68f233518f862454e87bcbbef14d65fcdb7c91 Gerrit-Change-Number: 16963 Gerrit-PatchSet: 2 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 21 Jan 2021 17:13:52 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10147: Avoid getting a file handle for data cache hits
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16963 ) Change subject: IMPALA-10147: Avoid getting a file handle for data cache hits .. Patch Set 2: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/8040/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16963 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Icc68f233518f862454e87bcbbef14d65fcdb7c91 Gerrit-Change-Number: 16963 Gerrit-PatchSet: 2 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 21 Jan 2021 17:06:47 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10147: Avoid getting a file handle for data cache hits
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/16963 ) Change subject: IMPALA-10147: Avoid getting a file handle for data cache hits .. Patch Set 2: Code-Review+1 Thanks for fixing this, LGTM! -- To view, visit http://gerrit.cloudera.org:8080/16963 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Icc68f233518f862454e87bcbbef14d65fcdb7c91 Gerrit-Change-Number: 16963 Gerrit-PatchSet: 2 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 21 Jan 2021 16:56:46 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10147: Avoid getting a file handle for data cache hits
Hello Zoltan Borok-Nagy, Joe McDonnell, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/16963 to look at the new patch set (#2). Change subject: IMPALA-10147: Avoid getting a file handle for data cache hits .. IMPALA-10147: Avoid getting a file handle for data cache hits When reading from the data cache, the disk IO thread first gets a file handle, then it checks the data cache for a hit. The file handle is only used if there is a data cache miss. It is not used when data cache hit and in turns becomes an overhead. This patch move the file handle retrieval later when data cache miss hapens. Testing: - Add custom cluster test test_no_fd_caching_on_cached_data. - Pass core tests. Change-Id: Icc68f233518f862454e87bcbbef14d65fcdb7c91 --- M be/src/runtime/io/hdfs-file-reader.cc M tests/custom_cluster/test_hdfs_fd_caching.py 2 files changed, 71 insertions(+), 21 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/63/16963/2 -- To view, visit http://gerrit.cloudera.org:8080/16963 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Icc68f233518f862454e87bcbbef14d65fcdb7c91 Gerrit-Change-Number: 16963 Gerrit-PatchSet: 2 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-10147: Avoid getting a file handle for data cache hits
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/16963 ) Change subject: IMPALA-10147: Avoid getting a file handle for data cache hits .. Patch Set 1: (2 comments) Thanks for working on this. This is looking good to me. Just a couple style nits. I'm thinking about whether there are any tests I want you to run. I'll get back to you. http://gerrit.cloudera.org:8080/#/c/16963/1/be/src/runtime/io/hdfs-file-reader.cc File be/src/runtime/io/hdfs-file-reader.cc: http://gerrit.cloudera.org:8080/#/c/16963/1/be/src/runtime/io/hdfs-file-reader.cc@99 PS1, Line 99: CachedHdfsFileHandle* borrowed_hdfs_fh = nullptr; : hdfsFile hdfs_file; : : // Make sure to release any borrowed file handle. : auto release_borrowed_hdfs_fh = MakeScopeExitTrigger([this, _hdfs_fh]() { : if (borrowed_hdfs_fh != nullptr) { : scan_range_->io_mgr_->ReleaseCachedHdfsFileHandle(scan_range_->file_string(), : borrowed_hdfs_fh); : } : }); Nit: Since these are now initialized later in the function, I would also move the definitions of the variables down to just before the "if (*bytes_read < bytes_to_read)" statement. I think I would also move the scope exit trigger down to just after that if block (just before the while). http://gerrit.cloudera.org:8080/#/c/16963/1/be/src/runtime/io/hdfs-file-reader.cc@126 PS1, Line 126: if (*bytes_read < bytes_to_read) { For the clang-tidy issue you are seeing here: https://jenkins.impala.io/job/clang-tidy-ub1604/14691/artifact/tidylog.txt clang-tidy must not be smart enough to know that this if statement has a condition that is the same as the while loop, so it thinks hdfs_file is uninitialized. One option is to bail out early here if *bytes_read == bytes_to_read, then just have these variables without an if statement. Looking through the logic later in the function, none of it would run in that case, so I think it is safe to bail. (Obviously, we aren't reading anything more, and the block that does WriteDataCache() also doesn't have anything to do.) -- To view, visit http://gerrit.cloudera.org:8080/16963 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Icc68f233518f862454e87bcbbef14d65fcdb7c91 Gerrit-Change-Number: 16963 Gerrit-PatchSet: 1 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 21 Jan 2021 00:21:33 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10147: Avoid getting a file handle for data cache hits
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16963 ) Change subject: IMPALA-10147: Avoid getting a file handle for data cache hits .. Patch Set 1: Build Failed https://jenkins.impala.io/job/gerrit-code-review-checks/8018/ : Initial code review checks failed. See linked job for details on the failure. -- To view, visit http://gerrit.cloudera.org:8080/16963 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Icc68f233518f862454e87bcbbef14d65fcdb7c91 Gerrit-Change-Number: 16963 Gerrit-PatchSet: 1 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 19 Jan 2021 17:15:36 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10147: Avoid getting a file handle for data cache hits
Riza Suminto has uploaded this change for review. ( http://gerrit.cloudera.org:8080/16963 Change subject: IMPALA-10147: Avoid getting a file handle for data cache hits .. IMPALA-10147: Avoid getting a file handle for data cache hits When reading from the data cache, the disk IO thread first gets a file handle, then it checks the data cache for a hit. The file handle is only used if there is a data cache miss. It is not used when data cache hit and in turns becomes an overhead. This patch move the file handle retrieval later when data cache miss hapens. Testing: - Add custom cluster test test_no_fd_caching_on_cached_data. - Pass core tests. Change-Id: Icc68f233518f862454e87bcbbef14d65fcdb7c91 --- M be/src/runtime/io/hdfs-file-reader.cc M tests/custom_cluster/test_hdfs_fd_caching.py 2 files changed, 59 insertions(+), 10 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/63/16963/1 -- To view, visit http://gerrit.cloudera.org:8080/16963 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Icc68f233518f862454e87bcbbef14d65fcdb7c91 Gerrit-Change-Number: 16963 Gerrit-PatchSet: 1 Gerrit-Owner: Riza Suminto