[Impala-ASF-CR] IMPALA-10147: Avoid getting a file handle for data cache hits

2021-01-25 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/16963 )

Change subject: IMPALA-10147: Avoid getting a file handle for data cache hits
..

IMPALA-10147: Avoid getting a file handle for data cache hits

When reading from the data cache, the disk IO thread first gets a file
handle, then it checks the data cache for a hit. The file handle is only
used if there is a data cache miss. It is not used when data cache hit
and in turns becomes an overhead. This patch move the file handle
retrieval later when data cache miss hapens.

Testing:
- Add custom cluster test test_no_fd_caching_on_cached_data.
- Pass core tests.

Change-Id: Icc68f233518f862454e87bcbbef14d65fcdb7c91
Reviewed-on: http://gerrit.cloudera.org:8080/16963
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 
---
M be/src/runtime/io/hdfs-file-reader.cc
M tests/custom_cluster/test_hdfs_fd_caching.py
2 files changed, 71 insertions(+), 21 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

--
To view, visit http://gerrit.cloudera.org:8080/16963
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Icc68f233518f862454e87bcbbef14d65fcdb7c91
Gerrit-Change-Number: 16963
Gerrit-PatchSet: 4
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-10147: Avoid getting a file handle for data cache hits

2021-01-25 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16963 )

Change subject: IMPALA-10147: Avoid getting a file handle for data cache hits
..


Patch Set 3: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/16963
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icc68f233518f862454e87bcbbef14d65fcdb7c91
Gerrit-Change-Number: 16963
Gerrit-PatchSet: 3
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Mon, 25 Jan 2021 16:04:59 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10147: Avoid getting a file handle for data cache hits

2021-01-25 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16963 )

Change subject: IMPALA-10147: Avoid getting a file handle for data cache hits
..


Patch Set 3:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6852/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/16963
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icc68f233518f862454e87bcbbef14d65fcdb7c91
Gerrit-Change-Number: 16963
Gerrit-PatchSet: 3
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Mon, 25 Jan 2021 10:28:38 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10147: Avoid getting a file handle for data cache hits

2021-01-25 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16963 )

Change subject: IMPALA-10147: Avoid getting a file handle for data cache hits
..


Patch Set 3: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/16963
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icc68f233518f862454e87bcbbef14d65fcdb7c91
Gerrit-Change-Number: 16963
Gerrit-PatchSet: 3
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Mon, 25 Jan 2021 10:28:37 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10147: Avoid getting a file handle for data cache hits

2021-01-21 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16963 )

Change subject: IMPALA-10147: Avoid getting a file handle for data cache hits
..


Patch Set 2:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6859/ 
DRY_RUN=true


--
To view, visit http://gerrit.cloudera.org:8080/16963
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icc68f233518f862454e87bcbbef14d65fcdb7c91
Gerrit-Change-Number: 16963
Gerrit-PatchSet: 2
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Fri, 22 Jan 2021 06:43:06 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10147: Avoid getting a file handle for data cache hits

2021-01-21 Thread Joe McDonnell (Code Review)
Joe McDonnell has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16963 )

Change subject: IMPALA-10147: Avoid getting a file handle for data cache hits
..


Patch Set 2: Code-Review+2

Looks good! Thanks!


--
To view, visit http://gerrit.cloudera.org:8080/16963
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icc68f233518f862454e87bcbbef14d65fcdb7c91
Gerrit-Change-Number: 16963
Gerrit-PatchSet: 2
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 21 Jan 2021 23:49:43 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10147: Avoid getting a file handle for data cache hits

2021-01-21 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16963 )

Change subject: IMPALA-10147: Avoid getting a file handle for data cache hits
..


Patch Set 2:

(2 comments)

Thank you for the review.
Patch set 2 move the handle variable and scope exit trigger closer around the 
file handle retrieval.

http://gerrit.cloudera.org:8080/#/c/16963/1/be/src/runtime/io/hdfs-file-reader.cc
File be/src/runtime/io/hdfs-file-reader.cc:

http://gerrit.cloudera.org:8080/#/c/16963/1/be/src/runtime/io/hdfs-file-reader.cc@99
PS1, Line 99: Status status = Status::OK();
:   {
: ScopedTimer req_context_read_timer(
: scan_range_->reader_->read_timer_);
: bool logged_slow_read = false; // True if we already logged 
about a slow read.
:
: // Try reading from the remote data cache if it's enabled for 
the scan range.
: DataCache* remote_data_cache = io_mgr->remote_data_cache();
: bool try_data_cache = scan_range_->UseDataCache() && 
remote_data_cache != nullptr;
: i
> Nit: Since these are now initialized later in the function, I would also mo
Done


http://gerrit.cloudera.org:8080/#/c/16963/1/be/src/runtime/io/hdfs-file-reader.cc@126
PS1, Line 126: hdfsFile hdfs_file;
> For the clang-tidy issue you are seeing here:
Thank you. I didn't read thoroughly to realize that the logic later won't 
execute if all bytes successfully read from data cache.



--
To view, visit http://gerrit.cloudera.org:8080/16963
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icc68f233518f862454e87bcbbef14d65fcdb7c91
Gerrit-Change-Number: 16963
Gerrit-PatchSet: 2
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 21 Jan 2021 17:13:52 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10147: Avoid getting a file handle for data cache hits

2021-01-21 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16963 )

Change subject: IMPALA-10147: Avoid getting a file handle for data cache hits
..


Patch Set 2:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/8040/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16963
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icc68f233518f862454e87bcbbef14d65fcdb7c91
Gerrit-Change-Number: 16963
Gerrit-PatchSet: 2
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 21 Jan 2021 17:06:47 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10147: Avoid getting a file handle for data cache hits

2021-01-21 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16963 )

Change subject: IMPALA-10147: Avoid getting a file handle for data cache hits
..


Patch Set 2: Code-Review+1

Thanks for fixing this, LGTM!


--
To view, visit http://gerrit.cloudera.org:8080/16963
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icc68f233518f862454e87bcbbef14d65fcdb7c91
Gerrit-Change-Number: 16963
Gerrit-PatchSet: 2
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 21 Jan 2021 16:56:46 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10147: Avoid getting a file handle for data cache hits

2021-01-21 Thread Riza Suminto (Code Review)
Hello Zoltan Borok-Nagy, Joe McDonnell, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/16963

to look at the new patch set (#2).

Change subject: IMPALA-10147: Avoid getting a file handle for data cache hits
..

IMPALA-10147: Avoid getting a file handle for data cache hits

When reading from the data cache, the disk IO thread first gets a file
handle, then it checks the data cache for a hit. The file handle is only
used if there is a data cache miss. It is not used when data cache hit
and in turns becomes an overhead. This patch move the file handle
retrieval later when data cache miss hapens.

Testing:
- Add custom cluster test test_no_fd_caching_on_cached_data.
- Pass core tests.

Change-Id: Icc68f233518f862454e87bcbbef14d65fcdb7c91
---
M be/src/runtime/io/hdfs-file-reader.cc
M tests/custom_cluster/test_hdfs_fd_caching.py
2 files changed, 71 insertions(+), 21 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/63/16963/2
--
To view, visit http://gerrit.cloudera.org:8080/16963
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Icc68f233518f862454e87bcbbef14d65fcdb7c91
Gerrit-Change-Number: 16963
Gerrit-PatchSet: 2
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-10147: Avoid getting a file handle for data cache hits

2021-01-20 Thread Joe McDonnell (Code Review)
Joe McDonnell has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16963 )

Change subject: IMPALA-10147: Avoid getting a file handle for data cache hits
..


Patch Set 1:

(2 comments)

Thanks for working on this. This is looking good to me. Just a couple style 
nits. I'm thinking about whether there are any tests I want you to run. I'll 
get back to you.

http://gerrit.cloudera.org:8080/#/c/16963/1/be/src/runtime/io/hdfs-file-reader.cc
File be/src/runtime/io/hdfs-file-reader.cc:

http://gerrit.cloudera.org:8080/#/c/16963/1/be/src/runtime/io/hdfs-file-reader.cc@99
PS1, Line 99: CachedHdfsFileHandle* borrowed_hdfs_fh = nullptr;
:   hdfsFile hdfs_file;
:
:   // Make sure to release any borrowed file handle.
:   auto release_borrowed_hdfs_fh = MakeScopeExitTrigger([this, 
_hdfs_fh]() {
: if (borrowed_hdfs_fh != nullptr) {
:   
scan_range_->io_mgr_->ReleaseCachedHdfsFileHandle(scan_range_->file_string(),
:   borrowed_hdfs_fh);
: }
:   });
Nit: Since these are now initialized later in the function, I would also move 
the definitions of the variables down to just before the "if (*bytes_read < 
bytes_to_read)" statement.

I think I would also move the scope exit trigger down to just after that if 
block (just before the while).


http://gerrit.cloudera.org:8080/#/c/16963/1/be/src/runtime/io/hdfs-file-reader.cc@126
PS1, Line 126: if (*bytes_read < bytes_to_read) {
For the clang-tidy issue you are seeing here:
https://jenkins.impala.io/job/clang-tidy-ub1604/14691/artifact/tidylog.txt

clang-tidy must not be smart enough to know that this if statement has a 
condition that is the same as the while loop, so it thinks hdfs_file is 
uninitialized.

One option is to bail out early here if *bytes_read == bytes_to_read, then just 
have these variables without an if statement. Looking through the logic later 
in the function, none of it would run in that case, so I think it is safe to 
bail. (Obviously, we aren't reading anything more, and the block that does 
WriteDataCache() also doesn't have anything to do.)



--
To view, visit http://gerrit.cloudera.org:8080/16963
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icc68f233518f862454e87bcbbef14d65fcdb7c91
Gerrit-Change-Number: 16963
Gerrit-PatchSet: 1
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 21 Jan 2021 00:21:33 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10147: Avoid getting a file handle for data cache hits

2021-01-19 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16963 )

Change subject: IMPALA-10147: Avoid getting a file handle for data cache hits
..


Patch Set 1:

Build Failed

https://jenkins.impala.io/job/gerrit-code-review-checks/8018/ : Initial code 
review checks failed. See linked job for details on the failure.


--
To view, visit http://gerrit.cloudera.org:8080/16963
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icc68f233518f862454e87bcbbef14d65fcdb7c91
Gerrit-Change-Number: 16963
Gerrit-PatchSet: 1
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 19 Jan 2021 17:15:36 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10147: Avoid getting a file handle for data cache hits

2021-01-19 Thread Riza Suminto (Code Review)
Riza Suminto has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/16963


Change subject: IMPALA-10147: Avoid getting a file handle for data cache hits
..

IMPALA-10147: Avoid getting a file handle for data cache hits

When reading from the data cache, the disk IO thread first gets a file
handle, then it checks the data cache for a hit. The file handle is only
used if there is a data cache miss. It is not used when data cache hit
and in turns becomes an overhead. This patch move the file handle
retrieval later when data cache miss hapens.

Testing:
- Add custom cluster test test_no_fd_caching_on_cached_data.
- Pass core tests.

Change-Id: Icc68f233518f862454e87bcbbef14d65fcdb7c91
---
M be/src/runtime/io/hdfs-file-reader.cc
M tests/custom_cluster/test_hdfs_fd_caching.py
2 files changed, 59 insertions(+), 10 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/63/16963/1
--
To view, visit http://gerrit.cloudera.org:8080/16963
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Icc68f233518f862454e87bcbbef14d65fcdb7c91
Gerrit-Change-Number: 16963
Gerrit-PatchSet: 1
Gerrit-Owner: Riza Suminto