Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-4623: Enable file handle cache
......................................................................


Patch Set 4:

(18 comments)

looking good.

http://gerrit.cloudera.org:8080/#/c/6478/4/be/src/exec/hdfs-scan-node-base.cc
File be/src/exec/hdfs-scan-node-base.cc:

Line 894:         
runtime_state_->io_mgr()->cached_file_handles_hit_count(reader_context_));
pretty awkward. why not call the reader directly (and maybe introduce a new 
getter)?


http://gerrit.cloudera.org:8080/#/c/6478/4/be/src/exec/hdfs-scan-node-base.h
File be/src/exec/hdfs-scan-node-base.h:

Line 441:   RuntimeProfile::Counter* cached_file_handles_hit_count_;
initialize to nullptr here, that way you don't need to do it in the c'tor 
(which you have to edit if the member order changes, or which you might have to 
duplicate if you multiple c'tors).

do that for all pointer members.


http://gerrit.cloudera.org:8080/#/c/6478/4/be/src/runtime/buffered-block-mgr.h
File be/src/runtime/buffered-block-mgr.h:

Line 25: #include <boost/unordered_map.hpp>
why?


http://gerrit.cloudera.org:8080/#/c/6478/4/be/src/runtime/disk-io-mgr-scan-range.cc
File be/src/runtime/disk-io-mgr-scan-range.cc:

Line 342:       io_mgr_->CacheOrCloseFileHandle(file(), exclusive_hdfs_fh_);
yes, the name really needs to distinguish between cache operations and direct 
operations on handles.


Line 548:         remote_bytes_ += stats->totalBytesRead - 
stats->totalLocalBytesRead;
why not stick that into reader_ as well?


http://gerrit.cloudera.org:8080/#/c/6478/4/be/src/runtime/disk-io-mgr.h
File be/src/runtime/disk-io-mgr.h:

Line 231:   /// Threads checkout a file handle for exclusive access and return 
it when finished.
"check out" (checkout is a noun)


Line 237:   template <size_t NUM_PARTITIONS>
please pull this along with hdfsfilehandle out into a separate file, this file 
is large enough as it is.


Line 242:     /// first k partitions will each have one additional file handle.
let's make the size uniform across all partitions (e.g., by rounding up), that 
should simplify the code.


Line 282:     /// the partitions are aligned to cache line boundaries (64 
bytes).
hm, isn't there some predefined constant somewhere for the cache line size?


Line 549:     Status Open(bool use_file_handle_cache) WARN_UNUSED_RESULT;
does it ever make sense to call open on the same scan range with different 
values of the bool? if not, make a parameter of the c'tor?

point out clearly in the comment that if the param is true, open does nothing.


Line 573:     void* meta_data_;
initialize nullptr here (for all pointer members)


Line 587:     /// Total number of bytes read remotely
point out why we need to maintain this particular metric instead of in 
requestcontext.


Line 601:     /// handle, and no sharing takes place.
also mention when this is populated and how long it's valid.


Line 856:   HdfsFileHandle* OpenHdfsFile(const hdfsFS& fs,
if this always returns a cached handle, would be good to reflect that in the 
name.


Line 860:   void CacheOrCloseFileHandle(const char* fname, HdfsFileHandle* fid);
rename


http://gerrit.cloudera.org:8080/#/c/6478/4/tests/query_test/test_hdfs_fd_caching.py
File tests/query_test/test_hdfs_fd_caching.py:

Line 76:     # File deleted: either # rows is the same or # rows = 0
we don't make any guarantees for this, so might as well not check the result. 
we just don't want to crash.


Line 98:     new_table_location = 
"{0}/test-warehouse/{1}".format(FILESYSTEM_PREFIX, unique_database)
long lines


Line 105:     # Delete the whole directory (including data file)
this function is pretty much identical to the one above (and below), except for 
the specific modification action. consolidate into a single function and drive 
that with an enum parameter.


-- 
To view, visit http://gerrit.cloudera.org:8080/6478
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ibe5ff60971dd653c3b6a0e13928cfa9fc59d078d
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Joe McDonnell <[email protected]>
Gerrit-Reviewer: Dan Hecht <[email protected]>
Gerrit-Reviewer: Joe McDonnell <[email protected]>
Gerrit-Reviewer: Marcel Kornacker <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-HasComments: Yes

Reply via email to