Alex Behm has posted comments on this change.

Change subject: IMPALA-4840: Fix REFRESH performance regression.
......................................................................


Patch Set 1:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/6009/1//COMMIT_MSG
Commit Message:

Line 20: listStatus() on the partition directory and checking the
Mention that listStatus() is significantly faster than listFiles(), that's the 
fundamental reason why this optimization works.


Line 22: metadata load still fetches the block locations in bulk.
Mention the behavioral change of REFRESH with respect to HDFS block movements.


http://gerrit.cloudera.org:8080/#/c/6009/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

Line 341:   private int computeFdBlockMetadata(FileDescriptor fd, 
BlockLocation[] locations)
setBlockMetadata() or function/arg names that more clearly indicate the 
inputs/outputs

We are basically setting the block metadata in 'fd' based on its block 
locations.


Line 364:   }
newline


Line 783:    * modified files on HDFS.
Mention which case this function is optimized for, e.g., it will be fast if 
there are few changes, but it will be slower than listFiles() if there are many 
changes. While doing that you can mention the implementation of listStatus() 
followed by getFileBlockLocations() and the 40x perf difference between 
listStatus() and listFiles() that you measured.


Line 785:   private void loadMetadataAndDiskIds(HdfsPartition partition) throws 
CatalogException {
refreshFileMetadata() or something that indicates we are reusing existing file 
metadata?


Line 804:       if (!FileSystemUtil.supportsStorageIds(fs)) {
can move this up (avoid indexing all files in the partition)


Line 830:       if (unknownDiskIdCount > 0 && LOG.isWarnEnabled()) {
Do we really need this warning? It has the potential of log spew because it's 
printed per-partition.
Can we maybe add this warning after a complete REFRESH?


Line 835:       numHdfsFiles_ += newFileDescs.size();
where do we subtract the old values?


-- 
To view, visit http://gerrit.cloudera.org:8080/6009
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I859b9fe93563ba886d0b5db6db42a14c88caada8
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bharath Vissapragada <[email protected]>
Gerrit-Reviewer: Alex Behm <[email protected]>
Gerrit-Reviewer: Dimitris Tsirogiannis <[email protected]>
Gerrit-HasComments: Yes

Reply via email to