Alex Behm has posted comments on this change. Change subject: IMPALA-4840: Fix REFRESH performance regression. ......................................................................
Patch Set 1: (9 comments) http://gerrit.cloudera.org:8080/#/c/6009/1//COMMIT_MSG Commit Message: Line 20: listStatus() on the partition directory and checking the Mention that listStatus() is significantly faster than listFiles(), that's the fundamental reason why this optimization works. Line 22: metadata load still fetches the block locations in bulk. Mention the behavioral change of REFRESH with respect to HDFS block movements. http://gerrit.cloudera.org:8080/#/c/6009/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: Line 341: private int computeFdBlockMetadata(FileDescriptor fd, BlockLocation[] locations) setBlockMetadata() or function/arg names that more clearly indicate the inputs/outputs We are basically setting the block metadata in 'fd' based on its block locations. Line 364: } newline Line 783: * modified files on HDFS. Mention which case this function is optimized for, e.g., it will be fast if there are few changes, but it will be slower than listFiles() if there are many changes. While doing that you can mention the implementation of listStatus() followed by getFileBlockLocations() and the 40x perf difference between listStatus() and listFiles() that you measured. Line 785: private void loadMetadataAndDiskIds(HdfsPartition partition) throws CatalogException { refreshFileMetadata() or something that indicates we are reusing existing file metadata? Line 804: if (!FileSystemUtil.supportsStorageIds(fs)) { can move this up (avoid indexing all files in the partition) Line 830: if (unknownDiskIdCount > 0 && LOG.isWarnEnabled()) { Do we really need this warning? It has the potential of log spew because it's printed per-partition. Can we maybe add this warning after a complete REFRESH? Line 835: numHdfsFiles_ += newFileDescs.size(); where do we subtract the old values? -- To view, visit http://gerrit.cloudera.org:8080/6009 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I859b9fe93563ba886d0b5db6db42a14c88caada8 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bharath Vissapragada <[email protected]> Gerrit-Reviewer: Alex Behm <[email protected]> Gerrit-Reviewer: Dimitris Tsirogiannis <[email protected]> Gerrit-HasComments: Yes
