Bharath Vissapragada has uploaded a new change for review. http://gerrit.cloudera.org:8080/6009
Change subject: IMPALA-4840: Fix REFRESH performance regression.
......................................................................

IMPALA-4840: Fix REFRESH performance regression.

The fix for IMPALA-4172 introduced a performance regression in the
REFRESH command. The regression stems from the fact that we reload the
block metadata of every valid data file without considering whether it
has changed since the last load. This causes unnecessary metadata loads
for unchanged files, which increases the runtime.

The fix makes the refresh codepath (and other operations that share it,
such as INSERT) reload the metadata of only the modified files. It does
this by calling listStatus() on the partition directory and checking the
last modified time of each file. The initial load and INVALIDATE METADATA
still fetch the block locations in bulk.

Change-Id: I859b9fe93563ba886d0b5db6db42a14c88caada8
---
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
2 files changed, 106 insertions(+), 28 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/09/6009/1
--
To view, visit http://gerrit.cloudera.org:8080/6009

Gerrit-MessageType: newchange
Gerrit-Change-Id: I859b9fe93563ba886d0b5db6db42a14c88caada8
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bharath Vissapragada <[email protected]>
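As a rough illustration of the incremental approach described in the commit message, the Java sketch below compares each listed file's current modification time with the value recorded at the last load and returns only the files that are new or changed. This is a hypothetical sketch, not the code in this patch: the class IncrementalRefreshSketch, the ListedFile holder, and the filesNeedingReload() method are invented names, and the directory listing is simulated in memory instead of coming from an HDFS listStatus() call.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch (not Impala's HdfsTable code): pick the files whose
// block metadata must be reloaded by comparing modification times.
public class IncrementalRefreshSketch {

  // Stand-in for one entry of a directory listing, such as one returned by a
  // listStatus() call; only the fields this sketch needs.
  static final class ListedFile {
    final String name;
    final long modificationTime;

    ListedFile(String name, long modificationTime) {
      this.name = name;
      this.modificationTime = modificationTime;
    }
  }

  // Returns the names of files that are new or whose modification time differs
  // from the cached value. Unchanged files keep their cached block metadata.
  // Files deleted since the last load are not handled by this sketch.
  static List<String> filesNeedingReload(List<ListedFile> listing,
                                         Map<String, Long> cachedModTimes) {
    List<String> toReload = new ArrayList<>();
    for (ListedFile f : listing) {
      Long cached = cachedModTimes.get(f.name);
      if (cached == null || cached.longValue() != f.modificationTime) {
        toReload.add(f.name);
      }
    }
    return toReload;
  }

  public static void main(String[] args) {
    // Modification times recorded during the previous metadata load.
    Map<String, Long> cached = new HashMap<>();
    cached.put("part-0.parq", 1000L);
    cached.put("part-1.parq", 2000L);

    // Simulated listing of the partition directory at REFRESH time.
    List<ListedFile> listing = new ArrayList<>();
    listing.add(new ListedFile("part-0.parq", 1000L)); // unchanged
    listing.add(new ListedFile("part-1.parq", 2500L)); // rewritten
    listing.add(new ListedFile("part-2.parq", 3000L)); // newly added

    // Only part-1.parq and part-2.parq need their block locations fetched.
    System.out.println(filesNeedingReload(listing, cached));
  }
}
```

The point of the comparison is that a single directory listing is cheap relative to fetching block locations file by file, so unchanged files can skip the expensive step entirely.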
