Impala Public Jenkins has submitted this change and it was merged.

Change subject: IMPALA-4840: Fix REFRESH performance regression.
......................................................................

IMPALA-4840: Fix REFRESH performance regression.

The fix for IMPALA-4172 introduced a performance regression in the REFRESH command. The regression stems from reloading the block metadata of every valid data file without considering whether the file has changed since the last load. This caused unnecessary metadata loads for unchanged files and thus increased the runtime.

The fix makes the refresh codepath (and other operations that use the same codepath, such as insert) reload the metadata of only the modified files by doing a listStatus() on the partition directory and checking the last modified time of each file. Without this patch, we relied on listFiles(), which fetched the block locations irrespective of whether the file had changed, and which was significantly slower on unchanged tables. The initial/invalidate metadata load still fetches the block locations in bulk using listFiles().

The side effect of this change is that refresh no longer picks up block location changes after HDFS block rebalancing. We suggest using "invalidate metadata" for that case, since it loads the metadata from scratch.

Additionally, this commit enables the reuse of metadata during table refresh (which was disabled in IMPALA-4172) to avoid reloading metadata from HMS every time.
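The modification-time check described above can be illustrated with a small, self-contained sketch. This is not the code from HdfsTable.java: the class name IncrementalRefreshSketch, the methods filesToReload() and filesToDrop(), and the plain name-to-mtime maps are all hypothetical stand-ins for the cached file descriptors and for the result of a listStatus() call. Treating files that vanished from the listing as entries to drop is also an assumption made for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch of an mtime-based incremental refresh.
 * Not Impala's actual implementation; all names here are illustrative.
 */
public class IncrementalRefreshSketch {

  /**
   * Returns the names of files whose block metadata should be reloaded:
   * files that are new, or whose last modified time differs from the cached one.
   *
   * @param cachedMtimes file name -> last modified time seen at the previous load
   * @param listedMtimes file name -> last modified time from a fresh directory listing
   */
  public static List<String> filesToReload(
      Map<String, Long> cachedMtimes, Map<String, Long> listedMtimes) {
    List<String> reload = new ArrayList<>();
    for (Map.Entry<String, Long> entry : listedMtimes.entrySet()) {
      Long cachedMtime = cachedMtimes.get(entry.getKey());
      // equals() rather than == so that boxed Long values compare by value.
      if (cachedMtime == null || !cachedMtime.equals(entry.getValue())) {
        reload.add(entry.getKey());
      }
    }
    Collections.sort(reload); // deterministic order, independent of map type
    return reload;
  }

  /**
   * Returns the names of cached files that no longer appear in the listing
   * (assumed here to mean the file was deleted and its metadata can be dropped).
   */
  public static List<String> filesToDrop(
      Map<String, Long> cachedMtimes, Map<String, Long> listedMtimes) {
    List<String> drop = new ArrayList<>();
    for (String name : cachedMtimes.keySet()) {
      if (!listedMtimes.containsKey(name)) {
        drop.add(name);
      }
    }
    Collections.sort(drop);
    return drop;
  }
}
```

In this sketch an unchanged file costs only a map lookup, which is the point of the patch: the expensive block-location fetch is reserved for files that filesToReload() returns. It also shows why a block rebalance goes unnoticed, because moving blocks does not change a file's last modified time.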
Change-Id: I859b9fe93563ba886d0b5db6db42a14c88caada8
Reviewed-on: http://gerrit.cloudera.org:8080/6009
Reviewed-by: Dimitris Tsirogiannis <[email protected]>
Tested-by: Impala Public Jenkins
---
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
2 files changed, 110 insertions(+), 32 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Dimitris Tsirogiannis: Looks good to me, approved

--
To view, visit http://gerrit.cloudera.org:8080/6009
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I859b9fe93563ba886d0b5db6db42a14c88caada8
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bharath Vissapragada <[email protected]>
Gerrit-Reviewer: Alex Behm <[email protected]>
Gerrit-Reviewer: Bharath Vissapragada <[email protected]>
Gerrit-Reviewer: Dimitris Tsirogiannis <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins
