Zoltan Borok-Nagy has uploaded this change for review. ( http://gerrit.cloudera.org:8080/20271 )

Change subject: IMPALA-12298: Improve incremental load of Iceberg tables
......................................................................

IMPALA-12298: Improve incremental load of Iceberg tables

Currently Impala reloads the whole table with all its metadata when a
table is updated, even if no files were modified or only a few files
were added. This hurts performance for large tables, especially when
Hadoop RPC encryption is enabled. See HADOOP-14558 and HADOOP-10768
for details.

This patch adds an optimization that loads only the newly added files
if their number is under a threshold (currently 100). If there are
more files than the threshold, we fall back to the old behavior.

Testing:
 * added unit test
 * manually checked the TRACE logs of IcebergFileMetadataLoader

Change-Id: Icf643798a93e74ae7b0f37ceeab0a8052fb2699d
---
M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/FileMetadataLoader.java
M fe/src/main/java/org/apache/impala/catalog/IcebergFileMetadataLoader.java
M fe/src/main/java/org/apache/impala/catalog/IcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/iceberg/GroupedContentFiles.java
M fe/src/main/java/org/apache/impala/common/FileSystemUtil.java
M fe/src/test/java/org/apache/impala/catalog/FileMetadataLoaderTest.java
7 files changed, 193 insertions(+), 23 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/71/20271/1
--
To view, visit http://gerrit.cloudera.org:8080/20271
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Icf643798a93e74ae7b0f37ceeab0a8052fb2699d
Gerrit-Change-Number: 20271
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy <[email protected]>
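
For readers skimming this notification, the threshold rule described in the commit message can be sketched roughly as below. This is only an illustration, not the code from the patch: the class name, the method names, and the use of plain path strings are all hypothetical, and the behavior at exactly 100 new files is an assumption (the message says "under a threshold"). Only the threshold value of 100 comes from the commit message.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of the threshold rule; not the actual Impala code. */
final class IncrementalLoadSketch {
  // Value taken from the commit message ("currently 100"); the name is made up.
  static final int NEW_FILES_THRESHOLD = 100;

  /** Returns the snapshot paths that are not among the already loaded paths. */
  static List<String> findNewFiles(Set<String> loadedPaths, List<String> snapshotPaths) {
    List<String> newFiles = new ArrayList<>();
    for (String path : snapshotPaths) {
      if (!loadedPaths.contains(path)) newFiles.add(path);
    }
    return newFiles;
  }

  /**
   * Returns true if only the new files should be loaded, false if the caller
   * should fall back to a full reload. Treating exactly NEW_FILES_THRESHOLD
   * new files as a full reload is an assumption.
   */
  static boolean useIncrementalLoad(Set<String> loadedPaths, List<String> snapshotPaths) {
    return findNewFiles(loadedPaths, snapshotPaths).size() < NEW_FILES_THRESHOLD;
  }
}
```

In this sketch a table with two loaded files and one newly added file would take the incremental path, while a snapshot that adds 150 unknown files would trigger the full reload.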
