Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/20271 )
Change subject: IMPALA-12298: Improve incremental load of Iceberg tables
......................................................................

IMPALA-12298: Improve incremental load of Iceberg tables

Currently Impala reloads the whole table with all its metadata when a
table is updated, even if no files were modified or only a few files
were added. This hurts performance for large tables, especially when
Hadoop RPC encryption is enabled. See HADOOP-14558 and HADOOP-10768
for details.

This patch adds an optimization to load only the newly added files
if their number is under a threshold. The threshold can be set by the
backend flag 'iceberg_reload_new_files_threshold' (100 by default).
If there are more files than the threshold, we fall back to the old
behavior.

Testing:
 * added unit test
 * manually checked the TRACE logs of IcebergFileMetadataLoader

Change-Id: Icf643798a93e74ae7b0f37ceeab0a8052fb2699d
Reviewed-on: http://gerrit.cloudera.org:8080/20271
Reviewed-by: Impala Public Jenkins <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>
---
M be/src/common/global-flags.cc
M be/src/util/backend-gflag-util.cc
M common/thrift/BackendGflags.thrift
M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/FileMetadataLoader.java
M fe/src/main/java/org/apache/impala/catalog/IcebergFileMetadataLoader.java
M fe/src/main/java/org/apache/impala/catalog/IcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/iceberg/GroupedContentFiles.java
M fe/src/main/java/org/apache/impala/common/FileSystemUtil.java
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
M fe/src/test/java/org/apache/impala/catalog/FileMetadataLoaderTest.java
11 files changed, 393 insertions(+), 30 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

-- 
To view, visit http://gerrit.cloudera.org:8080/20271
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Icf643798a93e74ae7b0f37ceeab0a8052fb2699d
Gerrit-Change-Number: 20271
Gerrit-PatchSet: 7
Gerrit-Owner: Zoltan Borok-Nagy <[email protected]>
Gerrit-Reviewer: Gabor Kaszab <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
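
For readers skimming this notification, the threshold-based fallback described in the commit message can be sketched roughly as below. This is an illustrative sketch only, not the code merged in IcebergFileMetadataLoader: the class name, method names, and the use of plain path strings are all hypothetical, and treating a count exactly equal to the threshold as "incremental" is an assumption based on the wording "more files than the threshold".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch; names and types are NOT taken from the Impala source. */
final class IncrementalReloadSketch {
  /** Same value as the stated default of 'iceberg_reload_new_files_threshold'. */
  static final int DEFAULT_THRESHOLD = 100;

  /** Returns the snapshot paths whose metadata is not cached yet, in order. */
  static List<String> newFiles(Set<String> cachedPaths, List<String> snapshotPaths) {
    List<String> added = new ArrayList<>();
    for (String path : snapshotPaths) {
      if (!cachedPaths.contains(path)) added.add(path);
    }
    return added;
  }

  /**
   * Decides which files need their metadata loaded.
   * At most 'threshold' new files: load only those (incremental path).
   * More than 'threshold' new files: load every file (old, full-reload path).
   */
  static List<String> filesToLoad(
      Set<String> cachedPaths, List<String> snapshotPaths, int threshold) {
    List<String> added = newFiles(cachedPaths, snapshotPaths);
    if (added.size() <= threshold) return added;  // assumed boundary: <= threshold
    return new ArrayList<>(snapshotPaths);        // fall back to the full reload
  }
}
```

With two cached files and one new file, `filesToLoad` under the default threshold returns only the new file; a threshold of 0 makes the same call return every file in the snapshot, which corresponds to the old behavior.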
