Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/20271 )

Change subject: IMPALA-12298: Improve incremental load of Iceberg tables
......................................................................

IMPALA-12298: Improve incremental load of Iceberg tables

Currently Impala reloads the whole table with all its metadata
when a table is updated, even if no files were modified or
only a few files were added. This hurts performance for large tables,
especially when Hadoop RPC encryption is enabled. See HADOOP-14558 and
HADOOP-10768 for details.

This patch adds an optimization to only load the newly added files
if their number is under a threshold. The threshold can be set by
the backend flag 'iceberg_reload_new_files_threshold' (100 by default).
If there are more new files than the threshold, we fall back to the old
behavior.
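
The threshold decision described above can be sketched roughly as follows.
This is only an illustration, not the code in IcebergFileMetadataLoader:
the class name, the FileLoader interface and the string-valued metadata
map are hypothetical stand-ins, and treating a count exactly equal to the
threshold as "incremental" is an assumption, since the description leaves
that boundary open.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only; none of these names come from the Impala code base.
class IncrementalReloadSketch {
  // Same value as the documented default of 'iceberg_reload_new_files_threshold'.
  static final int DEFAULT_THRESHOLD = 100;

  /** Hypothetical stand-in for an expensive per-file metadata fetch. */
  interface FileLoader {
    String load(String path);
  }

  /** Returns the paths in the current snapshot that are missing from the cache. */
  static List<String> newFiles(Map<String, String> cached, List<String> current) {
    List<String> added = new ArrayList<>();
    for (String path : current) {
      if (!cached.containsKey(path)) added.add(path);
    }
    return added;
  }

  /**
   * Builds the metadata map for the current snapshot. Cached entries are reused
   * when the number of new files is at most 'threshold' (assumed boundary);
   * otherwise every file is loaded again, as in the old behavior.
   */
  static Map<String, String> reload(Map<String, String> cached, List<String> current,
      int threshold, FileLoader loader) {
    Map<String, String> result = new HashMap<>();
    boolean fullReload = newFiles(cached, current).size() > threshold;
    for (String path : current) {
      String meta = fullReload ? null : cached.get(path);
      // Only call the loader for files we cannot (or must not) serve from cache.
      result.put(path, meta != null ? meta : loader.load(path));
    }
    return result;
  }

  public static void main(String[] args) {
    Map<String, String> cached = new HashMap<>();
    cached.put("a.parquet", "meta-a");
    List<String> current = Arrays.asList("a.parquet", "b.parquet");
    int[] loads = {0};
    reload(cached, current, DEFAULT_THRESHOLD, p -> { loads[0]++; return "meta-" + p; });
    System.out.println("files loaded: " + loads[0]);
  }
}
```

Files that are no longer part of the current snapshot drop out of the
result in both paths, because the map is rebuilt from the current file
list rather than patched in place.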

Testing:
 * added unit test
 * manually checked the TRACE logs of IcebergFileMetadataLoader

Change-Id: Icf643798a93e74ae7b0f37ceeab0a8052fb2699d
Reviewed-on: http://gerrit.cloudera.org:8080/20271
Reviewed-by: Impala Public Jenkins <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>
---
M be/src/common/global-flags.cc
M be/src/util/backend-gflag-util.cc
M common/thrift/BackendGflags.thrift
M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/FileMetadataLoader.java
M fe/src/main/java/org/apache/impala/catalog/IcebergFileMetadataLoader.java
M fe/src/main/java/org/apache/impala/catalog/IcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/iceberg/GroupedContentFiles.java
M fe/src/main/java/org/apache/impala/common/FileSystemUtil.java
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
M fe/src/test/java/org/apache/impala/catalog/FileMetadataLoaderTest.java
11 files changed, 393 insertions(+), 30 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

--
To view, visit http://gerrit.cloudera.org:8080/20271
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Icf643798a93e74ae7b0f37ceeab0a8052fb2699d
Gerrit-Change-Number: 20271
Gerrit-PatchSet: 7
Gerrit-Owner: Zoltan Borok-Nagy <[email protected]>
Gerrit-Reviewer: Gabor Kaszab <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
