Hello Zoltan Borok-Nagy, Noemi Pap-Takacs, Impala Public Jenkins,
I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/23042
to look at the new patch set (#5).
Change subject: IMPALA-12337: Implement delete orphan files for Iceberg table
......................................................................
IMPALA-12337: Implement delete orphan files for Iceberg table
This patch implements a query that deletes orphan files from an Iceberg table.
The following statement becomes available for Iceberg tables:
- ALTER TABLE <tbl> EXECUTE remove_orphan_files(<timestamp>)
The bulk of the implementation is copied from Hive's implementation of
the org.apache.iceberg.actions.DeleteOrphanFiles interface (HIVE-27906),
which this patch renames to ImpalaIcebergDeleteOrphanFiles.java. Upon
execute(), the ImpalaIcebergDeleteOrphanFiles instance gathers the URIs
of all valid data files and Iceberg metadata files using the Iceberg
API. These valid URIs are then compared against recursive file listings
obtained through the Hadoop FileSystem API under the table's 'data' and
'metadata' directories, respectively. Any URI from the FileSystem
listing that has no match and whose modification time is less than the
'olderThanTimestamp' parameter is then removed via the Iceberg FileIO
API of the given Iceberg table.
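The matching rule described above can be sketched roughly as follows.
This is an illustration only, not the code in the patch: ListedFile and
find_orphan_files are hypothetical names, the example URIs are made up,
and URIs are compared as exact strings, which may differ from how the
patch normalizes them.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class ListedFile:
    """One entry from a recursive file listing (hypothetical type)."""
    uri: str
    modification_time_ms: int


def find_orphan_files(valid_uris: Set[str],
                      listed_files: Iterable[ListedFile],
                      older_than_ms: int) -> List[str]:
    """Return listed URIs that the table does not reference and whose
    modification time is strictly less than older_than_ms."""
    return sorted(
        f.uri for f in listed_files
        if f.uri not in valid_uris and f.modification_time_ms < older_than_ms
    )


# Made-up inputs: one referenced file, one stale orphan, and one
# unreferenced file that is too recent to be removed.
valid = {"s3a://bkt/tbl/data/a.parquet"}
listing = [
    ListedFile("s3a://bkt/tbl/data/a.parquet", 1_000),
    ListedFile("s3a://bkt/tbl/data/stale.parquet", 1_000),
    ListedFile("s3a://bkt/tbl/data/in_flight.parquet", 9_000),
]
orphans = find_orphan_files(valid, listing, older_than_ms=5_000)
```

The modification-time cutoff keeps files that a concurrent writer may
have created but not yet committed out of the deletion set.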
The execution happens in CatalogD via
IcebergCatalogOpExecutor.alterTableExecuteRemoveOrphanFiles(). CatalogD
supplies CatalogOpExecutor.icebergExecutorService_ as the executor
service that runs the Iceberg planFiles API and the FileIO API for
deletion.
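As a rough sketch of what running deletions on a caller-supplied
executor looks like, the snippet below submits one delete task per URI
to a shared thread pool. It is a Python analogy, not the Java code in
the patch: delete_all and fake_delete are hypothetical names, and
collecting failures rather than aborting is an assumption.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List


def delete_all(uris: Iterable[str],
               delete_fn: Callable[[str], None],
               executor: ThreadPoolExecutor) -> List[str]:
    """Submit one delete per URI to the shared executor and return the
    URIs whose delete raised an exception."""
    futures = [(uri, executor.submit(delete_fn, uri)) for uri in uris]
    failed = []
    for uri, future in futures:
        # exception() blocks until the task has finished.
        if future.exception() is not None:
            failed.append(uri)
    return failed


deleted = []


def fake_delete(uri: str) -> None:
    """Hypothetical stand-in for a FileIO delete call."""
    if uri.endswith("bad"):
        raise IOError("cannot delete " + uri)
    deleted.append(uri)


with ThreadPoolExecutor(max_workers=2) as pool:
    failed = delete_all(["f1", "f2", "bad"], fake_delete, pool)
```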
Note that after remove_orphan_files is executed, a new metadata.json is
created with the same snapshot id, but with a new "last-updated-ms".
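A test could check this property along the following lines. This is a
sketch under assumptions: only_timestamp_changed is a hypothetical
helper, "snapshot id" is taken to mean the "current-snapshot-id" field
of the table metadata, and the sample JSON values are made up.

```python
import json


def only_timestamp_changed(before_json: str, after_json: str) -> bool:
    """True if both metadata documents point at the same current snapshot
    and the second one has a later "last-updated-ms"."""
    before = json.loads(before_json)
    after = json.loads(after_json)
    return (before["current-snapshot-id"] == after["current-snapshot-id"]
            and after["last-updated-ms"] > before["last-updated-ms"])


# Made-up metadata fragments; real metadata.json files have many more fields.
old_md = json.dumps({"current-snapshot-id": 42, "last-updated-ms": 1000})
new_md = json.dumps({"current-snapshot-id": 42, "last-updated-ms": 2000})
result = only_timestamp_changed(old_md, new_md)
```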
Testing:
- Add FE and EE tests.
Change-Id: I5979cdf15048d5a2c4784918533f65f32e888de0
---
M common/thrift/JniCatalog.thrift
A fe/src/main/java/org/apache/impala/analysis/AlterTableExecuteRemoveOrphanFilesStmt.java
M fe/src/main/java/org/apache/impala/analysis/AlterTableExecuteStmt.java
A fe/src/main/java/org/apache/impala/catalog/iceberg/ImpalaIcebergDeleteOrphanFiles.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
A testdata/workloads/functional-query/queries/QueryTest/iceberg-remove-orphan-negative.test
M tests/query_test/test_iceberg.py
9 files changed, 559 insertions(+), 3 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/42/23042/5
--
To view, visit http://gerrit.cloudera.org:8080/23042
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I5979cdf15048d5a2c4784918533f65f32e888de0
Gerrit-Change-Number: 23042
Gerrit-PatchSet: 5
Gerrit-Owner: Riza Suminto <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Noemi Pap-Takacs <[email protected]>
Gerrit-Reviewer: Riza Suminto <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>