Noemi Pap-Takacs has uploaded a new patch set (#10). ( http://gerrit.cloudera.org:8080/20866 )

Change subject: IMPALA-12412: Support partition evolution in OPTIMIZE statement
......................................................................

IMPALA-12412: Support partition evolution in OPTIMIZE statement

The OPTIMIZE statement is used to execute table maintenance tasks
on Iceberg tables, such as:
 1. compacting small files,
 2. merging delete deltas,
 3. rewriting the table according to the latest schema
    and partition spec.

OptimizeStmt used to serve as an alias for INSERT OVERWRITE.
After this change it works as follows: it creates a source statement
that contains all columns of the table. All table content is
rewritten to new data files. After the executors finish writing,
the Catalog calls the Iceberg RewriteFiles API to commit the changes.
All previous data and delete files are excluded from the next
snapshot, and all newly written data files are added to it.
The old files remain accessible via time travel
to older snapshots of the table.
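
The snapshot semantics described above can be sketched with a small
in-memory model. This is only an illustration: ToyTable, commitRewrite
and filesAt are hypothetical names, not Impala or Iceberg classes, and
the sketch tracks file names only, without any real I/O or metadata.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Toy model only: hypothetical names, not Impala or Iceberg code.
final class ToyTable {
  // Each snapshot is an immutable set of live file names; the list index
  // doubles as the snapshot id.
  private final List<Set<String>> snapshots = new ArrayList<>();

  ToyTable(Set<String> initialFiles) {
    snapshots.add(Set.copyOf(initialFiles));
  }

  // Commits a rewrite: the replaced files are left out of the next snapshot
  // and the new files are added to it. Older snapshots are not modified.
  int commitRewrite(Set<String> filesToRemove, Set<String> filesToAdd) {
    Set<String> next = new LinkedHashSet<>(currentFiles());
    if (!next.containsAll(filesToRemove)) {
      throw new IllegalStateException("rewrite refers to files not in the current snapshot");
    }
    next.removeAll(filesToRemove);
    next.addAll(filesToAdd);
    snapshots.add(Set.copyOf(next));
    return snapshots.size() - 1;
  }

  Set<String> currentFiles() {
    return snapshots.get(snapshots.size() - 1);
  }

  // "Time travel": returns the file set of an older snapshot.
  Set<String> filesAt(int snapshotId) {
    return snapshots.get(snapshotId);
  }

  public static void main(String[] args) {
    ToyTable t = new ToyTable(Set.of("data-1", "data-2", "delete-1"));
    // Replace every file of the current snapshot with one rewritten data file.
    int newId = t.commitRewrite(t.currentFiles(), Set.of("optimized-1"));
    System.out.println("current: " + t.currentFiles());
    System.out.println("previous snapshot still has " + t.filesAt(newId - 1).size() + " files");
  }
}
```

In this model the old data and delete files disappear from the current
snapshot but stay reachable through the earlier snapshot id, which is
the behaviour the commit message describes for time travel.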

By default, Impala has as many file writers as query fragment instances
and can therefore write too many files for unpartitioned tables.
For smaller tables this can be limited by setting the
MAX_FS_WRITERS query option.
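
As a rough illustration of the effect of that cap, the arithmetic can be
sketched as follows. WriterEstimate and effectiveWriters are hypothetical
helpers, not planner code, and the sketch assumes that a MAX_FS_WRITERS
value of 0 means "no limit".

```java
// Illustrative arithmetic only: a hypothetical helper, not Impala planner code.
final class WriterEstimate {
  // Returns the number of writer instances after applying the cap.
  // Assumption: maxFsWriters <= 0 means the option is unset (no limit).
  static int effectiveWriters(int fragmentInstances, int maxFsWriters) {
    if (fragmentInstances < 1) {
      throw new IllegalArgumentException("fragmentInstances must be positive");
    }
    return maxFsWriters > 0 ? Math.min(fragmentInstances, maxFsWriters) : fragmentInstances;
  }

  public static void main(String[] args) {
    System.out.println(effectiveWriters(40, 0)); // prints 40: one writer per instance
    System.out.println(effectiveWriters(40, 4)); // prints 4: capped by the option
  }
}
```

For an unpartitioned table, each writer that receives rows produces at
least one file, so lowering the writer count lowers the minimum number
of output files accordingly.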

Authorization: OPTIMIZE TABLE requires ALL privileges.

Limitations:
All limitations on writing Iceberg tables apply.

Testing:
 - E2E tests:
     - schema evolution
     - partition evolution
     - UPDATE/DELETE
     - time travel
     - table history
 - negative tests
 - Ranger tests for authorization
 - FE: Planner test:
     - sorting order
     - MAX_FS_WRITERS
     - partitioned exchange
 - FE: Parser test
Change-Id: I65a0c8529d274afff38ccd582f1b8a857716b1b5
---
M be/src/service/client-request-state.cc
M common/thrift/Types.thrift
M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java
M fe/src/main/java/org/apache/impala/analysis/IcebergUpdateImpl.java
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
M fe/src/main/java/org/apache/impala/analysis/OptimizeStmt.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/planner/PlannerContext.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/util/IcebergUtil.java
M fe/src/test/java/org/apache/impala/analysis/ParserTest.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
A testdata/workloads/functional-planner/queries/PlannerTest/iceberg-optimize.test
M testdata/workloads/functional-planner/queries/PlannerTest/insert-sort-by-zorder.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-optimize.test
M testdata/workloads/functional-query/queries/QueryTest/ranger_column_masking.test
M tests/query_test/test_iceberg.py
20 files changed, 639 insertions(+), 266 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/66/20866/10
--
To view, visit http://gerrit.cloudera.org:8080/20866
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I65a0c8529d274afff38ccd582f1b8a857716b1b5
Gerrit-Change-Number: 20866
Gerrit-PatchSet: 10
Gerrit-Owner: Noemi Pap-Takacs <npaptak...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <daniel.bec...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Noemi Pap-Takacs <npaptak...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com>
