[ https://issues.apache.org/jira/browse/IMPALA-12406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17770186#comment-17770186 ]

ASF subversion and git services commented on IMPALA-12406:
----------------------------------------------------------

Commit 2d3289027c2ffdd245d13b60e6fa3f9b3e7bf833 in impala's branch 
refs/heads/master from Noemi Pap-Takacs
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=2d3289027 ]

IMPALA-12406: OPTIMIZE statement as an alias for INSERT OVERWRITE

If an Iceberg table is frequently updated or written to in small batches,
many small files are created. This fragmentation decreases read performance.
Similarly, frequent row-level deletes contribute to this problem
by creating delete files, which have to be merged on read.

So far INSERT OVERWRITE (rewriting the table with itself) has been used
to compact Iceberg tables.
However, it comes with some RESTRICTIONS:
- The table should not have multiple partition specs/partition evolution.
- The table should not contain complex types.

The OPTIMIZE statement offers new syntax and an Iceberg-specific solution
for compacting tables, aimed at enhancing read performance for subsequent
operations. See IMPALA-12293 for details.

Syntax: OPTIMIZE TABLE <table_name>;

This first patch introduces the new syntax, temporarily as an alias
for INSERT OVERWRITE.
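For illustration, a minimal sketch of the new statement next to the
INSERT OVERWRITE it currently stands in for. The table name my_db.ice_tbl
is hypothetical, and the equivalence shown holds only for this first
patch, in which OPTIMIZE is a temporary alias.

```sql
-- Hypothetical Iceberg table; substitute your own database and table name.
OPTIMIZE TABLE my_db.ice_tbl;

-- In this first patch, the statement above is handled like rewriting
-- the table with itself, i.e. the INSERT OVERWRITE compaction workaround:
INSERT OVERWRITE TABLE my_db.ice_tbl SELECT * FROM my_db.ice_tbl;
```

Because OPTIMIZE is only an alias at this stage, the restrictions of
INSERT OVERWRITE listed above apply to it as well.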

Note that executing OPTIMIZE TABLE requires ALL privileges.

Testing:
 - negative tests
 - FE planner test
 - Ranger test
 - E2E tests

Change-Id: Ief42537499ffe64fafdefe25c8d175539234c4e7
Reviewed-on: http://gerrit.cloudera.org:8080/20405
Reviewed-by: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenk...@cloudera.com>


> OPTIMIZE statement as an alias for INSERT OVERWRITE
> ---------------------------------------------------
>
>                 Key: IMPALA-12406
>                 URL: https://issues.apache.org/jira/browse/IMPALA-12406
>             Project: IMPALA
>          Issue Type: Sub-task
>          Components: Frontend
>            Reporter: Noemi Pap-Takacs
>            Assignee: Noemi Pap-Takacs
>            Priority: Major
>              Labels: impala-iceberg
>
> If an Iceberg table is frequently updated or written to in small batches,
> many small files are created. This fragmentation decreases read performance.
> Similarly, frequent row-level deletes contribute to this problem by creating
> delete files, which have to be merged on read.
> Currently, INSERT OVERWRITE is used as a workaround to rewrite and compact
> Iceberg tables.
> The OPTIMIZE statement offers new syntax and an Iceberg-specific solution
> to this problem.
> This first subtask introduces the new syntax, temporarily as an alias for 
> INSERT OVERWRITE.
> Syntax:
> {code:sql}
> OPTIMIZE TABLE <table_name>;{code}
> Limitations: OPTIMIZE TABLE cannot be executed on the following tables:
>  * Tables with partition evolution
>  * Tables with complex type columns
>  * Non-Iceberg tables



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
