[ 
https://issues.apache.org/jira/browse/IMPALA-12867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on IMPALA-12867 started by Noemi Pap-Takacs.
-------------------------------------------------
> Filter files to OPTIMIZE based on file size
> -------------------------------------------
>
>                 Key: IMPALA-12867
>                 URL: https://issues.apache.org/jira/browse/IMPALA-12867
>             Project: IMPALA
>          Issue Type: Sub-task
>            Reporter: Noemi Pap-Takacs
>            Assignee: Noemi Pap-Takacs
>            Priority: Major
>              Labels: impala-iceberg
>
> {{OPTIMIZE TABLE <table_name>}} rewrites all files of the table regardless
> of size and type, even if the table does not contain any small files or
> delete files.
> With the {{FILE_SIZE_THRESHOLD}} option, the user should be able to specify
> a file size limit so that only small files are rewritten.
> Syntax:
> {code:sql}
> OPTIMIZE TABLE <table_name> (FILE_SIZE_THRESHOLD=100);
> {code}
> The threshold value is a file size in MB. Data files larger than
> the given limit will only be rewritten if they are referenced from delete
> deltas.
> Note that if {{FILE_SIZE_THRESHOLD}} is set, only the selected files
> are rewritten according to the latest schema and partition spec.
> Therefore the untouched data files might still have an older schema or
> partition layout. Use {{OPTIMIZE TABLE <table_name>}} without the option to
> rewrite the entire table according to the latest schema and partition layout.
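
The selection rule described in the issue can be sketched roughly as follows. This is only an illustration of the stated behaviour, not Impala's implementation: the class FileSizeThresholdFilter, the DataFile type and the selectForRewrite method are hypothetical names, and both the 1 MB = 1024 * 1024 bytes conversion and the treatment of a file exactly at the limit are assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;

// Hypothetical sketch; none of these names come from Impala or Iceberg.
public class FileSizeThresholdFilter {

    // Minimal stand-in for a data file entry: a path and a size in bytes.
    public static final class DataFile {
        public final String path;
        public final long sizeBytes;

        public DataFile(String path, long sizeBytes) {
            this.path = path;
            this.sizeBytes = sizeBytes;
        }
    }

    // Assumption: the threshold is interpreted in binary megabytes.
    private static final long BYTES_PER_MB = 1024L * 1024L;

    // Selects the files a thresholded OPTIMIZE would rewrite:
    //  - files not larger than the limit (boundary handling is an assumption),
    //  - plus larger files that are referenced by delete files.
    public static List<DataFile> selectForRewrite(
            List<DataFile> dataFiles, Set<String> pathsWithDeletes, long thresholdMb) {
        long thresholdBytes = thresholdMb * BYTES_PER_MB;
        List<DataFile> selected = new ArrayList<>();
        for (DataFile file : dataFiles) {
            boolean notLargerThanLimit = file.sizeBytes <= thresholdBytes;
            boolean hasDeletes = pathsWithDeletes.contains(file.path);
            if (notLargerThanLimit || hasDeletes) {
                selected.add(file);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        List<DataFile> files = Arrays.asList(
                new DataFile("small.parquet", 10L * BYTES_PER_MB),
                new DataFile("large.parquet", 500L * BYTES_PER_MB),
                new DataFile("large_with_deletes.parquet", 500L * BYTES_PER_MB));
        Set<String> withDeletes = Collections.singleton("large_with_deletes.parquet");

        // Mirrors FILE_SIZE_THRESHOLD=100 from the syntax example.
        for (DataFile file : selectForRewrite(files, withDeletes, 100)) {
            System.out.println(file.path);
        }
    }
}
```

Files left out of the selection keep whatever schema and partition layout they were written with, which is why a plain OPTIMIZE TABLE is still needed to bring the whole table up to date.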



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
