[ https://issues.apache.org/jira/browse/IMPALA-12867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Work on IMPALA-12867 started by Noemi Pap-Takacs.
-------------------------------------------------
> Filter files to OPTIMIZE based on file size
> -------------------------------------------
>
>                 Key: IMPALA-12867
>                 URL: https://issues.apache.org/jira/browse/IMPALA-12867
>             Project: IMPALA
>          Issue Type: Sub-task
>            Reporter: Noemi Pap-Takacs
>            Assignee: Noemi Pap-Takacs
>            Priority: Major
>              Labels: impala-iceberg
>
> {{'OPTIMIZE TABLE <table_name>'}} rewrites all files of the table regardless
> of size and type, even if the table does not contain any small files or
> delete files.
> With the {{'FILE_SIZE_THRESHOLD'}} option, the user should be able to specify
> a file size limit so that only small files are rewritten.
> {code:java}
> Syntax: OPTIMIZE TABLE <table_name> (FILE_SIZE_THRESHOLD=100);{code}
> The threshold value is a file size in MB. Data files larger than
> the given limit will only be rewritten if they are referenced from delete
> deltas.
> Note that if {{'FILE_SIZE_THRESHOLD'}} is set, only the selected files
> will be rewritten according to the latest schema and partition spec.
> Therefore the untouched data files might still have an older schema or
> partition layout. Use {{'OPTIMIZE TABLE <table_name>'}} without the option to
> rewrite the entire table according to the latest schema and partition layout.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
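
The selection rule described in the issue can be illustrated with a minimal Python sketch. It is not Impala's implementation: the `DataFile` type, the `select_files_to_rewrite` helper, the 1 MB = 2^20 bytes conversion, and the treatment of a file exactly at the limit (kept, not rewritten) are all assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import List, Optional

# Assumption: the issue only says "MB"; 2**20 bytes per MB is a guess.
BYTES_PER_MB = 1024 * 1024


@dataclass(frozen=True)
class DataFile:
    """Hypothetical stand-in for an Iceberg data file entry."""
    path: str
    size_bytes: int
    has_deletes: bool  # True if any delete file references this data file


def select_files_to_rewrite(files: List[DataFile],
                            threshold_mb: Optional[int] = None) -> List[DataFile]:
    """Return the files an OPTIMIZE run would rewrite under the described rule."""
    if threshold_mb is None:
        # No FILE_SIZE_THRESHOLD given: the whole table is rewritten.
        return list(files)
    limit = threshold_mb * BYTES_PER_MB
    # Small files are rewritten; larger files only if deletes reference them.
    # Assumption: a file exactly at the limit is left untouched.
    return [f for f in files if f.size_bytes < limit or f.has_deletes]


files = [
    DataFile("a.parquet", 10 * BYTES_PER_MB, has_deletes=False),   # small
    DataFile("b.parquet", 200 * BYTES_PER_MB, has_deletes=False),  # large, clean
    DataFile("c.parquet", 200 * BYTES_PER_MB, has_deletes=True),   # large, has deletes
]
selected = [f.path for f in select_files_to_rewrite(files, threshold_mb=100)]
print(selected)  # ['a.parquet', 'c.parquet']
```

In this sketch `b.parquet` is the "intact" file from the note above: it is skipped by the thresholded run, so it would keep whatever schema and partition layout it was written with until a full rewrite is run.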