[
https://issues.apache.org/jira/browse/HIVE-22215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16933545#comment-16933545
]
Rajkumar Singh commented on HIVE-22215:
---
[~pasza] This is by design: Hive skips compaction for sorted bucketed tables. See
[https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Worker.java#L151]
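A quick way to confirm that a table falls under this rule is to look at its sort columns in the table metadata. This is a minimal sketch, assuming the priv database and the table names from the report quoted below; as far as I can tell from the linked line, a non-empty sort column list is the condition the compactor worker checks.
{code:sql}
USE priv;
-- The "Sort Columns" row is non-empty for a table declared with SORTED BY.
DESCRIBE FORMATTED test_compact_insert_with_sorting;
-- For comparison, the table declared without SORTED BY should show an empty list.
DESCRIBE FORMATTED test_compact_insert_without_sorting;
{code}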
> Compaction of sorted table
> --
>
> Key: HIVE-22215
> URL: https://issues.apache.org/jira/browse/HIVE-22215
> Project: Hive
> Issue Type: Bug
> Components: Hive
>Affects Versions: 3.1.0
>Reporter: Pawel Jurkiewicz
>Priority: Major
>
> I recently came across an issue with compacting sorted tables.
> I create and populate two tables with test data: both are ACID, but only
> one is sorted.
> {code:sql}
> USE priv;
> DROP TABLE IF EXISTS test_data;
> DROP TABLE IF EXISTS test_compact_insert_with_sorting;
> DROP TABLE IF EXISTS test_compact_insert_without_sorting;
> CREATE TABLE test_data AS SELECT 'foobar' col;
> CREATE TABLE test_compact_insert_with_sorting (col string)
> CLUSTERED BY (col) SORTED BY (col) INTO 1 BUCKETS
> TBLPROPERTIES ('transactional' = 'true',
> 'transactional_properties'='insert_only');
> CREATE TABLE test_compact_insert_without_sorting (col string)
> CLUSTERED BY (col) INTO 1 BUCKETS
> TBLPROPERTIES ('transactional' = 'true',
> 'transactional_properties'='insert_only');
> INSERT OVERWRITE TABLE test_compact_insert_with_sorting
> SELECT col FROM test_data;
> INSERT OVERWRITE TABLE test_compact_insert_without_sorting
> SELECT col FROM test_data;
> INSERT OVERWRITE TABLE test_compact_insert_with_sorting
> SELECT col FROM test_data;
> INSERT OVERWRITE TABLE test_compact_insert_without_sorting
> SELECT col FROM test_data;
> {code}
> As expected, after these operations two base files were created for each
> table:
> {code:bash}
> $ hdfs dfs -ls /warehouse/tablespace/managed/hive/priv.db/test_compact_insert*
> Found 2 items
> drwxrwx---+ - hive hadoop 0 2019-09-18 15:08
> /warehouse/tablespace/managed/hive/priv.db/test_compact_insert_with_sorting/base_001
> drwxrwx---+ - hive hadoop 0 2019-09-18 15:08
> /warehouse/tablespace/managed/hive/priv.db/test_compact_insert_with_sorting/base_002
> Found 2 items
> drwxrwx---+ - hive hadoop 0 2019-09-18 15:08
> /warehouse/tablespace/managed/hive/priv.db/test_compact_insert_without_sorting/base_001
> drwxrwx---+ - hive hadoop 0 2019-09-18 15:08
> /warehouse/tablespace/managed/hive/priv.db/test_compact_insert_without_sorting/base_002
> {code}
> But after running manual compaction on those tables:
> {code:sql}
> USE priv;
> ALTER TABLE test_compact_insert_with_sorting COMPACT 'MAJOR';
> ALTER TABLE test_compact_insert_without_sorting COMPACT 'MAJOR';
> {code}
> It turns out that only the table without sorting got compacted:
> {code:bash}
> hdfs dfs -ls /warehouse/tablespace/managed/hive/priv.db/test_compact*
> Found 2 items
> drwxrwx---+ - hive hadoop 0 2019-09-18 15:08
> /warehouse/tablespace/managed/hive/priv.db/test_compact_insert_with_sorting/base_001
> drwxrwx---+ - hive hadoop 0 2019-09-18 15:08
> /warehouse/tablespace/managed/hive/priv.db/test_compact_insert_with_sorting/base_002
> Found 1 items
> drwxrwx---+ - hive hadoop 0 2019-09-18 15:08
> /warehouse/tablespace/managed/hive/priv.db/test_compact_insert_without_sorting/base_002
> {code}
> Inspecting the compactions returns:
> {code:bash}
> $ beeline -e 'show compactions' | grep priv | grep test_compact
> | 7598474 | priv | test_compact_insert_with_sorting | --- | MAJOR | succeeded | master-01.pd.my-domain.com.pl-51 | 1568812155386 | 11 | None |
> | 7598475 | priv | test_compact_insert_without_sorting | --- | MAJOR | succeeded | --- | 1568812155403 | 298 | None |
> {code}
> Is this by design? Both compactions are in state 'succeeded', but only the one
> that reduced the number of base files took any noticeable time. Another
> notable behavior: the compaction of the sorted table still has a worker
> assigned. Does that mean it is still in progress?
--
This message was sent by Atlassian Jira
(v8.3.4#803005)