[jira] [Commented] (HIVE-22215) Compaction of sorted table

2023-09-15 Thread Jacques (Jira)


[ https://issues.apache.org/jira/browse/HIVE-22215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17765506#comment-17765506 ]

Jacques commented on HIVE-22215:


Is there any information or view on when compaction of SORTED tables will be supported?

> Compaction of sorted table
> --
>
> Key: HIVE-22215
> URL: https://issues.apache.org/jira/browse/HIVE-22215
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 3.1.0
>Reporter: Pawel Jurkiewicz
>Priority: Major
>
> I recently came across an issue with compacting sorted tables.
> I create two tables and populate them with test data. Both are ACID
> (insert-only), but only one is sorted:
> {code:sql}
> USE priv;
> DROP TABLE IF EXISTS test_data;
> DROP TABLE IF EXISTS test_compact_insert_with_sorting;
> DROP TABLE IF EXISTS test_compact_insert_without_sorting;
> CREATE TABLE test_data AS SELECT 'foobar' col;
> CREATE TABLE test_compact_insert_with_sorting (col string) 
> CLUSTERED BY (col) SORTED BY (col) INTO 1 BUCKETS
> TBLPROPERTIES ('transactional' = 'true', 
> 'transactional_properties'='insert_only');
> CREATE TABLE test_compact_insert_without_sorting (col string) 
> CLUSTERED BY (col) INTO 1 BUCKETS
> TBLPROPERTIES ('transactional' = 'true', 
> 'transactional_properties'='insert_only');
> INSERT OVERWRITE TABLE test_compact_insert_with_sorting SELECT col FROM test_data;
> INSERT OVERWRITE TABLE test_compact_insert_without_sorting SELECT col FROM test_data;
> INSERT OVERWRITE TABLE test_compact_insert_with_sorting SELECT col FROM test_data;
> INSERT OVERWRITE TABLE test_compact_insert_without_sorting SELECT col FROM test_data;
> {code}
> As expected, after these operations two base directories were created for
> each table:
> {code:bash}
> $ hdfs dfs -ls /warehouse/tablespace/managed/hive/priv.db/test_compact_insert*
> Found 2 items
> drwxrwx---+  - hive hadoop  0 2019-09-18 15:08 /warehouse/tablespace/managed/hive/priv.db/test_compact_insert_with_sorting/base_001
> drwxrwx---+  - hive hadoop  0 2019-09-18 15:08 /warehouse/tablespace/managed/hive/priv.db/test_compact_insert_with_sorting/base_002
> Found 2 items
> drwxrwx---+  - hive hadoop  0 2019-09-18 15:08 /warehouse/tablespace/managed/hive/priv.db/test_compact_insert_without_sorting/base_001
> drwxrwx---+  - hive hadoop  0 2019-09-18 15:08 /warehouse/tablespace/managed/hive/priv.db/test_compact_insert_without_sorting/base_002
> {code}
> However, after manually running a major compaction on both tables:
> {code:sql}
> USE priv;
> ALTER TABLE test_compact_insert_with_sorting COMPACT 'MAJOR';
> ALTER TABLE test_compact_insert_without_sorting COMPACT 'MAJOR';
> {code}
> It turns out that only the table without sorting was compacted:
> {code:bash}
> $ hdfs dfs -ls /warehouse/tablespace/managed/hive/priv.db/test_compact*
> Found 2 items
> drwxrwx---+  - hive hadoop  0 2019-09-18 15:08 /warehouse/tablespace/managed/hive/priv.db/test_compact_insert_with_sorting/base_001
> drwxrwx---+  - hive hadoop  0 2019-09-18 15:08 /warehouse/tablespace/managed/hive/priv.db/test_compact_insert_with_sorting/base_002
> Found 1 items
> drwxrwx---+  - hive hadoop  0 2019-09-18 15:08 /warehouse/tablespace/managed/hive/priv.db/test_compact_insert_without_sorting/base_002
> {code}
> Inspecting the compactions returns:
> {code:bash}
> $ beeline -e 'show compactions' | grep priv | grep test_compact
> | 7598474   | priv  | test_compact_insert_with_sorting     |  ---   | MAJOR  | succeeded  | master-01.pd.my-domain.com.pl-51  | 1568812155386  | 11   | None  |
> | 7598475   | priv  | test_compact_insert_without_sorting  |  ---   | MAJOR  | succeeded  |  ---                              | 1568812155403  | 298  | None  |
> {code}
> Is this by design? Both compactions are in the 'succeeded' state, but only
> the one that actually reduced the number of base directories took any
> noticeable time (298 versus 11 in the output above). Another notable
> behavior: the compaction of the sorted table has a worker assigned. Does
> that mean it is still in progress?
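Because a compaction can report 'succeeded' without rewriting anything, one way to check whether a MAJOR compaction actually took effect is to count the `base_*` directories per table in the `hdfs dfs -ls` listing. The sketch below is a hypothetical helper (`count_bases` is not part of Hive or Hadoop); it assumes the listing format shown above, with the full path as the last field of each line, and reads the listing from stdin.

```shell
#!/usr/bin/env bash
# count_bases: hypothetical helper, not part of Hive/Hadoop.
# Reads `hdfs dfs -ls` output on stdin and prints "<table> <count>" for every
# table directory that contains base_N subdirectories, sorted by table name.
# Assumes the path is the last whitespace-separated field of each listing line.
count_bases() {
  awk '
    $NF ~ /\/base_[0-9]+$/ {
      path = $NF
      sub(/\/base_[0-9]+$/, "", path)   # strip the trailing /base_N component
      n = split(path, parts, "/")       # last remaining component is the table dir
      count[parts[n]]++
    }
    END { for (t in count) print t, count[t] }
  ' | sort
}
```

For example, `hdfs dfs -ls /warehouse/tablespace/managed/hive/priv.db/test_compact* | count_bases` applied to the post-compaction listing above would report 2 for `test_compact_insert_with_sorting` and 1 for `test_compact_insert_without_sorting`, which makes the skipped compaction visible regardless of the reported state.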



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-22215) Compaction of sorted table

2019-09-19 Thread Rajkumar Singh (Jira)


[ https://issues.apache.org/jira/browse/HIVE-22215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16933545#comment-16933545 ]

Rajkumar Singh commented on HIVE-22215:
---

[~pasza] this is by design: Hive skips compaction for sorted bucketed tables.

[https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Worker.java#L151]
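Since the compactor skips tables that declare sort columns, it can help to check a table before relying on compaction. The sketch below is a hypothetical helper (`has_sort_columns` is not a Hive tool) that reads `DESCRIBE FORMATTED` output on stdin. It assumes that output contains a `Sort Columns:` row whose value is `[]` for an unsorted table and a bracketed list such as `[Order(col:col, order:1)]` for a sorted one; that layout is an assumption and may differ between Hive versions and beeline output formats.

```shell
#!/usr/bin/env bash
# has_sort_columns: hypothetical helper, not part of Hive.
# Reads `DESCRIBE FORMATTED <table>` output on stdin.
# Exits 0 if the "Sort Columns:" row holds a non-empty bracketed list,
# and 1 if the list is empty ("[]") or the row is missing.
# The row layout is an assumption and may vary between Hive versions.
has_sort_columns() {
  awk '
    /Sort Columns:/ {
      found = 1
      # non-empty list: an opening bracket is present but no empty "[]"
      sorted = (index($0, "[") > 0 && index($0, "[]") == 0)
    }
    END { exit !(found && sorted) }
  '
}
```

For example, `beeline -e 'DESCRIBE FORMATTED priv.test_compact_insert_with_sorting' | has_sort_columns && echo 'sorted table: compaction would be skipped'` would flag the sorted table from the report, under the format assumption above.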




--
This message was sent by Atlassian Jira
(v8.3.4#803005)