[ https://issues.apache.org/jira/browse/HIVE-22474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Laszlo Pinter reassigned HIVE-22474:
------------------------------------

> Query based major compaction always creates only one bucket file
> ----------------------------------------------------------------
>
>                 Key: HIVE-22474
>                 URL: https://issues.apache.org/jira/browse/HIVE-22474
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Hive
>            Reporter: Laszlo Pinter
>            Assignee: Laszlo Pinter
>            Priority: Major
>
> {code:sql}
> set hive.execution.engine=mr;
> drop table if exists tbl2;
> create table tbl2 (a int, b int) clustered by (a) into 2 buckets stored as
> ORC TBLPROPERTIES('bucketing_version'='2', 'transactional'='true',
> 'compactorthreshold.hive.compactor.delta.num.threshold'='3');
> insert into tbl2 values(1,2),(1,3),(1,4),(2,2),(2,3),(2,4);
> insert into tbl2 values(3,2),(3,3),(3,4),(4,2),(4,3),(4,4);
> delete from tbl2 where b = 2;
> insert into tbl2 values(5,2),(5,3),(5,4),(6,2),(6,3),(6,4);
> delete from tbl2 where a = 1;
> {code}
> With the statements above, after the major compaction the base directory
> contains only one bucket file, although the table is clustered into 2
> buckets. Before the compaction runs, the delta directories contain the
> correct number of bucket files, and the data is split between them
> accordingly.
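>
> A minimal sketch of how the result might be checked after running the
> statements above. It assumes that query-based compaction is enabled through
> hive.compactor.crud.query.based and that the compaction is requested
> manually; the description does not say how compaction was configured or
> triggered, and the property may need to be set in the service configuration
> rather than in the session.
> {code:sql}
> -- Assumption: this property switches CRUD tables to query-based compaction.
> set hive.compactor.crud.query.based=true;
> -- Request a major compaction of the table from the reproduction.
> alter table tbl2 compact 'major';
> -- Compaction runs asynchronously; repeat until the request has finished.
> show compactions;
> -- List the files backing the remaining rows via the INPUT__FILE__NAME
> -- virtual column. Only files that still hold live rows appear here.
> select distinct INPUT__FILE__NAME from tbl2;
> {code}
> If the rows are spread over both buckets, the last query would be expected
> to return two distinct bucket files under the new base directory; a single
> file would match the behaviour reported here.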