Naresh P R created HIVE-28213:
---------------------------------

             Summary: Incorrect results after insert-select from similar bucketed source & target table
                 Key: HIVE-28213
                 URL: https://issues.apache.org/jira/browse/HIVE-28213
             Project: Hive
          Issue Type: Bug
            Reporter: Naresh P R
         Attachments: test.q

Insert-select does not honor bucketing when both the source and target tables are bucketed on the same column.

e.g.,
{code:java}
CREATE EXTERNAL TABLE bucketing_table1 (id INT)
CLUSTERED BY (id)
SORTED BY (id ASC)
INTO 32 BUCKETS stored as textfile;

INSERT INTO TABLE bucketing_table1 VALUES (1), (2), (3), (4), (5);

CREATE EXTERNAL TABLE bucketing_table2 like bucketing_table1;

INSERT INTO TABLE bucketing_table2 select * from bucketing_table1;{code}
id=1 => murmur_hash(1) % 32 should go to the 29th bucket file.

bucketing_table1 has id=1 in the 29th file, but bucketing_table2 has no 29th file because the insert-select didn't honor the bucketing.
{code:java}
SELECT count(*) FROM bucketing_table1 WHERE id = 1;
===
1 // correct result

SELECT count(*) FROM bucketing_table2 WHERE id = 1;
===
0 // incorrect result{code}
Workaround: hive.tez.bucket.pruning=false;

PS: Attaching repro file [^test.q]
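
One way to check which data file each row actually landed in is Hive's INPUT__FILE__NAME virtual column, followed by re-running the failing query with bucket pruning disabled. This is only a sketch against the two tables defined above: the file names it prints depend on the Hive version and execution engine, so no output is shown, and the ORDER BY is added purely for readability.
{code:sql}
-- List the backing file for every row. In a correctly bucketed table,
-- each id should sit in the file that matches its bucket number.
SELECT INPUT__FILE__NAME, id FROM bucketing_table1 ORDER BY id;
SELECT INPUT__FILE__NAME, id FROM bucketing_table2 ORDER BY id;

-- Workaround from this report: disable bucket pruning for the session,
-- then re-run the query that returned 0.
SET hive.tez.bucket.pruning=false;
SELECT count(*) FROM bucketing_table2 WHERE id = 1;{code}
Comparing the two listings should show whether bucketing_table2 has its rows in different files than bucketing_table1.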