Rui Li created HIVE-7956:
----------------------------

    Summary: When inserting into a bucketed table, all data goes to a single bucket [Spark Branch]
        Key: HIVE-7956
        URL: https://issues.apache.org/jira/browse/HIVE-7956
    Project: Hive
 Issue Type: Bug
 Components: Spark
   Reporter: Rui Li

I created a bucketed table:
{code}
create table testBucket(x int,y string) clustered by(x) into 10 buckets;
{code}
Then I ran a query like:
{code}
set hive.enforce.bucketing = true;
insert overwrite table testBucket select intCol,stringCol from src;
{code}
Here {{src}} is a simple textfile-based table containing 40000000 records (not bucketed). The query launches 10 reduce tasks, but all the data goes to only one of them.
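
For reference, here is a rough sketch of the distribution one would expect for this table. It is not Hive or Spark code. It assumes the bucket for an int key is computed as (hash & Integer.MAX_VALUE) % numBuckets, with the hash of an int being the value itself, which is how I understand Hive's default bucketing to work. The names {{bucket_for}} and {{distribution}} are made up for illustration.

```python
from collections import Counter

NUM_BUCKETS = 10        # matches "into 10 buckets" in the DDL above
INT_MAX = 0x7FFFFFFF    # Java's Integer.MAX_VALUE


def bucket_for(x, num_buckets=NUM_BUCKETS):
    """Assumed bucket assignment for an int key.

    Assumption: the hash of an int column is the value itself, and the
    bucket is (hash & Integer.MAX_VALUE) % num_buckets.
    """
    return (x & INT_MAX) % num_buckets


def distribution(keys, num_buckets=NUM_BUCKETS):
    """Count how many keys land in each bucket, indexed by bucket number."""
    counts = Counter(bucket_for(k, num_buckets) for k in keys)
    return [counts.get(b, 0) for b in range(num_buckets)]


if __name__ == "__main__":
    # Hypothetical sample of key values; the real contents of src are unknown.
    print(distribution(range(1000)))
```

Under these assumptions, reasonably varied values of {{intCol}} should spread rows across all 10 buckets, so one reducer receiving everything suggests the rows are not being partitioned by the bucketing column on the way to the reducers.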