Yongzhi Chen created HIVE-10881:
-----------------------------------

             Summary: The bucket number is not respected in insert overwrite.
                 Key: HIVE-10881
                 URL: https://issues.apache.org/jira/browse/HIVE-10881
             Project: Hive
          Issue Type: Bug
    Affects Versions: 1.2.0, 1.3.0
            Reporter: Yongzhi Chen
            Priority: Critical


When hive.enforce.bucketing is true, the bucket number defined in the table is 
no longer respected in current master and 1.2. This is a regression.
Reproduce:
{noformat}
CREATE TABLE IF NOT EXISTS buckettestinput( 
data string 
) 
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
CREATE TABLE IF NOT EXISTS buckettestoutput1( 
data string 
)CLUSTERED BY(data) 
INTO 2 BUCKETS 
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
CREATE TABLE IF NOT EXISTS buckettestoutput2( 
data string 
)CLUSTERED BY(data) 
INTO 2 BUCKETS 
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
Then I inserted the following data into the "buckettestinput" table
firstinsert1 
firstinsert2 
firstinsert3 
firstinsert4 
firstinsert5 
firstinsert6 
firstinsert7 
firstinsert8 
secondinsert1 
secondinsert2 
secondinsert3 
secondinsert4 
secondinsert5 
secondinsert6 
secondinsert7 
secondinsert8
set hive.enforce.bucketing = true; 
set hive.enforce.sorting=true;
insert overwrite table buckettestoutput1 
select * from buckettestinput where data like 'first%';
set hive.auto.convert.sortmerge.join=true; 
set hive.optimize.bucketmapjoin = true; 
set hive.optimize.bucketmapjoin.sortedmerge = true; 
select * from buckettestoutput1 a join buckettestoutput2 b on (a.data=b.data);

Error: Error while compiling statement: FAILED: SemanticException [Error 
10141]: Bucketed table metadata is not correct. Fix the metadata or don't use 
bucketed mapjoin, by setting hive.enforce.bucketmapjoin to false. The number of 
buckets for table buckettestoutput1 is 2, whereas the number of files is 1 
(state=42000,code=10141)
{noformat}

The related debug information related to insert overwrite:
{noformat}
0: jdbc:hive2://localhost:10000> insert overwrite table buckettestoutput1 
select * from buckettestinput where data like 'first%'insert overwrite table 
buckettestoutput1 
0: jdbc:hive2://localhost:10000> ;
select * from buckettestinput where data like ' 
first%';
INFO  : Number of reduce tasks determined at compile time: 2
INFO  : In order to change the average load for a reducer (in bytes):
INFO  :   set hive.exec.reducers.bytes.per.reducer=<number>
INFO  : In order to limit the maximum number of reducers:
INFO  :   set hive.exec.reducers.max=<number>
INFO  : In order to set a constant number of reducers:
INFO  :   set mapred.reduce.tasks=<number>
INFO  : Job running in-process (local Hadoop)
INFO  : 2015-06-01 11:09:29,650 Stage-1 map = 86%,  reduce = 100%
INFO  : Ended Job = job_local107155352_0001
INFO  : Loading data to table default.buckettestoutput1 from 
file:/user/hive/warehouse/buckettestoutput1/.hive-staging_hive_2015-06-01_11-09-28_166_3109203968904090801-1/-ext-10000
INFO  : Table default.buckettestoutput1 stats: [numFiles=1, numRows=4, 
totalSize=52, rawDataSize=48]
No rows affected (1.692 seconds)
{noformat}




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to