[jira] [Comment Edited] (HIVE-10283) HIVE-4240 may be causing issue with bucketed tables

Xuefu Zhang (JIRA) Fri, 29 May 2015 14:30:34 -0700

    [ https://issues.apache.org/jira/browse/HIVE-10283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14565200#comment-14565200 ]


Xuefu Zhang edited comment on HIVE-10283 at 5/29/15 9:29 PM:
-------------------------------------------------------------

[~xuefuz] && [~szehon], could you find someone who knows this part well to work 
on the issue? Currently, in upstream master code, the number of buckets is not 
respected even with insert overwrite: insert overwrite creates only 1 bucket 
file while the table definition specifies 2 buckets. 
Reproduce:
{noformat}
create table buckettest (data string) partitioned by (state string) clustered by (data) into 2 buckets;
set hive.enforce.bucketing = true;
insert overwrite table buckettest partition(state='MA') select code from jsmall limit 10;
set hive.auto.convert.sortmerge.join=true;
set hive.optimize.bucketmapjoin = true;
set hive.optimize.bucketmapjoin.sortedmerge = true;
0: jdbc:hive2://localhost:10000> select * from buckettest a join buckettestoutput2 b on (a.data=b.data);
select * from buckettest a join buckettestoutput2 b on (a.data=b.data);
Error: Error while compiling statement: FAILED: SemanticException [Error 
10141]: Bucketed table metadata is not correct. Fix the metadata or don't use 
bucketed mapjoin, by setting hive.enforce.bucketmapjoin to false. The number of 
buckets for table buckettest partition state=MA is 2, whereas the number of 
files is 1 (state=42000,code=10141)
{noformat}
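
For readers unfamiliar with Error 10141, the condition it reports can be sketched in a few lines of Python. This is only an illustration of the invariant stated in the message (declared bucket count must equal the number of data files in the partition); the helper name `check_bucket_metadata` and the file name `000000_0` are hypothetical and are not taken from Hive's code.

```python
def check_bucket_metadata(table, partition, declared_buckets, data_files):
    """Return None when the file count matches the declared bucket count.

    Hypothetical helper: it mirrors the condition described in the error
    message above, not Hive's actual implementation.
    """
    file_count = len(data_files)
    if file_count == declared_buckets:
        return None
    return (
        f"The number of buckets for table {table} partition {partition} "
        f"is {declared_buckets}, whereas the number of files is {file_count}"
    )


# Illustrative input: a table declared with 2 buckets but only 1 file written.
message = check_bucket_metadata("buckettest", "state=MA", 2, ["000000_0"])
print(message)
```

Under this reading, the join fails at compile time because the insert overwrite left a single file in a partition whose table definition promises two.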



was (Author: ychena):
[~xuefuz] && [~szehon], could you find someone who knows this part well to work 
on the issue? Currently, in upstream master code, the number of buckets is not 
respected even with insert overwrite: insert overwrite creates only 1 bucket 
file while the table definition specifies 2 buckets. 
Reproduce:
{noformat}
create table buckettest (data string) partitioned by (state string) clustered by (data) into 2 buckets;
set hive.enforce.bucketing = true;
insert overwrite table buckettest partition(state='MA') select code from jsmall limit 10;
set hive.auto.convert.sortmerge.join=true;
set hive.optimize.bucketmapjoin = true;
set hive.optimize.bucketmapjoin.sortedmerge = true;
0: jdbc:hive2://localhost:10000> select * from buckettest a join buckettestoutput2 b on (a.data=b.data);
select * from buckettest a join buckettestoutput2 b on (a.data=b.data);
Error: Error while compiling statement: FAILED: SemanticException [Error 
10141]: Bucketed table metadata is not correct. Fix the metadata or don't use 
bucketed mapjoin, by setting hive.enforce.bucketmapjoin to false. The number of 
buckets for table buckettest partition state=MA is 2, whereas the number of 
files is 1 (state=42000,code=10141)



> HIVE-4240 may be causing issue with bucketed tables 
> ----------------------------------------------------
>
>                 Key: HIVE-10283
>                 URL: https://issues.apache.org/jira/browse/HIVE-10283
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>            Reporter: Ryan P
>
> I suspect that HIVE-4240, by removing the reducer, may be causing issues. 
> Because of this, inserts will not consolidate 'buckets' into single files, 
> which is problematic when attempting to use bucketmapjoin.
> CREATE TABLE IF NOT EXISTS buckettestinput( 
> data string 
> ) 
> ROW FORMAT DELIMITED FIELDS TERMINATED BY ','; 
> CREATE TABLE IF NOT EXISTS buckettestoutput1( 
> data string 
> )CLUSTERED BY(data) 
> INTO 2 BUCKETS 
> ROW FORMAT DELIMITED FIELDS TERMINATED BY ','; 
> CREATE TABLE IF NOT EXISTS buckettestoutput2( 
> data string 
> )CLUSTERED BY(data) 
> INTO 2 BUCKETS 
> ROW FORMAT DELIMITED FIELDS TERMINATED BY ','; 
> Then I inserted the following data into the "buckettestinput" table 
> firstinsert1 
> firstinsert2 
> firstinsert3 
> firstinsert4 
> firstinsert5 
> firstinsert6 
> firstinsert7 
> firstinsert8 
> secondinsert1 
> secondinsert2 
> secondinsert3 
> secondinsert4 
> secondinsert5 
> secondinsert6 
> secondinsert7 
> secondinsert8 
> set hive.enforce.bucketing = true; 
> set hive.enforce.sorting=true; 
> insert into table buckettestoutput1 
> select * from buckettestinput where data like 'first%'; 
> SELECT * 
> FROM buckettestoutput1 TABLESAMPLE(BUCKET 1 OUT OF 1 ON data) s; 
> insert into table buckettestoutput1 
> select * from buckettestinput where data like 'second%'; 
> Check the results of the table sample query. 
> For the sort merge bucket map join: 
> set hive.auto.convert.sortmerge.join=true; 
> set hive.optimize.bucketmapjoin = true; 
> set hive.optimize.bucketmapjoin.sortedmerge = true; 
> set hive.auto.convert.sortmerge.join.noconditionaltask=true; 
> select * from buckettestoutput1 a join buckettestoutput2 b on (a.data=b.data) 
> hive> select * from buckettestoutput1 a join buckettestoutput2 b on 
> (a.data=b.data); 
> FAILED: SemanticException [Error 10141]: Bucketed table metadata is not 
> correct. Fix the metadata or don't use bucketed mapjoin, by setting 
> hive.enforce.bucketmapjoin to false. The number of buckets for table 
> buckettestoutput1 is 2, whereas the number of files is 4 
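
The count of 4 files in the quoted error is consistent with each INSERT INTO adding its own set of bucket files instead of merging into the existing ones. The Python sketch below models that under one stated assumption: every insert appends one new file per bucket. The function name and the file-name pattern are illustrative only and are not confirmed by the report.

```python
def files_after_inserts(num_buckets, num_inserts):
    """List the data files present after repeated INSERT INTO statements.

    Assumption (not confirmed by the report): each insert writes one new
    file per bucket and never merges into files from earlier inserts.
    """
    files = []
    for insert_index in range(num_inserts):
        # Illustrative naming: later inserts get a "_copy_N" suffix.
        suffix = "" if insert_index == 0 else f"_copy_{insert_index}"
        for bucket in range(num_buckets):
            files.append(f"{bucket:06d}_0{suffix}")
    return files


# Two inserts ('first%' and 'second%') into a table declared with 2 buckets.
files = files_after_inserts(num_buckets=2, num_inserts=2)
print(len(files), files)
```

With two buckets and two inserts, the model yields four files, which matches the mismatch the bucket map join check complains about.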



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
