[ 
https://issues.apache.org/jira/browse/HIVE-2262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-2262:
---------------------------------

    Fix Version/s:     (was: 0.7.1)

> mapjoin followed by union all, groupby does not work
> ----------------------------------------------------
>
>                 Key: HIVE-2262
>                 URL: https://issues.apache.org/jira/browse/HIVE-2262
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 0.7.1
>            Reporter: yu xiang
>            Priority: Trivial
>
> sql:
> CREATE TABLE nulltest2(int_data1 INT, int_data2 INT, boolean_data BOOLEAN, 
> double_data DOUBLE, string_data STRING) ROW FORMAT DELIMITED FIELDS 
> TERMINATED BY ',';
> CREATE TABLE nulltest3(int_data1 INT) ROW FORMAT DELIMITED FIELDS TERMINATED 
> BY ',';
> explain select int_data2,count(1) from (select /*+mapjoin(a)*/ int_data2, 1 
> as c1, 0 as c2 from nulltest2 a join nulltest3 b on(a.int_data1 = 
> b.int_data1) union all select /*+mapjoin(a)*/ int_data2, 1 as c1, 2 as c2 
> from nulltest2 a join nulltest3 b on(a.int_data1 = b.int_data1)) mapjointable 
> group by int_data2;
> exception:
> FAILED: Hive Internal Error: java.lang.NullPointerException(null)
> java.lang.NullPointerException
>         at org.apache.hadoop.hive.ql.optimizer.ppr.PartitionPruner.prune(PartitionPruner.java:156)
>         at org.apache.hadoop.hive.ql.optimizer.GenMapRedUtils.setTaskPlan(GenMapRedUtils.java:551)
>         at org.apache.hadoop.hive.ql.optimizer.GenMapRedUtils.setTaskPlan(GenMapRedUtils.java:514)
>         at org.apache.hadoop.hive.ql.optimizer.GenMapRedUtils.initPlan(GenMapRedUtils.java:125)
>         at org.apache.hadoop.hive.ql.optimizer.GenMRRedSink1.process(GenMRRedSink1.java:76)
>         at org.apache.hadoop.hive.ql.optimizer.GenMRRedSink3.process(GenMRRedSink3.java:64)
>         at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:89)
> Analysis of the cause:
> 1. When mapjoin, union all, and group by are used together, the optimizer's 
> UnionProcFactory.MapJoinUnion() sets MapJoinSubq to true and sets up the 
> UnionParseContext.
> 2. In GenMRUnion1, Hive calls mergeMapJoinUnion, which also sets the task plan.
> 3. In GenMRRedSink3, Hive checks uCtx.isMapOnlySubq() and calls 
> GenMRRedSink1.process() to initialize the plan. But the utask's plan has 
> already been set; only the reducer still needs to be set. In addition, the 
> utask is processing a temporary table, so no topOp maps to a table. This is 
> where the NullPointerException is thrown.
> Solutions:
> 1. SQL solution: rewrite the query to use a subquery.
> 2. Code solution: in mergeMapJoinUnion, after the task plan has been set, 
> set a settaskplan flag to true to indicate that the plan for this utask has 
> already been set. Then, in GenMRRedSink3, if this flag is true, do not call 
> GenMRRedSink1.process() to reinitialize the plan:
> ++++++++++++++++++++++++++++
> if (uCtx.isMapOnlySubq()&&!upc.isIssetTaskPlan())
> ++++++++++++++++++++++++++++
> I don't know whether the code solution is suitable.
> Is there any better solution?
> thx
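
The code solution proposed in the description can be sketched as a small standalone model. This is only an illustration of the guard, not Hive's actual code: the class TaskPlanGuardSketch, the nested UnionContext, and the returned strings "initPlan" and "setReducerOnly" are all hypothetical names, and a single context object stands in for both uCtx and upc from the snippet in the description.

```java
public class TaskPlanGuardSketch {

    /** Hypothetical stand-in for the union context; not Hive's real class. */
    static final class UnionContext {
        private final boolean mapOnlySubq;
        private boolean taskPlanSet; // the proposed "task plan already set" flag

        UnionContext(boolean mapOnlySubq) {
            this.mapOnlySubq = mapOnlySubq;
        }

        boolean isMapOnlySubq() { return mapOnlySubq; }
        boolean isTaskPlanSet() { return taskPlanSet; }
        void markTaskPlanSet() { taskPlanSet = true; }
    }

    /** Models step 2: the union merge sets the task plan and records that it did. */
    static void mergeMapJoinUnion(UnionContext ctx) {
        // The real code would set the task plan here; the sketch only records the flag.
        ctx.markTaskPlanSet();
    }

    /** Models step 3: decide whether the reduce sink may initialize the plan. */
    static String processReduceSink(UnionContext ctx) {
        if (ctx.isMapOnlySubq() && !ctx.isTaskPlanSet()) {
            // Plan not set yet: safe to initialize it from scratch.
            return "initPlan";
        }
        // Plan already set (or not a map-only subquery): skip reinitialization.
        return "setReducerOnly";
    }

    public static void main(String[] args) {
        UnionContext merged = new UnionContext(true);
        mergeMapJoinUnion(merged);
        System.out.println(processReduceSink(merged));

        UnionContext fresh = new UnionContext(true);
        System.out.println(processReduceSink(fresh));
    }
}
```

In this model, a context that has gone through mergeMapJoinUnion takes the reducer-only path, while a context whose plan was never set still goes through plan initialization. Whether the same guard is sufficient inside Hive itself is the open question raised in the description.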

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
