Hi all,

I was doing TPC-H benchmark on Hive recently while I found some queries
went wrong.

Following are the two cases, both are MapJoin while the join key is bigint
type. After disabling auto convert join the error is gone.

Case 1.
Query (TPC-H query4):

create table q4_result as
select
o_orderpriority,
count(*) as order_count
from
orders o
join
(
select
distinct l_orderkey
from
(
select
*
from
lineitem
where
l_commitdate < l_receiptdate
) tab1
) tab2
on tab2.l_orderkey = o.o_orderkey
where
o.o_orderdate >= '1993-07-01' and o.o_orderdate < '1993-10-01'
group by
o_orderpriority
order by
o_orderpriority;

The query will cause data-loss if MapJoin is enabled. Both side of join
have expected output but some data can't be joined together here. (Note
l_orderkey & o_orderkey is bigint).

Case 2:
Query (TPC-H query9):

create table q9_result as
select
nation,
o_year,
sum(amount) as sum_profit
from
(
select
n_name as nation,
substr(o_orderdate,1,4) as o_year,
l_extendedprice * (1 - l_discount) - ps_supplycost * l_quantity as amount
from
supplier s
join lineitem l on s.s_suppkey = l.l_suppkey
join partsupp ps on ps.ps_suppkey = l.l_suppkey and ps.ps_partkey =
l.l_partkey
join part p on p.p_partkey = l.l_partkey
join orders o on o.o_orderkey = l.l_orderkey
join nation n on s.s_nationkey = n.n_nationkey
where
p_name like '%green%'
) profit
group by
nation,
o_year
order by
nation,
o_year desc;


The error is when joining table s and n, we got an exception as follows:

Error: Failure while running task:java.lang.RuntimeException:
java.lang.RuntimeException:
org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error
while processing row
        at 
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:171)
        at 
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:137)
        at 
org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:337)
        at 
org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:179)
        at 
org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:171)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
        at 
org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:171)
        at 
org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:167)
        at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
        at java.util.concurrent.FutureTask.run(FutureTask.java:166)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
        at java.lang.Thread.run(Thread.java:722)
Caused by: java.lang.RuntimeException:
org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error
while processing row
        at 
org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:91)
        at 
org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:68)
        at 
org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:290)
        at 
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:148)
        ... 15 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive
Runtime Error while processing row
        at 
org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:52)
        at 
org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:83)
        ... 18 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException:
org.apache.hadoop.hive.ql.metadata.HiveException:
java.lang.ArrayIndexOutOfBoundsException: -1
        at 
org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinInnerLongOperator.process(VectorMapJoinInnerLongOperator.java:368)
        at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:837)
        at 
org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:138)
        at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:837)
        at 
org.apache.hadoop.hive.ql.exec.vector.VectorFilterOperator.process(VectorFilterOperator.java:117)
        at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:837)
        at 
org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:97)
        at 
org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:162)
        at 
org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:45)
        ... 19 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException:
java.lang.ArrayIndexOutOfBoundsException: -1
        at 
org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.process(ReduceSinkOperator.java:403)
        at 
org.apache.hadoop.hive.ql.exec.vector.VectorReduceSinkOperator.process(VectorReduceSinkOperator.java:98)
        at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:837)
        at 
org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinGenerateResultOperator.forwardBigTableBatch(VectorMapJoinGenerateResultOperator.java:603)
        at 
org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinInnerLongOperator.process(VectorMapJoinInnerLongOperator.java:362)
        ... 27 more
Caused by: java.lang.ArrayIndexOutOfBoundsException: -1
        at 
org.apache.tez.runtime.library.common.writers.UnorderedPartitionedKVWriter.write(UnorderedPartitionedKVWriter.java:301)
        at 
org.apache.tez.runtime.library.common.writers.UnorderedPartitionedKVWriter.write(UnorderedPartitionedKVWriter.java:244)
        at 
org.apache.hadoop.hive.ql.exec.tez.TezProcessor$TezKVOutputCollector.collect(TezProcessor.java:196)
        at 
org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.collect(ReduceSinkOperator.java:542)
        at 
org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.process(ReduceSinkOperator.java:386)
        ... 31 more


Both the 2 cases use MapJoin and bigint as join key, under tez with
vectorized execution enabled, so I'm wondering if there is bug on
VectorMapJoinInnerLongOperator.

Does anyone have ideas?

Reply via email to