Hi all, I'm using Hive 0.13.1 and have run into a problem. Here are the tables and the query:
-- each table contains only one row
create table test_union_all_1(a double);
create table test_union_all_2(a double);
create table test_union_all_3(a double);

select max(a) from (
  select NULL a from test_union_all_1
  union all
  select NULL a from test_union_all_2
  union all
  select a a from test_union_all_3
) t;

Then I got the exception below:

Error: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {"key":{},"value":{"_col0":0.0}}
    at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:283)
    at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:462)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:408)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:162)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:157)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {"key":{},"value":{"_col0":0.0}}
    at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:271)
    ... 7 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ArrayIndexOutOfBoundsException: 7
    at org.apache.hadoop.hive.ql.exec.GroupByOperator.processOp(GroupByOperator.java:808)
    at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:262)
    ... 7 more
Caused by: java.lang.ArrayIndexOutOfBoundsException: 7
    at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryUtils.byteArrayToLong(LazyBinaryUtils.java:81)
    at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryDouble.init(LazyBinaryDouble.java:43)
    at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryStruct.uncheckedGetField(LazyBinaryStruct.java:216)
    at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryStruct.getField(LazyBinaryStruct.java:197)
    at org.apache.hadoop.hive.serde2.lazybinary.objectinspector.LazyBinaryStructObjectInspector.getStructFieldData(LazyBinaryStructObjectInspector.java:64)
    at org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator._evaluate(ExprNodeColumnEvaluator.java:98)
    at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:77)
    at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:65)
    at org.apache.hadoop.hive.ql.exec.GroupByOperator.updateAggregations(GroupByOperator.java:616)
    at org.apache.hadoop.hive.ql.exec.GroupByOperator.processAggr(GroupByOperator.java:905)
    at org.apache.hadoop.hive.ql.exec.GroupByOperator.processKey(GroupByOperator.java:737)
    at org.apache.hadoop.hive.ql.exec.GroupByOperator.processOp(GroupByOperator.java:803)
    ... 8 more

Here is the EXPLAIN of the SQL:

OK
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 is a root stage

STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: test_union_all_2
            Statistics: Num rows: -1 Data size: 4 Basic stats: PARTIAL Column stats: COMPLETE
            Select Operator
              expressions: null (type: string)
              outputColumnNames: _col0
              Statistics: Num rows: -1 Data size: 4 Basic stats: PARTIAL Column stats: COMPLETE
              Union
                Statistics: Num rows: -2 Data size: 12 Basic stats: PARTIAL Column stats: PARTIAL
                Select Operator
                  expressions: _col0 (type: double)
                  outputColumnNames: _col0
                  Statistics: Num rows: -2 Data size: 12 Basic stats: PARTIAL Column stats: PARTIAL
                  Group By Operator
                    aggregations: max(_col0)
                    mode: hash
                    outputColumnNames: _col0
                    Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: PARTIAL
                    Reduce Output Operator
                      sort order:
                      Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: PARTIAL
                      value expressions: _col0 (type: double)
          TableScan
            alias: test_union_all_1
            Statistics: Num rows: -1 Data size: 4 Basic stats: PARTIAL Column stats: COMPLETE
            Select Operator
              expressions: null (type: string)
              outputColumnNames: _col0
              Statistics: Num rows: -1 Data size: 4 Basic stats: PARTIAL Column stats: COMPLETE
              Union
                Statistics: Num rows: -2 Data size: 12 Basic stats: PARTIAL Column stats: PARTIAL
                Select Operator
                  expressions: _col0 (type: double)
                  outputColumnNames: _col0
                  Statistics: Num rows: -2 Data size: 12 Basic stats: PARTIAL Column stats: PARTIAL
                  Group By Operator
                    aggregations: max(_col0)
                    mode: hash
                    outputColumnNames: _col0
                    Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: PARTIAL
                    Reduce Output Operator
                      sort order:
                      Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: PARTIAL
                      value expressions: _col0 (type: double)
          TableScan
            alias: test_union_all_3
            Statistics: Num rows: 0 Data size: 4 Basic stats: PARTIAL Column stats: NONE
            Select Operator
              expressions: a (type: double)
              outputColumnNames: _col0
              Statistics: Num rows: 0 Data size: 4 Basic stats: PARTIAL Column stats: NONE
              Union
                Statistics: Num rows: -2 Data size: 12 Basic stats: PARTIAL Column stats: PARTIAL
                Select Operator
                  expressions: _col0 (type: double)
                  outputColumnNames: _col0
                  Statistics: Num rows: -2 Data size: 12 Basic stats: PARTIAL Column stats: PARTIAL
                  Group By Operator
                    aggregations: max(_col0)
                    mode: hash
                    outputColumnNames: _col0
                    Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: PARTIAL
                    Reduce Output Operator
                      sort order:
                      Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: PARTIAL
                      value expressions: _col0 (type: double)
      Reduce Operator Tree:
        Group By Operator
          aggregations: max(VALUE._col0)
          mode: mergepartial
          outputColumnNames: _col0
          Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: PARTIAL
          Select Operator
            expressions: _col0 (type: double)
            outputColumnNames: _col0
            Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: PARTIAL
            File Output Operator
              compressed: false
              Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: PARTIAL
              table:
                  input format: org.apache.hadoop.mapred.TextInputFormat
                  output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                  serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

  Stage: Stage-0
    Fetch Operator
      limit: -1

Note the "expressions: null (type: string)" lines in the plan: the query plan treats the NULL literal as a string, even though the unioned column should in fact be a double. So I changed the query as below, which solved the problem:

select max(a) from (
  select cast(NULL as double) a from test_union_all_1
  union all
  select NULL a from test_union_all_2
  union all
  select a a from test_union_all_3
) t;

Could anyone give some suggestions about this? Thanks.
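For what it's worth, a more defensive variant of the same workaround (a sketch against the same three tables; I only verified the single-cast form above, so treat this as an assumption): cast the bare NULL in every UNION ALL branch to the target type, so no branch ever needs its type inferred.

```sql
-- Sketch: give every NULL branch an explicit DOUBLE type,
-- instead of relying on Hive to reconcile the branch types.
select max(a) from (
  select cast(NULL as double) a from test_union_all_1
  union all
  select cast(NULL as double) a from test_union_all_2
  union all
  select a a from test_union_all_3
) t;
```

In my test, casting only the first branch was already enough to avoid the exception, but casting each NULL branch removes any dependence on how the planner merges the column types across the union.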