I am not aware of this issue. Please file a JIRA, and if it does turn out to be a duplicate we can mark it as such.

Alan.

Furcy Pin <mailto:furcy....@flaminem.com>
September 19, 2015 at 2:36
Hi,


We bumped into a bug when using vectorization on a transactional table.
Here is a minimal example :

DROP TABLE IF EXISTS vectorization_transactional_test ;
CREATE TABLE vectorization_transactional_test (
    id INT
)
CLUSTERED BY (id) into 3 buckets
STORED AS ORC
TBLPROPERTIES('transactional'='true') ;

INSERT INTO TABLE vectorization_transactional_test values
(1)
;

SET hive.vectorized.execution.enabled=true ;

SELECT
*
FROM vectorization_transactional_test
WHERE id = 1
;

With vectorization enable, the last query will fail with a n ArrayOutOfBoundException in the mappers.
Here is the full stack:

FATAL [main] org.apache.hadoop.hive.ql.exec.mr.ExecMapper: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:52) at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:170)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Error evaluating 1 at org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.processOp(VectorSelectOperator.java:126)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
at org.apache.hadoop.hive.ql.exec.vector.VectorFilterOperator.processOp(VectorFilterOperator.java:111)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:95) at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:157) at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:45)
    ... 9 more
Caused by: java.lang.ArrayIndexOutOfBoundsException: 1
at org.apache.hadoop.hive.ql.exec.vector.expressions.ConstantVectorExpression.evaluateLong(ConstantVectorExpression.java:102) at org.apache.hadoop.hive.ql.exec.vector.expressions.ConstantVectorExpression.evaluate(ConstantVectorExpression.java:150) at org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.processOp(VectorSelectOperator.java:124)
    ... 15 more


Of course, disabling vectorization removes the bug.

More annoyingly, when the table is used in a JOIN, the job doesn't fail but returns a wrong result instead : for instance an empty table, while disabling vectorization returns a non-empty one. This behavior is harder to reproduce with a minimal example.

We experienced this bug in version 1.1.0-cdh5.4.2.

We didn't find any JIRA related to this, is it a known bug, or should we create a new JIRA?

Best,

Furcy





Reply via email to