I am not aware of this issue. Please file a JIRA, and if it does turn
out to be a duplicate we can mark it as such.
Alan.
Furcy Pin <mailto:furcy....@flaminem.com>
September 19, 2015 at 2:36
Hi,
We bumped into a bug when using vectorization on a transactional table.
Here is a minimal example:
DROP TABLE IF EXISTS vectorization_transactional_test;
CREATE TABLE vectorization_transactional_test (
  id INT
)
CLUSTERED BY (id) INTO 3 BUCKETS
STORED AS ORC
TBLPROPERTIES ('transactional'='true');
INSERT INTO TABLE vectorization_transactional_test VALUES (1);
SET hive.vectorized.execution.enabled=true;
SELECT *
FROM vectorization_transactional_test
WHERE id = 1;
With vectorization enabled, the last query fails with an
ArrayIndexOutOfBoundsException in the mappers.
Here is the full stack trace:
FATAL [main] org.apache.hadoop.hive.ql.exec.mr.ExecMapper: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row
    at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:52)
    at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:170)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Error evaluating 1
    at org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.processOp(VectorSelectOperator.java:126)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
    at org.apache.hadoop.hive.ql.exec.vector.VectorFilterOperator.processOp(VectorFilterOperator.java:111)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
    at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:95)
    at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:157)
    at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:45)
    ... 9 more
Caused by: java.lang.ArrayIndexOutOfBoundsException: 1
    at org.apache.hadoop.hive.ql.exec.vector.expressions.ConstantVectorExpression.evaluateLong(ConstantVectorExpression.java:102)
    at org.apache.hadoop.hive.ql.exec.vector.expressions.ConstantVectorExpression.evaluate(ConstantVectorExpression.java:150)
    at org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.processOp(VectorSelectOperator.java:124)
    ... 15 more
Of course, disabling vectorization removes the bug.
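For reference, the workaround can be sketched as follows. This assumes the setting is changed for the current session only, and it reuses the table and query from the minimal example; it avoids the failure but does not fix the underlying bug:

```sql
-- Workaround sketch: turn vectorized execution off for this session only
-- (assumption: a session-level SET is acceptable; the default is not changed).
SET hive.vectorized.execution.enabled=false;

-- Same query as in the minimal example, now run without vectorization.
SELECT *
FROM vectorization_transactional_test
WHERE id = 1;
```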
More annoyingly, when the table is used in a JOIN, the job doesn't
fail but silently returns a wrong result instead: for instance, an
empty result set, whereas the same query with vectorization disabled
returns a non-empty one. This behavior is harder to reproduce with a
minimal example.
We experienced this bug in version 1.1.0-cdh5.4.2.
We didn't find any JIRA related to this. Is it a known bug, or should
we create a new JIRA?
Best,
Furcy