This is just a suggestion, but I recently ran into an issue with vectorized 
query execution and a map column type, specifically when inserting into an 
HBase table with a map-to-column-family setup. Try using "set 
hive.vectorized.execution.enabled=false;".
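
If turning vectorization off globally is too heavy-handed, a session-scoped sketch is below. Since your trace originates on the reduce side (ReduceRecordSource), the narrower reduce-only switch may also be enough; which toggle suffices is an assumption to verify against your workload:

-- Session-scoped workaround sketch (Beeline / Hive CLI).
-- Option 1: disable vectorized execution entirely for this session.
set hive.vectorized.execution.enabled=false;

-- Option 2 (narrower, untested for this case): keep map-side vectorization
-- and disable only the reduce side, where this stack trace originates.
-- set hive.vectorized.execution.reduce.enabled=false;

-- ... run the MERGE here ...

-- Re-enable afterwards if other queries in the session benefit from it.
set hive.vectorized.execution.enabled=true;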

Thanks,
Aaron


From: Bernard Quizon <bernard.qui...@cheetahdigital.com>
Sent: Tuesday, July 14, 2020 9:57 AM
To: user@hive.apache.org
Subject: Re: Intermittent ArrayIndexOutOfBoundsException on Hive Merge

Hi.

I see that this piece of code (ReduceRecordSource.processVectorGroup, per the 
stack trace below) is the source of the error:

final int maxSize =
    (vectorizedTestingReducerBatchSize > 0 ?
        Math.min(vectorizedTestingReducerBatchSize, batch.getMaxSize()) :
        batch.getMaxSize());
Preconditions.checkState(maxSize > 0);
int rowIdx = 0;
int batchBytes = keyBytes.length;
try {
  for (Object value : values) {
    if (rowIdx >= maxSize ||
        (rowIdx > 0 && batchBytes >= BATCH_BYTES)) {

      // Batch is full AND we have at least 1 more row...
      batch.size = rowIdx;
      if (handleGroupKey) {
        reducer.setNextVectorBatchGroupStatus(/* isLastGroupBatch */ false);
      }
      reducer.process(batch, tag);

      // Reset just the value columns and value buffer.
      for (int i = firstValueColumnOffset; i < batch.numCols; i++) {
        // Note that reset also resets the data buffer for bytes column vectors.
        batch.cols[i].reset();
      }
      rowIdx = 0;
      batchBytes = keyBytes.length;
    }
    if (valueLazyBinaryDeserializeToRow != null) {
      // Deserialize value into vector row columns.
      BytesWritable valueWritable = (BytesWritable) value;
      byte[] valueBytes = valueWritable.getBytes();
      int valueLength = valueWritable.getLength();
      batchBytes += valueLength;

      valueLazyBinaryDeserializeToRow.setBytes(valueBytes, 0, valueLength);
      valueLazyBinaryDeserializeToRow.deserialize(batch, rowIdx);
    }
    rowIdx++;
  }
  // ... remainder of the method (the final partial-batch flush and the catch
  // block that rethrows as the HiveException seen below) elided


`valueLazyBinaryDeserializeToRow.deserialize(batch, rowIdx)` throws the 
exception because `rowIdx` reaches 1024; it should be 1023 at most.
But it seems to me that the guard at the top of the loop resets `rowIdx` to 0 
before it can reach `maxSize`, and `maxSize` is at most 1024, so `rowIdx` 
should never be >= 1024 at the `valueLazyBinaryDeserializeToRow.deserialize(batch, rowIdx)` 
call.
Am I missing something here?
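
(For reference, `batch.getMaxSize()` is the batch's allocated row capacity, 
which defaults to `VectorizedRowBatch.DEFAULT_SIZE`, i.e. 1024, and 
`vectorizedTestingReducerBatchSize` comes from a testing-only knob. To rule 
the knobs out, they can be echoed from Beeline; the exact property names 
below are my reading of HiveConf, so double-check them against your build:

-- `set <name>;` with no value just echoes the current setting.
set hive.vectorized.execution.enabled;
set hive.vectorized.execution.reduce.enabled;
-- Feeds vectorizedTestingReducerBatchSize in the snippet above; normally unset.
set hive.vectorized.testing.reducer.batch.size;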

Thanks,
Bernard

On Tue, Jul 14, 2020 at 5:44 PM Bernard Quizon 
<bernard.qui...@cheetahdigital.com> wrote:

Hi.

I'm using Hive 3.1.0 (Tez execution engine) and I'm running into intermittent 
errors when doing a Hive MERGE.

Just to clarify: the MERGE query succeeds roughly 60% of the time using the 
same source and destination tables.

By the way, both the source and destination tables have columns with complex 
data types such as ARRAY<STRING> and MAP<STRING, STRING>.
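
For anyone trying to reproduce this, the setup is roughly the shape sketched 
below. All table and column names are made up for illustration; the only 
load-bearing details are the complex-typed columns and that the MERGE target 
is a transactional ORC table, as Hive ACID MERGE requires:

-- Hypothetical MERGE target (ACID MERGE requires a transactional table).
CREATE TABLE target_tbl (
  id         STRING,
  updated_at STRING,
  tags       ARRAY<STRING>,
  attrs      MAP<STRING, STRING>
)
STORED AS ORC
TBLPROPERTIES ('transactional'='true');

-- Hypothetical staging source with the same complex-typed columns.
CREATE TABLE source_tbl (
  id         STRING,
  updated_at STRING,
  tags       ARRAY<STRING>,
  attrs      MAP<STRING, STRING>
)
STORED AS ORC;

-- The intermittently failing statement has this general shape.
MERGE INTO target_tbl t
USING source_tbl s
ON t.id = s.id
WHEN MATCHED THEN UPDATE SET updated_at = s.updated_at
WHEN NOT MATCHED THEN INSERT VALUES (s.id, s.updated_at, s.tags, s.attrs);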



Here's the error:

TaskAttempt 0 failed, info=Error: Error while running task ( failure ) : attempt_1594345704665_28139_1_06_000007_0:java.lang.RuntimeException: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing vector batch (tag=0) (vectorizedVertexNum 4)
  at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:296)
  at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250)
  at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
  at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
  at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
  at java.security.AccessController.doPrivileged(Native Method)
  at javax.security.auth.Subject.doAs(Subject.java:422)
  at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
  at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
  at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
  at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
  at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:108)
  at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:41)
  at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:77)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
  at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing vector batch (tag=0) (vectorizedVertexNum 4)
  at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecordVector(ReduceRecordSource.java:396)
  at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:249)
  at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.run(ReduceRecordProcessor.java:318)
  at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:267)
  ... 16 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing vector batch (tag=0) (vectorizedVertexNum 4)
  at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.processVectorGroup(ReduceRecordSource.java:493)
  at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecordVector(ReduceRecordSource.java:387)
  ... 19 more
Caused by: java.lang.ArrayIndexOutOfBoundsException: 1024
  at org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector.setVal(BytesColumnVector.java:187)
  at org.apache.hadoop.hive.ql.exec.vector.VectorDeserializeRow.storePrimitiveRowColumn(VectorDeserializeRow.java:588)
  at org.apache.hadoop.hive.ql.exec.vector.VectorDeserializeRow.storeComplexFieldRowColumn(VectorDeserializeRow.java:778)
  at org.apache.hadoop.hive.ql.exec.vector.VectorDeserializeRow.storeMapRowColumn(VectorDeserializeRow.java:855)
  at org.apache.hadoop.hive.ql.exec.vector.VectorDeserializeRow.storeRowColumn(VectorDeserializeRow.java:941)
  at org.apache.hadoop.hive.ql.exec.vector.VectorDeserializeRow.deserialize(VectorDeserializeRow.java:1360)
  at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.processVectorGroup(ReduceRecordSource.java:470)
  ... 20 more

Would someone know a workaround for this?

Thanks,
Bernard





