Gopal V created HIVE-20177:
------------------------------

             Summary: Vectorization: Reduce KeyWrapper allocation in GroupBy Streaming mode
                 Key: HIVE-20177
                 URL: https://issues.apache.org/jira/browse/HIVE-20177
             Project: Hive
          Issue Type: Bug
          Components: Vectorization
            Reporter: Gopal V


The streaming mode for VectorGroupBy allocates a large number of arrays due to 
VectorHashKeyWrapper::duplicateTo().

Since the vectors can't be mutated in place while a single batch is being 
processed, these allocations can be cut by roughly 1000x by copying the 
streaming key once at the end of the loop, instead of reallocating within the 
loop.

{code}
      for(int i = 0; i < batch.size; ++i) {
        if (!batchKeys[i].equals(streamingKey)) {
          // We've encountered a new key, must save current one
          // We can't forward yet, the aggregators have not been evaluated
          rowsToFlush[flushMark] = currentStreamingAggregators;
          if (keysToFlush[flushMark] == null) {
            keysToFlush[flushMark] = (VectorHashKeyWrapper) streamingKey.copyKey();
          } else {
            streamingKey.duplicateTo(keysToFlush[flushMark]);
          }

          currentStreamingAggregators = streamAggregationBufferRowPool.getFromPool();
          batchKeys[i].duplicateTo(streamingKey);
          ++flushMark;
        }
{code}

The duplicateTo call can be pushed out of the loop, since the only key that 
truly needs to be copied is the last unique key in the VRB.
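
A minimal sketch of the idea, not the actual Hive patch: the Key class, its 
sameAs() method, and the two helper methods below are hypothetical stand-ins 
for VectorHashKeyWrapper and the streaming loop, and flushing of keys and 
aggregators is left out. It assumes the batch's keys are not mutated while the 
loop runs, so the current key can be tracked by reference and deep-copied only 
once after the loop.

```java
import java.util.Arrays;

public class StreamingKeySketch {

  /** Hypothetical stand-in for a key wrapper; not Hive's VectorHashKeyWrapper. */
  static final class Key {
    long[] values;

    Key(long... values) {
      this.values = values;
    }

    /** Deep copy into target; allocates a fresh array and counts it. */
    void duplicateTo(Key target, int[] allocCounter) {
      target.values = Arrays.copyOf(values, values.length);
      allocCounter[0]++;
    }

    boolean sameAs(Key other) {
      return Arrays.equals(values, other.values);
    }
  }

  /** Copies the key inside the loop on every key change; returns array allocations. */
  static int copyInsideLoop(Key[] batchKeys, Key streamingKey) {
    int[] allocs = {0};
    for (Key k : batchKeys) {
      if (!k.sameAs(streamingKey)) {
        k.duplicateTo(streamingKey, allocs);
      }
    }
    return allocs[0];
  }

  /** Tracks the current key by reference and copies at most once after the loop. */
  static int copyAfterLoop(Key[] batchKeys, Key streamingKey) {
    int[] allocs = {0};
    Key current = streamingKey;
    for (Key k : batchKeys) {
      if (!k.sameAs(current)) {
        current = k; // safe only because the batch's keys are not mutated here
      }
    }
    if (current != streamingKey) {
      // The batch's key objects may be reused later, so keep a private copy.
      current.duplicateTo(streamingKey, allocs);
    }
    return allocs[0];
  }

  public static void main(String[] args) {
    Key[] batch = new Key[1024];
    for (int i = 0; i < batch.length; i++) {
      batch[i] = new Key((long) i); // worst case: every row starts a new key
    }
    System.out.println(copyInsideLoop(batch, new Key(-1L))); // prints 1024
    System.out.println(copyAfterLoop(batch, new Key(-1L)));  // prints 1
  }
}
```

In this worst case (every row carries a new key) the in-loop variant allocates 
one array per row, while the end-of-loop variant allocates one per batch.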



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
