Gopal V created HIVE-20177:
------------------------------

             Summary: Vectorization: Reduce KeyWrapper allocation in GroupBy Streaming mode
                 Key: HIVE-20177
                 URL: https://issues.apache.org/jira/browse/HIVE-20177
             Project: Hive
          Issue Type: Bug
          Components: Vectorization
            Reporter: Gopal V

The streaming mode for VectorGroupBy allocates a large number of arrays due to VectorHashKeyWrapper::duplicateTo().

Since the vectors can't be mutated in-place while a single batch is being processed, these allocations can be cut by 1000x by allocating a streaming key once at the end of the loop, instead of reallocating within the loop.

{code}
for(int i = 0; i < batch.size; ++i) {
  if (!batchKeys[i].equals(streamingKey)) {
    // We've encountered a new key, must save current one
    // We can't forward yet, the aggregators have not been evaluated
    rowsToFlush[flushMark] = currentStreamingAggregators;
    if (keysToFlush[flushMark] == null) {
      keysToFlush[flushMark] = (VectorHashKeyWrapper) streamingKey.copyKey();
    } else {
      streamingKey.duplicateTo(keysToFlush[flushMark]);
    }

    currentStreamingAggregators = streamAggregationBufferRowPool.getFromPool();
    batchKeys[i].duplicateTo(streamingKey);
    ++flushMark;
  }
{code}

The duplicateTo can be pushed out of the loop, since the only key that truly needs a copy is the last unique key in the VRB.
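
To illustrate the idea, here is a minimal standalone sketch, not the Hive code and not a proposed patch. The class StreamingKeySketch, its Key and Result types and the run() method are all hypothetical stand-ins; Key only imitates a key wrapper whose copy allocates a fresh array. It assumes the per-batch key wrappers stay valid until the batch has been fully processed and flushed, so the loop can borrow references and make a single owned copy of the last key once the batch is done.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class StreamingKeySketch {

    /** Hypothetical stand-in for a key wrapper; every copy allocates a new array. */
    public static final class Key {
        final long[] values;
        Key(long[] values) { this.values = values; }
        Key copyKey() { return new Key(values.clone()); }
        boolean sameAs(Key other) {
            return other != null && Arrays.equals(values, other.values);
        }
    }

    /** Keys forwarded downstream, plus the number of key copies that were made. */
    public static final class Result {
        public final List<Long> flushed = new ArrayList<>();
        public int copies = 0;
    }

    /**
     * Processes sorted batches of single-column keys.
     * copyInsideLoop = true copies on every key change (the pattern in the excerpt);
     * copyInsideLoop = false borrows references and copies once per batch.
     */
    public static Result run(long[][] batches, boolean copyInsideLoop) {
        Result result = new Result();
        Key streamingKey = null;        // owned copy that survives across batches
        Key[] batchKeys = new Key[0];   // reused and overwritten for every batch

        for (long[] batch : batches) {
            // Overwrite the reusable wrappers in place, as a reused row batch would.
            if (batchKeys.length < batch.length) {
                batchKeys = Arrays.copyOf(batchKeys, batch.length);
            }
            for (int i = 0; i < batch.length; i++) {
                if (batchKeys[i] == null) {
                    batchKeys[i] = new Key(new long[1]);
                }
                batchKeys[i].values[0] = batch[i];
            }

            Key current = streamingKey;
            List<Key> keysToFlush = new ArrayList<>();
            for (int i = 0; i < batch.length; i++) {
                if (!batchKeys[i].sameAs(current)) {
                    if (current != null) {
                        keysToFlush.add(current); // finished key, forwarded below
                    }
                    if (copyInsideLoop) {
                        current = batchKeys[i].copyKey();
                        result.copies++;
                    } else {
                        current = batchKeys[i];   // borrow: valid until the next batch
                    }
                }
            }

            // Forward finished keys while the batch contents are still intact.
            for (Key k : keysToFlush) {
                result.flushed.add(k.values[0]);
            }

            if (copyInsideLoop) {
                streamingKey = current;
            } else if (current != streamingKey) {
                // Only the last unique key must outlive the batch: copy it once.
                streamingKey = current.copyKey();
                result.copies++;
            }
        }
        if (streamingKey != null) {
            result.flushed.add(streamingKey.values[0]); // final flush at end of input
        }
        return result;
    }

    public static void main(String[] args) {
        long[][] batches = {{1, 1, 2, 2, 3}, {3, 3, 4, 5, 5}, {5, 6, 6, 6, 7}};
        System.out.println("copies inside loop: " + run(batches, true).copies);
        System.out.println("copies deferred:    " + run(batches, false).copies);
    }
}
```

In this sketch both variants forward the same keys in the same order. The copy-inside-loop variant makes one copy per distinct key, while the deferred variant makes at most one copy per batch, which is where the saving comes from when a batch holds many distinct keys.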