Github user viirya commented on a diff in the pull request:
https://github.com/apache/spark/pull/19082#discussion_r143326742
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala
---
@@ -797,26 +904,44 @@ case class HashAggregateExec(
def updateRowInFastHashMap(isVectorized: Boolean): Option[String] = {
- ctx.INPUT_ROW = fastRowBuffer
+ // We need to copy the aggregation row buffer to a local row first because each aggregate
+ // function directly updates the buffer when it finishes.
+ val localRowBuffer = ctx.freshName("localFastRowBuffer")
+ val initLocalRowBuffer = s"InternalRow $localRowBuffer = $fastRowBuffer.copy();"
--- End diff --
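The diff comment says the copy exists because each aggregate function writes its result straight into the buffer. That hazard can be shown with a small Scala sketch. It assumes a hypothetical two-slot `Array[Long]` buffer and two made-up update steps, the second of which reads a slot the first one writes. It models only the ordering issue and is not Spark's `InternalRow` or codegen API.

```scala
// A minimal sketch, assuming a hypothetical two-slot buffer; not Spark's InternalRow API.
object CopyBeforeUpdate {
  type Buffer = Array[Long]

  // Hypothetical update steps: each reads from `in` and writes its result into `out`.
  // Step 2 reads slot 0, which step 1 has already written.
  val steps: Seq[(Buffer, Buffer, Long) => Unit] = Seq(
    (in, out, x) => out(0) = in(0) + x,
    (in, out, x) => out(1) = in(1) + in(0)
  )

  // Updates in place: later steps observe values written by earlier steps.
  def updateInPlace(buf: Buffer, x: Long): Unit =
    steps.foreach(step => step(buf, buf, x))

  // Copies first, so every step reads the values the buffer held before this input row.
  def updateFromCopy(buf: Buffer, x: Long): Unit = {
    val local = buf.clone()
    steps.foreach(step => step(local, buf, x))
  }

  def main(args: Array[String]): Unit = {
    val a = Array(1L, 10L); updateInPlace(a, 5)
    val b = Array(1L, 10L); updateFromCopy(b, 5)
    println(a.mkString(",")) // 6,16
    println(b.mkString(",")) // 6,11
  }
}
```

In this sketch the copy only matters when one step reads a slot another step writes. Whether any real aggregate function does that is exactly what the question below asks.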
Why do we need to copy the row buffer? You bind `updateExpr` to the local
copied row buffer, but the evaluation happens in split functions. Isn't it
possible that `updateExpr` can't find the local variable holding the copied
row buffer inside those functions?
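The scoping concern can be illustrated with a simplified Scala sketch of code generation. The helpers `splitFunction` and `callSite` and all variable names are hypothetical and do not reflect Spark's `CodegenContext` API. The sketch only shows the Java rule at stake: a local declared in the calling method is not in scope inside a separately generated method unless it is passed in as a parameter.

```scala
// A minimal sketch with hypothetical helper names; not Spark's CodegenContext API.
object SplitFunctionSketch {
  // Wraps a generated code fragment in a private Java method. Every local variable the
  // fragment reads must appear as a parameter, because a local declared in the caller's
  // method body is not visible inside a different method.
  def splitFunction(name: String, params: Seq[(String, String)], body: String): String = {
    val sig = params.map { case (tpe, v) => s"$tpe $v" }.mkString(", ")
    s"private void $name($sig) {\n  $body\n}"
  }

  // The call site passes the same locals as arguments, in the same order.
  def callSite(name: String, params: Seq[(String, String)]): String =
    s"$name(${params.map(_._2).mkString(", ")});"

  def main(args: Array[String]): Unit = {
    // Hypothetical generated names for the copied buffer and the target buffer.
    val localRowBuffer = "localFastRowBuffer_0"
    val params = Seq("InternalRow" -> localRowBuffer, "InternalRow" -> "fastRowBuffer_0")
    println(splitFunction("updateAgg_0", params,
      s"fastRowBuffer_0.setLong(0, $localRowBuffer.getLong(0) + 1L);"))
    println(callSite("updateAgg_0", params))
  }
}
```

Under these assumptions, one way to address the concern would be to add the copied buffer to the parameter list of every split function. The diff excerpt above does not show whether the PR does this.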