[GitHub] spark pull request #19082: [SPARK-21870][SQL] Split aggregation code into sm...

viirya Sat, 07 Oct 2017 02:13:46 -0700

Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19082#discussion_r143326742
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala ---
    @@ -797,26 +904,44 @@ case class HashAggregateExec(
     
     
         def updateRowInFastHashMap(isVectorized: Boolean): Option[String] = {
    -      ctx.INPUT_ROW = fastRowBuffer
    +      // We need to copy the aggregation row buffer to a local row first because each aggregate
    +      // function directly updates the buffer when it finishes.
    +      val localRowBuffer = ctx.freshName("localFastRowBuffer")
    +      val initLocalRowBuffer = s"InternalRow $localRowBuffer = $fastRowBuffer.copy();"
    --- End diff --
    
    Why do we need to copy the row buffer? You bind `updateExpr` to the local
    copied row buffer, but the evaluation happens in the split functions. Isn't
    it possible that `updateExpr` can't find the local variable holding the
    copied row buffer inside those functions?
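
To make the scoping concern concrete, here is a minimal standalone Java sketch. It is not the code generated by this PR: the class `SplitFunctionScope`, the `long[]` stand-in for `InternalRow`, and the helpers `updateSum` and `updateCount` are all hypothetical names. It only illustrates that a local declared in one method is invisible to other methods, so a split function could reach the copied buffer only if the copy were handed to it explicitly (shown here as a parameter, which is an assumption about one possible remedy, not a description of what the PR does).

```java
// Hypothetical sketch; a long[] stands in for Spark's InternalRow.
final class SplitFunctionScope {

    // A split-out update function. It cannot see locals of updateRow(),
    // so the copied buffer has to arrive as an explicit parameter.
    static long updateSum(long[] localFastRowBuffer, long input) {
        return localFastRowBuffer[0] + input;
    }

    // A second split-out function that reads from the same local copy.
    static long updateCount(long[] localFastRowBuffer) {
        return localFastRowBuffer[1] + 1;
    }

    static void updateRow(long[] fastRowBuffer, long input) {
        // Copy first: each function below writes its result directly into
        // fastRowBuffer when it finishes, so reads go through the snapshot.
        long[] localFastRowBuffer = fastRowBuffer.clone();

        // Referring to localFastRowBuffer inside updateSum/updateCount
        // without passing it would be a compile error in Java.
        fastRowBuffer[0] = updateSum(localFastRowBuffer, input);
        fastRowBuffer[1] = updateCount(localFastRowBuffer);
    }

    public static void main(String[] args) {
        long[] buffer = {10L, 20L};
        updateRow(buffer, 5L);
        System.out.println(buffer[0] + " " + buffer[1]);
    }
}
```

If the generated update expressions reference the local by name but end up inside separately generated methods, the reference would fail to compile unless the local is passed along or promoted to a field.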



