GitHub user JoshRosen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9038#discussion_r41581609
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/TungstenAggregationIterator.scala ---
    @@ -170,10 +232,27 @@ class TungstenAggregationIterator(
         val bufferRowSize: Int = bufferSchema.length
     
         val genericMutableBuffer = new GenericMutableRow(bufferRowSize)
    -    val unsafeProjection =
    -      UnsafeProjection.create(bufferSchema.map(_.dataType))
    -    val buffer = unsafeProjection.apply(genericMutableBuffer)
    -    initialProjection.target(buffer)(EmptyRow)
    +    // TODO(josh): figure out whether we have to use
    +    val useUnsafeBuffer = bufferSchema.map(_.dataType).forall(UnsafeRow.isMutable)
    +
    +    val buffer =  /* if (useUnsafeBuffer) */ {
    --- End diff --
    
    Whoops, I meant to clean this up. One concern I have is whether it's safe to
    expose an UnsafeRow aggregation buffer to ImperativeAggregate functions. Even
    if UnsafeRow supports in-place updates for every data type in the aggregate
    function's buffer schema, I'm still slightly worried about what might happen
    if the imperative aggregate code calls the generic `MutableRow.update()`
    method. As long as we can guarantee that no valid imperative aggregate
    function calls `update()`, I think it should be fine to always use UnsafeRow
    here.
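
To make the concern concrete, here is a minimal, self-contained sketch. It does not use Spark's classes: `MutableRowLike`, `FixedWidthRow`, `GenericRowLike`, and the two `count...` functions are hypothetical stand-ins, and the sketch assumes that a binary, fixed-width row rejects the generic `update()` call while still supporting typed in-place setters.

```scala
import java.nio.ByteBuffer

object BufferSketch {
  // Hypothetical stand-in for a mutable row: typed accessors plus a generic update().
  trait MutableRowLike {
    def getLong(ordinal: Int): Long
    def setLong(ordinal: Int, value: Long): Unit
    def update(ordinal: Int, value: Any): Unit
  }

  // Hypothetical binary row: 8 bytes per field, updated in place by typed setters.
  // Assumption: the generic update() is unsupported on this representation.
  final class FixedWidthRow(numFields: Int) extends MutableRowLike {
    private val bytes = ByteBuffer.allocate(numFields * 8)
    def getLong(ordinal: Int): Long = bytes.getLong(ordinal * 8)
    def setLong(ordinal: Int, value: Long): Unit = bytes.putLong(ordinal * 8, value)
    def update(ordinal: Int, value: Any): Unit =
      throw new UnsupportedOperationException("generic update() on a binary row")
  }

  // Hypothetical object-array row: both typed setters and generic update() work.
  final class GenericRowLike(numFields: Int) extends MutableRowLike {
    private val values = Array.fill[Any](numFields)(0L)
    def getLong(ordinal: Int): Long = values(ordinal).asInstanceOf[Long]
    def setLong(ordinal: Int, value: Long): Unit = values(ordinal) = value
    def update(ordinal: Int, value: Any): Unit = values(ordinal) = value
  }

  // An aggregate update written against the typed setter: safe on either buffer.
  def countWithTypedSetter(buffer: MutableRowLike): Unit =
    buffer.setLong(0, buffer.getLong(0) + 1L)

  // The same update written against the generic method: fails on the binary row.
  def countWithGenericUpdate(buffer: MutableRowLike): Unit =
    buffer.update(0, buffer.getLong(0) + 1L)
}
```

Under these assumptions, always handing out the binary buffer is safe only if every aggregate update goes through the typed setters; a single call to the generic method would fail at runtime rather than at compile time.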

