[GitHub] [spark] planga82 commented on pull request #32057: [SPARK-34961][SQL] Migrate First function from DeclarativeAggregate to TypedImperativeAggregate to improve performance

GitBox Wed, 07 Apr 2021 14:41:39 -0700


planga82 commented on pull request #32057:
URL: https://github.com/apache/spark/pull/32057#issuecomment-815282837



   > Perhaps @cloud-fan can comment on this. Why does `ObjectHashAggregateExec` require at least one of its aggregate functions to be `TypedImperativeAggregate`?
   > 
   > https://github.com/apache/spark/blob/4b5fc1da752ec008468ef80a5717c8beab468387/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/ObjectHashAggregateExec.scala#L143-L149
   > 
   > Without that check the decision tree would be much simpler:
   > 
   > 1. Are all agg buffers fixed width? Use `HashAggregateExec`
   > 2. Is `conf.useObjectHashAggregation` enabled? Use `ObjectHashAggregateExec`
   > 3. Use `SortAggregateExec`
   > 
   > From #15590 it seems that `ObjectHashAggregateExec` is always faster.
   
   I ran a few small tests and now understand better what you mean; yes, it 
looks very interesting. Gentle ping to @liancheng too, in case he can resolve 
our doubt.
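
To make the difference concrete, here is a minimal sketch of the two selection strategies discussed above. It is a toy model, not Spark code: `AggExec`, `AggFn`, `AggChooser` and their members are hypothetical names made up for illustration, and the only behaviour modelled is the check mentioned in the quoted comment (at least one `TypedImperativeAggregate`) versus the simplified three-step tree.

```scala
// Hypothetical, simplified model; none of these types are Spark's real classes.
sealed trait AggExec
case object HashAgg extends AggExec        // stands in for HashAggregateExec
case object ObjectHashAgg extends AggExec  // stands in for ObjectHashAggregateExec
case object SortAgg extends AggExec        // stands in for SortAggregateExec

// One aggregate function, reduced to the two properties the decision depends on.
final case class AggFn(fixedWidthBuffer: Boolean, typedImperative: Boolean)

object AggChooser {
  // Selection with the check under discussion: the object-hash path is taken
  // only if at least one function is a typed imperative aggregate.
  def current(fns: Seq[AggFn], useObjectHash: Boolean): AggExec =
    if (fns.forall(_.fixedWidthBuffer)) HashAgg
    else if (useObjectHash && fns.exists(_.typedImperative)) ObjectHashAgg
    else SortAgg

  // Selection as proposed in the quoted comment: the typed-imperative check is dropped.
  def simplified(fns: Seq[AggFn], useObjectHash: Boolean): AggExec =
    if (fns.forall(_.fixedWidthBuffer)) HashAgg
    else if (useObjectHash) ObjectHashAgg
    else SortAgg
}
```

In this model the two strategies differ only when some buffer is not fixed width, the object-hash flag is on, and no function is typed imperative: `current` falls back to the sort-based path while `simplified` picks the object-hash path.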

