Github user mccheah commented on the pull request:

    https://github.com/apache/spark/pull/11688#issuecomment-196447273
  
    Good points, @hvanhovell. I wasn't aware of that PR; I looked at it and it 
does indeed look very similar, just built on Imperative rather than Declarative 
aggregates. Good point that type conversion isn't done for imperative 
aggregates, only when a UDAF is involved.
    
    My understanding of the difference is that DeclarativeAggregate could 
theoretically be faster, if we implement good code generation for the 
expressions that back it. The main difference between declarative and 
imperative is that a declarative aggregate is described by Catalyst 
expressions, which support code generation, whereas an imperative aggregate 
function is opaque to codegen. Of course, since this PR doesn't actually do 
any codegen, that isn't relevant at the moment.
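
    To make the contrast concrete, here is a rough, self-contained sketch of 
the two shapes for a simple sum. The types (`Expr`, `BufferRef`, and so on) are 
simplified stand-ins invented for illustration, not the real Catalyst classes 
or the actual DeclarativeAggregate/ImperativeAggregate signatures:

```scala
// Declarative style: the aggregate is *described* as expression trees, which a
// planner could interpret or compile via codegen however it likes.
sealed trait Expr
case class BufferRef(slot: Int) extends Expr               // reads a buffer slot
case class InputRef(ordinal: Int) extends Expr             // reads an input column
case class LitLong(value: Long) extends Expr
case class Add(left: Expr, right: Expr) extends Expr

case class DeclarativeSum(inputOrdinal: Int) {
  val initialValues: Seq[Expr]     = Seq(LitLong(0L))
  val updateExpressions: Seq[Expr] = Seq(Add(BufferRef(0), InputRef(inputOrdinal)))
  // slot 1 stands for the matching slot in the other partition's buffer
  val mergeExpressions: Seq[Expr]  = Seq(Add(BufferRef(0), BufferRef(1)))
  val evaluateExpression: Expr     = BufferRef(0)
}

// Imperative style: the aggregate *is* the code; it mutates a buffer directly,
// so there is nothing for expression-level codegen to look inside of.
class ImperativeSum(inputOrdinal: Int) {
  def initialize(buffer: Array[Any]): Unit = buffer(0) = 0L
  def update(buffer: Array[Any], input: Array[Any]): Unit =
    buffer(0) = buffer(0).asInstanceOf[Long] + input(inputOrdinal).asInstanceOf[Long]
  def merge(buffer: Array[Any], other: Array[Any]): Unit =
    buffer(0) = buffer(0).asInstanceOf[Long] + other(0).asInstanceOf[Long]
  def eval(buffer: Array[Any]): Any = buffer(0)
}
```

    The declarative version is just data describing the computation, so a 
planner can either interpret it or compile it; the imperative version is 
already-compiled JVM code that can only be invoked as-is.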
    
    It might be a good thought experiment to look through our current built-in 
aggregates and see which ones implement Imperative vs. Declarative, and why. 
HyperLogLogPlusPlus extends the imperative interface, for example, while Count 
and Average implement the declarative one.
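
    The split seems to track how naturally the update fits into a single 
expression. As a toy illustration of the shape (this is not Spark's 
HyperLogLogPlusPlus, and the hash/rank details are simplified stand-ins), an 
HLL-style update is a hash, some bit twiddling, and a conditional write into 
one of many buffer words, which is awkward as a flat expression but trivial as 
a method that mutates a buffer:

```scala
import java.lang.Integer.numberOfTrailingZeros
import java.lang.Long.numberOfLeadingZeros
import scala.util.hashing.byteswap64

// Toy sketch only: real HyperLogLog++ uses a different hash, register encoding,
// and bias-corrected estimation. Only the update/merge *shape* matters here.
class ToyHllAggregate(numRegisters: Int = 64) {
  require((numRegisters & (numRegisters - 1)) == 0, "numRegisters must be a power of two")
  private val indexBits = numberOfTrailingZeros(numRegisters)

  def initialize(registers: Array[Int]): Unit =
    java.util.Arrays.fill(registers, 0)

  def update(registers: Array[Int], value: Long): Unit = {
    val hash = byteswap64(value)                              // stand-in hash function
    val idx  = (hash & (numRegisters - 1)).toInt              // low bits pick a register
    val rank = numberOfLeadingZeros(hash >>> indexBits) + 1   // rank from the remaining bits
    if (rank > registers(idx)) registers(idx) = rank          // conditional write to one word
  }

  def merge(registers: Array[Int], other: Array[Int]): Unit = {
    var i = 0
    while (i < numRegisters) {                                // register-wise max
      if (other(i) > registers(i)) registers(i) = other(i)
      i += 1
    }
  }
}
```

    Count's update, by contrast, is essentially just `count + 1`, which is 
exactly the kind of thing the declarative interface wants.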
    
    Good point on the use of mutable array data; I wonder if we could support 
an Unsafe-style implementation of that construct. I'm not quite sure, though, 
how we would go about "not using the buffer at all"; do you mean storing the 
intermediate state inside the function object itself?
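
    If that is the idea, I picture something like the following. This is only a 
self-contained sketch of the concept, not an existing Spark interface, and the 
class and method names are invented:

```scala
import scala.collection.mutable

// "State in the function object": the working state lives as an ordinary JVM
// object and is only flattened into something row-/Unsafe-friendly when a
// partial result has to be shipped between tasks.
class ObjectStateDistinctCount {
  private val state = mutable.HashSet.empty[Long]

  def update(value: Long): Unit = state += value

  // Serialize only at partition boundaries, e.g. into a flat Array[Long].
  def serializeForExchange(): Array[Long] = state.toArray

  def mergeFromExchange(other: Array[Long]): Unit = state ++= other

  def eval(): Int = state.size   // exact distinct count, as an example
}
```

    That would avoid re-encoding the state into the row buffer on every update, 
at the cost of an explicit serialization step at exchange boundaries.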

