Github user mccheah commented on the pull request:
https://github.com/apache/spark/pull/11688#issuecomment-196447273
Good points, @hvanhovell. I wasn't aware of that PR; I looked at it and it
does indeed look very similar, just using Imperative instead of Declarative.
Good point as well that type conversion isn't done for imperative aggregates,
only when a UDAF is involved.
My understanding is that DeclarativeAggregate could theoretically be faster,
provided we implement good code generation for the expressions that back it.
The main difference between declarative and imperative is that a declarative
aggregate is backed by Catalyst expressions, which appear to support code
generation here, whereas an imperative aggregate function does not. Of course,
since I don't actually do codegen in this PR, that's not relevant at the moment.
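To make the structural difference concrete, here is a simplified sketch. It does
not use the actual Spark traits or signatures (the names Expr, DeclarativeAgg,
ImperativeAgg, and CountLike are all invented for illustration); it only shows
why expression-backed aggregates are the codegen-friendly ones:

```scala
// Simplified, hypothetical sketch of the declarative vs. imperative split.
// A declarative aggregate exposes its update/merge logic as expression trees
// that a code generator could later compile; an imperative one only exposes
// opaque methods that must be invoked as-is.

sealed trait Expr                                   // stand-in for a Catalyst expression
case class ColRef(name: String) extends Expr
case class Add(left: Expr, right: Expr) extends Expr
case class Literal(value: Long) extends Expr

// Declarative style: the engine can inspect, fuse, and codegen these expressions.
trait DeclarativeAgg {
  def initialValues: Seq[Expr]
  def updateExpressions: Seq[Expr]
  def mergeExpressions: Seq[Expr]
  def evaluateExpression: Expr
}

// Imperative style: the engine can only call these methods against a buffer.
trait ImperativeAgg {
  def initialize(buffer: Array[Any]): Unit
  def update(buffer: Array[Any], input: Array[Any]): Unit
  def merge(buffer: Array[Any], other: Array[Any]): Unit
  def eval(buffer: Array[Any]): Any
}

// A Count-like aggregate written declaratively: "count + 1" stays an expression
// tree, so a code generator could inline it straight into the aggregation loop.
object CountLike extends DeclarativeAgg {
  private val count = ColRef("count")
  val initialValues      = Seq(Literal(0L))
  val updateExpressions  = Seq(Add(count, Literal(1L)))
  val mergeExpressions   = Seq(Add(ColRef("count.left"), ColRef("count.right")))
  val evaluateExpression = count
}
```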
It might be a good thought experiment to look through our current built-in
aggregates and see which ones are Imperative vs. Declarative, and why. For
example, I see HyperLogLogPlusPlus extends the imperative interface, while
Count and Average are declarative.
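My guess as to why: something like HyperLogLogPlusPlus mutates an array of
registers with per-row control flow that is awkward to express as a handful of
Catalyst expressions, whereas Count and Average are just a couple of arithmetic
updates. A rough, hypothetical sketch of the HLL-style update (not Spark's
actual implementation) shows the kind of logic that pushes it toward the
imperative interface:

```scala
// Hypothetical HLL-like state: mutation of a register array plus branching,
// which is more natural as imperative code than as a fixed set of expressions.
import java.lang.Long.numberOfLeadingZeros

class HllLikeState(p: Int) {
  val registers = new Array[Byte](1 << p)

  def add(hash: Long): Unit = {
    val idx  = (hash >>> (64 - p)).toInt                     // top p bits pick a register
    val rank = (numberOfLeadingZeros((hash << p) | 1L) + 1).toByte // rank of remaining bits
    if (rank > registers(idx)) registers(idx) = rank         // keep the max rank seen
  }

  def merge(other: HllLikeState): Unit = {
    var i = 0
    while (i < registers.length) {
      if (other.registers(i) > registers(i)) registers(i) = other.registers(i)
      i += 1
    }
  }
}
```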
Good point on the use of mutable array data; I wonder if we could support an
Unsafe-esque implementation of that construct. I don't quite know how we would
go about "not using the buffer at all", though; do you mean storing the
intermediate state inside the function object itself?
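If that is what you mean, I imagine something along these lines. None of these
names come from the Spark API (ObjectBackedAverage and its methods are made up
for illustration); the working state lives in plain JVM fields and is only
serialized when partial results need to be exchanged:

```scala
// Hypothetical sketch of keeping intermediate state on the function object
// instead of in a mutable row/array buffer, serializing only at merge time.
class ObjectBackedAverage {
  private var sum: Double = 0.0
  private var count: Long = 0L

  def update(value: Double): Unit = { sum += value; count += 1 }

  // Serialize the state only when a partial result must cross a boundary.
  def serialize(): Array[Byte] = {
    val buf = java.nio.ByteBuffer.allocate(16)
    buf.putDouble(sum).putLong(count)
    buf.array()
  }

  // Merge a serialized partial result from another instance.
  def merge(bytes: Array[Byte]): Unit = {
    val buf = java.nio.ByteBuffer.wrap(bytes)
    sum += buf.getDouble()
    count += buf.getLong()
  }

  def eval(): Double = if (count == 0) Double.NaN else sum / count
}
```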