Github user marmbrus commented on the pull request:
https://github.com/apache/spark/pull/3247#issuecomment-68805521
I only looked at this quickly, but I like the goals, especially the middle
one. Our current implementation is really wasteful on memory. Some thoughts:
- It would be good if you could write up a quick design doc that outlines
the interfaces, since right now it's kind of hard to pull them out from all
the other changes you have to make.
- I wonder if it is possible to combine aggregate expression and aggregate
function somehow.
- Can you explain how the `modes` are used? Do we really need them?
Other things:
- Before we commit this we will have to implement the approximates. I
don't think it's okay to regress in functionality here.
- I'm not totally against removing the code-generated version, but I'd
have to see some performance tests that show we aren't regressing.