GitHub user JoshRosen opened a pull request:
https://github.com/apache/spark/pull/9038
[SPARK-11017] Support ImperativeAggregates in TungstenAggregate
This patch extends TungstenAggregate to support ImperativeAggregate
functions. Previously, the TungstenAggregate operator supported only
DeclarativeAggregate functions, which are defined in terms of Catalyst
expressions and can be evaluated via generated projections. ImperativeAggregate
functions, on the other hand, are evaluated by calling their `initialize`,
`update`, `merge`, and `eval` methods.
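To make the distinction concrete, here is a rough toy model of the two styles.
It is written in Python only for brevity and is not Spark's API: the class
names, the list-based buffer, and the `run_*` helpers are all hypothetical
stand-ins. It assumes a declarative aggregate can be modeled as a set of
functions that a generic engine evaluates, while an imperative aggregate
mutates the buffer itself.

```python
from typing import Callable, List

Buffer = List[float]


class DeclarativeSum:
    """Declarative style: the aggregate only describes its logic as
    expressions (modeled as plain functions); a generic engine runs them."""

    def __init__(self) -> None:
        self.initial_values: Buffer = [0.0]
        # One expression per buffer slot: (buffer, input value) -> new slot value.
        self.update_expressions: List[Callable[[Buffer, float], float]] = [
            lambda buf, value: buf[0] + value
        ]
        self.result_expression: Callable[[Buffer], float] = lambda buf: buf[0]


def run_declarative(agg: DeclarativeSum, values: List[float]) -> float:
    """Hypothetical stand-in for a generated projection."""
    buf = list(agg.initial_values)
    for value in values:
        buf = [expr(buf, value) for expr in agg.update_expressions]
    return agg.result_expression(buf)


class ImperativeMax:
    """Imperative style: the aggregate updates the buffer itself through
    initialize / update / merge / eval methods."""

    def initialize(self, buf: Buffer) -> None:
        buf[0] = float("-inf")

    def update(self, buf: Buffer, value: float) -> None:
        buf[0] = max(buf[0], value)

    def merge(self, buf: Buffer, other: Buffer) -> None:
        buf[0] = max(buf[0], other[0])

    def eval(self, buf: Buffer) -> float:
        return buf[0]


def run_imperative(agg: ImperativeMax, partitions: List[List[float]]) -> float:
    """Aggregates each partition separately, then merges the partial buffers."""
    final: Buffer = [0.0]
    agg.initialize(final)
    for part in partitions:
        partial: Buffer = [0.0]
        agg.initialize(partial)
        for value in part:
            agg.update(partial, value)
        agg.merge(final, partial)
    return agg.eval(final)
```

For example, `run_declarative(DeclarativeSum(), [1.0, 2.0, 3.5])` sums the
inputs through the expression list, while
`run_imperative(ImperativeMax(), [[1.0, 7.0], [3.0]])` drives the four methods
by hand.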
The basic strategy here is similar to how SortBasedAggregate evaluates both
types of aggregate functions: use a generated projection to evaluate the
expression-based declarative aggregates, with dummy placeholder expressions
inserted in place of the imperative aggregate functions' outputs, then invoke
the imperative aggregate functions directly against the aggregation buffer.
The bulk of the diff here consists of code that was copied and adapted from
SortBasedAggregate, with some key changes to handle TungstenAggregate's sort
fallback path.
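The mixed evaluation described above can be sketched roughly as follows. This
is a toy Python model under stated assumptions, not the code in this patch:
`NO_OP`, `make_projection`, `ImperativeMax`, `buffer_offset`, and the
three-slot buffer layout are hypothetical names chosen for illustration. The
idea it shows is a single shared buffer where the projection skips the slots
owned by imperative functions, and those functions then write to their own
slots.

```python
NO_OP = object()  # placeholder meaning "leave this buffer slot untouched"


def make_projection(expressions):
    """Hypothetical stand-in for a generated projection over the whole buffer.

    Each entry is either NO_OP or a function (buffer, row) -> new slot value.
    """
    def project(buf, row):
        new_values = [
            buf[i] if expr is NO_OP else expr(buf, row)
            for i, expr in enumerate(expressions)
        ]
        buf[:] = new_values  # update the shared buffer in place
    return project


class ImperativeMax:
    """Imperative aggregate that owns one slot of the shared buffer."""

    def __init__(self, buffer_offset):
        self.buffer_offset = buffer_offset

    def initialize(self, buf):
        buf[self.buffer_offset] = float("-inf")

    def update(self, buf, row):
        buf[self.buffer_offset] = max(buf[self.buffer_offset], row)

    def eval(self, buf):
        return buf[self.buffer_offset]


def aggregate(rows):
    # Assumed layout: slot 0 = declarative SUM, slot 1 = declarative COUNT,
    # slot 2 = imperative MAX.
    buf = [0, 0, None]
    imperative_functions = [ImperativeMax(buffer_offset=2)]
    for func in imperative_functions:
        func.initialize(buf)

    project = make_projection([
        lambda b, r: b[0] + r,   # SUM
        lambda b, r: b[1] + 1,   # COUNT
        NO_OP,                   # slot reserved for the imperative MAX
    ])

    for row in rows:
        project(buf, row)                  # declarative slots
        for func in imperative_functions:  # imperative slots
            func.update(buf, row)

    return {
        "sum": buf[0],
        "count": buf[1],
        "max": imperative_functions[0].eval(buf),
    }
```

Calling `aggregate([3, 1, 4])` runs both kinds of aggregate over the same
buffer in one pass. The sort fallback path is not modeled here.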
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/JoshRosen/spark support-interpreted-in-tungsten-agg-final
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/9038.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #9038
----
commit 9d141c44818d415b37965444bd001ea8aaa54877
Author: Josh Rosen <[email protected]>
Date: 2015-10-07T23:57:02Z
Add initialInputBufferOffset to TungstenAggregate.
commit 3e92fd1176403512e6076acebc6241d33123f95d
Author: Josh Rosen <[email protected]>
Date: 2015-10-08T00:36:08Z
Refactor TungstenAggregationIterator constructor to accept imperative
aggregate functions, too.
commit 78aaab2b5844b093705bac04cca6e6a55a8c7a8d
Author: Josh Rosen <[email protected]>
Date: 2015-10-08T00:39:09Z
Try enabling ImperativeAggregate for agg queries w/o distinct to see what
breaks.
commit fc9c2a0a9866a768b029dc9588bd63a99d46dcff
Author: Josh Rosen <[email protected]>
Date: 2015-10-08T01:41:20Z
Fix result projection for interpreted aggs.
commit fdd6b91156300c7910e90f44d4a929e1bbc63640
Author: Josh Rosen <[email protected]>
Date: 2015-10-08T01:51:43Z
Use SpecificMutableRow in more places
commit cec6cef6669680c86c49100cca1283fcb608aa64
Author: Josh Rosen <[email protected]>
Date: 2015-10-08T02:25:50Z
Re-initialize aggregate functions when switching to sort.
commit 53b6462f677c7746589d7d3e8a65ee74c8135b59
Author: Josh Rosen <[email protected]>
Date: 2015-10-08T02:26:23Z
Remove stray println
commit 2d2ab17a7793c1f53939baab685f84e1bf468350
Author: Josh Rosen <[email protected]>
Date: 2015-10-08T02:35:24Z
Fix None.get issue.
commit c945bd610a52ad7732cb217fd5b5c077640fe221
Author: Josh Rosen <[email protected]>
Date: 2015-10-08T02:45:56Z
Work around lazy val initialization issues to fix attr. binding errors.
commit b5be45402612be83fad6196870e53d2fec7b87d5
Author: Josh Rosen <[email protected]>
Date: 2015-10-08T02:52:06Z
Use NoOp instead of a null literal.
commit e820d78e645bad43ff3641f368f4796d42686b1e
Author: Josh Rosen <[email protected]>
Date: 2015-10-08T08:17:32Z
Improvements to agg buffer initialization.
commit 7a34e03696bb8b96b23c7a5c6fa9ead169ce4602
Author: Josh Rosen <[email protected]>
Date: 2015-10-08T18:21:37Z
Reset input buffer offset after spilling to sort
----