GitHub user clockfly opened a pull request:
https://github.com/apache/spark/pull/14753
[SPARK-17187][SQL] Supports using arbitrary Java object as internal
aggregation buffer object
## What changes were proposed in this pull request?
This PR introduces an abstract class `TypedImperativeAggregate` so that an
aggregation function extending it can use an **arbitrary** user-defined Java
object as its intermediate aggregation buffer.
**This has the following advantages:**
1. It supports a broader class of aggregation functions. For example, it
makes it much easier to implement `percentile_approx`, which has a complex
aggregation buffer definition.
2. It avoids serializing and de-serializing the aggregation state on every
call to `update` or `merge` when converting a domain-specific aggregation
object to Spark SQL's internal storage format.
3. It is easier to integrate with existing monoid libraries such as
Algebird, supporting more aggregation functions with high performance.
Please see the Java doc of `TypedImperativeAggregate` and JIRA ticket
SPARK-17187 for more information.
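To make the contract concrete, here is a minimal sketch in plain Java of the lifecycle this PR describes. It is not the actual Spark API: the `AvgBuffer` class and the static method names merely mirror the `createAggregationBuffer`/`update`/`merge`/`eval`/`serialize`/`deserialize` shape of `TypedImperativeAggregate`, using a simple average as the aggregate. The key point from advantage 2 is visible here: `update` and `merge` mutate the Java object directly, and serialization happens only once, at the simulated shuffle boundary.

```java
import java.nio.ByteBuffer;

public class Sketch {
    // An arbitrary mutable Java object used as the aggregation buffer
    // (hypothetical stand-in for a complex state like a percentile sketch).
    static final class AvgBuffer {
        long sum;
        long count;
    }

    static AvgBuffer createAggregationBuffer() { return new AvgBuffer(); }

    // update mutates the object in place -- no per-row serialization.
    static AvgBuffer update(AvgBuffer buf, long input) {
        buf.sum += input;
        buf.count += 1;
        return buf;
    }

    static AvgBuffer merge(AvgBuffer a, AvgBuffer b) {
        a.sum += b.sum;
        a.count += b.count;
        return a;
    }

    static double eval(AvgBuffer buf) {
        return buf.count == 0 ? 0.0 : (double) buf.sum / buf.count;
    }

    // Only needed when the buffer crosses a process boundary
    // (e.g. shuffle or spill), not on every update/merge call.
    static byte[] serialize(AvgBuffer buf) {
        return ByteBuffer.allocate(16)
                .putLong(buf.sum).putLong(buf.count).array();
    }

    static AvgBuffer deserialize(byte[] bytes) {
        ByteBuffer bb = ByteBuffer.wrap(bytes);
        AvgBuffer buf = new AvgBuffer();
        buf.sum = bb.getLong();
        buf.count = bb.getLong();
        return buf;
    }

    public static void main(String[] args) {
        // Two partial aggregates, as produced on two partitions.
        AvgBuffer p1 = createAggregationBuffer();
        for (long v : new long[]{1, 2, 3}) p1 = update(p1, v);
        AvgBuffer p2 = createAggregationBuffer();
        for (long v : new long[]{4, 5}) p2 = update(p2, v);
        // Simulate a shuffle: partition 2's buffer is serialized once.
        AvgBuffer merged = merge(p1, deserialize(serialize(p2)));
        System.out.println(eval(merged)); // 3.0
    }
}
```

In the real API the buffer is typed via `TypedImperativeAggregate[T]` and rows arrive as `InternalRow`s, but the division of labor is the same: domain logic works on the object, and the storage-format conversion is confined to `serialize`/`deserialize`.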
## How was this patch tested?
Unit tests.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/clockfly/spark object_aggregation_buffer_try_2
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/14753.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #14753
----
commit 6efddadcb8e6d48e9898a8980f4dcceee4894ebc
Author: Sean Zhong <[email protected]>
Date: 2016-08-19T16:34:56Z
object aggregation buffer
----