GitHub user chenghao-intel opened a pull request:
https://github.com/apache/spark/pull/5542
[SPARK-4233] [SQL] [WIP] UDAF Interface Refactoring
This PR keeps both the old and the new versions of the UDAF implementation; you can switch between them with:
```
SET spark.sql.aggregate2=true/false;
```
The new interface is:
```scala
// Assumed imports (Catalyst / Java types referenced by the signatures below)
import java.util.{Set => JSet}
import org.apache.spark.sql.catalyst.expressions.{BoundReference, MutableRow, Row}

trait AggregateFunction2 {
  self: Product =>
  // Specify the BoundReferences for the aggregation buffer
  def initialize(buffers: Seq[BoundReference]): Unit
  // Initialize (or reinitialize) the aggregation buffer
  def reset(buf: MutableRow): Unit
  // Evaluate the children on the input row, then merge the result into the
  // given aggregation buffer. `seen` is the set of values already encountered,
  // which is used for DISTINCT aggregates; it may be null for non-distinct ones.
  def update(input: Row, buf: MutableRow, seen: JSet[Any]): Unit
  // Merge two aggregation buffers, writing the result back into the latter (`buf`)
  def merge(value: Row, buf: MutableRow): Unit
  // Semantically this may be unnecessary, but it is needed when
  // integrating with Hive UDAFs (GenericUDAF)
  @deprecated
  def terminatePartial(buf: MutableRow): Unit = {}
  // Produce the final result from the aggregation buffer
  def terminate(buffer: Row): Any
}
```
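To make the buffer lifecycle (`reset` → `update` → `merge` → `terminate`) concrete, here is a minimal, self-contained sketch of an `AVG` function written against a simplified mirror of the proposed trait. It does not use Spark: `SimpleRow`, `SimpleMutableRow`, `SimpleAggregateFunction`, `Average` and `aggregatePartition` are all hypothetical names introduced for illustration, the buffer uses fixed slots instead of the `BoundReference`s passed to `initialize`, and the deprecated `terminatePartial` is omitted.

```scala
import java.util.{Set => JSet, HashSet => JHashSet}

// Hypothetical stand-ins for Catalyst's Row / MutableRow (NOT Spark classes):
// a row is just an array of slots that can be read (and, if mutable, written).
class SimpleRow(val values: Array[Any]) {
  def apply(i: Int): Any = values(i)
}

class SimpleMutableRow(size: Int) extends SimpleRow(new Array[Any](size)) {
  def update(i: Int, v: Any): Unit = values(i) = v
}

// Simplified mirror of the proposed AggregateFunction2; initialize() and the
// deprecated terminatePartial() are omitted because slots are fixed here.
trait SimpleAggregateFunction {
  def reset(buf: SimpleMutableRow): Unit
  def update(input: SimpleRow, buf: SimpleMutableRow, seen: JSet[Any]): Unit
  def merge(value: SimpleRow, buf: SimpleMutableRow): Unit
  def terminate(buffer: SimpleRow): Any
}

// AVG over input column 0. Buffer slot 0 holds the running sum, slot 1 the count.
class Average extends SimpleAggregateFunction {
  def reset(buf: SimpleMutableRow): Unit = {
    buf(0) = 0.0
    buf(1) = 0L
  }

  // Nulls are skipped. When `seen` is non-null (DISTINCT), a value is only
  // counted the first time JSet.add reports it as new.
  def update(input: SimpleRow, buf: SimpleMutableRow, seen: JSet[Any]): Unit = {
    val v = input(0)
    if (v != null && (seen == null || seen.add(v))) {
      buf(0) = buf(0).asInstanceOf[Double] + v.asInstanceOf[Number].doubleValue()
      buf(1) = buf(1).asInstanceOf[Long] + 1L
    }
  }

  // Fold a partial buffer (`value`) into `buf`, i.e. write back to the latter.
  def merge(value: SimpleRow, buf: SimpleMutableRow): Unit = {
    buf(0) = buf(0).asInstanceOf[Double] + value(0).asInstanceOf[Double]
    buf(1) = buf(1).asInstanceOf[Long] + value(1).asInstanceOf[Long]
  }

  // AVG over zero non-null rows is null, as in SQL.
  def terminate(buffer: SimpleRow): Any = {
    val count = buffer(1).asInstanceOf[Long]
    if (count == 0L) null else buffer(0).asInstanceOf[Double] / count
  }
}

// Hypothetical driver: aggregate one "partition" of single-column rows.
def aggregatePartition(f: SimpleAggregateFunction, rows: Seq[Any], seen: JSet[Any]): SimpleMutableRow = {
  val buf = new SimpleMutableRow(2)
  f.reset(buf)
  rows.foreach(v => f.update(new SimpleRow(Array[Any](v)), buf, seen))
  buf
}

val avg = new Average
// Non-distinct: two partial buffers (seen = null), then merge p1 into p2.
val p1 = aggregatePartition(avg, Seq(1, 2, 3), null)
val p2 = aggregatePartition(avg, Seq(4, null), null)
avg.merge(p1, p2)
println(avg.terminate(p2)) // prints 2.5

// Distinct: one buffer with a single `seen` set, so repeated values are skipped.
val d = aggregatePartition(avg, Seq(1, 1, 2, 2, 6), new JHashSet[Any]())
println(avg.terminate(d)) // prints 3.0
```

One caveat the sketch deliberately sidesteps: a `seen` set is local to the buffer it was used with, so merging two DISTINCT partial buffers built from different partitions would double-count values that appear in both. The example therefore only exercises `seen` within a single buffer.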
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/chenghao-intel/spark udaf_refactor
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/5542.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #5542
----
commit a6fdd419a63c241f2d21ca3bebc0222cf96e9280
Author: Cheng Hao
Date: 2015-04-11T06:08:23Z
migrate to support both version of UDAF
commit bfe30158023c6f876fd9112d456e749b35afd40c
Author: Cheng Hao
Date: 2015-04-11T06:33:12Z
Update the unit test to comment out the not support ones
commit 3a8232ccdf5abcfd6962be8dc6f3e50dbb8a6f88
Author: Cheng Hao
Date: 2015-04-13T23:05:37Z
update the interface name
commit dced96cde1f650f1fd098cc77504d6858febc104
Author: Cheng Hao
Date: 2015-04-15T02:43:40Z
change the update method from Any to Row
commit 29202bd89e028ce534b1b986c3444c362c29a3d8
Author: Cheng Hao
Date: 2015-04-15T18:29:51Z
move the distinct into the udaf
commit 967716b74d7cdbd3ea382e5b657a3952b3ef585f
Author: Cheng Hao
Date: 2015-04-16T07:50:31Z
simpify the aggregate expression by uing the Projection
commit 504fbe52abc191e873cc741fe9ef44d9e4013d7a
Author: Cheng Hao
Date: 2015-04-16T08:23:43Z
revert the uncessary changes
commit e9017ed23e946a3cc0d7e5142c05a19039491c58
Author: Cheng Hao
Date: 2015-04-16T17:45:12Z
Add Unit test
----