GitHub user icexelloss opened a pull request:
https://github.com/apache/spark/pull/19872
WIP: [SPARK-22274][PySpark] User-defined aggregation functions with pandas udf
## What changes were proposed in this pull request?
Add support for using a pandas_udf as an aggregation function in groupby().agg().
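
For readers unfamiliar with the idea, here is a rough conceptual sketch of grouped aggregation with a user-defined function: the values of each group are collected and handed to a function that returns one scalar per group. This is not the code in this pull request and it does not use the PySpark API. Plain Python lists stand in for the pandas.Series a pandas udf would receive, and the names group_agg, rows, key, column and udf are hypothetical, chosen only for illustration.

```python
from collections import defaultdict
from statistics import mean


def group_agg(rows, key, column, udf):
    """Group `rows` by `key` and reduce each group's `column` values with `udf`.

    Hypothetical helper: it only models the semantics of groupby().agg()
    with a user-defined aggregation, not how Spark executes it.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row[column])
    # The udf sees all values of one group at once and returns a single scalar.
    return {k: udf(values) for k, values in groups.items()}


rows = [
    {"id": 1, "v": 1.0},
    {"id": 1, "v": 2.0},
    {"id": 2, "v": 3.0},
    {"id": 2, "v": 5.0},
    {"id": 2, "v": 10.0},
]

result = group_agg(rows, "id", "v", mean)
print(result)  # {1: 1.5, 2: 6.0}
```

Any callable that maps a group's values to one scalar can be passed as udf, which is the property that distinguishes an aggregation from a row-wise function.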
## How was this patch tested?
Tested with GroupbyAggTests.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/icexelloss/spark SPARK-22274-groupby-agg
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/19872.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #19872
----
commit f71575782be3f9c41184eeafa275b5ba1cb5fb83
Author: Li Jin <[email protected]>
Date: 2017-12-01T17:26:26Z
Initial commit: wip
commit 2e03eec8de2ed6d38e807428c18f2500a8717b32
Author: Li Jin <[email protected]>
Date: 2017-12-01T22:54:02Z
Test working. Need clean up
commit 456c4a8adf646ee46b00f8ce51d4e9e8279abc3e
Author: Li Jin <[email protected]>
Date: 2017-12-04T06:34:16Z
Add tests
commit 35ff548ac942d210ccd99fb2a60b95e2d4a28e2a
Author: Li Jin <[email protected]>
Date: 2017-12-04T06:36:03Z
Clean up
----