GitHub user icexelloss opened a pull request:
https://github.com/apache/spark/pull/22305
[WIP][SPARK-24561][SQL][Python] User-defined window aggregation functions
with Pandas UDF (bounded window)
## What changes were proposed in this pull request?
### **This is currently WIP**
This PR adds support for using a grouped-aggregate Pandas UDF as a window
aggregation function over a bounded window frame.
Example:
```
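from pyspark.sql import Window
from pyspark.sql.functions import pandas_udf, PandasUDFType

@pandas_udf('double', PandasUDFType.GROUPED_AGG)
def avg(v):
    # v is a pandas.Series holding the values in the current window frame
    return v.mean()

# Bounded frame: from 2 rows before to 3 rows after the current row
w = Window.partitionBy('id').rowsBetween(-2, 3)
result1 = df.withColumn('mean_v', avg(df['v']).over(w))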
```
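For reviewers who want to sanity-check expected values, the frame semantics of `rowsBetween(-2, 3)` can be sketched in plain Python. This is only an illustration, not the implementation in this PR. The helper name `bounded_window_mean` and the sample rows are hypothetical, and the sketch assumes the rows of each partition are already in a fixed order (the example window has no `orderBy`, so Spark itself does not guarantee one).

```python
from collections import defaultdict


def bounded_window_mean(rows, lower=-2, upper=3):
    """Mean of v over a row-based frame [lower, upper] around each row.

    rows: list of (id, v) tuples, assumed to be in the desired order
    within each id (hypothetical input format for this sketch).
    """
    # Group rows by partition key, remembering their original positions.
    partitions = defaultdict(list)
    for idx, (key, v) in enumerate(rows):
        partitions[key].append((idx, v))

    out = [None] * len(rows)
    for members in partitions.values():
        values = [v for _, v in members]
        n = len(values)
        for pos, (idx, _) in enumerate(members):
            # Clip the frame to the partition boundaries.
            start = max(0, pos + lower)
            end = min(n, pos + upper + 1)
            frame = values[start:end]
            # An empty frame yields None in this sketch.
            out[idx] = sum(frame) / len(frame) if frame else None
    return out


rows = [(1, 1.0), (1, 2.0), (1, 3.0), (1, 4.0), (2, 10.0), (2, 20.0)]
means = bounded_window_mean(rows)
print(means)  # [2.5, 2.5, 2.5, 3.0, 15.0, 15.0]
```

Each output value corresponds to one input row, which mirrors how `mean_v` is attached to every row of `df` in the example.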
## How was this patch tested?
New tests
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/icexelloss/spark SPARK-24561-bounded-window-udf
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/22305.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #22305
----
commit 4a4ba9406b7cb393eb4397083f872ee463eca97e
Author: Li Jin <ice.xelloss@...>
Date: 2018-08-29T20:39:16Z
wip
commit f9e73265dee746e3192a858bea1b5eca6a1b1826
Author: Li Jin <ice.xelloss@...>
Date: 2018-08-30T14:49:16Z
Remove empty line
commit 28414fb9f099bd063dc24cb36878d006ccd2d53d
Author: Li Jin <ice.xelloss@...>
Date: 2018-08-31T14:51:09Z
Initial commit (WIP)
commit 51a9dcf0166d68ce7fbdc380a686426eeff5ebb6
Author: Li Jin <ice.xelloss@...>
Date: 2018-08-31T15:22:44Z
Fix case for unbounded window
----