GitHub user icexelloss opened a pull request:

    https://github.com/apache/spark/pull/21082

    [SPARK-22239][SQL][Python][WIP] Enable grouped aggregate pandas UDFs as window functions

    ## What changes were proposed in this pull request?
    This PR enables using grouped aggregate pandas UDFs as window functions.
    The semantics are the same as using a SQL aggregation function as a window
    function.
    
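    ```
    from pyspark.sql import Window
    from pyspark.sql.functions import pandas_udf, PandasUDFType

    # df is an existing DataFrame with columns 'id' and 'v'.
    # Unbounded frame: every row sees its whole partition.
    w = Window.partitionBy('id').rowsBetween(Window.unboundedPreceding, Window.unboundedFollowing)
    mean_udf = pandas_udf(lambda v: v.mean(), 'double', PandasUDFType.GROUPED_AGG)
    result1 = df.withColumn('mean_v', mean_udf(df['v']).over(w))
    ```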
    
    The scope of this PR is limited in two ways:
    (1) Only unbounded windows are supported, which act essentially as a group by.
    (2) Only aggregation functions are supported, not "transform"-like window
    functions (n -> n mapping).
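
To illustrate point (1), here is a minimal pure-Python sketch of what an
aggregate over an unbounded window computes: one value per partition,
attached to every row of that partition. It is not code from this PR and
does not use Spark; the sample rows and the helper name
`unbounded_window_mean` are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical sample rows of (id, v); not taken from the PR.
rows = [(1, 1.0), (1, 2.0), (2, 3.0), (2, 5.0), (2, 10.0)]


def unbounded_window_mean(rows):
    """Attach the per-id mean of v to every row of that id."""
    groups = defaultdict(list)
    for key, value in rows:
        groups[key].append(value)
    # One aggregate per partition (n -> 1), as in a group by ...
    means = {key: mean(values) for key, values in groups.items()}
    # ... then broadcast back so the output keeps one row per input row.
    return [(key, value, means[key]) for key, value in rows]


result = unbounded_window_mean(rows)
for row in result:
    print(row)
```

The output keeps every input row, which is the only difference from a plain
group by. A bounded (rolling) frame would need a separate aggregate per row,
which is why it is harder to pass to Python efficiently.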
    
    Both of these are left as future work. In particular, (1) needs careful
    thought about how to pass rolling window data to Python efficiently. (2) is
    a bit easier but requires more changes, so I think it is better to leave it
    for a separate PR.
    
    **This PR is currently WIP**
    
    ## How was this patch tested?
    
    WindowPandasUDFTests


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/icexelloss/spark SPARK-22239-window-udf

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21082.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #21082
    
----
commit 54609cf97aa7e6b3d55f288e1b2aa92ac91e9b28
Author: Li Jin <ice.xelloss@...>
Date:   2018-03-24T19:52:55Z

    wip

commit f454933ac45bbf4a8bf0b87192bf4b323724b3fe
Author: Li Jin <ice.xelloss@...>
Date:   2018-04-04T14:41:03Z

    wip

commit e5207455d835198fcab253a99b490784ee04b3cf
Author: Li Jin <ice.xelloss@...>
Date:   2018-04-05T14:06:53Z

    wip

commit 78dc82b881fa15b4e6ef0380177418df58df392d
Author: Li Jin <ice.xelloss@...>
Date:   2018-04-06T18:34:48Z

    wip

commit 083ae4a5b9676eb953b181f98ba0c5c1fb3fce47
Author: Li Jin <ice.xelloss@...>
Date:   2018-04-16T21:23:37Z

    Test passes

commit 15bbedf92b27afc0234669a0fcd21d141511aa17
Author: Li Jin <ice.xelloss@...>
Date:   2018-04-16T21:54:15Z

    Clean up

commit 6a964d433b6e318af515bfc3ee38c8e3621872d7
Author: Li Jin <ice.xelloss@...>
Date:   2018-04-16T21:56:24Z

    white space

----

