GitHub user icexelloss opened a pull request:

    https://github.com/apache/spark/pull/22305

    [WIP][SPARK-24561][SQL][Python] User-defined window aggregation functions with Pandas UDF (bounded window)

    ## What changes were proposed in this pull request?
    
    ### **This is currently WIP** 
    
    This PR implements a new feature: window aggregation Pandas UDFs over bounded windows.
    
    Example:
    ```
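    from pyspark.sql import Window
    from pyspark.sql.functions import pandas_udf, PandasUDFType

    # Grouped-aggregate Pandas UDF: takes a pandas.Series, returns a scalar
    @pandas_udf('double', PandasUDFType.GROUPED_AGG)
    def avg(v):
        return v.mean()

    # Bounded frame: from 2 rows before to 3 rows after the current row
    w = Window.partitionBy('id').rowsBetween(-2, 3)

    result1 = df.withColumn('mean_v', avg(df['v']).over(w))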
    ```
    
    ## How was this patch tested?
    
    New tests
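
For readers unfamiliar with bounded row frames, the sketch below illustrates what `rowsBetween(-2, 3)` in the example above is expected to compute for a single partition. It is plain Python, not Spark code and not the implementation in this PR; the helper name `bounded_window_agg` and the sample values are hypothetical. It assumes the rows of the partition are already in a fixed order (the example window has no `orderBy`, so Spark itself does not guarantee one).

```python
from statistics import mean


def bounded_window_agg(values, lower, upper, agg):
    """Apply `agg` to the row-based frame [i + lower, i + upper] around each row i.

    Hypothetical helper for illustration only. Frames are clipped to the
    partition boundaries; an empty frame yields None.
    """
    n = len(values)
    out = []
    for i in range(n):
        start = max(0, i + lower)     # clip the frame at the partition start
        end = min(n, i + upper + 1)   # exclusive end, clipped at the partition end
        frame = values[start:end]
        out.append(agg(frame) if frame else None)
    return out


# One partition (all rows share the same 'id'), in a fixed order.
partition = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
result = bounded_window_agg(partition, -2, 3, mean)
print(result)  # [2.5, 3.0, 3.5, 4.0, 4.5, 5.0]
```

Each output value is the mean over at most six rows: the current row, up to two rows before it, and up to three rows after it. Rows near the edges of the partition see a shorter frame.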


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/icexelloss/spark SPARK-24561-bounded-window-udf

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22305.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #22305
    
----
commit 4a4ba9406b7cb393eb4397083f872ee463eca97e
Author: Li Jin <ice.xelloss@...>
Date:   2018-08-29T20:39:16Z

    wip

commit f9e73265dee746e3192a858bea1b5eca6a1b1826
Author: Li Jin <ice.xelloss@...>
Date:   2018-08-30T14:49:16Z

    Remove empty line

commit 28414fb9f099bd063dc24cb36878d006ccd2d53d
Author: Li Jin <ice.xelloss@...>
Date:   2018-08-31T14:51:09Z

    Initial commit (WIP)

commit 51a9dcf0166d68ce7fbdc380a686426eeff5ebb6
Author: Li Jin <ice.xelloss@...>
Date:   2018-08-31T15:22:44Z

    Fix case for unbounded window

----
