Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/21082#discussion_r192138739
--- Diff: python/pyspark/sql/functions.py ---
@@ -2321,7 +2323,30 @@ def pandas_udf(f=None, returnType=None, functionType=None):
| 2| 6.0|
+---+-----------+
- .. seealso:: :meth:`pyspark.sql.GroupedData.agg`
+ This example shows using grouped aggregated UDFs as window functions. Note that only
+ unbounded window frames are supported at the moment:
+
+ >>> from pyspark.sql.functions import pandas_udf, PandasUDFType
+ >>> from pyspark.sql import Window
+ >>> df = spark.createDataFrame(
+ ... [(1, 1.0), (1, 2.0), (2, 3.0), (2, 5.0), (2, 10.0)],
+ ... ("id", "v"))
+ >>> @pandas_udf("double", PandasUDFType.GROUPED_AGG)  # doctest: +SKIP
+ ... def mean_udf(v):
+ ... return v.mean()
+ >>> w = Window.partitionBy('id')
--- End diff ---
Shall we explicitly show unbounded boundaries?
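For reference, a minimal sketch of what an explicit unbounded frame could look like, assuming the standard Window.rowsBetween API with Window.unboundedPreceding and Window.unboundedFollowing (the exact doctest wording is up to the PR author):

    >>> from pyspark.sql import Window
    >>> # Spell out the frame instead of relying on the default:
    >>> w = (Window.partitionBy('id')
    ...             .rowsBetween(Window.unboundedPreceding,
    ...                          Window.unboundedFollowing))
    >>> df.withColumn('mean_v', mean_udf(df['v']).over(w)).show()  # doctest: +SKIP

Making the frame explicit would also document the restriction stated above, that grouped aggregate UDFs currently support only unbounded window frames.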