GitHub user icexelloss commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21082#discussion_r183769433
  
    --- Diff: python/pyspark/sql/functions.py ---
    @@ -2321,7 +2323,30 @@ def pandas_udf(f=None, returnType=None, functionType=None):
            |  2|        6.0|
            +---+-----------+
     
    -       .. seealso:: :meth:`pyspark.sql.GroupedData.agg`
    +       This example shows using grouped aggregate UDFs as window functions. Note that only
    +       unbounded window frames are supported at the moment:
    +
    +       >>> from pyspark.sql.functions import pandas_udf, PandasUDFType
    +       >>> from pyspark.sql import Window
    +       >>> df = spark.createDataFrame(
    +       ...     [(1, 1.0), (1, 2.0), (2, 3.0), (2, 5.0), (2, 10.0)],
    +       ...     ("id", "v"))
    +       >>> @pandas_udf("double", PandasUDFType.GROUPED_AGG)  # doctest: +SKIP
    --- End diff ---
    
    Yes exactly. The idea is that the producer of the UDF can define a grouped
    aggregate UDF, such as a weighted mean, and the consumer can then use that
    same UDF in both groupby and window operations, similar to how SQL aggregate
    functions work.

