[GitHub] spark pull request #19872: [SPARK-22274][PYTHON][SQL] User-defined aggregati...

icexelloss Thu, 18 Jan 2018 08:48:48 -0800

GitHub user icexelloss commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19872#discussion_r162402605
  
    --- Diff: python/pyspark/sql/group.py ---
    @@ -65,7 +65,16 @@ def __init__(self, jgd, df):
         def agg(self, *exprs):
             """Compute aggregates and returns the result as a :class:`DataFrame`.
     
    -        The available aggregate functions are `avg`, `max`, `min`, `sum`, `count`.
    +        The available aggregate functions can be:
    +
    +        1. built-in aggregation functions, such as `avg`, `max`, `min`, `sum`, `count`
    +
    +        2. group aggregate pandas UDFs
    +
    +           .. note:: There is no partial aggregation with group aggregate UDFs, i.e.,
    +               a full shuffle is required.
    +
    +           .. seealso:: :meth:`pyspark.sql.functions.pandas_udf`
    --- End diff --
    
    Added
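
    To illustrate the note in the docstring above: a group aggregate UDF is
    handed all values of a group at once, so partitions cannot pre-combine
    results the way they can for `sum` or `count`. The following is a
    plain-Python sketch of that difference, not PySpark code and not how Spark
    implements it; `partitions`, `partial_sum`, and `full_shuffle_agg` are
    hypothetical names made up for illustration.

```python
from collections import defaultdict
from statistics import median

# Hypothetical data: two partitions of (key, value) rows.
partitions = [
    [("a", 1), ("b", 10), ("a", 3)],
    [("a", 5), ("b", 20)],
]

def partial_sum(parts):
    """Built-in style aggregate: each partition pre-combines, then results merge."""
    merged = defaultdict(int)
    for part in parts:
        local = defaultdict(int)
        for key, value in part:
            local[key] += value          # partial aggregation within a partition
        for key, subtotal in local.items():
            merged[key] += subtotal      # only one subtotal per key is moved
    return dict(merged)

def full_shuffle_agg(parts, func):
    """UDF-style aggregate: every row of a group is gathered before func runs."""
    groups = defaultdict(list)
    for part in parts:
        for key, value in part:
            groups[key].append(value)    # every row is moved to its group
    return {key: func(values) for key, values in groups.items()}

sums = partial_sum(partitions)
medians = full_shuffle_agg(partitions, median)
```

    In this sketch the sum moves one subtotal per key per partition, while an
    opaque function such as a median needs every row of the group in one place,
    which is the full shuffle the note refers to.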
