[GitHub] spark pull request #19872: [SPARK-22274][PYTHON][SQL] User-defined aggregati...

ueshin Wed, 17 Jan 2018 18:58:23 -0800

Github user ueshin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19872#discussion_r162238606
  
    --- Diff: python/pyspark/sql/group.py ---
    @@ -65,7 +65,16 @@ def __init__(self, jgd, df):
         def agg(self, *exprs):
             """Compute aggregates and returns the result as a :class:`DataFrame`.
     
    -        The available aggregate functions are `avg`, `max`, `min`, `sum`, `count`.
    +        The available aggregate functions can be:
    +
    +        1. built-in aggregation functions, such as `avg`, `max`, `min`, `sum`, `count`
    +
    +        2. group aggregate pandas UDFs
    +
    +           .. note:: There is no partial aggregation with group aggregate UDFs, i.e.,
    +               a full shuffle is required.
    +
    +           .. seealso:: :meth:`pyspark.sql.functions.pandas_udf`
    --- End diff --
    
    Should we also note that built-in aggregate functions and group aggregate pandas UDFs can't be used together in the same `agg` call?
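
    The note in the docstring about partial aggregation can be illustrated with a small sketch. This is plain Python, not Spark code, and the partition layout and helper names are hypothetical. It assumes a built-in such as `avg` can be finished from small per-partition states, while a group aggregate UDF is an opaque function that needs every value of the group in one place:

```python
from statistics import median

# Hypothetical layout: the values of one group spread over two partitions.
partitions = [[1.0, 2.0, 9.0], [3.0, 4.0]]


def partial_avg(values):
    """Per-partition partial state for an average: (sum, count)."""
    return (sum(values), len(values))


def merge_avg(states):
    """Merge the partial states and finish the average."""
    total = sum(s for s, _ in states)
    count = sum(c for _, c in states)
    return total / count


# Built-in style: only the small (sum, count) pairs have to be moved.
avg_result = merge_avg([partial_avg(p) for p in partitions])


def opaque_udf(values):
    """Stand-in for a group aggregate UDF: it sees all values of the group."""
    return median(values)


# UDF style: every value of the group is gathered first (the "full shuffle").
all_values = [v for p in partitions for v in p]
udf_result = opaque_udf(all_values)

# Running the UDF per partition and combining the outputs gives a
# different answer, which is why it cannot be partially aggregated.
naive_result = median([opaque_udf(p) for p in partitions])
```

    Here `naive_result` differs from `udf_result`, whereas the average merged from partial states matches the average over all values.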



