Github user icexelloss commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19872#discussion_r162402605

    --- Diff: python/pyspark/sql/group.py ---
    @@ -65,7 +65,16 @@ def __init__(self, jgd, df):
         def agg(self, *exprs):
             """Compute aggregates and returns the result as a :class:`DataFrame`.

    -        The available aggregate functions are `avg`, `max`, `min`, `sum`, `count`.
    +        The available aggregate functions can be:
    +
    +        1. built-in aggregation functions, such as `avg`, `max`, `min`, `sum`, `count`
    +
    +        2. group aggregate pandas UDFs
    +
    +           .. note:: There is no partial aggregation with group aggregate UDFs, i.e.,
    +               a full shuffle is required.
    +
    +           .. seealso:: :meth:`pyspark.sql.functions.pandas_udf`
    --- End diff --

    Added
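For readers unfamiliar with the note in the docstring, the difference between partial aggregation and a full shuffle can be sketched in plain Python. This is only an illustration of the idea, not Spark's implementation: the `partitions` data and the helper names `partial_mean` and `full_group_agg` are hypothetical and do not appear in the pull request.

```python
from collections import defaultdict
from statistics import median

# Hypothetical data: two "partitions" of (key, value) rows.
partitions = [
    [("a", 1.0), ("b", 10.0), ("a", 3.0)],
    [("a", 5.0), ("b", 20.0)],
]


def partial_mean(partitions):
    """Built-in style: each partition reduces its rows to a small
    (sum, count) state per key, and only those states are merged."""
    merged = defaultdict(lambda: [0.0, 0])
    for part in partitions:
        # Partial aggregation inside a single partition.
        local = defaultdict(lambda: [0.0, 0])
        for key, value in part:
            local[key][0] += value
            local[key][1] += 1
        # Only the compact partial states cross partition boundaries.
        for key, (total, count) in local.items():
            merged[key][0] += total
            merged[key][1] += count
    return {key: total / count for key, (total, count) in merged.items()}


def full_group_agg(partitions, func):
    """UDF style: func is opaque, so every value of a key has to be
    gathered in one place before func can be applied."""
    groups = defaultdict(list)
    for part in partitions:
        for key, value in part:
            groups[key].append(value)
    return {key: func(values) for key, values in groups.items()}


mean_by_key = partial_mean(partitions)
median_by_key = full_group_agg(partitions, median)
```

In the first function only one `(sum, count)` pair per key leaves each partition. In the second, every row is moved to its group before the function runs, which is the cost the note describes as a full shuffle.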