Github user ueshin commented on a diff in the pull request:
https://github.com/apache/spark/pull/19872#discussion_r162238606
--- Diff: python/pyspark/sql/group.py ---
@@ -65,7 +65,16 @@ def __init__(self, jgd, df):
def agg(self, *exprs):
"""Compute aggregates and returns the result as a :class:`DataFrame`.
- The available aggregate functions are `avg`, `max`, `min`, `sum`, `count`.
+ The available aggregate functions can be:
+
+ 1. built-in aggregation functions, such as `avg`, `max`, `min`, `sum`, `count`
+
+ 2. group aggregate pandas UDFs
+
+ .. note:: There is no partial aggregation with group aggregate UDFs, i.e.,
+ a full shuffle is required.
+
+ .. seealso:: :meth:`pyspark.sql.functions.pandas_udf`
--- End diff --
Should we also note that built-in aggregate functions and group aggregate pandas UDFs can't be used together in the same `agg` call?
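
For context on the `.. note::` in the diff, here is a minimal plain-Python sketch of why an opaque aggregate forces a full shuffle. It is not Spark code: the `partitions` data and both helper functions are hypothetical, and `statistics.median` merely stands in for a group aggregate UDF whose internals the engine cannot split into partial steps.

```python
from collections import defaultdict
from statistics import median

# Hypothetical data: two partitions of (key, value) rows.
partitions = [
    [("a", 1), ("b", 2), ("a", 3)],
    [("a", 5), ("b", 4)],
]


def partial_sum(partitions):
    """Built-in style sum: pre-aggregate inside each partition, then merge."""
    partials = []
    for rows in partitions:
        local = defaultdict(int)
        for key, value in rows:
            local[key] += value
        # Only one subtotal per key leaves each partition.
        partials.append(dict(local))

    merged = defaultdict(int)
    for local in partials:
        for key, subtotal in local.items():
            merged[key] += subtotal
    rows_moved = sum(len(local) for local in partials)
    return dict(merged), rows_moved


def full_shuffle_agg(partitions, func):
    """UDF style aggregate: func is opaque, so every row moves to its group."""
    groups = defaultdict(list)
    rows_moved = 0
    for rows in partitions:
        for key, value in rows:
            groups[key].append(value)
            rows_moved += 1
    return {key: func(values) for key, values in groups.items()}, rows_moved


sums, rows_moved_partial = partial_sum(partitions)
medians, rows_moved_full = full_shuffle_agg(partitions, median)
```

In this toy setup the sum moves one subtotal per key per partition, while the opaque function needs every input row gathered by key before it can run, which is the cost the note warns about.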
---