Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/21887#discussion_r205644653
--- Diff: docs/sql-programming-guide.md ---
@@ -1804,6 +1804,25 @@ The following example shows how to use `groupby().apply()` to subtract the mean
For detailed usage, please see [`pyspark.sql.functions.pandas_udf`](api/python/pyspark.sql.html#pyspark.sql.functions.pandas_udf) and [`pyspark.sql.GroupedData.apply`](api/python/pyspark.sql.html#pyspark.sql.GroupedData.apply).
+### Grouped Aggregate
+
+Grouped aggregate Pandas UDFs are similar to Spark aggregate functions. Grouped aggregate Pandas UDFs are used with groupBy and window operations. They define an aggregation from one or more `pandas.Series` to a scalar value, where the `pandas.Series` represents values for a column within the same group or window.
+
+Note that this type of UDF doesn't support partial aggregation, and all data for a group or window will be loaded into memory. Also,
+only unbounded windows are supported with Grouped aggregate Pandas UDfs currently.
--- End diff ---
`UDfs` -> `UDFs`
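For context, the "aggregation from one or more `pandas.Series` to a scalar value" semantics described in the diff can be sketched in plain pandas. This is only an illustrative sketch (the `mean_udf` name and sample data are hypothetical); in Spark the same function body would be registered as a grouped aggregate Pandas UDF and applied via `df.groupBy(...).agg(...)`:

```python
# Sketch of grouped aggregate semantics using plain pandas (hypothetical
# names/data). In PySpark this function body would be decorated with
# @pandas_udf("double", PandasUDFType.GROUPED_AGG) and used in
# df.groupBy("id").agg(mean_udf(df["v"])).
import pandas as pd

def mean_udf(v: pd.Series) -> float:
    # Receives ALL values of one group (or window) as a pandas.Series and
    # reduces them to a single scalar -- there is no partial aggregation,
    # so the whole group must fit in memory, as the doc note says.
    return v.mean()

df = pd.DataFrame({"id": [1, 1, 2], "v": [1.0, 2.0, 3.0]})

# Apply the aggregation per group, mirroring groupBy(...).agg(...).
result = {gid: mean_udf(g["v"]) for gid, g in df.groupby("id")}
print(result)  # {1: 1.5, 2: 3.0}
```

The lack of partial aggregation is the key difference from built-in Spark aggregate functions, which can combine partial results across partitions.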
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]