GitHub user holdenk commented on a diff in the pull request:
https://github.com/apache/spark/pull/19872#discussion_r154616454
--- Diff: python/pyspark/sql/functions.py ---
@@ -2070,6 +2070,8 @@ class PandasUDFType(object):
GROUP_MAP = PythonEvalType.SQL_PANDAS_GROUP_MAP_UDF
+ GROUP_AGG = PythonEvalType.SQL_PANDAS_GROUP_AGG_UDF
--- End diff --
I'm worried it won't be clear to users that this results in a full
shuffle with no partial aggregation. Is there a place where we can
document this warning?
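
To make the cost concrete, here is a toy, pure-Python model of the concern. It is not Spark code and does not reflect Spark's actual planner or shuffle implementation; the function names and the two-partition data set are hypothetical. It assumes that an opaque user-defined aggregate must see every row of a group at once, whereas a built-in such as a mean can be split into a per-partition (sum, count) step followed by a merge step.

```python
from collections import defaultdict


def shuffle_all_rows(partitions):
    """Toy model of aggregation with no partial step: every row is shuffled."""
    shuffled = defaultdict(list)
    records_moved = 0
    for part in partitions:
        for key, value in part:
            shuffled[key].append(value)
            records_moved += 1  # each input row crosses the shuffle
    # The aggregate (a mean here) runs only once each group is complete.
    means = {k: sum(v) / len(v) for k, v in shuffled.items()}
    return means, records_moved


def shuffle_partial_aggregates(partitions):
    """Toy model of partial aggregation: pre-reduce to (sum, count) per key."""
    merged = defaultdict(lambda: [0.0, 0])
    records_moved = 0
    for part in partitions:
        local = defaultdict(lambda: [0.0, 0])
        for key, value in part:
            local[key][0] += value
            local[key][1] += 1
        for key, (total, count) in local.items():
            merged[key][0] += total
            merged[key][1] += count
            records_moved += 1  # one record per key per partition
    means = {k: total / count for k, (total, count) in merged.items()}
    return means, records_moved


# Hypothetical input: two partitions of (key, value) rows.
partitions = [
    [("a", 1.0), ("a", 3.0), ("b", 2.0)],
    [("a", 5.0), ("b", 4.0), ("b", 6.0)],
]
full_result, full_moved = shuffle_all_rows(partitions)
partial_result, partial_moved = shuffle_partial_aggregates(partitions)
print(full_result == partial_result, full_moved, partial_moved)
```

Both paths produce the same means, but the first moves one record per input row while the second moves one record per key per partition. In this model the gap grows with the number of rows per key, which is why the behavior seems worth calling out in the docs.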