[GitHub] spark pull request #19872: WIP: [SPARK-22274][PySpark] User-defined aggregat...

holdenk Mon, 04 Dec 2017 03:02:34 -0800

Github user holdenk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19872#discussion_r154616454
  
    --- Diff: python/pyspark/sql/functions.py ---
    @@ -2070,6 +2070,8 @@ class PandasUDFType(object):
     
         GROUP_MAP = PythonEvalType.SQL_PANDAS_GROUP_MAP_UDF
     
    +    GROUP_AGG = PythonEvalType.SQL_PANDAS_GROUP_AGG_UDF
    --- End diff --
    
    I'm worried that it isn't clear to the user that this will result in a 
full shuffle with no partial aggregation. Is there a place where we can 
document this warning?
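
To make the concern concrete, here is a minimal plain-Python sketch of the difference, not Spark code. It assumes that a built-in aggregate such as sum can be combined per partition before the shuffle, while an opaque Python aggregate (median here, as a stand-in for a user-defined function) needs every row of a group in one place. The partition contents and function names are hypothetical.

```python
from collections import defaultdict
from statistics import median

# Hypothetical input: two partitions of (key, value) rows.
partitions = [
    [("a", 1), ("a", 2), ("b", 3)],
    [("a", 4), ("b", 5), ("b", 6)],
]


def shuffle_with_partial_sum(parts):
    """Sum with partial aggregation: combine per partition, then shuffle partials."""
    shuffled = []
    for part in parts:
        partial = defaultdict(int)
        for key, value in part:
            partial[key] += value
        # Only one record per key per partition crosses the shuffle.
        shuffled.extend(partial.items())
    totals = defaultdict(int)
    for key, value in shuffled:
        totals[key] += value
    return dict(totals), len(shuffled)


def shuffle_for_opaque_udf(parts, udf):
    """Opaque aggregate: every row is shuffled before udf sees a whole group."""
    shuffled = [row for part in parts for row in part]
    groups = defaultdict(list)
    for key, value in shuffled:
        groups[key].append(value)
    return {key: udf(values) for key, values in groups.items()}, len(shuffled)


sum_result, sum_records = shuffle_with_partial_sum(partitions)
udf_result, udf_records = shuffle_for_opaque_udf(partitions, median)
print(sum_records, udf_records)
```

In this toy input the partially aggregated sum shuffles 4 records while the opaque aggregate shuffles all 6 rows; the gap grows with the number of rows per key, which is the cost a documentation warning would need to call out.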



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
