Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20295#discussion_r171285307
  
    --- Diff: python/pyspark/sql/functions.py ---
    @@ -2253,6 +2253,30 @@ def pandas_udf(f=None, returnType=None, functionType=None):
            |  2| 1.1094003924504583|
            +---+-------------------+
     
    +       Alternatively, the user can define a function that takes two arguments.
    +       In this case, the grouping key will be passed as the first argument and the data will
    +       be passed as the second argument. The grouping key will be passed as a tuple of numpy
    +       data types, e.g., `numpy.int32` and `numpy.float64`. The data will still be passed in
    +       as a `pandas.DataFrame` containing all columns from the original Spark DataFrame.
    +       This is useful when the user doesn't want to hardcode grouping key in the function.
    --- End diff ---
    
    I usually avoid abbreviations like `doesn't` in docs, but I am not sure if this actually matters.
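
    For context, here is a plain-pandas sketch (not Spark; `mean_udf` and the sample data are hypothetical) of the two-argument semantics the diff describes: the grouping key arrives as a tuple of numpy scalar types, and the data arrives as a `pandas.DataFrame` containing all columns of the group.

    ```python
    import numpy as np
    import pandas as pd

    def mean_udf(key, pdf):
        # key is a tuple of numpy scalars, e.g. (numpy.int64(1),);
        # pdf is a pandas.DataFrame holding every column of the group.
        return pd.DataFrame({"id": [key[0]], "mean_v": [pdf["v"].mean()]})

    df = pd.DataFrame({"id": [1, 1, 2], "v": [1.0, 2.0, 3.0]})

    # Emulate the grouped-map dispatch: pandas groupby already yields the
    # key as a numpy scalar, so wrap it in a tuple before calling the UDF.
    parts = [mean_udf((k,), g) for k, g in df.groupby("id")]
    result = pd.concat(parts, ignore_index=True)
    ```

    Because the key is an argument rather than a column the function has to look up, the same UDF works regardless of which columns the caller groups by.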


---
