GitHub user BryanCutler commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22610#discussion_r223070065
  
    --- Diff: python/pyspark/sql/functions.py ---
    @@ -2909,6 +2909,11 @@ def pandas_udf(f=None, returnType=None, functionType=None):
             can fail on special rows, the workaround is to incorporate the condition into the functions.
     
         .. note:: The user-defined functions do not take keyword arguments on the calling side.
    +
    +    .. note:: The data type of returned `pandas.Series` from the user-defined functions should be
    +        matched with defined returnType. When there is mismatch between them, it is not guaranteed
    +        that the conversion by SparkSQL during serialization is correct at all and users might get
    --- End diff --
    
    Instead of saying "conversion is not guaranteed", which sounds like the
    results might be arbitrary, could we say "... mismatch between them, an
    attempt will be made to cast the data and results should be checked for
    accuracy."?
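
To illustrate why a cast that succeeds may still need checking, here is a minimal sketch using plain pandas. It is an assumption-laden stand-in: `Series.astype` is used in place of Spark's actual serialization path, and `cast_and_check` is a hypothetical helper, not part of PySpark.

```python
import pandas as pd


def cast_and_check(series, target_dtype):
    """Hypothetical helper: cast a Series and report whether values survived.

    This uses pandas' own astype as a stand-in for whatever cast Spark
    attempts when the returned Series does not match the declared returnType.
    """
    casted = series.astype(target_dtype)
    # Cast back to the original dtype and compare to detect a lossy conversion.
    lossless = bool((casted.astype(series.dtype) == series).all())
    return casted, lossless


# A UDF declared to return a long type might actually return floats like these.
returned = pd.Series([1.0, 2.5, 3.9])
casted, lossless = cast_and_check(returned, "int64")
print(casted.tolist())  # [1, 2, 3]
print(lossless)         # False
```

The cast does not raise an error, but the fractional parts are dropped, which is the sort of outcome the suggested wording asks users to check for.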

