Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22610#discussion_r223217249

    --- Diff: python/pyspark/sql/functions.py ---
    @@ -2909,6 +2909,12 @@ def pandas_udf(f=None, returnType=None, functionType=None):
             can fail on special rows, the workaround is to incorporate the condition into the functions.

         .. note:: The user-defined functions do not take keyword arguments on the calling side.
    +
    +    .. note:: The data type of returned `pandas.Series` from the user-defined functions should be
    +        matched with defined returnType (see :meth:`types.to_arrow_type` and
    +        :meth:`types.from_arrow_type`). When there is mismatch between them, Spark might do
    +        conversion on returned data. The conversion is not guaranteed to be correct and results
    +        should be checked for accuracy by users.
    --- End diff --

    I am merging this since this describes the current status but let's make it clear and try to get rid of this note within 3.0.
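
The note in the diff asks users to check results for accuracy when the returned data does not match the declared returnType. As a rough illustration of what such a check could look like, here is a minimal sketch in plain Python. It does not call Spark, pandas, or Arrow, and `lossy_values` is a hypothetical helper written for this example, not part of PySpark. It only assumes that a conversion is "safe" when casting a value to the declared type leaves it unchanged.

```python
# Hypothetical helper, not part of PySpark: reports the values that would
# change (or fail) if they were cast to the declared Python type.
def lossy_values(values, declared_type):
    """Return the values that do not survive a cast to declared_type."""
    changed = []
    for v in values:
        try:
            converted = declared_type(v)
        except (TypeError, ValueError):
            # The cast is not possible at all, so the value is flagged.
            changed.append(v)
            continue
        if converted != v:
            # The cast succeeded but altered the value (e.g. 2.5 -> 2).
            changed.append(v)
    return changed


# Floats standing in for what a UDF might return.
returned = [1.0, 2.5, 3.0]

# Declared type is an integer: 2.5 would be silently truncated.
print(lossy_values(returned, int))

# Declared type matches the data: nothing is flagged.
print(lossy_values(returned, float))
```

In a real pandas UDF the equivalent guard would run on the `pandas.Series` before it is returned, so that a mismatch is caught in user code instead of relying on whatever conversion Spark happens to apply.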