GitHub user BryanCutler commented on a diff in the pull request:
https://github.com/apache/spark/pull/22610#discussion_r223070065
--- Diff: python/pyspark/sql/functions.py ---
@@ -2909,6 +2909,11 @@ def pandas_udf(f=None, returnType=None, functionType=None):
can fail on special rows, the workaround is to incorporate the condition into the functions.
.. note:: The user-defined functions do not take keyword arguments on the calling side.
+
+ .. note:: The data type of returned `pandas.Series` from the user-defined functions should be
+ matched with defined returnType. When there is mismatch between them, it is not guaranteed
+ that the conversion by SparkSQL during serialization is correct at all and users might get
--- End diff --
Instead of saying "conversion is not guaranteed", which sounds like the results
might be arbitrary, could we say "...mismatch between them, an attempt will be
made to cast the data and results should be checked for accuracy."?
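
To illustrate what "checked for accuracy" could mean in practice, here is a minimal pandas-only sketch. It does not go through Spark's serialization path at all; `astype` merely stands in for whatever cast is attempted, and `cast_is_lossless` is a hypothetical helper name, not part of PySpark.

```python
import pandas as pd


def cast_is_lossless(series, target_dtype):
    """Hypothetical helper: cast a Series and report whether any value changed."""
    casted = series.astype(target_dtype)
    # Cast back to the original dtype and compare element-wise;
    # any difference means the cast altered the data.
    unchanged = bool((casted.astype(series.dtype) == series).all())
    return casted, unchanged


# Assumed scenario: the UDF returns float64 values while the declared
# returnType is an integer type ("int64" stands in for it here).
returned = pd.Series([1.0, 2.5, 3.0])
casted, ok = cast_is_lossless(returned, "int64")
print(casted.tolist(), ok)
```

In this sketch the cast succeeds without raising, yet 2.5 is silently truncated to 2, which is the kind of result a user would want to catch by checking the output.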