Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/22610#discussion_r222885910
--- Diff: python/pyspark/sql/functions.py ---
@@ -2909,6 +2909,11 @@ def pandas_udf(f=None, returnType=None, functionType=None):
can fail on special rows, the workaround is to incorporate the condition into the functions.
.. note:: The user-defined functions do not take keyword arguments on the calling side.
+
+ .. note:: The data type of returned `pandas.Series` from the user-defined functions should be
+ matched with defined returnType. When there is mismatch between them, it is not guaranteed
+ that the conversion by SparkSQL during serialization is correct at all and users might get
--- End diff --
Maybe I am worrying too much, but how about just saying:
```
... defined returnType (see :meth:`types.to_arrow_type` and :meth:`types.from_arrow_type`).
When there is a mismatch between them, the conversion is not guaranteed.
```
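To show the kind of mismatch the note warns about, here is a minimal sketch using plain pandas (not PySpark itself); the `"int64"` dtype name as the Arrow-side counterpart of Spark's `LongType` is an assumption for illustration:

```python
import pandas as pd

# Assumed illustration: a Pandas UDF declared with returnType LongType
# corresponds to the pandas/Arrow dtype "int64".
declared_dtype = "int64"

# The UDF actually returns a float64 Series, so its dtype does not match
# the declared returnType.
returned = pd.Series([1.5, 2.5, 3.5])

print(returned.dtype)  # float64, not the declared int64

# Forcing the values to the declared type silently truncates the
# fractional parts -- the kind of surprising result the docstring
# note is cautioning against.
print(returned.astype(declared_dtype).tolist())  # [1, 2, 3]
```

The point of the suggested wording is exactly this: when the `Series` dtype and the declared `returnType` disagree, the conversion performed during serialization is not guaranteed to preserve the values the user expects.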
---