xinrong-meng commented on code in PR #39384:
URL: https://github.com/apache/spark/pull/39384#discussion_r1064122105
##########
python/pyspark/sql/udf.py:
##########
@@ -75,6 +81,104 @@ def _create_udf(
     return udf_obj._wrapped()
+def _create_py_udf(
+    f: Callable[..., Any],
+    returnType: "DataTypeOrString",
+    evalType: int,
+    useArrow: Optional[bool] = None,
+) -> "UserDefinedFunctionLike":
+    # The following table shows the results when the type coercion in Arrow is needed, that is,
+    # when the user-specified return type(SQL Type) of the UDF and the actual instance(Python
+    # Value(Type)) that the UDF returns are different.
+    # Arrow and Pickle have different type coercion rules, so a UDF might have a different result
+    # with/without Arrow optimization. That's the main reason the Arrow optimization for Python
+    # UDFs is disabled by default.
+    # +-----------------------------+--------------+----------+------+------+----------------+-----------------------------+----------+----------------------+---------+-----------+----------------------------+----------+--------------+  # noqa
+    # |SQL Type \ Python Value(Type)|None(NoneType)|True(bool)|1(int)|a(str)|1970-01-01(date)|1970-01-01 00:00:00(datetime)|1.0(float)|array('i', [1])(array)|[1](list)|(1,)(tuple)|bytearray(b'ABC')(bytearray)|1(Decimal)|{'a': 1}(dict)|  # noqa
+    # +-----------------------------+--------------+----------+------+------+----------------+-----------------------------+----------+----------------------+---------+-----------+----------------------------+----------+--------------+  # noqa
+    # |                      boolean|             X|         X|     X|     X|               X|                            X|         X|                     X|        X|          X|                           X|         X|             X|  # noqa
Review Comment:
Good catch! I regenerated the table and added a note on the library versions used.
```
$ conda list | grep -e 'python\|pyarrow\|pandas'
pandas             1.5.2       pypi_0        pypi
pandas-stubs       1.2.0.53    pypi_0        pypi
pyarrow            10.0.1      pypi_0        pypi
python             3.9.15      h218abb5_2
python-dateutil    2.8.2       pypi_0        pypi
```
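For anyone regenerating the table outside a conda environment, the same version note could be produced from Python itself. The following is only a minimal sketch using the standard library; `library_versions` and `format_version_note` are hypothetical helper names and are not part of this PR or of PySpark.

```python
import platform
from importlib import metadata


def library_versions(packages):
    """Return {name: version} for the interpreter and each requested package.

    Hypothetical helper for illustration; not part of PySpark.
    """
    versions = {"python": platform.python_version()}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Record the absence instead of failing, so the note stays complete.
            versions[name] = "not installed"
    return versions


def format_version_note(versions):
    """Render the versions as comment lines that could sit above the table."""
    lines = ["# Note: the table was generated with the following library versions:"]
    lines += [f"#   {name}: {version}" for name, version in versions.items()]
    return "\n".join(lines)


if __name__ == "__main__":
    print(format_version_note(library_versions(["pandas", "pyarrow"])))
```

Recording the versions next to the table matters here because the coercion results depend on the installed pandas and pyarrow releases, so a reader can tell which combination a given table reflects.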
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]