xinrong-meng commented on code in PR #39384:
URL: https://github.com/apache/spark/pull/39384#discussion_r1064122105


##########
python/pyspark/sql/udf.py:
##########
@@ -75,6 +81,104 @@ def _create_udf(
     return udf_obj._wrapped()
 
 
+def _create_py_udf(
+    f: Callable[..., Any],
+    returnType: "DataTypeOrString",
+    evalType: int,
+    useArrow: Optional[bool] = None,
+) -> "UserDefinedFunctionLike":
+    # The following table shows the results when the type coercion in Arrow is needed, that is,
+    # when the user-specified return type(SQL Type) of the UDF and the actual instance(Python
+    # Value(Type)) that the UDF returns are different.
+    # Arrow and Pickle have different type coercion rules, so a UDF might have a different result
+    # with/without Arrow optimization. That's the main reason the Arrow optimization for Python
+    # UDFs is disabled by default.
+    # +-----------------------------+--------------+----------+------+------+----------------+-----------------------------+----------+----------------------+---------+-----------+----------------------------+----------+--------------+  # noqa
+    # |SQL Type \ Python Value(Type)|None(NoneType)|True(bool)|1(int)|a(str)|1970-01-01(date)|1970-01-01 00:00:00(datetime)|1.0(float)|array('i', [1])(array)|[1](list)|(1,)(tuple)|bytearray(b'ABC')(bytearray)|1(Decimal)|{'a': 1}(dict)|  # noqa
+    # +-----------------------------+--------------+----------+------+------+----------------+-----------------------------+----------+----------------------+---------+-----------+----------------------------+----------+--------------+  # noqa
+    # |                      boolean|             X|         X|     X|     X|               X|                            X|         X|                     X|        X|          X|                           X|         X|             X|  # noqa

Review Comment:
   Good catch! I regenerated the table and added a note on the library versions used.
   
   ```
   $ conda list | grep -e 'python\|pyarrow\|pandas'
   pandas                    1.5.2                    pypi_0    pypi
   pandas-stubs              1.2.0.53                 pypi_0    pypi
   pyarrow                   10.0.1                   pypi_0    pypi
   python                    3.9.15               h218abb5_2  
   python-dateutil           2.8.2                    pypi_0    pypi
   ```
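
For environments without conda, the same version note could be gathered from Python itself. This is only a minimal sketch using the standard-library `importlib.metadata`; the helper name `library_versions` is hypothetical and is not part of the PR.

```python
from importlib import metadata
from typing import Dict, Iterable, Optional
import platform


def library_versions(names: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if absent.

    Hypothetical helper for illustration; not part of PySpark.
    """
    versions: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The distribution is not installed in this environment.
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Report the interpreter version alongside the libraries that affect coercion.
    print("python", platform.python_version())
    for name, version in library_versions(["pandas", "pyarrow"]).items():
        print(name, version or "not installed")
```

Recording these versions next to the table matters because the coercion results depend on the pandas and PyArrow releases used to generate it.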



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

