xinrong-meng commented on code in PR #41974: URL: https://github.com/apache/spark/pull/41974#discussion_r1265711863
##########
python/docs/source/user_guide/sql/arrow_pandas.rst:
##########

```diff
@@ -333,6 +333,32 @@ The following example shows how to use ``DataFrame.groupby().cogroup().applyInPa

 For detailed usage, please see :meth:`PandasCogroupedOps.applyInPandas`

+Arrow Python UDFs
+-----------------
+
+Arrow Python UDFs are user defined functions that are executed row-by-row, utilizing Arrow for efficient batch data
+transfer and serialization. To define an Arrow Python UDF, you can use the :meth:`udf` decorator or wrap the function
+with the :meth:`udf` method, ensuring the ``useArrow`` parameter is set to True. Additionally, you can enable Arrow
+optimization for Python UDFs throughout the entire SparkSession by setting the Spark configuration ``spark.sql
+.execution.pythonUDF.arrow.enabled`` to true. It's important to note that the Spark configuration takes effect only
+when ``useArrow`` is either not set or set to None.
+
+The type hints for Arrow Python UDFs should be specified in the same way as for default, pickled Python UDFs.
+
+Here's an example that demonstrates the usage of both a default, pickled Python UDF and an Arrow Python UDF:
+
+.. literalinclude:: ../../../../../examples/src/main/python/sql/arrow.py
+   :language: python
+   :lines: 279-297
+   :dedent: 4
+
+Compared to the default, pickled Python UDF, Arrow Python UDF provides a more coherent type coercion mechanism. UDF
```

Review Comment:
   `the default, pickled Python UDF` is.
