itholic commented on code in PR #41974:
URL: https://github.com/apache/spark/pull/41974#discussion_r1264809851
##########
python/docs/source/user_guide/sql/arrow_pandas.rst:
##########
@@ -333,6 +333,32 @@ The following example shows how to use ``DataFrame.groupby().cogroup().applyInPa
For detailed usage, please see :meth:`PandasCogroupedOps.applyInPandas`
+Arrow Python UDFs
+-----------------
+
+Arrow Python UDFs are user defined functions that are executed row-by-row, utilizing Arrow for efficient batch data
+transfer and serialization. To define an Arrow Python UDF, you can use the :meth:`udf` decorator or wrap the function
+with the :meth:`udf` method, ensuring the ``useArrow`` parameter is set to True. Additionally, you can enable Arrow
+optimization for Python UDFs throughout the entire SparkSession by setting the Spark configuration
+``spark.sql.execution.pythonUDF.arrow.enabled`` to true. It's important to note that the Spark configuration takes
+effect only when ``useArrow`` is either not set or set to None.
+
+The type hints for Arrow Python UDFs should be specified in the same way as for default, pickled Python UDFs.
+
+Here's an example that demonstrates the usage of both a default, pickled Python UDF and an Arrow Python UDF:
+
+.. literalinclude:: ../../../../../examples/src/main/python/sql/arrow.py
+ :language: python
+ :lines: 279-297
+ :dedent: 4
+
+Compared to the default, pickled Python UDF, Arrow Python UDF provides a more coherent type coercion mechanism. UDF
Review Comment:
qq: Is the term "pickled Python UDF" generally used in PySpark?? Just to confirm.
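For reference, a minimal sketch of the two opt-in paths the doc text describes (per-UDF ``useArrow`` versus the session-level config). The pure-Python body is split out so it can be checked without a running Spark cluster, and the pyspark import is deferred so the snippet does not require Spark at import time; it assumes a PySpark version where `udf` accepts `useArrow` (3.5+):

```python
def slen(s):
    # Pure Python body; testable without a Spark cluster.
    return len(s) if s is not None else None


def make_arrow_udf():
    # Deferred import: only needed when Spark is actually used.
    from pyspark.sql.functions import udf

    # Per-UDF opt-in: useArrow=True takes effect regardless of the
    # session-level configuration.
    return udf(slen, returnType="int", useArrow=True)


def enable_arrow_for_session(spark):
    # Session-wide opt-in; per the doc text, this applies only when
    # useArrow is not set (or set to None) on the individual UDF.
    spark.conf.set("spark.sql.execution.pythonUDF.arrow.enabled", "true")
```

This mirrors the structure in the quoted diff; the helper names (`make_arrow_udf`, `enable_arrow_for_session`) are illustrative, not part of the PR.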
##########
examples/src/main/python/sql/arrow.py:
##########
@@ -275,6 +275,28 @@ def merge_ordered(left: pd.DataFrame, right: pd.DataFrame) -> pd.DataFrame:
# +--------+---+---+----+
+def arrow_python_udf_example(spark: SparkSession) -> None:
+ from pyspark.sql.functions import udf
+
+ @udf(returnType='int') # A default, pickled Python UDF
+ def slen(s): # type: ignore[no-untyped-def]
+ return len(s)
+
+ @udf(returnType='int', useArrow=True) # An Arrow Python UDF
+ def add_one(x): # type: ignore[no-untyped-def]
+ if x is not None:
+ return x + 1
Review Comment:
Why don't we use the same function as an example for both cases, to avoid confusion? Or maybe it's intended to use different examples for some reason??
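One way to reuse a single function for both cases, as the comment suggests, would be to define the plain function once and wrap it twice with :meth:`udf`. A sketch, assuming PySpark 3.5+ for ``useArrow``; the shared body runs without Spark, and the pyspark import is deferred:

```python
def add_one(x):
    # Shared pure-Python body, used for both UDF variants.
    return x + 1 if x is not None else None


def make_both_udfs():
    from pyspark.sql.functions import udf

    pickled_udf = udf(add_one, returnType="int")               # default, pickled
    arrow_udf = udf(add_one, returnType="int", useArrow=True)  # Arrow-optimized
    return pickled_udf, arrow_udf
```

With this shape, any behavioral difference between the two variants (e.g. type coercion) is attributable to the serialization path alone, since the Python logic is identical.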
##########
python/docs/source/user_guide/sql/arrow_pandas.rst:
##########
@@ -333,6 +333,32 @@ The following example shows how to use ``DataFrame.groupby().cogroup().applyInPa
For detailed usage, please see :meth:`PandasCogroupedOps.applyInPandas`
+Arrow Python UDFs
+-----------------
+
+Arrow Python UDFs are user defined functions that are executed row-by-row, utilizing Arrow for efficient batch data
+transfer and serialization. To define an Arrow Python UDF, you can use the :meth:`udf` decorator or wrap the function
+with the :meth:`udf` method, ensuring the ``useArrow`` parameter is set to True. Additionally, you can enable Arrow
+optimization for Python UDFs throughout the entire SparkSession by setting the Spark configuration
+``spark.sql.execution.pythonUDF.arrow.enabled`` to true. It's important to note that the Spark configuration takes
+effect only when ``useArrow`` is either not set or set to None.
+
+The type hints for Arrow Python UDFs should be specified in the same way as for default, pickled Python UDFs.
+
+Here's an example that demonstrates the usage of both a default, pickled Python UDF and an Arrow Python UDF:
+
+.. literalinclude:: ../../../../../examples/src/main/python/sql/arrow.py
+ :language: python
+ :lines: 279-297
+ :dedent: 4
+
+Compared to the default, pickled Python UDF, Arrow Python UDF provides a more coherent type coercion mechanism. UDF
Review Comment:
nit: I think it would be great if we can pick & use one consistent term, "Python UDF" or "Python UDFs"?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]