xinrong-meng commented on code in PR #41974: URL: https://github.com/apache/spark/pull/41974#discussion_r1265711863
##########
python/docs/source/user_guide/sql/arrow_pandas.rst:
##########

```diff
@@ -333,6 +333,32 @@ The following example shows how to use ``DataFrame.groupby().cogroup().applyInPa

 For detailed usage, please see :meth:`PandasCogroupedOps.applyInPandas`

+Arrow Python UDFs
+-----------------
+
+Arrow Python UDFs are user defined functions that are executed row-by-row, utilizing Arrow for efficient batch data
+transfer and serialization. To define an Arrow Python UDF, you can use the :meth:`udf` decorator or wrap the function
+with the :meth:`udf` method, ensuring the ``useArrow`` parameter is set to True. Additionally, you can enable Arrow
+optimization for Python UDFs throughout the entire SparkSession by setting the Spark configuration ``spark.sql
+.execution.pythonUDF.arrow.enabled`` to true. It's important to note that the Spark configuration takes effect only
+when ``useArrow`` is either not set or set to None.
+
+The type hints for Arrow Python UDFs should be specified in the same way as for default, pickled Python UDFs.
+
+Here's an example that demonstrates the usage of both a default, pickled Python UDF and an Arrow Python UDF:
+
+.. literalinclude:: ../../../../../examples/src/main/python/sql/arrow.py
+   :language: python
+   :lines: 279-297
+   :dedent: 4
+
+Compared to the default, pickled Python UDF, Arrow Python UDF provides a more coherent type coercion mechanism. UDF
```

Review Comment:
   `the default, pickled Python UDF` is.
