HyukjinKwon commented on a change in pull request #28160: [SPARK-30722][DOCS][FOLLOW-UP] Explicitly mention the same entire input/output length restriction of Series Iterator UDF

URL: https://github.com/apache/spark/pull/28160#discussion_r405973319
##########
File path: docs/sql-pyspark-pandas-with-arrow.md
##########

```diff
@@ -198,12 +201,14 @@ For detailed usage, please see [`pyspark.sql.functions.pandas_udf`](api/python/p

 ## Pandas Function APIs

-Pandas function APIs can directly apply a Python native function against the whole DataFrame by
-using Pandas instances. Internally it works similarly with Pandas UDFs by Spark using Arrow to transfer
-data and Pandas to work with the data, which allows vectorized operations. A Pandas function API behaves
-as a regular API under PySpark `DataFrame` in general.
+Pandas Function APIs can directly apply a Python native function against the whole `DataFrame` by
+using Pandas instances. Internally it works similarly with Pandas UDFs by using Arrow to transfer
+data and Pandas to work with the data, which allows vectorized operations. However, a Pandas Function
+API behaves as a regular API under PySpark `DataFrame` instead of `Column`, and Python type hints in Pandas
+Function APIs are optional and do not affect how it works internally at this moment although they
+might be required in the future.
```

Review comment:
   I piggy-backed some doc changes here while I am here.
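To illustrate the calling convention the doc change describes, here is a minimal plain-pandas sketch (no Spark session; the function name and data are hypothetical). A function passed to a Pandas Function API such as `DataFrame.mapInPandas` receives an iterator of pandas `DataFrame` batches and yields pandas `DataFrame`s, so it can be written without type hints and its output length need not match its input length:

```python
import pandas as pd

def filter_func(iterator):
    # Mimics the shape of a function passed to DataFrame.mapInPandas:
    # consumes an iterator of pandas DataFrames (one per Arrow batch)
    # and yields pandas DataFrames. Type hints are optional here.
    for pdf in iterator:
        yield pdf[pdf.id == 1]

# Simulate the batches Spark would feed in; in real PySpark this would be
# df.mapInPandas(filter_func, schema=df.schema) on a Spark DataFrame.
batches = [pd.DataFrame({"id": [1, 2], "age": [21, 30]})]
result = pd.concat(filter_func(iter(batches)))
```

Note that `result` here has fewer rows than the input batch, which is precisely why Pandas Function APIs (unlike scalar Pandas UDFs, which operate under `Column` and must preserve length) behave as regular `DataFrame`-level APIs.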
