dongjoon-hyun commented on code in PR #53264:
URL: https://github.com/apache/spark/pull/53264#discussion_r2574946541


##########
python/docs/source/migration_guide/pyspark_upgrade.rst:
##########
@@ -26,6 +26,7 @@ Upgrading from PySpark 4.0 to 4.1
 * In Spark 4.1, the minimum supported version for PyArrow has been raised from 11.0.0 to 15.0.0 in PySpark.
 * In Spark 4.1, the minimum supported version for Pandas has been raised from 2.0.0 to 2.2.0 in PySpark.
 * In Spark 4.1, ``DataFrame['name']`` on Spark Connect Python Client no longer eagerly validates the column name. To restore the legacy behavior, set the ``PYSPARK_VALIDATE_COLUMN_NAME_LEGACY`` environment variable to ``1``.
+* In Spark 4.1, Python UDFs are Arrow-optimized by default, as the default value of ``spark.sql.execution.pythonUDF.arrow.enabled`` is ``true``. To restore the legacy behavior, set ``spark.sql.execution.pythonUDF.arrow.enabled`` to ``false``.
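
   The opt-out described in the added bullet might be sketched like this in user code; this is a minimal illustration (assuming a local PySpark >= 4.1 install), not part of the PR, and the config key comes from the migration-guide text above:

   ```python
   from pyspark.sql import SparkSession

   # Restore the pre-4.1 (pickle-based) Python UDF execution path by
   # disabling Arrow optimization at session-build time.
   spark = (
       SparkSession.builder
       .config("spark.sql.execution.pythonUDF.arrow.enabled", "false")
       .getOrCreate()
   )
   ```

   The same setting can also be changed on a running session via ``spark.conf.set(...)``, or passed as a ``--conf`` option to ``spark-submit``.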

Review Comment:
   Please make a new section like this.
   
   ```
   Upgrading from PySpark 4.1 to 4.2
   ---------------------------------
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

