xianzhe-databricks commented on code in PR #52467:
URL: https://github.com/apache/spark/pull/52467#discussion_r2435248776
##########
python/docs/source/migration_guide/pyspark_upgrade.rst:
##########
@@ -29,6 +29,17 @@ Upgrading from PySpark 4.0 to 4.1
* In Spark 4.1, Arrow-optimized Python UDF supports UDT input / output instead
of falling back to the regular UDF. To restore the legacy behavior, set
``spark.sql.execution.pythonUDF.arrow.legacy.fallbackOnUDT`` to ``true``.
* In Spark 4.1, unnecessary conversion to pandas instances is removed when
``spark.sql.execution.pythonUDF.arrow.enabled`` is enabled. As a result, the
type coercion changes when the produced output has a schema different from the
specified schema. To restore the previous behavior, enable
``spark.sql.legacy.execution.pythonUDF.pandas.conversion.enabled``.
* In Spark 4.1, unnecessary conversion to pandas instances is removed when
``spark.sql.execution.pythonUDTF.arrow.enabled`` is enabled. As a result, the
type coercion changes when the produced output has a schema different from the
specified schema. To restore the previous behavior, enable
``spark.sql.legacy.execution.pythonUDTF.pandas.conversion.enabled``.
+* In Spark 4.1, the data type ``BinaryType`` is mapped to Python ``bytes`` consistently in PySpark.
+  To restore the previous behavior, set ``spark.sql.execution.pyspark.binaryAsBytes`` to ``false``. The behavior before Spark 4.1.0 is illustrated in the following table:
+
+  ===============================================================================  ==============================
+  Case                                                                             Python type for ``BinaryType``
+  ===============================================================================  ==============================
+  Regular UDF and UDTF without Arrow optimization                                  ``bytearray``
+  DataFrame APIs (both Spark Classic and Spark Connect)                            ``bytearray``
+  Data sources                                                                     ``bytearray``
+  Arrow-optimized UDF and UDTF with unnecessary conversion to pandas instances     ``bytes``
+  ===============================================================================  ==============================
Review Comment:
@ueshin @allisonwang-db migration guide is added!
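For readers whose downstream code may see either type during an upgrade, here is a minimal sketch of a normalizing helper. It is plain Python and does not call any PySpark API; the name `as_bytes` is hypothetical and is not part of PySpark or of this PR. It only illustrates why the change can matter: `bytearray` is mutable and unhashable, while `bytes` is immutable and can be used as a dict key.

```python
def as_bytes(value):
    """Return an immutable ``bytes`` copy of a binary value, or None.

    Hypothetical helper (not a PySpark API): accepts the ``bytearray``
    that older code paths returned as well as ``bytes``.
    """
    if value is None:
        return None
    if isinstance(value, (bytes, bytearray, memoryview)):
        return bytes(value)
    raise TypeError(f"expected a binary value, got {type(value).__name__}")


# Stand-ins for a BinaryType value under the old and new mappings.
legacy_value = bytearray(b"\x00\x01")
current_value = b"\x00\x01"

# After normalization both compare equal and can be used as dict keys.
lookup = {as_bytes(legacy_value): "row"}
assert lookup[as_bytes(current_value)] == "row"
```

Code that only reads or compares binary values usually needs no change, since `bytearray(b"x") == b"x"` is already true in Python; the helper is mainly useful where values are hashed or mutated in place.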