allisonwang-db commented on code in PR #52467:
URL: https://github.com/apache/spark/pull/52467#discussion_r2389381290
##########
docs/sql-ref-datatypes.md:
##########
@@ -131,7 +131,7 @@ from pyspark.sql.types import *
 |**StringType**|str|StringType()|
 |**CharType(length)**|str|CharType(length)|
 |**VarcharType(length)**|str|VarcharType(length)|
-|**BinaryType**|bytearray|BinaryType()|
+|**BinaryType**|bytearray<br/>**Note:** When Arrow is enabled (`spark.sql.execution.arrow.pyspark.enabled=true`), BinaryType maps to `bytes` instead of `bytearray`.|BinaryType()|
 |**BooleanType**|bool|BooleanType()|

Review Comment:
   Doesn't this also require the config `spark.sql.execution.arrow.pyspark.binaryAsBytes` to be enabled? I think the note as written can be confusing. If both `spark.sql.execution.arrow.pyspark.enabled` and `spark.sql.execution.arrow.pyspark.binaryAsBytes` are enabled by default in Spark 4.1, can we directly change the documentation here to `bytes` and mention that this is a behavior change in Spark 4.1?

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
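For context on why the `bytes`-vs-`bytearray` mapping discussed above is a user-visible behavior change (and not just a docs nit), here is a minimal plain-Python sketch of how the two types differ for downstream code. This does not require Spark; it only illustrates the semantics a PySpark user would observe on collected `BinaryType` values.

```python
# bytes is immutable and hashable; bytearray is mutable and unhashable.
# Code that uses binary column values as dict keys or set members works
# with bytes but raises TypeError with bytearray.
b = bytes(b"\x00\x01")
ba = bytearray(b"\x00\x01")

# The two compare equal by content, so equality-based logic is unaffected.
assert b == ba

# bytes can serve as a dict key...
lookup = {b: "ok"}
assert lookup[bytes(b"\x00\x01")] == "ok"

# ...while bytearray cannot be hashed at all.
try:
    hash(ba)
    hashable = True
except TypeError:
    hashable = False
assert not hashable

# bytearray supports in-place mutation; bytes does not.
ba[0] = 0xFF
assert ba == bytearray(b"\xff\x01")
```

This is why flipping the default mapping (via `spark.sql.execution.arrow.pyspark.binaryAsBytes`) can silently fix some user code paths and break others, which supports documenting it explicitly as a behavior change.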
