Re: [PR] [SPARK-53696][PYTHON][CONNECT][SQL] Default to bytes for BinaryType in PySpark [spark]

via GitHub Fri, 17 Oct 2025 18:00:13 -0700


xianzhe-databricks commented on code in PR #52467:
URL: https://github.com/apache/spark/pull/52467#discussion_r2435056619



##########
python/pyspark/sql/tests/test_udtf.py:
##########
@@ -3063,6 +3076,19 @@ def tearDownClass(cls):
     not have_pandas or not have_pyarrow, pandas_requirement_message or pyarrow_requirement_message
 )
 class LegacyUDTFArrowTestsMixin(BaseUDTFTestsMixin):
+    def test_udtf_binary_type(self):
+        @udtf(returnType="type_name: string")
+        class BinaryTypeUDTF:
+            def eval(self, b):
+                # Check the type of the binary input and return type name as string
+                yield (type(b).__name__,)
+
+        # For Arrow Python UDTF with legacy conversion BinaryType is always mapped to bytes
+        for conf_value in ["true", "false"]:
+            with self.sql_conf({"spark.sql.execution.pyspark.binaryAsBytes": conf_value}):
+                result = BinaryTypeUDTF(lit(b"test")).collect()
+                self.assertEqual(result[0]["type_name"], "bytes")

Review Comment:
   @allisonwang-db Arrow UDTF with legacy conversion is tested here.



##########
python/pyspark/sql/tests/test_udtf.py:
##########
@@ -3389,6 +3415,11 @@ def tearDownClass(cls):
 
 
 class UDTFArrowTestsMixin(LegacyUDTFArrowTestsMixin):
+    def test_udtf_binary_type(self):
+        # For Arrow Python UDTF with non-legacy conversion, BinaryType is mapped to
+        # bytes or bytearray consistently with non-Arrow Python UDTF behavior.
+        BaseUDTFTestsMixin.test_udtf_binary_type(self)

Review Comment:
   @allisonwang-db Arrow UDTF without legacy conversion is tested here.
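
The override in the hunk above calls `BaseUDTFTestsMixin.test_udtf_binary_type(self)` explicitly because `UDTFArrowTestsMixin` inherits from `LegacyUDTFArrowTestsMixin`, which defines its own version of the test. A minimal sketch of that pattern follows; the class and method names are hypothetical stand-ins, not the PySpark test classes, and the returned strings only mark which implementation ran.

```python
# Hypothetical stand-ins for the three test mixins in the diff above.
class BaseTests:
    def describe(self):
        return "base behavior"


class LegacyArrowTests(BaseTests):
    def describe(self):
        # Overrides the base check with legacy-specific expectations.
        return "legacy behavior"


class ArrowTests(LegacyArrowTests):
    def describe(self):
        # Call the grandparent's method directly; super().describe()
        # would resolve to LegacyArrowTests.describe instead.
        return BaseTests.describe(self)


print(LegacyArrowTests().describe())  # prints "legacy behavior"
print(ArrowTests().describe())        # prints "base behavior"
```

Without the override, the subclass would inherit the legacy variant of the test unchanged, so the explicit call is what restores the base expectations for the non-legacy path.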



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

