Re: [PR] [SPARK-53029][PYTHON] Support return type coercion for Arrow Python UDTFs [spark]

via GitHub Thu, 28 Aug 2025 13:37:08 -0700


shujingyang-db commented on code in PR #52140:
URL: https://github.com/apache/spark/pull/52140#discussion_r2308489185



##########
python/pyspark/sql/pandas/serializers.py:
##########
@@ -201,9 +201,26 @@ class ArrowStreamArrowUDTFSerializer(ArrowStreamUDTFSerializer):
     Serializer for PyArrow-native UDTFs that work directly with PyArrow RecordBatches and Arrays.
     """
 
-    def __init__(self, table_arg_offsets=None):
+    def __init__(self, table_arg_offsets=None, arrow_cast=False):
         super().__init__()
         self.table_arg_offsets = table_arg_offsets if table_arg_offsets else []
+        self._arrow_cast = arrow_cast
+
+    def _create_array(self, arr, arrow_type):
+        import pyarrow as pa
+
+        assert isinstance(arr, pa.Array)
+        assert isinstance(arrow_type, pa.DataType)
+
+        if arr.type == arrow_type:
+            return arr
+        elif self._arrow_cast:
+            return arr.cast(target_type=arrow_type, safe=True)
+        else:
+            raise PySparkTypeError(

Review Comment:
   Thanks for pointing it out! I have added a new test suite, `ArrowUDTFTestsWithoutTypeCoercion`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

