Re: [PR] [SPARK-53029][PYTHON] Support return type coercion for Arrow Python UDTFs [spark]

via GitHub Tue, 02 Sep 2025 18:47:06 -0700


shujingyang-db commented on code in PR #52140:
URL: https://github.com/apache/spark/pull/52140#discussion_r2317227159



##########
python/pyspark/sql/pandas/serializers.py:
##########
@@ -201,9 +201,26 @@ class 
ArrowStreamArrowUDTFSerializer(ArrowStreamUDTFSerializer):
     Serializer for PyArrow-native UDTFs that work directly with PyArrow 
RecordBatches and Arrays.
     """
 
-    def __init__(self, table_arg_offsets=None):
+    def __init__(self, table_arg_offsets=None, arrow_cast=True):
         super().__init__()
         self.table_arg_offsets = table_arg_offsets if table_arg_offsets else []
+        self._arrow_cast = arrow_cast
+
+    def _create_array(self, arr, arrow_type):
+        import pyarrow as pa
+
+        assert isinstance(arr, pa.Array)
+        assert isinstance(arrow_type, pa.DataType)
+
+        if arr.type == arrow_type:
+            return arr
+        elif self._arrow_cast:
+            return arr.cast(target_type=arrow_type, safe=True)

Review Comment:
   Arrow UDF has the same 
[implementation](https://github.com/apache/spark/blob/885bfc22cb0d315519384568c9cb0dce2c0f556f/python/pyspark/sql/pandas/serializers.py#L726).
 cc: @zhengruifeng please keep me honest.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

Re: [PR] [SPARK-53029][PYTHON] Support return type coercion for Arrow Python UDTFs [spark]

Reply via email to