shujingyang-db commented on code in PR #52140:
URL: https://github.com/apache/spark/pull/52140#discussion_r2317227159
##########
python/pyspark/sql/pandas/serializers.py:
##########
@@ -201,9 +201,26 @@ class ArrowStreamArrowUDTFSerializer(ArrowStreamUDTFSerializer):
     Serializer for PyArrow-native UDTFs that work directly with PyArrow RecordBatches and Arrays.
     """
 
-    def __init__(self, table_arg_offsets=None):
+    def __init__(self, table_arg_offsets=None, arrow_cast=True):
         super().__init__()
         self.table_arg_offsets = table_arg_offsets if table_arg_offsets else []
+        self._arrow_cast = arrow_cast
+
+    def _create_array(self, arr, arrow_type):
+        import pyarrow as pa
+
+        assert isinstance(arr, pa.Array)
+        assert isinstance(arrow_type, pa.DataType)
+
+        if arr.type == arrow_type:
+            return arr
+        elif self._arrow_cast:
+            return arr.cast(target_type=arrow_type, safe=True)

Review Comment:
   Arrow UDF has the same [implementation](https://github.com/apache/spark/blob/885bfc22cb0d315519384568c9cb0dce2c0f556f/python/pyspark/sql/pandas/serializers.py#L726). cc: @zhengruifeng please keep me honest.