Re: [PR] [SPARK-55176][PYTHON] Extract `arrow_to_pandas` converter into ArrowArrayToPandasConversion [spark]

via GitHub Sun, 25 Jan 2026 22:07:10 -0800


zhengruifeng commented on code in PR #53957:
URL: https://github.com/apache/spark/pull/53957#discussion_r2726437581



##########
python/pyspark/sql/conversion.py:
##########
@@ -1030,3 +1030,78 @@ def convert_legacy(
             integer_object_nulls=True,
         )
         return converter(ser)
+
+    @classmethod
+    def create_converter(
+        cls,
+        timezone: str,
+        struct_in_pandas: str = "dict",
+        ndarray_as_list: bool = False,
+        df_for_struct: bool = False,
+        input_types: Optional[List] = None,
+    ) -> Callable[["pa.Array", int], Union["pd.Series", "pd.DataFrame"]]:
+        """
+        Create a column-level converter function for Arrow to Pandas 
conversion.
+
+        Parameters
+        ----------
+        timezone : str
+            Timezone for timestamp conversion.
+        struct_in_pandas : str
+            How to represent struct in pandas ("dict", "row", etc.)
+        ndarray_as_list : bool
+            Whether to convert ndarray as list.
+        df_for_struct : bool
+            If True, convert struct columns to DataFrame instead of Series.
+        input_types : list, optional
+            Spark types for each column, used for precise type conversion.
+
+        Returns
+        -------
+        callable
+            Function (arrow_column, column_index) -> pd.Series | pd.DataFrame
+        """
+        from pyspark.sql.pandas.types import from_arrow_type, is_variant
+
+        def convert(arrow_column: "pa.Array", spark_type: Optional[DataType] = 
None):
+            return cls.convert_legacy(
+                arrow_column,
+                spark_type or from_arrow_type(arrow_column.type),
+                timezone=timezone,
+                struct_in_pandas=struct_in_pandas,
+                ndarray_as_list=ndarray_as_list,
+            )
+
+        def converter(

Review Comment:
   I'd enhance existing `convert_legacy` to support returning pdf in 
https://github.com/apache/spark/pull/53967
   
   so that all logic are in single function and we can then start to adding and 
comparing new conversion implementations (e.g. to pyarrow-backed pandas series)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] [SPARK-55176][PYTHON] Extract `arrow_to_pandas` converter into ArrowArrayToPandasConversion [spark]

Reply via email to