Re: [PR] [SPARK-55186][PYTHON] Make ArrowArrayToPandasConversion.convert_legacy able to return pd.DataFrame [spark]

via GitHub Sun, 25 Jan 2026 22:44:03 -0800


zhengruifeng commented on code in PR #53967:
URL: https://github.com/apache/spark/pull/53967#discussion_r2726495606



##########
python/pyspark/sql/pandas/serializers.py:
##########
@@ -559,46 +560,22 @@ def __init__(
             assert isinstance(input_type, StructType)
         self._input_type = input_type
 
-    def arrow_to_pandas(self, arrow_column, idx):
-        import pyarrow.types as types
+    def arrow_to_pandas(self, arrow_column, idx) -> Union["pd.Series", "pd.DataFrame"]:
+        input_type = self._input_type
+        if input_type is None:
+            input_type = from_arrow_type(arrow_column.type)
 
-        # If the arrow type is struct, return a pandas dataframe where the fields of the struct
-        # correspond to columns in the DataFrame. However, if the arrow struct is actually a
-        # Variant, which is an atomic type, treat it as a non-struct arrow type.
-        if (
-            self._df_for_struct
-            and types.is_struct(arrow_column.type)

Review Comment:
   Checking the pyarrow data type is no longer sufficient: the geo types that
   were added are also based on `pa.struct`, so `types.is_struct` alone cannot
   tell them apart from a real struct.
   
   We should always check against the Spark data type instead.
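
To illustrate the point, here is a minimal sketch of why the two checks can disagree. It does not use the real pyspark or pyarrow classes: `ArrowType`, `SparkType`, `GeoLikeType`, and both helper functions are hypothetical stand-ins. The only facts taken from this thread are that Variant and the geo types are struct-backed on the Arrow side.

```python
from dataclasses import dataclass


# Hypothetical stand-ins for illustration only; these are NOT pyarrow or pyspark classes.
@dataclass(frozen=True)
class ArrowType:
    kind: str  # e.g. "struct" or "int64"


@dataclass(frozen=True)
class SparkType:
    name: str         # name of the logical Spark type
    arrow: ArrowType  # Arrow storage type assumed for it in this sketch


STRUCT = ArrowType("struct")

spark_types = [
    SparkType("StructType", STRUCT),   # a real struct: should become a DataFrame
    SparkType("VariantType", STRUCT),  # atomic type that is struct-backed in Arrow
    SparkType("GeoLikeType", STRUCT),  # placeholder for a struct-backed geo type
    SparkType("LongType", ArrowType("int64")),
]


def is_struct_by_arrow(t: SparkType) -> bool:
    # Mirrors a storage-level check: true for anything stored as a struct.
    return t.arrow.kind == "struct"


def is_struct_by_spark(t: SparkType) -> bool:
    # Mirrors a logical-type check: true only for the Spark struct type.
    return t.name == "StructType"


by_arrow = [t.name for t in spark_types if is_struct_by_arrow(t)]
by_spark = [t.name for t in spark_types if is_struct_by_spark(t)]

print("arrow-level check selects:", by_arrow)
print("spark-level check selects:", by_spark)
```

With a storage-level check, every new struct-backed atomic type needs its own exclusion, which is how the Variant special case ended up in the removed code. A check on the logical type does not need updating when such types are added.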



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

