zhengruifeng commented on code in PR #53967:
URL: https://github.com/apache/spark/pull/53967#discussion_r2726495606
##########
python/pyspark/sql/pandas/serializers.py:
##########
@@ -559,46 +560,22 @@ def __init__(
assert isinstance(input_type, StructType)
self._input_type = input_type
- def arrow_to_pandas(self, arrow_column, idx):
- import pyarrow.types as types
+ def arrow_to_pandas(self, arrow_column, idx) -> Union["pd.Series", "pd.DataFrame"]:
+ input_type = self._input_type
+ if input_type is None:
+ input_type = from_arrow_type(arrow_column.type)
- # If the arrow type is struct, return a pandas dataframe where the fields of the struct
- # correspond to columns in the DataFrame. However, if the arrow struct is actually a
- # Variant, which is an atomic type, treat it as a non-struct arrow type.
- if (
- self._df_for_struct
- and types.is_struct(arrow_column.type)
Review Comment:
Checking the PyArrow data type is no longer sufficient now that geo types have
been added, because they are also based on `pa.struct`.
We should always check against the Spark data type instead.
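
To illustrate the point, here is a minimal, self-contained sketch of why the Arrow-level check is ambiguous while the Spark-level check is not. The classes below are hypothetical stand-ins for the Spark SQL types, not the real `pyspark.sql.types` classes, and the helper names `arrow_storage_is_struct` and `expand_to_dataframe` are made up for this example; they are not part of the PR.

```python
from dataclasses import dataclass


class DataType:
    """Hypothetical stand-in for a Spark SQL data type."""


@dataclass(frozen=True)
class StructType(DataType):
    """Stand-in for a real struct with named fields."""
    field_names: tuple


class VariantType(DataType):
    """Stand-in for Variant: atomic in Spark, struct-backed in Arrow."""


class GeometryType(DataType):
    """Stand-in for a geo type: atomic in Spark, struct-backed in Arrow."""


class LongType(DataType):
    """Stand-in for a plain primitive type."""


def arrow_storage_is_struct(spark_type: DataType) -> bool:
    # Assumption for this sketch: all three kinds are stored as an Arrow
    # struct, so an Arrow-only check cannot tell them apart.
    return isinstance(spark_type, (StructType, VariantType, GeometryType))


def expand_to_dataframe(spark_type: DataType, df_for_struct: bool) -> bool:
    # Decide from the Spark type alone: only a genuine StructType is
    # expanded into a pandas DataFrame with one column per field.
    return df_for_struct and isinstance(spark_type, StructType)


if __name__ == "__main__":
    for t in (StructType(("a", "b")), VariantType(), GeometryType(), LongType()):
        print(type(t).__name__, arrow_storage_is_struct(t), expand_to_dataframe(t, True))
```

With this shape, adding another struct-backed atomic type later would not require another special case in the Arrow-level condition.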
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]