itholic commented on code in PR #35391:
URL: https://github.com/apache/spark/pull/35391#discussion_r1491894555
##########
python/pyspark/sql/pandas/types.py:
##########
@@ -86,8 +86,15 @@ def to_arrow_type(dt: DataType) -> "pa.DataType":
     elif type(dt) == DayTimeIntervalType:
         arrow_type = pa.duration("us")
     elif type(dt) == ArrayType:
-        if type(dt.elementType) in [StructType, TimestampType]:
+        if type(dt.elementType) == TimestampType:
             raise TypeError("Unsupported type in conversion to Arrow: " + str(dt))
+        elif type(dt.elementType) == StructType:
+            if LooseVersion(pa.__version__) < LooseVersion("2.0.0"):
+                raise TypeError(
+                    "Array of StructType is only supported with pyarrow 2.0.0 and above"
Review Comment:
Hi @LucaCanali, I think I've found a case where Array of StructType doesn't work properly:
**In:**
```python
df = spark.createDataFrame(
    [
        ("a", [("b", False), ("c", True)]),
    ]
).toDF("c1", "c2")
df.toPandas()
```
**Out:**
```python
  c1                                                 c2
0  a  [{'_1': 'b', '_2': False}, {'_1': 'c', '_2': T...
```
**Expected:**
```python
  c1                        c2
0  a  [(b, False), (c, True)]
```
I suspect this may be an internal table-conversion issue on the PyArrow side, but I'm not
sure, so could you check it out when you find some time?
Thanks in advance!
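For reference, here is a minimal sketch, independent of Spark and assuming pyarrow >= 2.0.0,
that I would use to check whether the list-of-dicts output comes from PyArrow's own
`to_pandas()` conversion of a `list<struct>` column. The column name `c2` and the field
names `_1`/`_2` below just mirror the example above:
```python
import pyarrow as pa

# Build a list<struct<_1: string, _2: bool>> column that mirrors the "c2"
# column from the Spark example above.
arr = pa.array(
    [[{"_1": "b", "_2": False}, {"_1": "c", "_2": True}]],
    type=pa.list_(pa.struct([("_1", pa.string()), ("_2", pa.bool_())])),
)
table = pa.table({"c2": arr})

# If PyArrow itself returns the struct elements as Python dicts here, the
# behavior seen in df.toPandas() would come from the Arrow -> pandas step
# rather than from the Spark-side type mapping.
print(table.to_pandas()["c2"][0])
```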