itholic commented on code in PR #35391:
URL: https://github.com/apache/spark/pull/35391#discussion_r1491894555
##########
python/pyspark/sql/pandas/types.py:
##########
@@ -86,8 +86,15 @@ def to_arrow_type(dt: DataType) -> "pa.DataType":
     elif type(dt) == DayTimeIntervalType:
         arrow_type = pa.duration("us")
     elif type(dt) == ArrayType:
-        if type(dt.elementType) in [StructType, TimestampType]:
+        if type(dt.elementType) == TimestampType:
             raise TypeError("Unsupported type in conversion to Arrow: " + str(dt))
+        elif type(dt.elementType) == StructType:
+            if LooseVersion(pa.__version__) < LooseVersion("2.0.0"):
+                raise TypeError(
+                    "Array of StructType is only supported with pyarrow 2.0.0 and above"
Review Comment:
Hi @LucaCanali, I think I've found a case where Array of StructType doesn't work properly:
**In:**
```python
df = spark.createDataFrame(
    [
        ("a", [("b", False), ("c", True)]),
    ]
).toDF("c1", "c2")
df.toPandas()
```
**Out:**
```python
  c1                                                 c2
0  a  [{'_1': 'b', '_2': False}, {'_1': 'c', '_2': T...
```
**Expected:**
```python
  c1                        c2
0  a  [(b, False), (c, True)]
```
I suspect this may be an internal table-conversion issue on the PyArrow side, but I'm not
sure, so could you check it out when you find some time?
Thanks in advance!
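For reference, here is a minimal sketch, independent of Spark and assuming pyarrow >= 2.0.0,
that I would use to check whether the list-of-dicts output comes from PyArrow's own
`to_pandas()` conversion of a `list<struct>` column. The column name `c2` and the field
names `_1`/`_2` below just mirror the example above:
```python
import pyarrow as pa

# Build a list<struct<_1: string, _2: bool>> column that mirrors the "c2"
# column from the Spark example above.
arr = pa.array(
    [[{"_1": "b", "_2": False}, {"_1": "c", "_2": True}]],
    type=pa.list_(pa.struct([("_1", pa.string()), ("_2", pa.bool_())])),
)
table = pa.table({"c2": arr})

# If PyArrow itself returns the struct elements as Python dicts here, the
# behavior seen in df.toPandas() would come from the Arrow -> pandas step
# rather than from the Spark-side type mapping.
print(table.to_pandas()["c2"][0])
```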