zhengruifeng commented on code in PR #54143:
URL: https://github.com/apache/spark/pull/54143#discussion_r2922892510


##########
python/pyspark/sql/conversion.py:
##########
@@ -1739,7 +1772,14 @@ def convert_numpy(
             # This name will be dropped after pa.compute functions.
             ser_name = arr._name
 
-        arr = ArrowArrayConversion.preprocess_time(arr)
+        if isinstance(spark_type, ArrayType):
+            # ArrayType only needs tz localization, not ns coercion.
+            # preprocess_time() coerces to timestamp[ns] which causes PyArrow's
+            # to_pandas() to return raw integers for nested timestamps
+            # instead of datetime objects.
+            arr = ArrowArrayConversion.localize_tz(arr)

Review Comment:
   Is there a test failure due to this?
   I guess we should also support ns coercion in this case.



##########
python/pyspark/sql/conversion.py:
##########
@@ -1699,7 +1734,6 @@ def convert_numpy(
         *,
         ser_name: Optional[str] = None,
         timezone: Optional[str] = None,
-        struct_in_pandas: Optional[str] = None,

Review Comment:
   Is it never used in pandas UDFs?



##########
python/pyspark/sql/conversion.py:
##########
@@ -1808,15 +1848,14 @@ def convert_numpy(
             series = series.map(
                 lambda v: Geometry.fromWKB(v["wkb"], v["srid"]) if v is not None else None
             )
-        # elif isinstance(
-        #     spark_type,
-        #     (
-        #         ArrayType,
-        #         MapType,
-        #         StructType,
-        #     ),
-        # ):
-        # TODO(SPARK-55324): Support complex types
+        elif isinstance(spark_type, ArrayType):
+            if ndarray_as_list:
+                series = arr.to_pandas(date_as_object=True, integer_object_nulls=True)
+                series = series.map(lambda x: cls._ndarray_to_list(x) if x is not None else None)
+            else:
+                series = arr.to_pandas(date_as_object=True)

Review Comment:
   ditto



##########
python/pyspark/sql/conversion.py:
##########
@@ -1808,15 +1848,14 @@ def convert_numpy(
             series = series.map(
                 lambda v: Geometry.fromWKB(v["wkb"], v["srid"]) if v is not None else None
             )
-        # elif isinstance(
-        #     spark_type,
-        #     (
-        #         ArrayType,
-        #         MapType,
-        #         StructType,
-        #     ),
-        # ):
-        # TODO(SPARK-55324): Support complex types
+        elif isinstance(spark_type, ArrayType):
+            if ndarray_as_list:
+                series = arr.to_pandas(date_as_object=True, integer_object_nulls=True)

Review Comment:
   `date_as_object` already defaults to `True`, so passing it explicitly is redundant.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

