[GitHub] [spark] LucaCanali commented on a diff in pull request #35391: [SPARK-38098][PYTHON] Add support for ArrayType of nested StructType to arrow-based conversion

GitBox Fri, 29 Jul 2022 03:28:20 -0700


LucaCanali commented on code in PR #35391:
URL: https://github.com/apache/spark/pull/35391#discussion_r933096674



##########
python/pyspark/sql/tests/test_pandas_udf_scalar.py:
##########
@@ -134,6 +134,30 @@ def test_pandas_udf_nested_arrays(self):
         result = df.select(tokenize("vals").alias("hi"))
         self.assertEqual([Row(hi=[["hi", "boo"]]), Row(hi=[["bye", "boo"]])], result.collect())
 
+    def test_pandas_array_struct(self):
+        # SPARK-38098: Support Array of Struct for Pandas UDFs and toPandas
+        # import numpy as np
+
+        @pandas_udf("Array<struct<col1:string, col2:long, col3:double>>")
+        def return_cols(cols):
+            # self.assertEqual(type(cols), pd.Series)
+            # self.assertEqual(type(cols[0]), np.ndarray)
+            # self.assertEqual(type(cols[0][0]), dict)

Review Comment:
   Thank you @ueshin for the review and comments.
   I have applied the proposed modifications.
   As for `import numpy as np`, I have now added it explicitly, but outside 
the UDF, to be consistent with the other tests there that use numpy.
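
   For illustration only (this is not the code in the PR): a minimal sketch of 
the nesting that the commented-out assertions in the diff describe, with numpy 
imported at module level instead of inside the function. It runs on plain 
pandas and numpy without Spark. The helper name `check_array_of_struct` and 
the sample values are hypothetical; only the Series -> ndarray -> dict layering 
is taken from the diff.

```python
import numpy as np  # imported at module level, outside the function body
import pandas as pd


def check_array_of_struct(cols: pd.Series) -> pd.Series:
    """Hypothetical helper mirroring the type checks quoted in the diff."""
    # The batch of column values is a pandas Series.
    assert type(cols) is pd.Series
    # Each element of the Series (one array value) is a numpy array.
    assert type(cols[0]) is np.ndarray
    # Each item of that array (one struct) is a plain dict.
    assert type(cols[0][0]) is dict
    return cols


# Made-up sample data shaped like Array<struct<col1:string, col2:long, col3:double>>.
sample = pd.Series(
    [
        np.array(
            [
                {"col1": "a", "col2": 1, "col3": 1.0},
                {"col1": "b", "col2": 2, "col3": 2.0},
            ],
            dtype=object,
        ),
        np.array([{"col1": "c", "col2": 3, "col3": 3.0}], dtype=object),
    ]
)

result = check_array_of_struct(sample)
```

   Keeping the import at module level means the name `np` is resolved once for 
the whole test module, so every test that needs it can share it.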



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.



