allisonwang-db commented on code in PR #42310:
URL: https://github.com/apache/spark/pull/42310#discussion_r1283466745


##########
python/pyspark/sql/pandas/types.py:
##########
@@ -763,6 +764,9 @@ def _create_converter_from_pandas(
     error_on_duplicated_field_names : bool, optional
         Whether raise an exception when there are duplicated field names.
         (default ``True``)
+    ignore_unexpected_complex_type_values : bool, optional
+        Whether ignore the case where unexpected values are given for complex types.

Review Comment:
   Can we add a bit more comment explaining what an unexpected value for a complex type is? I.e., provide some concrete examples.
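
   For context, a hedged sketch of what such an example could look like (this is an illustration, not the actual pyspark converter; `convert_array` is a hypothetical stand-in): the schema declares `array<int>`, but the UDTF yields a plain scalar instead of a list.

```python
# Hypothetical sketch (not the actual pyspark helper): an "unexpected
# value for a complex type" is e.g. a scalar where the schema declares
# array<int>.
def convert_array(value, ignore_unexpected=False):
    if not isinstance(value, (list, tuple)):
        if ignore_unexpected:
            return None  # drop the malformed value instead of failing
        raise AssertionError(f"expected a list, got {type(value).__name__}")
    return [int(v) for v in value]

print(convert_array([0, 1, 2]))                  # well-formed input
print(convert_array(7, ignore_unexpected=True))  # unexpected scalar -> None
```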



##########
python/pyspark/sql/tests/test_udtf.py:
##########
@@ -1789,9 +1786,102 @@ def eval(self):
             ("x: array<int>", [Row(x=[0, 1, 2])]),
             ("x: array<float>", [Row(x=[0, 1.1, 2])]),
             ("x: array<array<int>>", err),
-            # TODO(SPARK-44561): fix AssertionError in convert_map and convert_struct
-            # ("x: map<string,int>", None),
-            # ("x: struct<a:int>", None)
+            ("x: map<string,int>", err),
+            ("x: struct<a:int>", err),
+        ]:
+            with self.subTest(ret_type=ret_type):
+                self._check_result_or_exception(TestUDTF, ret_type, expected)
+
+    def test_map_output_type_casting(self):

Review Comment:
   Thanks for adding these tests!



##########
python/pyspark/sql/pandas/types.py:
##########
@@ -781,28 +785,51 @@ def correct_timestamp(pser: pd.Series) -> pd.Series:
     def _converter(dt: DataType) -> Optional[Callable[[Any], Any]]:
 
         if isinstance(dt, ArrayType):
-            _element_conv = _converter(dt.elementType)
-            if _element_conv is None:
-                return None
+            _element_conv = _converter(dt.elementType) or (lambda x: x)

Review Comment:
   Just curious: why do we need `(lambda x: x)` here?



##########
python/pyspark/sql/types.py:
##########
@@ -2402,7 +2402,7 @@ def __repr__(self) -> str:
                 "%s=%r" % (k, v) for k, v in zip(self.__fields__, tuple(self))
             )
         else:
-            return "<Row(%s)>" % ", ".join("%r" % field for field in self)
+            return "<Row(%s)>" % ", ".join(repr(field) for field in self)

Review Comment:
   This is fixed in https://github.com/apache/spark/pull/42303?
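
   For context, a guess at why the `repr(field)` spelling matters (the snippet below is an illustration, not pyspark code): the two forms are not equivalent when a field is itself a tuple, which includes nested `Row` values since `Row` subclasses `tuple`.

```python
# "%r" % field and repr(field) diverge when field is itself a tuple:
# the % operator treats a tuple operand as an argument list, not as
# a single value to format.
field = (1, 2)
result = repr(field)  # always the tuple's own repr
try:
    broken = "%r" % field  # raises: % unpacks the tuple as arguments
except TypeError:
    broken = None
silently_wrong = "%r" % (3,)  # formats the element, not the tuple
```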



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

