[GitHub] [spark] allisonwang-db commented on a diff in pull request #42196: [SPARK-44218] Customize diff log in assertDataFrameEqual error message format

via GitHub Mon, 31 Jul 2023 11:01:12 -0700


allisonwang-db commented on code in PR #42196:
URL: https://github.com/apache/spark/pull/42196#discussion_r1279694241



##########
python/pyspark/sql/tests/test_utils.py:
##########
@@ -287,25 +302,30 @@ def test_assert_notequal_arraytype(self):
             ),
         )
 
-        expected_error_message = "Results do not match: "
-        percent_diff = (1 / 2) * 100
-        expected_error_message += "( %.5f %% )" % percent_diff
+        if isinstance(df2, DataFrame):
+            actual_str = df1._jdf.showString(2, 2, False)
+            expected_str = df2._jdf.showString(2, 2, False)
+        else:
+            # Spark Connect
+            actual_str = df1._show_string(2, 2, False)
+            expected_str = df2._show_string(2, 2, False)

Review Comment:
   Instead of using `_show_string`, which invokes a Spark job, how about we 
convert the df to a pandas DataFrame? pandas should be a required dependency 
when Spark Connect is enabled.
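
   A minimal sketch of what this could look like in the test, assuming the 
string only needs to show the first two rows. The helper name 
`render_for_diff` is hypothetical and not part of the PR; it relies only on 
`DataFrame.limit` and `DataFrame.toPandas`, which exist on both classic and 
Spark Connect DataFrames, and on pandas `DataFrame.to_string`. The pandas 
rendering will not match the table layout produced by `showString`, so the 
expected error message would need to be built from the same helper.

```python
def render_for_diff(df, num_rows=2):
    """Render the first `num_rows` rows of `df` as plain text via pandas.

    Hypothetical helper: `df` is assumed to be a PySpark DataFrame
    (classic or Spark Connect), both of which expose limit() and toPandas().
    """
    # Cap the number of rows before collecting so only a small sample
    # is brought to the driver.
    pdf = df.limit(num_rows).toPandas()
    # index=False drops the pandas row index, which has no Spark counterpart.
    return pdf.to_string(index=False)
```

   With this, the branch on `isinstance(df2, DataFrame)` would not be needed: 
`actual_str = render_for_diff(df1)` and `expected_str = render_for_diff(df2)` 
take the same code path in both modes.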



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


