[GitHub] [spark] asl3 commented on a diff in pull request #41606: [WIP] [SPARK-44061][PYTHON] Add assertDFEqual util function

via GitHub Wed, 05 Jul 2023 09:49:23 -0700


asl3 commented on code in PR #41606:
URL: https://github.com/apache/spark/pull/41606#discussion_r1253356858



##########
python/pyspark/sql/tests/test_utils.py:
##########
@@ -16,18 +16,283 @@
 # limitations under the License.
 #
 
+import unittest
+from prettytable import PrettyTable

Review Comment:
   @HyukjinKwon Unfortunately, I don't think difflib will work. difflib is 
built to compare strings. I can convert the PySpark df -> pandas df -> str and 
pass that to difflib, but the output is hard to read because it only marks the 
exact characters that differ, not the rows or cells they belong to. 
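
To make the difflib behavior concrete, here is a minimal sketch using only the standard library. The row strings are hand-written stand-ins I am assuming for illustration, not real `toPandas()` output:

```python
import difflib

# Hypothetical stand-ins for the stringified rows of the two DataFrames.
actual_rows = ["1  1000.0", "2  3000.0"]
expected_rows = ["1  1001.0", "2  3000.0"]

# ndiff compares the two sequences line by line. For similar lines it adds
# "?" guide lines that point at the individual characters that changed.
diff_lines = [line.rstrip("\n") for line in difflib.ndiff(actual_rows, expected_rows)]
print("\n".join(diff_lines))
```

The diff is expressed in terms of characters within lines, so the reader still has to map each marker back to a column and a row by eye.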
   
   prettytable is nice because it can stack the rows and color-code them. I 
remember that in a design discussion we said 
[prettytable](https://github.com/jazzband/prettytable) may be okay to add as a 
dependency, since it is popular (8.2 million downloads per month). Another 
option is to reimplement the parts of prettytable we need, but I think it 
makes more sense to just use what already exists?
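
For a sense of what the reimplementation option would involve, here is a rough standard-library sketch of the row-stacking idea, without borders or color-coding. The `stacked_rows` helper and its output format are hypothetical, made up for illustration, and are not part of this PR or of prettytable:

```python
def stacked_rows(actual, expected):
    """Return one text line per matching row, or two stacked lines per mismatch."""
    lines = []
    # zip stops at the shorter input, so rows missing from one side are not
    # reported here; a real implementation would need to handle that case.
    for index, (act, exp) in enumerate(zip(actual, expected)):
        if act == exp:
            lines.append(f"row {index}   {act}")
        else:
            # Stack the differing rows so they can be compared cell by cell.
            lines.append(f"row {index} ! actual:   {act}")
            lines.append(f"row {index} ! expected: {exp}")
    return lines


actual = [("1", 1000.0), ("2", 3000.0)]
expected = [("1", 1001.0), ("2", 3000.0)]
report = stacked_rows(actual, expected)
print("\n".join(report))
```

Even this small version leaves out column alignment, headers, and coloring, which is the functionality an existing library already provides.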
   
   For example, here's the difference in output between difflib and 
prettytable for a simple PySpark df:
   
   `df = self.spark.createDataFrame(
       data=[
           ("1", 1000.00),
           ("2", 3000.00),
       ],
       schema=["id", "amount"],
   )`
   
   `expected = self.spark.createDataFrame(
       data=[
           ("1", 1001.00),
           ("2", 3000.00),
       ],
       schema=["id", "amount"],
   )`
   
   difflib:
   <img width="173" alt="Screenshot 2023-07-05 at 9 17 11 AM" 
src="https://github.com/apache/spark/assets/68875504/444f6091-57a8-4d1d-a323-0a5b5b0a9c82">
   
   prettytable:
   <img width="494" alt="Screenshot 2023-07-05 at 9 19 44 AM" 
src="https://github.com/apache/spark/assets/68875504/5a6f4123-fcff-4405-ada0-210ac8e0cb9a">
   


