zhengruifeng opened a new pull request, #56428:
URL: https://github.com/apache/spark/pull/56428

   ### What changes were proposed in this pull request?
   
   Fix two categories of `assertTrue` misuse in PySpark test files:
   
   1. **Silent no-op assertions** (`assertTrue(a, b)` where `b` is the expected 
value, not a failure message). Because `assertTrue(expr, msg=None)` treats the 
second argument as a failure message, these tests checked only that the first 
argument is truthy and never compared it with the expected value. Affected files:
      - `test_column.py`: `assertTrue(df3.columns, ["aa", "b", "a", "b"])` etc.
      - `test_dataframe.py`: `assertTrue(df3.columns, ["y"])`
      - `test_functions.py`: `assertTrue(row[1], 1)` / `assertTrue(row[2], 1)`
   
   2. **Weaker assertions** (`assertTrue(a == b)`, `assertFalse(a == b)`, 
`assertTrue(a != b)`, `assertTrue(a > b)`, `assertTrue(x not in y)`). These 
work correctly but report only `False is not true` (or `True is not false`) on 
failure, without showing the operands. Replaced with `assertEqual`, 
`assertNotEqual`, `assertGreater`, and `assertNotIn`.
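
The category 1 pitfall can be reproduced outside PySpark. The following is a minimal sketch using only the standard `unittest` module; the values `actual_columns` and `expected_columns` are hypothetical stand-ins, not taken from the affected tests.

```python
import unittest

# A bare TestCase instance is enough to call the assertion helpers directly.
tc = unittest.TestCase()

actual_columns = ["x"]    # hypothetical value produced by the code under test
expected_columns = ["y"]  # hypothetical value the test author meant to compare

# Misuse: the second positional argument is `msg`, so this only checks that
# `actual_columns` is truthy. It passes even though the two lists differ.
tc.assertTrue(actual_columns, expected_columns)
silently_passed = True

# Fix: assertEqual compares both values and raises AssertionError on mismatch.
try:
    tc.assertEqual(actual_columns, expected_columns)
    mismatch_detected = False
except AssertionError:
    mismatch_detected = True
```

Note that the misused form still fails when the first argument is falsy (for example an empty list), which is why such tests can look healthy for a long time.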
   
   ### Why are the changes needed?
   
   The silent no-op assertions in category 1 are bugs: the tests pass whenever 
the first argument is truthy, regardless of the expected value. Category 2 
changes improve debuggability.
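
To illustrate the debuggability point, here is a small sketch that captures the failure message from a weak and a specific assertion. The helper `failure_message` is a hypothetical name introduced for this illustration only.

```python
import unittest

tc = unittest.TestCase()


def failure_message(assertion, *args):
    """Run an assertion and return its failure text, or None if it passes."""
    try:
        assertion(*args)
    except AssertionError as exc:
        return str(exc)
    return None


# Weak form: the comparison is evaluated first, so unittest only sees `False`.
weak = failure_message(tc.assertTrue, 1 == 2)

# Specific form: unittest receives both operands and can report them.
strong = failure_message(tc.assertEqual, 1, 2)

print(weak)    # message mentions only the boolean result
print(strong)  # message mentions both operands
```

The same reasoning applies to `assertNotEqual`, `assertGreater`, and `assertNotIn`: each receives the operands themselves rather than a precomputed boolean.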
   
   ### Does this PR introduce _any_ user-facing change?
   
   No.
   
   ### How was this patch tested?
   
   Existing tests; no logic change.
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   Yes, generated with Claude Sonnet 4.6.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

