fuxi611 opened a new pull request, #55987: URL: https://github.com/apache/spark/pull/55987
### What changes were proposed in this pull request?

This PR fixes pandas-on-Spark equality and inequality comparisons between incompatible dtypes under ANSI mode.

The change makes pandas-on-Spark return pandas-compatible boolean results for incompatible dtype comparisons instead of delegating them to Spark SQL casting behavior:

- `eq` returns all `False`
- `ne` returns all `True`

This covers comparisons such as numeric Series/Index against string Series/Index or string scalar values.

### Why are the changes needed?

ANSI mode should not change pandas API on Spark behavior. Without this fix, Spark SQL may try to cast incompatible operands under ANSI mode, which can produce behavior that differs from pandas or raise errors for comparisons where pandas would simply return boolean results.

### Does this PR introduce any user-facing change?

Yes. pandas-on-Spark comparison behavior becomes more consistent with pandas under ANSI mode for incompatible dtype equality and inequality comparisons.

### How was this patch tested?

Ran:

```bash
python3 python/run-tests.py --testnames pyspark.pandas.tests.data_type_ops.test_num_ops
python3 python/run-tests.py --testnames pyspark.pandas.tests.data_type_ops.test_boolean_ops
```
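
For readers who want the intended semantics in miniature, here is a toy model in plain Python. It is not the code in this PR and does not use Spark or pandas: `is_numeric`, `cast_based_eq`, `compat_eq`, and `compat_ne` are hypothetical names, and `cast_based_eq` is only a stand-in for a strict, cast-first comparison that fails on malformed input. The sketch assumes the fix amounts to deciding the result from the operand types before any cast is attempted.

```python
from numbers import Number


def is_numeric(value):
    # bool is a Number subclass in Python; exclude it to keep the toy model simple.
    return isinstance(value, Number) and not isinstance(value, bool)


def cast_based_eq(value, other):
    """Stand-in for a strict, cast-first comparison (assumption, not Spark code):
    the string side is converted to a number, and malformed input raises ValueError."""
    left = float(value) if isinstance(value, str) else value
    right = float(other) if isinstance(other, str) else other
    return left == right


def compat_eq(values, other):
    """Hypothetical pandas-compatible eq for a numeric column vs. a scalar."""
    if all(is_numeric(v) for v in values) and isinstance(other, str):
        # Incompatible dtypes: answer directly and never attempt the cast.
        return [False] * len(values)
    return [cast_based_eq(v, other) for v in values]


def compat_ne(values, other):
    """ne is the element-wise negation of eq."""
    return [not result for result in compat_eq(values, other)]


nums = [1, 2, 3]
print(compat_eq(nums, "a"))  # [False, False, False]
print(compat_ne(nums, "a"))  # [True, True, True]
print(compat_eq(nums, 2))    # [False, True, False]
```

Calling `cast_based_eq(1, "a")` directly raises `ValueError`, which mirrors the kind of failure the description says can occur when the comparison is delegated to casting. The short-circuit in `compat_eq` avoids that path for incompatible operands and leaves compatible comparisons unchanged.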
