AngersZhuuuu opened a new pull request #29035: URL: https://github.com/apache/spark/pull/29035
### What changes were proposed in this pull request? In current Join Hint strategies, if we use SHUFFLE_REPLICATE_NL hint, it will directly convert join to Cartesian Product Join and loss join condition making result not correct. For Example: ``` spark-sql> select * from test4 order by a asc; 1 2 Time taken: 1.063 seconds, Fetched 4 row(s)20/07/08 14:11:25 INFO SparkSQLCLIDriver: Time taken: 1.063 seconds, Fetched 4 row(s) spark-sql>select * from test5 order by a asc 1 2 2 2 Time taken: 1.18 seconds, Fetched 24 row(s)20/07/08 14:13:59 INFO SparkSQLCLIDriver: Time taken: 1.18 seconds, Fetched 24 row(s)spar spark-sql>select /*+ shuffle_replicate_nl(test4) */ * from test4 join test5 where test4.a = test5.a order by test4.a asc ; 1 2 1 2 1 2 2 2 Time taken: 0.351 seconds, Fetched 2 row(s) 20/07/08 14:18:16 INFO SparkSQLCLIDriver: Time taken: 0.351 seconds, Fetched 2 row(s) ``` ### Why are the changes needed? Fix wrong data result ### Does this PR introduce _any_ user-facing change? NO ### How was this patch tested? Added UT ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
