AngersZhuuuu opened a new pull request #29035:
URL: https://github.com/apache/spark/pull/29035


   ### What changes were proposed in this pull request?
   In the current join hint strategies, if we use the SHUFFLE_REPLICATE_NL hint, the join is directly converted to a Cartesian product join and the join condition is dropped, so the query returns incorrect results.
   
   For example:
   ```
   spark-sql> select * from test4 order by a asc;
   1 2
   Time taken: 1.063 seconds, Fetched 4 row(s)
   20/07/08 14:11:25 INFO SparkSQLCLIDriver: Time taken: 1.063 seconds, Fetched 4 row(s)
   spark-sql> select * from test5 order by a asc
   1 2
   2 2
   Time taken: 1.18 seconds, Fetched 24 row(s)
   20/07/08 14:13:59 INFO SparkSQLCLIDriver: Time taken: 1.18 seconds, Fetched 24 row(s)
   spark-sql> select /*+ shuffle_replicate_nl(test4) */ * from test4 join test5 where test4.a = test5.a order by test4.a asc;
   1 2 1 2
   1 2 2 2
   Time taken: 0.351 seconds, Fetched 2 row(s)
   20/07/08 14:18:16 INFO SparkSQLCLIDriver: Time taken: 0.351 seconds, Fetched 2 row(s)
   ```
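
   To illustrate where the condition can get lost, here is a minimal, simplified sketch of the planner-side logic. The method name `planHintedCartesianProduct` and its parameters are illustrative assumptions, not the actual Spark internals touched by this PR: when a hinted equi-join is rewritten as a Cartesian product, the equi-join keys have to be folded back into the condition handed to `CartesianProductExec`, otherwise the product is emitted unfiltered.

   ```scala
   // Illustrative sketch only -- these names mirror, but are not, the exact
   // Spark planner code changed by this PR.
   import org.apache.spark.sql.catalyst.expressions.{And, EqualTo, Expression}
   import org.apache.spark.sql.execution.SparkPlan
   import org.apache.spark.sql.execution.joins.CartesianProductExec

   def planHintedCartesianProduct(
       leftKeys: Seq[Expression],
       rightKeys: Seq[Expression],
       otherCondition: Option[Expression],
       left: SparkPlan,
       right: SparkPlan): SparkPlan = {
     // Rebuild the full join condition: the equi-join key equalities AND-ed
     // with any remaining non-equi predicate.
     val equiCondition = leftKeys.zip(rightKeys).map { case (l, r) => EqualTo(l, r) }
     val fullCondition = (equiCondition ++ otherCondition).reduceOption(And)
     // Passing only `otherCondition` here is the kind of mistake described
     // above: the equi-join predicate would be silently dropped and the
     // operator would return extra rows.
     CartesianProductExec(left, right, fullCondition)
   }
   ```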
   
   
   ### Why are the changes needed?
   Fixes incorrect query results when the SHUFFLE_REPLICATE_NL hint is used.
   
   
   ### Does this PR introduce _any_ user-facing change?
   No.
   
   
   ### How was this patch tested?
   Added unit tests.
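
   Below is a minimal sketch of the kind of regression check this covers, assuming the `QueryTest`/`SharedSparkSession` test helpers and hypothetical temp views `t1`/`t2`; it is not the exact test added here. The idea is that the hinted query must return the same rows as the unhinted equi-join.

   ```scala
   // Hypothetical regression-test sketch, not the exact UT added by this PR.
   test("SHUFFLE_REPLICATE_NL hint keeps the join condition") {
     import testImplicits._
     withTempView("t1", "t2") {
       Seq((1, 2)).toDF("a", "b").createOrReplaceTempView("t1")
       Seq((1, 2), (2, 2)).toDF("a", "b").createOrReplaceTempView("t2")
       val hinted =
         sql("SELECT /*+ SHUFFLE_REPLICATE_NL(t1) */ * FROM t1 JOIN t2 ON t1.a = t2.a")
       val plain = sql("SELECT * FROM t1 JOIN t2 ON t1.a = t2.a")
       // With the bug, `hinted` returns the full Cartesian product instead of
       // only the rows satisfying t1.a = t2.a.
       checkAnswer(hinted, plain)
     }
   }
   ```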

