peter-toth commented on issue #27309: [SPARK-30598][SQL] Detect equijoins better
URL: https://github.com/apache/spark/pull/27309#issuecomment-577202774

@maropu I don't know why pgsql doesn't allow it, but Spark SQL does, and the query makes sense. IMHO the two queries

```
SELECT * FROM t1 FULL OUTER JOIN t2 ON t1.c2 = 2 AND t2.c2 = 2
```

and

```
SELECT * FROM t1 FULL OUTER JOIN t2 ON t1.c2 = 2 AND t2.c2 = 2 AND t1.c2 = t2.c2
```

should have the same plan, using SMJ (sort-merge join) to avoid BNLJ (broadcast nested loop join), as `t1.c2 = 2 AND t2.c2 = 2` already implies `t1.c2 = t2.c2`.

@JoshRosen thanks for the detailed comment. This is indeed a niche optimization. Let me share why I raised this PR. I have another WIP PR, https://github.com/apache/spark/pull/24553, and in its last commit I started experimenting with enabling constant propagation on join conditions too. I believe it could be beneficial for some niche inner joins, e.g. `SELECT * FROM t1 JOIN t2 ON t1.c2 = 2 AND t2.c2 = 2 AND t1.c2 = t2.c2`, as it could turn a BHJ (broadcast hash join) into a BNLJ (no need for hashing) or an SMJ into a CartesianProduct (no need for sorting). But that PR has a downside as well: it optimizes away the `t1.c2 = t2.c2` expression on full outer joins. This PR seemed a good way to solve that issue and also to improve full outer join queries where the equality is not specified explicitly.
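To make the equivalence concrete, here is a toy Python model of the idea, a sketch under stated assumptions rather than Spark's Catalyst code (which is written in Scala). A join condition is modeled as a list of conjuncts, and the hypothetical helpers `EqLiteral`, `EqAttr`, `equijoin_keys` and `full_outer_strategy` derive equi-join keys from it. Adding `t1.c2 = t2.c2` when both sides are equated to the same literal does not change the condition's truth value, even when a column is NULL, so the full outer join result stays the same while the join becomes eligible for SMJ.

```python
from dataclasses import dataclass
from typing import Iterable, Set, Tuple, Union


@dataclass(frozen=True)
class EqLiteral:
    """Hypothetical conjunct `attr = literal`, e.g. t1.c2 = 2."""
    attr: str
    value: object


@dataclass(frozen=True)
class EqAttr:
    """Hypothetical conjunct `a = b` between two attributes, e.g. t1.c2 = t2.c2."""
    a: str
    b: str


Conjunct = Union[EqLiteral, EqAttr]


def equijoin_keys(conjuncts: Iterable[Conjunct], left: Set[str], right: Set[str],
                  infer_from_literals: bool = True) -> Set[Tuple[str, str]]:
    """Return (left_attr, right_attr) pairs usable as equi-join keys."""
    conjuncts = list(conjuncts)
    keys: Set[Tuple[str, str]] = set()
    # Explicit keys: an attribute of one side equated to an attribute of the other side.
    for c in conjuncts:
        if isinstance(c, EqAttr):
            if c.a in left and c.b in right:
                keys.add((c.a, c.b))
            elif c.b in left and c.a in right:
                keys.add((c.b, c.a))
    if infer_from_literals:
        # Implied keys: `l = v AND r = v` implies `l = r`. A NULL literal never
        # satisfies `=`, so it cannot imply anything and is skipped.
        lits = [c for c in conjuncts if isinstance(c, EqLiteral) and c.value is not None]
        for x in lits:
            for y in lits:
                if x.attr in left and y.attr in right and x.value == y.value:
                    keys.add((x.attr, y.attr))
    return keys


def full_outer_strategy(conjuncts, left, right, infer_from_literals=True) -> str:
    """Toy planner: a full outer join with equi-join keys can use SMJ, otherwise BNLJ."""
    keys = equijoin_keys(conjuncts, left, right, infer_from_literals)
    return "SortMergeJoin" if keys else "BroadcastNestedLoopJoin"


if __name__ == "__main__":
    left, right = {"t1.c1", "t1.c2"}, {"t2.c1", "t2.c2"}
    q1 = [EqLiteral("t1.c2", 2), EqLiteral("t2.c2", 2)]  # ON t1.c2 = 2 AND t2.c2 = 2
    q2 = q1 + [EqAttr("t1.c2", "t2.c2")]                  # ... AND t1.c2 = t2.c2
    print(full_outer_strategy(q1, left, right, infer_from_literals=False))  # BroadcastNestedLoopJoin
    print(full_outer_strategy(q1, left, right))  # SortMergeJoin
    print(full_outer_strategy(q2, left, right))  # SortMergeJoin
```

The toy planner ignores what a real planner also weighs (other join strategies, statistics, hints); it only shows why inferring the implied key lets the first query be planned like the second one.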
