peter-toth commented on issue #27309: [SPARK-30598][SQL] Detect equijoins better
URL: https://github.com/apache/spark/pull/27309#issuecomment-577202774
 
 
   @maropu I don't know why pgsql doesn't allow it, but Spark SQL does, and the 
query makes sense. IMHO the two queries:
   ```SELECT * FROM t1 FULL OUTER JOIN t2 ON t1.c2 = 2 AND t2.c2 = 2``` 
   and 
   ```SELECT * FROM t1 FULL OUTER JOIN t2 ON t1.c2 = 2 AND t2.c2 = 2 AND t1.c2 = t2.c2``` 
   should have the same plan, using SMJ to avoid BNLJ.
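
   As a rough illustration of why the two join conditions are interchangeable, here is a small Python sketch (not Spark code; `sql_eq`, `sql_and` and the sample domain are hypothetical helpers made up for this example). It evaluates both conditions under SQL three-valued logic, with `None` standing in for NULL, and looks for any pair of values where they disagree:

```python
from itertools import product

def sql_eq(a, b):
    """SQL '=': UNKNOWN (None) if either operand is NULL (None)."""
    if a is None or b is None:
        return None
    return a == b

def sql_and(*vals):
    """SQL AND under three-valued logic: FALSE dominates, then UNKNOWN."""
    if any(v is False for v in vals):
        return False
    if any(v is None for v in vals):
        return None
    return True

def cond_without_equality(x, y):
    # t1.c2 = 2 AND t2.c2 = 2
    return sql_and(sql_eq(x, 2), sql_eq(y, 2))

def cond_with_equality(x, y):
    # t1.c2 = 2 AND t2.c2 = 2 AND t1.c2 = t2.c2
    return sql_and(sql_eq(x, 2), sql_eq(y, 2), sql_eq(x, y))

# Small sample domain including NULL; x plays t1.c2, y plays t2.c2.
domain = [None, 1, 2, 3]
mismatches = [
    (x, y)
    for x, y in product(domain, repeat=2)
    if cond_without_equality(x, y) != cond_with_equality(x, y)
]
print(mismatches)  # []
```

   Since the two conditions agree on every pair, including the NULL cases, the extra `t1.c2 = t2.c2` conjunct changes nothing about which rows match; it only makes the equi-join key explicit.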
   
   @JoshRosen thanks for the detailed comment. This is a niche optimization 
indeed. Let me share why I raised this PR. I have another WIP PR here: 
https://github.com/apache/spark/pull/24553 and in the last commit I started 
playing with enabling constant propagation on join conditions too. I believe it 
could be beneficial on some niche inner joins, e.g. `SELECT * FROM t1 JOIN t2 ON 
t1.c2 = 2 AND t2.c2 = 2 AND t1.c2 = t2.c2`, as it could turn BHJ into BNLJ (no 
need for hashing) or SMJ into CartesianProduct (no need for sorting).
   But that PR has a downside as well: it also optimizes away the `t1.c2 = t2.c2` 
expression on full outer joins. This PR seemed like a good way to solve that 
issue and also to improve full outer join queries where the equality is not 
specified explicitly.
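
   To make the idea concrete, here is a minimal Python sketch of deriving equi-join keys from constant comparisons. It is only an assumption about the general shape of such a rule, not the code in this PR: the function name `infer_equi_join_keys` and the `(column, literal)` representation of a `column = literal` conjunct are invented for the example.

```python
from collections import defaultdict

def infer_equi_join_keys(conjuncts, left_cols, right_cols):
    """Pair up left and right columns that are compared to the same literal.

    conjuncts  -- list of (column, literal) pairs, each meaning `column = literal`
    left_cols  -- set of column names belonging to the left side of the join
    right_cols -- set of column names belonging to the right side of the join
    """
    # literal -> (left columns equal to it, right columns equal to it)
    by_literal = defaultdict(lambda: ([], []))
    for col, lit in conjuncts:
        if col in left_cols:
            by_literal[lit][0].append(col)
        elif col in right_cols:
            by_literal[lit][1].append(col)

    keys = []
    for lefts, rights in by_literal.values():
        # Each left/right pair bound to the same literal implies left = right.
        keys.extend((l, r) for l in lefts for r in rights)
    return keys

left = {"t1.c1", "t1.c2"}
right = {"t2.c1", "t2.c2"}
keys = infer_equi_join_keys([("t1.c2", 2), ("t2.c2", 2)], left, right)
print(keys)  # [('t1.c2', 't2.c2')]
```

   In this sketch the inferred pair would only be used as the join key; the original `t1.c2 = 2` and `t2.c2 = 2` predicates would still have to stay in the join condition, since on a full outer join they decide which rows match.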
