AngersZhuuuu commented on issue #25854: [SPARK-29145][SQL] Spark SQL cannot handle "NOT IN" condition when using "JOIN"
URL: https://github.com/apache/spark/pull/25854#issuecomment-545366245
 
 
   > @AngersZhuuuu I just quickly checked the plan for the following query:
   > 
   > **query**
   > 
   > ```
   > SELECT s1.id FROM s1 JOIN s2 ON s1.id = s2.id AND s1.id NOT IN (SELECT id FROM s3)
   > ```
   > 
   > **plan**
   > 
   > ```
   > Project [id#244]
   > +- Join Inner, (id#244 = id#250)
   >    :- Project [value#241 AS id#244]
   >    :  +- Join LeftAnti, ((value#241 = id#256) OR isnull((value#241 = id#256)))
   >    :     :- LocalRelation [value#241]
   >    :     +- Project [value#253 AS id#256]
   >    :        +- LocalRelation [value#253]
   >    +- Project [value#247 AS id#250]
   >       +- Join LeftAnti, ((value#247 = id#256) OR isnull((value#247 = id#256)))
   >          :- LocalRelation [value#247]
   >          +- Project [value#253 AS id#256]
   >             +- LocalRelation [value#253]
   > ```
   > 
   > That's the reason I asked you to test the outer joins. Let's please make sure that, in the case of outer joins, we preserve the full join condition in the main join. Let's add a few tests to confirm this, please.
   
   I checked the whole process: what you show is the optimized plan. In the analyzed plan the join condition is still in the main join; it only gets pushed down during optimization.
