AngersZhuuuu commented on a change in pull request #25854: 
[SPARK-29145][SQL]Spark SQL cannot handle "NOT IN" condition when using "JOIN"  
     
URL: https://github.com/apache/spark/pull/25854#discussion_r335293254
 
 

 ##########
 File path: sql/core/src/test/scala/org/apache/spark/sql/SubquerySuite.scala
 ##########
 @@ -204,6 +204,30 @@ class SubquerySuite extends QueryTest with 
SharedSparkSession {
     }
   }
 
+  test("SPARK-29145: JOIN Condition use QueryList") {
+    withTempView("s1", "s2", "s3") {
+      Seq(1, 3, 5, 7, 9).toDF("id").createOrReplaceTempView("s1")
+      Seq(1, 3, 4, 6, 9).toDF("id").createOrReplaceTempView("s2")
+      Seq(3, 4, 6, 9).toDF("id").createOrReplaceTempView("s3")
+
+      checkAnswer(
+        sql("SELECT s1.id from s1 JOIN s2 ON s1.id = s2.id and s1.id IN 
(select 9)"),
 
 Review comment:
   > can we put correlated subquery in join condition?
   
   Subquery is in join condition, LogicalPlan as below:
   
   ```
   == Parsed Logical Plan ==
   'Project ['s1.id]
   +- 'Join Inner, (('s1.id = 's2.id) AND 's1.id IN (list#258 []))
      :  +- 'Project [unresolvedalias(9, None)]
      :     +- OneRowRelation
      :- 'UnresolvedRelation [s1]
      +- 'UnresolvedRelation [s2]
   
   == Analyzed Logical Plan ==
   id: int
   Project [id#244]
   +- Join Inner, ((id#244 = id#250) AND id#244 IN (list#258 []))
      :  +- Project [9 AS 9#259]
      :     +- OneRowRelation
      :- SubqueryAlias `s1`
      :  +- Project [value#241 AS id#244]
      :     +- LocalRelation [value#241]
      +- SubqueryAlias `s2`
         +- Project [value#247 AS id#250]
            +- LocalRelation [value#247]
   
   == Optimized Logical Plan ==
   Project [id#244]
   +- Join Inner, (id#244 = id#250)
      :- Project [value#241 AS id#244]
      :  +- Join LeftSemi, (value#241 = 9#259)
      :     :- LocalRelation [value#241]
      :     +- Project [9 AS 9#259]
      :        +- OneRowRelation
      +- Project [value#247 AS id#250]
         +- Join LeftSemi, (value#247 = 9#259)
            :- LocalRelation [value#247]
            +- Project [9 AS 9#259]
               +- OneRowRelation
   
   == Physical Plan ==
   *(4) Project [id#244]
   +- *(4) BroadcastHashJoin [id#244], [id#250], Inner, BuildRight
      :- *(4) Project [value#241 AS id#244]
      :  +- *(4) BroadcastHashJoin [value#241], [9#259], LeftSemi, BuildRight
      :     :- *(4) LocalTableScan [value#241]
      :     +- BroadcastExchange HashedRelationBroadcastMode(List(cast(input[0, 
int, false] as bigint))), [id=#145]
      :        +- *(1) Project [9 AS 9#259]
      :           +- *(1) Scan OneRowRelation[]
      +- BroadcastExchange HashedRelationBroadcastMode(List(cast(input[0, int, 
false] as bigint))), [id=#173]
         +- *(3) Project [value#247 AS id#250]
            +- *(3) BroadcastHashJoin [value#247], [9#259], LeftSemi, BuildRight
               :- *(3) LocalTableScan [value#247]
               +- ReusedExchange [9#259], BroadcastExchange 
HashedRelationBroadcastMode(List(cast(input[0, int, false] as bigint))), 
[id=#145]
   
   ```

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to