AngersZhuuuu commented on a change in pull request #25854:
[SPARK-29145][SQL]Spark SQL cannot handle "NOT IN" condition when using "JOIN"
URL: https://github.com/apache/spark/pull/25854#discussion_r335319542
##########
File path: sql/core/src/test/scala/org/apache/spark/sql/SubquerySuite.scala
##########
@@ -204,6 +204,30 @@ class SubquerySuite extends QueryTest with SharedSparkSession {
     }
   }
+  test("SPARK-29145: JOIN Condition use QueryList") {
+    withTempView("s1", "s2", "s3") {
+      Seq(1, 3, 5, 7, 9).toDF("id").createOrReplaceTempView("s1")
+      Seq(1, 3, 4, 6, 9).toDF("id").createOrReplaceTempView("s2")
+      Seq(3, 4, 6, 9).toDF("id").createOrReplaceTempView("s3")
+
+      checkAnswer(
+        sql("SELECT s1.id from s1 JOIN s2 ON s1.id = s2.id and s1.id IN (select 9)"),
+        Row(9) :: Nil)
+
+      checkAnswer(
+        sql("SELECT s1.id from s1 JOIN s2 ON s1.id = s2.id and s1.id NOT IN (select 9)"),
+        Row(1) :: Row(3) :: Nil)
+
+      checkAnswer(
+        sql("SELECT s1.id from s1 JOIN s2 ON s1.id = s2.id and s1.id IN (select id from s3)"),
Review comment:
> for example, do we support
> `SELECT s1.id from s1 JOIN s2 ON s1.id = s2.id and s1.id IN (select id from s3 where s3.id = s2.id)`

No, we can't, since that breaks the strategy's idempotence. Writing SQL like this doesn't seem reasonable anyway...
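
As a rough sanity check of what these queries are expected to return, here is a minimal sketch that models the join conditions with plain Scala collections instead of Spark. The object and value names are made up for illustration, and this only mirrors the query semantics on this NULL-free data; it says nothing about how the planner handles the subquery or about the idempotence issue.

```scala
// Hypothetical stand-alone model of the queries above; no Spark involved.
object JoinSubquerySketch {
  // Plain collections standing in for the temp views s1, s2 and s3.
  val s1 = Seq(1, 3, 5, 7, 9)
  val s2 = Seq(1, 3, 4, 6, 9)
  val s3 = Seq(3, 4, 6, 9)

  // Inner join on s1.id = s2.id; keep both sides so a predicate can use either.
  val joined: Seq[(Int, Int)] = for (a <- s1; b <- s2 if a == b) yield (a, b)

  // ... and s1.id IN (select 9)
  val inNine: Seq[Int] = joined.collect { case (a, _) if Seq(9).contains(a) => a }

  // ... and s1.id NOT IN (select 9); with no NULLs this is plain negation.
  val notInNine: Seq[Int] = joined.collect { case (a, _) if !Seq(9).contains(a) => a }

  // ... and s1.id IN (select id from s3)
  val inS3: Seq[Int] = joined.collect { case (a, _) if s3.contains(a) => a }

  // Correlated form: ... and s1.id IN (select id from s3 where s3.id = s2.id)
  val inS3Correlated: Seq[Int] =
    joined.collect { case (a, b) if s3.filter(_ == b).contains(a) => a }

  def main(args: Array[String]): Unit = {
    println(inNine)         // List(9)
    println(notInNine)      // List(1, 3)
    println(inS3)           // List(3, 9)
    println(inS3Correlated) // List(3, 9)
  }
}
```

On this data the correlated form returns the same rows as the uncorrelated `IN (select id from s3)`, because the join already forces `s1.id = s2.id`.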