cloud-fan commented on a change in pull request #25854: [SPARK-29145][SQL]Spark
SQL cannot handle "NOT IN" condition when using "JOIN"
URL: https://github.com/apache/spark/pull/25854#discussion_r335333731
##########
File path: sql/core/src/test/scala/org/apache/spark/sql/SubquerySuite.scala
##########
@@ -204,6 +204,30 @@ class SubquerySuite extends QueryTest with SharedSparkSession {
     }
   }
+  test("SPARK-29145: JOIN Condition use QueryList") {
+    withTempView("s1", "s2", "s3") {
+      Seq(1, 3, 5, 7, 9).toDF("id").createOrReplaceTempView("s1")
+      Seq(1, 3, 4, 6, 9).toDF("id").createOrReplaceTempView("s2")
+      Seq(3, 4, 6, 9).toDF("id").createOrReplaceTempView("s3")
+
+      checkAnswer(
+        sql("SELECT s1.id from s1 JOIN s2 ON s1.id = s2.id and s1.id IN (select 9)"),
+        Row(9) :: Nil)
+
+      checkAnswer(
+        sql("SELECT s1.id from s1 JOIN s2 ON s1.id = s2.id and s1.id NOT IN (select 9)"),
+        Row(1) :: Row(3) :: Nil)
+
+      checkAnswer(
+        sql("SELECT s1.id from s1 JOIN s2 ON s1.id = s2.id and s1.id IN (select id from s3)"),
Review comment:
Also cc @dilipbiswal.
I checked with PostgreSQL, and it supports this kind of query (an `IN` subquery in a join condition). We need to update
`RewriteCorrelatedScalarSubquery` to support it in Spark.
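
As a sanity check on the rows the test expects, here is a minimal sketch that models the three queries with plain Scala collections instead of Spark. The object name `JoinInSubquerySketch` and the helper `joinOnId` are hypothetical and not part of the PR. The model ignores SQL NULL semantics, which is safe here only because the test data contains no NULLs. The expected rows for the third query are cut off in the quoted diff, so the third result below is simply what this model computes, not a value taken from the test.

```scala
// Plain-collection model of the queries in the test; no Spark involved.
// All names here are illustrative assumptions, not code from the PR.
object JoinInSubquerySketch {
  val s1: Seq[Int] = Seq(1, 3, 5, 7, 9)
  val s2: Seq[Int] = Seq(1, 3, 4, 6, 9)
  val s3: Seq[Int] = Seq(3, 4, 6, 9)

  // Inner join of s1 and s2 on id, keeping s1.id values that also
  // satisfy the extra predicate from the ON clause.
  def joinOnId(extra: Int => Boolean): Seq[Int] =
    for {
      a <- s1
      b <- s2
      if a == b && extra(a)
    } yield a

  def main(args: Array[String]): Unit = {
    // ... AND s1.id IN (select 9)
    println(joinOnId(id => Seq(9).contains(id)))   // List(9)
    // ... AND s1.id NOT IN (select 9)
    println(joinOnId(id => !Seq(9).contains(id)))  // List(1, 3)
    // ... AND s1.id IN (select id from s3)
    println(joinOnId(id => s3.contains(id)))       // List(3, 9)
  }
}
```

The first two results match the `Row(9)` and `Row(1) :: Row(3)` expectations in the diff above.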