AngersZhuuuu commented on a change in pull request #26437: [SPARK-29800][SQL] Rewrite non-correlated subquery use ScalaSubquery to optimize perf
URL: https://github.com/apache/spark/pull/26437#discussion_r363146645
##########
File path:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/finishAnalysis.scala
##########
@@ -52,6 +52,21 @@ object ReplaceExpressions extends Rule[LogicalPlan] {
}
}
+/**
+ * Rewrite a non-correlated EXISTS subquery to use a ScalarSubquery:
+ * WHERE EXISTS (SELECT A FROM TABLE B WHERE COL1 > 10)
+ * is rewritten to
+ * WHERE (SELECT 1 FROM (SELECT A FROM TABLE B WHERE COL1 > 10) LIMIT 1) IS NOT NULL
+ */
+object RewriteNonCorrelatedExists extends Rule[LogicalPlan] {
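As an informal sanity check of the rewrite described in the doc comment: a scalar subquery that returns no rows evaluates to NULL, so `(SELECT 1 FROM (...) LIMIT 1) IS NOT NULL` is true exactly when the inner query returns at least one row, which is what `EXISTS` tests. The sketch below is a toy model, assuming rows are a plain Scala `Seq` and SQL NULL is `None`; the names `ToyRow` and `ExistsRewriteModel` are hypothetical and this is not Spark's implementation.

```scala
// Hypothetical toy model (not Spark code): a table is a Seq of rows,
// and SQL NULL is modelled as None.
final case class ToyRow(a: Int, col1: Int)

object ExistsRewriteModel {
  // WHERE EXISTS (SELECT a FROM t WHERE col1 > 10)
  def existsPredicate(table: Seq[ToyRow]): Boolean =
    table.filter(_.col1 > 10).map(_.a).nonEmpty

  // WHERE (SELECT 1 FROM (SELECT a FROM t WHERE col1 > 10) LIMIT 1) IS NOT NULL
  def rewrittenPredicate(table: Seq[ToyRow]): Boolean = {
    val inner = table.filter(_.col1 > 10).map(_.a)
    // Project every inner row to the literal 1, then keep at most one row (LIMIT 1).
    val limited = inner.map(_ => 1).take(1)
    // A scalar subquery yields its single value, or NULL (None) when it returns no rows.
    val scalar: Option[Int] = limited.headOption
    // IS NOT NULL
    scalar.isDefined
  }
}
```

The `LIMIT 1` matters for the real rule as well: a scalar subquery must return at most one row, and limiting the inner query can also let execution stop after the first matching row.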
Review comment:
> Shall we add a test for this rule?
I tried the following test case:
```scala
test("Rewritten uncorrelated exists subquery to use ScalarSubquery") {
  val relation = LocalRelation('a.int)
  val relExistSubquery = LocalRelation('x.int, 'y.int, 'z.int).where('x > 10)
  val query = relation.where(Exists(relExistSubquery)).select('a)
  val optimized = Optimize.execute(query.analyze)
  val correctAnswer = relation
    // Alias(...)() assigns a fresh exprId, so this alias does not share
    // an ID with the one created by the rule.
    .where(IsNotNull(ScalarSubquery(Limit(Literal(1),
      Project(Seq(Alias(Literal(1), "col")()), relExistSubquery)))))
    .analyze
  comparePlans(optimized, correctAnswer)
}
```
It fails with:
```
[info] RewriteSubquerySuite:
[info] - Rewritten uncorrelated exists subquery to use ScalarSubquery *** FAILED *** (852 milliseconds)
[info]   == FAIL: Plans do not match ===
[info]    Filter isnotnull(scalar-subquery#0 [])                      Filter isnotnull(scalar-subquery#0 [])
[info]    :  +- GlobalLimit 1                                         :  +- GlobalLimit 1
[info]    :     +- LocalLimit 1                                       :     +- LocalLimit 1
[info]   !:        +- Project [1 AS col#5]                            :        +- Project [1 AS col#6]
[info]    :           +- Filter (x#1 > 10)                            :           +- Filter (x#1 > 10)
[info]    :              +- LocalRelation <empty>, [x#1, y#2, z#3]    :              +- LocalRelation <empty>, [x#1, y#2, z#3]
[info]    +- LocalRelation <empty>, [a#0]                             +- LocalRelation <empty>, [a#0] (PlanTest.scala:147)
[info]   org.scalatest.exceptions.TestFailedException:
[info]   at org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:530)
```
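The plans differ only in the expression ID of the `Alias` that `RewriteNonCorrelatedExists` creates (`col#5` vs. `col#6`): the rule and the expected plan each build their own `Alias(Literal(1), "col")()`, and each of those calls draws a fresh ID.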
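Both sides of the failing comparison already show `scalar-subquery#0`, yet the alias IDs inside the subquery still differ. This suggests that the ID normalization applied before comparing does not reach into the plan held by the `ScalarSubquery`. One way to make such a comparison stable is to normalize IDs recursively, including inside subquery plans, before comparing. The sketch below illustrates that idea on a small toy plan tree. Every type and function in it (`ToyExprIds`, `ToyPlan`, `ToyPlanNormalizer.normalize`, ...) is hypothetical and is not Catalyst's API.

```scala
import java.util.concurrent.atomic.AtomicLong

// Hypothetical toy model of plan comparison; none of these names are Spark APIs.
object ToyExprIds {
  private val counter = new AtomicLong(0L)
  // Like Alias(...)() in Catalyst, every new alias draws a fresh ID.
  def fresh(): Long = counter.incrementAndGet()
}

sealed trait ToyExpr
final case class ToyGreaterThan(column: String, value: Int) extends ToyExpr
// Holds a whole subquery plan, similar in spirit to IsNotNull(ScalarSubquery(...)).
final case class ToyIsNotNullSubquery(plan: ToyPlan) extends ToyExpr

sealed trait ToyPlan
final case class ToyRelation(columns: Seq[String]) extends ToyPlan
final case class ToyFilter(condition: ToyExpr, child: ToyPlan) extends ToyPlan
final case class ToyLimit(n: Int, child: ToyPlan) extends ToyPlan
final case class ToyProjectAlias(name: String, exprId: Long, child: ToyPlan) extends ToyPlan

object ToyPlanNormalizer {
  // Toy version of the rewrite: EXISTS(sub) => IS NOT NULL(LIMIT 1 (SELECT 1 AS col FROM sub)).
  def rewriteExists(sub: ToyPlan): ToyExpr =
    ToyIsNotNullSubquery(ToyLimit(1, ToyProjectAlias("col", ToyExprIds.fresh(), sub)))

  // Reset every alias ID to 0, recursing into subquery plans held by expressions.
  def normalize(plan: ToyPlan): ToyPlan = plan match {
    case r: ToyRelation                  => r
    case ToyFilter(cond, child)          => ToyFilter(normalizeExpr(cond), normalize(child))
    case ToyLimit(n, child)              => ToyLimit(n, normalize(child))
    case ToyProjectAlias(name, _, child) => ToyProjectAlias(name, 0L, normalize(child))
  }

  def normalizeExpr(expr: ToyExpr): ToyExpr = expr match {
    case g: ToyGreaterThan         => g
    case ToyIsNotNullSubquery(sub) => ToyIsNotNullSubquery(normalize(sub))
  }
}
```

Normalizing only the outer plan would leave the two different alias IDs inside the subquery, mirroring the `col#5`/`col#6` mismatch in the log. The recursion through `normalizeExpr` is what makes the two toy plans compare equal.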
Any advice on how to write this test, or where to add it, so that it avoids this problem?
@cloud-fan @viirya