dilipbiswal commented on a change in pull request #23211: [SPARK-19712][SQL]
Move PullupCorrelatedPredicates and RewritePredicateSubquery after
OptimizeSubqueries
URL: https://github.com/apache/spark/pull/23211#discussion_r240476585
##########
File path:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
##########
@@ -649,13 +664,16 @@ object CollapseProject extends Rule[LogicalPlan] {
def apply(plan: LogicalPlan): LogicalPlan = plan transformUp {
case p1 @ Project(_, p2: Project) =>
- if (haveCommonNonDeterministicOutput(p1.projectList, p2.projectList)) {
+ if (haveCommonNonDeterministicOutput(p1.projectList, p2.projectList) ||
+ ScalarSubquery.hasScalarSubquery(p1.projectList) ||
+ ScalarSubquery.hasScalarSubquery(p2.projectList)) {
Review comment:
@cloud-fan One failing test that I needed to address with this change is in
SubquerySuite.
```
select a, (select sum(b) from l l2 where l2.a <=> l1.a) sum_b from l l1
```
The main reason is that the Filter operators with outer references used to be
pulled up before the OptimizeSubqueries rule ran. So by the time the other
optimization rules (such as PushDownPredicate) kicked in, they did not see any
outer references. With the change in this PR, the outer references are still
present at that point. So another way to handle this is to change the
PushDownPredicate rule so that filter conditions with outer references are not
moved down. Maybe that's a better way to handle it, keeping CollapseProject as
it is.
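
To make the alternative concrete, here is a rough sketch of the kind of guard I
have in mind. It uses a small toy model rather than the actual Catalyst
classes, so every name in it (`Expr`, `OuterRef`, `PushdownSketch`,
`pushFilterThroughProject`) is hypothetical and only illustrates the idea: a
filter is pushed below a project only when its condition has no outer
reference.

```scala
// Toy model only: these are NOT Spark's Catalyst classes. All names are hypothetical.
sealed trait Expr { def children: Seq[Expr] = Nil }
case class Attr(name: String) extends Expr
// Marks a reference to a column of the outer query.
case class OuterRef(attr: Attr) extends Expr
// Stands in for the null-safe equality (<=>) used in the failing query.
case class EqNullSafe(left: Expr, right: Expr) extends Expr {
  override def children: Seq[Expr] = Seq(left, right)
}

sealed trait Plan
case class Relation(name: String) extends Plan
case class Project(columns: Seq[String], child: Plan) extends Plan
case class Filter(condition: Expr, child: Plan) extends Plan

object PushdownSketch {
  // True if the expression tree contains an outer reference anywhere.
  def hasOuterReference(e: Expr): Boolean = e match {
    case _: OuterRef => true
    case other       => other.children.exists(hasOuterReference)
  }

  // Push a Filter below a Project, unless the condition is correlated.
  // (A real rule would also have to substitute aliases; the toy ignores that.)
  def pushFilterThroughProject(plan: Plan): Plan = plan match {
    case Filter(cond, Project(cols, child)) if !hasOuterReference(cond) =>
      Project(cols, Filter(cond, child))
    case other => other
  }
}
```

With this guard, a filter on `l2.a <=> outer(l1.a)` stays where it is, while an
uncorrelated filter is still pushed down as before.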