dilipbiswal commented on a change in pull request #23211: [SPARK-19712][SQL] 
Move PullupCorrelatedPredicates and RewritePredicateSubquery after 
OptimizeSubqueries
URL: https://github.com/apache/spark/pull/23211#discussion_r240476585
 
 

 ##########
 File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
 ##########
 @@ -649,13 +664,16 @@ object CollapseProject extends Rule[LogicalPlan] {
 
   def apply(plan: LogicalPlan): LogicalPlan = plan transformUp {
     case p1 @ Project(_, p2: Project) =>
-      if (haveCommonNonDeterministicOutput(p1.projectList, p2.projectList)) {
+      if (haveCommonNonDeterministicOutput(p1.projectList, p2.projectList) ||
+        ScalarSubquery.hasScalarSubquery(p1.projectList) ||
+        ScalarSubquery.hasScalarSubquery(p2.projectList)) {
 
 Review comment:
   @cloud-fan One failing test that I needed to address with this change is in 
SubquerySuite:
   ```
   select a, (select sum(b) from l l2 where l2.a <=> l1.a) sum_b from l l1
   ```
   The main reason is that, before this change, Filter operators with outer 
references were pulled up before the OptimizeSubqueries rule ran. So by the 
time other optimization rules (such as PushDownPredicate) kicked in, they did 
not see any outer references. With the change in this PR, the outer references 
are still present at that point. Another way to handle this is to change the 
PushDownPredicate rule so that filter clauses with outer references are not 
moved down. Maybe that is a better way to handle it, and it would keep 
CollapseProject as it is.
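
   A minimal sketch of the idea of not moving filter clauses with outer 
references, written against a toy expression model rather than the real 
Catalyst classes. Every type and function name below is a hypothetical 
stand-in chosen for illustration; the actual rule would need to work on 
Catalyst expressions and plans.
   ```scala
   // Toy model only: hypothetical stand-ins, not Catalyst classes.
   sealed trait Expr
   case class Attr(name: String) extends Expr
   case class OuterRef(name: String) extends Expr   // reference to the outer query
   case class Lit(value: Int) extends Expr
   case class EqNullSafe(left: Expr, right: Expr) extends Expr
   case class GreaterThan(left: Expr, right: Expr) extends Expr
   case class And(left: Expr, right: Expr) extends Expr

   object OuterRefGuard {
     // True if any node in the expression tree is an outer reference.
     def containsOuter(e: Expr): Boolean = e match {
       case OuterRef(_)       => true
       case EqNullSafe(l, r)  => containsOuter(l) || containsOuter(r)
       case GreaterThan(l, r) => containsOuter(l) || containsOuter(r)
       case And(l, r)         => containsOuter(l) || containsOuter(r)
       case _                 => false
     }

     // Flatten a conjunction into its individual conjuncts.
     def splitConjuncts(e: Expr): Seq[Expr] = e match {
       case And(l, r) => splitConjuncts(l) ++ splitConjuncts(r)
       case other     => Seq(other)
     }

     // Returns (pushable, mustStay): conjuncts that mention an outer
     // reference stay where they are; only the rest may be pushed down.
     def partitionForPushdown(cond: Expr): (Seq[Expr], Seq[Expr]) =
       splitConjuncts(cond).partition(c => !containsOuter(c))
   }
   ```
   For a condition such as `l2.a <=> outer(l1.a) AND l2.b > 0`, this 
partition would leave the null-safe equality in place and mark only 
`l2.b > 0` as a pushdown candidate.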
