ulysses-you commented on a change in pull request #32742:
URL: https://github.com/apache/spark/pull/32742#discussion_r644593517



##########
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala
##########
@@ -603,6 +603,19 @@ case class AdaptiveSparkPlanExec(
     (newPlan, optimized)
   }
 
+  /**
+   * Clean up logical plan stats before re-optimize
+   */
+  private def cleanupStats(logicalPlan: LogicalPlan): Unit = {
+    logicalPlan.invalidateStatsCache()
+    // We must invalidate ineffective rules before re-optimize since AQE Optimizer may introduce
+    // LocalRelation that can affect result.

Review comment:
       Let's say we have a complex plan:
   * two joins with two exchanges
   * the right side of join2 is empty
   
   The logical plan looks like:
   ```
   Join2
   :- Join1
   :  :  +- Relation parquet
   :  +- LogicalQueryStage Project, BroadcastQueryStage 0
   +- LogicalQueryStage Relation parquet, ShuffleQueryStage 1
   ```
   
   * First round: QueryStage 0 is materialized and QueryStage 1 is not. The AQE Optimizer can only try to optimize join1, but the rule has no effect there, so at this moment we mark the rule as ineffective.
   * Second round: QueryStage 1 is materialized. The AQE Optimizer could now optimize join2, but the rule has already been marked as ineffective, so it is skipped. As a result we can't optimize it.
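
To make the two rounds concrete, here is a minimal toy model of the problem. It is only a sketch and does not use Spark classes: `Plan`, `Stage`, `Join`, `propagateEmpty`, `optimize` and `cleanup` are hypothetical stand-ins for the logical plan, the query stages, the AQE optimizer rule and the cleanup in this change, and the "ineffective" marker is modelled as a plain mutable set on the join node.

```scala
import scala.collection.mutable

object IneffectiveRuleDemo {
  // Toy stand-ins for plan nodes (assumption: not the real Catalyst classes).
  sealed trait Plan
  final case class Relation(name: String) extends Plan
  case object Empty extends Plan
  // A query stage whose row count is only usable once it is materialized.
  final class Stage(val id: Int, val rows: Long) extends Plan {
    var materialized: Boolean = false
  }
  final case class Join(left: Plan, right: Plan) extends Plan {
    // Names of rules that previously had no effect on this subtree.
    val ineffectiveRules: mutable.Set[String] = mutable.Set.empty
  }

  private def isEmptyStage(p: Plan): Boolean = p match {
    case s: Stage => s.materialized && s.rows == 0L
    case _        => false
  }

  // Hypothetical rule: a join with a known-empty side becomes Empty.
  def propagateEmpty(plan: Plan): Plan = plan match {
    case Join(l, r) =>
      val nl = propagateEmpty(l)
      val nr = propagateEmpty(r)
      if (nl == Empty || nr == Empty || isEmptyStage(nl) || isEmptyStage(nr)) Empty
      else Join(nl, nr)
    case other => other
  }

  // Runs the rule unless it was marked ineffective; marks it when nothing changed.
  def optimize(plan: Plan, ruleName: String = "propagateEmpty"): Plan = plan match {
    case j: Join if j.ineffectiveRules(ruleName) => j // skipped because of the stale marker
    case j: Join =>
      val result = propagateEmpty(j)
      if (result == j) { j.ineffectiveRules += ruleName; j } else result
    case other => other
  }

  // Drops the markers on every join so the next round re-evaluates the rule.
  def cleanup(plan: Plan): Unit = plan match {
    case j: Join => j.ineffectiveRules.clear(); cleanup(j.left); cleanup(j.right)
    case _       => ()
  }

  // Returns (plan after round 1, after round 2 with stale marker, after cleanup).
  def run(): (Plan, Plan, Plan) = {
    val stage0 = new Stage(0, rows = 10L)
    val stage1 = new Stage(1, rows = 0L) // join2's right side is empty
    val join2  = Join(Join(Relation("parquet"), stage0), stage1)

    stage0.materialized = true           // first round
    val round1 = optimize(join2)
    stage1.materialized = true           // second round
    val round2 = optimize(join2)
    cleanup(join2)                       // invalidate markers, then re-optimize
    val afterCleanup = optimize(join2)
    (round1, round2, afterCleanup)
  }
}
```

In this sketch the second round returns the join unchanged even though stage 1 is known to be empty, and only the run after `cleanup` collapses the plan to `Empty`. That is the behavior the cleanup in this change is meant to avoid.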
   
   






