sigmod commented on a change in pull request #32742:
URL: https://github.com/apache/spark/pull/32742#discussion_r647227099
##########
File path:
sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala
##########
@@ -603,6 +603,19 @@ case class AdaptiveSparkPlanExec(
(newPlan, optimized)
}
+ /**
+ * Clean up logical plan stats before re-optimization.
+ */
+ private def cleanupStats(logicalPlan: LogicalPlan): Unit = {
+ logicalPlan.invalidateStatsCache()
+ // We must invalidate ineffective-rule markers before re-optimization, since the AQE
+ // Optimizer may introduce a LocalRelation that can affect the result.
Review comment:
>> some AQE Optimizer rules only match the materialized logical plan

IIUC, the key issue is that `LogicalQueryStage.sparkPlan` is not immutable. In general, Catalyst plans are immutable and copy-on-modification, so `LogicalQueryStage` sounds like a violation. There are a few write-once `var` fields elsewhere, but they are essentially lazy values or caches.

Therefore, I think we should not use ruleId pruning for AQE at this point, so as not to cause weird issues. We can still do pattern-bits pruning by passing the pattern lambda, but not the ruleId, to the transform functions.
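To illustrate the distinction, here is a toy model of pattern-bits pruning (this is not Spark's actual `TreeNode` API; `Node`, `transformWithPatternPruning`, and the bitmask layout are hypothetical stand-ins). Pattern-bits pruning only skips subtrees whose cached bits show the pattern cannot occur below them, so it stays safe even if a node's contents can change; ruleId pruning would additionally cache a per-node "this rule was ineffective here" marker, which can go stale when a node like `LogicalQueryStage` mutates in place.

```scala
// Toy sketch of Catalyst-style pattern-bits pruning (illustrative only).
object PruningSketch {
  // Each node caches a bitmask of the "tree patterns" present in its subtree,
  // analogous to TreePatternBits in Catalyst.
  case class Node(name: String, pattern: Int, children: Seq[Node] = Nil) {
    lazy val subtreeBits: Int = children.foldLeft(pattern)(_ | _.subtreeBits)
  }

  // Pattern-bits pruning: skip whole subtrees that cannot contain the wanted
  // pattern. Nothing is cached across optimizer runs, so this stays correct
  // even for a node whose contents mutate (unlike ruleId pruning, which would
  // persist a "rule ineffective" marker on the node itself).
  def transformWithPatternPruning(n: Node, wanted: Int)(rule: Node => Node): Node = {
    if ((n.subtreeBits & wanted) == 0) n // prune: pattern absent in this subtree
    else {
      val withNewChildren =
        n.copy(children = n.children.map(transformWithPatternPruning(_, wanted)(rule)))
      rule(withNewChildren)
    }
  }

  def main(args: Array[String]): Unit = {
    val SCAN = 1; val JOIN = 2
    val tree = Node("join", JOIN, Seq(Node("scanA", SCAN), Node("scanB", SCAN)))
    // Rewrite every node that carries the SCAN pattern; the JOIN node is
    // visited (its subtree contains SCAN) but left unchanged by the rule.
    val out = transformWithPatternPruning(tree, SCAN) { n =>
      if ((n.pattern & SCAN) != 0) n.copy(name = n.name + "_rewritten") else n
    }
    println(out)
  }
}
```

In this model, passing only the pattern condition (the "lambda") buys the traversal skip, while omitting the ruleId avoids writing any per-node state that a mutable `sparkPlan` could invalidate.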