cloud-fan commented on code in PR #38558:
URL: https://github.com/apache/spark/pull/38558#discussion_r1018732458
##########
sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryTableScanExec.scala:
##########
@@ -111,10 +112,15 @@ case class InMemoryTableScanExec(
   override def output: Seq[Attribute] = attributes

+  private def cachedPlan = relation.cachedPlan match {
+    case adaptive: AdaptiveSparkPlanExec if adaptive.isFinalized =>
+      adaptive.executedPlan
+    case other => other
+  }
Review Comment:
Another idea is to materialize the AQE plan eagerly so that even the first
cache access can be optimized. However, that would require triggering query
execution during query planning, which is a bit risky.
A good practice is to ask users to populate the cache eagerly, e.g. run a
`df.count` right after `df` is cached. Then they won't observe
inconsistencies. In any case, I think this PR is a net win, as it optimizes all
subsequent cache accesses after the first one. This matters for query
caching, since a cache is meant to be accessed repeatedly.
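
The eager-caching practice suggested above can be sketched as follows. This is a minimal illustration, not code from the PR; the app name and the example DataFrame are made up for the sketch:

```scala
import org.apache.spark.sql.SparkSession

// Sketch: populate the cache eagerly so that by the time the cached plan is
// read again, the AQE plan backing it has already been finalized.
val spark = SparkSession.builder().appName("eager-cache-sketch").getOrCreate()
val df = spark.range(0, 1000000).selectExpr("id", "id % 10 AS bucket")

df.cache()  // marks the plan for caching; nothing is computed yet (lazy)
df.count()  // eager action: materializes the cache, finalizing the AQE plan

// Later accesses hit the already-materialized (AQE-optimized) cached plan.
df.groupBy("bucket").count().show()
```

The `df.count()` call is the key step: `cache()` alone is lazy, so without an eager action the first consumer of the cache pays the materialization cost and, before this PR, also read a non-finalized plan.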
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]