cloud-fan commented on code in PR #38558:
URL: https://github.com/apache/spark/pull/38558#discussion_r1016656445


##########
sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala:
##########
@@ -209,6 +209,19 @@ case class AdaptiveSparkPlanExec(
 
   override def output: Seq[Attribute] = inputPlan.output
 
+  // Try our best to give a stable output partitioning and ordering.

Review Comment:
   I'm trying to understand this "best effort". AFAIK, table cache is lazy: the 
first time a query accesses a cached plan, the cached plan has not been 
executed yet, so we don't know its output partitioning/ordering and can't 
optimize out shuffles. But the next time the cached plan is accessed, it has 
already been executed, and we do know the output partitioning/ordering.
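To illustrate the laziness described above, here is a small language-agnostic sketch (in Python, with hypothetical names; not the Spark API) of a lazily materialized cache whose output partitioning only becomes observable after the first execution:

```python
from typing import Optional

class CachedPlan:
    """Hypothetical model of a lazy table cache: the actual output
    partitioning is unknown until the plan is executed the first time."""

    def __init__(self, planned_partitioning: str):
        self._planned_partitioning = planned_partitioning
        self.materialized = False
        self.known_partitioning: Optional[str] = None

    def execute(self) -> str:
        # First access materializes the cache; only now is the
        # actual output partitioning observable to the optimizer.
        if not self.materialized:
            self.materialized = True
            self.known_partitioning = self._planned_partitioning
        return self.known_partitioning

plan = CachedPlan("hash(id, 200)")

# First access: partitioning is still unknown, so a shuffle cannot be elided.
assert plan.known_partitioning is None
plan.execute()

# Subsequent accesses: partitioning is known, so a matching shuffle could be skipped.
assert plan.known_partitioning == "hash(id, 200)"
```

The sketch captures only the timing issue raised in the comment: any shuffle-elimination decision made before the first execution must be "best effort", because the real partitioning is not yet known.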



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
