cloud-fan commented on a change in pull request #26207: [SPARK-29552][SQL]
Execute the "OptimizeLocalShuffleReader" rule when creating new query stage and
then can optimize the shuffle reader to local shuffle reader as much as
possible.
URL: https://github.com/apache/spark/pull/26207#discussion_r337854155
##########
File path:
sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala
##########
@@ -92,6 +92,14 @@ case class AdaptiveSparkPlanExec(
// optimizations should be stage-independent.
@transient private val queryStageOptimizerRules: Seq[Rule[SparkPlan]] = Seq(
ReuseAdaptiveSubquery(conf, subqueryCache),
+ // We will revert the all local shuffle reader node in
OptimizeLocalShuffleReader rule
Review comment:
To polish it a little bit:
```
When adding local shuffle readers in `OptimizeLocalShuffleReader`, we revert
all the local readers if
additional shuffles are introduced. This may be too conservative: maybe
there is only one local reader
that introduces shuffle, and we can still keep other local readers. Here we
re-execute this rule with
the sub-plan-tree of a query stage, to make sure necessary local readers are
added before executing
the query stage.
This rule must be executed before `ReduceNumShufflePartitions`, as local
shuffle readers can't
change number of partitions.
```
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]