cloud-fan commented on a change in pull request #26207: [SPARK-29552][SQL]
Execute the "OptimizeLocalShuffleReader" rule when creating new query stage and
then can optimize the shuffle reader to local shuffle reader as much as
possible.
URL: https://github.com/apache/spark/pull/26207#discussion_r337854155
##########
File path:
sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala
##########
@@ -92,6 +92,14 @@ case class AdaptiveSparkPlanExec(
// optimizations should be stage-independent.
@transient private val queryStageOptimizerRules: Seq[Rule[SparkPlan]] = Seq(
ReuseAdaptiveSubquery(conf, subqueryCache),
+ // We will revert the all local shuffle reader node in
OptimizeLocalShuffleReader rule
Review comment:
To polish it a little bit:
```
When adding local shuffle readers in `OptimizeLocalShuffleReader`, we revert
all the local
readers if additional shuffles are introduced. This may be too conservative:
maybe there is
only one local reader that introduces shuffle, and we can still keep other
local readers.
Here we re-execute this rule with the sub-plan-tree of a query stage, to
make sure necessary
local readers are added before executing the query stage.
This rule must be executed before `ReduceNumShufflePartitions`, as local
shuffle readers can't
change number of partitions.
```
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]