cloud-fan commented on a change in pull request #26207: [SPARK-29552][SQL] 
Execute the "OptimizeLocalShuffleReader" rule when creating new query stage and 
then can optimize the shuffle reader to local shuffle reader as much as 
possible.
URL: https://github.com/apache/spark/pull/26207#discussion_r337854155
 
 

 ##########
 File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala
 ##########
 @@ -92,6 +92,14 @@ case class AdaptiveSparkPlanExec(
   // optimizations should be stage-independent.
   @transient private val queryStageOptimizerRules: Seq[Rule[SparkPlan]] = Seq(
     ReuseAdaptiveSubquery(conf, subqueryCache),
+    // We will revert the all local shuffle reader node in 
OptimizeLocalShuffleReader rule
 
 Review comment:
   To polish it a little bit:
   ```
   When adding local shuffle readers in `OptimizeLocalShuffleReader`, we revert 
all the local readers if
   additional shuffles are introduced. This may be too conservative: maybe 
there is only one local reader
   that introduces shuffle, and we can still keep other local readers. Here we 
re-execute this rule with
   the sub-plan-tree of a query stage, to make sure necessary local readers are 
added before executing
   the query stage.
   This rule must be executed before `ReduceNumShufflePartitions`, as local 
shuffle readers can't
   change number of partitions.
   ```

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to