cloud-fan commented on a change in pull request #26207: [SPARK-29552][SQL] 
Execute the "OptimizeLocalShuffleReader" rule when creating new query stage and 
then can optimize the shuffle reader to local shuffle reader as much as 
possible.
URL: https://github.com/apache/spark/pull/26207#discussion_r337854155
 
 

 ##########
 File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala
 ##########
 @@ -92,6 +92,14 @@ case class AdaptiveSparkPlanExec(
   // optimizations should be stage-independent.
   @transient private val queryStageOptimizerRules: Seq[Rule[SparkPlan]] = Seq(
     ReuseAdaptiveSubquery(conf, subqueryCache),
+    // We will revert the all local shuffle reader node in 
OptimizeLocalShuffleReader rule
 
 Review comment:
   To polish it a little bit:
   ```
   When adding local shuffle readers in `OptimizeLocalShuffleReader`, we revert 
all the local
   readers if additional shuffles are introduced. This may be too conservative: 
maybe there is
   only one local reader that introduces shuffle, and we can still keep other 
local readers.
   Here we re-execute this rule with the sub-plan-tree of a query stage, to 
make sure necessary
   local readers are added before executing the query stage.
   This rule must be executed before `ReduceNumShufflePartitions`, as local 
shuffle readers can't
   change number of partitions.
   ```

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to