cloud-fan commented on a change in pull request #26516: [SPARK-29893] improve
the local shuffle reader performance by changing the reading task number from 1
to multi.
URL: https://github.com/apache/spark/pull/26516#discussion_r347727154
##########
File path:
sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala
##########
@@ -82,24 +82,18 @@ case class AdaptiveSparkPlanExec(
// plan should reach a final status of query stages (i.e., no more addition or removal of
// Exchange nodes) after running these rules.
private def queryStagePreparationRules: Seq[Rule[SparkPlan]] = Seq(
- OptimizeLocalShuffleReader(conf),
+ // OptimizeLocalShuffleReaderInBuildSide(conf),
ensureRequirements
)
// A list of physical optimizer rules to be applied to a new stage before its execution. These
// optimizations should be stage-independent.
@transient private val queryStageOptimizerRules: Seq[Rule[SparkPlan]] = Seq(
ReuseAdaptiveSubquery(conf, subqueryCache),
-
- // When adding local shuffle readers in 'OptimizeLocalShuffleReader`, we revert all the local
- // readers if additional shuffles are introduced. This may be too conservative: maybe there is
- // only one local reader that introduces shuffle, and we can still keep other local readers.
- // Here we re-execute this rule with the sub-plan-tree of a query stage, to make sure necessary
- // local readers are added before executing the query stage.
- // This rule must be executed before `ReduceNumShufflePartitions`, as local shuffle readers
- // can't change number of partitions.
- OptimizeLocalShuffleReader(conf),
ReduceNumShufflePartitions(conf),
+ // The rule of 'OptimizeLocalShuffleReader' need make use of the 'partitionStartIndices'
Review comment:
`need to`
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
With regards,
Apache Git Services