Victsm commented on a change in pull request #30164:
URL: https://github.com/apache/spark/pull/30164#discussion_r513103688
##########
File path: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala
##########
@@ -1252,6 +1254,28 @@ private[spark] class DAGScheduler(
execCores.map(cores =>
properties.setProperty(EXECUTOR_CORES_LOCAL_PROPERTY, cores))
}
+ /**
+ * If push based shuffle is enabled, set the shuffle services to be used for the given
+ * shuffle map stage. The list of shuffle services is determined based on the list of
+ * active executors tracked by block manager master at the start of the stage.
+ */
+ private def prepareShuffleServicesForShuffleMapStage(stage: ShuffleMapStage) {
Review comment:
We would also like community input on this.
This PR implements the shuffle service (merger) location selection logic
for YARN, with support for dynamic resource allocation (DRA).
The selected shuffle service locations are provided to `ShuffleMapTask`
via `ShuffleDependency`, so that all mappers can push blocks to the
corresponding shuffle services.
We think it would be better to make the `getMergerLocations` API pluggable,
so that in addition to this default implementation for YARN with DRA support,
people can provide their own implementations based on their unique cluster
deployment setup.
For example, in a cluster deployment with disaggregated shuffle services,
where the set of shuffle services available to a given Spark application is
less dynamic than on YARN with DRA, the pluggable API would allow a very
different strategy for selecting merger locations.
We would like input from the community on this pluggable merger selection
API, and on the parameters that would fit into such an API.
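
To make the discussion concrete, here is a rough sketch of what such a pluggable API might look like. Everything in it is hypothetical and not part of this PR: the `MergerLocation` type, the `MergerLocationSelector` trait, the parameter list, and both implementations are assumptions for illustration only. The first implementation loosely mirrors the idea of picking shuffle services on hosts with active executors; the second models a disaggregated deployment with a fixed set of shuffle services.

```scala
// Hypothetical sketch only: none of these names come from the PR.

// A shuffle service (merger) endpoint.
final case class MergerLocation(host: String, port: Int)

// Hypothetical pluggable interface for selecting merger locations.
trait MergerLocationSelector {
  /** Returns at most `numMergersNeeded` distinct shuffle service locations. */
  def getMergerLocations(
      numMergersNeeded: Int,
      activeExecutorHosts: Seq[String]): Seq[MergerLocation]
}

// Assumed default-like strategy: use the shuffle services co-located with
// the hosts that currently run active executors.
class ExecutorHostSelector(shuffleServicePort: Int) extends MergerLocationSelector {
  override def getMergerLocations(
      numMergersNeeded: Int,
      activeExecutorHosts: Seq[String]): Seq[MergerLocation] =
    activeExecutorHosts.distinct
      .take(numMergersNeeded)
      .map(host => MergerLocation(host, shuffleServicePort))
}

// Assumed strategy for disaggregated shuffle services: the candidate set is
// fixed up front and does not depend on where executors are running.
class StaticSelector(services: Seq[MergerLocation]) extends MergerLocationSelector {
  override def getMergerLocations(
      numMergersNeeded: Int,
      activeExecutorHosts: Seq[String]): Seq[MergerLocation] =
    services.distinct.take(numMergersNeeded)
}
```

With an interface of this shape, the scheduler would only depend on the trait, and the choice of implementation could be left to the cluster manager backend or to configuration. Whether the active executor hosts belong in the signature at all is exactly the kind of parameter question raised above.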
CC @Ngone51 @jiangxb1987 @attilapiros @tgravescs @mridulm