Victsm commented on a change in pull request #30164:
URL: https://github.com/apache/spark/pull/30164#discussion_r513103688



##########
File path: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala
##########
@@ -1252,6 +1254,28 @@ private[spark] class DAGScheduler(
     execCores.map(cores => properties.setProperty(EXECUTOR_CORES_LOCAL_PROPERTY, cores))
   }
 
+  /**
+   * If push based shuffle is enabled, set the shuffle services to be used for the given
+   * shuffle map stage. The list of shuffle services is determined based on the list of
+   * active executors tracked by block manager master at the start of the stage.
+   */
+  private def prepareShuffleServicesForShuffleMapStage(stage: ShuffleMapStage) {

Review comment:
   We would also like community input on this.
   This PR implements the shuffle service (merger) location selection logic for YARN, with support for dynamic resource allocation (DRA).
   The selected shuffle service locations are passed to `ShuffleMapTask` via `ShuffleDependency`, so that all mappers can push their blocks to the corresponding shuffle services.
   We think it would be better to make the `getMergerLocations` API pluggable, so that in addition to this default implementation for YARN with DRA support, people can provide their own implementations suited to their cluster deployment.
   For example, in a deployment with disaggregated shuffle services, where the set of shuffle services available to a given Spark application is less dynamic than on YARN with DRA, a pluggable API would allow a very different strategy for selecting merger locations.
   
   We would like the community's input on this pluggable merger selection API and on the parameters that would fit into it.
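
   To make the discussion concrete, a pluggable selector might look roughly like the sketch below. This is only an illustration: `MergerLoc`, `MergerLocationSelector`, `ActiveExecutorSelector`, `StaticPoolSelector`, and the parameter list are all hypothetical names, not code from this PR or from Spark, and the real parameters are exactly what we would like feedback on.

```scala
// Hypothetical sketch only: none of these names exist in Spark or in this PR.

// Stand-in for the address of a shuffle service that can act as a merger.
final case class MergerLoc(host: String, port: Int)

// Pluggable strategy: given the candidate shuffle services known to the
// scheduler and the number of mergers a stage would like, pick the locations.
trait MergerLocationSelector {
  def getMergerLocations(candidates: Seq[MergerLoc], numMergersDesired: Int): Seq[MergerLoc]
}

// Assumed "dynamic" strategy: use the distinct shuffle services co-located
// with the executors that are active when the stage starts.
object ActiveExecutorSelector extends MergerLocationSelector {
  override def getMergerLocations(
      candidates: Seq[MergerLoc],
      numMergersDesired: Int): Seq[MergerLoc] =
    candidates.distinct.take(numMergersDesired)
}

// Assumed "disaggregated" strategy: a fixed pool of shuffle services that
// does not depend on which executors are currently alive.
final class StaticPoolSelector(pool: Seq[MergerLoc]) extends MergerLocationSelector {
  override def getMergerLocations(
      candidates: Seq[MergerLoc],
      numMergersDesired: Int): Seq[MergerLoc] =
    pool.take(numMergersDesired)
}
```

   With something of this shape, the scheduler would only call `getMergerLocations` on whichever selector is configured, and a deployment with disaggregated shuffle services could swap in its own strategy without touching the scheduler code.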
   CC @Ngone51 @jiangxb1987 @attilapiros @tgravescs @mridulm 






