attilapiros commented on a change in pull request #30164:
URL: https://github.com/apache/spark/pull/30164#discussion_r524264323
##########
File path:
resource-managers/yarn/src/main/scala/org/apache/spark/scheduler/cluster/YarnSchedulerBackend.scala
##########
@@ -161,6 +173,35 @@ private[spark] abstract class YarnSchedulerBackend(
totalRegisteredExecutors.get() >= totalExpectedExecutors * minRegisteredRatio
}
+ override def getShufflePushMergerLocations(
+ numPartitions: Int,
+ resourceProfileId: Int): Seq[BlockManagerId] = {
+ // Currently this is naive way of calculating numMergersDesired for a stage. In future,
+ // we can use better heuristics to calculate numMergersDesired for a stage.
+ val maxExecutors = if (Utils.isDynamicAllocationEnabled(sc.getConf)) {
+ maxNumExecutors
+ } else {
+ numExecutors
+ }
+ val tasksPerExecutor = sc.resourceProfileManager
+ .resourceProfileFromId(resourceProfileId).maxTasksPerExecutor(sc.conf)
+ val numMergersDesired = math.min(
+ math.max(1, math.ceil(numPartitions / tasksPerExecutor).toInt), maxExecutors)
+ val minMergersNeeded = math.max(minMergersStaticThreshold,
+ math.floor(numMergersDesired * minMergersThresholdRatio).toInt)
+
+ // Request for numMergersDesired shuffle mergers to BlockManagerMasterEndpoint
+ // and if its less than minMergersNeeded, we disable push based shuffle.
+ val mergerLocations = blockManagerMaster
+ .getShufflePushMergerLocations(numMergersDesired, scheduler.excludedNodes())
+ logDebug(s"Num merger locations available ${mergerLocations.length}")
Review comment:
Quick question regarding this log line:
Since this method can still return an empty `Seq`, wouldn't it be more helpful
to move this line into the `else` branch below and add another log message above
`Seq.empty[BlockManagerId]` stating that no merger locations will be used?
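
A minimal sketch of what that could look like, not the PR's actual code. The branch condition is an assumption taken from the code comment above ("if its less than minMergersNeeded, we disable push based shuffle"), because the `if`/`else` itself is outside the quoted hunk. `MergerId`, `chooseMergers`, the `log` parameter, and the wording of both messages are hypothetical stand-ins for `BlockManagerId`, the tail of `getShufflePushMergerLocations`, and `logDebug`.

```scala
object MergerLogSketch {
  // Hypothetical stand-in for org.apache.spark.storage.BlockManagerId.
  final case class MergerId(host: String)

  // Hypothetical helper mirroring the tail of getShufflePushMergerLocations.
  // `log` stands in for logDebug so the sketch runs without Spark.
  def chooseMergers(
      available: Seq[MergerId],
      minMergersNeeded: Int,
      log: String => Unit = println): Seq[MergerId] = {
    // Assumed condition: too few mergers means push-based shuffle is disabled.
    if (available.length < minMergersNeeded) {
      // New message for the case where no merger location will be used.
      log(s"Only ${available.length} merger locations available, " +
        s"fewer than the $minMergersNeeded needed; no merger location will be used")
      Seq.empty[MergerId]
    } else {
      // The existing message, moved here so it is only logged when the
      // returned locations are actually used.
      log(s"Num merger locations available ${available.length}")
      available
    }
  }
}
```

With this shape, each call logs exactly one message, and the message always matches the value returned.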