mridulm commented on a change in pull request #30164:
URL: https://github.com/apache/spark/pull/30164#discussion_r520320751



##########
File path: core/src/main/scala/org/apache/spark/internal/config/package.scala
##########
@@ -1938,4 +1938,38 @@ package object config {
       .version("3.0.1")
       .booleanConf
       .createWithDefault(false)
+
+  private[spark] val PUSH_BASED_SHUFFLE_ENABLED =
+    ConfigBuilder("spark.shuffle.push.enabled")
+      .doc("Set to 'true' to enable push based shuffle on the client side and 
this works in" +
+        "conjunction with the server side flag 
spark.shuffle.server.mergedShuffleFileManagerImpl" +
+        "which needs to be set with the appropriate" +
+        "org.apache.spark.network.shuffle.MergedShuffleFileManager 
implementation for push-based" +
+        "shuffle to be enabled")
+      .booleanConf
+      .createWithDefault(false)
+
+  private[spark] val MAX_MERGER_LOCATIONS_CACHED =
+    ConfigBuilder("spark.shuffle.push.retainedMergerLocations")

Review comment:
       nit: the config name (`spark.shuffle.push.retainedMergerLocations`) and the variable name/description (`MAX_MERGER_LOCATIONS_CACHED`) seem to be out of sync? Was this due to some rename refactoring?
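
   For illustration only, a hedged sketch of how the two names could be brought back in sync; the variable name, doc text, and default value below are hypothetical, not taken from the PR:

```scala
// Hypothetical rename so the variable mirrors its config key.
private[spark] val SHUFFLE_PUSH_RETAINED_MERGER_LOCATIONS =
  ConfigBuilder("spark.shuffle.push.retainedMergerLocations")
    .doc("Maximum number of merger locations cached for push-based shuffle.")
    .intConf
    .createWithDefault(500)
```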

##########
File path: core/src/main/scala/org/apache/spark/Dependency.scala
##########
@@ -95,6 +96,20 @@ class ShuffleDependency[K: ClassTag, V: ClassTag, C: ClassTag](
   val shuffleHandle: ShuffleHandle = _rdd.context.env.shuffleManager.registerShuffle(
     shuffleId, this)
 
+  /**
+   * Stores the location of the list of chosen external shuffle services for handling the
+   * shuffle merge requests from mappers in this shuffle map stage.
+   */
+  private[spark] var mergerLocs: Seq[BlockManagerId] = Nil
+
+  def setMergerLocs(mergerLocs: Seq[BlockManagerId]): Unit = {
+    if (mergerLocs != null && mergerLocs.length > 0) {

Review comment:
       Why are we checking for `mergerLocs.length > 0`?
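
   If the guard is really only meant to skip null or empty input, a minimal equivalent sketch:

```scala
def setMergerLocs(mergerLocs: Seq[BlockManagerId]): Unit = {
  // nonEmpty is the idiomatic form of length > 0; the null check is only
  // needed if callers may legitimately pass null.
  if (mergerLocs != null && mergerLocs.nonEmpty) {
    this.mergerLocs = mergerLocs
  }
}
```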

##########
File path: core/src/main/scala/org/apache/spark/storage/BlockManagerMasterEndpoint.scala
##########
@@ -657,6 +681,28 @@ class BlockManagerMasterEndpoint(
     }
   }
 
+  private def getShufflePushMergerLocations(
+      numMergersNeeded: Int,
+      hostsToFilter: Set[String]): Seq[BlockManagerId] = {
+    val activeBlockManagers = blockManagerIdByExecutor.groupBy(_._2.host)
+      .mapValues(_.head).values.map(_._2).toSet
+    val filteredActiveBlockManagers = activeBlockManagers
+      .filterNot(x => hostsToFilter.contains(x.host))
+    val filteredActiveMergers = filteredActiveBlockManagers.map(
+      x => BlockManagerId(x.executorId, x.host, StorageUtils.externalShuffleServicePort(conf)))
+
+    // Enough mergers are available as part of active executors list
+    if (filteredActiveMergers.size >= numMergersNeeded) {
+      filteredActiveMergers.toSeq
+    } else {
+      // Delta mergers added from inactive mergers list to the active mergers list
+      val filteredDeadMergers = shuffleMergerLocations.values
+        .filterNot(mergerHost => filteredActiveMergers.exists(x => x.host == mergerHost.host))
+      filteredActiveMergers.toSeq ++
+        filteredDeadMergers.toSeq.take(numMergersNeeded - filteredActiveMergers.size)

Review comment:
       This means we will keep prioritizing hosts that will be removed in `addMergerLocation` when thresholds are hit.
   Is this intentional?

##########
File path: core/src/main/scala/org/apache/spark/storage/BlockManagerMasterEndpoint.scala
##########
@@ -526,6 +548,8 @@ class BlockManagerMasterEndpoint(
 
       blockManagerInfo(id) = new BlockManagerInfo(id, System.currentTimeMillis(),
         maxOnHeapMemSize, maxOffHeapMemSize, storageEndpoint, externalShuffleServiceBlockStatus)
+
+      addMergerLocation(id)

Review comment:
       It is not a copy, but a materialized view of the candidate hosts where external shuffle has been configured for the current application. It becomes a copy only when there is exactly one executor per host.
   The cardinality of this map is, btw, low in comparison to the total number of executors, given multi-tenancy and the `maxRetainedMergerLocations` threshold.
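
   A rough sketch of the host-keyed view being described; the concrete collection type is an assumption inferred from how the diff uses `shuffleMergerLocations`:

```scala
import scala.collection.mutable

// One entry per candidate host (not per executor): host name -> the
// BlockManagerId of the external shuffle service on that host.
private val shuffleMergerLocations = new mutable.LinkedHashMap[String, BlockManagerId]()
```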

##########
File path: core/src/main/scala/org/apache/spark/storage/BlockManagerMasterEndpoint.scala
##########
@@ -360,6 +371,17 @@ class BlockManagerMasterEndpoint(
 
   }
 
+  private def addMergerLocation(blockManagerId: BlockManagerId): Unit = {
+    if (!shuffleMergerLocations.contains(blockManagerId.host) && !blockManagerId.isDriver) {

Review comment:
       super nit: change order and check `!isDriver` first
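
   A minimal sketch of the reordered guard, so the cheap boolean check short-circuits before the map lookup:

```scala
private def addMergerLocation(blockManagerId: BlockManagerId): Unit = {
  // isDriver is a cheap field check; do it first so driver registrations
  // never pay for the hash-map lookup.
  if (!blockManagerId.isDriver && !shuffleMergerLocations.contains(blockManagerId.host)) {
    // ... existing bookkeeping for the new merger location ...
  }
}
```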

##########
File path: core/src/main/scala/org/apache/spark/storage/BlockManagerMasterEndpoint.scala
##########
@@ -657,6 +681,28 @@ class BlockManagerMasterEndpoint(
     }
   }
 
+  private def getShufflePushMergerLocations(
+      numMergersNeeded: Int,
+      hostsToFilter: Set[String]): Seq[BlockManagerId] = {
+    val activeBlockManagers = blockManagerIdByExecutor.groupBy(_._2.host)
+      .mapValues(_.head).values.map(_._2).toSet
+    val filteredActiveBlockManagers = activeBlockManagers
+      .filterNot(x => hostsToFilter.contains(x.host))
+    val filteredActiveMergers = filteredActiveBlockManagers.map(
+      x => BlockManagerId(x.executorId, x.host, StorageUtils.externalShuffleServicePort(conf)))
+
+    // Enough mergers are available as part of active executors list
+    if (filteredActiveMergers.size >= numMergersNeeded) {
+      filteredActiveMergers.toSeq
+    } else {
+      // Delta mergers added from inactive mergers list to the active mergers list
+      val filteredDeadMergers = shuffleMergerLocations.values
+        .filterNot(mergerHost => filteredActiveMergers.exists(x => x.host == mergerHost.host))

Review comment:
       This is O(N^2): create a set of hosts from `filteredActiveMergers` and check against that instead.
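
   A minimal sketch of the suggested rewrite, building the host set once so each membership check is O(1):

```scala
// Materialize the active-merger hosts once instead of scanning
// filteredActiveMergers for every candidate.
val activeMergerHosts = filteredActiveMergers.map(_.host).toSet
val filteredDeadMergers = shuffleMergerLocations.values
  .filterNot(merger => activeMergerHosts.contains(merger.host))
```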



