venkata91 commented on a change in pull request #34122:
URL: https://github.com/apache/spark/pull/34122#discussion_r788286262



##########
File path: core/src/main/scala/org/apache/spark/MapOutputTracker.scala
##########
@@ -1176,6 +1223,9 @@ private[spark] class MapOutputTrackerWorker(conf: SparkConf) extends MapOutputTr
   // instantiate a serializer. See the followup to SPARK-36705 for more details.
   private lazy val fetchMergeResult = Utils.isPushBasedShuffleEnabled(conf, isDriver = false)
 
+  // Exposed for testing
+  val shufflePushMergerLocations = new ConcurrentHashMap[Int, Seq[BlockManagerId]]().asScala

Review comment:
       1. For `MapOutputTrackerMaster`, instead of keeping track of both `<shuffleId, shuffleMergeId>` for shuffle merger locations, I think we can piggyback on `unregisterAllMapAndMergeOutput` to clear the `mergers` in `ShuffleStatus`.
   
   2. For `MapOutputTrackerWorker`, based on this [comment](https://github.com/apache/spark/pull/34122#discussion_r785388799), we can clear the `shufflePushMergerLocations` in `updateEpoch`.
   
   This makes the code much simpler and neater. Let me know if you think otherwise or if this has any other issues.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


