squito commented on a change in pull request #24554: [SPARK-27622][Core] Avoiding the network when block manager fetches disk persisted RDD blocks from the same host
URL: https://github.com/apache/spark/pull/24554#discussion_r282725731
 
 

 ##########
 File path: core/src/main/scala/org/apache/spark/storage/BlockManagerMasterEndpoint.scala
 ##########
 @@ -438,12 +439,27 @@ class BlockManagerMasterEndpoint(
    if (blockLocations.containsKey(blockId)) blockLocations.get(blockId).toSeq else Seq.empty
   }
 
-  private def getLocationsAndStatus(blockId: BlockId): Option[BlockLocationsAndStatus] = {
+  private def getLocationsAndStatus(
+      blockId: BlockId, requesterHost: String): Option[BlockLocationsAndStatus] = {
    val locations = Option(blockLocations.get(blockId)).map(_.toSeq).getOrElse(Seq.empty)
    val status = locations.headOption.flatMap { bmId => blockManagerInfo(bmId).getStatus(blockId) }
 
     if (locations.nonEmpty && status.isDefined) {
-      Some(BlockLocationsAndStatus(locations, status.get))
+      val bmIdToLocalDirs = if (status.get.storageLevel.useDisk) {
+        locations
+          .find(_.host == requesterHost)
 
 Review comment:
   If it's been cached by multiple executors on the same host, this will only
return one of them. I guess that is OK? There isn't really an important
scenario where the block would become unavailable from one executor but still
be available from another on the same host.
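The trade-off raised above can be illustrated with a minimal sketch. It assumes a simplified, hypothetical `SimpleBlockManagerId` holding only an executor id and a host (not Spark's actual `BlockManagerId`), and the helper names `firstOnHost` / `allOnHost` are likewise illustrative. `firstOnHost` mirrors the `find` in the diff, while `allOnHost` shows the alternative of keeping every same-host location.

```scala
// Hypothetical, simplified stand-in for Spark's BlockManagerId: only the
// fields needed to illustrate the same-host lookup are modeled here.
final case class SimpleBlockManagerId(executorId: String, host: String)

object SameHostLookup {
  // Mirrors the `find` in the diff: returns at most ONE location on the
  // requester's host, namely the first match in the sequence's order.
  def firstOnHost(
      locations: Seq[SimpleBlockManagerId],
      requesterHost: String): Option[SimpleBlockManagerId] =
    locations.find(_.host == requesterHost)

  // Alternative: keep EVERY location on the requester's host, so a caller
  // could fall back to another local executor if the first one were gone.
  def allOnHost(
      locations: Seq[SimpleBlockManagerId],
      requesterHost: String): Seq[SimpleBlockManagerId] =
    locations.filter(_.host == requesterHost)
}
```

With two executors on `hostA` holding the block, `firstOnHost` yields only the first one, while `allOnHost` yields both. `find` stops at the first match, which suffices when any same-host copy of a disk-persisted block can serve the read; returning all of them would only pay off in the scenario the reviewer considers unimportant, where the block disappears from one executor but remains on another on the same host.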

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services
