squito commented on a change in pull request #24554: [SPARK-27622][Core] Avoiding the network when block manager fetches disk persisted RDD blocks from the same host
URL: https://github.com/apache/spark/pull/24554#discussion_r282725731
##########
File path: core/src/main/scala/org/apache/spark/storage/BlockManagerMasterEndpoint.scala
##########
@@ -438,12 +439,27 @@ class BlockManagerMasterEndpoint(
     if (blockLocations.containsKey(blockId)) blockLocations.get(blockId).toSeq else Seq.empty
   }
 
-  private def getLocationsAndStatus(blockId: BlockId): Option[BlockLocationsAndStatus] = {
+  private def getLocationsAndStatus(
+      blockId: BlockId, requesterHost: String): Option[BlockLocationsAndStatus] = {
     val locations = Option(blockLocations.get(blockId)).map(_.toSeq).getOrElse(Seq.empty)
     val status = locations.headOption.flatMap { bmId => blockManagerInfo(bmId).getStatus(blockId) }
 
     if (locations.nonEmpty && status.isDefined) {
-      Some(BlockLocationsAndStatus(locations, status.get))
+      val bmIdToLocalDirs = if (status.get.storageLevel.useDisk) {
+        locations
+          .find(_.host == requesterHost)

Review comment:
If the block has been cached by multiple executors on the same host, this will only return one of them. I guess that is OK? There isn't really an important scenario where the block would become unavailable from one executor but still be available from another executor on the same host.
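
To illustrate the point about `find` in the diff above, here is a minimal standalone sketch. It is not code from the PR: `Loc` is a hypothetical stand-in for `BlockManagerId`, and the hosts and executor ids are invented. It only shows that `find` stops at the first location on the requester's host, while `filter` would keep all of them.

```scala
// Hypothetical stand-in for BlockManagerId; only the fields needed here.
final case class Loc(executorId: String, host: String)

object FindVsFilter {
  // Assumed example data: two executors on host-a hold the block.
  val locations: Seq[Loc] = Seq(
    Loc("1", "host-a"),
    Loc("2", "host-b"),
    Loc("3", "host-a"))

  val requesterHost: String = "host-a"

  // find returns at most one match: the first location on the same host.
  val firstSameHost: Option[Loc] = locations.find(_.host == requesterHost)

  // filter returns every location on the same host.
  val allSameHost: Seq[Loc] = locations.filter(_.host == requesterHost)

  def main(args: Array[String]): Unit = {
    println(firstSameHost)
    println(allSameHost)
  }
}
```

With `find`, only executor 1 is considered even though executor 3 holds the same block on the same host. Whether that matters depends on whether the two copies can really diverge in availability, which is the open question in the comment.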