Github user squito commented on a diff in the pull request:
https://github.com/apache/spark/pull/19041#discussion_r178959007
--- Diff:
core/src/main/scala/org/apache/spark/storage/BlockManagerMasterEndpoint.scala
---
@@ -246,6 +251,38 @@ class BlockManagerMasterEndpoint(
     blockManagerIdByExecutor.get(execId).foreach(removeBlockManager)
   }

+  private def recoverLatestRDDBlock(
+      execId: String,
+      excludeExecutors: Seq[String],
+      context: RpcCallContext): Unit = {
+    logDebug(s"Replicating first cached block on $execId")
+    val excluded = excludeExecutors.flatMap(blockManagerIdByExecutor.get)
+    val response: Option[Future[Boolean]] = for {
+      blockManagerId <- blockManagerIdByExecutor.get(execId)
+      info <- blockManagerInfo.get(blockManagerId)
+      blocks = info.cachedBlocks.collect { case r: RDDBlockId => r }
--- End diff --
Sorry, my comment was worded poorly. Is there any reason you wouldn't
want to transfer the on-disk blocks? I assume you'd want to replicate all of
them, and just needed to change the comments elsewhere. Or is this an
intentional tradeoff, because users are more likely to limit in-memory
caching to only the most important data?
Also, just to be really precise -- the check below isn't whether the block
is in memory currently, it's whether it's been requested to be cached in
memory. It may have been cached as MEMORY_AND_DISK but currently reside only
on disk. Depending on why you want to limit replication to in-memory blocks,
this may not be what you want. Maybe you actually want `!useDisk`
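To illustrate the distinction (a standalone sketch, not code from this PR --
the case class below just mirrors the `useMemory`/`useDisk` flags of Spark's
`StorageLevel`): a block requested as MEMORY_AND_DISK keeps `useMemory == true`
even after it has been evicted to disk, so filtering on `useMemory` selects by
requested level, not by current residence, while `!useDisk` excludes any block
whose level permits disk at all.

```scala
// Hypothetical stand-in for Spark's StorageLevel flags; only the two
// fields relevant to the discussion are modeled.
case class Level(useMemory: Boolean, useDisk: Boolean)

object LevelFilters {
  // Requested as MEMORY_AND_DISK: allowed in memory and on disk.
  val MemoryAndDisk = Level(useMemory = true, useDisk = true)

  // The check as written: keeps blocks whose *requested* level includes
  // memory, even if the bytes currently live only on disk.
  def keptByUseMemory(l: Level): Boolean = l.useMemory

  // The suggested alternative: drops any block whose level allows disk.
  def keptByNotUseDisk(l: Level): Boolean = !l.useDisk

  def main(args: Array[String]): Unit = {
    // A MEMORY_AND_DISK block passes the first filter but not the second.
    println(keptByUseMemory(MemoryAndDisk))   // true
    println(keptByNotUseDisk(MemoryAndDisk))  // false
  }
}
```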
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]