Github user brad-kaiser commented on a diff in the pull request:
https://github.com/apache/spark/pull/19041#discussion_r179579886
--- Diff: core/src/main/scala/org/apache/spark/storage/BlockManagerMasterEndpoint.scala ---
@@ -246,6 +251,38 @@ class BlockManagerMasterEndpoint(
     blockManagerIdByExecutor.get(execId).foreach(removeBlockManager)
   }
+  private def recoverLatestRDDBlock(
+      execId: String,
+      excludeExecutors: Seq[String],
+      context: RpcCallContext): Unit = {
+    logDebug(s"Replicating first cached block on $execId")
+    val excluded = excludeExecutors.flatMap(blockManagerIdByExecutor.get)
+    val response: Option[Future[Boolean]] = for {
+      blockManagerId <- blockManagerIdByExecutor.get(execId)
+      info <- blockManagerInfo.get(blockManagerId)
+      blocks = info.cachedBlocks.collect { case r: RDDBlockId => r }
--- End diff --
Thanks for the clarification @squito. My original vision for this feature was
to replicate only in-memory blocks, on the theory that this would be more
lightweight than replicating all blocks while still being useful for Spark
notebook users who have persisted the results of some long calculation.
However, replicating all blocks might be more useful, and it would also let me
get rid of that .checkMem function and simplify the code a bit.
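For concreteness, here is a rough sketch of the two options, not code from the
patch: `CachedBlockView` and `memSizeOf` are hypothetical stand-ins for the
pieces of `BlockManagerInfo` this logic would touch, and `inMemoryOnly` is just
a toggle to contrast the two behaviors.

```scala
// Rough sketch only: CachedBlockView and memSizeOf are hypothetical
// stand-ins for the relevant parts of BlockManagerInfo, and
// inMemoryOnly is just a toggle to contrast the two options.
import org.apache.spark.storage.{BlockId, RDDBlockId}

trait CachedBlockView {
  def cachedBlocks: collection.Set[BlockId]
  def memSizeOf(blockId: BlockId): Long
}

def candidateBlocks(info: CachedBlockView, inMemoryOnly: Boolean): Seq[RDDBlockId] = {
  val rddBlocks = info.cachedBlocks.collect { case r: RDDBlockId => r }.toSeq
  if (inMemoryOnly) {
    // Option 1 (original vision): only blocks with bytes resident in
    // memory, i.e. roughly what the .checkMem gate selects today.
    rddBlocks.filter(b => info.memSizeOf(b) > 0)
  } else {
    // Option 2: all cached RDD blocks, which would let .checkMem go away.
    rddBlocks
  }
}
```

Either way the recovery path itself would stay the same; only the candidate
set of blocks changes.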
Do you have any preference for which approach to take here? I'm open to
input.
Thanks.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]