Github user brad-kaiser commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19041#discussion_r179579886
  
    --- Diff: 
core/src/main/scala/org/apache/spark/storage/BlockManagerMasterEndpoint.scala 
---
    @@ -246,6 +251,38 @@ class BlockManagerMasterEndpoint(
         blockManagerIdByExecutor.get(execId).foreach(removeBlockManager)
       }
     
    +  private def recoverLatestRDDBlock(
    +      execId: String,
    +      excludeExecutors: Seq[String],
    +      context: RpcCallContext): Unit = {
    +    logDebug(s"Replicating first cached block on $execId")
    +    val excluded = excludeExecutors.flatMap(blockManagerIdByExecutor.get)
    +    val response: Option[Future[Boolean]] = for {
    +      blockManagerId <- blockManagerIdByExecutor.get(execId)
    +      info <- blockManagerInfo.get(blockManagerId)
    +      blocks = info.cachedBlocks.collect { case r: RDDBlockId => r }
    --- End diff --
    
    Thanks for the clarification @squito . So my original vision for this 
feature was that we would just do in memory blocks, and that this would be more 
lightweight than all blocks, and still useful for spark notebook users who had 
persisted the results of some long calculation. However, maybe replicating all 
blocks would be more useful and then I could get rid of that .checkMem function 
and simplify the code a bit. 
    
    Do you have any preference for which approach to take here? I'm open to 
input.
    
    Thanks.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to