Github user squito commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19041#discussion_r178959007
  
    --- Diff: 
core/src/main/scala/org/apache/spark/storage/BlockManagerMasterEndpoint.scala 
---
    @@ -246,6 +251,38 @@ class BlockManagerMasterEndpoint(
         blockManagerIdByExecutor.get(execId).foreach(removeBlockManager)
       }
     
    +  private def recoverLatestRDDBlock(
    +      execId: String,
    +      excludeExecutors: Seq[String],
    +      context: RpcCallContext): Unit = {
    +    logDebug(s"Replicating first cached block on $execId")
    +    val excluded = excludeExecutors.flatMap(blockManagerIdByExecutor.get)
    +    val response: Option[Future[Boolean]] = for {
    +      blockManagerId <- blockManagerIdByExecutor.get(execId)
    +      info <- blockManagerInfo.get(blockManagerId)
    +      blocks = info.cachedBlocks.collect { case r: RDDBlockId => r }
    --- End diff --
    
    Sorry, my comment was worded very poorly.  Is there any reason you wouldn't 
want to transfer the on-disk blocks?  I assume you'd want to replicate all of 
them, and just needed to change the comments elsewhere.  Or is this an 
intentional tradeoff, since users are more likely to limit in-memory caching to 
only the most important data?
    
    Also, just to be really precise -- the check below isn't whether the block 
is in memory currently, it's whether it has been *requested* to be cached in 
memory.  It may have been cached as MEMORY_AND_DISK, but currently only reside 
on disk.  Depending on why you want to limit this to in-memory blocks only, that 
check may not be what you want.  Maybe you actually want `!useDisk`.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
