Github user squito commented on a diff in the pull request:
https://github.com/apache/spark/pull/19041#discussion_r179266769
--- Diff:
core/src/main/scala/org/apache/spark/storage/BlockManagerMasterEndpoint.scala
---
@@ -252,6 +257,44 @@ class BlockManagerMasterEndpoint(
blockManagerIdByExecutor.get(execId).foreach(removeBlockManager)
}
    +  private def recoverLatestRDDBlock(
    +      execId: String,
    +      excludeExecutors: Seq[String],
    +      context: RpcCallContext): Unit = {
    +    logDebug(s"Replicating first cached block on $execId")
    +    val excluded = excludeExecutors.flatMap(blockManagerIdByExecutor.get)
    +    val response: Option[Future[Boolean]] = for {
    +      blockManagerId <- blockManagerIdByExecutor.get(execId)
    +      info <- blockManagerInfo.get(blockManagerId)
    +      blocks = info.cachedBlocks.collect { case r: RDDBlockId => r }
    +      // As a heuristic, prioritize replicating the latest rdd. If this succeeds,
    +      // CacheRecoveryManager will try to replicate the remaining rdds.
    +      firstBlock <- if (blocks.isEmpty) None else Some(blocks.maxBy(_.rddId))
    +      replicaSet <- blockLocations.asScala.get(firstBlock)
    +      // Add 2 to force this block to be replicated to one new executor.
    +      maxReps = replicaSet.size + 2
--- End diff ---
I figured out why you need +2 instead of +1. The existing code expects you to
explicitly *remove* the replicating block manager's own id from `replicaSet`
before calling replicate. See:
https://github.com/apache/spark/blob/cccaaa14ad775fb981e501452ba2cc06ff5c0f0a/core/src/main/scala/org/apache/spark/storage/BlockManagerMasterEndpoint.scala#L236-L239
While the existing code is confusing, I definitely don't like using +2 here as
a workaround, since it just compounds the confusion. I'd at least update the
comments on `BlockManager.replicate()` etc., or maybe just change its behavior
and update the callsites.
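For what it's worth, here's a minimal, self-contained sketch of the arithmetic. This is not Spark code: `ReplicationMath` and `newPeers` are my own simplified model of `replicate()`'s contract (new copies = maxReplicas, minus existing replicas passed in, minus one for the source itself), but it shows why stripping the source id lets you use +1 while keeping it in `replicaSet` forces the +2 workaround:

```scala
// Simplified model (assumption, not Spark's actual implementation) of how
// many NEW executors receive a copy given the caller's arguments.
object ReplicationMath {
  // maxReplicas counts the source itself, so subtract 1 for it, plus one
  // slot for each existing replica the caller reports.
  def newPeers(existingReplicas: Set[String], maxReplicas: Int): Int =
    math.max(0, maxReplicas - existingReplicas.size - 1)

  def main(args: Array[String]): Unit = {
    // The block currently lives only on the executor we're draining.
    val replicaSet = Set("exec-1")
    val source = "exec-1"

    // Existing convention: remove the source id first, then +1 suffices.
    assert(newPeers(replicaSet - source, replicaSet.size + 1) == 1)

    // The workaround in this PR: keep the source id, compensate with +2.
    assert(newPeers(replicaSet, replicaSet.size + 2) == 1)

    println("both conventions yield exactly one new replica")
  }
}
```

Either convention gives one new replica, which is why the +2 "works", but only the first matches what `replicate()`'s callers are supposed to do today.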
---