itskals commented on a change in pull request #28370:
URL: https://github.com/apache/spark/pull/28370#discussion_r418968590



##########
File path: core/src/main/scala/org/apache/spark/storage/BlockManager.scala
##########
@@ -1551,30 +1555,36 @@ private[spark] class BlockManager(
   }
 
   /**
-   * Called for pro-active replenishment of blocks lost due to executor failures
+   * Replicates a block to peer block managers based on existingReplicas and maxReplicas
    *
   * @param blockId blockId being replicated
    * @param existingReplicas existing block managers that have a replica
    * @param maxReplicas maximum replicas needed
+   * @param maxReplicationFailures number of replication failures to tolerate before
+   *                               giving up.
+   * @return whether block was successfully replicated or not
    */
   def replicateBlock(
       blockId: BlockId,
       existingReplicas: Set[BlockManagerId],
-      maxReplicas: Int): Unit = {
+      maxReplicas: Int,
+      maxReplicationFailures: Option[Int] = None): Boolean = {
     logInfo(s"Using $blockManagerId to pro-actively replicate $blockId")
-    blockInfoManager.lockForReading(blockId).foreach { info =>
+    blockInfoManager.lockForReading(blockId).forall { info =>
       val data = doGetLocalBytes(blockId, info)
       val storageLevel = StorageLevel(
         useDisk = info.level.useDisk,
         useMemory = info.level.useMemory,
         useOffHeap = info.level.useOffHeap,
         deserialized = info.level.deserialized,
         replication = maxReplicas)
-      // we know we are called as a result of an executor removal, so we refresh peer cache
-      // this way, we won't try to replicate to a missing executor with a stale reference
+      // we know we are called as a result of an executor removal or because the current executor
+      // is getting decommissioned. so we refresh peer cache before trying replication, we won't
+      // try to replicate to a missing executor/another decommissioning executor

Review comment:
       I think getPeers handles it. If so, kindly ignore this.
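   For illustration only, a minimal sketch of the forced peer-refresh pattern this comment refers to; the class, the `fetchFromMaster` callback, and the peer type are hypothetical stand-ins, not the actual BlockManager internals:

```scala
// Hypothetical sketch of a "refresh peers on force fetch" cache. The real
// BlockManager asks the master for live peers; here a plain callback stands in.
class PeerCacheSketch(fetchFromMaster: () => Seq[String]) {
  private var cachedPeers: Seq[String] = Seq.empty

  // Forcing a fetch drops stale entries (removed or decommissioning executors)
  // before replication targets are chosen.
  def getPeers(forceFetch: Boolean): Seq[String] = synchronized {
    if (forceFetch || cachedPeers.isEmpty) {
      cachedPeers = fetchFromMaster()
    }
    cachedPeers
  }
}
```

   The point is simply that a forced fetch bypasses the cached list, so executors that have been removed or are themselves decommissioning are filtered out before replication starts, which is what the updated comment in the diff describes.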

##########
File path: core/src/main/scala/org/apache/spark/storage/BlockManager.scala
##########
@@ -1761,6 +1775,57 @@ private[spark] class BlockManager(
     blocksToRemove.size
   }
 
+  def decommissionBlockManager(): Unit = {
+    if (!blockManagerDecommissioning) {
+      logInfo("Starting block manager decommissioning process")
+      blockManagerDecommissioning = true
+      decommissionManager = Some(new BlockManagerDecommissionManager(conf))
+      decommissionManager.foreach(_.start())
+    } else {
+      logDebug("Block manager already in decommissioning state")
+    }
+  }
+
+  /**
+   * Tries to offload all cached RDD blocks from this BlockManager to peer BlockManagers
+   * Visible for testing
+   */
+  def decommissionRddCacheBlocks(): Unit = {

Review comment:
       Just a thought: should this be part of a thread pool rather than a single thread doing it?
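   As a rough illustration of the thread-pool suggestion above (names such as `offloadAll`, `blockIds`, and `replicate` are hypothetical placeholders, not the PR's API), one possible shape is:

```scala
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration.Duration

// Hypothetical sketch only: `replicate` stands in for whatever per-block
// offload call would be used; it is not the actual BlockManager method.
object ParallelOffloadSketch {
  def offloadAll(blockIds: Seq[String],
                 replicate: String => Boolean,
                 numThreads: Int = 4): Seq[Boolean] = {
    // A dedicated fixed-size pool so several blocks can be offloaded
    // concurrently instead of one block at a time on a single thread.
    val pool = Executors.newFixedThreadPool(numThreads)
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutorService(pool)
    try {
      val futures = blockIds.map(id => Future(replicate(id)))
      Await.result(Future.sequence(futures), Duration.Inf)
    } finally {
      pool.shutdown()
    }
  }
}
```

   Whether the extra concurrency pays off probably depends on how many cached blocks an executor typically holds and on available network bandwidth during decommissioning.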

##########
File path: core/src/main/scala/org/apache/spark/storage/BlockManager.scala
##########
@@ -1829,7 +1895,52 @@ private[spark] class BlockManager(
     data.dispose()
   }
 
+  /**
+   * Class to handle block manager decommissioning retries
+   * It creates a Thread to retry offloading all RDD cache blocks
+   */
+  private class BlockManagerDecommissionManager(conf: SparkConf) {
+    @volatile private var stopped = false
+    private val blockReplicationThread = new Thread {
+      override def run(): Unit = {
+        while (blockManagerDecommissioning && !stopped) {
+          try {
+            logDebug("Attempting to replicate all cached RDD blocks")
+            decommissionRddCacheBlocks()
+            logInfo("Attempt to replicate all cached blocks done")
+            val sleepInterval = conf.get(
+              config.STORAGE_DECOMMISSION_REPLICATION_REATTEMPT_INTERVAL)
+            Thread.sleep(sleepInterval)
+          } catch {
+            case _: InterruptedException =>
+              // no-op
+            case NonFatal(e) =>
+              logError("Error occurred while trying to " +
+                "replicate cached RDD blocks for block manager decommissioning", e)
+          }
+        }
+      }
+    }
+    blockReplicationThread.setDaemon(true)
+    blockReplicationThread.setName("block-replication-thread")
+
+    def start(): Unit = {
+      logInfo("Starting block replication thread")
+      blockReplicationThread.start()
+    }
+
+    def stop(): Unit = {
+      if (!stopped) {
+        stopped = true
+        logInfo("Stopping block replication thread")
+        blockReplicationThread.interrupt()
+        blockReplicationThread.join()

Review comment:
       > My thought is joining the thread here might block if the interrupt didn't work as we want it to, what do you think?
   
   I think the initial concern is that join could wait indefinitely, since the "stop" is not propagated deep enough to stop early. Maybe it needs to be refactored?
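   One way to avoid an unbounded wait, sketched here with hypothetical names and a made-up default timeout rather than as the PR's implementation, is a timed join that logs and returns if the thread is still alive:

```scala
// Hypothetical sketch: bound the wait with a timed join so stop() cannot hang
// forever if the interrupt is swallowed somewhere deeper in the replication loop.
object BoundedStopSketch {
  def stop(thread: Thread, timeoutMs: Long = 30000L): Unit = {
    thread.interrupt()
    thread.join(timeoutMs)  // wait at most timeoutMs for the thread to exit
    if (thread.isAlive) {
      // Did not terminate in time; report it instead of blocking the caller.
      System.err.println(s"${thread.getName} did not stop within ${timeoutMs} ms")
    }
  }
}
```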



