holdenk commented on a change in pull request #28370:
URL: https://github.com/apache/spark/pull/28370#discussion_r419597557
##########
File path: core/src/main/scala/org/apache/spark/storage/BlockManager.scala
##########
@@ -1551,30 +1555,36 @@ private[spark] class BlockManager(
}
/**
- * Called for pro-active replenishment of blocks lost due to executor failures
+ * Replicates a block to peer block managers based on existingReplicas and maxReplicas
*
* @param blockId blockId being replicate
* @param existingReplicas existing block managers that have a replica
* @param maxReplicas maximum replicas needed
+ * @param maxReplicationFailures number of replication failures to tolerate before
+ * giving up.
+ * @return whether block was successfully replicated or not
*/
def replicateBlock(
blockId: BlockId,
existingReplicas: Set[BlockManagerId],
- maxReplicas: Int): Unit = {
+ maxReplicas: Int,
+ maxReplicationFailures: Option[Int] = None): Boolean = {
logInfo(s"Using $blockManagerId to pro-actively replicate $blockId")
- blockInfoManager.lockForReading(blockId).foreach { info =>
+ blockInfoManager.lockForReading(blockId).forall { info =>
val data = doGetLocalBytes(blockId, info)
val storageLevel = StorageLevel(
useDisk = info.level.useDisk,
useMemory = info.level.useMemory,
useOffHeap = info.level.useOffHeap,
deserialized = info.level.deserialized,
replication = maxReplicas)
- // we know we are called as a result of an executor removal, so we refresh peer cache
- // this way, we won't try to replicate to a missing executor with a stale reference
+ // we know we are called as a result of an executor removal or because the current executor
+ // is getting decommissioned. so we refresh peer cache before trying replication, we won't
+ // try to replicate to a missing executor/another decommissioning executor
Review comment:
I don't think `getPeers` handles that situation currently. It's handled
inside the blockReplicationPolicy's `prioritize`, which is called inside of
`replicate`.
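
To make the distinction concrete, here is a minimal sketch of the split described above, using toy types rather than Spark's real classes. `Peer`, its `decommissioning` flag, `ReplicationPolicy`, `SkipDecommissioningPolicy`, and `ReplicationSketch` are all hypothetical names invented for illustration, and the signatures are simplified assumptions, not Spark's actual API. The point is only that the peer lookup returns the cached list as-is, while the policy's `prioritize` decides which peers are eligible.

```scala
// Toy model only: these types are hypothetical stand-ins, not Spark's classes.
final case class Peer(id: String, decommissioning: Boolean)

trait ReplicationPolicy {
  // Returns the candidate peers in the order replication should try them.
  def prioritize(
      peers: Seq[Peer],
      alreadyReplicatedTo: Set[Peer],
      numReplicas: Int): List[Peer]
}

// Assumed behaviour: the policy, not the peer lookup, drops unsuitable peers.
object SkipDecommissioningPolicy extends ReplicationPolicy {
  def prioritize(
      peers: Seq[Peer],
      alreadyReplicatedTo: Set[Peer],
      numReplicas: Int): List[Peer] =
    peers
      .filterNot(_.decommissioning)            // skip peers that are going away
      .filterNot(alreadyReplicatedTo.contains) // skip peers that already hold a replica
      .take(numReplicas)
      .toList
}

object ReplicationSketch {
  // Stand-in for a peer lookup that does no filtering of its own.
  def getPeers(cache: Seq[Peer]): Seq[Peer] = cache

  // Stand-in for replicate: eligibility is decided by the policy.
  def replicate(cache: Seq[Peer], existing: Set[Peer], numReplicas: Int): List[Peer] =
    SkipDecommissioningPolicy.prioritize(getPeers(cache), existing, numReplicas)
}
```

If that matches the real flow, the code comment in the diff should credit the policy rather than the peer cache refresh for skipping decommissioning executors.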