[GitHub] [spark] mridulm commented on a diff in pull request #39459: [SPARK-41497][CORE] Fixing accumulator undercount in the case of the retry task with rdd cache

via GitHub Sun, 12 Feb 2023 07:05:40 -0800


mridulm commented on code in PR #39459:
URL: https://github.com/apache/spark/pull/39459#discussion_r1103818933



##########
core/src/main/scala/org/apache/spark/storage/BlockManagerMasterEndpoint.scala:
##########
@@ -77,6 +77,11 @@ class BlockManagerMasterEndpoint(
   // Mapping from block id to the set of block managers that have the block.
   private val blockLocations = new JHashMap[BlockId, 
mutable.HashSet[BlockManagerId]]
 
+  // Mapping from task id to the set of rdd blocks which are generated from 
the task.
+  private val tidToRddBlockIds = new mutable.HashMap[Long, 
mutable.HashSet[RDDBlockId]]
+  // Record the visible RDD blocks which have been generated at least from one 
successful task.
+  private val visibleRDDBlocks = new mutable.HashSet[RDDBlockId]

Review Comment:
   A bit unclear on this actually ... why would a block status be cached on an 
executor (other than driver) which is not hosting the block ?
   If executor is lost, driver will update its state as well due to the loss 
(evict the block) ... if there are other copies of the block, continues to be 
visible (since block exists). If task fails before block transitions to visible 
(or executor loss), driver will evict all blocks.
   
   Note, when executor is fetching a block for computation from remote executor 
- it should be 'seeing' a block only if it is visible.
   (If block is locally available due to replication - then it is invisible 
until marked visible. And so same logic as above would work).
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[GitHub] [spark] mridulm commented on a diff in pull request #39459: [SPARK-41497][CORE] Fixing accumulator undercount in the case of the retry task with rdd cache

Reply via email to