mridulm commented on code in PR #39459: URL: https://github.com/apache/spark/pull/39459#discussion_r1101665418
##########
core/src/main/scala/org/apache/spark/storage/BlockManagerMasterEndpoint.scala:
##########
@@ -77,6 +77,11 @@ class BlockManagerMasterEndpoint(
   // Mapping from block id to the set of block managers that have the block.
   private val blockLocations = new JHashMap[BlockId, mutable.HashSet[BlockManagerId]]
+  // Mapping from task id to the set of rdd blocks which are generated from the task.
+  private val tidToRddBlockIds = new mutable.HashMap[Long, mutable.HashSet[RDDBlockId]]
+  // Record the visible RDD blocks which have been generated at least from one successful task.
+  private val visibleRDDBlocks = new mutable.HashSet[RDDBlockId]

Review Comment:
Yes, that is the idea - a block gets added to the invisible list initially (since the task has not completed), and then gets promoted to visible (when the task completes).

If existing blocks are lost, why would you need that information, given that they are gone?
In other words, how is it different from today's situation (without visibility)? If a block is lost, it is no longer in the system.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
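To make the lifecycle described in the review comment concrete, here is a minimal sketch of the invisible-to-visible promotion. It is not the PR's implementation: `VisibilityTracker`, its method names, and the simplified `RDDBlockId` case class are hypothetical stand-ins, and it assumes that a "lost" block means every replica is gone. Only the two field names are taken from the diff above.

```scala
import scala.collection.mutable

// Hypothetical stand-in for Spark's RDDBlockId, kept minimal for this sketch.
final case class RDDBlockId(rddId: Int, splitIndex: Int)

// Hypothetical tracker; the method names are assumptions, not code from the PR.
class VisibilityTracker {
  // Task id -> RDD blocks reported by that task before its outcome is known.
  private val tidToRddBlockIds =
    new mutable.HashMap[Long, mutable.HashSet[RDDBlockId]]
  // RDD blocks generated by at least one successful task.
  private val visibleRDDBlocks = new mutable.HashSet[RDDBlockId]

  // A block reported by a running task starts out invisible.
  def blockReported(taskId: Long, blockId: RDDBlockId): Unit = {
    if (!visibleRDDBlocks.contains(blockId)) {
      tidToRddBlockIds.getOrElseUpdate(taskId, new mutable.HashSet[RDDBlockId]) += blockId
    }
  }

  // On success, every block the task reported is promoted to visible.
  def taskSucceeded(taskId: Long): Unit = {
    tidToRddBlockIds.remove(taskId).foreach(visibleRDDBlocks ++= _)
  }

  // On failure, the task's pending blocks are dropped without promotion.
  def taskFailed(taskId: Long): Unit = {
    tidToRddBlockIds.remove(taskId)
    ()
  }

  // Assumption: "lost" means no replica remains, so all tracking is discarded.
  def blockLost(blockId: RDDBlockId): Unit = {
    visibleRDDBlocks -= blockId
    tidToRddBlockIds.values.foreach(_ -= blockId)
  }

  def isVisible(blockId: RDDBlockId): Boolean = visibleRDDBlocks.contains(blockId)
}
```

In this sketch a lost block simply disappears from both structures, which mirrors the point of the comment: once a block is gone there is no visibility state worth keeping, just as there is no location state for it today.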
