[GitHub] [spark] mridulm commented on a diff in pull request #39459: [SPARK-41497][CORE] Fixing accumulator undercount in the case of the retry task with rdd cache

via GitHub Tue, 14 Feb 2023 23:47:18 -0800


mridulm commented on code in PR #39459:
URL: https://github.com/apache/spark/pull/39459#discussion_r1106753599



##########
core/src/main/scala/org/apache/spark/storage/BlockManager.scala:
##########
@@ -1424,6 +1457,16 @@ private[spark] class BlockManager(
     blockStoreUpdater.save()
   }
 
+  // Check whether a rdd block is visible or not.
+  private[spark] def isRDDBlockVisible(blockId: RDDBlockId): Boolean = {
+    // If the rdd block visibility information not available in the block manager,
+    // asking master for the information.
+    if (blockInfoManager.isRDDBlockVisible(blockId)) {
+      return true
+    }
+    master.isRDDBlockVisible(blockId)

Review Comment:
   > With the above mechanism, do you think we still need another cache to store the visibility information in the executor, or do we also need to cache the state in executors that do not have the cached block data stored?
   
   You are right, the current PR already handles it on a second read, since we check `blockInfoManager.isRDDBlockVisible(blockId)` first.
   This should cover case (1); we will always query the master when the block is available, and we still have to distinguish case (2).
   (2.1) would be an optimization we can attempt later on.
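   The lookup order in `isRDDBlockVisible` (check local `BlockInfoManager` state first, fall back to the master) and the deferred (2.1) idea of caching visibility on executors can be illustrated with a minimal, self-contained sketch. This is not Spark's implementation. `VisibilityMaster`, `LocalVisibility`, `VisibilityChecker` and the `masterQueries` counter are hypothetical stand-ins. Block IDs are plain strings rather than `RDDBlockId`. Caching only positive answers assumes that visibility never reverts once granted.

```scala
import scala.collection.mutable

// Hypothetical stand-in for the driver-side master query (master.isRDDBlockVisible).
trait VisibilityMaster {
  def isRDDBlockVisible(blockId: String): Boolean
}

// Hypothetical stand-in for the executor's local BlockInfoManager visibility state.
final class LocalVisibility {
  private val visible = mutable.Set.empty[String]
  def markVisible(blockId: String): Unit = visible += blockId
  def isRDDBlockVisible(blockId: String): Boolean = visible.contains(blockId)
}

final class VisibilityChecker(local: LocalVisibility, master: VisibilityMaster) {
  // Hypothetical executor-side cache of positive master answers (the "(2.1)" idea).
  private val cachedVisible = mutable.Set.empty[String]
  // Counts round trips to the master so the fallback behaviour is observable.
  var masterQueries = 0

  def isRDDBlockVisible(blockId: String): Boolean = {
    // Fast path: visibility is already known on this executor.
    if (local.isRDDBlockVisible(blockId) || cachedVisible.contains(blockId)) {
      return true
    }
    // Slow path: ask the master.
    masterQueries += 1
    val visible = master.isRDDBlockVisible(blockId)
    // Assumption: visibility is monotonic, so only `true` answers are cached.
    if (visible) cachedVisible += blockId
    visible
  }
}

// Usage: the driver knows rdd_0_1 is visible, and the executor has no local record.
val driverState = mutable.Set("rdd_0_1")
val master = new VisibilityMaster {
  def isRDDBlockVisible(blockId: String): Boolean = driverState.contains(blockId)
}
val checker = new VisibilityChecker(new LocalVisibility, master)
checker.isRDDBlockVisible("rdd_0_1") // true (queries the master)
checker.isRDDBlockVisible("rdd_0_1") // true (served from the executor cache)
checker.isRDDBlockVisible("rdd_0_2") // false (not cached, so the master is asked on every call)
```

   Caching only `true` results means that a block which becomes visible later is never hidden by a stale `false`. The sketch does not invalidate the cache when a block is removed.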


