ivoson commented on code in PR #39459:
URL: https://github.com/apache/spark/pull/39459#discussion_r1103799711
##########
core/src/main/scala/org/apache/spark/storage/BlockManager.scala:
##########
@@ -1424,6 +1457,16 @@ private[spark] class BlockManager(
blockStoreUpdater.save()
}
+ // Check whether a rdd block is visible or not.
+ private[spark] def isRDDBlockVisible(blockId: RDDBlockId): Boolean = {
+ // If the rdd block visibility information not available in the block manager,
+ // asking master for the information.
+ if (blockInfoManager.isRDDBlockVisible(blockId)) {
+ return true
+ }
+ master.isRDDBlockVisible(blockId)
Review Comment:
Hi @mridulm, in the current implementation, once a block becomes visible, the
driver broadcasts a message to the executors that store the cached block data,
telling them to mark the block as visible.
The state `visibleRDDBlocks` is cached in
[BlockInfoManager](https://github.com/apache/spark/pull/39459/files#diff-fdee2ef66ad5bea5323506395b453145c74f47c8da092dcacd34a66190a20a15).
This is effectively a cache of the visibility state on the executor side, but
only on executors that store the cached block. It is updated in a push-based style.
Given this mechanism, do you think we still need another cache for the
visibility information on the executors, or do we also need to cache the state
on executors that do not store the cached block data?
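
To make the question concrete, here is a rough, simplified sketch of the lookup path described above. It is not the PR code: `VisibilityMaster`, `ExecutorVisibility`, `markVisible`, `onBlockVisible`, and the `lookups` counter are hypothetical names used only for illustration, and `RDDBlockId` is a local stand-in for the real class.

```scala
import scala.collection.mutable

// Hypothetical stand-in for Spark's RDDBlockId, kept local so the sketch is self-contained.
final case class RDDBlockId(rddId: Int, splitIndex: Int)

// Hypothetical driver-side view: the source of truth for block visibility.
class VisibilityMaster {
  private val visible = mutable.Set.empty[RDDBlockId]
  var lookups = 0 // counts simulated remote lookups, for illustration only

  def markVisible(id: RDDBlockId): Unit = synchronized { visible += id }

  def isRDDBlockVisible(id: RDDBlockId): Boolean = synchronized {
    lookups += 1
    visible.contains(id)
  }
}

// Hypothetical executor-side state, loosely modelled on `visibleRDDBlocks`.
class ExecutorVisibility(master: VisibilityMaster) {
  private val visibleRDDBlocks = mutable.Set.empty[RDDBlockId]

  // Assumed to be called when the driver's push message reaches this executor.
  def onBlockVisible(id: RDDBlockId): Unit = synchronized { visibleRDDBlocks += id }

  // Check the local state first; ask the master only when the block is not known locally.
  def isRDDBlockVisible(id: RDDBlockId): Boolean = {
    val local = synchronized { visibleRDDBlocks.contains(id) }
    local || master.isRDDBlockVisible(id)
  }
}
```

In this sketch, an executor that received the push answers locally, while an executor that does not store the block falls through to the master on every call. That second case is the one the question above is about.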
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]