Github user JoshRosen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/15036#discussion_r78258706
  
    --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala ---
    @@ -807,8 +801,8 @@ private[spark] class BlockManager(
             // Now that the block is in either the memory or disk store,
             // tell the master about it.
             info.size = size
    -        if (tellMaster) {
    -          reportBlockStatus(blockId, info, putBlockStatus)
    +        if (tellMaster && info.tellMaster) {
    --- End diff --
    
    In `BlockInfo`, which tracks metadata about an individual block (such as
    the desired storage level that the block should be stored at), the
    `tellMaster` field tracks whether the master should be informed of state
    changes to this block. This appears to be false only for blocks which are
    deserialized copies of TorrentBroadcasts (see the `putSingle` calls in
    `TorrentBroadcast.scala`).
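A minimal sketch of the per-block flag, assuming a much simpler record than the real `BlockInfo`. `SimpleBlockInfo`, `PerBlockFlagDemo`, and `reportedBlocks` are hypothetical names used only for illustration, and the block-ID strings are just labels. A block whose flag is false never produces a report to the master, whatever operation touches it.

```scala
// Hypothetical sketch: a per-block flag decides whether the master ever hears
// about a block. These names are illustrative, not Spark's real BlockInfo API.
final case class SimpleBlockInfo(blockId: String, tellMaster: Boolean)

object PerBlockFlagDemo {
  // Returns the IDs of blocks the master would be told about; blocks created
  // with tellMaster = false (e.g. a local deserialized broadcast copy) are skipped.
  def reportedBlocks(blocks: Seq[SimpleBlockInfo]): Seq[String] =
    blocks.filter(_.tellMaster).map(_.blockId)

  def main(args: Array[String]): Unit = {
    val blocks = Seq(
      SimpleBlockInfo("rdd_0_0", tellMaster = true),
      SimpleBlockInfo("broadcast_0", tellMaster = false)
    )
    println(reportedBlocks(blocks)) // List(rdd_0_0)
  }
}
```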
    
    The `tellMaster` parameter, on the other hand, controls whether this
    particular block-status-changing operation should send a metadata update
    to the master. The only place where this seems to be false is the
    `removeRdd` code path, which is used for bulk removal of an RDD's cached
    blocks. In this path, the master first performs a bulk deletion of block
    statuses in its own metadata table and then asynchronously deletes the
    blocks from the block managers. I think the goal here is to avoid sending
    one status update per deleted block, since that might flood the master
    with RPC traffic and cause messages to back up in its queue (the block
    manager metadata-handling endpoint is single-threaded).
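The interaction between the two flags and the bulk-removal path can be illustrated with a toy model. Everything here (`ToyMaster`, `ToyBlockManager`, `ToyBlockInfo`, `updateBlockStatus`, and a `removeRdd` named after, but not equivalent to, Spark's) is a hypothetical stand-in rather than Spark's actual classes or RPC messages, and the asynchronous deletion is modeled synchronously. The point is only that once the master has dropped an RDD's entries in one step, removing the blocks with `tellMaster = false` avoids one status-update message per block, while the per-block flag still suppresses reports for blocks that opted out.

```scala
import scala.collection.mutable

// Hypothetical toy model; none of these names are Spark's real classes or RPCs.
final case class ToyBlockInfo(blockId: String, tellMaster: Boolean)

final class ToyMaster {
  // Stand-in for the master's block-status table: blockId -> executor ID.
  val blockLocations = mutable.Map.empty[String, String]
  // Count of per-block status updates handled by the (single-threaded) endpoint.
  var statusUpdatesReceived = 0

  def updateBlockStatus(blockId: String): Unit = {
    statusUpdatesReceived += 1
    blockLocations.remove(blockId)
  }

  // Bulk path: drop every entry for the RDD in one step and return the block
  // IDs that executors should delete (synchronous here, unlike the real path).
  def removeRdd(rddId: Int): List[String] = {
    val prefix = s"rdd_${rddId}_"
    val toRemove = blockLocations.keys.filter(_.startsWith(prefix)).toList
    blockLocations --= toRemove
    toRemove
  }
}

final class ToyBlockManager(master: ToyMaster) {
  val localBlocks = mutable.Map.empty[String, ToyBlockInfo]

  // Same shape as the condition in the diff: report only if this operation asks
  // to tell the master AND the block has not opted out of master tracking.
  def removeBlock(blockId: String, tellMaster: Boolean): Unit =
    localBlocks.remove(blockId).foreach { info =>
      if (tellMaster && info.tellMaster) master.updateBlockStatus(blockId)
    }
}

object BulkRemovalDemo {
  def main(args: Array[String]): Unit = {
    val master = new ToyMaster
    val bm = new ToyBlockManager(master)
    for (p <- 0 until 1000) {
      val id = s"rdd_7_$p"
      master.blockLocations(id) = "exec-1"
      bm.localBlocks(id) = ToyBlockInfo(id, tellMaster = true)
    }
    // The master has already forgotten these blocks, so removal passes
    // tellMaster = false instead of sending 1000 separate status updates.
    master.removeRdd(7).foreach(bm.removeBlock(_, tellMaster = false))
    println(master.statusUpdatesReceived) // 0
    println(bm.localBlocks.size)          // 0
  }
}
```

In this model, the per-block flag alone could not express the bulk case: the same cached RDD block should be reported when it is stored or removed individually, but not when the master has already dropped it in bulk, which is presumably why a separate per-operation parameter exists.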
    
    If we go _way_ back, I think one original rationale for this flag may
    have been to avoid sending status updates for map outputs, which at one
    time may have been persisted on disk via the BlockManager rather than
    bypassing it.

