Github user pwendell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/126#discussion_r11268425
  
    --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManagerMaster.scala ---
    @@ -125,6 +141,34 @@ class BlockManagerMaster(var driverActor: ActorRef, conf: SparkConf) extends Log
         askDriverWithReply[Array[StorageStatus]](GetStorageStatus)
       }
     
    +  /**
    +   * Return the block's status on all block managers, if any.
    +   *
    +   * If askSlaves is true, this invokes the master to query each block manager for the most
    +   * updated block statuses. This is useful when the master is not informed of the given block
    +   * by all block managers.
    +   */
    +  def getBlockStatus(
    +      blockId: BlockId,
    +      askSlaves: Boolean = true): Map[BlockManagerId, BlockStatus] = {
    --- End diff --
    
    This seems like a pretty expensive operation. What if there are hundreds of `BlockManagers`? It might make sense to say in the doc that this should only be used for testing; otherwise people will come along and use it without understanding the performance implications.
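
To make the cost concern concrete, here is a toy sketch, not Spark's implementation. `ToyMaster`, `remoteCalls`, and the simplified `BlockId`, `BlockManagerId`, and `BlockStatus` case classes are hypothetical stand-ins invented for illustration. It assumes that `askSlaves = true` costs one round trip per registered block manager, and it shows how the doc comment might carry a testing-only warning.

```scala
// Hypothetical stand-ins for the Spark types named in the diff; NOT the real definitions.
final case class BlockId(name: String)
final case class BlockManagerId(host: String, port: Int)
final case class BlockStatus(memSize: Long, diskSize: Long)

// Toy master (assumption, for illustration only).
// `cached`   : what the master has already been told about.
// `managers` : what each block manager would answer if queried directly.
class ToyMaster(
    cached: Map[BlockManagerId, Map[BlockId, BlockStatus]],
    managers: Map[BlockManagerId, Map[BlockId, BlockStatus]]) {

  // Counts simulated round trips so the fan-out cost is visible.
  var remoteCalls: Int = 0

  /**
   * Return the block's status on all block managers, if any.
   *
   * NOTE: with askSlaves = true, every block manager is queried, so the cost
   * grows linearly with the number of block managers. Intended for testing only.
   */
  def getBlockStatus(
      blockId: BlockId,
      askSlaves: Boolean = true): Map[BlockManagerId, BlockStatus] = {
    val source =
      if (askSlaves) {
        // One request per block manager, whether or not it holds the block.
        remoteCalls += managers.size
        managers
      } else {
        // Answer from the master's own (possibly incomplete) view.
        cached
      }
    source.collect {
      case (id, blocks) if blocks.contains(blockId) => id -> blocks(blockId)
    }
  }
}
```

In this sketch, a lookup with `askSlaves = false` leaves `remoteCalls` untouched but may miss blocks the master was never told about, while the default lookup adds one call per block manager.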
    
    Another thought here (let's talk offline): we should make it explicit which blocks the BlockManagerMaster is always informed about and which ones it might not know about. Right now this isn't stated anywhere, which makes it hard to reason about.


