GitHub user ericl commented on the issue:

    https://github.com/apache/spark/pull/13152
  
    A couple of high-level questions:
    - Rather than sending an RPC to the master to ask for a worker's topology 
info, could this be provided at initialization time or determined from the 
environment?
    
    - Is it possible to narrow the prioritizer's interface so that it chooses 
only a single next peer? If caching the prioritization order is desired, the 
prioritizer can do that internally. For example, the interface could look 
something like this. The default prioritizer then would not need to randomly 
shuffle the entire peer list just to choose its target.
    
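    ```
    import org.apache.spark.storage.{BlockId, BlockManagerId, StorageLevel}
    
    trait BlockReplicationStrategy {
    
      // Chooses replication targets one at a time; any caching of a
      // prioritization order stays internal to the selector.
      trait ReplicationTargetSelector {
        // Returns the next peer to try, or None if no suitable peer is left.
        def getNextPeer(
          candidatePeers: Set[BlockManagerId],
          successfulReplications: Set[BlockManagerId],
          failedReplications: Set[BlockManagerId]): Option[BlockManagerId]
      }
    
      // Creates a selector for replicating one block from the local block manager.
      def getTargetSelector(
        localId: BlockManagerId,
        blockId: BlockId,
        level: StorageLevel): ReplicationTargetSelector
    }
    ```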
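    
    As a rough illustration of the last point, a default selector under this 
interface might look like the sketch below. It is only a sketch under stated 
assumptions: `PeerId` is a hypothetical stand-in for `BlockManagerId` so the 
snippet compiles without Spark, and `RandomTargetSelector` and 
`ReplicationSketch.replicate` are made-up names, not anything in the patch.
    
    ```scala
    import scala.util.Random
    
    // Hypothetical stand-in for Spark's BlockManagerId (assumption, not the real class).
    final case class PeerId(host: String, port: Int)
    
    trait ReplicationTargetSelector {
      def getNextPeer(
          candidatePeers: Set[PeerId],
          successfulReplications: Set[PeerId],
          failedReplications: Set[PeerId]): Option[PeerId]
    }
    
    // Picks one untried peer uniformly at random instead of shuffling the whole list.
    class RandomTargetSelector(random: Random = new Random) extends ReplicationTargetSelector {
      override def getNextPeer(
          candidatePeers: Set[PeerId],
          successfulReplications: Set[PeerId],
          failedReplications: Set[PeerId]): Option[PeerId] = {
        // Peers that have not been tried yet, whether they succeeded or failed.
        val remaining =
          (candidatePeers -- successfulReplications -- failedReplications).toIndexedSeq
        if (remaining.isEmpty) None else Some(remaining(random.nextInt(remaining.size)))
      }
    }
    
    object ReplicationSketch {
      // Hypothetical caller loop: ask for one peer at a time until enough replicas
      // succeed or the selector runs out of candidates. `send` is an assumed
      // callback that returns true when the upload to a peer succeeds.
      def replicate(
          peers: Set[PeerId],
          replicas: Int,
          selector: ReplicationTargetSelector,
          send: PeerId => Boolean): Set[PeerId] = {
        var succeeded = Set.empty[PeerId]
        var failed = Set.empty[PeerId]
        var exhausted = false
        while (!exhausted && succeeded.size < replicas) {
          selector.getNextPeer(peers, succeeded, failed) match {
            case Some(peer) => if (send(peer)) succeeded += peer else failed += peer
            case None => exhausted = true
          }
        }
        succeeded
      }
    }
    ```
    
    The selector still computes a set difference on each call, but the caller 
no longer needs a fully ordered peer list up front, and a topology-aware 
selector could plug into the same loop.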
    
    Also, the patch would be smaller if it changed only the `getRandomPeer()` 
call.

