GitHub user shubhamchopra commented on the issue:

    https://github.com/apache/spark/pull/13152
  
    The topology info is only queried when an executor registers with the
master, and it is assumed to stay the same throughout the life of the executor.
Depending on the cluster manager being used, the exact way this information is
provided may differ. Resolving this at the master keeps the implementation
simpler, since only the master needs access to the service/script/class used to
resolve the topology. The communication overhead is minimal, as the executors
have to communicate with the master when they register anyway. A sketch of what
such a resolver could look like is below.
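
    To be concrete, this is roughly the shape of resolver I have in mind. The
class names and the config key below are illustrative, not the exact API in
this PR:

    ```scala
    import org.apache.spark.SparkConf

    /** Resolved on the master; executors never need access to the script/service. */
    abstract class TopologyResolver(conf: SparkConf) {
      /** Map a host name to its topology info, e.g. a rack id; None if unknown. */
      def getTopologyForHost(hostname: String): Option[String]
    }

    /** Default: no topology information available. */
    class FlatTopologyResolver(conf: SparkConf) extends TopologyResolver(conf) {
      override def getTopologyForHost(hostname: String): Option[String] = None
    }

    /** Example: shell out to a site-provided script, Hadoop-style. */
    class ScriptBasedTopologyResolver(conf: SparkConf) extends TopologyResolver(conf) {
      // Hypothetical config key, for illustration only.
      private val script = conf.get("spark.replication.topologyScript", "")

      override def getTopologyForHost(hostname: String): Option[String] = {
        if (script.isEmpty) None
        else {
          import scala.sys.process._
          // May throw if the script fails; a real implementation would handle errors.
          val out = Seq(script, hostname).!!.trim
          if (out.nonEmpty) Some(out) else None
        }
      }
    }
    ```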
    
    The getRandomPeer() method was doing quite a bit more than just getting a
random peer: it was also managing and mutating state, state that was being
mutated in other places as well. I tried to keep the block placement strategy
and the usage of its output separate, to make it simpler to plug in a new block
placement strategy. I also thought it would be best to decouple internal
replication state management from the block replication strategy, while still
keeping the structure of the state the same. The sketch below shows the kind of
separation I mean.
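
    Concretely, the strategy only orders peers and never touches the
replication state, which stays with the caller. The trait name and signature
here are illustrative:

    ```scala
    import org.apache.spark.storage.{BlockId, BlockManagerId}

    /**
     * Sketch: the placement strategy is a pure function of the current
     * replication state; all state mutation stays in the replication loop.
     */
    trait BlockReplicationPrioritization {
      /**
       * @param peers all peers known to this block manager
       * @param peersReplicatedTo peers that already hold the block
       * @return peers in the order replication should be attempted
       */
      def prioritize(
          peers: Seq[BlockManagerId],
          peersReplicatedTo: Set[BlockManagerId],
          blockId: BlockId): Seq[BlockManagerId]
    }
    ```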
    
    The costlier operation here is the RPC fetch of all the peers from the
master. The prioritization algorithm is only called once if there are no
failures. If there are failures, the list of peers is requested from the master
again before the prioritizer is re-run. The bigger hit, again, would be the RPC
communication between the executor and the master. Random.shuffle in the
default prioritizer uses a Fisher-Yates shuffle, so it is linear in time.
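
    For reference, the default prioritizer then boils down to a single shuffle
over the candidate peers. This builds on the sketch above and is again just
illustrative:

    ```scala
    import scala.util.Random

    import org.apache.spark.storage.{BlockId, BlockManagerId}

    class RandomBlockReplicationPrioritization extends BlockReplicationPrioritization {
      override def prioritize(
          peers: Seq[BlockManagerId],
          peersReplicatedTo: Set[BlockManagerId],
          blockId: BlockId): Seq[BlockManagerId] = {
        // Drop peers that already have the block, then shuffle the rest.
        val candidates = peers.filterNot(peersReplicatedTo.contains)
        // Scala's Random.shuffle is a Fisher-Yates shuffle, linear in the
        // number of peers, so the RPC fetch dominates the cost.
        Random.shuffle(candidates)
      }
    }
    ```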