Github user shubhamchopra commented on the issue:
https://github.com/apache/spark/pull/13152
The topology info is only queried when the executor initiates and is
assumed to stay the same throughout the life of the executor. Depending on the
cluster manager being used, I am assuming the exact way this information is
provided may differ. Resolving this at the master makes this implementation
simpler as only the master needs to be able to access the service/script/class
being used to resolve the topology. The communication overhead is minimal as
the executors do have to communicate with the master when they initiate anyways.
The getRandomPeer() method was doing quite a bit more than just getting a
random peer. It was being used to manage/mutate state, which was being mutated
in other places as well. I tried to keep the block placement strategy and the
usage of its output separate, to make it simpler to provide a new block
placement strategy. I also thought it would be best to de-couple any internal
replication state management with the block replication strategy, while still
keeping the structure of the state the same.
The costlier operation here is the RPC fetch of all the peers from the
master. The prioritization algorithm is only called once if there are no
failures. If there are failures, the list of peers is requested from the
master again, before the prioritizer is run. The bigger hit again, would be the
RPC communication between the executor and the master. Random.shuffle in the
default prioritizer uses Fisher-Yates shuffle, so is linear in time.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]