squito commented on issue #23951: [SPARK-13704][CORE][YARN] Re-implement RackResolver to reduce resolving time
URL: https://github.com/apache/spark/pull/23951#issuecomment-474023289

wrt `AMRMClient`: I don't think we should make any change related to it as part of this PR. But I was thinking that we could:

1) Since `ContainerRequest` explicitly says it's going to add all racks for the hosts in the request:
https://github.com/apache/hadoop/blob/6fa229891e06eea62cb9634efde755f40247e816/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/AMRMClient.java#L136-L139
Spark should probably not do the same thing itself:
https://github.com/apache/spark/blob/7043aee1ba95e92e1cbd0ebafcc5b09b69ee3082/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/LocalityPreferredContainerPlacementStrategy.scala#L141-L143

2a) YARN could add another API, one that does *not* add racks for the hosts in the list.

2b) Spark could then send the rack from the scheduler to the `YarnAllocator` and use the new API, which avoids making another set of rack lookups in the AM.
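
To make the cost in 2a/2b concrete, here is a rough, self-contained Java sketch. It does not use any Hadoop or Spark classes: `RackLookupSketch`, `CountingResolver`, `racksResolvedInAm` and `racksFromScheduler` are hypothetical names invented for illustration, and the `/default-rack` fallback is just a placeholder. It only models the difference between resolving racks again in the AM and reusing a host-to-rack mapping that the scheduler already has.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

// Hypothetical illustration only; none of these types exist in Spark or YARN.
final class RackLookupSketch {

    // Stand-in for a rack resolver that counts how many lookups it performs.
    static final class CountingResolver implements Function<String, String> {
        private final Map<String, String> topology;
        int calls = 0;

        CountingResolver(Map<String, String> topology) {
            this.topology = topology;
        }

        @Override
        public String apply(String host) {
            calls++;
            // Placeholder fallback for hosts missing from the topology map.
            return topology.getOrDefault(host, "/default-rack");
        }
    }

    // Models today's behaviour: every host in the request is resolved again in the AM.
    static Set<String> racksResolvedInAm(List<String> hosts, Function<String, String> resolver) {
        Set<String> racks = new LinkedHashSet<>();
        for (String host : hosts) {
            racks.add(resolver.apply(host));
        }
        return racks;
    }

    // Models 2b: racks come from a mapping supplied by the scheduler, so no resolver is called.
    static Set<String> racksFromScheduler(List<String> hosts, Map<String, String> hostToRack) {
        Set<String> racks = new LinkedHashSet<>();
        for (String host : hosts) {
            String rack = hostToRack.get(host);
            if (rack != null) {
                racks.add(rack);
            }
        }
        return racks;
    }
}
```

With three hosts, `racksResolvedInAm` performs three resolver calls, while `racksFromScheduler` performs none and yields the same set of racks. Whether the real saving is worth a new YARN API depends on how expensive the actual resolver is, which this sketch does not measure.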
