[ https://issues.apache.org/jira/browse/YARN-4496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15107685#comment-15107685 ]
Jian He commented on YARN-4496:
-------------------------------

Uploaded a patch:
- added a new RequestHedgingRMFailoverProxyProvider. When the client tries to fail over, it uses a separate proxy object to talk to each RM simultaneously; each proxy retries its RM until one of them receives a response from the active RM. All the other requests are then cancelled.
- changed the default rm-retry-interval to 5 seconds; I think a 30-second interval is too long.

> Improve HA ResourceManager Failover detection on the client
> -----------------------------------------------------------
>
>                 Key: YARN-4496
>                 URL: https://issues.apache.org/jira/browse/YARN-4496
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: client, resourcemanager
>            Reporter: Arun Suresh
>            Assignee: Jian He
>         Attachments: YARN-4496.1.patch
>
>
> HDFS deployments can currently use the {{RequestHedgingProxyProvider}} to
> improve Namenode failover detection in the client. It does this by
> concurrently trying all namenodes and picking the one that returns a
> successful response fastest as the active node.
> It would be useful to have a similar ProxyProvider for the YARN RM (possibly
> by converging some of the class hierarchies to use the same ProxyProvider).
> This would be especially useful for large YARN deployments with multiple
> standby RMs, where clients would be able to pick the active RM without having
> to traverse a list of configured RMs.
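
For illustration, the hedging behaviour described in the comment could be sketched roughly as follows. This is not the code in YARN-4496.1.patch: {{HedgedFailoverSketch}}, {{RmProxy}} and {{findActive}} are hypothetical names, and a proxy is assumed to throw when its RM is standby or unreachable. The sketch leans on the JDK's {{ExecutorService.invokeAny}}, which returns the first successful result and cancels the remaining tasks.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class HedgedFailoverSketch {

    /** Hypothetical stand-in for a per-RM client proxy (not a Hadoop type). */
    interface RmProxy {
        // Assumed to throw if the RM is in standby or cannot be reached.
        String call() throws Exception;
    }

    /**
     * Tries every RM concurrently, one task per proxy. Each task keeps
     * retrying its own RM until it succeeds or is cancelled.
     */
    static String findActive(List<RmProxy> proxies, long retryIntervalMs)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(proxies.size());
        try {
            List<Callable<String>> tasks = new ArrayList<>();
            for (RmProxy proxy : proxies) {
                tasks.add(() -> {
                    while (true) {
                        try {
                            return proxy.call();
                        } catch (InterruptedException e) {
                            throw e; // cancelled: another proxy already won
                        } catch (Exception e) {
                            // Not active yet: wait, then retry the same RM.
                            // sleep() throws InterruptedException on cancel.
                            Thread.sleep(retryIntervalMs);
                        }
                    }
                });
            }
            // Returns the first successful response; the rest are cancelled.
            return pool.invokeAny(tasks);
        } finally {
            pool.shutdownNow();
        }
    }
}
```

A caller would pass one proxy per configured RM plus the retry interval (for example 5000 ms, to match the proposed default) and use the returned response to decide which RM to keep talking to.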