[
https://issues.apache.org/jira/browse/YARN-1028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13823072#comment-13823072
]
Karthik Kambatla commented on YARN-1028:
----------------------------------------
A few points to discuss on the patch:
# RMProxy uses failoverOnNetworkException for failover. When connecting to an
RM, the client retries once per second, up to
{{ipc.client.connect.max.retries}} times. The default is currently 10, so a
client/AM/NM spends at least 10 seconds trying one RM before moving on to the
other. We should try to avoid this. Maybe we can explicitly set
{{ipc.client.connect.max.retries}} to a low value (1 or 2) in RMProxy, or add
an {{ipc.client.connect.retry.interval}} and set it to a low value
(0.1 second) in RMProxy.
# Should we have a config for the number of failovers to tolerate, and a
retry interval for them?
# Unit test: should we use MiniYarnCluster with HA (YARN-1181), or create a
DummyRM that listens on the three RPC servers and counts the incoming
connections every time the Clients/AMs/NMs connect to it?
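
To put rough numbers on the first two points, here is a small standalone sketch. It does not use any Hadoop classes: the class and method names are hypothetical, the delay is approximated as retries times interval (as described above), and the failover loop is only an assumption about how a failover limit might behave, not what the patch does.

```java
public class FailoverDelaySketch {

    /**
     * Approximate time spent on one unreachable RM before failing over,
     * modelled as (connect retries) x (interval between retries).
     */
    public static long approxDelayBeforeFailoverMillis(int maxRetries, long retryIntervalMillis) {
        return (long) maxRetries * retryIntervalMillis;
    }

    /**
     * Walks the RM list round-robin starting at index 0 and returns the index
     * of the first reachable RM, or -1 once maxFailovers failovers are used up.
     * The reachable[] array stands in for real connection attempts.
     */
    public static int firstReachable(boolean[] reachable, int maxFailovers) {
        int current = 0;
        for (int failovers = 0; failovers <= maxFailovers; failovers++) {
            if (reachable[current]) {
                return current;
            }
            current = (current + 1) % reachable.length; // fail over to the next RM
        }
        return -1;
    }

    public static void main(String[] args) {
        // Current behaviour as described: 10 retries, 1 second apart.
        System.out.println(approxDelayBeforeFailoverMillis(10, 1000)); // 10000
        // Option A: lower the retry count to 2.
        System.out.println(approxDelayBeforeFailoverMillis(2, 1000));  // 2000
        // Option B: keep 10 retries but use a 0.1 second interval.
        System.out.println(approxDelayBeforeFailoverMillis(10, 100));  // 1000
        // Two RMs, the first one down, one failover allowed.
        System.out.println(firstReachable(new boolean[] {false, true}, 1)); // 1
    }
}
```

Under this model, either option cuts the wait before the first failover from about 10 seconds to 1 or 2 seconds; which one is preferable depends on whether we want fewer attempts or faster ones.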
> Add FailoverProxyProvider like capability to RMProxy
> ----------------------------------------------------
>
> Key: YARN-1028
> URL: https://issues.apache.org/jira/browse/YARN-1028
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Bikas Saha
> Assignee: Karthik Kambatla
> Attachments: yarn-1028-1.patch, yarn-1028-draft-cumulative.patch
>
>
> RMProxy layer currently abstracts RM discovery and implements it by looking
> up service information from configuration. Motivated by HDFS and using
> existing classes from Common, we can add failover proxy providers that may
> provide RM discovery in extensible ways.