[
https://issues.apache.org/jira/browse/YARN-1028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13823230#comment-13823230
]
Karthik Kambatla commented on YARN-1028:
----------------------------------------
bq. How do we know that the standby RM will be ready in that much time (if the
current RM has crashed)?
The Standby RM might not be ready in a second. However, we should connect as
soon as it is ready. There is no harm in alternating between the RMs until the
RM-failover succeeds.
bq. What kind of network errors cause failoverOnNetworkException to be
triggered?
There are a bunch of exceptions that result in RETRY_BY_FAILOVER:
ConnectException, SocketException, etc.
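To make the classification concrete, here is a minimal sketch of how an exception could be mapped to a retry decision. It is not the actual policy code from Common: the class FailoverDecision, the enum Action and the method decide are hypothetical names, and only the two exceptions named above are covered.

```java
import java.net.SocketException;

// Hypothetical sketch, not the real retry policy from Hadoop Common.
class FailoverDecision {
    // Illustrative subset of possible decisions.
    enum Action { FAIL, RETRY_BY_FAILOVER }

    // Maps an exception to a decision. Only the exceptions named in the
    // comment above are handled; the real list is longer.
    static Action decide(Exception e) {
        // java.net.ConnectException is a subclass of SocketException,
        // so a single instanceof check covers both.
        if (e instanceof SocketException) {
            return Action.RETRY_BY_FAILOVER;
        }
        // Anything else is treated as a non-network error in this sketch.
        return Action.FAIL;
    }
}
```

With this sketch, a ConnectException or SocketException yields RETRY_BY_FAILOVER, while an unrelated exception such as IllegalStateException yields FAIL.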
bq. Can they be caused due to glitches in the network, thus we would fail over
to the standby and failback again?
Yes, the Client/NM might try connecting to a different RM during a network
glitch. However, that is different from an RM failover. The worst thing that can
happen when the Client/NM tries to connect to the Standby and fails to (or, in
the future, connects and realizes it is not Active) is that it tries the Active
RM on the next attempt.
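The alternating behavior described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in the attached patch: AlternatingRmClient, RmCall and callWithFailover are hypothetical names, the RM addresses are opaque strings supplied by the caller, and the sleep between attempts is an assumed parameter.

```java
import java.net.SocketException;
import java.util.List;

// Hypothetical sketch of a Client/NM alternating between RMs on network errors.
class AlternatingRmClient {
    // Hypothetical stand-in for one RPC issued against a given RM address.
    interface RmCall<T> {
        T invoke(String rmAddress) throws Exception;
    }

    // Tries the RMs in round-robin order until one call succeeds or
    // maxAttempts is exhausted. Only network errors trigger a switch.
    static <T> T callWithFailover(List<String> rmAddresses, RmCall<T> call,
                                  int maxAttempts, long sleepMillis) throws Exception {
        if (rmAddresses.isEmpty() || maxAttempts <= 0) {
            throw new IllegalArgumentException("need at least one RM and one attempt");
        }
        SocketException last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            // Alternate: attempt 0 goes to the first RM, attempt 1 to the next, and so on.
            String target = rmAddresses.get(attempt % rmAddresses.size());
            try {
                return call.invoke(target);
            } catch (SocketException e) {
                // Covers ConnectException too. A glitch or a not-yet-ready
                // Standby only costs one attempt; the next one goes to the other RM.
                last = e;
                Thread.sleep(sleepMillis);
            }
        }
        // Every attempt hit a network error; surface the most recent one.
        throw last;
    }
}
```

With two RMs, a failed attempt against the Standby is followed by an attempt against the Active, so the cost of a spurious switch is bounded by one attempt plus the sleep interval.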
> Add FailoverProxyProvider like capability to RMProxy
> ----------------------------------------------------
>
> Key: YARN-1028
> URL: https://issues.apache.org/jira/browse/YARN-1028
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Bikas Saha
> Assignee: Karthik Kambatla
> Attachments: yarn-1028-1.patch, yarn-1028-draft-cumulative.patch
>
>
> RMProxy layer currently abstracts RM discovery and implements it by looking
> up service information from configuration. Motivated by HDFS and using
> existing classes from Common, we can add failover proxy providers that may
> provide RM discovery in extensible ways.