[ https://issues.apache.org/jira/browse/YARN-1028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13823072#comment-13823072 ]

Karthik Kambatla commented on YARN-1028:
----------------------------------------

A few points to discuss on the patch:
# RMProxy uses failoverOnNetworkException for failover. When connecting to an
RM, the client retries the connection every 1 second, up to
{{ipc.client.connect.max.retries}} times. The default for this is currently
10, so clients, AMs, and NMs try to connect to one RM for at least 10 seconds
before trying the other one. We should try to avoid this. Maybe we can
explicitly set {{ipc.client.connect.max.retries}} to a low value (1 or 2) in
RMProxy, or add an {{ipc.client.connect.retry.interval}} and set it to a low
value (0.1 second) in RMProxy.
# Should we have a config for the number of failovers to tolerate, and a
retry interval between failovers?
# Unit test: should we use MiniYarnCluster with HA (YARN-1181), or create a
DummyRM that runs the three RPC servers and counts the incoming connections
every time clients/AMs/NMs connect to it?
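To make the cost in points 1 and 2 concrete, here is a minimal sketch of how per-RM connect retries and a failover limit interact. It is NOT RMProxy or Common's RetryPolicies: the class RmFailoverSketch, its Result type, the maxFailovers setting, and the "rm1:8032"/"rm2:8032" addresses are all hypothetical. It assumes each failed attempt is followed by one retry-interval sleep (simulated, not real), and it ignores connection timeouts, so the reported wait is only a lower bound.

```java
import java.util.List;
import java.util.function.Predicate;

/**
 * Hypothetical sketch, not Hadoop code: a client cycles through RM addresses,
 * trying each one up to maxConnectRetries times before failing over, and gives
 * up after maxFailovers failovers. Time is simulated, not slept.
 */
class RmFailoverSketch {
    /** Outcome of a connect call; connectedTo is null if every attempt failed. */
    static final class Result {
        final String connectedTo;
        final int failovers;
        final long waitedMs;

        Result(String connectedTo, int failovers, long waitedMs) {
            this.connectedTo = connectedTo;
            this.failovers = failovers;
            this.waitedMs = waitedMs;
        }
    }

    private final int maxConnectRetries;       // plays the role of ipc.client.connect.max.retries
    private final long connectRetryIntervalMs; // plays the role of the proposed retry interval
    private final int maxFailovers;            // hypothetical "failovers to tolerate" setting

    RmFailoverSketch(int maxConnectRetries, long connectRetryIntervalMs, int maxFailovers) {
        if (maxConnectRetries < 1 || connectRetryIntervalMs < 0 || maxFailovers < 0) {
            throw new IllegalArgumentException("invalid retry/failover settings");
        }
        this.maxConnectRetries = maxConnectRetries;
        this.connectRetryIntervalMs = connectRetryIntervalMs;
        this.maxFailovers = maxFailovers;
    }

    /** Lower bound on time spent on an unreachable RM before failing over. */
    long minTimeBeforeFailoverMs() {
        return maxConnectRetries * connectRetryIntervalMs;
    }

    /** Tries RMs round-robin; isReachable stands in for a real connection attempt. */
    Result connect(List<String> rmAddresses, Predicate<String> isReachable) {
        if (rmAddresses.isEmpty()) {
            throw new IllegalArgumentException("no RM addresses configured");
        }
        long waitedMs = 0;
        int failovers = 0;
        int current = 0;
        while (true) {
            String rm = rmAddresses.get(current);
            for (int attempt = 0; attempt < maxConnectRetries; attempt++) {
                if (isReachable.test(rm)) {
                    return new Result(rm, failovers, waitedMs);
                }
                waitedMs += connectRetryIntervalMs; // simulated sleep after a failed attempt
            }
            if (failovers >= maxFailovers) {
                return new Result(null, failovers, waitedMs); // failover budget exhausted
            }
            failovers++;
            current = (current + 1) % rmAddresses.size(); // fail over to the next RM
        }
    }
}
```

With 10 retries at 1000 ms, an unreachable first RM costs at least 10 seconds before the client fails over; with 1 retry at 100 ms, the same failover happens after 0.1 seconds. Keeping the failover limit separate from the per-RM retry count mirrors the split between points 1 and 2.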


> Add FailoverProxyProvider like capability to RMProxy
> ----------------------------------------------------
>
>                 Key: YARN-1028
>                 URL: https://issues.apache.org/jira/browse/YARN-1028
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Bikas Saha
>            Assignee: Karthik Kambatla
>         Attachments: yarn-1028-1.patch, yarn-1028-draft-cumulative.patch
>
>
> RMProxy layer currently abstracts RM discovery and implements it by looking 
> up service information from configuration. Motivated by HDFS and using 
> existing classes from Common, we can add failover proxy providers that may 
> provide RM discovery in extensible ways.



--
This message was sent by Atlassian JIRA
(v6.1#6144)
