[ https://issues.apache.org/jira/browse/YARN-10767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17357070#comment-17357070 ]
D M Murali Krishna Reddy commented on YARN-10767:
-------------------------------------------------
Thanks [~BilwaST], [~Jim_Brennan] for the review.
1. I will handle the null check in the v2 patch.
2. Yes, findActiveRMHAId contacts all the RMs, but there is no retry policy in
this case: it tries to connect to each RM only once and returns the active one
as soon as it is able to connect to any one of the RMs. Yes, I have verified
the fix on an HA cluster.
3. I also believe it is not necessary to loop through all the RMs once we have
found the active RM. I just wanted to improve the behavior without affecting
the existing functionality, so I have only changed the order in which the RMs
are tried, so that we try to connect to the active RM first.
4. I am also not sure why the method is named execOnActiveRM when it currently
executes on all the RMs in a loop.
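
To illustrate point 3, here is a minimal sketch of the reordering idea. It is
not the code from the patch: the class ActiveFirstSketch and the methods
activeFirst and execInOrder are hypothetical names, and the active RM ID is
assumed to have been looked up already (for example via findActiveRMHAId).
Returning on the first success is also an assumption of the sketch, not
something the patch changes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/** Hypothetical helper for illustration only; not the actual Hadoop code. */
class ActiveFirstSketch {

  /** Returns a copy of rmIds with activeId moved to the front, if present. */
  static List<String> activeFirst(List<String> rmIds, String activeId) {
    List<String> ordered = new ArrayList<>(rmIds);
    // Null or unknown active ID: keep the configured order unchanged.
    if (activeId != null && ordered.remove(activeId)) {
      ordered.add(0, activeId);
    }
    return ordered;
  }

  /** Tries each RM in order and returns the first successful result. */
  static <T> T execInOrder(List<String> orderedRmIds,
                           Function<String, T> call) {
    RuntimeException last = null;
    for (String rmId : orderedRmIds) {
      try {
        return call.apply(rmId);
      } catch (RuntimeException e) {
        last = e; // remember the failure and move on to the next RM
      }
    }
    if (last != null) {
      throw last;
    }
    throw new IllegalStateException("no ResourceManager IDs configured");
  }
}
```

With RM IDs rm1, rm2 and rm3 and rm2 active, activeFirst yields the order
rm2, rm1, rm3, so the standby RM is only contacted if the call against the
active RM fails.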
> Yarn Logs Command retrying on Standby RM for 30 times
> -----------------------------------------------------
>
> Key: YARN-10767
> URL: https://issues.apache.org/jira/browse/YARN-10767
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: D M Murali Krishna Reddy
> Assignee: D M Murali Krishna Reddy
> Priority: Major
> Attachments: YARN-10767.001.patch
>
>
> When ResourceManager HA is enabled and the first RM is unavailable,
> executing "bin/yarn logs -applicationId <appID> -am 1" raises a
> ConnectionException while connecting to the first RM. The ConnectionException
> occurs 30 times before the command tries to connect to the second RM.
>
> This can be optimized by trying to fetch the logs from the Active RM.