[jira] [Commented] (YARN-10767) Yarn Logs Command retrying on Standby RM for 30 times

D M Murali Krishna Reddy (Jira) Thu, 03 Jun 2021 22:29:08 -0700


    [ 
https://issues.apache.org/jira/browse/YARN-10767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17357070#comment-17357070
 ]


D M Murali Krishna Reddy commented on YARN-10767:
-------------------------------------------------

Thanks [~BilwaST] and [~Jim_Brennan] for the review.

1. I will handle the null check in the v2 patch.

2. Yes, findActiveRMHAId contacts all the RMs, but there is no retry policy in 
this case: it tries to connect to each RM only once and returns the active one 
as soon as it is able to connect to any of the RMs. Yes, I have verified the 
fix on an HA cluster.

3. I also believe it is not necessary to loop through all the RMs once we have 
found the active RM. I wanted to improve the existing functionality without 
affecting the existing behavior, so I have only changed the order in which the 
RMs are tried, so that we try to connect to the active RM first.

4. I am also not sure why the method is named execOnActiveRM when it currently 
executes on all the RMs in a loop.
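
To illustrate the reordering described in point 3, here is a minimal sketch. 
It is not the code from the patch: the class ActiveFirstOrdering and the 
method activeFirst are hypothetical names, and the active RM id is assumed to 
have been looked up beforehand (it may be null if no active RM was found, 
which is the case the null check in point 1 guards against).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not the YARN-10767 patch itself.
public class ActiveFirstOrdering {

    /**
     * Returns a copy of rmIds with activeId moved to the front.
     * If activeId is null or is not in the list, the original order is kept,
     * so the caller still falls back to trying every RM in turn.
     */
    static List<String> activeFirst(List<String> rmIds, String activeId) {
        List<String> ordered = new ArrayList<>(rmIds);
        // remove(Object) returns true only if the id was present in the list
        if (activeId != null && ordered.remove(activeId)) {
            ordered.add(0, activeId);
        }
        return ordered;
    }

    public static void main(String[] args) {
        // Active RM is rm2: it is tried first, the rest keep their order.
        System.out.println(activeFirst(Arrays.asList("rm1", "rm2", "rm3"), "rm2"));
        // prints [rm2, rm1, rm3]

        // No active RM found: the configured order is left unchanged.
        System.out.println(activeFirst(Arrays.asList("rm1", "rm2"), null));
        // prints [rm1, rm2]
    }
}
```

Because only the order changes, every RM is still attempted, which keeps the 
existing fallback behavior while avoiding the retries against a standby or 
unavailable RM in the common case.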

> Yarn Logs Command retrying on Standby RM for 30 times
> -----------------------------------------------------
>
>                 Key: YARN-10767
>                 URL: https://issues.apache.org/jira/browse/YARN-10767
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: D M Murali Krishna Reddy
>            Assignee: D M Murali Krishna Reddy
>            Priority: Major
>         Attachments: YARN-10767.001.patch
>
>
> When ResourceManager HA is enabled and the first RM is unavailable, 
> executing "bin/yarn logs -applicationId <appID> -am 1" raises a 
> ConnectionException while connecting to the first RM. The ConnectionException 
> occurs 30 times before the command tries to connect to the second RM.
>  
> This can be optimized by trying to fetch the logs from the Active RM.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
