[ 
https://issues.apache.org/jira/browse/YARN-4180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15088392#comment-15088392
 ] 

Junping Du commented on YARN-4180:
----------------------------------

Thanks [~kasha] for the help on this.

> AMLauncher does not retry on failures when talking to NM 
> ---------------------------------------------------------
>
>                 Key: YARN-4180
>                 URL: https://issues.apache.org/jira/browse/YARN-4180
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.7.1
>            Reporter: Anubhav Dhoot
>            Assignee: Anubhav Dhoot
>            Priority: Critical
>             Fix For: 2.7.2, 2.6.4
>
>         Attachments: YARN-4180-branch-2.7.2.txt, YARN-4180.001.patch, 
> YARN-4180.002.patch, YARN-4180.002.patch, YARN-4180.002.patch
>
>
> We see issues when the RM tries to launch a container while an NM is 
> restarting, and we get exceptions like NMNotReadyException. While YARN-3842 
> added retry for other clients of the NM (mainly AMs), it is not used by 
> AMLauncher in the RM, so these intermittent errors cause job failures. This 
> can manifest during a rolling restart of NMs. 
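
For illustration only, the retry behaviour described above could look roughly like the sketch below. It is not the YARN-4180 patch and does not use Hadoop's retry classes: RetrySketch, NotReadyException, and callWithRetry are hypothetical names, and the attempt count and sleep interval are arbitrary assumptions.

```java
import java.util.concurrent.Callable;

public class RetrySketch {
    /** Hypothetical stand-in for a transient "NM not ready" failure; not the Hadoop class. */
    static class NotReadyException extends Exception {
        NotReadyException(String msg) { super(msg); }
    }

    /**
     * Runs the call, retrying only on NotReadyException, with a fixed sleep
     * between attempts. Any other exception propagates immediately.
     */
    static <T> T callWithRetry(Callable<T> call, int maxAttempts, long sleepMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        NotReadyException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.call();
            } catch (NotReadyException e) {
                last = e; // transient failure: remember it and try again
                if (attempt < maxAttempts) {
                    Thread.sleep(sleepMillis);
                }
            }
        }
        throw last; // all attempts failed with the transient error
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Simulated launch that fails twice while the node is "restarting".
        String result = callWithRetry(() -> {
            if (++calls[0] < 3) {
                throw new NotReadyException("node manager restarting");
            }
            return "container launched";
        }, 5, 10L);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

Without such a wrapper, the first transient failure is surfaced to the caller directly, which matches the job failures described in the report.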



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
