Anubhav Dhoot created YARN-4180:
-----------------------------------
Summary: AMLauncher does not retry on failures when talking to NM
Key: YARN-4180
URL: https://issues.apache.org/jira/browse/YARN-4180
Project: Hadoop YARN
Issue Type: Bug
Components: resourcemanager
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot
We see failures when the RM tries to launch a container while an NM is
restarting, with exceptions such as NMNotYetReadyException. YARN-3842 added
retries for other clients of the NM (mainly AMs), but the AMLauncher in the RM
does not use them, so these intermittent errors cause job failures. This can
happen during a rolling restart of NMs.
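A minimal sketch of the kind of retry the AMLauncher is missing, assuming a simple fixed-interval policy. The names `NmRetrySketch`, `callWithRetry`, and the local `NMNotYetReadyException` stand-in are hypothetical and are not the YARN API. An actual fix would more likely reuse the retry support that YARN-3842 added for other NM clients.

```java
import java.util.concurrent.Callable;

/** Hypothetical sketch: retry an NM call while the NM reports it is not yet ready. */
public class NmRetrySketch {

    /** Hypothetical local stand-in for YARN's NMNotYetReadyException. */
    static class NMNotYetReadyException extends Exception {
        NMNotYetReadyException(String msg) { super(msg); }
    }

    /**
     * Runs {@code call}, retrying up to {@code maxAttempts} times in total with a
     * fixed sleep between attempts while the NM is not yet ready. Any other
     * exception, or the final NMNotYetReadyException, is propagated.
     */
    static <T> T callWithRetry(Callable<T> call, int maxAttempts, long sleepMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        for (int attempt = 1; ; attempt++) {
            try {
                return call.call();
            } catch (NMNotYetReadyException e) {
                if (attempt >= maxAttempts) {
                    throw e; // out of retries: surface the failure to the caller
                }
                Thread.sleep(sleepMillis); // NM may be restarting; wait, then retry
            }
        }
    }

    public static void main(String[] args) throws Exception {
        // Simulated NM that rejects the first two launch requests (e.g. mid-restart).
        int[] calls = {0};
        String result = callWithRetry(() -> {
            calls[0]++;
            if (calls[0] < 3) {
                throw new NMNotYetReadyException("NM is still starting");
            }
            return "container launched";
        }, 5, 10);
        System.out.println(result + " after " + calls[0] + " attempts");
        // prints "container launched after 3 attempts"
    }
}
```

Only the "not yet ready" condition is retried here; other errors still fail fast, so a genuinely broken NM is not masked by the retry loop.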
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)