Srikanth Sundarrajan created YARN-3644:
------------------------------------------

             Summary: Node manager shuts down if unable to connect with RM
                 Key: YARN-3644
                 URL: https://issues.apache.org/jira/browse/YARN-3644
             Project: Hadoop YARN
          Issue Type: Bug
          Components: nodemanager
            Reporter: Srikanth Sundarrajan


When the NM is unable to connect to the RM within the maximum wait time, the NM shuts itself down.

{code}
          } catch (ConnectException e) {
            //catch and throw the exception if tried MAX wait time to connect RM
            dispatcher.getEventHandler().handle(
                new NodeManagerEvent(NodeManagerEventType.SHUTDOWN));
            throw new YarnRuntimeException(e);
{code}

In large clusters, if the RM is down for maintenance for an extended period, all 
the NMs shut themselves down, requiring additional work to bring them back up.

Setting yarn.resourcemanager.connect.max-wait.ms to -1 has other side effects: 
non-connection failures are then retried indefinitely by all YarnClients 
(via RMProxy).
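
To illustrate the distinction being asked for, here is a minimal standalone sketch of a retry loop that keeps retrying connection failures only and lets every other failure propagate immediately. It is not the NodeManager or RMProxy code: the class ConnectRetrySketch, the method callWithConnectRetry and its parameters are hypothetical names, and treating a negative maxWaitMs as "retry forever" is an assumption made for this example.

```java
import java.net.ConnectException;
import java.util.concurrent.Callable;

public class ConnectRetrySketch {

    /**
     * Hypothetical helper (not part of Hadoop). Retries the call only when it
     * fails with ConnectException. A negative maxWaitMs is assumed to mean
     * "keep retrying connection failures forever".
     */
    public static <T> T callWithConnectRetry(Callable<T> call, long maxWaitMs,
                                             long intervalMs) throws Exception {
        final long start = System.nanoTime();
        while (true) {
            try {
                return call.call();
            } catch (ConnectException e) {
                long elapsedMs = (System.nanoTime() - start) / 1_000_000L;
                boolean retryForever = maxWaitMs < 0;
                if (!retryForever && elapsedMs + intervalMs > maxWaitMs) {
                    // Wait budget exhausted: hand the failure to the caller,
                    // who decides whether to shut down or keep waiting.
                    throw e;
                }
                Thread.sleep(intervalMs);
            }
            // Any exception other than ConnectException is not caught here,
            // so non-connection failures are never retried by this loop.
        }
    }

    public static void main(String[] args) throws Exception {
        final int[] attempts = {0};
        // Simulated RM that is unreachable for the first two attempts.
        String result = callWithConnectRetry(() -> {
            if (++attempts[0] < 3) {
                throw new ConnectException("RM unreachable (simulated)");
            }
            return "registered";
        }, -1, 10);
        System.out.println(result + " after " + attempts[0] + " attempts");
    }
}
```

With a loop of this shape, an NM could wait out a long RM maintenance window without causing unrelated failures to be retried indefinitely in other clients.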



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
