Srikanth Sundarrajan created YARN-3644:

             Summary: Node manager shuts down if unable to connect with RM
                 Key: YARN-3644
             Project: Hadoop YARN
          Issue Type: Bug
          Components: nodemanager
            Reporter: Srikanth Sundarrajan

When NM is unable to connect to RM, NM shuts itself down.

          } catch (ConnectException e) {
            //catch and throw the exception if tried MAX wait time to connect RM
            dispatcher.getEventHandler().handle(
                new NodeManagerEvent(NodeManagerEventType.SHUTDOWN));
            throw new YarnRuntimeException(e);

In large clusters, if the RM is down for maintenance for an extended period, 
all the NMs shut themselves down, requiring additional work to bring them back up.

Setting yarn.resourcemanager.connect.max-wait.ms to -1 is not a usable 
workaround, because it has a side effect: non-connection failures are then 
retried indefinitely by all YarnClients (via RMProxy).
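
The distinction drawn above, between connection failures and every other kind 
of failure, can be pictured with a minimal sketch. This is not the RMProxy or 
NodeStatusUpdater code: the class ConnectRetrySketch, the method callWithRetry 
and the retryIntervalMs parameter are hypothetical names used only for 
illustration. The sketch assumes the desired behaviour is to keep retrying 
while the RM is unreachable and to fail fast on anything else.

```java
import java.net.ConnectException;
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration only; not part of Hadoop YARN.
class ConnectRetrySketch {

    /**
     * Runs the action, retrying indefinitely while it fails with
     * ConnectException (RM unreachable). Any other exception is treated
     * as a non-connection failure and propagates on the first occurrence.
     */
    static <T> T callWithRetry(Callable<T> action, long retryIntervalMs)
            throws Exception {
        while (true) {
            try {
                return action.call();
            } catch (ConnectException e) {
                // RM is not reachable yet: wait and try again instead of
                // shutting down the caller.
                Thread.sleep(retryIntervalMs);
            }
            // Exceptions other than ConnectException are not caught here,
            // so they are never retried.
        }
    }

    public static void main(String[] args) throws Exception {
        AtomicInteger attempts = new AtomicInteger();
        // Simulated RM that refuses the first two connection attempts.
        String result = callWithRetry(() -> {
            if (attempts.incrementAndGet() < 3) {
                throw new ConnectException("RM not reachable");
            }
            return "registered";
        }, 10L);
        // prints "registered after 3 attempts"
        System.out.println(result + " after " + attempts.get() + " attempts");
    }
}
```

With this split, an unbounded wait applies only to the "RM is down" case, while 
a failure such as an IllegalStateException thrown by the action surfaces after a 
single attempt.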
