[ https://issues.apache.org/jira/browse/YARN-3644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14583189#comment-14583189 ]
Raju Bairishetti commented on YARN-3644:
----------------------------------------

[~amareshwari] [~Naganarasimha] Thanks for the review and comments.

[~Naganarasimha] Yes, this jira is only to make NM wait for RM.

> Node manager shuts down if unable to connect with RM
> ----------------------------------------------------
>
>                 Key: YARN-3644
>                 URL: https://issues.apache.org/jira/browse/YARN-3644
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>            Reporter: Srikanth Sundarrajan
>            Assignee: Raju Bairishetti
>         Attachments: YARN-3644.001.patch, YARN-3644.patch
>
>
> When NM is unable to connect to RM, NM shuts itself down.
> {code}
>   } catch (ConnectException e) {
>     //catch and throw the exception if tried MAX wait time to connect RM
>     dispatcher.getEventHandler().handle(
>         new NodeManagerEvent(NodeManagerEventType.SHUTDOWN));
>     throw new YarnRuntimeException(e);
> {code}
> In large clusters, if RM is down for maintenance for a longer period, all the
> NMs shut themselves down, requiring additional work to bring up the NMs.
> Setting the yarn.resourcemanager.connect.wait-ms to -1 has other side
> effects, where non-connection failures are being retried infinitely by all
> YarnClients (via RMProxy).

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)