[ https://issues.apache.org/jira/browse/YARN-3644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14623462#comment-14623462 ]
Raju Bairishetti commented on YARN-3644:
----------------------------------------

Thanks [~varun_saxena] for the review and comments.

bq. The config name is yarn.nodemanager.shutdown.on.RM.connection.failures. All our config names are in lowercase, just for the sake of consistency, maybe RM can be in lowercase too. Thoughts?

Agreed. I will change it to lowercase.

bq. The test doesn't really check for whether ConnectionException was thrown or NM Shutdown event was called or not.

I also ran the test in debugger mode, and it hits all the source changes. *I agree, I will rewrite this test using Mockito to make it more generic.*

> Node manager shuts down if unable to connect with RM
> ----------------------------------------------------
>
>                 Key: YARN-3644
>                 URL: https://issues.apache.org/jira/browse/YARN-3644
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>            Reporter: Srikanth Sundarrajan
>            Assignee: Raju Bairishetti
>         Attachments: YARN-3644.001.patch, YARN-3644.001.patch,
> YARN-3644.002.patch, YARN-3644.003.patch, YARN-3644.patch
>
>
> When NM is unable to connect to RM, NM shuts itself down.
> {code}
>         } catch (ConnectException e) {
>           // catch and throw the exception if tried MAX wait time to connect RM
>           dispatcher.getEventHandler().handle(
>               new NodeManagerEvent(NodeManagerEventType.SHUTDOWN));
>           throw new YarnRuntimeException(e);
> {code}
> In large clusters, if RM is down for maintenance for a longer period, all the
> NMs shut themselves down, requiring additional work to bring up the NMs.
> Setting yarn.resourcemanager.connect.wait-ms to -1 has other side
> effects, where non-connection failures are retried infinitely by all
> YarnClients (via RMProxy).

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)