[jira] [Commented] (YARN-3644) Node manager shuts down if unable to connect with RM

Varun Saxena (JIRA) Fri, 10 Jul 2015 08:31:20 -0700

    [ 
https://issues.apache.org/jira/browse/YARN-3644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14622468#comment-14622468
 ]


Varun Saxena commented on YARN-3644:
------------------------------------

Moreover, I think most, if not all, of the My**** classes you added are not 
required. You can use mocking with the existing classes to achieve the 
same result.
You just need to throw an exception when heartbeat is called, which is easy 
to do with Mockito.
We can probably change the visibility of the {{getRMClient}} method in one of 
the MyNodeStatusUpdater* classes so that it is visible for use with 
Mockito.
This will greatly reduce unnecessary code.

And IMHO, changing a method of a private class in test scope to public 
shouldn't be an issue. Thoughts?
You can probably explore this option to refactor your code.
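
To illustrate the idea of making the heartbeat call throw instead of adding 
dedicated My**** subclasses, here is a minimal sketch. It uses only the JDK 
({{java.lang.reflect.Proxy}}) so that it runs standalone; {{Tracker}}, 
{{failingTracker}} and {{heartbeatFailsToConnect}} are hypothetical names for 
illustration and are not the real YARN types or anything from the patch. With 
Mockito on the test classpath, the stub would presumably shrink to a single 
{{when(...).thenThrow(...)}} call on a mocked RM client.

```java
import java.io.IOException;
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.net.ConnectException;

public class HeartbeatFailureSketch {

    /** Hypothetical stand-in for the RM client interface (not the real YARN type). */
    public interface Tracker {
        String nodeHeartbeat(String request) throws IOException;
    }

    /**
     * Builds a stub whose every call throws ConnectException, simulating an
     * unreachable RM. Assumption: with Mockito this would be roughly
     *   when(tracker.nodeHeartbeat(any())).thenThrow(new ConnectException("..."));
     */
    public static Tracker failingTracker() {
        InvocationHandler handler = (proxy, method, args) -> {
            throw new ConnectException("RM unreachable (simulated)");
        };
        return (Tracker) Proxy.newProxyInstance(
                Tracker.class.getClassLoader(),
                new Class<?>[] {Tracker.class},
                handler);
    }

    /** Returns true only if the heartbeat failed with a connection error. */
    public static boolean heartbeatFailsToConnect(Tracker tracker) {
        try {
            tracker.nodeHeartbeat("status");
            return false;
        } catch (ConnectException e) {
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // The failing stub triggers the ConnectException path.
        System.out.println(heartbeatFailsToConnect(failingTracker()));
        // A stub that answers normally does not.
        System.out.println(heartbeatFailsToConnect(request -> "ok"));
    }
}
```

The point is only that the failure can be injected at the RM client boundary; 
the test would then assert on how the NM reacts to it.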

> Node manager shuts down if unable to connect with RM
> ----------------------------------------------------
>
>                 Key: YARN-3644
>                 URL: https://issues.apache.org/jira/browse/YARN-3644
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>            Reporter: Srikanth Sundarrajan
>            Assignee: Raju Bairishetti
>         Attachments: YARN-3644.001.patch, YARN-3644.001.patch, 
> YARN-3644.002.patch, YARN-3644.003.patch, YARN-3644.patch
>
>
> When NM is unable to connect to RM, NM shuts itself down.
> {code}
>           } catch (ConnectException e) {
>             //catch and throw the exception if tried MAX wait time to connect RM
>             dispatcher.getEventHandler().handle(
>                 new NodeManagerEvent(NodeManagerEventType.SHUTDOWN));
>             throw new YarnRuntimeException(e);
> {code}
> In large clusters, if the RM is down for maintenance for a longer period, 
> all the NMs shut themselves down, requiring additional work to bring the 
> NMs back up.
> Setting yarn.resourcemanager.connect.wait-ms to -1 has other side 
> effects: non-connection failures are then retried indefinitely by all 
> YarnClients (via RMProxy).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
