[jira] [Commented] (YARN-3644) Node manager shuts down if unable to connect with RM

Varun Saxena (JIRA) Sat, 11 Jul 2015 15:58:42 -0700

    [ 
https://issues.apache.org/jira/browse/YARN-3644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14623616#comment-14623616
 ]


Varun Saxena commented on YARN-3644:
------------------------------------

[~raju.bairishetti],
bq. I ran the test in debugger mode also. Test is hitting all the source 
changes
I did not mean that the test is not hitting the change. It does, and I 
verified that from the logs as well.
What I meant is that some Mockito#verify statements, or other assertions, 
can be added to check that the required functions or flows are actually hit.
The assertion that the Service state is STARTED can hold irrespective of 
whether your code is hit or not.
Let us say somebody changes the code in future so that your part of the code 
becomes conditional. Unlikely, but you never know what happens 6 months down 
the line.
If you have verification statements checking whether your code has been 
called, your test case will fail after such a change if those parts of the 
code are no longer called. This forces the developer to revisit your test 
case as well.
If you only check that the service is STARTED, your test may still pass 
after a future change whether or not the relevant flow is hit, and this may 
mask mistakes made in this or related parts of the main code. So the test 
case should verify that the flow is hit, either by checking function 
invocations or by having a set of assertions that are specific to this test.
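
To illustrate the point, here is a minimal plain-Java sketch. It is not taken 
from the patch: RmClient, CountingRmClient, UpdaterService and the 
retryEnabled flag are all hypothetical names invented for this example, and a 
hand-rolled call counter stands in for Mockito#verify so that the snippet runs 
without any dependencies. It only shows that a state check passes in both 
cases, while an invocation check tells them apart.

```java
import java.net.ConnectException;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for the NM-to-RM connection; not a YARN interface.
interface RmClient {
    void register() throws ConnectException;
}

// Recording fake: counts invocations and fails the first two attempts.
class CountingRmClient implements RmClient {
    final AtomicInteger calls = new AtomicInteger();

    @Override
    public void register() throws ConnectException {
        if (calls.incrementAndGet() < 3) {
            throw new ConnectException("RM not reachable");
        }
    }
}

// Hypothetical service under test. retryEnabled models a future change that
// makes the flow under test conditional.
class UpdaterService {
    enum State { INITED, STARTED }

    private final RmClient client;
    private final boolean retryEnabled;
    State state = State.INITED;

    UpdaterService(RmClient client, boolean retryEnabled) {
        this.client = client;
        this.retryEnabled = retryEnabled;
    }

    void start() {
        state = State.STARTED;
        if (!retryEnabled) {
            return; // the flow under test is silently skipped
        }
        for (int attempt = 0; attempt < 3; attempt++) {
            try {
                client.register();
                return;
            } catch (ConnectException e) {
                // swallow the failure and try again
            }
        }
    }
}

class VerifyFlowDemo {
    public static void main(String[] args) {
        for (boolean retryEnabled : new boolean[] {true, false}) {
            CountingRmClient client = new CountingRmClient();
            UpdaterService service = new UpdaterService(client, retryEnabled);
            service.start();
            // The state is the same in both runs; only the call count differs.
            System.out.println("retryEnabled=" + retryEnabled
                + " state=" + service.state
                + " registerCalls=" + client.calls.get());
        }
    }
}
```

In both runs the state is STARTED, so an assertion on the state alone cannot 
detect that the flow was skipped. The call count is what differs, and that is 
the kind of check Mockito#verify provides against a mock.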

> Node manager shuts down if unable to connect with RM
> ----------------------------------------------------
>
>                 Key: YARN-3644
>                 URL: https://issues.apache.org/jira/browse/YARN-3644
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>            Reporter: Srikanth Sundarrajan
>            Assignee: Raju Bairishetti
>         Attachments: YARN-3644.001.patch, YARN-3644.001.patch, 
> YARN-3644.002.patch, YARN-3644.003.patch, YARN-3644.patch
>
>
> When NM is unable to connect to RM, NM shuts itself down.
> {code}
>           } catch (ConnectException e) {
>             //catch and throw the exception if tried MAX wait time to connect RM
>             dispatcher.getEventHandler().handle(
>                 new NodeManagerEvent(NodeManagerEventType.SHUTDOWN));
>             throw new YarnRuntimeException(e);
> {code}
> In large clusters, if RM is down for maintenance for a longer period, all the 
> NMs shut themselves down, requiring additional work to bring up the NMs.
> Setting the yarn.resourcemanager.connect.wait-ms to -1 has other side 
> effects, where non-connection failures are retried infinitely by all 
> YarnClients (via RMProxy).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
