[ 
https://issues.apache.org/jira/browse/YARN-3102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15118067#comment-15118067
 ] 

Jason Lowe commented on YARN-3102:
----------------------------------

Test failures are unrelated and tracked elsewhere.

Patch looks pretty good, but there are some nits:

Not caused by this JIRA, but the node rejoin transition has the 
contains-followed-by-get problem.  If another thread removes the node after the 
containsKey call then the get will return null but the code doesn't handle that 
and will NPE.  It is safer and faster to just do the get then check for null 
rather than perform the key lookup twice.  This also avoids the awkward 
declaring of the previousRMNode variable with no initial value.

Similarly we should not call get-then-remove when checking for the unknown 
node.  Instead we can simply call remove on the unknown node ID and check if 
the remove returned anything to know if it was there.

In TestResourceTrackerService, wouldn't it be simpler to create an overloaded 
form of writeToHostsFile that takes the specified file rather than replacing 
the existing method?  Then we can implement the original method in terms of the 
new method and cut out a large portion of this patch where it has to fixup all 
the original calls of the removed method.

4 seconds seems pretty aggressive for a test timeout, especially with multiple 
RM bringups and teardowns involved.  If this runs on a sluggish jenkins machine 
that happens to pause at the wrong time then the test fails -- i.e.: is it 
important that this test fails if it executes in 5 seconds instead?  Seems like 
the timeout should be at least 20 seconds, if there is an explicit timeout 
specified at all (surefire has one built in).

Should we just change MockRM to use the drain dispatcher and expose a drain 
events method rather than fix a bunch of places to override which dispatcher to 
use?

> Decommisioned Nodes not listed in Web UI
> ----------------------------------------
>
>                 Key: YARN-3102
>                 URL: https://issues.apache.org/jira/browse/YARN-3102
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.6.0
>         Environment: 2 Node Manager and 1 Resource Manager 
>            Reporter: Bibin A Chundatt
>            Assignee: Kuhu Shukla
>            Priority: Minor
>         Attachments: YARN-3102-v1.patch, YARN-3102-v2.patch, 
> YARN-3102-v3.patch, YARN-3102-v4.patch, YARN-3102-v5.patch, YARN-3102-v6.patch
>
>
> Configure yarn.resourcemanager.nodes.exclude-path in yarn-site.xml to 
> yarn.exlude file In RM1 machine
> Add Yarn.exclude with NM1 Host Name 
> Start the node as listed below NM1,NM2 Resource manager
> Now check Nodes decommisioned in /cluster/nodes
> Number of decommisioned node is listed as 1 but Table is empty in 
> /cluster/nodes/decommissioned (detail of Decommision node not shown)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to