[
https://issues.apache.org/jira/browse/YARN-3102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15118067#comment-15118067
]
Jason Lowe commented on YARN-3102:
----------------------------------
Test failures are unrelated and tracked elsewhere.
Patch looks pretty good, but there are some nits:
Not caused by this JIRA, but the node rejoin transition has the
contains-followed-by-get problem. If another thread removes the node after the
containsKey call then the get will return null but the code doesn't handle that
and will NPE. It is safer and faster to just do the get then check for null
rather than perform the key lookup twice. This also avoids the awkward
declaring of the previousRMNode variable with no initial value.
Similarly we should not call get-then-remove when checking for the unknown
node. Instead we can simply call remove on the unknown node ID and check if
the remove returned anything to know if it was there.
In TestResourceTrackerService, wouldn't it be simpler to create an overloaded
form of writeToHostsFile that takes the specified file rather than replacing
the existing method? Then we can implement the original method in terms of the
new method and cut out a large portion of this patch where it has to fixup all
the original calls of the removed method.
4 seconds seems pretty aggressive for a test timeout, especially with multiple
RM bringups and teardowns involved. If this runs on a sluggish jenkins machine
that happens to pause at the wrong time then the test fails -- i.e.: is it
important that this test fails if it executes in 5 seconds instead? Seems like
the timeout should be at least 20 seconds, if there is an explicit timeout
specified at all (surefire has one built in).
Should we just change MockRM to use the drain dispatcher and expose a drain
events method rather than fix a bunch of places to override which dispatcher to
use?
> Decommisioned Nodes not listed in Web UI
> ----------------------------------------
>
> Key: YARN-3102
> URL: https://issues.apache.org/jira/browse/YARN-3102
> Project: Hadoop YARN
> Issue Type: Bug
> Components: resourcemanager
> Affects Versions: 2.6.0
> Environment: 2 Node Manager and 1 Resource Manager
> Reporter: Bibin A Chundatt
> Assignee: Kuhu Shukla
> Priority: Minor
> Attachments: YARN-3102-v1.patch, YARN-3102-v2.patch,
> YARN-3102-v3.patch, YARN-3102-v4.patch, YARN-3102-v5.patch, YARN-3102-v6.patch
>
>
> Configure yarn.resourcemanager.nodes.exclude-path in yarn-site.xml to
> yarn.exlude file In RM1 machine
> Add Yarn.exclude with NM1 Host Name
> Start the node as listed below NM1,NM2 Resource manager
> Now check Nodes decommisioned in /cluster/nodes
> Number of decommisioned node is listed as 1 but Table is empty in
> /cluster/nodes/decommissioned (detail of Decommision node not shown)
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)