[ 
https://issues.apache.org/jira/browse/YARN-9202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16745516#comment-16745516
 ] 

Jim Brennan commented on YARN-9202:
-----------------------------------

[~kshukla] thanks for the patch!

Could you clarify what bug this is fixing? If nodes are in the include list 
but never register, what are we missing? Is it just that those nodes are not 
included in any metrics? 

It looks like the patch puts any node that is in the include list and not in 
the exclude list into the inactive node list and moves it to the SHUTDOWN 
state, adding new transitions from NEW to SHUTDOWN and from SHUTDOWN to 
RUNNING to support this. Could the desired result be accomplished by just 
adding these nodes to the inactive list and leaving them in the NEW state? It 
seems the only difference would be that we would not count these nodes in the 
numShutdownNMs metric during the period between NEW and RUNNING.
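
To make the two options concrete, here is a minimal sketch in Java. It is not 
the RM's actual code: the class IncludeListSketch, the reduced State enum, and 
the helpers trackUnregistered() and countShutdown() are all hypothetical names 
invented for illustration, and the transition table only models the two new 
transitions described above plus the existing NEW to RUNNING path.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

class IncludeListSketch {
    // Hypothetical, reduced set of node states (not Hadoop's NodeState enum).
    enum State { NEW, RUNNING, SHUTDOWN }

    // Transitions modeled in this sketch: NEW -> RUNNING (registration),
    // plus the two described for the patch: NEW -> SHUTDOWN, SHUTDOWN -> RUNNING.
    static final Map<State, Set<State>> ALLOWED = new EnumMap<>(State.class);
    static {
        ALLOWED.put(State.NEW, EnumSet.of(State.RUNNING, State.SHUTDOWN));
        ALLOWED.put(State.SHUTDOWN, EnumSet.of(State.RUNNING));
        ALLOWED.put(State.RUNNING, EnumSet.noneOf(State.class));
    }

    // Returns the target state, or throws if the sketch does not model the move.
    static State transition(State from, State to) {
        if (!ALLOWED.get(from).contains(to)) {
            throw new IllegalStateException(from + " -> " + to);
        }
        return to;
    }

    // Builds an "inactive" map for hosts that are included, not excluded, and
    // have not registered. moveToShutdown=true mimics the patch as described;
    // false mimics the alternative of leaving the nodes in NEW.
    static Map<String, State> trackUnregistered(
            List<String> include, Set<String> exclude, boolean moveToShutdown) {
        Map<String, State> inactive = new LinkedHashMap<>();
        for (String host : include) {
            if (exclude.contains(host)) {
                continue;
            }
            State state = State.NEW;
            if (moveToShutdown) {
                state = transition(state, State.SHUTDOWN);
            }
            inactive.put(host, state);
        }
        return inactive;
    }

    // Stand-in for a numShutdownNMs-style count over the inactive map.
    static long countShutdown(Map<String, State> inactive) {
        return inactive.values().stream()
                .filter(s -> s == State.SHUTDOWN)
                .count();
    }
}
```

In this sketch both variants track the same hosts in the inactive map; they 
differ only in whether countShutdown() includes them before they register.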

The code looks good, assuming this is the correct solution.

One comment on the test:

testIncludeHostsWithNoRegister() - it's not clear to me why the latter half of 
the test is needed. It looks like it was copied from the previous test, but I 
don't see why it needs to be repeated in this one.

 

> RM does not track nodes that are in the include list and never register
> -----------------------------------------------------------------------
>
>                 Key: YARN-9202
>                 URL: https://issues.apache.org/jira/browse/YARN-9202
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 2.9.2, 3.0.3, 2.8.5
>            Reporter: Kuhu Shukla
>            Assignee: Kuhu Shukla
>            Priority: Major
>         Attachments: YARN-9202.001.patch
>
>
> The RM state machine decides to put new or running nodes in inactive state 
> only past the point of either registration or being in the exclude list. This 
> does not cover the case where a node is in the include list but never 
> registers and since all state changes are based on these NodeState 
> transitions, having NEW nodes be listed as inactive first may help. This 
> would change the semantics of how inactiveNodes are looked at today. Another 
> state addition might help this case too.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
