[
https://issues.apache.org/jira/browse/YARN-9202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16745549#comment-16745549
]
Kuhu Shukla commented on YARN-9202:
-----------------------------------
Thank you Jim for the review. Appreciate it.
bq. If nodes are in the include list, but never register, what is it that we
are missing
Currently there is no way to know which nodes should have been part of the
cluster, unless one manually checks the include list. This differs from the
Namenode, where nodes that have not registered are still listed as dead or in
other categories.
bq. Is it just that those nodes are not included in any metrics?
More or less, yes. Tracking what *should* be there is harder for operations
teams.
bq. Can the desired result be accomplished by just adding these nodes to the
inactive list and leaving them in the NEW state?
I did think about that. Since NEW nodes were not exposed anywhere on the UI, I
thought moving them to a somewhat terminal state might be nicer, but of course
I like the idea of having NEW nodes in the inactive list as well. I will have
to see how much semantic difference it makes in the code, and I will post an
update shortly.
bq. testIncludeHostsWithNoRegister() - it's not clear to me why the latter half
of the test is needed? Looks like it was copied from the previous test but I
don't see why it needs to be repeated in this one?
True. I will prune the test in the next version.
If keeping the nodes in NEW state is fairly straightforward while they get
listed as inactive, the next version will have that change as well.
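
To make the "NEW nodes in the inactive list" idea concrete, here is a rough
sketch of the bookkeeping. It is not taken from the patch: the class, the
State enum, and the method name are all hypothetical stand-ins, and the real
RM tracks nodes through its own state machine rather than plain maps.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified model; these are not the ResourceManager's classes.
class UnregisteredNodeSketch {
    // Stand-in for a subset of YARN's NodeState values (assumption).
    enum State { NEW, RUNNING, DECOMMISSIONED, LOST }

    // Returns a copy of inactiveNodes that also lists every host from the
    // include list that has not registered, leaving such hosts in NEW state.
    static Map<String, State> inactiveWithUnregistered(
            List<String> includeList,
            Set<String> registeredHosts,
            Map<String, State> inactiveNodes) {
        Map<String, State> result = new LinkedHashMap<>(inactiveNodes);
        for (String host : includeList) {
            if (!registeredHosts.contains(host)) {
                // Keep any state already recorded for this host.
                result.putIfAbsent(host, State.NEW);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, State> inactive = new LinkedHashMap<>();
        inactive.put("host3", State.DECOMMISSIONED);
        Map<String, State> updated = inactiveWithUnregistered(
                List.of("host1", "host2", "host3"), Set.of("host1"), inactive);
        System.out.println(updated);
    }
}
```

In this sketch a host that is already inactive keeps its recorded state, and
only hosts that were never seen are added as NEW, so existing consumers of
the inactive list would see additional entries rather than changed ones.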
> RM does not track nodes that are in the include list and never register
> -----------------------------------------------------------------------
>
> Key: YARN-9202
> URL: https://issues.apache.org/jira/browse/YARN-9202
> Project: Hadoop YARN
> Issue Type: Bug
> Affects Versions: 2.9.2, 3.0.3, 2.8.5
> Reporter: Kuhu Shukla
> Assignee: Kuhu Shukla
> Priority: Major
> Attachments: YARN-9202.001.patch
>
>
> The RM state machine decides to put new or running nodes in inactive state
> only past the point of either registration or being in the exclude list. This
> does not cover the case where a node is in the include list but never
> registers. Since all state changes are based on these NodeState
> transitions, having NEW nodes be listed as inactive first may help. This
> would change the semantics of how inactiveNodes are looked at today. Another
> state addition might help this case too.