[ 
https://issues.apache.org/jira/browse/YARN-3212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14370446#comment-14370446
 ] 

Junping Du commented on YARN-3212:
----------------------------------

Thanks [~jlowe] and [~mingma] for review and comments!
bq. Do we want to handle the DECOMMISSIONING_WITH_TIMEOUT event when the node 
is already in the DECOMMISSIONING state? Curious if we might get a duplicate 
decommission event somehow and want to ignore it or if we know for sure this 
cannot happen in practice.
This case can happen if the user submits another decommission CLI request while the 
node is still decommissioning. I think we can just ignore it for now, since nothing 
needs to be updated if the node is already in DECOMMISSIONING. Per the discussions 
in YARN-914 and YARN-3225, we will not track or update the timeout on the RM side 
(we may only pass it to the AM for notification). 
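To illustrate the ignore-duplicate behavior, here is a minimal sketch with hypothetical names (not the actual RMNodeImpl code): a second DECOMMISSIONING_WITH_TIMEOUT event arriving while the node is already DECOMMISSIONING is a no-op.

```java
// Hypothetical sketch: a duplicate decommission event on an already
// decommissioning node leaves the state unchanged.
public class DecommissionSketch {
    enum NodeState { RUNNING, DECOMMISSIONING, DECOMMISSIONED }

    static NodeState onDecommissioningWithTimeout(NodeState current) {
        if (current == NodeState.DECOMMISSIONING) {
            // Duplicate event: nothing to update, keep the current state.
            return current;
        }
        return NodeState.DECOMMISSIONING;
    }

    public static void main(String[] args) {
        System.out.println(onDecommissioningWithTimeout(NodeState.RUNNING));
        System.out.println(onDecommissioningWithTimeout(NodeState.DECOMMISSIONING));
    }
}
```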

bq. Do we want to consider DECOMMISSIONING nodes as not active? There are 
containers actively running on them, and in that sense they are participating 
in the cluster (and contributing to the overall cluster resource). I think they 
should still be considered active, but I could be persuaded otherwise.
I think we discussed this on YARN-914 before. The conclusion so far is to keep a 
decommissioning node active (otherwise some services may break - I am not 100% 
sure on this) and to make the node's resource equal to the resource of its 
assigned containers at any time. Do we need to change this conclusion?
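The "resource equals assigned containers" rule above can be sketched as follows; the names here are hypothetical stand-ins, not the actual YARN Resource API:

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: while DECOMMISSIONING, the node stays active but its
// reported capacity is clamped to the sum of its running containers'
// allocations, shrinking as containers finish.
public class DecommissioningResourceSketch {
    static final class Resource {
        final int memoryMb;
        final int vcores;
        Resource(int memoryMb, int vcores) {
            this.memoryMb = memoryMb;
            this.vcores = vcores;
        }
    }

    static Resource totalOfRunningContainers(List<Resource> running) {
        int mem = 0, cores = 0;
        for (Resource r : running) {
            mem += r.memoryMb;
            cores += r.vcores;
        }
        return new Resource(mem, cores);
    }

    public static void main(String[] args) {
        Resource total = totalOfRunningContainers(Arrays.asList(
            new Resource(2048, 1), new Resource(4096, 2)));
        System.out.println(total.memoryMb + "MB, " + total.vcores + " vcores");
    }
}
```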

bq.  In the reconnected node transition there is a switch statement that will 
debug-log an unexpected state message when in fact the DECOMMISSIONING state is 
expected for this transition.
That's a good point. Will fix it in the v3 patch. Thanks!

bq. Curious why the null check is needed in handleNMContainerStatuses? What 
about this change allows the container statuses to be null?
I think so. It looks like the RMNodeReconnectEvent comes from 
RegisterNodeManagerRequest, and the containerStatuses field (read from the proto) 
can be null. So there is an NPE bug here, which I found through a unit test that 
creates the event as "new RMNodeReconnectEvent(node.getNodeID(), node, 
null, null)", even before this patch. Am I missing something here?
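The null guard can be sketched like this; the method body and types are hypothetical, mirroring only the shape of the fix rather than the actual handleNMContainerStatuses code:

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the null guard: container statuses coming from a
// protobuf-backed RegisterNodeManagerRequest may be null, so the handler
// must tolerate that before iterating, avoiding the NPE seen in the test.
public class NullStatusGuardSketch {
    static int handleNMContainerStatuses(List<String> containerStatuses) {
        if (containerStatuses == null) {
            return 0; // nothing to process
        }
        return containerStatuses.size();
    }

    public static void main(String[] args) {
        System.out.println(handleNMContainerStatuses(null));               // 0
        System.out.println(handleNMContainerStatuses(Arrays.asList("c1"))); // 1
    }
}
```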

bq. It would be nice to see some refactoring of the common code between 
StatusUpdateWhenHealthyTransition, StatusUpdateWhenUnhealthyTransition, and 
StatusUpdateWhenDecommissioningTransition.
Yes, I should have done that earlier. Will do it in the v3 patch.
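One possible shape for that refactor, sketched with hypothetical names (not the actual transition classes): hoist the shared status-update bookkeeping into one helper that each of the three transitions delegates to.

```java
// Hypothetical refactor sketch: the common status-update handling lives in
// one shared helper instead of being duplicated across the healthy,
// unhealthy, and decommissioning transitions.
public class StatusUpdateRefactorSketch {
    enum NodeState { RUNNING, UNHEALTHY, DECOMMISSIONING }

    static NodeState handleStatusUpdate(NodeState current, boolean healthy) {
        // Shared bookkeeping (heartbeat response, latest health report,
        // metrics) would go here, once, for all three transitions.
        if (current == NodeState.DECOMMISSIONING) {
            return NodeState.DECOMMISSIONING; // keep draining regardless of health
        }
        return healthy ? NodeState.RUNNING : NodeState.UNHEALTHY;
    }

    public static void main(String[] args) {
        System.out.println(handleStatusUpdate(NodeState.RUNNING, false));
        System.out.println(handleStatusUpdate(NodeState.DECOMMISSIONING, false));
    }
}
```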

bq. These change seems unnecessary?
These changes are still necessary, because we changed the state transition from 
one final state to multiple final states (see the example below), and the 
interface only accepts an EnumSet. 
{code}
   public static class ReconnectNodeTransition implements
-      SingleArcTransition<RMNodeImpl, RMNodeEvent> {
+      MultipleArcTransition<RMNodeImpl, RMNodeEvent, NodeState> {
{code}
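The difference between the two contracts can be sketched with stand-in interfaces mirroring the state-machine pattern (names simplified, not the actual YARN state-machine classes): a SingleArcTransition fixes the post-state at registration time, while a MultipleArcTransition computes it from the event, which is why the registration API takes an EnumSet of possible final states.

```java
import java.util.EnumSet;

// Sketch with stand-in interfaces mirroring the transition contract.
public class TransitionSketch {
    enum NodeState { RUNNING, DECOMMISSIONING }

    interface MultipleArcTransition<OPERAND, EVENT, STATE> {
        STATE transition(OPERAND operand, EVENT event);
    }

    static class ReconnectNodeTransition
            implements MultipleArcTransition<Object, String, NodeState> {
        @Override
        public NodeState transition(Object node, String event) {
            // Pick the final state based on the reconnecting node's prior state.
            return "was-decommissioning".equals(event)
                ? NodeState.DECOMMISSIONING : NodeState.RUNNING;
        }
    }

    public static void main(String[] args) {
        // The registration API would receive the set of possible final states.
        EnumSet<NodeState> finals =
            EnumSet.of(NodeState.RUNNING, NodeState.DECOMMISSIONING);
        NodeState out = new ReconnectNodeTransition()
            .transition(new Object(), "was-decommissioning");
        System.out.println(finals.contains(out));
    }
}
```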

bq. Do we need to support the scenario where NM becomes dead when it is being 
decommissioned? Say decommission timeout is 30 minutes larger than the NM 
liveness timeout. The node drops out of the cluster for some time and rejoin 
later all within the decommission time out. Will Yarn show the status as just 
dead node, or
{dead, decommissioning}
Now the node can become LOST (dead) while it is in DECOMMISSIONING. That is no 
different from a running node getting lost, except that it cannot rejoin unless 
the user puts it back through recommission. Does that make sense?

bq. Seems useful for admins to know about it. If we need that, we can consider 
two types of NodeState. One is liveness state, one is admin state. Then you 
will have different combinations.
We can add the necessary logging to let admins know about it. Are you talking 
about a scenario like this: the admin puts some nodes into decommissioning with a 
timeout; an upgrade script performs an OS upgrade and finishes with a restart at 
some random time, possibly shorter than the decommissioning timeout; and the 
admin wants those nodes to rejoin automatically? But how would YARN know whether 
the admin wants those nodes back after a restart? An explicit re-addition to the 
white list (recommission) may still be necessary.

> RMNode State Transition Update with DECOMMISSIONING state
> ---------------------------------------------------------
>
>                 Key: YARN-3212
>                 URL: https://issues.apache.org/jira/browse/YARN-3212
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>            Reporter: Junping Du
>            Assignee: Junping Du
>         Attachments: RMNodeImpl - new.png, YARN-3212-v1.patch, 
> YARN-3212-v2.patch
>
>
> As proposed in YARN-914, a new state, “DECOMMISSIONING”, will be added; a node 
> can transition into it from the “running” state, triggered by a new 
> “decommissioning” event. 
> This new state can transition to “decommissioned” on a Resource_Update if there 
> are no running apps on this NM, when the NM reconnects after a restart, or when 
> it receives a DECOMMISSIONED event (after the timeout set from the CLI).
> In addition, it can go back to “running” if the user decides to cancel the 
> previous decommission by recommissioning the same node. The reaction to other 
> events is similar to the RUNNING state.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
