[ 
https://issues.apache.org/jira/browse/YARN-914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14288644#comment-14288644
 ] 

Junping Du commented on YARN-914:
---------------------------------

Sorry for the late reply. These are all good points; a couple of comments:

bq. Sounds like we need a new state for NM, called "decommission_in_progress" 
when NM is draining the containers.
Agreed. We need a dedicated state for the NM in this situation, and both the AM 
and the RM should be aware of it so they can handle it properly.
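
Just to make that concrete, a rough sketch of where such a state could sit 
(the enum below is illustrative only, not the actual 
org.apache.hadoop.yarn.api.records.NodeState):
{code:java}
// Illustrative only -- not the real YARN NodeState enum. It just shows
// where a draining state would sit among the existing node states.
public enum NodeStateSketch {
  NEW,
  RUNNING,
  DECOMMISSION_IN_PROGRESS, // draining: no new containers, existing ones finish
  DECOMMISSIONED,
  UNHEALTHY,
  LOST,
  REBOOTED;

  /** The scheduler should stop placing new containers on a draining node. */
  public boolean isSchedulable() {
    return this == RUNNING;
  }
}
{code}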

bq. To clarify my early comment "all its map output are fetched or until all 
the applications the node touches have completed", the question is when YARN 
can declare a node's state has been gracefully drained and thus the node 
gracefully decommissioned ( admins can shutdown the whole machine without any 
impact on jobs ). For MR, the state could be running tasks/containers or mapper 
outputs. Say we have timeout of 30 minutes for decommission, it takes 3 minutes 
to finish the mappers on the node, another 5 minutes for the job to finish, 
then YARN can declare the node gracefully decommissioned in 8 minutes, instead 
of waiting for 30 minutes. RM knows all applications on any given NM. So if all 
applications on any given node have completed, RM can mark the node 
"decommissioned".
As a first step, I was thinking of keeping the NM running in a low-resource 
mode after it is gracefully decommissioned: no running containers, no new 
containers spawned, no noticeable resource consumption, etc., much like putting 
the node into a maintenance mode. The timeout value there is used to kill 
unfinished containers and release their resources. I am not quite sure we have 
to terminate the NM after the timeout, but I would like to understand your use 
case here.
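
To illustrate the early-completion idea in the quote, the RM-side check could 
look roughly like this (the class, parameters, and the 30-minute constant are 
hypothetical, not existing RM code):
{code:java}
import java.util.Set;

// Hypothetical sketch: declare a node gracefully decommissioned as soon as
// every application that touched it has completed, instead of always
// waiting out the full timeout; the timeout only bounds the worst case.
class DecommissionCheckSketch {

  static final long TIMEOUT_MS = 30L * 60 * 1000; // e.g. a 30-minute timeout

  static boolean canMarkDecommissioned(Set<String> activeAppsOnNode,
                                       long drainStartMs, long nowMs) {
    if (activeAppsOnNode.isEmpty()) {
      // All apps that used this node are done, so none of its containers
      // or map outputs are still needed -- finish early (8 minutes in the
      // example above rather than the full 30).
      return true;
    }
    // Otherwise wait for the timeout, then kill whatever is left.
    return nowMs - drainStartMs >= TIMEOUT_MS;
  }
}
{code}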

bq. Yes, I meant long running services. If YARN just kills the containers upon 
decommission request, the impact could vary. Some services might not have 
states to drain. Or maybe the services can handle the state migration on their 
own without YARN's help. For such services, maybe we can just use 
ResourceOption's timeout for that; set timeout to 0 and NM will just kill the 
containers.
I believe most of these services already handle losing nodes, since no node in 
a YARN cluster can be relied on to be available at all times. However, I am not 
sure whether they can migrate state to a new node ahead of a predictable node 
loss, or whether being more or less stateless makes more sense here. If we have 
an example application that can easily migrate a node's state to another node, 
then we can discuss how to provide some rudimentary support for it.
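
As a sketch of the timeout-zero behavior mentioned in the quote (the handler 
and its interface are invented for illustration; only the idea of carrying a 
drain timeout, e.g. via ResourceOption, comes from this discussion):
{code:java}
// Hypothetical NM-side handling of a decommission request that carries a
// drain timeout. Timeout 0 means "kill containers right away", which suits
// stateless services or ones that migrate their own state; a positive
// value opens a drain window with a forced kill as the fallback.
class DecommissionHandlerSketch {

  interface Node {
    void stopAcceptingContainers();
    void killAllContainers();
    void scheduleForcedKill(int afterSeconds);
  }

  void onDecommissionRequest(Node node, int timeoutSeconds) {
    if (timeoutSeconds == 0) {
      node.killAllContainers();
      return;
    }
    node.stopAcceptingContainers();
    node.scheduleForcedKill(timeoutSeconds);
  }
}
{code}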

bq. Given we don't plan to have applications checkpoint and migrate states, it 
doesn't seem to be necessary to have YARN notify applications upon decommission 
requests. Just to call it out.
This notification may still be necessary, so that the AM won't add these nodes 
to its blacklist if their containers get killed afterwards. Thoughts?
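
For example, with such a notification the AM could tell a decommission kill 
apart from a genuine node fault before blacklisting; the constant and handler 
below are made up for illustration:
{code:java}
import java.util.Set;

// Hypothetical AM-side policy: only blacklist a node when a container died
// from a real fault, not because the node was being drained.
class BlacklistPolicySketch {

  // Invented marker for containers killed as part of graceful decommission.
  static final int KILLED_BY_DECOMMISSION = -9999;

  void onContainerFinished(int exitStatus, String nodeId,
                           Set<String> blacklist) {
    if (exitStatus == KILLED_BY_DECOMMISSION) {
      return; // expected kill while draining -- don't blacklist the node
    }
    blacklist.add(nodeId); // real failure: avoid this node next time
  }
}
{code}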

bq. It might be useful to have a new state called "decommissioned_timeout", so 
that admins know the node has been gracefully decommissioned or not.
As in my comments above, we can first see whether we have to terminate the NM 
at all. If not, I would prefer a "maintenance" state, and the admin can decide 
whether to fully decommission the node later. Again, we should discuss your 
scenarios here.

> Support graceful decommission of nodemanager
> --------------------------------------------
>
>                 Key: YARN-914
>                 URL: https://issues.apache.org/jira/browse/YARN-914
>             Project: Hadoop YARN
>          Issue Type: Improvement
>    Affects Versions: 2.0.4-alpha
>            Reporter: Luke Lu
>            Assignee: Junping Du
>
> When NMs are decommissioned for non-fault reasons (capacity change etc.), 
> it's desirable to minimize the impact to running applications.
> Currently, if a NM is decommissioned, all running containers on the NM need 
> to be rescheduled on other NMs. Furthermore, finished map tasks whose map 
> outputs have not yet been fetched by the job's reducers will need to be 
> rerun as well.
> We propose to introduce a mechanism to optionally gracefully decommission a 
> node manager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
