[
https://issues.apache.org/jira/browse/YARN-914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14288644#comment-14288644
]
Junping Du commented on YARN-914:
---------------------------------
Sorry for the late reply. These are all good points; a couple of comments:
bq. Sounds like we need a new state for NM, called "decommission_in_progress"
when NM is draining the containers.
Agree. We need a dedicated state for the NM in this situation, and both the AM
and the RM should be aware of it so they can handle it properly.
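For illustration only, a minimal sketch of what that could look like: the
public NodeState enum grows one draining value. The name DECOMMISSIONING and
its exact semantics are assumptions of this discussion, not a committed API.
{code:java}
// Sketch: today's org.apache.hadoop.yarn.api.records.NodeState values
// plus a hypothetical draining state. Name and semantics are open
// points in this discussion, not a committed API.
public enum NodeState {
  NEW,              // registered, not yet scheduling
  RUNNING,          // healthy, accepting containers
  UNHEALTHY,        // failed a health check
  DECOMMISSIONING,  // hypothetical: draining, no new containers allocated
  DECOMMISSIONED,   // removed from scheduling
  LOST,             // heartbeat timed out
  REBOOTED          // restarted, must re-register
}
{code}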
bq. To clarify my earlier comment "all its map output are fetched or until all
the applications the node touches have completed", the question is when YARN
can declare a node's state gracefully drained and thus the node gracefully
decommissioned (admins can shut down the whole machine without any impact on
jobs). For MR, the state could be running tasks/containers or mapper outputs.
Say we have a timeout of 30 minutes for decommission: if it takes 3 minutes to
finish the mappers on the node and another 5 minutes for the job to finish,
then YARN can declare the node gracefully decommissioned after 8 minutes
instead of waiting the full 30 minutes. The RM knows all applications on any
given NM, so if all applications on a given node have completed, the RM can
mark the node "decommissioned".
As a first step, I was thinking of keeping the NM running in a low-resource
mode after graceful decommission - no running containers, no new containers
spawned, no obvious resource consumption, etc. - effectively putting these
nodes into a maintenance mode. The timeout value there is used to kill
unfinished containers and release their resources. I am not quite sure we have
to terminate the NM after the timeout, but I would like to understand your use
case here.
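To make the timeline above concrete, here is a rough sketch of the drain logic
being discussed: declare the node done as soon as every application that
touched it has completed; otherwise kill the leftover containers when the
timeout fires. The helper names (killContainersOn, markMaintenance) are
illustrative assumptions, not existing RM APIs.
{code:java}
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

class DecommissionDrainer {
  private final ScheduledExecutorService timer =
      Executors.newSingleThreadScheduledExecutor();

  // Called when a node enters the draining state: if the drain does not
  // finish within the timeout, kill whatever containers remain so their
  // resources are released. The NM itself can stay up in a low-resource
  // "maintenance" mode instead of terminating.
  void startDrain(String nodeId, long timeoutMs) {
    timer.schedule(() -> killContainersOn(nodeId),
        timeoutMs, TimeUnit.MILLISECONDS);
  }

  // Called whenever an application finishes: once no application that
  // ran on the draining node is still active, the node can be declared
  // gracefully decommissioned early (after 8 minutes in the example
  // above, rather than the full 30-minute timeout).
  void onAppFinished(String nodeId, List<String> activeAppsOnNode) {
    if (activeAppsOnNode.isEmpty()) {
      markMaintenance(nodeId);
    }
  }

  void killContainersOn(String nodeId) { /* hypothetical hook */ }
  void markMaintenance(String nodeId) { /* hypothetical hook */ }
}
{code}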
bq. Yes, I meant long running services. If YARN just kills the containers upon
decommission request, the impact could vary. Some services might not have
states to drain. Or maybe the services can handle the state migration on their
own without YARN's help. For such services, maybe we can just use
ResourceOption's timeout for that; set timeout to 0 and NM will just kill the
containers.
I believe most of these services already handle losing nodes, since no node in
a YARN cluster can be assumed to be reliable at all times. However, I am not
sure whether they can migrate state to a new node ahead of a predictable node
loss, or whether being more or less stateless makes more sense here. If we
have an example application that can easily migrate its state from one node to
another, we can discuss how to provide some rudimentary support for that.
bq. Given we don't plan to have applications checkpoint and migrate states, it
doesn't seem to be necessary to have YARN notify applications upon decommission
requests. Just to call it out.
These notifications may still be necessary, so the AM won't add these nodes to
its blacklist if containers get killed afterwards. Thoughts?
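As an example of what the AM side could look like (assuming the state change
is delivered through the updated-node reports that already come back in the
allocate response - whether that is the actual delivery mechanism is still
open), the AM remembers drained nodes and skips blacklisting them:
{code:java}
import java.util.Set;
import org.apache.hadoop.yarn.api.protocolrecords.AllocateResponse;
import org.apache.hadoop.yarn.api.records.NodeId;
import org.apache.hadoop.yarn.api.records.NodeReport;
import org.apache.hadoop.yarn.api.records.NodeState;
import org.apache.hadoop.yarn.client.api.AMRMClient;

class DrainAwareAM {
  // Sketch: remember nodes reported as decommissioned so that later
  // container kills on them are treated as drains, not node failures,
  // and the node is never blacklisted.
  void trackDrainedNodes(AMRMClient<AMRMClient.ContainerRequest> amRMClient,
      Set<NodeId> drainedNodes, float progress) throws Exception {
    AllocateResponse response = amRMClient.allocate(progress);
    for (NodeReport report : response.getUpdatedNodes()) {
      if (report.getNodeState() == NodeState.DECOMMISSIONED) {
        drainedNodes.add(report.getNodeId());
      }
    }
  }
}
{code}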
bq. It might be useful to have a new state called "decommissioned_timeout", so
that admins know whether the node has been gracefully decommissioned or not.
As in my comments above, we can first see whether we have to terminate the NM
at all. If not, I prefer to use a "maintenance" state and let admins decide
whether to fully decommission the node later. Again, we should talk through
your scenarios here.
> Support graceful decommission of nodemanager
> --------------------------------------------
>
> Key: YARN-914
> URL: https://issues.apache.org/jira/browse/YARN-914
> Project: Hadoop YARN
> Issue Type: Improvement
> Affects Versions: 2.0.4-alpha
> Reporter: Luke Lu
> Assignee: Junping Du
>
> When NMs are decommissioned for non-fault reasons (capacity change etc.),
> it's desirable to minimize the impact on running applications.
> Currently, if an NM is decommissioned, all running containers on it need to
> be rescheduled on other NMs. Furthermore, finished map tasks whose map
> outputs have not yet been fetched by the job's reducers need to be rerun as
> well.
> We propose to introduce a mechanism to optionally gracefully decommission a
> node manager.