[ https://issues.apache.org/jira/browse/YARN-914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14324277#comment-14324277 ]
Jason Lowe commented on YARN-914:
---------------------------------
bq. I think prediction of expected runtime of containers could be hard in the YARN
case. However, can we typically say long running service containers are
expected to run very long or infinite? If so, notifying AM to preempt
containers of LRS makes more sense than waiting for timeout, doesn't it?
The main point I'm trying to make here is that we shouldn't be worrying too
much about long-running services right now. YARN doesn't even know which are
which yet, and without any kind of container lifespan prediction there's no way
to know whether a container will finish within the decomm timeout window or
not. YARN knowing which apps are LRS is a primitive form of container lifespan
prediction (i.e.: LRS = containers run forever). We will have the same
problems with apps that aren't LRS but have containers that can run for a
"long" time, where "long" is larger than the decomm timeout. That's why I'm
not convinced it makes sense to do anything special for LRS apps vs. other apps.
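To make the lifespan-prediction argument concrete, here is a minimal Java sketch. All names (`DecommFit`, `Fit`, `predictFromLrsFlag`, `fitsDecommWindow`) are hypothetical and are not YARN APIs, and the 30-minute timeout is an assumed value. It shows that an LRS flag amounts to predicting an unbounded remaining runtime, that a non-LRS container with a long predicted runtime fails exactly the same comparison, and that without any prediction the answer is simply unknown.

```java
import java.util.OptionalLong;

/**
 * Hypothetical sketch (not a YARN API): an "is this app an LRS?" flag is
 * treated as a degenerate container-lifespan prediction.
 */
public class DecommFit {

    /** Possible answers to "will this container finish before the decomm timeout?". */
    enum Fit { WILL_FINISH, WILL_NOT_FINISH, UNKNOWN }

    /** An LRS flag predicts "runs forever"; without it there is no prediction at all. */
    static OptionalLong predictFromLrsFlag(boolean isLongRunningService) {
        return isLongRunningService ? OptionalLong.of(Long.MAX_VALUE) : OptionalLong.empty();
    }

    /** The same comparison applies no matter where the prediction came from. */
    static Fit fitsDecommWindow(OptionalLong predictedRemainingMs, long decommTimeoutMs) {
        if (!predictedRemainingMs.isPresent()) {
            return Fit.UNKNOWN; // no lifespan prediction: nothing to decide with
        }
        return predictedRemainingMs.getAsLong() <= decommTimeoutMs
                ? Fit.WILL_FINISH
                : Fit.WILL_NOT_FINISH;
    }

    public static void main(String[] args) {
        long timeoutMs = 30 * 60 * 1000L; // assumed 30-minute decomm timeout

        // LRS container: predicted to run forever, so it cannot fit the window.
        System.out.println(fitsDecommWindow(predictFromLrsFlag(true), timeoutMs));  // WILL_NOT_FINISH
        // Non-LRS container with no prediction: YARN cannot tell.
        System.out.println(fitsDecommWindow(predictFromLrsFlag(false), timeoutMs)); // UNKNOWN
        // Non-LRS container predicted to need 2 hours: same outcome as LRS.
        System.out.println(fitsDecommWindow(OptionalLong.of(2 * 60 * 60 * 1000L), timeoutMs)); // WILL_NOT_FINISH
    }
}
```

The LRS flag only ever fills in one special value of the prediction; the decision itself does not change, which is why special-casing LRS buys little over handling "long" containers in general.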
In the short term, I think we should just go with a configurable decomm timeout
and AM notification via strict preemption as the timeout expires. If we want to
get a bit fancier, we can annotate the strict preemption with a timeout so the
AM knows approximately _when_ the preemption will occur. With that feature, we
would notify AMs as soon as the node is marked for decomm that their containers
will be forcibly preempted (i.e., killed) in X minutes, and it's up to each AM
to decide whether to act on it or whether its containers on that node will
complete naturally within that time. With that setup we don't have to
special-case LRS apps or anything like that, as we're telling the apps about
the decomm ASAP and giving them time to deal with it, LRS or not.
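The timed strict-preemption idea could look roughly like the following Java sketch. It is not the YARN implementation and does not mirror YARN's real preemption records: `TimedDecomm`, `PreemptionNotice`, `markForDecommission`, `containersToKill`, and `amShouldAct` are hypothetical names, the 10-minute timeout is an assumed configuration value, and the AM's expected-remaining-runtime estimate is assumed to come from the AM itself. The RM stamps every container on the node with the same deadline when the node is marked, each AM compares its own estimate against that deadline, and whatever is still running when the deadline passes is killed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical model of strict preemption annotated with a timeout (not a YARN API). */
public class TimedDecomm {

    /** Notice to an AM: container containerId of app appId will be killed at deadlineMs. */
    static final class PreemptionNotice {
        final String appId;
        final String containerId;
        final long deadlineMs;

        PreemptionNotice(String appId, String containerId, long deadlineMs) {
            this.appId = appId;
            this.containerId = containerId;
            this.deadlineMs = deadlineMs;
        }
    }

    /** RM side: as soon as the node is marked for decomm, notify the AM of every container on it. */
    static List<PreemptionNotice> markForDecommission(Map<String, String> containerToApp,
                                                      long nowMs, long decommTimeoutMs) {
        long deadline = nowMs + decommTimeoutMs;
        List<PreemptionNotice> notices = new ArrayList<>();
        for (Map.Entry<String, String> e : containerToApp.entrySet()) {
            notices.add(new PreemptionNotice(e.getValue(), e.getKey(), deadline));
        }
        return notices;
    }

    /** RM side: once a deadline has passed, any container still running is forcibly preempted. */
    static List<String> containersToKill(List<PreemptionNotice> notices,
                                         Set<String> stillRunning, long nowMs) {
        List<String> kill = new ArrayList<>();
        for (PreemptionNotice n : notices) {
            if (nowMs >= n.deadlineMs && stillRunning.contains(n.containerId)) {
                kill.add(n.containerId);
            }
        }
        return kill;
    }

    /** AM side: act only if the container is not expected to finish before the deadline. */
    static boolean amShouldAct(long expectedRemainingMs, long nowMs, PreemptionNotice n) {
        // Compare against the remaining window to avoid overflow for "runs forever" estimates.
        return expectedRemainingMs > n.deadlineMs - nowMs;
    }

    public static void main(String[] args) {
        Map<String, String> onNode = new TreeMap<>(); // sorted so notice order is stable
        onNode.put("c1", "mrJob");
        onNode.put("c2", "lrsApp");
        long timeoutMs = 10 * 60 * 1000L; // assumed 10-minute decomm timeout
        List<PreemptionNotice> notices = markForDecommission(onNode, 0L, timeoutMs);

        // The MR task expects to finish in 2 minutes; the LRS container never finishes on its own.
        System.out.println(amShouldAct(2 * 60 * 1000L, 0L, notices.get(0))); // false
        System.out.println(amShouldAct(Long.MAX_VALUE, 0L, notices.get(1))); // true

        // At the deadline only c2 is still running, so only c2 is killed.
        System.out.println(containersToKill(notices, Set.of("c2"), timeoutMs)); // [c2]
    }
}
```

In this sketch the RM treats LRS and non-LRS containers identically; only each AM's own estimate decides whether it reacts early or lets its containers run to completion.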
> Support graceful decommission of nodemanager
> --------------------------------------------
>
> Key: YARN-914
> URL: https://issues.apache.org/jira/browse/YARN-914
> Project: Hadoop YARN
> Issue Type: Improvement
> Affects Versions: 2.0.4-alpha
> Reporter: Luke Lu
> Assignee: Junping Du
> Attachments: Gracefully Decommission of NodeManager (v1).pdf,
> Gracefully Decommission of NodeManager (v2).pdf
>
>
> When NMs are decommissioned for non-fault reasons (capacity changes, etc.),
> it's desirable to minimize the impact on running applications.
> Currently, if an NM is decommissioned, all running containers on the NM need to
> be rescheduled on other NMs. Furthermore, for finished map tasks, if their
> map outputs have not been fetched by the reducers of the job, these map tasks
> will need to be rerun as well.
> We propose to introduce a mechanism to optionally gracefully decommission a
> node manager.