[
https://issues.apache.org/jira/browse/YARN-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14297783#comment-14297783
]
Craig Welch commented on YARN-1039:
-----------------------------------
[~chris.douglas]
bq. YARN shouldn't understand the lifecycle for a service or the
progress/dependencies for task containers
That's not necessarily so. There are some cases where the type of life cycle
for an application is important, for example when determining whether it is
open-ended ("service") or a batch process which entails a notion of progress
("session"), at least for purposes of display.
I think we need to rescope and clarify this jira a bit so that we can make
progress. A number of items in the original problem statement and subsequent
comments have been taken on elsewhere, so it no longer makes sense to pursue
them here. Here's an attempt at a breakdown:
bq. This could be used by a scheduler that would know not to host the service
on a transient (cloud: spot priced) node
I think this is now clearly covered by [YARN-796]. Nodes having qualities
(including operational qualities such as these) is one of the core purposes of
that work, so it makes no sense to duplicate it here, and it should be
de-scoped from this jira.
bq. Schedulers could also decide whether or not to allocate multiple long-lived
containers on the same node
As [[email protected]] mentioned in an earlier comment
[https://issues.apache.org/jira/browse/YARN-1039?focusedCommentId=14038041&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14038041]
affinity / anti-affinity is covered in a more general sense in [YARN-1042].
The above component of this jira is really just one such case, so it should
be covered by that general solution and dropped from scope as well. There
may be some interest in informing that solution based on a generalized
"service" setting, but to really understand that, the affinity approach needs
to be worked out first. I think the affinity approach will need to
inform/integrate with this rather than the other way around, and integration
should be approached as part of that effort.
That leaves nothing, so we can close the jira ;-) Not quite; there were
several things added in comments:
Token management - handled in [YARN-941].
Scheduler hints not related to node categories or anti-affinity (opportunistic
scheduling, etc.) - this strikes me as something better handled via the
duration route et al. ([YARN-2877], [YARN-1051]) and not something which needs
to be replicated here.
I think that really just leaves the progress bar (and potentially other
display-related items). This is covered by [YARN-1079]. I suggest, then, that
we either rescope this jira to providing the lifecycle information as an
application tag
[https://issues.apache.org/jira/browse/YARN-1039?focusedCommentId=14039679&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14039679]
as suggested by [~zjshen] early on, or close it and cover the work as part of
[YARN-1079]. I originally objected to that approach on the basis that tags
appeared to be a display-type feature which did not fit this effort, but if
rescoped as I'm proposing, this becomes such a feature, and I think that
approach is now a good fit.
Thoughts?
> Add parameter for YARN resource requests to indicate "long lived"
> -----------------------------------------------------------------
>
> Key: YARN-1039
> URL: https://issues.apache.org/jira/browse/YARN-1039
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: resourcemanager
> Affects Versions: 3.0.0, 2.1.1-beta
> Reporter: Steve Loughran
> Assignee: Craig Welch
> Attachments: YARN-1039.1.patch, YARN-1039.2.patch, YARN-1039.3.patch
>
>
> A container request could support a new parameter "long-lived". This could be
> used by a scheduler that would know not to host the service on a transient
> (cloud: spot priced) node.
> Schedulers could also decide whether or not to allocate multiple long-lived
> containers on the same node