[ https://issues.apache.org/jira/browse/YARN-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14541129#comment-14541129 ]
Vinod Kumar Vavilapalli commented on YARN-1039:
-----------------------------------------------
*sigh* This JIRA was all over the place.
Can we please agree not to discuss here *how* the scheduling features, UI,
log-aggregation, and security-tokens for long-running services should be
implemented? There are separate JIRAs with good progress on each of them.
Let's also please not discuss how the platform _could_ make use of the notion
of a long-lived nature of an application/container. I understand that the type
of usage shall dictate what the input will look like, but hold on to that for a
second.
h3. Blocker
I've already started seeing real-life situations where we need the RM to know
about the long-lived-ness of a container and an application. The prominent ones
are (a) reservations, (b) white-listed requests, or (c) node-label
requests getting stuck on a node used by other services' containers that don't
exit.
Absence of this notion is increasingly becoming a *blocker* for running
services. I'd like to get some progress here.
h3. Short Proposal
There seems to be general agreement on having the notion itself. Here are the
proposals and dimensions:
# The notion at app level, at per-container level
# A boolean flag, an enum, a duration
I propose that we solve the blocker use-case that I pointed out above with a
boolean at both app-level and container-level. Tomorrow, when somebody
implements a duration-based bin-packing scheduling policy, they can add in the
notion of a duration and then reconcile the boolean with an infinite value for
the duration. The enum proposal is to me a duplicate of YARN-3409, which covers
a much larger problem space.
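To make the reconciliation concrete, here is a rough sketch of how a boolean flag could later map onto a duration. None of these names come from YARN or from the attached patches: the class {{LongLivedSketch}}, its methods, and the use of {{Long.MAX_VALUE}} as the "infinite" duration are all hypothetical. The rule that a container-level flag overrides the app-level one is likewise only an assumption made for illustration.

```java
/**
 * Hypothetical sketch only: these names are not part of the YARN API
 * or of any patch attached to this JIRA.
 */
public class LongLivedSketch {

    /** Assumed sentinel standing in for an "infinite" expected duration. */
    static final long INFINITE_MILLIS = Long.MAX_VALUE;

    /**
     * Maps the boolean flag onto a duration: long-lived becomes the
     * infinite sentinel, anything else keeps the supplied finite estimate.
     */
    static long toDurationMillis(boolean longLived, long finiteEstimateMillis) {
        return longLived ? INFINITE_MILLIS : finiteEstimateMillis;
    }

    /** Reverse mapping: only the infinite sentinel counts as long-lived. */
    static boolean isLongLived(long durationMillis) {
        return durationMillis == INFINITE_MILLIS;
    }

    /**
     * Combines the two levels. Assumption: a container-level flag, when
     * present (non-null), overrides the app-level flag.
     */
    static boolean effectiveLongLived(boolean appLongLived, Boolean containerLongLived) {
        return containerLongLived != null ? containerLongLived : appLongLived;
    }

    public static void main(String[] args) {
        // A long-lived app whose container request does not set the flag.
        boolean effective = effectiveLongLived(true, null);
        long duration = toDurationMillis(effective, 60_000L);
        System.out.println("long-lived=" + effective
                + ", infinite=" + (duration == INFINITE_MILLIS));
    }
}
```

With a mapping like this, a scheduler that only understands the boolean and a later duration-based policy could read the same request consistently.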
Thoughts?
> Add parameter for YARN resource requests to indicate "long lived"
> -----------------------------------------------------------------
>
> Key: YARN-1039
> URL: https://issues.apache.org/jira/browse/YARN-1039
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: resourcemanager
> Affects Versions: 3.0.0, 2.1.1-beta
> Reporter: Steve Loughran
> Assignee: Craig Welch
> Attachments: YARN-1039.1.patch, YARN-1039.2.patch, YARN-1039.3.patch
>
>
> A container request could support a new parameter "long-lived". This could be
> used by a scheduler that would know not to host the service on a transient
> (cloud: spot priced) node.
> Schedulers could also decide whether or not to allocate multiple long-lived
> containers on the same node