[
https://issues.apache.org/jira/browse/YARN-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14549505#comment-14549505
]
Chris Douglas commented on YARN-1039:
-------------------------------------
The semantics of a boolean flag are opaque. The policies enforced by different
RM configurations (and versions) will not be, and cannot be made to be,
consistent. Application and container priority are already encoded (or in
progress, YARN-1963), so it's not just preemption priority or cost. Affinity
and anti-affinity are also covered by different features. Discussion has been
wide-ranging because it is unclear what "long-lived" guarantees across existing
features (beyond removing the progress bar from the UI, which I hope we can
stop mentioning).
An implementation that only recognizes infinite and undefined leases could be
mapped into duration. Lease duration could also be used to communicate when
security tokens cannot be renewed, short-lived guarantees for YARN-2877
containers, boundaries of YARN-1051 reservations, and planned decommissioning.
In contrast, the "long-lived" flag cannot be used for these cases. We could
expose probabilistic guarantees (which are what we give in reality), but that's
a later issue.
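As a minimal sketch of that mapping (not taken from any attached patch): the
`Lease` class and all of its methods below are hypothetical names for
illustration, and nothing like them is claimed to exist in the YARN API. The
sketch only shows that a boolean flag can express the two degenerate leases,
while a finite duration covers the remaining cases.

```java
import java.time.Duration;

/** Hypothetical lease type for illustration; not part of the YARN API. */
final class Lease {
    enum Kind { UNDEFINED, FINITE, INFINITE }

    private final Kind kind;
    private final Duration duration; // non-null only when kind == FINITE

    private Lease(Kind kind, Duration duration) {
        this.kind = kind;
        this.duration = duration;
    }

    static Lease undefined() { return new Lease(Kind.UNDEFINED, null); }

    static Lease infinite() { return new Lease(Kind.INFINITE, null); }

    static Lease of(Duration d) {
        if (d.isNegative() || d.isZero()) {
            throw new IllegalArgumentException("lease duration must be positive");
        }
        return new Lease(Kind.FINITE, d);
    }

    /** A boolean "long-lived" flag maps onto the two degenerate leases only. */
    static Lease fromLongLivedFlag(boolean longLived) {
        return longLived ? infinite() : undefined();
    }

    /** True if this lease promises the container at least {@code d} of runtime. */
    boolean guaranteedFor(Duration d) {
        if (kind == Kind.INFINITE) {
            return true;
        }
        if (kind == Kind.FINITE) {
            return duration.compareTo(d) >= 0;
        }
        return false; // UNDEFINED: no guarantee is made
    }

    Kind kind() { return kind; }
}
```

Under these assumptions, `Lease.fromLongLivedFlag(true)` is just
`Lease.infinite()`, whereas something like a token that cannot be renewed
after a day would need `Lease.of(Duration.ofDays(1))`, which the flag has no
way to say.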
Considering the blockers more concretely:
bq. (a) reservations (b) white-listed requests or (c) node-label requests
getting stuck on a node used by other services' containers that don't exit.
Aren't these handled by adding a timeout to allocations, which would also catch
cases where this flag is _not_ set? The timeout value could be set across the
scheduler to start, but could even be user-visible in later versions...
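A rough sketch of what such a timeout could look like, assuming a single
scheduler-wide value. `ReservationTimeouts` and `Pending` are hypothetical
names, not existing scheduler classes, and the real bookkeeping in any YARN
scheduler would differ. The point is only that expiry depends on how long the
request has waited, not on whether any flag was set.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Hypothetical bookkeeping for pending node reservations; not YARN code. */
final class ReservationTimeouts {
    /** A request that is waiting for space on a specific node. */
    static final class Pending {
        final String node;
        final Instant reservedAt;

        Pending(String node, Instant reservedAt) {
            this.node = node;
            this.reservedAt = reservedAt;
        }
    }

    private final Duration timeout; // assumed scheduler-wide setting
    private final List<Pending> pending = new ArrayList<>();

    ReservationTimeouts(Duration timeout) {
        this.timeout = timeout;
    }

    void reserve(String node, Instant now) {
        pending.add(new Pending(node, now));
    }

    /**
     * Removes and returns every reservation that has waited longer than the
     * timeout, so the caller can retry those requests on other nodes.
     */
    List<Pending> expire(Instant now) {
        List<Pending> expired = new ArrayList<>();
        for (Iterator<Pending> it = pending.iterator(); it.hasNext();) {
            Pending p = it.next();
            if (Duration.between(p.reservedAt, now).compareTo(timeout) > 0) {
                expired.add(p);
                it.remove();
            }
        }
        return expired;
    }

    int pendingCount() {
        return pending.size();
    }
}
```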
All said, I don't have time to work on this, agree the API can be evolved from
the flag, and am -0 on it.
> Add parameter for YARN resource requests to indicate "long lived"
> -----------------------------------------------------------------
>
> Key: YARN-1039
> URL: https://issues.apache.org/jira/browse/YARN-1039
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: resourcemanager
> Affects Versions: 3.0.0, 2.1.1-beta
> Reporter: Steve Loughran
> Assignee: Craig Welch
> Attachments: YARN-1039.1.patch, YARN-1039.2.patch, YARN-1039.3.patch
>
>
> A container request could support a new parameter "long-lived". This could be
> used by a scheduler that would know not to host the service on a transient
> (cloud: spot priced) node.
> Schedulers could also decide whether or not to allocate multiple long-lived
> containers on the same node
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)