Chris Douglas commented on YARN-1039:

The semantics of a boolean flag are opaque. The policies enforced by different 
RM configurations (and versions) will not be, and cannot be made to be, 
consistent. Application and container priority are already encoded (or in 
progress, YARN-1963), so it's not just preemption priority or cost. Affinity 
and anti-affinity are also covered by different features. Discussion has been 
wide-ranging because it is unclear what "long-lived" guarantees across existing 
features (beyond removing the progress bar from the UI, which I hope we can 
stop mentioning).

An implementation that only recognizes infinite and undefined leases could be 
mapped into duration. Lease duration could also be used to communicate when 
security tokens cannot be renewed, short-lived guarantees for YARN-2877 
containers, boundaries of YARN-1051 reservations, and planned decommissioning. 
In contrast, the "long-lived" flag cannot be used for these cases. We could 
expose probabilistic guarantees (which are what we give in reality), but that's 
a later issue.
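
To make the mapping concrete, here is a minimal sketch in Python. It is not YARN code: `Lease`, `lease_from_flag`, and `clamp_lease` are hypothetical names invented for illustration, and seconds are an assumed unit. It only shows that the flag is expressible as a duration (infinite or undefined), while a duration can also carry the bounded cases listed above.

```python
import math
from dataclasses import dataclass
from typing import Optional

INFINITE = math.inf  # a lease that never expires, i.e. "long-lived"


@dataclass(frozen=True)
class Lease:
    # Hypothetical type. None means the duration is undefined.
    duration_s: Optional[float] = None

    @property
    def long_lived(self) -> bool:
        # The boolean flag is recoverable only for the infinite case.
        return self.duration_s == INFINITE


def lease_from_flag(long_lived: bool) -> Lease:
    """Map the proposed boolean flag onto a lease duration."""
    return Lease(INFINITE) if long_lived else Lease(None)


def clamp_lease(requested: Lease, *limits_s: float) -> Lease:
    """Shorten a lease to the tightest known limit, e.g. token expiry,
    the end of a reservation, or a planned decommission (all assumed inputs)."""
    bounds = list(limits_s)
    if requested.duration_s is not None:
        bounds.append(requested.duration_s)
    return Lease(min(bounds)) if bounds else requested
```

For example, `clamp_lease(lease_from_flag(True), 3600.0, 600.0)` yields a 600-second lease, a bound the flag alone has no way to express.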

Considering the blockers more concretely:
bq. (a) reservations (b) white-listed requests or (c) node-label requests 
getting stuck on a node used by other services' containers that don't exit.

Aren't these handled by adding a timeout to allocations, which would also catch 
cases where this flag is _not_ set? The timeout value could be set 
scheduler-wide to start, and could be made user-visible in later versions.
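
A rough sketch of what I mean by a timeout, in Python rather than scheduler code. `PendingAllocation` and `expire_stuck` are hypothetical names, and a single scheduler-wide `timeout_s` is an assumption. The point is that the check never looks at whether the blocking containers are "long-lived".

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class PendingAllocation:
    # Hypothetical record of a reservation, white-listed request, or
    # node-label request that is waiting on a specific node.
    request_id: str
    node: str
    reserved_at_s: float


def expire_stuck(
    pending: List[PendingAllocation], now_s: float, timeout_s: float
) -> Tuple[List[PendingAllocation], List[PendingAllocation]]:
    """Split pending allocations into (still waiting, timed out).

    Anything that has waited at least timeout_s is returned as timed out,
    so the scheduler can release it and try another node.
    """
    waiting: List[PendingAllocation] = []
    expired: List[PendingAllocation] = []
    for p in pending:
        if now_s - p.reserved_at_s >= timeout_s:
            expired.append(p)
        else:
            waiting.append(p)
    return waiting, expired
```

What the scheduler does with the expired entries (retry elsewhere, fail the request) is a separate policy question that this sketch leaves open.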

All that said, I don't have time to work on this, agree the API can be evolved from 
the flag, and am -0 on it.

> Add parameter for YARN resource requests to indicate "long lived"
> -----------------------------------------------------------------
>                 Key: YARN-1039
>                 URL: https://issues.apache.org/jira/browse/YARN-1039
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>    Affects Versions: 3.0.0, 2.1.1-beta
>            Reporter: Steve Loughran
>            Assignee: Craig Welch
>         Attachments: YARN-1039.1.patch, YARN-1039.2.patch, YARN-1039.3.patch
> A container request could support a new parameter "long-lived". This could be 
> used by a scheduler that would know not to host the service on a transient 
> (cloud: spot priced) node.
> Schedulers could also decide whether or not to allocate multiple long-lived 
> containers on the same node.
