[
https://issues.apache.org/jira/browse/YARN-5292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15348963#comment-15348963
]
Roni Burd commented on YARN-5292:
---------------------------------
Chiming in on the use case.
An AM may use opportunistic containers
(https://issues.apache.org/jira/browse/YARN-2882) for a couple of things:
1) Duplicate execution: race a slow/laggard container and see which one is
making more progress.
2) Speculative execution: start future work by scavenging resources that
are free and above my current allocation.
3) Have a customer "pay" for opportunistic containers, which are "cheaper" than
guaranteed containers, fully knowing that the job has a lower SLA.
In all these cases, the opportunistic container may get preempted. The question is
what strategy to choose on preemption:
1) Kill the container.
2) Context switch the container somehow.
3) Move the container somewhere else.
Strategies 2 and 3 are work-preserving, which matters for long-running
batch jobs. Imagine a stage of the job has been RUNNING for 10 min on an
opportunistic container and has 1 minute left to run. It has already localized
resources and processed a bunch of data when a small 30s GUARANTEED
container preempts it. KILL becomes very expensive. So, more than a PAUSED state,
I think this is really a PREEMPTED state.
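
To put rough numbers on the example above, here is a minimal sketch. The class, enum, and method names are hypothetical and not part of any YARN API. It also assumes a killed container restarts from scratch only after the preemptor finishes, and it ignores re-localization time, so the KILL figure is if anything optimistic.

```java
import java.time.Duration;

public class PreemptionCost {
    // Hypothetical strategy names for illustration; not a YARN enum.
    enum Strategy { KILL, PAUSE }

    /**
     * Extra time until the opportunistic task finishes, compared with
     * running undisturbed. Assumes the task waits for the preemptor to
     * finish before it continues (PAUSE) or starts over (KILL).
     */
    static Duration addedDelay(Strategy strategy, Duration workDone, Duration preemptorRuntime) {
        switch (strategy) {
            case KILL:
                // All work done so far is lost and has to be redone.
                return workDone.plus(preemptorRuntime);
            case PAUSE:
                // Work is preserved; only the preemptor's runtime is added.
                return preemptorRuntime;
            default:
                throw new IllegalArgumentException("Unknown strategy: " + strategy);
        }
    }

    public static void main(String[] args) {
        Duration workDone = Duration.ofMinutes(10);   // stage has been RUNNING for 10 min
        Duration preemptor = Duration.ofSeconds(30);  // small GUARANTEED container

        System.out.println("KILL adds "
                + addedDelay(Strategy.KILL, workDone, preemptor).getSeconds() + "s");
        System.out.println("PAUSE adds "
                + addedDelay(Strategy.PAUSE, workDone, preemptor).getSeconds() + "s");
    }
}
```

Under these assumptions, KILL costs 630s of extra time against 30s for a work-preserving strategy, which is the gap I am worried about.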
> Support for PAUSED container state
> ----------------------------------
>
> Key: YARN-5292
> URL: https://issues.apache.org/jira/browse/YARN-5292
> Project: Hadoop YARN
> Issue Type: New Feature
> Reporter: Hitesh Sharma
>
> YARN-2877 introduced OPPORTUNISTIC containers, and YARN-5216 proposes to add
> capability to customize how OPPORTUNISTIC containers get preempted.
> In this JIRA we propose introducing a PAUSED container state.
> When a running container gets preempted, it enters the PAUSED state, where it
> remains until resources are freed up on the node; the preempted container
> can then resume to the running state.
>
> One scenario where this capability is useful is work preservation. How
> preemption is done, and whether the container supports it, is implementation
> specific.
> For instance, if the container is a virtual machine, then preempt would pause
> the VM and resume would restore it back to the running state.
> If the container doesn't support preemption, then preempt would default to
> killing the container.
>
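
The transitions described in the issue can be sketched as a tiny state machine. This is only an illustration under assumed names: the State enum, the Container class, and the supportsPause flag are hypothetical and do not reflect YARN's actual container state classes or any proposed patch.

```java
public class PausableContainerSketch {
    // Hypothetical states for illustration; not YARN's ContainerState.
    enum State { RUNNING, PAUSED, KILLED }

    static final class Container {
        private final boolean supportsPause;
        private State state = State.RUNNING;

        Container(boolean supportsPause) {
            this.supportsPause = supportsPause;
        }

        State state() {
            return state;
        }

        /** Preempt a running container: pause if supported, otherwise kill. */
        void preempt() {
            if (state != State.RUNNING) {
                return; // nothing to preempt
            }
            state = supportsPause ? State.PAUSED : State.KILLED;
        }

        /** Called when resources are freed on the node; only PAUSED resumes. */
        void resume() {
            if (state == State.PAUSED) {
                state = State.RUNNING;
            }
        }
    }

    public static void main(String[] args) {
        Container vm = new Container(true);       // e.g. a VM that can be paused
        vm.preempt();
        System.out.println(vm.state());           // PAUSED
        vm.resume();
        System.out.println(vm.state());           // RUNNING

        Container plain = new Container(false);   // no pause support
        plain.preempt();
        System.out.println(plain.state());        // KILLED
    }
}
```

The fallback to KILLED mirrors the default described in the issue for containers that do not support preemption.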