[ 
https://issues.apache.org/jira/browse/YARN-3784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-3784:
--------------------------
    Attachment: 0001-YARN-3784.patch

Uploading an initial version.

As per the existing preemption framework, the AM fetches the containers that 
are to be preempted during each allocate call. Along with this list, the AM can 
now also fetch a proposed time duration after which the RM will forcefully kill 
those containers.
This patch does not set the preemption timeout per container; instead, it 
provides one timeout per application. So all containers marked for preemption 
within a heartbeat interval share a common timeout. If two types of containers 
are marked for preemption with different timeouts, the lowest one is used.
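
To illustrate the lowest-timeout rule, here is a minimal sketch. It is not 
taken from the patch: the class and method names are hypothetical, and 
timeouts are assumed to be plain millisecond values.

```java
import java.util.Arrays;
import java.util.List;
import java.util.OptionalLong;

// Hypothetical helper, not part of the attached patch.
class CommonPreemptionTimeout {

    // Returns the lowest timeout (ms) among the containers marked for
    // preemption in one heartbeat, or empty if no container is marked.
    static OptionalLong lowestTimeoutMs(List<Long> timeoutsMs) {
        return timeoutsMs.stream().mapToLong(Long::longValue).min();
    }

    public static void main(String[] args) {
        // Two containers marked in the same heartbeat with different timeouts.
        List<Long> marked = Arrays.asList(30_000L, 15_000L);
        System.out.println(lowestTimeoutMs(marked).getAsLong()); // prints 15000
    }
}
```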

If we provide the timeout per container, we need to change the interface from a 
list of containers to a map <containerId, timeOut>. Please share your thoughts 
on this point.
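
For discussion, a rough sketch of how an AM might consume such a map. This is 
only an assumption about the shape of the interface: container IDs are shown 
as plain strings, timeouts as milliseconds, and the class and method names are 
hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of the per-container alternative, not part of the patch.
class PerContainerTimeoutSketch {

    // Converts per-container timeouts (ms) into absolute kill deadlines,
    // given the time (ms) at which the preemption message was received.
    static Map<String, Long> killDeadlines(Map<String, Long> timeoutMsByContainer,
                                           long receivedAtMs) {
        Map<String, Long> deadlines = new LinkedHashMap<>();
        for (Map.Entry<String, Long> e : timeoutMsByContainer.entrySet()) {
            deadlines.put(e.getKey(), receivedAtMs + e.getValue());
        }
        return deadlines;
    }

    public static void main(String[] args) {
        Map<String, Long> timeouts = new LinkedHashMap<>();
        timeouts.put("container_1", 30_000L);
        timeouts.put("container_2", 15_000L);
        // prints {container_1=31000, container_2=16000}
        System.out.println(killDeadlines(timeouts, 1_000L));
    }
}
```

With a map, the AM could shut down the containers with the earliest deadlines 
first, which the single per-application value does not allow.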

> Indicate preemption timeout along with the list of containers to AM 
> (preemption message)
> ---------------------------------------------------------------------------------------
>
>                 Key: YARN-3784
>                 URL: https://issues.apache.org/jira/browse/YARN-3784
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>            Reporter: Sunil G
>            Assignee: Sunil G
>         Attachments: 0001-YARN-3784.patch
>
>
> Currently, during preemption, the AM is notified with a list of containers 
> that are marked for preemption. This issue proposes sending a timeout 
> duration along with this container list, so that the AM knows how much time 
> it has to gracefully shut down its containers (assuming one of the 
> preemption policies is loaded in the AM).
> This will also help in NM decommissioning scenarios, where an NM is 
> decommissioned after a timeout (which also kills the containers on it). The 
> timeout indicates to the AM that the RM can forcefully kill those containers 
> once the timeout expires.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
