Sunil G commented on YARN-3784:

Yes [~leftnoteasy], thank you for sharing your thoughts.
If I understood you correctly, a to-be-preempted container may stay in 
FicaSchedulerApp until the next allocate call comes from the AM. During this 
window, other containers may be freed, or the app may cancel some of its 
resource requests, so this container may no longer need to be preempted and 
can be removed from the to-be-preempted list. I think we can add a 
remove-from-to-preempt operation in the scheduler, and proportionalCPP can 
notify the app when this happens. This could also be added as a new field in 
the AM response. I will split this improvement into a separate ticket.
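To make the remove-from-to-preempt idea concrete, here is a minimal sketch of the app-level bookkeeping it implies. Everything in it is hypothetical: `ToBePreemptedTracker`, `PreemptionUpdate`, and their methods are illustrative names, not the real FicaSchedulerApp or allocate-response API, and container IDs are plain strings. The assumption is that a container un-marked before the AM's next allocate call is simply dropped, while one the AM has already been told about must be reported back as "no longer preempted".

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical allocate-response fragment: newly marked and no-longer-marked containers. */
class PreemptionUpdate {
  final List<String> toPreempt;
  final List<String> noLongerPreempted;

  PreemptionUpdate(List<String> toPreempt, List<String> noLongerPreempted) {
    this.toPreempt = toPreempt;
    this.noLongerPreempted = noLongerPreempted;
  }
}

/** Hypothetical app-level tracker; NOT the real FicaSchedulerApp API. */
class ToBePreemptedTracker {
  private final Set<String> pending = new LinkedHashSet<>();   // marked, AM not told yet
  private final Set<String> reported = new LinkedHashSet<>();  // AM already told
  private final Set<String> cancelled = new LinkedHashSet<>(); // un-marked after AM was told

  synchronized void markForPreemption(String containerId) {
    cancelled.remove(containerId);        // re-marking undoes a pending cancellation
    if (!reported.contains(containerId)) {
      pending.add(containerId);           // AM learns about it on the next allocate
    }
  }

  /** Proposed remove-from-to-preempt: resources were freed, so preemption is no longer needed. */
  synchronized void removeFromToPreempt(String containerId) {
    if (pending.remove(containerId)) {
      return;                             // AM never saw it, nothing to report
    }
    if (reported.remove(containerId)) {
      cancelled.add(containerId);         // AM must be told to stop preparing for preemption
    }
  }

  /** Called on the AM's allocate: hand out both lists and reset the per-heartbeat state. */
  synchronized PreemptionUpdate pullForAllocate() {
    PreemptionUpdate update =
        new PreemptionUpdate(new ArrayList<>(pending), new ArrayList<>(cancelled));
    reported.addAll(pending);
    pending.clear();
    cancelled.clear();
    return update;
  }
}
```

Keeping "pending" separate from "reported" is what lets the scheduler silently drop a container the AM never heard about, so only genuinely retracted notices would need the new response field.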

Regarding your second point, I think we can add a synchronized getter API in 
FicaSchedulerApp (scheduler level) for its to-be-preempted containers. With 
this API, proportionalCPP can check whether a container it has newly selected 
for preemption has already been reported as to-be-preempted at the app level. 
If so, proportionalCPP does not need to raise another event to the scheduler. 
I'll separate this out as well, if that's OK.
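A rough sketch of the second point follows, assuming a synchronized getter that returns a snapshot of the app's to-be-preempted set. `SchedulerAppView` and `PreemptionEventFilter` are hypothetical stand-ins for FicaSchedulerApp and proportionalCPP, and appending to `raisedEvents` is only a placeholder for sending a real event to the scheduler.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical stand-in for the app-level (scheduler-side) state. */
class SchedulerAppView {
  private final Set<String> toBePreempted = new HashSet<>();

  synchronized void addToBePreempted(String containerId) {
    toBePreempted.add(containerId);
  }

  /** Proposed synchronized getter; returns a copy so callers need not hold the app's lock. */
  synchronized Set<String> getToBePreemptedContainers() {
    return Collections.unmodifiableSet(new HashSet<>(toBePreempted));
  }
}

/** Hypothetical policy step: only raise events for containers the app does not already track. */
class PreemptionEventFilter {
  private final List<String> raisedEvents = new ArrayList<>();

  void selectForPreemption(SchedulerAppView app, List<String> candidates) {
    Set<String> alreadyMarked = app.getToBePreemptedContainers();
    for (String containerId : candidates) {
      if (alreadyMarked.contains(containerId)) {
        continue; // already reported at app level, so skip the duplicate event
      }
      raisedEvents.add(containerId);    // placeholder for sending an event to the scheduler
      app.addToBePreempted(containerId); // in YARN the scheduler would do this on the event
    }
  }

  List<String> raisedEvents() {
    return raisedEvents;
  }
}
```

Returning a copy keeps the lock held only for the duration of the copy; the trade-off is that the policy may act on a slightly stale view, which at worst means one redundant event.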

> Indicate preemption timeout along with the list of containers to AM 
> (preemption message)
> ---------------------------------------------------------------------------------------
>                 Key: YARN-3784
>                 URL: https://issues.apache.org/jira/browse/YARN-3784
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>            Reporter: Sunil G
>            Assignee: Sunil G
>         Attachments: 0001-YARN-3784.patch, 0002-YARN-3784.patch
> Currently, during preemption, the AM is notified with a list of containers 
> that are marked for preemption. The proposal is to also send a timeout 
> duration along with this container list, so that the AM knows how much time 
> it has to gracefully shut down those containers (assuming a preemption 
> policy is loaded in the AM).
> This will help in NM decommissioning scenarios, where the NM is 
> decommissioned after a timeout (and the containers on it are killed). The 
> timeout tells the AM that the RM may forcefully kill those containers once 
> it expires.
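The timeout described in the issue could be consumed on the AM side roughly as follows. This is a sketch under stated assumptions: `PreemptionNotice` and `GracefulShutdownPlanner` are hypothetical classes, not the real YARN PreemptionMessage API, and the timeout is assumed to be a relative duration in milliseconds counted from when the AM receives the message. Times are passed in explicitly rather than read from the clock so the logic is easy to test.

```java
import java.util.Collections;
import java.util.List;

/** Hypothetical preemption notice: containers plus the grace period before a forced kill. */
final class PreemptionNotice {
  final List<String> containerIds;
  final long timeoutMs; // assumed: how long the AM has before the RM may kill forcefully

  PreemptionNotice(List<String> containerIds, long timeoutMs) {
    if (timeoutMs < 0) {
      throw new IllegalArgumentException("timeout must be non-negative");
    }
    this.containerIds = Collections.unmodifiableList(containerIds);
    this.timeoutMs = timeoutMs;
  }
}

/** Hypothetical AM-side helper: derive a deadline and check whether a graceful stop still fits. */
final class GracefulShutdownPlanner {
  private final PreemptionNotice notice;
  private final long receivedAtMs;

  GracefulShutdownPlanner(PreemptionNotice notice, long receivedAtMs) {
    this.notice = notice;
    this.receivedAtMs = receivedAtMs;
  }

  long deadlineMs() {
    return receivedAtMs + notice.timeoutMs;
  }

  /** Time left before a forced kill, never negative. */
  long remainingMs(long nowMs) {
    return Math.max(0L, deadlineMs() - nowMs);
  }

  /** True if a shutdown expected to take shutdownCostMs can finish before the deadline. */
  boolean canShutDownGracefully(long nowMs, long shutdownCostMs) {
    return shutdownCostMs <= remainingMs(nowMs);
  }
}
```

For example, a notice with a 30-second timeout received at t = 1000 ms gives a deadline of 31000 ms; at t = 11000 ms the AM has 20000 ms left, enough for a 20-second checkpoint but not for one that takes any longer.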

This message was sent by Atlassian JIRA
