[ 
https://issues.apache.org/jira/browse/YARN-8450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt updated YARN-8450:
-----------------------------------
    Comment: was deleted

(was: [~sunilg]/[~eepayne]

This issue is mainly exposed during container kill and preemption scenarios.
What do you think about moving the resource check into {{ResourceHandlerChain}}?

One option is to wait until the resource has been released by the
{{resourceHandlers}} that have strict binding. Another is to add a
{{canAssign}} interface to the resource handlers and query {{canAssign}}
until a timeout expires. Thoughts?)
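
The second option in the deleted comment (poll {{canAssign}} until a timeout) could look roughly like the sketch below. It is only an illustration under stated assumptions: {{AssignableResourceHandler}}, {{waitUntilAssignable}}, and the timeout and poll-interval parameters are hypothetical names invented here, not part of the existing NodeManager {{ResourceHandler}} API, and a real patch would have to hook into {{ResourceHandlerChain}} rather than a plain list.

```java
import java.time.Duration;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class CanAssignSketch {

    /**
     * Hypothetical interface (an assumption for this sketch, not YARN's API):
     * a handler reports whether its devices are really free to hand out again.
     */
    interface AssignableResourceHandler {
        boolean canAssign();
    }

    /**
     * Polls every handler until all report canAssign() == true, or until the
     * timeout elapses. Returns true if all handlers became assignable in time.
     */
    static boolean waitUntilAssignable(List<AssignableResourceHandler> handlers,
                                       Duration timeout,
                                       Duration pollInterval)
            throws InterruptedException {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            boolean allReady = true;
            for (AssignableResourceHandler handler : handlers) {
                if (!handler.canAssign()) {
                    allReady = false;
                    break;
                }
            }
            if (allReady) {
                return true;
            }
            // Overflow-safe deadline check; give up once the timeout has passed.
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            Thread.sleep(pollInterval.toMillis());
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // Simulated slow device: reports "free" only from the third poll onward.
        AtomicInteger polls = new AtomicInteger();
        AssignableResourceHandler slowGpu = () -> polls.incrementAndGet() >= 3;

        boolean ok = waitUntilAssignable(
                List.of(slowGpu), Duration.ofSeconds(2), Duration.ofMillis(10));
        System.out.println("assignable=" + ok + " after " + polls.get() + " polls");
    }
}
```

Polling keeps the check generic across GPU, FPGA, and other handlers, at the cost of delaying container launch by up to the timeout; what should happen on timeout (fail the container or retry) is left open here, as it is in the comment.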

> Blocking resources such as GPU/FPGA etc tend to release actual device slowly 
> even after RM identifies it as COMPLETED
> ---------------------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-8450
>                 URL: https://issues.apache.org/jira/browse/YARN-8450
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 3.0.2
>            Reporter: Sunil Govindan
>            Assignee: Bilwa S T
>            Priority: Major
>
> For resources such as GPUs and FPGAs, we have sometimes seen that the
> device is not released from a container even after the container has
> reached a completed state.
> In such cases, we need a common way of handling this at the NM level.
> YARN-8423 handles this only for GPUs.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
