[
https://issues.apache.org/jira/browse/YARN-8450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Bibin A Chundatt updated YARN-8450:
-----------------------------------
Comment: was deleted
(was: [~sunilg]/[~eepayne]
This issue is mainly exposed during kill scenarios and preemption cases.
Thoughts on moving the resource check to {{ResourceHandlerChain}}?
One solution could be to wait until the resource is released by
{{resourceHandlers}} with strict binding.
Another would be to add a {{canAssign}} interface to the resource handlers and
query canAssign until a timeout. Thoughts?)
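
The second option in the deleted comment (poll a {{canAssign}} check until a timeout) could look roughly like the sketch below. This is only an illustration under stated assumptions: {{AssignableResourceHandler}}, {{waitUntilAssignable}}, and the timeout and poll-interval parameters are hypothetical names, not part of the existing YARN {{ResourceHandler}} API, and the "slow GPU" is simulated with a timestamp.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch; none of these types exist in YARN. */
public class CanAssignSketch {

    /** Hypothetical interface: a handler reports whether its device is free again. */
    interface AssignableResourceHandler {
        boolean canAssign();
    }

    /**
     * Polls handler.canAssign() until it returns true or the timeout elapses.
     * Returns true if the resource became assignable in time, false on
     * timeout or if the waiting thread is interrupted.
     */
    static boolean waitUntilAssignable(AssignableResourceHandler handler,
                                       long timeoutMs, long pollIntervalMs) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        while (true) {
            if (handler.canAssign()) {
                return true;
            }
            long remainingNs = deadline - System.nanoTime();
            if (remainingNs <= 0) {
                return false; // timed out; caller decides how to proceed
            }
            // Sleep for the poll interval, but never past the deadline.
            long remainingMs = Math.max(1, TimeUnit.NANOSECONDS.toMillis(remainingNs));
            try {
                Thread.sleep(Math.min(pollIntervalMs, remainingMs));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status
                return false;
            }
        }
    }

    public static void main(String[] args) {
        // Simulated device that is released about 200 ms after the container completes.
        long releasedAt = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(200);
        AssignableResourceHandler slowGpu = () -> System.nanoTime() >= releasedAt;
        System.out.println("slow device assignable: "
            + waitUntilAssignable(slowGpu, 1000, 50));

        // Simulated device that is never released within the timeout.
        AssignableResourceHandler stuck = () -> false;
        System.out.println("stuck device assignable: "
            + waitUntilAssignable(stuck, 100, 20));
    }
}
```

A bounded poll keeps a slow device from blocking the caller indefinitely; what should happen on timeout (fail the allocation, retry later, mark the device unhealthy) is left open here, as it is in the comment.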
> Blocking resources such as GPU/FPGA etc tend to release actual device slowly
> even after RM identifies it as COMPLETED
> ---------------------------------------------------------------------------------------------------------------------
>
> Key: YARN-8450
> URL: https://issues.apache.org/jira/browse/YARN-8450
> Project: Hadoop YARN
> Issue Type: Bug
> Components: nodemanager
> Affects Versions: 3.0.2
> Reporter: Sunil Govindan
> Assignee: Bilwa S T
> Priority: Major
>
> For resources such as GPU/FPGA, we have sometimes seen that the device is
> not released from a container even after the container is in a completed
> state.
> In such cases, we need a common way of handling this at the NM level.
> YARN-8423 handles this only for GPU.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)