[
https://issues.apache.org/jira/browse/YARN-8012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Yuqi Wang updated YARN-8012:
----------------------------
Description:
An *unmanaged container / leaked container* is a container which is no longer
managed by NM. Thus, it cannot be managed by YARN either.
*There are many cases in which a YARN-managed container can become unmanaged, such as:*
* NM service is disabled or removed on the node.
* NM is unable to start up again on the node, for example because a
configuration or resource it depends on is not ready.
* NM local leveldb store is corrupted or lost, for example due to bad disk sectors.
* NM has bugs, such as wrongly marking a live container as complete.
Things become worse if work-preserving NM restart is enabled, since containers
then keep running while NM is down, see
[YARN-1336|https://issues.apache.org/jira/browse/YARN-1336]
*Unmanaged containers have bad impacts, such as:*
# Resources cannot be managed by YARN or the node:
** YARN and node resources leak
** The container cannot be killed to release YARN resources on the node
# Container and App killing is not eventually consistent for the user:
** A buggy App can still produce bad external effects long after the App
has been killed
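As a minimal sketch of the definition above (not the approach taken in the attached patch), an unmanaged container can be thought of as one that is still running on the node but is absent from the set of containers NM tracks. The class name, method name, and container IDs below are hypothetical, and how the running containers on a node are actually enumerated is outside this sketch.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration only; not part of the YARN code base.
final class UnmanagedContainerCheck {

    // Returns the IDs that are running on the node but not tracked by NM,
    // i.e. the set difference runningOnNode \ trackedByNm.
    static Set<String> findUnmanaged(Set<String> runningOnNode,
                                     Set<String> trackedByNm) {
        // Copy first so the caller's set is not modified.
        Set<String> unmanaged = new TreeSet<>(runningOnNode);
        unmanaged.removeAll(trackedByNm);
        return unmanaged;
    }

    public static void main(String[] args) {
        // Assumed inputs: IDs observed on the node vs. IDs NM still knows about.
        Set<String> running = new TreeSet<>(
                Arrays.asList("container_01", "container_02", "container_03"));
        Set<String> tracked = new TreeSet<>(Arrays.asList("container_01"));

        // Prints [container_02, container_03]
        System.out.println(findUnmanaged(running, tracked));
    }
}
```

Any ID this check returns corresponds to a container that YARN can neither account for nor kill, which is the leak described above.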
was:
An *unmanaged container* is a container which is no longer managed by NM. Thus,
it cannot be managed by YARN either.
*There are many cases in which a YARN-managed container can become unmanaged, such as:*
# For container resources managed by YARN, such as the container job object
and disk data:
** NM service is disabled or removed on the node.
** NM is unable to start up again on the node, for example because a
configuration or resource it depends on is not ready.
** NM local leveldb store is corrupted or lost, for example due to bad disk sectors.
** NM has bugs, such as wrongly marking a live container as complete.
# For container resources unmanaged by YARN:
** User breaks processes away from the container job object.
** User creates VMs from the container job object.
** User acquires other resources on the machine which are unmanaged by
YARN, such as producing data outside the Container folder.
*Unmanaged containers have bad impacts, such as:*
# Resources cannot be managed by YARN or the node:
** YARN and node resources leak
** The container cannot be killed to release YARN resources on the node
# Container and App killing is not eventually consistent for the user:
** A buggy App can still produce bad external effects long after the App
has been killed
> Support Unmanaged Container Cleanup
> -----------------------------------
>
> Key: YARN-8012
> URL: https://issues.apache.org/jira/browse/YARN-8012
> Project: Hadoop YARN
> Issue Type: New Feature
> Components: nodemanager
> Affects Versions: 2.7.1
> Reporter: Yuqi Wang
> Assignee: Yuqi Wang
> Priority: Major
> Fix For: 2.7.1
>
> Attachments: YARN-8012-branch-2.7.1.001.patch
>
>
> An *unmanaged container / leaked container* is a container which is no longer
> managed by NM. Thus, it cannot be managed by YARN either.
> *There are many cases in which a YARN-managed container can become unmanaged, such as:*
> * NM service is disabled or removed on the node.
> * NM is unable to start up again on the node, for example because a
> configuration or resource it depends on is not ready.
> * NM local leveldb store is corrupted or lost, for example due to bad disk sectors.
> * NM has bugs, such as wrongly marking a live container as complete.
> Things become worse if work-preserving NM restart is enabled, since containers
> then keep running while NM is down, see
> [YARN-1336|https://issues.apache.org/jira/browse/YARN-1336]
> *Unmanaged containers have bad impacts, such as:*
> # Resources cannot be managed by YARN or the node:
> ** YARN and node resources leak
> ** The container cannot be killed to release YARN resources on the node
> # Container and App killing is not eventually consistent for the user:
> ** A buggy App can still produce bad external effects long after the
> App has been killed