Yuqi Wang created YARN-8012:

             Summary: Support Unmanaged Container Cleanup
                 Key: YARN-8012
                 URL: https://issues.apache.org/jira/browse/YARN-8012
             Project: Hadoop YARN
          Issue Type: New Feature
          Components: nodemanager
    Affects Versions: 2.7.1
            Reporter: Yuqi Wang
            Assignee: Yuqi Wang
             Fix For: 2.7.1

An *unmanaged container* is a container that is no longer managed by NM. As a result, it cannot be managed by YARN either.

*There are many cases in which a YARN-managed container can become unmanaged, such as:*
 # For container resources managed by YARN, such as the container job object and disk data:
 ** NM service is disabled or removed on the node.
 ** NM is unable to start up again on the node, for example because a dependent configuration or resource is not ready.
 ** The NM local leveldb store is corrupted or lost, for example due to bad disk sectors.
 ** NM has bugs, such as wrongly marking a live container as complete.
 # For container resources not managed by YARN:
 ** The user breaks processes away from the container job object.
 ** The user creates VMs from within the container job object.
 ** The user acquires other resources on the machine that are not managed by YARN, such as producing data outside the container folder.

*Bad impacts of unmanaged containers include:*
 # Resources can no longer be managed by YARN or on the node:
 ** YARN and node resources leak.
 ** The container cannot be killed to release its YARN resources on the node.
 # Killing a container or app is not eventually consistent for the user:
 ** A buggy app can keep having bad impacts on the outside world long after it has been killed.

*Initial patch for review:*

In the initial patch, the unmanaged container cleanup feature on Windows can only clean up the container job object of the unmanaged container. Cleanup of more container resources will be supported later, and unit tests (UT) will be added once the design is agreed upon.

The current container is considered unmanaged when:
 # NM is dead:
 ** The check of whether the container is still managed by NM does not succeed within the timeout.
 # NM is alive, but the container is org.apache.hadoop.yarn.api.records.ContainerState#COMPLETE or not found:
 ** The container is in state org.apache.hadoop.yarn.api.records.ContainerState#COMPLETE or is not found in the NM container list.
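
The two conditions above can be sketched as a single self-check for the current container. This is a minimal sketch under stated assumptions, not the patch's implementation: the NM query is a hypothetical supplier returning the NM container list as a map from container ID to state, {{ContainerState}} is a hypothetical local mirror of org.apache.hadoop.yarn.api.records.ContainerState so the sketch compiles without Hadoop, and a query that throws is treated as a failed check, the same as one that times out.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

public class UnmanagedContainerCheck {

  /** Hypothetical local mirror of org.apache.hadoop.yarn.api.records.ContainerState. */
  public enum ContainerState { NEW, RUNNING, COMPLETE }

  /**
   * Decides whether the current container should be treated as unmanaged.
   *
   * @param nmContainerList hypothetical query to the local NM returning its container
   *                        list (container ID to state); it may block or throw if NM is dead
   * @param containerId     ID of the current container
   * @param timeoutMs       how long to wait for NM before treating it as dead
   */
  public static boolean isUnmanaged(Supplier<Map<String, ContainerState>> nmContainerList,
                                    String containerId, long timeoutMs) {
    Map<String, ContainerState> containers;
    try {
      // Run the NM query asynchronously so a hung NM cannot block the check forever.
      containers = CompletableFuture.supplyAsync(nmContainerList)
          .get(timeoutMs, TimeUnit.MILLISECONDS);
    } catch (TimeoutException | ExecutionException e) {
      // Case 1: NM is dead. The check failed or did not finish within the timeout.
      return true;
    } catch (InterruptedException e) {
      // The checker itself was interrupted; draw no conclusion about the container.
      Thread.currentThread().interrupt();
      return false;
    }
    // Case 2: NM is alive, but the container is COMPLETE or missing from its list.
    ContainerState state = containers.get(containerId);
    return state == null || state == ContainerState.COMPLETE;
  }
}
```

Treating a query that throws the same as a timeout errs toward cleanup; whether the actual patch retries before declaring NM dead is not stated in this issue.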

This message was sent by Atlassian JIRA