Yuqi Wang created YARN-8012:
-------------------------------

             Summary: Support Unmanaged Container Cleanup
                 Key: YARN-8012
                 URL: https://issues.apache.org/jira/browse/YARN-8012
             Project: Hadoop YARN
          Issue Type: New Feature
          Components: nodemanager
    Affects Versions: 2.7.1
            Reporter: Yuqi Wang
            Assignee: Yuqi Wang
             Fix For: 2.7.1

An *unmanaged container* is a container that is no longer managed by the NM. Consequently, it can no longer be managed by YARN either.

*There are many cases in which a YARN-managed container can become unmanaged, such as:*
# For container resources managed by YARN, such as the container job object and disk data:
** The NM service is disabled or removed on the node.
** The NM is unable to start up again on the node, for example because a configuration or resource it depends on is not ready.
** The NM local leveldb store is corrupted or lost, for example due to bad disk sectors.
** The NM has bugs, such as wrongly marking a live container as complete.
# For container resources not managed by YARN:
** User processes break away from the container job object.
** The user creates VMs from the container job object.
** The user acquires other resources on the machine that are not managed by YARN, such as by producing data outside the container folder.

*Bad impacts of unmanaged containers include:*
# Resources cannot be managed by YARN and the node:
** YARN and node resources leak.
** The container cannot be killed to release YARN resources on the node.
# Container and App killing is not eventually consistent for the user:
** An App that has bugs can still cause harm outside the cluster long after the App has been killed.

*Initial patch for review:*
For the initial patch, the unmanaged container cleanup feature on Windows can only clean up the container job object of the unmanaged container. Cleanup of more container resources will be supported later, and UTs will be added if the design is agreed.

The current container will be considered unmanaged when:
# The NM is dead:
** The check of whether the container is managed by the NM failed to complete within the timeout.
# The NM is alive but the container is org.apache.hadoop.yarn.api.records.ContainerState#COMPLETE or not found:
** The container is org.apache.hadoop.yarn.api.records.ContainerState#COMPLETE or is not found in the NM container list.
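
The two conditions above can be illustrated with a minimal Java sketch. It is not the patch itself: the names State, NmQuery and isUnmanaged are hypothetical stand-ins invented for this illustration, and the real check would use org.apache.hadoop.yarn.api.records.ContainerState and whatever NM query the patch implements. The sketch also assumes that any failure to get an answer from the NM within the timeout, including an error or an interruption, counts as "NM is dead".

```java
import java.util.Optional;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class UnmanagedContainerCheck {

    /** Hypothetical stand-in for org.apache.hadoop.yarn.api.records.ContainerState. */
    public enum State { NEW, RUNNING, COMPLETE }

    /**
     * Hypothetical NM query: returns the state the NM reports for the container,
     * or empty if the container is not in the NM container list.
     */
    public interface NmQuery {
        Optional<State> lookup(String containerId) throws Exception;
    }

    /** Returns true if the container should be treated as unmanaged. */
    public static boolean isUnmanaged(NmQuery nm, String containerId, long timeoutMs) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Callable<Optional<State>> task = () -> nm.lookup(containerId);
            Future<Optional<State>> reply = pool.submit(task);
            // Case 1: no answer within the timeout means the NM is considered dead.
            Optional<State> state = reply.get(timeoutMs, TimeUnit.MILLISECONDS);
            // Case 2: the NM is alive, but the container is COMPLETE or not found.
            return !state.isPresent() || state.get() == State.COMPLETE;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return true; // assumption: an interrupted check counts as a failed check
        } catch (Exception e) {
            return true; // timeout or query failure: the check did not succeed
        } finally {
            pool.shutdownNow(); // interrupts a query that is still hanging
        }
    }

    public static void main(String[] args) {
        NmQuery running = id -> Optional.of(State.RUNNING);
        NmQuery complete = id -> Optional.of(State.COMPLETE);
        NmQuery notFound = id -> Optional.empty();
        NmQuery hung = id -> { Thread.sleep(5_000); return Optional.empty(); };

        System.out.println(isUnmanaged(running, "container_1", 500));  // false
        System.out.println(isUnmanaged(complete, "container_1", 500)); // true
        System.out.println(isUnmanaged(notFound, "container_1", 500)); // true
        System.out.println(isUnmanaged(hung, "container_1", 500));     // true
    }
}
```

Treating every failed check as "unmanaged" is the aggressive choice. Whether a transient NM outage should trigger cleanup is a design question for the review.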