Hu Ziqian created YARN-8382:
-------------------------------
Summary: cgroup file leak in NM
Key: YARN-8382
URL: https://issues.apache.org/jira/browse/YARN-8382
Project: Hadoop YARN
Issue Type: Bug
Components: nodemanager
Environment: We run a container with a shutdownHook that contains code
like "while(true) sleep(100)". When
*yarn.nodemanager.linux-container-executor.cgroups.delete-timeout-ms* <
*yarn.nodemanager.sleep-delay-before-sigkill.ms*, the cgroup file leaks;
when *yarn.nodemanager.linux-container-executor.cgroups.delete-timeout-ms* >
*yarn.nodemanager.sleep-delay-before-sigkill.ms*, the cgroup file is deleted
successfully.
Reporter: Hu Ziqian
Assignee: Hu Ziqian
As Jiandan said in YARN-6525, the NM may time out while deleting a container's
cgroup file, with logs like:
org.apache.hadoop.yarn.server.nodemanager.util.CgroupsLCEResourcesHandler:
Unable to delete cgroup at: /cgroup/cpu/hadoop-yarn/container_xxx, tried to
delete for 1000ms
We found that one situation where this happens is when
*yarn.nodemanager.sleep-delay-before-sigkill.ms* is larger than
*yarn.nodemanager.linux-container-executor.cgroups.delete-timeout-ms*: in that
case the cgroup file leaks.
A container's process tree looks like this:
bash(16097)───java(16099)─┬─{java}(16100)
                          ├─{java}(16101)
                          ├─{java}(16102)
When the NM kills a container, it sends kill -15 -pid to the container's
process group. The bash process exits as soon as it receives SIGTERM, but the
java process may still do some work (a shutdownHook, etc.) and may not exit
until it receives SIGKILL. As soon as the bash process exits,
CgroupsLCEResourcesHandler starts trying to delete the cgroup. So when
*yarn.nodemanager.linux-container-executor.cgroups.delete-timeout-ms* elapses,
the java processes may still be running, cgroup/tasks is still not empty, and
the cgroup file leaks.
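
The race described above can be modelled with a few lines of arithmetic. This
is only a sketch under simplifying assumptions, not NM code: the class and
method names are hypothetical, cgroup deletion is assumed to start at the
moment SIGTERM is sent (bash exits immediately), and SIGKILL is assumed to end
the java process instantly once the sigkill delay has elapsed.

```java
final class CgroupLeakTimeline {
    /**
     * Time, in ms after SIGTERM, at which the last container process exits.
     * A shutdown hook that finishes early lets the JVM exit on its own;
     * otherwise the process lives until SIGKILL arrives.
     */
    static long lastProcessExitMs(long shutdownHookMs, long sigkillDelayMs) {
        return Math.min(shutdownHookMs, sigkillDelayMs);
    }

    /**
     * The cgroup leaks if a process is still alive (cgroup/tasks not empty)
     * when the delete timeout elapses.
     */
    static boolean cgroupLeaks(long deleteTimeoutMs, long sigkillDelayMs,
                               long shutdownHookMs) {
        return lastProcessExitMs(shutdownHookMs, sigkillDelayMs) >= deleteTimeoutMs;
    }

    public static void main(String[] args) {
        long endlessHook = Long.MAX_VALUE; // models "while(true) sleep(100)"
        // delete timeout < sigkill delay: the hook outlives the delete attempts
        System.out.println(cgroupLeaks(1000, 5000, endlessHook)); // true
        // delete timeout > sigkill delay: SIGKILL lands before the timeout
        System.out.println(cgroupLeaks(6000, 5000, endlessHook)); // false
    }
}
```

The millisecond values are illustrative only. A hook that finishes quickly
(for example 200 ms) does not leak in either configuration in this model,
which matches the report that only long-running hooks trigger the problem.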
To solve this problem, we add a condition that
*yarn.nodemanager.linux-container-executor.cgroups.delete-timeout-ms* must be
larger than *yarn.nodemanager.sleep-delay-before-sigkill.ms*.
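
A minimal sketch of what such a condition could look like follows. It is not
the actual patch: the class CgroupTimeoutCheck and its methods are
hypothetical, and a plain Map stands in for the Hadoop configuration object so
that the example is self-contained. Only the two property names come from this
issue.

```java
import java.util.Map;

final class CgroupTimeoutCheck {
    static final String DELETE_TIMEOUT_MS =
        "yarn.nodemanager.linux-container-executor.cgroups.delete-timeout-ms";
    static final String SIGKILL_DELAY_MS =
        "yarn.nodemanager.sleep-delay-before-sigkill.ms";

    /** True when cgroup deletion keeps retrying until after SIGKILL is sent. */
    static boolean isSafe(long deleteTimeoutMs, long sigkillDelayMs) {
        return deleteTimeoutMs > sigkillDelayMs;
    }

    /** Reads a required numeric property from the stand-in configuration map. */
    static long requiredLong(Map<String, String> conf, String key) {
        String value = conf.get(key);
        if (value == null) {
            throw new IllegalArgumentException("Missing property: " + key);
        }
        return Long.parseLong(value.trim());
    }

    /** Rejects configurations in which the cgroup file could leak. */
    static void validate(Map<String, String> conf) {
        long deleteTimeout = requiredLong(conf, DELETE_TIMEOUT_MS);
        long sigkillDelay = requiredLong(conf, SIGKILL_DELAY_MS);
        if (!isSafe(deleteTimeout, sigkillDelay)) {
            throw new IllegalArgumentException(
                DELETE_TIMEOUT_MS + " (" + deleteTimeout + ") must be larger than "
                + SIGKILL_DELAY_MS + " (" + sigkillDelay + ")");
        }
    }
}
```

Failing fast at startup is only one option; whether the real fix rejects the
configuration, logs a warning, or adjusts the timeout is not stated here.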