Jun Gong commented on YARN-4382:

'release_agent' in cgroups might help in this case. Maybe we could use it to 
remove empty dirs? SLURM has used it.

If the notify_on_release flag is enabled (1) in a cgroup, then whenever the 
last task in the cgroup leaves (exits or attaches to some other cgroup) and the 
last child cgroup of that cgroup is removed, then the kernel runs the command 
specified by the contents of the "release_agent" file in that hierarchy's root 
directory, supplying the pathname (relative to the mount point of the cgroup 
file system) of the abandoned cgroup.  This enables automatic removal of 
abandoned cgroups.  The default value of notify_on_release in the root cgroup 
at system boot is disabled (0).  The default value of other cgroups at creation 
is the current value of their parents' notify_on_release settings. The default 
value of a cgroup hierarchy's release_agent path is empty.
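
As a rough illustration of the mechanism described above, a release agent could be as small as the sketch below. It assumes the kernel invokes it with the abandoned cgroup's path relative to the mount point as its only argument; the mount point /sys/fs/cgroup/cpu and the function name remove_abandoned_cgroup are hypothetical, not taken from YARN or SLURM.

```python
import os
import sys


def remove_abandoned_cgroup(mount_point: str, relative_path: str) -> bool:
    """Remove the empty cgroup directory reported as abandoned.

    Returns True if the directory was removed, False otherwise.
    """
    # The path is relative to the hierarchy mount point and usually has a
    # leading slash, so strip it before joining.
    root = os.path.normpath(mount_point)
    target = os.path.normpath(os.path.join(root, relative_path.lstrip("/")))
    # Refuse to touch the hierarchy root or anything outside of it.
    if target == root or not target.startswith(root + os.sep):
        return False
    try:
        os.rmdir(target)  # rmdir only succeeds on an empty directory
        return True
    except OSError:
        return False


if __name__ == "__main__":
    # Hypothetical mount point; adjust to where the hierarchy is mounted.
    MOUNT_POINT = "/sys/fs/cgroup/cpu"
    if len(sys.argv) == 2:
        remove_abandoned_cgroup(MOUNT_POINT, sys.argv[1])
```

For this to take effect, the script's path would have to be written to the hierarchy's release_agent file and notify_on_release set to 1 on the container cgroups (or on their parent before they are created).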

> Container hierarchy in cgroup may remain for ever after the container have be 
> terminated
> ----------------------------------------------------------------------------------------
>                 Key: YARN-4382
>                 URL: https://issues.apache.org/jira/browse/YARN-4382
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 2.5.2
>            Reporter: lachisis
> If we use LinuxContainerExecutor to execute the containers, this problem 
> may happen.
> In the common case, when a container runs, a corresponding hierarchy is 
> created in the cgroup dir. When the container terminates, the hierarchy is 
> deleted within a few seconds (this time can be configured by 
> yarn.nodemanager.linux-container-executor.cgroups.delete-delay-ms).
> In the code, I find that CgroupsLCEResource sends a signal to kill the 
> container process asynchronously and, at the same time, tries to delete the 
> container hierarchy within the configured "delete-delay-ms" time. 
> But if the container process takes longer than "delete-delay-ms" to be 
> killed, the container hierarchy will remain forever.
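
The race described in the issue can be sketched as a bounded retry loop. This is a simplified model and not the NodeManager code: the function name delete_with_deadline and the poll_ms parameter are hypothetical, and the only assumption is that rmdir keeps failing while the cgroup still holds tasks.

```python
import os
import time


def delete_with_deadline(cgroup_dir: str, delete_delay_ms: int,
                         poll_ms: int = 20) -> bool:
    """Retry rmdir until it succeeds or delete_delay_ms has elapsed.

    Returns True if the directory was removed, False if the deadline
    passed first, in which case the directory is left behind.
    """
    deadline = time.monotonic() + delete_delay_ms / 1000.0
    while True:
        try:
            os.rmdir(cgroup_dir)
            return True
        except OSError:
            # On a real cgroup this is typically EBUSY while tasks remain.
            pass
        if time.monotonic() >= deadline:
            return False  # gave up; nothing retries later
        time.sleep(poll_ms / 1000.0)
```

If the killed process outlives the deadline, the function returns False and no later attempt is made, which matches the leaked hierarchy reported here.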

This message was sent by Atlassian JIRA
