[jira] [Commented] (YARN-4382) Container hierarchy in cgroup may remain for ever after the container have be terminated
[ https://issues.apache.org/jira/browse/YARN-4382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15670310#comment-15670310 ]

Jun Gong commented on YARN-4382:

{quote}
Do you want to write a shell command or a script into release_agent in the cgroup? A shell command or script that can delete the hierarchy or the empty container dirs?
{quote}

Yes. A demo can be found at https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Resource_Management_Guide/sec-common_tunable_parameters.html. Please feel free to work on it. Sorry about this; I was busy at that time and forgot about it... :(

> Container hierarchy in cgroup may remain for ever after the container have be terminated
>
>                 Key: YARN-4382
>                 URL: https://issues.apache.org/jira/browse/YARN-4382
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 2.5.2
>            Reporter: lachisis
>            Assignee: Jun Gong
>
> If we use LinuxContainerExecutor to execute containers, this problem may occur.
> In the common case, when a container runs, a corresponding hierarchy is created in the cgroup dir. When the container terminates, the hierarchy is deleted within some seconds (this time can be configured by yarn.nodemanager.linux-container-executor.cgroups.delete-delay-ms).
> In the code, I find that CgroupsLCEResource sends a signal to kill the container process asynchronously and, at the same time, tries to delete the container hierarchy within the configured "delete-delay-ms" time.
> But if the container process takes longer than "delete-delay-ms" to be killed, the container hierarchy will remain forever.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
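The race in the issue description can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the NodeManager code: an ordinary directory stands in for the container cgroup, a marker file stands in for a process that is still attached to it, and `delete_with_timeout` and `simulate` are hypothetical names. The point it shows is that a bounded retry loop gives up for good once the delay has passed, so a slow kill leaves the directory behind.

```python
import os
import tempfile
import threading
import time


def delete_with_timeout(path, delay_s, poll_s=0.01):
    """Retry rmdir until it succeeds or delay_s elapses; return True on success."""
    deadline = time.monotonic() + delay_s
    while True:
        try:
            os.rmdir(path)
            return True
        except OSError:
            # Still busy: non-empty here, a cgroup with live tasks in reality.
            if time.monotonic() >= deadline:
                return False
            time.sleep(poll_s)


def simulate(kill_takes_s, delete_delay_s):
    """Return True if the stand-in cgroup directory is left behind."""
    root = tempfile.mkdtemp()
    cgroup = os.path.join(root, "container_x")  # hypothetical container name
    os.mkdir(cgroup)
    task = os.path.join(cgroup, "task")  # marker for a process that has not exited
    open(task, "w").close()

    # The "kill" completes asynchronously after kill_takes_s seconds.
    killer = threading.Timer(kill_takes_s, os.remove, args=(task,))
    killer.start()
    delete_with_timeout(cgroup, delete_delay_s)  # nobody retries after this returns
    killer.join()

    leaked = os.path.isdir(cgroup)
    # Tidy up the temporary directories used by the simulation.
    if leaked:
        os.rmdir(cgroup)
    os.rmdir(root)
    return leaked
```

Calling `simulate(0.05, 1.0)` models a kill that finishes within the delay, while `simulate(0.3, 0.05)` models a kill that outlasts it and leaves the directory in place.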
[jira] [Commented] (YARN-4382) Container hierarchy in cgroup may remain for ever after the container have be terminated
[ https://issues.apache.org/jira/browse/YARN-4382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15663485#comment-15663485 ]

XiaopengLi commented on YARN-4382:

Hi Jun Gong. I am interested in fixing this problem, but I do not understand what you mean. Do you want to write a shell command or a script into release_agent in the cgroup? A shell command or script that can delete the hierarchy or the empty container dirs? Could you write a demo or describe your idea? Thanks. :)
[jira] [Commented] (YARN-4382) Container hierarchy in cgroup may remain for ever after the container have be terminated
[ https://issues.apache.org/jira/browse/YARN-4382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15029670#comment-15029670 ]

Jun Gong commented on YARN-4382:

Sorry for the late reply. I am working on it and will attach a patch later.
[jira] [Commented] (YARN-4382) Container hierarchy in cgroup may remain for ever after the container have be terminated
[ https://issues.apache.org/jira/browse/YARN-4382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15029363#comment-15029363 ]

lachisis commented on YARN-4382:

I have tested the "release_agent" feature and think it is suitable. Jun Gong, are you working on the patch now? If not, I will assign the issue to myself and write it.
[jira] [Commented] (YARN-4382) Container hierarchy in cgroup may remain for ever after the container have be terminated
[ https://issues.apache.org/jira/browse/YARN-4382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15023481#comment-15023481 ]

lachisis commented on YARN-4382:

Thanks for your reply, Jun Gong. I think it is a good idea to use "release_agent" to clear the empty container hierarchies. But I am not sure whether the "release_agent" option works with all cgroup versions. I just tested the "release_agent" option and it does not work yet, though that may be a mistake on my side.
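When a release agent does not seem to fire, one way to narrow it down is to check the two files that control the feature. The sketch below assumes a cgroup v1 hierarchy (the unified v2 hierarchy has no release_agent file); `check_release_setup` and its arguments are hypothetical names, and the checks cover only the most common setup mistakes, not every reason an agent might fail.

```python
import os


def check_release_setup(hierarchy_root, cgroup_rel_path):
    """Return a list of problems found in a cgroup v1 release_agent setup."""
    problems = []
    agent_file = os.path.join(hierarchy_root, "release_agent")
    notify_file = os.path.join(hierarchy_root, cgroup_rel_path, "notify_on_release")

    # release_agent lives in the hierarchy's root directory and holds a command path.
    try:
        with open(agent_file) as f:
            agent = f.read().strip()
    except OSError:
        agent = None
        problems.append("release_agent file is missing at the hierarchy root")
    if agent == "":
        problems.append("release_agent is empty")
    elif agent is not None and not os.access(agent, os.X_OK):
        problems.append("release_agent target is not executable: " + agent)

    # notify_on_release must be 1 in the cgroup that should trigger the agent.
    try:
        with open(notify_file) as f:
            if f.read().strip() != "1":
                problems.append("notify_on_release is not enabled")
    except OSError:
        problems.append("notify_on_release file is missing")
    return problems
```

An empty list means both files look sane; it does not prove that the agent script itself behaves correctly.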
[jira] [Commented] (YARN-4382) Container hierarchy in cgroup may remain for ever after the container have be terminated
[ https://issues.apache.org/jira/browse/YARN-4382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15022059#comment-15022059 ]

Jun Gong commented on YARN-4382:

[~lachisis] Thanks for reporting the issue. Please feel free to re-assign it to yourself if you start or want to work on it.
[jira] [Commented] (YARN-4382) Container hierarchy in cgroup may remain for ever after the container have be terminated
[ https://issues.apache.org/jira/browse/YARN-4382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15021666#comment-15021666 ]

Jun Gong commented on YARN-4382:

'release_agent' in cgroups (https://www.kernel.org/doc/Documentation/cgroups/cgroups.txt) might help in this case. Maybe we could use it to remove empty dirs? SLURM already uses it (http://slurm.schedmd.com/cgroups.html).

{quote}
If the notify_on_release flag is enabled (1) in a cgroup, then whenever the last task in the cgroup leaves (exits or attaches to some other cgroup) and the last child cgroup of that cgroup is removed, then the kernel runs the command specified by the contents of the "release_agent" file in that hierarchy's root directory, supplying the pathname (relative to the mount point of the cgroup file system) of the abandoned cgroup. This enables automatic removal of abandoned cgroups.

The default value of notify_on_release in the root cgroup at system boot is disabled (0). The default value of other cgroups at creation is the current value of their parents' notify_on_release settings. The default value of a cgroup hierarchy's release_agent path is empty.
{quote}
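A release agent along the lines of the quoted kernel documentation could look like the sketch below. It is an illustration, not the patch for this issue: the mount point in `CGROUP_MOUNT` and the function name `remove_released_cgroup` are assumptions, and the only behavior taken from the documentation is that the kernel passes the abandoned cgroup's path, relative to the mount point, as the single argument.

```python
import os
import sys

CGROUP_MOUNT = "/sys/fs/cgroup/cpu"  # assumed mount point of the cpu controller


def remove_released_cgroup(mount_root, released_path):
    """Remove the abandoned cgroup the kernel reported; return True if removed."""
    root = os.path.realpath(mount_root)
    target = os.path.realpath(os.path.join(root, released_path.lstrip("/")))
    # Refuse anything that resolves to the root itself or outside the mount point.
    if target == root or os.path.commonpath([root, target]) != root:
        return False
    try:
        os.rmdir(target)  # only succeeds when nothing is left inside the directory
        return True
    except OSError:
        return False


if __name__ == "__main__" and len(sys.argv) == 2:
    # The kernel supplies the cgroup path relative to the mount point as argv[1].
    sys.exit(0 if remove_released_cgroup(CGROUP_MOUNT, sys.argv[1]) else 1)
```

To take effect, the path of such a script would have to be written to the `release_agent` file in the hierarchy's root directory, and `notify_on_release` would have to be set to 1 in the container cgroups or inherited from their parent.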
[jira] [Commented] (YARN-4382) Container hierarchy in cgroup may remain for ever after the container have be terminated
[ https://issues.apache.org/jira/browse/YARN-4382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15021630#comment-15021630 ]

lachisis commented on YARN-4382:

If lots of container hierarchies remain, they keep the CPU of this node busy even when no jobs are running.

   PerfTop:  129889 irqs/sec  kernel:76.3% [10 cycles],  (all, 16 CPUs)

     samples    pcnt   kernel function
     _______   _____   _______________
   117166.00 - 59.1% : tg_shares_up
    35688.00 - 18.0% : _spin_lock_irqsave
    12045.00 -  6.1% : __set_se_shares
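To see how many leftover hierarchies a node has accumulated, one could scan the YARN cgroup directory for container cgroups that no longer hold any task. The sketch below assumes a cgroup v1 layout in which each container has a directory named after its container ID (starting with `container_`) containing a `tasks` file; the function name and the directory passed to it (for example `/sys/fs/cgroup/cpu/hadoop-yarn`) are assumptions for illustration.

```python
import os


def find_empty_container_cgroups(yarn_cgroup_dir):
    """Return container cgroup dirs under yarn_cgroup_dir whose tasks file is empty."""
    leftovers = []
    for name in sorted(os.listdir(yarn_cgroup_dir)):
        path = os.path.join(yarn_cgroup_dir, name)
        tasks = os.path.join(path, "tasks")
        # Only look at per-container cgroups that expose a tasks file.
        if not (name.startswith("container_") and os.path.isfile(tasks)):
            continue
        with open(tasks) as f:
            if f.read().strip() == "":
                leftovers.append(path)  # no PID attached: candidate for removal
    return leftovers
```

Directories returned while no job is running are candidates for the stale hierarchies described above; containers that are still running are skipped because their `tasks` file lists at least one PID.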