[jira] [Commented] (YARN-4382) Container hierarchy in cgroup may remain for ever after the container have be terminated

2016-11-16 Thread Jun Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15670310#comment-15670310
 ] 

Jun Gong commented on YARN-4382:


{quote}
Do you want to set release_agent in the cgroup to a shell command or script 
that deletes the hierarchy, i.e. the empty container directories?
{quote}
Yes. A demo can be found at 
https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Resource_Management_Guide/sec-common_tunable_parameters.html

Please feel free to work on it. Sorry about the delay; I was busy at the time 
and forgot about it... :(

> Container hierarchy in cgroup may remain for ever after the container have be 
> terminated
>
>              Key: YARN-4382
>              URL: https://issues.apache.org/jira/browse/YARN-4382
>          Project: Hadoop YARN
>       Issue Type: Bug
>       Components: nodemanager
> Affects Versions: 2.5.2
>         Reporter: lachisis
>         Assignee: Jun Gong
>
> If LinuxContainerExecutor is used to execute containers, the following 
> problem may occur.
> In the common case, when a container runs, a corresponding hierarchy is 
> created in the cgroup directory. When the container terminates, the hierarchy 
> is deleted within some seconds (this time can be configured by 
> yarn.nodemanager.linux-container-executor.cgroups.delete-delay-ms).
> In the code, I find that CgroupsLCEResource sends a signal to kill the 
> container process asynchronously and, at the same time, tries to delete the 
> container hierarchy within the configured "delete-delay-ms" time. 
> But if killing the container process takes longer than the 
> "delete-delay-ms" time, the container hierarchy will remain forever.
>   
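The race described in the report can be illustrated with a minimal Python sketch. It is not the NodeManager code: the function name `delete_cgroup_dir` and the `poll_ms` parameter are hypothetical, and the sketch only assumes that removing a directory keeps failing while it is still in use (on a cgroup v1 file system, `rmdir` fails while tasks are still attached). If the directory does not become removable before the delay expires, the function gives up and the directory is left behind.

```python
import os
import time


def delete_cgroup_dir(path, delete_delay_ms, poll_ms=20):
    """Try to remove `path` until `delete_delay_ms` has elapsed.

    Returns True if the directory was removed (or was already gone),
    False if the deadline passed first and the directory is left behind.
    Hypothetical helper for illustration only.
    """
    deadline = time.monotonic() + delete_delay_ms / 1000.0
    while True:
        try:
            # Fails with OSError while the directory is still in use.
            os.rmdir(path)
            return True
        except FileNotFoundError:
            return True  # someone else already removed it
        except OSError:
            if time.monotonic() >= deadline:
                return False  # nothing retries later: the leak from the report
            time.sleep(poll_ms / 1000.0)
```

A process that needs longer than `delete_delay_ms` to exit makes the function return False, which corresponds to the hierarchy that "will remain forever".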



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4382) Container hierarchy in cgroup may remain for ever after the container have be terminated

2016-11-14 Thread XiaopengLi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15663485#comment-15663485
 ] 

XiaopengLi commented on YARN-4382:
--

Hi Jun Gong. I am interested in fixing this problem, but I do not understand 
what you mean. Do you want to set release_agent in the cgroup to a shell 
command or script that deletes the hierarchy, i.e. the empty container 
directories? Could you write a demo or describe your idea? Thanks. :)





[jira] [Commented] (YARN-4382) Container hierarchy in cgroup may remain for ever after the container have be terminated

2015-11-27 Thread Jun Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15029670#comment-15029670
 ] 

Jun Gong commented on YARN-4382:


Sorry for the late reply. I am working on it and will attach a patch later.



[jira] [Commented] (YARN-4382) Container hierarchy in cgroup may remain for ever after the container have be terminated

2015-11-26 Thread lachisis (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15029363#comment-15029363
 ] 

lachisis commented on YARN-4382:


I have tested the "release_agent" feature and think it is suitable.
Jun Gong, are you working on the patch now? If not, I will assign the issue to 
myself and make it.



[jira] [Commented] (YARN-4382) Container hierarchy in cgroup may remain for ever after the container have be terminated

2015-11-23 Thread Jun Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15022059#comment-15022059
 ] 

Jun Gong commented on YARN-4382:


[~lachisis] Thanks for reporting the issue. Please feel free to re-assign it to 
yourself if you start or want to work on it.



[jira] [Commented] (YARN-4382) Container hierarchy in cgroup may remain for ever after the container have be terminated

2015-11-23 Thread lachisis (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15023481#comment-15023481
 ] 

lachisis commented on YARN-4382:


Thanks for your reply, Jun Gong. 
I think it is a good idea to use "release_agent" to clear the empty container 
hierarchies. But I wonder: does the "release_agent" option work with all cgroup 
versions?
I just tested the "release_agent" option. Maybe I made some mistake, but it 
does not work yet.




[jira] [Commented] (YARN-4382) Container hierarchy in cgroup may remain for ever after the container have be terminated

2015-11-22 Thread lachisis (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15021630#comment-15021630
 ] 

lachisis commented on YARN-4382:


If many container hierarchies remain, they keep the CPUs of this node busy, 
even when no jobs are running.

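------------------------------------------------------------------------------
   PerfTop:  129889 irqs/sec  kernel:76.3% [10 cycles],  (all, 16 CPUs)
------------------------------------------------------------------------------

     samples    pcnt   kernel function
     _______   _____   _______________

   117166.00 - 59.1% : tg_shares_up
    35688.00 - 18.0% : _spin_lock_irqsave
    12045.00 -  6.1% : __set_se_shares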
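To see how many hierarchies have been left behind on a node, something like the following Python sketch could be used. It rests on assumptions that are not stated in this thread: a cgroup v1 layout in which every cgroup directory contains a `tasks` file listing attached PIDs, and container directories whose names start with `container_`. The function name `find_leftover_cgroups` is hypothetical.

```python
import os


def find_leftover_cgroups(parent, prefix="container_"):
    """Return container cgroup directories under `parent` with no tasks.

    Assumes a cgroup v1 style layout: each cgroup directory has a `tasks`
    file, and an empty file means no process is attached any more.
    """
    leftovers = []
    for name in sorted(os.listdir(parent)):
        path = os.path.join(parent, name)
        tasks = os.path.join(path, "tasks")
        if not (name.startswith(prefix) and os.path.isfile(tasks)):
            continue  # not a container cgroup directory
        with open(tasks) as f:
            if not f.read().strip():
                leftovers.append(path)  # no PIDs listed: candidate for removal
    return leftovers
```

The parent directory to pass in depends on where the cgroup controller is mounted and which hierarchy the NodeManager is configured to use.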




[jira] [Commented] (YARN-4382) Container hierarchy in cgroup may remain for ever after the container have be terminated

2015-11-22 Thread Jun Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15021666#comment-15021666
 ] 

Jun Gong commented on YARN-4382:


'release_agent' in cgroups 
(https://www.kernel.org/doc/Documentation/cgroups/cgroups.txt) 
might help in this case. Maybe we could use it to remove empty dirs? SLURM 
already uses it (http://slurm.schedmd.com/cgroups.html).

{quote}
If the notify_on_release flag is enabled (1) in a cgroup, then whenever the 
last task in the cgroup leaves (exits or attaches to some other cgroup) and the 
last child cgroup of that cgroup is removed, then the kernel runs the command 
specified by the contents of the "release_agent" file in that hierarchy's root 
directory, supplying the pathname (relative to the mount point of the cgroup 
file system) of the abandoned cgroup.  This enables automatic removal of 
abandoned cgroups.  The default value of notify_on_release in the root cgroup 
at system boot is disabled (0).  The default value of other cgroups at creation 
is the current value of their parents' notify_on_release settings. The default 
value of a cgroup hierarchy's release_agent path is empty.
{quote}
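A release agent along these lines could be sketched in Python as follows. This is only an illustration of the mechanism quoted above, not a proposed patch: the function names `enable_notify_on_release` and `release_agent` are hypothetical, and the mount point must be supplied by whoever installs the script. The only kernel behaviour relied on is the one from the quote: the agent is called with the path of the abandoned cgroup, relative to the mount point of the cgroup file system.

```python
import os


def enable_notify_on_release(cgroup_dir):
    """Set the notify_on_release flag (1) for one cgroup directory."""
    with open(os.path.join(cgroup_dir, "notify_on_release"), "w") as f:
        f.write("1\n")


def release_agent(mount_point, released_path):
    """Remove the abandoned cgroup that the kernel reported.

    `released_path` is the pathname relative to `mount_point`, as passed
    to the release agent. Returns True if the directory was removed.
    """
    root = os.path.normpath(mount_point)
    target = os.path.normpath(os.path.join(root, released_path.lstrip("/")))
    # Never remove the mount point itself or anything outside it.
    if target == root or not target.startswith(root + os.sep):
        return False
    try:
        os.rmdir(target)  # only succeeds once the cgroup is really unused
        return True
    except OSError:
        return False
```

A small wrapper script that calls `release_agent(mount_point, sys.argv[1])` would then be the program whose path is written to the hierarchy's `release_agent` file, with `notify_on_release` enabled on the parent of the container cgroups so that new children inherit it.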
