[ https://issues.apache.org/jira/browse/YARN-799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Riccomini updated YARN-799:
---------------------------------

    Description: 
The implementation of

bq. ./hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/util/CgroupsLCEResourcesHandler.java

tells the container-executor to write PIDs to cgroup.procs:

{code}
  public String getResourcesOption(ContainerId containerId) {
    String containerName = containerId.toString();
    StringBuilder sb = new StringBuilder("cgroups=");

    if (isCpuWeightEnabled()) {
      sb.append(pathForCgroup(CONTROLLER_CPU, containerName) + "/cgroup.procs");
      sb.append(",");
    }

    if (sb.charAt(sb.length() - 1) == ',') {
      sb.deleteCharAt(sb.length() - 1);
    }
    return sb.toString();
  }
{code}
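
For example, for the container directory shown later in this ticket, the method above would hand the container-executor a resources option along the lines of:

{code}
cgroups=/cgroup/cpu/hadoop-yarn/container_1370986842149_0001_01_000001/cgroup.procs
{code}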

Apparently, this file has not always been writeable:

https://patchwork.kernel.org/patch/116146/
http://lkml.indiana.edu/hypermail/linux/kernel/1004.1/00536.html
https://lists.linux-foundation.org/pipermail/containers/2009-July/019679.html

The RHEL build of the Linux kernel that I'm running has a CGroup module with a 
non-writeable cgroup.procs file.

{quote}
$ uname -a
Linux criccomi-ld 2.6.32-131.4.1.el6.x86_64 #1 SMP Fri Jun 10 10:54:26 EDT 2011 
x86_64 x86_64 x86_64 GNU/Linux
{quote}

As a result, when the container-executor tries to run, it fails with this error 
message:

bq.    fprintf(LOGFILE, "Failed to write pid %s (%d) to file %s - %s\n",

This is because CgroupsLCEResourcesHandler hands the executor a resources 
option that points at cgroup.procs, which is non-writeable:

{quote}
$ pwd 
/cgroup/cpu/hadoop-yarn/container_1370986842149_0001_01_000001
$ ls -l
total 0
-r--r--r-- 1 criccomi eng 0 Jun 11 14:43 cgroup.procs
-rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 cpu.rt_period_us
-rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 cpu.rt_runtime_us
-rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 cpu.shares
-rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 notify_on_release
-rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 tasks
{quote}
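
For what it's worth, this is easy to check by hand from that directory: on this kernel, echoing a PID into cgroup.procs fails with a permission error, while echoing it into tasks succeeds (illustrative commands only, run as the cgroup's owner):

{quote}
$ echo $$ > cgroup.procs
$ echo $$ > tasks
{quote}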

I patched CgroupsLCEResourcesHandler to use /tasks instead of /cgroup.procs, 
and this appears to have fixed the problem.
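
For reference, the local change was just to point the resources option at the cgroup's tasks file instead of cgroup.procs; a sketch of the edited snippet from getResourcesOption above (not a polished patch):

{code}
    if (isCpuWeightEnabled()) {
      // tasks accepts PID writes on this kernel; cgroup.procs does not.
      sb.append(pathForCgroup(CONTROLLER_CPU, containerName) + "/tasks");
      sb.append(",");
    }
{code}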

I can think of several potential resolutions to this ticket:

1. Ignore the problem, and make people patch YARN when they hit this issue.
2. Write to /tasks instead of /cgroup.procs for everyone.
3. Check permissions on /cgroup.procs prior to writing to it, and fall back 
to /tasks (see the sketch after this list).
4. Add a config to yarn-site that lets admins specify which file to write to.
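
If option 3 is the preferred route, a minimal sketch of the check could look like the following (hypothetical helper, not existing YARN code; pidFileForCgroup and its use of java.io.File are my own naming):

{code}
  // Hypothetical sketch for option 3: prefer cgroup.procs when the kernel
  // lets us write to it, otherwise fall back to the older tasks file.
  private String pidFileForCgroup(String cgroupPath) {
    java.io.File procs = new java.io.File(cgroupPath, "cgroup.procs");
    if (procs.canWrite()) {
      return cgroupPath + "/cgroup.procs";
    }
    return cgroupPath + "/tasks";
  }
{code}

One caveat with this approach: canWrite() checks permissions for the NodeManager's own user, while the actual PID write happens in the container-executor, so the check might ultimately need to live on the native side instead.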

Thoughts?

> CgroupsLCEResourcesHandler tries to write to cgroup.procs
> ---------------------------------------------------------
>
>                 Key: YARN-799
>                 URL: https://issues.apache.org/jira/browse/YARN-799
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 2.0.4-alpha, 2.0.5-alpha
>            Reporter: Chris Riccomini
>

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
