[jira] [Updated] (MESOS-8480) Mesos returns high resource usage when killing a Docker task.
[ https://issues.apache.org/jira/browse/MESOS-8480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jie Yu updated MESOS-8480:
--------------------------
    Description:

The way we get resource statistics for Docker tasks is by first reading the cgroup subsystem path from {{/proc//cgroup}} (taking the {{cpuacct}} subsystem as an example):
{noformat}
9:cpuacct,cpu:/docker/66fbe67b64ad3a86c6e080e18578bc9e540e55ee0bdcae09c2e131a4264a3a3b
{noformat}
and then reading {{/sys/fs/cgroup/cpuacct//docker/66fbe67b64ad3a86c6e080e18578bc9e540e55ee0bdcae09c2e131a4264a3a3b/cpuacct.stat}} to get the statistics:
{noformat}
user 4
system 0
{noformat}
However, when a Docker container is being torn down, Docker or the operating system appears to first move the process to the root cgroup before actually killing it, making {{/proc//cgroup}} look like the following:
{noformat}
9:cpuacct,cpu:/
{noformat}
This makes a racy call to [{{cgroup::internal::cgroup()}}|https://github.com/apache/mesos/blob/master/src/linux/cgroups.cpp#L1935] return a single '/', which in turn makes [{{DockerContainerizerProcess::cgroupsStatistics()}}|https://github.com/apache/mesos/blob/master/src/slave/containerizer/docker.cpp#L1991] read {{/sys/fs/cgroup/cpuacct///cpuacct.stat}}, which contains the statistics for the root cgroup:
{noformat}
user 228058750
system 24506461
{noformat}
This can be reproduced by [^test.cpp] with the following command:
{noformat}
$ docker run --name sleep -d --rm alpine sleep 1000; ./test $(docker inspect sleep | jq .[].State.Pid) & sleep 1 && docker rm -f sleep
...
Reading file '/proc/44224/cgroup'
Reading file '/sys/fs/cgroup/cpuacct//docker/1d79a6c877e2af3081630aa57d23d853e6bd7d210dad28f897556bfea20bc9c1/cpuacct.stat'
user 4
system 0
Reading file '/proc/44224/cgroup'
Reading file '/sys/fs/cgroup/cpuacct///cpuacct.stat'
user 228058750
system 24506461
Reading file '/proc/44224/cgroup'
Reading file '/sys/fs/cgroup/cpuacct///cpuacct.stat'
user 228058750
system 24506461
Failed to open file '/proc/44224/cgroup'
sleep
[2]-  Exit 1                 ./test $(docker inspect sleep | jq .[].State.Pid)
{noformat}
[jira] [Updated] (MESOS-8480) Mesos returns high resource usage when killing a Docker task.
[ https://issues.apache.org/jira/browse/MESOS-8480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jie Yu updated MESOS-8480:
--------------------------
    Fix Version/s: 1.5.1
                   1.4.2
                   1.3.2

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
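The {{cpuacct.stat}} files quoted in the description use a simple "key value" format, with cumulative CPU time in USER_HZ ticks. A minimal parser sketch (not Mesos code; {{parseCpuacctStat}} is a hypothetical helper) for that format:

```cpp
// Sketch: parse cpuacct.stat contents of the form
//   user <ticks>
//   system <ticks>
// into a (user, system) pair. Ticks are in USER_HZ units
// (typically 1/100 of a second).
#include <sstream>
#include <string>
#include <utility>

std::pair<long long, long long> parseCpuacctStat(const std::string& contents) {
  std::istringstream in(contents);
  std::string key;
  long long value = 0, user = 0, system = 0;
  while (in >> key >> value) {
    if (key == "user") {
      user = value;
    } else if (key == "system") {
      system = value;
    }
  }
  return {user, system};
}
```

Applied to the two samples in the description, this would yield (4, 0) for the container cgroup and (228058750, 24506461) for the root cgroup — which is exactly why reading the root cgroup's file by accident inflates the reported usage.
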
[jira] [Updated] (MESOS-8480) Mesos returns high resource usage when killing a Docker task.
[ https://issues.apache.org/jira/browse/MESOS-8480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jie Yu updated MESOS-8480:
--------------------------
    Fix Version/s: 1.6.0
[jira] [Updated] (MESOS-8480) Mesos returns high resource usage when killing a Docker task.
[ https://issues.apache.org/jira/browse/MESOS-8480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chun-Hung Hsiao updated MESOS-8480:
-----------------------------------
    Story Points: 2  (was: 3)
[jira] [Updated] (MESOS-8480) Mesos returns high resource usage when killing a Docker task.
[ https://issues.apache.org/jira/browse/MESOS-8480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chun-Hung Hsiao updated MESOS-8480:
-----------------------------------
    Description: (full text quoted in the first message above)
[jira] [Updated] (MESOS-8480) Mesos returns high resource usage when killing a Docker task.
[ https://issues.apache.org/jira/browse/MESOS-8480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chun-Hung Hsiao updated MESOS-8480:
-----------------------------------
    Description: (full text quoted in the first message above)
[jira] [Updated] (MESOS-8480) Mesos returns high resource usage when killing a Docker task.
[ https://issues.apache.org/jira/browse/MESOS-8480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chun-Hung Hsiao updated MESOS-8480:
-----------------------------------
    Attachment: test.cpp