[jira] [Commented] (MESOS-9950) memory cgroup gone before isolator cleaning up
[ https://issues.apache.org/jira/browse/MESOS-9950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17369759#comment-17369759 ]

Subhajit Palit commented on MESOS-9950:
---

Thanks for that option [~cf.natali] - I will try it out further and share my observation.

> memory cgroup gone before isolator cleaning up
> --
>
> Key: MESOS-9950
> URL: https://issues.apache.org/jira/browse/MESOS-9950
> Project: Mesos
> Issue Type: Bug
> Components: containerization
> Reporter: longfei
> Priority: Major
>
> The memcg created by mesos may have been deleted before cgroup/memory isolator cleaning up.
> This would let the termination fail and lose information in the old termination(before fail).
> {code:java}
> I0821 15:16:03.025796 3354800 paths.cpp:745] Creating sandbox '/opt/tiger/mesos_deploy_videoarch/mesos_zeus/slave/slaves/fb5c1a5b-e106-47c1-9fe3-6ebd311b30ee-S628/frameworks/8e4967e5-736e-4a22-90c3-7b32d526914d-/executors/mt:z03584687:1/runs/a0706ca0-fe2c-4477-8161-329b26ea5d89' for user 'tiger'
> I0821 15:16:03.026199 3354800 paths.cpp:748] Creating sandbox '/opt/tiger/mesos_deploy_videoarch/mesos_zeus/slave/meta/slaves/fb5c1a5b-e106-47c1-9fe3-6ebd311b30ee-S628/frameworks/8e4967e5-736e-4a22-90c3-7b32d526914d-/executors/mt:z03584687:1/runs/a0706ca0-fe2c-4477-8161-329b26ea5d89'
> I0821 15:16:03.026304 3354800 slave.cpp:9064] Launching executor 'mt:z03584687:1' of framework 8e4967e5-736e-4a22-90c3-7b32d526914d- with resources [{"allocation_info":{"role":"*"},"name":"cpus","scalar":{"value":0.1},"type":"SCALAR"},{"allocation_info":{"role":"*"},"name":"mem","scalar":{"value":32.0},"type":"SCALAR"}] in work directory '/opt/tiger/mesos_deploy_videoarch/mesos_zeus/slave/slaves/fb5c1a5b-e106-47c1-9fe3-6ebd311b30ee-S628/frameworks/8e4967e5-736e-4a22-90c3-7b32d526914d-/executors/mt:z03584687:1/runs/a0706ca0-fe2c-4477-8161-329b26ea5d89'
> I0821 15:16:03.051795 3354800 slave.cpp:3520] Launching container a0706ca0-fe2c-4477-8161-329b26ea5d89 for executor 'mt:z03584687:1' of framework 8e4967e5-736e-4a22-90c3-7b32d526914d-
> I0821 15:16:03.076608 3354807 containerizer.cpp:1325] Starting container a0706ca0-fe2c-4477-8161-329b26ea5d89
> I0821 15:16:03.076911 3354807 containerizer.cpp:3185] Transitioning the state of container a0706ca0-fe2c-4477-8161-329b26ea5d89 from PROVISIONING to PREPARING
> I0821 15:16:03.077906 3354802 memory.cpp:478] Started listening for OOM events for container a0706ca0-fe2c-4477-8161-329b26ea5d89
> I0821 15:16:03.079540 3354804 memory.cpp:198] Updated 'memory.soft_limit_in_bytes' to 4032MB for container a0706ca0-fe2c-4477-8161-329b26ea5d89
> I0821 15:16:03.079587 3354820 cpu.cpp:92] Updated 'cpu.shares' to 1126 (cpus 1.1) for container a0706ca0-fe2c-4477-8161-329b26ea5d89
> I0821 15:16:03.079589 3354804 memory.cpp:227] Updated 'memory.limit_in_bytes' to 4032MB for container a0706ca0-fe2c-4477-8161-329b26ea5d89
> I0821 15:16:03.080901 3354802 switchboard.cpp:316] Container logger module finished preparing container a0706ca0-fe2c-4477-8161-329b26ea5d89; IOSwitchboard server is not required
> I0821 15:16:03.081593 3354801 linux_launcher.cpp:492] Launching container a0706ca0-fe2c-4477-8161-329b26ea5d89 and cloning with namespaces
> I0821 15:16:03.083823 3354808 containerizer.cpp:2107] Checkpointing container's forked pid 1857418 to '/opt/tiger/mesos_deploy_videoarch/mesos_zeus/slave/meta/slaves/fb5c1a5b-e106-47c1-9fe3-6ebd311b30ee-S628/frameworks/8e4967e5-736e-4a22-90c3-7b32d526914d-/executors/mt:z03584687:1/runs/a0706ca0-fe2c-4477-8161-329b26ea5d89/pids/forked.pid'
> I0821 15:16:03.084156 3354808 containerizer.cpp:3185] Transitioning the state of container a0706ca0-fe2c-4477-8161-329b26ea5d89 from PREPARING to ISOLATING
> I0821 15:16:03.091468 3354808 containerizer.cpp:3185] Transitioning the state of container a0706ca0-fe2c-4477-8161-329b26ea5d89 from ISOLATING to FETCHING
> I0821 15:16:03.094933 3354808 containerizer.cpp:3185] Transitioning the state of container a0706ca0-fe2c-4477-8161-329b26ea5d89 from FETCHING to RUNNING
> I0821 15:16:03.197753 3354808 memory.cpp:198] Updated 'memory.soft_limit_in_bytes' to 4032MB for container a0706ca0-fe2c-4477-8161-329b26ea5d89
> I0821 15:16:03.197757 3354801 cpu.cpp:92] Updated 'cpu.shares' to 1126 (cpus 1.1) for container a0706ca0-fe2c-4477-8161-329b26ea5d89
> I0821 15:21:39.692978 3354814 memory.cpp:515] OOM detected for container a0706ca0-fe2c-4477-8161-329b26ea5d89
> I0821 15:21:39.693182 3354805 containerizer.cpp:3044] Container a0706ca0-fe2c-4477-8161-329b26ea5d89 has reached its limit for resource [] and will be terminated
> I0821 15:21:39.693192 3354805
[jira] [Commented] (MESOS-9950) memory cgroup gone before isolator cleaning up
[ https://issues.apache.org/jira/browse/MESOS-9950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17369023#comment-17369023 ]

Charles Natali commented on MESOS-9950:
---

[~subhajitpalit] So did you check the systemd configuration?
[jira] [Commented] (MESOS-9950) memory cgroup gone before isolator cleaning up
[ https://issues.apache.org/jira/browse/MESOS-9950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17363126#comment-17363126 ]

Charles Natali commented on MESOS-9950:
---

[~subhajitpalit] Is the agent started via systemd?
If yes, could you post the output of:

{noformat}
# systemctl show | grep Delegate
{noformat}
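For the Delegate check requested in this thread, a small filter can pull the value out of `systemctl show` output. This is only a sketch: `delegate_value` is a hypothetical helper, and the unit name in the usage comment is an assumption (substitute whichever unit actually starts the agent). `Delegate=` is the systemd unit setting that hands management of the unit's cgroup subtree over to the service itself.

```shell
# Hypothetical helper, not part of Mesos or systemd.
# Reads `systemctl show` output (Key=Value lines) on stdin and prints the
# value of the Delegate property, or "unset" if no Delegate= line is present.
delegate_value() {
  awk -F= '
    $1 == "Delegate" { print $2; found = 1; exit }
    END { if (!found) print "unset" }
  '
}

# Usage (the unit name below is an assumption):
#   systemctl show mesos-slave.service | delegate_value
```

Reading from stdin keeps the helper independent of how the properties were collected, so it also works on output pasted into a file.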
[jira] [Commented] (MESOS-9950) memory cgroup gone before isolator cleaning up
[ https://issues.apache.org/jira/browse/MESOS-9950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17362718#comment-17362718 ]

Subhajit Palit commented on MESOS-9950:
---

[~carlone] Any idea how to get around the above issue?
[jira] [Commented] (MESOS-9950) memory cgroup gone before isolator cleaning up
[ https://issues.apache.org/jira/browse/MESOS-9950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17362406#comment-17362406 ]

Subhajit Palit commented on MESOS-9950:
---

Has there been any fix for this yet? I have been seeing a similar issue recently:

E0612 13:17:01.861963 3586 memory.cpp:530] Failed to read 'memory.limit_in_bytes': 'mesos/84c0dd72-7571-4a31-a070-7b085d87482f' is not a valid cgroup
E0612 13:17:01.863368 3586 memory.cpp:539] Failed to read 'memory.max_usage_in_bytes': 'mesos/84c0dd72-7571-4a31-a070-7b085d87482f' is not a valid cgroup
E0612 13:17:01.864610 3586 memory.cpp:551] Failed to read 'memory.stat': 'mesos/84c0dd72-7571-4a31-a070-7b085d87482f' is not a valid cgroup
W0612 13:17:01.866982 3591 linux_launcher.cpp:560] Couldn't find freezer cgroup for container 84c0dd72-7571-4a31-a070-7b085d87482f so assuming partially destroyed
W0612 13:17:02.607906 3590 containerizer.cpp:2378] Skipping status for container 84c0dd72-7571-4a31-a070-7b085d87482f because: Container does not exist
W0612 13:17:02.608215 3591 containerizer.cpp:2240] Ignoring update for currently being destroyed container 84c0dd72-7571-4a31-a070-7b085d87482f
E0612 13:17:07.647258 3607 slave.cpp:6305] Termination of executor 'compose-admin-development-gfsm-gcp-f587ea61-cbba-11eb-a340-42010ab41255-0-2e9bf188-bf2f-45ec-8912-9ec7a4f05a02' of framework 9f48d831-63e7-4556-86ab-463a69389e4d- failed: Failed to clean up an isolator when destroying container: Failed to destroy cgroups: Failed to get nested cgroups: 'mesos/84c0dd72-7571-4a31-a070-7b085d87482f' is not a valid cgroup;Failed to get nested cgroups: 'mesos/84c0dd72-7571-4a31-a070-7b085d87482f' is not a valid cgroup
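One way to narrow down errors like the `is not a valid cgroup` messages reported in this thread is to check in which cgroup v1 subsystems the container's directory has already disappeared. The sketch below is an illustration under stated assumptions, not part of Mesos: `missing_cgroups` is a hypothetical helper, and `/sys/fs/cgroup` is assumed to be the hierarchy mount point (adjust it if the agent uses a different one). The `mesos/<container-id>` path form is taken from the log messages.

```shell
# Hypothetical helper, not part of Mesos.
# Prints the name of every subsystem directory under the hierarchy mount
# point in which the given cgroup directory does not exist.
#   $1: hierarchy mount point (assumed to be /sys/fs/cgroup on most hosts)
#   $2: cgroup path relative to each subsystem, e.g. mesos/<container-id>
missing_cgroups() {
  hierarchy=$1
  cgroup=$2
  for subsystem in "$hierarchy"/*/; do
    # Skip the unexpanded glob pattern when the hierarchy has no subdirectories.
    [ -d "$subsystem" ] || continue
    # Report the subsystem when the container's cgroup directory is missing.
    [ -d "$subsystem$cgroup" ] || basename "$subsystem"
  done
}

# Usage, with the container ID from the errors in this thread:
#   missing_cgroups /sys/fs/cgroup mesos/84c0dd72-7571-4a31-a070-7b085d87482f
```

If the memory and freezer subsystems show up in the output while the agent still lists the container as running, something other than the agent has probably removed the cgroups.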
[jira] [Commented] (MESOS-9950) memory cgroup gone before isolator cleaning up
[ https://issues.apache.org/jira/browse/MESOS-9950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16912997#comment-16912997 ]

longfei commented on MESOS-9950:

version: 1.7.3

> memory cgroup gone before isolator cleaning up
> --
>
> Key: MESOS-9950
> URL: https://issues.apache.org/jira/browse/MESOS-9950
> Project: Mesos
> Issue Type: Bug
> Reporter: longfei
> Priority: Major
>
> The memcg created by mesos may have been deleted before cgroup/memory isolator cleaning up.
> This would let the termination fail and lose information in the old termination(before fail).
> {code:java}
> I0821 15:16:03.025796 3354800 paths.cpp:745] Creating sandbox '/opt/tiger/mesos_deploy_videoarch/mesos_zeus/slave/slaves/fb5c1a5b-e106-47c1-9fe3-6ebd311b30ee-S628/frameworks/8e4967e5-736e-4a22-90c3-7b32d526914d-/executors/mt:z03584687:1/runs/a0706ca0-fe2c-4477-8161-329b26ea5d89' for user 'tiger'
> I0821 15:16:03.026199 3354800 paths.cpp:748] Creating sandbox '/opt/tiger/mesos_deploy_videoarch/mesos_zeus/slave/meta/slaves/fb5c1a5b-e106-47c1-9fe3-6ebd311b30ee-S628/frameworks/8e4967e5-736e-4a22-90c3-7b32d526914d-/executors/mt:z03584687:1/runs/a0706ca0-fe2c-4477-8161-329b26ea5d89'
> I0821 15:16:03.026304 3354800 slave.cpp:9064] Launching executor 'mt:z03584687:1' of framework 8e4967e5-736e-4a22-90c3-7b32d526914d- with resources [{"allocation_info":{"role":"*"},"name":"cpus","scalar":{"value":0.1},"type":"SCALAR"},{"allocation_info":{"role":"*"},"name":"mem","scalar":{"value":32.0},"type":"SCALAR"}] in work directory '/opt/tiger/mesos_deploy_videoarch/mesos_zeus/slave/slaves/fb5c1a5b-e106-47c1-9fe3-6ebd311b30ee-S628/frameworks/8e4967e5-736e-4a22-90c3-7b32d526914d-/executors/mt:z03584687:1/runs/a0706ca0-fe2c-4477-8161-329b26ea5d89'
> I0821 15:16:03.051795 3354800 slave.cpp:3520] Launching container a0706ca0-fe2c-4477-8161-329b26ea5d89 for executor 'mt:z03584687:1' of framework 8e4967e5-736e-4a22-90c3-7b32d526914d-
> I0821 15:16:03.076608 3354807 containerizer.cpp:1325] Starting container a0706ca0-fe2c-4477-8161-329b26ea5d89
> I0821 15:16:03.076911 3354807 containerizer.cpp:3185] Transitioning the state of container a0706ca0-fe2c-4477-8161-329b26ea5d89 from PROVISIONING to PREPARING
> I0821 15:16:03.077906 3354802 memory.cpp:478] Started listening for OOM events for container a0706ca0-fe2c-4477-8161-329b26ea5d89
> I0821 15:16:03.079540 3354804 memory.cpp:198] Updated 'memory.soft_limit_in_bytes' to 4032MB for container a0706ca0-fe2c-4477-8161-329b26ea5d89
> I0821 15:16:03.079587 3354820 cpu.cpp:92] Updated 'cpu.shares' to 1126 (cpus 1.1) for container a0706ca0-fe2c-4477-8161-329b26ea5d89
> I0821 15:16:03.079589 3354804 memory.cpp:227] Updated 'memory.limit_in_bytes' to 4032MB for container a0706ca0-fe2c-4477-8161-329b26ea5d89
> I0821 15:16:03.080901 3354802 switchboard.cpp:316] Container logger module finished preparing container a0706ca0-fe2c-4477-8161-329b26ea5d89; IOSwitchboard server is not required
> I0821 15:16:03.081593 3354801 linux_launcher.cpp:492] Launching container a0706ca0-fe2c-4477-8161-329b26ea5d89 and cloning with namespaces
> I0821 15:16:03.083823 3354808 containerizer.cpp:2107] Checkpointing container's forked pid 1857418 to '/opt/tiger/mesos_deploy_videoarch/mesos_zeus/slave/meta/slaves/fb5c1a5b-e106-47c1-9fe3-6ebd311b30ee-S628/frameworks/8e4967e5-736e-4a22-90c3-7b32d526914d-/executors/mt:z03584687:1/runs/a0706ca0-fe2c-4477-8161-329b26ea5d89/pids/forked.pid'
> I0821 15:16:03.084156 3354808 containerizer.cpp:3185] Transitioning the state of container a0706ca0-fe2c-4477-8161-329b26ea5d89 from PREPARING to ISOLATING
> I0821 15:16:03.091468 3354808 containerizer.cpp:3185] Transitioning the state of container a0706ca0-fe2c-4477-8161-329b26ea5d89 from ISOLATING to FETCHING
> I0821 15:16:03.094933 3354808 containerizer.cpp:3185] Transitioning the state of container a0706ca0-fe2c-4477-8161-329b26ea5d89 from FETCHING to RUNNING
> I0821 15:16:03.197753 3354808 memory.cpp:198] Updated 'memory.soft_limit_in_bytes' to 4032MB for container a0706ca0-fe2c-4477-8161-329b26ea5d89
> I0821 15:16:03.197757 3354801 cpu.cpp:92] Updated 'cpu.shares' to 1126 (cpus 1.1) for container a0706ca0-fe2c-4477-8161-329b26ea5d89
> I0821 15:21:39.692978 3354814 memory.cpp:515] OOM detected for container a0706ca0-fe2c-4477-8161-329b26ea5d89
> I0821 15:21:39.693182 3354805 containerizer.cpp:3044] Container a0706ca0-fe2c-4477-8161-329b26ea5d89 has reached its limit for resource [] and will be terminated
> I0821 15:21:39.693192 3354805 containerizer.cpp:2518] Destroying container a0706ca0-fe2c-4477-8161-329b26ea5d89 in RUNNING state
> I0821 15:21:39.693197 3354805