[ https://issues.apache.org/jira/browse/MESOS-9950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17362406#comment-17362406 ]

Subhajit Palit commented on MESOS-9950:
---------------------------------------

Has there been any fix for this yet?

I have recently been seeing a similar issue:

E0612 13:17:01.861963 3586 memory.cpp:530] Failed to read 'memory.limit_in_bytes': 'mesos/84c0dd72-7571-4a31-a070-7b085d87482f' is not a valid cgroup
E0612 13:17:01.863368 3586 memory.cpp:539] Failed to read 'memory.max_usage_in_bytes': 'mesos/84c0dd72-7571-4a31-a070-7b085d87482f' is not a valid cgroup
E0612 13:17:01.864610 3586 memory.cpp:551] Failed to read 'memory.stat': 'mesos/84c0dd72-7571-4a31-a070-7b085d87482f' is not a valid cgroup
W0612 13:17:01.866982 3591 linux_launcher.cpp:560] Couldn't find freezer cgroup for container 84c0dd72-7571-4a31-a070-7b085d87482f so assuming partially destroyed
W0612 13:17:02.607906 3590 containerizer.cpp:2378] Skipping status for container 84c0dd72-7571-4a31-a070-7b085d87482f because: Container does not exist
W0612 13:17:02.608215 3591 containerizer.cpp:2240] Ignoring update for currently being destroyed container 84c0dd72-7571-4a31-a070-7b085d87482f
E0612 13:17:07.647258 3607 slave.cpp:6305] Termination of executor 'compose-admin-development-gfsm-gcp-f587ea61-cbba-11eb-a340-42010ab41255-0-2e9bf188-bf2f-45ec-8912-9ec7a4f05a02' of framework 9f48d831-63e7-4556-86ab-463a69389e4d-0000 failed: Failed to clean up an isolator when destroying container: Failed to destroy cgroups: Failed to get nested cgroups: 'mesos/84c0dd72-7571-4a31-a070-7b085d87482f' is not a valid cgroup;Failed to get nested cgroups: 'mesos/84c0dd72-7571-4a31-a070-7b085d87482f' is not a valid cgroup
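
When triaging agent logs like these, it can help to split each glog entry into its fields so that the warnings and errors for one container can be filtered and matched to their source locations (for example `memory.cpp:530`). The following is a minimal C++17 sketch. It assumes the standard glog prefix `Lmmdd hh:mm:ss.uuuuuu threadid file:line] message` and assumes that each entry has first been unwrapped onto a single line. `GlogLine` and `parseGlogLine` are hypothetical names and are not part of Mesos.

```cpp
#include <optional>
#include <regex>
#include <string>

// Fields of one glog entry such as
// "E0612 13:17:01.861963 3586 memory.cpp:530] Failed to read ...".
struct GlogLine {
  char severity;        // 'I', 'W', 'E' or 'F'
  std::string date;     // "mmdd", e.g. "0612"
  std::string time;     // "hh:mm:ss.uuuuuu"
  long threadId;        // e.g. 3586
  std::string file;     // source file, e.g. "memory.cpp"
  int line;             // source line, e.g. 530
  std::string message;  // everything after "] "
};

// Hypothetical helper: parses one unwrapped log entry. Returns std::nullopt
// when the text does not match the assumed glog prefix.
std::optional<GlogLine> parseGlogLine(const std::string& text) {
  static const std::regex kPrefix(
      R"(([IWEF])(\d{4}) (\d{2}:\d{2}:\d{2}\.\d{6}) +(\d+) ([A-Za-z0-9_.]+):(\d+)\] (.*))");
  std::smatch m;
  if (!std::regex_match(text, m, kPrefix)) {
    return std::nullopt;
  }
  return GlogLine{m[1].str()[0], m[2].str(), m[3].str(), std::stol(m[4].str()),
                  m[5].str(), std::stoi(m[6].str()), m[7].str()};
}
```

For the first entry above, the parser yields severity `E`, file `memory.cpp` and line `530`. You can then filter on `severity != 'I'` and on a container ID inside `message` to isolate the failures for a single container.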

> memory cgroup gone before isolator cleaning up
> ----------------------------------------------
>
>                 Key: MESOS-9950
>                 URL: https://issues.apache.org/jira/browse/MESOS-9950
>             Project: Mesos
>          Issue Type: Bug
>          Components: containerization
>            Reporter: longfei
>            Priority: Major
>
> The memcg created by Mesos may already have been deleted by the time the 
> cgroup/memory isolator cleans up.
> This causes the termination to fail, and the information recorded for the 
> original termination (before the failure) is lost.
> {code:java}
> I0821 15:16:03.025796 3354800 paths.cpp:745] Creating sandbox '/opt/tiger/mesos_deploy_videoarch/mesos_zeus/slave/slaves/fb5c1a5b-e106-47c1-9fe3-6ebd311b30ee-S628/frameworks/8e4967e5-736e-4a22-90c3-7b32d526914d-0000/executors/mt:z00000000000003584687:1/runs/a0706ca0-fe2c-4477-8161-329b26ea5d89' for user 'tiger'
> I0821 15:16:03.026199 3354800 paths.cpp:748] Creating sandbox '/opt/tiger/mesos_deploy_videoarch/mesos_zeus/slave/meta/slaves/fb5c1a5b-e106-47c1-9fe3-6ebd311b30ee-S628/frameworks/8e4967e5-736e-4a22-90c3-7b32d526914d-0000/executors/mt:z00000000000003584687:1/runs/a0706ca0-fe2c-4477-8161-329b26ea5d89'
> I0821 15:16:03.026304 3354800 slave.cpp:9064] Launching executor 'mt:z00000000000003584687:1' of framework 8e4967e5-736e-4a22-90c3-7b32d526914d-0000 with resources [{"allocation_info":{"role":"*"},"name":"cpus","scalar":{"value":0.1},"type":"SCALAR"},{"allocation_info":{"role":"*"},"name":"mem","scalar":{"value":32.0},"type":"SCALAR"}] in work directory '/opt/tiger/mesos_deploy_videoarch/mesos_zeus/slave/slaves/fb5c1a5b-e106-47c1-9fe3-6ebd311b30ee-S628/frameworks/8e4967e5-736e-4a22-90c3-7b32d526914d-0000/executors/mt:z00000000000003584687:1/runs/a0706ca0-fe2c-4477-8161-329b26ea5d89'
> I0821 15:16:03.051795 3354800 slave.cpp:3520] Launching container a0706ca0-fe2c-4477-8161-329b26ea5d89 for executor 'mt:z00000000000003584687:1' of framework 8e4967e5-736e-4a22-90c3-7b32d526914d-0000
> I0821 15:16:03.076608 3354807 containerizer.cpp:1325] Starting container a0706ca0-fe2c-4477-8161-329b26ea5d89
> I0821 15:16:03.076911 3354807 containerizer.cpp:3185] Transitioning the state of container a0706ca0-fe2c-4477-8161-329b26ea5d89 from PROVISIONING to PREPARING
> I0821 15:16:03.077906 3354802 memory.cpp:478] Started listening for OOM events for container a0706ca0-fe2c-4477-8161-329b26ea5d89
> I0821 15:16:03.079540 3354804 memory.cpp:198] Updated 'memory.soft_limit_in_bytes' to 4032MB for container a0706ca0-fe2c-4477-8161-329b26ea5d89
> I0821 15:16:03.079587 3354820 cpu.cpp:92] Updated 'cpu.shares' to 1126 (cpus 1.1) for container a0706ca0-fe2c-4477-8161-329b26ea5d89
> I0821 15:16:03.079589 3354804 memory.cpp:227] Updated 'memory.limit_in_bytes' to 4032MB for container a0706ca0-fe2c-4477-8161-329b26ea5d89
> I0821 15:16:03.080901 3354802 switchboard.cpp:316] Container logger module finished preparing container a0706ca0-fe2c-4477-8161-329b26ea5d89; IOSwitchboard server is not required
> I0821 15:16:03.081593 3354801 linux_launcher.cpp:492] Launching container a0706ca0-fe2c-4477-8161-329b26ea5d89 and cloning with namespaces
> I0821 15:16:03.083823 3354808 containerizer.cpp:2107] Checkpointing container's forked pid 1857418 to '/opt/tiger/mesos_deploy_videoarch/mesos_zeus/slave/meta/slaves/fb5c1a5b-e106-47c1-9fe3-6ebd311b30ee-S628/frameworks/8e4967e5-736e-4a22-90c3-7b32d526914d-0000/executors/mt:z00000000000003584687:1/runs/a0706ca0-fe2c-4477-8161-329b26ea5d89/pids/forked.pid'
> I0821 15:16:03.084156 3354808 containerizer.cpp:3185] Transitioning the state of container a0706ca0-fe2c-4477-8161-329b26ea5d89 from PREPARING to ISOLATING
> I0821 15:16:03.091468 3354808 containerizer.cpp:3185] Transitioning the state of container a0706ca0-fe2c-4477-8161-329b26ea5d89 from ISOLATING to FETCHING
> I0821 15:16:03.094933 3354808 containerizer.cpp:3185] Transitioning the state of container a0706ca0-fe2c-4477-8161-329b26ea5d89 from FETCHING to RUNNING
> I0821 15:16:03.197753 3354808 memory.cpp:198] Updated 'memory.soft_limit_in_bytes' to 4032MB for container a0706ca0-fe2c-4477-8161-329b26ea5d89
> I0821 15:16:03.197757 3354801 cpu.cpp:92] Updated 'cpu.shares' to 1126 (cpus 1.1) for container a0706ca0-fe2c-4477-8161-329b26ea5d89
> I0821 15:21:39.692978 3354814 memory.cpp:515] OOM detected for container a0706ca0-fe2c-4477-8161-329b26ea5d89
> I0821 15:21:39.693182 3354805 containerizer.cpp:3044] Container a0706ca0-fe2c-4477-8161-329b26ea5d89 has reached its limit for resource [] and will be terminated
> I0821 15:21:39.693192 3354805 containerizer.cpp:2518] Destroying container a0706ca0-fe2c-4477-8161-329b26ea5d89 in RUNNING state
> I0821 15:21:39.693197 3354805 containerizer.cpp:3185] Transitioning the state of container a0706ca0-fe2c-4477-8161-329b26ea5d89 from RUNNING to DESTROYING
> I0821 15:21:39.693542 3354815 linux_launcher.cpp:576] Asked to destroy container a0706ca0-fe2c-4477-8161-329b26ea5d89
> I0821 15:21:39.693563 3354815 linux_launcher.cpp:618] Destroying cgroup '/sys/fs/cgroup/freezer/tiger/a0706ca0-fe2c-4477-8161-329b26ea5d89'
> I0821 15:21:39.693737 3354825 cgroups.cpp:2854] Freezing cgroup /sys/fs/cgroup/freezer/tiger/a0706ca0-fe2c-4477-8161-329b26ea5d89
> I0821 15:21:39.794579 3354827 cgroups.cpp:1242] Successfully froze cgroup /sys/fs/cgroup/freezer/tiger/a0706ca0-fe2c-4477-8161-329b26ea5d89 after 100.765952ms
> I0821 15:21:39.795117 3354800 cgroups.cpp:2872] Thawing cgroup /sys/fs/cgroup/freezer/tiger/a0706ca0-fe2c-4477-8161-329b26ea5d89
> I0821 15:21:39.795686 3354800 cgroups.cpp:1271] Successfully thawed cgroup /sys/fs/cgroup/freezer/tiger/a0706ca0-fe2c-4477-8161-329b26ea5d89 after 515072ns
> I0821 15:21:39.847966 3354798 containerizer.cpp:3024] Container a0706ca0-fe2c-4477-8161-329b26ea5d89 has exited
> E0821 15:21:39.950544 3354823 slave.cpp:6302] Termination of executor 'mt:z00000000000003584687:1' of framework 8e4967e5-736e-4a22-90c3-7b32d526914d-0000 failed: Failed to clean up an isolator when destroying container: Failed to destroy cgroups: Failed to get nested cgroups: Failed to determine canonical path of '/sys/fs/cgroup/memory/tiger/a0706ca0-fe2c-4477-8161-329b26ea5d89': No such file or directory
> W0821 15:21:39.950760 3354817 containerizer.cpp:2462] Skipping status for container a0706ca0-fe2c-4477-8161-329b26ea5d89 because: Container does not exist
> W0821 15:21:39.950958 3354811 containerizer.cpp:2324] Ignoring update for currently being destroyed container a0706ca0-fe2c-4477-8161-329b26ea5d89
> I0821 15:21:39.953343 3354816 gc.cpp:95] Scheduling '/opt/tiger/mesos_deploy_videoarch/mesos_zeus/slave/meta/slaves/fb5c1a5b-e106-47c1-9fe3-6ebd311b30ee-S628/frameworks/8e4967e5-736e-4a22-90c3-7b32d526914d-0000/executors/mt:z00000000000003584687:1/runs/a0706ca0-fe2c-4477-8161-329b26ea5d89/tasks/mt:z00000000000003584687:1' for gc 6.99610007729185days in the future
> I0821 15:21:39.953478 3354806 gc.cpp:95] Scheduling '/opt/tiger/mesos_deploy_videoarch/mesos_zeus/slave/slaves/fb5c1a5b-e106-47c1-9fe3-6ebd311b30ee-S628/frameworks/8e4967e5-736e-4a22-90c3-7b32d526914d-0000/executors/mt:z00000000000003584687:1/runs/a0706ca0-fe2c-4477-8161-329b26ea5d89' for gc 6.99998896460444days in the future
> I0821 15:21:39.953538 3354810 gc.cpp:95] Scheduling '/opt/tiger/mesos_deploy_videoarch/mesos_zeus/slave/meta/slaves/fb5c1a5b-e106-47c1-9fe3-6ebd311b30ee-S628/frameworks/8e4967e5-736e-4a22-90c3-7b32d526914d-0000/executors/mt:z00000000000003584687:1/runs/a0706ca0-fe2c-4477-8161-329b26ea5d89' for gc 6.99998896395852days in the future
> {code}
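
In both reports, cleanup fails because the memory cgroup directory, or a cgroup nested under it, no longer exists ("No such file or directory" / "is not a valid cgroup"). One possible mitigation is to make cgroup destruction idempotent, so that a cgroup that has already disappeared is treated as destroyed and does not fail the executor termination. This is offered only as an illustration, not as the fix adopted in Mesos. The C++17 sketch below assumes a cgroup v1 hierarchy such as `/sys/fs/cgroup/memory/tiger/<container-id>`, where a cgroup can be removed with `rmdir(2)` once it has no child cgroups and no attached processes. `destroyCgroupTolerant` is a hypothetical helper.

```cpp
#include <algorithm>
#include <cerrno>
#include <filesystem>
#include <string>
#include <system_error>
#include <vector>

#include <unistd.h>  // rmdir(2)

namespace fs = std::filesystem;

// Hypothetical helper (not the Mesos implementation): removes a cgroup
// directory and every nested cgroup below it, deepest first. A cgroup that has
// already disappeared is treated as destroyed rather than as an error.
// Returns an empty string on success, otherwise an error description.
std::string destroyCgroupTolerant(const fs::path& cgroup) {
  std::error_code ec;
  if (!fs::exists(cgroup, ec)) {
    // exists() clears `ec` for a missing path, so a set `ec` means a
    // different failure (e.g. permission denied) that should be reported.
    if (ec) {
      return ec.message();
    }
    return "";  // Already gone, e.g. removed before the isolator cleanup.
  }

  // Collect the cgroup itself plus all nested cgroup directories.
  std::vector<fs::path> dirs{cgroup};
  fs::recursive_directory_iterator it(cgroup, ec);
  for (; !ec && it != fs::recursive_directory_iterator(); it.increment(ec)) {
    std::error_code typeEc;
    if (it->is_directory(typeEc)) {
      dirs.push_back(it->path());
    }
  }
  // A directory vanishing mid-walk is the same race; anything else is fatal.
  if (ec && ec != std::errc::no_such_file_or_directory) {
    return ec.message();
  }

  // A child's path is always longer than its parent's, so sorting by length
  // (longest first) removes every child before its parent.
  std::sort(dirs.begin(), dirs.end(), [](const fs::path& a, const fs::path& b) {
    return a.native().size() > b.native().size();
  });

  for (const fs::path& dir : dirs) {
    if (::rmdir(dir.c_str()) != 0) {
      const int err = errno;
      if (err == ENOENT) {
        continue;  // Removed concurrently: nothing left to do.
      }
      return "Failed to remove '" + dir.string() + "': " +
             std::generic_category().message(err);
    }
  }
  return "";
}
```

Calling the helper a second time on the same path returns success, which is the property the failing termination above lacks. The sketch does not address the second half of the report: statistics for the terminated container (such as the OOM information) may still need to be preserved when the memcg is gone.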



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
