[jira] [Commented] (MESOS-10107) containeriser: failed to remove cgroup - EBUSY
[ https://issues.apache.org/jira/browse/MESOS-10107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17101572#comment-17101572 ]

Andrei Budnik commented on MESOS-10107:
---------------------------------------

{code:java}
commit 0cb1591b709e3c9f32093d943b8e2ddcdcf7999f
Author: Charles-Francois Natali
Date:   Sat May 2 01:41:09 2020 +0100

    Keep retrying to remove cgroup on EBUSY.

    This is a follow-up to MESOS-10107, which introduced retries when
    calling `rmdir` on a seemingly empty cgroup fails with `EBUSY`
    because of various kernel bugs.

    At the time, the fix introduced a bounded number of retries, using
    an exponential backoff summing up to slightly over 1s. This was done
    because it was similar to what Docker does, and worked during
    testing.

    However, after 1 month without seeing this error in our cluster at
    work, we finally experienced one case where the 1s timeout wasn't
    enough. It could be because the machine was busy at the time, or
    some other random factor.

    So instead of only trying for 1s, I think it makes sense to just
    keep retrying until the top-level container destruction timeout -
    set at 1 minute - kicks in. This actually makes more sense, and
    avoids having a magical timeout in the cgroup code. We just need to
    ensure that when the destroyer is finalized, it discards the future
    in charge of doing the periodic remove.
    This closes #362
{code}

> containeriser: failed to remove cgroup - EBUSY
> ----------------------------------------------
>
>                 Key: MESOS-10107
>                 URL: https://issues.apache.org/jira/browse/MESOS-10107
>             Project: Mesos
>          Issue Type: Bug
>          Components: containerization
>            Reporter: Charles N
>            Assignee: Charles Natali
>           Priority: Major
>              Labels: cgroups, containerization
>             Fix For: 1.10.0
>
>         Attachments: mesos-remove-cgroup-race.diff, reproduce-cgroup-rmdir-race.py
>
> We've been seeing some random errors on our cluster, where the container cgroup isn't properly destroyed after the OOM killer kicked in when the memory limit was exceeded - see analysis and patch below.
>
> Agent log:
> {noformat}
> I0331 08:49:16.398592 12831 memory.cpp:515] OOM detected for container 2c2a31eb-bac5-4acd-82ee-593c4616a63c
> I0331 08:49:16.401342 12831 memory.cpp:555] Memory limit exceeded: Requested: 10272MB Maximum Used: 10518532KB
> MEMORY STATISTICS:
> cache 0
> rss 10502754304
> rss_huge 4001366016
> shmem 0
> mapped_file 270336
> dirty 0
> writeback 0
> swap 0
> pgpgin 1684617
> pgpgout 95480
> pgfault 1670328
> pgmajfault 957
> inactive_anon 0
> active_anon 10501189632
> inactive_file 4096
> active_file 0
> unevictable 0
> hierarchical_memory_limit 10770972672
> hierarchical_memsw_limit 10770972672
> total_cache 0
> total_rss 10502754304
> total_rss_huge 4001366016
> total_shmem 0
> total_mapped_file 270336
> total_dirty 0
> total_writeback 0
> total_swap 0
> total_pgpgin 1684617
> total_pgpgout 95480
> total_pgfault 1670328
> total_pgmajfault 957
> total_inactive_anon 0
> total_active_anon 10501070848
> total_inactive_file 4096
> total_active_file 0
> total_unevictable 0
> I0331 08:49:16.414501 12831 containerizer.cpp:3175] Container 2c2a31eb-bac5-4acd-82ee-593c4616a63c has reached its limit for resource [{"name":"mem","scalar":{"value":10272.0},"type":"SCALAR"}] and will be terminated
> I0331 08:49:16.415262 12831 containerizer.cpp:2619] Destroying container 2c2a31eb-bac5-4acd-82ee-593c4616a63c in RUNNING state
> I0331 08:49:16.415323 12831 containerizer.cpp:3317] Transitioning the state of container 2c2a31eb-bac5-4acd-82ee-593c4616a63c from RUNNING to DESTROYING after 4.285078272secs
> I0331 08:49:16.416393 12830 linux_launcher.cpp:576] Asked to destroy container 2c2a31eb-bac5-4acd-82ee-593c4616a63c
> I0331 08:49:16.416484 12830 linux_launcher.cpp:618] Destroying cgroup '/sys/fs/cgroup/freezer/mesos/2c2a31eb-bac5-4acd-82ee-593c4616a63c'
> I0331 08:49:16.417093 12830 cgroups.cpp:2854] Freezing cgroup /sys/fs/cgroup/freezer/mesos/2c2a31eb-bac5-4acd-82ee-593c4616a63c
> I0331 08:49:16.519397 12830 cgroups.cpp:1242] Successfully froze cgroup /sys/fs/cgroup/freezer/mesos/2c2a31eb-bac5-4acd-82ee-593c4616a63c after 102.27072ms
> I0331 08:49:16.524307 12826 cgroups.cpp:2872] Thawing cgroup /sys/fs/cgroup/freezer/mesos/2c2a31eb-bac5-4acd-82ee-593c4616a63c
> I0331 08:49:16.524654 12828 cgroups.cpp:1271] Successfully thawed cgroup /sys/fs/cgroup/freezer/mesos/2c2a31eb-bac5-4acd-82ee-593c4616a63c after 242944ns
> I0331 08:49:16.531811 12829 slave.cpp:6539] Got exited event for executor(1)@127.0.1.1:46357
> I0331 08:49:16.539868 12825 containerizer.cpp:3155] Container 2c2a31eb-bac5-4acd-82ee-593c4616a63c has exited
> E0331 08:49:16.548131 12825 slave.cpp:6917] Termination of executor 'task-0-e4e4f131-ee09-4eaa-8120-3797f71c0e16' of framework 0ab2a2ad-d6ef-4ca2-b17a-33972f9e8af7-0001 failed: Failed to kill all processes in the container: Failed to remove cgroup 'mesos/2c2a31eb-bac5-4acd-82ee-593c4616a63c': Failed to remove cgroup '/sys/fs/cgroup/freezer/mesos/2c2a31eb-bac5-4acd-82ee-593c4616a63c': Device or resource busy
> {noformat}
>
> Initially I thought it was a race condition in the cgroup destruction code, but an strace confirmed that the cgroup directory was only deleted once all tasks had exited (edited and commented strace below from a different instance of the same problem):
>
> {noformat}
> # get the list of processes
> 3431 23:01:32.293608 openat(AT_FDCWD, "/sys/fs/cgroup/freezer/mesos/7eb1155b-ee0d-4233-8e49-cbe81f8b4deb/cgroup.procs", O_RDONLY
> 3431 23:01:32.293669 <... openat resumed> ) = 18 <0.36>
> 3431 23:01:32.294220 read(18,
> 3431 23:01:32.294268 <... read resumed> "5878\n6036\n6210\n", 8192) = 15 <0.33>
> 3431 23:01:32.294306 read(18, "", 4096) = 0 <0.13>
> 3431 23:01:32.294346 close(18
> 3431 23:01:32.294402 <... close resumed> ) = 0 <0.45>
> # kill them
> 3431 23:01:32.296266 kill(5878, SIGKILL) = 0 <0.19>
> 3431 23:01:32.296384 kill(6036, SIGKILL
> 3431 23:01:32.296474 kill(6210, SIGKILL) = 0 <0.13>
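The policy in the commit above (keep retrying `rmdir` on `EBUSY` until the container destruction timeout fires, rather than for a fixed ~1s budget) can be sketched as follows. This is a minimal, illustrative Python sketch, not the actual Mesos C++ implementation; the function name and parameters are hypothetical.

```python
import errno
import os
import time


def remove_cgroup(path, timeout_secs=60.0, initial_backoff=0.001):
    """Keep retrying rmdir() while it fails with EBUSY, until an overall
    deadline expires. The only bound is the caller's destruction timeout
    (about 1 minute in the scenario above), so there is no "magic"
    per-call retry budget buried in the cgroup code.

    Returns True if the directory was removed, False if it was still
    busy when the deadline passed.
    """
    deadline = time.monotonic() + timeout_secs
    backoff = initial_backoff
    while True:
        try:
            os.rmdir(path)
            return True
        except FileNotFoundError:
            return True  # already gone - nothing to do
        except OSError as e:
            if e.errno != errno.EBUSY:
                raise  # real errors (EPERM, ENOTEMPTY, ...) propagate
        if time.monotonic() >= deadline:
            return False
        time.sleep(backoff)
        backoff = min(backoff * 2, 1.0)  # exponential backoff, capped
```

In the real fix the retry loop runs as a libprocess future that the destroyer discards on finalization; here the deadline check plays that role.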
[jira] [Commented] (MESOS-10107) containeriser: failed to remove cgroup - EBUSY
[ https://issues.apache.org/jira/browse/MESOS-10107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17084136#comment-17084136 ]

Andrei Budnik commented on MESOS-10107:
---------------------------------------

{code:java}
commit af3ca189aced5fbc537bfca571264142d4cd37b3
Author: Charles-Francois Natali
Date:   Wed Apr 1 13:40:16 2020 +0100

    Handled EBUSY when destroying a cgroup.

    It's a workaround for kernel bugs which can cause `rmdir` to fail
    with `EBUSY` even though the cgroup - appears - empty. See for
    example https://lkml.org/lkml/2020/1/15/1349

    This closes #355
{code}
[jira] [Commented] (MESOS-10107) containeriser: failed to remove cgroup - EBUSY
[ https://issues.apache.org/jira/browse/MESOS-10107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17072739#comment-17072739 ]

Charles commented on MESOS-10107:
---------------------------------

Hey! Sure, I created [https://github.com/apache/mesos/pull/355] and answered your questions there. (I'm not sure what the best place for discussion is, but it feels nice to have it close to the code review.)
[jira] [Commented] (MESOS-10107) containeriser: failed to remove cgroup - EBUSY
[ https://issues.apache.org/jira/browse/MESOS-10107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17072680#comment-17072680 ]

Andrei Budnik commented on MESOS-10107:
---------------------------------------

Thanks for the detailed explanations! Could you please submit your patch to the [Apache Review Board|http://mesos.apache.org/documentation/latest/advanced-contribution/#submit-your-patch] or open a [PR on github|http://mesos.apache.org/documentation/latest/beginner-contribution/#open-a-pr/]?

Does the workaround work reliably after changing the initial delay and retry count to the values taken from libcontainer (10ms and 5)?

Should we retry only if `::rmdir()` returns an EBUSY errno error?
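The bounded variant being discussed here (a fixed retry count with the delay doubled on each attempt, mirroring the libcontainer/runc workaround's 10ms initial delay and 5 retries) can be sketched as below. The schedule and function name are illustrative assumptions, not runc's or Mesos's actual code.

```python
import errno
import os
import time


def remove_cgroup_bounded(path, retries=5, initial_delay=0.01):
    """Retry rmdir() on EBUSY a fixed number of times, doubling the
    delay between attempts (10ms, 20ms, 40ms, ...). Any error other
    than EBUSY, or EBUSY on the final attempt, is re-raised."""
    delay = initial_delay
    for attempt in range(retries):
        try:
            os.rmdir(path)
            return
        except OSError as e:
            if e.errno != errno.EBUSY or attempt == retries - 1:
                raise
        time.sleep(delay)
        delay *= 2
```

The total sleep with these defaults is 10+20+40+80 = 150ms; the later follow-up fix dropped the fixed budget entirely because even ~1s proved insufficient on a busy machine.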
[jira] [Commented] (MESOS-10107) containeriser: failed to remove cgroup - EBUSY
[ https://issues.apache.org/jira/browse/MESOS-10107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17072214#comment-17072214 ]

Charles commented on MESOS-10107:
---------------------------------

Interestingly, with the attached patch, which fixes it by doing an exponential backoff sleep between retries, we can see the following:

{noformat}
24136 23:23:15.141237 kill(7881, SIGKILL) = 0 <0.08>
24136 23:23:15.141267 kill(7895, SIGKILL) = 0 <0.29>
24136 23:23:15.141317 kill(7918, SIGKILL) = 0 <0.06>
7916  23:23:15.141834 +++ killed by SIGKILL +++
7914  23:23:15.141860 +++ killed by SIGKILL +++
7911  23:23:15.141867 +++ killed by SIGKILL +++
7909  23:23:15.141872 +++ killed by SIGKILL +++
7913  23:23:15.141973 +++ killed by SIGKILL +++
7910  23:23:15.141985 +++ killed by SIGKILL +++
7918  23:23:15.142034 +++ killed by SIGKILL +++
7915  23:23:15.142061 +++ killed by SIGKILL +++
7912  23:23:15.145387 +++ killed by SIGKILL +++
7881  23:23:15.153055 +++ killed by SIGKILL +++
24133 23:23:15.153183 --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_KILLED, si_pid=7881, si_uid=1000, si_status=SIGKILL, si_utime=11, si_stime=2} ---
7917  23:23:15.159178 +++ killed by SIGKILL +++
7895  23:23:15.159233 +++ killed by SIGKILL +++
24132 23:23:15.164164 wait4(7881, [{WIFSIGNALED(s) && WTERMSIG(s) == SIGKILL}], WNOHANG, NULL) = 7881 <0.49>
24132 23:23:15.165980 wait4(7895, 0x7fc618f315e0, WNOHANG, NULL) = -1 ECHILD (No child processes) <0.000219>
24132 23:23:15.166281 kill(7895, 0) = -1 ESRCH (No such process) <0.10>
24132 23:23:15.166518 wait4(7918, 0x7fc618f315e0, WNOHANG, NULL) = -1 ECHILD (No child processes) <0.06>
24132 23:23:15.166566 kill(7918, 0) = -1 ESRCH (No such process) <0.05>
24136 23:23:15.167315 rmdir("/sys/fs/cgroup/freezer/mesos/f22e05d5-b84c-4104-9cb9-41d2ae44a1fd") = -1 EBUSY (Device or resource busy) <0.13>
24136 23:23:15.168470 rmdir("/sys/fs/cgroup/freezer/mesos/f22e05d5-b84c-4104-9cb9-41d2ae44a1fd") = -1 EBUSY (Device or resource busy) <0.16>
24136 23:23:15.170644 rmdir("/sys/fs/cgroup/freezer/mesos/f22e05d5-b84c-4104-9cb9-41d2ae44a1fd") = -1 EBUSY (Device or resource busy) <0.18>
24136 23:23:15.175433 rmdir("/sys/fs/cgroup/freezer/mesos/f22e05d5-b84c-4104-9cb9-41d2ae44a1fd") = -1 EBUSY (Device or resource busy) <0.55>
24136 23:23:15.184471 rmdir("/sys/fs/cgroup/freezer/mesos/f22e05d5-b84c-4104-9cb9-41d2ae44a1fd") = -1 EBUSY (Device or resource busy) <0.34>
24136 23:23:15.200760 rmdir("/sys/fs/cgroup/freezer/mesos/f22e05d5-b84c-4104-9cb9-41d2ae44a1fd") = -1 EBUSY (Device or resource busy) <0.22>
24136 23:23:15.232992 rmdir("/sys/fs/cgroup/freezer/mesos/f22e05d5-b84c-4104-9cb9-41d2ae44a1fd") = -1 EBUSY (Device or resource busy) <0.000111>
24136 23:23:15.297338 rmdir("/sys/fs/cgroup/freezer/mesos/f22e05d5-b84c-4104-9cb9-41d2ae44a1fd") = -1 EBUSY (Device or resource busy) <0.28>
7932  23:23:15.337486 +++ killed by SIGKILL +++
24136 23:23:15.425600 rmdir("/sys/fs/cgroup/freezer/mesos/f22e05d5-b84c-4104-9cb9-41d2ae44a1fd") = 0 <0.86>
{noformat}

See PID 7932, which is "killed by SIGKILL" right before the rmdir finally succeeds? It's actually a thread of process 7918, created earlier:

{noformat}
7918 23:23:09.391067 clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f5966fc4850) = 7932 <0.59>
{noformat}

So basically what happens is that process 7918 exits earlier:

{noformat}
7918  23:23:15.142034 +++ killed by SIGKILL +++
[...]
24132 23:23:15.166518 wait4(7918, 0x7fc618f315e0, WNOHANG, NULL) = -1 ECHILD (No child processes) <0.06>
24132 23:23:15.166566 kill(7918, 0) = -1 ESRCH (No such process) <0.05>
{noformat}

So we believe that all processes have exited, and indeed cgroup.procs is empty, but actually a thread of one of those processes has not exited - it would likely show up in the cgroup tasks file.

So this instance is actually another kernel bug: [https://github.com/torvalds/linux/commit/c03cd7738a83b13739f00546166969342c8ff014]

In any case the fix for both issues is the same: just retry until the processes/threads have exited.
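The distinction at the heart of this analysis - cgroup v1's `cgroup.procs` lists only thread-group leaders (PIDs) while the `tasks` file lists every thread (TID), so a lingering thread can keep a "process-empty" cgroup busy - can be observed on any Linux machine by comparing a process's PID against its kernel task list. A Linux-only Python sketch (relies on `/proc`; the helper name is hypothetical):

```python
import os
import threading


def tids_of(pid):
    """Kernel task (thread) IDs of a process - roughly what a cgroup v1
    'tasks' file tracks, whereas 'cgroup.procs' lists only the
    thread-group leaders (PIDs)."""
    return sorted(int(t) for t in os.listdir(f"/proc/{pid}/task"))


stop = threading.Event()
worker = threading.Thread(target=stop.wait)
worker.start()
try:
    tids = tids_of(os.getpid())
    # One PID, two (or more) TIDs: a cgroup is only removable once every
    # TID has exited, not just every PID listed in cgroup.procs.
    assert os.getpid() in tids
    assert len(tids) >= 2
finally:
    stop.set()
    worker.join()
```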
[jira] [Commented] (MESOS-10107) containeriser: failed to remove cgroup - EBUSY
[ https://issues.apache.org/jira/browse/MESOS-10107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17072090#comment-17072090 ]

Charles commented on MESOS-10107:
---------------------------------

Note that libcontainer - used by Docker - has a similar hack: [https://github.com/opencontainers/runc/blob/master/libcontainer/cgroups/utils.go#L469]
[jira] [Commented] (MESOS-10107) containeriser: failed to remove cgroup - EBUSY
[ https://issues.apache.org/jira/browse/MESOS-10107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17071571#comment-17071571 ]

Charles commented on MESOS-10107:
---------------------------------

[^reproduce-cgroup-rmdir-race.py]
[^mesos-remove-cgroup-race.diff]