[jira] [Commented] (MESOS-10107) containeriser: failed to remove cgroup - EBUSY

2020-05-07 Thread Andrei Budnik (Jira)


[ 
https://issues.apache.org/jira/browse/MESOS-10107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17101572#comment-17101572
 ] 

Andrei Budnik commented on MESOS-10107:
---

{code:java}
commit 0cb1591b709e3c9f32093d943b8e2ddcdcf7999f
Author: Charles-Francois Natali 
Date:   Sat May 2 01:41:09 2020 +0100

Keep retrying to remove cgroup on EBUSY.

This is a follow-up to MESOS-10107, which introduced retries when
calling `rmdir` on a seemingly empty cgroup fails with `EBUSY`
because of various kernel bugs.
At the time, the fix introduced a bounded number of retries, using an
exponential backoff summing up to slightly over 1s. This was done
because it was similar to what Docker does, and worked during testing.
However, after 1 month without seeing this error in our cluster at work,
we finally experienced one case where the 1s timeout wasn't enough.
It could be because the machine was busy at the time, or some other
random factor.
So instead of only trying for 1s, I think it might make sense to just
keep retrying, until the top-level container destruction timeout - set
at 1 minute - kicks in.
This actually makes more sense, and avoids having a magical timeout in
the cgroup code.
We just need to ensure that when the destroyer is finalized, it discards
the future in charge of doing the periodic remove.

This closes #362
{code}
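For readers following along, here is a minimal, self-contained sketch of the approach the commit message describes: keep retrying `rmdir` while it fails with `EBUSY`, backing off exponentially, and let an overall deadline supplied by the caller (for example one derived from the top-level container destruction timeout) decide when to give up. This is not the actual Mesos implementation; the function name and the blocking sleep are only for illustration.

{code:cpp}
// Sketch only: retry ::rmdir() while it fails with EBUSY, doubling the
// delay between attempts. The caller passes the overall deadline, so there
// is no magic timeout inside the cgroup-removal code itself.
#include <cerrno>
#include <chrono>
#include <cstring>
#include <iostream>
#include <string>
#include <thread>

#include <unistd.h>  // ::rmdir

bool removeCgroupDir(
    const std::string& path,
    const std::chrono::steady_clock::time_point& deadline)
{
  std::chrono::milliseconds delay(1);

  while (true) {
    if (::rmdir(path.c_str()) == 0) {
      return true;
    }

    // Only EBUSY is retried: it corresponds to the "seemingly empty
    // cgroup" kernel bugs discussed in this ticket.
    if (errno != EBUSY) {
      std::cerr << "rmdir('" << path << "') failed: "
                << ::strerror(errno) << std::endl;
      return false;
    }

    if (std::chrono::steady_clock::now() + delay >= deadline) {
      std::cerr << "Giving up on '" << path << "': still EBUSY" << std::endl;
      return false;
    }

    std::this_thread::sleep_for(delay);
    delay *= 2;  // exponential backoff
  }
}
{code}

The real patch is asynchronous rather than blocking: as the commit message notes, the retry loop is driven through libprocess and stops when the destroyer is finalized and discards the future in charge of the periodic remove.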

> containeriser: failed to remove cgroup - EBUSY
> --
>
> Key: MESOS-10107
> URL: https://issues.apache.org/jira/browse/MESOS-10107
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization
>Reporter: Charles N
>Assignee: Charles Natali
>Priority: Major
>  Labels: cgroups, containerization
> Fix For: 1.10.0
>
> Attachments: mesos-remove-cgroup-race.diff, 
> reproduce-cgroup-rmdir-race.py
>
>
> We've been seeing some random errors on our cluster, where the container 
> cgroup isn't properly destroyed after the OOM killer kicked in when the 
> memory limit was exceeded - see analysis and patch below:
> Agent log:
> {noformat}
> I0331 08:49:16.398592 12831 memory.cpp:515] OOM detected for container 
> 2c2a31eb-bac5-4acd-82ee-593c4616a63c
> I0331 08:49:16.401342 12831 memory.cpp:555] Memory limit exceeded: Requested: 
> 10272MB Maximum Used: 10518532KB
> MEMORY STATISTICS: 
> cache 0
> rss 10502754304
> rss_huge 4001366016
> shmem 0
> mapped_file 270336
> dirty 0
> writeback 0
> swap 0
> pgpgin 1684617
> pgpgout 95480
> pgfault 1670328
> pgmajfault 957
> inactive_anon 0
> active_anon 10501189632
> inactive_file 4096
> active_file 0
> unevictable 0
> hierarchical_memory_limit 10770972672
> hierarchical_memsw_limit 10770972672
> total_cache 0
> total_rss 10502754304
> total_rss_huge 4001366016
> total_shmem 0
> total_mapped_file 270336
> total_dirty 0
> total_writeback 0
> total_swap 0
> total_pgpgin 1684617
> total_pgpgout 95480
> total_pgfault 1670328
> total_pgmajfault 957
> total_inactive_anon 0
> total_active_anon 10501070848
> total_inactive_file 4096
> total_active_file 0
> total_unevictable 0
> I0331 08:49:16.414501 12831 containerizer.cpp:3175] Container 
> 2c2a31eb-bac5-4acd-82ee-593c4616a63c has reached its limit for resource 
> [{"name":"mem","scalar":{"value":10272.0},"type":"SCALAR"}] and will be 
> terminated
> I0331 08:49:16.415262 12831 containerizer.cpp:2619] Destroying container 
> 2c2a31eb-bac5-4acd-82ee-593c4616a63c in RUNNING state
> I0331 08:49:16.415323 12831 containerizer.cpp:3317] Transitioning the state 
> of container 2c2a31eb-bac5-4acd-82ee-593c4616a63c from RUNNING to DESTROYING 
> after 4.285078272secs
> I0331 08:49:16.416393 12830 linux_launcher.cpp:576] Asked to destroy 
> container 2c2a31eb-bac5-4acd-82ee-593c4616a63c
> I0331 08:49:16.416484 12830 linux_launcher.cpp:618] Destroying cgroup 
> '/sys/fs/cgroup/freezer/mesos/2c2a31eb-bac5-4acd-82ee-593c4616a63c'
> I0331 08:49:16.417093 12830 cgroups.cpp:2854] Freezing cgroup 
> /sys/fs/cgroup/freezer/mesos/2c2a31eb-bac5-4acd-82ee-593c4616a63c
> I0331 08:49:16.519397 12830 cgroups.cpp:1242] Successfully froze cgroup 
> /sys/fs/cgroup/freezer/mesos/2c2a31eb-bac5-4acd-82ee-593c4616a63c after 
> 102.27072ms
> I0331 08:49:16.524307 12826 cgroups.cpp:2872] Thawing cgroup 
> /sys/fs/cgroup/freezer/mesos/2c2a31eb-bac5-4acd-82ee-593c4616a63c
> I0331 08:49:16.524654 12828 cgroups.cpp:1271] Successfully thawed cgroup 
> /sys/fs/cgroup/freezer/mesos/2c2a31eb-bac5-4acd-82ee-593c4616a63c after 
> 242944ns
> I0331 08:49:16.531811 12829 slave.cpp:6539] Got exited event for 
> executor(1)@127.0.1.1:46357
> I0331 08:49:16.539868 12825 containerizer.cpp:3155] Container 
> 2c2a31eb-bac5-4acd-82ee-593c4616a63c has exited
> E0331 08:49:16.548131 12825 slave.cpp:6917] Termination of executor 
> 'task-0-e4e4f131-ee09-4eaa-8120-3797f71c0e16' of framework 
> 0ab2a2ad-d6ef-4ca2-b17a-33972f9e8af7-0001 failed: Failed to kill all 
> processes in the container: Failed to remove cgroup 
> 'mesos/2c2a31eb-bac5-4acd-82ee-593c4616a63c': Failed to remove cgroup 
> '/sys/fs/cgroup/freezer/mesos/2c2a31eb-bac5-4acd-82ee-593c4616a63c': Device 
> or resource busy
> {noformat}
> Initially I thought it was a race condition in the cgroup destruction code, 
> but an strace confirmed that the cgroup directory was only deleted once all 
> tasks had exited (edited and commented strace below from a different instance 
> of the same problem):
> {noformat}
> # get the list of processes
> 3431  23:01:32.293608 openat(AT_FDCWD,
> "/sys/fs/cgroup/freezer/mesos/7eb1155b-ee0d-4233-8e49-cbe81f8b4deb/cgroup.procs",
> O_RDONLY 
> 3431  23:01:32.293669 <... openat resumed> ) = 18 <0.36>
> 3431  23:01:32.294220 read(18,  
> 3431  23:01:32.294268 <... read resumed> "5878\n6036\n6210\n", 8192) =
> 15 <0.33>
> 3431  23:01:32.294306 read(18, "", 4096) = 0 <0.13>
> 3431  23:01:32.294346 close(18 
> 3431  23:01:32.294402 <... close resumed> ) = 0 <0.45>
> #kill them
> 3431  23:01:32.296266 kill(5878, SIGKILL) = 0 <0.19>
> 3431  23:01:32.296384 kill(6036, SIGKILL 
> 3431  23:01:32.296474 kill(6210, SIGKILL) = 0 <0.13>
>

[jira] [Commented] (MESOS-10107) containeriser: failed to remove cgroup - EBUSY

2020-04-15 Thread Andrei Budnik (Jira)


[ 
https://issues.apache.org/jira/browse/MESOS-10107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17084136#comment-17084136
 ] 

Andrei Budnik commented on MESOS-10107:
---

{code:java}
commit af3ca189aced5fbc537bfca571264142d4cd37b3
Author: Charles-Francois Natali 
Date:   Wed Apr 1 13:40:16 2020 +0100

Handled EBUSY when destroying a cgroup.

It's a workaround for kernel bugs which can cause `rmdir` to fail with
`EBUSY` even though the cgroup appears to be empty.
See for example https://lkml.org/lkml/2020/1/15/1349

This closes #355
{code}
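As a rough illustration of what this bounded workaround amounts to (again, not the actual Mesos code): retry only when `::rmdir()` fails with `EBUSY`, a fixed number of times, with an exponentially growing delay. The attempt count and initial delay below are the runc-like values mentioned elsewhere in this ticket (5 attempts starting at 10ms) and may differ from the committed patch.

{code:cpp}
// Sketch only: bounded retry of ::rmdir(), doubling the delay, and only
// when the error is EBUSY. Attempt count and initial delay are
// illustrative, not the values used by the actual Mesos patch.
#include <cerrno>
#include <chrono>
#include <string>
#include <thread>

#include <unistd.h>  // ::rmdir

bool removeCgroupWithRetries(
    const std::string& path,
    int attempts = 5,
    std::chrono::milliseconds delay = std::chrono::milliseconds(10))
{
  for (int i = 0; i < attempts; ++i) {
    if (::rmdir(path.c_str()) == 0) {
      return true;
    }

    if (errno != EBUSY) {
      return false;  // a "real" error; don't mask it by retrying
    }

    std::this_thread::sleep_for(delay);
    delay *= 2;
  }

  // One last try after the final sleep.
  return ::rmdir(path.c_str()) == 0;
}
{code}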


[jira] [Commented] (MESOS-10107) containeriser: failed to remove cgroup - EBUSY

2020-04-01 Thread Charles (Jira)


[ 
https://issues.apache.org/jira/browse/MESOS-10107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17072739#comment-17072739
 ] 

Charles commented on MESOS-10107:
-

Hey!

Sure, I created [https://github.com/apache/mesos/pull/355]

I answered your questions there (I'm not sure what the best place for 
discussion is, but it feels nice to have it close to the code review).


[jira] [Commented] (MESOS-10107) containeriser: failed to remove cgroup - EBUSY

2020-04-01 Thread Andrei Budnik (Jira)


[ 
https://issues.apache.org/jira/browse/MESOS-10107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17072680#comment-17072680
 ] 

Andrei Budnik commented on MESOS-10107:
---

Thanks for the detailed explanations!

Could you please submit your patch to [Apache Review 
Board|http://mesos.apache.org/documentation/latest/advanced-contribution/#submit-your-patch]
 or open a [PR on 
github|http://mesos.apache.org/documentation/latest/beginner-contribution/#open-a-pr/]?

Does the workaround work reliably after changing the initial delay and retry 
count to the values taken from libcontainer (10ms and 5)?

Should we retry only if `::rmdir()` fails with an EBUSY error?


[jira] [Commented] (MESOS-10107) containeriser: failed to remove cgroup - EBUSY

2020-03-31 Thread Charles (Jira)


[ 
https://issues.apache.org/jira/browse/MESOS-10107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17072214#comment-17072214
 ] 

Charles commented on MESOS-10107:
-

Interestingly, with the attached patch, which fixes the issue by sleeping with 
an exponential backoff between rmdir retries, we can see the following:

 
{noformat}
24136 23:23:15.141237 kill(7881, SIGKILL) = 0 <0.08>
 24136 23:23:15.141267 kill(7895, SIGKILL) = 0 <0.29>
 24136 23:23:15.141317 kill(7918, SIGKILL) = 0 <0.06>
 7916 23:23:15.141834 +++ killed by SIGKILL +++
 7914 23:23:15.141860 +++ killed by SIGKILL +++
 7911 23:23:15.141867 +++ killed by SIGKILL +++
 7909 23:23:15.141872 +++ killed by SIGKILL +++
 7913 23:23:15.141973 +++ killed by SIGKILL +++
 7910 23:23:15.141985 +++ killed by SIGKILL +++
 7918 23:23:15.142034 +++ killed by SIGKILL +++
 7915 23:23:15.142061 +++ killed by SIGKILL +++
 7912 23:23:15.145387 +++ killed by SIGKILL +++
 7881 23:23:15.153055 +++ killed by SIGKILL +++
 24133 23:23:15.153183 --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_KILLED, 
si_pid=7881, si_uid=1000, si_status=SIGKILL, si_utime=11, si_stime=2}
 7917 23:23:15.159178 +++ killed by SIGKILL +++
 7895 23:23:15.159233 +++ killed by SIGKILL +++
 24132 23:23:15.164164 wait4(7881, [{WIFSIGNALED(s) && WTERMSIG(s) == 
SIGKILL}], WNOHANG, NULL) = 7881 <0.49>
 24132 23:23:15.165980 wait4(7895, 0x7fc618f315e0, WNOHANG, NULL) = -1 ECHILD 
(No child processes) <0.000219>
 24132 23:23:15.166281 kill(7895, 0) = -1 ESRCH (No such process) <0.10>
 24132 23:23:15.166518 wait4(7918, 0x7fc618f315e0, WNOHANG, NULL) = -1 ECHILD 
(No child processes) <0.06>
 24132 23:23:15.166566 kill(7918, 0) = -1 ESRCH (No such process) <0.05>
 24136 23:23:15.167315 
rmdir("/sys/fs/cgroup/freezer/mesos/f22e05d5-b84c-4104-9cb9-41d2ae44a1fd") = -1 
EBUSY (Device or resource busy) <0.13>
 24136 23:23:15.168470 
rmdir("/sys/fs/cgroup/freezer/mesos/f22e05d5-b84c-4104-9cb9-41d2ae44a1fd") = -1 
EBUSY (Device or resource busy) <0.16>
 24136 23:23:15.170644 
rmdir("/sys/fs/cgroup/freezer/mesos/f22e05d5-b84c-4104-9cb9-41d2ae44a1fd") = -1 
EBUSY (Device or resource busy) <0.18>
 24136 23:23:15.175433 
rmdir("/sys/fs/cgroup/freezer/mesos/f22e05d5-b84c-4104-9cb9-41d2ae44a1fd") = -1 
EBUSY (Device or resource busy) <0.55>
 24136 23:23:15.184471 
rmdir("/sys/fs/cgroup/freezer/mesos/f22e05d5-b84c-4104-9cb9-41d2ae44a1fd") = -1 
EBUSY (Device or resource busy) <0.34>
 24136 23:23:15.200760 
rmdir("/sys/fs/cgroup/freezer/mesos/f22e05d5-b84c-4104-9cb9-41d2ae44a1fd") = -1 
EBUSY (Device or resource busy) <0.22>
 24136 23:23:15.232992 
rmdir("/sys/fs/cgroup/freezer/mesos/f22e05d5-b84c-4104-9cb9-41d2ae44a1fd") = -1 
EBUSY (Device or resource busy) <0.000111>
 24136 23:23:15.297338 
rmdir("/sys/fs/cgroup/freezer/mesos/f22e05d5-b84c-4104-9cb9-41d2ae44a1fd") = -1 
EBUSY (Device or resource busy) <0.28>
 7932 23:23:15.337486 +++ killed by SIGKILL +++
 24136 23:23:15.425600 
rmdir("/sys/fs/cgroup/freezer/mesos/f22e05d5-b84c-4104-9cb9-41d2ae44a1fd") = 0 
<0.86>
{noformat}
 

See PID 7932, which is "killed by SIGKILL" right before the rmdir finally 
succeeds? It's actually a thread of process 7918, created earlier:
{noformat}
7918 23:23:09.391067 clone(child_stack=NULL, 
flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, 
child_tidptr=0x7f5966fc4850) = 7932 <0.59>
{noformat}
So what happens is that process 7918 exits earlier:
{noformat}
7918 23:23:15.142034 +++ killed by SIGKILL +++
 [...]
 24132 23:23:15.166518 wait4(7918, 0x7fc618f315e0, WNOHANG, NULL) = -1 ECHILD 
(No child processes) <0.06>
 24132 23:23:15.166566 kill(7918, 0) = -1 ESRCH (No such process) <0.05>
{noformat}
So we believe that all processes have exited - and indeed cgroup.procs is empty - 
but a thread of one of those processes has not yet exited; it would likely still 
show up in the cgroup's tasks file.
This instance is therefore caused by another kernel bug: 
[https://github.com/torvalds/linux/commit/c03cd7738a83b13739f00546166969342c8ff014]

In any case the fix for both issues is the same: keep retrying until all 
processes and threads have exited.
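
To observe this state directly, a small hypothetical diagnostic (not part of the attached patch) can dump the freezer cgroup's `cgroup.procs` and `tasks` files; in the scenario above, `cgroup.procs` is already empty while the lingering thread (7932 here) would likely still be listed in `tasks`, which is why `rmdir` keeps returning `EBUSY`.

{code:cpp}
// Hypothetical diagnostic, not from the Mesos patch: print the contents
// of a cgroup's `cgroup.procs` (thread-group leaders) and `tasks`
// (individual thread IDs). An empty cgroup.procs with a non-empty tasks
// file is the state that makes ::rmdir() fail with EBUSY here.
#include <fstream>
#include <iostream>
#include <string>
#include <vector>

static std::vector<long> readIds(const std::string& file)
{
  std::vector<long> ids;
  std::ifstream in(file);
  long id;
  while (in >> id) {
    ids.push_back(id);
  }
  return ids;
}

int main(int argc, char** argv)
{
  if (argc != 2) {
    std::cerr << "usage: " << argv[0]
              << " /sys/fs/cgroup/freezer/mesos/<container-id>" << std::endl;
    return 1;
  }

  const std::string cgroup(argv[1]);

  const std::vector<long> procs = readIds(cgroup + "/cgroup.procs");
  const std::vector<long> tasks = readIds(cgroup + "/tasks");

  std::cout << "cgroup.procs: " << procs.size() << " entries" << std::endl;
  std::cout << "tasks:        " << tasks.size() << " entries" << std::endl;

  if (procs.empty() && !tasks.empty()) {
    std::cout << "lingering thread ids:";
    for (long tid : tasks) {
      std::cout << " " << tid;
    }
    std::cout << std::endl;
  }

  return 0;
}
{code}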


[jira] [Commented] (MESOS-10107) containeriser: failed to remove cgroup - EBUSY

2020-03-31 Thread Charles (Jira)


[ 
https://issues.apache.org/jira/browse/MESOS-10107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17072090#comment-17072090
 ] 

Charles commented on MESOS-10107:
-

Note that libcontainer - used by Docker - has a similar hack: 
https://github.com/opencontainers/runc/blob/master/libcontainer/cgroups/utils.go#L469



[jira] [Commented] (MESOS-10107) containeriser: failed to remove cgroup - EBUSY

2020-03-31 Thread Charles (Jira)


[ 
https://issues.apache.org/jira/browse/MESOS-10107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17071571#comment-17071571
 ] 

Charles commented on MESOS-10107:
-

 [^reproduce-cgroup-rmdir-race.py]  [^mesos-remove-cgroup-race.diff] 
