cf-natali commented on pull request #388: URL: https://github.com/apache/mesos/pull/388#issuecomment-850889402
> That said, I'm not convinced that postponing c) is the best available option. > > Maybe we should consider replacing c) with another termination mechanism that will be **guaranteed** not to halt this loop in the middle of an iteration? Something like ~setting a flag by that discard callback and checking this flag~ checking `future.hasDiscard()` inside this loop before calling `cgroups::freezer:freeze()` ? Yes, I wondered about this, and actually I initially wanted to go for this approach to never interrupt of the middle of an iteration, however I chose the approach in this MR because: - it's simpler - I don't feel too comfortable changing this code as it's already the result of several workarounds, so as you said earlier it's not clear that any change won't break some subtle corner-case - but more fundamentally, the reason is that if the freezing fails, then I think pretty much all bets are off. For example in case of timeout upon freeze, I'm not convinced that the call to `TaskKiller::kill` is safe because by definition some tasks are not frozen hence the kill won't be atomic and it might be possible that because of PID reuse we end up killing the wrong process. Another thing is that if freezing failed, I am not sure how reliable the thawing or if the kernel might just discard it which will also leave the cgroup in a bad state. However I'm open to trying it if you prefer the non-interruption approach. It's so sad that there is no simple and reliable way to do this. AFAICT systemd doesn't use the freezer cgroup because of its various issues - it actually suffers from PID reuse race conditions, looks like they are considering using freezer in cgroupsv2: https://github.com/systemd/systemd/issues/13101 And sadly, while PID namespace mentions in the various comments sounds promising, the problem is that it could definitely break some tasks so it's not like we can just switch to using it. > Btw, how are you testing the code handling freeze/kill failures? > In my experience, the simplest way to reliably create a process in an uninterruptable sleep (D state) is to create a file on some FUSE-backed FS (say, sshfs), mmap that file, stop the program backing the FS with SIGSTOP, and repeatedly read/write the mmap-ed bytes until the process eventually hangs. That's rather complicated, takes 30 seconds of read/write on my kernel and is not really usable in tests... I'm wondering if there is a simpler option. It depends. If the goal is really to create a process in uninterruptible sleep, then yes I'm not aware of any way much simpler - typically I'd use a write to an hard NFS mount, and use e.g. iptables to drop the packets and cause the process to hang. However I don't believe it's quite necessary to test the freezer code. For this change, I just re-ran some cgroups related tests in a loop because it was enough to trigger the issue. For the previous changes I've made to the freezer code - some workarounds for kernel bugs - I'd generally just use `strace --inject` to either introduce faults or delays, e.g.: ``` cf@thinkpad:~$ strace -ttT --inject=read:delay_enter=1s /bin/cat /etc/fstab >/dev/null [...] 20:34:21.932568 openat(AT_FDCWD, "/etc/fstab", O_RDONLY) = 3 <0.000041> [...] 20:34:21.932936 read(3, "# /etc/fstab: static file system"..., 131072) = 788 <1.000289> 20:34:22.933402 write(1, "# /etc/fstab: static file system"..., 788) = 788 <0.000037> 20:34:22.933580 read(3, "", 131072) = 0 <1.000115> 20:34:23.933857 munmap(0x7ff03b46b000, 139264) = 0 <0.000074> [...] ``` This can be used to e.g. cause timeouts while reading/writing to `freezer.state` file, which is enough to exercise some interesting code paths. And I quite like it because it's very generic and can be used for many problems. For something purely freezer-specific, another way I can think of to trigger freezer errors, and which can actually probably be done as part of an automated test, is to use nested cgroups: ``` root@thinkpad:~# # create parent cgroup root@thinkpad:~# mkdir /sys/fs/cgroup/freezer/parent root@thinkpad:~# # create child cgroup root@thinkpad:~# mkdir /sys/fs/cgroup/freezer/parent/child root@thinkpad:~# # freeze parent - since freezing is recursive, the child will be frozen root@thinkpad:~# echo FROZEN > /sys/fs/cgroup/freezer/parent/freezer.state root@thinkpad:~# cat /sys/fs/cgroup/freezer/parent/freezer.state FROZEN root@thinkpad:~# cat /sys/fs/cgroup/freezer/parent/child/freezer.state FROZEN root@thinkpad:~# # attempt to thaw the child root@thinkpad:~# echo THAWED > /sys/fs/cgroup/freezer/parent/child/freezer.state root@thinkpad:~# # not working since parent is still frozen root@thinkpad:~# cat /sys/fs/cgroup/freezer/parent/child/freezer.state FROZEN root@thinkpad:~# cat /sys/fs/cgroup/freezer/parent/freezer.state FROZEN root@thinkpad:~# # thaw parent, and child thaws as well root@thinkpad:~# echo THAWED > /sys/fs/cgroup/freezer/parent/freezer.state root@thinkpad:~# cat /sys/fs/cgroup/freezer/parent/freezer.state THAWED root@thinkpad:~# cat /sys/fs/cgroup/freezer/parent/child/freezer.state THAWED root@thinkpad:~# ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected]
