> On Jan. 8, 2020, 7:07 a.m., Greg Mann wrote:
> > src/slave/containerizer/mesos/isolators/cgroups/subsystems/memory.cpp
> > Lines 199 (patched)
> > <https://reviews.apache.org/r/71944/diff/2/?file=2193218#file2193218line199>
> >
> > Do we really want to do this? My concern is that this will make any
> > non-Mesos-task processes on the node (networking and security components,
> > for example) more likely to be OOM-killed than Mesos tasks. Perhaps we
> > should only set the OOM score adjustment for burstable tasks. What do you
> > think?
>
> Qian Zhang wrote:
>     I think it depends on which one has the higher priority and is more
>     important: guaranteed tasks or non-Mesos-task processes. In the
>     Kubernetes implementation
>     (https://github.com/kubernetes/kubernetes/blob/v1.16.2/pkg/kubelet/qos/policy.go#L51:L53),
>     the OOM score adjustment of a guaranteed container is set to -998, and
>     the kubelet's OOM score adjustment is set to -998 too. I think we
>     should do the same to protect guaranteed containers and the Mesos
>     agent. What do you think?
>
> Greg Mann wrote:
>     One significant difference in the Kubernetes case is that they have
>     user-space code which kills pod processes to reclaim memory when
>     necessary. Consequently, there will be less impact if the OOM killer
>     shows a strong preference against killing guaranteed tasks.
>
>     My intuition is that we should not set the OOM score adjustment for
>     non-bursting processes. Even if we leave it at zero, guaranteed tasks
>     will still be treated preferentially with respect to bursting tasks,
>     since all bursting tasks will have an adjustment greater than zero.
>
> Qian Zhang wrote:
>     I agree that guaranteed tasks will be treated preferentially with
>     respect to bursting tasks, but I am thinking about guaranteed tasks
>     vs. non-Mesos tasks. Say two guaranteed tasks are running on a node,
>     each with a memory request/limit of half the node's memory, and both
>     have almost used all of their memory request/limit, so their OOM
>     scores will be very high (around 490+). Now if a non-Mesos task
>     (e.g., a system component or even the Mesos agent itself) suddenly
>     tries to use a lot of memory, the node will be short of memory, and
>     the OOM killer will definitely kill one of the two guaranteed tasks,
>     since their OOM scores are the top two on the node. I do not think
>     K8s has this issue, since a guaranteed container's OOM score
>     adjustment is -998.
>
> Qian Zhang wrote:
>     And even if we only consider guaranteed tasks vs. burstable tasks, I
>     think it is still a bit risky to leave a guaranteed task's OOM score
>     adjustment at 0. For example, take one guaranteed task (T1) and one
>     burstable task (T2) running on a node, each with a memory request of
>     half the node's memory. T1 has almost used all of its memory
>     request/limit, so its OOM score will be around 490+. T2 uses very
>     little memory, so its OOM score will be a bit beyond 500 (like 510).
>     The OOM scores of T1 and T2 are too close in this case, and since the
>     actual OOM score is calculated in a more complex way, I am afraid
>     there could be a moment when the OOM score of T1 is even higher than
>     T2's. That is why I think it is a bit risky.
>
> Qian Zhang wrote:
>     Just to add one point: it seems a small amount (30) is subtracted
>     from the OOM score of root-owned processes, so in the above example,
>     if T2 is owned by root but T1 is owned by a normal user, it is
>     possible that T2 gets a smaller OOM score than T1.
>
> Greg Mann wrote:
>     In the case of tasks T1 and T2 above, I don't think we need to
>     guarantee which process is killed first. If neither task is above its
>     memory request, then I think it's OK for the OOM killer to decide
>     which one is killed first. The resource limits feature doesn't add a
>     notion of priority, like "guaranteed" vs. "burstable"; I think we
>     just want to make sure that tasks which have exceeded their memory
>     request are killed preferentially. So I think it's OK to leave the
>     OOM score adjustment of non-burstable tasks at zero.
On second thought, I agree with you that we should leave the OOM score adjustment of non-burstable tasks at zero for backward compatibility.

- Qian


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71944/#review219158
-----------------------------------------------------------


On Jan. 15, 2020, 10:20 p.m., Qian Zhang wrote:

>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/71944/
> -----------------------------------------------------------
>
> (Updated Jan. 15, 2020, 10:20 p.m.)
>
>
> Review request for mesos, Andrei Budnik and Greg Mann.
>
>
> Bugs: MESOS-10048
>     https://issues.apache.org/jira/browse/MESOS-10048
>
>
> Repository: mesos
>
>
> Description
> -------
>
> Set container process's OOM score adjust.
>
>
> Diffs
> -----
>
>   src/slave/containerizer/mesos/isolators/cgroups/cgroups.cpp b12b73d8e0161d448075378765e77867521de04e
>   src/slave/containerizer/mesos/isolators/cgroups/subsystem.hpp a311ab4495f71bedacd2e99c84c765f0e5fe99d3
>   src/slave/containerizer/mesos/isolators/cgroups/subsystem.cpp dc6c7aa1c998c30c8b17db04a38e7a1e28a6a6c1
>   src/slave/containerizer/mesos/isolators/cgroups/subsystems/devices.hpp c62deec4b1cd749dba5fe71b901e0353806a0805
>   src/slave/containerizer/mesos/isolators/cgroups/subsystems/devices.cpp ac2e66b570bb84b43c4a3e8f19b40e5fcea71a4a
>   src/slave/containerizer/mesos/isolators/cgroups/subsystems/memory.hpp 27d88e91fb784179effd54781f84000fe85c13eb
>   src/slave/containerizer/mesos/isolators/cgroups/subsystems/memory.cpp 0896d37761a11f55ba4b866d235c3bd2b79dcfba
>   src/slave/containerizer/mesos/isolators/cgroups/subsystems/net_cls.hpp 06531072f445d4ec978ebaf5ec5e4a2475517d05
>   src/slave/containerizer/mesos/isolators/cgroups/subsystems/net_cls.cpp ec2ce67e54387f26aa11c00d4c7f85f0807a127b
>   src/slave/containerizer/mesos/isolators/cgroups/subsystems/perf_event.hpp 2c865aca35084a5db567b5f95c8c57bb6e1d5634
>   src/slave/containerizer/mesos/isolators/cgroups/subsystems/perf_event.cpp 180afc936798c2fa4de0deef080276cf7cc94199
>
>
> Diff: https://reviews.apache.org/r/71944/diff/4/
>
>
> Testing
> -------
>
> sudo make check
>
>
> Thanks,
>
> Qian Zhang
>
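[Editor's note] For reference, the burstable adjustment the thread compares against is computed in Kubernetes' pkg/kubelet/qos/policy.go as 1000 minus the container's share of node memory, clamped so it stays strictly between the non-burstable adjustment (0) and the maximum. A hedged C++ translation of that formula is sketched below; `burstableOomScoreAdj` is an illustrative name and not part of the Mesos patch under review:

```cpp
#include <algorithm>
#include <cstdint>

// Sketch of the Kubernetes burstable formula: the larger a task's memory
// request relative to node capacity, the lower (more protected) its
// oom_score_adj. The result is clamped to [2, 999] so every burstable task
// still scores above non-burstable tasks (adjustment 0) and below the
// maximum possible adjustment.
int burstableOomScoreAdj(uint64_t requestBytes, uint64_t capacityBytes)
{
  int64_t adj =
      1000 - static_cast<int64_t>(1000 * requestBytes / capacityBytes);
  return static_cast<int>(std::min<int64_t>(999, std::max<int64_t>(2, adj)));
}
```

With a request of half the node's memory this yields 500, which is where the "a bit beyond 500 (like 510)" figure for T2 in the discussion comes from once the task's own small memory usage is added on top.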
