Re: [PATCH 0/2] memory_hotplug: introduce config and command line options to set the default onlining policy

2016-04-06 Thread David Rientjes
On Wed, 6 Apr 2016, Andrew Morton wrote: > > This patchset continues the work I started with: > > > > commit 31bc3858ea3ebcc3157b3f5f0e624c5962f5a7a6 > > Author: Vitaly Kuznetsov > > Date: Tue Mar 15 14:56:48 2016 -0700 > > > > memory-hotplug: add automatic onlining

Re: [PATCH 0/2] memory_hotplug: introduce config and command line options to set the default onlining policy

2016-04-20 Thread David Rientjes
On Tue, 19 Apr 2016, Vitaly Kuznetsov wrote: > > I'd personally disagree that we need more and more config options to take > > care of something that an initscript can easily do and most distros > > already have their own initscripts that this can be added to. I don't see > > anything that

Re: [PATCH 0/2] memory_hotplug: introduce config and command line options to set the default onlining policy

2016-04-18 Thread David Rientjes
On Thu, 7 Apr 2016, Vitaly Kuznetsov wrote: > >> > This patchset continues the work I started with: > >> > > >> > commit 31bc3858ea3ebcc3157b3f5f0e624c5962f5a7a6 > >> > Author: Vitaly Kuznetsov > >> > Date: Tue Mar 15 14:56:48 2016 -0700 > >> > > >> > memory-hotplug:

Re: [v5 4/4] mm, oom, docs: describe the cgroup-aware OOM killer

2017-08-15 Thread David Rientjes
On Tue, 15 Aug 2017, Roman Gushchin wrote: > > > diff --git a/Documentation/cgroup-v2.txt b/Documentation/cgroup-v2.txt > > > index dec5afdaa36d..22108f31e09d 100644 > > > --- a/Documentation/cgroup-v2.txt > > > +++ b/Documentation/cgroup-v2.txt > > > @@ -48,6 +48,7 @@ v1 is available under

Re: [v5 2/4] mm, oom: cgroup-aware OOM killer

2017-08-15 Thread David Rientjes
On Tue, 15 Aug 2017, Roman Gushchin wrote: > > I'm curious about the decision made in this conditional and how > > oom_kill_memcg_member() ignores task->signal->oom_score_adj. It means > > that memory.oom_kill_all_tasks overrides /proc/pid/oom_score_adj if it > > should otherwise be disabled.
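The question raised in this snippet, whether a kill-all-tasks setting should override a task that has disabled OOM killing through /proc/pid/oom_score_adj, can be pictured with a toy model. This is only a sketch under stated assumptions: `group_kill_targets` and the `honor_oom_disable` flag are hypothetical names, not kernel interfaces, and the only fact relied on is that an oom_score_adj of -1000 marks a task as OOM-disabled.

```python
from typing import Dict, List

OOM_SCORE_ADJ_MIN = -1000  # oom_score_adj value that disables OOM killing of a task


def group_kill_targets(adj_by_pid: Dict[int, int], honor_oom_disable: bool) -> List[int]:
    """Return the pids a group-wide kill would hit (hypothetical helper).

    With honor_oom_disable=False every member is killed, which is the
    behaviour being questioned; with True, OOM-disabled tasks are spared.
    """
    if honor_oom_disable:
        return sorted(pid for pid, adj in adj_by_pid.items()
                      if adj != OOM_SCORE_ADJ_MIN)
    return sorted(adj_by_pid)


# Hypothetical cgroup members: pid -> oom_score_adj
members = {10: 0, 11: OOM_SCORE_ADJ_MIN, 12: 300}
kill_everything = group_kill_targets(members, honor_oom_disable=False)
spare_disabled = group_kill_targets(members, honor_oom_disable=True)
```

In this model the only difference between the two policies is whether pid 11 survives.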

Re: [v5 1/4] mm, oom: refactor the oom_kill_process() function

2017-08-14 Thread David Rientjes
el.org> > Cc: Vladimir Davydov <vdavydov@gmail.com> > Cc: Johannes Weiner <han...@cmpxchg.org> > Cc: Tetsuo Handa <penguin-ker...@i-love.sakura.ne.jp> > Cc: David Rientjes <rient...@google.com> > Cc: Tejun Heo <t...@kernel.org> > Cc: kernel-t...@fb.c

Re: [v5 3/4] mm, oom: introduce oom_priority for memory cgroups

2017-08-14 Thread David Rientjes
...@kernel.org> > Cc: Vladimir Davydov <vdavydov@gmail.com> > Cc: Johannes Weiner <han...@cmpxchg.org> > Cc: David Rientjes <rient...@google.com> > Cc: Tejun Heo <t...@kernel.org> > Cc: Tetsuo Handa <penguin-ker...@i-love.sakura.ne.jp> > C

Re: [v5 2/4] mm, oom: cgroup-aware OOM killer

2017-08-14 Thread David Rientjes
On Mon, 14 Aug 2017, Roman Gushchin wrote: > diff --git a/include/linux/oom.h b/include/linux/oom.h > index 8a266e2be5a6..b7ec3bd441be 100644 > --- a/include/linux/oom.h > +++ b/include/linux/oom.h > @@ -39,6 +39,7 @@ struct oom_control { > unsigned long totalpages; > struct

Re: [v5 4/4] mm, oom, docs: describe the cgroup-aware OOM killer

2017-08-14 Thread David Rientjes
On Mon, 14 Aug 2017, Roman Gushchin wrote: > diff --git a/Documentation/cgroup-v2.txt b/Documentation/cgroup-v2.txt > index dec5afdaa36d..22108f31e09d 100644 > --- a/Documentation/cgroup-v2.txt > +++ b/Documentation/cgroup-v2.txt > @@ -48,6 +48,7 @@ v1 is available under Documentation/cgroup-v1/.

Re: [v3 2/6] mm, oom: cgroup-aware OOM killer

2017-07-11 Thread David Rientjes
On Tue, 11 Jul 2017, Roman Gushchin wrote: > > Yes, the original motivation was to limit killing to a single process, if > > possible. To do that, we kill the process with the largest rss to free > > the most memory and rely on the user to configure /proc/pid/oom_score_adj > > if something

Re: [v3 2/6] mm, oom: cgroup-aware OOM killer

2017-07-10 Thread David Rientjes
On Wed, 21 Jun 2017, Roman Gushchin wrote: > Traditionally, the OOM killer is operating on a process level. > Under oom conditions, it finds a process with the highest oom score > and kills it. > > This behavior doesn't suit well the system with many running > containers. There are two main
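The per-process behaviour described in this snippet (find the task with the highest OOM score and kill it) can be illustrated with a toy model. This is a minimal sketch, not kernel code: the `Task` type, the page counts, and the scoring formula (resident pages plus an oom_score_adj share of total pages, with -1000 meaning exempt) are simplifying assumptions based loosely on the documented /proc/pid/oom_score_adj semantics.

```python
from dataclasses import dataclass
from typing import List, Optional

OOM_SCORE_ADJ_MIN = -1000  # oom_score_adj value that disables OOM killing of a task


@dataclass
class Task:
    pid: int
    rss_pages: int
    oom_score_adj: int = 0


def badness(task: Task, total_pages: int) -> Optional[int]:
    """Toy score: usage plus an adjustment scaled to total memory.

    Returns None for tasks that are exempt from OOM killing.
    """
    if task.oom_score_adj == OOM_SCORE_ADJ_MIN:
        return None
    return task.rss_pages + task.oom_score_adj * total_pages // 1000


def select_victim(tasks: List[Task], total_pages: int) -> Optional[Task]:
    """Pick the eligible task with the highest score, or None if all are exempt."""
    scored = [(badness(t, total_pages), t) for t in tasks]
    eligible = [(score, t) for score, t in scored if score is not None]
    if not eligible:
        return None
    return max(eligible, key=lambda pair: pair[0])[1]


# Hypothetical workload: the largest task is exempt, and a smaller task
# has been biased upward by its oom_score_adj.
tasks = [
    Task(pid=100, rss_pages=5000),
    Task(pid=200, rss_pages=9000, oom_score_adj=OOM_SCORE_ADJ_MIN),
    Task(pid=300, rss_pages=4000, oom_score_adj=500),
]
victim = select_victim(tasks, total_pages=10000)
```

Here the adjustment, not raw size, decides the outcome: pid 300 scores 4000 + 5000 and is chosen over the larger pid 100, while pid 200 is never considered.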

Re: [v3 2/6] mm, oom: cgroup-aware OOM killer

2017-07-12 Thread David Rientjes
On Wed, 12 Jul 2017, Roman Gushchin wrote: > > It's a no-op if nobody sets up priorities or the system-wide sysctl is > > disabled. Presumably, as in our model, the Activity Manager sets the > > sysctl and is responsible for configuring the priorities if present. All > > memcgs at the

Re: [v4 2/4] mm, oom: cgroup-aware OOM killer

2017-08-08 Thread David Rientjes
On Tue, 1 Aug 2017, Roman Gushchin wrote: > > To the rest of the patch. I have to say I do not quite like how it is > > implemented. I was hoping for something much simpler which would hook > > into oom_evaluate_task. If a task belongs to a memcg with kill-all flag > > then we would update the

Re: [v4 4/4] mm, oom, docs: describe the cgroup-aware OOM killer

2017-08-08 Thread David Rientjes
On Wed, 26 Jul 2017, Roman Gushchin wrote: > +Cgroup-aware OOM Killer > +~~~ > + > +Cgroup v2 memory controller implements a cgroup-aware OOM killer. > +It means that it treats memory cgroups as first class OOM entities. > + > +Under OOM conditions the memory controller tries

Re: [v4 3/4] mm, oom: introduce oom_priority for memory cgroups

2017-08-08 Thread David Rientjes
lar to priority based oom killing that we have done. I think this kind of support is long overdue in the oom killer. Comment inline. > Signed-off-by: Roman Gushchin <g...@fb.com> > Cc: Michal Hocko <mho...@kernel.org> > Cc: Vladimir Davydov <vdavydov@gmail.com> > Cc: J

Re: [v5 4/4] mm, oom, docs: describe the cgroup-aware OOM killer

2017-08-20 Thread David Rientjes
On Thu, 17 Aug 2017, Roman Gushchin wrote: > Hi David! > > Please, find an updated version of docs patch below. > Looks much better, thanks! I think the only pending issue is discussing the relationship of memory.oom_kill_all_tasks with /proc/pid/oom_score_adj == OOM_SCORE_ADJ_MIN.

Re: [v5 2/4] mm, oom: cgroup-aware OOM killer

2017-08-20 Thread David Rientjes
On Wed, 16 Aug 2017, Roman Gushchin wrote: > It's natural to expect that inside a container there are their own sshd, > "activity manager" or some other stuff, which can play with oom_score_adj. > If it can override the upper cgroup-level settings, the whole delegation model > is broken. > I

Re: [RFC PATCH v2 1/7] mm, oom: refactor select_bad_process() to take memcg as an argument

2017-06-04 Thread David Rientjes
We use a heavily modified system and memcg oom killer and I'm wondering if there is some opportunity for collaboration because we may have some shared goals. I can summarize how we currently use the oom killer at a high level so that it is not overwhelming with implementation details and give

Re: [RFC PATCH v2 1/7] mm, oom: refactor select_bad_process() to take memcg as an argument

2017-06-06 Thread David Rientjes
On Tue, 6 Jun 2017, Roman Gushchin wrote: > Hi David! > > Thank you for sharing this! > > It's very interesting, and it looks like, > it's not that far from what I've suggested. > > So we definitely need to come up with some common solution. > Hi Roman, Yes, definitely. I could post a

Re: [v8 0/4] cgroup-aware OOM killer

2017-09-15 Thread David Rientjes
On Fri, 15 Sep 2017, Roman Gushchin wrote: > > But then you just enforce a structural restriction on your configuration > > because > > root > > / \ > >AD > > /\ > > B C > > > > is a different thing than > > root > > / | \ > >B C D >

Re: [v8 0/4] cgroup-aware OOM killer

2017-09-19 Thread David Rientjes
On Fri, 15 Sep 2017, Roman Gushchin wrote: > > > > But then you just enforce a structural restriction on your configuration > > > > because > > > > root > > > > / \ > > > >AD > > > > /\ > > > > B C > > > > > > > > is a different thing than > > > >

Re: [v8 0/4] cgroup-aware OOM killer

2017-09-21 Thread David Rientjes
On Thu, 21 Sep 2017, Johannes Weiner wrote: > That's a ridiculous nak. > > The fact that this patch series doesn't solve your particular problem > is not a technical argument to *reject* somebody else's work to solve > a different problem. It's not a regression when behavior is completely >

Re: [v8 0/4] cgroup-aware OOM killer

2017-09-22 Thread David Rientjes
On Fri, 22 Sep 2017, Tejun Heo wrote: > > It doesn't have anything to do with my particular usecase, but rather the > > ability of userspace to influence the decisions of the kernel. Previous > > to this patchset, when selection is done based on process size, userspace > > has full control

Re: [v8 0/4] cgroup-aware OOM killer

2017-09-22 Thread David Rientjes
On Thu, 21 Sep 2017, Johannes Weiner wrote: > > The issue is that if you opt-in to the new feature, then you are forced to > > change /proc/pid/oom_score_adj of all processes attached to a cgroup that > > you do not want oom killed based on size to be oom disabled. > > You're assuming that

Re: [v8 0/4] cgroup-aware OOM killer

2017-09-23 Thread David Rientjes
On Fri, 22 Sep 2017, Tejun Heo wrote: > > If you have this low priority maintenance job charging memory to the high > > priority hierarchy, you're already misconfigured unless you adjust > > /proc/pid/oom_score_adj because it will oom kill any larger process than > > itself in today's kernels

Re: [v8 0/4] cgroup-aware OOM killer

2017-09-14 Thread David Rientjes
On Thu, 14 Sep 2017, Michal Hocko wrote: > > It is certainly possible to add oom priorities on top before it is merged, > > but I don't see why it isn't part of the patchset. > > Because the semantic of the priority for non-leaf memcgs is not fully > clear and I would rather have the core of

Re: [v8 2/4] mm, oom: cgroup-aware OOM killer

2017-09-13 Thread David Rientjes
On Mon, 11 Sep 2017, Roman Gushchin wrote: > diff --git a/mm/memcontrol.c b/mm/memcontrol.c > index 15af3da5af02..da2b12ea4667 100644 > --- a/mm/memcontrol.c > +++ b/mm/memcontrol.c > @@ -2661,6 +2661,231 @@ static inline bool memcg_has_children(struct > mem_cgroup *memcg) > return ret; >

Re: [v8 0/4] cgroup-aware OOM killer

2017-09-13 Thread David Rientjes
On Wed, 13 Sep 2017, Michal Hocko wrote: > > > This patchset makes the OOM killer cgroup-aware. > > > > > > v8: > > > - Do not kill tasks with OOM_SCORE_ADJ -1000 > > > - Make the whole thing opt-in with cgroup mount option control > > > - Drop oom_priority for further discussions > > > >

Re: [v8 0/4] cgroup-aware OOM killer

2017-09-21 Thread David Rientjes
On Wed, 20 Sep 2017, Roman Gushchin wrote: > > It's actually much more complex because in our environment we'd need an > > "activity manager" with CAP_SYS_RESOURCE to control oom priorities of user > > subcontainers when today it need only be concerned with top-level memory > > cgroups. Users

Re: [v8 0/4] cgroup-aware OOM killer

2017-09-21 Thread David Rientjes
On Mon, 18 Sep 2017, Roman Gushchin wrote: > > As said in other email. We can make priorities hierarchical (in the same > > sense as hard limit or others) so that children cannot override their > > parent. > > You mean they can set the knob to any value, but parent's value is enforced, > if it's

Re: [v10 3/6] mm, oom: cgroup-aware OOM killer

2017-10-05 Thread David Rientjes
On Wed, 4 Oct 2017, Johannes Weiner wrote: > > By only considering leaf memcgs, does this penalize users if their memcg > > becomes oc->chosen_memcg purely because it has aggregated all of its > > processes to be members of that memcg, which would otherwise be the > > standard behavior? > > >

Re: [v11 3/6] mm, oom: cgroup-aware OOM killer

2017-10-10 Thread David Rientjes
problem is only a result of the implementation detail of this patchset. For these reasons: unfair comparison of root mem cgroup usage to bias against that mem cgroup from oom kill in system oom conditions, the ability of users to completely evade the oom killer by attaching all processes to chi

Re: [v11 3/6] mm, oom: cgroup-aware OOM killer

2017-10-13 Thread David Rientjes
On Fri, 13 Oct 2017, Roman Gushchin wrote: > > Think about it in a different way: we currently compare per-process usage > > and userspace has /proc/pid/oom_score_adj to adjust that usage depending > > on priorities of that process and still oom kill if there's a memory leak. > > Your

Re: [v11 3/6] mm, oom: cgroup-aware OOM killer

2017-10-12 Thread David Rientjes
On Wed, 11 Oct 2017, Roman Gushchin wrote: > > But let's move the discussion forward to fix it. To avoid necessarily > > accounting memory to the root mem cgroup, have we considered if it is even > > necessary to address the root mem cgroup? For the users who opt-in to > > this heuristic,

Re: [v7 5/5] mm, oom: cgroup v2 mount option to disable cgroup-aware OOM killer

2017-09-09 Thread David Rientjes
On Fri, 8 Sep 2017, Christopher Lameter wrote: > Ok. Certainly there were scalability issues (lots of them) and the sysctl > may have helped there if set globally. But the ability to kill the > allocating tasks was primarily used in cpusets for constrained allocation. > I remember discussing it

Re: [v6 2/4] mm, oom: cgroup-aware OOM killer

2017-08-30 Thread David Rientjes
On Wed, 30 Aug 2017, Roman Gushchin wrote: > I've spent some time to implement such a version. > > It really became shorter and more existing code was reused, > however I've met a couple of serious issues: > > 1) Simple summing of per-task oom_score doesn't make sense. >First, we calculate

Re: [v7 5/5] mm, oom: cgroup v2 mount option to disable cgroup-aware OOM killer

2017-09-06 Thread David Rientjes
is has to do with the overall patchset though :) > To make a first step towards deprecation, let's warn potential > users about deprecation plans. > > Signed-off-by: Roman Gushchin <g...@fb.com> > Cc: Andrew Morton <a...@linux-foundation.org> > Cc: Michal Hocko <mho...@su

Re: [v7 5/5] mm, oom: cgroup v2 mount option to disable cgroup-aware OOM killer

2017-09-07 Thread David Rientjes
On Thu, 7 Sep 2017, Christopher Lameter wrote: > > SGI required it when it was introduced simply to avoid the very expensive > > tasklist scan. Adding Christoph Lameter to the cc since he was involved > > back then. > > Really? From what I know and worked on way back when: The reason was to be

Re: [v8 3/4] mm, oom: add cgroup v2 mount option for cgroup-aware OOM killer

2017-09-12 Thread David Rientjes
On Tue, 12 Sep 2017, Roman Gushchin wrote: > > I can't imagine that Tejun would be happy with a new mount option, > > especially when it's not required. > > > > OOM behavior does not need to be defined at mount time and for the entire > > hierarchy. It's possible to very easily implement a

Re: [v8 3/4] mm, oom: add cgroup v2 mount option for cgroup-aware OOM killer

2017-09-11 Thread David Rientjes
On Mon, 11 Sep 2017, Roman Gushchin wrote: > Add a "groupoom" cgroup v2 mount option to enable the cgroup-aware > OOM killer. If not set, the OOM selection is performed in > a "traditional" per-process way. > > The behavior can be changed dynamically by remounting the cgroupfs. I can't imagine

Re: [v8 0/4] cgroup-aware OOM killer

2017-09-11 Thread David Rientjes
On Mon, 11 Sep 2017, Roman Gushchin wrote: > This patchset makes the OOM killer cgroup-aware. > > v8: > - Do not kill tasks with OOM_SCORE_ADJ -1000 > - Make the whole thing opt-in with cgroup mount option control > - Drop oom_priority for further discussions Nack, we specifically require

Re: [v8 1/4] mm, oom: refactor the oom_kill_process() function

2017-09-11 Thread David Rientjes
el.org> > Cc: Vladimir Davydov <vdavydov@gmail.com> > Cc: Johannes Weiner <han...@cmpxchg.org> > Cc: Tetsuo Handa <penguin-ker...@i-love.sakura.ne.jp> > Cc: David Rientjes <rient...@google.com> > Cc: Andrew Morton <a...@linux-foundation.org> > Cc: Tej

Re: [v8 0/4] cgroup-aware OOM killer

2017-09-25 Thread David Rientjes
On Mon, 25 Sep 2017, Johannes Weiner wrote: > > True but we want to have the semantic reasonably understandable. And it > > is quite hard to explain that the oom killer hasn't selected the largest > > memcg just because it happened to be in a deeper hierarchy which has > > been configured to

Re: [v8 0/4] cgroup-aware OOM killer

2017-09-26 Thread David Rientjes
On Tue, 26 Sep 2017, Michal Hocko wrote: > > No, I agree that we shouldn't compare sibling memory cgroups based on > > different criteria depending on whether group_oom is set or not. > > > > I think it would be better to compare siblings based on the same criteria > > independent of group_oom

Re: [v6 3/4] mm, oom: introduce oom_priority for memory cgroups

2017-08-28 Thread David Rientjes
On Thu, 24 Aug 2017, Roman Gushchin wrote: > > > Do you have an example, which can't be effectively handled by an approach > > > I'm suggesting? > > > > No, I do not have any which would be _explicitly_ requested but I do > > envision new requirements will emerge. The most probable one would be

Re: [v6 2/4] mm, oom: cgroup-aware OOM killer

2017-08-23 Thread David Rientjes
On Wed, 23 Aug 2017, Roman Gushchin wrote: > Traditionally, the OOM killer is operating on a process level. > Under oom conditions, it finds a process with the highest oom score > and kills it. > > This behavior doesn't suit well the system with many running > containers: > > 1) There is no

Re: [v5 2/4] mm, oom: cgroup-aware OOM killer

2017-08-23 Thread David Rientjes
On Wed, 23 Aug 2017, Roman Gushchin wrote: > > It's better to have newbies consult the documentation once than making > > everybody deal with long and cumbersome names for the rest of time. > > > > Like 'ls' being better than 'read_and_print_directory_contents'. > > I don't think it's a good

Re: [v6 2/4] mm, oom: cgroup-aware OOM killer

2017-08-31 Thread David Rientjes
On Thu, 31 Aug 2017, Roman Gushchin wrote: > So, it looks to me that we're close to an acceptable version, > and the only remaining question is the default behavior > (when oom_group is not set). > Nit: without knowledge of the implementation, I still don't think I would know what an "out of

Re: [v11 2/6] mm: implement mem_cgroup_scan_tasks() for the root memory cgroup

2017-10-09 Thread David Rientjes
will use this function to iterate over tasks belonging > to the root memcg. > > Signed-off-by: Roman Gushchin <g...@fb.com> Acked-by: David Rientjes <rient...@google.com>

Re: [v11 3/6] mm, oom: cgroup-aware OOM killer

2017-10-09 Thread David Rientjes
On Thu, 5 Oct 2017, Roman Gushchin wrote: > Traditionally, the OOM killer is operating on a process level. > Under oom conditions, it finds a process with the highest oom score > and kills it. > > This behavior doesn't suit well the system with many running > containers: > > 1) There is no

Re: [v10 2/6] mm: implement mem_cgroup_scan_tasks() for the root memory cgroup

2017-10-04 Thread David Rientjes
On Wed, 4 Oct 2017, Roman Gushchin wrote: > diff --git a/mm/memcontrol.c b/mm/memcontrol.c > index d5f3a62887cf..b4de17a78dc1 100644 > --- a/mm/memcontrol.c > +++ b/mm/memcontrol.c > @@ -917,7 +917,8 @@ static void invalidate_reclaim_iterators(struct > mem_cgroup *dead_memcg) > * value, the

Re: [v10 3/6] mm, oom: cgroup-aware OOM killer

2017-10-04 Thread David Rientjes
On Wed, 4 Oct 2017, Roman Gushchin wrote: > > > @@ -828,6 +828,12 @@ static void __oom_kill_process(struct task_struct > > > *victim) > > > struct mm_struct *mm; > > > bool can_oom_reap = true; > > > > > > + if (is_global_init(victim) || (victim->flags & PF_KTHREAD) || > > > +

Re: [v10 3/6] mm, oom: cgroup-aware OOM killer

2017-10-04 Thread David Rientjes
On Wed, 4 Oct 2017, Roman Gushchin wrote: > diff --git a/mm/memcontrol.c b/mm/memcontrol.c > index b4de17a78dc1..79f30c281185 100644 > --- a/mm/memcontrol.c > +++ b/mm/memcontrol.c > @@ -2670,6 +2670,178 @@ static inline bool memcg_has_children(struct > mem_cgroup *memcg) > return ret; >

Re: [v10 3/6] mm, oom: cgroup-aware OOM killer

2017-10-05 Thread David Rientjes
On Thu, 5 Oct 2017, Johannes Weiner wrote: > > It is, because it can quite clearly be a DoS and was prevented with > > Roman's earlier design of iterating usage up the hierarchy and comparing > > siblings based on that criteria. I know exactly why he chose that > > implementation detail early

Re: [v10 3/6] mm, oom: cgroup-aware OOM killer

2017-10-05 Thread David Rientjes
On Thu, 5 Oct 2017, Roman Gushchin wrote: > > This patchset exists because overcommit is real, exactly the same as > > overcommit within memcg hierarchies is real. 99% of the time we don't run > > into global oom because people aren't using their limits so it just works > > out. 1% of the

Re: [v11 3/6] mm, oom: cgroup-aware OOM killer

2017-10-11 Thread David Rientjes
ocesses to child cgroups either purposefully or unpurposefully, and the > > inability of userspace to effectively control oom victim selection: > > > > Nacked-by: David Rientjes <rient...@google.com> > > I consider this NACK rather dubious. Evading the heuristic

Re: [v11 3/6] mm, oom: cgroup-aware OOM killer

2017-10-11 Thread David Rientjes
aching all > > processes to child cgroups either purposefully or unpurposefully, and the > > inability of userspace to effectively control oom victim selection: > > > > Nacked-by: David Rientjes <rient...@google.com> > > So, if we'll sum the oom_score of task

Re: [RESEND v12 0/6] cgroup-aware OOM killer

2017-11-01 Thread David Rientjes
On Wed, 1 Nov 2017, Michal Hocko wrote: > > memory.oom_score_adj would never need to be permanently tuned, just as > > /proc/pid/oom_score_adj need never be permanently tuned. My response was > > an answer to Roman's concern that "v8 has it's own limitations," but I > > haven't seen a

Re: [RESEND v12 0/6] cgroup-aware OOM killer

2017-10-25 Thread David Rientjes
On Mon, 23 Oct 2017, Michal Hocko wrote: > On Sun 22-10-17 17:24:51, David Rientjes wrote: > > On Thu, 19 Oct 2017, Johannes Weiner wrote: > > > > > David would have really liked for this patchset to include knobs to > > > influence how the algorithm pic

Re: [RESEND v12 0/6] cgroup-aware OOM killer

2017-10-30 Thread David Rientjes
On Fri, 27 Oct 2017, Roman Gushchin wrote: > The thing is that the hierarchical approach (as in v8), which are you pushing, > has it's own limitations, which we've discussed in details earlier. There are > reasons why v12 is different, and we can't really simple go back. I mean if > there are

Re: [RESEND v12 0/6] cgroup-aware OOM killer

2017-10-31 Thread David Rientjes
On Tue, 31 Oct 2017, Michal Hocko wrote: > > I'm not ignoring them, I have stated that we need the ability to protect > > important cgroups on the system without oom disabling all attached > > processes. If that is implemented as a memory.oom_score_adj with the same > > semantics as

Re: [RESEND v12 0/6] cgroup-aware OOM killer

2017-10-26 Thread David Rientjes
On Thu, 26 Oct 2017, Johannes Weiner wrote: > > The nack is for three reasons: > > > > (1) unfair comparison of root mem cgroup usage to bias against that mem > > cgroup from oom kill in system oom conditions, > > > > (2) the ability of users to completely evade the oom killer by

Re: [RESEND v12 0/6] cgroup-aware OOM killer

2017-10-22 Thread David Rientjes
On Thu, 19 Oct 2017, Johannes Weiner wrote: > David would have really liked for this patchset to include knobs to > influence how the algorithm picks cgroup victims. The rest of us > agreed that this is beyond the scope of these patches, that the > patches don't need it to be useful, and that

Re: [PATCH v13 0/7] cgroup-aware OOM killer

2018-01-09 Thread David Rientjes
On Thu, 30 Nov 2017, Andrew Morton wrote: > > This patchset makes the OOM killer cgroup-aware. > > Thanks, I'll grab these. > > There has been controversy over this patchset, to say the least. I > can't say that I followed it closely! Could those who still have > reservations please summarise

Re: [PATCH v13 0/7] cgroup-aware OOM killer

2018-01-10 Thread David Rientjes
On Wed, 10 Jan 2018, Roman Gushchin wrote: > > 1. The unfair comparison of the root mem cgroup vs leaf mem cgroups > > > > The patchset uses two different heuristics to compare root and leaf mem > > cgroups and scores them based on number of pages. For the root mem > > cgroup, it totals the

Re: [PATCH v13 0/7] cgroup-aware OOM killer

2018-01-16 Thread David Rientjes
On Mon, 15 Jan 2018, Johannes Weiner wrote: > > It's quite trivial to allow the root mem cgroup to be compared exactly the > > same as another cgroup. Please see > > https://marc.info/?l=linux-kernel&m=151579459920305. > > This only says "that will be fixed" and doesn't address why I care. >

Re: [PATCH v13 0/7] cgroup-aware OOM killer

2018-01-16 Thread David Rientjes
On Mon, 15 Jan 2018, Michal Hocko wrote: > > No, this isn't how kernel features get introduced. We don't design a new > > kernel feature with its own API for a highly specialized usecase and then > > claim we'll fix the problems later. Users will work around the > > constraints of the new

Re: [patch -mm 3/4] mm, memcg: replace memory.oom_group with policy tunable

2018-01-19 Thread David Rientjes
On Wed, 17 Jan 2018, David Rientjes wrote: > Yes, this is a valid point. The policy of "tree" and "all" are identical > policies and then the mechanism differs wrt to whether one process is > killed or all eligible processes are killed, respectively. My motiva

Re: [patch -mm 3/4] mm, memcg: replace memory.oom_group with policy tunable

2018-01-24 Thread David Rientjes
On Wed, 24 Jan 2018, Michal Hocko wrote: > > The current implementation of memory.oom_group is based on top of a > > selection implementation that is broken in three ways I have listed for > > months: > > This doesn't lead to anywhere. You are not presenting any new arguments > and you are

[patch -mm 3/4] mm, memcg: replace memory.oom_group with policy tunable

2018-01-16 Thread David Rientjes
w by writing "cgroup" to the root mem cgroup's memory.oom_policy). The "all" oom policy cannot be enabled on the root mem cgroup. Signed-off-by: David Rientjes <rient...@google.com> --- Documentation/cgroup-v2.txt | 51 ++--- includ

[patch -mm 0/4] mm, memcg: introduce oom policies

2018-01-16 Thread David Rientjes
There are three significant concerns about the cgroup aware oom killer as it is implemented in -mm: (1) allows users to evade the oom killer by creating subcontainers or using other controllers since scoring is done per cgroup and not hierarchically, (2) does not allow the user to

[patch -mm 4/4] mm, memcg: add hierarchical usage oom policy

2018-01-16 Thread David Rientjes
his allows administrators, for example, to require users in their own top-level mem cgroup subtree to be accounted for with hierarchical usage. In other words, they can no longer evade the oom killer by using other controllers or subcontainers. Signed-off-by: David Rientjes <rient...@google.com> --
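The evasion concern behind this patch can be shown with a toy comparison of the two selection strategies. This is a sketch under stated assumptions rather than the patch's implementation: the `Cgroup` class, the page counts, and both selection functions are hypothetical, and ties are broken by whichever child comes first.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Cgroup:
    name: str
    own_pages: int = 0
    children: List["Cgroup"] = field(default_factory=list)

    def hierarchical_usage(self) -> int:
        """Usage of this cgroup plus everything below it."""
        return self.own_pages + sum(c.hierarchical_usage() for c in self.children)

    def leaves(self) -> List["Cgroup"]:
        if not self.children:
            return [self]
        return [leaf for c in self.children for leaf in c.leaves()]


def pick_by_leaf(root: Cgroup) -> Cgroup:
    """Compare leaf cgroups directly, ignoring how they are nested."""
    return max(root.leaves(), key=lambda c: c.own_pages)


def pick_hierarchical(root: Cgroup) -> Cgroup:
    """At each level descend into the child with the largest subtree usage."""
    node = root
    while node.children:
        node = max(node.children, key=lambda c: c.hierarchical_usage())
    return node


# User A keeps 100 pages in one cgroup; user B spreads 160 pages over
# four subcontainers of 40 pages each.
user_a = Cgroup("A", own_pages=100)
user_b = Cgroup("B", children=[Cgroup(f"B/{i}", own_pages=40) for i in range(1, 5)])
root = Cgroup("root", children=[user_a, user_b])

leaf_choice = pick_by_leaf(root)
hier_choice = pick_hierarchical(root)
```

With leaf comparison user A is selected even though user B's subtree uses more memory; with hierarchical usage the selection descends into B and picks one of its subcontainers.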

[patch -mm 1/4] mm, memcg: introduce per-memcg oom policy tunable

2018-01-16 Thread David Rientjes
selection should be done per process. Signed-off-by: David Rientjes <rient...@google.com> --- Documentation/cgroup-v2.txt | 9 + include/linux/memcontrol.h | 11 +++ mm/memcontrol.c | 35 +++ 3 files changed, 55 insertion

[patch -mm 2/4] mm, memcg: replace cgroup aware oom killer mount option with tunable

2018-01-16 Thread David Rientjes
ffers from the traditional per process selection, and (2) a remount to change. Instead of enabling the cgroup aware oom killer with the "groupoom" mount option, set the mem cgroup subtree's memory.oom_policy to "cgroup". Signed-off-by: David Rientjes <rient...@google.com> -

Re: [patch -mm 3/4] mm, memcg: replace memory.oom_group with policy tunable

2018-01-17 Thread David Rientjes
On Wed, 17 Jan 2018, Michal Hocko wrote: > Absolutely agreed! And moreover, there are not all that many ways what > to do as an action. You just kill a logical entity - be it a process or > a logical group of processes. But you have way too many policies how > to select that entity. Do you want

Re: [patch -mm 3/4] mm, memcg: replace memory.oom_group with policy tunable

2018-01-17 Thread David Rientjes
On Wed, 17 Jan 2018, Tejun Heo wrote: > Hello, David. > Hi Tejun! > > The behavior of killing an entire indivisible memory consumer, enabled > > by memory.oom_group, is an oom policy itself. It specifies that all > > I thought we discussed this before but maybe I'm misremembering. > There

Re: [patch -mm 0/4] mm, memcg: introduce oom policies

2018-01-17 Thread David Rientjes
On Wed, 17 Jan 2018, Roman Gushchin wrote: > You're introducing a new oom_policy knob, which has two separate sets > of possible values for the root and non-root cgroups. I don't think > it aligns with the existing cgroup v2 design. > The root mem cgroup can use "none" or "cgroup" to either

Re: [PATCH v13 0/7] cgroup-aware OOM killer

2018-01-14 Thread David Rientjes
On Sat, 13 Jan 2018, Johannes Weiner wrote: > You don't have any control and no accounting of the stuff situated > inside the root cgroup, so it doesn't make sense to leave anything in > there while also using sophisticated containerization mechanisms like > this group oom setting. > > In fact,

Re: [patch -mm 3/4] mm, memcg: replace memory.oom_group with policy tunable

2018-01-25 Thread David Rientjes
On Thu, 25 Jan 2018, Michal Hocko wrote: > > As a result, this would remove patch 3/4 from the series. Do you have any > > other feedback regarding the remainder of this patch series before I > > rebase it? > > Yes, and I have provided it already. What you are proposing is > incomplete at

[patch -mm v2 1/3] mm, memcg: introduce per-memcg oom policy tunable

2018-01-25 Thread David Rientjes
selection should be done per process. Signed-off-by: David Rientjes <rient...@google.com> --- Documentation/cgroup-v2.txt | 9 + include/linux/memcontrol.h | 11 +++ mm/memcontrol.c | 35 +++ 3 files changed, 55 insertion

[patch -mm v2 3/3] mm, memcg: add hierarchical usage oom policy

2018-01-25 Thread David Rientjes
his allows administrators, for example, to require users in their own top-level mem cgroup subtree to be accounted for with hierarchical usage. In other words, they can no longer evade the oom killer by using other controllers or subcontainers. Signed-off-by: David Rientjes <rient...@google.com> --

[patch -mm v2 0/3] mm, memcg: introduce oom policies

2018-01-25 Thread David Rientjes
There are three significant concerns about the cgroup aware oom killer as it is implemented in -mm: (1) allows users to evade the oom killer by creating subcontainers or using other controllers since scoring is done per cgroup and not hierarchically, (2) does not allow the user to

Re: [patch -mm 3/4] mm, memcg: replace memory.oom_group with policy tunable

2018-01-26 Thread David Rientjes
On Fri, 26 Jan 2018, Michal Hocko wrote: > > Could you elaborate on why specifying the oom policy for the entire > > hierarchy as part of the root mem cgroup and also for individual subtrees > > is incomplete? It allows admins to specify and delegate policy decisions > > to subtrees owners as

Re: [patch -mm v2 2/3] mm, memcg: replace cgroup aware oom killer mount option with tunable

2018-01-26 Thread David Rientjes
On Thu, 25 Jan 2018, Andrew Morton wrote: > > Now that each mem cgroup on the system has a memory.oom_policy tunable to > > specify oom kill selection behavior, remove the needless "groupoom" mount > > option that requires (1) the entire system to be forced, perhaps > > unnecessarily, perhaps

Re: [patch -mm v2 2/3] mm, memcg: replace cgroup aware oom killer mount option with tunable

2018-01-26 Thread David Rientjes
On Fri, 26 Jan 2018, Andrew Morton wrote: > > -ECONFUSED. We want to have a mount option that has the sole purpose of > > doing echo cgroup > /mnt/cgroup/memory.oom_policy? > > Approximately. Let me put it another way: can we modify your patchset > so that the mount option remains, and

Re: [PATCH v13 0/7] cgroup-aware OOM killer

2018-01-12 Thread David Rientjes
in that. We should not fall into a cgroup v1 mentality which became very difficult to make extensible. Let's make a feature that is generally useful, complete, and empowers the user rather than push them into a corner with a system wide policy with obvious downsides. For these reasons, and the

Re: [PATCH v13 0/7] cgroup-aware OOM killer

2018-01-11 Thread David Rientjes
On Thu, 11 Jan 2018, Michal Hocko wrote: > > > I find this problem quite minor, because I haven't seen any practical > > > problems > > > caused by accounting of the root cgroup memory. > > > If it's a serious problem for you, it can be solved without switching to > > > the > > > hierarchical

Re: [patch -mm v2 1/3] mm, memcg: introduce per-memcg oom policy tunable

2018-01-29 Thread David Rientjes
On Fri, 26 Jan 2018, Michal Hocko wrote: > > The cgroup aware oom killer is needlessly declared for the entire system > > by a mount option. It's unnecessary to force the system into a single > > oom policy: either cgroup aware, or the traditional process aware. > > > > This patch introduces a

Re: [patch -mm v2 1/3] mm, memcg: introduce per-memcg oom policy tunable

2018-01-30 Thread David Rientjes
On Tue, 30 Jan 2018, Michal Hocko wrote: > > > So what is the actual semantic and scope of this policy. Does it apply > > > only down the hierarchy. Also how do you compare cgroups with different > > > policies? Let's say you have > > > root > > > / | \ > > > A B C

Re: [patch -mm v2 1/3] mm, memcg: introduce per-memcg oom policy tunable

2018-02-01 Thread David Rientjes
On Wed, 31 Jan 2018, Michal Hocko wrote: > > > > > root > > > > > / | \ > > > > > A B C > > > > >/ \/ \ > > > > > D E F G > > > > > > > > > > Assume A: cgroup, B: oom_group=1, C: tree, G: oom_group=1 > > > > > > > > > > > > > At each level

[patch 2/2] mm, page_alloc: move mirrored_kernelcore to __meminitdata

2018-02-12 Thread David Rientjes
mirrored_kernelcore can be in __meminitdata, so move it there. At the same time, fixup section specifiers to be after the name of the variable per checkpatch. --- mm/page_alloc.c | 18 +- 1 file changed, 9 insertions(+), 9 deletions(-) diff --git a/mm/page_alloc.c

[patch 1/2] mm, page_alloc: extend kernelcore and movablecore for percent

2018-02-12 Thread David Rientjes
be a '%'. Signed-off-by: David Rientjes <rient...@google.com> --- Documentation/admin-guide/kernel-parameters.txt | 44 - mm/page_alloc.c | 43 +++- 2 files changed, 57 insertions(+), 30 deletions(-) diff --git a/Documentation

Re: [patch 1/2] mm, page_alloc: extend kernelcore and movablecore for percent

2018-02-13 Thread David Rientjes
On Tue, 13 Feb 2018, Andrew Morton wrote: > > Both kernelcore= and movablecore= can be used to define the amount of > > ZONE_NORMAL and ZONE_MOVABLE on a system, respectively. This requires > > the system memory capacity to be known when specifying the command line, > > however. > > > > This
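
The motivation quoted above (an absolute kernelcore= or movablecore= value requires knowing the machine's memory capacity, while a percentage does not) can be illustrated with a small userspace sketch. This is not the code from the patch: `struct core_request`, `parse_core_value()` and `core_pages()` are hypothetical names, and the sketch assumes the value is either a plain page count or a whole number followed by '%'. The real parameters also accept size suffixes, which are left out here.

```c
#include <stdbool.h>
#include <stdlib.h>

/*
 * Hypothetical userspace sketch, not the kernel's parser: interpret a
 * kernelcore=/movablecore= style value as either an absolute page count
 * or, when the number is followed by '%', a share of total memory.
 */
struct core_request {
	bool is_percent;
	unsigned long long value;	/* pages, or a percentage if is_percent */
};

/* Returns 0 on success, -1 if s is neither "<number>" nor "<number>%". */
static int parse_core_value(const char *s, struct core_request *req)
{
	char *end;
	unsigned long long v;

	/* strtoull() would accept a sign or leading spaces; reject them. */
	if (s[0] < '0' || s[0] > '9')
		return -1;

	v = strtoull(s, &end, 10);
	if (end[0] == '%' && end[1] == '\0') {
		if (v > 100)
			return -1;	/* more than all of memory makes no sense */
		req->is_percent = true;
	} else if (end[0] == '\0') {
		req->is_percent = false;
	} else {
		return -1;
	}

	req->value = v;
	return 0;
}

/* Resolve the request against the number of pages actually present. */
static unsigned long long core_pages(const struct core_request *req,
				     unsigned long long total_pages)
{
	if (req->is_percent)
		return total_pages * req->value / 100;
	return req->value;
}
```

With this shape, the same string "25%" resolves to a different page count on every machine, which is the property an absolute value cannot provide.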

Re: [patch 1/2] mm, page_alloc: extend kernelcore and movablecore for percent

2018-02-13 Thread David Rientjes
On Tue, 13 Feb 2018, Mike Kravetz wrote: > > diff --git a/Documentation/admin-guide/kernel-parameters.txt > > b/Documentation/admin-guide/kernel-parameters.txt > > --- a/Documentation/admin-guide/kernel-parameters.txt > > +++ b/Documentation/admin-guide/kernel-parameters.txt > > @@ -1825,30

[patch -mm] mm, page_alloc: extend kernelcore and movablecore for percent fix

2018-02-13 Thread David Rientjes
Specify that movablecore= can use a percent value. Remove comment about hugetlb pages not being movable per Mike. Cc: Mike Kravetz <mike.krav...@oracle.com> Signed-off-by: David Rientjes <rient...@google.com> --- .../admin-guide/kernel-parameters.txt | 22 +--

Re: [patch 1/2] mm, page_alloc: extend kernelcore and movablecore for percent

2018-02-14 Thread David Rientjes
On Wed, 14 Feb 2018, Michal Hocko wrote: > I do not have any objections regarding the extension. What I am more > interested in is _why_ people are still using this command line > parameter at all these days. Why would anybody want to introduce lowmem > issues from 32b days. I can see the

Re: [patch 1/2] mm, page_alloc: extend kernelcore and movablecore for percent

2018-02-15 Thread David Rientjes
On Thu, 15 Feb 2018, Michal Hocko wrote: > > When the amount of kernel > > memory is well bounded for certain systems, it is better to aggressively > > reclaim from existing MIGRATE_UNMOVABLE pageblocks rather than eagerly > > fallback to others. > > > > We have additional patches that help

Re: [patch 1/2] mm, page_alloc: extend kernelcore and movablecore for percent

2018-02-15 Thread David Rientjes
On Thu, 15 Feb 2018, Matthew Wilcox wrote: > What I was proposing was an intermediate page allocator where slab would > request 2MB for its own uses all at once, then allocate pages from that to > individual slabs, so allocating a kmalloc-32 object and a dentry object > would result in 510 pages

Re: [patch -mm 3/4] mm, memcg: replace memory.oom_group with policy tunable

2018-01-23 Thread David Rientjes
On Tue, 23 Jan 2018, Michal Hocko wrote: > > It can't, because the current patchset locks the system into a single > > selection criteria that is unnecessary and the mount option would become a > > no-op after the policy per subtree becomes configurable by the user as > > part of the hierarchy

Re: [PATCH v2 1/2] mm: introduce ARCH_HAS_PTE_SPECIAL

2018-04-10 Thread David Rientjes
> >> arch/s390/Kconfig | 1 + > > > > You forgot to delete __HAVE_ARCH_PTE_SPECIAL from > > arch/riscv/include/asm/pgtable-bits.h > > Damned ! > Thanks for catching it. > Squashing the two patches together at least allowed it to be caught easil
