Re: [v5 2/4] mm, oom: cgroup-aware OOM killer

2017-08-20 David Rientjes
On Wed, 16 Aug 2017, Roman Gushchin wrote: > It's natural to expect that inside a container there are their own sshd, > "activity manager" or some other stuff, which can play with oom_score_adj. > If it can override the upper cgroup-level settings, the whole delegation model > is broken. > I don

Re: [v5 4/4] mm, oom, docs: describe the cgroup-aware OOM killer

2017-08-20 David Rientjes
On Thu, 17 Aug 2017, Roman Gushchin wrote: > Hi David! > > Please, find an updated version of docs patch below. > Looks much better, thanks! I think the only pending issue is discussing the relationship of memory.oom_kill_all_tasks with /proc/pid/oom_score_adj == OOM_SCORE_ADJ_MIN.

[patch -mm] mm, compaction: persistently skip hugetlbfs pageblocks fix

2017-08-20 David Rientjes
when CONFIG_COMPACTION=n. Signed-off-by: David Rientjes --- include/linux/pageblock-flags.h | 11 +++ mm/compaction.c | 8 +++- 2 files changed, 18 insertions(+), 1 deletion(-) diff --git a/include/linux/pageblock-flags.h b/include/linux/pageblock-flags.h --- a/include/

[patch 2/2] mm, compaction: persistently skip hugetlbfs pageblocks

2017-08-15 David Rientjes
simple solution that doesn't involve any additional subsystems in pageblock skip manipulation. Signed-off-by: David Rientjes --- mm/compaction.c | 48 +--- 1 file changed, 37 insertions(+), 11 deletions(-) diff --git a/mm/compaction.c

[patch 1/2] mm, compaction: kcompactd should not ignore pageblock skip

2017-08-15 David Rientjes
so, or that it is beneficial from attempting to isolate memory. Use the pageblock skip hint to avoid rescanning pageblocks needlessly until that information is reset. Signed-off-by: David Rientjes --- mm/compaction.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/mm

Re: [v5 2/4] mm, oom: cgroup-aware OOM killer

2017-08-15 David Rientjes
On Tue, 15 Aug 2017, Roman Gushchin wrote: > > I'm curious about the decision made in this conditional and how > > oom_kill_memcg_member() ignores task->signal->oom_score_adj. It means > > that memory.oom_kill_all_tasks overrides /proc/pid/oom_score_adj if it > > should otherwise be disabled.

Re: [v5 4/4] mm, oom, docs: describe the cgroup-aware OOM killer

2017-08-15 David Rientjes
On Tue, 15 Aug 2017, Roman Gushchin wrote: > > > diff --git a/Documentation/cgroup-v2.txt b/Documentation/cgroup-v2.txt > > > index dec5afdaa36d..22108f31e09d 100644 > > > --- a/Documentation/cgroup-v2.txt > > > +++ b/Documentation/cgroup-v2.txt > > > @@ -48,6 +48,7 @@ v1 is available under Docume

Re: [PATCH 1/2] mm: fix double mmap_sem unlock on MMF_UNSTABLE enforced SIGBUS

2017-08-14 David Rientjes
t simply add VM_FAULT_SIGBUS to the > existing error code because all arch specific page fault handlers and > g-u-p would have to learn a new error code combination. > > Reported-by: Tetsuo Handa > Fixes: 3f70dc38cec2 ("mm: make sure that kthreads will not refault oom reaped > memory") > Cc: stable # 4.9+ > Signed-off-by: Michal Hocko Acked-by: David Rientjes

[patch] mm, oom: remove unused mmput_async

2017-08-14 David Rientjes
After "mm: oom: let oom_reap_task and exit_mmap to run concurrently", mmput_async() is no longer used. Remove it. Cc: Andrea Arcangeli Signed-off-by: David Rientjes --- include/linux/sched/mm.h | 6 -- kernel/fork.c| 16 2 files changed, 22

Re: [PATCH] mm, oom: allow oom reaper to race with exit_mmap

2017-08-14 David Rientjes
he mm if > MMF_OOM_SKIP is already set and in turn all memory is already freed > and furthermore the mm data structures may already have been taken > down by free_pgtables. > > Signed-off-by: Andrea Arcangeli With your follow-up one liner to include linux/oom.h folded in: Tested-by: David Rientjes

Re: [v5 4/4] mm, oom, docs: describe the cgroup-aware OOM killer

2017-08-14 David Rientjes
On Mon, 14 Aug 2017, Roman Gushchin wrote: > diff --git a/Documentation/cgroup-v2.txt b/Documentation/cgroup-v2.txt > index dec5afdaa36d..22108f31e09d 100644 > --- a/Documentation/cgroup-v2.txt > +++ b/Documentation/cgroup-v2.txt > @@ -48,6 +48,7 @@ v1 is available under Documentation/cgroup-v1/.

Re: [v5 3/4] mm, oom: introduce oom_priority for memory cgroups

2017-08-14 David Rientjes
ger priority if they are > populated with elegible tasks. > > The oom_priority value is compared within sibling cgroups. > > The root cgroup has the oom_priority 0, which cannot be changed. > > Signed-off-by: Roman Gushchin > Cc: Michal Hocko > Cc: Vladimir Davydov

Re: [v5 2/4] mm, oom: cgroup-aware OOM killer

2017-08-14 David Rientjes
On Mon, 14 Aug 2017, Roman Gushchin wrote: > diff --git a/include/linux/oom.h b/include/linux/oom.h > index 8a266e2be5a6..b7ec3bd441be 100644 > --- a/include/linux/oom.h > +++ b/include/linux/oom.h > @@ -39,6 +39,7 @@ struct oom_control { > unsigned long totalpages; > struct task_struc

Re: [v5 1/4] mm, oom: refactor the oom_kill_process() function

2017-08-14 David Rientjes
m cgroup. We don't need to print > the debug information for the each task, as well as play > with task selection (considering task's children), > so we can't use the existing oom_kill_process(). > > Signed-off-by: Roman Gushchin > Cc: Michal Hocko > Cc: Vladimi

Re: [PATCH] slub: fix per memcg cache leak on css offline

2017-08-14 David Rientjes
> Fix the leak by adding the missing call to kobject_put() to > sysfs_slab_remove_workfn(). > > Signed-off-by: Vladimir Davydov > Reported-and-tested-by: Andrei Vagin > Acked-by: Tejun Heo > Cc: Michal Hocko > Cc: Johannes Weiner > Cc: Christoph Lameter > Cc: Pekka Enberg > Cc: David Rientjes > Cc: Joonsoo Kim > Fixes: 3b7b314053d02 ("slub: make sysfs file removal asynchronous") Acked-by: David Rientjes

Re: [v4 4/4] mm, oom, docs: describe the cgroup-aware OOM killer

2017-08-08 David Rientjes
On Wed, 26 Jul 2017, Roman Gushchin wrote: > +Cgroup-aware OOM Killer > +~~~ > + > +Cgroup v2 memory controller implements a cgroup-aware OOM killer. > +It means that it treats memory cgroups as first class OOM entities. > + > +Under OOM conditions the memory controller tries t

Re: [v4 3/4] mm, oom: introduce oom_priority for memory cgroups

2017-08-08 David Rientjes
o priority based oom killing that we have done. I think this kind of support is long overdue in the oom killer. Comment inline. > Signed-off-by: Roman Gushchin > Cc: Michal Hocko > Cc: Vladimir Davydov > Cc: Johannes Weiner > Cc: David Rientjes > Cc: Tejun Heo >

Re: [v4 2/4] mm, oom: cgroup-aware OOM killer

2017-08-08 David Rientjes
On Tue, 1 Aug 2017, Roman Gushchin wrote: > > To the rest of the patch. I have to say I do not quite like how it is > > implemented. I was hoping for something much simpler which would hook > > into oom_evaluate_task. If a task belongs to a memcg with kill-all flag > > then we would update the cum

Re: [rfc] superblock shrinker accumulating excessive deferred counts

2017-07-18 David Rientjes
On Tue, 18 Jul 2017, Dave Chinner wrote: > > Thanks for looking into this, Dave! > > > > The number of GFP_NOFS allocations that build up the deferred counts can > > be unbounded, however, so this can become excessive, and the oom killer > > will not kill any processes in this context. Althoug

Re: [rfc] superblock shrinker accumulating excessive deferred counts

2017-07-17 David Rientjes
On Mon, 17 Jul 2017, Dave Chinner wrote: > > This is a side effect of super_cache_count() returning the appropriate > > count but super_cache_scan() refusing to do anything about it and > > immediately terminating with SHRINK_STOP, mostly for GFP_NOFS allocations. > > Yup. Happens during things

[rfc] superblock shrinker accumulating excessive deferred counts

2017-07-12 David Rientjes
Hi Al and everyone, We're encountering an issue where the per-shrinker per-node deferred counts grow excessively large for the superblock shrinker. This appears to be long-standing behavior, so I'm reaching out to see if there are any subtleties being overlooked, since there is a reference to

Re: [v3 2/6] mm, oom: cgroup-aware OOM killer

2017-07-12 David Rientjes
On Wed, 12 Jul 2017, Roman Gushchin wrote: > > It's a no-op if nobody sets up priorities or the system-wide sysctl is > > disabled. Presumably, as in our model, the Activity Manager sets the > > sysctl and is responsible for configuring the priorities if present. All > > memcgs at the sibling

Re: [v3 2/6] mm, oom: cgroup-aware OOM killer

2017-07-11 David Rientjes
On Tue, 11 Jul 2017, Roman Gushchin wrote: > > Yes, the original motivation was to limit killing to a single process, if > > possible. To do that, we kill the process with the largest rss to free > > the most memory and rely on the user to configure /proc/pid/oom_score_adj > > if something els

Re: [RFC PATCH] mm, oom: allow oom reaper to race with exit_mmap

2017-07-11 David Rientjes
On Tue, 11 Jul 2017, Michal Hocko wrote: > This? > --- > diff --git a/mm/oom_kill.c b/mm/oom_kill.c > index 5dc0ff22d567..e155d1d8064f 100644 > --- a/mm/oom_kill.c > +++ b/mm/oom_kill.c > @@ -470,11 +470,14 @@ static bool __oom_reap_task_mm(struct task_struct *tsk, > struct mm_struct *mm) > { >

Re: [RFC PATCH] mm, oom: allow oom reaper to race with exit_mmap

2017-07-10 David Rientjes
On Tue, 27 Jun 2017, Tetsuo Handa wrote: > I wonder why you prefer timeout based approach. Your patch will after all > set MMF_OOM_SKIP if operations between down_write() and up_write() took > more than one second. lock_anon_vma_root() from unlink_anon_vmas() from > free_pgtables() for example cal

Re: [RFC PATCH] mm, oom: allow oom reaper to race with exit_mmap

2017-07-10 David Rientjes
On Mon, 26 Jun 2017, Michal Hocko wrote: > diff --git a/mm/mmap.c b/mm/mmap.c > index 3bd5ecd20d4d..253808e716dc 100644 > --- a/mm/mmap.c > +++ b/mm/mmap.c > @@ -2962,6 +2962,11 @@ void exit_mmap(struct mm_struct *mm) > /* Use -1 here to ensure all VMAs in the mm are unmapped */ > unma

Re: [v3 2/6] mm, oom: cgroup-aware OOM killer

2017-07-10 David Rientjes
On Wed, 21 Jun 2017, Roman Gushchin wrote: > Traditionally, the OOM killer is operating on a process level. > Under oom conditions, it finds a process with the highest oom score > and kills it. > > This behavior doesn't suit well the system with many running > containers. There are two main issue

[patch resend 4.12] compiler, clang: always inline when CONFIG_OPTIMIZE_INLINING is disabled

2017-06-26 David Rientjes
and work toward moving arm64, and other architectures, toward CONFIG_OPTIMIZE_INLINING behavior. Reported-by: Sodagudi Prasad Tested-by: Matthias Kaehlcke Signed-off-by: David Rientjes --- Resend of http://marc.info/?l=linux-kernel&m=149681501816319 for 4.12 inclusion. Prasad, ple

Re: Re: [PATCH] mm,oom_kill: Close race window of needlessly selecting new victims.

2017-06-21 David Rientjes
On Wed, 21 Jun 2017, Tetsuo Handa wrote: > Umm... So, you are pointing out that select_bad_process() aborts based on > TIF_MEMDIE or MMF_OOM_SKIP is broken because victim threads can be removed > from global task list or cgroup's task list. Then, the OOM killer will have > to > wait until all mm

Re: [PATCH] compiler, clang: Add always_inline attribute to inline

2017-06-20 David Rientjes
On Tue, 20 Jun 2017, Mark Rutland wrote: > As with my reply to David, my preference would be that we: > > 1) Align compiler-clang.h with the compiler-gcc.h inlining behaviour, so >that things work by default. > > 2) Fix up the arm64 core code (and drivers for architected / common >periph

Re: [PATCH] mm,oom_kill: Close race window of needlessly selecting new victims.

2017-06-20 David Rientjes
On Sat, 17 Jun 2017, Tetsuo Handa wrote: > diff --git a/mm/oom_kill.c b/mm/oom_kill.c > index 04c9143..cf1d331 100644 > --- a/mm/oom_kill.c > +++ b/mm/oom_kill.c > @@ -470,38 +470,9 @@ static bool __oom_reap_task_mm(struct task_struct *tsk, > struct mm_struct *mm) > { > struct mmu_gather t

Re: [PATCH] compiler, clang: Add always_inline attribute to inline

2017-06-19 David Rientjes
On Mon, 19 Jun 2017, Sodagudi Prasad wrote: > > > Commit abb2ea7dfd82 ("compiler, clang: suppress warning for unused > > > static inline functions") re-defining the 'inline' macro but > > > __attribute__((always_inline)) is missing. Some compilers may > > > not honor inline hint if always_iniline

[patch for-4.12] mm, thp: remove cond_resched from __collapse_huge_page_copy

2017-06-19 David Rientjes
Reported-by: Larry Finger Signed-off-by: David Rientjes --- Note: Larry should be back as of June 17 to test if this fixes the reported issue. mm/khugepaged.c | 1 - 1 file changed, 1 deletion(-) diff --git a/mm/khugepaged.c b/mm/khugepaged.c --- a/mm/khugepaged.c +++ b/mm/khugepaged.c @@ -65

Re: [PATCH 0/7] rwsem: Implement down_read_killable()

2017-06-19 David Rientjes
On Mon, 19 Jun 2017, Kirill Tkhai wrote: > This series implements killable version of down_read() > similar to already existing down_write_killable() function. > Patches [1-2/7] add arch-independent low-level primitives > for the both rwsem types. > > Patches [3-6/7] add arch-dependent primitives

Re: [PATCH] compiler, clang: Add always_inline attribute to inline

2017-06-19 David Rientjes
On Mon, 19 Jun 2017, Prasad Sodagudi wrote: > Commit abb2ea7dfd82 ("compiler, clang: suppress warning for unused > static inline functions") re-defining the 'inline' macro but > __attribute__((always_inline)) is missing. Some compilers may > not honor inline hint if always_iniline attribute not th

Re: [patch] mm, oom: prevent additional oom kills before memory is freed

2017-06-15 David Rientjes
On Fri, 16 Jun 2017, Michal Hocko wrote: > I am sorry but I have really hard to make the oom reaper a reliable way > to stop all the potential oom lockups go away. I do not want to > reintroduce another potential lockup now. Please show where this "potential lockup" ever existed in a bug report o

Re: [patch] mm, oom: prevent additional oom kills before memory is freed

2017-06-15 David Rientjes
On Thu, 15 Jun 2017, Michal Hocko wrote: > > Yes, quite a bit in testing. > > > > One oom kill shows the system to be oom: > > > > [22999.488705] Node 0 Normal free:90484kB min:90500kB ... > > [22999.488711] Node 1 Normal free:91536kB min:91948kB ... > > > > followed up by one or more unnecessa

Re: [patch] mm, oom: prevent additional oom kills before memory is freed

2017-06-15 David Rientjes
On Thu, 15 Jun 2017, Tetsuo Handa wrote: > David is trying to avoid setting MMF_OOM_SKIP when the OOM reaper found that > mm->users == 0. Yes, because MMF_OOM_SKIP enables the oom killer to select another process to kill and will do so without the original victim's mm being able to undergo exit

Re: [patch] mm, oom: prevent additional oom kills before memory is freed

2017-06-15 David Rientjes
On Thu, 15 Jun 2017, Michal Hocko wrote: > > If mm->mm_users is not incremented because it is already zero by the oom > > reaper, meaning the final refcount has been dropped, do not set > > MMF_OOM_SKIP prematurely. > > > > __mmput() may not have had a chance to do exit_mmap() yet, so memory from

Re: Sleeping BUG in khugepaged for i586

2017-06-14 David Rientjes
On Thu, 8 Jun 2017, Michal Hocko wrote: > collapse_huge_page > pte_offset_map > kmap_atomic > kmap_atomic_prot > preempt_disable > __collapse_huge_page_copy > pte_unmap > kunmap_atomic > __kunmap_atomic > preempt_enable > > I suspect, so cond_resched seem

Re: Sleeping BUG in khugepaged for i586

2017-06-14 David Rientjes
On Mon, 12 Jun 2017, Michal Hocko wrote: > > These are not soft lockups, these are need_resched warnings. We monitor > > how long need_resched has been set and when a thread takes an excessive > > amount of time to reschedule after it has been set. A loop of 512 pages > > with ptl contention

[patch] mm, oom: prevent additional oom kills before memory is freed

2017-06-14 David Rientjes
lly requires no references on mm->mm_users to do exit_mmap(). Without this, several processes can be oom killed unnecessarily and the oom log can show an abundance of memory available if exit_mmap() is in progress at the time the process is skipped. Signed-off-by: David Rientjes --- mm/oom_

Re: Sleeping BUG in khugepaged for i586

2017-06-11 David Rientjes
On Sat, 10 Jun 2017, Michal Hocko wrote: > > > I would just pull the cond_resched out of __collapse_huge_page_copy > > > right after pte_unmap. But I am not really sure why this cond_resched is > > > really needed because the changelog of the patch which adds is is quite > > > terse on details. >

Re: Sleeping BUG in khugepaged for i586

2017-06-09 David Rientjes
On Thu, 8 Jun 2017, Michal Hocko wrote: > I would just pull the cond_resched out of __collapse_huge_page_copy > right after pte_unmap. But I am not really sure why this cond_resched is > really needed because the changelog of the patch which adds is is quite > terse on details. I'm not sure what

[patch v2 -mm] mm, hugetlb: schedule when potentially allocating many hugepages

2017-06-09 David Rientjes
A few hugetlb allocators loop while calling the page allocator and can potentially prevent rescheduling if the page allocator slowpath is not utilized. Conditionally schedule when large numbers of hugepages can be allocated. Signed-off-by: David Rientjes --- Based on -mm only to prevent merge

Re: [patch -mm] mm, hugetlb: schedule when potentially allocating many hugepages

2017-06-09 David Rientjes
On Wed, 7 Jun 2017, Mike Kravetz wrote: > > @@ -2364,6 +2366,7 @@ static unsigned long set_max_huge_pages(struct hstate > > *h, unsigned long count, > > ret = alloc_fresh_gigantic_page(h, nodes_allowed); > > else > > ret = alloc_fresh_huge_page(

Re: [PATCH] mm: Drop useless local parameters of register_one_node()

2017-06-09 David Rientjes
d register_node(). > > [Test in Qemu by 4 hotpluggable nodes in x86-64 system] > > Signed-off-by: Dou Liyang Acked-by: David Rientjes

[patch -mm] mm, hugetlb: schedule when potentially allocating many hugepages

2017-06-07 David Rientjes
A few hugetlb allocators loop while calling the page allocator and can potentially prevent rescheduling if the page allocator slowpath is not utilized. Conditionally schedule when large numbers of hugepages can be allocated. Signed-off-by: David Rientjes --- Based on -mm only to prevent merge

Re: Sleeping BUG in khugepaged for i586

2017-06-07 David Rientjes
On Wed, 7 Jun 2017, Vlastimil Babka wrote: > >> Hmm I'd expect such spin lock to be reported together with mmap_sem in > >> the debugging "locks held" message? > > > > My bisection of the problem is about half done. My latest good version is > > commit > > 7b8cd33 and the latest bad one is 2ea6

Re: [RFC] clang: 'unused-function' warning on static inline functions

2017-06-06 David Rientjes
On Tue, 6 Jun 2017, Matthias Kaehlcke wrote: > Unfortunately as is the patch doesn't work: > > include/linux/compiler-clang.h:20:9: error: 'inline' macro redefined > [-Werror,-Wmacro-redefined] > #define inline inline __attribute__((unused)) > ^ > include/linux/compiler-gcc.h:78:9: note:

[patch] compiler, clang: move inline definition to compiler-gcc.h

2017-06-06 David Rientjes
inline' ends up overriding the definition in compiler-gcc.h. Simply annotate all inline functions as __attribute__((unused)). It's necessary to suppress the warning for clang and is implicit with gcc. Reported-by: Matthias Kaehlcke Signed-off-by: David Rientjes --- Matthias, please a
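The idea in this patch can be illustrated in userspace C. The patch annotates inline functions with `__attribute__((unused))` so that clang stops emitting -Wunused-function for static inline helpers that a given configuration never calls. This is a minimal sketch and not the kernel's compiler-gcc.h change. The macro name `maybe_unused_inline` and both helper functions are hypothetical, and the macro name was chosen so the example does not redefine the `inline` keyword itself.

```c
#include <assert.h>

/*
 * Hypothetical stand-in for the kernel's redefinition of `inline`:
 * marking the function unused suppresses clang's -Wunused-function
 * for static inline functions defined in a .c file. gcc does not warn
 * about unused static inline functions in the first place.
 */
#define maybe_unused_inline inline __attribute__((unused))

/* Never called: with the attribute, `clang -Wall` should not warn here. */
static maybe_unused_inline int debug_only_helper(int x)
{
	return x * 2;
}

/* Called normally: the attribute does not change code generation. */
static maybe_unused_inline int add_one(int x)
{
	return x + 1;
}
```

The attribute only affects diagnostics. Helpers that are actually used compile exactly as before.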

Re: [RFC PATCH v2 1/7] mm, oom: refactor select_bad_process() to take memcg as an argument

2017-06-06 David Rientjes
On Tue, 6 Jun 2017, Roman Gushchin wrote: > Hi David! > > Thank you for sharing this! > > It's very interesting, and it looks like, > it's not that far from what I've suggested. > > So we definitily need to come up with some common solution. > Hi Roman, Yes, definitely. I could post a serie

[patch resend] compiler, clang: suppress warning for unused static inline functions

2017-06-06 David Rientjes
out that suppressing the warnings avoids potentially complex #ifdef directives, which also reduces LOC. Suppress the warning for clang. Signed-off-by: David Rientjes --- This is a resend of my patch from http://marc.info/?t=14956596926 that did not seem to end very productively, but I

Re: [RFC PATCH v2 1/7] mm, oom: refactor select_bad_process() to take memcg as an argument

2017-06-04 David Rientjes
We use a heavily modified system and memcg oom killer and I'm wondering if there is some opportunity for collaboration because we may have some shared goals. I can summarize how we currently use the oom killer at a high level so that it is not overwhelming with implementation details and give some

Re: [patch v2] mm, vmscan: avoid thrashing anon lru when free + file is low

2017-06-04 David Rientjes
On Fri, 2 Jun 2017, Andrew Morton wrote: > On Mon, 1 May 2017 14:34:21 -0700 (PDT) David Rientjes > wrote: > > > The purpose of the code that commit 623762517e23 ("revert 'mm: vmscan: do > > not swap anon pages just because free+file is low'") reintro

Re: [patch] compiler, clang: suppress warning for unused static inline functions

2017-05-31 David Rientjes
On Wed, 31 May 2017, Doug Anderson wrote: > > Again, I defer to maintainers like Andrew and Ingo who have to deal with > > an enormous amount of patches on how they would like to handle it; I don't > > think myself or anybody else who doesn't deal with a large number of > > patches should be manda

[patch] mm, vmpressure: pass-through notification support

2017-05-31 David Rientjes
ed for backwards compatibility. See the change to Documentation/cgroup-v1/memory.txt for full specification. Signed-off-by: David Rientjes --- Documentation/cgroup-v1/memory.txt | 47 ++ mm/vmpressure.c| 122 - 2 files changed

Re: [patch] compiler, clang: suppress warning for unused static inline functions

2017-05-30 David Rientjes
On Wed, 24 May 2017, Doug Anderson wrote: > * Matthias has been sending out individual patches that take each > particular case into account to try to remove the warnings. In some > cases this removes totally dead code. In other cases this adds > __maybe_unused. ...and as a last resort it uses

Re: [PATCH v2] mm/oom_kill: count global and memory cgroup oom kills

2017-05-29 David Rientjes
On Thu, 25 May 2017, Konstantin Khlebnikov wrote: > diff --git a/mm/oom_kill.c b/mm/oom_kill.c > index 04c9143a8625..dd30a045ef5b 100644 > --- a/mm/oom_kill.c > +++ b/mm/oom_kill.c > @@ -876,6 +876,11 @@ static void oom_kill_process(struct oom_control *oc, > const char *message) > /* Get a

Re: [PATCH v2] mm: introduce MADV_RESET_HUGEPAGE

2017-05-29 David Rientjes
On Mon, 29 May 2017, Mike Rapoport wrote: > Currently applications can explicitly enable or disable THP for a memory > region using MADV_HUGEPAGE or MADV_NOHUGEPAGE. However, once either of > these advises is used, the region will always have > VM_HUGEPAGE/VM_NOHUGEPAGE flag set in vma->vm_flags.
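For context, here is how userspace applies the existing per-region THP advice described in the quoted proposal. This is a minimal sketch that assumes a Linux/glibc system. `thp_advise_region()` is a hypothetical helper. The proposed MADV_RESET_HUGEPAGE is deliberately not used, because it exists only as a proposal in this thread.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Hypothetical helper: map an anonymous region, apply MADV_HUGEPAGE
 * (enable != 0) or MADV_NOHUGEPAGE (enable == 0), then unmap it.
 * Returns the madvise() result (0 on success, -1 with errno set, e.g.
 * EINVAL on kernels built without transparent hugepage support), or
 * -2 if mmap() itself failed. According to the quoted proposal, once
 * either advice is applied the VMA keeps VM_HUGEPAGE or VM_NOHUGEPAGE
 * set in vma->vm_flags.
 */
static int thp_advise_region(size_t len, int enable)
{
	void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return -2;

	int ret = madvise(p, len, enable ? MADV_HUGEPAGE : MADV_NOHUGEPAGE);

	munmap(p, len);
	return ret;
}
```

The helper returns the madvise() result instead of treating failure as fatal, because the advice is rejected on kernels without THP support.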

[patch] compiler, clang: suppress warning for unused static inline functions

2017-05-24 David Rientjes
out that suppressing the warnings avoids potentially complex #ifdef directives, which also reduces LOC. Suppress the warning for clang. Signed-off-by: David Rientjes --- include/linux/compiler-clang.h | 7 +++ 1 file changed, 7 insertions(+) diff --git a/include/linux/compiler-clang.h b

Re: [PATCH] mm/oom_kill: count global and memory cgroup oom kills

2017-05-24 David Rientjes
On Tue, 23 May 2017, Konstantin Khlebnikov wrote: > This is worth addition. Let's call it "oom_victim" for short. > > It allows to locate leaky part if they are spread over sub-containers within > common limit. > But doesn't tell which limit caused this kill. For hierarchical limits this > might

Re: [PATCH 1/3] mm/slub: Only define kmalloc_large_node_hook() for NUMA systems

2017-05-24 David Rientjes
On Tue, 23 May 2017, Matthias Kaehlcke wrote: > > diff --git a/include/linux/compiler-clang.h b/include/linux/compiler-clang.h > > index de179993e039..e1895ce6fa1b 100644 > > --- a/include/linux/compiler-clang.h > > +++ b/include/linux/compiler-clang.h > > @@ -15,3 +15,8 @@ > > * with any versio

Re: [PATCH] mm/oom_kill: count global and memory cgroup oom kills

2017-05-23 David Rientjes
On Mon, 22 May 2017, Konstantin Khlebnikov wrote: > Nope, they are different. I think we should rephase documentation somehow > > low - count of reclaims below low level > high - count of post-allocation reclaims above high level > max - count of direct reclaims > oom - count of failed direct rec

Re: [PATCH 1/3] mm/slub: Only define kmalloc_large_node_hook() for NUMA systems

2017-05-22 David Rientjes
On Mon, 22 May 2017, Andrew Morton wrote: > > > Is clang not inlining kmalloc_large_node_hook() for some reason? I don't > > > think this should ever warn on gcc. > > > > clang warns about unused static inline functions outside of header > > files, in difference to gcc. > > I wish it wouldn't.

Re: [PATCH 1/3] mm/slub: Only define kmalloc_large_node_hook() for NUMA systems

2017-05-22 David Rientjes
-Wunused-function] > Is clang not inlining kmalloc_large_node_hook() for some reason? I don't think this should ever warn on gcc. > Signed-off-by: Matthias Kaehlcke Acked-by: David Rientjes > --- > mm/slub.c | 3 +++ > 1 file changed, 3 insertions(+) > > diff

Re: dm ioctl: Restore __GFP_HIGH in copy_params()

2017-05-22 David Rientjes
On Mon, 22 May 2017, Mike Snitzer wrote: > > > The lvm2 was designed this way - it is broken, but there is not much that > > > can be done about it - fixing this would mean major rewrite. The only > > > thing we can do about it is to lower the deadlock probability with > > > __GFP_HIGH (or PF_M

Re: [PATCH] slub/memcg: Cure the brainless abuse of sysfs attributes

2017-05-21 David Rientjes
> slub_attributes which must be propagated and avoid that insane conversion > to and from ASCII, but that's too large for a hot fix. > > Check at least the return value of the show() function, so calling store() > with stale content is prevented. > > Reported-by: Steven Roste

[patch] mm, thp: copying user pages must schedule on collapse

2017-05-10 David Rientjes
We have encountered need_resched warnings in __collapse_huge_page_copy() while doing {clear,copy}_user_highpage() over HPAGE_PMD_NR source pages. mm->mmap_sem is held for write, but the iteration is well bounded. Reschedule as needed. Signed-off-by: David Rientjes --- mm/khugepaged.c

Re: [patch] fs, epoll: short circuit fetching events if thread has been killed

2017-05-09 David Rientjes
On Tue, 9 May 2017, Andrew Morton wrote: > > We've encountered zombies that are waiting for a thread to exit that are > > looping in ep_poll() almost endlessly although there is a pending SIGKILL > > as a result of a group exit. > > > > This happens because we always find ep_events_available() an

[patch] fs, epoll: short circuit fetching events if thread has been killed

2017-05-03 David Rientjes
g for ep_events_available(), but there have been no reports of delayed signal handling other than SIGKILL preventing zombies from exiting that would be fixed by this. Signed-off-by: David Rientjes --- fs/eventpoll.c | 10 ++ 1 file changed, 10 insertions(+) diff --git a/fs/eventpoll.c b/fs/eventpoll.c

Re: [patch v2] mm, vmscan: avoid thrashing anon lru when free + file is low

2017-05-03 David Rientjes
On Wed, 3 May 2017, Michal Hocko wrote: > diff --git a/mm/vmscan.c b/mm/vmscan.c > index 24efcc20af91..f3ec8760dc06 100644 > --- a/mm/vmscan.c > +++ b/mm/vmscan.c > @@ -2113,16 +2113,14 @@ static void get_scan_count(struct lruvec *lruvec, > struct mem_cgroup *memcg, > u64 denominator = 0;

Re: [RESENT PATCH] x86/mem: fix the offset overflow when read/write mem

2017-05-02 David Rientjes
On Thu, 27 Apr 2017, zhongjiang wrote: > From: zhong jiang > > Recently, I found the following issue, it will result in the panic. > > [ 168.739152] mmap1: Corrupted page table at address 7f3e6275a002 > [ 168.745039] PGD 61f4a1067 > [ 168.745040] PUD 61ab19067 > [ 168.747730] PMD 61fb8b067

Re: [patch v2] mm, vmscan: avoid thrashing anon lru when free + file is low

2017-05-02 David Rientjes
On Tue, 2 May 2017, Michal Hocko wrote: > I have already asked and my questions were ignored. So let me ask again > and hopefuly not get ignored this time. So Why do we need a different > criterion on anon pages than file pages? The preference in get_scan_count() as already implemented is to recl

Re: [PATCH 2/3] x86/numa_emulation: assign physnode_mask directly from numa_nodes_parsed

2017-05-01 David Rientjes
On Tue, 11 Apr 2017, Wei Yang wrote: > On Mon, Apr 10, 2017 at 05:26:03PM -0700, David Rientjes wrote: > >On Tue, 11 Apr 2017, Wei Yang wrote: > > > >> According to current code path, numa_nodes_parsed is already setup when > >> numa_emucation() is cal

[patch v2] mm, vmscan: avoid thrashing anon lru when free + file is low

2017-05-01 David Rientjes
sufficient, fallback to balanced reclaim so the file lru doesn't remain untouched. Suggested-by: Minchan Kim Signed-off-by: David Rientjes --- to akpm: this issue has been possible since at least 3.15, so it's probably not high priority for 4.12 but applies cleanly if it ca

Re: [patch] mm, vmscan: avoid thrashing anon lru when free + file is low

2017-04-19 David Rientjes
} > } > } > Hi Minchan, This looks good and it correctly biases against SCAN_ANON for my workload that was thrashing the anon lrus. Feel free to use parts of my changelog if you'd like. Tested-by: David Rientjes

Re: [patch] mm, vmscan: avoid thrashing anon lru when free + file is low

2017-04-18 David Rientjes
On Tue, 18 Apr 2017, Minchan Kim wrote: > > The purpose of the code that commit 623762517e23 ("revert 'mm: vmscan: do > > not swap anon pages just because free+file is low'") reintroduces is to > > prefer swapping anonymous memory rather than trashing the file lru. > > > > If all anonymous memory

Re: [PATCH V3] mm/madvise: Move up the behavior parameter validation

2017-04-18 David Rientjes
system call madvise(). > > Signed-off-by: Anshuman Khandual Acked-by: David Rientjes Looks like this depends on existing patches in -mm.

Re: [PATCH v2 tip/core/rcu 01/11] mm: Rename SLAB_DESTROY_BY_RCU to SLAB_TYPESAFE_BY_RCU

2017-04-17 David Rientjes
AFE_BY_RCU in order > to avoid future instances of this sort of confusion. > > Signed-off-by: Paul E. McKenney > Cc: Christoph Lameter > Cc: Pekka Enberg > Cc: David Rientjes > Cc: Joonsoo Kim > Cc: Andrew Morton > Cc: > Acked-by: Johannes Weiner > Acked-by:

[patch] mm, vmscan: avoid thrashing anon lru when free + file is low

2017-04-17 David Rientjes
e lru doesn't remain untouched. Signed-off-by: David Rientjes --- mm/vmscan.c | 41 +++-- 1 file changed, 23 insertions(+), 18 deletions(-) diff --git a/mm/vmscan.c b/mm/vmscan.c --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -2186,26 +2186,31 @@ static void get

Re: [PATCH] slab: avoid IPIs when creating kmem caches

2017-04-17 David Rientjes
> open("/tmp/x", "w").close() > os.unlink("/tmp/x") > b = ipi_count() > print "%d loops: %d => %d (+%d ipis)" % (n, a, b, b-a) > echo(pid, "cgroup.procs") > for i in range(n): > os.rmdir(str(i)) > > patched: 1 loops: 1069 => 1170 (+101 ipis) > unpatched: 1 loops: 1192 => 48933 (+47741 ipis) > > Signed-off-by: Greg Thelen Acked-by: David Rientjes

Re: [PATCH 3/3] x86/numa_emulation: restructures numa_nodes_parsed from emulated nodes

2017-04-10 David Rientjes
ch restructures numa_nodes_parsed from emulated nodes. > > Signed-off-by: Wei Yang Acked-by: David Rientjes although there's a small nit: NODE_MASK_NONE is only used for initialization; this should be nodes_clear(numa_nodes_parsed) instead, but that would be up to the x86 maintainers to allow

Re: [PATCH 2/3] x86/numa_emulation: assign physnode_mask directly from numa_nodes_parsed

2017-04-10 Thread David Rientjes
ad of re-finding it when calling numa_emulation(). > This means we can get the physnode_mask directly from numa_nodes_parsed. At > the same time, this patch correct the comment of these two functions. > > Signed-off-by: Wei Yang Acked-by: David Rientjes

Re: [PATCH 1/3] x86/numa_emulation: fix potential memory leak

2017-04-10 Thread David Rientjes
by re-order the code path. > > Signed-off-by: Wei Yang Acked-by: David Rientjes

[patch] mm, swap_cgroup: reschedule when needed in swap_cgroup_swapoff()

2017-04-06 Thread David Rientjes
We got need_resched() warnings in swap_cgroup_swapoff() because swap_cgroup_ctrl[type].length is particularly large. Reschedule when needed. Signed-off-by: David Rientjes --- mm/swap_cgroup.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/mm/swap_cgroup.c b/mm/swap_cgroup.c --- a/mm

[patch for-4.11] mm, thp: fix setting of defer+madvise thp defrag mode

2017-04-05 Thread David Rientjes
et appropriately for "defer+madvise". Fixes: 21440d7eb904 ("mm, thp: add new defer+madvise defrag option") Signed-off-by: David Rientjes --- mm/huge_memory.c | 12 ++-- 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/mm/huge_memory.c b/mm/huge_
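
The defrag modes touched by this fix are selected through /sys/kernel/mm/transparent_hugepage/defrag, which lists every mode and brackets the active one (for example "always defer [defer+madvise] madvise never"). As a minimal sketch, assuming that bracketed format, a userspace tool could pull out the active mode like this. The helper name thp_active_mode is hypothetical, and it only parses contents that have already been read from the file.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not a kernel or libc API): copy the bracketed,
 * i.e. active, mode out of the contents of
 * /sys/kernel/mm/transparent_hugepage/defrag into out[outlen].
 * Returns 0 on success, -1 if no non-empty "[...]" token fits in out.
 */
int thp_active_mode(const char *contents, char *out, size_t outlen)
{
    const char *open = strchr(contents, '[');
    if (open == NULL)
        return -1;

    const char *close = strchr(open + 1, ']');
    if (close == NULL)
        return -1;

    size_t len = (size_t)(close - (open + 1));
    if (len == 0 || len >= outlen)   /* empty token, or no room for the NUL */
        return -1;

    memcpy(out, open + 1, len);
    out[len] = '\0';
    return 0;
}
```

A real tool would first read the sysfs file, for example with fopen() and fgets(), and then pass the line to this helper.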

Re: [PATCH -v2 1/2] mm, swap: Use kvzalloc to allocate some swap data structure

2017-03-20 Thread David Rientjes
On Mon, 20 Mar 2017, Huang, Ying wrote: > From: Huang Ying > > Now vzalloc() is used in swap code to allocate various data > structures, such as swap cache, swap slots cache, cluster info, etc. > Because the size may be too large on some system, so that normal > kzalloc() may fail. But using kz

Re: [patch v2] mm, vmstat: print non-populated zones in zoneinfo

2017-03-17 Thread David Rientjes
On Fri, 17 Mar 2017, Michal Hocko wrote: > > Does it really make sense to print any counters of that zone though? > > Your follow up patch just suggests that we don't want some but what > > about others? > > Managed and present pages needs to be emitted for userspace parsing of memory hotplug,

Re: [PATCH 4/5] mm, swap: Try kzalloc before vzalloc

2017-03-17 Thread David Rientjes
On Fri, 17 Mar 2017, Huang, Ying wrote: > From: Huang Ying > > Now vzalloc() is used in swap code to allocate various data > structures, such as swap cache, swap slots cache, cluster info, etc. > Because the size may be too large on some system, so that normal > kzalloc() may fail. But using kz

[patch -mm] mm, vmstat: suppress pcp stats for unpopulated zones in zoneinfo

2017-03-06 Thread David Rientjes
protection information above pcp stats since it is relevant for all zones per vm.lowmem_reserve_ratio. Signed-off-by: David Rientjes --- mm/vmstat.c | 20 +--- 1 file changed, 13 insertions(+), 7 deletions(-) diff --git a/mm/vmstat.c b/mm/vmstat.c --- a/mm/vmstat.c +++ b/mm/vmstat.c @

[patch v2] mm, vmstat: print non-populated zones in zoneinfo

2017-03-03 Thread David Rientjes
not done for unpopulated zones. Signed-off-by: David Rientjes --- v2: - s/bool populated/b assert_populated/ per Anshuman - add comment to zoneinfo_show() to describe why we care mm/vmstat.c | 27 +-- 1 file changed, 17 insertions(+), 10 deletions(-) diff --git a/mm

Re: [patch] mm, zoneinfo: print non-populated zones

2017-03-03 Thread David Rientjes
On Fri, 3 Mar 2017, Anshuman Khandual wrote: > > This patch shows statistics for non-populated zones in /proc/zoneinfo. > > The zones exist and hold a spot in the vm.lowmem_reserve_ratio array. > > Without this patch, it is not possible to determine which index in the > > array controls which zone
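
The point of printing unpopulated zones is that the "Node N, zone NAME" headers in /proc/zoneinfo can then be lined up with the per-zone slots of vm.lowmem_reserve_ratio. A minimal sketch of collecting one node's zone names in order follows. It assumes the header format shown and assumes that each node's zones are printed in zone-index order. The helper zoneinfo_zone_names and the MAX_ZONES and ZONE_NAME_LEN limits are hypothetical, and the helper parses text that has already been read from the file.

```c
#include <stdio.h>
#include <string.h>

#define MAX_ZONES 8        /* hypothetical upper bound for this sketch */
#define ZONE_NAME_LEN 16   /* room for names like "Movable" plus NUL */

/*
 * Hypothetical helper: scan /proc/zoneinfo text for "Node <nid>, zone <name>"
 * headers that belong to node `nid` and record the zone names in the order
 * they appear. Returns the number of zones found (at most MAX_ZONES).
 */
int zoneinfo_zone_names(const char *text, int nid,
                        char names[MAX_ZONES][ZONE_NAME_LEN])
{
    int count = 0;
    const char *line = text;

    while (line != NULL && *line != '\0' && count < MAX_ZONES) {
        int node;
        char name[ZONE_NAME_LEN];

        /* Header lines look like "Node 0, zone      DMA"; other lines fail to match. */
        if (sscanf(line, "Node %d, zone %15s", &node, name) == 2 && node == nid) {
            strcpy(names[count], name);
            count++;
        }

        line = strchr(line, '\n');
        if (line != NULL)
            line++;   /* advance to the start of the next line */
    }
    return count;
}
```

Under those assumptions, names[i] would be the zone governed by index i of vm.lowmem_reserve_ratio.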

[patch] mm, zoneinfo: print non-populated zones

2017-03-02 Thread David Rientjes
not done for unpopulated zones. Signed-off-by: David Rientjes --- mm/vmstat.c | 22 +- 1 file changed, 13 insertions(+), 9 deletions(-) diff --git a/mm/vmstat.c b/mm/vmstat.c --- a/mm/vmstat.c +++ b/mm/vmstat.c @@ -1121,8 +1121,12 @@ static void frag_stop(struct seq_file *m

Re: [PATCH 2/4] mm: Fix checkpatch warnings, whitespace

2017-02-01 Thread David Rientjes
On Thu, 2 Feb 2017, Tobin C. Harding wrote: > @@ -3696,8 +3695,8 @@ int handle_mm_fault(struct vm_area_struct *vma, > unsigned long address, > * VM_FAULT_OOM), there is no need to kill anything. > * Just clean up the OOM state peacefully. > */

Re: [patch] mm, madvise: fail with ENOMEM when splitting vma will hit max_map_count

2017-01-25 Thread David Rientjes
On Wed, 25 Jan 2017, Anshuman Khandual wrote: > But in the due course there might be other changes in number of VMAs of > the process because of unmap() or merge() which could reduce the total > number of VMAs and hence this condition may not exist afterwards. In > that case EAGAIN still makes sen

[patch -man] madvise.2: Specify new ENOMEM return value

2017-01-24 Thread David Rientjes
madvise(2) may return ENOMEM if the advice acts on a vma that must be split and creating the new vma will result in the process exceeding /proc/sys/vm/max_map_count. Specify this additional possibility. Signed-off-by: David Rientjes --- man2/madvise.2 | 7 ++- 1 file changed, 6 insertions

[patch] mm, madvise: fail with ENOMEM when splitting vma will hit max_map_count

2017-01-24 Thread David Rientjes
(for vmas, anon_vmas, or mempolicies) cannot be allocated. Encountering /proc/sys/vm/max_map_count is not a temporary failure, however, so return ENOMEM to indicate this is a more serious issue. A followup patch to the man page will specify this behavior. Signed-off-by: David Rientjes
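
With this change, a caller can treat EAGAIN from madvise(2) as transient, because a vma, anon_vma, or mempolicy could not be allocated. ENOMEM should be treated as a condition that retrying will not fix: the range is not mapped, or splitting a vma would exceed /proc/sys/vm/max_map_count. A minimal sketch of that policy follows, assuming a Linux/glibc userspace. The wrapper name madvise_retry and its max_tries bound are hypothetical.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE    /* expose MADV_* and MAP_ANONYMOUS in glibc headers */
#endif
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Hypothetical wrapper: retry madvise() only on EAGAIN, which signals a
 * temporary kernel allocation failure. ENOMEM (unmapped range, or a vma
 * split that would exceed vm.max_map_count) and any other error are
 * returned immediately. Returns 0 on success or the errno of the last
 * failure (EAGAIN if max_tries <= 0).
 */
int madvise_retry(void *addr, size_t len, int advice, int max_tries)
{
    int err = EAGAIN;

    for (int i = 0; i < max_tries; i++) {
        if (madvise(addr, len, advice) == 0)
            return 0;
        err = errno;
        if (err != EAGAIN)
            break;   /* not transient: retrying will not help */
    }
    return err;
}
```

Returning the errno value instead of -1 lets the caller tell "gave up after retries" (EAGAIN) apart from "will never succeed" (ENOMEM) without rereading errno.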

Re: [PATCH] mm: ensure alloc_flags in slow path are initialized

2017-01-23 Thread David Rientjes
> higher looks safe and makes it obvious to both me and gcc that > the initialization comes before the first use. > > Fixes: 74eaa4a97e8e ("mm: consolidate GFP_NOFAIL checks in the allocator > slowpath") > Signed-off-by: Arnd Bergmann Acked-by: David Rientjes

Re: [patch] mm, oom: header nodemask is NULL when cpusets are disabled

2017-01-20 Thread David Rientjes
On Fri, 20 Jan 2017, Vlastimil Babka wrote: > Could we simplify both patches with something like this? > Although the sizeof("null") is not the nicest thing, because it relies on > knowledge > that pointer() in lib/vsprintf.c uses this string. Maybe Rasmus has some > better idea? > > Thanks, >
