[patch -mm] mm, oom: header nodemask is NULL when cpusets are disabled fix

2017-01-20 Thread David Rientjes
Newline per Hillf Signed-off-by: David Rientjes --- mm/oom_kill.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/mm/oom_kill.c b/mm/oom_kill.c index 1767e50844ac..51c091849dcb 100644 --- a/mm/oom_kill.c +++ b/mm/oom_kill.c @@ -408,7 +408,7 @@ static void dump_header(struct

[patch] mm, oom: header nodemask is NULL when cpusets are disabled

2017-01-19 Thread David Rientjes
Signed-off-by: David Rientjes --- mm/oom_kill.c | 16 +--- 1 file changed, 9 insertions(+), 7 deletions(-) diff --git a/mm/oom_kill.c b/mm/oom_kill.c --- a/mm/oom_kill.c +++ b/mm/oom_kill.c @@ -403,12 +403,14 @@ static void dump_tasks(struct mem_cgroup *memcg, const nodemas

[patch -mm] mm, page_alloc: warn_alloc nodemask is NULL when cpusets are disabled

2017-01-18 Thread David Rientjes
tting the nodemask to cpuset_current_mems_allowed is redundant and prevents debugging issues where ac->nodemask is not set properly in the page allocator. This provides better debugging output since cpuset_print_current_mems_allowed() is already provided. Signed-off-by: David Rientjes --- mm

Re: [PATCH] mm/mempolicy.c: do not put mempolicy before using its nodemask

2017-01-18 Thread David Rientjes
_alloc_pages_nodemask() can end up using a bogus nodemask, which could lead > e.g. to premature OOM. > > Fixes: be97a41b291e ("mm/mempolicy.c: merge alloc_hugepage_vma to > alloc_pages_vma") > Signed-off-by: Vlastimil Babka > Cc: sta...@vger.kernel.org > Cc: Aneesh Kuma

Re: [PATCH] slub: Trace free objects at KERN_INFO

2017-01-17 Thread David Rientjes
ned-off-by: Daniel Thompson Acked-by: David Rientjes

Re: [PATCH 2/4] mm, page_alloc: warn_alloc print nodemask

2017-01-17 Thread David Rientjes
On Tue, 17 Jan 2017, Michal Hocko wrote: > diff --git a/include/linux/mm.h b/include/linux/mm.h > index 57dc3c3b53c1..3e35eb04a28a 100644 > --- a/include/linux/mm.h > +++ b/include/linux/mm.h > @@ -1912,8 +1912,8 @@ extern void si_meminfo_node(struct sysinfo *val, int > nid); > extern unsigned l

Re: [PATCH] slab: add a check for the first kmem_cache not to be destroyed

2017-01-17 Thread David Rientjes
On Tue, 17 Jan 2017, kwon wrote: > >> diff --git a/mm/slab_common.c b/mm/slab_common.c > >> index 1dfc209..2d30ace 100644 > >> --- a/mm/slab_common.c > >> +++ b/mm/slab_common.c > >> @@ -744,7 +744,7 @@ void kmem_cache_destroy(struct kmem_cache *s) > >>bool need_rcu_barrier = false; > >>in

Re: [PATCH 1/4] mm, page_alloc: do not report all nodes in show_mem

2017-01-17 Thread David Rientjes
g task numa > policy. Add this check to not pollute the output with the pointless > information. > > Acked-by: Mel Gorman > Acked-by: Johannes Weiner > Signed-off-by: Michal Hocko s/fileter/filter/ Acked-by: David Rientjes

Re: [patch v2] mm, memcg: do not retry precharge charges

2017-01-14 Thread David Rientjes
On Sat, 14 Jan 2017, Johannes Weiner wrote: > The OOM killer livelock was the motivation for this patch. With that > ruled out, what's the point of this patch? Try a bit less hard to move > charges during task migration? > Most important part is to fail ->can_attach() instead of oom killing pro

Re: [patch v2] mm, memcg: do not retry precharge charges

2017-01-13 Thread David Rientjes
& ~__GFP_NORETRY, which is pointless as written. Fixes: 0029e19ebf84 ("mm: memcontrol: remove explicit OOM parameter in charge path") Acked-by: Michal Hocko Signed-off-by: David Rientjes --- mm/memcontrol.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/mm/mem

[patch v2] mm, memcg: do not retry precharge charges

2017-01-12 Thread David Rientjes
OOM parameter in charge path") Signed-off-by: David Rientjes --- mm/memcontrol.c | 7 +-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/mm/memcontrol.c b/mm/memcontrol.c --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -4353,9 +4353,12 @@ static int mem_cgroup_do_prec

[patch] mm, memcg: do not retry precharge charges

2017-01-11 Thread David Rientjes
. This also restructures mem_cgroup_wait_acct_move() since it is not possible for mc.moving_task to be current. Fixes: 0029e19ebf84 ("mm: memcontrol: remove explicit OOM parameter in charge path") Signed-off-by: David Rientjes --- mm/memcontrol.c | 32 +++---

Re: [PATCH v2] mm: Respect FOLL_FORCE/FOLL_COW for thp

2017-01-10 Thread David Rientjes
S(status) == 0); > return 0; > } > > Fix this by updating follow_trans_huge_pmd in huge_memory.c analogously to > the update in gup.c in the original commit. The same pattern exists in > follow_devmap_pmd. However, we should not be able to reach that check > with FOLL_COW set, so add WARN_ONCE to make sure we notice if we ever > do. > > Signed-off-by: Keno Fischer Tested-by: David Rientjes

[patch v2] mm, thp: add new defer+madvise defrag option

2017-01-10 Thread David Rientjes
rts five and triple_flag_store() was getting unnecessarily messy. Signed-off-by: David Rientjes --- v2: uses new naming suggested by Vlastimil (defer+madvise order looks better in "... defer defer+madvise madvise ...") v1 was acked by Mel, and it probably could have been pre

Re: [patch] mm, thp: add new background defrag option

2017-01-10 Thread David Rientjes
On Tue, 10 Jan 2017, Vlastimil Babka wrote: > > I get very confused by the /sys/kernel/mm/transparent_hugepage/defrag > > versus enabled flags, and this may be a terrible, even more confusing, > > idea: but I've been surprised and sad to see defrag with a "defer" > > option, but poor enabled witho

Re: [patch] mm, thp: add new background defrag option

2017-01-09 Thread David Rientjes
On Mon, 9 Jan 2017, Vlastimil Babka wrote: > > Any suggestions for a better name for "background" are more than welcome. > > Why not just "madvise+defer"? > Seeing no other activity regarding this issue (omg!), I'll wait a day or so to see if there are any objections to "madvise+defer" or su

Re: [patch] mm, thp: add new background defrag option

2017-01-06 Thread David Rientjes
On Fri, 6 Jan 2017, Vlastimil Babka wrote: > Deciding between "defer" and "background" is however confusing, and also > doesn't indicate that the difference is related to madvise. > Any suggestions for a better name for "background" are more than welcome. > > The kernel implementation takes l

Re: [patch] mm, thp: add new background defrag option

2017-01-05 Thread David Rientjes
On Thu, 5 Jan 2017, Vlastimil Babka wrote: > Hmm that's probably why it's hard to understand, because "madvise > request" is just setting a vma flag, and the THP allocation (and defrag) > still happens at fault. > > I'm not a fan of either name, so I've tried to implement my own > suggestion. Tur

[patch] mm, thp: add new background defrag option

2017-01-04 Thread David Rientjes
serspace, was offered: http://marc.info/?t=14823661273. This additional mode is a compromise. This patch also cleans up the helper function for storing to "enabled" and "defrag" since the former supports three modes while the latter supports five and triple_flag_st

Re: [patch] mm, thp: always direct reclaim for MADV_HUGEPAGE even when deferred

2017-01-04 Thread David Rientjes
On Wed, 4 Jan 2017, Vlastimil Babka wrote: > > Hmm, is there a significant benefit to setting "defer" rather than "never" > > if you can rely on khugepaged to trigger compaction when it tries to > > allocate. I suppose if there is nothing to collapse that this won't do > > compaction, but is t

Re: [patch] mm, thp: always direct reclaim for MADV_HUGEPAGE even when deferred

2017-01-04 Thread David Rientjes
On Wed, 4 Jan 2017, Mel Gorman wrote: > There is a slight disconnect. The bug reports I'm aware of predate the > introduction of "defer" and the current "madvise" semantics for defrag. The > current semantics have not had enough time in the field to generate > reports. I expect lag before users ar

Re: [patch] mm, thp: always direct reclaim for MADV_HUGEPAGE even when deferred

2017-01-03 Thread David Rientjes
On Mon, 2 Jan 2017, Vlastimil Babka wrote: > I'm late to the thread (I did read it fully though), so instead of > multiple responses, I'll just list my observations here: > > - "defer", e.g. background kswapd+compaction is not a silver bullet, it > will also affect the system. Mel already mention

Re: [patch] mm, thp: always direct reclaim for MADV_HUGEPAGE even when deferred

2017-01-03 Thread David Rientjes
On Tue, 3 Jan 2017, Mel Gorman wrote: > > I sympathize with that, I've dealt with a number of issues that we have > > encountered where thp defrag was either at fault or wasn't, and there were > > also suggestions to set defrag to "madvise" to rule it out and that > > impacted other users. > >

Re: [patch] mm, thp: always direct reclaim for MADV_HUGEPAGE even when deferred

2016-12-30 Thread David Rientjes
On Fri, 30 Dec 2016, Mel Gorman wrote: > Michal is correct in that my intent for defer was to have "never stall" > as the default behaviour. This was because of the number of severe stalls > users experienced that lead to recommendations in tuning guides to always > disable THP. I'd also seen mul

Re: [patch] mm, thp: always direct reclaim for MADV_HUGEPAGE even when deferred

2016-12-28 Thread David Rientjes
On Wed, 28 Dec 2016, Michal Hocko wrote: > I do care more about _users_ and their _experience_ than what > application _writers_ think is the best. This is the whole point > of giving the defrag tunable. madvise(MADV_HUGEPAGE) is just a hint to > the system that using transparent hugepages is _pre

Re: [patch] mm, thp: always direct reclaim for MADV_HUGEPAGE even when deferred

2016-12-27 Thread David Rientjes
On Tue, 27 Dec 2016, Michal Hocko wrote: > > Important to who? > > To all users who want to have THP without stalls experience. This was > the whole point of 444eb2a449ef ("mm: thp: set THP defrag by default to > madvise and add a stall-free defrag option"). > THEY DO NOT STALL. If the applica

Re: [patch] mm, thp: always direct reclaim for MADV_HUGEPAGE even when deferred

2016-12-26 Thread David Rientjes
On Mon, 26 Dec 2016, Michal Hocko wrote: > But my primary argument is that if you tweak "defer" value behavior > then you lose the only "stall free yet allow background compaction" > option. That option is really important. Important to who? What regresses if we kick a background kthread to comp

Re: [patch] mm, thp: always direct reclaim for MADV_HUGEPAGE even when deferred

2016-12-23 Thread David Rientjes
On Fri, 23 Dec 2016, Michal Hocko wrote: > > We have no way to compact memory for users who are not using > > MADV_HUGEPAGE, > > yes we have. it is defrag=always. If you do not want direct compaction > and the resulting allocation stalls then you have to rely on kcompactd > which is something we

Re: [patch] mm, thp: always direct reclaim for MADV_HUGEPAGE even when deferred

2016-12-23 Thread David Rientjes
On Fri, 23 Dec 2016, Michal Hocko wrote: > > The offering of defer breaks backwards compatibility with previous > > settings of defrag=madvise, where we could set madvise(MADV_HUGEPAGE) on > > .text segment remap and try to force thp backing if available but not > > directly reclaim for non VM_

Re: [PATCH 1/4] mm: add new mmgrab() helper

2016-12-22 Thread David Rientjes
o provided most of the kerneldoc comment.) > > Cc: Andrew Morton > Acked-by: Michal Hocko > Signed-off-by: Vegard Nossum Acked-by: David Rientjes for the series

Re: [patch] mm, thp: always direct reclaim for MADV_HUGEPAGE even when deferred

2016-12-22 Thread David Rientjes
On Thu, 22 Dec 2016, Michal Hocko wrote: > > Currently, when defrag is set to "madvise", thp allocations will direct > > reclaim. However, when defrag is set to "defer", all thp allocations do > > not attempt reclaim regardless of MADV_HUGEPAGE. > > > > This patch always directly reclaims for MA

[patch] mm, thp: always direct reclaim for MADV_HUGEPAGE even when deferred

2016-12-21 Thread David Rientjes
tion"). In this form, "defer" is a stronger, more heavyweight version of "madvise". Signed-off-by: David Rientjes --- Documentation/vm/transhuge.txt | 7 +-- mm/huge_memory.c | 10 ++ 2 files changed, 11 insertions(+), 6 deletions(-) diff --git a/Doc

Re: [patch] mm, compaction: add vmstats for kcompactd work

2016-12-12 Thread David Rientjes
mpact_free_scanned" for compatibility. > > > > It could be argued that explicitly triggered compaction could also be > > tracked separately, and that could be added if others find it useful. > > > > Signed-off-by: David Rientjes > > A bit of downside i

[patch] mm, compaction: add vmstats for kcompactd work

2016-12-07 Thread David Rientjes
ively. These values are still accounted for in the general "compact_migrate_scanned" and "compact_free_scanned" for compatibility. It could be argued that explicitly triggered compaction could also be tracked separately, and that could be added if others find it useful. Signe

[patch -mm] mm, slab: maintain total slab count instead of active count

2016-12-04 Thread David Rientjes
be inferred by the difference in number of total objects and number of active objects. Suggested-by: Joonsoo Kim Signed-off-by: David Rientjes --- For -mm because this depends on mm-slab-faster-active-and-free-stats.patch mm/slab.c | 70

Re: [patch] mm, slab: faster active and free stats

2016-11-29 Thread David Rientjes
oids active slab tracking when a slab goes from free to partial or partial to free. Suggested-by: Joonsoo Kim Signed-off-by: David Rientjes --- mm/slab.c | 48 +--- mm/slab.h | 4 ++-- 2 files changed, 23 insertions(+), 29 deletions(-) diff --git a/

[patch v2 1/2] mm, zone: track number of movable free pages

2016-11-29 Thread David Rientjes
total number of free pages. This is exported to userspace as part of a new /proc/vmstat field. Signed-off-by: David Rientjes --- v2: do not track free pages per migratetype since page allocator stress testing reveals this tracking can impact workloads and there is no substantial benefit

[patch v2 2/2] mm, compaction: avoid async compaction if most free memory is ineligible

2016-11-29 Thread David Rientjes
even start async compaction in a scenario where free memory cannot be isolated as a migration target. This patch does not deem async compaction to be suitable when the watermark checks using only the amount of free movable memory fails. Signed-off-by: David Rientjes --- v2: convert to per-zone

Re: [PATCH] RFC: dm: avoid the mutex lock in dm_bufio_shrink_count()

2016-11-28 Thread David Rientjes
to be precise, > so we don't need to take the dm-bufio lock. > > Signed-off-by: Mikulas Patocka Acked-by: David Rientjes

Re: Linux 4.9-rc6

2016-11-21 Thread David Rientjes
On Sun, 20 Nov 2016, Eric Dumazet wrote: > Another potential issue with CONFIG_VMAP_STACK is that we make no > attempt to allocate 4 consecutive pages. > > Even if we have plenty of memory, 4 calls to alloc_page() are likely to > give us 4 pages in completely different locations. > > Here I prin

Re: [PATCH] RFC: dm: avoid the mutex lock in dm_bufio_shrink_count()

2016-11-17 Thread David Rientjes
On Thu, 17 Nov 2016, Douglas Anderson wrote: > diff --git a/drivers/md/dm-bufio.c b/drivers/md/dm-bufio.c > index b3ba142e59a4..885ba5482d9f 100644 > --- a/drivers/md/dm-bufio.c > +++ b/drivers/md/dm-bufio.c > @@ -89,6 +89,7 @@ struct dm_bufio_client { > > struct list_head lru[LIST_SIZE];

Re: [patch 1/2] mm, zone: track number of pages in free area by migratetype

2016-11-17 Thread David Rientjes
Yes, sorry, I'll fix that in v2. I think less than half a kilobyte for each memory zone is satisfactory for extra tracking, compaction improvements, and optimized /proc/pagetypeinfo, though. > > Signed-off-by: David Rientjes > > I'd be for this if there are no perfor

[patch 1/2] mm, zone: track number of pages in free area by migratetype

2016-11-16 Thread David Rientjes
-zone metadata at worst by 48 bytes per memory zone (when CONFIG_CMA and CONFIG_MEMORY_ISOLATION are enabled). Signed-off-by: David Rientjes --- include/linux/mmzone.h | 3 ++- mm/compaction.c| 4 ++-- mm/page_alloc.c| 47 --- mm/vms

[patch 2/2] mm, compaction: avoid async compaction if most free memory is ineligible

2016-11-16 Thread David Rientjes
above would easily trigger earlier when async compaction will become very expensive. It would also be possible to check zone watermarks in __compaction_suitable() using the amount of MIGRATE_MOVABLE memory as an alternative. Signed-off-by: David Rientjes --- fs/buffer.c| 2

Re: [patch] mm, slab: faster active and free stats

2016-11-11 Thread David Rientjes
On Fri, 11 Nov 2016, Joonsoo Kim wrote: > Hello, David. > > Maintaining active/free_slab counters looks so complex. And, I think > that we don't need to maintain these counters for faster slabinfo. > Key point is to remove iterating n->slabs_partial list. > > We can calculate active slab/object

Re: [patch] mm, slab: faster active and free stats

2016-11-09 Thread David Rientjes
On Tue, 8 Nov 2016, Andrew Morton wrote: > > Reading /proc/slabinfo or monitoring slabtop(1) can become very expensive > > if there are many slab caches and if there are very lengthy per-node > > partial and/or free lists. > > > > Commit 07a63c41fa1f ("mm/slab: improve performance of gathering sl

[patch] mm, slab: faster active and free stats

2016-11-08 Thread David Rientjes
ather than iterating the lists at runtime when reading /proc/slabinfo. [rient...@google.com: changelog] Signed-off-by: Greg Thelen Signed-off-by: David Rientjes --- mm/slab.c | 117 +- mm/slab.h | 3 +- 2 files changed, 49 inserti

Re: [PATCH v2] memcg: Prevent memcg caches to be both OFF_SLAB & OBJFREELIST_SLAB

2016-11-02 Thread David Rientjes
On Wed, 2 Nov 2016, Thomas Garnier wrote: > >> diff --git a/mm/slab.h b/mm/slab.h > >> index 9653f2e..58be647 100644 > >> --- a/mm/slab.h > >> +++ b/mm/slab.h > >> @@ -144,6 +144,9 @@ static inline unsigned long kmem_cache_flags(unsigned > >> long object_size, > >> > >> #define CACHE_CREATE_MASK

Re: [PATCH v3 0/1] mm/mempolicy.c: forbid static or relative flags for local NUMA mode

2016-10-31 Thread David Rientjes
mpol_rebind_preferred()) or when just printing > the mempolicy structure (/proc/PID/numa_maps). > Isolated tests done. > > Signed-off-by: Piotr Kwapulinski Acked-by: David Rientjes

Re: [PATCH v2] memcg: Prevent memcg caches to be both OFF_SLAB & OBJFREELIST_SLAB

2016-10-31 Thread David Rientjes
On Mon, 31 Oct 2016, Thomas Garnier wrote: > While testing OBJFREELIST_SLAB integration with pagealloc, we found a > bug where kmem_cache(sys) would be created with both CFLGS_OFF_SLAB & > CFLGS_OBJFREELIST_SLAB. > > The original kmem_cache is created early making OFF_SLAB not possible. > When km

[patch] mm, thp: avoid unlikely branches for split_huge_pmd

2016-10-18 Thread David Rientjes
() branch. Avoid the unlikely() branch when in a context where pmd is known to be good for __split_huge_pmd() directly. Signed-off-by: David Rientjes --- include/linux/huge_mm.h | 2 ++ mm/memory.c | 4 ++-- mm/mempolicy.c | 2 +- mm/mprotect.c | 2 +- 4 files

Re: Question on kzalloc and GFP_DMA32

2016-09-28 Thread David Rientjes
On Tue, 27 Sep 2016, Ben Greear wrote: > > I have been running this patch for a while: > > ath10k: Use GPF_DMA32 for firmware swap memory. > > This fixes OS crash when using QCA 9984 NIC on x86-64 system > without vt-d enabled. > > Also tested on ea8500 with 9980, and x86-64 w

Re: [PATCH 3/5] mm/vmalloc.c: correct lazy_max_pages() return value

2016-09-21 Thread David Rientjes
On Thu, 22 Sep 2016, zijun_hu wrote: > On 2016/9/22 5:21, David Rientjes wrote: > > On Wed, 21 Sep 2016, zijun_hu wrote: > > > >> From: zijun_hu > >> > >> correct lazy_max_pages() return value if the number of online > >> CPUs is power of 2

Re: [PATCH 1/5] mm/vmalloc.c: correct a few logic error for __insert_vmap_area()

2016-09-21 Thread David Rientjes
On Thu, 22 Sep 2016, zijun_hu wrote: > > We don't support inserting when va->va_start == tmp_va->va_end, plain and > > simple. There's no reason to do so. NACK to the patch. > > > i am sorry i disagree with you because > 1) in almost all context of vmalloc, original logic treat the special cas

Re: [PATCH 1/5] mm/vmalloc.c: correct a few logic error for __insert_vmap_area()

2016-09-21 Thread David Rientjes
On Thu, 22 Sep 2016, zijun_hu wrote: > >> correct a few logic error for __insert_vmap_area() since the else > >> if condition is always true and meaningless > >> > >> in order to fix this issue, if vmap_area inserted is lower than one > >> on rbtree then walk around left branch; if higher then rig

Re: [PATCH 3/5] mm/vmalloc.c: correct lazy_max_pages() return value

2016-09-21 Thread David Rientjes
On Wed, 21 Sep 2016, zijun_hu wrote: > From: zijun_hu > > correct lazy_max_pages() return value if the number of online > CPUs is power of 2 > > Signed-off-by: zijun_hu > --- > mm/vmalloc.c | 4 +++- > 1 file changed, 3 insertions(+), 1 deletion(-) > > diff --git a/mm/vmalloc.c b/mm/vmalloc.

Re: [PATCH 2/5] mm/vmalloc.c: simplify /proc/vmallocinfo implementation

2016-09-21 Thread David Rientjes
On Wed, 21 Sep 2016, zijun_hu wrote: > diff --git a/mm/vmalloc.c b/mm/vmalloc.c > index cc6ecd6..a125ae8 100644 > --- a/mm/vmalloc.c > +++ b/mm/vmalloc.c > @@ -2576,32 +2576,13 @@ void pcpu_free_vm_areas(struct vm_struct **vms, int > nr_vms) > static void *s_start(struct seq_file *m, loff_t *pos

Re: [PATCH 1/5] mm/vmalloc.c: correct a few logic error for __insert_vmap_area()

2016-09-21 Thread David Rientjes
On Wed, 21 Sep 2016, zijun_hu wrote: > From: zijun_hu > > correct a few logic error for __insert_vmap_area() since the else > if condition is always true and meaningless > > in order to fix this issue, if vmap_area inserted is lower than one > on rbtree then walk around left branch; if higher t

Re: [PATCH] mm/mempolicy.c: forbid static or relative flags for local NUMA mode

2016-09-20 Thread David Rientjes
On Tue, 20 Sep 2016, Piotr Kwapulinski wrote: > > There wasn't an MPOL_LOCAL when I introduced either of these flags, it's > > an oversight to allow them to be passed. > > > > Want to try to update set_mempolicy(2) with the procedure outlined in > > https://www.kernel.org/doc/man-pages/patches.

Re: [PATCH] mm/mempolicy.c: forbid static or relative flags for local NUMA mode

2016-09-19 Thread David Rientjes
red()) or when just printing > the mempolicy structure (/proc/PID/numa_maps). > Isolated tests done. > > Signed-off-by: Piotr Kwapulinski Acked-by: David Rientjes There wasn't an MPOL_LOCAL when I introduced either of these flags, it's an oversight to allow them to be passed. W

Re: [PATCH V4] mm: Add sysfs interface to dump each node's zonelist information

2016-09-19 Thread David Rientjes
On Sat, 17 Sep 2016, Anshuman Khandual wrote: > > I'm questioning if this information can be inferred from information > > already in /proc/zoneinfo and sysfs. We know the no-fallback zonelist is > > going to include the local node, and we know the other zonelists are > > either node ordered o

Re: [PATCH V4] mm: Add sysfs interface to dump each node's zonelist information

2016-09-12 Thread David Rientjes
On Mon, 12 Sep 2016, Anshuman Khandual wrote: > >> > after memory or node hot[un]plug is desirable. This change adds one > >> > new sysfs interface (/sys/devices/system/memory/system_zone_details) > >> > which will fetch and dump this information. > > Doesn't this violate the "one value per file"

Re: [PATCH] x86: Put the num_processors++ code in a more suitable position

2016-09-06 Thread David Rientjes
disabled_cpus. > > Signed-off-by: Dou Liyang Acked-by: David Rientjes

Re: [RESEND PATCH v2] memory-hotplug: fix store_mem_state() return value

2016-08-31 Thread David Rientjes
On Wed, 31 Aug 2016, Reza Arbab wrote: > > Nope, the return value of changing state from online to online was > > established almost 11 years ago in commit 3947be1969a9. > > Fair enough. So if online-to-online is -EINVAL, online-to-online for state is -EINVAL, it has been since 2005. > 1. Shou

Re: [RESEND PATCH v2] memory-hotplug: fix store_mem_state() return value

2016-08-31 Thread David Rientjes
On Wed, 31 Aug 2016, Reza Arbab wrote: > > The correct fix is for store_mem_state() to return -EINVAL when > > device_online() returns non-zero. > > Let me put it to you this way--which one of these sysfs operations is behaving > correctly? > > # cd /sys/devices/system/memory/memory0 >

Re: [PATCH 4/4] selftests/vm: add test for mlock() when areas are intersected.

2016-08-31 Thread David Rientjes
On Tue, 30 Aug 2016, wei.guo.si...@gmail.com wrote: > From: Simon Guo > > This patch adds mlock() test for multiple invocation on > the same address area, and verify it doesn't mess the > rlimit mlock limitation. > Thanks for expanding mlock testing. I'm wondering if you are interested in mo

Re: [PATCH v2] cpu: Fix node state for whether it contains CPU

2016-08-31 Thread David Rientjes
ning node state. > That would mean that when node_reclaim_mode is enabled that we weren't properly returning NODE_RECLAIM_NOSCAN if a remote node had its own cpus and PGDAT_RECLAIM_LOCKED wasn't already set, so this seems like it could result in a performance improvement.

Re: [RESEND PATCH v2] memory-hotplug: fix store_mem_state() return value

2016-08-31 Thread David Rientjes
On Wed, 31 Aug 2016, Andrew Morton wrote: > > Attempting to online memory which is already online will cause this: > > > > 1. store_mem_state() called with buf="online" > > 2. device_online() returns 1 because device is already online > > 3. store_mem_state() returns 1 > > 4. calling code interpr

Re: [PATCH] mm: clarify COMPACTION Kconfig text

2016-08-25 Thread David Rientjes
On Thu, 25 Aug 2016, Michal Hocko wrote: > > I don't believe it has been an issue in the past for any archs that > > don't use thp. > > Well, fragmentation is a real problem and order-0 reclaim will be never > anywhere close to reliably provide higher order pages. Well, reclaiming > a lot of memo

Re: + stackdepot-fix-mempolicy-use-after-free.patch added to -mm tree

2016-08-24 Thread David Rientjes
On Fri, 19 Aug 2016, a...@linux-foundation.org wrote: > From: Vegard Nossum > Subject: stackdepot: fix mempolicy use-after-free > > This patch fixes the following: > > BUG: KASAN: use-after-free in alloc_pages_current+0x363/0x370 at addr > 88010b48102c > Read of size 2 by task trin

Re: [PATCH] mm: clarify COMPACTION Kconfig text

2016-08-24 Thread David Rientjes
On Tue, 23 Aug 2016, Michal Hocko wrote: > From: Michal Hocko > > The current wording of the COMPACTION Kconfig help text doesn't > emphasise that disabling COMPACTION might cripple the page allocator > which relies on the compaction quite heavily for high order requests and > an unexpected OOM

Re: [PATCH] slub: Drop bogus inline for fixup_red_left()

2016-08-03 Thread David Rientjes
nearest_obj()") > Signed-off-by: Geert Uytterhoeven Acked-by: David Rientjes

Re: [PATCH 3/8] mm, page_alloc: don't retry initial attempt in slowpath

2016-07-20 Thread David Rientjes
nt to slowpath just to wake up > kswapd and then succeed on min watermark > 2 - try all zones with min watermark before resorting to no watermark > (if allowed), so we don't needlessly put below min watermark the first > zone in zonelist, while some later zone would still be above watermark > The second point makes sense, thanks! Acked-by: David Rientjes

Re: [RFC PATCH 1/2] mempool: do not consume memory reserves from the reclaim path

2016-07-20 Thread David Rientjes
On Wed, 20 Jul 2016, Michal Hocko wrote: > > Any mempool_alloc() user that then takes a contended mutex can do this. > > An example: > > > > taskA taskB taskC > > - - - > > mempool_alloc(a) > > mutex_lock(b) > >

Re: [PATCH 5/8] mm, page_alloc: make THP-specific decisions more generic

2016-07-19 Thread David Rientjes
On Mon, 18 Jul 2016, Vlastimil Babka wrote: > Since THP allocations during page faults can be costly, extra decisions are > employed for them to avoid excessive reclaim and compaction, if the initial > compaction doesn't look promising. The detection has never been perfect as > there is no gfp fla

Re: [PATCH 4/8] mm, page_alloc: restructure direct compaction handling in slowpath

2016-07-19 Thread David Rientjes
On Mon, 18 Jul 2016, Vlastimil Babka wrote: > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > index 30443804f156..a04a67745927 100644 > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -3510,7 +3510,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int > order, > struct page *page = N

Re: [PATCH 3/8] mm, page_alloc: don't retry initial attempt in slowpath

2016-07-19 Thread David Rientjes
On Mon, 18 Jul 2016, Vlastimil Babka wrote: > After __alloc_pages_slowpath() sets up new alloc_flags and wakes up kswapd, it > first tries get_page_from_freelist() with the new alloc_flags, as it may > succeed e.g. due to using min watermark instead of low watermark. It makes > sense to do this

Re: [PATCH 2/8] mm, page_alloc: set alloc_flags only once in slowpath

2016-07-19 Thread David Rientjes
RMARKS from > gfp_to_alloc_flags() to gfp_pfmemalloc_allowed(). This means we don't have to > mask out ALLOC_NO_WATERMARKS in numerous places in __alloc_pages_slowpath() > anymore. The only two tests for the flag can instead call > gfp_pfmemalloc_allowed(). > > Signed-off-by: Vlastimil Babka

Re: [PATCH 1/8] mm, compaction: don't isolate PageWriteback pages in MIGRATE_SYNC_LIGHT mode

2016-07-19 Thread David Rientjes
that's not what we usually expect, so probably better not to isolate it. > > When tested by stress-highalloc from mmtests, this has reduced the number of > page migrate failures by 60-70%. > > Signed-off-by: Hugh Dickins > Signed-off-by: Vlastimil Babka > Acked-by: Michal Hocko Acked-by: David Rientjes

Re: [RFC PATCH 1/2] mempool: do not consume memory reserves from the reclaim path

2016-07-19 Thread David Rientjes
On Tue, 19 Jul 2016, Johannes Weiner wrote: > Mempool guarantees forward progress by having all necessary memory > objects for the guaranteed operation in reserve. Think about it this > way: you should be able to delete the pool->alloc() call entirely and > still make reliable forward progress. It

Re: [PATCH] sh-DWARF: Delete unnecessary checks before the function call "mempool_destroy"

2016-07-19 Thread David Rientjes
On Tue, 19 Jul 2016, SF Markus Elfring wrote: > > From: Markus Elfring > > Date: Mon, 16 Nov 2015 08:20:36 +0100 > > > > The mempool_destroy() function tests whether its argument is NULL > > and then returns immediately. Thus the test around the calls is not needed. > > > > This issue was detec
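
The point quoted above (a destroy function that returns immediately on NULL makes a caller-side NULL test redundant) can be sketched in userspace C. This is only an illustration under stated assumptions: struct demo_pool, demo_pool_create(), demo_pool_destroy(), demo_cleanup() and the destroy_calls_with_null counter are hypothetical names invented for the sketch and are not the kernel's mempool API.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a pool object; NOT the kernel's mempool_t. */
struct demo_pool {
    void *elements;
};

/* Demo-only instrumentation: counts destroy calls that received NULL. */
static int destroy_calls_with_null;

/* Like the function described above, return immediately on NULL. */
static void demo_pool_destroy(struct demo_pool *pool)
{
    if (!pool) {
        destroy_calls_with_null++;
        return;
    }
    free(pool->elements);
    free(pool);
}

/* Allocate a pool with a backing buffer; returns NULL on failure. */
static struct demo_pool *demo_pool_create(size_t bytes)
{
    struct demo_pool *pool = malloc(sizeof(*pool));

    if (!pool)
        return NULL;
    pool->elements = malloc(bytes);
    if (!pool->elements) {
        free(pool);
        return NULL;
    }
    return pool;
}

/*
 * Caller-side cleanup written without an "if (*slot)" guard: the callee
 * already handles NULL, so the guard would only duplicate that test.
 */
static void demo_cleanup(struct demo_pool **slot)
{
    assert(slot != NULL);
    demo_pool_destroy(*slot);
    *slot = NULL;
}
```

Calling demo_cleanup() twice on the same slot is safe here: the second call passes NULL to demo_pool_destroy(), which simply returns.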

Re: [PATCH -next] mm/slab: use list_move instead of list_del/list_add

2016-07-19 Thread David Rientjes
On Tue, 19 Jul 2016, Wei Yongjun wrote: > From: Wei Yongjun > > Using list_move() instead of list_del() + list_add(). > ... to prevent needlessly poisoning the next and prev values. > Signed-off-by: Wei Yongjun Acked-by: David Rientjes
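
A rough userspace sketch of the remark above: deleting an entry and re-adding it writes poison values into its next and prev pointers only for them to be overwritten at once, while a move skips those stores. The code below is a simplified re-implementation for illustration, not the kernel's include/linux/list.h; DEMO_POISON1, DEMO_POISON2, poison_writes and demo_poison_stores() are assumptions made up for the demo.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's struct list_head (simplified). */
struct list_head {
    struct list_head *next, *prev;
};

/* Illustrative poison values; assumed for this demo only. */
#define DEMO_POISON1 ((struct list_head *)0x100)
#define DEMO_POISON2 ((struct list_head *)0x200)

static int poison_writes; /* pointer stores spent on poisoning */

static void init_list_head(struct list_head *head)
{
    head->next = head;
    head->prev = head;
}

/* Unlink whatever sits between prev and next. */
static void __list_del(struct list_head *prev, struct list_head *next)
{
    next->prev = prev;
    prev->next = next;
}

/* Insert entry right after head. */
static void list_add(struct list_head *entry, struct list_head *head)
{
    struct list_head *next = head->next;

    next->prev = entry;
    entry->next = next;
    entry->prev = head;
    head->next = entry;
}

/* Unlink and poison the entry's pointers. */
static void list_del(struct list_head *entry)
{
    __list_del(entry->prev, entry->next);
    entry->next = DEMO_POISON1;
    entry->prev = DEMO_POISON2;
    poison_writes += 2;
}

/* Unlink and re-insert in one step; no poisoning needed. */
static void list_move(struct list_head *entry, struct list_head *head)
{
    __list_del(entry->prev, entry->next);
    list_add(entry, head);
}

/* Move one entry from list "a" to list "b"; return poison stores used. */
static int demo_poison_stores(int use_list_move)
{
    struct list_head a, b, item;

    init_list_head(&a);
    init_list_head(&b);
    list_add(&item, &a);
    poison_writes = 0;

    if (use_list_move) {
        list_move(&item, &b);
    } else {
        list_del(&item);
        list_add(&item, &b);
    }

    /* Either way the entry ends up on "b" and "a" is empty again. */
    assert(b.next == &item && item.prev == &b);
    assert(a.next == &a && a.prev == &a);
    return poison_writes;
}
```

In this sketch demo_poison_stores(0) reports two poison stores for the del+add pair and demo_poison_stores(1) reports none for the move, with the same resulting list layout.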

Re: [PATCH 1/2] mem-hotplug: use GFP_HIGHUSER_MOVABLE in, alloc_migrate_target()

2016-07-19 Thread David Rientjes
On Tue, 19 Jul 2016, Xishi Qiu wrote: > Memory offline could happen on both movable zone and non-movable zone, and we > can offline the whole node if the zone is movable_zone(the node only has one > movable_zone), and if the zone is normal_zone, we cannot offline the whole > node, > because some

Re: [RFC PATCH 1/2] mempool: do not consume memory reserves from the reclaim path

2016-07-18 Thread David Rientjes
On Mon, 18 Jul 2016, Michal Hocko wrote: > David Rientjes was objecting that such an approach wouldn't help if the > oom victim was blocked on a lock held by process doing mempool_alloc. This > is very similar to other oom deadlock situations and we have oom_reaper > to deal w

Re: System freezes after OOM

2016-07-18 Thread David Rientjes
On Mon, 18 Jul 2016, Michal Hocko wrote: > > There's > > two fundamental ways to go about it: (1) ensure mempool_alloc() can make > > forward progress (whether that's by way of gfp flags or access to memory > > reserves, which may depend on the process context such as PF_MEMALLOC) or > > (2) r

Re: System freezes after OOM

2016-07-15 Thread David Rientjes
On Fri, 15 Jul 2016, Mikulas Patocka wrote: > And what about the oom reaper? It should have freed all victim's pages > even if the victim is looping in mempool_alloc. Why the oom reaper didn't > free up memory? > Is that possible with mlock or shared memory? Nope. The oom killer does not ha

Re: System freezes after OOM

2016-07-15 Thread David Rientjes
On Fri, 15 Jul 2016, Michal Hocko wrote: > > If PF_MEMALLOC context is allocating too much memory reserves, then I'd > > argue that is a problem independent of using mempool_alloc() since > > mempool_alloc() can evolve directly into a call to the page allocator. > > How does such a process gua

Re: System freezes after OOM

2016-07-15 Thread David Rientjes
On Fri, 15 Jul 2016, Mikulas Patocka wrote: > > There is no guarantee that _anything_ can return memory to the mempool, > > You misunderstand mempools if you make such claims. > > There is in fact guarantee that objects will be returned to mempool. In > the past I reviewed device mapper thoroug

Re: System freezes after OOM

2016-07-15 Thread David Rientjes
On Fri, 15 Jul 2016, Mikulas Patocka wrote: > > Umm, show me an explicit guarantee where the oom reaper will free memory > > such that other threads may return memory to this process's mempool so it > > can make forward progress in mempool_alloc() without the need of utilizing > > memory reserv

Re: [PATCH] mem-hotplug: use GFP_HIGHUSER_MOVABLE and alloc from next node in alloc_migrate_target()

2016-07-14 Thread David Rientjes
On Thu, 14 Jul 2016, Xishi Qiu wrote: > alloc_migrate_target() is called from migrate_pages(), and the page > is always from user space, so we can add __GFP_HIGHMEM directly. > > Second, when we offline a node, the new page should alloced from other > nodes instead of the current node, because re

Re: System freezes after OOM

2016-07-14 Thread David Rientjes
On Fri, 15 Jul 2016, Tetsuo Handa wrote: > Whether the OOM reaper will free some memory no longer matters. Instead, > whether the OOM reaper will let the OOM killer select next OOM victim matters. > > Are you aware that the OOM reaper will let the OOM killer select next OOM > victim (currently by

Re: System freezes after OOM

2016-07-14 Thread David Rientjes
On Thu, 14 Jul 2016, Michal Hocko wrote: > > It prevents the whole system from livelocking due to an oom killed process > > stalling forever waiting for mempool_alloc() to return. No other threads > > may be oom killed while waiting for it to exit. > > But it is true that the patch has uninten

Re: System freezes after OOM

2016-07-14 Thread David Rientjes
On Thu, 14 Jul 2016, Tetsuo Handa wrote: > David Rientjes wrote: > > On Wed, 13 Jul 2016, Mikulas Patocka wrote: > > > > > What are the real problems that f9054c70d28bc214b2857cf8db8269f4f45a5e23 > > > tries to fix? > > > > > > > It pr

Re: System freezes after OOM

2016-07-14 Thread David Rientjes
On Thu, 14 Jul 2016, Mikulas Patocka wrote: > > schedule > > schedule_timeout > > io_schedule_timeout > > mempool_alloc > > __split_and_process_bio > > dm_request > > generic_make_request > > submit_bio > > mpage_readpages > > ext4_readpages > > __do_page_cache_readahead > > ra_submit > > filemap_

Re: System freezes after OOM

2016-07-13 Thread David Rientjes
On Wed, 13 Jul 2016, Tetsuo Handa wrote: > I wonder whether commit f9054c70d28bc214 ("mm, mempool: only set > __GFP_NOMEMALLOC if there are free elements") is doing correct thing. > It says > > If an oom killed thread calls mempool_alloc(), it is possible that it'll > loop forever if ther

Re: System freezes after OOM

2016-07-13 Thread David Rientjes
On Wed, 13 Jul 2016, Mikulas Patocka wrote: > What are the real problems that f9054c70d28bc214b2857cf8db8269f4f45a5e23 > tries to fix? > It prevents the whole system from livelocking due to an oom killed process stalling forever waiting for mempool_alloc() to return. No other threads may be

Re: [PATCH 2/3] mm, meminit: Always return a valid node from early_pfn_to_nid

2016-07-12 Thread David Rientjes
valid PFNs. No caller of early_pfn_to_nid > cares except early_page_uninitialised. This patch has early_pfn_to_nid > always return a valid node. > > Signed-off-by: Mel Gorman > Cc: # 4.2+ Acked-by: David Rientjes This makes me wonder about meminit_pfn_in_nid(), however, since

Re: [PATCH 3/3] mm, meminit: Ensure node is online before checking whether pages are uninitialised

2016-07-12 Thread David Rientjes
PFN order. This is not guaranteed so this patch adds robustness by always > checking if the node being checked is online. > > Signed-off-by: Mel Gorman > Cc: # 4.2+ Acked-by: David Rientjes

Re: [patch] mm, compaction: make sure freeing scanner isn't persistently expensive

2016-07-11 Thread David Rientjes
On Thu, 30 Jun 2016, Joonsoo Kim wrote: > We need to find a root cause of this problem, first. > > I guess that this problem would happen when isolate_freepages_block() > early stop due to watermark check (if your patch is applied to your > kernel). If scanner meets, cached pfn will be reset and

[patch for-4.7] mm, compaction: prevent VM_BUG_ON when terminating freeing scanner fix

2016-07-11 Thread David Rientjes
cc->free_pfn go > backward though it would not be a big problem. Just leaving > isolate_start_pfn as isolate_freepages_block returns would be a proper > solution here. > I guess, but I don't see what value there is in starting free page isolation within a pageblock
