Re: MADV_HUGEPAGE vs. NUMA semantic (was: Re: [LKP] [mm] ac5b2c1891: vm-scalability.throughput -61.3% regression)

2018-12-07 Thread David Rientjes
On Fri, 7 Dec 2018, Vlastimil Babka wrote: > >> But *that* in turn makes for other possible questions: > >> > >> - if the reason we couldn't get a local hugepage is that we're simply > >> out of local memory (huge *or* small), then maybe a remote hugepage is > >> better. > >> > >> Note that

Re: [patch for-4.20] Revert "mm, thp: consolidate THP gfp handling into alloc_hugepage_direct_gfpmask"

2018-12-07 Thread David Rientjes
On Fri, 7 Dec 2018, Michal Hocko wrote: > > This reverts commit 89c83fb539f95491be80cdd5158e6f0ce329e317. > > > > There are a couple of issues with 89c83fb539f9 independent of its partial > > revert in 2f0799a0ffc0 ("mm, thp: restore node-local hugepage > > allocations"): > > > > Firstly, the

[patch v2 for-4.20] Revert "mm, thp: consolidate THP gfp handling into alloc_hugepage_direct_gfpmask"

2018-12-07 Thread David Rientjes
2f0799a0ffc0. The result is the same thp allocation policy for 4.20 that was in 4.19. Fixes: 89c83fb539f9 ("mm, thp: consolidate THP gfp handling into alloc_hugepage_direct_gfpmask") Fixes: 2f0799a0ffc0 ("mm, thp: restore node-local hugepage allocations") Signed-off-by: David Rientjes ---

Re: [patch for-4.20] Revert "mm, thp: consolidate THP gfp handling into alloc_hugepage_direct_gfpmask"

2018-12-07 Thread David Rientjes
On Fri, 7 Dec 2018, Vlastimil Babka wrote: > > This reverts commit 89c83fb539f95491be80cdd5158e6f0ce329e317. > > > > There are a couple of issues with 89c83fb539f9 independent of its partial > > revert in 2f0799a0ffc0 ("mm, thp: restore node-local hugepage > > allocations"): > > > > Firstly,

Re: MADV_HUGEPAGE vs. NUMA semantic (was: Re: [LKP] [mm] ac5b2c1891: vm-scalability.throughput -61.3% regression)

2018-12-06 Thread David Rientjes
On Thu, 6 Dec 2018, Michal Hocko wrote: > MADV_HUGEPAGE changes the picture because the caller expressed a need > for THP and is willing to go the extra mile to get it. That involves > allocation latency and as of now also a potential remote access. We do > not have complete agreement on the latter

Re: [LKP] [mm] ac5b2c1891: vm-scalability.throughput -61.3% regression

2018-12-06 Thread David Rientjes
On Wed, 5 Dec 2018, Linus Torvalds wrote: > > Ok, I've applied David's latest patch. > > > > I'm not at all objecting to tweaking this further, I just didn't want > > to have this regression stand. > > Hmm. Can somebody (David?) also perhaps try to state what the > different latency impacts end

[patch for-4.20] Revert "mm, thp: consolidate THP gfp handling into alloc_hugepage_direct_gfpmask"

2018-12-06 Thread David Rientjes
THP gfp handling into alloc_hugepage_direct_gfpmask") Fixes: 2f0799a0ffc0 ("mm, thp: restore node-local hugepage allocations") Signed-off-by: David Rientjes --- include/linux/gfp.h | 12 mm/huge_memory.c | 27 +-- mm/mempolicy.c | 32 ++

Re: [patch v2 for-4.20] mm, thp: restore node-local hugepage allocations

2018-12-06 Thread David Rientjes
On Wed, 5 Dec 2018, David Rientjes wrote: > This is a full revert of ac5b2c18911f ("mm: thp: relax __GFP_THISNODE for > MADV_HUGEPAGE mappings") and a partial revert of 89c83fb539f9 ("mm, thp: > consolidate THP gfp handling into alloc_hugepage_direct_gfpmask"). >

Re: [LKP] [mm] ac5b2c1891: vm-scalability.throughput -61.3% regression

2018-12-05 Thread David Rientjes
On Wed, 5 Dec 2018, Andrea Arcangeli wrote: > __GFP_COMPACT_ONLY gave hope it could give some middle ground but > it shows awful compaction results, it basically destroys compaction > effectiveness and we know why (COMPACT_SKIPPED must call reclaim or > compaction can't succeed because there's

[patch v2 for-4.20] mm, thp: restore node-local hugepage allocations

2018-12-05 Thread David Rientjes
Restore __GFP_THISNODE for thp allocations. Fixes: ac5b2c18911f ("mm: thp: relax __GFP_THISNODE for MADV_HUGEPAGE mappings") Fixes: 89c83fb539f9 ("mm, thp: consolidate THP gfp handling into alloc_hugepage_direct_gfpmask") Signed-off-by: David Rientjes --- include/linux/m

Re: [LKP] [mm] ac5b2c1891: vm-scalability.throughput -61.3% regression

2018-12-05 Thread David Rientjes
On Wed, 5 Dec 2018, Linus Torvalds wrote: > > So ultimately we decided that the saner behavior that gives the least > > risk of regression for the short term, until we can do something > > better, was the one that is already applied upstream. > > You're ignoring the fact that people *did* report

Re: [patch 0/2 for-4.20] mm, thp: fix remote access and allocation regressions

2018-12-05 Thread David Rientjes
On Wed, 5 Dec 2018, Andrea Arcangeli wrote: > > High thp utilization is not always better, especially when those hugepages > > are accessed remotely and introduce the regressions that I've reported. > > Seeking high thp utilization at all costs is not the goal if it causes > > workloads to

Re: [LKP] [mm] ac5b2c1891: vm-scalability.throughput -61.3% regression

2018-12-05 Thread David Rientjes
based on the observation that allocating pages on the local node is more beneficial than allocating hugepages on a remote node. With this patch applied we may find transparent huge page allocation failures if the current node doesn't have enough free hugepages. Before this patch such failures result

Re: [patch 0/2 for-4.20] mm, thp: fix remote access and allocation regressions

2018-12-05 Thread David Rientjes
On Wed, 5 Dec 2018, Michal Hocko wrote: > > As we've been over countless times, this is the desired effect for > > workloads that fit on a single node. We want local pages of the native > > page size because they (1) are accessed faster than remote hugepages and > > (2) are candidates for

Re: [patch 0/2 for-4.20] mm, thp: fix remote access and allocation regressions

2018-12-05 Thread David Rientjes
On Wed, 5 Dec 2018, Michal Hocko wrote: > > The revert is certainly needed to prevent the regression, yes, but I > > anticipate that Andrea will report back that patch 2 at least improves the > > situation for the problem that he was addressing, specifically that it is > > pointless to thrash

Re: [patch 0/2 for-4.20] mm, thp: fix remote access and allocation regressions

2018-12-05 Thread David Rientjes
On Wed, 5 Dec 2018, Mel Gorman wrote: > > This is a single MADV_HUGEPAGE usecase, there is nothing special about it. > > It would be the same as if you did mmap(), madvise(MADV_HUGEPAGE), and > > faulted the memory with a fragmented local node and then measured the > > remote access latency

Re: [patch 1/2 for-4.20] mm, thp: restore node-local hugepage allocations

2018-12-05 Thread David Rientjes
On Wed, 5 Dec 2018, Michal Hocko wrote: > > > At minimum do not remove the cleanup part which consolidates the gfp > > > handling to a single place. There is no real reason to have the > > > __GFP_THISNODE ugliness outside of alloc_hugepage_direct_gfpmask. > > > > > > > The __GFP_THISNODE usage

Re: [LKP] [mm] ac5b2c1891: vm-scalability.throughput -61.3% regression

2018-12-05 Thread David Rientjes
On Wed, 5 Dec 2018, Michal Hocko wrote: > > It isn't specific to MADV_HUGEPAGE, it is the policy for all transparent > > hugepage allocations, including defrag=always. We agree that > > MADV_HUGEPAGE is not exactly defined: does it mean try harder to allocate > > a hugepage locally, try

Re: [PATCH] mm/alloc: fallback to first node if the wanted node offline

2018-12-05 Thread David Rientjes
On Wed, 5 Dec 2018, Pingfan Liu wrote: > > > And rather than using first_online_node, would next_online_node() work? > > > > > What is the gain? Is it for memory pressure on node0? > > > Maybe I got your point now. Do you try to give a cheap assumption on the > nearest neighbor of this node? > It's

Re: [LKP] [mm] ac5b2c1891: vm-scalability.throughput -61.3% regression

2018-12-04 Thread David Rientjes
On Tue, 4 Dec 2018, Mel Gorman wrote: > What should also be kept in mind is that we should avoid conflating > locality preferences with THP preferences which is separate from THP > allocation latencies. The whole __GFP_THISNODE approach is pushing too > hard on locality versus huge pages when

Re: [LKP] [mm] ac5b2c1891: vm-scalability.throughput -61.3% regression

2018-12-04 Thread David Rientjes
On Tue, 4 Dec 2018, Michal Hocko wrote: > The thing I am really up to here is that reintroduction of > __GFP_THISNODE, which you are pushing for, will conflate madvise mode > resp. defrag=always with a numa placement policy because the allocation > doesn't fallback to a remote node. > It isn't

Re: [patch 0/2 for-4.20] mm, thp: fix remote access and allocation regressions

2018-12-04 Thread David Rientjes
On Tue, 4 Dec 2018, Michal Hocko wrote: > > This fixes a 13.9% remote memory access regression and a 40% remote > > memory allocation regression on Haswell when the local node is fragmented > > for hugepage sized pages and memory is being faulted with either the thp > > defrag setting of

Re: [patch 0/2 for-4.20] mm, thp: fix remote access and allocation regressions

2018-12-04 Thread David Rientjes
On Tue, 4 Dec 2018, Vlastimil Babka wrote: > So, AFAIK, the situation is: > > - commit 5265047ac301 in 4.1 introduced __GFP_THISNODE for THP. The > intention came a bit earlier in 4.0 commit 077fcf116c8c. (I admit acking > both as it seemed to make sense). Yes, both are based on the preference

Re: [patch 1/2 for-4.20] mm, thp: restore node-local hugepage allocations

2018-12-04 Thread David Rientjes
On Tue, 4 Dec 2018, Michal Hocko wrote: > > This is a full revert of ac5b2c18911f ("mm: thp: relax __GFP_THISNODE for > > MADV_HUGEPAGE mappings") and a partial revert of 89c83fb539f9 ("mm, thp: > > consolidate THP gfp handling into alloc_hugepage_direct_gfpmask"). > > > > By not setting

Re: [PATCH] mm/alloc: fallback to first node if the wanted node offline

2018-12-03 Thread David Rientjes
On Tue, 4 Dec 2018, Pingfan Liu wrote: > diff --git a/include/linux/gfp.h b/include/linux/gfp.h > index 76f8db0..8324953 100644 > --- a/include/linux/gfp.h > +++ b/include/linux/gfp.h > @@ -453,6 +453,8 @@ static inline int gfp_zonelist(gfp_t flags) > */ > static inline struct zonelist
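
The diff above is cut short by the archive. As a rough sketch of the idea being debated — not Pingfan's actual patch, and with first_online_node as only one of the candidate fallbacks — the change would look something like this:

    /* Sketch: avoid dereferencing an offline node's pgdat by falling
     * back to an online node first; the thread also floats
     * next_online_node() as a possibly nearer choice. */
    static inline struct zonelist *node_zonelist(int nid, gfp_t flags)
    {
            if (unlikely(!node_online(nid)))
                    nid = first_online_node;
            return NODE_DATA(nid)->node_zonelists + gfp_zonelist(flags);
    }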

[patch 1/2 for-4.20] mm, thp: restore node-local hugepage allocations

2018-12-03 Thread David Rientjes
Restore __GFP_THISNODE for thp allocations. Fixes: ac5b2c18911f ("mm: thp: relax __GFP_THISNODE for MADV_HUGEPAGE mappings") Fixes: 89c83fb539f9 ("mm, thp: consolidate THP gfp handling into alloc_hugepage_direct_gfpmask") Signed-off-by: David Rientjes --- include/linux/m

[patch 0/2 for-4.20] mm, thp: fix remote access and allocation regressions

2018-12-03 Thread David Rientjes
This fixes a 13.9% remote memory access regression and a 40% remote memory allocation regression on Haswell when the local node is fragmented for hugepage sized pages and memory is being faulted with either the thp defrag setting of "always" or has been madvised with MADV_HUGEPAGE. The usecase

[patch 2/2 for-4.20] mm, thp: always fault memory with __GFP_NORETRY

2018-12-03 Thread David Rientjes
rather than trying reclaim of SWAP_CLUSTER_MAX pages, which is unlikely to make enough of a difference for memory compaction to succeed. Signed-off-by: David Rientjes --- drivers/gpu/drm/ttm/ttm_page_alloc.c | 8 drivers/gpu/drm/ttm/ttm_page_alloc_dma.c | 3 +-- include/linux/gfp.h

Re: [LKP] [mm] ac5b2c1891: vm-scalability.throughput -61.3% regression

2018-12-03 Thread David Rientjes
On Mon, 3 Dec 2018, Linus Torvalds wrote: > Side note: I think maybe people should just look at that whole > compaction logic for that block, because it doesn't make much sense to > me: > > /* > * Checks for costly allocations with __GFP_NORETRY, which >

Re: [LKP] [mm] ac5b2c1891: vm-scalability.throughput -61.3% regression

2018-12-03 Thread David Rientjes
On Mon, 3 Dec 2018, Michal Hocko wrote: > > I think extending functionality so thp can be allocated remotely if truly > > desired is worthwhile > > This is a complete NUMA policy antipattern that we have for all other > user memory allocations. So far you have to be explicit for your numa >

Re: [LKP] [mm] ac5b2c1891: vm-scalability.throughput -61.3% regression

2018-12-03 Thread David Rientjes
On Mon, 3 Dec 2018, Michal Hocko wrote: > I have merely said that a better THP locality needs more work and during > the review discussion I have even volunteered to work on that. There > are other reclaim related fixes under work right now. All I am saying > is that MADV_HUGEPAGE having numa

Re: [LKP] [mm] ac5b2c1891: vm-scalability.throughput -61.3% regression

2018-12-03 Thread David Rientjes
On Mon, 3 Dec 2018, Andrea Arcangeli wrote: > In my earlier review of David's patch, it looked runtime equivalent to > the __GFP_COMPACT_ONLY solution. It has the only advantage of not adding a > new gfpflag until we're sure we need it, but it's the worst solution > available for the long term in my

Re: [LKP] [mm] ac5b2c1891: vm-scalability.throughput -61.3% regression

2018-12-03 Thread David Rientjes
On Mon, 3 Dec 2018, Andrea Arcangeli wrote: > It's trivial to reproduce the badness by running a memhog process that > allocates more than the RAM of 1 NUMA node, under the defrag=always > setting (or by changing memhog to use MADV_HUGEPAGE) and it'll create > swap storms even though 75% of the RAM is
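
A minimal sketch of the memhog-style reproducer Andrea describes (the 64GB size is an assumption — it only needs to exceed one node's RAM):

    #include <string.h>
    #include <sys/mman.h>

    int main(void)
    {
            size_t len = 64UL << 30;        /* assumed: more than one node's RAM */
            char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                           MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

            if (p == MAP_FAILED)
                    return 1;
            madvise(p, len, MADV_HUGEPAGE); /* or rely on defrag=always */
            memset(p, 1, len);              /* fault everything; watch for swap
                                               storms while remote RAM sits free */
            return 0;
    }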

Re: [LKP] [mm] ac5b2c1891: vm-scalability.throughput -61.3% regression

2018-11-28 Thread David Rientjes
On Wed, 28 Nov 2018, Linus Torvalds wrote: > On Tue, Nov 27, 2018 at 7:20 PM Huang, Ying wrote: > > > > From the above data, for the parent commit 3 processes exited within > > 14s, another 3 exited within 100s. For this commit, the first process > > exited at 203s. That is, this commit makes

Re: [RFC PATCH 1/3] mm, proc: be more verbose about unstable VMA flags in /proc//smaps

2018-11-20 Thread David Rientjes
relying on a semantic of a specific VMA > > flag. The primary reason why that happened is a lack of a proper > > interface. While this has been worked on and it will be fixed properly, > > it seems that our wording could see some refinement and be more vocal > > about semantic aspec

Re: [PATCH 4.4 131/160] mm: thp: relax __GFP_THISNODE for MADV_HUGEPAGE mappings

2018-11-20 Thread David Rientjes
On Tue, 20 Nov 2018, Michal Hocko wrote: > On Mon 19-11-18 14:16:24, David Rientjes wrote: > > On Mon, 19 Nov 2018, Greg Kroah-Hartman wrote: > > > > > 4.4-stable review patch. If anyone has any objections, please let me > > > know. > > > > >

Re: [PATCH 4.4 131/160] mm: thp: relax __GFP_THISNODE for MADV_HUGEPAGE mappings

2018-11-19 Thread David Rientjes
On Mon, 19 Nov 2018, Greg Kroah-Hartman wrote: > 4.4-stable review patch. If anyone has any objections, please let me know. > As I noted when this patch was originally proposed and when I nacked it[*] because it causes a 13.9% increase in remote memory access latency and up to 40% increase

Re: [RFC PATCH] mm, proc: report PR_SET_THP_DISABLE in proc

2018-11-19 Thread David Rientjes
On Thu, 15 Nov 2018, Michal Hocko wrote: > > The userspace had a single way to determine if thp had been disabled for a > > specific vma and that was broken with your commit. We have since fixed > > it. Modifying our software stack to start looking for some field > > somewhere else will not

Re: [PATCH] mm: mmap: remove verify_mm_writelocked()

2018-11-14 Thread David Rientjes
properly. So there is no need to use this function. > > Signed-off-by: Yangtao Li Acked-by: David Rientjes

Re: [PATCH] Suppress the sparse warning ./include/linux/slab.h:332:43: warning: dubious: x & !y

2018-11-14 Thread David Rientjes
On Thu, 8 Nov 2018, Darryl T. Agostinelli wrote: > Signed-off-by: Darryl T. Agostinelli > --- > include/linux/slab.h | 6 +- > 1 file changed, 5 insertions(+), 1 deletion(-) > > diff --git a/include/linux/slab.h b/include/linux/slab.h > index 918f374e7156..883b7f56bf35 100644 > ---

Re: [RFC PATCH] mm, proc: report PR_SET_THP_DISABLE in proc

2018-11-14 Thread David Rientjes
On Wed, 14 Nov 2018, Michal Hocko wrote: > > > > Do you know of any other userspace except your usecase? Is there > > > > anything fundamental that would prevent a proper API adoption for you? > > > > > > > > > > Yes, it would require us to go back in time and build patched binaries. > > > >

Re: [PATCH] mm, slab: remove unnecessary unlikely()

2018-11-07 Thread David Rientjes
On Sun, 4 Nov 2018, Yangtao Li wrote: > WARN_ON() already contains an unlikely(), so it's not necessary to use > unlikely. > > Signed-off-by: Yangtao Li Acked-by: David Rientjes
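
For readers skimming the archive, the pattern being removed looks like this (an illustrative example, not the exact hunk from Yangtao's patch):

    /* before: redundant hint — WARN_ON(cond) already evaluates to
     * unlikely(cond) internally */
    if (unlikely(WARN_ON(!page)))
            return -EINVAL;

    /* after */
    if (WARN_ON(!page))
            return -EINVAL;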

Re: [PATCH 1/2] mm: thp: relax __GFP_THISNODE for MADV_HUGEPAGE mappings

2018-10-28 Thread David Rientjes
On Mon, 22 Oct 2018, Zi Yan wrote: > Hi David, > Hi! > On 22 Oct 2018, at 17:04, David Rientjes wrote: > > > On Tue, 16 Oct 2018, Mel Gorman wrote: > > > > > I consider this to be an unfortunate outcome. On the one hand, we have a > > > problem

Re: [PATCH] mm,oom: Use timeout based back off.

2018-10-22 Thread David Rientjes
On Sat, 20 Oct 2018, Tetsuo Handa wrote: > This patch changes the OOM killer to wait for either > > (A) __mmput() of the OOM victim's mm completes > > or > > (B) the OOM reaper gives up waiting for (A) because memory pages > used by the OOM victim's mm did not decrease for one second

Re: [PATCH 1/2] mm: thp: relax __GFP_THISNODE for MADV_HUGEPAGE mappings

2018-10-22 Thread David Rientjes
On Tue, 16 Oct 2018, Mel Gorman wrote: > I consider this to be an unfortunate outcome. On the one hand, we have a > problem that three people can trivially reproduce with known test cases > and a patch shown to resolve the problem. Two of those three people work > on distributions that are

Re: [PATCH 1/2] mm: thp: relax __GFP_THISNODE for MADV_HUGEPAGE mappings

2018-10-22 Thread David Rientjes
On Mon, 15 Oct 2018, Andrea Arcangeli wrote: > > On Mon, 15 Oct 2018 15:30:17 -0700 (PDT) David Rientjes > > wrote: > > > Would it be possible to test with my > > > patch[*] that does not try reclaim to address the thrashing issue? > > > > Yes ple

Re: [PATCH 1/2] mm: thp: relax __GFP_THISNODE for MADV_HUGEPAGE mappings

2018-10-22 Thread David Rientjes
On Mon, 15 Oct 2018, Andrea Arcangeli wrote: > > At the risk of beating a dead horse that has already been beaten, what are > > the plans for this patch when the merge window opens? It would be rather > > unfortunate for us to start incurring a 14% increase in access latency and > > 40%

Re: [RFC PATCH] mm, proc: report PR_SET_THP_DISABLE in proc

2018-10-17 Thread David Rientjes
On Wed, 17 Oct 2018, Michal Hocko wrote: > Do you know of any other userspace except your usecase? Is there > anything fundamental that would prevent a proper API adoption for you? > Yes, it would require us to go back in time and build patched binaries.

Re: [RFC PATCH] mm, proc: report PR_SET_THP_DISABLE in proc

2018-10-16 Thread David Rientjes
On Tue, 16 Oct 2018, Michal Hocko wrote: > > I don't understand the point of extending smaps with yet another line. > > Because abusing a vma flag part is just wrong. What are you going to do > when a next bug report states that the flag is set even though no > userspace has set it and that

Re: [patch] mm, slab: avoid high-order slab pages when it does not reduce waste

2018-10-15 Thread David Rientjes
On Mon, 15 Oct 2018, Christopher Lameter wrote: > > > If the amount of waste is the same at higher cachep->gfporder values, > > > there is no significant benefit to allocating higher order memory. There > > > will be fewer calls to the page allocator, but each call will require > > > zone->lock

Re: [PATCH 1/2] mm: thp: relax __GFP_THISNODE for MADV_HUGEPAGE mappings

2018-10-15 Thread David Rientjes
On Wed, 10 Oct 2018, David Rientjes wrote: > > I think "madvise vs mbind" is more an issue of "no-permission vs > > permission" required. And if the process ends up swapping out all > > other processes with their memory already allocated in the node,

Re: [RFC PATCH] mm, proc: report PR_SET_THP_DISABLE in proc

2018-10-15 Thread David Rientjes
On Mon, 15 Oct 2018, Michal Hocko wrote: > > > No, because the offending commit actually changed the precedence itself: > > > PR_SET_THP_DISABLE used to be honored for future mappings and the commit > > > changed that for all current mappings. > > > > Which is the actual and the full point of

Re: [patch] mm, slab: avoid high-order slab pages when it does not reduce waste

2018-10-12 Thread David Rientjes
On Fri, 12 Oct 2018, Andrew Morton wrote: > > The slab allocator has a heuristic that checks whether the internal > > fragmentation is satisfactory and, if not, increases cachep->gfporder to > > try to improve this. > > > > If the amount of waste is the same at higher cachep->gfporder values, >
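
A hedged sketch of the heuristic under discussion (illustrative only — the helper name and structure are simplified, not the actual mm/slab.c code): raise cachep->gfporder only when it strictly reduces waste.

    static unsigned int pick_gfporder(size_t obj_size, unsigned int max_order)
    {
            unsigned int order, best_order = 0;
            size_t best_waste = (size_t)-1;

            for (order = 0; order <= max_order; order++) {
                    /* leftover bytes, normalized per page so that
                     * different orders are comparable */
                    size_t waste = ((PAGE_SIZE << order) % obj_size) >> order;

                    /* strict '<': equal waste at a higher order buys
                     * nothing and makes the page allocator work harder */
                    if (waste < best_waste) {
                            best_waste = waste;
                            best_order = order;
                    }
            }
            return best_order;
    }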

[patch] mm, slab: avoid high-order slab pages when it does not reduce waste

2018-10-12 Thread David Rientjes
second point to eliminate cases where all other pages on a pageblock are movable (or free) and fallback to pageblocks of other migratetypes from the per-zone free areas causes high-order slab memory to be allocated from them rather than from free MIGRATE_UNMOVABLE pages on the pcp. Signed-off-by: David

Re: [PATCH 1/2] mm: thp: relax __GFP_THISNODE for MADV_HUGEPAGE mappings

2018-10-10 Thread David Rientjes
On Tue, 9 Oct 2018, Andrea Arcangeli wrote: > I think "madvise vs mbind" is more an issue of "no-permission vs > permission" required. And if the process ends up swapping out all > other processes with their memory already allocated in the node, I think > some permission is correct to be
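
The "madvise vs mbind" distinction in this exchange is hint versus binding policy; a minimal userspace illustration (node 0 is arbitrary; numaif.h and the mbind() wrapper come from libnuma):

    #include <sys/mman.h>
    #include <numaif.h>

    static void hint_vs_policy(void *p, size_t len)
    {
            /* MADV_HUGEPAGE: a hint, no privilege needed; asks the kernel
             * to try harder to back this range with THP */
            madvise(p, len, MADV_HUGEPAGE);

            /* MPOL_BIND: a hard NUMA placement policy; under pressure it
             * can push other tasks' memory out of node 0, which is the
             * "permission" concern raised above */
            unsigned long nodemask = 1UL << 0;
            mbind(p, len, MPOL_BIND, &nodemask, 8 * sizeof(nodemask), 0);
    }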

Re: [PATCH 1/2] mm: thp: relax __GFP_THISNODE for MADV_HUGEPAGE mappings

2018-10-10 Thread David Rientjes
On Tue, 9 Oct 2018, Andrea Arcangeli wrote: > On Tue, Oct 09, 2018 at 03:17:30PM -0700, David Rientjes wrote: > > causes workloads to severely regress both in fault and access latency when > > we know that direct reclaim is unlikely to make direct compaction free an > > en

Re: INFO: rcu detected stall in shmem_fault

2018-10-09 Thread David Rientjes
On Wed, 10 Oct 2018, Tetsuo Handa wrote: > syzbot is hitting RCU stall due to memcg-OOM event. > https://syzkaller.appspot.com/bug?id=4ae3fff7fcf4c33a47c1192d2d62d2e03efffa64 > > What should we do if memcg-OOM found no killable task because the allocating > task > was oom_score_adj == -1000 ?

Re: [PATCH 1/2] mm: thp: relax __GFP_THISNODE for MADV_HUGEPAGE mappings

2018-10-09 Thread David Rientjes
On Tue, 9 Oct 2018, Mel Gorman wrote: > > The page allocator is expecting __GFP_NORETRY for thp allocations per its > > comment: > > > > /* > > * Checks for costly allocations with __GFP_NORETRY, which > > * includes THP page fault allocations > >

Re: [PATCH 1/2] mm: thp: relax __GFP_THISNODE for MADV_HUGEPAGE mappings

2018-10-08 Thread David Rientjes
On Fri, 5 Oct 2018, Andrea Arcangeli wrote: > I tried to add just __GFP_NORETRY but it changes nothing. Try it > yourself if you think that can resolve the swap storm and excessive > reclaim CPU overhead... and see if it works. I didn't intend to > reinvent the wheel with __GFP_COMPACT_ONLY, if

Re: [patch] mm, page_alloc: set num_movable in move_freepages()

2018-10-07 Thread David Rientjes
On Fri, 5 Oct 2018, Andrew Morton wrote: > On Fri, 5 Oct 2018 13:56:39 -0700 (PDT) David Rientjes > wrote: > > > If move_freepages() returns 0 because zone_spans_pfn(), *num_movable can > > move_free_pages_block()? !zone_spans_pfn()? > move_freepages

[patch] mm, page_alloc: set num_movable in move_freepages()

2018-10-05 Thread David Rientjes
caller where num_movable != NULL, so no bug fix, but just more robust. Signed-off-by: David Rientjes --- mm/page_alloc.c | 7 +++ 1 file changed, 3 insertions(+), 4 deletions(-) diff --git a/mm/page_alloc.c b/mm/page_alloc.c --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -2015,10 +2015,6

Re: [PATCH 1/2] mm: thp: relax __GFP_THISNODE for MADV_HUGEPAGE mappings

2018-10-05 Thread David Rientjes
On Fri, 5 Oct 2018, Mel Gorman wrote: > > This causes, on average, a 13.9% access latency regression on Haswell, and > > the regression would likely be more severe on Naples and Rome. > > > > That assumes that fragmentation prevents easy allocation which may very > well be the case. While it

Re: [PATCH 1/2] mm: thp: relax __GFP_THISNODE for MADV_HUGEPAGE mappings

2018-10-04 Thread David Rientjes
On Thu, 4 Oct 2018, Andrea Arcangeli wrote: > Hello David, > Hi Andrea, > On Thu, Oct 04, 2018 at 01:16:32PM -0700, David Rientjes wrote: > > There are ways to address this without introducing regressions for > > existing users of MADV_HUGEPAGE: introduce an mad

Re: [PATCH 2/2] mm, thp: consolidate THP gfp handling into alloc_hugepage_direct_gfpmask

2018-10-04 Thread David Rientjes
On Wed, 26 Sep 2018, Kirill A. Shutemov wrote: > On Tue, Sep 25, 2018 at 02:03:26PM +0200, Michal Hocko wrote: > > diff --git a/mm/huge_memory.c b/mm/huge_memory.c > > index c3bc7e9c9a2a..c0bcede31930 100644 > > --- a/mm/huge_memory.c > > +++ b/mm/huge_memory.c > > @@ -629,21 +629,40 @@ static

Re: [PATCH 1/2] mm: thp: relax __GFP_THISNODE for MADV_HUGEPAGE mappings

2018-10-04 Thread David Rientjes
incur substantial allocation latency when it will likely fail. We don't introduce 13.9% regressions for binaries that are correctly using MADV_HUGEPAGE as it is implemented. Nacked-by: David Rientjes

Re: [RFC PATCH] mm, proc: report PR_SET_THP_DISABLE in proc

2018-10-04 Thread David Rientjes
On Thu, 4 Oct 2018, Michal Hocko wrote: > > And prior to the offending commit, there were three ways to control thp > > but two ways to determine if a mapping was eligible for thp based on the > > implementation detail of one of those ways. > > Yes, it is really unfortunate that we have ever

Re: [RFC PATCH] mm, proc: report PR_SET_THP_DISABLE in proc

2018-10-04 Thread David Rientjes
On Thu, 4 Oct 2018, Michal Hocko wrote: > > > > > So how about this? (not tested yet but it should be pretty > > > > > straightforward) > > > > > > > > Umm, prctl(PR_GET_THP_DISABLE)? > > > > > > /me confused. I thought you want to query for the flag on a > > > _different_ process. > > > >

Re: [RFC PATCH] mm, proc: report PR_SET_THP_DISABLE in proc

2018-10-03 Thread David Rientjes
On Wed, 3 Oct 2018, Michal Hocko wrote: > > > So how about this? (not tested yet but it should be pretty > > > straightforward) > > > > Umm, prctl(PR_GET_THP_DISABLE)? > > /me confused. I thought you want to query for the flag on a > _different_ process. Why would we want to check three
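
For context, the prctl interface in question — note that it acts only on the calling process, which is why observing the flag on a *different* process keeps leading back to smaps:

    #include <stdio.h>
    #include <sys/prctl.h>

    int main(void)
    {
            if (prctl(PR_SET_THP_DISABLE, 1, 0, 0, 0))
                    perror("PR_SET_THP_DISABLE");
            /* returns the current flag for *this* process only */
            printf("thp disabled: %d\n",
                   (int)prctl(PR_GET_THP_DISABLE, 0, 0, 0, 0));
            return 0;
    }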

Re: [RFC PATCH] mm, proc: report PR_SET_THP_DISABLE in proc

2018-10-02 Thread David Rientjes
On Tue, 2 Oct 2018, Michal Hocko wrote: > On Wed 26-09-18 08:06:24, Michal Hocko wrote: > > On Tue 25-09-18 15:04:06, Andrew Morton wrote: > > > On Tue, 25 Sep 2018 14:45:19 -0700 (PDT) David Rientjes > > > wrote: > > > > > > > > > It is

Re: [patch v2] mm, thp: always specify ineligible vmas as nh in smaps

2018-09-25 Thread David Rientjes
On Tue, 25 Sep 2018, Andrew Morton wrote: > > > > It is also used in > > > > automated testing to ensure that vmas get disabled for thp > > > > appropriately > > > > and we used "nh" since that is how PR_SET_THP_DISABLE previously > > > > enforced > > > > this, and those tests now break. > >

[patch v3] mm, thp: always specify disabled vmas as nh in smaps

2018-09-25 Thread David Rientjes
Fixes: 1860033237d4 ("mm: make PR_SET_THP_DISABLE immediately active") Signed-off-by: David Rientjes --- v3: - reword Documentation/filesystems/proc.txt for eligibility v2: - clear VM_HUGEPAGE per Vlastimil - update Documentation/filesystems/proc.txt to be explicit Documentation/filesystems/pr

Re: [patch v2] mm, thp: always specify ineligible vmas as nh in smaps

2018-09-25 Thread David Rientjes
On Tue, 25 Sep 2018, Michal Hocko wrote: > > This is used to identify heap mappings that should be able to fault thp > > but do not, and they normally point to a low-on-memory or fragmentation > > issue. After commit 1860033237d4, our users of PR_SET_THP_DISABLE no > > longer show "nh" for

Re: [patch v2] mm, thp: always specify ineligible vmas as nh in smaps

2018-09-25 Thread David Rientjes
On Mon, 24 Sep 2018, Vlastimil Babka wrote: > On 9/24/18 10:02 PM, Michal Hocko wrote: > > On Mon 24-09-18 21:56:03, Michal Hocko wrote: > >> On Mon 24-09-18 12:30:07, David Rientjes wrote: > >>> Commit 1860033237d4 ("mm: make PR_SET_THP_DISABLE immediately act

[patch v2] mm, thp: always specify ineligible vmas as nh in smaps

2018-09-24 Thread David Rientjes
Fixes: 1860033237d4 ("mm: make PR_SET_THP_DISABLE immediately active") Signed-off-by: David Rientjes --- v2: - clear VM_HUGEPAGE per Vlastimil - update Documentation/filesystems/proc.txt to be explicit Documentation/filesystems/proc.txt | 12 ++-- fs/proc/task_mmu.c | 14 +- 2

Re: [patch] mm, thp: always specify ineligible vmas as nh in smaps

2018-09-24 Thread David Rientjes
On Mon, 24 Sep 2018, Vlastimil Babka wrote: > > diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c > > --- a/fs/proc/task_mmu.c > > +++ b/fs/proc/task_mmu.c > > @@ -653,13 +653,23 @@ static void show_smap_vma_flags(struct seq_file *m, > > struct vm_area_struct *vma) > > #endif > > #endif /*

[patch] mm, thp: always specify ineligible vmas as nh in smaps

2018-09-24 Thread David Rientjes
not emitted. This causes smaps parsing libraries to assume a vma is eligible for thp and ends up puzzling the user as to why its memory is not backed by thp. Signed-off-by: David Rientjes --- fs/proc/task_mmu.c | 12 +++- 1 file changed, 11 insertions(+), 1 deletion(-) diff --git a/fs/proc/task_
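
A minimal sketch of the kind of smaps check the thread says userspace libraries perform (the "nh" token in VmFlags is what commit 1860033237d4 stopped emitting for PR_SET_THP_DISABLE tasks):

    #include <stdio.h>
    #include <string.h>

    /* returns 1 if any mapping in the given smaps file carries "nh" */
    static int vma_thp_disabled(const char *smaps_path)
    {
            char line[512];
            int seen = 0;
            FILE *f = fopen(smaps_path, "r");

            if (!f)
                    return -1;
            while (fgets(line, sizeof(line), f))
                    if (!strncmp(line, "VmFlags:", 8) && strstr(line, " nh"))
                            seen = 1;
            fclose(f);
            return seen;
    }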

Re: [PATCH] mm, thp: relax __GFP_THISNODE for MADV_HUGEPAGE mappings

2018-09-12 Thread David Rientjes
On Wed, 12 Sep 2018, Michal Hocko wrote: > > Saying that we really want THP isn't an all-or-nothing decision. We > > certainly want to try hard to fault hugepages locally especially at task > > startup when remapping our .text segment to thp, and MADV_HUGEPAGE works > > very well for that.

Re: [PATCH] mm, thp: relax __GFP_THISNODE for MADV_HUGEPAGE mappings

2018-09-11 Thread David Rientjes
On Tue, 11 Sep 2018, Michal Hocko wrote: > > That's not entirely true, the remote access latency for remote thp on all > > of our platforms is greater than local small pages, this is especially > > true for remote thp that is allocated intersocket and must be accessed > > through the

Re: [PATCH RFC] mm: don't raise MEMCG_OOM event due to failed high-order allocation

2018-09-10 Thread David Rientjes
> Signed-off-by: Roman Gushchin > Cc: Johannes Weiner > Cc: Michal Hocko > Cc: Vladimir Davydov Acked-by: David Rientjes

Re: [PATCH] mm, thp: relax __GFP_THISNODE for MADV_HUGEPAGE mappings

2018-09-10 Thread David Rientjes
On Fri, 7 Sep 2018, Michal Hocko wrote: > From: Michal Hocko > > Andrea has noticed [1] that a THP allocation might be really disruptive > when allocated on a NUMA system with the local node full or hard to > reclaim. Stefan has posted an allocation stall report on a 4.12-based SLES > kernel which

Re: [PATCH] mm,page_alloc: PF_WQ_WORKER threads must sleep at should_reclaim_retry().

2018-08-23 Thread David Rientjes
On Fri, 24 Aug 2018, Tetsuo Handa wrote: > > For those of us who are tracking CVE-2016-10723 which has persistently been > > labeled as "disputed" and with no clear indication of what patches address > > it, I am assuming that commit 9bfe5ded054b ("mm, oom: remove sleep from > > under

Re: [PATCH] mm,page_alloc: PF_WQ_WORKER threads must sleep at should_reclaim_retry().

2018-08-23 Thread David Rientjes
On Wed, 22 Aug 2018, Tetsuo Handa wrote: > On 2018/08/03 15:16, Michal Hocko wrote: > > On Fri 03-08-18 07:05:54, Tetsuo Handa wrote: > >> On 2018/07/31 14:09, Michal Hocko wrote: > >>> On Tue 31-07-18 06:01:48, Tetsuo Handa wrote: > On 2018/07/31 4:10, Michal Hocko wrote: > > Since

cgroup aware oom killer (was Re: [PATCH 0/3] introduce memory.oom.group)

2018-08-19 Thread David Rientjes
Roman, have you had time to go through this? On Tue, 7 Aug 2018, David Rientjes wrote: > On Mon, 6 Aug 2018, Roman Gushchin wrote: > > > > In a cgroup-aware oom killer world, yes, we need the ability to specify > > > that the usage of the entire subtree should

Re: [PATCH 0/3] introduce memory.oom.group

2018-08-09 Thread David Rientjes
On Wed, 8 Aug 2018, Michal Hocko wrote: > > > > In a cgroup-aware oom killer world, yes, we need the ability to specify > > > > that the usage of the entire subtree should be compared as a single > > > > entity with other cgroups. That is necessary for user subtrees but may > > > > not be

Re: [PATCH v2] proc: add percpu populated pages count to meminfo

2018-08-07 Thread David Rientjes
> the backing memory scales with the number of cpus and can quickly > outweigh the metadata. It also makes this calculation light. > > Signed-off-by: Dennis Zhou Acked-by: David Rientjes

Re: [PATCH 0/3] introduce memory.oom.group

2018-08-07 Thread David Rientjes
On Mon, 6 Aug 2018, Roman Gushchin wrote: > > In a cgroup-aware oom killer world, yes, we need the ability to specify > > that the usage of the entire subtree should be compared as a single > > entity with other cgroups. That is necessary for user subtrees but may > > not be necessary for

Re: [PATCH v2 1/3] mm: introduce mem_cgroup_put() helper

2018-08-06 Thread David Rientjes
Roman Gushchin > Reviewed-by: Shakeel Butt > Reviewed-by: Andrew Morton > Acked-by: Johannes Weiner > Acked-by: Michal Hocko > Signed-off-by: Andrew Morton > Signed-off-by: Stephen Rothwell Acked-by: David Rientjes

Re: [LKP] [mm, oom] c1e4c54f9c: BUG:KASAN:null-ptr-deref_in_d

2018-08-06 Thread David Rientjes
On Mon, 6 Aug 2018, 禹舟键 wrote: > Hi Michal > Sorry, I cannot open the link you shared. > The suggestion atop your previous patch was diff --git a/mm/oom_kill.c b/mm/oom_kill.c --- a/mm/oom_kill.c +++ b/mm/oom_kill.c @@ -438,14 +438,6 @@ static void dump_header(struct oom_control *oc, struct

Re: [PATCH 0/3] introduce memory.oom.group

2018-08-06 Thread David Rientjes
On Wed, 1 Aug 2018, Roman Gushchin wrote: > Ok, I think that what we'll do here: > 1) drop the current cgroup-aware OOM killer implementation from the mm tree > 2) land memory.oom.group to the mm tree (your ack will be appreciated) > 3) discuss and, hopefully, agree on memory.oom.policy interface

Re: [PATCH 0/3] introduce memory.oom.group

2018-08-01 Thread David Rientjes
On Tue, 31 Jul 2018, Roman Gushchin wrote: > > What's the plan with the cgroup aware oom killer? It has been sitting in > > the -mm tree for ages with no clear path to being merged. > > It's because of your nack, isn't it? > Everybody else seems to be fine with it. > If they are fine with it,

Re: [LKP] [mm, oom] c1e4c54f9c: BUG:KASAN:null-ptr-deref_in_d

2018-07-30 Thread David Rientjes
On Mon, 30 Jul 2018, Michal Hocko wrote: > On Mon 30-07-18 17:03:20, kernel test robot wrote: > [...] > > [9.034310] BUG: KASAN: null-ptr-deref in dump_header+0x10c/0x448 > > Could you faddr2line on the offset please? > It's possible that p is NULL when calling dump_header(). In this case

Re: [PATCH 0/3] introduce memory.oom.group

2018-07-30 Thread David Rientjes
On Mon, 30 Jul 2018, Roman Gushchin wrote: > This is a tiny implementation of cgroup-aware OOM killer, > which adds an ability to kill a cgroup as a single unit > and so guarantee the integrity of the workload. > > Although it has only a limited functionality in comparison > to what now resides

Re: cgroups iptables-restor: vmalloc: allocation failure

2018-07-25 Thread David Rientjes
On Wed, 25 Jul 2018, Georgi Nikolov wrote: > Hello, > > I posted a kernel bug https://bugzilla.kernel.org/show_bug.cgi?id=200651 and > I hope this is the correct place to discuss this. > Could you post the full allocation failure from the kernel log? It's not possible to vmalloc any

Re: [patch v4] mm, oom: fix unnecessary killing of additional processes

2018-07-24 Thread David Rientjes
On Wed, 25 Jul 2018, Tetsuo Handa wrote: > > If exit_mmap() gets preempted indefinitely before it can free any memory, > > we are better off oom killing another process. The purpose of the timeout > > is to give an oom victim an amount of time to free its memory and exit > > before selecting

Re: [patch v4] mm, oom: fix unnecessary killing of additional processes

2018-07-24 Thread David Rientjes
On Wed, 25 Jul 2018, Tetsuo Handa wrote: > >> You might worry about situations where __oom_reap_task_mm() is a no-op. > >> But that is not always true. There is no point with emitting > >> > >> pr_info("oom_reaper: unable to reap pid:%d (%s)\n", ...); > >> debug_show_all_locks(); > >> > >>

Re: [patch v4] mm, oom: fix unnecessary killing of additional processes

2018-07-24 Thread David Rientjes
On Sat, 21 Jul 2018, Tetsuo Handa wrote: > You can't apply "[patch v4] mm, oom: fix unnecessary killing of additional > processes" > because Michal's patch which removes oom_lock serialization was added to -mm > tree. > I've rebased the patch to linux-next and posted a v5. > You might worry

[patch v5] mm, oom: fix unnecessary killing of additional processes

2018-07-24 Thread David Rientjes
time, such as 10s, since oom livelock is a very rare occurrence and it's better to optimize for preventing additional (unnecessary) oom killing than a scenario that is much more unlikely. Signed-off-by: David Rientjes --- v5: - rebased to linux-next - reworked serialization in exit_mmap() t

Re: [PATCH] mm, oom: remove oom_lock from oom_reaper

2018-07-24 Thread David Rientjes
again. > > Therefore remove the oom_lock for oom_reaper paths (both exit_mmap and > oom_reap_task_mm). The reaper serializes with exit_mmap by mmap_sem + > MMF_OOM_SKIP flag. There is no synchronization with out_of_memory path > now. > > Suggested-by: David Rientjes > Signed-off-by: Michal Hocko Acked-by: David Rientjes

Re: [PATCH] mm: thp: remove use_zero_page sysfs knob

2018-07-24 Thread David Rientjes
On Tue, 24 Jul 2018, Kirill A. Shutemov wrote: > > use_zero_page is currently a simple thp flag, meaning it rejects writes > > where val != !!val, so perhaps it would be best to overload it with > > additional options? I can imagine 0x2 defining persistent allocation so > > that the hzp is

Re: [patch v3 -mm 3/6] mm, memcg: add hierarchical usage oom policy

2018-07-23 Thread David Rientjes
On Mon, 23 Jul 2018, Roman Gushchin wrote: > > Roman, I'm trying to make progress so that the cgroup aware oom killer is > > in a state that it can be merged. Would you prefer a second tunable here > > to specify a cgroup's points includes memory from its subtree? > > Hi, David! > > It's
