making it a highly favored OOM kill target process. The output
> documents both the misconfiguration and the fact that the process
> was correctly targeted by OOM due to the misconfiguration. Having
> the oom_score_adj on the Killed message ensures that it is documented.
>
> Si
On Tue, 13 Aug 2019, Andrew Morton wrote:
> > After commit 907ec5fca3dc ("mm: zero remaining unavailable struct pages"),
> > struct page of reserved memory is zeroed. This causes page->flags to be 0
> > and fixes issues related to reading /proc/kpageflags, for example, of
> > reserved memory.
> >
se, an
> > incorrect node or zone is a bug worthy of being warned about (and the
> > examination of struct page is acceptable because this memory is not
> > reserved).
> >
> > Signed-off-by: David Rientjes
> > ---
> > mm/page_alloc.c | 19
cause this memory is not
reserved).
Signed-off-by: David Rientjes
---
mm/page_alloc.c | 19 ---
1 file changed, 4 insertions(+), 15 deletions(-)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2238,27 +2238,12 @@ static int move_freepages(st
20] ---[ end trace f67eb9af4d8d492b ]---
>
> Fix this by ensuring the value we set with set_freepointer is either NULL
> or another value in the chain.
>
> Reported-by: kernel test robot
> Signed-off-by: Laura Abbott
Acked-by: David Rientjes
On Wed, 10 Jul 2019, Singh, Brijesh wrote:
> diff --git a/Documentation/virtual/kvm/hypercalls.txt
> b/Documentation/virtual/kvm/hypercalls.txt
> index da24c138c8d1..94f0611f4d88 100644
> --- a/Documentation/virtual/kvm/hypercalls.txt
> +++ b/Documentation/virtual/kvm/hypercalls.txt
> @@ -141,3 +
from include/linux/slab.h to mm/slab.h.
> It is just a refactoring patch with no code change.
>
> In fact both the slub_def.h and slab_def.h should be moved into the mm
> directory as well, but that will probably cause many merge conflicts.
>
> Signed-off-by: Waiman Long
Acked-by: David Rientjes
Thanks Waiman!
Commit-ID: ffdb07f31252625b7bcbf1f424d7beccff02ba97
Gitweb: https://git.kernel.org/tip/ffdb07f31252625b7bcbf1f424d7beccff02ba97
Author: David Rientjes
AuthorDate: Wed, 10 Jul 2019 13:19:35 -0700
Committer: Thomas Gleixner
CommitDate: Tue, 16 Jul 2019 23:13:48 +0200
x86/mm: Free
Commit-ID: e74bd96989dd42a51a73eddb4a5510a6f5e42ac3
Gitweb: https://git.kernel.org/tip/e74bd96989dd42a51a73eddb4a5510a6f5e42ac3
Author: David Rientjes
AuthorDate: Tue, 9 Jul 2019 19:44:03 -0700
Committer: Thomas Gleixner
CommitDate: Tue, 16 Jul 2019 23:13:48 +0200
x86/boot: Fix memory
On Sat, 13 Jul 2019, Yang Shi wrote:
> When running ltp's oom test with kmemleak enabled, the below warning was
> triggered since kernel detects __GFP_NOFAIL & ~__GFP_DIRECT_RECLAIM is
> passed in:
>
> WARNING: CPU: 105 PID: 2138 at mm/page_alloc.c:4608
> __alloc_pages_nodemask+0x1c31/0x1d50
>
69.s:7526: Warning: ignoring changed section attributes
> for .data..ro_after_init
>
> Adding an initialization to kmalloc_caches is rather silly here
> but does avoid the issue.
>
> Link: https://bugs.llvm.org/show_bug.cgi?id=42570
> Signed-off-by: Arnd Bergmann
Acked-by:
On Thu, 11 Jul 2019, Chris Wilson wrote:
> Quoting Steven Rostedt (2019-07-11 03:57:20)
> > On Fri, 14 Jun 2019 08:38:37 -0700
> > Tejun Heo wrote:
> >
> > > Hello,
> > >
> > > On Fri, Jun 14, 2019 at 04:08:33PM +0100, Chris Wilson wrote:
> > > > #ifdef CONFIG_MEMCG
> > > > if (slab_sta
.
Reported-by: Cfir Cohen
Signed-off-by: David Rientjes
---
arch/x86/mm/mem_encrypt.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
--- a/arch/x86/mm/mem_encrypt.c
+++ b/arch/x86/mm/mem_encrypt.c
@@ -41,7 +41,7
rted-by: Cfir Cohen
Signed-off-by: David Rientjes
---
arch/x86/kernel/mpparse.c | 10 --
1 file changed, 4 insertions(+), 6 deletions(-)
diff --git a/arch/x86/kernel/mpparse.c b/arch/x86/kernel/mpparse.c
index f1c5eb99d445..7a7055056b0d 100644
--- a/arch/x86/kernel/mpparse.c
+++ b/ar
On Fri, 5 Jul 2019, Christopher Lameter wrote:
> On Fri, 5 Jul 2019, Markus Elfring wrote:
>
> > Avoid an extra function call by using a ternary operator instead of
> > a conditional statement for a string literal selection.
>
> Well. I thought the compiler does that on its own? And the ternary o
the flag in the common code as suggested by Roman.]
>
> Signed-off-by: Waiman Long
> Reviewed-by: Shakeel Butt
> Acked-by: Roman Gushchin
Acked-by: David Rientjes
in locked, as documentation.
>
> Signed-off-by: Henry Burns
> Suggested-by: Vitaly Wool
Acked-by: David Rientjes
On Thu, 6 Jun 2019, David Rientjes wrote:
> The idea that I had was snipped from this, however, and it would be nice
> to get some feedback on it: I've suggested that direct reclaim for the
> purposes of hugepage allocation on the local node is never worthwhile
> unless
: Vlastimil Babka
> Signed-off-by: Alan Jenkins
> Fixes: 1c30844d2dfe ("mm: reclaim small amounts of memory when an external
> fragmentation event occurs")
> Acked-by: Mel Gorman
Acked-by: David Rientjes
mask+0x49/0x70
> [ 381.346287] softirqs last enabled at (10262): []
> cgroup_idr_replace+0x3a/0x50
> [ 381.346290] softirqs last disabled at (10260): []
> cgroup_idr_replace+0x1d/0x50
> [ 381.346293] ---[ end trace b324ba73eb3659f0 ]---
>
> v2: fixed return value from memcg_
.
>
> Reported-by: Dave Hansen
> Signed-off-by: Shakeel Butt
Acked-by: David Rientjes
On Fri, 7 Jun 2019, Michal Hocko wrote:
> > So my proposed change would be:
> > - give the page allocator a consistent indicator that compaction failed
> >   because we are low on memory (make COMPACT_SKIPPED really mean this),
> > - if we get this in the page allocator and we are allocating th
On Wed, 5 Jun 2019, Michal Hocko wrote:
> > That's fine, but we also must be mindful of users who have used
> > MADV_HUGEPAGE over the past four years based on its hard-coded behavior
> > that would now regress as a result.
>
> Absolutely, I am all for helping those usecases. First of all we ne
On Fri, 31 May 2019, Michal Hocko wrote:
> > The problem which this patch addresses has apparently gone unreported for
> > 4+ years since
>
> Can we finally stop considering the time and focus on what is the
> most reasonable behavior in the general case please? Conserving mistakes
> based on an
On Thu, 23 May 2019, Andrew Morton wrote:
> > We are going in circles, *yes* there is a problem for potential swap
> > storms today because of the poor interaction between memory compaction and
> > directed reclaim but this is a result of a poor API that does not allow
> > userspace to specify
On Wed, 29 May 2019, Hariprasad Kelam wrote:
> Don't acquire lock before calling wd719x_chip_init.
>
> Issue identified by coccicheck
>
> Signed-off-by: Hariprasad Kelam
> -
> changes in v1: Replace GFP_KERNEL with GFP_ATOMIC.
> changes in v2: Call wd719x_chip_init without lock as suggested
On Tue, 28 May 2019, Christoph Hellwig wrote:
> > wd719x_chip_init is getting called in interrupt disabled
> > mode (spin_lock_irqsave), so we need to use GFP_ATOMIC instead
> > of GFP_KERNEL.
> >
> > Issue identified by coccicheck
>
> I don't think request_firmware is any more happy being called un
On Wed, 29 May 2019, Yang Shi wrote:
> > Right, we've also encountered this. I talked to Kirill about it a week or
> > so ago where the suggestion was to split all compound pages on the
> > deferred split queues under the presence of even memory pressure.
> >
> > That breaks cgroup isolation and
On Fri, 24 May 2019, Andrea Arcangeli wrote:
> > > We are going in circles, *yes* there is a problem for potential swap
> > > storms today because of the poor interaction between memory compaction
> > > and
> > > directed reclaim but this is a result of a poor API that does not allow
> > > use
On Tue, 28 May 2019, Yang Shi wrote:
>
> I got some reports from our internal application team about memcg OOM.
> Even though the application has been killed by oom killer, there are
> still a lot of resident THPs; page reclaim doesn't reclaim them at all.
>
> Some investigation shows they are on def
On Mon, 20 May 2019, Mel Gorman wrote:
> > There was exhausting discussion subsequent to this that caused Linus to
> > have to revert the offending commit late in an rc series that is not
> > described here.
>
> Yes, at the crux of that matter was which regression introduced was more
> importa
On Tue, 14 May 2019, Qian Cai wrote:
> Running tests on a debug kernel will usually generate a large number of
> kmemleak objects.
>
> # grep kmemleak /proc/slabinfo
> kmemleak_object 2243606 3436210 ...
>
> As the result, reading /proc/slab_allocators could easily loop forever
> while pro
CONFIG_DEBUG_SLAB_LEAK has been removed, so remove it from defconfig.
Fixes: 7878c231dae0 ("slab: remove /proc/slab_allocators")
Signed-off-by: David Rientjes
---
arch/parisc/configs/c8000_defconfig | 1 -
1 file changed, 1 deletion(-)
diff --git a/arch/parisc/configs/c8000_defconf
;&
> > + (gfpflags & ___GFP_DIRECT_RECLAIM))
> > return false;
>
> Should we use __GFP_DIRECT_RECLAIM instead of ___GFP_DIRECT_RECLAIM?
> Because I found the following comment in gfp.h
>
> /* Plain integer GFP bitmasks. Do not use this directly. */
>
Yes, we should use the two underscore version instead of the three.
Nicolas, after that's fixed up, feel free to add Acked-by: David Rientjes.
Thanks!
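The distinction discussed above can be sketched in userspace C. This is only an illustration under stated assumptions: the typedef stands in for the kernel's gfp_t, the bit value 0x400u is illustrative rather than taken from any particular kernel version, and may_direct_reclaim() is a hypothetical helper, not a kernel function.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace stand-in for the kernel's gfp_t (assumption for this sketch). */
typedef unsigned int gfp_t;

/* Plain integer GFP bitmask: not meant to be used directly.
 * The value 0x400u is illustrative only. */
#define ___GFP_DIRECT_RECLAIM 0x400u

/* Typed flag derived from the plain bitmask; this is the one to test. */
#define __GFP_DIRECT_RECLAIM ((gfp_t)___GFP_DIRECT_RECLAIM)

/* Hypothetical helper: true if the allocation may enter direct reclaim. */
static bool may_direct_reclaim(gfp_t gfpflags)
{
	return (gfpflags & __GFP_DIRECT_RECLAIM) != 0;
}
```

Both spellings yield the same bit in this sketch; the point of the two-underscore form is that it carries the flag type, so a checker such as sparse can catch plain integers being mixed in.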
On Wed, 15 May 2019, Gen Zhang wrote:
> Pointer s is allocated with kmem_cache_zalloc(). And s is used in the
> following code. However, when kmem_cache_zalloc fails, using s will
> cause null pointer dereference and the kernel will go wrong. Thus we
> check whether the kmem_cache_zalloc fails.
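The pattern described in the quoted message, checking an allocation before its first use, might look roughly like this in userspace C. calloc() stands in for kmem_cache_zalloc() here, and struct cache_desc and cache_desc_create() are hypothetical names introduced only for the sketch; this is not the patch under discussion.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for a kernel cache descriptor. */
struct cache_desc {
	char name[32];
	size_t object_size;
};

/*
 * Allocate and initialise a descriptor. calloc() plays the role of
 * kmem_cache_zalloc(): zeroed memory on success, NULL on failure.
 */
static struct cache_desc *cache_desc_create(const char *name,
					    size_t object_size)
{
	struct cache_desc *s = calloc(1, sizeof(*s));

	if (!s)
		return NULL;	/* bail out before s is ever dereferenced */

	snprintf(s->name, sizeof(s->name), "%s", name);
	s->object_size = object_size;
	return s;
}
```

Callers then have to handle a NULL return themselves, which is the usual trade-off of reporting allocation failure instead of crashing on it.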
On Fri, 3 May 2019, Andrea Arcangeli wrote:
> This reverts commit 2f0799a0ffc033bf3cc82d5032acc3ec633464c2.
>
> commit 2f0799a0ffc033bf3cc82d5032acc3ec633464c2 was rightfully applied
> to avoid the risk of a severe regression that was reported by the
> kernel test robot at the end of the merge wi
On Wed, 24 Apr 2019, Joel Savitz wrote:
> In the event of an oom kill, useful information about the killed
> process is printed to dmesg. Users, especially system administrators,
> will find it useful to immediately see the UID of the process.
>
> In the following example, abuse_the_ram is the na
On Wed, 10 Apr 2019, Vlastimil Babka wrote:
> On 4/10/19 4:47 AM, Tobin C. Harding wrote:
> > Recently a 2 year old bug was found in the SLAB allocator that crashes
> > the kernel. This seems to imply that not that many people are using the
> > SLAB allocator.
>
> AFAIK that bug required CONFIG_
cleanup.
> >
> > Reported-by: Cfir Cohen
> > Signed-off-by: David Rientjes
> > ---
> > arch/x86/kvm/svm.c | 12 +---
> > 1 file changed, 9 insertions(+), 3 deletions(-)
> >
>
>
> Reviewed-by: Brijesh Singh
>
> thanks
>
Paolo, Radim, I don't see this in kvm.git, is it ready to be staged?
On Wed, 20 Mar 2019, Singh, Brijesh wrote:
> > get_num_contig_pages() could potentially overflow int so make its type
> > consistent with its usage.
> >
> > Reported-by: Cfir Cohen
> > Signed-off-by: David Rientjes
> > ---
> > arch/x86/kvm/svm
This ensures that the address and length provided to DBG_DECRYPT and
DBG_ENCRYPT do not cause an overflow.
At the same time, pass the actual number of pages pinned in memory to
sev_unpin_memory() as a cleanup.
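A common way to reject an address/length pair that would overflow is to rely on unsigned wraparound. The sketch below assumes 64-bit unsigned addresses; range_is_valid() is a hypothetical helper for illustration and is not claimed to be the check the patch adds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Return true if [addr, addr + len) is non-empty and addr + len does
 * not wrap around the 64-bit address space.
 */
static bool range_is_valid(uint64_t addr, uint64_t len)
{
	if (len == 0)
		return false;

	/* Unsigned addition wraps on overflow, so a wrapped sum is < addr. */
	return addr + len >= addr;
}
```

Validating the pair before any page pinning or copying keeps later arithmetic on addr + len from silently producing a small value.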
Reported-by: Cfir Cohen
Signed-off-by: David Rientjes
---
arch/x86/kvm/svm.c | 12
On Thu, 14 Mar 2019, Liu Xiang wrote:
> When CONFIG_SLUB_DEBUG is not enabled, remove_full() is empty.
> While CONFIG_SLUB_DEBUG is enabled, remove_full() can check
> s->flags by itself. So kmem_cache_debug() is useless and
> can be removed.
>
> Signed-off-by: Liu Xi
On Thu, 14 Mar 2019, Zhaoyang Huang wrote:
> From: Zhaoyang Huang
>
> Two actions for this patch:
> 1. set a batch size for system heap's shrinker, which can have it buffer
> reasonable page blocks in pool for future allocation.
> 2. reverse the order sequence when free page blocks, the purpose i
ob/master/testcases/kernel/syscalls/mbind/mbind02.c
>
> Fixes: 6f4576e3687b ("mempolicy: apply page table walker on
> queue_pages_range()")
> Reported-by: Cyril Hrubis
> Cc: Vlastimil Babka
> Cc: sta...@vger.kernel.org
> Suggested-by: Kirill A. Shutemov
> Signed-off
get_num_contig_pages() could potentially overflow int so make its type
consistent with its usage.
Reported-by: Cfir Cohen
Signed-off-by: David Rientjes
---
arch/x86/kvm/svm.c | 10 +-
1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
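To illustrate why the return type should match the index and length types, here is a userspace sketch of counting physically contiguous pages. The paddrs[] array, SKETCH_PAGE_SIZE, and the function name num_contig_pages() are assumptions made for the example; the kernel function operates on struct page pointers and is not reproduced here.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096ULL	/* assumed page size for the sketch */

/*
 * Count how many entries starting at idx are physically contiguous.
 * paddrs[] is a hypothetical array of page physical addresses.
 * Index, length, and result share one unsigned 64-bit type, so a
 * large count cannot be truncated on its way back to the caller.
 */
static uint64_t num_contig_pages(uint64_t idx, const uint64_t *paddrs,
				 uint64_t npages)
{
	uint64_t i = idx + 1, pages = 1;
	uint64_t paddr;

	assert(idx < npages);
	paddr = paddrs[idx];

	while (i < npages && paddrs[i] == paddr + SKETCH_PAGE_SIZE) {
		paddr = paddrs[i++];
		pages++;
	}
	return pages;
}
```

If the function returned int while npages were a wider unsigned type, a sufficiently long contiguous run could not be represented in the result, which is the kind of mismatch the commit message describes.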
On Tue, 26 Feb 2019, Jing Xiangfeng wrote:
> On 2019/2/26 3:17, David Rientjes wrote:
> > On Mon, 25 Feb 2019, Mike Kravetz wrote:
> >
> >> Ok, what about just moving the calculation/check inside the lock as in the
> >> untested patch below?
>
On Mon, 25 Feb 2019, Daniel Vetter wrote:
> On Sun, Feb 24, 2019 at 12:40:19PM -0800, David Rientjes wrote:
> > On Sat, 29 Dec 2018, syzbot wrote:
> >
> > > Hello,
> > >
> > > syzbot found the following crash on:
> > >
> > > HEAD comm
On Mon, 25 Feb 2019, Mike Kravetz wrote:
> Ok, what about just moving the calculation/check inside the lock as in the
> untested patch below?
>
> Signed-off-by: Mike Kravetz
> ---
> mm/hugetlb.c | 34 ++
> 1 file changed, 26 insertions(+), 8 deletions(-)
>
> dif
On Sun, 24 Feb 2019, Mike Kravetz wrote:
> > User can change a node specific hugetlb count. i.e.
> > /sys/devices/system/node/node1/hugepages/hugepages-2048kB/nr_hugepages
> > the calculated value of count is a total number of huge pages. It could
> > be overflow when a user entering a crazy high
On Sat, 29 Dec 2018, syzbot wrote:
> Hello,
>
> syzbot found the following crash on:
>
> HEAD commit:5694cecdb092 Merge tag 'arm64-upstream' of git://git.kerne..
> git tree: upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=124eebc740
> kernel config: https://sy
On Mon, 11 Feb 2019, Qian Cai wrote:
> "addr" function argument is not used in alloc_consistency_checks() at
> all, so remove it.
>
> Fixes: becfda68abca ("slub: convert SLAB_DEBUG_FREE to
> SLAB_CONSISTENCY_CHECKS")
> Signed-off-by: Qian Cai
Acked-by: David Rientjes
On Thu, 24 Jan 2019, miles.c...@mediatek.com wrote:
> From: Miles Chen
>
> When debugging slab errors in slub.c, sometimes we have to trigger
> a panic in order to get the coredump file. Add a debug option
> SLAB_WARN_ON_ERROR to toggle WARN_ON() when the option is set.
>
Wouldn't it be better
On Sat, 29 Dec 2018, Peng Wang wrote:
> new_slab_objects() will return immediately if freelist is not NULL.
>
> if (freelist)
> return freelist;
>
> One more assignment operation could be avoided.
>
> Signed-off-by: Peng Wang
Acked-by: David Rientjes
mm, tree wide: replace __GFP_REPEAT by
__GFP_RETRY_MAYFAIL with more useful semantic")
Signed-off-by: David Rientjes
---
net/core/skbuff.c | 7 +--
1 file changed, 1 insertion(+), 6 deletions(-)
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
--- a/net/core/skbuff.c
+++ b/net/core/sk
.
Fixes: 1654efcbc431 ("KVM: SVM: Add KVM_SEV_INIT command")
Reported-by: Cfir Cohen
Signed-off-by: David Rientjes
---
arch/x86/kvm/svm.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -6278,6 +62
On Fri, 14 Dec 2018, Vlastimil Babka wrote:
> > It would be interesting to know if anybody has tried using the per-zone
> > free_area's to determine migration targets and set a bit if it should be
> > considered a migration source or a migration target. If all pages for a
> > pageblock are not
On Fri, 14 Dec 2018, Mel Gorman wrote:
> > In other words, I think there is a lot of potential stranding that occurs
> > for both scanners that could otherwise result in completely free
> > pageblocks. If there a single movable page present near the end of the
> > zone in an otherwise fully fr
On Thu, 20 Dec 2018, Nicholas Mc Guire wrote:
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index 871e41c..1c118d7 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -1258,7 +1258,7 @@ void __init vmalloc_init(void)
>
> /* Import existing vmlist entries. */
> for (tmp = vmlist; tmp
On Wed, 12 Dec 2018, Vlastimil Babka wrote:
> > Regarding the role of direct reclaim in the allocator, I think we need
> > work on the feedback from compaction to determine whether it's worthwhile.
> > That's difficult because of the point I continue to bring up:
> > isolate_freepages() is not
On Sun, 9 Dec 2018, Andrea Arcangeli wrote:
> You didn't release the proprietary software that depends on
> __GFP_THISNODE behavior and that you're afraid is getting a
> regression.
>
> Could you at least release with an open source license the benchmark
> software that you must have used to do t
On Tue, 11 Dec 2018, Arnd Bergmann wrote:
> > Hmm, strange that Arnd's build failure is only reporting about an unused
> > variable instead of MMU_NOTIFY_CLEAR being undefined :/
> >
> > I think this should be done so that anybody using
> > mmu_notifier_range_init() doesn't need to worry about the
On Tue, 11 Dec 2018, Jerome Glisse wrote:
> > > > The macro version of mmu_notifier_range_init() for CONFIG_MMU_NOTIFIER=n
> > > > does not evaluate all its arguments, leading to a warning in one case:
> > > >
> > > > mm/migrate.c: In function 'migrate_vma_pages':
> > > > mm/migrate.c:2711:20: er
On Tue, 11 Dec 2018, Jerome Glisse wrote:
> On Tue, Dec 11, 2018 at 09:04:43PM +0100, Arnd Bergmann wrote:
> > The macro version of mmu_notifier_range_init() for CONFIG_MMU_NOTIFIER=n
> > does not evaluate all its arguments, leading to a warning in one case:
> >
> > mm/migrate.c: In function 'mig
On Thu, 6 Dec 2018, Linus Torvalds wrote:
> > On Broadwell, the access latency to local small pages was +5.6%, remote
> > hugepages +16.4%, and remote small pages +19.9%.
> >
> > On Naples, the access latency to local small pages was +4.9%, intrasocket
> > hugepages +10.5%, intrasocket small pages
On Thu, 6 Dec 2018, Mike Rapoport wrote:
> Add the description for kmem_cache_create, fixup the return value paragraph
> and make both kmem_cache_create and add the second '*' to the comment
> opening.
>
> Signed-off-by: Mike Rapoport
Acked-by: David Rientjes
On Thu, 6 Dec 2018, Mike Rapoport wrote:
> Several functions in mm/slab_common.c have kernel-doc comments, it makes
> perfect sense to link them to the MM API reference.
>
> Signed-off-by: Mike Rapoport
Acked-by: David Rientjes
On Wed, 5 Dec 2018, Andrea Arcangeli wrote:
> > I must have said this at least six or seven times: fault latency is
>
> In your original regression report in this thread to Linus:
>
> https://lkml.kernel.org/r/alpine.deb.2.21.1811281504030.231...@chino.kir.corp.google.com
>
> you said "On a
On Fri, 7 Dec 2018, Vlastimil Babka wrote:
> >> But *that* in turn makes for other possible questions:
> >>
> >> - if the reason we couldn't get a local hugepage is that we're simply
> >> out of local memory (huge *or* small), then maybe a remote hugepage is
> >> better.
> >>
> >>Note that th
On Fri, 7 Dec 2018, Michal Hocko wrote:
> > This reverts commit 89c83fb539f95491be80cdd5158e6f0ce329e317.
> >
> > There are a couple of issues with 89c83fb539f9 independent of its partial
> > revert in 2f0799a0ffc0 ("mm, thp: restore node-local hugepage
> > allocations"):
> >
> > Firstly, the
799a0ffc0. The result is the same thp
allocation policy for 4.20 that was in 4.19.
Fixes: 89c83fb539f9 ("mm, thp: consolidate THP gfp handling into
alloc_hugepage_direct_gfpmask")
Fixes: 2f0799a0ffc0 ("mm, thp: restore node-local hugepage allocations")
Signed-off-by: David Rientjes
On Fri, 7 Dec 2018, Vlastimil Babka wrote:
> > This reverts commit 89c83fb539f95491be80cdd5158e6f0ce329e317.
> >
> > There are a couple of issues with 89c83fb539f9 independent of its partial
> > revert in 2f0799a0ffc0 ("mm, thp: restore node-local hugepage
> > allocations"):
> >
> > Firstly, t
On Thu, 6 Dec 2018, Michal Hocko wrote:
> MADV_HUGEPAGE changes the picture because the caller expressed a need
> for THP and is willing to go extra mile to get it. That involves
> allocation latency and as of now also a potential remote access. We do
> not have complete agreement on the later but
On Wed, 5 Dec 2018, Linus Torvalds wrote:
> > Ok, I've applied David's latest patch.
> >
> > I'm not at all objecting to tweaking this further, I just didn't want
> > to have this regression stand.
>
> Hmm. Can somebody (David?) also perhaps try to state what the
> different latency impacts end u
ate THP gfp handling into
alloc_hugepage_direct_gfpmask")
Fixes: 2f0799a0ffc0 ("mm, thp: restore node-local hugepage allocations")
Signed-off-by: David Rientjes
---
include/linux/gfp.h | 12
mm/huge_memory.c    | 27 +--
mm/mempolicy.c      | 32 +
On Wed, 5 Dec 2018, David Rientjes wrote:
> This is a full revert of ac5b2c18911f ("mm: thp: relax __GFP_THISNODE for
> MADV_HUGEPAGE mappings") and a partial revert of 89c83fb539f9 ("mm, thp:
> consolidate THP gfp handling into alloc_hugepage_direct_gfpmask").
>
On Wed, 5 Dec 2018, Andrea Arcangeli wrote:
> __GFP_COMPACT_ONLY gave a hope it could give some middle ground but
> it shows awful compaction results, it basically destroys compaction
> effectiveness and we know why (COMPACT_SKIPPED must call reclaim or
> compaction can't succeed because there's
Restore __GFP_THISNODE for thp allocations.
Fixes: ac5b2c18911f ("mm: thp: relax __GFP_THISNODE for MADV_HUGEPAGE mappings")
Fixes: 89c83fb539f9 ("mm, thp: consolidate THP gfp handling into
alloc_hugepage_direct_gfpmask")
Signed-off-by: David Rientjes
---
include/linux/m
On Wed, 5 Dec 2018, Linus Torvalds wrote:
> > So ultimately we decided that the saner behavior that gives the least
> > risk of regression for the short term, until we can do something
> > better, was the one that is already applied upstream.
>
> You're ignoring the fact that people *did* report
On Wed, 5 Dec 2018, Andrea Arcangeli wrote:
> > High thp utilization is not always better, especially when those hugepages
> > are accessed remotely and introduce the regressions that I've reported.
> > Seeking high thp utilization at all costs is not the goal if it causes
> > workloads to reg
te transparent hugepages on local node
This makes sure that we try to allocate hugepages from the local node if
allowed by mempolicy. If we can't, we fall back to small page allocation
based on mempolicy. This is based on the observation that allocating
pages on local node is more beneficial th
On Wed, 5 Dec 2018, Michal Hocko wrote:
> > As we've been over countless times, this is the desired effect for
> > workloads that fit on a single node. We want local pages of the native
> > page size because they (1) are accessed faster than remote hugepages and
> > (2) are candidates for coll
On Wed, 5 Dec 2018, Michal Hocko wrote:
> > The revert is certainly needed to prevent the regression, yes, but I
> > anticipate that Andrea will report back that patch 2 at least improves the
> > situation for the problem that he was addressing, specifically that it is
> > pointless to thrash a
On Wed, 5 Dec 2018, Mel Gorman wrote:
> > This is a single MADV_HUGEPAGE usecase, there is nothing special about it.
> > It would be the same as if you did mmap(), madvise(MADV_HUGEPAGE), and
> > faulted the memory with a fragmented local node and then measured the
> > remote access latency to
On Wed, 5 Dec 2018, Michal Hocko wrote:
> > > At minimum do not remove the cleanup part which consolidates the gfp
> > > handling to a single place. There is no real reason to have the
> > > __GFP_THISNODE ugliness outside of alloc_hugepage_direct_gfpmask.
> > >
> >
> > The __GFP_THISNODE usage
On Wed, 5 Dec 2018, Michal Hocko wrote:
> > It isn't specific to MADV_HUGEPAGE, it is the policy for all transparent
> > hugepage allocations, including defrag=always. We agree that
> > MADV_HUGEPAGE is not exactly defined: does it mean try harder to allocate
> > a hugepage locally, try compac
On Wed, 5 Dec 2018, Pingfan Liu wrote:
> > > And rather than using first_online_node, would next_online_node() work?
> > >
> > What is the gain? Is it for memory pressure on node0?
> >
> Maybe I got your point now. Do you try to give a cheap assumption on
> nearest neigh of this node?
>
It's li
On Tue, 4 Dec 2018, Mel Gorman wrote:
> What should also be kept in mind is that we should avoid conflating
> locality preferences with THP preferences which is separate from THP
> allocation latencies. The whole __GFP_THISNODE approach is pushing too
> hard on locality versus huge pages when MADV
On Tue, 4 Dec 2018, Michal Hocko wrote:
> The thing I am really up to here is that reintroduction of
> __GFP_THISNODE, which you are pushing for, will conflate madvise mode
> resp. defrag=always with a numa placement policy because the allocation
> doesn't fallback to a remote node.
>
It isn't s
On Tue, 4 Dec 2018, Michal Hocko wrote:
> > This fixes a 13.9% remote memory access regression and 40% remote
> > memory allocation regression on Haswell when the local node is fragmented
> > for hugepage sized pages and memory is being faulted with either the thp
> > defrag setting of "always"
On Tue, 4 Dec 2018, Vlastimil Babka wrote:
> So, AFAIK, the situation is:
>
> - commit 5265047ac301 in 4.1 introduced __GFP_THISNODE for THP. The
> intention came a bit earlier in 4.0 commit 077fcf116c8c. (I admit acking
> both as it seemed to make sense).
Yes, both are based on the preference t
On Tue, 4 Dec 2018, Michal Hocko wrote:
> > This is a full revert of ac5b2c18911f ("mm: thp: relax __GFP_THISNODE for
> > MADV_HUGEPAGE mappings") and a partial revert of 89c83fb539f9 ("mm, thp:
> > consolidate THP gfp handling into alloc_hugepage_direct_gfpmask").
> >
> > By not setting __GFP_TH
On Tue, 4 Dec 2018, Pingfan Liu wrote:
> diff --git a/include/linux/gfp.h b/include/linux/gfp.h
> index 76f8db0..8324953 100644
> --- a/include/linux/gfp.h
> +++ b/include/linux/gfp.h
> @@ -453,6 +453,8 @@ static inline int gfp_zonelist(gfp_t flags)
> */
> static inline struct zonelist *node_zo
Restore __GFP_THISNODE for thp allocations.
Fixes: ac5b2c18911f ("mm: thp: relax __GFP_THISNODE for MADV_HUGEPAGE mappings")
Fixes: 89c83fb539f9 ("mm, thp: consolidate THP gfp handling into
alloc_hugepage_direct_gfpmask")
Signed-off-by: David Rientjes
---
include/linux/m
This fixes a 13.9% remote memory access regression and 40% remote
memory allocation regression on Haswell when the local node is fragmented
for hugepage sized pages and memory is being faulted with either the thp
defrag setting of "always" or has been madvised with MADV_HUGEPAGE.
The usecase th
rather
than trying reclaim of SWAP_CLUSTER_MAX pages which is unlikely to make
a difference for memory compaction to become successful.
Signed-off-by: David Rientjes
---
drivers/gpu/drm/ttm/ttm_page_alloc.c | 8
drivers/gpu/drm/ttm/ttm_page_alloc_dma.c | 3 +--
include/linux/gfp.h
On Mon, 3 Dec 2018, Linus Torvalds wrote:
> Side note: I think maybe people should just look at that whole
> compaction logic for that block, because it doesn't make much sense to
> me:
>
> /*
> * Checks for costly allocations with __GFP_NORETRY, which
>
On Mon, 3 Dec 2018, Michal Hocko wrote:
> > I think extending functionality so thp can be allocated remotely if truly
> > desired is worthwhile
>
> This is a complete NUMA policy antipattern that we have for all other
> user memory allocations. So far you have to be explicit for your numa
> requi
On Mon, 3 Dec 2018, Michal Hocko wrote:
> I have merely said that a better THP locality needs more work and during
> the review discussion I have even volunteered to work on that. There
> are other reclaim related fixes under work right now. All I am saying
> is that MADV_TRANSHUGE having numa loc
On Mon, 3 Dec 2018, Andrea Arcangeli wrote:
> In my earlier review of David's patch, it looked runtime equivalent to
> the __GFP_COMPACT_ONLY solution. It has the only advantage of adding a
> new gfpflag until we're sure we need it but it's the worst solution
> available for the long term in my vi
On Mon, 3 Dec 2018, Andrea Arcangeli wrote:
> It's trivial to reproduce the badness by running a memhog process that
> allocates more than the RAM of 1 NUMA node, under defrag=always
> setting (or by changing memhog to use MADV_HUGEPAGE) and it'll create
> swap storms despite 75% of the RAM is com