Hi Michel,
On Sun, Feb 03, 2013 at 11:17:12PM -0800, Michel Lespinasse wrote:
> munlock_vma_pages_range() was always incrementing addresses by PAGE_SIZE
> at a time. When munlocking THP pages (or the huge zero page), this resulted
> in taking the mm->page_table_lock 512 times in a row.
>
> We can
The memory barrier inside __SetPageUptodate makes sure that
> + * preceeding stores to the page contents become visible after
> + * the set_pte_at() write.
> + */
s/after/before/
After the above correction it looks like a nice cleanup, thanks!
Acked-by: Andrea Arcangeli
--
To unsu
Hi everyone,
On Tue, Jan 29, 2013 at 04:26:13AM +0200, Izik Eidus wrote:
> On 01/29/2013 02:49 AM, Izik Eidus wrote:
> > On 01/29/2013 01:54 AM, Andrew Morton wrote:
> >> On Fri, 25 Jan 2013 17:53:10 -0800 (PST)
> >> Hugh Dickins wrote:
> >>
> >>> Here's a KSM series
> >> Sanity check: do you hav
Hi Michel,
On Tue, Sep 04, 2012 at 02:20:52AM -0700, Michel Lespinasse wrote:
> This change fixes an anon_vma locking issue in the following situation:
> - vma has no anon_vma
> - next has an anon_vma
> - vma is being shrunk / next is being expanded, due to an mprotect call
>
> We need to take ne
On Tue, Sep 04, 2012 at 02:53:47PM -0700, Michel Lespinasse wrote:
> I think the minimal fix would actually be:
>
> if (vma->anon_vma && (importer || start != vma->vm_start)) {
> anon_vma = vma->anon_vma;
> + else if (next->anon_vma && adjust_next)
> + anon_vma
Hi Andrew and Martin,
On Fri, Aug 31, 2012 at 12:47:02PM -0700, Andrew Morton wrote:
> On Fri, 31 Aug 2012 09:07:57 +0200
> Martin Schwidefsky wrote:
>
> > > I grabbed them all. Patches 1-3 look sane to me and I cheerfully
> > > didn't read the s390 changes at all. Hopefully Andrea will be abl
Hi Gerald,
On Wed, Aug 29, 2012 at 05:32:58PM +0200, Gerald Schaefer wrote:
> +#ifndef __HAVE_ARCH_PGTABLE_DEPOSIT
> +extern void pgtable_deposit(struct mm_struct *mm, pgtable_t pgtable);
> +#endif
One minor nitpick on the naming of the two functions: considering that
those are global exports, th
lcome,
> >
> > Will
> >
> > Catalin Marinas (2):
> > mm: thp: Fix the pmd_clear() arguments in pmdp_get_and_clear()
> > mm: thp: Fix the update_mmu_cache() last argument passing in
> > mm/huge_memory.c
Both:
Reviewed-by: Andrea Arcangeli
> >
>
guess
it would be more correct if __GFP_MOVABLE was clear, like
(GFP_TRANSHUGE | __GFP_ZERO) & ~__GFP_MOVABLE because this page isn't
really movable (it's only reclaimable).
The xchg vs cmpxchg locking also looks good.
Reviewed-by: Andrea Arcangeli
Thanks,
Andrea
Hi Kirill,
On Thu, Sep 13, 2012 at 08:37:58PM +0300, Kirill A. Shutemov wrote:
> On Thu, Sep 13, 2012 at 07:16:13PM +0200, Andrea Arcangeli wrote:
> > Hi Kirill,
> >
> > On Wed, Sep 12, 2012 at 01:07:53PM +0300, Kirill A. Shutemov wrote:
> > > - hpage = alloc_pa
On Wed, Jan 30, 2008 at 04:20:35PM -0600, Robin Holt wrote:
> On Wed, Jan 30, 2008 at 11:19:28AM -0800, Christoph Lameter wrote:
> > On Wed, 30 Jan 2008, Jack Steiner wrote:
> >
> > > Moving to a different lock solves the problem.
> >
> > Well it gets us back to the issue why we removed the lock.
On Wed, Jan 30, 2008 at 11:50:26AM -0800, Christoph Lameter wrote:
> Then we have
>
> invalidate_range_start(mm)
>
> and
>
> invalidate_range_finish(mm, start, end)
>
> in addition to the invalidate rmap_notifier?
>
> ---
> include/linux/mmu_notifier.h |7 +--
> 1 file changed, 5 ins
On Wed, Jan 30, 2008 at 03:55:37PM -0800, Christoph Lameter wrote:
> On Thu, 31 Jan 2008, Andrea Arcangeli wrote:
>
> > > I think Andrea's original concept of the lock in the mmu_notifier_head
> > > structure was the best. I agree with him that it should be a
On Wed, Jan 30, 2008 at 04:01:31PM -0800, Christoph Lameter wrote:
> How we offload that? Before the scan of the rmaps we do not have the
> mmstruct. So we'd need another notifier_rmap_callback.
My assumption is that that "int lock" exists just because
unmap_mapping_range_vma exists. If I'm right
On Wed, Jan 30, 2008 at 06:08:14PM -0800, Christoph Lameter wrote:
> hlist_for_each_entry_safe_rcu(mn, n, t,
> &mm->mmu_notifier.head, hlist) {
> hlist_del_rcu(&mn->hlist);
On Wed, Jan 30, 2008 at 05:46:21PM -0800, Christoph Lameter wrote:
> Well the GRU uses follow_page() instead of get_user_pages. Performance is
> a major issue for the GRU.
GRU is an external TLB, we have to allocate RAM instead but we do it
through the regular userland paging mechanism. Performan
On Wed, Jan 30, 2008 at 08:57:52PM -0800, Christoph Lameter wrote:
> @@ -211,7 +212,9 @@ asmlinkage long sys_remap_file_pages(uns
> spin_unlock(&mapping->i_mmap_lock);
> }
>
> + mmu_notifier(invalidate_range_begin, mm, start, start + size, 0);
> err = populate_range(
On Wed, Jan 30, 2008 at 06:51:26PM -0800, Christoph Lameter wrote:
> True. hlist_del_init ok? That would allow to check the driver that the
> mmu_notifier is already linked in using !hlist_unhashed(). Driver then
> needs to properly initialize the mmu_notifier list with INIT_HLIST_NODE().
A driv
will better get start,end range too.
XPMEM -> invalidate_range_start/end/invalidate_external_rmap
GRU/KVM -> invalidate_pages[s], in the future mprotect_pages optimization etc...
Signed-off-by: Andrea Arcangeli <[EMAIL PROTECTED]>
Signed-off-by: Christoph Lameter <[EMAIL PROTECT
On Thu, Jan 31, 2008 at 12:18:54PM -0800, Christoph Lameter wrote:
> pt lock cannot serialize with invalidate_range since it is split. A range
> requires locking for a series of ptes not only individual ones.
The lock I take already protects up to 512 ptes, yes. I call
invalidate_pages only across
On Thu, Jan 31, 2008 at 03:09:55PM -0800, Christoph Lameter wrote:
> On Thu, 31 Jan 2008, Christoph Lameter wrote:
>
> > > pagefault against the main linux page fault, given we already have all
> > > needed serialization out of the PT lock. XPMEM is forced to do that
> >
> > pt lock cannot serial
erformance critical for GRU, if yes, then I hope my _dual_ approach
is by far the best for at least GRU (and KVM of course for the very
same reason), and of course it'll fit XPMEM too the moment you add
invalidate_range_start/end.
Signed-off-by: Andrea Arcangeli <[EMAIL PROTECTED]>
On Thu, Jan 31, 2008 at 02:21:58PM -0800, Christoph Lameter wrote:
> Is this okay for KVM too?
->release isn't implemented at all in KVM, only the list_del generates
complications.
I think current code could be already safe through the mm_count pin,
because KVM relies on the fact anybody pinning
On Thu, Jan 31, 2008 at 05:37:21PM -0800, Christoph Lameter wrote:
> On Fri, 1 Feb 2008, Andrea Arcangeli wrote:
>
> > I appreciate the review! I hope my entirely bug free and
> > strightforward #v5 will strongly increase the probability of getting
> > this in sooner than
On Thu, Jan 31, 2008 at 05:44:24PM -0800, Christoph Lameter wrote:
> The trouble is that the invalidates are much more expensive if you have to
> send theses to remote partitions (XPmem). And its really great if you can
> simple tear down everything. Certainly this is a significant improvement
>
On Thu, Jan 31, 2008 at 07:58:40PM -0800, Christoph Lameter wrote:
> Ok. Andrea wanted the same because then he can void the begin callouts.
Exactly. I hope the page-pin will avoid me having to serialize the KVM
page fault against the start/end critical section.
BTW, I wonder if the start/end cri
On Thu, Jan 31, 2008 at 09:04:39PM -0800, Christoph Lameter wrote:
> - Has page tables to track pages whose refcount was elevated(?) but
> no reverse maps.
Just a correction: rmaps exist or swap couldn't be sane, it's just
that it's not built on the page_t because the guest memory is really
vir
On Fri, Feb 01, 2008 at 11:23:57AM -0800, Christoph Lameter wrote:
> Yes so your invalidate_range is still some sort of dysfunctional
> optimization? Gazillions of invalidate_page's will have to be executed
> when tearing down large memory areas.
I don't know if gru can flush the external TLB re
On Fri, Feb 01, 2008 at 05:35:28PM -0600, Robin Holt wrote:
> No, we need a callout when we are becoming more restrictive, but not
> when becoming more permissive. I would have to guess that is the case
> for any of these callouts. It is for both GRU and XPMEM. I would
> expect the same is true
On Sat, Feb 02, 2008 at 09:14:57PM -0600, Jack Steiner wrote:
> Also, most (but not all) applications that use the GRU do not usually do
> anything that requires frequent flushing (fortunately). The GRU is intended
> for HPC-like applications. These don't usually do frequent map/unmap
> operations
On Mon, Feb 04, 2008 at 11:09:01AM -0800, Christoph Lameter wrote:
> On Sun, 3 Feb 2008, Andrea Arcangeli wrote:
>
> > > Right but that pin requires taking a refcount which we cannot do.
> >
> > GRU can use my patch without the pin. XPMEM obviously can't use my
>
On Mon, Feb 04, 2008 at 10:11:24PM -0800, Christoph Lameter wrote:
> Zero problems only if you find having a single callout for every page
> acceptable. So the invalidate_range in your patch is only working
invalidate_pages is only a further optimization that was
straightforward in some places wh
it's 200M.
> After the patcheset thp-always RSS is 400k too.
>
> v3:
> - fix potential deadlock in refcounting code on preemptive kernel.
> - do not mark huge zero page as movable.
> - fix typo in comment.
> - Reviewed-by tag from Andrea Arcangeli.
> v2:
> -
aults on transparent hugepages which do not result
> in a CoW update the access flags for the faulting pmd.
>
> Cc: Andrea Arcangeli
> Cc: Chris Metcalf
> Signed-off-by: Steve Capper
> Signed-off-by: Will Deacon
> ---
>
> v2: - Use pmd_trans_huge_lock to guard again
Hi Dave,
On Tue, Oct 02, 2012 at 06:27:18PM -0400, David Miller wrote:
>
> The code currently assumes that pgtable_t is a struct page pointer.
>
> Fix this by pushing pgtable management behind arch helper functions.
This should be fixed in -mm already, it's from the s390x support.
Hi Andrew,
On Tue, Oct 02, 2012 at 03:31:48PM -0700, Andrew Morton wrote:
> From reading the code, it appears that we initially allocate a huge
> page and point the pmd at that. If/when there is a write fault against
> that page we then populate the mm with ptes which point at the normal
> 4k zer
d_clear(tlb->mm, addr, pmd);
> page = pmd_page(orig_pmd);
> tlb_remove_pmd_tlb_entry(tlb, pmd, addr);
>
> And we properly accomodate TLB flush mechanims like the one described
> above.
Thanks for the explanation.
Reviewed-by: Andrea Arcangeli
Hi Dave,
On Wed, Oct 03, 2012 at 10:00:27PM -0400, David Miller wrote:
> From: Andrew Morton
> Date: Tue, 2 Oct 2012 15:55:44 -0700
>
> > I had a shot at integrating all this onto the pending stuff in linux-next.
> > "mm: Add and use update_mmu_cache_pmd() in transparent huge page code."
> > ne
Subject: Re: [PATCH 29/33] autonuma: page_autonuma
Hi Christoph,
On Thu, Oct 04, 2012 at 02:16:14PM +, Christoph Lameter wrote:
> On Thu, 4 Oct 2012, Andrea Arcang
Hi Christoph,
On Thu, Oct 04, 2012 at 06:17:37PM +, Christoph Lameter wrote:
> On Thu, 4 Oct 2012, Andrea Arcangeli wrote:
>
> > So we could drop page_autonuma by creating a CONFIG_SLUB=y dependency
> > (AUTONUMA wouldn't be available in the kernel config if SLAB=y, a
Hi Christoph,
On Thu, Oct 04, 2012 at 07:11:51PM +, Christoph Lameter wrote:
> I did not say anything like that. Still not convinced that autonuma is
> worth doing and that it is beneficial given the complexity it adds to the
> kernel. Just wanted to point out that there is a case to be made f
Hi Michal,
On Fri, Oct 05, 2012 at 11:28:10AM +0200, Michal Hocko wrote:
> FWIW there is also a pure -mm (non-rebased) git tree at
> http://git.kernel.org/?p=linux/kernel/git/mhocko/mm.git;a=summary
> since-3.6 branch. It is based on top of 3.6 with mm patches from
> Andrew's tree.
I'd still sugg
Hi everyone,
because of vger technicalities the AutoNUMA27 submit only reached
linux-mm, so I thought of posting a link here too for reference. If
you're interested to benchmark AutoNUMA using your preferred workload
on NUMA hardware, or even better if you could review the code, you can
find the r
Hi Kanoj,
On Wed, Feb 13, 2008 at 03:43:17PM -0800, Kanoj Sarcar wrote:
> Oh ok, yes, I did see the discussion on this; sorry I
> missed it. I do see what notifiers bring to the table
> now (without endorsing it :-)).
I'm not really sure livelocks are the big issue here.
I'm running N 1G VM on
pinned by sptes. The race
can materialize if the linux pte is zapped after get_user_pages
returns but before the page is mapped by the spte and tracked by
rmap. The invalidate_ calls can also likely be optimized further but
it's not a fast path so it's not urgent.
Signed-off-by: Andrea Ar
On Fri, Feb 15, 2008 at 07:37:36PM -0800, Andrew Morton wrote:
> The "|" is obviously deliberate. But no explanation is provided telling us
> why we still call the callback if ptep_clear_flush_young() said the page
> was recently referenced. People who read your code will want to understand
> thi
Hi Kirill,
On Tue, Sep 25, 2012 at 05:27:03PM +0300, Kirill A. Shutemov wrote:
> On Fri, Sep 14, 2012 at 07:52:10AM +0200, Ingo Molnar wrote:
> > Without repeatable hard numbers such code just gets into the
> > kernel and bitrots there as new CPU generations come in - a few
> > years down the li
Hi everyone,
On Fri, Aug 02, 2013 at 02:46:59PM -0500, Alex Thorlton wrote:
> This patch implements functionality to allow processes to disable the use of
> transparent hugepages through the prctl syscall.
>
> We've determined that some jobs perform significantly better with thp
> disabled,
> an
as already
modified in order to set PG_tail so this won't affect the boot time of
large memory systems.
Reported-by: andy123
Signed-off-by: Andrea Arcangeli
---
mm/hugetlb.c | 18 +-
1 file changed, 17 insertions(+), 1 deletion(-)
diff --git a/mm/hugetlb.c b/mm/hugetlb.
patch
11feeb498086a3a5907b8148bdf1786a9b18fc55.
Enforcing PG_reserved not set for tail pages of hugetlbfs gigantic
compound pages sounds safer regardless of commit
11feeb498086a3a5907b8148bdf1786a9b18fc55 to be consistent with the
other hugetlbfs page sizes (i.e. hugetlbfs page order < MAX_ORDER).
Thanks,
Andrea
Andrea Arca
Hi everyone,
On Sat, Sep 28, 2013 at 09:37:39PM +0200, Ingo Molnar wrote:
>
> * Ingo Molnar wrote:
>
> > If we do that then I suspect the next step will be queued rwlocks :-/
> > The current rwlock_t implementation is rather primitive by modern
> > standards. (We'd probably have killed rwlock
On Mon, Sep 30, 2013 at 09:26:21AM -0700, Linus Torvalds wrote:
> On Mon, Sep 30, 2013 at 1:52 AM, Andrea Arcangeli wrote:
> >
> > Sorry having to break the party but the sleepable locks for anon_vma
> > and i_mmap_mutex are now a requirement for the "pageable RDMA"
Hi Mel,
On Tue, Nov 06, 2012 at 09:14:36AM +, Mel Gorman wrote:
> This series addresses part of the integration and sharing problem by
> implementing a foundation that either the policy for schednuma or autonuma
> can be rebased on. The actual policy it implements is a very stupid
> greedy pol
Hi,
On Fri, Nov 16, 2012 at 02:14:28PM +, Mel Gorman wrote:
> With some shuffling the question on what to consider for merging
> becomes
>
>
> 1. TLB optimisation patches 1-3? Patches 1-3
I assume you mean simply reshuffling 33-35 as 1-3.
> 2. Stats for migration?
mm/huge_memory.c | 43 +------
> 1 file changed, 21 insertions(+), 22 deletions(-)
Reviewed-by: Andrea Arcangeli
On Tue, Jun 04, 2013 at 06:37:25AM -0400, Rik van Riel wrote:
> On 06/03/2013 03:50 PM, Daniel Forrest wrote:
> > On Tue, Aug 21, 2012 at 11:29:54PM -0400, Rik van Riel wrote:
> >> On 08/21/2012 11:20 PM, Michel Lespinasse wrote:
> >>> On Mon, Aug 20, 2012 at 02:39:26AM -0700, Michel Lespinasse wro
Hi Johannes,
On Fri, Jul 19, 2013 at 04:55:25PM -0400, Johannes Weiner wrote:
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index af1d956b..d938b67 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1879,6 +1879,14 @@ zonelist_scan:
> if (alloc_flags & ALLOC_NO_WATERMA
On Thu, Aug 01, 2013 at 12:31:34AM -0400, Rik van Riel wrote:
> On 07/31/2013 10:56 PM, Minchan Kim wrote:
>
> > Yes, it's not really slow path because it could return to normal status
> > without calling significant slow functions by reset batchcount of
> > prepare_slowpath.
> >
> > I think it's
On Thu, Aug 01, 2013 at 03:58:23PM -0400, Johannes Weiner wrote:
> But we might be able to get away with a small error.
The idea is that there's a small error anyway, because multiple CPUs
can reset it at the same time, while another CPU is decreasing it, so
the decrease sometimes may get lost rega
NUMA
node step on each other is going to be unmeasurable.
ACK the whole series.
Signed-off-by: Andrea Arcangeli
Thanks,
Andrea
On Mon, Aug 05, 2013 at 06:34:56PM +0800, Wanpeng Li wrote:
> Why round robin allocator don't consume ZONE_DMA?
I guess lowmem reserve reserves it all, 4GB/256(ratio)=16MB.
The only way to relax it would be 1) to account depending on memblock
types and allow only the movable ones to bypass the lo
> say what exactly the effects are, but maybe you do (or you could
> > explain to me why I am wrong :)).
> >
> > commit a8aed3e0752b4beb2e37cbed6df69faae88268da
> > Author: Andrea Arcangeli
> > Date: Fri Feb 22 15:11:51 2013 -0800
> >
> > x86/mm/pa
On Mon, Apr 08, 2013 at 03:53:31PM +0100, Andy Whitcroft wrote:
> On Sat, Apr 06, 2013 at 04:58:04PM +0200, Andrea Arcangeli wrote:
>
> > You're right, so this location clearly didn't trigger the problem so I
> > didn't notice the noop here. I only exercised the
it to the right variable in the new
location.
Reported-by: Stefan Bader
Signed-off-by: Andrea Arcangeli
---
arch/x86/mm/pageattr.c | 10 +-
1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
index 091934e..7896f71 100644
--- a/a
p_anon_pages runs.
Signed-off-by: Andrea Arcangeli
---
mm/huge_memory.c | 24
mm/rmap.c | 9 +
2 files changed, 29 insertions(+), 4 deletions(-)
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index f46aad1..9a2e235 100644
--- a/mm/huge_memory.c
+++ b/mm
exclusive if set.
Signed-off-by: Andrea Arcangeli
---
arch/alpha/include/uapi/asm/mman.h | 3 +++
arch/mips/include/uapi/asm/mman.h | 3 +++
arch/parisc/include/uapi/asm/mman.h| 3 +++
arch/xtensa/include/uapi/asm/mman.h| 3 +++
include/linux/mm.h | 1
remap_anon_pages only once for each range received). So
remap_anon_pages in the above testcase runs within the signal handler,
but in production postcopy, it can run in a different thread.
Signed-off-by: Andrea Arcangeli
---
arch/x86/syscalls/syscall_32.tbl | 1 +
arch/x86/syscalls/
in some anon_vma.
Signed-off-by: Andrea Arcangeli
---
include/linux/swap.h | 6 ++
mm/swapfile.c | 13 +
2 files changed, 19 insertions(+)
diff --git a/include/linux/swap.h b/include/linux/swap.h
index 1701ce4..0ea2a56 100644
--- a/include/linux/swap.h
+++ b/include/linux
nd to be strict and be sure it knows what it
is doing (otherwise it should use mremap in the first place?).
Comments welcome, thanks!
Andrea
Andrea Arcangeli (4):
mm: madvise MADV_USERFAULT
mm: rmap preparation for remap_anon_pages
mm: swp_entry_swapcount
mm: sys_remap_anon_pages
arch/alp
On Mon, May 06, 2013 at 09:57:01PM +0200, Andrea Arcangeli wrote:
> ===
>
> static unsigned char *c, *tmp;
>
> void userfault_sighandler(int signum, siginfo_t *info, void *ctx)
oops, the hash of the test program got cut... so I append it below
which is nicer without lead
Hi Andrew,
On Tue, May 07, 2013 at 01:16:30PM +0200, Andrew Jones wrote:
> On Mon, May 06, 2013 at 09:56:58PM +0200, Andrea Arcangeli wrote:
> > @@ -405,6 +420,7 @@ madvise_behavior_valid(int behavior)
> > case MADV_HUGEPAGE:
> > case MADV_NOHUGEPAGE:
>
Hi Isaku,
On Tue, May 07, 2013 at 07:07:40PM +0900, Isaku Yamahata wrote:
> On Mon, May 06, 2013 at 09:56:57PM +0200, Andrea Arcangeli wrote:
> > Hello everyone,
> >
> > this is a patchset to implement two new kernel features:
> > MADV_USERFAULT and remap_anon_pages.
On Tue, May 07, 2013 at 01:38:10PM +0200, Andrew Jones wrote:
> What about instead of adding a new syscall (remap_anon_pages) to
> instead extend mremap with new flags giving it a strict mode?
I actually thought about this and it's a very interesting argument.
When I thought about it, I felt the
Hi,
On Thu, Apr 11, 2013 at 02:29:18PM +0200, Ingo Molnar wrote:
>
>
> * tip-bot for Andrea Arcangeli wrote:
>
> > Commit-ID: f76cfa3c2496c462b5bc01bd0c9340c2715b73ca
> > Gitweb:
> > http://git.kernel.org/tip/f76cfa3c2496c462b5bc01bd0c9340c2715b73ca
>
If the pmd is not present, _PAGE_PSE will not be set anymore. Fix the
false positive.
Reported-by: Ingo Molnar
Signed-off-by: Andrea Arcangeli
---
arch/x86/mm/pageattr-test.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/x86/mm/pageattr-test.c b/arch/x86/mm/pageattr
ely.
This can't affect x86 where even a locked bitop is the equivalent of a
full memory barrier.
> Also, what is that barrier() in handle_mm_fault() doing? And why doesn't
> it have a comment explaining that?
I added the docs below:
=
From ad51771a2c3fa697fa0267edda23b48
On Tue, Jul 24, 2012 at 02:51:05PM -0700, Hugh Dickins wrote:
> Since then, I think THP has made the rules more complicated; but I
> believe Andrea paid a great deal of attention to that kind of issue.
There were many issues, one unexpected was
1a5a9906d4e8d1976b701f889d8f35d54b928f25.
Keep in mi
On Sat, Aug 04, 2012 at 03:02:45PM -0700, Paul E. McKenney wrote:
> OK, I'll bite. ;-)
:))
> The most sane way for this to happen is with feedback-driven techniques
> involving profiling, similar to what is done for basic-block reordering
> or branch prediction. The idea is that you compile the
tested yet. Reviews welcome. Especially if
you could test it again with trinity over the mbind syscall it'd be
wonderful.
Thanks,
Andrea
===
From 59af0d4348eb07087097e310f60422b994dd3a2c Mon Sep 17 00:00:00 2001
From: Andrea Arcangeli
Date: Tue, 21 Aug 2012 19:32:23 +0200
Subject: [PATCH] thp: make pmd_pres
This resets all per-thread and per-process statistics across exec
syscalls or after kernel threads detach from the mm. The past
statistical NUMA information is unlikely to be relevant for the future
in these cases.
Acked-by: Rik van Riel
Signed-off-by: Andrea Arcangeli
---
fs/exec.c
and make it tunable with
sysfs too.
Signed-off-by: Andrea Arcangeli
---
mm/huge_memory.c | 33 +++--
1 files changed, 31 insertions(+), 2 deletions(-)
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 08fd33c..a65590f 100644
--- a/mm/huge_memory.c
+++ b/mm
This is needed to make sure the tail pages are also queued into the
migration queues of knuma_migrated across a transparent hugepage
split.
Acked-by: Rik van Riel
Signed-off-by: Andrea Arcangeli
---
mm/huge_memory.c |2 ++
1 files changed, 2 insertions(+), 0 deletions(-)
diff --git a/mm
allocated
page_autonuma of 32 bytes per page (only allocated if booted on NUMA
hardware, unless "noautonuma" is passed as parameter to the kernel at
boot). Yet another later patch introduces the autonuma_list and
reduces the size of the page_autonuma from 32 to 12 bytes.
Signed-off-
ned-off-by: Andrea Arcangeli
---
kernel/sched/fair.c | 11 +++
1 files changed, 11 insertions(+), 0 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 42a88fa..677b99e 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2794,6 +2794,17 @@ select_ta
it reduces some
initial thrashing in case of NUMA false sharing.
Signed-off-by: Andrea Arcangeli
---
include/linux/autonuma_flags.h | 20
mm/autonuma.c |7 +--
2 files changed, 25 insertions(+), 2 deletions(-)
diff --git a/include/linux
low the NUMA hinting page faults
to start. All other actions follow after that. If knuma_scand doesn't
run, AutoNUMA is fully bypassed. If knuma_scand is stopped, soon all
other AutoNUMA gears will settle down too.
Acked-by: Rik van Riel
Signed-off-by: Andrea Arcangeli
---
kernel/fork.c |
to the
original value of -1 and task_autonuma_cpu will always return true in
that case.
Includes fixes from Hillf Danton .
Signed-off-by: Andrea Arcangeli
---
kernel/sched/fair.c | 71 ++
1 files changed, 59 insertions(+), 12 deletions(-)
dif
Define the two data structures that collect the per-process (in the
mm) and per-thread (in the task_struct) statistical information that
are the input of the CPU follow memory algorithms in the NUMA
scheduler.
Signed-off-by: Andrea Arcangeli
---
include/linux/autonuma_types.h | 107
set_pmd_at() will also be used for the knuma_scand/pmd = 1 (default)
mode even when TRANSPARENT_HUGEPAGE=n. Make it available so the build
won't fail.
Acked-by: Rik van Riel
Signed-off-by: Andrea Arcangeli
---
arch/x86/include/asm/paravirt.h |2 --
1 files changed, 0 insertions(
NUMA
hardware. So the non NUMA hardware only pays the memory of a pointer
in the kernel stack (which remains NULL at all times in that case).
If the kernel is compiled with CONFIG_AUTONUMA=n, not even the pointer
is allocated on the kernel stack of course.
Signed-off-by: Andrea Arcangeli
Until THP native migration is implemented it's safer to boost
the khugepaged scanning rate because all memory migrations are splitting
the hugepages. So the regular rate of scanning becomes too low when
lots of memory is migrated.
Signed-off-by: Andrea Arcangeli
---
mm/huge_memory.c |6
Add the config options to allow building the kernel with AutoNUMA.
If CONFIG_AUTONUMA_DEFAULT_ENABLED is "=y", then
/sys/kernel/mm/autonuma/enabled will be equal to 1, and AutoNUMA will
be enabled automatically at boot.
Signed-off-by: Andrea Arcangeli
---
arch/Kconfig |3 +++
When pages are collapsed try to keep the last_nid information from one
of the original pages.
Signed-off-by: Andrea Arcangeli
---
mm/huge_memory.c | 14 ++
1 files changed, 14 insertions(+), 0 deletions(-)
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index a65590f..0d2a12f
ted if the kernel is booted on real
NUMA hardware and noautonuma is not passed as a parameter to the
kernel.
Signed-off-by: Andrea Arcangeli
---
include/linux/autonuma.h | 18 +++-
include/linux/autonuma_types.h | 55 +
include/linux/mm_types.h | 26
in
Code include fixes and cleanups from Hillf Danton .
Signed-off-by: Andrea Arcangeli
---
include/linux/autonuma_sched.h | 50
include/linux/mm_types.h |5 +
include/linux/sched.h |3 +
kernel/sched/core.c|1 +
kernel/sched/fair.c|4 +
e able to wait on process
migration (avoid _nowait), but most of the time it does nothing at
all.
Changelog from alpha11 to alpha13:
o autonuma_balance optimization (take the fast path when process is in
the preferred NUMA node)
TODO:
o THP native migration (orthogonal and also needed for
c
y: Rik van Riel
Signed-off-by: Andrea Arcangeli
---
arch/x86/mm/gup.c | 13 -
1 files changed, 12 insertions(+), 1 deletions(-)
diff --git a/arch/x86/mm/gup.c b/arch/x86/mm/gup.c
index dd74e46..02c5ec5 100644
--- a/arch/x86/mm/gup.c
+++ b/arch/x86/mm/gup.c
@@ -163,8 +163,19 @@
space reserved
for now).
This means the max RAM configuration fully supported by AutoNUMA
becomes AUTONUMA_LIST_MAX_PFN_OFFSET multiplied by 32767 nodes
multiplied by the PAGE_SIZE (assume 4096 here, but for some archs it's
bigger).
4096*32767*(0xffffffff-3)>>(10*5) = 511 PetaBytes.
Si
This is where the numa hinting page faults are detected and are passed
over to the AutoNUMA core logic.
Signed-off-by: Andrea Arcangeli
---
include/linux/huge_mm.h |2 ++
mm/huge_memory.c| 18 ++
mm/memory.c | 31 +++
3
the memory in a round robin fashion from
all remote nodes to the daemon's local node.
The head that belongs to the local node that knuma_migrated runs on,
for now must be empty and it's not being used.
Signed-off-by: Andrea Arcangeli
---
include/linux/mmzone.h | 18