Re: [PATCH v2 3/3] mm: accelerate munlock() treatment of THP pages

2013-02-08 Thread Andrea Arcangeli
Hi Michel, On Sun, Feb 03, 2013 at 11:17:12PM -0800, Michel Lespinasse wrote: > munlock_vma_pages_range() was always incrementing addresses by PAGE_SIZE > at a time. When munlocking THP pages (or the huge zero page), this resulted > in taking the mm->page_table_lock 512 times in a row. > > We can
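The "512 times in a row" figure quoted above follows from the page sizes. A minimal userspace sketch of that arithmetic, assuming 4 KiB base pages and 2 MiB huge pages (the x86-64 defaults); `walk_iterations` and both size macros are hypothetical names for illustration, not code from the patch:

```c
#include <assert.h>

#define BASE_PAGE_SIZE 4096UL                 /* assumed 4 KiB base page */
#define HUGE_PAGE_SIZE (2UL * 1024 * 1024)    /* assumed 2 MiB huge page */

/*
 * Count how many loop iterations (and so how many lock round trips)
 * a walk over [start, end) needs when it advances by `step` bytes.
 */
static unsigned long walk_iterations(unsigned long start, unsigned long end,
                                     unsigned long step)
{
    unsigned long n = 0;

    assert(step != 0 && start <= end);
    for (unsigned long addr = start; addr < end; addr += step)
        n++;
    return n;
}
```

Stepping by `BASE_PAGE_SIZE` across one huge page gives 512 iterations; stepping by `HUGE_PAGE_SIZE` gives one.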

Re: [PATCH] THP: Use explicit memory barrier

2013-04-04 Thread Andrea Arcangeli
The memory barrier inside __SetPageUptodate makes sure that > + * preceeding stores to the page contents become visible after > + * the set_pte_at() write. > + */ s/after/before/ After the above correction it looks nice cleanup, thanks! Acked-by: Andrea Arcangeli -- To unsu
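The ordering the corrected comment describes (page contents visible before the pointer that publishes them) can be sketched in portable C11. This is only an analogy under stated assumptions: `struct page_like`, `publish` and `consume` are hypothetical, and a release store stands in for the kernel barrier; it is not how `__SetPageUptodate` or `set_pte_at()` are implemented:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct page_like {
    int contents;
};

/* Stand-in for the page table entry; NULL means "not mapped yet". */
static _Atomic(struct page_like *) published;

/* Fill the page, then publish it. The release store orders the
 * contents write before the pointer write. */
static void publish(struct page_like *p, int value)
{
    assert(p != NULL);
    p->contents = value;
    atomic_store_explicit(&published, p, memory_order_release);
}

/* A reader that observes the pointer via an acquire load is
 * guaranteed to observe the contents written before it.
 * Returns -1 if nothing has been published yet. */
static int consume(void)
{
    struct page_like *p =
        atomic_load_explicit(&published, memory_order_acquire);

    return p ? p->contents : -1;
}
```

The `-1` sentinel is a simplification for the sketch; a real reader would distinguish "not mapped" from page data by other means.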

Re: [PATCH 0/11] ksm: NUMA trees and page migration

2013-01-29 Thread Andrea Arcangeli
Hi everyone, On Tue, Jan 29, 2013 at 04:26:13AM +0200, Izik Eidus wrote: > On 01/29/2013 02:49 AM, Izik Eidus wrote: > > On 01/29/2013 01:54 AM, Andrew Morton wrote: > >> On Fri, 25 Jan 2013 17:53:10 -0800 (PST) > >> Hugh Dickins wrote: > >> > >>> Here's a KSM series > >> Sanity check: do you hav

Re: [PATCH 2/7] mm: fix potential anon_vma locking issue in mprotect()

2012-09-04 Thread Andrea Arcangeli
Hi Michel, On Tue, Sep 04, 2012 at 02:20:52AM -0700, Michel Lespinasse wrote: > This change fixes an anon_vma locking issue in the following situation: > - vma has no anon_vma > - next has an anon_vma > - vma is being shrunk / next is being expanded, due to an mprotect call > > We need to take ne

Re: [PATCH 2/7] mm: fix potential anon_vma locking issue in mprotect()

2012-09-04 Thread Andrea Arcangeli
On Tue, Sep 04, 2012 at 02:53:47PM -0700, Michel Lespinasse wrote: > I think the minimal fix would actually be: > > if (vma->anon_vma && (importer || start != vma->vm_start)) { > anon_vma = vma->anon_vma; > + else if (next->anon_vma && adjust_next) > + anon_vma

Re: [RFC v2 PATCH 0/7] thp: transparent hugepages on s390

2012-09-04 Thread Andrea Arcangeli
Hi Andrew and Martin, On Fri, Aug 31, 2012 at 12:47:02PM -0700, Andrew Morton wrote: > On Fri, 31 Aug 2012 09:07:57 +0200 > Martin Schwidefsky wrote: > > > > I grabbed them all. Patches 1-3 look sane to me and I cheerfully > > > didn't read the s390 changes at all. Hopefully Andrea will be abl

Re: [RFC v2 PATCH 1/7] thp: remove assumptions on pgtable_t type

2012-09-04 Thread Andrea Arcangeli
Hi Gerald, On Wed, Aug 29, 2012 at 05:32:58PM +0200, Gerald Schaefer wrote: > +#ifndef __HAVE_ARCH_PGTABLE_DEPOSIT > +extern void pgtable_deposit(struct mm_struct *mm, pgtable_t pgtable); > +#endif One minor nitpick on the naming of the two functions: considering that those are global exports, th

Re: [PATCH 0/3] Minor changes to common hugetlb code for ARM

2012-09-12 Thread Andrea Arcangeli
lcome, > > > > Will > > > > Catalin Marinas (2): > > mm: thp: Fix the pmd_clear() arguments in pmdp_get_and_clear() > > mm: thp: Fix the update_mmu_cache() last argument passing in > > mm/huge_memory.c Both: Reviewed-by: Andrea Arcangeli > > >

Re: [PATCH v3 10/10] thp: implement refcounting for huge zero page

2012-09-13 Thread Andrea Arcangeli
guess it would be more correct if __GFP_MOVABLE was clear, like (GFP_TRANSHUGE | __GFP_ZERO) & ~__GFP_MOVABLE because this page isn't really movable (it's only reclaimable). The xchg vs xchgcmp locking also looks good. Reviewed-by: Andrea Arcangeli Thanks, Andrea
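The mask expression suggested above clears one bit from a composite flag set. A minimal sketch of that bit manipulation, using made-up `DEMO_` flag values (the real `__GFP_*` values differ between kernel versions) and assuming the composite mask contains the movable bit, as the suggestion implies:

```c
#include <assert.h>

/* Hypothetical bit values, for illustration only. */
#define DEMO_GFP_MOVABLE   0x08u
#define DEMO_GFP_COMP      0x40u
#define DEMO_GFP_ZERO      0x100u
/* Assumed composite mask that includes the movable bit. */
#define DEMO_GFP_TRANSHUGE (DEMO_GFP_COMP | DEMO_GFP_MOVABLE)

/* Build the allocation mask for the huge zero page: zeroed,
 * huge-page flags kept, movable bit cleared. */
static unsigned int zero_page_gfp(void)
{
    assert(DEMO_GFP_TRANSHUGE & DEMO_GFP_MOVABLE);
    return (DEMO_GFP_TRANSHUGE | DEMO_GFP_ZERO) & ~DEMO_GFP_MOVABLE;
}
```

The result keeps every other bit of the composite mask and the zeroing bit; only the movable bit is dropped.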

Re: [PATCH v3 10/10] thp: implement refcounting for huge zero page

2012-09-13 Thread Andrea Arcangeli
Hi Kirill, On Thu, Sep 13, 2012 at 08:37:58PM +0300, Kirill A. Shutemov wrote: > On Thu, Sep 13, 2012 at 07:16:13PM +0200, Andrea Arcangeli wrote: > > Hi Kirill, > > > > On Wed, Sep 12, 2012 at 01:07:53PM +0300, Kirill A. Shutemov wrote: > > > - hpage = alloc_pa

Re: [patch 1/6] mmu_notifier: Core code

2008-01-30 Thread Andrea Arcangeli
On Wed, Jan 30, 2008 at 04:20:35PM -0600, Robin Holt wrote: > On Wed, Jan 30, 2008 at 11:19:28AM -0800, Christoph Lameter wrote: > > On Wed, 30 Jan 2008, Jack Steiner wrote: > > > > > Moving to a different lock solves the problem. > > > > Well it gets us back to the issue why we removed the lock.

Re: [patch 2/6] mmu_notifier: Callbacks to invalidate address ranges

2008-01-30 Thread Andrea Arcangeli
On Wed, Jan 30, 2008 at 11:50:26AM -0800, Christoph Lameter wrote: > Then we have > > invalidate_range_start(mm) > > and > > invalidate_range_finish(mm, start, end) > > in addition to the invalidate rmap_notifier? > > --- > include/linux/mmu_notifier.h |7 +-- > 1 file changed, 5 ins

Re: [kvm-devel] [patch 1/6] mmu_notifier: Core code

2008-01-30 Thread Andrea Arcangeli
On Wed, Jan 30, 2008 at 03:55:37PM -0800, Christoph Lameter wrote: > On Thu, 31 Jan 2008, Andrea Arcangeli wrote: > > > > I think Andrea's original concept of the lock in the mmu_notifier_head > > > structure was the best. I agree with him that it should be a

Re: [kvm-devel] [patch 2/6] mmu_notifier: Callbacks to invalidate address ranges

2008-01-30 Thread Andrea Arcangeli
On Wed, Jan 30, 2008 at 04:01:31PM -0800, Christoph Lameter wrote: > How we offload that? Before the scan of the rmaps we do not have the > mmstruct. So we'd need another notifier_rmap_callback. My assumption is that that "int lock" exists just because unmap_mapping_range_vma exists. If I'm right

Re: [kvm-devel] [patch 2/6] mmu_notifier: Callbacks to invalidate address ranges

2008-01-30 Thread Andrea Arcangeli
On Wed, Jan 30, 2008 at 06:08:14PM -0800, Christoph Lameter wrote: > hlist_for_each_entry_safe_rcu(mn, n, t, > &mm->mmu_notifier.head, hlist) { > hlist_del_rcu(&mn->hlist);

Re: [kvm-devel] [patch 2/6] mmu_notifier: Callbacks to invalidate address ranges

2008-01-31 Thread Andrea Arcangeli
On Wed, Jan 30, 2008 at 05:46:21PM -0800, Christoph Lameter wrote: > Well the GRU uses follow_page() instead of get_user_pages. Performance is > a major issue for the GRU. GRU is an external TLB, we have to allocate RAM instead but we do it through the regular userland paging mechanism. Performan

Re: [patch 2/3] mmu_notifier: Callbacks to invalidate address ranges

2008-01-31 Thread Andrea Arcangeli
On Wed, Jan 30, 2008 at 08:57:52PM -0800, Christoph Lameter wrote: > @@ -211,7 +212,9 @@ asmlinkage long sys_remap_file_pages(uns > spin_unlock(&mapping->i_mmap_lock); > } > > + mmu_notifier(invalidate_range_begin, mm, start, start + size, 0); > err = populate_range(

Re: [kvm-devel] [patch 2/6] mmu_notifier: Callbacks to invalidate address ranges

2008-01-31 Thread Andrea Arcangeli
On Wed, Jan 30, 2008 at 06:51:26PM -0800, Christoph Lameter wrote: > True. hlist_del_init ok? That would allow to check the driver that the > mmu_notifier is already linked in using !hlist_unhashed(). Driver then > needs to properly initialize the mmu_notifier list with INIT_HLIST_NODE(). A driv

[PATCH] mmu notifiers #v5

2008-01-31 Thread Andrea Arcangeli
will better get start,end range too. XPMEM -> invalidate_range_start/end/invalidate_external_rmap GRU/KVM -> invalidate_pages[s], in the future mprotect_pages optimization etc... Signed-off-by: Andrea Arcangeli Signed-off-by: Christoph Lameter

Re: [PATCH] mmu notifiers #v5

2008-01-31 Thread Andrea Arcangeli
On Thu, Jan 31, 2008 at 12:18:54PM -0800, Christoph Lameter wrote: > pt lock cannot serialize with invalidate_range since it is split. A range > requires locking for a series of ptes not only individual ones. The lock I take already protects up to 512 ptes yes. I call invalidate_pages only across

Re: [PATCH] mmu notifiers #v5

2008-01-31 Thread Andrea Arcangeli
On Thu, Jan 31, 2008 at 03:09:55PM -0800, Christoph Lameter wrote: > On Thu, 31 Jan 2008, Christoph Lameter wrote: > > > > pagefault against the main linux page fault, given we already have all > > > needed serialization out of the PT lock. XPMEM is forced to do that > > > > pt lock cannot serial

Re: mmu_notifier: close hole in fork

2008-01-31 Thread Andrea Arcangeli
erformance critical for GRU, if yes, then I hope my _dual_ approach is by far the best for at least GRU (and KVM of course for the very same reason), and of course it'll fit XPMEM too the moment you add invalidate_range_start/end too. Signed-off-by: Andrea Arcangeli

Re: mmu_notifier: Move mmu_notifier_release up to get rid of the invalidat_all() callback

2008-01-31 Thread Andrea Arcangeli
On Thu, Jan 31, 2008 at 02:21:58PM -0800, Christoph Lameter wrote: > Is this okay for KVM too? ->release isn't implemented at all in KVM, only the list_del generates complications. I think current code could be already safe through the mm_count pin, because KVM relies on the fact anybody pinning

Re: [PATCH] mmu notifiers #v5

2008-02-01 Thread Andrea Arcangeli
On Thu, Jan 31, 2008 at 05:37:21PM -0800, Christoph Lameter wrote: > On Fri, 1 Feb 2008, Andrea Arcangeli wrote: > > > I appreciate the review! I hope my entirely bug free and > > strightforward #v5 will strongly increase the probability of getting > > this in sooner than

Re: [PATCH] mmu notifiers #v5

2008-02-01 Thread Andrea Arcangeli
On Thu, Jan 31, 2008 at 05:44:24PM -0800, Christoph Lameter wrote: > The trouble is that the invalidates are much more expensive if you have to > send theses to remote partitions (XPmem). And its really great if you can > simple tear down everything. Certainly this is a significant improvement >

Re: [patch 1/3] mmu_notifier: Core code

2008-02-02 Thread Andrea Arcangeli
On Thu, Jan 31, 2008 at 07:58:40PM -0800, Christoph Lameter wrote: > Ok. Andrea wanted the same because then he can void the begin callouts. Exactly. I hope the page-pin will avoid me having to serialize the KVM page fault against the start/end critical section. BTW, I wonder if the start/end cri

Re: [patch 0/4] [RFC] EMMU Notifiers V5

2008-02-02 Thread Andrea Arcangeli
On Thu, Jan 31, 2008 at 09:04:39PM -0800, Christoph Lameter wrote: > - Has page tables to track pages whose refcount was elevated(?) but > no reverse maps. Just a correction, rmaps exists or swap couldn't be sane, it's just that it's not built on the page_t because the guest memory is really vir

Re: [PATCH] mmu notifiers #v5

2008-02-02 Thread Andrea Arcangeli
On Fri, Feb 01, 2008 at 11:23:57AM -0800, Christoph Lameter wrote: > Yes so your invalidate_range is still some sort of dysfunctional > optimization? Gazillions of invalidate_page's will have to be executed > when tearing down large memory areas. I don't know if gru can flush the external TLB re

Re: [patch 2/4] mmu_notifier: Callbacks to invalidate address ranges

2008-02-02 Thread Andrea Arcangeli
On Fri, Feb 01, 2008 at 05:35:28PM -0600, Robin Holt wrote: > No, we need a callout when we are becoming more restrictive, but not > when becoming more permissive. I would have to guess that is the case > for any of these callouts. It is for both GRU and XPMEM. I would > expect the same is true

Re: [PATCH] mmu notifiers #v5

2008-02-02 Thread Andrea Arcangeli
On Sat, Feb 02, 2008 at 09:14:57PM -0600, Jack Steiner wrote: > Also, most (but not all) applications that use the GRU do not usually do > anything that requires frequent flushing (fortunately). The GRU is intended > for HPC-like applications. These don't usually do frequent map/unmap > operations

Re: [PATCH] mmu notifiers #v5

2008-02-04 Thread Andrea Arcangeli
On Mon, Feb 04, 2008 at 11:09:01AM -0800, Christoph Lameter wrote: > On Sun, 3 Feb 2008, Andrea Arcangeli wrote: > > > > Right but that pin requires taking a refcount which we cannot do. > > > > GRU can use my patch without the pin. XPMEM obviously can't use my &

Re: [PATCH] mmu notifiers #v5

2008-02-05 Thread Andrea Arcangeli
On Mon, Feb 04, 2008 at 10:11:24PM -0800, Christoph Lameter wrote: > Zero problems only if you find having a single callout for every page > acceptable. So the invalidate_range in your patch is only working invalidate_pages is only a further optimization that was straightforward in some places wh

Re: [PATCH v3 00/10] Introduce huge zero page

2012-10-02 Thread Andrea Arcangeli
it's 200M. > After the patcheset thp-always RSS is 400k too. > > v3: > - fix potential deadlock in refcounting code on preemptive kernel. > - do not mark huge zero page as movable. > - fix typo in comment. > - Reviewed-by tag from Andrea Arcangeli. > v2: > -

Re: [PATCH v2] mm: thp: Set the accessed flag for old pages on access fault.

2012-10-02 Thread Andrea Arcangeli
aults on transparent hugepages which do not result > in a CoW update the access flags for the faulting pmd. > > Cc: Andrea Arcangeli > Cc: Chris Metcalf > Signed-off-by: Steve Capper > Signed-off-by: Will Deacon > --- > > v2: - Use pmd_trans_huge_lock to guard again

Re: [PATCH 6/8] mm: Make transparent huge code not depend upon the details of pgtable_t

2012-10-02 Thread Andrea Arcangeli
Hi Dave, On Tue, Oct 02, 2012 at 06:27:18PM -0400, David Miller wrote: > > The code currently assumes that pgtable_t is a struct page pointer. > > Fix this by pushing pgtable management behind arch helper functions. This should be fixed in -mm already, it's from the s390x support.

Re: [PATCH v3 00/10] Introduce huge zero page

2012-10-02 Thread Andrea Arcangeli
Hi Andrew, On Tue, Oct 02, 2012 at 03:31:48PM -0700, Andrew Morton wrote: > From reading the code, it appears that we initially allocate a huge > page and point the pmd at that. If/when there is a write fault against > that page we then populate the mm with ptes which point at the normal > 4k zer

Re: [PATCH 7/8] mm: thp: Use more portable PMD clearing sequenece in zap_huge_pmd().

2012-10-02 Thread Andrea Arcangeli
d_clear(tlb->mm, addr, pmd); > page = pmd_page(orig_pmd); > tlb_remove_pmd_tlb_entry(tlb, pmd, addr); > > And we properly accomodate TLB flush mechanims like the one described > above. Thanks for the explanation. Reviewed-by: Andrea Arcangeli

Re: [PATCH 0/8] THP support for Sparc64

2012-10-04 Thread Andrea Arcangeli
Hi Dave, On Wed, Oct 03, 2012 at 10:00:27PM -0400, David Miller wrote: > From: Andrew Morton > Date: Tue, 2 Oct 2012 15:55:44 -0700 > > > I had a shot at integrating all this onto the pending stuff in linux-next. > > "mm: Add and use update_mmu_cache_pmd() in transparent huge page code." > > ne

Re: [PATCH 29/33] autonuma: page_autonuma

2012-10-04 Thread Andrea Arcangeli
Hi Christoph, On Thu, Oct 04, 2012 at 02:16:14PM +, Christoph Lameter wrote: > On Thu, 4 Oct 2012, Andrea Arcang

Re: [PATCH 29/33] autonuma: page_autonuma

2012-10-04 Thread Andrea Arcangeli
Hi Christoph, On Thu, Oct 04, 2012 at 06:17:37PM +, Christoph Lameter wrote: > On Thu, 4 Oct 2012, Andrea Arcangeli wrote: > > > So we could drop page_autonuma by creating a CONFIG_SLUB=y dependency > > (AUTONUMA wouldn't be available in the kernel config if SLAB=y, a

Re: [PATCH 29/33] autonuma: page_autonuma

2012-10-05 Thread Andrea Arcangeli
Hi Christoph, On Thu, Oct 04, 2012 at 07:11:51PM +, Christoph Lameter wrote: > I did not say anything like that. Still not convinced that autonuma is > worth doing and that it is beneficial given the complexity it adds to the > kernel. Just wanted to point out that there is a case to be made f

Re: [PATCH 0/8] THP support for Sparc64

2012-10-05 Thread Andrea Arcangeli
Hi Michal, On Fri, Oct 05, 2012 at 11:28:10AM +0200, Michal Hocko wrote: > FWIW there is also a pure -mm (non-rebased) git tree at > http://git.kernel.org/?p=linux/kernel/git/mhocko/mm.git;a=summary > since-3.6 branch. It is based on top of 3.6 with mm patches from > Andrew's tree. I'd still sugg

AutoNUMA27

2012-10-05 Thread Andrea Arcangeli
Hi everyone, because of vger technicalities the AutoNUMA27 submit only reached linux-mm, so I thought of posting a link here too for reference. If you're interested to benchmark AutoNUMA using your preferred workload on NUMA hardware, or even better if you could review the code, you can find the r

Re: [ofa-general] Re: Demand paging for memory regions

2008-02-13 Thread Andrea Arcangeli
Hi Kanoj, On Wed, Feb 13, 2008 at 03:43:17PM -0800, Kanoj Sarcar wrote: > Oh ok, yes, I did see the discussion on this; sorry I > missed it. I do see what notifiers bring to the table > now (without endorsing it :-)). I'm not sure livelocks are really the big issue here. I'm running N 1G VM on

[PATCH] KVM swapping with MMU Notifiers V7

2008-02-16 Thread Andrea Arcangeli
pinned by sptes. The race can materialize if the linux pte is zapped after get_user_pages returns but before the page is mapped by the spte and tracked by rmap. The invalidate_ calls can also likely be optimized further but it's not a fast path so it's not urgent. Signed-off-by: Andrea Ar

Re: [patch 3/6] mmu_notifier: invalidate_page callbacks

2008-02-16 Thread Andrea Arcangeli
On Fri, Feb 15, 2008 at 07:37:36PM -0800, Andrew Morton wrote: > The "|" is obviously deliberate. But no explanation is provided telling us > why we still call the callback if ptep_clear_flush_young() said the page > was recently referenced. People who read your code will want to understand > thi

Re: [PATCH v4 0/8] Avoid cache trashing on clearing huge/gigantic page

2012-09-25 Thread Andrea Arcangeli
Hi Kirill, On Tue, Sep 25, 2012 at 05:27:03PM +0300, Kirill A. Shutemov wrote: > On Fri, Sep 14, 2012 at 07:52:10AM +0200, Ingo Molnar wrote: > > Without repeatable hard numbers such code just gets into the > > kernel and bitrots there as new CPU generations come in - a few > > years down the li

Re: [PATCH] Add per-process flag to control thp

2013-08-28 Thread Andrea Arcangeli
Hi everyone, On Fri, Aug 02, 2013 at 02:46:59PM -0500, Alex Thorlton wrote: > This patch implements functionality to allow processes to disable the use of > transparent hugepages through the prctl syscall. > > We've determined that some jobs perform significantly better with thp > disabled, > an

[PATCH] mm: hugetlb: initialize PG_reserved for tail pages of gigantig compound pages

2013-10-10 Thread Andrea Arcangeli
as already modified in order to set PG_tail so this won't affect the boot time of large memory systems. Reported-by: andy123 Signed-off-by: Andrea Arcangeli --- mm/hugetlb.c | 18 +- 1 file changed, 17 insertions(+), 1 deletion(-) diff --git a/mm/hugetlb.c b/mm/hugetlb.

[PATCH] initialize PG_reserved for tail pages of gigantig compound pages

2013-10-10 Thread Andrea Arcangeli
patch 11feeb498086a3a5907b8148bdf1786a9b18fc55. Enforcing PG_reserved not set for tail pages of hugetlbfs gigantic compound pages sounds safer regardless of commit 11feeb498086a3a5907b8148bdf1786a9b18fc55 to be consistent with the other hugetlbfs page sizes (i.e hugetlbfs page order < MAX_ORDER). Thanks, Andrea Andrea Arca

Re: [PATCH] anon_vmas: Convert the rwsem to an rwlock_t

2013-09-30 Thread Andrea Arcangeli
Hi everyone, On Sat, Sep 28, 2013 at 09:37:39PM +0200, Ingo Molnar wrote: > > * Ingo Molnar wrote: > > > If we do that then I suspect the next step will be queued rwlocks :-/ > > The current rwlock_t implementation is rather primitive by modern > > standards. (We'd probably have killed rwlock

Re: [PATCH] anon_vmas: Convert the rwsem to an rwlock_t

2013-09-30 Thread Andrea Arcangeli
On Mon, Sep 30, 2013 at 09:26:21AM -0700, Linus Torvalds wrote: > On Mon, Sep 30, 2013 at 1:52 AM, Andrea Arcangeli wrote: > > > > Sorry having to break the party but the sleepable locks for anon_vma > > and i_mmap_mutex are now requirement for the "pageable RDMA"

Re: [RFC PATCH 00/19] Foundation for automatic NUMA balancing

2012-11-09 Thread Andrea Arcangeli
Hi Mel, On Tue, Nov 06, 2012 at 09:14:36AM +, Mel Gorman wrote: > This series addresses part of the integration and sharing problem by > implementing a foundation that either the policy for schednuma or autonuma > can be rebased on. The actual policy it implements is a very stupid > greedy pol

Re: Benchmark results: "Enhanced NUMA scheduling with adaptive affinity"

2012-11-16 Thread Andrea Arcangeli
Hi, On Fri, Nov 16, 2012 at 02:14:28PM +, Mel Gorman wrote: > With some shuffling the question on what to consider for merging > becomes > > > 1. TLB optimisation patches 1-3? Patches 1-3 I assume you mean simply reshuffling 33-35 as 1-3. > 2. Stats for migration?

Re: [PATCH] thp: fix huge zero page logic for page with pfn == 0

2013-04-17 Thread Andrea Arcangeli
mm/huge_memory.c | 43 +------ > 1 file changed, 21 insertions(+), 22 deletions(-) Reviewed-by: Andrea Arcangeli

Re: [RFC PATCH] Re: Repeated fork() causes SLAB to grow without bound

2013-06-05 Thread Andrea Arcangeli
On Tue, Jun 04, 2013 at 06:37:25AM -0400, Rik van Riel wrote: > On 06/03/2013 03:50 PM, Daniel Forrest wrote: > > On Tue, Aug 21, 2012 at 11:29:54PM -0400, Rik van Riel wrote: > >> On 08/21/2012 11:20 PM, Michel Lespinasse wrote: > >>> On Mon, Aug 20, 2012 at 02:39:26AM -0700, Michel Lespinasse wro

Re: [patch 3/3] mm: page_alloc: fair zone allocator policy

2013-07-29 Thread Andrea Arcangeli
Hi Johannes, On Fri, Jul 19, 2013 at 04:55:25PM -0400, Johannes Weiner wrote: > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > index af1d956b..d938b67 100644 > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -1879,6 +1879,14 @@ zonelist_scan: > if (alloc_flags & ALLOC_NO_WATERMA

Re: [patch 3/3] mm: page_alloc: fair zone allocator policy

2013-08-01 Thread Andrea Arcangeli
On Thu, Aug 01, 2013 at 12:31:34AM -0400, Rik van Riel wrote: > On 07/31/2013 10:56 PM, Minchan Kim wrote: > > > Yes, it's not really slow path because it could return to normal status > > without calling significant slow functions by reset batchcount of > > prepare_slowpath. > > > > I think it's

Re: [patch 3/3] mm: page_alloc: fair zone allocator policy

2013-08-01 Thread Andrea Arcangeli
On Thu, Aug 01, 2013 at 03:58:23PM -0400, Johannes Weiner wrote: > But we might be able to get away with a small error. The idea is that there's a small error anyway, because multiple CPUs can reset it at the same time, while another CPU is decreasing it, so the decrease sometime may get lost rega

Re: [patch v2 0/3] mm: improve page aging fairness between zones/nodes

2013-08-02 Thread Andrea Arcangeli
NUMA node step on each other is going to be unmeasurable. ACK the whole series. Signed-off-by: Andrea Arcangeli Thanks, Andrea

Re: [patch v2 3/3] mm: page_alloc: fair zone allocator policy

2013-08-05 Thread Andrea Arcangeli
On Mon, Aug 05, 2013 at 06:34:56PM +0800, Wanpeng Li wrote: > Why round robin allocator don't consume ZONE_DMA? I guess lowmem reserve reserves it all, 4GB/256(ratio)=16MB. The only way to relax it would be 1) to account depending on memblock types and allow only the movable ones to bypass the lo
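The 4GB/256(ratio)=16MB figure in the reply above is a plain division. A minimal sketch of that arithmetic, reading "4GB" as 4 GiB and taking 256 as the ratio quoted in the mail; `lowmem_reserve_bytes` is a hypothetical helper, not the kernel's own calculation, which works in pages and per zone pair:

```c
#include <assert.h>

/* Memory held back in a lower zone to protect it from allocations
 * that could have been served by a higher zone of the given size. */
static unsigned long long lowmem_reserve_bytes(unsigned long long higher_zone_bytes,
                                               unsigned int ratio)
{
    assert(ratio != 0);
    return higher_zone_bytes / ratio;
}
```

With 4 GiB above and a ratio of 256 this yields 16 MiB, which is the whole of a 16 MiB ZONE_DMA and would explain why nothing is left for the round-robin allocator.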

Re: x86/mm/pageattr: Code without effect?

2013-04-06 Thread Andrea Arcangeli
> say what exactly the effects are, but maybe you do (or you could > > explain to me why I am wrong :)). > > > > commit a8aed3e0752b4beb2e37cbed6df69faae88268da > > Author: Andrea Arcangeli > > Date: Fri Feb 22 15:11:51 2013 -0800 > > > > x86/mm/pa

Re: x86/mm/pageattr: Code without effect?

2013-04-08 Thread Andrea Arcangeli
On Mon, Apr 08, 2013 at 03:53:31PM +0100, Andy Whitcroft wrote: > On Sat, Apr 06, 2013 at 04:58:04PM +0200, Andrea Arcangeli wrote: > > > You're right, so this location clearly didn't trigger the problem so I > > didn't notice the noop here. I only exercised the

[PATCH] mm: pageattr: convert noop to functional fix

2013-04-10 Thread Andrea Arcangeli
it to the right variable in the new location. Reported-by: Stefan Bader Signed-off-by: Andrea Arcangeli --- arch/x86/mm/pageattr.c | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c index 091934e..7896f71 100644 --- a/a

[PATCH 2/4] mm: rmap preparation for remap_anon_pages

2013-05-06 Thread Andrea Arcangeli
p_anon_pages runs. Signed-off-by: Andrea Arcangeli --- mm/huge_memory.c | 24 mm/rmap.c| 9 + 2 files changed, 29 insertions(+), 4 deletions(-) diff --git a/mm/huge_memory.c b/mm/huge_memory.c index f46aad1..9a2e235 100644 --- a/mm/huge_memory.c +++ b/mm

[PATCH 1/4] mm: madvise MADV_USERFAULT

2013-05-06 Thread Andrea Arcangeli
exclusive if set. Signed-off-by: Andrea Arcangeli --- arch/alpha/include/uapi/asm/mman.h | 3 +++ arch/mips/include/uapi/asm/mman.h | 3 +++ arch/parisc/include/uapi/asm/mman.h| 3 +++ arch/xtensa/include/uapi/asm/mman.h| 3 +++ include/linux/mm.h | 1

[PATCH 4/4] mm: sys_remap_anon_pages

2013-05-06 Thread Andrea Arcangeli
remap_anon_pages only once for each range received). So remap_anon_pages in the above testcase runs within the signal handler, but in production postcopy, it can run in a different thread. Signed-off-by: Andrea Arcangeli --- arch/x86/syscalls/syscall_32.tbl | 1 + arch/x86/syscalls/

[PATCH 3/4] mm: swp_entry_swapcount

2013-05-06 Thread Andrea Arcangeli
in some anon_vma. Signed-off-by: Andrea Arcangeli --- include/linux/swap.h | 6 ++ mm/swapfile.c| 13 + 2 files changed, 19 insertions(+) diff --git a/include/linux/swap.h b/include/linux/swap.h index 1701ce4..0ea2a56 100644 --- a/include/linux/swap.h +++ b/include/linux

[PATCH 0/4] madvise(MADV_USERFAULT) & sys_remap_anon_pages()

2013-05-06 Thread Andrea Arcangeli
nd to be strict and be sure it knows what it is doing (otherwise it should use mremap in the first place?). Comments welcome, thanks! Andrea Andrea Arcangeli (4): mm: madvise MADV_USERFAULT mm: rmap preparation for remap_anon_pages mm: swp_entry_swapcount mm: sys_remap_anon_pages arch/alp

Re: [PATCH 4/4] mm: sys_remap_anon_pages

2013-05-06 Thread Andrea Arcangeli
On Mon, May 06, 2013 at 09:57:01PM +0200, Andrea Arcangeli wrote: > === > > static unsigned char *c, *tmp; > > void userfault_sighandler(int signum, siginfo_t *info, void *ctx) oops, the hash of the test program got cut... so I append it below which is nicer without lead

Re: [Qemu-devel] [PATCH 1/4] mm: madvise MADV_USERFAULT

2013-05-07 Thread Andrea Arcangeli
Hi Andrew, On Tue, May 07, 2013 at 01:16:30PM +0200, Andrew Jones wrote: > On Mon, May 06, 2013 at 09:56:58PM +0200, Andrea Arcangeli wrote: > > @@ -405,6 +420,7 @@ madvise_behavior_valid(int behavior) > > case MADV_HUGEPAGE: > > case MADV_NOHUGEPAGE: >

Re: [PATCH 0/4] madvise(MADV_USERFAULT) & sys_remap_anon_pages()

2013-05-07 Thread Andrea Arcangeli
Hi Isaku, On Tue, May 07, 2013 at 07:07:40PM +0900, Isaku Yamahata wrote: > On Mon, May 06, 2013 at 09:56:57PM +0200, Andrea Arcangeli wrote: > > Hello everyone, > > > > this is a patchset to implement two new kernel features: > > MADV_USERFAULT and remap_anon_pages.

Re: [Qemu-devel] [PATCH 0/4] madvise(MADV_USERFAULT) & sys_remap_anon_pages()

2013-05-07 Thread Andrea Arcangeli
On Tue, May 07, 2013 at 01:38:10PM +0200, Andrew Jones wrote: > What about instead of adding a new syscall (remap_anon_pages) to > instead extend mremap with new flags giving it a strict mode? I actually thought about this and it's a very interesting argument. When I thought about it, I felt the

Re: [tip:x86/urgent] x86/mm/cpa: Convert noop to functional fix

2013-04-11 Thread Andrea Arcangeli
Hi, On Thu, Apr 11, 2013 at 02:29:18PM +0200, Ingo Molnar wrote: > > > * tip-bot for Andrea Arcangeli wrote: > > > Commit-ID: f76cfa3c2496c462b5bc01bd0c9340c2715b73ca > > Gitweb: > > http://git.kernel.org/tip/f76cfa3c2496c462b5bc01bd0c9340c2715b73ca >

[PATCH] cpa: pageattr-test: fix false positive in CPA self test

2013-04-11 Thread Andrea Arcangeli
If the pmd is not present, _PAGE_PSE will not be set anymore. Fix the false positive. Reported-by: Ingo Molnar Signed-off-by: Andrea Arcangeli --- arch/x86/mm/pageattr-test.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/x86/mm/pageattr-test.c b/arch/x86/mm/pageattr

Re: thp and memory barrier assumptions

2012-08-03 Thread Andrea Arcangeli
ely. This can't affect x86 where even a locked bitop is the equivalent of a full memory barrier. > Also, what is that barrier() in handle_mm_fault() doing? And why doesn't > it have a comment explaining that? I added the docs below: = From ad51771a2c3fa697fa0267edda23b48

Re: [RFC] page-table walkers vs memory order

2012-08-04 Thread Andrea Arcangeli
On Tue, Jul 24, 2012 at 02:51:05PM -0700, Hugh Dickins wrote: > Since then, I think THP has made the rules more complicated; but I > believe Andrea paid a great deal of attention to that kind of issue. There were many issues, one unexpected was 1a5a9906d4e8d1976b701f889d8f35d54b928f25. Keep in mi

Re: [RFC] page-table walkers vs memory order

2012-08-04 Thread Andrea Arcangeli
On Sat, Aug 04, 2012 at 03:02:45PM -0700, Paul E. McKenney wrote: > OK, I'll bite. ;-) :)) > The most sane way for this to happen is with feedback-driven techniques > involving profiling, similar to what is done for basic-block reordering > or branch prediction. The idea is that you compile the

Re: mm: kernel BUG at mm/memory.c:1230

2012-08-21 Thread Andrea Arcangeli
tested yet. Reviews welcome. Especially if you could test it again with trinity over the mbind syscall it'd be wonderful. Thanks, Andrea === From 59af0d4348eb07087097e310f60422b994dd3a2c Mon Sep 17 00:00:00 2001 From: Andrea Arcangeli Date: Tue, 21 Aug 2012 19:32:23 +0200 Subject: [PATCH] thp: make pmd_pres

[PATCH 14/36] autonuma: call autonuma_setup_new_exec()

2012-08-22 Thread Andrea Arcangeli
This resets all per-thread and per-process statistics across exec syscalls or after kernel threads detach from the mm. The past statistical NUMA information is unlikely to be relevant for the future in these cases. Acked-by: Rik van Riel Signed-off-by: Andrea Arcangeli --- fs/exec.c

[PATCH 22/36] autonuma: make khugepaged pte_numa aware

2012-08-22 Thread Andrea Arcangeli
and make it tunable with sysfs too. Signed-off-by: Andrea Arcangeli --- mm/huge_memory.c | 33 +++-- 1 files changed, 31 insertions(+), 2 deletions(-) diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 08fd33c..a65590f 100644 --- a/mm/huge_memory.c +++ b/mm

[PATCH 21/36] autonuma: call autonuma_split_huge_page()

2012-08-22 Thread Andrea Arcangeli
This is needed to make sure the tail pages are also queued into the migration queues of knuma_migrated across a transparent hugepage split. Acked-by: Rik van Riel Signed-off-by: Andrea Arcangeli --- mm/huge_memory.c |2 ++ 1 files changed, 2 insertions(+), 0 deletions(-) diff --git a/mm

[PATCH 11/36] autonuma: add page structure fields

2012-08-22 Thread Andrea Arcangeli
allocated page_autonuma of 32 bytes per page (only allocated if booted on NUMA hardware, unless "noautonuma" is passed as parameter to the kernel at boot). Yet another later patch introduces the autonuma_list and reduces the size of the page_autonuma from 32 to 12 bytes. Signed-off-
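To put the 32-byte and 12-byte per-page figures above in proportion, the total metadata cost can be worked out per amount of RAM. A minimal sketch, assuming 4 KiB pages; `per_page_metadata_bytes` and `ASSUMED_PAGE_SIZE` are hypothetical names for illustration only:

```c
#include <assert.h>

#define ASSUMED_PAGE_SIZE 4096ULL   /* assumed 4 KiB page */

/* Total bytes of per-page metadata needed to cover `ram_bytes`
 * of memory at `bytes_per_page` bytes of metadata per page. */
static unsigned long long per_page_metadata_bytes(unsigned long long ram_bytes,
                                                  unsigned int bytes_per_page)
{
    assert(ram_bytes % ASSUMED_PAGE_SIZE == 0);
    return (ram_bytes / ASSUMED_PAGE_SIZE) * bytes_per_page;
}
```

Under that assumption, 1 GiB of RAM costs 8 MiB of metadata at 32 bytes per page and 3 MiB at 12 bytes per page.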

[PATCH 17/36] autonuma: prevent select_task_rq_fair to return -1

2012-08-22 Thread Andrea Arcangeli
ned-off-by: Andrea Arcangeli --- kernel/sched/fair.c | 11 +++ 1 files changed, 11 insertions(+), 0 deletions(-) diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 42a88fa..677b99e 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -2794,6 +2794,17 @@ select_ta

[PATCH 35/36] autonuma: add knuma_migrated/allow_first_fault in sysfs

2012-08-22 Thread Andrea Arcangeli
it reduces some initial thrashing in case of NUMA false sharing. Signed-off-by: Andrea Arcangeli --- include/linux/autonuma_flags.h | 20 mm/autonuma.c |7 +-- 2 files changed, 25 insertions(+), 2 deletions(-) diff --git a/include/linux

[PATCH 13/36] autonuma: autonuma_enter/exit

2012-08-22 Thread Andrea Arcangeli
low the NUMA hinting page faults to start. All other actions follow after that. If knuma_scand doesn't run, AutoNUMA is fully bypassed. If knuma_scand is stopped, soon all other AutoNUMA gears will settle down too. Acked-by: Rik van Riel Signed-off-by: Andrea Arcangeli --- kernel/fork.c |

[PATCH 18/36] autonuma: teach CFS about autonuma affinity

2012-08-22 Thread Andrea Arcangeli
to the original value of -1 and task_autonuma_cpu will always return true in that case. Includes fixes from Hillf Danton. Signed-off-by: Andrea Arcangeli --- kernel/sched/fair.c | 71 ++ 1 files changed, 59 insertions(+), 12 deletions(-) dif

[PATCH 07/36] autonuma: mm_autonuma and task_autonuma data structures

2012-08-22 Thread Andrea Arcangeli
Define the two data structures that collect the per-process (in the mm) and per-thread (in the task_struct) statistical information that is the input of the CPU-follows-memory algorithms in the NUMA scheduler. Signed-off-by: Andrea Arcangeli --- include/linux/autonuma_types.h | 107

[PATCH 01/36] autonuma: make set_pmd_at always available

2012-08-22 Thread Andrea Arcangeli
set_pmd_at() will also be used for the knuma_scand/pmd = 1 (default) mode even when TRANSPARENT_HUGEPAGE=n. Make it available so the build won't fail. Acked-by: Rik van Riel Signed-off-by: Andrea Arcangeli --- arch/x86/include/asm/paravirt.h |2 -- 1 files changed, 0 insertions(

[PATCH 15/36] autonuma: alloc/free/init task_autonuma

2012-08-22 Thread Andrea Arcangeli
NUMA hardware. So the non NUMA hardware only pays the memory of a pointer in the kernel stack (which remains NULL at all times in that case). If the kernel is compiled with CONFIG_AUTONUMA=n, not even the pointer is allocated on the kernel stack of course. Signed-off-by: Andrea Arcangeli

[PATCH 32/36] autonuma: boost khugepaged scanning rate

2012-08-22 Thread Andrea Arcangeli
Until THP native migration is implemented, it's safer to boost the khugepaged scanning rate because all memory migrations split the hugepages. So the regular rate of scanning becomes too low when lots of memory is migrated. Signed-off-by: Andrea Arcangeli --- mm/huge_memory.c |6

[PATCH 27/36] autonuma: add CONFIG_AUTONUMA and CONFIG_AUTONUMA_DEFAULT_ENABLED

2012-08-22 Thread Andrea Arcangeli
Add the config options to allow building the kernel with AutoNUMA. If CONFIG_AUTONUMA_DEFAULT_ENABLED is "=y", then /sys/kernel/mm/autonuma/enabled will be equal to 1, and AutoNUMA will be enabled automatically at boot. Signed-off-by: Andrea Arcangeli --- arch/Kconfig |3 +++

[PATCH 23/36] autonuma: retain page last_nid information in khugepaged

2012-08-22 Thread Andrea Arcangeli
When pages are collapsed try to keep the last_nid information from one of the original pages. Signed-off-by: Andrea Arcangeli --- mm/huge_memory.c | 14 ++ 1 files changed, 14 insertions(+), 0 deletions(-) diff --git a/mm/huge_memory.c b/mm/huge_memory.c index a65590f..0d2a12f

[PATCH 28/36] autonuma: page_autonuma

2012-08-22 Thread Andrea Arcangeli
ted if the kernel is booted on real NUMA hardware and noautonuma is not passed as a parameter to the kernel. Signed-off-by: Andrea Arcangeli --- include/linux/autonuma.h | 18 +++- include/linux/autonuma_types.h | 55 + include/linux/mm_types.h | 26 in

[PATCH 10/36] autonuma: CPU follows memory algorithm

2012-08-22 Thread Andrea Arcangeli
Code includes fixes and cleanups from Hillf Danton. Signed-off-by: Andrea Arcangeli --- include/linux/autonuma_sched.h | 50 include/linux/mm_types.h |5 + include/linux/sched.h |3 + kernel/sched/core.c|1 + kernel/sched/fair.c|4 +

[PATCH 00/36] AutoNUMA24

2012-08-22 Thread Andrea Arcangeli
e able to wait on process migration (avoid _nowait), but most of the time it does nothing at all. Changelog from alpha11 to alpha13: o autonuma_balance optimization (take the fast path when process is in the preferred NUMA node) TODO: o THP native migration (orthogonal and also needed for c

[PATCH 05/36] autonuma: teach gup_fast about pmd_numa

2012-08-22 Thread Andrea Arcangeli
y: Rik van Riel Signed-off-by: Andrea Arcangeli --- arch/x86/mm/gup.c | 13 - 1 files changed, 12 insertions(+), 1 deletions(-) diff --git a/arch/x86/mm/gup.c b/arch/x86/mm/gup.c index dd74e46..02c5ec5 100644 --- a/arch/x86/mm/gup.c +++ b/arch/x86/mm/gup.c @@ -163,8 +163,19 @@

[PATCH 31/36] autonuma: shrink the per-page page_autonuma struct size

2012-08-22 Thread Andrea Arcangeli
space reserved for now). This means the max RAM configuration fully supported by AutoNUMA becomes AUTONUMA_LIST_MAX_PFN_OFFSET multiplied by 32767 nodes multiplied by the PAGE_SIZE (assume 4096 here, but for some archs it's bigger). 4096*32767*(0xffffffff-3)>>(10*5) = 511 PetaBytes. Si
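
The 511 PetaBytes figure in the snippet above can be checked with a few lines of C. This is only a sketch: the helper name autonuma_max_ram_pib() is made up for illustration and is not part of the patch, and it assumes the hex literal in the formula is the 32-bit maximum 0xffffffff, which is the value consistent with the stated result.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not from the patch): upper bound on supported RAM,
 * in units of 2^50 bytes, for a given page size, node count and maximum
 * PFN offset. Mirrors the formula quoted in the commit message.
 */
uint64_t autonuma_max_ram_pib(uint64_t page_size, uint64_t nodes,
			      uint64_t max_pfn_offset)
{
	/*
	 * For the values in the text the product is just under 2^59,
	 * so it fits comfortably in 64 bits before the shift.
	 */
	return (page_size * nodes * max_pfn_offset) >> (10 * 5);
}
```

Calling autonuma_max_ram_pib(4096, 32767, 0xffffffffULL - 3) reproduces the 511 from the commit message. A larger PAGE_SIZE scales the limit linearly, which is what the "for some archs it's bigger" remark refers to.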

[PATCH 24/36] autonuma: numa hinting page faults entry points

2012-08-22 Thread Andrea Arcangeli
This is where the numa hinting page faults are detected and are passed over to the AutoNUMA core logic. Signed-off-by: Andrea Arcangeli --- include/linux/huge_mm.h |2 ++ mm/huge_memory.c| 18 ++ mm/memory.c | 31 +++ 3

[PATCH 12/36] autonuma: knuma_migrated per NUMA node queues

2012-08-22 Thread Andrea Arcangeli
the memory in a round robin fashion from all remote nodes to the daemon's local node. The head that belongs to the local node that knuma_migrated runs on, for now must be empty and it's not being used. Signed-off-by: Andrea Arcangeli --- include/linux/mmzone.h | 18
