[PATCH 26/36] autonuma: link mm/autonuma.o and kernel/sched/numa.o

2012-08-22 Thread Andrea Arcangeli
Link the AutoNUMA core and scheduler object files into the kernel if CONFIG_AUTONUMA=y. Signed-off-by: Andrea Arcangeli --- kernel/sched/Makefile | 1 + mm/Makefile | 1 + 2 files changed, 2 insertions(+), 0 deletions(-) diff --git a/kernel/sched/Makefile b/kernel/sched/Makefile

[PATCH 02/36] autonuma: export is_vma_temporary_stack() even if CONFIG_TRANSPARENT_HUGEPAGE=n

2012-08-22 Thread Andrea Arcangeli
is_vma_temporary_stack() is needed by mm/autonuma.c too, and without this the build breaks with CONFIG_TRANSPARENT_HUGEPAGE=n. Reported-by: Petr Holasek Acked-by: Rik van Riel Signed-off-by: Andrea Arcangeli --- include/linux/huge_mm.h | 4 ++-- 1 files changed, 2 insertions(+), 2

[PATCH 03/36] autonuma: define _PAGE_NUMA_PTE and _PAGE_NUMA_PMD

2012-08-22 Thread Andrea Arcangeli
ptes established by ioremap, never on pmds, so there's no risk of collision with Xen. Signed-off-by: Andrea Arcangeli --- arch/x86/include/asm/pgtable_types.h | 28 1 files changed, 28 insertions(+), 0 deletions(-) diff --git a/arch/x86/include/asm/pgtable_types.h

[PATCH 08/36] autonuma: define the autonuma flags

2012-08-22 Thread Andrea Arcangeli
These flags are the ones tweaked through sysfs; they control the behavior of autonuma, from enabling or disabling it to selecting various runtime options. Signed-off-by: Andrea Arcangeli --- include/linux/autonuma_flags.h | 129 1 files changed, 129

[PATCH 19/36] autonuma: memory follows CPU algorithm and task/mm_autonuma stats collection

2012-08-22 Thread Andrea Arcangeli
udes some fixes from Hillf Danton. Math documentation on autonuma_last_nid in the header of last_nid_set() reworked from sched-numa code by Peter Zijlstra. Signed-off-by: Andrea Arcangeli Signed-off-by: Hillf Danton --- mm/autonuma.c | 1619 +++

[PATCH 33/36] autonuma: powerpc port

2012-08-22 Thread Andrea Arcangeli
be observed/verified Signed-off-by: Vaidyanathan Srinivasan Signed-off-by: Andrea Arcangeli --- arch/powerpc/include/asm/pgtable.h | 48 - arch/powerpc/include/asm/pte-hash64-64k.h | 4 ++- arch/powerpc/mm/numa.c | 3 +- mm/autonuma.c

[PATCH 34/36] autonuma: make the AUTONUMA_SCAN_PMD_FLAG conditional to CONFIG_HAVE_ARCH_AUTONUMA_SCAN_PMD

2012-08-22 Thread Andrea Arcangeli
Remove the sysfs entry /sys/kernel/mm/autonuma/knuma_scand/pmd and force the knuma_scand pmd mode off if CONFIG_HAVE_ARCH_AUTONUMA_SCAN_PMD is not set by the architecture. Enable AutoNUMA for PPC64. Signed-off-by: Andrea Arcangeli --- arch/Kconfig | 3 +++ arch/powerpc/Kconfig

[PATCH 06/36] autonuma: introduce kthread_bind_node()

2012-08-22 Thread Andrea Arcangeli
This function makes it easy to bind the per-node knuma_migrated threads to their respective NUMA nodes. Those threads take memory from the other nodes (in round-robin order, with an incoming queue for each remote node) and move that memory to their local node. Signed-off-by: Andrea Arcangeli
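A minimal userspace sketch of the round-robin draining described above, assuming one bounded incoming queue per remote source node for each destination node. The struct, the queue capacity and the function names are hypothetical; the real knuma_migrated threads work on page lists under locks, which this model omits.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_NODES 4   /* assumption: small fixed node count for the model */
#define QUEUE_CAP 8  /* assumption: bounded queue per source node */

/* Hypothetical incoming queues of one destination node, one per source node. */
struct incoming {
	int pages[NR_NODES][QUEUE_CAP]; /* page ids waiting to move from node src */
	int head[NR_NODES];
	int tail[NR_NODES];
	int next_src;                   /* round-robin cursor over source nodes */
};

static bool enqueue(struct incoming *in, int src, int page)
{
	if (in->tail[src] - in->head[src] == QUEUE_CAP)
		return false;           /* queue full */
	in->pages[src][in->tail[src] % QUEUE_CAP] = page;
	in->tail[src]++;
	return true;
}

/* Take one page from the next non-empty remote queue; -1 if all are empty. */
static int dequeue_round_robin(struct incoming *in, int local)
{
	for (int tries = 0; tries < NR_NODES; tries++) {
		int src = in->next_src;

		in->next_src = (in->next_src + 1) % NR_NODES;
		if (src == local || in->head[src] == in->tail[src])
			continue;       /* skip our own node and empty queues */
		int page = in->pages[src][in->head[src] % QUEUE_CAP];
		in->head[src]++;
		return page;
	}
	return -1;
}
```

Taking one page per remote node per turn keeps a single busy source node from starving the others.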

[PATCH 09/36] autonuma: core autonuma.h header

2012-08-22 Thread Andrea Arcangeli
Header that defines the generic AutoNUMA specific functions. All functions are defined unconditionally, but are only linked into the kernel if CONFIG_AUTONUMA=y. When CONFIG_AUTONUMA=n, their call sites are optimized away at build time (or the kernel wouldn't link). Signed-off-by: A

[PATCH 16/36] autonuma: alloc/free/init mm_autonuma

2012-08-22 Thread Andrea Arcangeli
n not NUMA hardware the memory cost is reduced to one pointer per mm. To get rid of the pointer in each mm, the kernel can be compiled with CONFIG_AUTONUMA=n. Signed-off-by: Andrea Arcangeli --- kernel/fork.c | 7 +++ 1 files changed, 7 insertions(+), 0 deletions(-) diff --git a/kerne

[PATCH 36/36] autonuma: add mm_autonuma working set estimation

2012-08-22 Thread Andrea Arcangeli
are never used. Signed-off-by: Andrea Arcangeli --- include/linux/autonuma_flags.h | 25 ++--- mm/autonuma.c | 25 + 2 files changed, 47 insertions(+), 3 deletions(-) diff --git a/include/linux/autonuma_flags.h b/include/linux

[PATCH 20/36] autonuma: default mempolicy follow AutoNUMA

2012-08-22 Thread Andrea Arcangeli
If a task_selected_nid has already been selected for the task, try to allocate memory from it even if it's temporarily not the local node. Chances are it's where most of its memory is already located and where it will run in the future. Acked-by: Rik van Riel Signed-off-by: Andrea
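The node choice described above reduces to a simple preference rule. This sketch assumes that -1 (the value of the kernel's NUMA_NO_NODE) means no node has been selected yet. The function name is hypothetical, and the sketch does not model what happens when the preferred node is short of memory.

```c
#include <assert.h>

#define NUMA_NO_NODE (-1) /* "no task_selected_nid chosen yet" */

/* Prefer the node AutoNUMA selected for the task; otherwise stay local. */
static int pick_alloc_nid(int task_selected_nid, int local_nid)
{
	if (task_selected_nid != NUMA_NO_NODE)
		return task_selected_nid;
	return local_nid;
}
```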

[PATCH 29/36] autonuma: autonuma_migrate_head[0] dynamic size

2012-08-22 Thread Andrea Arcangeli
Reduce the autonuma_migrate_head array entries from MAX_NUMNODES to num_possible_nodes() or zero if autonuma is not possible. Signed-off-by: Andrea Arcangeli --- arch/x86/mm/numa.c | 6 -- arch/x86/mm/numa_32.c | 3 ++- include/linux/memory_hotplug.h | 3
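Sizing the array at runtime typically means a trailing array allocated with room for num_possible_nodes() entries, or for none when AutoNUMA is not possible. In this sketch the struct layout and helpers are hypothetical and list_head is reduced to a pair of pointers. The "[0]" in the subject suggests a GNU zero-length array, and the sketch uses the C99 flexible array member instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Minimal stand-in for the kernel's struct list_head. */
struct list_head { struct list_head *next, *prev; };

/* Hypothetical per-node data with a runtime-sized trailing array. */
struct node_data {
	int nid;
	struct list_head migrate_head[]; /* flexible array member */
};

static size_t node_data_size(int nr_possible_nodes, int autonuma_possible)
{
	int n = autonuma_possible ? nr_possible_nodes : 0;

	return sizeof(struct node_data) + (size_t)n * sizeof(struct list_head);
}

static struct node_data *alloc_node_data(int nid, int nr_possible_nodes,
					 int autonuma_possible)
{
	struct node_data *nd = calloc(1, node_data_size(nr_possible_nodes,
							 autonuma_possible));
	if (!nd)
		return NULL;
	nd->nid = nid;
	if (autonuma_possible)
		for (int i = 0; i < nr_possible_nodes; i++) {
			/* empty circular list: head points at itself */
			nd->migrate_head[i].next = &nd->migrate_head[i];
			nd->migrate_head[i].prev = &nd->migrate_head[i];
		}
	return nd;
}
```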

[PATCH 25/36] autonuma: reset autonuma page data when pages are freed

2012-08-22 Thread Andrea Arcangeli
When pages are freed, abort any pending migration. If knuma_migrated arrives first, it will notice because get_page_unless_zero would fail. You can safely ignore the #ifdef because a later patch (page_autonuma) clears it. Signed-off-by: Andrea Arcangeli --- mm/page_alloc.c | 4 1 files
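The race above hinges on get_page_unless_zero() taking a reference only while the page refcount is still non-zero. Here that contract is modeled with a C11 compare-and-swap loop. The page_model struct and function name are hypothetical stand-ins for struct page and its refcount.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a page and its reference count. */
struct page_model { atomic_int count; };

/* Increment the count unless it is zero. A freed page (count 0) can
 * never be resurrected, so a late migration attempt simply fails. */
static bool get_page_unless_zero_model(struct page_model *p)
{
	int old = atomic_load(&p->count);

	while (old != 0) {
		if (atomic_compare_exchange_weak(&p->count, &old, old + 1))
			return true;
		/* a failed CAS reloaded old; loop and re-check for zero */
	}
	return false;
}
```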

[PATCH 30/36] autonuma: bugcheck page_autonuma fields on newly allocated pages

2012-08-22 Thread Andrea Arcangeli
Debug tweak. Signed-off-by: Andrea Arcangeli --- include/linux/autonuma.h | 19 +++ mm/page_alloc.c | 3 ++- 2 files changed, 21 insertions(+), 1 deletions(-) diff --git a/include/linux/autonuma.h b/include/linux/autonuma.h index 1d87ecc..8a779e0 100644 --- a

[PATCH 04/36] autonuma: pte_numa() and pmd_numa()

2012-08-22 Thread Andrea Arcangeli
bly add pages to migrate queues. They are extremely quick, absolutely non-blocking and do not allocate memory. The generic implementation is used when CONFIG_AUTONUMA=n. Acked-by: Rik van Riel Signed-off-by: Andrea Arcangeli --- arch/x86/include/asm/pgtable.h |
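pte_numa() and pmd_numa() can be this cheap because they are pure bit tests on the entry. This sketch assumes an encoding in which a NUMA hinting entry has a dedicated NUMA bit set and the hardware present bit clear, so the next access faults. The bit positions and helper names are hypothetical and real values are arch-specific.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit layout for the model. */
#define MODEL_PAGE_PRESENT (1u << 0)
#define MODEL_PAGE_NUMA    (1u << 8)

typedef uint64_t pte_model_t;

/* NUMA bit set and present bit clear: one AND plus one compare,
 * no locks, no allocation. */
static bool pte_numa_model(pte_model_t pte)
{
	return (pte & (MODEL_PAGE_NUMA | MODEL_PAGE_PRESENT)) == MODEL_PAGE_NUMA;
}

/* Turn a normal pte into a NUMA hinting pte, keeping the pfn bits. */
static pte_model_t pte_mknuma_model(pte_model_t pte)
{
	return (pte & ~(pte_model_t)MODEL_PAGE_PRESENT) | MODEL_PAGE_NUMA;
}

/* Restore the pte after the hinting fault has been handled. */
static pte_model_t pte_mknonnuma_model(pte_model_t pte)
{
	return (pte & ~(pte_model_t)MODEL_PAGE_NUMA) | MODEL_PAGE_PRESENT;
}
```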

Re: [PATCH] mm: mmu_notifier: fix inconsistent memory between secondary MMU and host

2012-08-22 Thread Andrea Arcangeli
On Wed, Aug 22, 2012 at 02:03:41PM +0800, Xiao Guangrong wrote: > On 08/21/2012 11:06 PM, Andrea Arcangeli wrote: > > CPU0 CPU1 > > oldpage[1] == 0 (both guest & host) > > oldpage[0] = 1 > > trigg

Re: [PATCH] mm: mmu_notifier: fix inconsistent memory between secondary MMU and host

2012-08-22 Thread Andrea Arcangeli
On Wed, Aug 22, 2012 at 11:51:17AM +0800, Xiao Guangrong wrote: > Hmm, in KSM code, i found this code in replace_page: > > set_pte_at_notify(mm, addr, ptep, mk_pte(kpage, vma->vm_page_prot)); > > It is possible to establish a writable pte, no? Hugh already answered this thanks. Further details o

Re: [PATCH] mm: mmu_notifier: fix inconsistent memory between secondary MMU and host

2012-08-22 Thread Andrea Arcangeli
Hi Andrew, On Wed, Aug 22, 2012 at 12:15:35PM -0700, Andrew Morton wrote: > On Wed, 22 Aug 2012 18:29:55 +0200 > Andrea Arcangeli wrote: > > > On Wed, Aug 22, 2012 at 02:03:41PM +0800, Xiao Guangrong wrote: > > > On 08/21/2012 11:06 PM, Andrea Arcan

Re: [PATCH] mm: mmu_notifier: fix inconsistent memory between secondary MMU and host

2012-08-22 Thread Andrea Arcangeli
On Wed, Aug 22, 2012 at 12:58:05PM -0700, Andrew Morton wrote: > If you can suggest some text I'll type it in right now. Ok ;), I tried below: This is safe to start by updating the secondary MMUs, because the relevant primary MMU pte invalidate must have already happened with a ptep_clear_flush b

Re: [PATCH 19/36] autonuma: memory follows CPU algorithm and task/mm_autonuma stats collection

2012-08-22 Thread Andrea Arcangeli
Hi Andi, On Wed, Aug 22, 2012 at 01:19:04PM -0700, Andi Kleen wrote: > Andrea Arcangeli writes: > > > +/* > > + * In this function we build a temporal CPU_node<->page relation by > > + * using a two-stage autonuma_last_nid filter to remove short/unlikely > >
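The two-stage filter can be sketched as follows: remember the node of the previous hinting fault on each page, and act only when the current fault comes from the same node. The struct and function names are hypothetical, and the excerpt does not show whether last_nid_set() has exactly this shape. If one node accounts for a fraction p of a page's accesses and the samples are independent, two consecutive faults agree on that node with probability p², which suppresses short-lived or unlikely relations.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-page state: node of the last NUMA hinting fault. */
struct page_numa_model { int last_nid; };

/* Record this fault's node; allow migration only on a repeat. */
static bool should_migrate(struct page_numa_model *pg, int this_nid)
{
	int last = pg->last_nid;

	pg->last_nid = this_nid;
	return last == this_nid;
}

/* Chance that two independent samples both land on a node with share p. */
static double two_stage_pass_probability(double p)
{
	return p * p;
}
```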

Re: [PATCH 00/36] AutoNUMA24

2012-08-22 Thread Andrea Arcangeli
On Wed, Aug 22, 2012 at 11:40:48PM +0200, Ingo Molnar wrote: > > * Rik van Riel wrote: > > > On 08/22/2012 10:58 AM, Andrea Arcangeli wrote: > > >Hello everyone, > > > > > >Before the Kernel Summit, I think it's a good idea to post a new > >

Re: [PATCH 33/36] autonuma: powerpc port

2012-08-22 Thread Andrea Arcangeli
On Thu, Aug 23, 2012 at 08:01:47AM +1000, Benjamin Herrenschmidt wrote: > On Wed, 2012-08-22 at 16:59 +0200, Andrea Arcangeli wrote: > > diff --git a/arch/powerpc/include/asm/pgtable.h > > b/arch/powerpc/include/asm/pgtable.h > > index 2e0e411..5f03079 100644 > > ---

Re: [PATCH 19/36] autonuma: memory follows CPU algorithm and task/mm_autonuma stats collection

2012-08-22 Thread Andrea Arcangeli
Hi Andi, On Thu, Aug 23, 2012 at 12:37:33AM +0200, Andi Kleen wrote: > > > > This comment seems quite accurate to me (btw I took it from > > sched-numa rewrite with minor changes). > > I had expected it to describe the next function. If it's a strategic > overview maybe it should be somewhere e

Re: [PATCH 33/36] autonuma: powerpc port

2012-08-22 Thread Andrea Arcangeli
Hi Benjamin, On Thu, Aug 23, 2012 at 08:56:34AM +1000, Benjamin Herrenschmidt wrote: > What I mean here is that it's fine as a proof of concept ;-) I don't > like it being in a series aimed at upstream... > > We can try to flush out the issues, but as it is, the patch isn't > upstreamable imho.

Re: [PATCH 33/36] autonuma: powerpc port

2012-08-23 Thread Andrea Arcangeli
Hi Benjamin, On Thu, Aug 23, 2012 at 03:11:00PM +1000, Benjamin Herrenschmidt wrote: > Basically PROT_NONE turns into _PAGE_PRESENT without _PAGE_USER for us. Maybe the simplest is to implement pte_numa as !_PAGE_USER too. No need to clear the _PAGE_PRESENT bit and to alter pte_present() if clear

Re: [PATCH 00/19] sched-numa rewrite

2012-08-08 Thread Andrea Arcangeli
Hi everyone, On Tue, Jul 31, 2012 at 09:12:04PM +0200, Peter Zijlstra wrote: > Hi all, > > After having had a talk with Rik about all this NUMA nonsense where he > proposed > the scheme implemented in the next to last patch, I came up with a related > means of doing the home-node selection. > >

Re: [PATCH 02/19] mm/mpol: Remove NUMA_INTERLEAVE_HIT

2012-08-09 Thread Andrea Arcangeli
Hi, On Tue, Jul 31, 2012 at 09:12:06PM +0200, Peter Zijlstra wrote: > Since the NUMA_INTERLEAVE_HIT statistic is useless on its own; it wants > to be compared to either a total of interleave allocations or to a miss > count, remove it. > > Fixing it would be possible, but since we've gone years w

Re: [PATCH 04/19] mm, thp: Preserve pgprot across huge page split

2012-08-09 Thread Andrea Arcangeli
On Tue, Jul 31, 2012 at 09:12:08PM +0200, Peter Zijlstra wrote: > If we marked a THP with our special PROT_NONE protections, ensure we > don't lose them over a split. > > Collapse seems to always allocate a new (huge) page which should > already end up on the new target node so losing protection

Re: [PATCH 05/19] mm, mpol: Create special PROT_NONE infrastructure

2012-08-09 Thread Andrea Arcangeli
On Tue, Jul 31, 2012 at 09:12:09PM +0200, Peter Zijlstra wrote: > +static bool pte_prot_none(struct vm_area_struct *vma, pte_t pte) > +{ > + /* > + * If we have the normal vma->vm_page_prot protections we're not a > + * 'special' PROT_NONE page. > + * > + * This means we can
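The idea behind the quoted pte_prot_none() check can be modeled with plain integers. A pte is "special" PROT_NONE only if it does not carry the vma's normal protection bits but does carry the vma's PROT_NONE encoding. In this sketch pte_modify_model() replaces a hypothetical low-bit protection field, and the bit values are assumptions. One consequence of this test is that a pte in a genuine PROT_NONE vma, whose normal protection equals the PROT_NONE encoding, is never classified as special.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout: low 12 bits are protection, the rest is the pfn. */
#define PROT_MASK 0xfffu

typedef uint64_t pte_model_t;

/* Keep the pfn, replace the protection bits. */
static pte_model_t pte_modify_model(pte_model_t pte, uint64_t newprot)
{
	return (pte & ~(pte_model_t)PROT_MASK) | (newprot & PROT_MASK);
}

static bool pte_prot_none_model(uint64_t vma_prot, uint64_t vma_prot_none,
				pte_model_t pte)
{
	/* Normal protections for this vma: not a special fault. */
	if (pte == pte_modify_model(pte, vma_prot))
		return false;
	/* Otherwise special only if it matches the PROT_NONE encoding. */
	return pte == pte_modify_model(pte, vma_prot_none);
}
```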

Re: [PATCH 07/19] mm/mpol: Add MPOL_MF_NOOP

2012-08-09 Thread Andrea Arcangeli
On Tue, Jul 31, 2012 at 09:12:11PM +0200, Peter Zijlstra wrote: > From: Lee Schermerhorn > > This patch augments the MPOL_MF_LAZY feature by adding a "NOOP" > policy to mbind(). When the NOOP policy is used with the 'MOVE' and 'LAZY' flags, mbind() [check_range()] will walk the specified > range

Re: [PATCH 10/19] mm, mpol: Use special PROT_NONE to migrate pages

2012-08-09 Thread Andrea Arcangeli
On Tue, Jul 31, 2012 at 09:12:14PM +0200, Peter Zijlstra wrote: > +#ifdef CONFIG_NUMA > /* > - * Do fancy stuff... > + * For NUMA systems we use the special PROT_NONE maps to drive > + * lazy page migration, see MPOL_MF_LAZY and related. >*/ > + page = vm_normal_pag

Re: [PATCH 15/19] sched: Implement home-node awareness

2012-08-09 Thread Andrea Arcangeli
On Tue, Jul 31, 2012 at 09:12:19PM +0200, Peter Zijlstra wrote: > @@ -2699,6 +2705,29 @@ select_task_rq_fair(struct task_struct * > } > > rcu_read_lock(); > + if (sched_feat_numa(NUMA_BIAS) && node != -1) { > + int node_cpu; > + > + node_cpu = cpumask_any_a

Re: [PATCH 18/19] sched, numa: Per task memory placement for big processes

2012-08-09 Thread Andrea Arcangeli
On Tue, Jul 31, 2012 at 09:12:22PM +0200, Peter Zijlstra wrote: > Implement a per-task memory placement scheme for 'big' tasks (as per > the last patch). It relies on a regular PROT_NONE 'migration' fault to > scan the memory space of the process and uses a two stage migration > scheme to reduce t

Re: [patch 1/6] mmu_notifier: Core code

2008-02-16 Thread Andrea Arcangeli
exact like the ptes. XPMEM adds the requirement that sptes are in fact remote entities that are mangled by a message passing protocol over the network; it's the same as ptep_clear_flush being required to schedule and send skbs to be successful and allowing try_to_unmap to do its work. Same p

Re: [PATCH] KVM swapping with MMU Notifiers V7

2008-02-18 Thread Andrea Arcangeli
On Sat, Feb 16, 2008 at 03:08:17AM -0800, Andrew Morton wrote: > On Sat, 16 Feb 2008 11:48:27 +0100 Andrea Arcangeli wrote: > > > +void kvm_mmu_notifier_invalidate_range_end(struct mmu_notifier *mn, > > +

Re: [PATCH] KVM swapping with MMU Notifiers V7

2008-02-18 Thread Andrea Arcangeli
On Sat, Feb 16, 2008 at 05:51:38AM -0600, Robin Holt wrote: > I am doing this in xpmem with a stack-based structure in the function > calling get_user_pages. That structure describes the start and > end address of the range we are doing the get_user_pages on. If an > invalidate_range_begin comes
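Robin's stack-based structure records the [start, end) range of an in-flight get_user_pages() call, so an invalidate can be checked against it. The quote is cut off before it says how xpmem reacts to an overlap. The flag-and-retry reaction below is therefore an assumption, and all names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical on-stack record of an in-flight get_user_pages() range. */
struct gup_range {
	unsigned long start, end; /* half-open [start, end) */
	bool invalidated;
};

/* Two half-open ranges overlap iff each starts before the other ends. */
static bool ranges_overlap(unsigned long s1, unsigned long e1,
			   unsigned long s2, unsigned long e2)
{
	return s1 < e2 && s2 < e1;
}

/* Called from the invalidate_range_begin path (model): mark the lookup
 * stale so its caller drops the pages and retries. */
static void note_invalidate(struct gup_range *r, unsigned long start,
			    unsigned long end)
{
	if (ranges_overlap(r->start, r->end, start, end))
		r->invalidated = true;
}
```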

Re: [patch 3/6] mmu_notifier: invalidate_page callbacks

2008-02-19 Thread Andrea Arcangeli
On Tue, Feb 19, 2008 at 07:46:10PM +1100, Nick Piggin wrote: > On Sunday 17 February 2008 06:22, Christoph Lameter wrote: > > On Fri, 15 Feb 2008, Andrew Morton wrote: > > > > > flush_cache_page(vma, address, pte_pfn(*pte)); > > > > entry = ptep_clear_flush(vma, add

Re: [patch 2/6] mmu_notifier: Callbacks to invalidate address ranges

2008-02-19 Thread Andrea Arcangeli
On Tue, Feb 19, 2008 at 07:54:14PM +1100, Nick Piggin wrote: > As far as sleeping inside callbacks goes... I think there are big > problems with the patch (the sleeping patch and the external rmap > patch). I don't think it is workable in its current state. Either > we have to make some big changes

Re: [patch] my mmu notifiers

2008-02-19 Thread Andrea Arcangeli
On Tue, Feb 19, 2008 at 09:43:57AM +0100, Nick Piggin wrote: > are rather similar. However I have tried to make a point of minimising the > impact on the core mm/. I don't see why we need to invalidate or flush I also tried hard to minimise the impact on the core mm/; I also argued with Christoph

Re: [patch] my mmu notifiers

2008-02-19 Thread Andrea Arcangeli
On Tue, Feb 19, 2008 at 11:59:23PM +0100, Nick Piggin wrote: > That's why I don't understand the need for the pairs: it should be > done like this. Yes, except it can't be done like this for xpmem. > OK, I didn't see the invalidate_pages call... See the last patch I posted to Andrew, you've prob

Re: [patch] my mmu notifiers

2008-02-19 Thread Andrea Arcangeli
On Wed, Feb 20, 2008 at 12:04:27AM +0100, Nick Piggin wrote: > On Tue, Feb 19, 2008 at 08:27:25AM -0600, Jack Steiner wrote: > > > On Tue, Feb 19, 2008 at 02:58:51PM +0100, Andrea Arcangeli wrote: > > > > understand the need for invalidate_begin/invalidate_end pairs at al

Re: [patch] my mmu notifiers

2008-02-19 Thread Andrea Arcangeli
On Wed, Feb 20, 2008 at 12:11:57AM +0100, Nick Piggin wrote: > Sorry, I realise I still didn't get this through my head yet (and also > have not seen your patch recently). So I don't know exactly what you > are doing... The last version was posted here: http://marc.info/?l=kvm-devel&m=12032173252

Re: [patch 2/6] mmu_notifier: Callbacks to invalidate address ranges

2008-02-19 Thread Andrea Arcangeli
On Wed, Feb 20, 2008 at 10:08:49AM +1100, Nick Piggin wrote: > You can't sleep inside rcu_read_lock()! > > I must say that for a patch that is up to v8 or whatever and is > posted twice a week to such a big cc list, it is kind of slack to > not even test it and expect other people to review it. W

[PATCH] mmu notifiers #v6

2008-02-20 Thread Andrea Arcangeli
mmap_lock to a mutex. I doubt xpmem fits inside a CONFIG_MMU_NOTIFIER anymore, or we'll all run a bit slower because of it. It's really a call of how much we want to optimize the MMU notifier, by keeping things like RCU for the registration. Signed-off-by: Andrea Arcangeli

[PATCH] KVM swapping (+ seqlock fix) with mmu notifiers #v6

2008-02-20 Thread Andrea Arcangeli
, without requiring a page pin). Signed-off-by: Andrea Arcangeli diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig index 41962e7..e1287ab 100644 --- a/arch/x86/kvm/Kconfig +++ b/arch/x86/kvm/Kconfig @@ -21,6 +21,7 @@ config KVM tristate "Kernel-based Virt
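Avoiding the page pin relies on a retry protocol. The model below assumes begin/end invalidate callbacks that bracket every invalidation. The fault path samples a sequence number, looks the page up without holding the lock, and installs the shadow mapping only if no invalidate started or finished in between. All names are hypothetical, and a spinlock built on atomic_flag stands in for the real lock and memory barriers.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model state; not the KVM or mmu_notifier API. */
struct mmu_model {
	atomic_flag lock;   /* stands in for a spinlock such as KVM's mmu_lock */
	unsigned long seq;  /* bumped at every invalidate_range_end */
	long count;         /* invalidate ranges currently in flight */
};

static void model_lock(atomic_flag *l) { while (atomic_flag_test_and_set(l)) ; }
static void model_unlock(atomic_flag *l) { atomic_flag_clear(l); }

static void model_invalidate_range_begin(struct mmu_model *m)
{
	model_lock(&m->lock);
	m->count++;
	/* ...zap the shadow mappings for the range here... */
	model_unlock(&m->lock);
}

static void model_invalidate_range_end(struct mmu_model *m)
{
	model_lock(&m->lock);
	m->seq++;
	m->count--;
	model_unlock(&m->lock);
}

static unsigned long model_sample_seq(struct mmu_model *m)
{
	unsigned long seq;

	model_lock(&m->lock);
	seq = m->seq;
	model_unlock(&m->lock);
	return seq;
}

/* True: the shadow pte was installed. False: the page looked up after
 * sampling may be stale, so the caller drops it and retries the fault. */
static bool model_install_spte(struct mmu_model *m, unsigned long seq_sampled)
{
	bool ok;

	model_lock(&m->lock);
	ok = m->count == 0 && m->seq == seq_sampled;
	/* if ok, the shadow pte would be written here, under the lock */
	model_unlock(&m->lock);
	return ok;
}
```

Because an invalidate either raises count or bumps seq under the same lock, a stale translation is never installed, so no reference on the page has to be held across the lookup.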

Re: [PATCH] mmu notifiers #v6

2008-02-20 Thread Andrea Arcangeli
On Wed, Feb 20, 2008 at 05:33:13AM -0600, Robin Holt wrote: > But won't that other "subsystem" cause us to have two separate callouts > that do equivalent things and therefore force a removal of this and go > back to what Christoph has currently proposed? The point is that a new kind of notifier t

Re: [PATCH] mmu notifiers #v6

2008-02-20 Thread Andrea Arcangeli
On Wed, Feb 20, 2008 at 06:24:24AM -0600, Robin Holt wrote: > We do not need to do any allocation in the messaging layer, all > structures used for messaging are allocated at module load time. > The allocation discussions we had early on were about trying to > rearrange your notifiers to allow a sep

Re: [PATCH] mmu notifiers #v6

2008-02-20 Thread Andrea Arcangeli
On Wed, Feb 20, 2008 at 08:41:55AM -0600, Robin Holt wrote: > On Wed, Feb 20, 2008 at 11:39:42AM +0100, Andrea Arcangeli wrote: > > XPMEM simply can't use RCU for the registration locking if it wants to > > schedule inside the mmu notifier calls. So I guess it's better to

Re: [PATCH] mmu notifiers #v6

2008-02-21 Thread Andrea Arcangeli
On Thu, Feb 21, 2008 at 05:54:30AM +0100, Nick Piggin wrote: > will send you incremental changes that can be discussed more easily > that way (nothing major, mainly style and minor things). I don't need to say you're very welcome ;). > I agree: your coherent, non-sleeping mmu notifiers are pretty

Re: [PATCH v3 6/7] mm: make clear_huge_page cache clear only around the fault address

2012-08-16 Thread Andrea Arcangeli
Hi Kirill, On Thu, Aug 16, 2012 at 06:15:53PM +0300, Kirill A. Shutemov wrote: > for (i = 0; i < pages_per_huge_page; > i++, p = mem_map_next(p, page, i)) { It may be cheaper to avoid a multiplication/left shift before the add, and to do: for (i = 0, vaddr = haddr; i
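The suggested loop shape, carrying vaddr along instead of recomputing haddr + i * PAGE_SIZE, can be shown in userspace. Here clear_one() is a hypothetical stand-in for clearing one subpage and only records the address it would clear, and 4 KiB base pages are assumed. Whether this is measurably faster depends on the compiler, which may already strength-reduce the multiplication.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_PAGE_SIZE 4096UL /* assumption: 4 KiB base pages */

/* Hypothetical stand-in for clearing one subpage: records its vaddr. */
static void clear_one(unsigned long *log, size_t i, unsigned long vaddr)
{
	log[i] = vaddr;
}

/* Advance vaddr by one page per iteration; no per-iteration shift. */
static void clear_huge_page_model(unsigned long haddr,
				  size_t pages_per_huge_page,
				  unsigned long *log)
{
	size_t i;
	unsigned long vaddr;

	for (i = 0, vaddr = haddr; i < pages_per_huge_page;
	     i++, vaddr += MODEL_PAGE_SIZE)
		clear_one(log, i, vaddr);
}
```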

Re: [PATCH] mmu notifiers #v5

2008-02-05 Thread Andrea Arcangeli
On Tue, Feb 05, 2008 at 10:17:41AM -0800, Christoph Lameter wrote: > The other approach will not have any remote ptes at that point. Why would > there be a coherency issue? It never happens that two threads write to two different physical pages by working on the same process virtual address. Thi

Re: [PATCH] mmu notifiers #v5

2008-02-05 Thread Andrea Arcangeli
On Tue, Feb 05, 2008 at 02:06:23PM -0800, Christoph Lameter wrote: > On Tue, 5 Feb 2008, Andrea Arcangeli wrote: > > > On Tue, Feb 05, 2008 at 10:17:41AM -0800, Christoph Lameter wrote: > > > The other approach will not have any remote ptes at that point. Why would >

Re: [PATCH] mmu notifiers #v5

2008-02-05 Thread Andrea Arcangeli
On Tue, Feb 05, 2008 at 03:10:52PM -0800, Christoph Lameter wrote: > On Tue, 5 Feb 2008, Andrea Arcangeli wrote: > > > > You can avoid the page-pin and the pt lock completely by zapping the > > > mappings at _start and then holding off new references until _end.

Re: [ofa-general] Re: [patch 0/6] MMU Notifiers V6

2008-02-08 Thread Andrea Arcangeli
On Fri, Feb 08, 2008 at 04:36:16PM -0800, Christoph Lameter wrote: > On Fri, 8 Feb 2008, Roland Dreier wrote: > > > That would of course work -- dumb adapters would just always fail, > > which might be inefficient. > > H.. that means we need something that actually pins pages for good so > t

Re: [ofa-general] Re: [patch 0/6] MMU Notifiers V6

2008-02-08 Thread Andrea Arcangeli
On Fri, Feb 08, 2008 at 05:27:03PM -0800, Christoph Lameter wrote: > Pages will still be on the LRU and cycle through rmap again and again. > If page migration is used on those pages then the code may make repeated > attempt to migrate the page thinking that the page count must at some > point d

Re: [PATCH v3 6/7] mm: make clear_huge_page cache clear only around the fault address

2012-08-16 Thread Andrea Arcangeli
On Thu, Aug 16, 2012 at 07:43:56PM +0300, Kirill A. Shutemov wrote: > Hm.. I think with static_key we can avoid cache overhead here. I'll try. Could you elaborate on the static_key? Is it some sort of self modifying code? > Thanks, for review. Could you take a look at huge zero page patchset? ;)

Re: [PATCH, RFC 7/9] thp: implement splitting pmd for huge zero page

2012-08-16 Thread Andrea Arcangeli
On Thu, Aug 09, 2012 at 12:08:18PM +0300, Kirill A. Shutemov wrote: > +static void __split_huge_zero_page_pmd(struct mm_struct *mm, pmd_t *pmd, > + unsigned long address) > +{ > + pgtable_t pgtable; > + pmd_t _pmd; > + unsigned long haddr = address & HPAGE_PMD_MASK; > +
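Only the haddr computation is visible in the quoted snippet. The rest of this sketch is an assumption about what splitting a huge zero pmd involves: each of the PTRS_PER_PTE ptes maps the ordinary 4 KiB zero page read-only, so a later write still faults. The pte struct, the sizes (x86-64 values) and the function name are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_PAGE_SHIFT   12
#define MODEL_PTRS_PER_PTE 512 /* assumption: x86-64 sizes */
#define MODEL_HPAGE_SIZE   ((unsigned long)MODEL_PTRS_PER_PTE << MODEL_PAGE_SHIFT)
#define MODEL_HPAGE_MASK   (~(MODEL_HPAGE_SIZE - 1))

/* Hypothetical pte: a pfn plus a writable bit. */
struct pte_model { unsigned long pfn; int writable; };

/* Fill the new pte table and return the huge-page-aligned address. */
static unsigned long split_huge_zero_pmd_model(unsigned long address,
					       unsigned long zero_pfn,
					       struct pte_model *ptes)
{
	unsigned long haddr = address & MODEL_HPAGE_MASK;

	for (size_t i = 0; i < MODEL_PTRS_PER_PTE; i++) {
		ptes[i].pfn = zero_pfn; /* every subpage maps the same zero page */
		ptes[i].writable = 0;   /* writes must still fault and COW */
	}
	return haddr;
}
```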

Re: [PATCH, RFC 0/9] Introduce huge zero page

2012-08-16 Thread Andrea Arcangeli
Hi Andrew, On Thu, Aug 16, 2012 at 12:20:23PM -0700, Andrew Morton wrote: > That's a pretty big improvement for a rather fake test case. I wonder > how much benefit we'd see with real workloads? The same discussion happened about the zero page in general and there's no easy answer. I seem to rec

Re: [PATCH, RFC 6/9] thp: add address parameter to split_huge_page_pmd()

2012-08-16 Thread Andrea Arcangeli
On Thu, Aug 09, 2012 at 12:08:17PM +0300, Kirill A. Shutemov wrote: > From: "Kirill A. Shutemov" > > It's required to implement huge zero pmd splitting. > This isn't bisectable with the next one, it'd fail on wfg 0-DAY kernel build testing backend, however this is clearly to separate this patch

Re: [PATCH v3 6/7] mm: make clear_huge_page cache clear only around the fault address

2012-08-16 Thread Andrea Arcangeli
On Thu, Aug 16, 2012 at 09:37:25PM +0300, Kirill A. Shutemov wrote: > On Thu, Aug 16, 2012 at 08:29:44PM +0200, Andrea Arcangeli wrote: > > On Thu, Aug 16, 2012 at 07:43:56PM +0300, Kirill A. Shutemov wrote: > > > Hm.. I think with static_key we can avoid cache overh

Re: [PATCH, RFC 7/9] thp: implement splitting pmd for huge zero page

2012-08-17 Thread Andrea Arcangeli
On Fri, Aug 17, 2012 at 11:12:33AM +0300, Kirill A. Shutemov wrote: > I've used do_huge_pmd_wp_page_fallback() as template for my code. > What's difference between these two code paths? > Why is do_huge_pmd_wp_page_fallback() safe? Good point. do_huge_pmd_wp_page_fallback works only on the current

Re: [PATCH 00/19] sched-numa rewrite

2012-08-17 Thread Andrea Arcangeli
Hi, On Wed, Aug 08, 2012 at 02:43:34PM -0400, Rik van Riel wrote: > While the sched-numa code is relatively small and clean, the > current version does not seem to offer a significant > performance improvement over not having it, and in one of > the tests performance actually regresses vs. mainlin

Re: [PATCH] mm: mmu_notifier: fix inconsistent memory between secondary MMU and host

2012-08-21 Thread Andrea Arcangeli
pte/spte established by set_pte_at_notify/change_pte is readonly we don't need to do the ptep_clear_flush_notify instead because when the host will write to the page that will fault and serialize against the PT lock (set_pte_at_notify must always run under the PT lock of course).

Re: [PATCH 18/40] autonuma: call autonuma_setup_new_exec()

2012-07-12 Thread Andrea Arcangeli
Hi, On Sat, Jun 30, 2012 at 01:04:26AM -0400, Konrad Rzeszutek Wilk wrote: > On Thu, Jun 28, 2012 at 02:55:58PM +0200, Andrea Arcangeli wrote: > > This resets all per-thread and per-process statistics across exec > > syscalls or after kernel threads detached from the mm. The past

Re: [PATCH 19/40] autonuma: alloc/free/init sched_autonuma

2012-07-12 Thread Andrea Arcangeli
Hi Konrad, On Sat, Jun 30, 2012 at 01:10:01AM -0400, Konrad Rzeszutek Wilk wrote: > On Thu, Jun 28, 2012 at 02:55:59PM +0200, Andrea Arcangeli wrote: > > This is where the dynamically allocated sched_autonuma structure is > > being handled. > > > > The reason for

Re: [PATCH 20/40] autonuma: alloc/free/init mm_autonuma

2012-07-12 Thread Andrea Arcangeli
On Sat, Jun 30, 2012 at 01:12:18AM -0400, Konrad Rzeszutek Wilk wrote: > On Thu, Jun 28, 2012 at 02:56:00PM +0200, Andrea Arcangeli wrote: > > This is where the mm_autonuma structure is being handled. Just like > > sched_autonuma, this is only allocated at runtime if the hardware th

Re: [PATCH 20/40] autonuma: alloc/free/init mm_autonuma

2012-07-12 Thread Andrea Arcangeli
Hi Rik, On Sun, Jul 01, 2012 at 11:33:17AM -0400, Rik van Riel wrote: > On 06/28/2012 08:56 AM, Andrea Arcangeli wrote: > > > diff --git a/kernel/fork.c b/kernel/fork.c > > index 0adbe09..3e5a0d9 100644 > > --- a/kernel/fork.c > > +++ b/kernel/fork.c >

Re: [PATCH 28/40] autonuma: make khugepaged pte_numa aware

2012-07-12 Thread Andrea Arcangeli
On Mon, Jul 02, 2012 at 12:24:36AM -0400, Rik van Riel wrote: > On 06/28/2012 08:56 AM, Andrea Arcangeli wrote: > > If any of the ptes that khugepaged is collapsing was a pte_numa, the > > resulting trans huge pmd will be a pmd_numa too. > > Why? > > If some of the

Re: [PATCH 36/40] autonuma: page_autonuma

2012-07-12 Thread Andrea Arcangeli
On Sat, Jun 30, 2012 at 01:24:05AM -0400, Konrad Rzeszutek Wilk wrote: > I think you are better using a different name. > > Perhaps 'if (autonuma_on())' I changed it to AUTONUMA_POSSIBLE_FLAG/autonuma_possible() and optimized the implementation to a single test_bit on the read mostly flag variabl
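Reducing autonuma_possible() to a single test_bit on a read-mostly flag word can be sketched as follows. AUTONUMA_POSSIBLE_FLAG and AUTONUMA_SCAN_PMD_FLAG are named in these threads, but the bit numbers here are assumed. The *_model helpers are non-atomic stand-ins for the kernel's atomic test_bit/set_bit/clear_bit.

```c
#include <assert.h>
#include <stdbool.h>

/* Flag names from the threads; bit numbers are assumptions. */
enum autonuma_flag_model {
	AUTONUMA_POSSIBLE_FLAG,
	AUTONUMA_SCAN_PMD_FLAG,
};

/* One read-mostly word holding every flag. */
static unsigned long autonuma_flags_model;

static bool test_bit_model(int nr, const unsigned long *word)
{
	return (*word >> nr) & 1UL;
}

static void set_bit_model(int nr, unsigned long *word) { *word |= 1UL << nr; }
static void clear_bit_model(int nr, unsigned long *word) { *word &= ~(1UL << nr); }

/* The hot-path check is a single bit test on the shared word. */
static bool autonuma_possible_model(void)
{
	return test_bit_model(AUTONUMA_POSSIBLE_FLAG, &autonuma_flags_model);
}
```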

Re: [PATCH 36/40] autonuma: page_autonuma

2012-07-12 Thread Andrea Arcangeli
On Mon, Jul 02, 2012 at 02:37:10AM -0400, Rik van Riel wrote: > > +fail: > > + printk(KERN_CRIT "allocation of page_autonuma failed.\n"); > > + printk(KERN_CRIT "please try the 'noautonuma' boot option\n"); > > + panic("Out of memory"); > > +} > > The system can run just fine without auton

Re: [PATCH 40/40] autonuma: shrink the per-page page_autonuma struct size

2012-07-12 Thread Andrea Arcangeli
On Mon, Jul 02, 2012 at 03:18:46AM -0400, Rik van Riel wrote: > On 06/28/2012 08:56 AM, Andrea Arcangeli wrote: > > From 32 to 12 bytes, so the AutoNUMA memory footprint is reduced to > > 0.29% of RAM. > > Still not ideal, however once we get native THP migration working
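The quoted 0.29% follows from dividing the per-page metadata size by the page size. This sketch assumes 4 KiB pages, which reproduces the quoted figure. With that assumption, 12 bytes per page is 12/4096 ≈ 0.293% of RAM (3 MiB per GiB), and the previous 32 bytes was 0.78125%.

```c
#include <assert.h>

/* Per-page metadata overhead as a percentage of RAM. */
static double overhead_pct(double bytes_per_page, double page_size)
{
	return bytes_per_page / page_size * 100.0;
}

/* Absolute metadata size for a given amount of RAM. */
static unsigned long long overhead_bytes(unsigned long long ram_bytes,
					 unsigned long long page_size,
					 unsigned long long bytes_per_page)
{
	return ram_bytes / page_size * bytes_per_page;
}
```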

Re: [PATCH 00/33] AutoNUMA27

2012-10-11 Thread Andrea Arcangeli
Hi Mel, On Thu, Oct 11, 2012 at 11:19:30AM +0100, Mel Gorman wrote: > As a basic sniff test I added a test to MMtests for the AutoNUMA > Benchmark on a 4-node machine and the following fell out. > > 3.6.0 3.6.0 >

Re: [PATCH 01/33] autonuma: add Documentation/vm/autonuma.txt

2012-10-11 Thread Andrea Arcangeli
Hi, On Thu, Oct 11, 2012 at 11:50:36AM +0100, Mel Gorman wrote: > On Thu, Oct 04, 2012 at 01:50:43AM +0200, Andrea Arcangeli wrote: > > +The AutoNUMA logic is a chain reaction resulting from the actions of > > +the AutoNUMA daemon, knuma_scand. The knuma_scand daemon periodically

Re: [PATCH 04/33] autonuma: define _PAGE_NUMA

2012-10-11 Thread Andrea Arcangeli
On Thu, Oct 11, 2012 at 12:01:37PM +0100, Mel Gorman wrote: > On Thu, Oct 04, 2012 at 01:50:46AM +0200, Andrea Arcangeli wrote: > > The objective of _PAGE_NUMA is to be able to trigger NUMA hinting page > > faults to identify the per NUMA node working set of the thread

Re: [PATCH 05/33] autonuma: pte_numa() and pmd_numa()

2012-10-11 Thread Andrea Arcangeli
On Thu, Oct 11, 2012 at 12:15:45PM +0100, Mel Gorman wrote: > huh? > > #define _PAGE_NUMA _PAGE_PROTNONE > > so this is effective _PAGE_PRESENT | _PAGE_PROTNONE | _PAGE_PROTNONE > > I suspect you are doing this because there is no requirement for > _PAGE_NUMA == _PAGE_PROTNONE for other arch

Re: [PATCH 06/33] autonuma: teach gup_fast about pmd_numa

2012-10-11 Thread Andrea Arcangeli
On Thu, Oct 11, 2012 at 01:22:55PM +0100, Mel Gorman wrote: > On Thu, Oct 04, 2012 at 01:50:48AM +0200, Andrea Arcangeli wrote: > > In the special "pmd" mode of knuma_scand > > (/sys/kernel/mm/autonuma/knuma_scand/pmd == 1), the pmd may be of numa > > type (_PAGE_PR

Re: [PATCH 07/33] autonuma: mm_autonuma and task_autonuma data structures

2012-10-11 Thread Andrea Arcangeli
On Thu, Oct 11, 2012 at 01:28:27PM +0100, Mel Gorman wrote: > s/togehter/together/ Fixed. > > > + * knumad_scan structure. > > + */ > > +struct mm_autonuma { > > Nit but this is very similar in principle to mm_slot for transparent > huge pages. It might be worth renaming both to mm_thp_slot and

Re: [PATCH 08/33] autonuma: define the autonuma flags

2012-10-11 Thread Andrea Arcangeli
On Thu, Oct 11, 2012 at 02:46:43PM +0100, Mel Gorman wrote: > Should this be a SCHED_FEATURE flag? I guess it could. It is only used by kernel/sched/numa.c which isn't even built unless CONFIG_AUTONUMA is set. So it would require a CONFIG_AUTONUMA in the sched feature flags unless we want to expos

Re: [PATCH 10/33] autonuma: CPU follows memory algorithm

2012-10-11 Thread Andrea Arcangeli
On Thu, Oct 11, 2012 at 03:58:05PM +0100, Mel Gorman wrote: > On Thu, Oct 04, 2012 at 01:50:52AM +0200, Andrea Arcangeli wrote: > > This algorithm takes as input the statistical information filled by the > > knuma_scand (mm->mm_autonuma) and by the NUMA hinting page faults >
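As a heavily simplified illustration of the input side of this algorithm, the sketch picks the node where a task recorded the most NUMA hinting faults. The real CPU-follows-memory logic also uses the mm-wide statistics from knuma_scand and compares candidate CPUs, and none of that is modeled. The struct, node count and function name are hypothetical.

```c
#include <assert.h>

#define NR_NODES_MODEL 4 /* assumption: fixed node count for the model */

/* Hypothetical per-task NUMA hinting fault counters, one per node. */
struct task_numa_stats { unsigned long faults[NR_NODES_MODEL]; };

/* Node with the most faults, or -1 if none were recorded.
 * Ties keep the lowest node id. */
static int preferred_node(const struct task_numa_stats *s)
{
	int best = -1;
	unsigned long best_faults = 0;

	for (int nid = 0; nid < NR_NODES_MODEL; nid++) {
		if (s->faults[nid] > best_faults) {
			best_faults = s->faults[nid];
			best = nid;
		}
	}
	return best;
}
```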

Re: [PATCH 00/33] AutoNUMA27

2012-10-11 Thread Andrea Arcangeli
On Thu, Oct 11, 2012 at 04:35:03PM +0100, Mel Gorman wrote: > If System CPU time really does go down as this converges then that > should be obvious from monitoring vmstat over time for a test. Early on > - high usage with that dropping as it converges. If that doesn't happen > then the tasks are

Re: [PATCH 07/33] autonuma: mm_autonuma and task_autonuma data structures

2012-10-11 Thread Andrea Arcangeli
Hi Christoph, On Fri, Oct 12, 2012 at 12:23:17AM +, Christoph Lameter wrote: > On Thu, 11 Oct 2012, Rik van Riel wrote: > > > These statistics are updated at page fault time, I > > believe while holding the page table lock. > > > > In other words, they are in code paths where updating > > the

Re: [PATCH 00/33] AutoNUMA27

2012-10-11 Thread Andrea Arcangeli
Hi Mel, On Thu, Oct 11, 2012 at 10:34:32PM +0100, Mel Gorman wrote: > So after getting through the full review of it, there wasn't anything > I could not stand. I think it's *very* heavy on some of the paths like > the idle balancer which I was not keen on and the fault paths are also > quite heav

Re: [PATCH 00/33] AutoNUMA27

2012-10-13 Thread Andrea Arcangeli
Hi Srikar, On Sun, Oct 14, 2012 at 12:10:19AM +0530, Srikar Dronamraju wrote: > * Andrea Arcangeli [2012-10-04 01:50:42]: > > > Hello everyone, > > > > This is a new AutoNUMA27 release for Linux v3.6. > > > > > Here results of autonumabenchm

Re: [PATCH 00/27] Latest numa/core release, v16

2012-11-21 Thread Andrea Arcangeli
Hi, On Wed, Nov 21, 2012 at 10:38:59AM +, Mel Gorman wrote: > HACKBENCH PIPES > 3.7.0 3.7.0 3.7.0 > 3.7.0 3.7.0 >rc6-stats-v4r12 rc6-schednuma-v16r2 rc6-autonuma-v28fastr3 >rc6-mo

Re: oops in copy_page_rep()

2013-01-11 Thread Andrea Arcangeli
On Fri, Jan 11, 2013 at 01:50:44AM -0600, Simon Jeons wrote: > On Tue, 2013-01-08 at 18:49 +0100, Andrea Arcangeli wrote: > > Hi Kirill, > > > > On Tue, Jan 08, 2013 at 07:30:58PM +0200, Kirill A. Shutemov wrote: > > > Merged patch is obviously broken: huge_pm

Re: [PATCH v4] KSM: numa awareness sysfs knob

2012-09-28 Thread Andrea Arcangeli
Hi everyone, On Mon, Sep 24, 2012 at 02:56:06AM +0200, Petr Holasek wrote: > +static struct rb_root root_unstable_tree[MAX_NUMNODES] = { RB_ROOT, }; not initializing is better so we don't waste .data and it goes in the .bss, initializing only the first entry is useless anyway, that's getting init
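The .bss point can be illustrated with a stand-in for the KSM array. The rb_root_model struct and the array size are hypothetical. What matters is that C zero-fills an array with static storage duration before main runs, so every rb_node is already NULL. A NULL rb_node is an empty tree, the same value RB_ROOT produces. Such an array is normally placed in .bss and takes no space in the image.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal stand-ins for the kernel's rb_root (assumption). */
struct rb_node_model;
struct rb_root_model { struct rb_node_model *rb_node; };

#define MAX_NUMNODES_MODEL 64 /* hypothetical size */

/* No initializer: zero-filled, so every root is already an empty tree. */
static struct rb_root_model root_unstable_tree_model[MAX_NUMNODES_MODEL];

static int all_roots_empty(void)
{
	for (size_t i = 0; i < MAX_NUMNODES_MODEL; i++)
		if (root_unstable_tree_model[i].rb_node != NULL)
			return 0;
	return 1;
}
```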

Re: [patch for-3.6] mm, thp: fix mapped pages avoiding unevictable list on mlock

2012-09-28 Thread Andrea Arcangeli
> + if (page->mapping) > + mlock_vma_page(page); > + unlock_page(page); > + } > + } Reviewed-by: Andrea Arcangeli Without the patch the kernel will be perfectly fine too, this is only to show more "up

Re: [patch] mm, thp: fix mlock statistics

2012-09-28 Thread Andrea Arcangeli
of memory > is reflected. *snip* > Reported-by: Hugh Dickins > Signed-off-by: David Rientjes > --- > mm/internal.h |3 ++- > mm/mlock.c |6 -- > mm/page_alloc.c |2 +- > 3 files changed, 7 insertions(+), 4 deletions(-) Reviewed-by: Andrea Arcangeli Than

Re: [PATCH 0/3] Virtual huge zero page

2012-09-29 Thread Andrea Arcangeli
On Sat, Sep 29, 2012 at 02:37:18AM +0300, Kirill A. Shutemov wrote: > Cons: > - increases TLB pressure; I generally don't like using 4k tlb entries ever. This only has the advantage of saving 2MB-4KB RAM (globally), and a cmpxchg at the first system-wide zero page fault. I like apps to only use 2

Re: [PATCH 0/3] Virtual huge zero page

2012-09-29 Thread Andrea Arcangeli
On Sat, Sep 29, 2012 at 07:30:06AM -0700, Andi Kleen wrote: > On Sat, Sep 29, 2012 at 03:48:11PM +0200, Andrea Arcangeli wrote: > > On Sat, Sep 29, 2012 at 02:37:18AM +0300, Kirill A. Shutemov wrote: > > > Cons: > > > - increases TLB pressure; > > > > I ge

Re: [PATCH v4] KSM: numa awareness sysfs knob

2012-10-01 Thread Andrea Arcangeli
Hi Hugh, On Sun, Sep 30, 2012 at 05:36:33PM -0700, Hugh Dickins wrote: > I'm all for the simplest solution, but here in ksm_migrate_page() > is not a good place for COW breaking - we don't want to get into > an indefinite number of page allocations, and the risk of failure. Agreed, not a good pla

Re: [PATCH] mm: thp: Set the accessed flag for old pages on access fault.

2012-10-01 Thread Andrea Arcangeli
Hi Will, On Mon, Oct 01, 2012 at 02:51:45PM +0100, Will Deacon wrote: > +void huge_pmd_set_accessed(struct mm_struct *mm, struct vm_area_struct *vma, > +unsigned long address, pmd_t *pmd, pmd_t orig_pmd) > +{ > + pmd_t entry; > + > + spin_lock(&mm->page_table_lock);

Re: [PATCH 0/3] Virtual huge zero page

2012-10-01 Thread Andrea Arcangeli
On Mon, Oct 01, 2012 at 04:49:48PM +0300, Kirill A. Shutemov wrote: > On Sat, Sep 29, 2012 at 04:37:37PM +0200, Andrea Arcangeli wrote: > > But I agree we need to verify it before taking a decision, and that > > the numbers are better than theory, or to rephrase it "let'

Re: [PATCH 0/3] Virtual huge zero page

2012-10-01 Thread Andrea Arcangeli
On Mon, Oct 01, 2012 at 08:34:28AM -0700, H. Peter Anvin wrote: > On 09/29/2012 06:48 AM, Andrea Arcangeli wrote: > > > > There would be a small cache benefit here... but even then some first > > level caches are virtually indexed IIRC (always physically tagged to > > a

Re: [PATCH 0/3] Virtual huge zero page

2012-10-01 Thread Andrea Arcangeli
On Mon, Oct 01, 2012 at 10:03:53AM -0700, H. Peter Anvin wrote: > Something isn't quite right about that. If you look at your numbers: > > 1,049,134,961 LLC-loads > 6,222 LLC-load-misses > > This is another way of saying in your benchmark the huge zero page is > parked in your LLC - usin
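A back-of-the-envelope check of the two counters quoted in this excerpt shows why the huge zero page appears to stay resident in the LLC during that benchmark. This is only the ratio of the reported numbers and not a new measurement:

```latex
\frac{\text{LLC-load-misses}}{\text{LLC-loads}}
  = \frac{6{,}222}{1{,}049{,}134{,}961}
  \approx 5.93 \times 10^{-6}
  \approx 0.0006\%
```

That is roughly 6 misses per million LLC loads.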

Re: [PATCH 0/3] Virtual huge zero page

2012-10-01 Thread Andrea Arcangeli
On Mon, Oct 01, 2012 at 10:33:12AM -0700, H. Peter Anvin wrote: > ... and I think it would be worthwhile to know which effect dominates > (or neither, in which case it doesn't matter). > > Overall, I'm okay with either as long as we don't lock down 2 MB when > there isn't a huge zero page in use.

Re: [PATCH 0/3] Virtual huge zero page

2012-10-01 Thread Andrea Arcangeli
On Mon, Oct 01, 2012 at 08:15:19PM +0300, Kirill A. Shutemov wrote: > I think performance is not the first thing we should look at. We need to > choose which implementation is easier to support. Having to introduce a special pmd bitflag requiring architectural support is actually making it less se

Re: oops in copy_page_rep()

2013-01-08 Thread Andrea Arcangeli
Hi, On Tue, Jan 08, 2013 at 08:52:14AM -0800, Linus Torvalds wrote: > On Tue, Jan 8, 2013 at 8:31 AM, Kirill A. Shutemov > wrote: > >> > >> Heh. I was more thinking about why do_huge_pmd_wp_page() needs it, but > >> do_huge_pmd_numa_page() does not. > > > > It does. The check should be moved up.

Re: oops in copy_page_rep()

2013-01-08 Thread Andrea Arcangeli
Hi Kirill, On Tue, Jan 08, 2013 at 07:30:58PM +0200, Kirill A. Shutemov wrote: > Merged patch is obviously broken: huge_pmd_set_accessed() can be called > only if the pmd is under splitting. Of course I assume you meant "only if the pmd is not under splitting". But no, setting a bitflag like the
