Re: [PATCH -V2 3/5] mm: Move change_prot_numa outside CONFIG_ARCH_USES_NUMA_PROT_NONE

2013-12-05 Thread Rik van Riel

On 12/03/2013 10:13 PM, Benjamin Herrenschmidt wrote:

On Mon, 2013-11-18 at 14:58 +0530, Aneesh Kumar K.V wrote:

From: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com

change_prot_numa should work even if _PAGE_NUMA != _PAGE_PROTNONE.
On archs like ppc64 that don't use _PAGE_PROTNONE and also have
a separate page table outside linux pagetable, we just need to
make sure that when calling change_prot_numa we flush the
hardware page table entry so that next page access  result in a numa
fault.


That patch doesn't look right...


At first glance, indeed...


You are essentially making change_prot_numa() do whatever it does (which
I don't completely understand) *for all architectures* now, whether they
have CONFIG_ARCH_USES_NUMA_PROT_NONE or not ... So because you want that
behaviour on powerpc book3s64, you change everybody.


However, it appears that since the code was #ifdefed
like that, the called code was made generic enough,
that change_prot_numa should actually work for
everything.

In other words:

Reviewed-by: Rik van Riel r...@redhat.com
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [PATCH -V2 3/5] mm: Move change_prot_numa outside CONFIG_ARCH_USES_NUMA_PROT_NONE

2013-12-05 Thread Rik van Riel

On 12/05/2013 12:20 AM, Benjamin Herrenschmidt wrote:

On Thu, 2013-12-05 at 10:48 +0530, Aneesh Kumar K.V wrote:


Ok, I can move the changes below #ifdef CONFIG_NUMA_BALANCING ? We call
change_prot_numa from task_numa_work and queue_pages_range(). The later
may be an issue. So doing the below will help ?

-#ifdef CONFIG_ARCH_USES_NUMA_PROT_NONE
+#ifdef CONFIG_NUMA_BALANCING


I will defer to Mel and Rik (should we also CC Andrea ?)


It looks like manual numa binding can also use lazy
page migration, but I am not sure if that can happen
without CONFIG_NUMA_BALANCING, or if mbind always uses
MPOL_MF_STRICT...

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [PATCH -V2 3/5] mm: Move change_prot_numa outside CONFIG_ARCH_USES_NUMA_PROT_NONE

2013-12-05 Thread Benjamin Herrenschmidt
On Thu, 2013-12-05 at 12:27 -0500, Rik van Riel wrote:

 However, it appears that since the code was #ifdefed
 like that, the called code was made generic enough,
 that change_prot_numa should actually work for
 everything.
 
 In other words:
 
 Reviewed-by: Rik van Riel r...@redhat.com

Ok thanks, that's what I needed. Do you have any objection of me merging
that change via the powerpc tree along with the corresponding powerpc
bits from Aneesh ?

The other option would be to have it in a topic branch that I pull from
you.

Cheers,
Ben.


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [PATCH -V2 3/5] mm: Move change_prot_numa outside CONFIG_ARCH_USES_NUMA_PROT_NONE

2013-12-04 Thread Aneesh Kumar K.V

Adding Mel and Rik to cc:

Benjamin Herrenschmidt b...@au1.ibm.com writes:

 On Mon, 2013-11-18 at 14:58 +0530, Aneesh Kumar K.V wrote:
 From: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
 
 change_prot_numa should work even if _PAGE_NUMA != _PAGE_PROTNONE.
 On archs like ppc64 that don't use _PAGE_PROTNONE and also have
 a separate page table outside linux pagetable, we just need to
 make sure that when calling change_prot_numa we flush the
 hardware page table entry so that next page access  result in a numa
 fault.

 That patch doesn't look right...

 You are essentially making change_prot_numa() do whatever it does (which
 I don't completely understand) *for all architectures* now, whether they
 have CONFIG_ARCH_USES_NUMA_PROT_NONE or not ... So because you want that
 behaviour on powerpc book3s64, you change everybody.

 Is that correct ?


Yes. 


 Also what exactly is that doing, can you explain ? From what I can see,
 it calls back into the core of mprotect to change the protection to
 vma-vm_page_prot, which I would have expected is already the protection
 there, with the added prot_numa flag passed down.

it set the _PAGE_NUMA bit. Now we also want to make sure that when
we set _PAGE_NUMA, we would get a pagefault on that so that we can track
that fault as a numa fault. To ensure that, we had the below BUILD_BUG

BUILD_BUG_ON(_PAGE_NUMA != _PAGE_PROTNONE);


But other than that the function doesn't really have any dependency on
_PAGE_PROTNONE. The only requirement is when we set _PAGE_NUMA, the
architecture should do enough to ensure that we get a page fault. Now on
ppc64 we does that by clearlying hpte entry and also clearing
_PAGE_PRESENT. Since we have _PAGE_PRESENT cleared hash_page will return
1 and we get to page fault handler.


 Your changeset comment says On archs like ppc64 [...] we just need to
 make sure that when calling change_prot_numa we flush the
 hardware page table entry so that next page access  result in a numa
 fault.

 But change_prot_numa() does a lot more than that ... it does
 pte_mknuma(), do we need it ? I assume we do or we wouldn't have added
 that PTE bit to begin with...

 Now it *might* be allright and it might be that no other architecture
 cares anyway etc... but I need at least some mm folks to ack on that
 patch before I can take it because it *will* change behaviour of other
 architectures.


Ok, I can move the changes below #ifdef CONFIG_NUMA_BALANCING ? We call
change_prot_numa from task_numa_work and queue_pages_range(). The later
may be an issue. So doing the below will help ?

-#ifdef CONFIG_ARCH_USES_NUMA_PROT_NONE
+#ifdef CONFIG_NUMA_BALANCING


-aneesh


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [PATCH -V2 3/5] mm: Move change_prot_numa outside CONFIG_ARCH_USES_NUMA_PROT_NONE

2013-12-04 Thread Benjamin Herrenschmidt
On Thu, 2013-12-05 at 10:48 +0530, Aneesh Kumar K.V wrote:
 
 Ok, I can move the changes below #ifdef CONFIG_NUMA_BALANCING ? We call
 change_prot_numa from task_numa_work and queue_pages_range(). The later
 may be an issue. So doing the below will help ?
 
 -#ifdef CONFIG_ARCH_USES_NUMA_PROT_NONE
 +#ifdef CONFIG_NUMA_BALANCING

I will defer to Mel and Rik (should we also CC Andrea ?)

Cheers,
Ben.


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [PATCH -V2 3/5] mm: Move change_prot_numa outside CONFIG_ARCH_USES_NUMA_PROT_NONE

2013-12-03 Thread Benjamin Herrenschmidt
On Mon, 2013-11-18 at 14:58 +0530, Aneesh Kumar K.V wrote:
 From: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
 
 change_prot_numa should work even if _PAGE_NUMA != _PAGE_PROTNONE.
 On archs like ppc64 that don't use _PAGE_PROTNONE and also have
 a separate page table outside linux pagetable, we just need to
 make sure that when calling change_prot_numa we flush the
 hardware page table entry so that next page access  result in a numa
 fault.

That patch doesn't look right...

You are essentially making change_prot_numa() do whatever it does (which
I don't completely understand) *for all architectures* now, whether they
have CONFIG_ARCH_USES_NUMA_PROT_NONE or not ... So because you want that
behaviour on powerpc book3s64, you change everybody.

Is that correct ?

Also what exactly is that doing, can you explain ? From what I can see,
it calls back into the core of mprotect to change the protection to
vma-vm_page_prot, which I would have expected is already the protection
there, with the added prot_numa flag passed down.

Your changeset comment says On archs like ppc64 [...] we just need to
make sure that when calling change_prot_numa we flush the
hardware page table entry so that next page access  result in a numa
fault.

But change_prot_numa() does a lot more than that ... it does
pte_mknuma(), do we need it ? I assume we do or we wouldn't have added
that PTE bit to begin with...

Now it *might* be allright and it might be that no other architecture
cares anyway etc... but I need at least some mm folks to ack on that
patch before I can take it because it *will* change behaviour of other
architectures.

Cheers,
Ben.

 Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
 ---
  include/linux/mm.h | 3 ---
  mm/mempolicy.c | 9 -
  2 files changed, 12 deletions(-)
 
 diff --git a/include/linux/mm.h b/include/linux/mm.h
 index 0548eb201e05..51794c1a1d7e 100644
 --- a/include/linux/mm.h
 +++ b/include/linux/mm.h
 @@ -1851,11 +1851,8 @@ static inline pgprot_t vm_get_page_prot(unsigned long 
 vm_flags)
  }
  #endif
  
 -#ifdef CONFIG_ARCH_USES_NUMA_PROT_NONE
  unsigned long change_prot_numa(struct vm_area_struct *vma,
   unsigned long start, unsigned long end);
 -#endif
 -
  struct vm_area_struct *find_extend_vma(struct mm_struct *, unsigned long 
 addr);
  int remap_pfn_range(struct vm_area_struct *, unsigned long addr,
   unsigned long pfn, unsigned long size, pgprot_t);
 diff --git a/mm/mempolicy.c b/mm/mempolicy.c
 index c4403cdf3433..cae10af4fdc4 100644
 --- a/mm/mempolicy.c
 +++ b/mm/mempolicy.c
 @@ -613,7 +613,6 @@ static inline int queue_pages_pgd_range(struct 
 vm_area_struct *vma,
   return 0;
  }
  
 -#ifdef CONFIG_ARCH_USES_NUMA_PROT_NONE
  /*
   * This is used to mark a range of virtual addresses to be inaccessible.
   * These are later cleared by a NUMA hinting fault. Depending on these
 @@ -627,7 +626,6 @@ unsigned long change_prot_numa(struct vm_area_struct *vma,
   unsigned long addr, unsigned long end)
  {
   int nr_updated;
 - BUILD_BUG_ON(_PAGE_NUMA != _PAGE_PROTNONE);
  
   nr_updated = change_protection(vma, addr, end, vma-vm_page_prot, 0, 1);
   if (nr_updated)
 @@ -635,13 +633,6 @@ unsigned long change_prot_numa(struct vm_area_struct 
 *vma,
  
   return nr_updated;
  }
 -#else
 -static unsigned long change_prot_numa(struct vm_area_struct *vma,
 - unsigned long addr, unsigned long end)
 -{
 - return 0;
 -}
 -#endif /* CONFIG_ARCH_USES_NUMA_PROT_NONE */
  
  /*
   * Walk through page tables and collect pages to be migrated.


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


[PATCH -V2 3/5] mm: Move change_prot_numa outside CONFIG_ARCH_USES_NUMA_PROT_NONE

2013-11-18 Thread Aneesh Kumar K.V
From: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com

change_prot_numa should work even if _PAGE_NUMA != _PAGE_PROTNONE.
On archs like ppc64 that don't use _PAGE_PROTNONE and also have
a separate page table outside linux pagetable, we just need to
make sure that when calling change_prot_numa we flush the
hardware page table entry so that next page access  result in a numa
fault.

Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
---
 include/linux/mm.h | 3 ---
 mm/mempolicy.c | 9 -
 2 files changed, 12 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 0548eb201e05..51794c1a1d7e 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1851,11 +1851,8 @@ static inline pgprot_t vm_get_page_prot(unsigned long 
vm_flags)
 }
 #endif
 
-#ifdef CONFIG_ARCH_USES_NUMA_PROT_NONE
 unsigned long change_prot_numa(struct vm_area_struct *vma,
unsigned long start, unsigned long end);
-#endif
-
 struct vm_area_struct *find_extend_vma(struct mm_struct *, unsigned long addr);
 int remap_pfn_range(struct vm_area_struct *, unsigned long addr,
unsigned long pfn, unsigned long size, pgprot_t);
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index c4403cdf3433..cae10af4fdc4 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -613,7 +613,6 @@ static inline int queue_pages_pgd_range(struct 
vm_area_struct *vma,
return 0;
 }
 
-#ifdef CONFIG_ARCH_USES_NUMA_PROT_NONE
 /*
  * This is used to mark a range of virtual addresses to be inaccessible.
  * These are later cleared by a NUMA hinting fault. Depending on these
@@ -627,7 +626,6 @@ unsigned long change_prot_numa(struct vm_area_struct *vma,
unsigned long addr, unsigned long end)
 {
int nr_updated;
-   BUILD_BUG_ON(_PAGE_NUMA != _PAGE_PROTNONE);
 
nr_updated = change_protection(vma, addr, end, vma-vm_page_prot, 0, 1);
if (nr_updated)
@@ -635,13 +633,6 @@ unsigned long change_prot_numa(struct vm_area_struct *vma,
 
return nr_updated;
 }
-#else
-static unsigned long change_prot_numa(struct vm_area_struct *vma,
-   unsigned long addr, unsigned long end)
-{
-   return 0;
-}
-#endif /* CONFIG_ARCH_USES_NUMA_PROT_NONE */
 
 /*
  * Walk through page tables and collect pages to be migrated.
-- 
1.8.3.2

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev