Re: [PATCH -V2 3/5] mm: Move change_prot_numa outside CONFIG_ARCH_USES_NUMA_PROT_NONE
On 12/03/2013 10:13 PM, Benjamin Herrenschmidt wrote: On Mon, 2013-11-18 at 14:58 +0530, Aneesh Kumar K.V wrote: From: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com change_prot_numa should work even if _PAGE_NUMA != _PAGE_PROTNONE. On archs like ppc64 that don't use _PAGE_PROTNONE and also have a separate page table outside linux pagetable, we just need to make sure that when calling change_prot_numa we flush the hardware page table entry so that next page access result in a numa fault. That patch doesn't look right... At first glance, indeed... You are essentially making change_prot_numa() do whatever it does (which I don't completely understand) *for all architectures* now, whether they have CONFIG_ARCH_USES_NUMA_PROT_NONE or not ... So because you want that behaviour on powerpc book3s64, you change everybody. However, it appears that since the code was #ifdefed like that, the called code was made generic enough, that change_prot_numa should actually work for everything. In other words: Reviewed-by: Rik van Riel r...@redhat.com ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH -V2 3/5] mm: Move change_prot_numa outside CONFIG_ARCH_USES_NUMA_PROT_NONE
On 12/05/2013 12:20 AM, Benjamin Herrenschmidt wrote: On Thu, 2013-12-05 at 10:48 +0530, Aneesh Kumar K.V wrote: Ok, I can move the changes below #ifdef CONFIG_NUMA_BALANCING ? We call change_prot_numa from task_numa_work and queue_pages_range(). The later may be an issue. So doing the below will help ? -#ifdef CONFIG_ARCH_USES_NUMA_PROT_NONE +#ifdef CONFIG_NUMA_BALANCING I will defer to Mel and Rik (should we also CC Andrea ?) It looks like manual numa binding can also use lazy page migration, but I am not sure if that can happen without CONFIG_NUMA_BALANCING, or if mbind always uses MPOL_MF_STRICT... ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH -V2 3/5] mm: Move change_prot_numa outside CONFIG_ARCH_USES_NUMA_PROT_NONE
On Thu, 2013-12-05 at 12:27 -0500, Rik van Riel wrote: However, it appears that since the code was #ifdefed like that, the called code was made generic enough, that change_prot_numa should actually work for everything. In other words: Reviewed-by: Rik van Riel r...@redhat.com Ok thanks, that's what I needed. Do you have any objection of me merging that change via the powerpc tree along with the corresponding powerpc bits from Aneesh ? The other option would be to have it in a topic branch that I pull from you. Cheers, Ben. ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH -V2 3/5] mm: Move change_prot_numa outside CONFIG_ARCH_USES_NUMA_PROT_NONE
Adding Mel and Rik to cc: Benjamin Herrenschmidt b...@au1.ibm.com writes: On Mon, 2013-11-18 at 14:58 +0530, Aneesh Kumar K.V wrote: From: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com change_prot_numa should work even if _PAGE_NUMA != _PAGE_PROTNONE. On archs like ppc64 that don't use _PAGE_PROTNONE and also have a separate page table outside linux pagetable, we just need to make sure that when calling change_prot_numa we flush the hardware page table entry so that next page access result in a numa fault. That patch doesn't look right... You are essentially making change_prot_numa() do whatever it does (which I don't completely understand) *for all architectures* now, whether they have CONFIG_ARCH_USES_NUMA_PROT_NONE or not ... So because you want that behaviour on powerpc book3s64, you change everybody. Is that correct ? Yes. Also what exactly is that doing, can you explain ? From what I can see, it calls back into the core of mprotect to change the protection to vma-vm_page_prot, which I would have expected is already the protection there, with the added prot_numa flag passed down. it set the _PAGE_NUMA bit. Now we also want to make sure that when we set _PAGE_NUMA, we would get a pagefault on that so that we can track that fault as a numa fault. To ensure that, we had the below BUILD_BUG BUILD_BUG_ON(_PAGE_NUMA != _PAGE_PROTNONE); But other than that the function doesn't really have any dependency on _PAGE_PROTNONE. The only requirement is when we set _PAGE_NUMA, the architecture should do enough to ensure that we get a page fault. Now on ppc64 we does that by clearlying hpte entry and also clearing _PAGE_PRESENT. Since we have _PAGE_PRESENT cleared hash_page will return 1 and we get to page fault handler. Your changeset comment says On archs like ppc64 [...] we just need to make sure that when calling change_prot_numa we flush the hardware page table entry so that next page access result in a numa fault. But change_prot_numa() does a lot more than that ... it does pte_mknuma(), do we need it ? I assume we do or we wouldn't have added that PTE bit to begin with... Now it *might* be allright and it might be that no other architecture cares anyway etc... but I need at least some mm folks to ack on that patch before I can take it because it *will* change behaviour of other architectures. Ok, I can move the changes below #ifdef CONFIG_NUMA_BALANCING ? We call change_prot_numa from task_numa_work and queue_pages_range(). The later may be an issue. So doing the below will help ? -#ifdef CONFIG_ARCH_USES_NUMA_PROT_NONE +#ifdef CONFIG_NUMA_BALANCING -aneesh ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH -V2 3/5] mm: Move change_prot_numa outside CONFIG_ARCH_USES_NUMA_PROT_NONE
On Thu, 2013-12-05 at 10:48 +0530, Aneesh Kumar K.V wrote: Ok, I can move the changes below #ifdef CONFIG_NUMA_BALANCING ? We call change_prot_numa from task_numa_work and queue_pages_range(). The later may be an issue. So doing the below will help ? -#ifdef CONFIG_ARCH_USES_NUMA_PROT_NONE +#ifdef CONFIG_NUMA_BALANCING I will defer to Mel and Rik (should we also CC Andrea ?) Cheers, Ben. ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH -V2 3/5] mm: Move change_prot_numa outside CONFIG_ARCH_USES_NUMA_PROT_NONE
On Mon, 2013-11-18 at 14:58 +0530, Aneesh Kumar K.V wrote: From: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com change_prot_numa should work even if _PAGE_NUMA != _PAGE_PROTNONE. On archs like ppc64 that don't use _PAGE_PROTNONE and also have a separate page table outside linux pagetable, we just need to make sure that when calling change_prot_numa we flush the hardware page table entry so that next page access result in a numa fault. That patch doesn't look right... You are essentially making change_prot_numa() do whatever it does (which I don't completely understand) *for all architectures* now, whether they have CONFIG_ARCH_USES_NUMA_PROT_NONE or not ... So because you want that behaviour on powerpc book3s64, you change everybody. Is that correct ? Also what exactly is that doing, can you explain ? From what I can see, it calls back into the core of mprotect to change the protection to vma-vm_page_prot, which I would have expected is already the protection there, with the added prot_numa flag passed down. Your changeset comment says On archs like ppc64 [...] we just need to make sure that when calling change_prot_numa we flush the hardware page table entry so that next page access result in a numa fault. But change_prot_numa() does a lot more than that ... it does pte_mknuma(), do we need it ? I assume we do or we wouldn't have added that PTE bit to begin with... Now it *might* be allright and it might be that no other architecture cares anyway etc... but I need at least some mm folks to ack on that patch before I can take it because it *will* change behaviour of other architectures. Cheers, Ben. Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com --- include/linux/mm.h | 3 --- mm/mempolicy.c | 9 - 2 files changed, 12 deletions(-) diff --git a/include/linux/mm.h b/include/linux/mm.h index 0548eb201e05..51794c1a1d7e 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -1851,11 +1851,8 @@ static inline pgprot_t vm_get_page_prot(unsigned long vm_flags) } #endif -#ifdef CONFIG_ARCH_USES_NUMA_PROT_NONE unsigned long change_prot_numa(struct vm_area_struct *vma, unsigned long start, unsigned long end); -#endif - struct vm_area_struct *find_extend_vma(struct mm_struct *, unsigned long addr); int remap_pfn_range(struct vm_area_struct *, unsigned long addr, unsigned long pfn, unsigned long size, pgprot_t); diff --git a/mm/mempolicy.c b/mm/mempolicy.c index c4403cdf3433..cae10af4fdc4 100644 --- a/mm/mempolicy.c +++ b/mm/mempolicy.c @@ -613,7 +613,6 @@ static inline int queue_pages_pgd_range(struct vm_area_struct *vma, return 0; } -#ifdef CONFIG_ARCH_USES_NUMA_PROT_NONE /* * This is used to mark a range of virtual addresses to be inaccessible. * These are later cleared by a NUMA hinting fault. Depending on these @@ -627,7 +626,6 @@ unsigned long change_prot_numa(struct vm_area_struct *vma, unsigned long addr, unsigned long end) { int nr_updated; - BUILD_BUG_ON(_PAGE_NUMA != _PAGE_PROTNONE); nr_updated = change_protection(vma, addr, end, vma-vm_page_prot, 0, 1); if (nr_updated) @@ -635,13 +633,6 @@ unsigned long change_prot_numa(struct vm_area_struct *vma, return nr_updated; } -#else -static unsigned long change_prot_numa(struct vm_area_struct *vma, - unsigned long addr, unsigned long end) -{ - return 0; -} -#endif /* CONFIG_ARCH_USES_NUMA_PROT_NONE */ /* * Walk through page tables and collect pages to be migrated. ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH -V2 3/5] mm: Move change_prot_numa outside CONFIG_ARCH_USES_NUMA_PROT_NONE
From: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com change_prot_numa should work even if _PAGE_NUMA != _PAGE_PROTNONE. On archs like ppc64 that don't use _PAGE_PROTNONE and also have a separate page table outside linux pagetable, we just need to make sure that when calling change_prot_numa we flush the hardware page table entry so that next page access result in a numa fault. Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com --- include/linux/mm.h | 3 --- mm/mempolicy.c | 9 - 2 files changed, 12 deletions(-) diff --git a/include/linux/mm.h b/include/linux/mm.h index 0548eb201e05..51794c1a1d7e 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -1851,11 +1851,8 @@ static inline pgprot_t vm_get_page_prot(unsigned long vm_flags) } #endif -#ifdef CONFIG_ARCH_USES_NUMA_PROT_NONE unsigned long change_prot_numa(struct vm_area_struct *vma, unsigned long start, unsigned long end); -#endif - struct vm_area_struct *find_extend_vma(struct mm_struct *, unsigned long addr); int remap_pfn_range(struct vm_area_struct *, unsigned long addr, unsigned long pfn, unsigned long size, pgprot_t); diff --git a/mm/mempolicy.c b/mm/mempolicy.c index c4403cdf3433..cae10af4fdc4 100644 --- a/mm/mempolicy.c +++ b/mm/mempolicy.c @@ -613,7 +613,6 @@ static inline int queue_pages_pgd_range(struct vm_area_struct *vma, return 0; } -#ifdef CONFIG_ARCH_USES_NUMA_PROT_NONE /* * This is used to mark a range of virtual addresses to be inaccessible. * These are later cleared by a NUMA hinting fault. Depending on these @@ -627,7 +626,6 @@ unsigned long change_prot_numa(struct vm_area_struct *vma, unsigned long addr, unsigned long end) { int nr_updated; - BUILD_BUG_ON(_PAGE_NUMA != _PAGE_PROTNONE); nr_updated = change_protection(vma, addr, end, vma-vm_page_prot, 0, 1); if (nr_updated) @@ -635,13 +633,6 @@ unsigned long change_prot_numa(struct vm_area_struct *vma, return nr_updated; } -#else -static unsigned long change_prot_numa(struct vm_area_struct *vma, - unsigned long addr, unsigned long end) -{ - return 0; -} -#endif /* CONFIG_ARCH_USES_NUMA_PROT_NONE */ /* * Walk through page tables and collect pages to be migrated. -- 1.8.3.2 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev