Re: [PATCH v5 3/9] mm/mremap: Use pmd/pud_populate to update page table entries

2021-05-20 Thread Peter Xu
On Thu, May 20, 2021 at 03:06:30PM -0400, Zi Yan wrote:
> On 20 May 2021, at 10:57, Peter Xu wrote:
> 
> > On Thu, May 20, 2021 at 07:07:57PM +0530, Aneesh Kumar K.V wrote:
> >> "Aneesh Kumar K.V"  writes:
> >>
> >>> On 5/20/21 6:16 PM, Peter Xu wrote:
> >>>> On Thu, May 20, 2021 at 01:56:54PM +0530, Aneesh Kumar K.V wrote:
> >>>>>> This seems to work at least for my userfaultfd test on shmem, however I
> >>>>>> don't fully understand the commit message [1] on: How do we guarantee
> >>>>>> we're not moving a thp pte?
> >>>>>>
> >>>>>
> >>>>> move_page_tables() checks for pmd_trans_huge() and ends up calling
> >>>>> move_huge_pmd if it is a THP entry.
> >>>>
> >>>> Sorry to be unclear: what if a huge pud thp?
> >>>>
> >>>
> >>> I am still checking. Looking at the code before commit
> >>> c49dd340180260c6239e453263a9a244da9a7c85, I don't see the kernel handling
> >>> huge pud thp. I haven't studied huge pud thp enough to understand whether
> >>> c49dd340180260c6239e453263a9a244da9a7c85 intended to add that support.
> >>>
> >>> We can do a move_huge_pud() like we do for huge pmd thp. But I am not
> >>> sure whether we handle those VMAs earlier and restrict mremap on them?
> >>
> >> Something like this? (not even compile tested). I am still not sure
> >> whether this is really needed or whether we handle DAX VMAs in some
> >> other form.
> >
> > Yeah maybe (you may want to at least drop that extra "case HPAGE_PUD").
> >
> > It's just that with CONFIG_HAVE_MOVE_PUD (x86 and arm64 enable it by
> > default so far) it does seem to work even with a huge pud, while after
> > this patch it no longer seems to work, even with your follow-up fix.
> >
> > Indeed, I saw that CONFIG_HAVE_MOVE_PUD was introduced only a few months
> > ago, so breaking someone seems unlikely; perhaps there is no real user yet
> > that mremap()s a huge pud for dax or whatever backend?
> >
> > Ideally maybe rework this patch (or series?) and repost it for a better
> > review? Agree the risk seems low.  I'll leave that to you and Andrew to
> > decide..
> 
> It seems that the mremap function for 1GB DAX THP was not added when 1GB
> DAX THP was implemented[1].

Yes, but trickily, as I mentioned, it seems Android's CONFIG_HAVE_MOVE_PUD
has done this right (unintentionally, I guess) with the set_pud_at() before
this patch is merged, so we might have a short window where this might start
to work..

> I guess no one is using mremap on 1GB DAX THP. Maybe we want to at least
> add a warning or VM_BUG_ON to catch this, or use Aneesh's move_huge_pud()
> to handle the situation properly?

Agreed. If we decide to go with the patches, some warning (or even VM_BUG_ON,
which IIUC is strongly discouraged in most cases) looks better than pgtable
corruption reports.
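
E.g. something along these lines in the pud path of move_page_tables() (an
untested sketch; pud_trans_huge()/pud_devmap() are the existing predicates
for a huge pud entry, the exact placement and fallback are to be decided):

	/* Never treat a huge pud as a page table page to be re-linked */
	if (pud_trans_huge(*old_pud) || pud_devmap(*old_pud)) {
		WARN_ON_ONCE(1);	/* loud failure beats silent corruption */
		break;			/* fall back to pmd/pte granularity */
	}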

-- 
Peter Xu



Re: [PATCH v5 3/9] mm/mremap: Use pmd/pud_populate to update page table entries

2021-05-20 Thread Peter Xu
On Thu, May 20, 2021 at 07:07:57PM +0530, Aneesh Kumar K.V wrote:
> "Aneesh Kumar K.V"  writes:
> 
> > On 5/20/21 6:16 PM, Peter Xu wrote:
> >> On Thu, May 20, 2021 at 01:56:54PM +0530, Aneesh Kumar K.V wrote:
> >>>> This seems to work at least for my userfaultfd test on shmem, however I
> >>>> don't fully understand the commit message [1] on: How do we guarantee
> >>>> we're not moving a thp pte?
> >>>>
> >>>
> >>> move_page_tables() checks for pmd_trans_huge() and ends up calling
> >>> move_huge_pmd if it is a THP entry.
> >> 
> >> Sorry to be unclear: what if a huge pud thp?
> >> 
> >
> > I am still checking. Looking at the code before commit
> > c49dd340180260c6239e453263a9a244da9a7c85, I don't see the kernel handling
> > huge pud thp. I haven't studied huge pud thp enough to understand whether
> > c49dd340180260c6239e453263a9a244da9a7c85 intended to add that support.
> >
> > We can do a move_huge_pud() like we do for huge pmd thp. But I am not
> > sure whether we handle those VMAs earlier and restrict mremap on them?
> 
> Something like this? (not even compile tested). I am still not sure
> whether this is really needed or whether we handle DAX VMAs in some other
> form.

Yeah maybe (you may want to at least drop that extra "case HPAGE_PUD").

It's just that with CONFIG_HAVE_MOVE_PUD (x86 and arm64 enable it by default
so far) it does seem to work even with a huge pud, while after this patch it
no longer seems to work, even with your follow-up fix.

Indeed, I saw that CONFIG_HAVE_MOVE_PUD was introduced only a few months ago,
so breaking someone seems unlikely; perhaps there is no real user yet that
mremap()s a huge pud for dax or whatever backend?

Ideally maybe rework this patch (or series?) and repost it for a better review?
Agree the risk seems low.  I'll leave that to you and Andrew to decide..
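
FWIW, a rough sketch of what such a move_huge_pud() could look like, modeled
directly on move_huge_pmd() (untested, names mirror the pmd variants, for
discussion only):

static bool move_huge_pud(struct vm_area_struct *vma, unsigned long old_addr,
			  unsigned long new_addr, pud_t *old_pud, pud_t *new_pud)
{
	spinlock_t *old_ptl, *new_ptl;
	struct mm_struct *mm = vma->vm_mm;
	pud_t pud;

	/* The destination pud shouldn't be established; free_pgtables()
	 * should have released it. */
	if (WARN_ON_ONCE(!pud_none(*new_pud)))
		return false;

	/* Exclusive mmap_lock prevents deadlock, so the ordering of the
	 * two ptlocks doesn't matter. */
	old_ptl = pud_lock(mm, old_pud);
	new_ptl = pud_lockptr(mm, new_pud);
	if (new_ptl != old_ptl)
		spin_lock_nested(new_ptl, SINGLE_DEPTH_NESTING);

	pud = *old_pud;
	pud_clear(old_pud);

	set_pud_at(mm, new_addr, new_pud, pud);
	flush_pud_tlb_range(vma, old_addr, old_addr + HPAGE_PUD_SIZE);

	if (new_ptl != old_ptl)
		spin_unlock(new_ptl);
	spin_unlock(old_ptl);

	return true;
}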

-- 
Peter Xu



Re: [PATCH v5 3/9] mm/mremap: Use pmd/pud_populate to update page table entries

2021-05-20 Thread Peter Xu
On Thu, May 20, 2021 at 01:56:54PM +0530, Aneesh Kumar K.V wrote:
> > This seems to work at least for my userfaultfd test on shmem, however I
> > don't fully understand the commit message [1] on: How do we guarantee
> > we're not moving a thp pte?
> > 
> 
> move_page_tables() checks for pmd_trans_huge() and ends up calling
> move_huge_pmd if it is a THP entry.

Sorry to be unclear: what if a huge pud thp?
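
(For context: the pmd_trans_huge() check Aneesh refers to lives in
move_page_tables(); heavily simplified it looks roughly like the below, and
there is no analogous check at the pud level:)

	for (; old_addr < old_end; old_addr += extent, new_addr += extent) {
		old_pmd = get_old_pmd(vma->vm_mm, old_addr);
		if (!old_pmd)
			continue;
		new_pmd = alloc_new_pmd(vma->vm_mm, vma, new_addr);
		if (!new_pmd)
			break;
		if (is_swap_pmd(*old_pmd) || pmd_trans_huge(*old_pmd)) {
			/* THP entry: moved as a whole pmd */
			if (extent == HPAGE_PMD_SIZE)
				move_huge_pmd(vma, old_addr, new_addr,
					      old_pmd, new_pmd);
			continue;
		}
		/* otherwise move the individual ptes under this pmd */
		move_ptes(vma, old_pmd, old_addr, old_addr + extent,
			  new_vma, new_pmd, new_addr, need_rmap_locks);
	}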

-- 
Peter Xu



Re: [PATCH v5 3/9] mm/mremap: Use pmd/pud_populate to update page table entries

2021-05-19 Thread Peter Xu
On Wed, May 19, 2021 at 10:16:07AM +0530, Aneesh Kumar K.V wrote:
> > On Thu, Apr 22, 2021 at 11:13:17AM +0530, Aneesh Kumar K.V wrote:
> >> pmd/pud_populate is the right interface to be used to set the respective
> >> page table entries. Some architectures like ppc64 do assume that
> >> set_pmd/pud_at can only be used to set a hugepage PTE. Since we are not
> >> setting up a hugepage PTE here, use the pmd/pud_populate interface.

[1]

> Can you try this change?
> 
> modified   mm/mremap.c
> @@ -279,7 +279,7 @@ static bool move_normal_pmd(struct vm_area_struct *vma, unsigned long old_addr,
>   pmd_clear(old_pmd);
>  
>   VM_BUG_ON(!pmd_none(*new_pmd));
> - pmd_populate(mm, new_pmd, (pgtable_t)pmd_page_vaddr(pmd));
> + pmd_populate(mm, new_pmd, pmd_pgtable(pmd));
>  
>   if (new_ptl != old_ptl)
>   spin_unlock(new_ptl);

I reported this issue today somewhere else:

https://lore.kernel.org/linux-mm/YKVemB5DuSqLFmmz@t490s/

And came to this same line after the bisection.

This seems to work at least for my userfaultfd test on shmem, however I don't
fully understand the commit message [1] on: How do we guarantee we're not
moving a thp pte?

-- 
Peter Xu



[PATCH v5 16/25] mm/powerpc: Use general page fault accounting

2020-07-07 Thread Peter Xu
Use the general page fault accounting by passing regs into handle_mm_fault().
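
(The general accounting referred to is performed inside handle_mm_fault()
when a non-NULL regs is passed; a paraphrased sketch of the mm/memory.c
helper added earlier in this series, not the exact code:)

static inline void mm_account_fault(struct pt_regs *regs, unsigned long address,
				    unsigned int flags, vm_fault_t ret)
{
	bool major;

	/* Don't account errors or retries; the fault is counted only once */
	if (ret & (VM_FAULT_ERROR | VM_FAULT_RETRY))
		return;

	major = (ret & VM_FAULT_MAJOR) || (flags & FAULT_FLAG_TRIED);

	if (major)
		current->maj_flt++;
	else
		current->min_flt++;

	/* Perf events still need regs; e.g. gup passes regs == NULL */
	if (!regs)
		return;

	if (major)
		perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MAJ, 1, regs, address);
	else
		perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MIN, 1, regs, address);
}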

CC: Michael Ellerman 
CC: Benjamin Herrenschmidt 
CC: Paul Mackerras 
CC: linuxppc-dev@lists.ozlabs.org
Acked-by: Michael Ellerman 
Signed-off-by: Peter Xu 
---
 arch/powerpc/mm/fault.c | 11 +++
 1 file changed, 3 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
index 25dee001d8e1..00259e9b452d 100644
--- a/arch/powerpc/mm/fault.c
+++ b/arch/powerpc/mm/fault.c
@@ -607,7 +607,7 @@ static int __do_page_fault(struct pt_regs *regs, unsigned long address,
 * make sure we exit gracefully rather than endlessly redo
 * the fault.
 */
-   fault = handle_mm_fault(vma, address, flags, NULL);
+   fault = handle_mm_fault(vma, address, flags, regs);
 
major |= fault & VM_FAULT_MAJOR;
 
@@ -633,14 +633,9 @@ static int __do_page_fault(struct pt_regs *regs, unsigned long address,
/*
 * Major/minor page fault accounting.
 */
-   if (major) {
-   current->maj_flt++;
-   perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MAJ, 1, regs, address);
+   if (major)
cmo_account_page_fault();
-   } else {
-   current->min_flt++;
-   perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MIN, 1, regs, address);
-   }
+
return 0;
 }
 NOKPROBE_SYMBOL(__do_page_fault);
-- 
2.26.2



[PATCH v4 16/26] mm/powerpc: Use general page fault accounting

2020-06-30 Thread Peter Xu
Use the general page fault accounting by passing regs into handle_mm_fault().

CC: Michael Ellerman 
CC: Benjamin Herrenschmidt 
CC: Paul Mackerras 
CC: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Peter Xu 
---
 arch/powerpc/mm/fault.c | 11 +++
 1 file changed, 3 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
index 25dee001d8e1..00259e9b452d 100644
--- a/arch/powerpc/mm/fault.c
+++ b/arch/powerpc/mm/fault.c
@@ -607,7 +607,7 @@ static int __do_page_fault(struct pt_regs *regs, unsigned long address,
 * make sure we exit gracefully rather than endlessly redo
 * the fault.
 */
-   fault = handle_mm_fault(vma, address, flags, NULL);
+   fault = handle_mm_fault(vma, address, flags, regs);
 
major |= fault & VM_FAULT_MAJOR;
 
@@ -633,14 +633,9 @@ static int __do_page_fault(struct pt_regs *regs, unsigned long address,
/*
 * Major/minor page fault accounting.
 */
-   if (major) {
-   current->maj_flt++;
-   perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MAJ, 1, regs, address);
+   if (major)
cmo_account_page_fault();
-   } else {
-   current->min_flt++;
-   perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MIN, 1, regs, address);
-   }
+
return 0;
 }
 NOKPROBE_SYMBOL(__do_page_fault);
-- 
2.26.2



[PATCH 16/26] mm/powerpc: Use general page fault accounting

2020-06-19 Thread Peter Xu
Use the general page fault accounting by passing regs into handle_mm_fault().

CC: Michael Ellerman 
CC: Benjamin Herrenschmidt 
CC: Paul Mackerras 
CC: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Peter Xu 
---
 arch/powerpc/mm/fault.c | 11 +++
 1 file changed, 3 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
index 992b10c3761c..e325d13efaf5 100644
--- a/arch/powerpc/mm/fault.c
+++ b/arch/powerpc/mm/fault.c
@@ -563,7 +563,7 @@ static int __do_page_fault(struct pt_regs *regs, unsigned long address,
 * make sure we exit gracefully rather than endlessly redo
 * the fault.
 */
-   fault = handle_mm_fault(vma, address, flags, NULL);
+   fault = handle_mm_fault(vma, address, flags, regs);
 
 #ifdef CONFIG_PPC_MEM_KEYS
/*
@@ -604,14 +604,9 @@ static int __do_page_fault(struct pt_regs *regs, unsigned long address,
/*
 * Major/minor page fault accounting.
 */
-   if (major) {
-   current->maj_flt++;
-   perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MAJ, 1, regs, address);
+   if (major)
cmo_account_page_fault();
-   } else {
-   current->min_flt++;
-   perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MIN, 1, regs, address);
-   }
+
return 0;
 }
 NOKPROBE_SYMBOL(__do_page_fault);
-- 
2.26.2



[PATCH 17/25] mm/powerpc: Use mm_fault_accounting()

2020-06-15 Thread Peter Xu
Use the new mm_fault_accounting() helper for page fault accounting.

cmo_account_page_fault() is special.  Keep that.
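
(mm_fault_accounting() is introduced earlier in this series; its shape,
inferred from the call site below -- a sketch, not the exact patch:)

void mm_fault_accounting(struct task_struct *task, struct pt_regs *regs,
			 unsigned long address, bool major)
{
	/* One PERF_COUNT_SW_PAGE_FAULTS event per fault... */
	perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, regs, address);

	/* ...plus the major/minor breakdown and the task counters */
	if (major) {
		task->maj_flt++;
		perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MAJ, 1, regs, address);
	} else {
		task->min_flt++;
		perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MIN, 1, regs, address);
	}
}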

CC: Michael Ellerman 
CC: Benjamin Herrenschmidt 
CC: Paul Mackerras 
CC: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Peter Xu 
---
 arch/powerpc/mm/fault.c | 13 -
 1 file changed, 4 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
index 84af6c8eecf7..6043b639ae42 100644
--- a/arch/powerpc/mm/fault.c
+++ b/arch/powerpc/mm/fault.c
@@ -481,8 +481,6 @@ static int __do_page_fault(struct pt_regs *regs, unsigned long address,
if (!arch_irq_disabled_regs(regs))
local_irq_enable();
 
-   perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, regs, address);
-
if (error_code & DSISR_KEYFAULT)
return bad_key_fault_exception(regs, address,
   get_mm_addr_key(mm, address));
@@ -604,14 +602,11 @@ static int __do_page_fault(struct pt_regs *regs, unsigned long address,
/*
 * Major/minor page fault accounting.
 */
-   if (major) {
-   current->maj_flt++;
-   perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MAJ, 1, regs, address);
+   if (major)
cmo_account_page_fault();
-   } else {
-   current->min_flt++;
-   perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MIN, 1, regs, address);
-   }
+
+   mm_fault_accounting(current, regs, address, major);
+
return 0;
 }
 NOKPROBE_SYMBOL(__do_page_fault);
-- 
2.26.2



Re: [PATCH v2 4/4] hugetlbfs: clean up command line processing

2020-04-14 Thread Peter Xu
On Mon, Apr 13, 2020 at 10:59:26AM -0700, Mike Kravetz wrote:
> On 4/10/20 1:37 PM, Peter Xu wrote:
> > On Wed, Apr 01, 2020 at 11:38:19AM -0700, Mike Kravetz wrote:
> >> With all hugetlb page processing done in a single file, clean up the code.
> >> - Make code match desired semantics
> >>   - Update documentation with semantics
> >> - Make all warnings and errors messages start with 'HugeTLB:'.
> >> - Consistently name command line parsing routines.
> >> - Check for hugepages_supported() before processing parameters.
> >> - Add comments to code
> >>   - Describe some of the subtle interactions
> >>   - Describe semantics of command line arguments
> >>
> >> Signed-off-by: Mike Kravetz 
> >> ---
> >>  .../admin-guide/kernel-parameters.txt | 35 ---
> >>  Documentation/admin-guide/mm/hugetlbpage.rst  | 44 +
> >>  mm/hugetlb.c  | 96 +++
> >>  3 files changed, 142 insertions(+), 33 deletions(-)
> >>
> >> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> >> index 1bd5454b5e5f..de653cfe1726 100644
> >> --- a/Documentation/admin-guide/kernel-parameters.txt
> >> +++ b/Documentation/admin-guide/kernel-parameters.txt
> >> @@ -832,12 +832,15 @@
> >>See also Documentation/networking/decnet.txt.
> >>  
> >>default_hugepagesz=
> >> -  [same as hugepagesz=] The size of the default
> >> -  HugeTLB page size. This is the size represented by
> >> -  the legacy /proc/ hugepages APIs, used for SHM, and
> >> -  default size when mounting hugetlbfs filesystems.
> >> -  Defaults to the default architecture's huge page size
> >> -  if not specified.
> >> +  [HW] The size of the default HugeTLB page size. This
> > 
> > Could I ask what's "HW"?  Sorry this is not a comment at all but
> > really a pure question I wanted to ask... :)
> 
> kernel-parameters.rst includes kernel-parameters.txt and includes the
> meaning of these codes.
> 
> HW  Appropriate hardware is enabled.
> 
> Previously, it listed an obsolete list of architectures.

I see. It was a bit confusing since hugepages are not real hardware;
"CAP (capability)" might be easier, but I get the point now, thanks!

[...]

> >> diff --git a/Documentation/admin-guide/mm/hugetlbpage.rst b/Documentation/admin-guide/mm/hugetlbpage.rst
> >> index 1cc0bc78d10e..de340c586995 100644
> >> --- a/Documentation/admin-guide/mm/hugetlbpage.rst
> >> +++ b/Documentation/admin-guide/mm/hugetlbpage.rst
> >> @@ -100,6 +100,50 @@ with a huge page size selection parameter "hugepagesz=<size>".  <size> must
> >>  be specified in bytes with optional scale suffix [kKmMgG].  The default huge
> >>  page size may be selected with the "default_hugepagesz=<size>" boot parameter.
> >>  
> >> +Hugetlb boot command line parameter semantics
> >> +hugepagesz - Specify a huge page size.  Used in conjunction with hugepages
> >> +  parameter to preallocate a number of huge pages of the specified
> >> +  size.  Hence, hugepagesz and hugepages are typically specified in
> >> +  pairs such as:
> >> +  hugepagesz=2M hugepages=512
> >> +  hugepagesz can only be specified once on the command line for a
> >> +  specific huge page size.  Valid huge page sizes are architecture
> >> +  dependent.
> >> +hugepages - Specify the number of huge pages to preallocate.  This typically
> >> +  follows a valid hugepagesz parameter.  However, if hugepages is the
> >> +  first or only hugetlb command line parameter it specifies the number
> >> +  of huge pages of default size to allocate.  The number of huge pages
> >> +  of default size specified in this manner can be overwritten by a
> >> +  hugepagesz,hugepages parameter pair for the default size.
> >> +  For example, on an architecture with 2M default huge page size:
> >> +  hugepages=256 hugepagesz=2M hugepages=512
> >> +  will result in 512 2M huge pages being allocated.  If a hugepages
> >> +  parameter is preceded by an invalid hugepagesz parameter, it will
> >> +  be ignored.
> >> +default_hugepagesz - Specify the default huge page size.  This parameter can
> >> + 

Re: [PATCH v2 4/4] hugetlbfs: clean up command line processing

2020-04-10 Thread Peter Xu
> +/*
> + * hugepagesz command line processing
> + * A specific huge page size can only be specified once with hugepagesz.
> + * hugepagesz is followed by hugepages on the command line.  The global
> + * variable 'parsed_valid_hugepagesz' is used to determine if prior
> + * hugepagesz argument was valid.
> + */
>  static int __init hugepagesz_setup(char *s)
>  {
>   unsigned long size;
>  
> + if (!hugepages_supported()) {
> + pr_warn("HugeTLB: huge pages not supported, ignoring hugepagesz 
> = %s\n", s);
> + return 0;
> + }
> +
>   size = (unsigned long)memparse(s, NULL);
>  
>   if (!arch_hugetlb_valid_size(size)) {
> @@ -3329,19 +3368,31 @@ static int __init hugepagesz_setup(char *s)
>   }
>  
>   if (size_to_hstate(size)) {
> + parsed_valid_hugepagesz = false;
>   pr_warn("HugeTLB: hugepagesz %s specified twice, ignoring\n", 
> s);
>   return 0;
>   }
>  
> + parsed_valid_hugepagesz = true;
>   hugetlb_add_hstate(ilog2(size) - PAGE_SHIFT);
>   return 1;
>  }
>  __setup("hugepagesz=", hugepagesz_setup);
>  
> +/*
> + * default_hugepagesz command line input
> + * Only one instance of default_hugepagesz allowed on command line.  Do not
> + * add hstate here as that will confuse hugepagesz/hugepages processing.
> + */
>  static int __init default_hugepagesz_setup(char *s)
>  {
>   unsigned long size;
>  
> + if (!hugepages_supported()) {
> + pr_warn("HugeTLB: huge pages not supported, ignoring 
> default_hugepagesz = %s\n", s);
> + return 0;
> + }
> +
>   size = (unsigned long)memparse(s, NULL);
>  
>   if (!arch_hugetlb_valid_size(size)) {
> @@ -3349,6 +3400,11 @@ static int __init default_hugepagesz_setup(char *s)
>   return 0;
>   }
>  
> + if (default_hstate_size) {
> + pr_err("HugeTLB: default_hugepagesz previously specified, 
> ignoring %s\n", s);
> + return 0;
> + }

Nitpick: ideally this can be moved before memparse().
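
I.e. something like this (untested, just to illustrate the reordering):

static int __init default_hugepagesz_setup(char *s)
{
	unsigned long size;

	if (!hugepages_supported()) {
		pr_warn("HugeTLB: huge pages not supported, ignoring default_hugepagesz = %s\n", s);
		return 0;
	}

	/* reject a repeated default_hugepagesz before parsing the argument */
	if (default_hstate_size) {
		pr_err("HugeTLB: default_hugepagesz previously specified, ignoring %s\n", s);
		return 0;
	}

	size = (unsigned long)memparse(s, NULL);
	if (!arch_hugetlb_valid_size(size)) {
		/* ... unchanged ... */
		return 0;
	}

	default_hstate_size = size;
	return 1;
}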

Thanks,

> +
>   default_hstate_size = size;
>   return 1;
>  }
> -- 
> 2.25.1
> 
> 

-- 
Peter Xu



Re: [PATCH v2 3/4] hugetlbfs: remove hugetlb_add_hstate() warning for existing hstate

2020-04-10 Thread Peter Xu
On Wed, Apr 01, 2020 at 11:38:18AM -0700, Mike Kravetz wrote:

[...]

> @@ -3255,7 +3254,6 @@ void __init hugetlb_add_hstate(unsigned int order)
>   unsigned long i;
>  
>   if (size_to_hstate(PAGE_SIZE << order)) {
> - pr_warn("hugepagesz= specified twice, ignoring\n");
>   return;
>   }

Nitpick: I think the braces need to be removed to follow Linux coding
style.  With that:

Reviewed-by: Peter Xu 

-- 
Peter Xu



Re: [PATCH v2 2/4] hugetlbfs: move hugepagesz= parsing to arch independent code

2020-04-10 Thread Peter Xu
On Wed, Apr 01, 2020 at 11:38:17AM -0700, Mike Kravetz wrote:
> Now that architectures provide arch_hugetlb_valid_size(), parsing
> of "hugepagesz=" can be done in architecture independent code.
> Create a single routine to handle hugepagesz= parsing and remove
> all arch specific routines.  We can also remove the interface
> hugetlb_bad_size() as this is no longer used outside arch independent
> code.
> 
> This also provides consistent behavior of hugetlbfs command line
> options.  The hugepagesz= option should only be specified once for
> a specific size, but some architectures allow multiple instances.
> This appears to be more of an oversight when code was added by some
> architectures to set up all huge page sizes.
> 
> Signed-off-by: Mike Kravetz 

This could change the error messages for a wrong setup on some archs, but I
guess it's not a big deal; even to capture errors, people will mostly just
look for error lines in general..

Reviewed-by: Peter Xu 

-- 
Peter Xu



Re: [PATCH v2 1/4] hugetlbfs: add arch_hugetlb_valid_size

2020-04-10 Thread Peter Xu
On Wed, Apr 01, 2020 at 11:38:16AM -0700, Mike Kravetz wrote:
> diff --git a/arch/arm64/include/asm/hugetlb.h b/arch/arm64/include/asm/hugetlb.h
> index 2eb6c234d594..81606223494f 100644
> --- a/arch/arm64/include/asm/hugetlb.h
> +++ b/arch/arm64/include/asm/hugetlb.h
> @@ -59,6 +59,8 @@ extern void huge_pte_clear(struct mm_struct *mm, unsigned long addr,
>  extern void set_huge_swap_pte_at(struct mm_struct *mm, unsigned long addr,
>pte_t *ptep, pte_t pte, unsigned long sz);
>  #define set_huge_swap_pte_at set_huge_swap_pte_at
> +bool __init arch_hugetlb_valid_size(unsigned long size);
> +#define arch_hugetlb_valid_size arch_hugetlb_valid_size

Sorry for chiming in late.

Since we're working on removing arch-dependent code after all.. I'm
thinking whether we can define arch_hugetlb_valid_size() once in a
common header (e.g. linux/hugetlb.h), then in mm/hugetlb.c:

bool __init __attribute__((weak)) arch_hugetlb_valid_size(unsigned long size)
{
return size == HPAGE_SIZE;
}

We can simply redefine arch_hugetlb_valid_size() in arch specific C
files where we want to override the default.  Would that be slightly
cleaner?
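
An arch would then just provide its own definition where needed, e.g. (an
illustrative, untested sketch loosely based on arm64's size checks):

/* e.g. in arch/arm64/mm/hugetlbpage.c */
bool __init arch_hugetlb_valid_size(unsigned long size)
{
	switch (size) {
#ifdef CONFIG_ARM64_4K_PAGES
	case PUD_SIZE:
#endif
	case CONT_PMD_SIZE:
	case PMD_SIZE:
	case CONT_PTE_SIZE:
		return true;
	}

	return false;
}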

Thanks,

-- 
Peter Xu



[PATCH] powerpc/powernv/npu: Remove redundant change_pte() hook

2019-01-31 Thread Peter Xu
The change_pte() notifier was designed to be used as a quick path to update
secondary MMU PTEs on write permission changes or PFN changes.  For KVM, it
can reduce vm-exits when a vcpu faults on pages that were touched up by KSM.
It is not meant for cache invalidations: after all, the notifier is called
before the real PTE update (see set_pte_at_notify(), where set_pte_at() is
called afterwards).

All the necessary cache invalidation should already be done in
invalidate_range().
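
(For reference, set_pte_at_notify() in include/linux/mmu_notifier.h is
roughly the below; the change_pte notifier fires before the primary PTE is
installed:)

#define set_pte_at_notify(__mm, __address, __ptep, __pte)		\
({									\
	struct mm_struct *___mm = __mm;					\
	unsigned long ___address = __address;				\
	pte_t ___pte = __pte;						\
									\
	mmu_notifier_change_pte(___mm, ___address, ___pte);		\
	set_pte_at(___mm, ___address, __ptep, ___pte);			\
})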

CC: Benjamin Herrenschmidt 
CC: Paul Mackerras 
CC: Michael Ellerman 
CC: Alistair Popple 
CC: Alexey Kardashevskiy 
CC: Mark Hairgrove 
CC: Balbir Singh 
CC: David Gibson 
CC: Andrea Arcangeli 
CC: Jerome Glisse 
CC: Jason Wang 
CC: linuxppc-dev@lists.ozlabs.org
CC: linux-ker...@vger.kernel.org
Signed-off-by: Peter Xu 
---
 arch/powerpc/platforms/powernv/npu-dma.c | 10 --
 1 file changed, 10 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/npu-dma.c b/arch/powerpc/platforms/powernv/npu-dma.c
index 3f58c7dbd581..c003b29d870e 100644
--- a/arch/powerpc/platforms/powernv/npu-dma.c
+++ b/arch/powerpc/platforms/powernv/npu-dma.c
@@ -917,15 +917,6 @@ static void pnv_npu2_mn_release(struct mmu_notifier *mn,
mmio_invalidate(npu_context, 0, ~0UL);
 }
 
-static void pnv_npu2_mn_change_pte(struct mmu_notifier *mn,
-   struct mm_struct *mm,
-   unsigned long address,
-   pte_t pte)
-{
-   struct npu_context *npu_context = mn_to_npu_context(mn);
-   mmio_invalidate(npu_context, address, PAGE_SIZE);
-}
-
 static void pnv_npu2_mn_invalidate_range(struct mmu_notifier *mn,
struct mm_struct *mm,
unsigned long start, unsigned long end)
@@ -936,7 +927,6 @@ static void pnv_npu2_mn_invalidate_range(struct mmu_notifier *mn,
 
 static const struct mmu_notifier_ops nv_nmmu_notifier_ops = {
.release = pnv_npu2_mn_release,
-   .change_pte = pnv_npu2_mn_change_pte,
.invalidate_range = pnv_npu2_mn_invalidate_range,
 };
 
-- 
2.17.1