[PATCH] mm: remove zap_page_range and create zap_vma_pages

2023-01-03 Thread Mike Kravetz
zap_page_range was originally designed to unmap pages within an address
range that could span multiple vmas.  While working on [1], it was
discovered that all callers of zap_page_range pass a range entirely within
a single vma.  In addition, the mmu notification call within
zap_page_range does not correctly handle ranges that span multiple vmas.  When
crossing a vma boundary, a new mmu_notifier_range_init/end call pair
with the new vma should be made.

Instead of fixing zap_page_range, do the following:
- Create a new routine zap_vma_pages() that will remove all pages within
  the passed vma.  Most users of zap_page_range pass the entire vma and
  can use this new routine.
- For callers of zap_page_range not passing the entire vma, instead call
  zap_page_range_single().
- Remove zap_page_range.
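
For reference, the new helper amounts to a thin wrapper around
zap_page_range_single() covering the whole vma.  A minimal sketch (the
exact form and placement in the diff may differ):

static inline void zap_vma_pages(struct vm_area_struct *vma)
{
	/* zap every page mapped by this vma, with no zap details */
	zap_page_range_single(vma, vma->vm_start,
			      vma->vm_end - vma->vm_start, NULL);
}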

[1] 
https://lore.kernel.org/linux-mm/20221114235507.294320-2-mike.krav...@oracle.com/
Suggested-by: Peter Xu 
Signed-off-by: Mike Kravetz 
---
RFC->v1 Created zap_vma_pages to zap entire vma (Christoph Hellwig)
Did not add Acked-by's as routine was changed.

 arch/arm64/kernel/vdso.c|  6 ++---
 arch/powerpc/kernel/vdso.c  |  4 +---
 arch/powerpc/platforms/book3s/vas-api.c |  2 +-
 arch/powerpc/platforms/pseries/vas.c|  3 +--
 arch/riscv/kernel/vdso.c|  6 ++---
 arch/s390/kernel/vdso.c |  4 +---
 arch/s390/mm/gmap.c |  2 +-
 arch/x86/entry/vdso/vma.c   |  4 +---
 drivers/android/binder_alloc.c  |  2 +-
 include/linux/mm.h  |  7 --
 mm/memory.c | 30 -
 mm/page-writeback.c |  2 +-
 net/ipv4/tcp.c  |  7 +++---
 13 files changed, 21 insertions(+), 58 deletions(-)

diff --git a/arch/arm64/kernel/vdso.c b/arch/arm64/kernel/vdso.c
index e59a32aa0c49..0119dc91abb5 100644
--- a/arch/arm64/kernel/vdso.c
+++ b/arch/arm64/kernel/vdso.c
@@ -138,13 +138,11 @@ int vdso_join_timens(struct task_struct *task, struct 
time_namespace *ns)
mmap_read_lock(mm);
 
for_each_vma(vmi, vma) {
-   unsigned long size = vma->vm_end - vma->vm_start;
-
if (vma_is_special_mapping(vma, vdso_info[VDSO_ABI_AA64].dm))
-   zap_page_range(vma, vma->vm_start, size);
+   zap_vma_pages(vma);
 #ifdef CONFIG_COMPAT_VDSO
if (vma_is_special_mapping(vma, vdso_info[VDSO_ABI_AA32].dm))
-   zap_page_range(vma, vma->vm_start, size);
+   zap_vma_pages(vma);
 #endif
}
 
diff --git a/arch/powerpc/kernel/vdso.c b/arch/powerpc/kernel/vdso.c
index 507f8228f983..7a2ff9010f17 100644
--- a/arch/powerpc/kernel/vdso.c
+++ b/arch/powerpc/kernel/vdso.c
@@ -120,10 +120,8 @@ int vdso_join_timens(struct task_struct *task, struct 
time_namespace *ns)
 
mmap_read_lock(mm);
for_each_vma(vmi, vma) {
-   unsigned long size = vma->vm_end - vma->vm_start;
-
if (vma_is_special_mapping(vma, &vvar_spec))
-   zap_page_range(vma, vma->vm_start, size);
+   zap_vma_pages(vma);
}
mmap_read_unlock(mm);
 
diff --git a/arch/powerpc/platforms/book3s/vas-api.c 
b/arch/powerpc/platforms/book3s/vas-api.c
index eb5bed333750..9580e8e12165 100644
--- a/arch/powerpc/platforms/book3s/vas-api.c
+++ b/arch/powerpc/platforms/book3s/vas-api.c
@@ -414,7 +414,7 @@ static vm_fault_t vas_mmap_fault(struct vm_fault *vmf)
/*
 * When the LPAR lost credits due to core removal or during
 * migration, invalidate the existing mapping for the current
-* paste addresses and set windows in-active (zap_page_range in
+* paste addresses and set windows in-active (zap_vma_pages in
 * reconfig_close_windows()).
 * New mapping will be done later after migration or new credits
 * available. So continue to receive faults if the user space
diff --git a/arch/powerpc/platforms/pseries/vas.c 
b/arch/powerpc/platforms/pseries/vas.c
index 4ad6e510d405..559112312810 100644
--- a/arch/powerpc/platforms/pseries/vas.c
+++ b/arch/powerpc/platforms/pseries/vas.c
@@ -760,8 +760,7 @@ static int reconfig_close_windows(struct vas_caps *vcap, 
int excess_creds,
 * is done before the original mmap() and after the ioctl.
 */
if (vma)
-   zap_page_range(vma, vma->vm_start,
-   vma->vm_end - vma->vm_start);
+   zap_vma_pages(vma);
 
mmap_write_unlock(task_ref->mm);
mutex_unlock(&task_ref->mmap_mutex);
diff --git a/arch/riscv/kernel/vdso.c b/arch/riscv/kernel/vdso.c
index e410275918ac..5c30212d8d1c 100644
--- a/arch/riscv/kernel/vdso.c
+++ b/arch/riscv/kernel/vdso.c
@@ -124,13 +124,11 @@ int vdso_join_timens(struct task_s

Re: [RFC PATCH] mm: remove zap_page_range and change callers to use zap_vma_page_range

2022-12-23 Thread Mike Kravetz
On 12/23/22 08:27, Christoph Hellwig wrote:
> > unsigned long size = vma->vm_end - vma->vm_start;
> >  
> > if (vma_is_special_mapping(vma, vdso_info[VDSO_ABI_AA64].dm))
> > -   zap_page_range(vma, vma->vm_start, size);
> > +   zap_vma_page_range(vma, vma->vm_start, size);
> >  #ifdef CONFIG_COMPAT_VDSO
> > if (vma_is_special_mapping(vma, vdso_info[VDSO_ABI_AA32].dm))
> > -   zap_page_range(vma, vma->vm_start, size);
> > +   zap_vma_page_range(vma, vma->vm_start, size);
> >  #endif
> 
> So for something called zap_vma_page_range I'd expect to just pass
> the vma and zap all of it, which this and many other callers want
> anyway.
> 
> > +++ b/arch/s390/mm/gmap.c
> > @@ -722,7 +722,7 @@ void gmap_discard(struct gmap *gmap, unsigned long 
> > from, unsigned long to)
> > if (is_vm_hugetlb_page(vma))
> > continue;
> > size = min(to - gaddr, PMD_SIZE - (gaddr & ~PMD_MASK));
> > -   zap_page_range(vma, vmaddr, size);
> > +   zap_vma_page_range(vma, vmaddr, size);
> 
> And then just call zap_page_range_single directly for those that
> don't want to zap the entire vma.

Thanks!

This sounds like a good idea and I will incorporate in a new patch.

-- 
Mike Kravetz


Re: [RFC PATCH] mm: remove zap_page_range and change callers to use zap_vma_page_range

2022-12-19 Thread Mike Kravetz
On 12/19/22 13:06, Michal Hocko wrote:
> On Fri 16-12-22 11:20:12, Mike Kravetz wrote:
> > zap_page_range was originally designed to unmap pages within an address
> > range that could span multiple vmas.  While working on [1], it was
> > discovered that all callers of zap_page_range pass a range entirely within
> > a single vma.  In addition, the mmu notification call within
> > zap_page_range does not correctly handle ranges that span multiple vmas as calls
> > should be vma specific.
> 
> Could you spend a sentence or two explaining what is wrong here?

Hmm?  My assumption was that the range passed to mmu_notifier_range_init()
was supposed to be within the specified vma.  When looking into the notifier
routines, I could not find any documentation about the usage of the vma within
the mmu_notifier_range structure.  It was introduced with commit bf198b2b34bf
"mm/mmu_notifier: pass down vma and reasons why mmu notifier is happening".
However, I do not see this being used today.

Of course, I could be missing something, so adding Jérôme.

> 
> > Instead of fixing zap_page_range, change all callers to use the new
> > routine zap_vma_page_range.  zap_vma_page_range is just a wrapper around
> > zap_page_range_single passing in NULL zap details.  The name is also
> > more in line with other exported routines that operate within a vma.
> > We can then remove zap_page_range.
> 
> I would stick with zap_page_range_single rather than adding a new
> wrapper but nothing really critical.

I am fine with doing that as well.  My only reason for the wrapper is that all 
callers outside mm/memory.c would pass in NULL zap details.

> 
> > Also, change madvise_dontneed_single_vma to use this new routine.
> > 
> > [1] 
> > https://lore.kernel.org/linux-mm/20221114235507.294320-2-mike.krav...@oracle.com/
> > Suggested-by: Peter Xu 
> > Signed-off-by: Mike Kravetz 
> 
> Other than that LGTM
> Acked-by: Michal Hocko 
> 
> Thanks!

Thanks for taking a look.
-- 
Mike Kravetz


[RFC PATCH] mm: remove zap_page_range and change callers to use zap_vma_page_range

2022-12-16 Thread Mike Kravetz
zap_page_range was originally designed to unmap pages within an address
range that could span multiple vmas.  While working on [1], it was
discovered that all callers of zap_page_range pass a range entirely within
a single vma.  In addition, the mmu notification call within
zap_page_range does not correctly handle ranges that span multiple vmas as calls
should be vma specific.

Instead of fixing zap_page_range, change all callers to use the new
routine zap_vma_page_range.  zap_vma_page_range is just a wrapper around
zap_page_range_single passing in NULL zap details.  The name is also
more in line with other exported routines that operate within a vma.
We can then remove zap_page_range.

Also, change madvise_dontneed_single_vma to use this new routine.
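
For clarity, the proposed wrapper is roughly the following (sketch only,
assuming NULL zap details as described above):

static inline void zap_vma_page_range(struct vm_area_struct *vma,
		unsigned long address, unsigned long size)
{
	/* zap a range within a single vma, with no zap details */
	zap_page_range_single(vma, address, size, NULL);
}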

[1] 
https://lore.kernel.org/linux-mm/20221114235507.294320-2-mike.krav...@oracle.com/
Suggested-by: Peter Xu 
Signed-off-by: Mike Kravetz 
---
 arch/arm64/kernel/vdso.c|  4 ++--
 arch/powerpc/kernel/vdso.c  |  2 +-
 arch/powerpc/platforms/book3s/vas-api.c |  2 +-
 arch/powerpc/platforms/pseries/vas.c|  2 +-
 arch/riscv/kernel/vdso.c|  4 ++--
 arch/s390/kernel/vdso.c |  2 +-
 arch/s390/mm/gmap.c |  2 +-
 arch/x86/entry/vdso/vma.c   |  2 +-
 drivers/android/binder_alloc.c  |  2 +-
 include/linux/mm.h  |  7 --
 mm/madvise.c|  4 ++--
 mm/memory.c | 30 -
 mm/page-writeback.c |  2 +-
 net/ipv4/tcp.c  |  6 ++---
 14 files changed, 22 insertions(+), 49 deletions(-)

diff --git a/arch/arm64/kernel/vdso.c b/arch/arm64/kernel/vdso.c
index e59a32aa0c49..a7b10e182f78 100644
--- a/arch/arm64/kernel/vdso.c
+++ b/arch/arm64/kernel/vdso.c
@@ -141,10 +141,10 @@ int vdso_join_timens(struct task_struct *task, struct 
time_namespace *ns)
unsigned long size = vma->vm_end - vma->vm_start;
 
if (vma_is_special_mapping(vma, vdso_info[VDSO_ABI_AA64].dm))
-   zap_page_range(vma, vma->vm_start, size);
+   zap_vma_page_range(vma, vma->vm_start, size);
 #ifdef CONFIG_COMPAT_VDSO
if (vma_is_special_mapping(vma, vdso_info[VDSO_ABI_AA32].dm))
-   zap_page_range(vma, vma->vm_start, size);
+   zap_vma_page_range(vma, vma->vm_start, size);
 #endif
}
 
diff --git a/arch/powerpc/kernel/vdso.c b/arch/powerpc/kernel/vdso.c
index 507f8228f983..479d70fe8c55 100644
--- a/arch/powerpc/kernel/vdso.c
+++ b/arch/powerpc/kernel/vdso.c
@@ -123,7 +123,7 @@ int vdso_join_timens(struct task_struct *task, struct 
time_namespace *ns)
unsigned long size = vma->vm_end - vma->vm_start;
 
if (vma_is_special_mapping(vma, &vvar_spec))
-   zap_page_range(vma, vma->vm_start, size);
+   zap_vma_page_range(vma, vma->vm_start, size);
}
mmap_read_unlock(mm);
 
diff --git a/arch/powerpc/platforms/book3s/vas-api.c 
b/arch/powerpc/platforms/book3s/vas-api.c
index eb5bed333750..8f57388b760b 100644
--- a/arch/powerpc/platforms/book3s/vas-api.c
+++ b/arch/powerpc/platforms/book3s/vas-api.c
@@ -414,7 +414,7 @@ static vm_fault_t vas_mmap_fault(struct vm_fault *vmf)
/*
 * When the LPAR lost credits due to core removal or during
 * migration, invalidate the existing mapping for the current
-* paste addresses and set windows in-active (zap_page_range in
+* paste addresses and set windows in-active (zap_vma_page_range in
 * reconfig_close_windows()).
 * New mapping will be done later after migration or new credits
 * available. So continue to receive faults if the user space
diff --git a/arch/powerpc/platforms/pseries/vas.c 
b/arch/powerpc/platforms/pseries/vas.c
index 4ad6e510d405..2aef8d9295a2 100644
--- a/arch/powerpc/platforms/pseries/vas.c
+++ b/arch/powerpc/platforms/pseries/vas.c
@@ -760,7 +760,7 @@ static int reconfig_close_windows(struct vas_caps *vcap, 
int excess_creds,
 * is done before the original mmap() and after the ioctl.
 */
if (vma)
-   zap_page_range(vma, vma->vm_start,
+   zap_vma_page_range(vma, vma->vm_start,
vma->vm_end - vma->vm_start);
 
mmap_write_unlock(task_ref->mm);
diff --git a/arch/riscv/kernel/vdso.c b/arch/riscv/kernel/vdso.c
index e410275918ac..a405119da2c0 100644
--- a/arch/riscv/kernel/vdso.c
+++ b/arch/riscv/kernel/vdso.c
@@ -127,10 +127,10 @@ int vdso_join_timens(struct task_struct *task, struct 
time_namespace *ns)
unsigned long size = vma->vm_end - vma->vm_start;
 
if (vma_is_special_mapping(vma, vdso_info.dm))
-   

Re: [PATCH v4] hugetlb: simplify hugetlb handling in follow_page_mask

2022-10-30 Thread Mike Kravetz
On 10/30/22 15:45, Peter Xu wrote:
> On Fri, Oct 28, 2022 at 11:11:08AM -0700, Mike Kravetz wrote:
> > +   } else {
> > +   if (is_hugetlb_entry_migration(entry)) {
> > +   spin_unlock(ptl);
> > +   hugetlb_vma_unlock_read(vma);
> 
> Just noticed it when pulled the last mm-unstable: this line seems to be a
> left-over of v3, while not needed now?
> 
> > +   __migration_entry_wait_huge(pte, ptl);
> > +   goto retry;
> > +   }

Thanks Peter!

Sent v5 with that line removed.

-- 
Mike Kravetz


[PATCH v5] hugetlb: simplify hugetlb handling in follow_page_mask

2022-10-30 Thread Mike Kravetz
During discussions of this series [1], it was suggested that hugetlb
handling code in follow_page_mask could be simplified.  At the beginning
of follow_page_mask, there currently is a call to follow_huge_addr which
'may' handle hugetlb pages.  ia64 is the only architecture which provides
a follow_huge_addr routine that does not return error.  Instead, at each
level of the page table a check is made for a hugetlb entry.  If a hugetlb
entry is found, a call to a routine associated with that entry is made.

Currently, there are two checks for hugetlb entries at each page table
level.  The first check is of the form:
if (p?d_huge())
page = follow_huge_p?d();
the second check is of the form:
if (is_hugepd())
page = follow_huge_pd().

We can replace these checks, as well as the special handling routines
such as follow_huge_p?d() and follow_huge_pd() with a single routine to
handle hugetlb vmas.

A new routine hugetlb_follow_page_mask is called for hugetlb vmas at the
beginning of follow_page_mask.  hugetlb_follow_page_mask will use the
existing routine huge_pte_offset to walk page tables looking for hugetlb
entries.  huge_pte_offset can be overwritten by architectures, and already
handles special cases such as hugepd entries.
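
The overall shape of the new routine is roughly as follows.  This is a
simplified sketch only; the actual patch additionally deals with
FOLL_PIN/FOLL_GET reference grabbing, migration entries and (in some
versions of this series) vma locking:

struct page *hugetlb_follow_page_mask(struct vm_area_struct *vma,
		unsigned long address, unsigned int flags)
{
	struct hstate *h = hstate_vma(vma);
	struct mm_struct *mm = vma->vm_mm;
	unsigned long haddr = address & huge_page_mask(h);
	struct page *page = NULL;
	spinlock_t *ptl;
	pte_t *pte;

	/* huge_pte_offset() walks the page table, including hugepd cases */
	pte = huge_pte_offset(mm, haddr, huge_page_size(h));
	if (!pte)
		return NULL;

	ptl = huge_pte_lock(h, mm, pte);
	if (pte_present(huge_ptep_get(pte))) {
		/* offset of the requested address within the huge page */
		page = pte_page(huge_ptep_get(pte)) +
			((address & ~huge_page_mask(h)) >> PAGE_SHIFT);
		if (flags & FOLL_GET)
			get_page(page);
	}
	spin_unlock(ptl);
	return page;
}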

[1] 
https://lore.kernel.org/linux-mm/cover.1661240170.git.baolin.w...@linux.alibaba.com/

Suggested-by: David Hildenbrand 
Signed-off-by: Mike Kravetz 
---
v5 -Remove left over hugetlb_vma_unlock_read
v4 -Remove vma (pmd sharing) locking as this can be called with
FOLL_NOWAIT. Peter
v3 -Change WARN_ON_ONCE() to BUILD_BUG() as reminded by Christophe Leroy
v2 -Added WARN_ON_ONCE() and updated comment as suggested by David
Fixed build issue found by kernel test robot
Added vma (pmd sharing) locking to hugetlb_follow_page_mask
ReBased on Baolin's patch to fix issues with CONT_* entries

 arch/ia64/mm/hugetlbpage.c|  15 ---
 arch/powerpc/mm/hugetlbpage.c |  37 
 include/linux/hugetlb.h   |  50 ++
 mm/gup.c  |  80 +++-
 mm/hugetlb.c  | 172 +++---
 5 files changed, 76 insertions(+), 278 deletions(-)

diff --git a/arch/ia64/mm/hugetlbpage.c b/arch/ia64/mm/hugetlbpage.c
index f993cb36c062..380d2f3966c9 100644
--- a/arch/ia64/mm/hugetlbpage.c
+++ b/arch/ia64/mm/hugetlbpage.c
@@ -91,21 +91,6 @@ int prepare_hugepage_range(struct file *file,
return 0;
 }
 
-struct page *follow_huge_addr(struct mm_struct *mm, unsigned long addr, int 
write)
-{
-   struct page *page;
-   pte_t *ptep;
-
-   if (REGION_NUMBER(addr) != RGN_HPAGE)
-   return ERR_PTR(-EINVAL);
-
-   ptep = huge_pte_offset(mm, addr, HPAGE_SIZE);
-   if (!ptep || pte_none(*ptep))
-   return NULL;
-   page = pte_page(*ptep);
-   page += ((addr & ~HPAGE_MASK) >> PAGE_SHIFT);
-   return page;
-}
 int pmd_huge(pmd_t pmd)
 {
return 0;
diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c
index 5852a86d990d..f1ba8d1e8c1a 100644
--- a/arch/powerpc/mm/hugetlbpage.c
+++ b/arch/powerpc/mm/hugetlbpage.c
@@ -506,43 +506,6 @@ void hugetlb_free_pgd_range(struct mmu_gather *tlb,
} while (addr = next, addr != end);
 }
 
-struct page *follow_huge_pd(struct vm_area_struct *vma,
-   unsigned long address, hugepd_t hpd,
-   int flags, int pdshift)
-{
-   pte_t *ptep;
-   spinlock_t *ptl;
-   struct page *page = NULL;
-   unsigned long mask;
-   int shift = hugepd_shift(hpd);
-   struct mm_struct *mm = vma->vm_mm;
-
-retry:
-   /*
-* hugepage directory entries are protected by mm->page_table_lock
-* Use this instead of huge_pte_lockptr
-*/
-   ptl = &mm->page_table_lock;
-   spin_lock(ptl);
-
-   ptep = hugepte_offset(hpd, address, pdshift);
-   if (pte_present(*ptep)) {
-   mask = (1UL << shift) - 1;
-   page = pte_page(*ptep);
-   page += ((address & mask) >> PAGE_SHIFT);
-   if (flags & FOLL_GET)
-   get_page(page);
-   } else {
-   if (is_hugetlb_entry_migration(*ptep)) {
-   spin_unlock(ptl);
-   __migration_entry_wait(mm, ptep, ptl);
-   goto retry;
-   }
-   }
-   spin_unlock(ptl);
-   return page;
-}
-
 bool __init arch_hugetlb_valid_size(unsigned long size)
 {
int shift = __ffs(size);
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 8b4f93e84868..4a76c0fc6bbf 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -149,6 +149,8 @@ int move_hugetlb_page_tables(struct vm_area_struct *vma,
 unsigned long len);
 int copy_hugetlb_page_range(struct mm_struct *, struct mm_struc

[PATCH v4] hugetlb: simplify hugetlb handling in follow_page_mask

2022-10-28 Thread Mike Kravetz
During discussions of this series [1], it was suggested that hugetlb
handling code in follow_page_mask could be simplified.  At the beginning
of follow_page_mask, there currently is a call to follow_huge_addr which
'may' handle hugetlb pages.  ia64 is the only architecture which provides
a follow_huge_addr routine that does not return error.  Instead, at each
level of the page table a check is made for a hugetlb entry.  If a hugetlb
entry is found, a call to a routine associated with that entry is made.

Currently, there are two checks for hugetlb entries at each page table
level.  The first check is of the form:
if (p?d_huge())
page = follow_huge_p?d();
the second check is of the form:
if (is_hugepd())
page = follow_huge_pd().

We can replace these checks, as well as the special handling routines
such as follow_huge_p?d() and follow_huge_pd() with a single routine to
handle hugetlb vmas.

A new routine hugetlb_follow_page_mask is called for hugetlb vmas at the
beginning of follow_page_mask.  hugetlb_follow_page_mask will use the
existing routine huge_pte_offset to walk page tables looking for hugetlb
entries.  huge_pte_offset can be overwritten by architectures, and already
handles special cases such as hugepd entries.

[1] 
https://lore.kernel.org/linux-mm/cover.1661240170.git.baolin.w...@linux.alibaba.com/

Suggested-by: David Hildenbrand 
Signed-off-by: Mike Kravetz 
---
v4 -Remove vma (pmd sharing) locking as this can be called with
FOLL_NOWAIT. Peter
v3 -Change WARN_ON_ONCE() to BUILD_BUG() as reminded by Christophe Leroy
v2 -Added WARN_ON_ONCE() and updated comment as suggested by David
Fixed build issue found by kernel test robot
Added vma (pmd sharing) locking to hugetlb_follow_page_mask
ReBased on Baolin's patch to fix issues with CONT_* entries

 arch/ia64/mm/hugetlbpage.c|  15 ---
 arch/powerpc/mm/hugetlbpage.c |  37 
 include/linux/hugetlb.h   |  50 ++
 mm/gup.c  |  80 +++-
 mm/hugetlb.c  | 173 +++---
 5 files changed, 77 insertions(+), 278 deletions(-)

diff --git a/arch/ia64/mm/hugetlbpage.c b/arch/ia64/mm/hugetlbpage.c
index f993cb36c062..380d2f3966c9 100644
--- a/arch/ia64/mm/hugetlbpage.c
+++ b/arch/ia64/mm/hugetlbpage.c
@@ -91,21 +91,6 @@ int prepare_hugepage_range(struct file *file,
return 0;
 }
 
-struct page *follow_huge_addr(struct mm_struct *mm, unsigned long addr, int 
write)
-{
-   struct page *page;
-   pte_t *ptep;
-
-   if (REGION_NUMBER(addr) != RGN_HPAGE)
-   return ERR_PTR(-EINVAL);
-
-   ptep = huge_pte_offset(mm, addr, HPAGE_SIZE);
-   if (!ptep || pte_none(*ptep))
-   return NULL;
-   page = pte_page(*ptep);
-   page += ((addr & ~HPAGE_MASK) >> PAGE_SHIFT);
-   return page;
-}
 int pmd_huge(pmd_t pmd)
 {
return 0;
diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c
index 5852a86d990d..f1ba8d1e8c1a 100644
--- a/arch/powerpc/mm/hugetlbpage.c
+++ b/arch/powerpc/mm/hugetlbpage.c
@@ -506,43 +506,6 @@ void hugetlb_free_pgd_range(struct mmu_gather *tlb,
} while (addr = next, addr != end);
 }
 
-struct page *follow_huge_pd(struct vm_area_struct *vma,
-   unsigned long address, hugepd_t hpd,
-   int flags, int pdshift)
-{
-   pte_t *ptep;
-   spinlock_t *ptl;
-   struct page *page = NULL;
-   unsigned long mask;
-   int shift = hugepd_shift(hpd);
-   struct mm_struct *mm = vma->vm_mm;
-
-retry:
-   /*
-* hugepage directory entries are protected by mm->page_table_lock
-* Use this instead of huge_pte_lockptr
-*/
-   ptl = &mm->page_table_lock;
-   spin_lock(ptl);
-
-   ptep = hugepte_offset(hpd, address, pdshift);
-   if (pte_present(*ptep)) {
-   mask = (1UL << shift) - 1;
-   page = pte_page(*ptep);
-   page += ((address & mask) >> PAGE_SHIFT);
-   if (flags & FOLL_GET)
-   get_page(page);
-   } else {
-   if (is_hugetlb_entry_migration(*ptep)) {
-   spin_unlock(ptl);
-   __migration_entry_wait(mm, ptep, ptl);
-   goto retry;
-   }
-   }
-   spin_unlock(ptl);
-   return page;
-}
-
 bool __init arch_hugetlb_valid_size(unsigned long size)
 {
int shift = __ffs(size);
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 8b4f93e84868..4a76c0fc6bbf 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -149,6 +149,8 @@ int move_hugetlb_page_tables(struct vm_area_struct *vma,
 unsigned long len);
 int copy_hugetlb_page_range(struct mm_struct *, struct mm_struct *,
struct vm_area_struct *,

Re: [PATCH v3] hugetlb: simplify hugetlb handling in follow_page_mask

2022-10-28 Thread Mike Kravetz
On 10/27/22 15:34, Peter Xu wrote:
> On Wed, Oct 26, 2022 at 05:34:04PM -0700, Mike Kravetz wrote:
> > On 10/26/22 17:59, Peter Xu wrote:
> 
> If we want to use the vma read lock to protect here as the slow gup path,
> then please check again with below [1] - I think we'll also need to protect
> it with fast-gup (probably with trylock only, because fast-gup cannot
> sleep) or it'll encounter the same race, iiuc.
> 
> Actually, instead of using vma lock, I really think this is another problem
> and needs standalone fixing.  The problem is we allows huge_pte_offset() to
> walk the process pgtable without any protection, while pmd unsharing can
> drop a page anytime.  huge_pte_offset() is always facing use-after-free
> when walking the PUD page.
> 
> We may want RCU lock to protect the pgtable pages from getting away when
> huge_pte_offset() is walking it, it'll be safe then because pgtable pages
> are released in RCU fashion only (e.g. in above example, process [2] will
> munmap() and release the last ref to the "used to be shared" pmd and the
> PUD that maps the shared pmds will be released only after a RCU grace
> period), and afaict that's also what's protecting fast-gup from accessing
> freed pgtable pages.
> 
> If with all huge_pte_offset() callers becoming RCU-safe, then IIUC we can
> drop the vma lock in all GUP code, aka, in hugetlb_follow_page_mask() here,
> because both slow and fast gup should be safe too in the same manner.
> 
> Thanks,
> 
> > > IIUC it's also the same as fast-gup - afaiu we don't take the read vma 
> > > lock
> > > in fast-gup too but I also think it's safe.  But I hope I didn't miss
> > > something.
> 
> [1]

Thanks Peter!  I think the best thing would be to eliminate the vma_lock
calls in this patch.  The code it is replacing/simplifying does not do any
locking, so no real regression.

I think a scheme like you describe above is going to require some more
thought/work.  It might be better as a follow on patch.
-- 
Mike Kravetz


Re: [PATCH v3] hugetlb: simplify hugetlb handling in follow_page_mask

2022-10-26 Thread Mike Kravetz
On 10/26/22 17:59, Peter Xu wrote:
> Hi, Mike,
> 
> On Sun, Sep 18, 2022 at 07:13:48PM -0700, Mike Kravetz wrote:
> > +struct page *hugetlb_follow_page_mask(struct vm_area_struct *vma,
> > +   unsigned long address, unsigned int flags)
> > +{
> > +   struct hstate *h = hstate_vma(vma);
> > +   struct mm_struct *mm = vma->vm_mm;
> > +   unsigned long haddr = address & huge_page_mask(h);
> > +   struct page *page = NULL;
> > +   spinlock_t *ptl;
> > +   pte_t *pte, entry;
> > +
> > +   /*
> > +* FOLL_PIN is not supported for follow_page(). Ordinary GUP goes via
> > +* follow_hugetlb_page().
> > +*/
> > +   if (WARN_ON_ONCE(flags & FOLL_PIN))
> > +   return NULL;
> > +
> > +retry:
> > +   /*
> > +* vma lock prevents racing with another thread doing a pmd unshare.
> > +* This keeps pte as returned by huge_pte_offset valid.
> > +*/
> > +   hugetlb_vma_lock_read(vma);
> 
> I'm not sure whether it's okay to take a rwsem here, as the code can be
> called by e.g. FOLL_NOWAIT?

I think you are right.  This is possible even though it is not called this
way today.

> I'm wondering whether it's fine to just drop this anyway, just always walk
> it lockless.  IIUC gup callers should be safe here because the worst case
> is the caller will fetch a wrong page, but then it should be invalidated
> very soon with mmu notifiers.  One thing worth mention is that pmd unshare
> should never free a pgtable page.

You are correct in that pmd unshare will not directly free a pgtable page.
However, I think a 'very worst case' race could be caused by two threads(1,2)
in the same process A, and another process B.  Processes A and B share a PMD.
- Process A thread 1 gets a *ptep via huge_pte_offset and gets scheduled out.
- Process A thread 2 calls mprotect to change protection and unshares
  the PMD shared with process B.
- Process B then unmaps the PMD shared with process A and the PMD page
  gets deleted.
- The *ptep in Process A thread 1 then points into a freed page.
This is VERY unlikely, but I do think it is possible and is the reason I
may be overcautious about protecting against races with pmd unshare.

-- 
Mike Kravetz

> 
> IIUC it's also the same as fast-gup - afaiu we don't take the read vma lock
> in fast-gup too but I also think it's safe.  But I hope I didn't miss
> something.
> 
> -- 
> Peter Xu
> 


Re: [RFC PATCH] fs/hugetlb: Fix UBSAN warning reported on hugetlb

2022-10-26 Thread Mike Kravetz
On 10/26/22 10:50, Aristeu Rozanski wrote:
> On Thu, Sep 08, 2022 at 10:29:59PM +0530, Aneesh Kumar K V wrote:
> > On 9/8/22 10:23 PM, Matthew Wilcox wrote:
> > > On Thu, Sep 08, 2022 at 12:56:59PM +0530, Aneesh Kumar K.V wrote:
> > >> +++ b/fs/dax.c
> > >> @@ -1304,7 +1304,7 @@ EXPORT_SYMBOL_GPL(dax_zero_range);
> > >>  int dax_truncate_page(struct inode *inode, loff_t pos, bool *did_zero,
> > >>  const struct iomap_ops *ops)
> > >>  {
> > >> -unsigned int blocksize = i_blocksize(inode);
> > >> +size_t blocksize = i_blocksize(inode);
> > >>  unsigned int off = pos & (blocksize - 1);
> > > 
> > > If blocksize is larger than 4GB, then off also needs to be size_t.
> > > 
> > >> +++ b/fs/iomap/buffered-io.c
> > >> @@ -955,7 +955,7 @@ int
> > >>  iomap_truncate_page(struct inode *inode, loff_t pos, bool *did_zero,
> > >>  const struct iomap_ops *ops)
> > >>  {
> > >> -unsigned int blocksize = i_blocksize(inode);
> > >> +size_t blocksize = i_blocksize(inode);
> > >>  unsigned int off = pos & (blocksize - 1);
> > > 
> > > Ditto.
> > > 
> > > (maybe there are others; I didn't check closely)
> > 
> > Thanks. will check those. 
> > 
> > Any feedback on statx? Should we really fix that?
> > 
> > I am still not clear why we chose to set blocksize = pagesize for hugetlbfs.
> > Was that done to enable application find the hugetlb pagesize via stat()? 
> 
> I'd like to know that as well. It'd be easier to just limit the hugetlbfs max
> blocksize to 4GB. It's very unlikely anything else will use such large
> blocksizes and having to introduce new user interfaces for it doesn't sound
> right.

I was not around hugetlbfs when the decision was made to set 'blocksize =
pagesize'.  However, I must say that it does seem to make sense as you
can only add or remove entire hugetlb pages from a hugetlbfs file.  So,
the hugetlb page size does seem to correspond to the meaning of filesystem
blocksize.

Does any application code make use of this?  I can not make a guess.
-- 
Mike Kravetz


Re: [powerpc] Kernel crash with THP tests (next-20220920)

2022-09-21 Thread Mike Kravetz
On 09/21/22 12:00, Sachin Sant wrote:
> While running transparent huge page tests [1] against 6.0.0-rc6-next-20220920
> following crash is seen on IBM Power server.

Thanks Sachin,

Naoya reported this, with my analysis here:
https://lore.kernel.org/linux-mm/YyqCS6+OXAgoqI8T@monkey/

An updated version of the patch was posted here,
https://lore.kernel.org/linux-mm/20220921202702.106069-1-mike.krav...@oracle.com/

Sorry about that,
-- 
Mike Kravetz

> 
> Kernel attempted to read user page (34) - exploit attempt? (uid: 0)
> BUG: Kernel NULL pointer dereference on read at 0x0034
> Faulting instruction address: 0xc04d2744
> Oops: Kernel access of bad area, sig: 11 [#1]
> LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
> Modules linked in: dm_mod(E) bonding(E) rfkill(E) tls(E) sunrpc(E) nd_pmem(E) 
> nd_btt(E) dax_pmem(E) papr_scm(E) libnvdimm(E) pseries_rng(E) vmx_crypto(E) 
> ext4(E) mbcache(E) jbd2(E) sd_mod(E) t10_pi(E) crc64_rocksoft(E) crc64(E) 
> sg(E) ibmvscsi(E) scsi_transport_srp(E) ibmveth(E) fuse(E)
> CPU: 37 PID: 2219255 Comm: sysctl Tainted: GE  
> 6.0.0-rc6-next-20220920 #1
> NIP:  c04d2744 LR: c04d2734 CTR: 
> REGS: c012801bf660 TRAP: 0300   Tainted: GE   
> (6.0.0-rc6-next-20220920)
> MSR:  80009033   CR: 24048222  XER: 2004
> CFAR: c04b0eac DAR: 0034 DSISR: 4000 IRQMASK: 0 
> GPR00: c04d2734 c012801bf900 c2a92300  
> GPR04: c2ac8ac0 c1209340 0005 c01286714b80 
> GPR08: 0034    
> GPR12: 28048242 c0167fff6b00   
> GPR16:     
> GPR20: c012801bfae8 0001 0100 0001 
> GPR24: c012801bfae8 c2ac8ac0 0002 0005 
> GPR28:  0001  00346cca 
> NIP [c04d2744] alloc_buddy_huge_page+0xd4/0x240
> LR [c04d2734] alloc_buddy_huge_page+0xc4/0x240
> Call Trace:
> [c012801bf900] [c04d2734] alloc_buddy_huge_page+0xc4/0x240 
> (unreliable)
> [c012801bf9b0] [c04d46a4] 
> alloc_fresh_huge_page.part.72+0x214/0x2a0
> [c012801bfa40] [c04d7f88] alloc_pool_huge_page+0x118/0x190
> [c012801bfa90] [c04d84dc] __nr_hugepages_store_common+0x4dc/0x610
> [c012801bfb70] [c04d88bc] 
> hugetlb_sysctl_handler_common+0x13c/0x180
> [c012801bfc10] [c06380e0] proc_sys_call_handler+0x210/0x350
> [c012801bfc90] [c0551c00] vfs_write+0x2e0/0x460
> [c012801bfd50] [c0551f5c] ksys_write+0x7c/0x140
> [c012801bfda0] [c0033f58] system_call_exception+0x188/0x3f0
> [c012801bfe10] [c000c53c] system_call_common+0xec/0x270
> --- interrupt: c00 at 0x7fffa9520c34
> NIP:  7fffa9520c34 LR: 0001024754bc CTR: 
> REGS: c012801bfe80 TRAP: 0c00   Tainted: GE   
> (6.0.0-rc6-next-20220920)
> MSR:  8280f033   CR: 28002202  
> XER: 
> IRQMASK: 0 
> GPR00: 0004 7fffccd76cd0 7fffa9607300 0003 
> GPR04: 000138da6970 0006 fff6  
> GPR08: 000138da6970    
> GPR12:  7fffa9a40940   
> GPR16:     
> GPR20:     
> GPR24: 0001 0010 0006 000138da8aa0 
> GPR28: 7fffa95fc2c8 000138da8aa0 0006 000138da6930 
> NIP [7fffa9520c34] 0x7fffa9520c34
> LR [0001024754bc] 0x1024754bc
> --- interrupt: c00
> Instruction dump:
> 3b42 3ba1 3b80 7f26cb78 7fc5f378 7f64db78 7fe3fb78 4bfde5b9 
> 6000 7c691b78 39030034 7c0004ac <7d404028> 7c0ae800 40c20010 7f80412d 
> ---[ end trace  ]---
> 
> Kernel panic - not syncing: Fatal exception
> 
> Bisect points to following patch:
> commit f2f3c25dea3acfb17aecb7273541e7266dfc8842
> hugetlb: freeze allocated pages before creating hugetlb pages
> 
> Reverting the patch allows the test to run successfully.
> 
> Thanks
> - Sachin
> 
> [1] 
> https://github.com/avocado-framework-tests/avocado-misc-tests/blob/master/memory/transparent_hugepages_defrag.py


[PATCH v3] hugetlb: simplify hugetlb handling in follow_page_mask

2022-09-18 Thread Mike Kravetz
During discussions of this series [1], it was suggested that hugetlb
handling code in follow_page_mask could be simplified.  At the beginning
of follow_page_mask, there currently is a call to follow_huge_addr which
'may' handle hugetlb pages.  ia64 is the only architecture which provides
a follow_huge_addr routine that does not return error.  Instead, at each
level of the page table a check is made for a hugetlb entry.  If a hugetlb
entry is found, a call to a routine associated with that entry is made.

Currently, there are two checks for hugetlb entries at each page table
level.  The first check is of the form:
if (p?d_huge())
page = follow_huge_p?d();
the second check is of the form:
if (is_hugepd())
page = follow_huge_pd().

We can replace these checks, as well as the special handling routines
such as follow_huge_p?d() and follow_huge_pd() with a single routine to
handle hugetlb vmas.

A new routine hugetlb_follow_page_mask is called for hugetlb vmas at the
beginning of follow_page_mask.  hugetlb_follow_page_mask will use the
existing routine huge_pte_offset to walk page tables looking for hugetlb
entries.  huge_pte_offset can be overwritten by architectures, and already
handles special cases such as hugepd entries.

[1] 
https://lore.kernel.org/linux-mm/cover.1661240170.git.baolin.w...@linux.alibaba.com/

Suggested-by: David Hildenbrand 
Signed-off-by: Mike Kravetz 
---
v3 -Change WARN_ON_ONCE() to BUILD_BUG() as reminded by Christophe Leroy
v2 -Added WARN_ON_ONCE() and updated comment as suggested by David
Fixed build issue found by kernel test robot
Added vma (pmd sharing) locking to hugetlb_follow_page_mask
ReBased on Baolin's patch to fix issues with CONT_* entries

 arch/ia64/mm/hugetlbpage.c|  15 ---
 arch/powerpc/mm/hugetlbpage.c |  37 ---
 include/linux/hugetlb.h   |  50 ++
 mm/gup.c  |  80 +++
 mm/hugetlb.c  | 182 --
 5 files changed, 86 insertions(+), 278 deletions(-)

diff --git a/arch/ia64/mm/hugetlbpage.c b/arch/ia64/mm/hugetlbpage.c
index f993cb36c062..380d2f3966c9 100644
--- a/arch/ia64/mm/hugetlbpage.c
+++ b/arch/ia64/mm/hugetlbpage.c
@@ -91,21 +91,6 @@ int prepare_hugepage_range(struct file *file,
return 0;
 }
 
-struct page *follow_huge_addr(struct mm_struct *mm, unsigned long addr, int 
write)
-{
-   struct page *page;
-   pte_t *ptep;
-
-   if (REGION_NUMBER(addr) != RGN_HPAGE)
-   return ERR_PTR(-EINVAL);
-
-   ptep = huge_pte_offset(mm, addr, HPAGE_SIZE);
-   if (!ptep || pte_none(*ptep))
-   return NULL;
-   page = pte_page(*ptep);
-   page += ((addr & ~HPAGE_MASK) >> PAGE_SHIFT);
-   return page;
-}
 int pmd_huge(pmd_t pmd)
 {
return 0;
diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c
index bc84a594ca62..b0e037c75c12 100644
--- a/arch/powerpc/mm/hugetlbpage.c
+++ b/arch/powerpc/mm/hugetlbpage.c
@@ -506,43 +506,6 @@ void hugetlb_free_pgd_range(struct mmu_gather *tlb,
} while (addr = next, addr != end);
 }
 
-struct page *follow_huge_pd(struct vm_area_struct *vma,
-   unsigned long address, hugepd_t hpd,
-   int flags, int pdshift)
-{
-   pte_t *ptep;
-   spinlock_t *ptl;
-   struct page *page = NULL;
-   unsigned long mask;
-   int shift = hugepd_shift(hpd);
-   struct mm_struct *mm = vma->vm_mm;
-
-retry:
-   /*
-* hugepage directory entries are protected by mm->page_table_lock
-* Use this instead of huge_pte_lockptr
-*/
-   ptl = &mm->page_table_lock;
-   spin_lock(ptl);
-
-   ptep = hugepte_offset(hpd, address, pdshift);
-   if (pte_present(*ptep)) {
-   mask = (1UL << shift) - 1;
-   page = pte_page(*ptep);
-   page += ((address & mask) >> PAGE_SHIFT);
-   if (flags & FOLL_GET)
-   get_page(page);
-   } else {
-   if (is_hugetlb_entry_migration(*ptep)) {
-   spin_unlock(ptl);
-   __migration_entry_wait(mm, ptep, ptl);
-   goto retry;
-   }
-   }
-   spin_unlock(ptl);
-   return page;
-}
-
 bool __init arch_hugetlb_valid_size(unsigned long size)
 {
int shift = __ffs(size);
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index cfe15b32e2d4..32d45e96a894 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -149,6 +149,8 @@ int move_hugetlb_page_tables(struct vm_area_struct *vma,
 unsigned long len);
 int copy_hugetlb_page_range(struct mm_struct *, struct mm_struct *,
struct vm_area_struct *, struct vm_area_struct *);
+struct page *hugetlb_follow_page_mask(struct vm_area_struct *vma,
+ 

Re: [PATCH v2] hugetlb: simplify hugetlb handling in follow_page_mask

2022-09-06 Thread Mike Kravetz
On 09/05/22 06:34, Christophe Leroy wrote:
> 
> 
> Le 02/09/2022 à 21:03, Mike Kravetz a écrit :
> > diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> > index fe4944f89d34..275e554dd365 100644
> > --- a/include/linux/hugetlb.h
> > +++ b/include/linux/hugetlb.h
> > @@ -264,6 +255,13 @@ static inline void 
> > adjust_range_if_pmd_sharing_possible(
> >   {
> >   }
> >   
> > +static inline struct page *hugetlb_follow_page_mask(struct vm_area_struct 
> > *vma,
> > +   unsigned long address, unsigned int flags)
> > +{
> > +   WARN_ON_ONCE(1); /* should never be called if !CONFIG_HUGETLB_PAGE*/
> > +   return ERR_PTR(-EINVAL);
> 
> This function is called only when is_vm_hugetlb_page() is true.
> 
> When !CONFIG_HUGETLB_PAGE is_vm_hugetlb_page() always returns false, so 
> the call to hugetlb_follow_page_mask() should never be compiled in.
> 
> Use BUILD_BUG() to catch it at buildtime.
> 

Yes.  My bad as David suggested this previously.
How about we just leave out the function in the !CONFIG_HUGETLB_PAGE case?
We will get build errors without the need for a BUILD_BUG().

> > +}
> > +
> >   static inline long follow_hugetlb_page(struct mm_struct *mm,
> >         struct vm_area_struct *vma, struct page **pages,
> > struct vm_area_struct **vmas, unsigned long *position,

-- 
Mike Kravetz


[PATCH v2] hugetlb: simplify hugetlb handling in follow_page_mask

2022-09-02 Thread Mike Kravetz
During discussions of this series [1], it was suggested that hugetlb
handling code in follow_page_mask could be simplified.  At the beginning
of follow_page_mask, there currently is a call to follow_huge_addr which
'may' handle hugetlb pages.  ia64 is the only architecture which provides
a follow_huge_addr routine that does not return error.  Instead, at each
level of the page table a check is made for a hugetlb entry.  If a hugetlb
entry is found, a call to a routine associated with that entry is made.

Currently, there are two checks for hugetlb entries at each page table
level.  The first check is of the form:
if (p?d_huge())
page = follow_huge_p?d();
the second check is of the form:
if (is_hugepd())
page = follow_huge_pd().

We can replace these checks, as well as the special handling routines
such as follow_huge_p?d() and follow_huge_pd() with a single routine to
handle hugetlb vmas.

A new routine hugetlb_follow_page_mask is called for hugetlb vmas at the
beginning of follow_page_mask.  hugetlb_follow_page_mask will use the
existing routine huge_pte_offset to walk page tables looking for hugetlb
entries.  huge_pte_offset can be overwritten by architectures, and already
handles special cases such as hugepd entries.

[1] 
https://lore.kernel.org/linux-mm/cover.1661240170.git.baolin.w...@linux.alibaba.com/

Suggested-by: David Hildenbrand 
Signed-off-by: Mike Kravetz 
---
v2 -Added WARN_ON_ONCE() and updated comment as suggested by David
Fixed build issue found by kernel test robot
Added vma (pmd sharing) locking to hugetlb_follow_page_mask
ReBased on Baolin's patch to fix issues with CONT_* entries

 arch/ia64/mm/hugetlbpage.c|  15 ---
 arch/powerpc/mm/hugetlbpage.c |  37 ---
 include/linux/hugetlb.h   |  51 ++
 mm/gup.c  |  80 +++
 mm/hugetlb.c  | 182 --
 5 files changed, 87 insertions(+), 278 deletions(-)

diff --git a/arch/ia64/mm/hugetlbpage.c b/arch/ia64/mm/hugetlbpage.c
index f993cb36c062..380d2f3966c9 100644
--- a/arch/ia64/mm/hugetlbpage.c
+++ b/arch/ia64/mm/hugetlbpage.c
@@ -91,21 +91,6 @@ int prepare_hugepage_range(struct file *file,
return 0;
 }
 
-struct page *follow_huge_addr(struct mm_struct *mm, unsigned long addr, int 
write)
-{
-   struct page *page;
-   pte_t *ptep;
-
-   if (REGION_NUMBER(addr) != RGN_HPAGE)
-   return ERR_PTR(-EINVAL);
-
-   ptep = huge_pte_offset(mm, addr, HPAGE_SIZE);
-   if (!ptep || pte_none(*ptep))
-   return NULL;
-   page = pte_page(*ptep);
-   page += ((addr & ~HPAGE_MASK) >> PAGE_SHIFT);
-   return page;
-}
 int pmd_huge(pmd_t pmd)
 {
return 0;
diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c
index bc84a594ca62..b0e037c75c12 100644
--- a/arch/powerpc/mm/hugetlbpage.c
+++ b/arch/powerpc/mm/hugetlbpage.c
@@ -506,43 +506,6 @@ void hugetlb_free_pgd_range(struct mmu_gather *tlb,
} while (addr = next, addr != end);
 }
 
-struct page *follow_huge_pd(struct vm_area_struct *vma,
-   unsigned long address, hugepd_t hpd,
-   int flags, int pdshift)
-{
-   pte_t *ptep;
-   spinlock_t *ptl;
-   struct page *page = NULL;
-   unsigned long mask;
-   int shift = hugepd_shift(hpd);
-   struct mm_struct *mm = vma->vm_mm;
-
-retry:
-   /*
-* hugepage directory entries are protected by mm->page_table_lock
-* Use this instead of huge_pte_lockptr
-*/
-   ptl = &mm->page_table_lock;
-   spin_lock(ptl);
-
-   ptep = hugepte_offset(hpd, address, pdshift);
-   if (pte_present(*ptep)) {
-   mask = (1UL << shift) - 1;
-   page = pte_page(*ptep);
-   page += ((address & mask) >> PAGE_SHIFT);
-   if (flags & FOLL_GET)
-   get_page(page);
-   } else {
-   if (is_hugetlb_entry_migration(*ptep)) {
-   spin_unlock(ptl);
-   __migration_entry_wait(mm, ptep, ptl);
-   goto retry;
-   }
-   }
-   spin_unlock(ptl);
-   return page;
-}
-
 bool __init arch_hugetlb_valid_size(unsigned long size)
 {
int shift = __ffs(size);
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index fe4944f89d34..275e554dd365 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -142,6 +142,8 @@ int move_hugetlb_page_tables(struct vm_area_struct *vma,
 unsigned long len);
 int copy_hugetlb_page_range(struct mm_struct *, struct mm_struct *,
struct vm_area_struct *, struct vm_area_struct *);
+struct page *hugetlb_follow_page_mask(struct vm_area_struct *vma,
+   unsigned long address, unsigned int flags);
 lo

Re: [PATCH v6 1/2] mm: migration: fix the FOLL_GET failure on following huge page

2022-08-19 Thread Mike Kravetz
On 08/19/22 21:22, Michael Ellerman wrote:
> Mike Kravetz  writes:
> > On 08/16/22 22:43, Andrew Morton wrote:
> >> On Wed, 17 Aug 2022 03:31:37 + "Wang, Haiyue"  
> >> wrote:
> >>
> >> > > >  }
> >> > >
> >> > > I would be better to fix this for real at those three client code 
> >> > > sites?
> >> >
> >> > Then 5.19 will break for a while to wait for the final BIG patch ?
> >>
> >> If that's the proposal then your [1/2] should have had a cc:stable and
> >> changelog words describing the plan for 6.0.
> >>
> >> But before we do that I'd like to see at least a prototype of the final
> >> fixes to s390 and hugetlb, so we can assess those as preferable for
> >> backporting.  I don't think they'll be terribly intrusive or risky?
> >
> > I will start on adding follow_huge_pgd() support.  Although, I may need
> > some help with verification from the powerpc folks, as that is the only
> > architecture which supports hugetlb pages at that level.
> >
> > mpe any suggestions?
> 
> I'm happy to test.
> 
> I have a system where I can allocate 1GB huge pages.
> 
> I'm not sure how to actually test this path though. I hacked up the
> vm/migration.c test to allocate 1GB hugepages, but I can't see it going
> through follow_huge_pgd() (using ftrace).

I think you need to use 16GB pages to trigger this code path.  Anshuman introduced
support for page offline (and migration) at this level in commit 94310cbcaa3c
("mm/madvise: enable (soft|hard) offline of HugeTLB pages at PGD level").
When asked about the use case, he mentioned:

"Yes, its in the context of 16GB pages on POWER8 system where all the
 gigantic pages are pre allocated from the platform and passed on to
 the kernel through the device tree. We dont allocate these gigantic
 pages on runtime."

-- 
Mike Kravetz

> 
> Maybe I hacked it up badly, I'll have a closer look on Monday. But if
> you have any tips on how to trigger that path let me know :)
> 
> cheers


[PATCH v2 4/4] hugetlb: Lazy page table copies in fork()

2022-06-21 Thread Mike Kravetz
Lazy page table copying at fork time was introduced with commit
d992895ba2b2 ("[PATCH] Lazy page table copies in fork()").  At the
time, hugetlb was very new and did not support page faulting.  As a
result, it was excluded.  When full page fault support was added for
hugetlb, the exclusion was not removed.

Simply remove the check that prevents lazy copying of hugetlb page
tables at fork.  Of course, like other mappings this only applies to
shared mappings.

Lazy page table copying at fork will be less advantageous for hugetlb
mappings because:
- There are fewer page table entries with hugetlb
- hugetlb pmds can be shared instead of copied

In any case, completely eliminating the copy at fork time should speed
things up.

Signed-off-by: Mike Kravetz 
Acked-by: Muchun Song 
Acked-by: David Hildenbrand 
---
 mm/memory.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/memory.c b/mm/memory.c
index fee2884481f2..90d2a614b2de 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1262,7 +1262,7 @@ vma_needs_copy(struct vm_area_struct *dst_vma, struct 
vm_area_struct *src_vma)
if (userfaultfd_wp(dst_vma))
return true;
 
-   if (src_vma->vm_flags & (VM_HUGETLB | VM_PFNMAP | VM_MIXEDMAP))
+   if (src_vma->vm_flags & (VM_PFNMAP | VM_MIXEDMAP))
return true;
 
if (src_vma->anon_vma)
-- 
2.35.3



[PATCH v2 0/4] hugetlb: speed up linear address scanning

2022-06-21 Thread Mike Kravetz
At unmap, fork and remap time hugetlb address ranges are linearly
scanned.  We can optimize these scans if the ranges are sparsely
populated.

Also, enable page table "Lazy copy" for hugetlb at fork.

NOTE: Architectures not defining CONFIG_ARCH_WANT_GENERAL_HUGETLB
need to add an arch specific version hugetlb_mask_last_page() to
take advantage of sparse address scanning improvements.  Baolin Wang
added the routine for arm64.  Other architectures which could be
optimized are: ia64, mips, parisc, powerpc, s390, sh and sparc.

v1->v2  Change hugetlb_mask_last_page default code to 0 instead of ~0.  Peter
Fix build issues on i386, including going back to if-else-if
instead of switch in hugetlb_mask_last_page. kernel test robot
Update commit message. Rolf Eike Beer
Changes were relatively minor, so I left the Reviewed-by and
ACKed-by tags.

Baolin Wang (1):
  arm64/hugetlb: Implement arm64 specific hugetlb_mask_last_page

Mike Kravetz (3):
  hugetlb: skip to end of PT page mapping when pte not present
  hugetlb: do not update address in huge_pmd_unshare
  hugetlb: Lazy page table copies in fork()

 arch/arm64/mm/hugetlbpage.c |  20 +++
 include/linux/hugetlb.h |   5 +-
 mm/hugetlb.c| 102 +---
 mm/memory.c |   2 +-
 mm/rmap.c   |   4 +-
 5 files changed, 96 insertions(+), 37 deletions(-)

-- 
2.35.3



[PATCH v2 3/4] hugetlb: do not update address in huge_pmd_unshare

2022-06-21 Thread Mike Kravetz
As an optimization for loops sequentially processing hugetlb address
ranges, huge_pmd_unshare would update a passed address if it unshared a
pmd.  Updating a loop control variable outside the loop like this is
generally a bad idea.  These loops are now using hugetlb_mask_last_page
to optimize scanning when non-present ptes are discovered.  The same
can be done when huge_pmd_unshare returns 1 indicating a pmd was
unshared.

Remove address update from huge_pmd_unshare.  Change the passed argument
type and update all callers.  In loops sequentially processing addresses
use hugetlb_mask_last_page to update address if pmd is unshared.

Signed-off-by: Mike Kravetz 
Acked-by: Muchun Song 
Reviewed-by: Baolin Wang 
---
 include/linux/hugetlb.h |  4 ++--
 mm/hugetlb.c| 46 +
 mm/rmap.c   |  4 ++--
 3 files changed, 23 insertions(+), 31 deletions(-)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index e37465e830fe..ee9a28ef26ee 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -199,7 +199,7 @@ pte_t *huge_pte_offset(struct mm_struct *mm,
   unsigned long addr, unsigned long sz);
 unsigned long hugetlb_mask_last_page(struct hstate *h);
 int huge_pmd_unshare(struct mm_struct *mm, struct vm_area_struct *vma,
-   unsigned long *addr, pte_t *ptep);
+   unsigned long addr, pte_t *ptep);
 void adjust_range_if_pmd_sharing_possible(struct vm_area_struct *vma,
unsigned long *start, unsigned long *end);
 struct page *follow_huge_addr(struct mm_struct *mm, unsigned long address,
@@ -246,7 +246,7 @@ static inline struct address_space 
*hugetlb_page_mapping_lock_write(
 
 static inline int huge_pmd_unshare(struct mm_struct *mm,
struct vm_area_struct *vma,
-   unsigned long *addr, pte_t *ptep)
+   unsigned long addr, pte_t *ptep)
 {
return 0;
 }
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 0e4877cea62e..2e4a92cebd9c 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -4945,7 +4945,6 @@ int move_hugetlb_page_tables(struct vm_area_struct *vma,
struct mm_struct *mm = vma->vm_mm;
unsigned long old_end = old_addr + len;
unsigned long last_addr_mask;
-   unsigned long old_addr_copy;
pte_t *src_pte, *dst_pte;
struct mmu_notifier_range range;
bool shared_pmd = false;
@@ -4973,14 +4972,10 @@ int move_hugetlb_page_tables(struct vm_area_struct *vma,
if (huge_pte_none(huge_ptep_get(src_pte)))
continue;
 
-   /* old_addr arg to huge_pmd_unshare() is a pointer and so the
-* arg may be modified. Pass a copy instead to preserve the
-* value in old_addr.
-*/
-   old_addr_copy = old_addr;
-
-   if (huge_pmd_unshare(mm, vma, &old_addr_copy, src_pte)) {
+   if (huge_pmd_unshare(mm, vma, old_addr, src_pte)) {
shared_pmd = true;
+   old_addr |= last_addr_mask;
+   new_addr |= last_addr_mask;
continue;
}
 
@@ -5045,10 +5040,11 @@ static void __unmap_hugepage_range(struct mmu_gather 
*tlb, struct vm_area_struct
}
 
ptl = huge_pte_lock(h, mm, ptep);
-   if (huge_pmd_unshare(mm, vma, &address, ptep)) {
+   if (huge_pmd_unshare(mm, vma, address, ptep)) {
spin_unlock(ptl);
tlb_flush_pmd_range(tlb, address & PUD_MASK, PUD_SIZE);
force_flush = true;
+   address |= last_addr_mask;
continue;
}
 
@@ -6343,7 +6339,7 @@ unsigned long hugetlb_change_protection(struct 
vm_area_struct *vma,
continue;
}
ptl = huge_pte_lock(h, mm, ptep);
-   if (huge_pmd_unshare(mm, vma, &address, ptep)) {
+   if (huge_pmd_unshare(mm, vma, address, ptep)) {
/*
 * When uffd-wp is enabled on the vma, unshare
 * shouldn't happen at all.  Warn about it if it
@@ -6353,6 +6349,7 @@ unsigned long hugetlb_change_protection(struct 
vm_area_struct *vma,
pages++;
spin_unlock(ptl);
shared_pmd = true;
+   address |= last_addr_mask;
continue;
}
pte = huge_ptep_get(ptep);
@@ -6776,11 +6773,11 @@ pte_t *huge_pmd_share(struct mm_struct *mm, struct 
vm_area_struct *vma,
  * 0 the underlying pte page is not shared, or it is the last user
  */
 int huge_pmd_unshare(struct mm_struct *mm, struct vm_area_stru

[PATCH v2 2/4] arm64/hugetlb: Implement arm64 specific hugetlb_mask_last_page

2022-06-21 Thread Mike Kravetz
From: Baolin Wang 

The HugeTLB address ranges are linearly scanned during fork, unmap and
remap operations, and the linear scan can skip to the end of range mapped
by the page table page if hitting a non-present entry, which can help
to speed linear scanning of the HugeTLB address ranges.

So hugetlb_mask_last_page() is introduced to help to update the address in
the loop of HugeTLB linear scanning with getting the last huge page mapped
by the associated page table page[1], when a non-present entry is encountered.

Considering ARM64 specific cont-pte/pmd size HugeTLB, this patch implemented
an ARM64 specific hugetlb_mask_last_page() to help this case.

[1] 
https://lore.kernel.org/linux-mm/20220527225849.284839-1-mike.krav...@oracle.com/

Signed-off-by: Baolin Wang 
Signed-off-by: Mike Kravetz 
Acked-by: Muchun Song 
---
 arch/arm64/mm/hugetlbpage.c | 20 
 1 file changed, 20 insertions(+)

diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
index e2a5ec9fdc0d..c9e076683e5d 100644
--- a/arch/arm64/mm/hugetlbpage.c
+++ b/arch/arm64/mm/hugetlbpage.c
@@ -368,6 +368,26 @@ pte_t *huge_pte_offset(struct mm_struct *mm,
return NULL;
 }
 
+unsigned long hugetlb_mask_last_page(struct hstate *h)
+{
+   unsigned long hp_size = huge_page_size(h);
+
+   switch (hp_size) {
+   case PUD_SIZE:
+   return PGDIR_SIZE - PUD_SIZE;
+   case CONT_PMD_SIZE:
+   return PUD_SIZE - CONT_PMD_SIZE;
+   case PMD_SIZE:
+   return PUD_SIZE - PMD_SIZE;
+   case CONT_PTE_SIZE:
+   return PMD_SIZE - CONT_PTE_SIZE;
+   default:
+   break;
+   }
+
+   return 0UL;
+}
+
 pte_t arch_make_huge_pte(pte_t entry, unsigned int shift, vm_flags_t flags)
 {
size_t pagesize = 1UL << shift;
-- 
2.35.3



[PATCH v2 1/4] hugetlb: skip to end of PT page mapping when pte not present

2022-06-21 Thread Mike Kravetz
HugeTLB address ranges are linearly scanned during fork, unmap and
remap operations.  If a non-present entry is encountered, the code
currently continues to the next huge page aligned address.  However,
a non-present entry implies that the page table page for that entry
is not present.  Therefore, the linear scan can skip to the end of
range mapped by the page table page.  This can speed operations on
large sparsely populated hugetlb mappings.

Create a new routine hugetlb_mask_last_page() that will return an
address mask.  When the mask is ORed with an address, the result
will be the address of the last huge page mapped by the associated
page table page.  Use this mask to update addresses in routines which
linearly scan hugetlb address ranges when a non-present pte is
encountered.
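
As a concrete illustration, here is a code fragment mirroring the loops
changed below; the 2MB/1GB sizes and the addresses are hypothetical
x86-64 values, not taken from the patch itself:

	/*
	 * 2MB hugetlb entries: one PMD page table page maps PUD_SIZE (1GB),
	 * so hugetlb_mask_last_page() returns PUD_SIZE - PMD_SIZE
	 * (0x40000000 - 0x200000 = 0x3fe00000).
	 */
	last_addr_mask = hugetlb_mask_last_page(h);
	for (addr = start; addr < end; addr += sz) {	/* sz == PMD_SIZE */
		pte = huge_pte_offset(mm, addr, sz);
		if (!pte) {
			/*
			 * e.g. addr == 0x40200000 with no PMD page present:
			 * addr |= 0x3fe00000 yields 0x7fe00000, the last
			 * entry of this 1GB range, so the next iteration
			 * starts at the following PUD boundary, 0x80000000.
			 */
			addr |= last_addr_mask;
			continue;
		}
		/* ... process the present entry as before ... */
	}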

hugetlb_mask_last_page is related to the implementation of
huge_pte_offset as hugetlb_mask_last_page is called when huge_pte_offset
returns NULL.  This patch only provides a complete hugetlb_mask_last_page
implementation when CONFIG_ARCH_WANT_GENERAL_HUGETLB is defined.
Architectures which provide their own versions of huge_pte_offset can also
provide their own version of hugetlb_mask_last_page.

Signed-off-by: Mike Kravetz 
Tested-by: Baolin Wang 
Reviewed-by: Baolin Wang 
Acked-by: Muchun Song 
Reported-by: kernel test robot 
---
 include/linux/hugetlb.h |  1 +
 mm/hugetlb.c| 56 +
 2 files changed, 52 insertions(+), 5 deletions(-)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 642a39016f9a..e37465e830fe 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -197,6 +197,7 @@ pte_t *huge_pte_alloc(struct mm_struct *mm, struct 
vm_area_struct *vma,
unsigned long addr, unsigned long sz);
 pte_t *huge_pte_offset(struct mm_struct *mm,
   unsigned long addr, unsigned long sz);
+unsigned long hugetlb_mask_last_page(struct hstate *h);
 int huge_pmd_unshare(struct mm_struct *mm, struct vm_area_struct *vma,
unsigned long *addr, pte_t *ptep);
 void adjust_range_if_pmd_sharing_possible(struct vm_area_struct *vma,
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 98492733cc64..0e4877cea62e 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -4736,6 +4736,7 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct 
mm_struct *src,
unsigned long npages = pages_per_huge_page(h);
struct address_space *mapping = src_vma->vm_file->f_mapping;
struct mmu_notifier_range range;
+   unsigned long last_addr_mask;
int ret = 0;
 
if (cow) {
@@ -4755,11 +4756,14 @@ int copy_hugetlb_page_range(struct mm_struct *dst, 
struct mm_struct *src,
i_mmap_lock_read(mapping);
}
 
+   last_addr_mask = hugetlb_mask_last_page(h);
for (addr = src_vma->vm_start; addr < src_vma->vm_end; addr += sz) {
spinlock_t *src_ptl, *dst_ptl;
src_pte = huge_pte_offset(src, addr, sz);
-   if (!src_pte)
+   if (!src_pte) {
+   addr |= last_addr_mask;
continue;
+   }
dst_pte = huge_pte_alloc(dst, dst_vma, addr, sz);
if (!dst_pte) {
ret = -ENOMEM;
@@ -4776,8 +4780,10 @@ int copy_hugetlb_page_range(struct mm_struct *dst, 
struct mm_struct *src,
 * after taking the lock below.
 */
dst_entry = huge_ptep_get(dst_pte);
-   if ((dst_pte == src_pte) || !huge_pte_none(dst_entry))
+   if ((dst_pte == src_pte) || !huge_pte_none(dst_entry)) {
+   addr |= last_addr_mask;
continue;
+   }
 
dst_ptl = huge_pte_lock(h, dst, dst_pte);
src_ptl = huge_pte_lockptr(h, src, src_pte);
@@ -4938,6 +4944,7 @@ int move_hugetlb_page_tables(struct vm_area_struct *vma,
unsigned long sz = huge_page_size(h);
struct mm_struct *mm = vma->vm_mm;
unsigned long old_end = old_addr + len;
+   unsigned long last_addr_mask;
unsigned long old_addr_copy;
pte_t *src_pte, *dst_pte;
struct mmu_notifier_range range;
@@ -4953,12 +4960,16 @@ int move_hugetlb_page_tables(struct vm_area_struct *vma,
flush_cache_range(vma, range.start, range.end);
 
mmu_notifier_invalidate_range_start(&range);
+   last_addr_mask = hugetlb_mask_last_page(h);
/* Prevent race with file truncation */
i_mmap_lock_write(mapping);
for (; old_addr < old_end; old_addr += sz, new_addr += sz) {
src_pte = huge_pte_offset(mm, old_addr, sz);
-   if (!src_pte)
+   if (!src_pte) {
+   old_addr |= last_addr_mask;
+   new_addr |= last_addr_mask;
continue;
+   }
if (huge_p

Re: [PATCH 1/4] hugetlb: skip to end of PT page mapping when pte not present

2022-06-17 Thread Mike Kravetz
On 06/17/22 19:26, kernel test robot wrote:
> Hi Mike,
> 
> I love your patch! Yet something to improve:
> 
> [auto build test ERROR on soc/for-next]
> [also build test ERROR on linus/master v5.19-rc2 next-20220617]
> [cannot apply to arm64/for-next/core arm/for-next kvmarm/next 
> xilinx-xlnx/master]
> [If your patch is applied to the wrong git tree, kindly drop us a note.
> And when submitting patch, we suggest to use '--base' as documented in
> https://git-scm.com/docs/git-format-patch]
> 
> url:
> https://github.com/intel-lab-lkp/linux/commits/Mike-Kravetz/hugetlb-speed-up-linear-address-scanning/20220617-050726
> base:   https://git.kernel.org/pub/scm/linux/kernel/git/soc/soc.git for-next
> config: i386-randconfig-a002 
> (https://download.01.org/0day-ci/archive/20220617/202206171929.ziurng6p-...@intel.com/config)
> compiler: clang version 15.0.0 (https://github.com/llvm/llvm-project 
> f0e608de27b3d56846eebf3712ab542979d6)
> reproduce (this is a W=1 build):
> wget 
> https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
> ~/bin/make.cross
> chmod +x ~/bin/make.cross
> # 
> https://github.com/intel-lab-lkp/linux/commit/4c647687607f10fece04967b8180c0dadaf765e6
> git remote add linux-review https://github.com/intel-lab-lkp/linux
> git fetch --no-tags linux-review 
> Mike-Kravetz/hugetlb-speed-up-linear-address-scanning/20220617-050726
> git checkout 4c647687607f10fece04967b8180c0dadaf765e6
> # save the config file
> mkdir build_dir && cp config build_dir/.config
> COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 
> O=build_dir ARCH=i386 SHELL=/bin/bash
> 
> If you fix the issue, kindly add following tag where applicable
> Reported-by: kernel test robot 
> 
> All errors (new ones prefixed by >>):

A couple of things here,

> 
> >> mm/hugetlb.c:6901:7: error: duplicate case value '4194304'
>case PUD_SIZE:
> ^
>include/asm-generic/pgtable-nopud.h:20:20: note: expanded from macro 
> 'PUD_SIZE'
>#define PUD_SIZE (1UL << PUD_SHIFT)
>^
>mm/hugetlb.c:6899:7: note: previous case defined here
>case P4D_SIZE:
> ^
>include/asm-generic/pgtable-nop4d.h:13:19: note: expanded from macro 
> 'P4D_SIZE'
>#define P4D_SIZE (1UL << P4D_SHIFT)

In the CONFIG_ARCH_WANT_GENERAL_HUGETLB case covered by this version of
hugetlb_mask_last_page, huge pages can only be PMD_SIZE or PUD_SIZE.
So, the 'case P4D_SIZE:' should not exist and can be removed.

>^
>mm/hugetlb.c:6903:7: error: duplicate case value '4194304'
>case PMD_SIZE:
> ^
>include/asm-generic/pgtable-nopmd.h:22:20: note: expanded from macro 
> 'PMD_SIZE'
>#define PMD_SIZE (1UL << PMD_SHIFT)
>^

On i386 with CONFIG_PGTABLE_LEVELS=2, PUD_SIZE == PMD_SIZE.
Originally, I coded this as an if ... else if ... statement instead of a
switch.  If coded this way, the compiler does not complain about the
duplicate values.  The only other alternative I can think of is
something like '#if CONFIG_PGTABLE_LEVELS > 2' around the PUD_SIZE case.
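
For illustration only (not the posted patch), the if/else form would look
something like:

	unsigned long hp_size = huge_page_size(h);

	if (hp_size == PUD_SIZE)
		return P4D_SIZE - PUD_SIZE;
	else if (hp_size == PMD_SIZE)
		return PUD_SIZE - PMD_SIZE;

	return 0UL;	/* fallback; see the 0 vs ~0 discussion elsewhere in the thread */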

I would prefer the if else if, unless someone can suggest something else?
-- 
Mike Kravetz


Re: [PATCH 1/4] hugetlb: skip to end of PT page mapping when pte not present

2022-06-17 Thread Mike Kravetz
On 06/17/22 10:15, Peter Xu wrote:
> Hi, Mike,
> 
> On Thu, Jun 16, 2022 at 02:05:15PM -0700, Mike Kravetz wrote:
> > @@ -6877,6 +6896,39 @@ pte_t *huge_pte_offset(struct mm_struct *mm,
> > return (pte_t *)pmd;
> >  }
> >  
> > +/*
> > + * Return a mask that can be used to update an address to the last huge
> > + * page in a page table page mapping size.  Used to skip non-present
> > + * page table entries when linearly scanning address ranges.  Architectures
> > + * with unique huge page to page table relationships can define their own
> > + * version of this routine.
> > + */
> > +unsigned long hugetlb_mask_last_page(struct hstate *h)
> > +{
> > +   unsigned long hp_size = huge_page_size(h);
> > +
> > +   switch (hp_size) {
> > +   case P4D_SIZE:
> > +   return PGDIR_SIZE - P4D_SIZE;
> > +   case PUD_SIZE:
> > +   return P4D_SIZE - PUD_SIZE;
> > +   case PMD_SIZE:
> > +   return PUD_SIZE - PMD_SIZE;
> > +   default:
> 
> Should we add a WARN_ON_ONCE() if it should never trigger?
> 

Sure.  I will add this.

> > +   break; /* Should never happen */
> > +   }
> > +
> > +   return ~(0UL);
> > +}
> > +
> > +#else
> > +
> > +/* See description above.  Architectures can provide their own version. */
> > +__weak unsigned long hugetlb_mask_last_page(struct hstate *h)
> > +{
> > +   return ~(0UL);
> 
> I'm wondering whether it's better to return 0 rather than ~0 by default.
> Could an arch with !CONFIG_ARCH_WANT_GENERAL_HUGETLB wrongly skip some
> valid address ranges with ~0, or perhaps I misread?

Thank you, thank you, thank you Peter!

Yes, the 'default' return for hugetlb_mask_last_page() should be 0.  If
there is no 'optimization', we do not want to modify the address so we
want to OR with 0 not ~0.  My bad, I must have been thinking AND instead
of OR.

I will change here as well as in Baolin's patch.
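
With that, the !CONFIG_ARCH_WANT_GENERAL_HUGETLB fallback ends up roughly as:

	/* See description above.  Architectures can provide their own version. */
	__weak unsigned long hugetlb_mask_last_page(struct hstate *h)
	{
		/* ORing an address with 0 leaves it unchanged: no skipping */
		return 0UL;
	}
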
-- 
Mike Kravetz


[PATCH 3/4] hugetlb: do not update address in huge_pmd_unshare

2022-06-16 Thread Mike Kravetz
As an optimization for loops sequentially processing hugetlb address
ranges, huge_pmd_unshare would update a passed address if it unshared a
pmd.  Updating a loop control variable outside the loop like this is
generally a bad idea.  These loops are now using hugetlb_mask_last_page
to optimize scanning when non-present ptes are discovered.  The same
can be done when huge_pmd_unshare returns 1 indicating a pmd was
unshared.

Remove address update from huge_pmd_unshare.  Change the passed argument
type and update all callers.  In loops sequentially processing addresses
use hugetlb_mask_last_page to update address if pmd is unshared.
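
In those loops the call site then follows the same pattern as the
non-present pte case (sketch of the idiom used in the hunks below):

	if (huge_pmd_unshare(mm, vma, address, ptep)) {
		/* pmd was unshared: skip to the last huge page covered
		 * by the shared page table page */
		address |= last_addr_mask;
		continue;
	}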

Signed-off-by: Mike Kravetz 
Acked-by: Muchun Song 
Reviewed-by: Baolin Wang 
---
 include/linux/hugetlb.h |  4 ++--
 mm/hugetlb.c| 47 ++---
 mm/rmap.c   |  4 ++--
 3 files changed, 24 insertions(+), 31 deletions(-)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index e37465e830fe..ee9a28ef26ee 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -199,7 +199,7 @@ pte_t *huge_pte_offset(struct mm_struct *mm,
   unsigned long addr, unsigned long sz);
 unsigned long hugetlb_mask_last_page(struct hstate *h);
 int huge_pmd_unshare(struct mm_struct *mm, struct vm_area_struct *vma,
-   unsigned long *addr, pte_t *ptep);
+   unsigned long addr, pte_t *ptep);
 void adjust_range_if_pmd_sharing_possible(struct vm_area_struct *vma,
unsigned long *start, unsigned long *end);
 struct page *follow_huge_addr(struct mm_struct *mm, unsigned long address,
@@ -246,7 +246,7 @@ static inline struct address_space 
*hugetlb_page_mapping_lock_write(
 
 static inline int huge_pmd_unshare(struct mm_struct *mm,
struct vm_area_struct *vma,
-   unsigned long *addr, pte_t *ptep)
+   unsigned long addr, pte_t *ptep)
 {
return 0;
 }
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 7c4a82848603..f7da2d54ef39 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -4949,7 +4949,6 @@ int move_hugetlb_page_tables(struct vm_area_struct *vma,
struct mm_struct *mm = vma->vm_mm;
unsigned long old_end = old_addr + len;
unsigned long last_addr_mask;
-   unsigned long old_addr_copy;
pte_t *src_pte, *dst_pte;
struct mmu_notifier_range range;
bool shared_pmd = false;
@@ -4977,14 +4976,10 @@ int move_hugetlb_page_tables(struct vm_area_struct *vma,
if (huge_pte_none(huge_ptep_get(src_pte)))
continue;
 
-   /* old_addr arg to huge_pmd_unshare() is a pointer and so the
-* arg may be modified. Pass a copy instead to preserve the
-* value in old_addr.
-*/
-   old_addr_copy = old_addr;
-
-   if (huge_pmd_unshare(mm, vma, &old_addr_copy, src_pte)) {
+   if (huge_pmd_unshare(mm, vma, old_addr, src_pte)) {
shared_pmd = true;
+   old_addr |= last_addr_mask;
+   new_addr |= last_addr_mask;
continue;
}
 
@@ -5049,10 +5044,11 @@ static void __unmap_hugepage_range(struct mmu_gather 
*tlb, struct vm_area_struct
}
 
ptl = huge_pte_lock(h, mm, ptep);
-   if (huge_pmd_unshare(mm, vma, &address, ptep)) {
+   if (huge_pmd_unshare(mm, vma, address, ptep)) {
spin_unlock(ptl);
tlb_flush_pmd_range(tlb, address & PUD_MASK, PUD_SIZE);
force_flush = true;
+   address |= last_addr_mask;
continue;
}
 
@@ -6347,7 +6343,7 @@ unsigned long hugetlb_change_protection(struct 
vm_area_struct *vma,
continue;
}
ptl = huge_pte_lock(h, mm, ptep);
-   if (huge_pmd_unshare(mm, vma, &address, ptep)) {
+   if (huge_pmd_unshare(mm, vma, address, ptep)) {
/*
 * When uffd-wp is enabled on the vma, unshare
 * shouldn't happen at all.  Warn about it if it
@@ -6357,6 +6353,7 @@ unsigned long hugetlb_change_protection(struct 
vm_area_struct *vma,
pages++;
spin_unlock(ptl);
shared_pmd = true;
+   address |= last_addr_mask;
continue;
}
pte = huge_ptep_get(ptep);
@@ -6780,11 +6777,11 @@ pte_t *huge_pmd_share(struct mm_struct *mm, struct 
vm_area_struct *vma,
  * 0 the underlying pte page is not shared, or it is the last user
  */
 int huge_pmd_unshare(struct mm_struct *mm, struct vm_area_stru

[PATCH 4/4] hugetlb: Lazy page table copies in fork()

2022-06-16 Thread Mike Kravetz
Lazy page table copying at fork time was introduced with commit
d992895ba2b2 ("[PATCH] Lazy page table copies in fork()").
At the time, hugetlb was very new and did not support page faulting.
As a result, it was excluded.  When full page fault support was added
for hugetlb, the exclusion was not removed.

Simply remove the check that prevents lazy copying of hugetlb page
tables at fork.  Of course, like other mappings this only applies to
shared mappings.

Lazy page table copying at fork will be less advantageous for hugetlb
mappings because:
- There are fewer page table entries with hugetlb
- hugetlb pmds can be shared instead of copied

In any case, completely eliminating the copy at fork time should speed
things up.

Signed-off-by: Mike Kravetz 
Acked-by: Muchun Song 
Acked-by: David Hildenbrand 
---
 mm/memory.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/memory.c b/mm/memory.c
index fee2884481f2..90d2a614b2de 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1262,7 +1262,7 @@ vma_needs_copy(struct vm_area_struct *dst_vma, struct 
vm_area_struct *src_vma)
if (userfaultfd_wp(dst_vma))
return true;
 
-   if (src_vma->vm_flags & (VM_HUGETLB | VM_PFNMAP | VM_MIXEDMAP))
+   if (src_vma->vm_flags & (VM_PFNMAP | VM_MIXEDMAP))
return true;
 
if (src_vma->anon_vma)
-- 
2.35.3



[PATCH 2/4] arm64/hugetlb: Implement arm64 specific hugetlb_mask_last_page

2022-06-16 Thread Mike Kravetz
From: Baolin Wang 

The HugeTLB address ranges are linearly scanned during fork, unmap and
remap operations, and the linear scan can skip to the end of the range
mapped by the page table page when it hits a non-present entry, which
helps speed up linear scanning of sparse HugeTLB address ranges.

So hugetlb_mask_last_page() is introduced to update the address in the
HugeTLB linear scanning loops to the last huge page mapped by the
associated page table page [1] when a non-present entry is encountered.

To cover the ARM64 specific cont-pte/pmd size HugeTLB pages, this patch
implements an ARM64 specific hugetlb_mask_last_page() to handle this case.

[1] 
https://lore.kernel.org/linux-mm/20220527225849.284839-1-mike.krav...@oracle.com/

Signed-off-by: Baolin Wang 
Signed-off-by: Mike Kravetz 
---
 arch/arm64/mm/hugetlbpage.c | 20 
 1 file changed, 20 insertions(+)

diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
index e2a5ec9fdc0d..ddeafee7c4de 100644
--- a/arch/arm64/mm/hugetlbpage.c
+++ b/arch/arm64/mm/hugetlbpage.c
@@ -368,6 +368,26 @@ pte_t *huge_pte_offset(struct mm_struct *mm,
return NULL;
 }
 
+unsigned long hugetlb_mask_last_page(struct hstate *h)
+{
+   unsigned long hp_size = huge_page_size(h);
+
+   switch (hp_size) {
+   case PUD_SIZE:
+   return PGDIR_SIZE - PUD_SIZE;
+   case CONT_PMD_SIZE:
+   return PUD_SIZE - CONT_PMD_SIZE;
+   case PMD_SIZE:
+   return PUD_SIZE - PMD_SIZE;
+   case CONT_PTE_SIZE:
+   return PMD_SIZE - CONT_PTE_SIZE;
+   default:
+   break;
+   }
+
+   return ~0UL;
+}
+
 pte_t arch_make_huge_pte(pte_t entry, unsigned int shift, vm_flags_t flags)
 {
size_t pagesize = 1UL << shift;
-- 
2.35.3



[PATCH 1/4] hugetlb: skip to end of PT page mapping when pte not present

2022-06-16 Thread Mike Kravetz
HugeTLB address ranges are linearly scanned during fork, unmap and
remap operations.  If a non-present entry is encountered, the code
currently continues to the next huge page aligned address.  However,
a non-present entry implies that the page table page for that entry
is not present.  Therefore, the linear scan can skip to the end of
range mapped by the page table page.  This can speed operations on
large sparsely populated hugetlb mappings.

Create a new routine hugetlb_mask_last_page() that will return an
address mask.  When the mask is ORed with an address, the result
will be the address of the last huge page mapped by the associated
page table page.  Use this mask to update addresses in routines which
linearly scan hugetlb address ranges when a non-present pte is
encountered.
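
All of the converted loops below follow the same idiom; in outline (a
sketch of the pattern, with placeholder start/end variables):

	last_addr_mask = hugetlb_mask_last_page(h);
	for (addr = start; addr < end; addr += sz) {
		ptep = huge_pte_offset(mm, addr, sz);
		if (!ptep) {
			/*
			 * No page table page here.  Jump to the last huge
			 * page it would have mapped; the loop increment
			 * then moves on to the next page table page.
			 */
			addr |= last_addr_mask;
			continue;
		}
		/* process the present entry */
	}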

hugetlb_mask_last_page is related to the implementation of
huge_pte_offset as hugetlb_mask_last_page is called when huge_pte_offset
returns NULL.  This patch only provides a complete hugetlb_mask_last_page
implementation when CONFIG_ARCH_WANT_GENERAL_HUGETLB is defined.
Architectures which provide their own versions of huge_pte_offset can also
provide their own version of hugetlb_mask_last_page.

Signed-off-by: Mike Kravetz 
Tested-by: Baolin Wang 
Reviewed-by: Baolin Wang 
---
 include/linux/hugetlb.h |  1 +
 mm/hugetlb.c| 62 +
 2 files changed, 58 insertions(+), 5 deletions(-)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 642a39016f9a..e37465e830fe 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -197,6 +197,7 @@ pte_t *huge_pte_alloc(struct mm_struct *mm, struct 
vm_area_struct *vma,
unsigned long addr, unsigned long sz);
 pte_t *huge_pte_offset(struct mm_struct *mm,
   unsigned long addr, unsigned long sz);
+unsigned long hugetlb_mask_last_page(struct hstate *h);
 int huge_pmd_unshare(struct mm_struct *mm, struct vm_area_struct *vma,
unsigned long *addr, pte_t *ptep);
 void adjust_range_if_pmd_sharing_possible(struct vm_area_struct *vma,
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 259b9c41892f..7c4a82848603 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -4740,6 +4740,7 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct 
mm_struct *src,
unsigned long npages = pages_per_huge_page(h);
struct address_space *mapping = src_vma->vm_file->f_mapping;
struct mmu_notifier_range range;
+   unsigned long last_addr_mask;
int ret = 0;
 
if (cow) {
@@ -4759,11 +4760,14 @@ int copy_hugetlb_page_range(struct mm_struct *dst, 
struct mm_struct *src,
i_mmap_lock_read(mapping);
}
 
+   last_addr_mask = hugetlb_mask_last_page(h);
for (addr = src_vma->vm_start; addr < src_vma->vm_end; addr += sz) {
spinlock_t *src_ptl, *dst_ptl;
src_pte = huge_pte_offset(src, addr, sz);
-   if (!src_pte)
+   if (!src_pte) {
+   addr |= last_addr_mask;
continue;
+   }
dst_pte = huge_pte_alloc(dst, dst_vma, addr, sz);
if (!dst_pte) {
ret = -ENOMEM;
@@ -4780,8 +4784,10 @@ int copy_hugetlb_page_range(struct mm_struct *dst, 
struct mm_struct *src,
 * after taking the lock below.
 */
dst_entry = huge_ptep_get(dst_pte);
-   if ((dst_pte == src_pte) || !huge_pte_none(dst_entry))
+   if ((dst_pte == src_pte) || !huge_pte_none(dst_entry)) {
+   addr |= last_addr_mask;
continue;
+   }
 
dst_ptl = huge_pte_lock(h, dst, dst_pte);
src_ptl = huge_pte_lockptr(h, src, src_pte);
@@ -4942,6 +4948,7 @@ int move_hugetlb_page_tables(struct vm_area_struct *vma,
unsigned long sz = huge_page_size(h);
struct mm_struct *mm = vma->vm_mm;
unsigned long old_end = old_addr + len;
+   unsigned long last_addr_mask;
unsigned long old_addr_copy;
pte_t *src_pte, *dst_pte;
struct mmu_notifier_range range;
@@ -4957,12 +4964,16 @@ int move_hugetlb_page_tables(struct vm_area_struct *vma,
flush_cache_range(vma, range.start, range.end);
 
mmu_notifier_invalidate_range_start();
+   last_addr_mask = hugetlb_mask_last_page(h);
/* Prevent race with file truncation */
i_mmap_lock_write(mapping);
for (; old_addr < old_end; old_addr += sz, new_addr += sz) {
src_pte = huge_pte_offset(mm, old_addr, sz);
-   if (!src_pte)
+   if (!src_pte) {
+   old_addr |= last_addr_mask;
+   new_addr |= last_addr_mask;
continue;
+   }
if (huge_pte_none(huge_ptep_get(src_pte)))
   

[PATCH 0/4] hugetlb: speed up linear address scanning

2022-06-16 Thread Mike Kravetz
At unmap, fork and remap time hugetlb address ranges are linearly
scanned.  We can optimize these scans if the ranges are sparsely
populated.

Also, enable page table "Lazy copy" for hugetlb at fork.

NOTE: Architectures not defining CONFIG_ARCH_WANT_GENERAL_HUGETLB
need to add an arch specific version of hugetlb_mask_last_page() to
take advantage of sparse address scanning improvements.  Baolin Wang
added the routine for arm64.  Other architectures which could be
optimized are: ia64, mips, parisc, powerpc, s390, sh and sparc.

Baolin Wang (1):
  arm64/hugetlb: Implement arm64 specific hugetlb_mask_last_page

Mike Kravetz (3):
  hugetlb: skip to end of PT page mapping when pte not present
  hugetlb: do not update address in huge_pmd_unshare
  hugetlb: Lazy page table copies in fork()

 arch/arm64/mm/hugetlbpage.c |  20 +++
 include/linux/hugetlb.h |   5 +-
 mm/hugetlb.c| 109 +---
 mm/memory.c |   2 +-
 mm/rmap.c   |   4 +-
 5 files changed, 103 insertions(+), 37 deletions(-)

-- 
2.35.3



Re: [PATCH v2 3/3] mm: rmap: Fix CONT-PTE/PMD size hugetlb issue when unmapping

2022-05-09 Thread Mike Kravetz
On 5/8/22 02:36, Baolin Wang wrote:
> On some architectures (like ARM64), it can support CONT-PTE/PMD size
> hugetlb, which means it can support not only PMD/PUD size hugetlb:
> 2M and 1G, but also CONT-PTE/PMD size: 64K and 32M if a 4K page
> size specified.
> 
> When unmapping a hugetlb page, we will get the relevant page table
> entry by huge_pte_offset() only once to nuke it. This is correct
> for PMD or PUD size hugetlb, since they always contain only one
> pmd entry or pud entry in the page table.
> 
> However this is incorrect for CONT-PTE and CONT-PMD size hugetlb,
> since they can contain several continuous pte or pmd entry with
> same page table attributes, so we will nuke only one pte or pmd
> entry for this CONT-PTE/PMD size hugetlb page.
> 
> And now try_to_unmap() is only passed a hugetlb page in the case
> where the hugetlb page is poisoned. Which means now we will unmap
> only one pte entry for a CONT-PTE or CONT-PMD size poisoned hugetlb
> page, and we can still access other subpages of a CONT-PTE or CONT-PMD
> size poisoned hugetlb page, which will cause serious issues possibly.
> 
> So we should change to use huge_ptep_clear_flush() to nuke the
> hugetlb page table to fix this issue, which already considered
> CONT-PTE and CONT-PMD size hugetlb.
> 
> We've already used set_huge_swap_pte_at() to set a poisoned
> swap entry for a poisoned hugetlb page. Meanwhile adding a VM_BUG_ON()
> to make sure the passed hugetlb page is poisoned in try_to_unmap().
> 
> Signed-off-by: Baolin Wang 
> ---
>  mm/rmap.c | 39 ++-
>  1 file changed, 22 insertions(+), 17 deletions(-)
> 
> diff --git a/mm/rmap.c b/mm/rmap.c
> index 7cf2408..37c8fd2 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -1530,6 +1530,11 @@ static bool try_to_unmap_one(struct folio *folio, 
> struct vm_area_struct *vma,
>  
>   if (folio_test_hugetlb(folio)) {
>   /*
> +  * The try_to_unmap() is only passed a hugetlb page
> +  * in the case where the hugetlb page is poisoned.
> +  */
> + VM_BUG_ON_PAGE(!PageHWPoison(subpage), subpage);
> + /*

It is unfortunate that this could not easily be added to the first
if (folio_test_hugetlb(folio)) block in this routine.  However, it
is fine to add here.

Looks good.  Thanks for all these changes,

Reviewed-by: Mike Kravetz 

-- 
Mike Kravetz


Re: [PATCH v2 2/3] mm: rmap: Fix CONT-PTE/PMD size hugetlb issue when migration

2022-05-09 Thread Mike Kravetz
On 5/8/22 02:36, Baolin Wang wrote:
> On some architectures (like ARM64), it can support CONT-PTE/PMD size
> hugetlb, which means it can support not only PMD/PUD size hugetlb:
> 2M and 1G, but also CONT-PTE/PMD size: 64K and 32M if a 4K page
> size specified.
> 
> When migrating a hugetlb page, we will get the relevant page table
> entry by huge_pte_offset() only once to nuke it and remap it with
> a migration pte entry. This is correct for PMD or PUD size hugetlb,
> since they always contain only one pmd entry or pud entry in the
> page table.
> 
> However this is incorrect for CONT-PTE and CONT-PMD size hugetlb,
> since they can contain several continuous pte or pmd entry with
> same page table attributes. So we will nuke or remap only one pte
> or pmd entry for this CONT-PTE/PMD size hugetlb page, which is
> not expected for hugetlb migration. The problem is we can still
> continue to modify the subpages' data of a hugetlb page during
> migrating a hugetlb page, which can cause a serious data consistent
> issue, since we did not nuke the page table entry and set a
> migration pte for the subpages of a hugetlb page.
> 
> To fix this issue, we should change to use huge_ptep_clear_flush()
> to nuke a hugetlb page table, and remap it with set_huge_pte_at()
> and set_huge_swap_pte_at() when migrating a hugetlb page, which
> already considered the CONT-PTE or CONT-PMD size hugetlb.
> 
> Signed-off-by: Baolin Wang 
> ---
>  mm/rmap.c | 24 ++--
>  1 file changed, 18 insertions(+), 6 deletions(-)

With the addition of !CONFIG_HUGETLB_PAGE stubs,

Reviewed-by: Mike Kravetz 
-- 
Mike Kravetz


Re: [PATCH v2 1/3] mm: change huge_ptep_clear_flush() to return the original pte

2022-05-09 Thread Mike Kravetz
On 5/9/22 01:46, Baolin Wang wrote:
> 
> 
> On 5/9/2022 1:46 PM, Christophe Leroy wrote:
>>
>>
>> Le 08/05/2022 à 15:09, Baolin Wang a écrit :
>>>
>>>
>>> On 5/8/2022 7:09 PM, Muchun Song wrote:
>>>> On Sun, May 08, 2022 at 05:36:39PM +0800, Baolin Wang wrote:
>>>>> It is incorrect to use ptep_clear_flush() to nuke a hugetlb page
>>>>> table when unmapping or migrating a hugetlb page, and will change
>>>>> to use huge_ptep_clear_flush() instead in the following patches.
>>>>>
>>>>> So this is a preparation patch, which changes the
>>>>> huge_ptep_clear_flush()
>>>>> to return the original pte to help to nuke a hugetlb page table.
>>>>>
>>>>> Signed-off-by: Baolin Wang 
>>>>> Acked-by: Mike Kravetz 
>>>>
>>>> Reviewed-by: Muchun Song 
>>>
>>> Thanks for reviewing.
>>>
>>>>
>>>> But one nit below:
>>>>
>>>> [...]
>>>>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
>>>>> index 8605d7e..61a21af 100644
>>>>> --- a/mm/hugetlb.c
>>>>> +++ b/mm/hugetlb.c
>>>>> @@ -5342,7 +5342,7 @@ static vm_fault_t hugetlb_wp(struct mm_struct
>>>>> *mm, struct vm_area_struct *vma,
>>>>>    ClearHPageRestoreReserve(new_page);
>>>>>    /* Break COW or unshare */
>>>>> -    huge_ptep_clear_flush(vma, haddr, ptep);
>>>>> +    (void)huge_ptep_clear_flush(vma, haddr, ptep);
>>>>
>>>> Why add a "(void)" here? Is there any warning if no "(void)"?
>>>> IIUC, I think we can remove this, right?
>>>
>>> I did not meet any warning without the casting, but this is per Mike's
>>> comment[1] to make the code consistent with other functions casting to
>>> void type explicitly in hugetlb.c file.
>>>
>>> [1]
>>> https://lore.kernel.org/all/495c4ebe-a5b4-afb6-4cb0-956c1b18d...@oracle.com/
>>>
>>
>> As far as I understand, Mike said that you should be accompanied with a
>> big fat comment explaining why we ignore the returned value from
>> huge_ptep_clear_flush(). >
>> By the way huge_ptep_clear_flush() is not declared 'must_check' so this
>> cast is just visual pollution and should be removed.
>>
>> In the meantime the comment suggested by Mike should be added instead.
> Sorry for my misunderstanding. I just follow the explicit void casting like 
> other places in hugetlb.c file. And I am not sure if it is useful adding some 
> comments like below, since we did not need the original pte value in the COW 
> case mapping with a new page, and the code is more readable already I think.
> 
> Mike, could you help to clarify what useful comments would you like? and 
> remove the explicit void casting? Thanks.
> 

Sorry for the confusion.

In the original commit, it seemed odd to me that the signature of the
function was changing and there was not an associated change to the only
caller of the function.  I did suggest casting to void or adding a comment.
As Christophe mentions, the cast to void is not necessary.  In addition,
there really isn't a need for a comment as the calling code is not changed.

The original version of the commit without either is actually preferable.
The commit message does say this is a preparation patch and the return
value will be used in later patches.
Again, sorry for the confusion.
-- 
Mike Kravetz


Re: [PATCH 2/3] mm: rmap: Fix CONT-PTE/PMD size hugetlb issue when migration

2022-05-07 Thread Mike Kravetz
On 5/5/22 20:39, Baolin Wang wrote:
> 
> On 5/6/2022 7:53 AM, Mike Kravetz wrote:
>> On 4/29/22 01:14, Baolin Wang wrote:
>>> On some architectures (like ARM64), it can support CONT-PTE/PMD size
>>> hugetlb, which means it can support not only PMD/PUD size hugetlb:
>>> 2M and 1G, but also CONT-PTE/PMD size: 64K and 32M if a 4K page
>>> size specified.
>> 
>>> diff --git a/mm/rmap.c b/mm/rmap.c
>>> index 6fdd198..7cf2408 100644
>>> --- a/mm/rmap.c
>>> +++ b/mm/rmap.c
>>> @@ -1924,13 +1924,15 @@ static bool try_to_migrate_one(struct folio *folio, 
>>> struct vm_area_struct *vma,
>>>   break;
>>>   }
>>>   }
>>> +
>>> +    /* Nuke the hugetlb page table entry */
>>> +    pteval = huge_ptep_clear_flush(vma, address, pvmw.pte);
>>>   } else {
>>>   flush_cache_page(vma, address, pte_pfn(*pvmw.pte));
>>> +    /* Nuke the page table entry. */
>>> +    pteval = ptep_clear_flush(vma, address, pvmw.pte);
>>>   }
>>>   
>>
>> On arm64 with CONT-PTE/PMD the returned pteval will have dirty or young set
>> if ANY of the PTE/PMDs had dirty or young set.
> 
> Right.
> 
>>
>>> -    /* Nuke the page table entry. */
>>> -    pteval = ptep_clear_flush(vma, address, pvmw.pte);
>>> -
>>>   /* Set the dirty flag on the folio now the pte is gone. */
>>>   if (pte_dirty(pteval))
>>>   folio_mark_dirty(folio);
>>> @@ -2015,7 +2017,10 @@ static bool try_to_migrate_one(struct folio *folio, 
>>> struct vm_area_struct *vma,
>>>   pte_t swp_pte;
>>>     if (arch_unmap_one(mm, vma, address, pteval) < 0) {
>>> -    set_pte_at(mm, address, pvmw.pte, pteval);
>>> +    if (folio_test_hugetlb(folio))
>>> +    set_huge_pte_at(mm, address, pvmw.pte, pteval);
>>
>> And, we will use that pteval for ALL the PTE/PMDs here.  So, we would set
>> the dirty or young bit in ALL PTE/PMDs.
>>
>> Could that cause any issues?  May be more of a question for the arm64 people.
> 
> I don't think this will cause any issues. Since the hugetlb can not be split, 
> and we should not lose the dirty or young state if any subpages were set. 
> Meanwhile we already did like this in hugetlb.c:
> 
> pte = huge_ptep_get_and_clear(mm, address, ptep);
> tlb_remove_huge_tlb_entry(h, tlb, ptep, address);
> if (huge_pte_dirty(pte))
> set_page_dirty(page);
> 

Agree that it 'should not' cause issues.  It just seems inconsistent.
This is not a problem specifically with your patch, just the handling of
CONT-PTE/PMD entries.

There does not appear to be an arm64 specific version of huge_ptep_get()
that takes CONT-PTE/PMD into account.  So, huge_ptep_get() would only
return the one specific value.  It would not take into account the dirty
or young bits of CONT-PTE/PMDs like your new version of
huge_ptep_get_and_clear.  Is that correct?  Or, am I missing something.

If I am correct, then code like the following may not work:

static int gather_hugetlb_stats(pte_t *pte, unsigned long hmask,
unsigned long addr, unsigned long end, struct mm_walk *walk)
{
pte_t huge_pte = huge_ptep_get(pte);
struct numa_maps *md;
struct page *page;

if (!pte_present(huge_pte))
return 0;

page = pte_page(huge_pte);

md = walk->private;
gather_stats(page, md, pte_dirty(huge_pte), 1);
return 0;
}
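
For comparison, a CONT-aware huge_ptep_get() would presumably need to fold
the dirty/young bits of every contiguous entry into the returned value,
roughly like this (a sketch only, assuming it sits next to the existing
num_contig_ptes()/pte_cont() helpers in arch/arm64/mm/hugetlbpage.c):

	pte_t huge_ptep_get(pte_t *ptep)
	{
		int ncontig, i;
		size_t pgsize;
		pte_t orig_pte = ptep_get(ptep);

		if (!pte_present(orig_pte) || !pte_cont(orig_pte))
			return orig_pte;

		ncontig = num_contig_ptes(page_size(pte_page(orig_pte)), &pgsize);
		for (i = 0; i < ncontig; i++, ptep++) {
			pte_t pte = ptep_get(ptep);

			if (pte_dirty(pte))
				orig_pte = pte_mkdirty(orig_pte);
			if (pte_young(pte))
				orig_pte = pte_mkyoung(orig_pte);
		}
		return orig_pte;
	}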

-- 
Mike Kravetz


Re: [PATCH 3/3] mm: rmap: Fix CONT-PTE/PMD size hugetlb issue when unmapping

2022-05-06 Thread Mike Kravetz
>>>>> }
>>>>
>>>> }
>>>
>>> OK, but wouldn't the pteval be overwritten here with
>>> pteval = swp_entry_to_pte(make_hwpoison_entry(subpage))?
>>> IOW, what sense does it make to save the returned pteval from
>>> huge_ptep_clear_flush(), when it is never being used anywhere?
>>
>> Please see previous code, we'll use the original pte value to check if 
>> it is uffd-wp armed, and if need to mark it dirty though the hugetlbfs 
>> is set noop_dirty_folio().
>>
>> pte_install_uffd_wp_if_needed(vma, address, pvmw.pte, pteval);
> 
> Uh, ok, that wouldn't work on s390, but we also don't have
> CONFIG_PTE_MARKER_UFFD_WP / HAVE_ARCH_USERFAULTFD_WP set, so
> I guess we will be fine (for now).
> 
> Still, I find it a bit unsettling that pte_install_uffd_wp_if_needed()
> would work on a potential hugetlb *pte, directly de-referencing it
> instead of using huge_ptep_get().
> 
> The !pte_none(*pte) check at the beginning would be broken in the
> hugetlb case for s390 (not sure about other archs, but I think s390
> might be the only exception strictly requiring huge_ptep_get()
> for de-referencing hugetlb *pte pointers).
> 

Adding Peter Xu mostly for the above as he is working on uffd_wp.

>>
>> /* Set the dirty flag on the folio now the pte is gone. */
>> if (pte_dirty(pteval))
>>  folio_mark_dirty(folio);
> 
> Ok, that should work fine, huge_ptep_clear_flush() will return
> a pteval properly de-referenced and converted with huge_ptep_get(),
> and that would contain the hugetlb pmd/pud dirty information.
> 


-- 
Mike Kravetz


Re: [PATCH 3/3] mm: rmap: Fix CONT-PTE/PMD size hugetlb issue when unmapping

2022-05-06 Thread Mike Kravetz
On 4/29/22 01:14, Baolin Wang wrote:
> On some architectures (like ARM64), it can support CONT-PTE/PMD size
> hugetlb, which means it can support not only PMD/PUD size hugetlb:
> 2M and 1G, but also CONT-PTE/PMD size: 64K and 32M if a 4K page
> size specified.
> 
> When unmapping a hugetlb page, we will get the relevant page table
> entry by huge_pte_offset() only once to nuke it. This is correct
> for PMD or PUD size hugetlb, since they always contain only one
> pmd entry or pud entry in the page table.
> 
> However this is incorrect for CONT-PTE and CONT-PMD size hugetlb,
> since they can contain several continuous pte or pmd entry with
> same page table attributes, so we will nuke only one pte or pmd
> entry for this CONT-PTE/PMD size hugetlb page.
> 
> And now we only use try_to_unmap() to unmap a poisoned hugetlb page,

Since try_to_unmap can be called for non-hugetlb pages, perhaps the following
is more accurate?

try_to_unmap is only passed a hugetlb page in the case where the
hugetlb page is poisoned.

It does concern me that this assumption is built into the code as
pointed out in your discussion with Gerald.  Should we perhaps add
a VM_BUG_ON() to make sure the passed huge page is poisoned?  This
would be in the same 'if block' where we call
adjust_range_if_pmd_sharing_possible.
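
Concretely, something along these lines (a sketch; 'subpage' as already
computed in try_to_unmap_one()):

	if (folio_test_hugetlb(folio)) {
		/*
		 * try_to_unmap() is only expected to see a hugetlb page
		 * when that page is poisoned.
		 */
		VM_BUG_ON_PAGE(!PageHWPoison(subpage), subpage);
		/* existing adjust_range_if_pmd_sharing_possible() call */
	}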

-- 
Mike Kravetz

> which means now we will unmap only one pte entry for a CONT-PTE or
> CONT-PMD size poisoned hugetlb page, and we can still access other
> subpages of a CONT-PTE or CONT-PMD size poisoned hugetlb page,
> which will cause serious issues possibly.
> 
> So we should change to use huge_ptep_clear_flush() to nuke the
> hugetlb page table to fix this issue, which already considered
> CONT-PTE and CONT-PMD size hugetlb.
> 
> Note we've already used set_huge_swap_pte_at() to set a poisoned
> swap entry for a poisoned hugetlb page.
> 
> Signed-off-by: Baolin Wang 
> ---
>  mm/rmap.c | 34 +-
>  1 file changed, 17 insertions(+), 17 deletions(-)



Re: [PATCH 2/3] mm: rmap: Fix CONT-PTE/PMD size hugetlb issue when migration

2022-05-05 Thread Mike Kravetz
On 4/29/22 01:14, Baolin Wang wrote:
> On some architectures (like ARM64), it can support CONT-PTE/PMD size
> hugetlb, which means it can support not only PMD/PUD size hugetlb:
> 2M and 1G, but also CONT-PTE/PMD size: 64K and 32M if a 4K page
> size specified.

> diff --git a/mm/rmap.c b/mm/rmap.c
> index 6fdd198..7cf2408 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -1924,13 +1924,15 @@ static bool try_to_migrate_one(struct folio *folio, 
> struct vm_area_struct *vma,
>   break;
>   }
>   }
> +
> + /* Nuke the hugetlb page table entry */
> + pteval = huge_ptep_clear_flush(vma, address, pvmw.pte);
>   } else {
>   flush_cache_page(vma, address, pte_pfn(*pvmw.pte));
> + /* Nuke the page table entry. */
> + pteval = ptep_clear_flush(vma, address, pvmw.pte);
>   }
>  

On arm64 with CONT-PTE/PMD the returned pteval will have dirty or young set
if ANY of the PTE/PMDs had dirty or young set.

> - /* Nuke the page table entry. */
> - pteval = ptep_clear_flush(vma, address, pvmw.pte);
> -
>   /* Set the dirty flag on the folio now the pte is gone. */
>   if (pte_dirty(pteval))
>   folio_mark_dirty(folio);
> @@ -2015,7 +2017,10 @@ static bool try_to_migrate_one(struct folio *folio, 
> struct vm_area_struct *vma,
>   pte_t swp_pte;
>  
>   if (arch_unmap_one(mm, vma, address, pteval) < 0) {
> - set_pte_at(mm, address, pvmw.pte, pteval);
> + if (folio_test_hugetlb(folio))
> + set_huge_pte_at(mm, address, pvmw.pte, 
> pteval);

And, we will use that pteval for ALL the PTE/PMDs here.  So, we would set
the dirty or young bit in ALL PTE/PMDs.

Could that cause any issues?  May be more of a question for the arm64 people.
-- 
Mike Kravetz


Re: [PATCH 1/3] mm: change huge_ptep_clear_flush() to return the original pte

2022-05-05 Thread Mike Kravetz
On 4/29/22 01:14, Baolin Wang wrote:
> It is incorrect to use ptep_clear_flush() to nuke a hugetlb page
> table when unmapping or migrating a hugetlb page, and will change
> to use huge_ptep_clear_flush() instead in the following patches.
> 
> So this is a preparation patch, which changes the huge_ptep_clear_flush()
> to return the original pte to help to nuke a hugetlb page table.
> 
> Signed-off-by: Baolin Wang 
> ---
>  arch/arm64/include/asm/hugetlb.h   |  4 ++--
>  arch/arm64/mm/hugetlbpage.c| 12 +---
>  arch/ia64/include/asm/hugetlb.h|  4 ++--
>  arch/mips/include/asm/hugetlb.h|  9 ++---
>  arch/parisc/include/asm/hugetlb.h  |  4 ++--
>  arch/powerpc/include/asm/hugetlb.h |  9 ++---
>  arch/s390/include/asm/hugetlb.h|  6 +++---
>  arch/sh/include/asm/hugetlb.h  |  4 ++--
>  arch/sparc/include/asm/hugetlb.h   |  4 ++--
>  include/asm-generic/hugetlb.h  |  4 ++--
>  10 files changed, 32 insertions(+), 28 deletions(-)

The above changes look straightforward.
Happy that you Cc'ed impacted arch maintainers so they can at least
have a look.

The only user of huge_ptep_clear_flush() today is hugetlb_cow/wp() in
mm/hugetlb.c.  Any reason why you did not change that code?  At least
cast the return of huge_ptep_clear_flush() to void with a comment?
Not absolutely necessary.

Acked-by: Mike Kravetz 
-- 
Mike Kravetz


Re: [PATCH] mm: Merge pte_mkhuge() call into arch_make_huge_pte()

2022-02-02 Thread Mike Kravetz
On 2/1/22 21:38, Anshuman Khandual wrote:
> Each call into pte_mkhuge() is invariably followed by arch_make_huge_pte().
> Instead arch_make_huge_pte() can accommodate pte_mkhuge() at the beginning.
> This updates generic fallback stub for arch_make_huge_pte() and available
> platforms definitions. This makes huge pte creation much cleaner and easier
> to follow.
> 
> Cc: Catalin Marinas 
> Cc: Will Deacon 
> Cc: Michael Ellerman 
> Cc: Paul Mackerras 
> Cc: "David S. Miller" 
> Cc: Mike Kravetz 
> Cc: Andrew Morton 
> Cc: linux-arm-ker...@lists.infradead.org
> Cc: linuxppc-dev@lists.ozlabs.org
> Cc: sparcli...@vger.kernel.org
> Cc: linux...@kvack.org
> Cc: linux-ker...@vger.kernel.org
> Signed-off-by: Anshuman Khandual 
> ---
>  arch/arm64/mm/hugetlbpage.c  | 1 +
>  arch/powerpc/include/asm/nohash/32/hugetlb-8xx.h | 1 +
>  arch/sparc/mm/hugetlbpage.c  | 1 +
>  include/linux/hugetlb.h  | 2 +-
>  mm/hugetlb.c | 3 +--
>  mm/vmalloc.c | 1 -
>  6 files changed, 5 insertions(+), 4 deletions(-)

Seems like a reasonable cleanup/simplification to me.

Acked-by: Mike Kravetz 

-- 
Mike Kravetz


Re: [RFC PATCH v1 2/4] mm/hugetlb: Change parameters of arch_make_huge_pte()

2021-04-29 Thread Mike Kravetz
On 4/28/21 9:46 AM, Christophe Leroy wrote:
> At the time being, arch_make_huge_pte() has the following prototype:
> 
>   pte_t arch_make_huge_pte(pte_t entry, struct vm_area_struct *vma,
>struct page *page, int writable);
> 
> vma is used to get the pages shift or size.
> vma is also used on Sparc to get vm_flags.
> page is not used.
> writable is not used.
> 
> In order to use this function without a vma, replace vma by shift
> and flags. Also remove the unused parameters.
> 
> Signed-off-by: Christophe Leroy 
> ---
>  arch/arm64/include/asm/hugetlb.h | 3 +--
>  arch/arm64/mm/hugetlbpage.c  | 5 ++---
>  arch/powerpc/include/asm/nohash/32/hugetlb-8xx.h | 5 ++---
>  arch/sparc/include/asm/pgtable_64.h  | 3 +--
>  arch/sparc/mm/hugetlbpage.c  | 6 ++
>  include/linux/hugetlb.h  | 4 ++--
>  mm/hugetlb.c | 6 --
>  mm/migrate.c | 4 +++-
>  8 files changed, 17 insertions(+), 19 deletions(-)

Hi Christophe,

Sorry, no suggestion for how to make a beautiful generic implementation.

This patch is straightforward.
Acked-by: Mike Kravetz 
-- 
Mike Kravetz


Re: [PATCH V3 3/3] mm/hugetlb: Define a generic fallback for arch_clear_hugepage_flags()

2020-05-11 Thread Mike Kravetz
On 5/7/20 8:07 PM, Anshuman Khandual wrote:
> There are multiple similar definitions for arch_clear_hugepage_flags() on
> various platforms. Lets just add it's generic fallback definition for
> platforms that do not override. This help reduce code duplication.
> 
> Cc: Russell King 
> Cc: Catalin Marinas 
> Cc: Will Deacon 
> Cc: Tony Luck 
> Cc: Fenghua Yu 
> Cc: Thomas Bogendoerfer 
> Cc: "James E.J. Bottomley" 
> Cc: Helge Deller 
> Cc: Benjamin Herrenschmidt 
> Cc: Paul Mackerras 
> Cc: Michael Ellerman 
> Cc: Paul Walmsley 
> Cc: Palmer Dabbelt 
> Cc: Heiko Carstens 
> Cc: Vasily Gorbik 
> Cc: Christian Borntraeger 
> Cc: Yoshinori Sato 
> Cc: Rich Felker 
> Cc: "David S. Miller" 
> Cc: Thomas Gleixner 
> Cc: Ingo Molnar 
> Cc: Borislav Petkov 
> Cc: "H. Peter Anvin" 
> Cc: Mike Kravetz 
> Cc: Andrew Morton 
> Cc: x...@kernel.org
> Cc: linux-arm-ker...@lists.infradead.org
> Cc: linux-i...@vger.kernel.org
> Cc: linux-m...@vger.kernel.org
> Cc: linux-par...@vger.kernel.org
> Cc: linuxppc-dev@lists.ozlabs.org
> Cc: linux-ri...@lists.infradead.org
> Cc: linux-s...@vger.kernel.org
> Cc: linux...@vger.kernel.org
> Cc: sparcli...@vger.kernel.org
> Cc: linux...@kvack.org
> Cc: linux-a...@vger.kernel.org
> Cc: linux-ker...@vger.kernel.org
> Signed-off-by: Anshuman Khandual 

Thanks!
Removing duplicate code is good.

Acked-by: Mike Kravetz 

-- 
Mike Kravetz


Re: [PATCH V3 2/3] mm/hugetlb: Define a generic fallback for is_hugepage_only_range()

2020-05-11 Thread Mike Kravetz
On 5/10/20 8:14 PM, Anshuman Khandual wrote:
> On 05/09/2020 03:52 AM, Mike Kravetz wrote:
>> On 5/7/20 8:07 PM, Anshuman Khandual wrote:
>>
>> Did you try building without CONFIG_HUGETLB_PAGE defined?  I'm guessing
> 
> Yes I did for multiple platforms (s390, arm64, ia64, x86, powerpc etc).
> 
>> that you need a stub for is_hugepage_only_range().  Or, perhaps add this
>> to asm-generic/hugetlb.h?
>>
> There is already a stub (include/linux/hugetlb.h) when !CONFIG_HUGETLB_PAGE.
> 

Thanks!  I missed that stub in the existing code.  I like the removal of
redundant code.

Acked-by: Mike Kravetz 

-- 
Mike Kravetz


Re: [PATCH V3 2/3] mm/hugetlb: Define a generic fallback for is_hugepage_only_range()

2020-05-08 Thread Mike Kravetz
On 5/7/20 8:07 PM, Anshuman Khandual wrote:
> There are multiple similar definitions for is_hugepage_only_range() on
> various platforms. Lets just add it's generic fallback definition for
> platforms that do not override. This help reduce code duplication.
> 
> Cc: Russell King 
> Cc: Catalin Marinas 
> Cc: Will Deacon 
> Cc: Tony Luck 
> Cc: Fenghua Yu 
> Cc: Thomas Bogendoerfer 
> Cc: "James E.J. Bottomley" 
> Cc: Helge Deller 
> Cc: Benjamin Herrenschmidt 
> Cc: Paul Mackerras 
> Cc: Michael Ellerman 
> Cc: Paul Walmsley 
> Cc: Palmer Dabbelt 
> Cc: Heiko Carstens 
> Cc: Vasily Gorbik 
> Cc: Christian Borntraeger 
> Cc: Yoshinori Sato 
> Cc: Rich Felker 
> Cc: "David S. Miller" 
> Cc: Thomas Gleixner 
> Cc: Ingo Molnar 
> Cc: Borislav Petkov 
> Cc: "H. Peter Anvin" 
> Cc: Mike Kravetz 
> Cc: Andrew Morton 
> Cc: x...@kernel.org
> Cc: linux-arm-ker...@lists.infradead.org
> Cc: linux-i...@vger.kernel.org
> Cc: linux-m...@vger.kernel.org
> Cc: linux-par...@vger.kernel.org
> Cc: linuxppc-dev@lists.ozlabs.org
> Cc: linux-ri...@lists.infradead.org
> Cc: linux-s...@vger.kernel.org
> Cc: linux...@vger.kernel.org
> Cc: sparcli...@vger.kernel.org
> Cc: linux...@kvack.org
> Cc: linux-a...@vger.kernel.org
> Cc: linux-ker...@vger.kernel.org
> Signed-off-by: Anshuman Khandual 
> ---
>  arch/arm/include/asm/hugetlb.h | 6 --
>  arch/arm64/include/asm/hugetlb.h   | 6 --
>  arch/ia64/include/asm/hugetlb.h| 1 +
>  arch/mips/include/asm/hugetlb.h| 7 ---
>  arch/parisc/include/asm/hugetlb.h  | 6 --
>  arch/powerpc/include/asm/hugetlb.h | 1 +
>  arch/riscv/include/asm/hugetlb.h   | 6 --
>  arch/s390/include/asm/hugetlb.h| 7 ---
>  arch/sh/include/asm/hugetlb.h  | 6 --
>  arch/sparc/include/asm/hugetlb.h   | 6 --
>  arch/x86/include/asm/hugetlb.h | 6 --
>  include/linux/hugetlb.h| 9 +
>  12 files changed, 11 insertions(+), 56 deletions(-)
> 

> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> index 43a1cef8f0f1..c01c0c6f7fd4 100644
> --- a/include/linux/hugetlb.h
> +++ b/include/linux/hugetlb.h
> @@ -591,6 +591,15 @@ static inline unsigned int blocks_per_huge_page(struct 
> hstate *h)
>  
>  #include 
>  
> +#ifndef is_hugepage_only_range
> +static inline int is_hugepage_only_range(struct mm_struct *mm,
> + unsigned long addr, unsigned long len)
> +{
> + return 0;
> +}
> +#define is_hugepage_only_range is_hugepage_only_range
> +#endif
> +
>  #ifndef arch_make_huge_pte
>  static inline pte_t arch_make_huge_pte(pte_t entry, struct vm_area_struct 
> *vma,
>  struct page *page, int writable)
> 

Did you try building without CONFIG_HUGETLB_PAGE defined?  I'm guessing
that you need a stub for is_hugepage_only_range().  Or, perhaps add this
to asm-generic/hugetlb.h?

-- 
Mike Kravetz


[PATCH v4 2/4] hugetlbfs: move hugepagesz= parsing to arch independent code

2020-04-28 Thread Mike Kravetz
Now that architectures provide arch_hugetlb_valid_size(), parsing
of "hugepagesz=" can be done in architecture independent code.
Create a single routine to handle hugepagesz= parsing and remove
all arch specific routines.  We can also remove the interface
hugetlb_bad_size() as this is no longer used outside arch independent
code.

This also provides consistent behavior of hugetlbfs command line
options.  The hugepagesz= option should only be specified once for
a specific size, but some architectures allow multiple instances.
This appears to be more of an oversight when code was added by some
architectures to set up ALL huge page sizes.
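
The resulting arch independent handler in mm/hugetlb.c (cut off in the
diff below) boils down to roughly the following; the exact message text
is illustrative:

	static int __init hugepagesz_setup(char *s)
	{
		unsigned long size;

		size = (unsigned long)memparse(s, NULL);

		if (!arch_hugetlb_valid_size(size)) {
			pr_err("HugeTLB: unsupported hugepagesz %s\n", s);
			return 0;
		}

		hugetlb_add_hstate(ilog2(size) - PAGE_SHIFT);
		return 1;
	}
	__setup("hugepagesz=", hugepagesz_setup);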

Signed-off-by: Mike Kravetz 
Acked-by: Mina Almasry 
Reviewed-by: Peter Xu 
Acked-by: Gerald Schaefer   [s390]
Acked-by: Will Deacon 
---
 arch/arm64/mm/hugetlbpage.c   | 15 ---
 arch/powerpc/mm/hugetlbpage.c | 15 ---
 arch/riscv/mm/hugetlbpage.c   | 16 
 arch/s390/mm/hugetlbpage.c| 18 --
 arch/sparc/mm/init_64.c   | 22 --
 arch/x86/mm/hugetlbpage.c | 16 
 include/linux/hugetlb.h   |  1 -
 mm/hugetlb.c  | 23 +--
 8 files changed, 17 insertions(+), 109 deletions(-)

diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
index 069b96ee2aec..f706b821aba6 100644
--- a/arch/arm64/mm/hugetlbpage.c
+++ b/arch/arm64/mm/hugetlbpage.c
@@ -476,18 +476,3 @@ bool __init arch_hugetlb_valid_size(unsigned long size)
 
return false;
 }
-
-static __init int setup_hugepagesz(char *opt)
-{
-   unsigned long ps = memparse(opt, &opt);
-
-   if (arch_hugetlb_valid_size(ps)) {
-   add_huge_page_size(ps);
-   return 1;
-   }
-
-   hugetlb_bad_size();
-   pr_err("hugepagesz: Unsupported page size %lu K\n", ps >> 10);
-   return 0;
-}
-__setup("hugepagesz=", setup_hugepagesz);
diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c
index de54d2a37830..2c3fa0a7787b 100644
--- a/arch/powerpc/mm/hugetlbpage.c
+++ b/arch/powerpc/mm/hugetlbpage.c
@@ -589,21 +589,6 @@ static int __init add_huge_page_size(unsigned long long 
size)
return 0;
 }
 
-static int __init hugepage_setup_sz(char *str)
-{
-   unsigned long long size;
-
-   size = memparse(str, &str);
-
-   if (add_huge_page_size(size) != 0) {
-   hugetlb_bad_size();
-   pr_err("Invalid huge page size specified(%llu)\n", size);
-   }
-
-   return 1;
-}
-__setup("hugepagesz=", hugepage_setup_sz);
-
 static int __init hugetlbpage_init(void)
 {
bool configured = false;
diff --git a/arch/riscv/mm/hugetlbpage.c b/arch/riscv/mm/hugetlbpage.c
index da1f516bc451..4e5d7e9f0eef 100644
--- a/arch/riscv/mm/hugetlbpage.c
+++ b/arch/riscv/mm/hugetlbpage.c
@@ -22,22 +22,6 @@ bool __init arch_hugetlb_valid_size(unsigned long size)
return false;
 }
 
-static __init int setup_hugepagesz(char *opt)
-{
-   unsigned long ps = memparse(opt, &opt);
-
-   if (arch_hugetlb_valid_size(ps)) {
-   hugetlb_add_hstate(ilog2(ps) - PAGE_SHIFT);
-   return 1;
-   }
-
-   hugetlb_bad_size();
-   pr_err("hugepagesz: Unsupported page size %lu M\n", ps >> 20);
-   return 0;
-
-}
-__setup("hugepagesz=", setup_hugepagesz);
-
 #ifdef CONFIG_CONTIG_ALLOC
 static __init int gigantic_pages_init(void)
 {
diff --git a/arch/s390/mm/hugetlbpage.c b/arch/s390/mm/hugetlbpage.c
index ac25b207624c..242dfc0d462d 100644
--- a/arch/s390/mm/hugetlbpage.c
+++ b/arch/s390/mm/hugetlbpage.c
@@ -261,24 +261,6 @@ bool __init arch_hugetlb_valid_size(unsigned long size)
return false;
 }
 
-static __init int setup_hugepagesz(char *opt)
-{
-   unsigned long size;
-   char *string = opt;
-
-   size = memparse(opt, &opt);
-   if (arch_hugetlb_valid_size(size)) {
-   hugetlb_add_hstate(ilog2(size) - PAGE_SHIFT);
-   } else {
-   hugetlb_bad_size();
-   pr_err("hugepagesz= specifies an unsupported page size %s\n",
-   string);
-   return 0;
-   }
-   return 1;
-}
-__setup("hugepagesz=", setup_hugepagesz);
-
 static unsigned long hugetlb_get_unmapped_area_bottomup(struct file *file,
unsigned long addr, unsigned long len,
unsigned long pgoff, unsigned long flags)
diff --git a/arch/sparc/mm/init_64.c b/arch/sparc/mm/init_64.c
index 2bfe8e22b706..4618f96fd30f 100644
--- a/arch/sparc/mm/init_64.c
+++ b/arch/sparc/mm/init_64.c
@@ -397,28 +397,6 @@ bool __init arch_hugetlb_valid_size(unsigned long size)
 
return true;
 }
-
-static int __init setup_hugepagesz(char *string)
-{
-   unsigned long long hugepage_size;
-   int rc = 0;
-
-   hugepage_size = memparse(string, &string);
-
-   if (!arch_hugetlb_valid_size((unsig

[PATCH v4 3/4] hugetlbfs: remove hugetlb_add_hstate() warning for existing hstate

2020-04-28 Thread Mike Kravetz
The routine hugetlb_add_hstate prints a warning if the hstate already
exists.  This was originally done as part of kernel command line
parsing.  If 'hugepagesz=' was specified more than once, the warning
pr_warn("hugepagesz= specified twice, ignoring\n");
would be printed.

Some architectures want to enable all huge page sizes.  They would
call hugetlb_add_hstate for all supported sizes.  However, this was
done after command line processing and as a result hstates could have
already been created for some sizes.  To make sure no warnings were
printed, there would often be code like:
	if (!size_to_hstate(size))
		hugetlb_add_hstate(ilog2(size) - PAGE_SHIFT);

The only time we want to print the warning is as the result of command
line processing.  So, remove the warning from hugetlb_add_hstate and
add it to the single arch independent routine processing "hugepagesz=".
After this, calls to size_to_hstate() in arch specific code can be
removed and hugetlb_add_hstate can be called without worrying about
warning messages.

Signed-off-by: Mike Kravetz 
Acked-by: Mina Almasry 
Acked-by: Gerald Schaefer   [s390]
Acked-by: Will Deacon 
Tested-by: Anders Roxell 
---
 arch/arm64/mm/hugetlbpage.c   | 16 
 arch/powerpc/mm/hugetlbpage.c |  3 +--
 arch/riscv/mm/hugetlbpage.c   |  2 +-
 arch/sparc/mm/init_64.c   | 19 ---
 arch/x86/mm/hugetlbpage.c |  2 +-
 mm/hugetlb.c  |  9 ++---
 6 files changed, 17 insertions(+), 34 deletions(-)

diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
index f706b821aba6..14bed8f4674a 100644
--- a/arch/arm64/mm/hugetlbpage.c
+++ b/arch/arm64/mm/hugetlbpage.c
@@ -441,22 +441,14 @@ void huge_ptep_clear_flush(struct vm_area_struct *vma,
clear_flush(vma->vm_mm, addr, ptep, pgsize, ncontig);
 }
 
-static void __init add_huge_page_size(unsigned long size)
-{
-   if (size_to_hstate(size))
-   return;
-
-   hugetlb_add_hstate(ilog2(size) - PAGE_SHIFT);
-}
-
 static int __init hugetlbpage_init(void)
 {
 #ifdef CONFIG_ARM64_4K_PAGES
-   add_huge_page_size(PUD_SIZE);
+   hugetlb_add_hstate(PUD_SHIFT - PAGE_SHIFT);
 #endif
-   add_huge_page_size(CONT_PMD_SIZE);
-   add_huge_page_size(PMD_SIZE);
-   add_huge_page_size(CONT_PTE_SIZE);
+   hugetlb_add_hstate((CONT_PMD_SHIFT + PMD_SHIFT) - PAGE_SHIFT);
+   hugetlb_add_hstate(PMD_SHIFT - PAGE_SHIFT);
+   hugetlb_add_hstate((CONT_PTE_SHIFT + PAGE_SHIFT) - PAGE_SHIFT);
 
return 0;
 }
diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c
index 2c3fa0a7787b..4d5ed1093615 100644
--- a/arch/powerpc/mm/hugetlbpage.c
+++ b/arch/powerpc/mm/hugetlbpage.c
@@ -584,8 +584,7 @@ static int __init add_huge_page_size(unsigned long long 
size)
if (!arch_hugetlb_valid_size((unsigned long)size))
return -EINVAL;
 
-   if (!size_to_hstate(size))
-   hugetlb_add_hstate(shift - PAGE_SHIFT);
+   hugetlb_add_hstate(shift - PAGE_SHIFT);
return 0;
 }
 
diff --git a/arch/riscv/mm/hugetlbpage.c b/arch/riscv/mm/hugetlbpage.c
index 4e5d7e9f0eef..932dadfdca54 100644
--- a/arch/riscv/mm/hugetlbpage.c
+++ b/arch/riscv/mm/hugetlbpage.c
@@ -26,7 +26,7 @@ bool __init arch_hugetlb_valid_size(unsigned long size)
 static __init int gigantic_pages_init(void)
 {
/* With CONTIG_ALLOC, we can allocate gigantic pages at runtime */
-   if (IS_ENABLED(CONFIG_64BIT) && !size_to_hstate(1UL << PUD_SHIFT))
+   if (IS_ENABLED(CONFIG_64BIT))
hugetlb_add_hstate(PUD_SHIFT - PAGE_SHIFT);
return 0;
 }
diff --git a/arch/sparc/mm/init_64.c b/arch/sparc/mm/init_64.c
index 4618f96fd30f..ae819a16d07a 100644
--- a/arch/sparc/mm/init_64.c
+++ b/arch/sparc/mm/init_64.c
@@ -325,23 +325,12 @@ static void __update_mmu_tsb_insert(struct mm_struct *mm, 
unsigned long tsb_inde
 }
 
 #ifdef CONFIG_HUGETLB_PAGE
-static void __init add_huge_page_size(unsigned long size)
-{
-   unsigned int order;
-
-   if (size_to_hstate(size))
-   return;
-
-   order = ilog2(size) - PAGE_SHIFT;
-   hugetlb_add_hstate(order);
-}
-
 static int __init hugetlbpage_init(void)
 {
-   add_huge_page_size(1UL << HPAGE_64K_SHIFT);
-   add_huge_page_size(1UL << HPAGE_SHIFT);
-   add_huge_page_size(1UL << HPAGE_256MB_SHIFT);
-   add_huge_page_size(1UL << HPAGE_2GB_SHIFT);
+   hugetlb_add_hstate(HPAGE_64K_SHIFT - PAGE_SHIFT);
+   hugetlb_add_hstate(HPAGE_SHIFT - PAGE_SHIFT);
+   hugetlb_add_hstate(HPAGE_256MB_SHIFT - PAGE_SHIFT);
+   hugetlb_add_hstate(HPAGE_2GB_SHIFT - PAGE_SHIFT);
 
return 0;
 }
diff --git a/arch/x86/mm/hugetlbpage.c b/arch/x86/mm/hugetlbpage.c
index 937d640a89e3..cf5781142716 100644
--- a/arch/x86/mm/hugetlbpage.c
+++ b/arch/x86/mm/hugetlbpage.c
@@ -195,7 +195,7 @@ bool __init arch_hugetlb_valid_size(unsigned long size)
 static __ini

[PATCH v4 4/4] hugetlbfs: clean up command line processing

2020-04-28 Thread Mike Kravetz
With all hugetlb page processing done in a single file clean up code.
- Make code match desired semantics
  - Update documentation with semantics
- Make all warnings and errors messages start with 'HugeTLB:'.
- Consistently name command line parsing routines.
- Warn if !hugepages_supported() and command line parameters have
  been specified.
- Add comments to code
  - Describe some of the subtle interactions
  - Describe semantics of command line arguments

This patch also fixes issues with implicitly setting the number of
gigantic huge pages to preallocate.  Previously on X86 command line,
hugepages=2 default_hugepagesz=1G
would result in zero 1G pages being preallocated and,
# grep HugePages_Total /proc/meminfo
HugePages_Total:   0
# sysctl -a | grep nr_hugepages
vm.nr_hugepages = 2
vm.nr_hugepages_mempolicy = 2
# cat /proc/sys/vm/nr_hugepages
2
After this patch 2 gigantic pages will be preallocated and all the
proc, sysfs, sysctl and meminfo files will accurately reflect this.

To address the issue with gigantic pages, a small change in behavior
was made to command line processing.  Previously the command line,
hugepages=128 default_hugepagesz=2M hugepagesz=2M hugepages=256
would result in the allocation of 256 2M huge pages.  The value 128
would be ignored without any warning.  After this patch, 128 2M pages
will be allocated and a warning message will be displayed indicating
the value of 256 is ignored.  This change in behavior is required
because allocation of implicitly specified gigantic pages must be done
when the default_hugepagesz= is encountered for gigantic pages.
Previously the code waited until later in the boot process (hugetlb_init),
to allocate pages of default size.  However the bootmem allocator required
for gigantic allocations is not available at this time.

Signed-off-by: Mike Kravetz 
Acked-by: Gerald Schaefer   [s390]
Acked-by: Will Deacon 
Tested-by: Sandipan Das 
---
 .../admin-guide/kernel-parameters.txt |  40 +++--
 Documentation/admin-guide/mm/hugetlbpage.rst  |  35 
 mm/hugetlb.c  | 149 ++
 3 files changed, 179 insertions(+), 45 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt 
b/Documentation/admin-guide/kernel-parameters.txt
index 7bc83f3d9bdf..cbe657b86d0e 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -834,12 +834,15 @@
See also Documentation/networking/decnet.txt.
 
default_hugepagesz=
-   [same as hugepagesz=] The size of the default
-   HugeTLB page size. This is the size represented by
-   the legacy /proc/ hugepages APIs, used for SHM, and
-   default size when mounting hugetlbfs filesystems.
-   Defaults to the default architecture's huge page size
-   if not specified.
+   [HW] The size of the default HugeTLB page. This is
+   the size represented by the legacy /proc/ hugepages
+   APIs.  In addition, this is the default hugetlb size
+   used for shmget(), mmap() and mounting hugetlbfs
+   filesystems.  If not specified, defaults to the
+   architecture's default huge page size.  Huge page
+   sizes are architecture dependent.  See also
+   Documentation/admin-guide/mm/hugetlbpage.rst.
+   Format: size[KMG]
 
deferred_probe_timeout=
[KNL] Debugging option to set a timeout in seconds for
@@ -1479,13 +1482,24 @@
hugepages using the cma allocator. If enabled, the
boot-time allocation of gigantic hugepages is skipped.
 
-   hugepages=  [HW,X86-32,IA-64] HugeTLB pages to allocate at boot.
-   hugepagesz= [HW,IA-64,PPC,X86-64] The size of the HugeTLB pages.
-   On x86-64 and powerpc, this option can be specified
-   multiple times interleaved with hugepages= to reserve
-   huge pages of different sizes. Valid pages sizes on
-   x86-64 are 2M (when the CPU supports "pse") and 1G
-   (when the CPU supports the "pdpe1gb" cpuinfo flag).
+   hugepages=  [HW] Number of HugeTLB pages to allocate at boot.
+   If this follows hugepagesz (below), it specifies
+   the number of pages of hugepagesz to be allocated.
+   If this is the first HugeTLB parameter on the command
+   line, it specifies the number of pages to allocate for
+   the default huge page size.  See also
+   Document

[PATCH v4 0/4] Clean up hugetlb boot command line processing

2020-04-28 Thread Mike Kravetz
v4 -
   Fixed huge page order definitions for arm64 (Qian Cai)
   Removed hugepages_supported() checks in command line processing as
 powerpc does not set hugepages_supported until later in boot (Sandipan)
   Added Acks, Reviews and Tested (Will, Gerald, Anders, Sandipan)

v3 -
   Used weak attribute method of defining arch_hugetlb_valid_size.
 This eliminates changes to arch specific hugetlb.h files (Peter)
   Updated documentation (Peter, Randy)
   Fixed handling of implicitly specified gigantic page preallocation
 in existing code and removed documentation of such.  There is now
 no difference between handling of gigantic and non-gigantic pages.
 (Peter, Nitesh).
 This requires the most review as there is a small change to
 undocumented behavior.  See patch 4 commit message for details.
   Added Acks and Reviews (Mina, Peter)

v2 -
   Fix build errors with patch 1 (Will)
   Change arch_hugetlb_valid_size arg to unsigned long and remove
 irrelevant 'extern' keyword (Christophe)
   Documentation and other misc changes (Randy, Christophe, Mina)
   Do not process command line options if !hugepages_supported()
 (Dave, but it sounds like we may want to make additional changes to
  hugepages_supported() for x86?  If that is needed I would prefer
  a separate patch.)

Longpeng(Mike) reported a weird message from hugetlb command line processing
and proposed a solution [1].  While the proposed patch does address the
specific issue, there are other related issues in command line processing.
As hugetlbfs evolved, updates to command line processing have been made to
meet immediate needs and not necessarily in a coordinated manner.  The result
is that some processing is done in arch specific code, some is done in arch
independent code and coordination is problematic.  Semantics can vary between
architectures.

The patch series does the following:
- Define arch specific arch_hugetlb_valid_size routine used to validate
  passed huge page sizes.
- Move hugepagesz= command line parsing out of arch specific code and into
  an arch independent routine.
- Clean up command line processing to follow desired semantics and
  document those semantics.

[1] https://lore.kernel.org/linux-mm/20200305033014.1152-1-longpe...@huawei.com

Mike Kravetz (4):
  hugetlbfs: add arch_hugetlb_valid_size
  hugetlbfs: move hugepagesz= parsing to arch independent code
  hugetlbfs: remove hugetlb_add_hstate() warning for existing hstate
  hugetlbfs: clean up command line processing

 .../admin-guide/kernel-parameters.txt |  40 ++--
 Documentation/admin-guide/mm/hugetlbpage.rst  |  35 
 arch/arm64/mm/hugetlbpage.c   |  30 +--
 arch/powerpc/mm/hugetlbpage.c |  30 +--
 arch/riscv/mm/hugetlbpage.c   |  24 +--
 arch/s390/mm/hugetlbpage.c|  24 +--
 arch/sparc/mm/init_64.c   |  43 +
 arch/x86/mm/hugetlbpage.c |  23 +--
 include/linux/hugetlb.h   |   2 +-
 mm/hugetlb.c  | 180 ++
 10 files changed, 260 insertions(+), 171 deletions(-)

-- 
2.25.4



[PATCH v4 1/4] hugetlbfs: add arch_hugetlb_valid_size

2020-04-28 Thread Mike Kravetz
The architecture independent routine hugetlb_default_setup sets up
the default huge page size.  It has no way to verify if the passed
value is valid, so it accepts it and attempts to validate at a later
time.  This requires undocumented cooperation between the arch specific
and arch independent code.

For architectures that support more than one huge page size, provide
a routine arch_hugetlb_valid_size to validate a huge page size.
hugetlb_default_setup can use this to validate passed values.

arch_hugetlb_valid_size will also be used in a subsequent patch to
move processing of the "hugepagesz=" option from arch specific code to a common
routine in arch independent code.
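
As a rough sketch of the resulting interface (not the literal patch; the
weak default simply accepts the architecture's single HPAGE_SIZE, and
default_hstate_size is the existing variable set by hugetlb_default_setup):

	/* mm/hugetlb.c: weak fallback, overridden by archs with more sizes */
	bool __init __weak arch_hugetlb_valid_size(unsigned long size)
	{
		return size == HPAGE_SIZE;
	}

	/* hugetlb_default_setup() can now reject bad values at parse time */
	static int __init hugetlb_default_setup(char *s)
	{
		unsigned long size = (unsigned long)memparse(s, NULL);

		if (!arch_hugetlb_valid_size(size)) {
			pr_err("HugeTLB: unsupported default_hugepagesz %s\n", s);
			return 0;
		}

		default_hstate_size = size;
		return 1;
	}
	__setup("default_hugepagesz=", hugetlb_default_setup);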

Signed-off-by: Mike Kravetz 
Acked-by: Gerald Schaefer   [s390]
Acked-by: Will Deacon 
---
 arch/arm64/mm/hugetlbpage.c   | 17 +
 arch/powerpc/mm/hugetlbpage.c | 20 +---
 arch/riscv/mm/hugetlbpage.c   | 26 +-
 arch/s390/mm/hugetlbpage.c| 16 
 arch/sparc/mm/init_64.c   | 24 
 arch/x86/mm/hugetlbpage.c | 17 +
 include/linux/hugetlb.h   |  1 +
 mm/hugetlb.c  | 21 ++---
 8 files changed, 103 insertions(+), 39 deletions(-)

diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
index bbeb6a5a6ba6..069b96ee2aec 100644
--- a/arch/arm64/mm/hugetlbpage.c
+++ b/arch/arm64/mm/hugetlbpage.c
@@ -462,17 +462,26 @@ static int __init hugetlbpage_init(void)
 }
 arch_initcall(hugetlbpage_init);
 
-static __init int setup_hugepagesz(char *opt)
+bool __init arch_hugetlb_valid_size(unsigned long size)
 {
-   unsigned long ps = memparse(opt, &opt);
-
-   switch (ps) {
+   switch (size) {
 #ifdef CONFIG_ARM64_4K_PAGES
case PUD_SIZE:
 #endif
case CONT_PMD_SIZE:
case PMD_SIZE:
case CONT_PTE_SIZE:
+   return true;
+   }
+
+   return false;
+}
+
+static __init int setup_hugepagesz(char *opt)
+{
+   unsigned long ps = memparse(opt, &opt);
+
+   if (arch_hugetlb_valid_size(ps)) {
add_huge_page_size(ps);
return 1;
}
diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c
index 33b3461d91e8..de54d2a37830 100644
--- a/arch/powerpc/mm/hugetlbpage.c
+++ b/arch/powerpc/mm/hugetlbpage.c
@@ -558,7 +558,7 @@ unsigned long vma_mmu_pagesize(struct vm_area_struct *vma)
return vma_kernel_pagesize(vma);
 }
 
-static int __init add_huge_page_size(unsigned long long size)
+bool __init arch_hugetlb_valid_size(unsigned long size)
 {
int shift = __ffs(size);
int mmu_psize;
@@ -566,20 +566,26 @@ static int __init add_huge_page_size(unsigned long long 
size)
/* Check that it is a page size supported by the hardware and
 * that it fits within pagetable and slice limits. */
if (size <= PAGE_SIZE || !is_power_of_2(size))
-   return -EINVAL;
+   return false;
 
mmu_psize = check_and_get_huge_psize(shift);
if (mmu_psize < 0)
-   return -EINVAL;
+   return false;
 
BUG_ON(mmu_psize_defs[mmu_psize].shift != shift);
 
-   /* Return if huge page size has already been setup */
-   if (size_to_hstate(size))
-   return 0;
+   return true;
+}
 
-   hugetlb_add_hstate(shift - PAGE_SHIFT);
+static int __init add_huge_page_size(unsigned long long size)
+{
+   int shift = __ffs(size);
+
+   if (!arch_hugetlb_valid_size((unsigned long)size))
+   return -EINVAL;
 
+   if (!size_to_hstate(size))
+   hugetlb_add_hstate(shift - PAGE_SHIFT);
return 0;
 }
 
diff --git a/arch/riscv/mm/hugetlbpage.c b/arch/riscv/mm/hugetlbpage.c
index a6189ed36c5f..da1f516bc451 100644
--- a/arch/riscv/mm/hugetlbpage.c
+++ b/arch/riscv/mm/hugetlbpage.c
@@ -12,21 +12,29 @@ int pmd_huge(pmd_t pmd)
return pmd_leaf(pmd);
 }
 
+bool __init arch_hugetlb_valid_size(unsigned long size)
+{
+   if (size == HPAGE_SIZE)
+   return true;
+   else if (IS_ENABLED(CONFIG_64BIT) && size == PUD_SIZE)
+   return true;
+   else
+   return false;
+}
+
 static __init int setup_hugepagesz(char *opt)
 {
unsigned long ps = memparse(opt, &opt);
 
-   if (ps == HPAGE_SIZE) {
-   hugetlb_add_hstate(HPAGE_SHIFT - PAGE_SHIFT);
-   } else if (IS_ENABLED(CONFIG_64BIT) && ps == PUD_SIZE) {
-   hugetlb_add_hstate(PUD_SHIFT - PAGE_SHIFT);
-   } else {
-   hugetlb_bad_size();
-   pr_err("hugepagesz: Unsupported page size %lu M\n", ps >> 20);
-   return 0;
+   if (arch_hugetlb_valid_size(ps)) {
+   hugetlb_add_hstate(ilog2(ps) - PAGE_SHIFT);
+   return 1;
}
 
-   return 1;
+   hugetlb_bad_size();
+   pr_err("hugepagesz: Unsupported page size %lu M\n", ps >> 20);

Re: [PATCH v3 2/4] hugetlbfs: move hugepagesz= parsing to arch independent code

2020-04-27 Thread Mike Kravetz
On 4/27/20 1:18 PM, Andrew Morton wrote:
> On Mon, 27 Apr 2020 12:09:47 -0700 Mike Kravetz  
> wrote:
> 
>> Previously, a check for hugepages_supported was added before processing
>> hugetlb command line parameters.  On some architectures such as powerpc,
>> hugepages_supported() is not set to true until after command line
>> processing.  Therefore, no hugetlb command line parameters would be
>> accepted.
>>
>> Remove the additional checks for hugepages_supported.  In hugetlb_init,
>> print a warning if !hugepages_supported and command line parameters were
>> specified.
> 
> This applies to [4/4] instead of fixing [2/4].  I guess you'll
> straighten that out in v4?

Yes.

> btw, was
> http://lkml.kernel.org/r/CADYN=9Koefrq9H1Y82Q8nMNbeyN4tzhEfvDu5u=svfjfzcy...@mail.gmail.com
> addressed?

Yes, you pulled a patch into your tree to address this.
hugetlbfs-remove-hugetlb_add_hstate-warning-for-existing-hstate-fix.patch

I'll send out a v4 with both these issues addressed.  Would like to wait
until receiving confirmation from someone who can test on powerpc.
-- 
Mike Kravetz


Re: [PATCH v3 2/4] hugetlbfs: move hugepagesz= parsing to arch independent code

2020-04-27 Thread Mike Kravetz
On 4/27/20 10:25 AM, Mike Kravetz wrote:
> On 4/26/20 10:04 PM, Sandipan Das wrote:
>> On 18/04/20 12:20 am, Mike Kravetz wrote:
>>> Now that architectures provide arch_hugetlb_valid_size(), parsing
>>> of "hugepagesz=" can be done in architecture independent code.
>>
>> This isn't working as expected on powerpc64.
>>
>>   [0.00] Kernel command line: 
>> root=UUID=dc7b49cf-95a2-4996-8e7d-7c64ddc7a6ff hugepagesz=16G hugepages=2 
>>   [0.00] HugeTLB: huge pages not supported, ignoring hugepagesz = 16G
>>   [0.00] HugeTLB: huge pages not supported, ignoring hugepages = 2
>>   [0.284177] HugeTLB registered 16.0 MiB page size, pre-allocated 0 pages
>>   [0.284182] HugeTLB registered 16.0 GiB page size, pre-allocated 0 pages
>>   [2.585062] hugepagesz=16G
>>   [2.585063] hugepages=2
>>
> 
> In the new arch independent version of hugepages_setup, I added the following
> code in patch 4 of this series:
> 
>> +if (!hugepages_supported()) {
>> +pr_warn("HugeTLB: huge pages not supported, ignoring hugepages 
>> = %s\n", s);
>> +return 0;
>> +}
>> +
> 
> The easy solution is to remove all the hugepages_supported() checks from
> command line parsing routines and rely on the later check in hugetlb_init().

Here is a patch to address the issue.  Sorry, my series breaks all hugetlb
command line processing on powerpc.

Sandipan, can you test the following patch?

From 480fe2847361e2a85aeec1fb39fe643bb7100a07 Mon Sep 17 00:00:00 2001
From: Mike Kravetz 
Date: Mon, 27 Apr 2020 11:37:30 -0700
Subject: [PATCH] hugetlbfs: fix changes to command line processing

Previously, a check for hugepages_supported was added before processing
hugetlb command line parameters.  On some architectures such as powerpc,
hugepages_supported() is not set to true until after command line
processing.  Therefore, no hugetlb command line parameters would be
accepted.

Remove the additional checks for hugepages_supported.  In hugetlb_init,
print a warning if !hugepages_supported and command line parameters were
specified.

Signed-off-by: Mike Kravetz 
---
 mm/hugetlb.c | 20 
 1 file changed, 4 insertions(+), 16 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 1075abdb5717..5548e8851b93 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -3212,8 +3212,11 @@ static int __init hugetlb_init(void)
 {
int i;
 
-   if (!hugepages_supported())
+   if (!hugepages_supported()) {
+   if (hugetlb_max_hstate || default_hstate_max_huge_pages)
+   pr_warn("HugeTLB: huge pages not supported, ignoring 
associated command-line parameters\n");
return 0;
+   }
 
/*
 * Make sure HPAGE_SIZE (HUGETLB_PAGE_ORDER) hstate exists.  Some
@@ -3315,11 +3318,6 @@ static int __init hugepages_setup(char *s)
unsigned long *mhp;
static unsigned long *last_mhp;
 
-   if (!hugepages_supported()) {
-   pr_warn("HugeTLB: huge pages not supported, ignoring hugepages 
= %s\n", s);
-   return 0;
-   }
-
if (!parsed_valid_hugepagesz) {
pr_warn("HugeTLB: hugepages=%s does not follow a valid 
hugepagesz, ignoring\n", s);
parsed_valid_hugepagesz = true;
@@ -3372,11 +3370,6 @@ static int __init hugepagesz_setup(char *s)
struct hstate *h;
 
parsed_valid_hugepagesz = false;
-   if (!hugepages_supported()) {
-   pr_warn("HugeTLB: huge pages not supported, ignoring hugepagesz 
= %s\n", s);
-   return 0;
-   }
-
size = (unsigned long)memparse(s, NULL);
 
if (!arch_hugetlb_valid_size(size)) {
@@ -3424,11 +3417,6 @@ static int __init default_hugepagesz_setup(char *s)
unsigned long size;
 
parsed_valid_hugepagesz = false;
-   if (!hugepages_supported()) {
-   pr_warn("HugeTLB: huge pages not supported, ignoring 
default_hugepagesz = %s\n", s);
-   return 0;
-   }
-
if (parsed_default_hugepagesz) {
pr_err("HugeTLB: default_hugepagesz previously specified, 
ignoring %s\n", s);
return 0;
-- 
2.25.4



Re: [PATCH v3 2/4] hugetlbfs: move hugepagesz= parsing to arch independent code

2020-04-27 Thread Mike Kravetz
On 4/26/20 10:04 PM, Sandipan Das wrote:
> Hi Mike,
> 
> On 18/04/20 12:20 am, Mike Kravetz wrote:
>> Now that architectures provide arch_hugetlb_valid_size(), parsing
>> of "hugepagesz=" can be done in architecture independent code.
>> Create a single routine to handle hugepagesz= parsing and remove
>> all arch specific routines.  We can also remove the interface
>> hugetlb_bad_size() as this is no longer used outside arch independent
>> code.
>>
>> This also provides consistent behavior of hugetlbfs command line
>> options.  The hugepagesz= option should only be specified once for
>> a specific size, but some architectures allow multiple instances.
>> This appears to be more of an oversight when code was added by some
>> architectures to set up ALL huge page sizes.
>>
>> [...]
>>
>> diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c
>> index de54d2a37830..2c3fa0a7787b 100644
>> --- a/arch/powerpc/mm/hugetlbpage.c
>> +++ b/arch/powerpc/mm/hugetlbpage.c
>> @@ -589,21 +589,6 @@ static int __init add_huge_page_size(unsigned long long 
>> size)
>>  return 0;
>>  }
>>  
>> -static int __init hugepage_setup_sz(char *str)
>> -{
>> -unsigned long long size;
>> -
>> -size = memparse(str, );
>> -
>> -if (add_huge_page_size(size) != 0) {
>> -hugetlb_bad_size();
>> -pr_err("Invalid huge page size specified(%llu)\n", size);
>> -}
>> -
>> -return 1;
>> -}
>> -__setup("hugepagesz=", hugepage_setup_sz);
>> -
>> [...]
> 
> This isn't working as expected on powerpc64.
> 
>   [0.00] Kernel command line: 
> root=UUID=dc7b49cf-95a2-4996-8e7d-7c64ddc7a6ff hugepagesz=16G hugepages=2 
>   [0.00] HugeTLB: huge pages not supported, ignoring hugepagesz = 16G
>   [0.00] HugeTLB: huge pages not supported, ignoring hugepages = 2
>   [0.284177] HugeTLB registered 16.0 MiB page size, pre-allocated 0 pages
>   [0.284182] HugeTLB registered 16.0 GiB page size, pre-allocated 0 pages
>   [2.585062] hugepagesz=16G
>   [2.585063] hugepages=2
> 
> The "huge pages not supported" messages are under a !hugepages_supported()
> condition which checks if HPAGE_SHIFT is non-zero. On powerpc64, HPAGE_SHIFT
> comes from the hpage_shift variable. At this point, it is still zero and yet
> to be set. Hence the check fails. The reason being hugetlbpage_init_default(),
> which sets hpage_shift, is now called after hugepage_setup_sz().

Thanks for catching this Sandipan.

In the new arch independent version of hugepages_setup, I added the following
code in patch 4 of this series:

> +static int __init hugepages_setup(char *s)
>  {
>   unsigned long *mhp;
>   static unsigned long *last_mhp;
>  
> + if (!hugepages_supported()) {
> + pr_warn("HugeTLB: huge pages not supported, ignoring hugepages 
> = %s\n", s);
> + return 0;
> + }
> +
>   if (!parsed_valid_hugepagesz) {

In fact, I added it to the beginning of all the hugetlb command line parsing
routines.  My 'thought' was to warn early if hugetlb pages were not supported.
Previously, the first check for hugepages_supported() was in hugetlb_init()
which ran after hugetlbpage_init_default().

The easy solution is to remove all the hugepages_supported() checks from
command line parsing routines and rely on the later check in hugetlb_init().

Another reason for adding those early checks was to possibly prevent the
preallocation of gigantic pages at command line parsing time.   Gigantic
pages are allocated at command line parsing time as they need to be allocated
with the bootmem allocator.  My concern is that there could be some strange
configuration where !hugepages_supported(), yet we allocate gigantic pages
from bootmem that cannot be used or freed later.

powerpc is the only architecture which has its own alloc_bootmem_huge_page
routine.  So, it handles this potential issue.

I'll send out a fix shortly.
-- 
Mike Kravetz


Re: [PATCH v3 3/4] hugetlbfs: remove hugetlb_add_hstate() warning for existing hstate

2020-04-22 Thread Mike Kravetz
On 4/22/20 3:42 AM, Aneesh Kumar K.V wrote:
> Mike Kravetz  writes:
> 
>> The routine hugetlb_add_hstate prints a warning if the hstate already
>> exists.  This was originally done as part of kernel command line
>> parsing.  If 'hugepagesz=' was specified more than once, the warning
>>  pr_warn("hugepagesz= specified twice, ignoring\n");
>> would be printed.
>>
>> Some architectures want to enable all huge page sizes.  They would
>> call hugetlb_add_hstate for all supported sizes.  However, this was
>> done after command line processing and as a result hstates could have
>> already been created for some sizes.  To make sure no warnings were
>> printed, there would often be code like:
>>  if (!size_to_hstate(size))
>>  hugetlb_add_hstate(ilog2(size) - PAGE_SHIFT);
>>
>> The only time we want to print the warning is as the result of command
>> line processing.
> 
> Does this patch break hugepages=x command line? I haven't tested this
> yet. But one of the details w.r.t. skipping that hugetlb_add_hstate is
> to make sure we can configure the max_huge_pages. 
> 

Are you asking about hugepages=x being the only option on the command line?
If so, then the behavior is not changed.  This will result in x pages of the
default huge page size being allocated, where the default huge page size is
of course architecture dependent.  On an x86 VM,

[0.040474] Kernel command line: BOOT_IMAGE=/vmlinuz-5.6.0-mm1+ 
root=/dev/mapper/fedora_new--host-root ro rd.lvm.lv=fedora_new-host/root 
rd.lvm.lv=fedora_new-host/swap console=tty0 console=ttyS0,115200 audit=0 
transparent_hugepage=always hugepages=128
[0.332618] HugeTLB registered 1.00 GiB page size, pre-allocated 0 pages
[0.333245] HugeTLB registered 2.00 MiB page size, pre-allocated 128 pages

BTW - Here are the command line options I tested on x86 with this series.

No errors or warnings
-
hugepages=128
hugepagesz=2M hugepages=128
default_hugepagesz=2M hugepages=128
hugepages=128 default_hugepagesz=2M
hugepagesz=1G hugepages=2
hugepages=2 default_hugepagesz=1G
default_hugepagesz=1G hugepages=2
hugepages=128 hugepagesz=1G hugepages=2
hugepagesz=1G hugepages=2 hugepagesz=2M hugepages=128
default_hugepagesz=2M hugepages=128 hugepagesz=1G hugepages=2
hugepages=128 default_hugepagesz=2M hugepagesz=1G hugepages=2
hugepages=2 default_hugepagesz=1G hugepagesz=2M hugepages=128
default_hugepagesz=1G hugepages=2 hugepagesz=2M hugepages=128
default_hugepagesz=2M hugepagesz=2M hugepages=128
default_hugepagesz=2M hugepagesz=1G hugepages=2 hugepagesz=2M hugepages=128

Error or warning

hugepages=128 hugepagesz=2M hugepages=256
hugepagesz=2M hugepages=128 hugepagesz=2M hugepages=256
default_hugepagesz=2M hugepages=128 hugepagesz=2M hugepages=256
hugepages=128 hugepages=256
hugepagesz=2M hugepages=128 hugepages=2 default_hugepagesz=1G

-- 
Mike Kravetz


Re: [PATCH v3 0/4] Clean up hugetlb boot command line processing

2020-04-20 Thread Mike Kravetz
On 4/20/20 1:29 PM, Anders Roxell wrote:
> On Mon, 20 Apr 2020 at 20:23, Mike Kravetz  wrote:
>> On 4/20/20 8:34 AM, Qian Cai wrote:
>>>
>>> Reverting this series fixed many undefined behaviors on arm64 with the 
>>> config,
>> While rearranging the code (patch 3 in series), I made the incorrect
>> assumption that CONT_XXX_SIZE == (1UL << CONT_XXX_SHIFT).  However,
>> this is not the case.  Does the following patch fix these issues?
>>
From b75cb4a0852e208bee8c4eb347dc076fcaa88859 Mon Sep 17 00:00:00 2001
>> From: Mike Kravetz 
>> Date: Mon, 20 Apr 2020 10:41:18 -0700
>> Subject: [PATCH] arm64/hugetlb: fix hugetlb initialization
>>
>> When calling hugetlb_add_hstate() to initialize a new hugetlb size,
>> be sure to use correct huge pages size order.
>>
>> Signed-off-by: Mike Kravetz 
>> ---
>>  arch/arm64/mm/hugetlbpage.c | 8 
>>  1 file changed, 4 insertions(+), 4 deletions(-)
>>
>> diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
>> index 9ca840527296..a02411a1f19a 100644
>> --- a/arch/arm64/mm/hugetlbpage.c
>> +++ b/arch/arm64/mm/hugetlbpage.c
>> @@ -453,11 +453,11 @@ void huge_ptep_clear_flush(struct vm_area_struct *vma,
>>  static int __init hugetlbpage_init(void)
>>  {
>>  #ifdef CONFIG_ARM64_4K_PAGES
>> -   hugetlb_add_hstate(PUD_SHIFT - PAGE_SHIFT);
>> +   hugetlb_add_hstate(ilog2(PUD_SIZE) - PAGE_SHIFT);
>>  #endif
>> -   hugetlb_add_hstate(CONT_PMD_SHIFT - PAGE_SHIFT);
>> -   hugetlb_add_hstate(PMD_SHIFT - PAGE_SHIFT);
>> -   hugetlb_add_hstate(CONT_PTE_SHIFT - PAGE_SHIFT);
>> +   hugetlb_add_hstate(ilog2(CONT_PMD_SIZE) - PAGE_SHIFT);
>> +   hugetlb_add_hstate(ilog2(PMD_SIZE) - PAGE_SHIFT);
>> +   hugetlb_add_hstate(ilog2(CONT_PTE_SIZE) - PAGE_SHIFT);
>>
>> return 0;
>>  }
> 
> I built this for an arm64 kernel and ran it in qemu and it worked.

Thanks for testing Anders!

Will, here is an updated version of the patch based on your suggestion.
I added the () for emphasis but that may just be noise for some.  Also,
the naming differences and values for CONT_PTE may make some people
look twice.  Not sure if being consistent here helps?

I have only built this.  No testing.

From daf833ab6b806ecc0816d84d45dcbacc052a7eec Mon Sep 17 00:00:00 2001
From: Mike Kravetz 
Date: Mon, 20 Apr 2020 13:56:15 -0700
Subject: [PATCH] arm64/hugetlb: fix hugetlb initialization

When calling hugetlb_add_hstate() to initialize a new hugetlb size,
be sure to use correct huge pages size order.

Signed-off-by: Mike Kravetz 
---
 arch/arm64/mm/hugetlbpage.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
index 9ca840527296..bed6dc7c4276 100644
--- a/arch/arm64/mm/hugetlbpage.c
+++ b/arch/arm64/mm/hugetlbpage.c
@@ -455,9 +455,9 @@ static int __init hugetlbpage_init(void)
 #ifdef CONFIG_ARM64_4K_PAGES
hugetlb_add_hstate(PUD_SHIFT - PAGE_SHIFT);
 #endif
-   hugetlb_add_hstate(CONT_PMD_SHIFT - PAGE_SHIFT);
+   hugetlb_add_hstate((CONT_PMD_SHIFT + PMD_SHIFT) - PAGE_SHIFT);
hugetlb_add_hstate(PMD_SHIFT - PAGE_SHIFT);
-   hugetlb_add_hstate(CONT_PTE_SHIFT - PAGE_SHIFT);
+   hugetlb_add_hstate((CONT_PTE_SHIFT + PAGE_SHIFT) - PAGE_SHIFT);
 
return 0;
 }
-- 
2.25.2



Re: [PATCH v3 0/4] Clean up hugetlb boot command line processing

2020-04-20 Thread Mike Kravetz
On 4/20/20 8:34 AM, Qian Cai wrote:
> 
> 
>> On Apr 17, 2020, at 2:50 PM, Mike Kravetz  wrote:
>>
>> Longpeng(Mike) reported a weird message from hugetlb command line processing
>> and proposed a solution [1].  While the proposed patch does address the
>> specific issue, there are other related issues in command line processing.
>> As hugetlbfs evolved, updates to command line processing have been made to
>> meet immediate needs and not necessarily in a coordinated manner.  The result
>> is that some processing is done in arch specific code, some is done in arch
>> independent code and coordination is problematic.  Semantics can vary between
>> architectures.
>>
>> The patch series does the following:
>> - Define arch specific arch_hugetlb_valid_size routine used to validate
>>  passed huge page sizes.
>> - Move hugepagesz= command line parsing out of arch specific code and into
>>  an arch independent routine.
>> - Clean up command line processing to follow desired semantics and
>>  document those semantics.
>>
>> [1] 
>> https://lore.kernel.org/linux-mm/20200305033014.1152-1-longpe...@huawei.com
>>
>> Mike Kravetz (4):
>>  hugetlbfs: add arch_hugetlb_valid_size
>>  hugetlbfs: move hugepagesz= parsing to arch independent code
>>  hugetlbfs: remove hugetlb_add_hstate() warning for existing hstate
>>  hugetlbfs: clean up command line processing
> 
> Reverting this series fixed many undefined behaviors on arm64 with the config,
> 
> https://raw.githubusercontent.com/cailca/linux-mm/master/arm64.config
> 
> [   54.172683][T1] UBSAN: shift-out-of-bounds in 
> ./include/linux/hugetlb.h:555:34
> [   54.180411][T1] shift exponent 4294967285 is too large for 64-bit type 
> 'unsigned long'
> [   54.15][T1] CPU: 130 PID: 1 Comm: swapper/0 Not tainted 
> 5.7.0-rc2-next-20200420 #1
> [   54.197284][T1] Hardware name: HPE Apollo 70 
> /C01_APACHE_MB , BIOS L50_5.13_1.11 06/18/2019
> [   54.207888][T1] Call trace:
> [   54.211100][T1]  dump_backtrace+0x0/0x224
> [   54.215565][T1]  show_stack+0x20/0x2c
> [   54.219651][T1]  dump_stack+0xfc/0x184
> [   54.223829][T1]  __ubsan_handle_shift_out_of_bounds+0x304/0x344
> [   54.230204][T1]  hugetlb_add_hstate+0x3ec/0x414
> huge_page_size at include/linux/hugetlb.h:555
> (inlined by) hugetlb_add_hstate at mm/hugetlb.c:3301
> [   54.235191][T1]  hugetlbpage_init+0x14/0x30
> [   54.239824][T1]  do_one_initcall+0x6c/0x144
> [   54.26][T1]  do_initcall_level+0x158/0x1c4
> [   54.249336][T1]  do_initcalls+0x68/0xb0
> [   54.253597][T1]  do_basic_setup+0x28/0x30
> [   54.258049][T1]  kernel_init_freeable+0x19c/0x228
> [   54.263188][T1]  kernel_init+0x14/0x208
> [   54.267473][T1]  ret_from_fork+0x10/0x18

While rearranging the code (patch 3 in series), I made the incorrect
assumption that CONT_XXX_SIZE == (1UL << CONT_XXX_SHIFT).  However,
this is not the case.  Does the following patch fix these issues?
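
To spell out the arithmetic behind the bad shift (numbers below assume a
64K page kernel, which is what the reported shift exponent corresponds to):

	/*
	 * With 64K pages, CONT_PMD_SHIFT is the number of contiguous PMD
	 * bits (5), not a shift from byte granularity, and PAGE_SHIFT is 16:
	 *
	 *   CONT_PMD_SHIFT - PAGE_SHIFT == 5 - 16 == -11
	 *     -> the "shift exponent 4294967285" in the UBSAN splat
	 *
	 *   CONT_PMD_SIZE == CONT_PMDS * PMD_SIZE == 1UL << 34
	 *   ilog2(CONT_PMD_SIZE) - PAGE_SHIFT == 34 - 16 == 18
	 *     -> the hstate order we actually want (a 16G huge page)
	 */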

From b75cb4a0852e208bee8c4eb347dc076fcaa88859 Mon Sep 17 00:00:00 2001
From: Mike Kravetz 
Date: Mon, 20 Apr 2020 10:41:18 -0700
Subject: [PATCH] arm64/hugetlb: fix hugetlb initialization

When calling hugetlb_add_hstate() to initialize a new hugetlb size,
be sure to use correct huge pages size order.

Signed-off-by: Mike Kravetz 
---
 arch/arm64/mm/hugetlbpage.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
index 9ca840527296..a02411a1f19a 100644
--- a/arch/arm64/mm/hugetlbpage.c
+++ b/arch/arm64/mm/hugetlbpage.c
@@ -453,11 +453,11 @@ void huge_ptep_clear_flush(struct vm_area_struct *vma,
 static int __init hugetlbpage_init(void)
 {
 #ifdef CONFIG_ARM64_4K_PAGES
-   hugetlb_add_hstate(PUD_SHIFT - PAGE_SHIFT);
+   hugetlb_add_hstate(ilog2(PUD_SIZE) - PAGE_SHIFT);
 #endif
-   hugetlb_add_hstate(CONT_PMD_SHIFT - PAGE_SHIFT);
-   hugetlb_add_hstate(PMD_SHIFT - PAGE_SHIFT);
-   hugetlb_add_hstate(CONT_PTE_SHIFT - PAGE_SHIFT);
+   hugetlb_add_hstate(ilog2(CONT_PMD_SIZE) - PAGE_SHIFT);
+   hugetlb_add_hstate(ilog2(PMD_SIZE) - PAGE_SHIFT);
+   hugetlb_add_hstate(ilog2(CONT_PTE_SIZE) - PAGE_SHIFT);
 
return 0;
 }
-- 
2.25.2




[PATCH v3 0/4] Clean up hugetlb boot command line processing

2020-04-17 Thread Mike Kravetz
v3 -
   Used weak attribute method of defining arch_hugetlb_valid_size.
 This eliminates changes to arch specific hugetlb.h files (Peter)
   Updated documentation (Peter, Randy)
   Fixed handling of implicitly specified gigantic page preallocation
 in existing code and removed documentation of such.  There is now
 no difference between handling of gigantic and non-gigantic pages.
 (Peter, Nitesh).
 This requires the most review as there is a small change to
 undocumented behavior.  See patch 4 commit message for details.
   Added Acks and Reviews (Mina, Peter)

v2 -
   Fix build errors with patch 1 (Will)
   Change arch_hugetlb_valid_size arg to unsigned long and remove
 irrelevant 'extern' keyword (Christophe)
   Documentation and other misc changes (Randy, Christophe, Mina)
   Do not process command line options if !hugepages_supported()
 (Dave, but it sounds like we may want to make additional changes to
  hugepages_supported() for x86?  If that is needed I would prefer
  a separate patch.)

Longpeng(Mike) reported a weird message from hugetlb command line processing
and proposed a solution [1].  While the proposed patch does address the
specific issue, there are other related issues in command line processing.
As hugetlbfs evolved, updates to command line processing have been made to
meet immediate needs and not necessarily in a coordinated manner.  The result
is that some processing is done in arch specific code, some is done in arch
independent code and coordination is problematic.  Semantics can vary between
architectures.

The patch series does the following:
- Define arch specific arch_hugetlb_valid_size routine used to validate
  passed huge page sizes.
- Move hugepagesz= command line parsing out of arch specific code and into
  an arch independent routine.
- Clean up command line processing to follow desired semantics and
  document those semantics.

[1] https://lore.kernel.org/linux-mm/20200305033014.1152-1-longpe...@huawei.com

Mike Kravetz (4):
  hugetlbfs: add arch_hugetlb_valid_size
  hugetlbfs: move hugepagesz= parsing to arch independent code
  hugetlbfs: remove hugetlb_add_hstate() warning for existing hstate
  hugetlbfs: clean up command line processing

 .../admin-guide/kernel-parameters.txt |  40 ++--
 Documentation/admin-guide/mm/hugetlbpage.rst  |  35 
 arch/arm64/mm/hugetlbpage.c   |  30 +--
 arch/powerpc/mm/hugetlbpage.c |  30 +--
 arch/riscv/mm/hugetlbpage.c   |  24 +--
 arch/s390/mm/hugetlbpage.c|  24 +--
 arch/sparc/mm/init_64.c   |  43 +---
 arch/x86/mm/hugetlbpage.c |  23 +--
 include/linux/hugetlb.h   |   2 +-
 mm/hugetlb.c  | 190 +++---
 10 files changed, 271 insertions(+), 170 deletions(-)

-- 
2.25.2



[PATCH v3 4/4] hugetlbfs: clean up command line processing

2020-04-17 Thread Mike Kravetz
With all hugetlb command line processing done in a single file, clean up the code.
- Make code match desired semantics
  - Update documentation with semantics
- Make all warnings and errors messages start with 'HugeTLB:'.
- Consistently name command line parsing routines.
- Check for hugepages_supported() before processing parameters.
- Add comments to code
  - Describe some of the subtle interactions
  - Describe semantics of command line arguments

This patch also fixes issues with implicitly setting the number of
gigantic huge pages to preallocate.  Previously, on x86, the command line
hugepages=2 default_hugepagesz=1G
would result in zero 1G pages being preallocated and,
# grep HugePages_Total /proc/meminfo
HugePages_Total:   0
# sysctl -a | grep nr_hugepages
vm.nr_hugepages = 2
vm.nr_hugepages_mempolicy = 2
# cat /proc/sys/vm/nr_hugepages
2
After this patch, 2 gigantic pages will be preallocated and all the
proc, sysfs, sysctl and meminfo files will accurately reflect this.

To address the issue with gigantic pages, a small change in behavior
was made to command line processing.  Previously the command line,
hugepages=128 default_hugepagesz=2M hugepagesz=2M hugepages=256
would result in the allocation of 256 2M huge pages.  The value 128
would be ignored without any warning.  After this patch, 128 2M pages
will be allocated and a warning message will be displayed indicating
the value of 256 is ignored.  This change in behavior is required
because allocation of implicitly specified gigantic pages must be done
when the default_hugepagesz= option is encountered.  Previously, the code
waited until later in the boot process (hugetlb_init) to allocate pages of
the default size.  However, the bootmem allocator required for gigantic
page allocations is not available at that later point in boot.

Signed-off-by: Mike Kravetz 
---
 .../admin-guide/kernel-parameters.txt |  40 +++--
 Documentation/admin-guide/mm/hugetlbpage.rst  |  35 
 mm/hugetlb.c  | 159 ++
 3 files changed, 190 insertions(+), 44 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt 
b/Documentation/admin-guide/kernel-parameters.txt
index f2a93c8679e8..8cd78cc87a1c 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -834,12 +834,15 @@
See also Documentation/networking/decnet.txt.
 
default_hugepagesz=
-   [same as hugepagesz=] The size of the default
-   HugeTLB page size. This is the size represented by
-   the legacy /proc/ hugepages APIs, used for SHM, and
-   default size when mounting hugetlbfs filesystems.
-   Defaults to the default architecture's huge page size
-   if not specified.
+   [HW] The size of the default HugeTLB page. This is
+   the size represented by the legacy /proc/ hugepages
+   APIs.  In addition, this is the default hugetlb size
+   used for shmget(), mmap() and mounting hugetlbfs
+   filesystems.  If not specified, defaults to the
+   architecture's default huge page size.  Huge page
+   sizes are architecture dependent.  See also
+   Documentation/admin-guide/mm/hugetlbpage.rst.
+   Format: size[KMG]
 
deferred_probe_timeout=
[KNL] Debugging option to set a timeout in seconds for
@@ -1479,13 +1482,24 @@
hugepages using the cma allocator. If enabled, the
boot-time allocation of gigantic hugepages is skipped.
 
-   hugepages=  [HW,X86-32,IA-64] HugeTLB pages to allocate at boot.
-   hugepagesz= [HW,IA-64,PPC,X86-64] The size of the HugeTLB pages.
-   On x86-64 and powerpc, this option can be specified
-   multiple times interleaved with hugepages= to reserve
-   huge pages of different sizes. Valid pages sizes on
-   x86-64 are 2M (when the CPU supports "pse") and 1G
-   (when the CPU supports the "pdpe1gb" cpuinfo flag).
+   hugepages=  [HW] Number of HugeTLB pages to allocate at boot.
+   If this follows hugepagesz (below), it specifies
+   the number of pages of hugepagesz to be allocated.
+   If this is the first HugeTLB parameter on the command
+   line, it specifies the number of pages to allocate for
+   the default huge page size.  See also
+   Documentation/admin-guide/mm/hugetlbpage.rst.
+   Format: 
+
+   hugepagesz=
+

[PATCH v3 1/4] hugetlbfs: add arch_hugetlb_valid_size

2020-04-17 Thread Mike Kravetz
The architecture independent routine hugetlb_default_setup sets up
the default huge page size.  It has no way to verify if the passed
value is valid, so it accepts it and attempts to validate at a later
time.  This requires undocumented cooperation between the arch specific
and arch independent code.

For architectures that support more than one huge page size, provide
a routine arch_hugetlb_valid_size to validate a huge page size.
hugetlb_default_setup can use this to validate passed values.

arch_hugetlb_valid_size will also be used in a subsequent patch to
move processing of the "hugepagesz=" option from arch specific code to a common
routine in arch independent code.

Signed-off-by: Mike Kravetz 
---
 arch/arm64/mm/hugetlbpage.c   | 17 +
 arch/powerpc/mm/hugetlbpage.c | 20 +---
 arch/riscv/mm/hugetlbpage.c   | 26 +-
 arch/s390/mm/hugetlbpage.c| 16 
 arch/sparc/mm/init_64.c   | 24 
 arch/x86/mm/hugetlbpage.c | 17 +
 include/linux/hugetlb.h   |  1 +
 mm/hugetlb.c  | 21 ++---
 8 files changed, 103 insertions(+), 39 deletions(-)

diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
index bbeb6a5a6ba6..069b96ee2aec 100644
--- a/arch/arm64/mm/hugetlbpage.c
+++ b/arch/arm64/mm/hugetlbpage.c
@@ -462,17 +462,26 @@ static int __init hugetlbpage_init(void)
 }
 arch_initcall(hugetlbpage_init);
 
-static __init int setup_hugepagesz(char *opt)
+bool __init arch_hugetlb_valid_size(unsigned long size)
 {
-   unsigned long ps = memparse(opt, &opt);
-
-   switch (ps) {
+   switch (size) {
 #ifdef CONFIG_ARM64_4K_PAGES
case PUD_SIZE:
 #endif
case CONT_PMD_SIZE:
case PMD_SIZE:
case CONT_PTE_SIZE:
+   return true;
+   }
+
+   return false;
+}
+
+static __init int setup_hugepagesz(char *opt)
+{
+   unsigned long ps = memparse(opt, &opt);
+
+   if (arch_hugetlb_valid_size(ps)) {
add_huge_page_size(ps);
return 1;
}
diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c
index 33b3461d91e8..de54d2a37830 100644
--- a/arch/powerpc/mm/hugetlbpage.c
+++ b/arch/powerpc/mm/hugetlbpage.c
@@ -558,7 +558,7 @@ unsigned long vma_mmu_pagesize(struct vm_area_struct *vma)
return vma_kernel_pagesize(vma);
 }
 
-static int __init add_huge_page_size(unsigned long long size)
+bool __init arch_hugetlb_valid_size(unsigned long size)
 {
int shift = __ffs(size);
int mmu_psize;
@@ -566,20 +566,26 @@ static int __init add_huge_page_size(unsigned long long 
size)
/* Check that it is a page size supported by the hardware and
 * that it fits within pagetable and slice limits. */
if (size <= PAGE_SIZE || !is_power_of_2(size))
-   return -EINVAL;
+   return false;
 
mmu_psize = check_and_get_huge_psize(shift);
if (mmu_psize < 0)
-   return -EINVAL;
+   return false;
 
BUG_ON(mmu_psize_defs[mmu_psize].shift != shift);
 
-   /* Return if huge page size has already been setup */
-   if (size_to_hstate(size))
-   return 0;
+   return true;
+}
 
-   hugetlb_add_hstate(shift - PAGE_SHIFT);
+static int __init add_huge_page_size(unsigned long long size)
+{
+   int shift = __ffs(size);
+
+   if (!arch_hugetlb_valid_size((unsigned long)size))
+   return -EINVAL;
 
+   if (!size_to_hstate(size))
+   hugetlb_add_hstate(shift - PAGE_SHIFT);
return 0;
 }
 
diff --git a/arch/riscv/mm/hugetlbpage.c b/arch/riscv/mm/hugetlbpage.c
index a6189ed36c5f..da1f516bc451 100644
--- a/arch/riscv/mm/hugetlbpage.c
+++ b/arch/riscv/mm/hugetlbpage.c
@@ -12,21 +12,29 @@ int pmd_huge(pmd_t pmd)
return pmd_leaf(pmd);
 }
 
+bool __init arch_hugetlb_valid_size(unsigned long size)
+{
+   if (size == HPAGE_SIZE)
+   return true;
+   else if (IS_ENABLED(CONFIG_64BIT) && size == PUD_SIZE)
+   return true;
+   else
+   return false;
+}
+
 static __init int setup_hugepagesz(char *opt)
 {
unsigned long ps = memparse(opt, &opt);
 
-   if (ps == HPAGE_SIZE) {
-   hugetlb_add_hstate(HPAGE_SHIFT - PAGE_SHIFT);
-   } else if (IS_ENABLED(CONFIG_64BIT) && ps == PUD_SIZE) {
-   hugetlb_add_hstate(PUD_SHIFT - PAGE_SHIFT);
-   } else {
-   hugetlb_bad_size();
-   pr_err("hugepagesz: Unsupported page size %lu M\n", ps >> 20);
-   return 0;
+   if (arch_hugetlb_valid_size(ps)) {
+   hugetlb_add_hstate(ilog2(ps) - PAGE_SHIFT);
+   return 1;
}
 
-   return 1;
+   hugetlb_bad_size();
+   pr_err("hugepagesz: Unsupported page size %lu M\n", ps >> 20);
+   return 0;
+
 }
__setup("hugepagesz=", setup_hugepagesz);

[PATCH v3 3/4] hugetlbfs: remove hugetlb_add_hstate() warning for existing hstate

2020-04-17 Thread Mike Kravetz
The routine hugetlb_add_hstate prints a warning if the hstate already
exists.  This was originally done as part of kernel command line
parsing.  If 'hugepagesz=' was specified more than once, the warning
pr_warn("hugepagesz= specified twice, ignoring\n");
would be printed.

Some architectures want to enable all huge page sizes.  They would
call hugetlb_add_hstate for all supported sizes.  However, this was
done after command line processing and as a result hstates could have
already been created for some sizes.  To make sure no warnings were
printed, there would often be code like:
	if (!size_to_hstate(size))
		hugetlb_add_hstate(ilog2(size) - PAGE_SHIFT);

The only time we want to print the warning is as the result of command
line processing.  So, remove the warning from hugetlb_add_hstate and
add it to the single arch independent routine processing "hugepagesz=".
After this, calls to size_to_hstate() in arch specific code can be
removed and hugetlb_add_hstate can be called without worrying about
warning messages.
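
The end result in the common parser is roughly the following (a sketch only;
the exact message text may differ):

	/* In the arch independent hugepagesz= handler, after size validation: */
	if (size_to_hstate(size)) {
		pr_warn("HugeTLB: hugepagesz %s specified twice, ignoring\n", s);
		return 0;
	}

	hugetlb_add_hstate(ilog2(size) - PAGE_SHIFT);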

Signed-off-by: Mike Kravetz 
Acked-by: Mina Almasry 
---
 arch/arm64/mm/hugetlbpage.c   | 16 
 arch/powerpc/mm/hugetlbpage.c |  3 +--
 arch/riscv/mm/hugetlbpage.c   |  2 +-
 arch/sparc/mm/init_64.c   | 19 ---
 arch/x86/mm/hugetlbpage.c |  2 +-
 mm/hugetlb.c  |  9 ++---
 6 files changed, 17 insertions(+), 34 deletions(-)

diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
index f706b821aba6..21fa98b51e00 100644
--- a/arch/arm64/mm/hugetlbpage.c
+++ b/arch/arm64/mm/hugetlbpage.c
@@ -441,22 +441,14 @@ void huge_ptep_clear_flush(struct vm_area_struct *vma,
clear_flush(vma->vm_mm, addr, ptep, pgsize, ncontig);
 }
 
-static void __init add_huge_page_size(unsigned long size)
-{
-   if (size_to_hstate(size))
-   return;
-
-   hugetlb_add_hstate(ilog2(size) - PAGE_SHIFT);
-}
-
 static int __init hugetlbpage_init(void)
 {
 #ifdef CONFIG_ARM64_4K_PAGES
-   add_huge_page_size(PUD_SIZE);
+   hugetlb_add_hstate(PUD_SHIFT - PAGE_SHIFT);
 #endif
-   add_huge_page_size(CONT_PMD_SIZE);
-   add_huge_page_size(PMD_SIZE);
-   add_huge_page_size(CONT_PTE_SIZE);
+   hugetlb_add_hstate(CONT_PMD_SHIFT - PAGE_SHIFT);
+   hugetlb_add_hstate(PMD_SHIFT - PAGE_SHIFT);
+   hugetlb_add_hstate(CONT_PTE_SHIFT - PAGE_SHIFT);
 
return 0;
 }
diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c
index 2c3fa0a7787b..4d5ed1093615 100644
--- a/arch/powerpc/mm/hugetlbpage.c
+++ b/arch/powerpc/mm/hugetlbpage.c
@@ -584,8 +584,7 @@ static int __init add_huge_page_size(unsigned long long 
size)
if (!arch_hugetlb_valid_size((unsigned long)size))
return -EINVAL;
 
-   if (!size_to_hstate(size))
-   hugetlb_add_hstate(shift - PAGE_SHIFT);
+   hugetlb_add_hstate(shift - PAGE_SHIFT);
return 0;
 }
 
diff --git a/arch/riscv/mm/hugetlbpage.c b/arch/riscv/mm/hugetlbpage.c
index 4e5d7e9f0eef..932dadfdca54 100644
--- a/arch/riscv/mm/hugetlbpage.c
+++ b/arch/riscv/mm/hugetlbpage.c
@@ -26,7 +26,7 @@ bool __init arch_hugetlb_valid_size(unsigned long size)
 static __init int gigantic_pages_init(void)
 {
/* With CONTIG_ALLOC, we can allocate gigantic pages at runtime */
-   if (IS_ENABLED(CONFIG_64BIT) && !size_to_hstate(1UL << PUD_SHIFT))
+   if (IS_ENABLED(CONFIG_64BIT))
hugetlb_add_hstate(PUD_SHIFT - PAGE_SHIFT);
return 0;
 }
diff --git a/arch/sparc/mm/init_64.c b/arch/sparc/mm/init_64.c
index 4618f96fd30f..ae819a16d07a 100644
--- a/arch/sparc/mm/init_64.c
+++ b/arch/sparc/mm/init_64.c
@@ -325,23 +325,12 @@ static void __update_mmu_tsb_insert(struct mm_struct *mm, 
unsigned long tsb_inde
 }
 
 #ifdef CONFIG_HUGETLB_PAGE
-static void __init add_huge_page_size(unsigned long size)
-{
-   unsigned int order;
-
-   if (size_to_hstate(size))
-   return;
-
-   order = ilog2(size) - PAGE_SHIFT;
-   hugetlb_add_hstate(order);
-}
-
 static int __init hugetlbpage_init(void)
 {
-   add_huge_page_size(1UL << HPAGE_64K_SHIFT);
-   add_huge_page_size(1UL << HPAGE_SHIFT);
-   add_huge_page_size(1UL << HPAGE_256MB_SHIFT);
-   add_huge_page_size(1UL << HPAGE_2GB_SHIFT);
+   hugetlb_add_hstate(HPAGE_64K_SHIFT - PAGE_SHIFT);
+   hugetlb_add_hstate(HPAGE_SHIFT - PAGE_SHIFT);
+   hugetlb_add_hstate(HPAGE_256MB_SHIFT - PAGE_SHIFT);
+   hugetlb_add_hstate(HPAGE_2GB_SHIFT - PAGE_SHIFT);
 
return 0;
 }
diff --git a/arch/x86/mm/hugetlbpage.c b/arch/x86/mm/hugetlbpage.c
index 937d640a89e3..cf5781142716 100644
--- a/arch/x86/mm/hugetlbpage.c
+++ b/arch/x86/mm/hugetlbpage.c
@@ -195,7 +195,7 @@ bool __init arch_hugetlb_valid_size(unsigned long size)
 static __init int gigantic_pages_init(void)
 {
/* With compaction or CMA we can allocate gigantic pag

[PATCH v3 2/4] hugetlbfs: move hugepagesz= parsing to arch independent code

2020-04-17 Thread Mike Kravetz
Now that architectures provide arch_hugetlb_valid_size(), parsing
of "hugepagesz=" can be done in architecture independent code.
Create a single routine to handle hugepagesz= parsing and remove
all arch specific routines.  We can also remove the interface
hugetlb_bad_size() as this is no longer used outside arch independent
code.

This also provides consistent behavior of hugetlbfs command line
options.  The hugepagesz= option should only be specified once for
a specific size, but some architectures allow multiple instances.
This appears to be more of an oversight when code was added by some
architectures to set up ALL huge page sizes.
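
For reference, the single arch independent parser ends up looking roughly
like this (a sketch, not the verbatim patch; parsed_valid_hugepagesz
bookkeeping is omitted, and the duplicate-size warning is added by the next
patch in the series):

	static int __init hugepagesz_setup(char *s)
	{
		unsigned long size;

		size = (unsigned long)memparse(s, NULL);

		if (!arch_hugetlb_valid_size(size)) {
			pr_err("HugeTLB: unsupported hugepagesz %s\n", s);
			return 0;
		}

		hugetlb_add_hstate(ilog2(size) - PAGE_SHIFT);
		return 1;
	}
	__setup("hugepagesz=", hugepagesz_setup);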

Signed-off-by: Mike Kravetz 
Acked-by: Mina Almasry 
Reviewed-by: Peter Xu 
---
 arch/arm64/mm/hugetlbpage.c   | 15 ---
 arch/powerpc/mm/hugetlbpage.c | 15 ---
 arch/riscv/mm/hugetlbpage.c   | 16 
 arch/s390/mm/hugetlbpage.c| 18 --
 arch/sparc/mm/init_64.c   | 22 --
 arch/x86/mm/hugetlbpage.c | 16 
 include/linux/hugetlb.h   |  1 -
 mm/hugetlb.c  | 23 +--
 8 files changed, 17 insertions(+), 109 deletions(-)

diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
index 069b96ee2aec..f706b821aba6 100644
--- a/arch/arm64/mm/hugetlbpage.c
+++ b/arch/arm64/mm/hugetlbpage.c
@@ -476,18 +476,3 @@ bool __init arch_hugetlb_valid_size(unsigned long size)
 
return false;
 }
-
-static __init int setup_hugepagesz(char *opt)
-{
-   unsigned long ps = memparse(opt, &opt);
-
-   if (arch_hugetlb_valid_size(ps)) {
-   add_huge_page_size(ps);
-   return 1;
-   }
-
-   hugetlb_bad_size();
-   pr_err("hugepagesz: Unsupported page size %lu K\n", ps >> 10);
-   return 0;
-}
-__setup("hugepagesz=", setup_hugepagesz);
diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c
index de54d2a37830..2c3fa0a7787b 100644
--- a/arch/powerpc/mm/hugetlbpage.c
+++ b/arch/powerpc/mm/hugetlbpage.c
@@ -589,21 +589,6 @@ static int __init add_huge_page_size(unsigned long long 
size)
return 0;
 }
 
-static int __init hugepage_setup_sz(char *str)
-{
-   unsigned long long size;
-
-   size = memparse(str, &str);
-
-   if (add_huge_page_size(size) != 0) {
-   hugetlb_bad_size();
-   pr_err("Invalid huge page size specified(%llu)\n", size);
-   }
-
-   return 1;
-}
-__setup("hugepagesz=", hugepage_setup_sz);
-
 static int __init hugetlbpage_init(void)
 {
bool configured = false;
diff --git a/arch/riscv/mm/hugetlbpage.c b/arch/riscv/mm/hugetlbpage.c
index da1f516bc451..4e5d7e9f0eef 100644
--- a/arch/riscv/mm/hugetlbpage.c
+++ b/arch/riscv/mm/hugetlbpage.c
@@ -22,22 +22,6 @@ bool __init arch_hugetlb_valid_size(unsigned long size)
return false;
 }
 
-static __init int setup_hugepagesz(char *opt)
-{
-   unsigned long ps = memparse(opt, &opt);
-
-   if (arch_hugetlb_valid_size(ps)) {
-   hugetlb_add_hstate(ilog2(ps) - PAGE_SHIFT);
-   return 1;
-   }
-
-   hugetlb_bad_size();
-   pr_err("hugepagesz: Unsupported page size %lu M\n", ps >> 20);
-   return 0;
-
-}
-__setup("hugepagesz=", setup_hugepagesz);
-
 #ifdef CONFIG_CONTIG_ALLOC
 static __init int gigantic_pages_init(void)
 {
diff --git a/arch/s390/mm/hugetlbpage.c b/arch/s390/mm/hugetlbpage.c
index ac25b207624c..242dfc0d462d 100644
--- a/arch/s390/mm/hugetlbpage.c
+++ b/arch/s390/mm/hugetlbpage.c
@@ -261,24 +261,6 @@ bool __init arch_hugetlb_valid_size(unsigned long size)
return false;
 }
 
-static __init int setup_hugepagesz(char *opt)
-{
-   unsigned long size;
-   char *string = opt;
-
-   size = memparse(opt, &opt);
-   if (arch_hugetlb_valid_size(size)) {
-   hugetlb_add_hstate(ilog2(size) - PAGE_SHIFT);
-   } else {
-   hugetlb_bad_size();
-   pr_err("hugepagesz= specifies an unsupported page size %s\n",
-   string);
-   return 0;
-   }
-   return 1;
-}
-__setup("hugepagesz=", setup_hugepagesz);
-
 static unsigned long hugetlb_get_unmapped_area_bottomup(struct file *file,
unsigned long addr, unsigned long len,
unsigned long pgoff, unsigned long flags)
diff --git a/arch/sparc/mm/init_64.c b/arch/sparc/mm/init_64.c
index 2bfe8e22b706..4618f96fd30f 100644
--- a/arch/sparc/mm/init_64.c
+++ b/arch/sparc/mm/init_64.c
@@ -397,28 +397,6 @@ bool __init arch_hugetlb_valid_size(unsigned long size)
 
return true;
 }
-
-static int __init setup_hugepagesz(char *string)
-{
-   unsigned long long hugepage_size;
-   int rc = 0;
-
-   hugepage_size = memparse(string, &string);
-
-   if (!arch_hugetlb_valid_size((unsigned long)hugepage_size)) {
-   hugetlb_bad_size();
-

Re: [PATCH v2 4/4] hugetlbfs: clean up command line processing

2020-04-13 Thread Mike Kravetz
On 4/10/20 1:37 PM, Peter Xu wrote:
> On Wed, Apr 01, 2020 at 11:38:19AM -0700, Mike Kravetz wrote:
>> With all hugetlb command line processing done in a single file, clean up the code.
>> - Make code match desired semantics
>>   - Update documentation with semantics
>> - Make all warnings and errors messages start with 'HugeTLB:'.
>> - Consistently name command line parsing routines.
>> - Check for hugepages_supported() before processing parameters.
>> - Add comments to code
>>   - Describe some of the subtle interactions
>>   - Describe semantics of command line arguments
>>
>> Signed-off-by: Mike Kravetz 
>> ---
>>  .../admin-guide/kernel-parameters.txt | 35 ---
>>  Documentation/admin-guide/mm/hugetlbpage.rst  | 44 +
>>  mm/hugetlb.c  | 96 +++
>>  3 files changed, 142 insertions(+), 33 deletions(-)
>>
>> diff --git a/Documentation/admin-guide/kernel-parameters.txt 
>> b/Documentation/admin-guide/kernel-parameters.txt
>> index 1bd5454b5e5f..de653cfe1726 100644
>> --- a/Documentation/admin-guide/kernel-parameters.txt
>> +++ b/Documentation/admin-guide/kernel-parameters.txt
>> @@ -832,12 +832,15 @@
>>  See also Documentation/networking/decnet.txt.
>>  
>>  default_hugepagesz=
>> -[same as hugepagesz=] The size of the default
>> -HugeTLB page size. This is the size represented by
>> -the legacy /proc/ hugepages APIs, used for SHM, and
>> -default size when mounting hugetlbfs filesystems.
>> -Defaults to the default architecture's huge page size
>> -if not specified.
>> +[HW] The size of the default HugeTLB page size. This
> 
> Could I ask what's "HW"?  Sorry this is not a comment at all but
> really a pure question I wanted to ask... :)

kernel-parameters.rst includes kernel-parameters.txt and defines the meaning
of these codes.

   HW  Appropriate hardware is enabled.

Previously, it listed an obsolete list of architectures.

>> +is the size represented by the legacy /proc/ hugepages
>> +APIs.  In addition, this is the default hugetlb size
>> +used for shmget(), mmap() and mounting hugetlbfs
>> +filesystems.  If not specified, defaults to the
>> +architecture's default huge page size.  Huge page
>> +sizes are architecture dependent.  See also
>> +Documentation/admin-guide/mm/hugetlbpage.rst.
>> +Format: size[KMG]
>>  
>>  deferred_probe_timeout=
>>  [KNL] Debugging option to set a timeout in seconds for
>> @@ -1480,13 +1483,19 @@
>>  If enabled, boot-time allocation of gigantic hugepages
>>  is skipped.
>>  
>> -hugepages=  [HW,X86-32,IA-64] HugeTLB pages to allocate at boot.
>> -hugepagesz= [HW,IA-64,PPC,X86-64] The size of the HugeTLB pages.
>> -On x86-64 and powerpc, this option can be specified
>> -multiple times interleaved with hugepages= to reserve
>> -huge pages of different sizes. Valid pages sizes on
>> -x86-64 are 2M (when the CPU supports "pse") and 1G
>> -(when the CPU supports the "pdpe1gb" cpuinfo flag).
>> +hugepages=  [HW] Number of HugeTLB pages to allocate at boot.
>> +If this follows hugepagesz (below), it specifies
>> +the number of pages of hugepagesz to be allocated.
> 
> "... Otherwise it specifies the number of pages to allocate for the
> default huge page size." ?

Yes, best to be specific.  I suspect this is the most common way this
parameter is used.

> 
>> +Format: 
> 
> How about add a new line here?

Sure

>> +hugepagesz=
>> +[HW] The size of the HugeTLB pages.  This is used in
>> +conjunction with hugepages (above) to allocate huge
>> +pages of a specific size at boot.  The pair
>> +hugepagesz=X hugepages=Y can be specified once for
>> +each supported huge page size. Huge page sizes are
>> +architecture dependent.  See also
>> +Documentation/admin-guide/mm/hugetlbpage.rst.
>> +Format: size[KMG]
>>  
>>

Re: [PATCH v2 1/4] hugetlbfs: add arch_hugetlb_valid_size

2020-04-13 Thread Mike Kravetz
On 4/10/20 12:16 PM, Peter Xu wrote:
> On Wed, Apr 01, 2020 at 11:38:16AM -0700, Mike Kravetz wrote:
>> diff --git a/arch/arm64/include/asm/hugetlb.h 
>> b/arch/arm64/include/asm/hugetlb.h
>> index 2eb6c234d594..81606223494f 100644
>> --- a/arch/arm64/include/asm/hugetlb.h
>> +++ b/arch/arm64/include/asm/hugetlb.h
>> @@ -59,6 +59,8 @@ extern void huge_pte_clear(struct mm_struct *mm, unsigned 
>> long addr,
>>  extern void set_huge_swap_pte_at(struct mm_struct *mm, unsigned long addr,
>>   pte_t *ptep, pte_t pte, unsigned long sz);
>>  #define set_huge_swap_pte_at set_huge_swap_pte_at
>> +bool __init arch_hugetlb_valid_size(unsigned long size);
>> +#define arch_hugetlb_valid_size arch_hugetlb_valid_size
> 
> Sorry for chimming in late.

Thank you for taking a look!

> Since we're working on removing arch-dependent codes after all.. I'm
> thinking whether we can define arch_hugetlb_valid_size() once in the
> common header (e.g. linux/hugetlb.h), then in mm/hugetlb.c:
> 
> bool __init __attribute((weak)) arch_hugetlb_valid_size(unsigned long size)
> {
>   return size == HPAGE_SIZE;
> }
> 
> We can simply redefine arch_hugetlb_valid_size() in arch specific C
> files where we want to override the default.  Would that be slightly
> cleaner?

I think both the #define X X and weak attribute methods are acceptable.
I went with the #define method only because it was most familiar to me.
Using the weak attribute method does appear to be cleaner.  I'll code it up.
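
For example, the x86 override under that scheme would look roughly like this
(a sketch based on the checks in the existing setup_hugepagesz, not the final
patch):

	/* arch/x86/mm/hugetlbpage.c: overrides the weak default in mm/hugetlb.c */
	bool __init arch_hugetlb_valid_size(unsigned long size)
	{
		if (size == PMD_SIZE)
			return true;
		else if (size == PUD_SIZE && boot_cpu_has(X86_FEATURE_GBPAGES))
			return true;
		else
			return false;
	}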

Anyone else have a preference?
-- 
Mike Kravetz


[PATCH v2 0/4] Clean up hugetlb boot command line processing

2020-04-01 Thread Mike Kravetz
v2 -
   Fix build errors with patch 1 (Will)
   Change arch_hugetlb_valid_size arg to unsigned long and remove
 irrelevant 'extern' keyword (Christophe)
   Documentation and other misc changes (Randy, Christophe, Mina)
   Do not process command line options if !hugepages_supported()
 (Dave, but it sounds like we may want to make additional changes to
  hugepages_supported() for x86?  If that is needed I would prefer
  a separate patch.)

Longpeng(Mike) reported a weird message from hugetlb command line processing
and proposed a solution [1].  While the proposed patch does address the
specific issue, there are other related issues in command line processing.
As hugetlbfs evolved, updates to command line processing have been made to
meet immediate needs and not necessarily in a coordinated manner.  The result
is that some processing is done in arch specific code, some is done in arch
independent code and coordination is problematic.  Semantics can vary between
architectures.

The patch series does the following:
- Define arch specific arch_hugetlb_valid_size routine used to validate
  passed huge page sizes.
- Move hugepagesz= command line parsing out of arch specific code and into
  an arch independent routine.
- Clean up command line processing to follow desired semantics and
  document those semantics.

[1] https://lore.kernel.org/linux-mm/20200305033014.1152-1-longpe...@huawei.com

Mike Kravetz (4):
  hugetlbfs: add arch_hugetlb_valid_size
  hugetlbfs: move hugepagesz= parsing to arch independent code
  hugetlbfs: remove hugetlb_add_hstate() warning for existing hstate
  hugetlbfs: clean up command line processing

 .../admin-guide/kernel-parameters.txt |  35 +++--
 Documentation/admin-guide/mm/hugetlbpage.rst  |  44 ++
 arch/arm64/include/asm/hugetlb.h  |   2 +
 arch/arm64/mm/hugetlbpage.c   |  30 +---
 arch/powerpc/include/asm/hugetlb.h|   3 +
 arch/powerpc/mm/hugetlbpage.c |  30 ++--
 arch/riscv/include/asm/hugetlb.h  |   3 +
 arch/riscv/mm/hugetlbpage.c   |  24 +--
 arch/s390/include/asm/hugetlb.h   |   3 +
 arch/s390/mm/hugetlbpage.c|  24 +--
 arch/sparc/include/asm/hugetlb.h  |   3 +
 arch/sparc/mm/init_64.c   |  43 ++
 arch/x86/include/asm/hugetlb.h|   5 +
 arch/x86/mm/hugetlbpage.c |  23 +--
 include/linux/hugetlb.h   |   8 +-
 mm/hugetlb.c  | 141 ++
 16 files changed, 252 insertions(+), 169 deletions(-)

-- 
2.25.1



[PATCH v2 1/4] hugetlbfs: add arch_hugetlb_valid_size

2020-04-01 Thread Mike Kravetz
The architecture independent routine hugetlb_default_setup sets up
the default huge page size.  It has no way to verify if the passed
value is valid, so it accepts it and attempts to validate at a later
time.  This requires undocumented cooperation between the arch specific
and arch independent code.

For architectures that support more than one huge page size, provide
a routine arch_hugetlb_valid_size to validate a huge page size.
hugetlb_default_setup can use this to validate passed values.

arch_hugetlb_valid_size will also be used in a subsequent patch to
move processing of the "hugepagesz=" option from arch specific code to a common
routine in arch independent code.

Signed-off-by: Mike Kravetz 
---
 arch/arm64/include/asm/hugetlb.h   |  2 ++
 arch/arm64/mm/hugetlbpage.c| 17 +
 arch/powerpc/include/asm/hugetlb.h |  3 +++
 arch/powerpc/mm/hugetlbpage.c  | 20 +---
 arch/riscv/include/asm/hugetlb.h   |  3 +++
 arch/riscv/mm/hugetlbpage.c| 26 +-
 arch/s390/include/asm/hugetlb.h|  3 +++
 arch/s390/mm/hugetlbpage.c | 16 
 arch/sparc/include/asm/hugetlb.h   |  3 +++
 arch/sparc/mm/init_64.c| 24 
 arch/x86/include/asm/hugetlb.h |  5 +
 arch/x86/mm/hugetlbpage.c  | 17 +
 include/linux/hugetlb.h|  7 +++
 mm/hugetlb.c   | 15 ---
 14 files changed, 122 insertions(+), 39 deletions(-)

diff --git a/arch/arm64/include/asm/hugetlb.h b/arch/arm64/include/asm/hugetlb.h
index 2eb6c234d594..81606223494f 100644
--- a/arch/arm64/include/asm/hugetlb.h
+++ b/arch/arm64/include/asm/hugetlb.h
@@ -59,6 +59,8 @@ extern void huge_pte_clear(struct mm_struct *mm, unsigned 
long addr,
 extern void set_huge_swap_pte_at(struct mm_struct *mm, unsigned long addr,
 pte_t *ptep, pte_t pte, unsigned long sz);
 #define set_huge_swap_pte_at set_huge_swap_pte_at
+bool __init arch_hugetlb_valid_size(unsigned long size);
+#define arch_hugetlb_valid_size arch_hugetlb_valid_size
 
 #include 
 
diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
index bbeb6a5a6ba6..069b96ee2aec 100644
--- a/arch/arm64/mm/hugetlbpage.c
+++ b/arch/arm64/mm/hugetlbpage.c
@@ -462,17 +462,26 @@ static int __init hugetlbpage_init(void)
 }
 arch_initcall(hugetlbpage_init);
 
-static __init int setup_hugepagesz(char *opt)
+bool __init arch_hugetlb_valid_size(unsigned long size)
 {
-   unsigned long ps = memparse(opt, );
-
-   switch (ps) {
+   switch (size) {
 #ifdef CONFIG_ARM64_4K_PAGES
case PUD_SIZE:
 #endif
case CONT_PMD_SIZE:
case PMD_SIZE:
case CONT_PTE_SIZE:
+   return true;
+   }
+
+   return false;
+}
+
+static __init int setup_hugepagesz(char *opt)
+{
+   unsigned long ps = memparse(opt, );
+
+   if (arch_hugetlb_valid_size(ps)) {
add_huge_page_size(ps);
return 1;
}
diff --git a/arch/powerpc/include/asm/hugetlb.h 
b/arch/powerpc/include/asm/hugetlb.h
index bd6504c28c2f..19b453ee1431 100644
--- a/arch/powerpc/include/asm/hugetlb.h
+++ b/arch/powerpc/include/asm/hugetlb.h
@@ -64,6 +64,9 @@ static inline void arch_clear_hugepage_flags(struct page 
*page)
 {
 }
 
+#define arch_hugetlb_valid_size arch_hugetlb_valid_size
+bool __init arch_hugetlb_valid_size(unsigned long size);
+
 #include 
 
 #else /* ! CONFIG_HUGETLB_PAGE */
diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c
index 33b3461d91e8..de54d2a37830 100644
--- a/arch/powerpc/mm/hugetlbpage.c
+++ b/arch/powerpc/mm/hugetlbpage.c
@@ -558,7 +558,7 @@ unsigned long vma_mmu_pagesize(struct vm_area_struct *vma)
return vma_kernel_pagesize(vma);
 }
 
-static int __init add_huge_page_size(unsigned long long size)
+bool __init arch_hugetlb_valid_size(unsigned long size)
 {
int shift = __ffs(size);
int mmu_psize;
@@ -566,20 +566,26 @@ static int __init add_huge_page_size(unsigned long long 
size)
/* Check that it is a page size supported by the hardware and
 * that it fits within pagetable and slice limits. */
if (size <= PAGE_SIZE || !is_power_of_2(size))
-   return -EINVAL;
+   return false;
 
mmu_psize = check_and_get_huge_psize(shift);
if (mmu_psize < 0)
-   return -EINVAL;
+   return false;
 
BUG_ON(mmu_psize_defs[mmu_psize].shift != shift);
 
-   /* Return if huge page size has already been setup */
-   if (size_to_hstate(size))
-   return 0;
+   return true;
+}
 
-   hugetlb_add_hstate(shift - PAGE_SHIFT);
+static int __init add_huge_page_size(unsigned long long size)
+{
+   int shift = __ffs(size);
+
+   if (!arch_hugetlb_valid_size((unsigned long)size))
+   return -EINVAL;
 
+   if (!size_to_hstate(size))
+   hugetlb_

[PATCH v2 3/4] hugetlbfs: remove hugetlb_add_hstate() warning for existing hstate

2020-04-01 Thread Mike Kravetz
The routine hugetlb_add_hstate prints a warning if the hstate already
exists.  This was originally done as part of kernel command line
parsing.  If 'hugepagesz=' was specified more than once, the warning
pr_warn("hugepagesz= specified twice, ignoring\n");
would be printed.

Some architectures want to enable all huge page sizes.  They would
call hugetlb_add_hstate for all supported sizes.  However, this was
done after command line processing and as a result hstates could have
already been created for some sizes.  To make sure no warnings were
printed, there would often be code like:
	if (!size_to_hstate(size))
		hugetlb_add_hstate(ilog2(size) - PAGE_SHIFT);

The only time we want to print the warning is as the result of command
line processing.  So, remove the warning from hugetlb_add_hstate and
add it to the single arch independent routine processing "hugepagesz=".
After this, calls to size_to_hstate() in arch specific code can be
removed and hugetlb_add_hstate can be called without worrying about
warning messages.
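
In other words, the hstate-exists check and its warning end up only in the
arch independent hugepagesz= handler, roughly like this (a sketch of the
relevant fragment, not the literal hunk; message wording is assumed):

	/* in hugepagesz_setup(), after arch_hugetlb_valid_size(size) passed */
	if (size_to_hstate(size)) {
		pr_warn("HugeTLB: hugepagesz %s specified twice, ignoring\n", s);
		return 1;
	}
	hugetlb_add_hstate(ilog2(size) - PAGE_SHIFT);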

Signed-off-by: Mike Kravetz 
---
 arch/arm64/mm/hugetlbpage.c   | 16 
 arch/powerpc/mm/hugetlbpage.c |  3 +--
 arch/riscv/mm/hugetlbpage.c   |  2 +-
 arch/sparc/mm/init_64.c   | 19 ---
 arch/x86/mm/hugetlbpage.c |  2 +-
 mm/hugetlb.c  |  9 ++---
 6 files changed, 17 insertions(+), 34 deletions(-)

diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
index f706b821aba6..21fa98b51e00 100644
--- a/arch/arm64/mm/hugetlbpage.c
+++ b/arch/arm64/mm/hugetlbpage.c
@@ -441,22 +441,14 @@ void huge_ptep_clear_flush(struct vm_area_struct *vma,
clear_flush(vma->vm_mm, addr, ptep, pgsize, ncontig);
 }
 
-static void __init add_huge_page_size(unsigned long size)
-{
-   if (size_to_hstate(size))
-   return;
-
-   hugetlb_add_hstate(ilog2(size) - PAGE_SHIFT);
-}
-
 static int __init hugetlbpage_init(void)
 {
 #ifdef CONFIG_ARM64_4K_PAGES
-   add_huge_page_size(PUD_SIZE);
+   hugetlb_add_hstate(PUD_SHIFT - PAGE_SHIFT);
 #endif
-   add_huge_page_size(CONT_PMD_SIZE);
-   add_huge_page_size(PMD_SIZE);
-   add_huge_page_size(CONT_PTE_SIZE);
+   hugetlb_add_hstate(CONT_PMD_SHIFT - PAGE_SHIFT);
+   hugetlb_add_hstate(PMD_SHIFT - PAGE_SHIFT);
+   hugetlb_add_hstate(CONT_PTE_SHIFT - PAGE_SHIFT);
 
return 0;
 }
diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c
index 2c3fa0a7787b..4d5ed1093615 100644
--- a/arch/powerpc/mm/hugetlbpage.c
+++ b/arch/powerpc/mm/hugetlbpage.c
@@ -584,8 +584,7 @@ static int __init add_huge_page_size(unsigned long long 
size)
if (!arch_hugetlb_valid_size((unsigned long)size))
return -EINVAL;
 
-   if (!size_to_hstate(size))
-   hugetlb_add_hstate(shift - PAGE_SHIFT);
+   hugetlb_add_hstate(shift - PAGE_SHIFT);
return 0;
 }
 
diff --git a/arch/riscv/mm/hugetlbpage.c b/arch/riscv/mm/hugetlbpage.c
index 4e5d7e9f0eef..932dadfdca54 100644
--- a/arch/riscv/mm/hugetlbpage.c
+++ b/arch/riscv/mm/hugetlbpage.c
@@ -26,7 +26,7 @@ bool __init arch_hugetlb_valid_size(unsigned long size)
 static __init int gigantic_pages_init(void)
 {
/* With CONTIG_ALLOC, we can allocate gigantic pages at runtime */
-   if (IS_ENABLED(CONFIG_64BIT) && !size_to_hstate(1UL << PUD_SHIFT))
+   if (IS_ENABLED(CONFIG_64BIT))
hugetlb_add_hstate(PUD_SHIFT - PAGE_SHIFT);
return 0;
 }
diff --git a/arch/sparc/mm/init_64.c b/arch/sparc/mm/init_64.c
index 4618f96fd30f..ae819a16d07a 100644
--- a/arch/sparc/mm/init_64.c
+++ b/arch/sparc/mm/init_64.c
@@ -325,23 +325,12 @@ static void __update_mmu_tsb_insert(struct mm_struct *mm, 
unsigned long tsb_inde
 }
 
 #ifdef CONFIG_HUGETLB_PAGE
-static void __init add_huge_page_size(unsigned long size)
-{
-   unsigned int order;
-
-   if (size_to_hstate(size))
-   return;
-
-   order = ilog2(size) - PAGE_SHIFT;
-   hugetlb_add_hstate(order);
-}
-
 static int __init hugetlbpage_init(void)
 {
-   add_huge_page_size(1UL << HPAGE_64K_SHIFT);
-   add_huge_page_size(1UL << HPAGE_SHIFT);
-   add_huge_page_size(1UL << HPAGE_256MB_SHIFT);
-   add_huge_page_size(1UL << HPAGE_2GB_SHIFT);
+   hugetlb_add_hstate(HPAGE_64K_SHIFT - PAGE_SHIFT);
+   hugetlb_add_hstate(HPAGE_SHIFT - PAGE_SHIFT);
+   hugetlb_add_hstate(HPAGE_256MB_SHIFT - PAGE_SHIFT);
+   hugetlb_add_hstate(HPAGE_2GB_SHIFT - PAGE_SHIFT);
 
return 0;
 }
diff --git a/arch/x86/mm/hugetlbpage.c b/arch/x86/mm/hugetlbpage.c
index 937d640a89e3..cf5781142716 100644
--- a/arch/x86/mm/hugetlbpage.c
+++ b/arch/x86/mm/hugetlbpage.c
@@ -195,7 +195,7 @@ bool __init arch_hugetlb_valid_size(unsigned long size)
 static __init int gigantic_pages_init(void)
 {
/* With compaction or CMA we can allocate gigantic pages at runtime */
-   if (boot_cpu_

[PATCH v2 4/4] hugetlbfs: clean up command line processing

2020-04-01 Thread Mike Kravetz
With all hugetlb page processing done in a single file, clean up the code.
- Make code match desired semantics
  - Update documentation with semantics
- Make all warning and error messages start with 'HugeTLB:'.
- Consistently name command line parsing routines.
- Check for hugepages_supported() before processing parameters.
- Add comments to code
  - Describe some of the subtle interactions
  - Describe semantics of command line arguments

Signed-off-by: Mike Kravetz 
---
 .../admin-guide/kernel-parameters.txt | 35 ---
 Documentation/admin-guide/mm/hugetlbpage.rst  | 44 +
 mm/hugetlb.c  | 96 +++
 3 files changed, 142 insertions(+), 33 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt 
b/Documentation/admin-guide/kernel-parameters.txt
index 1bd5454b5e5f..de653cfe1726 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -832,12 +832,15 @@
See also Documentation/networking/decnet.txt.
 
default_hugepagesz=
-   [same as hugepagesz=] The size of the default
-   HugeTLB page size. This is the size represented by
-   the legacy /proc/ hugepages APIs, used for SHM, and
-   default size when mounting hugetlbfs filesystems.
-   Defaults to the default architecture's huge page size
-   if not specified.
+   [HW] The size of the default HugeTLB page size. This
+   is the size represented by the legacy /proc/ hugepages
+   APIs.  In addition, this is the default hugetlb size
+   used for shmget(), mmap() and mounting hugetlbfs
+   filesystems.  If not specified, defaults to the
+   architecture's default huge page size.  Huge page
+   sizes are architecture dependent.  See also
+   Documentation/admin-guide/mm/hugetlbpage.rst.
+   Format: size[KMG]
 
deferred_probe_timeout=
[KNL] Debugging option to set a timeout in seconds for
@@ -1480,13 +1483,19 @@
If enabled, boot-time allocation of gigantic hugepages
is skipped.
 
-   hugepages=  [HW,X86-32,IA-64] HugeTLB pages to allocate at boot.
-   hugepagesz= [HW,IA-64,PPC,X86-64] The size of the HugeTLB pages.
-   On x86-64 and powerpc, this option can be specified
-   multiple times interleaved with hugepages= to reserve
-   huge pages of different sizes. Valid pages sizes on
-   x86-64 are 2M (when the CPU supports "pse") and 1G
-   (when the CPU supports the "pdpe1gb" cpuinfo flag).
+   hugepages=  [HW] Number of HugeTLB pages to allocate at boot.
+   If this follows hugepagesz (below), it specifies
+   the number of pages of hugepagesz to be allocated.
+   Format: 
+   hugepagesz=
+   [HW] The size of the HugeTLB pages.  This is used in
+   conjunction with hugepages (above) to allocate huge
+   pages of a specific size at boot.  The pair
+   hugepagesz=X hugepages=Y can be specified once for
+   each supported huge page size. Huge page sizes are
+   architecture dependent.  See also
+   Documentation/admin-guide/mm/hugetlbpage.rst.
+   Format: size[KMG]
 
hung_task_panic=
[KNL] Should the hung task detector generate panics.
diff --git a/Documentation/admin-guide/mm/hugetlbpage.rst 
b/Documentation/admin-guide/mm/hugetlbpage.rst
index 1cc0bc78d10e..de340c586995 100644
--- a/Documentation/admin-guide/mm/hugetlbpage.rst
+++ b/Documentation/admin-guide/mm/hugetlbpage.rst
@@ -100,6 +100,50 @@ with a huge page size selection parameter 
"hugepagesz=".   must
 be specified in bytes with optional scale suffix [kKmMgG].  The default huge
 page size may be selected with the "default_hugepagesz=" boot parameter.
 
+Hugetlb boot command line parameter semantics
+hugepagesz - Specify a huge page size.  Used in conjunction with hugepages
+   parameter to preallocate a number of huge pages of the specified
+   size.  Hence, hugepagesz and hugepages are typically specified in
+   pairs such as:
+   hugepagesz=2M hugepages=512
+   hugepagesz can only be specified once on the command line for a
+   specific huge page size.  Valid huge page sizes are architecture
+   dependent.
+hugepages - Specify the number of huge pages to preallocate.  This typically
+   follows a valid huge

[PATCH v2 2/4] hugetlbfs: move hugepagesz= parsing to arch independent code

2020-04-01 Thread Mike Kravetz
Now that architectures provide arch_hugetlb_valid_size(), parsing
of "hugepagesz=" can be done in architecture independent code.
Create a single routine to handle hugepagesz= parsing and remove
all arch specific routines.  We can also remove the interface
hugetlb_bad_size() as this is no longer used outside arch independent
code.

This also provides consistent behavior of hugetlbfs command line
options.  The hugepagesz= option should only be specified once for
a specific size, but some architectures allow multiple instances.
This appears to be more of an oversight when code was added by some
architectures to set up ALL huge page sizes.
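
A minimal sketch of the single arch independent handler this introduces in
mm/hugetlb.c (the actual hunk is truncated in this archive; names follow the
rest of the series, and the error message wording is assumed):

	static int __init hugepagesz_setup(char *s)
	{
		unsigned long long size;

		size = memparse(s, NULL);

		if (!arch_hugetlb_valid_size(size)) {
			parsed_valid_hugepagesz = false;
			pr_err("HugeTLB: unsupported hugepagesz %s\n", s);
			return 0;
		}

		hugetlb_add_hstate(ilog2(size) - PAGE_SHIFT);
		return 1;
	}
	__setup("hugepagesz=", hugepagesz_setup);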

Signed-off-by: Mike Kravetz 
---
 arch/arm64/mm/hugetlbpage.c   | 15 ---
 arch/powerpc/mm/hugetlbpage.c | 15 ---
 arch/riscv/mm/hugetlbpage.c   | 16 
 arch/s390/mm/hugetlbpage.c| 18 --
 arch/sparc/mm/init_64.c   | 22 --
 arch/x86/mm/hugetlbpage.c | 16 
 include/linux/hugetlb.h   |  1 -
 mm/hugetlb.c  | 23 +--
 8 files changed, 17 insertions(+), 109 deletions(-)

diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
index 069b96ee2aec..f706b821aba6 100644
--- a/arch/arm64/mm/hugetlbpage.c
+++ b/arch/arm64/mm/hugetlbpage.c
@@ -476,18 +476,3 @@ bool __init arch_hugetlb_valid_size(unsigned long size)
 
return false;
 }
-
-static __init int setup_hugepagesz(char *opt)
-{
-   unsigned long ps = memparse(opt, );
-
-   if (arch_hugetlb_valid_size(ps)) {
-   add_huge_page_size(ps);
-   return 1;
-   }
-
-   hugetlb_bad_size();
-   pr_err("hugepagesz: Unsupported page size %lu K\n", ps >> 10);
-   return 0;
-}
-__setup("hugepagesz=", setup_hugepagesz);
diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c
index de54d2a37830..2c3fa0a7787b 100644
--- a/arch/powerpc/mm/hugetlbpage.c
+++ b/arch/powerpc/mm/hugetlbpage.c
@@ -589,21 +589,6 @@ static int __init add_huge_page_size(unsigned long long 
size)
return 0;
 }
 
-static int __init hugepage_setup_sz(char *str)
-{
-   unsigned long long size;
-
-   size = memparse(str, );
-
-   if (add_huge_page_size(size) != 0) {
-   hugetlb_bad_size();
-   pr_err("Invalid huge page size specified(%llu)\n", size);
-   }
-
-   return 1;
-}
-__setup("hugepagesz=", hugepage_setup_sz);
-
 static int __init hugetlbpage_init(void)
 {
bool configured = false;
diff --git a/arch/riscv/mm/hugetlbpage.c b/arch/riscv/mm/hugetlbpage.c
index da1f516bc451..4e5d7e9f0eef 100644
--- a/arch/riscv/mm/hugetlbpage.c
+++ b/arch/riscv/mm/hugetlbpage.c
@@ -22,22 +22,6 @@ bool __init arch_hugetlb_valid_size(unsigned long size)
return false;
 }
 
-static __init int setup_hugepagesz(char *opt)
-{
-   unsigned long ps = memparse(opt, );
-
-   if (arch_hugetlb_valid_size(ps)) {
-   hugetlb_add_hstate(ilog2(ps) - PAGE_SHIFT);
-   return 1;
-   }
-
-   hugetlb_bad_size();
-   pr_err("hugepagesz: Unsupported page size %lu M\n", ps >> 20);
-   return 0;
-
-}
-__setup("hugepagesz=", setup_hugepagesz);
-
 #ifdef CONFIG_CONTIG_ALLOC
 static __init int gigantic_pages_init(void)
 {
diff --git a/arch/s390/mm/hugetlbpage.c b/arch/s390/mm/hugetlbpage.c
index ac25b207624c..242dfc0d462d 100644
--- a/arch/s390/mm/hugetlbpage.c
+++ b/arch/s390/mm/hugetlbpage.c
@@ -261,24 +261,6 @@ bool __init arch_hugetlb_valid_size(unsigned long size)
return false;
 }
 
-static __init int setup_hugepagesz(char *opt)
-{
-   unsigned long size;
-   char *string = opt;
-
-   size = memparse(opt, );
-   if (arch_hugetlb_valid_size(size)) {
-   hugetlb_add_hstate(ilog2(size) - PAGE_SHIFT);
-   } else {
-   hugetlb_bad_size();
-   pr_err("hugepagesz= specifies an unsupported page size %s\n",
-   string);
-   return 0;
-   }
-   return 1;
-}
-__setup("hugepagesz=", setup_hugepagesz);
-
 static unsigned long hugetlb_get_unmapped_area_bottomup(struct file *file,
unsigned long addr, unsigned long len,
unsigned long pgoff, unsigned long flags)
diff --git a/arch/sparc/mm/init_64.c b/arch/sparc/mm/init_64.c
index 2bfe8e22b706..4618f96fd30f 100644
--- a/arch/sparc/mm/init_64.c
+++ b/arch/sparc/mm/init_64.c
@@ -397,28 +397,6 @@ bool __init arch_hugetlb_valid_size(unsigned long size)
 
return true;
 }
-
-static int __init setup_hugepagesz(char *string)
-{
-   unsigned long long hugepage_size;
-   int rc = 0;
-
-   hugepage_size = memparse(string, );
-
-   if (!arch_hugetlb_valid_size((unsigned long)hugepage_size)) {
-   hugetlb_bad_size();
-   pr_err("

Re: [PATCH 1/4] hugetlbfs: add arch_hugetlb_valid_size

2020-03-26 Thread Mike Kravetz
On 3/18/20 4:36 PM, Dave Hansen wrote:
> On 3/18/20 3:52 PM, Mike Kravetz wrote:
>> Sounds good.  I'll incorporate those changes into a v2, unless someone
>> else with has a different opinion.
>>
>> BTW, this patch should not really change the way the code works today.
>> It is mostly a movement of code.  Unless I am missing something, the
>> existing code will always allow setup of PMD_SIZE hugetlb pages.
> 
> Hah, I totally skipped over the old code in the diff.
> 
> It looks like we'll disable hugetblfs *entirely* if PSE isn't supported.
>  I think this is actually wrong, but nobody ever noticed.  I think you'd
> have to be running as a guest under a hypervisor that's lying about PSE
> not being supported *and* care about 1GB pages.  Nobody does that.

Actually, !PSE will disable hugetlbfs a little later in the boot process.
You are talking about hugepages_supported(), correct?

I think something really bad could happen in this situation (!PSE and
X86_FEATURE_GBPAGES).  When parsing 'hugepages=' for gigantic pages we
immediately allocate from bootmem.  This happens before later checks in
hugetlb_init for hugepages_supported().  So, I think we would end up
allocating GB pages from bootmem and not be able to use or free them. :(

Perhaps it would be best to check hugepages_supported() when parsing
hugetlb command line options.  If not enabled, throw an error.  This
will be much easier to do after moving all command line parsing to
arch independent code.

Is that a sufficient way to address this concern?  I think it is a good
change in any case.
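
A minimal sketch of such a check in the arch independent parsing (hypothetical
at this point; the exact placement and message are up to v2):

	static int __init hugepages_setup(char *s)
	{
		if (!hugepages_supported()) {
			pr_warn("HugeTLB: huge pages not supported, ignoring hugepages = %s\n", s);
			return 0;
		}
		/* ... existing parsing of the page count ... */
		return 1;
	}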
-- 
Mike Kravetz


Re: [PATCH 4/4] hugetlbfs: clean up command line processing

2020-03-24 Thread Mike Kravetz
On 3/23/20 8:47 PM, Longpeng (Mike, Cloud Infrastructure Service Product Dept.) 
wrote:
> 
> 
> On 2020/3/24 8:43, Mina Almasry wrote:
>> On Wed, Mar 18, 2020 at 3:07 PM Mike Kravetz  wrote:
>>> +default_hugepagesz - Specify the default huge page size.  This parameter 
>>> can
>>> +   only be specified on the command line.  No other hugetlb command 
>>> line
>>> +   parameter is associated with default_hugepagesz.  Therefore, it can
>>> +   appear anywhere on the command line.  Valid default huge page size 
>>> is
>>> +   architecture dependent.
>>
>> Maybe specify what happens/should happen in a case like:
>>
>> hugepages=100 default_hugepagesz=1G
>>
>> Does that allocate 100 2MB pages or 100 1G pages? Assuming the default
>> size is 2MB.

That will allocate 100 1G pages as 1G is the default.  However, if the
command line reads:

hugepages=100 default_hugepagesz=1G hugepages=200

You will get this warning,

HugeTLB: First hugepages=104857600 kB ignored

>>
>> Also, regarding Randy's comment. It may be nice to keep these docs in
>> one place only, so we don't have to maintain 2 docs in sync.

Let me think about that a bit.  We should probably expand the
kernel-parameters doc.  Or, we should at least make it more clear.  This
doc also talks about the command line parameters and in general goes into
more detail.  However, more people read kernel-parameters doc.

>>> +
>>>  When multiple huge page sizes are supported, ``/proc/sys/vm/nr_hugepages``
>>>  indicates the current number of pre-allocated huge pages of the default 
>>> size.
>>>  Thus, one can use the following command to dynamically allocate/deallocate
>>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
>>> index cc85b4f156ca..2b9bf01db2b6 100644
>>> --- a/mm/hugetlb.c
>>> +++ b/mm/hugetlb.c

>>> -static int __init hugetlb_nrpages_setup(char *s)
>>> +/*
>>> + * hugepages command line processing
>>> + * hugepages must normally follows a valid hugepagsz specification.  If 
>>> not,
>>
>> 'hugepages must' or 'hugepages normally follows'
>>> + * ignore the hugepages value.  hugepages can also be the first huge page
>>> + * command line option in which case it specifies the number of huge pages
>>> + * for the default size.
>>> + */
>>> +static int __init hugepages_setup(char *s)
>>>  {
>>> unsigned long *mhp;
>>> static unsigned long *last_mhp;
>>>
>>> if (!parsed_valid_hugepagesz) {
>>> -   pr_warn("hugepages = %s preceded by "
>>> +   pr_warn("HugeTLB: hugepages = %s preceded by "
>>> "an unsupported hugepagesz, ignoring\n", s);
>>> parsed_valid_hugepagesz = true;
>>> return 1;
>>> }
>>> /*
>>> -* !hugetlb_max_hstate means we haven't parsed a hugepagesz= 
>>> parameter yet,
>>> -* so this hugepages= parameter goes to the "default hstate".
>>> +* !hugetlb_max_hstate means we haven't parsed a hugepagesz= 
>>> parameter
>>> +* yet, so this hugepages= parameter goes to the "default hstate".
>>>  */
>>> else if (!hugetlb_max_hstate)
>>> mhp = _hstate_max_huge_pages;
>>
>> We don't set parsed_valid_hugepagesz to false at the end of this
>> function, shouldn't we? Parsing a hugepages= value should 'consume' a
>> previously defined hugepagesz= value, so that this is invalid IIUC:
>>
>> hugepagesz=x hugepages=z hugepages=y
>>
> In this case, we'll get:
> "HugeTLB: hugepages= specified twice without interleaving hugepagesz=, 
> ignoring
> hugepages=y"
> 

Thanks Longpeng (Mike),

I believe that is the desired message in this situation.  The code uses saved
values of mhp (max hstate pointer) to catch this condition.  Setting
parsed_valid_hugepagesz to false would result in the message:

HugeTLB: hugepages=y preceded by an unsupported hugepagesz, ignoring
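
For context, the duplicate detection described above keys off the saved
max-pages pointer; roughly (a sketch of the relevant part of hugepages_setup,
with the surrounding parsing elided):

	unsigned long *mhp;
	static unsigned long *last_mhp;

	/* the count applies to the default hstate or the last parsed hugepagesz */
	if (!hugetlb_max_hstate)
		mhp = &default_hstate_max_huge_pages;
	else
		mhp = &parsed_hstate->max_huge_pages;

	if (mhp == last_mhp) {
		pr_warn("HugeTLB: hugepages= specified twice without interleaving hugepagesz=, ignoring hugepages=%s\n", s);
		return 1;
	}

	if (sscanf(s, "%lu", mhp) <= 0)
		*mhp = 0;

	last_mhp = mhp;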

Thanks for all your comments I will incorporate in v2 and send later this
week.
-- 
Mike Kravetz


Re: [PATCH 3/4] hugetlbfs: remove hugetlb_add_hstate() warning for existing hstate

2020-03-23 Thread Mike Kravetz
On 3/23/20 5:01 PM, Mina Almasry wrote:
> On Wed, Mar 18, 2020 at 3:07 PM Mike Kravetz  wrote:
>>
>> The routine hugetlb_add_hstate prints a warning if the hstate already
>> exists.  This was originally done as part of kernel command line
>> parsing.  If 'hugepagesz=' was specified more than once, the warning
>> pr_warn("hugepagesz= specified twice, ignoring\n");
>> would be printed.
>>
>> Some architectures want to enable all huge page sizes.  They would
>> call hugetlb_add_hstate for all supported sizes.  However, this was
>> done after command line processing and as a result hstates could have
>> already been created for some sizes.  To make sure no warning were
>> printed, there would often be code like:
>> if (!size_to_hstate(size)
>> hugetlb_add_hstate(ilog2(size) - PAGE_SHIFT)
>>
>> The only time we want to print the warning is as the result of command
>> line processing.  So, remove the warning from hugetlb_add_hstate and
>> add it to the single arch independent routine processing "hugepagesz=".
>> After this, calls to size_to_hstate() in arch specific code can be
>> removed and hugetlb_add_hstate can be called without worrying about
>> warning messages.
>>
>> Signed-off-by: Mike Kravetz 
>> ---
>>  arch/arm64/mm/hugetlbpage.c   | 16 
>>  arch/powerpc/mm/hugetlbpage.c |  3 +--
>>  arch/riscv/mm/hugetlbpage.c   |  2 +-
>>  arch/sparc/mm/init_64.c   | 19 ---
>>  arch/x86/mm/hugetlbpage.c |  2 +-
>>  mm/hugetlb.c  | 10 +++---
>>  6 files changed, 18 insertions(+), 34 deletions(-)
>>
>> diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
>> index 4aa9534a45d7..050809e6f0a9 100644
>> --- a/arch/arm64/mm/hugetlbpage.c
>> +++ b/arch/arm64/mm/hugetlbpage.c
>> @@ -441,22 +441,14 @@ void huge_ptep_clear_flush(struct vm_area_struct *vma,
>> clear_flush(vma->vm_mm, addr, ptep, pgsize, ncontig);
>>  }
>>
>> -static void __init add_huge_page_size(unsigned long size)
>> -{
>> -   if (size_to_hstate(size))
>> -   return;
>> -
>> -   hugetlb_add_hstate(ilog2(size) - PAGE_SHIFT);
>> -}
>> -
>>  static int __init hugetlbpage_init(void)
>>  {
>>  #ifdef CONFIG_ARM64_4K_PAGES
>> -   add_huge_page_size(PUD_SIZE);
>> +   hugetlb_add_hstate(ilog2(PUD_SIZE) - PAGE_SHIFT);
>>  #endif
>> -   add_huge_page_size(CONT_PMD_SIZE);
>> -   add_huge_page_size(PMD_SIZE);
>> -   add_huge_page_size(CONT_PTE_SIZE);
>> +   hugetlb_add_hstate(ilog2(CONT_PMD_SIZE) - PAGE_SHIFT);
>> +   hugetlb_add_hstate(ilog2(PMD_SIZE) - PAGE_SHIFT);
>> +   hugetlb_add_hstate(ilog2(CONT_PTE_SIZE) - PAGE_SHIFT);
>>
>> return 0;
>>  }
>> diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c
>> index 166960ba1236..f46464ba6fb4 100644
>> --- a/arch/powerpc/mm/hugetlbpage.c
>> +++ b/arch/powerpc/mm/hugetlbpage.c
>> @@ -584,8 +584,7 @@ static int __init add_huge_page_size(unsigned long long 
>> size)
>> if (!arch_hugetlb_valid_size(size))
>> return -EINVAL;
>>
>> -   if (!size_to_hstate(size))
>> -   hugetlb_add_hstate(shift - PAGE_SHIFT);
>> +   hugetlb_add_hstate(shift - PAGE_SHIFT);
>> return 0;
>>  }
>>
>> diff --git a/arch/riscv/mm/hugetlbpage.c b/arch/riscv/mm/hugetlbpage.c
>> index bdf89d7eb714..beaa91941db8 100644
>> --- a/arch/riscv/mm/hugetlbpage.c
>> +++ b/arch/riscv/mm/hugetlbpage.c
>> @@ -26,7 +26,7 @@ bool __init arch_hugetlb_valid_size(unsigned long long 
>> size)
>>  static __init int gigantic_pages_init(void)
>>  {
>> /* With CONTIG_ALLOC, we can allocate gigantic pages at runtime */
>> -   if (IS_ENABLED(CONFIG_64BIT) && !size_to_hstate(1UL << PUD_SHIFT))
>> +   if (IS_ENABLED(CONFIG_64BIT))
>> hugetlb_add_hstate(PUD_SHIFT - PAGE_SHIFT);
>> return 0;
>>  }
>> diff --git a/arch/sparc/mm/init_64.c b/arch/sparc/mm/init_64.c
>> index 5c29203fd460..8f619edc8f8c 100644
>> --- a/arch/sparc/mm/init_64.c
>> +++ b/arch/sparc/mm/init_64.c
>> @@ -325,23 +325,12 @@ static void __update_mmu_tsb_insert(struct mm_struct 
>> *mm, unsigned long tsb_inde
>>  }
>>
>>  #ifdef CONFIG_HUGETLB_PAGE
>> -static void __init add_huge_page_size(unsigned long size)
>> -{
>> -   unsigned int order;
>> -
&

Re: [PATCH 1/4] hugetlbfs: add arch_hugetlb_valid_size

2020-03-19 Thread Mike Kravetz
On 3/19/20 12:00 AM, Christophe Leroy wrote:
> 
> Le 18/03/2020 à 23:06, Mike Kravetz a écrit :
>> The architecture independent routine hugetlb_default_setup sets up
>> the default huge pages size.  It has no way to verify if the passed
>> value is valid, so it accepts it and attempts to validate at a later
>> time.  This requires undocumented cooperation between the arch specific
>> and arch independent code.
>>
>> For architectures that support more than one huge page size, provide
>> a routine arch_hugetlb_valid_size to validate a huge page size.
>> hugetlb_default_setup can use this to validate passed values.
>>
>> arch_hugetlb_valid_size will also be used in a subsequent patch to
>> move processing of the "hugepagesz=" in arch specific code to a common
>> routine in arch independent code.
>>
>> Signed-off-by: Mike Kravetz 
>> ---
>>   arch/arm64/include/asm/hugetlb.h   |  2 ++
>>   arch/arm64/mm/hugetlbpage.c| 19 ++-
>>   arch/powerpc/include/asm/hugetlb.h |  3 +++
>>   arch/powerpc/mm/hugetlbpage.c  | 20 +---
>>   arch/riscv/include/asm/hugetlb.h   |  3 +++
>>   arch/riscv/mm/hugetlbpage.c| 28 ++--
>>   arch/s390/include/asm/hugetlb.h|  3 +++
>>   arch/s390/mm/hugetlbpage.c | 18 +-
>>   arch/sparc/include/asm/hugetlb.h   |  3 +++
>>   arch/sparc/mm/init_64.c| 23 ---
>>   arch/x86/include/asm/hugetlb.h |  3 +++
>>   arch/x86/mm/hugetlbpage.c  | 21 +++--
>>   include/linux/hugetlb.h|  7 +++
>>   mm/hugetlb.c   | 16 +---
>>   14 files changed, 126 insertions(+), 43 deletions(-)
>>
> 
> [snip]
> 
>> diff --git a/arch/powerpc/include/asm/hugetlb.h 
>> b/arch/powerpc/include/asm/hugetlb.h
>> index bd6504c28c2f..3b5939016955 100644
>> --- a/arch/powerpc/include/asm/hugetlb.h
>> +++ b/arch/powerpc/include/asm/hugetlb.h
>> @@ -64,6 +64,9 @@ static inline void arch_clear_hugepage_flags(struct page 
>> *page)
>>   {
>>   }
>>   +#define arch_hugetlb_valid_size arch_hugetlb_valid_size
>> +extern bool __init arch_hugetlb_valid_size(unsigned long long size);
> 
> Don't add 'extern' keyword, it is irrelevant for a function declaration.
> 

Will do.  One of the other arch's did this and I got into a bad habit.

> checkpatch --strict doesn't like it either 
> (https://openpower.xyz/job/snowpatch/job/snowpatch-linux-checkpatch/12318//artifact/linux/checkpatch.log)
> 
>> +
>>   #include 
>> #else /* ! CONFIG_HUGETLB_PAGE */
>> diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c
>> index 33b3461d91e8..b78f660252f3 100644
>> --- a/arch/powerpc/mm/hugetlbpage.c
>> +++ b/arch/powerpc/mm/hugetlbpage.c
>> @@ -558,7 +558,7 @@ unsigned long vma_mmu_pagesize(struct vm_area_struct 
>> *vma)
>>   return vma_kernel_pagesize(vma);
>>   }
>>   -static int __init add_huge_page_size(unsigned long long size)
>> +bool __init arch_hugetlb_valid_size(unsigned long long size)
>>   {
>>   int shift = __ffs(size);
>>   int mmu_psize;
>> @@ -566,20 +566,26 @@ static int __init add_huge_page_size(unsigned long 
>> long size)
>>   /* Check that it is a page size supported by the hardware and
>>* that it fits within pagetable and slice limits. */
>>   if (size <= PAGE_SIZE || !is_power_of_2(size))
>> -return -EINVAL;
>> +return false;
>> mmu_psize = check_and_get_huge_psize(shift);
>>   if (mmu_psize < 0)
>> -return -EINVAL;
>> +return false;
>> BUG_ON(mmu_psize_defs[mmu_psize].shift != shift);
>>   -/* Return if huge page size has already been setup */
>> -if (size_to_hstate(size))
>> -return 0;
>> +return true;
>> +}
>>   -hugetlb_add_hstate(shift - PAGE_SHIFT);
>> +static int __init add_huge_page_size(unsigned long long size)
>> +{
>> +int shift = __ffs(size);
>> +
>> +if (!arch_hugetlb_valid_size(size))
>> +return -EINVAL;
>>   +if (!size_to_hstate(size))
>> +hugetlb_add_hstate(shift - PAGE_SHIFT);
>>   return 0;
>>   }
>>   
> 
> [snip]
> 
>> diff --git a/arch/x86/mm/hugetlbpage.c b/arch/x86/mm/hugetlbpage.c
>> index 5bfd5aef5378..51e6208fdeec 100644
>> --- a/arch/x86/mm/hugetlbpage.c
>> +++ b/arch/x86/mm/hugetlbpage.c
>> @@ -181,16 +181,25 @@ hugetlb_get_u

Re: [PATCH 2/4] hugetlbfs: move hugepagesz= parsing to arch independent code

2020-03-19 Thread Mike Kravetz
On 3/19/20 12:04 AM, Christophe Leroy wrote:
> 
> 
> Le 18/03/2020 à 23:06, Mike Kravetz a écrit :
>> Now that architectures provide arch_hugetlb_valid_size(), parsing
>> of "hugepagesz=" can be done in architecture independent code.
>> Create a single routine to handle hugepagesz= parsing and remove
>> all arch specific routines.  We can also remove the interface
>> hugetlb_bad_size() as this is no longer used outside arch independent
>> code.
>>
>> This also provides consistent behavior of hugetlbfs command line
>> options.  The hugepagesz= option should only be specified once for
>> a specific size, but some architectures allow multiple instances.
>> This appears to be more of an oversight when code was added by some
>> architectures to set up ALL huge pages sizes.
>>
>> Signed-off-by: Mike Kravetz 
>> ---
>>   arch/arm64/mm/hugetlbpage.c   | 15 ---
>>   arch/powerpc/mm/hugetlbpage.c | 15 ---
>>   arch/riscv/mm/hugetlbpage.c   | 16 
>>   arch/s390/mm/hugetlbpage.c| 18 --
>>   arch/sparc/mm/init_64.c   | 22 --
>>   arch/x86/mm/hugetlbpage.c | 16 
>>   include/linux/hugetlb.h   |  1 -
>>   mm/hugetlb.c  | 24 ++--
>>   8 files changed, 18 insertions(+), 109 deletions(-)
>>
> 
> [snip]
> 
>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
>> index 2f99359b93af..cd4ec07080fb 100644
>> --- a/mm/hugetlb.c
>> +++ b/mm/hugetlb.c
>> @@ -3149,12 +3149,6 @@ static int __init hugetlb_init(void)
>>   }
>>   subsys_initcall(hugetlb_init);
>>   -/* Should be called on processing a hugepagesz=... option */
>> -void __init hugetlb_bad_size(void)
>> -{
>> -parsed_valid_hugepagesz = false;
>> -}
>> -
>>   void __init hugetlb_add_hstate(unsigned int order)
>>   {
>>   struct hstate *h;
>> @@ -3224,6 +3218,24 @@ static int __init hugetlb_nrpages_setup(char *s)
>>   }
>>   __setup("hugepages=", hugetlb_nrpages_setup);
>>   +static int __init hugepagesz_setup(char *s)
>> +{
>> +unsigned long long size;
>> +char *saved_s = s;
>> +
>> +size = memparse(s, );
> 
> You don't use s after that, so you can pass NULL instead of  and avoid the 
> saved_s

Thanks!

I'll incorporate in v2.

-- 
Mike Kravetz


Re: [PATCH 4/4] hugetlbfs: clean up command line processing

2020-03-18 Thread Mike Kravetz
On 3/18/20 5:20 PM, Randy Dunlap wrote:
> Hi Mike,
> 
> On 3/18/20 3:06 PM, Mike Kravetz wrote:
>> With all hugetlb page processing done in a single file clean up code.
>> - Make code match desired semantics
>>   - Update documentation with semantics
>> - Make all warnings and errors messages start with 'HugeTLB:'.
>> - Consistently name command line parsing routines.
>> - Add comments to code
>>   - Describe some of the subtle interactions
>>   - Describe semantics of command line arguments
>>
>> Signed-off-by: Mike Kravetz 
>> ---
>>  Documentation/admin-guide/mm/hugetlbpage.rst | 26 +++
>>  mm/hugetlb.c | 78 +++-
>>  2 files changed, 87 insertions(+), 17 deletions(-)
> 
> 
>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
>> index cc85b4f156ca..2b9bf01db2b6 100644
>> --- a/mm/hugetlb.c
>> +++ b/mm/hugetlb.c
> 
>> @@ -3214,8 +3238,15 @@ static int __init hugetlb_nrpages_setup(char *s)
>>  
>>  return 1;
>>  }
>> -__setup("hugepages=", hugetlb_nrpages_setup);
>> +__setup("hugepages=", hugepages_setup);
>>  
>> +/*
>> + * hugepagesz command line processing
>> + * A specific huge page size can only be specified once with hugepagesz.
>> + * hugepagesz is followed by hugepages on the commnad line.  The global
> 
> typo:command

Thanks

> 
>> + * variable 'parsed_valid_hugepagesz' is used to determine if prior
>> + * hugepagesz argument was valid.
>> + */
>>  static int __init hugepagesz_setup(char *s)
>>  {
>>  unsigned long long size;
> 
> 
> Does any of this need to be updated?  (from 
> Documentation/admin-guide/kernel-parameters.txt)
> 
>   hugepagesz= [HW,IA-64,PPC,X86-64] The size of the HugeTLB pages.
>   On x86-64 and powerpc, this option can be specified
>   multiple times interleaved with hugepages= to reserve
>   huge pages of different sizes. Valid pages sizes on
>   x86-64 are 2M (when the CPU supports "pse") and 1G
>   (when the CPU supports the "pdpe1gb" cpuinfo flag).
> 

No functional changes should be expected/seen as a result of these patches.
So the documentation here is basically OK.  However, it is out of date as
more architectures are supported.  In addition, the statement "this option
can be specified multiple times interleaved with hugepages= to reserve
huge pages of different sizes." may need a little clarification.  As mentioned
elsewhere,  hugepagesz= can only be specified once per huge page size.

I'll make some updates in v2.
-- 
Mike Kravetz


Re: [PATCH 1/4] hugetlbfs: add arch_hugetlb_valid_size

2020-03-18 Thread Mike Kravetz
On 3/18/20 3:15 PM, Dave Hansen wrote:
> Hi Mike,
> 
> The series looks like a great idea to me.  One nit on the x86 bits,
> though...
> 
>> diff --git a/arch/x86/mm/hugetlbpage.c b/arch/x86/mm/hugetlbpage.c
>> index 5bfd5aef5378..51e6208fdeec 100644
>> --- a/arch/x86/mm/hugetlbpage.c
>> +++ b/arch/x86/mm/hugetlbpage.c
>> @@ -181,16 +181,25 @@ hugetlb_get_unmapped_area(struct file *file, unsigned 
>> long addr,
>>  #endif /* CONFIG_HUGETLB_PAGE */
>>  
>>  #ifdef CONFIG_X86_64
>> +bool __init arch_hugetlb_valid_size(unsigned long long size)
>> +{
>> +if (size == PMD_SIZE)
>> +return true;
>> +else if (size == PUD_SIZE && boot_cpu_has(X86_FEATURE_GBPAGES))
>> +return true;
>> +else
>> +return false;
>> +}
> 
> I'm pretty sure it's possible to have a system without 2M/PMD page
> support.  We even have a handy-dandy comment about it in
> arch/x86/include/asm/required-features.h:
> 
>   #ifdef CONFIG_X86_64
>   #ifdef CONFIG_PARAVIRT
>   /* Paravirtualized systems may not have PSE or PGE available */
>   #define NEED_PSE0
>   ...
> 
> I *think* you need an X86_FEATURE_PSE check here to be totally correct.
> 
>   if (size == PMD_SIZE && cpu_feature_enabled(X86_FEATURE_PSE))
>   return true;
> 
> BTW, I prefer cpu_feature_enabled() to boot_cpu_has() because it
> includes disabled-features checking.  I don't think any of it matters
> for these specific features, but I generally prefer it on principle.

Sounds good.  I'll incorporate those changes into a v2, unless someone
else has a different opinion.

BTW, this patch should not really change the way the code works today.
It is mostly a movement of code.  Unless I am missing something, the
existing code will always allow setup of PMD_SIZE hugetlb pages.
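
Incorporating those suggestions, the x86 helper for v2 would look roughly like
this (a sketch, not the posted patch):

	bool __init arch_hugetlb_valid_size(unsigned long size)
	{
		if (size == PMD_SIZE && cpu_feature_enabled(X86_FEATURE_PSE))
			return true;
		else if (size == PUD_SIZE && cpu_feature_enabled(X86_FEATURE_GBPAGES))
			return true;
		else
			return false;
	}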
-- 
Mike Kravetz


Re: [PATCH 1/4] hugetlbfs: add arch_hugetlb_valid_size

2020-03-18 Thread Mike Kravetz
On 3/18/20 3:09 PM, Will Deacon wrote:
> On Wed, Mar 18, 2020 at 03:06:31PM -0700, Mike Kravetz wrote:
>> The architecture independent routine hugetlb_default_setup sets up
>> the default huge pages size.  It has no way to verify if the passed
>> value is valid, so it accepts it and attempts to validate at a later
>> time.  This requires undocumented cooperation between the arch specific
>> and arch independent code.
>>
>> For architectures that support more than one huge page size, provide
>> a routine arch_hugetlb_valid_size to validate a huge page size.
>> hugetlb_default_setup can use this to validate passed values.
>>
>> arch_hugetlb_valid_size will also be used in a subsequent patch to
>> move processing of the "hugepagesz=" in arch specific code to a common
>> routine in arch independent code.
>>
>> Signed-off-by: Mike Kravetz 
>> ---
>>  arch/arm64/include/asm/hugetlb.h   |  2 ++
>>  arch/arm64/mm/hugetlbpage.c| 19 ++-
>>  arch/powerpc/include/asm/hugetlb.h |  3 +++
>>  arch/powerpc/mm/hugetlbpage.c  | 20 +---
>>  arch/riscv/include/asm/hugetlb.h   |  3 +++
>>  arch/riscv/mm/hugetlbpage.c| 28 ++--
>>  arch/s390/include/asm/hugetlb.h|  3 +++
>>  arch/s390/mm/hugetlbpage.c | 18 +-
>>  arch/sparc/include/asm/hugetlb.h   |  3 +++
>>  arch/sparc/mm/init_64.c| 23 ---
>>  arch/x86/include/asm/hugetlb.h |  3 +++
>>  arch/x86/mm/hugetlbpage.c  | 21 +++--
>>  include/linux/hugetlb.h|  7 +++
>>  mm/hugetlb.c   | 16 +---
>>  14 files changed, 126 insertions(+), 43 deletions(-)
>>
>> diff --git a/arch/arm64/include/asm/hugetlb.h 
>> b/arch/arm64/include/asm/hugetlb.h
>> index 2eb6c234d594..3248f35213ee 100644
>> --- a/arch/arm64/include/asm/hugetlb.h
>> +++ b/arch/arm64/include/asm/hugetlb.h

>> +
>> +static __init int setup_hugepagesz(char *opt)
>> +{
>> +unsigned long long ps = memparse(opt, );
>> +
>> +if arch_hugetlb_valid_size(ps)) {
> 
> Please compile your changes if you're touching multiple architectures. You
> can get cross-compiler binaries from:
> 

My apologies.  I only cross compiled the result of the series on each
architecture.  The above code is obviously bad.

-- 
Mike Kravetz


[PATCH 4/4] hugetlbfs: clean up command line processing

2020-03-18 Thread Mike Kravetz
With all hugetlb page processing done in a single file, clean up the code.
- Make code match desired semantics
  - Update documentation with semantics
- Make all warning and error messages start with 'HugeTLB:'.
- Consistently name command line parsing routines.
- Add comments to code
  - Describe some of the subtle interactions
  - Describe semantics of command line arguments

Signed-off-by: Mike Kravetz 
---
 Documentation/admin-guide/mm/hugetlbpage.rst | 26 +++
 mm/hugetlb.c | 78 +++-
 2 files changed, 87 insertions(+), 17 deletions(-)

diff --git a/Documentation/admin-guide/mm/hugetlbpage.rst 
b/Documentation/admin-guide/mm/hugetlbpage.rst
index 1cc0bc78d10e..afcf33c3 100644
--- a/Documentation/admin-guide/mm/hugetlbpage.rst
+++ b/Documentation/admin-guide/mm/hugetlbpage.rst
@@ -100,6 +100,32 @@ with a huge page size selection parameter 
"hugepagesz=".   must
 be specified in bytes with optional scale suffix [kKmMgG].  The default huge
 page size may be selected with the "default_hugepagesz=" boot parameter.
 
+Hugetlb boot command line parameter semantics
+hugepagesz - Specify a huge page size.  Used in conjunction with hugepages
+   parameter to preallocate a number of huge pages of the specified
+   size.  Hence, hugepagesz and hugepages are typically specified in
+   pairs such as:
+   hugepagesz=2M hugepages=512
+   hugepagesz can only be specified once on the command line for a
+   specific huge page size.  Valid huge page sizes are architecture
+   dependent.
+hugepages - Specify the number of huge pages to preallocate.  This typically
+   follows a valid hugepagesz parameter.  However, if hugepages is the
+   first or only hugetlb command line parameter it specifies the number
+   of huge pages of default size to allocate.  The number of huge pages
+   of default size specified in this manner can be overwritten by a
+   hugepagesz,hugepages parameter pair for the default size.
+   For example, on an architecture with 2M default huge page size:
+   hugepages=256 hugepagesz=2M hugepages=512
+   will result in 512 2M huge pages being allocated.  If a hugepages
+   parameter is preceded by an invalid hugepagesz parameter, it will
+   be ignored.
+default_hugepagesz - Specify the default huge page size.  This parameter can
+   only be specified on the command line.  No other hugetlb command line
+   parameter is associated with default_hugepagesz.  Therefore, it can
+   appear anywhere on the command line.  Valid default huge page size is
+   architecture dependent.
+
 When multiple huge page sizes are supported, ``/proc/sys/vm/nr_hugepages``
 indicates the current number of pre-allocated huge pages of the default size.
 Thus, one can use the following command to dynamically allocate/deallocate
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index cc85b4f156ca..2b9bf01db2b6 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -2954,7 +2954,7 @@ static void __init hugetlb_sysfs_init(void)
err = hugetlb_sysfs_add_hstate(h, hugepages_kobj,
 hstate_kobjs, _attr_group);
if (err)
-   pr_err("Hugetlb: Unable to add hstate %s", h->name);
+   pr_err("HugeTLB: Unable to add hstate %s", h->name);
}
 }
 
@@ -3058,7 +3058,7 @@ static void hugetlb_register_node(struct node *node)
nhs->hstate_kobjs,
_node_hstate_attr_group);
if (err) {
-   pr_err("Hugetlb: Unable to add hstate %s for node %d\n",
+   pr_err("HugeTLB: Unable to add hstate %s for node %d\n",
h->name, node->dev.id);
hugetlb_unregister_node(node);
break;
@@ -3109,19 +3109,35 @@ static int __init hugetlb_init(void)
if (!hugepages_supported())
return 0;
 
-   if (!size_to_hstate(default_hstate_size)) {
-   if (default_hstate_size != 0) {
-   pr_err("HugeTLB: unsupported default_hugepagesz %lu. 
Reverting to %lu\n",
-  default_hstate_size, HPAGE_SIZE);
-   }
-
+   /*
+* Make sure HPAGE_SIZE (HUGETLB_PAGE_ORDER) hstate exists.  Some
+* architectures depend on setup being done here.
+*
+* If a valid default huge page size was specified on the command line,
+* add associated hstate if necessary.  If not, set default_hstate_size
+* to default size.  default_hstate_idx is used at runtime to identify
+* the default huge page size/hstate.
+*/
+   hugetlb_add_hstate(HUGETLB_PAGE_ORDER);
+   if (default_hstate_size)
+   

[PATCH 0/4] Clean up hugetlb boot command line processing

2020-03-18 Thread Mike Kravetz
Longpeng(Mike) reported a weird message from hugetlb command line processing
and proposed a solution [1].  While the proposed patch does address the
specific issue, there are other related issues in command line processing.
As hugetlbfs evolved, updates to command line processing have been made to
meet immediate needs and not necessarily in a coordinated manner.  The result
is that some processing is done in arch specific code, some is done in arch
independent code and coordination is problematic.  Semantics can vary between
architectures.

The following patch series does the following:
- Define arch specific arch_hugetlb_valid_size routine used to validate
  passed huge page sizes.
- Move hugepagesz= command line parsing out of arch specific code and into
  an arch independent routine.
- Clean up command line processing to follow desired semantics and
  document those semantics.

[1] https://lore.kernel.org/linux-mm/20200305033014.1152-1-longpe...@huawei.com

Mike Kravetz (4):
  hugetlbfs: add arch_hugetlb_valid_size
  hugetlbfs: move hugepagesz= parsing to arch independent code
  hugetlbfs: remove hugetlb_add_hstate() warning for existing hstate
  hugetlbfs: clean up command line processing

 Documentation/admin-guide/mm/hugetlbpage.rst |  26 
 arch/arm64/include/asm/hugetlb.h |   2 +
 arch/arm64/mm/hugetlbpage.c  |  30 ++---
 arch/powerpc/include/asm/hugetlb.h   |   3 +
 arch/powerpc/mm/hugetlbpage.c|  30 ++---
 arch/riscv/include/asm/hugetlb.h |   3 +
 arch/riscv/mm/hugetlbpage.c  |  24 ++--
 arch/s390/include/asm/hugetlb.h  |   3 +
 arch/s390/mm/hugetlbpage.c   |  24 ++--
 arch/sparc/include/asm/hugetlb.h |   3 +
 arch/sparc/mm/init_64.c  |  42 ++-
 arch/x86/include/asm/hugetlb.h   |   3 +
 arch/x86/mm/hugetlbpage.c|  23 ++--
 include/linux/hugetlb.h  |   8 +-
 mm/hugetlb.c | 126 ++-
 15 files changed, 198 insertions(+), 152 deletions(-)

-- 
2.24.1



[PATCH 1/4] hugetlbfs: add arch_hugetlb_valid_size

2020-03-18 Thread Mike Kravetz
The architecture independent routine hugetlb_default_setup sets up
the default huge pages size.  It has no way to verify if the passed
value is valid, so it accepts it and attempts to validate at a later
time.  This requires undocumented cooperation between the arch specific
and arch independent code.

For architectures that support more than one huge page size, provide
a routine arch_hugetlb_valid_size to validate a huge page size.
hugetlb_default_setup can use this to validate passed values.

arch_hugetlb_valid_size will also be used in a subsequent patch to
move processing of the "hugepagesz=" in arch specific code to a common
routine in arch independent code.

Signed-off-by: Mike Kravetz 
---
 arch/arm64/include/asm/hugetlb.h   |  2 ++
 arch/arm64/mm/hugetlbpage.c| 19 ++-
 arch/powerpc/include/asm/hugetlb.h |  3 +++
 arch/powerpc/mm/hugetlbpage.c  | 20 +---
 arch/riscv/include/asm/hugetlb.h   |  3 +++
 arch/riscv/mm/hugetlbpage.c| 28 ++--
 arch/s390/include/asm/hugetlb.h|  3 +++
 arch/s390/mm/hugetlbpage.c | 18 +-
 arch/sparc/include/asm/hugetlb.h   |  3 +++
 arch/sparc/mm/init_64.c| 23 ---
 arch/x86/include/asm/hugetlb.h |  3 +++
 arch/x86/mm/hugetlbpage.c  | 21 +++--
 include/linux/hugetlb.h|  7 +++
 mm/hugetlb.c   | 16 +---
 14 files changed, 126 insertions(+), 43 deletions(-)

diff --git a/arch/arm64/include/asm/hugetlb.h b/arch/arm64/include/asm/hugetlb.h
index 2eb6c234d594..3248f35213ee 100644
--- a/arch/arm64/include/asm/hugetlb.h
+++ b/arch/arm64/include/asm/hugetlb.h
@@ -59,6 +59,8 @@ extern void huge_pte_clear(struct mm_struct *mm, unsigned 
long addr,
 extern void set_huge_swap_pte_at(struct mm_struct *mm, unsigned long addr,
 pte_t *ptep, pte_t pte, unsigned long sz);
 #define set_huge_swap_pte_at set_huge_swap_pte_at
+extern bool __init arch_hugetlb_valid_size(unsigned long long size);
+#define arch_hugetlb_valid_size arch_hugetlb_valid_size
 
 #include 
 
diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
index bbeb6a5a6ba6..da30127086d0 100644
--- a/arch/arm64/mm/hugetlbpage.c
+++ b/arch/arm64/mm/hugetlbpage.c
@@ -462,23 +462,32 @@ static int __init hugetlbpage_init(void)
 }
 arch_initcall(hugetlbpage_init);
 
-static __init int setup_hugepagesz(char *opt)
+bool __init arch_hugetlb_valid_size(unsigned long long size)
 {
-   unsigned long ps = memparse(opt, );
-
-   switch (ps) {
+   switch (size) {
 #ifdef CONFIG_ARM64_4K_PAGES
case PUD_SIZE:
 #endif
case CONT_PMD_SIZE:
case PMD_SIZE:
case CONT_PTE_SIZE:
+   return true;
+   }
+
+   return false;
+}
+
+static __init int setup_hugepagesz(char *opt)
+{
+   unsigned long long ps = memparse(opt, );
+
+   if arch_hugetlb_valid_size(ps)) {
add_huge_page_size(ps);
return 1;
}
 
hugetlb_bad_size();
-   pr_err("hugepagesz: Unsupported page size %lu K\n", ps >> 10);
+   pr_err("hugepagesz: Unsupported page size %llu K\n", ps >> 10);
return 0;
 }
 __setup("hugepagesz=", setup_hugepagesz);
diff --git a/arch/powerpc/include/asm/hugetlb.h 
b/arch/powerpc/include/asm/hugetlb.h
index bd6504c28c2f..3b5939016955 100644
--- a/arch/powerpc/include/asm/hugetlb.h
+++ b/arch/powerpc/include/asm/hugetlb.h
@@ -64,6 +64,9 @@ static inline void arch_clear_hugepage_flags(struct page 
*page)
 {
 }
 
+#define arch_hugetlb_valid_size arch_hugetlb_valid_size
+extern bool __init arch_hugetlb_valid_size(unsigned long long size);
+
 #include 
 
 #else /* ! CONFIG_HUGETLB_PAGE */
diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c
index 33b3461d91e8..b78f660252f3 100644
--- a/arch/powerpc/mm/hugetlbpage.c
+++ b/arch/powerpc/mm/hugetlbpage.c
@@ -558,7 +558,7 @@ unsigned long vma_mmu_pagesize(struct vm_area_struct *vma)
return vma_kernel_pagesize(vma);
 }
 
-static int __init add_huge_page_size(unsigned long long size)
+bool __init arch_hugetlb_valid_size(unsigned long long size)
 {
int shift = __ffs(size);
int mmu_psize;
@@ -566,20 +566,26 @@ static int __init add_huge_page_size(unsigned long long 
size)
/* Check that it is a page size supported by the hardware and
 * that it fits within pagetable and slice limits. */
if (size <= PAGE_SIZE || !is_power_of_2(size))
-   return -EINVAL;
+   return false;
 
mmu_psize = check_and_get_huge_psize(shift);
if (mmu_psize < 0)
-   return -EINVAL;
+   return false;
 
BUG_ON(mmu_psize_defs[mmu_psize].shift != shift);
 
-   /* Return if huge page size has already been setup */
-   if (size_to_hstate(size))
-   return 0;

[PATCH 2/4] hugetlbfs: move hugepagesz= parsing to arch independent code

2020-03-18 Thread Mike Kravetz
Now that architectures provide arch_hugetlb_valid_size(), parsing
of "hugepagesz=" can be done in architecture independent code.
Create a single routine to handle hugepagesz= parsing and remove
all arch specific routines.  We can also remove the interface
hugetlb_bad_size() as this is no longer used outside arch independent
code.

This also provides consistent behavior of hugetlbfs command line
options.  The hugepagesz= option should only be specified once for
a specific size, but some architectures allow multiple instances.
This appears to be more of an oversight when code was added by some
architectures to set up ALL huge page sizes.

Signed-off-by: Mike Kravetz 
---
 arch/arm64/mm/hugetlbpage.c   | 15 ---
 arch/powerpc/mm/hugetlbpage.c | 15 ---
 arch/riscv/mm/hugetlbpage.c   | 16 
 arch/s390/mm/hugetlbpage.c| 18 --
 arch/sparc/mm/init_64.c   | 22 --
 arch/x86/mm/hugetlbpage.c | 16 
 include/linux/hugetlb.h   |  1 -
 mm/hugetlb.c  | 24 ++--
 8 files changed, 18 insertions(+), 109 deletions(-)

diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
index da30127086d0..4aa9534a45d7 100644
--- a/arch/arm64/mm/hugetlbpage.c
+++ b/arch/arm64/mm/hugetlbpage.c
@@ -476,18 +476,3 @@ bool __init arch_hugetlb_valid_size(unsigned long long 
size)
 
return false;
 }
-
-static __init int setup_hugepagesz(char *opt)
-{
-   unsigned long long ps = memparse(opt, );
-
-   if arch_hugetlb_valid_size(ps)) {
-   add_huge_page_size(ps);
-   return 1;
-   }
-
-   hugetlb_bad_size();
-   pr_err("hugepagesz: Unsupported page size %llu K\n", ps >> 10);
-   return 0;
-}
-__setup("hugepagesz=", setup_hugepagesz);
diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c
index b78f660252f3..166960ba1236 100644
--- a/arch/powerpc/mm/hugetlbpage.c
+++ b/arch/powerpc/mm/hugetlbpage.c
@@ -589,21 +589,6 @@ static int __init add_huge_page_size(unsigned long long 
size)
return 0;
 }
 
-static int __init hugepage_setup_sz(char *str)
-{
-   unsigned long long size;
-
-   size = memparse(str, );
-
-   if (add_huge_page_size(size) != 0) {
-   hugetlb_bad_size();
-   pr_err("Invalid huge page size specified(%llu)\n", size);
-   }
-
-   return 1;
-}
-__setup("hugepagesz=", hugepage_setup_sz);
-
 static int __init hugetlbpage_init(void)
 {
bool configured = false;
diff --git a/arch/riscv/mm/hugetlbpage.c b/arch/riscv/mm/hugetlbpage.c
index f1990882f16c..bdf89d7eb714 100644
--- a/arch/riscv/mm/hugetlbpage.c
+++ b/arch/riscv/mm/hugetlbpage.c
@@ -22,22 +22,6 @@ bool __init arch_hugetlb_valid_size(unsigned long long size)
return false;
 }
 
-static __init int setup_hugepagesz(char *opt)
-{
-   unsigned long long ps = memparse(opt, );
-
-   if (arch_hugetlb_valid_size(ps)) {
-   hugetlb_add_hstate(ilog2(ps) - PAGE_SHIFT);
-   return 1;
-   }
-
-   hugetlb_bad_size();
-   pr_err("hugepagesz: Unsupported page size %lu M\n", ps >> 20);
-   return 0;
-
-}
-__setup("hugepagesz=", setup_hugepagesz);
-
 #ifdef CONFIG_CONTIG_ALLOC
 static __init int gigantic_pages_init(void)
 {
diff --git a/arch/s390/mm/hugetlbpage.c b/arch/s390/mm/hugetlbpage.c
index d92e8c5c3e71..b809762f206e 100644
--- a/arch/s390/mm/hugetlbpage.c
+++ b/arch/s390/mm/hugetlbpage.c
@@ -261,24 +261,6 @@ bool __init arch_hugetlb_valid_size(unsigned long long 
size)
return false;
 }
 
-static __init int setup_hugepagesz(char *opt)
-{
-   unsigned long long size;
-   char *string = opt;
-
-   size = memparse(opt, );
-   if (arch_hugetlb_valid_size(size)) {
-   hugetlb_add_hstate(ilog2(size) - PAGE_SHIFT);
-   } else {
-   hugetlb_bad_size();
-   pr_err("hugepagesz= specifies an unsupported page size %s\n",
-   string);
-   return 0;
-   }
-   return 1;
-}
-__setup("hugepagesz=", setup_hugepagesz);
-
 static unsigned long hugetlb_get_unmapped_area_bottomup(struct file *file,
unsigned long addr, unsigned long len,
unsigned long pgoff, unsigned long flags)
diff --git a/arch/sparc/mm/init_64.c b/arch/sparc/mm/init_64.c
index 4cc248817b19..5c29203fd460 100644
--- a/arch/sparc/mm/init_64.c
+++ b/arch/sparc/mm/init_64.c
@@ -398,28 +398,6 @@ bool __init arch_hugetlb_valid_size(unsigned long long 
size)
 
return true;
 }
-
-static int __init setup_hugepagesz(char *string)
-{
-   unsigned long long hugepage_size;
-   int rc = 0;
-
-   hugepage_size = memparse(string, );
-
-   if (!arch_hugetlb_valid_size(hugepage_size)) {
-   hugetlb_bad_size();
-   pr_err("

[PATCH 3/4] hugetlbfs: remove hugetlb_add_hstate() warning for existing hstate

2020-03-18 Thread Mike Kravetz
The routine hugetlb_add_hstate prints a warning if the hstate already
exists.  This was originally done as part of kernel command line
parsing.  If 'hugepagesz=' was specified more than once, the warning
pr_warn("hugepagesz= specified twice, ignoring\n");
would be printed.

Some architectures want to enable all huge page sizes.  They would
call hugetlb_add_hstate for all supported sizes.  However, this was
done after command line processing and as a result hstates could have
already been created for some sizes.  To make sure no warnings were
printed, there would often be code like:
	if (!size_to_hstate(size))
		hugetlb_add_hstate(ilog2(size) - PAGE_SHIFT);

The only time we want to print the warning is as the result of command
line processing.  So, remove the warning from hugetlb_add_hstate and
add it to the single arch independent routine processing "hugepagesz=".
After this, calls to size_to_hstate() in arch specific code can be
removed and hugetlb_add_hstate can be called without worrying about
warning messages.

Signed-off-by: Mike Kravetz 
---
 arch/arm64/mm/hugetlbpage.c   | 16 
 arch/powerpc/mm/hugetlbpage.c |  3 +--
 arch/riscv/mm/hugetlbpage.c   |  2 +-
 arch/sparc/mm/init_64.c   | 19 ---
 arch/x86/mm/hugetlbpage.c |  2 +-
 mm/hugetlb.c  | 10 +++---
 6 files changed, 18 insertions(+), 34 deletions(-)

diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
index 4aa9534a45d7..050809e6f0a9 100644
--- a/arch/arm64/mm/hugetlbpage.c
+++ b/arch/arm64/mm/hugetlbpage.c
@@ -441,22 +441,14 @@ void huge_ptep_clear_flush(struct vm_area_struct *vma,
clear_flush(vma->vm_mm, addr, ptep, pgsize, ncontig);
 }
 
-static void __init add_huge_page_size(unsigned long size)
-{
-   if (size_to_hstate(size))
-   return;
-
-   hugetlb_add_hstate(ilog2(size) - PAGE_SHIFT);
-}
-
 static int __init hugetlbpage_init(void)
 {
 #ifdef CONFIG_ARM64_4K_PAGES
-   add_huge_page_size(PUD_SIZE);
+   hugetlb_add_hstate(ilog2(PUD_SIZE) - PAGE_SHIFT);
 #endif
-   add_huge_page_size(CONT_PMD_SIZE);
-   add_huge_page_size(PMD_SIZE);
-   add_huge_page_size(CONT_PTE_SIZE);
+   hugetlb_add_hstate(ilog2(CONT_PMD_SIZE) - PAGE_SHIFT);
+   hugetlb_add_hstate(ilog2(PMD_SIZE) - PAGE_SHIFT);
+   hugetlb_add_hstate(ilog2(CONT_PTE_SIZE) - PAGE_SHIFT);
 
return 0;
 }
diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c
index 166960ba1236..f46464ba6fb4 100644
--- a/arch/powerpc/mm/hugetlbpage.c
+++ b/arch/powerpc/mm/hugetlbpage.c
@@ -584,8 +584,7 @@ static int __init add_huge_page_size(unsigned long long 
size)
if (!arch_hugetlb_valid_size(size))
return -EINVAL;
 
-   if (!size_to_hstate(size))
-   hugetlb_add_hstate(shift - PAGE_SHIFT);
+   hugetlb_add_hstate(shift - PAGE_SHIFT);
return 0;
 }
 
diff --git a/arch/riscv/mm/hugetlbpage.c b/arch/riscv/mm/hugetlbpage.c
index bdf89d7eb714..beaa91941db8 100644
--- a/arch/riscv/mm/hugetlbpage.c
+++ b/arch/riscv/mm/hugetlbpage.c
@@ -26,7 +26,7 @@ bool __init arch_hugetlb_valid_size(unsigned long long size)
 static __init int gigantic_pages_init(void)
 {
/* With CONTIG_ALLOC, we can allocate gigantic pages at runtime */
-   if (IS_ENABLED(CONFIG_64BIT) && !size_to_hstate(1UL << PUD_SHIFT))
+   if (IS_ENABLED(CONFIG_64BIT))
hugetlb_add_hstate(PUD_SHIFT - PAGE_SHIFT);
return 0;
 }
diff --git a/arch/sparc/mm/init_64.c b/arch/sparc/mm/init_64.c
index 5c29203fd460..8f619edc8f8c 100644
--- a/arch/sparc/mm/init_64.c
+++ b/arch/sparc/mm/init_64.c
@@ -325,23 +325,12 @@ static void __update_mmu_tsb_insert(struct mm_struct *mm, 
unsigned long tsb_inde
 }
 
 #ifdef CONFIG_HUGETLB_PAGE
-static void __init add_huge_page_size(unsigned long size)
-{
-   unsigned int order;
-
-   if (size_to_hstate(size))
-   return;
-
-   order = ilog2(size) - PAGE_SHIFT;
-   hugetlb_add_hstate(order);
-}
-
 static int __init hugetlbpage_init(void)
 {
-   add_huge_page_size(1UL << HPAGE_64K_SHIFT);
-   add_huge_page_size(1UL << HPAGE_SHIFT);
-   add_huge_page_size(1UL << HPAGE_256MB_SHIFT);
-   add_huge_page_size(1UL << HPAGE_2GB_SHIFT);
+   hugetlb_add_hstate(HPAGE_64K_SHIFT - PAGE_SHIFT);
+   hugetlb_add_hstate(HPAGE_SHIFT - PAGE_SHIFT);
+   hugetlb_add_hstate(HPAGE_256MB_SHIFT - PAGE_SHIFT);
+   hugetlb_add_hstate(HPAGE_2GB_SHIFT - PAGE_SHIFT);
 
return 0;
 }
diff --git a/arch/x86/mm/hugetlbpage.c b/arch/x86/mm/hugetlbpage.c
index dd3ed09f6c23..8a3f586e1217 100644
--- a/arch/x86/mm/hugetlbpage.c
+++ b/arch/x86/mm/hugetlbpage.c
@@ -195,7 +195,7 @@ bool __init arch_hugetlb_valid_size(unsigned long long size)
 static __init int gigantic_pages_init(void)
 {
/* With compaction or CMA we can allocate gigantic pages at runtime */
-  

Re: [PATCH] mm/hugetlb: Fix build failure with HUGETLB_PAGE but not HUGEBTLBFS

2020-03-17 Thread Mike Kravetz
On 3/17/20 1:04 AM, Christophe Leroy wrote:
> When CONFIG_HUGETLB_PAGE is set but not CONFIG_HUGETLBFS, the
> following build failure is encoutered:
> 
> In file included from arch/powerpc/mm/fault.c:33:0:
> ./include/linux/hugetlb.h: In function 'hstate_inode':
> ./include/linux/hugetlb.h:477:9: error: implicit declaration of function 
> 'HUGETLBFS_SB' [-Werror=implicit-function-declaration]
>   return HUGETLBFS_SB(i->i_sb)->hstate;
>  ^
> ./include/linux/hugetlb.h:477:30: error: invalid type argument of '->' (have 
> 'int')
>   return HUGETLBFS_SB(i->i_sb)->hstate;
>   ^
> 
> Gate hstate_inode() with CONFIG_HUGETLBFS instead of CONFIG_HUGETLB_PAGE.
> 
> Reported-by: kbuild test robot 
> Link: https://patchwork.ozlabs.org/patch/1255548/#2386036
> Fixes: a137e1cc6d6e ("hugetlbfs: per mount huge page sizes")
> Cc: sta...@vger.kernel.org
> Signed-off-by: Christophe Leroy 

As hugetlb.h evolved over time, I suspect nobody imagined a configuration
with CONFIG_HUGETLB_PAGE and not CONFIG_HUGETLBFS.  This patch does address
the build issues.  So,

Reviewed-by: Mike Kravetz 

However, there are many definitions in that file not behind #ifdef
CONFIG_HUGETLBFS that make no sense unless CONFIG_HUGETLBFS is defined.
Such cleanup is way beyond the scope of this patch/effort.  I will add
it to the list of hugetlb/hugetlbfs things that can be cleaned up.
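
For readers following the thread, the gating Christophe describes amounts
to something of this shape in include/linux/hugetlb.h.  A sketch only, not
the exact hunk from his patch; the !CONFIG_HUGETLBFS stub returning NULL is
an assumption.

#ifdef CONFIG_HUGETLBFS		/* was: CONFIG_HUGETLB_PAGE */
static inline struct hstate *hstate_inode(struct inode *i)
{
	return HUGETLBFS_SB(i->i_sb)->hstate;
}
#else
static inline struct hstate *hstate_inode(struct inode *i)
{
	return NULL;
}
#endif
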
-- 
Mike Kravetz


Re: [PATCH] mm/hugetlb: Fix build failure with HUGETLB_PAGE but not HUGEBTLBFS

2020-03-17 Thread Mike Kravetz
On 3/17/20 9:47 AM, Christophe Leroy wrote:
> 
> 
> Le 17/03/2020 à 17:40, Mike Kravetz a écrit :
>> On 3/17/20 1:43 AM, Christophe Leroy wrote:
>>>
>>>
>>> Le 17/03/2020 à 09:25, Baoquan He a écrit :
>>>> On 03/17/20 at 08:04am, Christophe Leroy wrote:
>>>>> When CONFIG_HUGETLB_PAGE is set but not CONFIG_HUGETLBFS, the
>>>>> following build failure is encoutered:
>>>>
>>>>   From the definition of HUGETLB_PAGE, isn't it relying on HUGETLBFS?
>>>> I could misunderstand the def_bool, please correct me if I am wrong.
>>>
>>> AFAIU, it means that HUGETLBFS rely on HUGETLB_PAGE, by default 
>>> HUGETLB_PAGE is not selected when HUGETLBFS is not. But it is still 
>>> possible for an arch to select HUGETLB_PAGE without selecting HUGETLBFS 
>>> when it uses huge pages for other purpose than hugetlb file system.
>>>
>>
>> Hi Christophe,
>>
>> Do you actually have a use case/example of using hugetlb pages without
>> hugetlbfs?  I can understand that there are some use cases which never
>> use the filesystem interface.  However, hugetlb support is so intertwined
>> with hugetlbfs, I am thinking there would be issues trying to use them
>> separately.  I will look into this further.
>>
> 
> Hi Mike,
> 
> Series https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=164620
> 
> And especially patch 39 to 41.
> 

Ah, ok.  You are simply using a few interfaces in the hugetlb header files.
The huge pages created in your mappings are not PageHuge() pages.

-- 
Mike Kravetz


Re: [PATCH] mm/hugetlb: Fix build failure with HUGETLB_PAGE but not HUGEBTLBFS

2020-03-17 Thread Mike Kravetz
On 3/17/20 1:43 AM, Christophe Leroy wrote:
> 
> 
> Le 17/03/2020 à 09:25, Baoquan He a écrit :
>> On 03/17/20 at 08:04am, Christophe Leroy wrote:
>>> When CONFIG_HUGETLB_PAGE is set but not CONFIG_HUGETLBFS, the
>>> following build failure is encoutered:
>>
>>  From the definition of HUGETLB_PAGE, isn't it relying on HUGETLBFS?
>> I could misunderstand the def_bool, please correct me if I am wrong.
> 
> AFAIU, it means that HUGETLBFS rely on HUGETLB_PAGE, by default HUGETLB_PAGE 
> is not selected when HUGETLBFS is not. But it is still possible for an arch 
> to select HUGETLB_PAGE without selecting HUGETLBFS when it uses huge pages 
> for other purpose than hugetlb file system.
> 

Hi Christophe,

Do you actually have a use case/example of using hugetlb pages without
hugetlbfs?  I can understand that there are some use cases which never
use the filesystem interface.  However, hugetlb support is so intertwined
with hugetlbfs, I am thinking there would be issues trying to use them
separately.  I will look into this further.

-- 
Mike Kravetz


Re: [PATCH v2] mm: hwpoison: disable memory error handling on 1GB hugepage

2019-05-29 Thread Mike Kravetz
On 5/28/19 2:49 AM, Wanpeng Li wrote:
> Cc Paolo,
> Hi all,
> On Wed, 14 Feb 2018 at 06:34, Mike Kravetz  wrote:
>>
>> On 02/12/2018 06:48 PM, Michael Ellerman wrote:
>>> Andrew Morton  writes:
>>>
>>>> On Thu, 08 Feb 2018 12:30:45 + Punit Agrawal  
>>>> wrote:
>>>>
>>>>>>
>>>>>> So I don't think that the above test result means that errors are 
>>>>>> properly
>>>>>> handled, and the proposed patch should help for arm64.
>>>>>
>>>>> Although, the deviation of pud_huge() avoids a kernel crash the code
>>>>> would be easier to maintain and reason about if arm64 helpers are
>>>>> consistent with expectations by core code.
>>>>>
>>>>> I'll look to update the arm64 helpers once this patch gets merged. But
>>>>> it would be helpful if there was a clear expression of semantics for
>>>>> pud_huge() for various cases. Is there any version that can be used as
>>>>> reference?
>>>>
>>>> Is that an ack or tested-by?
>>>>
>>>> Mike keeps plaintively asking the powerpc developers to take a look,
>>>> but they remain steadfastly in hiding.
>>>
>>> Cc'ing linuxppc-dev is always a good idea :)
>>>
>>
>> Thanks Michael,
>>
>> I was mostly concerned about use cases for soft/hard offline of huge pages
>> larger than PMD_SIZE on powerpc.  I know that powerpc supports PGD_SIZE
>> huge pages, and soft/hard offline support was specifically added for this.
>> See, 94310cbcaa3c "mm/madvise: enable (soft|hard) offline of HugeTLB pages
>> at PGD level"
>>
>> This patch will disable that functionality.  So, at a minimum this is a
>> 'heads up'.  If there are actual use cases that depend on this, then more
>> work/discussions will need to happen.  From the e-mail thread on PGD_SIZE
>> support, I can not tell if there is a real use case or this is just a
>> 'nice to have'.
> 
> 1GB hugetlbfs pages are used by DPDK and VMs in cloud deployment, we
> encounter gup_pud_range() panic several times in product environment.
> Is there any plan to reenable and fix arch codes?

I too am aware of slightly more interest in 1G huge pages.  I suspect that as
Intel MMU capacity increases to handle more TLB entries, there will be more
and more interest.

Personally, I am not looking at this issue.  Perhaps Naoya will comment, as
he knows the most about this code.

> In addition, 
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/x86/kvm/mmu.c#n3213
> The memory in guest can be 1GB/2MB/4K, though the host-backed memory
> are 1GB hugetlbfs pages, after above PUD panic is fixed,
> try_to_unmap() which is called in MCA recovery path will mark the PUD
> hwpoison entry. The guest will vmexit and retry endlessly when
> accessing any memory in the guest which is backed by this 1GB poisoned
> hugetlbfs page. We have a plan to split this 1GB hugetblfs page by 2MB
> hugetlbfs pages/4KB pages, maybe file remap to a virtual address range
> which is 2MB/4KB page granularity, also split the KVM MMU 1GB SPTE
> into 2MB/4KB and mark the offensive SPTE w/ a hwpoison flag, a sigbus
> will be delivered to VM at page fault next time for the offensive
> SPTE. Is this proposal acceptable?

I am not sure of the error handling design, but this does sound reasonable.
That block of code which potentially dissolves a huge page on memory error
is hard to understand and I'm not sure if that is even the 'normal'
functionality.  Certainly, we would hate to waste/poison an entire 1G page
for an error on a small subsection.

-- 
Mike Kravetz


Re: [PATCH v8 4/4] hugetlb: allow to free gigantic pages regardless of the configuration

2019-03-28 Thread Mike Kravetz
On 3/26/19 11:36 PM, Alexandre Ghiti wrote:
> On systems without CONTIG_ALLOC activated but that support gigantic pages,
> boottime reserved gigantic pages can not be freed at all. This patch
> simply enables the possibility to hand back those pages to memory
> allocator.
> 
> Signed-off-by: Alexandre Ghiti 
> Acked-by: David S. Miller  [sparc]

Thanks for all the updates

Reviewed-by: Mike Kravetz 

-- 
Mike Kravetz


Re: [PATCH v6 4/4] hugetlb: allow to free gigantic pages regardless of the configuration

2019-03-08 Thread Mike Kravetz
On 3/7/19 5:20 AM, Alexandre Ghiti wrote:
> On systems without CONTIG_ALLOC activated but that support gigantic pages,
> boottime reserved gigantic pages can not be freed at all. This patch
> simply enables the possibility to hand back those pages to memory
> allocator.
> 
> Signed-off-by: Alexandre Ghiti 
> Acked-by: David S. Miller  [sparc]

Reviewed-by: Mike Kravetz 

-- 
Mike Kravetz


Re: [PATCH v4 4/4] hugetlb: allow to free gigantic pages regardless of the configuration

2019-03-01 Thread Mike Kravetz
On 3/1/19 5:21 AM, Alexandre Ghiti wrote:
> On 03/01/2019 07:25 AM, Alex Ghiti wrote:
>> On 2/28/19 5:26 PM, Mike Kravetz wrote:
>>> On 2/28/19 12:23 PM, Dave Hansen wrote:
>>>> On 2/28/19 11:50 AM, Mike Kravetz wrote:
>>>>> On 2/28/19 11:13 AM, Dave Hansen wrote:
>>>>>>> +	if (hstate_is_gigantic(h) && !IS_ENABLED(CONFIG_CONTIG_ALLOC)) {
>>>>>>> +		spin_lock(&hugetlb_lock);
>>>>>>> +		if (count > persistent_huge_pages(h)) {
>>>>>>> +			spin_unlock(&hugetlb_lock);
>>>>>>> +			return -EINVAL;
>>>>>>> +		}
>>>>>>> +		goto decrease_pool;
>>>>>>> +	}
>>>>>> This choice confuses me.  The "Decrease the pool size" code already
>>>>>> works and the code just falls through to it after skipping all the
>>>>>> "Increase the pool size" code.
>>>>>>
>>>>>> Why did did you need to add this case so early?  Why not just let it
>>>>>> fall through like before?
>>>>> I assume you are questioning the goto, right?  You are correct in that
>>>>> it is unnecessary and we could just fall through.
>>>> Yeah, it just looked odd to me.
> 
>> I'd rather avoid useless checks when we already know they won't
>> be met and I think that makes the code more understandable.
>>
>> But that's up to you for the next version.

I too find some value in the goto.  It tells me this !CONFIG_CONTIG_ALLOC
case is special and we are skipping the normal checks.  But, removing the
goto is not a requirement for me.

>>>>> However, I wonder if we might want to consider a wacky condition that the
>>>>> above check would prevent.  Consider a system/configuration with 5 
>>>>> gigantic
...
>>
>> If I may, I think that this is the kind of info the user wants to have and 
>> we should
>> return an error when it is not possible to allocate runtime huge pages.
>> I already noticed that if someone asks for 10 huge pages, and only 5 are 
>> allocated,
>> no error is returned to the user and I found that surprising.

Upon further thought, let's not consider this wacky permanent -> surplus ->
permanent case.  I just can't see it being an actual use case.

IIUC, that 'no error' behavior is somewhat expected.  I seem to recall previous
discussions about changing it, with the end result being to leave it as is.

>>>> @@ -2428,7 +2442,9 @@ static ssize_t __nr_hugepages_store_common(bool 
>>>> obey_mempolicy,
>>>>   } else
>>>>   nodes_allowed = _states[N_MEMORY];
>>>>   -h->max_huge_pages = set_max_huge_pages(h, count, nodes_allowed);
>>>> +err = set_max_huge_pages(h, count, nodes_allowed);
>>>> +if (err)
>>>> +goto out;
>>>> if (nodes_allowed != _states[N_MEMORY])
>>>>   NODEMASK_FREE(nodes_allowed);
>>> Do note that I believe there is a bug in the above change.  The code after
>>> the out label is:
>>>
>>> out:
>>>  NODEMASK_FREE(nodes_allowed);
>>>  return err;
>>> }
>>>
>>> With the new goto, we need the same
>>> if (nodes_allowed != &node_states[N_MEMORY]) before NODEMASK_FREE().
>>>
>>> Sorry, I missed this in previous versions.
>>
>> Oh right, I'm really sorry I missed that, thank you for noticing.

This is the only issue I have with the code in hugetlb.c.  For me, the
goto can stay or go.  End result is the same.
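
Spelled out, the error path with that fix applied would look roughly like
the following (a sketch; variable names follow the quoted hunk):

out:
	if (nodes_allowed != &node_states[N_MEMORY])
		NODEMASK_FREE(nodes_allowed);
	return err;
}
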
-- 
Mike Kravetz


Re: [PATCH] hugetlb: allow to free gigantic pages regardless of the configuration

2019-01-17 Thread Mike Kravetz
On 1/17/19 10:39 AM, Alexandre Ghiti wrote:
> From: Alexandre Ghiti 
> 
> On systems without CMA or (MEMORY_ISOLATION && COMPACTION) activated but
> that support gigantic pages, boottime reserved gigantic pages can not be
> freed at all. This patchs simply enables the possibility to hand back
> those pages to memory allocator.
> 
> This commit then renames gigantic_page_supported and
> ARCH_HAS_GIGANTIC_PAGE to make them more accurate. Indeed, those values
> being false does not mean that the system cannot use gigantic pages: it
> just means that runtime allocation of gigantic pages is not supported,
> one can still allocate boottime gigantic pages if the architecture supports
> it.
> 
> Signed-off-by: Alexandre Ghiti 

Thank you for doing this!

Reviewed-by: Mike Kravetz 

> --- a/include/linux/gfp.h
> +++ b/include/linux/gfp.h
> @@ -589,8 +589,8 @@ static inline bool pm_suspended_storage(void)
>  /* The below functions must be run on a range from a single zone. */
>  extern int alloc_contig_range(unsigned long start, unsigned long end,
> unsigned migratetype, gfp_t gfp_mask);
> -extern void free_contig_range(unsigned long pfn, unsigned nr_pages);
>  #endif
> +extern void free_contig_range(unsigned long pfn, unsigned int nr_pages);

I think nr_pages should be an unsigned long in cma_release() and here
as well, but that is beyond the scope of this patch.  Most callers of
cma_release pass in a truncated unsigned long.  The truncation is unlikely
to cause any issues, just would be nice if types were consistent.  I have
a patch to do that as part of a contiguous allocation series that I will
get back to someday.
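
In other words, something like the following prototypes, shown here only to
illustrate the type-consistency point and not as part of this patch:

bool cma_release(struct cma *cma, const struct page *pages, unsigned long count);
void free_contig_range(unsigned long pfn, unsigned long nr_pages);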

> @@ -2350,9 +2355,10 @@ static unsigned long set_max_huge_pages(struct hstate 
> *h, unsigned long count,
>   break;
>   }
>  out:
> - ret = persistent_huge_pages(h);
> + h->max_huge_pages = persistent_huge_pages(h);
>   spin_unlock(_lock);
> - return ret;
> +
> + return 0;
>  }
>  
>  #define HSTATE_ATTR_RO(_name) \
> @@ -2404,11 +2410,6 @@ static ssize_t __nr_hugepages_store_common(bool 
> obey_mempolicy,
>   int err;
>   NODEMASK_ALLOC(nodemask_t, nodes_allowed, GFP_KERNEL | __GFP_NORETRY);
>  
> - if (hstate_is_gigantic(h) && !gigantic_page_supported()) {
> - err = -EINVAL;
> - goto out;
> - }
> -
>   if (nid == NUMA_NO_NODE) {
>   /*
>* global hstate attribute
> @@ -2428,7 +2429,9 @@ static ssize_t __nr_hugepages_store_common(bool 
> obey_mempolicy,
>   } else
>   nodes_allowed = _states[N_MEMORY];
>  
> - h->max_huge_pages = set_max_huge_pages(h, count, nodes_allowed);
> + err = set_max_huge_pages(h, count, nodes_allowed);
> + if (err)
> + goto out;
>  
>   if (nodes_allowed != _states[N_MEMORY])
>   NODEMASK_FREE(nodes_allowed);

Yeah!  Those changes cause max_huge_pages to be modified while holding
hugetlb_lock as it should be.
-- 
Mike Kravetz


Re: Infinite looping observed in __offline_pages

2018-08-22 Thread Mike Kravetz
On 08/22/2018 02:30 AM, Aneesh Kumar K.V wrote:
> commit 2e9d754ac211f2af3731f15df3cd8cd070b4cc54
> Author: Aneesh Kumar K.V 
> Date:   Tue Aug 21 14:17:55 2018 +0530
> 
> mm/hugetlb: filter out hugetlb pages if HUGEPAGE migration is not 
> supported.
> 
> When scanning for movable pages, filter out Hugetlb pages if hugepage 
> migration
> is not supported. Without this we hit infinte loop in __offline pages 
> where we
> do
> pfn = scan_movable_pages(start_pfn, end_pfn);
> if (pfn) { /* We have movable pages */
> ret = do_migrate_range(pfn, end_pfn);
> goto repeat;
> }
> 
> We do support hugetlb migration ony if the hugetlb pages are at pmd 
> level. Here

I thought migration at pgd level was added for POWER?  commit 94310cbcaa3c
(mm/madvise: enable (soft|hard) offline of HugeTLB pages at PGD level).
I only remember it because I did not fully understand the use case. :)

> we just check for Kernel config. The gigantic page size check is done in
> page_huge_active.
> 
> Reported-by: Haren Myneni 
> CC: Naoya Horiguchi 
> Signed-off-by: Aneesh Kumar K.V 
> 
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 4eb6e824a80c..f9bdea685cf4 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -1338,7 +1338,8 @@ static unsigned long scan_movable_pages(unsigned long 
> start, unsigned long end)
>   return pfn;
>   if (__PageMovable(page))
>   return pfn;
> - if (PageHuge(page)) {
> + if (IS_ENABLED(CONFIG_ARCH_ENABLE_HUGEPAGE_MIGRATION) &&
> + PageHuge(page)) {

How about using hugepage_migration_supported instead?  It would automatically
catch those non-migratable huge page sizes.  Something like:

	if (PageHuge(page) &&
	    hugepage_migration_supported(page_hstate(page))) {

-- 
Mike Kravetz

>   if (page_huge_active(page))
>   return pfn;
>   else
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 15ea511fb41c..a3f81e18c882 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -7649,6 +7649,10 @@ bool has_unmovable_pages(struct zone *zone, struct 
> page *page, int count,
>* handle each tail page individually in migration.
>*/
>   if (PageHuge(page)) {
> +
> + if (!IS_ENABLED(CONFIG_ARCH_ENABLE_HUGEPAGE_MIGRATION))
> + goto unmovable;
> +
>   iter = round_up(iter + 1, 1<<compound_order(page)) - 1;
>   continue;
>   }
> 


Re: [PATCH v4 02/11] hugetlb: Introduce generic version of hugetlb_free_pgd_range

2018-07-26 Thread Mike Kravetz
On 07/05/2018 04:07 AM, Alexandre Ghiti wrote:
> arm, arm64, mips, parisc, sh, x86 architectures use the
> same version of hugetlb_free_pgd_range, so move this generic
> implementation into asm-generic/hugetlb.h.
> 

Just one small issue below.  Not absolutely necessary to fix.
Reviewed-by: Mike Kravetz 

> Signed-off-by: Alexandre Ghiti 
> ---
>  arch/arm/include/asm/hugetlb.h | 12 ++--
>  arch/arm64/include/asm/hugetlb.h   | 10 --
>  arch/ia64/include/asm/hugetlb.h|  5 +++--
>  arch/mips/include/asm/hugetlb.h| 13 ++---
>  arch/parisc/include/asm/hugetlb.h  | 12 ++--
>  arch/powerpc/include/asm/hugetlb.h |  4 +++-
>  arch/sh/include/asm/hugetlb.h  | 12 ++--
>  arch/sparc/include/asm/hugetlb.h   |  4 +++-
>  arch/x86/include/asm/hugetlb.h | 11 ++-
>  include/asm-generic/hugetlb.h  | 11 +++
>  10 files changed, 30 insertions(+), 64 deletions(-)
> 
> diff --git a/arch/arm/include/asm/hugetlb.h b/arch/arm/include/asm/hugetlb.h
> index 7d26f6c4f0f5..047b893ef95d 100644
> --- a/arch/arm/include/asm/hugetlb.h
> +++ b/arch/arm/include/asm/hugetlb.h
> @@ -23,19 +23,9 @@
>  #define _ASM_ARM_HUGETLB_H
>  
>  #include 
> -#include 
>  
>  #include 
>  
> -static inline void hugetlb_free_pgd_range(struct mmu_gather *tlb,
> -   unsigned long addr, unsigned long end,
> -   unsigned long floor,
> -   unsigned long ceiling)
> -{
> - free_pgd_range(tlb, addr, end, floor, ceiling);
> -}
> -
> -
>  static inline int is_hugepage_only_range(struct mm_struct *mm,
>unsigned long addr, unsigned long len)
>  {
> @@ -68,4 +58,6 @@ static inline void arch_clear_hugepage_flags(struct page 
> *page)
>   clear_bit(PG_dcache_clean, >flags);
>  }
>  
> +#include 
> +

I don't think moving the #include is necessary in this case where you are
not adding a __HAVE_ARCH_HUGE* definition.  I like having all the #include
statements at the top if possible.
-- 
Mike Kravetz
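
For reference, the asm-generic/hugetlb.h side of this patch is cut off in
the quote below; it is expected to be just the wrapper the arch headers
drop, guarded as in the ia64 hunk (a sketch of the pattern, not the exact
hunk):

#ifndef __HAVE_ARCH_HUGETLB_FREE_PGD_RANGE
static inline void hugetlb_free_pgd_range(struct mmu_gather *tlb,
		unsigned long addr, unsigned long end,
		unsigned long floor, unsigned long ceiling)
{
	free_pgd_range(tlb, addr, end, floor, ceiling);
}
#endif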

>  #endif /* _ASM_ARM_HUGETLB_H */
> diff --git a/arch/arm64/include/asm/hugetlb.h 
> b/arch/arm64/include/asm/hugetlb.h
> index 3fcf14663dfa..4af1a800a900 100644
> --- a/arch/arm64/include/asm/hugetlb.h
> +++ b/arch/arm64/include/asm/hugetlb.h
> @@ -25,16 +25,6 @@ static inline pte_t huge_ptep_get(pte_t *ptep)
>   return READ_ONCE(*ptep);
>  }
>  
> -
> -
> -static inline void hugetlb_free_pgd_range(struct mmu_gather *tlb,
> -   unsigned long addr, unsigned long end,
> -   unsigned long floor,
> -   unsigned long ceiling)
> -{
> - free_pgd_range(tlb, addr, end, floor, ceiling);
> -}
> -
>  static inline int is_hugepage_only_range(struct mm_struct *mm,
>unsigned long addr, unsigned long len)
>  {
> diff --git a/arch/ia64/include/asm/hugetlb.h b/arch/ia64/include/asm/hugetlb.h
> index 74d2a5540aaf..afe9fa4d969b 100644
> --- a/arch/ia64/include/asm/hugetlb.h
> +++ b/arch/ia64/include/asm/hugetlb.h
> @@ -3,9 +3,8 @@
>  #define _ASM_IA64_HUGETLB_H
>  
>  #include 
> -#include 
> -
>  
> +#define __HAVE_ARCH_HUGETLB_FREE_PGD_RANGE
>  void hugetlb_free_pgd_range(struct mmu_gather *tlb, unsigned long addr,
>   unsigned long end, unsigned long floor,
>   unsigned long ceiling);
> @@ -70,4 +69,6 @@ static inline void arch_clear_hugepage_flags(struct page 
> *page)
>  {
>  }
>  
> +#include 
> +
>  #endif /* _ASM_IA64_HUGETLB_H */
> diff --git a/arch/mips/include/asm/hugetlb.h b/arch/mips/include/asm/hugetlb.h
> index 982bc0685330..53764050243e 100644
> --- a/arch/mips/include/asm/hugetlb.h
> +++ b/arch/mips/include/asm/hugetlb.h
> @@ -10,8 +10,6 @@
>  #define __ASM_HUGETLB_H
>  
>  #include 
> -#include 
> -
>  
>  static inline int is_hugepage_only_range(struct mm_struct *mm,
>unsigned long addr,
> @@ -38,15 +36,6 @@ static inline int prepare_hugepage_range(struct file *file,
>   return 0;
>  }
>  
> -static inline void hugetlb_free_pgd_range(struct mmu_gather *tlb,
> -   unsigned long addr,
> -   unsigned long end,
> -   unsigned long floor,
> -   unsigned long ceiling)
> -{
> - free_pgd_range(tlb, addr, end, floor, ceiling);
> -}
> -
>  static inline void set_huge_pte_at(struct mm_struct *

Re: [PATCH v4 00/11] hugetlb: Factorize hugetlb architecture primitives

2018-07-26 Thread Mike Kravetz
On 07/26/2018 04:46 AM, Michael Ellerman wrote:
> Mike Kravetz  writes:
> 
>> On 07/20/2018 11:37 AM, Alex Ghiti wrote:
>>> Does anyone have any suggestion about those patches ?
>>
>> I only took a quick look.  From the hugetlb perspective, I like the
>> idea of moving routines to a common file.  If any of the arch owners
>> (or anyone else) agree, I can do a review of the series.
> 
> The conversions look pretty good to me. If you want to give it a review
> then from my point of view it could go in -mm to shake out any bugs.

Nothing of significance found in a review.  As others have suggested,
the (cross)compiler may be better at finding issues than human eyes.

I also suggest it be added to -mm.
-- 
Mike Kravetz


Re: [PATCH v4 11/11] hugetlb: Introduce generic version of huge_ptep_get

2018-07-26 Thread Mike Kravetz
On 07/05/2018 04:07 AM, Alexandre Ghiti wrote:
> ia64, mips, parisc, powerpc, sh, sparc, x86 architectures use the
> same version of huge_ptep_get, so move this generic implementation into
> asm-generic/hugetlb.h.
> 

Reviewed-by: Mike Kravetz 
-- 
Mike Kravetz
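
The generic version landing in asm-generic/hugetlb.h is cut off in the
quoted diff; presumably it is the trivial dereference the arch headers are
dropping (sketch):

#ifndef __HAVE_ARCH_HUGE_PTEP_GET
static inline pte_t huge_ptep_get(pte_t *ptep)
{
	return *ptep;
}
#endif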

> Signed-off-by: Alexandre Ghiti 
> ---
>  arch/arm/include/asm/hugetlb-3level.h | 1 +
>  arch/arm64/include/asm/hugetlb.h  | 1 +
>  arch/ia64/include/asm/hugetlb.h   | 5 -
>  arch/mips/include/asm/hugetlb.h   | 5 -
>  arch/parisc/include/asm/hugetlb.h | 5 -
>  arch/powerpc/include/asm/hugetlb.h| 5 -
>  arch/sh/include/asm/hugetlb.h | 5 -
>  arch/sparc/include/asm/hugetlb.h  | 5 -
>  arch/x86/include/asm/hugetlb.h| 5 -
>  include/asm-generic/hugetlb.h | 7 +++
>  10 files changed, 9 insertions(+), 35 deletions(-)
> 
> diff --git a/arch/arm/include/asm/hugetlb-3level.h 
> b/arch/arm/include/asm/hugetlb-3level.h
> index 54e4b097b1f5..0d9f3918fa7e 100644
> --- a/arch/arm/include/asm/hugetlb-3level.h
> +++ b/arch/arm/include/asm/hugetlb-3level.h
> @@ -29,6 +29,7 @@
>   * ptes.
>   * (The valid bit is automatically cleared by set_pte_at for PROT_NONE ptes).
>   */
> +#define __HAVE_ARCH_HUGE_PTEP_GET
>  static inline pte_t huge_ptep_get(pte_t *ptep)
>  {
>   pte_t retval = *ptep;
> diff --git a/arch/arm64/include/asm/hugetlb.h 
> b/arch/arm64/include/asm/hugetlb.h
> index 80887abcef7f..fb6609875455 100644
> --- a/arch/arm64/include/asm/hugetlb.h
> +++ b/arch/arm64/include/asm/hugetlb.h
> @@ -20,6 +20,7 @@
>  
>  #include 
>  
> +#define __HAVE_ARCH_HUGE_PTEP_GET
>  static inline pte_t huge_ptep_get(pte_t *ptep)
>  {
>   return READ_ONCE(*ptep);
> diff --git a/arch/ia64/include/asm/hugetlb.h b/arch/ia64/include/asm/hugetlb.h
> index e9b42750fdf5..36cc0396b214 100644
> --- a/arch/ia64/include/asm/hugetlb.h
> +++ b/arch/ia64/include/asm/hugetlb.h
> @@ -27,11 +27,6 @@ static inline void huge_ptep_clear_flush(struct 
> vm_area_struct *vma,
>  {
>  }
>  
> -static inline pte_t huge_ptep_get(pte_t *ptep)
> -{
> - return *ptep;
> -}
> -
>  static inline void arch_clear_hugepage_flags(struct page *page)
>  {
>  }
> diff --git a/arch/mips/include/asm/hugetlb.h b/arch/mips/include/asm/hugetlb.h
> index 120adc3b2ffd..425bb6fc3bda 100644
> --- a/arch/mips/include/asm/hugetlb.h
> +++ b/arch/mips/include/asm/hugetlb.h
> @@ -82,11 +82,6 @@ static inline int huge_ptep_set_access_flags(struct 
> vm_area_struct *vma,
>   return changed;
>  }
>  
> -static inline pte_t huge_ptep_get(pte_t *ptep)
> -{
> - return *ptep;
> -}
> -
>  static inline void arch_clear_hugepage_flags(struct page *page)
>  {
>  }
> diff --git a/arch/parisc/include/asm/hugetlb.h 
> b/arch/parisc/include/asm/hugetlb.h
> index 165b4e5a6f32..7cb595dcb7d7 100644
> --- a/arch/parisc/include/asm/hugetlb.h
> +++ b/arch/parisc/include/asm/hugetlb.h
> @@ -48,11 +48,6 @@ int huge_ptep_set_access_flags(struct vm_area_struct *vma,
>unsigned long addr, pte_t *ptep,
>pte_t pte, int dirty);
>  
> -static inline pte_t huge_ptep_get(pte_t *ptep)
> -{
> - return *ptep;
> -}
> -
>  static inline void arch_clear_hugepage_flags(struct page *page)
>  {
>  }
> diff --git a/arch/powerpc/include/asm/hugetlb.h 
> b/arch/powerpc/include/asm/hugetlb.h
> index 658bf7136a3c..33a2d9e3ea9e 100644
> --- a/arch/powerpc/include/asm/hugetlb.h
> +++ b/arch/powerpc/include/asm/hugetlb.h
> @@ -142,11 +142,6 @@ extern int huge_ptep_set_access_flags(struct 
> vm_area_struct *vma,
> unsigned long addr, pte_t *ptep,
> pte_t pte, int dirty);
>  
> -static inline pte_t huge_ptep_get(pte_t *ptep)
> -{
> - return *ptep;
> -}
> -
>  static inline void arch_clear_hugepage_flags(struct page *page)
>  {
>  }
> diff --git a/arch/sh/include/asm/hugetlb.h b/arch/sh/include/asm/hugetlb.h
> index c87195ae0cfa..6f025fe18146 100644
> --- a/arch/sh/include/asm/hugetlb.h
> +++ b/arch/sh/include/asm/hugetlb.h
> @@ -32,11 +32,6 @@ static inline void huge_ptep_clear_flush(struct 
> vm_area_struct *vma,
>  {
>  }
>  
> -static inline pte_t huge_ptep_get(pte_t *ptep)
> -{
> - return *ptep;
> -}
> -
>  static inline void arch_clear_hugepage_flags(struct page *page)
>  {
>   clear_bit(PG_dcache_clean, >flags);
> diff --git a/arch/sparc/include/asm/hugetlb.h 
> b/arch/sparc/include/asm/hugetlb.h
> index 028a1465fbe7..3963f80d1cb3 100644
> --- a/arch/sparc/include/asm/hugetlb.h
> +++ b/arch

Re: [PATCH v4 10/11] hugetlb: Introduce generic version of huge_ptep_set_access_flags

2018-07-26 Thread Mike Kravetz
On 07/05/2018 04:07 AM, Alexandre Ghiti wrote:
> arm, ia64, sh, x86 architectures use the same version
> of huge_ptep_set_access_flags, so move this generic implementation
> into asm-generic/hugetlb.h.
> 

Reviewed-by: Mike Kravetz 
-- 
Mike Kravetz
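
Again, the asm-generic/hugetlb.h addition is not visible in the truncated
quote; it should just be the common wrapper, roughly:

#ifndef __HAVE_ARCH_HUGE_PTEP_SET_ACCESS_FLAGS
static inline int huge_ptep_set_access_flags(struct vm_area_struct *vma,
		unsigned long addr, pte_t *ptep,
		pte_t pte, int dirty)
{
	/* defer to the normal pte helper */
	return ptep_set_access_flags(vma, addr, ptep, pte, dirty);
}
#endif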

> Signed-off-by: Alexandre Ghiti 
> ---
>  arch/arm/include/asm/hugetlb-3level.h | 7 ---
>  arch/arm64/include/asm/hugetlb.h  | 1 +
>  arch/ia64/include/asm/hugetlb.h   | 7 ---
>  arch/mips/include/asm/hugetlb.h   | 1 +
>  arch/parisc/include/asm/hugetlb.h | 1 +
>  arch/powerpc/include/asm/hugetlb.h| 1 +
>  arch/sh/include/asm/hugetlb.h | 7 ---
>  arch/sparc/include/asm/hugetlb.h  | 1 +
>  arch/x86/include/asm/hugetlb.h| 7 ---
>  include/asm-generic/hugetlb.h | 9 +
>  10 files changed, 14 insertions(+), 28 deletions(-)
> 
> diff --git a/arch/arm/include/asm/hugetlb-3level.h 
> b/arch/arm/include/asm/hugetlb-3level.h
> index 8247cd6a2ac6..54e4b097b1f5 100644
> --- a/arch/arm/include/asm/hugetlb-3level.h
> +++ b/arch/arm/include/asm/hugetlb-3level.h
> @@ -37,11 +37,4 @@ static inline pte_t huge_ptep_get(pte_t *ptep)
>   return retval;
>  }
>  
> -static inline int huge_ptep_set_access_flags(struct vm_area_struct *vma,
> -  unsigned long addr, pte_t *ptep,
> -  pte_t pte, int dirty)
> -{
> - return ptep_set_access_flags(vma, addr, ptep, pte, dirty);
> -}
> -
>  #endif /* _ASM_ARM_HUGETLB_3LEVEL_H */
> diff --git a/arch/arm64/include/asm/hugetlb.h 
> b/arch/arm64/include/asm/hugetlb.h
> index f4f69ae5466e..80887abcef7f 100644
> --- a/arch/arm64/include/asm/hugetlb.h
> +++ b/arch/arm64/include/asm/hugetlb.h
> @@ -42,6 +42,7 @@ extern pte_t arch_make_huge_pte(pte_t entry, struct 
> vm_area_struct *vma,
>  #define __HAVE_ARCH_HUGE_SET_HUGE_PTE_AT
>  extern void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
>   pte_t *ptep, pte_t pte);
> +#define __HAVE_ARCH_HUGE_PTEP_SET_ACCESS_FLAGS
>  extern int huge_ptep_set_access_flags(struct vm_area_struct *vma,
> unsigned long addr, pte_t *ptep,
> pte_t pte, int dirty);
> diff --git a/arch/ia64/include/asm/hugetlb.h b/arch/ia64/include/asm/hugetlb.h
> index 49d1f7949f3a..e9b42750fdf5 100644
> --- a/arch/ia64/include/asm/hugetlb.h
> +++ b/arch/ia64/include/asm/hugetlb.h
> @@ -27,13 +27,6 @@ static inline void huge_ptep_clear_flush(struct 
> vm_area_struct *vma,
>  {
>  }
>  
> -static inline int huge_ptep_set_access_flags(struct vm_area_struct *vma,
> -  unsigned long addr, pte_t *ptep,
> -  pte_t pte, int dirty)
> -{
> - return ptep_set_access_flags(vma, addr, ptep, pte, dirty);
> -}
> -
>  static inline pte_t huge_ptep_get(pte_t *ptep)
>  {
>   return *ptep;
> diff --git a/arch/mips/include/asm/hugetlb.h b/arch/mips/include/asm/hugetlb.h
> index 3dcf5debf8c4..120adc3b2ffd 100644
> --- a/arch/mips/include/asm/hugetlb.h
> +++ b/arch/mips/include/asm/hugetlb.h
> @@ -63,6 +63,7 @@ static inline int huge_pte_none(pte_t pte)
>   return !val || (val == (unsigned long)invalid_pte_table);
>  }
>  
> +#define __HAVE_ARCH_HUGE_PTEP_SET_ACCESS_FLAGS
>  static inline int huge_ptep_set_access_flags(struct vm_area_struct *vma,
>unsigned long addr,
>pte_t *ptep, pte_t pte,
> diff --git a/arch/parisc/include/asm/hugetlb.h 
> b/arch/parisc/include/asm/hugetlb.h
> index 9c3950ca2974..165b4e5a6f32 100644
> --- a/arch/parisc/include/asm/hugetlb.h
> +++ b/arch/parisc/include/asm/hugetlb.h
> @@ -43,6 +43,7 @@ static inline void huge_ptep_clear_flush(struct 
> vm_area_struct *vma,
>  void huge_ptep_set_wrprotect(struct mm_struct *mm,
>  unsigned long addr, pte_t *ptep);
>  
> +#define __HAVE_ARCH_HUGE_PTEP_SET_ACCESS_FLAGS
>  int huge_ptep_set_access_flags(struct vm_area_struct *vma,
>unsigned long addr, pte_t *ptep,
>pte_t pte, int dirty);
> diff --git a/arch/powerpc/include/asm/hugetlb.h 
> b/arch/powerpc/include/asm/hugetlb.h
> index 69c14ecac133..658bf7136a3c 100644
> --- a/arch/powerpc/include/asm/hugetlb.h
> +++ b/arch/powerpc/include/asm/hugetlb.h
> @@ -137,6 +137,7 @@ static inline void huge_ptep_clear_flush(struct 
> vm_area_struct *vma,
>   flush_hugetlb_page(vma, addr);
>  }
>  
> +#define __HAVE_ARCH_HUGE_PTEP_SET_ACCESS_FLAGS
>  extern int huge_ptep_set_a

Re: [PATCH v4 09/11] hugetlb: Introduce generic version of huge_ptep_set_wrprotect

2018-07-26 Thread Mike Kravetz
On 07/05/2018 04:07 AM, Alexandre Ghiti wrote:
> arm, ia64, mips, sh, x86 architectures use the same version
> of huge_ptep_set_wrprotect, so move this generic implementation into
> asm-generic/hugetlb.h.
> Note: powerpc uses twice for book3s/32 and nohash/32 the same version as
> the above architectures, but the modification was not straightforward
> and hence has not been done.
> 

Just one small comment, otherwise
Reviewed-by: Mike Kravetz 

> Signed-off-by: Alexandre Ghiti 
> ---
>  arch/arm/include/asm/hugetlb-3level.h| 6 --
>  arch/arm64/include/asm/hugetlb.h | 1 +
>  arch/ia64/include/asm/hugetlb.h  | 6 --
>  arch/mips/include/asm/hugetlb.h  | 6 --
>  arch/parisc/include/asm/hugetlb.h| 1 +
>  arch/powerpc/include/asm/book3s/32/pgtable.h | 2 ++
>  arch/powerpc/include/asm/book3s/64/pgtable.h | 1 +
>  arch/powerpc/include/asm/nohash/32/pgtable.h | 2 ++
>  arch/powerpc/include/asm/nohash/64/pgtable.h | 1 +

As in patch 03, the book3s and nohash header files do not explicitly
include .  With these, I had an even harder time
finding out who brought in that file.  This is not an issue with this
patch, just wish there was some easier way to check/prove include file
dependencies.  Since it compiles, I am sure it is OK.
-- 
Mike Kravetz
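
The generic fallback being added here is, in sketch form (guard name as in
the arm64 hunk below):

#ifndef __HAVE_ARCH_HUGE_PTEP_SET_WRPROTECT
static inline void huge_ptep_set_wrprotect(struct mm_struct *mm,
		unsigned long addr, pte_t *ptep)
{
	ptep_set_wrprotect(mm, addr, ptep);
}
#endif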

>  arch/sh/include/asm/hugetlb.h| 6 --
>  arch/sparc/include/asm/hugetlb.h | 1 +
>  arch/x86/include/asm/hugetlb.h   | 6 --
>  include/asm-generic/hugetlb.h| 8 
>  13 files changed, 17 insertions(+), 30 deletions(-)
> 
> diff --git a/arch/arm/include/asm/hugetlb-3level.h 
> b/arch/arm/include/asm/hugetlb-3level.h
> index b897541520ef..8247cd6a2ac6 100644
> --- a/arch/arm/include/asm/hugetlb-3level.h
> +++ b/arch/arm/include/asm/hugetlb-3level.h
> @@ -37,12 +37,6 @@ static inline pte_t huge_ptep_get(pte_t *ptep)
>   return retval;
>  }
>  
> -static inline void huge_ptep_set_wrprotect(struct mm_struct *mm,
> -unsigned long addr, pte_t *ptep)
> -{
> - ptep_set_wrprotect(mm, addr, ptep);
> -}
> -
>  static inline int huge_ptep_set_access_flags(struct vm_area_struct *vma,
>unsigned long addr, pte_t *ptep,
>pte_t pte, int dirty)
> diff --git a/arch/arm64/include/asm/hugetlb.h 
> b/arch/arm64/include/asm/hugetlb.h
> index 3e7f6e69b28d..f4f69ae5466e 100644
> --- a/arch/arm64/include/asm/hugetlb.h
> +++ b/arch/arm64/include/asm/hugetlb.h
> @@ -48,6 +48,7 @@ extern int huge_ptep_set_access_flags(struct vm_area_struct 
> *vma,
>  #define __HAVE_ARCH_HUGE_PTEP_GET_AND_CLEAR
>  extern pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
>unsigned long addr, pte_t *ptep);
> +#define __HAVE_ARCH_HUGE_PTEP_SET_WRPROTECT
>  extern void huge_ptep_set_wrprotect(struct mm_struct *mm,
>   unsigned long addr, pte_t *ptep);
>  #define __HAVE_ARCH_HUGE_PTEP_CLEAR_FLUSH
> diff --git a/arch/ia64/include/asm/hugetlb.h b/arch/ia64/include/asm/hugetlb.h
> index cbe296271030..49d1f7949f3a 100644
> --- a/arch/ia64/include/asm/hugetlb.h
> +++ b/arch/ia64/include/asm/hugetlb.h
> @@ -27,12 +27,6 @@ static inline void huge_ptep_clear_flush(struct 
> vm_area_struct *vma,
>  {
>  }
>  
> -static inline void huge_ptep_set_wrprotect(struct mm_struct *mm,
> -unsigned long addr, pte_t *ptep)
> -{
> - ptep_set_wrprotect(mm, addr, ptep);
> -}
> -
>  static inline int huge_ptep_set_access_flags(struct vm_area_struct *vma,
>unsigned long addr, pte_t *ptep,
>pte_t pte, int dirty)
> diff --git a/arch/mips/include/asm/hugetlb.h b/arch/mips/include/asm/hugetlb.h
> index 6ff2531cfb1d..3dcf5debf8c4 100644
> --- a/arch/mips/include/asm/hugetlb.h
> +++ b/arch/mips/include/asm/hugetlb.h
> @@ -63,12 +63,6 @@ static inline int huge_pte_none(pte_t pte)
>   return !val || (val == (unsigned long)invalid_pte_table);
>  }
>  
> -static inline void huge_ptep_set_wrprotect(struct mm_struct *mm,
> -unsigned long addr, pte_t *ptep)
> -{
> - ptep_set_wrprotect(mm, addr, ptep);
> -}
> -
>  static inline int huge_ptep_set_access_flags(struct vm_area_struct *vma,
>unsigned long addr,
>pte_t *ptep, pte_t pte,
> diff --git a/arch/parisc/include/asm/hugetlb.h 
> b/arch/parisc/include/asm/hugetlb.h
> index fb7e0fd858a3..9c3950ca2974 100644
> --

Re: [PATCH v4 06/11] hugetlb: Introduce generic version of huge_pte_none

2018-07-26 Thread Mike Kravetz
On 07/05/2018 04:07 AM, Alexandre Ghiti wrote:
> arm, arm64, ia64, parisc, powerpc, sh, sparc, x86 architectures
> use the same version of huge_pte_none, so move this generic
> implementation into asm-generic/hugetlb.h.
> 

Reviewed-by: Mike Kravetz 
-- 
Mike Kravetz
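
For completeness, the asm-generic/hugetlb.h counterpart (truncated out of
the quote below) is expected to be:

#ifndef __HAVE_ARCH_HUGE_PTE_NONE
static inline int huge_pte_none(pte_t pte)
{
	return pte_none(pte);
}
#endif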

> Signed-off-by: Alexandre Ghiti 
> ---
>  arch/arm/include/asm/hugetlb.h | 5 -
>  arch/arm64/include/asm/hugetlb.h   | 5 -
>  arch/ia64/include/asm/hugetlb.h| 5 -
>  arch/mips/include/asm/hugetlb.h| 1 +
>  arch/parisc/include/asm/hugetlb.h  | 5 -
>  arch/powerpc/include/asm/hugetlb.h | 5 -
>  arch/sh/include/asm/hugetlb.h  | 5 -
>  arch/sparc/include/asm/hugetlb.h   | 5 -
>  arch/x86/include/asm/hugetlb.h | 5 -
>  include/asm-generic/hugetlb.h  | 7 +++
>  10 files changed, 8 insertions(+), 40 deletions(-)
> 
> diff --git a/arch/arm/include/asm/hugetlb.h b/arch/arm/include/asm/hugetlb.h
> index 047b893ef95d..3d2ce4dbc145 100644
> --- a/arch/arm/include/asm/hugetlb.h
> +++ b/arch/arm/include/asm/hugetlb.h
> @@ -43,11 +43,6 @@ static inline int prepare_hugepage_range(struct file *file,
>   return 0;
>  }
>  
> -static inline int huge_pte_none(pte_t pte)
> -{
> - return pte_none(pte);
> -}
> -
>  static inline pte_t huge_pte_wrprotect(pte_t pte)
>  {
>   return pte_wrprotect(pte);
> diff --git a/arch/arm64/include/asm/hugetlb.h 
> b/arch/arm64/include/asm/hugetlb.h
> index 4c8dd488554d..49247c6f94db 100644
> --- a/arch/arm64/include/asm/hugetlb.h
> +++ b/arch/arm64/include/asm/hugetlb.h
> @@ -42,11 +42,6 @@ static inline int prepare_hugepage_range(struct file *file,
>   return 0;
>  }
>  
> -static inline int huge_pte_none(pte_t pte)
> -{
> - return pte_none(pte);
> -}
> -
>  static inline pte_t huge_pte_wrprotect(pte_t pte)
>  {
>   return pte_wrprotect(pte);
> diff --git a/arch/ia64/include/asm/hugetlb.h b/arch/ia64/include/asm/hugetlb.h
> index 41b5f6adeee4..bf573500b3c4 100644
> --- a/arch/ia64/include/asm/hugetlb.h
> +++ b/arch/ia64/include/asm/hugetlb.h
> @@ -26,11 +26,6 @@ static inline void huge_ptep_clear_flush(struct 
> vm_area_struct *vma,
>  {
>  }
>  
> -static inline int huge_pte_none(pte_t pte)
> -{
> - return pte_none(pte);
> -}
> -
>  static inline pte_t huge_pte_wrprotect(pte_t pte)
>  {
>   return pte_wrprotect(pte);
> diff --git a/arch/mips/include/asm/hugetlb.h b/arch/mips/include/asm/hugetlb.h
> index 7df1f116a3cc..1c9c4531376c 100644
> --- a/arch/mips/include/asm/hugetlb.h
> +++ b/arch/mips/include/asm/hugetlb.h
> @@ -55,6 +55,7 @@ static inline void huge_ptep_clear_flush(struct 
> vm_area_struct *vma,
>   flush_tlb_page(vma, addr & huge_page_mask(hstate_vma(vma)));
>  }
>  
> +#define __HAVE_ARCH_HUGE_PTE_NONE
>  static inline int huge_pte_none(pte_t pte)
>  {
>   unsigned long val = pte_val(pte) & ~_PAGE_GLOBAL;
> diff --git a/arch/parisc/include/asm/hugetlb.h 
> b/arch/parisc/include/asm/hugetlb.h
> index 9afff26747a1..c09d8c74553c 100644
> --- a/arch/parisc/include/asm/hugetlb.h
> +++ b/arch/parisc/include/asm/hugetlb.h
> @@ -38,11 +38,6 @@ static inline void huge_ptep_clear_flush(struct 
> vm_area_struct *vma,
>  {
>  }
>  
> -static inline int huge_pte_none(pte_t pte)
> -{
> - return pte_none(pte);
> -}
> -
>  static inline pte_t huge_pte_wrprotect(pte_t pte)
>  {
>   return pte_wrprotect(pte);
> diff --git a/arch/powerpc/include/asm/hugetlb.h 
> b/arch/powerpc/include/asm/hugetlb.h
> index 0b02856aa85b..3562d46585ba 100644
> --- a/arch/powerpc/include/asm/hugetlb.h
> +++ b/arch/powerpc/include/asm/hugetlb.h
> @@ -152,11 +152,6 @@ static inline void huge_ptep_clear_flush(struct 
> vm_area_struct *vma,
>   flush_hugetlb_page(vma, addr);
>  }
>  
> -static inline int huge_pte_none(pte_t pte)
> -{
> - return pte_none(pte);
> -}
> -
>  static inline pte_t huge_pte_wrprotect(pte_t pte)
>  {
>   return pte_wrprotect(pte);
> diff --git a/arch/sh/include/asm/hugetlb.h b/arch/sh/include/asm/hugetlb.h
> index 9abf9c86b769..a9f8266f33cf 100644
> --- a/arch/sh/include/asm/hugetlb.h
> +++ b/arch/sh/include/asm/hugetlb.h
> @@ -31,11 +31,6 @@ static inline void huge_ptep_clear_flush(struct 
> vm_area_struct *vma,
>  {
>  }
>  
> -static inline int huge_pte_none(pte_t pte)
> -{
> - return pte_none(pte);
> -}
> -
>  static inline pte_t huge_pte_wrprotect(pte_t pte)
>  {
>   return pte_wrprotect(pte);
> diff --git a/arch/sparc/include/asm/hugetlb.h 
> b/arch/sparc/include/asm/hugetlb.h
> index 651a9593fcee..5bbd712e 100644
> --- a/arch/sparc/include/asm/hugetlb.h
> +++ b/arch/sparc/

Re: [PATCH v4 08/11] hugetlb: Introduce generic version of prepare_hugepage_range

2018-07-26 Thread Mike Kravetz
On 07/05/2018 04:07 AM, Alexandre Ghiti wrote:
> arm, arm64, powerpc, sparc, x86 architectures use the same version of
> prepare_hugepage_range, so move this generic implementation into
> asm-generic/hugetlb.h.
> 

Reviewed-by: Mike Kravetz 
-- 
Mike Kravetz
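
The common implementation being consolidated is the usual alignment check;
roughly (a sketch mirroring the code removed from arm/arm64 below):

#ifndef __HAVE_ARCH_PREPARE_HUGEPAGE_RANGE
static inline int prepare_hugepage_range(struct file *file,
		unsigned long addr, unsigned long len)
{
	struct hstate *h = hstate_file(file);

	/* Both the length and the address must be huge page aligned. */
	if (len & ~huge_page_mask(h))
		return -EINVAL;
	if (addr & ~huge_page_mask(h))
		return -EINVAL;
	return 0;
}
#endif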

> Signed-off-by: Alexandre Ghiti 
> ---
>  arch/arm/include/asm/hugetlb.h | 11 ---
>  arch/arm64/include/asm/hugetlb.h   | 11 ---
>  arch/ia64/include/asm/hugetlb.h|  1 +
>  arch/mips/include/asm/hugetlb.h|  1 +
>  arch/parisc/include/asm/hugetlb.h  |  1 +
>  arch/powerpc/include/asm/hugetlb.h | 15 ---
>  arch/sh/include/asm/hugetlb.h  |  1 +
>  arch/sparc/include/asm/hugetlb.h   | 16 
>  arch/x86/include/asm/hugetlb.h | 15 ---
>  include/asm-generic/hugetlb.h  | 15 +++
>  10 files changed, 19 insertions(+), 68 deletions(-)
> 
> diff --git a/arch/arm/include/asm/hugetlb.h b/arch/arm/include/asm/hugetlb.h
> index 1e718a626ef9..34fb401efe81 100644
> --- a/arch/arm/include/asm/hugetlb.h
> +++ b/arch/arm/include/asm/hugetlb.h
> @@ -32,17 +32,6 @@ static inline int is_hugepage_only_range(struct mm_struct 
> *mm,
>   return 0;
>  }
>  
> -static inline int prepare_hugepage_range(struct file *file,
> -  unsigned long addr, unsigned long len)
> -{
> - struct hstate *h = hstate_file(file);
> - if (len & ~huge_page_mask(h))
> - return -EINVAL;
> - if (addr & ~huge_page_mask(h))
> - return -EINVAL;
> - return 0;
> -}
> -
>  static inline void arch_clear_hugepage_flags(struct page *page)
>  {
>   clear_bit(PG_dcache_clean, >flags);
> diff --git a/arch/arm64/include/asm/hugetlb.h 
> b/arch/arm64/include/asm/hugetlb.h
> index 1fd64ebf0cd7..3e7f6e69b28d 100644
> --- a/arch/arm64/include/asm/hugetlb.h
> +++ b/arch/arm64/include/asm/hugetlb.h
> @@ -31,17 +31,6 @@ static inline int is_hugepage_only_range(struct mm_struct 
> *mm,
>   return 0;
>  }
>  
> -static inline int prepare_hugepage_range(struct file *file,
> -  unsigned long addr, unsigned long len)
> -{
> - struct hstate *h = hstate_file(file);
> - if (len & ~huge_page_mask(h))
> - return -EINVAL;
> - if (addr & ~huge_page_mask(h))
> - return -EINVAL;
> - return 0;
> -}
> -
>  static inline void arch_clear_hugepage_flags(struct page *page)
>  {
>   clear_bit(PG_dcache_clean, >flags);
> diff --git a/arch/ia64/include/asm/hugetlb.h b/arch/ia64/include/asm/hugetlb.h
> index 82fe3d7a38d9..cbe296271030 100644
> --- a/arch/ia64/include/asm/hugetlb.h
> +++ b/arch/ia64/include/asm/hugetlb.h
> @@ -9,6 +9,7 @@ void hugetlb_free_pgd_range(struct mmu_gather *tlb, unsigned 
> long addr,
>   unsigned long end, unsigned long floor,
>   unsigned long ceiling);
>  
> +#define __HAVE_ARCH_PREPARE_HUGEPAGE_RANGE
>  int prepare_hugepage_range(struct file *file,
>   unsigned long addr, unsigned long len);
>  
> diff --git a/arch/mips/include/asm/hugetlb.h b/arch/mips/include/asm/hugetlb.h
> index b3d6bb53ee6e..6ff2531cfb1d 100644
> --- a/arch/mips/include/asm/hugetlb.h
> +++ b/arch/mips/include/asm/hugetlb.h
> @@ -18,6 +18,7 @@ static inline int is_hugepage_only_range(struct mm_struct 
> *mm,
>   return 0;
>  }
>  
> +#define __HAVE_ARCH_PREPARE_HUGEPAGE_RANGE
>  static inline int prepare_hugepage_range(struct file *file,
>unsigned long addr,
>unsigned long len)
> diff --git a/arch/parisc/include/asm/hugetlb.h 
> b/arch/parisc/include/asm/hugetlb.h
> index 5a102d7251e4..fb7e0fd858a3 100644
> --- a/arch/parisc/include/asm/hugetlb.h
> +++ b/arch/parisc/include/asm/hugetlb.h
> @@ -22,6 +22,7 @@ static inline int is_hugepage_only_range(struct mm_struct 
> *mm,
>   * If the arch doesn't supply something else, assume that hugepage
>   * size aligned regions are ok without further preparation.
>   */
> +#define __HAVE_ARCH_PREPARE_HUGEPAGE_RANGE
>  static inline int prepare_hugepage_range(struct file *file,
>   unsigned long addr, unsigned long len)
>  {
> diff --git a/arch/powerpc/include/asm/hugetlb.h 
> b/arch/powerpc/include/asm/hugetlb.h
> index 7123599089c6..69c14ecac133 100644
> --- a/arch/powerpc/include/asm/hugetlb.h
> +++ b/arch/powerpc/include/asm/hugetlb.h
> @@ -117,21 +117,6 @@ void hugetlb_free_pgd_range(struct mmu_gather *tlb, 
> unsigned long addr,
>   unsigned long end, unsigned long floor,
>   

Re: [PATCH v4 07/11] hugetlb: Introduce generic version of huge_pte_wrprotect

2018-07-26 Thread Mike Kravetz
On 07/05/2018 04:07 AM, Alexandre Ghiti wrote:
> arm, arm64, ia64, mips, parisc, powerpc, sh, sparc, x86
> architectures use the same version of huge_pte_wrprotect, so move
> this generic implementation into asm-generic/hugetlb.h.
> 

Reviewed-by: Mike Kravetz 
-- 
Mike Kravetz
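
The generic version moved into asm-generic/hugetlb.h is presumably just the
one-liner below; the guard macro name is an assumption since it is not
visible in the truncated quote:

#ifndef __HAVE_ARCH_HUGE_PTE_WRPROTECT
static inline pte_t huge_pte_wrprotect(pte_t pte)
{
	return pte_wrprotect(pte);
}
#endif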

> Signed-off-by: Alexandre Ghiti 
> ---
>  arch/arm/include/asm/hugetlb.h | 5 -
>  arch/arm64/include/asm/hugetlb.h   | 5 -
>  arch/ia64/include/asm/hugetlb.h| 5 -
>  arch/mips/include/asm/hugetlb.h| 5 -
>  arch/parisc/include/asm/hugetlb.h  | 5 -
>  arch/powerpc/include/asm/hugetlb.h | 5 -
>  arch/sh/include/asm/hugetlb.h  | 5 -
>  arch/sparc/include/asm/hugetlb.h   | 5 -
>  arch/x86/include/asm/hugetlb.h | 5 -
>  include/asm-generic/hugetlb.h  | 7 +++
>  10 files changed, 7 insertions(+), 45 deletions(-)
> 
> diff --git a/arch/arm/include/asm/hugetlb.h b/arch/arm/include/asm/hugetlb.h
> index 3d2ce4dbc145..1e718a626ef9 100644
> --- a/arch/arm/include/asm/hugetlb.h
> +++ b/arch/arm/include/asm/hugetlb.h
> @@ -43,11 +43,6 @@ static inline int prepare_hugepage_range(struct file *file,
>   return 0;
>  }
>  
> -static inline pte_t huge_pte_wrprotect(pte_t pte)
> -{
> - return pte_wrprotect(pte);
> -}
> -
>  static inline void arch_clear_hugepage_flags(struct page *page)
>  {
>   clear_bit(PG_dcache_clean, >flags);
> diff --git a/arch/arm64/include/asm/hugetlb.h 
> b/arch/arm64/include/asm/hugetlb.h
> index 49247c6f94db..1fd64ebf0cd7 100644
> --- a/arch/arm64/include/asm/hugetlb.h
> +++ b/arch/arm64/include/asm/hugetlb.h
> @@ -42,11 +42,6 @@ static inline int prepare_hugepage_range(struct file *file,
>   return 0;
>  }
>  
> -static inline pte_t huge_pte_wrprotect(pte_t pte)
> -{
> - return pte_wrprotect(pte);
> -}
> -
>  static inline void arch_clear_hugepage_flags(struct page *page)
>  {
>   clear_bit(PG_dcache_clean, >flags);
> diff --git a/arch/ia64/include/asm/hugetlb.h b/arch/ia64/include/asm/hugetlb.h
> index bf573500b3c4..82fe3d7a38d9 100644
> --- a/arch/ia64/include/asm/hugetlb.h
> +++ b/arch/ia64/include/asm/hugetlb.h
> @@ -26,11 +26,6 @@ static inline void huge_ptep_clear_flush(struct 
> vm_area_struct *vma,
>  {
>  }
>  
> -static inline pte_t huge_pte_wrprotect(pte_t pte)
> -{
> - return pte_wrprotect(pte);
> -}
> -
>  static inline void huge_ptep_set_wrprotect(struct mm_struct *mm,
>  unsigned long addr, pte_t *ptep)
>  {
> diff --git a/arch/mips/include/asm/hugetlb.h b/arch/mips/include/asm/hugetlb.h
> index 1c9c4531376c..b3d6bb53ee6e 100644
> --- a/arch/mips/include/asm/hugetlb.h
> +++ b/arch/mips/include/asm/hugetlb.h
> @@ -62,11 +62,6 @@ static inline int huge_pte_none(pte_t pte)
>   return !val || (val == (unsigned long)invalid_pte_table);
>  }
>  
> -static inline pte_t huge_pte_wrprotect(pte_t pte)
> -{
> - return pte_wrprotect(pte);
> -}
> -
>  static inline void huge_ptep_set_wrprotect(struct mm_struct *mm,
>  unsigned long addr, pte_t *ptep)
>  {
> diff --git a/arch/parisc/include/asm/hugetlb.h 
> b/arch/parisc/include/asm/hugetlb.h
> index c09d8c74553c..5a102d7251e4 100644
> --- a/arch/parisc/include/asm/hugetlb.h
> +++ b/arch/parisc/include/asm/hugetlb.h
> @@ -38,11 +38,6 @@ static inline void huge_ptep_clear_flush(struct 
> vm_area_struct *vma,
>  {
>  }
>  
> -static inline pte_t huge_pte_wrprotect(pte_t pte)
> -{
> - return pte_wrprotect(pte);
> -}
> -
>  void huge_ptep_set_wrprotect(struct mm_struct *mm,
>  unsigned long addr, pte_t *ptep);
>  
> diff --git a/arch/powerpc/include/asm/hugetlb.h 
> b/arch/powerpc/include/asm/hugetlb.h
> index 3562d46585ba..7123599089c6 100644
> --- a/arch/powerpc/include/asm/hugetlb.h
> +++ b/arch/powerpc/include/asm/hugetlb.h
> @@ -152,11 +152,6 @@ static inline void huge_ptep_clear_flush(struct 
> vm_area_struct *vma,
>   flush_hugetlb_page(vma, addr);
>  }
>  
> -static inline pte_t huge_pte_wrprotect(pte_t pte)
> -{
> - return pte_wrprotect(pte);
> -}
> -
>  extern int huge_ptep_set_access_flags(struct vm_area_struct *vma,
> unsigned long addr, pte_t *ptep,
> pte_t pte, int dirty);
> diff --git a/arch/sh/include/asm/hugetlb.h b/arch/sh/include/asm/hugetlb.h
> index a9f8266f33cf..54f65094efe6 100644
> --- a/arch/sh/include/asm/hugetlb.h
> +++ b/arch/sh/include/asm/hugetlb.h
> @@ -31,11 +31,6 @@ static inline void huge_ptep_clear_flush(struct 
> vm_area_struct *vma,
>  {
>  }
>  
>

Re: [PATCH v4 05/11] hugetlb: Introduce generic version of huge_ptep_clear_flush

2018-07-26 Thread Mike Kravetz
On 07/05/2018 04:07 AM, Alexandre Ghiti wrote:
> arm, x86 architectures use the same version of
> huge_ptep_clear_flush, so move this generic implementation into
> asm-generic/hugetlb.h.
> 

Reviewed-by: Mike Kravetz 
-- 
Mike Kravetz
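
In sketch form, the common wrapper that asm-generic/hugetlb.h gains here
(guard as in the arm64 hunk below):

#ifndef __HAVE_ARCH_HUGE_PTEP_CLEAR_FLUSH
static inline void huge_ptep_clear_flush(struct vm_area_struct *vma,
		unsigned long addr, pte_t *ptep)
{
	ptep_clear_flush(vma, addr, ptep);
}
#endif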

> Signed-off-by: Alexandre Ghiti 
> ---
>  arch/arm/include/asm/hugetlb-3level.h | 6 --
>  arch/arm64/include/asm/hugetlb.h  | 1 +
>  arch/ia64/include/asm/hugetlb.h   | 1 +
>  arch/mips/include/asm/hugetlb.h   | 1 +
>  arch/parisc/include/asm/hugetlb.h | 1 +
>  arch/powerpc/include/asm/hugetlb.h| 1 +
>  arch/sh/include/asm/hugetlb.h | 1 +
>  arch/sparc/include/asm/hugetlb.h  | 1 +
>  arch/x86/include/asm/hugetlb.h| 6 --
>  include/asm-generic/hugetlb.h | 8 
>  10 files changed, 15 insertions(+), 12 deletions(-)
> 
> diff --git a/arch/arm/include/asm/hugetlb-3level.h 
> b/arch/arm/include/asm/hugetlb-3level.h
> index ad36e84b819a..b897541520ef 100644
> --- a/arch/arm/include/asm/hugetlb-3level.h
> +++ b/arch/arm/include/asm/hugetlb-3level.h
> @@ -37,12 +37,6 @@ static inline pte_t huge_ptep_get(pte_t *ptep)
>   return retval;
>  }
>  
> -static inline void huge_ptep_clear_flush(struct vm_area_struct *vma,
> -  unsigned long addr, pte_t *ptep)
> -{
> - ptep_clear_flush(vma, addr, ptep);
> -}
> -
>  static inline void huge_ptep_set_wrprotect(struct mm_struct *mm,
>  unsigned long addr, pte_t *ptep)
>  {
> diff --git a/arch/arm64/include/asm/hugetlb.h 
> b/arch/arm64/include/asm/hugetlb.h
> index 6ae0bcafe162..4c8dd488554d 100644
> --- a/arch/arm64/include/asm/hugetlb.h
> +++ b/arch/arm64/include/asm/hugetlb.h
> @@ -71,6 +71,7 @@ extern pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
>unsigned long addr, pte_t *ptep);
>  extern void huge_ptep_set_wrprotect(struct mm_struct *mm,
>   unsigned long addr, pte_t *ptep);
> +#define __HAVE_ARCH_HUGE_PTEP_CLEAR_FLUSH
>  extern void huge_ptep_clear_flush(struct vm_area_struct *vma,
> unsigned long addr, pte_t *ptep);
>  #define __HAVE_ARCH_HUGE_PTE_CLEAR
> diff --git a/arch/ia64/include/asm/hugetlb.h b/arch/ia64/include/asm/hugetlb.h
> index 6719c74da0de..41b5f6adeee4 100644
> --- a/arch/ia64/include/asm/hugetlb.h
> +++ b/arch/ia64/include/asm/hugetlb.h
> @@ -20,6 +20,7 @@ static inline int is_hugepage_only_range(struct mm_struct 
> *mm,
>   REGION_NUMBER((addr)+(len)-1) == RGN_HPAGE);
>  }
>  
> +#define __HAVE_ARCH_HUGE_PTEP_CLEAR_FLUSH
>  static inline void huge_ptep_clear_flush(struct vm_area_struct *vma,
>unsigned long addr, pte_t *ptep)
>  {
> diff --git a/arch/mips/include/asm/hugetlb.h b/arch/mips/include/asm/hugetlb.h
> index 0959cc5a41fa..7df1f116a3cc 100644
> --- a/arch/mips/include/asm/hugetlb.h
> +++ b/arch/mips/include/asm/hugetlb.h
> @@ -48,6 +48,7 @@ static inline pte_t huge_ptep_get_and_clear(struct 
> mm_struct *mm,
>   return pte;
>  }
>  
> +#define __HAVE_ARCH_HUGE_PTEP_CLEAR_FLUSH
>  static inline void huge_ptep_clear_flush(struct vm_area_struct *vma,
>unsigned long addr, pte_t *ptep)
>  {
> diff --git a/arch/parisc/include/asm/hugetlb.h 
> b/arch/parisc/include/asm/hugetlb.h
> index 6e281e1bb336..9afff26747a1 100644
> --- a/arch/parisc/include/asm/hugetlb.h
> +++ b/arch/parisc/include/asm/hugetlb.h
> @@ -32,6 +32,7 @@ static inline int prepare_hugepage_range(struct file *file,
>   return 0;
>  }
>  
> +#define __HAVE_ARCH_HUGE_PTEP_CLEAR_FLUSH
>  static inline void huge_ptep_clear_flush(struct vm_area_struct *vma,
>unsigned long addr, pte_t *ptep)
>  {
> diff --git a/arch/powerpc/include/asm/hugetlb.h 
> b/arch/powerpc/include/asm/hugetlb.h
> index 970101cf9c82..0b02856aa85b 100644
> --- a/arch/powerpc/include/asm/hugetlb.h
> +++ b/arch/powerpc/include/asm/hugetlb.h
> @@ -143,6 +143,7 @@ static inline pte_t huge_ptep_get_and_clear(struct 
> mm_struct *mm,
>  #endif
>  }
>  
> +#define __HAVE_ARCH_HUGE_PTEP_CLEAR_FLUSH
>  static inline void huge_ptep_clear_flush(struct vm_area_struct *vma,
>unsigned long addr, pte_t *ptep)
>  {
> diff --git a/arch/sh/include/asm/hugetlb.h b/arch/sh/include/asm/hugetlb.h
> index 08ee6c00b5e9..9abf9c86b769 100644
> --- a/arch/sh/include/asm/hugetlb.h
> +++ b/arch/sh/include/asm/hugetlb.h
> @@ -25,6 +25,7 @@ static inline int prepare_hugepage_range(struct file *file,
>   return 0;
>  }
>  
> +#define __HAVE_ARCH_HUGE_PTEP_CLEAR_

Re: [PATCH v4 04/11] hugetlb: Introduce generic version of huge_ptep_get_and_clear

2018-07-26 Thread Mike Kravetz
On 07/05/2018 04:07 AM, Alexandre Ghiti wrote:
> arm, ia64, sh, x86 architectures use the
> same version of huge_ptep_get_and_clear, so move this generic
> implementation into asm-generic/hugetlb.h.
> 

Reviewed-by: Mike Kravetz 
-- 
Mike Kravetz
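
The shared implementation being factored out is, roughly (it mirrors the
code removed from the arch headers below):

#ifndef __HAVE_ARCH_HUGE_PTEP_GET_AND_CLEAR
static inline pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
		unsigned long addr, pte_t *ptep)
{
	return ptep_get_and_clear(mm, addr, ptep);
}
#endif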

> Signed-off-by: Alexandre Ghiti 
> ---
>  arch/arm/include/asm/hugetlb-3level.h | 6 --
>  arch/arm64/include/asm/hugetlb.h  | 1 +
>  arch/ia64/include/asm/hugetlb.h   | 6 --
>  arch/mips/include/asm/hugetlb.h   | 1 +
>  arch/parisc/include/asm/hugetlb.h | 1 +
>  arch/powerpc/include/asm/hugetlb.h| 1 +
>  arch/sh/include/asm/hugetlb.h | 6 --
>  arch/sparc/include/asm/hugetlb.h  | 1 +
>  arch/x86/include/asm/hugetlb.h| 6 --
>  include/asm-generic/hugetlb.h | 8 
>  10 files changed, 13 insertions(+), 24 deletions(-)
> 
> diff --git a/arch/arm/include/asm/hugetlb-3level.h 
> b/arch/arm/include/asm/hugetlb-3level.h
> index 398fb06e8207..ad36e84b819a 100644
> --- a/arch/arm/include/asm/hugetlb-3level.h
> +++ b/arch/arm/include/asm/hugetlb-3level.h
> @@ -49,12 +49,6 @@ static inline void huge_ptep_set_wrprotect(struct 
> mm_struct *mm,
>   ptep_set_wrprotect(mm, addr, ptep);
>  }
>  
> -static inline pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
> - unsigned long addr, pte_t *ptep)
> -{
> - return ptep_get_and_clear(mm, addr, ptep);
> -}
> -
>  static inline int huge_ptep_set_access_flags(struct vm_area_struct *vma,
>unsigned long addr, pte_t *ptep,
>pte_t pte, int dirty)
> diff --git a/arch/arm64/include/asm/hugetlb.h 
> b/arch/arm64/include/asm/hugetlb.h
> index 874661a1dff1..6ae0bcafe162 100644
> --- a/arch/arm64/include/asm/hugetlb.h
> +++ b/arch/arm64/include/asm/hugetlb.h
> @@ -66,6 +66,7 @@ extern void set_huge_pte_at(struct mm_struct *mm, unsigned 
> long addr,
>  extern int huge_ptep_set_access_flags(struct vm_area_struct *vma,
> unsigned long addr, pte_t *ptep,
> pte_t pte, int dirty);
> +#define __HAVE_ARCH_HUGE_PTEP_GET_AND_CLEAR
>  extern pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
>unsigned long addr, pte_t *ptep);
>  extern void huge_ptep_set_wrprotect(struct mm_struct *mm,
> diff --git a/arch/ia64/include/asm/hugetlb.h b/arch/ia64/include/asm/hugetlb.h
> index a235d6f60fb3..6719c74da0de 100644
> --- a/arch/ia64/include/asm/hugetlb.h
> +++ b/arch/ia64/include/asm/hugetlb.h
> @@ -20,12 +20,6 @@ static inline int is_hugepage_only_range(struct mm_struct 
> *mm,
>   REGION_NUMBER((addr)+(len)-1) == RGN_HPAGE);
>  }
>  
> -static inline pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
> - unsigned long addr, pte_t *ptep)
> -{
> - return ptep_get_and_clear(mm, addr, ptep);
> -}
> -
>  static inline void huge_ptep_clear_flush(struct vm_area_struct *vma,
>unsigned long addr, pte_t *ptep)
>  {
> diff --git a/arch/mips/include/asm/hugetlb.h b/arch/mips/include/asm/hugetlb.h
> index 8ea439041d5d..0959cc5a41fa 100644
> --- a/arch/mips/include/asm/hugetlb.h
> +++ b/arch/mips/include/asm/hugetlb.h
> @@ -36,6 +36,7 @@ static inline int prepare_hugepage_range(struct file *file,
>   return 0;
>  }
>  
> +#define __HAVE_ARCH_HUGE_PTEP_GET_AND_CLEAR
>  static inline pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
>   unsigned long addr, pte_t *ptep)
>  {
> diff --git a/arch/parisc/include/asm/hugetlb.h 
> b/arch/parisc/include/asm/hugetlb.h
> index 77c8adbac7c3..6e281e1bb336 100644
> --- a/arch/parisc/include/asm/hugetlb.h
> +++ b/arch/parisc/include/asm/hugetlb.h
> @@ -8,6 +8,7 @@
>  void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
>pte_t *ptep, pte_t pte);
>  
> +#define __HAVE_ARCH_HUGE_PTEP_GET_AND_CLEAR
>  pte_t huge_ptep_get_and_clear(struct mm_struct *mm, unsigned long addr,
> pte_t *ptep);
>  
> diff --git a/arch/powerpc/include/asm/hugetlb.h 
> b/arch/powerpc/include/asm/hugetlb.h
> index 0794b53439d4..970101cf9c82 100644
> --- a/arch/powerpc/include/asm/hugetlb.h
> +++ b/arch/powerpc/include/asm/hugetlb.h
> @@ -132,6 +132,7 @@ static inline int prepare_hugepage_range(struct file 
> *file,
>   return 0;
>  }
>  
> +#define __HAVE_ARCH_HUGE_PTEP_GET_AND_CLEAR
>  static inline pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
>   unsigned long addr, pte_t *ptep)
>  {
&

Re: [PATCH v4 03/11] hugetlb: Introduce generic version of set_huge_pte_at

2018-07-26 Thread Mike Kravetz
On 07/05/2018 04:07 AM, Alexandre Ghiti wrote:
> arm, ia64, mips, powerpc, sh, x86 architectures use the
> same version of set_huge_pte_at, so move this generic
> implementation into asm-generic/hugetlb.h.
> 

Just one comment below, otherwise:
Reviewed-by: Mike Kravetz 

> Signed-off-by: Alexandre Ghiti 
> ---
>  arch/arm/include/asm/hugetlb-3level.h | 6 --
>  arch/arm64/include/asm/hugetlb.h  | 1 +
>  arch/ia64/include/asm/hugetlb.h   | 6 --
>  arch/mips/include/asm/hugetlb.h   | 6 --
>  arch/parisc/include/asm/hugetlb.h | 1 +
>  arch/powerpc/include/asm/hugetlb.h| 6 --
>  arch/sh/include/asm/hugetlb.h | 6 --
>  arch/sparc/include/asm/hugetlb.h  | 1 +
>  arch/x86/include/asm/hugetlb.h| 6 --
>  include/asm-generic/hugetlb.h | 8 +++-
>  10 files changed, 10 insertions(+), 37 deletions(-)
> 
> diff --git a/arch/arm/include/asm/hugetlb-3level.h 
> b/arch/arm/include/asm/hugetlb-3level.h
> index d4014fbe5ea3..398fb06e8207 100644
> --- a/arch/arm/include/asm/hugetlb-3level.h
> +++ b/arch/arm/include/asm/hugetlb-3level.h
> @@ -37,12 +37,6 @@ static inline pte_t huge_ptep_get(pte_t *ptep)
>   return retval;
>  }
>  
> -static inline void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
> -pte_t *ptep, pte_t pte)
> -{
> - set_pte_at(mm, addr, ptep, pte);
> -}
> -

Since the header that defines set_pte_at is not directly included in this
file, I had to search around in the #include dependency chain to look for
it.  It makes me just a tiny bit nervous, but since it compiled, I'm
sure there is not an issue.
-- 
Mike Kravetz

>  static inline void huge_ptep_clear_flush(struct vm_area_struct *vma,
>unsigned long addr, pte_t *ptep)
>  {
> diff --git a/arch/arm64/include/asm/hugetlb.h 
> b/arch/arm64/include/asm/hugetlb.h
> index 4af1a800a900..874661a1dff1 100644
> --- a/arch/arm64/include/asm/hugetlb.h
> +++ b/arch/arm64/include/asm/hugetlb.h
> @@ -60,6 +60,7 @@ static inline void arch_clear_hugepage_flags(struct page 
> *page)
>  extern pte_t arch_make_huge_pte(pte_t entry, struct vm_area_struct *vma,
>   struct page *page, int writable);
>  #define arch_make_huge_pte arch_make_huge_pte
> +#define __HAVE_ARCH_HUGE_SET_HUGE_PTE_AT
>  extern void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
>   pte_t *ptep, pte_t pte);
>  extern int huge_ptep_set_access_flags(struct vm_area_struct *vma,
> diff --git a/arch/ia64/include/asm/hugetlb.h b/arch/ia64/include/asm/hugetlb.h
> index afe9fa4d969b..a235d6f60fb3 100644
> --- a/arch/ia64/include/asm/hugetlb.h
> +++ b/arch/ia64/include/asm/hugetlb.h
> @@ -20,12 +20,6 @@ static inline int is_hugepage_only_range(struct mm_struct 
> *mm,
>   REGION_NUMBER((addr)+(len)-1) == RGN_HPAGE);
>  }
>  
> -static inline void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
> -pte_t *ptep, pte_t pte)
> -{
> - set_pte_at(mm, addr, ptep, pte);
> -}
> -
>  static inline pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
>   unsigned long addr, pte_t *ptep)
>  {
> diff --git a/arch/mips/include/asm/hugetlb.h b/arch/mips/include/asm/hugetlb.h
> index 53764050243e..8ea439041d5d 100644
> --- a/arch/mips/include/asm/hugetlb.h
> +++ b/arch/mips/include/asm/hugetlb.h
> @@ -36,12 +36,6 @@ static inline int prepare_hugepage_range(struct file *file,
>   return 0;
>  }
>  
> -static inline void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
> -pte_t *ptep, pte_t pte)
> -{
> - set_pte_at(mm, addr, ptep, pte);
> -}
> -
>  static inline pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
>   unsigned long addr, pte_t *ptep)
>  {
> diff --git a/arch/parisc/include/asm/hugetlb.h 
> b/arch/parisc/include/asm/hugetlb.h
> index 28c23b68d38d..77c8adbac7c3 100644
> --- a/arch/parisc/include/asm/hugetlb.h
> +++ b/arch/parisc/include/asm/hugetlb.h
> @@ -4,6 +4,7 @@
>  
>  #include 
>  
> +#define __HAVE_ARCH_HUGE_SET_HUGE_PTE_AT
>  void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
>pte_t *ptep, pte_t pte);
>  
> diff --git a/arch/powerpc/include/asm/hugetlb.h 
> b/arch/powerpc/include/asm/hugetlb.h
> index a7d5c739df9b..0794b53439d4 100644
> --- a/arch/powerpc/include/asm/hugetlb.h
> +++ b/arch/powerpc/include/asm/hugetlb.h
> @@ -132,12 +132,6 @@ static inline int prepare_hugepage_range(struct file 
> *file,
>   return 0;
>  }
>  
> -static inline void set_huge_pte_at(struct mm_stru

Re: [PATCH v4 01/11] hugetlb: Harmonize hugetlb.h arch specific defines with pgtable.h

2018-07-26 Thread Mike Kravetz
On 07/05/2018 04:07 AM, Alexandre Ghiti wrote:
> asm-generic/hugetlb.h proposes generic implementations of hugetlb
> related functions: use __HAVE_ARCH_HUGE* defines in order to make arch
> specific implementations of hugetlb functions consistent with pgtable.h
> scheme.
> 

Reviewed-by: Mike Kravetz 
-- 
Mike Kravetz

> Signed-off-by: Alexandre Ghiti 
> ---
>  arch/arm64/include/asm/hugetlb.h | 2 +-
>  include/asm-generic/hugetlb.h| 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/hugetlb.h 
> b/arch/arm64/include/asm/hugetlb.h
> index e73f68569624..3fcf14663dfa 100644
> --- a/arch/arm64/include/asm/hugetlb.h
> +++ b/arch/arm64/include/asm/hugetlb.h
> @@ -81,9 +81,9 @@ extern void huge_ptep_set_wrprotect(struct mm_struct *mm,
>   unsigned long addr, pte_t *ptep);
>  extern void huge_ptep_clear_flush(struct vm_area_struct *vma,
> unsigned long addr, pte_t *ptep);
> +#define __HAVE_ARCH_HUGE_PTE_CLEAR
>  extern void huge_pte_clear(struct mm_struct *mm, unsigned long addr,
>  pte_t *ptep, unsigned long sz);
> -#define huge_pte_clear huge_pte_clear
>  extern void set_huge_swap_pte_at(struct mm_struct *mm, unsigned long addr,
>pte_t *ptep, pte_t pte, unsigned long sz);
>  #define set_huge_swap_pte_at set_huge_swap_pte_at
> diff --git a/include/asm-generic/hugetlb.h b/include/asm-generic/hugetlb.h
> index 9d0cde8ab716..3da7cff52360 100644
> --- a/include/asm-generic/hugetlb.h
> +++ b/include/asm-generic/hugetlb.h
> @@ -32,7 +32,7 @@ static inline pte_t huge_pte_modify(pte_t pte, pgprot_t 
> newprot)
>   return pte_modify(pte, newprot);
>  }
>  
> -#ifndef huge_pte_clear
> +#ifndef __HAVE_ARCH_HUGE_PTE_CLEAR
>  static inline void huge_pte_clear(struct mm_struct *mm, unsigned long addr,
>   pte_t *ptep, unsigned long sz)
>  {
> 


Re: [PATCH v4 00/11] hugetlb: Factorize hugetlb architecture primitives

2018-07-20 Thread Mike Kravetz
On 07/20/2018 11:37 AM, Alex Ghiti wrote:
> Does anyone have any suggestions about these patches?

I only took a quick look.  From the hugetlb perspective, I like the
idea of moving routines to a common file.  If any of the arch owners
(or anyone else) agree, I can do a review of the series.
-- 
Mike Kravetz

> On 07/09/2018 02:16 PM, Michal Hocko wrote:
>> [CC hugetlb guys - 
>> http://lkml.kernel.org/r/20180705110716.3919-1-a...@ghiti.fr]
>>
>> On Thu 05-07-18 11:07:05, Alexandre Ghiti wrote:
>>> In order to reduce copy/paste of functions across architectures and then
>>> make riscv hugetlb port (and future ports) simpler and smaller, this
>>> patchset intends to factorize the numerous hugetlb primitives that are
>>> defined across all the architectures.
>>>
>>> Except for prepare_hugepage_range, this patchset moves the versions that
>>> are just pass-through to standard pte primitives into
>>> asm-generic/hugetlb.h by using the same #ifdef semantic that can be
>>> found in asm-generic/pgtable.h, i.e. __HAVE_ARCH_***.
>>>
>>> s390 architecture has not been tackled in this series since it does not
>>> use asm-generic/hugetlb.h at all.
>>> powerpc could be factorized a bit more (cf huge_ptep_set_wrprotect).
>>>
>>> This patchset has been compiled on x86 only.
>>>
>>> Changelog:
>>>
>>> v4:
>>>    Fix powerpc build error due to misplacing of an #include outside of
>>>    #ifdef CONFIG_HUGETLB_PAGE, as pointed out by Christophe Leroy.
>>>
>>> v1, v2, v3:
>>>Same version, just problems with email provider and misuse of
>>>--batch-size option of git send-email
>>>
>>> Alexandre Ghiti (11):
>>>hugetlb: Harmonize hugetlb.h arch specific defines with pgtable.h
>>>hugetlb: Introduce generic version of hugetlb_free_pgd_range
>>>hugetlb: Introduce generic version of set_huge_pte_at
>>>hugetlb: Introduce generic version of huge_ptep_get_and_clear
>>>hugetlb: Introduce generic version of huge_ptep_clear_flush
>>>hugetlb: Introduce generic version of huge_pte_none
>>>hugetlb: Introduce generic version of huge_pte_wrprotect
>>>hugetlb: Introduce generic version of prepare_hugepage_range
>>>hugetlb: Introduce generic version of huge_ptep_set_wrprotect
>>>hugetlb: Introduce generic version of huge_ptep_set_access_flags
>>>hugetlb: Introduce generic version of huge_ptep_get
>>>
>>>   arch/arm/include/asm/hugetlb-3level.h| 32 +-
>>>   arch/arm/include/asm/hugetlb.h   | 33 +--
>>>   arch/arm64/include/asm/hugetlb.h | 39 +++-
>>>   arch/ia64/include/asm/hugetlb.h  | 47 ++-
>>>   arch/mips/include/asm/hugetlb.h  | 40 +++--
>>>   arch/parisc/include/asm/hugetlb.h| 33 +++
>>>   arch/powerpc/include/asm/book3s/32/pgtable.h |  2 +
>>>   arch/powerpc/include/asm/book3s/64/pgtable.h |  1 +
>>>   arch/powerpc/include/asm/hugetlb.h   | 43 ++
>>>   arch/powerpc/include/asm/nohash/32/pgtable.h |  2 +
>>>   arch/powerpc/include/asm/nohash/64/pgtable.h |  1 +
>>>   arch/sh/include/asm/hugetlb.h| 54 ++---
>>>   arch/sparc/include/asm/hugetlb.h | 40 +++--
>>>   arch/x86/include/asm/hugetlb.h   | 72 +--
>>>   include/asm-generic/hugetlb.h| 88 
>>> +++-
>>>   15 files changed, 143 insertions(+), 384 deletions(-)
>>>
>>> -- 
>>> 2.16.2
> 


Re: [PATCH v2] mm: hwpoison: disable memory error handling on 1GB hugepage

2018-02-13 Thread Mike Kravetz
On 02/12/2018 06:48 PM, Michael Ellerman wrote:
> Andrew Morton <a...@linux-foundation.org> writes:
> 
>> On Thu, 08 Feb 2018 12:30:45 + Punit Agrawal <punit.agra...@arm.com> 
>> wrote:
>>
>>>>
>>>> So I don't think that the above test result means that errors are properly
>>>> handled, and the proposed patch should help for arm64.
>>>
>>> Although the deviation of pud_huge() avoids a kernel crash, the code
>>> would be easier to maintain and reason about if arm64 helpers are
>>> consistent with expectations by core code.
>>>
>>> I'll look to update the arm64 helpers once this patch gets merged. But
>>> it would be helpful if there was a clear expression of semantics for
>>> pud_huge() for various cases. Is there any version that can be used as
>>> reference?
>>
>> Is that an ack or tested-by?
>>
>> Mike keeps plaintively asking the powerpc developers to take a look,
>> but they remain steadfastly in hiding.
> 
> Cc'ing linuxppc-dev is always a good idea :)
> 

Thanks Michael,

I was mostly concerned about use cases for soft/hard offline of huge pages
larger than PMD_SIZE on powerpc.  I know that powerpc supports PGD_SIZE
huge pages, and soft/hard offline support was specifically added for this.
See commit 94310cbcaa3c "mm/madvise: enable (soft|hard) offline of HugeTLB pages
at PGD level".

This patch will disable that functionality.  So, at a minimum this is a
'heads up'.  If there are actual use cases that depend on this, then more
work/discussions will need to happen.  From the e-mail thread on PGD_SIZE
support, I cannot tell if there is a real use case or this is just a
'nice to have'.
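
For anyone wanting to reproduce the scenario under discussion, a minimal poisoning
test along the lines of the "huge-poison" program quoted below might look roughly
like this sketch (illustrative only: it assumes a kernel with CONFIG_MEMORY_FAILURE,
CAP_SYS_ADMIN, and a default hugetlb page size at the PGD/1GB level being discussed):

    #define _GNU_SOURCE
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>

    #ifndef MADV_HWPOISON
    #define MADV_HWPOISON 100       /* value from asm-generic/mman-common.h */
    #endif

    int main(void)
    {
            size_t len = 1UL << 30;         /* one default-sized huge page (1GB assumed) */
            char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                           MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
            if (p == MAP_FAILED)
                    return 1;
            memset(p, 0, len);              /* fault the huge page in */

            printf("Poisoning page...once\n");
            if (madvise(p, 4096, MADV_HWPOISON))    /* one base page within the huge page */
                    perror("madvise");

            printf("Poisoning page...once again\n");
            if (madvise(p, 4096, MADV_HWPOISON))
                    perror("madvise");      /* "Bad address" in the run quoted below */
            return 0;
    }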

-- 
Mike Kravetz

>> Folks, this patch fixes a BUG and is marked for -stable.  Can we please
>> prioritize it?
> 
> It's not crashing for me (on 4.16-rc1):
> 
>   # ./huge-poison 
>   Poisoning page...once
>   Poisoning page...once again
>   madvise: Bad address
> 
> And I guess the above is the expected behaviour?
> 
> Looking at the function trace it looks like the 2nd madvise is going
> down reasonable code paths, but I don't know for sure:
> 
>   8)   |  SyS_madvise() {
>   8)   |capable() {
>   8)   |  ns_capable_common() {
>   8)   0.094 us|cap_capable();
>   8)   0.516 us|  }
>   8)   1.052 us|}
>   8)   |get_user_pages_fast() {
>   8)   0.354 us|  gup_pgd_range();
>   8)   |  get_user_pages_unlocked() {
>   8)   0.050 us|down_read();
>   8)   |__get_user_pages() {
>   8)   |  find_extend_vma() {
>   8)   |find_vma() {
>   8)   0.148 us|  vmacache_find();
>   8)   0.622 us|}
>   8)   1.064 us|  }
>   8)   0.028 us|  arch_vma_access_permitted();
>   8)   |  follow_hugetlb_page() {
>   8)   |huge_pte_offset() {
>   8)   0.128 us|  __find_linux_pte();
>   8)   0.580 us|}
>   8)   0.048 us|_raw_spin_lock();
>   8)   |hugetlb_fault() {
>   8)   |  huge_pte_offset() {
>   8)   0.034 us|__find_linux_pte();
>   8)   0.434 us|  }
>   8)   0.028 us|  is_hugetlb_entry_migration();
>   8)   0.032 us|  is_hugetlb_entry_hwpoisoned();
>   8)   2.118 us|}
>   8)   4.940 us|  }
>   8)   7.468 us|}
>   8)   0.056 us|up_read();
>   8)   8.722 us|  }
>   8) + 10.264 us   |}
>   8) + 12.212 us   |  }
> 
> 
> cheers
> 


Re: [PATCH v3] powerpc/mm: Implemented default_hugepagesz verification for powerpc

2017-08-04 Thread Mike Kravetz
On 07/24/2017 04:52 PM, Victor Aoqui wrote:
> Implemented default hugepage size verification (default_hugepagesz=)
> in order to allow allocation of a defined number of pages (hugepages=)
> only for supported hugepage sizes.
> 
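For context, the boot-line combinations this verification targets look like the
following (sizes are illustrative; what is actually supported depends on the
platform and MMU mode):

    default_hugepagesz=16M hugepages=16   # supported size: 16 pages reserved for the default hstate
    default_hugepagesz=4G  hugepages=16   # unsupported size: warn and ignore the hugepages= count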
> Signed-off-by: Victor Aoqui <vict...@linux.vnet.ibm.com>
> ---
> v2:
> 
> - Renamed default_hugepage_setup_sz function to hugetlb_default_size_setup;
> - Added powerpc string to error message.
> 
> v3:
> 
> - Renamed hugetlb_default_size_setup() to hugepage_default_setup_sz();
> - Implemented hugetlb_bad_default_size();
> - Reimplemented hugepage_setup_sz() to just parse default_hugepagesz= and
> check if it's a supported size;
> - Added verification of default_hugepagesz= value on hugetlb_nrpages_setup()
> before allocating hugepages.
> 
>  arch/powerpc/mm/hugetlbpage.c | 15 +++
>  include/linux/hugetlb.h   |  1 +
>  mm/hugetlb.c  | 17 +++--
>  3 files changed, 31 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c
> index e1bf5ca..5990381 100644
> --- a/arch/powerpc/mm/hugetlbpage.c
> +++ b/arch/powerpc/mm/hugetlbpage.c
> @@ -780,6 +780,21 @@ static int __init hugepage_setup_sz(char *str)
>  }
>  __setup("hugepagesz=", hugepage_setup_sz);
>  
> +static int __init hugepage_default_setup_sz(char *str)
> +{
> + unsigned long long size;
> +
> + size = memparse(str, &str);
> +
> + if (add_huge_page_size(size) != 0) {
> + hugetlb_bad_default_size();
> + pr_err("Invalid ppc default huge page size specified(%llu)\n", 
> size);
> + }
> +
> + return 1;
> +}
> +__setup("default_hugepagesz=", hugepage_default_setup_sz);
> +
>  struct kmem_cache *hugepte_cache;
>  static int __init hugetlbpage_init(void)
>  {
> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> index 0ed8e41..2927200 100644
> --- a/include/linux/hugetlb.h
> +++ b/include/linux/hugetlb.h
> @@ -361,6 +361,7 @@ int huge_add_to_page_cache(struct page *page, struct 
> address_space *mapping,
>  int __init alloc_bootmem_huge_page(struct hstate *h);
>  
>  void __init hugetlb_bad_size(void);
> +void __init hugetlb_bad_default_size(void);
>  void __init hugetlb_add_hstate(unsigned order);
>  struct hstate *size_to_hstate(unsigned long size);
>  
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index bc48ee7..3c24266 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -54,6 +54,7 @@
>  static unsigned long __initdata default_hstate_max_huge_pages;
>  static unsigned long __initdata default_hstate_size;
>  static bool __initdata parsed_valid_hugepagesz = true;
> +static bool __initdata parsed_valid_default_hugepagesz = true;
>  
>  /*
>   * Protects updates to hugepage_freelists, hugepage_activelist, 
> nr_huge_pages,
> @@ -2804,6 +2805,12 @@ void __init hugetlb_bad_size(void)
>   parsed_valid_hugepagesz = false;
>  }
>  
> +/* Should be called on processing a default_hugepagesz=... option */
> +void __init hugetlb_bad_default_size(void)
> +{
> + parsed_valid_default_hugepagesz = false;
> +}
> +
>  void __init hugetlb_add_hstate(unsigned int order)
>  {
>   struct hstate *h;
> @@ -2846,8 +2853,14 @@ static int __init hugetlb_nrpages_setup(char *s)
>* !hugetlb_max_hstate means we haven't parsed a hugepagesz= parameter 
> yet,
>* so this hugepages= parameter goes to the "default hstate".
>*/
> - else if (!hugetlb_max_hstate)
> - mhp = &default_hstate_max_huge_pages;
> + else if (!hugetlb_max_hstate) {
> + if (!parsed_valid_default_hugepagesz) {
> + pr_warn("hugepages = %s cannot be allocated for "
> + "unsupported default_hugepagesz, ignoring\n", 
> s);
> + parsed_valid_default_hugepagesz = true;
> + } else
> + mhp = &default_hstate_max_huge_pages;
> + }
>   else
> mhp = &parsed_hstate->max_huge_pages;
>  
> 

My compiler tells me,

mm/hugetlb.c: In function ‘hugetlb_nrpages_setup’:
mm/hugetlb.c:2873:8: warning: ‘mhp’ may be used uninitialized in this function 
[-Wmaybe-uninitialized]

You have added a way of getting out of that big if/else if statement without
setting mhp.  mhp will be examined later in the code, so this is indeed a bug.

Like Aneesh, I am not sure if there is great benefit in this patch.

You added this change in functionality only for powerpc.  IMO, it would be
best if behavior was consistent in all architectures.  So, if we change it
for powerpc, we may want to change it everywhere.
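
One way to avoid the uninitialized-mhp warning above (a sketch only, not a tested
patch) is to bail out of the handler before mhp can be used when the default size
was invalid, mirroring the early return the function already does for an
unsupported hugepagesz=:

    else if (!hugetlb_max_hstate) {
            if (!parsed_valid_default_hugepagesz) {
                    pr_warn("hugepages = %s cannot be allocated for unsupported default_hugepagesz, ignoring\n", s);
                    parsed_valid_default_hugepagesz = true;
                    return 1;       /* mhp is never examined on this path */
            }
            mhp = &default_hstate_max_huge_pages;
    }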
-- 
Mike Kravetz

