On Wed, Mar 11, 2015 at 02:35:10PM +0100, [email protected] wrote:
>
> The patch below does not apply to the 3.19-stable tree.
> If someone wants it applied there, or to any other stable or longterm
> tree, then please email the backport, including the original git commit
> id to <[email protected]>.
>
> thanks,
>
> greg k-h
>
> ------------------ original commit in Linus's tree ------------------
>
> From e66f17ff71772b209eed39de35aaa99ba819c93d Mon Sep 17 00:00:00 2001
> From: Naoya Horiguchi <[email protected]>
> Date: Wed, 11 Feb 2015 15:25:22 -0800
> Subject: [PATCH] mm/hugetlb: take page table lock in follow_huge_pmd()
This patch depends on commit 61f77eda9bbf ("mm/hugetlb: reduce arch
dependent code around follow_huge_*"), so we need backport this with it.
Unfortunately, 61f77eda9bbf doesn't apply cleanly on v3.19.1 because of
commit 25153a83b04f ("mm/hugetlb: take page table lock in follow_huge_pmd()"),
which is already merged at v3.19.1 separately.
IOW, these 3 patches should've been applied in the order of 61f77eda9bbf
-> 25153a83b04f -> e66f17ff7177, but now only 25153a83b04f is in stable-3.19.
So I attached modified versions of 61f77eda9bbf and e66f17ff7177, these
should be applied onto v3.19.1 cleanly.
Thanks,
Naoya Horiguchi
From b0a4217b12c72697d7b48ac66c8547031ca373d7 Mon Sep 17 00:00:00 2001
From: Naoya Horiguchi <[email protected]>
Date: Mon, 16 Mar 2015 16:41:08 +0900
Subject: [PATCH 1/2] mm/hugetlb: reduce arch dependent code around
follow_huge_*
Currently we have many duplicates in definitions around
follow_huge_addr(), follow_huge_pmd(), and follow_huge_pud(), so this
patch tries to remove the m. The basic idea is to put the default
implementation for these functions in mm/hugetlb.c as weak symbols
(regardless of CONFIG_ARCH_WANT_GENERAL_HUGETL B), and to implement
arch-specific code only when the arch needs it.
For follow_huge_addr(), only powerpc and ia64 have their own
implementation, and in all other architectures this function just returns
ERR_PTR(-EINVAL). So this patch sets returning ERR_PTR(-EINVAL) as
default.
As for follow_huge_(pmd|pud)(), if (pmd|pud)_huge() is implemented to
always return 0 in your architecture (like in ia64 or sparc,) it's never
called (the callsite is optimized away) no matter how implemented it is.
So in such architectures, we don't need arch-specific implementation.
In some architecture (like mips, s390 and tile,) their current
arch-specific follow_huge_(pmd|pud)() are effectively identical with the
common code, so this patch lets these architecture use the common code.
One exception is metag, where pmd_huge() could return non-zero but it
expects follow_huge_pmd() to always return NULL. This means that we need
arch-specific implementation which returns NULL. This behavior looks
strange to me (because non-zero pmd_huge() implies that the architecture
supports PMD-based hugepage, so follow_huge_pmd() can/should return some
relevant value,) but that's beyond this cleanup patch, so let's keep it.
Justification of non-trivial changes:
- in s390, follow_huge_pmd() checks !MACHINE_HAS_HPAGE at first, and this
patch removes the check. This is OK because we can assume MACHINE_HAS_HPAGE
is true when follow_huge_pmd() can be called (note that pmd_huge() has
the same check and always returns 0 for !MACHINE_HAS_HPAGE.)
- in s390 and mips, we use HPAGE_MASK instead of PMD_MASK as done in common
code. This patch forces these archs use PMD_MASK, but it's OK because
they are identical in both archs.
In s390, both of HPAGE_SHIFT and PMD_SHIFT are 20.
In mips, HPAGE_SHIFT is defined as (PAGE_SHIFT + PAGE_SHIFT - 3) and
PMD_SHIFT is define as (PAGE_SHIFT + PAGE_SHIFT + PTE_ORDER - 3), but
PTE_ORDER is always 0, so these are identical.
[[email protected]: resolve conflict to apply to v3.19.1]
Signed-off-by: Naoya Horiguchi <[email protected]>
Acked-by: Hugh Dickins <[email protected]>
Cc: James Hogan <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: Luiz Capitulino <[email protected]>
Cc: Nishanth Aravamudan <[email protected]>
Cc: Lee Schermerhorn <[email protected]>
Cc: Steve Capper <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
---
arch/arm/mm/hugetlbpage.c | 6 ------
arch/arm64/mm/hugetlbpage.c | 6 ------
arch/ia64/mm/hugetlbpage.c | 6 ------
arch/metag/mm/hugetlbpage.c | 6 ------
arch/mips/mm/hugetlbpage.c | 18 ------------------
arch/powerpc/mm/hugetlbpage.c | 8 ++++++++
arch/s390/mm/hugetlbpage.c | 20 --------------------
arch/sh/mm/hugetlbpage.c | 12 ------------
arch/sparc/mm/hugetlbpage.c | 12 ------------
arch/tile/mm/hugetlbpage.c | 28 ----------------------------
arch/x86/mm/hugetlbpage.c | 12 ------------
mm/hugetlb.c | 30 +++++++++++++++---------------
12 files changed, 23 insertions(+), 141 deletions(-)
diff --git a/arch/arm/mm/hugetlbpage.c b/arch/arm/mm/hugetlbpage.c
index 66781bf34077..c72412415093 100644
--- a/arch/arm/mm/hugetlbpage.c
+++ b/arch/arm/mm/hugetlbpage.c
@@ -36,12 +36,6 @@
* of type casting from pmd_t * to pte_t *.
*/
-struct page *follow_huge_addr(struct mm_struct *mm, unsigned long address,
- int write)
-{
- return ERR_PTR(-EINVAL);
-}
-
int pud_huge(pud_t pud)
{
return 0;
diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
index 023747bf4dd7..2de9d2e59d96 100644
--- a/arch/arm64/mm/hugetlbpage.c
+++ b/arch/arm64/mm/hugetlbpage.c
@@ -38,12 +38,6 @@ int huge_pmd_unshare(struct mm_struct *mm, unsigned long
*addr, pte_t *ptep)
}
#endif
-struct page *follow_huge_addr(struct mm_struct *mm, unsigned long address,
- int write)
-{
- return ERR_PTR(-EINVAL);
-}
-
int pmd_huge(pmd_t pmd)
{
return !(pmd_val(pmd) & PMD_TABLE_BIT);
diff --git a/arch/ia64/mm/hugetlbpage.c b/arch/ia64/mm/hugetlbpage.c
index 76069c18ee42..52b7604b5215 100644
--- a/arch/ia64/mm/hugetlbpage.c
+++ b/arch/ia64/mm/hugetlbpage.c
@@ -114,12 +114,6 @@ int pud_huge(pud_t pud)
return 0;
}
-struct page *
-follow_huge_pmd(struct mm_struct *mm, unsigned long address, pmd_t *pmd, int
write)
-{
- return NULL;
-}
-
void hugetlb_free_pgd_range(struct mmu_gather *tlb,
unsigned long addr, unsigned long end,
unsigned long floor, unsigned long ceiling)
diff --git a/arch/metag/mm/hugetlbpage.c b/arch/metag/mm/hugetlbpage.c
index 3c32075d2945..7ca80ac42ed5 100644
--- a/arch/metag/mm/hugetlbpage.c
+++ b/arch/metag/mm/hugetlbpage.c
@@ -94,12 +94,6 @@ int huge_pmd_unshare(struct mm_struct *mm, unsigned long
*addr, pte_t *ptep)
return 0;
}
-struct page *follow_huge_addr(struct mm_struct *mm,
- unsigned long address, int write)
-{
- return ERR_PTR(-EINVAL);
-}
-
int pmd_huge(pmd_t pmd)
{
return pmd_page_shift(pmd) > PAGE_SHIFT;
diff --git a/arch/mips/mm/hugetlbpage.c b/arch/mips/mm/hugetlbpage.c
index 4ec8ee10d371..06e0f421b41b 100644
--- a/arch/mips/mm/hugetlbpage.c
+++ b/arch/mips/mm/hugetlbpage.c
@@ -68,12 +68,6 @@ int is_aligned_hugepage_range(unsigned long addr, unsigned
long len)
return 0;
}
-struct page *
-follow_huge_addr(struct mm_struct *mm, unsigned long address, int write)
-{
- return ERR_PTR(-EINVAL);
-}
-
int pmd_huge(pmd_t pmd)
{
return (pmd_val(pmd) & _PAGE_HUGE) != 0;
@@ -83,15 +77,3 @@ int pud_huge(pud_t pud)
{
return (pud_val(pud) & _PAGE_HUGE) != 0;
}
-
-struct page *
-follow_huge_pmd(struct mm_struct *mm, unsigned long address,
- pmd_t *pmd, int write)
-{
- struct page *page;
-
- page = pte_page(*(pte_t *)pmd);
- if (page)
- page += ((address & ~HPAGE_MASK) >> PAGE_SHIFT);
- return page;
-}
diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c
index 620d0ec93e6f..7e408bfc7948 100644
--- a/arch/powerpc/mm/hugetlbpage.c
+++ b/arch/powerpc/mm/hugetlbpage.c
@@ -714,6 +714,14 @@ follow_huge_pmd(struct mm_struct *mm, unsigned long
address,
return NULL;
}
+struct page *
+follow_huge_pud(struct mm_struct *mm, unsigned long address,
+ pud_t *pud, int write)
+{
+ BUG();
+ return NULL;
+}
+
static unsigned long hugepte_addr_end(unsigned long addr, unsigned long end,
unsigned long sz)
{
diff --git a/arch/s390/mm/hugetlbpage.c b/arch/s390/mm/hugetlbpage.c
index 3c80d2e38f03..210ffede0153 100644
--- a/arch/s390/mm/hugetlbpage.c
+++ b/arch/s390/mm/hugetlbpage.c
@@ -192,12 +192,6 @@ int huge_pmd_unshare(struct mm_struct *mm, unsigned long
*addr, pte_t *ptep)
return 0;
}
-struct page *follow_huge_addr(struct mm_struct *mm, unsigned long address,
- int write)
-{
- return ERR_PTR(-EINVAL);
-}
-
int pmd_huge(pmd_t pmd)
{
if (!MACHINE_HAS_HPAGE)
@@ -210,17 +204,3 @@ int pud_huge(pud_t pud)
{
return 0;
}
-
-struct page *follow_huge_pmd(struct mm_struct *mm, unsigned long address,
- pmd_t *pmdp, int write)
-{
- struct page *page;
-
- if (!MACHINE_HAS_HPAGE)
- return NULL;
-
- page = pmd_page(*pmdp);
- if (page)
- page += ((address & ~HPAGE_MASK) >> PAGE_SHIFT);
- return page;
-}
diff --git a/arch/sh/mm/hugetlbpage.c b/arch/sh/mm/hugetlbpage.c
index d7762349ea48..534bc978af8a 100644
--- a/arch/sh/mm/hugetlbpage.c
+++ b/arch/sh/mm/hugetlbpage.c
@@ -67,12 +67,6 @@ int huge_pmd_unshare(struct mm_struct *mm, unsigned long
*addr, pte_t *ptep)
return 0;
}
-struct page *follow_huge_addr(struct mm_struct *mm,
- unsigned long address, int write)
-{
- return ERR_PTR(-EINVAL);
-}
-
int pmd_huge(pmd_t pmd)
{
return 0;
@@ -82,9 +76,3 @@ int pud_huge(pud_t pud)
{
return 0;
}
-
-struct page *follow_huge_pmd(struct mm_struct *mm, unsigned long address,
- pmd_t *pmd, int write)
-{
- return NULL;
-}
diff --git a/arch/sparc/mm/hugetlbpage.c b/arch/sparc/mm/hugetlbpage.c
index d329537739c6..4242eab12e10 100644
--- a/arch/sparc/mm/hugetlbpage.c
+++ b/arch/sparc/mm/hugetlbpage.c
@@ -215,12 +215,6 @@ pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
unsigned long addr,
return entry;
}
-struct page *follow_huge_addr(struct mm_struct *mm,
- unsigned long address, int write)
-{
- return ERR_PTR(-EINVAL);
-}
-
int pmd_huge(pmd_t pmd)
{
return 0;
@@ -230,9 +224,3 @@ int pud_huge(pud_t pud)
{
return 0;
}
-
-struct page *follow_huge_pmd(struct mm_struct *mm, unsigned long address,
- pmd_t *pmd, int write)
-{
- return NULL;
-}
diff --git a/arch/tile/mm/hugetlbpage.c b/arch/tile/mm/hugetlbpage.c
index 3270e0019266..8416240c322c 100644
--- a/arch/tile/mm/hugetlbpage.c
+++ b/arch/tile/mm/hugetlbpage.c
@@ -150,12 +150,6 @@ pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long
addr)
return NULL;
}
-struct page *follow_huge_addr(struct mm_struct *mm, unsigned long address,
- int write)
-{
- return ERR_PTR(-EINVAL);
-}
-
int pmd_huge(pmd_t pmd)
{
return !!(pmd_val(pmd) & _PAGE_HUGE_PAGE);
@@ -166,28 +160,6 @@ int pud_huge(pud_t pud)
return !!(pud_val(pud) & _PAGE_HUGE_PAGE);
}
-struct page *follow_huge_pmd(struct mm_struct *mm, unsigned long address,
- pmd_t *pmd, int write)
-{
- struct page *page;
-
- page = pte_page(*(pte_t *)pmd);
- if (page)
- page += ((address & ~PMD_MASK) >> PAGE_SHIFT);
- return page;
-}
-
-struct page *follow_huge_pud(struct mm_struct *mm, unsigned long address,
- pud_t *pud, int write)
-{
- struct page *page;
-
- page = pte_page(*(pte_t *)pud);
- if (page)
- page += ((address & ~PUD_MASK) >> PAGE_SHIFT);
- return page;
-}
-
int huge_pmd_unshare(struct mm_struct *mm, unsigned long *addr, pte_t *ptep)
{
return 0;
diff --git a/arch/x86/mm/hugetlbpage.c b/arch/x86/mm/hugetlbpage.c
index 006cc914994b..9161f764121e 100644
--- a/arch/x86/mm/hugetlbpage.c
+++ b/arch/x86/mm/hugetlbpage.c
@@ -52,20 +52,8 @@ int pud_huge(pud_t pud)
return 0;
}
-struct page *
-follow_huge_pmd(struct mm_struct *mm, unsigned long address,
- pmd_t *pmd, int write)
-{
- return NULL;
-}
#else
-struct page *
-follow_huge_addr(struct mm_struct *mm, unsigned long address, int write)
-{
- return ERR_PTR(-EINVAL);
-}
-
/*
* pmd_huge() returns 1 if @pmd is hugetlb related entry, that is normal
* hugetlb entry or non-present (migration or hwpoisoned) hugetlb entry.
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index c49586f40758..dd42878549d5 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -3660,7 +3660,20 @@ pte_t *huge_pte_offset(struct mm_struct *mm, unsigned
long addr)
return (pte_t *) pmd;
}
-struct page *
+#endif /* CONFIG_ARCH_WANT_GENERAL_HUGETLB */
+
+/*
+ * These functions are overwritable if your architecture needs its own
+ * behavior.
+ */
+struct page * __weak
+follow_huge_addr(struct mm_struct *mm, unsigned long address,
+ int write)
+{
+ return ERR_PTR(-EINVAL);
+}
+
+struct page * __weak
follow_huge_pmd(struct mm_struct *mm, unsigned long address,
pmd_t *pmd, int write)
{
@@ -3674,7 +3687,7 @@ follow_huge_pmd(struct mm_struct *mm, unsigned long
address,
return page;
}
-struct page *
+struct page * __weak
follow_huge_pud(struct mm_struct *mm, unsigned long address,
pud_t *pud, int write)
{
@@ -3686,19 +3699,6 @@ follow_huge_pud(struct mm_struct *mm, unsigned long
address,
return page;
}
-#else /* !CONFIG_ARCH_WANT_GENERAL_HUGETLB */
-
-/* Can be overriden by architectures */
-struct page * __weak
-follow_huge_pud(struct mm_struct *mm, unsigned long address,
- pud_t *pud, int write)
-{
- BUG();
- return NULL;
-}
-
-#endif /* CONFIG_ARCH_WANT_GENERAL_HUGETLB */
-
#ifdef CONFIG_MEMORY_FAILURE
/* Should be called in hugetlb_lock */
--
1.9.3
From 25153a83b04f6662618d8a7fa5e028a94288873f Mon Sep 17 00:00:00 2001
From: Naoya Horiguchi <[email protected]>
Date: Mon, 16 Mar 2015 16:42:43 +0900
Subject: [PATCH 2/2] mm/hugetlb: take page table lock in follow_huge_pmd()
We have a race condition between move_pages() and freeing hugepages, where
move_pages() calls follow_page(FOLL_GET) for hugepages internally and
tries to get its refcount without preventing concurrent freeing. This
race crashes the kernel, so this patch fixes it by moving FOLL_GET code
for hugepages into follow_huge_pmd() with taking the page table lock.
This patch intentionally removes page==NULL check after pte_page.
This is justified because pte_page() never returns NULL for any
architectures or configurations.
This patch changes the behavior of follow_huge_pmd() for tail pages and
then tail pages can be pinned/returned. So the caller must be changed to
properly handle the returned tail pages.
We could have a choice to add the similar locking to
follow_huge_(addr|pud) for consistency, but it's not necessary because
currently these functions don't support FOLL_GET flag, so let's leave it
for future development.
Here is the reproducer:
$ cat movepages.c
#include <stdio.h>
#include <stdlib.h>
#include <numaif.h>
#define ADDR_INPUT 0x700000000000UL
#define HPS 0x200000
#define PS 0x1000
int main(int argc, char *argv[]) {
int i;
int nr_hp = strtol(argv[1], NULL, 0);
int nr_p = nr_hp * HPS / PS;
int ret;
void **addrs;
int *status;
int *nodes;
pid_t pid;
pid = strtol(argv[2], NULL, 0);
addrs = malloc(sizeof(char *) * nr_p + 1);
status = malloc(sizeof(char *) * nr_p + 1);
nodes = malloc(sizeof(char *) * nr_p + 1);
while (1) {
for (i = 0; i < nr_p; i++) {
addrs[i] = (void *)ADDR_INPUT + i * PS;
nodes[i] = 1;
status[i] = 0;
}
ret = numa_move_pages(pid, nr_p, addrs, nodes, status,
MPOL_MF_MOVE_ALL);
if (ret == -1)
err("move_pages");
for (i = 0; i < nr_p; i++) {
addrs[i] = (void *)ADDR_INPUT + i * PS;
nodes[i] = 0;
status[i] = 0;
}
ret = numa_move_pages(pid, nr_p, addrs, nodes, status,
MPOL_MF_MOVE_ALL);
if (ret == -1)
err("move_pages");
}
return 0;
}
$ cat hugepage.c
#include <stdio.h>
#include <sys/mman.h>
#include <string.h>
#define ADDR_INPUT 0x700000000000UL
#define HPS 0x200000
int main(int argc, char *argv[]) {
int nr_hp = strtol(argv[1], NULL, 0);
char *p;
while (1) {
p = mmap((void *)ADDR_INPUT, nr_hp * HPS, PROT_READ |
PROT_WRITE,
MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
if (p != (void *)ADDR_INPUT) {
perror("mmap");
break;
}
memset(p, 0, nr_hp * HPS);
munmap(p, nr_hp * HPS);
}
}
$ sysctl vm.nr_hugepages=40
$ ./hugepage 10 &
$ ./movepages 10 $(pgrep -f hugepage)
[[email protected]: resolve conflict to apply to v3.19.1]
Fixes: e632a938d914 ("mm: migrate: add hugepage migration code to move_pages()")
Signed-off-by: Naoya Horiguchi <[email protected]>
Reported-by: Hugh Dickins <[email protected]>
Cc: James Hogan <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: Luiz Capitulino <[email protected]>
Cc: Nishanth Aravamudan <[email protected]>
Cc: Lee Schermerhorn <[email protected]>
Cc: Steve Capper <[email protected]>
Cc: <[email protected]> [3.12+]
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
---
include/linux/hugetlb.h | 8 ++++----
include/linux/swapops.h | 4 ++++
mm/gup.c | 25 ++++++++-----------------
mm/hugetlb.c | 48 ++++++++++++++++++++++++++++++++++--------------
mm/migrate.c | 5 +++--
5 files changed, 53 insertions(+), 37 deletions(-)
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 431b7fc605c9..e235ec5f1f28 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -99,9 +99,9 @@ int huge_pmd_unshare(struct mm_struct *mm, unsigned long
*addr, pte_t *ptep);
struct page *follow_huge_addr(struct mm_struct *mm, unsigned long address,
int write);
struct page *follow_huge_pmd(struct mm_struct *mm, unsigned long address,
- pmd_t *pmd, int write);
+ pmd_t *pmd, int flags);
struct page *follow_huge_pud(struct mm_struct *mm, unsigned long address,
- pud_t *pud, int write);
+ pud_t *pud, int flags);
int pmd_huge(pmd_t pmd);
int pud_huge(pud_t pmd);
unsigned long hugetlb_change_protection(struct vm_area_struct *vma,
@@ -133,8 +133,8 @@ static inline void hugetlb_report_meminfo(struct seq_file
*m)
static inline void hugetlb_show_meminfo(void)
{
}
-#define follow_huge_pmd(mm, addr, pmd, write) NULL
-#define follow_huge_pud(mm, addr, pud, write) NULL
+#define follow_huge_pmd(mm, addr, pmd, flags) NULL
+#define follow_huge_pud(mm, addr, pud, flags) NULL
#define prepare_hugepage_range(file, addr, len) (-EINVAL)
#define pmd_huge(x) 0
#define pud_huge(x) 0
diff --git a/include/linux/swapops.h b/include/linux/swapops.h
index 6adfb7bfbf44..e288d5c016a7 100644
--- a/include/linux/swapops.h
+++ b/include/linux/swapops.h
@@ -137,6 +137,8 @@ static inline void make_migration_entry_read(swp_entry_t
*entry)
*entry = swp_entry(SWP_MIGRATION_READ, swp_offset(*entry));
}
+extern void __migration_entry_wait(struct mm_struct *mm, pte_t *ptep,
+ spinlock_t *ptl);
extern void migration_entry_wait(struct mm_struct *mm, pmd_t *pmd,
unsigned long address);
extern void migration_entry_wait_huge(struct vm_area_struct *vma,
@@ -150,6 +152,8 @@ static inline int is_migration_entry(swp_entry_t swp)
}
#define migration_entry_to_page(swp) NULL
static inline void make_migration_entry_read(swp_entry_t *entryp) { }
+static inline void __migration_entry_wait(struct mm_struct *mm, pte_t *ptep,
+ spinlock_t *ptl) { }
static inline void migration_entry_wait(struct mm_struct *mm, pmd_t *pmd,
unsigned long address) { }
static inline void migration_entry_wait_huge(struct vm_area_struct *vma,
diff --git a/mm/gup.c b/mm/gup.c
index 9b2afbfe67e3..e29c3745a893 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -167,10 +167,10 @@ struct page *follow_page_mask(struct vm_area_struct *vma,
if (pud_none(*pud))
return no_page_table(vma, flags);
if (pud_huge(*pud) && vma->vm_flags & VM_HUGETLB) {
- if (flags & FOLL_GET)
- return NULL;
- page = follow_huge_pud(mm, address, pud, flags & FOLL_WRITE);
- return page;
+ page = follow_huge_pud(mm, address, pud, flags);
+ if (page)
+ return page;
+ return no_page_table(vma, flags);
}
if (unlikely(pud_bad(*pud)))
return no_page_table(vma, flags);
@@ -179,19 +179,10 @@ struct page *follow_page_mask(struct vm_area_struct *vma,
if (pmd_none(*pmd))
return no_page_table(vma, flags);
if (pmd_huge(*pmd) && vma->vm_flags & VM_HUGETLB) {
- page = follow_huge_pmd(mm, address, pmd, flags & FOLL_WRITE);
- if (flags & FOLL_GET) {
- /*
- * Refcount on tail pages are not well-defined and
- * shouldn't be taken. The caller should handle a NULL
- * return when trying to follow tail pages.
- */
- if (PageHead(page))
- get_page(page);
- else
- page = NULL;
- }
- return page;
+ page = follow_huge_pmd(mm, address, pmd, flags);
+ if (page)
+ return page;
+ return no_page_table(vma, flags);
}
if ((flags & FOLL_NUMA) && pmd_numa(*pmd))
return no_page_table(vma, flags);
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index dd42878549d5..adafced1aa17 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -3675,28 +3675,48 @@ follow_huge_addr(struct mm_struct *mm, unsigned long
address,
struct page * __weak
follow_huge_pmd(struct mm_struct *mm, unsigned long address,
- pmd_t *pmd, int write)
+ pmd_t *pmd, int flags)
{
- struct page *page;
-
- if (!pmd_present(*pmd))
- return NULL;
- page = pte_page(*(pte_t *)pmd);
- if (page)
- page += ((address & ~PMD_MASK) >> PAGE_SHIFT);
+ struct page *page = NULL;
+ spinlock_t *ptl;
+retry:
+ ptl = pmd_lockptr(mm, pmd);
+ spin_lock(ptl);
+ /*
+ * make sure that the address range covered by this pmd is not
+ * unmapped from other threads.
+ */
+ if (!pmd_huge(*pmd))
+ goto out;
+ if (pmd_present(*pmd)) {
+ page = pte_page(*(pte_t *)pmd) +
+ ((address & ~PMD_MASK) >> PAGE_SHIFT);
+ if (flags & FOLL_GET)
+ get_page(page);
+ } else {
+ if (is_hugetlb_entry_migration(huge_ptep_get((pte_t *)pmd))) {
+ spin_unlock(ptl);
+ __migration_entry_wait(mm, (pte_t *)pmd, ptl);
+ goto retry;
+ }
+ /*
+ * hwpoisoned entry is treated as no_page_table in
+ * follow_page_mask().
+ */
+ }
+out:
+ spin_unlock(ptl);
return page;
}
struct page * __weak
follow_huge_pud(struct mm_struct *mm, unsigned long address,
- pud_t *pud, int write)
+ pud_t *pud, int flags)
{
- struct page *page;
+ if (flags & FOLL_GET)
+ return NULL;
- page = pte_page(*(pte_t *)pud);
- if (page)
- page += ((address & ~PUD_MASK) >> PAGE_SHIFT);
- return page;
+ return pte_page(*(pte_t *)pud) + ((address & ~PUD_MASK) >> PAGE_SHIFT);
}
#ifdef CONFIG_MEMORY_FAILURE
diff --git a/mm/migrate.c b/mm/migrate.c
index 344cdf692fc8..be6d1edcfcb7 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -229,7 +229,7 @@ static void remove_migration_ptes(struct page *old, struct
page *new)
* get to the page and wait until migration is finished.
* When we return from this function the fault will be retried.
*/
-static void __migration_entry_wait(struct mm_struct *mm, pte_t *ptep,
+void __migration_entry_wait(struct mm_struct *mm, pte_t *ptep,
spinlock_t *ptl)
{
pte_t pte;
@@ -1268,7 +1268,8 @@ static int do_move_page_to_node_array(struct mm_struct
*mm,
goto put_and_set;
if (PageHuge(page)) {
- isolate_huge_page(page, &pagelist);
+ if (PageHead(page))
+ isolate_huge_page(page, &pagelist);
goto put_and_set;
}
--
1.9.3