[PATCH] update NFIT flags error message

2019-02-28 Thread Toshi Kani
The ACPI NFIT flags field reports major errors on an NVDIMM, which need
the user's attention.

Update the current log to a proper error message with dev_err().
The current message string is kept for grep-compatibility.

Signed-off-by: Toshi Kani 
Cc: Dan Williams 
Cc: "Rafael J. Wysocki" 
Cc: Robert Elliott 
---
 drivers/acpi/nfit/core.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/acpi/nfit/core.c b/drivers/acpi/nfit/core.c
index e18ade5d74e9..143a77704481 100644
--- a/drivers/acpi/nfit/core.c
+++ b/drivers/acpi/nfit/core.c
@@ -2050,7 +2050,7 @@ static int acpi_nfit_register_dimms(struct acpi_nfit_desc *acpi_desc)
if ((mem_flags & ACPI_NFIT_MEM_FAILED_MASK) == 0)
continue;
 
-   dev_info(acpi_desc->dev, "%s flags:%s%s%s%s%s\n",
+   dev_err(acpi_desc->dev, "Error found in NVDIMM %s flags:%s%s%s%s%s\n",
nvdimm_name(nvdimm),
  mem_flags & ACPI_NFIT_MEM_SAVE_FAILED ? " save_fail" : "",
  mem_flags & ACPI_NFIT_MEM_RESTORE_FAILED ? " restore_fail":"",


[tip:x86/mm] x86/mm: Add TLB purge to free pmd/pte page interfaces

2018-07-04 Thread tip-bot for Toshi Kani
Commit-ID:  5e0fb5df2ee871b841f96f9cb6a7f2784e96aa4e
Gitweb: https://git.kernel.org/tip/5e0fb5df2ee871b841f96f9cb6a7f2784e96aa4e
Author: Toshi Kani 
AuthorDate: Wed, 27 Jun 2018 08:13:48 -0600
Committer:  Thomas Gleixner 
CommitDate: Wed, 4 Jul 2018 21:37:09 +0200

x86/mm: Add TLB purge to free pmd/pte page interfaces

ioremap() calls pud_free_pmd_page() / pmd_free_pte_page() when it creates
a pud / pmd map.  The following preconditions are met at their entry.
 - All pte entries for a target pud/pmd address range have been cleared.
 - System-wide TLB purges have been performed for a target pud/pmd address
   range.

The preconditions assure that there is no stale TLB entry for the range.
Speculation may not cache TLB entries, since that requires all levels of
page entries, including ptes, to have the P and A bits set for the
associated address.  However, speculation may cache pud/pmd entries
(paging-structure caches) when they have the P-bit set.

Add a system-wide TLB purge (INVLPG) to a single page after clearing the
pud/pmd entry's P-bit.

SDM 4.10.4.1, Operations that Invalidate TLBs and Paging-Structure Caches,
states that:
  INVLPG invalidates all paging-structure caches associated with the
  current PCID, regardless of the linear addresses to which they correspond.

Fixes: 28ee90fe6048 ("x86/mm: implement free pmd/pte page interfaces")
Signed-off-by: Toshi Kani 
Signed-off-by: Thomas Gleixner 
Cc: mho...@suse.com
Cc: a...@linux-foundation.org
Cc: h...@zytor.com
Cc: cpan...@codeaurora.org
Cc: linux...@kvack.org
Cc: linux-arm-ker...@lists.infradead.org
Cc: Joerg Roedel 
Cc: sta...@vger.kernel.org
Cc: Andrew Morton 
Cc: Michal Hocko 
Cc: "H. Peter Anvin" 
Cc: 
Link: https://lkml.kernel.org/r/20180627141348.21777-4-toshi.k...@hpe.com

---
 arch/x86/mm/pgtable.c | 36 ++--
 1 file changed, 30 insertions(+), 6 deletions(-)

diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index fbd14e506758..e3deefb891da 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -725,24 +725,44 @@ int pmd_clear_huge(pmd_t *pmd)
  * @pud: Pointer to a PUD.
  * @addr: Virtual address associated with pud.
  *
- * Context: The pud range has been unmaped and TLB purged.
+ * Context: The pud range has been unmapped and TLB purged.
  * Return: 1 if clearing the entry succeeded. 0 otherwise.
+ *
+ * NOTE: Callers must allow a single page allocation.
  */
 int pud_free_pmd_page(pud_t *pud, unsigned long addr)
 {
-   pmd_t *pmd;
+   pmd_t *pmd, *pmd_sv;
+   pte_t *pte;
int i;
 
if (pud_none(*pud))
return 1;
 
pmd = (pmd_t *)pud_page_vaddr(*pud);
+   pmd_sv = (pmd_t *)__get_free_page(GFP_KERNEL);
+   if (!pmd_sv)
+   return 0;
 
-   for (i = 0; i < PTRS_PER_PMD; i++)
-   if (!pmd_free_pte_page([i], addr + (i * PMD_SIZE)))
-   return 0;
+   for (i = 0; i < PTRS_PER_PMD; i++) {
+   pmd_sv[i] = pmd[i];
+   if (!pmd_none(pmd[i]))
+   pmd_clear([i]);
+   }
 
pud_clear(pud);
+
+   /* INVLPG to clear all paging-structure caches */
+   flush_tlb_kernel_range(addr, addr + PAGE_SIZE-1);
+
+   for (i = 0; i < PTRS_PER_PMD; i++) {
+   if (!pmd_none(pmd_sv[i])) {
+   pte = (pte_t *)pmd_page_vaddr(pmd_sv[i]);
+   free_page((unsigned long)pte);
+   }
+   }
+
+   free_page((unsigned long)pmd_sv);
free_page((unsigned long)pmd);
 
return 1;
@@ -753,7 +773,7 @@ int pud_free_pmd_page(pud_t *pud, unsigned long addr)
  * @pmd: Pointer to a PMD.
  * @addr: Virtual address associated with pmd.
  *
- * Context: The pmd range has been unmaped and TLB purged.
+ * Context: The pmd range has been unmapped and TLB purged.
  * Return: 1 if clearing the entry succeeded. 0 otherwise.
  */
 int pmd_free_pte_page(pmd_t *pmd, unsigned long addr)
@@ -765,6 +785,10 @@ int pmd_free_pte_page(pmd_t *pmd, unsigned long addr)
 
pte = (pte_t *)pmd_page_vaddr(*pmd);
pmd_clear(pmd);
+
+   /* INVLPG to clear all paging-structure caches */
+   flush_tlb_kernel_range(addr, addr + PAGE_SIZE-1);
+
free_page((unsigned long)pte);
 
return 1;


[tip:x86/mm] x86/mm: Disable ioremap free page handling on x86-PAE

2018-07-04 Thread tip-bot for Toshi Kani
Commit-ID:  f967db0b9ed44ec3057a28f3b28efc51df51b835
Gitweb: https://git.kernel.org/tip/f967db0b9ed44ec3057a28f3b28efc51df51b835
Author: Toshi Kani 
AuthorDate: Wed, 27 Jun 2018 08:13:46 -0600
Committer:  Thomas Gleixner 
CommitDate: Wed, 4 Jul 2018 21:37:08 +0200

x86/mm: Disable ioremap free page handling on x86-PAE

ioremap() supports pmd mappings on x86-PAE.  However, the kernel's pmd
tables are not shared among processes on x86-PAE.  Therefore, any
update to sync'd pmd entries needs re-syncing.  Freeing a pte page
also leads to a vmalloc fault and hits the BUG_ON in vmalloc_sync_one().

Disable free page handling on x86-PAE.  pud_free_pmd_page() and
pmd_free_pte_page() simply return 0 if a given pud/pmd entry is present.
This assures that ioremap() does not update sync'd pmd entries at the
cost of falling back to pte mappings.

Fixes: 28ee90fe6048 ("x86/mm: implement free pmd/pte page interfaces")
Reported-by: Joerg Roedel 
Signed-off-by: Toshi Kani 
Signed-off-by: Thomas Gleixner 
Cc: mho...@suse.com
Cc: a...@linux-foundation.org
Cc: h...@zytor.com
Cc: cpan...@codeaurora.org
Cc: linux...@kvack.org
Cc: linux-arm-ker...@lists.infradead.org
Cc: sta...@vger.kernel.org
Cc: Andrew Morton 
Cc: Michal Hocko 
Cc: "H. Peter Anvin" 
Cc: 
Link: https://lkml.kernel.org/r/20180627141348.21777-2-toshi.k...@hpe.com

---
 arch/x86/mm/pgtable.c | 19 +++
 1 file changed, 19 insertions(+)

diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index 47b5951e592b..1aeb7a5dbce5 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -719,6 +719,7 @@ int pmd_clear_huge(pmd_t *pmd)
return 0;
 }
 
+#ifdef CONFIG_X86_64
 /**
  * pud_free_pmd_page - Clear pud entry and free pmd page.
  * @pud: Pointer to a PUD.
@@ -766,4 +767,22 @@ int pmd_free_pte_page(pmd_t *pmd)
 
return 1;
 }
+
+#else /* !CONFIG_X86_64 */
+
+int pud_free_pmd_page(pud_t *pud)
+{
+   return pud_none(*pud);
+}
+
+/*
+ * Disable free page handling on x86-PAE. This assures that ioremap()
+ * does not update sync'd pmd entries. See vmalloc_sync_one().
+ */
+int pmd_free_pte_page(pmd_t *pmd)
+{
+   return pmd_none(*pmd);
+}
+
+#endif /* CONFIG_X86_64 */
 #endif /* CONFIG_HAVE_ARCH_HUGE_VMAP */


[PATCH v4 0/3] fix free pmd/pte page handlings on x86

2018-06-27 Thread Toshi Kani
This series fixes two issues in the x86 ioremap free page handlings
for pud/pmd mappings.

Patch 01 fixes BUG_ON on x86-PAE reported by Joerg.  It disables
the free page handling on x86-PAE.

Patches 02-03 fix a possible issue with speculation, which can cause
stale paging-structure caches.
 - Patch 02 is from Chintan's v9 01/04 patch [1], which adds a new arg
   'addr', with my merge change to patch 01.
 - Patch 03 adds a TLB purge (INVLPG) to purge paging-structure caches
   that may be cached by speculation.  See the patch descriptions for
   more detail.

The patches are based on the tip tree.

[1] https://patchwork.kernel.org/patch/10371015/

v4:
 - Re-wrote patch 2/3 description. (v3-UPDATE)
 - Added NOTE to pud_free_pmd_page().

v3:
 - Fixed a build error in v2.

v2:
 - Reordered patch-set, so that patch 01 can be applied independently.
 - Added a NULL pointer check for the page alloc in patch 03. 

---
Toshi Kani (2):
  1/3 x86/mm: disable ioremap free page handling on x86-PAE
  3/3 x86/mm: add TLB purge to free pmd/pte page interfaces

Chintan Pandya (1):
  2/3 ioremap: Update pgtable free interfaces with addr

---
 arch/arm64/mm/mmu.c   |  4 +--
 arch/x86/mm/pgtable.c | 61 +--
 include/asm-generic/pgtable.h |  8 +++---
 lib/ioremap.c |  4 +--
 4 files changed, 61 insertions(+), 16 deletions(-)


[PATCH v4 1/3] x86/mm: disable ioremap free page handling on x86-PAE

2018-06-27 Thread Toshi Kani
ioremap() supports pmd mappings on x86-PAE.  However, the kernel's pmd
tables are not shared among processes on x86-PAE.  Therefore, any
update to sync'd pmd entries needs re-syncing.  Freeing a pte page
also leads to a vmalloc fault and hits the BUG_ON in vmalloc_sync_one().

Disable free page handling on x86-PAE.  pud_free_pmd_page() and
pmd_free_pte_page() simply return 0 if a given pud/pmd entry is present.
This assures that ioremap() does not update sync'd pmd entries at the
cost of falling back to pte mappings.

Fixes: 28ee90fe6048 ("x86/mm: implement free pmd/pte page interfaces")
Reported-by: Joerg Roedel 
Signed-off-by: Toshi Kani 
Cc: Andrew Morton 
Cc: Michal Hocko 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: "H. Peter Anvin" 
Cc: Joerg Roedel 
Cc: 
---
 arch/x86/mm/pgtable.c |   19 +++
 1 file changed, 19 insertions(+)

diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index 47b5951e592b..1aeb7a5dbce5 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -719,6 +719,7 @@ int pmd_clear_huge(pmd_t *pmd)
return 0;
 }
 
+#ifdef CONFIG_X86_64
 /**
  * pud_free_pmd_page - Clear pud entry and free pmd page.
  * @pud: Pointer to a PUD.
@@ -766,4 +767,22 @@ int pmd_free_pte_page(pmd_t *pmd)
 
return 1;
 }
+
+#else /* !CONFIG_X86_64 */
+
+int pud_free_pmd_page(pud_t *pud)
+{
+   return pud_none(*pud);
+}
+
+/*
+ * Disable free page handling on x86-PAE. This assures that ioremap()
+ * does not update sync'd pmd entries. See vmalloc_sync_one().
+ */
+int pmd_free_pte_page(pmd_t *pmd)
+{
+   return pmd_none(*pmd);
+}
+
+#endif /* CONFIG_X86_64 */
 #endif /* CONFIG_HAVE_ARCH_HUGE_VMAP */


[PATCH v4 3/3] x86/mm: add TLB purge to free pmd/pte page interfaces

2018-06-27 Thread Toshi Kani
ioremap() calls pud_free_pmd_page() / pmd_free_pte_page() when it creates
a pud / pmd map.  The following preconditions are met at their entry.
 - All pte entries for a target pud/pmd address range have been cleared.
 - System-wide TLB purges have been performed for a target pud/pmd address
   range.

The preconditions assure that there is no stale TLB entry for the range.
Speculation may not cache TLB entries, since that requires all levels of
page entries, including ptes, to have the P and A bits set for the
associated address.  However, speculation may cache pud/pmd entries
(paging-structure caches) when they have the P-bit set.

Add a system-wide TLB purge (INVLPG) to a single page after clearing the
pud/pmd entry's P-bit.

SDM 4.10.4.1, Operations that Invalidate TLBs and Paging-Structure Caches,
states that:
  INVLPG invalidates all paging-structure caches associated with the
  current PCID, regardless of the linear addresses to which they correspond.

Fixes: 28ee90fe6048 ("x86/mm: implement free pmd/pte page interfaces")
Signed-off-by: Toshi Kani 
Cc: Andrew Morton 
Cc: Michal Hocko 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: "H. Peter Anvin" 
Cc: Joerg Roedel 
Cc: 
---
 arch/x86/mm/pgtable.c |   36 ++--
 1 file changed, 30 insertions(+), 6 deletions(-)

diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index fbd14e506758..e3deefb891da 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -725,24 +725,44 @@ int pmd_clear_huge(pmd_t *pmd)
  * @pud: Pointer to a PUD.
  * @addr: Virtual address associated with pud.
  *
- * Context: The pud range has been unmaped and TLB purged.
+ * Context: The pud range has been unmapped and TLB purged.
  * Return: 1 if clearing the entry succeeded. 0 otherwise.
+ *
+ * NOTE: Callers must allow a single page allocation.
  */
 int pud_free_pmd_page(pud_t *pud, unsigned long addr)
 {
-   pmd_t *pmd;
+   pmd_t *pmd, *pmd_sv;
+   pte_t *pte;
int i;
 
if (pud_none(*pud))
return 1;
 
pmd = (pmd_t *)pud_page_vaddr(*pud);
+   pmd_sv = (pmd_t *)__get_free_page(GFP_KERNEL);
+   if (!pmd_sv)
+   return 0;
 
-   for (i = 0; i < PTRS_PER_PMD; i++)
-   if (!pmd_free_pte_page([i], addr + (i * PMD_SIZE)))
-   return 0;
+   for (i = 0; i < PTRS_PER_PMD; i++) {
+   pmd_sv[i] = pmd[i];
+   if (!pmd_none(pmd[i]))
+   pmd_clear([i]);
+   }
 
pud_clear(pud);
+
+   /* INVLPG to clear all paging-structure caches */
+   flush_tlb_kernel_range(addr, addr + PAGE_SIZE-1);
+
+   for (i = 0; i < PTRS_PER_PMD; i++) {
+   if (!pmd_none(pmd_sv[i])) {
+   pte = (pte_t *)pmd_page_vaddr(pmd_sv[i]);
+   free_page((unsigned long)pte);
+   }
+   }
+
+   free_page((unsigned long)pmd_sv);
free_page((unsigned long)pmd);
 
return 1;
@@ -753,7 +773,7 @@ int pud_free_pmd_page(pud_t *pud, unsigned long addr)
  * @pmd: Pointer to a PMD.
  * @addr: Virtual address associated with pmd.
  *
- * Context: The pmd range has been unmaped and TLB purged.
+ * Context: The pmd range has been unmapped and TLB purged.
  * Return: 1 if clearing the entry succeeded. 0 otherwise.
  */
 int pmd_free_pte_page(pmd_t *pmd, unsigned long addr)
@@ -765,6 +785,10 @@ int pmd_free_pte_page(pmd_t *pmd, unsigned long addr)
 
pte = (pte_t *)pmd_page_vaddr(*pmd);
pmd_clear(pmd);
+
+   /* INVLPG to clear all paging-structure caches */
+   flush_tlb_kernel_range(addr, addr + PAGE_SIZE-1);
+
free_page((unsigned long)pte);
 
return 1;


[PATCH v4 2/3] ioremap: Update pgtable free interfaces with addr

2018-06-27 Thread Toshi Kani
From: Chintan Pandya 

The following kernel panic was observed on ARM64 platform due to a stale
TLB entry.

 1. ioremap with 4K size, a valid pte page table is set.
 2. iounmap it, its pte entry is set to 0.
 3. ioremap the same address with 2M size, update its pmd entry with
a new value.
 4. CPU may hit an exception because the old pmd entry is still in TLB,
which leads to a kernel panic.

Commit b6bdb7517c3d ("mm/vmalloc: add interfaces to free unmapped page
table") has addressed this panic by falling back to pte mappings in the above
case on ARM64.

To support pmd mappings in all cases, TLB purge needs to be performed
in this case on ARM64.

Add a new arg, 'addr', to pud_free_pmd_page() and pmd_free_pte_page()
so that a TLB purge can be added later in separate patches.

[toshi.k...@hpe.com: merge changes, rewrite patch description]
Fixes: 28ee90fe6048 ("x86/mm: implement free pmd/pte page interfaces")
Signed-off-by: Chintan Pandya 
Signed-off-by: Toshi Kani 
Cc: Andrew Morton 
Cc: Michal Hocko 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: "H. Peter Anvin" 
Cc: Will Deacon 
Cc: Joerg Roedel 
Cc: 
---
 arch/arm64/mm/mmu.c   |4 ++--
 arch/x86/mm/pgtable.c |   12 +++-
 include/asm-generic/pgtable.h |8 
 lib/ioremap.c |4 ++--
 4 files changed, 15 insertions(+), 13 deletions(-)

diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 493ff75670ff..8ae5d7ae4af3 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -977,12 +977,12 @@ int pmd_clear_huge(pmd_t *pmdp)
return 1;
 }
 
-int pud_free_pmd_page(pud_t *pud)
+int pud_free_pmd_page(pud_t *pud, unsigned long addr)
 {
return pud_none(*pud);
 }
 
-int pmd_free_pte_page(pmd_t *pmd)
+int pmd_free_pte_page(pmd_t *pmd, unsigned long addr)
 {
return pmd_none(*pmd);
 }
diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index 1aeb7a5dbce5..fbd14e506758 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -723,11 +723,12 @@ int pmd_clear_huge(pmd_t *pmd)
 /**
  * pud_free_pmd_page - Clear pud entry and free pmd page.
  * @pud: Pointer to a PUD.
+ * @addr: Virtual address associated with pud.
  *
  * Context: The pud range has been unmaped and TLB purged.
  * Return: 1 if clearing the entry succeeded. 0 otherwise.
  */
-int pud_free_pmd_page(pud_t *pud)
+int pud_free_pmd_page(pud_t *pud, unsigned long addr)
 {
pmd_t *pmd;
int i;
@@ -738,7 +739,7 @@ int pud_free_pmd_page(pud_t *pud)
pmd = (pmd_t *)pud_page_vaddr(*pud);
 
for (i = 0; i < PTRS_PER_PMD; i++)
-   if (!pmd_free_pte_page([i]))
+   if (!pmd_free_pte_page([i], addr + (i * PMD_SIZE)))
return 0;
 
pud_clear(pud);
@@ -750,11 +751,12 @@ int pud_free_pmd_page(pud_t *pud)
 /**
  * pmd_free_pte_page - Clear pmd entry and free pte page.
  * @pmd: Pointer to a PMD.
+ * @addr: Virtual address associated with pmd.
  *
  * Context: The pmd range has been unmaped and TLB purged.
  * Return: 1 if clearing the entry succeeded. 0 otherwise.
  */
-int pmd_free_pte_page(pmd_t *pmd)
+int pmd_free_pte_page(pmd_t *pmd, unsigned long addr)
 {
pte_t *pte;
 
@@ -770,7 +772,7 @@ int pmd_free_pte_page(pmd_t *pmd)
 
 #else /* !CONFIG_X86_64 */
 
-int pud_free_pmd_page(pud_t *pud)
+int pud_free_pmd_page(pud_t *pud, unsigned long addr)
 {
return pud_none(*pud);
 }
@@ -779,7 +781,7 @@ int pud_free_pmd_page(pud_t *pud)
  * Disable free page handling on x86-PAE. This assures that ioremap()
  * does not update sync'd pmd entries. See vmalloc_sync_one().
  */
-int pmd_free_pte_page(pmd_t *pmd)
+int pmd_free_pte_page(pmd_t *pmd, unsigned long addr)
 {
return pmd_none(*pmd);
 }
diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h
index f59639afaa39..b081794ba135 100644
--- a/include/asm-generic/pgtable.h
+++ b/include/asm-generic/pgtable.h
@@ -1019,8 +1019,8 @@ int pud_set_huge(pud_t *pud, phys_addr_t addr, pgprot_t prot);
 int pmd_set_huge(pmd_t *pmd, phys_addr_t addr, pgprot_t prot);
 int pud_clear_huge(pud_t *pud);
 int pmd_clear_huge(pmd_t *pmd);
-int pud_free_pmd_page(pud_t *pud);
-int pmd_free_pte_page(pmd_t *pmd);
+int pud_free_pmd_page(pud_t *pud, unsigned long addr);
+int pmd_free_pte_page(pmd_t *pmd, unsigned long addr);
 #else  /* !CONFIG_HAVE_ARCH_HUGE_VMAP */
 static inline int p4d_set_huge(p4d_t *p4d, phys_addr_t addr, pgprot_t prot)
 {
@@ -1046,11 +1046,11 @@ static inline int pmd_clear_huge(pmd_t *pmd)
 {
return 0;
 }
-static inline int pud_free_pmd_page(pud_t *pud)
+static inline int pud_free_pmd_page(pud_t *pud, unsigned long addr)
 {
return 0;
 }
-static inline int pmd_free_pte_page(pmd_t *pmd)
+static inline int pmd_free_pte_page(pmd_t *pmd, unsigned long addr)
 {
return 0;
 }
diff --git a/lib/ioremap.c b/lib/ioremap.c
index 54e5bbaa3200..517f5853ffed 100644
--- a/lib/ioremap.c
+++ b/lib/ioremap.c
@@ -92,7 +92,7 @@ static i

[PATCH v4 2/3] ioremap: Update pgtable free interfaces with addr

2018-06-27 Thread Toshi Kani
From: Chintan Pandya 

The following kernel panic was observed on ARM64 platform due to a stale
TLB entry.

 1. ioremap with 4K size, a valid pte page table is set.
 2. iounmap it, its pte entry is set to 0.
 3. ioremap the same address with 2M size, update its pmd entry with
a new value.
 4. CPU may hit an exception because the old pmd entry is still in TLB,
which leads to a kernel panic.
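For illustration, the four-step failure above can be modeled as a tiny userspace simulation. Everything here is invented for the sketch — the `toy_*` names, and a single cached translation standing in for the TLB; none of it is kernel code:

```c
#include <assert.h>

/* Toy model of one translation: a page-table entry plus the CPU's
 * cached copy (the "TLB"). Nothing here is kernel code. */
struct toy_mmu {
    unsigned long entry;   /* current page-table entry, 0 = not present */
    unsigned long tlb;     /* stale-able cached translation, 0 = none   */
};

static void toy_map_and_touch(struct toy_mmu *m, unsigned long e)
{
    m->entry = e;
    m->tlb = e;            /* CPU caches the translation on first access */
}

/* Buggy teardown: clears the entry but performs no TLB purge. */
static void toy_unmap_no_flush(struct toy_mmu *m)
{
    m->entry = 0;
}

/* The CPU consults its cached translation first, as real TLBs do. */
static unsigned long toy_translate(const struct toy_mmu *m)
{
    return m->tlb ? m->tlb : m->entry;
}
```

Driving this through the four steps (map 4K, unmap without a purge, remap the same address as 2M) leaves `toy_translate()` returning the old translation rather than the new entry — the stale-TLB hit of step 4.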

Commit b6bdb7517c3d ("mm/vmalloc: add interfaces to free unmapped page
table") has addressed this panic by falling back to pte mappings in the above
case on ARM64.

To support pmd mappings in all cases, TLB purge needs to be performed
in this case on ARM64.

Add a new arg, 'addr', to pud_free_pmd_page() and pmd_free_pte_page()
so that a TLB purge can be added later in separate patches.

[toshi.k...@hpe.com: merge changes, rewrite patch description]
Fixes: 28ee90fe6048 ("x86/mm: implement free pmd/pte page interfaces")
Signed-off-by: Chintan Pandya 
Signed-off-by: Toshi Kani 
Cc: Andrew Morton 
Cc: Michal Hocko 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: "H. Peter Anvin" 
Cc: Will Deacon 
Cc: Joerg Roedel 
Cc: 
---
 arch/arm64/mm/mmu.c           |    4 ++--
 arch/x86/mm/pgtable.c         |   12 +++++++-----
 include/asm-generic/pgtable.h |    8 ++++----
 lib/ioremap.c                 |    4 ++--
 4 files changed, 15 insertions(+), 13 deletions(-)

diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 493ff75670ff..8ae5d7ae4af3 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -977,12 +977,12 @@ int pmd_clear_huge(pmd_t *pmdp)
return 1;
 }
 
-int pud_free_pmd_page(pud_t *pud)
+int pud_free_pmd_page(pud_t *pud, unsigned long addr)
 {
return pud_none(*pud);
 }
 
-int pmd_free_pte_page(pmd_t *pmd)
+int pmd_free_pte_page(pmd_t *pmd, unsigned long addr)
 {
return pmd_none(*pmd);
 }
diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index 1aeb7a5dbce5..fbd14e506758 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -723,11 +723,12 @@ int pmd_clear_huge(pmd_t *pmd)
 /**
  * pud_free_pmd_page - Clear pud entry and free pmd page.
  * @pud: Pointer to a PUD.
+ * @addr: Virtual address associated with pud.
  *
  * Context: The pud range has been unmaped and TLB purged.
  * Return: 1 if clearing the entry succeeded. 0 otherwise.
  */
-int pud_free_pmd_page(pud_t *pud)
+int pud_free_pmd_page(pud_t *pud, unsigned long addr)
 {
pmd_t *pmd;
int i;
@@ -738,7 +739,7 @@ int pud_free_pmd_page(pud_t *pud)
pmd = (pmd_t *)pud_page_vaddr(*pud);
 
for (i = 0; i < PTRS_PER_PMD; i++)
-   if (!pmd_free_pte_page(&pmd[i]))
+   if (!pmd_free_pte_page(&pmd[i], addr + (i * PMD_SIZE)))
return 0;
 
pud_clear(pud);
@@ -750,11 +751,12 @@ int pud_free_pmd_page(pud_t *pud)
 /**
  * pmd_free_pte_page - Clear pmd entry and free pte page.
  * @pmd: Pointer to a PMD.
+ * @addr: Virtual address associated with pmd.
  *
  * Context: The pmd range has been unmaped and TLB purged.
  * Return: 1 if clearing the entry succeeded. 0 otherwise.
  */
-int pmd_free_pte_page(pmd_t *pmd)
+int pmd_free_pte_page(pmd_t *pmd, unsigned long addr)
 {
pte_t *pte;
 
@@ -770,7 +772,7 @@ int pmd_free_pte_page(pmd_t *pmd)
 
 #else /* !CONFIG_X86_64 */
 
-int pud_free_pmd_page(pud_t *pud)
+int pud_free_pmd_page(pud_t *pud, unsigned long addr)
 {
return pud_none(*pud);
 }
@@ -779,7 +781,7 @@ int pud_free_pmd_page(pud_t *pud)
  * Disable free page handling on x86-PAE. This assures that ioremap()
  * does not update sync'd pmd entries. See vmalloc_sync_one().
  */
-int pmd_free_pte_page(pmd_t *pmd)
+int pmd_free_pte_page(pmd_t *pmd, unsigned long addr)
 {
return pmd_none(*pmd);
 }
diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h
index f59639afaa39..b081794ba135 100644
--- a/include/asm-generic/pgtable.h
+++ b/include/asm-generic/pgtable.h
@@ -1019,8 +1019,8 @@ int pud_set_huge(pud_t *pud, phys_addr_t addr, pgprot_t prot);
 int pmd_set_huge(pmd_t *pmd, phys_addr_t addr, pgprot_t prot);
 int pud_clear_huge(pud_t *pud);
 int pmd_clear_huge(pmd_t *pmd);
-int pud_free_pmd_page(pud_t *pud);
-int pmd_free_pte_page(pmd_t *pmd);
+int pud_free_pmd_page(pud_t *pud, unsigned long addr);
+int pmd_free_pte_page(pmd_t *pmd, unsigned long addr);
 #else  /* !CONFIG_HAVE_ARCH_HUGE_VMAP */
 static inline int p4d_set_huge(p4d_t *p4d, phys_addr_t addr, pgprot_t prot)
 {
@@ -1046,11 +1046,11 @@ static inline int pmd_clear_huge(pmd_t *pmd)
 {
return 0;
 }
-static inline int pud_free_pmd_page(pud_t *pud)
+static inline int pud_free_pmd_page(pud_t *pud, unsigned long addr)
 {
return 0;
 }
-static inline int pmd_free_pte_page(pmd_t *pmd)
+static inline int pmd_free_pte_page(pmd_t *pmd, unsigned long addr)
 {
return 0;
 }
diff --git a/lib/ioremap.c b/lib/ioremap.c
index 54e5bbaa3200..517f5853ffed 100644
--- a/lib/ioremap.c
+++ b/lib/ioremap.c
@@ -92,7 +92,7 @@ static i
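The `addr + (i * PMD_SIZE)` passed down in the x86 hunk above is plain per-slot offset arithmetic: each pmd slot covers one PMD_SIZE-aligned sub-range of the pud range. A minimal sketch — the `TOY_*` constants are stand-ins for the kernel's `PMD_SIZE` and `PTRS_PER_PMD`, not kernel definitions:

```c
#include <assert.h>

#define TOY_PMD_SIZE     (2UL << 20)   /* 2MB per pmd slot, as on x86-64 */
#define TOY_PTRS_PER_PMD 512           /* pmd entries per pud-sized range */

/* Address handed to pmd_free_pte_page() for slot i of a pud range:
 * the pud's base address plus a linear 2MB offset per slot. */
static unsigned long toy_pmd_addr(unsigned long pud_addr, int i)
{
    return pud_addr + (unsigned long)i * TOY_PMD_SIZE;
}
```

For a pud range starting at some base, slot 0 flushes at the base itself and slot 1 at base + 2MB, matching the expression in the call-site change.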

[PATCH v3-UPADATE 2/3] ioremap: Update pgtable free interfaces with addr

2018-05-17 Thread Toshi Kani
From: Chintan Pandya <cpan...@codeaurora.org>

The following kernel panic was observed on ARM64 platform due to a stale
TLB entry.

 1. ioremap with 4K size, a valid pte page table is set.
 2. iounmap it, its pte entry is set to 0.
 3. ioremap the same address with 2M size, update its pmd entry with
a new value.
 4. CPU may hit an exception because the old pmd entry is still in TLB,
which leads to a kernel panic.

Commit b6bdb7517c3d ("mm/vmalloc: add interfaces to free unmapped page
table") has addressed this panic by falling back to pte mappings in the above
case on ARM64.

To support pmd mappings in all cases, TLB purge needs to be performed
in this case on ARM64.

Add a new arg, 'addr', to pud_free_pmd_page() and pmd_free_pte_page()
so that a TLB purge can be added later in separate patches.

[to...@hpe.com: merge changes, rewrite patch description]
Fixes: 28ee90fe6048 ("x86/mm: implement free pmd/pte page interfaces")
Signed-off-by: Chintan Pandya <cpan...@codeaurora.org>
Signed-off-by: Toshi Kani <toshi.k...@hpe.com>
Cc: Andrew Morton <a...@linux-foundation.org>
Cc: Michal Hocko <mho...@suse.com>
Cc: Thomas Gleixner <t...@linutronix.de>
Cc: Ingo Molnar <mi...@redhat.com>
Cc: "H. Peter Anvin" <h...@zytor.com>
Cc: Will Deacon <will.dea...@arm.com>
Cc: Joerg Roedel <j...@8bytes.org>
Cc: <sta...@vger.kernel.org>
---
v3-UPDATE - Rewrite patch description
---
 arch/arm64/mm/mmu.c           |    4 ++--
 arch/x86/mm/pgtable.c         |   12 +++++++-----
 include/asm-generic/pgtable.h |    8 ++++----
 lib/ioremap.c                 |    4 ++--
 4 files changed, 15 insertions(+), 13 deletions(-)

diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 2dbb2c9f1ec1..da98828609a1 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -973,12 +973,12 @@ int pmd_clear_huge(pmd_t *pmdp)
return 1;
 }
 
-int pud_free_pmd_page(pud_t *pud)
+int pud_free_pmd_page(pud_t *pud, unsigned long addr)
 {
return pud_none(*pud);
 }
 
-int pmd_free_pte_page(pmd_t *pmd)
+int pmd_free_pte_page(pmd_t *pmd, unsigned long addr)
 {
return pmd_none(*pmd);
 }
diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index 3f7180bc5f52..f60fdf411103 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -719,11 +719,12 @@ int pmd_clear_huge(pmd_t *pmd)
 /**
  * pud_free_pmd_page - Clear pud entry and free pmd page.
  * @pud: Pointer to a PUD.
+ * @addr: Virtual address associated with pud.
  *
  * Context: The pud range has been unmaped and TLB purged.
  * Return: 1 if clearing the entry succeeded. 0 otherwise.
  */
-int pud_free_pmd_page(pud_t *pud)
+int pud_free_pmd_page(pud_t *pud, unsigned long addr)
 {
pmd_t *pmd;
int i;
@@ -734,7 +735,7 @@ int pud_free_pmd_page(pud_t *pud)
pmd = (pmd_t *)pud_page_vaddr(*pud);
 
for (i = 0; i < PTRS_PER_PMD; i++)
-   if (!pmd_free_pte_page(&pmd[i]))
+   if (!pmd_free_pte_page(&pmd[i], addr + (i * PMD_SIZE)))
return 0;
 
pud_clear(pud);
@@ -746,11 +747,12 @@ int pud_free_pmd_page(pud_t *pud)
 /**
  * pmd_free_pte_page - Clear pmd entry and free pte page.
  * @pmd: Pointer to a PMD.
+ * @addr: Virtual address associated with pmd.
  *
  * Context: The pmd range has been unmaped and TLB purged.
  * Return: 1 if clearing the entry succeeded. 0 otherwise.
  */
-int pmd_free_pte_page(pmd_t *pmd)
+int pmd_free_pte_page(pmd_t *pmd, unsigned long addr)
 {
pte_t *pte;
 
@@ -766,7 +768,7 @@ int pmd_free_pte_page(pmd_t *pmd)
 
 #else /* !CONFIG_X86_64 */
 
-int pud_free_pmd_page(pud_t *pud)
+int pud_free_pmd_page(pud_t *pud, unsigned long addr)
 {
return pud_none(*pud);
 }
@@ -775,7 +777,7 @@ int pud_free_pmd_page(pud_t *pud)
  * Disable free page handling on x86-PAE. This assures that ioremap()
  * does not update sync'd pmd entries. See vmalloc_sync_one().
  */
-int pmd_free_pte_page(pmd_t *pmd)
+int pmd_free_pte_page(pmd_t *pmd, unsigned long addr)
 {
return pmd_none(*pmd);
 }
diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h
index f59639afaa39..b081794ba135 100644
--- a/include/asm-generic/pgtable.h
+++ b/include/asm-generic/pgtable.h
@@ -1019,8 +1019,8 @@ int pud_set_huge(pud_t *pud, phys_addr_t addr, pgprot_t prot);
 int pmd_set_huge(pmd_t *pmd, phys_addr_t addr, pgprot_t prot);
 int pud_clear_huge(pud_t *pud);
 int pmd_clear_huge(pmd_t *pmd);
-int pud_free_pmd_page(pud_t *pud);
-int pmd_free_pte_page(pmd_t *pmd);
+int pud_free_pmd_page(pud_t *pud, unsigned long addr);
+int pmd_free_pte_page(pmd_t *pmd, unsigned long addr);
 #else  /* !CONFIG_HAVE_ARCH_HUGE_VMAP */
 static inline int p4d_set_huge(p4d_t *p4d, phys_addr_t addr, pgprot_t prot)
 {
@@ -1046,11 +1046,11 @@ static inline int pmd_clear_huge(pmd_t *pmd)
 {
return 0;
 }
-static inline int pud_free_pmd_page(pud_t *pud)
+static inline int pud_free_pmd_page(pud_t *pud, u

[PATCH v3 3/3] x86/mm: add TLB purge to free pmd/pte page interfaces

2018-05-16 Thread Toshi Kani
ioremap() calls pud_free_pmd_page() / pmd_free_pte_page() when it creates
a pud / pmd map.  The following preconditions are met at their entry.
 - All pte entries for a target pud/pmd address range have been cleared.
 - System-wide TLB purges have been performed for a target pud/pmd address
   range.

The preconditions assure that there is no stale TLB entry for the range.
Speculation may not cache TLB entries since it requires all levels of page
entries, including ptes, to have P & A-bits set for an associated address.
However, speculation may cache pud/pmd entries (paging-structure caches)
when they have P-bit set.

Add a system-wide TLB purge (INVLPG) to a single page after clearing
pud/pmd entry's P-bit.

SDM 4.10.4.1, Operations that Invalidate TLBs and Paging-Structure Caches,
states that:
  INVLPG invalidates all paging-structure caches associated with the
  current PCID regardless of the linear addresses to which they correspond.

Fixes: 28ee90fe6048 ("x86/mm: implement free pmd/pte page interfaces")
Signed-off-by: Toshi Kani <toshi.k...@hpe.com>
Cc: Andrew Morton <a...@linux-foundation.org>
Cc: Michal Hocko <mho...@suse.com>
Cc: Thomas Gleixner <t...@linutronix.de>
Cc: Ingo Molnar <mi...@redhat.com>
Cc: "H. Peter Anvin" <h...@zytor.com>
Cc: Joerg Roedel <j...@8bytes.org>
Cc: <sta...@vger.kernel.org>
---
 arch/x86/mm/pgtable.c |   34 ++++++++++++++++++++++++++++------
 1 file changed, 28 insertions(+), 6 deletions(-)

diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index f60fdf411103..7e96594c7e97 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -721,24 +721,42 @@ int pmd_clear_huge(pmd_t *pmd)
  * @pud: Pointer to a PUD.
  * @addr: Virtual address associated with pud.
  *
- * Context: The pud range has been unmaped and TLB purged.
+ * Context: The pud range has been unmapped and TLB purged.
  * Return: 1 if clearing the entry succeeded. 0 otherwise.
  */
 int pud_free_pmd_page(pud_t *pud, unsigned long addr)
 {
-   pmd_t *pmd;
+   pmd_t *pmd, *pmd_sv;
+   pte_t *pte;
int i;
 
if (pud_none(*pud))
return 1;
 
pmd = (pmd_t *)pud_page_vaddr(*pud);
+   pmd_sv = (pmd_t *)__get_free_page(GFP_KERNEL);
+   if (!pmd_sv)
+   return 0;
 
-   for (i = 0; i < PTRS_PER_PMD; i++)
-   if (!pmd_free_pte_page(&pmd[i], addr + (i * PMD_SIZE)))
-   return 0;
+   for (i = 0; i < PTRS_PER_PMD; i++) {
+   pmd_sv[i] = pmd[i];
+   if (!pmd_none(pmd[i]))
+   pmd_clear(&pmd[i]);
+   }
 
pud_clear(pud);
+
+   /* INVLPG to clear all paging-structure caches */
+   flush_tlb_kernel_range(addr, addr + PAGE_SIZE-1);
+
+   for (i = 0; i < PTRS_PER_PMD; i++) {
+   if (!pmd_none(pmd_sv[i])) {
+   pte = (pte_t *)pmd_page_vaddr(pmd_sv[i]);
+   free_page((unsigned long)pte);
+   }
+   }
+
+   free_page((unsigned long)pmd_sv);
free_page((unsigned long)pmd);
 
return 1;
@@ -749,7 +767,7 @@ int pud_free_pmd_page(pud_t *pud, unsigned long addr)
  * @pmd: Pointer to a PMD.
  * @addr: Virtual address associated with pmd.
  *
- * Context: The pmd range has been unmaped and TLB purged.
+ * Context: The pmd range has been unmapped and TLB purged.
  * Return: 1 if clearing the entry succeeded. 0 otherwise.
  */
 int pmd_free_pte_page(pmd_t *pmd, unsigned long addr)
@@ -761,6 +779,10 @@ int pmd_free_pte_page(pmd_t *pmd, unsigned long addr)
 
pte = (pte_t *)pmd_page_vaddr(*pmd);
pmd_clear(pmd);
+
+   /* INVLPG to clear all paging-structure caches */
+   flush_tlb_kernel_range(addr, addr + PAGE_SIZE-1);
+
free_page((unsigned long)pte);
 
return 1;

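The ordering in the new pud_free_pmd_page() above — snapshot the pmd entries, clear them, flush once, then free the saved pte pages — can be sketched as a userspace analogue. The `toy_*` helpers and the flush counter standing in for INVLPG are invented for this sketch, not kernel APIs:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define TOY_PTRS_PER_PMD 8   /* small for illustration; 512 on x86-64 */

static int toy_flushes;      /* stands in for INVLPG broadcasts */

static void toy_flush_tlb(void)
{
    toy_flushes++;
}

/* Snapshot all pmd entries, clear them, flush once, then free the
 * saved pte pages. Freeing only after the flush mirrors the kernel
 * ordering: no CPU may still be able to walk a pte page that has
 * already been returned to the allocator. Returns 1 on success,
 * like pud_free_pmd_page(). */
static int toy_free_pmd_page(void **pmd)
{
    void *sv[TOY_PTRS_PER_PMD];
    int i;

    memcpy(sv, pmd, sizeof(sv));          /* save before clearing */
    for (i = 0; i < TOY_PTRS_PER_PMD; i++)
        pmd[i] = NULL;                    /* clear entries first */

    toy_flush_tlb();                      /* purge after clearing */

    for (i = 0; i < TOY_PTRS_PER_PMD; i++)
        free(sv[i]);                      /* free last; free(NULL) is ok */
    return 1;
}
```

The kernel version additionally allocates the save area with __get_free_page() and bails out with 0 if that allocation fails, before touching any entries.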

[PATCH v3 1/3] x86/mm: disable ioremap free page handling on x86-PAE

2018-05-16 Thread Toshi Kani
ioremap() supports pmd mappings on x86-PAE.  However, kernel's pmd
tables are not shared among processes on x86-PAE.  Therefore, any
update to sync'd pmd entries needs re-syncing.  Freeing a pte page
also leads to a vmalloc fault and hits the BUG_ON in vmalloc_sync_one().

Disable free page handling on x86-PAE.  pud_free_pmd_page() and
pmd_free_pte_page() simply return 0 if a given pud/pmd entry is present.
This assures that ioremap() does not update sync'd pmd entries at the
cost of falling back to pte mappings.

Fixes: 28ee90fe6048 ("x86/mm: implement free pmd/pte page interfaces")
Reported-by: Joerg Roedel <j...@8bytes.org>
Signed-off-by: Toshi Kani <toshi.k...@hpe.com>
Cc: Andrew Morton <a...@linux-foundation.org>
Cc: Michal Hocko <mho...@suse.com>
Cc: Thomas Gleixner <t...@linutronix.de>
Cc: Ingo Molnar <mi...@redhat.com>
Cc: "H. Peter Anvin" <h...@zytor.com>
Cc: Joerg Roedel <j...@8bytes.org>
Cc: <sta...@vger.kernel.org>
---
 arch/x86/mm/pgtable.c |   19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index ffc8c13c50e4..3f7180bc5f52 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -715,6 +715,7 @@ int pmd_clear_huge(pmd_t *pmd)
return 0;
 }
 
+#ifdef CONFIG_X86_64
 /**
  * pud_free_pmd_page - Clear pud entry and free pmd page.
  * @pud: Pointer to a PUD.
@@ -762,4 +763,22 @@ int pmd_free_pte_page(pmd_t *pmd)
 
return 1;
 }
+
+#else /* !CONFIG_X86_64 */
+
+int pud_free_pmd_page(pud_t *pud)
+{
+   return pud_none(*pud);
+}
+
+/*
+ * Disable free page handling on x86-PAE. This assures that ioremap()
+ * does not update sync'd pmd entries. See vmalloc_sync_one().
+ */
+int pmd_free_pte_page(pmd_t *pmd)
+{
+   return pmd_none(*pmd);
+}
+
+#endif /* CONFIG_X86_64 */
 #endif /* CONFIG_HAVE_ARCH_HUGE_VMAP */


[PATCH v3 2/3] ioremap: Update pgtable free interfaces with addr

2018-05-16 Thread Toshi Kani
From: Chintan Pandya <cpan...@codeaurora.org>

Commit b6bdb7517c3d ("mm/vmalloc: add interfaces to free unmapped
page table") added the following two interfaces to free the page
table when implementing huge mappings:

pud_free_pmd_page() and pmd_free_pte_page()

Some architectures (like arm64) need to do proper TLB
maintenance after updating a pagetable entry, even when
creating a mapping. Why? See:
https://patchwork.kernel.org/patch/10134581/

Pass 'addr' in these interfaces so that proper TLB ops
can be performed.

[to...@hpe.com: merge changes]
Fixes: 28ee90fe6048 ("x86/mm: implement free pmd/pte page interfaces")
Signed-off-by: Chintan Pandya <cpan...@codeaurora.org>
Signed-off-by: Toshi Kani <toshi.k...@hpe.com>
Cc: <sta...@vger.kernel.org>
---
 arch/arm64/mm/mmu.c           |    4 ++--
 arch/x86/mm/pgtable.c         |   12 +++++++-----
 include/asm-generic/pgtable.h |    8 ++++----
 lib/ioremap.c                 |    4 ++--
 4 files changed, 15 insertions(+), 13 deletions(-)

diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 2dbb2c9f1ec1..da98828609a1 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -973,12 +973,12 @@ int pmd_clear_huge(pmd_t *pmdp)
return 1;
 }
 
-int pud_free_pmd_page(pud_t *pud)
+int pud_free_pmd_page(pud_t *pud, unsigned long addr)
 {
return pud_none(*pud);
 }
 
-int pmd_free_pte_page(pmd_t *pmd)
+int pmd_free_pte_page(pmd_t *pmd, unsigned long addr)
 {
return pmd_none(*pmd);
 }
diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index 3f7180bc5f52..f60fdf411103 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -719,11 +719,12 @@ int pmd_clear_huge(pmd_t *pmd)
 /**
  * pud_free_pmd_page - Clear pud entry and free pmd page.
  * @pud: Pointer to a PUD.
+ * @addr: Virtual address associated with pud.
  *
  * Context: The pud range has been unmaped and TLB purged.
  * Return: 1 if clearing the entry succeeded. 0 otherwise.
  */
-int pud_free_pmd_page(pud_t *pud)
+int pud_free_pmd_page(pud_t *pud, unsigned long addr)
 {
pmd_t *pmd;
int i;
@@ -734,7 +735,7 @@ int pud_free_pmd_page(pud_t *pud)
pmd = (pmd_t *)pud_page_vaddr(*pud);
 
for (i = 0; i < PTRS_PER_PMD; i++)
-   if (!pmd_free_pte_page(&pmd[i]))
+   if (!pmd_free_pte_page(&pmd[i], addr + (i * PMD_SIZE)))
return 0;
 
pud_clear(pud);
@@ -746,11 +747,12 @@ int pud_free_pmd_page(pud_t *pud)
 /**
  * pmd_free_pte_page - Clear pmd entry and free pte page.
  * @pmd: Pointer to a PMD.
+ * @addr: Virtual address associated with pmd.
  *
  * Context: The pmd range has been unmaped and TLB purged.
  * Return: 1 if clearing the entry succeeded. 0 otherwise.
  */
-int pmd_free_pte_page(pmd_t *pmd)
+int pmd_free_pte_page(pmd_t *pmd, unsigned long addr)
 {
pte_t *pte;
 
@@ -766,7 +768,7 @@ int pmd_free_pte_page(pmd_t *pmd)
 
 #else /* !CONFIG_X86_64 */
 
-int pud_free_pmd_page(pud_t *pud)
+int pud_free_pmd_page(pud_t *pud, unsigned long addr)
 {
return pud_none(*pud);
 }
@@ -775,7 +777,7 @@ int pud_free_pmd_page(pud_t *pud)
  * Disable free page handling on x86-PAE. This assures that ioremap()
  * does not update sync'd pmd entries. See vmalloc_sync_one().
  */
-int pmd_free_pte_page(pmd_t *pmd)
+int pmd_free_pte_page(pmd_t *pmd, unsigned long addr)
 {
return pmd_none(*pmd);
 }
diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h
index f59639afaa39..b081794ba135 100644
--- a/include/asm-generic/pgtable.h
+++ b/include/asm-generic/pgtable.h
@@ -1019,8 +1019,8 @@ int pud_set_huge(pud_t *pud, phys_addr_t addr, pgprot_t prot);
 int pmd_set_huge(pmd_t *pmd, phys_addr_t addr, pgprot_t prot);
 int pud_clear_huge(pud_t *pud);
 int pmd_clear_huge(pmd_t *pmd);
-int pud_free_pmd_page(pud_t *pud);
-int pmd_free_pte_page(pmd_t *pmd);
+int pud_free_pmd_page(pud_t *pud, unsigned long addr);
+int pmd_free_pte_page(pmd_t *pmd, unsigned long addr);
 #else  /* !CONFIG_HAVE_ARCH_HUGE_VMAP */
 static inline int p4d_set_huge(p4d_t *p4d, phys_addr_t addr, pgprot_t prot)
 {
@@ -1046,11 +1046,11 @@ static inline int pmd_clear_huge(pmd_t *pmd)
 {
return 0;
 }
-static inline int pud_free_pmd_page(pud_t *pud)
+static inline int pud_free_pmd_page(pud_t *pud, unsigned long addr)
 {
return 0;
 }
-static inline int pmd_free_pte_page(pmd_t *pmd)
+static inline int pmd_free_pte_page(pmd_t *pmd, unsigned long addr)
 {
return 0;
 }
diff --git a/lib/ioremap.c b/lib/ioremap.c
index 54e5bbaa3200..517f5853ffed 100644
--- a/lib/ioremap.c
+++ b/lib/ioremap.c
@@ -92,7 +92,7 @@ static inline int ioremap_pmd_range(pud_t *pud, unsigned long addr,
if (ioremap_pmd_enabled() &&
((next - addr) == PMD_SIZE) &&
IS_ALIGNED(phys_addr + addr, PMD_SIZE) &&
-   pmd_free_pte_page(pmd)) {
+   pmd_free_pte_page(pmd, addr)) {

[PATCH v3 2/3] ioremap: Update pgtable free interfaces with addr

2018-05-16 Thread Toshi Kani
From: Chintan Pandya 

This patch ("mm/vmalloc: Add interfaces to free unmapped
page table") adds following 2 interfaces to free the page
table in case we implement huge mapping.

pud_free_pmd_page() and pmd_free_pte_page()

Some architectures (like arm64) needs to do proper TLB
maintanance after updating pagetable entry even in map.
Why ? Read this,
https://patchwork.kernel.org/patch/10134581/

Pass 'addr' in these interfaces so that proper TLB ops
can be performed.

[to...@hpe.com: merge changes]
Fixes: 28ee90fe6048 ("x86/mm: implement free pmd/pte page interfaces")
Signed-off-by: Chintan Pandya 
Signed-off-by: Toshi Kani 
Cc: 
---
 arch/arm64/mm/mmu.c   |4 ++--
 arch/x86/mm/pgtable.c |   12 +++-
 include/asm-generic/pgtable.h |8 
 lib/ioremap.c |4 ++--
 4 files changed, 15 insertions(+), 13 deletions(-)

diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 2dbb2c9f1ec1..da98828609a1 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -973,12 +973,12 @@ int pmd_clear_huge(pmd_t *pmdp)
return 1;
 }
 
-int pud_free_pmd_page(pud_t *pud)
+int pud_free_pmd_page(pud_t *pud, unsigned long addr)
 {
return pud_none(*pud);
 }
 
-int pmd_free_pte_page(pmd_t *pmd)
+int pmd_free_pte_page(pmd_t *pmd, unsigned long addr)
 {
return pmd_none(*pmd);
 }
diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index 3f7180bc5f52..f60fdf411103 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -719,11 +719,12 @@ int pmd_clear_huge(pmd_t *pmd)
 /**
  * pud_free_pmd_page - Clear pud entry and free pmd page.
  * @pud: Pointer to a PUD.
+ * @addr: Virtual address associated with pud.
  *
  * Context: The pud range has been unmaped and TLB purged.
  * Return: 1 if clearing the entry succeeded. 0 otherwise.
  */
-int pud_free_pmd_page(pud_t *pud)
+int pud_free_pmd_page(pud_t *pud, unsigned long addr)
 {
pmd_t *pmd;
int i;
@@ -734,7 +735,7 @@ int pud_free_pmd_page(pud_t *pud)
pmd = (pmd_t *)pud_page_vaddr(*pud);
 
for (i = 0; i < PTRS_PER_PMD; i++)
-   if (!pmd_free_pte_page([i]))
+   if (!pmd_free_pte_page([i], addr + (i * PMD_SIZE)))
return 0;
 
pud_clear(pud);
@@ -746,11 +747,12 @@ int pud_free_pmd_page(pud_t *pud)
 /**
  * pmd_free_pte_page - Clear pmd entry and free pte page.
  * @pmd: Pointer to a PMD.
+ * @addr: Virtual address associated with pmd.
  *
  * Context: The pmd range has been unmaped and TLB purged.
  * Return: 1 if clearing the entry succeeded. 0 otherwise.
  */
-int pmd_free_pte_page(pmd_t *pmd)
+int pmd_free_pte_page(pmd_t *pmd, unsigned long addr)
 {
pte_t *pte;
 
@@ -766,7 +768,7 @@ int pmd_free_pte_page(pmd_t *pmd)
 
 #else /* !CONFIG_X86_64 */
 
-int pud_free_pmd_page(pud_t *pud)
+int pud_free_pmd_page(pud_t *pud, unsigned long addr)
 {
return pud_none(*pud);
 }
@@ -775,7 +777,7 @@ int pud_free_pmd_page(pud_t *pud)
  * Disable free page handling on x86-PAE. This assures that ioremap()
  * does not update sync'd pmd entries. See vmalloc_sync_one().
  */
-int pmd_free_pte_page(pmd_t *pmd)
+int pmd_free_pte_page(pmd_t *pmd, unsigned long addr)
 {
return pmd_none(*pmd);
 }
diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h
index f59639afaa39..b081794ba135 100644
--- a/include/asm-generic/pgtable.h
+++ b/include/asm-generic/pgtable.h
@@ -1019,8 +1019,8 @@ int pud_set_huge(pud_t *pud, phys_addr_t addr, pgprot_t prot);
 int pmd_set_huge(pmd_t *pmd, phys_addr_t addr, pgprot_t prot);
 int pud_clear_huge(pud_t *pud);
 int pmd_clear_huge(pmd_t *pmd);
-int pud_free_pmd_page(pud_t *pud);
-int pmd_free_pte_page(pmd_t *pmd);
+int pud_free_pmd_page(pud_t *pud, unsigned long addr);
+int pmd_free_pte_page(pmd_t *pmd, unsigned long addr);
 #else  /* !CONFIG_HAVE_ARCH_HUGE_VMAP */
 static inline int p4d_set_huge(p4d_t *p4d, phys_addr_t addr, pgprot_t prot)
 {
@@ -1046,11 +1046,11 @@ static inline int pmd_clear_huge(pmd_t *pmd)
 {
return 0;
 }
-static inline int pud_free_pmd_page(pud_t *pud)
+static inline int pud_free_pmd_page(pud_t *pud, unsigned long addr)
 {
return 0;
 }
-static inline int pmd_free_pte_page(pmd_t *pmd)
+static inline int pmd_free_pte_page(pmd_t *pmd, unsigned long addr)
 {
return 0;
 }
diff --git a/lib/ioremap.c b/lib/ioremap.c
index 54e5bbaa3200..517f5853ffed 100644
--- a/lib/ioremap.c
+++ b/lib/ioremap.c
@@ -92,7 +92,7 @@ static inline int ioremap_pmd_range(pud_t *pud, unsigned long addr,
if (ioremap_pmd_enabled() &&
((next - addr) == PMD_SIZE) &&
IS_ALIGNED(phys_addr + addr, PMD_SIZE) &&
-   pmd_free_pte_page(pmd)) {
+   pmd_free_pte_page(pmd, addr)) {
if (pmd_set_huge(pmd, phys_addr + addr, prot))
continue;
}
@@ -119,

[PATCH v3 0/3] fix free pmd/pte page handlings on x86

2018-05-16 Thread Toshi Kani
This series fixes two issues in the x86 ioremap free page handling
for pud/pmd mappings.

Patch 01 fixes a BUG_ON on x86-PAE reported by Joerg.  It disables
the free page handling on x86-PAE.

Patches 02-03 fix a possible issue with speculation, which can cause
a stale page-directory cache.
 - Patch 02 is from Chintan's v9 01/04 patch [1], which adds a new arg
   'addr', with my merge change to patch 01.
 - Patch 03 adds a TLB purge (INVLPG) to purge paging-structure caches
   that may be cached by speculation.  See the patch descriptions for
   more detail.

[1] https://patchwork.kernel.org/patch/10371015/

v3:
 - Fixed a build error in v2.

v2:
 - Reordered patch-set, so that patch 01 can be applied independently.
 - Added a NULL pointer check for the page alloc in patch 03. 

---
Toshi Kani (2):
  1/3 x86/mm: disable ioremap free page handling on x86-PAE
  3/3 x86/mm: add TLB purge to free pmd/pte page interfaces

Chintan Pandya (1):
  2/3 ioremap: Update pgtable free interfaces with addr

---
 arch/arm64/mm/mmu.c   |  4 +--
 arch/x86/mm/pgtable.c | 59 +--
 include/asm-generic/pgtable.h |  8 +++---
 lib/ioremap.c |  4 +--
 4 files changed, 59 insertions(+), 16 deletions(-)



[PATCH v2 3/3] x86/mm: add TLB purge to free pmd/pte page interfaces

2018-05-15 Thread Toshi Kani
ioremap() calls pud_free_pmd_page() / pmd_free_pte_page() when it creates
a pud / pmd map.  The following preconditions are met at their entry.
 - All pte entries for a target pud/pmd address range have been cleared.
 - System-wide TLB purges have been performed for a target pud/pmd address
   range.

The preconditions assure that there is no stale TLB entry for the range.
Speculation may not cache TLB entries since it requires all levels of page
entries, including ptes, to have P & A-bits set for an associated address.
However, speculation may cache pud/pmd entries (paging-structure caches)
when they have P-bit set.

Add a system-wide TLB purge (INVLPG) to a single page after clearing
pud/pmd entry's P-bit.

SDM 4.10.4.1, Operations that Invalidate TLBs and Paging-Structure Caches,
states that:
  INVLPG invalidates all paging-structure caches associated with the
  current PCID regardless of the linear addresses to which they correspond.

Fixes: 28ee90fe6048 ("x86/mm: implement free pmd/pte page interfaces")
Signed-off-by: Toshi Kani <toshi.k...@hpe.com>
Cc: Andrew Morton <a...@linux-foundation.org>
Cc: Michal Hocko <mho...@suse.com>
Cc: Thomas Gleixner <t...@linutronix.de>
Cc: Ingo Molnar <mi...@redhat.com>
Cc: "H. Peter Anvin" <h...@zytor.com>
Cc: Joerg Roedel <j...@8bytes.org>
Cc: <sta...@vger.kernel.org>
---
 arch/x86/mm/pgtable.c |   34 --
 1 file changed, 28 insertions(+), 6 deletions(-)

diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index f60fdf411103..7e96594c7e97 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -721,24 +721,42 @@ int pmd_clear_huge(pmd_t *pmd)
  * @pud: Pointer to a PUD.
  * @addr: Virtual address associated with pud.
  *
- * Context: The pud range has been unmaped and TLB purged.
+ * Context: The pud range has been unmapped and TLB purged.
  * Return: 1 if clearing the entry succeeded. 0 otherwise.
  */
 int pud_free_pmd_page(pud_t *pud, unsigned long addr)
 {
-   pmd_t *pmd;
+   pmd_t *pmd, *pmd_sv;
+   pte_t *pte;
int i;
 
if (pud_none(*pud))
return 1;
 
pmd = (pmd_t *)pud_page_vaddr(*pud);
+   pmd_sv = (pmd_t *)__get_free_page(GFP_KERNEL);
+   if (!pmd_sv)
+   return 0;
 
-   for (i = 0; i < PTRS_PER_PMD; i++)
-   if (!pmd_free_pte_page(&pmd[i], addr + (i * PMD_SIZE)))
-   return 0;
+   for (i = 0; i < PTRS_PER_PMD; i++) {
+   pmd_sv[i] = pmd[i];
+   if (!pmd_none(pmd[i]))
+   pmd_clear(&pmd[i]);
+   }
 
pud_clear(pud);
+
+   /* INVLPG to clear all paging-structure caches */
+   flush_tlb_kernel_range(addr, addr + PAGE_SIZE-1);
+
+   for (i = 0; i < PTRS_PER_PMD; i++) {
+   if (!pmd_none(pmd_sv[i])) {
+   pte = (pte_t *)pmd_page_vaddr(pmd_sv[i]);
+   free_page((unsigned long)pte);
+   }
+   }
+
+   free_page((unsigned long)pmd_sv);
free_page((unsigned long)pmd);
 
return 1;
@@ -749,7 +767,7 @@ int pud_free_pmd_page(pud_t *pud, unsigned long addr)
  * @pmd: Pointer to a PMD.
  * @addr: Virtual address associated with pmd.
  *
- * Context: The pmd range has been unmaped and TLB purged.
+ * Context: The pmd range has been unmapped and TLB purged.
  * Return: 1 if clearing the entry succeeded. 0 otherwise.
  */
 int pmd_free_pte_page(pmd_t *pmd, unsigned long addr)
@@ -761,6 +779,10 @@ int pmd_free_pte_page(pmd_t *pmd, unsigned long addr)
 
pte = (pte_t *)pmd_page_vaddr(*pmd);
pmd_clear(pmd);
+
+   /* INVLPG to clear all paging-structure caches */
+   flush_tlb_kernel_range(addr, addr + PAGE_SIZE-1);
+
free_page((unsigned long)pte);
 
return 1;
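
A condensed, self-contained sketch of the ordering the patch above establishes in pud_free_pmd_page(): save and clear every pmd entry (and the pud), issue one TLB purge so no paging-structure cache can still reference the old tables, and only then free the backing pages. All types and helpers here are stand-ins, not the kernel's.

```c
#include <assert.h>

#define NSLOTS 4		/* stand-in for PTRS_PER_PMD */

static int flushed;		/* set once the simulated INVLPG has run */
static int freed_before_flush;	/* records an ordering violation */

static void flush_tlb(void)	 { flushed = 1; }
static void free_pte_page(void)	 { if (!flushed) freed_before_flush = 1; }

static void free_pmd_page_ordered(unsigned long *pmd)
{
	unsigned long pmd_sv[NSLOTS];
	int i;

	for (i = 0; i < NSLOTS; i++) {	/* step 1: save, then clear */
		pmd_sv[i] = pmd[i];
		pmd[i] = 0;
	}

	flush_tlb();			/* step 2: single system-wide purge */

	for (i = 0; i < NSLOTS; i++)	/* step 3: free the saved pages */
		if (pmd_sv[i])
			free_pte_page();
}
```

The point of the saved copy (pmd_sv in the real patch) is exactly step 3: the entries must be cleared before the flush, yet the page addresses are still needed afterwards.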


[PATCH v2 1/3] x86/mm: disable ioremap free page handling on x86-PAE

2018-05-15 Thread Toshi Kani
ioremap() supports pmd mappings on x86-PAE.  However, kernel's pmd
tables are not shared among processes on x86-PAE.  Therefore, any
update to sync'd pmd entries needs re-syncing.  Freeing a pte page
also leads to a vmalloc fault and hits the BUG_ON in vmalloc_sync_one().

Disable free page handling on x86-PAE.  pud_free_pmd_page() and
pmd_free_pte_page() simply return 0 if a given pud/pmd entry is present.
This assures that ioremap() does not update sync'd pmd entries at the
cost of falling back to pte mappings.

Fixes: 28ee90fe6048 ("x86/mm: implement free pmd/pte page interfaces")
Reported-by: Joerg Roedel <j...@8bytes.org>
Signed-off-by: Toshi Kani <toshi.k...@hpe.com>
Cc: Andrew Morton <a...@linux-foundation.org>
Cc: Michal Hocko <mho...@suse.com>
Cc: Thomas Gleixner <t...@linutronix.de>
Cc: Ingo Molnar <mi...@redhat.com>
Cc: "H. Peter Anvin" <h...@zytor.com>
Cc: Joerg Roedel <j...@8bytes.org>
Cc: <sta...@vger.kernel.org>
---
 arch/x86/mm/pgtable.c |   19 +++
 1 file changed, 19 insertions(+)

diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index ffc8c13c50e4..08cdd7c13619 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -715,6 +715,7 @@ int pmd_clear_huge(pmd_t *pmd)
return 0;
 }
 
+#ifdef CONFIG_X86_64
 /**
  * pud_free_pmd_page - Clear pud entry and free pmd page.
  * @pud: Pointer to a PUD.
@@ -762,4 +763,22 @@ int pmd_free_pte_page(pmd_t *pmd)
 
return 1;
 }
+
+#else /* !CONFIG_X86_64 */
+
+int pud_free_pmd_page(pud_t *pud, unsigned long addr)
+{
+   return pud_none(*pud);
+}
+
+/*
+ * Disable free page handling on x86-PAE. This assures that ioremap()
+ * does not update sync'd pmd entries. See vmalloc_sync_one().
+ */
+int pmd_free_pte_page(pmd_t *pmd, unsigned long addr)
+{
+   return pmd_none(*pmd);
+}
+
+#endif /* CONFIG_X86_64 */
 #endif /* CONFIG_HAVE_ARCH_HUGE_VMAP */
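
An illustrative sketch (not kernel code; both helpers are stand-ins) of the fallback the x86-PAE stubs above produce: the stub reports success only when the pmd entry is already clear, so the ioremap path declines the huge mapping and keeps pte mappings whenever a pte page is in place.

```c
/* Mimics the x86-PAE stub: pmd_none() check only, never frees anything. */
static int pmd_free_pte_page_pae(unsigned long pmd_entry)
{
	return pmd_entry == 0;
}

/* Returns 1 if a huge pmd mapping may be installed, 0 for pte fallback. */
static int try_pmd_map(unsigned long pmd_entry)
{
	return pmd_free_pte_page_pae(pmd_entry) ? 1 : 0;
}
```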


[PATCH v2 0/3] fix free pmd/pte page handlings on x86

2018-05-15 Thread Toshi Kani
This series fixes two issues in the x86 ioremap free page handling
for pud/pmd mappings.

Patch 01 fixes a BUG_ON on x86-PAE reported by Joerg.  It disables
the free page handling on x86-PAE.

Patches 02-03 fix a possible issue with speculation, which can cause
a stale page-directory cache.
 - Patch 02 is from Chintan's v9 01/04 patch [1], which adds a new arg
   'addr'.  This avoids merge conflicts with his series.
 - Patch 03 adds a TLB purge (INVLPG) to purge paging-structure caches
   that may be cached by speculation.  See the patch descriptions for
   more detail.

[1] https://patchwork.kernel.org/patch/10371015/

v2:
 - Reordered patch-set, so that patch 01 can be applied independently.
 - Added a NULL pointer check for the page alloc in patch 03. 

---
Toshi Kani (2):
  1/3 x86/mm: disable ioremap free page handling on x86-PAE
  3/3 x86/mm: add TLB purge to free pmd/pte page interfaces

Chintan Pandya (1):
  2/3 ioremap: Update pgtable free interfaces with addr

---
 arch/arm64/mm/mmu.c   |  4 +--
 arch/x86/mm/pgtable.c | 59 +--
 include/asm-generic/pgtable.h |  8 +++---
 lib/ioremap.c |  4 +--
 4 files changed, 59 insertions(+), 16 deletions(-)





[PATCH v2 2/3] ioremap: Update pgtable free interfaces with addr

2018-05-15 Thread Toshi Kani
From: Chintan Pandya <cpan...@codeaurora.org>

This patch ("mm/vmalloc: Add interfaces to free unmapped
page table") adds the following two interfaces to free the page
table in case we implement huge mapping:

pud_free_pmd_page() and pmd_free_pte_page()

Some architectures (like arm64) need to do proper TLB
maintenance after updating a pagetable entry, even on map.
Why? Read this:
https://patchwork.kernel.org/patch/10134581/

Pass 'addr' in these interfaces so that proper TLB ops
can be performed.

Fixes: 28ee90fe6048 ("x86/mm: implement free pmd/pte page interfaces")
Signed-off-by: Chintan Pandya <cpan...@codeaurora.org>
Signed-off-by: Toshi Kani <toshi.k...@hpe.com>
Cc: <sta...@vger.kernel.org>
---
 arch/arm64/mm/mmu.c   |4 ++--
 arch/x86/mm/pgtable.c |8 +---
 include/asm-generic/pgtable.h |8 
 lib/ioremap.c |4 ++--
 4 files changed, 13 insertions(+), 11 deletions(-)

diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 2dbb2c9f1ec1..da98828609a1 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -973,12 +973,12 @@ int pmd_clear_huge(pmd_t *pmdp)
return 1;
 }
 
-int pud_free_pmd_page(pud_t *pud)
+int pud_free_pmd_page(pud_t *pud, unsigned long addr)
 {
return pud_none(*pud);
 }
 
-int pmd_free_pte_page(pmd_t *pmd)
+int pmd_free_pte_page(pmd_t *pmd, unsigned long addr)
 {
return pmd_none(*pmd);
 }
diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index 08cdd7c13619..f60fdf411103 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -719,11 +719,12 @@ int pmd_clear_huge(pmd_t *pmd)
 /**
  * pud_free_pmd_page - Clear pud entry and free pmd page.
  * @pud: Pointer to a PUD.
+ * @addr: Virtual address associated with pud.
  *
  * Context: The pud range has been unmaped and TLB purged.
  * Return: 1 if clearing the entry succeeded. 0 otherwise.
  */
-int pud_free_pmd_page(pud_t *pud)
+int pud_free_pmd_page(pud_t *pud, unsigned long addr)
 {
pmd_t *pmd;
int i;
@@ -734,7 +735,7 @@ int pud_free_pmd_page(pud_t *pud)
pmd = (pmd_t *)pud_page_vaddr(*pud);
 
for (i = 0; i < PTRS_PER_PMD; i++)
-   if (!pmd_free_pte_page(&pmd[i]))
+   if (!pmd_free_pte_page(&pmd[i], addr + (i * PMD_SIZE)))
return 0;
 
pud_clear(pud);
@@ -746,11 +747,12 @@ int pud_free_pmd_page(pud_t *pud)
 /**
  * pmd_free_pte_page - Clear pmd entry and free pte page.
  * @pmd: Pointer to a PMD.
+ * @addr: Virtual address associated with pmd.
  *
  * Context: The pmd range has been unmaped and TLB purged.
  * Return: 1 if clearing the entry succeeded. 0 otherwise.
  */
-int pmd_free_pte_page(pmd_t *pmd)
+int pmd_free_pte_page(pmd_t *pmd, unsigned long addr)
 {
pte_t *pte;
 
diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h
index f59639afaa39..b081794ba135 100644
--- a/include/asm-generic/pgtable.h
+++ b/include/asm-generic/pgtable.h
@@ -1019,8 +1019,8 @@ int pud_set_huge(pud_t *pud, phys_addr_t addr, pgprot_t prot);
 int pmd_set_huge(pmd_t *pmd, phys_addr_t addr, pgprot_t prot);
 int pud_clear_huge(pud_t *pud);
 int pmd_clear_huge(pmd_t *pmd);
-int pud_free_pmd_page(pud_t *pud);
-int pmd_free_pte_page(pmd_t *pmd);
+int pud_free_pmd_page(pud_t *pud, unsigned long addr);
+int pmd_free_pte_page(pmd_t *pmd, unsigned long addr);
 #else  /* !CONFIG_HAVE_ARCH_HUGE_VMAP */
 static inline int p4d_set_huge(p4d_t *p4d, phys_addr_t addr, pgprot_t prot)
 {
@@ -1046,11 +1046,11 @@ static inline int pmd_clear_huge(pmd_t *pmd)
 {
return 0;
 }
-static inline int pud_free_pmd_page(pud_t *pud)
+static inline int pud_free_pmd_page(pud_t *pud, unsigned long addr)
 {
return 0;
 }
-static inline int pmd_free_pte_page(pmd_t *pmd)
+static inline int pmd_free_pte_page(pmd_t *pmd, unsigned long addr)
 {
return 0;
 }
diff --git a/lib/ioremap.c b/lib/ioremap.c
index 54e5bbaa3200..517f5853ffed 100644
--- a/lib/ioremap.c
+++ b/lib/ioremap.c
@@ -92,7 +92,7 @@ static inline int ioremap_pmd_range(pud_t *pud, unsigned long addr,
if (ioremap_pmd_enabled() &&
((next - addr) == PMD_SIZE) &&
IS_ALIGNED(phys_addr + addr, PMD_SIZE) &&
-   pmd_free_pte_page(pmd)) {
+   pmd_free_pte_page(pmd, addr)) {
if (pmd_set_huge(pmd, phys_addr + addr, prot))
continue;
}
@@ -119,7 +119,7 @@ static inline int ioremap_pud_range(p4d_t *p4d, unsigned long addr,
if (ioremap_pud_enabled() &&
((next - addr) == PUD_SIZE) &&
IS_ALIGNED(phys_addr + addr, PUD_SIZE) &&
-   pud_free_pmd_page(pud)) {
+   pud_free_pmd_page(pud, addr)) {
if (pud_set_huge(pud, phys_addr + addr, prot))
continue;
}
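
The per-slot address handed down in the pud-level hunk above, "addr + (i * PMD_SIZE)", is plain arithmetic. A minimal sketch using the x86-64 constants (PMD_SIZE = 2 MiB, PTRS_PER_PMD = 512); the helper name is ours, not the kernel's:

```c
#include <assert.h>

#define PMD_SHIFT	21
#define PMD_SIZE	(1UL << PMD_SHIFT)	/* 2 MiB on x86-64 */
#define PTRS_PER_PMD	512

/* Virtual address associated with pmd slot i of a pud-covered region. */
static unsigned long pmd_slot_addr(unsigned long pud_addr, int i)
{
	return pud_addr + (unsigned long)i * PMD_SIZE;
}
```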



[PATCH] ghes_edac: add DDR4 and NVDIMM memory types

2018-05-09 Thread Toshi Kani
The ghes_edac driver obtains memory type from SMBIOS type 17,
but it does not recognize DDR4 and NVDIMM types.

Add support for DDR4 and NVDIMM types.  The NVDIMM type is set when
the memory type is DDR3/4 and non-volatile.

Reported-by: Robert Elliott <elli...@hpe.com>
Signed-off-by: Toshi Kani <toshi.k...@hpe.com>
Cc: Borislav Petkov <b...@alien8.de>
Cc: Mauro Carvalho Chehab <mche...@kernel.org>
---
 drivers/edac/ghes_edac.c |   12 +++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c
index 68b6ee18bea6..d0399273018d 100644
--- a/drivers/edac/ghes_edac.c
+++ b/drivers/edac/ghes_edac.c
@@ -123,11 +123,21 @@ static void ghes_edac_dmidecode(const struct dmi_header *dh, void *arg)
dimm->mtype = MEM_FB_DDR2;
break;
case 0x18:
-   if (entry->type_detail & 1 << 13)
+   if (entry->type_detail & 1 << 12)
+   dimm->mtype = MEM_NVDIMM;
+   else if (entry->type_detail & 1 << 13)
dimm->mtype = MEM_RDDR3;
else
dimm->mtype = MEM_DDR3;
break;
+   case 0x1a:
+   if (entry->type_detail & 1 << 12)
+   dimm->mtype = MEM_NVDIMM;
+   else if (entry->type_detail & 1 << 13)
+   dimm->mtype = MEM_RDDR4;
+   else
+   dimm->mtype = MEM_DDR4;
+   break;
default:
if (entry->type_detail & 1 << 6)
dimm->mtype = MEM_RMBS;
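
The type_detail tests added above follow SMBIOS Type 17: bit 12 of Type Detail means non-volatile, bit 13 means registered, and Memory Type codes 0x18/0x1a are DDR3/DDR4. A standalone sketch of that decode; the enum and helper are stand-ins for the kernel's:

```c
#include <assert.h>
#include <stdint.h>

enum mem_type { MEM_DDR3, MEM_RDDR3, MEM_DDR4, MEM_RDDR4, MEM_NVDIMM, MEM_OTHER };

static enum mem_type decode_mtype(uint8_t memory_type, uint16_t type_detail)
{
	int nonvolatile = type_detail & (1 << 12);	/* SMBIOS: non-volatile */
	int registered  = type_detail & (1 << 13);	/* SMBIOS: registered   */

	switch (memory_type) {
	case 0x18:					/* DDR3 */
		return nonvolatile ? MEM_NVDIMM : (registered ? MEM_RDDR3 : MEM_DDR3);
	case 0x1a:					/* DDR4 */
		return nonvolatile ? MEM_NVDIMM : (registered ? MEM_RDDR4 : MEM_DDR4);
	default:
		return MEM_OTHER;
	}
}
```

Note the non-volatile check comes first, matching the patch: an NVDIMM reports a DDR3/4 memory type plus the non-volatile detail bit.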



[PATCH 2/3] x86/mm: add TLB purge to free pmd/pte page interfaces

2018-04-30 Thread Toshi Kani
ioremap() calls pud_free_pmd_page() / pmd_free_pte_page() when it creates
a pud / pmd map.  The following preconditions are met at their entry.
 - All pte entries for a target pud/pmd address range have been cleared.
 - System-wide TLB purges have been performed for a target pud/pmd address
   range.

The preconditions assure that there is no stale TLB entry for the range.
Speculation may not cache TLB entries since it requires all levels of page
entries, including ptes, to have P & A-bits set for an associated address.
However, speculation may cache pud/pmd entries (paging-structure caches)
when they have P-bit set.

Add a system-wide TLB purge (INVLPG) to a single page after clearing
pud/pmd entry's P-bit.

SDM 4.10.4.1, Operations that Invalidate TLBs and Paging-Structure Caches,
states that:
  INVLPG invalidates all paging-structure caches associated with the
  current PCID regardless of the linear addresses to which they correspond.

Fixes: 28ee90fe6048 ("x86/mm: implement free pmd/pte page interfaces")
Signed-off-by: Toshi Kani <toshi.k...@hpe.com>
Cc: Andrew Morton <a...@linux-foundation.org>
Cc: Michal Hocko <mho...@suse.com>
Cc: Thomas Gleixner <t...@linutronix.de>
Cc: Ingo Molnar <mi...@redhat.com>
Cc: "H. Peter Anvin" <h...@zytor.com>
Cc: Joerg Roedel <j...@8bytes.org>
Cc: <sta...@vger.kernel.org>
---
 arch/x86/mm/pgtable.c |   32 ++--
 1 file changed, 26 insertions(+), 6 deletions(-)

diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index 37e3cbac59b9..816fd41ee854 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -720,24 +720,40 @@ int pmd_clear_huge(pmd_t *pmd)
  * @pud: Pointer to a PUD.
  * @addr: Virtual address associated with pud.
  *
- * Context: The pud range has been unmaped and TLB purged.
+ * Context: The pud range has been unmapped and TLB purged.
  * Return: 1 if clearing the entry succeeded. 0 otherwise.
  */
 int pud_free_pmd_page(pud_t *pud, unsigned long addr)
 {
-   pmd_t *pmd;
+   pmd_t *pmd, *pmd_sv;
+   pte_t *pte;
int i;
 
if (pud_none(*pud))
return 1;
 
pmd = (pmd_t *)pud_page_vaddr(*pud);
+   pmd_sv = (pmd_t *)__get_free_page(GFP_KERNEL);
 
-   for (i = 0; i < PTRS_PER_PMD; i++)
-   if (!pmd_free_pte_page(&pmd[i], addr + (i * PMD_SIZE)))
-   return 0;
+   for (i = 0; i < PTRS_PER_PMD; i++) {
+   pmd_sv[i] = pmd[i];
+   if (!pmd_none(pmd[i]))
+   pmd_clear(&pmd[i]);
+   }
 
pud_clear(pud);
+
+   /* INVLPG to clear all paging-structure caches */
+   flush_tlb_kernel_range(addr, addr + PAGE_SIZE-1);
+
+   for (i = 0; i < PTRS_PER_PMD; i++) {
+   if (!pmd_none(pmd_sv[i])) {
+   pte = (pte_t *)pmd_page_vaddr(pmd_sv[i]);
+   free_page((unsigned long)pte);
+   }
+   }
+
+   free_page((unsigned long)pmd_sv);
free_page((unsigned long)pmd);
 
return 1;
@@ -748,7 +764,7 @@ int pud_free_pmd_page(pud_t *pud, unsigned long addr)
  * @pmd: Pointer to a PMD.
  * @addr: Virtual address associated with pmd.
  *
- * Context: The pmd range has been unmaped and TLB purged.
+ * Context: The pmd range has been unmapped and TLB purged.
  * Return: 1 if clearing the entry succeeded. 0 otherwise.
  */
 int pmd_free_pte_page(pmd_t *pmd, unsigned long addr)
@@ -760,6 +776,10 @@ int pmd_free_pte_page(pmd_t *pmd, unsigned long addr)
 
pte = (pte_t *)pmd_page_vaddr(*pmd);
pmd_clear(pmd);
+
+   /* INVLPG to clear all paging-structure caches */
+   flush_tlb_kernel_range(addr, addr + PAGE_SIZE-1);
+
free_page((unsigned long)pte);
 
return 1;


[PATCH 0/3] fix free pmd/pte page handlings on x86

2018-04-30 Thread Toshi Kani
This series fixes x86 ioremap free page handling when setting up
pud/pmd maps.

Patch 01 is from Chintan's v9 01/04 patch [1], which adds a new arg 'addr'.
This avoids merge conflicts with his series.

Patch 02 adds a TLB purge (INVLPG) to purge page-structure caches that
may be cached by speculation.  See patch 02 for the details.

Patch 03 disables free page handling on x86-PAE to address BUG_ON reported
by Joerg.

[1] https://patchwork.kernel.org/patch/10371015/

---
Chintan Pandya (1):
  1/3 ioremap: Update pgtable free interfaces with addr

Toshi Kani (2):
  2/3 x86/mm: add TLB purge to free pmd/pte page interfaces
  3/3 x86/mm: disable ioremap free page handling on x86-PAE

---
 arch/arm64/mm/mmu.c   |  4 +--
 arch/x86/mm/pgtable.c | 57 +--
 include/asm-generic/pgtable.h |  8 +++---
 lib/ioremap.c |  4 +--
 4 files changed, 57 insertions(+), 16 deletions(-)


[PATCH 1/3] ioremap: Update pgtable free interfaces with addr

2018-04-30 Thread Toshi Kani
From: Chintan Pandya <cpan...@codeaurora.org>

This patch ("mm/vmalloc: Add interfaces to free unmapped
page table") adds the following two interfaces to free the page
table in case we implement huge mappings:

pud_free_pmd_page() and pmd_free_pte_page()

Some architectures (like arm64) need to do proper TLB
maintenance after updating a pagetable entry, even when mapping.
Why? Read this,
https://patchwork.kernel.org/patch/10134581/

Pass 'addr' in these interfaces so that proper TLB ops
can be performed.

Fixes: b6bdb7517c3d ("mm/vmalloc: add interfaces to free unmapped page table")
Signed-off-by: Chintan Pandya <cpan...@codeaurora.org>
Signed-off-by: Toshi Kani <toshi.k...@hpe.com>
Cc: <sta...@vger.kernel.org>
---
 arch/arm64/mm/mmu.c   |4 ++--
 arch/x86/mm/pgtable.c |8 +---
 include/asm-generic/pgtable.h |8 
 lib/ioremap.c |4 ++--
 4 files changed, 13 insertions(+), 11 deletions(-)

diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 2dbb2c9f1ec1..da98828609a1 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -973,12 +973,12 @@ int pmd_clear_huge(pmd_t *pmdp)
return 1;
 }
 
-int pud_free_pmd_page(pud_t *pud)
+int pud_free_pmd_page(pud_t *pud, unsigned long addr)
 {
return pud_none(*pud);
 }
 
-int pmd_free_pte_page(pmd_t *pmd)
+int pmd_free_pte_page(pmd_t *pmd, unsigned long addr)
 {
return pmd_none(*pmd);
 }
diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index ffc8c13c50e4..37e3cbac59b9 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -718,11 +718,12 @@ int pmd_clear_huge(pmd_t *pmd)
 /**
  * pud_free_pmd_page - Clear pud entry and free pmd page.
  * @pud: Pointer to a PUD.
+ * @addr: Virtual address associated with pud.
  *
  * Context: The pud range has been unmaped and TLB purged.
  * Return: 1 if clearing the entry succeeded. 0 otherwise.
  */
-int pud_free_pmd_page(pud_t *pud)
+int pud_free_pmd_page(pud_t *pud, unsigned long addr)
 {
pmd_t *pmd;
int i;
@@ -733,7 +734,7 @@ int pud_free_pmd_page(pud_t *pud)
pmd = (pmd_t *)pud_page_vaddr(*pud);
 
for (i = 0; i < PTRS_PER_PMD; i++)
-   if (!pmd_free_pte_page(&pmd[i]))
+   if (!pmd_free_pte_page(&pmd[i], addr + (i * PMD_SIZE)))
return 0;
 
pud_clear(pud);
@@ -745,11 +746,12 @@ int pud_free_pmd_page(pud_t *pud)
 /**
  * pmd_free_pte_page - Clear pmd entry and free pte page.
  * @pmd: Pointer to a PMD.
+ * @addr: Virtual address associated with pmd.
  *
  * Context: The pmd range has been unmaped and TLB purged.
  * Return: 1 if clearing the entry succeeded. 0 otherwise.
  */
-int pmd_free_pte_page(pmd_t *pmd)
+int pmd_free_pte_page(pmd_t *pmd, unsigned long addr)
 {
pte_t *pte;
 
diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h
index f59639afaa39..b081794ba135 100644
--- a/include/asm-generic/pgtable.h
+++ b/include/asm-generic/pgtable.h
@@ -1019,8 +1019,8 @@ int pud_set_huge(pud_t *pud, phys_addr_t addr, pgprot_t prot);
 int pmd_set_huge(pmd_t *pmd, phys_addr_t addr, pgprot_t prot);
 int pud_clear_huge(pud_t *pud);
 int pmd_clear_huge(pmd_t *pmd);
-int pud_free_pmd_page(pud_t *pud);
-int pmd_free_pte_page(pmd_t *pmd);
+int pud_free_pmd_page(pud_t *pud, unsigned long addr);
+int pmd_free_pte_page(pmd_t *pmd, unsigned long addr);
 #else  /* !CONFIG_HAVE_ARCH_HUGE_VMAP */
 static inline int p4d_set_huge(p4d_t *p4d, phys_addr_t addr, pgprot_t prot)
 {
@@ -1046,11 +1046,11 @@ static inline int pmd_clear_huge(pmd_t *pmd)
 {
return 0;
 }
-static inline int pud_free_pmd_page(pud_t *pud)
+static inline int pud_free_pmd_page(pud_t *pud, unsigned long addr)
 {
return 0;
 }
-static inline int pmd_free_pte_page(pmd_t *pmd)
+static inline int pmd_free_pte_page(pmd_t *pmd, unsigned long addr)
 {
return 0;
 }
diff --git a/lib/ioremap.c b/lib/ioremap.c
index 54e5bbaa3200..517f5853ffed 100644
--- a/lib/ioremap.c
+++ b/lib/ioremap.c
@@ -92,7 +92,7 @@ static inline int ioremap_pmd_range(pud_t *pud, unsigned long addr,
if (ioremap_pmd_enabled() &&
((next - addr) == PMD_SIZE) &&
IS_ALIGNED(phys_addr + addr, PMD_SIZE) &&
-   pmd_free_pte_page(pmd)) {
+   pmd_free_pte_page(pmd, addr)) {
if (pmd_set_huge(pmd, phys_addr + addr, prot))
continue;
}
@@ -119,7 +119,7 @@ static inline int ioremap_pud_range(p4d_t *p4d, unsigned long addr,
if (ioremap_pud_enabled() &&
((next - addr) == PUD_SIZE) &&
IS_ALIGNED(phys_addr + addr, PUD_SIZE) &&
-   pud_free_pmd_page(pud)) {
+   pud_free_pmd_page(pud, addr)) {
if (pud_set_huge(pud, phys_addr + addr, prot))
continue;
}


[PATCH 3/3] x86/mm: disable ioremap free page handling on x86-PAE

2018-04-30 Thread Toshi Kani
ioremap() supports pmd mappings on x86-PAE.  However, kernel's pmd
tables are not shared among processes on x86-PAE.  Therefore, any
update to sync'd pmd entries needs re-syncing.  Freeing a pte page
also leads to a vmalloc fault and hits the BUG_ON in vmalloc_sync_one().

Disable free page handling on x86-PAE.  pud_free_pmd_page() and
pmd_free_pte_page() simply return 0 if a given pud/pmd entry is present.
This assures that ioremap() does not update sync'd pmd entries at the
cost of falling back to pte mappings.

Fixes: 28ee90fe6048 ("x86/mm: implement free pmd/pte page interfaces")
Reported-by: Joerg Roedel <j...@8bytes.org>
Signed-off-by: Toshi Kani <toshi.k...@hpe.com>
Cc: Andrew Morton <a...@linux-foundation.org>
Cc: Michal Hocko <mho...@suse.com>
Cc: Thomas Gleixner <t...@linutronix.de>
Cc: Ingo Molnar <mi...@redhat.com>
Cc: "H. Peter Anvin" <h...@zytor.com>
Cc: Joerg Roedel <j...@8bytes.org>
Cc: <sta...@vger.kernel.org>
---
 arch/x86/mm/pgtable.c |   19 +++
 1 file changed, 19 insertions(+)

diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index 816fd41ee854..809115150d8b 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -715,6 +715,7 @@ int pmd_clear_huge(pmd_t *pmd)
return 0;
 }
 
+#ifdef CONFIG_X86_64
 /**
  * pud_free_pmd_page - Clear pud entry and free pmd page.
  * @pud: Pointer to a PUD.
@@ -784,4 +785,22 @@ int pmd_free_pte_page(pmd_t *pmd, unsigned long addr)
 
return 1;
 }
+
+#else /* !CONFIG_X86_64 */
+
+int pud_free_pmd_page(pud_t *pud, unsigned long addr)
+{
+   return pud_none(*pud);
+}
+
+/*
+ * Disable free page handling on x86-PAE. This assures that ioremap()
+ * does not update sync'd pmd entries. See vmalloc_sync_one().
+ */
+int pmd_free_pte_page(pmd_t *pmd, unsigned long addr)
+{
+   return pmd_none(*pmd);
+}
+
+#endif /* CONFIG_X86_64 */
 #endif /* CONFIG_HAVE_ARCH_HUGE_VMAP */


[PATCH] pmem: fix badblocks population for raw mode

2018-04-25 Thread Toshi Kani
pmem_attach_disk() calls nvdimm_badblocks_populate() with resource
range uninitialized in the case of raw mode.  This leads the pmem
driver to hit an MCE despite ARS having reported the range as bad.

Initialize 'bb_res' for raw mode.

Fixes: e8d513483300 ("memremap: change devm_memremap_pages interface to use 
struct dev_pagemap")
Signed-off-by: Toshi Kani <toshi.k...@hpe.com>
Cc: Christoph Hellwig <h...@lst.de>
Cc: Dan Williams <dan.j.willi...@intel.com>
Cc: <sta...@vger.kernel.org>
---
 drivers/nvdimm/pmem.c |4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index 9d714926ecf5..2d7875209bce 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -367,9 +367,11 @@ static int pmem_attach_disk(struct device *dev,
addr = devm_memremap_pages(dev, >pgmap);
pmem->pfn_flags |= PFN_MAP;
memcpy(_res, >pgmap.res, sizeof(bb_res));
-   } else
+   } else {
addr = devm_memremap(dev, pmem->phys_addr,
pmem->size, ARCH_MEMREMAP_PMEM);
+   memcpy(_res, res, sizeof(bb_res));
+   }
 
/*
 * At release time the queue must be frozen before


[tip:x86/mm] x86/mm: Remove pointless checks in vmalloc_fault

2018-03-15 Thread tip-bot for Toshi Kani
Commit-ID:  565977a3d929fc4427769117a8ac976ec16776d5
Gitweb: https://git.kernel.org/tip/565977a3d929fc4427769117a8ac976ec16776d5
Author: Toshi Kani <toshi.k...@hpe.com>
AuthorDate: Wed, 14 Mar 2018 14:59:32 -0600
Committer:  Thomas Gleixner <t...@linutronix.de>
CommitDate: Thu, 15 Mar 2018 15:27:47 +0100

x86/mm: Remove pointless checks in vmalloc_fault

vmalloc_fault() sets user's pgd or p4d from the kernel page table.  Once
it's set, all tables underneath are identical. There is no point in
walking the same page table with two separate pointers and checking
that they see the same entries with BUG().

Remove the pointless checks in vmalloc_fault(). Also rename the kernel
pgd/p4d pointers to pgd_k/p4d_k so that their names are consistent in the
file.

Suggested-by: Andy Lutomirski <l...@kernel.org>
Signed-off-by: Toshi Kani <toshi.k...@hpe.com>
Signed-off-by: Thomas Gleixner <t...@linutronix.de>
Cc: linux...@kvack.org
Cc: Borislav Petkov <b...@alien8.de>
Cc: Gratian Crisan <gratian.cri...@ni.com>
Link: https://lkml.kernel.org/r/20180314205932.7193-1-toshi.k...@hpe.com

---
 arch/x86/mm/fault.c | 56 -
 1 file changed, 17 insertions(+), 39 deletions(-)

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 84d702a71afe..70c3b1c43676 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -417,11 +417,11 @@ void vmalloc_sync_all(void)
  */
 static noinline int vmalloc_fault(unsigned long address)
 {
-   pgd_t *pgd, *pgd_ref;
-   p4d_t *p4d, *p4d_ref;
-   pud_t *pud, *pud_ref;
-   pmd_t *pmd, *pmd_ref;
-   pte_t *pte, *pte_ref;
+   pgd_t *pgd, *pgd_k;
+   p4d_t *p4d, *p4d_k;
+   pud_t *pud;
+   pmd_t *pmd;
+   pte_t *pte;
 
/* Make sure we are in vmalloc area: */
if (!(address >= VMALLOC_START && address < VMALLOC_END))
@@ -435,73 +435,51 @@ static noinline int vmalloc_fault(unsigned long address)
 * case just flush:
 */
pgd = (pgd_t *)__va(read_cr3_pa()) + pgd_index(address);
-   pgd_ref = pgd_offset_k(address);
-   if (pgd_none(*pgd_ref))
+   pgd_k = pgd_offset_k(address);
+   if (pgd_none(*pgd_k))
return -1;
 
if (pgtable_l5_enabled) {
if (pgd_none(*pgd)) {
-   set_pgd(pgd, *pgd_ref);
+   set_pgd(pgd, *pgd_k);
arch_flush_lazy_mmu_mode();
} else {
-   BUG_ON(pgd_page_vaddr(*pgd) != pgd_page_vaddr(*pgd_ref));
+   BUG_ON(pgd_page_vaddr(*pgd) != pgd_page_vaddr(*pgd_k));
}
}
 
/* With 4-level paging, copying happens on the p4d level. */
p4d = p4d_offset(pgd, address);
-   p4d_ref = p4d_offset(pgd_ref, address);
-   if (p4d_none(*p4d_ref))
+   p4d_k = p4d_offset(pgd_k, address);
+   if (p4d_none(*p4d_k))
return -1;
 
if (p4d_none(*p4d) && !pgtable_l5_enabled) {
-   set_p4d(p4d, *p4d_ref);
+   set_p4d(p4d, *p4d_k);
arch_flush_lazy_mmu_mode();
} else {
-   BUG_ON(p4d_pfn(*p4d) != p4d_pfn(*p4d_ref));
+   BUG_ON(p4d_pfn(*p4d) != p4d_pfn(*p4d_k));
}
 
-   /*
-* Below here mismatches are bugs because these lower tables
-* are shared:
-*/
BUILD_BUG_ON(CONFIG_PGTABLE_LEVELS < 4);
 
pud = pud_offset(p4d, address);
-   pud_ref = pud_offset(p4d_ref, address);
-   if (pud_none(*pud_ref))
+   if (pud_none(*pud))
return -1;
 
-   if (pud_none(*pud) || pud_pfn(*pud) != pud_pfn(*pud_ref))
-   BUG();
-
if (pud_large(*pud))
return 0;
 
pmd = pmd_offset(pud, address);
-   pmd_ref = pmd_offset(pud_ref, address);
-   if (pmd_none(*pmd_ref))
+   if (pmd_none(*pmd))
return -1;
 
-   if (pmd_none(*pmd) || pmd_pfn(*pmd) != pmd_pfn(*pmd_ref))
-   BUG();
-
if (pmd_large(*pmd))
return 0;
 
-   pte_ref = pte_offset_kernel(pmd_ref, address);
-   if (!pte_present(*pte_ref))
-   return -1;
-
pte = pte_offset_kernel(pmd, address);
-
-   /*
-* Don't use pte_page here, because the mappings can point
-* outside mem_map, and the NUMA hash lookup cannot handle
-* that:
-*/
-   if (!pte_present(*pte) || pte_pfn(*pte) != pte_pfn(*pte_ref))
-   BUG();
+   if (!pte_present(*pte))
+   return -1;
 
return 0;
 }


[PATCH] x86/mm: remove pointless checks in vmalloc_fault

2018-03-14 Thread Toshi Kani
vmalloc_fault() sets user's pgd or p4d from the kernel page table.
Once it's set, all tables underneath are identical. There is no point
in walking the same page table with two separate pointers and checking
that they see the same entries with BUG().

Remove the pointless checks in vmalloc_fault(). Also rename the kernel
pgd/p4d pointers to pgd_k/p4d_k so that their names are consistent in
the file.

Suggested-by: Andy Lutomirski <l...@kernel.org>
Signed-off-by: Toshi Kani <toshi.k...@hpe.com>
Cc: Thomas Gleixner <t...@linutronix.de>
Cc: Ingo Molnar <mi...@redhat.com>
Cc: "H. Peter Anvin" <h...@zytor.com>
Cc: Borislav Petkov <b...@alien8.de>
Cc: Andy Lutomirski <l...@kernel.org>
Cc: Gratian Crisan <gratian.cri...@ni.com>
---
Rebased 2/2 patch on tip.
---
 arch/x86/mm/fault.c |   56 +++
 1 file changed, 17 insertions(+), 39 deletions(-)

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 8e012c3f6ad6..73bd8c95ac71 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -417,11 +417,11 @@ void vmalloc_sync_all(void)
  */
 static noinline int vmalloc_fault(unsigned long address)
 {
-   pgd_t *pgd, *pgd_ref;
-   p4d_t *p4d, *p4d_ref;
-   pud_t *pud, *pud_ref;
-   pmd_t *pmd, *pmd_ref;
-   pte_t *pte, *pte_ref;
+   pgd_t *pgd, *pgd_k;
+   p4d_t *p4d, *p4d_k;
+   pud_t *pud;
+   pmd_t *pmd;
+   pte_t *pte;
 
/* Make sure we are in vmalloc area: */
if (!(address >= VMALLOC_START && address < VMALLOC_END))
@@ -435,73 +435,51 @@ static noinline int vmalloc_fault(unsigned long address)
 * case just flush:
 */
pgd = (pgd_t *)__va(read_cr3_pa()) + pgd_index(address);
-   pgd_ref = pgd_offset_k(address);
-   if (pgd_none(*pgd_ref))
+   pgd_k = pgd_offset_k(address);
+   if (pgd_none(*pgd_k))
return -1;
 
if (pgtable_l5_enabled) {
if (pgd_none(*pgd)) {
-   set_pgd(pgd, *pgd_ref);
+   set_pgd(pgd, *pgd_k);
arch_flush_lazy_mmu_mode();
} else {
-   BUG_ON(pgd_page_vaddr(*pgd) != pgd_page_vaddr(*pgd_ref));
+   BUG_ON(pgd_page_vaddr(*pgd) != pgd_page_vaddr(*pgd_k));
}
}
 
/* With 4-level paging, copying happens on the p4d level. */
p4d = p4d_offset(pgd, address);
-   p4d_ref = p4d_offset(pgd_ref, address);
-   if (p4d_none(*p4d_ref))
+   p4d_k = p4d_offset(pgd_k, address);
+   if (p4d_none(*p4d_k))
return -1;
 
if (p4d_none(*p4d) && !pgtable_l5_enabled) {
-   set_p4d(p4d, *p4d_ref);
+   set_p4d(p4d, *p4d_k);
arch_flush_lazy_mmu_mode();
} else {
-   BUG_ON(p4d_pfn(*p4d) != p4d_pfn(*p4d_ref));
+   BUG_ON(p4d_pfn(*p4d) != p4d_pfn(*p4d_k));
}
 
-   /*
-* Below here mismatches are bugs because these lower tables
-* are shared:
-*/
BUILD_BUG_ON(CONFIG_PGTABLE_LEVELS < 4);
 
pud = pud_offset(p4d, address);
-   pud_ref = pud_offset(p4d_ref, address);
-   if (pud_none(*pud_ref))
+   if (pud_none(*pud))
return -1;
 
-   if (pud_none(*pud) || pud_pfn(*pud) != pud_pfn(*pud_ref))
-   BUG();
-
if (pud_large(*pud))
return 0;
 
pmd = pmd_offset(pud, address);
-   pmd_ref = pmd_offset(pud_ref, address);
-   if (pmd_none(*pmd_ref))
+   if (pmd_none(*pmd))
return -1;
 
-   if (pmd_none(*pmd) || pmd_pfn(*pmd) != pmd_pfn(*pmd_ref))
-   BUG();
-
if (pmd_large(*pmd))
return 0;
 
-   pte_ref = pte_offset_kernel(pmd_ref, address);
-   if (!pte_present(*pte_ref))
-   return -1;
-
pte = pte_offset_kernel(pmd, address);
-
-   /*
-* Don't use pte_page here, because the mappings can point
-* outside mem_map, and the NUMA hash lookup cannot handle
-* that:
-*/
-   if (!pte_present(*pte) || pte_pfn(*pte) != pte_pfn(*pte_ref))
-   BUG();
+   if (!pte_present(*pte))
+   return -1;
 
return 0;
 }



[tip:x86/urgent] x86/mm: Fix vmalloc_fault to use pXd_large

2018-03-14 Thread tip-bot for Toshi Kani
Commit-ID:  18a955219bf7d9008ce480d4451b6b8bf4483a22
Gitweb: https://git.kernel.org/tip/18a955219bf7d9008ce480d4451b6b8bf4483a22
Author: Toshi Kani <toshi.k...@hpe.com>
AuthorDate: Tue, 13 Mar 2018 11:03:46 -0600
Committer:  Thomas Gleixner <t...@linutronix.de>
CommitDate: Wed, 14 Mar 2018 20:22:42 +0100

x86/mm: Fix vmalloc_fault to use pXd_large

Gratian Crisan reported that vmalloc_fault() crashes when CONFIG_HUGETLBFS
is not set since the function inadvertently uses pXn_huge(), which always
returns 0 in this case.  ioremap() does not depend on CONFIG_HUGETLBFS.

Fix vmalloc_fault() to call pXd_large() instead.

Fixes: f4eafd8bcd52 ("x86/mm: Fix vmalloc_fault() to handle large pages properly")
Reported-by: Gratian Crisan <gratian.cri...@ni.com>
Signed-off-by: Toshi Kani <toshi.k...@hpe.com>
Signed-off-by: Thomas Gleixner <t...@linutronix.de>
Cc: sta...@vger.kernel.org
Cc: linux...@kvack.org
Cc: Borislav Petkov <b...@alien8.de>
Cc: Andy Lutomirski <l...@kernel.org>
Link: https://lkml.kernel.org/r/20180313170347.3829-2-toshi.k...@hpe.com

---
 arch/x86/mm/fault.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index c88573d90f3e..25a30b5d6582 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -330,7 +330,7 @@ static noinline int vmalloc_fault(unsigned long address)
if (!pmd_k)
return -1;
 
-   if (pmd_huge(*pmd_k))
+   if (pmd_large(*pmd_k))
return 0;
 
pte_k = pte_offset_kernel(pmd_k, address);
@@ -475,7 +475,7 @@ static noinline int vmalloc_fault(unsigned long address)
if (pud_none(*pud) || pud_pfn(*pud) != pud_pfn(*pud_ref))
BUG();
 
-   if (pud_huge(*pud))
+   if (pud_large(*pud))
return 0;
 
pmd = pmd_offset(pud, address);
@@ -486,7 +486,7 @@ static noinline int vmalloc_fault(unsigned long address)
if (pmd_none(*pmd) || pmd_pfn(*pmd) != pmd_pfn(*pmd_ref))
BUG();
 
-   if (pmd_huge(*pmd))
+   if (pmd_large(*pmd))
return 0;
 
pte_ref = pte_offset_kernel(pmd_ref, address);




[PATCH v2 2/2] x86/mm: implement free pmd/pte page interfaces

2018-03-14 Thread Toshi Kani
Implement pud_free_pmd_page() and pmd_free_pte_page() on x86, which
clear a given pud/pmd entry and free up lower level page table(s).
Address range associated with the pud/pmd entry must have been purged
by INVLPG.

Fixes: e61ce6ade404e ("mm: change ioremap to set up huge I/O mappings")
Signed-off-by: Toshi Kani <toshi.k...@hpe.com>
Cc: Michal Hocko <mho...@suse.com>
Cc: Andrew Morton <a...@linux-foundation.org>
Cc: Thomas Gleixner <t...@linutronix.de>
Cc: Ingo Molnar <mi...@redhat.com>
Cc: "H. Peter Anvin" <h...@zytor.com>
Cc: Borislav Petkov <b...@suse.de>
Cc: Matthew Wilcox <wi...@infradead.org>
Cc: <sta...@vger.kernel.org>
---
 arch/x86/mm/pgtable.c |   28 ++--
 1 file changed, 26 insertions(+), 2 deletions(-)

diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index 1eed7ed518e6..34cda7e0551b 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -712,7 +712,22 @@ int pmd_clear_huge(pmd_t *pmd)
  */
 int pud_free_pmd_page(pud_t *pud)
 {
-   return pud_none(*pud);
+   pmd_t *pmd;
+   int i;
+
+   if (pud_none(*pud))
+   return 1;
+
+   pmd = (pmd_t *)pud_page_vaddr(*pud);
+
+   for (i = 0; i < PTRS_PER_PMD; i++)
+   if (!pmd_free_pte_page(&pmd[i]))
+   return 0;
+
+   pud_clear(pud);
+   free_page((unsigned long)pmd);
+
+   return 1;
 }
 
 /**
@@ -724,6 +739,15 @@ int pud_free_pmd_page(pud_t *pud)
  */
 int pmd_free_pte_page(pmd_t *pmd)
 {
-   return pmd_none(*pmd);
+   pte_t *pte;
+
+   if (pmd_none(*pmd))
+   return 1;
+
+   pte = (pte_t *)pmd_page_vaddr(*pmd);
+   pmd_clear(pmd);
+   free_page((unsigned long)pte);
+
+   return 1;
 }
 #endif /* CONFIG_HAVE_ARCH_HUGE_VMAP */




[PATCH v2 1/2] mm/vmalloc: Add interfaces to free unmapped page table

2018-03-14 Thread Toshi Kani
On architectures with CONFIG_HAVE_ARCH_HUGE_VMAP set, ioremap()
may create pud/pmd mappings.  Kernel panic was observed on arm64
systems with Cortex-A75 in the following steps as described by
Hanjun Guo.

 1. ioremap a 4K size; a valid page table will be built,
 2. iounmap it; pte0 will be set to 0,
 3. ioremap the same address with 2M size; pgd/pmd is unchanged,
then a new value is set for the pmd,
 4. pte0 is leaked,
 5. the CPU may take an exception because the old pmd is still in the TLB,
which will lead to a kernel panic.

This panic is not reproducible on x86.  INVLPG, called from iounmap,
purges all levels of entries associated with the purged address on x86.
x86 still has a memory leak, however.

The patch changes the ioremap path to free unmapped page table(s) since
doing so in the unmap path has the following issues:

 - The iounmap() path is shared with vunmap().  Since vmap() only
   supports pte mappings, making vunmap() free a pte page is an
   overhead for regular vmap users as they do not need a pte page
   freed up.
 - Checking if all entries in a pte page are cleared in the unmap path
   is racy, and serializing this check is expensive.
 - The unmap path calls free_vmap_area_noflush() to do lazy TLB purges.
   Clearing a pud/pmd entry before the lazy TLB purges needs extra TLB
   purge.

Add two interfaces, pud_free_pmd_page() and pmd_free_pte_page(),
which clear a given pud/pmd entry and free up a page for the lower
level entries.

This patch implements their stub functions on x86 and arm64, which
work as a workaround.

Reported-by: Lei Li <lious.li...@hisilicon.com>
Signed-off-by: Toshi Kani <toshi.k...@hpe.com>
Cc: Catalin Marinas <catalin.mari...@arm.com>
Cc: Wang Xuefeng <wxf.w...@hisilicon.com>
Cc: Will Deacon <will.dea...@arm.com>
Cc: Hanjun Guo <guohan...@huawei.com>
Cc: Michal Hocko <mho...@suse.com>
Cc: Andrew Morton <a...@linux-foundation.org>
Cc: Thomas Gleixner <t...@linutronix.de>
Cc: Ingo Molnar <mi...@redhat.com>
Cc: "H. Peter Anvin" <h...@zytor.com>
Cc: Borislav Petkov <b...@suse.de>
Cc: Matthew Wilcox <wi...@infradead.org>
Cc: Chintan Pandya <cpan...@codeaurora.org>
Cc: <sta...@vger.kernel.org>
---
 arch/arm64/mm/mmu.c   |   10 ++
 arch/x86/mm/pgtable.c |   24 
 include/asm-generic/pgtable.h |   10 ++
 lib/ioremap.c |6 --
 4 files changed, 48 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 8c704f1e53c2..2dbb2c9f1ec1 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -972,3 +972,13 @@ int pmd_clear_huge(pmd_t *pmdp)
pmd_clear(pmdp);
return 1;
 }
+
+int pud_free_pmd_page(pud_t *pud)
+{
+   return pud_none(*pud);
+}
+
+int pmd_free_pte_page(pmd_t *pmd)
+{
+   return pmd_none(*pmd);
+}
diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index 004abf9ebf12..1eed7ed518e6 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -702,4 +702,28 @@ int pmd_clear_huge(pmd_t *pmd)
 
return 0;
 }
+
+/**
+ * pud_free_pmd_page - Clear pud entry and free pmd page.
+ * @pud: Pointer to a PUD.
+ *
 * Context: The pud range has been unmapped and TLB purged.
+ * Return: 1 if clearing the entry succeeded. 0 otherwise.
+ */
+int pud_free_pmd_page(pud_t *pud)
+{
+   return pud_none(*pud);
+}
+
+/**
+ * pmd_free_pte_page - Clear pmd entry and free pte page.
+ * @pmd: Pointer to a PMD.
+ *
 * Context: The pmd range has been unmapped and TLB purged.
+ * Return: 1 if clearing the entry succeeded. 0 otherwise.
+ */
+int pmd_free_pte_page(pmd_t *pmd)
+{
+   return pmd_none(*pmd);
+}
 #endif /* CONFIG_HAVE_ARCH_HUGE_VMAP */
diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h
index 2cfa3075d148..2490800f7c5a 100644
--- a/include/asm-generic/pgtable.h
+++ b/include/asm-generic/pgtable.h
@@ -983,6 +983,8 @@ int pud_set_huge(pud_t *pud, phys_addr_t addr, pgprot_t prot);
 int pmd_set_huge(pmd_t *pmd, phys_addr_t addr, pgprot_t prot);
 int pud_clear_huge(pud_t *pud);
 int pmd_clear_huge(pmd_t *pmd);
+int pud_free_pmd_page(pud_t *pud);
+int pmd_free_pte_page(pmd_t *pmd);
 #else  /* !CONFIG_HAVE_ARCH_HUGE_VMAP */
 static inline int p4d_set_huge(p4d_t *p4d, phys_addr_t addr, pgprot_t prot)
 {
@@ -1008,6 +1010,14 @@ static inline int pmd_clear_huge(pmd_t *pmd)
 {
return 0;
 }
+static inline int pud_free_pmd_page(pud_t *pud)
+{
+   return 0;
+}
+static inline int pmd_free_pte_page(pmd_t *pmd)
+{
+   return 0;
+}
 #endif /* CONFIG_HAVE_ARCH_HUGE_VMAP */
 
 #ifndef __HAVE_ARCH_FLUSH_PMD_TLB_RANGE
diff --git a/lib/ioremap.c b/lib/ioremap.c
index b808a390e4c3..54e5bbaa3200 100644
--- a/lib/ioremap.c
+++ b/lib/ioremap.c
@@ -91,7 +91,8 @@ static inline int ioremap_pmd_range(pud_t *pud, unsigned long addr,
 
if (ioremap_pmd_enabled() &&
((next - addr) == PMD_SIZE) &&
-   IS_ALIGNED(phys_addr + addr, PMD_SIZE)) {
+   IS_ALIGNED(phys_addr + addr, PMD_SIZE) &&
+   pmd_free_pte_page(pmd)) {
if (pmd_set_huge(pmd, phys_addr + addr, prot))
continue;
}


[PATCH v2 0/2] fix memory leak / panic in ioremap huge pages

2018-03-14 Thread Toshi Kani
On architectures with CONFIG_HAVE_ARCH_HUGE_VMAP set, ioremap()
may create pud/pmd mappings.  Kernel panic was observed on arm64
systems with Cortex-A75 in the following steps as described by
Hanjun Guo. [1]

 1. ioremap a 4K size; a valid page table will be built,
 2. iounmap it; pte0 will be set to 0,
 3. ioremap the same address with 2M size; pgd/pmd is unchanged,
then a new value is set for the pmd,
 4. pte0 is leaked,
 5. the CPU may take an exception because the old pmd is still in the TLB,
which will lead to a kernel panic.

This panic is not reproducible on x86.  INVLPG, called from iounmap,
purges all levels of entries associated with the purged address on x86.
x86 still has a memory leak, however.

The patch changes the ioremap path to free unmapped page table(s) since
doing so in the unmap path has the following issues:

 - The iounmap() path is shared with vunmap().  Since vmap() only
   supports pte mappings, making vunmap() free a pte page is an
   overhead for regular vmap users as they do not need a pte page
   freed up.
 - Checking if all entries in a pte page are cleared in the unmap path
   is racy, and serializing this check is expensive.
 - The unmap path calls free_vmap_area_noflush() to do lazy TLB purges.
   Clearing a pud/pmd entry before the lazy TLB purges needs extra TLB
   purge.

Patch 01 adds new interfaces as stubs, which work as a workaround for
this issue.  Patch 01 was leveraged from Hanjun's patch. [1]

Patch 02 fixes the issue on x86 by implementing the interfaces.
A separate patch (not included in this series) is necessary for arm64.

[1] https://patchwork.kernel.org/patch/10134581/

---
v2
 - Added cc to stable (Andrew Morton)
 - Added proper function headers (Matthew Wilcox)
 - Added descriptions why fixing in the ioremap path. (Will Deacon)

---
Toshi Kani (2):
 1/2 mm/vmalloc: Add interfaces to free unmapped page table
 2/2 x86/mm: implement free pmd/pte page interfaces

---
 arch/arm64/mm/mmu.c   | 10 ++
 arch/x86/mm/pgtable.c | 44 +++
 include/asm-generic/pgtable.h | 10 ++
 lib/ioremap.c |  6 --
 4 files changed, 68 insertions(+), 2 deletions(-)



[PATCH 2/2] x86/mm: remove pointless checks in vmalloc_fault

2018-03-13 Thread Toshi Kani
vmalloc_fault() sets the user's pgd or p4d from the kernel page table.
Once it is set, all the tables underneath are identical.  There is no
point in walking the same page table with two separate pointers and
verifying that they see the same entries with BUG().

Remove the pointless checks in vmalloc_fault(). Also rename the kernel
pgd/p4d pointers to pgd_k/p4d_k so that their names are consistent in
the file.

Suggested-by: Andy Lutomirski <l...@kernel.org>
Signed-off-by: Toshi Kani <toshi.k...@hpe.com>
Cc: Thomas Gleixner <t...@linutronix.de>
Cc: Ingo Molnar <mi...@redhat.com>
Cc: "H. Peter Anvin" <h...@zytor.com>
Cc: Borislav Petkov <b...@alien8.de>
Cc: Andy Lutomirski <l...@kernel.org>
Cc: Gratian Crisan <gratian.cri...@ni.com>
---
 arch/x86/mm/fault.c |   56 +++
 1 file changed, 17 insertions(+), 39 deletions(-)

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 25a30b5d6582..e7bc79853538 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -417,11 +417,11 @@ void vmalloc_sync_all(void)
  */
 static noinline int vmalloc_fault(unsigned long address)
 {
-   pgd_t *pgd, *pgd_ref;
-   p4d_t *p4d, *p4d_ref;
-   pud_t *pud, *pud_ref;
-   pmd_t *pmd, *pmd_ref;
-   pte_t *pte, *pte_ref;
+   pgd_t *pgd, *pgd_k;
+   p4d_t *p4d, *p4d_k;
+   pud_t *pud;
+   pmd_t *pmd;
+   pte_t *pte;
 
/* Make sure we are in vmalloc area: */
if (!(address >= VMALLOC_START && address < VMALLOC_END))
@@ -435,73 +435,51 @@ static noinline int vmalloc_fault(unsigned long address)
 * case just flush:
 */
pgd = (pgd_t *)__va(read_cr3_pa()) + pgd_index(address);
-   pgd_ref = pgd_offset_k(address);
-   if (pgd_none(*pgd_ref))
+   pgd_k = pgd_offset_k(address);
+   if (pgd_none(*pgd_k))
return -1;
 
if (CONFIG_PGTABLE_LEVELS > 4) {
if (pgd_none(*pgd)) {
-   set_pgd(pgd, *pgd_ref);
+   set_pgd(pgd, *pgd_k);
arch_flush_lazy_mmu_mode();
} else {
-   BUG_ON(pgd_page_vaddr(*pgd) != pgd_page_vaddr(*pgd_ref));
+   BUG_ON(pgd_page_vaddr(*pgd) != pgd_page_vaddr(*pgd_k));
}
}
 
/* With 4-level paging, copying happens on the p4d level. */
p4d = p4d_offset(pgd, address);
-   p4d_ref = p4d_offset(pgd_ref, address);
-   if (p4d_none(*p4d_ref))
+   p4d_k = p4d_offset(pgd_k, address);
+   if (p4d_none(*p4d_k))
return -1;
 
if (p4d_none(*p4d) && CONFIG_PGTABLE_LEVELS == 4) {
-   set_p4d(p4d, *p4d_ref);
+   set_p4d(p4d, *p4d_k);
arch_flush_lazy_mmu_mode();
} else {
-   BUG_ON(p4d_pfn(*p4d) != p4d_pfn(*p4d_ref));
+   BUG_ON(p4d_pfn(*p4d) != p4d_pfn(*p4d_k));
}
 
-   /*
-* Below here mismatches are bugs because these lower tables
-* are shared:
-*/
BUILD_BUG_ON(CONFIG_PGTABLE_LEVELS < 4);
 
pud = pud_offset(p4d, address);
-   pud_ref = pud_offset(p4d_ref, address);
-   if (pud_none(*pud_ref))
+   if (pud_none(*pud))
return -1;
 
-   if (pud_none(*pud) || pud_pfn(*pud) != pud_pfn(*pud_ref))
-   BUG();
-
if (pud_large(*pud))
return 0;
 
pmd = pmd_offset(pud, address);
-   pmd_ref = pmd_offset(pud_ref, address);
-   if (pmd_none(*pmd_ref))
+   if (pmd_none(*pmd))
return -1;
 
-   if (pmd_none(*pmd) || pmd_pfn(*pmd) != pmd_pfn(*pmd_ref))
-   BUG();
-
if (pmd_large(*pmd))
return 0;
 
-   pte_ref = pte_offset_kernel(pmd_ref, address);
-   if (!pte_present(*pte_ref))
-   return -1;
-
pte = pte_offset_kernel(pmd, address);
-
-   /*
-* Don't use pte_page here, because the mappings can point
-* outside mem_map, and the NUMA hash lookup cannot handle
-* that:
-*/
-   if (!pte_present(*pte) || pte_pfn(*pte) != pte_pfn(*pte_ref))
-   BUG();
+   if (!pte_present(*pte))
+   return -1;
 
return 0;
 }



[PATCH 0/2] x86/mm: vmalloc_fault fix for CONFIG_HUGETLBFS off

2018-03-13 Thread Toshi Kani
Gratian Crisan reported that vmalloc_fault() crashes when
CONFIG_HUGETLBFS is not set since the function inadvertently
uses pXd_huge(), which always returns 0 in this case. [1]
ioremap() does not depend on CONFIG_HUGETLBFS.

Patch 01 fixes the issue in vmalloc_fault().
Patch 02 is a clean-up for vmalloc_fault().

[1] https://lkml.org/lkml/2018/3/8/1281

---
Toshi Kani (2):
 1/2 x86/mm: fix vmalloc_fault to use pXd_large
 2/2 x86/mm: remove pointless checks in vmalloc_fault

---
 arch/x86/mm/fault.c | 62 +
 1 file changed, 20 insertions(+), 42 deletions(-)


[PATCH 1/2] x86/mm: fix vmalloc_fault to use pXd_large

2018-03-13 Thread Toshi Kani
Gratian Crisan reported that vmalloc_fault() crashes when
CONFIG_HUGETLBFS is not set since the function inadvertently
uses pXd_huge(), which always returns 0 in this case.  ioremap()
does not depend on CONFIG_HUGETLBFS.

Fix vmalloc_fault() to call pXd_large() instead.

Fixes: f4eafd8bcd52 ("x86/mm: Fix vmalloc_fault() to handle large pages properly")
Reported-by: Gratian Crisan <gratian.cri...@ni.com>
Signed-off-by: Toshi Kani <toshi.k...@hpe.com>
Cc: Thomas Gleixner <t...@linutronix.de>
Cc: Ingo Molnar <mi...@redhat.com>
Cc: "H. Peter Anvin" <h...@zytor.com>
Cc: Borislav Petkov <b...@alien8.de>
Cc: Andy Lutomirski <l...@kernel.org>
Cc: Gratian Crisan <gratian.cri...@ni.com>
Cc: sta...@vger.kernel.org
---
 arch/x86/mm/fault.c |6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index c88573d90f3e..25a30b5d6582 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -330,7 +330,7 @@ static noinline int vmalloc_fault(unsigned long address)
if (!pmd_k)
return -1;
 
-   if (pmd_huge(*pmd_k))
+   if (pmd_large(*pmd_k))
return 0;
 
pte_k = pte_offset_kernel(pmd_k, address);
@@ -475,7 +475,7 @@ static noinline int vmalloc_fault(unsigned long address)
if (pud_none(*pud) || pud_pfn(*pud) != pud_pfn(*pud_ref))
BUG();
 
-   if (pud_huge(*pud))
+   if (pud_large(*pud))
return 0;
 
pmd = pmd_offset(pud, address);
@@ -486,7 +486,7 @@ static noinline int vmalloc_fault(unsigned long address)
if (pmd_none(*pmd) || pmd_pfn(*pmd) != pmd_pfn(*pmd_ref))
BUG();
 
-   if (pmd_huge(*pmd))
+   if (pmd_large(*pmd))
return 0;
 
pte_ref = pte_offset_kernel(pmd_ref, address);


[PATCH 2/2] x86/mm: implement free pmd/pte page interfaces

2018-03-07 Thread Toshi Kani
Implement pud_free_pmd_page() and pmd_free_pte_page() on x86, which
clear a given pud/pmd entry and free up lower level page table(s).
The address range associated with the pud/pmd entry must have been
purged by INVLPG.

Signed-off-by: Toshi Kani <toshi.k...@hpe.com>
Cc: Michal Hocko <mho...@suse.com>
Cc: Andrew Morton <a...@linux-foundation.org>
Cc: Thomas Gleixner <t...@linutronix.de>
Cc: Ingo Molnar <mi...@redhat.com>
Cc: "H. Peter Anvin" <h...@zytor.com>
Cc: Borislav Petkov <b...@suse.de>
---
 arch/x86/mm/pgtable.c |   28 ++--
 1 file changed, 26 insertions(+), 2 deletions(-)

diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index 942f4fa341f1..121c0114439e 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -710,7 +710,22 @@ int pmd_clear_huge(pmd_t *pmd)
  */
 int pud_free_pmd_page(pud_t *pud)
 {
-   return pud_none(*pud);
+   pmd_t *pmd;
+   int i;
+
+   if (pud_none(*pud))
+   return 1;
+
+   pmd = (pmd_t *)pud_page_vaddr(*pud);
+
+   for (i = 0; i < PTRS_PER_PMD; i++)
+   if (!pmd_free_pte_page(&pmd[i]))
+   return 0;
+
+   pud_clear(pud);
+   free_page((unsigned long)pmd);
+
+   return 1;
 }
 
 /**
@@ -720,6 +735,15 @@ int pud_free_pmd_page(pud_t *pud)
  */
 int pmd_free_pte_page(pmd_t *pmd)
 {
-   return pmd_none(*pmd);
+   pte_t *pte;
+
+   if (pmd_none(*pmd))
+   return 1;
+
+   pte = (pte_t *)pmd_page_vaddr(*pmd);
+   pmd_clear(pmd);
+   free_page((unsigned long)pte);
+
+   return 1;
 }
 #endif /* CONFIG_HAVE_ARCH_HUGE_VMAP */


[PATCH 1/2] mm/vmalloc: Add interfaces to free unused page table

2018-03-07 Thread Toshi Kani
On architectures with CONFIG_HAVE_ARCH_HUGE_VMAP set, ioremap()
may create pud/pmd mappings.  Kernel panic was observed on arm64
systems with Cortex-A75 in the following steps as described by
Hanjun Guo.

1. ioremap a 4K region; a valid page table is built,
2. iounmap it; pte0 is set to 0;
3. ioremap the same address with a 2M size; pgd/pmd are unchanged,
   then a new value is set for the pmd;
4. pte0 is leaked;
5. the CPU may take an exception because the old pmd is still in the TLB,
   which leads to a kernel panic.

This panic is not reproducible on x86.  INVLPG, called from iounmap,
purges all levels of entries associated with the purged address on x86.
x86 still has the memory leak, however.

Add two interfaces, pud_free_pmd_page() and pmd_free_pte_page(),
which clear a given pud/pmd entry and free up a page for the lower
level entries.

This patch implements their stub functions on x86 and arm64, which
serve as a workaround.

Reported-by: Lei Li <lious.li...@hisilicon.com>
Signed-off-by: Toshi Kani <toshi.k...@hpe.com>
Cc: Catalin Marinas <catalin.mari...@arm.com>
Cc: Wang Xuefeng <wxf.w...@hisilicon.com>
Cc: Will Deacon <will.dea...@arm.com>
Cc: Hanjun Guo <guohan...@huawei.com>
Cc: Michal Hocko <mho...@suse.com>
Cc: Andrew Morton <a...@linux-foundation.org>
Cc: Thomas Gleixner <t...@linutronix.de>
Cc: Ingo Molnar <mi...@redhat.com>
Cc: "H. Peter Anvin" <h...@zytor.com>
Cc: Borislav Petkov <b...@suse.de>
---
 arch/arm64/mm/mmu.c   |   10 ++
 arch/x86/mm/pgtable.c |   20 
 include/asm-generic/pgtable.h |   10 ++
 lib/ioremap.c |6 --
 4 files changed, 44 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 84a019f55022..84a37b4bc28e 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -972,3 +972,13 @@ int pmd_clear_huge(pmd_t *pmdp)
pmd_clear(pmdp);
return 1;
 }
+
+int pud_free_pmd_page(pud_t *pud)
+{
+   return pud_none(*pud);
+}
+
+int pmd_free_pte_page(pmd_t *pmd)
+{
+   return pmd_none(*pmd);
+}
diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index 004abf9ebf12..942f4fa341f1 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -702,4 +702,24 @@ int pmd_clear_huge(pmd_t *pmd)
 
return 0;
 }
+
+/**
+ * pud_free_pmd_page - clear pud entry and free pmd page
+ *
+ * Returns 1 on success and 0 on failure (pud not cleared).
+ */
+int pud_free_pmd_page(pud_t *pud)
+{
+   return pud_none(*pud);
+}
+
+/**
+ * pmd_free_pte_page - clear pmd entry and free pte page
+ *
+ * Returns 1 on success and 0 on failure (pmd not cleared).
+ */
+int pmd_free_pte_page(pmd_t *pmd)
+{
+   return pmd_none(*pmd);
+}
 #endif /* CONFIG_HAVE_ARCH_HUGE_VMAP */
diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h
index 2cfa3075d148..2490800f7c5a 100644
--- a/include/asm-generic/pgtable.h
+++ b/include/asm-generic/pgtable.h
@@ -983,6 +983,8 @@ int pud_set_huge(pud_t *pud, phys_addr_t addr, pgprot_t prot);
 int pmd_set_huge(pmd_t *pmd, phys_addr_t addr, pgprot_t prot);
 int pud_clear_huge(pud_t *pud);
 int pmd_clear_huge(pmd_t *pmd);
+int pud_free_pmd_page(pud_t *pud);
+int pmd_free_pte_page(pmd_t *pmd);
 #else  /* !CONFIG_HAVE_ARCH_HUGE_VMAP */
 static inline int p4d_set_huge(p4d_t *p4d, phys_addr_t addr, pgprot_t prot)
 {
@@ -1008,6 +1010,14 @@ static inline int pmd_clear_huge(pmd_t *pmd)
 {
return 0;
 }
+static inline int pud_free_pmd_page(pud_t *pud)
+{
+   return 0;
+}
+static inline int pmd_free_pte_page(pmd_t *pmd)
+{
+   return 0;
+}
 #endif /* CONFIG_HAVE_ARCH_HUGE_VMAP */
 
 #ifndef __HAVE_ARCH_FLUSH_PMD_TLB_RANGE
diff --git a/lib/ioremap.c b/lib/ioremap.c
index b808a390e4c3..54e5bbaa3200 100644
--- a/lib/ioremap.c
+++ b/lib/ioremap.c
@@ -91,7 +91,8 @@ static inline int ioremap_pmd_range(pud_t *pud, unsigned long addr,
 
if (ioremap_pmd_enabled() &&
((next - addr) == PMD_SIZE) &&
-   IS_ALIGNED(phys_addr + addr, PMD_SIZE)) {
+   IS_ALIGNED(phys_addr + addr, PMD_SIZE) &&
+   pmd_free_pte_page(pmd)) {
if (pmd_set_huge(pmd, phys_addr + addr, prot))
continue;
}
@@ -117,7 +118,8 @@ static inline int ioremap_pud_range(p4d_t *p4d, unsigned long addr,
 
if (ioremap_pud_enabled() &&
((next - addr) == PUD_SIZE) &&
-   IS_ALIGNED(phys_addr + addr, PUD_SIZE)) {
+   IS_ALIGNED(phys_addr + addr, PUD_SIZE) &&
+   pud_free_pmd_page(pud)) {
if (pud_set_huge(pud, phys_addr + addr, prot))
continue;
}


[PATCH 0/2] fix memory leak / panic in ioremap huge pages

2018-03-07 Thread Toshi Kani
On architectures with CONFIG_HAVE_ARCH_HUGE_VMAP set, ioremap()
may create pud/pmd mappings.  Kernel panic was observed on arm64
systems with Cortex-A75 in the following steps as described by
Hanjun Guo. [1]

1. ioremap a 4K region; a valid page table is built,
2. iounmap it; pte0 is set to 0;
3. ioremap the same address with a 2M size; pgd/pmd are unchanged,
   then a new value is set for the pmd;
4. pte0 is leaked;
5. the CPU may take an exception because the old pmd is still in the TLB,
   which leads to a kernel panic.

This panic is not reproducible on x86.  INVLPG, called from iounmap,
purges all levels of entries associated with the purged address on x86.
x86 still has the memory leak, however.

Patch 01 adds the new interfaces as stubs, which serve as a workaround
for this issue.  Patch 01 was leveraged from Hanjun's patch. [1]
Patch 02 fixes the issue on x86 by implementing the interfaces.

[1] https://patchwork.kernel.org/patch/10134581/

---
Toshi Kani (2):
 1/2 mm/vmalloc: Add interfaces to free unused page table
 2/2 x86/mm: implement free pmd/pte page interfaces

---
 arch/arm64/mm/mmu.c   | 10 ++
 arch/x86/mm/pgtable.c | 44 +++
 include/asm-generic/pgtable.h | 10 ++
 lib/ioremap.c |  6 --
 4 files changed, 68 insertions(+), 2 deletions(-)


[PATCH 0/2] update label size handlings per UEFI 2.7

2018-02-23 Thread Toshi Kani
This patchset updates label storage size check and index block size
calculation according to UEFI 2.7 spec.

---
Toshi Kani (2):
 1/2 libnvdimm, label: change min label storage size per UEFI 2.7
 2/2 libnvdimm, label: change nvdimm_num_label_slots per UEFI 2.7

---
 drivers/nvdimm/label.c | 34 --
 drivers/nvdimm/label.h |  2 +-
 2 files changed, 25 insertions(+), 11 deletions(-)


[PATCH 1/2] libnvdimm, label: change min label storage size per UEFI 2.7

2018-02-23 Thread Toshi Kani
UEFI 2.7 defines on page 758 that:

  Initial Label Storage Area Configuration
 :
  The minimum size of the Label Storage Area is large enough to
  hold 2 index blocks and 2 labels.

The minimum index block size is 256 bytes, and the minimum label size
is also 256 bytes.

Change ND_LABEL_MIN_SIZE to (256 * 4) so that NVDIMM devices with
the minimum label storage area do not fail with the size check in
nvdimm_init_config_data().

Signed-off-by: Toshi Kani <toshi.k...@hpe.com>
Cc: Dan Williams <dan.j.willi...@intel.com>
---
 drivers/nvdimm/label.h |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/nvdimm/label.h b/drivers/nvdimm/label.h
index 1ebf4d3d01ba..18bbe183b3a9 100644
--- a/drivers/nvdimm/label.h
+++ b/drivers/nvdimm/label.h
@@ -33,7 +33,7 @@ enum {
BTTINFO_UUID_LEN = 16,
BTTINFO_FLAG_ERROR = 0x1,/* error state (read-only) */
BTTINFO_MAJOR_VERSION = 1,
-   ND_LABEL_MIN_SIZE = 512 * 129, /* see sizeof_namespace_index() */
+   ND_LABEL_MIN_SIZE = 256 * 4, /* see sizeof_namespace_index() */
ND_LABEL_ID_SIZE = 50,
ND_NSINDEX_INIT = 0x1,
 };


[PATCH 2/2] libnvdimm, label: change nvdimm_num_label_slots per UEFI 2.7

2018-02-23 Thread Toshi Kani
sizeof_namespace_index() fails when NVDIMM devices have the minimum
1024-byte label storage area.  nvdimm_num_label_slots() returns 3
slots while the area is only big enough for 2 slots.

Change nvdimm_num_label_slots() to calculate a number of label slots
according to UEFI 2.7 spec.

Signed-off-by: Toshi Kani <toshi.k...@hpe.com>
Cc: Dan Williams <dan.j.willi...@intel.com>
---
 drivers/nvdimm/label.c |   34 --
 1 file changed, 24 insertions(+), 10 deletions(-)

diff --git a/drivers/nvdimm/label.c b/drivers/nvdimm/label.c
index de66c02f6140..be3ccf7c5413 100644
--- a/drivers/nvdimm/label.c
+++ b/drivers/nvdimm/label.c
@@ -45,9 +45,27 @@ unsigned sizeof_namespace_label(struct nvdimm_drvdata *ndd)
return ndd->nslabel_size;
 }
 
+static size_t __sizeof_namespace_index(u32 nslot)
+{
+   return ALIGN(sizeof(struct nd_namespace_index) + DIV_ROUND_UP(nslot, 8),
+   NSINDEX_ALIGN);
+}
+
+static int __nvdimm_num_label_slots(struct nvdimm_drvdata *ndd,
+   size_t index_size)
+{
+   return (ndd->nsarea.config_size - index_size * 2) /
+   sizeof_namespace_label(ndd);
+}
+
 int nvdimm_num_label_slots(struct nvdimm_drvdata *ndd)
 {
-   return ndd->nsarea.config_size / (sizeof_namespace_label(ndd) + 1);
+   u32 tmp_nslot, n;
+
+   tmp_nslot = ndd->nsarea.config_size / sizeof_namespace_label(ndd);
+   n = __sizeof_namespace_index(tmp_nslot) / NSINDEX_ALIGN;
+
+   return __nvdimm_num_label_slots(ndd, NSINDEX_ALIGN * n);
 }
 
 size_t sizeof_namespace_index(struct nvdimm_drvdata *ndd)
@@ -55,18 +73,14 @@ size_t sizeof_namespace_index(struct nvdimm_drvdata *ndd)
u32 nslot, space, size;
 
/*
-* The minimum index space is 512 bytes, with that amount of
-* index we can describe ~1400 labels which is less than a byte
-* of overhead per label.  Round up to a byte of overhead per
-* label and determine the size of the index region.  Yes, this
-* starts to waste space at larger config_sizes, but it's
-* unlikely we'll ever see anything but 128K.
+* Per UEFI 2.7, the minimum size of the Label Storage Area is large
+* enough to hold 2 index blocks and 2 labels.  The minimum index
+* block size is 256 bytes, and the minimum label size is 256 bytes.
 */
nslot = nvdimm_num_label_slots(ndd);
space = ndd->nsarea.config_size - nslot * sizeof_namespace_label(ndd);
-   size = ALIGN(sizeof(struct nd_namespace_index) + DIV_ROUND_UP(nslot, 8),
-   NSINDEX_ALIGN) * 2;
-   if (size <= space)
+   size = __sizeof_namespace_index(nslot) * 2;
+   if (size <= space && nslot >= 2)
return size / 2;
 
dev_err(ndd->dev, "label area (%d) too small to host (%d byte) labels\n",


[PATCH] acpi, nfit: fix register dimm error handling

2018-02-02 Thread Toshi Kani
A NULL pointer dereference kernel bug was observed when
acpi_nfit_add_dimm(), called in acpi_nfit_register_dimms(),
failed. This error path does not set nfit_mem->nvdimm, but
the 2nd list_for_each_entry() loop in the function assumes
it is always set. Add a NULL check for nfit_mem->nvdimm.

Signed-off-by: Toshi Kani <toshi.k...@hpe.com>
Cc: Dan Williams <dan.j.willi...@intel.com>
Cc: "Rafael J. Wysocki" <r...@rjwysocki.net>
---
 drivers/acpi/nfit/core.c |3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/acpi/nfit/core.c b/drivers/acpi/nfit/core.c
index abeb4df4f22e..b28ce440a06f 100644
--- a/drivers/acpi/nfit/core.c
+++ b/drivers/acpi/nfit/core.c
@@ -1867,6 +1867,9 @@ static int acpi_nfit_register_dimms(struct acpi_nfit_desc *acpi_desc)
struct kernfs_node *nfit_kernfs;
 
nvdimm = nfit_mem->nvdimm;
+   if (!nvdimm)
+   continue;
+
nfit_kernfs = sysfs_get_dirent(nvdimm_kobj(nvdimm)->sd, "nfit");
if (nfit_kernfs)
nfit_mem->flags_attr = sysfs_get_dirent(nfit_kernfs,


[PATCH v5 3/5] ghes_edac: add platform check to enable ghes_edac

2017-08-31 Thread Toshi Kani
The ghes_edac driver was introduced in 2013 [1], but it has not
been enabled by any distro yet.  This driver obtains error info
from firmware interfaces, which are not properly implemented on
many platforms, as the driver always emits the messages below:

 This EDAC driver relies on BIOS to enumerate memory and get error reports.
 Unfortunately, not all BIOSes reflect the memory layout correctly
 So, the end result of using this driver varies from vendor to vendor
 If you find incorrect reports, please contact your hardware vendor
 to correct its BIOS.

To get out from this situation, add a platform check to selectively
enable the driver on the platforms that are known to have proper
firmware implementation.  Platform vendors can add their platforms
to the list when they support ghes_edac.

"ghes_edac.force_load=1" skips this platform check.

[1]: https://lwn.net/Articles/538438/
Signed-off-by: Toshi Kani <toshi.k...@hpe.com>
Cc: Borislav Petkov <b...@alien8.de>
Cc: Mauro Carvalho Chehab <mche...@kernel.org>
Cc: Tony Luck <tony.l...@intel.com>
---
v5
- Remove prefix from 'force_load'
- Update comment of force_load option
---
 drivers/edac/ghes_edac.c |   28 +++-
 1 file changed, 23 insertions(+), 5 deletions(-)

diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c
index 8d904df..68b6ee1 100644
--- a/drivers/edac/ghes_edac.c
+++ b/drivers/edac/ghes_edac.c
@@ -38,6 +38,10 @@ static struct ghes_edac_pvt *ghes_pvt;
  */
 static DEFINE_SPINLOCK(ghes_lock);
 
+/* "ghes_edac.force_load=1" skips the platform check */
+static bool __read_mostly force_load;
+module_param(force_load, bool, 0);
+
 /* Memory Device - Type 17 of SMBIOS spec */
 struct memdev_dmi_entry {
u8 type;
@@ -415,6 +419,14 @@ void ghes_edac_report_mem_error(struct ghes *ghes, int sev,
spin_unlock_irqrestore(&ghes_lock, flags);
 }
 
+/*
+ * Known systems that are safe to enable this module.
+ */
+static struct acpi_platform_list plat_list[] = {
+   {"HPE   ", "Server  ", 0, ACPI_SIG_FADT, all_versions},
+   { } /* End */
+};
+
 int ghes_edac_register(struct ghes *ghes, struct device *dev)
 {
bool fake = false;
@@ -422,6 +434,12 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
struct mem_ctl_info *mci;
struct edac_mc_layer layers[1];
struct ghes_edac_dimm_fill dimm_fill;
+   int idx;
+
+   /* Check if safe to enable on this system */
+   idx = acpi_match_platform_list(plat_list);
+   if (!force_load && idx < 0)
+   return 0;
 
/*
 * We have only one logical memory controller to which all DIMMs belong.
@@ -460,17 +478,17 @@ int ghes_edac_register(struct ghes *ghes, struct device 
*dev)
mci->ctl_name = "ghes_edac";
mci->dev_name = "ghes";
 
-   if (!fake) {
+   if (fake) {
+   pr_info("This system has a very crappy BIOS: It doesn't even 
list the DIMMS.\n");
+   pr_info("Its SMBIOS info is wrong. It is doubtful that the 
error report would\n");
+   pr_info("work on such system. Use this driver with caution\n");
+   } else if (idx < 0) {
pr_info("This EDAC driver relies on BIOS to enumerate memory 
and get error reports.\n");
pr_info("Unfortunately, not all BIOSes reflect the memory 
layout correctly.\n");
pr_info("So, the end result of using this driver varies from 
vendor to vendor.\n");
pr_info("If you find incorrect reports, please contact your 
hardware vendor\n");
pr_info("to correct its BIOS.\n");
pr_info("This system has %d DIMM sockets.\n", num_dimm);
-   } else {
-   pr_info("This system has a very crappy BIOS: It doesn't even 
list the DIMMS.\n");
-   pr_info("Its SMBIOS info is wrong. It is doubtful that the 
error report would\n");
-   pr_info("work on such system. Use this driver with caution\n");
}
 
if (!fake) {


[PATCH v5 3/5] ghes_edac: add platform check to enable ghes_edac

2017-08-31 Thread Toshi Kani
The ghes_edac driver was introduced in 2013 [1], but it has not
been enabled by any distro yet.  This driver obtains error info
from firmware interfaces, which are not properly implemented on
many platforms, as the driver always emits the messages below:

 This EDAC driver relies on BIOS to enumerate memory and get error reports.
 Unfortunately, not all BIOSes reflect the memory layout correctly
 So, the end result of using this driver varies from vendor to vendor
 If you find incorrect reports, please contact your hardware vendor
 to correct its BIOS.

To get out of this situation, add a platform check that selectively
enables the driver on platforms known to have a proper firmware
implementation.  Platform vendors can add their platforms to the
list when they support ghes_edac.

"ghes_edac.force_load=1" skips this platform check.

[1]: https://lwn.net/Articles/538438/
Signed-off-by: Toshi Kani 
Cc: Borislav Petkov 
Cc: Mauro Carvalho Chehab 
Cc: Tony Luck 
---
v5
- Remove prefix from 'force_load'
- Update comment of force_load option
---
 drivers/edac/ghes_edac.c |   28 +++-
 1 file changed, 23 insertions(+), 5 deletions(-)

diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c
index 8d904df..68b6ee1 100644
--- a/drivers/edac/ghes_edac.c
+++ b/drivers/edac/ghes_edac.c
@@ -38,6 +38,10 @@ static struct ghes_edac_pvt *ghes_pvt;
  */
 static DEFINE_SPINLOCK(ghes_lock);
 
+/* "ghes_edac.force_load=1" skips the platform check */
+static bool __read_mostly force_load;
+module_param(force_load, bool, 0);
+
 /* Memory Device - Type 17 of SMBIOS spec */
 struct memdev_dmi_entry {
u8 type;
@@ -415,6 +419,14 @@ void ghes_edac_report_mem_error(struct ghes *ghes, int sev,
	spin_unlock_irqrestore(&ghes_lock, flags);
 }
 
+/*
+ * Known systems that are safe to enable this module.
+ */
+static struct acpi_platform_list plat_list[] = {
+   {"HPE   ", "Server  ", 0, ACPI_SIG_FADT, all_versions},
+   { } /* End */
+};
+
 int ghes_edac_register(struct ghes *ghes, struct device *dev)
 {
bool fake = false;
@@ -422,6 +434,12 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
struct mem_ctl_info *mci;
struct edac_mc_layer layers[1];
struct ghes_edac_dimm_fill dimm_fill;
+   int idx;
+
+   /* Check if safe to enable on this system */
+   idx = acpi_match_platform_list(plat_list);
+   if (!force_load && idx < 0)
+   return 0;
 
/*
 * We have only one logical memory controller to which all DIMMs belong.
@@ -460,17 +478,17 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
mci->ctl_name = "ghes_edac";
mci->dev_name = "ghes";
 
-   if (!fake) {
+   if (fake) {
+   pr_info("This system has a very crappy BIOS: It doesn't even list the DIMMS.\n");
+   pr_info("Its SMBIOS info is wrong. It is doubtful that the error report would\n");
+   pr_info("work on such system. Use this driver with caution\n");
+   } else if (idx < 0) {
	pr_info("This EDAC driver relies on BIOS to enumerate memory and get error reports.\n");
	pr_info("Unfortunately, not all BIOSes reflect the memory layout correctly.\n");
	pr_info("So, the end result of using this driver varies from vendor to vendor.\n");
	pr_info("If you find incorrect reports, please contact your hardware vendor\n");
	pr_info("to correct its BIOS.\n");
	pr_info("This system has %d DIMM sockets.\n", num_dimm);
-   } else {
-   pr_info("This system has a very crappy BIOS: It doesn't even list the DIMMS.\n");
-   pr_info("Its SMBIOS info is wrong. It is doubtful that the error report would\n");
-   pr_info("work on such system. Use this driver with caution\n");
}
 
if (!fake) {
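The registration gate added above boils down to one small decision: register only when the platform is allow-listed, unless the user forced the module on. A minimal userspace sketch of that decision; should_register() is a hypothetical helper, and idx stands in for the return value of acpi_match_platform_list() (matched index, or negative when nothing matched):

```c
#include <assert.h>

/* Sketch of the gate added to ghes_edac_register(): an unknown
 * platform is silently skipped unless ghes_edac.force_load=1.
 * should_register() is a hypothetical stand-in, not kernel code. */
static int should_register(int idx, int force_load)
{
	if (!force_load && idx < 0)
		return 0;	/* unknown platform: skip registration */
	return 1;		/* allow-listed, or forced by the user */
}
```

Note the order of the checks mirrors the patch: force_load wins even when the allow-list has no entry for the running platform.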


[PATCH v4 1/5] ACPI / blacklist: add acpi_match_platform_list()

2017-08-23 Thread Toshi Kani
ACPI OEM ID / OEM Table ID / Revision can be used to identify
a platform based on ACPI firmware info.  acpi_blacklisted(),
intel_pstate_platform_pwr_mgmt_exists(), and some other functions
have been using similar checks to detect platforms that require
special handling.

Move the platform check in acpi_blacklisted() to a new common
utility function, acpi_match_platform_list(), so that other
drivers do not have to implement their own version.

There is no change in functionality.

Signed-off-by: Toshi Kani <toshi.k...@hpe.com>
Cc: "Rafael J. Wysocki" <r...@rjwysocki.net>
Cc: Borislav Petkov <b...@alien8.de>
---
 drivers/acpi/blacklist.c |   83 --
 drivers/acpi/utils.c |   36 
 include/linux/acpi.h |   19 +++
 3 files changed, 69 insertions(+), 69 deletions(-)

diff --git a/drivers/acpi/blacklist.c b/drivers/acpi/blacklist.c
index bb542ac..037fd53 100644
--- a/drivers/acpi/blacklist.c
+++ b/drivers/acpi/blacklist.c
@@ -30,30 +30,13 @@
 
 #include "internal.h"
 
-enum acpi_blacklist_predicates {
-   all_versions,
-   less_than_or_equal,
-   equal,
-   greater_than_or_equal,
-};
-
-struct acpi_blacklist_item {
-   char oem_id[7];
-   char oem_table_id[9];
-   u32 oem_revision;
-   char *table;
-   enum acpi_blacklist_predicates oem_revision_predicate;
-   char *reason;
-   u32 is_critical_error;
-};
-
 static struct dmi_system_id acpi_rev_dmi_table[] __initdata;
 
 /*
  * POLICY: If *anything* doesn't work, put it on the blacklist.
  *If they are critical errors, mark it critical, and abort driver load.
  */
-static struct acpi_blacklist_item acpi_blacklist[] __initdata = {
+static struct acpi_platform_list acpi_blacklist[] __initdata = {
/* Compaq Presario 1700 */
{"PTLTD ", "  DSDT  ", 0x0604, ACPI_SIG_DSDT, less_than_or_equal,
 "Multiple problems", 1},
@@ -67,65 +50,27 @@ static struct acpi_blacklist_item acpi_blacklist[] __initdata = {
{"IBM   ", "TP600E  ", 0x0105, ACPI_SIG_DSDT, less_than_or_equal,
 "Incorrect _ADR", 1},
 
-   {""}
+   { }
 };
 
 int __init acpi_blacklisted(void)
 {
-   int i = 0;
+   int i;
int blacklisted = 0;
-   struct acpi_table_header table_header;
-
-   while (acpi_blacklist[i].oem_id[0] != '\0') {
-   if (acpi_get_table_header(acpi_blacklist[i].table, 0, &table_header)) {
-   i++;
-   continue;
-   }
-
-   if (strncmp(acpi_blacklist[i].oem_id, table_header.oem_id, 6)) {
-   i++;
-   continue;
-   }
-
-   if (strncmp
-   (acpi_blacklist[i].oem_table_id, table_header.oem_table_id,
-8)) {
-   i++;
-   continue;
-   }
-
-   if ((acpi_blacklist[i].oem_revision_predicate == all_versions)
-   || (acpi_blacklist[i].oem_revision_predicate ==
-   less_than_or_equal
-   && table_header.oem_revision <=
-   acpi_blacklist[i].oem_revision)
-   || (acpi_blacklist[i].oem_revision_predicate ==
-   greater_than_or_equal
-   && table_header.oem_revision >=
-   acpi_blacklist[i].oem_revision)
-   || (acpi_blacklist[i].oem_revision_predicate == equal
-   && table_header.oem_revision ==
-   acpi_blacklist[i].oem_revision)) {
 
-   printk(KERN_ERR PREFIX
-  "Vendor \"%6.6s\" System \"%8.8s\" "
-  "Revision 0x%x has a known ACPI BIOS problem.\n",
-  acpi_blacklist[i].oem_id,
-  acpi_blacklist[i].oem_table_id,
-  acpi_blacklist[i].oem_revision);
+   i = acpi_match_platform_list(acpi_blacklist);
+   if (i >= 0) {
+   pr_err(PREFIX "Vendor \"%6.6s\" System \"%8.8s\" Revision 0x%x has a known ACPI BIOS problem.\n",
+  acpi_blacklist[i].oem_id,
+  acpi_blacklist[i].oem_table_id,
+  acpi_blacklist[i].oem_revision);
 
-   printk(KERN_ERR PREFIX
-  "Reason: %s. This is a %s error\n",
-  acpi_blacklist[i].reason,
-  (acpi_blacklist[i].
-   is_critical_error ? "non-recoverable" :
-   "recoverable"));
+   pr_err(PREFIX "Reason: %s. This is a %s error\n",
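The matching logic being consolidated here can be modeled in plain userspace C. struct platform_entry and entry_matches() below are hypothetical stand-ins for struct acpi_platform_list and acpi_match_platform_list(), but the field widths (6-char OEM ID, 8-char OEM table ID, plus NUL per the v4 changelog) and the revision predicates follow the code above:

```c
#include <assert.h>
#include <string.h>

/* Userspace model of the table-entry match; a sketch, not the
 * kernel implementation. */
enum acpi_predicate {
	all_versions,
	less_than_or_equal,
	equal,
	greater_than_or_equal,
};

struct platform_entry {
	char oem_id[7];		/* 6 chars + NUL */
	char oem_table_id[9];	/* 8 chars + NUL */
	unsigned int oem_revision;
	enum acpi_predicate pred;
};

/* Return 1 when the firmware-reported header fields match an entry. */
static int entry_matches(const struct platform_entry *e,
			 const char *oem_id, const char *oem_table_id,
			 unsigned int revision)
{
	if (strncmp(e->oem_id, oem_id, 6) ||
	    strncmp(e->oem_table_id, oem_table_id, 8))
		return 0;

	switch (e->pred) {
	case all_versions:		return 1;
	case less_than_or_equal:	return revision <= e->oem_revision;
	case equal:			return revision == e->oem_revision;
	case greater_than_or_equal:	return revision >= e->oem_revision;
	}
	return 0;
}
```

acpi_match_platform_list() then just walks the array until an entry matches and returns its index, so callers can look up per-entry data afterwards.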


[PATCH v4 2/5] intel_pstate: convert to use acpi_match_platform_list()

2017-08-23 Thread Toshi Kani
Convert to use acpi_match_platform_list() for the platform check.
There is no change in functionality.

Signed-off-by: Toshi Kani <toshi.k...@hpe.com>
Acked-by: Srinivas Pandruvada <srinivas.pandruv...@linux.intel.com>
Reviewed-by: Borislav Petkov <b...@suse.de>
Cc: "Rafael J. Wysocki" <r...@rjwysocki.net>
Cc: Srinivas Pandruvada <srinivas.pandruv...@linux.intel.com>
Cc: Len Brown <l...@kernel.org>
Cc: Borislav Petkov <b...@alien8.de>
---
 drivers/cpufreq/intel_pstate.c |   64 
 1 file changed, 25 insertions(+), 39 deletions(-)

diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
index b7fb8b7..ef22a20 100644
--- a/drivers/cpufreq/intel_pstate.c
+++ b/drivers/cpufreq/intel_pstate.c
@@ -2458,39 +2458,31 @@ enum {
PPC,
 };
 
-struct hw_vendor_info {
-   u16  valid;
-   char oem_id[ACPI_OEM_ID_SIZE];
-   char oem_table_id[ACPI_OEM_TABLE_ID_SIZE];
-   int  oem_pwr_table;
-};
-
 /* Hardware vendor-specific info that has its own power management modes */
-static struct hw_vendor_info vendor_info[] __initdata = {
-   {1, "HP", "ProLiant", PSS},
-   {1, "ORACLE", "X4-2", PPC},
-   {1, "ORACLE", "X4-2L   ", PPC},
-   {1, "ORACLE", "X4-2B   ", PPC},
-   {1, "ORACLE", "X3-2", PPC},
-   {1, "ORACLE", "X3-2L   ", PPC},
-   {1, "ORACLE", "X3-2B   ", PPC},
-   {1, "ORACLE", "X4470M2 ", PPC},
-   {1, "ORACLE", "X4270M3 ", PPC},
-   {1, "ORACLE", "X4270M2 ", PPC},
-   {1, "ORACLE", "X4170M2 ", PPC},
-   {1, "ORACLE", "X4170 M3", PPC},
-   {1, "ORACLE", "X4275 M3", PPC},
-   {1, "ORACLE", "X6-2", PPC},
-   {1, "ORACLE", "Sudbury ", PPC},
-   {0, "", ""},
+static struct acpi_platform_list plat_info[] __initdata = {
+   {"HP", "ProLiant", 0, ACPI_SIG_FADT, all_versions, 0, PSS},
+   {"ORACLE", "X4-2", 0, ACPI_SIG_FADT, all_versions, 0, PPC},
+   {"ORACLE", "X4-2L   ", 0, ACPI_SIG_FADT, all_versions, 0, PPC},
+   {"ORACLE", "X4-2B   ", 0, ACPI_SIG_FADT, all_versions, 0, PPC},
+   {"ORACLE", "X3-2", 0, ACPI_SIG_FADT, all_versions, 0, PPC},
+   {"ORACLE", "X3-2L   ", 0, ACPI_SIG_FADT, all_versions, 0, PPC},
+   {"ORACLE", "X3-2B   ", 0, ACPI_SIG_FADT, all_versions, 0, PPC},
+   {"ORACLE", "X4470M2 ", 0, ACPI_SIG_FADT, all_versions, 0, PPC},
+   {"ORACLE", "X4270M3 ", 0, ACPI_SIG_FADT, all_versions, 0, PPC},
+   {"ORACLE", "X4270M2 ", 0, ACPI_SIG_FADT, all_versions, 0, PPC},
+   {"ORACLE", "X4170M2 ", 0, ACPI_SIG_FADT, all_versions, 0, PPC},
+   {"ORACLE", "X4170 M3", 0, ACPI_SIG_FADT, all_versions, 0, PPC},
+   {"ORACLE", "X4275 M3", 0, ACPI_SIG_FADT, all_versions, 0, PPC},
+   {"ORACLE", "X6-2", 0, ACPI_SIG_FADT, all_versions, 0, PPC},
+   {"ORACLE", "Sudbury ", 0, ACPI_SIG_FADT, all_versions, 0, PPC},
+   { } /* End */
 };
 
 static bool __init intel_pstate_platform_pwr_mgmt_exists(void)
 {
-   struct acpi_table_header hdr;
-   struct hw_vendor_info *v_info;
const struct x86_cpu_id *id;
u64 misc_pwr;
+   int idx;
 
id = x86_match_cpu(intel_pstate_cpu_oob_ids);
if (id) {
@@ -2499,21 +2491,15 @@ static bool __init intel_pstate_platform_pwr_mgmt_exists(void)
return true;
}
 
-   if (acpi_disabled ||
-   ACPI_FAILURE(acpi_get_table_header(ACPI_SIG_FADT, 0, &hdr)))
+   idx = acpi_match_platform_list(plat_info);
+   if (idx < 0)
return false;
 
-   for (v_info = vendor_info; v_info->valid; v_info++) {
-   if (!strncmp(hdr.oem_id, v_info->oem_id, ACPI_OEM_ID_SIZE) &&
-   !strncmp(hdr.oem_table_id, v_info->oem_table_id,
-   ACPI_OEM_TABLE_ID_SIZE))
-   switch (v_info->oem_pwr_table) {
-   case PSS:
-   return intel_pstate_no_acpi_pss();
-   case PPC:
-   return intel_pstate_has_acpi_ppc() &&
-   (!force_load);
-   }
+   switch (plat_info[idx].data) {
+   case PSS:
+   return intel_pstate_no_acpi_pss();
+   case PPC:
+   return intel_pstate_has_acpi_ppc() && !force_load;
}
 
return false;
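The converted flow is table-driven: the per-entry data field of struct acpi_platform_list records which firmware power-management object (PSS or PPC) must be checked once a platform matches. A minimal userspace model; platform_pwr_mgmt_exists() is a hypothetical stand-in, and plain ints replace the intel_pstate_no_acpi_pss()/intel_pstate_has_acpi_ppc() helper results:

```c
#include <assert.h>

/* Which firmware PM object the matched table entry points at;
 * mirrors the PSS/PPC enum used by the patch. */
enum { PSS, PPC };

/* idx is the acpi_match_platform_list() result; data is the matched
 * entry's payload.  A sketch of the decision, not the kernel code. */
static int platform_pwr_mgmt_exists(int idx, int data,
				    int no_acpi_pss, int has_acpi_ppc,
				    int force_load)
{
	if (idx < 0)
		return 0;	/* no table entry matched this platform */

	switch (data) {
	case PSS: return no_acpi_pss;
	case PPC: return has_acpi_ppc && !force_load;
	}
	return 0;
}
```

Carrying the PSS/PPC choice in the table payload is what lets the hand-rolled vendor loop collapse into a single acpi_match_platform_list() call plus a switch.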



[PATCH v4 4/5] EDAC: add edac_get_owner() to check MC owner

2017-08-23 Thread Toshi Kani
Only a single edac driver can be enabled for EDAC MC.  When ghes_edac
is enabled, a regular edac driver for the CPU type / platform still
attempts to register itself and fails in edac_mc_add_mc().

Add edac_get_owner() so that regular edac drivers can check the owner
of EDAC MC without calling edac_mc_add_mc().

Signed-off-by: Toshi Kani <toshi.k...@hpe.com>
Suggested-by: Borislav Petkov <b...@alien8.de>
Cc: Borislav Petkov <b...@alien8.de>
Cc: Mauro Carvalho Chehab <mche...@kernel.org>
Cc: Tony Luck <tony.l...@intel.com>
---
 drivers/edac/edac_mc.c |7 ++-
 drivers/edac/edac_mc.h |8 
 2 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/drivers/edac/edac_mc.c b/drivers/edac/edac_mc.c
index 4800721..48193f5 100644
--- a/drivers/edac/edac_mc.c
+++ b/drivers/edac/edac_mc.c
@@ -53,7 +53,7 @@ static LIST_HEAD(mc_devices);
  * Used to lock EDAC MC to just one module, avoiding two drivers e. g.
  * apei/ghes and i7core_edac to be used at the same time.
  */
-static void const *edac_mc_owner;
+static const char *edac_mc_owner;
 
 static struct bus_type mc_bus[EDAC_MAX_MCS];
 
@@ -701,6 +701,11 @@ struct mem_ctl_info *edac_mc_find(int idx)
 }
 EXPORT_SYMBOL(edac_mc_find);
 
+const char *edac_get_owner(void)
+{
+   return edac_mc_owner;
+}
+EXPORT_SYMBOL_GPL(edac_get_owner);
 
 /* FIXME - should a warning be printed if no error detection? correction? */
 int edac_mc_add_mc_with_groups(struct mem_ctl_info *mci,
diff --git a/drivers/edac/edac_mc.h b/drivers/edac/edac_mc.h
index 5357800..4165e15 100644
--- a/drivers/edac/edac_mc.h
+++ b/drivers/edac/edac_mc.h
@@ -128,6 +128,14 @@ struct mem_ctl_info *edac_mc_alloc(unsigned mc_num,
   unsigned sz_pvt);
 
 /**
+ * edac_get_owner - Return the owner's mod_name of EDAC MC
+ *
+ * Returns:
+ * Pointer to mod_name string when EDAC MC is owned. NULL otherwise.
+ */
+extern const char *edac_get_owner(void);
+
+/*
  * edac_mc_add_mc_with_groups() - Insert the @mci structure into the mci
  * global list and create sysfs entries associated with @mci structure.
  *
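The single-owner model that edac_get_owner() exposes can be sketched in userspace C. edac_mc_add() below is a simplified hypothetical stand-in for edac_mc_add_mc(): the first driver to add an MC becomes the owner, and later drivers can query the owner up front instead of failing inside the add path:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* First registrant's mod_name; NULL while EDAC MC is unowned. */
static const char *edac_mc_owner;

/* Mirrors the accessor added by the patch. */
static const char *edac_get_owner(void)
{
	return edac_mc_owner;
}

/* Simplified stand-in for edac_mc_add_mc(): reject a second,
 * different driver once EDAC MC is owned. */
static int edac_mc_add(const char *mod_name)
{
	if (edac_mc_owner && strcmp(edac_mc_owner, mod_name))
		return -1;	/* another driver already owns EDAC MC */
	edac_mc_owner = mod_name;
	return 0;
}
```

Returning the owner as a const char * (rather than the old opaque const void *) is what makes the early strncmp() check in the follow-up patch possible.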



[PATCH v4 0/5] enable ghes_edac on selected platforms

2017-08-23 Thread Toshi Kani
The ghes_edac driver was introduced in 2013 [1], but it has not been
enabled by any distro yet.  This is because the driver obtains error
info from firmware interfaces, which are not properly implemented on
many platforms.

To get out of this situation, add a platform check that selectively
enables the driver on platforms known to have a proper firmware
implementation.  Platform vendors can add their platforms to the
list when they support ghes_edac.

Patches 1-2 introduce a common function for the platform check.
Patch 3 adds the platform check to ghes_edac.
Patches 4-5 optimize the regular edac drivers' init code when ghes_edac is used.

Patch-set is based on bp.git ghes branch.

v4:
 - Increase the size of oem_id[] and oem_table_id[] by 1 (patch 1)
 - Change code style to a single line (patch 1)
 - Rebase to top of bp.git ghes branch.

v3:
 - Change struct & func names to "platform" from "oem" (patch 1)
 - Drop a patch that checks OSC APEI bit (remove v2 patch 3)
 - Drop a patch that avoids multiple calls to dmi_walk() (remove v2 patch 4)
 - Change parameter name to ghes_edac.force_load (patch 3)
 - Change function to edac_get_owner() (patch 4)
 - Change edac_mc_owner to const char * (patch 4)
 - Change to call edac_get_owner() at the beginning (patch 5)
 - Remove ".c" from mod_name (patch 5)

v2:
 - Address review comments (patch 1)
 - Add OSC APEI check (patch 3)
 - Avoid multiple dmi_walk (patch 4)
 - Add EDAC MC owner check (patch 6,7)

---
Toshi Kani (5):
 1/5 ACPI / blacklist: add acpi_match_platform_list()
 2/5 intel_pstate: convert to use acpi_match_platform_list()
 3/5 ghes_edac: add platform check to enable ghes_edac
 4/5 EDAC: add edac_get_owner() to check MC owner
 5/5 edac drivers: add MC owner check in init

---
 drivers/acpi/blacklist.c   | 83 +++---
 drivers/acpi/utils.c   | 36 ++
 drivers/cpufreq/intel_pstate.c | 64 +---
 drivers/edac/amd64_edac.c  |  5 +++
 drivers/edac/edac_mc.c |  7 +++-
 drivers/edac/edac_mc.h |  8 
 drivers/edac/ghes_edac.c   | 29 ---
 drivers/edac/pnd2_edac.c   |  9 -
 drivers/edac/sb_edac.c |  9 -
 drivers/edac/skx_edac.c|  9 -
 include/linux/acpi.h   | 19 ++
 11 files changed, 160 insertions(+), 118 deletions(-)



[PATCH v4 5/5] edac drivers: add MC owner check in init

2017-08-23 Thread Toshi Kani
Change generic x86 edac drivers, which probe CPU type with
x86_match_cpu(), to verify the module owner at the beginning
of their module init functions.  This allows them to fail
their init immediately when ghes_edac is enabled.  Similar
change can be made to other edac drivers as necessary.

Also, remove ".c" from the module names of pnd2_edac, sb_edac,
and skx_edac.

Signed-off-by: Toshi Kani <toshi.k...@hpe.com>
Suggested-by: Borislav Petkov <b...@alien8.de>
Cc: Borislav Petkov <b...@alien8.de>
Cc: Mauro Carvalho Chehab <mche...@kernel.org>
Cc: Tony Luck <tony.l...@intel.com>
---
 drivers/edac/amd64_edac.c |5 +
 drivers/edac/pnd2_edac.c  |9 -
 drivers/edac/sb_edac.c|9 +++--
 drivers/edac/skx_edac.c   |9 -
 4 files changed, 28 insertions(+), 4 deletions(-)

diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index ac2f302..8b16ec5 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -3434,9 +3434,14 @@ MODULE_DEVICE_TABLE(x86cpu, amd64_cpuids);
 
 static int __init amd64_edac_init(void)
 {
+   const char *owner;
int err = -ENODEV;
int i;
 
+   owner = edac_get_owner();
+   if (owner && strncmp(owner, EDAC_MOD_STR, sizeof(EDAC_MOD_STR)))
+   return -EBUSY;
+
if (!x86_match_cpu(amd64_cpuids))
return -ENODEV;
 
diff --git a/drivers/edac/pnd2_edac.c b/drivers/edac/pnd2_edac.c
index a3180a8..7ca02f8 100644
--- a/drivers/edac/pnd2_edac.c
+++ b/drivers/edac/pnd2_edac.c
@@ -45,6 +45,8 @@
 #include "edac_module.h"
 #include "pnd2_edac.h"
 
+#define EDAC_MOD_STR   "pnd2_edac"
+
 #define APL_NUM_CHANNELS   4
 #define DNV_NUM_CHANNELS   2
 #define DNV_MAX_DIMMS  2 /* Max DIMMs per channel */
@@ -1337,7 +1339,7 @@ static int pnd2_register_mci(struct mem_ctl_info **ppmci)
pvt = mci->pvt_info;
memset(pvt, 0, sizeof(*pvt));
 
-   mci->mod_name = "pnd2_edac.c";
+   mci->mod_name = EDAC_MOD_STR;
mci->dev_name = ops->name;
mci->ctl_name = "Pondicherry2";
 
@@ -1529,10 +1531,15 @@ MODULE_DEVICE_TABLE(x86cpu, pnd2_cpuids);
 static int __init pnd2_init(void)
 {
const struct x86_cpu_id *id;
+   const char *owner;
int rc;
 
edac_dbg(2, "\n");
 
+   owner = edac_get_owner();
+   if (owner && strncmp(owner, EDAC_MOD_STR, sizeof(EDAC_MOD_STR)))
+   return -EBUSY;
+
id = x86_match_cpu(pnd2_cpuids);
if (!id)
return -ENODEV;
diff --git a/drivers/edac/sb_edac.c b/drivers/edac/sb_edac.c
index dc05916..581fdb7 100644
--- a/drivers/edac/sb_edac.c
+++ b/drivers/edac/sb_edac.c
@@ -36,7 +36,7 @@ static LIST_HEAD(sbridge_edac_list);
  * Alter this version for the module when modifications are made
  */
 #define SBRIDGE_REVISION" Ver: 1.1.2 "
-#define EDAC_MOD_STR  "sbridge_edac"
+#define EDAC_MOD_STR   "sb_edac"
 
 /*
  * Debug macros
@@ -3155,7 +3155,7 @@ static int sbridge_register_mci(struct sbridge_dev *sbridge_dev, enum type type)
MEM_FLAG_DDR4 : MEM_FLAG_DDR3;
mci->edac_ctl_cap = EDAC_FLAG_NONE;
mci->edac_cap = EDAC_FLAG_NONE;
-   mci->mod_name = "sb_edac.c";
+   mci->mod_name = EDAC_MOD_STR;
mci->dev_name = pci_name(pdev);
mci->ctl_page_to_phys = NULL;
 
@@ -3402,10 +3402,15 @@ static void sbridge_remove(void)
 static int __init sbridge_init(void)
 {
const struct x86_cpu_id *id;
+   const char *owner;
int rc;
 
edac_dbg(2, "\n");
 
+   owner = edac_get_owner();
+   if (owner && strncmp(owner, EDAC_MOD_STR, sizeof(EDAC_MOD_STR)))
+   return -EBUSY;
+
id = x86_match_cpu(sbridge_cpuids);
if (!id)
return -ENODEV;
diff --git a/drivers/edac/skx_edac.c b/drivers/edac/skx_edac.c
index 16dea97..85d5f0b 100644
--- a/drivers/edac/skx_edac.c
+++ b/drivers/edac/skx_edac.c
@@ -31,6 +31,8 @@
 
 #include "edac_module.h"
 
+#define EDAC_MOD_STR"skx_edac"
+
 /*
  * Debug macros
  */
@@ -469,7 +471,7 @@ static int skx_register_mci(struct skx_imc *imc)
mci->mtype_cap = MEM_FLAG_DDR4;
mci->edac_ctl_cap = EDAC_FLAG_NONE;
mci->edac_cap = EDAC_FLAG_NONE;
-   mci->mod_name = "skx_edac.c";
+   mci->mod_name = EDAC_MOD_STR;
mci->dev_name = pci_name(imc->chan[0].cdev);
mci->ctl_page_to_phys = NULL;
 
@@ -1039,12 +1041,17 @@ static int __init skx_init(void)
 {
const struct x86_cpu_id *id;
const struct munit *m;
+   const char *owner;
int rc = 0, i;
u8 mc = 0, src_id, node_id;
struct skx_dev *d;
 
edac_dbg(2, "\n");
 
+   owner = edac_get_owner();
+   if (owner && strncmp(owner, EDAC_MOD_STR, sizeof(EDAC_MOD_STR)))
+   return -EBUSY;
+
id = x86_match_cpu(skx_cpuids);
if (!id)
return -ENODEV;


[PATCH v4 5/5] edac drivers: add MC owner check in init

2017-08-23 Thread Toshi Kani
Change the generic x86 edac drivers, which probe the CPU type with
x86_match_cpu(), to verify the module owner at the beginning
of their module init functions.  This allows them to fail
their init immediately when ghes_edac is enabled.  A similar
change can be made to other edac drivers as necessary.

Also, remove ".c" from the module names of pnd2_edac, sb_edac,
and skx_edac.

Signed-off-by: Toshi Kani 
Suggested-by: Borislav Petkov 
Cc: Borislav Petkov 
Cc: Mauro Carvalho Chehab 
Cc: Tony Luck 
---
 drivers/edac/amd64_edac.c |5 +
 drivers/edac/pnd2_edac.c  |9 -
 drivers/edac/sb_edac.c|9 +++--
 drivers/edac/skx_edac.c   |9 -
 4 files changed, 28 insertions(+), 4 deletions(-)

diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index ac2f302..8b16ec5 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -3434,9 +3434,14 @@ MODULE_DEVICE_TABLE(x86cpu, amd64_cpuids);
 
 static int __init amd64_edac_init(void)
 {
+   const char *owner;
int err = -ENODEV;
int i;
 
+   owner = edac_get_owner();
+   if (owner && strncmp(owner, EDAC_MOD_STR, sizeof(EDAC_MOD_STR)))
+   return -EBUSY;
+
if (!x86_match_cpu(amd64_cpuids))
return -ENODEV;
 
diff --git a/drivers/edac/pnd2_edac.c b/drivers/edac/pnd2_edac.c
index a3180a8..7ca02f8 100644
--- a/drivers/edac/pnd2_edac.c
+++ b/drivers/edac/pnd2_edac.c
@@ -45,6 +45,8 @@
 #include "edac_module.h"
 #include "pnd2_edac.h"
 
+#define EDAC_MOD_STR   "pnd2_edac"
+
 #define APL_NUM_CHANNELS   4
 #define DNV_NUM_CHANNELS   2
 #define DNV_MAX_DIMMS  2 /* Max DIMMs per channel */
@@ -1337,7 +1339,7 @@ static int pnd2_register_mci(struct mem_ctl_info **ppmci)
pvt = mci->pvt_info;
memset(pvt, 0, sizeof(*pvt));
 
-   mci->mod_name = "pnd2_edac.c";
+   mci->mod_name = EDAC_MOD_STR;
mci->dev_name = ops->name;
mci->ctl_name = "Pondicherry2";
 
@@ -1529,10 +1531,15 @@ MODULE_DEVICE_TABLE(x86cpu, pnd2_cpuids);
 static int __init pnd2_init(void)
 {
const struct x86_cpu_id *id;
+   const char *owner;
int rc;
 
edac_dbg(2, "\n");
 
+   owner = edac_get_owner();
+   if (owner && strncmp(owner, EDAC_MOD_STR, sizeof(EDAC_MOD_STR)))
+   return -EBUSY;
+
id = x86_match_cpu(pnd2_cpuids);
if (!id)
return -ENODEV;
diff --git a/drivers/edac/sb_edac.c b/drivers/edac/sb_edac.c
index dc05916..581fdb7 100644
--- a/drivers/edac/sb_edac.c
+++ b/drivers/edac/sb_edac.c
@@ -36,7 +36,7 @@ static LIST_HEAD(sbridge_edac_list);
  * Alter this version for the module when modifications are made
  */
 #define SBRIDGE_REVISION" Ver: 1.1.2 "
-#define EDAC_MOD_STR  "sbridge_edac"
+#define EDAC_MOD_STR   "sb_edac"
 
 /*
  * Debug macros
@@ -3155,7 +3155,7 @@ static int sbridge_register_mci(struct sbridge_dev *sbridge_dev, enum type type)
MEM_FLAG_DDR4 : MEM_FLAG_DDR3;
mci->edac_ctl_cap = EDAC_FLAG_NONE;
mci->edac_cap = EDAC_FLAG_NONE;
-   mci->mod_name = "sb_edac.c";
+   mci->mod_name = EDAC_MOD_STR;
mci->dev_name = pci_name(pdev);
mci->ctl_page_to_phys = NULL;
 
@@ -3402,10 +3402,15 @@ static void sbridge_remove(void)
 static int __init sbridge_init(void)
 {
const struct x86_cpu_id *id;
+   const char *owner;
int rc;
 
edac_dbg(2, "\n");
 
+   owner = edac_get_owner();
+   if (owner && strncmp(owner, EDAC_MOD_STR, sizeof(EDAC_MOD_STR)))
+   return -EBUSY;
+
id = x86_match_cpu(sbridge_cpuids);
if (!id)
return -ENODEV;
diff --git a/drivers/edac/skx_edac.c b/drivers/edac/skx_edac.c
index 16dea97..85d5f0b 100644
--- a/drivers/edac/skx_edac.c
+++ b/drivers/edac/skx_edac.c
@@ -31,6 +31,8 @@
 
 #include "edac_module.h"
 
+#define EDAC_MOD_STR"skx_edac"
+
 /*
  * Debug macros
  */
@@ -469,7 +471,7 @@ static int skx_register_mci(struct skx_imc *imc)
mci->mtype_cap = MEM_FLAG_DDR4;
mci->edac_ctl_cap = EDAC_FLAG_NONE;
mci->edac_cap = EDAC_FLAG_NONE;
-   mci->mod_name = "skx_edac.c";
+   mci->mod_name = EDAC_MOD_STR;
mci->dev_name = pci_name(imc->chan[0].cdev);
mci->ctl_page_to_phys = NULL;
 
@@ -1039,12 +1041,17 @@ static int __init skx_init(void)
 {
const struct x86_cpu_id *id;
const struct munit *m;
+   const char *owner;
int rc = 0, i;
u8 mc = 0, src_id, node_id;
struct skx_dev *d;
 
edac_dbg(2, "\n");
 
+   owner = edac_get_owner();
+   if (owner && strncmp(owner, EDAC_MOD_STR, sizeof(EDAC_MOD_STR)))
+   return -EBUSY;
+
id = x86_match_cpu(skx_cpuids);
if (!id)
return -ENODEV;


[PATCH v4 3/5] ghes_edac: add platform check to enable ghes_edac

2017-08-23 Thread Toshi Kani
The ghes_edac driver was introduced in 2013 [1], but it has not
been enabled by any distro yet.  This driver obtains error info
from firmware interfaces, which are not properly implemented on
many platforms, as the driver always emits the messages below:

 This EDAC driver relies on BIOS to enumerate memory and get error reports.
 Unfortunately, not all BIOSes reflect the memory layout correctly
 So, the end result of using this driver varies from vendor to vendor
 If you find incorrect reports, please contact your hardware vendor
 to correct its BIOS.

To get out of this situation, add a platform check to selectively
enable the driver on the platforms that are known to have proper
firmware implementation.  Platform vendors can add their platforms
to the list when they support ghes_edac.

"ghes_edac.force_load=1" skips this platform check.

[1]: https://lwn.net/Articles/538438/
Signed-off-by: Toshi Kani <toshi.k...@hpe.com>
Cc: Borislav Petkov <b...@alien8.de>
Cc: Mauro Carvalho Chehab <mche...@kernel.org>
Cc: Tony Luck <tony.l...@intel.com>
---
 drivers/edac/ghes_edac.c |   29 -
 1 file changed, 24 insertions(+), 5 deletions(-)

diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c
index 8d904df..0030a09 100644
--- a/drivers/edac/ghes_edac.c
+++ b/drivers/edac/ghes_edac.c
@@ -38,6 +38,10 @@ static struct ghes_edac_pvt *ghes_pvt;
  */
 static DEFINE_SPINLOCK(ghes_lock);
 
+/* Set 1 to skip the platform check */
+static bool __read_mostly ghes_edac_force_load;
+module_param_named(force_load, ghes_edac_force_load, bool, 0);
+
 /* Memory Device - Type 17 of SMBIOS spec */
 struct memdev_dmi_entry {
u8 type;
@@ -415,6 +419,15 @@ void ghes_edac_report_mem_error(struct ghes *ghes, int sev,
spin_unlock_irqrestore(&ghes_lock, flags);
 }
 
+/*
+ * Known systems that are safe to enable this module.
+ * "ghes_edac.force_load=1" skips this check if necessary.
+ */
+static struct acpi_platform_list plat_list[] = {
+   {"HPE   ", "Server  ", 0, ACPI_SIG_FADT, all_versions},
+   { } /* End */
+};
+
 int ghes_edac_register(struct ghes *ghes, struct device *dev)
 {
bool fake = false;
@@ -422,6 +435,12 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
struct mem_ctl_info *mci;
struct edac_mc_layer layers[1];
struct ghes_edac_dimm_fill dimm_fill;
+   int idx;
+
+   /* Check if safe to enable on this system */
+   idx = acpi_match_platform_list(plat_list);
+   if (!ghes_edac_force_load && idx < 0)
+   return 0;
 
/*
 * We have only one logical memory controller to which all DIMMs belong.
@@ -460,17 +479,17 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
mci->ctl_name = "ghes_edac";
mci->dev_name = "ghes";
 
-   if (!fake) {
+   if (fake) {
+   pr_info("This system has a very crappy BIOS: It doesn't even list the DIMMS.\n");
+   pr_info("Its SMBIOS info is wrong. It is doubtful that the error report would\n");
+   pr_info("work on such system. Use this driver with caution\n");
+   } else if (idx < 0) {
pr_info("This EDAC driver relies on BIOS to enumerate memory and get error reports.\n");
pr_info("Unfortunately, not all BIOSes reflect the memory layout correctly.\n");
pr_info("So, the end result of using this driver varies from vendor to vendor.\n");
pr_info("If you find incorrect reports, please contact your hardware vendor\n");
pr_info("to correct its BIOS.\n");
pr_info("This system has %d DIMM sockets.\n", num_dimm);
-   } else {
-   pr_info("This system has a very crappy BIOS: It doesn't even list the DIMMS.\n");
-   pr_info("Its SMBIOS info is wrong. It is doubtful that the error report would\n");
-   pr_info("work on such system. Use this driver with caution\n");
}
 
if (!fake) {


[PATCH v3 0/5] enable ghes_edac on selected platforms

2017-08-18 Thread Toshi Kani
The ghes_edac driver was introduced in 2013 [1], but it has not been
enabled by any distro yet.  This is because the driver obtains error
info from firmware interfaces, which are not properly implemented on
many platforms.

To get out of this situation, add a platform check to selectively
enable the driver on the platforms that are known to have proper
firmware implementation.  Platform vendors can add their platforms to
the list when they support ghes_edac.

Patches 1-2 introduce a common function for the platform check.
Patch 3 adds the platform check to ghes_edac.
Patches 4-5 optimize the regular edac drivers' init code when ghes_edac is used.

v3:
 - Change struct & func names to "platform" from "oem" (patch 1)
 - Drop a patch that checks OSC APEI bit (remove v2 patch 3)
 - Drop a patch that avoids multiple calls to dmi_walk() (remove v2 patch 4)
 - Change parameter name to ghes_edac.force_load (patch 3)
 - Change function to edac_get_owner() (patch 4)
 - Change edac_mc_owner to const char * (patch 4)
 - Change to call edac_get_owner() at the beginning (patch 5)
 - Remove ".c" from mod_name (patch 5)

v2:
 - Address review comments (patch 1)
 - Add OSC APEI check (patch 3)
 - Avoid multiple dmi_walk (patch 4)
 - Add EDAC MC owner check (patch 6,7)

---
Toshi Kani (5):
 1/5 ACPI / blacklist: add acpi_match_platform_list()
 2/5 intel_pstate: convert to use acpi_match_platform_list()
 3/5 ghes_edac: add platform check to enable ghes_edac
 4/5 EDAC: add edac_get_owner() to check MC owner
 5/5 edac drivers: add MC owner check in init

---
 drivers/acpi/blacklist.c   | 83 +++---
 drivers/acpi/utils.c   | 40 
 drivers/cpufreq/intel_pstate.c | 64 +---
 drivers/edac/amd64_edac.c  |  5 +++
 drivers/edac/edac_mc.c |  7 +++-
 drivers/edac/edac_mc.h |  8 
 drivers/edac/ghes_edac.c   | 28 +++---
 drivers/edac/pnd2_edac.c   |  9 -
 drivers/edac/sb_edac.c |  9 -
 drivers/edac/skx_edac.c|  8 +++-
 include/linux/acpi.h   | 19 ++
 11 files changed, 162 insertions(+), 118 deletions(-)



[PATCH v3 2/5] intel_pstate: convert to use acpi_match_platform_list()

2017-08-18 Thread Toshi Kani
Convert to use acpi_match_platform_list() for the platform check.
There is no change in functionality.

Signed-off-by: Toshi Kani <toshi.k...@hpe.com>
Cc: "Rafael J. Wysocki" <r...@rjwysocki.net>
Cc: Srinivas Pandruvada <srinivas.pandruv...@linux.intel.com>
Cc: Len Brown <l...@kernel.org>
Cc: Borislav Petkov <b...@alien8.de>
---
 drivers/cpufreq/intel_pstate.c |   64 
 1 file changed, 25 insertions(+), 39 deletions(-)

diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
index 65ee4fc..ad713cd 100644
--- a/drivers/cpufreq/intel_pstate.c
+++ b/drivers/cpufreq/intel_pstate.c
@@ -2466,39 +2466,31 @@ enum {
PPC,
 };
 
-struct hw_vendor_info {
-   u16  valid;
-   char oem_id[ACPI_OEM_ID_SIZE];
-   char oem_table_id[ACPI_OEM_TABLE_ID_SIZE];
-   int  oem_pwr_table;
-};
-
 /* Hardware vendor-specific info that has its own power management modes */
-static struct hw_vendor_info vendor_info[] __initdata = {
-   {1, "HP", "ProLiant", PSS},
-   {1, "ORACLE", "X4-2", PPC},
-   {1, "ORACLE", "X4-2L   ", PPC},
-   {1, "ORACLE", "X4-2B   ", PPC},
-   {1, "ORACLE", "X3-2", PPC},
-   {1, "ORACLE", "X3-2L   ", PPC},
-   {1, "ORACLE", "X3-2B   ", PPC},
-   {1, "ORACLE", "X4470M2 ", PPC},
-   {1, "ORACLE", "X4270M3 ", PPC},
-   {1, "ORACLE", "X4270M2 ", PPC},
-   {1, "ORACLE", "X4170M2 ", PPC},
-   {1, "ORACLE", "X4170 M3", PPC},
-   {1, "ORACLE", "X4275 M3", PPC},
-   {1, "ORACLE", "X6-2", PPC},
-   {1, "ORACLE", "Sudbury ", PPC},
-   {0, "", ""},
+static struct acpi_platform_list plat_info[] __initdata = {
+   {"HP", "ProLiant", 0, ACPI_SIG_FADT, all_versions, 0, PSS},
+   {"ORACLE", "X4-2", 0, ACPI_SIG_FADT, all_versions, 0, PPC},
+   {"ORACLE", "X4-2L   ", 0, ACPI_SIG_FADT, all_versions, 0, PPC},
+   {"ORACLE", "X4-2B   ", 0, ACPI_SIG_FADT, all_versions, 0, PPC},
+   {"ORACLE", "X3-2", 0, ACPI_SIG_FADT, all_versions, 0, PPC},
+   {"ORACLE", "X3-2L   ", 0, ACPI_SIG_FADT, all_versions, 0, PPC},
+   {"ORACLE", "X3-2B   ", 0, ACPI_SIG_FADT, all_versions, 0, PPC},
+   {"ORACLE", "X4470M2 ", 0, ACPI_SIG_FADT, all_versions, 0, PPC},
+   {"ORACLE", "X4270M3 ", 0, ACPI_SIG_FADT, all_versions, 0, PPC},
+   {"ORACLE", "X4270M2 ", 0, ACPI_SIG_FADT, all_versions, 0, PPC},
+   {"ORACLE", "X4170M2 ", 0, ACPI_SIG_FADT, all_versions, 0, PPC},
+   {"ORACLE", "X4170 M3", 0, ACPI_SIG_FADT, all_versions, 0, PPC},
+   {"ORACLE", "X4275 M3", 0, ACPI_SIG_FADT, all_versions, 0, PPC},
+   {"ORACLE", "X6-2", 0, ACPI_SIG_FADT, all_versions, 0, PPC},
+   {"ORACLE", "Sudbury ", 0, ACPI_SIG_FADT, all_versions, 0, PPC},
+   { } /* End */
 };
 
 static bool __init intel_pstate_platform_pwr_mgmt_exists(void)
 {
-   struct acpi_table_header hdr;
-   struct hw_vendor_info *v_info;
const struct x86_cpu_id *id;
u64 misc_pwr;
+   int idx;
 
id = x86_match_cpu(intel_pstate_cpu_oob_ids);
if (id) {
@@ -2507,21 +2499,15 @@ static bool __init intel_pstate_platform_pwr_mgmt_exists(void)
return true;
}
 
-   if (acpi_disabled ||
-   ACPI_FAILURE(acpi_get_table_header(ACPI_SIG_FADT, 0, &hdr)))
+   idx = acpi_match_platform_list(plat_info);
+   if (idx < 0)
return false;
 
-   for (v_info = vendor_info; v_info->valid; v_info++) {
-   if (!strncmp(hdr.oem_id, v_info->oem_id, ACPI_OEM_ID_SIZE) &&
-   !strncmp(hdr.oem_table_id, v_info->oem_table_id,
-   ACPI_OEM_TABLE_ID_SIZE))
-   switch (v_info->oem_pwr_table) {
-   case PSS:
-   return intel_pstate_no_acpi_pss();
-   case PPC:
-   return intel_pstate_has_acpi_ppc() &&
-   (!force_load);
-   }
+   switch (plat_info[idx].data) {
+   case PSS:
+   return intel_pstate_no_acpi_pss();
+   case PPC:
+   return intel_pstate_has_acpi_ppc() && !force_load;
}
 
return false;


[PATCH v3 5/5] edac drivers: add MC owner check in init

2017-08-18 Thread Toshi Kani
Change the generic x86 edac drivers, which probe the CPU type with
x86_match_cpu(), to verify the module owner at the beginning
of their module init functions.  This allows them to fail
their init immediately when ghes_edac is enabled.  A similar
change can be made to other edac drivers as necessary.

Also, remove ".c" from the module names of pnd2_edac, sb_edac,
and skx_edac.

Signed-off-by: Toshi Kani <toshi.k...@hpe.com>
Suggested-by: Borislav Petkov <b...@alien8.de>
Cc: Borislav Petkov <b...@alien8.de>
Cc: Mauro Carvalho Chehab <mche...@kernel.org>
Cc: Tony Luck <tony.l...@intel.com>
---
 drivers/edac/amd64_edac.c |5 +
 drivers/edac/pnd2_edac.c  |9 -
 drivers/edac/sb_edac.c|9 +++--
 drivers/edac/skx_edac.c   |8 +++-
 4 files changed, 27 insertions(+), 4 deletions(-)

diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index 3aea556..529bcbf 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -3435,9 +3435,14 @@ MODULE_DEVICE_TABLE(x86cpu, amd64_cpuids);
 
 static int __init amd64_edac_init(void)
 {
+   const char *owner;
int err = -ENODEV;
int i;
 
+   owner = edac_get_owner();
+   if (owner && strncmp(owner, EDAC_MOD_STR, sizeof(EDAC_MOD_STR)))
+   return -EBUSY;
+
if (!x86_match_cpu(amd64_cpuids))
return -ENODEV;
 
diff --git a/drivers/edac/pnd2_edac.c b/drivers/edac/pnd2_edac.c
index 8e59949..6609041 100644
--- a/drivers/edac/pnd2_edac.c
+++ b/drivers/edac/pnd2_edac.c
@@ -45,6 +45,8 @@
 #include "edac_module.h"
 #include "pnd2_edac.h"
 
+#define EDAC_MOD_STR   "pnd2_edac"
+
 #define APL_NUM_CHANNELS   4
 #define DNV_NUM_CHANNELS   2
 #define DNV_MAX_DIMMS  2 /* Max DIMMs per channel */
@@ -1313,7 +1315,7 @@ static int pnd2_register_mci(struct mem_ctl_info **ppmci)
pvt = mci->pvt_info;
memset(pvt, 0, sizeof(*pvt));
 
-   mci->mod_name = "pnd2_edac.c";
+   mci->mod_name = EDAC_MOD_STR;
mci->dev_name = ops->name;
mci->ctl_name = "Pondicherry2";
 
@@ -1505,10 +1507,15 @@ MODULE_DEVICE_TABLE(x86cpu, pnd2_cpuids);
 static int __init pnd2_init(void)
 {
const struct x86_cpu_id *id;
+   const char *owner;
int rc;
 
edac_dbg(2, "\n");
 
+   owner = edac_get_owner();
+   if (owner && strncmp(owner, EDAC_MOD_STR, sizeof(EDAC_MOD_STR)))
+   return -EBUSY;
+
id = x86_match_cpu(pnd2_cpuids);
if (!id)
return -ENODEV;
diff --git a/drivers/edac/sb_edac.c b/drivers/edac/sb_edac.c
index 80d860c..d411aa5c 100644
--- a/drivers/edac/sb_edac.c
+++ b/drivers/edac/sb_edac.c
@@ -36,7 +36,7 @@ static LIST_HEAD(sbridge_edac_list);
  * Alter this version for the module when modifications are made
  */
 #define SBRIDGE_REVISION" Ver: 1.1.2 "
-#define EDAC_MOD_STR  "sbridge_edac"
+#define EDAC_MOD_STR   "sb_edac"
 
 /*
  * Debug macros
@@ -3124,7 +3124,7 @@ static int sbridge_register_mci(struct sbridge_dev *sbridge_dev, enum type type)
MEM_FLAG_DDR4 : MEM_FLAG_DDR3;
mci->edac_ctl_cap = EDAC_FLAG_NONE;
mci->edac_cap = EDAC_FLAG_NONE;
-   mci->mod_name = "sb_edac.c";
+   mci->mod_name = EDAC_MOD_STR;
mci->mod_ver = SBRIDGE_REVISION;
mci->dev_name = pci_name(pdev);
mci->ctl_page_to_phys = NULL;
@@ -3372,10 +3372,15 @@ static void sbridge_remove(void)
 static int __init sbridge_init(void)
 {
const struct x86_cpu_id *id;
+   const char *owner;
int rc;
 
edac_dbg(2, "\n");
 
+   owner = edac_get_owner();
+   if (owner && strncmp(owner, EDAC_MOD_STR, sizeof(EDAC_MOD_STR)))
+   return -EBUSY;
+
id = x86_match_cpu(sbridge_cpuids);
if (!id)
return -ENODEV;
diff --git a/drivers/edac/skx_edac.c b/drivers/edac/skx_edac.c
index 64bef6c9..17a2fbd 100644
--- a/drivers/edac/skx_edac.c
+++ b/drivers/edac/skx_edac.c
@@ -31,6 +31,7 @@
 
 #include "edac_module.h"
 
+#define EDAC_MOD_STR"skx_edac"
 #define SKX_REVISION" Ver: 1.0 "
 
 /*
@@ -471,7 +472,7 @@ static int skx_register_mci(struct skx_imc *imc)
mci->mtype_cap = MEM_FLAG_DDR4;
mci->edac_ctl_cap = EDAC_FLAG_NONE;
mci->edac_cap = EDAC_FLAG_NONE;
-   mci->mod_name = "skx_edac.c";
+   mci->mod_name = EDAC_MOD_STR;
mci->dev_name = pci_name(imc->chan[0].cdev);
mci->mod_ver = SKX_REVISION;
mci->ctl_page_to_phys = NULL;
@@ -1042,12 +1043,17 @@ static int __init skx_init(void)
 {
const struct x86_cpu_id *id;
const struct munit *m;
+   const char *owner;
int rc = 0, i;
u8 mc = 0, src_id, node_id;
struct skx_dev *d;
 
edac_dbg(2, "\n");
 
+   owner = edac_get_owner();
+   if (owner && strncmp(owner, EDAC_MOD_STR, sizeof(EDAC_MOD_STR)))
+   return -EBUSY;
+
id = x86_match_cpu(skx_cpuids);
if (!id)
return -ENODEV;


[PATCH v3 3/5] ghes_edac: add platform check to enable ghes_edac

2017-08-18 Thread Toshi Kani
The ghes_edac driver was introduced in 2013 [1], but it has not
been enabled by any distro yet.  This driver obtains error info
from firmware interfaces, which are not properly implemented on
many platforms; hence, the driver always emits the messages below:

 This EDAC driver relies on BIOS to enumerate memory and get error reports.
 Unfortunately, not all BIOSes reflect the memory layout correctly
 So, the end result of using this driver varies from vendor to vendor
 If you find incorrect reports, please contact your hardware vendor
 to correct its BIOS.

To get out of this situation, add a platform check to selectively
enable the driver on platforms that are known to have a proper
firmware implementation.  Platform vendors can add their platforms
to the list when they support ghes_edac.

"ghes_edac.force_load=1" skips this platform check.

[1]: https://lwn.net/Articles/538438/
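The check added by this patch boils down to: look the running platform up in an allow-list keyed on ACPI OEM fields, and skip registration unless it matches or the user forced loading. A minimal user-space sketch of that decision follows; the struct and function names here are illustrative stand-ins, not the kernel's (acpi_match_platform_list() also checks the table signature and OEM revision):

```c
#include <string.h>

/* Illustrative stand-in for struct acpi_platform_list: only the
 * fields this sketch compares. */
struct plat_entry {
	const char *oem_id;       /* 6-char ACPI OEM ID, space padded */
	const char *oem_table_id; /* 8-char OEM Table ID, space padded */
};

/* Allow-list terminated by an empty entry, as in plat_list[] above. */
static const struct plat_entry ghes_plat_list[] = {
	{ "HPE   ", "Server  " },
	{ 0 }
};

/* Return the matching index, or -1 on a miss -- mirroring how
 * acpi_match_platform_list() reports "not listed" with a negative value. */
static int match_platform(const struct plat_entry *list,
			  const char *oem_id, const char *oem_table_id)
{
	int i;

	for (i = 0; list[i].oem_id; i++) {
		if (strncmp(list[i].oem_id, oem_id, 6))
			continue;
		if (strncmp(list[i].oem_table_id, oem_table_id, 8))
			continue;
		return i;
	}
	return -1;
}

/* The gate ghes_edac_register() applies: proceed only when the
 * platform is listed or ghes_edac.force_load=1 was given. */
static int should_register(int match_idx, int force_load)
{
	return force_load || match_idx >= 0;
}
```

Note the fixed-width strncmp() lengths: ACPI OEM ID and OEM Table ID are space-padded fixed fields, not NUL-terminated strings.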
Signed-off-by: Toshi Kani <toshi.k...@hpe.com>
Cc: Borislav Petkov <b...@alien8.de>
Cc: Mauro Carvalho Chehab <mche...@kernel.org>
Cc: Tony Luck <tony.l...@intel.com>
---
 drivers/edac/ghes_edac.c |   28 +++-
 1 file changed, 23 insertions(+), 5 deletions(-)

diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c
index 4e61a62..367e106 100644
--- a/drivers/edac/ghes_edac.c
+++ b/drivers/edac/ghes_edac.c
@@ -34,6 +34,9 @@ static LIST_HEAD(ghes_reglist);
 static DEFINE_MUTEX(ghes_edac_lock);
 static int ghes_edac_mc_num;
 
+/* Set 1 to skip the platform check */
+static bool __read_mostly ghes_edac_force_load;
+module_param_named(force_load, ghes_edac_force_load, bool, 0);
 
 /* Memory Device - Type 17 of SMBIOS spec */
 struct memdev_dmi_entry {
@@ -405,6 +408,15 @@ void ghes_edac_report_mem_error(struct ghes *ghes, int sev,
 }
 EXPORT_SYMBOL_GPL(ghes_edac_report_mem_error);
 
+/*
+ * Known systems that are safe to enable this module.
+ * "ghes_edac.force_load=1" skips this check if necessary.
+ */
+static struct acpi_platform_list plat_list[] = {
+   {"HPE   ", "Server  ", 0, ACPI_SIG_FADT, all_versions},
+   { } /* End */
+};
+
 int ghes_edac_register(struct ghes *ghes, struct device *dev)
 {
bool fake = false;
@@ -413,6 +425,12 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
struct edac_mc_layer layers[1];
struct ghes_edac_pvt *pvt;
struct ghes_edac_dimm_fill dimm_fill;
+   int idx;
+
+   /* Check if safe to enable on this system */
+   idx = acpi_match_platform_list(plat_list);
+   if (!ghes_edac_force_load && idx < 0)
+   return 0;
 
/* Get the number of DIMMs */
	dmi_walk(ghes_edac_count_dimms, &num_dimm);
@@ -456,7 +474,11 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
mci->dev_name = "ghes";
 
if (!ghes_edac_mc_num) {
-   if (!fake) {
+   if (fake) {
+   pr_info("This system has a very crappy BIOS: It doesn't even list the DIMMS.\n");
+   pr_info("Its SMBIOS info is wrong. It is doubtful that the error report would\n");
+   pr_info("work on such system. Use this driver with caution\n");
+   } else if (idx < 0) {
pr_info("This EDAC driver relies on BIOS to enumerate memory and get error reports.\n");
pr_info("Unfortunately, not all BIOSes reflect the memory layout correctly.\n");
pr_info("So, the end result of using this driver varies from vendor to vendor.\n");
@@ -464,10 +486,6 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
pr_info("to correct its BIOS.\n");
pr_info("This system has %d DIMM sockets.\n",
num_dimm);
-   } else {
-   pr_info("This system has a very crappy BIOS: It doesn't even list the DIMMS.\n");
-   pr_info("Its SMBIOS info is wrong. It is doubtful that the error report would\n");
-   pr_info("work on such system. Use this driver with caution\n");
}
}
 


[PATCH v3 1/5] ACPI / blacklist: add acpi_match_platform_list()

2017-08-18 Thread Toshi Kani
ACPI OEM ID / OEM Table ID / Revision can be used to identify
a platform based on ACPI firmware info.  acpi_blacklisted(),
intel_pstate_platform_pwr_mgmt_exists(), and some other functions
have been using similar checks to detect platforms that require
special handling.

Move the platform check in acpi_blacklisted() to a new common
utility function, acpi_match_platform_list(), so that other
drivers do not have to implement their own version.

There is no change in functionality.
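The consolidated helper keeps the four revision predicates from the old acpi_blacklist_item.  The per-entry revision comparison it performs can be sketched standalone as below (the kernel function additionally matches the table signature and the OEM ID / OEM Table ID strings; the function name here is illustrative):

```c
#include <stdint.h>

/* Predicates as named in the enum the patch moves into <linux/acpi.h>. */
enum pred { all_versions, less_than_or_equal, equal, greater_than_or_equal };

/* Does the firmware's OEM revision `rev` satisfy an entry whose
 * reference revision is `oem_revision` under predicate `p`? */
static int revision_matches(enum pred p, uint32_t rev, uint32_t oem_revision)
{
	switch (p) {
	case all_versions:
		return 1;                  /* any revision matches */
	case less_than_or_equal:
		return rev <= oem_revision;
	case equal:
		return rev == oem_revision;
	case greater_than_or_equal:
		return rev >= oem_revision;
	}
	return 0;
}
```

Centralizing this switch is what lets acpi_blacklisted() drop its long chain of `||`-ed comparisons in the hunk below.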

Signed-off-by: Toshi Kani <toshi.k...@hpe.com>
Cc: "Rafael J. Wysocki" <r...@rjwysocki.net>
Cc: Borislav Petkov <b...@alien8.de>
---
 drivers/acpi/blacklist.c |   83 --
 drivers/acpi/utils.c |   40 ++
 include/linux/acpi.h |   19 +++
 3 files changed, 73 insertions(+), 69 deletions(-)

diff --git a/drivers/acpi/blacklist.c b/drivers/acpi/blacklist.c
index bb542ac..037fd53 100644
--- a/drivers/acpi/blacklist.c
+++ b/drivers/acpi/blacklist.c
@@ -30,30 +30,13 @@
 
 #include "internal.h"
 
-enum acpi_blacklist_predicates {
-   all_versions,
-   less_than_or_equal,
-   equal,
-   greater_than_or_equal,
-};
-
-struct acpi_blacklist_item {
-   char oem_id[7];
-   char oem_table_id[9];
-   u32 oem_revision;
-   char *table;
-   enum acpi_blacklist_predicates oem_revision_predicate;
-   char *reason;
-   u32 is_critical_error;
-};
-
 static struct dmi_system_id acpi_rev_dmi_table[] __initdata;
 
 /*
  * POLICY: If *anything* doesn't work, put it on the blacklist.
  *If they are critical errors, mark it critical, and abort driver load.
  */
-static struct acpi_blacklist_item acpi_blacklist[] __initdata = {
+static struct acpi_platform_list acpi_blacklist[] __initdata = {
/* Compaq Presario 1700 */
{"PTLTD ", "  DSDT  ", 0x0604, ACPI_SIG_DSDT, less_than_or_equal,
 "Multiple problems", 1},
@@ -67,65 +50,27 @@ static struct acpi_blacklist_item acpi_blacklist[] __initdata = {
{"IBM   ", "TP600E  ", 0x0105, ACPI_SIG_DSDT, less_than_or_equal,
 "Incorrect _ADR", 1},
 
-   {""}
+   { }
 };
 
 int __init acpi_blacklisted(void)
 {
-   int i = 0;
+   int i;
int blacklisted = 0;
-   struct acpi_table_header table_header;
-
-   while (acpi_blacklist[i].oem_id[0] != '\0') {
-   if (acpi_get_table_header(acpi_blacklist[i].table, 0, &table_header)) {
-   i++;
-   continue;
-   }
-
-   if (strncmp(acpi_blacklist[i].oem_id, table_header.oem_id, 6)) {
-   i++;
-   continue;
-   }
-
-   if (strncmp
-   (acpi_blacklist[i].oem_table_id, table_header.oem_table_id,
-8)) {
-   i++;
-   continue;
-   }
-
-   if ((acpi_blacklist[i].oem_revision_predicate == all_versions)
-   || (acpi_blacklist[i].oem_revision_predicate ==
-   less_than_or_equal
-   && table_header.oem_revision <=
-   acpi_blacklist[i].oem_revision)
-   || (acpi_blacklist[i].oem_revision_predicate ==
-   greater_than_or_equal
-   && table_header.oem_revision >=
-   acpi_blacklist[i].oem_revision)
-   || (acpi_blacklist[i].oem_revision_predicate == equal
-   && table_header.oem_revision ==
-   acpi_blacklist[i].oem_revision)) {
 
-   printk(KERN_ERR PREFIX
-  "Vendor \"%6.6s\" System \"%8.8s\" "
-  "Revision 0x%x has a known ACPI BIOS problem.\n",
-  acpi_blacklist[i].oem_id,
-  acpi_blacklist[i].oem_table_id,
-  acpi_blacklist[i].oem_revision);
+   i = acpi_match_platform_list(acpi_blacklist);
+   if (i >= 0) {
+   pr_err(PREFIX "Vendor \"%6.6s\" System \"%8.8s\" Revision 0x%x has a known ACPI BIOS problem.\n",
+  acpi_blacklist[i].oem_id,
+  acpi_blacklist[i].oem_table_id,
+  acpi_blacklist[i].oem_revision);
 
-   printk(KERN_ERR PREFIX
-  "Reason: %s. This is a %s error\n",
-  acpi_blacklist[i].reason,
-  (acpi_blacklist[i].
-   is_critical_error ? "non-recoverable" :
-   "recoverable"));
+   pr_err(PREFIX "Reason: %s. This is a %s error\n",

[PATCH v3 4/5] EDAC: add edac_get_owner() to check MC owner

2017-08-18 Thread Toshi Kani
Only a single edac driver can be enabled for the EDAC MC interface.  When
ghes_edac is enabled, a regular edac driver for the CPU type / platform
still attempts to register itself and fails in edac_mc_add_mc().

Add edac_get_owner() so that regular edac drivers can check the owner
of EDAC MC without calling edac_mc_add_mc().
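The ownership model is a single string pointer guarded by the EDAC core; a driver's init path peeks at it before doing any probing.  A self-contained sketch of the pattern (the mock driver_init() is hypothetical and stands in for init routines such as sbridge_init() above; in the kernel, the owner slot is actually claimed inside edac_mc_add_mc(), not by the caller):

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Single-owner slot, as edac_mc_owner in edac_mc.c; normally set by
 * the first successful edac_mc_add_mc() call. */
static const char *edac_mc_owner;

/* Return the owner's mod_name, or NULL while EDAC MC is unclaimed. */
static const char *edac_get_owner(void)
{
	return edac_mc_owner;
}

/* Hypothetical driver init: bail out with -EBUSY when another module
 * (e.g. ghes_edac) already owns EDAC MC, then claim it ourselves. */
static int driver_init(const char *mod_str)
{
	const char *owner = edac_get_owner();

	if (owner && strcmp(owner, mod_str))
		return -EBUSY;

	edac_mc_owner = mod_str; /* done by edac_mc_add_mc() in the kernel */
	return 0;
}
```

The early check lets a platform driver refuse to load before allocating anything, instead of discovering the conflict deep inside edac_mc_add_mc().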

Signed-off-by: Toshi Kani <toshi.k...@hpe.com>
Suggested-by: Borislav Petkov <b...@alien8.de>
Cc: Borislav Petkov <b...@alien8.de>
Cc: Mauro Carvalho Chehab <mche...@kernel.org>
Cc: Tony Luck <tony.l...@intel.com>
---
 drivers/edac/edac_mc.c |7 ++-
 drivers/edac/edac_mc.h |8 
 2 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/drivers/edac/edac_mc.c b/drivers/edac/edac_mc.c
index 4800721..48193f5 100644
--- a/drivers/edac/edac_mc.c
+++ b/drivers/edac/edac_mc.c
@@ -53,7 +53,7 @@ static LIST_HEAD(mc_devices);
  * Used to lock EDAC MC to just one module, avoiding two drivers e. g.
  * apei/ghes and i7core_edac to be used at the same time.
  */
-static void const *edac_mc_owner;
+static const char *edac_mc_owner;
 
 static struct bus_type mc_bus[EDAC_MAX_MCS];
 
@@ -701,6 +701,11 @@ struct mem_ctl_info *edac_mc_find(int idx)
 }
 EXPORT_SYMBOL(edac_mc_find);
 
+const char *edac_get_owner(void)
+{
+   return edac_mc_owner;
+}
+EXPORT_SYMBOL_GPL(edac_get_owner);
 
 /* FIXME - should a warning be printed if no error detection? correction? */
 int edac_mc_add_mc_with_groups(struct mem_ctl_info *mci,
diff --git a/drivers/edac/edac_mc.h b/drivers/edac/edac_mc.h
index 5357800..4165e15 100644
--- a/drivers/edac/edac_mc.h
+++ b/drivers/edac/edac_mc.h
@@ -128,6 +128,14 @@ struct mem_ctl_info *edac_mc_alloc(unsigned mc_num,
   unsigned sz_pvt);
 
 /**
+ * edac_get_owner - Return the owner's mod_name of EDAC MC
+ *
+ * Returns:
+ * Pointer to mod_name string when EDAC MC is owned. NULL otherwise.
+ */
+extern const char *edac_get_owner(void);
+
+/*
  * edac_mc_add_mc_with_groups() - Insert the @mci structure into the mci
  * global list and create sysfs entries associated with @mci structure.
  *

