Re: [PATCH v11 12/13] mm/vmalloc: Hugepage vmalloc mappings

2021-01-25 Thread Ding Tianhong
On 2021/1/26 12:45, Nicholas Piggin wrote:
> Support huge page vmalloc mappings. Config option HAVE_ARCH_HUGE_VMALLOC
> enables support on architectures that define HAVE_ARCH_HUGE_VMAP and
> support PMD-sized vmap mappings.
> 
> vmalloc will attempt to allocate PMD-sized pages if allocating PMD size
> or larger, and fall back to small pages if that is unsuccessful.
> 
> Architectures must ensure that any arch specific vmalloc allocations
> that require PAGE_SIZE mappings (e.g., module allocations vs strict
> module rwx) use the VM_NO_HUGE_VMAP flag to inhibit larger mappings.
> 
> When hugepage vmalloc mappings are enabled in the next patch, this
> reduces TLB misses by nearly 30x on a `git diff` workload on a 2-node
> POWER9 (59,800 -> 2,100) and reduces CPU cycles by 0.54%.
> 
> This can result in more internal fragmentation and memory overhead for a
> given allocation, so a boot option, nohugevmalloc, is added to disable it.
> 
> Signed-off-by: Nicholas Piggin 
> ---
>  arch/Kconfig|  11 ++
>  include/linux/vmalloc.h |  21 
>  mm/page_alloc.c |   5 +-
>  mm/vmalloc.c| 215 +++-
>  4 files changed, 205 insertions(+), 47 deletions(-)
> 
> diff --git a/arch/Kconfig b/arch/Kconfig
> index 24862d15f3a3..eef170e0c9b8 100644
> --- a/arch/Kconfig
> +++ b/arch/Kconfig
> @@ -724,6 +724,17 @@ config HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
>  config HAVE_ARCH_HUGE_VMAP
>   bool
>  
> +#
> +#  Archs that select this would be capable of PMD-sized vmaps (i.e.,
> +#  arch_vmap_pmd_supported() returns true), and they must make no assumptions
> +#  that vmalloc memory is mapped with PAGE_SIZE ptes. The VM_NO_HUGE_VMAP flag
> +#  can be used to prohibit arch-specific allocations from using hugepages to
> +#  help with this (e.g., modules may require it).
> +#
> +config HAVE_ARCH_HUGE_VMALLOC
> + depends on HAVE_ARCH_HUGE_VMAP
> + bool
> +
>  config ARCH_WANT_HUGE_PMD_SHARE
>   bool
>  
> diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
> index 99ea72d547dc..93270adf5db5 100644
> --- a/include/linux/vmalloc.h
> +++ b/include/linux/vmalloc.h
> @@ -25,6 +25,7 @@ struct notifier_block;  /* in notifier.h */
>  #define VM_NO_GUARD  0x0040  /* don't add guard page */
>  #define VM_KASAN 0x0080  /* has allocated kasan shadow memory */
>  #define VM_MAP_PUT_PAGES 0x0100  /* put pages and free array in vfree */
> +#define VM_NO_HUGE_VMAP  0x0200  /* force PAGE_SIZE pte mapping */
> 
>  /*
>   * VM_KASAN is used slighly differently depending on CONFIG_KASAN_VMALLOC.
> @@ -59,6 +60,9 @@ struct vm_struct {
>   unsigned long   size;
>   unsigned long   flags;
>   struct page **pages;
> +#ifdef CONFIG_HAVE_ARCH_HUGE_VMALLOC
> + unsigned intpage_order;
> +#endif
>   unsigned intnr_pages;
>   phys_addr_t phys_addr;
>   const void  *caller;
Hi Nicholas:

A suggestion :)

The page order is only used to indicate the huge page property of a vm area,
and it is only valid when the size is bigger than PMD_SIZE, so could we use
the vm flags instead, e.g. define a new flag named VM_HUGEPAGE? That would
not break the vm struct layout, and it would make it easier for me to
backport the series to our own branches (based on the LTS version).
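
Something like this minimal sketch is what I have in mind (untested, and the
flag bit/name are only illustrative):

	/* hypothetical: mark huge-backed areas with a vm flag */
	#define VM_HUGEPAGE	0x0400	/* area is backed by huge pages */

	static inline bool is_vm_area_hugepages(const void *addr)
	{
		struct vm_struct *area = find_vm_area(addr);

		/* no new field needed in struct vm_struct */
		return area && (area->flags & VM_HUGEPAGE);
	}

That way the information travels in vm->flags like the other VM_* bits.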

Tianhong

> @@ -193,6 +197,22 @@ void free_vm_area(struct vm_struct *area);
>  extern struct vm_struct *remove_vm_area(const void *addr);
>  extern struct vm_struct *find_vm_area(const void *addr);
>  
> +static inline bool is_vm_area_hugepages(const void *addr)
> +{
> + /*
> +  * This may not 100% tell if the area is mapped with > PAGE_SIZE
> +  * page table entries, if for some reason the architecture indicates
> +  * larger sizes are available but decides not to use them, nothing
> +  * prevents that. This only indicates the size of the physical page
> +  * allocated in the vmalloc layer.
> +  */
> +#ifdef CONFIG_HAVE_ARCH_HUGE_VMALLOC
> + return find_vm_area(addr)->page_order > 0;
> +#else
> + return false;
> +#endif
> +}
> +
>  #ifdef CONFIG_MMU
>  int vmap_range(unsigned long addr, unsigned long end,
>   phys_addr_t phys_addr, pgprot_t prot,
> @@ -210,6 +230,7 @@ static inline void set_vm_flush_reset_perms(void *addr)
>   if (vm)
>   vm->flags |= VM_FLUSH_RESET_PERMS;
>  }
> +
>  #else
>  static inline int
>  map_kernel_range_noflush(unsigned long start, unsigned long size,
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 027f6481ba59..b7a9661fa232 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -72,6 +72,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include 
>  #include 
> @@ -8238,6 +8239,7 @@ void *__init alloc_large_system_hash(const char 
> *tablename,
>   void *table = NULL;
>   gfp_t gfp_flags;
>   bool 

Re: [PATCH v11 04/13] mm/ioremap: rename ioremap_*_range to vmap_*_range

2021-01-25 Thread Christoph Hellwig
Looks good,

Reviewed-by: Christoph Hellwig 


Re: [PATCH 3/5] powerpc/xive: remove unnecessary unmap_kernel_range

2021-01-25 Thread Christoph Hellwig
On Tue, Jan 26, 2021 at 02:54:02PM +1000, Nicholas Piggin wrote:
> iounmap will remove ptes.

Looks good,

Reviewed-by: Christoph Hellwig 


Re: [PATCH v11 05/13] mm: HUGE_VMAP arch support cleanup

2021-01-25 Thread Ding Tianhong
Reviewed-by: Ding Tianhong 

On 2021/1/26 12:45, Nicholas Piggin wrote:
> This changes the awkward approach where architectures provide init
> functions to determine which levels they can provide large mappings for,
> to one where the arch is queried for each call.
> 
> This removes code and indirection, and allows constant-folding of dead
> code for unsupported levels.
> 
> This also adds a prot argument to the arch query. This is unused
> currently but could help with some architectures (e.g., some powerpc
> processors can't map uncacheable memory with large pages).
> 
> Cc: linuxppc-dev@lists.ozlabs.org
> Cc: Catalin Marinas 
> Cc: Will Deacon 
> Cc: linux-arm-ker...@lists.infradead.org
> Cc: Thomas Gleixner 
> Cc: Ingo Molnar 
> Cc: Borislav Petkov 
> Cc: x...@kernel.org
> Cc: "H. Peter Anvin" 
> Acked-by: Catalin Marinas  [arm64]
> Signed-off-by: Nicholas Piggin 
> ---
>  arch/arm64/include/asm/vmalloc.h |  8 ++
>  arch/arm64/mm/mmu.c  | 10 +--
>  arch/powerpc/include/asm/vmalloc.h   |  8 ++
>  arch/powerpc/mm/book3s64/radix_pgtable.c |  8 +-
>  arch/x86/include/asm/vmalloc.h   |  7 ++
>  arch/x86/mm/ioremap.c| 12 +--
>  include/linux/io.h   |  9 ---
>  include/linux/vmalloc.h  |  6 ++
>  init/main.c  |  1 -
>  mm/ioremap.c | 94 ++--
>  10 files changed, 85 insertions(+), 78 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/vmalloc.h 
> b/arch/arm64/include/asm/vmalloc.h
> index 2ca708ab9b20..597b40405319 100644
> --- a/arch/arm64/include/asm/vmalloc.h
> +++ b/arch/arm64/include/asm/vmalloc.h
> @@ -1,4 +1,12 @@
>  #ifndef _ASM_ARM64_VMALLOC_H
>  #define _ASM_ARM64_VMALLOC_H
>  
> +#include 
> +
> +#ifdef CONFIG_HAVE_ARCH_HUGE_VMAP
> +bool arch_vmap_p4d_supported(pgprot_t prot);
> +bool arch_vmap_pud_supported(pgprot_t prot);
> +bool arch_vmap_pmd_supported(pgprot_t prot);
> +#endif
> +
>  #endif /* _ASM_ARM64_VMALLOC_H */
> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> index ae0c3d023824..1613d290cbd1 100644
> --- a/arch/arm64/mm/mmu.c
> +++ b/arch/arm64/mm/mmu.c
> @@ -1313,12 +1313,12 @@ void *__init fixmap_remap_fdt(phys_addr_t dt_phys, 
> int *size, pgprot_t prot)
>   return dt_virt;
>  }
>  
> -int __init arch_ioremap_p4d_supported(void)
> +bool arch_vmap_p4d_supported(pgprot_t prot)
>  {
> - return 0;
> + return false;
>  }
>  
> -int __init arch_ioremap_pud_supported(void)
> +bool arch_vmap_pud_supported(pgprot_t prot)
>  {
>   /*
>* Only 4k granule supports level 1 block mappings.
> @@ -1328,9 +1328,9 @@ int __init arch_ioremap_pud_supported(void)
>  !IS_ENABLED(CONFIG_PTDUMP_DEBUGFS);
>  }
>  
> -int __init arch_ioremap_pmd_supported(void)
> +bool arch_vmap_pmd_supported(pgprot_t prot)
>  {
> - /* See arch_ioremap_pud_supported() */
> + /* See arch_vmap_pud_supported() */
>   return !IS_ENABLED(CONFIG_PTDUMP_DEBUGFS);
>  }
>  
> diff --git a/arch/powerpc/include/asm/vmalloc.h 
> b/arch/powerpc/include/asm/vmalloc.h
> index b992dfaaa161..105abb73f075 100644
> --- a/arch/powerpc/include/asm/vmalloc.h
> +++ b/arch/powerpc/include/asm/vmalloc.h
> @@ -1,4 +1,12 @@
>  #ifndef _ASM_POWERPC_VMALLOC_H
>  #define _ASM_POWERPC_VMALLOC_H
>  
> +#include 
> +
> +#ifdef CONFIG_HAVE_ARCH_HUGE_VMAP
> +bool arch_vmap_p4d_supported(pgprot_t prot);
> +bool arch_vmap_pud_supported(pgprot_t prot);
> +bool arch_vmap_pmd_supported(pgprot_t prot);
> +#endif
> +
>  #endif /* _ASM_POWERPC_VMALLOC_H */
> diff --git a/arch/powerpc/mm/book3s64/radix_pgtable.c 
> b/arch/powerpc/mm/book3s64/radix_pgtable.c
> index 98f0b243c1ab..743807fc210f 100644
> --- a/arch/powerpc/mm/book3s64/radix_pgtable.c
> +++ b/arch/powerpc/mm/book3s64/radix_pgtable.c
> @@ -1082,13 +1082,13 @@ void radix__ptep_modify_prot_commit(struct 
> vm_area_struct *vma,
>   set_pte_at(mm, addr, ptep, pte);
>  }
>  
> -int __init arch_ioremap_pud_supported(void)
> +bool arch_vmap_pud_supported(pgprot_t prot)
>  {
>   /* HPT does not cope with large pages in the vmalloc area */
>   return radix_enabled();
>  }
>  
> -int __init arch_ioremap_pmd_supported(void)
> +bool arch_vmap_pmd_supported(pgprot_t prot)
>  {
>   return radix_enabled();
>  }
> @@ -1182,7 +1182,7 @@ int pmd_free_pte_page(pmd_t *pmd, unsigned long addr)
>   return 1;
>  }
>  
> -int __init arch_ioremap_p4d_supported(void)
> +bool arch_vmap_p4d_supported(pgprot_t prot)
>  {
> - return 0;
> + return false;
>  }
> diff --git a/arch/x86/include/asm/vmalloc.h b/arch/x86/include/asm/vmalloc.h
> index 29837740b520..094ea2b565f3 100644
> --- a/arch/x86/include/asm/vmalloc.h
> +++ b/arch/x86/include/asm/vmalloc.h
> @@ -1,6 +1,13 @@
>  #ifndef _ASM_X86_VMALLOC_H
>  #define _ASM_X86_VMALLOC_H
>  
> +#include 
>  #include 
>  
> +#ifdef CONFIG_HAVE_ARCH_HUGE_VMAP
> +bool arch_vmap_p4d_supported(pgprot_t prot);
> +bool 

[PATCH 3/5] powerpc/xive: remove unnecessary unmap_kernel_range

2021-01-25 Thread Nicholas Piggin
iounmap will remove ptes.

Cc: "Cédric Le Goater" 
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/sysdev/xive/common.c | 4 
 1 file changed, 4 deletions(-)

diff --git a/arch/powerpc/sysdev/xive/common.c 
b/arch/powerpc/sysdev/xive/common.c
index 595310e056f4..d6c2069cc828 100644
--- a/arch/powerpc/sysdev/xive/common.c
+++ b/arch/powerpc/sysdev/xive/common.c
@@ -959,16 +959,12 @@ EXPORT_SYMBOL_GPL(is_xive_irq);
 void xive_cleanup_irq_data(struct xive_irq_data *xd)
 {
if (xd->eoi_mmio) {
-   unmap_kernel_range((unsigned long)xd->eoi_mmio,
-  1u << xd->esb_shift);
iounmap(xd->eoi_mmio);
if (xd->eoi_mmio == xd->trig_mmio)
xd->trig_mmio = NULL;
xd->eoi_mmio = NULL;
}
if (xd->trig_mmio) {
-   unmap_kernel_range((unsigned long)xd->trig_mmio,
-  1u << xd->esb_shift);
iounmap(xd->trig_mmio);
xd->trig_mmio = NULL;
}
-- 
2.23.0



[PATCH v11 13/13] powerpc/64s/radix: Enable huge vmalloc mappings

2021-01-25 Thread Nicholas Piggin
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Nicholas Piggin 
---
 .../admin-guide/kernel-parameters.txt |  2 ++
 arch/powerpc/Kconfig  |  1 +
 arch/powerpc/kernel/module.c  | 21 +++
 3 files changed, 20 insertions(+), 4 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt 
b/Documentation/admin-guide/kernel-parameters.txt
index a10b545c2070..d62df53e5200 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -3225,6 +3225,8 @@
 
nohugeiomap [KNL,X86,PPC,ARM64] Disable kernel huge I/O mappings.
 
+   nohugevmalloc   [PPC] Disable kernel huge vmalloc mappings.
+
nosmt   [KNL,S390] Disable symmetric multithreading (SMT).
Equivalent to smt=1.
 
diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 107bb4319e0e..781da6829ab7 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -181,6 +181,7 @@ config PPC
select GENERIC_GETTIMEOFDAY
select HAVE_ARCH_AUDITSYSCALL
select HAVE_ARCH_HUGE_VMAP  if PPC_BOOK3S_64 && 
PPC_RADIX_MMU
+   select HAVE_ARCH_HUGE_VMALLOC   if HAVE_ARCH_HUGE_VMAP
select HAVE_ARCH_JUMP_LABEL
select HAVE_ARCH_KASAN  if PPC32 && PPC_PAGE_SHIFT <= 14
select HAVE_ARCH_KASAN_VMALLOC  if PPC32 && PPC_PAGE_SHIFT <= 14
diff --git a/arch/powerpc/kernel/module.c b/arch/powerpc/kernel/module.c
index a211b0253cdb..07026335d24d 100644
--- a/arch/powerpc/kernel/module.c
+++ b/arch/powerpc/kernel/module.c
@@ -87,13 +87,26 @@ int module_finalize(const Elf_Ehdr *hdr,
return 0;
 }
 
-#ifdef MODULES_VADDR
 void *module_alloc(unsigned long size)
 {
+   unsigned long start = VMALLOC_START;
+   unsigned long end = VMALLOC_END;
+
+#ifdef MODULES_VADDR
BUILD_BUG_ON(TASK_SIZE > MODULES_VADDR);
+   start = MODULES_VADDR;
+   end = MODULES_END;
+#endif
+
+   /*
+* Don't do huge page allocations for modules yet until more testing
+* is done. STRICT_MODULE_RWX may require extra work to support this
+* too.
+*/
 
-   return __vmalloc_node_range(size, 1, MODULES_VADDR, MODULES_END, GFP_KERNEL,
-   PAGE_KERNEL_EXEC, VM_FLUSH_RESET_PERMS, NUMA_NO_NODE,
+   return __vmalloc_node_range(size, 1, start, end, GFP_KERNEL,
+   PAGE_KERNEL_EXEC,
+   VM_NO_HUGE_VMAP | VM_FLUSH_RESET_PERMS,
+   NUMA_NO_NODE,
__builtin_return_address(0));
 }
-#endif
-- 
2.23.0



[PATCH v11 12/13] mm/vmalloc: Hugepage vmalloc mappings

2021-01-25 Thread Nicholas Piggin
Support huge page vmalloc mappings. Config option HAVE_ARCH_HUGE_VMALLOC
enables support on architectures that define HAVE_ARCH_HUGE_VMAP and
support PMD-sized vmap mappings.

vmalloc will attempt to allocate PMD-sized pages if allocating PMD size
or larger, and fall back to small pages if that is unsuccessful.

Architectures must ensure that any arch specific vmalloc allocations
that require PAGE_SIZE mappings (e.g., module allocations vs strict
module rwx) use the VM_NO_HUGE_VMAP flag to inhibit larger mappings.

When hugepage vmalloc mappings are enabled in the next patch, this
reduces TLB misses by nearly 30x on a `git diff` workload on a 2-node
POWER9 (59,800 -> 2,100) and reduces CPU cycles by 0.54%.

This can result in more internal fragmentation and memory overhead for a
given allocation, so a boot option, nohugevmalloc, is added to disable it.
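
For illustration, an arch-specific allocation that must stay PAGE_SIZE
mapped could pass the flag roughly as follows (sketch only, mirroring the
module_alloc() change in the next patch):

	/* keep small-page ptes for memory that is later remapped RO/X */
	p = __vmalloc_node_range(size, 1, VMALLOC_START, VMALLOC_END,
				 GFP_KERNEL, PAGE_KERNEL_EXEC,
				 VM_NO_HUGE_VMAP | VM_FLUSH_RESET_PERMS,
				 NUMA_NO_NODE, __builtin_return_address(0));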

Signed-off-by: Nicholas Piggin 
---
 arch/Kconfig|  11 ++
 include/linux/vmalloc.h |  21 
 mm/page_alloc.c |   5 +-
 mm/vmalloc.c| 215 +++-
 4 files changed, 205 insertions(+), 47 deletions(-)

diff --git a/arch/Kconfig b/arch/Kconfig
index 24862d15f3a3..eef170e0c9b8 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -724,6 +724,17 @@ config HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
 config HAVE_ARCH_HUGE_VMAP
bool
 
+#
+#  Archs that select this would be capable of PMD-sized vmaps (i.e.,
+#  arch_vmap_pmd_supported() returns true), and they must make no assumptions
+#  that vmalloc memory is mapped with PAGE_SIZE ptes. The VM_NO_HUGE_VMAP flag
+#  can be used to prohibit arch-specific allocations from using hugepages to
+#  help with this (e.g., modules may require it).
+#
+config HAVE_ARCH_HUGE_VMALLOC
+   depends on HAVE_ARCH_HUGE_VMAP
+   bool
+
 config ARCH_WANT_HUGE_PMD_SHARE
bool
 
diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
index 99ea72d547dc..93270adf5db5 100644
--- a/include/linux/vmalloc.h
+++ b/include/linux/vmalloc.h
@@ -25,6 +25,7 @@ struct notifier_block;/* in notifier.h */
 #define VM_NO_GUARD0x0040  /* don't add guard page */
 #define VM_KASAN   0x0080  /* has allocated kasan shadow memory */
 #define VM_MAP_PUT_PAGES   0x0100  /* put pages and free array in vfree */
+#define VM_NO_HUGE_VMAP    0x0200  /* force PAGE_SIZE pte mapping */
 
 /*
  * VM_KASAN is used slighly differently depending on CONFIG_KASAN_VMALLOC.
@@ -59,6 +60,9 @@ struct vm_struct {
unsigned long   size;
unsigned long   flags;
struct page **pages;
+#ifdef CONFIG_HAVE_ARCH_HUGE_VMALLOC
+   unsigned intpage_order;
+#endif
unsigned intnr_pages;
phys_addr_t phys_addr;
const void  *caller;
@@ -193,6 +197,22 @@ void free_vm_area(struct vm_struct *area);
 extern struct vm_struct *remove_vm_area(const void *addr);
 extern struct vm_struct *find_vm_area(const void *addr);
 
+static inline bool is_vm_area_hugepages(const void *addr)
+{
+   /*
+* This may not 100% tell if the area is mapped with > PAGE_SIZE
+* page table entries, if for some reason the architecture indicates
+* larger sizes are available but decides not to use them, nothing
+* prevents that. This only indicates the size of the physical page
+* allocated in the vmalloc layer.
+*/
+#ifdef CONFIG_HAVE_ARCH_HUGE_VMALLOC
+   return find_vm_area(addr)->page_order > 0;
+#else
+   return false;
+#endif
+}
+
 #ifdef CONFIG_MMU
 int vmap_range(unsigned long addr, unsigned long end,
phys_addr_t phys_addr, pgprot_t prot,
@@ -210,6 +230,7 @@ static inline void set_vm_flush_reset_perms(void *addr)
if (vm)
vm->flags |= VM_FLUSH_RESET_PERMS;
 }
+
 #else
 static inline int
 map_kernel_range_noflush(unsigned long start, unsigned long size,
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 027f6481ba59..b7a9661fa232 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -72,6 +72,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -8238,6 +8239,7 @@ void *__init alloc_large_system_hash(const char 
*tablename,
void *table = NULL;
gfp_t gfp_flags;
bool virt;
+   bool huge;
 
/* allow the kernel cmdline to have a say */
if (!numentries) {
@@ -8305,6 +8307,7 @@ void *__init alloc_large_system_hash(const char 
*tablename,
} else if (get_order(size) >= MAX_ORDER || hashdist) {
table = __vmalloc(size, gfp_flags);
virt = true;
+   huge = is_vm_area_hugepages(table);
} else {
/*
 * If bucketsize is not a power-of-two, we may free
@@ -8321,7 +8324,7 @@ void *__init alloc_large_system_hash(const char 
*tablename,
 

[PATCH v11 11/13] mm/vmalloc: add vmap_range_noflush variant

2021-01-25 Thread Nicholas Piggin
As a side-effect, the order of the flush_cache_vmap() and
arch_sync_kernel_mappings() calls is switched, but that now matches
the other callers in this file.

Reviewed-by: Christoph Hellwig 
Signed-off-by: Nicholas Piggin 
---
 mm/vmalloc.c | 16 +---
 1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index f043386bb51d..47ab4338cfff 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -240,7 +240,7 @@ static int vmap_p4d_range(pgd_t *pgd, unsigned long addr, 
unsigned long end,
return 0;
 }
 
-int vmap_range(unsigned long addr, unsigned long end,
+static int vmap_range_noflush(unsigned long addr, unsigned long end,
phys_addr_t phys_addr, pgprot_t prot,
unsigned int max_page_shift)
 {
@@ -263,14 +263,24 @@ int vmap_range(unsigned long addr, unsigned long end,
break;
} while (pgd++, phys_addr += (next - addr), addr = next, addr != end);
 
-   flush_cache_vmap(start, end);
-
if (mask & ARCH_PAGE_TABLE_SYNC_MASK)
arch_sync_kernel_mappings(start, end);
 
return err;
 }
 
+int vmap_range(unsigned long addr, unsigned long end,
+   phys_addr_t phys_addr, pgprot_t prot,
+   unsigned int max_page_shift)
+{
+   int err;
+
+   err = vmap_range_noflush(addr, end, phys_addr, prot, max_page_shift);
+   flush_cache_vmap(addr, end);
+
+   return err;
+}
+
 static void vunmap_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
 pgtbl_mod_mask *mask)
 {
-- 
2.23.0



[PATCH v11 10/13] mm: Move vmap_range from mm/ioremap.c to mm/vmalloc.c

2021-01-25 Thread Nicholas Piggin
This is a generic kernel virtual memory mapper, not specific to ioremap.

Code is unchanged other than making vmap_range non-static.

Reviewed-by: Christoph Hellwig 
Signed-off-by: Nicholas Piggin 
---
 include/linux/vmalloc.h |   3 +
 mm/ioremap.c| 203 
 mm/vmalloc.c| 202 +++
 3 files changed, 205 insertions(+), 203 deletions(-)

diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
index 9f7b8b00101b..99ea72d547dc 100644
--- a/include/linux/vmalloc.h
+++ b/include/linux/vmalloc.h
@@ -194,6 +194,9 @@ extern struct vm_struct *remove_vm_area(const void *addr);
 extern struct vm_struct *find_vm_area(const void *addr);
 
 #ifdef CONFIG_MMU
+int vmap_range(unsigned long addr, unsigned long end,
+   phys_addr_t phys_addr, pgprot_t prot,
+   unsigned int max_page_shift);
 extern int map_kernel_range_noflush(unsigned long start, unsigned long size,
pgprot_t prot, struct page **pages);
 int map_kernel_range(unsigned long start, unsigned long size, pgprot_t prot,
diff --git a/mm/ioremap.c b/mm/ioremap.c
index 3264d0203785..d1dcc7e744ac 100644
--- a/mm/ioremap.c
+++ b/mm/ioremap.c
@@ -28,209 +28,6 @@ early_param("nohugeiomap", set_nohugeiomap);
 static const bool iomap_max_page_shift = PAGE_SHIFT;
 #endif /* CONFIG_HAVE_ARCH_HUGE_VMAP */
 
-static int vmap_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
-   phys_addr_t phys_addr, pgprot_t prot,
-   pgtbl_mod_mask *mask)
-{
-   pte_t *pte;
-   u64 pfn;
-
-   pfn = phys_addr >> PAGE_SHIFT;
-   pte = pte_alloc_kernel_track(pmd, addr, mask);
-   if (!pte)
-   return -ENOMEM;
-   do {
-   BUG_ON(!pte_none(*pte));
-   set_pte_at(_mm, addr, pte, pfn_pte(pfn, prot));
-   pfn++;
-   } while (pte++, addr += PAGE_SIZE, addr != end);
-   *mask |= PGTBL_PTE_MODIFIED;
-   return 0;
-}
-
-static int vmap_try_huge_pmd(pmd_t *pmd, unsigned long addr, unsigned long end,
-   phys_addr_t phys_addr, pgprot_t prot,
-   unsigned int max_page_shift)
-{
-   if (max_page_shift < PMD_SHIFT)
-   return 0;
-
-   if (!arch_vmap_pmd_supported(prot))
-   return 0;
-
-   if ((end - addr) != PMD_SIZE)
-   return 0;
-
-   if (!IS_ALIGNED(addr, PMD_SIZE))
-   return 0;
-
-   if (!IS_ALIGNED(phys_addr, PMD_SIZE))
-   return 0;
-
-   if (pmd_present(*pmd) && !pmd_free_pte_page(pmd, addr))
-   return 0;
-
-   return pmd_set_huge(pmd, phys_addr, prot);
-}
-
-static int vmap_pmd_range(pud_t *pud, unsigned long addr, unsigned long end,
-   phys_addr_t phys_addr, pgprot_t prot,
-   unsigned int max_page_shift, pgtbl_mod_mask *mask)
-{
-   pmd_t *pmd;
-   unsigned long next;
-
-   pmd = pmd_alloc_track(&init_mm, pud, addr, mask);
-   if (!pmd)
-   return -ENOMEM;
-   do {
-   next = pmd_addr_end(addr, end);
-
-   if (vmap_try_huge_pmd(pmd, addr, next, phys_addr, prot,
-   max_page_shift)) {
-   *mask |= PGTBL_PMD_MODIFIED;
-   continue;
-   }
-
-   if (vmap_pte_range(pmd, addr, next, phys_addr, prot, mask))
-   return -ENOMEM;
-   } while (pmd++, phys_addr += (next - addr), addr = next, addr != end);
-   return 0;
-}
-
-static int vmap_try_huge_pud(pud_t *pud, unsigned long addr, unsigned long end,
-   phys_addr_t phys_addr, pgprot_t prot,
-   unsigned int max_page_shift)
-{
-   if (max_page_shift < PUD_SHIFT)
-   return 0;
-
-   if (!arch_vmap_pud_supported(prot))
-   return 0;
-
-   if ((end - addr) != PUD_SIZE)
-   return 0;
-
-   if (!IS_ALIGNED(addr, PUD_SIZE))
-   return 0;
-
-   if (!IS_ALIGNED(phys_addr, PUD_SIZE))
-   return 0;
-
-   if (pud_present(*pud) && !pud_free_pmd_page(pud, addr))
-   return 0;
-
-   return pud_set_huge(pud, phys_addr, prot);
-}
-
-static int vmap_pud_range(p4d_t *p4d, unsigned long addr, unsigned long end,
-   phys_addr_t phys_addr, pgprot_t prot,
-   unsigned int max_page_shift, pgtbl_mod_mask *mask)
-{
-   pud_t *pud;
-   unsigned long next;
-
-   pud = pud_alloc_track(&init_mm, p4d, addr, mask);
-   if (!pud)
-   return -ENOMEM;
-   do {
-   next = pud_addr_end(addr, end);
-
-   if (vmap_try_huge_pud(pud, addr, next, phys_addr, prot,
-   max_page_shift)) {
-   *mask |= PGTBL_PUD_MODIFIED;
-   continue;

[PATCH v11 09/13] mm/vmalloc: provide fallback arch huge vmap support functions

2021-01-25 Thread Nicholas Piggin
If an architecture doesn't support a particular page table level as
a huge vmap page size then allow it to skip defining the support
query function.
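
For example, an architecture that only supports PMD-level huge vmaps could
then provide just the following in its asm/vmalloc.h (illustrative sketch of
the pattern; the other levels fall back to the generic stubs, which return
false):

	#define arch_vmap_pmd_supported arch_vmap_pmd_supported
	static inline bool arch_vmap_pmd_supported(pgprot_t prot)
	{
		return true;
	}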

Suggested-by: Christoph Hellwig 
Signed-off-by: Nicholas Piggin 
---
 arch/arm64/include/asm/vmalloc.h   |  7 +++
 arch/powerpc/include/asm/vmalloc.h |  7 +++
 arch/x86/include/asm/vmalloc.h | 13 +
 include/linux/vmalloc.h| 24 
 4 files changed, 31 insertions(+), 20 deletions(-)

diff --git a/arch/arm64/include/asm/vmalloc.h b/arch/arm64/include/asm/vmalloc.h
index fc9a12d6cc1a..7a22aeea9bb5 100644
--- a/arch/arm64/include/asm/vmalloc.h
+++ b/arch/arm64/include/asm/vmalloc.h
@@ -4,11 +4,8 @@
 #include 
 
 #ifdef CONFIG_HAVE_ARCH_HUGE_VMAP
-static inline bool arch_vmap_p4d_supported(pgprot_t prot)
-{
-   return false;
-}
 
+#define arch_vmap_pud_supported arch_vmap_pud_supported
 static inline bool arch_vmap_pud_supported(pgprot_t prot)
 {
/*
@@ -19,11 +16,13 @@ static inline bool arch_vmap_pud_supported(pgprot_t prot)
   !IS_ENABLED(CONFIG_PTDUMP_DEBUGFS);
 }
 
+#define arch_vmap_pmd_supported arch_vmap_pmd_supported
 static inline bool arch_vmap_pmd_supported(pgprot_t prot)
 {
/* See arch_vmap_pud_supported() */
return !IS_ENABLED(CONFIG_PTDUMP_DEBUGFS);
 }
+
 #endif
 
 #endif /* _ASM_ARM64_VMALLOC_H */
diff --git a/arch/powerpc/include/asm/vmalloc.h 
b/arch/powerpc/include/asm/vmalloc.h
index 3f0c153befb0..4c69ece52a31 100644
--- a/arch/powerpc/include/asm/vmalloc.h
+++ b/arch/powerpc/include/asm/vmalloc.h
@@ -5,21 +5,20 @@
 #include 
 
 #ifdef CONFIG_HAVE_ARCH_HUGE_VMAP
-static inline bool arch_vmap_p4d_supported(pgprot_t prot)
-{
-   return false;
-}
 
+#define arch_vmap_pud_supported arch_vmap_pud_supported
 static inline bool arch_vmap_pud_supported(pgprot_t prot)
 {
/* HPT does not cope with large pages in the vmalloc area */
return radix_enabled();
 }
 
+#define arch_vmap_pmd_supported arch_vmap_pmd_supported
 static inline bool arch_vmap_pmd_supported(pgprot_t prot)
 {
return radix_enabled();
 }
+
 #endif
 
 #endif /* _ASM_POWERPC_VMALLOC_H */
diff --git a/arch/x86/include/asm/vmalloc.h b/arch/x86/include/asm/vmalloc.h
index e714b00fc0ca..49ce331f3ac6 100644
--- a/arch/x86/include/asm/vmalloc.h
+++ b/arch/x86/include/asm/vmalloc.h
@@ -6,24 +6,21 @@
 #include 
 
 #ifdef CONFIG_HAVE_ARCH_HUGE_VMAP
-static inline bool arch_vmap_p4d_supported(pgprot_t prot)
-{
-   return false;
-}
 
+#ifdef CONFIG_X86_64
+#define arch_vmap_pud_supported arch_vmap_pud_supported
 static inline bool arch_vmap_pud_supported(pgprot_t prot)
 {
-#ifdef CONFIG_X86_64
return boot_cpu_has(X86_FEATURE_GBPAGES);
-#else
-   return false;
-#endif
 }
+#endif
 
+#define arch_vmap_pmd_supported arch_vmap_pmd_supported
 static inline bool arch_vmap_pmd_supported(pgprot_t prot)
 {
return boot_cpu_has(X86_FEATURE_PSE);
 }
+
 #endif
 
 #endif /* _ASM_X86_VMALLOC_H */
diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
index 00bd62bd701e..9f7b8b00101b 100644
--- a/include/linux/vmalloc.h
+++ b/include/linux/vmalloc.h
@@ -83,10 +83,26 @@ struct vmap_area {
};
 };
 
-#ifndef CONFIG_HAVE_ARCH_HUGE_VMAP
-static inline bool arch_vmap_p4d_supported(pgprot_t prot) { return false; }
-static inline bool arch_vmap_pud_supported(pgprot_t prot) { return false; }
-static inline bool arch_vmap_pmd_supported(pgprot_t prot) { return false; }
+/* archs that select HAVE_ARCH_HUGE_VMAP should override one or more of these */
+#ifndef arch_vmap_p4d_supported
+static inline bool arch_vmap_p4d_supported(pgprot_t prot)
+{
+   return false;
+}
+#endif
+
+#ifndef arch_vmap_pud_supported
+static inline bool arch_vmap_pud_supported(pgprot_t prot)
+{
+   return false;
+}
+#endif
+
+#ifndef arch_vmap_pmd_supported
+static inline bool arch_vmap_pmd_supported(pgprot_t prot)
+{
+   return false;
+}
 #endif
 
 /*
-- 
2.23.0



[PATCH v11 08/13] x86: inline huge vmap supported functions

2021-01-25 Thread Nicholas Piggin
This allows unsupported levels to be constant-folded away, and so
p4d_free_pud_page can be removed because nothing references it any longer.

Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Borislav Petkov 
Cc: x...@kernel.org
Cc: "H. Peter Anvin" 
Signed-off-by: Nicholas Piggin 
---
 arch/x86/include/asm/vmalloc.h | 22 +++---
 arch/x86/mm/ioremap.c  | 21 -
 arch/x86/mm/pgtable.c  | 13 -
 3 files changed, 19 insertions(+), 37 deletions(-)

diff --git a/arch/x86/include/asm/vmalloc.h b/arch/x86/include/asm/vmalloc.h
index 094ea2b565f3..e714b00fc0ca 100644
--- a/arch/x86/include/asm/vmalloc.h
+++ b/arch/x86/include/asm/vmalloc.h
@@ -1,13 +1,29 @@
 #ifndef _ASM_X86_VMALLOC_H
 #define _ASM_X86_VMALLOC_H
 
+#include 
 #include 
 #include 
 
 #ifdef CONFIG_HAVE_ARCH_HUGE_VMAP
-bool arch_vmap_p4d_supported(pgprot_t prot);
-bool arch_vmap_pud_supported(pgprot_t prot);
-bool arch_vmap_pmd_supported(pgprot_t prot);
+static inline bool arch_vmap_p4d_supported(pgprot_t prot)
+{
+   return false;
+}
+
+static inline bool arch_vmap_pud_supported(pgprot_t prot)
+{
+#ifdef CONFIG_X86_64
+   return boot_cpu_has(X86_FEATURE_GBPAGES);
+#else
+   return false;
+#endif
+}
+
+static inline bool arch_vmap_pmd_supported(pgprot_t prot)
+{
+   return boot_cpu_has(X86_FEATURE_PSE);
+}
 #endif
 
 #endif /* _ASM_X86_VMALLOC_H */
diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
index fbaf0c447986..12c686c65ea9 100644
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -481,27 +481,6 @@ void iounmap(volatile void __iomem *addr)
 }
 EXPORT_SYMBOL(iounmap);
 
-#ifdef CONFIG_HAVE_ARCH_HUGE_VMAP
-bool arch_vmap_p4d_supported(pgprot_t prot)
-{
-   return false;
-}
-
-bool arch_vmap_pud_supported(pgprot_t prot)
-{
-#ifdef CONFIG_X86_64
-   return boot_cpu_has(X86_FEATURE_GBPAGES);
-#else
-   return false;
-#endif
-}
-
-bool arch_vmap_pmd_supported(pgprot_t prot)
-{
-   return boot_cpu_has(X86_FEATURE_PSE);
-}
-#endif
-
 /*
  * Convert a physical pointer to a virtual kernel pointer for /dev/mem
  * access
diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index f6a9e2e36642..d27cf69e811d 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -780,14 +780,6 @@ int pmd_clear_huge(pmd_t *pmd)
return 0;
 }
 
-/*
- * Until we support 512GB pages, skip them in the vmap area.
- */
-int p4d_free_pud_page(p4d_t *p4d, unsigned long addr)
-{
-   return 0;
-}
-
 #ifdef CONFIG_X86_64
 /**
  * pud_free_pmd_page - Clear pud entry and free pmd page.
@@ -861,11 +853,6 @@ int pmd_free_pte_page(pmd_t *pmd, unsigned long addr)
 
 #else /* !CONFIG_X86_64 */
 
-int pud_free_pmd_page(pud_t *pud, unsigned long addr)
-{
-   return pud_none(*pud);
-}
-
 /*
  * Disable free page handling on x86-PAE. This assures that ioremap()
  * does not update sync'd pmd entries. See vmalloc_sync_one().
-- 
2.23.0



[PATCH v11 07/13] arm64: inline huge vmap supported functions

2021-01-25 Thread Nicholas Piggin
This allows unsupported levels to be constant-folded away, and so
p4d_free_pud_page can be removed because nothing references it any longer.

Cc: Catalin Marinas 
Cc: Will Deacon 
Cc: linux-arm-ker...@lists.infradead.org
Acked-by: Catalin Marinas 
Signed-off-by: Nicholas Piggin 
---
 arch/arm64/include/asm/vmalloc.h | 23 ---
 arch/arm64/mm/mmu.c  | 26 --
 2 files changed, 20 insertions(+), 29 deletions(-)

diff --git a/arch/arm64/include/asm/vmalloc.h b/arch/arm64/include/asm/vmalloc.h
index 597b40405319..fc9a12d6cc1a 100644
--- a/arch/arm64/include/asm/vmalloc.h
+++ b/arch/arm64/include/asm/vmalloc.h
@@ -4,9 +4,26 @@
 #include 
 
 #ifdef CONFIG_HAVE_ARCH_HUGE_VMAP
-bool arch_vmap_p4d_supported(pgprot_t prot);
-bool arch_vmap_pud_supported(pgprot_t prot);
-bool arch_vmap_pmd_supported(pgprot_t prot);
+static inline bool arch_vmap_p4d_supported(pgprot_t prot)
+{
+   return false;
+}
+
+static inline bool arch_vmap_pud_supported(pgprot_t prot)
+{
+   /*
+* Only 4k granule supports level 1 block mappings.
+* SW table walks can't handle removal of intermediate entries.
+*/
+   return IS_ENABLED(CONFIG_ARM64_4K_PAGES) &&
+  !IS_ENABLED(CONFIG_PTDUMP_DEBUGFS);
+}
+
+static inline bool arch_vmap_pmd_supported(pgprot_t prot)
+{
+   /* See arch_vmap_pud_supported() */
+   return !IS_ENABLED(CONFIG_PTDUMP_DEBUGFS);
+}
 #endif
 
 #endif /* _ASM_ARM64_VMALLOC_H */
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 1613d290cbd1..ab9ba7c36dae 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -1313,27 +1313,6 @@ void *__init fixmap_remap_fdt(phys_addr_t dt_phys, int 
*size, pgprot_t prot)
return dt_virt;
 }
 
-bool arch_vmap_p4d_supported(pgprot_t prot)
-{
-   return false;
-}
-
-bool arch_vmap_pud_supported(pgprot_t prot)
-{
-   /*
-* Only 4k granule supports level 1 block mappings.
-* SW table walks can't handle removal of intermediate entries.
-*/
-   return IS_ENABLED(CONFIG_ARM64_4K_PAGES) &&
-  !IS_ENABLED(CONFIG_PTDUMP_DEBUGFS);
-}
-
-bool arch_vmap_pmd_supported(pgprot_t prot)
-{
-   /* See arch_vmap_pud_supported() */
-   return !IS_ENABLED(CONFIG_PTDUMP_DEBUGFS);
-}
-
 int pud_set_huge(pud_t *pudp, phys_addr_t phys, pgprot_t prot)
 {
pud_t new_pud = pfn_pud(__phys_to_pfn(phys), mk_pud_sect_prot(prot));
@@ -1425,11 +1404,6 @@ int pud_free_pmd_page(pud_t *pudp, unsigned long addr)
return 1;
 }
 
-int p4d_free_pud_page(p4d_t *p4d, unsigned long addr)
-{
-   return 0;   /* Don't attempt a block mapping */
-}
-
 #ifdef CONFIG_MEMORY_HOTPLUG
 static void __remove_pgd_mapping(pgd_t *pgdir, unsigned long start, u64 size)
 {
-- 
2.23.0



[PATCH v11 06/13] powerpc: inline huge vmap supported functions

2021-01-25 Thread Nicholas Piggin
This allows unsupported levels to be constant-folded away, and so
p4d_free_pud_page can be removed because nothing references it any longer.

Cc: linuxppc-dev@lists.ozlabs.org
Acked-by: Michael Ellerman 
Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/include/asm/vmalloc.h   | 19 ---
 arch/powerpc/mm/book3s64/radix_pgtable.c | 21 -
 2 files changed, 16 insertions(+), 24 deletions(-)

diff --git a/arch/powerpc/include/asm/vmalloc.h 
b/arch/powerpc/include/asm/vmalloc.h
index 105abb73f075..3f0c153befb0 100644
--- a/arch/powerpc/include/asm/vmalloc.h
+++ b/arch/powerpc/include/asm/vmalloc.h
@@ -1,12 +1,25 @@
 #ifndef _ASM_POWERPC_VMALLOC_H
 #define _ASM_POWERPC_VMALLOC_H
 
+#include 
 #include 
 
 #ifdef CONFIG_HAVE_ARCH_HUGE_VMAP
-bool arch_vmap_p4d_supported(pgprot_t prot);
-bool arch_vmap_pud_supported(pgprot_t prot);
-bool arch_vmap_pmd_supported(pgprot_t prot);
+static inline bool arch_vmap_p4d_supported(pgprot_t prot)
+{
+   return false;
+}
+
+static inline bool arch_vmap_pud_supported(pgprot_t prot)
+{
+   /* HPT does not cope with large pages in the vmalloc area */
+   return radix_enabled();
+}
+
+static inline bool arch_vmap_pmd_supported(pgprot_t prot)
+{
+   return radix_enabled();
+}
 #endif
 
 #endif /* _ASM_POWERPC_VMALLOC_H */
diff --git a/arch/powerpc/mm/book3s64/radix_pgtable.c 
b/arch/powerpc/mm/book3s64/radix_pgtable.c
index 743807fc210f..8da62afccee5 100644
--- a/arch/powerpc/mm/book3s64/radix_pgtable.c
+++ b/arch/powerpc/mm/book3s64/radix_pgtable.c
@@ -1082,22 +1082,6 @@ void radix__ptep_modify_prot_commit(struct 
vm_area_struct *vma,
set_pte_at(mm, addr, ptep, pte);
 }
 
-bool arch_vmap_pud_supported(pgprot_t prot)
-{
-   /* HPT does not cope with large pages in the vmalloc area */
-   return radix_enabled();
-}
-
-bool arch_vmap_pmd_supported(pgprot_t prot)
-{
-   return radix_enabled();
-}
-
-int p4d_free_pud_page(p4d_t *p4d, unsigned long addr)
-{
-   return 0;
-}
-
 int pud_set_huge(pud_t *pud, phys_addr_t addr, pgprot_t prot)
 {
pte_t *ptep = (pte_t *)pud;
@@ -1181,8 +1165,3 @@ int pmd_free_pte_page(pmd_t *pmd, unsigned long addr)
 
return 1;
 }
-
-bool arch_vmap_p4d_supported(pgprot_t prot)
-{
-   return false;
-}
-- 
2.23.0



[PATCH v11 05/13] mm: HUGE_VMAP arch support cleanup

2021-01-25 Thread Nicholas Piggin
This changes the awkward approach where architectures provide init
functions to determine which levels they can provide large mappings for,
to one where the arch is queried for each call.

This removes code and indirection, and allows constant-folding of dead
code for unsupported levels.

This also adds a prot argument to the arch query. This is unused
currently but could help with some architectures (e.g., some powerpc
processors can't map uncacheable memory with large pages).
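
Concretely, the generic mapping code now asks the architecture on every
attempt, e.g. the PMD path in mm/ioremap.c ends up doing (excerpt):

	if (!arch_vmap_pmd_supported(prot))
		return 0;	/* fall back to PTE-level mappings */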

Cc: linuxppc-dev@lists.ozlabs.org
Cc: Catalin Marinas 
Cc: Will Deacon 
Cc: linux-arm-ker...@lists.infradead.org
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Borislav Petkov 
Cc: x...@kernel.org
Cc: "H. Peter Anvin" 
Acked-by: Catalin Marinas  [arm64]
Signed-off-by: Nicholas Piggin 
---
 arch/arm64/include/asm/vmalloc.h |  8 ++
 arch/arm64/mm/mmu.c  | 10 +--
 arch/powerpc/include/asm/vmalloc.h   |  8 ++
 arch/powerpc/mm/book3s64/radix_pgtable.c |  8 +-
 arch/x86/include/asm/vmalloc.h   |  7 ++
 arch/x86/mm/ioremap.c| 12 +--
 include/linux/io.h   |  9 ---
 include/linux/vmalloc.h  |  6 ++
 init/main.c  |  1 -
 mm/ioremap.c | 94 ++--
 10 files changed, 85 insertions(+), 78 deletions(-)

diff --git a/arch/arm64/include/asm/vmalloc.h b/arch/arm64/include/asm/vmalloc.h
index 2ca708ab9b20..597b40405319 100644
--- a/arch/arm64/include/asm/vmalloc.h
+++ b/arch/arm64/include/asm/vmalloc.h
@@ -1,4 +1,12 @@
 #ifndef _ASM_ARM64_VMALLOC_H
 #define _ASM_ARM64_VMALLOC_H
 
+#include 
+
+#ifdef CONFIG_HAVE_ARCH_HUGE_VMAP
+bool arch_vmap_p4d_supported(pgprot_t prot);
+bool arch_vmap_pud_supported(pgprot_t prot);
+bool arch_vmap_pmd_supported(pgprot_t prot);
+#endif
+
 #endif /* _ASM_ARM64_VMALLOC_H */
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index ae0c3d023824..1613d290cbd1 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -1313,12 +1313,12 @@ void *__init fixmap_remap_fdt(phys_addr_t dt_phys, int 
*size, pgprot_t prot)
return dt_virt;
 }
 
-int __init arch_ioremap_p4d_supported(void)
+bool arch_vmap_p4d_supported(pgprot_t prot)
 {
-   return 0;
+   return false;
 }
 
-int __init arch_ioremap_pud_supported(void)
+bool arch_vmap_pud_supported(pgprot_t prot)
 {
/*
 * Only 4k granule supports level 1 block mappings.
@@ -1328,9 +1328,9 @@ int __init arch_ioremap_pud_supported(void)
   !IS_ENABLED(CONFIG_PTDUMP_DEBUGFS);
 }
 
-int __init arch_ioremap_pmd_supported(void)
+bool arch_vmap_pmd_supported(pgprot_t prot)
 {
-   /* See arch_ioremap_pud_supported() */
+   /* See arch_vmap_pud_supported() */
return !IS_ENABLED(CONFIG_PTDUMP_DEBUGFS);
 }
 
diff --git a/arch/powerpc/include/asm/vmalloc.h 
b/arch/powerpc/include/asm/vmalloc.h
index b992dfaaa161..105abb73f075 100644
--- a/arch/powerpc/include/asm/vmalloc.h
+++ b/arch/powerpc/include/asm/vmalloc.h
@@ -1,4 +1,12 @@
 #ifndef _ASM_POWERPC_VMALLOC_H
 #define _ASM_POWERPC_VMALLOC_H
 
+#include 
+
+#ifdef CONFIG_HAVE_ARCH_HUGE_VMAP
+bool arch_vmap_p4d_supported(pgprot_t prot);
+bool arch_vmap_pud_supported(pgprot_t prot);
+bool arch_vmap_pmd_supported(pgprot_t prot);
+#endif
+
 #endif /* _ASM_POWERPC_VMALLOC_H */
diff --git a/arch/powerpc/mm/book3s64/radix_pgtable.c 
b/arch/powerpc/mm/book3s64/radix_pgtable.c
index 98f0b243c1ab..743807fc210f 100644
--- a/arch/powerpc/mm/book3s64/radix_pgtable.c
+++ b/arch/powerpc/mm/book3s64/radix_pgtable.c
@@ -1082,13 +1082,13 @@ void radix__ptep_modify_prot_commit(struct 
vm_area_struct *vma,
set_pte_at(mm, addr, ptep, pte);
 }
 
-int __init arch_ioremap_pud_supported(void)
+bool arch_vmap_pud_supported(pgprot_t prot)
 {
/* HPT does not cope with large pages in the vmalloc area */
return radix_enabled();
 }
 
-int __init arch_ioremap_pmd_supported(void)
+bool arch_vmap_pmd_supported(pgprot_t prot)
 {
return radix_enabled();
 }
@@ -1182,7 +1182,7 @@ int pmd_free_pte_page(pmd_t *pmd, unsigned long addr)
return 1;
 }
 
-int __init arch_ioremap_p4d_supported(void)
+bool arch_vmap_p4d_supported(pgprot_t prot)
 {
-   return 0;
+   return false;
 }
diff --git a/arch/x86/include/asm/vmalloc.h b/arch/x86/include/asm/vmalloc.h
index 29837740b520..094ea2b565f3 100644
--- a/arch/x86/include/asm/vmalloc.h
+++ b/arch/x86/include/asm/vmalloc.h
@@ -1,6 +1,13 @@
 #ifndef _ASM_X86_VMALLOC_H
 #define _ASM_X86_VMALLOC_H
 
+#include 
 #include 
 
+#ifdef CONFIG_HAVE_ARCH_HUGE_VMAP
+bool arch_vmap_p4d_supported(pgprot_t prot);
+bool arch_vmap_pud_supported(pgprot_t prot);
+bool arch_vmap_pmd_supported(pgprot_t prot);
+#endif
+
 #endif /* _ASM_X86_VMALLOC_H */
diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
index 9e5ccc56f8e0..fbaf0c447986 100644
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -481,24 +481,26 @@ void iounmap(volatile void __iomem *addr)
 }
 

[PATCH v11 04/13] mm/ioremap: rename ioremap_*_range to vmap_*_range

2021-01-25 Thread Nicholas Piggin
This will be used as a generic kernel virtual mapping function, so
re-name it in preparation.

Signed-off-by: Nicholas Piggin 
---
 mm/ioremap.c | 64 +++-
 1 file changed, 33 insertions(+), 31 deletions(-)

diff --git a/mm/ioremap.c b/mm/ioremap.c
index 5fa1ab41d152..3f4d36f9745a 100644
--- a/mm/ioremap.c
+++ b/mm/ioremap.c
@@ -61,9 +61,9 @@ static inline int ioremap_pud_enabled(void) { return 0; }
 static inline int ioremap_pmd_enabled(void) { return 0; }
 #endif /* CONFIG_HAVE_ARCH_HUGE_VMAP */
 
-static int ioremap_pte_range(pmd_t *pmd, unsigned long addr,
-   unsigned long end, phys_addr_t phys_addr, pgprot_t prot,
-   pgtbl_mod_mask *mask)
+static int vmap_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
+   phys_addr_t phys_addr, pgprot_t prot,
+   pgtbl_mod_mask *mask)
 {
pte_t *pte;
u64 pfn;
@@ -81,9 +81,8 @@ static int ioremap_pte_range(pmd_t *pmd, unsigned long addr,
return 0;
 }
 
-static int ioremap_try_huge_pmd(pmd_t *pmd, unsigned long addr,
-   unsigned long end, phys_addr_t phys_addr,
-   pgprot_t prot)
+static int vmap_try_huge_pmd(pmd_t *pmd, unsigned long addr, unsigned long end,
+   phys_addr_t phys_addr, pgprot_t prot)
 {
if (!ioremap_pmd_enabled())
return 0;
@@ -103,9 +102,9 @@ static int ioremap_try_huge_pmd(pmd_t *pmd, unsigned long 
addr,
return pmd_set_huge(pmd, phys_addr, prot);
 }
 
-static inline int ioremap_pmd_range(pud_t *pud, unsigned long addr,
-   unsigned long end, phys_addr_t phys_addr, pgprot_t prot,
-   pgtbl_mod_mask *mask)
+static int vmap_pmd_range(pud_t *pud, unsigned long addr, unsigned long end,
+   phys_addr_t phys_addr, pgprot_t prot,
+   pgtbl_mod_mask *mask)
 {
pmd_t *pmd;
unsigned long next;
@@ -116,20 +115,19 @@ static inline int ioremap_pmd_range(pud_t *pud, unsigned 
long addr,
do {
next = pmd_addr_end(addr, end);
 
-   if (ioremap_try_huge_pmd(pmd, addr, next, phys_addr, prot)) {
+   if (vmap_try_huge_pmd(pmd, addr, next, phys_addr, prot)) {
*mask |= PGTBL_PMD_MODIFIED;
continue;
}
 
-   if (ioremap_pte_range(pmd, addr, next, phys_addr, prot, mask))
+   if (vmap_pte_range(pmd, addr, next, phys_addr, prot, mask))
return -ENOMEM;
} while (pmd++, phys_addr += (next - addr), addr = next, addr != end);
return 0;
 }
 
-static int ioremap_try_huge_pud(pud_t *pud, unsigned long addr,
-   unsigned long end, phys_addr_t phys_addr,
-   pgprot_t prot)
+static int vmap_try_huge_pud(pud_t *pud, unsigned long addr, unsigned long end,
+   phys_addr_t phys_addr, pgprot_t prot)
 {
if (!ioremap_pud_enabled())
return 0;
@@ -149,9 +147,9 @@ static int ioremap_try_huge_pud(pud_t *pud, unsigned long 
addr,
return pud_set_huge(pud, phys_addr, prot);
 }
 
-static inline int ioremap_pud_range(p4d_t *p4d, unsigned long addr,
-   unsigned long end, phys_addr_t phys_addr, pgprot_t prot,
-   pgtbl_mod_mask *mask)
+static int vmap_pud_range(p4d_t *p4d, unsigned long addr, unsigned long end,
+   phys_addr_t phys_addr, pgprot_t prot,
+   pgtbl_mod_mask *mask)
 {
pud_t *pud;
unsigned long next;
@@ -162,20 +160,19 @@ static inline int ioremap_pud_range(p4d_t *p4d, unsigned 
long addr,
do {
next = pud_addr_end(addr, end);
 
-   if (ioremap_try_huge_pud(pud, addr, next, phys_addr, prot)) {
+   if (vmap_try_huge_pud(pud, addr, next, phys_addr, prot)) {
*mask |= PGTBL_PUD_MODIFIED;
continue;
}
 
-   if (ioremap_pmd_range(pud, addr, next, phys_addr, prot, mask))
+   if (vmap_pmd_range(pud, addr, next, phys_addr, prot, mask))
return -ENOMEM;
} while (pud++, phys_addr += (next - addr), addr = next, addr != end);
return 0;
 }
 
-static int ioremap_try_huge_p4d(p4d_t *p4d, unsigned long addr,
-   unsigned long end, phys_addr_t phys_addr,
-   pgprot_t prot)
+static int vmap_try_huge_p4d(p4d_t *p4d, unsigned long addr, unsigned long end,
+   phys_addr_t phys_addr, pgprot_t prot)
 {
if (!ioremap_p4d_enabled())
return 0;
@@ -195,9 +192,9 @@ static int ioremap_try_huge_p4d(p4d_t *p4d, unsigned long 
addr,
return p4d_set_huge(p4d, phys_addr, prot);
 }
 
-static inline int ioremap_p4d_range(pgd_t *pgd, unsigned long addr,
-   

[PATCH v11 03/13] mm/vmalloc: rename vmap_*_range vmap_pages_*_range

2021-01-25 Thread Nicholas Piggin
The vmalloc mapper operates on a struct page * array rather than a
linear physical address, so re-name it to make this distinction clear.

Reviewed-by: Christoph Hellwig 
Signed-off-by: Nicholas Piggin 
---
 mm/vmalloc.c | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 62372f9e0167..7f2f36116980 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -189,7 +189,7 @@ void unmap_kernel_range_noflush(unsigned long start, 
unsigned long size)
arch_sync_kernel_mappings(start, end);
 }
 
-static int vmap_pte_range(pmd_t *pmd, unsigned long addr,
+static int vmap_pages_pte_range(pmd_t *pmd, unsigned long addr,
unsigned long end, pgprot_t prot, struct page **pages, int *nr,
pgtbl_mod_mask *mask)
 {
@@ -217,7 +217,7 @@ static int vmap_pte_range(pmd_t *pmd, unsigned long addr,
return 0;
 }
 
-static int vmap_pmd_range(pud_t *pud, unsigned long addr,
+static int vmap_pages_pmd_range(pud_t *pud, unsigned long addr,
unsigned long end, pgprot_t prot, struct page **pages, int *nr,
pgtbl_mod_mask *mask)
 {
@@ -229,13 +229,13 @@ static int vmap_pmd_range(pud_t *pud, unsigned long addr,
return -ENOMEM;
do {
next = pmd_addr_end(addr, end);
-   if (vmap_pte_range(pmd, addr, next, prot, pages, nr, mask))
+   if (vmap_pages_pte_range(pmd, addr, next, prot, pages, nr, mask))
return -ENOMEM;
} while (pmd++, addr = next, addr != end);
return 0;
 }
 
-static int vmap_pud_range(p4d_t *p4d, unsigned long addr,
+static int vmap_pages_pud_range(p4d_t *p4d, unsigned long addr,
unsigned long end, pgprot_t prot, struct page **pages, int *nr,
pgtbl_mod_mask *mask)
 {
@@ -247,13 +247,13 @@ static int vmap_pud_range(p4d_t *p4d, unsigned long addr,
return -ENOMEM;
do {
next = pud_addr_end(addr, end);
-   if (vmap_pmd_range(pud, addr, next, prot, pages, nr, mask))
+   if (vmap_pages_pmd_range(pud, addr, next, prot, pages, nr, mask))
return -ENOMEM;
} while (pud++, addr = next, addr != end);
return 0;
 }
 
-static int vmap_p4d_range(pgd_t *pgd, unsigned long addr,
+static int vmap_pages_p4d_range(pgd_t *pgd, unsigned long addr,
unsigned long end, pgprot_t prot, struct page **pages, int *nr,
pgtbl_mod_mask *mask)
 {
@@ -265,7 +265,7 @@ static int vmap_p4d_range(pgd_t *pgd, unsigned long addr,
return -ENOMEM;
do {
next = p4d_addr_end(addr, end);
-   if (vmap_pud_range(p4d, addr, next, prot, pages, nr, mask))
+   if (vmap_pages_pud_range(p4d, addr, next, prot, pages, nr, mask))
return -ENOMEM;
} while (p4d++, addr = next, addr != end);
return 0;
@@ -306,7 +306,7 @@ int map_kernel_range_noflush(unsigned long addr, unsigned 
long size,
next = pgd_addr_end(addr, end);
if (pgd_bad(*pgd))
mask |= PGTBL_PGD_MODIFIED;
-   err = vmap_p4d_range(pgd, addr, next, prot, pages, &nr, &mask);
+   err = vmap_pages_p4d_range(pgd, addr, next, prot, pages, , 
);
if (err)
return err;
} while (pgd++, addr = next, addr != end);
-- 
2.23.0



[PATCH v11 02/13] mm: apply_to_pte_range warn and fail if a large pte is encountered

2021-01-25 Thread Nicholas Piggin
apply_to_pte_range might mistake a large pte for bad, or treat it as a
page table, resulting in a crash or corruption. Add a test to warn and
return error if large entries are found.

Reviewed-by: Christoph Hellwig 
Signed-off-by: Nicholas Piggin 
---
 mm/memory.c | 66 +++--
 1 file changed, 49 insertions(+), 17 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index feff48e1465a..672e39a72788 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2440,13 +2440,21 @@ static int apply_to_pmd_range(struct mm_struct *mm, 
pud_t *pud,
}
do {
next = pmd_addr_end(addr, end);
-   if (create || !pmd_none_or_clear_bad(pmd)) {
-   err = apply_to_pte_range(mm, pmd, addr, next, fn, data,
-create, mask);
-   if (err)
-   break;
+   if (pmd_none(*pmd) && !create)
+   continue;
+   if (WARN_ON_ONCE(pmd_leaf(*pmd)))
+   return -EINVAL;
+   if (!pmd_none(*pmd) && WARN_ON_ONCE(pmd_bad(*pmd))) {
+   if (!create)
+   continue;
+   pmd_clear_bad(pmd);
}
+   err = apply_to_pte_range(mm, pmd, addr, next,
+fn, data, create, mask);
+   if (err)
+   break;
} while (pmd++, addr = next, addr != end);
+
return err;
 }
 
@@ -2468,13 +2476,21 @@ static int apply_to_pud_range(struct mm_struct *mm, 
p4d_t *p4d,
}
do {
next = pud_addr_end(addr, end);
-   if (create || !pud_none_or_clear_bad(pud)) {
-   err = apply_to_pmd_range(mm, pud, addr, next, fn, data,
-create, mask);
-   if (err)
-   break;
+   if (pud_none(*pud) && !create)
+   continue;
+   if (WARN_ON_ONCE(pud_leaf(*pud)))
+   return -EINVAL;
+   if (!pud_none(*pud) && WARN_ON_ONCE(pud_bad(*pud))) {
+   if (!create)
+   continue;
+   pud_clear_bad(pud);
}
+   err = apply_to_pmd_range(mm, pud, addr, next,
+fn, data, create, mask);
+   if (err)
+   break;
} while (pud++, addr = next, addr != end);
+
return err;
 }
 
@@ -2496,13 +2512,21 @@ static int apply_to_p4d_range(struct mm_struct *mm, 
pgd_t *pgd,
}
do {
next = p4d_addr_end(addr, end);
-   if (create || !p4d_none_or_clear_bad(p4d)) {
-   err = apply_to_pud_range(mm, p4d, addr, next, fn, data,
-create, mask);
-   if (err)
-   break;
+   if (p4d_none(*p4d) && !create)
+   continue;
+   if (WARN_ON_ONCE(p4d_leaf(*p4d)))
+   return -EINVAL;
+   if (!p4d_none(*p4d) && WARN_ON_ONCE(p4d_bad(*p4d))) {
+   if (!create)
+   continue;
+   p4d_clear_bad(p4d);
}
+   err = apply_to_pud_range(mm, p4d, addr, next,
+fn, data, create, mask);
+   if (err)
+   break;
} while (p4d++, addr = next, addr != end);
+
return err;
 }
 
@@ -2522,9 +2546,17 @@ static int __apply_to_page_range(struct mm_struct *mm, 
unsigned long addr,
pgd = pgd_offset(mm, addr);
do {
next = pgd_addr_end(addr, end);
-   if (!create && pgd_none_or_clear_bad(pgd))
+   if (pgd_none(*pgd) && !create)
continue;
-   err = apply_to_p4d_range(mm, pgd, addr, next, fn, data, create, &mask);
+   if (WARN_ON_ONCE(pgd_leaf(*pgd)))
+   return -EINVAL;
+   if (!pgd_none(*pgd) && WARN_ON_ONCE(pgd_bad(*pgd))) {
+   if (!create)
+   continue;
+   pgd_clear_bad(pgd);
+   }
+   err = apply_to_p4d_range(mm, pgd, addr, next,
+fn, data, create, &mask);
if (err)
break;
} while (pgd++, addr = next, addr != end);
-- 
2.23.0



[PATCH v11 01/13] mm/vmalloc: fix HUGE_VMAP regression by enabling huge pages in vmalloc_to_page

2021-01-25 Thread Nicholas Piggin
vmalloc_to_page returns NULL for addresses mapped by larger pages[*].
Whether or not a vmap is huge depends on the architecture details,
alignments, boot options, etc., which the caller can not be expected
to know. Therefore HUGE_VMAP is a regression for vmalloc_to_page.

This change teaches vmalloc_to_page about larger pages, and returns
the struct page that corresponds to the offset within the large page.
This makes the API agnostic to mapping implementation details.

[*] As explained by commit 029c54b095995 ("mm/vmalloc.c: huge-vmap:
fail gracefully on unexpected huge vmap mappings")
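
As a usage sketch (hypothetical sizes; huge-mapped vmalloc itself only
arrives later in this series), callers need no awareness of how the region
happens to be mapped:

	void *buf = vmalloc(4 * 1024 * 1024);	/* may end up PMD-mapped */
	/* returns the small page covering buf + 5 * PAGE_SIZE, whether the
	 * underlying entry is a pte or a pmd leaf */
	struct page *pg = vmalloc_to_page(buf + 5 * PAGE_SIZE);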

Reviewed-by: Christoph Hellwig 
Signed-off-by: Nicholas Piggin 
---
 mm/vmalloc.c | 41 ++---
 1 file changed, 26 insertions(+), 15 deletions(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index e6f352bf0498..62372f9e0167 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -34,7 +34,7 @@
 #include 
 #include 
 #include 
-
+#include 
 #include 
 #include 
 #include 
@@ -343,7 +343,9 @@ int is_vmalloc_or_module_addr(const void *x)
 }
 
 /*
- * Walk a vmap address to the struct page it maps.
+ * Walk a vmap address to the struct page it maps. Huge vmap mappings will
+ * return the tail page that corresponds to the base page address, which
+ * matches small vmap mappings.
  */
 struct page *vmalloc_to_page(const void *vmalloc_addr)
 {
@@ -363,25 +365,33 @@ struct page *vmalloc_to_page(const void *vmalloc_addr)
 
if (pgd_none(*pgd))
return NULL;
+   if (WARN_ON_ONCE(pgd_leaf(*pgd)))
+   return NULL; /* XXX: no allowance for huge pgd */
+   if (WARN_ON_ONCE(pgd_bad(*pgd)))
+   return NULL;
+
p4d = p4d_offset(pgd, addr);
if (p4d_none(*p4d))
return NULL;
-   pud = pud_offset(p4d, addr);
+   if (p4d_leaf(*p4d))
+   return p4d_page(*p4d) + ((addr & ~P4D_MASK) >> PAGE_SHIFT);
+   if (WARN_ON_ONCE(p4d_bad(*p4d)))
+   return NULL;
 
-   /*
-* Don't dereference bad PUD or PMD (below) entries. This will also
-* identify huge mappings, which we may encounter on architectures
-* that define CONFIG_HAVE_ARCH_HUGE_VMAP=y. Such regions will be
-* identified as vmalloc addresses by is_vmalloc_addr(), but are
-* not [unambiguously] associated with a struct page, so there is
-* no correct value to return for them.
-*/
-   WARN_ON_ONCE(pud_bad(*pud));
-   if (pud_none(*pud) || pud_bad(*pud))
+   pud = pud_offset(p4d, addr);
+   if (pud_none(*pud))
+   return NULL;
+   if (pud_leaf(*pud))
+   return pud_page(*pud) + ((addr & ~PUD_MASK) >> PAGE_SHIFT);
+   if (WARN_ON_ONCE(pud_bad(*pud)))
return NULL;
+
pmd = pmd_offset(pud, addr);
-   WARN_ON_ONCE(pmd_bad(*pmd));
-   if (pmd_none(*pmd) || pmd_bad(*pmd))
+   if (pmd_none(*pmd))
+   return NULL;
+   if (pmd_leaf(*pmd))
+   return pmd_page(*pmd) + ((addr & ~PMD_MASK) >> PAGE_SHIFT);
+   if (WARN_ON_ONCE(pmd_bad(*pmd)))
return NULL;
 
ptep = pte_offset_map(pmd, addr);
@@ -389,6 +399,7 @@ struct page *vmalloc_to_page(const void *vmalloc_addr)
if (pte_present(pte))
page = pte_page(pte);
pte_unmap(ptep);
+
return page;
 }
 EXPORT_SYMBOL(vmalloc_to_page);
-- 
2.23.0



[PATCH v11 00/13] huge vmalloc mappings

2021-01-25 Thread Nicholas Piggin
I think I ended up implementing all Christoph's comments because
they turned out better in the end. Cleanups coming in another
series though.

Thanks,
Nick

Since v10:
- Fixed code style, most > 80 columns, tweaked patch titles, etc. [thanks Christoph]
- Made huge vmalloc code and data structure compile away if unselected
  [Christoph]
- Archs only have to provide arch_vmap_p?d_supported for levels they
  implement [Christoph]

Since v9:
- Fixed intermediate build breakage on x86-32 !PAE [thanks Ding]
- Fixed small page fallback case vm_struct double-free [thanks Ding]

Since v8:
- Fixed nommu compile.
- Added Kconfig option help text
- Added VM_NOHUGE which should help archs implement it [suggested by Rick]

Since v7:
- Rebase, added some acks, compile fix
- Removed "order=" from vmallocinfo, it's a bit confusing (nr_pages
  is in small page size for compatibility).
- Added arch_vmap_pmd_supported() test before starting to allocate
  the large page, rather than only testing it when doing the map, to
  avoid unsupported configs trying to allocate huge pages for no
  reason.

Since v6:
- Fixed a false positive warning introduced in patch 2, found by
  kbuild test robot.

Since v5:
- Split arch changes out better and make the constant folding work
- Avoid most of the 80 column wrap, fix a reference to lib/ioremap.c
- Fix compile error on some archs

Since v4:
- Fixed an off-by-page-order bug in v4
- Several minor cleanups.
- Added page order to /proc/vmallocinfo
- Added hugepage to alloc_large_system_hash output.
- Made an architecture config option, powerpc only for now.

Since v3:
- Fixed an off-by-one bug in a loop
- Fix !CONFIG_HAVE_ARCH_HUGE_VMAP build fail

Nicholas Piggin (13):
  mm/vmalloc: fix HUGE_VMAP regression by enabling huge pages in
vmalloc_to_page
  mm: apply_to_pte_range warn and fail if a large pte is encountered
  mm/vmalloc: rename vmap_*_range vmap_pages_*_range
  mm/ioremap: rename ioremap_*_range to vmap_*_range
  mm: HUGE_VMAP arch support cleanup
  powerpc: inline huge vmap supported functions
  arm64: inline huge vmap supported functions
  x86: inline huge vmap supported functions
  mm/vmalloc: provide fallback arch huge vmap support functions
  mm: Move vmap_range from mm/ioremap.c to mm/vmalloc.c
  mm/vmalloc: add vmap_range_noflush variant
  mm/vmalloc: Hugepage vmalloc mappings
  powerpc/64s/radix: Enable huge vmalloc mappings

 .../admin-guide/kernel-parameters.txt |   2 +
 arch/Kconfig  |  11 +
 arch/arm64/include/asm/vmalloc.h  |  24 +
 arch/arm64/mm/mmu.c   |  26 -
 arch/powerpc/Kconfig  |   1 +
 arch/powerpc/include/asm/vmalloc.h|  20 +
 arch/powerpc/kernel/module.c  |  21 +-
 arch/powerpc/mm/book3s64/radix_pgtable.c  |  21 -
 arch/x86/include/asm/vmalloc.h|  20 +
 arch/x86/mm/ioremap.c |  19 -
 arch/x86/mm/pgtable.c |  13 -
 include/linux/io.h|   9 -
 include/linux/vmalloc.h   |  46 ++
 init/main.c   |   1 -
 mm/ioremap.c  | 225 +---
 mm/memory.c   |  66 ++-
 mm/page_alloc.c   |   5 +-
 mm/vmalloc.c  | 484 +++---
 18 files changed, 614 insertions(+), 400 deletions(-)

-- 
2.23.0



[PATCH v2] tpm: ibmvtpm: fix error return code in tpm_ibmvtpm_probe()

2021-01-25 Thread Stefan Berger
From: Stefan Berger 

Return error code -ETIMEDOUT rather than '0' when waiting for
rtce_buf to be set times out.

Fixes: d8d74ea3c002 ("tpm: ibmvtpm: Wait for buffer to be set before 
proceeding")
Reported-by: Hulk Robot 
Signed-off-by: Wang Hai 
Signed-off-by: Stefan Berger 
---
 drivers/char/tpm/tpm_ibmvtpm.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/char/tpm/tpm_ibmvtpm.c b/drivers/char/tpm/tpm_ibmvtpm.c
index 994385bf37c0..813eb2cac0ce 100644
--- a/drivers/char/tpm/tpm_ibmvtpm.c
+++ b/drivers/char/tpm/tpm_ibmvtpm.c
@@ -687,6 +687,7 @@ static int tpm_ibmvtpm_probe(struct vio_dev *vio_dev,
ibmvtpm->rtce_buf != NULL,
HZ)) {
dev_err(dev, "CRQ response timed out\n");
+   rc = -ETIMEDOUT;
goto init_irq_cleanup;
}
 
-- 
2.25.4



Re: [PATCH] powerpc/mm: Limit allocation of SWIOTLB on server machines

2021-01-25 Thread Thiago Jung Bauermann


Konrad Rzeszutek Wilk  writes:

> On Fri, Jan 08, 2021 at 09:27:01PM -0300, Thiago Jung Bauermann wrote:
>> 
>> Ram Pai  writes:
>> 
>> > On Wed, Dec 23, 2020 at 09:06:01PM -0300, Thiago Jung Bauermann wrote:
>> >> 
>> >> Hi Ram,
>> >> 
>> >> Thanks for reviewing this patch.
>> >> 
>> >> Ram Pai  writes:
>> >> 
>> >> > On Fri, Dec 18, 2020 at 03:21:03AM -0300, Thiago Jung Bauermann wrote:
>> >> >> On server-class POWER machines, we don't need the SWIOTLB unless we're 
>> >> >> a
>> >> >> secure VM. Nevertheless, if CONFIG_SWIOTLB is enabled we 
>> >> >> unconditionally
>> >> >> allocate it.
>> >> >> 
>> >> >> In most cases this is harmless, but on a few machine configurations 
>> >> >> (e.g.,
>> >> >> POWER9 powernv systems with 4 GB area reserved for crashdump kernel) 
>> >> >> it can
>> >> >> happen that memblock can't find a 64 MB chunk of memory for the 
>> >> >> SWIOTLB and
>> >> >> fails with a scary-looking WARN_ONCE:
>> >> >> 
>> >> >>  [ cut here ]
>> >> >>  memblock: bottom-up allocation failed, memory hotremove may be 
>> >> >> affected
>> >> >>  WARNING: CPU: 0 PID: 0 at mm/memblock.c:332 
>> >> >> memblock_find_in_range_node+0x328/0x340
>> >> >>  Modules linked in:
>> >> >>  CPU: 0 PID: 0 Comm: swapper Not tainted 5.10.0-rc2-orig+ #6
>> >> >>  NIP:  c0442f38 LR: c0442f34 CTR: c01e0080
>> >> >>  REGS: c1def900 TRAP: 0700   Not tainted  (5.10.0-rc2-orig+)
>> >> >>  MSR:  92021033   CR: 2802  XER: 
>> >> >> 2004
>> >> >>  CFAR: c014b7b4 IRQMASK: 1
>> >> >>  GPR00: c0442f34 c1defba0 c1deff00 
>> >> >> 0047
>> >> >>  GPR04: 7fff c1def828 c1def820 
>> >> >> 
>> >> >>  GPR08: 001ffc3e c1b75478 c1b75478 
>> >> >> 0001
>> >> >>  GPR12: 2000 c203  
>> >> >> 
>> >> >>  GPR16:    
>> >> >> 0203
>> >> >>  GPR20:  0001 0001 
>> >> >> c1defc10
>> >> >>  GPR24: c1defc08 c1c91868 c1defc18 
>> >> >> c1c91890
>> >> >>  GPR28:   0400 
>> >> >> 
>> >> >>  NIP [c0442f38] memblock_find_in_range_node+0x328/0x340
>> >> >>  LR [c0442f34] memblock_find_in_range_node+0x324/0x340
>> >> >>  Call Trace:
>> >> >>  [c1defba0] [c0442f34] 
>> >> >> memblock_find_in_range_node+0x324/0x340 (unreliable)
>> >> >>  [c1defc90] [c15ac088] 
>> >> >> memblock_alloc_range_nid+0xec/0x1b0
>> >> >>  [c1defd40] [c15ac1f8] 
>> >> >> memblock_alloc_internal+0xac/0x110
>> >> >>  [c1defda0] [c15ac4d0] memblock_alloc_try_nid+0x94/0xcc
>> >> >>  [c1defe30] [c159c3c8] swiotlb_init+0x78/0x104
>> >> >>  [c1defea0] [c158378c] mem_init+0x4c/0x98
>> >> >>  [c1defec0] [c157457c] start_kernel+0x714/0xac8
>> >> >>  [c1deff90] [c000d244] start_here_common+0x1c/0x58
>> >> >>  Instruction dump:
>> >> >>  2c23 4182ffd4 ea610088 ea810090 4bfffe84 3921 3d42fff4 
>> >> >> 3c62ff60
>> >> >>  3863c560 992a8bfc 4bd0881d 6000 <0fe0> ea610088 4bfffd94 
>> >> >> 6000
>> >> >>  random: get_random_bytes called from __warn+0x128/0x184 with 
>> >> >> crng_init=0
>> >> >>  ---[ end trace  ]---
>> >> >>  software IO TLB: Cannot allocate buffer
>> >> >> 
>> >> >> Unless this is a secure VM the message can actually be ignored, 
>> >> >> because the
>> >> >> SWIOTLB isn't needed. Therefore, let's avoid the SWIOTLB in those 
>> >> >> cases.
>> >> >
>> >> > The above warn_on is conveying a genuine warning. Should it be silenced?
>> >> 
>> >> Not sure I understand your point. This patch doesn't silence the
>> >> warning, it avoids the problem it is warning about.
>> >
>> > Sorry, I should have explained it better. My point is...  
>> >
>> >If CONFIG_SWIOTLB is enabled, it means that the kernel is
>> >promising the bounce buffering capability. I know, currently we
>> >do not have any kernel subsystems that use bounce buffers on
>> >non-secure-pseries-kernel or powernv-kernel.  But that does not
>> >mean, there won't be any. In case there is such a third-party
>> >module needing bounce buffering, it won't be able to operate,
>> >because of the proposed change in your patch.
>> >
>> >Is that a good thing or a bad thing, I do not know. I will let
>> >the experts opine.
>> 
>> Ping? Does anyone else have an opinion on this? The other option I can
>> think of is changing the crashkernel code to not reserve so much memory
>> below 4 GB. Other people are considering this option, but it's not
>> planned for the near future.
>
> That seems a more suitable solution regardless, but there is always
> the danger of not being enough or 

Re: [PATCH v2 2/2] memblock: do not start bottom-up allocations with kernel_end

2021-01-25 Thread Thiago Jung Bauermann


Mike Rapoport  writes:

> On Sat, Jan 23, 2021 at 06:09:11PM -0800, Andrew Morton wrote:
>> On Fri, 22 Jan 2021 01:37:14 -0300 Thiago Jung Bauermann 
>>  wrote:
>> 
>> > Mike Rapoport  writes:
>> > 
>> > > > Signed-off-by: Roman Gushchin 
>> > > 
>> > > Reviewed-by: Mike Rapoport 
>> > 
>> > I've seen a couple of spurious triggers of the WARN_ONCE() removed by this
>> > patch. This happens on some ppc64le bare metal (powernv) server machines 
>> > with
>> > CONFIG_SWIOTLB=y and crashkernel=4G, as described in a candidate patch I 
>> > posted
>> > to solve this issue in a different way:
>> > 
>> > https://lore.kernel.org/linuxppc-dev/20201218062103.76102-1-bauer...@linux.ibm.com/
>> > 
>> > Since this patch solves that problem, is it possible to include it in the 
>> > next
>> > feasible v5.11-rcX, with the following tag?
>> 
>> We could do this,

Thanks!

>> if we're confident that this patch doesn't depend on
>> [1/2] "mm: cma: allocate cma areas bottom-up"?  I think it is...
>
> I think it does not depend on cma bottom-up allocation, it's rather the other
> way around: without this, CMA bottom-up allocation could fail with KASLR
> enabled.

I agree. Conceptually, this could have been patch 1 in this series.

> Still, this patch may need updates to the way x86 does early reservations:
>
> https://lore.kernel.org/lkml/20210115083255.12744-1-r...@kernel.org

Ah, I wasn't aware of this. Thanks for fixing those issues. That series
seems to be well accepted.

>> > Fixes: 8fabc623238e ("powerpc: Ensure that swiotlb buffer is allocated 
>> > from low memory")
>> 
>> I added that.

Thanks!
-- 
Thiago Jung Bauermann
IBM Linux Technology Center


Re: [PATCH] PCI: dwc: layerscape: convert to builtin_platform_driver()

2021-01-25 Thread Michael Walle

Am 2021-01-21 12:01, schrieb Geert Uytterhoeven:

Hi Saravana,

On Thu, Jan 21, 2021 at 1:05 AM Saravana Kannan  
wrote:
On Wed, Jan 20, 2021 at 3:53 PM Michael Walle  
wrote:

> Am 2021-01-20 20:47, schrieb Saravana Kannan:
> > On Wed, Jan 20, 2021 at 11:28 AM Michael Walle 
> > wrote:
> >>
> >> [RESEND, fat-fingered the buttons of my mail client and converted
> >> all CCs to BCCs :(]
> >>
> >> Am 2021-01-20 20:02, schrieb Saravana Kannan:
> >> > On Wed, Jan 20, 2021 at 6:24 AM Rob Herring  wrote:
> >> >>
> >> >> On Wed, Jan 20, 2021 at 4:53 AM Michael Walle 
> >> >> wrote:
> >> >> >
> >> >> > fw_devlink will defer the probe until all suppliers are ready. We 
can't
> >> >> > use builtin_platform_driver_probe() because it doesn't retry after 
probe
> >> >> > deferral. Convert it to builtin_platform_driver().
> >> >>
> >> >> If builtin_platform_driver_probe() doesn't work with fw_devlink, then
> >> >> shouldn't it be fixed or removed?
> >> >
> >> > I was actually thinking about this too. The problem with fixing
> >> > builtin_platform_driver_probe() to behave like
> >> > builtin_platform_driver() is that these probe functions could be
> >> > marked with __init. But there are also only 20 instances of
> >> > builtin_platform_driver_probe() in the kernel:
> >> > $ git grep ^builtin_platform_driver_probe | wc -l
> >> > 20
> >> >
> >> > So it might be easier to just fix them to not use
> >> > builtin_platform_driver_probe().
> >> >
> >> > Michael,
> >> >
> >> > Any chance you'd be willing to help me by converting all these to
> >> > builtin_platform_driver() and delete builtin_platform_driver_probe()?
> >>
> >> If it just moving the probe function to the _driver struct and
> >> remove the __init annotations. I could look into that.
> >
> > Yup. That's pretty much it AFAICT.
> >
> > builtin_platform_driver_probe() also makes sure the driver doesn't ask
> > for async probe, etc. But I doubt anyone is actually setting async
> > flags and still using builtin_platform_driver_probe().
>
> Hasn't module_platform_driver_probe() the same problem? And there
> are ~80 drivers which uses that.

Yeah. The biggest problem with all of these is the __init markers.
Maybe someone familiar with coccinelle can help?


And dropping them will increase memory usage.


Although I do have the changes for the builtin_platform_driver_probe()
ready, I don't think it makes much sense to send these unless we agree
on the increased memory footprint. While there are just a few
builtin_platform_driver_probe() users and the memory increase _might_ be
negligible, there are many more module_platform_driver_probe() users.
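
For reference, the conversion under discussion looks roughly like this (a
minimal sketch with a made-up "foo" driver, not code taken from any of the
drivers mentioned in this thread):

#include <linux/init.h>
#include <linux/platform_device.h>

/*
 * Hypothetical "foo" driver. Before the conversion this would have been
 * registered with builtin_platform_driver_probe(foo_driver, foo_probe),
 * with foo_probe marked __init and no .probe member in the struct, so
 * the probe code could be discarded after boot.
 */
static int foo_probe(struct platform_device *pdev)      /* __init dropped */
{
        return 0;       /* may now return -EPROBE_DEFER and be retried */
}

static struct platform_driver foo_driver = {
        .probe  = foo_probe,    /* probe moved into the driver struct */
        .driver = {
                .name = "foo",
        },
};
builtin_platform_driver(foo_driver);    /* normal driver core path, handles deferral */

The memory footprint question is exactly that the probe function (and
anything only it references) can no longer live in __init memory.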

-michael


Re: [PATCH] PCI: dwc: layerscape: convert to builtin_platform_driver()

2021-01-25 Thread Michael Walle

Am 2021-01-25 19:58, schrieb Saravana Kannan:

On Mon, Jan 25, 2021 at 8:50 AM Lorenzo Pieralisi
 wrote:


On Wed, Jan 20, 2021 at 08:28:36PM +0100, Michael Walle wrote:
> [RESEND, fat-fingered the buttons of my mail client and converted
> all CCs to BCCs :(]
>
> Am 2021-01-20 20:02, schrieb Saravana Kannan:
> > On Wed, Jan 20, 2021 at 6:24 AM Rob Herring  wrote:
> > >
> > > On Wed, Jan 20, 2021 at 4:53 AM Michael Walle 
> > > wrote:
> > > >
> > > > fw_devlink will defer the probe until all suppliers are ready. We can't
> > > > use builtin_platform_driver_probe() because it doesn't retry after probe
> > > > deferral. Convert it to builtin_platform_driver().
> > >
> > > If builtin_platform_driver_probe() doesn't work with fw_devlink, then
> > > shouldn't it be fixed or removed?
> >
> > I was actually thinking about this too. The problem with fixing
> > builtin_platform_driver_probe() to behave like
> > builtin_platform_driver() is that these probe functions could be
> > marked with __init. But there are also only 20 instances of
> > builtin_platform_driver_probe() in the kernel:
> > $ git grep ^builtin_platform_driver_probe | wc -l
> > 20
> >
> > So it might be easier to just fix them to not use
> > builtin_platform_driver_probe().
> >
> > Michael,
> >
> > Any chance you'd be willing to help me by converting all these to
> > builtin_platform_driver() and delete builtin_platform_driver_probe()?
>
> If it just moving the probe function to the _driver struct and
> remove the __init annotations. I could look into that.

Can I drop this patch then ?


No, please pick it up. Michael and I were talking about doing similar
changes for other drivers.


Yes please, I was just about to answer, but Saravana beat me to it.

-michael


Re: [PATCH] PCI: dwc: layerscape: convert to builtin_platform_driver()

2021-01-25 Thread Lorenzo Pieralisi
On Wed, Jan 20, 2021 at 08:28:36PM +0100, Michael Walle wrote:
> [RESEND, fat-fingered the buttons of my mail client and converted
> all CCs to BCCs :(]
> 
> Am 2021-01-20 20:02, schrieb Saravana Kannan:
> > On Wed, Jan 20, 2021 at 6:24 AM Rob Herring  wrote:
> > > 
> > > On Wed, Jan 20, 2021 at 4:53 AM Michael Walle 
> > > wrote:
> > > >
> > > > fw_devlink will defer the probe until all suppliers are ready. We can't
> > > > use builtin_platform_driver_probe() because it doesn't retry after probe
> > > > deferral. Convert it to builtin_platform_driver().
> > > 
> > > If builtin_platform_driver_probe() doesn't work with fw_devlink, then
> > > shouldn't it be fixed or removed?
> > 
> > I was actually thinking about this too. The problem with fixing
> > builtin_platform_driver_probe() to behave like
> > builtin_platform_driver() is that these probe functions could be
> > marked with __init. But there are also only 20 instances of
> > builtin_platform_driver_probe() in the kernel:
> > $ git grep ^builtin_platform_driver_probe | wc -l
> > 20
> > 
> > So it might be easier to just fix them to not use
> > builtin_platform_driver_probe().
> > 
> > Michael,
> > 
> > Any chance you'd be willing to help me by converting all these to
> > builtin_platform_driver() and delete builtin_platform_driver_probe()?
> 
> If it just moving the probe function to the _driver struct and
> remove the __init annotations. I could look into that.

Can I drop this patch then ?

Thanks,
Lorenzo


[PATCH v4 23/23] powerpc/syscall: Avoid storing 'current' in another pointer

2021-01-25 Thread Christophe Leroy
By saving a pointer to thread_info.flags, gcc copies r2
into a non-volatile register.

We know 'current' doesn't change, so avoid that intermediate pointer.

Reduces null_syscall benchmark by 2 cycles (322 => 320 cycles)

On PPC64, gcc seems to know that 'current' is not changing, and it keeps
it in a non-volatile register to avoid multiple reads of 'current' from the paca.
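
In plain C, the change is simply from caching the address of the flags
word to re-reading it through 'current' at each use. A sketch of the
pattern only; the real hunk follows below:

/* Before: the saved pointer keeps r2 ('current') live across calls,
 * so gcc copies it into a non-volatile register in the prologue.
 */
unsigned long *ti_flagsp = &current_thread_info()->flags;
ti_flags = READ_ONCE(*ti_flagsp);

/* After: re-read through current_thread_info() at each use; gcc knows
 * 'current' doesn't change and addresses the flags relative to r2.
 */
ti_flags = READ_ONCE(current_thread_info()->flags);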

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/kernel/syscall.c | 9 -
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/kernel/syscall.c b/arch/powerpc/kernel/syscall.c
index 47ae55f94d1c..72e0b18b88d8 100644
--- a/arch/powerpc/kernel/syscall.c
+++ b/arch/powerpc/kernel/syscall.c
@@ -186,7 +186,6 @@ notrace unsigned long syscall_exit_prepare(unsigned long r3,
   struct pt_regs *regs,
   long scv)
 {
-   unsigned long *ti_flagsp = &current_thread_info()->flags;
unsigned long ti_flags;
unsigned long ret = 0;
 
@@ -202,7 +201,7 @@ notrace unsigned long syscall_exit_prepare(unsigned long r3,
/* Check whether the syscall is issued inside a restartable sequence */
rseq_syscall(regs);
 
-   ti_flags = *ti_flagsp;
+   ti_flags = current_thread_info()->flags;
 
if (unlikely(r3 >= (unsigned long)-MAX_ERRNO) && !scv) {
if (likely(!(ti_flags & (_TIF_NOERROR | _TIF_RESTOREALL {
@@ -216,7 +215,7 @@ notrace unsigned long syscall_exit_prepare(unsigned long r3,
ret = _TIF_RESTOREALL;
else
regs->gpr[3] = r3;
-   clear_bits(_TIF_PERSYSCALL_MASK, ti_flagsp);
+   clear_bits(_TIF_PERSYSCALL_MASK, &current_thread_info()->flags);
} else {
regs->gpr[3] = r3;
}
@@ -228,7 +227,7 @@ notrace unsigned long syscall_exit_prepare(unsigned long r3,
 
 again:
local_irq_disable();
-   ti_flags = READ_ONCE(*ti_flagsp);
+   ti_flags = READ_ONCE(current_thread_info()->flags);
while (unlikely(ti_flags & (_TIF_USER_WORK_MASK & ~_TIF_RESTORE_TM))) {
local_irq_enable();
if (ti_flags & _TIF_NEED_RESCHED) {
@@ -244,7 +243,7 @@ notrace unsigned long syscall_exit_prepare(unsigned long r3,
do_notify_resume(regs, ti_flags);
}
local_irq_disable();
-   ti_flags = READ_ONCE(*ti_flagsp);
+   ti_flags = READ_ONCE(current_thread_info()->flags);
}
 
if (IS_ENABLED(CONFIG_PPC_BOOK3S) && IS_ENABLED(CONFIG_PPC_FPU)) {
-- 
2.25.0



[PATCH v4 22/23] powerpc/syscall: Optimise checks in beginning of system_call_exception()

2021-01-25 Thread Christophe Leroy
Combine all tests of regs->msr into a single logical one.

Before the patch:

   0:   81 6a 00 84 lwz r11,132(r10)
   4:   90 6a 00 88 stw r3,136(r10)
   8:   69 60 00 02 xorir0,r11,2
   c:   54 00 ff fe rlwinm  r0,r0,31,31,31
  10:   0f 00 00 00 twnei   r0,0
  14:   69 63 40 00 xorir3,r11,16384
  18:   54 63 97 fe rlwinm  r3,r3,18,31,31
  1c:   0f 03 00 00 twnei   r3,0
  20:   69 6b 80 00 xorir11,r11,32768
  24:   55 6b 8f fe rlwinm  r11,r11,17,31,31
  28:   0f 0b 00 00 twnei   r11,0

After the patch:

   0:   81 6a 00 84 lwz r11,132(r10)
   4:   90 6a 00 88 stw r3,136(r10)
   8:   7d 6b 58 f8 not r11,r11
   c:   71 6b c0 02 andi.   r11,r11,49154
  10:   0f 0b 00 00 twnei   r11,0

6 cycles less on powerpc 8xx (328 => 322 cycles).
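
For readers wondering how a single andi./twnei pair can replace the three
separate tests: ((msr & expected) ^ expected) is non-zero exactly when at
least one expected bit is clear. A small illustration, not part of the
patch itself (names as in the hunk below):

/* PPC32 case: MSR_PR always required, plus MSR_RI and MSR_EE */
unsigned long expected_msr = MSR_PR | MSR_RI | MSR_EE;

/* all expected bits set  => (msr & expected) == expected => XOR is 0  */
/* any expected bit clear => that bit survives the XOR    => BUG fires */
BUG_ON((regs->msr & expected_msr) ^ expected_msr);

The condition could equally be written
(regs->msr & expected_msr) != expected_msr; the XOR form is what the
patch uses and what the single andi./twnei implements.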

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/kernel/syscall.c | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kernel/syscall.c b/arch/powerpc/kernel/syscall.c
index a40775daa88b..47ae55f94d1c 100644
--- a/arch/powerpc/kernel/syscall.c
+++ b/arch/powerpc/kernel/syscall.c
@@ -28,6 +28,7 @@ notrace long system_call_exception(long r3, long r4, long r5,
   unsigned long r0, struct pt_regs *regs)
 {
syscall_fn f;
+   unsigned long expected_msr;
 
regs->orig_gpr3 = r3;
 
@@ -39,10 +40,13 @@ notrace long system_call_exception(long r3, long r4, long 
r5,
 
trace_hardirqs_off(); /* finish reconciling */
 
+   expected_msr = MSR_PR;
if (!IS_ENABLED(CONFIG_BOOKE) && !IS_ENABLED(CONFIG_40x))
-   BUG_ON(!(regs->msr & MSR_RI));
-   BUG_ON(!(regs->msr & MSR_PR));
-   BUG_ON(arch_irq_disabled_regs(regs));
+   expected_msr |= MSR_RI;
+   if (IS_ENABLED(CONFIG_PPC32))
+   expected_msr |= MSR_EE;
+   BUG_ON((regs->msr & expected_msr) ^ expected_msr);
+   BUG_ON(IS_ENABLED(CONFIG_PPC64) && arch_irq_disabled_regs(regs));
 
 #ifdef CONFIG_PPC_PKEY
if (mmu_has_feature(MMU_FTR_PKEY)) {
-- 
2.25.0



[PATCH v4 21/23] powerpc/syscall: Remove FULL_REGS verification in system_call_exception

2021-01-25 Thread Christophe Leroy
For book3s/64, FULL_REGS() is always 'true', so the test is a no-op.
For the others, non-volatile registers are saved unconditionally.

So the verification is pointless.

Should one fail to do so, it would anyway be caught by the
CHECK_FULL_REGS() in copy_thread(), as we have removed the
special versions ppc_fork() and friends.

null_syscall benchmark reduction 4 cycles (332 => 328 cycles)

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/kernel/syscall.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/powerpc/kernel/syscall.c b/arch/powerpc/kernel/syscall.c
index 30f8a397a522..a40775daa88b 100644
--- a/arch/powerpc/kernel/syscall.c
+++ b/arch/powerpc/kernel/syscall.c
@@ -42,7 +42,6 @@ notrace long system_call_exception(long r3, long r4, long r5,
if (!IS_ENABLED(CONFIG_BOOKE) && !IS_ENABLED(CONFIG_40x))
BUG_ON(!(regs->msr & MSR_RI));
BUG_ON(!(regs->msr & MSR_PR));
-   BUG_ON(!FULL_REGS(regs));
BUG_ON(arch_irq_disabled_regs(regs));
 
 #ifdef CONFIG_PPC_PKEY
-- 
2.25.0



[PATCH v4 20/23] powerpc/syscall: Do not check unsupported scv vector on PPC32

2021-01-25 Thread Christophe Leroy
Only PPC64 has scv. No need to check the 0x7ff0 trap on PPC32.

And ignore the scv parameter in syscall_exit_prepare() (saves 14 cycles,
346 => 332 cycles).

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/kernel/entry_32.S | 1 -
 arch/powerpc/kernel/syscall.c  | 7 +--
 2 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kernel/entry_32.S b/arch/powerpc/kernel/entry_32.S
index 9922a04650f7..6ae9c7bcb06c 100644
--- a/arch/powerpc/kernel/entry_32.S
+++ b/arch/powerpc/kernel/entry_32.S
@@ -343,7 +343,6 @@ transfer_to_syscall:
 
 ret_from_syscall:
addir4,r1,STACK_FRAME_OVERHEAD
-   li  r5,0
bl  syscall_exit_prepare
 #if defined(CONFIG_4xx) || defined(CONFIG_BOOKE)
/* If the process has its own DBCR0 value, load it up.  The internal
diff --git a/arch/powerpc/kernel/syscall.c b/arch/powerpc/kernel/syscall.c
index 476909b11051..30f8a397a522 100644
--- a/arch/powerpc/kernel/syscall.c
+++ b/arch/powerpc/kernel/syscall.c
@@ -86,7 +86,7 @@ notrace long system_call_exception(long r3, long r4, long r5,
local_irq_enable();
 
if (unlikely(current_thread_info()->flags & _TIF_SYSCALL_DOTRACE)) {
-   if (unlikely(regs->trap == 0x7ff0)) {
+   if (IS_ENABLED(CONFIG_PPC64) && unlikely(regs->trap == 0x7ff0)) 
{
/* Unsupported scv vector */
_exception(SIGILL, regs, ILL_ILLOPC, regs->nip);
return regs->gpr[3];
@@ -109,7 +109,7 @@ notrace long system_call_exception(long r3, long r4, long 
r5,
r8 = regs->gpr[8];
 
} else if (unlikely(r0 >= NR_syscalls)) {
-   if (unlikely(regs->trap == 0x7ff0)) {
+   if (IS_ENABLED(CONFIG_PPC64) && unlikely(regs->trap == 0x7ff0)) 
{
/* Unsupported scv vector */
_exception(SIGILL, regs, ILL_ILLOPC, regs->nip);
return regs->gpr[3];
@@ -187,6 +187,9 @@ notrace unsigned long syscall_exit_prepare(unsigned long r3,
unsigned long ti_flags;
unsigned long ret = 0;
 
+   if (IS_ENABLED(CONFIG_PPC32))
+   scv = 0;
+
CT_WARN_ON(ct_state() == CONTEXT_USER);
 
kuap_check();
-- 
2.25.0



[PATCH v4 19/23] powerpc/syscall: Avoid stack frame in likely part of system_call_exception()

2021-01-25 Thread Christophe Leroy
When r3 is not modified, reload it from regs->orig_r3 to free
volatile registers. This avoids a stack frame for the likely part
of system_call_exception()

Before the patch:

c000b4d4 :
c000b4d4:   7c 08 02 a6 mflrr0
c000b4d8:   94 21 ff e0 stwur1,-32(r1)
c000b4dc:   93 e1 00 1c stw r31,28(r1)
c000b4e0:   90 01 00 24 stw r0,36(r1)
c000b4e4:   90 6a 00 88 stw r3,136(r10)
c000b4e8:   81 6a 00 84 lwz r11,132(r10)
c000b4ec:   69 6b 00 02 xorir11,r11,2
c000b4f0:   55 6b ff fe rlwinm  r11,r11,31,31,31
c000b4f4:   0f 0b 00 00 twnei   r11,0
c000b4f8:   81 6a 00 a0 lwz r11,160(r10)
c000b4fc:   55 6b 07 fe clrlwi  r11,r11,31
c000b500:   0f 0b 00 00 twnei   r11,0
c000b504:   7c 0c 42 e6 mftbr0
c000b508:   83 e2 00 08 lwz r31,8(r2)
c000b50c:   81 82 00 28 lwz r12,40(r2)
c000b510:   90 02 00 24 stw r0,36(r2)
c000b514:   7d 8c f8 50 subfr12,r12,r31
c000b518:   7c 0c 02 14 add r0,r12,r0
c000b51c:   90 02 00 08 stw r0,8(r2)
c000b520:   7c 10 13 a6 mtspr   80,r0
c000b524:   81 62 00 70 lwz r11,112(r2)
c000b528:   71 60 86 91 andi.   r0,r11,34449
c000b52c:   40 82 00 34 bne c000b560 
c000b530:   2b 89 01 b6 cmplwi  cr7,r9,438
c000b534:   41 9d 00 64 bgt cr7,c000b598 

c000b538:   3d 40 c0 5c lis r10,-16292
c000b53c:   55 29 10 3a rlwinm  r9,r9,2,0,29
c000b540:   39 4a 41 e8 addir10,r10,16872
c000b544:   80 01 00 24 lwz r0,36(r1)
c000b548:   7d 2a 48 2e lwzxr9,r10,r9
c000b54c:   7c 08 03 a6 mtlrr0
c000b550:   7d 29 03 a6 mtctr   r9
c000b554:   83 e1 00 1c lwz r31,28(r1)
c000b558:   38 21 00 20 addir1,r1,32
c000b55c:   4e 80 04 20 bctr

After the patch:

c000b4d4 :
c000b4d4:   81 6a 00 84 lwz r11,132(r10)
c000b4d8:   90 6a 00 88 stw r3,136(r10)
c000b4dc:   69 6b 00 02 xorir11,r11,2
c000b4e0:   55 6b ff fe rlwinm  r11,r11,31,31,31
c000b4e4:   0f 0b 00 00 twnei   r11,0
c000b4e8:   80 6a 00 a0 lwz r3,160(r10)
c000b4ec:   54 63 07 fe clrlwi  r3,r3,31
c000b4f0:   0f 03 00 00 twnei   r3,0
c000b4f4:   7d 6c 42 e6 mftbr11
c000b4f8:   81 82 00 08 lwz r12,8(r2)
c000b4fc:   80 02 00 28 lwz r0,40(r2)
c000b500:   91 62 00 24 stw r11,36(r2)
c000b504:   7c 00 60 50 subfr0,r0,r12
c000b508:   7d 60 5a 14 add r11,r0,r11
c000b50c:   91 62 00 08 stw r11,8(r2)
c000b510:   7c 10 13 a6 mtspr   80,r0
c000b514:   80 62 00 70 lwz r3,112(r2)
c000b518:   70 6b 86 91 andi.   r11,r3,34449
c000b51c:   40 82 00 28 bne c000b544 
c000b520:   2b 89 01 b6 cmplwi  cr7,r9,438
c000b524:   41 9d 00 84 bgt cr7,c000b5a8 

c000b528:   80 6a 00 88 lwz r3,136(r10)
c000b52c:   3d 40 c0 5c lis r10,-16292
c000b530:   55 29 10 3a rlwinm  r9,r9,2,0,29
c000b534:   39 4a 41 e4 addir10,r10,16868
c000b538:   7d 2a 48 2e lwzxr9,r10,r9
c000b53c:   7d 29 03 a6 mtctr   r9
c000b540:   4e 80 04 20 bctr

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/kernel/syscall.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/powerpc/kernel/syscall.c b/arch/powerpc/kernel/syscall.c
index a3510fa4e641..476909b11051 100644
--- a/arch/powerpc/kernel/syscall.c
+++ b/arch/powerpc/kernel/syscall.c
@@ -115,6 +115,9 @@ notrace long system_call_exception(long r3, long r4, long 
r5,
return regs->gpr[3];
}
return -ENOSYS;
+   } else {
+   /* Restore r3 from orig_gpr3 to free up a volatile reg */
+   r3 = regs->orig_gpr3;
}
 
/* May be faster to do array_index_nospec? */
-- 
2.25.0



[PATCH v4 18/23] powerpc/32: Remove verification of MSR_PR on syscall in the ASM entry

2021-01-25 Thread Christophe Leroy
system_call_exception() checks MSR_PR and BUGs if a syscall
is issued from kernel mode.

No need to handle it anymore from the ASM entry code.

null_syscall reduction 2 cycles (348 => 346 cycles)

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/kernel/entry_32.S   | 30 --
 arch/powerpc/kernel/head_32.h|  3 ---
 arch/powerpc/kernel/head_booke.h |  3 ---
 3 files changed, 36 deletions(-)

diff --git a/arch/powerpc/kernel/entry_32.S b/arch/powerpc/kernel/entry_32.S
index ce5fdb23ed7c..9922a04650f7 100644
--- a/arch/powerpc/kernel/entry_32.S
+++ b/arch/powerpc/kernel/entry_32.S
@@ -416,36 +416,6 @@ ret_from_kernel_thread:
li  r3,0
b   ret_from_syscall
 
-   /*
-* System call was called from kernel. We get here with SRR1 in r9.
-* Mark the exception as recoverable once we have retrieved SRR0,
-* trap a warning and return ENOSYS with CR[SO] set.
-*/
-   .globl  ret_from_kernel_syscall
-ret_from_kernel_syscall:
-   mfspr   r9, SPRN_SRR0
-   mfspr   r10, SPRN_SRR1
-#if !defined(CONFIG_4xx) && !defined(CONFIG_BOOKE)
-   LOAD_REG_IMMEDIATE(r11, MSR_KERNEL & ~(MSR_IR|MSR_DR))
-   mtmsr   r11
-#endif
-
-0: trap
-   EMIT_BUG_ENTRY 0b,__FILE__,__LINE__, BUGFLAG_WARNING
-
-   li  r3, ENOSYS
-   crset   so
-#if defined(CONFIG_PPC_8xx) && defined(CONFIG_PERF_EVENTS)
-   mtspr   SPRN_NRI, r0
-#endif
-   mtspr   SPRN_SRR0, r9
-   mtspr   SPRN_SRR1, r10
-   rfi
-#ifdef CONFIG_40x
-   b . /* Prevent prefetch past rfi */
-#endif
-_ASM_NOKPROBE_SYMBOL(ret_from_kernel_syscall)
-
 /*
  * Top-level page fault handling.
  * This is in assembler because if do_page_fault tells us that
diff --git a/arch/powerpc/kernel/head_32.h b/arch/powerpc/kernel/head_32.h
index c2aa0d8f1f63..c0de4acbe3f8 100644
--- a/arch/powerpc/kernel/head_32.h
+++ b/arch/powerpc/kernel/head_32.h
@@ -118,8 +118,6 @@
 .macro SYSCALL_ENTRY trapno
mfspr   r9, SPRN_SRR1
mfspr   r10, SPRN_SRR0
-   andi.   r11, r9, MSR_PR
-   beq-99f
LOAD_REG_IMMEDIATE(r11, MSR_KERNEL) /* can take exceptions 
*/
lis r12, 1f@h
ori r12, r12, 1f@l
@@ -176,7 +174,6 @@
 3:
 #endif
b   transfer_to_syscall /* jump to handler */
-99:b   ret_from_kernel_syscall
 .endm
 
 .macro save_dar_dsisr_on_stack reg1, reg2, sp
diff --git a/arch/powerpc/kernel/head_booke.h b/arch/powerpc/kernel/head_booke.h
index faff094b650e..7af84e1e717b 100644
--- a/arch/powerpc/kernel/head_booke.h
+++ b/arch/powerpc/kernel/head_booke.h
@@ -106,10 +106,8 @@ ALT_FTR_SECTION_END_IFSET(CPU_FTR_EMB_HV)
 #endif
mfspr   r9, SPRN_SRR1
BOOKE_CLEAR_BTB(r11)
-   andi.   r11, r9, MSR_PR
lwz r11, TASK_STACK - THREAD(r10)
rlwinm  r12,r12,0,4,2   /* Clear SO bit in CR */
-   beq-99f
ALLOC_STACK_FRAME(r11, THREAD_SIZE - INT_FRAME_SIZE)
stw r12, _CCR(r11)  /* save various registers */
mflrr12
@@ -157,7 +155,6 @@ ALT_FTR_SECTION_END_IFSET(CPU_FTR_EMB_HV)
 
 3:
b   transfer_to_syscall /* jump to handler */
-99:b   ret_from_kernel_syscall
 .endm
 
 /* To handle the additional exception priority levels on 40x and Book-E
-- 
2.25.0



[PATCH v4 17/23] powerpc/syscall: implement system call entry/exit logic in C for PPC32

2021-01-25 Thread Christophe Leroy
This is a port of the PPC64 C syscall entry/exit logic to PPC32.

Performance-wise on 8xx:
Before : 304 cycles on null_syscall
After  : 348 cycles on null_syscall

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/kernel/entry_32.S   | 227 ---
 arch/powerpc/kernel/head_32.h|  16 ---
 arch/powerpc/kernel/head_booke.h |  15 --
 3 files changed, 29 insertions(+), 229 deletions(-)

diff --git a/arch/powerpc/kernel/entry_32.S b/arch/powerpc/kernel/entry_32.S
index 97dc28a68465..ce5fdb23ed7c 100644
--- a/arch/powerpc/kernel/entry_32.S
+++ b/arch/powerpc/kernel/entry_32.S
@@ -329,117 +329,22 @@ stack_ovf:
 _ASM_NOKPROBE_SYMBOL(stack_ovf)
 #endif
 
-#ifdef CONFIG_TRACE_IRQFLAGS
-trace_syscall_entry_irq_off:
-   /*
-* Syscall shouldn't happen while interrupts are disabled,
-* so let's do a warning here.
-*/
-0: trap
-   EMIT_BUG_ENTRY 0b,__FILE__,__LINE__, BUGFLAG_WARNING
-   bl  trace_hardirqs_on
-
-   /* Now enable for real */
-   LOAD_REG_IMMEDIATE(r10, MSR_KERNEL | MSR_EE)
-   mtmsr   r10
-
-   REST_GPR(0, r1)
-   REST_4GPRS(3, r1)
-   REST_2GPRS(7, r1)
-   b   DoSyscall
-#endif /* CONFIG_TRACE_IRQFLAGS */
-
.globl  transfer_to_syscall
 transfer_to_syscall:
SAVE_NVGPRS(r1)
 #ifdef CONFIG_PPC_BOOK3S_32
kuep_lock r11, r12
 #endif
-#ifdef CONFIG_TRACE_IRQFLAGS
-   andi.   r12,r9,MSR_EE
-   beq-trace_syscall_entry_irq_off
-#endif /* CONFIG_TRACE_IRQFLAGS */
 
-/*
- * Handle a system call.
- */
-   .stabs  "arch/powerpc/kernel/",N_SO,0,0,0f
-   .stabs  "entry_32.S",N_SO,0,0,0f
-0:
-
-_GLOBAL(DoSyscall)
-   stw r3,ORIG_GPR3(r1)
-   li  r12,0
-   stw r12,RESULT(r1)
-#ifdef CONFIG_TRACE_IRQFLAGS
-   /* Make sure interrupts are enabled */
-   mfmsr   r11
-   andi.   r12,r11,MSR_EE
-   /* We came in with interrupts disabled, we WARN and mark them enabled
-* for lockdep now */
-0: tweqi   r12, 0
-   EMIT_BUG_ENTRY 0b,__FILE__,__LINE__, BUGFLAG_WARNING
-#endif /* CONFIG_TRACE_IRQFLAGS */
-   lwz r11,TI_FLAGS(r2)
-   andi.   r11,r11,_TIF_SYSCALL_DOTRACE
-   bne-syscall_dotrace
-syscall_dotrace_cont:
-   cmplwi  0,r0,NR_syscalls
-   lis r10,sys_call_table@h
-   ori r10,r10,sys_call_table@l
-   slwir0,r0,2
-   bge-66f
-
-   barrier_nospec_asm
-   /*
-* Prevent the load of the handler below (based on the user-passed
-* system call number) being speculatively executed until the test
-* against NR_syscalls and branch to .66f above has
-* committed.
-*/
+   /* Calling convention has r9 = orig r0, r10 = regs */
+   mr  r9,r0
+   addir10,r1,STACK_FRAME_OVERHEAD
+   bl  system_call_exception
 
-   lwzxr10,r10,r0  /* Fetch system call handler [ptr] */
-   mtlrr10
-   addir9,r1,STACK_FRAME_OVERHEAD
-   PPC440EP_ERR42
-   blrl/* Call handler */
-   .globl  ret_from_syscall
 ret_from_syscall:
-#ifdef CONFIG_DEBUG_RSEQ
-   /* Check whether the syscall is issued inside a restartable sequence */
-   stw r3,GPR3(r1)
-   addir3,r1,STACK_FRAME_OVERHEAD
-   bl  rseq_syscall
-   lwz r3,GPR3(r1)
-#endif
-   mr  r6,r3
-   /* disable interrupts so current_thread_info()->flags can't change */
-   LOAD_REG_IMMEDIATE(r10,MSR_KERNEL)  /* doesn't include MSR_EE */
-   /* Note: We don't bother telling lockdep about it */
-   mtmsr   r10
-   lwz r9,TI_FLAGS(r2)
-   li  r8,-MAX_ERRNO
-   andi.   
r0,r9,(_TIF_SYSCALL_DOTRACE|_TIF_SINGLESTEP|_TIF_USER_WORK_MASK|_TIF_PERSYSCALL_MASK)
-   bne-syscall_exit_work
-   cmplw   0,r3,r8
-   blt+syscall_exit_cont
-   lwz r11,_CCR(r1)/* Load CR */
-   neg r3,r3
-   orisr11,r11,0x1000  /* Set SO bit in CR */
-   stw r11,_CCR(r1)
-syscall_exit_cont:
-   lwz r8,_MSR(r1)
-#ifdef CONFIG_TRACE_IRQFLAGS
-   /* If we are going to return from the syscall with interrupts
-* off, we trace that here. It shouldn't normally happen.
-*/
-   andi.   r10,r8,MSR_EE
-   bne+1f
-   stw r3,GPR3(r1)
-   bl  trace_hardirqs_off
-   lwz r3,GPR3(r1)
-1:
-#endif /* CONFIG_TRACE_IRQFLAGS */
+   addir4,r1,STACK_FRAME_OVERHEAD
+   li  r5,0
+   bl  syscall_exit_prepare
 #if defined(CONFIG_4xx) || defined(CONFIG_BOOKE)
/* If the process has its own DBCR0 value, load it up.  The internal
   debug mode bit tells us that dbcr0 should be loaded. */
@@ -453,34 +358,39 @@ syscall_exit_cont:
cmplwi  cr0,r5,0
bne-2f
 #endif /* CONFIG_PPC_47x */
-1:
-BEGIN_FTR_SECTION
-   lwarx   r7,0,r1
-END_FTR_SECTION_IFSET(CPU_FTR_NEED_PAIRED_STWCX)
-   stwcx.  r0,0,r1 /* to clear 

[PATCH v4 16/23] powerpc/32: Always save non volatile GPRs at syscall entry

2021-01-25 Thread Christophe Leroy
In preparation for porting syscall entry/exit to C, unconditionally
save non-volatile general purpose registers.

Commit 965dd3ad3076 ("powerpc/64/syscall: Remove non-volatile GPR save
optimisation") provides detailed explanation.

This increases the number of cycles by 24 on 8xx with the
null_syscall benchmark (280 => 304 cycles).

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/kernel/entry_32.S   | 46 +---
 arch/powerpc/kernel/head_32.h|  2 +-
 arch/powerpc/kernel/head_booke.h |  2 +-
 arch/powerpc/kernel/syscalls/syscall.tbl | 20 +++
 4 files changed, 8 insertions(+), 62 deletions(-)

diff --git a/arch/powerpc/kernel/entry_32.S b/arch/powerpc/kernel/entry_32.S
index b1e36602c013..97dc28a68465 100644
--- a/arch/powerpc/kernel/entry_32.S
+++ b/arch/powerpc/kernel/entry_32.S
@@ -351,6 +351,7 @@ trace_syscall_entry_irq_off:
 
.globl  transfer_to_syscall
 transfer_to_syscall:
+   SAVE_NVGPRS(r1)
 #ifdef CONFIG_PPC_BOOK3S_32
kuep_lock r11, r12
 #endif
@@ -614,51 +615,6 @@ ret_from_kernel_syscall:
 #endif
 _ASM_NOKPROBE_SYMBOL(ret_from_kernel_syscall)
 
-/*
- * The fork/clone functions need to copy the full register set into
- * the child process. Therefore we need to save all the nonvolatile
- * registers (r13 - r31) before calling the C code.
- */
-   .globl  ppc_fork
-ppc_fork:
-   SAVE_NVGPRS(r1)
-   lwz r0,_TRAP(r1)
-   rlwinm  r0,r0,0,0,30/* clear LSB to indicate full */
-   stw r0,_TRAP(r1)/* register set saved */
-   b   sys_fork
-
-   .globl  ppc_vfork
-ppc_vfork:
-   SAVE_NVGPRS(r1)
-   lwz r0,_TRAP(r1)
-   rlwinm  r0,r0,0,0,30/* clear LSB to indicate full */
-   stw r0,_TRAP(r1)/* register set saved */
-   b   sys_vfork
-
-   .globl  ppc_clone
-ppc_clone:
-   SAVE_NVGPRS(r1)
-   lwz r0,_TRAP(r1)
-   rlwinm  r0,r0,0,0,30/* clear LSB to indicate full */
-   stw r0,_TRAP(r1)/* register set saved */
-   b   sys_clone
-
-   .globl  ppc_clone3
-ppc_clone3:
-   SAVE_NVGPRS(r1)
-   lwz r0,_TRAP(r1)
-   rlwinm  r0,r0,0,0,30/* clear LSB to indicate full */
-   stw r0,_TRAP(r1)/* register set saved */
-   b   sys_clone3
-
-   .globl  ppc_swapcontext
-ppc_swapcontext:
-   SAVE_NVGPRS(r1)
-   lwz r0,_TRAP(r1)
-   rlwinm  r0,r0,0,0,30/* clear LSB to indicate full */
-   stw r0,_TRAP(r1)/* register set saved */
-   b   sys_swapcontext
-
 /*
  * Top-level page fault handling.
  * This is in assembler because if do_page_fault tells us that
diff --git a/arch/powerpc/kernel/head_32.h b/arch/powerpc/kernel/head_32.h
index 24dc326e0d56..7b12736ec546 100644
--- a/arch/powerpc/kernel/head_32.h
+++ b/arch/powerpc/kernel/head_32.h
@@ -148,7 +148,7 @@
stw r2,GPR2(r11)
addir10,r10,STACK_FRAME_REGS_MARKER@l
stw r9,_MSR(r11)
-   li  r2, \trapno + 1
+   li  r2, \trapno
stw r10,8(r11)
stw r2,_TRAP(r11)
SAVE_GPR(0, r11)
diff --git a/arch/powerpc/kernel/head_booke.h b/arch/powerpc/kernel/head_booke.h
index b3c502c503a0..626e716576ce 100644
--- a/arch/powerpc/kernel/head_booke.h
+++ b/arch/powerpc/kernel/head_booke.h
@@ -124,7 +124,7 @@ ALT_FTR_SECTION_END_IFSET(CPU_FTR_EMB_HV)
stw r2,GPR2(r11)
addir12, r12, STACK_FRAME_REGS_MARKER@l
stw r9,_MSR(r11)
-   li  r2, \trapno + 1
+   li  r2, \trapno
stw r12, 8(r11)
stw r2,_TRAP(r11)
SAVE_GPR(0, r11)
diff --git a/arch/powerpc/kernel/syscalls/syscall.tbl 
b/arch/powerpc/kernel/syscalls/syscall.tbl
index f744eb5cba88..96b2157f0371 100644
--- a/arch/powerpc/kernel/syscalls/syscall.tbl
+++ b/arch/powerpc/kernel/syscalls/syscall.tbl
@@ -9,9 +9,7 @@
 #
 0  nospu   restart_syscall sys_restart_syscall
 1  nospu   exitsys_exit
-2  32  forkppc_fork
sys_fork
-2  64  forksys_fork
-2  spu forksys_ni_syscall
+2  nospu   forksys_fork
 3  common  readsys_read
 4  common  write   sys_write
 5  common  opensys_open
compat_sys_open
@@ -160,9 +158,7 @@
 11932  sigreturn   sys_sigreturn   
compat_sys_sigreturn
 11964  sigreturn   sys_ni_syscall
 119spu sigreturn   sys_ni_syscall
-12032  clone   ppc_clone   
sys_clone
-12064  clone   sys_clone
-120spu clone 

[PATCH v4 15/23] powerpc/syscall: Change condition to check MSR_RI

2021-01-25 Thread Christophe Leroy
In system_call_exception(), MSR_RI also needs to be checked on 8xx.
Only booke and 40x don't have MSR_RI.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/kernel/syscall.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/syscall.c b/arch/powerpc/kernel/syscall.c
index b66cfcbcb755..a3510fa4e641 100644
--- a/arch/powerpc/kernel/syscall.c
+++ b/arch/powerpc/kernel/syscall.c
@@ -39,7 +39,7 @@ notrace long system_call_exception(long r3, long r4, long r5,
 
trace_hardirqs_off(); /* finish reconciling */
 
-   if (IS_ENABLED(CONFIG_PPC_BOOK3S))
+   if (!IS_ENABLED(CONFIG_BOOKE) && !IS_ENABLED(CONFIG_40x))
BUG_ON(!(regs->msr & MSR_RI));
BUG_ON(!(regs->msr & MSR_PR));
BUG_ON(!FULL_REGS(regs));
-- 
2.25.0



[PATCH v4 14/23] powerpc/syscall: Save r3 in regs->orig_r3

2021-01-25 Thread Christophe Leroy
Save r3 in regs->orig_r3 in system_call_exception()

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/kernel/entry_64.S | 1 -
 arch/powerpc/kernel/syscall.c  | 2 ++
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index aa1af139d947..a562a4240aa6 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -278,7 +278,6 @@ END_BTB_FLUSH_SECTION
std r10,_LINK(r1)
std r11,_TRAP(r1)
std r12,_CCR(r1)
-   std r3,ORIG_GPR3(r1)
addir10,r1,STACK_FRAME_OVERHEAD
ld  r11,exception_marker@toc(r2)
std r11,-16(r10)/* "regshere" marker */
diff --git a/arch/powerpc/kernel/syscall.c b/arch/powerpc/kernel/syscall.c
index cb415170b8f2..b66cfcbcb755 100644
--- a/arch/powerpc/kernel/syscall.c
+++ b/arch/powerpc/kernel/syscall.c
@@ -29,6 +29,8 @@ notrace long system_call_exception(long r3, long r4, long r5,
 {
syscall_fn f;
 
+   regs->orig_gpr3 = r3;
+
if (IS_ENABLED(CONFIG_PPC_IRQ_SOFT_MASK_DEBUG))
BUG_ON(irq_soft_mask_return() != IRQS_ALL_DISABLED);
 
-- 
2.25.0



[PATCH v4 13/23] powerpc/syscall: Use is_compat_task()

2021-01-25 Thread Christophe Leroy
Instead of directly comparing task flags with _TIF_32BIT, use
is_compat_task(). The advantage is that it returns 0 on PPC32
although _TIF_32BIT is always set there.
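
For reference, this is roughly what the change relies on, as a simplified
sketch of the definitions (see include/linux/compat.h and the powerpc
asm/compat.h for the real ones):

#ifdef CONFIG_COMPAT
/* PPC64 with compat support: test whether the task is 32-bit */
static inline int is_compat_task(void)
{
        return is_32bit_task();
}
#else
/* !CONFIG_COMPAT (e.g. PPC32): constant 0, so gcc can drop the branch */
static inline int is_compat_task(void)
{
        return 0;
}
#endif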

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/kernel/syscall.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/syscall.c b/arch/powerpc/kernel/syscall.c
index bf9bf4b5bc41..cb415170b8f2 100644
--- a/arch/powerpc/kernel/syscall.c
+++ b/arch/powerpc/kernel/syscall.c
@@ -2,6 +2,8 @@
 
 #include 
 #include 
+#include 
+
 #include 
 #include 
 #include 
@@ -116,7 +118,7 @@ notrace long system_call_exception(long r3, long r4, long 
r5,
/* May be faster to do array_index_nospec? */
barrier_nospec();
 
-   if (unlikely(is_32bit_task())) {
+   if (unlikely(is_compat_task())) {
f = (void *)compat_sys_call_table[r0];
 
r3 &= 0xULL;
-- 
2.25.0



[PATCH v4 12/23] powerpc/syscall: Make syscall.c buildable on PPC32

2021-01-25 Thread Christophe Leroy
ifdef out PPC64-specific code to allow building
syscall.c on PPC32.

Modify Makefile to always build syscall.o

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/kernel/Makefile  | 4 ++--
 arch/powerpc/kernel/syscall.c | 9 +
 2 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
index 1cbc51fc82fd..23c127db0d0c 100644
--- a/arch/powerpc/kernel/Makefile
+++ b/arch/powerpc/kernel/Makefile
@@ -46,10 +46,10 @@ obj-y   := cputable.o 
syscalls.o \
   prom.o traps.o setup-common.o \
   udbg.o misc.o io.o misc_$(BITS).o \
   of_platform.o prom_parse.o firmware.o \
-  hw_breakpoint_constraints.o
+  hw_breakpoint_constraints.o syscall.o
 obj-y  += ptrace/
 obj-$(CONFIG_PPC64)+= setup_64.o \
-  paca.o nvram_64.o note.o syscall.o
+  paca.o nvram_64.o note.o
 obj-$(CONFIG_COMPAT)   += sys_ppc32.o signal_32.o
 obj-$(CONFIG_VDSO32)   += vdso32/
 obj-$(CONFIG_PPC_WATCHDOG) += watchdog.o
diff --git a/arch/powerpc/kernel/syscall.c b/arch/powerpc/kernel/syscall.c
index b627a6384029..bf9bf4b5bc41 100644
--- a/arch/powerpc/kernel/syscall.c
+++ b/arch/powerpc/kernel/syscall.c
@@ -39,7 +39,7 @@ notrace long system_call_exception(long r3, long r4, long r5,
BUG_ON(!(regs->msr & MSR_RI));
BUG_ON(!(regs->msr & MSR_PR));
BUG_ON(!FULL_REGS(regs));
-   BUG_ON(regs->softe != IRQS_ENABLED);
+   BUG_ON(arch_irq_disabled_regs(regs));
 
 #ifdef CONFIG_PPC_PKEY
if (mmu_has_feature(MMU_FTR_PKEY)) {
@@ -77,7 +77,7 @@ notrace long system_call_exception(long r3, long r4, long r5,
 * frame, or if the unwinder was taught the first stack frame always
 * returns to user with IRQS_ENABLED, this store could be avoided!
 */
-   regs->softe = IRQS_ENABLED;
+   irq_soft_mask_regs_set_state(regs, IRQS_ENABLED);
 
local_irq_enable();
 
@@ -147,6 +147,7 @@ static notrace inline bool prep_irq_for_enabled_exit(bool 
clear_ri)
__hard_EE_RI_disable();
else
__hard_irq_disable();
+#ifdef CONFIG_PPC64
if (unlikely(lazy_irq_pending_nocheck())) {
/* Took an interrupt, may have more exit work to do. */
if (clear_ri)
@@ -158,7 +159,7 @@ static notrace inline bool prep_irq_for_enabled_exit(bool 
clear_ri)
}
local_paca->irq_happened = 0;
irq_soft_mask_set(IRQS_ENABLED);
-
+#endif
return true;
 }
 
@@ -281,7 +282,7 @@ notrace unsigned long syscall_exit_prepare(unsigned long r3,
return ret;
 }
 
-#ifdef CONFIG_PPC_BOOK3S /* BOOK3E not yet using this */
+#ifdef CONFIG_PPC_BOOK3S_64 /* BOOK3E not yet using this */
 notrace unsigned long interrupt_exit_user_prepare(struct pt_regs *regs, 
unsigned long msr)
 {
 #ifdef CONFIG_PPC_BOOK3E
-- 
2.25.0



[PATCH v4 11/23] powerpc/syscall: Rename syscall_64.c into syscall.c

2021-01-25 Thread Christophe Leroy
syscall_64.c will be reused almost as is for PPC32.

Rename it to syscall.c.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/kernel/Makefile| 2 +-
 arch/powerpc/kernel/{syscall_64.c => syscall.c} | 0
 2 files changed, 1 insertion(+), 1 deletion(-)
 rename arch/powerpc/kernel/{syscall_64.c => syscall.c} (100%)

diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
index fe2ef598e2ea..1cbc51fc82fd 100644
--- a/arch/powerpc/kernel/Makefile
+++ b/arch/powerpc/kernel/Makefile
@@ -49,7 +49,7 @@ obj-y := cputable.o syscalls.o \
   hw_breakpoint_constraints.o
 obj-y  += ptrace/
 obj-$(CONFIG_PPC64)+= setup_64.o \
-  paca.o nvram_64.o note.o syscall_64.o
+  paca.o nvram_64.o note.o syscall.o
 obj-$(CONFIG_COMPAT)   += sys_ppc32.o signal_32.o
 obj-$(CONFIG_VDSO32)   += vdso32/
 obj-$(CONFIG_PPC_WATCHDOG) += watchdog.o
diff --git a/arch/powerpc/kernel/syscall_64.c b/arch/powerpc/kernel/syscall.c
similarity index 100%
rename from arch/powerpc/kernel/syscall_64.c
rename to arch/powerpc/kernel/syscall.c
-- 
2.25.0



[PATCH v4 10/23] powerpc/irq: Add stub irq_soft_mask_return() for PPC32

2021-01-25 Thread Christophe Leroy
To allow building syscall_64.c smoothly on PPC32, add a stub version
of irq_soft_mask_return().

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/hw_irq.h | 5 +
 1 file changed, 5 insertions(+)

diff --git a/arch/powerpc/include/asm/hw_irq.h 
b/arch/powerpc/include/asm/hw_irq.h
index 4739f61e632c..56a98936a6a9 100644
--- a/arch/powerpc/include/asm/hw_irq.h
+++ b/arch/powerpc/include/asm/hw_irq.h
@@ -330,6 +330,11 @@ static inline void irq_soft_mask_regs_set_state(struct 
pt_regs *regs, unsigned l
 }
 #else /* CONFIG_PPC64 */
 
+static inline notrace unsigned long irq_soft_mask_return(void)
+{
+   return 0;
+}
+
 static inline unsigned long arch_local_save_flags(void)
 {
return mfmsr();
-- 
2.25.0



[PATCH v4 09/23] powerpc/irq: Rework helpers that manipulate MSR[EE/RI]

2021-01-25 Thread Christophe Leroy
In preparation for porting PPC32 to C syscall entry/exit,
rewrite the following helpers as static inline functions and
add support for PPC32 in them:
__hard_irq_enable()
__hard_irq_disable()
__hard_EE_RI_disable()
__hard_RI_enable()

Then use them in PPC32 version of arch_local_irq_disable()
and arch_local_irq_enable() to avoid code duplication.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/hw_irq.h | 75 +--
 arch/powerpc/include/asm/reg.h|  1 +
 2 files changed, 52 insertions(+), 24 deletions(-)

diff --git a/arch/powerpc/include/asm/hw_irq.h 
b/arch/powerpc/include/asm/hw_irq.h
index ed0c3b049dfd..4739f61e632c 100644
--- a/arch/powerpc/include/asm/hw_irq.h
+++ b/arch/powerpc/include/asm/hw_irq.h
@@ -50,6 +50,55 @@
 
 #ifndef __ASSEMBLY__
 
+static inline void __hard_irq_enable(void)
+{
+   if (IS_ENABLED(CONFIG_BOOKE) || IS_ENABLED(CONFIG_40x))
+   wrtee(MSR_EE);
+   else if (IS_ENABLED(CONFIG_PPC_8xx))
+   wrtspr(SPRN_EIE);
+   else if (IS_ENABLED(CONFIG_PPC_BOOK3S_64))
+   __mtmsrd(MSR_EE | MSR_RI, 1);
+   else
+   mtmsr(mfmsr() | MSR_EE);
+}
+
+static inline void __hard_irq_disable(void)
+{
+   if (IS_ENABLED(CONFIG_BOOKE) || IS_ENABLED(CONFIG_40x))
+   wrtee(0);
+   else if (IS_ENABLED(CONFIG_PPC_8xx))
+   wrtspr(SPRN_EID);
+   else if (IS_ENABLED(CONFIG_PPC_BOOK3S_64))
+   __mtmsrd(MSR_RI, 1);
+   else
+   mtmsr(mfmsr() & ~MSR_EE);
+}
+
+static inline void __hard_EE_RI_disable(void)
+{
+   if (IS_ENABLED(CONFIG_BOOKE) || IS_ENABLED(CONFIG_40x))
+   wrtee(0);
+   else if (IS_ENABLED(CONFIG_PPC_8xx))
+   wrtspr(SPRN_NRI);
+   else if (IS_ENABLED(CONFIG_PPC_BOOK3S_64))
+   __mtmsrd(0, 1);
+   else
+   mtmsr(mfmsr() & ~(MSR_EE | MSR_RI));
+}
+
+static inline void __hard_RI_enable(void)
+{
+   if (IS_ENABLED(CONFIG_BOOKE) || IS_ENABLED(CONFIG_40x))
+   return;
+
+   if (IS_ENABLED(CONFIG_PPC_8xx))
+   wrtspr(SPRN_EID);
+   else if (IS_ENABLED(CONFIG_PPC_BOOK3S_64))
+   __mtmsrd(MSR_RI, 1);
+   else
+   mtmsr(mfmsr() | MSR_RI);
+}
+
 #ifdef CONFIG_PPC64
 #include 
 
@@ -212,18 +261,6 @@ static inline bool arch_irqs_disabled(void)
 
 #endif /* CONFIG_PPC_BOOK3S */
 
-#ifdef CONFIG_PPC_BOOK3E
-#define __hard_irq_enable()wrtee(MSR_EE)
-#define __hard_irq_disable()   wrtee(0)
-#define __hard_EE_RI_disable() wrtee(0)
-#define __hard_RI_enable() do { } while (0)
-#else
-#define __hard_irq_enable()__mtmsrd(MSR_EE|MSR_RI, 1)
-#define __hard_irq_disable()   __mtmsrd(MSR_RI, 1)
-#define __hard_EE_RI_disable() __mtmsrd(0, 1)
-#define __hard_RI_enable() __mtmsrd(MSR_RI, 1)
-#endif
-
 #define hard_irq_disable() do {\
unsigned long flags;\
__hard_irq_disable();   \
@@ -322,22 +359,12 @@ static inline unsigned long arch_local_irq_save(void)
 
 static inline void arch_local_irq_disable(void)
 {
-   if (IS_ENABLED(CONFIG_BOOKE))
-   wrtee(0);
-   else if (IS_ENABLED(CONFIG_PPC_8xx))
-   wrtspr(SPRN_EID);
-   else
-   mtmsr(mfmsr() & ~MSR_EE);
+   __hard_irq_disable();
 }
 
 static inline void arch_local_irq_enable(void)
 {
-   if (IS_ENABLED(CONFIG_BOOKE))
-   wrtee(MSR_EE);
-   else if (IS_ENABLED(CONFIG_PPC_8xx))
-   wrtspr(SPRN_EIE);
-   else
-   mtmsr(mfmsr() | MSR_EE);
+   __hard_irq_enable();
 }
 
 static inline bool arch_irqs_disabled_flags(unsigned long flags)
diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
index e40a921d78f9..d05dca30604d 100644
--- a/arch/powerpc/include/asm/reg.h
+++ b/arch/powerpc/include/asm/reg.h
@@ -1375,6 +1375,7 @@
 #define mtmsr(v)   asm volatile("mtmsr %0" : \
 : "r" ((unsigned long)(v)) \
 : "memory")
+#define __mtmsrd(v, l) BUILD_BUG()
 #define __MTMSR"mtmsr"
 #endif
 
-- 
2.25.0



[PATCH v4 05/23] powerpc/64s: Make kuap_check_amr() and kuap_get_and_check_amr() generic

2021-01-25 Thread Christophe Leroy
In preparation for porting powerpc32 to C syscall entry/exit,
rename kuap_check_amr() and kuap_get_and_check_amr() as kuap_check()
and kuap_get_and_check(), and move in the generic asm/kup.h the stub
for when CONFIG_PPC_KUAP is not selected.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/book3s/64/kup.h | 24 ++--
 arch/powerpc/include/asm/kup.h   |  9 -
 arch/powerpc/kernel/syscall_64.c | 12 ++--
 3 files changed, 16 insertions(+), 29 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/kup.h 
b/arch/powerpc/include/asm/book3s/64/kup.h
index f50f72e535aa..1507681ad4ef 100644
--- a/arch/powerpc/include/asm/book3s/64/kup.h
+++ b/arch/powerpc/include/asm/book3s/64/kup.h
@@ -281,7 +281,7 @@ static inline void kuap_kernel_restore(struct pt_regs *regs,
 */
 }
 
-static inline unsigned long kuap_get_and_check_amr(void)
+static inline unsigned long kuap_get_and_check(void)
 {
if (mmu_has_feature(MMU_FTR_BOOK3S_KUAP)) {
unsigned long amr = mfspr(SPRN_AMR);
@@ -292,27 +292,7 @@ static inline unsigned long kuap_get_and_check_amr(void)
return 0;
 }
 
-#else /* CONFIG_PPC_PKEY */
-
-static inline void kuap_user_restore(struct pt_regs *regs)
-{
-}
-
-static inline void kuap_kernel_restore(struct pt_regs *regs, unsigned long amr)
-{
-}
-
-static inline unsigned long kuap_get_and_check_amr(void)
-{
-   return 0;
-}
-
-#endif /* CONFIG_PPC_PKEY */
-
-
-#ifdef CONFIG_PPC_KUAP
-
-static inline void kuap_check_amr(void)
+static inline void kuap_check(void)
 {
if (IS_ENABLED(CONFIG_PPC_KUAP_DEBUG) && 
mmu_has_feature(MMU_FTR_BOOK3S_KUAP))
WARN_ON_ONCE(mfspr(SPRN_AMR) != AMR_KUAP_BLOCKED);
diff --git a/arch/powerpc/include/asm/kup.h b/arch/powerpc/include/asm/kup.h
index bf221a2a523e..6ef9f9cfbed0 100644
--- a/arch/powerpc/include/asm/kup.h
+++ b/arch/powerpc/include/asm/kup.h
@@ -66,7 +66,14 @@ bad_kuap_fault(struct pt_regs *regs, unsigned long address, 
bool is_write)
return false;
 }
 
-static inline void kuap_check_amr(void) { }
+static inline void kuap_check(void) { }
+static inline void kuap_user_restore(struct pt_regs *regs) { }
+static inline void kuap_kernel_restore(struct pt_regs *regs, unsigned long 
amr) { }
+
+static inline unsigned long kuap_get_and_check(void)
+{
+   return 0;
+}
 
 /*
  * book3s/64/kup-radix.h defines these functions for the !KUAP case to flush
diff --git a/arch/powerpc/kernel/syscall_64.c b/arch/powerpc/kernel/syscall_64.c
index 32f72965da26..b627a6384029 100644
--- a/arch/powerpc/kernel/syscall_64.c
+++ b/arch/powerpc/kernel/syscall_64.c
@@ -65,7 +65,7 @@ notrace long system_call_exception(long r3, long r4, long r5,
isync();
} else
 #endif
-   kuap_check_amr();
+   kuap_check();
 
account_cpu_user_entry();
 
@@ -181,7 +181,7 @@ notrace unsigned long syscall_exit_prepare(unsigned long r3,
 
CT_WARN_ON(ct_state() == CONTEXT_USER);
 
-   kuap_check_amr();
+   kuap_check();
 
regs->result = r3;
 
@@ -303,7 +303,7 @@ notrace unsigned long interrupt_exit_user_prepare(struct 
pt_regs *regs, unsigned
 * We don't need to restore AMR on the way back to userspace for KUAP.
 * AMR can only have been unlocked if we interrupted the kernel.
 */
-   kuap_check_amr();
+   kuap_check();
 
local_irq_save(flags);
 
@@ -381,7 +381,7 @@ notrace unsigned long interrupt_exit_kernel_prepare(struct 
pt_regs *regs, unsign
unsigned long *ti_flagsp = &current_thread_info()->flags;
unsigned long flags;
unsigned long ret = 0;
-   unsigned long amr;
+   unsigned long kuap;
 
if (IS_ENABLED(CONFIG_PPC_BOOK3S) && unlikely(!(regs->msr & MSR_RI)))
unrecoverable_exception(regs);
@@ -394,7 +394,7 @@ notrace unsigned long interrupt_exit_kernel_prepare(struct 
pt_regs *regs, unsign
if (TRAP(regs) != 0x700)
CT_WARN_ON(ct_state() == CONTEXT_USER);
 
-   amr = kuap_get_and_check_amr();
+   kuap = kuap_get_and_check();
 
if (unlikely(*ti_flagsp & _TIF_EMULATE_STACK_STORE)) {
clear_bits(_TIF_EMULATE_STACK_STORE, ti_flagsp);
@@ -446,7 +446,7 @@ notrace unsigned long interrupt_exit_kernel_prepare(struct 
pt_regs *regs, unsign
 * which would cause Read-After-Write stalls. Hence, we take the AMR
 * value from the check above.
 */
-   kuap_kernel_restore(regs, amr);
+   kuap_kernel_restore(regs, kuap);
 
return ret;
 }
-- 
2.25.0



[PATCH v4 06/23] powerpc/32s: Create C version of kuap_user/kernel_restore() and friends

2021-01-25 Thread Christophe Leroy
In preparation for porting PPC32 to C syscall entry/exit,
create C versions of kuap_user_restore(), kuap_kernel_restore(),
kuap_check() and kuap_get_and_check() on book3s/32.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/book3s/32/kup.h | 33 
 1 file changed, 33 insertions(+)

diff --git a/arch/powerpc/include/asm/book3s/32/kup.h 
b/arch/powerpc/include/asm/book3s/32/kup.h
index a0117a9d5b06..a3e72e1141c5 100644
--- a/arch/powerpc/include/asm/book3s/32/kup.h
+++ b/arch/powerpc/include/asm/book3s/32/kup.h
@@ -103,6 +103,39 @@ static inline void kuap_update_sr(u32 sr, u32 addr, u32 
end)
isync();/* Context sync required after mtsrin() */
 }
 
+static inline void kuap_user_restore(struct pt_regs *regs)
+{
+}
+
+static inline void kuap_kernel_restore(struct pt_regs *regs, unsigned long 
kuap)
+{
+   u32 addr = kuap & 0xf000;
+   u32 end = kuap << 28;
+
+   if (unlikely(!kuap))
+   return;
+
+   current->thread.kuap = 0;
+   kuap_update_sr(mfsrin(addr) & ~SR_KS, addr, end);   /* Clear Ks */
+}
+
+static inline void kuap_check(void)
+{
+   if (!IS_ENABLED(CONFIG_PPC_KUAP_DEBUG))
+   return;
+
+   WARN_ON_ONCE(current->thread.kuap != 0);
+}
+
+static inline unsigned long kuap_get_and_check(void)
+{
+   unsigned long kuap = current->thread.kuap;
+
+   WARN_ON_ONCE(IS_ENABLED(CONFIG_PPC_KUAP_DEBUG) && kuap != 0);
+
+   return kuap;
+}
+
 static __always_inline void allow_user_access(void __user *to, const void 
__user *from,
  u32 size, unsigned long dir)
 {
-- 
2.25.0



[PATCH v4 08/23] powerpc/irq: Add helper to set regs->softe

2021-01-25 Thread Christophe Leroy
regs->softe doesn't exist on PPC32.

Add an irq_soft_mask_regs_set_state() helper to set regs->softe.
This helper is a no-op on PPC32.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/hw_irq.h | 11 +--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/hw_irq.h 
b/arch/powerpc/include/asm/hw_irq.h
index 614957f74cee..ed0c3b049dfd 100644
--- a/arch/powerpc/include/asm/hw_irq.h
+++ b/arch/powerpc/include/asm/hw_irq.h
@@ -38,6 +38,8 @@
 #define PACA_IRQ_MUST_HARD_MASK(PACA_IRQ_EE)
 #endif
 
+#endif /* CONFIG_PPC64 */
+
 /*
  * flags for paca->irq_soft_mask
  */
@@ -46,8 +48,6 @@
 #define IRQS_PMI_DISABLED  2
 #define IRQS_ALL_DISABLED  (IRQS_DISABLED | IRQS_PMI_DISABLED)
 
-#endif /* CONFIG_PPC64 */
-
 #ifndef __ASSEMBLY__
 
 #ifdef CONFIG_PPC64
@@ -287,6 +287,10 @@ extern void irq_set_pending_from_srr1(unsigned long srr1);
 
 extern void force_external_irq_replay(void);
 
+static inline void irq_soft_mask_regs_set_state(struct pt_regs *regs, unsigned 
long val)
+{
+   regs->softe = val;
+}
 #else /* CONFIG_PPC64 */
 
 static inline unsigned long arch_local_save_flags(void)
@@ -355,6 +359,9 @@ static inline bool arch_irq_disabled_regs(struct pt_regs 
*regs)
 
 static inline void may_hard_irq_enable(void) { }
 
+static inline void irq_soft_mask_regs_set_state(struct pt_regs *regs, unsigned 
long val)
+{
+}
 #endif /* CONFIG_PPC64 */
 
 #define ARCH_IRQ_INIT_FLAGSIRQ_NOREQUEST
-- 
2.25.0



[PATCH v4 07/23] powerpc/8xx: Create C version of kuap_user/kernel_restore() and friends

2021-01-25 Thread Christophe Leroy
In preparation for porting PPC32 to C syscall entry/exit,
create C versions of kuap_user_restore(), kuap_kernel_restore(),
kuap_check() and kuap_get_and_check() on 8xx.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/nohash/32/kup-8xx.h | 27 
 1 file changed, 27 insertions(+)

diff --git a/arch/powerpc/include/asm/nohash/32/kup-8xx.h 
b/arch/powerpc/include/asm/nohash/32/kup-8xx.h
index 17a4a616436f..5ca6c375f767 100644
--- a/arch/powerpc/include/asm/nohash/32/kup-8xx.h
+++ b/arch/powerpc/include/asm/nohash/32/kup-8xx.h
@@ -34,6 +34,33 @@
 
 #include 
 
+static inline void kuap_user_restore(struct pt_regs *regs)
+{
+}
+
+static inline void kuap_kernel_restore(struct pt_regs *regs, unsigned long 
kuap)
+{
+   mtspr(SPRN_MD_AP, kuap);
+}
+
+static inline void kuap_check(void)
+{
+   if (!IS_ENABLED(CONFIG_PPC_KUAP_DEBUG))
+   return;
+
+   WARN_ON_ONCE(mfspr(SPRN_MD_AP) >> 16 != MD_APG_KUAP >> 16);
+}
+
+static inline unsigned long kuap_get_and_check(void)
+{
+   unsigned long kuap = mfspr(SPRN_MD_AP);
+
+   if (IS_ENABLED(CONFIG_PPC_KUAP_DEBUG))
+   WARN_ON_ONCE(mfspr(SPRN_MD_AP) >> 16 != MD_APG_KUAP >> 16);
+
+   return kuap;
+}
+
 static inline void allow_user_access(void __user *to, const void __user *from,
 unsigned long size, unsigned long dir)
 {
-- 
2.25.0



[PATCH v4 02/23] powerpc/32: Always enable data translation on syscall entry

2021-01-25 Thread Christophe Leroy
If the code can use a stack in vm area, it can also use a
stack in linear space.

Simplify the code by removing the old non-VMAP stack code from the PPC32 syscall entry.

That means the data translation is now re-enabled early in
syscall entry in all cases, not only when using VMAP stacks.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/kernel/head_32.h| 23 +--
 arch/powerpc/kernel/head_booke.h |  2 --
 2 files changed, 1 insertion(+), 24 deletions(-)

diff --git a/arch/powerpc/kernel/head_32.h b/arch/powerpc/kernel/head_32.h
index a2f72c966baf..fdc07beab844 100644
--- a/arch/powerpc/kernel/head_32.h
+++ b/arch/powerpc/kernel/head_32.h
@@ -118,7 +118,6 @@
 .macro SYSCALL_ENTRY trapno
mfspr   r12,SPRN_SPRG_THREAD
mfspr   r9, SPRN_SRR1
-#ifdef CONFIG_VMAP_STACK
mfspr   r11, SPRN_SRR0
mtctr   r11
andi.   r11, r9, MSR_PR
@@ -126,30 +125,16 @@
lwz r1,TASK_STACK-THREAD(r12)
beq-99f
addir1, r1, THREAD_SIZE - INT_FRAME_SIZE
-   li  r10, MSR_KERNEL & ~(MSR_IR | MSR_RI) /* can take DTLB miss */
+   LOAD_REG_IMMEDIATE(r10, MSR_KERNEL & ~(MSR_IR | MSR_RI)) /* can take 
DTLB miss */
mtmsr   r10
isync
tovirt(r12, r12)
stw r11,GPR1(r1)
stw r11,0(r1)
mr  r11, r1
-#else
-   andi.   r11, r9, MSR_PR
-   lwz r11,TASK_STACK-THREAD(r12)
-   beq-99f
-   addir11, r11, THREAD_SIZE - INT_FRAME_SIZE
-   tophys(r11, r11)
-   stw r1,GPR1(r11)
-   stw r1,0(r11)
-   tovirt(r1, r11) /* set new kernel sp */
-#endif
mflrr10
stw r10, _LINK(r11)
-#ifdef CONFIG_VMAP_STACK
mfctr   r10
-#else
-   mfspr   r10,SPRN_SRR0
-#endif
stw r10,_NIP(r11)
mfcrr10
rlwinm  r10,r10,0,4,2   /* Clear SO bit in CR */
@@ -157,11 +142,7 @@
 #ifdef CONFIG_40x
rlwinm  r9,r9,0,14,12   /* clear MSR_WE (necessary?) */
 #else
-#ifdef CONFIG_VMAP_STACK
LOAD_REG_IMMEDIATE(r10, MSR_KERNEL & ~MSR_IR) /* can take exceptions */
-#else
-   LOAD_REG_IMMEDIATE(r10, MSR_KERNEL & ~(MSR_IR|MSR_DR)) /* can take 
exceptions */
-#endif
mtmsr   r10 /* (except for mach check in rtas) */
 #endif
lis r10,STACK_FRAME_REGS_MARKER@ha /* exception frame marker */
@@ -190,7 +171,6 @@
li  r12,-1  /* clear all pending debug events */
mtspr   SPRN_DBSR,r12
lis r11,global_dbcr0@ha
-   tophys(r11,r11)
addir11,r11,global_dbcr0@l
lwz r12,0(r11)
mtspr   SPRN_DBCR0,r12
@@ -200,7 +180,6 @@
 #endif
 
 3:
-   tovirt_novmstack r2, r2 /* set r2 to current */
lis r11, transfer_to_syscall@h
ori r11, r11, transfer_to_syscall@l
 #ifdef CONFIG_TRACE_IRQFLAGS
diff --git a/arch/powerpc/kernel/head_booke.h b/arch/powerpc/kernel/head_booke.h
index bf33af714d11..706cd9368992 100644
--- a/arch/powerpc/kernel/head_booke.h
+++ b/arch/powerpc/kernel/head_booke.h
@@ -144,7 +144,6 @@ ALT_FTR_SECTION_END_IFSET(CPU_FTR_EMB_HV)
li  r12,-1  /* clear all pending debug events */
mtspr   SPRN_DBSR,r12
lis r11,global_dbcr0@ha
-   tophys(r11,r11)
addir11,r11,global_dbcr0@l
 #ifdef CONFIG_SMP
lwz r10, TASK_CPU(r2)
@@ -158,7 +157,6 @@ ALT_FTR_SECTION_END_IFSET(CPU_FTR_EMB_HV)
stw r12,4(r11)
 
 3:
-   tovirt(r2, r2)  /* set r2 to current */
lis r11, transfer_to_syscall@h
ori r11, r11, transfer_to_syscall@l
 #ifdef CONFIG_TRACE_IRQFLAGS
-- 
2.25.0



[PATCH v4 00/23] powerpc/32: Implement C syscall entry/exit

2021-01-25 Thread Christophe Leroy
This series implements C syscall entry/exit for PPC32. It reuses
the work already done for PPC64.

This series is based on Nick's v6 series "powerpc: interrupt wrappers".

Patch 1 is a bug fix submitted separately but this series depends on it.
Patches 2-4 are an extract from the series "powerpc/32: Reduce head
complexity and re-activate MMU earlier". The changes here are limited
to system calls. That series will be respun to contain only the
exception-related changes, and the syscall changes will remain in this series.
Patches 5-16 are preparatory changes.
Patch 17 is THE patch that switches to the C syscall entry/exit
Patches 18-23 are optimisations.

In terms of performance, we have the following cycle counts on an
8xx running the null_syscall benchmark:
- mainline: 296 cycles
- after patch 4: 283 cycles
- after patch 16: 304 cycles
- after patch 17: 348 cycles
- at the end of the series: 320 cycles

So in summary, we have a performance degradation of 8% on null_syscall.

I think that is not a big degradation, and it is worth it.
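
For reference, a simplified sketch of a null_syscall-style measurement
(the in-tree selftest under tools/testing/selftests/powerpc/benchmarks
measures cycles via the timebase; this wall-clock variant is illustrative
only):

#include <stdio.h>
#include <unistd.h>
#include <sys/time.h>

#define ITERATIONS 10000000UL

int main(void)
{
	struct timeval start, end;
	unsigned long i;
	double ns;

	gettimeofday(&start, NULL);
	for (i = 0; i < ITERATIONS; i++)
		getppid();	/* trivial syscall, almost no kernel work */
	gettimeofday(&end, NULL);

	ns = (end.tv_sec - start.tv_sec) * 1e9 +
	     (end.tv_usec - start.tv_usec) * 1e3;
	printf("ns per syscall: %f\n", ns / ITERATIONS);
	return 0;
}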

v4 is the first mature version.

Christophe Leroy (23):
  powerpc/32s: Add missing call to kuep_lock on syscall entry
  powerpc/32: Always enable data translation on syscall entry
  powerpc/32: On syscall entry, enable instruction translation at the
same time as data
  powerpc/32: Reorder instructions to avoid using CTR in syscall entry
  powerpc/64s: Make kuap_check_amr() and kuap_get_and_check_amr()
generic
  powerpc/32s: Create C version of kuap_user/kernel_restore() and
friends
  powerpc/8xx: Create C version of kuap_user/kernel_restore() and
friends
  powerpc/irq: Add helper to set regs->softe
  powerpc/irq: Rework helpers that manipulate MSR[EE/RI]
  powerpc/irq: Add stub irq_soft_mask_return() for PPC32
  powerpc/syscall: Rename syscall_64.c into syscall.c
  powerpc/syscall: Make syscall.c buildable on PPC32
  powerpc/syscall: Use is_compat_task()
  powerpc/syscall: Save r3 in regs->orig_r3
  powerpc/syscall: Change condition to check MSR_RI
  powerpc/32: Always save non volatile GPRs at syscall entry
  powerpc/syscall: implement system call entry/exit logic in C for PPC32
  powerpc/32: Remove verification of MSR_PR on syscall in the ASM entry
  powerpc/syscall: Avoid stack frame in likely part of
system_call_exception()
  powerpc/syscall: Do not check unsupported scv vector on PPC32
  powerpc/syscall: Remove FULL_REGS verification in
system_call_exception
  powerpc/syscall: Optimise checks in beginning of
system_call_exception()
  powerpc/syscall: Avoid storing 'current' in another pointer

 arch/powerpc/include/asm/book3s/32/kup.h  |  33 ++
 arch/powerpc/include/asm/book3s/64/kup.h  |  24 +-
 arch/powerpc/include/asm/hw_irq.h |  91 --
 arch/powerpc/include/asm/kup.h|   9 +-
 arch/powerpc/include/asm/nohash/32/kup-8xx.h  |  27 ++
 arch/powerpc/include/asm/reg.h|   1 +
 arch/powerpc/kernel/Makefile  |   4 +-
 arch/powerpc/kernel/entry_32.S| 305 ++
 arch/powerpc/kernel/entry_64.S|   1 -
 arch/powerpc/kernel/head_32.h |  76 +
 arch/powerpc/kernel/head_booke.h  |  27 +-
 .../kernel/{syscall_64.c => syscall.c}|  57 ++--
 arch/powerpc/kernel/syscalls/syscall.tbl  |  20 +-
 13 files changed, 225 insertions(+), 450 deletions(-)
 rename arch/powerpc/kernel/{syscall_64.c => syscall.c} (90%)

-- 
2.25.0



[PATCH v4 03/23] powerpc/32: On syscall entry, enable instruction translation at the same time as data

2021-01-25 Thread Christophe Leroy
On 40x and 8xx, kernel text is pinned.
On book3s/32, kernel text is mapped by BATs.

Enable instruction translation at the same time as data translation; it
makes things simpler.

MSR_RI can also be set at the same time because srr0/srr1 are already
saved and r1 is set properly.

On booke, translation is always on, so in the end all PPC32 platforms
have translation on early.

This reduces the null_syscall benchmark by 13 cycles on 8xx
(296 ==> 283 cycles).

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/kernel/head_32.h| 26 +-
 arch/powerpc/kernel/head_booke.h |  7 ++-
 2 files changed, 11 insertions(+), 22 deletions(-)

diff --git a/arch/powerpc/kernel/head_32.h b/arch/powerpc/kernel/head_32.h
index fdc07beab844..4029c51dce5d 100644
--- a/arch/powerpc/kernel/head_32.h
+++ b/arch/powerpc/kernel/head_32.h
@@ -125,9 +125,13 @@
lwz r1,TASK_STACK-THREAD(r12)
beq-99f
addir1, r1, THREAD_SIZE - INT_FRAME_SIZE
-   LOAD_REG_IMMEDIATE(r10, MSR_KERNEL & ~(MSR_IR | MSR_RI)) /* can take 
DTLB miss */
-   mtmsr   r10
-   isync
+   LOAD_REG_IMMEDIATE(r10, MSR_KERNEL) /* can take exceptions 
*/
+   mtspr   SPRN_SRR1, r10
+   lis r10, 1f@h
+   ori r10, r10, 1f@l
+   mtspr   SPRN_SRR0, r10
+   rfi
+1:
tovirt(r12, r12)
stw r11,GPR1(r1)
stw r11,0(r1)
@@ -141,9 +145,6 @@
stw r10,_CCR(r11)   /* save registers */
 #ifdef CONFIG_40x
rlwinm  r9,r9,0,14,12   /* clear MSR_WE (necessary?) */
-#else
-   LOAD_REG_IMMEDIATE(r10, MSR_KERNEL & ~MSR_IR) /* can take exceptions */
-   mtmsr   r10 /* (except for mach check in rtas) */
 #endif
lis r10,STACK_FRAME_REGS_MARKER@ha /* exception frame marker */
stw r2,GPR2(r11)
@@ -180,8 +181,6 @@
 #endif
 
 3:
-   lis r11, transfer_to_syscall@h
-   ori r11, r11, transfer_to_syscall@l
 #ifdef CONFIG_TRACE_IRQFLAGS
/*
 * If MSR is changing we need to keep interrupts disabled at this point
@@ -193,15 +192,8 @@
 #else
LOAD_REG_IMMEDIATE(r10, MSR_KERNEL | MSR_EE)
 #endif
-#if defined(CONFIG_PPC_8xx) && defined(CONFIG_PERF_EVENTS)
-   mtspr   SPRN_NRI, r0
-#endif
-   mtspr   SPRN_SRR1,r10
-   mtspr   SPRN_SRR0,r11
-   rfi /* jump to handler, enable MMU */
-#ifdef CONFIG_40x
-   b . /* Prevent prefetch past rfi */
-#endif
+   mtmsr   r10
+   b   transfer_to_syscall /* jump to handler */
 99:b   ret_from_kernel_syscall
 .endm
 
diff --git a/arch/powerpc/kernel/head_booke.h b/arch/powerpc/kernel/head_booke.h
index 706cd9368992..b3c502c503a0 100644
--- a/arch/powerpc/kernel/head_booke.h
+++ b/arch/powerpc/kernel/head_booke.h
@@ -157,8 +157,6 @@ ALT_FTR_SECTION_END_IFSET(CPU_FTR_EMB_HV)
stw r12,4(r11)
 
 3:
-   lis r11, transfer_to_syscall@h
-   ori r11, r11, transfer_to_syscall@l
 #ifdef CONFIG_TRACE_IRQFLAGS
/*
 * If MSR is changing we need to keep interrupts disabled at this point
@@ -172,9 +170,8 @@ ALT_FTR_SECTION_END_IFSET(CPU_FTR_EMB_HV)
lis r10, (MSR_KERNEL | MSR_EE)@h
ori r10, r10, (MSR_KERNEL | MSR_EE)@l
 #endif
-   mtspr   SPRN_SRR1,r10
-   mtspr   SPRN_SRR0,r11
-   rfi /* jump to handler, enable MMU */
+   mtmsr   r10
+   b   transfer_to_syscall /* jump to handler */
 99:b   ret_from_kernel_syscall
 .endm
 
-- 
2.25.0



[PATCH v4 01/23] powerpc/32s: Add missing call to kuep_lock on syscall entry

2021-01-25 Thread Christophe Leroy
Userspace execution protection and fast syscall entry were implemented
independently of each other and were both merged in kernel 5.2,
leaving syscall entry without userspace execution protection.

On syscall entry, execution of user space memory must be
locked in the same way as on exception entry.

Fixes: b86fb88855ea ("powerpc/32: implement fast entry for syscalls on non 
BOOKE")
Cc: sta...@vger.kernel.org
Signed-off-by: Christophe Leroy 
---
 arch/powerpc/kernel/entry_32.S | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/powerpc/kernel/entry_32.S b/arch/powerpc/kernel/entry_32.S
index b102b40c4988..b1e36602c013 100644
--- a/arch/powerpc/kernel/entry_32.S
+++ b/arch/powerpc/kernel/entry_32.S
@@ -351,6 +351,9 @@ trace_syscall_entry_irq_off:
 
.globl  transfer_to_syscall
 transfer_to_syscall:
+#ifdef CONFIG_PPC_BOOK3S_32
+   kuep_lock r11, r12
+#endif
 #ifdef CONFIG_TRACE_IRQFLAGS
andi.   r12,r9,MSR_EE
beq-trace_syscall_entry_irq_off
-- 
2.25.0



[PATCH v4 04/23] powerpc/32: Reorder instructions to avoid using CTR in syscall entry

2021-01-25 Thread Christophe Leroy
Now that we are using rfi instead of mtmsr to reactivate the MMU, it is
possible to reorder instructions and avoid the need to use CTR for
stashing SRR0.

null_syscall on 8xx is reduced by 3 cycles (283 => 280 cycles).

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/kernel/head_32.h | 22 ++
 1 file changed, 10 insertions(+), 12 deletions(-)

diff --git a/arch/powerpc/kernel/head_32.h b/arch/powerpc/kernel/head_32.h
index 4029c51dce5d..24dc326e0d56 100644
--- a/arch/powerpc/kernel/head_32.h
+++ b/arch/powerpc/kernel/head_32.h
@@ -116,30 +116,28 @@
 .endm
 
 .macro SYSCALL_ENTRY trapno
-   mfspr   r12,SPRN_SPRG_THREAD
mfspr   r9, SPRN_SRR1
-   mfspr   r11, SPRN_SRR0
-   mtctr   r11
+   mfspr   r10, SPRN_SRR0
andi.   r11, r9, MSR_PR
+   beq-99f
+   LOAD_REG_IMMEDIATE(r11, MSR_KERNEL) /* can take exceptions 
*/
+   lis r12, 1f@h
+   ori r12, r12, 1f@l
+   mtspr   SPRN_SRR1, r11
+   mtspr   SPRN_SRR0, r12
+   mfspr   r12,SPRN_SPRG_THREAD
mr  r11, r1
lwz r1,TASK_STACK-THREAD(r12)
-   beq-99f
+   tovirt(r12, r12)
addir1, r1, THREAD_SIZE - INT_FRAME_SIZE
-   LOAD_REG_IMMEDIATE(r10, MSR_KERNEL) /* can take exceptions 
*/
-   mtspr   SPRN_SRR1, r10
-   lis r10, 1f@h
-   ori r10, r10, 1f@l
-   mtspr   SPRN_SRR0, r10
rfi
 1:
-   tovirt(r12, r12)
stw r11,GPR1(r1)
stw r11,0(r1)
mr  r11, r1
+   stw r10,_NIP(r11)
mflrr10
stw r10, _LINK(r11)
-   mfctr   r10
-   stw r10,_NIP(r11)
mfcrr10
rlwinm  r10,r10,0,4,2   /* Clear SO bit in CR */
stw r10,_CCR(r11)   /* save registers */
-- 
2.25.0



[PATCH v2 2/2] powerpc/sstep: Fix incorrect return from analyze_instr()

2021-01-25 Thread Ananth N Mavinakayanahalli
We currently just percolate the return value from analyse_instr()
up to the caller of emulate_step(), even when it is -1.

For one particular case (opcode = 4), for instructions that aren't
currently emulated, we return 'should not be single-stepped'
when we should have returned 0, which says 'did not emulate, may
have to single-step'.
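
A minimal sketch of that convention from the caller's point of view
(illustrative only, not lifted from the kernel sources; regs and insn
come from the caller's context):

	int ret = emulate_step(regs, insn);

	if (ret > 0) {
		/* emulated: regs already updated, no single-step needed */
	} else if (ret == 0) {
		/* did not emulate: caller may fall back to single-stepping */
	} else {
		/* negative: the instruction should not be single-stepped */
	}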

Fixes: 930d6288a26787 ("powerpc: sstep: Add support for maddhd, maddhdu, maddld 
instructions")
Signed-off-by: Ananth N Mavinakayanahalli 
Suggested-by: Michael Ellerman 
Tested-by: Naveen N. Rao 
Reviewed-by: Sandipan Das 
---
 arch/powerpc/lib/sstep.c |7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/lib/sstep.c b/arch/powerpc/lib/sstep.c
index f859cbbb6375..e96cff845ef7 100644
--- a/arch/powerpc/lib/sstep.c
+++ b/arch/powerpc/lib/sstep.c
@@ -1445,6 +1445,11 @@ int analyse_instr(struct instruction_op *op, const 
struct pt_regs *regs,
 
 #ifdef __powerpc64__
case 4:
+   /*
+* There are very many instructions with this primary opcode
+* introduced in the ISA as early as v2.03. However, the ones
+* we currently emulate were all introduced with ISA 3.0
+*/
if (!cpu_has_feature(CPU_FTR_ARCH_300))
goto unknown_opcode;
 
@@ -1472,7 +1477,7 @@ int analyse_instr(struct instruction_op *op, const struct 
pt_regs *regs,
 * There are other instructions from ISA 3.0 with the same
 * primary opcode which do not have emulation support yet.
 */
-   return -1;
+   goto unknown_opcode;
 #endif
 
case 7: /* mulli */




[PATCH v4 1/2] [PATCH] powerpc/sstep: Check instruction validity against ISA version before emulation

2021-01-25 Thread Ananth N Mavinakayanahalli
We currently try to emulate newer instructions unconditionally, even on
older Power versions, which could cause issues. Gate it.
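
As the [v4] changelog note below explains, unknown opcodes are now
flagged rather than rejected outright; a rough sketch of the target label
(assumed from that note, the label itself is not shown in the hunks below):

 unknown_opcode:
	op->type = UNKNOWN;
	return 0;	/* "did not emulate", the caller may single-step */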

Fixes: 350779a29f11 ("powerpc: Handle most loads and stores in instruction 
emulation code")
Signed-off-by: Ananth N Mavinakayanahalli 
---

[v4] Based on feedback from Paul Mackerras, Naveen Rao and Michael Ellerman,
 changed return code to 0, after setting opcode type to UNKNOWN
[v3] Addressed Naveen's comments on scv and addpcis
[v2] Fixed description
---
 arch/powerpc/lib/sstep.c |   78 +-
 1 file changed, 62 insertions(+), 16 deletions(-)

diff --git a/arch/powerpc/lib/sstep.c b/arch/powerpc/lib/sstep.c
index bf7a7d62ae8b..f859cbbb6375 100644
--- a/arch/powerpc/lib/sstep.c
+++ b/arch/powerpc/lib/sstep.c
@@ -1304,9 +1304,11 @@ int analyse_instr(struct instruction_op *op, const 
struct pt_regs *regs,
if ((word & 0xfe2) == 2)
op->type = SYSCALL;
else if (IS_ENABLED(CONFIG_PPC_BOOK3S_64) &&
-   (word & 0xfe3) == 1)
+   (word & 0xfe3) == 1) {  /* scv */
op->type = SYSCALL_VECTORED_0;
-   else
+   if (!cpu_has_feature(CPU_FTR_ARCH_300))
+   goto unknown_opcode;
+   } else
op->type = UNKNOWN;
return 0;
 #endif
@@ -1410,7 +1412,7 @@ int analyse_instr(struct instruction_op *op, const struct 
pt_regs *regs,
 #ifdef __powerpc64__
case 1:
if (!cpu_has_feature(CPU_FTR_ARCH_31))
-   return -1;
+   goto unknown_opcode;
 
prefix_r = GET_PREFIX_R(word);
ra = GET_PREFIX_RA(suffix);
@@ -1444,7 +1446,7 @@ int analyse_instr(struct instruction_op *op, const struct 
pt_regs *regs,
 #ifdef __powerpc64__
case 4:
if (!cpu_has_feature(CPU_FTR_ARCH_300))
-   return -1;
+   goto unknown_opcode;
 
switch (word & 0x3f) {
case 48:/* maddhd */
@@ -1530,6 +1532,8 @@ int analyse_instr(struct instruction_op *op, const struct 
pt_regs *regs,
case 19:
if (((word >> 1) & 0x1f) == 2) {
/* addpcis */
+   if (!cpu_has_feature(CPU_FTR_ARCH_300))
+   goto unknown_opcode;
imm = (short) (word & 0xffc1);  /* d0 + d2 fields */
imm |= (word >> 15) & 0x3e; /* d1 field */
op->val = regs->nip + (imm << 16) + 4;
@@ -1842,7 +1846,7 @@ int analyse_instr(struct instruction_op *op, const struct 
pt_regs *regs,
 #ifdef __powerpc64__
case 265:   /* modud */
if (!cpu_has_feature(CPU_FTR_ARCH_300))
-   return -1;
+   goto unknown_opcode;
op->val = regs->gpr[ra] % regs->gpr[rb];
goto compute_done;
 #endif
@@ -1852,7 +1856,7 @@ int analyse_instr(struct instruction_op *op, const struct 
pt_regs *regs,
 
case 267:   /* moduw */
if (!cpu_has_feature(CPU_FTR_ARCH_300))
-   return -1;
+   goto unknown_opcode;
op->val = (unsigned int) regs->gpr[ra] %
(unsigned int) regs->gpr[rb];
goto compute_done;
@@ -1889,7 +1893,7 @@ int analyse_instr(struct instruction_op *op, const struct 
pt_regs *regs,
 #endif
case 755:   /* darn */
if (!cpu_has_feature(CPU_FTR_ARCH_300))
-   return -1;
+   goto unknown_opcode;
switch (ra & 0x3) {
case 0:
/* 32-bit conditioned */
@@ -1911,14 +1915,14 @@ int analyse_instr(struct instruction_op *op, const 
struct pt_regs *regs,
 #ifdef __powerpc64__
case 777:   /* modsd */
if (!cpu_has_feature(CPU_FTR_ARCH_300))
-   return -1;
+   goto unknown_opcode;
op->val = (long int) regs->gpr[ra] %
(long int) regs->gpr[rb];
goto compute_done;
 #endif
case 779:   /* modsw */
if (!cpu_has_feature(CPU_FTR_ARCH_300))
-   return -1;
+   goto unknown_opcode;
op->val = (int) regs->gpr[ra] %
(int) regs->gpr[rb];
goto compute_done;
@@ -1995,14 +1999,14 @@ int analyse_instr(struct instruction_op *op, const 
struct pt_regs *regs,
 #endif
 

RE: [PATCH v10 11/12] mm/vmalloc: Hugepage vmalloc mappings

2021-01-25 Thread David Laight
From: Christophe Leroy
> Sent: 25 January 2021 09:15
> 
> Le 24/01/2021 à 09:22, Nicholas Piggin a écrit :
> > Support huge page vmalloc mappings. Config option HAVE_ARCH_HUGE_VMALLOC
> > enables support on architectures that define HAVE_ARCH_HUGE_VMAP and
> > supports PMD sized vmap mappings.
> >
> > vmalloc will attempt to allocate PMD-sized pages if allocating PMD size
> > or larger, and fall back to small pages if that was unsuccessful.
> >
> > Architectures must ensure that any arch specific vmalloc allocations
> > that require PAGE_SIZE mappings (e.g., module allocations vs strict
> > module rwx) use the VM_NOHUGE flag to inhibit larger mappings.
> >
> > When hugepage vmalloc mappings are enabled in the next patch, this
> > reduces TLB misses by nearly 30x on a `git diff` workload on a 2-node
> > POWER9 (59,800 -> 2,100) and reduces CPU cycles by 0.54%.
> >
> > This can result in more internal fragmentation and memory overhead for a
> > given allocation, an option nohugevmalloc is added to disable at boot.
> >
> > Signed-off-by: Nicholas Piggin 
> > ---
> >   arch/Kconfig|  10 +++
> >   include/linux/vmalloc.h |  18 
> >   mm/page_alloc.c |   5 +-
> >   mm/vmalloc.c| 192 ++--
> >   4 files changed, 177 insertions(+), 48 deletions(-)
> >
> 
> > diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> > index 0377e1d059e5..eef61e0f5170 100644
> > --- a/mm/vmalloc.c
> > +++ b/mm/vmalloc.c
> 
> > @@ -2691,15 +2746,18 @@ EXPORT_SYMBOL_GPL(vmap_pfn);
> >   #endif /* CONFIG_VMAP_PFN */
> >
> >   static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
> > -pgprot_t prot, int node)
> > +pgprot_t prot, unsigned int page_shift,
> > +int node)
> >   {
> > const gfp_t nested_gfp = (gfp_mask & GFP_RECLAIM_MASK) | __GFP_ZERO;
> > -   unsigned int nr_pages = get_vm_area_size(area) >> PAGE_SHIFT;
> > -   unsigned long array_size;
> > -   unsigned int i;
> > +   unsigned int page_order = page_shift - PAGE_SHIFT;
> > +   unsigned long addr = (unsigned long)area->addr;
> > +   unsigned long size = get_vm_area_size(area);
> > +   unsigned int nr_small_pages = size >> PAGE_SHIFT;
> > struct page **pages;
> > +   unsigned int i;
> >
> > -   array_size = (unsigned long)nr_pages * sizeof(struct page *);
> > +   array_size = (unsigned long)nr_small_pages * sizeof(struct page *);
> 
> array_size() is a function in include/linux/overflow.h
> 
> For some reason, it breaks the build with your series.

I can't see the replacement definition for array_size.
The old local variable is deleted.

David



Re: [PATCH v10 11/12] mm/vmalloc: Hugepage vmalloc mappings

2021-01-25 Thread Christophe Leroy




Le 25/01/2021 à 12:37, Nicholas Piggin a écrit :

Excerpts from Christophe Leroy's message of January 25, 2021 7:14 pm:



Le 24/01/2021 à 09:22, Nicholas Piggin a écrit :

Support huge page vmalloc mappings. Config option HAVE_ARCH_HUGE_VMALLOC
enables support on architectures that define HAVE_ARCH_HUGE_VMAP and
supports PMD sized vmap mappings.

vmalloc will attempt to allocate PMD-sized pages if allocating PMD size
or larger, and fall back to small pages if that was unsuccessful.

Architectures must ensure that any arch specific vmalloc allocations
that require PAGE_SIZE mappings (e.g., module allocations vs strict
module rwx) use the VM_NOHUGE flag to inhibit larger mappings.

When hugepage vmalloc mappings are enabled in the next patch, this
reduces TLB misses by nearly 30x on a `git diff` workload on a 2-node
POWER9 (59,800 -> 2,100) and reduces CPU cycles by 0.54%.

This can result in more internal fragmentation and memory overhead for a
given allocation, an option nohugevmalloc is added to disable at boot.

Signed-off-by: Nicholas Piggin 
---
   arch/Kconfig|  10 +++
   include/linux/vmalloc.h |  18 
   mm/page_alloc.c |   5 +-
   mm/vmalloc.c| 192 ++--
   4 files changed, 177 insertions(+), 48 deletions(-)




diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 0377e1d059e5..eef61e0f5170 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c



@@ -2691,15 +2746,18 @@ EXPORT_SYMBOL_GPL(vmap_pfn);
   #endif /* CONFIG_VMAP_PFN */
   
   static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,

-pgprot_t prot, int node)
+pgprot_t prot, unsigned int page_shift,
+int node)
   {
const gfp_t nested_gfp = (gfp_mask & GFP_RECLAIM_MASK) | __GFP_ZERO;
-   unsigned int nr_pages = get_vm_area_size(area) >> PAGE_SHIFT;
-   unsigned long array_size;
-   unsigned int i;
+   unsigned int page_order = page_shift - PAGE_SHIFT;
+   unsigned long addr = (unsigned long)area->addr;
+   unsigned long size = get_vm_area_size(area);
+   unsigned int nr_small_pages = size >> PAGE_SHIFT;
struct page **pages;
+   unsigned int i;
   
-	array_size = (unsigned long)nr_pages * sizeof(struct page *);

+   array_size = (unsigned long)nr_small_pages * sizeof(struct page *);


array_size() is a function in include/linux/overflow.h

For some reason, it breaks the build with your series.


What config? I haven't seen it.



Several configs I believe. I saw it this morning in 
https://patchwork.ozlabs.org/project/linuxppc-dev/patch/20210124082230.2118861-13-npig...@gmail.com/


Though the reports have all disappeared now.


Re: [PATCH v10 11/12] mm/vmalloc: Hugepage vmalloc mappings

2021-01-25 Thread Nicholas Piggin
Excerpts from Christophe Leroy's message of January 25, 2021 7:14 pm:
> 
> 
> Le 24/01/2021 à 09:22, Nicholas Piggin a écrit :
>> Support huge page vmalloc mappings. Config option HAVE_ARCH_HUGE_VMALLOC
>> enables support on architectures that define HAVE_ARCH_HUGE_VMAP and
>> supports PMD sized vmap mappings.
>> 
>> vmalloc will attempt to allocate PMD-sized pages if allocating PMD size
>> or larger, and fall back to small pages if that was unsuccessful.
>> 
>> Architectures must ensure that any arch specific vmalloc allocations
>> that require PAGE_SIZE mappings (e.g., module allocations vs strict
>> module rwx) use the VM_NOHUGE flag to inhibit larger mappings.
>> 
>> When hugepage vmalloc mappings are enabled in the next patch, this
>> reduces TLB misses by nearly 30x on a `git diff` workload on a 2-node
>> POWER9 (59,800 -> 2,100) and reduces CPU cycles by 0.54%.
>> 
>> This can result in more internal fragmentation and memory overhead for a
>> given allocation, an option nohugevmalloc is added to disable at boot.
>> 
>> Signed-off-by: Nicholas Piggin 
>> ---
>>   arch/Kconfig|  10 +++
>>   include/linux/vmalloc.h |  18 
>>   mm/page_alloc.c |   5 +-
>>   mm/vmalloc.c| 192 ++--
>>   4 files changed, 177 insertions(+), 48 deletions(-)
>> 
> 
>> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
>> index 0377e1d059e5..eef61e0f5170 100644
>> --- a/mm/vmalloc.c
>> +++ b/mm/vmalloc.c
> 
>> @@ -2691,15 +2746,18 @@ EXPORT_SYMBOL_GPL(vmap_pfn);
>>   #endif /* CONFIG_VMAP_PFN */
>>   
>>   static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
>> - pgprot_t prot, int node)
>> + pgprot_t prot, unsigned int page_shift,
>> + int node)
>>   {
>>  const gfp_t nested_gfp = (gfp_mask & GFP_RECLAIM_MASK) | __GFP_ZERO;
>> -unsigned int nr_pages = get_vm_area_size(area) >> PAGE_SHIFT;
>> -unsigned long array_size;
>> -unsigned int i;
>> +unsigned int page_order = page_shift - PAGE_SHIFT;
>> +unsigned long addr = (unsigned long)area->addr;
>> +unsigned long size = get_vm_area_size(area);
>> +unsigned int nr_small_pages = size >> PAGE_SHIFT;
>>  struct page **pages;
>> +unsigned int i;
>>   
>> -array_size = (unsigned long)nr_pages * sizeof(struct page *);
>> +array_size = (unsigned long)nr_small_pages * sizeof(struct page *);
> 
> array_size() is a function in include/linux/overflow.h
> 
> For some reason, it breaks the build with your series.

What config? I haven't seen it.

Thanks,
Nick


Re: [PATCH v10 06/12] powerpc: inline huge vmap supported functions

2021-01-25 Thread Nicholas Piggin
Excerpts from Christophe Leroy's message of January 25, 2021 6:42 pm:
> 
> 
> Le 24/01/2021 à 09:22, Nicholas Piggin a écrit :
>> This allows unsupported levels to be constant folded away, and so
>> p4d_free_pud_page can be removed because it's no longer linked to.
> 
> Ah, ok, you did it here. Why not squash this patch into patch 5 directly?

To reduce arch code movement in the first patch and split up these arch
patches to get separate acks for them.

Maybe overkill for these changes but doesn't hurt I think.

Thanks,
Nick


[PATCH] powerpc: remove unneeded semicolons

2021-01-25 Thread Chengyang Fan
Remove superfluous semicolons after function definitions.

Signed-off-by: Chengyang Fan 
---
 arch/powerpc/include/asm/book3s/32/mmu-hash.h   |  2 +-
 arch/powerpc/include/asm/book3s/64/mmu.h|  2 +-
 arch/powerpc/include/asm/book3s/64/tlbflush-radix.h |  2 +-
 arch/powerpc/include/asm/book3s/64/tlbflush.h   |  2 +-
 arch/powerpc/include/asm/firmware.h |  2 +-
 arch/powerpc/include/asm/kvm_ppc.h  |  6 +++---
 arch/powerpc/include/asm/paca.h |  6 +++---
 arch/powerpc/include/asm/rtas.h |  2 +-
 arch/powerpc/include/asm/setup.h|  6 +++---
 arch/powerpc/include/asm/simple_spinlock.h  |  4 ++--
 arch/powerpc/include/asm/smp.h  |  2 +-
 arch/powerpc/include/asm/xmon.h |  4 ++--
 arch/powerpc/kernel/prom.c  |  2 +-
 arch/powerpc/kernel/setup.h | 12 ++--
 arch/powerpc/platforms/powernv/subcore.h|  2 +-
 arch/powerpc/platforms/pseries/pseries.h|  2 +-
 16 files changed, 29 insertions(+), 29 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/32/mmu-hash.h 
b/arch/powerpc/include/asm/book3s/32/mmu-hash.h
index 685c589e723f..b85f8e114a9c 100644
--- a/arch/powerpc/include/asm/book3s/32/mmu-hash.h
+++ b/arch/powerpc/include/asm/book3s/32/mmu-hash.h
@@ -94,7 +94,7 @@ typedef struct {
 } mm_context_t;
 
 void update_bats(void);
-static inline void cleanup_cpu_mmu_context(void) { };
+static inline void cleanup_cpu_mmu_context(void) { }
 
 /* patch sites */
 extern s32 patch__hash_page_A0, patch__hash_page_A1, patch__hash_page_A2;
diff --git a/arch/powerpc/include/asm/book3s/64/mmu.h 
b/arch/powerpc/include/asm/book3s/64/mmu.h
index 995bbcdd0ef8..eace8c3f7b0a 100644
--- a/arch/powerpc/include/asm/book3s/64/mmu.h
+++ b/arch/powerpc/include/asm/book3s/64/mmu.h
@@ -239,7 +239,7 @@ static inline void setup_initial_memory_limit(phys_addr_t 
first_memblock_base,
 #ifdef CONFIG_PPC_PSERIES
 extern void radix_init_pseries(void);
 #else
-static inline void radix_init_pseries(void) { };
+static inline void radix_init_pseries(void) { }
 #endif
 
 #ifdef CONFIG_HOTPLUG_CPU
diff --git a/arch/powerpc/include/asm/book3s/64/tlbflush-radix.h 
b/arch/powerpc/include/asm/book3s/64/tlbflush-radix.h
index 94439e0cefc9..8b33601cdb9d 100644
--- a/arch/powerpc/include/asm/book3s/64/tlbflush-radix.h
+++ b/arch/powerpc/include/asm/book3s/64/tlbflush-radix.h
@@ -35,7 +35,7 @@ extern void radix__flush_pwc_lpid(unsigned int lpid);
 extern void radix__flush_all_lpid(unsigned int lpid);
 extern void radix__flush_all_lpid_guest(unsigned int lpid);
 #else
-static inline void radix__tlbiel_all(unsigned int action) { WARN_ON(1); };
+static inline void radix__tlbiel_all(unsigned int action) { WARN_ON(1); }
 static inline void radix__flush_tlb_lpid_page(unsigned int lpid,
unsigned long addr,
unsigned long page_size)
diff --git a/arch/powerpc/include/asm/book3s/64/tlbflush.h 
b/arch/powerpc/include/asm/book3s/64/tlbflush.h
index dcb5c3839d2f..215973b4cb26 100644
--- a/arch/powerpc/include/asm/book3s/64/tlbflush.h
+++ b/arch/powerpc/include/asm/book3s/64/tlbflush.h
@@ -31,7 +31,7 @@ static inline void tlbiel_all(void)
hash__tlbiel_all(TLB_INVAL_SCOPE_GLOBAL);
 }
 #else
-static inline void tlbiel_all(void) { BUG(); };
+static inline void tlbiel_all(void) { BUG(); }
 #endif
 
 static inline void tlbiel_all_lpid(bool radix)
diff --git a/arch/powerpc/include/asm/firmware.h 
b/arch/powerpc/include/asm/firmware.h
index aa6a5ef5d483..7604673787d6 100644
--- a/arch/powerpc/include/asm/firmware.h
+++ b/arch/powerpc/include/asm/firmware.h
@@ -137,7 +137,7 @@ extern unsigned int __start___fw_ftr_fixup, 
__stop___fw_ftr_fixup;
 #ifdef CONFIG_PPC_PSERIES
 void pseries_probe_fw_features(void);
 #else
-static inline void pseries_probe_fw_features(void) { };
+static inline void pseries_probe_fw_features(void) { }
 #endif
 
 #endif /* __ASSEMBLY__ */
diff --git a/arch/powerpc/include/asm/kvm_ppc.h 
b/arch/powerpc/include/asm/kvm_ppc.h
index 0a056c64c317..259ba4ce9ad3 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -627,9 +627,9 @@ extern int h_ipi_redirect;
 static inline struct kvmppc_passthru_irqmap *kvmppc_get_passthru_irqmap(
struct kvm *kvm)
{ return NULL; }
-static inline void kvmppc_alloc_host_rm_ops(void) {};
-static inline void kvmppc_free_host_rm_ops(void) {};
-static inline void kvmppc_free_pimap(struct kvm *kvm) {};
+static inline void kvmppc_alloc_host_rm_ops(void) {}
+static inline void kvmppc_free_host_rm_ops(void) {}
+static inline void kvmppc_free_pimap(struct kvm *kvm) {}
 static inline int kvmppc_xics_rm_complete(struct kvm_vcpu *vcpu, u32 hcall)
{ return 0; }
 static inline int kvmppc_xics_enabled(struct kvm_vcpu *vcpu)
diff --git 

Re: [PATCH v4 2/2] powerpc/mce: Remove per cpu variables from MCE handlers

2021-01-25 Thread Christophe Leroy




Le 22/01/2021 à 13:32, Ganesh Goudar a écrit :

Access to per-cpu variables requires translation to be enabled on
pseries machines running in hash MMU mode. Since part of the MCE handler
runs in real mode and part of the MCE handling code is shared between
the pseries and powernv platforms, it becomes difficult to manage
these variables differently on different platforms. So keep
these variables in the paca instead of having them as per-cpu variables,
to avoid complications.

Signed-off-by: Ganesh Goudar 
---
v2: Dynamically allocate memory for machine check event info

v3: Remove check for hash mmu lpar, use memblock_alloc_try_nid
 to allocate memory.

v4: Spliting the patch into two.
---
  arch/powerpc/include/asm/mce.h | 18 +++
  arch/powerpc/include/asm/paca.h|  4 ++
  arch/powerpc/kernel/mce.c  | 79 ++
  arch/powerpc/kernel/setup-common.c |  2 +-
  4 files changed, 70 insertions(+), 33 deletions(-)

diff --git a/arch/powerpc/kernel/setup-common.c 
b/arch/powerpc/kernel/setup-common.c
index 71f38e9248be..17dc451f0e45 100644
--- a/arch/powerpc/kernel/setup-common.c
+++ b/arch/powerpc/kernel/setup-common.c
@@ -916,7 +916,6 @@ void __init setup_arch(char **cmdline_p)
/* On BookE, setup per-core TLB data structures. */
setup_tlb_core_data();
  #endif
-


Is this line removal really required for this patch?


/* Print various info about the machine that has been gathered so far. 
*/
print_system_info();
  
@@ -938,6 +937,7 @@ void __init setup_arch(char **cmdline_p)

exc_lvl_early_init();
emergency_stack_init();
  
+	mce_init();


You have to include mce.h to avoid build failure on PPC32.
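
Presumably something along these lines in setup-common.c (a guess at the
intended fix, not taken from a later revision):

#include <asm/mce.h>	/* for mce_init(), otherwise PPC32 fails to build */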




smp_release_cpus();
  
  	initmem_init();




Re: [PATCH v10 11/12] mm/vmalloc: Hugepage vmalloc mappings

2021-01-25 Thread Christophe Leroy




Le 24/01/2021 à 09:22, Nicholas Piggin a écrit :

Support huge page vmalloc mappings. Config option HAVE_ARCH_HUGE_VMALLOC
enables support on architectures that define HAVE_ARCH_HUGE_VMAP and
supports PMD sized vmap mappings.

vmalloc will attempt to allocate PMD-sized pages if allocating PMD size
or larger, and fall back to small pages if that was unsuccessful.

Architectures must ensure that any arch specific vmalloc allocations
that require PAGE_SIZE mappings (e.g., module allocations vs strict
module rwx) use the VM_NOHUGE flag to inhibit larger mappings.

When hugepage vmalloc mappings are enabled in the next patch, this
reduces TLB misses by nearly 30x on a `git diff` workload on a 2-node
POWER9 (59,800 -> 2,100) and reduces CPU cycles by 0.54%.

This can result in more internal fragmentation and memory overhead for a
given allocation, an option nohugevmalloc is added to disable at boot.

Signed-off-by: Nicholas Piggin 
---
  arch/Kconfig|  10 +++
  include/linux/vmalloc.h |  18 
  mm/page_alloc.c |   5 +-
  mm/vmalloc.c| 192 ++--
  4 files changed, 177 insertions(+), 48 deletions(-)




diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 0377e1d059e5..eef61e0f5170 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c



@@ -2691,15 +2746,18 @@ EXPORT_SYMBOL_GPL(vmap_pfn);
  #endif /* CONFIG_VMAP_PFN */
  
  static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,

-pgprot_t prot, int node)
+pgprot_t prot, unsigned int page_shift,
+int node)
  {
const gfp_t nested_gfp = (gfp_mask & GFP_RECLAIM_MASK) | __GFP_ZERO;
-   unsigned int nr_pages = get_vm_area_size(area) >> PAGE_SHIFT;
-   unsigned long array_size;
-   unsigned int i;
+   unsigned int page_order = page_shift - PAGE_SHIFT;
+   unsigned long addr = (unsigned long)area->addr;
+   unsigned long size = get_vm_area_size(area);
+   unsigned int nr_small_pages = size >> PAGE_SHIFT;
struct page **pages;
+   unsigned int i;
  
-	array_size = (unsigned long)nr_pages * sizeof(struct page *);

+   array_size = (unsigned long)nr_small_pages * sizeof(struct page *);


array_size() is a function in include/linux/overflow.h

For some reason, it breaks the build with your series.



gfp_mask |= __GFP_NOWARN;
if (!(gfp_mask & (GFP_DMA | GFP_DMA32)))
gfp_mask |= __GFP_HIGHMEM;


Re: [PATCH v10 06/12] powerpc: inline huge vmap supported functions

2021-01-25 Thread Christophe Leroy




Le 24/01/2021 à 09:22, Nicholas Piggin a écrit :

This allows unsupported levels to be constant folded away, and so
p4d_free_pud_page can be removed because it's no longer linked to.


Ah, ok, you did it here. Why not squash this patch into patch 5 directly?



Cc: linuxppc-dev@lists.ozlabs.org
Acked-by: Michael Ellerman 
Signed-off-by: Nicholas Piggin 
---
  arch/powerpc/include/asm/vmalloc.h   | 19 ---
  arch/powerpc/mm/book3s64/radix_pgtable.c | 21 -
  2 files changed, 16 insertions(+), 24 deletions(-)

diff --git a/arch/powerpc/include/asm/vmalloc.h 
b/arch/powerpc/include/asm/vmalloc.h
index 105abb73f075..3f0c153befb0 100644
--- a/arch/powerpc/include/asm/vmalloc.h
+++ b/arch/powerpc/include/asm/vmalloc.h
@@ -1,12 +1,25 @@
  #ifndef _ASM_POWERPC_VMALLOC_H
  #define _ASM_POWERPC_VMALLOC_H
  
+#include 

  #include 
  
  #ifdef CONFIG_HAVE_ARCH_HUGE_VMAP

-bool arch_vmap_p4d_supported(pgprot_t prot);
-bool arch_vmap_pud_supported(pgprot_t prot);
-bool arch_vmap_pmd_supported(pgprot_t prot);
+static inline bool arch_vmap_p4d_supported(pgprot_t prot)
+{
+   return false;
+}
+
+static inline bool arch_vmap_pud_supported(pgprot_t prot)
+{
+   /* HPT does not cope with large pages in the vmalloc area */
+   return radix_enabled();
+}
+
+static inline bool arch_vmap_pmd_supported(pgprot_t prot)
+{
+   return radix_enabled();
+}
  #endif
  
  #endif /* _ASM_POWERPC_VMALLOC_H */

diff --git a/arch/powerpc/mm/book3s64/radix_pgtable.c 
b/arch/powerpc/mm/book3s64/radix_pgtable.c
index 743807fc210f..8da62afccee5 100644
--- a/arch/powerpc/mm/book3s64/radix_pgtable.c
+++ b/arch/powerpc/mm/book3s64/radix_pgtable.c
@@ -1082,22 +1082,6 @@ void radix__ptep_modify_prot_commit(struct 
vm_area_struct *vma,
set_pte_at(mm, addr, ptep, pte);
  }
  
-bool arch_vmap_pud_supported(pgprot_t prot)

-{
-   /* HPT does not cope with large pages in the vmalloc area */
-   return radix_enabled();
-}
-
-bool arch_vmap_pmd_supported(pgprot_t prot)
-{
-   return radix_enabled();
-}
-
-int p4d_free_pud_page(p4d_t *p4d, unsigned long addr)
-{
-   return 0;
-}
-
  int pud_set_huge(pud_t *pud, phys_addr_t addr, pgprot_t prot)
  {
pte_t *ptep = (pte_t *)pud;
@@ -1181,8 +1165,3 @@ int pmd_free_pte_page(pmd_t *pmd, unsigned long addr)
  
  	return 1;

  }
-
-bool arch_vmap_p4d_supported(pgprot_t prot)
-{
-   return false;
-}



Re: [PATCH v10 05/12] mm: HUGE_VMAP arch support cleanup

2021-01-25 Thread Christophe Leroy




Le 24/01/2021 à 09:22, Nicholas Piggin a écrit :

This changes the awkward approach where architectures provide init
functions to determine which levels they can provide large mappings for,
to one where the arch is queried for each call.

This removes code and indirection, and allows constant-folding of dead
code for unsupported levels.


It looks like this is only the case when CONFIG_HAVE_ARCH_HUGE_VMAP is not
defined.

When it is defined, for example on powerpc, you define arch_vmap_p4d_supported() as a regular
function in arch/powerpc/mm/book3s64/radix_pgtable.c, so although it always returns false, the
dead code won't be constant-folded away.
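
To illustrate the point, a sketch using arch_vmap_p4d_supported() (the two
alternatives below would of course not coexist in one build):

/* Alternative 1: out-of-line, as powerpc has it at this point in the
 * series. The body is invisible at the call site, so the caller keeps
 * the test and the code behind it, even though it always returns false. */
bool arch_vmap_p4d_supported(pgprot_t prot);	/* body in radix_pgtable.c */

/* Alternative 2: static inline in the header. The constant result is
 * visible, so "if (arch_vmap_p4d_supported(prot)) { ... }" folds away. */
static inline bool arch_vmap_p4d_supported(pgprot_t prot)
{
	return false;
}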




This also adds a prot argument to the arch query. This is unused
currently but could help with some architectures (e.g., some powerpc
processors can't map uncacheable memory with large pages).

Cc: linuxppc-dev@lists.ozlabs.org
Cc: Catalin Marinas 
Cc: Will Deacon 
Cc: linux-arm-ker...@lists.infradead.org
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Borislav Petkov 
Cc: x...@kernel.org
Cc: "H. Peter Anvin" 
Acked-by: Catalin Marinas  [arm64]
Signed-off-by: Nicholas Piggin 
---
  arch/arm64/include/asm/vmalloc.h |  8 +++
  arch/arm64/mm/mmu.c  | 10 +--
  arch/powerpc/include/asm/vmalloc.h   |  8 +++
  arch/powerpc/mm/book3s64/radix_pgtable.c |  8 +--
  arch/x86/include/asm/vmalloc.h   |  7 ++
  arch/x86/mm/ioremap.c| 12 ++--
  include/linux/io.h   |  9 ---
  include/linux/vmalloc.h  |  6 ++
  init/main.c  |  1 -
  mm/ioremap.c | 88 +---
  10 files changed, 79 insertions(+), 78 deletions(-)



Christophe


Re: [PATCH v10 05/12] mm: HUGE_VMAP arch support cleanup

2021-01-25 Thread Christophe Leroy




Le 24/01/2021 à 12:40, Christoph Hellwig a écrit :

diff --git a/arch/arm64/include/asm/vmalloc.h b/arch/arm64/include/asm/vmalloc.h
index 2ca708ab9b20..597b40405319 100644
--- a/arch/arm64/include/asm/vmalloc.h
+++ b/arch/arm64/include/asm/vmalloc.h
@@ -1,4 +1,12 @@
  #ifndef _ASM_ARM64_VMALLOC_H
  #define _ASM_ARM64_VMALLOC_H
  
+#include 

+
+#ifdef CONFIG_HAVE_ARCH_HUGE_VMAP
+bool arch_vmap_p4d_supported(pgprot_t prot);
+bool arch_vmap_pud_supported(pgprot_t prot);
+bool arch_vmap_pmd_supported(pgprot_t prot);
+#endif


Shouldn't these be inlines or macros?  Also it would be useful
if the architectures did not have to override all functions
but just those that they actually implement?

Also lots of > 80 char lines in the patch.



Since 
https://github.com/linuxppc/linux/commit/bdc48fa11e46f867ea4d75fa59ee87a7f48be144
this 80 char limit is not strongly enforced anymore.

Although 80 is still the preferred limit, code is often more readable with a slightly longer single
line than with the line split.


Christophe


[PATCH] KVM: PPC: Book3S: Assign boolean values to a bool variable

2021-01-25 Thread Jiapeng Zhong
Fix the following coccicheck warnings:

./arch/powerpc/kvm/book3s_hv_rm_xics.c:381:3-15: WARNING: Assignment of
0/1 to bool variable.

Reported-by: Abaci Robot 
Signed-off-by: Jiapeng Zhong 
---
 arch/powerpc/kvm/book3s_hv_rm_xics.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kvm/book3s_hv_rm_xics.c 
b/arch/powerpc/kvm/book3s_hv_rm_xics.c
index c2c9c73..68e509d 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_xics.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_xics.c
@@ -378,7 +378,7 @@ static void icp_rm_deliver_irq(struct kvmppc_xics *xics, 
struct kvmppc_icp *icp,
arch_spin_unlock(&ics->lock);
icp->n_reject++;
new_irq = reject;
-   check_resend = 0;
+   check_resend = false;
goto again;
}
} else {
-- 
1.8.3.1