Re: [PATCH v11 12/13] mm/vmalloc: Hugepage vmalloc mappings
On 2021/1/26 12:45, Nicholas Piggin wrote: > Support huge page vmalloc mappings. Config option HAVE_ARCH_HUGE_VMALLOC > enables support on architectures that define HAVE_ARCH_HUGE_VMAP and > support PMD-sized vmap mappings. > > vmalloc will attempt to allocate PMD-sized pages if allocating PMD size > or larger, and fall back to small pages if that was unsuccessful. > > Architectures must ensure that any arch specific vmalloc allocations > that require PAGE_SIZE mappings (e.g., module allocations vs strict > module rwx) use the VM_NO_HUGE_VMAP flag to inhibit larger mappings. > > When hugepage vmalloc mappings are enabled in the next patch, this > reduces TLB misses by nearly 30x on a `git diff` workload on a 2-node > POWER9 (59,800 -> 2,100) and reduces CPU cycles by 0.54%. > > This can result in more internal fragmentation and memory overhead for a > given allocation; a boot option, nohugevmalloc, is added to disable it. > > Signed-off-by: Nicholas Piggin > > [...] > > @@ -59,6 +60,9 @@ struct vm_struct { > unsigned long size; > unsigned long flags; > struct page **pages; > +#ifdef CONFIG_HAVE_ARCH_HUGE_VMALLOC > + unsigned int page_order; > +#endif > unsigned int nr_pages; > phys_addr_t phys_addr; > const void *caller; Hi Nicholas: A suggestion :) The page_order field is only used to mark a vm area as mapped with huge pages, and it is only meaningful when the size is at least PMD_SIZE, so could we use the vm flags instead, e.g. define a new flag named VM_HUGEPAGE? That would not change the layout of struct vm_struct, and it would make it easier for me to backport this series to our own branches (based on the LTS version).
Tianhong
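For illustration, a minimal sketch of the flag-based variant suggested above (VM_HUGEPAGE, its bit value, and this function body are hypothetical; the series as posted stores the page order instead):

/* Hypothetical flag, not part of the posted series. */
#define VM_HUGEPAGE	0x0400	/* area is backed by huge pages */

static inline bool is_vm_area_hugepages(const void *addr)
{
	struct vm_struct *area = find_vm_area(addr);

	return area && (area->flags & VM_HUGEPAGE);
}

One trade-off: vfree() also needs the page order to free the backing pages, so a boolean flag would have to recompute it from the area size, which works here precisely because the order is only ever non-zero for areas of at least PMD_SIZE.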
Re: [PATCH v11 04/13] mm/ioremap: rename ioremap_*_range to vmap_*_range
Looks good, Reviewed-by: Christoph Hellwig
Re: [PATCH 3/5] powerpc/xive: remove unnecessary unmap_kernel_range
On Tue, Jan 26, 2021 at 02:54:02PM +1000, Nicholas Piggin wrote: > iounmap will remove ptes. Looks good, Reviewed-by: Christoph Hellwig
Re: [PATCH v11 05/13] mm: HUGE_VMAP arch support cleanup
Reviewed-by: Ding Tianhong On 2021/1/26 12:45, Nicholas Piggin wrote: > This changes the awkward approach where architectures provide init > functions to determine which levels they can provide large mappings for, > to one where the arch is queried for each call. [...]
[PATCH 3/5] powerpc/xive: remove unnecessary unmap_kernel_range
iounmap will remove ptes. Cc: "Cédric Le Goater" Cc: linuxppc-dev@lists.ozlabs.org Signed-off-by: Nicholas Piggin --- arch/powerpc/sysdev/xive/common.c | 4 1 file changed, 4 deletions(-) diff --git a/arch/powerpc/sysdev/xive/common.c b/arch/powerpc/sysdev/xive/common.c index 595310e056f4..d6c2069cc828 100644 --- a/arch/powerpc/sysdev/xive/common.c +++ b/arch/powerpc/sysdev/xive/common.c @@ -959,16 +959,12 @@ EXPORT_SYMBOL_GPL(is_xive_irq); void xive_cleanup_irq_data(struct xive_irq_data *xd) { if (xd->eoi_mmio) { - unmap_kernel_range((unsigned long)xd->eoi_mmio, - 1u << xd->esb_shift); iounmap(xd->eoi_mmio); if (xd->eoi_mmio == xd->trig_mmio) xd->trig_mmio = NULL; xd->eoi_mmio = NULL; } if (xd->trig_mmio) { - unmap_kernel_range((unsigned long)xd->trig_mmio, - 1u << xd->esb_shift); iounmap(xd->trig_mmio); xd->trig_mmio = NULL; } -- 2.23.0
[PATCH v11 13/13] powerpc/64s/radix: Enable huge vmalloc mappings
Cc: linuxppc-dev@lists.ozlabs.org Signed-off-by: Nicholas Piggin --- .../admin-guide/kernel-parameters.txt | 2 ++ arch/powerpc/Kconfig | 1 + arch/powerpc/kernel/module.c | 21 +++ 3 files changed, 20 insertions(+), 4 deletions(-) diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index a10b545c2070..d62df53e5200 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -3225,6 +3225,8 @@ nohugeiomap [KNL,X86,PPC,ARM64] Disable kernel huge I/O mappings. + nohugevmalloc [PPC] Disable kernel huge vmalloc mappings. + nosmt [KNL,S390] Disable symmetric multithreading (SMT). Equivalent to smt=1. diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig index 107bb4319e0e..781da6829ab7 100644 --- a/arch/powerpc/Kconfig +++ b/arch/powerpc/Kconfig @@ -181,6 +181,7 @@ config PPC select GENERIC_GETTIMEOFDAY select HAVE_ARCH_AUDITSYSCALL select HAVE_ARCH_HUGE_VMAP if PPC_BOOK3S_64 && PPC_RADIX_MMU + select HAVE_ARCH_HUGE_VMALLOC if HAVE_ARCH_HUGE_VMAP select HAVE_ARCH_JUMP_LABEL select HAVE_ARCH_KASAN if PPC32 && PPC_PAGE_SHIFT <= 14 select HAVE_ARCH_KASAN_VMALLOC if PPC32 && PPC_PAGE_SHIFT <= 14 diff --git a/arch/powerpc/kernel/module.c b/arch/powerpc/kernel/module.c index a211b0253cdb..07026335d24d 100644 --- a/arch/powerpc/kernel/module.c +++ b/arch/powerpc/kernel/module.c @@ -87,13 +87,26 @@ int module_finalize(const Elf_Ehdr *hdr, return 0; } -#ifdef MODULES_VADDR void *module_alloc(unsigned long size) { + unsigned long start = VMALLOC_START; + unsigned long end = VMALLOC_END; + +#ifdef MODULES_VADDR BUILD_BUG_ON(TASK_SIZE > MODULES_VADDR); + start = MODULES_VADDR; + end = MODULES_END; +#endif + + /* +* Don't do huge page allocations for modules yet until more testing +* is done. STRICT_MODULE_RWX may require extra work to support this +* too. +*/ - return __vmalloc_node_range(size, 1, MODULES_VADDR, MODULES_END, GFP_KERNEL, - PAGE_KERNEL_EXEC, VM_FLUSH_RESET_PERMS, NUMA_NO_NODE, + return __vmalloc_node_range(size, 1, start, end, GFP_KERNEL, + PAGE_KERNEL_EXEC, + VM_NO_HUGE_VMAP | VM_FLUSH_RESET_PERMS, + NUMA_NO_NODE, __builtin_return_address(0)); } -#endif -- 2.23.0
[PATCH v11 12/13] mm/vmalloc: Hugepage vmalloc mappings
Support huge page vmalloc mappings. Config option HAVE_ARCH_HUGE_VMALLOC enables support on architectures that define HAVE_ARCH_HUGE_VMAP and support PMD-sized vmap mappings. vmalloc will attempt to allocate PMD-sized pages if allocating PMD size or larger, and fall back to small pages if that was unsuccessful. Architectures must ensure that any arch specific vmalloc allocations that require PAGE_SIZE mappings (e.g., module allocations vs strict module rwx) use the VM_NO_HUGE_VMAP flag to inhibit larger mappings. When hugepage vmalloc mappings are enabled in the next patch, this reduces TLB misses by nearly 30x on a `git diff` workload on a 2-node POWER9 (59,800 -> 2,100) and reduces CPU cycles by 0.54%. This can result in more internal fragmentation and memory overhead for a given allocation; a boot option, nohugevmalloc, is added to disable it. Signed-off-by: Nicholas Piggin --- arch/Kconfig| 11 ++ include/linux/vmalloc.h | 21 mm/page_alloc.c | 5 +- mm/vmalloc.c| 215 +++- 4 files changed, 205 insertions(+), 47 deletions(-) diff --git a/arch/Kconfig b/arch/Kconfig index 24862d15f3a3..eef170e0c9b8 100644 --- a/arch/Kconfig +++ b/arch/Kconfig @@ -724,6 +724,17 @@ config HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD config HAVE_ARCH_HUGE_VMAP bool +# +# Archs that select this would be capable of PMD-sized vmaps (i.e., +# arch_vmap_pmd_supported() returns true), and they must make no assumptions +# that vmalloc memory is mapped with PAGE_SIZE ptes. The VM_NO_HUGE_VMAP flag +# can be used to prohibit arch-specific allocations from using hugepages to +# help with this (e.g., modules may require it). +# +config HAVE_ARCH_HUGE_VMALLOC + depends on HAVE_ARCH_HUGE_VMAP + bool + config ARCH_WANT_HUGE_PMD_SHARE bool diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h index 99ea72d547dc..93270adf5db5 100644 --- a/include/linux/vmalloc.h +++ b/include/linux/vmalloc.h @@ -25,6 +25,7 @@ struct notifier_block; /* in notifier.h */ #define VM_NO_GUARD 0x0040 /* don't add guard page */ #define VM_KASAN 0x0080 /* has allocated kasan shadow memory */ #define VM_MAP_PUT_PAGES 0x0100 /* put pages and free array in vfree */ +#define VM_NO_HUGE_VMAP 0x0200 /* force PAGE_SIZE pte mapping */ /* * VM_KASAN is used slighly differently depending on CONFIG_KASAN_VMALLOC. @@ -59,6 +60,9 @@ struct vm_struct { unsigned long size; unsigned long flags; struct page **pages; +#ifdef CONFIG_HAVE_ARCH_HUGE_VMALLOC + unsigned int page_order; +#endif unsigned int nr_pages; phys_addr_t phys_addr; const void *caller; @@ -193,6 +197,22 @@ void free_vm_area(struct vm_struct *area); extern struct vm_struct *remove_vm_area(const void *addr); extern struct vm_struct *find_vm_area(const void *addr); +static inline bool is_vm_area_hugepages(const void *addr) +{ + /* +* This may not 100% tell if the area is mapped with > PAGE_SIZE +* page table entries, if for some reason the architecture indicates +* larger sizes are available but decides not to use them, nothing +* prevents that. This only indicates the size of the physical page +* allocated in the vmalloc layer.
+*/ +#ifdef CONFIG_HAVE_ARCH_HUGE_VMALLOC + return find_vm_area(addr)->page_order > 0; +#else + return false; +#endif +} + #ifdef CONFIG_MMU int vmap_range(unsigned long addr, unsigned long end, phys_addr_t phys_addr, pgprot_t prot, @@ -210,6 +230,7 @@ static inline void set_vm_flush_reset_perms(void *addr) if (vm) vm->flags |= VM_FLUSH_RESET_PERMS; } + #else static inline int map_kernel_range_noflush(unsigned long start, unsigned long size, diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 027f6481ba59..b7a9661fa232 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -72,6 +72,7 @@ #include #include #include +#include #include #include @@ -8238,6 +8239,7 @@ void *__init alloc_large_system_hash(const char *tablename, void *table = NULL; gfp_t gfp_flags; bool virt; + bool huge; /* allow the kernel cmdline to have a say */ if (!numentries) { @@ -8305,6 +8307,7 @@ void *__init alloc_large_system_hash(const char *tablename, } else if (get_order(size) >= MAX_ORDER || hashdist) { table = __vmalloc(size, gfp_flags); virt = true; + huge = is_vm_area_hugepages(table); } else { /* * If bucketsize is not a power-of-two, we may free @@ -8321,7 +8324,7 @@ void *__init alloc_large_system_hash(const char *tablename,
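Since the mm/vmalloc.c hunks are truncated above, here is a hedged sketch of the allocation strategy the commit message describes: try a PMD-order allocation first, then fall back to small pages. try_vmalloc_with_order() is a hypothetical stand-in for the real __vmalloc_node_range() internals.

/* Hypothetical helper: allocate and map @size bytes backed by pages of
 * order @order; returns NULL on allocation or mapping failure. */
void *try_vmalloc_with_order(unsigned long size, gfp_t gfp, unsigned int order);

static void *vmalloc_maybe_huge(unsigned long size, gfp_t gfp)
{
	void *addr;

	if (IS_ENABLED(CONFIG_HAVE_ARCH_HUGE_VMALLOC) && size >= PMD_SIZE) {
		addr = try_vmalloc_with_order(size, gfp,
					      PMD_SHIFT - PAGE_SHIFT);
		if (addr)
			return addr;
		/* Huge attempt failed: fall back to small pages. */
	}
	return try_vmalloc_with_order(size, gfp, 0);
}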
[PATCH v11 11/13] mm/vmalloc: add vmap_range_noflush variant
As a side-effect, the order of flush_cache_vmap() and arch_sync_kernel_mappings() calls is switched, but that now matches the other callers in this file. Reviewed-by: Christoph Hellwig Signed-off-by: Nicholas Piggin --- mm/vmalloc.c | 16 +++++++++++++--- 1 file changed, 13 insertions(+), 3 deletions(-) diff --git a/mm/vmalloc.c b/mm/vmalloc.c index f043386bb51d..47ab4338cfff 100644 --- a/mm/vmalloc.c +++ b/mm/vmalloc.c @@ -240,7 +240,7 @@ static int vmap_p4d_range(pgd_t *pgd, unsigned long addr, unsigned long end, return 0; } -int vmap_range(unsigned long addr, unsigned long end, +static int vmap_range_noflush(unsigned long addr, unsigned long end, phys_addr_t phys_addr, pgprot_t prot, unsigned int max_page_shift) { @@ -263,14 +263,24 @@ int vmap_range(unsigned long addr, unsigned long end, break; } while (pgd++, phys_addr += (next - addr), addr = next, addr != end); - flush_cache_vmap(start, end); - if (mask & ARCH_PAGE_TABLE_SYNC_MASK) arch_sync_kernel_mappings(start, end); return err; } +int vmap_range(unsigned long addr, unsigned long end, + phys_addr_t phys_addr, pgprot_t prot, + unsigned int max_page_shift) +{ + int err; + + err = vmap_range_noflush(addr, end, phys_addr, prot, max_page_shift); + flush_cache_vmap(addr, end); + + return err; +} + static void vunmap_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end, pgtbl_mod_mask *mask) { -- 2.23.0
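The value of the split is that a caller inside mm/vmalloc.c (where vmap_range_noflush() is static) can now map several ranges and flush once; a hypothetical illustration, not part of the patch:

/* Map two adjacent ranges with a single cache flush over the span. */
static int map_two_ranges(unsigned long addr, unsigned long mid,
			  unsigned long end, phys_addr_t pa0, phys_addr_t pa1,
			  pgprot_t prot, unsigned int max_page_shift)
{
	int err;

	err = vmap_range_noflush(addr, mid, pa0, prot, max_page_shift);
	if (!err)
		err = vmap_range_noflush(mid, end, pa1, prot, max_page_shift);
	flush_cache_vmap(addr, end);	/* one flush instead of two */
	return err;
}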
[PATCH v11 10/13] mm: Move vmap_range from mm/ioremap.c to mm/vmalloc.c
This is a generic kernel virtual memory mapper, not specific to ioremap. Code is unchanged other than making vmap_range non-static. Reviewed-by: Christoph Hellwig Signed-off-by: Nicholas Piggin --- include/linux/vmalloc.h | 3 + mm/ioremap.c| 203 mm/vmalloc.c| 202 +++ 3 files changed, 205 insertions(+), 203 deletions(-) diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h index 9f7b8b00101b..99ea72d547dc 100644 --- a/include/linux/vmalloc.h +++ b/include/linux/vmalloc.h @@ -194,6 +194,9 @@ extern struct vm_struct *remove_vm_area(const void *addr); extern struct vm_struct *find_vm_area(const void *addr); #ifdef CONFIG_MMU +int vmap_range(unsigned long addr, unsigned long end, + phys_addr_t phys_addr, pgprot_t prot, + unsigned int max_page_shift); extern int map_kernel_range_noflush(unsigned long start, unsigned long size, pgprot_t prot, struct page **pages); int map_kernel_range(unsigned long start, unsigned long size, pgprot_t prot, diff --git a/mm/ioremap.c b/mm/ioremap.c index 3264d0203785..d1dcc7e744ac 100644 --- a/mm/ioremap.c +++ b/mm/ioremap.c @@ -28,209 +28,6 @@ early_param("nohugeiomap", set_nohugeiomap); static const bool iomap_max_page_shift = PAGE_SHIFT; #endif /* CONFIG_HAVE_ARCH_HUGE_VMAP */ -static int vmap_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end, - phys_addr_t phys_addr, pgprot_t prot, - pgtbl_mod_mask *mask) -{ - pte_t *pte; - u64 pfn; - - pfn = phys_addr >> PAGE_SHIFT; - pte = pte_alloc_kernel_track(pmd, addr, mask); - if (!pte) - return -ENOMEM; - do { - BUG_ON(!pte_none(*pte)); - set_pte_at(_mm, addr, pte, pfn_pte(pfn, prot)); - pfn++; - } while (pte++, addr += PAGE_SIZE, addr != end); - *mask |= PGTBL_PTE_MODIFIED; - return 0; -} - -static int vmap_try_huge_pmd(pmd_t *pmd, unsigned long addr, unsigned long end, - phys_addr_t phys_addr, pgprot_t prot, - unsigned int max_page_shift) -{ - if (max_page_shift < PMD_SHIFT) - return 0; - - if (!arch_vmap_pmd_supported(prot)) - return 0; - - if ((end - addr) != PMD_SIZE) - return 0; - - if (!IS_ALIGNED(addr, PMD_SIZE)) - return 0; - - if (!IS_ALIGNED(phys_addr, PMD_SIZE)) - return 0; - - if (pmd_present(*pmd) && !pmd_free_pte_page(pmd, addr)) - return 0; - - return pmd_set_huge(pmd, phys_addr, prot); -} - -static int vmap_pmd_range(pud_t *pud, unsigned long addr, unsigned long end, - phys_addr_t phys_addr, pgprot_t prot, - unsigned int max_page_shift, pgtbl_mod_mask *mask) -{ - pmd_t *pmd; - unsigned long next; - - pmd = pmd_alloc_track(_mm, pud, addr, mask); - if (!pmd) - return -ENOMEM; - do { - next = pmd_addr_end(addr, end); - - if (vmap_try_huge_pmd(pmd, addr, next, phys_addr, prot, - max_page_shift)) { - *mask |= PGTBL_PMD_MODIFIED; - continue; - } - - if (vmap_pte_range(pmd, addr, next, phys_addr, prot, mask)) - return -ENOMEM; - } while (pmd++, phys_addr += (next - addr), addr = next, addr != end); - return 0; -} - -static int vmap_try_huge_pud(pud_t *pud, unsigned long addr, unsigned long end, - phys_addr_t phys_addr, pgprot_t prot, - unsigned int max_page_shift) -{ - if (max_page_shift < PUD_SHIFT) - return 0; - - if (!arch_vmap_pud_supported(prot)) - return 0; - - if ((end - addr) != PUD_SIZE) - return 0; - - if (!IS_ALIGNED(addr, PUD_SIZE)) - return 0; - - if (!IS_ALIGNED(phys_addr, PUD_SIZE)) - return 0; - - if (pud_present(*pud) && !pud_free_pmd_page(pud, addr)) - return 0; - - return pud_set_huge(pud, phys_addr, prot); -} - -static int vmap_pud_range(p4d_t *p4d, unsigned long addr, unsigned long end, - phys_addr_t phys_addr, pgprot_t prot, - unsigned int max_page_shift, pgtbl_mod_mask 
*mask) -{ - pud_t *pud; - unsigned long next; - - pud = pud_alloc_track(_mm, p4d, addr, mask); - if (!pud) - return -ENOMEM; - do { - next = pud_addr_end(addr, end); - - if (vmap_try_huge_pud(pud, addr, next, phys_addr, prot, - max_page_shift)) { - *mask |= PGTBL_PUD_MODIFIED; - continue;
[PATCH v11 09/13] mm/vmalloc: provide fallback arch huge vmap support functions
If an architecture doesn't support a particular page table level as a huge vmap page size then allow it to skip defining the support query function. Suggested-by: Christoph Hellwig Signed-off-by: Nicholas Piggin --- arch/arm64/include/asm/vmalloc.h | 7 +++ arch/powerpc/include/asm/vmalloc.h | 7 +++ arch/x86/include/asm/vmalloc.h | 13 + include/linux/vmalloc.h| 24 4 files changed, 31 insertions(+), 20 deletions(-) diff --git a/arch/arm64/include/asm/vmalloc.h b/arch/arm64/include/asm/vmalloc.h index fc9a12d6cc1a..7a22aeea9bb5 100644 --- a/arch/arm64/include/asm/vmalloc.h +++ b/arch/arm64/include/asm/vmalloc.h @@ -4,11 +4,8 @@ #include #ifdef CONFIG_HAVE_ARCH_HUGE_VMAP -static inline bool arch_vmap_p4d_supported(pgprot_t prot) -{ - return false; -} +#define arch_vmap_pud_supported arch_vmap_pud_supported static inline bool arch_vmap_pud_supported(pgprot_t prot) { /* @@ -19,11 +16,13 @@ static inline bool arch_vmap_pud_supported(pgprot_t prot) !IS_ENABLED(CONFIG_PTDUMP_DEBUGFS); } +#define arch_vmap_pmd_supported arch_vmap_pmd_supported static inline bool arch_vmap_pmd_supported(pgprot_t prot) { /* See arch_vmap_pud_supported() */ return !IS_ENABLED(CONFIG_PTDUMP_DEBUGFS); } + #endif #endif /* _ASM_ARM64_VMALLOC_H */ diff --git a/arch/powerpc/include/asm/vmalloc.h b/arch/powerpc/include/asm/vmalloc.h index 3f0c153befb0..4c69ece52a31 100644 --- a/arch/powerpc/include/asm/vmalloc.h +++ b/arch/powerpc/include/asm/vmalloc.h @@ -5,21 +5,20 @@ #include #ifdef CONFIG_HAVE_ARCH_HUGE_VMAP -static inline bool arch_vmap_p4d_supported(pgprot_t prot) -{ - return false; -} +#define arch_vmap_pud_supported arch_vmap_pud_supported static inline bool arch_vmap_pud_supported(pgprot_t prot) { /* HPT does not cope with large pages in the vmalloc area */ return radix_enabled(); } +#define arch_vmap_pmd_supported arch_vmap_pmd_supported static inline bool arch_vmap_pmd_supported(pgprot_t prot) { return radix_enabled(); } + #endif #endif /* _ASM_POWERPC_VMALLOC_H */ diff --git a/arch/x86/include/asm/vmalloc.h b/arch/x86/include/asm/vmalloc.h index e714b00fc0ca..49ce331f3ac6 100644 --- a/arch/x86/include/asm/vmalloc.h +++ b/arch/x86/include/asm/vmalloc.h @@ -6,24 +6,21 @@ #include #ifdef CONFIG_HAVE_ARCH_HUGE_VMAP -static inline bool arch_vmap_p4d_supported(pgprot_t prot) -{ - return false; -} +#ifdef CONFIG_X86_64 +#define arch_vmap_pud_supported arch_vmap_pud_supported static inline bool arch_vmap_pud_supported(pgprot_t prot) { -#ifdef CONFIG_X86_64 return boot_cpu_has(X86_FEATURE_GBPAGES); -#else - return false; -#endif } +#endif +#define arch_vmap_pmd_supported arch_vmap_pmd_supported static inline bool arch_vmap_pmd_supported(pgprot_t prot) { return boot_cpu_has(X86_FEATURE_PSE); } + #endif #endif /* _ASM_X86_VMALLOC_H */ diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h index 00bd62bd701e..9f7b8b00101b 100644 --- a/include/linux/vmalloc.h +++ b/include/linux/vmalloc.h @@ -83,10 +83,26 @@ struct vmap_area { }; }; -#ifndef CONFIG_HAVE_ARCH_HUGE_VMAP -static inline bool arch_vmap_p4d_supported(pgprot_t prot) { return false; } -static inline bool arch_vmap_pud_supported(pgprot_t prot) { return false; } -static inline bool arch_vmap_pmd_supported(pgprot_t prot) { return false; } +/* archs that select HAVE_ARCH_HUGE_VMAP should override one or more of these */ +#ifndef arch_vmap_p4d_supported +static inline bool arch_vmap_p4d_supported(pgprot_t prot) +{ + return false; +} +#endif + +#ifndef arch_vmap_pud_supported +static inline bool arch_vmap_pud_supported(pgprot_t prot) +{ + return false; +} +#endif + 
+#ifndef arch_vmap_pmd_supported +static inline bool arch_vmap_pmd_supported(pgprot_t prot) +{ + return false; +} #endif /* -- 2.23.0
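To make the opt-in protocol explicit: a hypothetical architecture enables one level in its asm/vmalloc.h by defining the macro, which suppresses the generic fallback, and supplying its own inline. my_arch_has_huge_pages() is a made-up predicate.

/* In a hypothetical arch's asm/vmalloc.h: */
#ifdef CONFIG_HAVE_ARCH_HUGE_VMAP
#define arch_vmap_pmd_supported arch_vmap_pmd_supported
static inline bool arch_vmap_pmd_supported(pgprot_t prot)
{
	return my_arch_has_huge_pages();	/* hypothetical predicate */
}
#endif
/* The p4d and pud hooks are simply not defined, so the generic
 * "return false" fallbacks in linux/vmalloc.h apply. */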
[PATCH v11 08/13] x86: inline huge vmap supported functions
This allows unsupported levels to be constant folded away, and so p4d_free_pud_page can be removed because it's no longer linked to. Cc: Thomas Gleixner Cc: Ingo Molnar Cc: Borislav Petkov Cc: x...@kernel.org Cc: "H. Peter Anvin" Signed-off-by: Nicholas Piggin --- arch/x86/include/asm/vmalloc.h | 22 +++--- arch/x86/mm/ioremap.c | 21 - arch/x86/mm/pgtable.c | 13 - 3 files changed, 19 insertions(+), 37 deletions(-) diff --git a/arch/x86/include/asm/vmalloc.h b/arch/x86/include/asm/vmalloc.h index 094ea2b565f3..e714b00fc0ca 100644 --- a/arch/x86/include/asm/vmalloc.h +++ b/arch/x86/include/asm/vmalloc.h @@ -1,13 +1,29 @@ #ifndef _ASM_X86_VMALLOC_H #define _ASM_X86_VMALLOC_H +#include #include #include #ifdef CONFIG_HAVE_ARCH_HUGE_VMAP -bool arch_vmap_p4d_supported(pgprot_t prot); -bool arch_vmap_pud_supported(pgprot_t prot); -bool arch_vmap_pmd_supported(pgprot_t prot); +static inline bool arch_vmap_p4d_supported(pgprot_t prot) +{ + return false; +} + +static inline bool arch_vmap_pud_supported(pgprot_t prot) +{ +#ifdef CONFIG_X86_64 + return boot_cpu_has(X86_FEATURE_GBPAGES); +#else + return false; +#endif +} + +static inline bool arch_vmap_pmd_supported(pgprot_t prot) +{ + return boot_cpu_has(X86_FEATURE_PSE); +} #endif #endif /* _ASM_X86_VMALLOC_H */ diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c index fbaf0c447986..12c686c65ea9 100644 --- a/arch/x86/mm/ioremap.c +++ b/arch/x86/mm/ioremap.c @@ -481,27 +481,6 @@ void iounmap(volatile void __iomem *addr) } EXPORT_SYMBOL(iounmap); -#ifdef CONFIG_HAVE_ARCH_HUGE_VMAP -bool arch_vmap_p4d_supported(pgprot_t prot) -{ - return false; -} - -bool arch_vmap_pud_supported(pgprot_t prot) -{ -#ifdef CONFIG_X86_64 - return boot_cpu_has(X86_FEATURE_GBPAGES); -#else - return false; -#endif -} - -bool arch_vmap_pmd_supported(pgprot_t prot) -{ - return boot_cpu_has(X86_FEATURE_PSE); -} -#endif - /* * Convert a physical pointer to a virtual kernel pointer for /dev/mem * access diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c index f6a9e2e36642..d27cf69e811d 100644 --- a/arch/x86/mm/pgtable.c +++ b/arch/x86/mm/pgtable.c @@ -780,14 +780,6 @@ int pmd_clear_huge(pmd_t *pmd) return 0; } -/* - * Until we support 512GB pages, skip them in the vmap area. - */ -int p4d_free_pud_page(p4d_t *p4d, unsigned long addr) -{ - return 0; -} - #ifdef CONFIG_X86_64 /** * pud_free_pmd_page - Clear pud entry and free pmd page. @@ -861,11 +853,6 @@ int pmd_free_pte_page(pmd_t *pmd, unsigned long addr) #else /* !CONFIG_X86_64 */ -int pud_free_pmd_page(pud_t *pud, unsigned long addr) -{ - return pud_none(*pud); -} - /* * Disable free page handling on x86-PAE. This assures that ioremap() * does not update sync'd pmd entries. See vmalloc_sync_one(). -- 2.23.0
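To spell out the constant-folding claim, here is a condensed (not verbatim) version of the vmap_try_huge_p4d() caller: with arch_vmap_p4d_supported() visible as an inline returning compile-time false, the compiler deletes the whole huge-p4d path, so nothing references p4d_set_huge() or p4d_free_pud_page() any more.

static int vmap_try_huge_p4d(p4d_t *p4d, unsigned long addr,
			     unsigned long end, phys_addr_t phys_addr,
			     pgprot_t prot)
{
	if (!arch_vmap_p4d_supported(prot))	/* constant false on x86 */
		return 0;			/* everything below is dead code */

	if ((end - addr) != P4D_SIZE || !IS_ALIGNED(addr, P4D_SIZE))
		return 0;

	return p4d_set_huge(p4d, phys_addr, prot);	/* never emitted */
}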
[PATCH v11 07/13] arm64: inline huge vmap supported functions
This allows unsupported levels to be constant folded away, and so p4d_free_pud_page can be removed because it's no longer linked to. Cc: Catalin Marinas Cc: Will Deacon Cc: linux-arm-ker...@lists.infradead.org Acked-by: Catalin Marinas Signed-off-by: Nicholas Piggin --- arch/arm64/include/asm/vmalloc.h | 23 --- arch/arm64/mm/mmu.c | 26 -- 2 files changed, 20 insertions(+), 29 deletions(-) diff --git a/arch/arm64/include/asm/vmalloc.h b/arch/arm64/include/asm/vmalloc.h index 597b40405319..fc9a12d6cc1a 100644 --- a/arch/arm64/include/asm/vmalloc.h +++ b/arch/arm64/include/asm/vmalloc.h @@ -4,9 +4,26 @@ #include #ifdef CONFIG_HAVE_ARCH_HUGE_VMAP -bool arch_vmap_p4d_supported(pgprot_t prot); -bool arch_vmap_pud_supported(pgprot_t prot); -bool arch_vmap_pmd_supported(pgprot_t prot); +static inline bool arch_vmap_p4d_supported(pgprot_t prot) +{ + return false; +} + +static inline bool arch_vmap_pud_supported(pgprot_t prot) +{ + /* +* Only 4k granule supports level 1 block mappings. +* SW table walks can't handle removal of intermediate entries. +*/ + return IS_ENABLED(CONFIG_ARM64_4K_PAGES) && + !IS_ENABLED(CONFIG_PTDUMP_DEBUGFS); +} + +static inline bool arch_vmap_pmd_supported(pgprot_t prot) +{ + /* See arch_vmap_pud_supported() */ + return !IS_ENABLED(CONFIG_PTDUMP_DEBUGFS); +} #endif #endif /* _ASM_ARM64_VMALLOC_H */ diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c index 1613d290cbd1..ab9ba7c36dae 100644 --- a/arch/arm64/mm/mmu.c +++ b/arch/arm64/mm/mmu.c @@ -1313,27 +1313,6 @@ void *__init fixmap_remap_fdt(phys_addr_t dt_phys, int *size, pgprot_t prot) return dt_virt; } -bool arch_vmap_p4d_supported(pgprot_t prot) -{ - return false; -} - -bool arch_vmap_pud_supported(pgprot_t prot) -{ - /* -* Only 4k granule supports level 1 block mappings. -* SW table walks can't handle removal of intermediate entries. -*/ - return IS_ENABLED(CONFIG_ARM64_4K_PAGES) && - !IS_ENABLED(CONFIG_PTDUMP_DEBUGFS); -} - -bool arch_vmap_pmd_supported(pgprot_t prot) -{ - /* See arch_vmap_pud_supported() */ - return !IS_ENABLED(CONFIG_PTDUMP_DEBUGFS); -} - int pud_set_huge(pud_t *pudp, phys_addr_t phys, pgprot_t prot) { pud_t new_pud = pfn_pud(__phys_to_pfn(phys), mk_pud_sect_prot(prot)); @@ -1425,11 +1404,6 @@ int pud_free_pmd_page(pud_t *pudp, unsigned long addr) return 1; } -int p4d_free_pud_page(p4d_t *p4d, unsigned long addr) -{ - return 0; /* Don't attempt a block mapping */ -} - #ifdef CONFIG_MEMORY_HOTPLUG static void __remove_pgd_mapping(pgd_t *pgdir, unsigned long start, u64 size) { -- 2.23.0
[PATCH v11 06/13] powerpc: inline huge vmap supported functions
This allows unsupported levels to be constant folded away, and so p4d_free_pud_page can be removed because it's no longer linked to. Cc: linuxppc-dev@lists.ozlabs.org Acked-by: Michael Ellerman Signed-off-by: Nicholas Piggin --- arch/powerpc/include/asm/vmalloc.h | 19 --- arch/powerpc/mm/book3s64/radix_pgtable.c | 21 - 2 files changed, 16 insertions(+), 24 deletions(-) diff --git a/arch/powerpc/include/asm/vmalloc.h b/arch/powerpc/include/asm/vmalloc.h index 105abb73f075..3f0c153befb0 100644 --- a/arch/powerpc/include/asm/vmalloc.h +++ b/arch/powerpc/include/asm/vmalloc.h @@ -1,12 +1,25 @@ #ifndef _ASM_POWERPC_VMALLOC_H #define _ASM_POWERPC_VMALLOC_H +#include #include #ifdef CONFIG_HAVE_ARCH_HUGE_VMAP -bool arch_vmap_p4d_supported(pgprot_t prot); -bool arch_vmap_pud_supported(pgprot_t prot); -bool arch_vmap_pmd_supported(pgprot_t prot); +static inline bool arch_vmap_p4d_supported(pgprot_t prot) +{ + return false; +} + +static inline bool arch_vmap_pud_supported(pgprot_t prot) +{ + /* HPT does not cope with large pages in the vmalloc area */ + return radix_enabled(); +} + +static inline bool arch_vmap_pmd_supported(pgprot_t prot) +{ + return radix_enabled(); +} #endif #endif /* _ASM_POWERPC_VMALLOC_H */ diff --git a/arch/powerpc/mm/book3s64/radix_pgtable.c b/arch/powerpc/mm/book3s64/radix_pgtable.c index 743807fc210f..8da62afccee5 100644 --- a/arch/powerpc/mm/book3s64/radix_pgtable.c +++ b/arch/powerpc/mm/book3s64/radix_pgtable.c @@ -1082,22 +1082,6 @@ void radix__ptep_modify_prot_commit(struct vm_area_struct *vma, set_pte_at(mm, addr, ptep, pte); } -bool arch_vmap_pud_supported(pgprot_t prot) -{ - /* HPT does not cope with large pages in the vmalloc area */ - return radix_enabled(); -} - -bool arch_vmap_pmd_supported(pgprot_t prot) -{ - return radix_enabled(); -} - -int p4d_free_pud_page(p4d_t *p4d, unsigned long addr) -{ - return 0; -} - int pud_set_huge(pud_t *pud, phys_addr_t addr, pgprot_t prot) { pte_t *ptep = (pte_t *)pud; @@ -1181,8 +1165,3 @@ int pmd_free_pte_page(pmd_t *pmd, unsigned long addr) return 1; } - -bool arch_vmap_p4d_supported(pgprot_t prot) -{ - return false; -} -- 2.23.0
[PATCH v11 05/13] mm: HUGE_VMAP arch support cleanup
This changes the awkward approach where architectures provide init functions to determine which levels they can provide large mappings for, to one where the arch is queried for each call. This removes code and indirection, and allows constant-folding of dead code for unsupported levels. This also adds a prot argument to the arch query. This is unused currently but could help with some architectures (e.g., some powerpc processors can't map uncacheable memory with large pages). Cc: linuxppc-dev@lists.ozlabs.org Cc: Catalin Marinas Cc: Will Deacon Cc: linux-arm-ker...@lists.infradead.org Cc: Thomas Gleixner Cc: Ingo Molnar Cc: Borislav Petkov Cc: x...@kernel.org Cc: "H. Peter Anvin" Acked-by: Catalin Marinas [arm64] Signed-off-by: Nicholas Piggin --- arch/arm64/include/asm/vmalloc.h | 8 ++ arch/arm64/mm/mmu.c | 10 +-- arch/powerpc/include/asm/vmalloc.h | 8 ++ arch/powerpc/mm/book3s64/radix_pgtable.c | 8 +- arch/x86/include/asm/vmalloc.h | 7 ++ arch/x86/mm/ioremap.c| 12 +-- include/linux/io.h | 9 --- include/linux/vmalloc.h | 6 ++ init/main.c | 1 - mm/ioremap.c | 94 ++-- 10 files changed, 85 insertions(+), 78 deletions(-) diff --git a/arch/arm64/include/asm/vmalloc.h b/arch/arm64/include/asm/vmalloc.h index 2ca708ab9b20..597b40405319 100644 --- a/arch/arm64/include/asm/vmalloc.h +++ b/arch/arm64/include/asm/vmalloc.h @@ -1,4 +1,12 @@ #ifndef _ASM_ARM64_VMALLOC_H #define _ASM_ARM64_VMALLOC_H +#include + +#ifdef CONFIG_HAVE_ARCH_HUGE_VMAP +bool arch_vmap_p4d_supported(pgprot_t prot); +bool arch_vmap_pud_supported(pgprot_t prot); +bool arch_vmap_pmd_supported(pgprot_t prot); +#endif + #endif /* _ASM_ARM64_VMALLOC_H */ diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c index ae0c3d023824..1613d290cbd1 100644 --- a/arch/arm64/mm/mmu.c +++ b/arch/arm64/mm/mmu.c @@ -1313,12 +1313,12 @@ void *__init fixmap_remap_fdt(phys_addr_t dt_phys, int *size, pgprot_t prot) return dt_virt; } -int __init arch_ioremap_p4d_supported(void) +bool arch_vmap_p4d_supported(pgprot_t prot) { - return 0; + return false; } -int __init arch_ioremap_pud_supported(void) +bool arch_vmap_pud_supported(pgprot_t prot) { /* * Only 4k granule supports level 1 block mappings. 
@@ -1328,9 +1328,9 @@ int __init arch_ioremap_pud_supported(void) !IS_ENABLED(CONFIG_PTDUMP_DEBUGFS); } -int __init arch_ioremap_pmd_supported(void) +bool arch_vmap_pmd_supported(pgprot_t prot) { - /* See arch_ioremap_pud_supported() */ + /* See arch_vmap_pud_supported() */ return !IS_ENABLED(CONFIG_PTDUMP_DEBUGFS); } diff --git a/arch/powerpc/include/asm/vmalloc.h b/arch/powerpc/include/asm/vmalloc.h index b992dfaaa161..105abb73f075 100644 --- a/arch/powerpc/include/asm/vmalloc.h +++ b/arch/powerpc/include/asm/vmalloc.h @@ -1,4 +1,12 @@ #ifndef _ASM_POWERPC_VMALLOC_H #define _ASM_POWERPC_VMALLOC_H +#include + +#ifdef CONFIG_HAVE_ARCH_HUGE_VMAP +bool arch_vmap_p4d_supported(pgprot_t prot); +bool arch_vmap_pud_supported(pgprot_t prot); +bool arch_vmap_pmd_supported(pgprot_t prot); +#endif + #endif /* _ASM_POWERPC_VMALLOC_H */ diff --git a/arch/powerpc/mm/book3s64/radix_pgtable.c b/arch/powerpc/mm/book3s64/radix_pgtable.c index 98f0b243c1ab..743807fc210f 100644 --- a/arch/powerpc/mm/book3s64/radix_pgtable.c +++ b/arch/powerpc/mm/book3s64/radix_pgtable.c @@ -1082,13 +1082,13 @@ void radix__ptep_modify_prot_commit(struct vm_area_struct *vma, set_pte_at(mm, addr, ptep, pte); } -int __init arch_ioremap_pud_supported(void) +bool arch_vmap_pud_supported(pgprot_t prot) { /* HPT does not cope with large pages in the vmalloc area */ return radix_enabled(); } -int __init arch_ioremap_pmd_supported(void) +bool arch_vmap_pmd_supported(pgprot_t prot) { return radix_enabled(); } @@ -1182,7 +1182,7 @@ int pmd_free_pte_page(pmd_t *pmd, unsigned long addr) return 1; } -int __init arch_ioremap_p4d_supported(void) +bool arch_vmap_p4d_supported(pgprot_t prot) { - return 0; + return false; } diff --git a/arch/x86/include/asm/vmalloc.h b/arch/x86/include/asm/vmalloc.h index 29837740b520..094ea2b565f3 100644 --- a/arch/x86/include/asm/vmalloc.h +++ b/arch/x86/include/asm/vmalloc.h @@ -1,6 +1,13 @@ #ifndef _ASM_X86_VMALLOC_H #define _ASM_X86_VMALLOC_H +#include #include +#ifdef CONFIG_HAVE_ARCH_HUGE_VMAP +bool arch_vmap_p4d_supported(pgprot_t prot); +bool arch_vmap_pud_supported(pgprot_t prot); +bool arch_vmap_pmd_supported(pgprot_t prot); +#endif + #endif /* _ASM_X86_VMALLOC_H */ diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c index 9e5ccc56f8e0..fbaf0c447986 100644 --- a/arch/x86/mm/ioremap.c +++ b/arch/x86/mm/ioremap.c @@ -481,24 +481,26 @@ void iounmap(volatile void __iomem *addr) }
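As an illustration of what the currently-unused prot argument allows (hypothetical; no architecture in this series checks it yet), a powerpc-style implementation could refuse huge mappings for uncacheable memory:

static inline bool arch_vmap_pmd_supported(pgprot_t prot)
{
	/* Hypothetical check: decline huge mappings for uncacheable memory. */
	if (pgprot_val(prot) & _PAGE_NO_CACHE)
		return false;
	return radix_enabled();
}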
[PATCH v11 04/13] mm/ioremap: rename ioremap_*_range to vmap_*_range
This will be used as a generic kernel virtual mapping function, so re-name it in preparation. Signed-off-by: Nicholas Piggin --- mm/ioremap.c | 64 +++- 1 file changed, 33 insertions(+), 31 deletions(-) diff --git a/mm/ioremap.c b/mm/ioremap.c index 5fa1ab41d152..3f4d36f9745a 100644 --- a/mm/ioremap.c +++ b/mm/ioremap.c @@ -61,9 +61,9 @@ static inline int ioremap_pud_enabled(void) { return 0; } static inline int ioremap_pmd_enabled(void) { return 0; } #endif /* CONFIG_HAVE_ARCH_HUGE_VMAP */ -static int ioremap_pte_range(pmd_t *pmd, unsigned long addr, - unsigned long end, phys_addr_t phys_addr, pgprot_t prot, - pgtbl_mod_mask *mask) +static int vmap_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end, + phys_addr_t phys_addr, pgprot_t prot, + pgtbl_mod_mask *mask) { pte_t *pte; u64 pfn; @@ -81,9 +81,8 @@ static int ioremap_pte_range(pmd_t *pmd, unsigned long addr, return 0; } -static int ioremap_try_huge_pmd(pmd_t *pmd, unsigned long addr, - unsigned long end, phys_addr_t phys_addr, - pgprot_t prot) +static int vmap_try_huge_pmd(pmd_t *pmd, unsigned long addr, unsigned long end, + phys_addr_t phys_addr, pgprot_t prot) { if (!ioremap_pmd_enabled()) return 0; @@ -103,9 +102,9 @@ static int ioremap_try_huge_pmd(pmd_t *pmd, unsigned long addr, return pmd_set_huge(pmd, phys_addr, prot); } -static inline int ioremap_pmd_range(pud_t *pud, unsigned long addr, - unsigned long end, phys_addr_t phys_addr, pgprot_t prot, - pgtbl_mod_mask *mask) +static int vmap_pmd_range(pud_t *pud, unsigned long addr, unsigned long end, + phys_addr_t phys_addr, pgprot_t prot, + pgtbl_mod_mask *mask) { pmd_t *pmd; unsigned long next; @@ -116,20 +115,19 @@ static inline int ioremap_pmd_range(pud_t *pud, unsigned long addr, do { next = pmd_addr_end(addr, end); - if (ioremap_try_huge_pmd(pmd, addr, next, phys_addr, prot)) { + if (vmap_try_huge_pmd(pmd, addr, next, phys_addr, prot)) { *mask |= PGTBL_PMD_MODIFIED; continue; } - if (ioremap_pte_range(pmd, addr, next, phys_addr, prot, mask)) + if (vmap_pte_range(pmd, addr, next, phys_addr, prot, mask)) return -ENOMEM; } while (pmd++, phys_addr += (next - addr), addr = next, addr != end); return 0; } -static int ioremap_try_huge_pud(pud_t *pud, unsigned long addr, - unsigned long end, phys_addr_t phys_addr, - pgprot_t prot) +static int vmap_try_huge_pud(pud_t *pud, unsigned long addr, unsigned long end, + phys_addr_t phys_addr, pgprot_t prot) { if (!ioremap_pud_enabled()) return 0; @@ -149,9 +147,9 @@ static int ioremap_try_huge_pud(pud_t *pud, unsigned long addr, return pud_set_huge(pud, phys_addr, prot); } -static inline int ioremap_pud_range(p4d_t *p4d, unsigned long addr, - unsigned long end, phys_addr_t phys_addr, pgprot_t prot, - pgtbl_mod_mask *mask) +static int vmap_pud_range(p4d_t *p4d, unsigned long addr, unsigned long end, + phys_addr_t phys_addr, pgprot_t prot, + pgtbl_mod_mask *mask) { pud_t *pud; unsigned long next; @@ -162,20 +160,19 @@ static inline int ioremap_pud_range(p4d_t *p4d, unsigned long addr, do { next = pud_addr_end(addr, end); - if (ioremap_try_huge_pud(pud, addr, next, phys_addr, prot)) { + if (vmap_try_huge_pud(pud, addr, next, phys_addr, prot)) { *mask |= PGTBL_PUD_MODIFIED; continue; } - if (ioremap_pmd_range(pud, addr, next, phys_addr, prot, mask)) + if (vmap_pmd_range(pud, addr, next, phys_addr, prot, mask)) return -ENOMEM; } while (pud++, phys_addr += (next - addr), addr = next, addr != end); return 0; } -static int ioremap_try_huge_p4d(p4d_t *p4d, unsigned long addr, - unsigned long end, phys_addr_t phys_addr, - pgprot_t prot) 
+static int vmap_try_huge_p4d(p4d_t *p4d, unsigned long addr, unsigned long end, + phys_addr_t phys_addr, pgprot_t prot) { if (!ioremap_p4d_enabled()) return 0; @@ -195,9 +192,9 @@ static int ioremap_try_huge_p4d(p4d_t *p4d, unsigned long addr, return p4d_set_huge(p4d, phys_addr, prot); } -static inline int ioremap_p4d_range(pgd_t *pgd, unsigned long addr, -
[PATCH v11 03/13] mm/vmalloc: rename vmap_*_range vmap_pages_*_range
The vmalloc mapper operates on a struct page * array rather than a linear physical address, re-name it to make this distinction clear. Reviewed-by: Christoph Hellwig Signed-off-by: Nicholas Piggin --- mm/vmalloc.c | 16 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/mm/vmalloc.c b/mm/vmalloc.c index 62372f9e0167..7f2f36116980 100644 --- a/mm/vmalloc.c +++ b/mm/vmalloc.c @@ -189,7 +189,7 @@ void unmap_kernel_range_noflush(unsigned long start, unsigned long size) arch_sync_kernel_mappings(start, end); } -static int vmap_pte_range(pmd_t *pmd, unsigned long addr, +static int vmap_pages_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end, pgprot_t prot, struct page **pages, int *nr, pgtbl_mod_mask *mask) { @@ -217,7 +217,7 @@ static int vmap_pte_range(pmd_t *pmd, unsigned long addr, return 0; } -static int vmap_pmd_range(pud_t *pud, unsigned long addr, +static int vmap_pages_pmd_range(pud_t *pud, unsigned long addr, unsigned long end, pgprot_t prot, struct page **pages, int *nr, pgtbl_mod_mask *mask) { @@ -229,13 +229,13 @@ static int vmap_pmd_range(pud_t *pud, unsigned long addr, return -ENOMEM; do { next = pmd_addr_end(addr, end); - if (vmap_pte_range(pmd, addr, next, prot, pages, nr, mask)) + if (vmap_pages_pte_range(pmd, addr, next, prot, pages, nr, mask)) return -ENOMEM; } while (pmd++, addr = next, addr != end); return 0; } -static int vmap_pud_range(p4d_t *p4d, unsigned long addr, +static int vmap_pages_pud_range(p4d_t *p4d, unsigned long addr, unsigned long end, pgprot_t prot, struct page **pages, int *nr, pgtbl_mod_mask *mask) { @@ -247,13 +247,13 @@ static int vmap_pud_range(p4d_t *p4d, unsigned long addr, return -ENOMEM; do { next = pud_addr_end(addr, end); - if (vmap_pmd_range(pud, addr, next, prot, pages, nr, mask)) + if (vmap_pages_pmd_range(pud, addr, next, prot, pages, nr, mask)) return -ENOMEM; } while (pud++, addr = next, addr != end); return 0; } -static int vmap_p4d_range(pgd_t *pgd, unsigned long addr, +static int vmap_pages_p4d_range(pgd_t *pgd, unsigned long addr, unsigned long end, pgprot_t prot, struct page **pages, int *nr, pgtbl_mod_mask *mask) { @@ -265,7 +265,7 @@ static int vmap_p4d_range(pgd_t *pgd, unsigned long addr, return -ENOMEM; do { next = p4d_addr_end(addr, end); - if (vmap_pud_range(p4d, addr, next, prot, pages, nr, mask)) + if (vmap_pages_pud_range(p4d, addr, next, prot, pages, nr, mask)) return -ENOMEM; } while (p4d++, addr = next, addr != end); return 0; @@ -306,7 +306,7 @@ int map_kernel_range_noflush(unsigned long addr, unsigned long size, next = pgd_addr_end(addr, end); if (pgd_bad(*pgd)) mask |= PGTBL_PGD_MODIFIED; - err = vmap_p4d_range(pgd, addr, next, prot, pages, , ); + err = vmap_pages_p4d_range(pgd, addr, next, prot, pages, , ); if (err) return err; } while (pgd++, addr = next, addr != end); -- 2.23.0
[PATCH v11 02/13] mm: apply_to_pte_range warn and fail if a large pte is encountered
apply_to_pte_range might mistake a large pte for bad, or treat it as a page table, resulting in a crash or corruption. Add a test to warn and return error if large entries are found. Reviewed-by: Christoph Hellwig Signed-off-by: Nicholas Piggin --- mm/memory.c | 66 +++-- 1 file changed, 49 insertions(+), 17 deletions(-) diff --git a/mm/memory.c b/mm/memory.c index feff48e1465a..672e39a72788 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -2440,13 +2440,21 @@ static int apply_to_pmd_range(struct mm_struct *mm, pud_t *pud, } do { next = pmd_addr_end(addr, end); - if (create || !pmd_none_or_clear_bad(pmd)) { - err = apply_to_pte_range(mm, pmd, addr, next, fn, data, -create, mask); - if (err) - break; + if (pmd_none(*pmd) && !create) + continue; + if (WARN_ON_ONCE(pmd_leaf(*pmd))) + return -EINVAL; + if (!pmd_none(*pmd) && WARN_ON_ONCE(pmd_bad(*pmd))) { + if (!create) + continue; + pmd_clear_bad(pmd); } + err = apply_to_pte_range(mm, pmd, addr, next, +fn, data, create, mask); + if (err) + break; } while (pmd++, addr = next, addr != end); + return err; } @@ -2468,13 +2476,21 @@ static int apply_to_pud_range(struct mm_struct *mm, p4d_t *p4d, } do { next = pud_addr_end(addr, end); - if (create || !pud_none_or_clear_bad(pud)) { - err = apply_to_pmd_range(mm, pud, addr, next, fn, data, -create, mask); - if (err) - break; + if (pud_none(*pud) && !create) + continue; + if (WARN_ON_ONCE(pud_leaf(*pud))) + return -EINVAL; + if (!pud_none(*pud) && WARN_ON_ONCE(pud_bad(*pud))) { + if (!create) + continue; + pud_clear_bad(pud); } + err = apply_to_pmd_range(mm, pud, addr, next, +fn, data, create, mask); + if (err) + break; } while (pud++, addr = next, addr != end); + return err; } @@ -2496,13 +2512,21 @@ static int apply_to_p4d_range(struct mm_struct *mm, pgd_t *pgd, } do { next = p4d_addr_end(addr, end); - if (create || !p4d_none_or_clear_bad(p4d)) { - err = apply_to_pud_range(mm, p4d, addr, next, fn, data, -create, mask); - if (err) - break; + if (p4d_none(*p4d) && !create) + continue; + if (WARN_ON_ONCE(p4d_leaf(*p4d))) + return -EINVAL; + if (!p4d_none(*p4d) && WARN_ON_ONCE(p4d_bad(*p4d))) { + if (!create) + continue; + p4d_clear_bad(p4d); } + err = apply_to_pud_range(mm, p4d, addr, next, +fn, data, create, mask); + if (err) + break; } while (p4d++, addr = next, addr != end); + return err; } @@ -2522,9 +2546,17 @@ static int __apply_to_page_range(struct mm_struct *mm, unsigned long addr, pgd = pgd_offset(mm, addr); do { next = pgd_addr_end(addr, end); - if (!create && pgd_none_or_clear_bad(pgd)) + if (pgd_none(*pgd) && !create) continue; - err = apply_to_p4d_range(mm, pgd, addr, next, fn, data, create, ); + if (WARN_ON_ONCE(pgd_leaf(*pgd))) + return -EINVAL; + if (!pgd_none(*pgd) && WARN_ON_ONCE(pgd_bad(*pgd))) { + if (!create) + continue; + pgd_clear_bad(pgd); + } + err = apply_to_p4d_range(mm, pgd, addr, next, +fn, data, create, ); if (err) break; } while (pgd++, addr = next, addr != end); -- 2.23.0
[PATCH v11 01/13] mm/vmalloc: fix HUGE_VMAP regression by enabling huge pages in vmalloc_to_page
vmalloc_to_page returns NULL for addresses mapped by larger pages[*]. Whether or not a vmap is huge depends on the architecture details, alignments, boot options, etc., which the caller can not be expected to know. Therefore HUGE_VMAP is a regression for vmalloc_to_page. This change teaches vmalloc_to_page about larger pages, and returns the struct page that corresponds to the offset within the large page. This makes the API agnostic to mapping implementation details. [*] As explained by commit 029c54b095995 ("mm/vmalloc.c: huge-vmap: fail gracefully on unexpected huge vmap mappings") Reviewed-by: Christoph Hellwig Signed-off-by: Nicholas Piggin --- mm/vmalloc.c | 41 ++--- 1 file changed, 26 insertions(+), 15 deletions(-) diff --git a/mm/vmalloc.c b/mm/vmalloc.c index e6f352bf0498..62372f9e0167 100644 --- a/mm/vmalloc.c +++ b/mm/vmalloc.c @@ -34,7 +34,7 @@ #include #include #include - +#include #include #include #include @@ -343,7 +343,9 @@ int is_vmalloc_or_module_addr(const void *x) } /* - * Walk a vmap address to the struct page it maps. + * Walk a vmap address to the struct page it maps. Huge vmap mappings will + * return the tail page that corresponds to the base page address, which + * matches small vmap mappings. */ struct page *vmalloc_to_page(const void *vmalloc_addr) { @@ -363,25 +365,33 @@ struct page *vmalloc_to_page(const void *vmalloc_addr) if (pgd_none(*pgd)) return NULL; + if (WARN_ON_ONCE(pgd_leaf(*pgd))) + return NULL; /* XXX: no allowance for huge pgd */ + if (WARN_ON_ONCE(pgd_bad(*pgd))) + return NULL; + p4d = p4d_offset(pgd, addr); if (p4d_none(*p4d)) return NULL; - pud = pud_offset(p4d, addr); + if (p4d_leaf(*p4d)) + return p4d_page(*p4d) + ((addr & ~P4D_MASK) >> PAGE_SHIFT); + if (WARN_ON_ONCE(p4d_bad(*p4d))) + return NULL; - /* -* Don't dereference bad PUD or PMD (below) entries. This will also -* identify huge mappings, which we may encounter on architectures -* that define CONFIG_HAVE_ARCH_HUGE_VMAP=y. Such regions will be -* identified as vmalloc addresses by is_vmalloc_addr(), but are -* not [unambiguously] associated with a struct page, so there is -* no correct value to return for them. -*/ - WARN_ON_ONCE(pud_bad(*pud)); - if (pud_none(*pud) || pud_bad(*pud)) + pud = pud_offset(p4d, addr); + if (pud_none(*pud)) + return NULL; + if (pud_leaf(*pud)) + return pud_page(*pud) + ((addr & ~PUD_MASK) >> PAGE_SHIFT); + if (WARN_ON_ONCE(pud_bad(*pud))) return NULL; + pmd = pmd_offset(pud, addr); - WARN_ON_ONCE(pmd_bad(*pmd)); - if (pmd_none(*pmd) || pmd_bad(*pmd)) + if (pmd_none(*pmd)) + return NULL; + if (pmd_leaf(*pmd)) + return pmd_page(*pmd) + ((addr & ~PMD_MASK) >> PAGE_SHIFT); + if (WARN_ON_ONCE(pmd_bad(*pmd))) return NULL; ptep = pte_offset_map(pmd, addr); @@ -389,6 +399,7 @@ struct page *vmalloc_to_page(const void *vmalloc_addr) if (pte_present(pte)) page = pte_page(pte); pte_unmap(ptep); + return page; } EXPORT_SYMBOL(vmalloc_to_page); -- 2.23.0
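A worked example of the tail-page arithmetic introduced here, assuming 4K base pages and a 2M PMD mapping:

/*
 * For an address 0x5000 bytes into a PMD-mapped region:
 *
 *   (addr & ~PMD_MASK) >> PAGE_SHIFT  ==  0x5000 >> 12  ==  5
 *
 * so vmalloc_to_page() returns pmd_page(*pmd) + 5, the struct page of
 * the 4K subpage at index 5 -- the same page a small-page walk of an
 * equivalent PAGE_SIZE mapping would return.
 */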
[PATCH v11 00/13] huge vmalloc mappings
I think I ended up implementing all of Christoph's comments because they turned out better in the end. Cleanups coming in another series though. Thanks, Nick Since v10: - Fixed code style, most > 80 columns, tweak patch titles, etc [thanks Christoph] - Made huge vmalloc code and data structure compile away if unselected [Christoph] - Archs only have to provide arch_vmap_p?d_supported for levels they implement [Christoph] Since v9: - Fixed intermediate build breakage on x86-32 !PAE [thanks Ding] - Fixed small page fallback case vm_struct double-free [thanks Ding] Since v8: - Fixed nommu compile. - Added Kconfig option help text - Added VM_NOHUGE which should help archs implement it [suggested by Rick] Since v7: - Rebase, added some acks, compile fix - Removed "order=" from vmallocinfo, it's a bit confusing (nr_pages is in small page size for compatibility). - Added arch_vmap_pmd_supported() test before starting to allocate the large page, rather than only testing it when doing the map, to avoid unsupported configs trying to allocate huge pages for no reason. Since v6: - Fixed a false positive warning introduced in patch 2, found by kbuild test robot. Since v5: - Split arch changes out better and make the constant folding work - Avoid most of the 80 column wrap, fix a reference to lib/ioremap.c - Fix compile error on some archs Since v4: - Fixed an off-by-page-order bug in v4 - Several minor cleanups. - Added page order to /proc/vmallocinfo - Added hugepage to alloc_large_system_hash output. - Made an architecture config option, powerpc only for now. Since v3: - Fixed an off-by-one bug in a loop - Fix !CONFIG_HAVE_ARCH_HUGE_VMAP build fail Nicholas Piggin (13): mm/vmalloc: fix HUGE_VMAP regression by enabling huge pages in vmalloc_to_page mm: apply_to_pte_range warn and fail if a large pte is encountered mm/vmalloc: rename vmap_*_range vmap_pages_*_range mm/ioremap: rename ioremap_*_range to vmap_*_range mm: HUGE_VMAP arch support cleanup powerpc: inline huge vmap supported functions arm64: inline huge vmap supported functions x86: inline huge vmap supported functions mm/vmalloc: provide fallback arch huge vmap support functions mm: Move vmap_range from mm/ioremap.c to mm/vmalloc.c mm/vmalloc: add vmap_range_noflush variant mm/vmalloc: Hugepage vmalloc mappings powerpc/64s/radix: Enable huge vmalloc mappings .../admin-guide/kernel-parameters.txt | 2 + arch/Kconfig | 11 + arch/arm64/include/asm/vmalloc.h | 24 + arch/arm64/mm/mmu.c | 26 - arch/powerpc/Kconfig | 1 + arch/powerpc/include/asm/vmalloc.h| 20 + arch/powerpc/kernel/module.c | 21 +- arch/powerpc/mm/book3s64/radix_pgtable.c | 21 - arch/x86/include/asm/vmalloc.h| 20 + arch/x86/mm/ioremap.c | 19 - arch/x86/mm/pgtable.c | 13 - include/linux/io.h| 9 - include/linux/vmalloc.h | 46 ++ init/main.c | 1 - mm/ioremap.c | 225 +--- mm/memory.c | 66 ++- mm/page_alloc.c | 5 +- mm/vmalloc.c | 484 +++--- 18 files changed, 614 insertions(+), 400 deletions(-) -- 2.23.0
[PATCH v2] tpm: ibmvtpm: fix error return code in tpm_ibmvtpm_probe()
From: Stefan Berger Return error code -ETIMEDOUT rather than '0' when waiting for the rtce_buf to be set has timed out. Fixes: d8d74ea3c002 ("tpm: ibmvtpm: Wait for buffer to be set before proceeding") Reported-by: Hulk Robot Signed-off-by: Wang Hai Signed-off-by: Stefan Berger --- drivers/char/tpm/tpm_ibmvtpm.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/char/tpm/tpm_ibmvtpm.c b/drivers/char/tpm/tpm_ibmvtpm.c index 994385bf37c0..813eb2cac0ce 100644 --- a/drivers/char/tpm/tpm_ibmvtpm.c +++ b/drivers/char/tpm/tpm_ibmvtpm.c @@ -687,6 +687,7 @@ static int tpm_ibmvtpm_probe(struct vio_dev *vio_dev, ibmvtpm->rtce_buf != NULL, HZ)) { dev_err(dev, "CRQ response timed out\n"); + rc = -ETIMEDOUT; goto init_irq_cleanup; } -- 2.25.4
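The general idiom behind the fix (a sketch; wq and the surrounding names are abridged placeholders for the driver's fields): wait_event_timeout() returns 0 on timeout, so an error path that falls through with a previously-zeroed rc would report success unless it sets an explicit code.

/* rc may still hold 0 from the previous successful step. */
if (!wait_event_timeout(wq, ibmvtpm->rtce_buf != NULL, HZ)) {
	dev_err(dev, "CRQ response timed out\n");
	rc = -ETIMEDOUT;	/* the fix: don't reach cleanup with rc == 0 */
	goto init_irq_cleanup;
}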
Re: [PATCH] powerpc/mm: Limit allocation of SWIOTLB on server machines
Konrad Rzeszutek Wilk writes: > On Fri, Jan 08, 2021 at 09:27:01PM -0300, Thiago Jung Bauermann wrote: >> >> Ram Pai writes: >> >> > On Wed, Dec 23, 2020 at 09:06:01PM -0300, Thiago Jung Bauermann wrote: >> >> >> >> Hi Ram, >> >> >> >> Thanks for reviewing this patch. >> >> >> >> Ram Pai writes: >> >> >> >> > On Fri, Dec 18, 2020 at 03:21:03AM -0300, Thiago Jung Bauermann wrote: >> >> >> On server-class POWER machines, we don't need the SWIOTLB unless we're >> >> >> a >> >> >> secure VM. Nevertheless, if CONFIG_SWIOTLB is enabled we >> >> >> unconditionally >> >> >> allocate it. >> >> >> >> >> >> In most cases this is harmless, but on a few machine configurations >> >> >> (e.g., >> >> >> POWER9 powernv systems with 4 GB area reserved for crashdump kernel) >> >> >> it can >> >> >> happen that memblock can't find a 64 MB chunk of memory for the >> >> >> SWIOTLB and >> >> >> fails with a scary-looking WARN_ONCE: >> >> >> >> >> >> [ cut here ] >> >> >> memblock: bottom-up allocation failed, memory hotremove may be >> >> >> affected >> >> >> WARNING: CPU: 0 PID: 0 at mm/memblock.c:332 >> >> >> memblock_find_in_range_node+0x328/0x340 >> >> >> Modules linked in: >> >> >> CPU: 0 PID: 0 Comm: swapper Not tainted 5.10.0-rc2-orig+ #6 >> >> >> NIP: c0442f38 LR: c0442f34 CTR: c01e0080 >> >> >> REGS: c1def900 TRAP: 0700 Not tainted (5.10.0-rc2-orig+) >> >> >> MSR: 92021033 CR: 2802 XER: >> >> >> 2004 >> >> >> CFAR: c014b7b4 IRQMASK: 1 >> >> >> GPR00: c0442f34 c1defba0 c1deff00 >> >> >> 0047 >> >> >> GPR04: 7fff c1def828 c1def820 >> >> >> >> >> >> GPR08: 001ffc3e c1b75478 c1b75478 >> >> >> 0001 >> >> >> GPR12: 2000 c203 >> >> >> >> >> >> GPR16: >> >> >> 0203 >> >> >> GPR20: 0001 0001 >> >> >> c1defc10 >> >> >> GPR24: c1defc08 c1c91868 c1defc18 >> >> >> c1c91890 >> >> >> GPR28: 0400 >> >> >> >> >> >> NIP [c0442f38] memblock_find_in_range_node+0x328/0x340 >> >> >> LR [c0442f34] memblock_find_in_range_node+0x324/0x340 >> >> >> Call Trace: >> >> >> [c1defba0] [c0442f34] >> >> >> memblock_find_in_range_node+0x324/0x340 (unreliable) >> >> >> [c1defc90] [c15ac088] >> >> >> memblock_alloc_range_nid+0xec/0x1b0 >> >> >> [c1defd40] [c15ac1f8] >> >> >> memblock_alloc_internal+0xac/0x110 >> >> >> [c1defda0] [c15ac4d0] memblock_alloc_try_nid+0x94/0xcc >> >> >> [c1defe30] [c159c3c8] swiotlb_init+0x78/0x104 >> >> >> [c1defea0] [c158378c] mem_init+0x4c/0x98 >> >> >> [c1defec0] [c157457c] start_kernel+0x714/0xac8 >> >> >> [c1deff90] [c000d244] start_here_common+0x1c/0x58 >> >> >> Instruction dump: >> >> >> 2c23 4182ffd4 ea610088 ea810090 4bfffe84 3921 3d42fff4 >> >> >> 3c62ff60 >> >> >> 3863c560 992a8bfc 4bd0881d 6000 <0fe0> ea610088 4bfffd94 >> >> >> 6000 >> >> >> random: get_random_bytes called from __warn+0x128/0x184 with >> >> >> crng_init=0 >> >> >> ---[ end trace ]--- >> >> >> software IO TLB: Cannot allocate buffer >> >> >> >> >> >> Unless this is a secure VM the message can actually be ignored, >> >> >> because the >> >> >> SWIOTLB isn't needed. Therefore, let's avoid the SWIOTLB in those >> >> >> cases. >> >> > >> >> > The above warn_on is conveying a genuine warning. Should it be silenced? >> >> >> >> Not sure I understand your point. This patch doesn't silence the >> >> warning, it avoids the problem it is warning about. >> > >> > Sorry, I should have explained it better. My point is... >> > >> >If CONFIG_SWIOTLB is enabled, it means that the kernel is >> >promising the bounce buffering capability. 
I know, currently we >> >do not have any kernel subsystems that use bounce buffers on >> >non-secure-pseries-kernel or powernv-kernel. But that does not >> >mean, there won't be any. In case there is such a third-party >> >module needing bounce buffering, it won't be able to operate, >> >because of the proposed change in your patch. >> > >> >Is that a good thing or a bad thing, I do not know. I will let >> >the experts opine. >> >> Ping? Does anyone else have an opinion on this? The other option I can >> think of is changing the crashkernel code to not reserve so much memory >> below 4 GB. Other people are considering this option, but it's not >> planned for the near future. > > That seems a more suitable solution regardless, but there is always > the danger of not being enough or
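For readers following along, here is a minimal sketch of the approach under discussion (illustrative only, not the actual patch; it assumes the powerpc is_secure_guest() helper from asm/svm.h and the swiotlb_init(int verbose) interface of this era):

    #include <linux/swiotlb.h>
    #include <asm/svm.h>            /* is_secure_guest() */

    /* Hypothetical helper: only allocate the 64 MB SWIOTLB pool when
     * bounce buffering is actually required, i.e. in a secure VM where
     * devices cannot DMA to arbitrary guest memory. */
    static void __init maybe_swiotlb_init(void)
    {
            if (!is_secure_guest())
                    return;         /* DMA is direct; skip the allocation */

            swiotlb_init(1);        /* verbose: print the "software IO TLB" line */
    }

Ram's concern above is exactly the trade-off visible in this sketch: with such gating, a module that assumes bounce buffers are always available would find no pool on non-secure machines.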
Re: [PATCH v2 2/2] memblock: do not start bottom-up allocations with kernel_end
Mike Rapoport writes: > On Sat, Jan 23, 2021 at 06:09:11PM -0800, Andrew Morton wrote: >> On Fri, 22 Jan 2021 01:37:14 -0300 Thiago Jung Bauermann >> wrote: >> >> > Mike Rapoport writes: >> > >> > > > Signed-off-by: Roman Gushchin >> > > >> > > Reviewed-by: Mike Rapoport >> > >> > I've seen a couple of spurious triggers of the WARN_ONCE() removed by this >> > patch. This happens on some ppc64le bare metal (powernv) server machines >> > with >> > CONFIG_SWIOTLB=y and crashkernel=4G, as described in a candidate patch I >> > posted >> > to solve this issue in a different way: >> > >> > https://lore.kernel.org/linuxppc-dev/20201218062103.76102-1-bauer...@linux.ibm.com/ >> > >> > Since this patch solves that problem, is it possible to include it in the >> > next >> > feasible v5.11-rcX, with the following tag? >> >> We could do this, Thanks! >> if we're confident that this patch doesn't depend on >> [1/2] "mm: cma: allocate cma areas bottom-up"? I think it is... > > I think it does not depend on cma bottom-up allocation, it's rather the other > way around: without this CMA bottom-up allocation could fail with KASLR > enabled. I agree. Conceptually, this could have been patch 1 in this series. > Still, this patch may need updates to the way x86 does early reservations: > > https://lore.kernel.org/lkml/20210115083255.12744-1-r...@kernel.org Ah, I wasn't aware of this. Thanks for fixing those issues. That series seems to be well accepted. >> > Fixes: 8fabc623238e ("powerpc: Ensure that swiotlb buffer is allocated >> > from low memory") >> >> I added that. Thanks! -- Thiago Jung Bauermann IBM Linux Technology Center
Re: [PATCH] PCI: dwc: layerscape: convert to builtin_platform_driver()
On 2021-01-21 12:01, Geert Uytterhoeven wrote: Hi Saravana, On Thu, Jan 21, 2021 at 1:05 AM Saravana Kannan wrote: On Wed, Jan 20, 2021 at 3:53 PM Michael Walle wrote: > On 2021-01-20 20:47, Saravana Kannan wrote: > > On Wed, Jan 20, 2021 at 11:28 AM Michael Walle > > wrote: > >> > >> [RESEND, fat-fingered the buttons of my mail client and converted > >> all CCs to BCCs :(] > >> > >> On 2021-01-20 20:02, Saravana Kannan wrote: > >> > On Wed, Jan 20, 2021 at 6:24 AM Rob Herring wrote: > >> >> > >> >> On Wed, Jan 20, 2021 at 4:53 AM Michael Walle > >> >> wrote: > >> >> > > >> >> > fw_devlink will defer the probe until all suppliers are ready. We can't > >> >> > use builtin_platform_driver_probe() because it doesn't retry after probe > >> >> > deferral. Convert it to builtin_platform_driver(). > >> >> > >> >> If builtin_platform_driver_probe() doesn't work with fw_devlink, then > >> >> shouldn't it be fixed or removed? > >> > > >> > I was actually thinking about this too. The problem with fixing > >> > builtin_platform_driver_probe() to behave like > >> > builtin_platform_driver() is that these probe functions could be > >> > marked with __init. But there are also only 20 instances of > >> > builtin_platform_driver_probe() in the kernel: > >> > $ git grep ^builtin_platform_driver_probe | wc -l > >> > 20 > >> > > >> > So it might be easier to just fix them to not use > >> > builtin_platform_driver_probe(). > >> > > >> > Michael, > >> > > >> > Any chance you'd be willing to help me by converting all these to > >> > builtin_platform_driver() and delete builtin_platform_driver_probe()? > >> > >> If it is just moving the probe function to the _driver struct and > >> removing the __init annotations, I could look into that. > > > > Yup. That's pretty much it AFAICT. > > > > builtin_platform_driver_probe() also makes sure the driver doesn't ask > > for async probe, etc. But I doubt anyone is actually setting async > > flags and still using builtin_platform_driver_probe(). > > Doesn't module_platform_driver_probe() have the same problem? And there > are ~80 drivers which use that. Yeah. The biggest problem with all of these is the __init markers. Maybe someone familiar with coccinelle can help? And dropping them will increase memory usage. Although I do have the changes for the builtin_platform_driver_probe() ready, I don't think it makes much sense to send these unless we agree on the increased memory footprint. While there are just a few builtin_platform_driver_probe() and memory increase _might_ be negligible, there are many more module_platform_driver_probe(). -michael
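For reference, the conversion being discussed is mostly mechanical. Roughly (hypothetical 'foo' driver, illustrative only):

    #include <linux/init.h>
    #include <linux/platform_device.h>

    /* Before: the probe routine is passed separately, may be __init,
     * and is only tried once at boot, so -EPROBE_DEFER is fatal: */
    static int __init foo_probe_old(struct platform_device *pdev)
    {
            return 0;
    }
    static struct platform_driver foo_driver_old = {
            .driver = { .name = "foo-old" },
    };
    builtin_platform_driver_probe(foo_driver_old, foo_probe_old);

    /* After: the probe routine moves into the struct and loses __init,
     * so the device can be probed again after a deferral: */
    static int foo_probe(struct platform_device *pdev)
    {
            return 0;
    }
    static struct platform_driver foo_driver = {
            .probe = foo_probe,
            .driver = { .name = "foo" },
    };
    builtin_platform_driver(foo_driver);

The memory-footprint concern raised above is the flip side: once __init is dropped, the probe code can no longer be discarded after boot.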
Re: [PATCH] PCI: dwc: layerscape: convert to builtin_platform_driver()
On 2021-01-25 19:58, Saravana Kannan wrote: On Mon, Jan 25, 2021 at 8:50 AM Lorenzo Pieralisi wrote: On Wed, Jan 20, 2021 at 08:28:36PM +0100, Michael Walle wrote: > [RESEND, fat-fingered the buttons of my mail client and converted > all CCs to BCCs :(] > > On 2021-01-20 20:02, Saravana Kannan wrote: > > On Wed, Jan 20, 2021 at 6:24 AM Rob Herring wrote: > > > > > > On Wed, Jan 20, 2021 at 4:53 AM Michael Walle > > > wrote: > > > > > > > > fw_devlink will defer the probe until all suppliers are ready. We can't > > > > use builtin_platform_driver_probe() because it doesn't retry after probe > > > > deferral. Convert it to builtin_platform_driver(). > > > > > > If builtin_platform_driver_probe() doesn't work with fw_devlink, then > > > shouldn't it be fixed or removed? > > > > I was actually thinking about this too. The problem with fixing > > builtin_platform_driver_probe() to behave like > > builtin_platform_driver() is that these probe functions could be > > marked with __init. But there are also only 20 instances of > > builtin_platform_driver_probe() in the kernel: > > $ git grep ^builtin_platform_driver_probe | wc -l > > 20 > > > > So it might be easier to just fix them to not use > > builtin_platform_driver_probe(). > > > > Michael, > > > > Any chance you'd be willing to help me by converting all these to > > builtin_platform_driver() and delete builtin_platform_driver_probe()? > > If it is just moving the probe function to the _driver struct and > removing the __init annotations, I could look into that. Can I drop this patch then? No, please pick it up. Michael and I were talking about doing similar changes for other drivers. Yes please, I was just about to answer, but Saravana beat me. -michael
Re: [PATCH] PCI: dwc: layerscape: convert to builtin_platform_driver()
On Wed, Jan 20, 2021 at 08:28:36PM +0100, Michael Walle wrote: > [RESEND, fat-fingered the buttons of my mail client and converted > all CCs to BCCs :(] > > On 2021-01-20 20:02, Saravana Kannan wrote: > > On Wed, Jan 20, 2021 at 6:24 AM Rob Herring wrote: > > > > > > On Wed, Jan 20, 2021 at 4:53 AM Michael Walle > > > wrote: > > > > > > > > fw_devlink will defer the probe until all suppliers are ready. We can't > > > > use builtin_platform_driver_probe() because it doesn't retry after probe > > > > deferral. Convert it to builtin_platform_driver(). > > > > > > If builtin_platform_driver_probe() doesn't work with fw_devlink, then > > > shouldn't it be fixed or removed? > > > > I was actually thinking about this too. The problem with fixing > > builtin_platform_driver_probe() to behave like > > builtin_platform_driver() is that these probe functions could be > > marked with __init. But there are also only 20 instances of > > builtin_platform_driver_probe() in the kernel: > > $ git grep ^builtin_platform_driver_probe | wc -l > > 20 > > > > So it might be easier to just fix them to not use > > builtin_platform_driver_probe(). > > > > Michael, > > > > Any chance you'd be willing to help me by converting all these to > > builtin_platform_driver() and delete builtin_platform_driver_probe()? > > If it is just moving the probe function to the _driver struct and > removing the __init annotations, I could look into that. Can I drop this patch then? Thanks, Lorenzo
[PATCH v4 23/23] powerpc/syscall: Avoid storing 'current' in another pointer
By saving the pointer pointing to thread_info.flags, gcc copies r2 into a non-volatile register. We know 'current' doesn't change, so avoid that intermediate pointer. Reduces null_syscall benchmark by 2 cycles (322 => 320 cycles) On PPC64, gcc seems to know that 'current' is not changing, and it keeps it in a non-volatile register to avoid multiple reads of 'current' from the paca. Signed-off-by: Christophe Leroy --- arch/powerpc/kernel/syscall.c | 9 - 1 file changed, 4 insertions(+), 5 deletions(-) diff --git a/arch/powerpc/kernel/syscall.c b/arch/powerpc/kernel/syscall.c index 47ae55f94d1c..72e0b18b88d8 100644 --- a/arch/powerpc/kernel/syscall.c +++ b/arch/powerpc/kernel/syscall.c @@ -186,7 +186,6 @@ notrace unsigned long syscall_exit_prepare(unsigned long r3, struct pt_regs *regs, long scv) { - unsigned long *ti_flagsp = &current_thread_info()->flags; unsigned long ti_flags; unsigned long ret = 0; @@ -202,7 +201,7 @@ notrace unsigned long syscall_exit_prepare(unsigned long r3, /* Check whether the syscall is issued inside a restartable sequence */ rseq_syscall(regs); - ti_flags = *ti_flagsp; + ti_flags = current_thread_info()->flags; if (unlikely(r3 >= (unsigned long)-MAX_ERRNO) && !scv) { if (likely(!(ti_flags & (_TIF_NOERROR | _TIF_RESTOREALL)))) { @@ -216,7 +215,7 @@ notrace unsigned long syscall_exit_prepare(unsigned long r3, ret = _TIF_RESTOREALL; else regs->gpr[3] = r3; - clear_bits(_TIF_PERSYSCALL_MASK, ti_flagsp); + clear_bits(_TIF_PERSYSCALL_MASK, &current_thread_info()->flags); } else { regs->gpr[3] = r3; } @@ -228,7 +227,7 @@ notrace unsigned long syscall_exit_prepare(unsigned long r3, again: local_irq_disable(); - ti_flags = READ_ONCE(*ti_flagsp); + ti_flags = READ_ONCE(current_thread_info()->flags); while (unlikely(ti_flags & (_TIF_USER_WORK_MASK & ~_TIF_RESTORE_TM))) { local_irq_enable(); if (ti_flags & _TIF_NEED_RESCHED) { @@ -244,7 +243,7 @@ notrace unsigned long syscall_exit_prepare(unsigned long r3, do_notify_resume(regs, ti_flags); } local_irq_disable(); - ti_flags = READ_ONCE(*ti_flagsp); + ti_flags = READ_ONCE(current_thread_info()->flags); } if (IS_ENABLED(CONFIG_PPC_BOOK3S) && IS_ENABLED(CONFIG_PPC_FPU)) { -- 2.25.0
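For anyone curious why caching the pointer costs a register, here is a user-space illustration of the compiler behaviour described above (not kernel code; 'base' stands in for the register that always holds 'current'):

    struct ti { unsigned long flags; };
    extern struct ti *base;
    extern void some_call(void);

    unsigned long cached(void)
    {
            unsigned long *p = &base->flags; /* p must survive some_call(), */
            some_call();                     /* so it occupies a non-volatile reg */
            return *p;
    }

    unsigned long rederived(void)
    {
            some_call();
            return base->flags;     /* re-derive from the base instead */
    }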
[PATCH v4 22/23] powerpc/syscall: Optimise checks in beginning of system_call_exception()
Combine all tests of regs->msr into a single logical one. Before the patch: 0: 81 6a 00 84 lwz r11,132(r10) 4: 90 6a 00 88 stw r3,136(r10) 8: 69 60 00 02 xorir0,r11,2 c: 54 00 ff fe rlwinm r0,r0,31,31,31 10: 0f 00 00 00 twnei r0,0 14: 69 63 40 00 xorir3,r11,16384 18: 54 63 97 fe rlwinm r3,r3,18,31,31 1c: 0f 03 00 00 twnei r3,0 20: 69 6b 80 00 xorir11,r11,32768 24: 55 6b 8f fe rlwinm r11,r11,17,31,31 28: 0f 0b 00 00 twnei r11,0 After the patch: 0: 81 6a 00 84 lwz r11,132(r10) 4: 90 6a 00 88 stw r3,136(r10) 8: 7d 6b 58 f8 not r11,r11 c: 71 6b c0 02 andi. r11,r11,49154 10: 0f 0b 00 00 twnei r11,0 6 cycles less on powerpc 8xx (328 => 322 cycles). Signed-off-by: Christophe Leroy --- arch/powerpc/kernel/syscall.c | 10 +++--- 1 file changed, 7 insertions(+), 3 deletions(-) diff --git a/arch/powerpc/kernel/syscall.c b/arch/powerpc/kernel/syscall.c index a40775daa88b..47ae55f94d1c 100644 --- a/arch/powerpc/kernel/syscall.c +++ b/arch/powerpc/kernel/syscall.c @@ -28,6 +28,7 @@ notrace long system_call_exception(long r3, long r4, long r5, unsigned long r0, struct pt_regs *regs) { syscall_fn f; + unsigned long expected_msr; regs->orig_gpr3 = r3; @@ -39,10 +40,13 @@ notrace long system_call_exception(long r3, long r4, long r5, trace_hardirqs_off(); /* finish reconciling */ + expected_msr = MSR_PR; if (!IS_ENABLED(CONFIG_BOOKE) && !IS_ENABLED(CONFIG_40x)) - BUG_ON(!(regs->msr & MSR_RI)); - BUG_ON(!(regs->msr & MSR_PR)); - BUG_ON(arch_irq_disabled_regs(regs)); + expected_msr |= MSR_RI; + if (IS_ENABLED(CONFIG_PPC32)) + expected_msr |= MSR_EE; + BUG_ON((regs->msr & expected_msr) ^ expected_msr); + BUG_ON(IS_ENABLED(CONFIG_PPC64) && arch_irq_disabled_regs(regs)); #ifdef CONFIG_PPC_PKEY if (mmu_has_feature(MMU_FTR_PKEY)) { -- 2.25.0
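For readers decoding the new test: (regs->msr & expected_msr) ^ expected_msr is non-zero exactly when at least one expected bit is clear in regs->msr, i.e. it is equivalent to (regs->msr & expected_msr) != expected_msr. The 49154 immediate in the generated andi. is 0xc002 = MSR_EE | MSR_PR | MSR_RI, which is how the three separate trap tests collapse into one.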
[PATCH v4 21/23] powerpc/syscall: Remove FULL_REGS verification in system_call_exception
For book3s/64, FULL_REGS() is 'true' at all times, so the test is a no-op. For the others, non-volatile registers are now saved unconditionally, so the verification is pointless. Should one fail to do it, it would anyway be caught by the CHECK_FULL_REGS() in copy_thread(), as we have removed the special versions ppc_fork() and friends. null_syscall benchmark reduction: 4 cycles (332 => 328 cycles) Signed-off-by: Christophe Leroy --- arch/powerpc/kernel/syscall.c | 1 - 1 file changed, 1 deletion(-) diff --git a/arch/powerpc/kernel/syscall.c b/arch/powerpc/kernel/syscall.c index 30f8a397a522..a40775daa88b 100644 --- a/arch/powerpc/kernel/syscall.c +++ b/arch/powerpc/kernel/syscall.c @@ -42,7 +42,6 @@ notrace long system_call_exception(long r3, long r4, long r5, if (!IS_ENABLED(CONFIG_BOOKE) && !IS_ENABLED(CONFIG_40x)) BUG_ON(!(regs->msr & MSR_RI)); BUG_ON(!(regs->msr & MSR_PR)); - BUG_ON(!FULL_REGS(regs)); BUG_ON(arch_irq_disabled_regs(regs)); #ifdef CONFIG_PPC_PKEY -- 2.25.0
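For context: FULL_REGS() tests the low bit of regs->trap, which the old ppc_fork()-style wrappers cleared once the non-volatile GPRs had been saved (the "clear LSB to indicate full" lines removed in patch 16). With those registers now saved unconditionally on syscall entry, that bit no longer carries any information on this path.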
[PATCH v4 20/23] powerpc/syscall: Do not check unsupported scv vector on PPC32
Only PPC64 has scv, so there is no need to check the 0x7ff0 trap on PPC32. Also ignore the scv parameter in syscall_exit_prepare(). (Saves 14 cycles: 346 => 332 cycles.) Signed-off-by: Christophe Leroy --- arch/powerpc/kernel/entry_32.S | 1 - arch/powerpc/kernel/syscall.c | 7 +-- 2 files changed, 5 insertions(+), 3 deletions(-) diff --git a/arch/powerpc/kernel/entry_32.S b/arch/powerpc/kernel/entry_32.S index 9922a04650f7..6ae9c7bcb06c 100644 --- a/arch/powerpc/kernel/entry_32.S +++ b/arch/powerpc/kernel/entry_32.S @@ -343,7 +343,6 @@ transfer_to_syscall: ret_from_syscall: addir4,r1,STACK_FRAME_OVERHEAD - li r5,0 bl syscall_exit_prepare #if defined(CONFIG_4xx) || defined(CONFIG_BOOKE) /* If the process has its own DBCR0 value, load it up. The internal diff --git a/arch/powerpc/kernel/syscall.c b/arch/powerpc/kernel/syscall.c index 476909b11051..30f8a397a522 100644 --- a/arch/powerpc/kernel/syscall.c +++ b/arch/powerpc/kernel/syscall.c @@ -86,7 +86,7 @@ notrace long system_call_exception(long r3, long r4, long r5, local_irq_enable(); if (unlikely(current_thread_info()->flags & _TIF_SYSCALL_DOTRACE)) { - if (unlikely(regs->trap == 0x7ff0)) { + if (IS_ENABLED(CONFIG_PPC64) && unlikely(regs->trap == 0x7ff0)) { /* Unsupported scv vector */ _exception(SIGILL, regs, ILL_ILLOPC, regs->nip); return regs->gpr[3]; @@ -109,7 +109,7 @@ notrace long system_call_exception(long r3, long r4, long r5, r8 = regs->gpr[8]; } else if (unlikely(r0 >= NR_syscalls)) { - if (unlikely(regs->trap == 0x7ff0)) { + if (IS_ENABLED(CONFIG_PPC64) && unlikely(regs->trap == 0x7ff0)) { /* Unsupported scv vector */ _exception(SIGILL, regs, ILL_ILLOPC, regs->nip); return regs->gpr[3]; @@ -187,6 +187,9 @@ notrace unsigned long syscall_exit_prepare(unsigned long r3, unsigned long ti_flags; unsigned long ret = 0; + if (IS_ENABLED(CONFIG_PPC32)) + scv = 0; + CT_WARN_ON(ct_state() == CONTEXT_USER); kuap_check(); -- 2.25.0
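For context: scv ("system call vectored") is a PPC64-only instruction introduced with ISA v3.0 (POWER9); the 0x7ff0 trap value is how the PPC64 entry code marks an scv issued with an unsupported vector number, a situation that cannot arise on PPC32.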
[PATCH v4 19/23] powerpc/syscall: Avoid stack frame in likely part of system_call_exception()
When r3 is not modified, reload it from regs->orig_r3 to free volatile registers. This avoids a stack frame for the likely part of system_call_exception() Before the patch: c000b4d4 : c000b4d4: 7c 08 02 a6 mflrr0 c000b4d8: 94 21 ff e0 stwur1,-32(r1) c000b4dc: 93 e1 00 1c stw r31,28(r1) c000b4e0: 90 01 00 24 stw r0,36(r1) c000b4e4: 90 6a 00 88 stw r3,136(r10) c000b4e8: 81 6a 00 84 lwz r11,132(r10) c000b4ec: 69 6b 00 02 xorir11,r11,2 c000b4f0: 55 6b ff fe rlwinm r11,r11,31,31,31 c000b4f4: 0f 0b 00 00 twnei r11,0 c000b4f8: 81 6a 00 a0 lwz r11,160(r10) c000b4fc: 55 6b 07 fe clrlwi r11,r11,31 c000b500: 0f 0b 00 00 twnei r11,0 c000b504: 7c 0c 42 e6 mftbr0 c000b508: 83 e2 00 08 lwz r31,8(r2) c000b50c: 81 82 00 28 lwz r12,40(r2) c000b510: 90 02 00 24 stw r0,36(r2) c000b514: 7d 8c f8 50 subfr12,r12,r31 c000b518: 7c 0c 02 14 add r0,r12,r0 c000b51c: 90 02 00 08 stw r0,8(r2) c000b520: 7c 10 13 a6 mtspr 80,r0 c000b524: 81 62 00 70 lwz r11,112(r2) c000b528: 71 60 86 91 andi. r0,r11,34449 c000b52c: 40 82 00 34 bne c000b560 c000b530: 2b 89 01 b6 cmplwi cr7,r9,438 c000b534: 41 9d 00 64 bgt cr7,c000b598 c000b538: 3d 40 c0 5c lis r10,-16292 c000b53c: 55 29 10 3a rlwinm r9,r9,2,0,29 c000b540: 39 4a 41 e8 addir10,r10,16872 c000b544: 80 01 00 24 lwz r0,36(r1) c000b548: 7d 2a 48 2e lwzxr9,r10,r9 c000b54c: 7c 08 03 a6 mtlrr0 c000b550: 7d 29 03 a6 mtctr r9 c000b554: 83 e1 00 1c lwz r31,28(r1) c000b558: 38 21 00 20 addir1,r1,32 c000b55c: 4e 80 04 20 bctr After the patch: c000b4d4 : c000b4d4: 81 6a 00 84 lwz r11,132(r10) c000b4d8: 90 6a 00 88 stw r3,136(r10) c000b4dc: 69 6b 00 02 xorir11,r11,2 c000b4e0: 55 6b ff fe rlwinm r11,r11,31,31,31 c000b4e4: 0f 0b 00 00 twnei r11,0 c000b4e8: 80 6a 00 a0 lwz r3,160(r10) c000b4ec: 54 63 07 fe clrlwi r3,r3,31 c000b4f0: 0f 03 00 00 twnei r3,0 c000b4f4: 7d 6c 42 e6 mftbr11 c000b4f8: 81 82 00 08 lwz r12,8(r2) c000b4fc: 80 02 00 28 lwz r0,40(r2) c000b500: 91 62 00 24 stw r11,36(r2) c000b504: 7c 00 60 50 subfr0,r0,r12 c000b508: 7d 60 5a 14 add r11,r0,r11 c000b50c: 91 62 00 08 stw r11,8(r2) c000b510: 7c 10 13 a6 mtspr 80,r0 c000b514: 80 62 00 70 lwz r3,112(r2) c000b518: 70 6b 86 91 andi. r11,r3,34449 c000b51c: 40 82 00 28 bne c000b544 c000b520: 2b 89 01 b6 cmplwi cr7,r9,438 c000b524: 41 9d 00 84 bgt cr7,c000b5a8 c000b528: 80 6a 00 88 lwz r3,136(r10) c000b52c: 3d 40 c0 5c lis r10,-16292 c000b530: 55 29 10 3a rlwinm r9,r9,2,0,29 c000b534: 39 4a 41 e4 addir10,r10,16868 c000b538: 7d 2a 48 2e lwzxr9,r10,r9 c000b53c: 7d 29 03 a6 mtctr r9 c000b540: 4e 80 04 20 bctr Signed-off-by: Christophe Leroy --- arch/powerpc/kernel/syscall.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/arch/powerpc/kernel/syscall.c b/arch/powerpc/kernel/syscall.c index a3510fa4e641..476909b11051 100644 --- a/arch/powerpc/kernel/syscall.c +++ b/arch/powerpc/kernel/syscall.c @@ -115,6 +115,9 @@ notrace long system_call_exception(long r3, long r4, long r5, return regs->gpr[3]; } return -ENOSYS; + } else { + /* Restore r3 from orig_gpr3 to free up a volatile reg */ + r3 = regs->orig_gpr3; } /* May be faster to do array_index_nospec? */ -- 2.25.0
[PATCH v4 18/23] powerpc/32: Remove verification of MSR_PR on syscall in the ASM entry
system_call_exception() checks MSR_PR and BUGs if a syscall is issued from kernel mode. No need to handle it anymore from the ASM entry code. null_syscall reduction 2 cycles (348 => 346 cycles) Signed-off-by: Christophe Leroy --- arch/powerpc/kernel/entry_32.S | 30 -- arch/powerpc/kernel/head_32.h| 3 --- arch/powerpc/kernel/head_booke.h | 3 --- 3 files changed, 36 deletions(-) diff --git a/arch/powerpc/kernel/entry_32.S b/arch/powerpc/kernel/entry_32.S index ce5fdb23ed7c..9922a04650f7 100644 --- a/arch/powerpc/kernel/entry_32.S +++ b/arch/powerpc/kernel/entry_32.S @@ -416,36 +416,6 @@ ret_from_kernel_thread: li r3,0 b ret_from_syscall - /* -* System call was called from kernel. We get here with SRR1 in r9. -* Mark the exception as recoverable once we have retrieved SRR0, -* trap a warning and return ENOSYS with CR[SO] set. -*/ - .globl ret_from_kernel_syscall -ret_from_kernel_syscall: - mfspr r9, SPRN_SRR0 - mfspr r10, SPRN_SRR1 -#if !defined(CONFIG_4xx) && !defined(CONFIG_BOOKE) - LOAD_REG_IMMEDIATE(r11, MSR_KERNEL & ~(MSR_IR|MSR_DR)) - mtmsr r11 -#endif - -0: trap - EMIT_BUG_ENTRY 0b,__FILE__,__LINE__, BUGFLAG_WARNING - - li r3, ENOSYS - crset so -#if defined(CONFIG_PPC_8xx) && defined(CONFIG_PERF_EVENTS) - mtspr SPRN_NRI, r0 -#endif - mtspr SPRN_SRR0, r9 - mtspr SPRN_SRR1, r10 - rfi -#ifdef CONFIG_40x - b . /* Prevent prefetch past rfi */ -#endif -_ASM_NOKPROBE_SYMBOL(ret_from_kernel_syscall) - /* * Top-level page fault handling. * This is in assembler because if do_page_fault tells us that diff --git a/arch/powerpc/kernel/head_32.h b/arch/powerpc/kernel/head_32.h index c2aa0d8f1f63..c0de4acbe3f8 100644 --- a/arch/powerpc/kernel/head_32.h +++ b/arch/powerpc/kernel/head_32.h @@ -118,8 +118,6 @@ .macro SYSCALL_ENTRY trapno mfspr r9, SPRN_SRR1 mfspr r10, SPRN_SRR0 - andi. r11, r9, MSR_PR - beq-99f LOAD_REG_IMMEDIATE(r11, MSR_KERNEL) /* can take exceptions */ lis r12, 1f@h ori r12, r12, 1f@l @@ -176,7 +174,6 @@ 3: #endif b transfer_to_syscall /* jump to handler */ -99:b ret_from_kernel_syscall .endm .macro save_dar_dsisr_on_stack reg1, reg2, sp diff --git a/arch/powerpc/kernel/head_booke.h b/arch/powerpc/kernel/head_booke.h index faff094b650e..7af84e1e717b 100644 --- a/arch/powerpc/kernel/head_booke.h +++ b/arch/powerpc/kernel/head_booke.h @@ -106,10 +106,8 @@ ALT_FTR_SECTION_END_IFSET(CPU_FTR_EMB_HV) #endif mfspr r9, SPRN_SRR1 BOOKE_CLEAR_BTB(r11) - andi. r11, r9, MSR_PR lwz r11, TASK_STACK - THREAD(r10) rlwinm r12,r12,0,4,2 /* Clear SO bit in CR */ - beq-99f ALLOC_STACK_FRAME(r11, THREAD_SIZE - INT_FRAME_SIZE) stw r12, _CCR(r11) /* save various registers */ mflrr12 @@ -157,7 +155,6 @@ ALT_FTR_SECTION_END_IFSET(CPU_FTR_EMB_HV) 3: b transfer_to_syscall /* jump to handler */ -99:b ret_from_kernel_syscall .endm /* To handle the additional exception priority levels on 40x and Book-E -- 2.25.0
[PATCH v4 17/23] powerpc/syscall: implement system call entry/exit logic in C for PPC32
This is a port of the PPC64 C syscall entry/exit logic to PPC32. Performance-wise on 8xx: before: 304 cycles on null_syscall, after: 348 cycles on null_syscall. Signed-off-by: Christophe Leroy --- arch/powerpc/kernel/entry_32.S | 227 --- arch/powerpc/kernel/head_32.h| 16 --- arch/powerpc/kernel/head_booke.h | 15 -- 3 files changed, 29 insertions(+), 229 deletions(-) diff --git a/arch/powerpc/kernel/entry_32.S b/arch/powerpc/kernel/entry_32.S index 97dc28a68465..ce5fdb23ed7c 100644 --- a/arch/powerpc/kernel/entry_32.S +++ b/arch/powerpc/kernel/entry_32.S @@ -329,117 +329,22 @@ stack_ovf: _ASM_NOKPROBE_SYMBOL(stack_ovf) #endif -#ifdef CONFIG_TRACE_IRQFLAGS -trace_syscall_entry_irq_off: - /* -* Syscall shouldn't happen while interrupts are disabled, -* so let's do a warning here. -*/ -0: trap - EMIT_BUG_ENTRY 0b,__FILE__,__LINE__, BUGFLAG_WARNING - bl trace_hardirqs_on - - /* Now enable for real */ - LOAD_REG_IMMEDIATE(r10, MSR_KERNEL | MSR_EE) - mtmsr r10 - - REST_GPR(0, r1) - REST_4GPRS(3, r1) - REST_2GPRS(7, r1) - b DoSyscall -#endif /* CONFIG_TRACE_IRQFLAGS */ - .globl transfer_to_syscall transfer_to_syscall: SAVE_NVGPRS(r1) #ifdef CONFIG_PPC_BOOK3S_32 kuep_lock r11, r12 #endif -#ifdef CONFIG_TRACE_IRQFLAGS - andi. r12,r9,MSR_EE - beq-trace_syscall_entry_irq_off -#endif /* CONFIG_TRACE_IRQFLAGS */ -/* - * Handle a system call. - */ - .stabs "arch/powerpc/kernel/",N_SO,0,0,0f - .stabs "entry_32.S",N_SO,0,0,0f -0: - -_GLOBAL(DoSyscall) - stw r3,ORIG_GPR3(r1) - li r12,0 - stw r12,RESULT(r1) -#ifdef CONFIG_TRACE_IRQFLAGS - /* Make sure interrupts are enabled */ - mfmsr r11 - andi. r12,r11,MSR_EE - /* We came in with interrupts disabled, we WARN and mark them enabled -* for lockdep now */ -0: tweqi r12, 0 - EMIT_BUG_ENTRY 0b,__FILE__,__LINE__, BUGFLAG_WARNING -#endif /* CONFIG_TRACE_IRQFLAGS */ - lwz r11,TI_FLAGS(r2) - andi. r11,r11,_TIF_SYSCALL_DOTRACE - bne-syscall_dotrace -syscall_dotrace_cont: - cmplwi 0,r0,NR_syscalls - lis r10,sys_call_table@h - ori r10,r10,sys_call_table@l - slwir0,r0,2 - bge-66f - - barrier_nospec_asm - /* -* Prevent the load of the handler below (based on the user-passed -* system call number) being speculatively executed until the test -* against NR_syscalls and branch to .66f above has -* committed. -*/ + /* Calling convention has r9 = orig r0, r10 = regs */ + mr r9,r0 + addir10,r1,STACK_FRAME_OVERHEAD + bl system_call_exception - lwzxr10,r10,r0 /* Fetch system call handler [ptr] */ - mtlrr10 - addir9,r1,STACK_FRAME_OVERHEAD - PPC440EP_ERR42 - blrl/* Call handler */ - .globl ret_from_syscall ret_from_syscall: -#ifdef CONFIG_DEBUG_RSEQ - /* Check whether the syscall is issued inside a restartable sequence */ - stw r3,GPR3(r1) - addir3,r1,STACK_FRAME_OVERHEAD - bl rseq_syscall - lwz r3,GPR3(r1) -#endif - mr r6,r3 - /* disable interrupts so current_thread_info()->flags can't change */ - LOAD_REG_IMMEDIATE(r10,MSR_KERNEL) /* doesn't include MSR_EE */ - /* Note: We don't bother telling lockdep about it */ - mtmsr r10 - lwz r9,TI_FLAGS(r2) - li r8,-MAX_ERRNO - andi. r0,r9,(_TIF_SYSCALL_DOTRACE|_TIF_SINGLESTEP|_TIF_USER_WORK_MASK|_TIF_PERSYSCALL_MASK) - bne-syscall_exit_work - cmplw 0,r3,r8 - blt+syscall_exit_cont - lwz r11,_CCR(r1)/* Load CR */ - neg r3,r3 - orisr11,r11,0x1000 /* Set SO bit in CR */ - stw r11,_CCR(r1) -syscall_exit_cont: - lwz r8,_MSR(r1) -#ifdef CONFIG_TRACE_IRQFLAGS - /* If we are going to return from the syscall with interrupts -* off, we trace that here. It shouldn't normally happen. -*/ - andi.
r10,r8,MSR_EE - bne+1f - stw r3,GPR3(r1) - bl trace_hardirqs_off - lwz r3,GPR3(r1) -1: -#endif /* CONFIG_TRACE_IRQFLAGS */ + addir4,r1,STACK_FRAME_OVERHEAD + li r5,0 + bl syscall_exit_prepare #if defined(CONFIG_4xx) || defined(CONFIG_BOOKE) /* If the process has its own DBCR0 value, load it up. The internal debug mode bit tells us that dbcr0 should be loaded. */ @@ -453,34 +358,39 @@ syscall_exit_cont: cmplwi cr0,r5,0 bne-2f #endif /* CONFIG_PPC_47x */ -1: -BEGIN_FTR_SECTION - lwarx r7,0,r1 -END_FTR_SECTION_IFSET(CPU_FTR_NEED_PAIRED_STWCX) - stwcx. r0,0,r1 /* to clear
[PATCH v4 16/23] powerpc/32: Always save non volatile GPRs at syscall entry
In preparation for porting syscall entry/exit to C, unconditionally save non-volatile general purpose registers. Commit 965dd3ad3076 ("powerpc/64/syscall: Remove non-volatile GPR save optimisation") provides a detailed explanation. This increases the number of cycles by 24 on 8xx with the null_syscall benchmark (280 => 304 cycles) Signed-off-by: Christophe Leroy --- arch/powerpc/kernel/entry_32.S | 46 +--- arch/powerpc/kernel/head_32.h| 2 +- arch/powerpc/kernel/head_booke.h | 2 +- arch/powerpc/kernel/syscalls/syscall.tbl | 20 +++ 4 files changed, 8 insertions(+), 62 deletions(-) diff --git a/arch/powerpc/kernel/entry_32.S b/arch/powerpc/kernel/entry_32.S index b1e36602c013..97dc28a68465 100644 --- a/arch/powerpc/kernel/entry_32.S +++ b/arch/powerpc/kernel/entry_32.S @@ -351,6 +351,7 @@ trace_syscall_entry_irq_off: .globl transfer_to_syscall transfer_to_syscall: + SAVE_NVGPRS(r1) #ifdef CONFIG_PPC_BOOK3S_32 kuep_lock r11, r12 #endif @@ -614,51 +615,6 @@ ret_from_kernel_syscall: #endif _ASM_NOKPROBE_SYMBOL(ret_from_kernel_syscall) -/* - * The fork/clone functions need to copy the full register set into - * the child process. Therefore we need to save all the nonvolatile - * registers (r13 - r31) before calling the C code. - */ - .globl ppc_fork -ppc_fork: - SAVE_NVGPRS(r1) - lwz r0,_TRAP(r1) - rlwinm r0,r0,0,0,30/* clear LSB to indicate full */ - stw r0,_TRAP(r1)/* register set saved */ - b sys_fork - - .globl ppc_vfork -ppc_vfork: - SAVE_NVGPRS(r1) - lwz r0,_TRAP(r1) - rlwinm r0,r0,0,0,30/* clear LSB to indicate full */ - stw r0,_TRAP(r1)/* register set saved */ - b sys_vfork - - .globl ppc_clone -ppc_clone: - SAVE_NVGPRS(r1) - lwz r0,_TRAP(r1) - rlwinm r0,r0,0,0,30/* clear LSB to indicate full */ - stw r0,_TRAP(r1)/* register set saved */ - b sys_clone - - .globl ppc_clone3 -ppc_clone3: - SAVE_NVGPRS(r1) - lwz r0,_TRAP(r1) - rlwinm r0,r0,0,0,30/* clear LSB to indicate full */ - stw r0,_TRAP(r1)/* register set saved */ - b sys_clone3 - - .globl ppc_swapcontext -ppc_swapcontext: - SAVE_NVGPRS(r1) - lwz r0,_TRAP(r1) - rlwinm r0,r0,0,0,30/* clear LSB to indicate full */ - stw r0,_TRAP(r1)/* register set saved */ - b sys_swapcontext - /* * Top-level page fault handling. 
* This is in assembler because if do_page_fault tells us that diff --git a/arch/powerpc/kernel/head_32.h b/arch/powerpc/kernel/head_32.h index 24dc326e0d56..7b12736ec546 100644 --- a/arch/powerpc/kernel/head_32.h +++ b/arch/powerpc/kernel/head_32.h @@ -148,7 +148,7 @@ stw r2,GPR2(r11) addir10,r10,STACK_FRAME_REGS_MARKER@l stw r9,_MSR(r11) - li r2, \trapno + 1 + li r2, \trapno stw r10,8(r11) stw r2,_TRAP(r11) SAVE_GPR(0, r11) diff --git a/arch/powerpc/kernel/head_booke.h b/arch/powerpc/kernel/head_booke.h index b3c502c503a0..626e716576ce 100644 --- a/arch/powerpc/kernel/head_booke.h +++ b/arch/powerpc/kernel/head_booke.h @@ -124,7 +124,7 @@ ALT_FTR_SECTION_END_IFSET(CPU_FTR_EMB_HV) stw r2,GPR2(r11) addir12, r12, STACK_FRAME_REGS_MARKER@l stw r9,_MSR(r11) - li r2, \trapno + 1 + li r2, \trapno stw r12, 8(r11) stw r2,_TRAP(r11) SAVE_GPR(0, r11) diff --git a/arch/powerpc/kernel/syscalls/syscall.tbl b/arch/powerpc/kernel/syscalls/syscall.tbl index f744eb5cba88..96b2157f0371 100644 --- a/arch/powerpc/kernel/syscalls/syscall.tbl +++ b/arch/powerpc/kernel/syscalls/syscall.tbl @@ -9,9 +9,7 @@ # 0 nospu restart_syscall sys_restart_syscall 1 nospu exitsys_exit -2 32 forkppc_fork sys_fork -2 64 forksys_fork -2 spu forksys_ni_syscall +2 nospu forksys_fork 3 common readsys_read 4 common write sys_write 5 common opensys_open compat_sys_open @@ -160,9 +158,7 @@ 11932 sigreturn sys_sigreturn compat_sys_sigreturn 11964 sigreturn sys_ni_syscall 119spu sigreturn sys_ni_syscall -12032 clone ppc_clone sys_clone -12064 clone sys_clone -120spu clone
[PATCH v4 15/23] powerpc/syscall: Change condition to check MSR_RI
In system_call_exception(), MSR_RI also needs to be checked on 8xx. Only booke and 40x don't have MSR_RI. Signed-off-by: Christophe Leroy --- arch/powerpc/kernel/syscall.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/powerpc/kernel/syscall.c b/arch/powerpc/kernel/syscall.c index b66cfcbcb755..a3510fa4e641 100644 --- a/arch/powerpc/kernel/syscall.c +++ b/arch/powerpc/kernel/syscall.c @@ -39,7 +39,7 @@ notrace long system_call_exception(long r3, long r4, long r5, trace_hardirqs_off(); /* finish reconciling */ - if (IS_ENABLED(CONFIG_PPC_BOOK3S)) + if (!IS_ENABLED(CONFIG_BOOKE) && !IS_ENABLED(CONFIG_40x)) BUG_ON(!(regs->msr & MSR_RI)); BUG_ON(!(regs->msr & MSR_PR)); BUG_ON(!FULL_REGS(regs)); -- 2.25.0
[PATCH v4 14/23] powerpc/syscall: Save r3 in regs->orig_r3
Save r3 in regs->orig_r3 in system_call_exception() Signed-off-by: Christophe Leroy --- arch/powerpc/kernel/entry_64.S | 1 - arch/powerpc/kernel/syscall.c | 2 ++ 2 files changed, 2 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S index aa1af139d947..a562a4240aa6 100644 --- a/arch/powerpc/kernel/entry_64.S +++ b/arch/powerpc/kernel/entry_64.S @@ -278,7 +278,6 @@ END_BTB_FLUSH_SECTION std r10,_LINK(r1) std r11,_TRAP(r1) std r12,_CCR(r1) - std r3,ORIG_GPR3(r1) addir10,r1,STACK_FRAME_OVERHEAD ld r11,exception_marker@toc(r2) std r11,-16(r10)/* "regshere" marker */ diff --git a/arch/powerpc/kernel/syscall.c b/arch/powerpc/kernel/syscall.c index cb415170b8f2..b66cfcbcb755 100644 --- a/arch/powerpc/kernel/syscall.c +++ b/arch/powerpc/kernel/syscall.c @@ -29,6 +29,8 @@ notrace long system_call_exception(long r3, long r4, long r5, { syscall_fn f; + regs->orig_gpr3 = r3; + if (IS_ENABLED(CONFIG_PPC_IRQ_SOFT_MASK_DEBUG)) BUG_ON(irq_soft_mask_return() != IRQS_ALL_DISABLED); -- 2.25.0
[PATCH v4 13/23] powerpc/syscall: Use is_compat_task()
Instead of hard comparing task flags with _TIF_32BIT, use is_compat_task(). The advantage is that it returns 0 on PPC32 although _TIF_32BIT is always set. Signed-off-by: Christophe Leroy --- arch/powerpc/kernel/syscall.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/kernel/syscall.c b/arch/powerpc/kernel/syscall.c index bf9bf4b5bc41..cb415170b8f2 100644 --- a/arch/powerpc/kernel/syscall.c +++ b/arch/powerpc/kernel/syscall.c @@ -2,6 +2,8 @@ #include #include +#include + #include #include #include @@ -116,7 +118,7 @@ notrace long system_call_exception(long r3, long r4, long r5, /* May be faster to do array_index_nospec? */ barrier_nospec(); - if (unlikely(is_32bit_task())) { + if (unlikely(is_compat_task())) { f = (void *)compat_sys_call_table[r0]; r3 &= 0x00000000ffffffffULL; -- 2.25.0
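For reference, this works because when CONFIG_COMPAT is not set (always the case on PPC32 here), linux/compat.h provides a stub along these lines, so the compiler can drop the whole compat branch at build time (an illustrative paraphrase, not a verbatim copy of the header):

    static inline bool is_compat_task(void)
    {
            return false;
    }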
[PATCH v4 12/23] powerpc/syscall: Make syscall.c buildable on PPC32
ifdef out specific PPC64 stuff to allow building syscall.c on PPC32. Modify Makefile to always build syscall.o Signed-off-by: Christophe Leroy --- arch/powerpc/kernel/Makefile | 4 ++-- arch/powerpc/kernel/syscall.c | 9 + 2 files changed, 7 insertions(+), 6 deletions(-) diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile index 1cbc51fc82fd..23c127db0d0c 100644 --- a/arch/powerpc/kernel/Makefile +++ b/arch/powerpc/kernel/Makefile @@ -46,10 +46,10 @@ obj-y := cputable.o syscalls.o \ prom.o traps.o setup-common.o \ udbg.o misc.o io.o misc_$(BITS).o \ of_platform.o prom_parse.o firmware.o \ - hw_breakpoint_constraints.o + hw_breakpoint_constraints.o syscall.o obj-y += ptrace/ obj-$(CONFIG_PPC64)+= setup_64.o \ - paca.o nvram_64.o note.o syscall.o + paca.o nvram_64.o note.o obj-$(CONFIG_COMPAT) += sys_ppc32.o signal_32.o obj-$(CONFIG_VDSO32) += vdso32/ obj-$(CONFIG_PPC_WATCHDOG) += watchdog.o diff --git a/arch/powerpc/kernel/syscall.c b/arch/powerpc/kernel/syscall.c index b627a6384029..bf9bf4b5bc41 100644 --- a/arch/powerpc/kernel/syscall.c +++ b/arch/powerpc/kernel/syscall.c @@ -39,7 +39,7 @@ notrace long system_call_exception(long r3, long r4, long r5, BUG_ON(!(regs->msr & MSR_RI)); BUG_ON(!(regs->msr & MSR_PR)); BUG_ON(!FULL_REGS(regs)); - BUG_ON(regs->softe != IRQS_ENABLED); + BUG_ON(arch_irq_disabled_regs(regs)); #ifdef CONFIG_PPC_PKEY if (mmu_has_feature(MMU_FTR_PKEY)) { @@ -77,7 +77,7 @@ notrace long system_call_exception(long r3, long r4, long r5, * frame, or if the unwinder was taught the first stack frame always * returns to user with IRQS_ENABLED, this store could be avoided! */ - regs->softe = IRQS_ENABLED; + irq_soft_mask_regs_set_state(regs, IRQS_ENABLED); local_irq_enable(); @@ -147,6 +147,7 @@ static notrace inline bool prep_irq_for_enabled_exit(bool clear_ri) __hard_EE_RI_disable(); else __hard_irq_disable(); +#ifdef CONFIG_PPC64 if (unlikely(lazy_irq_pending_nocheck())) { /* Took an interrupt, may have more exit work to do. */ if (clear_ri) @@ -158,7 +159,7 @@ static notrace inline bool prep_irq_for_enabled_exit(bool clear_ri) } local_paca->irq_happened = 0; irq_soft_mask_set(IRQS_ENABLED); - +#endif return true; } @@ -281,7 +282,7 @@ notrace unsigned long syscall_exit_prepare(unsigned long r3, return ret; } -#ifdef CONFIG_PPC_BOOK3S /* BOOK3E not yet using this */ +#ifdef CONFIG_PPC_BOOK3S_64 /* BOOK3E not yet using this */ notrace unsigned long interrupt_exit_user_prepare(struct pt_regs *regs, unsigned long msr) { #ifdef CONFIG_PPC_BOOK3E -- 2.25.0
[PATCH v4 11/23] powerpc/syscall: Rename syscall_64.c into syscall.c
syscall_64.c will be reused almost as is for PPC32. Rename it syscall.c Signed-off-by: Christophe Leroy --- arch/powerpc/kernel/Makefile| 2 +- arch/powerpc/kernel/{syscall_64.c => syscall.c} | 0 2 files changed, 1 insertion(+), 1 deletion(-) rename arch/powerpc/kernel/{syscall_64.c => syscall.c} (100%) diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile index fe2ef598e2ea..1cbc51fc82fd 100644 --- a/arch/powerpc/kernel/Makefile +++ b/arch/powerpc/kernel/Makefile @@ -49,7 +49,7 @@ obj-y := cputable.o syscalls.o \ hw_breakpoint_constraints.o obj-y += ptrace/ obj-$(CONFIG_PPC64)+= setup_64.o \ - paca.o nvram_64.o note.o syscall_64.o + paca.o nvram_64.o note.o syscall.o obj-$(CONFIG_COMPAT) += sys_ppc32.o signal_32.o obj-$(CONFIG_VDSO32) += vdso32/ obj-$(CONFIG_PPC_WATCHDOG) += watchdog.o diff --git a/arch/powerpc/kernel/syscall_64.c b/arch/powerpc/kernel/syscall.c similarity index 100% rename from arch/powerpc/kernel/syscall_64.c rename to arch/powerpc/kernel/syscall.c -- 2.25.0
[PATCH v4 10/23] powerpc/irq: Add stub irq_soft_mask_return() for PPC32
To allow building syscall_64.c smoothly on PPC32, add stub version of irq_soft_mask_return(). Signed-off-by: Christophe Leroy --- arch/powerpc/include/asm/hw_irq.h | 5 + 1 file changed, 5 insertions(+) diff --git a/arch/powerpc/include/asm/hw_irq.h b/arch/powerpc/include/asm/hw_irq.h index 4739f61e632c..56a98936a6a9 100644 --- a/arch/powerpc/include/asm/hw_irq.h +++ b/arch/powerpc/include/asm/hw_irq.h @@ -330,6 +330,11 @@ static inline void irq_soft_mask_regs_set_state(struct pt_regs *regs, unsigned l } #else /* CONFIG_PPC64 */ +static inline notrace unsigned long irq_soft_mask_return(void) +{ + return 0; +} + static inline unsigned long arch_local_save_flags(void) { return mfmsr(); -- 2.25.0
[PATCH v4 09/23] powerpc/irq: Rework helpers that manipulate MSR[EE/RI]
In preparation of porting PPC32 to C syscall entry/exit, rewrite the following helpers as static inline functions and add support for PPC32 in them: __hard_irq_enable() __hard_irq_disable() __hard_EE_RI_disable() __hard_RI_enable() Then use them in PPC32 version of arch_local_irq_disable() and arch_local_irq_enable() to avoid code duplication. Signed-off-by: Christophe Leroy --- arch/powerpc/include/asm/hw_irq.h | 75 +-- arch/powerpc/include/asm/reg.h| 1 + 2 files changed, 52 insertions(+), 24 deletions(-) diff --git a/arch/powerpc/include/asm/hw_irq.h b/arch/powerpc/include/asm/hw_irq.h index ed0c3b049dfd..4739f61e632c 100644 --- a/arch/powerpc/include/asm/hw_irq.h +++ b/arch/powerpc/include/asm/hw_irq.h @@ -50,6 +50,55 @@ #ifndef __ASSEMBLY__ +static inline void __hard_irq_enable(void) +{ + if (IS_ENABLED(CONFIG_BOOKE) || IS_ENABLED(CONFIG_40x)) + wrtee(MSR_EE); + else if (IS_ENABLED(CONFIG_PPC_8xx)) + wrtspr(SPRN_EIE); + else if (IS_ENABLED(CONFIG_PPC_BOOK3S_64)) + __mtmsrd(MSR_EE | MSR_RI, 1); + else + mtmsr(mfmsr() | MSR_EE); +} + +static inline void __hard_irq_disable(void) +{ + if (IS_ENABLED(CONFIG_BOOKE) || IS_ENABLED(CONFIG_40x)) + wrtee(0); + else if (IS_ENABLED(CONFIG_PPC_8xx)) + wrtspr(SPRN_EID); + else if (IS_ENABLED(CONFIG_PPC_BOOK3S_64)) + __mtmsrd(MSR_RI, 1); + else + mtmsr(mfmsr() & ~MSR_EE); +} + +static inline void __hard_EE_RI_disable(void) +{ + if (IS_ENABLED(CONFIG_BOOKE) || IS_ENABLED(CONFIG_40x)) + wrtee(0); + else if (IS_ENABLED(CONFIG_PPC_8xx)) + wrtspr(SPRN_NRI); + else if (IS_ENABLED(CONFIG_PPC_BOOK3S_64)) + __mtmsrd(0, 1); + else + mtmsr(mfmsr() & ~(MSR_EE | MSR_RI)); +} + +static inline void __hard_RI_enable(void) +{ + if (IS_ENABLED(CONFIG_BOOKE) || IS_ENABLED(CONFIG_40x)) + return; + + if (IS_ENABLED(CONFIG_PPC_8xx)) + wrtspr(SPRN_EID); + else if (IS_ENABLED(CONFIG_PPC_BOOK3S_64)) + __mtmsrd(MSR_RI, 1); + else + mtmsr(mfmsr() | MSR_RI); +} + #ifdef CONFIG_PPC64 #include @@ -212,18 +261,6 @@ static inline bool arch_irqs_disabled(void) #endif /* CONFIG_PPC_BOOK3S */ -#ifdef CONFIG_PPC_BOOK3E -#define __hard_irq_enable()wrtee(MSR_EE) -#define __hard_irq_disable() wrtee(0) -#define __hard_EE_RI_disable() wrtee(0) -#define __hard_RI_enable() do { } while (0) -#else -#define __hard_irq_enable()__mtmsrd(MSR_EE|MSR_RI, 1) -#define __hard_irq_disable() __mtmsrd(MSR_RI, 1) -#define __hard_EE_RI_disable() __mtmsrd(0, 1) -#define __hard_RI_enable() __mtmsrd(MSR_RI, 1) -#endif - #define hard_irq_disable() do {\ unsigned long flags;\ __hard_irq_disable(); \ @@ -322,22 +359,12 @@ static inline unsigned long arch_local_irq_save(void) static inline void arch_local_irq_disable(void) { - if (IS_ENABLED(CONFIG_BOOKE)) - wrtee(0); - else if (IS_ENABLED(CONFIG_PPC_8xx)) - wrtspr(SPRN_EID); - else - mtmsr(mfmsr() & ~MSR_EE); + __hard_irq_disable(); } static inline void arch_local_irq_enable(void) { - if (IS_ENABLED(CONFIG_BOOKE)) - wrtee(MSR_EE); - else if (IS_ENABLED(CONFIG_PPC_8xx)) - wrtspr(SPRN_EIE); - else - mtmsr(mfmsr() | MSR_EE); + __hard_irq_enable(); } static inline bool arch_irqs_disabled_flags(unsigned long flags) diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h index e40a921d78f9..d05dca30604d 100644 --- a/arch/powerpc/include/asm/reg.h +++ b/arch/powerpc/include/asm/reg.h @@ -1375,6 +1375,7 @@ #define mtmsr(v) asm volatile("mtmsr %0" : \ : "r" ((unsigned long)(v)) \ : "memory") +#define __mtmsrd(v, l) BUILD_BUG() #define __MTMSR"mtmsr" #endif -- 2.25.0
[PATCH v4 05/23] powerpc/64s: Make kuap_check_amr() and kuap_get_and_check_amr() generic
In preparation of porting powerpc32 to C syscall entry/exit, rename kuap_check_amr() and kuap_get_and_check_amr() as kuap_check() and kuap_get_and_check(), and move in the generic asm/kup.h the stub for when CONFIG_PPC_KUAP is not selected. Signed-off-by: Christophe Leroy --- arch/powerpc/include/asm/book3s/64/kup.h | 24 ++-- arch/powerpc/include/asm/kup.h | 9 - arch/powerpc/kernel/syscall_64.c | 12 ++-- 3 files changed, 16 insertions(+), 29 deletions(-) diff --git a/arch/powerpc/include/asm/book3s/64/kup.h b/arch/powerpc/include/asm/book3s/64/kup.h index f50f72e535aa..1507681ad4ef 100644 --- a/arch/powerpc/include/asm/book3s/64/kup.h +++ b/arch/powerpc/include/asm/book3s/64/kup.h @@ -281,7 +281,7 @@ static inline void kuap_kernel_restore(struct pt_regs *regs, */ } -static inline unsigned long kuap_get_and_check_amr(void) +static inline unsigned long kuap_get_and_check(void) { if (mmu_has_feature(MMU_FTR_BOOK3S_KUAP)) { unsigned long amr = mfspr(SPRN_AMR); @@ -292,27 +292,7 @@ static inline unsigned long kuap_get_and_check_amr(void) return 0; } -#else /* CONFIG_PPC_PKEY */ - -static inline void kuap_user_restore(struct pt_regs *regs) -{ -} - -static inline void kuap_kernel_restore(struct pt_regs *regs, unsigned long amr) -{ -} - -static inline unsigned long kuap_get_and_check_amr(void) -{ - return 0; -} - -#endif /* CONFIG_PPC_PKEY */ - - -#ifdef CONFIG_PPC_KUAP - -static inline void kuap_check_amr(void) +static inline void kuap_check(void) { if (IS_ENABLED(CONFIG_PPC_KUAP_DEBUG) && mmu_has_feature(MMU_FTR_BOOK3S_KUAP)) WARN_ON_ONCE(mfspr(SPRN_AMR) != AMR_KUAP_BLOCKED); diff --git a/arch/powerpc/include/asm/kup.h b/arch/powerpc/include/asm/kup.h index bf221a2a523e..6ef9f9cfbed0 100644 --- a/arch/powerpc/include/asm/kup.h +++ b/arch/powerpc/include/asm/kup.h @@ -66,7 +66,14 @@ bad_kuap_fault(struct pt_regs *regs, unsigned long address, bool is_write) return false; } -static inline void kuap_check_amr(void) { } +static inline void kuap_check(void) { } +static inline void kuap_user_restore(struct pt_regs *regs) { } +static inline void kuap_kernel_restore(struct pt_regs *regs, unsigned long amr) { } + +static inline unsigned long kuap_get_and_check(void) +{ + return 0; +} /* * book3s/64/kup-radix.h defines these functions for the !KUAP case to flush diff --git a/arch/powerpc/kernel/syscall_64.c b/arch/powerpc/kernel/syscall_64.c index 32f72965da26..b627a6384029 100644 --- a/arch/powerpc/kernel/syscall_64.c +++ b/arch/powerpc/kernel/syscall_64.c @@ -65,7 +65,7 @@ notrace long system_call_exception(long r3, long r4, long r5, isync(); } else #endif - kuap_check_amr(); + kuap_check(); account_cpu_user_entry(); @@ -181,7 +181,7 @@ notrace unsigned long syscall_exit_prepare(unsigned long r3, CT_WARN_ON(ct_state() == CONTEXT_USER); - kuap_check_amr(); + kuap_check(); regs->result = r3; @@ -303,7 +303,7 @@ notrace unsigned long interrupt_exit_user_prepare(struct pt_regs *regs, unsigned * We don't need to restore AMR on the way back to userspace for KUAP. * AMR can only have been unlocked if we interrupted the kernel. 
*/ - kuap_check_amr(); + kuap_check(); local_irq_save(flags); @@ -381,7 +381,7 @@ notrace unsigned long interrupt_exit_kernel_prepare(struct pt_regs *regs, unsign unsigned long *ti_flagsp = &current_thread_info()->flags; unsigned long flags; unsigned long ret = 0; - unsigned long amr; + unsigned long kuap; if (IS_ENABLED(CONFIG_PPC_BOOK3S) && unlikely(!(regs->msr & MSR_RI))) unrecoverable_exception(regs); BUG_ON(regs->msr & MSR_PR); BUG_ON(!FULL_REGS(regs)); if (TRAP(regs) != 0x700) CT_WARN_ON(ct_state() == CONTEXT_USER); - amr = kuap_get_and_check_amr(); + kuap = kuap_get_and_check(); if (unlikely(*ti_flagsp & _TIF_EMULATE_STACK_STORE)) { clear_bits(_TIF_EMULATE_STACK_STORE, ti_flagsp); @@ -446,7 +446,7 @@ notrace unsigned long interrupt_exit_kernel_prepare(struct pt_regs *regs, unsign * which would cause Read-After-Write stalls. Hence, we take the AMR * value from the check above. */ - kuap_kernel_restore(regs, amr); + kuap_kernel_restore(regs, kuap); return ret; } -- 2.25.0
[PATCH v4 06/23] powerpc/32s: Create C version of kuap_user/kernel_restore() and friends
In preparation for porting PPC32 to C syscall entry/exit, create C versions of kuap_user_restore() and kuap_kernel_restore() and kuap_check() and kuap_get_and_check() on book3s/32. Signed-off-by: Christophe Leroy --- arch/powerpc/include/asm/book3s/32/kup.h | 33 1 file changed, 33 insertions(+) diff --git a/arch/powerpc/include/asm/book3s/32/kup.h b/arch/powerpc/include/asm/book3s/32/kup.h index a0117a9d5b06..a3e72e1141c5 100644 --- a/arch/powerpc/include/asm/book3s/32/kup.h +++ b/arch/powerpc/include/asm/book3s/32/kup.h @@ -103,6 +103,39 @@ static inline void kuap_update_sr(u32 sr, u32 addr, u32 end) isync();/* Context sync required after mtsrin() */ } +static inline void kuap_user_restore(struct pt_regs *regs) +{ +} + +static inline void kuap_kernel_restore(struct pt_regs *regs, unsigned long kuap) +{ + u32 addr = kuap & 0xf0000000; + u32 end = kuap << 28; + + if (unlikely(!kuap)) + return; + + current->thread.kuap = 0; + kuap_update_sr(mfsrin(addr) & ~SR_KS, addr, end); /* Clear Ks */ +} + +static inline void kuap_check(void) +{ + if (!IS_ENABLED(CONFIG_PPC_KUAP_DEBUG)) + return; + + WARN_ON_ONCE(current->thread.kuap != 0); +} + +static inline unsigned long kuap_get_and_check(void) +{ + unsigned long kuap = current->thread.kuap; + + WARN_ON_ONCE(IS_ENABLED(CONFIG_PPC_KUAP_DEBUG) && kuap != 0); + + return kuap; +} + static __always_inline void allow_user_access(void __user *to, const void __user *from, u32 size, unsigned long dir) { -- 2.25.0
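For context: on book3s/32, current->thread.kuap packs the currently unlocked user range into a single word, roughly with the start address's segment bits in the top nibble and the end segment in the bottom nibble (as set up by allow_user_access()). That is why kuap_kernel_restore() above recovers the range with "kuap & 0xf0000000" and "kuap << 28" before clearing Ks on the affected segment registers.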
[PATCH v4 08/23] powerpc/irq: Add helper to set regs->softe
regs->softe doesn't exist on PPC32. Add irq_soft_mask_regs_set_state() helper to set regs->softe. This helper will void on PPC32. Signed-off-by: Christophe Leroy --- arch/powerpc/include/asm/hw_irq.h | 11 +-- 1 file changed, 9 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/include/asm/hw_irq.h b/arch/powerpc/include/asm/hw_irq.h index 614957f74cee..ed0c3b049dfd 100644 --- a/arch/powerpc/include/asm/hw_irq.h +++ b/arch/powerpc/include/asm/hw_irq.h @@ -38,6 +38,8 @@ #define PACA_IRQ_MUST_HARD_MASK(PACA_IRQ_EE) #endif +#endif /* CONFIG_PPC64 */ + /* * flags for paca->irq_soft_mask */ @@ -46,8 +48,6 @@ #define IRQS_PMI_DISABLED 2 #define IRQS_ALL_DISABLED (IRQS_DISABLED | IRQS_PMI_DISABLED) -#endif /* CONFIG_PPC64 */ - #ifndef __ASSEMBLY__ #ifdef CONFIG_PPC64 @@ -287,6 +287,10 @@ extern void irq_set_pending_from_srr1(unsigned long srr1); extern void force_external_irq_replay(void); +static inline void irq_soft_mask_regs_set_state(struct pt_regs *regs, unsigned long val) +{ + regs->softe = val; +} #else /* CONFIG_PPC64 */ static inline unsigned long arch_local_save_flags(void) @@ -355,6 +359,9 @@ static inline bool arch_irq_disabled_regs(struct pt_regs *regs) static inline void may_hard_irq_enable(void) { } +static inline void irq_soft_mask_regs_set_state(struct pt_regs *regs, unsigned long val) +{ +} #endif /* CONFIG_PPC64 */ #define ARCH_IRQ_INIT_FLAGSIRQ_NOREQUEST -- 2.25.0
[PATCH v4 07/23] powerpc/8xx: Create C version of kuap_user/kernel_restore() and friends
In preparation of porting PPC32 to C syscall entry/exit, create C version of kuap_user_restore() and kuap_kernel_restore() and kuap_check() and kuap_get_and_check() on 8xx Signed-off-by: Christophe Leroy --- arch/powerpc/include/asm/nohash/32/kup-8xx.h | 27 1 file changed, 27 insertions(+) diff --git a/arch/powerpc/include/asm/nohash/32/kup-8xx.h b/arch/powerpc/include/asm/nohash/32/kup-8xx.h index 17a4a616436f..5ca6c375f767 100644 --- a/arch/powerpc/include/asm/nohash/32/kup-8xx.h +++ b/arch/powerpc/include/asm/nohash/32/kup-8xx.h @@ -34,6 +34,33 @@ #include +static inline void kuap_user_restore(struct pt_regs *regs) +{ +} + +static inline void kuap_kernel_restore(struct pt_regs *regs, unsigned long kuap) +{ + mtspr(SPRN_MD_AP, kuap); +} + +static inline void kuap_check(void) +{ + if (!IS_ENABLED(CONFIG_PPC_KUAP_DEBUG)) + return; + + WARN_ON_ONCE(mfspr(SPRN_MD_AP) >> 16 != MD_APG_KUAP >> 16); +} + +static inline unsigned long kuap_get_and_check(void) +{ + unsigned long kuap = mfspr(SPRN_MD_AP); + + if (IS_ENABLED(CONFIG_PPC_KUAP_DEBUG)) + WARN_ON_ONCE(mfspr(SPRN_MD_AP) >> 16 != MD_APG_KUAP >> 16); + + return kuap; +} + static inline void allow_user_access(void __user *to, const void __user *from, unsigned long size, unsigned long dir) { -- 2.25.0
[PATCH v4 02/23] powerpc/32: Always enable data translation on syscall entry
If the code can use a stack in vm area, it can also use a stack in linear space. Simplify code by removing old non VMAP stack code on PPC32 in syscall. That means the data translation is now re-enabled early in syscall entry in all cases, not only when using VMAP stacks. Signed-off-by: Christophe Leroy --- arch/powerpc/kernel/head_32.h| 23 +-- arch/powerpc/kernel/head_booke.h | 2 -- 2 files changed, 1 insertion(+), 24 deletions(-) diff --git a/arch/powerpc/kernel/head_32.h b/arch/powerpc/kernel/head_32.h index a2f72c966baf..fdc07beab844 100644 --- a/arch/powerpc/kernel/head_32.h +++ b/arch/powerpc/kernel/head_32.h @@ -118,7 +118,6 @@ .macro SYSCALL_ENTRY trapno mfspr r12,SPRN_SPRG_THREAD mfspr r9, SPRN_SRR1 -#ifdef CONFIG_VMAP_STACK mfspr r11, SPRN_SRR0 mtctr r11 andi. r11, r9, MSR_PR @@ -126,30 +125,16 @@ lwz r1,TASK_STACK-THREAD(r12) beq-99f addir1, r1, THREAD_SIZE - INT_FRAME_SIZE - li r10, MSR_KERNEL & ~(MSR_IR | MSR_RI) /* can take DTLB miss */ + LOAD_REG_IMMEDIATE(r10, MSR_KERNEL & ~(MSR_IR | MSR_RI)) /* can take DTLB miss */ mtmsr r10 isync tovirt(r12, r12) stw r11,GPR1(r1) stw r11,0(r1) mr r11, r1 -#else - andi. r11, r9, MSR_PR - lwz r11,TASK_STACK-THREAD(r12) - beq-99f - addir11, r11, THREAD_SIZE - INT_FRAME_SIZE - tophys(r11, r11) - stw r1,GPR1(r11) - stw r1,0(r11) - tovirt(r1, r11) /* set new kernel sp */ -#endif mflrr10 stw r10, _LINK(r11) -#ifdef CONFIG_VMAP_STACK mfctr r10 -#else - mfspr r10,SPRN_SRR0 -#endif stw r10,_NIP(r11) mfcrr10 rlwinm r10,r10,0,4,2 /* Clear SO bit in CR */ @@ -157,11 +142,7 @@ #ifdef CONFIG_40x rlwinm r9,r9,0,14,12 /* clear MSR_WE (necessary?) */ #else -#ifdef CONFIG_VMAP_STACK LOAD_REG_IMMEDIATE(r10, MSR_KERNEL & ~MSR_IR) /* can take exceptions */ -#else - LOAD_REG_IMMEDIATE(r10, MSR_KERNEL & ~(MSR_IR|MSR_DR)) /* can take exceptions */ -#endif mtmsr r10 /* (except for mach check in rtas) */ #endif lis r10,STACK_FRAME_REGS_MARKER@ha /* exception frame marker */ @@ -190,7 +171,6 @@ li r12,-1 /* clear all pending debug events */ mtspr SPRN_DBSR,r12 lis r11,global_dbcr0@ha - tophys(r11,r11) addir11,r11,global_dbcr0@l lwz r12,0(r11) mtspr SPRN_DBCR0,r12 @@ -200,7 +180,6 @@ #endif 3: - tovirt_novmstack r2, r2 /* set r2 to current */ lis r11, transfer_to_syscall@h ori r11, r11, transfer_to_syscall@l #ifdef CONFIG_TRACE_IRQFLAGS diff --git a/arch/powerpc/kernel/head_booke.h b/arch/powerpc/kernel/head_booke.h index bf33af714d11..706cd9368992 100644 --- a/arch/powerpc/kernel/head_booke.h +++ b/arch/powerpc/kernel/head_booke.h @@ -144,7 +144,6 @@ ALT_FTR_SECTION_END_IFSET(CPU_FTR_EMB_HV) li r12,-1 /* clear all pending debug events */ mtspr SPRN_DBSR,r12 lis r11,global_dbcr0@ha - tophys(r11,r11) addir11,r11,global_dbcr0@l #ifdef CONFIG_SMP lwz r10, TASK_CPU(r2) @@ -158,7 +157,6 @@ ALT_FTR_SECTION_END_IFSET(CPU_FTR_EMB_HV) stw r12,4(r11) 3: - tovirt(r2, r2) /* set r2 to current */ lis r11, transfer_to_syscall@h ori r11, r11, transfer_to_syscall@l #ifdef CONFIG_TRACE_IRQFLAGS -- 2.25.0
[PATCH v4 00/23] powerpc/32: Implement C syscall entry/exit
This series implements C syscall entry/exit for PPC32. It reuses the work already done for PPC64. This series is based on Nick's v6 series "powerpc: interrupt wrappers". Patch 1 is a bug fix submitted separately but this series depends on it. Patches 2-4 are an extract from the series "powerpc/32: Reduce head complexity and re-activate MMU earlier". The changes here are limited to system calls. That series will be respun to only contain exception related changes and the syscall changes will remain in this series. Patches 5-16 are preparatory changes. Patch 17 is THE patch that switches to C syscall entry/exit. Patches 18-23 are optimisations. In terms of performance we have the following number of cycles on an 8xx running the null_syscall benchmark: - mainline: 296 cycles - after patch 4: 283 cycles - after patch 16: 304 cycles - after patch 17: 348 cycles - at the end of the series: 320 cycles So in summary, we have a degradation of performance of 8% on null_syscall. I think it is not a big degradation; it is worth it. v4 is the first mature version. Christophe Leroy (23): powerpc/32s: Add missing call to kuep_lock on syscall entry powerpc/32: Always enable data translation on syscall entry powerpc/32: On syscall entry, enable instruction translation at the same time as data powerpc/32: Reorder instructions to avoid using CTR in syscall entry powerpc/64s: Make kuap_check_amr() and kuap_get_and_check_amr() generic powerpc/32s: Create C version of kuap_user/kernel_restore() and friends powerpc/8xx: Create C version of kuap_user/kernel_restore() and friends powerpc/irq: Add helper to set regs->softe powerpc/irq: Rework helpers that manipulate MSR[EE/RI] powerpc/irq: Add stub irq_soft_mask_return() for PPC32 powerpc/syscall: Rename syscall_64.c into syscall.c powerpc/syscall: Make syscall.c buildable on PPC32 powerpc/syscall: Use is_compat_task() powerpc/syscall: Save r3 in regs->orig_r3 powerpc/syscall: Change condition to check MSR_RI powerpc/32: Always save non volatile GPRs at syscall entry powerpc/syscall: implement system call entry/exit logic in C for PPC32 powerpc/32: Remove verification of MSR_PR on syscall in the ASM entry powerpc/syscall: Avoid stack frame in likely part of system_call_exception() powerpc/syscall: Do not check unsupported scv vector on PPC32 powerpc/syscall: Remove FULL_REGS verification in system_call_exception powerpc/syscall: Optimise checks in beginning of system_call_exception() powerpc/syscall: Avoid storing 'current' in another pointer arch/powerpc/include/asm/book3s/32/kup.h | 33 ++ arch/powerpc/include/asm/book3s/64/kup.h | 24 +- arch/powerpc/include/asm/hw_irq.h | 91 -- arch/powerpc/include/asm/kup.h| 9 +- arch/powerpc/include/asm/nohash/32/kup-8xx.h | 27 ++ arch/powerpc/include/asm/reg.h| 1 + arch/powerpc/kernel/Makefile | 4 +- arch/powerpc/kernel/entry_32.S| 305 ++ arch/powerpc/kernel/entry_64.S| 1 - arch/powerpc/kernel/head_32.h | 76 + arch/powerpc/kernel/head_booke.h | 27 +- .../kernel/{syscall_64.c => syscall.c}| 57 ++-- arch/powerpc/kernel/syscalls/syscall.tbl | 20 +- 13 files changed, 225 insertions(+), 450 deletions(-) rename arch/powerpc/kernel/{syscall_64.c => syscall.c} (90%) -- 2.25.0
[PATCH v4 03/23] powerpc/32: On syscall entry, enable instruction translation at the same time as data
On 40x and 8xx, kernel text is pinned. On book3s/32, kernel text is mapped by BATs. Enable instruction translation at the same time as data translation, it makes things simpler. MSR_RI can also be set at the same time because srr0/srr1 are already saved and r1 is set properly. On booke, translation is always on, so at the end all PPC32 have translation on early. This reduces null_syscall benchmark by 13 cycles on 8xx (296 ==> 283 cycles). Signed-off-by: Christophe Leroy --- arch/powerpc/kernel/head_32.h| 26 +- arch/powerpc/kernel/head_booke.h | 7 ++- 2 files changed, 11 insertions(+), 22 deletions(-) diff --git a/arch/powerpc/kernel/head_32.h b/arch/powerpc/kernel/head_32.h index fdc07beab844..4029c51dce5d 100644 --- a/arch/powerpc/kernel/head_32.h +++ b/arch/powerpc/kernel/head_32.h @@ -125,9 +125,13 @@ lwz r1,TASK_STACK-THREAD(r12) beq-99f addir1, r1, THREAD_SIZE - INT_FRAME_SIZE - LOAD_REG_IMMEDIATE(r10, MSR_KERNEL & ~(MSR_IR | MSR_RI)) /* can take DTLB miss */ - mtmsr r10 - isync + LOAD_REG_IMMEDIATE(r10, MSR_KERNEL) /* can take exceptions */ + mtspr SPRN_SRR1, r10 + lis r10, 1f@h + ori r10, r10, 1f@l + mtspr SPRN_SRR0, r10 + rfi +1: tovirt(r12, r12) stw r11,GPR1(r1) stw r11,0(r1) @@ -141,9 +145,6 @@ stw r10,_CCR(r11) /* save registers */ #ifdef CONFIG_40x rlwinm r9,r9,0,14,12 /* clear MSR_WE (necessary?) */ -#else - LOAD_REG_IMMEDIATE(r10, MSR_KERNEL & ~MSR_IR) /* can take exceptions */ - mtmsr r10 /* (except for mach check in rtas) */ #endif lis r10,STACK_FRAME_REGS_MARKER@ha /* exception frame marker */ stw r2,GPR2(r11) @@ -180,8 +181,6 @@ #endif 3: - lis r11, transfer_to_syscall@h - ori r11, r11, transfer_to_syscall@l #ifdef CONFIG_TRACE_IRQFLAGS /* * If MSR is changing we need to keep interrupts disabled at this point @@ -193,15 +192,8 @@ #else LOAD_REG_IMMEDIATE(r10, MSR_KERNEL | MSR_EE) #endif -#if defined(CONFIG_PPC_8xx) && defined(CONFIG_PERF_EVENTS) - mtspr SPRN_NRI, r0 -#endif - mtspr SPRN_SRR1,r10 - mtspr SPRN_SRR0,r11 - rfi /* jump to handler, enable MMU */ -#ifdef CONFIG_40x - b . /* Prevent prefetch past rfi */ -#endif + mtmsr r10 + b transfer_to_syscall /* jump to handler */ 99:b ret_from_kernel_syscall .endm diff --git a/arch/powerpc/kernel/head_booke.h b/arch/powerpc/kernel/head_booke.h index 706cd9368992..b3c502c503a0 100644 --- a/arch/powerpc/kernel/head_booke.h +++ b/arch/powerpc/kernel/head_booke.h @@ -157,8 +157,6 @@ ALT_FTR_SECTION_END_IFSET(CPU_FTR_EMB_HV) stw r12,4(r11) 3: - lis r11, transfer_to_syscall@h - ori r11, r11, transfer_to_syscall@l #ifdef CONFIG_TRACE_IRQFLAGS /* * If MSR is changing we need to keep interrupts disabled at this point @@ -172,9 +170,8 @@ ALT_FTR_SECTION_END_IFSET(CPU_FTR_EMB_HV) lis r10, (MSR_KERNEL | MSR_EE)@h ori r10, r10, (MSR_KERNEL | MSR_EE)@l #endif - mtspr SPRN_SRR1,r10 - mtspr SPRN_SRR0,r11 - rfi /* jump to handler, enable MMU */ + mtmsr r10 + b transfer_to_syscall /* jump to handler */ 99:b ret_from_kernel_syscall .endm -- 2.25.0
[PATCH v4 01/23] powerpc/32s: Add missing call to kuep_lock on syscall entry
Userspace Execution protection and fast syscall entry were implemented
independently from each other and were both merged in kernel 5.2, leading
to syscall entry missing userspace execution protection.

On syscall entry, execution of user space memory must be locked in the
same way as on exception entry.

Fixes: b86fb88855ea ("powerpc/32: implement fast entry for syscalls on non BOOKE")
Cc: sta...@vger.kernel.org
Signed-off-by: Christophe Leroy
---
 arch/powerpc/kernel/entry_32.S | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/powerpc/kernel/entry_32.S b/arch/powerpc/kernel/entry_32.S
index b102b40c4988..b1e36602c013 100644
--- a/arch/powerpc/kernel/entry_32.S
+++ b/arch/powerpc/kernel/entry_32.S
@@ -351,6 +351,9 @@ trace_syscall_entry_irq_off:
 
 	.globl	transfer_to_syscall
 transfer_to_syscall:
+#ifdef CONFIG_PPC_BOOK3S_32
+	kuep_lock r11, r12
+#endif
 #ifdef CONFIG_TRACE_IRQFLAGS
 	andi.	r12,r9,MSR_EE
 	beq-	trace_syscall_entry_irq_off
-- 
2.25.0
[PATCH v4 04/23] powerpc/32: Reorder instructions to avoid using CTR in syscall entry
Now that we are using rfi instead of mtmsr to reactivate the MMU, it is
possible to reorder instructions and avoid the need to use CTR for
stashing SRR0.

null_syscall on 8xx is reduced by 3 cycles (283 => 280 cycles).

Signed-off-by: Christophe Leroy
---
 arch/powerpc/kernel/head_32.h | 22 ++
 1 file changed, 10 insertions(+), 12 deletions(-)

diff --git a/arch/powerpc/kernel/head_32.h b/arch/powerpc/kernel/head_32.h
index 4029c51dce5d..24dc326e0d56 100644
--- a/arch/powerpc/kernel/head_32.h
+++ b/arch/powerpc/kernel/head_32.h
@@ -116,30 +116,28 @@
 .endm
 
 .macro SYSCALL_ENTRY trapno
-	mfspr	r12,SPRN_SPRG_THREAD
 	mfspr	r9, SPRN_SRR1
-	mfspr	r11, SPRN_SRR0
-	mtctr	r11
+	mfspr	r10, SPRN_SRR0
 	andi.	r11, r9, MSR_PR
+	beq-	99f
+	LOAD_REG_IMMEDIATE(r11, MSR_KERNEL)		/* can take exceptions */
+	lis	r12, 1f@h
+	ori	r12, r12, 1f@l
+	mtspr	SPRN_SRR1, r11
+	mtspr	SPRN_SRR0, r12
+	mfspr	r12,SPRN_SPRG_THREAD
 	mr	r11, r1
 	lwz	r1,TASK_STACK-THREAD(r12)
-	beq-	99f
+	tovirt(r12, r12)
 	addi	r1, r1, THREAD_SIZE - INT_FRAME_SIZE
-	LOAD_REG_IMMEDIATE(r10, MSR_KERNEL)		/* can take exceptions */
-	mtspr	SPRN_SRR1, r10
-	lis	r10, 1f@h
-	ori	r10, r10, 1f@l
-	mtspr	SPRN_SRR0, r10
 	rfi
 1:
-	tovirt(r12, r12)
 	stw	r11,GPR1(r1)
 	stw	r11,0(r1)
 	mr	r11, r1
+	stw	r10,_NIP(r11)
 	mflr	r10
 	stw	r10, _LINK(r11)
-	mfctr	r10
-	stw	r10,_NIP(r11)
 	mfcr	r10
 	rlwinm	r10,r10,0,4,2	/* Clear SO bit in CR */
 	stw	r10,_CCR(r11)		/* save registers */
-- 
2.25.0
[PATCH v2 2/2] powerpc/sstep: Fix incorrect return from analyze_instr()
We currently just percolate the return value from analyse_instr() to the
caller of emulate_step(), especially if it is a -1.

For one particular case (opcode = 4), for instructions that aren't
currently emulated, we are returning 'should not be single-stepped' while
we should have returned 0, which says 'did not emulate, may have to
single-step'.

Fixes: 930d6288a26787 ("powerpc: sstep: Add support for maddhd, maddhdu, maddld instructions")
Signed-off-by: Ananth N Mavinakayanahalli
Suggested-by: Michael Ellerman
Tested-by: Naveen N. Rao
Reviewed-by: Sandipan Das
---
 arch/powerpc/lib/sstep.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/lib/sstep.c b/arch/powerpc/lib/sstep.c
index f859cbbb6375..e96cff845ef7 100644
--- a/arch/powerpc/lib/sstep.c
+++ b/arch/powerpc/lib/sstep.c
@@ -1445,6 +1445,11 @@ int analyse_instr(struct instruction_op *op, const struct pt_regs *regs,
 
 #ifdef __powerpc64__
 	case 4:
+		/*
+		 * There are very many instructions with this primary opcode
+		 * introduced in the ISA as early as v2.03. However, the ones
+		 * we currently emulate were all introduced with ISA 3.0
+		 */
 		if (!cpu_has_feature(CPU_FTR_ARCH_300))
 			goto unknown_opcode;
 
@@ -1472,7 +1477,7 @@ int analyse_instr(struct instruction_op *op, const struct pt_regs *regs,
 		 * There are other instructions from ISA 3.0 with the same
 		 * primary opcode which do not have emulation support yet.
 		 */
-		return -1;
+		goto unknown_opcode;
 #endif
 
 	case 7:		/* mulli */
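For readers following the fix, the return-value contract of analyse_instr()
that the change restores can be sketched as follows (a simplified summary,
not the kernel's actual documentation comment):

	/*
	 * ret > 0  -> instruction fully emulated by updating regs;
	 *             nothing more for the caller to do
	 * ret == 0 -> not emulated; the caller may fall back to
	 *             hardware single-stepping the instruction
	 * ret < 0  -> the instruction must not be single-stepped
	 *
	 * Returning -1 for an opcode-4 instruction that merely lacks
	 * emulation support wrongly forbade single-stepping; the
	 * 'goto unknown_opcode' reports "did not emulate" instead.
	 */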
[PATCH v4 1/2] [PATCH] powerpc/sstep: Check instruction validity against ISA version before emulation
We currently unconditionally try to emulate newer instructions on older
Power versions, which could cause issues. Gate it.

Fixes: 350779a29f11 ("powerpc: Handle most loads and stores in instruction emulation code")
Signed-off-by: Ananth N Mavinakayanahalli
---
[v4] Based on feedback from Paul Mackerras, Naveen Rao and Michael
     Ellerman, changed return code to 0, after setting opcode type to
     UNKNOWN
[v3] Addressed Naveen's comments on scv and addpcis
[v2] Fixed description
---
 arch/powerpc/lib/sstep.c | 78 +-
 1 file changed, 62 insertions(+), 16 deletions(-)

diff --git a/arch/powerpc/lib/sstep.c b/arch/powerpc/lib/sstep.c
index bf7a7d62ae8b..f859cbbb6375 100644
--- a/arch/powerpc/lib/sstep.c
+++ b/arch/powerpc/lib/sstep.c
@@ -1304,9 +1304,11 @@ int analyse_instr(struct instruction_op *op, const struct pt_regs *regs,
 		if ((word & 0xfe2) == 2)
 			op->type = SYSCALL;
 		else if (IS_ENABLED(CONFIG_PPC_BOOK3S_64) &&
-				(word & 0xfe3) == 1)
+				(word & 0xfe3) == 1) {	/* scv */
 			op->type = SYSCALL_VECTORED_0;
-		else
+			if (!cpu_has_feature(CPU_FTR_ARCH_300))
+				goto unknown_opcode;
+		} else
 			op->type = UNKNOWN;
 		return 0;
 #endif
@@ -1410,7 +1412,7 @@ int analyse_instr(struct instruction_op *op, const struct pt_regs *regs,
 #ifdef __powerpc64__
 	case 1:
 		if (!cpu_has_feature(CPU_FTR_ARCH_31))
-			return -1;
+			goto unknown_opcode;
 
 		prefix_r = GET_PREFIX_R(word);
 		ra = GET_PREFIX_RA(suffix);
@@ -1444,7 +1446,7 @@ int analyse_instr(struct instruction_op *op, const struct pt_regs *regs,
 #ifdef __powerpc64__
 	case 4:
 		if (!cpu_has_feature(CPU_FTR_ARCH_300))
-			return -1;
+			goto unknown_opcode;
 
 		switch (word & 0x3f) {
 		case 48:	/* maddhd */
@@ -1530,6 +1532,8 @@ int analyse_instr(struct instruction_op *op, const struct pt_regs *regs,
 	case 19:
 		if (((word >> 1) & 0x1f) == 2) {	/* addpcis */
+			if (!cpu_has_feature(CPU_FTR_ARCH_300))
+				goto unknown_opcode;
 			imm = (short) (word & 0xffc1);	/* d0 + d2 fields */
 			imm |= (word >> 15) & 0x3e;	/* d1 field */
 			op->val = regs->nip + (imm << 16) + 4;
@@ -1842,7 +1846,7 @@ int analyse_instr(struct instruction_op *op, const struct pt_regs *regs,
 #ifdef __powerpc64__
 		case 265:	/* modud */
 			if (!cpu_has_feature(CPU_FTR_ARCH_300))
-				return -1;
+				goto unknown_opcode;
 
 			op->val = regs->gpr[ra] % regs->gpr[rb];
 			goto compute_done;
 #endif

 		case 267:	/* moduw */
 			if (!cpu_has_feature(CPU_FTR_ARCH_300))
-				return -1;
+				goto unknown_opcode;
 
 			op->val = (unsigned int) regs->gpr[ra] %
 				  (unsigned int) regs->gpr[rb];
 			goto compute_done;
@@ -1889,7 +1893,7 @@ int analyse_instr(struct instruction_op *op, const struct pt_regs *regs,
 #endif
 		case 755:	/* darn */
 			if (!cpu_has_feature(CPU_FTR_ARCH_300))
-				return -1;
+				goto unknown_opcode;
 
 			switch (ra & 0x3) {
 			case 0:		/* 32-bit conditioned */
@@ -1911,14 +1915,14 @@ int analyse_instr(struct instruction_op *op, const struct pt_regs *regs,
 #ifdef __powerpc64__
 		case 777:	/* modsd */
 			if (!cpu_has_feature(CPU_FTR_ARCH_300))
-				return -1;
+				goto unknown_opcode;
 
 			op->val = (long int) regs->gpr[ra] %
 				  (long int) regs->gpr[rb];
 			goto compute_done;
 #endif
 		case 779:	/* modsw */
 			if (!cpu_has_feature(CPU_FTR_ARCH_300))
-				return -1;
+				goto unknown_opcode;
 
 			op->val = (int) regs->gpr[ra] %
 				  (int) regs->gpr[rb];
 			goto compute_done;
@@ -1995,14 +1999,14 @@ int analyse_instr(struct instruction_op *op, const struct pt_regs *regs,
 #endif
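The unknown_opcode label that all of these sites branch to is not visible
in the quoted hunks. From the v4 changelog ("changed return code to 0,
after setting opcode type to UNKNOWN"), its assumed shape is:

	unknown_opcode:
		op->type = UNKNOWN;
		return 0;	/* did not emulate; caller may single-step */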
RE: [PATCH v10 11/12] mm/vmalloc: Hugepage vmalloc mappings
From: Christophe Leroy
> Sent: 25 January 2021 09:15
> 
> Le 24/01/2021 à 09:22, Nicholas Piggin a écrit :
> > Support huge page vmalloc mappings. Config option HAVE_ARCH_HUGE_VMALLOC
> > enables support on architectures that define HAVE_ARCH_HUGE_VMAP and
> > supports PMD sized vmap mappings.
> >
> > vmalloc will attempt to allocate PMD-sized pages if allocating PMD size
> > or larger, and fall back to small pages if that was unsuccessful.
> >
> > Architectures must ensure that any arch specific vmalloc allocations
> > that require PAGE_SIZE mappings (e.g., module allocations vs strict
> > module rwx) use the VM_NOHUGE flag to inhibit larger mappings.
> >
> > When hugepage vmalloc mappings are enabled in the next patch, this
> > reduces TLB misses by nearly 30x on a `git diff` workload on a 2-node
> > POWER9 (59,800 -> 2,100) and reduces CPU cycles by 0.54%.
> >
> > This can result in more internal fragmentation and memory overhead for a
> > given allocation, an option nohugevmalloc is added to disable at boot.
> >
> > Signed-off-by: Nicholas Piggin
> > ---
> >  arch/Kconfig            |  10 +++
> >  include/linux/vmalloc.h |  18
> >  mm/page_alloc.c         |   5 +-
> >  mm/vmalloc.c            | 192 ++--
> >  4 files changed, 177 insertions(+), 48 deletions(-)
> >
> > diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> > index 0377e1d059e5..eef61e0f5170 100644
> > --- a/mm/vmalloc.c
> > +++ b/mm/vmalloc.c
> > @@ -2691,15 +2746,18 @@ EXPORT_SYMBOL_GPL(vmap_pfn);
> >  #endif /* CONFIG_VMAP_PFN */
> >
> >  static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
> > -				 pgprot_t prot, int node)
> > +				 pgprot_t prot, unsigned int page_shift,
> > +				 int node)
> >  {
> >  	const gfp_t nested_gfp = (gfp_mask & GFP_RECLAIM_MASK) | __GFP_ZERO;
> > -	unsigned int nr_pages = get_vm_area_size(area) >> PAGE_SHIFT;
> > -	unsigned long array_size;
> > -	unsigned int i;
> > +	unsigned int page_order = page_shift - PAGE_SHIFT;
> > +	unsigned long addr = (unsigned long)area->addr;
> > +	unsigned long size = get_vm_area_size(area);
> > +	unsigned int nr_small_pages = size >> PAGE_SHIFT;
> >  	struct page **pages;
> > +	unsigned int i;
> >
> > -	array_size = (unsigned long)nr_pages * sizeof(struct page *);
> > +	array_size = (unsigned long)nr_small_pages * sizeof(struct page *);
> 
> array_size() is a function in include/linux/overflow.h
> 
> For some reason, it breaks the build with your series.

I can't see the replacement definition for array_size.
The old local variable is deleted.

	David
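A hedged reduction of the clash being discussed (an illustration, not
necessarily what the series ended up doing): include/linux/overflow.h
provides an overflow-checked helper named array_size(), so once the local
variable of the same name loses its declaration while an assignment to
that name remains, the assignment targets the function and the file no
longer builds. One way out is to call the helper directly; the wrapper
below is hypothetical:

	#include <linux/overflow.h>	/* provides array_size(a, b) */

	/* Overflow-checked nr_small_pages * sizeof(struct page *);
	 * array_size() saturates to SIZE_MAX on overflow.
	 */
	static size_t pages_array_bytes(unsigned int nr_small_pages)
	{
		return array_size(nr_small_pages, sizeof(struct page *));
	}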
Re: [PATCH v10 11/12] mm/vmalloc: Hugepage vmalloc mappings
Le 25/01/2021 à 12:37, Nicholas Piggin a écrit :
> Excerpts from Christophe Leroy's message of January 25, 2021 7:14 pm:
>> Le 24/01/2021 à 09:22, Nicholas Piggin a écrit :
>>> Support huge page vmalloc mappings. Config option HAVE_ARCH_HUGE_VMALLOC
>>> enables support on architectures that define HAVE_ARCH_HUGE_VMAP and
>>> supports PMD sized vmap mappings.
>>>
>>> vmalloc will attempt to allocate PMD-sized pages if allocating PMD size
>>> or larger, and fall back to small pages if that was unsuccessful.
>>>
>>> Architectures must ensure that any arch specific vmalloc allocations
>>> that require PAGE_SIZE mappings (e.g., module allocations vs strict
>>> module rwx) use the VM_NOHUGE flag to inhibit larger mappings.
>>>
>>> When hugepage vmalloc mappings are enabled in the next patch, this
>>> reduces TLB misses by nearly 30x on a `git diff` workload on a 2-node
>>> POWER9 (59,800 -> 2,100) and reduces CPU cycles by 0.54%.
>>>
>>> This can result in more internal fragmentation and memory overhead for a
>>> given allocation, an option nohugevmalloc is added to disable at boot.
>>>
>>> Signed-off-by: Nicholas Piggin
>>> ---
>>>   arch/Kconfig            |  10 +++
>>>   include/linux/vmalloc.h |  18
>>>   mm/page_alloc.c         |   5 +-
>>>   mm/vmalloc.c            | 192 ++--
>>>   4 files changed, 177 insertions(+), 48 deletions(-)
>>>
>>> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
>>> index 0377e1d059e5..eef61e0f5170 100644
>>> --- a/mm/vmalloc.c
>>> +++ b/mm/vmalloc.c
>>> @@ -2691,15 +2746,18 @@ EXPORT_SYMBOL_GPL(vmap_pfn);
>>>   #endif /* CONFIG_VMAP_PFN */
>>>   
>>>   static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
>>> -				 pgprot_t prot, int node)
>>> +				 pgprot_t prot, unsigned int page_shift,
>>> +				 int node)
>>>   {
>>>   	const gfp_t nested_gfp = (gfp_mask & GFP_RECLAIM_MASK) | __GFP_ZERO;
>>> -	unsigned int nr_pages = get_vm_area_size(area) >> PAGE_SHIFT;
>>> -	unsigned long array_size;
>>> -	unsigned int i;
>>> +	unsigned int page_order = page_shift - PAGE_SHIFT;
>>> +	unsigned long addr = (unsigned long)area->addr;
>>> +	unsigned long size = get_vm_area_size(area);
>>> +	unsigned int nr_small_pages = size >> PAGE_SHIFT;
>>>   	struct page **pages;
>>> +	unsigned int i;
>>>   
>>> -	array_size = (unsigned long)nr_pages * sizeof(struct page *);
>>> +	array_size = (unsigned long)nr_small_pages * sizeof(struct page *);
>>
>> array_size() is a function in include/linux/overflow.h
>>
>> For some reason, it breaks the build with your series.
> 
> What config? I haven't seen it.

Several configs I believe.

I saw it this morning in
https://patchwork.ozlabs.org/project/linuxppc-dev/patch/20210124082230.2118861-13-npig...@gmail.com/

Though the reports have all disappeared now.
Re: [PATCH v10 11/12] mm/vmalloc: Hugepage vmalloc mappings
Excerpts from Christophe Leroy's message of January 25, 2021 7:14 pm:
> 
> 
> Le 24/01/2021 à 09:22, Nicholas Piggin a écrit :
>> Support huge page vmalloc mappings. Config option HAVE_ARCH_HUGE_VMALLOC
>> enables support on architectures that define HAVE_ARCH_HUGE_VMAP and
>> supports PMD sized vmap mappings.
>>
>> vmalloc will attempt to allocate PMD-sized pages if allocating PMD size
>> or larger, and fall back to small pages if that was unsuccessful.
>>
>> Architectures must ensure that any arch specific vmalloc allocations
>> that require PAGE_SIZE mappings (e.g., module allocations vs strict
>> module rwx) use the VM_NOHUGE flag to inhibit larger mappings.
>>
>> When hugepage vmalloc mappings are enabled in the next patch, this
>> reduces TLB misses by nearly 30x on a `git diff` workload on a 2-node
>> POWER9 (59,800 -> 2,100) and reduces CPU cycles by 0.54%.
>>
>> This can result in more internal fragmentation and memory overhead for a
>> given allocation, an option nohugevmalloc is added to disable at boot.
>>
>> Signed-off-by: Nicholas Piggin
>> ---
>>   arch/Kconfig            |  10 +++
>>   include/linux/vmalloc.h |  18
>>   mm/page_alloc.c         |   5 +-
>>   mm/vmalloc.c            | 192 ++--
>>   4 files changed, 177 insertions(+), 48 deletions(-)
>>
> 
>> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
>> index 0377e1d059e5..eef61e0f5170 100644
>> --- a/mm/vmalloc.c
>> +++ b/mm/vmalloc.c
> 
>> @@ -2691,15 +2746,18 @@ EXPORT_SYMBOL_GPL(vmap_pfn);
>>   #endif /* CONFIG_VMAP_PFN */
>>   
>>   static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
>> -				 pgprot_t prot, int node)
>> +				 pgprot_t prot, unsigned int page_shift,
>> +				 int node)
>>   {
>>   	const gfp_t nested_gfp = (gfp_mask & GFP_RECLAIM_MASK) | __GFP_ZERO;
>> -	unsigned int nr_pages = get_vm_area_size(area) >> PAGE_SHIFT;
>> -	unsigned long array_size;
>> -	unsigned int i;
>> +	unsigned int page_order = page_shift - PAGE_SHIFT;
>> +	unsigned long addr = (unsigned long)area->addr;
>> +	unsigned long size = get_vm_area_size(area);
>> +	unsigned int nr_small_pages = size >> PAGE_SHIFT;
>>   	struct page **pages;
>> +	unsigned int i;
>>   
>> -	array_size = (unsigned long)nr_pages * sizeof(struct page *);
>> +	array_size = (unsigned long)nr_small_pages * sizeof(struct page *);
> 
> array_size() is a function in include/linux/overflow.h
> 
> For some reason, it breaks the build with your series.

What config? I haven't seen it.

Thanks,
Nick
Re: [PATCH v10 06/12] powerpc: inline huge vmap supported functions
Excerpts from Christophe Leroy's message of January 25, 2021 6:42 pm:
> 
> 
> Le 24/01/2021 à 09:22, Nicholas Piggin a écrit :
>> This allows unsupported levels to be constant folded away, and so
>> p4d_free_pud_page can be removed because it's no longer linked to.
> 
> Ah, ok, you did it here. Why not squash this patch into patch 5 directly?

To reduce arch code movement in the first patch and split up these arch
patches to get separate acks for them. Maybe overkill for these changes
but doesn't hurt I think.

Thanks,
Nick
[PATCH] powerpc: remove unneeded semicolons
Remove superfluous semicolons after function definitions.

Signed-off-by: Chengyang Fan
---
 arch/powerpc/include/asm/book3s/32/mmu-hash.h       |  2 +-
 arch/powerpc/include/asm/book3s/64/mmu.h            |  2 +-
 arch/powerpc/include/asm/book3s/64/tlbflush-radix.h |  2 +-
 arch/powerpc/include/asm/book3s/64/tlbflush.h       |  2 +-
 arch/powerpc/include/asm/firmware.h                 |  2 +-
 arch/powerpc/include/asm/kvm_ppc.h                  |  6 +++---
 arch/powerpc/include/asm/paca.h                     |  6 +++---
 arch/powerpc/include/asm/rtas.h                     |  2 +-
 arch/powerpc/include/asm/setup.h                    |  6 +++---
 arch/powerpc/include/asm/simple_spinlock.h          |  4 ++--
 arch/powerpc/include/asm/smp.h                      |  2 +-
 arch/powerpc/include/asm/xmon.h                     |  4 ++--
 arch/powerpc/kernel/prom.c                          |  2 +-
 arch/powerpc/kernel/setup.h                         | 12 ++++++------
 arch/powerpc/platforms/powernv/subcore.h            |  2 +-
 arch/powerpc/platforms/pseries/pseries.h            |  2 +-
 16 files changed, 29 insertions(+), 29 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/32/mmu-hash.h b/arch/powerpc/include/asm/book3s/32/mmu-hash.h
index 685c589e723f..b85f8e114a9c 100644
--- a/arch/powerpc/include/asm/book3s/32/mmu-hash.h
+++ b/arch/powerpc/include/asm/book3s/32/mmu-hash.h
@@ -94,7 +94,7 @@ typedef struct {
 } mm_context_t;
 
 void update_bats(void);
-static inline void cleanup_cpu_mmu_context(void) { };
+static inline void cleanup_cpu_mmu_context(void) { }
 
 /* patch sites */
 extern s32 patch__hash_page_A0, patch__hash_page_A1, patch__hash_page_A2;
diff --git a/arch/powerpc/include/asm/book3s/64/mmu.h b/arch/powerpc/include/asm/book3s/64/mmu.h
index 995bbcdd0ef8..eace8c3f7b0a 100644
--- a/arch/powerpc/include/asm/book3s/64/mmu.h
+++ b/arch/powerpc/include/asm/book3s/64/mmu.h
@@ -239,7 +239,7 @@ static inline void setup_initial_memory_limit(phys_addr_t first_memblock_base,
 #ifdef CONFIG_PPC_PSERIES
 extern void radix_init_pseries(void);
 #else
-static inline void radix_init_pseries(void) { };
+static inline void radix_init_pseries(void) { }
 #endif
 
 #ifdef CONFIG_HOTPLUG_CPU
diff --git a/arch/powerpc/include/asm/book3s/64/tlbflush-radix.h b/arch/powerpc/include/asm/book3s/64/tlbflush-radix.h
index 94439e0cefc9..8b33601cdb9d 100644
--- a/arch/powerpc/include/asm/book3s/64/tlbflush-radix.h
+++ b/arch/powerpc/include/asm/book3s/64/tlbflush-radix.h
@@ -35,7 +35,7 @@ extern void radix__flush_pwc_lpid(unsigned int lpid);
 extern void radix__flush_all_lpid(unsigned int lpid);
 extern void radix__flush_all_lpid_guest(unsigned int lpid);
 #else
-static inline void radix__tlbiel_all(unsigned int action) { WARN_ON(1); };
+static inline void radix__tlbiel_all(unsigned int action) { WARN_ON(1); }
 
 static inline void radix__flush_tlb_lpid_page(unsigned int lpid,
 					      unsigned long addr,
 					      unsigned long page_size)
diff --git a/arch/powerpc/include/asm/book3s/64/tlbflush.h b/arch/powerpc/include/asm/book3s/64/tlbflush.h
index dcb5c3839d2f..215973b4cb26 100644
--- a/arch/powerpc/include/asm/book3s/64/tlbflush.h
+++ b/arch/powerpc/include/asm/book3s/64/tlbflush.h
@@ -31,7 +31,7 @@ static inline void tlbiel_all(void)
 	hash__tlbiel_all(TLB_INVAL_SCOPE_GLOBAL);
 }
 #else
-static inline void tlbiel_all(void) { BUG(); };
+static inline void tlbiel_all(void) { BUG(); }
 #endif
 
 static inline void tlbiel_all_lpid(bool radix)
diff --git a/arch/powerpc/include/asm/firmware.h b/arch/powerpc/include/asm/firmware.h
index aa6a5ef5d483..7604673787d6 100644
--- a/arch/powerpc/include/asm/firmware.h
+++ b/arch/powerpc/include/asm/firmware.h
@@ -137,7 +137,7 @@ extern unsigned int __start___fw_ftr_fixup, __stop___fw_ftr_fixup;
 #ifdef CONFIG_PPC_PSERIES
 void pseries_probe_fw_features(void);
 #else
-static inline void pseries_probe_fw_features(void) { };
+static inline void pseries_probe_fw_features(void) { }
 #endif
 
 #endif /* __ASSEMBLY__ */
diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
index 0a056c64c317..259ba4ce9ad3 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -627,9 +627,9 @@ extern int h_ipi_redirect;
 static inline struct kvmppc_passthru_irqmap *kvmppc_get_passthru_irqmap(
 				struct kvm *kvm)
 	{ return NULL; }
-static inline void kvmppc_alloc_host_rm_ops(void) {};
-static inline void kvmppc_free_host_rm_ops(void) {};
-static inline void kvmppc_free_pimap(struct kvm *kvm) {};
+static inline void kvmppc_alloc_host_rm_ops(void) {}
+static inline void kvmppc_free_host_rm_ops(void) {}
+static inline void kvmppc_free_pimap(struct kvm *kvm) {}
 static inline int kvmppc_xics_rm_complete(struct kvm_vcpu *vcpu, u32 hcall)
 	{ return 0; }
 static inline int kvmppc_xics_enabled(struct kvm_vcpu *vcpu)
diff --git
Re: [PATCH v4 2/2] powerpc/mce: Remove per cpu variables from MCE handlers
Le 22/01/2021 à 13:32, Ganesh Goudar a écrit :
> Access to per-cpu variables requires translation to be enabled on
> pseries machines running in hash MMU mode. Since part of the MCE handler
> runs in real mode, and part of the MCE handling code is shared between
> the pseries and powernv architectures, it becomes difficult to manage
> these variables differently on different architectures. So have these
> variables in the paca instead of having them as per-cpu variables, to
> avoid complications.
> 
> Signed-off-by: Ganesh Goudar
> ---
> v2: Dynamically allocate memory for machine check event info
> v3: Remove check for hash mmu lpar, use memblock_alloc_try_nid to
>     allocate memory.
> v4: Split the patch into two.
> ---
>  arch/powerpc/include/asm/mce.h     | 18 +++
>  arch/powerpc/include/asm/paca.h    |  4 ++
>  arch/powerpc/kernel/mce.c          | 79 ++
>  arch/powerpc/kernel/setup-common.c |  2 +-
>  4 files changed, 70 insertions(+), 33 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/setup-common.c b/arch/powerpc/kernel/setup-common.c
> index 71f38e9248be..17dc451f0e45 100644
> --- a/arch/powerpc/kernel/setup-common.c
> +++ b/arch/powerpc/kernel/setup-common.c
> @@ -916,7 +916,6 @@ void __init setup_arch(char **cmdline_p)
>  	/* On BookE, setup per-core TLB data structures. */
>  	setup_tlb_core_data();
>  #endif
> -

Is this line removal really required by this patch?

>  	/* Print various info about the machine that has been gathered so far. */
>  	print_system_info();
>  
> @@ -938,6 +937,7 @@ void __init setup_arch(char **cmdline_p)
>  	exc_lvl_early_init();
>  	emergency_stack_init();
>  
> +	mce_init();

You have to include mce.h to avoid a build failure on PPC32.

>  	smp_release_cpus();
>  
>  	initmem_init();
Re: [PATCH v10 11/12] mm/vmalloc: Hugepage vmalloc mappings
Le 24/01/2021 à 09:22, Nicholas Piggin a écrit :
> Support huge page vmalloc mappings. Config option HAVE_ARCH_HUGE_VMALLOC
> enables support on architectures that define HAVE_ARCH_HUGE_VMAP and
> supports PMD sized vmap mappings.
> 
> vmalloc will attempt to allocate PMD-sized pages if allocating PMD size
> or larger, and fall back to small pages if that was unsuccessful.
> 
> Architectures must ensure that any arch specific vmalloc allocations
> that require PAGE_SIZE mappings (e.g., module allocations vs strict
> module rwx) use the VM_NOHUGE flag to inhibit larger mappings.
> 
> When hugepage vmalloc mappings are enabled in the next patch, this
> reduces TLB misses by nearly 30x on a `git diff` workload on a 2-node
> POWER9 (59,800 -> 2,100) and reduces CPU cycles by 0.54%.
> 
> This can result in more internal fragmentation and memory overhead for a
> given allocation, an option nohugevmalloc is added to disable at boot.
> 
> Signed-off-by: Nicholas Piggin
> ---
>  arch/Kconfig            |  10 +++
>  include/linux/vmalloc.h |  18
>  mm/page_alloc.c         |   5 +-
>  mm/vmalloc.c            | 192 ++--
>  4 files changed, 177 insertions(+), 48 deletions(-)
> 
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index 0377e1d059e5..eef61e0f5170 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -2691,15 +2746,18 @@ EXPORT_SYMBOL_GPL(vmap_pfn);
>  #endif /* CONFIG_VMAP_PFN */
>  
>  static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
> -				 pgprot_t prot, int node)
> +				 pgprot_t prot, unsigned int page_shift,
> +				 int node)
>  {
>  	const gfp_t nested_gfp = (gfp_mask & GFP_RECLAIM_MASK) | __GFP_ZERO;
> -	unsigned int nr_pages = get_vm_area_size(area) >> PAGE_SHIFT;
> -	unsigned long array_size;
> -	unsigned int i;
> +	unsigned int page_order = page_shift - PAGE_SHIFT;
> +	unsigned long addr = (unsigned long)area->addr;
> +	unsigned long size = get_vm_area_size(area);
> +	unsigned int nr_small_pages = size >> PAGE_SHIFT;
>  	struct page **pages;
> +	unsigned int i;
>  
> -	array_size = (unsigned long)nr_pages * sizeof(struct page *);
> +	array_size = (unsigned long)nr_small_pages * sizeof(struct page *);

array_size() is a function in include/linux/overflow.h

For some reason, it breaks the build with your series.

>  	gfp_mask |= __GFP_NOWARN;
>  	if (!(gfp_mask & (GFP_DMA | GFP_DMA32)))
>  		gfp_mask |= __GFP_HIGHMEM;
Re: [PATCH v10 06/12] powerpc: inline huge vmap supported functions
Le 24/01/2021 à 09:22, Nicholas Piggin a écrit :
> This allows unsupported levels to be constant folded away, and so
> p4d_free_pud_page can be removed because it's no longer linked to.

Ah, ok, you did it here. Why not squash this patch into patch 5 directly?

> Cc: linuxppc-dev@lists.ozlabs.org
> Acked-by: Michael Ellerman
> Signed-off-by: Nicholas Piggin
> ---
>  arch/powerpc/include/asm/vmalloc.h       | 19 ---
>  arch/powerpc/mm/book3s64/radix_pgtable.c | 21 -
>  2 files changed, 16 insertions(+), 24 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/vmalloc.h b/arch/powerpc/include/asm/vmalloc.h
> index 105abb73f075..3f0c153befb0 100644
> --- a/arch/powerpc/include/asm/vmalloc.h
> +++ b/arch/powerpc/include/asm/vmalloc.h
> @@ -1,12 +1,25 @@
>  #ifndef _ASM_POWERPC_VMALLOC_H
>  #define _ASM_POWERPC_VMALLOC_H
>  
> +#include <asm/mmu.h>
>  #include <asm/page.h>
>  
>  #ifdef CONFIG_HAVE_ARCH_HUGE_VMAP
> -bool arch_vmap_p4d_supported(pgprot_t prot);
> -bool arch_vmap_pud_supported(pgprot_t prot);
> -bool arch_vmap_pmd_supported(pgprot_t prot);
> +static inline bool arch_vmap_p4d_supported(pgprot_t prot)
> +{
> +	return false;
> +}
> +
> +static inline bool arch_vmap_pud_supported(pgprot_t prot)
> +{
> +	/* HPT does not cope with large pages in the vmalloc area */
> +	return radix_enabled();
> +}
> +
> +static inline bool arch_vmap_pmd_supported(pgprot_t prot)
> +{
> +	return radix_enabled();
> +}
>  #endif
>  
>  #endif /* _ASM_POWERPC_VMALLOC_H */
> 
> diff --git a/arch/powerpc/mm/book3s64/radix_pgtable.c b/arch/powerpc/mm/book3s64/radix_pgtable.c
> index 743807fc210f..8da62afccee5 100644
> --- a/arch/powerpc/mm/book3s64/radix_pgtable.c
> +++ b/arch/powerpc/mm/book3s64/radix_pgtable.c
> @@ -1082,22 +1082,6 @@ void radix__ptep_modify_prot_commit(struct vm_area_struct *vma,
>  	set_pte_at(mm, addr, ptep, pte);
>  }
>  
> -bool arch_vmap_pud_supported(pgprot_t prot)
> -{
> -	/* HPT does not cope with large pages in the vmalloc area */
> -	return radix_enabled();
> -}
> -
> -bool arch_vmap_pmd_supported(pgprot_t prot)
> -{
> -	return radix_enabled();
> -}
> -
> -int p4d_free_pud_page(p4d_t *p4d, unsigned long addr)
> -{
> -	return 0;
> -}
> -
>  int pud_set_huge(pud_t *pud, phys_addr_t addr, pgprot_t prot)
>  {
>  	pte_t *ptep = (pte_t *)pud;
> 
> @@ -1181,8 +1165,3 @@ int pmd_free_pte_page(pmd_t *pmd, unsigned long addr)
>  
>  	return 1;
>  }
> -
> -bool arch_vmap_p4d_supported(pgprot_t prot)
> -{
> -	return false;
> -}
Re: [PATCH v10 05/12] mm: HUGE_VMAP arch support cleanup
Le 24/01/2021 à 09:22, Nicholas Piggin a écrit :
> This changes the awkward approach where architectures provide init
> functions to determine which levels they can provide large mappings for,
> to one where the arch is queried for each call.
> 
> This removes code and indirection, and allows constant-folding of dead
> code for unsupported levels.

It looks like this is only the case when CONFIG_HAVE_ARCH_HUGE_VMAP is not
defined.

When it is defined, for example on powerpc, you define
arch_vmap_p4d_supported() as a regular function in
arch/powerpc/mm/book3s64/radix_pgtable.c, so although it always returns
false, it won't constant-fold dead code.

> This also adds a prot argument to the arch query. This is unused
> currently but could help with some architectures (e.g., some powerpc
> processors can't map uncacheable memory with large pages).
> 
> Cc: linuxppc-dev@lists.ozlabs.org
> Cc: Catalin Marinas
> Cc: Will Deacon
> Cc: linux-arm-ker...@lists.infradead.org
> Cc: Thomas Gleixner
> Cc: Ingo Molnar
> Cc: Borislav Petkov
> Cc: x...@kernel.org
> Cc: "H. Peter Anvin"
> Acked-by: Catalin Marinas [arm64]
> Signed-off-by: Nicholas Piggin
> ---
>  arch/arm64/include/asm/vmalloc.h         |  8 +++
>  arch/arm64/mm/mmu.c                      | 10 +--
>  arch/powerpc/include/asm/vmalloc.h       |  8 +++
>  arch/powerpc/mm/book3s64/radix_pgtable.c |  8 +--
>  arch/x86/include/asm/vmalloc.h           |  7 ++
>  arch/x86/mm/ioremap.c                    | 12 ++--
>  include/linux/io.h                       |  9 ---
>  include/linux/vmalloc.h                  |  6 ++
>  init/main.c                              |  1 -
>  mm/ioremap.c                             | 88 +---
>  10 files changed, 79 insertions(+), 78 deletions(-)

Christophe
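The constant-folding point can be illustrated with a hedged sketch (the
names are made up; only the inline versus out-of-line distinction
matters):

	/* Definition visible to the caller: the compiler can prove the
	 * branch dead and drop it entirely.
	 */
	static inline bool query_inline(void)
	{
		return false;
	}

	/* Defined out of line in another file: the call is opaque at
	 * compile time, so nothing folds.
	 */
	bool query_extern(void);

	int caller(void)
	{
		if (query_inline())
			return 1;	/* compiled out */
		if (query_extern())
			return 2;	/* survives to runtime */
		return 0;
	}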
Re: [PATCH v10 05/12] mm: HUGE_VMAP arch support cleanup
Le 24/01/2021 à 12:40, Christoph Hellwig a écrit :
>> diff --git a/arch/arm64/include/asm/vmalloc.h b/arch/arm64/include/asm/vmalloc.h
>> index 2ca708ab9b20..597b40405319 100644
>> --- a/arch/arm64/include/asm/vmalloc.h
>> +++ b/arch/arm64/include/asm/vmalloc.h
>> @@ -1,4 +1,12 @@
>>  #ifndef _ASM_ARM64_VMALLOC_H
>>  #define _ASM_ARM64_VMALLOC_H
>>  
>> +#include <asm/page.h>
>> +
>> +#ifdef CONFIG_HAVE_ARCH_HUGE_VMAP
>> +bool arch_vmap_p4d_supported(pgprot_t prot);
>> +bool arch_vmap_pud_supported(pgprot_t prot);
>> +bool arch_vmap_pmd_supported(pgprot_t prot);
>> +#endif
> 
> Shouldn't these be inlines or macros? Also it would be useful if the
> architectures did not have to override all functions but just those
> they actually implement.
> 
> Also lots of > 80 char lines in the patch.

Since
https://github.com/linuxppc/linux/commit/bdc48fa11e46f867ea4d75fa59ee87a7f48be144
this 80 char limit is not strongly enforced anymore. Although 80 is still
the preferred limit, code is often more readable with a slightly longer
single line than with lines split.

Christophe
[PATCH] KVM: PPC: Book3S: Assign boolean values to a bool variable
Fix the following coccicheck warnings:

./arch/powerpc/kvm/book3s_hv_rm_xics.c:381:3-15: WARNING: Assignment of
0/1 to bool variable.

Reported-by: Abaci Robot
Signed-off-by: Jiapeng Zhong
---
 arch/powerpc/kvm/book3s_hv_rm_xics.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kvm/book3s_hv_rm_xics.c b/arch/powerpc/kvm/book3s_hv_rm_xics.c
index c2c9c73..68e509d 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_xics.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_xics.c
@@ -378,7 +378,7 @@ static void icp_rm_deliver_irq(struct kvmppc_xics *xics, struct kvmppc_icp *icp,
 			arch_spin_unlock(&ics->lock);
 			icp->n_reject++;
 			new_irq = reject;
-			check_resend = 0;
+			check_resend = false;
 			goto again;
 		}
 	} else {
-- 
1.8.3.1