Re: [PATCH v13 6/8] powerpc: Set ARCH_HAS_STRICT_MODULE_RWX
Le 10/05/2021 à 03:18, Jordan Niethe a écrit :
> From: Russell Currey
>
> To enable strict module RWX on powerpc, set:
>
>         CONFIG_STRICT_MODULE_RWX=y
>
> You should also have CONFIG_STRICT_KERNEL_RWX=y set to have any real
> security benefit.
>
> ARCH_HAS_STRICT_MODULE_RWX is set to require ARCH_HAS_STRICT_KERNEL_RWX.
> This is due to a quirk in arch/Kconfig and arch/powerpc/Kconfig that
> makes STRICT_MODULE_RWX *on by default* in configurations where
> STRICT_KERNEL_RWX is *unavailable*. Since this doesn't make much sense,
> and module RWX without kernel RWX doesn't make much sense, having the
> same dependencies as kernel RWX works around this problem.
>
> With STRICT_MODULE_RWX, now make module_alloc() allocate pages with
> PAGE_KERNEL protection rather than PAGE_KERNEL_EXEC.
>
> Book3s/32 processors with a hash MMU (i.e. 604 core) cannot set memory
> protection on a page-by-page basis, so do not enable it there.
>
> Signed-off-by: Russell Currey
> [jpn: - predicate on !PPC_BOOK3S_604
>       - make module_alloc() use PAGE_KERNEL protection]
> Signed-off-by: Jordan Niethe
> ---
> v10: - Predicate on !PPC_BOOK3S_604
>      - Make module_alloc() use PAGE_KERNEL protection
> v11: - Neaten up
> v13: Use strict_kernel_rwx_enabled()
> ---
>  arch/powerpc/Kconfig         | 1 +
>  arch/powerpc/kernel/module.c | 4 +++-
>  2 files changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
> index cce0a137b046..cb5d9d862c35 100644
> --- a/arch/powerpc/Kconfig
> +++ b/arch/powerpc/Kconfig
> @@ -140,6 +140,7 @@ config PPC
>  	select ARCH_HAS_SCALED_CPUTIME		if VIRT_CPU_ACCOUNTING_NATIVE && PPC_BOOK3S_64
>  	select ARCH_HAS_SET_MEMORY
>  	select ARCH_HAS_STRICT_KERNEL_RWX	if ((PPC_BOOK3S_64 || PPC32) && !HIBERNATION)
> +	select ARCH_HAS_STRICT_MODULE_RWX	if ARCH_HAS_STRICT_KERNEL_RWX && !PPC_BOOK3S_604
>  	select ARCH_HAS_TICK_BROADCAST		if GENERIC_CLOCKEVENTS_BROADCAST
>  	select ARCH_HAS_UACCESS_FLUSHCACHE
>  	select ARCH_HAS_COPY_MC			if PPC64
> diff --git a/arch/powerpc/kernel/module.c b/arch/powerpc/kernel/module.c
> index 3f35c8d20be7..f24004635ed5 100644
> --- a/arch/powerpc/kernel/module.c
> +++ b/arch/powerpc/kernel/module.c
> @@ -92,12 +92,14 @@ int module_finalize(const Elf_Ehdr *hdr,
>  static __always_inline void *
>  __module_alloc(unsigned long size, unsigned long start, unsigned long end)
>  {
> +	pgprot_t prot = strict_kernel_rwx_enabled() ? PAGE_KERNEL : PAGE_KERNEL_EXEC;
> +

I'm not sure this is OK. I think we need to make a new helper
strict_module_rwx_enabled() because I don't think we want PAGE_KERNEL
here when CONFIG_STRICT_MODULE_RWX is not selected.

>  	/*
>  	 * Don't do huge page allocations for modules yet until more testing
>  	 * is done. STRICT_MODULE_RWX may require extra work to support this
>  	 * too.
>  	 */
> -	return __vmalloc_node_range(size, 1, start, end, GFP_KERNEL, PAGE_KERNEL_EXEC,
> +	return __vmalloc_node_range(size, 1, start, end, GFP_KERNEL, prot,
>  				    VM_FLUSH_RESET_PERMS | VM_NO_HUGE_VMAP,
>  				    NUMA_NO_NODE, __builtin_return_address(0));
>  }
Re: [PATCH v13 3/8] powerpc/kprobes: Mark newly allocated probes as ROX
Le 10/05/2021 à 03:18, Jordan Niethe a écrit :
> From: Russell Currey
>
> Add the arch-specific insn page allocator for powerpc. This allocates
> ROX pages if STRICT_KERNEL_RWX is enabled. These pages are only written
> to with patch_instruction(), which is able to write to RO pages.
>
> Reviewed-by: Daniel Axtens
> Signed-off-by: Russell Currey
> Signed-off-by: Christophe Leroy
> [jpn: Reword commit message, switch to __vmalloc_node_range()]
> Signed-off-by: Jordan Niethe
> ---
> v9:  - vmalloc_exec() no longer exists
>      - Set the page to RW before freeing it
> v10: - use __vmalloc_node_range()
> v11: - Neaten up
> v12: - Switch from __vmalloc_node_range() to module_alloc()
> v13: Use strict_kernel_rwx_enabled()
> ---
>  arch/powerpc/kernel/kprobes.c | 17 +++++++++++++++++
>  1 file changed, 17 insertions(+)
>
> diff --git a/arch/powerpc/kernel/kprobes.c b/arch/powerpc/kernel/kprobes.c
> index 01ab2163659e..b517f3e6e7c5 100644
> --- a/arch/powerpc/kernel/kprobes.c
> +++ b/arch/powerpc/kernel/kprobes.c
> @@ -19,11 +19,13 @@
>  #include
>  #include
>  #include
> +#include
>  #include
>  #include
>  #include
>  #include
>  #include
> +#include
>  #include
>
>  DEFINE_PER_CPU(struct kprobe *, current_kprobe) = NULL;
> @@ -103,6 +105,21 @@ kprobe_opcode_t *kprobe_lookup_name(const char *name, unsigned int offset)
>  	return addr;
>  }
>
> +void *alloc_insn_page(void)
> +{
> +	void *page;
> +
> +	page = module_alloc(PAGE_SIZE);
> +	if (!page)
> +		return NULL;
> +
> +	if (strict_kernel_rwx_enabled()) {

I'm not sure this is OK. I think we need to make a new helper
strict_module_rwx_enabled() because I don't think we want to call that
when CONFIG_STRICT_MODULE_RWX is not selected.

> +		set_memory_ro((unsigned long)page, 1);
> +		set_memory_x((unsigned long)page, 1);
> +	}
> +	return page;
> +}
> +
>  int arch_prepare_kprobe(struct kprobe *p)
>  {
>  	int ret = 0;
[PATCH kernel] powerpc/makefile: Remove flag duplicates when generating vdso linker scripts
The cmd_cpp_lds_S rule already has -P and -U$(ARCH), so there is no need
to duplicate these; clean that up. Since only -C is left, and
scripts/Makefile.build has had -C removed since commit 5cb0512c02ec
("Kbuild: don't pass "-C" to preprocessor when processing linker
scripts"), this follows that lead and removes CPPFLAGS_vdso(32|64).lds
altogether.

Signed-off-by: Alexey Kardashevskiy
---
scripts/checkpatch.pl complains as it does not handle quotes in the
commit subject line well. Oh well.
---
 arch/powerpc/kernel/vdso32/Makefile | 1 -
 arch/powerpc/kernel/vdso64/Makefile | 1 -
 2 files changed, 2 deletions(-)

diff --git a/arch/powerpc/kernel/vdso32/Makefile b/arch/powerpc/kernel/vdso32/Makefile
index 7d9a6fee0e3d..7420e88d5aa3 100644
--- a/arch/powerpc/kernel/vdso32/Makefile
+++ b/arch/powerpc/kernel/vdso32/Makefile
@@ -44,7 +44,6 @@ asflags-y := -D__VDSO32__ -s
 obj-y += vdso32_wrapper.o
 
 targets += vdso32.lds
-CPPFLAGS_vdso32.lds += -P -C -Upowerpc
 
 # link rule for the .so file, .lds has to be first
 $(obj)/vdso32.so.dbg: $(src)/vdso32.lds $(obj-vdso32) $(obj)/vgettimeofday.o FORCE
diff --git a/arch/powerpc/kernel/vdso64/Makefile b/arch/powerpc/kernel/vdso64/Makefile
index 2813e3f98db6..fb118630c334 100644
--- a/arch/powerpc/kernel/vdso64/Makefile
+++ b/arch/powerpc/kernel/vdso64/Makefile
@@ -30,7 +30,6 @@ ccflags-y := -shared -fno-common -fno-builtin -nostdlib \
 asflags-y := -D__VDSO64__ -s
 
 targets += vdso64.lds
-CPPFLAGS_vdso64.lds += -P -C -U$(ARCH)
 
 # link rule for the .so file, .lds has to be first
 $(obj)/vdso64.so.dbg: $(src)/vdso64.lds $(obj-vdso64) $(obj)/vgettimeofday.o FORCE
-- 
2.30.2
[PATCH] powerpc/64e/interrupt: fix nvgprs being clobbered
Some interrupt handlers have an "extra" that saves 1 or 2 registers
(r14, r15) in the paca save area and makes them available to use by
the handler.

The change to always save nvgprs in exception handlers led to some
interrupt handlers saving those scratch r14/r15 registers into the
interrupt frame's GPR saves, which get restored on interrupt exit.

Fix this by always reloading those scratch registers from paca before
the EXCEPTION_COMMON that saves nvgprs.

Fixes: 4228b2c3d20e ("powerpc/64e/interrupt: always save nvgprs on interrupt")
Tested-by: Christian Zigotzky
Reported-by: Christian Zigotzky
Signed-off-by: Nicholas Piggin
---
 arch/powerpc/kernel/exceptions-64e.S | 38 ++++++++++++++++++----------
 1 file changed, 24 insertions(+), 14 deletions(-)

diff --git a/arch/powerpc/kernel/exceptions-64e.S b/arch/powerpc/kernel/exceptions-64e.S
index 7c3654b0d0f4..f1ae710274bc 100644
--- a/arch/powerpc/kernel/exceptions-64e.S
+++ b/arch/powerpc/kernel/exceptions-64e.S
@@ -340,6 +340,12 @@ ret_from_mc_except:
 	andi.	r10,r10,IRQS_DISABLED;	/* yes -> go out of line */ \
 	bne	masked_interrupt_book3e_##n
 
+/*
+ * Additional regs must be re-loaded from paca before EXCEPTION_COMMON* is
+ * called, because that does SAVE_NVGPRS which must see the original register
+ * values, otherwise the scratch values might be restored when exiting the
+ * interrupt.
+ */
 #define PROLOG_ADDITION_2REGS_GEN(n)					    \
 	std	r14,PACA_EXGEN+EX_R14(r13);				    \
 	std	r15,PACA_EXGEN+EX_R15(r13)
@@ -535,6 +541,10 @@ __end_interrupts:
 				PROLOG_ADDITION_2REGS)
 	mfspr	r14,SPRN_DEAR
 	mfspr	r15,SPRN_ESR
+	std	r14,_DAR(r1)
+	std	r15,_DSISR(r1)
+	ld	r14,PACA_EXGEN+EX_R14(r13)
+	ld	r15,PACA_EXGEN+EX_R15(r13)
 	EXCEPTION_COMMON(0x300)
 	b	storage_fault_common
 
@@ -544,6 +554,10 @@ __end_interrupts:
 				PROLOG_ADDITION_2REGS)
 	li	r15,0
 	mr	r14,r10
+	std	r14,_DAR(r1)
+	std	r15,_DSISR(r1)
+	ld	r14,PACA_EXGEN+EX_R14(r13)
+	ld	r15,PACA_EXGEN+EX_R15(r13)
 	EXCEPTION_COMMON(0x400)
 	b	storage_fault_common
 
@@ -557,6 +571,10 @@ __end_interrupts:
 				PROLOG_ADDITION_2REGS)
 	mfspr	r14,SPRN_DEAR
 	mfspr	r15,SPRN_ESR
+	std	r14,_DAR(r1)
+	std	r15,_DSISR(r1)
+	ld	r14,PACA_EXGEN+EX_R14(r13)
+	ld	r15,PACA_EXGEN+EX_R15(r13)
 	EXCEPTION_COMMON(0x600)
 	b	alignment_more	/* no room, go out of line */
 
@@ -565,10 +583,10 @@ __end_interrupts:
 	NORMAL_EXCEPTION_PROLOG(0x700, BOOKE_INTERRUPT_PROGRAM,
 				PROLOG_ADDITION_1REG)
 	mfspr	r14,SPRN_ESR
-	EXCEPTION_COMMON(0x700)
 	std	r14,_DSISR(r1)
-	addi	r3,r1,STACK_FRAME_OVERHEAD
 	ld	r14,PACA_EXGEN+EX_R14(r13)
+	EXCEPTION_COMMON(0x700)
+	addi	r3,r1,STACK_FRAME_OVERHEAD
 	bl	program_check_exception
 	REST_NVGPRS(r1)
 	b	interrupt_return
@@ -725,11 +743,11 @@ END_FTR_SECTION_IFSET(CPU_FTR_ALTIVEC)
 	 * normal exception
 	 */
 	mfspr	r14,SPRN_DBSR
-	EXCEPTION_COMMON_CRIT(0xd00)
 	std	r14,_DSISR(r1)
-	addi	r3,r1,STACK_FRAME_OVERHEAD
 	ld	r14,PACA_EXCRIT+EX_R14(r13)
 	ld	r15,PACA_EXCRIT+EX_R15(r13)
+	EXCEPTION_COMMON_CRIT(0xd00)
+	addi	r3,r1,STACK_FRAME_OVERHEAD
 	bl	DebugException
 	REST_NVGPRS(r1)
 	b	interrupt_return
@@ -796,11 +814,11 @@ kernel_dbg_exc:
 	 * normal exception
 	 */
 	mfspr	r14,SPRN_DBSR
-	EXCEPTION_COMMON_DBG(0xd08)
 	std	r14,_DSISR(r1)
-	addi	r3,r1,STACK_FRAME_OVERHEAD
 	ld	r14,PACA_EXDBG+EX_R14(r13)
 	ld	r15,PACA_EXDBG+EX_R15(r13)
+	EXCEPTION_COMMON_DBG(0xd08)
+	addi	r3,r1,STACK_FRAME_OVERHEAD
 	bl	DebugException
 	REST_NVGPRS(r1)
 	b	interrupt_return
@@ -931,11 +949,7 @@ masked_interrupt_book3e_0x2c0:
 	 * original values stashed away in the PACA
 	 */
 storage_fault_common:
-	std	r14,_DAR(r1)
-	std	r15,_DSISR(r1)
 	addi	r3,r1,STACK_FRAME_OVERHEAD
-	ld	r14,PACA_EXGEN+EX_R14(r13)
-	ld	r15,PACA_EXGEN+EX_R15(r13)
 	bl	do_page_fault
 	b	interrupt_return
 
@@ -944,11 +958,7 @@ storage_fault_common:
 	 * continues here.
 	 */
 alignment_more:
-	std	r14,_DAR(r1)
-	std	r15,_DSISR(r1)
 	addi	r3,r1,STACK_FRAME_OVERHEAD
-	ld	r14,PACA_EXGEN+EX_R14(r13)
-	ld	r15,PACA_EXGEN+EX_R15(r13)
 	bl	alignment_exception
 	REST_NVGPRS(r1)
 	b	interrupt_return
-- 
2.23.0
Re: [PATCH kernel v3] powerpc/makefile: Do not redefine $(CPP) for preprocessor
On 14/05/2021 12:42, Masahiro Yamada wrote:
> On Fri, May 14, 2021 at 3:59 AM Nathan Chancellor wrote:
>> On 5/13/2021 4:59 AM, Alexey Kardashevskiy wrote:
>>> The $(CPP) (do only preprocessing) macro is already defined in Makefile.
>>> However POWERPC redefines it and adds $(KBUILD_CFLAGS) which results
>>> in flags duplication. Which is not a big deal by itself except for
>>> the flags which depend on other flags and the compiler checks them
>>> as it parses the command line.
>>>
>>> Specifically, scripts/Makefile.build:304 generates ksyms for .S files.
>>> If clang+llvm+sanitizer are enabled, this results in
>>>
>>>    -emit-llvm-bc -fno-lto -flto -fvisibility=hidden \
>>>    -fsanitize=cfi-mfcall -fno-lto ...
>>>
>>> in the clang command line and triggers error:
>
> I do not know how to reproduce this for powerpc.
> Currently, only x86 and arm64 select ARCH_SUPPORTS_LTO_CLANG.
> Is this a fix for a potential issue?

Yeah, it is work in progress to enable LTO_CLANG for PPC64:
https://github.com/aik/linux/commits/lto

>>> clang-13: error: invalid argument '-fsanitize=cfi-mfcall' only allowed with '-flto'
>>>
>>> This removes unnecessary CPP redefinition. Which works fine as in most
>>> place KBUILD_CFLAGS is passed to $CPP except
>>> arch/powerpc/kernel/vdso64/vdso(32|64).lds. To fix vdso, this does:
>>> 1. add -m(big|little)-endian to $CPP
>>> 2. add target to $KBUILD_CPPFLAGS as otherwise clang ignores
>>>    -m(big|little)-endian if the building platform does not support
>>>    big endian (such as x86).
>>>
>>> Signed-off-by: Alexey Kardashevskiy
>>> ---
>>> Changes:
>>> v3:
>>> * moved vdso cleanup in a separate patch
>>> * only add target to KBUILD_CPPFLAGS for CLANG
>>>
>>> v2:
>>> * fix KBUILD_CPPFLAGS
>>> * add CLANG_FLAGS to CPPFLAGS
>>> ---
>>>  Makefile              | 1 +
>>>  arch/powerpc/Makefile | 3 ++-
>>>  2 files changed, 3 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/Makefile b/Makefile
>>> index 15b6476d0f89..5b545bef7653 100644
>>> --- a/Makefile
>>> +++ b/Makefile
>>> @@ -576,6 +576,7 @@ CC_VERSION_TEXT = $(subst $(pound),,$(shell $(CC) --version 2>/dev/null | head -
>>>  ifneq ($(findstring clang,$(CC_VERSION_TEXT)),)
>>>  ifneq ($(CROSS_COMPILE),)
>>>  CLANG_FLAGS     += --target=$(notdir $(CROSS_COMPILE:%-=%))
>>> +KBUILD_CPPFLAGS += --target=$(notdir $(CROSS_COMPILE:%-=%))
>>
>> You can avoid the duplication here by just doing:
>>
>> KBUILD_CPPFLAGS += $(CLANG_FLAGS)
>>
>> I am still not super happy about the flag duplication but I am not sure
>> I can think of a better solution. If KBUILD_CPPFLAGS are always included
>> when building .o files, maybe we should just add $(CLANG_FLAGS) to
>> KBUILD_CPPFLAGS instead of KBUILD_CFLAGS?
>
> Hmm, I think including --target=* in CPP flags is sensible,
> but not all CLANG_FLAGS are CPP flags.
> At least, -(no)-integrated-as is not a CPP flag.
> We could introduce a separate CLANG_CPP_FLAGS,
> but it would require more code changes...
> So, I do not have a strong opinion either way.
>
> BTW, another approach might be to modify the linker script.
>
> In my best guess, the reason why powerpc is adding the endian flag to
> CPP is this line in arch/powerpc/kernel/vdso64/vdso64.lds.S:
>
> #ifdef __LITTLE_ENDIAN__
> OUTPUT_FORMAT("elf64-powerpcle", "elf64-powerpcle", "elf64-powerpcle")
> #else
> OUTPUT_FORMAT("elf64-powerpc", "elf64-powerpc", "elf64-powerpc")
> #endif
>
> You can use the CONFIG option to check the endian-ness:
>
> #ifdef CONFIG_CPU_BIG_ENDIAN
> OUTPUT_FORMAT("elf64-powerpc", "elf64-powerpc", "elf64-powerpc")
> #else
> OUTPUT_FORMAT("elf64-powerpcle", "elf64-powerpcle", "elf64-powerpcle")
> #endif
>
> All the big endian arches define CONFIG_CPU_BIG_ENDIAN.
> (but not all little endian arches define CONFIG_CPU_LITTLE_ENDIAN)

This should work with .lds. But missing --target=* might still hit us
somewhere else later; these include 3 header files each and there might
be endianness-dependent stuff.

> So,
>
> #ifdef CONFIG_CPU_BIG_ENDIAN
>  < big endian code >
> #else
>  < little endian code >
> #endif
>
> works for all architectures.
>
> The only exception is that you cannot replace the one in uapi headers,
> arch/powerpc/include/uapi/asm/byteorder.h:
>
> #ifdef __LITTLE_ENDIAN__
>
> since it is exported to userspace, where CONFIG options are not available.
>
> BTW, various flags are historically used:
>
> - CONFIG_CPU_BIG_ENDIAN / CONFIG_CPU_LITTLE_ENDIAN
> - __BIG_ENDIAN / __LITTLE_ENDIAN
> - __LITTLE_ENDIAN__ (powerpc only)
>
> __LITTLE_ENDIAN__ is defined by powerpc gcc and clang.
>
> My experiments...
>
> [1] powerpc-linux-gnu-gcc -> __BIG_ENDIAN__ is defined
>
> masahiro@grover:~$ echo | powerpc-linux-gnu-gcc -E -dM -x c - | grep ENDIAN
> #define __ORDER_LITTLE_ENDIAN__ 1234
> #define __BIG_ENDIAN__ 1
> #define __FLOAT_WORD_ORDER__ __ORDER_BIG_ENDIAN__
> #define __ORDER_PDP_ENDIAN__ 3412
> #define _BIG_ENDIAN 1
> #define __BYTE_ORDER__ __ORDER_BIG_ENDIAN__
> #define __VEC_ELEMENT_REG_ORDER__ __ORDER_BIG_ENDIAN__
> #define __ORDER_BIG_ENDIAN__ 4321
>
> [2] powerpc-linux-gnu-gcc + -mlittle-endian -> __LITTLE_ENDIAN__ is defined
>
> masahiro@grover:~$ echo | powerpc-linux-gnu-gcc -E -dM -x c - -mlittle-endian | grep ENDIAN
> #define
Re: [PATCH kernel v2] powerpc/makefile: Do not redefine $(CPP) for preprocessor
On Wed, May 12, 2021 at 7:29 PM Segher Boessenkool wrote:
>
> On Wed, May 12, 2021 at 01:48:53PM +1000, Alexey Kardashevskiy wrote:
> > > Oh! I completely missed those few lines. Sorry for that :-(
> >
> > Well, I probably should have made it a separate patch anyway, I'll
> > repost separately.
>
> Thanks.
>
> > > To compensate a bit:
> > >
> > > > It still puzzles me why we need -C
> > > > (preserve comments in the preprocessor output) flag here.
> > >
> > > It is so that a human can look at the output and read it. Comments are
> > > very significant to human readers :-)
> >
> > I seriously doubt anyone ever read those :)
>
> I am pretty sure whoever wrote it did!

Keeping comments in the pre-processed linker scripts is troublesome.
That is why -C was removed from scripts/Makefile.build:

commit 5cb0512c02ecd7e6214e912e4c150f4219ac78e0
Author: Linus Torvalds
Date:   Thu Nov 2 14:10:37 2017 -0700

    Kbuild: don't pass "-C" to preprocessor when processing linker scripts

You can entirely remove

    CPPFLAGS_vdso32.lds += -P -C -Upowerpc

-- 
Best Regards
Masahiro Yamada
Re: [PATCH kernel v3] powerpc/makefile: Do not redefine $(CPP) for preprocessor
On Fri, May 14, 2021 at 3:59 AM Nathan Chancellor wrote:
>
> On 5/13/2021 4:59 AM, Alexey Kardashevskiy wrote:
> > The $(CPP) (do only preprocessing) macro is already defined in Makefile.
> > However POWERPC redefines it and adds $(KBUILD_CFLAGS) which results
> > in flags duplication. Which is not a big deal by itself except for
> > the flags which depend on other flags and the compiler checks them
> > as it parses the command line.
> >
> > Specifically, scripts/Makefile.build:304 generates ksyms for .S files.
> > If clang+llvm+sanitizer are enabled, this results in
> >
> >    -emit-llvm-bc -fno-lto -flto -fvisibility=hidden \
> >    -fsanitize=cfi-mfcall -fno-lto ...
> >
> > in the clang command line and triggers error:

I do not know how to reproduce this for powerpc.
Currently, only x86 and arm64 select ARCH_SUPPORTS_LTO_CLANG.
Is this a fix for a potential issue?

> > clang-13: error: invalid argument '-fsanitize=cfi-mfcall' only allowed with '-flto'
> >
> > This removes unnecessary CPP redefinition. Which works fine as in most
> > place KBUILD_CFLAGS is passed to $CPP except
> > arch/powerpc/kernel/vdso64/vdso(32|64).lds. To fix vdso, this does:
> > 1. add -m(big|little)-endian to $CPP
> > 2. add target to $KBUILD_CPPFLAGS as otherwise clang ignores
> >    -m(big|little)-endian if the building platform does not support
> >    big endian (such as x86).
> >
> > Signed-off-by: Alexey Kardashevskiy
> > ---
> > Changes:
> > v3:
> > * moved vdso cleanup in a separate patch
> > * only add target to KBUILD_CPPFLAGS for CLANG
> >
> > v2:
> > * fix KBUILD_CPPFLAGS
> > * add CLANG_FLAGS to CPPFLAGS
> > ---
> >  Makefile              | 1 +
> >  arch/powerpc/Makefile | 3 ++-
> >  2 files changed, 3 insertions(+), 1 deletion(-)
> >
> > diff --git a/Makefile b/Makefile
> > index 15b6476d0f89..5b545bef7653 100644
> > --- a/Makefile
> > +++ b/Makefile
> > @@ -576,6 +576,7 @@ CC_VERSION_TEXT = $(subst $(pound),,$(shell $(CC) --version 2>/dev/null | head -
> >  ifneq ($(findstring clang,$(CC_VERSION_TEXT)),)
> >  ifneq ($(CROSS_COMPILE),)
> >  CLANG_FLAGS     += --target=$(notdir $(CROSS_COMPILE:%-=%))
> > +KBUILD_CPPFLAGS += --target=$(notdir $(CROSS_COMPILE:%-=%))
>
> You can avoid the duplication here by just doing:
>
> KBUILD_CPPFLAGS += $(CLANG_FLAGS)
>
> I am still not super happy about the flag duplication but I am not sure
> I can think of a better solution. If KBUILD_CPPFLAGS are always included
> when building .o files, maybe we should just add $(CLANG_FLAGS) to
> KBUILD_CPPFLAGS instead of KBUILD_CFLAGS?

Hmm, I think including --target=* in CPP flags is sensible,
but not all CLANG_FLAGS are CPP flags.
At least, -(no)-integrated-as is not a CPP flag.
We could introduce a separate CLANG_CPP_FLAGS,
but it would require more code changes...
So, I do not have a strong opinion either way.

BTW, another approach might be to modify the linker script.

In my best guess, the reason why powerpc is adding the endian flag to
CPP is this line in arch/powerpc/kernel/vdso64/vdso64.lds.S:

#ifdef __LITTLE_ENDIAN__
OUTPUT_FORMAT("elf64-powerpcle", "elf64-powerpcle", "elf64-powerpcle")
#else
OUTPUT_FORMAT("elf64-powerpc", "elf64-powerpc", "elf64-powerpc")
#endif

You can use the CONFIG option to check the endian-ness:

#ifdef CONFIG_CPU_BIG_ENDIAN
OUTPUT_FORMAT("elf64-powerpc", "elf64-powerpc", "elf64-powerpc")
#else
OUTPUT_FORMAT("elf64-powerpcle", "elf64-powerpcle", "elf64-powerpcle")
#endif

All the big endian arches define CONFIG_CPU_BIG_ENDIAN.
(but not all little endian arches define CONFIG_CPU_LITTLE_ENDIAN)

So,

#ifdef CONFIG_CPU_BIG_ENDIAN
 < big endian code >
#else
 < little endian code >
#endif

works for all architectures.

The only exception is that you cannot replace the one in uapi headers,
arch/powerpc/include/uapi/asm/byteorder.h:

#ifdef __LITTLE_ENDIAN__

since it is exported to userspace, where CONFIG options are not available.

BTW, various flags are historically used:

- CONFIG_CPU_BIG_ENDIAN / CONFIG_CPU_LITTLE_ENDIAN
- __BIG_ENDIAN / __LITTLE_ENDIAN
- __LITTLE_ENDIAN__ (powerpc only)

__LITTLE_ENDIAN__ is defined by powerpc gcc and clang.

My experiments...

[1] powerpc-linux-gnu-gcc -> __BIG_ENDIAN__ is defined

masahiro@grover:~$ echo | powerpc-linux-gnu-gcc -E -dM -x c - | grep ENDIAN
#define __ORDER_LITTLE_ENDIAN__ 1234
#define __BIG_ENDIAN__ 1
#define __FLOAT_WORD_ORDER__ __ORDER_BIG_ENDIAN__
#define __ORDER_PDP_ENDIAN__ 3412
#define _BIG_ENDIAN 1
#define __BYTE_ORDER__ __ORDER_BIG_ENDIAN__
#define __VEC_ELEMENT_REG_ORDER__ __ORDER_BIG_ENDIAN__
#define __ORDER_BIG_ENDIAN__ 4321

[2] powerpc-linux-gnu-gcc + -mlittle-endian -> __LITTLE_ENDIAN__ is defined

masahiro@grover:~$ echo | powerpc-linux-gnu-gcc -E -dM -x c - -mlittle-endian | grep ENDIAN
#define __ORDER_LITTLE_ENDIAN__ 1234
#define _LITTLE_ENDIAN 1
#define __FLOAT_WORD_ORDER__ __ORDER_LITTLE_ENDIAN__
#define __ORDER_PDP_ENDIAN__ 3412
#define __LITTLE_ENDIAN__ 1
#define __BYTE_ORDER__
Re: [PATCH kernel v3] powerpc/makefile: Do not redefine $(CPP) for preprocessor
On 14/05/2021 04:59, Nathan Chancellor wrote:
> On 5/13/2021 4:59 AM, Alexey Kardashevskiy wrote:
>> The $(CPP) (do only preprocessing) macro is already defined in Makefile.
>> However POWERPC redefines it and adds $(KBUILD_CFLAGS) which results
>> in flags duplication. Which is not a big deal by itself except for
>> the flags which depend on other flags and the compiler checks them
>> as it parses the command line.
>>
>> Specifically, scripts/Makefile.build:304 generates ksyms for .S files.
>> If clang+llvm+sanitizer are enabled, this results in
>>
>>    -emit-llvm-bc -fno-lto -flto -fvisibility=hidden \
>>    -fsanitize=cfi-mfcall -fno-lto ...
>>
>> in the clang command line and triggers error:
>>
>> clang-13: error: invalid argument '-fsanitize=cfi-mfcall' only allowed with '-flto'
>>
>> This removes unnecessary CPP redefinition. Which works fine as in most
>> place KBUILD_CFLAGS is passed to $CPP except
>> arch/powerpc/kernel/vdso64/vdso(32|64).lds. To fix vdso, this does:
>> 1. add -m(big|little)-endian to $CPP
>> 2. add target to $KBUILD_CPPFLAGS as otherwise clang ignores
>>    -m(big|little)-endian if the building platform does not support
>>    big endian (such as x86).
>>
>> Signed-off-by: Alexey Kardashevskiy
>> ---
>> Changes:
>> v3:
>> * moved vdso cleanup in a separate patch
>> * only add target to KBUILD_CPPFLAGS for CLANG
>>
>> v2:
>> * fix KBUILD_CPPFLAGS
>> * add CLANG_FLAGS to CPPFLAGS
>> ---
>>  Makefile              | 1 +
>>  arch/powerpc/Makefile | 3 ++-
>>  2 files changed, 3 insertions(+), 1 deletion(-)
>>
>> diff --git a/Makefile b/Makefile
>> index 15b6476d0f89..5b545bef7653 100644
>> --- a/Makefile
>> +++ b/Makefile
>> @@ -576,6 +576,7 @@ CC_VERSION_TEXT = $(subst $(pound),,$(shell $(CC) --version 2>/dev/null | head -
>>  ifneq ($(findstring clang,$(CC_VERSION_TEXT)),)
>>  ifneq ($(CROSS_COMPILE),)
>>  CLANG_FLAGS     += --target=$(notdir $(CROSS_COMPILE:%-=%))
>> +KBUILD_CPPFLAGS += --target=$(notdir $(CROSS_COMPILE:%-=%))
>
> You can avoid the duplication here by just doing:
>
> KBUILD_CPPFLAGS += $(CLANG_FLAGS)

This has potential of duplicating even more flags, which is exactly what
I am trying to avoid here.

> I am still not super happy about the flag duplication but I am not sure
> I can think of a better solution. If KBUILD_CPPFLAGS are always included
> when building .o files,

My understanding is that KBUILD_CPPFLAGS should not be added for .o.
Who does know or decide for sure what CPPFLAGS are for? :)

> maybe we should just add $(CLANG_FLAGS) to
> KBUILD_CPPFLAGS instead of KBUILD_CFLAGS?
>
>>  endif
>>  ifeq ($(LLVM_IAS),1)
>>  CLANG_FLAGS     += -integrated-as
>>
>> diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile
>> index 3212d076ac6a..306bfd2797ad 100644
>> --- a/arch/powerpc/Makefile
>> +++ b/arch/powerpc/Makefile
>> @@ -76,6 +76,7 @@ endif
>>  ifdef CONFIG_CPU_LITTLE_ENDIAN
>>  KBUILD_CFLAGS   += -mlittle-endian
>> +KBUILD_CPPFLAGS += -mlittle-endian
>>  KBUILD_LDFLAGS  += -EL
>>  LDEMULATION     := lppc
>>  GNUTARGET       := powerpcle
>> @@ -83,6 +84,7 @@ MULTIPLEWORD := -mno-multiple
>>  KBUILD_CFLAGS_MODULE += $(call cc-option,-mno-save-toc-indirect)
>>  else
>>  KBUILD_CFLAGS   += $(call cc-option,-mbig-endian)
>> +KBUILD_CPPFLAGS += $(call cc-option,-mbig-endian)
>>  KBUILD_LDFLAGS  += -EB
>>  LDEMULATION     := ppc
>>  GNUTARGET       := powerpc
>> @@ -208,7 +210,6 @@ KBUILD_CPPFLAGS += -I $(srctree)/arch/$(ARCH) $(asinstr)
>>  KBUILD_AFLAGS   += $(AFLAGS-y)
>>  KBUILD_CFLAGS   += $(call cc-option,-msoft-float)
>>  KBUILD_CFLAGS   += -pipe $(CFLAGS-y)
>> -CPP             = $(CC) -E $(KBUILD_CFLAGS)
>>
>>  CHECKFLAGS      += -m$(BITS) -D__powerpc__ -D__powerpc$(BITS)__
>>
>>  ifdef CONFIG_CPU_BIG_ENDIAN

-- 
Alexey
Re: [FSL P50x0] Xorg always restarts again and again after the the PowerPC updates 5.13-1
Excerpts from Christian Zigotzky's message of May 14, 2021 6:20 am: > On 13 May 2021 at 07:00pm, Christophe Leroy wrote: >> >> Ah yes, I remember this problem. >> >> Can you select CONFIG_VIRT_CPU_ACCOUNTING_GEN in your configuration ? >> >> Otherwise, I can try to fix the branch. >> >> Christophe > I selected this. After that it compiles. > > 1. git bisect good - Xorg restarts again and again > Output: [f9aa0ac1e9e82b60401ad567bdabc30598325bc1] Revert > "powerpc/64e/interrupt: use new interrupt return" > 2. git bisect good - Xorg restarts again and again > Output: [cd6d259a14704741bf0cd1dcadb84c0de22d7f77] Revert > "powerpc/64e/interrupt: always save nvgprs on interrupt" > 3. git bisect bad - Xorg works > Output: [9bfa20ef2ae54d3b9088dfbcde4ef97062cf5ef2] Revert > "powerpc/interrupt: update common interrupt code for" > 4. git bisect good - Xorg restarts again and again > Output: > > cd6d259a14704741bf0cd1dcadb84c0de22d7f77 is the first bad commit > commit cd6d259a14704741bf0cd1dcadb84c0de22d7f77 > Author: Christophe Leroy > Date: Thu May 13 09:52:06 2021 + > > Revert "powerpc/64e/interrupt: always save nvgprs on interrupt" > > This reverts commit 4228b2c3d20e9f80b847f809c38e6cf82864fa50. > > :04 04 156542c857ad72776b69bb67b2f244afeeb7abd3 > 92ea86ed097fce16238b0c2f2b343473894e4e8e M arch Thank you both very much for chasing this down. I think I see the problem, it's clobbering r14 and r15 for some interrupts. Something like this is required, I'll give it more review and testing though. 
Thanks, Nick --- diff --git a/arch/powerpc/kernel/exceptions-64e.S b/arch/powerpc/kernel/exceptions-64e.S index 7c3654b0d0f4..b91ef04f1ce2 100644 --- a/arch/powerpc/kernel/exceptions-64e.S +++ b/arch/powerpc/kernel/exceptions-64e.S @@ -535,6 +535,10 @@ __end_interrupts: PROLOG_ADDITION_2REGS) mfspr r14,SPRN_DEAR mfspr r15,SPRN_ESR + std r14,_DAR(r1) + std r15,_DSISR(r1) + ld r14,PACA_EXGEN+EX_R14(r13) + ld r15,PACA_EXGEN+EX_R15(r13) EXCEPTION_COMMON(0x300) b storage_fault_common @@ -544,6 +548,10 @@ __end_interrupts: PROLOG_ADDITION_2REGS) li r15,0 mr r14,r10 + std r14,_DAR(r1) + std r15,_DSISR(r1) + ld r14,PACA_EXGEN+EX_R14(r13) + ld r15,PACA_EXGEN+EX_R15(r13) EXCEPTION_COMMON(0x400) b storage_fault_common @@ -557,6 +565,10 @@ __end_interrupts: PROLOG_ADDITION_2REGS) mfspr r14,SPRN_DEAR mfspr r15,SPRN_ESR + std r14,_DAR(r1) + std r15,_DSISR(r1) + ld r14,PACA_EXGEN+EX_R14(r13) + ld r15,PACA_EXGEN+EX_R15(r13) EXCEPTION_COMMON(0x600) b alignment_more /* no room, go out of line */ @@ -565,10 +577,10 @@ __end_interrupts: NORMAL_EXCEPTION_PROLOG(0x700, BOOKE_INTERRUPT_PROGRAM, PROLOG_ADDITION_1REG) mfspr r14,SPRN_ESR - EXCEPTION_COMMON(0x700) std r14,_DSISR(r1) - addir3,r1,STACK_FRAME_OVERHEAD ld r14,PACA_EXGEN+EX_R14(r13) + EXCEPTION_COMMON(0x700) + addir3,r1,STACK_FRAME_OVERHEAD bl program_check_exception REST_NVGPRS(r1) b interrupt_return @@ -725,11 +737,11 @@ END_FTR_SECTION_IFSET(CPU_FTR_ALTIVEC) * normal exception */ mfspr r14,SPRN_DBSR - EXCEPTION_COMMON_CRIT(0xd00) std r14,_DSISR(r1) - addir3,r1,STACK_FRAME_OVERHEAD ld r14,PACA_EXCRIT+EX_R14(r13) ld r15,PACA_EXCRIT+EX_R15(r13) + EXCEPTION_COMMON_CRIT(0xd00) + addir3,r1,STACK_FRAME_OVERHEAD bl DebugException REST_NVGPRS(r1) b interrupt_return @@ -796,11 +808,11 @@ kernel_dbg_exc: * normal exception */ mfspr r14,SPRN_DBSR - EXCEPTION_COMMON_DBG(0xd08) std r14,_DSISR(r1) - addir3,r1,STACK_FRAME_OVERHEAD ld r14,PACA_EXDBG+EX_R14(r13) ld r15,PACA_EXDBG+EX_R15(r13) + EXCEPTION_COMMON_DBG(0xd08) + 
addi r3,r1,STACK_FRAME_OVERHEAD bl DebugException REST_NVGPRS(r1) b interrupt_return @@ -931,11 +943,7 @@ masked_interrupt_book3e_0x2c0: * original values stashed away in the PACA */ storage_fault_common: - std r14,_DAR(r1) - std r15,_DSISR(r1) addi r3,r1,STACK_FRAME_OVERHEAD - ld r14,PACA_EXGEN+EX_R14(r13) - ld r15,PACA_EXGEN+EX_R15(r13) bl do_page_fault b interrupt_return @@ -944,11 +952,7 @@ storage_fault_common: * continues here. */ alignment_more: - std r14,_DAR(r1) - std r15,_DSISR(r1) addi r3,r1,STACK_FRAME_OVERHEAD - ld
Re: [FSL P50x0] Xorg always restarts again and again after the the PowerPC updates 5.13-1
On 13 May 2021 at 07:00pm, Christophe Leroy wrote: Le 13/05/2021 à 18:35, Christian Zigotzky a écrit : On 13 May 2021 at 5:51pm, Christophe Leroy wrote: Le 13/05/2021 à 17:19, Christian Zigotzky a écrit : On 13 May 2021 at 12:01 pm, Christophe Leroy wrote: Le 13/05/2021 à 08:47, Christian Zigotzky a écrit : Hi Christophe, On 9. May 2021, at 19:37, Christophe Leroy wrote: Did I do an error in my analysis ? No, you didn’t. On the contrary you have found the issue. ;-) The issue is somewhere in the new interrupt code. I'm not convinced, but let's give it a try. ZZ | * | ceff77efa4f8 powerpc/64e/interrupt: Use new interrupt context tracking scheme Could you please create a patch for reverting the new interrupt code? I would like to confirm your result. Please fetch https://github.com/chleroy/linux.git and try the branch for_christian. This is a revert of approx a dozen of commits around the changes to 64e on top of v5.13-rc1. If the top of the branch works for you, it would be great if you can find out which one of the reverts fixes the problem for real. Thanks Christophe It's working! Great! I can use the RC1 on my FSL P5040. Thank you! The issue is definitely somewhere in the interrupt code. Please tell me the next steps. Can you bisect between 5.13-rc1 and the top of the 'for_christian' branch to find the first 'good' commit ? Take care it is an "up side down" bisect, a 'good' is one that does _not_ work and a 'bad' is a commit that works. git bisect start git bisect bad 1c8f441f1485 git bisect good 6efb943b8616 Christophe Hi Christophe, Yes, I can. Shall I use the branch 'for_christian' or the default linux git for bisecting? I have tried it already with the branch 'for_christian' but it doesn't compile. 
git bisect start git bisect bad 1c8f441f1485 git bisect good 6efb943b8616 Output: [d66b1d1aab0c1caad11eca417f515b86ec0bebe9] Revert "powerpc/64e/interrupt: Use new interrupt context tracking scheme" Result: arch/powerpc/kernel/interrupt.o: In function `.syscall_exit_prepare': interrupt.c:(.text+0x278): undefined reference to `.schedule_user' arch/powerpc/kernel/interrupt.o: In function `.interrupt_exit_user_prepare': interrupt.c:(.text+0x340): undefined reference to `.schedule_user' Makefile:1191: recipe for target 'vmlinux' failed make: *** [vmlinux] Error 1 Ah yes, I remember this problem. Can you select CONFIG_VIRT_CPU_ACCOUNTING_GEN in your configuration ? Otherwise, I can try to fix the branch. Christophe I selected this. After that it compiles. 1. git bisect good - Xorg restarts again and again Output: [f9aa0ac1e9e82b60401ad567bdabc30598325bc1] Revert "powerpc/64e/interrupt: use new interrupt return" 2. git bisect good - Xorg restarts again and again Output: [cd6d259a14704741bf0cd1dcadb84c0de22d7f77] Revert "powerpc/64e/interrupt: always save nvgprs on interrupt" 3. git bisect bad - Xorg works Output: [9bfa20ef2ae54d3b9088dfbcde4ef97062cf5ef2] Revert "powerpc/interrupt: update common interrupt code for" 4. git bisect good - Xorg restarts again and again Output: cd6d259a14704741bf0cd1dcadb84c0de22d7f77 is the first bad commit commit cd6d259a14704741bf0cd1dcadb84c0de22d7f77 Author: Christophe Leroy Date: Thu May 13 09:52:06 2021 + Revert "powerpc/64e/interrupt: always save nvgprs on interrupt" This reverts commit 4228b2c3d20e9f80b847f809c38e6cf82864fa50. :04 04 156542c857ad72776b69bb67b2f244afeeb7abd3 92ea86ed097fce16238b0c2f2b343473894e4e8e M arch - Christian
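For anyone repeating this kind of "upside down" bisect, git can rename the terms so that "good"/"bad" don't have to be used backwards. A self-contained sketch in a throwaway repository (the commits are obviously made up; only the `--term-old`/`--term-new` usage is the point):

```shell
# Throwaway repo with five commits standing in for the revert series.
tmp=$(mktemp -d) && cd "$tmp"
git init -q
for i in 1 2 3 4 5; do
    echo "$i" > file
    git add file
    git -c user.name=t -c user.email=t@t commit -qm "commit $i"
done

# Rename the terms: "broken" = Xorg still restarts, "fixed" = Xorg works.
git bisect start --term-old=broken --term-new=fixed
git bisect broken HEAD~4   # oldest commit: problem present
git bisect fixed HEAD      # branch tip: problem gone
git bisect terms           # shows which term maps to old/new
```

With the terms renamed, `git bisect` reports "the first fixed commit" instead of "the first bad commit", which avoids the mental inversion during this kind of revert bisect.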
Re: [PATCH] x86: Define only {pud/pmd}_{set/clear}_huge when usefull
On 5/13/21 11:27 AM, Christophe Leroy wrote: > When PUD and/or PMD are folded, those functions are useless > and we have a stub in linux/pgtable.h > > Reported-by: Randy Dunlap > Signed-off-by: Christophe Leroy > --- > arch/x86/mm/pgtable.c | 34 +++--- > 1 file changed, 19 insertions(+), 15 deletions(-) WorksForMe. Thanks. Acked-by: Randy Dunlap # build-tested -- ~Randy
Re: [PATCH kernel v3] powerpc/makefile: Do not redefine $(CPP) for preprocessor
On 5/13/2021 4:59 AM, Alexey Kardashevskiy wrote: The $(CPP) (do only preprocessing) macro is already defined in Makefile. However POWERPC redefines it and adds $(KBUILD_CFLAGS) which results in flags duplication. Which is not a big deal by itself except for the flags which depend on other flags and the compiler checks them as it parses the command line. Specifically, scripts/Makefile.build:304 generates ksyms for .S files. If clang+llvm+sanitizer are enabled, this results in -emit-llvm-bc -fno-lto -flto -fvisibility=hidden \ -fsanitize=cfi-mfcall -fno-lto ... in the clang command line and triggers error: clang-13: error: invalid argument '-fsanitize=cfi-mfcall' only allowed with '-flto' This removes unnecessary CPP redefinition. Which works fine as in most place KBUILD_CFLAGS is passed to $CPP except arch/powerpc/kernel/vdso64/vdso(32|64).lds. To fix vdso, this does: 1. add -m(big|little)-endian to $CPP 2. add target to $KBUILD_CPPFLAGS as otherwise clang ignores -m(big|little)-endian if the building platform does not support big endian (such as x86). Signed-off-by: Alexey Kardashevskiy --- Changes: v3: * moved vdso cleanup in a separate patch * only add target to KBUILD_CPPFLAGS for CLANG v2: * fix KBUILD_CPPFLAGS * add CLANG_FLAGS to CPPFLAGS --- Makefile | 1 + arch/powerpc/Makefile | 3 ++- 2 files changed, 3 insertions(+), 1 deletion(-) diff --git a/Makefile b/Makefile index 15b6476d0f89..5b545bef7653 100644 --- a/Makefile +++ b/Makefile @@ -576,6 +576,7 @@ CC_VERSION_TEXT = $(subst $(pound),,$(shell $(CC) --version 2>/dev/null | head - ifneq ($(findstring clang,$(CC_VERSION_TEXT)),) ifneq ($(CROSS_COMPILE),) CLANG_FLAGS += --target=$(notdir $(CROSS_COMPILE:%-=%)) +KBUILD_CPPFLAGS+= --target=$(notdir $(CROSS_COMPILE:%-=%)) You can avoid the duplication here by just doing: KBUILD_CPPFLAGS += $(CLANG_FLAGS) I am still not super happy about the flag duplication but I am not sure I can think of a better solution. 
If KBUILD_CPPFLAGS are always included when building .o files, maybe we should just add $(CLANG_FLAGS) to KBUILD_CPPFLAGS instead of KBUILD_CFLAGS? endif ifeq ($(LLVM_IAS),1) CLANG_FLAGS += -integrated-as diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile index 3212d076ac6a..306bfd2797ad 100644 --- a/arch/powerpc/Makefile +++ b/arch/powerpc/Makefile @@ -76,6 +76,7 @@ endif ifdef CONFIG_CPU_LITTLE_ENDIAN KBUILD_CFLAGS += -mlittle-endian +KBUILD_CPPFLAGS+= -mlittle-endian KBUILD_LDFLAGS+= -EL LDEMULATION := lppc GNUTARGET := powerpcle @@ -83,6 +84,7 @@ MULTIPLEWORD := -mno-multiple KBUILD_CFLAGS_MODULE += $(call cc-option,-mno-save-toc-indirect) else KBUILD_CFLAGS += $(call cc-option,-mbig-endian) +KBUILD_CPPFLAGS += $(call cc-option,-mbig-endian) KBUILD_LDFLAGS+= -EB LDEMULATION := ppc GNUTARGET := powerpc @@ -208,7 +210,6 @@ KBUILD_CPPFLAGS += -I $(srctree)/arch/$(ARCH) $(asinstr) KBUILD_AFLAGS += $(AFLAGS-y) KBUILD_CFLAGS += $(call cc-option,-msoft-float) KBUILD_CFLAGS += -pipe $(CFLAGS-y) -CPP= $(CC) -E $(KBUILD_CFLAGS) CHECKFLAGS += -m$(BITS) -D__powerpc__ -D__powerpc$(BITS)__ ifdef CONFIG_CPU_BIG_ENDIAN
[PATCH] x86: Define only {pud/pmd}_{set/clear}_huge when usefull
When PUD and/or PMD are folded, those functions are useless and we have a stub in linux/pgtable.h Reported-by: Randy Dunlap Signed-off-by: Christophe Leroy --- arch/x86/mm/pgtable.c | 34 +++--- 1 file changed, 19 insertions(+), 15 deletions(-) diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c index d27cf69e811d..1303ff6ef7be 100644 --- a/arch/x86/mm/pgtable.c +++ b/arch/x86/mm/pgtable.c @@ -682,6 +682,7 @@ int p4d_clear_huge(p4d_t *p4d) } #endif +#if CONFIG_PGTABLE_LEVELS > 3 /** * pud_set_huge - setup kernel PUD mapping * @@ -720,6 +721,23 @@ int pud_set_huge(pud_t *pud, phys_addr_t addr, pgprot_t prot) return 1; } +/** + * pud_clear_huge - clear kernel PUD mapping when it is set + * + * Returns 1 on success and 0 on failure (no PUD map is found). + */ +int pud_clear_huge(pud_t *pud) +{ + if (pud_large(*pud)) { + pud_clear(pud); + return 1; + } + + return 0; +} +#endif + +#if CONFIG_PGTABLE_LEVELS > 2 /** * pmd_set_huge - setup kernel PMD mapping * @@ -750,21 +768,6 @@ int pmd_set_huge(pmd_t *pmd, phys_addr_t addr, pgprot_t prot) return 1; } -/** - * pud_clear_huge - clear kernel PUD mapping when it is set - * - * Returns 1 on success and 0 on failure (no PUD map is found). - */ -int pud_clear_huge(pud_t *pud) -{ - if (pud_large(*pud)) { - pud_clear(pud); - return 1; - } - - return 0; -} - /** * pmd_clear_huge - clear kernel PMD mapping when it is set * @@ -779,6 +782,7 @@ int pmd_clear_huge(pmd_t *pmd) return 0; } +#endif #ifdef CONFIG_X86_64 /** -- 2.25.0
Re: [FSL P50x0] Xorg always restarts again and again after the the PowerPC updates 5.13-1
Le 13/05/2021 à 18:35, Christian Zigotzky a écrit : On 13 May 2021 at 5:51pm, Christophe Leroy wrote: Le 13/05/2021 à 17:19, Christian Zigotzky a écrit : On 13 May 2021 at 12:01 pm, Christophe Leroy wrote: Le 13/05/2021 à 08:47, Christian Zigotzky a écrit : Hi Christophe, On 9. May 2021, at 19:37, Christophe Leroy wrote: Did I do an error in my analysis ? No, you didn’t. On the contrary you have found the issue. ;-) The issue is somewhere in the new interrupt code. I'm not convinced, but let's give it a try. ZZ | * | ceff77efa4f8 powerpc/64e/interrupt: Use new interrupt context tracking scheme Could you please create a patch for reverting the new interrupt code? I would like to confirm your result. Please fetch https://github.com/chleroy/linux.git and try the branch for_christian. This is a revert of approx a dozen of commits around the changes to 64e on top of v5.13-rc1. If the top of the branch works for you, it would be great if you can find out which one of the reverts fixes the problem for real. Thanks Christophe It's working! Great! I can use the RC1 on my FSL P5040. Thank you! The issue is definitely somewhere in the interrupt code. Please tell me the next steps. Can you bisect between 5.13-rc1 and the top of the 'for_christian' branch to find the first 'good' commit ? Take care it is an "up side down" bisect, a 'good' is one that does _not_ work and a 'bad' is a commit that works. git bisect start git bisect bad 1c8f441f1485 git bisect good 6efb943b8616 Christophe Hi Christophe, Yes, I can. Shall I use the branch 'for_christian' or the default linux git for bisecting? I have tried it already with the branch 'for_christian' but it doesn't compile. 
git bisect start git bisect bad 1c8f441f1485 git bisect good 6efb943b8616 Output: [d66b1d1aab0c1caad11eca417f515b86ec0bebe9] Revert "powerpc/64e/interrupt: Use new interrupt context tracking scheme" Result: arch/powerpc/kernel/interrupt.o: In function `.syscall_exit_prepare': interrupt.c:(.text+0x278): undefined reference to `.schedule_user' arch/powerpc/kernel/interrupt.o: In function `.interrupt_exit_user_prepare': interrupt.c:(.text+0x340): undefined reference to `.schedule_user' Makefile:1191: recipe for target 'vmlinux' failed make: *** [vmlinux] Error 1 Ah yes, I remember this problem. Can you select CONFIG_VIRT_CPU_ACCOUNTING_GEN in your configuration ? Otherwise, I can try to fix the branch. Christophe
Re: [FSL P50x0] Xorg always restarts again and again after the the PowerPC updates 5.13-1
On 13 May 2021 at 5:51pm, Christophe Leroy wrote: Le 13/05/2021 à 17:19, Christian Zigotzky a écrit : On 13 May 2021 at 12:01 pm, Christophe Leroy wrote: Le 13/05/2021 à 08:47, Christian Zigotzky a écrit : Hi Christophe, On 9. May 2021, at 19:37, Christophe Leroy wrote: Did I do an error in my analysis ? No, you didn’t. On the contrary you have found the issue. ;-) The issue is somewhere in the new interrupt code. I'm not convinced, but let's give it a try. ZZ | * | ceff77efa4f8 powerpc/64e/interrupt: Use new interrupt context tracking scheme Could you please create a patch for reverting the new interrupt code? I would like to confirm your result. Please fetch https://github.com/chleroy/linux.git and try the branch for_christian. This is a revert of approx a dozen of commits around the changes to 64e on top of v5.13-rc1. If the top of the branch works for you, it would be great if you can find out which one of the reverts fixes the problem for real. Thanks Christophe It's working! Great! I can use the RC1 on my FSL P5040. Thank you! The issue is definitely somewhere in the interrupt code. Please tell me the next steps. Can you bisect between 5.13-rc1 and the top of the 'for_christian' branch to find the first 'good' commit ? Take care it is an "up side down" bisect, a 'good' is one that does _not_ work and a 'bad' is a commit that works. git bisect start git bisect bad 1c8f441f1485 git bisect good 6efb943b8616 Christophe Hi Christophe, Yes, I can. Shall I use the branch 'for_christian' or the default linux git for bisecting? I have tried it already with the branch 'for_christian' but it doesn't compile. 
git bisect start git bisect bad 1c8f441f1485 git bisect good 6efb943b8616 Output: [d66b1d1aab0c1caad11eca417f515b86ec0bebe9] Revert "powerpc/64e/interrupt: Use new interrupt context tracking scheme" Result: arch/powerpc/kernel/interrupt.o: In function `.syscall_exit_prepare': interrupt.c:(.text+0x278): undefined reference to `.schedule_user' arch/powerpc/kernel/interrupt.o: In function `.interrupt_exit_user_prepare': interrupt.c:(.text+0x340): undefined reference to `.schedule_user' Makefile:1191: recipe for target 'vmlinux' failed make: *** [vmlinux] Error 1 Thanks, Christian
Re: [FSL P50x0] Xorg always restarts again and again after the the PowerPC updates 5.13-1
Le 13/05/2021 à 17:19, Christian Zigotzky a écrit : On 13 May 2021 at 12:01 pm, Christophe Leroy wrote: Le 13/05/2021 à 08:47, Christian Zigotzky a écrit : Hi Christophe, On 9. May 2021, at 19:37, Christophe Leroy wrote: Did I do an error in my analysis ? No, you didn’t. On the contrary you have found the issue. ;-) The issue is somewhere in the new interrupt code. I'm not convinced, but let's give it a try. ZZ | * | ceff77efa4f8 powerpc/64e/interrupt: Use new interrupt context tracking scheme Could you please create a patch for reverting the new interrupt code? I would like to confirm your result. Please fetch https://github.com/chleroy/linux.git and try the branch for_christian. This is a revert of approx a dozen of commits around the changes to 64e on top of v5.13-rc1. If the top of the branch works for you, it would be great if you can find out which one of the reverts fixes the problem for real. Thanks Christophe It's working! Great! I can use the RC1 on my FSL P5040. Thank you! The issue is definitely somewhere in the interrupt code. Please tell me the next steps. Can you bisect between 5.13-rc1 and the top of the 'for_christian' branch to find the first 'good' commit ? Take care it is an "up side down" bisect, a 'good' is one that does _not_ work and a 'bad' is a commit that works. git bisect start git bisect bad 1c8f441f1485 git bisect good 6efb943b8616 Christophe
Re: [FSL P50x0] Xorg always restarts again and again after the the PowerPC updates 5.13-1
On 13 May 2021 at 12:01 pm, Christophe Leroy wrote: Le 13/05/2021 à 08:47, Christian Zigotzky a écrit : Hi Christophe, On 9. May 2021, at 19:37, Christophe Leroy wrote: Did I do an error in my analysis ? No, you didn’t. On the contrary you have found the issue. ;-) The issue is somewhere in the new interrupt code. I'm not convinced, but let's give it a try. ZZ | * | ceff77efa4f8 powerpc/64e/interrupt: Use new interrupt context tracking scheme Could you please create a patch for reverting the new interrupt code? I would like to confirm your result. Please fetch https://github.com/chleroy/linux.git and try the branch for_christian. This is a revert of approx a dozen of commits around the changes to 64e on top of v5.13-rc1. If the top of the branch works for you, it would be great if you can find out which one of the reverts fixes the problem for real. Thanks Christophe It's working! Great! I can use the RC1 on my FSL P5040. Thank you! The issue is definitely somewhere in the interrupt code. Please tell me the next steps. - Christian
[PATCH 1/2] powerpc/64s: Fix entry flush patching w/strict RWX & hash
The entry flush mitigation can be enabled/disabled at runtime. When this happens it results in the kernel patching its own instructions to enable/disable the mitigation sequence. With strict kernel RWX enabled instruction patching happens via a secondary mapping of the kernel text, so that we don't have to make the primary mapping writable. With the hash MMU this leads to a hash fault, which causes us to execute the exception entry which contains the entry flush mitigation. This means we end up executing the entry flush in a semi-patched state, ie. after we have patched the first instruction but before we patch the second or third instruction of the sequence. On machines with updated firmware the entry flush is a series of special nops, and it's safe to execute in a semi-patched state. However when using the fallback flush the sequence is mflr/branch/mtlr, and so it's not safe to execute if we have patched out the mflr but not the other two instructions. Doing so leads to us corrupting LR, leading to an oops, for example: # echo 0 > /sys/kernel/debug/powerpc/entry_flush kernel tried to execute exec-protected page (c2971000) - exploit attempt? (uid: 0) BUG: Unable to handle kernel instruction fetch Faulting instruction address: 0xc2971000 Oops: Kernel access of bad area, sig: 11 [#1] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries CPU: 0 PID: 2215 Comm: bash Not tainted 5.13.0-rc1-00010-gda3bb206c9ce #1 NIP: c2971000 LR: c2971000 CTR: c0120c40 REGS: c00013243840 TRAP: 0400 Not tainted (5.13.0-rc1-00010-gda3bb206c9ce) MSR: 800010009033 CR: 48428482 XER: ... 
NIP 0xc2971000 LR 0xc2971000 Call Trace: do_patch_instruction+0xc4/0x340 (unreliable) do_entry_flush_fixups+0x100/0x3b0 entry_flush_set+0x50/0xe0 simple_attr_write+0x160/0x1a0 full_proxy_write+0x8c/0x110 vfs_write+0xf0/0x340 ksys_write+0x84/0x140 system_call_exception+0x164/0x2d0 system_call_common+0xec/0x278 The simplest fix is to change the order in which we patch the instructions, so that the sequence is always safe to execute. For the non-fallback flushes it doesn't matter what order we patch in. Fixes: bd573a81312f ("powerpc/mm/64s: Allow STRICT_KERNEL_RWX again") Signed-off-by: Michael Ellerman --- arch/powerpc/lib/feature-fixups.c | 56 ++- 1 file changed, 40 insertions(+), 16 deletions(-) diff --git a/arch/powerpc/lib/feature-fixups.c b/arch/powerpc/lib/feature-fixups.c index 0aefa6a4a259..b49bb41e3ec5 100644 --- a/arch/powerpc/lib/feature-fixups.c +++ b/arch/powerpc/lib/feature-fixups.c @@ -325,6 +325,28 @@ static int __do_entry_flush_fixups(void *data) if (types & L1D_FLUSH_MTTRIG) instrs[i++] = 0x7c12dba6; /* mtspr TRIG2,r0 (SPR #882) */ + /* +* If we're patching in or out the fallback flush we need to be careful about the +* order in which we patch instructions. That's because it's possible we could +* take a page fault after patching one instruction, so the sequence of +* instructions must be safe even in a half patched state. +* +* To make that work, when patching in the fallback flush we patch in this order: +* - the mflr (dest) +* - the mtlr (dest + 2) +* - the branch(dest + 1) +* +* That ensures the sequence is safe to execute at any point. In contrast if we +* patch the mtlr last, it's possible we could return from the branch and not +* restore LR, leading to a crash later. 
+* +* When patching out the fallback flush (either with nops or another flush type), +* we patch in this order: +* - the branch(dest + 1) +* - the mtlr (dest + 2) +* - the mflr (dest) +*/ + start = PTRRELOC(&__start___entry_flush_fixup); end = PTRRELOC(&__stop___entry_flush_fixup); for (i = 0; start < end; start++, i++) { @@ -332,15 +354,16 @@ static int __do_entry_flush_fixups(void *data) pr_devel("patching dest %lx\n", (unsigned long)dest); - patch_instruction((struct ppc_inst *)dest, ppc_inst(instrs[0])); - - if (types == L1D_FLUSH_FALLBACK) - patch_branch((struct ppc_inst *)(dest + 1), (unsigned long)_flush_fallback, -BRANCH_SET_LINK); - else + if (types == L1D_FLUSH_FALLBACK) { + patch_instruction((struct ppc_inst *)dest, ppc_inst(instrs[0])); + patch_instruction((struct ppc_inst *)(dest + 2), ppc_inst(instrs[2])); + patch_branch((struct ppc_inst *)(dest + 1), +(unsigned long)_flush_fallback, BRANCH_SET_LINK); + } else {
[PATCH 2/2] powerpc/64s: Fix stf mitigation patching w/strict RWX & hash
The stf entry barrier fallback is unsafe to execute in a semi-patched state, which can happen when enabling/disabling the mitigation with strict kernel RWX enabled and using the hash MMU. See the previous commit for more details. Fix it by changing the order in which we patch the instructions. Note the stf barrier fallback is only used on Power6 or earlier. Fixes: bd573a81312f ("powerpc/mm/64s: Allow STRICT_KERNEL_RWX again") Signed-off-by: Michael Ellerman --- arch/powerpc/lib/feature-fixups.c | 20 ++-- 1 file changed, 10 insertions(+), 10 deletions(-) diff --git a/arch/powerpc/lib/feature-fixups.c b/arch/powerpc/lib/feature-fixups.c index b49bb41e3ec5..71032475aa40 100644 --- a/arch/powerpc/lib/feature-fixups.c +++ b/arch/powerpc/lib/feature-fixups.c @@ -150,17 +150,17 @@ static void do_stf_entry_barrier_fixups(enum stf_barrier_type types) pr_devel("patching dest %lx\n", (unsigned long)dest); - patch_instruction((struct ppc_inst *)dest, ppc_inst(instrs[0])); - - if (types & STF_BARRIER_FALLBACK) + // See comment in do_entry_flush_fixups() RE order of patching + if (types & STF_BARRIER_FALLBACK) { + patch_instruction((struct ppc_inst *)dest, ppc_inst(instrs[0])); + patch_instruction((struct ppc_inst *)(dest + 2), ppc_inst(instrs[2])); patch_branch((struct ppc_inst *)(dest + 1), -(unsigned long)_barrier_fallback, -BRANCH_SET_LINK); - else - patch_instruction((struct ppc_inst *)(dest + 1), - ppc_inst(instrs[1])); - - patch_instruction((struct ppc_inst *)(dest + 2), ppc_inst(instrs[2])); +(unsigned long)_barrier_fallback, BRANCH_SET_LINK); + } else { + patch_instruction((struct ppc_inst *)(dest + 1), ppc_inst(instrs[1])); + patch_instruction((struct ppc_inst *)(dest + 2), ppc_inst(instrs[2])); + patch_instruction((struct ppc_inst *)dest, ppc_inst(instrs[0])); + } } printk(KERN_DEBUG "stf-barrier: patched %d entry locations (%s barrier)\n", i, -- 2.25.1
[PATCH 0/8] xen: harden frontends against malicious backends
Xen backends of para-virtualized devices can live in dom0 kernel, dom0 user land, or in a driver domain. This means that a backend might reside in a less trusted environment than the Xen core components, so a backend should not be able to do harm to a Xen guest (it can still mess up I/O data, but it shouldn't be able to e.g. crash a guest by other means or cause a privilege escalation in the guest). Unfortunately many frontends in the Linux kernel are fully trusting their respective backends. This series is starting to fix the most important frontends: console, disk and network. It was discussed to handle this as a security problem, but the topic was discussed in public before, so it isn't a real secret. Juergen Gross (8): xen: sync include/xen/interface/io/ring.h with Xen's newest version xen/blkfront: read response from backend only once xen/blkfront: don't take local copy of a request from the ring page xen/blkfront: don't trust the backend response data blindly xen/netfront: read response from backend only once xen/netfront: don't read data from request on the ring page xen/netfront: don't trust the backend response data blindly xen/hvc: replace BUG_ON() with negative return value drivers/block/xen-blkfront.c| 118 +- drivers/net/xen-netfront.c | 184 ++--- drivers/tty/hvc/hvc_xen.c | 15 +- include/xen/interface/io/ring.h | 278 ++-- 4 files changed, 369 insertions(+), 226 deletions(-) -- 2.26.2
Re: [PATCH 01/11] PCI: Use sysfs_emit() and sysfs_emit_at() in "show" functions
Hi Logan, [...] > Thanks, this is a great cleanup. I've reviewed the entire series. > > Reviewed-by: Logan Gunthorpe Thank you! Appreciate it! > I agree that the new lines that are missing should be added. While working on the simple change to add the missing new lines, I've found that we have slight inconsistency with handling these in one of the attributes, as per: # uname -r 5.13.0-rc1-00092-gc06a2ba62fc4 # grep -oE 'pci=resource_alignment.+' /proc/cmdline pci=resource_alignment=20@00:1f.2 # cat /sys/bus/pci/resource_alignment 20@00:1f. # echo 20@00:1f.2 > /sys/bus/pci/resource_alignment # cat /sys/bus/pci/resource_alignment 20@00:1f.2 # echo -n 20@00:1f.2 > /sys/bus/pci/resource_alignment # cat /sys/bus/pci/resource_alignment 20@00:1f. I adjusted the code so that we handle the input better in the upcoming v2 - the change is simple, but since it adds more code, I'll drop your "Reviewed-by", so that we can make sure that subsequent review will cover this new change. Sorry for troubles in advance! Krzysztof
Re: [RFC 1/4] drivers/nvdimm: Add perf interface to expose nvdimm performance stats
On 5/12/21 10:57 PM, Peter Zijlstra wrote: > On Wed, May 12, 2021 at 10:08:21PM +0530, Kajol Jain wrote: >> +static void nvdimm_pmu_read(struct perf_event *event) >> +{ >> +struct nvdimm_pmu *nd_pmu = to_nvdimm_pmu(event->pmu); >> + >> +/* jump to arch/platform specific callbacks if any */ >> +if (nd_pmu && nd_pmu->read) >> +nd_pmu->read(event, nd_pmu->dev); >> +} >> + >> +static void nvdimm_pmu_del(struct perf_event *event, int flags) >> +{ >> +struct nvdimm_pmu *nd_pmu = to_nvdimm_pmu(event->pmu); >> + >> +/* jump to arch/platform specific callbacks if any */ >> +if (nd_pmu && nd_pmu->del) >> +nd_pmu->del(event, flags, nd_pmu->dev); >> +} >> + >> +static int nvdimm_pmu_add(struct perf_event *event, int flags) >> +{ >> +struct nvdimm_pmu *nd_pmu = to_nvdimm_pmu(event->pmu); >> + >> +if (flags & PERF_EF_START) >> +/* jump to arch/platform specific callbacks if any */ >> +if (nd_pmu && nd_pmu->add) >> +return nd_pmu->add(event, flags, nd_pmu->dev); >> +return 0; >> +} > > What's the value add here? Why can't you directly set driver pointers? I > also don't really believe ->{add,del,read} can be optional and still > have a sane driver. > Hi Peter, The intent of adding these callbacks is to give flexibility to the arch/platform specific driver code to use its own routine for getting counter data or specific checks/operations. Arch/platform driver code would have a different method to get the counter data, like the IBM pseries nmem* device which uses a hypervisor call (hcall). But yes the current read/add/del functions are not adding value. We could add an arch/platform specific function which could handle the capturing of the counter data and do the rest of the operation here, is this approach better? Thanks, Kajol Jain
[PATCH 4/4] KVM: PPC: Book3S HV P9: Implement the rest of the P9 path in C
Almost all logic is moved to C, by introducing a new in_guest mode for the P9 path that branches very early in the KVM interrupt handler to P9 exit code. The main P9 entry and exit assembly is now only about 160 lines of low level stack setup and register save/restore, plus a bad-interrupt handler. There are two motivations for this. The first is simply to make the code more maintainable by having it in C. The second is to reduce the amount of code running in a special KVM mode, "realmode". In quotes because with radix it is no longer necessarily real-mode in the MMU, but it still has to be treated specially because it may be in real-mode, and has various important registers like PID, DEC, TB, etc set to guest. This is hostile to the rest of Linux and can't use arbitrary kernel functionality or be instrumented well. This initial patch is a reasonably faithful conversion of the asm code, but it does lack any loop to return quickly back into the guest without switching out of realmode in the case of unimportant or easily handled interrupts. As explained in previous changes, handling HV interrupts very quickly in this low level realmode is not so important for P9 performance, and such handlers are important to avoid for security, observability and debuggability reasons. 
Signed-off-by: Nicholas Piggin --- arch/powerpc/include/asm/asm-prototypes.h | 3 +- arch/powerpc/include/asm/kvm_asm.h| 1 + arch/powerpc/include/asm/kvm_book3s_64.h | 8 + arch/powerpc/include/asm/kvm_host.h | 7 +- arch/powerpc/kernel/security.c| 5 +- arch/powerpc/kvm/Makefile | 1 + arch/powerpc/kvm/book3s_64_entry.S| 254 ++ arch/powerpc/kvm/book3s_hv.c | 9 +- arch/powerpc/kvm/book3s_hv_p9_entry.c | 207 ++ arch/powerpc/kvm/book3s_hv_rmhandlers.S | 125 +-- 10 files changed, 496 insertions(+), 124 deletions(-) create mode 100644 arch/powerpc/kvm/book3s_hv_p9_entry.c diff --git a/arch/powerpc/include/asm/asm-prototypes.h b/arch/powerpc/include/asm/asm-prototypes.h index 1c7b75834e04..02ee6f5ac9fe 100644 --- a/arch/powerpc/include/asm/asm-prototypes.h +++ b/arch/powerpc/include/asm/asm-prototypes.h @@ -120,6 +120,7 @@ extern s32 patch__call_flush_branch_caches3; extern s32 patch__flush_count_cache_return; extern s32 patch__flush_link_stack_return; extern s32 patch__call_kvm_flush_link_stack; +extern s32 patch__call_kvm_flush_link_stack_p9; extern s32 patch__memset_nocache, patch__memcpy_nocache; extern long flush_branch_caches; @@ -140,7 +141,7 @@ void kvmhv_load_host_pmu(void); void kvmhv_save_guest_pmu(struct kvm_vcpu *vcpu, bool pmu_in_use); void kvmhv_load_guest_pmu(struct kvm_vcpu *vcpu); -int __kvmhv_vcpu_entry_p9(struct kvm_vcpu *vcpu); +void kvmppc_p9_enter_guest(struct kvm_vcpu *vcpu); long kvmppc_h_set_dabr(struct kvm_vcpu *vcpu, unsigned long dabr); long kvmppc_h_set_xdabr(struct kvm_vcpu *vcpu, unsigned long dabr, diff --git a/arch/powerpc/include/asm/kvm_asm.h b/arch/powerpc/include/asm/kvm_asm.h index a3633560493b..43b1788e1f93 100644 --- a/arch/powerpc/include/asm/kvm_asm.h +++ b/arch/powerpc/include/asm/kvm_asm.h @@ -147,6 +147,7 @@ #define KVM_GUEST_MODE_SKIP2 #define KVM_GUEST_MODE_GUEST_HV3 #define KVM_GUEST_MODE_HOST_HV 4 +#define KVM_GUEST_MODE_HV_FAST 5 /* ISA >= v3.0 host+guest radix, indep thr */ #define KVM_INST_FETCH_FAILED -1 diff --git 
a/arch/powerpc/include/asm/kvm_book3s_64.h b/arch/powerpc/include/asm/kvm_book3s_64.h index 9bb9bb370b53..c214bcffb441 100644 --- a/arch/powerpc/include/asm/kvm_book3s_64.h +++ b/arch/powerpc/include/asm/kvm_book3s_64.h @@ -153,9 +153,17 @@ static inline bool kvmhv_vcpu_is_radix(struct kvm_vcpu *vcpu) return radix; } +int __kvmhv_vcpu_entry_p9(struct kvm_vcpu *vcpu); + #define KVM_DEFAULT_HPT_ORDER 24 /* 16MB HPT by default */ #endif +/* + * Invalid HDSISR value which is used to indicate when HW has not set the reg. + * Used to work around an errata. + */ +#define HDSISR_CANARY 0x7fff + /* * We use a lock bit in HPTE dword 0 to synchronize updates and * accesses to each HPTE, and another bit to indicate non-present diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h index 1e83359f286b..69add9d662df 100644 --- a/arch/powerpc/include/asm/kvm_host.h +++ b/arch/powerpc/include/asm/kvm_host.h @@ -683,7 +683,12 @@ struct kvm_vcpu_arch { ulong fault_dar; u32 fault_dsisr; unsigned long intr_msr; - ulong fault_gpa;/* guest real address of page fault (POWER9) */ + /* +* POWER9 and later: fault_gpa contains the guest real address of page +* fault for a radix guest, or segment descriptor (equivalent to result +* from slbmfev of SLB entry that translated the EA) for hash guests. +*/ + ulong fault_gpa; #endif #ifdef CONFIG_BOOKE diff --git a/arch/powerpc/kernel/security.c
[PATCH 3/4] KVM: PPC: Book3S HV P9: Stop handling hcalls in real-mode in the P9 path
In the interest of minimising the amount of code that is run in "real-mode", don't handle hcalls in real mode in the P9 path. This requires some new handlers for H_CEDE and xics-on-xive to be added before xive is pulled or cede logic is checked. This introduces a change in radix guest behaviour where radix guests that execute 'sc 1' in userspace now get a privilege fault whereas previously the 'sc 1' would be reflected as a syscall interrupt to the guest kernel. That reflection is only required for hash guests that run PR KVM. Background: In POWER8 and earlier processors, it is very expensive to exit from the HV real mode context of a guest hypervisor interrupt, and switch to host virtual mode. On those processors, guest->HV interrupts reach the hypervisor with the MMU off because the MMU is loaded with guest context (LPCR, SDR1, SLB), and the other threads in the sub-core need to be pulled out of the guest too. Then the primary must save off guest state, invalidate SLB and ERAT, and load up host state before the MMU can be enabled to run in host virtual mode (~= regular Linux mode). Hash guests also require a lot of hcalls to run due to the nature of the MMU architecture and paravirtualisation design. The XICS interrupt controller requires hcalls to run. So KVM traditionally tries hard to avoid the full exit, by handling hcalls and other interrupts in real mode as much as possible. By contrast, POWER9 has independent MMU context per-thread, and in radix mode the hypervisor is in host virtual memory mode when the HV interrupt is taken. Radix guests do not require significant hcalls to manage their translations, and xive guests don't need hcalls to handle interrupts. So it's much less important from a performance standpoint to handle hcalls in real mode in P9. The TCE hcalls are performance critical, introduced for P8 in order to achieve 10GbE performance. 
They were not required on P9 (which was able to drive 40GbE networking
with just the virt mode hcalls), but performance is still quite
important. After later changes that add hash guest support to the P9
path, hash hcalls are also performance critical.

The full entry/exit performance can still be improved significantly; if
that is found to be inadequate then real-mode hcall handlers could be
reintroduced for the P9 path (in C), but this would be a last resort.

Reviewed-by: Alexey Kardashevskiy
Reviewed-by: Cédric Le Goater
Signed-off-by: Nicholas Piggin
---
 arch/powerpc/include/asm/kvm_ppc.h      |  6 ++
 arch/powerpc/kvm/book3s.c               |  6 ++
 arch/powerpc/kvm/book3s_hv.c            | 79 +
 arch/powerpc/kvm/book3s_hv_rmhandlers.S |  5 ++
 arch/powerpc/kvm/book3s_xive.c          | 64
 5 files changed, 149 insertions(+), 11 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
index 8c10c3427166..cb9e3c85c605 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -129,6 +129,7 @@ extern void kvmppc_core_vcpu_put(struct kvm_vcpu *vcpu);
 extern int kvmppc_core_prepare_to_enter(struct kvm_vcpu *vcpu);
 extern int kvmppc_core_pending_dec(struct kvm_vcpu *vcpu);
 extern void kvmppc_core_queue_machine_check(struct kvm_vcpu *vcpu, ulong flags);
+extern void kvmppc_core_queue_syscall(struct kvm_vcpu *vcpu);
 extern void kvmppc_core_queue_program(struct kvm_vcpu *vcpu, ulong flags);
 extern void kvmppc_core_queue_fpunavail(struct kvm_vcpu *vcpu);
 extern void kvmppc_core_queue_vec_unavail(struct kvm_vcpu *vcpu);
@@ -606,6 +607,7 @@ extern void kvmppc_free_pimap(struct kvm *kvm);
 extern int kvmppc_xics_rm_complete(struct kvm_vcpu *vcpu, u32 hcall);
 extern void kvmppc_xics_free_icp(struct kvm_vcpu *vcpu);
 extern int kvmppc_xics_hcall(struct kvm_vcpu *vcpu, u32 cmd);
+extern int kvmppc_xive_xics_hcall(struct kvm_vcpu *vcpu, u32 req);
 extern u64 kvmppc_xics_get_icp(struct kvm_vcpu *vcpu);
 extern int kvmppc_xics_set_icp(struct kvm_vcpu *vcpu, u64
icpval); extern int kvmppc_xics_connect_vcpu(struct kvm_device *dev, @@ -638,6 +640,8 @@ static inline int kvmppc_xics_enabled(struct kvm_vcpu *vcpu) static inline void kvmppc_xics_free_icp(struct kvm_vcpu *vcpu) { } static inline int kvmppc_xics_hcall(struct kvm_vcpu *vcpu, u32 cmd) { return 0; } +static inline int kvmppc_xive_xics_hcall(struct kvm_vcpu *vcpu, u32 req) + { return 0; } #endif #ifdef CONFIG_KVM_XIVE @@ -672,6 +676,7 @@ extern int kvmppc_xive_set_irq(struct kvm *kvm, int irq_source_id, u32 irq, int level, bool line_status); extern void kvmppc_xive_push_vcpu(struct kvm_vcpu *vcpu); extern void kvmppc_xive_pull_vcpu(struct kvm_vcpu *vcpu); +extern void kvmppc_xive_rearm_escalation(struct kvm_vcpu *vcpu); static inline int kvmppc_xive_enabled(struct kvm_vcpu *vcpu) { @@ -713,6 +718,7 @@ static inline int kvmppc_xive_set_irq(struct kvm *kvm, int irq_source_id, u32 ir
[PATCH 2/4] KVM: PPC: Book3S HV P9: Move radix MMU switching instructions together
Switching the MMU from radix<->radix mode is tricky particularly as the MMU can remain enabled and requires a certain sequence of SPR updates. Move these together into their own functions. This also includes the radix TLB check / flush because it's tied in to MMU switching due to tlbiel getting LPID from LPIDR. Reviewed-by: Alexey Kardashevskiy Signed-off-by: Nicholas Piggin --- arch/powerpc/kvm/book3s_hv.c | 62 1 file changed, 41 insertions(+), 21 deletions(-) diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c index 68914b26017b..287cd3e1b918 100644 --- a/arch/powerpc/kvm/book3s_hv.c +++ b/arch/powerpc/kvm/book3s_hv.c @@ -3478,12 +3478,49 @@ static noinline void kvmppc_run_core(struct kvmppc_vcore *vc) trace_kvmppc_run_core(vc, 1); } +static void switch_mmu_to_guest_radix(struct kvm *kvm, struct kvm_vcpu *vcpu, u64 lpcr) +{ + struct kvmppc_vcore *vc = vcpu->arch.vcore; + struct kvm_nested_guest *nested = vcpu->arch.nested; + u32 lpid; + + lpid = nested ? nested->shadow_lpid : kvm->arch.lpid; + + /* +* All the isync()s are overkill but trivially follow the ISA +* requirements. Some can likely be replaced with justification +* comment for why they are not needed. +*/ + isync(); + mtspr(SPRN_LPID, lpid); + isync(); + mtspr(SPRN_LPCR, lpcr); + isync(); + mtspr(SPRN_PID, vcpu->arch.pid); + isync(); + + /* TLBIEL must have LPIDR set, so set guest LPID before flushing. */ + kvmppc_check_need_tlb_flush(kvm, vc->pcpu, nested); +} + +static void switch_mmu_to_host_radix(struct kvm *kvm, u32 pid) +{ + isync(); + mtspr(SPRN_PID, pid); + isync(); + mtspr(SPRN_LPID, kvm->arch.host_lpid); + isync(); + mtspr(SPRN_LPCR, kvm->arch.host_lpcr); + isync(); +} + /* * Load up hypervisor-mode registers on P9. 
*/ static int kvmhv_load_hv_regs_and_go(struct kvm_vcpu *vcpu, u64 time_limit, unsigned long lpcr) { + struct kvm *kvm = vcpu->kvm; struct kvmppc_vcore *vc = vcpu->arch.vcore; s64 hdec; u64 tb, purr, spurr; @@ -3535,7 +3572,6 @@ static int kvmhv_load_hv_regs_and_go(struct kvm_vcpu *vcpu, u64 time_limit, } mtspr(SPRN_CIABR, vcpu->arch.ciabr); mtspr(SPRN_IC, vcpu->arch.ic); - mtspr(SPRN_PID, vcpu->arch.pid); mtspr(SPRN_PSSCR, vcpu->arch.psscr | PSSCR_EC | (local_paca->kvm_hstate.fake_suspend << PSSCR_FAKE_SUSPEND_LG)); @@ -3549,8 +3585,7 @@ static int kvmhv_load_hv_regs_and_go(struct kvm_vcpu *vcpu, u64 time_limit, mtspr(SPRN_AMOR, ~0UL); - mtspr(SPRN_LPCR, lpcr); - isync(); + switch_mmu_to_guest_radix(kvm, vcpu, lpcr); /* * P9 suppresses the HDEC exception when LPCR[HDICE] = 0, @@ -3593,7 +3628,6 @@ static int kvmhv_load_hv_regs_and_go(struct kvm_vcpu *vcpu, u64 time_limit, mtspr(SPRN_DAWR1, host_dawr1); mtspr(SPRN_DAWRX1, host_dawrx1); } - mtspr(SPRN_PID, host_pidr); /* * Since this is radix, do a eieio; tlbsync; ptesync sequence in @@ -3608,9 +3642,6 @@ static int kvmhv_load_hv_regs_and_go(struct kvm_vcpu *vcpu, u64 time_limit, if (cpu_has_feature(CPU_FTR_ARCH_31)) asm volatile(PPC_CP_ABORT); - mtspr(SPRN_LPID, vcpu->kvm->arch.host_lpid);/* restore host LPID */ - isync(); - vc->dpdes = mfspr(SPRN_DPDES); vc->vtb = mfspr(SPRN_VTB); mtspr(SPRN_DPDES, 0); @@ -3627,7 +3658,8 @@ static int kvmhv_load_hv_regs_and_go(struct kvm_vcpu *vcpu, u64 time_limit, } mtspr(SPRN_HDEC, 0x7fff); - mtspr(SPRN_LPCR, vcpu->kvm->arch.host_lpcr); + + switch_mmu_to_host_radix(kvm, host_pidr); return trap; } @@ -4181,7 +4213,7 @@ int kvmhv_run_single_vcpu(struct kvm_vcpu *vcpu, u64 time_limit, { struct kvm_run *run = vcpu->run; int trap, r, pcpu; - int srcu_idx, lpid; + int srcu_idx; struct kvmppc_vcore *vc; struct kvm *kvm = vcpu->kvm; struct kvm_nested_guest *nested = vcpu->arch.nested; @@ -4255,13 +4287,6 @@ int kvmhv_run_single_vcpu(struct kvm_vcpu *vcpu, u64 time_limit, vc->vcore_state 
= VCORE_RUNNING; trace_kvmppc_run_core(vc, 0); - if (cpu_has_feature(CPU_FTR_HVMODE)) { - lpid = nested ? nested->shadow_lpid : kvm->arch.lpid; - mtspr(SPRN_LPID, lpid); - isync(); - kvmppc_check_need_tlb_flush(kvm, pcpu, nested); - } - guest_enter_irqoff(); srcu_idx = srcu_read_lock(>srcu); @@ -4280,11 +4305,6 @@ int kvmhv_run_single_vcpu(struct kvm_vcpu *vcpu, u64 time_limit, srcu_read_unlock(>srcu, srcu_idx); - if (cpu_has_feature(CPU_FTR_HVMODE)) { - mtspr(SPRN_LPID,
[PATCH 1/4] KVM: PPC: Book3S HV P9: Move xive vcpu context management into kvmhv_p9_guest_entry
Move the xive management up so the low level register switching can be pushed further down in a later patch. XIVE MMIO CI operations can run in higher level code with machine checks, tracing, etc., available. Reviewed-by: Alexey Kardashevskiy Signed-off-by: Nicholas Piggin --- arch/powerpc/kvm/book3s_hv.c | 7 +++ 1 file changed, 3 insertions(+), 4 deletions(-) diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c index 8a31df067c65..68914b26017b 100644 --- a/arch/powerpc/kvm/book3s_hv.c +++ b/arch/powerpc/kvm/book3s_hv.c @@ -3558,15 +3558,11 @@ static int kvmhv_load_hv_regs_and_go(struct kvm_vcpu *vcpu, u64 time_limit, */ mtspr(SPRN_HDEC, hdec); - kvmppc_xive_push_vcpu(vcpu); - mtspr(SPRN_SRR0, vcpu->arch.shregs.srr0); mtspr(SPRN_SRR1, vcpu->arch.shregs.srr1); trap = __kvmhv_vcpu_entry_p9(vcpu); - kvmppc_xive_pull_vcpu(vcpu); - /* Advance host PURR/SPURR by the amount used by guest */ purr = mfspr(SPRN_PURR); spurr = mfspr(SPRN_SPURR); @@ -3764,7 +3760,10 @@ static int kvmhv_p9_guest_entry(struct kvm_vcpu *vcpu, u64 time_limit, trap = 0; } } else { + kvmppc_xive_push_vcpu(vcpu); trap = kvmhv_load_hv_regs_and_go(vcpu, time_limit, lpcr); + kvmppc_xive_pull_vcpu(vcpu); + } vcpu->arch.slb_max = 0; -- 2.23.0
[PATCH 0/4] KVM: PPC: Convert P9 HV path to C
This applies on top of these series: https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=238649 https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=238941 https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=238946 This was broken out from the large Cify series. Since that was last posted, changes are: - Rebase, reordering of patches, tweaking changelog and comments. - Changed P9 radix exist SLB sanitising to use 4x slbmte to clear rather than 8x slbmfee/slbmfev, which turns out to be faster (and is what today's asm code does) [from review from Paul]. - Renamed book3s_hv_interrupt.c to book3s_hv_p9_entry.c, which reduces confusion with book3s_hv_interrupts.S. - Fixed !HV compile [Alexey]. Nicholas Piggin (4): KVM: PPC: Book3S HV P9: Move xive vcpu context management into kvmhv_p9_guest_entry KVM: PPC: Book3S HV P9: Move radix MMU switching instructions together KVM: PPC: Book3S HV P9: Stop handling hcalls in real-mode in the P9 path KVM: PPC: Book3S HV P9: Implement the rest of the P9 path in C arch/powerpc/include/asm/asm-prototypes.h | 3 +- arch/powerpc/include/asm/kvm_asm.h| 1 + arch/powerpc/include/asm/kvm_book3s_64.h | 8 + arch/powerpc/include/asm/kvm_host.h | 7 +- arch/powerpc/include/asm/kvm_ppc.h| 6 + arch/powerpc/kernel/security.c| 5 +- arch/powerpc/kvm/Makefile | 1 + arch/powerpc/kvm/book3s.c | 6 + arch/powerpc/kvm/book3s_64_entry.S| 254 ++ arch/powerpc/kvm/book3s_hv.c | 155 + arch/powerpc/kvm/book3s_hv_p9_entry.c | 207 ++ arch/powerpc/kvm/book3s_hv_rmhandlers.S | 120 +- arch/powerpc/kvm/book3s_xive.c| 64 ++ 13 files changed, 683 insertions(+), 154 deletions(-) create mode 100644 arch/powerpc/kvm/book3s_hv_p9_entry.c -- 2.23.0
[PATCH kernel v3] powerpc/makefile: Do not redefine $(CPP) for preprocessor
The $(CPP) (do only preprocessing) macro is already defined in Makefile.
However POWERPC redefines it and adds $(KBUILD_CFLAGS), which results in
flags duplication. That is not a big deal by itself, except for flags
which depend on other flags and which the compiler checks as it parses
the command line.

Specifically, scripts/Makefile.build:304 generates ksyms for .S files.
If clang+llvm+sanitizer are enabled, this results in

  -emit-llvm-bc -fno-lto -flto -fvisibility=hidden \
  -fsanitize=cfi-mfcall -fno-lto ...

in the clang command line and triggers the error:

  clang-13: error: invalid argument '-fsanitize=cfi-mfcall' only allowed with '-flto'

This removes the unnecessary CPP redefinition, which works fine as in
most places KBUILD_CFLAGS is passed to $CPP anyway, except for
arch/powerpc/kernel/vdso64/vdso(32|64).lds. To fix the vdso, this does:

1. add -m(big|little)-endian to $CPP;
2. add target to $KBUILD_CPPFLAGS, as otherwise clang ignores
   -m(big|little)-endian if the building platform does not support big
   endian (such as x86).
Signed-off-by: Alexey Kardashevskiy --- Changes: v3: * moved vdso cleanup in a separate patch * only add target to KBUILD_CPPFLAGS for CLANG v2: * fix KBUILD_CPPFLAGS * add CLANG_FLAGS to CPPFLAGS --- Makefile | 1 + arch/powerpc/Makefile | 3 ++- 2 files changed, 3 insertions(+), 1 deletion(-) diff --git a/Makefile b/Makefile index 15b6476d0f89..5b545bef7653 100644 --- a/Makefile +++ b/Makefile @@ -576,6 +576,7 @@ CC_VERSION_TEXT = $(subst $(pound),,$(shell $(CC) --version 2>/dev/null | head - ifneq ($(findstring clang,$(CC_VERSION_TEXT)),) ifneq ($(CROSS_COMPILE),) CLANG_FLAGS+= --target=$(notdir $(CROSS_COMPILE:%-=%)) +KBUILD_CPPFLAGS+= --target=$(notdir $(CROSS_COMPILE:%-=%)) endif ifeq ($(LLVM_IAS),1) CLANG_FLAGS+= -integrated-as diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile index 3212d076ac6a..306bfd2797ad 100644 --- a/arch/powerpc/Makefile +++ b/arch/powerpc/Makefile @@ -76,6 +76,7 @@ endif ifdef CONFIG_CPU_LITTLE_ENDIAN KBUILD_CFLAGS += -mlittle-endian +KBUILD_CPPFLAGS+= -mlittle-endian KBUILD_LDFLAGS += -EL LDEMULATION:= lppc GNUTARGET := powerpcle @@ -83,6 +84,7 @@ MULTIPLEWORD := -mno-multiple KBUILD_CFLAGS_MODULE += $(call cc-option,-mno-save-toc-indirect) else KBUILD_CFLAGS += $(call cc-option,-mbig-endian) +KBUILD_CPPFLAGS += $(call cc-option,-mbig-endian) KBUILD_LDFLAGS += -EB LDEMULATION:= ppc GNUTARGET := powerpc @@ -208,7 +210,6 @@ KBUILD_CPPFLAGS += -I $(srctree)/arch/$(ARCH) $(asinstr) KBUILD_AFLAGS += $(AFLAGS-y) KBUILD_CFLAGS += $(call cc-option,-msoft-float) KBUILD_CFLAGS += -pipe $(CFLAGS-y) -CPP= $(CC) -E $(KBUILD_CFLAGS) CHECKFLAGS += -m$(BITS) -D__powerpc__ -D__powerpc$(BITS)__ ifdef CONFIG_CPU_BIG_ENDIAN -- 2.30.2
[PATCH] selftests/powerpc: Fix duplicate included pthread.h
Clean up the following includecheck warning:

./tools/testing/selftests/powerpc/tm/tm-vmx-unavail.c: pthread.h is
included more than once.

No functional change.

Reported-by: Abaci Robot
Signed-off-by: Jiapeng Chong
---
 tools/testing/selftests/powerpc/tm/tm-vmx-unavail.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/tools/testing/selftests/powerpc/tm/tm-vmx-unavail.c b/tools/testing/selftests/powerpc/tm/tm-vmx-unavail.c
index e2a0c07..9ef37a9 100644
--- a/tools/testing/selftests/powerpc/tm/tm-vmx-unavail.c
+++ b/tools/testing/selftests/powerpc/tm/tm-vmx-unavail.c
@@ -17,7 +17,6 @@
 #include
 #include
 #include
-#include
 #include "tm.h"
 #include "utils.h"
--
1.8.3.1
Re: [PATCH 8/8] xen/hvc: replace BUG_ON() with negative return value
On 13.05.21 12:25, Greg Kroah-Hartman wrote:
> On Thu, May 13, 2021 at 12:03:02PM +0200, Juergen Gross wrote:
>> Xen frontends shouldn't BUG() in case of illegal data received from
>> their backends. So replace the BUG_ON()s when reading illegal data from
>> the ring page with negative return values.
>>
>> Signed-off-by: Juergen Gross
>> ---
>>  drivers/tty/hvc/hvc_xen.c | 15 +++++++++++++--
>>  1 file changed, 13 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/tty/hvc/hvc_xen.c b/drivers/tty/hvc/hvc_xen.c
>> index 92c9a476defc..30d7ffb1e04c 100644
>> --- a/drivers/tty/hvc/hvc_xen.c
>> +++ b/drivers/tty/hvc/hvc_xen.c
>> @@ -86,6 +86,11 @@ static int __write_console(struct xencons_info *xencons,
>>  	cons = intf->out_cons;
>>  	prod = intf->out_prod;
>>  	mb();	/* update queue values before going on */
>> +
>> +	if (WARN_ONCE((prod - cons) > sizeof(intf->out),
>> +		      "Illegal ring page indices"))
>> +		return -EINVAL;
>
> How nice, you just rebooted on panic-on-warn systems :(
>
>> +
>>  	BUG_ON((prod - cons) > sizeof(intf->out));
>
> Why keep this line?

Failed to delete it, sorry.

> Please just fix this up properly, if userspace can trigger this, then
> both the WARN_ON() and BUG_ON() are not correct and need to be
> correctly handled.

It can be triggered by the console backend, but I agree a WARN isn't
the way to go here.


Juergen
Re: [PATCH 8/8] xen/hvc: replace BUG_ON() with negative return value
On Thu, May 13, 2021 at 12:03:02PM +0200, Juergen Gross wrote: > Xen frontends shouldn't BUG() in case of illegal data received from > their backends. So replace the BUG_ON()s when reading illegal data from > the ring page with negative return values. > > Signed-off-by: Juergen Gross > --- > drivers/tty/hvc/hvc_xen.c | 15 +-- > 1 file changed, 13 insertions(+), 2 deletions(-) > > diff --git a/drivers/tty/hvc/hvc_xen.c b/drivers/tty/hvc/hvc_xen.c > index 92c9a476defc..30d7ffb1e04c 100644 > --- a/drivers/tty/hvc/hvc_xen.c > +++ b/drivers/tty/hvc/hvc_xen.c > @@ -86,6 +86,11 @@ static int __write_console(struct xencons_info *xencons, > cons = intf->out_cons; > prod = intf->out_prod; > mb(); /* update queue values before going on */ > + > + if (WARN_ONCE((prod - cons) > sizeof(intf->out), > + "Illegal ring page indices")) > + return -EINVAL; How nice, you just rebooted on panic-on-warn systems :( > + > BUG_ON((prod - cons) > sizeof(intf->out)); Why keep this line? Please just fix this up properly, if userspace can trigger this, then both the WARN_ON() and BUG_ON() are not correct and need to be correctly handled. > > while ((sent < len) && ((prod - cons) < sizeof(intf->out))) > @@ -114,7 +119,10 @@ static int domU_write_console(uint32_t vtermno, const > char *data, int len) >*/ > while (len) { > int sent = __write_console(cons, data, len); > - > + > + if (sent < 0) > + return sent; > + > data += sent; > len -= sent; > > @@ -138,7 +146,10 @@ static int domU_read_console(uint32_t vtermno, char > *buf, int len) > cons = intf->in_cons; > prod = intf->in_prod; > mb(); /* get pointers before reading ring */ > - BUG_ON((prod - cons) > sizeof(intf->in)); > + > + if (WARN_ONCE((prod - cons) > sizeof(intf->in), > + "Illegal ring page indices")) > + return -EINVAL; Same here, you still just paniced a machine :( thanks, greg k-h
Re: [PATCH 8/8] xen/hvc: replace BUG_ON() with negative return value
On 13.05.21 12:16, Christophe Leroy wrote:
> Le 13/05/2021 à 12:03, Juergen Gross a écrit :
>> Xen frontends shouldn't BUG() in case of illegal data received from
>> their backends. So replace the BUG_ON()s when reading illegal data from
>> the ring page with negative return values.
>>
>> Signed-off-by: Juergen Gross
>> ---
>>  drivers/tty/hvc/hvc_xen.c | 15 +++++++++++++--
>>  1 file changed, 13 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/tty/hvc/hvc_xen.c b/drivers/tty/hvc/hvc_xen.c
>> index 92c9a476defc..30d7ffb1e04c 100644
>> --- a/drivers/tty/hvc/hvc_xen.c
>> +++ b/drivers/tty/hvc/hvc_xen.c
>> @@ -86,6 +86,11 @@ static int __write_console(struct xencons_info *xencons,
>>  	cons = intf->out_cons;
>>  	prod = intf->out_prod;
>>  	mb();	/* update queue values before going on */
>> +
>> +	if (WARN_ONCE((prod - cons) > sizeof(intf->out),
>> +		      "Illegal ring page indices"))
>> +		return -EINVAL;
>> +
>>  	BUG_ON((prod - cons) > sizeof(intf->out));
>
> Why keep the BUG_ON() ?

Oh, failed to delete it. Thanks for noticing.


Juergen
Re: [PATCH 8/8] xen/hvc: replace BUG_ON() with negative return value
Le 13/05/2021 à 12:03, Juergen Gross a écrit :
> Xen frontends shouldn't BUG() in case of illegal data received from
> their backends. So replace the BUG_ON()s when reading illegal data from
> the ring page with negative return values.
>
> Signed-off-by: Juergen Gross
> ---
>  drivers/tty/hvc/hvc_xen.c | 15 +++++++++++++--
>  1 file changed, 13 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/tty/hvc/hvc_xen.c b/drivers/tty/hvc/hvc_xen.c
> index 92c9a476defc..30d7ffb1e04c 100644
> --- a/drivers/tty/hvc/hvc_xen.c
> +++ b/drivers/tty/hvc/hvc_xen.c
> @@ -86,6 +86,11 @@ static int __write_console(struct xencons_info *xencons,
>  	cons = intf->out_cons;
>  	prod = intf->out_prod;
>  	mb();	/* update queue values before going on */
> +
> +	if (WARN_ONCE((prod - cons) > sizeof(intf->out),
> +		      "Illegal ring page indices"))
> +		return -EINVAL;
> +
>  	BUG_ON((prod - cons) > sizeof(intf->out));

Why keep the BUG_ON() ?

>  	while ((sent < len) && ((prod - cons) < sizeof(intf->out)))
> @@ -114,7 +119,10 @@ static int domU_write_console(uint32_t vtermno, const char *data, int len)
>  	 */
>  	while (len) {
>  		int sent = __write_console(cons, data, len);
> -
> +
> +		if (sent < 0)
> +			return sent;
> +
>  		data += sent;
>  		len -= sent;
>
> @@ -138,7 +146,10 @@ static int domU_read_console(uint32_t vtermno, char *buf, int len)
>  	cons = intf->in_cons;
>  	prod = intf->in_prod;
>  	mb();	/* get pointers before reading ring */
> -	BUG_ON((prod - cons) > sizeof(intf->in));
> +
> +	if (WARN_ONCE((prod - cons) > sizeof(intf->in),
> +		      "Illegal ring page indices"))
> +		return -EINVAL;
>
>  	while (cons != prod && recv < len)
>  		buf[recv++] = intf->in[MASK_XENCONS_IDX(cons++, intf->in)];
[powerpc:fixes-test] BUILD SUCCESS da3bb206c9ceb0736d9e2897ea697acabad35833
tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git fixes-test branch HEAD: da3bb206c9ceb0736d9e2897ea697acabad35833 KVM: PPC: Book3S HV: Fix kvm_unmap_gfn_range_hv() for Hash MMU elapsed time: 1183m configs tested: 9 configs skipped: 83 The following configs have been built successfully. More configs may be tested in the coming days. gcc tested configs: powerpc allyesconfig powerpc allmodconfig powerpc allnoconfig x86_64 allyesconfig x86_64rhel-8.3-kselftests x86_64 defconfig x86_64 rhel-8.3 x86_64 rhel-8.3-kbuiltin x86_64 kexec --- 0-DAY CI Kernel Test Service, Intel Corporation https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org
[PATCH 8/8] xen/hvc: replace BUG_ON() with negative return value
Xen frontends shouldn't BUG() in case of illegal data received from
their backends. So replace the BUG_ON()s when reading illegal data from
the ring page with negative return values.

Signed-off-by: Juergen Gross
---
 drivers/tty/hvc/hvc_xen.c | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/drivers/tty/hvc/hvc_xen.c b/drivers/tty/hvc/hvc_xen.c
index 92c9a476defc..30d7ffb1e04c 100644
--- a/drivers/tty/hvc/hvc_xen.c
+++ b/drivers/tty/hvc/hvc_xen.c
@@ -86,6 +86,11 @@ static int __write_console(struct xencons_info *xencons,
 	cons = intf->out_cons;
 	prod = intf->out_prod;
 	mb();	/* update queue values before going on */
+
+	if (WARN_ONCE((prod - cons) > sizeof(intf->out),
+		      "Illegal ring page indices"))
+		return -EINVAL;
+
 	BUG_ON((prod - cons) > sizeof(intf->out));

 	while ((sent < len) && ((prod - cons) < sizeof(intf->out)))
@@ -114,7 +119,10 @@ static int domU_write_console(uint32_t vtermno, const char *data, int len)
 	 */
 	while (len) {
 		int sent = __write_console(cons, data, len);
-
+
+		if (sent < 0)
+			return sent;
+
 		data += sent;
 		len -= sent;

@@ -138,7 +146,10 @@ static int domU_read_console(uint32_t vtermno, char *buf, int len)
 	cons = intf->in_cons;
 	prod = intf->in_prod;
 	mb();	/* get pointers before reading ring */
-	BUG_ON((prod - cons) > sizeof(intf->in));
+
+	if (WARN_ONCE((prod - cons) > sizeof(intf->in),
+		      "Illegal ring page indices"))
+		return -EINVAL;

 	while (cons != prod && recv < len)
 		buf[recv++] = intf->in[MASK_XENCONS_IDX(cons++, intf->in)];
--
2.26.2
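A note on the sanity check itself: the Xen console ring uses free-running 32-bit producer/consumer indices, so `prod - cons` in unsigned arithmetic yields the number of bytes in flight even after the counters wrap around. A value larger than the ring size can therefore only come from a corrupted or malicious backend, which is what the check (and the BUG_ON() it replaces) detects. Below is a minimal userspace sketch of that invariant; `RING_SIZE` and the helper name are illustrative, not taken from the driver:

```c
#include <stdint.h>

#define RING_SIZE 1024u  /* stand-in for sizeof(intf->out); illustrative */

/*
 * prod and cons are free-running indices that wrap modulo 2^32, so
 * the unsigned difference (prod - cons) is the number of bytes the
 * backend has produced but the frontend has not yet consumed, even
 * across wraparound. Anything above the ring size is illegal.
 */
static int ring_indices_valid(uint32_t cons, uint32_t prod)
{
	return (prod - cons) <= RING_SIZE;
}
```

The same wraparound property is why the driver masks indices with MASK_XENCONS_IDX() rather than resetting them.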
Re: [FSL P50x0] Xorg always restarts again and again after the PowerPC updates 5.13-1
Le 13/05/2021 à 08:47, Christian Zigotzky a écrit : Hi Christophe, On 9. May 2021, at 19:37, Christophe Leroy wrote: Did I do an error in my analysis ? No, you didn’t. On the contrary you have found the issue. ;-) The issue is somewhere in the new interrupt code. I'm not convinced, but let's give it a try. ZZ | * | ceff77efa4f8 powerpc/64e/interrupt: Use new interrupt context tracking scheme Could you please create a patch for reverting the new interrupt code? I would like to confirm your result. Please fetch https://github.com/chleroy/linux.git and try the branch for_christian. This is a revert of approx a dozen of commits around the changes to 64e on top of v5.13-rc1. If the top of the branch works for you, it would be great if you can find out which one of the reverts fixes the problem for real. Thanks Christophe
Re: [RFC 01/10] powerpc/rtas: new APIs for busy and extended delay statuses
On 04/05/2021 13:03, Nathan Lynch wrote: Add new APIs for handling busy (-2) and extended delay hint (9900...9905) statuses from RTAS. These are intended to be drop-in replacements for existing uses of rtas_busy_delay(). A problem with rtas_busy_delay() and rtas_busy_delay_time() is that they consider -2/busy to be equivalent to 9900 (wait 1ms). In fact, the OS should call again as soon as it wants on -2, which at least on PowerVM means RTAS is returning only to uphold the general requirement that RTAS must return control to the OS in a "timely fashion" (250us). Combine this with the fact that msleep(1) actually sleeps for more like 20ms in practice: on busy VMs we schedule away for much longer than necessary on -2 and 9900. This is fixed in rtas_sched_if_busy(), which uses usleep_range() for small delay hints, and only schedules away on -2 if there is other work available. It also refuses to sleep longer than one second regardless of the hinted value, on the assumption that even longer running operations can tolerate polling at 1HZ. rtas_spin_if_busy() and rtas_force_spin_if_busy() are provided for atomic contexts which need to handle busy status and extended delay hints. 
Signed-off-by: Nathan Lynch --- arch/powerpc/include/asm/rtas.h | 4 + arch/powerpc/kernel/rtas.c | 168 2 files changed, 172 insertions(+) diff --git a/arch/powerpc/include/asm/rtas.h b/arch/powerpc/include/asm/rtas.h index 9dc97d2f9d27..555ff3290f92 100644 --- a/arch/powerpc/include/asm/rtas.h +++ b/arch/powerpc/include/asm/rtas.h @@ -266,6 +266,10 @@ extern int rtas_set_rtc_time(struct rtc_time *rtc_time); extern unsigned int rtas_busy_delay_time(int status); extern unsigned int rtas_busy_delay(int status); +bool rtas_sched_if_busy(int status); +bool rtas_spin_if_busy(int status); +bool rtas_force_spin_if_busy(int status); + extern int early_init_dt_scan_rtas(unsigned long node, const char *uname, int depth, void *data); diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c index 6bada744402b..4a1dfbfa51ba 100644 --- a/arch/powerpc/kernel/rtas.c +++ b/arch/powerpc/kernel/rtas.c @@ -519,6 +519,174 @@ unsigned int rtas_busy_delay(int status) } EXPORT_SYMBOL(rtas_busy_delay); +/** + * rtas_force_spin_if_busy() - Consume a busy or extended delay status + * in atomic context. + * @status: Return value from rtas_call() or similar function. + * + * Use this function when you cannot avoid using an RTAS function + * which may return an extended delay hint in atomic context. If + * possible, use rtas_spin_if_busy() or rtas_sched_if_busy() instead + * of this function. + * + * Return: True if @status is -2 or 990x, in which case + * rtas_spin_if_busy() will have delayed an appropriate amount + * of time, and the caller should call the RTAS function + * again. False otherwise. + */ +bool rtas_force_spin_if_busy(int status) rtas_force_delay_if_busy()? neither this one nor rtas_spin_if_busy() actually spins. +{ + bool was_busy = true; + + switch (status) { + case RTAS_BUSY: + /* OK to call again immediately; do nothing. 
*/ + break; + case RTAS_EXTENDED_DELAY_MIN...RTAS_EXTENDED_DELAY_MAX: + mdelay(1); + break; + default: + was_busy = false; + break; + } + + return was_busy; +} + +/** + * rtas_spin_if_busy() - Consume a busy status in atomic context. + * @status: Return value from rtas_call() or similar function. + * + * Prefer rtas_sched_if_busy() over this function. Prefer this + * function over rtas_force_spin_if_busy(). Use this function in + * atomic contexts with RTAS calls that are specified to return -2 but + * not 990x. This function will complain and execute a minimal delay + * if passed a 990x status. + * + * Return: True if @status is -2 or 990x, in which case + * rtas_spin_if_busy() will have delayed an appropriate amount + * of time, and the caller should call the RTAS function + * again. False otherwise. + */ +bool rtas_spin_if_busy(int status) rtas_delay_if_busy()? +{ + bool was_busy = true; + + switch (status) { + case RTAS_BUSY: + /* OK to call again immediately; do nothing. */ + break; + case RTAS_EXTENDED_DELAY_MIN...RTAS_EXTENDED_DELAY_MAX: + /* +* Generally, RTAS functions which can return this +* status should be considered too expensive to use in +* atomic context. Change the calling code to use +* rtas_sched_if_busy(), or if that's not possible, +* use rtas_force_spin_if_busy(). +*/ + pr_warn_once("%pS may use RTAS call in atomic context which returns extended delay.\n", +__builtin_return_address(0)); +
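For reference, the extended delay statuses discussed in this series follow the PAPR convention where 990x hints a wait of 10^x milliseconds (9900 = 1ms up to 9905 = 100s), while plain -2 permits an immediate retry. A userspace sketch of that mapping follows; the function name is illustrative, and the series' rtas_sched_if_busy() additionally caps sleeps at one second:

```c
#define RTAS_BUSY                (-2)
#define RTAS_EXTENDED_DELAY_MIN  9900
#define RTAS_EXTENDED_DELAY_MAX  9905

/*
 * Map an RTAS busy status to a suggested delay in milliseconds:
 *   -2    -> 0 (OK to call again immediately)
 *   990x  -> 10^x ms (PAPR extended delay hint)
 *   other -> -1 (not a busy status)
 */
static long rtas_busy_delay_ms(int status)
{
	long ms = 1;
	int x;

	if (status == RTAS_BUSY)
		return 0;
	if (status < RTAS_EXTENDED_DELAY_MIN || status > RTAS_EXTENDED_DELAY_MAX)
		return -1;
	for (x = status - RTAS_EXTENDED_DELAY_MIN; x > 0; x--)
		ms *= 10;
	return ms;
}
```

This is why treating -2 like 9900, as the old rtas_busy_delay() does, over-sleeps: -2 carries no delay hint at all.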
[PATCH v2] mm/ioremap: Fix iomap_max_page_shift
iomap_max_page_shift is expected to contain a page shift, so it can't be
a 'bool'; it has to be an 'unsigned int'.

And fix the default values: P4D_SHIFT is when huge iomap is allowed.

However, on some architectures (eg: powerpc book3s/64), P4D_SHIFT is not
a constant so it can't be used to initialise a static variable. So,
initialise iomap_max_page_shift with a maximum shift supported by the
architecture; it is gated by P4D_SHIFT in vmap_try_huge_p4d() anyway.

Signed-off-by: Christophe Leroy
Fixes: bbc180a5adb0 ("mm: HUGE_VMAP arch support cleanup")
Reviewed-by: Nicholas Piggin
Reviewed-by: Anshuman Khandual
---
v2: Initialise with BITS_PER_LONG - 1 instead of P4D_SHIFT which is not
a constant on powerpc/64.
---
 mm/ioremap.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/mm/ioremap.c b/mm/ioremap.c
index d1dcc7e744ac..8ee0136f8cb0 100644
--- a/mm/ioremap.c
+++ b/mm/ioremap.c
@@ -16,16 +16,16 @@
 #include "pgalloc-track.h"

 #ifdef CONFIG_HAVE_ARCH_HUGE_VMAP
-static bool __ro_after_init iomap_max_page_shift = PAGE_SHIFT;
+static unsigned int __ro_after_init iomap_max_page_shift = BITS_PER_LONG - 1;

 static int __init set_nohugeiomap(char *str)
 {
-	iomap_max_page_shift = P4D_SHIFT;
+	iomap_max_page_shift = PAGE_SHIFT;
 	return 0;
 }
 early_param("nohugeiomap", set_nohugeiomap);
 #else /* CONFIG_HAVE_ARCH_HUGE_VMAP */
-static const bool iomap_max_page_shift = PAGE_SHIFT;
+static const unsigned int iomap_max_page_shift = PAGE_SHIFT;
 #endif /* CONFIG_HAVE_ARCH_HUGE_VMAP */

 int ioremap_page_range(unsigned long addr,
--
2.25.0
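To illustrate the bool-vs-shift bug this patch fixes: a page shift such as 12 stored in a bool collapses to 1, so any comparison against a mapping level's shift stops working. A userspace sketch, with illustrative names and shift values:

```c
#include <stdbool.h>

/*
 * The bug: assigning a page shift (e.g. a PAGE_SHIFT of 12) to a bool
 * collapses any nonzero value to 1, losing the actual shift.
 */
static unsigned int shift_stored_in_bool(unsigned int shift)
{
	bool b = shift;
	return b;
}

/*
 * What the fixed code effectively does: a mapping level is permitted
 * only if its shift does not exceed the configured maximum shift.
 */
static bool level_allowed(unsigned int level_shift, unsigned int max_shift)
{
	return level_shift <= max_shift;
}
```

With the fix, the default maximum of BITS_PER_LONG - 1 permits every level (the real gating happens per-level in the vmap code), while "nohugeiomap" lowers the maximum to PAGE_SHIFT so only base pages pass the check.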
Re: [PATCH v2] powerpc/papr_scm: Make 'perf_stats' invisible if perf-stats unavailable
Thanks for looking into this patch Dan,

Dan Williams writes:

> On Fri, May 7, 2021 at 4:40 AM Vaibhav Jain wrote:
>>
>> In case performance stats for an nvdimm are not available, reading the
>> 'perf_stats' sysfs file returns an -ENOENT error. A better approach is
>> to make the 'perf_stats' file entirely invisible to indicate that
>> performance stats for an nvdimm are unavailable.
>>
>> So this patch updates 'papr_nd_attribute_group' to add a 'is_visible'
>> callback implemented as newly introduced 'papr_nd_attribute_visible()'
>> that returns an appropriate mode in case performance stats aren't
>> supported in a given nvdimm.
>>
>> Also the initialization of 'papr_scm_priv.stat_buffer_len' is moved
>> from papr_scm_nvdimm_init() to papr_scm_probe() so that its value is
>> available when 'papr_nd_attribute_visible()' is called during nvdimm
>> initialization.
>>
>> Fixes: 2d02bf835e57('powerpc/papr_scm: Fetch nvdimm performance stats from
>> PHYP')
>
> Since this has been the exposed ABI since v5.9 perhaps a note /
> analysis is needed here that the disappearance of this file will not
> break any existing scripts/tools.
>
Yes, I have added a note in patch description and also updated
Documentation/ABI/testing/sysfs-bus-papr-pmem with a note on when this
sysfs attr would be available. Thanks for this suggestion.
>> Signed-off-by: Vaibhav Jain >> --- >> Changelog: >> >> v2: >> * Removed a redundant dev_info() from papr_scm_nvdimm_init() [ Aneesh ] >> * Marked papr_nd_attribute_visible() as static which also fixed the >> build warning reported by kernel build robot >> --- >> arch/powerpc/platforms/pseries/papr_scm.c | 35 --- >> 1 file changed, 24 insertions(+), 11 deletions(-) >> >> diff --git a/arch/powerpc/platforms/pseries/papr_scm.c >> b/arch/powerpc/platforms/pseries/papr_scm.c >> index e2b69cc3beaf..11e7b90a3360 100644 >> --- a/arch/powerpc/platforms/pseries/papr_scm.c >> +++ b/arch/powerpc/platforms/pseries/papr_scm.c >> @@ -907,6 +907,20 @@ static ssize_t flags_show(struct device *dev, >> } >> DEVICE_ATTR_RO(flags); >> >> +static umode_t papr_nd_attribute_visible(struct kobject *kobj, >> +struct attribute *attr, int n) >> +{ >> + struct device *dev = container_of(kobj, typeof(*dev), kobj); > > This can use the kobj_to_dev() helper. > Fixed in v3 >> + struct nvdimm *nvdimm = to_nvdimm(dev); >> + struct papr_scm_priv *p = nvdimm_provider_data(nvdimm); >> + >> + /* For if perf-stats not available remove perf_stats sysfs */ >> + if (attr == &dev_attr_perf_stats.attr && p->stat_buffer_len == 0) >> + return 0; >> + >> + return attr->mode; >> +} >> + >> /* papr_scm specific dimm attributes */ >> static struct attribute *papr_nd_attributes[] = { >> &dev_attr_flags.attr, >> @@ -916,6 +930,7 @@ static struct attribute *papr_nd_attributes[] = { >> >> static struct attribute_group papr_nd_attribute_group = { >> .name = "papr", >> + .is_visible = papr_nd_attribute_visible, >> .attrs = papr_nd_attributes, >> }; >> >> @@ -931,7 +946,6 @@ static int papr_scm_nvdimm_init(struct papr_scm_priv *p) >> struct nd_region_desc ndr_desc; >> unsigned long dimm_flags; >> int target_nid, online_nid; >> - ssize_t stat_size; >> >> p->bus_desc.ndctl = papr_scm_ndctl; >> p->bus_desc.module = THIS_MODULE; >> @@ -1016,16 +1030,6 @@ static int papr_scm_nvdimm_init(struct papr_scm_priv >> *p) >> 
list_add_tail(&p->region_list, &papr_nd_regions); >> mutex_unlock(&papr_ndr_lock); >> >> - /* Try retriving the stat buffer and see if its supported */ >> - stat_size = drc_pmem_query_stats(p, NULL, 0); >> - if (stat_size > 0) { >> - p->stat_buffer_len = stat_size; >> - dev_dbg(&p->pdev->dev, "Max perf-stat size %lu-bytes\n", >> - p->stat_buffer_len); >> - } else { >> - dev_info(&p->pdev->dev, "Dimm performance stats >> unavailable\n"); >> - } >> - >> return 0; >> >> err: nvdimm_bus_unregister(p->bus); >> @@ -1102,6 +1106,7 @@ static int papr_scm_probe(struct platform_device *pdev) >> u64 blocks, block_size; >> struct papr_scm_priv *p; >> const char *uuid_str; >> + ssize_t stat_size; >> u64 uuid[2]; >> int rc; >> >> @@ -1179,6 +1184,14 @@ static int papr_scm_probe(struct platform_device >> *pdev) >> p->res.name = pdev->name; >> p->res.flags = IORESOURCE_MEM; >> >> + /* Try retriving the stat buffer and see if its supported */ > > s/retriving/retrieving/ > Fixed in v3. >> + stat_size = drc_pmem_query_stats(p, NULL, 0); >> + if (stat_size > 0) { >> + p->stat_buffer_len = stat_size; >> + dev_dbg(&p->pdev->dev, "Max perf-stat size %lu-bytes\n", >> + p->stat_buffer_len); >> + } >> + >> rc = papr_scm_nvdimm_init(p);
[powerpc:next-test] BUILD SUCCESS d64ef8451fbc85489928bea033a63d9c09fca786
tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next-test branch HEAD: d64ef8451fbc85489928bea033a63d9c09fca786 powerpc/32s: Remove asm/book3s/32/hash.h elapsed time: 1129m configs tested: 9 configs skipped: 83 The following configs have been built successfully. More configs may be tested in the coming days. gcc tested configs: powerpc allyesconfig powerpc allmodconfig powerpc allnoconfig x86_64 allyesconfig x86_64 rhel-8.3-kselftests x86_64 defconfig x86_64 rhel-8.3 x86_64 rhel-8.3-kbuiltin x86_64 kexec --- 0-DAY CI Kernel Test Service, Intel Corporation https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org
[PATCH v3] powerpc/papr_scm: Make 'perf_stats' invisible if perf-stats unavailable
In case performance stats for an nvdimm are not available, reading the 'perf_stats' sysfs file returns an -ENOENT error. A better approach is to make the 'perf_stats' file entirely invisible to indicate that performance stats for an nvdimm are unavailable. So this patch updates 'papr_nd_attribute_group' to add an 'is_visible' callback implemented as the newly introduced 'papr_nd_attribute_visible()' that returns an appropriate mode in case performance stats aren't supported in a given nvdimm. Also the initialization of 'papr_scm_priv.stat_buffer_len' is moved from papr_scm_nvdimm_init() to papr_scm_probe() so that its value is available when 'papr_nd_attribute_visible()' is called during nvdimm initialization. Even though the 'perf_stats' attribute has been available since v5.9, there are no known user-space tools/scripts that depend on the presence of this sysfs file. Hence I don't expect any user-space breakage with this patch. Fixes: 2d02bf835e57 ("powerpc/papr_scm: Fetch nvdimm performance stats from PHYP") Reviewed-by: Dan Williams Signed-off-by: Vaibhav Jain --- Changelog: v3: * Minor spell corrections [ Dan Williams ] * Switched to kobj_to_dev() helper in papr_nd_attribute_visible() [ Dan Williams ] * Updated ABI/../sysfs-bus-papr-pmem to indicate that the attribute is only available for devices that support performance stats. * Updated patch description. 
v2: * Removed a redundant dev_info() from papr_scm_nvdimm_init() [ Aneesh ] * Marked papr_nd_attribute_visible() as static which also fixed the build warning reported by kernel build robot --- Documentation/ABI/testing/sysfs-bus-papr-pmem | 8 +++-- arch/powerpc/platforms/pseries/papr_scm.c | 35 +-- 2 files changed, 29 insertions(+), 14 deletions(-) diff --git a/Documentation/ABI/testing/sysfs-bus-papr-pmem b/Documentation/ABI/testing/sysfs-bus-papr-pmem index 8316c33862a0..0aa02bf2bde5 100644 --- a/Documentation/ABI/testing/sysfs-bus-papr-pmem +++ b/Documentation/ABI/testing/sysfs-bus-papr-pmem @@ -39,9 +39,11 @@ KernelVersion: v5.9 Contact: linuxppc-dev , linux-nvd...@lists.01.org, Description: (RO) Report various performance stats related to papr-scm NVDIMM - device. Each stat is reported on a new line with each line - composed of a stat-identifier followed by it value. Below are - currently known dimm performance stats which are reported: + device. This attribute is only available for NVDIMM devices + that support reporting NVDIMM performance stats. Each stat is + reported on a new line with each line composed of a + stat-identifier followed by it value. 
Below are currently known + dimm performance stats which are reported: * "CtlResCt" : Controller Reset Count * "CtlResTm" : Controller Reset Elapsed Time diff --git a/arch/powerpc/platforms/pseries/papr_scm.c b/arch/powerpc/platforms/pseries/papr_scm.c index e2b69cc3beaf..e8577e7e49ab 100644 --- a/arch/powerpc/platforms/pseries/papr_scm.c +++ b/arch/powerpc/platforms/pseries/papr_scm.c @@ -907,6 +907,20 @@ static ssize_t flags_show(struct device *dev, } DEVICE_ATTR_RO(flags); +static umode_t papr_nd_attribute_visible(struct kobject *kobj, +struct attribute *attr, int n) +{ + struct device *dev = kobj_to_dev(kobj); + struct nvdimm *nvdimm = to_nvdimm(dev); + struct papr_scm_priv *p = nvdimm_provider_data(nvdimm); + + /* For if perf-stats not available remove perf_stats sysfs */ + if (attr == &dev_attr_perf_stats.attr && p->stat_buffer_len == 0) + return 0; + + return attr->mode; +} + /* papr_scm specific dimm attributes */ static struct attribute *papr_nd_attributes[] = { &dev_attr_flags.attr, @@ -916,6 +930,7 @@ static struct attribute *papr_nd_attributes[] = { static struct attribute_group papr_nd_attribute_group = { .name = "papr", + .is_visible = papr_nd_attribute_visible, .attrs = papr_nd_attributes, }; @@ -931,7 +946,6 @@ static int papr_scm_nvdimm_init(struct papr_scm_priv *p) struct nd_region_desc ndr_desc; unsigned long dimm_flags; int target_nid, online_nid; - ssize_t stat_size; p->bus_desc.ndctl = papr_scm_ndctl; p->bus_desc.module = THIS_MODULE; @@ -1016,16 +1030,6 @@ static int papr_scm_nvdimm_init(struct papr_scm_priv *p) list_add_tail(&p->region_list, &papr_nd_regions); mutex_unlock(&papr_ndr_lock); - /* Try retriving the stat buffer and see if its supported */ - stat_size = drc_pmem_query_stats(p, NULL, 0); - if (stat_size > 0) { - p->stat_buffer_len = stat_size; - dev_dbg(&p->pdev->dev, "Max perf-stat size %lu-bytes\n", - p->stat_buffer_len); - } else { - dev_info(&p->pdev->dev, "Dimm performance stats unavailable\n"); -
[PATCH v2 2/2] powerpc/kprobes: Replace ppc_optinsn by common optinsn
Commit 51c9c0843993 ("powerpc/kprobes: Implement Optprobes") implemented a powerpc-specific version of optinsn in order to work around the 32MB limitation for direct branches. Instead of implementing a dedicated powerpc version, use the common optinsn and override the allocation and freeing functions. This also indirectly removes the clang warning about is_kprobe_ppc_optinsn_slot() not being used, and powerpc will now benefit from commit 5b485629ba0d ("kprobes, extable: Identify kprobes trampolines as kernel text area"). Suggested-by: Masami Hiramatsu Signed-off-by: Christophe Leroy Acked-by: Masami Hiramatsu --- v2: no change --- arch/powerpc/kernel/optprobes.c | 23 +-- 1 file changed, 5 insertions(+), 18 deletions(-) diff --git a/arch/powerpc/kernel/optprobes.c b/arch/powerpc/kernel/optprobes.c index cdf87086fa33..a370190cd02a 100644 --- a/arch/powerpc/kernel/optprobes.c +++ b/arch/powerpc/kernel/optprobes.c @@ -31,11 +31,9 @@ #define TMPL_END_IDX \ (optprobe_template_end - optprobe_template_entry) -DEFINE_INSN_CACHE_OPS(ppc_optinsn); - static bool insn_page_in_use; -static void *__ppc_alloc_insn_page(void) +void *alloc_optinsn_page(void) { if (insn_page_in_use) return NULL; @@ -43,20 +41,11 @@ static void *__ppc_alloc_insn_page(void) return &optinsn_slot; } -static void __ppc_free_insn_page(void *page __maybe_unused) +void free_optinsn_page(void *page) { insn_page_in_use = false; } -struct kprobe_insn_cache kprobe_ppc_optinsn_slots = { - .mutex = __MUTEX_INITIALIZER(kprobe_ppc_optinsn_slots.mutex), - .pages = LIST_HEAD_INIT(kprobe_ppc_optinsn_slots.pages), - /* insn_size initialized later */ - .alloc = __ppc_alloc_insn_page, - .free = __ppc_free_insn_page, - .nr_garbage = 0, -}; - /* * Check if we can optimize this probe. Returns NIP post-emulation if this can * be optimized and 0 otherwise. 
@@ -136,7 +125,7 @@ NOKPROBE_SYMBOL(optimized_callback); void arch_remove_optimized_kprobe(struct optimized_kprobe *op) { if (op->optinsn.insn) { - free_ppc_optinsn_slot(op->optinsn.insn, 1); + free_optinsn_slot(op->optinsn.insn, 1); op->optinsn.insn = NULL; } } @@ -203,14 +192,12 @@ int arch_prepare_optimized_kprobe(struct optimized_kprobe *op, struct kprobe *p) unsigned long nip, size; int rc, i; - kprobe_ppc_optinsn_slots.insn_size = MAX_OPTINSN_SIZE; - nip = can_optimize(p); if (!nip) return -EILSEQ; /* Allocate instruction slot for detour buffer */ - buff = get_ppc_optinsn_slot(); + buff = get_optinsn_slot(); if (!buff) return -ENOMEM; @@ -297,7 +284,7 @@ int arch_prepare_optimized_kprobe(struct optimized_kprobe *op, struct kprobe *p) return 0; error: - free_ppc_optinsn_slot(buff, 0); + free_optinsn_slot(buff, 0); return -ERANGE; } -- 2.25.0
[PATCH v2 1/2] kprobes: Allow architectures to override optinsn page allocation
Some architectures like powerpc require a non-standard allocation of the optinsn page, because module pages are too far from the kernel for direct branches. Define weak alloc_optinsn_page() and free_optinsn_page(), which fall back on alloc_insn_page() and free_insn_page() when not overridden by the architecture. Suggested-by: Masami Hiramatsu Signed-off-by: Christophe Leroy Acked-by: Masami Hiramatsu --- v2: Added missing prototypes in linux/kprobes.h --- include/linux/kprobes.h | 3 +++ kernel/kprobes.c | 14 -- 2 files changed, 15 insertions(+), 2 deletions(-) diff --git a/include/linux/kprobes.h b/include/linux/kprobes.h index 1883a4a9f16a..02d4020615a7 100644 --- a/include/linux/kprobes.h +++ b/include/linux/kprobes.h @@ -409,6 +409,9 @@ void dump_kprobe(struct kprobe *kp); void *alloc_insn_page(void); void free_insn_page(void *page); +void *alloc_optinsn_page(void); +void free_optinsn_page(void *page); + int kprobe_get_kallsym(unsigned int symnum, unsigned long *value, char *type, char *sym); diff --git a/kernel/kprobes.c b/kernel/kprobes.c index 745f08fdd7a6..8c0a6fdef771 100644 --- a/kernel/kprobes.c +++ b/kernel/kprobes.c @@ -321,11 +321,21 @@ int kprobe_cache_get_kallsym(struct kprobe_insn_cache *c, unsigned int *symnum, } #ifdef CONFIG_OPTPROBES +void __weak *alloc_optinsn_page(void) +{ + return alloc_insn_page(); +} + +void __weak free_optinsn_page(void *page) +{ + free_insn_page(page); +} + /* For optimized_kprobe buffer */ struct kprobe_insn_cache kprobe_optinsn_slots = { .mutex = __MUTEX_INITIALIZER(kprobe_optinsn_slots.mutex), - .alloc = alloc_insn_page, - .free = free_insn_page, + .alloc = alloc_optinsn_page, + .free = free_optinsn_page, .sym = KPROBE_OPTINSN_PAGE_SYM, .pages = LIST_HEAD_INIT(kprobe_optinsn_slots.pages), /* .insn_size is initialized later */ -- 2.25.0
Re: [FSL P50x0] Xorg always restarts again and again after the PowerPC updates 5.13-1
Hi Christophe, > On 9. May 2021, at 19:37, Christophe Leroy > wrote: > > Did I do an error in my analysis ? No, you didn’t. On the contrary you have found the issue. ;-) The issue is somewhere in the new interrupt code. > ZZ | * | ceff77efa4f8 powerpc/64e/interrupt: Use new interrupt context > tracking scheme Could you please create a patch for reverting the new interrupt code? I would like to confirm your result. Thanks for your help, Christian > > BAD * c70a4be130de Merge tag 'powerpc-5.13-1' of > git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux > |\ > | * 525642624783 powerpc/signal32: Fix erroneous SIGSEGV on RT signal return > | * f9cd5f91a897 powerpc: Avoid clang uninitialized warning in > __get_user_size_allowed > | * adb68c38d8d4 powerpc/papr_scm: Mark nvdimm as unarmed if needed during > probe > | * ee1bc694fbae powerpc/kvm: Fix build error when PPC_MEM_KEYS/PPC_PSERIES=n > | * 30c400886bad powerpc/kasan: Fix shadow start address with modules > | * fc5590fd56c9 powerpc/kernel/iommu: Use largepool as a last resort when > !largealloc > | * 3c0468d4451e powerpc/kernel/iommu: Align size for IOMMU_PAGE_SIZE() to > save TCEs > | * ee6b25fa7c03 powerpc/44x: fix spelling mistake in Kconfig "varients" -> > "variants" > | * cc7130bf119a powerpc/iommu: Annotate nested lock for lockdep > | * 4be518d83880 powerpc/iommu: Do not immediately panic when failed IOMMU > table allocation > | * 7f1fa82d7994 powerpc/iommu: Allocate it_map by vmalloc > | * 0db11461677a selftests/powerpc: remove unneeded semicolon > | * caea7b833d86 powerpc/64s: remove unneeded semicolon > | * f3d03fc748d4 powerpc/eeh: remove unneeded semicolon > | * 290f7d8ce2b1 powerpc/selftests: Add selftest to test concurrent > perf/ptrace events > | * c65c64cc7bbd powerpc/selftests/perf-hwbreak: Add testcases for 2nd DAWR > | * c9cb0afb4eaa powerpc/selftests/perf-hwbreak: Coalesce event creation code > | * dae4ff8031b4 powerpc/selftests/ptrace-hwbreak: Add testcases for 2nd DAWR > | * 421a7483878c 
powerpc/configs: Add IBMVNIC to some 64-bit configs > | * da650ada1009 selftests/powerpc: Add uaccess flush test > | * 8a87a5077143 powerpc/52xx: Fix an invalid ASM expression ('addi' used > instead of 'add') > | * 0f197ddce403 powerpc/64s: Fix mm_cpumask memory ordering comment > | * 66d9b7492887 powerpc/perf: Fix the threshold event selection for memory > events in power10 > | * b4ded42268ee powerpc/perf: Fix sampled instruction type for larx/stcx > | * 0bd3f9e953bd powerpc/legacy_serial: Use early_ioremap() > | * 9ccba66d4d2a powerpc/64: Fix the definition of the fixmap area > | * 389586333c02 powerpc: make ALTIVEC select PPC_FPU > | * 7d9462765707 powerpc/64s: Add FA_DUMP to defconfig > | * d936f8182e1b powerpc/powernv: Fix type of opal_mpipl_query_tag() addr > argument > | * 2e341f56a16a powerpc/fadump: Fix sparse warnings > | * 39352430aaa0 powerpc: Move copy_inst_from_kernel_nofault() > | * 41d6cf68b5f6 powerpc: Rename probe_kernel_read_inst() > | * 6449078d5011 powerpc: Make probe_kernel_read_inst() common to PPC32 and > PPC64 > | * 6ac7897f08e0 powerpc: Remove probe_user_read_inst() > | * ee7c3ec3b4b1 powerpc/ebpf32: Use standard function call for functions > within 32M distance > | * e7de0023e123 powerpc/ebpf32: Rework 64 bits shifts to avoid tests and > branches > | * d228cc496966 powerpc/ebpf32: Fix comment on BPF_ALU{64} | BPF_LSH | BPF_K > | * 867e762480f4 powerpc/32: Use r2 in wrtspr() instead of r0 > | * f56607e85ee3 selftests/timens: Fix gettime_perf to work on powerpc > | * 92d9d61be519 powerpc/mce: save ignore_event flag unconditionally for UE > | * eacf4c020265 powerpc: Enable OPTPROBES on PPC32 > | * 693557ebf407 powerpc/inst: ppc_inst_as_u64() becomes ppc_inst_as_ulong() > | * e522331173ec powerpc/irq: Enhance readability of trap types > | * 7fab639729ce powerpc/32s: Enhance readability of trap types > | * 0f5eb28a6ce6 powerpc/8xx: Enhance readability of trap types > | * a9d2f9bb225f powerpc/pseries/iommu: Fix window size for direct mapping > 
with pmem > | * e4e8bc1df691 powerpc/kvm: Fix PR KVM with KUAP/MEM_KEYS enabled > | * ed8029d7b472 powerpc/pseries: Stop calling printk in rtas_stop_self() > | * 3027a37c06be powerpc: Only define _TASK_CPU for 32-bit > | * 39d0099f9439 powerpc/pseries: Add shutdown() to vio_driver and vio_bus > | * af31fd0c9107 powerpc/perf: Expose processor pipeline stage cycles using > PERF_SAMPLE_WEIGHT_STRUCT > | * 2886e2df10be Documentation/powerpc: Add proper links for manual and tests > | * 29c9a2699e71 powerpc/pseries: Set UNISOLATE on dlpar_cpu_remove() failure > | * 0e3b3ff83ce2 powerpc/pseries: Introduce dlpar_unisolate_drc() > | * 864ec4d40c83 powerpc/pseries/mce: Fix a typo in error type assignment > | * cbd3d5ba46b6 powerpc/fadump: Fix compile error since trap type change > | * d8a1d6c58986 powerpc/perf: Add platform specific check_attr_config > | * a38cb4171928 Merge branch 'topic/ppc-kvm' into next > | |\ > | | * 732f21a3053c KVM: PPC: Book3S HV: Ensure