date:20220922

Re: [PATCH] powerpc/64s: POWER10 CPU Kconfig build option

2022-09-22 Thread Christophe Leroy



Le 23/09/2022 à 05:30, Nicholas Piggin a écrit :
> This adds basic POWER10_CPU option, which builds with -mcpu=power10.
> 
> Signed-off-by: Nicholas Piggin 
> ---
> There's quite a lot of asm and linker changes slated for the next merge
> window already so I may leave the pcrel patch for next time. I think we
> can add the basic POWER10 build option though.
> 
> Thanks,
> Nick
> 
>   arch/powerpc/Makefile  | 7 ++-
>   arch/powerpc/platforms/Kconfig.cputype | 8 +++-
>   2 files changed, 13 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile
> index 8a3d69b02672..ea88af26f8c6 100644
> --- a/arch/powerpc/Makefile
> +++ b/arch/powerpc/Makefile
> @@ -192,9 +192,14 @@ ifdef CONFIG_476FPE_ERR46
>   -T $(srctree)/arch/powerpc/platforms/44x/ppc476_modules.lds
>   endif
>   
> -# No AltiVec or VSX instructions when building kernel
> +# No prefix or pcrel
> +KBUILD_CFLAGS += $(call cc-option,-mno-prefixed)

We have lots of code to handle prefixed instructions in code_patching, 
and that code complexifies stuff and has a performance impact.
And it is only partially taken into account, areas like ftrace don't 
properly take care of prefixed instructions.

Should we get rid of prefixed instruction support completely in the 
kernel, and come back to more simple code ?

> +KBUILD_CFLAGS += $(call cc-option,-mno-pcrel)
> +
> +# No AltiVec or VSX or MMA instructions when building kernel
>   KBUILD_CFLAGS += $(call cc-option,-mno-altivec)
>   KBUILD_CFLAGS += $(call cc-option,-mno-vsx)
> +KBUILD_CFLAGS += $(call cc-option,-mno-mma)
>   
>   # No SPE instruction when building kernel
>   # (We use all available options to help semi-broken compilers)
> diff --git a/arch/powerpc/platforms/Kconfig.cputype 
> b/arch/powerpc/platforms/Kconfig.cputype
> index 4017be72e46f..1f7c903ea664 100644
> --- a/arch/powerpc/platforms/Kconfig.cputype
> +++ b/arch/powerpc/platforms/Kconfig.cputype
> @@ -171,6 +171,11 @@ config POWER9_CPU
>   depends on PPC_BOOK3S_64
>   select ARCH_HAS_FAST_MULTIPLIER
>   
> +config POWER10_CPU
> + bool "POWER10"
> + depends on PPC_BOOK3S_64
> + select ARCH_HAS_FAST_MULTIPLIER
> +
>   config E5500_CPU
>   bool "Freescale e5500"
>   depends on PPC64 && E500
> @@ -239,6 +244,7 @@ config TARGET_CPU
>   default "power7" if POWER7_CPU
>   default "power8" if POWER8_CPU
>   default "power9" if POWER9_CPU
> + default "power10" if POWER10_CPU
>   default "405" if 405_CPU
>   default "440" if 440_CPU
>   default "464" if 464_CPU

Re: [PATCH 1/5] powerpc/64: use 32-bit immediate for STACK_FRAME_REGS_MARKER

2022-09-22 Thread Christophe Leroy



Le 23/09/2022 à 05:25, Nicholas Piggin a écrit :
> Using a 32-bit constant for this marker allows it to be loaded with
> two ALU instructions, like 32-bit. This avoids a TOC entry and a
> TOC load that depends on the r2 value that has just been loaded from
> the PACA.
> 
> This changes the value for 32-bit as well, so both have the same
> value in the low 4 bytes and 64-bit has 0x in the top bytes.
> 
> Signed-off-by: Nicholas Piggin 
> ---
>   arch/powerpc/include/asm/ptrace.h| 5 +++--
>   arch/powerpc/kernel/entry_32.S   | 6 +++---
>   arch/powerpc/kernel/exceptions-64e.S | 8 +---
>   arch/powerpc/kernel/exceptions-64s.S | 2 +-
>   arch/powerpc/kernel/head_64.S| 7 ---
>   arch/powerpc/kernel/interrupt_64.S   | 6 +++---
>   6 files changed, 11 insertions(+), 23 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/ptrace.h 
> b/arch/powerpc/include/asm/ptrace.h
> index a03403695cd4..49d720bb888b 100644
> --- a/arch/powerpc/include/asm/ptrace.h
> +++ b/arch/powerpc/include/asm/ptrace.h
> @@ -99,6 +99,9 @@ struct pt_regs
>   
>   #define STACK_FRAME_WITH_PT_REGS (STACK_FRAME_OVERHEAD + sizeof(struct 
> pt_regs))
>   
> +/* 0x8d9a988d on 64-bit */
> +#define STACK_FRAME_REGS_MARKER  ASM_CONST(-0x72656773) /* 0x8d9a988d */
> +

0x72656773 is "REGS" in ASCII (Big Endian) and you can spot it 
immediatly in a memory dump.
0x7265677368657265 is "REGSHERE".

0x8d9a988d is not printable.

Don't know if it can be a problem.

Christophe

[powerpc:next-test] BUILD SUCCESS 6d2ed86d96d8c5dec5e6fe355fff678e519c419f

2022-09-22 Thread kernel test robot

tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git 
next-test
branch HEAD: 6d2ed86d96d8c5dec5e6fe355fff678e519c419f  powerpc/highmem: 
Properly handle fragmented memory

elapsed time: 1087m

configs tested: 108
configs skipped: 3

The following configs have been built successfully.
More configs may be tested in the coming days.

gcc tested configs:
um i386_defconfig
um   x86_64_defconfig
powerpc   allnoconfig
arc  randconfig-r043-20220922
sh   allmodconfig
arc defconfig
s390 allmodconfig
mips allyesconfig
alpha   defconfig
powerpc  allmodconfig
s390defconfig
i386defconfig
x86_64randconfig-a011
x86_64randconfig-a013
s390 allyesconfig
x86_64  defconfig
x86_64randconfig-a015
x86_64randconfig-a004
x86_64   rhel-8.3
x86_64randconfig-a002
arc  allyesconfig
alphaallyesconfig
x86_64randconfig-a006
m68k allmodconfig
x86_64   allyesconfig
ia64 allmodconfig
m68k allyesconfig
i386  randconfig-a014
i386  randconfig-a012
i386  randconfig-a016
x86_64   rhel-8.3-kvm
x86_64  rhel-8.3-func
x86_64 rhel-8.3-kunit
x86_64rhel-8.3-kselftests
arm defconfig
x86_64   rhel-8.3-syz
arm  exynos_defconfig
parisc64defconfig
csky  allnoconfig
arc   allnoconfig
alpha allnoconfig
riscv allnoconfig
i386 allyesconfig
arm64allyesconfig
arm  allyesconfig
sparc   defconfig
loongarchalldefconfig
m68kstmark2_defconfig
powerpc  allyesconfig
riscv   defconfig
riscvallmodconfig
riscvallyesconfig
sparc   sparc32_defconfig
sh   se7721_defconfig
m68k  atari_defconfig
mips allmodconfig
xtensa   common_defconfig
xtensa   allyesconfig
cskydefconfig
sparcallyesconfig
x86_64  kexec
i386  randconfig-c001
nios2allyesconfig
nios2   defconfig
parisc  defconfig
parisc   allyesconfig
x86_64randconfig-c001
arm  randconfig-c002-20220922
loongarch   defconfig
loongarch allnoconfig
loongarchallmodconfig
armcerfcube_defconfig
microblaze  defconfig
arm   viper_defconfig
riscvnommu_virt_defconfig
riscv  rv32_defconfig
riscvnommu_k210_defconfig
i386   debian-10.3-kselftests
i386  debian-10.3
i386  debian-10.3-kvm
i386debian-10.3-kunit
i386 debian-10.3-func

clang tested configs:
hexagon  randconfig-r041-20220922
riscvrandconfig-r042-20220922
hexagon  randconfig-r045-20220922
s390 randconfig-r044-20220922
x86_64randconfig-a012
x86_64randconfig-a014
x86_64randconfig-a016
x86_64randconfig-a005
x86_64randconfig-a001
i386  randconfig-a002
i386  randconfig-a006
i386  randconfig-a004
x86_64randconfig-a003
i386  randconfig-a013
i386  randconfig-a011
i386  randconfig-a015
arm s5pv210_defconfig
powerpc

Re: [PATCH] powerpc: Ignore DSI error caused by the copy/paste instruction

2022-09-22 Thread Haren Myneni

On Thu, 2022-09-22 at 09:04 +, Christophe Leroy wrote:
> 
> Le 22/09/2022 à 10:29, Haren Myneni a écrit :
> > DSI error will be generated when the paste operation is issued on
> > the suspended NX window due to NX state changes. The hypervisor
> > expects the partition to ignore this error during page pault
> > handling. To differentiate DSI caused by an actual HW configuration
> > or by the NX window, a new “ibm,pi-features” type value is defined.
> > Byte 0, bit 3 of pi-attribute-specifier-type is now defined to
> > indicate this DSI error.
> 
> What is NX ? No eXec ? That's what it is usually. But in that case
> it 
> would be the ISI, not DSI.

NX is nest accelerator supports several functions such as compression,
encryption and etc. It is DSI error mentioned in PAPR ("DSI Caused by
User Mode NX")

> 
> > This patch adds changes to read ibm,pi-features property and ignore
> > DSI error in the page fault handling if CPU_FTR_NX_DSI if defined.
> > 
> > Signed-off-by: Haren Myneni 
> > ---
> >   arch/powerpc/include/asm/cputable.h |  5 ++--
> >   arch/powerpc/kernel/prom.c  | 36 +---
> > -
> >   arch/powerpc/mm/fault.c | 17 +-
> >   3 files changed, 45 insertions(+), 13 deletions(-)
> > 
> > diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
> > index 014005428687..154cc1e85770 100644
> > --- a/arch/powerpc/mm/fault.c
> > +++ b/arch/powerpc/mm/fault.c
> > @@ -367,7 +367,22 @@ static void sanity_check_fault(bool is_write,
> > bool is_user,
> >   #elif defined(CONFIG_PPC_8xx)
> >   #define page_fault_is_bad(__err)  ((__err) & DSISR_NOEXEC_OR_G)
> >   #elif defined(CONFIG_PPC64)
> > -#define page_fault_is_bad(__err)   ((__err) & DSISR_BAD_FAULT_64S)
> > +static inline int page_fault_is_bad(unsigned long __err)
> 
> The name was __err because it was a macro and there was a risk of 
> collision with a 'err' variable in the caller.
> 
> But as it is now a function, you can just call it 'err'.
> 
> And no need of the 'inline' keyword, GCC will inline it anyway.

Thanks for your comments. I will repost the patch with these changes.

> 
> > +{
> > +   unsigned long flag = DSISR_BAD_FAULT_64S;
> > +
> > +   /*
> > +* PAPR 14.15.3.4.1
> > +* If byte 0, bit 3 of pi-attribute-specifier-type in
> > +* ibm,pi-features property is defined, ignore the DSI error
> > +* which is caused by the paste instruction on the
> > +* suspended NX window.
> > +*/
> > +   if (cpu_has_feature(CPU_FTR_NX_DSI))
> > +   flag &= ~DSISR_BAD_COPYPASTE;
> > +
> > +   return ((__err) & flag);
> 
> The () around __err was because it was a macro parameter. It is 
> pointless now. And same for the overall ones. Now it can be :
> 
>   return err & flags;
> 
> > +}
> >   #else
> >   #define page_fault_is_bad(__err)  ((__err) & DSISR_BAD_FAULT_32S)
> >   #endif

[PATCH] powerpc/64s: POWER10 CPU Kconfig build option

2022-09-22 Thread Nicholas Piggin

This adds basic POWER10_CPU option, which builds with -mcpu=power10.

Signed-off-by: Nicholas Piggin 
---
There's quite a lot of asm and linker changes slated for the next merge
window already so I may leave the pcrel patch for next time. I think we
can add the basic POWER10 build option though.

Thanks,
Nick

 arch/powerpc/Makefile  | 7 ++-
 arch/powerpc/platforms/Kconfig.cputype | 8 +++-
 2 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile
index 8a3d69b02672..ea88af26f8c6 100644
--- a/arch/powerpc/Makefile
+++ b/arch/powerpc/Makefile
@@ -192,9 +192,14 @@ ifdef CONFIG_476FPE_ERR46
-T $(srctree)/arch/powerpc/platforms/44x/ppc476_modules.lds
 endif
 
-# No AltiVec or VSX instructions when building kernel
+# No prefix or pcrel
+KBUILD_CFLAGS += $(call cc-option,-mno-prefixed)
+KBUILD_CFLAGS += $(call cc-option,-mno-pcrel)
+
+# No AltiVec or VSX or MMA instructions when building kernel
 KBUILD_CFLAGS += $(call cc-option,-mno-altivec)
 KBUILD_CFLAGS += $(call cc-option,-mno-vsx)
+KBUILD_CFLAGS += $(call cc-option,-mno-mma)
 
 # No SPE instruction when building kernel
 # (We use all available options to help semi-broken compilers)
diff --git a/arch/powerpc/platforms/Kconfig.cputype 
b/arch/powerpc/platforms/Kconfig.cputype
index 4017be72e46f..1f7c903ea664 100644
--- a/arch/powerpc/platforms/Kconfig.cputype
+++ b/arch/powerpc/platforms/Kconfig.cputype
@@ -171,6 +171,11 @@ config POWER9_CPU
depends on PPC_BOOK3S_64
select ARCH_HAS_FAST_MULTIPLIER
 
+config POWER10_CPU
+   bool "POWER10"
+   depends on PPC_BOOK3S_64
+   select ARCH_HAS_FAST_MULTIPLIER
+
 config E5500_CPU
bool "Freescale e5500"
depends on PPC64 && E500
@@ -239,6 +244,7 @@ config TARGET_CPU
default "power7" if POWER7_CPU
default "power8" if POWER8_CPU
default "power9" if POWER9_CPU
+   default "power10" if POWER10_CPU
default "405" if 405_CPU
default "440" if 440_CPU
default "464" if 464_CPU
-- 
2.37.2

[PATCH 5/5] powerpc/64e: provide an addressing macro for use with TOC in alternate register

2022-09-22 Thread Nicholas Piggin

The interrupt entry code carefully saves a minimal number of registers,
so in some places the TOC is required, it is loaded into a different
register, so provide a macro that can supply an alternate TOC register.

This continues to use got addressing because TOC-relative results in
"got/toc optimization is not supported" messages by the linker. Having
r2 be one of the saved registers and using that for TOC addressing may
be the best way to avoid that and switch this to TOC addressing.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/include/asm/ppc_asm.h   | 11 +++
 arch/powerpc/kernel/exceptions-64e.S | 14 +++---
 arch/powerpc/mm/nohash/tlb_low_64e.S |  2 +-
 3 files changed, 19 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/include/asm/ppc_asm.h 
b/arch/powerpc/include/asm/ppc_asm.h
index 17a87ecd82c9..622aaaec2be9 100644
--- a/arch/powerpc/include/asm/ppc_asm.h
+++ b/arch/powerpc/include/asm/ppc_asm.h
@@ -324,6 +324,17 @@ GLUE(.,name):
addis   reg,r2,name@toc@ha; \
addireg,reg,name@toc@l
 
+#ifdef CONFIG_PPC_BOOK3E_64
+/*
+ * This is used in register-constrained interrupt handlers. Not to be used
+ * by BOOK3S. ld complains with "got/toc optimization is not supported" if r2
+ * is not used for the TOC offset, so use @got(tocreg). If the interrupt
+ * handlers saved r2 instead, LOAD_REG_ADDR could be used.
+ */
+#define LOAD_REG_ADDR_ALTTOC(reg,tocreg,name)  \
+   ld  reg,name@got(tocreg)
+#endif
+
 #define LOAD_REG_ADDRBASE(reg,name)LOAD_REG_ADDR(reg,name)
 #define ADDROFF(name)  0
 
diff --git a/arch/powerpc/kernel/exceptions-64e.S 
b/arch/powerpc/kernel/exceptions-64e.S
index c20b39d011a9..8ac9dfd38874 100644
--- a/arch/powerpc/kernel/exceptions-64e.S
+++ b/arch/powerpc/kernel/exceptions-64e.S
@@ -699,8 +699,8 @@ END_FTR_SECTION_IFSET(CPU_FTR_ALTIVEC)
 
 #ifdef CONFIG_RELOCATABLE
__LOAD_PACA_TOC(r15)
-   ld  r14,interrupt_base_book3e@got(r15)
-   ld  r15,__end_interrupts@got(r15)
+   LOAD_REG_ADDR_ALTTOC(r14, r15, interrupt_base_book3e)
+   LOAD_REG_ADDR_ALTTOC(r15, r15, __end_interrupts)
cmpld   cr0,r10,r14
cmpld   cr1,r10,r15
 #else
@@ -770,8 +770,8 @@ kernel_dbg_exc:
 
 #ifdef CONFIG_RELOCATABLE
__LOAD_PACA_TOC(r15)
-   ld  r14,interrupt_base_book3e@got(r15)
-   ld  r15,__end_interrupts@got(r15)
+   LOAD_REG_ADDR_ALTTOC(r14, r15, interrupt_base_book3e)
+   LOAD_REG_ADDR_ALTTOC(r15, r15, __end_interrupts)
cmpld   cr0,r10,r14
cmpld   cr1,r10,r15
 #else
@@ -895,8 +895,8 @@ kernel_dbg_exc:
 .macro SEARCH_RESTART_TABLE
 #ifdef CONFIG_RELOCATABLE
__LOAD_PACA_TOC(r11)
-   ld  r14,__start___restart_table@got(r11)
-   ld  r15,__stop___restart_table@got(r11)
+   LOAD_REG_ADDR_ALTTOC(r14, r11, __start___restart_table)
+   LOAD_REG_ADDR_ALTTOC(r15, r11, __stop___restart_table)
 #else
LOAD_REG_IMMEDIATE_SYM(r14, r11, __start___restart_table)
LOAD_REG_IMMEDIATE_SYM(r15, r11, __stop___restart_table)
@@ -1315,7 +1315,7 @@ a2_tlbinit_after_linear_map:
/* Now we branch the new virtual address mapped by this entry */
 #ifdef CONFIG_RELOCATABLE
__LOAD_PACA_TOC(r5)
-   ld  r3,1f@got(r5)
+   LOAD_REG_ADDR_ALTTOC(r3, r5, 1f)
 #else
LOAD_REG_IMMEDIATE_SYM(r3, r5, 1f)
 #endif
diff --git a/arch/powerpc/mm/nohash/tlb_low_64e.S 
b/arch/powerpc/mm/nohash/tlb_low_64e.S
index 0e4d9c817382..2f3095716721 100644
--- a/arch/powerpc/mm/nohash/tlb_low_64e.S
+++ b/arch/powerpc/mm/nohash/tlb_low_64e.S
@@ -1125,7 +1125,7 @@ tlb_load_linear:
 * final implementation, especially when dealing with hypervisors
 */
__LOAD_PACA_TOC(r11)
-   ld  r11,linear_map_top@got(r11)
+   LOAD_REG_ADDR_ALTTOC(r11, r11, linear_map_top)
ld  r10,0(r11)
tovirt(10,10)
cmpld   cr0,r16,r10
-- 
2.37.2

[PATCH 4/5] powerpc/64: provide a helper macro to load r2 with the kernel TOC

2022-09-22 Thread Nicholas Piggin

A later change stops the kernel using r2 and loads it with a poison
value.  Provide a PACATOC loading abstraction which can hide this
detail.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/include/asm/ppc_asm.h |  6 ++
 arch/powerpc/kernel/exceptions-64e.S   | 12 ++--
 arch/powerpc/kernel/exceptions-64s.S   |  6 +++---
 arch/powerpc/kernel/head_64.S  |  4 ++--
 arch/powerpc/kernel/interrupt_64.S | 12 ++--
 arch/powerpc/kernel/optprobes_head.S   |  2 +-
 arch/powerpc/kernel/trace/ftrace_low.S |  2 +-
 arch/powerpc/kernel/trace/ftrace_mprofile.S|  3 +--
 arch/powerpc/kvm/book3s_64_entry.S |  2 +-
 arch/powerpc/kvm/book3s_hv_rmhandlers.S|  4 ++--
 arch/powerpc/kvm/tm.S  |  2 +-
 arch/powerpc/mm/nohash/tlb_low_64e.S   |  2 +-
 arch/powerpc/platforms/powernv/opal-wrappers.S |  2 +-
 13 files changed, 32 insertions(+), 27 deletions(-)

diff --git a/arch/powerpc/include/asm/ppc_asm.h 
b/arch/powerpc/include/asm/ppc_asm.h
index da570b197e82..17a87ecd82c9 100644
--- a/arch/powerpc/include/asm/ppc_asm.h
+++ b/arch/powerpc/include/asm/ppc_asm.h
@@ -305,6 +305,12 @@ GLUE(.,name):
 
 #ifdef __powerpc64__
 
+#define __LOAD_PACA_TOC(reg)   \
+   ld  reg,PACATOC(r13)
+
+#define LOAD_PACA_TOC()\
+   __LOAD_PACA_TOC(r2)
+
 #define LOAD_REG_IMMEDIATE(reg, expr) __LOAD_REG_IMMEDIATE reg, expr
 
 #define LOAD_REG_IMMEDIATE_SYM(reg, tmp, expr) \
diff --git a/arch/powerpc/kernel/exceptions-64e.S 
b/arch/powerpc/kernel/exceptions-64e.S
index 08a3139079b7..c20b39d011a9 100644
--- a/arch/powerpc/kernel/exceptions-64e.S
+++ b/arch/powerpc/kernel/exceptions-64e.S
@@ -382,7 +382,7 @@ exc_##n##_common:   
\
ld  r4,excf+EX_R11(r13);/* get back r11 */  \
mfspr   r5,scratch; /* get back r13 */  \
std r12,GPR12(r1);  /* save r12 in stackframe */\
-   ld  r2,PACATOC(r13);/* get kernel TOC into r2 */\
+   LOAD_PACA_TOC();/* get kernel TOC into r2 */\
mflrr6; /* save LR in stackframe */ \
mfctr   r7; /* save CTR in stackframe */\
mfspr   r8,SPRN_XER;/* save XER in stackframe */\
@@ -698,7 +698,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_ALTIVEC)
beq+1f
 
 #ifdef CONFIG_RELOCATABLE
-   ld  r15,PACATOC(r13)
+   __LOAD_PACA_TOC(r15)
ld  r14,interrupt_base_book3e@got(r15)
ld  r15,__end_interrupts@got(r15)
cmpld   cr0,r10,r14
@@ -769,7 +769,7 @@ kernel_dbg_exc:
beq+1f
 
 #ifdef CONFIG_RELOCATABLE
-   ld  r15,PACATOC(r13)
+   __LOAD_PACA_TOC(r15)
ld  r14,interrupt_base_book3e@got(r15)
ld  r15,__end_interrupts@got(r15)
cmpld   cr0,r10,r14
@@ -894,7 +894,7 @@ kernel_dbg_exc:
 
 .macro SEARCH_RESTART_TABLE
 #ifdef CONFIG_RELOCATABLE
-   ld  r11,PACATOC(r13)
+   __LOAD_PACA_TOC(r11)
ld  r14,__start___restart_table@got(r11)
ld  r15,__stop___restart_table@got(r11)
 #else
@@ -1073,7 +1073,7 @@ bad_stack_book3e:
std r11,0(r1)
li  r12,0
std r12,0(r11)
-   ld  r2,PACATOC(r13)
+   LOAD_PACA_TOC()
 1: addir3,r1,STACK_FRAME_OVERHEAD
bl  kernel_bad_stack
b   1b
@@ -1314,7 +1314,7 @@ a2_tlbinit_after_linear_map:
 
/* Now we branch the new virtual address mapped by this entry */
 #ifdef CONFIG_RELOCATABLE
-   ld  r5,PACATOC(r13)
+   __LOAD_PACA_TOC(r5)
ld  r3,1f@got(r5)
 #else
LOAD_REG_IMMEDIATE_SYM(r3, r5, 1f)
diff --git a/arch/powerpc/kernel/exceptions-64s.S 
b/arch/powerpc/kernel/exceptions-64s.S
index 9d375ea58add..8e6fd8c6618e 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -580,7 +580,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_CFAR)
std r2,GPR2(r1) /* save r2 in stackframe*/
SAVE_GPRS(3, 8, r1) /* save r3 - r8 in stackframe   */
mflrr9  /* Get LR, later save to stack  */
-   ld  r2,PACATOC(r13) /* get kernel TOC into r2   */
+   LOAD_PACA_TOC() /* get kernel TOC into r2   */
std r9,_LINK(r1)
lbz r10,PACAIRQSOFTMASK(r13)
mfspr   r11,SPRN_XER/* save XER in stackframe   */
@@ -610,7 +610,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_CFAR)
 .macro SEARCH_RESTART_TABLE
 #ifdef CONFIG_RELOCATABLE
mr  r12,r2
-   ld  r2,PACATOC(r13)
+   LOAD_PACA_TOC()
LOAD_REG_ADDR(r9, __start___restart_table)
LOAD_REG_ADDR(r10, __stop___restart_table)
mr  r2,r12
@@

[PATCH 3/5] powerpc/64: switch asm helpers from GOT to TOC relative addressing

2022-09-22 Thread Nicholas Piggin

There is no need to use GOT addressing within the kernel.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/boot/ppc_asm.h| 3 ++-
 arch/powerpc/include/asm/ppc_asm.h | 3 ++-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/boot/ppc_asm.h b/arch/powerpc/boot/ppc_asm.h
index f823f87b7357..afdd195528eb 100644
--- a/arch/powerpc/boot/ppc_asm.h
+++ b/arch/powerpc/boot/ppc_asm.h
@@ -90,7 +90,8 @@
addireg,reg,name@l
 #else
 #define LOAD_REG_ADDR(reg,name)\
-   ld  reg,name@got(r2)
+   addis   reg,r2,name@toc@ha; \
+   addireg,reg,name@toc@l
 #endif
 
 #endif /* _PPC64_PPC_ASM_H */
diff --git a/arch/powerpc/include/asm/ppc_asm.h 
b/arch/powerpc/include/asm/ppc_asm.h
index 83c02f5a7f2a..da570b197e82 100644
--- a/arch/powerpc/include/asm/ppc_asm.h
+++ b/arch/powerpc/include/asm/ppc_asm.h
@@ -315,7 +315,8 @@ GLUE(.,name):
rldimi  reg, tmp, 32, 0
 
 #define LOAD_REG_ADDR(reg,name)\
-   ld  reg,name@got(r2)
+   addis   reg,r2,name@toc@ha; \
+   addireg,reg,name@toc@l
 
 #define LOAD_REG_ADDRBASE(reg,name)LOAD_REG_ADDR(reg,name)
 #define ADDROFF(name)  0
-- 
2.37.2

[PATCH 2/5] powerpc/64: asm use consistent global variable declaration and access

2022-09-22 Thread Nicholas Piggin

Use helper macros to access global variables, and place them in .data
sections rather than in .toc. Putting addresses in TOC is not required
because the kernel is linked with a single TOC.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/boot/opal-calls.S  |  6 +++---
 arch/powerpc/boot/ppc_asm.h |  9 +
 arch/powerpc/kernel/interrupt_64.S  | 10 --
 arch/powerpc/kernel/swsusp_asm64.S  | 16 +---
 arch/powerpc/kernel/trace/ftrace_mprofile.S |  3 +--
 arch/powerpc/kernel/vector.S| 15 +++
 arch/powerpc/lib/copypage_64.S  |  7 +--
 arch/powerpc/lib/string_64.S|  7 +--
 arch/powerpc/perf/bhrb.S|  2 +-
 arch/powerpc/platforms/pseries/hvCall.S |  4 ++--
 arch/powerpc/xmon/spr_access.S  |  4 ++--
 11 files changed, 32 insertions(+), 51 deletions(-)

diff --git a/arch/powerpc/boot/opal-calls.S b/arch/powerpc/boot/opal-calls.S
index ad0e15d930c4..1f2f330a459e 100644
--- a/arch/powerpc/boot/opal-calls.S
+++ b/arch/powerpc/boot/opal-calls.S
@@ -16,7 +16,7 @@ opal_kentry:
li  r5, 0
li  r6, 0
li  r7, 0
-   ld  r11,opal@got(r2)
+   LOAD_REG_ADDR(r11, opal)
ld  r8,0(r11)
ld  r9,8(r11)
bctr
@@ -35,7 +35,7 @@ opal_call:
mr  r13,r2
 
/* Set opal return address */
-   ld  r11,opal_return@got(r2)
+   LOAD_REG_ADDR(r11, opal_return)
mtlrr11
mfmsr   r12
 
@@ -45,7 +45,7 @@ opal_call:
mtspr   SPRN_HSRR1,r12
 
/* load the opal call entry point and base */
-   ld  r11,opal@got(r2)
+   LOAD_REG_ADDR(r11, opal)
ld  r12,8(r11)
ld  r2,0(r11)
mtspr   SPRN_HSRR0,r12
diff --git a/arch/powerpc/boot/ppc_asm.h b/arch/powerpc/boot/ppc_asm.h
index 192b97523b05..f823f87b7357 100644
--- a/arch/powerpc/boot/ppc_asm.h
+++ b/arch/powerpc/boot/ppc_asm.h
@@ -84,4 +84,13 @@
 #define MFTBU(dest)mfspr dest, SPRN_TBRU
 #endif
 
+#ifdef CONFIG_PPC32
+#define LOAD_REG_ADDR(reg,name)\
+   lis reg,name@ha;\
+   addireg,reg,name@l
+#else
+#define LOAD_REG_ADDR(reg,name)\
+   ld  reg,name@got(r2)
+#endif
+
 #endif /* _PPC64_PPC_ASM_H */
diff --git a/arch/powerpc/kernel/interrupt_64.S 
b/arch/powerpc/kernel/interrupt_64.S
index ee2d2d410c5a..fbfb72a62da9 100644
--- a/arch/powerpc/kernel/interrupt_64.S
+++ b/arch/powerpc/kernel/interrupt_64.S
@@ -13,16 +13,6 @@
 #include 
 #include 
 
-   .section".toc","aw"
-SYS_CALL_TABLE:
-   .tc sys_call_table[TC],sys_call_table
-
-#ifdef CONFIG_COMPAT
-COMPAT_SYS_CALL_TABLE:
-   .tc compat_sys_call_table[TC],compat_sys_call_table
-#endif
-   .previous
-
.align 7
 
 .macro DEBUG_SRR_VALID srr
diff --git a/arch/powerpc/kernel/swsusp_asm64.S 
b/arch/powerpc/kernel/swsusp_asm64.S
index 9f1903c7f540..f645652c2654 100644
--- a/arch/powerpc/kernel/swsusp_asm64.S
+++ b/arch/powerpc/kernel/swsusp_asm64.S
@@ -76,16 +76,10 @@
 swsusp_save_area:
.space SL_SIZE
 
-   .section ".toc","aw"
-swsusp_save_area_ptr:
-   .tc swsusp_save_area[TC],swsusp_save_area
-restore_pblist_ptr:
-   .tc restore_pblist[TC],restore_pblist
-
.section .text
.align  5
 _GLOBAL(swsusp_arch_suspend)
-   ld  r11,swsusp_save_area_ptr@toc(r2)
+   LOAD_REG_ADDR(r11, swsusp_save_area)
SAVE_SPECIAL(LR)
SAVE_REGISTER(r1)
SAVE_SPECIAL(CR)
@@ -131,7 +125,7 @@ END_FW_FTR_SECTION_IFCLR(FW_FEATURE_LPAR)
bl swsusp_save
 
/* restore LR */
-   ld  r11,swsusp_save_area_ptr@toc(r2)
+   LOAD_REG_ADDR(r11, swsusp_save_area)
RESTORE_SPECIAL(LR)
addir1,r1,128
 
@@ -145,7 +139,7 @@ BEGIN_FTR_SECTION
 END_FTR_SECTION_IFSET(CPU_FTR_ALTIVEC)
sync
 
-   ld  r12,restore_pblist_ptr@toc(r2)
+   LOAD_REG_ADDR(r11, restore_pblist)
ld  r12,0(r12)
 
cmpdi   r12,0
@@ -187,7 +181,7 @@ nothing_to_copy:
tlbia
 #endif
 
-   ld  r11,swsusp_save_area_ptr@toc(r2)
+   LOAD_REG_ADDR(r11, swsusp_save_area)
 
RESTORE_SPECIAL(CR)
 
@@ -265,7 +259,7 @@ END_FW_FTR_SECTION_IFCLR(FW_FEATURE_LPAR)
bl  do_after_copyback
addir1,r1,128
 
-   ld  r11,swsusp_save_area_ptr@toc(r2)
+   LOAD_REG_ADDR(r11, swsusp_save_area)
RESTORE_SPECIAL(LR)
 
li  r3, 0
diff --git a/arch/powerpc/kernel/trace/ftrace_mprofile.S 
b/arch/powerpc/kernel/trace/ftrace_mprofile.S
index 4fa23e260cab..33fcfb2eaded 100644
--- a/arch/powerpc/kernel/trace/ftrace_mprofile.S
+++ b/arch/powerpc/kernel/trace/ftrace_mprofile.S
@@ -85,8 +85,7 @@
std r2, STK_GOT(r1)
ld  r2,PACATOC(r13) /* get kernel TOC in r2 */
 
-   addis   r3,r2,function_trace_op@toc@ha
-   addi

[PATCH 1/5] powerpc/64: use 32-bit immediate for STACK_FRAME_REGS_MARKER

2022-09-22 Thread Nicholas Piggin

Using a 32-bit constant for this marker allows it to be loaded with
two ALU instructions, like 32-bit. This avoids a TOC entry and a
TOC load that depends on the r2 value that has just been loaded from
the PACA.

This changes the value for 32-bit as well, so both have the same
value in the low 4 bytes and 64-bit has 0x in the top bytes.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/include/asm/ptrace.h| 5 +++--
 arch/powerpc/kernel/entry_32.S   | 6 +++---
 arch/powerpc/kernel/exceptions-64e.S | 8 +---
 arch/powerpc/kernel/exceptions-64s.S | 2 +-
 arch/powerpc/kernel/head_64.S| 7 ---
 arch/powerpc/kernel/interrupt_64.S   | 6 +++---
 6 files changed, 11 insertions(+), 23 deletions(-)

diff --git a/arch/powerpc/include/asm/ptrace.h 
b/arch/powerpc/include/asm/ptrace.h
index a03403695cd4..49d720bb888b 100644
--- a/arch/powerpc/include/asm/ptrace.h
+++ b/arch/powerpc/include/asm/ptrace.h
@@ -99,6 +99,9 @@ struct pt_regs
 
 #define STACK_FRAME_WITH_PT_REGS (STACK_FRAME_OVERHEAD + sizeof(struct 
pt_regs))
 
+/* 0x8d9a988d on 64-bit */
+#define STACK_FRAME_REGS_MARKERASM_CONST(-0x72656773) /* 0x8d9a988d */
+
 #ifdef __powerpc64__
 
 /*
@@ -115,7 +118,6 @@ struct pt_regs
 
 #define STACK_FRAME_OVERHEAD   112 /* size of minimum stack frame */
 #define STACK_FRAME_LR_SAVE2   /* Location of LR in stack frame */
-#define STACK_FRAME_REGS_MARKERASM_CONST(0x7265677368657265)
 #define STACK_INT_FRAME_SIZE   (sizeof(struct pt_regs) + \
 STACK_FRAME_OVERHEAD + KERNEL_REDZONE_SIZE)
 #define STACK_FRAME_MARKER 12
@@ -136,7 +138,6 @@ struct pt_regs
 #define KERNEL_REDZONE_SIZE0
 #define STACK_FRAME_OVERHEAD   16  /* size of minimum stack frame */
 #define STACK_FRAME_LR_SAVE1   /* Location of LR in stack frame */
-#define STACK_FRAME_REGS_MARKERASM_CONST(0x72656773)
 #define STACK_INT_FRAME_SIZE   (sizeof(struct pt_regs) + STACK_FRAME_OVERHEAD)
 #define STACK_FRAME_MARKER 2
 #define STACK_FRAME_MIN_SIZE   STACK_FRAME_OVERHEAD
diff --git a/arch/powerpc/kernel/entry_32.S b/arch/powerpc/kernel/entry_32.S
index 1d599df6f169..c2516f982cfa 100644
--- a/arch/powerpc/kernel/entry_32.S
+++ b/arch/powerpc/kernel/entry_32.S
@@ -265,7 +265,7 @@ fast_exception_return:
mtcrr10
lwz r10,_LINK(r11)
mtlrr10
-   /* Clear the exception_marker on the stack to avoid confusing 
stacktrace */
+   /* Clear the exception marker on the stack to avoid confusing 
stacktrace */
li  r10, 0
stw r10, 8(r11)
REST_GPR(10, r11)
@@ -322,7 +322,7 @@ ALT_FTR_SECTION_END_IFCLR(CPU_FTR_STCX_CHECKS_ADDRESS)
li  r0,0
 
/*
-* Leaving a stale exception_marker on the stack can confuse
+* Leaving a stale exception marker on the stack can confuse
 * the reliable stack unwinder later on. Clear it.
 */
stw r0,8(r1)
@@ -374,7 +374,7 @@ ALT_FTR_SECTION_END_IFCLR(CPU_FTR_STCX_CHECKS_ADDRESS)
mtspr   SPRN_XER,r5
 
/*
-* Leaving a stale exception_marker on the stack can confuse
+* Leaving a stale exception marker on the stack can confuse
 * the reliable stack unwinder later on. Clear it.
 */
stw r0,8(r1)
diff --git a/arch/powerpc/kernel/exceptions-64e.S 
b/arch/powerpc/kernel/exceptions-64e.S
index 67dc4e3179a0..08a3139079b7 100644
--- a/arch/powerpc/kernel/exceptions-64e.S
+++ b/arch/powerpc/kernel/exceptions-64e.S
@@ -389,7 +389,7 @@ exc_##n##_common:   
\
ld  r9,excf+EX_R1(r13); /* load orig r1 back from PACA */   \
lwz r10,excf+EX_CR(r13);/* load orig CR back from PACA  */  \
lbz r11,PACAIRQSOFTMASK(r13); /* get current IRQ softe */   \
-   ld  r12,exception_marker@toc(r2);   \
+   LOAD_REG_IMMEDIATE(r12, STACK_FRAME_REGS_MARKER);   \
li  r0,0;   \
std r3,GPR10(r1);   /* save r10 to stackframe */\
std r4,GPR11(r1);   /* save r11 to stackframe */\
@@ -470,12 +470,6 @@ exc_##n##_bad_stack:   
\
bl  hdlr;   \
b   interrupt_return
 
-/* This value is used to mark exception frames on the stack. */
-   .section".toc","aw"
-exception_marker:
-   .tc ID_EXC_MARKER[TC],STACK_FRAME_REGS_MARKER
-
-
 /*
  * And here we have the exception vectors !
  */
diff --git a/arch/powerpc/kernel/exceptions-64s.S 
b/arch/powerpc/kernel/exceptions-64s.S
index 3d0dc133a9ae..9d375ea58add 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -589,7 +589,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_CFAR)
li

[PATCH 0/5] powerpc/64: avoid GOT addressing, don't put data in TOC

2022-09-22 Thread Nicholas Piggin

This is a cleaned up set of the initial prep patches from the pcrel
series, dealing with regularising addressing variables from asm
and using helper macros more consistently.

Nothing really new, this is more complete, splits out the changes
more logically, adds changelog/comments, and avoids touching ppc32
much (only slight comment change and interrupt stack magic number
change).

Patch 2 is probably actually required before we make the TOC readonly
at runtime, because there is one case (hcall_tracepoint_refcount)
that puts read-write data in the TOC section.

Thanks,
Nick

Nicholas Piggin (5):
  powerpc/64: use 32-bit immediate for STACK_FRAME_REGS_MARKER
  powerpc/64: asm use consistent global variable declaration and access
  powerpc/64: switch asm helpers from GOT to TOC relative addressing
  powerpc/64: provide a helper macro to load r2 with the kernel TOC
  powerpc/64e: provide an addressing macro for use with TOC in alternate
register

 arch/powerpc/boot/opal-calls.S|  6 ++--
 arch/powerpc/boot/ppc_asm.h   | 10 ++
 arch/powerpc/include/asm/ppc_asm.h| 20 ++-
 arch/powerpc/include/asm/ptrace.h |  5 +--
 arch/powerpc/kernel/entry_32.S|  6 ++--
 arch/powerpc/kernel/exceptions-64e.S  | 34 ---
 arch/powerpc/kernel/exceptions-64s.S  |  8 ++---
 arch/powerpc/kernel/head_64.S | 11 ++
 arch/powerpc/kernel/interrupt_64.S| 28 +--
 arch/powerpc/kernel/optprobes_head.S  |  2 +-
 arch/powerpc/kernel/swsusp_asm64.S| 16 +++--
 arch/powerpc/kernel/trace/ftrace_low.S|  2 +-
 arch/powerpc/kernel/trace/ftrace_mprofile.S   |  6 ++--
 arch/powerpc/kernel/vector.S  | 15 
 arch/powerpc/kvm/book3s_64_entry.S|  2 +-
 arch/powerpc/kvm/book3s_hv_rmhandlers.S   |  4 +--
 arch/powerpc/kvm/tm.S |  2 +-
 arch/powerpc/lib/copypage_64.S|  7 +---
 arch/powerpc/lib/string_64.S  |  7 +---
 arch/powerpc/mm/nohash/tlb_low_64e.S  |  4 +--
 arch/powerpc/perf/bhrb.S  |  2 +-
 .../powerpc/platforms/powernv/opal-wrappers.S |  2 +-
 arch/powerpc/platforms/pseries/hvCall.S   |  4 +--
 arch/powerpc/xmon/spr_access.S|  4 +--
 24 files changed, 97 insertions(+), 110 deletions(-)

-- 
2.37.2

Re: [PATCH v2 2/2] powerpc/pci: Prefer PCI domain assignment via DT 'linux,pci-domain' and alias

2022-09-22 Thread Benjamin Herrenschmidt

On Mon, 2022-08-15 at 15:46 +1000, Michael Ellerman wrote:
> Guenter Roeck  writes:
> > On Wed, Jul 06, 2022 at 12:21:48PM +0200, Pali Rohár wrote:
> > > Other Linux architectures use DT property 'linux,pci-domain' for 
> > > specifying
> > > fixed PCI domain of PCI controller specified in Device-Tree.
> > > 
> > > And lot of Freescale powerpc boards have defined numbered pci alias in
> > > Device-Tree for every PCIe controller which number specify preferred PCI
> > > domain.
> > > 
> > > So prefer usage of DT property 'linux,pci-domain' (via function
> > > of_get_pci_domain_nr()) and DT pci alias (via function of_alias_get_id())
> > > on powerpc architecture for assigning PCI domain to PCI controller.
> > > 
> > > Fixes: 63a72284b159 ("powerpc/pci: Assign fixed PHB number based on 
> > > device-tree properties")
> > > Signed-off-by: Pali Rohár 
> > 
> > This patch results in a number of boot warnings with various qemu
> > boot tests.
> 
> Thanks for the report.
> 
> I have automated qemu boot tests to catch things like this, they even
> have DEBUG_ATOMIC_SLEEP enabled ... but I inadvertantly broke my script
> that checks for "BUG:" in the console log. Sometimes you just can't
> win.

So the problem is

spin_lock(_spinlock);

get_phb_number() relies on it for the phb_bitmap allocation. You can
move it out of the lock but you'll have to either:

 - Take the lock inside it to protect the allocation

 - Turn find_first_zero_bit/set_bit into a loop of
find_first_zero_bit+test_and_set_bit() which wouldn't require a lock.

Note about the other "reg" numbering conversation ... I'm pretty sure
that breaks some old PowerMac crap which shows nobody really uses these
things considering how long the patch has been there :-)

I'm pretty sure something somewhere assumes the primary bus is 0. Some
old userspace definitely does though that might no longer be relevant.
The whole business with "domain 0" being special and avoiding domain
numbers in /proc relies on this too...

Cheers,
Ben.

Re: [PATCH] powerpc/pci: Enable PCI domains in /proc when PCI bus numbers are not unique

2022-09-22 Thread Benjamin Herrenschmidt

On Thu, 2022-09-01 at 13:53 +1000, Michael Ellerman wrote:
> > 
> > I sent two patches which do another steps to achieve it:
> > https://lore.kernel.org/linuxppc-dev/20220817163927.24453-1-p...@kernel.org/t/#u
> > 
> > Main blocker is pci-OF-bus-map which is in direct conflict with
> > CONFIG_PPC_PCI_BUS_NUM_DOMAIN_DEPENDENT and which used on chrp and pmac.
> > And I have no idea if pci-OF-bus-map is still needed or not.
> 
> Yeah thanks, I saw those patches.
> 
> I can't find any code that refers to pci-OF-bus-map, so I'm inclined to
> remove it entirely.
> 
> But I'll do some more searching to see if I can find any references to
> it in old code.

Trying to remember ... :-)

So this is what I recall at this point:

 - Ancient X11 didn't understand domains in /proc and thus would barf,
which was the primary reason for not enabling them always iirc...

 - There might be something else with early PowerMacs (Grand Central
chipset) where we have effectively two domains (gc and chaos) but
overlapping bus numbers. There might still be pre-historical code in
there that assumes it's that way though I can't see anything obvious.
Paul might still have one of these :-) (PowerMac 7200/7500/8500/9500
afaik).

 - pci-OF-bus-map predates the PCI layer keeping track of the PCI/OF
relationship. I don't believe it's still used anywhere in the kernel,
though it's possible (unlikely ?) that some garbage remains in
userspace that does.

At this point, I wouldn't object to tearing this all out and just
having domains always (and see what the fallout is).

Cheers,
Ben.

Re: [PATCH v1 1/3] coding-style.rst: document BUG() and WARN() rules ("do not crash the kernel")

2022-09-22 Thread John Hubbard

On 9/22/22 19:26, John Hubbard wrote:
> 
> Reviewed-by: John Hubbard 
> 

I forgot to mention that I had applied your fix to Akira's
issue, before reviewing. So that fix works and builds and
looks nice too.

thanks,

-- 
John Hubbard
NVIDIA

Re: [PATCH v1 1/3] coding-style.rst: document BUG() and WARN() rules ("do not crash the kernel")

2022-09-22 Thread John Hubbard

On 9/20/22 05:23, David Hildenbrand wrote:
> [1] 
> https://lkml.kernel.org/r/CAHk-=wiEAH+ojSpAgx_Ep=nkpwhu8ado3v56bxccsu97oyj...@mail.gmail.com
> [2] 
> https://lore.kernel.org/r/CAHk-=wg40eazofo16eviaj7mfqdhz2gvebvfsmf6gyzsprj...@mail.gmail.com
> [2] 
> https://lkml.kernel.org/r/CAHk-=wit-dmhmfqery29jspjfgebx_ld+pnerc4j2ag990w...@mail.gmail.com

s/2/3/

...
> diff --git a/Documentation/process/coding-style.rst 
> b/Documentation/process/coding-style.rst
> index 03eb53fd029a..e05899cbfd49 100644
> --- a/Documentation/process/coding-style.rst
> +++ b/Documentation/process/coding-style.rst
> @@ -1186,6 +1186,67 @@ expression used.  For instance:
>   #endif /* CONFIG_SOMETHING */
>  
>  
> +22) Do not crash the kernel
> +---
> +
> +In general, it is not the kernel developer's decision to crash the kernel.

What do you think of this alternate wording:

In general, the decision to crash the kernel belongs to the user, rather
than to the kernel developer.


> +
> +Avoid panic()
> +=
> +
> +panic() should be used with care and primarily only during system boot.
> +panic() is, for example, acceptable when running out of memory during boot 
> and
> +not being able to continue.
> +
> +Use WARN() rather than BUG()
> +
> +
> +Do not add new code that uses any of the BUG() variants, such as BUG(),
> +BUG_ON(), or VM_BUG_ON(). Instead, use a WARN*() variant, preferably
> +WARN_ON_ONCE(), and possibly with recovery code. Recovery code is not
> +required if there is no reasonable way to at least partially recover.
> +
> +"I'm too lazy to do error handling" is not an excuse for using BUG(). Major
> +internal corruptions with no way of continuing may still use BUG(), but need
> +good justification.
> +
> +Use WARN_ON_ONCE() rather than WARN() or WARN_ON()
> +**
> +
> +WARN_ON_ONCE() is generally preferred over WARN() or WARN_ON(), because it
> +is common for a given warning condition, if it occurs at all, to occur
> +multiple times. This can fill up and wrap the kernel log, and can even slow
> +the system enough that the excessive logging turns into its own, additional
> +problem.
> +
> +Do not WARN lightly
> +***
> +
> +WARN*() is intended for unexpected, this-should-never-happen situations.
> +WARN*() macros are not to be used for anything that is expected to happen
> +during normal operation. These are not pre- or post-condition asserts, for
> +example. Again: WARN*() must not be used for a condition that is expected
> +to trigger easily, for example, by user space actions. pr_warn_once() is a
> +possible alternative, if you need to notify the user of a problem.
> +
> +Do not worry about panic_on_warn users
> +**
> +
> +A few more words about panic_on_warn: Remember that ``panic_on_warn`` is an
> +available kernel option, and that many users set this option. This is why
> +there is a "Do not WARN lightly" writeup, above. However, the existence of
> +panic_on_warn users is not a valid reason to avoid the judicious use
> +WARN*(). That is because, whoever enables panic_on_warn has explicitly
> +asked the kernel to crash if a WARN*() fires, and such users must be
> +prepared to deal with the consequences of a system that is somewhat more
> +likely to crash.
> +
> +Use BUILD_BUG_ON() for compile-time assertions
> +**
> +
> +The use of BUILD_BUG_ON() is acceptable and encouraged, because it is a
> +compile-time assertion that has no effect at runtime.
> +
>  Appendix I) References
>  --
>  

I like the wording, it feels familiar somehow! :)

Reviewed-by: John Hubbard 

thanks,

-- 
John Hubbard
NVIDIA

Re: [PATCH v1 3/3] checkpatch: warn on usage of VM_BUG_ON() and other BUG variants

2022-09-22 Thread John Hubbard

On 9/22/22 19:11, Joe Perches wrote:
>> Should this be a separate patch? Adding a bunch of exceptions to the BUG() 
>> rules is 
>> a separate and distinct thing from adding VM_BUG_ON() and other *BUG*() 
>> variants to
>> the mix.
> 
> Not in my opinion.

OK, then. :)

> 
>>> my $msg_level = \
>>> $msg_level = \ if ($file);
>>> &{$msg_level}("AVOID_BUG",
>>> - "Avoid crashing the kernel - try using 
>>> WARN_ON & recovery code rather than BUG() or BUG_ON()\n" . $herecurr);
>>> + "Do not crash the kernel unless it is 
>>> unavoidable - use WARN_ON_ONCE & recovery code (if reasonable) rather than 
>>> BUG() or variants.\n" . $herecurr);
>>
>> Here's a requested tweak, to clean up the output and fix punctuation:
>>
>> "Avoid crashing the kernel--use WARN_ON_ONCE() plus recovery code (if 
>> feasible) instead of BUG() or variants.\n" . $herecurr);
> 
> Fixing punctuation here would be removing the trailing period as checkpatch
> only has periods for multi-sentence output messages.

OK, let's do that too. 

> 
> And I think that "Do not crash" is a stronger statement than "Avoid crashing"
> so I prefer the original suggestion but it's not a big deal either way.

Yes, stronger wording is better. So how about this:

"Do not crash the kernel unless it is absolutely unavoidable--use 
WARN_ON_ONCE() plus recovery code (if feasible) instead of BUG() or variants\n" 
. $herecurr);


thanks,

-- 
John Hubbard
NVIDIA

Re: [PATCH v1 3/3] checkpatch: warn on usage of VM_BUG_ON() and other BUG variants

2022-09-22 Thread Joe Perches

On Thu, 2022-09-22 at 19:05 -0700, John Hubbard wrote:
> On 9/20/22 05:23, David Hildenbrand wrote:
> > checkpatch does not point out that VM_BUG_ON() and friends should be
> > avoided, however, Linus notes:
> > 
> > VM_BUG_ON() has the exact same semantics as BUG_ON. It is literally
> > no different, the only difference is "we can make the code smaller
> > because these are less important". [1]
> > 
> > So let's warn on VM_BUG_ON() and other BUG variants as well. While at it,
> > make it clearer that the kernel really shouldn't be crashed.
> > 
> > As there are some subsystem BUG macros that actually don't end up crashing
> > the kernel -- for example, KVM_BUG_ON() -- exclude these manually.
> > 
> > [1] 
> > https://lore.kernel.org/r/CAHk-=wg40eazofo16eviaj7mfqdhz2gvebvfsmf6gyzsprj...@mail.gmail.com
[]
> > diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
[]
> > @@ -4695,12 +4695,12 @@ sub process {
> > }
> > }
> >  
> > -# avoid BUG() or BUG_ON()
> > -   if ($line =~ /\b(?:BUG|BUG_ON)\b/) {
> > +# do not use BUG() or variants
> > +   if ($line =~ 
> > /\b(?!AA_|BUILD_|DCCP_|IDA_|KVM_|RWLOCK_|snd_|SPIN_)(?:[a-zA-Z_]*_)?BUG(?:_ON)?(?:_[A-Z_]+)?\s*\(/)
> >  {
> 
> Should this be a separate patch? Adding a bunch of exceptions to the BUG() 
> rules is 
> a separate and distinct thing from adding VM_BUG_ON() and other *BUG*() 
> variants to
> the mix.

Not in my opinion.

> > my $msg_level = \
> > $msg_level = \ if ($file);
> > &{$msg_level}("AVOID_BUG",
> > - "Avoid crashing the kernel - try using 
> > WARN_ON & recovery code rather than BUG() or BUG_ON()\n" . $herecurr);
> > + "Do not crash the kernel unless it is 
> > unavoidable - use WARN_ON_ONCE & recovery code (if reasonable) rather than 
> > BUG() or variants.\n" . $herecurr);
> 
> Here's a requested tweak, to clean up the output and fix punctuation:
> 
> "Avoid crashing the kernel--use WARN_ON_ONCE() plus recovery code (if 
> feasible) instead of BUG() or variants.\n" . $herecurr);

Fixing punctuation here would be removing the trailing period as checkpatch
only has periods for multi-sentence output messages.

And I think that "Do not crash" is a stronger statement than "Avoid crashing"
so I prefer the original suggestion but it's not a big deal either way.

Re: [PATCH v1 3/3] checkpatch: warn on usage of VM_BUG_ON() and other BUG variants

2022-09-22 Thread John Hubbard

On 9/20/22 05:23, David Hildenbrand wrote:
> checkpatch does not point out that VM_BUG_ON() and friends should be
> avoided, however, Linus notes:
> 
> VM_BUG_ON() has the exact same semantics as BUG_ON. It is literally
> no different, the only difference is "we can make the code smaller
> because these are less important". [1]
> 
> So let's warn on VM_BUG_ON() and other BUG variants as well. While at it,
> make it clearer that the kernel really shouldn't be crashed.
> 
> As there are some subsystem BUG macros that actually don't end up crashing
> the kernel -- for example, KVM_BUG_ON() -- exclude these manually.
> 
> [1] 
> https://lore.kernel.org/r/CAHk-=wg40eazofo16eviaj7mfqdhz2gvebvfsmf6gyzsprj...@mail.gmail.com
> 
> Signed-off-by: David Hildenbrand 
> ---
>  scripts/checkpatch.pl | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
> index 79e759aac543..21f3a79aa46f 100755
> --- a/scripts/checkpatch.pl
> +++ b/scripts/checkpatch.pl
> @@ -4695,12 +4695,12 @@ sub process {
>   }
>   }
>  
> -# avoid BUG() or BUG_ON()
> - if ($line =~ /\b(?:BUG|BUG_ON)\b/) {
> +# do not use BUG() or variants
> + if ($line =~ 
> /\b(?!AA_|BUILD_|DCCP_|IDA_|KVM_|RWLOCK_|snd_|SPIN_)(?:[a-zA-Z_]*_)?BUG(?:_ON)?(?:_[A-Z_]+)?\s*\(/)
>  {

Should this be a separate patch? Adding a bunch of exceptions to the BUG() 
rules is 
a separate and distinct thing from adding VM_BUG_ON() and other *BUG*() 
variants to
the mix.

>   my $msg_level = \
>   $msg_level = \ if ($file);
>   &{$msg_level}("AVOID_BUG",
> -   "Avoid crashing the kernel - try using 
> WARN_ON & recovery code rather than BUG() or BUG_ON()\n" . $herecurr);
> +   "Do not crash the kernel unless it is 
> unavoidable - use WARN_ON_ONCE & recovery code (if reasonable) rather than 
> BUG() or variants.\n" . $herecurr);

Here's a requested tweak, to clean up the output and fix punctuation:

"Avoid crashing the kernel--use WARN_ON_ONCE() plus recovery code (if feasible) 
instead of BUG() or variants.\n" . $herecurr);


thanks,

-- 
John Hubbard
NVIDIA

Re: [PATCH 2/2] powerpc/rtas: block error injection when locked down

2022-09-22 Thread Paul Moore

On Thu, Sep 22, 2022 at 3:38 PM Nathan Lynch  wrote:
>
> The error injection facility on pseries VMs allows corruption of
> arbitrary guest memory, potentially enabling a sufficiently privileged
> user to disable lockdown or perform other modifications of the running
> kernel via the rtas syscall.
>
> Block the PAPR error injection facility from being opened or called
> when locked down.
>
> Signed-off-by: Nathan Lynch 
> ---
>  arch/powerpc/kernel/rtas.c | 25 -
>  include/linux/security.h   |  1 +
>  security/security.c|  1 +
>  3 files changed, 26 insertions(+), 1 deletion(-)

...

> diff --git a/include/linux/security.h b/include/linux/security.h
> index 1ca8dbacd3cc..b5d5138ae66a 100644
> --- a/include/linux/security.h
> +++ b/include/linux/security.h
> @@ -123,6 +123,7 @@ enum lockdown_reason {
> LOCKDOWN_BPF_WRITE_USER,
> LOCKDOWN_DBG_WRITE_KERNEL,
> LOCKDOWN_DEVICE_TREE,
> +   LOCKDOWN_RTAS_ERROR_INJECTION,

With the understanding that I've never heard of RTAS until now, are
there any other RTAS events that would require a lockdown reason?  As
a follow up, is it important to distinguish between different RTAS
lockdown reasons?

I'm trying to determine if we can just call it LOCKDOWN_RTAS.

> LOCKDOWN_INTEGRITY_MAX,
> LOCKDOWN_KCORE,
> LOCKDOWN_KPROBES,
> diff --git a/security/security.c b/security/security.c
> index 2863fc31eec6..6518b239ada2 100644
> --- a/security/security.c
> +++ b/security/security.c
> @@ -61,6 +61,7 @@ const char *const 
> lockdown_reasons[LOCKDOWN_CONFIDENTIALITY_MAX+1] = {
> [LOCKDOWN_BPF_WRITE_USER] = "use of bpf to write user RAM",
> [LOCKDOWN_DBG_WRITE_KERNEL] = "use of kgdb/kdb to write kernel RAM",
> [LOCKDOWN_DEVICE_TREE] = "modifying device tree contents",
> +   [LOCKDOWN_RTAS_ERROR_INJECTION] = "RTAS error injection",

See above.

> [LOCKDOWN_INTEGRITY_MAX] = "integrity",
> [LOCKDOWN_KCORE] = "/proc/kcore access",
> [LOCKDOWN_KPROBES] = "use of kprobes",
> --
> 2.37.3

-- 
paul-moore.com

Re: [PATCH 1/2] powerpc/pseries: block untrusted device tree changes when locked down

2022-09-22 Thread Paul Moore

On Thu, Sep 22, 2022 at 3:38 PM Nathan Lynch  wrote:
>
> The /proc/powerpc/ofdt interface allows the root user to freely alter
> the in-kernel device tree, enabling arbitrary physical address writes
> via drivers that could bind to malicious device nodes, thus making it
> possible to disable lockdown.
>
> Historically this interface has been used on the pseries platform to
> facilitate the runtime addition and removal of processor, memory, and
> device resources (aka Dynamic Logical Partitioning or DLPAR). Years
> ago, the processor and memory use cases were migrated to designs that
> happen to be lockdown-friendly: device tree updates are communicated
> directly to the kernel from firmware without passing through untrusted
> user space. I/O device DLPAR via the "drmgr" command in powerpc-utils
> remains the sole legitimate user of /proc/powerpc/ofdt, but it is
> already broken in lockdown since it uses /dev/mem to allocate argument
> buffers for the rtas syscall. So only illegitimate uses of the
> interface should see a behavior change when running on a locked down
> kernel.
>
> Signed-off-by: Nathan Lynch 
> ---
>  arch/powerpc/platforms/pseries/reconfig.c | 5 +
>  include/linux/security.h  | 1 +
>  security/security.c   | 1 +
>  3 files changed, 7 insertions(+)

A couple of small nits below, but in general this seems reasonable.
However, as we are currently at -rc6 I would like us to wait to merge
this until after the upcoming merge window closes (I don't like
merging new functionality into -next at -rc6).

https://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm.git/tree/README.md

> diff --git a/include/linux/security.h b/include/linux/security.h
> index 7bd0c490703d..1ca8dbacd3cc 100644
> --- a/include/linux/security.h
> +++ b/include/linux/security.h
> @@ -122,6 +122,7 @@ enum lockdown_reason {
> LOCKDOWN_XMON_WR,
> LOCKDOWN_BPF_WRITE_USER,
> LOCKDOWN_DBG_WRITE_KERNEL,
> +   LOCKDOWN_DEVICE_TREE,

I would suggest moving LOCKDOWN_DEVICE_TREE to be next to
LOCKDOWN_ACPI_TABLES.  It's not a hard requirement, but it seems like
a nice idea to group similar things when we can.

> LOCKDOWN_INTEGRITY_MAX,
> LOCKDOWN_KCORE,
> LOCKDOWN_KPROBES,
> diff --git a/security/security.c b/security/security.c
> index 4b95de24bc8d..2863fc31eec6 100644
> --- a/security/security.c
> +++ b/security/security.c
> @@ -60,6 +60,7 @@ const char *const 
> lockdown_reasons[LOCKDOWN_CONFIDENTIALITY_MAX+1] = {
> [LOCKDOWN_XMON_WR] = "xmon write access",
> [LOCKDOWN_BPF_WRITE_USER] = "use of bpf to write user RAM",
> [LOCKDOWN_DBG_WRITE_KERNEL] = "use of kgdb/kdb to write kernel RAM",
> +   [LOCKDOWN_DEVICE_TREE] = "modifying device tree contents",

Might as well move this one too.

> [LOCKDOWN_INTEGRITY_MAX] = "integrity",
> [LOCKDOWN_KCORE] = "/proc/kcore access",
> [LOCKDOWN_KPROBES] = "use of kprobes",
> --
> 2.37.3

-- 
paul-moore.com

Re: [PATCH 1/2] tools/perf/tests: Fix string substitutions in build id test

2022-09-22 Thread Ian Rogers

On Thu, Sep 22, 2022 at 12:15 PM Arnaldo Carvalho de Melo
 wrote:
>
> Em Wed, Sep 21, 2022 at 10:38:38PM +0530, Athira Rajeev escreveu:
> > The perf test named “build id cache operations” skips with below
> > error on some distros:
>
> I wonder if we shouldn't instead state that bash is needed?
>
> ⬢[acme@toolbox perf-urgent]$ head -1 tools/perf/tests/shell/*.sh | grep ^#
> #!/bin/sh
> #!/bin/bash
> #!/bin/sh
> #!/bin/sh
> #!/bin/sh
> #!/bin/sh
> #!/bin/sh
> #!/bin/sh
> #!/bin/sh
> #!/bin/sh
> #!/bin/bash
> #!/bin/sh
> #!/bin/sh
> #!/bin/sh
> #!/bin/bash
> #!/bin/sh
> #!/bin/bash
> #!/bin/sh
> #!/bin/sh
> #!/bin/sh
> #!/bin/sh
> #!/bin/sh
> #!/bin/sh
> #!/bin/sh
> #!/bin/sh
> #!/bin/sh
> ⬢[acme@toolbox perf-urgent]$
>
> Opinions?

+1 to bash. Perhaps python longer term?  The XML test output generated
by things like kunit is possible to generate from either bash or
python, but in my experience the python stuff feels better built.

Thanks,
Ian

> - Arnaldo
>
> > <<>>
> >  78: build id cache operations   :
> > test child forked, pid 01
> > WARNING: wine not found. PE binaries will not be run.
> > test binaries: /tmp/perf.ex.SHA1.PKz /tmp/perf.ex.MD5.Gt3 
> > ./tests/shell/../pe-file.exe
> > DEBUGINFOD_URLS=
> > Adding 4abd406f041feb4f10ecde3fc30fd0639e1a91cb /tmp/perf.ex.SHA1.PKz: Ok
> > build id: 4abd406f041feb4f10ecde3fc30fd0639e1a91cb
> > ./tests/shell/buildid.sh: 69: ./tests/shell/buildid.sh: Bad substitution
> > test child finished with -2
> > build id cache operations: Skip
> > <<>>
> >
> > The test script "tests/shell/buildid.sh" uses some of the
> > string substitution ways which are supported in bash, but not in
> > "sh" or other shells. Above error on line number 69 that reports
> > "Bad substitution" is:
> >
> > <<>>
> > link=${build_id_dir}/.build-id/${id:0:2}/${id:2}
> > <<>>
> >
> > Here the way of getting first two characters from id ie,
> > ${id:0:2} and similarly expressions like ${id:2} is not
> > recognised in "sh". So the line errors and instead of
> > hitting failure, the test gets skipped as shown in logs.
> > So the syntax issue causes test not to be executed in
> > such cases. Similarly usage : "${@: -1}" [ to pick last
> > argument passed to a function] in “test_record” doesn’t
> > work in all distros.
> >
> > Fix this by using alternative way with "cut" command
> > to pick "n" characters from the string. Also fix the usage
> > of “${@: -1}” to work in all cases.
> >
> > Another usage in “test_record” is:
> > <<>>
> > ${perf} record --buildid-all -o ${data} $@ &> ${log}
> > <<>>
> >
> > This causes the perf record to start in background and
> > Results in the data file not being created by the time
> > "check" function is invoked. Below log shows perf record
> > result getting displayed after the call to "check" function.
> >
> > <<>>
> > running: perf record /tmp/perf.ex.SHA1.EAU
> > build id: 4abd406f041feb4f10ecde3fc30fd0639e1a91cb
> > link: 
> > /tmp/perf.debug.mLT/.build-id/4a/bd406f041feb4f10ecde3fc30fd0639e1a91cb
> > failed: link 
> > /tmp/perf.debug.mLT/.build-id/4a/bd406f041feb4f10ecde3fc30fd0639e1a91cb 
> > does not exist
> > test child finished with -1
> > build id cache operations: FAILED!
> > root@machine:~/athira/linux/tools/perf# Couldn't synthesize bpf events.
> > [ perf record: Woken up 1 times to write data ]
> > [ perf record: Captured and wrote 0.010 MB /tmp/perf.data.bFF ]
> > <<>>
> >
> > Fix this by redirecting output instead of using “&” which
> > starts the command in background.
> >
> > Signed-off-by: Athira Rajeev 
> > ---
> >  tools/perf/tests/shell/buildid.sh | 16 +---
> >  1 file changed, 9 insertions(+), 7 deletions(-)
> >
> > diff --git a/tools/perf/tests/shell/buildid.sh 
> > b/tools/perf/tests/shell/buildid.sh
> > index f05670d1e39e..3512c4423d48 100755
> > --- a/tools/perf/tests/shell/buildid.sh
> > +++ b/tools/perf/tests/shell/buildid.sh
> > @@ -66,7 +66,7 @@ check()
> >   esac
> >   echo "build id: ${id}"
> >
> > - link=${build_id_dir}/.build-id/${id:0:2}/${id:2}
> > + link=${build_id_dir}/.build-id/$(echo ${id}|cut -c 1-2)/$(echo 
> > ${id}|cut -c 3-)
> >   echo "link: ${link}"
> >
> >   if [ ! -h $link ]; then
> > @@ -74,7 +74,7 @@ check()
> >   exit 1
> >   fi
> >
> > - file=${build_id_dir}/.build-id/${id:0:2}/`readlink ${link}`/elf
> > + file=${build_id_dir}/.build-id/$(echo ${id}|cut -c 1-2)/`readlink 
> > ${link}`/elf
> >   echo "file: ${file}"
> >
> >   if [ ! -x $file ]; then
> > @@ -117,20 +117,22 @@ test_record()
> >  {
> >   data=$(mktemp /tmp/perf.data.XXX)
> >   build_id_dir=$(mktemp -d /tmp/perf.debug.XXX)
> > - log=$(mktemp /tmp/perf.log.XXX)
> > + log_out=$(mktemp /tmp/perf.log.out.XXX)
> > + log_err=$(mktemp /tmp/perf.log.err.XXX)
> >   perf="perf --buildid-dir ${build_id_dir}"
> > + eval last=\${$#}
> >
> >   echo "running: perf record $@"
> > - ${perf} record --buildid-all -o ${data} $@

Re: [PATCH v1 1/3] coding-style.rst: document BUG() and WARN() rules ("do not crash the kernel")

2022-09-22 Thread Akira Yokosawa

Hi,

Minor nits on section title adornments.
See inline comments below.

On Tue, 20 Sep 2022 14:23:00 +0200, David Hildenbrand wrote:
> Linus notes [1] that the introduction of new code that uses VM_BUG_ON()
> is just as bad as BUG_ON(), because it will crash the kernel on
> distributions that enable CONFIG_DEBUG_VM (like Fedora):
> 
> VM_BUG_ON() has the exact same semantics as BUG_ON. It is literally
> no different, the only difference is "we can make the code smaller
> because these are less important". [2]
> 
> This resulted in a more generic discussion about usage of BUG() and
> friends. While there might be corner cases that still deserve a BUG_ON(),
> most BUG_ON() cases should simply use WARN_ON_ONCE() and implement a
> recovery path if reasonable:
> 
> The only possible case where BUG_ON can validly be used is "I have
> some fundamental data corruption and cannot possibly return an
> error". [2]
> 
> As a very good approximation is the general rule:
> 
> "absolutely no new BUG_ON() calls _ever_" [2]
> 
> ... not even if something really shouldn't ever happen and is merely for
> documenting that an invariant always has to hold. However, there are sill
> exceptions where BUG_ON() may be used:
> 
> If you have a "this is major internal corruption, there's no way we can
> continue", then BUG_ON() is appropriate. [3]
> 
> There is only one good BUG_ON():
> 
> Now, that said, there is one very valid sub-form of BUG_ON():
> BUILD_BUG_ON() is absolutely 100% fine. [2]
> 
> While WARN will also crash the machine with panic_on_warn set, that's
> exactly to be expected:
> 
> So we have two very different cases: the "virtual machine with good
> logging where a dead machine is fine" - use 'panic_on_warn'. And
> the actual real hardware with real drivers, running real loads by
> users. [4]
> 
> The basic idea is that warnings will similarly get reported by users
> and be found during testing. However, in contrast to a BUG(), there is a
> way to actually influence the expected behavior (e.g., panic_on_warn)
> and to eventually keep the machine alive to extract some debug info.
> 
> Ingo notes that not all WARN_ON_ONCE cases need recovery. If we don't ever
> expect this code to trigger in any case, recovery code is not really
> helpful.
> 
> I'd prefer to keep all these warnings 'simple' - i.e. no attempted
> recovery & control flow, unless we ever expect these to trigger.
> [5]
> 
> There have been different rules floating around that were never properly
> documented. Let's try to clarify.
> 
> [1] 
> https://lkml.kernel.org/r/CAHk-=wiEAH+ojSpAgx_Ep=nkpwhu8ado3v56bxccsu97oyj...@mail.gmail.com
> [2] 
> https://lore.kernel.org/r/CAHk-=wg40eazofo16eviaj7mfqdhz2gvebvfsmf6gyzsprj...@mail.gmail.com
> [2] 
> https://lkml.kernel.org/r/CAHk-=wit-dmhmfqery29jspjfgebx_ld+pnerc4j2ag990w...@mail.gmail.com
> [4] 
> https://lore.kernel.org/r/CAHk-=wgF7K2gSSpy=m_=k3nov4zaceux9puqf1tjktjla2x...@mail.gmail.com
> [5] https://lore.kernel.org/r/ywiw+mvezotoxn%...@gmail.com
> 
> Signed-off-by: David Hildenbrand 
> ---
>  Documentation/process/coding-style.rst | 61 ++
>  1 file changed, 61 insertions(+)
> 
> diff --git a/Documentation/process/coding-style.rst 
> b/Documentation/process/coding-style.rst
> index 03eb53fd029a..e05899cbfd49 100644
> --- a/Documentation/process/coding-style.rst
> +++ b/Documentation/process/coding-style.rst
> @@ -1186,6 +1186,67 @@ expression used.  For instance:
>   #endif /* CONFIG_SOMETHING */
>  
>  
> +22) Do not crash the kernel
> +---
> +
> +In general, it is not the kernel developer's decision to crash the kernel.
> +
> +Avoid panic()
> +=
This looks to me like a subsection-level title.  The adornment symbol
needs to be:

   *

> +
> +panic() should be used with care and primarily only during system boot.
> +panic() is, for example, acceptable when running out of memory during boot 
> and
> +not being able to continue.
> +
> +Use WARN() rather than BUG()
> +
Ditto.

> +
> +Do not add new code that uses any of the BUG() variants, such as BUG(),
> +BUG_ON(), or VM_BUG_ON(). Instead, use a WARN*() variant, preferably
> +WARN_ON_ONCE(), and possibly with recovery code. Recovery code is not
> +required if there is no reasonable way to at least partially recover.
> +
> +"I'm too lazy to do error handling" is not an excuse for using BUG(). Major
> +internal corruptions with no way of continuing may still use BUG(), but need
> +good justification.
> +
> +Use WARN_ON_ONCE() rather than WARN() or WARN_ON()
> +**
These wrong adornment symbol confuse ReST parser of Sphinx and results in
the build error from "make htmldocs" at this title (long message folded):

Sphinx parallel build error:

docutils.utils.SystemMessage: 
/xxx/Documentation/process/coding-style.rst:1213:

Re: [PATCH 3/3] PCI/AER: Use pci_aer_raw_clear_status() to clear root port's AER error status

2022-09-22 Thread Bjorn Helgaas

On Fri, Sep 02, 2022 at 02:16:34AM +0800, Zhuo Chen wrote:
> Statements clearing AER error status in aer_enable_rootport() has the
> same function as pci_aer_raw_clear_status(). So we replace them, which
> has no functional changes.
> 
> Signed-off-by: Zhuo Chen 
> ---
>  drivers/pci/pcie/aer.c | 7 +--
>  1 file changed, 1 insertion(+), 6 deletions(-)
> 
> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> index d2996afa80f6..eb0193f279f2 100644
> --- a/drivers/pci/pcie/aer.c
> +++ b/drivers/pci/pcie/aer.c
> @@ -1287,12 +1287,7 @@ static void aer_enable_rootport(struct aer_rpc *rpc)
>  SYSTEM_ERROR_INTR_ON_MESG_MASK);
>  
>   /* Clear error status */
> - pci_read_config_dword(pdev, aer + PCI_ERR_ROOT_STATUS, );
> - pci_write_config_dword(pdev, aer + PCI_ERR_ROOT_STATUS, reg32);
> - pci_read_config_dword(pdev, aer + PCI_ERR_COR_STATUS, );
> - pci_write_config_dword(pdev, aer + PCI_ERR_COR_STATUS, reg32);
> - pci_read_config_dword(pdev, aer + PCI_ERR_UNCOR_STATUS, );
> - pci_write_config_dword(pdev, aer + PCI_ERR_UNCOR_STATUS, reg32);
> + pci_aer_raw_clear_status(pdev);

It's true that this is functionally equivalent.

But 20e15e673b05 ("PCI/AER: Add pci_aer_raw_clear_status() to
unconditionally clear Error Status") says pci_aer_raw_clear_status()
is only for use in the EDR path (this should have been included in the
function comment), so I think we should preserve that property and use
pci_aer_clear_status() here.

pci_aer_raw_clear_status() is the same as pci_aer_clear_status()
except it doesn't check pcie_aer_is_native().  And I'm pretty sure we
can't get to aer_enable_rootport() *unless* pcie_aer_is_native(),
because get_port_device_capability() checks the same thing, so they
should be equivalent here.

Bjorn

Re: [PATCH 0/8] generic command line v4

2022-09-22 Thread Daniel Gimpelevich

On Thu, 2022-09-22 at 14:10 -0700, Daniel Walker wrote:
> On Thu, Sep 22, 2022 at 05:03:46PM -0400, Sean Anderson wrote:
[snip]
> > As recently as last month, someone's patch to add such support was
> > rejected for this reason [1].
> > 
> > --Sean
> > 
> > [1] 
> > https://lore.kernel.org/linux-arm-kernel/20220812084613.GA3107@willie-the-truck/
> 
> 
> I had no idea.. Thanks for pointing that out. I guess I will re-submit in that
> case.
> 
> Daniel

This has been happening repeatedly since circa 2014, on multiple
architectures. It's quite frustrating, really.

Re: [PATCH 0/8] generic command line v4

2022-09-22 Thread Daniel Walker

On Thu, Sep 22, 2022 at 05:03:46PM -0400, Sean Anderson wrote:
> 
> 
> 
> On 9/22/22 4:53 PM, Daniel Walker wrote:
> > On Thu, Sep 22, 2022 at 04:45:01PM -0400, Sean Anderson wrote:
> >> 
> >> 
> >> 
> >> For an arm64 platform (after rebasing):
> >> 
> >> Tested-by: Sean Anderson 
> > 
> > Maybe I'll re-submit it.
> > 
> > Daniel
> > 
> 
> There's still no way to extend the command line on ARM64, since the
> existing method was removed in anticipation that your series would be
> added. 
> 
> As recently as last month, someone's patch to add such support was
> rejected for this reason [1].
> 
> --Sean
> 
> [1] 
> https://lore.kernel.org/linux-arm-kernel/20220812084613.GA3107@willie-the-truck/


I had no idea.. Thanks for pointing that out. I guess I will re-submit in that
case.

Daniel

Re: [PATCH 2/3] PCI/ERR: Clear fatal status in pcie_do_recovery()

2022-09-22 Thread Bjorn Helgaas

On Fri, Sep 02, 2022 at 02:16:33AM +0800, Zhuo Chen wrote:
> When state is pci_channel_io_frozen in pcie_do_recovery(),
> the severity is fatal and fatal status should be cleared.
> So we add pci_aer_clear_fatal_status().

Seems sensible to me.  Did you find this by code inspection or by
debugging a problem?  If the latter, it would be nice to mention the
symptoms of the problem in the commit log.

> Since pcie_aer_is_native() in pci_aer_clear_fatal_status()
> and pci_aer_clear_nonfatal_status() contains the function of
> 'if (host->native_aer || pcie_ports_native)', so we move them
> out of it.

Wrap commit log to fill 75 columns.

> Signed-off-by: Zhuo Chen 
> ---
>  drivers/pci/pcie/err.c | 8 ++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
> index 0c5a143025af..e0a8ade4c3fe 100644
> --- a/drivers/pci/pcie/err.c
> +++ b/drivers/pci/pcie/err.c
> @@ -243,10 +243,14 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
>* it is responsible for clearing this status.  In that case, the
>* signaling device may not even be visible to the OS.
>*/
> - if (host->native_aer || pcie_ports_native) {
> + if (host->native_aer || pcie_ports_native)
>   pcie_clear_device_status(dev);

pcie_clear_device_status() doesn't check for pcie_aer_is_native()
internally, but after 068c29a248b6 ("PCI/ERR: Clear PCIe Device Status
errors only if OS owns AER") and aa344bc8b727 ("PCI/ERR: Clear AER
status only when we control AER"), both callers check before calling
it.

I think we should move the check inside pcie_clear_device_status().
That could be a separate preliminary patch.

There are a couple other places (aer_root_reset() and
get_port_device_capability()) that do the same check and could be
changed to use pcie_aer_is_native() instead.  That could be another
preliminary patch.

> + if (state == pci_channel_io_frozen)
> + pci_aer_clear_fatal_status(dev);
> + else
>   pci_aer_clear_nonfatal_status(dev);
> - }
> +
>   pci_info(bridge, "device recovery successful\n");
>   return status;
>  
> -- 
> 2.30.1 (Apple Git-130)
>

Re: [PATCH 0/8] generic command line v4

2022-09-22 Thread Sean Anderson

On 9/22/22 4:53 PM, Daniel Walker wrote:
> On Thu, Sep 22, 2022 at 04:45:01PM -0400, Sean Anderson wrote:
>> 
>> 
>> 
>> For an arm64 platform (after rebasing):
>> 
>> Tested-by: Sean Anderson 
> 
> Maybe I'll re-submit it.
> 
> Daniel
> 

There's still no way to extend the command line on ARM64, since the
existing method was removed in anticipation that your series would be
added. 

As recently as last month, someone's patch to add such support was
rejected for this reason [1].

--Sean

[1] 
https://lore.kernel.org/linux-arm-kernel/20220812084613.GA3107@willie-the-truck/

Re: [PATCH 0/8] generic command line v4

2022-09-22 Thread Daniel Walker

On Thu, Sep 22, 2022 at 04:45:01PM -0400, Sean Anderson wrote:
> 
> 
> 
> For an arm64 platform (after rebasing):
> 
> Tested-by: Sean Anderson 

Maybe I'll re-submit it.

Daniel

Re: [PATCH 0/8] generic command line v4

2022-09-22 Thread Sean Anderson




On 4/16/21 12:09 AM, Daniel Walker wrote:
> 
> v4 release changes
> 
> * Updated insert-sys-cert tool to change command line symbols after
>   compilation.
> 
>   This tool is used to release binary kernels internally to companies
>   and then later insert certificates for each product by consumers of
>   the binary kernel. Cisco uses this tool for this purpose.
> 
>   Cisco has a similar need for the command line to be modified on a
>   binary released kernels similar to how certificates are setup.
> 
> * Added global symbols to hold append and prepend values.
> 
>   These changes follow the system certificate code to allow the
>   insert-sys-cert tool to be used.
> 
> * Added a test case to confirm functionality.
> 
>   Seemed sensible to add this to make sure everything is working.
> 
> * Dropped powerpc changes
> 
>   Christophe Leroy has reservations about the features for powerpc. I
>   don't think his reservations are founded, and these changes should
>   fully work on powerpc. However, I dropped these changes so Christophe
>   can have more time to get comfortable with the changes.
> 
> 
> Enjoy!
> 
> 
> Daniel Walker (8):
>   CMDLINE: add generic builtin command line
>   scripts: insert-sys-cert: add command line insert capability
>   scripts: insert-sys-cert: change name to insert-symbol
>   CMDLINE: mips: convert to generic builtin command line
>   drivers: firmware: efi: libstub: enable generic commandline
>   CMDLINE: x86: convert to generic builtin command line
>   of: allow sending a NULL value to early_init_dt_scan_chosen
>   CMDLINE: arm64: convert to generic builtin command line
> 
>  arch/arm64/Kconfig|  33 +--
>  arch/arm64/include/asm/setup.h|   2 +
>  arch/arm64/kernel/idreg-override.c|   9 +-
>  arch/mips/Kconfig |   4 +-
>  arch/mips/Kconfig.debug   |  44 
>  arch/mips/configs/ar7_defconfig   |   9 +-
>  arch/mips/configs/bcm47xx_defconfig   |   8 +-
>  arch/mips/configs/bcm63xx_defconfig   |  15 +-
>  arch/mips/configs/bmips_be_defconfig  |  11 +-
>  arch/mips/configs/bmips_stb_defconfig |  11 +-
>  arch/mips/configs/capcella_defconfig  |  11 +-
>  arch/mips/configs/ci20_defconfig  |  10 +-
>  arch/mips/configs/cu1000-neo_defconfig|  10 +-
>  arch/mips/configs/cu1830-neo_defconfig|  10 +-
>  arch/mips/configs/e55_defconfig   |   4 +-
>  arch/mips/configs/generic_defconfig   |   6 +-
>  arch/mips/configs/gpr_defconfig   |  18 +-
>  arch/mips/configs/loongson3_defconfig |  13 +-
>  arch/mips/configs/mpc30x_defconfig|   7 +-
>  arch/mips/configs/tb0219_defconfig|   7 +-
>  arch/mips/configs/tb0226_defconfig|   7 +-
>  arch/mips/configs/tb0287_defconfig|   7 +-
>  arch/mips/configs/workpad_defconfig   |  11 +-
>  arch/mips/include/asm/setup.h |   2 +
>  arch/mips/kernel/relocate.c   |  17 +-
>  arch/mips/kernel/setup.c  |  36 +--
>  arch/mips/pic32/pic32mzda/early_console.c |   2 +-
>  arch/mips/pic32/pic32mzda/init.c  |   3 +-
>  arch/x86/Kconfig  |  44 +---
>  arch/x86/kernel/setup.c   |  18 +-
>  .../firmware/efi/libstub/efi-stub-helper.c|  29 +++
>  drivers/firmware/efi/libstub/efi-stub.c   |   9 +
>  drivers/firmware/efi/libstub/efistub.h|   1 +
>  drivers/firmware/efi/libstub/x86-stub.c   |  13 +-
>  drivers/of/fdt.c  |  44 ++--
>  include/linux/cmdline.h   | 103 
>  init/Kconfig  |  78 ++
>  lib/Kconfig   |   4 +
>  lib/Makefile  |   3 +
>  lib/generic_cmdline.S |  53 
>  lib/test_cmdline1.c   | 139 ++
>  scripts/Makefile  |   2 +-
>  .../{insert-sys-cert.c => insert-symbol.c}| 243 --
>  43 files changed, 716 insertions(+), 394 deletions(-)
>  create mode 100644 include/linux/cmdline.h
>  create mode 100644 lib/generic_cmdline.S
>  create mode 100644 lib/test_cmdline1.c
>  rename scripts/{insert-sys-cert.c => insert-symbol.c} (72%)
> 

For an arm64 platform (after rebasing):

Tested-by: Sean Anderson

Re: [External] Re: [PATCH 1/3] PCI/AER: Use pci_aer_clear_uncorrect_error_status() to clear uncorrectable error status

2022-09-22 Thread Bjorn Helgaas

On Mon, Sep 12, 2022 at 01:09:05AM +0800, Zhuo Chen wrote:
> On 9/12/22 12:22 AM, Serge Semin wrote:
> > On Fri, Sep 02, 2022 at 02:16:32AM +0800, Zhuo Chen wrote:
> > > Status bits for ERR_NONFATAL errors only are cleared in
> > > pci_aer_clear_nonfatal_status(), but we want clear uncorrectable
> > > error status in ntb_hw_idt.c and lpfc_attr.c. So we add
> > > pci_aer_clear_uncorrect_error_status() and change to use it.
> > 
> > What about the next drivers
> > 
> > drivers/scsi/lpfc/lpfc_attr.c
> > drivers/crypto/hisilicon/qm.c
> > drivers/net/ethernet/intel/ice/ice_main.c
> > 
> > which call the pci_aer_clear_nonfatal_status() method too?
> 
> ‘pci_aer_clear_nonfatal_status()’ in
> drivers/net/ethernet/intel/ice/ice_main.c has already been removed and
> merged in kernel in: 
> https://github.com/torvalds/linux/commit/ca415ea1f03abf34fc8e4cc5fc30a00189b4e776

It's better if you can use kernel.org URLs that don't depend on
third parties like github, e.g.,

  https://git.kernel.org/linus/ca415ea1f03a

> ‘pci_aer_clear_nonfatal_status()’ in drivers/crypto/hisilicon/qm.c will be
> removed in the next kernel:
> https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/drivers/crypto/hisilicon/qm.c?id=00278564a60e11df8bcca0ececd8b2f55434e406

This is a problem because 00278564a60e ("crypto: hisilicon - Remove
pci_aer_clear_nonfatal_status() call") is in Herbert's cryptodev tree,
and if I apply this series to the PCI tree and Linus merges it before
Herbert's cryptodev changes, it will break the build.

I think we need to split this patch up like this:

  - Add pci_aer_clear_uncorrect_error_status() to PCI core
  - Convert dpc to use pci_aer_clear_uncorrect_error_status()
(I might end up squashing with above)
  - Convert lpfc to use pci_aer_clear_uncorrect_error_status()
  - Convert ntb_hw_idt to use pci_aer_clear_uncorrect_error_status()
  - Unexport pci_aer_clear_nonfatal_status()

Then I can apply all but the last patch safely.  If the crypto changes
are merged first, we can add the last one; otherwise we can do it for
the next cycle.

> Uncorrectable error status register was intended to be cleared in
> drivers/scsi/lpfc/lpfc_attr.c. But originally function was changed in 
> https://github.com/torvalds/linux/commit/e7b0b847de6db161e3917732276e425bc92a2feb
> and
> https://github.com/torvalds/linux/commit/894020fdd88c1e9a74c60b67c0f19f1c7696ba2f

This will be a behavior change for lpfc and ntb_hw_idt.  It looks like
it changes the behavior back to what it was before e7b0b847de6d
("PCI/AER: Clear only ERR_NONFATAL bits during non-fatal recovery"),
so it might be OK, but splitting these out to their own patches will
make the change more obvious and we can make sure that's what we want.

Bjorn

> > > Use pci_aer_clear_nonfatal_status() in dpc_process_error(), which has
> > > no functional changes.
> > > 
> > > Since pci_aer_clear_nonfatal_status() is used only internally, move
> > > its declaration to the PCI internal header file. Also, no one cares
> > > about return value of pci_aer_clear_nonfatal_status(), so make it void.
> > > 
> > > Signed-off-by: Zhuo Chen 
> > > ---
> > >   drivers/ntb/hw/idt/ntb_hw_idt.c |  4 ++--
> > >   drivers/pci/pci.h   |  2 ++
> > >   drivers/pci/pcie/aer.c  | 23 ++-
> > >   drivers/pci/pcie/dpc.c  |  3 +--
> > >   drivers/scsi/lpfc/lpfc_attr.c   |  4 ++--
> > >   include/linux/aer.h |  4 ++--
> > >   6 files changed, 27 insertions(+), 13 deletions(-)
> > > 
> > > diff --git a/drivers/ntb/hw/idt/ntb_hw_idt.c 
> > > b/drivers/ntb/hw/idt/ntb_hw_idt.c
> > > index 733557231ed0..de1dbbc5b9de 100644
> > > --- a/drivers/ntb/hw/idt/ntb_hw_idt.c
> > > +++ b/drivers/ntb/hw/idt/ntb_hw_idt.c
> > > @@ -2657,8 +2657,8 @@ static int idt_init_pci(struct idt_ntb_dev *ndev)
> > >   ret = pci_enable_pcie_error_reporting(pdev);
> > >   if (ret != 0)
> > >   dev_warn(>dev, "PCIe AER capability disabled\n");
> > 
> > > - else /* Cleanup nonfatal error status before getting to init */
> > > - pci_aer_clear_nonfatal_status(pdev);
> > > + else /* Cleanup uncorrectable error status before getting to init */
> > > + pci_aer_clear_uncorrect_error_status(pdev);
> > 
> >  From the IDT NTB PCIe initialization procedure point of view both of
> > these methods are equivalent. So for the IDT NTB part:
> > 
> IDT NTB part is the same as drivers/scsi/lpfc/lpfc_attr.c. The original
> function is clear uncorrectable error status register including fatal and
> non-fatal error status bits.
> 
> > Acked-by: Serge Semin 
> > 
> > -Sergey
> > 
> > >   /* First enable the PCI device */
> > >   ret = pcim_enable_device(pdev);
> > > diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
> > > index e10cdec6c56e..574176f43025 100644
> > > --- a/drivers/pci/pci.h
> > > +++ b/drivers/pci/pci.h
> > > @@ -686,6 +686,7 @@ void pci_aer_init(struct pci_dev *dev);
> > >   void pci_aer_exit(struct

[PATCH 0/2] powerpc/pseries: restrict error injection and DT changes when locked down

2022-09-22 Thread Nathan Lynch

Add two new lockdown reasons for use in powerpc's pseries platform
code.

The pseries platform allows hardware-level error injection via certain
calls to the RTAS (Run Time Abstraction Services) firmware. ACPI-based
error injection is already restricted in lockdown; this facility
should be restricted for the same reasons.

pseries also allows nearly arbitrary device tree changes via
/proc/powerpc/ofdt. Just as overriding ACPI tables is not allowed
while locked down, so should this facility be restricted.

Nathan Lynch (2):
  powerpc/pseries: block untrusted device tree changes when locked down
  powerpc/rtas: block error injection when locked down

 arch/powerpc/kernel/rtas.c| 25 ++-
 arch/powerpc/platforms/pseries/reconfig.c |  5 +
 include/linux/security.h  |  2 ++
 security/security.c   |  2 ++
 4 files changed, 33 insertions(+), 1 deletion(-)

-- 
2.37.3

[PATCH 1/2] powerpc/pseries: block untrusted device tree changes when locked down

2022-09-22 Thread Nathan Lynch

The /proc/powerpc/ofdt interface allows the root user to freely alter
the in-kernel device tree, enabling arbitrary physical address writes
via drivers that could bind to malicious device nodes, thus making it
possible to disable lockdown.

Historically this interface has been used on the pseries platform to
facilitate the runtime addition and removal of processor, memory, and
device resources (aka Dynamic Logical Partitioning or DLPAR). Years
ago, the processor and memory use cases were migrated to designs that
happen to be lockdown-friendly: device tree updates are communicated
directly to the kernel from firmware without passing through untrusted
user space. I/O device DLPAR via the "drmgr" command in powerpc-utils
remains the sole legitimate user of /proc/powerpc/ofdt, but it is
already broken in lockdown since it uses /dev/mem to allocate argument
buffers for the rtas syscall. So only illegitimate uses of the
interface should see a behavior change when running on a locked down
kernel.

Signed-off-by: Nathan Lynch 
---
 arch/powerpc/platforms/pseries/reconfig.c | 5 +
 include/linux/security.h  | 1 +
 security/security.c   | 1 +
 3 files changed, 7 insertions(+)

diff --git a/arch/powerpc/platforms/pseries/reconfig.c 
b/arch/powerpc/platforms/pseries/reconfig.c
index cad7a0c93117..599bd2c78514 100644
--- a/arch/powerpc/platforms/pseries/reconfig.c
+++ b/arch/powerpc/platforms/pseries/reconfig.c
@@ -10,6 +10,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -361,6 +362,10 @@ static ssize_t ofdt_write(struct file *file, const char 
__user *buf, size_t coun
char *kbuf;
char *tmp;
 
+   rv = security_locked_down(LOCKDOWN_DEVICE_TREE);
+   if (rv)
+   return rv;
+
kbuf = memdup_user_nul(buf, count);
if (IS_ERR(kbuf))
return PTR_ERR(kbuf);
diff --git a/include/linux/security.h b/include/linux/security.h
index 7bd0c490703d..1ca8dbacd3cc 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -122,6 +122,7 @@ enum lockdown_reason {
LOCKDOWN_XMON_WR,
LOCKDOWN_BPF_WRITE_USER,
LOCKDOWN_DBG_WRITE_KERNEL,
+   LOCKDOWN_DEVICE_TREE,
LOCKDOWN_INTEGRITY_MAX,
LOCKDOWN_KCORE,
LOCKDOWN_KPROBES,
diff --git a/security/security.c b/security/security.c
index 4b95de24bc8d..2863fc31eec6 100644
--- a/security/security.c
+++ b/security/security.c
@@ -60,6 +60,7 @@ const char *const 
lockdown_reasons[LOCKDOWN_CONFIDENTIALITY_MAX+1] = {
[LOCKDOWN_XMON_WR] = "xmon write access",
[LOCKDOWN_BPF_WRITE_USER] = "use of bpf to write user RAM",
[LOCKDOWN_DBG_WRITE_KERNEL] = "use of kgdb/kdb to write kernel RAM",
+   [LOCKDOWN_DEVICE_TREE] = "modifying device tree contents",
[LOCKDOWN_INTEGRITY_MAX] = "integrity",
[LOCKDOWN_KCORE] = "/proc/kcore access",
[LOCKDOWN_KPROBES] = "use of kprobes",
-- 
2.37.3

[PATCH 2/2] powerpc/rtas: block error injection when locked down

2022-09-22 Thread Nathan Lynch

The error injection facility on pseries VMs allows corruption of
arbitrary guest memory, potentially enabling a sufficiently privileged
user to disable lockdown or perform other modifications of the running
kernel via the rtas syscall.

Block the PAPR error injection facility from being opened or called
when locked down.

Signed-off-by: Nathan Lynch 
---
 arch/powerpc/kernel/rtas.c | 25 -
 include/linux/security.h   |  1 +
 security/security.c|  1 +
 3 files changed, 26 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
index 693133972294..c2540d393f1c 100644
--- a/arch/powerpc/kernel/rtas.c
+++ b/arch/powerpc/kernel/rtas.c
@@ -23,6 +23,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -464,6 +465,9 @@ void rtas_call_unlocked(struct rtas_args *args, int token, 
int nargs, int nret,
va_end(list);
 }
 
+static int ibm_open_errinjct_token;
+static int ibm_errinjct_token;
+
 int rtas_call(int token, int nargs, int nret, int *outputs, ...)
 {
va_list list;
@@ -476,6 +480,16 @@ int rtas_call(int token, int nargs, int nret, int 
*outputs, ...)
if (!rtas.entry || token == RTAS_UNKNOWN_SERVICE)
return -1;
 
+   if (token == ibm_open_errinjct_token || token == ibm_errinjct_token) {
+   /*
+* It would be nicer to not discard the error value
+* from security_locked_down(), but callers expect an
+* RTAS status, not an errno.
+*/
+   if (security_locked_down(LOCKDOWN_RTAS_ERROR_INJECTION))
+   return -1;
+   }
+
if ((mfmsr() & (MSR_IR|MSR_DR)) != (MSR_IR|MSR_DR)) {
WARN_ON_ONCE(1);
return -1;
@@ -1227,6 +1241,14 @@ SYSCALL_DEFINE1(rtas, struct rtas_args __user *, uargs)
if (block_rtas_call(token, nargs, ))
return -EINVAL;
 
+   if (token == ibm_open_errinjct_token || token == ibm_errinjct_token) {
+   int err;
+
+   err = security_locked_down(LOCKDOWN_RTAS_ERROR_INJECTION);
+   if (err)
+   return err;
+   }
+
/* Need to handle ibm,suspend_me call specially */
if (token == rtas_token("ibm,suspend-me")) {
 
@@ -1325,7 +1347,8 @@ void __init rtas_initialize(void)
 #ifdef CONFIG_RTAS_ERROR_LOGGING
rtas_last_error_token = rtas_token("rtas-last-error");
 #endif
-
+   ibm_open_errinjct_token = rtas_token("ibm,open-errinjct");
+   ibm_errinjct_token = rtas_token("ibm,errinjct");
rtas_syscall_filter_init();
 }
 
diff --git a/include/linux/security.h b/include/linux/security.h
index 1ca8dbacd3cc..b5d5138ae66a 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -123,6 +123,7 @@ enum lockdown_reason {
LOCKDOWN_BPF_WRITE_USER,
LOCKDOWN_DBG_WRITE_KERNEL,
LOCKDOWN_DEVICE_TREE,
+   LOCKDOWN_RTAS_ERROR_INJECTION,
LOCKDOWN_INTEGRITY_MAX,
LOCKDOWN_KCORE,
LOCKDOWN_KPROBES,
diff --git a/security/security.c b/security/security.c
index 2863fc31eec6..6518b239ada2 100644
--- a/security/security.c
+++ b/security/security.c
@@ -61,6 +61,7 @@ const char *const 
lockdown_reasons[LOCKDOWN_CONFIDENTIALITY_MAX+1] = {
[LOCKDOWN_BPF_WRITE_USER] = "use of bpf to write user RAM",
[LOCKDOWN_DBG_WRITE_KERNEL] = "use of kgdb/kdb to write kernel RAM",
[LOCKDOWN_DEVICE_TREE] = "modifying device tree contents",
+   [LOCKDOWN_RTAS_ERROR_INJECTION] = "RTAS error injection",
[LOCKDOWN_INTEGRITY_MAX] = "integrity",
[LOCKDOWN_KCORE] = "/proc/kcore access",
[LOCKDOWN_KPROBES] = "use of kprobes",
-- 
2.37.3

Re: [PATCH 1/2] tools/perf/tests: Fix string substitutions in build id test

2022-09-22 Thread Arnaldo Carvalho de Melo

Em Wed, Sep 21, 2022 at 10:38:38PM +0530, Athira Rajeev escreveu:
> The perf test named “build id cache operations” skips with below
> error on some distros:

I wonder if we shouldn't instead state that bash is needed?

⬢[acme@toolbox perf-urgent]$ head -1 tools/perf/tests/shell/*.sh | grep ^#
#!/bin/sh
#!/bin/bash
#!/bin/sh
#!/bin/sh
#!/bin/sh
#!/bin/sh
#!/bin/sh
#!/bin/sh
#!/bin/sh
#!/bin/sh
#!/bin/bash
#!/bin/sh
#!/bin/sh
#!/bin/sh
#!/bin/bash
#!/bin/sh
#!/bin/bash
#!/bin/sh
#!/bin/sh
#!/bin/sh
#!/bin/sh
#!/bin/sh
#!/bin/sh
#!/bin/sh
#!/bin/sh
#!/bin/sh
⬢[acme@toolbox perf-urgent]$

Opinions?

- Arnaldo
 
> <<>>
>  78: build id cache operations   :
> test child forked, pid 01
> WARNING: wine not found. PE binaries will not be run.
> test binaries: /tmp/perf.ex.SHA1.PKz /tmp/perf.ex.MD5.Gt3 
> ./tests/shell/../pe-file.exe
> DEBUGINFOD_URLS=
> Adding 4abd406f041feb4f10ecde3fc30fd0639e1a91cb /tmp/perf.ex.SHA1.PKz: Ok
> build id: 4abd406f041feb4f10ecde3fc30fd0639e1a91cb
> ./tests/shell/buildid.sh: 69: ./tests/shell/buildid.sh: Bad substitution
> test child finished with -2
> build id cache operations: Skip
> <<>>
> 
> The test script "tests/shell/buildid.sh" uses some of the
> string substitution ways which are supported in bash, but not in
> "sh" or other shells. Above error on line number 69 that reports
> "Bad substitution" is:
> 
> <<>>
> link=${build_id_dir}/.build-id/${id:0:2}/${id:2}
> <<>>
> 
> Here the way of getting first two characters from id ie,
> ${id:0:2} and similarly expressions like ${id:2} is not
> recognised in "sh". So the line errors and instead of
> hitting failure, the test gets skipped as shown in logs.
> So the syntax issue causes test not to be executed in
> such cases. Similarly usage : "${@: -1}" [ to pick last
> argument passed to a function] in “test_record” doesn’t
> work in all distros.
> 
> Fix this by using alternative way with "cut" command
> to pick "n" characters from the string. Also fix the usage
> of “${@: -1}” to work in all cases.
> 
> Another usage in “test_record” is:
> <<>>
> ${perf} record --buildid-all -o ${data} $@ &> ${log}
> <<>>
> 
> This causes the perf record to start in background and
> Results in the data file not being created by the time
> "check" function is invoked. Below log shows perf record
> result getting displayed after the call to "check" function.
> 
> <<>>
> running: perf record /tmp/perf.ex.SHA1.EAU
> build id: 4abd406f041feb4f10ecde3fc30fd0639e1a91cb
> link: /tmp/perf.debug.mLT/.build-id/4a/bd406f041feb4f10ecde3fc30fd0639e1a91cb
> failed: link 
> /tmp/perf.debug.mLT/.build-id/4a/bd406f041feb4f10ecde3fc30fd0639e1a91cb does 
> not exist
> test child finished with -1
> build id cache operations: FAILED!
> root@machine:~/athira/linux/tools/perf# Couldn't synthesize bpf events.
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.010 MB /tmp/perf.data.bFF ]
> <<>>
> 
> Fix this by redirecting output instead of using “&” which
> starts the command in background.
> 
> Signed-off-by: Athira Rajeev 
> ---
>  tools/perf/tests/shell/buildid.sh | 16 +---
>  1 file changed, 9 insertions(+), 7 deletions(-)
> 
> diff --git a/tools/perf/tests/shell/buildid.sh 
> b/tools/perf/tests/shell/buildid.sh
> index f05670d1e39e..3512c4423d48 100755
> --- a/tools/perf/tests/shell/buildid.sh
> +++ b/tools/perf/tests/shell/buildid.sh
> @@ -66,7 +66,7 @@ check()
>   esac
>   echo "build id: ${id}"
>  
> - link=${build_id_dir}/.build-id/${id:0:2}/${id:2}
> + link=${build_id_dir}/.build-id/$(echo ${id}|cut -c 1-2)/$(echo 
> ${id}|cut -c 3-)
>   echo "link: ${link}"
>  
>   if [ ! -h $link ]; then
> @@ -74,7 +74,7 @@ check()
>   exit 1
>   fi
>  
> - file=${build_id_dir}/.build-id/${id:0:2}/`readlink ${link}`/elf
> + file=${build_id_dir}/.build-id/$(echo ${id}|cut -c 1-2)/`readlink 
> ${link}`/elf
>   echo "file: ${file}"
>  
>   if [ ! -x $file ]; then
> @@ -117,20 +117,22 @@ test_record()
>  {
>   data=$(mktemp /tmp/perf.data.XXX)
>   build_id_dir=$(mktemp -d /tmp/perf.debug.XXX)
> - log=$(mktemp /tmp/perf.log.XXX)
> + log_out=$(mktemp /tmp/perf.log.out.XXX)
> + log_err=$(mktemp /tmp/perf.log.err.XXX)
>   perf="perf --buildid-dir ${build_id_dir}"
> + eval last=\${$#}
>  
>   echo "running: perf record $@"
> - ${perf} record --buildid-all -o ${data} $@ &> ${log}
> + ${perf} record --buildid-all -o ${data} $@ 1>${log_out} 2>${log_err}
>   if [ $? -ne 0 ]; then
>   echo "failed: record $@"
> - echo "see log: ${log}"
> + echo "see log: ${log_err}"
>   exit 1
>   fi
>  
> - check ${@: -1}
> + check $last
>  
> - rm -f ${log}
> + rm -f ${log_out} ${log_err}
>   rm -rf ${build_id_dir}
>   rm -rf ${data}
>  }
> -- 
> 2.17.1

-- 

- Arnaldo

[PATCH v5 27/30] RFC: KVM: powerpc: Move processor compatibility check to hardware setup

2022-09-22 Thread isaku . yamahata

From: Isaku Yamahata 

Move processor compatibility check from kvm_arch_processor_compat() into
kvm_arch_hardware_setup().  The check does model name comparison with a
global variable, cur_cpu_spec.  There is no point to check it at run time
on all processors.

kvmppc_core_check_processor_compat() checks the global variable.  There are
five implementation for it as follows.

  arch/powerpc/include/asm/cputable.h: extern struct cpu_spec *cur_cpu_spec;
  arch/powerpc/kvm/book3s.c: return 0
  arch/powerpc/kvm/e500.c: strcmp(cur_cpu_spec->cpu_name, "e500v2")
  arch/powerpc/kvm/e500mc.c: strcmp(cur_cpu_spec->cpu_name, "e500mc")
 strcmp(cur_cpu_spec->cpu_name, "e5500")
 strcmp(cur_cpu_spec->cpu_name, "e6500")

Suggested-by: Sean Christopherson 
Signed-off-by: Isaku Yamahata 
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Fabiano Rosas 
---
 arch/powerpc/kvm/powerpc.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 7b56d6ccfdfb..31dc4f231e9d 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -444,12 +444,12 @@ int kvm_arch_hardware_enable(void)
 
 int kvm_arch_hardware_setup(void *opaque)
 {
-   return 0;
+   return kvmppc_core_check_processor_compat();
 }
 
 int kvm_arch_check_processor_compat(void)
 {
-   return kvmppc_core_check_processor_compat();
+   return 0;
 }
 
 int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
-- 
2.25.1

Re: [PATCH v6 2/8] dt-bindings: phy: Add Lynx 10G phy binding

2022-09-22 Thread Krzysztof Kozlowski

On 20/09/2022 22:23, Sean Anderson wrote:
> This adds a binding for the SerDes module found on QorIQ processors.
> Each phy is a subnode of the top-level device, possibly supporting
> multiple lanes and protocols. This "thick" #phy-cells is used due to
> allow for better organization of parameters. Note that the particular
> parameters necessary to select a protocol-controller/lane combination
> vary across different SoCs, and even within different SerDes on the same
> SoC.
> 
> The driver is designed to be able to completely reconfigure lanes at
> runtime. Generally, the phy consumer can select the appropriate
> protocol using set_mode.
> 
> There are two PLLs, each of which can be used as the master clock for
> each lane. Each PLL has its own reference. For the moment they are
> required, because it simplifies the driver implementation. Absent
> reference clocks can be modeled by a fixed-clock with a rate of 0.
> 
> Signed-off-by: Sean Anderson 
> Reviewed-by: Rob Herring 
> ---
> 
> Changes in v6:
> - fsl,type -> phy-type
> 
> Changes in v4:
> - Use subnodes to describe lane configuration, instead of describing
>   PCCRs. This is the same style used by phy-cadence-sierra et al.
> 
> Changes in v3:
> - Manually expand yaml references
> - Add mode configuration to device tree
> 
> Changes in v2:
> - Rename to fsl,lynx-10g.yaml
> - Refer to the device in the documentation, rather than the binding
> - Move compatible first
> - Document phy cells in the description
> - Allow a value of 1 for phy-cells. This allows for compatibility with
>   the similar (but according to Ioana Ciornei different enough) lynx-28g
>   binding.
> - Remove minItems
> - Use list for clock-names
> - Fix example binding having too many cells in regs
> - Add #clock-cells. This will allow using assigned-clocks* to configure
>   the PLLs.
> - Document the structure of the compatible strings
> 
>  .../devicetree/bindings/phy/fsl,lynx-10g.yaml | 236 ++
>  1 file changed, 236 insertions(+)
>  create mode 100644 Documentation/devicetree/bindings/phy/fsl,lynx-10g.yaml
> 
> diff --git a/Documentation/devicetree/bindings/phy/fsl,lynx-10g.yaml 
> b/Documentation/devicetree/bindings/phy/fsl,lynx-10g.yaml
> new file mode 100644
> index ..ce9afdbf33f4
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/phy/fsl,lynx-10g.yaml
> @@ -0,0 +1,236 @@
> +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
> +%YAML 1.2
> +---
> +$id: http://devicetree.org/schemas/phy/fsl,lynx-10g.yaml#
> +$schema: http://devicetree.org/meta-schemas/core.yaml#
> +
> +title: NXP Lynx 10G SerDes
> +
> +maintainers:
> +  - Sean Anderson 
> +
> +description: |
> +  These Lynx "SerDes" devices are found in NXP's QorIQ line of processors. 
> The
> +  SerDes provides up to eight lanes. Each lane may be configured 
> individually,
> +  or may be combined with adjacent lanes for a multi-lane protocol. The 
> SerDes
> +  supports a variety of protocols, including up to 10G Ethernet, PCIe, SATA, 
> and
> +  others. The specific protocols supported for each lane depend on the
> +  particular SoC.
> +
> +properties:
> +  compatible:
> +items:
> +  - enum:
> +  - fsl,ls1046a-serdes
> +  - fsl,ls1088a-serdes
> +  - const: fsl,lynx-10g
> +
> +  "#address-cells":

If there is going to be resend, use only one type of quotes: ' or ".

FWIW (Rob's reviewed it):

Acked-by: Krzysztof Kozlowski 

Best regards,
Krzysztof

Re: [PATCH v6 2/8] dt-bindings: phy: Add Lynx 10G phy binding

2022-09-22 Thread Krzysztof Kozlowski

On 22/09/2022 17:23, Sean Anderson wrote:
>>
>> This check can fail if there are any dependencies. The base for a patch
>> series is generally the most recent rc1.
>>
>> If you already ran 'make dt_binding_check' and didn't see the above
>> error(s), then make sure 'yamllint' is installed and dt-schema is up to
>> date:
>>
>> pip3 install dtschema --upgrade
>>
>> Please check and re-submit.
>>
> 
> I believe this is due to the previous patch not being applied, same as last 
> time.

Yes, please ignore bot's report.

Best regards,
Krzysztof

Re: [PATCH v6 2/8] dt-bindings: phy: Add Lynx 10G phy binding

2022-09-22 Thread Sean Anderson




On 9/21/22 2:57 AM, Krzysztof Kozlowski wrote:
> On Tue, 20 Sep 2022 16:23:50 -0400, Sean Anderson wrote:
>> This adds a binding for the SerDes module found on QorIQ processors.
>> Each phy is a subnode of the top-level device, possibly supporting
>> multiple lanes and protocols. This "thick" #phy-cells is used due to
>> allow for better organization of parameters. Note that the particular
>> parameters necessary to select a protocol-controller/lane combination
>> vary across different SoCs, and even within different SerDes on the same
>> SoC.
>> 
>> The driver is designed to be able to completely reconfigure lanes at
>> runtime. Generally, the phy consumer can select the appropriate
>> protocol using set_mode.
>> 
>> There are two PLLs, each of which can be used as the master clock for
>> each lane. Each PLL has its own reference. For the moment they are
>> required, because it simplifies the driver implementation. Absent
>> reference clocks can be modeled by a fixed-clock with a rate of 0.
>> 
>> Signed-off-by: Sean Anderson 
>> Reviewed-by: Rob Herring 
>> ---
>> 
>> Changes in v6:
>> - fsl,type -> phy-type
>> 
>> Changes in v4:
>> - Use subnodes to describe lane configuration, instead of describing
>>   PCCRs. This is the same style used by phy-cadence-sierra et al.
>> 
>> Changes in v3:
>> - Manually expand yaml references
>> - Add mode configuration to device tree
>> 
>> Changes in v2:
>> - Rename to fsl,lynx-10g.yaml
>> - Refer to the device in the documentation, rather than the binding
>> - Move compatible first
>> - Document phy cells in the description
>> - Allow a value of 1 for phy-cells. This allows for compatibility with
>>   the similar (but according to Ioana Ciornei different enough) lynx-28g
>>   binding.
>> - Remove minItems
>> - Use list for clock-names
>> - Fix example binding having too many cells in regs
>> - Add #clock-cells. This will allow using assigned-clocks* to configure
>>   the PLLs.
>> - Document the structure of the compatible strings
>> 
>>  .../devicetree/bindings/phy/fsl,lynx-10g.yaml | 236 ++
>>  1 file changed, 236 insertions(+)
>>  create mode 100644 Documentation/devicetree/bindings/phy/fsl,lynx-10g.yaml
>> 
> 
> My bot found errors running 'make DT_CHECKER_FLAGS=-m dt_binding_check'
> on your patch (DT_CHECKER_FLAGS is new in v5.13):
> 
> yamllint warnings/errors:
> 
> dtschema/dtc warnings/errors:
> Error: 
> Documentation/devicetree/bindings/phy/fsl,lynx-10g.example.dts:51.27-28 
> syntax error
> FATAL ERROR: Unable to parse input tree
> make[1]: *** [scripts/Makefile.lib:384: 
> Documentation/devicetree/bindings/phy/fsl,lynx-10g.example.dtb] Error 1
> make[1]: *** Waiting for unfinished jobs
> make: *** [Makefile:1420: dt_binding_check] Error 2
> 
> doc reference errors (make refcheckdocs):
> 
> See https://patchwork.ozlabs.org/patch/
> 
> This check can fail if there are any dependencies. The base for a patch
> series is generally the most recent rc1.
> 
> If you already ran 'make dt_binding_check' and didn't see the above
> error(s), then make sure 'yamllint' is installed and dt-schema is up to
> date:
> 
> pip3 install dtschema --upgrade
> 
> Please check and re-submit.
> 

I believe this is due to the previous patch not being applied, same as last 
time.

--Sean

Re: [PATCH 16/17] powerpc/qspinlock: allow indefinite spinning on a preempted owner

2022-09-22 Thread Laurent Dufour

On 28/07/2022 08:31:19, Nicholas Piggin wrote:
> Provide an option that holds off queueing indefinitely while the lock
> owner is preempted. This could reduce queueing latencies for very
> overcommitted vcpu situations.
> 
> This is disabled by default.

Hi Nick,

I should have missed something here.

If this option is turned on, CPU trying to lock when there is a preempted
owner will spin checking the lock->val and yielding the lock owner CPU.
Am I right?

If yes, why not being queued and spin checking its own value, yielding
against the lock owner CPU? This will generate less cache bouncing, which
is what the queued spinlock is trying to address, isn't it?

Thanks,
Laurent.

> ---
>  arch/powerpc/lib/qspinlock.c | 91 +++-
>  1 file changed, 79 insertions(+), 12 deletions(-)
> 
> diff --git a/arch/powerpc/lib/qspinlock.c b/arch/powerpc/lib/qspinlock.c
> index 24f68bd71e2b..5cfd69931e31 100644
> --- a/arch/powerpc/lib/qspinlock.c
> +++ b/arch/powerpc/lib/qspinlock.c
> @@ -35,6 +35,7 @@ static int HEAD_SPINS __read_mostly = (1<<8);
>  
>  static bool pv_yield_owner __read_mostly = true;
>  static bool pv_yield_allow_steal __read_mostly = false;
> +static bool pv_spin_on_preempted_owner __read_mostly = false;
>  static bool pv_yield_prev __read_mostly = true;
>  static bool pv_yield_propagate_owner __read_mostly = true;
>  static bool pv_prod_head __read_mostly = false;
> @@ -220,13 +221,15 @@ static struct qnode *get_tail_qnode(struct qspinlock 
> *lock, u32 val)
>   BUG();
>  }
>  
> -static __always_inline void __yield_to_locked_owner(struct qspinlock *lock, 
> u32 val, bool paravirt, bool clear_mustq)
> +static __always_inline void __yield_to_locked_owner(struct qspinlock *lock, 
> u32 val, bool paravirt, bool clear_mustq, bool *preempted)
>  {
>   int owner;
>   u32 yield_count;
>  
>   BUG_ON(!(val & _Q_LOCKED_VAL));
>  
> + *preempted = false;
> +
>   if (!paravirt)
>   goto relax;
>  
> @@ -241,6 +244,8 @@ static __always_inline void 
> __yield_to_locked_owner(struct qspinlock *lock, u32
>  
>   spin_end();
>  
> + *preempted = true;
> +
>   /*
>* Read the lock word after sampling the yield count. On the other side
>* there may a wmb because the yield count update is done by the
> @@ -265,14 +270,14 @@ static __always_inline void 
> __yield_to_locked_owner(struct qspinlock *lock, u32
>   spin_cpu_relax();
>  }
>  
> -static __always_inline void yield_to_locked_owner(struct qspinlock *lock, 
> u32 val, bool paravirt)
> +static __always_inline void yield_to_locked_owner(struct qspinlock *lock, 
> u32 val, bool paravirt, bool *preempted)
>  {
> - __yield_to_locked_owner(lock, val, paravirt, false);
> + __yield_to_locked_owner(lock, val, paravirt, false, preempted);
>  }
>  
> -static __always_inline void yield_head_to_locked_owner(struct qspinlock 
> *lock, u32 val, bool paravirt, bool clear_mustq)
> +static __always_inline void yield_head_to_locked_owner(struct qspinlock 
> *lock, u32 val, bool paravirt, bool clear_mustq, bool *preempted)
>  {
> - __yield_to_locked_owner(lock, val, paravirt, clear_mustq);
> + __yield_to_locked_owner(lock, val, paravirt, clear_mustq, preempted);
>  }
>  
>  static __always_inline void propagate_yield_cpu(struct qnode *node, u32 val, 
> int *set_yield_cpu, bool paravirt)
> @@ -364,12 +369,33 @@ static __always_inline void yield_to_prev(struct 
> qspinlock *lock, struct qnode *
>  
>  static __always_inline bool try_to_steal_lock(struct qspinlock *lock, bool 
> paravirt)
>  {
> - int iters;
> + int iters = 0;
> +
> + if (!STEAL_SPINS) {
> + if (paravirt && pv_spin_on_preempted_owner) {
> + spin_begin();
> + for (;;) {
> + u32 val = READ_ONCE(lock->val);
> + bool preempted;
> +
> + if (val & _Q_MUST_Q_VAL)
> + break;
> + if (!(val & _Q_LOCKED_VAL))
> + break;
> + if (!vcpu_is_preempted(get_owner_cpu(val)))
> + break;
> + yield_to_locked_owner(lock, val, paravirt, 
> );
> + }
> + spin_end();
> + }
> + return false;
> + }
>  
>   /* Attempt to steal the lock */
>   spin_begin();
>   for (;;) {
>   u32 val = READ_ONCE(lock->val);
> + bool preempted;
>  
>   if (val & _Q_MUST_Q_VAL)
>   break;
> @@ -382,9 +408,22 @@ static __always_inline bool try_to_steal_lock(struct 
> qspinlock *lock, bool parav
>   continue;
>   }
>  
> - yield_to_locked_owner(lock, val, paravirt);
> -
> - iters++;
> + yield_to_locked_owner(lock, val,

Re: [PATCH v4 12/12] KVM: remove KVM_REQ_UNHALT

2022-09-22 Thread Marc Zyngier

On Wed, 21 Sep 2022 01:32:01 +0100,
Sean Christopherson  wrote:
> 
> From: Paolo Bonzini 
> 
> KVM_REQ_UNHALT is now unnecessary because it is replaced by the return
> value of kvm_vcpu_block/kvm_vcpu_halt.  Remove it.
> 
> No functional change intended.
> 
> Signed-off-by: Paolo Bonzini 
> Signed-off-by: Sean Christopherson 

Acked-by: Marc Zyngier 

M.

-- 
Without deviation from the norm, progress is not possible.

Re: [PATCH v1 1/3] coding-style.rst: document BUG() and WARN() rules ("do not crash the kernel")

2022-09-22 Thread David Hildenbrand

  
+22) Do not crash the kernel

+---
+
+In general, it is not the kernel developer's decision to crash the kernel.
+
+Avoid panic()
+=

This looks to me like a subsection-level title.  The adornment symbol
needs to be:

*


+
+panic() should be used with care and primarily only during system boot.
+panic() is, for example, acceptable when running out of memory during boot and
+not being able to continue.
+
+Use WARN() rather than BUG()
+

Ditto.


+
+Do not add new code that uses any of the BUG() variants, such as BUG(),
+BUG_ON(), or VM_BUG_ON(). Instead, use a WARN*() variant, preferably
+WARN_ON_ONCE(), and possibly with recovery code. Recovery code is not
+required if there is no reasonable way to at least partially recover.
+
+"I'm too lazy to do error handling" is not an excuse for using BUG(). Major
+internal corruptions with no way of continuing may still use BUG(), but need
+good justification.
+
+Use WARN_ON_ONCE() rather than WARN() or WARN_ON()
+**

These wrong adornment symbol confuse ReST parser of Sphinx and results in
the build error from "make htmldocs" at this title (long message folded):



Thanks,

the following on top should do the trick:


diff --git a/Documentation/process/coding-style.rst 
b/Documentation/process/coding-style.rst
index e05899cbfd49..9efde65ac2f3 100644
--- a/Documentation/process/coding-style.rst
+++ b/Documentation/process/coding-style.rst
@@ -1192,14 +1192,14 @@ expression used.  For instance:
 In general, it is not the kernel developer's decision to crash the kernel.
 
 Avoid panic()

-=
+*
 
 panic() should be used with care and primarily only during system boot.

 panic() is, for example, acceptable when running out of memory during boot and
 not being able to continue.
 
 Use WARN() rather than BUG()

-
+
 
 Do not add new code that uses any of the BUG() variants, such as BUG(),

 BUG_ON(), or VM_BUG_ON(). Instead, use a WARN*() variant, preferably


--
Thanks,

David / dhildenb

Re: [PATCH v1 1/3] coding-style.rst: document BUG() and WARN() rules ("do not crash the kernel")

2022-09-22 Thread David Hildenbrand


On 21.09.22 06:40, Kalle Valo wrote:

David Hildenbrand  writes:


Linus notes [1] that the introduction of new code that uses VM_BUG_ON()
is just as bad as BUG_ON(), because it will crash the kernel on
distributions that enable CONFIG_DEBUG_VM (like Fedora):

 VM_BUG_ON() has the exact same semantics as BUG_ON. It is literally
 no different, the only difference is "we can make the code smaller
 because these are less important". [2]

This resulted in a more generic discussion about usage of BUG() and
friends. While there might be corner cases that still deserve a BUG_ON(),
most BUG_ON() cases should simply use WARN_ON_ONCE() and implement a
recovery path if reasonable:

 The only possible case where BUG_ON can validly be used is "I have
 some fundamental data corruption and cannot possibly return an
 error". [2]

As a very good approximation is the general rule:

 "absolutely no new BUG_ON() calls _ever_" [2]

... not even if something really shouldn't ever happen and is merely for
documenting that an invariant always has to hold. However, there are sill
exceptions where BUG_ON() may be used:

 If you have a "this is major internal corruption, there's no way we can
 continue", then BUG_ON() is appropriate. [3]

There is only one good BUG_ON():

 Now, that said, there is one very valid sub-form of BUG_ON():
 BUILD_BUG_ON() is absolutely 100% fine. [2]

While WARN will also crash the machine with panic_on_warn set, that's
exactly to be expected:

 So we have two very different cases: the "virtual machine with good
 logging where a dead machine is fine" - use 'panic_on_warn'. And
 the actual real hardware with real drivers, running real loads by
 users. [4]

The basic idea is that warnings will similarly get reported by users
and be found during testing. However, in contrast to a BUG(), there is a
way to actually influence the expected behavior (e.g., panic_on_warn)
and to eventually keep the machine alive to extract some debug info.

Ingo notes that not all WARN_ON_ONCE cases need recovery. If we don't ever
expect this code to trigger in any case, recovery code is not really
helpful.

 I'd prefer to keep all these warnings 'simple' - i.e. no attempted
 recovery & control flow, unless we ever expect these to trigger.
 [5]

There have been different rules floating around that were never properly
documented. Let's try to clarify.

[1] 
https://lkml.kernel.org/r/CAHk-=wiEAH+ojSpAgx_Ep=nkpwhu8ado3v56bxccsu97oyj...@mail.gmail.com
[2] 
https://lore.kernel.org/r/CAHk-=wg40eazofo16eviaj7mfqdhz2gvebvfsmf6gyzsprj...@mail.gmail.com
[2] 
https://lkml.kernel.org/r/CAHk-=wit-dmhmfqery29jspjfgebx_ld+pnerc4j2ag990w...@mail.gmail.com
[4] 
https://lore.kernel.org/r/CAHk-=wgF7K2gSSpy=m_=k3nov4zaceux9puqf1tjktjla2x...@mail.gmail.com
[5] https://lore.kernel.org/r/ywiw+mvezotoxn%...@gmail.com

Signed-off-by: David Hildenbrand 


[...]


+Use WARN_ON_ONCE() rather than WARN() or WARN_ON()
+**
+
+WARN_ON_ONCE() is generally preferred over WARN() or WARN_ON(), because it
+is common for a given warning condition, if it occurs at all, to occur
+multiple times. This can fill up and wrap the kernel log, and can even slow
+the system enough that the excessive logging turns into its own, additional
+problem.


FWIW I have had cases where WARN() messages caused a reboot, maybe
mention that here? In my case the logging was so excessive that the
watchdog wasn't updated and in the end the device was forcefully
rebooted.



That should be covered by the last part, no? What would be your suggestion?

--
Thanks,

David / dhildenb

Re: [PATCH v4 11/12] KVM: mips, x86: do not rely on KVM_REQ_UNHALT

2022-09-22 Thread Philippe Mathieu-Daudé

On Wed, Sep 21, 2022 at 2:34 AM Sean Christopherson  wrote:
>
> From: Paolo Bonzini 
>
> KVM_REQ_UNHALT is a weird request that simply reports the value of
> kvm_arch_vcpu_runnable() on exit from kvm_vcpu_halt().  Only
> MIPS and x86 are looking at it, the others just clear it.  Check
> the state of the vCPU directly so that the request is handled
> as a nop on all architectures.
>
> No functional change intended, except for corner cases where an
> event arrive immediately after a signal become pending or after
> another similar host-side event.
>
> Signed-off-by: Paolo Bonzini 
> Signed-off-by: Sean Christopherson 
> ---
>  arch/mips/kvm/emulate.c | 7 +++
>  arch/x86/kvm/x86.c  | 9 -
>  2 files changed, 11 insertions(+), 5 deletions(-)

Reviewed-by: Philippe Mathieu-Daudé

Re: [PATCH] powerpc: dts: turris1x.dts: Fix NOR partitions labels

2022-09-22 Thread Pali Rohár

On Wednesday 31 August 2022 00:55:00 Pali Rohár wrote:
> Partition partition@2 contains generic kernel image and it does not
> have to be used only for rescue purposes. Partition partition@1c
> contains bootable rescue system and partition partition@34 contains
> factory image/data for restoring to NAND. So change partition labels to
> better fit their purpose by removing possible misleading substring "rootfs"
> from these labels.
> 
> Fixes: 54c15ec3b738 ("powerpc: dts: Add DTS file for CZ.NIC Turris 1.x 
> routers")
> Signed-off-by: Pali Rohár 

Ping?

> ---
>  arch/powerpc/boot/dts/turris1x.dts | 10 +-
>  1 file changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/powerpc/boot/dts/turris1x.dts 
> b/arch/powerpc/boot/dts/turris1x.dts
> index b7635f0b87a9..64c0dd733e40 100644
> --- a/arch/powerpc/boot/dts/turris1x.dts
> +++ b/arch/powerpc/boot/dts/turris1x.dts
> @@ -263,21 +263,21 @@
>   };
>  
>   partition@2 {
> - /* 1.7 MB for Rescue Linux Kernel Image 
> */
> + /* 1.7 MB for Linux Kernel Image */
>   reg = <0x0002 0x001a>;
> - label = "rescue-kernel";
> + label = "kernel";
>   };
>  
>   partition@1c {
>   /* 1.5 MB for Rescue JFFS2 Root File 
> System */
>   reg = <0x001c 0x0018>;
> - label = "rescue-rootfs";
> + label = "rescue";
>   };
>  
>   partition@34 {
> - /* 11 MB for TAR.XZ Backup with content 
> of NAND Root File System */
> + /* 11 MB for TAR.XZ Archive with 
> Factory content of NAND Root File System */
>   reg = <0x0034 0x00b0>;
> - label = "backup-rootfs";
> + label = "factory";
>   };
>  
>   partition@e4 {
> -- 
> 2.20.1
>

Re: [PATCH] powerpc: dts: turris1x.dts: Fix labels in DSA cpu port nodes

2022-09-22 Thread Pali Rohár

On Saturday 27 August 2022 15:15:38 Pali Rohár wrote:
> DSA cpu port node has to be marked with "cpu" label.
> So fix it for both cpu port nodes.
> 
> Fixes: 54c15ec3b738 ("powerpc: dts: Add DTS file for CZ.NIC Turris 1.x 
> routers")
> Signed-off-by: Pali Rohár 

Ping?

> ---
>  arch/powerpc/boot/dts/turris1x.dts | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/powerpc/boot/dts/turris1x.dts 
> b/arch/powerpc/boot/dts/turris1x.dts
> index 8277e0d7fd91..64c0dd733e40 100644
> --- a/arch/powerpc/boot/dts/turris1x.dts
> +++ b/arch/powerpc/boot/dts/turris1x.dts
> @@ -147,7 +147,7 @@
>  
>   port@0 {
>   reg = <0>;
> - label = "cpu1";
> + label = "cpu";
>   ethernet = <>;
>   phy-mode = "rgmii-id";
>  
> @@ -184,7 +184,7 @@
>  
>   port@6 {
>   reg = <6>;
> - label = "cpu0";
> + label = "cpu";
>   ethernet = <>;
>   phy-mode = "rgmii-id";
>  
> -- 
> 2.20.1
>

Re: [powerpc] Kernel crash with THP tests (next-20220920)

2022-09-22 Thread Sachin Sant




> On 22-Sep-2022, at 5:11 AM, Mike Kravetz  wrote:
> 
> On 09/21/22 12:00, Sachin Sant wrote:
>> While running transparent huge page tests [1] against 6.0.0-rc6-next-20220920
>> following crash is seen on IBM Power server.
> 
> Thanks Sachin,
> 
> Naoya reported this, with my analysis here:
> https://lore.kernel.org/linux-mm/YyqCS6+OXAgoqI8T@monkey/
> 

Thanks Mike for the pointer.

> An updated version of the patch was posted here,
> https://lore.kernel.org/linux-mm/20220921202702.106069-1-mike.krav...@oracle.com/
> 
This updated patch works for me. The test runs to completion without any
issues.

- Sachin

Re: [PATCH] powerpc/pseries: Move vas_migration_handler early during migration

2022-09-22 Thread Nathan Lynch

Haren Myneni  writes:
> When the migration is initiated, the hypervisor changes VAS
> mappings as part of pre-migration event. Then the OS gets the
> migration event which closes all VAS windows before the migration
> starts. NX generates continuous faults until windows are closed
> and the user space can not differentiate these NX faults coming
> from the actual migration. So to reduce this time window, close
> VAS windows first in pseries_migrate_partition().

I'm concerned that this is only narrowing a window of time where
undesirable faults occur, and that it may not be sufficient for all
configurations. Migrations can be in progress for minutes or hours,
while the time that we wait for the VASI state transition is usually
seconds or minutes. So I worry that this works around a problem in
limited cases but doesn't cover them all.

Maybe I don't understand the problem well enough. How does user space
respond to the NX faults?

[PATCH v3 1/4] drm/ofdrm: Add ofdrm for Open Firmware framebuffers

2022-09-22 Thread Thomas Zimmermann

Open Firmware provides basic display output via the 'display' node.
DT platform code already provides a device that represents the node's
framebuffer. Add a DRM driver for the device. The display mode and
color format is pre-initialized by the system's firmware. Runtime
modesetting via DRM is not possible. The display is useful during
early boot stages or as error fallback.

Similar functionality is already provided by fbdev's offb driver,
which is insufficient for modern userspace. The old driver includes
support for BootX device tree, which can be found on old 32-bit
PowerPC Macintosh systems. If these are still in use, the
functionality can be added to ofdrm or implemented in a new
driver. As with simpledrm, the fbdev driver cannot be selected if
ofdrm is already enabled.

Two notable points about the driver:

 * Reading the framebuffer aperture from the device tree is not
reliable on all systems. Ofdrm takes the heuristics and a comment
from offb to pick the correct range.

 * No resource management may be tied to the underlying PCI device.
Otherwise the handover to the native driver will fail with a resource
conflict. PCI management is therefore done as part of the platform
device's cleanup.

The driver has been tested on qemu's ppc64le emulation. The device
hand-over has been tested with bochs.

v3:
* reintegrate FWFB helpers into ofdrm
* use damage iterator
* sync GEM BOs with drm_gem_fb_{begin,end}_cpu_access()
* fix various atomic_check helpers
* remove CRTC atomic_{enable,disable} (Javier)
* compute stride with drm_format_info_min_pitch() (Daniel)
v2:
* removed simple-pipe helpers
* built driver on top of FWFB helpers
* merged all init code into single function
* make PCI support optional (Michal)
* support COMPILE_TEST (Javier)

Signed-off-by: Thomas Zimmermann 
Reviewed-by: Javier Martinez Canillas 
---
 MAINTAINERS   |   1 +
 drivers/gpu/drm/tiny/Kconfig  |  13 +
 drivers/gpu/drm/tiny/Makefile |   1 +
 drivers/gpu/drm/tiny/ofdrm.c  | 741 ++
 drivers/video/fbdev/Kconfig   |   1 +
 5 files changed, 757 insertions(+)
 create mode 100644 drivers/gpu/drm/tiny/ofdrm.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 5dea02ce7b24..55f3a9def47f 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -6655,6 +6655,7 @@ L:dri-de...@lists.freedesktop.org
 S: Maintained
 T: git git://anongit.freedesktop.org/drm/drm-misc
 F: drivers/gpu/drm/drm_aperture.c
+F: drivers/gpu/drm/tiny/ofdrm.c
 F: drivers/gpu/drm/tiny/simpledrm.c
 F: drivers/video/aperture.c
 F: include/drm/drm_aperture.h
diff --git a/drivers/gpu/drm/tiny/Kconfig b/drivers/gpu/drm/tiny/Kconfig
index 565957264875..a300b03a3c7a 100644
--- a/drivers/gpu/drm/tiny/Kconfig
+++ b/drivers/gpu/drm/tiny/Kconfig
@@ -51,6 +51,19 @@ config DRM_GM12U320
 This is a KMS driver for projectors which use the GM12U320 chipset
 for video transfer over USB2/3, such as the Acer C120 mini projector.
 
+config DRM_OFDRM
+   tristate "Open Firmware display driver"
+   depends on DRM && OF && (PPC || COMPILE_TEST)
+   select APERTURE_HELPERS
+   select DRM_GEM_SHMEM_HELPER
+   select DRM_KMS_HELPER
+   help
+ DRM driver for Open Firmware framebuffers.
+
+ This driver assumes that the display hardware has been initialized
+ by the Open Firmware before the kernel boots. Scanout buffer, size,
+ and display format must be provided via device tree.
+
 config DRM_PANEL_MIPI_DBI
tristate "DRM support for MIPI DBI compatible panels"
depends on DRM && SPI
diff --git a/drivers/gpu/drm/tiny/Makefile b/drivers/gpu/drm/tiny/Makefile
index 1d9d6227e7ab..76dde89a044b 100644
--- a/drivers/gpu/drm/tiny/Makefile
+++ b/drivers/gpu/drm/tiny/Makefile
@@ -4,6 +4,7 @@ obj-$(CONFIG_DRM_ARCPGU)+= arcpgu.o
 obj-$(CONFIG_DRM_BOCHS)+= bochs.o
 obj-$(CONFIG_DRM_CIRRUS_QEMU)  += cirrus.o
 obj-$(CONFIG_DRM_GM12U320) += gm12u320.o
+obj-$(CONFIG_DRM_OFDRM)+= ofdrm.o
 obj-$(CONFIG_DRM_PANEL_MIPI_DBI)   += panel-mipi-dbi.o
 obj-$(CONFIG_DRM_SIMPLEDRM)+= simpledrm.o
 obj-$(CONFIG_TINYDRM_HX8357D)  += hx8357d.o
diff --git a/drivers/gpu/drm/tiny/ofdrm.c b/drivers/gpu/drm/tiny/ofdrm.c
new file mode 100644
index ..5ac9aa769513
--- /dev/null
+++ b/drivers/gpu/drm/tiny/ofdrm.c
@@ -0,0 +1,741 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#define DRIVER_NAME"ofdrm"
+#define DRIVER_DESC"DRM driver for OF platform devices"
+#define DRIVER_DATE"20220501"
+#define DRIVER_MAJOR   1
+#define DRIVER_MINOR   0
+
+/*
+ * Helpers

[PATCH v3 0/4] drm: Add driver for PowerPC OF displays

2022-09-22 Thread Thomas Zimmermann

PowerPC's Open Firmware offers a simple display buffer for graphics
output. Add ofdrm, a DRM driver for the device. As with the existing
simpledrm driver, the graphics hardware is pre-initialized by the
firmware. The driver only provides blitting, no actual DRM modesetting
is possible.

For version 3 of this patchset, all preparatory changes have been
merged into the DRM codebase. Only ofdrm changes are left.

Patch 1 adds ofdrm, which has again been significantly reworked.
The FWFB library has been removed infavor of various functions in
existing DRM helper libraries. Ofdrm now supports damage iterators
and synchronization for imported GEM BOs.

Patches 2 to 4 add support for color management. The code has been
taken from fbdev's offb. I have no hardware available for testing the
functionality. Qemu's stdvga apparently does not support gamma tables
in RGB modes. I verified that the color management code is executed
by running Gnome's night-mode settings, but the display's color tone
does not change.

Thomas Zimmermann (4):
  drm/ofdrm: Add ofdrm for Open Firmware framebuffers
  drm/ofdrm: Add CRTC state
  drm/ofdrm: Add per-model device function
  drm/ofdrm: Support color management

 MAINTAINERS   |1 +
 drivers/gpu/drm/tiny/Kconfig  |   13 +
 drivers/gpu/drm/tiny/Makefile |1 +
 drivers/gpu/drm/tiny/ofdrm.c  | 1351 +
 drivers/video/fbdev/Kconfig   |1 +
 5 files changed, 1367 insertions(+)
 create mode 100644 drivers/gpu/drm/tiny/ofdrm.c


base-commit: a7d5d07d5ac5ac58ec81932b3f732e3127d17af9
-- 
2.37.3

[PATCH v3 3/4] drm/ofdrm: Add per-model device function

2022-09-22 Thread Thomas Zimmermann

Add a per-model device-function structure in preparation of adding
color-management support. Detection of the individual models has been
taken from fbdev's offb.

v3:
* define constants for PCI ids (Javier)

Signed-off-by: Thomas Zimmermann 
Reviewed-by: Javier Martinez Canillas 
---
 drivers/gpu/drm/tiny/ofdrm.c | 125 +++
 1 file changed, 125 insertions(+)

diff --git a/drivers/gpu/drm/tiny/ofdrm.c b/drivers/gpu/drm/tiny/ofdrm.c
index a78aee800956..9a8e5696999c 100644
--- a/drivers/gpu/drm/tiny/ofdrm.c
+++ b/drivers/gpu/drm/tiny/ofdrm.c
@@ -28,6 +28,21 @@
 #define DRIVER_MAJOR   1
 #define DRIVER_MINOR   0
 
+#define PCI_VENDOR_ID_ATI_R520 0x7100
+#define PCI_VENDOR_ID_ATI_R600 0x9400
+
+enum ofdrm_model {
+   OFDRM_MODEL_UNKNOWN,
+   OFDRM_MODEL_MACH64, /* ATI Mach64 */
+   OFDRM_MODEL_RAGE128, /* ATI Rage128 */
+   OFDRM_MODEL_RAGE_M3A, /* ATI Rage Mobility M3 Head A */
+   OFDRM_MODEL_RAGE_M3B, /* ATI Rage Mobility M3 Head B */
+   OFDRM_MODEL_RADEON, /* ATI Radeon */
+   OFDRM_MODEL_GXT2000, /* IBM GXT2000 */
+   OFDRM_MODEL_AVIVO, /* ATI R5xx */
+   OFDRM_MODEL_QEMU, /* QEMU VGA */
+};
+
 /*
  * Helpers for display nodes
  */
@@ -149,14 +164,63 @@ static u64 display_get_address_of(struct drm_device *dev, 
struct device_node *of
return address;
 }
 
+static bool is_avivo(__be32 vendor, __be32 device)
+{
+   /* This will match most R5xx */
+   return (vendor == PCI_VENDOR_ID_ATI) &&
+  ((device >= PCI_VENDOR_ID_ATI_R520 && device < 0x7800) ||
+   (PCI_VENDOR_ID_ATI_R600 >= 0x9400));
+}
+
+static enum ofdrm_model display_get_model_of(struct drm_device *dev, struct 
device_node *of_node)
+{
+   enum ofdrm_model model = OFDRM_MODEL_UNKNOWN;
+
+   if (of_node_name_prefix(of_node, "ATY,Rage128")) {
+   model = OFDRM_MODEL_RAGE128;
+   } else if (of_node_name_prefix(of_node, "ATY,RageM3pA") ||
+  of_node_name_prefix(of_node, "ATY,RageM3p12A")) {
+   model = OFDRM_MODEL_RAGE_M3A;
+   } else if (of_node_name_prefix(of_node, "ATY,RageM3pB")) {
+   model = OFDRM_MODEL_RAGE_M3B;
+   } else if (of_node_name_prefix(of_node, "ATY,Rage6")) {
+   model = OFDRM_MODEL_RADEON;
+   } else if (of_node_name_prefix(of_node, "ATY,")) {
+   return OFDRM_MODEL_MACH64;
+   } else if (of_device_is_compatible(of_node, "pci1014,b7") ||
+  of_device_is_compatible(of_node, "pci1014,21c")) {
+   model = OFDRM_MODEL_GXT2000;
+   } else if (of_node_name_prefix(of_node, "vga,Display-")) {
+   struct device_node *of_parent;
+   const __be32 *vendor_p, *device_p;
+
+   /* Look for AVIVO initialized by SLOF */
+   of_parent = of_get_parent(of_node);
+   vendor_p = of_get_property(of_parent, "vendor-id", NULL);
+   device_p = of_get_property(of_parent, "device-id", NULL);
+   if (vendor_p && device_p && is_avivo(*vendor_p, *device_p))
+   model = OFDRM_MODEL_AVIVO;
+   of_node_put(of_parent);
+   } else if (of_device_is_compatible(of_node, "qemu,std-vga")) {
+   model = OFDRM_MODEL_QEMU;
+   }
+
+   return model;
+}
+
 /*
  * Open Firmware display device
  */
 
+struct ofdrm_device_funcs {
+};
+
 struct ofdrm_device {
struct drm_device dev;
struct platform_device *pdev;
 
+   const struct ofdrm_device_funcs *funcs;
+
/* simplefb settings */
struct iosys_map screen_base;
struct drm_display_mode mode;
@@ -520,6 +584,33 @@ static const struct drm_mode_config_funcs 
ofdrm_mode_config_funcs = {
  * Init / Cleanup
  */
 
+static const struct ofdrm_device_funcs ofdrm_unknown_device_funcs = {
+};
+
+static const struct ofdrm_device_funcs ofdrm_mach64_device_funcs = {
+};
+
+static const struct ofdrm_device_funcs ofdrm_rage128_device_funcs = {
+};
+
+static const struct ofdrm_device_funcs ofdrm_rage_m3a_device_funcs = {
+};
+
+static const struct ofdrm_device_funcs ofdrm_rage_m3b_device_funcs = {
+};
+
+static const struct ofdrm_device_funcs ofdrm_radeon_device_funcs = {
+};
+
+static const struct ofdrm_device_funcs ofdrm_gxt2000_device_funcs = {
+};
+
+static const struct ofdrm_device_funcs ofdrm_avivo_device_funcs = {
+};
+
+static const struct ofdrm_device_funcs ofdrm_qemu_device_funcs = {
+};
+
 static struct drm_display_mode ofdrm_mode(unsigned int width, unsigned int 
height)
 {
/*
@@ -541,6 +632,7 @@ static struct ofdrm_device *ofdrm_device_create(struct 
drm_driver *drv,
struct device_node *of_node = pdev->dev.of_node;
struct ofdrm_device *odev;
struct drm_device *dev;
+   enum ofdrm_model model;
int width, height, linebytes;
const struct drm_format_info *format;
u64 address;
@@ -569,6 +661,39 @@ static struct ofdrm_device *ofdrm_device_create(struct

[PATCH v3 4/4] drm/ofdrm: Support color management

2022-09-22 Thread Thomas Zimmermann

Support the CRTC's color-management property and implement each model's
palette support.

The OF hardware has different methods of setting the palette. The
respective code has been taken from fbdev's offb and refactored into
per-model device functions. The device functions integrate this
functionality into the overall modesetting.

As palette handling is a CRTC property that depends on the primary
plane's color format, the plane's atomic_check helper now updates the
format field in ofdrm's custom CRTC state. The CRTC's atomic_flush
helper updates the palette for the format as needed.

v3:
* lookup CRTC state with drm_atomic_get_new_crtc_state()
* access HW palette with writeb(), writel(), and readl() (Ben)
* declare register values as u32

Signed-off-by: Thomas Zimmermann 
Reviewed-by: Javier Martinez Canillas 
---
 drivers/gpu/drm/tiny/ofdrm.c | 442 ++-
 1 file changed, 437 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/tiny/ofdrm.c b/drivers/gpu/drm/tiny/ofdrm.c
index 9a8e5696999c..f891588375e7 100644
--- a/drivers/gpu/drm/tiny/ofdrm.c
+++ b/drivers/gpu/drm/tiny/ofdrm.c
@@ -13,6 +13,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -31,6 +32,33 @@
 #define PCI_VENDOR_ID_ATI_R520 0x7100
 #define PCI_VENDOR_ID_ATI_R600 0x9400
 
+#define OFDRM_GAMMA_LUT_SIZE   256
+
+/* Definitions used by the Avivo palette  */
+#define AVIVO_DC_LUT_RW_SELECT  0x6480
+#define AVIVO_DC_LUT_RW_MODE0x6484
+#define AVIVO_DC_LUT_RW_INDEX   0x6488
+#define AVIVO_DC_LUT_SEQ_COLOR  0x648c
+#define AVIVO_DC_LUT_PWL_DATA   0x6490
+#define AVIVO_DC_LUT_30_COLOR   0x6494
+#define AVIVO_DC_LUT_READ_PIPE_SELECT   0x6498
+#define AVIVO_DC_LUT_WRITE_EN_MASK  0x649c
+#define AVIVO_DC_LUT_AUTOFILL   0x64a0
+#define AVIVO_DC_LUTA_CONTROL   0x64c0
+#define AVIVO_DC_LUTA_BLACK_OFFSET_BLUE 0x64c4
+#define AVIVO_DC_LUTA_BLACK_OFFSET_GREEN0x64c8
+#define AVIVO_DC_LUTA_BLACK_OFFSET_RED  0x64cc
+#define AVIVO_DC_LUTA_WHITE_OFFSET_BLUE 0x64d0
+#define AVIVO_DC_LUTA_WHITE_OFFSET_GREEN0x64d4
+#define AVIVO_DC_LUTA_WHITE_OFFSET_RED  0x64d8
+#define AVIVO_DC_LUTB_CONTROL   0x6cc0
+#define AVIVO_DC_LUTB_BLACK_OFFSET_BLUE 0x6cc4
+#define AVIVO_DC_LUTB_BLACK_OFFSET_GREEN0x6cc8
+#define AVIVO_DC_LUTB_BLACK_OFFSET_RED  0x6ccc
+#define AVIVO_DC_LUTB_WHITE_OFFSET_BLUE 0x6cd0
+#define AVIVO_DC_LUTB_WHITE_OFFSET_GREEN0x6cd4
+#define AVIVO_DC_LUTB_WHITE_OFFSET_RED  0x6cd8
+
 enum ofdrm_model {
OFDRM_MODEL_UNKNOWN,
OFDRM_MODEL_MACH64, /* ATI Mach64 */
@@ -212,7 +240,14 @@ static enum ofdrm_model display_get_model_of(struct 
drm_device *dev, struct devi
  * Open Firmware display device
  */
 
+struct ofdrm_device;
+
 struct ofdrm_device_funcs {
+   void __iomem *(*cmap_ioremap)(struct ofdrm_device *odev,
+ struct device_node *of_node,
+ u64 fb_bas);
+   void (*cmap_write)(struct ofdrm_device *odev, unsigned char index,
+  unsigned char r, unsigned char g, unsigned char b);
 };
 
 struct ofdrm_device {
@@ -227,6 +262,9 @@ struct ofdrm_device {
const struct drm_format_info *format;
unsigned int pitch;
 
+   /* colormap */
+   void __iomem *cmap_base;
+
/* modesetting */
uint32_t formats[8];
struct drm_plane primary_plane;
@@ -339,12 +377,322 @@ static struct resource *ofdrm_find_fb_resource(struct 
ofdrm_device *odev,
return max_res;
 }
 
+/*
+ * Colormap / Palette
+ */
+
+static void __iomem *get_cmap_address_of(struct ofdrm_device *odev, struct 
device_node *of_node,
+int bar_no, unsigned long offset, 
unsigned long size)
+{
+   struct drm_device *dev = >dev;
+   const __be32 *addr_p;
+   u64 max_size, address;
+   unsigned int flags;
+   void __iomem *mem;
+
+   addr_p = of_get_pci_address(of_node, bar_no, _size, );
+   if (!addr_p)
+   addr_p = of_get_address(of_node, bar_no, _size, );
+   if (!addr_p)
+   return ERR_PTR(-ENODEV);
+
+   if ((flags & (IORESOURCE_IO | IORESOURCE_MEM)) == 0)
+   return ERR_PTR(-ENODEV);
+
+   if ((offset + size) >= max_size)
+   return ERR_PTR(-ENODEV);
+
+   address = of_translate_address(of_node, addr_p);
+   if (address == OF_BAD_ADDR)
+   return ERR_PTR(-ENODEV);
+
+   mem = devm_ioremap(dev->dev, address + offset, size);
+   if (!mem)
+   return ERR_PTR(-ENOMEM);
+
+   return mem;
+}
+
+static void __iomem *ofdrm_mach64_cmap_ioremap(struct ofdrm_device *odev,
+  struct device_node

[PATCH v3 2/4] drm/ofdrm: Add CRTC state

2022-09-22 Thread Thomas Zimmermann

Add a dedicated CRTC state to ofdrm to later store information for
palette updates.

v3:
* rework CRTC state helpers (Javier)

Signed-off-by: Thomas Zimmermann 
Reviewed-by: Javier Martinez Canillas 
---
 drivers/gpu/drm/tiny/ofdrm.c | 59 ++--
 1 file changed, 56 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/tiny/ofdrm.c b/drivers/gpu/drm/tiny/ofdrm.c
index 5ac9aa769513..a78aee800956 100644
--- a/drivers/gpu/drm/tiny/ofdrm.c
+++ b/drivers/gpu/drm/tiny/ofdrm.c
@@ -279,6 +279,21 @@ static struct resource *ofdrm_find_fb_resource(struct 
ofdrm_device *odev,
  * Modesetting
  */
 
+struct ofdrm_crtc_state {
+   struct drm_crtc_state base;
+};
+
+static struct ofdrm_crtc_state *to_ofdrm_crtc_state(struct drm_crtc_state 
*base)
+{
+   return container_of(base, struct ofdrm_crtc_state, base);
+}
+
+static void ofdrm_crtc_state_destroy(struct ofdrm_crtc_state *ofdrm_crtc_state)
+{
+   __drm_atomic_helper_crtc_destroy_state(_crtc_state->base);
+   kfree(ofdrm_crtc_state);
+}
+
 /*
  * Support all formats of OF display and maybe more; in order
  * of preference. The display's update function will do any
@@ -429,13 +444,51 @@ static const struct drm_crtc_helper_funcs 
ofdrm_crtc_helper_funcs = {
.atomic_check = ofdrm_crtc_helper_atomic_check,
 };
 
+static void ofdrm_crtc_reset(struct drm_crtc *crtc)
+{
+   struct ofdrm_crtc_state *ofdrm_crtc_state =
+   kzalloc(sizeof(*ofdrm_crtc_state), GFP_KERNEL);
+
+   if (crtc->state)
+   ofdrm_crtc_state_destroy(to_ofdrm_crtc_state(crtc->state));
+
+   if (ofdrm_crtc_state)
+   __drm_atomic_helper_crtc_reset(crtc, _crtc_state->base);
+   else
+   __drm_atomic_helper_crtc_reset(crtc, NULL);
+}
+
+static struct drm_crtc_state *ofdrm_crtc_atomic_duplicate_state(struct 
drm_crtc *crtc)
+{
+   struct drm_device *dev = crtc->dev;
+   struct drm_crtc_state *crtc_state = crtc->state;
+   struct ofdrm_crtc_state *new_ofdrm_crtc_state;
+
+   if (drm_WARN_ON(dev, !crtc_state))
+   return NULL;
+
+   new_ofdrm_crtc_state = kzalloc(sizeof(*new_ofdrm_crtc_state), 
GFP_KERNEL);
+   if (!new_ofdrm_crtc_state)
+   return NULL;
+
+   __drm_atomic_helper_crtc_duplicate_state(crtc, 
_ofdrm_crtc_state->base);
+
+   return _ofdrm_crtc_state->base;
+}
+
+static void ofdrm_crtc_atomic_destroy_state(struct drm_crtc *crtc,
+   struct drm_crtc_state *crtc_state)
+{
+   ofdrm_crtc_state_destroy(to_ofdrm_crtc_state(crtc_state));
+}
+
 static const struct drm_crtc_funcs ofdrm_crtc_funcs = {
-   .reset = drm_atomic_helper_crtc_reset,
+   .reset = ofdrm_crtc_reset,
.destroy = drm_crtc_cleanup,
.set_config = drm_atomic_helper_set_config,
.page_flip = drm_atomic_helper_page_flip,
-   .atomic_duplicate_state = drm_atomic_helper_crtc_duplicate_state,
-   .atomic_destroy_state = drm_atomic_helper_crtc_destroy_state,
+   .atomic_duplicate_state = ofdrm_crtc_atomic_duplicate_state,
+   .atomic_destroy_state = ofdrm_crtc_atomic_destroy_state,
 };
 
 static int ofdrm_connector_helper_get_modes(struct drm_connector *connector)
-- 
2.37.3

Re: [PATCH] powerpc: Ignore DSI error caused by the copy/paste instruction

2022-09-22 Thread Christophe Leroy



Le 22/09/2022 à 10:29, Haren Myneni a écrit :
> 
> DSI error will be generated when the paste operation is issued on
> the suspended NX window due to NX state changes. The hypervisor
> expects the partition to ignore this error during page pault
> handling. To differentiate DSI caused by an actual HW configuration
> or by the NX window, a new “ibm,pi-features” type value is defined.
> Byte 0, bit 3 of pi-attribute-specifier-type is now defined to
> indicate this DSI error.

What is NX ? No eXec ? That's what it is usually. But in that case it 
would be the ISI, not DSI.

> 
> This patch adds changes to read ibm,pi-features property and ignore
> DSI error in the page fault handling if CPU_FTR_NX_DSI if defined.
> 
> Signed-off-by: Haren Myneni 
> ---
>   arch/powerpc/include/asm/cputable.h |  5 ++--
>   arch/powerpc/kernel/prom.c  | 36 +
>   arch/powerpc/mm/fault.c | 17 +-
>   3 files changed, 45 insertions(+), 13 deletions(-)
> 
> diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
> index 014005428687..154cc1e85770 100644
> --- a/arch/powerpc/mm/fault.c
> +++ b/arch/powerpc/mm/fault.c
> @@ -367,7 +367,22 @@ static void sanity_check_fault(bool is_write, bool 
> is_user,
>   #elif defined(CONFIG_PPC_8xx)
>   #define page_fault_is_bad(__err)((__err) & DSISR_NOEXEC_OR_G)
>   #elif defined(CONFIG_PPC64)
> -#define page_fault_is_bad(__err) ((__err) & DSISR_BAD_FAULT_64S)
> +static inline int page_fault_is_bad(unsigned long __err)

The name was __err because it was a macro and there was a risk of 
collision with a 'err' variable in the caller.

But as it is now a function, you can just call it 'err'.

And no need of the 'inline' keyword, GCC will inline it anyway.

> +{
> + unsigned long flag = DSISR_BAD_FAULT_64S;
> +
> + /*
> +  * PAPR 14.15.3.4.1
> +  * If byte 0, bit 3 of pi-attribute-specifier-type in
> +  * ibm,pi-features property is defined, ignore the DSI error
> +  * which is caused by the paste instruction on the
> +  * suspended NX window.
> +  */
> + if (cpu_has_feature(CPU_FTR_NX_DSI))
> + flag &= ~DSISR_BAD_COPYPASTE;
> +
> + return ((__err) & flag);

The () around __err was because it was a macro parameter. It is 
pointless now. And same for the overall ones. Now it can be :

return err & flags;

> +}
>   #else
>   #define page_fault_is_bad(__err)((__err) & DSISR_BAD_FAULT_32S)
>   #endif

[PATCH] powerpc: Ignore DSI error caused by the copy/paste instruction

2022-09-22 Thread Haren Myneni



DSI error will be generated when the paste operation is issued on
the suspended NX window due to NX state changes. The hypervisor
expects the partition to ignore this error during page pault
handling. To differentiate DSI caused by an actual HW configuration
or by the NX window, a new “ibm,pi-features” type value is defined.
Byte 0, bit 3 of pi-attribute-specifier-type is now defined to
indicate this DSI error.

This patch adds changes to read ibm,pi-features property and ignore
DSI error in the page fault handling if CPU_FTR_NX_DSI if defined.

Signed-off-by: Haren Myneni 
---
 arch/powerpc/include/asm/cputable.h |  5 ++--
 arch/powerpc/kernel/prom.c  | 36 +
 arch/powerpc/mm/fault.c | 17 +-
 3 files changed, 45 insertions(+), 13 deletions(-)

diff --git a/arch/powerpc/include/asm/cputable.h 
b/arch/powerpc/include/asm/cputable.h
index ae8c3e13cfce..8dc9949b6365 100644
--- a/arch/powerpc/include/asm/cputable.h
+++ b/arch/powerpc/include/asm/cputable.h
@@ -192,6 +192,7 @@ static inline void cpu_feature_keys_init(void) { }
 #define CPU_FTR_P9_RADIX_PREFETCH_BUG  LONG_ASM_CONST(0x0002)
 #define CPU_FTR_ARCH_31
LONG_ASM_CONST(0x0004)
 #define CPU_FTR_DAWR1  LONG_ASM_CONST(0x0008)
+#define CPU_FTR_NX_DSI LONG_ASM_CONST(0x0010)
 
 #ifndef __ASSEMBLY__
 
@@ -429,7 +430,7 @@ static inline void cpu_feature_keys_init(void) { }
CPU_FTR_CFAR | CPU_FTR_HVMODE | CPU_FTR_VMX_COPY | \
CPU_FTR_DBELL | CPU_FTR_HAS_PPR | CPU_FTR_ARCH_207S | \
CPU_FTR_TM_COMP | CPU_FTR_ARCH_300 | CPU_FTR_P9_TLBIE_STQ_BUG | \
-   CPU_FTR_P9_TLBIE_ERAT_BUG | CPU_FTR_P9_TIDR)
+   CPU_FTR_P9_TLBIE_ERAT_BUG | CPU_FTR_P9_TIDR | CPU_FTR_NX_DSI)
 #define CPU_FTRS_POWER9_DD2_0 (CPU_FTRS_POWER9 | CPU_FTR_P9_RADIX_PREFETCH_BUG)
 #define CPU_FTRS_POWER9_DD2_1 (CPU_FTRS_POWER9 | \
   CPU_FTR_P9_RADIX_PREFETCH_BUG | \
@@ -451,7 +452,7 @@ static inline void cpu_feature_keys_init(void) { }
CPU_FTR_CFAR | CPU_FTR_HVMODE | CPU_FTR_VMX_COPY | \
CPU_FTR_DBELL | CPU_FTR_HAS_PPR | CPU_FTR_ARCH_207S | \
CPU_FTR_ARCH_300 | CPU_FTR_ARCH_31 | \
-   CPU_FTR_DAWR | CPU_FTR_DAWR1)
+   CPU_FTR_DAWR | CPU_FTR_DAWR1 | CPU_FTR_NX_DSI)
 #define CPU_FTRS_CELL  (CPU_FTR_LWSYNC | \
CPU_FTR_PPCAS_ARCH_V2 | CPU_FTR_CTRL | \
CPU_FTR_ALTIVEC_COMP | CPU_FTR_MMCRA | CPU_FTR_SMT | \
diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c
index a730b951b64b..19047c582e9f 100644
--- a/arch/powerpc/kernel/prom.c
+++ b/arch/powerpc/kernel/prom.c
@@ -137,7 +137,7 @@ static void __init move_device_tree(void)
 }
 
 /*
- * ibm,pa-features is a per-cpu property that contains a string of
+ * ibm,pa/pi-features is a per-cpu property that contains a string of
  * attribute descriptors, each of which has a 2 byte header plus up
  * to 254 bytes worth of processor attribute bits.  First header
  * byte specifies the number of bytes following the header.
@@ -149,15 +149,17 @@ static void __init move_device_tree(void)
  * is supported/not supported.  Note that the bit numbers are
  * big-endian to match the definition in PAPR.
  */
-static struct ibm_pa_feature {
+struct ibm_feature {
unsigned long   cpu_features;   /* CPU_FTR_xxx bit */
unsigned long   mmu_features;   /* MMU_FTR_xxx bit */
unsigned intcpu_user_ftrs;  /* PPC_FEATURE_xxx bit */
unsigned intcpu_user_ftrs2; /* PPC_FEATURE2_xxx bit */
-   unsigned char   pabyte; /* byte number in ibm,pa-features */
+   unsigned char   pabyte; /* byte number in ibm,pa/pi-features */
unsigned char   pabit;  /* bit number (big-endian) */
unsigned char   invert; /* if 1, pa bit set => clear feature */
-} ibm_pa_features[] __initdata = {
+};
+
+static struct ibm_feature ibm_pa_features[] __initdata = {
{ .pabyte = 0,  .pabit = 0, .cpu_user_ftrs = PPC_FEATURE_HAS_MMU },
{ .pabyte = 0,  .pabit = 1, .cpu_user_ftrs = PPC_FEATURE_HAS_FPU },
{ .pabyte = 0,  .pabit = 3, .cpu_features  = CPU_FTR_CTRL },
@@ -179,9 +181,19 @@ static struct ibm_pa_feature {
{ .pabyte = 64, .pabit = 0, .cpu_features = CPU_FTR_DAWR1 },
 };
 
+/*
+ * ibm,pi-features property provides the support of processor specific
+ * options not described in ibm,pa-features. Right now use byte 0, bit 3
+ * which indicates the occurrence of DSI interrupt when the paste operation
+ * on the suspended NX window.
+ */
+static struct ibm_feature ibm_pi_features[] __initdata = {
+   { .pabyte = 0, .pabit = 3, .cpu_features  = CPU_FTR_NX_DSI },
+};
+
 static void __init scan_features(unsigned long node, const unsigned char *ftrs,
 unsigned long tablelen,
-struct ibm_pa_feature *fp,
+

[PATCH] powerpc/pseries/vas: Add VAS IRQ primary handler

2022-09-22 Thread Haren Myneni



irq_default_primary_handler() can be used only with IRQF_ONESHOT
flag, but the flag disables IRQ before executing the thread handler
and enables it after the interrupt is handled. But this IRQ disable
sets the VAS IRQ OFF state in the hypervisor. In case if NX faults
during this window, the hypervisor will not deliver the fault
interrupt to the partition and the user space may wait continuously
for the CSB update. So use VAS specific IRQ handler instead of
calling the default primary handler.

Signed-off-by: Haren Myneni 
---
 arch/powerpc/platforms/pseries/vas.c | 19 +--
 1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/vas.c 
b/arch/powerpc/platforms/pseries/vas.c
index 7e6e6dd2e33e..fe33bdb620d5 100644
--- a/arch/powerpc/platforms/pseries/vas.c
+++ b/arch/powerpc/platforms/pseries/vas.c
@@ -210,6 +210,20 @@ static irqreturn_t pseries_vas_fault_thread_fn(int irq, 
void *data)
return IRQ_HANDLED;
 }
 
+/*
+ * irq_default_primary_handler() can be used only with IRQF_ONESHOT
+ * which disables IRQ before executing the thread handler and enables
+ * it after. But this disabling interrupt sets the VAS IRQ OFF
+ * state in the hypervisor. If the NX generates fault interrupt
+ * during this window, the hypervisor will not deliver this
+ * interrupt to the LPAR. So use VAS specific IRQ handler instead
+ * of calling the default primary handler.
+ */
+static irqreturn_t pseries_vas_irq_handler(int irq, void *data)
+{
+   return IRQ_WAKE_THREAD;
+}
+
 /*
  * Allocate window and setup IRQ mapping.
  */
@@ -240,8 +254,9 @@ static int allocate_setup_window(struct pseries_vas_window 
*txwin,
goto out_irq;
}
 
-   rc = request_threaded_irq(txwin->fault_virq, NULL,
- pseries_vas_fault_thread_fn, IRQF_ONESHOT,
+   rc = request_threaded_irq(txwin->fault_virq,
+ pseries_vas_irq_handler,
+ pseries_vas_fault_thread_fn, 0,
  txwin->name, txwin);
if (rc) {
pr_err("VAS-Window[%d]: Request IRQ(%u) failed with %d\n",
-- 
2.26.3

[PATCH] powerpc/pseries: Move vas_migration_handler early during migration

2022-09-22 Thread Haren Myneni



When the migration is initiated, the hypervisor changes VAS
mappings as part of pre-migration event. Then the OS gets the
migration event which closes all VAS windows before the migration
starts. NX generates continuous faults until windows are closed
and the user space can not differentiate these NX faults coming
from the actual migration. So to reduce this time window, close
VAS windows first in pseries_migrate_partition().

Signed-off-by: Haren Myneni 
---
 arch/powerpc/platforms/pseries/mobility.c | 15 ---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/mobility.c 
b/arch/powerpc/platforms/pseries/mobility.c
index 3d36a8955eaf..884595b7c51f 100644
--- a/arch/powerpc/platforms/pseries/mobility.c
+++ b/arch/powerpc/platforms/pseries/mobility.c
@@ -740,11 +740,19 @@ static int pseries_migrate_partition(u64 handle)
 #ifdef CONFIG_PPC_WATCHDOG
factor = nmi_wd_lpm_factor;
 #endif
+   /*
+* When the migration is initiated, the hypervisor changes VAS
+* mappings to prepare before OS gets the notification and
+* closes all VAS windows. NX generates continuous faults during
+* this time and the user space can not differentiate these
+* faults from the migration event. So reduce this time window
+* by closing VAS windows at the beginning of this function.
+*/
+   vas_migration_handler(VAS_SUSPEND);
+
ret = wait_for_vasi_session_suspending(handle);
if (ret)
-   return ret;
-
-   vas_migration_handler(VAS_SUSPEND);
+   goto out;
 
if (factor)
watchdog_nmi_set_timeout_pct(factor);
@@ -765,6 +773,7 @@ static int pseries_migrate_partition(u64 handle)
if (factor)
watchdog_nmi_set_timeout_pct(0);
 
+out:
vas_migration_handler(VAS_RESUME);
 
return ret;
-- 
2.26.3

Re: [PATCH v2 10/10] drm/ofdrm: Support color management

2022-09-22 Thread Thomas Zimmermann


Hi

Am 22.09.22 um 09:28 schrieb Maxime Ripard:

On Thu, Sep 22, 2022 at 08:42:23AM +0200, Thomas Zimmermann wrote:

Hi

Am 21.09.22 um 18:48 schrieb Geert Uytterhoeven:

Hi Thomas,

On Wed, Sep 21, 2022 at 2:55 PM Thomas Zimmermann  wrote:

Am 05.08.22 um 02:19 schrieb Benjamin Herrenschmidt:

On Wed, 2022-07-20 at 16:27 +0200, Thomas Zimmermann wrote:

+#if !defined(CONFIG_PPC)
+static inline void out_8(void __iomem *addr, int val)
+{ }
+static inline void out_le32(void __iomem *addr, int val)
+{ }
+static inline unsigned int in_le32(const void __iomem *addr)
+{
+   return 0;
+}
+#endif


These guys could just be replaced with readb/writel/readl respectively
(beware of the argument swap).


I only added them for COMPILE_TEST. There appears to be no portable
interface that implements out_le32() and in_le32()?


iowrite32() and ioread32()?


Do they always use little endian, as these *_le32 helpers do? I though they
use host byte order.


They use either outl or writel under the hood, which are always little-endian


I see. I'll replace the custom helpers.

Best regards
Thomas



Maxime


--
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Maxfeldstr. 5, 90409 Nürnberg, Germany
(HRB 36809, AG Nürnberg)
Geschäftsführer: Ivo Totev


OpenPGP_signature
Description: OpenPGP digital signature

Re: [PATCH v2 10/10] drm/ofdrm: Support color management

2022-09-22 Thread Maxime Ripard

On Thu, Sep 22, 2022 at 08:42:23AM +0200, Thomas Zimmermann wrote:
> Hi
> 
> Am 21.09.22 um 18:48 schrieb Geert Uytterhoeven:
> > Hi Thomas,
> > 
> > On Wed, Sep 21, 2022 at 2:55 PM Thomas Zimmermann  
> > wrote:
> > > Am 05.08.22 um 02:19 schrieb Benjamin Herrenschmidt:
> > > > On Wed, 2022-07-20 at 16:27 +0200, Thomas Zimmermann wrote:
> > > > > +#if !defined(CONFIG_PPC)
> > > > > +static inline void out_8(void __iomem *addr, int val)
> > > > > +{ }
> > > > > +static inline void out_le32(void __iomem *addr, int val)
> > > > > +{ }
> > > > > +static inline unsigned int in_le32(const void __iomem *addr)
> > > > > +{
> > > > > +   return 0;
> > > > > +}
> > > > > +#endif
> > > > 
> > > > These guys could just be replaced with readb/writel/readl respectively
> > > > (beware of the argument swap).
> > > 
> > > I only added them for COMPILE_TEST. There appears to be no portable
> > > interface that implements out_le32() and in_le32()?
> > 
> > iowrite32() and ioread32()?
> 
> Do they always use little endian, as these *_le32 helpers do? I though they
> use host byte order.

They use either outl or writel under the hood, which are always little-endian

Maxime


signature.asc
Description: PGP signature

Re: [PATCH v2 10/10] drm/ofdrm: Support color management

2022-09-22 Thread Thomas Zimmermann


Hi

Am 21.09.22 um 18:48 schrieb Geert Uytterhoeven:

Hi Thomas,

On Wed, Sep 21, 2022 at 2:55 PM Thomas Zimmermann  wrote:

Am 05.08.22 um 02:19 schrieb Benjamin Herrenschmidt:

On Wed, 2022-07-20 at 16:27 +0200, Thomas Zimmermann wrote:

+#if !defined(CONFIG_PPC)
+static inline void out_8(void __iomem *addr, int val)
+{ }
+static inline void out_le32(void __iomem *addr, int val)
+{ }
+static inline unsigned int in_le32(const void __iomem *addr)
+{
+   return 0;
+}
+#endif


These guys could just be replaced with readb/writel/readl respectively
(beware of the argument swap).


I only added them for COMPILE_TEST. There appears to be no portable
interface that implements out_le32() and in_le32()?


iowrite32() and ioread32()?


Do they always use little endian, as these *_le32 helpers do? I though 
they use host byte order.


Best regards
Thomas



Gr{oetje,eeting}s,

 Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
 -- Linus Torvalds


--
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Maxfeldstr. 5, 90409 Nürnberg, Germany
(HRB 36809, AG Nürnberg)
Geschäftsführer: Ivo Totev


OpenPGP_signature
Description: OpenPGP digital signature

59 matches

Mail list logo