Re: [PATCH v13 6/8] powerpc: Set ARCH_HAS_STRICT_MODULE_RWX

2021-05-13 Thread Christophe Leroy




On 10/05/2021 at 03:18, Jordan Niethe wrote:

From: Russell Currey 

To enable strict module RWX on powerpc, set:

 CONFIG_STRICT_MODULE_RWX=y

You should also have CONFIG_STRICT_KERNEL_RWX=y set to have any real
security benefit.

ARCH_HAS_STRICT_MODULE_RWX is set to require ARCH_HAS_STRICT_KERNEL_RWX.
This is due to a quirk in arch/Kconfig and arch/powerpc/Kconfig that
makes STRICT_MODULE_RWX *on by default* in configurations where
STRICT_KERNEL_RWX is *unavailable*.

Since that default doesn't make much sense, and module RWX without kernel
RWX doesn't make much sense either, giving it the same dependencies as
kernel RWX works around this problem.

With STRICT_MODULE_RWX, now make module_alloc() allocate pages with
PAGE_KERNEL protection rather than PAGE_KERNEL_EXEC.

Book3s/32 processors with a hash MMU (i.e. 604 core) cannot set memory
protection on a page-by-page basis, so do not enable it there.

Signed-off-by: Russell Currey 
[jpn: - predicate on !PPC_BOOK3S_604
   - make module_alloc() use PAGE_KERNEL protection]
Signed-off-by: Jordan Niethe 
---
v10: - Predicate on !PPC_BOOK3S_604
  - Make module_alloc() use PAGE_KERNEL protection
v11: - Neaten up
v13: Use strict_kernel_rwx_enabled()
---
  arch/powerpc/Kconfig | 1 +
  arch/powerpc/kernel/module.c | 4 +++-
  2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index cce0a137b046..cb5d9d862c35 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -140,6 +140,7 @@ config PPC
select ARCH_HAS_SCALED_CPUTIME  if VIRT_CPU_ACCOUNTING_NATIVE && PPC_BOOK3S_64
select ARCH_HAS_SET_MEMORY
select ARCH_HAS_STRICT_KERNEL_RWX   if ((PPC_BOOK3S_64 || PPC32) && !HIBERNATION)
+   select ARCH_HAS_STRICT_MODULE_RWX   if ARCH_HAS_STRICT_KERNEL_RWX && !PPC_BOOK3S_604
select ARCH_HAS_TICK_BROADCAST  if GENERIC_CLOCKEVENTS_BROADCAST
select ARCH_HAS_UACCESS_FLUSHCACHE
select ARCH_HAS_COPY_MC if PPC64
diff --git a/arch/powerpc/kernel/module.c b/arch/powerpc/kernel/module.c
index 3f35c8d20be7..f24004635ed5 100644
--- a/arch/powerpc/kernel/module.c
+++ b/arch/powerpc/kernel/module.c
@@ -92,12 +92,14 @@ int module_finalize(const Elf_Ehdr *hdr,
  static __always_inline void *
  __module_alloc(unsigned long size, unsigned long start, unsigned long end)
  {
+   pgprot_t prot = strict_kernel_rwx_enabled() ? PAGE_KERNEL : PAGE_KERNEL_EXEC;
+


I'm not sure this is OK.

I think we need to make a new helper strict_module_rwx_enabled() because I don't think we want 
PAGE_KERNEL here when CONFIG_STRICT_MODULE_RWX is not selected.




/*
 * Don't do huge page allocations for modules yet until more testing
 * is done. STRICT_MODULE_RWX may require extra work to support this
 * too.
 */
-   return __vmalloc_node_range(size, 1, start, end, GFP_KERNEL, PAGE_KERNEL_EXEC,
+   return __vmalloc_node_range(size, 1, start, end, GFP_KERNEL, prot,
VM_FLUSH_RESET_PERMS | VM_NO_HUGE_VMAP,
NUMA_NO_NODE, __builtin_return_address(0));
  }
  }



Re: [PATCH v13 3/8] powerpc/kprobes: Mark newly allocated probes as ROX

2021-05-13 Thread Christophe Leroy




On 10/05/2021 at 03:18, Jordan Niethe wrote:

From: Russell Currey 

Add the arch specific insn page allocator for powerpc. This allocates
ROX pages if STRICT_KERNEL_RWX is enabled. These pages are only written
to with patch_instruction() which is able to write RO pages.

Reviewed-by: Daniel Axtens 
Signed-off-by: Russell Currey 
Signed-off-by: Christophe Leroy 
[jpn: Reword commit message, switch to __vmalloc_node_range()]
Signed-off-by: Jordan Niethe 
---
v9: - vmalloc_exec() no longer exists
 - Set the page to RW before freeing it
v10: - use __vmalloc_node_range()
v11: - Neaten up
v12: - Switch from __vmalloc_node_range() to module_alloc()
v13: Use strict_kernel_rwx_enabled()
---
  arch/powerpc/kernel/kprobes.c | 17 +
  1 file changed, 17 insertions(+)

diff --git a/arch/powerpc/kernel/kprobes.c b/arch/powerpc/kernel/kprobes.c
index 01ab2163659e..b517f3e6e7c5 100644
--- a/arch/powerpc/kernel/kprobes.c
+++ b/arch/powerpc/kernel/kprobes.c
@@ -19,11 +19,13 @@
  #include 
  #include 
  #include 
+#include 
  #include 
  #include 
  #include 
  #include 
  #include 
+#include 
  #include 
  
  DEFINE_PER_CPU(struct kprobe *, current_kprobe) = NULL;

@@ -103,6 +105,21 @@ kprobe_opcode_t *kprobe_lookup_name(const char *name, unsigned int offset)
return addr;
  }
  
+void *alloc_insn_page(void)
+{
+   void *page;
+
+   page = module_alloc(PAGE_SIZE);
+   if (!page)
+   return NULL;
+
+   if (strict_kernel_rwx_enabled()) {


I'm not sure this is OK.

I think we need to make a new helper strict_module_rwx_enabled() because I don't think we want to 
call that when CONFIG_STRICT_MODULE_RWX is not selected.




+   set_memory_ro((unsigned long)page, 1);
+   set_memory_x((unsigned long)page, 1);
+   }
+   return page;
+}
+
  int arch_prepare_kprobe(struct kprobe *p)
  {
int ret = 0;



[PATCH kernel] powerpc/makefile: Remove flag duplicates when generating vdso linker scripts

2021-05-13 Thread Alexey Kardashevskiy
The cmd_cpp_lds_S rule already has -P and -U$(ARCH), so there is no need
to duplicate these; clean that up. Since only -C is left, and
scripts/Makefile.build has had -C removed since
commit 5cb0512c02ec ("Kbuild: don't pass "-C" to preprocessor when processing
linker scripts"),
this follows that lead and removes CPPFLAGS_vdso(32|64).lds altogether.

Signed-off-by: Alexey Kardashevskiy 
---

scripts/checkpatch.pl complains as it does not handle quotes in
the commit subject line well. oh well.

---
 arch/powerpc/kernel/vdso32/Makefile | 1 -
 arch/powerpc/kernel/vdso64/Makefile | 1 -
 2 files changed, 2 deletions(-)

diff --git a/arch/powerpc/kernel/vdso32/Makefile b/arch/powerpc/kernel/vdso32/Makefile
index 7d9a6fee0e3d..7420e88d5aa3 100644
--- a/arch/powerpc/kernel/vdso32/Makefile
+++ b/arch/powerpc/kernel/vdso32/Makefile
@@ -44,7 +44,6 @@ asflags-y := -D__VDSO32__ -s
 
 obj-y += vdso32_wrapper.o
 targets += vdso32.lds
-CPPFLAGS_vdso32.lds += -P -C -Upowerpc
 
 # link rule for the .so file, .lds has to be first
$(obj)/vdso32.so.dbg: $(src)/vdso32.lds $(obj-vdso32) $(obj)/vgettimeofday.o FORCE
diff --git a/arch/powerpc/kernel/vdso64/Makefile b/arch/powerpc/kernel/vdso64/Makefile
index 2813e3f98db6..fb118630c334 100644
--- a/arch/powerpc/kernel/vdso64/Makefile
+++ b/arch/powerpc/kernel/vdso64/Makefile
@@ -30,7 +30,6 @@ ccflags-y := -shared -fno-common -fno-builtin -nostdlib \
 asflags-y := -D__VDSO64__ -s
 
 targets += vdso64.lds
-CPPFLAGS_vdso64.lds += -P -C -U$(ARCH)
 
 # link rule for the .so file, .lds has to be first
$(obj)/vdso64.so.dbg: $(src)/vdso64.lds $(obj-vdso64) $(obj)/vgettimeofday.o FORCE
-- 
2.30.2



[PATCH] powerpc/64e/interrupt: fix nvgprs being clobbered

2021-05-13 Thread Nicholas Piggin
Some interrupt handlers have an "extra" that saves 1 or 2 registers
(r14, r15) in the paca save area and makes them available to use by the
handler.

The change to always save nvgprs in exception handlers led to some
interrupt handlers saving those scratch r14 / r15 registers into the
interrupt frame's GPR saves, which get restored on interrupt exit.

Fix this by always reloading those scratch registers from paca before the
EXCEPTION_COMMON that saves nvgprs.

Fixes: 4228b2c3d20e ("powerpc/64e/interrupt: always save nvgprs on interrupt")
Tested-by: Christian Zigotzky 
Reported-by: Christian Zigotzky 
Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kernel/exceptions-64e.S | 38 ++--
 1 file changed, 24 insertions(+), 14 deletions(-)

diff --git a/arch/powerpc/kernel/exceptions-64e.S b/arch/powerpc/kernel/exceptions-64e.S
index 7c3654b0d0f4..f1ae710274bc 100644
--- a/arch/powerpc/kernel/exceptions-64e.S
+++ b/arch/powerpc/kernel/exceptions-64e.S
@@ -340,6 +340,12 @@ ret_from_mc_except:
andi.   r10,r10,IRQS_DISABLED;  /* yes -> go out of line */ \
bne masked_interrupt_book3e_##n
 
+/*
+ * Additional regs must be re-loaded from paca before EXCEPTION_COMMON* is
+ * called, because that does SAVE_NVGPRS which must see the original register
+ * values, otherwise the scratch values might be restored when exiting the
+ * interrupt.
+ */
 #define PROLOG_ADDITION_2REGS_GEN(n)   \
std r14,PACA_EXGEN+EX_R14(r13); \
std r15,PACA_EXGEN+EX_R15(r13)
@@ -535,6 +541,10 @@ __end_interrupts:
PROLOG_ADDITION_2REGS)
mfspr   r14,SPRN_DEAR
mfspr   r15,SPRN_ESR
+   std r14,_DAR(r1)
+   std r15,_DSISR(r1)
+   ld  r14,PACA_EXGEN+EX_R14(r13)
+   ld  r15,PACA_EXGEN+EX_R15(r13)
EXCEPTION_COMMON(0x300)
b   storage_fault_common
 
@@ -544,6 +554,10 @@ __end_interrupts:
PROLOG_ADDITION_2REGS)
li  r15,0
mr  r14,r10
+   std r14,_DAR(r1)
+   std r15,_DSISR(r1)
+   ld  r14,PACA_EXGEN+EX_R14(r13)
+   ld  r15,PACA_EXGEN+EX_R15(r13)
EXCEPTION_COMMON(0x400)
b   storage_fault_common
 
@@ -557,6 +571,10 @@ __end_interrupts:
PROLOG_ADDITION_2REGS)
mfspr   r14,SPRN_DEAR
mfspr   r15,SPRN_ESR
+   std r14,_DAR(r1)
+   std r15,_DSISR(r1)
+   ld  r14,PACA_EXGEN+EX_R14(r13)
+   ld  r15,PACA_EXGEN+EX_R15(r13)
EXCEPTION_COMMON(0x600)
b   alignment_more  /* no room, go out of line */
 
@@ -565,10 +583,10 @@ __end_interrupts:
NORMAL_EXCEPTION_PROLOG(0x700, BOOKE_INTERRUPT_PROGRAM,
PROLOG_ADDITION_1REG)
mfspr   r14,SPRN_ESR
-   EXCEPTION_COMMON(0x700)
std r14,_DSISR(r1)
-   addi    r3,r1,STACK_FRAME_OVERHEAD
ld  r14,PACA_EXGEN+EX_R14(r13)
+   EXCEPTION_COMMON(0x700)
+   addi    r3,r1,STACK_FRAME_OVERHEAD
bl  program_check_exception
REST_NVGPRS(r1)
b   interrupt_return
@@ -725,11 +743,11 @@ END_FTR_SECTION_IFSET(CPU_FTR_ALTIVEC)
 * normal exception
 */
mfspr   r14,SPRN_DBSR
-   EXCEPTION_COMMON_CRIT(0xd00)
std r14,_DSISR(r1)
-   addi    r3,r1,STACK_FRAME_OVERHEAD
ld  r14,PACA_EXCRIT+EX_R14(r13)
ld  r15,PACA_EXCRIT+EX_R15(r13)
+   EXCEPTION_COMMON_CRIT(0xd00)
+   addi    r3,r1,STACK_FRAME_OVERHEAD
bl  DebugException
REST_NVGPRS(r1)
b   interrupt_return
@@ -796,11 +814,11 @@ kernel_dbg_exc:
 * normal exception
 */
mfspr   r14,SPRN_DBSR
-   EXCEPTION_COMMON_DBG(0xd08)
std r14,_DSISR(r1)
-   addi    r3,r1,STACK_FRAME_OVERHEAD
ld  r14,PACA_EXDBG+EX_R14(r13)
ld  r15,PACA_EXDBG+EX_R15(r13)
+   EXCEPTION_COMMON_DBG(0xd08)
+   addi    r3,r1,STACK_FRAME_OVERHEAD
bl  DebugException
REST_NVGPRS(r1)
b   interrupt_return
@@ -931,11 +949,7 @@ masked_interrupt_book3e_0x2c0:
  * original values stashed away in the PACA
  */
 storage_fault_common:
-   std r14,_DAR(r1)
-   std r15,_DSISR(r1)
addi    r3,r1,STACK_FRAME_OVERHEAD
-   ld  r14,PACA_EXGEN+EX_R14(r13)
-   ld  r15,PACA_EXGEN+EX_R15(r13)
bl  do_page_fault
b   interrupt_return
 
@@ -944,11 +958,7 @@ storage_fault_common:
  * continues here.
  */
 alignment_more:
-   std r14,_DAR(r1)
-   std r15,_DSISR(r1)
addi    r3,r1,STACK_FRAME_OVERHEAD
-   ld  r14,PACA_EXGEN+EX_R14(r13)
-   ld  r15,PACA_EXGEN+EX_R15(r13)
bl  alignment_exception
REST_NVGPRS(r1)
b   interrupt_return
-- 
2.23.0



Re: [PATCH kernel v3] powerpc/makefile: Do not redefine $(CPP) for preprocessor

2021-05-13 Thread Alexey Kardashevskiy




On 14/05/2021 12:42, Masahiro Yamada wrote:

On Fri, May 14, 2021 at 3:59 AM Nathan Chancellor  wrote:


On 5/13/2021 4:59 AM, Alexey Kardashevskiy wrote:

The $(CPP) (do only preprocessing) macro is already defined in Makefile.
However POWERPC redefines it and adds $(KBUILD_CFLAGS), which results
in flag duplication. That is not a big deal by itself, except for
flags which depend on other flags, which the compiler checks
as it parses the command line.

Specifically, scripts/Makefile.build:304 generates ksyms for .S files.
If clang+llvm+sanitizer are enabled, this results in

-emit-llvm-bc -fno-lto -flto -fvisibility=hidden \
   -fsanitize=cfi-mfcall -fno-lto  ...

in the clang command line and triggers error:


I do not know how to reproduce this for powerpc.
Currently, only x86 and arm64 select
ARCH_SUPPORTS_LTO_CLANG.

Is this a fix for a potential issue?


Yeah, it is work in progress to enable LTO_CLANG for PPC64:

https://github.com/aik/linux/commits/lto








clang-13: error: invalid argument '-fsanitize=cfi-mfcall' only allowed with '-flto'

This removes the unnecessary CPP redefinition, which works fine as in most
places KBUILD_CFLAGS is passed to $CPP, except for
arch/powerpc/kernel/vdso64/vdso(32|64).lds. To fix vdso, this does:
1. add -m(big|little)-endian to $CPP
2. add target to $KBUILD_CPPFLAGS as otherwise clang ignores 
-m(big|little)-endian if
the building platform does not support big endian (such as x86).

Signed-off-by: Alexey Kardashevskiy 
---
Changes:
v3:
* moved vdso cleanup in a separate patch
* only add target to KBUILD_CPPFLAGS for CLANG

v2:
* fix KBUILD_CPPFLAGS
* add CLANG_FLAGS to CPPFLAGS
---
   Makefile  | 1 +
   arch/powerpc/Makefile | 3 ++-
   2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/Makefile b/Makefile
index 15b6476d0f89..5b545bef7653 100644
--- a/Makefile
+++ b/Makefile
@@ -576,6 +576,7 @@ CC_VERSION_TEXT = $(subst $(pound),,$(shell $(CC) --version 
2>/dev/null | head -
   ifneq ($(findstring clang,$(CC_VERSION_TEXT)),)
   ifneq ($(CROSS_COMPILE),)
   CLANG_FLAGS += --target=$(notdir $(CROSS_COMPILE:%-=%))
+KBUILD_CPPFLAGS  += --target=$(notdir $(CROSS_COMPILE:%-=%))


You can avoid the duplication here by just doing:

KBUILD_CPPFLAGS += $(CLANG_FLAGS)

I am still not super happy about the flag duplication but I am not sure
I can think of a better solution. If KBUILD_CPPFLAGS are always included
when building .o files, maybe we should just add $(CLANG_FLAGS) to
KBUILD_CPPFLAGS instead of KBUILD_CFLAGS?


Hmm, I think including --target=* in CPP flags is sensible,
but not all CLANG_FLAGS are CPP flags.
At least, -(no)-integrated-as is not a CPP flag.

We could introduce a separate CLANG_CPP_FLAGS, but
it would require more code changes...

So, I do not have a strong opinion either way.



BTW, another approach might be to modify the linker script.


My best guess is that the reason why powerpc adds the endian flag to CPP
is this line in arch/powerpc/kernel/vdso64/vdso64.lds.S

#ifdef __LITTLE_ENDIAN__
OUTPUT_FORMAT("elf64-powerpcle", "elf64-powerpcle", "elf64-powerpcle")
#else
OUTPUT_FORMAT("elf64-powerpc", "elf64-powerpc", "elf64-powerpc")
#endif


You can use the CONFIG option to check the endian-ness.

#ifdef CONFIG_CPU_BIG_ENDIAN
OUTPUT_FORMAT("elf64-powerpc", "elf64-powerpc", "elf64-powerpc")
#else
OUTPUT_FORMAT("elf64-powerpcle", "elf64-powerpcle", "elf64-powerpcle")
#endif


All the big endian arches define CONFIG_CPU_BIG_ENDIAN.
(but not all little endian arches define CONFIG_CPU_LITTLE_ENDIAN)



This should work with .lds. But the missing --target=* might still hit us
somewhere else later; these include 3 header files each, and there might
be endianness-dependent stuff.






So,
#ifdef CONFIG_CPU_BIG_ENDIAN
< big endian code >
#else
   < little endian code >
#endif

works for all architectures.


The only exception is that you cannot replace the one in uapi headers.
   arch/powerpc/include/uapi/asm/byteorder.h: #ifdef __LITTLE_ENDIAN__
since it is exported to userspace, where CONFIG options are not available.



BTW, various flags are historically used.

  -  CONFIG_CPU_BIG_ENDIAN   /  CONFIG_CPU_LITTLE_ENDIAN
  -  __BIG_ENDIAN   / __LITTLE_ENDIAN
  -  __LITTLE_ENDIAN__ (powerpc only)



__LITTLE_ENDIAN__  is defined by powerpc gcc and clang.

My experiments...


[1] powerpc-linux-gnu-gcc -> __BIG_ENDIAN__ is defined

masahiro@grover:~$ echo | powerpc-linux-gnu-gcc -E  -dM -x c - | grep ENDIAN
#define __ORDER_LITTLE_ENDIAN__ 1234
#define __BIG_ENDIAN__ 1
#define __FLOAT_WORD_ORDER__ __ORDER_BIG_ENDIAN__
#define __ORDER_PDP_ENDIAN__ 3412
#define _BIG_ENDIAN 1
#define __BYTE_ORDER__ __ORDER_BIG_ENDIAN__
#define __VEC_ELEMENT_REG_ORDER__ __ORDER_BIG_ENDIAN__
#define __ORDER_BIG_ENDIAN__ 4321


[2] powerpc-linux-gnu-gcc + -mlittle-endian -> __LITTLE_ENDIAN__ is defined

masahiro@grover:~$ echo | powerpc-linux-gnu-gcc  -E  -dM   -x c -
-mlittle-endian  | grep ENDIAN
#define __ORDER_LITTLE_ENDIAN__ 1234
#define 

Re: [PATCH kernel v2] powerpc/makefile: Do not redefine $(CPP) for preprocessor

2021-05-13 Thread Masahiro Yamada
On Wed, May 12, 2021 at 7:29 PM Segher Boessenkool
 wrote:
>
> On Wed, May 12, 2021 at 01:48:53PM +1000, Alexey Kardashevskiy wrote:
> > >Oh!  I completely missed those few lines.  Sorry for that :-(
> >
> > Well, I probably should have made it a separate patch anyway, I'll
> > repost separately.
>
> Thanks.
>
> > >To compensate a bit:
> > >
> > >>It still puzzles me why we need -C
> > >>(preserve comments in the preprocessor output) flag here.
> > >
> > >It is so that a human can look at the output and read it.  Comments are
> > >very significant to human readers :-)
> >
> > I seriously doubt anyone ever read those :)
>
> I am pretty sure whoever wrote it did!


Keeping comments in the pre-processed linker scripts
is troublesome.

That is why -C was removed from scripts/Makefile.build


commit 5cb0512c02ecd7e6214e912e4c150f4219ac78e0
Author: Linus Torvalds 
Date:   Thu Nov 2 14:10:37 2017 -0700

Kbuild: don't pass "-C" to preprocessor when processing linker scripts




You can entirely remove

 CPPFLAGS_vdso32.lds += -P -C -Upowerpc





-- 
Best Regards
Masahiro Yamada


Re: [PATCH kernel v3] powerpc/makefile: Do not redefine $(CPP) for preprocessor

2021-05-13 Thread Masahiro Yamada
On Fri, May 14, 2021 at 3:59 AM Nathan Chancellor  wrote:
>
> On 5/13/2021 4:59 AM, Alexey Kardashevskiy wrote:
> > The $(CPP) (do only preprocessing) macro is already defined in Makefile.
> > However POWERPC redefines it and adds $(KBUILD_CFLAGS), which results
> > in flag duplication. That is not a big deal by itself, except for
> > flags which depend on other flags, which the compiler checks
> > as it parses the command line.
> >
> > Specifically, scripts/Makefile.build:304 generates ksyms for .S files.
> > If clang+llvm+sanitizer are enabled, this results in
> >
> > -emit-llvm-bc -fno-lto -flto -fvisibility=hidden \
> >   -fsanitize=cfi-mfcall -fno-lto  ...
> >
> > in the clang command line and triggers error:

I do not know how to reproduce this for powerpc.
Currently, only x86 and arm64 select
ARCH_SUPPORTS_LTO_CLANG.

Is this a fix for a potential issue?


> > clang-13: error: invalid argument '-fsanitize=cfi-mfcall' only allowed with 
> > '-flto'
> >
> > This removes the unnecessary CPP redefinition, which works fine as in most
> > places KBUILD_CFLAGS is passed to $CPP, except for
> > arch/powerpc/kernel/vdso64/vdso(32|64).lds. To fix vdso, this does:
> > 1. add -m(big|little)-endian to $CPP
> > 2. add target to $KBUILD_CPPFLAGS as otherwise clang ignores 
> > -m(big|little)-endian if
> > the building platform does not support big endian (such as x86).
> >
> > Signed-off-by: Alexey Kardashevskiy 
> > ---
> > Changes:
> > v3:
> > * moved vdso cleanup in a separate patch
> > * only add target to KBUILD_CPPFLAGS for CLANG
> >
> > v2:
> > * fix KBUILD_CPPFLAGS
> > * add CLANG_FLAGS to CPPFLAGS
> > ---
> >   Makefile  | 1 +
> >   arch/powerpc/Makefile | 3 ++-
> >   2 files changed, 3 insertions(+), 1 deletion(-)
> >
> > diff --git a/Makefile b/Makefile
> > index 15b6476d0f89..5b545bef7653 100644
> > --- a/Makefile
> > +++ b/Makefile
> > @@ -576,6 +576,7 @@ CC_VERSION_TEXT = $(subst $(pound),,$(shell $(CC) 
> > --version 2>/dev/null | head -
> >   ifneq ($(findstring clang,$(CC_VERSION_TEXT)),)
> >   ifneq ($(CROSS_COMPILE),)
> >   CLANG_FLAGS += --target=$(notdir $(CROSS_COMPILE:%-=%))
> > +KBUILD_CPPFLAGS  += --target=$(notdir $(CROSS_COMPILE:%-=%))
>
> You can avoid the duplication here by just doing:
>
> KBUILD_CPPFLAGS += $(CLANG_FLAGS)
>
> I am still not super happy about the flag duplication but I am not sure
> I can think of a better solution. If KBUILD_CPPFLAGS are always included
> when building .o files, maybe we should just add $(CLANG_FLAGS) to
> KBUILD_CPPFLAGS instead of KBUILD_CFLAGS?

Hmm, I think including --target=* in CPP flags is sensible,
but not all CLANG_FLAGS are CPP flags.
At least, -(no)-integrated-as is not a CPP flag.

We could introduce a separate CLANG_CPP_FLAGS, but
it would require more code changes...

So, I do not have a strong opinion either way.



BTW, another approach might be to modify the linker script.


My best guess is that the reason why powerpc adds the endian flag to CPP
is this line in arch/powerpc/kernel/vdso64/vdso64.lds.S

#ifdef __LITTLE_ENDIAN__
OUTPUT_FORMAT("elf64-powerpcle", "elf64-powerpcle", "elf64-powerpcle")
#else
OUTPUT_FORMAT("elf64-powerpc", "elf64-powerpc", "elf64-powerpc")
#endif


You can use the CONFIG option to check the endian-ness.

#ifdef CONFIG_CPU_BIG_ENDIAN
OUTPUT_FORMAT("elf64-powerpc", "elf64-powerpc", "elf64-powerpc")
#else
OUTPUT_FORMAT("elf64-powerpcle", "elf64-powerpcle", "elf64-powerpcle")
#endif


All the big endian arches define CONFIG_CPU_BIG_ENDIAN.
(but not all little endian arches define CONFIG_CPU_LITTLE_ENDIAN)


So,
#ifdef CONFIG_CPU_BIG_ENDIAN
   < big endian code >
#else
  < little endian code >
#endif

works for all architectures.


The only exception is that you cannot replace the one in uapi headers.
  arch/powerpc/include/uapi/asm/byteorder.h: #ifdef __LITTLE_ENDIAN__
since it is exported to userspace, where CONFIG options are not available.



BTW, various flags are historically used.

 -  CONFIG_CPU_BIG_ENDIAN   /  CONFIG_CPU_LITTLE_ENDIAN
 -  __BIG_ENDIAN   / __LITTLE_ENDIAN
 -  __LITTLE_ENDIAN__ (powerpc only)



__LITTLE_ENDIAN__  is defined by powerpc gcc and clang.

My experiments...


[1] powerpc-linux-gnu-gcc -> __BIG_ENDIAN__ is defined

masahiro@grover:~$ echo | powerpc-linux-gnu-gcc -E  -dM -x c - | grep ENDIAN
#define __ORDER_LITTLE_ENDIAN__ 1234
#define __BIG_ENDIAN__ 1
#define __FLOAT_WORD_ORDER__ __ORDER_BIG_ENDIAN__
#define __ORDER_PDP_ENDIAN__ 3412
#define _BIG_ENDIAN 1
#define __BYTE_ORDER__ __ORDER_BIG_ENDIAN__
#define __VEC_ELEMENT_REG_ORDER__ __ORDER_BIG_ENDIAN__
#define __ORDER_BIG_ENDIAN__ 4321


[2] powerpc-linux-gnu-gcc + -mlittle-endian -> __LITTLE_ENDIAN__ is defined

masahiro@grover:~$ echo | powerpc-linux-gnu-gcc  -E  -dM   -x c -
-mlittle-endian  | grep ENDIAN
#define __ORDER_LITTLE_ENDIAN__ 1234
#define _LITTLE_ENDIAN 1
#define __FLOAT_WORD_ORDER__ __ORDER_LITTLE_ENDIAN__
#define __ORDER_PDP_ENDIAN__ 3412
#define __LITTLE_ENDIAN__ 1
#define __BYTE_ORDER__ 

Re: [PATCH kernel v3] powerpc/makefile: Do not redefine $(CPP) for preprocessor

2021-05-13 Thread Alexey Kardashevskiy




On 14/05/2021 04:59, Nathan Chancellor wrote:

On 5/13/2021 4:59 AM, Alexey Kardashevskiy wrote:

The $(CPP) (do only preprocessing) macro is already defined in Makefile.
However POWERPC redefines it and adds $(KBUILD_CFLAGS), which results
in flag duplication. That is not a big deal by itself, except for
flags which depend on other flags, which the compiler checks
as it parses the command line.

Specifically, scripts/Makefile.build:304 generates ksyms for .S files.
If clang+llvm+sanitizer are enabled, this results in

-emit-llvm-bc -fno-lto -flto -fvisibility=hidden \
  -fsanitize=cfi-mfcall -fno-lto  ...

in the clang command line and triggers error:

clang-13: error: invalid argument '-fsanitize=cfi-mfcall' only allowed with '-flto'


This removes the unnecessary CPP redefinition, which works fine as in most
places KBUILD_CFLAGS is passed to $CPP, except for
arch/powerpc/kernel/vdso64/vdso(32|64).lds. To fix vdso, this does:
1. add -m(big|little)-endian to $CPP
2. add target to $KBUILD_CPPFLAGS as otherwise clang ignores 
-m(big|little)-endian if

the building platform does not support big endian (such as x86).

Signed-off-by: Alexey Kardashevskiy 
---
Changes:
v3:
* moved vdso cleanup in a separate patch
* only add target to KBUILD_CPPFLAGS for CLANG

v2:
* fix KBUILD_CPPFLAGS
* add CLANG_FLAGS to CPPFLAGS
---
  Makefile  | 1 +
  arch/powerpc/Makefile | 3 ++-
  2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/Makefile b/Makefile
index 15b6476d0f89..5b545bef7653 100644
--- a/Makefile
+++ b/Makefile
@@ -576,6 +576,7 @@ CC_VERSION_TEXT = $(subst $(pound),,$(shell $(CC) 
--version 2>/dev/null | head -

  ifneq ($(findstring clang,$(CC_VERSION_TEXT)),)
  ifneq ($(CROSS_COMPILE),)
  CLANG_FLAGS    += --target=$(notdir $(CROSS_COMPILE:%-=%))
+KBUILD_CPPFLAGS    += --target=$(notdir $(CROSS_COMPILE:%-=%))


You can avoid the duplication here by just doing:

KBUILD_CPPFLAGS    += $(CLANG_FLAGS)


This has the potential of duplicating even more flags, which is exactly
what I am trying to avoid here.



I am still not super happy about the flag duplication but I am not sure 
I can think of a better solution. If KBUILD_CPPFLAGS are always included 
when building .o files,



My understanding is that KBUILD_CPPFLAGS should not be added for .o. Who
actually knows or decides for sure what CPPFLAGS are for? :)



maybe we should just add $(CLANG_FLAGS) to 
KBUILD_CPPFLAGS instead of KBUILD_CFLAGS?



  endif
  ifeq ($(LLVM_IAS),1)
  CLANG_FLAGS    += -integrated-as
diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile
index 3212d076ac6a..306bfd2797ad 100644
--- a/arch/powerpc/Makefile
+++ b/arch/powerpc/Makefile
@@ -76,6 +76,7 @@ endif
  ifdef CONFIG_CPU_LITTLE_ENDIAN
  KBUILD_CFLAGS    += -mlittle-endian
+KBUILD_CPPFLAGS    += -mlittle-endian
  KBUILD_LDFLAGS    += -EL
  LDEMULATION    := lppc
  GNUTARGET    := powerpcle
@@ -83,6 +84,7 @@ MULTIPLEWORD    := -mno-multiple
  KBUILD_CFLAGS_MODULE += $(call cc-option,-mno-save-toc-indirect)
  else
  KBUILD_CFLAGS += $(call cc-option,-mbig-endian)
+KBUILD_CPPFLAGS += $(call cc-option,-mbig-endian)
  KBUILD_LDFLAGS    += -EB
  LDEMULATION    := ppc
  GNUTARGET    := powerpc
@@ -208,7 +210,6 @@ KBUILD_CPPFLAGS    += -I $(srctree)/arch/$(ARCH) $(asinstr)

  KBUILD_AFLAGS    += $(AFLAGS-y)
  KBUILD_CFLAGS    += $(call cc-option,-msoft-float)
  KBUILD_CFLAGS    += -pipe $(CFLAGS-y)
-CPP    = $(CC) -E $(KBUILD_CFLAGS)
  CHECKFLAGS    += -m$(BITS) -D__powerpc__ -D__powerpc$(BITS)__
  ifdef CONFIG_CPU_BIG_ENDIAN





--
Alexey


Re: [FSL P50x0] Xorg always restarts again and again after the PowerPC updates 5.13-1

2021-05-13 Thread Christian Zigotzky

On 14 May 2021 at 00:58am, Nicholas Piggin wrote:

Excerpts from Christian Zigotzky's message of May 14, 2021 6:20 am:

On 13 May 2021 at 07:00pm, Christophe Leroy wrote:

Ah yes, I remember this problem.

Can you select CONFIG_VIRT_CPU_ACCOUNTING_GEN in your configuration ?

Otherwise, I can try to fix the branch.

Christophe

I selected this. After that it compiles.

1. git bisect good - Xorg restarts again and again
      Output: [f9aa0ac1e9e82b60401ad567bdabc30598325bc1] Revert
"powerpc/64e/interrupt: use new interrupt return"
2. git bisect good - Xorg restarts again and again
      Output: [cd6d259a14704741bf0cd1dcadb84c0de22d7f77] Revert
"powerpc/64e/interrupt: always save nvgprs on interrupt"
3. git bisect bad - Xorg works
      Output: [9bfa20ef2ae54d3b9088dfbcde4ef97062cf5ef2] Revert
"powerpc/interrupt: update common interrupt code for"
4. git bisect good - Xorg restarts again and again
      Output:

cd6d259a14704741bf0cd1dcadb84c0de22d7f77 is the first bad commit
commit cd6d259a14704741bf0cd1dcadb84c0de22d7f77
Author: Christophe Leroy 
Date:   Thu May 13 09:52:06 2021 +

      Revert "powerpc/64e/interrupt: always save nvgprs on interrupt"

      This reverts commit 4228b2c3d20e9f80b847f809c38e6cf82864fa50.

:04 04 156542c857ad72776b69bb67b2f244afeeb7abd3
92ea86ed097fce16238b0c2f2b343473894e4e8e M    arch

Thank you both very much for chasing this down.

I think I see the problem, it's clobbering r14 and r15 for some
interrupts. Something like this is required, I'll give it more
review and testing though.

Thanks,
Nick

---
diff --git a/arch/powerpc/kernel/exceptions-64e.S 
b/arch/powerpc/kernel/exceptions-64e.S
index 7c3654b0d0f4..b91ef04f1ce2 100644
--- a/arch/powerpc/kernel/exceptions-64e.S
+++ b/arch/powerpc/kernel/exceptions-64e.S
@@ -535,6 +535,10 @@ __end_interrupts:
PROLOG_ADDITION_2REGS)
mfspr   r14,SPRN_DEAR
mfspr   r15,SPRN_ESR
+   std r14,_DAR(r1)
+   std r15,_DSISR(r1)
+   ld  r14,PACA_EXGEN+EX_R14(r13)
+   ld  r15,PACA_EXGEN+EX_R15(r13)
EXCEPTION_COMMON(0x300)
b   storage_fault_common
  
@@ -544,6 +548,10 @@ __end_interrupts:
PROLOG_ADDITION_2REGS)
li  r15,0
mr  r14,r10
+   std r14,_DAR(r1)
+   std r15,_DSISR(r1)
+   ld  r14,PACA_EXGEN+EX_R14(r13)
+   ld  r15,PACA_EXGEN+EX_R15(r13)
EXCEPTION_COMMON(0x400)
b   storage_fault_common
  
@@ -557,6 +565,10 @@ __end_interrupts:
PROLOG_ADDITION_2REGS)
mfspr   r14,SPRN_DEAR
mfspr   r15,SPRN_ESR
+   std r14,_DAR(r1)
+   std r15,_DSISR(r1)
+   ld  r14,PACA_EXGEN+EX_R14(r13)
+   ld  r15,PACA_EXGEN+EX_R15(r13)
EXCEPTION_COMMON(0x600)
b   alignment_more  /* no room, go out of line */
  
@@ -565,10 +577,10 @@ __end_interrupts:
NORMAL_EXCEPTION_PROLOG(0x700, BOOKE_INTERRUPT_PROGRAM,
PROLOG_ADDITION_1REG)
mfspr   r14,SPRN_ESR
-   EXCEPTION_COMMON(0x700)
std r14,_DSISR(r1)
-   addi    r3,r1,STACK_FRAME_OVERHEAD
ld  r14,PACA_EXGEN+EX_R14(r13)
+   EXCEPTION_COMMON(0x700)
+   addi    r3,r1,STACK_FRAME_OVERHEAD
bl  program_check_exception
REST_NVGPRS(r1)
b   interrupt_return
@@ -725,11 +737,11 @@ END_FTR_SECTION_IFSET(CPU_FTR_ALTIVEC)
 * normal exception
 */
mfspr   r14,SPRN_DBSR
-   EXCEPTION_COMMON_CRIT(0xd00)
std r14,_DSISR(r1)
-   addi    r3,r1,STACK_FRAME_OVERHEAD
ld  r14,PACA_EXCRIT+EX_R14(r13)
ld  r15,PACA_EXCRIT+EX_R15(r13)
+   EXCEPTION_COMMON_CRIT(0xd00)
+   addi    r3,r1,STACK_FRAME_OVERHEAD
bl  DebugException
REST_NVGPRS(r1)
b   interrupt_return
@@ -796,11 +808,11 @@ kernel_dbg_exc:
 * normal exception
 */
mfspr   r14,SPRN_DBSR
-   EXCEPTION_COMMON_DBG(0xd08)
std r14,_DSISR(r1)
-   addi    r3,r1,STACK_FRAME_OVERHEAD
ld  r14,PACA_EXDBG+EX_R14(r13)
ld  r15,PACA_EXDBG+EX_R15(r13)
+   EXCEPTION_COMMON_DBG(0xd08)
+   addi    r3,r1,STACK_FRAME_OVERHEAD
bl  DebugException
REST_NVGPRS(r1)
b   interrupt_return
@@ -931,11 +943,7 @@ masked_interrupt_book3e_0x2c0:
   * original values stashed away in the PACA
   */
  storage_fault_common:
-   std r14,_DAR(r1)
-   std r15,_DSISR(r1)
addi    r3,r1,STACK_FRAME_OVERHEAD
-   ld  r14,PACA_EXGEN+EX_R14(r13)
-   ld  r15,PACA_EXGEN+EX_R15(r13)
bl  do_page_fault
b   interrupt_return
  
@@ -944,11 +952,7 @@ storage_fault_common:
   * continues here.
   */
  alignment_more:
-   std r14,_DAR(r1)
-   std r15,_DSISR(r1)
addi    r3,r1,STACK_FRAME_OVERHEAD
-   ld  

Re: [FSL P50x0] Xorg always restarts again and again after the PowerPC updates 5.13-1

2021-05-13 Thread Nicholas Piggin
Excerpts from Christian Zigotzky's message of May 14, 2021 6:20 am:
> On 13 May 2021 at 07:00pm, Christophe Leroy wrote:
>>
>> Ah yes, I remember this problem.
>>
>> Can you select CONFIG_VIRT_CPU_ACCOUNTING_GEN in your configuration ?
>>
>> Otherwise, I can try to fix the branch.
>>
>> Christophe
> I selected this. After that it compiles.
> 
> 1. git bisect good - Xorg restarts again and again
>      Output: [f9aa0ac1e9e82b60401ad567bdabc30598325bc1] Revert 
> "powerpc/64e/interrupt: use new interrupt return"
> 2. git bisect good - Xorg restarts again and again
>      Output: [cd6d259a14704741bf0cd1dcadb84c0de22d7f77] Revert 
> "powerpc/64e/interrupt: always save nvgprs on interrupt"
> 3. git bisect bad - Xorg works
>      Output: [9bfa20ef2ae54d3b9088dfbcde4ef97062cf5ef2] Revert 
> "powerpc/interrupt: update common interrupt code for"
> 4. git bisect good - Xorg restarts again and again
>      Output:
> 
> cd6d259a14704741bf0cd1dcadb84c0de22d7f77 is the first bad commit
> commit cd6d259a14704741bf0cd1dcadb84c0de22d7f77
> Author: Christophe Leroy 
> Date:   Thu May 13 09:52:06 2021 +
> 
>      Revert "powerpc/64e/interrupt: always save nvgprs on interrupt"
> 
>      This reverts commit 4228b2c3d20e9f80b847f809c38e6cf82864fa50.
> 
> :04 04 156542c857ad72776b69bb67b2f244afeeb7abd3 
> 92ea86ed097fce16238b0c2f2b343473894e4e8e M    arch

Thank you both very much for chasing this down.

I think I see the problem, it's clobbering r14 and r15 for some 
interrupts. Something like this is required, I'll give it more
review and testing though.

Thanks,
Nick

---
diff --git a/arch/powerpc/kernel/exceptions-64e.S 
b/arch/powerpc/kernel/exceptions-64e.S
index 7c3654b0d0f4..b91ef04f1ce2 100644
--- a/arch/powerpc/kernel/exceptions-64e.S
+++ b/arch/powerpc/kernel/exceptions-64e.S
@@ -535,6 +535,10 @@ __end_interrupts:
PROLOG_ADDITION_2REGS)
mfspr   r14,SPRN_DEAR
mfspr   r15,SPRN_ESR
+   std r14,_DAR(r1)
+   std r15,_DSISR(r1)
+   ld  r14,PACA_EXGEN+EX_R14(r13)
+   ld  r15,PACA_EXGEN+EX_R15(r13)
EXCEPTION_COMMON(0x300)
b   storage_fault_common
 
@@ -544,6 +548,10 @@ __end_interrupts:
PROLOG_ADDITION_2REGS)
li  r15,0
mr  r14,r10
+   std r14,_DAR(r1)
+   std r15,_DSISR(r1)
+   ld  r14,PACA_EXGEN+EX_R14(r13)
+   ld  r15,PACA_EXGEN+EX_R15(r13)
EXCEPTION_COMMON(0x400)
b   storage_fault_common
 
@@ -557,6 +565,10 @@ __end_interrupts:
PROLOG_ADDITION_2REGS)
mfspr   r14,SPRN_DEAR
mfspr   r15,SPRN_ESR
+   std r14,_DAR(r1)
+   std r15,_DSISR(r1)
+   ld  r14,PACA_EXGEN+EX_R14(r13)
+   ld  r15,PACA_EXGEN+EX_R15(r13)
EXCEPTION_COMMON(0x600)
b   alignment_more  /* no room, go out of line */
 
@@ -565,10 +577,10 @@ __end_interrupts:
NORMAL_EXCEPTION_PROLOG(0x700, BOOKE_INTERRUPT_PROGRAM,
PROLOG_ADDITION_1REG)
mfspr   r14,SPRN_ESR
-   EXCEPTION_COMMON(0x700)
std r14,_DSISR(r1)
-   addi    r3,r1,STACK_FRAME_OVERHEAD
ld  r14,PACA_EXGEN+EX_R14(r13)
+   EXCEPTION_COMMON(0x700)
+   addi    r3,r1,STACK_FRAME_OVERHEAD
bl  program_check_exception
REST_NVGPRS(r1)
b   interrupt_return
@@ -725,11 +737,11 @@ END_FTR_SECTION_IFSET(CPU_FTR_ALTIVEC)
 * normal exception
 */
mfspr   r14,SPRN_DBSR
-   EXCEPTION_COMMON_CRIT(0xd00)
std r14,_DSISR(r1)
-   addi    r3,r1,STACK_FRAME_OVERHEAD
ld  r14,PACA_EXCRIT+EX_R14(r13)
ld  r15,PACA_EXCRIT+EX_R15(r13)
+   EXCEPTION_COMMON_CRIT(0xd00)
+   addi    r3,r1,STACK_FRAME_OVERHEAD
bl  DebugException
REST_NVGPRS(r1)
b   interrupt_return
@@ -796,11 +808,11 @@ kernel_dbg_exc:
 * normal exception
 */
mfspr   r14,SPRN_DBSR
-   EXCEPTION_COMMON_DBG(0xd08)
std r14,_DSISR(r1)
-   addi    r3,r1,STACK_FRAME_OVERHEAD
ld  r14,PACA_EXDBG+EX_R14(r13)
ld  r15,PACA_EXDBG+EX_R15(r13)
+   EXCEPTION_COMMON_DBG(0xd08)
+   addi    r3,r1,STACK_FRAME_OVERHEAD
bl  DebugException
REST_NVGPRS(r1)
b   interrupt_return
@@ -931,11 +943,7 @@ masked_interrupt_book3e_0x2c0:
  * original values stashed away in the PACA
  */
 storage_fault_common:
-   std r14,_DAR(r1)
-   std r15,_DSISR(r1)
addi    r3,r1,STACK_FRAME_OVERHEAD
-   ld  r14,PACA_EXGEN+EX_R14(r13)
-   ld  r15,PACA_EXGEN+EX_R15(r13)
bl  do_page_fault
b   interrupt_return
 
@@ -944,11 +952,7 @@ storage_fault_common:
  * continues here.
  */
 alignment_more:
-   std r14,_DAR(r1)
-   std r15,_DSISR(r1)
addi    r3,r1,STACK_FRAME_OVERHEAD
-   ld  

Re: [FSL P50x0] Xorg always restarts again and again after the PowerPC updates 5.13-1

2021-05-13 Thread Christian Zigotzky

On 13 May 2021 at 07:00pm, Christophe Leroy wrote:



On 13/05/2021 at 18:35, Christian Zigotzky wrote:

On 13 May 2021 at 5:51pm, Christophe Leroy wrote:



On 13/05/2021 at 17:19, Christian Zigotzky wrote:

On 13 May 2021 at 12:01 pm, Christophe Leroy wrote:



On 13/05/2021 at 08:47, Christian Zigotzky wrote:

Hi Christophe,

On 9. May 2021, at 19:37, Christophe Leroy 
 wrote:


Did I make an error in my analysis?


No, you didn’t. On the contrary you have found the issue. ;-) The 
issue is somewhere in the new interrupt code.


I'm not convinced, but let's give it a try.



ZZ | * | ceff77efa4f8 powerpc/64e/interrupt: Use new interrupt 
context tracking scheme


Could you please create a patch for reverting the new interrupt 
code? I would like to confirm your result.


Please fetch https://github.com/chleroy/linux.git and try the 
branch for_christian.


This is a revert of approximately a dozen commits around the changes
to 64e on top of v5.13-rc1.


If the top of the branch works for you, it would be great if you 
can find out which one of the reverts fixes the problem for real.


Thanks
Christophe
It's working! Great! I can use the RC1 on my FSL P5040. Thank you! 
The issue is definitely somewhere in the interrupt code. Please 
tell me the next steps.




Can you bisect between 5.13-rc1 and the top of the 'for_christian' 
branch to find the first 'good' commit ?


Take care: it is an "upside-down" bisect. A 'good' is a commit that does
_not_ work and a 'bad' is a commit that works.


git bisect start
git bisect bad 1c8f441f1485
git bisect good 6efb943b8616

Christophe

Hi Christophe,

Yes, I can. Shall I use the branch 'for_christian' or the default 
linux git for bisecting? I have tried it already with the branch 
'for_christian' but it doesn't compile.


git bisect start
git bisect bad 1c8f441f1485
git bisect good 6efb943b8616

Output: [d66b1d1aab0c1caad11eca417f515b86ec0bebe9] Revert 
"powerpc/64e/interrupt: Use new interrupt context tracking scheme"


Result:

arch/powerpc/kernel/interrupt.o: In function `.syscall_exit_prepare':
interrupt.c:(.text+0x278): undefined reference to `.schedule_user'
arch/powerpc/kernel/interrupt.o: In function 
`.interrupt_exit_user_prepare':

interrupt.c:(.text+0x340): undefined reference to `.schedule_user'
Makefile:1191: recipe for target 'vmlinux' failed
make: *** [vmlinux] Error 1



Ah yes, I remember this problem.

Can you select CONFIG_VIRT_CPU_ACCOUNTING_GEN in your configuration ?

Otherwise, I can try to fix the branch.

Christophe

I selected this. After that it compiles.

1. git bisect good - Xorg restarts again and again
    Output: [f9aa0ac1e9e82b60401ad567bdabc30598325bc1] Revert 
"powerpc/64e/interrupt: use new interrupt return"

2. git bisect good - Xorg restarts again and again
    Output: [cd6d259a14704741bf0cd1dcadb84c0de22d7f77] Revert 
"powerpc/64e/interrupt: always save nvgprs on interrupt"

3. git bisect bad - Xorg works
    Output: [9bfa20ef2ae54d3b9088dfbcde4ef97062cf5ef2] Revert 
"powerpc/interrupt: update common interrupt code for"

4. git bisect good - Xorg restarts again and again
    Output:

cd6d259a14704741bf0cd1dcadb84c0de22d7f77 is the first bad commit
commit cd6d259a14704741bf0cd1dcadb84c0de22d7f77
Author: Christophe Leroy 
Date:   Thu May 13 09:52:06 2021 +

    Revert "powerpc/64e/interrupt: always save nvgprs on interrupt"

    This reverts commit 4228b2c3d20e9f80b847f809c38e6cf82864fa50.

:040000 040000 156542c857ad72776b69bb67b2f244afeeb7abd3 92ea86ed097fce16238b0c2f2b343473894e4e8e M arch


- Christian


Re: [PATCH] x86: Define only {pud/pmd}_{set/clear}_huge when useful

2021-05-13 Thread Randy Dunlap
On 5/13/21 11:27 AM, Christophe Leroy wrote:
> When PUD and/or PMD are folded, those functions are useless
> and we have a stub in linux/pgtable.h
> 
> Reported-by: Randy Dunlap 
> Signed-off-by: Christophe Leroy 
> ---
>  arch/x86/mm/pgtable.c | 34 +++---
>  1 file changed, 19 insertions(+), 15 deletions(-)

WorksForMe. Thanks.

Acked-by: Randy Dunlap  # build-tested

-- 
~Randy



Re: [PATCH kernel v3] powerpc/makefile: Do not redefine $(CPP) for preprocessor

2021-05-13 Thread Nathan Chancellor

On 5/13/2021 4:59 AM, Alexey Kardashevskiy wrote:

The $(CPP) (do only preprocessing) macro is already defined in Makefile.
However, powerpc redefines it and adds $(KBUILD_CFLAGS), which results
in flag duplication. That is not a big deal by itself, except for
flags that depend on other flags, which the compiler checks
as it parses the command line.

Specifically, scripts/Makefile.build:304 generates ksyms for .S files.
If clang+llvm+sanitizer are enabled, this results in

-emit-llvm-bc -fno-lto -flto -fvisibility=hidden \
  -fsanitize=cfi-mfcall -fno-lto  ...

in the clang command line and triggers error:

clang-13: error: invalid argument '-fsanitize=cfi-mfcall' only allowed with 
'-flto'

This removes the unnecessary CPP redefinition. This works fine, as in most
places KBUILD_CFLAGS is passed to $CPP anyway, except for
arch/powerpc/kernel/vdso64/vdso(32|64).lds. To fix the vdso, this does:
1. add -m(big|little)-endian to $CPP
2. add --target to $KBUILD_CPPFLAGS, as otherwise clang ignores
-m(big|little)-endian if the build platform does not support big endian
(such as x86).

Signed-off-by: Alexey Kardashevskiy 
---
Changes:
v3:
* moved vdso cleanup in a separate patch
* only add target to KBUILD_CPPFLAGS for CLANG

v2:
* fix KBUILD_CPPFLAGS
* add CLANG_FLAGS to CPPFLAGS
---
  Makefile  | 1 +
  arch/powerpc/Makefile | 3 ++-
  2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/Makefile b/Makefile
index 15b6476d0f89..5b545bef7653 100644
--- a/Makefile
+++ b/Makefile
CC_VERSION_TEXT = $(subst $(pound),,$(shell $(CC) --version 2>/dev/null | head -n 1))
  ifneq ($(findstring clang,$(CC_VERSION_TEXT)),)
  ifneq ($(CROSS_COMPILE),)
  CLANG_FLAGS   += --target=$(notdir $(CROSS_COMPILE:%-=%))
+KBUILD_CPPFLAGS += --target=$(notdir $(CROSS_COMPILE:%-=%))


You can avoid the duplication here by just doing:

KBUILD_CPPFLAGS += $(CLANG_FLAGS)

I am still not super happy about the flag duplication but I am not sure 
I can think of a better solution. If KBUILD_CPPFLAGS are always included 
when building .o files, maybe we should just add $(CLANG_FLAGS) to 
KBUILD_CPPFLAGS instead of KBUILD_CFLAGS?



  endif
  ifeq ($(LLVM_IAS),1)
  CLANG_FLAGS   += -integrated-as
diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile
index 3212d076ac6a..306bfd2797ad 100644
--- a/arch/powerpc/Makefile
+++ b/arch/powerpc/Makefile
@@ -76,6 +76,7 @@ endif
  
  ifdef CONFIG_CPU_LITTLE_ENDIAN

  KBUILD_CFLAGS += -mlittle-endian
+KBUILD_CPPFLAGS += -mlittle-endian
  KBUILD_LDFLAGS+= -EL
  LDEMULATION   := lppc
  GNUTARGET := powerpcle
@@ -83,6 +84,7 @@ MULTIPLEWORD  := -mno-multiple
  KBUILD_CFLAGS_MODULE += $(call cc-option,-mno-save-toc-indirect)
  else
  KBUILD_CFLAGS += $(call cc-option,-mbig-endian)
+KBUILD_CPPFLAGS += $(call cc-option,-mbig-endian)
  KBUILD_LDFLAGS+= -EB
  LDEMULATION   := ppc
  GNUTARGET := powerpc
@@ -208,7 +210,6 @@ KBUILD_CPPFLAGS += -I $(srctree)/arch/$(ARCH) $(asinstr)
  KBUILD_AFLAGS += $(AFLAGS-y)
  KBUILD_CFLAGS += $(call cc-option,-msoft-float)
  KBUILD_CFLAGS += -pipe $(CFLAGS-y)
-CPP = $(CC) -E $(KBUILD_CFLAGS)
  
  CHECKFLAGS	+= -m$(BITS) -D__powerpc__ -D__powerpc$(BITS)__

  ifdef CONFIG_CPU_BIG_ENDIAN





[PATCH] x86: Define only {pud/pmd}_{set/clear}_huge when useful

2021-05-13 Thread Christophe Leroy
When PUD and/or PMD are folded, those functions are useless
and we have a stub in linux/pgtable.h

Reported-by: Randy Dunlap 
Signed-off-by: Christophe Leroy 
---
 arch/x86/mm/pgtable.c | 34 +++---
 1 file changed, 19 insertions(+), 15 deletions(-)

diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index d27cf69e811d..1303ff6ef7be 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -682,6 +682,7 @@ int p4d_clear_huge(p4d_t *p4d)
 }
 #endif
 
+#if CONFIG_PGTABLE_LEVELS > 3
 /**
  * pud_set_huge - setup kernel PUD mapping
  *
@@ -720,6 +721,23 @@ int pud_set_huge(pud_t *pud, phys_addr_t addr, pgprot_t prot)
return 1;
 }
 
+/**
+ * pud_clear_huge - clear kernel PUD mapping when it is set
+ *
+ * Returns 1 on success and 0 on failure (no PUD map is found).
+ */
+int pud_clear_huge(pud_t *pud)
+{
+   if (pud_large(*pud)) {
+   pud_clear(pud);
+   return 1;
+   }
+
+   return 0;
+}
+#endif
+
+#if CONFIG_PGTABLE_LEVELS > 2
 /**
  * pmd_set_huge - setup kernel PMD mapping
  *
@@ -750,21 +768,6 @@ int pmd_set_huge(pmd_t *pmd, phys_addr_t addr, pgprot_t prot)
return 1;
 }
 
-/**
- * pud_clear_huge - clear kernel PUD mapping when it is set
- *
- * Returns 1 on success and 0 on failure (no PUD map is found).
- */
-int pud_clear_huge(pud_t *pud)
-{
-   if (pud_large(*pud)) {
-   pud_clear(pud);
-   return 1;
-   }
-
-   return 0;
-}
-
 /**
  * pmd_clear_huge - clear kernel PMD mapping when it is set
  *
@@ -779,6 +782,7 @@ int pmd_clear_huge(pmd_t *pmd)
 
return 0;
 }
+#endif
 
 #ifdef CONFIG_X86_64
 /**
-- 
2.25.0



Re: [FSL P50x0] Xorg always restarts again and again after the PowerPC updates 5.13-1

2021-05-13 Thread Christophe Leroy




On 13/05/2021 at 18:35, Christian Zigotzky wrote:

On 13 May 2021 at 5:51pm, Christophe Leroy wrote:



On 13/05/2021 at 17:19, Christian Zigotzky wrote:

On 13 May 2021 at 12:01 pm, Christophe Leroy wrote:



On 13/05/2021 at 08:47, Christian Zigotzky wrote:

Hi Christophe,


On 9. May 2021, at 19:37, Christophe Leroy  wrote:

Did I make an error in my analysis?


No, you didn’t. On the contrary you have found the issue. ;-) The issue is somewhere in the new 
interrupt code.


I'm not convinced, but let's give it a try.




ZZ | * | ceff77efa4f8 powerpc/64e/interrupt: Use new interrupt context tracking 
scheme


Could you please create a patch for reverting the new interrupt code? I would like to confirm 
your result.


Please fetch https://github.com/chleroy/linux.git and try the branch 
for_christian.

This is a revert of approximately a dozen commits around the changes to 64e on top
of v5.13-rc1.

If the top of the branch works for you, it would be great if you can find out which one of the 
reverts fixes the problem for real.


Thanks
Christophe
It's working! Great! I can use the RC1 on my FSL P5040. Thank you! The issue is definitely 
somewhere in the interrupt code. Please tell me the next steps.




Can you bisect between 5.13-rc1 and the top of the 'for_christian' branch to find the first 'good' 
commit ?


Take care: it is an "upside-down" bisect. A 'good' is a commit that does _not_ work and a 'bad' is a
commit that works.


git bisect start
git bisect bad 1c8f441f1485
git bisect good 6efb943b8616

Christophe

Hi Christophe,

Yes, I can. Shall I use the branch 'for_christian' or the default linux git for bisecting? I have 
tried it already with the branch 'for_christian' but it doesn't compile.


git bisect start
git bisect bad 1c8f441f1485
git bisect good 6efb943b8616

Output: [d66b1d1aab0c1caad11eca417f515b86ec0bebe9] Revert "powerpc/64e/interrupt: Use new interrupt 
context tracking scheme"


Result:

arch/powerpc/kernel/interrupt.o: In function `.syscall_exit_prepare':
interrupt.c:(.text+0x278): undefined reference to `.schedule_user'
arch/powerpc/kernel/interrupt.o: In function `.interrupt_exit_user_prepare':
interrupt.c:(.text+0x340): undefined reference to `.schedule_user'
Makefile:1191: recipe for target 'vmlinux' failed
make: *** [vmlinux] Error 1



Ah yes, I remember this problem.

Can you select CONFIG_VIRT_CPU_ACCOUNTING_GEN in your configuration ?

Otherwise, I can try to fix the branch.

Christophe


Re: [FSL P50x0] Xorg always restarts again and again after the PowerPC updates 5.13-1

2021-05-13 Thread Christian Zigotzky

On 13 May 2021 at 5:51pm, Christophe Leroy wrote:



Le 13/05/2021 à 17:19, Christian Zigotzky a écrit :

On 13 May 2021 at 12:01 pm, Christophe Leroy wrote:



Le 13/05/2021 à 08:47, Christian Zigotzky a écrit :

Hi Christophe,

On 9. May 2021, at 19:37, Christophe Leroy 
 wrote:


Did I make an error in my analysis?


No, you didn’t. On the contrary you have found the issue. ;-) The 
issue is somewhere in the new interrupt code.


I'm not convinced, but let's give it a try.



ZZ | * | ceff77efa4f8 powerpc/64e/interrupt: Use new interrupt 
context tracking scheme


Could you please create a patch for reverting the new interrupt 
code? I would like to confirm your result.


Please fetch https://github.com/chleroy/linux.git and try the branch 
for_christian.


This is a revert of approximately a dozen commits around the changes to
64e on top of v5.13-rc1.


If the top of the branch works for you, it would be great if you can 
find out which one of the reverts fixes the problem for real.


Thanks
Christophe
It's working! Great! I can use the RC1 on my FSL P5040. Thank you! 
The issue is definitely somewhere in the interrupt code. Please tell 
me the next steps.




Can you bisect between 5.13-rc1 and the top of the 'for_christian' 
branch to find the first 'good' commit ?


Take care: it is an "upside-down" bisect. A 'good' is a commit that does
_not_ work and a 'bad' is a commit that works.


git bisect start
git bisect bad 1c8f441f1485
git bisect good 6efb943b8616

Christophe

Hi Christophe,

Yes, I can. Shall I use the branch 'for_christian' or the default linux 
git for bisecting? I have tried it already with the branch 
'for_christian' but it doesn't compile.


git bisect start
git bisect bad 1c8f441f1485
git bisect good 6efb943b8616

Output: [d66b1d1aab0c1caad11eca417f515b86ec0bebe9] Revert 
"powerpc/64e/interrupt: Use new interrupt context tracking scheme"


Result:

arch/powerpc/kernel/interrupt.o: In function `.syscall_exit_prepare':
interrupt.c:(.text+0x278): undefined reference to `.schedule_user'
arch/powerpc/kernel/interrupt.o: In function `.interrupt_exit_user_prepare':
interrupt.c:(.text+0x340): undefined reference to `.schedule_user'
Makefile:1191: recipe for target 'vmlinux' failed
make: *** [vmlinux] Error 1



Thanks,
Christian


Re: [FSL P50x0] Xorg always restarts again and again after the PowerPC updates 5.13-1

2021-05-13 Thread Christophe Leroy




On 13/05/2021 at 17:19, Christian Zigotzky wrote:

On 13 May 2021 at 12:01 pm, Christophe Leroy wrote:



On 13/05/2021 at 08:47, Christian Zigotzky wrote:

Hi Christophe,


On 9. May 2021, at 19:37, Christophe Leroy  wrote:

Did I make an error in my analysis?


No, you didn’t. On the contrary you have found the issue. ;-) The issue is somewhere in the new 
interrupt code.


I'm not convinced, but let's give it a try.




ZZ | * | ceff77efa4f8 powerpc/64e/interrupt: Use new interrupt context tracking 
scheme


Could you please create a patch for reverting the new interrupt code? I would like to confirm 
your result.


Please fetch https://github.com/chleroy/linux.git and try the branch 
for_christian.

This is a revert of approximately a dozen commits around the changes to 64e on top
of v5.13-rc1.

If the top of the branch works for you, it would be great if you can find out which one of the 
reverts fixes the problem for real.


Thanks
Christophe
It's working! Great! I can use the RC1 on my FSL P5040. Thank you! The issue is definitely somewhere 
in the interrupt code. Please tell me the next steps.




Can you bisect between 5.13-rc1 and the top of the 'for_christian' branch to find the first 'good' 
commit ?


Take care: it is an "upside-down" bisect. A 'good' is a commit that does _not_ work and a 'bad' is a
commit that works.


git bisect start
git bisect bad 1c8f441f1485
git bisect good 6efb943b8616

Christophe


Re: [FSL P50x0] Xorg always restarts again and again after the PowerPC updates 5.13-1

2021-05-13 Thread Christian Zigotzky

On 13 May 2021 at 12:01 pm, Christophe Leroy wrote:



On 13/05/2021 at 08:47, Christian Zigotzky wrote:

Hi Christophe,

On 9. May 2021, at 19:37, Christophe Leroy 
 wrote:


Did I make an error in my analysis?


No, you didn’t. On the contrary you have found the issue. ;-) The 
issue is somewhere in the new interrupt code.


I'm not convinced, but let's give it a try.



ZZ | * | ceff77efa4f8 powerpc/64e/interrupt: Use new interrupt 
context tracking scheme


Could you please create a patch for reverting the new interrupt code? 
I would like to confirm your result.


Please fetch https://github.com/chleroy/linux.git and try the branch 
for_christian.


This is a revert of approximately a dozen commits around the changes to
64e on top of v5.13-rc1.


If the top of the branch works for you, it would be great if you can 
find out which one of the reverts fixes the problem for real.


Thanks
Christophe
It's working! Great! I can use the RC1 on my FSL P5040. Thank you! The 
issue is definitely somewhere in the interrupt code. Please tell me the 
next steps.


- Christian


[PATCH 1/2] powerpc/64s: Fix entry flush patching w/strict RWX & hash

2021-05-13 Thread Michael Ellerman
The entry flush mitigation can be enabled/disabled at runtime. When this
happens it results in the kernel patching its own instructions to
enable/disable the mitigation sequence.

With strict kernel RWX enabled instruction patching happens via a
secondary mapping of the kernel text, so that we don't have to make the
primary mapping writable. With the hash MMU this leads to a hash fault,
which causes us to execute the exception entry which contains the entry
flush mitigation.

This means we end up executing the entry flush in a semi-patched state,
ie. after we have patched the first instruction but before we patch the
second or third instruction of the sequence.

On machines with updated firmware the entry flush is a series of special
nops, and it's safe to execute in a semi-patched state.

However when using the fallback flush the sequence is mflr/branch/mtlr,
and so it's not safe to execute if we have patched out the mflr but not
the other two instructions. Doing so corrupts LR, leading to an oops,
for example:

  # echo 0 > /sys/kernel/debug/powerpc/entry_flush
  kernel tried to execute exec-protected page (c2971000) - exploit attempt? (uid: 0)
  BUG: Unable to handle kernel instruction fetch
  Faulting instruction address: 0xc2971000
  Oops: Kernel access of bad area, sig: 11 [#1]
  LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
  CPU: 0 PID: 2215 Comm: bash Not tainted 5.13.0-rc1-00010-gda3bb206c9ce #1
  NIP:  c2971000 LR: c2971000 CTR: c0120c40
  REGS: c00013243840 TRAP: 0400   Not tainted  (5.13.0-rc1-00010-gda3bb206c9ce)
  MSR:  800010009033   CR: 48428482  XER: 
  ...
  NIP  0xc2971000
  LR   0xc2971000
  Call Trace:
do_patch_instruction+0xc4/0x340 (unreliable)
do_entry_flush_fixups+0x100/0x3b0
entry_flush_set+0x50/0xe0
simple_attr_write+0x160/0x1a0
full_proxy_write+0x8c/0x110
vfs_write+0xf0/0x340
ksys_write+0x84/0x140
system_call_exception+0x164/0x2d0
system_call_common+0xec/0x278

The simplest fix is to change the order in which we patch the
instructions, so that the sequence is always safe to execute. For the
non-fallback flushes it doesn't matter what order we patch in.

Fixes: bd573a81312f ("powerpc/mm/64s: Allow STRICT_KERNEL_RWX again")
Signed-off-by: Michael Ellerman 
---
 arch/powerpc/lib/feature-fixups.c | 56 ++-
 1 file changed, 40 insertions(+), 16 deletions(-)

diff --git a/arch/powerpc/lib/feature-fixups.c b/arch/powerpc/lib/feature-fixups.c
index 0aefa6a4a259..b49bb41e3ec5 100644
--- a/arch/powerpc/lib/feature-fixups.c
+++ b/arch/powerpc/lib/feature-fixups.c
@@ -325,6 +325,28 @@ static int __do_entry_flush_fixups(void *data)
if (types & L1D_FLUSH_MTTRIG)
instrs[i++] = 0x7c12dba6; /* mtspr TRIG2,r0 (SPR #882) */
 
+	/*
+	 * If we're patching in or out the fallback flush we need to be careful about the
+	 * order in which we patch instructions. That's because it's possible we could
+	 * take a page fault after patching one instruction, so the sequence of
+	 * instructions must be safe even in a half patched state.
+	 *
+	 * To make that work, when patching in the fallback flush we patch in this order:
+	 *  - the mflr		(dest)
+	 *  - the mtlr		(dest + 2)
+	 *  - the branch	(dest + 1)
+	 *
+	 * That ensures the sequence is safe to execute at any point. In contrast if we
+	 * patch the mtlr last, it's possible we could return from the branch and not
+	 * restore LR, leading to a crash later.
+	 *
+	 * When patching out the fallback flush (either with nops or another flush type),
+	 * we patch in this order:
+	 *  - the branch	(dest + 1)
+	 *  - the mtlr		(dest + 2)
+	 *  - the mflr		(dest)
+	 */
+
start = PTRRELOC(&__start___entry_flush_fixup);
end = PTRRELOC(&__stop___entry_flush_fixup);
for (i = 0; start < end; start++, i++) {
@@ -332,15 +354,16 @@ static int __do_entry_flush_fixups(void *data)
 
pr_devel("patching dest %lx\n", (unsigned long)dest);
 
-		patch_instruction((struct ppc_inst *)dest, ppc_inst(instrs[0]));
-
-		if (types == L1D_FLUSH_FALLBACK)
-			patch_branch((struct ppc_inst *)(dest + 1), (unsigned long)&entry_flush_fallback,
-				     BRANCH_SET_LINK);
-		else
+		if (types == L1D_FLUSH_FALLBACK) {
+			patch_instruction((struct ppc_inst *)dest, ppc_inst(instrs[0]));
+			patch_instruction((struct ppc_inst *)(dest + 2), ppc_inst(instrs[2]));
+			patch_branch((struct ppc_inst *)(dest + 1),
+				     (unsigned long)&entry_flush_fallback, BRANCH_SET_LINK);
+		} else {
  

[PATCH 2/2] powerpc/64s: Fix stf mitigation patching w/strict RWX & hash

2021-05-13 Thread Michael Ellerman
The stf entry barrier fallback is unsafe to execute in a semi-patched
state, which can happen when enabling/disabling the mitigation with
strict kernel RWX enabled and using the hash MMU.

See the previous commit for more details.

Fix it by changing the order in which we patch the instructions.

Note the stf barrier fallback is only used on Power6 or earlier.

Fixes: bd573a81312f ("powerpc/mm/64s: Allow STRICT_KERNEL_RWX again")
Signed-off-by: Michael Ellerman 
---
 arch/powerpc/lib/feature-fixups.c | 20 ++--
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/lib/feature-fixups.c b/arch/powerpc/lib/feature-fixups.c
index b49bb41e3ec5..71032475aa40 100644
--- a/arch/powerpc/lib/feature-fixups.c
+++ b/arch/powerpc/lib/feature-fixups.c
@@ -150,17 +150,17 @@ static void do_stf_entry_barrier_fixups(enum stf_barrier_type types)
 
pr_devel("patching dest %lx\n", (unsigned long)dest);
 
-		patch_instruction((struct ppc_inst *)dest, ppc_inst(instrs[0]));
-
-		if (types & STF_BARRIER_FALLBACK)
+		// See comment in do_entry_flush_fixups() RE order of patching
+		if (types & STF_BARRIER_FALLBACK) {
+			patch_instruction((struct ppc_inst *)dest, ppc_inst(instrs[0]));
+			patch_instruction((struct ppc_inst *)(dest + 2), ppc_inst(instrs[2]));
 			patch_branch((struct ppc_inst *)(dest + 1),
-				     (unsigned long)&stf_barrier_fallback,
-				     BRANCH_SET_LINK);
-		else
-			patch_instruction((struct ppc_inst *)(dest + 1),
-					  ppc_inst(instrs[1]));
-
-		patch_instruction((struct ppc_inst *)(dest + 2), ppc_inst(instrs[2]));
+				     (unsigned long)&stf_barrier_fallback, BRANCH_SET_LINK);
+		} else {
+			patch_instruction((struct ppc_inst *)(dest + 1), ppc_inst(instrs[1]));
+			patch_instruction((struct ppc_inst *)(dest + 2), ppc_inst(instrs[2]));
+			patch_instruction((struct ppc_inst *)dest, ppc_inst(instrs[0]));
+		}
}
 
	printk(KERN_DEBUG "stf-barrier: patched %d entry locations (%s barrier)\n", i,
-- 
2.25.1



[PATCH 0/8] xen: harden frontends against malicious backends

2021-05-13 Thread Juergen Gross
Xen backends of para-virtualized devices can live in dom0 kernel, dom0
user land, or in a driver domain. This means that a backend might
reside in a less trusted environment than the Xen core components, so
a backend should not be able to do harm to a Xen guest (it can still
mess up I/O data, but it shouldn't be able to e.g. crash a guest by
other means or cause a privilege escalation in the guest).

Unfortunately many frontends in the Linux kernel fully trust their
respective backends. This series starts to fix the most important
frontends: console, disk and network.

Handling this as a security problem was considered, but the topic had
already been discussed in public, so it isn't a real secret.

Juergen Gross (8):
  xen: sync include/xen/interface/io/ring.h with Xen's newest version
  xen/blkfront: read response from backend only once
  xen/blkfront: don't take local copy of a request from the ring page
  xen/blkfront: don't trust the backend response data blindly
  xen/netfront: read response from backend only once
  xen/netfront: don't read data from request on the ring page
  xen/netfront: don't trust the backend response data blindly
  xen/hvc: replace BUG_ON() with negative return value

 drivers/block/xen-blkfront.c| 118 +-
 drivers/net/xen-netfront.c  | 184 ++---
 drivers/tty/hvc/hvc_xen.c   |  15 +-
 include/xen/interface/io/ring.h | 278 ++--
 4 files changed, 369 insertions(+), 226 deletions(-)

-- 
2.26.2



Re: [PATCH 01/11] PCI: Use sysfs_emit() and sysfs_emit_at() in "show" functions

2021-05-13 Thread Krzysztof Wilczyński
Hi Logan,

[...]
> Thanks, this is a great cleanup. I've reviewed the entire series.
> 
> Reviewed-by: Logan Gunthorpe 

Thank you!  Appreciate it!

> I agree that the new lines that are missing should be added.

While working on the simple change to add the missing new lines, I
found that we have a slight inconsistency in how these are handled in
one of the attributes, as per:

  # uname -r
  5.13.0-rc1-00092-gc06a2ba62fc4
  
  # grep -oE 'pci=resource_alignment.+' /proc/cmdline 
  pci=resource_alignment=20@00:1f.2

  # cat /sys/bus/pci/resource_alignment
  20@00:1f.
  
  # echo 20@00:1f.2 > /sys/bus/pci/resource_alignment
  # cat /sys/bus/pci/resource_alignment
  20@00:1f.2

  # echo -n 20@00:1f.2 > /sys/bus/pci/resource_alignment
  # cat /sys/bus/pci/resource_alignment
  20@00:1f.

I adjusted the code so that we handle the input better in the upcoming
v2 - the change is simple, but since it adds more code, I'll drop your
"Reviewed-by" so that we can make sure the subsequent review covers
this new change. Sorry for the trouble in advance!

Krzysztof


Re: [RFC 1/4] drivers/nvdimm: Add perf interface to expose nvdimm performance stats

2021-05-13 Thread kajoljain



On 5/12/21 10:57 PM, Peter Zijlstra wrote:
> On Wed, May 12, 2021 at 10:08:21PM +0530, Kajol Jain wrote:
>> +static void nvdimm_pmu_read(struct perf_event *event)
>> +{
>> +struct nvdimm_pmu *nd_pmu = to_nvdimm_pmu(event->pmu);
>> +
>> +/* jump to arch/platform specific callbacks if any */
>> +if (nd_pmu && nd_pmu->read)
>> +nd_pmu->read(event, nd_pmu->dev);
>> +}
>> +
>> +static void nvdimm_pmu_del(struct perf_event *event, int flags)
>> +{
>> +struct nvdimm_pmu *nd_pmu = to_nvdimm_pmu(event->pmu);
>> +
>> +/* jump to arch/platform specific callbacks if any */
>> +if (nd_pmu && nd_pmu->del)
>> +nd_pmu->del(event, flags, nd_pmu->dev);
>> +}
>> +
>> +static int nvdimm_pmu_add(struct perf_event *event, int flags)
>> +{
>> +struct nvdimm_pmu *nd_pmu = to_nvdimm_pmu(event->pmu);
>> +
>> +if (flags & PERF_EF_START)
>> +/* jump to arch/platform specific callbacks if any */
>> +if (nd_pmu && nd_pmu->add)
>> +return nd_pmu->add(event, flags, nd_pmu->dev);
>> +return 0;
>> +}
> 
> What's the value add here? Why can't you directly set driver pointers? I
> also don't really believe ->{add,del,read} can be optional and still
> have a sane driver.
> 

Hi Peter,

The intent of adding these callbacks is to give the arch/platform
specific driver code the flexibility to use its own routine for getting
counter data or for specific checks/operations. Arch/platform driver
code may have a different method to get the counter data, like the IBM
pseries nmem* device, which uses a hypervisor call (hcall).

But yes, the current read/add/del functions are not adding value. We
could add an arch/platform specific function which handles capturing
the counter data and do the rest of the operations here. Is this
approach better?

Thanks,
Kajol Jain





[PATCH 4/4] KVM: PPC: Book3S HV P9: Implement the rest of the P9 path in C

2021-05-13 Thread Nicholas Piggin
Almost all logic is moved to C, by introducing a new in_guest mode for
the P9 path that branches very early in the KVM interrupt handler to P9
exit code.

The main P9 entry and exit assembly is now only about 160 lines of low
level stack setup and register save/restore, plus a bad-interrupt
handler.

There are two motivations for this. The first is simply to make the code
more maintainable by having it in C. The second is to reduce the amount
of code running in a special KVM mode, "realmode". In quotes because with
radix it is no longer necessarily real-mode in the MMU, but it still has
to be treated specially because it may be in real-mode, and it has various
important registers like PID, DEC, TB, etc. set to guest values. This is
hostile to the rest of Linux and can't use arbitrary kernel functionality
or be instrumented well.

This initial patch is a reasonably faithful conversion of the asm code,
but it does lack any loop to return quickly back into the guest without
switching out of realmode in the case of unimportant or easily handled
interrupts. As explained in previous changes, handling HV interrupts
very quickly in this low level realmode is not so important for P9
performance, and such handlers are important to avoid for security,
observability and debuggability reasons.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/include/asm/asm-prototypes.h |   3 +-
 arch/powerpc/include/asm/kvm_asm.h|   1 +
 arch/powerpc/include/asm/kvm_book3s_64.h  |   8 +
 arch/powerpc/include/asm/kvm_host.h   |   7 +-
 arch/powerpc/kernel/security.c|   5 +-
 arch/powerpc/kvm/Makefile |   1 +
 arch/powerpc/kvm/book3s_64_entry.S| 254 ++
 arch/powerpc/kvm/book3s_hv.c  |   9 +-
 arch/powerpc/kvm/book3s_hv_p9_entry.c | 207 ++
 arch/powerpc/kvm/book3s_hv_rmhandlers.S   | 125 +--
 10 files changed, 496 insertions(+), 124 deletions(-)
 create mode 100644 arch/powerpc/kvm/book3s_hv_p9_entry.c

diff --git a/arch/powerpc/include/asm/asm-prototypes.h 
b/arch/powerpc/include/asm/asm-prototypes.h
index 1c7b75834e04..02ee6f5ac9fe 100644
--- a/arch/powerpc/include/asm/asm-prototypes.h
+++ b/arch/powerpc/include/asm/asm-prototypes.h
@@ -120,6 +120,7 @@ extern s32 patch__call_flush_branch_caches3;
 extern s32 patch__flush_count_cache_return;
 extern s32 patch__flush_link_stack_return;
 extern s32 patch__call_kvm_flush_link_stack;
+extern s32 patch__call_kvm_flush_link_stack_p9;
 extern s32 patch__memset_nocache, patch__memcpy_nocache;
 
 extern long flush_branch_caches;
@@ -140,7 +141,7 @@ void kvmhv_load_host_pmu(void);
 void kvmhv_save_guest_pmu(struct kvm_vcpu *vcpu, bool pmu_in_use);
 void kvmhv_load_guest_pmu(struct kvm_vcpu *vcpu);
 
-int __kvmhv_vcpu_entry_p9(struct kvm_vcpu *vcpu);
+void kvmppc_p9_enter_guest(struct kvm_vcpu *vcpu);
 
 long kvmppc_h_set_dabr(struct kvm_vcpu *vcpu, unsigned long dabr);
 long kvmppc_h_set_xdabr(struct kvm_vcpu *vcpu, unsigned long dabr,
diff --git a/arch/powerpc/include/asm/kvm_asm.h 
b/arch/powerpc/include/asm/kvm_asm.h
index a3633560493b..43b1788e1f93 100644
--- a/arch/powerpc/include/asm/kvm_asm.h
+++ b/arch/powerpc/include/asm/kvm_asm.h
@@ -147,6 +147,7 @@
 #define KVM_GUEST_MODE_SKIP2
 #define KVM_GUEST_MODE_GUEST_HV3
 #define KVM_GUEST_MODE_HOST_HV 4
+#define KVM_GUEST_MODE_HV_FAST 5 /* ISA >= v3.0 host+guest radix, indep thr */
 
 #define KVM_INST_FETCH_FAILED  -1
 
diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h 
b/arch/powerpc/include/asm/kvm_book3s_64.h
index 9bb9bb370b53..c214bcffb441 100644
--- a/arch/powerpc/include/asm/kvm_book3s_64.h
+++ b/arch/powerpc/include/asm/kvm_book3s_64.h
@@ -153,9 +153,17 @@ static inline bool kvmhv_vcpu_is_radix(struct kvm_vcpu 
*vcpu)
return radix;
 }
 
+int __kvmhv_vcpu_entry_p9(struct kvm_vcpu *vcpu);
+
 #define KVM_DEFAULT_HPT_ORDER  24  /* 16MB HPT by default */
 #endif
 
+/*
+ * Invalid HDSISR value which is used to indicate when HW has not set the reg.
+ * Used to work around an errata.
+ */
+#define HDSISR_CANARY  0x7fff
+
 /*
  * We use a lock bit in HPTE dword 0 to synchronize updates and
  * accesses to each HPTE, and another bit to indicate non-present
diff --git a/arch/powerpc/include/asm/kvm_host.h 
b/arch/powerpc/include/asm/kvm_host.h
index 1e83359f286b..69add9d662df 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -683,7 +683,12 @@ struct kvm_vcpu_arch {
ulong fault_dar;
u32 fault_dsisr;
unsigned long intr_msr;
-   ulong fault_gpa;/* guest real address of page fault (POWER9) */
+   /*
+* POWER9 and later: fault_gpa contains the guest real address of page
+* fault for a radix guest, or segment descriptor (equivalent to result
+* from slbmfev of SLB entry that translated the EA) for hash guests.
+*/
+   ulong fault_gpa;
 #endif
 
 #ifdef CONFIG_BOOKE
diff --git a/arch/powerpc/kernel/security.c 

[PATCH 3/4] KVM: PPC: Book3S HV P9: Stop handling hcalls in real-mode in the P9 path

2021-05-13 Thread Nicholas Piggin
In the interest of minimising the amount of code that is run in
"real-mode", don't handle hcalls in real mode in the P9 path. This
requires some new handlers for H_CEDE and xics-on-xive to be added
before xive is pulled or cede logic is checked.

This introduces a change in radix guest behaviour where radix guests
that execute 'sc 1' in userspace now get a privilege fault whereas
previously the 'sc 1' would be reflected as a syscall interrupt to the
guest kernel. That reflection is only required for hash guests that run
PR KVM.
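
That dispatch decision can be sketched roughly as follows (a simplified userspace illustration, not the kernel code; MSR_PR is the real architected "problem state" bit, the rest of the names are mine):

```c
#include <assert.h>

/*
 * A system call interrupt coming out of the guest is only an hcall
 * candidate if the guest was in privileged state (MSR[PR] = 0).
 * From guest userspace (MSR[PR] = 1) it is treated as a privilege
 * fault instead of being reflected as a syscall interrupt.
 */
#define MSR_PR 0x4000ULL

enum syscall_action { HANDLE_HCALL, INJECT_PRIV_FAULT };

enum syscall_action classify_guest_syscall(unsigned long long guest_msr)
{
	return (guest_msr & MSR_PR) ? INJECT_PRIV_FAULT : HANDLE_HCALL;
}
```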

Background:

In POWER8 and earlier processors, it is very expensive to exit from the
HV real mode context of a guest hypervisor interrupt, and switch to host
virtual mode. On those processors, guest->HV interrupts reach the
hypervisor with the MMU off because the MMU is loaded with guest context
(LPCR, SDR1, SLB), and the other threads in the sub-core need to be
pulled out of the guest too. Then the primary must save off guest state,
invalidate SLB and ERAT, and load up host state before the MMU can be
enabled to run in host virtual mode (~= regular Linux mode).

Hash guests also require a lot of hcalls to run due to the nature of the
MMU architecture and paravirtualisation design. The XICS interrupt
controller requires hcalls to run.

So KVM traditionally tries hard to avoid the full exit, by handling
hcalls and other interrupts in real mode as much as possible.

By contrast, POWER9 has independent MMU context per-thread, and in radix
mode the hypervisor is in host virtual memory mode when the HV interrupt
is taken. Radix guests do not require significant hcalls to manage their
translations, and xive guests don't need hcalls to handle interrupts.

So it's much less important from a performance standpoint to handle
hcalls in real mode in P9.

The TCE hcalls are performance critical; they were introduced for P8 in
order to achieve 10GbE performance. They were not required on P9 (which
was able to drive 40GbE networking with just the virt mode hcalls), but
performance is quite important. After later changes that add hash guest
support to the P9 path, hash hcalls are also performance critical. The
full entry/exit performance can still be improved significantly; if that
is found to be inadequate then real-mode hcall handlers could be
reintroduced for the P9 path (in C), but this would be a last resort.

Reviewed-by: Alexey Kardashevskiy 
Reviewed-by: Cédric Le Goater 
Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/include/asm/kvm_ppc.h  |  6 ++
 arch/powerpc/kvm/book3s.c   |  6 ++
 arch/powerpc/kvm/book3s_hv.c| 79 +
 arch/powerpc/kvm/book3s_hv_rmhandlers.S |  5 ++
 arch/powerpc/kvm/book3s_xive.c  | 64 
 5 files changed, 149 insertions(+), 11 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_ppc.h 
b/arch/powerpc/include/asm/kvm_ppc.h
index 8c10c3427166..cb9e3c85c605 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -129,6 +129,7 @@ extern void kvmppc_core_vcpu_put(struct kvm_vcpu *vcpu);
 extern int kvmppc_core_prepare_to_enter(struct kvm_vcpu *vcpu);
 extern int kvmppc_core_pending_dec(struct kvm_vcpu *vcpu);
 extern void kvmppc_core_queue_machine_check(struct kvm_vcpu *vcpu, ulong 
flags);
+extern void kvmppc_core_queue_syscall(struct kvm_vcpu *vcpu);
 extern void kvmppc_core_queue_program(struct kvm_vcpu *vcpu, ulong flags);
 extern void kvmppc_core_queue_fpunavail(struct kvm_vcpu *vcpu);
 extern void kvmppc_core_queue_vec_unavail(struct kvm_vcpu *vcpu);
@@ -606,6 +607,7 @@ extern void kvmppc_free_pimap(struct kvm *kvm);
 extern int kvmppc_xics_rm_complete(struct kvm_vcpu *vcpu, u32 hcall);
 extern void kvmppc_xics_free_icp(struct kvm_vcpu *vcpu);
 extern int kvmppc_xics_hcall(struct kvm_vcpu *vcpu, u32 cmd);
+extern int kvmppc_xive_xics_hcall(struct kvm_vcpu *vcpu, u32 req);
 extern u64 kvmppc_xics_get_icp(struct kvm_vcpu *vcpu);
 extern int kvmppc_xics_set_icp(struct kvm_vcpu *vcpu, u64 icpval);
 extern int kvmppc_xics_connect_vcpu(struct kvm_device *dev,
@@ -638,6 +640,8 @@ static inline int kvmppc_xics_enabled(struct kvm_vcpu *vcpu)
 static inline void kvmppc_xics_free_icp(struct kvm_vcpu *vcpu) { }
 static inline int kvmppc_xics_hcall(struct kvm_vcpu *vcpu, u32 cmd)
{ return 0; }
+static inline int kvmppc_xive_xics_hcall(struct kvm_vcpu *vcpu, u32 req)
+   { return 0; }
 #endif
 
 #ifdef CONFIG_KVM_XIVE
@@ -672,6 +676,7 @@ extern int kvmppc_xive_set_irq(struct kvm *kvm, int 
irq_source_id, u32 irq,
   int level, bool line_status);
 extern void kvmppc_xive_push_vcpu(struct kvm_vcpu *vcpu);
 extern void kvmppc_xive_pull_vcpu(struct kvm_vcpu *vcpu);
+extern void kvmppc_xive_rearm_escalation(struct kvm_vcpu *vcpu);
 
 static inline int kvmppc_xive_enabled(struct kvm_vcpu *vcpu)
 {
@@ -713,6 +718,7 @@ static inline int kvmppc_xive_set_irq(struct kvm *kvm, int 
irq_source_id, u32 ir
  

[PATCH 2/4] KVM: PPC: Book3S HV P9: Move radix MMU switching instructions together

2021-05-13 Thread Nicholas Piggin
Switching the MMU from radix<->radix mode is tricky particularly as the
MMU can remain enabled and requires a certain sequence of SPR updates.
Move these together into their own functions.

This also includes the radix TLB check / flush because it's tied in to
MMU switching due to tlbiel getting LPID from LPIDR.

Reviewed-by: Alexey Kardashevskiy 
Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kvm/book3s_hv.c | 62 
 1 file changed, 41 insertions(+), 21 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 68914b26017b..287cd3e1b918 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -3478,12 +3478,49 @@ static noinline void kvmppc_run_core(struct 
kvmppc_vcore *vc)
trace_kvmppc_run_core(vc, 1);
 }
 
+static void switch_mmu_to_guest_radix(struct kvm *kvm, struct kvm_vcpu *vcpu, 
u64 lpcr)
+{
+   struct kvmppc_vcore *vc = vcpu->arch.vcore;
+   struct kvm_nested_guest *nested = vcpu->arch.nested;
+   u32 lpid;
+
+   lpid = nested ? nested->shadow_lpid : kvm->arch.lpid;
+
+   /*
+* All the isync()s are overkill but trivially follow the ISA
+* requirements. Some can likely be replaced with justification
+* comment for why they are not needed.
+*/
+   isync();
+   mtspr(SPRN_LPID, lpid);
+   isync();
+   mtspr(SPRN_LPCR, lpcr);
+   isync();
+   mtspr(SPRN_PID, vcpu->arch.pid);
+   isync();
+
+   /* TLBIEL must have LPIDR set, so set guest LPID before flushing. */
+   kvmppc_check_need_tlb_flush(kvm, vc->pcpu, nested);
+}
+
+static void switch_mmu_to_host_radix(struct kvm *kvm, u32 pid)
+{
+   isync();
+   mtspr(SPRN_PID, pid);
+   isync();
+   mtspr(SPRN_LPID, kvm->arch.host_lpid);
+   isync();
+   mtspr(SPRN_LPCR, kvm->arch.host_lpcr);
+   isync();
+}
+
 /*
  * Load up hypervisor-mode registers on P9.
  */
 static int kvmhv_load_hv_regs_and_go(struct kvm_vcpu *vcpu, u64 time_limit,
 unsigned long lpcr)
 {
+   struct kvm *kvm = vcpu->kvm;
struct kvmppc_vcore *vc = vcpu->arch.vcore;
s64 hdec;
u64 tb, purr, spurr;
@@ -3535,7 +3572,6 @@ static int kvmhv_load_hv_regs_and_go(struct kvm_vcpu 
*vcpu, u64 time_limit,
}
mtspr(SPRN_CIABR, vcpu->arch.ciabr);
mtspr(SPRN_IC, vcpu->arch.ic);
-   mtspr(SPRN_PID, vcpu->arch.pid);
 
mtspr(SPRN_PSSCR, vcpu->arch.psscr | PSSCR_EC |
  (local_paca->kvm_hstate.fake_suspend << PSSCR_FAKE_SUSPEND_LG));
@@ -3549,8 +3585,7 @@ static int kvmhv_load_hv_regs_and_go(struct kvm_vcpu 
*vcpu, u64 time_limit,
 
mtspr(SPRN_AMOR, ~0UL);
 
-   mtspr(SPRN_LPCR, lpcr);
-   isync();
+   switch_mmu_to_guest_radix(kvm, vcpu, lpcr);
 
/*
 * P9 suppresses the HDEC exception when LPCR[HDICE] = 0,
@@ -3593,7 +3628,6 @@ static int kvmhv_load_hv_regs_and_go(struct kvm_vcpu 
*vcpu, u64 time_limit,
mtspr(SPRN_DAWR1, host_dawr1);
mtspr(SPRN_DAWRX1, host_dawrx1);
}
-   mtspr(SPRN_PID, host_pidr);
 
/*
 * Since this is radix, do a eieio; tlbsync; ptesync sequence in
@@ -3608,9 +3642,6 @@ static int kvmhv_load_hv_regs_and_go(struct kvm_vcpu 
*vcpu, u64 time_limit,
if (cpu_has_feature(CPU_FTR_ARCH_31))
asm volatile(PPC_CP_ABORT);
 
-   mtspr(SPRN_LPID, vcpu->kvm->arch.host_lpid);/* restore host LPID */
-   isync();
-
vc->dpdes = mfspr(SPRN_DPDES);
vc->vtb = mfspr(SPRN_VTB);
mtspr(SPRN_DPDES, 0);
@@ -3627,7 +3658,8 @@ static int kvmhv_load_hv_regs_and_go(struct kvm_vcpu 
*vcpu, u64 time_limit,
}
 
mtspr(SPRN_HDEC, 0x7fff);
-   mtspr(SPRN_LPCR, vcpu->kvm->arch.host_lpcr);
+
+   switch_mmu_to_host_radix(kvm, host_pidr);
 
return trap;
 }
@@ -4181,7 +4213,7 @@ int kvmhv_run_single_vcpu(struct kvm_vcpu *vcpu, u64 
time_limit,
 {
struct kvm_run *run = vcpu->run;
int trap, r, pcpu;
-   int srcu_idx, lpid;
+   int srcu_idx;
struct kvmppc_vcore *vc;
struct kvm *kvm = vcpu->kvm;
struct kvm_nested_guest *nested = vcpu->arch.nested;
@@ -4255,13 +4287,6 @@ int kvmhv_run_single_vcpu(struct kvm_vcpu *vcpu, u64 
time_limit,
vc->vcore_state = VCORE_RUNNING;
trace_kvmppc_run_core(vc, 0);
 
-   if (cpu_has_feature(CPU_FTR_HVMODE)) {
-   lpid = nested ? nested->shadow_lpid : kvm->arch.lpid;
-   mtspr(SPRN_LPID, lpid);
-   isync();
-   kvmppc_check_need_tlb_flush(kvm, pcpu, nested);
-   }
-
guest_enter_irqoff();
 
srcu_idx = srcu_read_lock(>srcu);
@@ -4280,11 +4305,6 @@ int kvmhv_run_single_vcpu(struct kvm_vcpu *vcpu, u64 
time_limit,
 
srcu_read_unlock(>srcu, srcu_idx);
 
-   if (cpu_has_feature(CPU_FTR_HVMODE)) {
-   mtspr(SPRN_LPID, 

[PATCH 1/4] KVM: PPC: Book3S HV P9: Move xive vcpu context management into kvmhv_p9_guest_entry

2021-05-13 Thread Nicholas Piggin
Move the xive management up so the low level register switching can be
pushed further down in a later patch. XIVE MMIO CI operations can run in
higher level code with machine checks, tracing, etc., available.

Reviewed-by: Alexey Kardashevskiy 
Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kvm/book3s_hv.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 8a31df067c65..68914b26017b 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -3558,15 +3558,11 @@ static int kvmhv_load_hv_regs_and_go(struct kvm_vcpu 
*vcpu, u64 time_limit,
 */
mtspr(SPRN_HDEC, hdec);
 
-   kvmppc_xive_push_vcpu(vcpu);
-
mtspr(SPRN_SRR0, vcpu->arch.shregs.srr0);
mtspr(SPRN_SRR1, vcpu->arch.shregs.srr1);
 
trap = __kvmhv_vcpu_entry_p9(vcpu);
 
-   kvmppc_xive_pull_vcpu(vcpu);
-
/* Advance host PURR/SPURR by the amount used by guest */
purr = mfspr(SPRN_PURR);
spurr = mfspr(SPRN_SPURR);
@@ -3764,7 +3760,10 @@ static int kvmhv_p9_guest_entry(struct kvm_vcpu *vcpu, 
u64 time_limit,
trap = 0;
}
} else {
+   kvmppc_xive_push_vcpu(vcpu);
trap = kvmhv_load_hv_regs_and_go(vcpu, time_limit, lpcr);
+   kvmppc_xive_pull_vcpu(vcpu);
+
}
 
vcpu->arch.slb_max = 0;
-- 
2.23.0



[PATCH 0/4] KVM: PPC: Convert P9 HV path to C

2021-05-13 Thread Nicholas Piggin
This applies on top of these series:

https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=238649
https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=238941
https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=238946

This was broken out from the large Cify series. Since that was last
posted, changes are:

- Rebase, reordering of patches, tweaking changelog and comments.
- Changed P9 radix exit SLB sanitising to use 4x slbmte to clear
  rather than 8x slbmfee/slbmfev, which turns out to be faster (and
  is what today's asm code does) [from review from Paul].
- Renamed book3s_hv_interrupt.c to book3s_hv_p9_entry.c, which reduces
  confusion with book3s_hv_interrupts.S.
- Fixed !HV compile [Alexey].

Nicholas Piggin (4):
  KVM: PPC: Book3S HV P9: Move xive vcpu context management into
kvmhv_p9_guest_entry
  KVM: PPC: Book3S HV P9: Move radix MMU switching instructions together
  KVM: PPC: Book3S HV P9: Stop handling hcalls in real-mode in the P9
path
  KVM: PPC: Book3S HV P9: Implement the rest of the P9 path in C

 arch/powerpc/include/asm/asm-prototypes.h |   3 +-
 arch/powerpc/include/asm/kvm_asm.h|   1 +
 arch/powerpc/include/asm/kvm_book3s_64.h  |   8 +
 arch/powerpc/include/asm/kvm_host.h   |   7 +-
 arch/powerpc/include/asm/kvm_ppc.h|   6 +
 arch/powerpc/kernel/security.c|   5 +-
 arch/powerpc/kvm/Makefile |   1 +
 arch/powerpc/kvm/book3s.c |   6 +
 arch/powerpc/kvm/book3s_64_entry.S| 254 ++
 arch/powerpc/kvm/book3s_hv.c  | 155 +
 arch/powerpc/kvm/book3s_hv_p9_entry.c | 207 ++
 arch/powerpc/kvm/book3s_hv_rmhandlers.S   | 120 +-
 arch/powerpc/kvm/book3s_xive.c|  64 ++
 13 files changed, 683 insertions(+), 154 deletions(-)
 create mode 100644 arch/powerpc/kvm/book3s_hv_p9_entry.c

-- 
2.23.0



[PATCH kernel v3] powerpc/makefile: Do not redefine $(CPP) for preprocessor

2021-05-13 Thread Alexey Kardashevskiy
The $(CPP) (do only preprocessing) macro is already defined in the top
Makefile. However powerpc redefines it and adds $(KBUILD_CFLAGS), which
results in flag duplication. This is not a big deal by itself, except for
flags which depend on other flags and which the compiler checks as it
parses the command line.

Specifically, scripts/Makefile.build:304 generates ksyms for .S files.
If clang+llvm+sanitizer are enabled, this results in

-emit-llvm-bc -fno-lto -flto -fvisibility=hidden \
 -fsanitize=cfi-mfcall -fno-lto  ...

in the clang command line and triggers an error:

clang-13: error: invalid argument '-fsanitize=cfi-mfcall' only allowed with 
'-flto'

This removes the unnecessary CPP redefinition. That works fine, as in
most places KBUILD_CFLAGS is passed to $CPP anyway, except for
arch/powerpc/kernel/vdso64/vdso(32|64).lds. To fix vdso, this does:
1. add -m(big|little)-endian to $CPP;
2. add --target to $KBUILD_CPPFLAGS, as otherwise clang ignores
-m(big|little)-endian if the build platform does not support big endian
(such as x86).

Signed-off-by: Alexey Kardashevskiy 
---
Changes:
v3:
* moved vdso cleanup in a separate patch
* only add target to KBUILD_CPPFLAGS for CLANG

v2:
* fix KBUILD_CPPFLAGS
* add CLANG_FLAGS to CPPFLAGS
---
 Makefile  | 1 +
 arch/powerpc/Makefile | 3 ++-
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/Makefile b/Makefile
index 15b6476d0f89..5b545bef7653 100644
--- a/Makefile
+++ b/Makefile
@@ -576,6 +576,7 @@ CC_VERSION_TEXT = $(subst $(pound),,$(shell $(CC) --version 
2>/dev/null | head -
 ifneq ($(findstring clang,$(CC_VERSION_TEXT)),)
 ifneq ($(CROSS_COMPILE),)
 CLANG_FLAGS+= --target=$(notdir $(CROSS_COMPILE:%-=%))
+KBUILD_CPPFLAGS+= --target=$(notdir $(CROSS_COMPILE:%-=%))
 endif
 ifeq ($(LLVM_IAS),1)
 CLANG_FLAGS+= -integrated-as
diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile
index 3212d076ac6a..306bfd2797ad 100644
--- a/arch/powerpc/Makefile
+++ b/arch/powerpc/Makefile
@@ -76,6 +76,7 @@ endif
 
 ifdef CONFIG_CPU_LITTLE_ENDIAN
 KBUILD_CFLAGS  += -mlittle-endian
+KBUILD_CPPFLAGS+= -mlittle-endian
 KBUILD_LDFLAGS += -EL
 LDEMULATION:= lppc
 GNUTARGET  := powerpcle
@@ -83,6 +84,7 @@ MULTIPLEWORD  := -mno-multiple
 KBUILD_CFLAGS_MODULE += $(call cc-option,-mno-save-toc-indirect)
 else
 KBUILD_CFLAGS += $(call cc-option,-mbig-endian)
+KBUILD_CPPFLAGS += $(call cc-option,-mbig-endian)
 KBUILD_LDFLAGS += -EB
 LDEMULATION:= ppc
 GNUTARGET  := powerpc
@@ -208,7 +210,6 @@ KBUILD_CPPFLAGS += -I $(srctree)/arch/$(ARCH) $(asinstr)
 KBUILD_AFLAGS  += $(AFLAGS-y)
 KBUILD_CFLAGS  += $(call cc-option,-msoft-float)
 KBUILD_CFLAGS  += -pipe $(CFLAGS-y)
-CPP= $(CC) -E $(KBUILD_CFLAGS)
 
 CHECKFLAGS += -m$(BITS) -D__powerpc__ -D__powerpc$(BITS)__
 ifdef CONFIG_CPU_BIG_ENDIAN
-- 
2.30.2



[PATCH] selftests/powerpc: Fix duplicate included pthread.h

2021-05-13 Thread Jiapeng Chong
Clean up the following includecheck warning:

./tools/testing/selftests/powerpc/tm/tm-vmx-unavail.c: pthread.h is
included more than once.

No functional change.

Reported-by: Abaci Robot 
Signed-off-by: Jiapeng Chong 
---
 tools/testing/selftests/powerpc/tm/tm-vmx-unavail.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/tools/testing/selftests/powerpc/tm/tm-vmx-unavail.c 
b/tools/testing/selftests/powerpc/tm/tm-vmx-unavail.c
index e2a0c07..9ef37a9 100644
--- a/tools/testing/selftests/powerpc/tm/tm-vmx-unavail.c
+++ b/tools/testing/selftests/powerpc/tm/tm-vmx-unavail.c
@@ -17,7 +17,6 @@
 #include 
 #include 
 #include 
-#include 
 
 #include "tm.h"
 #include "utils.h"
-- 
1.8.3.1



Re: [PATCH 8/8] xen/hvc: replace BUG_ON() with negative return value

2021-05-13 Thread Juergen Gross

On 13.05.21 12:25, Greg Kroah-Hartman wrote:

On Thu, May 13, 2021 at 12:03:02PM +0200, Juergen Gross wrote:

Xen frontends shouldn't BUG() in case of illegal data received from
their backends. So replace the BUG_ON()s when reading illegal data from
the ring page with negative return values.

Signed-off-by: Juergen Gross 
---
  drivers/tty/hvc/hvc_xen.c | 15 +--
  1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/drivers/tty/hvc/hvc_xen.c b/drivers/tty/hvc/hvc_xen.c
index 92c9a476defc..30d7ffb1e04c 100644
--- a/drivers/tty/hvc/hvc_xen.c
+++ b/drivers/tty/hvc/hvc_xen.c
@@ -86,6 +86,11 @@ static int __write_console(struct xencons_info *xencons,
cons = intf->out_cons;
prod = intf->out_prod;
mb();   /* update queue values before going on */
+
+   if (WARN_ONCE((prod - cons) > sizeof(intf->out),
+ "Illegal ring page indices"))
+   return -EINVAL;


How nice, you just rebooted on panic-on-warn systems :(


+
BUG_ON((prod - cons) > sizeof(intf->out));


Why keep this line?


Failed to delete it, sorry.



Please just fix this up properly, if userspace can trigger this, then
both the WARN_ON() and BUG_ON() are not correct and need to be correctly
handled.


It can be triggered by the console backend, but I agree a WARN isn't the
way to go here.
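
One possible shape for such handling, sketched in userspace (the names and the quiet-failure policy are illustrative, not the eventual driver fix):

```c
#include <assert.h>
#include <stdint.h>

#define RING_SIZE 1024u	/* stands in for sizeof(intf->out) */

/*
 * prod and cons are free-running indices, so unsigned subtraction
 * yields the number of bytes in flight even across 32-bit wraparound.
 * Anything larger than the ring itself means the backend handed us
 * garbage, and the caller should bail out with an error code instead
 * of WARN()ing or BUG()ing.
 */
int ring_indices_valid(uint32_t prod, uint32_t cons)
{
	return (prod - cons) <= RING_SIZE;
}
```

The caller would then return -EINVAL (perhaps with a rate-limited pr_err) when the check fails, which neither crashes nor trips panic-on-warn systems.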


Juergen


OpenPGP_0xB0DE9DD628BF132F.asc
Description: application/pgp-keys


OpenPGP_signature
Description: OpenPGP digital signature


Re: [PATCH 8/8] xen/hvc: replace BUG_ON() with negative return value

2021-05-13 Thread Greg Kroah-Hartman
On Thu, May 13, 2021 at 12:03:02PM +0200, Juergen Gross wrote:
> Xen frontends shouldn't BUG() in case of illegal data received from
> their backends. So replace the BUG_ON()s when reading illegal data from
> the ring page with negative return values.
> 
> Signed-off-by: Juergen Gross 
> ---
>  drivers/tty/hvc/hvc_xen.c | 15 +--
>  1 file changed, 13 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/tty/hvc/hvc_xen.c b/drivers/tty/hvc/hvc_xen.c
> index 92c9a476defc..30d7ffb1e04c 100644
> --- a/drivers/tty/hvc/hvc_xen.c
> +++ b/drivers/tty/hvc/hvc_xen.c
> @@ -86,6 +86,11 @@ static int __write_console(struct xencons_info *xencons,
>   cons = intf->out_cons;
>   prod = intf->out_prod;
>   mb();   /* update queue values before going on */
> +
> + if (WARN_ONCE((prod - cons) > sizeof(intf->out),
> +   "Illegal ring page indices"))
> + return -EINVAL;

How nice, you just rebooted on panic-on-warn systems :(

> +
>   BUG_ON((prod - cons) > sizeof(intf->out));

Why keep this line?

Please just fix this up properly, if userspace can trigger this, then
both the WARN_ON() and BUG_ON() are not correct and need to be correctly
handled.


>  
>   while ((sent < len) && ((prod - cons) < sizeof(intf->out)))
> @@ -114,7 +119,10 @@ static int domU_write_console(uint32_t vtermno, const 
> char *data, int len)
>*/
>   while (len) {
>   int sent = __write_console(cons, data, len);
> - 
> +
> + if (sent < 0)
> + return sent;
> +
>   data += sent;
>   len -= sent;
>  
> @@ -138,7 +146,10 @@ static int domU_read_console(uint32_t vtermno, char 
> *buf, int len)
>   cons = intf->in_cons;
>   prod = intf->in_prod;
>   mb();   /* get pointers before reading ring */
> - BUG_ON((prod - cons) > sizeof(intf->in));
> +
> + if (WARN_ONCE((prod - cons) > sizeof(intf->in),
> +   "Illegal ring page indices"))
> + return -EINVAL;

Same here, you still just paniced a machine :(

thanks,

greg k-h


Re: [PATCH 8/8] xen/hvc: replace BUG_ON() with negative return value

2021-05-13 Thread Juergen Gross

On 13.05.21 12:16, Christophe Leroy wrote:



Le 13/05/2021 à 12:03, Juergen Gross a écrit :

Xen frontends shouldn't BUG() in case of illegal data received from
their backends. So replace the BUG_ON()s when reading illegal data from
the ring page with negative return values.

Signed-off-by: Juergen Gross 
---
  drivers/tty/hvc/hvc_xen.c | 15 +--
  1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/drivers/tty/hvc/hvc_xen.c b/drivers/tty/hvc/hvc_xen.c
index 92c9a476defc..30d7ffb1e04c 100644
--- a/drivers/tty/hvc/hvc_xen.c
+++ b/drivers/tty/hvc/hvc_xen.c
@@ -86,6 +86,11 @@ static int __write_console(struct xencons_info 
*xencons,

  cons = intf->out_cons;
  prod = intf->out_prod;
  mb();    /* update queue values before going on */
+
+    if (WARN_ONCE((prod - cons) > sizeof(intf->out),
+  "Illegal ring page indices"))
+    return -EINVAL;
+
  BUG_ON((prod - cons) > sizeof(intf->out));


Why keep the BUG_ON() ?


Oh, failed to delete it. Thanks for noticing.


Juergen




Re: [PATCH 8/8] xen/hvc: replace BUG_ON() with negative return value

2021-05-13 Thread Christophe Leroy




Le 13/05/2021 à 12:03, Juergen Gross a écrit :

Xen frontends shouldn't BUG() in case of illegal data received from
their backends. So replace the BUG_ON()s when reading illegal data from
the ring page with negative return values.

Signed-off-by: Juergen Gross 
---
  drivers/tty/hvc/hvc_xen.c | 15 +--
  1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/drivers/tty/hvc/hvc_xen.c b/drivers/tty/hvc/hvc_xen.c
index 92c9a476defc..30d7ffb1e04c 100644
--- a/drivers/tty/hvc/hvc_xen.c
+++ b/drivers/tty/hvc/hvc_xen.c
@@ -86,6 +86,11 @@ static int __write_console(struct xencons_info *xencons,
cons = intf->out_cons;
prod = intf->out_prod;
mb();   /* update queue values before going on */
+
+   if (WARN_ONCE((prod - cons) > sizeof(intf->out),
+ "Illegal ring page indices"))
+   return -EINVAL;
+
BUG_ON((prod - cons) > sizeof(intf->out));


Why keep the BUG_ON() ?


  
  	while ((sent < len) && ((prod - cons) < sizeof(intf->out)))

@@ -114,7 +119,10 @@ static int domU_write_console(uint32_t vtermno, const char 
*data, int len)
 */
while (len) {
int sent = __write_console(cons, data, len);
-   
+
+   if (sent < 0)
+   return sent;
+
data += sent;
len -= sent;
  
@@ -138,7 +146,10 @@ static int domU_read_console(uint32_t vtermno, char *buf, int len)

cons = intf->in_cons;
prod = intf->in_prod;
mb();   /* get pointers before reading ring */
-   BUG_ON((prod - cons) > sizeof(intf->in));
+
+   if (WARN_ONCE((prod - cons) > sizeof(intf->in),
+ "Illegal ring page indices"))
+   return -EINVAL;
  
  	while (cons != prod && recv < len)

buf[recv++] = intf->in[MASK_XENCONS_IDX(cons++, intf->in)];



[powerpc:fixes-test] BUILD SUCCESS da3bb206c9ceb0736d9e2897ea697acabad35833

2021-05-13 Thread kernel test robot
tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git 
fixes-test
branch HEAD: da3bb206c9ceb0736d9e2897ea697acabad35833  KVM: PPC: Book3S HV: Fix 
kvm_unmap_gfn_range_hv() for Hash MMU

elapsed time: 1183m

configs tested: 9
configs skipped: 83

The following configs have been built successfully.
More configs may be tested in the coming days.

gcc tested configs:
powerpc  allyesconfig
powerpc  allmodconfig
powerpc   allnoconfig
x86_64   allyesconfig
x86_64rhel-8.3-kselftests
x86_64  defconfig
x86_64   rhel-8.3
x86_64  rhel-8.3-kbuiltin
x86_64  kexec

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org


[PATCH 8/8] xen/hvc: replace BUG_ON() with negative return value

2021-05-13 Thread Juergen Gross
Xen frontends shouldn't BUG() in case of illegal data received from
their backends. So replace the BUG_ON()s when reading illegal data from
the ring page with negative return values.

Signed-off-by: Juergen Gross 
---
 drivers/tty/hvc/hvc_xen.c | 15 +--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/drivers/tty/hvc/hvc_xen.c b/drivers/tty/hvc/hvc_xen.c
index 92c9a476defc..30d7ffb1e04c 100644
--- a/drivers/tty/hvc/hvc_xen.c
+++ b/drivers/tty/hvc/hvc_xen.c
@@ -86,6 +86,11 @@ static int __write_console(struct xencons_info *xencons,
cons = intf->out_cons;
prod = intf->out_prod;
mb();   /* update queue values before going on */
+
+   if (WARN_ONCE((prod - cons) > sizeof(intf->out),
+ "Illegal ring page indices"))
+   return -EINVAL;
+
BUG_ON((prod - cons) > sizeof(intf->out));
 
while ((sent < len) && ((prod - cons) < sizeof(intf->out)))
@@ -114,7 +119,10 @@ static int domU_write_console(uint32_t vtermno, const char 
*data, int len)
 */
while (len) {
int sent = __write_console(cons, data, len);
-   
+
+   if (sent < 0)
+   return sent;
+
data += sent;
len -= sent;
 
@@ -138,7 +146,10 @@ static int domU_read_console(uint32_t vtermno, char *buf, 
int len)
cons = intf->in_cons;
prod = intf->in_prod;
mb();   /* get pointers before reading ring */
-   BUG_ON((prod - cons) > sizeof(intf->in));
+
+   if (WARN_ONCE((prod - cons) > sizeof(intf->in),
+ "Illegal ring page indices"))
+   return -EINVAL;
 
while (cons != prod && recv < len)
buf[recv++] = intf->in[MASK_XENCONS_IDX(cons++, intf->in)];
-- 
2.26.2



Re: [FSL P50x0] Xorg always restarts again and again after the the PowerPC updates 5.13-1

2021-05-13 Thread Christophe Leroy




Le 13/05/2021 à 08:47, Christian Zigotzky a écrit :

Hi Christophe,


On 9. May 2021, at 19:37, Christophe Leroy  wrote:

Did I do an error in my analysis ?


No, you didn’t. On the contrary you have found the issue. ;-) The issue is 
somewhere in the new interrupt code.


I'm not convinced, but let's give it a try.




ZZ | * | ceff77efa4f8 powerpc/64e/interrupt: Use new interrupt context tracking 
scheme


Could you please create a patch for reverting the new interrupt code? I would 
like to confirm your result.


Please fetch https://github.com/chleroy/linux.git and try the branch 
for_christian.

This is a revert of approx a dozen of commits around the changes to 64e on top 
of v5.13-rc1.

If the top of the branch works for you, it would be great if you can find out which one of the 
reverts fixes the problem for real.


Thanks
Christophe


Re: [RFC 01/10] powerpc/rtas: new APIs for busy and extended delay statuses

2021-05-13 Thread Alexey Kardashevskiy




On 04/05/2021 13:03, Nathan Lynch wrote:

Add new APIs for handling busy (-2) and extended delay
hint (9900...9905) statuses from RTAS. These are intended to be
drop-in replacements for existing uses of rtas_busy_delay().

A problem with rtas_busy_delay() and rtas_busy_delay_time() is that
they consider -2/busy to be equivalent to 9900 (wait 1ms). In fact,
the OS should call again as soon as it wants on -2, which at least on
PowerVM means RTAS is returning only to uphold the general requirement
that RTAS must return control to the OS in a "timely fashion" (250us).

Combine this with the fact that msleep(1) actually sleeps for more
like 20ms in practice: on busy VMs we schedule away for much longer
than necessary on -2 and 9900.

This is fixed in rtas_sched_if_busy(), which uses usleep_range() for
small delay hints, and only schedules away on -2 if there is other
work available. It also refuses to sleep longer than one second
regardless of the hinted value, on the assumption that even
longer-running operations can tolerate polling at 1 Hz.

rtas_spin_if_busy() and rtas_force_spin_if_busy() are provided for
atomic contexts which need to handle busy status and extended delay
hints.
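
The extended delay hints and the one-second cap discussed above map out roughly like this (userspace sketch; the helper name and cap placement are mine, not the patch's):

```c
#include <assert.h>

#define RTAS_EXTENDED_DELAY_MIN 9900
#define RTAS_EXTENDED_DELAY_MAX 9905

/*
 * A 990x status encodes a suggested wait of 10^(x-9900) milliseconds,
 * i.e. 9900 = 1ms ... 9905 = 100s. Per the reasoning above, the sleep
 * is capped at one second. Returns the delay in microseconds, or 0 for
 * a status that is not an extended delay hint.
 */
unsigned int rtas_hint_to_usecs(int status)
{
	unsigned int ms = 1;
	int order;

	if (status < RTAS_EXTENDED_DELAY_MIN || status > RTAS_EXTENDED_DELAY_MAX)
		return 0;

	for (order = status - RTAS_EXTENDED_DELAY_MIN; order > 0; order--)
		ms *= 10;

	if (ms > 1000)
		ms = 1000;	/* never sleep longer than one second */

	return ms * 1000;
}
```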

Signed-off-by: Nathan Lynch 
---
  arch/powerpc/include/asm/rtas.h |   4 +
  arch/powerpc/kernel/rtas.c  | 168 
  2 files changed, 172 insertions(+)

diff --git a/arch/powerpc/include/asm/rtas.h b/arch/powerpc/include/asm/rtas.h
index 9dc97d2f9d27..555ff3290f92 100644
--- a/arch/powerpc/include/asm/rtas.h
+++ b/arch/powerpc/include/asm/rtas.h
@@ -266,6 +266,10 @@ extern int rtas_set_rtc_time(struct rtc_time *rtc_time);
  extern unsigned int rtas_busy_delay_time(int status);
  extern unsigned int rtas_busy_delay(int status);
  
+bool rtas_sched_if_busy(int status);

+bool rtas_spin_if_busy(int status);
+bool rtas_force_spin_if_busy(int status);
+
  extern int early_init_dt_scan_rtas(unsigned long node,
const char *uname, int depth, void *data);
  
diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c

index 6bada744402b..4a1dfbfa51ba 100644
--- a/arch/powerpc/kernel/rtas.c
+++ b/arch/powerpc/kernel/rtas.c
@@ -519,6 +519,174 @@ unsigned int rtas_busy_delay(int status)
  }
  EXPORT_SYMBOL(rtas_busy_delay);
  
+/**

+ * rtas_force_spin_if_busy() - Consume a busy or extended delay status
+ * in atomic context.
+ * @status: Return value from rtas_call() or similar function.
+ *
+ * Use this function when you cannot avoid using an RTAS function
+ * which may return an extended delay hint in atomic context. If
+ * possible, use rtas_spin_if_busy() or rtas_sched_if_busy() instead
+ * of this function.
+ *
+ * Return: True if @status is -2 or 990x, in which case
+ * rtas_force_spin_if_busy() will have delayed an appropriate amount
+ * of time, and the caller should call the RTAS function
+ * again. False otherwise.
+ */
+bool rtas_force_spin_if_busy(int status)


rtas_force_delay_if_busy()? neither this one nor rtas_spin_if_busy() 
actually spins.




+{
+   bool was_busy = true;
+
+   switch (status) {
+   case RTAS_BUSY:
+   /* OK to call again immediately; do nothing. */
+   break;
+   case RTAS_EXTENDED_DELAY_MIN...RTAS_EXTENDED_DELAY_MAX:
+   mdelay(1);
+   break;
+   default:
+   was_busy = false;
+   break;
+   }
+
+   return was_busy;
+}
+
+/**
+ * rtas_spin_if_busy() - Consume a busy status in atomic context.
+ * @status: Return value from rtas_call() or similar function.
+ *
+ * Prefer rtas_sched_if_busy() over this function. Prefer this
+ * function over rtas_force_spin_if_busy(). Use this function in
+ * atomic contexts with RTAS calls that are specified to return -2 but
+ * not 990x. This function will complain and execute a minimal delay
+ * if passed a 990x status.
+ *
+ * Return: True if @status is -2 or 990x, in which case
+ * rtas_spin_if_busy() will have delayed an appropriate amount
+ * of time, and the caller should call the RTAS function
+ * again. False otherwise.
+ */
+bool rtas_spin_if_busy(int status)


rtas_delay_if_busy()?



+{
+   bool was_busy = true;
+
+   switch (status) {
+   case RTAS_BUSY:
+   /* OK to call again immediately; do nothing. */
+   break;
+   case RTAS_EXTENDED_DELAY_MIN...RTAS_EXTENDED_DELAY_MAX:
+   /*
+* Generally, RTAS functions which can return this
+* status should be considered too expensive to use in
+* atomic context. Change the calling code to use
+* rtas_sched_if_busy(), or if that's not possible,
+* use rtas_force_spin_if_busy().
+*/
+   pr_warn_once("%pS may use RTAS call in atomic context which returns extended delay.\n",
+__builtin_return_address(0));
+

[PATCH v2] mm/ioremap: Fix iomap_max_page_shift

2021-05-13 Thread Christophe Leroy
iomap_max_page_shift is expected to contain a page shift,
so it can't be a 'bool'; it has to be an 'unsigned int'.

Also fix the default values: P4D_SHIFT is the value that allows
huge iomap, PAGE_SHIFT the value that disables it.

However, on some architectures (eg: powerpc book3s/64), P4D_SHIFT is
not a constant so it can't be used to initialise a static variable.
So, initialise iomap_max_page_shift with the maximum shift supported
by the architecture; it is gated by P4D_SHIFT in vmap_try_huge_p4d()
anyway.

Signed-off-by: Christophe Leroy 
Fixes: bbc180a5adb0 ("mm: HUGE_VMAP arch support cleanup")
Reviewed-by: Nicholas Piggin 
Reviewed-by: Anshuman Khandual 
---
v2: Initialise with BITS_PER_LONG - 1 instead of P4D_SHIFT which is not a 
constant on powerpc/64.
---
 mm/ioremap.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/mm/ioremap.c b/mm/ioremap.c
index d1dcc7e744ac..8ee0136f8cb0 100644
--- a/mm/ioremap.c
+++ b/mm/ioremap.c
@@ -16,16 +16,16 @@
 #include "pgalloc-track.h"
 
 #ifdef CONFIG_HAVE_ARCH_HUGE_VMAP
-static bool __ro_after_init iomap_max_page_shift = PAGE_SHIFT;
+static unsigned int __ro_after_init iomap_max_page_shift = BITS_PER_LONG - 1;
 
 static int __init set_nohugeiomap(char *str)
 {
-   iomap_max_page_shift = P4D_SHIFT;
+   iomap_max_page_shift = PAGE_SHIFT;
return 0;
 }
 early_param("nohugeiomap", set_nohugeiomap);
 #else /* CONFIG_HAVE_ARCH_HUGE_VMAP */
-static const bool iomap_max_page_shift = PAGE_SHIFT;
+static const unsigned int iomap_max_page_shift = PAGE_SHIFT;
 #endif /* CONFIG_HAVE_ARCH_HUGE_VMAP */
 
 int ioremap_page_range(unsigned long addr,
-- 
2.25.0



Re: [PATCH v2] powerpc/papr_scm: Make 'perf_stats' invisible if perf-stats unavailable

2021-05-13 Thread Vaibhav Jain
Thanks for looking into this patch Dan,

Dan Williams  writes:

> On Fri, May 7, 2021 at 4:40 AM Vaibhav Jain  wrote:
>>
>> In case performance stats for an nvdimm are not available, reading the
>> 'perf_stats' sysfs file returns an -ENOENT error. A better approach is
>> to make the 'perf_stats' file entirely invisible to indicate that
>> performance stats for an nvdimm are unavailable.
>>
>> So this patch updates 'papr_nd_attribute_group' to add an 'is_visible'
>> callback implemented as newly introduced 'papr_nd_attribute_visible()'
>> that returns an appropriate mode in case performance stats aren't
>> supported in a given nvdimm.
>>
>> Also the initialization of 'papr_scm_priv.stat_buffer_len' is moved
>> from papr_scm_nvdimm_init() to papr_scm_probe() so that its value is
>> available when 'papr_nd_attribute_visible()' is called during nvdimm
>> initialization.
>>
>> Fixes: 2d02bf835e57 ("powerpc/papr_scm: Fetch nvdimm performance stats from PHYP")
>
> Since this has been the exposed ABI since v5.9 perhaps a note /
> analysis is needed here that the disappearance of this file will not
> break any existing scripts/tools.
>
Yes, I have added a note in patch description and also updated
Documentation/ABI/testing/sysfs-bus-papr-pmem with a note on when this
sysfs attr would be available. Thanks for this suggestion.

>> Signed-off-by: Vaibhav Jain 
>> ---
>> Changelog:
>>
>> v2:
>> * Removed a redundant dev_info() from pap_scm_nvdimm_init() [ Aneesh ]
>> * Marked papr_nd_attribute_visible() as static which also fixed the
>>   build warning reported by kernel build robot
>> ---
>>  arch/powerpc/platforms/pseries/papr_scm.c | 35 ---
>>  1 file changed, 24 insertions(+), 11 deletions(-)
>>
>> diff --git a/arch/powerpc/platforms/pseries/papr_scm.c 
>> b/arch/powerpc/platforms/pseries/papr_scm.c
>> index e2b69cc3beaf..11e7b90a3360 100644
>> --- a/arch/powerpc/platforms/pseries/papr_scm.c
>> +++ b/arch/powerpc/platforms/pseries/papr_scm.c
>> @@ -907,6 +907,20 @@ static ssize_t flags_show(struct device *dev,
>>  }
>>  DEVICE_ATTR_RO(flags);
>>
>> +static umode_t papr_nd_attribute_visible(struct kobject *kobj,
>> +struct attribute *attr, int n)
>> +{
>> +   struct device *dev = container_of(kobj, typeof(*dev), kobj);
>
> This can use the kobj_to_dev() helper.
>
Fixed in v3
>> +   struct nvdimm *nvdimm = to_nvdimm(dev);
>> +   struct papr_scm_priv *p = nvdimm_provider_data(nvdimm);
>> +
>> +   /* For if perf-stats not available remove perf_stats sysfs */
>> +   if (attr == &dev_attr_perf_stats.attr && p->stat_buffer_len == 0)
>> +   return 0;
>> +
>> +   return attr->mode;
>> +}
>> +
>>  /* papr_scm specific dimm attributes */
>>  static struct attribute *papr_nd_attributes[] = {
>> &dev_attr_flags.attr,
>> @@ -916,6 +930,7 @@ static struct attribute *papr_nd_attributes[] = {
>>
>>  static struct attribute_group papr_nd_attribute_group = {
>> .name = "papr",
>> +   .is_visible = papr_nd_attribute_visible,
>> .attrs = papr_nd_attributes,
>>  };
>>
>> @@ -931,7 +946,6 @@ static int papr_scm_nvdimm_init(struct papr_scm_priv *p)
>> struct nd_region_desc ndr_desc;
>> unsigned long dimm_flags;
>> int target_nid, online_nid;
>> -   ssize_t stat_size;
>>
>> p->bus_desc.ndctl = papr_scm_ndctl;
>> p->bus_desc.module = THIS_MODULE;
>> @@ -1016,16 +1030,6 @@ static int papr_scm_nvdimm_init(struct papr_scm_priv 
>> *p)
>> list_add_tail(&p->region_list, &papr_nd_regions);
>> mutex_unlock(&papr_ndr_lock);
>>
>> -   /* Try retriving the stat buffer and see if its supported */
>> -   stat_size = drc_pmem_query_stats(p, NULL, 0);
>> -   if (stat_size > 0) {
>> -   p->stat_buffer_len = stat_size;
>> -   dev_dbg(&p->pdev->dev, "Max perf-stat size %lu-bytes\n",
>> -   p->stat_buffer_len);
>> -   } else {
>> -   dev_info(&p->pdev->dev, "Dimm performance stats unavailable\n");
>> -   }
>> -
>> return 0;
>>
>>  err:   nvdimm_bus_unregister(p->bus);
>> @@ -1102,6 +1106,7 @@ static int papr_scm_probe(struct platform_device *pdev)
>> u64 blocks, block_size;
>> struct papr_scm_priv *p;
>> const char *uuid_str;
>> +   ssize_t stat_size;
>> u64 uuid[2];
>> int rc;
>>
>> @@ -1179,6 +1184,14 @@ static int papr_scm_probe(struct platform_device 
>> *pdev)
>> p->res.name  = pdev->name;
>> p->res.flags = IORESOURCE_MEM;
>>
>> +   /* Try retriving the stat buffer and see if its supported */
>
> s/retriving/retrieving/
>
Fixed in v3.
>> +   stat_size = drc_pmem_query_stats(p, NULL, 0);
>> +   if (stat_size > 0) {
>> +   p->stat_buffer_len = stat_size;
>> +   dev_dbg(&p->pdev->dev, "Max perf-stat size %lu-bytes\n",
>> +   p->stat_buffer_len);
>> +   }
>> +
>> rc = papr_scm_nvdimm_init(p);

[powerpc:next-test] BUILD SUCCESS d64ef8451fbc85489928bea033a63d9c09fca786

2021-05-13 Thread kernel test robot
tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git 
next-test
branch HEAD: d64ef8451fbc85489928bea033a63d9c09fca786  powerpc/32s: Remove 
asm/book3s/32/hash.h

elapsed time: 1129m

configs tested: 9
configs skipped: 83

The following configs have been built successfully.
More configs may be tested in the coming days.

gcc tested configs:
powerpc  allyesconfig
powerpc  allmodconfig
powerpc   allnoconfig
x86_64   allyesconfig
x86_64rhel-8.3-kselftests
x86_64  defconfig
x86_64   rhel-8.3
x86_64  rhel-8.3-kbuiltin
x86_64  kexec

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org


[PATCH v3] powerpc/papr_scm: Make 'perf_stats' invisible if perf-stats unavailable

2021-05-13 Thread Vaibhav Jain
In case performance stats for an nvdimm are not available, reading the
'perf_stats' sysfs file returns an -ENOENT error. A better approach is
to make the 'perf_stats' file entirely invisible to indicate that
performance stats for an nvdimm are unavailable.

So this patch updates 'papr_nd_attribute_group' to add an 'is_visible'
callback implemented as newly introduced 'papr_nd_attribute_visible()'
that returns an appropriate mode in case performance stats aren't
supported in a given nvdimm.

Also the initialization of 'papr_scm_priv.stat_buffer_len' is moved
from papr_scm_nvdimm_init() to papr_scm_probe() so that its value is
available when 'papr_nd_attribute_visible()' is called during nvdimm
initialization.

Even though the 'perf_stats' attribute has been available since v5.9,
there are no known user-space tools/scripts that depend on the
presence of its sysfs file. Hence I don't expect any user-space
breakage with this patch.

Fixes: 2d02bf835e57 ("powerpc/papr_scm: Fetch nvdimm performance stats from PHYP")
Reviewed-by: Dan Williams 
Signed-off-by: Vaibhav Jain 
---
Changelog:

v3:
* Minor spell corrections [ Dan Williams ]
* Switched to kobj_to_dev() helper in papr_nd_attribute_visible() [
  Dan Williams ]
* Updated ABI/../sysfs-bus-papr-pmem to indicate that the attribute is
  only available for devices that support performance stats.
* Updated patch description.

v2:
* Removed a redundant dev_info() from pap_scm_nvdimm_init() [ Aneesh ]
* Marked papr_nd_attribute_visible() as static which also fixed the
  build warning reported by kernel build robot
---
 Documentation/ABI/testing/sysfs-bus-papr-pmem |  8 +++--
 arch/powerpc/platforms/pseries/papr_scm.c | 35 +--
 2 files changed, 29 insertions(+), 14 deletions(-)

diff --git a/Documentation/ABI/testing/sysfs-bus-papr-pmem 
b/Documentation/ABI/testing/sysfs-bus-papr-pmem
index 8316c33862a0..0aa02bf2bde5 100644
--- a/Documentation/ABI/testing/sysfs-bus-papr-pmem
+++ b/Documentation/ABI/testing/sysfs-bus-papr-pmem
@@ -39,9 +39,11 @@ KernelVersion:   v5.9
 Contact:   linuxppc-dev , 
linux-nvd...@lists.01.org,
 Description:
(RO) Report various performance stats related to papr-scm NVDIMM
-   device.  Each stat is reported on a new line with each line
-   composed of a stat-identifier followed by it value. Below are
-   currently known dimm performance stats which are reported:
+   device. This attribute is only available for NVDIMM devices
+   that support reporting NVDIMM performance stats. Each stat is
+   reported on a new line with each line composed of a
+   stat-identifier followed by it value. Below are currently known
+   dimm performance stats which are reported:
 
* "CtlResCt" : Controller Reset Count
* "CtlResTm" : Controller Reset Elapsed Time
diff --git a/arch/powerpc/platforms/pseries/papr_scm.c 
b/arch/powerpc/platforms/pseries/papr_scm.c
index e2b69cc3beaf..e8577e7e49ab 100644
--- a/arch/powerpc/platforms/pseries/papr_scm.c
+++ b/arch/powerpc/platforms/pseries/papr_scm.c
@@ -907,6 +907,20 @@ static ssize_t flags_show(struct device *dev,
 }
 DEVICE_ATTR_RO(flags);
 
+static umode_t papr_nd_attribute_visible(struct kobject *kobj,
+struct attribute *attr, int n)
+{
+   struct device *dev = kobj_to_dev(kobj);
+   struct nvdimm *nvdimm = to_nvdimm(dev);
+   struct papr_scm_priv *p = nvdimm_provider_data(nvdimm);
+
+   /* For if perf-stats not available remove perf_stats sysfs */
+   if (attr == &dev_attr_perf_stats.attr && p->stat_buffer_len == 0)
+   return 0;
+
+   return attr->mode;
+}
+
 /* papr_scm specific dimm attributes */
 static struct attribute *papr_nd_attributes[] = {
&dev_attr_flags.attr,
@@ -916,6 +930,7 @@ static struct attribute *papr_nd_attributes[] = {
 
 static struct attribute_group papr_nd_attribute_group = {
.name = "papr",
+   .is_visible = papr_nd_attribute_visible,
.attrs = papr_nd_attributes,
 };
 
@@ -931,7 +946,6 @@ static int papr_scm_nvdimm_init(struct papr_scm_priv *p)
struct nd_region_desc ndr_desc;
unsigned long dimm_flags;
int target_nid, online_nid;
-   ssize_t stat_size;
 
p->bus_desc.ndctl = papr_scm_ndctl;
p->bus_desc.module = THIS_MODULE;
@@ -1016,16 +1030,6 @@ static int papr_scm_nvdimm_init(struct papr_scm_priv *p)
list_add_tail(&p->region_list, &papr_nd_regions);
mutex_unlock(&papr_ndr_lock);
 
-   /* Try retriving the stat buffer and see if its supported */
-   stat_size = drc_pmem_query_stats(p, NULL, 0);
-   if (stat_size > 0) {
-   p->stat_buffer_len = stat_size;
-   dev_dbg(&p->pdev->dev, "Max perf-stat size %lu-bytes\n",
-   p->stat_buffer_len);
-   } else {
-   dev_info(&p->pdev->dev, "Dimm performance stats unavailable\n");
-  

[PATCH v2 2/2] powerpc/kprobes: Replace ppc_optinsn by common optinsn

2021-05-13 Thread Christophe Leroy
Commit 51c9c0843993 ("powerpc/kprobes: Implement Optprobes")
implemented a powerpc specific version of optinsn in order
to workaround the 32Mb limitation for direct branches.

Instead of implementing a dedicated powerpc version, use the
common optinsn and override the allocation and freeing functions.

This also indirectly removes the clang warning about
is_kprobe_ppc_optinsn_slot() not being used, and powerpc will
now benefit from commit 5b485629ba0d ("kprobes, extable: Identify
kprobes trampolines as kernel text area")

Suggested-by: Masami Hiramatsu 
Signed-off-by: Christophe Leroy 
Acked-by: Masami Hiramatsu 
---
v2: no change
---
 arch/powerpc/kernel/optprobes.c | 23 +--
 1 file changed, 5 insertions(+), 18 deletions(-)

diff --git a/arch/powerpc/kernel/optprobes.c b/arch/powerpc/kernel/optprobes.c
index cdf87086fa33..a370190cd02a 100644
--- a/arch/powerpc/kernel/optprobes.c
+++ b/arch/powerpc/kernel/optprobes.c
@@ -31,11 +31,9 @@
 #define TMPL_END_IDX   \
(optprobe_template_end - optprobe_template_entry)
 
-DEFINE_INSN_CACHE_OPS(ppc_optinsn);
-
 static bool insn_page_in_use;
 
-static void *__ppc_alloc_insn_page(void)
+void *alloc_optinsn_page(void)
 {
if (insn_page_in_use)
return NULL;
@@ -43,20 +41,11 @@ static void *__ppc_alloc_insn_page(void)
return &optinsn_slot;
 }
 
-static void __ppc_free_insn_page(void *page __maybe_unused)
+void free_optinsn_page(void *page)
 {
insn_page_in_use = false;
 }
 
-struct kprobe_insn_cache kprobe_ppc_optinsn_slots = {
-   .mutex = __MUTEX_INITIALIZER(kprobe_ppc_optinsn_slots.mutex),
-   .pages = LIST_HEAD_INIT(kprobe_ppc_optinsn_slots.pages),
-   /* insn_size initialized later */
-   .alloc = __ppc_alloc_insn_page,
-   .free = __ppc_free_insn_page,
-   .nr_garbage = 0,
-};
-
 /*
  * Check if we can optimize this probe. Returns NIP post-emulation if this can
  * be optimized and 0 otherwise.
@@ -136,7 +125,7 @@ NOKPROBE_SYMBOL(optimized_callback);
 void arch_remove_optimized_kprobe(struct optimized_kprobe *op)
 {
if (op->optinsn.insn) {
-   free_ppc_optinsn_slot(op->optinsn.insn, 1);
+   free_optinsn_slot(op->optinsn.insn, 1);
op->optinsn.insn = NULL;
}
 }
@@ -203,14 +192,12 @@ int arch_prepare_optimized_kprobe(struct optimized_kprobe 
*op, struct kprobe *p)
unsigned long nip, size;
int rc, i;
 
-   kprobe_ppc_optinsn_slots.insn_size = MAX_OPTINSN_SIZE;
-
nip = can_optimize(p);
if (!nip)
return -EILSEQ;
 
/* Allocate instruction slot for detour buffer */
-   buff = get_ppc_optinsn_slot();
+   buff = get_optinsn_slot();
if (!buff)
return -ENOMEM;
 
@@ -297,7 +284,7 @@ int arch_prepare_optimized_kprobe(struct optimized_kprobe 
*op, struct kprobe *p)
return 0;
 
 error:
-   free_ppc_optinsn_slot(buff, 0);
+   free_optinsn_slot(buff, 0);
return -ERANGE;
 
 }
-- 
2.25.0



[PATCH v2 1/2] kprobes: Allow architectures to override optinsn page allocation

2021-05-13 Thread Christophe Leroy
Some architectures like powerpc require a non-standard
allocation of the optinsn page, because module pages are
too far from the kernel for direct branches.

Define weak alloc_optinsn_page() and free_optinsn_page(), that
fall back on alloc_insn_page() and free_insn_page() when not
overridden by the architecture.

Suggested-by: Masami Hiramatsu 
Signed-off-by: Christophe Leroy 
Acked-by: Masami Hiramatsu 
---
v2: Added missing prototypes in linux/kprobes.h
---
 include/linux/kprobes.h |  3 +++
 kernel/kprobes.c| 14 --
 2 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/include/linux/kprobes.h b/include/linux/kprobes.h
index 1883a4a9f16a..02d4020615a7 100644
--- a/include/linux/kprobes.h
+++ b/include/linux/kprobes.h
@@ -409,6 +409,9 @@ void dump_kprobe(struct kprobe *kp);
 void *alloc_insn_page(void);
 void free_insn_page(void *page);
 
+void *alloc_optinsn_page(void);
+void free_optinsn_page(void *page);
+
 int kprobe_get_kallsym(unsigned int symnum, unsigned long *value, char *type,
   char *sym);
 
diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index 745f08fdd7a6..8c0a6fdef771 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -321,11 +321,21 @@ int kprobe_cache_get_kallsym(struct kprobe_insn_cache *c, 
unsigned int *symnum,
 }
 
 #ifdef CONFIG_OPTPROBES
+void __weak *alloc_optinsn_page(void)
+{
+   return alloc_insn_page();
+}
+
+void __weak free_optinsn_page(void *page)
+{
+   free_insn_page(page);
+}
+
 /* For optimized_kprobe buffer */
 struct kprobe_insn_cache kprobe_optinsn_slots = {
.mutex = __MUTEX_INITIALIZER(kprobe_optinsn_slots.mutex),
-   .alloc = alloc_insn_page,
-   .free = free_insn_page,
+   .alloc = alloc_optinsn_page,
+   .free = free_optinsn_page,
.sym = KPROBE_OPTINSN_PAGE_SYM,
.pages = LIST_HEAD_INIT(kprobe_optinsn_slots.pages),
/* .insn_size is initialized later */
-- 
2.25.0



Re: [FSL P50x0] Xorg always restarts again and again after the PowerPC updates 5.13-1

2021-05-13 Thread Christian Zigotzky
Hi Christophe,

> On 9. May 2021, at 19:37, Christophe Leroy  
> wrote:
> 
> Did I do an error in my analysis ?

No, you didn’t. On the contrary you have found the issue. ;-) The issue is 
somewhere in the new interrupt code.

> ZZ | * | ceff77efa4f8 powerpc/64e/interrupt: Use new interrupt context 
> tracking scheme

Could you please create a patch for reverting the new interrupt code? I would 
like to confirm your result.

Thanks for your help,
Christian

> 
> BAD *   c70a4be130de Merge tag 'powerpc-5.13-1' of 
> git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
> |\
> | * 525642624783 powerpc/signal32: Fix erroneous SIGSEGV on RT signal return
> | * f9cd5f91a897 powerpc: Avoid clang uninitialized warning in 
> __get_user_size_allowed
> | * adb68c38d8d4 powerpc/papr_scm: Mark nvdimm as unarmed if needed during 
> probe
> | * ee1bc694fbae powerpc/kvm: Fix build error when PPC_MEM_KEYS/PPC_PSERIES=n
> | * 30c400886bad powerpc/kasan: Fix shadow start address with modules
> | * fc5590fd56c9 powerpc/kernel/iommu: Use largepool as a last resort when 
> !largealloc
> | * 3c0468d4451e powerpc/kernel/iommu: Align size for IOMMU_PAGE_SIZE() to 
> save TCEs
> | * ee6b25fa7c03 powerpc/44x: fix spelling mistake in Kconfig "varients" -> 
> "variants"
> | * cc7130bf119a powerpc/iommu: Annotate nested lock for lockdep
> | * 4be518d83880 powerpc/iommu: Do not immediately panic when failed IOMMU 
> table allocation
> | * 7f1fa82d7994 powerpc/iommu: Allocate it_map by vmalloc
> | * 0db11461677a selftests/powerpc: remove unneeded semicolon
> | * caea7b833d86 powerpc/64s: remove unneeded semicolon
> | * f3d03fc748d4 powerpc/eeh: remove unneeded semicolon
> | * 290f7d8ce2b1 powerpc/selftests: Add selftest to test concurrent 
> perf/ptrace events
> | * c65c64cc7bbd powerpc/selftests/perf-hwbreak: Add testcases for 2nd DAWR
> | * c9cb0afb4eaa powerpc/selftests/perf-hwbreak: Coalesce event creation code
> | * dae4ff8031b4 powerpc/selftests/ptrace-hwbreak: Add testcases for 2nd DAWR
> | * 421a7483878c powerpc/configs: Add IBMVNIC to some 64-bit configs
> | * da650ada1009 selftests/powerpc: Add uaccess flush test
> | * 8a87a5077143 powerpc/52xx: Fix an invalid ASM expression ('addi' used 
> instead of 'add')
> | * 0f197ddce403 powerpc/64s: Fix mm_cpumask memory ordering comment
> | * 66d9b7492887 powerpc/perf: Fix the threshold event selection for memory 
> events in power10
> | * b4ded42268ee powerpc/perf: Fix sampled instruction type for larx/stcx
> | * 0bd3f9e953bd powerpc/legacy_serial: Use early_ioremap()
> | * 9ccba66d4d2a powerpc/64: Fix the definition of the fixmap area
> | * 389586333c02 powerpc: make ALTIVEC select PPC_FPU
> | * 7d9462765707 powerpc/64s: Add FA_DUMP to defconfig
> | * d936f8182e1b powerpc/powernv: Fix type of opal_mpipl_query_tag() addr 
> argument
> | * 2e341f56a16a powerpc/fadump: Fix sparse warnings
> | * 39352430aaa0 powerpc: Move copy_inst_from_kernel_nofault()
> | * 41d6cf68b5f6 powerpc: Rename probe_kernel_read_inst()
> | * 6449078d5011 powerpc: Make probe_kernel_read_inst() common to PPC32 and 
> PPC64
> | * 6ac7897f08e0 powerpc: Remove probe_user_read_inst()
> | * ee7c3ec3b4b1 powerpc/ebpf32: Use standard function call for functions 
> within 32M distance
> | * e7de0023e123 powerpc/ebpf32: Rework 64 bits shifts to avoid tests and 
> branches
> | * d228cc496966 powerpc/ebpf32: Fix comment on BPF_ALU{64} | BPF_LSH | BPF_K
> | * 867e762480f4 powerpc/32: Use r2 in wrtspr() instead of r0
> | * f56607e85ee3 selftests/timens: Fix gettime_perf to work on powerpc
> | * 92d9d61be519 powerpc/mce: save ignore_event flag unconditionally for UE
> | * eacf4c020265 powerpc: Enable OPTPROBES on PPC32
> | * 693557ebf407 powerpc/inst: ppc_inst_as_u64() becomes ppc_inst_as_ulong()
> | * e522331173ec powerpc/irq: Enhance readability of trap types
> | * 7fab639729ce powerpc/32s: Enhance readability of trap types
> | * 0f5eb28a6ce6 powerpc/8xx: Enhance readability of trap types
> | * a9d2f9bb225f powerpc/pseries/iommu: Fix window size for direct mapping 
> with pmem
> | * e4e8bc1df691 powerpc/kvm: Fix PR KVM with KUAP/MEM_KEYS enabled
> | * ed8029d7b472 powerpc/pseries: Stop calling printk in rtas_stop_self()
> | * 3027a37c06be powerpc: Only define _TASK_CPU for 32-bit
> | * 39d0099f9439 powerpc/pseries: Add shutdown() to vio_driver and vio_bus
> | * af31fd0c9107 powerpc/perf: Expose processor pipeline stage cycles using 
> PERF_SAMPLE_WEIGHT_STRUCT
> | * 2886e2df10be Documentation/powerpc: Add proper links for manual and tests
> | * 29c9a2699e71 powerpc/pseries: Set UNISOLATE on dlpar_cpu_remove() failure
> | * 0e3b3ff83ce2 powerpc/pseries: Introduce dlpar_unisolate_drc()
> | * 864ec4d40c83 powerpc/pseries/mce: Fix a typo in error type assignment
> | * cbd3d5ba46b6 powerpc/fadump: Fix compile error since trap type change
> | * d8a1d6c58986 powerpc/perf: Add platform specific check_attr_config
> | *   a38cb4171928 Merge branch 'topic/ppc-kvm' into next
> | |\
> | | * 732f21a3053c KVM: PPC: Book3S HV: Ensure