Re: [RFC PATCH kernel] powerpc: Fix early setup to make early_ioremap work

2021-05-19 Thread Alexey Kardashevskiy




On 20/05/2021 15:46, Christophe Leroy wrote:



Le 20/05/2021 à 05:29, Alexey Kardashevskiy a écrit :

The immediate problem is that after
0bd3f9e953bd ("powerpc/legacy_serial: Use early_ioremap()")
the kernel silently reboots. The reason is that early_ioremap() returns
broken addresses as it uses slot_virt[] array which initialized with
offsets from FIXADDR_TOP == IOREMAP_END+FIXADDR_SIZE ==
KERN_IO_END- FIXADDR_SIZ + FIXADDR_SIZE == __kernel_io_end which is 0
when early_ioremap_setup() is called. __kernel_io_end is initialized
little bit later in early_init_mmu().

This fixes the initialization by swapping early_ioremap_setup and
early_init_mmu.


Hum ... Chris tested it on a T2080RDB, that must be a book3e.

So we missed it. I guess your fix is right.



Oh cool.



This also fixes IOREMAP_END to use FIXADDR_SIZE defined just next to it,
seems to make sense, unless there is some weird logic with redefining
FIXADDR_SIZE as the compiling goes.


Well, I don't think the order of defines matters, the change should be 
kept out of the fix.


When I see this:

#define IOREMAP_END(KERN_IO_END - FIXADDR_SIZE)
#define FIXADDR_SIZESZ_32M


... I have to think harder what FIXADDR_SIZE was in the first macro and 
in what order the preprocessor expands them.



But if you want it anyway, then I'd suggest to move it before 
IOREMAP_BASE in order to keep the 3 IOREMAP_xxx together.


Up to Michael, I guess.






Signed-off-by: Alexey Kardashevskiy 


Reviewed-by: Christophe Leroy 


---
  arch/powerpc/include/asm/book3s/64/pgtable.h | 2 +-
  arch/powerpc/kernel/setup_64.c   | 3 ++-
  2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h 
b/arch/powerpc/include/asm/book3s/64/pgtable.h

index a666d561b44d..54a06129794b 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -325,8 +325,8 @@ extern unsigned long pci_io_base;
  #define  PHB_IO_END    (KERN_IO_START + FULL_IO_SIZE)
  #define IOREMAP_BASE    (PHB_IO_END)
  #define IOREMAP_START    (ioremap_bot)
-#define IOREMAP_END    (KERN_IO_END - FIXADDR_SIZE)
  #define FIXADDR_SIZE    SZ_32M
+#define IOREMAP_END    (KERN_IO_END - FIXADDR_SIZE)
  /* Advertise special mapping type for AGP */
  #define HAVE_PAGE_AGP
diff --git a/arch/powerpc/kernel/setup_64.c 
b/arch/powerpc/kernel/setup_64.c

index b779d25761cf..ce09fe5debf4 100644
--- a/arch/powerpc/kernel/setup_64.c
+++ b/arch/powerpc/kernel/setup_64.c
@@ -369,11 +369,12 @@ void __init early_setup(unsigned long dt_ptr)
  apply_feature_fixups();
  setup_feature_keys();
-    early_ioremap_setup();
  /* Initialize the hash table or TLB handling */
  early_init_mmu();
+    early_ioremap_setup();
+
  /*
   * After firmware and early platform setup code has set things up,
   * we note the SPR values for configurable control/performance



--
Alexey


Re: Conflict between arch/powerpc/include/asm/disassemble.h and drivers/staging/rtl8723bs/include/wifi.h

2021-05-19 Thread Greg KH
On Thu, May 20, 2021 at 07:30:49AM +0200, Christophe Leroy wrote:
> Hello,
> 
> I was trying to include powerpc asm/disassemble.h in some more widely used
> headers in order to reduce open coding, and I'm facing the following
> problem:
> 
> drivers/staging/rtl8723bs/include/wifi.h:237:30: error: conflicting types for 
> 'get_ra'
> drivers/staging/rtl8723bs/include/wifi.h:237:30: error: conflicting types for 
> 'get_ra'
> make[4]: *** [scripts/Makefile.build:272: 
> drivers/staging/rtl8723bs/core/rtw_btcoex.o] Error 1
> make[4]: *** [scripts/Makefile.build:272: 
> drivers/staging/rtl8723bs/core/rtw_ap.o] Error 1
> make[3]: *** [scripts/Makefile.build:515: drivers/staging/rtl8723bs] Error 2
> 
> (More details at 
> http://kisskb.ellerman.id.au/kisskb/head/ee2dedcaaf3fe176e68498018632767d02639d03/)
> 
> Taking into account that asm/disassemble.h has been existing since 2008
> while rtl8723bs/include/wifi.h was created in 2017, and that the get_ra()
> defined in the later is used at exactly one place only, would it be possible
> to change it there ?
> (https://elixir.bootlin.com/linux/v5.13-rc2/A/ident/get_ra)

Yes, the staging code can change, I'll make a patch for it after
coffee...



Re: [RFC PATCH kernel] powerpc: Fix early setup to make early_ioremap work

2021-05-19 Thread Christophe Leroy




Le 20/05/2021 à 05:29, Alexey Kardashevskiy a écrit :

The immediate problem is that after
0bd3f9e953bd ("powerpc/legacy_serial: Use early_ioremap()")
the kernel silently reboots. The reason is that early_ioremap() returns
broken addresses as it uses slot_virt[] array which initialized with
offsets from FIXADDR_TOP == IOREMAP_END+FIXADDR_SIZE ==
KERN_IO_END- FIXADDR_SIZ + FIXADDR_SIZE == __kernel_io_end which is 0
when early_ioremap_setup() is called. __kernel_io_end is initialized
little bit later in early_init_mmu().

This fixes the initialization by swapping early_ioremap_setup and
early_init_mmu.


Hum ... Chris tested it on a T2080RDB, that must be a book3e.

So we missed it. I guess your fix is right.



This also fixes IOREMAP_END to use FIXADDR_SIZE defined just next to it,
seems to make sense, unless there is some weird logic with redefining
FIXADDR_SIZE as the compiling goes.


Well, I don't think the order of defines matters, the change should be kept out 
of the fix.
But if you want it anyway, then I'd suggest to move it before IOREMAP_BASE in order to keep the 3 
IOREMAP_xxx together.




Signed-off-by: Alexey Kardashevskiy 


Reviewed-by: Christophe Leroy 


---
  arch/powerpc/include/asm/book3s/64/pgtable.h | 2 +-
  arch/powerpc/kernel/setup_64.c   | 3 ++-
  2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h 
b/arch/powerpc/include/asm/book3s/64/pgtable.h
index a666d561b44d..54a06129794b 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -325,8 +325,8 @@ extern unsigned long pci_io_base;
  #define  PHB_IO_END   (KERN_IO_START + FULL_IO_SIZE)
  #define IOREMAP_BASE  (PHB_IO_END)
  #define IOREMAP_START (ioremap_bot)
-#define IOREMAP_END(KERN_IO_END - FIXADDR_SIZE)
  #define FIXADDR_SIZE  SZ_32M
+#define IOREMAP_END(KERN_IO_END - FIXADDR_SIZE)
  
  /* Advertise special mapping type for AGP */

  #define HAVE_PAGE_AGP
diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
index b779d25761cf..ce09fe5debf4 100644
--- a/arch/powerpc/kernel/setup_64.c
+++ b/arch/powerpc/kernel/setup_64.c
@@ -369,11 +369,12 @@ void __init early_setup(unsigned long dt_ptr)
apply_feature_fixups();
setup_feature_keys();
  
-	early_ioremap_setup();
  
  	/* Initialize the hash table or TLB handling */

early_init_mmu();
  
+	early_ioremap_setup();

+
/*
 * After firmware and early platform setup code has set things up,
 * we note the SPR values for configurable control/performance



Re: [RFC PATCH kernel] powerpc: Fix early setup to make early_ioremap work

2021-05-19 Thread Christophe Leroy




Le 20/05/2021 à 05:33, Alexey Kardashevskiy a écrit :

Hm, my thunderbird says it is not cc:'ed but git sendmail says it did cc:


Server: localhost
MAIL FROM:
RCPT TO:
RCPT TO:
RCPT TO:
RCPT TO:
From: Alexey Kardashevskiy 
To: linuxppc-dev@lists.ozlabs.org
Cc: Alexey Kardashevskiy ,
     Michael Ellerman ,
     Christophe Leroy 
Subject: [RFC PATCH kernel] powerpc: Fix early setup to make early_ioremap work


Not sure what to believe.


I got the patch.

If you are looking at the mail you received from the ppc list, I think the list removes the members 
of the list from the Cc: in order to avoid the mail being received multiple times.


Christophe




On 20/05/2021 13:29, Alexey Kardashevskiy wrote:

The immediate problem is that after
0bd3f9e953bd ("powerpc/legacy_serial: Use early_ioremap()")
the kernel silently reboots. The reason is that early_ioremap() returns
broken addresses as it uses slot_virt[] array which initialized with
offsets from FIXADDR_TOP == IOREMAP_END+FIXADDR_SIZE ==
KERN_IO_END- FIXADDR_SIZ + FIXADDR_SIZE == __kernel_io_end which is 0
when early_ioremap_setup() is called. __kernel_io_end is initialized
little bit later in early_init_mmu().

This fixes the initialization by swapping early_ioremap_setup and
early_init_mmu.

This also fixes IOREMAP_END to use FIXADDR_SIZE defined just next to it,
seems to make sense, unless there is some weird logic with redefining
FIXADDR_SIZE as the compiling goes.

Signed-off-by: Alexey Kardashevskiy 
---
  arch/powerpc/include/asm/book3s/64/pgtable.h | 2 +-
  arch/powerpc/kernel/setup_64.c   | 3 ++-
  2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h 
b/arch/powerpc/include/asm/book3s/64/pgtable.h

index a666d561b44d..54a06129794b 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -325,8 +325,8 @@ extern unsigned long pci_io_base;
  #define  PHB_IO_END    (KERN_IO_START + FULL_IO_SIZE)
  #define IOREMAP_BASE    (PHB_IO_END)
  #define IOREMAP_START    (ioremap_bot)
-#define IOREMAP_END    (KERN_IO_END - FIXADDR_SIZE)
  #define FIXADDR_SIZE    SZ_32M
+#define IOREMAP_END    (KERN_IO_END - FIXADDR_SIZE)
  /* Advertise special mapping type for AGP */
  #define HAVE_PAGE_AGP
diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
index b779d25761cf..ce09fe5debf4 100644
--- a/arch/powerpc/kernel/setup_64.c
+++ b/arch/powerpc/kernel/setup_64.c
@@ -369,11 +369,12 @@ void __init early_setup(unsigned long dt_ptr)
  apply_feature_fixups();
  setup_feature_keys();
-    early_ioremap_setup();
  /* Initialize the hash table or TLB handling */
  early_init_mmu();
+    early_ioremap_setup();
+
  /*
   * After firmware and early platform setup code has set things up,
   * we note the SPR values for configurable control/performance





Conflict between arch/powerpc/include/asm/disassemble.h and drivers/staging/rtl8723bs/include/wifi.h

2021-05-19 Thread Christophe Leroy

Hello,

I was trying to include powerpc asm/disassemble.h in some more widely used headers in order to 
reduce open coding, and I'm facing the following problem:


drivers/staging/rtl8723bs/include/wifi.h:237:30: error: conflicting types for 
'get_ra'
drivers/staging/rtl8723bs/include/wifi.h:237:30: error: conflicting types for 
'get_ra'
make[4]: *** [scripts/Makefile.build:272: 
drivers/staging/rtl8723bs/core/rtw_btcoex.o] Error 1
make[4]: *** [scripts/Makefile.build:272: 
drivers/staging/rtl8723bs/core/rtw_ap.o] Error 1
make[3]: *** [scripts/Makefile.build:515: drivers/staging/rtl8723bs] Error 2

(More details at 
http://kisskb.ellerman.id.au/kisskb/head/ee2dedcaaf3fe176e68498018632767d02639d03/)

Taking into account that asm/disassemble.h has been existing since 2008 while 
rtl8723bs/include/wifi.h was created in 2017, and that the get_ra() defined in the later is used at 
exactly one place only, would it be possible to change it there ? 
(https://elixir.bootlin.com/linux/v5.13-rc2/A/ident/get_ra)


Thanks
Christophe


Re: Linux powerpc new system call instruction and ABI

2021-05-19 Thread Nicholas Piggin
Excerpts from Dmitry V. Levin's message of May 20, 2021 1:06 pm:
> On Thu, May 20, 2021 at 12:40:36PM +1000, Nicholas Piggin wrote:
> [...]
>> > Looks like struct pt_regs.trap already contains the information that could
>> > be used to tell 'sc' from 'scv': if (pt_regs.trap & ~0xf) == 0x3000, then
>> > it's scv.  Is my reading of arch/powerpc/include/asm/ptrace.h correct?
>> 
>> Hmm, I think it is. Certainly in the kernel regs struct it is, I had in 
>> my mind that we put it to 0xc00 when populating the user struct for
>> compatibility, but it seems not. So I guess this would work.
> 
> OK, can we state that (pt_regs.trap & ~0xf) == 0x3000 is a part of the scv
> ABI, so it's not going to change and could be relied upon by userspace?
> Could this be documented in Documentation/powerpc/syscall64-abi.rst,
> please?

Yeah I think we can do that. The kernel doesn't care what is put in the
userspace pt_regs.trap too much so if this is your preferred approach
then I will document it in the ABI.

Thanks,
Nick


Re: [PATCH v14 6/9] powerpc/bpf: Write protect JIT code

2021-05-19 Thread Jordan Niethe
On Mon, May 17, 2021 at 4:40 PM Christophe Leroy
 wrote:
>
>
>
> Le 17/05/2021 à 05:28, Jordan Niethe a écrit :
> > Add the necessary call to bpf_jit_binary_lock_ro() to remove write and
> > add exec permissions to the JIT image after it has finished being
> > written.
> >
> > Without CONFIG_STRICT_MODULE_RWX the image will be writable and
> > executable until the call to bpf_jit_binary_lock_ro().
>
> And _with_ CONFIG_STRICT_MODULE_RWX what will happen ? It will be _writable_ 
> but not _executable_ ?
That's right.
With CONFIG_STRICT_MODULE_RWX the image will initially be PAGE_KERNEL
from bpf_jit_alloc_exec() calling module_alloc(). So not executable.
bpf_jit_binary_lock_ro() will then remove write and add executable.

Without CONFIG_STRICT_MODULE_RWX the image will initially be
PAGE_KERNEL_EXEC from module_alloc().
bpf_jit_binary_lock_ro() will remove write, but until that point it
will have been write + exec.
>
> >
> > Reviewed-by: Christophe Leroy 
> > Signed-off-by: Jordan Niethe 
> > ---
> > v10: New to series
> > v11: Remove CONFIG_STRICT_MODULE_RWX conditional
> > ---
> >   arch/powerpc/net/bpf_jit_comp.c | 1 +
> >   1 file changed, 1 insertion(+)
> >
> > diff --git a/arch/powerpc/net/bpf_jit_comp.c 
> > b/arch/powerpc/net/bpf_jit_comp.c
> > index 6c8c268e4fe8..53aefee3fe70 100644
> > --- a/arch/powerpc/net/bpf_jit_comp.c
> > +++ b/arch/powerpc/net/bpf_jit_comp.c
> > @@ -237,6 +237,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog 
> > *fp)
> >   fp->jited_len = alloclen;
> >
> >   bpf_flush_icache(bpf_hdr, (u8 *)bpf_hdr + (bpf_hdr->pages * 
> > PAGE_SIZE));
> > + bpf_jit_binary_lock_ro(bpf_hdr);
> >   if (!fp->is_func || extra_pass) {
> >   bpf_prog_fill_jited_linfo(fp, addrs);
> >   out_addrs:
> >


Re: [PATCH v14 7/9] powerpc: Set ARCH_HAS_STRICT_MODULE_RWX

2021-05-19 Thread Jordan Niethe
On Mon, May 17, 2021 at 4:49 PM Christophe Leroy
 wrote:
>
>
>
> Le 17/05/2021 à 05:28, Jordan Niethe a écrit :
> > From: Russell Currey 
> >
> > To enable strict module RWX on powerpc, set:
> >
> >  CONFIG_STRICT_MODULE_RWX=y
> >
> > You should also have CONFIG_STRICT_KERNEL_RWX=y set to have any real
> > security benefit.
> >
> > ARCH_HAS_STRICT_MODULE_RWX is set to require ARCH_HAS_STRICT_KERNEL_RWX.
> > This is due to a quirk in arch/Kconfig and arch/powerpc/Kconfig that
> > makes STRICT_MODULE_RWX *on by default* in configurations where
> > STRICT_KERNEL_RWX is *unavailable*.
> >
> > Since this doesn't make much sense, and module RWX without kernel RWX
> > doesn't make much sense, having the same dependencies as kernel RWX
> > works around this problem.
> >
> > Book32s/32 processors with a hash mmu (i.e. 604 core) can not set memory
>^^
>
> Book32s ==> Book3s
Thanks.
>
> > protection on a page by page basis so do not enable.
>
> It is not exactly that. The problem on 604 is for _exec_ protection.
Right.
>
> Note that on book3s/32, on both 603 and 604 core, it is not possible to write 
> protect kernel pages.
> So maybe it would make sense to disable ARCH_HAS_STRICT_MODULE_RWX on 
> CONFIG_PPC_BOOK3S_32
> completely, I'm not sure.
Yeah, that does seem like it would make sense to disable it.
>
>
> >
> > Signed-off-by: Russell Currey 
> > [jpn: - predicate on !PPC_BOOK3S_604
> >- make module_alloc() use PAGE_KERNEL protection]
> > Signed-off-by: Jordan Niethe 
>
> Reviewed-by: Christophe Leroy 
>
> > ---
> > v10: - Predicate on !PPC_BOOK3S_604
> >   - Make module_alloc() use PAGE_KERNEL protection
> > v11: - Neaten up
> > v13: Use strict_kernel_rwx_enabled()
> > v14: Make changes to module_alloc() its own commit
> > ---
> >   arch/powerpc/Kconfig | 1 +
> >   1 file changed, 1 insertion(+)
> >
> > diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
> > index cce0a137b046..cb5d9d862c35 100644
> > --- a/arch/powerpc/Kconfig
> > +++ b/arch/powerpc/Kconfig
> > @@ -140,6 +140,7 @@ config PPC
> >   select ARCH_HAS_SCALED_CPUTIME  if VIRT_CPU_ACCOUNTING_NATIVE 
> > && PPC_BOOK3S_64
> >   select ARCH_HAS_SET_MEMORY
> >   select ARCH_HAS_STRICT_KERNEL_RWX   if ((PPC_BOOK3S_64 || PPC32) 
> > && !HIBERNATION)
> > + select ARCH_HAS_STRICT_MODULE_RWX   if ARCH_HAS_STRICT_KERNEL_RWX 
> > && !PPC_BOOK3S_604
> >   select ARCH_HAS_TICK_BROADCAST  if 
> > GENERIC_CLOCKEVENTS_BROADCAST
> >   select ARCH_HAS_UACCESS_FLUSHCACHE
> >   select ARCH_HAS_COPY_MC if PPC64
> >


Re: [PATCH v8 27/30] powerpc/kprobes: Don't allow breakpoints on suffixes

2021-05-19 Thread Jordan Niethe
On Wed, May 19, 2021 at 6:11 PM Naveen N. Rao
 wrote:
>
> Christophe Leroy wrote:
> >
> >
> > Le 06/05/2020 à 05:40, Jordan Niethe a écrit :
> >> Do not allow inserting breakpoints on the suffix of a prefix instruction
> >> in kprobes.
> >>
> >> Signed-off-by: Jordan Niethe 
> >> ---
> >> v8: Add this back from v3
> >> ---
> >>   arch/powerpc/kernel/kprobes.c | 13 +
> >>   1 file changed, 13 insertions(+)
> >>
> >> diff --git a/arch/powerpc/kernel/kprobes.c b/arch/powerpc/kernel/kprobes.c
> >> index 33d54b091c70..227510df8c55 100644
> >> --- a/arch/powerpc/kernel/kprobes.c
> >> +++ b/arch/powerpc/kernel/kprobes.c
> >> @@ -106,7 +106,9 @@ kprobe_opcode_t *kprobe_lookup_name(const char *name, 
> >> unsigned int offset)
> >>   int arch_prepare_kprobe(struct kprobe *p)
> >>   {
> >>  int ret = 0;
> >> +struct kprobe *prev;
> >>  struct ppc_inst insn = ppc_inst_read((struct ppc_inst *)p->addr);
> >> +struct ppc_inst prefix = ppc_inst_read((struct ppc_inst *)(p->addr - 
> >> 1));
> >
> > What if p->addr is the first word of a page and the previous page is
> > not mapped ?
>
> Good catch!
> I think we can just skip validation if the instruction is at the
> beginning of a page. I have a few cleanups in this area - I will post a
> patchset soon.
Yeah thanks Christophe for noticing that. And thanks Naveen that
sounds like it should fix it.
>
> >
> >>
> >>  if ((unsigned long)p->addr & 0x03) {
> >>  printk("Attempt to register kprobe at an unaligned 
> >> address\n");
> >> @@ -114,6 +116,17 @@ int arch_prepare_kprobe(struct kprobe *p)
> >>  } else if (IS_MTMSRD(insn) || IS_RFID(insn) || IS_RFI(insn)) {
> >>  printk("Cannot register a kprobe on rfi/rfid or mtmsr[d]\n");
> >>  ret = -EINVAL;
> >> +} else if (ppc_inst_prefixed(prefix)) {
> >
> > If p->addr - 2 contains a valid prefixed instruction, then p->addr - 1 
> > contains the suffix of that
> > prefixed instruction. Are we sure a suffix can never ever be misinterpreted 
> > as the prefix of a
> > prefixed instruction ?
>
> Yes. Per the ISA:
>   Bits 0:5 of all prefixes are assigned the primary opcode
>   value 0b01. 0b01 is not available for use as a
>   primary opcode for either word instructions or suffixes
>   of prefixed instructions.
Yep, a prefix will never be a valid word instruction or suffix.
>
>
> - Naveen
>


Re: [PATCH v3 0/4] P2040/P2041 i2c recovery erratum

2021-05-19 Thread Chris Packham


On 13/05/21 3:01 am, w...@kernel.org wrote:
>>> I've been doing my recent work with a P2040 and prior to that I did test
>>> out the recovery on a T2081 (which isn't documented to have this
>>> erratum) when I was re-working the driver. The "new" recovery actually
>>> seems better but I don't have a reliably faulty i2c device so that's
>>> only based on me writing some code to manually trigger the recovery
>>> (using the snippet below) and observing it with an oscilloscope.
>> You don't need a faulty device, just an aborted I2C read/write op.
> If you can wire GPIOs to the bus, you can use the I2C fault injector:
>
>   Documentation/i2c/gpio-fault-injection.rst
>
> There are already two "incomplete transfer" injectors.
>
Just giving this thread a poke. I have been looking at my options for 
triggering an i2c recovery but haven't really had time to do much. I 
think the best option given what I've got access to is a modified SFP 
that grounds the SDA line but I need to find a system where I can attach 
an oscilloscope (should be a few of these in the office when I can get 
on-site).

I can confirm that when manually triggered the existing recovery and the 
new erratum workaround produce what I'd expect to observe on an 
oscilloscope.

I haven't explored Joakim's alternative recovery but I don't think that 
should hold up these changes, any improvement to the existing recovery 
can be done later as a follow-up.


Re: [RFC PATCH kernel] powerpc: Fix early setup to make early_ioremap work

2021-05-19 Thread Alexey Kardashevskiy

Hm, my thunderbird says it is not cc:'ed but git sendmail says it did cc:


Server: localhost
MAIL FROM:
RCPT TO:
RCPT TO:
RCPT TO:
RCPT TO:
From: Alexey Kardashevskiy 
To: linuxppc-dev@lists.ozlabs.org
Cc: Alexey Kardashevskiy ,
Michael Ellerman ,
Christophe Leroy 
Subject: [RFC PATCH kernel] powerpc: Fix early setup to make 
early_ioremap work



Not sure what to believe.


On 20/05/2021 13:29, Alexey Kardashevskiy wrote:

The immediate problem is that after
0bd3f9e953bd ("powerpc/legacy_serial: Use early_ioremap()")
the kernel silently reboots. The reason is that early_ioremap() returns
broken addresses as it uses slot_virt[] array which initialized with
offsets from FIXADDR_TOP == IOREMAP_END+FIXADDR_SIZE ==
KERN_IO_END- FIXADDR_SIZ + FIXADDR_SIZE == __kernel_io_end which is 0
when early_ioremap_setup() is called. __kernel_io_end is initialized
little bit later in early_init_mmu().

This fixes the initialization by swapping early_ioremap_setup and
early_init_mmu.

This also fixes IOREMAP_END to use FIXADDR_SIZE defined just next to it,
seems to make sense, unless there is some weird logic with redefining
FIXADDR_SIZE as the compiling goes.

Signed-off-by: Alexey Kardashevskiy 
---
  arch/powerpc/include/asm/book3s/64/pgtable.h | 2 +-
  arch/powerpc/kernel/setup_64.c   | 3 ++-
  2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h 
b/arch/powerpc/include/asm/book3s/64/pgtable.h
index a666d561b44d..54a06129794b 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -325,8 +325,8 @@ extern unsigned long pci_io_base;
  #define  PHB_IO_END   (KERN_IO_START + FULL_IO_SIZE)
  #define IOREMAP_BASE  (PHB_IO_END)
  #define IOREMAP_START (ioremap_bot)
-#define IOREMAP_END(KERN_IO_END - FIXADDR_SIZE)
  #define FIXADDR_SIZE  SZ_32M
+#define IOREMAP_END(KERN_IO_END - FIXADDR_SIZE)
  
  /* Advertise special mapping type for AGP */

  #define HAVE_PAGE_AGP
diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
index b779d25761cf..ce09fe5debf4 100644
--- a/arch/powerpc/kernel/setup_64.c
+++ b/arch/powerpc/kernel/setup_64.c
@@ -369,11 +369,12 @@ void __init early_setup(unsigned long dt_ptr)
apply_feature_fixups();
setup_feature_keys();
  
-	early_ioremap_setup();
  
  	/* Initialize the hash table or TLB handling */

early_init_mmu();
  
+	early_ioremap_setup();

+
/*
 * After firmware and early platform setup code has set things up,
 * we note the SPR values for configurable control/performance



--
Alexey


[RFC PATCH kernel] powerpc: Fix early setup to make early_ioremap work

2021-05-19 Thread Alexey Kardashevskiy
The immediate problem is that after
0bd3f9e953bd ("powerpc/legacy_serial: Use early_ioremap()")
the kernel silently reboots. The reason is that early_ioremap() returns
broken addresses as it uses slot_virt[] array which initialized with
offsets from FIXADDR_TOP == IOREMAP_END+FIXADDR_SIZE ==
KERN_IO_END- FIXADDR_SIZ + FIXADDR_SIZE == __kernel_io_end which is 0
when early_ioremap_setup() is called. __kernel_io_end is initialized
little bit later in early_init_mmu().

This fixes the initialization by swapping early_ioremap_setup and
early_init_mmu.

This also fixes IOREMAP_END to use FIXADDR_SIZE defined just next to it,
seems to make sense, unless there is some weird logic with redefining
FIXADDR_SIZE as the compiling goes.

Signed-off-by: Alexey Kardashevskiy 
---
 arch/powerpc/include/asm/book3s/64/pgtable.h | 2 +-
 arch/powerpc/kernel/setup_64.c   | 3 ++-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h 
b/arch/powerpc/include/asm/book3s/64/pgtable.h
index a666d561b44d..54a06129794b 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -325,8 +325,8 @@ extern unsigned long pci_io_base;
 #define  PHB_IO_END(KERN_IO_START + FULL_IO_SIZE)
 #define IOREMAP_BASE   (PHB_IO_END)
 #define IOREMAP_START  (ioremap_bot)
-#define IOREMAP_END(KERN_IO_END - FIXADDR_SIZE)
 #define FIXADDR_SIZE   SZ_32M
+#define IOREMAP_END(KERN_IO_END - FIXADDR_SIZE)
 
 /* Advertise special mapping type for AGP */
 #define HAVE_PAGE_AGP
diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
index b779d25761cf..ce09fe5debf4 100644
--- a/arch/powerpc/kernel/setup_64.c
+++ b/arch/powerpc/kernel/setup_64.c
@@ -369,11 +369,12 @@ void __init early_setup(unsigned long dt_ptr)
apply_feature_fixups();
setup_feature_keys();
 
-   early_ioremap_setup();
 
/* Initialize the hash table or TLB handling */
early_init_mmu();
 
+   early_ioremap_setup();
+
/*
 * After firmware and early platform setup code has set things up,
 * we note the SPR values for configurable control/performance
-- 
2.30.2



Re: Linux powerpc new system call instruction and ABI

2021-05-19 Thread Dmitry V. Levin
On Thu, May 20, 2021 at 12:40:36PM +1000, Nicholas Piggin wrote:
[...]
> > Looks like struct pt_regs.trap already contains the information that could
> > be used to tell 'sc' from 'scv': if (pt_regs.trap & ~0xf) == 0x3000, then
> > it's scv.  Is my reading of arch/powerpc/include/asm/ptrace.h correct?
> 
> Hmm, I think it is. Certainly in the kernel regs struct it is, I had in 
> my mind that we put it to 0xc00 when populating the user struct for
> compatibility, but it seems not. So I guess this would work.

OK, can we state that (pt_regs.trap & ~0xf) == 0x3000 is a part of the scv
ABI, so it's not going to change and could be relied upon by userspace?
Could this be documented in Documentation/powerpc/syscall64-abi.rst,
please?


-- 
ldv


Re: Linux powerpc new system call instruction and ABI

2021-05-19 Thread Dmitry V. Levin
On Thu, May 20, 2021 at 12:45:57PM +1000, Nicholas Piggin wrote:
> Excerpts from Dmitry V. Levin's message of May 20, 2021 11:06 am:
> > On Wed, May 19, 2021 at 07:48:47PM -0400, Rich Felker wrote:
> >> On Wed, May 19, 2021 at 06:09:25PM +, Joakim Tjernlund wrote:
> > [...]
> >> > W.r.t breaking ABI, isn't that what PowerPC is trying to do with the new 
> >> > syscall I/F? 
> >> 
> >> No, it's a new independent interface.
> > 
> > Unfortunately, being a new independent interface doesn't mean it isn't
> > an ABI break.  In fact, it was a severe ABI break, and this thread is
> > an attempt to find a hotfix.
> 
> It is an ABI break, that was known. The ptrace info stuff I fixed with 
> the patch earlier was obviously a bug in my initial implementation and 
> not intended (sorry my ptrace testing was not sufficient, and thanks for
> reporting it, by the way).

Could you check whether tools/testing/selftests/ptrace/get_syscall_info.c
passes again with your fix, please?
If yes, then PTRACE_GET_SYSCALL_INFO is fixed.

By the way, kernel tracing and audit subsystems also use those functions
from asm/syscall.h and asm/ptrace.h, so your ptrace fix is likely to fix
these subsystems as well.


-- 
ldv


Re: Linux powerpc new system call instruction and ABI

2021-05-19 Thread Nicholas Piggin
Excerpts from Dmitry V. Levin's message of May 20, 2021 11:06 am:
> On Wed, May 19, 2021 at 07:48:47PM -0400, Rich Felker wrote:
>> On Wed, May 19, 2021 at 06:09:25PM +, Joakim Tjernlund wrote:
> [...]
>> > W.r.t breaking ABI, isn't that what PowerPC is trying to do with the new 
>> > syscall I/F? 
>> 
>> No, it's a new independent interface.
> 
> Unfortunately, being a new independent interface doesn't mean it isn't
> an ABI break.  In fact, it was a severe ABI break, and this thread is
> an attempt to find a hotfix.

It is an ABI break, that was known. The ptrace info stuff I fixed with 
the patch earlier was obviously a bug in my initial implementation and 
not intended (sorry my ptrace testing was not sufficient, and thanks for
reporting it, by the way).

But the register ABI was always a known break. The issue is that rfscv
clobbers LR, so it can not support the old ABI. If the old ABI did not 
preserve LR, then we may have chosen to not change register ABI at all.

Thanks,
Nick


Re: Linux powerpc new system call instruction and ABI

2021-05-19 Thread Nicholas Piggin
Excerpts from Dmitry V. Levin's message of May 20, 2021 9:27 am:
> On Thu, May 20, 2021 at 08:51:53AM +1000, Nicholas Piggin wrote:
>> Excerpts from Dmitry V. Levin's message of May 19, 2021 11:26 pm:
>> > On Wed, May 19, 2021 at 08:59:05PM +1000, Nicholas Piggin wrote:
>> >> Excerpts from Dmitry V. Levin's message of May 19, 2021 8:24 pm:
>> >> > On Wed, May 19, 2021 at 12:50:24PM +1000, Nicholas Piggin wrote:
>> >> > [...]
>> >> >> With this patch, I think the ptrace ABI should mostly be fixed. I 
>> >> >> think 
>> >> >> a problem remains with applications that look at system call return 
>> >> >> registers directly and have powerpc specific error cases. Those 
>> >> >> probably
>> >> >> will just need to be updated unfortunately. Michael thought it might be
>> >> >> possible to return an indication via ptrace somehow that the syscall is
>> >> >> using a new ABI, so such apps can be updated to test for it. I don't 
>> >> >> know how that would be done.
>> >> > 
>> >> > Is there any sane way for these applications to handle the scv case?
>> >> > How can they tell that the scv semantics is being used for the given
>> >> > syscall invocation?  Can this information be obtained e.g. from struct
>> >> > pt_regs?
>> >> 
>> >> Not that I know of. Michael suggested there might be a way to add 
>> >> something. ptrace_syscall_info has some pad bytes, could
>> >> we use one for flags bits and set a bit for "new system call ABI"?
>> > 
>> > PTRACE_GET_SYSCALL_INFO is an architecture-agnostic API, it hides all
>> > architecture-specific details behind struct ptrace_syscall_info which has
>> > the same meaning on all architectures.  ptrace_syscall_info.exit contains
>> > both rval and is_error fields to support every architecture regardless of
>> > its syscall ABI.
>> > 
>> > ptrace_syscall_info.exit is extensible, but every architecture would have
>> > to define a method of telling whether the system call follows the "new
>> > system call ABI" conventions to export this bit of information.
>> 
>> It's already architecture speicfic if you look at registers of syscall 
>> exit state so I don't see a problem with a flag that ppc can use for
>> ABI.
> 
> To be honest, I don't see anything architecture-specific in
> PTRACE_GET_SYSCALL_INFO API.  Yes, it's implementation uses various
> functions defined in asm/syscall.h, but this doesn't make the interface
> architecture-specific.

No. But a field or flag it exports could be architecture dependent.
It doesn't detract independence from the rest of the ABI. That said...

> PTRACE_GET_SYSCALL_INFO saves its users from necessity to be aware of
> tracee registers.  That's why the only place where strace has to deal
> with tracee registers nowadays is syscall tampering.  The most reliable
> solution is to introduce PTRACE_SET_SYSCALL_INFO, this would make the
> whole syscall abi issue irrelevant for ptracers, maybe the time has come
> to implement it.
> 
> Unfortunately, extending ptrace API takes time, and it's not going to be
> backported to older kernels anyway, but scv-enabled kernels are already
> in the wild, so we need a quick powerpc-specific fix that would be
> backported to all maintained scv-enabled kernels.
> 
> [...]
>> > I wonder why can't this information be just exported to the tracer via
>> > struct pt_regs?
>> 
>> It might be able to, I don't see why that would be superior though.
>> 
>> Where could you put it... I guess it could go in the trap field in a 
>> high bit. But could that break things that just test for syscall 
>> trap number (and don't care about register ABI)? I'm not sure.
> 
> Looks like struct pt_regs.trap already contains the information that could
> be used to tell 'sc' from 'scv': if (pt_regs.trap & ~0xf) == 0x3000, then
> it's scv.  Is my reading of arch/powerpc/include/asm/ptrace.h correct?

Hmm, I think it is. Certainly in the kernel regs struct it is, I had in 
my mind that we put it to 0xc00 when populating the user struct for
compatibility, but it seems not. So I guess this would work.

Thanks,
Nick


Re: [PATCH v5 3/9] mm/mremap: Use pmd/pud_poplulate to update page table entries

2021-05-19 Thread Peter Xu
On Wed, May 19, 2021 at 10:16:07AM +0530, Aneesh Kumar K.V wrote:
> > On Thu, Apr 22, 2021 at 11:13:17AM +0530, Aneesh Kumar K.V wrote:
> >> pmd/pud_populate is the right interface to be used to set the respective
> >> page table entries. Some architectures like ppc64 do assume that 
> >> set_pmd/pud_at
> >> can only be used to set a hugepage PTE. Since we are not setting up a 
> >> hugepage
> >> PTE here, use the pmd/pud_populate interface.

[1]

> Can you try this change?
> 
> modified   mm/mremap.c
> @@ -279,7 +279,7 @@ static bool move_normal_pmd(struct vm_area_struct *vma, 
> unsigned long old_addr,
>   pmd_clear(old_pmd);
>  
>   VM_BUG_ON(!pmd_none(*new_pmd));
> - pmd_populate(mm, new_pmd, (pgtable_t)pmd_page_vaddr(pmd));
> + pmd_populate(mm, new_pmd, pmd_pgtable(pmd));
>  
>   if (new_ptl != old_ptl)
>   spin_unlock(new_ptl);

I reported this issue today somewhere else:

https://lore.kernel.org/linux-mm/YKVemB5DuSqLFmmz@t490s/

And came to this same line after the bisection.

This seems to work at least for my userfaultfd test on shmem, however I don't
fully understand the commit message [1] on: How do we guarantee we're not
moving a thp pte?

-- 
Peter Xu



Re: Linux powerpc new system call instruction and ABI

2021-05-19 Thread Dmitry V. Levin
On Wed, May 19, 2021 at 07:48:47PM -0400, Rich Felker wrote:
> On Wed, May 19, 2021 at 06:09:25PM +, Joakim Tjernlund wrote:
[...]
> > W.r.t breaking ABI, isn't that what PowerPC is trying to do with the new 
> > syscall I/F? 
> 
> No, it's a new independent interface.

Unfortunately, being a new independent interface doesn't mean it isn't
an ABI break.  In fact, it was a severe ABI break, and this thread is
an attempt to find a hotfix.


-- 
ldv


Re: [musl] Re: Linux powerpc new system call instruction and ABI

2021-05-19 Thread Rich Felker
On Wed, May 19, 2021 at 06:09:25PM +, Joakim Tjernlund wrote:
> On Wed, 2021-05-19 at 10:22 -0500, Segher Boessenkool wrote:
> > On Wed, May 19, 2021 at 03:06:49PM +, Joakim Tjernlund wrote:
> > > On Wed, 2021-05-19 at 09:38 -0500, Segher Boessenkool wrote:
> > > > On Wed, May 19, 2021 at 06:42:40PM +1000, Nicholas Piggin wrote:
> > > > > Excerpts from Joakim Tjernlund's message of May 19, 2021 6:08 pm:
> > > > > > I always figured the ppc way was superior. It begs the question if 
> > > > > > not the other archs should
> > > > > > change instead?
> > > > > 
> > > > > It is superior in some ways, not enough to be worth being different.
> > > > 
> > > > The PowerPC syscall ABI *requires* using cr0.3 for indicating errors,
> > > > you will have to do that whether you conflate the concepts of return
> > > > code and error indicator or not!
> > > > 
> > > > > Other archs are unlikely to change because it would be painful for
> > > > > not much benefit.
> > > > 
> > > > Other archs cannot easily change for much the same reason :-)
> > > 
> > > Really? I figured you could just add extra error indication in kernel 
> > > syscall I/F.
> > > Eventually user space could migrate to the new indication.
> > 
> > You seem to assume all user space uses glibc, or *any* libc even?  This
> > is false.  Some programs do system calls directly.  Do not break the
> > kernel ABI :-)
> 
> Adding an extra indicator does not break ABI, does it ?

It does, because the old ABI on most archs has no clobbers except the
return value register. Some archs though have additional
syscall-clobbered regs that could be repurposed as extra return
values, but you could only rely on them being meaningful after probing
for a kernel that speaks the new variant. This just makes things more
complicated, not more useful.

> W.r.t breaking ABI, isn't that what PowerPC is trying to do with the new 
> syscall I/F? 

No, it's a new independent interface.

Rich


Re: Linux powerpc new system call instruction and ABI

2021-05-19 Thread Dmitry V. Levin
On Thu, May 20, 2021 at 08:51:53AM +1000, Nicholas Piggin wrote:
> Excerpts from Dmitry V. Levin's message of May 19, 2021 11:26 pm:
> > On Wed, May 19, 2021 at 08:59:05PM +1000, Nicholas Piggin wrote:
> >> Excerpts from Dmitry V. Levin's message of May 19, 2021 8:24 pm:
> >> > On Wed, May 19, 2021 at 12:50:24PM +1000, Nicholas Piggin wrote:
> >> > [...]
> >> >> With this patch, I think the ptrace ABI should mostly be fixed. I think 
> >> >> a problem remains with applications that look at system call return 
> >> >> registers directly and have powerpc specific error cases. Those probably
> >> >> will just need to be updated unfortunately. Michael thought it might be
> >> >> possible to return an indication via ptrace somehow that the syscall is
> >> >> using a new ABI, so such apps can be updated to test for it. I don't 
> >> >> know how that would be done.
> >> > 
> >> > Is there any sane way for these applications to handle the scv case?
> >> > How can they tell that the scv semantics is being used for the given
> >> > syscall invocation?  Can this information be obtained e.g. from struct
> >> > pt_regs?
> >> 
> >> Not that I know of. Michael suggested there might be a way to add 
> >> something. ptrace_syscall_info has some pad bytes, could
> >> we use one for flags bits and set a bit for "new system call ABI"?
> > 
> > PTRACE_GET_SYSCALL_INFO is an architecture-agnostic API, it hides all
> > architecture-specific details behind struct ptrace_syscall_info which has
> > the same meaning on all architectures.  ptrace_syscall_info.exit contains
> > both rval and is_error fields to support every architecture regardless of
> > its syscall ABI.
> > 
> > ptrace_syscall_info.exit is extensible, but every architecture would have
> > to define a method of telling whether the system call follows the "new
> > system call ABI" conventions to export this bit of information.
> 
> It's already architecture speicfic if you look at registers of syscall 
> exit state so I don't see a problem with a flag that ppc can use for
> ABI.

To be honest, I don't see anything architecture-specific in
PTRACE_GET_SYSCALL_INFO API.  Yes, it's implementation uses various
functions defined in asm/syscall.h, but this doesn't make the interface
architecture-specific.

PTRACE_GET_SYSCALL_INFO saves its users from necessity to be aware of
tracee registers.  That's why the only place where strace has to deal
with tracee registers nowadays is syscall tampering.  The most reliable
solution is to introduce PTRACE_SET_SYSCALL_INFO, this would make the
whole syscall abi issue irrelevant for ptracers, maybe the time has come
to implement it.

Unfortunately, extending ptrace API takes time, and it's not going to be
backported to older kernels anyway, but scv-enabled kernels are already
in the wild, so we need a quick powerpc-specific fix that would be
backported to all maintained scv-enabled kernels.

[...]
> > I wonder why can't this information be just exported to the tracer via
> > struct pt_regs?
> 
> It might be able to, I don't see why that would be superior though.
> 
> Where could you put it... I guess it could go in the trap field in a 
> high bit. But could that break things that just test for syscall 
> trap number (and don't care about register ABI)? I'm not sure.

Looks like struct pt_regs.trap already contains the information that could
be used to tell 'sc' from 'scv': if (pt_regs.trap & ~0xf) == 0x3000, then
it's scv.  Is my reading of arch/powerpc/include/asm/ptrace.h correct?


-- 
ldv


Re: [FSL P50x0] KVM HV doesn't work anymore

2021-05-19 Thread Nicholas Piggin
Excerpts from Christian Zigotzky's message of May 19, 2021 9:52 pm:
> On 19 May 2021 at 09:57 am, Nicholas Piggin wrote:
>> Excerpts from Christian Zigotzky's message of May 17, 2021 7:42 pm:
>>> On 17 May 2021 at 09:42am, Nicholas Piggin wrote:
 Excerpts from Christian Zigotzky's message of May 15, 2021 11:46 pm:
> I tried it but it doesn't solve the issue. The uImage works without 
> KVM
> HV in a virtual e5500 QEMU machine.
 Any more progress with this? I would say that bisect might have just
 been a bit unstable and maybe by chance some things did not crash so
 it's pointing to the wrong patch.

 Upstream merge of powerpc-5.13-1 was good and powerpc-5.13-2 was bad?

 Between that looks like some KVM MMU rework. You could try the patch
 before this one b1c5356e873c ("KVM: PPC: Convert to the gfn-based MMU
 notifier callbacks"). That won't revert cleanly so just try run the
 tree at that point. If it works, test the patch and see if it fails.

 Thanks,
 Nick
>>> Hi Nick,
>>>
>>> Thanks a lot for your answer. Yes, there is a little bit of progress.
>>> The RC2 of kernel 5.13 successfully boots with -smp 3 in a virtual e5500
>>> QEMU machine.
>>> -smp 4 doesn't work anymore since the PowerPC updates 5.13-2. I used
>>> -smp 4 before 5.13 because my FSL P5040 machine has 4 cores.
>>>
>>> Could you please post a patch for reverting the commit before
>>> b1c5356e873c ("KVM: PPC: Convert to the gfn-based MMU notifier callbacks")?
>> You could `git checkout b1c5356e873c~1`
>>
>> Thanks,
>> Nick
> Hi Nick,
> 
> Thanks for your answer. I checked out the commit b1c5356e873c~1 (HEAD is 
> now at d923ff258423 KVM: MIPS/MMU: Convert to the gfn-based MMU notifier 
> callbacks).
> The kernel boots with '-smp 4' without any problems.
> After that I patched with the probable first bad commit (KVM: PPC: 
> Convert to the gfn-based MMU notifier callbacks). The kernel also boots 
> with this patch. That means, this isn't the first bad commit.
> Further information: 
> https://forum.hyperion-entertainment.com/viewtopic.php?p=53267#p53267

Hmm, okay that probably rules out those notifier changes then.

Can you remind me were you able to rule these out as suspects?

8f6cc75a97d1 powerpc: move norestart trap flag to bit 0
8dc7f0229b78 powerpc: remove partial register save logic
c45ba4f44f6b powerpc: clean up do_page_fault
d738ee8d56de powerpc/64e/interrupt: handle bad_page_fault in C
ceff77efa4f8 powerpc/64e/interrupt: Use new interrupt context tracking scheme
097157e16cf8 powerpc/64e/interrupt: reconcile irq soft-mask state in C
3db8aa10de9a powerpc/64e/interrupt: NMI save irq soft-mask state in C
0c2472de23ae powerpc/64e/interrupt: use new interrupt return
dc6231821a14 powerpc/interrupt: update common interrupt code for
4228b2c3d20e powerpc/64e/interrupt: always save nvgprs on interrupt
5a5a893c4ad8 powerpc/syscall: switch user_exit_irqoff and trace_hardirqs_off 
order

Thanks,
Nick


Re: Linux powerpc new system call instruction and ABI

2021-05-19 Thread Nicholas Piggin
Excerpts from Dmitry V. Levin's message of May 19, 2021 11:26 pm:
> On Wed, May 19, 2021 at 08:59:05PM +1000, Nicholas Piggin wrote:
>> Excerpts from Dmitry V. Levin's message of May 19, 2021 8:24 pm:
>> > On Wed, May 19, 2021 at 12:50:24PM +1000, Nicholas Piggin wrote:
>> > [...]
>> >> With this patch, I think the ptrace ABI should mostly be fixed. I think 
>> >> a problem remains with applications that look at system call return 
>> >> registers directly and have powerpc specific error cases. Those probably
>> >> will just need to be updated unfortunately. Michael thought it might be
>> >> possible to return an indication via ptrace somehow that the syscall is
>> >> using a new ABI, so such apps can be updated to test for it. I don't 
>> >> know how that would be done.
>> > 
>> > Is there any sane way for these applications to handle the scv case?
>> > How can they tell that the scv semantics is being used for the given
>> > syscall invocation?  Can this information be obtained e.g. from struct
>> > pt_regs?
>> 
>> Not that I know of. Michael suggested there might be a way to add 
>> something. ptrace_syscall_info has some pad bytes, could
>> we use one for flags bits and set a bit for "new system call ABI"?
> 
> PTRACE_GET_SYSCALL_INFO is an architecture-agnostic API, it hides all
> architecture-specific details behind struct ptrace_syscall_info which has
> the same meaning on all architectures.  ptrace_syscall_info.exit contains
> both rval and is_error fields to support every architecture regardless of
> its syscall ABI.
> 
> ptrace_syscall_info.exit is extensible, but every architecture would have
> to define a method of telling whether the system call follows the "new
> system call ABI" conventions to export this bit of information.

It's already architecture speicfic if you look at registers of syscall 
exit state so I don't see a problem with a flag that ppc can use for
ABI.

> 
> This essentially means implementing something like
> static inline long syscall_get_error_abi(struct task_struct *task, struct 
> pt_regs *regs)
> for every architecture, and using it along with syscall_get_error
> in ptrace_get_syscall_info_exit to initialize the new field in
> ptrace_syscall_info.exit structure.

Yes this could work. Other architectures can just use a generic
implementation if they don't define their own so that's easy. And
in userspace they can continue to ignore the flag.

> 
>> As a more hacky thing you could make a syscall with -1 and see how
>> the error looks, and then assume all syscalls will be the same.
> 
> This would be very unreliable because sc and scv are allowed to intermingle,
> so every syscall invocation can follow any of these two error handling
> conventions.
> 
>> Or... is it possible at syscall entry to peek the address of
>> the instruction which caused the call and see if that was a
>> scv instruction? That would be about as reliable as possible
>> without having that new flag bit.
> 
> No other architecture requires peeking into tracee memory just to find out
> the syscall ABI.  This would make powerpc the most ugly architecture for
> ptracing.
> 
> I wonder why can't this information be just exported to the tracer via
> struct pt_regs?

It might be able to, I don't see why that would be superior though.

Where could you put it... I guess it could go in the trap field in a 
high bit. But could that break things that just test for syscall 
trap number (and don't care about register ABI)? I'm not sure.

Thanks,
Nick


[PATCH v4 2/3] audit: add support for the openat2 syscall

2021-05-19 Thread Richard Guy Briggs
The openat2(2) syscall was added in kernel v5.6 with commit fddb5d430ad9
("open: introduce openat2(2) syscall")

Add the openat2(2) syscall to the audit syscall classifier.

Link: https://github.com/linux-audit/audit-kernel/issues/67
Signed-off-by: Richard Guy Briggs 
Link: 
https://lore.kernel.org/r/f5f1a4d8699613f8c02ce762807228c841c2e26f.1621363275.git@redhat.com
---
 arch/alpha/kernel/audit.c   | 2 ++
 arch/ia64/kernel/audit.c| 2 ++
 arch/parisc/kernel/audit.c  | 2 ++
 arch/parisc/kernel/compat_audit.c   | 2 ++
 arch/powerpc/kernel/audit.c | 2 ++
 arch/powerpc/kernel/compat_audit.c  | 2 ++
 arch/s390/kernel/audit.c| 2 ++
 arch/s390/kernel/compat_audit.c | 2 ++
 arch/sparc/kernel/audit.c   | 2 ++
 arch/sparc/kernel/compat_audit.c| 2 ++
 arch/x86/ia32/audit.c   | 2 ++
 arch/x86/kernel/audit_64.c  | 2 ++
 include/linux/auditsc_classmacros.h | 1 +
 kernel/auditsc.c| 3 +++
 lib/audit.c | 4 
 lib/compat_audit.c  | 4 
 16 files changed, 36 insertions(+)

diff --git a/arch/alpha/kernel/audit.c b/arch/alpha/kernel/audit.c
index 81cbd804e375..3ab04709784a 100644
--- a/arch/alpha/kernel/audit.c
+++ b/arch/alpha/kernel/audit.c
@@ -42,6 +42,8 @@ int audit_classify_syscall(int abi, unsigned syscall)
return AUDITSC_OPENAT;
case __NR_execve:
return AUDITSC_EXECVE;
+   case __NR_openat2:
+   return AUDITSC_OPENAT2;
default:
return AUDITSC_NATIVE;
}
diff --git a/arch/ia64/kernel/audit.c b/arch/ia64/kernel/audit.c
index dba6a74c9ab3..ec61f20ca61f 100644
--- a/arch/ia64/kernel/audit.c
+++ b/arch/ia64/kernel/audit.c
@@ -43,6 +43,8 @@ int audit_classify_syscall(int abi, unsigned syscall)
return AUDITSC_OPENAT;
case __NR_execve:
return AUDITSC_EXECVE;
+   case __NR_openat2:
+   return AUDITSC_OPENAT2;
default:
return AUDITSC_NATIVE;
}
diff --git a/arch/parisc/kernel/audit.c b/arch/parisc/kernel/audit.c
index 14244e83db75..f420b5552140 100644
--- a/arch/parisc/kernel/audit.c
+++ b/arch/parisc/kernel/audit.c
@@ -52,6 +52,8 @@ int audit_classify_syscall(int abi, unsigned syscall)
return AUDITSC_OPENAT;
case __NR_execve:
return AUDITSC_EXECVE;
+   case __NR_openat2:
+   return AUDITSC_OPENAT2;
default:
return AUDITSC_NATIVE;
}
diff --git a/arch/parisc/kernel/compat_audit.c 
b/arch/parisc/kernel/compat_audit.c
index 1d6347d37d92..3ec490c28656 100644
--- a/arch/parisc/kernel/compat_audit.c
+++ b/arch/parisc/kernel/compat_audit.c
@@ -36,6 +36,8 @@ int parisc32_classify_syscall(unsigned syscall)
return AUDITSC_OPENAT;
case __NR_execve:
return AUDITSC_EXECVE;
+   case __NR_openat2:
+   return AUDITSC_OPENAT2;
default:
return AUDITSC_COMPAT;
}
diff --git a/arch/powerpc/kernel/audit.c b/arch/powerpc/kernel/audit.c
index 6eb18ef77dff..1bcfca5fdf67 100644
--- a/arch/powerpc/kernel/audit.c
+++ b/arch/powerpc/kernel/audit.c
@@ -54,6 +54,8 @@ int audit_classify_syscall(int abi, unsigned syscall)
return AUDITSC_SOCKETCALL;
case __NR_execve:
return AUDITSC_EXECVE;
+   case __NR_openat2:
+   return AUDITSC_OPENAT2;
default:
return AUDITSC_NATIVE;
}
diff --git a/arch/powerpc/kernel/compat_audit.c 
b/arch/powerpc/kernel/compat_audit.c
index b1dc2d1c4bad..251abf79d536 100644
--- a/arch/powerpc/kernel/compat_audit.c
+++ b/arch/powerpc/kernel/compat_audit.c
@@ -39,6 +39,8 @@ int ppc32_classify_syscall(unsigned syscall)
return AUDITSC_SOCKETCALL;
case __NR_execve:
return AUDITSC_EXECVE;
+   case __NR_openat2:
+   return AUDITSC_OPENAT2;
default:
return AUDITSC_COMPAT;
}
diff --git a/arch/s390/kernel/audit.c b/arch/s390/kernel/audit.c
index 7e331e1831d4..02051a596b87 100644
--- a/arch/s390/kernel/audit.c
+++ b/arch/s390/kernel/audit.c
@@ -54,6 +54,8 @@ int audit_classify_syscall(int abi, unsigned syscall)
return AUDITSC_SOCKETCALL;
case __NR_execve:
return AUDITSC_EXECVE;
+   case __NR_openat2:
+   return AUDITSC_OPENAT2;
default:
return AUDITSC_NATIVE;
}
diff --git a/arch/s390/kernel/compat_audit.c b/arch/s390/kernel/compat_audit.c
index fc3d1c7ad21c..4b3d463e7d97 100644
--- a/arch/s390/kernel/compat_audit.c
+++ b/arch/s390/kernel/compat_audit.c
@@ -40,6 +40,8 @@ int s390_classify_syscall(unsigned syscall)
return AUDITSC_SOCKETCALL;
case __NR_execve:
return AUDITSC_EXECVE;
+   case __NR_openat2:
+   return AUDITSC_OPENAT2;

[PATCH v4 0/3] audit: add support for openat2

2021-05-19 Thread Richard Guy Briggs
The openat2(2) syscall was added in v5.6.  Add support for openat2 to the
audit syscall classifier and for recording openat2 parameters that cannot
be captured in the syscall parameters of the SYSCALL record.

Supporting userspace code can be found in
https://github.com/rgbriggs/audit-userspace/tree/ghau-openat2

Supporting test case can be found in
https://github.com/linux-audit/audit-testsuite/pull/103

Changelog:
v4:
- change filename include/linux/auditscm.h to auditsc_classmacros.h to avoid 
socket association

v3:
- re-add commit descriptions that somehow got dropped
- add new file to MAINTAINERS

v2:
- add include/linux/auditscm.h for audit syscall class macros due to syscall 
redefinition warnings:
arch/x86/ia32/audit.c:3:
./include/linux/audit.h:12,
./include/linux/sched.h:22,
./include/linux/seccomp.h:21,
./arch/x86/include/asm/seccomp.h:5,
./arch/x86/include/asm/unistd.h:20,
./arch/x86/include/generated/uapi/asm/unistd_64.h:4: warning: 
"__NR_read" redefined #define __NR_read 0
...
./arch/x86/include/generated/uapi/asm/unistd_64.h:338: warning: 
"__NR_rseq" redefined #define __NR_rseq 334
previous:
arch/x86/ia32/audit.c:2:
./arch/x86/include/generated/uapi/asm/unistd_32.h:7: note: this is the 
location of the previous definition #define __NR_read 3 
 
...
./arch/x86/include/generated/uapi/asm/unistd_32.h:386: note: this is 
the location of the previous definition #define __NR_rseq 386

Richard Guy Briggs (3):
  audit: replace magic audit syscall class numbers with macros
  audit: add support for the openat2 syscall
  audit: add OPENAT2 record to list how

 MAINTAINERS |  1 +
 arch/alpha/kernel/audit.c   | 10 ++
 arch/ia64/kernel/audit.c| 10 ++
 arch/parisc/kernel/audit.c  | 10 ++
 arch/parisc/kernel/compat_audit.c   | 11 ++
 arch/powerpc/kernel/audit.c | 12 ++-
 arch/powerpc/kernel/compat_audit.c  | 13 +++-
 arch/s390/kernel/audit.c| 12 ++-
 arch/s390/kernel/compat_audit.c | 13 +++-
 arch/sparc/kernel/audit.c   | 12 ++-
 arch/sparc/kernel/compat_audit.c| 13 +++-
 arch/x86/ia32/audit.c   | 13 +++-
 arch/x86/kernel/audit_64.c  | 10 ++
 fs/open.c   |  2 ++
 include/linux/audit.h   | 11 ++
 include/linux/auditsc_classmacros.h | 24 ++
 include/uapi/linux/audit.h  |  1 +
 kernel/audit.h  |  2 ++
 kernel/auditsc.c| 31 +++--
 lib/audit.c | 14 -
 lib/compat_audit.c  | 15 +-
 21 files changed, 169 insertions(+), 71 deletions(-)
 create mode 100644 include/linux/auditsc_classmacros.h

-- 
2.27.0



[PATCH v4 1/3] audit: replace magic audit syscall class numbers with macros

2021-05-19 Thread Richard Guy Briggs
Replace audit syscall class magic numbers with macros.

This required putting the macros into new header file
include/linux/auditsc_classmacros.h since the syscall macros were
included for both 64 bit and 32 bit in any compat code, causing
redefinition warnings.

Signed-off-by: Richard Guy Briggs 
Link: 
https://lore.kernel.org/r/2300b1083a32aade7ae7efb95826e8f3f260b1df.1621363275.git@redhat.com
---
 MAINTAINERS |  1 +
 arch/alpha/kernel/audit.c   |  8 
 arch/ia64/kernel/audit.c|  8 
 arch/parisc/kernel/audit.c  |  8 
 arch/parisc/kernel/compat_audit.c   |  9 +
 arch/powerpc/kernel/audit.c | 10 +-
 arch/powerpc/kernel/compat_audit.c  | 11 ++-
 arch/s390/kernel/audit.c| 10 +-
 arch/s390/kernel/compat_audit.c | 11 ++-
 arch/sparc/kernel/audit.c   | 10 +-
 arch/sparc/kernel/compat_audit.c| 11 ++-
 arch/x86/ia32/audit.c   | 11 ++-
 arch/x86/kernel/audit_64.c  |  8 
 include/linux/audit.h   |  1 +
 include/linux/auditsc_classmacros.h | 23 +++
 kernel/auditsc.c| 12 ++--
 lib/audit.c | 10 +-
 lib/compat_audit.c  | 11 ++-
 18 files changed, 102 insertions(+), 71 deletions(-)
 create mode 100644 include/linux/auditsc_classmacros.h

diff --git a/MAINTAINERS b/MAINTAINERS
index bd7aff0c120f..3348d12019f9 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3036,6 +3036,7 @@ W:https://github.com/linux-audit
 T: git git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit.git
 F: include/asm-generic/audit_*.h
 F: include/linux/audit.h
+F: include/linux/auditsc_classmacros.h
 F: include/uapi/linux/audit.h
 F: kernel/audit*
 F: lib/*audit.c
diff --git a/arch/alpha/kernel/audit.c b/arch/alpha/kernel/audit.c
index 96a9d18ff4c4..81cbd804e375 100644
--- a/arch/alpha/kernel/audit.c
+++ b/arch/alpha/kernel/audit.c
@@ -37,13 +37,13 @@ int audit_classify_syscall(int abi, unsigned syscall)
 {
switch(syscall) {
case __NR_open:
-   return 2;
+   return AUDITSC_OPEN;
case __NR_openat:
-   return 3;
+   return AUDITSC_OPENAT;
case __NR_execve:
-   return 5;
+   return AUDITSC_EXECVE;
default:
-   return 0;
+   return AUDITSC_NATIVE;
}
 }
 
diff --git a/arch/ia64/kernel/audit.c b/arch/ia64/kernel/audit.c
index 5192ca899fe6..dba6a74c9ab3 100644
--- a/arch/ia64/kernel/audit.c
+++ b/arch/ia64/kernel/audit.c
@@ -38,13 +38,13 @@ int audit_classify_syscall(int abi, unsigned syscall)
 {
switch(syscall) {
case __NR_open:
-   return 2;
+   return AUDITSC_OPEN;
case __NR_openat:
-   return 3;
+   return AUDITSC_OPENAT;
case __NR_execve:
-   return 5;
+   return AUDITSC_EXECVE;
default:
-   return 0;
+   return AUDITSC_NATIVE;
}
 }
 
diff --git a/arch/parisc/kernel/audit.c b/arch/parisc/kernel/audit.c
index 9eb47b2225d2..14244e83db75 100644
--- a/arch/parisc/kernel/audit.c
+++ b/arch/parisc/kernel/audit.c
@@ -47,13 +47,13 @@ int audit_classify_syscall(int abi, unsigned syscall)
 #endif
switch (syscall) {
case __NR_open:
-   return 2;
+   return AUDITSC_OPEN;
case __NR_openat:
-   return 3;
+   return AUDITSC_OPENAT;
case __NR_execve:
-   return 5;
+   return AUDITSC_EXECVE;
default:
-   return 0;
+   return AUDITSC_NATIVE;
}
 }
 
diff --git a/arch/parisc/kernel/compat_audit.c 
b/arch/parisc/kernel/compat_audit.c
index 20c39c9d86a9..1d6347d37d92 100644
--- a/arch/parisc/kernel/compat_audit.c
+++ b/arch/parisc/kernel/compat_audit.c
@@ -1,4 +1,5 @@
 // SPDX-License-Identifier: GPL-2.0
+#include 
 #include 
 
 unsigned int parisc32_dir_class[] = {
@@ -30,12 +31,12 @@ int parisc32_classify_syscall(unsigned syscall)
 {
switch (syscall) {
case __NR_open:
-   return 2;
+   return AUDITSC_OPEN;
case __NR_openat:
-   return 3;
+   return AUDITSC_OPENAT;
case __NR_execve:
-   return 5;
+   return AUDITSC_EXECVE;
default:
-   return 1;
+   return AUDITSC_COMPAT;
}
 }
diff --git a/arch/powerpc/kernel/audit.c b/arch/powerpc/kernel/audit.c
index a27f3d09..6eb18ef77dff 100644
--- a/arch/powerpc/kernel/audit.c
+++ b/arch/powerpc/kernel/audit.c
@@ -47,15 +47,15 @@ int audit_classify_syscall(int abi, unsigned syscall)
 #endif
switch(syscall) {
case __NR_open:
-   return 2;
+   return AUDITSC_OPEN;

Re: [PATCH v7 02/15] swiotlb: Refactor swiotlb_create_debugfs

2021-05-19 Thread Florian Fainelli



On 5/17/2021 11:42 PM, Claire Chang wrote:
> Split the debugfs creation to make the code reusable for supporting
> different bounce buffer pools, e.g. restricted DMA pool.
> 
> Signed-off-by: Claire Chang 

Reviewed-by: Florian Fainelli 
-- 
Florian


Re: [PATCH v7 07/15] swiotlb: Update is_swiotlb_active to add a struct device argument

2021-05-19 Thread Florian Fainelli



On 5/17/2021 11:42 PM, Claire Chang wrote:
> Update is_swiotlb_active to add a struct device argument. This will be
> useful later to allow for restricted DMA pool.
> 
> Signed-off-by: Claire Chang 

Reviewed-by: Florian Fainelli 
-- 
Florian


Re: [PATCH v7 06/15] swiotlb: Update is_swiotlb_buffer to add a struct device argument

2021-05-19 Thread Florian Fainelli



On 5/17/2021 11:42 PM, Claire Chang wrote:
> Update is_swiotlb_buffer to add a struct device argument. This will be
> useful later to allow for restricted DMA pool.
> 
> Signed-off-by: Claire Chang 

Reviewed-by: Florian Fainelli 
-- 
Florian


Re: [PATCH v7 05/15] swiotlb: Add a new get_io_tlb_mem getter

2021-05-19 Thread Florian Fainelli



On 5/17/2021 11:42 PM, Claire Chang wrote:
> Add a new getter, get_io_tlb_mem, to help select the io_tlb_mem struct.
> The restricted DMA pool is preferred if available.
> 
> Signed-off-by: Claire Chang 

Reviewed-by: Florian Fainelli 
-- 
Florian


Re: [PATCH net-next] ibmveth: fix kobj_to_dev.cocci warnings

2021-05-19 Thread patchwork-bot+netdevbpf
Hello:

This patch was applied to netdev/net-next.git (refs/heads/master):

On Wed, 19 May 2021 10:28:49 +0800 you wrote:
> Use kobj_to_dev() instead of container_of()
> 
> Generated by: scripts/coccinelle/api/kobj_to_dev.cocci
> 
> Signed-off-by: YueHaibing 
> ---
>  drivers/net/ethernet/ibm/ibmveth.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)

Here is the summary with links:
  - [net-next] ibmveth: fix kobj_to_dev.cocci warnings
https://git.kernel.org/netdev/net-next/c/1756055de284

You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html




Re: [PATCH v7 03/15] swiotlb: Add DMA_RESTRICTED_POOL

2021-05-19 Thread Florian Fainelli



On 5/17/2021 11:42 PM, Claire Chang wrote:
> Add a new kconfig symbol, DMA_RESTRICTED_POOL, for restricted DMA pool.
> 
> Signed-off-by: Claire Chang 

Reviewed-by: Florian Fainelli 
-- 
Florian


Re: [PATCH v7 11/15] dma-direct: Add a new wrapper __dma_direct_free_pages()

2021-05-19 Thread Florian Fainelli



On 5/17/2021 11:42 PM, Claire Chang wrote:
> Add a new wrapper __dma_direct_free_pages() that will be useful later
> for swiotlb_free().
> 
> Signed-off-by: Claire Chang 

Reviewed-by: Florian Fainelli 
-- 
Florian


Re: [PATCH v7 04/15] swiotlb: Add restricted DMA pool initialization

2021-05-19 Thread Florian Fainelli



On 5/17/2021 11:42 PM, Claire Chang wrote:
> Add the initialization function to create restricted DMA pools from
> matching reserved-memory nodes.
> 
> Signed-off-by: Claire Chang 
> ---
>  include/linux/device.h  |  4 +++
>  include/linux/swiotlb.h |  3 +-
>  kernel/dma/swiotlb.c| 76 +
>  3 files changed, 82 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/device.h b/include/linux/device.h
> index 38a2071cf776..4987608ea4ff 100644
> --- a/include/linux/device.h
> +++ b/include/linux/device.h
> @@ -416,6 +416,7 @@ struct dev_links_info {
>   * @dma_pools:   Dma pools (if dma'ble device).
>   * @dma_mem: Internal for coherent mem override.
>   * @cma_area:Contiguous memory area for dma allocations
> + * @dma_io_tlb_mem: Internal for swiotlb io_tlb_mem override.
>   * @archdata:For arch-specific additions.
>   * @of_node: Associated device tree node.
>   * @fwnode:  Associated device node supplied by platform firmware.
> @@ -521,6 +522,9 @@ struct device {
>  #ifdef CONFIG_DMA_CMA
>   struct cma *cma_area;   /* contiguous memory area for dma
>  allocations */
> +#endif
> +#ifdef CONFIG_DMA_RESTRICTED_POOL
> + struct io_tlb_mem *dma_io_tlb_mem;
>  #endif
>   /* arch specific additions */
>   struct dev_archdata archdata;
> diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
> index 216854a5e513..03ad6e3b4056 100644
> --- a/include/linux/swiotlb.h
> +++ b/include/linux/swiotlb.h
> @@ -72,7 +72,8 @@ extern enum swiotlb_force swiotlb_force;
>   *   range check to see if the memory was in fact allocated by this
>   *   API.
>   * @nslabs:  The number of IO TLB blocks (in groups of 64) between @start and
> - *   @end. This is command line adjustable via setup_io_tlb_npages.
> + *   @end. For default swiotlb, this is command line adjustable via
> + *   setup_io_tlb_npages.
>   * @used:The number of used IO TLB block.
>   * @list:The free list describing the number of free entries available
>   *   from each index.
> diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> index b849b01a446f..1d8eb4de0d01 100644
> --- a/kernel/dma/swiotlb.c
> +++ b/kernel/dma/swiotlb.c
> @@ -39,6 +39,13 @@
>  #ifdef CONFIG_DEBUG_FS
>  #include 
>  #endif
> +#ifdef CONFIG_DMA_RESTRICTED_POOL
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#endif
>  
>  #include 
>  #include 
> @@ -690,3 +697,72 @@ static int __init swiotlb_create_default_debugfs(void)
>  late_initcall(swiotlb_create_default_debugfs);
>  
>  #endif
> +
> +#ifdef CONFIG_DMA_RESTRICTED_POOL
> +static int rmem_swiotlb_device_init(struct reserved_mem *rmem,
> + struct device *dev)
> +{
> + struct io_tlb_mem *mem = rmem->priv;
> + unsigned long nslabs = rmem->size >> IO_TLB_SHIFT;
> +
> + if (dev->dma_io_tlb_mem)
> + return 0;
> +
> + /*
> +  * Since multiple devices can share the same pool, the private data,
> +  * io_tlb_mem struct, will be initialized by the first device attached
> +  * to it.
> +  */
> + if (!mem) {
> + mem = kzalloc(struct_size(mem, slots, nslabs), GFP_KERNEL);
> + if (!mem)
> + return -ENOMEM;
> +
> + if (PageHighMem(pfn_to_page(PHYS_PFN(rmem->base {
> + kfree(mem);
> + return -EINVAL;

This could probably deserve a warning here to indicate that the reserved
area must be accessible within the linear mapping as I would expect a
lot of people to trip over that.

Reviewed-by: Florian Fainelli 
-- 
Florian


Re: [PATCH v7 01/15] swiotlb: Refactor swiotlb init functions

2021-05-19 Thread Florian Fainelli



On 5/17/2021 11:42 PM, Claire Chang wrote:
> Add a new function, swiotlb_init_io_tlb_mem, for the io_tlb_mem struct
> initialization to make the code reusable.
> 
> Note that we now also call set_memory_decrypted in swiotlb_init_with_tbl.
> 
> Signed-off-by: Claire Chang 
> ---
>  kernel/dma/swiotlb.c | 51 ++--
>  1 file changed, 25 insertions(+), 26 deletions(-)
> 
> diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> index 8ca7d505d61c..d3232fc19385 100644
> --- a/kernel/dma/swiotlb.c
> +++ b/kernel/dma/swiotlb.c
> @@ -168,9 +168,30 @@ void __init swiotlb_update_mem_attributes(void)
>   memset(vaddr, 0, bytes);
>  }
>  
> -int __init swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int 
> verbose)
> +static void swiotlb_init_io_tlb_mem(struct io_tlb_mem *mem, phys_addr_t 
> start,
> + unsigned long nslabs, bool late_alloc)
>  {
> + void *vaddr = phys_to_virt(start);
>   unsigned long bytes = nslabs << IO_TLB_SHIFT, i;
> +
> + mem->nslabs = nslabs;
> + mem->start = start;
> + mem->end = mem->start + bytes;
> + mem->index = 0;
> + mem->late_alloc = late_alloc;
> + spin_lock_init(>lock);
> + for (i = 0; i < mem->nslabs; i++) {
> + mem->slots[i].list = IO_TLB_SEGSIZE - io_tlb_offset(i);
> + mem->slots[i].orig_addr = INVALID_PHYS_ADDR;
> + mem->slots[i].alloc_size = 0;
> + }
> +
> + set_memory_decrypted((unsigned long)vaddr, bytes >> PAGE_SHIFT);
> + memset(vaddr, 0, bytes);

You are doing an unconditional set_memory_decrypted() followed by a
memset here, and then:

> +}
> +
> +int __init swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int 
> verbose)
> +{
>   struct io_tlb_mem *mem;
>   size_t alloc_size;
>  
> @@ -186,16 +207,8 @@ int __init swiotlb_init_with_tbl(char *tlb, unsigned 
> long nslabs, int verbose)
>   if (!mem)
>   panic("%s: Failed to allocate %zu bytes align=0x%lx\n",
> __func__, alloc_size, PAGE_SIZE);
> - mem->nslabs = nslabs;
> - mem->start = __pa(tlb);
> - mem->end = mem->start + bytes;
> - mem->index = 0;
> - spin_lock_init(>lock);
> - for (i = 0; i < mem->nslabs; i++) {
> - mem->slots[i].list = IO_TLB_SEGSIZE - io_tlb_offset(i);
> - mem->slots[i].orig_addr = INVALID_PHYS_ADDR;
> - mem->slots[i].alloc_size = 0;
> - }
> +
> + swiotlb_init_io_tlb_mem(mem, __pa(tlb), nslabs, false);

You convert this call site with swiotlb_init_io_tlb_mem() which did not
do the set_memory_decrypted()+memset(). Is this okay or should
swiotlb_init_io_tlb_mem() add an additional argument to do this
conditionally?
-- 
Florian


Re: Linux powerpc new system call instruction and ABI

2021-05-19 Thread Joakim Tjernlund
On Wed, 2021-05-19 at 10:22 -0500, Segher Boessenkool wrote:
> On Wed, May 19, 2021 at 03:06:49PM +, Joakim Tjernlund wrote:
> > On Wed, 2021-05-19 at 09:38 -0500, Segher Boessenkool wrote:
> > > On Wed, May 19, 2021 at 06:42:40PM +1000, Nicholas Piggin wrote:
> > > > Excerpts from Joakim Tjernlund's message of May 19, 2021 6:08 pm:
> > > > > I always figured the ppc way was superior. It begs the question if 
> > > > > not the other archs should
> > > > > change instead?
> > > > 
> > > > It is superior in some ways, not enough to be worth being different.
> > > 
> > > The PowerPC syscall ABI *requires* using cr0.3 for indicating errors,
> > > you will have to do that whether you conflate the concepts of return
> > > code and error indicator or not!
> > > 
> > > > Other archs are unlikely to change because it would be painful for
> > > > not much benefit.
> > > 
> > > Other archs cannot easily change for much the same reason :-)
> > 
> > Really? I figured you could just add extra error indication in kernel 
> > syscall I/F.
> > Eventually user space could migrate to the new indication.
> 
> You seem to assume all user space uses glibc, or *any* libc even?  This
> is false.  Some programs do system calls directly.  Do not break the
> kernel ABI :-)

Adding an extra indicator does not break ABI, does it ?
W.r.t breaking ABI, isn't that what PowerPC is trying to do with the new 
syscall I/F? 

 Jocke


Re: [PATCH v5 3/9] mm/mremap: Use pmd/pud_poplulate to update page table entries

2021-05-19 Thread Nathan Chancellor

On 5/18/2021 9:46 PM, Aneesh Kumar K.V wrote:

Nathan Chancellor  writes:


Hi Aneesh,

On Thu, Apr 22, 2021 at 11:13:17AM +0530, Aneesh Kumar K.V wrote:

pmd/pud_populate is the right interface to be used to set the respective
page table entries. Some architectures like ppc64 do assume that set_pmd/pud_at
can only be used to set a hugepage PTE. Since we are not setting up a hugepage
PTE here, use the pmd/pud_populate interface.

Signed-off-by: Aneesh Kumar K.V 
---
  mm/mremap.c | 7 +++
  1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/mm/mremap.c b/mm/mremap.c
index ec8f840399ed..574287f9bb39 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -26,6 +26,7 @@
  
  #include 

  #include 
+#include 
  
  #include "internal.h"
  
@@ -257,9 +258,8 @@ static bool move_normal_pmd(struct vm_area_struct *vma, unsigned long old_addr,

pmd_clear(old_pmd);
  
  	VM_BUG_ON(!pmd_none(*new_pmd));

+   pmd_populate(mm, new_pmd, (pgtable_t)pmd_page_vaddr(pmd));
  
-	/* Set the new pmd */

-   set_pmd_at(mm, new_addr, new_pmd, pmd);
flush_tlb_range(vma, old_addr, old_addr + PMD_SIZE);
if (new_ptl != old_ptl)
spin_unlock(new_ptl);
@@ -306,8 +306,7 @@ static bool move_normal_pud(struct vm_area_struct *vma, 
unsigned long old_addr,
  
  	VM_BUG_ON(!pud_none(*new_pud));
  
-	/* Set the new pud */

-   set_pud_at(mm, new_addr, new_pud, pud);
+   pud_populate(mm, new_pud, (pmd_t *)pud_page_vaddr(pud));
flush_tlb_range(vma, old_addr, old_addr + PUD_SIZE);
if (new_ptl != old_ptl)
spin_unlock(new_ptl);
--
2.30.2




This commit causes my WSL2 VM to close when compiling something memory
intensive, such as an x86_64_defconfig + CONFIG_LTO_CLANG_FULL=y kernel
or LLVM/Clang. Unfortunately, I do not have much further information to
provide since I do not see any sort of splat in dmesg right before it
closes and I have found zero information about getting the previous
kernel message in WSL2 (custom init so no systemd or anything).

The config file is the stock one from Microsoft:

https://github.com/microsoft/WSL2-Linux-Kernel/blob/a571dc8cedc8e0e56487c0dc93243e0b5db8960a/Microsoft/config-wsl

I have attached my .config anyways, which includes CONFIG_DEBUG_VM,
which does not appear to show anything out of the ordinary. I have also
attached a dmesg just in case anything sticks out. I am happy to provide
any additional information or perform additional debugging steps as
needed.



Can you try this change?


Thank you for the quick diff! This resolves my issue.

Tested-by: Nathan Chancellor 


modified   mm/mremap.c
@@ -279,7 +279,7 @@ static bool move_normal_pmd(struct vm_area_struct *vma, 
unsigned long old_addr,
pmd_clear(old_pmd);
  
  	VM_BUG_ON(!pmd_none(*new_pmd));

-   pmd_populate(mm, new_pmd, (pgtable_t)pmd_page_vaddr(pmd));
+   pmd_populate(mm, new_pmd, pmd_pgtable(pmd));
  
  	if (new_ptl != old_ptl)

spin_unlock(new_ptl);





Re: [musl] Re: Linux powerpc new system call instruction and ABI

2021-05-19 Thread Rich Felker
On Wed, May 19, 2021 at 10:22:05AM -0500, Segher Boessenkool wrote:
> On Wed, May 19, 2021 at 03:06:49PM +, Joakim Tjernlund wrote:
> > On Wed, 2021-05-19 at 09:38 -0500, Segher Boessenkool wrote:
> > > On Wed, May 19, 2021 at 06:42:40PM +1000, Nicholas Piggin wrote:
> > > > Excerpts from Joakim Tjernlund's message of May 19, 2021 6:08 pm:
> > > > > I always figured the ppc way was superior. It begs the question if 
> > > > > not the other archs should
> > > > > change instead?
> > > > 
> > > > It is superior in some ways, not enough to be worth being different.
> > > 
> > > The PowerPC syscall ABI *requires* using cr0.3 for indicating errors,
> > > you will have to do that whether you conflate the concepts of return
> > > code and error indicator or not!
> > > 
> > > > Other archs are unlikely to change because it would be painful for
> > > > not much benefit.
> > > 
> > > Other archs cannot easily change for much the same reason :-)
> > 
> > Really? I figured you could just add extra error indication in kernel 
> > syscall I/F.
> > Eventually user space could migrate to the new indication.
> 
> You seem to assume all user space uses glibc, or *any* libc even?  This
> is false.  Some programs do system calls directly.  Do not break the
> kernel ABI :-)

Even if it were easy to change, the old ppc ABI with a separate error
indicator is much worse to use. In musl we paper over archs that do
this silliness by converting to a normal negated errno code. There are
literally no syscalls that need the ability to return negative values
in addition to error codes; historically there were one or two (I only
recall one fcntl command) but there were ways to disambiguate and
they're only fallbacks for ancient kernels nowadays, if used at all.

Rich


Re: [PATCH v5 5/9] powerpc/mm/book3s64: Update tlb flush routines to take a page walk cache flush argument

2021-05-19 Thread Guenter Roeck
On Wed, May 19, 2021 at 09:20:38AM -0500, Segher Boessenkool wrote:
> On Wed, May 19, 2021 at 06:37:44AM -0700, Guenter Roeck wrote:
> > On 5/19/21 5:03 AM, Segher Boessenkool wrote:
> > >On Tue, May 18, 2021 at 07:45:14PM -0500, Segher Boessenkool wrote:
> > >>And it actually explicitly is undefined behaviour in C90 already
> > >>(3.6.6.4 in C90, 6.8.6.4 in C99 and later).
> > >
> > >... but there is a GCC extension that allows this by default:
> > >
> > >   For C only, warn about a 'return' statement with an expression in a
> > >   function whose return type is 'void', unless the expression type is
> > >   also 'void'.  As a GNU extension, the latter case is accepted
> > >   without a warning unless '-Wpedantic' is used.
> > 
> > In C99:
> > 
> > "6.8.6.4 The return statement
> > Constraints
> > 
> > A return statement with an expression shall not appear in a function whose 
> > return type
> > is void. A return statement without an expression shall only appear in a 
> > function
> > whose return type is void."
> > 
> > Sounds like invalid to me, not just undefined behavior.
> 
> I don't know what "invalid" would mean here other than UB, it isn't a
> specific defined term, unlike the latter, which is precisely defined in
> 3.4.3/1:
>   undefined behavior
>   behavior, upon use of a nonportable or erroneous program construct or
>   of erroneous data, for which this International Standard imposes no
>   requirements
> 
> This is the strongest thing the standard can say, it is not Law, it does
> not prohibit anyone from doing anything :-)
> 
> "Shall" and "shall not" X means it is undefined behaviour if X (or its
> inverse)  is violated.  See 4.2:
>   If a ''shall'' or ''shall not'' requirement that appears outside of a
>   constraint or runtime-constraint is violated, the behavior is
>   undefined.  Undefined behavior is otherwise indicated in this
>   International Standard by the words ''undefined behavior'' or by the
>   omission of any explicit definition of behavior.  There is no
>   difference in emphasis among these three; they all describe ''behavior
>   that is undefined''.
> which also explains that what you call "invalid" has undefined behaviour
> just as well, most likely.
> 

I'd have assumed that "shall not" is syntactically wrong, but I stand
corrected.

Guenter


Re: Linux powerpc new system call instruction and ABI

2021-05-19 Thread Segher Boessenkool
On Wed, May 19, 2021 at 03:06:49PM +, Joakim Tjernlund wrote:
> On Wed, 2021-05-19 at 09:38 -0500, Segher Boessenkool wrote:
> > On Wed, May 19, 2021 at 06:42:40PM +1000, Nicholas Piggin wrote:
> > > Excerpts from Joakim Tjernlund's message of May 19, 2021 6:08 pm:
> > > > I always figured the ppc way was superior. It begs the question if not 
> > > > the other archs should
> > > > change instead?
> > > 
> > > It is superior in some ways, not enough to be worth being different.
> > 
> > The PowerPC syscall ABI *requires* using cr0.3 for indicating errors,
> > you will have to do that whether you conflate the concepts of return
> > code and error indicator or not!
> > 
> > > Other archs are unlikely to change because it would be painful for
> > > not much benefit.
> > 
> > Other archs cannot easily change for much the same reason :-)
> 
> Really? I figured you could just add extra error indication in kernel syscall 
> I/F.
> Eventually user space could migrate to the new indication.

You seem to assume all user space uses glibc, or *any* libc even?  This
is false.  Some programs do system calls directly.  Do not break the
kernel ABI :-)


Segher


Re: Linux powerpc new system call instruction and ABI

2021-05-19 Thread Joakim Tjernlund
On Wed, 2021-05-19 at 09:38 -0500, Segher Boessenkool wrote:
> On Wed, May 19, 2021 at 06:42:40PM +1000, Nicholas Piggin wrote:
> > Excerpts from Joakim Tjernlund's message of May 19, 2021 6:08 pm:
> > > I always figured the ppc way was superior. It begs the question if not 
> > > the other archs should
> > > change instead?
> > 
> > It is superior in some ways, not enough to be worth being different.
> 
> The PowerPC syscall ABI *requires* using cr0.3 for indicating errors,
> you will have to do that whether you conflate the concepts of return
> code and error indicator or not!
> 
> > Other archs are unlikely to change because it would be painful for
> > not much benefit.
> 
> Other archs cannot easily change for much the same reason :-)

Really? I figured you could just add extra error indication in kernel syscall 
I/F.
Eventually user space could migrate to the new indication.

 Jocke


Re: [PATCH v2 2/2] powerpc/legacy_serial: Use early_ioremap()

2021-05-19 Thread Alexey Kardashevskiy




On 20/04/2021 23:32, Christophe Leroy wrote:

From: Christophe Leroy 

[0.00] ioremap() called early from 
find_legacy_serial_ports+0x3cc/0x474. Use early_ioremap() instead

find_legacy_serial_ports() is called early from setup_arch(), before
paging_init(). vmalloc is not available yet, ioremap shouldn't be
used that early.

Use early_ioremap() and switch to a regular ioremap() later.

Signed-off-by: Christophe Leroy 
Signed-off-by: Christophe Leroy 


My POWER9 box silently reboots with the upstream kernel which has this.

This hunk:

diff --git a/arch/powerpc/kernel/legacy_serial.c 
b/arch/powerpc/kernel/legacy_serial.c

index f061e06e9f51..6bdb3f5f64e3 100644
--- a/arch/powerpc/kernel/legacy_serial.c
+++ b/arch/powerpc/kernel/legacy_serial.c
@@ -336,6 +336,16 @@ static void __init setup_legacy_serial_console(int 
console)

if (addr == NULL)
return;
udbg_uart_init_mmio(addr, stride);
+
+
+   {
+   void *ea = early_ioremap(info->taddr, 0x1000);
+   pr_err("___K___ (%u) %s %u: ior=%lx early=%lx\n",
+   smp_processor_id(), __func__, __LINE__,
+   (unsigned long) addr, (unsigned 
long) ea);

+   early_iounmap(ea, 0x1000);
+   }
+


produced:

[0.00] ___K___ (0) setup_legacy_serial_console 345: 
ior=c00a83f8 early=ffc003f8 




The early address just does not look right - ffc003f8. Do you 
have a quick idea what is exactly wrong before I wake up and dig more? 
:)  It is powernv_defconfig. Thanks,





---
  arch/powerpc/kernel/legacy_serial.c | 33 +
  1 file changed, 29 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kernel/legacy_serial.c 
b/arch/powerpc/kernel/legacy_serial.c
index f061e06e9f51..8b2c1a8553a0 100644
--- a/arch/powerpc/kernel/legacy_serial.c
+++ b/arch/powerpc/kernel/legacy_serial.c
@@ -15,6 +15,7 @@
  #include 
  #include 
  #include 
+#include 
  
  #undef DEBUG
  
@@ -34,6 +35,7 @@ static struct legacy_serial_info {

unsigned intclock;
int irq_check_parent;
phys_addr_t taddr;
+   void __iomem*early_addr;
  } legacy_serial_infos[MAX_LEGACY_SERIAL_PORTS];
  
  static const struct of_device_id legacy_serial_parents[] __initconst = {

@@ -325,17 +327,16 @@ static void __init setup_legacy_serial_console(int 
console)
  {
struct legacy_serial_info *info = _serial_infos[console];
struct plat_serial8250_port *port = _serial_ports[console];
-   void __iomem *addr;
unsigned int stride;
  
  	stride = 1 << port->regshift;
  
  	/* Check if a translated MMIO address has been found */

if (info->taddr) {
-   addr = ioremap(info->taddr, 0x1000);
-   if (addr == NULL)
+   info->early_addr = early_ioremap(info->taddr, 0x1000);
+   if (info->early_addr == NULL)
return;
-   udbg_uart_init_mmio(addr, stride);
+   udbg_uart_init_mmio(info->early_addr, stride);
} else {
/* Check if it's PIO and we support untranslated PIO */
if (port->iotype == UPIO_PORT && isa_io_special)
@@ -353,6 +354,30 @@ static void __init setup_legacy_serial_console(int console)
udbg_uart_setup(info->speed, info->clock);
  }
  
+static int __init ioremap_legacy_serial_console(void)

+{
+   struct legacy_serial_info *info = 
_serial_infos[legacy_serial_console];
+   struct plat_serial8250_port *port = 
_serial_ports[legacy_serial_console];
+   void __iomem *vaddr;
+
+   if (legacy_serial_console < 0)
+   return 0;
+
+   if (!info->early_addr)
+   return 0;
+
+   vaddr = ioremap(info->taddr, 0x1000);
+   if (WARN_ON(!vaddr))
+   return -ENOMEM;
+
+   udbg_uart_init_mmio(vaddr, 1 << port->regshift);
+   early_iounmap(info->early_addr, 0x1000);
+   info->early_addr = NULL;
+
+   return 0;
+}
+early_initcall(ioremap_legacy_serial_console);
+
  /*
   * This is called very early, as part of setup_system() or eventually
   * setup_arch(), basically before anything else in this file. This function



--
Alexey


[PATCH v1 12/12] powerpc/optprobes: use PPC_RAW_ macros

2021-05-19 Thread Christophe Leroy
Use PPC_RAW_ macros to simplify the code.

And use PPC_LO/PPC_HI instead of IMM_L/IMM_H which are for
internal use inside ppc-opcode.h

Those macros are self explanatory, comments can go as well.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/ppc-opcode.h | 11 
 arch/powerpc/kernel/optprobes.c   | 39 +--
 2 files changed, 12 insertions(+), 38 deletions(-)

diff --git a/arch/powerpc/include/asm/ppc-opcode.h 
b/arch/powerpc/include/asm/ppc-opcode.h
index ac41776661e9..87802e8073e6 100644
--- a/arch/powerpc/include/asm/ppc-opcode.h
+++ b/arch/powerpc/include/asm/ppc-opcode.h
@@ -261,9 +261,6 @@
 #define PPC_INST_BCTR  0x4e800420
 #define PPC_INST_BCTRL 0x4e800421
 #define PPC_INST_DIVD  0x7c0003d2
-#define PPC_INST_RLDICR0x7804
-#define PPC_INST_ORI   0x6000
-#define PPC_INST_ORIS  0x6400
 #define PPC_INST_BRANCH0x4800
 #define PPC_INST_BL0x4801
 #define PPC_INST_BRANCH_COND   0x4080
@@ -323,6 +320,8 @@
 #define PPC_LO(v)  ((v) & 0x)
 #define PPC_HI(v)  (((v) >> 16) & 0x)
 #define PPC_HA(v)  PPC_HI((v) + 0x8000)
+#define PPC_HIGHER(v)  (((v) >> 32) & 0x)
+#define PPC_HIGHEST(v) (((v) >> 48) & 0x)
 
 /*
  * Only use the larx hint bit on 64bit CPUs. e500v1/v2 based CPUs will treat a
@@ -499,8 +498,8 @@
 #define PPC_RAW_AND_DOT(d, a, b)   (0x7c39 | ___PPC_RA(d) | 
___PPC_RS(a) | ___PPC_RB(b))
 #define PPC_RAW_OR(d, a, b)(0x7c000378 | ___PPC_RA(d) | 
___PPC_RS(a) | ___PPC_RB(b))
 #define PPC_RAW_MR(d, a)   PPC_RAW_OR(d, a, a)
-#define PPC_RAW_ORI(d, a, i)   (PPC_INST_ORI | ___PPC_RA(d) | 
___PPC_RS(a) | IMM_L(i))
-#define PPC_RAW_ORIS(d, a, i)  (PPC_INST_ORIS | ___PPC_RA(d) | 
___PPC_RS(a) | IMM_L(i))
+#define PPC_RAW_ORI(d, a, i)   (0x6000 | ___PPC_RA(d) | 
___PPC_RS(a) | IMM_L(i))
+#define PPC_RAW_ORIS(d, a, i)  (0x6400 | ___PPC_RA(d) | 
___PPC_RS(a) | IMM_L(i))
 #define PPC_RAW_NOR(d, a, b)   (0x7cf8 | ___PPC_RA(d) | 
___PPC_RS(a) | ___PPC_RB(b))
 #define PPC_RAW_XOR(d, a, b)   (0x7c000278 | ___PPC_RA(d) | 
___PPC_RS(a) | ___PPC_RB(b))
 #define PPC_RAW_XORI(d, a, i)  (0x6800 | ___PPC_RA(d) | 
___PPC_RS(a) | IMM_L(i))
@@ -519,7 +518,7 @@
(0x5401 | ___PPC_RA(d) | 
___PPC_RS(a) | __PPC_SH(i) | __PPC_MB(mb) | __PPC_ME(me))
 #define PPC_RAW_RLWIMI(d, a, i, mb, me) (0x5000 | ___PPC_RA(d) | 
___PPC_RS(a) | __PPC_SH(i) | __PPC_MB(mb) | __PPC_ME(me))
 #define PPC_RAW_RLDICL(d, a, i, mb) (0x7800 | ___PPC_RA(d) | 
___PPC_RS(a) | __PPC_SH64(i) | __PPC_MB64(mb))
-#define PPC_RAW_RLDICR(d, a, i, me) (PPC_INST_RLDICR | ___PPC_RA(d) | 
___PPC_RS(a) | __PPC_SH64(i) | __PPC_ME64(me))
+#define PPC_RAW_RLDICR(d, a, i, me) (0x7804 | ___PPC_RA(d) | 
___PPC_RS(a) | __PPC_SH64(i) | __PPC_ME64(me))
 
 /* slwi = rlwinm Rx, Ry, n, 0, 31-n */
 #define PPC_RAW_SLWI(d, a, i)  PPC_RAW_RLWINM(d, a, i, 0, 31-(i))
diff --git a/arch/powerpc/kernel/optprobes.c b/arch/powerpc/kernel/optprobes.c
index 9c1c8de8c06d..1488d4e9b34d 100644
--- a/arch/powerpc/kernel/optprobes.c
+++ b/arch/powerpc/kernel/optprobes.c
@@ -137,10 +137,8 @@ void arch_remove_optimized_kprobe(struct optimized_kprobe 
*op)
 
 static void patch_imm32_load_insns(unsigned long val, int reg, kprobe_opcode_t 
*addr)
 {
-   patch_instruction(addr, ppc_inst(PPC_RAW_LIS(reg, IMM_H(val;
-   addr++;
-
-   patch_instruction(addr, ppc_inst(PPC_RAW_ORI(reg, reg, IMM_L(val;
+   patch_instruction(addr++, ppc_inst(PPC_RAW_LIS(reg, PPC_HI(val;
+   patch_instruction(addr, ppc_inst(PPC_RAW_ORI(reg, reg, PPC_LO(val;
 }
 
 /*
@@ -149,34 +147,11 @@ static void patch_imm32_load_insns(unsigned long val, int 
reg, kprobe_opcode_t *
  */
 static void patch_imm64_load_insns(unsigned long long val, int reg, 
kprobe_opcode_t *addr)
 {
-   /* lis reg,(op)@highest */
-   patch_instruction(addr,
- ppc_inst(PPC_INST_ADDIS | ___PPC_RT(reg) |
-  ((val >> 48) & 0x)));
-   addr++;
-
-   /* ori reg,reg,(op)@higher */
-   patch_instruction(addr,
- ppc_inst(PPC_INST_ORI | ___PPC_RA(reg) |
-  ___PPC_RS(reg) | ((val >> 32) & 0x)));
-   addr++;
-
-   /* rldicr reg,reg,32,31 */
-   patch_instruction(addr,
- ppc_inst(PPC_INST_RLDICR | ___PPC_RA(reg) |
-  ___PPC_RS(reg) | __PPC_SH64(32) | 
__PPC_ME64(31)));
-   addr++;
-
-   /* oris reg,reg,(op)@h */
-   patch_instruction(addr,
- ppc_inst(PPC_INST_ORIS | ___PPC_RA(reg) |
-  ___PPC_RS(reg) | ((val >> 16) & 0x)));
-   addr++;
-

[PATCH v1 11/12] powerpc/optprobes: Compact code source a bit.

2021-05-19 Thread Christophe Leroy
Now that lines can be up to 100 chars long, minimise the
amount of split lines to increase readability.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/kernel/optprobes.c | 33 +++--
 1 file changed, 11 insertions(+), 22 deletions(-)

diff --git a/arch/powerpc/kernel/optprobes.c b/arch/powerpc/kernel/optprobes.c
index 8c08ca15faf3..9c1c8de8c06d 100644
--- a/arch/powerpc/kernel/optprobes.c
+++ b/arch/powerpc/kernel/optprobes.c
@@ -18,18 +18,12 @@
 #include 
 #include 
 
-#define TMPL_CALL_HDLR_IDX \
-   (optprobe_template_call_handler - optprobe_template_entry)
-#define TMPL_EMULATE_IDX   \
-   (optprobe_template_call_emulate - optprobe_template_entry)
-#define TMPL_RET_IDX   \
-   (optprobe_template_ret - optprobe_template_entry)
-#define TMPL_OP_IDX\
-   (optprobe_template_op_address - optprobe_template_entry)
-#define TMPL_INSN_IDX  \
-   (optprobe_template_insn - optprobe_template_entry)
-#define TMPL_END_IDX   \
-   (optprobe_template_end - optprobe_template_entry)
+#define TMPL_CALL_HDLR_IDX (optprobe_template_call_handler - 
optprobe_template_entry)
+#define TMPL_EMULATE_IDX   (optprobe_template_call_emulate - 
optprobe_template_entry)
+#define TMPL_RET_IDX   (optprobe_template_ret - 
optprobe_template_entry)
+#define TMPL_OP_IDX(optprobe_template_op_address - 
optprobe_template_entry)
+#define TMPL_INSN_IDX  (optprobe_template_insn - 
optprobe_template_entry)
+#define TMPL_END_IDX   (optprobe_template_end - 
optprobe_template_entry)
 
 DEFINE_INSN_CACHE_OPS(ppc_optinsn);
 
@@ -280,8 +274,7 @@ int arch_prepare_optimized_kprobe(struct optimized_kprobe 
*op, struct kprobe *p)
 */
patch_branch(buff + TMPL_RET_IDX, nip, 0);
 
-   flush_icache_range((unsigned long)buff,
-  (unsigned long)([TMPL_END_IDX]));
+   flush_icache_range((unsigned long)buff, (unsigned 
long)([TMPL_END_IDX]));
 
op->optinsn.insn = buff;
 
@@ -319,10 +312,8 @@ void arch_optimize_kprobes(struct list_head *oplist)
 * Backup instructions which will be replaced
 * by jump address
 */
-   memcpy(op->optinsn.copied_insn, op->kp.addr,
-  RELATIVEJUMP_SIZE);
-   create_branch(, op->kp.addr,
- (unsigned long)op->optinsn.insn, 0);
+   memcpy(op->optinsn.copied_insn, op->kp.addr, RELATIVEJUMP_SIZE);
+   create_branch(, op->kp.addr, (unsigned 
long)op->optinsn.insn, 0);
patch_instruction(op->kp.addr, instr);
list_del_init(>list);
}
@@ -333,8 +324,7 @@ void arch_unoptimize_kprobe(struct optimized_kprobe *op)
arch_arm_kprobe(>kp);
 }
 
-void arch_unoptimize_kprobes(struct list_head *oplist,
-struct list_head *done_list)
+void arch_unoptimize_kprobes(struct list_head *oplist, struct list_head 
*done_list)
 {
struct optimized_kprobe *op;
struct optimized_kprobe *tmp;
@@ -345,8 +335,7 @@ void arch_unoptimize_kprobes(struct list_head *oplist,
}
 }
 
-int arch_within_optimized_kprobe(struct optimized_kprobe *op,
-unsigned long addr)
+int arch_within_optimized_kprobe(struct optimized_kprobe *op, unsigned long 
addr)
 {
return ((unsigned long)op->kp.addr <= addr &&
(unsigned long)op->kp.addr + RELATIVEJUMP_SIZE > addr);
-- 
2.25.0



[PATCH v1 10/12] powerpc/optprobes: Minimise casts

2021-05-19 Thread Christophe Leroy
nip is already an unsigned long, no cast needed.

op_callback_addr and emulate_step_addr are kprobe_opcode_t *.
There value is obtained with ppc_kallsyms_lookup_name() which
returns 'unsigned long', and there values are used create_branch()
which expects 'unsigned long'. So change them to 'unsigned long'
to avoid casting them back and forth.

can_optimize() used p->addr several times as 'unsigned long'.
Use a local 'unsigned long' variable and avoid casting multiple times.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/kernel/optprobes.c | 23 +++
 1 file changed, 11 insertions(+), 12 deletions(-)

diff --git a/arch/powerpc/kernel/optprobes.c b/arch/powerpc/kernel/optprobes.c
index adaf31157f6d..8c08ca15faf3 100644
--- a/arch/powerpc/kernel/optprobes.c
+++ b/arch/powerpc/kernel/optprobes.c
@@ -66,6 +66,7 @@ static unsigned long can_optimize(struct kprobe *p)
struct pt_regs regs;
struct instruction_op op;
unsigned long nip = 0;
+   unsigned long addr = (unsigned long)p->addr;
 
/*
 * kprobe placed for kretprobe during boot time
@@ -73,7 +74,7 @@ static unsigned long can_optimize(struct kprobe *p)
 * So further checks can be skipped.
 */
if (p->addr == (kprobe_opcode_t *)_trampoline)
-   return (unsigned long)p->addr + sizeof(kprobe_opcode_t);
+   return addr + sizeof(kprobe_opcode_t);
 
/*
 * We only support optimizing kernel addresses, but not
@@ -81,11 +82,11 @@ static unsigned long can_optimize(struct kprobe *p)
 *
 * FIXME: Optimize kprobes placed in module addresses.
 */
-   if (!is_kernel_addr((unsigned long)p->addr))
+   if (!is_kernel_addr(addr))
return 0;
 
memset(, 0, sizeof(struct pt_regs));
-   regs.nip = (unsigned long)p->addr;
+   regs.nip = addr;
regs.trap = 0x0;
regs.msr = MSR_KERNEL;
 
@@ -195,7 +196,8 @@ static void patch_imm_load_insns(unsigned long val, int 
reg, kprobe_opcode_t *ad
 int arch_prepare_optimized_kprobe(struct optimized_kprobe *op, struct kprobe 
*p)
 {
struct ppc_inst branch_op_callback, branch_emulate_step, temp;
-   kprobe_opcode_t *op_callback_addr, *emulate_step_addr, *buff;
+   unsigned long op_callback_addr, emulate_step_addr;
+   kprobe_opcode_t *buff;
long b_offset;
unsigned long nip, size;
int rc, i;
@@ -225,8 +227,7 @@ int arch_prepare_optimized_kprobe(struct optimized_kprobe 
*op, struct kprobe *p)
goto error;
 
/* Check if the return address is also within 32MB range */
-   b_offset = (unsigned long)(buff + TMPL_RET_IDX) -
-   (unsigned long)nip;
+   b_offset = (unsigned long)(buff + TMPL_RET_IDX) - nip;
if (!is_offset_in_branch_range(b_offset))
goto error;
 
@@ -249,20 +250,18 @@ int arch_prepare_optimized_kprobe(struct optimized_kprobe 
*op, struct kprobe *p)
/*
 * 2. branch to optimized_callback() and emulate_step()
 */
-   op_callback_addr = (kprobe_opcode_t 
*)ppc_kallsyms_lookup_name("optimized_callback");
-   emulate_step_addr = (kprobe_opcode_t 
*)ppc_kallsyms_lookup_name("emulate_step");
+   op_callback_addr = ppc_kallsyms_lookup_name("optimized_callback");
+   emulate_step_addr = ppc_kallsyms_lookup_name("emulate_step");
if (!op_callback_addr || !emulate_step_addr) {
WARN(1, "Unable to lookup 
optimized_callback()/emulate_step()\n");
goto error;
}
 
rc = create_branch(_op_callback, buff + TMPL_CALL_HDLR_IDX,
-  (unsigned long)op_callback_addr,
-  BRANCH_SET_LINK);
+  op_callback_addr, BRANCH_SET_LINK);
 
rc |= create_branch(_emulate_step, buff + TMPL_EMULATE_IDX,
-   (unsigned long)emulate_step_addr,
-   BRANCH_SET_LINK);
+   emulate_step_addr, BRANCH_SET_LINK);
 
if (rc)
goto error;
-- 
2.25.0



[PATCH v1 09/12] powerpc/inst: Refactor PPC32 and PPC64 versions

2021-05-19 Thread Christophe Leroy
ppc_inst() ppc_inst_prefixed() ppc_inst_swab() can easily
be made common to both PPC32 and PPC64.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/inst.h | 49 +
 1 file changed, 13 insertions(+), 36 deletions(-)

diff --git a/arch/powerpc/include/asm/inst.h b/arch/powerpc/include/asm/inst.h
index 9f3513621766..3d4286809cfd 100644
--- a/arch/powerpc/include/asm/inst.h
+++ b/arch/powerpc/include/asm/inst.h
@@ -61,9 +61,9 @@ static inline int ppc_inst_primary_opcode(struct ppc_inst x)
return get_op(ppc_inst_val(x));
 }
 
-#ifdef CONFIG_PPC64
 #define ppc_inst(x) ((struct ppc_inst){ .val = (x) })
 
+#ifdef CONFIG_PPC64
 #define ppc_inst_prefix(x, y) ((struct ppc_inst){ .val = (x), .suffix = (y) })
 
 static inline u32 ppc_inst_suffix(struct ppc_inst x)
@@ -71,57 +71,34 @@ static inline u32 ppc_inst_suffix(struct ppc_inst x)
return x.suffix;
 }
 
-static inline bool ppc_inst_prefixed(struct ppc_inst x)
-{
-   return ppc_inst_primary_opcode(x) == OP_PREFIX;
-}
+#else
+#define ppc_inst_prefix(x, y) ppc_inst(x)
 
-static inline struct ppc_inst ppc_inst_swab(struct ppc_inst x)
+static inline u32 ppc_inst_suffix(struct ppc_inst x)
 {
-   return ppc_inst_prefix(swab32(ppc_inst_val(x)), 
swab32(ppc_inst_suffix(x)));
+   return 0;
 }
 
+#endif /* CONFIG_PPC64 */
+
 static inline struct ppc_inst ppc_inst_read(const unsigned int *ptr)
 {
-   u32 val, suffix;
-
-   val = *ptr;
-   if (get_op(val) == OP_PREFIX) {
-   suffix = *(ptr + 1);
-   return ppc_inst_prefix(val, suffix);
-   } else {
-   return ppc_inst(val);
-   }
+   if (IS_ENABLED(CONFIG_PPC64) && get_op(*ptr) == OP_PREFIX)
+   return ppc_inst_prefix(*ptr, *(ptr + 1));
+   else
+   return ppc_inst(*ptr);
 }
 
-#else
-
-#define ppc_inst(x) ((struct ppc_inst){ .val = x })
-
-#define ppc_inst_prefix(x, y) ppc_inst(x)
-
 static inline bool ppc_inst_prefixed(struct ppc_inst x)
 {
-   return false;
-}
-
-static inline u32 ppc_inst_suffix(struct ppc_inst x)
-{
-   return 0;
+   return IS_ENABLED(CONFIG_PPC64) && ppc_inst_primary_opcode(x) == 
OP_PREFIX;
 }
 
 static inline struct ppc_inst ppc_inst_swab(struct ppc_inst x)
 {
-   return ppc_inst(swab32(ppc_inst_val(x)));
-}
-
-static inline struct ppc_inst ppc_inst_read(const unsigned int *ptr)
-{
-   return ppc_inst(*ptr);
+   return ppc_inst_prefix(swab32(ppc_inst_val(x)), 
swab32(ppc_inst_suffix(x)));
 }
 
-#endif /* CONFIG_PPC64 */
-
 static inline bool ppc_inst_equal(struct ppc_inst x, struct ppc_inst y)
 {
if (ppc_inst_val(x) != ppc_inst_val(y))
-- 
2.25.0



[PATCH v1 07/12] powerpc/lib/code-patching: Don't use struct 'ppc_inst' for runnable code in tests.

2021-05-19 Thread Christophe Leroy
'struct ppc_inst' is meant to represent an instruction internally, it
is not meant to dereference code in memory.

For testing code patching, use patch_instruction() to properly
write into memory the code to be tested.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/lib/code-patching.c | 95 ++--
 1 file changed, 53 insertions(+), 42 deletions(-)

diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
index 82f2c1edb498..508e9511ca96 100644
--- a/arch/powerpc/lib/code-patching.c
+++ b/arch/powerpc/lib/code-patching.c
@@ -422,9 +422,9 @@ static void __init test_branch_iform(void)
 {
int err;
struct ppc_inst instr;
-   unsigned long addr;
-
-   addr = (unsigned long)
+   unsigned int tmp[2];
+   struct ppc_inst *iptr = (struct ppc_inst *)tmp;
+   unsigned long addr = (unsigned long)tmp;
 
/* The simplest case, branch to self, no flags */
check(instr_is_branch_iform(ppc_inst(0x4800)));
@@ -445,52 +445,57 @@ static void __init test_branch_iform(void)
check(!instr_is_branch_iform(ppc_inst(0x7bfd)));
 
/* Absolute branch to 0x100 */
-   instr = ppc_inst(0x48000103);
-   check(instr_is_branch_to_addr(, 0x100));
+   patch_instruction(iptr, ppc_inst(0x48000103));
+   check(instr_is_branch_to_addr(iptr, 0x100));
/* Absolute branch to 0x420fc */
-   instr = ppc_inst(0x480420ff);
-   check(instr_is_branch_to_addr(, 0x420fc));
+   patch_instruction(iptr, ppc_inst(0x480420ff));
+   check(instr_is_branch_to_addr(iptr, 0x420fc));
/* Maximum positive relative branch, + 20MB - 4B */
-   instr = ppc_inst(0x49fc);
-   check(instr_is_branch_to_addr(, addr + 0x1FC));
+   patch_instruction(iptr, ppc_inst(0x49fc));
+   check(instr_is_branch_to_addr(iptr, addr + 0x1FC));
/* Smallest negative relative branch, - 4B */
-   instr = ppc_inst(0x4bfc);
-   check(instr_is_branch_to_addr(, addr - 4));
+   patch_instruction(iptr, ppc_inst(0x4bfc));
+   check(instr_is_branch_to_addr(iptr, addr - 4));
/* Largest negative relative branch, - 32 MB */
-   instr = ppc_inst(0x4a00);
-   check(instr_is_branch_to_addr(, addr - 0x200));
+   patch_instruction(iptr, ppc_inst(0x4a00));
+   check(instr_is_branch_to_addr(iptr, addr - 0x200));
 
/* Branch to self, with link */
-   err = create_branch(, , addr, BRANCH_SET_LINK);
-   check(instr_is_branch_to_addr(, addr));
+   err = create_branch(, iptr, addr, BRANCH_SET_LINK);
+   patch_instruction(iptr, instr);
+   check(instr_is_branch_to_addr(iptr, addr));
 
/* Branch to self - 0x100, with link */
-   err = create_branch(, , addr - 0x100, BRANCH_SET_LINK);
-   check(instr_is_branch_to_addr(, addr - 0x100));
+   err = create_branch(, iptr, addr - 0x100, BRANCH_SET_LINK);
+   patch_instruction(iptr, instr);
+   check(instr_is_branch_to_addr(iptr, addr - 0x100));
 
/* Branch to self + 0x100, no link */
-   err = create_branch(, , addr + 0x100, 0);
-   check(instr_is_branch_to_addr(, addr + 0x100));
+   err = create_branch(, iptr, addr + 0x100, 0);
+   patch_instruction(iptr, instr);
+   check(instr_is_branch_to_addr(iptr, addr + 0x100));
 
/* Maximum relative negative offset, - 32 MB */
-   err = create_branch(, , addr - 0x200, BRANCH_SET_LINK);
-   check(instr_is_branch_to_addr(, addr - 0x200));
+   err = create_branch(, iptr, addr - 0x200, BRANCH_SET_LINK);
+   patch_instruction(iptr, instr);
+   check(instr_is_branch_to_addr(iptr, addr - 0x200));
 
/* Out of range relative negative offset, - 32 MB + 4*/
-   err = create_branch(, , addr - 0x204, BRANCH_SET_LINK);
+   err = create_branch(, iptr, addr - 0x204, BRANCH_SET_LINK);
check(err);
 
/* Out of range relative positive offset, + 32 MB */
-   err = create_branch(, , addr + 0x200, BRANCH_SET_LINK);
+   err = create_branch(, iptr, addr + 0x200, BRANCH_SET_LINK);
check(err);
 
/* Unaligned target */
-   err = create_branch(, , addr + 3, BRANCH_SET_LINK);
+   err = create_branch(, iptr, addr + 3, BRANCH_SET_LINK);
check(err);
 
/* Check flags are masked correctly */
-   err = create_branch(, , addr, 0xFFFC);
-   check(instr_is_branch_to_addr(, addr));
+   err = create_branch(, iptr, addr, 0xFFFC);
+   patch_instruction(iptr, instr);
+   check(instr_is_branch_to_addr(iptr, addr));
check(ppc_inst_equal(instr, ppc_inst(0x4800)));
 }
 
@@ -513,9 +518,10 @@ static void __init test_branch_bform(void)
int err;
unsigned long addr;
struct ppc_inst *iptr, instr;
+   unsigned int tmp[2];
unsigned int flags;
 
-   iptr = 
+   iptr = (struct ppc_inst *)tmp;
addr = (unsigned 

[PATCH v1 08/12] powerpc: Don't use 'struct ppc_inst' to reference instruction location

2021-05-19 Thread Christophe Leroy
'struct ppc_inst' is an internal representation of an instruction, but
in-memory instructions are and will remain a table of 'unsigned int'
forever.

Replace all 'struct ppc_inst *' used for locating an instruction in
memory by 'unsigned int *'. This removes a lot of undue casts
to 'struct ppc_inst *'.

It also helps locating ab-use of 'struct ppc_inst' dereference.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/code-patching.h  | 22 +++---
 arch/powerpc/include/asm/inst.h   | 12 +--
 arch/powerpc/include/asm/uprobes.h|  4 +-
 arch/powerpc/kernel/crash_dump.c  |  4 +-
 arch/powerpc/kernel/epapr_paravirt.c  |  4 +-
 arch/powerpc/kernel/jump_label.c  |  2 +-
 arch/powerpc/kernel/kgdb.c|  6 +-
 arch/powerpc/kernel/kprobes.c | 17 ++--
 arch/powerpc/kernel/mce_power.c   |  2 +-
 arch/powerpc/kernel/optprobes.c   | 45 +--
 arch/powerpc/kernel/setup_32.c|  2 +-
 arch/powerpc/kernel/trace/ftrace.c| 26 +++---
 arch/powerpc/kernel/uprobes.c |  6 +-
 arch/powerpc/lib/code-patching.c  | 50 ++--
 arch/powerpc/lib/feature-fixups.c | 96 +++
 arch/powerpc/perf/core-book3s.c   |  4 +-
 arch/powerpc/platforms/86xx/mpc86xx_smp.c |  2 +-
 arch/powerpc/platforms/powermac/smp.c |  4 +-
 arch/powerpc/xmon/xmon.c  | 10 +--
 19 files changed, 154 insertions(+), 164 deletions(-)

diff --git a/arch/powerpc/include/asm/code-patching.h 
b/arch/powerpc/include/asm/code-patching.h
index f9bd1397b696..b20ee4e1e6fb 100644
--- a/arch/powerpc/include/asm/code-patching.h
+++ b/arch/powerpc/include/asm/code-patching.h
@@ -23,13 +23,13 @@
 #define BRANCH_ABSOLUTE0x2
 
 bool is_offset_in_branch_range(long offset);
-int create_branch(struct ppc_inst *instr, const struct ppc_inst *addr,
+int create_branch(struct ppc_inst *instr, const unsigned int *addr,
  unsigned long target, int flags);
-int create_cond_branch(struct ppc_inst *instr, const struct ppc_inst *addr,
+int create_cond_branch(struct ppc_inst *instr, const unsigned int *addr,
   unsigned long target, int flags);
-int patch_branch(struct ppc_inst *addr, unsigned long target, int flags);
-int patch_instruction(struct ppc_inst *addr, struct ppc_inst instr);
-int raw_patch_instruction(struct ppc_inst *addr, struct ppc_inst instr);
+int patch_branch(unsigned int *addr, unsigned long target, int flags);
+int patch_instruction(unsigned int *addr, struct ppc_inst instr);
+int raw_patch_instruction(unsigned int *addr, struct ppc_inst instr);
 
 static inline unsigned long patch_site_addr(s32 *site)
 {
@@ -38,18 +38,18 @@ static inline unsigned long patch_site_addr(s32 *site)
 
 static inline int patch_instruction_site(s32 *site, struct ppc_inst instr)
 {
-   return patch_instruction((struct ppc_inst *)patch_site_addr(site), 
instr);
+   return patch_instruction((unsigned int *)patch_site_addr(site), instr);
 }
 
 static inline int patch_branch_site(s32 *site, unsigned long target, int flags)
 {
-   return patch_branch((struct ppc_inst *)patch_site_addr(site), target, 
flags);
+   return patch_branch((unsigned int *)patch_site_addr(site), target, 
flags);
 }
 
 static inline int modify_instruction(unsigned int *addr, unsigned int clr,
 unsigned int set)
 {
-   return patch_instruction((struct ppc_inst *)addr, ppc_inst((*addr & 
~clr) | set));
+   return patch_instruction(addr, ppc_inst((*addr & ~clr) | set));
 }
 
 static inline int modify_instruction_site(s32 *site, unsigned int clr, 
unsigned int set)
@@ -59,9 +59,9 @@ static inline int modify_instruction_site(s32 *site, unsigned 
int clr, unsigned
 
 int instr_is_relative_branch(struct ppc_inst instr);
 int instr_is_relative_link_branch(struct ppc_inst instr);
-unsigned long branch_target(const struct ppc_inst *instr);
-int translate_branch(struct ppc_inst *instr, const struct ppc_inst *dest,
-const struct ppc_inst *src);
+unsigned long branch_target(const unsigned int *instr);
+int translate_branch(struct ppc_inst *instr, const unsigned int *dest,
+const unsigned int *src);
 extern bool is_conditional_branch(struct ppc_inst instr);
 #ifdef CONFIG_PPC_BOOK3E_64
 void __patch_exception(int exc, unsigned long addr);
diff --git a/arch/powerpc/include/asm/inst.h b/arch/powerpc/include/asm/inst.h
index b49f11c69ed5..9f3513621766 100644
--- a/arch/powerpc/include/asm/inst.h
+++ b/arch/powerpc/include/asm/inst.h
@@ -81,13 +81,13 @@ static inline struct ppc_inst ppc_inst_swab(struct ppc_inst 
x)
return ppc_inst_prefix(swab32(ppc_inst_val(x)), 
swab32(ppc_inst_suffix(x)));
 }
 
-static inline struct ppc_inst ppc_inst_read(const struct ppc_inst *ptr)
+static inline struct ppc_inst ppc_inst_read(const unsigned int *ptr)
 {
u32 val, suffix;
 
-   val = *(u32 *)ptr;
+   val = *ptr;
  

[PATCH v1 06/12] powerpc/lib/code-patching: Make instr_is_branch_to_addr() static

2021-05-19 Thread Christophe Leroy
instr_is_branch_to_addr() is only used in code-patching.c

Make it static.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/code-patching.h |  1 -
 arch/powerpc/lib/code-patching.c | 18 +-
 2 files changed, 9 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/include/asm/code-patching.h 
b/arch/powerpc/include/asm/code-patching.h
index f1d029bf906e..f9bd1397b696 100644
--- a/arch/powerpc/include/asm/code-patching.h
+++ b/arch/powerpc/include/asm/code-patching.h
@@ -59,7 +59,6 @@ static inline int modify_instruction_site(s32 *site, unsigned 
int clr, unsigned
 
 int instr_is_relative_branch(struct ppc_inst instr);
 int instr_is_relative_link_branch(struct ppc_inst instr);
-int instr_is_branch_to_addr(const struct ppc_inst *instr, unsigned long addr);
 unsigned long branch_target(const struct ppc_inst *instr);
 int translate_branch(struct ppc_inst *instr, const struct ppc_inst *dest,
 const struct ppc_inst *src);
diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
index 0308429b0d1a..82f2c1edb498 100644
--- a/arch/powerpc/lib/code-patching.c
+++ b/arch/powerpc/lib/code-patching.c
@@ -367,15 +367,6 @@ unsigned long branch_target(const struct ppc_inst *instr)
return 0;
 }
 
-int instr_is_branch_to_addr(const struct ppc_inst *instr, unsigned long addr)
-{
-   if (instr_is_branch_iform(ppc_inst_read(instr)) ||
-   instr_is_branch_bform(ppc_inst_read(instr)))
-   return branch_target(instr) == addr;
-
-   return 0;
-}
-
 int translate_branch(struct ppc_inst *instr, const struct ppc_inst *dest,
 const struct ppc_inst *src)
 {
@@ -410,6 +401,15 @@ void __patch_exception(int exc, unsigned long addr)
 
 #ifdef CONFIG_CODE_PATCHING_SELFTEST
 
+static int instr_is_branch_to_addr(const struct ppc_inst *instr, unsigned long 
addr)
+{
+   if (instr_is_branch_iform(ppc_inst_read(instr)) ||
+   instr_is_branch_bform(ppc_inst_read(instr)))
+   return branch_target(instr) == addr;
+
+   return 0;
+}
+
 static void __init test_trampoline(void)
 {
asm ("nop;\n");
-- 
2.25.0



[PATCH v1 05/12] powerpc: Do not dereference code as 'struct ppc_inst' (uprobe, code-patching, feature-fixups)

2021-05-19 Thread Christophe Leroy
'struct ppc_inst' is an internal structure to represent an instruction,
it is not directly the representation of that instruction in text code.
It is not meant to map and dereference code.

Dereferencing code directly through 'struct ppc_inst' has two main issues:
- On powerpc, structs are expected to be 8 bytes aligned while code is
spread every 4 byte.
- Should a non prefixed instruction lie at the end of the page and the
following page not be mapped, it would generate a page fault.

In-memory code must be accessed with ppc_inst_read().

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/kernel/uprobes.c | 2 +-
 arch/powerpc/lib/code-patching.c  | 8 
 arch/powerpc/lib/feature-fixups.c | 2 +-
 3 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/kernel/uprobes.c b/arch/powerpc/kernel/uprobes.c
index 186f69b11e94..46971bb41d05 100644
--- a/arch/powerpc/kernel/uprobes.c
+++ b/arch/powerpc/kernel/uprobes.c
@@ -42,7 +42,7 @@ int arch_uprobe_analyze_insn(struct arch_uprobe *auprobe,
return -EINVAL;
 
if (cpu_has_feature(CPU_FTR_ARCH_31) &&
-   ppc_inst_prefixed(auprobe->insn) &&
+   ppc_inst_prefixed(ppc_inst_read(>insn)) &&
(addr & 0x3f) == 60) {
pr_info_ratelimited("Cannot register a uprobe on 64 byte 
unaligned prefixed instruction\n");
return -EINVAL;
diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
index 870b30d9be2f..0308429b0d1a 100644
--- a/arch/powerpc/lib/code-patching.c
+++ b/arch/powerpc/lib/code-patching.c
@@ -329,13 +329,13 @@ static unsigned long branch_iform_target(const struct 
ppc_inst *instr)
 {
signed long imm;
 
-   imm = ppc_inst_val(*instr) & 0x3FC;
+   imm = ppc_inst_val(ppc_inst_read(instr)) & 0x3FC;
 
/* If the top bit of the immediate value is set this is negative */
if (imm & 0x200)
imm -= 0x400;
 
-   if ((ppc_inst_val(*instr) & BRANCH_ABSOLUTE) == 0)
+   if ((ppc_inst_val(ppc_inst_read(instr)) & BRANCH_ABSOLUTE) == 0)
imm += (unsigned long)instr;
 
return (unsigned long)imm;
@@ -345,13 +345,13 @@ static unsigned long branch_bform_target(const struct 
ppc_inst *instr)
 {
signed long imm;
 
-   imm = ppc_inst_val(*instr) & 0xFFFC;
+   imm = ppc_inst_val(ppc_inst_read(instr)) & 0xFFFC;
 
/* If the top bit of the immediate value is set this is negative */
if (imm & 0x8000)
imm -= 0x1;
 
-   if ((ppc_inst_val(*instr) & BRANCH_ABSOLUTE) == 0)
+   if ((ppc_inst_val(ppc_inst_read(instr)) & BRANCH_ABSOLUTE) == 0)
imm += (unsigned long)instr;
 
return (unsigned long)imm;
diff --git a/arch/powerpc/lib/feature-fixups.c 
b/arch/powerpc/lib/feature-fixups.c
index fe26f2fa0f3f..8905b53109bc 100644
--- a/arch/powerpc/lib/feature-fixups.c
+++ b/arch/powerpc/lib/feature-fixups.c
@@ -51,7 +51,7 @@ static int patch_alt_instruction(struct ppc_inst *src, struct 
ppc_inst *dest,
 
instr = ppc_inst_read(src);
 
-   if (instr_is_relative_branch(*src)) {
+   if (instr_is_relative_branch(ppc_inst_read(src))) {
struct ppc_inst *target = (struct ppc_inst *)branch_target(src);
 
/* Branch within the section doesn't need translating */
-- 
2.25.0



[PATCH v1 04/12] powerpc/inst: Avoid pointer dereferencing in ppc_inst_equal()

2021-05-19 Thread Christophe Leroy
Avoid casting/dereferencing ppc_inst() as u64* , check each member
of the struct when relevant.

And remove the 0xff initialisation of the suffix for non
prefixed instruction. An instruction with 0xff as a suffix
might be invalid, but still is a prefixed instruction and
has to be considered as this.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/inst.h | 19 +--
 1 file changed, 9 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/include/asm/inst.h b/arch/powerpc/include/asm/inst.h
index 17f74429d5d6..b49f11c69ed5 100644
--- a/arch/powerpc/include/asm/inst.h
+++ b/arch/powerpc/include/asm/inst.h
@@ -62,7 +62,7 @@ static inline int ppc_inst_primary_opcode(struct ppc_inst x)
 }
 
 #ifdef CONFIG_PPC64
-#define ppc_inst(x) ((struct ppc_inst){ .val = (x), .suffix = 0xff })
+#define ppc_inst(x) ((struct ppc_inst){ .val = (x) })
 
 #define ppc_inst_prefix(x, y) ((struct ppc_inst){ .val = (x), .suffix = (y) })
 
@@ -73,7 +73,7 @@ static inline u32 ppc_inst_suffix(struct ppc_inst x)
 
 static inline bool ppc_inst_prefixed(struct ppc_inst x)
 {
-   return ppc_inst_primary_opcode(x) == OP_PREFIX && ppc_inst_suffix(x) != 
0xff;
+   return ppc_inst_primary_opcode(x) == OP_PREFIX;
 }
 
 static inline struct ppc_inst ppc_inst_swab(struct ppc_inst x)
@@ -94,11 +94,6 @@ static inline struct ppc_inst ppc_inst_read(const struct 
ppc_inst *ptr)
}
 }
 
-static inline bool ppc_inst_equal(struct ppc_inst x, struct ppc_inst y)
-{
-   return *(u64 *) == *(u64 *)
-}
-
 #else
 
 #define ppc_inst(x) ((struct ppc_inst){ .val = x })
@@ -125,13 +120,17 @@ static inline struct ppc_inst ppc_inst_read(const struct 
ppc_inst *ptr)
return *ptr;
 }
 
+#endif /* CONFIG_PPC64 */
+
 static inline bool ppc_inst_equal(struct ppc_inst x, struct ppc_inst y)
 {
-   return ppc_inst_val(x) == ppc_inst_val(y);
+   if (ppc_inst_val(x) != ppc_inst_val(y))
+   return false;
+   if (!ppc_inst_prefixed(x))
+   return true;
+   return ppc_inst_suffix(x) == ppc_inst_suffix(y);
 }
 
-#endif /* CONFIG_PPC64 */
-
 static inline int ppc_inst_len(struct ppc_inst x)
 {
return ppc_inst_prefixed(x) ? 8 : 4;
-- 
2.25.0



[PATCH v1 03/12] powerpc/inst: Improve readability of get_user_instr() and friends

2021-05-19 Thread Christophe Leroy
Use get_op() instead of open coding, and remove unneeded line splits.

And remove unneeded local variable initialisation.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/inst.h | 23 ++-
 1 file changed, 10 insertions(+), 13 deletions(-)

diff --git a/arch/powerpc/include/asm/inst.h b/arch/powerpc/include/asm/inst.h
index 57c31e712e67..17f74429d5d6 100644
--- a/arch/powerpc/include/asm/inst.h
+++ b/arch/powerpc/include/asm/inst.h
@@ -3,12 +3,13 @@
 #define _ASM_POWERPC_INST_H
 
 #include 
+#include 
 
 #ifdef CONFIG_PPC64
 
 #define ___get_user_instr(gu_op, dest, ptr)\
 ({ \
-   long __gui_ret = 0; \
+   long __gui_ret; \
unsigned int __user *__gui_ptr = (unsigned int __user *)ptr;\
struct ppc_inst __gui_inst; \
unsigned int __prefix, __suffix;\
@@ -16,10 +17,9 @@
__chk_user_ptr(ptr);\
__gui_ret = gu_op(__prefix, __gui_ptr); \
if (__gui_ret == 0) {   \
-   if ((__prefix >> 26) == OP_PREFIX) {\
+   if (get_op(__prefix) == OP_PREFIX) {\
__gui_ret = gu_op(__suffix, __gui_ptr + 1); \
-   __gui_inst = ppc_inst_prefix(__prefix,  \
-__suffix); \
+   __gui_inst = ppc_inst_prefix(__prefix, __suffix); \
} else {\
__gui_inst = ppc_inst(__prefix);\
}   \
@@ -36,11 +36,9 @@
 })
 #endif /* CONFIG_PPC64 */
 
-#define get_user_instr(x, ptr) \
-   ___get_user_instr(get_user, x, ptr)
+#define get_user_instr(x, ptr) ___get_user_instr(get_user, x, ptr)
 
-#define __get_user_instr(x, ptr) \
-   ___get_user_instr(__get_user, x, ptr)
+#define __get_user_instr(x, ptr) ___get_user_instr(__get_user, x, ptr)
 
 /*
  * Instruction data type for POWER
@@ -60,7 +58,7 @@ static inline u32 ppc_inst_val(struct ppc_inst x)
 
 static inline int ppc_inst_primary_opcode(struct ppc_inst x)
 {
-   return ppc_inst_val(x) >> 26;
+   return get_op(ppc_inst_val(x));
 }
 
 #ifdef CONFIG_PPC64
@@ -75,13 +73,12 @@ static inline u32 ppc_inst_suffix(struct ppc_inst x)
 
 static inline bool ppc_inst_prefixed(struct ppc_inst x)
 {
-   return (ppc_inst_primary_opcode(x) == 1) && ppc_inst_suffix(x) != 0xff;
+   return ppc_inst_primary_opcode(x) == OP_PREFIX && ppc_inst_suffix(x) != 
0xff;
 }
 
 static inline struct ppc_inst ppc_inst_swab(struct ppc_inst x)
 {
-   return ppc_inst_prefix(swab32(ppc_inst_val(x)),
-  swab32(ppc_inst_suffix(x)));
+   return ppc_inst_prefix(swab32(ppc_inst_val(x)), 
swab32(ppc_inst_suffix(x)));
 }
 
 static inline struct ppc_inst ppc_inst_read(const struct ppc_inst *ptr)
@@ -89,7 +86,7 @@ static inline struct ppc_inst ppc_inst_read(const struct 
ppc_inst *ptr)
u32 val, suffix;
 
val = *(u32 *)ptr;
-   if ((val >> 26) == OP_PREFIX) {
+   if (get_op(val) == OP_PREFIX) {
suffix = *((u32 *)ptr + 1);
return ppc_inst_prefix(val, suffix);
} else {
-- 
2.25.0



[PATCH v1 02/12] powerpc/inst: Reduce casts in get_user_instr()

2021-05-19 Thread Christophe Leroy
Declare __gui_ptr as 'unsigned int *' instead of casting it
at each use.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/inst.h | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/include/asm/inst.h b/arch/powerpc/include/asm/inst.h
index 887ef150fdda..57c31e712e67 100644
--- a/arch/powerpc/include/asm/inst.h
+++ b/arch/powerpc/include/asm/inst.h
@@ -9,16 +9,15 @@
 #define ___get_user_instr(gu_op, dest, ptr)\
 ({ \
long __gui_ret = 0; \
-   unsigned long __gui_ptr = (unsigned long)ptr;   \
+   unsigned int __user *__gui_ptr = (unsigned int __user *)ptr;\
struct ppc_inst __gui_inst; \
unsigned int __prefix, __suffix;\
\
__chk_user_ptr(ptr);\
-   __gui_ret = gu_op(__prefix, (unsigned int __user *)__gui_ptr);  \
+   __gui_ret = gu_op(__prefix, __gui_ptr); \
if (__gui_ret == 0) {   \
if ((__prefix >> 26) == OP_PREFIX) {\
-   __gui_ret = gu_op(__suffix, \
-   (unsigned int __user *)__gui_ptr + 1);  \
+   __gui_ret = gu_op(__suffix, __gui_ptr + 1); \
__gui_inst = ppc_inst_prefix(__prefix,  \
 __suffix); \
} else {\
-- 
2.25.0



[PATCH v1 01/12] powerpc/inst: Fix sparse detection on get_user_instr()

2021-05-19 Thread Christophe Leroy
get_user_instr() lacks sparse detection for the __user tag.

This is because __gui_ptr is assigned with a cast.

Fix that by adding a __chk_user_ptr()

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/inst.h | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/inst.h b/arch/powerpc/include/asm/inst.h
index 268d3bd073c8..887ef150fdda 100644
--- a/arch/powerpc/include/asm/inst.h
+++ b/arch/powerpc/include/asm/inst.h
@@ -12,6 +12,8 @@
unsigned long __gui_ptr = (unsigned long)ptr;   \
struct ppc_inst __gui_inst; \
unsigned int __prefix, __suffix;\
+   \
+   __chk_user_ptr(ptr);\
__gui_ret = gu_op(__prefix, (unsigned int __user *)__gui_ptr);  \
if (__gui_ret == 0) {   \
if ((__prefix >> 26) == OP_PREFIX) {\
@@ -29,7 +31,10 @@
 })
 #else /* !CONFIG_PPC64 */
 #define ___get_user_instr(gu_op, dest, ptr)\
-   gu_op((dest).val, (u32 __user *)(ptr))
+({ \
+   __chk_user_ptr(ptr);\
+   gu_op((dest).val, (u32 __user *)(ptr)); \
+})
 #endif /* CONFIG_PPC64 */
 
 #define get_user_instr(x, ptr) \
-- 
2.25.0



[PATCH v1 00/12] Cleanup use of 'struct ppc_inst'

2021-05-19 Thread Christophe Leroy
This series is a cleanup of the use of 'struct ppc_inst'.

A confusion is made between internal representation of powerpc
instructions with 'struct ppc_inst' and in-memory code which is
and will always be an array of 'unsigned int'.

This series cleans it up.

First patch is fixing detection of missing '__user' flag by sparse
when using get_user_instr().

Last part of the series does some source code cleanup in
optprobes, it is put at the ends of this series because of
clashes with the 'struct ppc_inst' cleanups.

Christophe Leroy (12):
  powerpc/inst: Fix sparse detection on get_user_instr()
  powerpc/inst: Reduce casts in get_user_instr()
  powerpc/inst: Improve readability of get_user_instr() and friends
  powerpc/inst: Avoid pointer dereferencing in ppc_inst_equal()
  powerpc: Do not dereference code as 'struct ppc_inst' (uprobe,
code-patching, feature-fixups)
  powerpc/lib/code-patching: Make instr_is_branch_to_addr() static
  powerpc/lib/code-patching: Don't use struct 'ppc_inst' for runnable
code in tests.
  powerpc: Don't use 'struct ppc_inst' to reference instruction location
  powerpc/inst: Refactor PPC32 and PPC64 versions
  powerpc/optprobes: Minimise casts
  powerpc/optprobes: Compact code source a bit.
  powerpc/optprobes: use PPC_RAW_ macros

 arch/powerpc/include/asm/code-patching.h  |  23 ++-
 arch/powerpc/include/asm/inst.h   |  95 +
 arch/powerpc/include/asm/ppc-opcode.h |  11 +-
 arch/powerpc/include/asm/uprobes.h|   4 +-
 arch/powerpc/kernel/crash_dump.c  |   4 +-
 arch/powerpc/kernel/epapr_paravirt.c  |   4 +-
 arch/powerpc/kernel/jump_label.c  |   2 +-
 arch/powerpc/kernel/kgdb.c|   6 +-
 arch/powerpc/kernel/kprobes.c |  17 ++-
 arch/powerpc/kernel/mce_power.c   |   2 +-
 arch/powerpc/kernel/optprobes.c   | 124 +---
 arch/powerpc/kernel/setup_32.c|   2 +-
 arch/powerpc/kernel/trace/ftrace.c|  26 ++--
 arch/powerpc/kernel/uprobes.c |   6 +-
 arch/powerpc/lib/code-patching.c  | 165 --
 arch/powerpc/lib/feature-fixups.c |  98 ++---
 arch/powerpc/perf/core-book3s.c   |   4 +-
 arch/powerpc/platforms/86xx/mpc86xx_smp.c |   2 +-
 arch/powerpc/platforms/powermac/smp.c |   4 +-
 arch/powerpc/xmon/xmon.c  |  10 +-
 20 files changed, 274 insertions(+), 335 deletions(-)

-- 
2.25.0



Re: Linux powerpc new system call instruction and ABI

2021-05-19 Thread Segher Boessenkool
On Wed, May 19, 2021 at 06:42:40PM +1000, Nicholas Piggin wrote:
> Excerpts from Joakim Tjernlund's message of May 19, 2021 6:08 pm:
> > I always figured the ppc way was superior. It begs the question if not the 
> > other archs should
> > change instead?
> 
> It is superior in some ways, not enough to be worth being different.

The PowerPC syscall ABI *requires* using cr0.3 for indicating errors,
you will have to do that whether you conflate the concepts of return
code and error indicator or not!

> Other archs are unlikely to change because it would be painful for
> not much benefit.

Other archs cannot easily change for much the same reason :-)

> New system calls just should be made to not return
> error numbers.

Which sometimes is a difficult / non-natural / clumsy thing to do.


Segher


Re: [PATCH v5 5/9] powerpc/mm/book3s64: Update tlb flush routines to take a page walk cache flush argument

2021-05-19 Thread Segher Boessenkool
On Wed, May 19, 2021 at 06:37:44AM -0700, Guenter Roeck wrote:
> On 5/19/21 5:03 AM, Segher Boessenkool wrote:
> >On Tue, May 18, 2021 at 07:45:14PM -0500, Segher Boessenkool wrote:
> >>And it actually explicitly is undefined behaviour in C90 already
> >>(3.6.6.4 in C90, 6.8.6.4 in C99 and later).
> >
> >... but there is a GCC extension that allows this by default:
> >
> >   For C only, warn about a 'return' statement with an expression in a
> >   function whose return type is 'void', unless the expression type is
> >   also 'void'.  As a GNU extension, the latter case is accepted
> >   without a warning unless '-Wpedantic' is used.
> 
> In C99:
> 
> "6.8.6.4 The return statement
> Constraints
> 
> A return statement with an expression shall not appear in a function whose 
> return type
> is void. A return statement without an expression shall only appear in a 
> function
> whose return type is void."
> 
> Sounds like invalid to me, not just undefined behavior.

I don't know what "invalid" would mean here other than UB, it isn't a
specific defined term, unlike the latter, which is precisely defined in
3.4.3/1:
  undefined behavior
  behavior, upon use of a nonportable or erroneous program construct or
  of erroneous data, for which this International Standard imposes no
  requirements

This is the strongest thing the standard can say, it is not Law, it does
not prohibit anyone from doing anything :-)

"Shall" and "shall not" X means it is undefined behaviour if X (or its
inverse)  is violated.  See 4.2:
  If a ''shall'' or ''shall not'' requirement that appears outside of a
  constraint or runtime-constraint is violated, the behavior is
  undefined.  Undefined behavior is otherwise indicated in this
  International Standard by the words ''undefined behavior'' or by the
  omission of any explicit definition of behavior.  There is no
  difference in emphasis among these three; they all describe ''behavior
  that is undefined''.
which also explains that what you call "invalid" has undefined behaviour
just as well, most likely.


Segher


Re: [PATCH v5 5/9] powerpc/mm/book3s64: Update tlb flush routines to take a page walk cache flush argument

2021-05-19 Thread Guenter Roeck

On 5/19/21 5:03 AM, Segher Boessenkool wrote:

On Tue, May 18, 2021 at 07:45:14PM -0500, Segher Boessenkool wrote:

On Wed, May 19, 2021 at 10:26:22AM +1000, Michael Ellerman wrote:

Guenter Roeck  writes:

Ah, sorry. I wasn't aware that the following is valid C code

void f1()
{
  return f2();
  ^^
}

as long as f2() is void as well. Confusing, but we live and learn.


It might be valid, but it's still bad IMHO.

It's confusing to readers, and serves no useful purpose.


And it actually explicitly is undefined behaviour in C90 already
(3.6.6.4 in C90, 6.8.6.4 in C99 and later).


... but there is a GCC extension that allows this by default:

   For C only, warn about a 'return' statement with an expression in a
   function whose return type is 'void', unless the expression type is
   also 'void'.  As a GNU extension, the latter case is accepted
   without a warning unless '-Wpedantic' is used.



In C99:

"6.8.6.4 The return statement
Constraints

A return statement with an expression shall not appear in a function whose 
return type
is void. A return statement without an expression shall only appear in a 
function
whose return type is void."

Sounds like invalid to me, not just undefined behavior.

Guenter


Re: Linux powerpc new system call instruction and ABI

2021-05-19 Thread Dmitry V. Levin
On Wed, May 19, 2021 at 08:59:05PM +1000, Nicholas Piggin wrote:
> Excerpts from Dmitry V. Levin's message of May 19, 2021 8:24 pm:
> > On Wed, May 19, 2021 at 12:50:24PM +1000, Nicholas Piggin wrote:
> > [...]
> >> With this patch, I think the ptrace ABI should mostly be fixed. I think 
> >> a problem remains with applications that look at system call return 
> >> registers directly and have powerpc specific error cases. Those probably
> >> will just need to be updated unfortunately. Michael thought it might be
> >> possible to return an indication via ptrace somehow that the syscall is
> >> using a new ABI, so such apps can be updated to test for it. I don't 
> >> know how that would be done.
> > 
> > Is there any sane way for these applications to handle the scv case?
> > How can they tell that the scv semantics is being used for the given
> > syscall invocation?  Can this information be obtained e.g. from struct
> > pt_regs?
> 
> Not that I know of. Michael suggested there might be a way to add 
> something. ptrace_syscall_info has some pad bytes, could
> we use one for flags bits and set a bit for "new system call ABI"?

PTRACE_GET_SYSCALL_INFO is an architecture-agnostic API, it hides all
architecture-specific details behind struct ptrace_syscall_info which has
the same meaning on all architectures.  ptrace_syscall_info.exit contains
both rval and is_error fields to support every architecture regardless of
its syscall ABI.

ptrace_syscall_info.exit is extensible, but every architecture would have
to define a method of telling whether the system call follows the "new
system call ABI" conventions to export this bit of information.

This essentially means implementing something like
static inline long syscall_get_error_abi(struct task_struct *task, struct 
pt_regs *regs)
for every architecture, and using it along with syscall_get_error
in ptrace_get_syscall_info_exit to initialize the new field in
ptrace_syscall_info.exit structure.

> As a more hacky thing you could make a syscall with -1 and see how
> the error looks, and then assume all syscalls will be the same.

This would be very unreliable because sc and scv are allowed to intermingle,
so every syscall invocation can follow any of these two error handling
conventions.

> Or... is it possible at syscall entry to peek the address of
> the instruction which caused the call and see if that was a
> scv instruction? That would be about as reliable as possible
> without having that new flag bit.

No other architecture requires peeking into tracee memory just to find out
the syscall ABI.  This would make powerpc the most ugly architecture for
ptracing.

I wonder why can't this information be just exported to the tracer via
struct pt_regs?


-- 
ldv


Re: Linux powerpc new system call instruction and ABI

2021-05-19 Thread Tulio Magno Quites Machado Filho
Nicholas Piggin via Libc-alpha  writes:

> As a more hacky thing you could make a syscall with -1 and see how
> the error looks, and then assume all syscalls will be the same.

I'm not sure this would work.
Even in glibc, it's expected that early syscalls will use sc while scv is used
later in the execution.

-- 
Tulio Magno


Re: [PATCH v5 5/9] powerpc/mm/book3s64: Update tlb flush routines to take a page walk cache flush argument

2021-05-19 Thread Segher Boessenkool
On Tue, May 18, 2021 at 07:45:14PM -0500, Segher Boessenkool wrote:
> On Wed, May 19, 2021 at 10:26:22AM +1000, Michael Ellerman wrote:
> > Guenter Roeck  writes:
> > > Ah, sorry. I wasn't aware that the following is valid C code
> > >
> > > void f1()
> > > {
> > >  return f2();
> > >  ^^
> > > }
> > >
> > > as long as f2() is void as well. Confusing, but we live and learn.
> > 
> > It might be valid, but it's still bad IMHO.
> > 
> > It's confusing to readers, and serves no useful purpose.
> 
> And it actually explicitly is undefined behaviour in C90 already
> (3.6.6.4 in C90, 6.8.6.4 in C99 and later).

... but there is a GCC extension that allows this by default:

  For C only, warn about a 'return' statement with an expression in a
  function whose return type is 'void', unless the expression type is
  also 'void'.  As a GNU extension, the latter case is accepted
  without a warning unless '-Wpedantic' is used.


Segher


Re: [PATCH v3 5/6] sched/fair: Consider SMT in ASYM_PACKING load balance

2021-05-19 Thread Srikar Dronamraju
* Peter Zijlstra  [2021-05-19 11:59:48]:

> On Tue, May 18, 2021 at 12:07:40PM -0700, Ricardo Neri wrote:
> > On Fri, May 14, 2021 at 07:14:15PM -0700, Ricardo Neri wrote:
> > > On Fri, May 14, 2021 at 11:47:45AM +0200, Peter Zijlstra wrote:
> 
> > > > So I'm thinking that this is a property of having ASYM_PACKING at a core
> > > > level, rather than some arch special. Wouldn't something like this be
> > > > more appropriate?
> 
> > > Thanks Peter for the quick review! This makes sense to me. The only
> > > reason we proposed arch_asym_check_smt_siblings() is because we were
> > > about breaking powerpc (I need to study how they set priorities for SMT,
> > > if applicable). If you think this is not an issue I can post a
> > > v4 with this update.
> > 
> > As far as I can see, priorities in powerpc are set by the CPU number.
> > However, I am not sure how CPUs are enumerated? If CPUs in brackets are
> > SMT sibling, Does an enumeration looks like A) [0, 1], [2, 3] or B) [0, 2],
> > [1, 3]? I guess B is the right answer. Otherwise, both SMT siblings of a
> > core would need to be busy before a new core is used.
> > 
> > Still, I think the issue described in the cover letter may be
> > reproducible in powerpc as well. If CPU3 is offlined, and [0, 2] pulled
> > tasks from [1, -] so that both CPU0 and CPU2 become busy, CPU1 would not be
> > able to help since CPU0 has the highest priority.
> > 
> > I am cc'ing the linuxppc list to get some feedback.
> 
> IIRC the concern with Power is that their Cores can go faster if the
> higher SMT siblings are unused.
> 
> That is, suppose you have an SMT4 Core with only a single active task,
> then if only SMT0 is used it can reach max performance, but if the
> active sibling is SMT1 it can not reach max performance, and if the only
> active sibling is SMT2 it goes slower still.
> 
> So they need to pack the tasks to the lowest SMT siblings, and have the
> highest SMT siblings idle (where possible) in order to increase
> performance.
> 
> 

If you are referring to SD_ASYM_PACKING, then packing tasks to lowest SMT
was needed in POWER7 timeframe. So if there was one thread running, then
running on the lowest sibling provided the best performance as if running in
single threaded mode.

However recent chips like POWER8/ POWER9 / POWER10 dont need SD_ASYM_PACKING
since the hardware itself does the switch. So even if task is place on a
higher sibling within the core, we dont need to switch the task to a lower
sibling for it to perform better. Now running only one thread running on any
sibling, its expected to run in single threaded mode.

-- 
Thanks and Regards
Srikar Dronamraju


[PATCH v2 4/4] powerpc: wii_defconfig: Enable OTP by default

2021-05-19 Thread Emmanuel Gil Peyrot
This selects the nintendo-otp module when building for this platform, if
CONFIG_NVMEM is also selected.

Signed-off-by: Emmanuel Gil Peyrot 
---
 arch/powerpc/configs/wii_defconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/configs/wii_defconfig 
b/arch/powerpc/configs/wii_defconfig
index 379c171f3ddd..a0c45bf2bfb1 100644
--- a/arch/powerpc/configs/wii_defconfig
+++ b/arch/powerpc/configs/wii_defconfig
@@ -99,6 +99,7 @@ CONFIG_LEDS_TRIGGER_HEARTBEAT=y
 CONFIG_LEDS_TRIGGER_PANIC=y
 CONFIG_RTC_CLASS=y
 CONFIG_RTC_DRV_GENERIC=y
+CONFIG_NVMEM_NINTENDO_OTP=y
 CONFIG_EXT2_FS=y
 CONFIG_EXT4_FS=y
 CONFIG_FUSE_FS=m
-- 
2.31.1



[PATCH v2 3/4] powerpc: wii.dts: Expose the OTP on this platform

2021-05-19 Thread Emmanuel Gil Peyrot
This can be used by the newly-added nintendo-otp nvmem module.

Signed-off-by: Emmanuel Gil Peyrot 
---
 arch/powerpc/boot/dts/wii.dts | 5 +
 1 file changed, 5 insertions(+)

diff --git a/arch/powerpc/boot/dts/wii.dts b/arch/powerpc/boot/dts/wii.dts
index aaa381da1906..7837c4a3f09c 100644
--- a/arch/powerpc/boot/dts/wii.dts
+++ b/arch/powerpc/boot/dts/wii.dts
@@ -219,6 +219,11 @@ control@d800100 {
reg = <0x0d800100 0x300>;
};
 
+   otp@d8001ec {
+   compatible = "nintendo,hollywood-otp";
+   reg = <0x0d8001ec 0x8>;
+   };
+
disk@d806000 {
compatible = "nintendo,hollywood-di";
reg = <0x0d806000 0x40>;
-- 
2.31.1



[PATCH v2 2/4] dt-bindings: nintendo-otp: Document the Wii and Wii U OTP support

2021-05-19 Thread Emmanuel Gil Peyrot
Both of these consoles use the exact same two registers, even at the
same address, but the Wii U has eight banks of 128 bytes memory while
the Wii only has one, hence the two compatible strings.

Signed-off-by: Emmanuel Gil Peyrot 
---
 .../devicetree/bindings/nvmem/nintendo-otp.txt | 14 ++
 1 file changed, 14 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/nvmem/nintendo-otp.txt

diff --git a/Documentation/devicetree/bindings/nvmem/nintendo-otp.txt 
b/Documentation/devicetree/bindings/nvmem/nintendo-otp.txt
new file mode 100644
index ..b26d705ec52d
--- /dev/null
+++ b/Documentation/devicetree/bindings/nvmem/nintendo-otp.txt
@@ -0,0 +1,14 @@
+Nintendo Wii and Wii U OTP
+
+Required Properties:
+- compatible: depending on the console this should be one of:
+   - "nintendo,hollywood-otp" for the Wii
+   - "nintendo,latte-otp" for the Wii U
+- reg: base address and size of the OTP registers
+
+
+Example:
+   otp@d8001ec {
+   compatible = "nintendo,latte-otp";
+   reg = <0x0d8001ec 0x8>;
+   };
-- 
2.31.1



[PATCH v2 1/4] nvmem: nintendo-otp: Add new driver for the Wii and Wii U OTP

2021-05-19 Thread Emmanuel Gil Peyrot
This OTP is read-only and contains various keys used by the console to
decrypt, encrypt or verify various pieces of storage.

Its size depends on the console, it is 128 bytes on the Wii and
1024 bytes on the Wii U (split into eight 128 bytes banks).

It can be used directly by writing into one register and reading from
the other one, without any additional synchronisation.

Signed-off-by: Emmanuel Gil Peyrot 
---
 drivers/nvmem/Kconfig|  11 
 drivers/nvmem/Makefile   |   2 +
 drivers/nvmem/nintendo-otp.c | 115 +++
 3 files changed, 128 insertions(+)
 create mode 100644 drivers/nvmem/nintendo-otp.c

diff --git a/drivers/nvmem/Kconfig b/drivers/nvmem/Kconfig
index dd2019006838..dd6196e49b2d 100644
--- a/drivers/nvmem/Kconfig
+++ b/drivers/nvmem/Kconfig
@@ -107,6 +107,17 @@ config MTK_EFUSE
  This driver can also be built as a module. If so, the module
  will be called efuse-mtk.
 
+config NVMEM_NINTENDO_OTP
+   tristate "Nintendo Wii and Wii U OTP Support"
+   help
+ This is a driver to expose the OTP on a Nintendo Wii or Wii U.
+
+ This memory contains common and per-console keys, signatures and
+ related data required to access peripherals.
+
+ This driver can also be built as a module. If so, the module
+ will be called nvmem-nintendo-otp.
+
 config QCOM_QFPROM
tristate "QCOM QFPROM Support"
depends on ARCH_QCOM || COMPILE_TEST
diff --git a/drivers/nvmem/Makefile b/drivers/nvmem/Makefile
index bbea1410240a..dcbbde35b6a8 100644
--- a/drivers/nvmem/Makefile
+++ b/drivers/nvmem/Makefile
@@ -23,6 +23,8 @@ obj-$(CONFIG_NVMEM_LPC18XX_OTP)   += nvmem_lpc18xx_otp.o
 nvmem_lpc18xx_otp-y:= lpc18xx_otp.o
 obj-$(CONFIG_NVMEM_MXS_OCOTP)  += nvmem-mxs-ocotp.o
 nvmem-mxs-ocotp-y  := mxs-ocotp.o
+obj-$(CONFIG_NVMEM_NINTENDO_OTP)   += nvmem-nintendo-otp.o
+nvmem-nintendo-otp-y   := nintendo-otp.o
 obj-$(CONFIG_MTK_EFUSE)+= nvmem_mtk-efuse.o
 nvmem_mtk-efuse-y  := mtk-efuse.o
 obj-$(CONFIG_QCOM_QFPROM)  += nvmem_qfprom.o
diff --git a/drivers/nvmem/nintendo-otp.c b/drivers/nvmem/nintendo-otp.c
new file mode 100644
index ..de6f5d7c10ef
--- /dev/null
+++ b/drivers/nvmem/nintendo-otp.c
@@ -0,0 +1,115 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Nintendo Wii and Wii U OTP driver
+ *
+ * Copyright (C) 2021 Emmanuel Gil Peyrot 
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#define HW_OTPCMD  0
+#define HW_OTPDATA 4
+#define OTP_READ   0x8000
+
+struct nintendo_otp_priv {
+   void __iomem *regs;
+};
+
+struct nintendo_otp_devtype_data {
+   const char *name;
+   unsigned int num_banks;
+};
+
+static const struct nintendo_otp_devtype_data hollywood_otp_data = {
+   .name = "wii-otp",
+   .num_banks = 1,
+};
+
+static const struct nintendo_otp_devtype_data latte_otp_data = {
+   .name = "wiiu-otp",
+   .num_banks = 8,
+};
+
+static int nintendo_otp_reg_read(void *context,
+unsigned int reg, void *_val, size_t bytes)
+{
+   struct nintendo_otp_priv *priv = context;
+   u32 *val = _val;
+   int words = bytes >> 2;
+   u32 bank, addr;
+
+   while (words--) {
+   bank = (reg << 1) & ~0xff;
+   addr = (reg >> 2) & 0x1f;
+   iowrite32be(OTP_READ | bank | addr, priv->regs + HW_OTPCMD);
+   *val++ = ioread32be(priv->regs + HW_OTPDATA);
+   reg += 4;
+   }
+
+   return 0;
+}
+
+static const struct of_device_id nintendo_otp_of_table[] = {
+   { .compatible = "nintendo,hollywood-otp", .data = _otp_data },
+   { .compatible = "nintendo,latte-otp", .data = _otp_data },
+   {/* sentinel */},
+};
+MODULE_DEVICE_TABLE(of, nintendo_otp_of_table);
+
+static int nintendo_otp_probe(struct platform_device *pdev)
+{
+   struct device *dev = >dev;
+   const struct of_device_id *of_id =
+   of_match_device(nintendo_otp_of_table, dev);
+   struct resource *res;
+   struct nvmem_device *nvmem;
+   struct nintendo_otp_priv *priv;
+
+   struct nvmem_config config = {
+   .stride = 4,
+   .word_size = 4,
+   .reg_read = nintendo_otp_reg_read,
+   .read_only = true,
+   .root_only = true,
+   };
+
+   priv = devm_kzalloc(dev, sizeof(*priv), GFP_KERNEL);
+   if (!priv)
+   return -ENOMEM;
+
+   res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
+   priv->regs = devm_ioremap_resource(dev, res);
+   if (IS_ERR(priv->regs))
+   return PTR_ERR(priv->regs);
+
+   if (of_id->data) {
+   const struct nintendo_otp_devtype_data *data = of_id->data;
+   config.name = data->name;
+   config.size = data->num_banks * 128;
+   }
+
+   config.dev = dev;
+   

[PATCH v2 0/4] nvmem: nintendo-otp: Add new driver for the Wii and Wii U OTP

2021-05-19 Thread Emmanuel Gil Peyrot
The OTP is a read-only memory area which contains various keys and
signatures used to decrypt, encrypt or verify various pieces of storage.

Its size depends on the console, it is 128 bytes on the Wii and
1024 bytes on the Wii U (split into eight 128 bytes banks).

It can be used directly by writing into one register and reading from
the other one, without any additional synchronisation.

This series has only been tested on the Wii U so far, using the
downstream 4.19 branch from linux-wiiu[1], but it should also work on
the Wii on mainline.

[1] https://gitlab.com/linux-wiiu/linux-wiiu

Changes since v1:
- Fixed the commit messages so they can be accepted by other email
  servers, sorry about that.

Emmanuel Gil Peyrot (4):
  nvmem: nintendo-otp: Add new driver for the Wii and Wii U OTP
  dt-bindings: nintendo-otp: Document the Wii and Wii U OTP support
  powerpc: wii.dts: Expose the OTP on this platform
  powerpc: wii_defconfig: Enable OTP by default

 .../bindings/nvmem/nintendo-otp.txt   |  14 +++
 arch/powerpc/boot/dts/wii.dts |   5 +
 arch/powerpc/configs/wii_defconfig|   1 +
 drivers/nvmem/Kconfig |  11 ++
 drivers/nvmem/Makefile|   2 +
 drivers/nvmem/nintendo-otp.c  | 115 ++
 6 files changed, 148 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/nvmem/nintendo-otp.txt
 create mode 100644 drivers/nvmem/nintendo-otp.c

-- 
2.31.1



[PATCH 3/4] powerpc: wii.dts: Expose the OTP on this platform

2021-05-19 Thread Emmanuel Gil Peyrot
This can be used by the newly-added nintendo-otp nvmem module.

Signed-off-by: Emmanuel Gil Peyrot 
---
 arch/powerpc/boot/dts/wii.dts | 5 +
 1 file changed, 5 insertions(+)

diff --git a/arch/powerpc/boot/dts/wii.dts b/arch/powerpc/boot/dts/wii.dts
index aaa381da1906..7837c4a3f09c 100644
--- a/arch/powerpc/boot/dts/wii.dts
+++ b/arch/powerpc/boot/dts/wii.dts
@@ -219,6 +219,11 @@ control@d800100 {
reg = <0x0d800100 0x300>;
};
 
+   otp@d8001ec {
+   compatible = "nintendo,hollywood-otp";
+   reg = <0x0d8001ec 0x8>;
+   };
+
disk@d806000 {
compatible = "nintendo,hollywood-di";
reg = <0x0d806000 0x40>;
-- 
2.31.1



[PATCH 4/4] powerpc: wii_defconfig: Enable OTP by default

2021-05-19 Thread Emmanuel Gil Peyrot
This selects the nintendo-otp module when building for this platform, if
CONFIG_NVMEM is also selected.

Signed-off-by: Emmanuel Gil Peyrot 
---
 arch/powerpc/configs/wii_defconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/configs/wii_defconfig 
b/arch/powerpc/configs/wii_defconfig
index 379c171f3ddd..a0c45bf2bfb1 100644
--- a/arch/powerpc/configs/wii_defconfig
+++ b/arch/powerpc/configs/wii_defconfig
@@ -99,6 +99,7 @@ CONFIG_LEDS_TRIGGER_HEARTBEAT=y
 CONFIG_LEDS_TRIGGER_PANIC=y
 CONFIG_RTC_CLASS=y
 CONFIG_RTC_DRV_GENERIC=y
+CONFIG_NVMEM_NINTENDO_OTP=y
 CONFIG_EXT2_FS=y
 CONFIG_EXT4_FS=y
 CONFIG_FUSE_FS=m
-- 
2.31.1



[PATCH 2/4] dt-bindings: nintendo-otp: Document the Wii and Wii U OTP support

2021-05-19 Thread Emmanuel Gil Peyrot
Both of these consoles use the exact same two registers, even at the
same address, but the Wii U has eight banks of 128 bytes memory while
the Wii only has one, hence the two compatible strings.

Signed-off-by: Emmanuel Gil Peyrot 
---
 .../devicetree/bindings/nvmem/nintendo-otp.txt | 14 ++
 1 file changed, 14 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/nvmem/nintendo-otp.txt

diff --git a/Documentation/devicetree/bindings/nvmem/nintendo-otp.txt 
b/Documentation/devicetree/bindings/nvmem/nintendo-otp.txt
new file mode 100644
index ..b26d705ec52d
--- /dev/null
+++ b/Documentation/devicetree/bindings/nvmem/nintendo-otp.txt
@@ -0,0 +1,14 @@
+Nintendo Wii and Wii U OTP
+
+Required Properties:
+- compatible: depending on the console this should be one of:
+   - "nintendo,hollywood-otp" for the Wii
+   - "nintendo,latte-otp" for the Wii U
+- reg: base address and size of the OTP registers
+
+
+Example:
+   otp@d8001ec {
+   compatible = "nintendo,latte-otp";
+   reg = <0x0d8001ec 0x8>;
+   };
-- 
2.31.1



[PATCH 1/4] nvmem: nintendo-otp: Add new driver for the Wii and Wii U OTP

2021-05-19 Thread Emmanuel Gil Peyrot
This OTP is read-only and contains various keys used by the console to
decrypt, encrypt or verify various pieces of storage.

Its size depends on the console, it is 128 bytes on the Wii and
1024 bytes on the Wii U (split into eight 128 bytes banks).

It can be used directly by writing into one register and reading from
the other one, without any additional synchronisation.

Signed-off-by: Emmanuel Gil Peyrot 
---
 drivers/nvmem/Kconfig|  11 
 drivers/nvmem/Makefile   |   2 +
 drivers/nvmem/nintendo-otp.c | 115 +++
 3 files changed, 128 insertions(+)
 create mode 100644 drivers/nvmem/nintendo-otp.c

diff --git a/drivers/nvmem/Kconfig b/drivers/nvmem/Kconfig
index dd2019006838..dd6196e49b2d 100644
--- a/drivers/nvmem/Kconfig
+++ b/drivers/nvmem/Kconfig
@@ -107,6 +107,17 @@ config MTK_EFUSE
  This driver can also be built as a module. If so, the module
  will be called efuse-mtk.
 
+config NVMEM_NINTENDO_OTP
+   tristate "Nintendo Wii and Wii U OTP Support"
+   help
+ This is a driver to expose the OTP on a Nintendo Wii or Wii U.
+
+ This memory contains common and per-console keys, signatures and
+ related data required to access peripherals.
+
+ This driver can also be built as a module. If so, the module
+ will be called nvmem-nintendo-otp.
+
 config QCOM_QFPROM
tristate "QCOM QFPROM Support"
depends on ARCH_QCOM || COMPILE_TEST
diff --git a/drivers/nvmem/Makefile b/drivers/nvmem/Makefile
index bbea1410240a..dcbbde35b6a8 100644
--- a/drivers/nvmem/Makefile
+++ b/drivers/nvmem/Makefile
@@ -23,6 +23,8 @@ obj-$(CONFIG_NVMEM_LPC18XX_OTP)   += nvmem_lpc18xx_otp.o
 nvmem_lpc18xx_otp-y:= lpc18xx_otp.o
 obj-$(CONFIG_NVMEM_MXS_OCOTP)  += nvmem-mxs-ocotp.o
 nvmem-mxs-ocotp-y  := mxs-ocotp.o
+obj-$(CONFIG_NVMEM_NINTENDO_OTP)   += nvmem-nintendo-otp.o
+nvmem-nintendo-otp-y   := nintendo-otp.o
 obj-$(CONFIG_MTK_EFUSE)+= nvmem_mtk-efuse.o
 nvmem_mtk-efuse-y  := mtk-efuse.o
 obj-$(CONFIG_QCOM_QFPROM)  += nvmem_qfprom.o
diff --git a/drivers/nvmem/nintendo-otp.c b/drivers/nvmem/nintendo-otp.c
new file mode 100644
index ..de6f5d7c10ef
--- /dev/null
+++ b/drivers/nvmem/nintendo-otp.c
@@ -0,0 +1,115 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Nintendo Wii and Wii U OTP driver
+ *
+ * Copyright (C) 2021 Emmanuel Gil Peyrot 
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#define HW_OTPCMD  0
+#define HW_OTPDATA 4
+#define OTP_READ   0x8000
+
+struct nintendo_otp_priv {
+   void __iomem *regs;
+};
+
+struct nintendo_otp_devtype_data {
+   const char *name;
+   unsigned int num_banks;
+};
+
+static const struct nintendo_otp_devtype_data hollywood_otp_data = {
+   .name = "wii-otp",
+   .num_banks = 1,
+};
+
+static const struct nintendo_otp_devtype_data latte_otp_data = {
+   .name = "wiiu-otp",
+   .num_banks = 8,
+};
+
+static int nintendo_otp_reg_read(void *context,
+unsigned int reg, void *_val, size_t bytes)
+{
+   struct nintendo_otp_priv *priv = context;
+   u32 *val = _val;
+   int words = bytes >> 2;
+   u32 bank, addr;
+
+   while (words--) {
+   bank = (reg << 1) & ~0xff;
+   addr = (reg >> 2) & 0x1f;
+   iowrite32be(OTP_READ | bank | addr, priv->regs + HW_OTPCMD);
+   *val++ = ioread32be(priv->regs + HW_OTPDATA);
+   reg += 4;
+   }
+
+   return 0;
+}
+
+static const struct of_device_id nintendo_otp_of_table[] = {
+   { .compatible = "nintendo,hollywood-otp", .data = _otp_data },
+   { .compatible = "nintendo,latte-otp", .data = _otp_data },
+   {/* sentinel */},
+};
+MODULE_DEVICE_TABLE(of, nintendo_otp_of_table);
+
+static int nintendo_otp_probe(struct platform_device *pdev)
+{
+   struct device *dev = >dev;
+   const struct of_device_id *of_id =
+   of_match_device(nintendo_otp_of_table, dev);
+   struct resource *res;
+   struct nvmem_device *nvmem;
+   struct nintendo_otp_priv *priv;
+
+   struct nvmem_config config = {
+   .stride = 4,
+   .word_size = 4,
+   .reg_read = nintendo_otp_reg_read,
+   .read_only = true,
+   .root_only = true,
+   };
+
+   priv = devm_kzalloc(dev, sizeof(*priv), GFP_KERNEL);
+   if (!priv)
+   return -ENOMEM;
+
+   res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
+   priv->regs = devm_ioremap_resource(dev, res);
+   if (IS_ERR(priv->regs))
+   return PTR_ERR(priv->regs);
+
+   if (of_id->data) {
+   const struct nintendo_otp_devtype_data *data = of_id->data;
+   config.name = data->name;
+   config.size = data->num_banks * 128;
+   }
+
+   config.dev = dev;
+   

[PATCH 0/4] nvmem: nintendo-otp: Add new driver for the Wii and Wii U OTP

2021-05-19 Thread Emmanuel Gil Peyrot
The OTP is a read-only memory area which contains various keys and
signatures used to decrypt, encrypt or verify various pieces of storage.

Its size depends on the console, it is 128 bytes on the Wii and
1024 bytes on the Wii U (split into eight 128 bytes banks).

It can be used directly by writing into one register and reading from
the other one, without any additional synchronisation.

This series has only been tested on the Wii U so far, using the
downstream 4.19 branch from linux-wiiu[1], but it should also work on
the Wii on mainline.

[1] https://gitlab.com/linux-wiiu/linux-wiiu

Emmanuel Gil Peyrot (4):
  nvmem: nintendo-otp: Add new driver for the Wii and Wii U OTP
  dt-bindings: nintendo-otp: Document the Wii and Wii U OTP support
  powerpc: wii.dts: Expose the OTP on this platform
  powerpc: wii_defconfig: Enable OTP by default

 .../bindings/nvmem/nintendo-otp.txt   |  14 +++
 arch/powerpc/boot/dts/wii.dts |   5 +
 arch/powerpc/configs/wii_defconfig|   1 +
 drivers/nvmem/Kconfig |  11 ++
 drivers/nvmem/Makefile|   2 +
 drivers/nvmem/nintendo-otp.c  | 115 ++
 6 files changed, 148 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/nvmem/nintendo-otp.txt
 create mode 100644 drivers/nvmem/nintendo-otp.c

-- 
2.31.1



[PATCH 1/4] nvmem: nintendo-otp: Add new driver for the Wii and Wii U OTP

2021-05-19 Thread Emmanuel Gil Peyrot
This OTP is read-only and contains various keys used by the console to
decrypt, encrypt or verify various pieces of storage.

Its size depends on the console, it is 128 bytes on the Wii and
1024 bytes on the Wii U (split into eight 128 bytes banks).

It can be used directly by writing into one register and reading from
the other one, without any additional synchronisation.

Signed-off-by: Emmanuel Gil Peyrot 
---
 drivers/nvmem/Kconfig|  11 
 drivers/nvmem/Makefile   |   2 +
 drivers/nvmem/nintendo-otp.c | 115 +++
 3 files changed, 128 insertions(+)
 create mode 100644 drivers/nvmem/nintendo-otp.c

diff --git a/drivers/nvmem/Kconfig b/drivers/nvmem/Kconfig
index dd2019006838..dd6196e49b2d 100644
--- a/drivers/nvmem/Kconfig
+++ b/drivers/nvmem/Kconfig
@@ -107,6 +107,17 @@ config MTK_EFUSE
  This driver can also be built as a module. If so, the module
  will be called efuse-mtk.
 
+config NVMEM_NINTENDO_OTP
+   tristate "Nintendo Wii and Wii U OTP Support"
+   help
+ This is a driver to expose the OTP on a Nintendo Wii or Wii U.
+
+ This memory contains common and per-console keys, signatures and
+ related data required to access peripherals.
+
+ This driver can also be built as a module. If so, the module
+ will be called nvmem-nintendo-otp.
+
 config QCOM_QFPROM
tristate "QCOM QFPROM Support"
depends on ARCH_QCOM || COMPILE_TEST
diff --git a/drivers/nvmem/Makefile b/drivers/nvmem/Makefile
index bbea1410240a..dcbbde35b6a8 100644
--- a/drivers/nvmem/Makefile
+++ b/drivers/nvmem/Makefile
@@ -23,6 +23,8 @@ obj-$(CONFIG_NVMEM_LPC18XX_OTP)   += nvmem_lpc18xx_otp.o
 nvmem_lpc18xx_otp-y:= lpc18xx_otp.o
 obj-$(CONFIG_NVMEM_MXS_OCOTP)  += nvmem-mxs-ocotp.o
 nvmem-mxs-ocotp-y  := mxs-ocotp.o
+obj-$(CONFIG_NVMEM_NINTENDO_OTP)   += nvmem-nintendo-otp.o
+nvmem-nintendo-otp-y   := nintendo-otp.o
 obj-$(CONFIG_MTK_EFUSE)+= nvmem_mtk-efuse.o
 nvmem_mtk-efuse-y  := mtk-efuse.o
 obj-$(CONFIG_QCOM_QFPROM)  += nvmem_qfprom.o
diff --git a/drivers/nvmem/nintendo-otp.c b/drivers/nvmem/nintendo-otp.c
new file mode 100644
index ..de6f5d7c10ef
--- /dev/null
+++ b/drivers/nvmem/nintendo-otp.c
@@ -0,0 +1,115 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Nintendo Wii and Wii U OTP driver
+ *
+ * Copyright (C) 2021 Emmanuel Gil Peyrot 
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#define HW_OTPCMD  0
+#define HW_OTPDATA 4
+#define OTP_READ   0x8000
+
+struct nintendo_otp_priv {
+   void __iomem *regs;
+};
+
+struct nintendo_otp_devtype_data {
+   const char *name;
+   unsigned int num_banks;
+};
+
+static const struct nintendo_otp_devtype_data hollywood_otp_data = {
+   .name = "wii-otp",
+   .num_banks = 1,
+};
+
+static const struct nintendo_otp_devtype_data latte_otp_data = {
+   .name = "wiiu-otp",
+   .num_banks = 8,
+};
+
+static int nintendo_otp_reg_read(void *context,
+unsigned int reg, void *_val, size_t bytes)
+{
+   struct nintendo_otp_priv *priv = context;
+   u32 *val = _val;
+   int words = bytes >> 2;
+   u32 bank, addr;
+
+   while (words--) {
+   bank = (reg << 1) & ~0xff;
+   addr = (reg >> 2) & 0x1f;
+   iowrite32be(OTP_READ | bank | addr, priv->regs + HW_OTPCMD);
+   *val++ = ioread32be(priv->regs + HW_OTPDATA);
+   reg += 4;
+   }
+
+   return 0;
+}
+
+static const struct of_device_id nintendo_otp_of_table[] = {
+   { .compatible = "nintendo,hollywood-otp", .data = _otp_data },
+   { .compatible = "nintendo,latte-otp", .data = _otp_data },
+   {/* sentinel */},
+};
+MODULE_DEVICE_TABLE(of, nintendo_otp_of_table);
+
+static int nintendo_otp_probe(struct platform_device *pdev)
+{
+   struct device *dev = >dev;
+   const struct of_device_id *of_id =
+   of_match_device(nintendo_otp_of_table, dev);
+   struct resource *res;
+   struct nvmem_device *nvmem;
+   struct nintendo_otp_priv *priv;
+
+   struct nvmem_config config = {
+   .stride = 4,
+   .word_size = 4,
+   .reg_read = nintendo_otp_reg_read,
+   .read_only = true,
+   .root_only = true,
+   };
+
+   priv = devm_kzalloc(dev, sizeof(*priv), GFP_KERNEL);
+   if (!priv)
+   return -ENOMEM;
+
+   res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
+   priv->regs = devm_ioremap_resource(dev, res);
+   if (IS_ERR(priv->regs))
+   return PTR_ERR(priv->regs);
+
+   if (of_id->data) {
+   const struct nintendo_otp_devtype_data *data = of_id->data;
+   config.name = data->name;
+   config.size = data->num_banks * 128;
+   }
+
+   config.dev = dev;
+   

[PATCH 2/4] dt-bindings: nintendo-otp: Document the Wii and Wii U OTP support

2021-05-19 Thread Emmanuel Gil Peyrot
Both of these consoles use the exact same two registers, even at the
same address, but the Wii U has eight banks of 128 bytes memory while
the Wii only has one, hence the two compatible strings.

Signed-off-by: Emmanuel Gil Peyrot 
---
 .../devicetree/bindings/nvmem/nintendo-otp.txt | 14 ++
 1 file changed, 14 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/nvmem/nintendo-otp.txt

diff --git a/Documentation/devicetree/bindings/nvmem/nintendo-otp.txt 
b/Documentation/devicetree/bindings/nvmem/nintendo-otp.txt
new file mode 100644
index ..b26d705ec52d
--- /dev/null
+++ b/Documentation/devicetree/bindings/nvmem/nintendo-otp.txt
@@ -0,0 +1,14 @@
+Nintendo Wii and Wii U OTP
+
+Required Properties:
+- compatible: depending on the console this should be one of:
+   - "nintendo,hollywood-otp" for the Wii
+   - "nintendo,latte-otp" for the Wii U
+- reg: base address and size of the OTP registers
+
+
+Example:
+   otp@d8001ec {
+   compatible = "nintendo,latte-otp";
+   reg = <0x0d8001ec 0x8>;
+   };
-- 
2.31.1



[PATCH 4/4] powerpc: wii_defconfig: Enable OTP by default

2021-05-19 Thread Emmanuel Gil Peyrot
This selects the nintendo-otp module when building for this platform, if
CONFIG_NVMEM is also selected.

Signed-off-by: Emmanuel Gil Peyrot 
---
 arch/powerpc/configs/wii_defconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/configs/wii_defconfig 
b/arch/powerpc/configs/wii_defconfig
index 379c171f3ddd..a0c45bf2bfb1 100644
--- a/arch/powerpc/configs/wii_defconfig
+++ b/arch/powerpc/configs/wii_defconfig
@@ -99,6 +99,7 @@ CONFIG_LEDS_TRIGGER_HEARTBEAT=y
 CONFIG_LEDS_TRIGGER_PANIC=y
 CONFIG_RTC_CLASS=y
 CONFIG_RTC_DRV_GENERIC=y
+CONFIG_NVMEM_NINTENDO_OTP=y
 CONFIG_EXT2_FS=y
 CONFIG_EXT4_FS=y
 CONFIG_FUSE_FS=m
-- 
2.31.1



[PATCH 0/4] nvmem: nintendo-otp: Add new driver for the Wii and Wii U OTP

2021-05-19 Thread Emmanuel Gil Peyrot
The OTP is a read-only memory area which contains various keys and
signatures used to decrypt, encrypt or verify various pieces of storage.

Its size depends on the console, it is 128 bytes on the Wii and
1024 bytes on the Wii U (split into eight 128 bytes banks).

It can be used directly by writing into one register and reading from
the other one, without any additional synchronisation.

This series has only been tested on the Wii U so far, using the
downstream 4.19 branch from linux-wiiu[1], but it should also work on
the Wii on mainline.

[1] https://gitlab.com/linux-wiiu/linux-wiiu

Emmanuel Gil Peyrot (4):
  nvmem: nintendo-otp: Add new driver for the Wii and Wii U OTP
  dt-bindings: nintendo-otp: Document the Wii and Wii U OTP support
  powerpc: wii.dts: Expose the OTP on this platform
  powerpc: wii_defconfig: Enable OTP by default

 .../bindings/nvmem/nintendo-otp.txt   |  14 +++
 arch/powerpc/boot/dts/wii.dts |   5 +
 arch/powerpc/configs/wii_defconfig|   1 +
 drivers/nvmem/Kconfig |  11 ++
 drivers/nvmem/Makefile|   2 +
 drivers/nvmem/nintendo-otp.c  | 115 ++
 6 files changed, 148 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/nvmem/nintendo-otp.txt
 create mode 100644 drivers/nvmem/nintendo-otp.c

-- 
2.31.1



[PATCH 3/4] powerpc: wii.dts: Expose the OTP on this platform

2021-05-19 Thread Emmanuel Gil Peyrot
This can be used by the newly-added nintendo-otp nvmem module.

Signed-off-by: Emmanuel Gil Peyrot 
---
 arch/powerpc/boot/dts/wii.dts | 5 +
 1 file changed, 5 insertions(+)

diff --git a/arch/powerpc/boot/dts/wii.dts b/arch/powerpc/boot/dts/wii.dts
index aaa381da1906..7837c4a3f09c 100644
--- a/arch/powerpc/boot/dts/wii.dts
+++ b/arch/powerpc/boot/dts/wii.dts
@@ -219,6 +219,11 @@ control@d800100 {
reg = <0x0d800100 0x300>;
};
 
+   otp@d8001ec {
+   compatible = "nintendo,hollywood-otp";
+   reg = <0x0d8001ec 0x8>;
+   };
+
disk@d806000 {
compatible = "nintendo,hollywood-di";
reg = <0x0d806000 0x40>;
-- 
2.31.1



Re: [FSL P50x0] KVM HV doesn't work anymore

2021-05-19 Thread Christian Zigotzky

On 19 May 2021 at 09:57 am, Nicholas Piggin wrote:

Excerpts from Christian Zigotzky's message of May 17, 2021 7:42 pm:

On 17 May 2021 at 09:42am, Nicholas Piggin wrote:

Excerpts from Christian Zigotzky's message of May 15, 2021 11:46 pm:
I tried it but it doesn't solve the issue. The uImage works without 
KVM

HV in a virtual e5500 QEMU machine.

Any more progress with this? I would say that bisect might have just
been a bit unstable and maybe by chance some things did not crash so
it's pointing to the wrong patch.

Upstream merge of powerpc-5.13-1 was good and powerpc-5.13-2 was bad?

Between that looks like some KVM MMU rework. You could try the patch
before this one b1c5356e873c ("KVM: PPC: Convert to the gfn-based MMU
notifier callbacks"). That won't revert cleanly so just try run the
tree at that point. If it works, test the patch and see if it fails.

Thanks,
Nick

Hi Nick,

Thanks a lot for your answer. Yes, there is a little bit of progress.
The RC2 of kernel 5.13 successfully boots with -smp 3 in a virtual e5500
QEMU machine.
-smp 4 doesn't work anymore since the PowerPC updates 5.13-2. I used
-smp 4 before 5.13 because my FSL P5040 machine has 4 cores.

Could you please post a patch for reverting the commit before
b1c5356e873c ("KVM: PPC: Convert to the gfn-based MMU notifier callbacks")?

You could `git checkout b1c5356e873c~1`

Thanks,
Nick

Hi Nick,

Thanks for your answer. I checked out the commit b1c5356e873c~1 (HEAD is 
now at d923ff258423 KVM: MIPS/MMU: Convert to the gfn-based MMU notifier 
callbacks).

The kernel boots with '-smp 4' without any problems.
After that I patched with the probable first bad commit (KVM: PPC: 
Convert to the gfn-based MMU notifier callbacks). The kernel also boots 
with this patch. That means, this isn't the first bad commit.
Further information: 
https://forum.hyperion-entertainment.com/viewtopic.php?p=53267#p53267


Cheers,
Christian


Re: Linux powerpc new system call instruction and ABI

2021-05-19 Thread Nicholas Piggin
Excerpts from Nicholas Piggin's message of May 19, 2021 6:42 pm:
> Excerpts from Joakim Tjernlund's message of May 19, 2021 6:08 pm:
>> On Wed, 2021-05-19 at 17:55 +1000, Nicholas Piggin wrote:
>>> Excerpts from Joakim Tjernlund's message of May 19, 2021 5:33 pm:
>>> > On Wed, 2021-05-19 at 02:13 +0300, Dmitry V. Levin wrote:
>>> > > Hi,
>>> > > 
>>> > > On Thu, Jun 11, 2020 at 06:12:01PM +1000, Nicholas Piggin wrote:
>>> > > [...]
>>> > > > - Error handling: The consensus among kernel, glibc, and musl is to 
>>> > > > move to
>>> > > >   using negative return values in r3 rather than CR0[SO]=1 to 
>>> > > > indicate error,
>>> > > >   which matches most other architectures, and is closer to a function 
>>> > > > call.
>>> > 
>>> > What about syscalls like times(2) which can return -1 without it being an 
>>> > error?
>>> 
>>> They do become errors / indistinguishable and have to be dealt with by 
>>> libc or userspace. Which does follow what most architectures do (all 
>>> except ia64, mips, sparc, and powerpc actually).
>>> 
>>> Interesting question though, it should have been noted.
>>> 
>>> Thanks,
>>> Nick
>> 
>> I always figured the ppc way was superior. It begs the question if not the 
>> other archs should
>> change instead?
> 
> It is superior in some ways, not enough to be worth being different.
> 
> Other archs are unlikely to change because it would be painful for
> not much benefit. New system calls just should be made to not return
> error numbers. If we ever had a big new version of syscall ABI in
> Linux, we can always use another scv vector number for it.

Or... is it possible at syscall entry to peek the address of
the instruction which caused the call and see if that was a
scv instruction? That would be about as reliable as possible
without having that new flag bit.

Thanks,
Nick


Re: [PATCH v3 5/6] sched/fair: Consider SMT in ASYM_PACKING load balance

2021-05-19 Thread Nicholas Piggin
Excerpts from Peter Zijlstra's message of May 19, 2021 7:59 pm:
> On Tue, May 18, 2021 at 12:07:40PM -0700, Ricardo Neri wrote:
>> On Fri, May 14, 2021 at 07:14:15PM -0700, Ricardo Neri wrote:
>> > On Fri, May 14, 2021 at 11:47:45AM +0200, Peter Zijlstra wrote:
> 
>> > > So I'm thinking that this is a property of having ASYM_PACKING at a core
>> > > level, rather than some arch special. Wouldn't something like this be
>> > > more appropriate?
> 
>> > Thanks Peter for the quick review! This makes sense to me. The only
>> > reason we proposed arch_asym_check_smt_siblings() is because we were
>> > about breaking powerpc (I need to study how they set priorities for SMT,
>> > if applicable). If you think this is not an issue I can post a
>> > v4 with this update.
>> 
>> As far as I can see, priorities in powerpc are set by the CPU number.
>> However, I am not sure how CPUs are enumerated? If CPUs in brackets are
>> SMT sibling, Does an enumeration looks like A) [0, 1], [2, 3] or B) [0, 2],
>> [1, 3]? I guess B is the right answer. Otherwise, both SMT siblings of a
>> core would need to be busy before a new core is used.
>> 
>> Still, I think the issue described in the cover letter may be
>> reproducible in powerpc as well. If CPU3 is offlined, and [0, 2] pulled
>> tasks from [1, -] so that both CPU0 and CPU2 become busy, CPU1 would not be
>> able to help since CPU0 has the highest priority.
>> 
>> I am cc'ing the linuxppc list to get some feedback.
> 
> IIRC the concern with Power is that their Cores can go faster if the
> higher SMT siblings are unused.
> 
> That is, suppose you have an SMT4 Core with only a single active task,
> then if only SMT0 is used it can reach max performance, but if the
> active sibling is SMT1 it can not reach max performance, and if the only
> active sibling is SMT2 it goes slower still.
> 
> So they need to pack the tasks to the lowest SMT siblings, and have the
> highest SMT siblings idle (where possible) in order to increase
> performance.

That's correct.

Thanks,
Nick


Re: Linux powerpc new system call instruction and ABI

2021-05-19 Thread Nicholas Piggin
Excerpts from Dmitry V. Levin's message of May 19, 2021 8:24 pm:
> On Wed, May 19, 2021 at 12:50:24PM +1000, Nicholas Piggin wrote:
> [...]
>> With this patch, I think the ptrace ABI should mostly be fixed. I think 
>> a problem remains with applications that look at system call return 
>> registers directly and have powerpc specific error cases. Those probably
>> will just need to be updated unfortunately. Michael thought it might be
>> possible to return an indication via ptrace somehow that the syscall is
>> using a new ABI, so such apps can be updated to test for it. I don't 
>> know how that would be done.
> 
> Is there any sane way for these applications to handle the scv case?
> How can they tell that the scv semantics is being used for the given
> syscall invocation?  Can this information be obtained e.g. from struct
> pt_regs?

Not that I know of. Michael suggested there might be a way to add 
something. ptrace_syscall_info has some pad bytes, could
we use one for flags bits and set a bit for "new system call ABI"?

As a more hacky thing you could make a syscall with -1 and see how
the error looks, and then assume all syscalls will be the same.

Thanks,
Nick

> 
> For example, in strace we have the following powerpc-specific code used
> for syscall tampering:
> 
> $ cat src/linux/powerpc/set_error.c
> /*
>  * Copyright (c) 2016-2021 The strace developers.
>  * All rights reserved.
>  *
>  * SPDX-License-Identifier: LGPL-2.1-or-later
>  */
> 
> static int
> arch_set_r3_ccr(struct tcb *tcp, const unsigned long r3,
>   const unsigned long ccr_set, const unsigned long ccr_clear)
> {
>   if (ptrace_syscall_info_is_valid() &&
>   upeek(tcp, sizeof(long) * PT_CCR, _regs.ccr))
> return -1;
>   const unsigned long old_ccr = ppc_regs.ccr;
>   ppc_regs.gpr[3] = r3;
>   ppc_regs.ccr |= ccr_set;
>   ppc_regs.ccr &= ~ccr_clear;
>   if (ppc_regs.ccr != old_ccr &&
>   upoke(tcp, sizeof(long) * PT_CCR, ppc_regs.ccr))
>   return -1;
>   return upoke(tcp, sizeof(long) * (PT_R0 + 3), ppc_regs.gpr[3]);
> }
> 
> static int
> arch_set_error(struct tcb *tcp)
> {
>   return arch_set_r3_ccr(tcp, tcp->u_error, 0x1000, 0);
> }
> 
> static int
> arch_set_success(struct tcb *tcp)
> {
>   return arch_set_r3_ccr(tcp, tcp->u_rval, 0, 0x1000);
> }
> 
> 
> -- 
> ldv
> 


[PATCH 5/5] powerpc/kprobes: Warn if instruction patching failed

2021-05-19 Thread Naveen N. Rao
When arming and disarming probes, we currently assume that instruction
patching can never fail, and don't have a mechanism to surface errors.
Add a warning in case instruction patching ever fails.

Signed-off-by: Naveen N. Rao 
---
 arch/powerpc/kernel/kprobes.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/kprobes.c b/arch/powerpc/kernel/kprobes.c
index 7195162362941f..e0190e75a221eb 100644
--- a/arch/powerpc/kernel/kprobes.c
+++ b/arch/powerpc/kernel/kprobes.c
@@ -158,13 +158,13 @@ NOKPROBE_SYMBOL(arch_prepare_kprobe);
 
 void arch_arm_kprobe(struct kprobe *p)
 {
-   patch_instruction((struct ppc_inst *)p->addr, 
ppc_inst(BREAKPOINT_INSTRUCTION));
+   WARN_ON_ONCE(patch_instruction((struct ppc_inst *)p->addr, 
ppc_inst(BREAKPOINT_INSTRUCTION)));
 }
 NOKPROBE_SYMBOL(arch_arm_kprobe);
 
 void arch_disarm_kprobe(struct kprobe *p)
 {
-   patch_instruction((struct ppc_inst *)p->addr, ppc_inst(p->opcode));
+   WARN_ON_ONCE(patch_instruction((struct ppc_inst *)p->addr, 
ppc_inst(p->opcode)));
 }
 NOKPROBE_SYMBOL(arch_disarm_kprobe);
 
-- 
2.30.2



[PATCH 4/5] powerpc/kprobes: Refactor arch_prepare_kprobe()

2021-05-19 Thread Naveen N. Rao
Clean up the function to look sane:
- return immediately on error, rather than pointlessly setting the
  return value
- pr_info() instead of printk()
- check return value of patch_instruction()
- and to top it all of: a reverse christmas tree!

Signed-off-by: Naveen N. Rao 
---
 arch/powerpc/kernel/kprobes.c | 64 +--
 1 file changed, 31 insertions(+), 33 deletions(-)

diff --git a/arch/powerpc/kernel/kprobes.c b/arch/powerpc/kernel/kprobes.c
index bbef9e918ecb39..7195162362941f 100644
--- a/arch/powerpc/kernel/kprobes.c
+++ b/arch/powerpc/kernel/kprobes.c
@@ -105,56 +105,54 @@ kprobe_opcode_t *kprobe_lookup_name(const char *name, 
unsigned int offset)
 
 int arch_prepare_kprobe(struct kprobe *p)
 {
-   int ret = 0;
+   struct ppc_inst insn = ppc_inst_read((struct ppc_inst *)p->addr);
+   struct instruction_op op;
struct kprobe *prev;
struct pt_regs regs;
-   struct instruction_op op;
-   struct ppc_inst insn = ppc_inst_read((struct ppc_inst *)p->addr);
+   int ret = 0;
 
if ((unsigned long)p->addr & 0x03) {
-   printk("Attempt to register kprobe at an unaligned address\n");
-   ret = -EINVAL;
+   pr_info("Attempt to register kprobe at an unaligned address\n");
+   return -EINVAL;
} else if (IS_MTMSRD(insn) || IS_RFID(insn)) {
-   printk("Cannot register a kprobe on mtmsr[d]/rfi[d]\n");
-   ret = -EINVAL;
+   pr_info("Cannot register a kprobe on mtmsr[d]/rfi[d]\n");
+   return -EINVAL;
} else if ((unsigned long)p->addr & ~PAGE_MASK &&
ppc_inst_prefixed(ppc_inst_read((struct ppc_inst 
*)(p->addr - 1 {
-   printk("Cannot register a kprobe on the second word of prefixed 
instruction\n");
-   ret = -EINVAL;
+   pr_info("Cannot register a kprobe on the second word of 
prefixed instruction\n");
+   return -EINVAL;
}
+
+   /* Check if the previous instruction is a prefix instruction with an 
active kprobe */
preempt_disable();
prev = get_kprobe(p->addr - 1);
preempt_enable_no_resched();
-   if (prev &&
-   ppc_inst_prefixed(ppc_inst_read((struct ppc_inst 
*)prev->ainsn.insn))) {
-   printk("Cannot register a kprobe on the second word of prefixed 
instruction\n");
-   ret = -EINVAL;
+   if (prev && ppc_inst_prefixed(ppc_inst_read((struct ppc_inst 
*)prev->ainsn.insn))) {
+   pr_info("Cannot register a kprobe on the second word of 
prefixed instruction\n");
+   return -EINVAL;
}
 
/* insn must be on a special executable page on ppc64.  This is
 * not explicitly required on ppc32 (right now), but it doesn't hurt */
-   if (!ret) {
-   p->ainsn.insn = get_insn_slot();
-   if (!p->ainsn.insn)
-   ret = -ENOMEM;
-   }
+   p->ainsn.insn = get_insn_slot();
+   if (!p->ainsn.insn)
+   return -ENOMEM;
 
-   if (!ret) {
-   patch_instruction((struct ppc_inst *)p->ainsn.insn, insn);
-   p->opcode = ppc_inst_val(insn);
-
-   /* Check if this is an instruction we recognise */
-   p->ainsn.boostable = 0;
-   memset(, 0, sizeof(struct pt_regs));
-   regs.nip = (unsigned long)p->addr;
-   regs.msr = MSR_KERNEL;
-   ret = analyse_instr(, , insn);
-   if (ret == 1 || (ret == 0 && GETTYPE(op.type) != UNKNOWN))
-   p->ainsn.boostable = 1;
-   ret = 0;
-   }
+   if (patch_instruction((struct ppc_inst *)p->ainsn.insn, insn))
+   return -EFAULT;
 
-   return ret;
+   p->opcode = ppc_inst_val(insn);
+
+   /* Check if this is an instruction we recognise */
+   p->ainsn.boostable = 0;
+   memset(, 0, sizeof(struct pt_regs));
+   regs.nip = (unsigned long)p->addr;
+   regs.msr = MSR_KERNEL;
+   ret = analyse_instr(, , insn);
+   if (ret == 1 || (ret == 0 && GETTYPE(op.type) != UNKNOWN))
+   p->ainsn.boostable = 1;
+
+   return 0;
 }
 NOKPROBE_SYMBOL(arch_prepare_kprobe);
 
-- 
2.30.2



[PATCH 3/5] powerpc/kprobes: Check instruction validity during kprobe registration

2021-05-19 Thread Naveen N. Rao
In trap-based (classic) kprobes, we try to emulate the probed
instruction so as to avoid having to single step it. We use a flag to
determine if the probed instruction was successfully emulated, so that we
can speed up subsequent probe hits.

However, emulate_step() doesn't differentiate between unknown
instructions and an emulation attempt that failed. As such, the current
heuristic is not of much use. Instead, use analyse_instr() during kprobe
registration to determine if the probed instruction can be decoded by
our instruction emulation infrastructure. For unknown instructions, we
can then directly single-step while for other instructions, we can
attempt to emulate and fall back to single stepping if that fails.

Signed-off-by: Naveen N. Rao 
---
 arch/powerpc/kernel/kprobes.c | 62 +--
 1 file changed, 16 insertions(+), 46 deletions(-)

diff --git a/arch/powerpc/kernel/kprobes.c b/arch/powerpc/kernel/kprobes.c
index b7b20875d34d91..bbef9e918ecb39 100644
--- a/arch/powerpc/kernel/kprobes.c
+++ b/arch/powerpc/kernel/kprobes.c
@@ -107,6 +107,8 @@ int arch_prepare_kprobe(struct kprobe *p)
 {
int ret = 0;
struct kprobe *prev;
+   struct pt_regs regs;
+   struct instruction_op op;
struct ppc_inst insn = ppc_inst_read((struct ppc_inst *)p->addr);
 
if ((unsigned long)p->addr & 0x03) {
@@ -140,9 +142,18 @@ int arch_prepare_kprobe(struct kprobe *p)
if (!ret) {
patch_instruction((struct ppc_inst *)p->ainsn.insn, insn);
p->opcode = ppc_inst_val(insn);
+
+   /* Check if this is an instruction we recognise */
+   p->ainsn.boostable = 0;
+   memset(, 0, sizeof(struct pt_regs));
+   regs.nip = (unsigned long)p->addr;
+   regs.msr = MSR_KERNEL;
+   ret = analyse_instr(, , insn);
+   if (ret == 1 || (ret == 0 && GETTYPE(op.type) != UNKNOWN))
+   p->ainsn.boostable = 1;
+   ret = 0;
}
 
-   p->ainsn.boostable = 0;
return ret;
 }
 NOKPROBE_SYMBOL(arch_prepare_kprobe);
@@ -225,47 +236,6 @@ void arch_prepare_kretprobe(struct kretprobe_instance *ri, 
struct pt_regs *regs)
 }
 NOKPROBE_SYMBOL(arch_prepare_kretprobe);
 
-static int try_to_emulate(struct kprobe *p, struct pt_regs *regs)
-{
-   int ret;
-   struct ppc_inst insn = ppc_inst_read((struct ppc_inst *)p->ainsn.insn);
-
-   /* regs->nip is also adjusted if emulate_step returns 1 */
-   ret = emulate_step(regs, insn);
-   if (ret > 0) {
-   /*
-* Once this instruction has been boosted
-* successfully, set the boostable flag
-*/
-   if (unlikely(p->ainsn.boostable == 0))
-   p->ainsn.boostable = 1;
-   } else if (ret < 0) {
-   /*
-* We don't allow kprobes on mtmsr(d)/rfi(d), etc.
-* So, we should never get here... but, its still
-* good to catch them, just in case...
-*/
-   printk("Can't step on instruction %s\n", ppc_inst_as_str(insn));
-   BUG();
-   } else {
-   /*
-* If we haven't previously emulated this instruction, then it
-* can't be boosted. Note it down so we don't try to do so 
again.
-*
-* If, however, we had emulated this instruction in the past,
-* then this is just an error with the current run (for
-* instance, exceptions due to a load/store). We return 0 so
-* that this is now single-stepped, but continue to try
-* emulating it in subsequent probe hits.
-*/
-   if (unlikely(p->ainsn.boostable != 1))
-   p->ainsn.boostable = -1;
-   }
-
-   return ret;
-}
-NOKPROBE_SYMBOL(try_to_emulate);
-
 int kprobe_handler(struct pt_regs *regs)
 {
struct kprobe *p;
@@ -334,8 +304,8 @@ int kprobe_handler(struct pt_regs *regs)
set_current_kprobe(p, regs, kcb);
kprobes_inc_nmissed_count(p);
kcb->kprobe_status = KPROBE_REENTER;
-   if (p->ainsn.boostable >= 0) {
-   ret = try_to_emulate(p, regs);
+   if (p->ainsn.boostable) {
+   ret = emulate_step(regs, ppc_inst_read((struct ppc_inst 
*)p->ainsn.insn));
 
if (ret > 0) {
restore_previous_kprobe(kcb);
@@ -356,8 +326,8 @@ int kprobe_handler(struct pt_regs *regs)
return 1;
}
 
-   if (p->ainsn.boostable >= 0) {
-   ret = try_to_emulate(p, regs);
+   if (p->ainsn.boostable) {
+   ret = emulate_step(regs, ppc_inst_read((struct ppc_inst 
*)p->ainsn.insn));
 
if (ret > 0) {
if (p->post_handler)
-- 
2.30.2



[PATCH 2/5] powerpc/kprobes: Roll IS_RFI() macro into IS_RFID()

2021-05-19 Thread Naveen N. Rao
In kprobes and xmon, we should exclude both 32-bit and 64-bit variants
of mtmsr and rfi instructions from being stepped. Have IS_RFID() also
detect a rfi instruction similar to IS_MTMSRD().

Signed-off-by: Naveen N. Rao 
---
 arch/powerpc/include/asm/sstep.h | 7 +++
 arch/powerpc/kernel/kprobes.c| 4 ++--
 2 files changed, 5 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/include/asm/sstep.h b/arch/powerpc/include/asm/sstep.h
index 972ed0df154d60..1df867c2e054e5 100644
--- a/arch/powerpc/include/asm/sstep.h
+++ b/arch/powerpc/include/asm/sstep.h
@@ -13,12 +13,11 @@ struct pt_regs;
  * we don't allow putting a breakpoint on an mtmsrd instruction.
  * Similarly we don't allow breakpoints on rfid instructions.
  * These macros tell us if an instruction is a mtmsrd or rfid.
- * Note that IS_MTMSRD returns true for both an mtmsr (32-bit)
- * and an mtmsrd (64-bit).
+ * Note that these return true for both mtmsr/rfi (32-bit)
+ * and mtmsrd/rfid (64-bit).
  */
 #define IS_MTMSRD(instr)   ((ppc_inst_val(instr) & 0xfc0007be) == 
0x7c000124)
-#define IS_RFID(instr) ((ppc_inst_val(instr) & 0xfc0007fe) == 
0x4c24)
-#define IS_RFI(instr)  ((ppc_inst_val(instr) & 0xfc0007fe) == 
0x4c64)
+#define IS_RFID(instr) ((ppc_inst_val(instr) & 0xfc0007be) == 
0x4c24)
 
 enum instruction_type {
COMPUTE,/* arith/logical/CR op, etc. */
diff --git a/arch/powerpc/kernel/kprobes.c b/arch/powerpc/kernel/kprobes.c
index f611d9eb3562d7..b7b20875d34d91 100644
--- a/arch/powerpc/kernel/kprobes.c
+++ b/arch/powerpc/kernel/kprobes.c
@@ -112,8 +112,8 @@ int arch_prepare_kprobe(struct kprobe *p)
if ((unsigned long)p->addr & 0x03) {
printk("Attempt to register kprobe at an unaligned address\n");
ret = -EINVAL;
-   } else if (IS_MTMSRD(insn) || IS_RFID(insn) || IS_RFI(insn)) {
-   printk("Cannot register a kprobe on rfi/rfid or mtmsr[d]\n");
+   } else if (IS_MTMSRD(insn) || IS_RFID(insn)) {
+   printk("Cannot register a kprobe on mtmsr[d]/rfi[d]\n");
ret = -EINVAL;
} else if ((unsigned long)p->addr & ~PAGE_MASK &&
ppc_inst_prefixed(ppc_inst_read((struct ppc_inst 
*)(p->addr - 1 {
-- 
2.30.2



[PATCH 1/5] powerpc/kprobes: Fix validation of prefixed instructions across page boundary

2021-05-19 Thread Naveen N. Rao
When checking if the probed instruction is the suffix of a prefixed
instruction, we access the instruction at the previous word. If the
probed instruction is the very first word of a module, we can end up
trying to access an invalid page. Fix this by skipping the check for all
instructions at the beginning of a page. Prefixed instructions cannot
cross a 64-byte boundary and as such, preventing probing on such
instructions is not worthwhile.

Cc: sta...@vger.kernel.org # v5.8+
Fixes: b4657f7650babc ("powerpc/kprobes: Don't allow breakpoints on suffixes")
Reported-by: Christophe Leroy 
Signed-off-by: Naveen N. Rao 
---
 arch/powerpc/kernel/kprobes.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/kprobes.c b/arch/powerpc/kernel/kprobes.c
index 01ab2163659e4b..f611d9eb3562d7 100644
--- a/arch/powerpc/kernel/kprobes.c
+++ b/arch/powerpc/kernel/kprobes.c
@@ -108,7 +108,6 @@ int arch_prepare_kprobe(struct kprobe *p)
int ret = 0;
struct kprobe *prev;
struct ppc_inst insn = ppc_inst_read((struct ppc_inst *)p->addr);
-   struct ppc_inst prefix = ppc_inst_read((struct ppc_inst *)(p->addr - 
1));
 
if ((unsigned long)p->addr & 0x03) {
printk("Attempt to register kprobe at an unaligned address\n");
@@ -116,7 +115,8 @@ int arch_prepare_kprobe(struct kprobe *p)
} else if (IS_MTMSRD(insn) || IS_RFID(insn) || IS_RFI(insn)) {
printk("Cannot register a kprobe on rfi/rfid or mtmsr[d]\n");
ret = -EINVAL;
-   } else if (ppc_inst_prefixed(prefix)) {
+   } else if ((unsigned long)p->addr & ~PAGE_MASK &&
+   ppc_inst_prefixed(ppc_inst_read((struct ppc_inst 
*)(p->addr - 1 {
printk("Cannot register a kprobe on the second word of prefixed 
instruction\n");
ret = -EINVAL;
}
-- 
2.30.2



[PATCH 0/5] powerpc/kprobes: fixes and cleanups

2021-05-19 Thread Naveen N. Rao
Various fixes and some code refactoring for kprobes on powerpc. The 
first patch fixes an invalid access if probing the first instruction in 
a kernel module. The rest are small cleanups. More details in the 
individual patches.

- Naveen

Naveen N. Rao (5):
  powerpc/kprobes: Fix validation of prefixed instructions across page
boundary
  powerpc/kprobes: Roll IS_RFI() macro into IS_RFID()
  powerpc/kprobes: Check instruction validity during kprobe registration
  powerpc/kprobes: Refactor arch_prepare_kprobe()
  powerpc/kprobes: Warn if instruction patching failed

 arch/powerpc/include/asm/sstep.h |   7 +-
 arch/powerpc/kernel/kprobes.c| 112 +++
 2 files changed, 43 insertions(+), 76 deletions(-)


base-commit: 3a81c0495fdb91fd9a9b4f617098c283131eeae1
-- 
2.30.2



Re: Linux powerpc new system call instruction and ABI

2021-05-19 Thread Dmitry V. Levin
On Wed, May 19, 2021 at 12:50:24PM +1000, Nicholas Piggin wrote:
[...]
> With this patch, I think the ptrace ABI should mostly be fixed. I think 
> a problem remains with applications that look at system call return 
> registers directly and have powerpc specific error cases. Those probably
> will just need to be updated unfortunately. Michael thought it might be
> possible to return an indication via ptrace somehow that the syscall is
> using a new ABI, so such apps can be updated to test for it. I don't 
> know how that would be done.

Is there any sane way for these applications to handle the scv case?
How can they tell that the scv semantics is being used for the given
syscall invocation?  Can this information be obtained e.g. from struct
pt_regs?

For example, in strace we have the following powerpc-specific code used
for syscall tampering:

$ cat src/linux/powerpc/set_error.c
/*
 * Copyright (c) 2016-2021 The strace developers.
 * All rights reserved.
 *
 * SPDX-License-Identifier: LGPL-2.1-or-later
 */

static int
arch_set_r3_ccr(struct tcb *tcp, const unsigned long r3,
const unsigned long ccr_set, const unsigned long ccr_clear)
{
if (ptrace_syscall_info_is_valid() &&
upeek(tcp, sizeof(long) * PT_CCR, _regs.ccr))
return -1;
const unsigned long old_ccr = ppc_regs.ccr;
ppc_regs.gpr[3] = r3;
ppc_regs.ccr |= ccr_set;
ppc_regs.ccr &= ~ccr_clear;
if (ppc_regs.ccr != old_ccr &&
upoke(tcp, sizeof(long) * PT_CCR, ppc_regs.ccr))
return -1;
return upoke(tcp, sizeof(long) * (PT_R0 + 3), ppc_regs.gpr[3]);
}

static int
arch_set_error(struct tcb *tcp)
{
return arch_set_r3_ccr(tcp, tcp->u_error, 0x1000, 0);
}

static int
arch_set_success(struct tcb *tcp)
{
return arch_set_r3_ccr(tcp, tcp->u_rval, 0, 0x1000);
}


-- 
ldv


Re: [PATCH v3 5/6] sched/fair: Consider SMT in ASYM_PACKING load balance

2021-05-19 Thread Peter Zijlstra
On Tue, May 18, 2021 at 12:07:40PM -0700, Ricardo Neri wrote:
> On Fri, May 14, 2021 at 07:14:15PM -0700, Ricardo Neri wrote:
> > On Fri, May 14, 2021 at 11:47:45AM +0200, Peter Zijlstra wrote:

> > > So I'm thinking that this is a property of having ASYM_PACKING at a core
> > > level, rather than some arch special. Wouldn't something like this be
> > > more appropriate?

> > Thanks Peter for the quick review! This makes sense to me. The only
> > reason we proposed arch_asym_check_smt_siblings() is because we were
> > about breaking powerpc (I need to study how they set priorities for SMT,
> > if applicable). If you think this is not an issue I can post a
> > v4 with this update.
> 
> As far as I can see, priorities in powerpc are set by the CPU number.
> However, I am not sure how CPUs are enumerated? If CPUs in brackets are
> SMT sibling, Does an enumeration looks like A) [0, 1], [2, 3] or B) [0, 2],
> [1, 3]? I guess B is the right answer. Otherwise, both SMT siblings of a
> core would need to be busy before a new core is used.
> 
> Still, I think the issue described in the cover letter may be
> reproducible in powerpc as well. If CPU3 is offlined, and [0, 2] pulled
> tasks from [1, -] so that both CPU0 and CPU2 become busy, CPU1 would not be
> able to help since CPU0 has the highest priority.
> 
> I am cc'ing the linuxppc list to get some feedback.

IIRC the concern with Power is that their Cores can go faster if the
higher SMT siblings are unused.

That is, suppose you have an SMT4 Core with only a single active task,
then if only SMT0 is used it can reach max performance, but if the
active sibling is SMT1 it can not reach max performance, and if the only
active sibling is SMT2 it goes slower still.

So they need to pack the tasks to the lowest SMT siblings, and have the
highest SMT siblings idle (where possible) in order to increase
performance.




Re: Linux powerpc new system call instruction and ABI

2021-05-19 Thread Nicholas Piggin
Excerpts from Joakim Tjernlund's message of May 19, 2021 6:08 pm:
> On Wed, 2021-05-19 at 17:55 +1000, Nicholas Piggin wrote:
>> Excerpts from Joakim Tjernlund's message of May 19, 2021 5:33 pm:
>> > On Wed, 2021-05-19 at 02:13 +0300, Dmitry V. Levin wrote:
>> > > Hi,
>> > > 
>> > > On Thu, Jun 11, 2020 at 06:12:01PM +1000, Nicholas Piggin wrote:
>> > > [...]
>> > > > - Error handling: The consensus among kernel, glibc, and musl is to 
>> > > > move to
>> > > >   using negative return values in r3 rather than CR0[SO]=1 to indicate 
>> > > > error,
>> > > >   which matches most other architectures, and is closer to a function 
>> > > > call.
>> > 
>> > What about syscalls like times(2) which can return -1 without it being an 
>> > error?
>> 
>> They do become errors / indistinguishable and have to be dealt with by 
>> libc or userspace. Which does follow what most architectures do (all 
>> except ia64, mips, sparc, and powerpc actually).
>> 
>> Interesting question though, it should have been noted.
>> 
>> Thanks,
>> Nick
> 
> I always figured the ppc way was superior. It begs the question if not the 
> other archs should
> change instead?

It is superior in some ways, not enough to be worth being different.

Other archs are unlikely to change because it would be painful for
not much benefit. New system calls just should be made to not return
error numbers. If we ever had a big new version of syscall ABI in
Linux, we can always use another scv vector number for it.

Thanks,
Nick


Re: [PATCH v8 27/30] powerpc/kprobes: Don't allow breakpoints on suffixes

2021-05-19 Thread Naveen N. Rao

Christophe Leroy wrote:



Le 06/05/2020 à 05:40, Jordan Niethe a écrit :

Do not allow inserting breakpoints on the suffix of a prefix instruction
in kprobes.

Signed-off-by: Jordan Niethe 
---
v8: Add this back from v3
---
  arch/powerpc/kernel/kprobes.c | 13 +
  1 file changed, 13 insertions(+)

diff --git a/arch/powerpc/kernel/kprobes.c b/arch/powerpc/kernel/kprobes.c
index 33d54b091c70..227510df8c55 100644
--- a/arch/powerpc/kernel/kprobes.c
+++ b/arch/powerpc/kernel/kprobes.c
@@ -106,7 +106,9 @@ kprobe_opcode_t *kprobe_lookup_name(const char *name, 
unsigned int offset)
  int arch_prepare_kprobe(struct kprobe *p)
  {
int ret = 0;
+   struct kprobe *prev;
struct ppc_inst insn = ppc_inst_read((struct ppc_inst *)p->addr);
+   struct ppc_inst prefix = ppc_inst_read((struct ppc_inst *)(p->addr - 
1));


What if p->addr is the first word of a page and the previous page is 
not mapped ?


Good catch!
I think we can just skip validation if the instruction is at the 
beginning of a page. I have a few cleanups in this area - I will post a 
patchset soon.




  
  	if ((unsigned long)p->addr & 0x03) {

printk("Attempt to register kprobe at an unaligned address\n");
@@ -114,6 +116,17 @@ int arch_prepare_kprobe(struct kprobe *p)
} else if (IS_MTMSRD(insn) || IS_RFID(insn) || IS_RFI(insn)) {
printk("Cannot register a kprobe on rfi/rfid or mtmsr[d]\n");
ret = -EINVAL;
+   } else if (ppc_inst_prefixed(prefix)) {


If p->addr - 2 contains a valid prefixed instruction, then p->addr - 1 contains the suffix of that 
prefixed instruction. Are we sure a suffix can never ever be misinterpreted as the prefix of a 
prefixed instruction ?


Yes. Per the ISA:
 Bits 0:5 of all prefixes are assigned the primary opcode
 value 0b01. 0b01 is not available for use as a
 primary opcode for either word instructions or suffixes
 of prefixed instructions.


- Naveen



Re: Linux powerpc new system call instruction and ABI

2021-05-19 Thread Joakim Tjernlund
On Wed, 2021-05-19 at 17:55 +1000, Nicholas Piggin wrote:
> Excerpts from Joakim Tjernlund's message of May 19, 2021 5:33 pm:
> > On Wed, 2021-05-19 at 02:13 +0300, Dmitry V. Levin wrote:
> > > Hi,
> > > 
> > > On Thu, Jun 11, 2020 at 06:12:01PM +1000, Nicholas Piggin wrote:
> > > [...]
> > > > - Error handling: The consensus among kernel, glibc, and musl is to 
> > > > move to
> > > >   using negative return values in r3 rather than CR0[SO]=1 to indicate 
> > > > error,
> > > >   which matches most other architectures, and is closer to a function 
> > > > call.
> > 
> > What about syscalls like times(2) which can return -1 without it being an 
> > error?
> 
> They do become errors / indistinguishable and have to be dealt with by 
> libc or userspace. Which does follow what most architectures do (all 
> except ia64, mips, sparc, and powerpc actually).
> 
> Interesting question though, it should have been noted.
> 
> Thanks,
> Nick

I always figured the ppc way was superior. It begs the question if not the 
other archs should
change instead?

 Jocke


Re: [FSL P50x0] KVM HV doesn't work anymore

2021-05-19 Thread Nicholas Piggin
Excerpts from Christian Zigotzky's message of May 17, 2021 7:42 pm:
> On 17 May 2021 at 09:42am, Nicholas Piggin wrote:
>> Excerpts from Christian Zigotzky's message of May 15, 2021 11:46 pm:
>>> On 15 May 2021 at 12:08pm Christophe Leroy wrote:

 Le 15/05/2021 à 11:48, Christian Zigotzky a écrit :
> Hi All,
>
> I bisected today [1] and the bisecting itself was OK but the
> reverting of the bad commit doesn't solve the issue. Do you have an
> idea which commit could be resposible for this issue? Maybe the
> bisecting wasn't successful. I will look in the kernel git log. Maybe
> there is a commit that affected KVM HV on FSL P50x0 machines.
 If the uImage doesn't load, it may be because of the size of uImage.

 See https://github.com/linuxppc/issues/issues/208

 Is there a significant size difference with and without KVM HV ?

 Maybe you can try to remove another option to reduce the size of the
 uImage.
>>> I tried it but it doesn't solve the issue. The uImage works without KVM
>>> HV in a virtual e5500 QEMU machine.
>> Any more progress with this? I would say that bisect might have just
>> been a bit unstable and maybe by chance some things did not crash so
>> it's pointing to the wrong patch.
>>
>> Upstream merge of powerpc-5.13-1 was good and powerpc-5.13-2 was bad?
>>
>> Between that looks like some KVM MMU rework. You could try the patch
>> before this one b1c5356e873c ("KVM: PPC: Convert to the gfn-based MMU
>> notifier callbacks"). That won't revert cleanly so just try run the
>> tree at that point. If it works, test the patch and see if it fails.
>>
>> Thanks,
>> Nick
> Hi Nick,
> 
> Thanks a lot for your answer. Yes, there is a little bit of progress. 
> The RC2 of kernel 5.13 successfully boots with -smp 3 in a virtual e5500 
> QEMU machine.
> -smp 4 doesn't work anymore since the PowerPC updates 5.13-2. I used 
> -smp 4 before 5.13 because my FSL P5040 machine has 4 cores.
> 
> Could you please post a patch for reverting the commit before 
> b1c5356e873c ("KVM: PPC: Convert to the gfn-based MMU notifier callbacks")?

You could `git checkout b1c5356e873c~1`

Thanks,
Nick
> 


Re: Linux powerpc new system call instruction and ABI

2021-05-19 Thread Nicholas Piggin
Excerpts from Joakim Tjernlund's message of May 19, 2021 5:33 pm:
> On Wed, 2021-05-19 at 02:13 +0300, Dmitry V. Levin wrote:
>> Hi,
>> 
>> On Thu, Jun 11, 2020 at 06:12:01PM +1000, Nicholas Piggin wrote:
>> [...]
>> > - Error handling: The consensus among kernel, glibc, and musl is to move to
>> >   using negative return values in r3 rather than CR0[SO]=1 to indicate 
>> > error,
>> >   which matches most other architectures, and is closer to a function call.
> 
> What about syscalls like times(2) which can return -1 without it being an 
> error?

They do become errors / indistinguishable and have to be dealt with by 
libc or userspace. Which does follow what most architectures do (all 
except ia64, mips, sparc, and powerpc actually).

Interesting question though, it should have been noted.

Thanks,
Nick


Re: Linux powerpc new system call instruction and ABI

2021-05-19 Thread Joakim Tjernlund
On Wed, 2021-05-19 at 02:13 +0300, Dmitry V. Levin wrote:
> Hi,
> 
> On Thu, Jun 11, 2020 at 06:12:01PM +1000, Nicholas Piggin wrote:
> [...]
> > - Error handling: The consensus among kernel, glibc, and musl is to move to
> >   using negative return values in r3 rather than CR0[SO]=1 to indicate 
> > error,
> >   which matches most other architectures, and is closer to a function call.

What about syscalls like times(2) which can return -1 without it being an error?

 Jocke


Re: [PATCH v3] powerpc/papr_scm: Reduce error severity if nvdimm stats inaccessible

2021-05-19 Thread kajoljain



On 5/8/21 10:06 AM, Vaibhav Jain wrote:
> Currently drc_pmem_qeury_stats() generates a dev_err in case
> "Enable Performance Information Collection" feature is disabled from
> HMC or performance stats are not available for an nvdimm. The error is
> of the form below:
> 
> papr_scm ibm,persistent-memory:ibm,pmemory@44104001: Failed to query
>performance stats, Err:-10
> 
> This error message confuses users as it implies a possible problem
> with the nvdimm even though its due to a disabled/unavailable
> feature. We fix this by explicitly handling the H_AUTHORITY and
> H_UNSUPPORTED errors from the H_SCM_PERFORMANCE_STATS hcall.
> 
> In case of H_AUTHORITY error an info message is logged instead of an
> error, saying that "Permission denied while accessing performance
> stats" and an EPERM error is returned back.
> 
> In case of H_UNSUPPORTED error we return a EOPNOTSUPP error back from
> drc_pmem_query_stats() indicating that performance stats-query
> operation is not supported on this nvdimm.
> 
> Fixes: 2d02bf835e57('powerpc/papr_scm: Fetch nvdimm performance stats from 
> PHYP')
> Signed-off-by: Vaibhav Jain 
> ---
> Changelog
> 
> v3:
> * Return EOPNOTSUPP error in case of H_UNSUPPORTED [ Ira ]
> * Return EPERM in case of H_AUTHORITY [ Ira ]
> * Updated patch description
> 

Patch looks good to me.

Reviewed-By: Kajol Jain

Thanks,
Kajol Jain

> v2:
> * Updated the message logged in case of H_AUTHORITY error [ Ira ]
> * Switched from dev_warn to dev_info in case of H_AUTHORITY error.
> * Instead of -EPERM return -EACCESS for H_AUTHORITY error.
> * Added explicit handling of H_UNSUPPORTED error.
> ---
>  arch/powerpc/platforms/pseries/papr_scm.c | 7 +++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/arch/powerpc/platforms/pseries/papr_scm.c 
> b/arch/powerpc/platforms/pseries/papr_scm.c
> index ef26fe40efb0..e2b69cc3beaf 100644
> --- a/arch/powerpc/platforms/pseries/papr_scm.c
> +++ b/arch/powerpc/platforms/pseries/papr_scm.c
> @@ -310,6 +310,13 @@ static ssize_t drc_pmem_query_stats(struct papr_scm_priv 
> *p,
>   dev_err(>pdev->dev,
>   "Unknown performance stats, Err:0x%016lX\n", ret[0]);
>   return -ENOENT;
> + } else if (rc == H_AUTHORITY) {
> + dev_info(>pdev->dev,
> +  "Permission denied while accessing performance stats");
> + return -EPERM;
> + } else if (rc == H_UNSUPPORTED) {
> + dev_dbg(>pdev->dev, "Performance stats unsupported\n");
> + return -EOPNOTSUPP;
>   } else if (rc != H_SUCCESS) {
>   dev_err(>pdev->dev,
>   "Failed to query performance stats, Err:%lld\n", rc);
>