Re: [PATCH v3 31/41] powerpc/32: Dismantle EXC_XFER_STD/LITE/TEMPLATE

2021-08-02 Thread Finn Thain
On Mon, 2 Aug 2021, LEROY Christophe wrote:

> Le 01/08/2021 à 03:21, Finn Thain a écrit :
> > On Sat, 31 Jul 2021, Christophe Leroy wrote:
> > 
> > > > 
> > > > Stan Johnson contacted me about a regression in mainline that he
> > > > observed on his G3 Powerbooks. Using 'git bisect' we determined that
> > > > this patch was the cause of the regression, i.e. commit 4c0104a83fc3
> > > > ("powerpc/32: Dismantle EXC_XFER_STD/LITE/TEMPLATE").
> > > > 
> > > > When testing 4c0104a83fc and all subsequent builds, various user
> > > > processes were liable to segfault. Here is the console log that Stan
> > > > provided:
> > > 
> > > Hi, i will be able to look at that more in details next week, however I
> > > have a few preliminary qurstions.
> > > 
> > > Can you reliabily reproduce the problem with the said commit, and can
> > > you reliabily run without problem with the parent commit ?
> > 
> > Yes and yes. (I already asked Stan to establish those things before I
> > contacted the list.)
> 
> I think I found the problem with that commit. Can you retry with the following
> change:
> 
> diff --git a/arch/powerpc/kernel/head_book3s_32.S
> b/arch/powerpc/kernel/head_book3s_32.S
> index 0a3d7d4a9ec4..a294103a91a1 100644
> --- a/arch/powerpc/kernel/head_book3s_32.S
> +++ b/arch/powerpc/kernel/head_book3s_32.S
> @@ -299,7 +299,7 @@ ALT_MMU_FTR_SECTION_END_IFSET(MMU_FTR_HPTE_TABLE)
>   EXCEPTION_PROLOG_1
>   EXCEPTION_PROLOG_2 0x300 DataAccess handle_dar_dsisr=1
>   prepare_transfer_to_handler
> - lwz r5, _DSISR(r11)
> + lwz r5, _DSISR(r1)
>   andis.  r0, r5, DSISR_DABRMATCH@h
>   bne-1f
>   bl  do_page_fault

That patch doesn't apply to mainline. This version might help.

diff --git a/arch/powerpc/kernel/head_book3s_32.S 
b/arch/powerpc/kernel/head_book3s_32.S
index 764edd860ed4..68e5c0a7e99d 100644
--- a/arch/powerpc/kernel/head_book3s_32.S
+++ b/arch/powerpc/kernel/head_book3s_32.S
@@ -300,7 +300,7 @@ ALT_MMU_FTR_SECTION_END_IFSET(MMU_FTR_HPTE_TABLE)
EXCEPTION_PROLOG_1
EXCEPTION_PROLOG_2 INTERRUPT_DATA_STORAGE DataAccess handle_dar_dsisr=1
prepare_transfer_to_handler
-   lwz r5, _DSISR(r11)
+   lwz r5, _DSISR(r1)
andis.  r0, r5, DSISR_DABRMATCH@h
bne-1f
bl  do_page_fault

[PATCH] KVM: PPC: Book3S HV: Fix kvmhv_copy_tofrom_guest_radix

2021-08-02 Thread Fabiano Rosas
This function was introduced along with nested HV guest support. It
uses the platform's Radix MMU quadrants[1] to provide a nested
hypervisor with fast access to its nested guests memory
(H_COPY_TOFROM_GUEST hypercall). It has also since been added as a
fast path for the kvmppc_ld/st routines which are used during
instruction emulation.

The commit def0bfdbd603 ("powerpc: use probe_user_read() and
probe_user_write()") changed the low level copy function from
raw_copy_from_user to probe_user_read, which adds a check to
access_ok. In powerpc that is:

 static inline bool __access_ok(unsigned long addr, unsigned long size)
 {
return addr < TASK_SIZE_MAX && size <= TASK_SIZE_MAX - addr;
 }

and TASK_SIZE_MAX is 0x0010UL for 64-bit, which means that
setting the two MSBs of the effective address (which correspond to the
quadrant) now cause access_ok to reject the access.

This was not caught earlier because the most common code path via
kvmppc_ld/st contains a fallback (kvm_read_guest) that is likely to
succeed for L1 guests. For nested guests there is no fallback.

Another issue is that probe_user_read (now __copy_from_user_nofault)
does not return the number of not copied bytes in case of failure, so
the destination memory is not being cleared anymore in
kvmhv_copy_from_guest_radix:

 ret = kvmhv_copy_tofrom_guest_radix(vcpu, eaddr, to, NULL, n);
 if (ret > 0)<-- always false!
memset(to + (n - ret), 0, ret);

This patch fixes both issues by introducing two new functions that set
the quadrant bit of the effective address only after checking
access_ok and moving the memset closer to __copy_to_user_inatomic.

1 - for more on quadrants see commit d7b456152230 ("KVM: PPC: Book3S
HV: Implement functions to access quadrants 1 & 2")

Fixes: def0bfdbd603 ("powerpc: use probe_user_read() and probe_user_write()")
Signed-off-by: Fabiano Rosas 
---
 arch/powerpc/kvm/book3s_64_mmu_radix.c | 63 --
 1 file changed, 49 insertions(+), 14 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c 
b/arch/powerpc/kvm/book3s_64_mmu_radix.c
index b5905ae4377c..076a8e4a9135 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_radix.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c
@@ -30,12 +30,57 @@
  */
 static int p9_supported_radix_bits[4] = { 5, 9, 9, 13 };
 
+/* LPIDR and PIDR must have already been set */
+static long __copy_from_guest_quadrant(void *dst, void __user *src, size_t 
size,
+  unsigned long quadrant)
+{
+   long ret = size;
+   mm_segment_t old_fs = force_uaccess_begin();
+
+   if (access_ok(src, size)) {
+   src += (quadrant << 62);
+
+   pagefault_disable();
+   ret = __copy_from_user_inatomic((void __user *)dst, src, size);
+   pagefault_enable();
+   }
+   force_uaccess_end(old_fs);
+
+   if (!ret)
+   return ret;
+
+   memset(dst + (size - ret), 0, ret);
+
+   return -EFAULT;
+}
+
+/* LPIDR and PIDR must have already been set */
+static long __copy_to_guest_quadrant(void __user *dst, void *src, size_t size,
+unsigned long quadrant)
+{
+   long ret = -EFAULT;
+   mm_segment_t old_fs = force_uaccess_begin();
+
+   if (access_ok(dst, size)) {
+   dst += (quadrant << 62);
+
+   pagefault_disable();
+   ret = __copy_to_user_inatomic(dst, (void __user *)src, size);
+   pagefault_enable();
+   }
+   force_uaccess_end(old_fs);
+
+   if (ret)
+   return -EFAULT;
+   return 0;
+}
+
 unsigned long __kvmhv_copy_tofrom_guest_radix(int lpid, int pid,
  gva_t eaddr, void *to, void *from,
  unsigned long n)
 {
int old_pid, old_lpid;
-   unsigned long quadrant, ret = n;
+   unsigned long quadrant, ret;
bool is_load = !!to;
 
/* Can't access quadrants 1 or 2 in non-HV mode, call the HV to do it */
@@ -47,10 +92,6 @@ unsigned long __kvmhv_copy_tofrom_guest_radix(int lpid, int 
pid,
quadrant = 1;
if (!pid)
quadrant = 2;
-   if (is_load)
-   from = (void *) (eaddr | (quadrant << 62));
-   else
-   to = (void *) (eaddr | (quadrant << 62));
 
preempt_disable();
 
@@ -66,9 +107,9 @@ unsigned long __kvmhv_copy_tofrom_guest_radix(int lpid, int 
pid,
isync();
 
if (is_load)
-   ret = copy_from_user_nofault(to, (const void __user *)from, n);
+   ret = __copy_from_guest_quadrant(to, (void __user *)eaddr, n, 
quadrant);
else
-   ret = copy_to_user_nofault((void __user *)to, from, n);
+   ret = __copy_to_guest_quadrant((void __user *)eaddr, from, n, 
quadrant);
 
/* switch the pid first to avoid running host with unallocated pid */
if 

Re: Possible regression by ab037dd87a2f (powerpc/vdso: Switch VDSO to generic C implementation.)

2021-08-02 Thread Lynn Boger
Fixes will be in Go 1.16.7 and 1.15.15. Backports are no longer being 
done for Go 1.14.


On 8/2/21 1:18 AM, Michael Ellerman wrote:

"Paul Murphy"  writes:
  
(My apologies for however IBM's email client munges this)



I heard it is going to be in Go 1.16.7, but I do not know much about Go.
Maybe the folks in Cc can chime in.
  
  
We have backports primed and ready for the next point release. They

are waiting on the release manager to cherrypick them.

OK good, that is still the correct fix in the long run.


I think we were aware that our VDSO usage may have exploited some
peculiarities in how the ppc64 version was constructed (i.e hand
written assembly which just didn't happen to clobber R30).

Yeah I was "somewhat surprised" that Go thought it could use r30 like
that across a VDSO call :D

But to be fair the ABI of the VDSO has always been a little fishy,
because the entry points pretend to be a transparent wrapper around
system calls, but then in a case like this are not.


Go up to this point has only used the vdso function __kernel_clock_gettime; it
is the only entry point which would need to explicitly avoid R30 for
Go's sake.

I thought about limiting the workaround to just that code, but it seemed
silly and likely to come back to bite us. Once the compilers decides to
spill a non-volatile there are plenty of other ones to choose from.

cheers


Re: [PATCH 2/3] trace: refactor TRACE_IRQFLAGS_SUPPORT in Kconfig

2021-08-02 Thread Max Filippov
On Fri, Jul 30, 2021 at 10:24 PM Masahiro Yamada  wrote:
>
> Make architectures select TRACE_IRQFLAGS_SUPPORT instead of
> having many defines.
>
> Signed-off-by: Masahiro Yamada 
> ---
>  arch/xtensa/Kconfig   | 4 +---

Acked-by: Max Filippov 
-- 
Thanks.
-- Max


Re: [PATCH 2/3] trace: refactor TRACE_IRQFLAGS_SUPPORT in Kconfig

2021-08-02 Thread Catalin Marinas
On Sat, Jul 31, 2021 at 02:22:32PM +0900, Masahiro Yamada wrote:
> Make architectures select TRACE_IRQFLAGS_SUPPORT instead of
> having many defines.
> 
> Signed-off-by: Masahiro Yamada 

For arm64:

Acked-by: Catalin Marinas 


Re: [PATCH v1 4/5] PCI: Adapt all code locations to not use struct pci_dev::driver directly

2021-08-02 Thread Boris Ostrovsky


On 7/31/21 8:08 AM, Uwe Kleine-König wrote:
> Hello Boris,
>
> On Fri, Jul 30, 2021 at 04:37:31PM -0400, Boris Ostrovsky wrote:
>> On 7/29/21 4:37 PM, Uwe Kleine-König wrote:
>>> --- a/drivers/pci/xen-pcifront.c
>>> +++ b/drivers/pci/xen-pcifront.c
>>> @@ -599,12 +599,12 @@ static pci_ers_result_t pcifront_common_process(int 
>>> cmd,
>>> result = PCI_ERS_RESULT_NONE;
>>>  
>>> pcidev = pci_get_domain_bus_and_slot(domain, bus, devfn);
>>> -   if (!pcidev || !pcidev->driver) {
>>> +   pdrv = pci_driver_of_dev(pcidev);
>>> +   if (!pcidev || !pdrv) {
>> If pcidev is NULL we are dead by the time we reach 'if' statement.
> Oh, you're right. So this needs something like:
>
>   if (!pcidev || !(pdrv = pci_driver_of_dev(pcidev)))


Sure, that's fine. And while at it please also drop 'if (pdrv)' check below 
(it's not directly related to your change but is more noticeable now so since 
you are in that function anyway I'd appreciate if you could do that).


Thanks.

-boris


>
> or repeating the call to pci_driver_of_dev for each previous usage of
> pdev->driver.
>
> If there are no other preferences I'd got with the first approach for
> v2.
>
> Best regards and thanks for catching,
> Uwe
>


Re: [PATCH 2/2] virtio-console: remove unnecessary kmemdup()

2021-08-02 Thread Xianting Tian


在 2021/8/2 下午3:25, Jiri Slaby 写道:

Hi,

why is this 2/2? I seem (Lore neither) to find 1/2.

You didn't receive 1/2?
[PATCH 1/2] tty: hvc: pass DMA capable memory to put_chars()
https://lkml.org/lkml/2021/8/1/8 


On 01. 08. 21, 7:16, Xianting Tian wrote:

hvc framework will never pass stack memory to the put_chars() function,


Am I blind or missing something?

hvc_console_print(...)
{
  char c[N_OUTBUF]
...
  cons_ops[index]->put_chars(vtermnos[index], c, i);

The same here:

hvc_poll_put_char(..., char ch)
{
...
   n = hp->ops->put_chars(hp->vtermno, , 1);

AFAICS both of them *pass* a pointer to stack variable.


yes, I discussed the issue with Arnd before in below thread,  you can 
get the history, thanks


https://lkml.org/lkml/2021/7/27/494 




So the calling of kmemdup() is unnecessary, remove it.

Fixes: c4baad5029 ("virtio-console: avoid DMA from stack")


This patch doesn't "Fix" -- it reverts the commit. You should've CCed 
the author too.


yes, we discussed ther issue in above thread, which we CCed the author.




Signed-off-by: Xianting Tian 
---
  drivers/char/virtio_console.c | 12 ++--
  1 file changed, 2 insertions(+), 10 deletions(-)

diff --git a/drivers/char/virtio_console.c 
b/drivers/char/virtio_console.c

index 7eaf303a7..4ed3ffb1d 100644
--- a/drivers/char/virtio_console.c
+++ b/drivers/char/virtio_console.c
@@ -1117,8 +1117,6 @@ static int put_chars(u32 vtermno, const char 
*buf, int count)

  {
  struct port *port;
  struct scatterlist sg[1];
-    void *data;
-    int ret;
    if (unlikely(early_put_chars))
  return early_put_chars(vtermno, buf, count);
@@ -1127,14 +1125,8 @@ static int put_chars(u32 vtermno, const char 
*buf, int count)

  if (!port)
  return -EPIPE;
  -    data = kmemdup(buf, count, GFP_ATOMIC);
-    if (!data)
-    return -ENOMEM;
-
-    sg_init_one(sg, data, count);
-    ret = __send_to_port(port, sg, 1, count, data, false);
-    kfree(data);
-    return ret;
+    sg_init_one(sg, buf, count);
+    return __send_to_port(port, sg, 1, count, (void *)buf, false);
  }
    /*






Re: [PATCH 2/3] trace: refactor TRACE_IRQFLAGS_SUPPORT in Kconfig

2021-08-02 Thread Michael Ellerman
Masahiro Yamada  writes:
> Make architectures select TRACE_IRQFLAGS_SUPPORT instead of
> having many defines.
>
> Signed-off-by: Masahiro Yamada 
> ---
>
>  arch/Kconfig  | 3 +++
>  arch/arc/Kconfig  | 4 +---
>  arch/arm/Kconfig  | 5 +
>  arch/arm64/Kconfig| 4 +---
>  arch/csky/Kconfig | 4 +---
>  arch/hexagon/Kconfig  | 4 +---
>  arch/microblaze/Kconfig   | 1 +
>  arch/microblaze/Kconfig.debug | 5 -
>  arch/mips/Kconfig | 1 +
>  arch/mips/Kconfig.debug   | 4 
>  arch/nds32/Kconfig| 4 +---
>  arch/nios2/Kconfig| 3 ---
>  arch/openrisc/Kconfig | 4 +---
>  arch/parisc/Kconfig   | 1 +
>  arch/parisc/Kconfig.debug | 3 ---
>  arch/powerpc/Kconfig  | 5 +
>  arch/riscv/Kconfig| 4 +---
>  arch/s390/Kconfig | 1 +
>  arch/s390/Kconfig.debug   | 3 ---
>  arch/sh/Kconfig   | 1 +
>  arch/sh/Kconfig.debug | 3 ---
>  arch/sparc/Kconfig| 1 +
>  arch/sparc/Kconfig.debug  | 4 
>  arch/um/Kconfig   | 5 +
>  arch/x86/Kconfig  | 1 +
>  arch/x86/Kconfig.debug| 3 ---
>  arch/xtensa/Kconfig   | 4 +---
>  27 files changed, 21 insertions(+), 64 deletions(-)

> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
> index d01e3401581d..76a28452c042 100644
> --- a/arch/powerpc/Kconfig
> +++ b/arch/powerpc/Kconfig
> @@ -94,10 +94,6 @@ config STACKTRACE_SUPPORT
>   bool
>   default y
>  
> -config TRACE_IRQFLAGS_SUPPORT
> - bool
> - default y
> -
>  config LOCKDEP_SUPPORT
>   bool
>   default y
> @@ -271,6 +267,7 @@ config PPC
>   select STRICT_KERNEL_RWX if STRICT_MODULE_RWX
>   select SYSCTL_EXCEPTION_TRACE
>   select THREAD_INFO_IN_TASK
> + select TRACE_IRQFLAGS_SUPPORT
>   select VIRT_TO_BUS  if !PPC64
>   #
>   # Please keep this list sorted alphabetically.

Acked-by: Michael Ellerman  (powerpc)

cheers


[Bug 213803] G5 kernel build (v5.14-rc2) fails at linking stage - ld: arch/powerpc/mm/pgtable.o: in function `.__ptep_set_access_flags': /usr/src/linux-stable/./arch/powerpc/include/asm/book3s/64/pgta

2021-08-02 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=213803

--- Comment #1 from Erhard F. (erhar...@mailbox.org) ---
Still a problem in v5.14-rc4.

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.

Re: [PATCH 3/3] isystem: delete global -isystem compile option

2021-08-02 Thread Segher Boessenkool
On Mon, Aug 02, 2021 at 11:30:00PM +0300, Alexey Dobriyan wrote:
> On Mon, Aug 02, 2021 at 11:47:47AM -0500, Segher Boessenkool wrote:
> > The kernel *cannot* make up its own types for this.  It has to use the
> > types it is required to use (by C, by the ABIs, etc.)  So why
> > reimplement this?
> 
> Yes, it can. gcc headers have stuff like this:
> 
>   #define __PTRDIFF_TYPE__ long int
>   #define __SIZE_TYPE__ long unsigned int
> 
> If gcc can defined standard types, kernel can too.

The kernel *has to* use those exact same types.  So why on earth do you
feel you should reimplement this?

> > > noreturn, alignas newest C standard
> > > are next.
> > 
> > What is wrong with  and ?
> 
> These two are actually quite nice.
> 
> Have you seen ? Loads of macrology crap.
> Kernel can ship nicer one.

It is a pretty tame file.  And it works correctly for *all* targets,
including all Linux targets.  Why reimplement this?  No, it takes
virtually no resources to compile this.  And you do not have to maintain
it *at all*, the compiler will take care of it.  It is standard.

> > > They are userspace headers in the sense they are external to the project
> > > just like userspace programs are external to the kernel.
> > 
> > So you are going to rewrite all of the rest of GCC inside the kernel
> > project as well?
> 
> What an argument. "the rest of GCC" is already there except for stdarg.h.

???

That is there as well.  But you want to remove it.

"The rest of GCC" is everything in cc1 (the compiler binary), in libgcc
(not that the kernel wants that either on most targets, although it is
required), etc.  A few GB of binary goodness.

> > > Kernel chose to be self-contained.
> > 
> > That is largely historical, imo.  Nowadays this is less necessary.
> 
> I kind of agree as in kernel should use int8_t and stuff because they
> are standard.

s8 is a much nicer name, heh.  But it could
  #define s8 int8_t
certainly.

What I meant was the kernel wanted to avoid standard headers because
those traditionally have been a bit problematic.  But decades have gone
by, and nowadays the kernel's own headers are at least as bad.

> Also, -isystem removal disables  and  which is
> desireable.

Why?  Do you think  #include   will ever make it past code
review?  Do you need to throw up extra barriers so people will have a
harder time changing that policy, if ever they think that a good idea?

> > > It will be used for intrinsics where necessary.
> > 
> > Like, everywhere.
> 
> No, where necessary. Patch demostrates there are only a few places which
> want -isystem back.

Yes, where necessary, that is what I said.  So, potentially everywhere.
An arch can decide to use some builtin in a generic header, for example.

Your patch makes for more work in the future, that is the best it does.


Segher


Re: Debian SID kernel doesn't boot on PowerBook 3400c

2021-08-02 Thread Stan Johnson
On 8/2/21 8:41 AM, Christophe Leroy wrote:
> 
> 
> Le 31/07/2021 à 20:24, Stan Johnson a écrit :
>> Hi Christophe,
>>
>> On 7/31/21 9:58 AM, Christophe Leroy wrote:
>>> Stan Johnson  a écrit :
>>>
 Hello,

 The current Debian SID kernel will not boot on a PowerBook 3400c
 running
 the latest version of Debian SID. If booted using the BootX extension,
 the kernel hangs immediately:

 "Welcome to Linux, kernel 5.10.0-8-powerpc"

 If booted from Mac OS, the Mac OS screen hangs.

 Booting also hangs if the "No video driver" option is selected in
 BootX,
 "No video driver" causes "video=ofonly" to be passed to the kernel.

 This is the current command line that I'm using in BootX:
 root=/dev/sda13 video=chips65550:vmode:14,cmode:16

 Kernel v5.9 works as expected.

 The config file I'm using is attached.

 Here are the results of a git bisect, marking v5.9 as "good" and the
 most current kernel as "bad":

 $ cd linux
 $ git remote update
 $ git bisect reset
 $ git bisect start
 $ git bisect bad
 $ git bisect good v5.9

 Note: "bad" -> hangs at boot; "good" -> boots to login prompt

   1) 5.11.0-rc5-pmac-00034-g684da7628d9 (bad)
   2) 5.10.0-rc3-pmac-00383-gbb9dd3ce617 (good)
   3) 5.10.0-pmac-06637-g2911ed9f47b (good)
  Note: I had to disable SMP to build this kernel.
   4) 5.10.0-pmac-10584-g9805529ec54 (good)
  Note: I had to disable SMP to build this kernel.
   5) 5.10.0-pmac-12577-g8552d28e140 (bad)
   6) 5.10.0-pmac-11576-g8a5be36b930 (bad)
   7) 5.10.0-pmac-11044-gbe695ee29e8 (good)
  Note: I had to disable SMP to build this kernel.
   8) 5.10.0-rc2-pmac-00288-g59d512e4374 (bad)
   9) 5.10.0-rc2-pmac-00155-gc3d35ddd1ec (good)
 10) 5.10.0-rc2-pmac-00221-g7049b288ea8 (good)
 11) 5.10.0-rc2-pmac-00254-g4b74a35fc7e (bad)
 12) 5.10.0-rc2-pmac-00237-ged22bb8d39f (good)
 13) 5.10.0-rc2-pmac-00245-g87b57ea7e10 (good)
 14) 5.10.0-rc2-pmac-00249-gf10881a46f8 (bad)
 15) 5.10.0-rc2-pmac-00247-gf8a4b277c3c (good)
 16) 5.10.0-rc2-pmac-00248-gdb972a3787d (bad)

 db972a3787d12b1ce9ba7a31ec376d8a79e04c47 is the first bad commit
>>>
>>> Not sure this is really the root of the problem.
>>>
>>> Can you try again without CONFIG_VMAP_STACK ?
>>>
>>> Thanks
>>> Christophe
>>> ...
>>
>>
>> With CONFIG_VMAP_STACK=y, 5.11.0-rc5-pmac-00034-g684da7628d9 hangs at
>> boot on the PB 3400c.
>>
>> Without CONFIG_VMAP_STACK, 5.11.0-rc5-pmac-00034-g684da7628d9 boots as
>> expected.
>>
>> I didn't re-build the Debian SID kernel, though I confirmed that the
>> Debian config file for 5.10.0-8-powerpc includes CONFIG_VMAP_STACK=y.
>> It's not clear whether removing CONFIG_VMAP_STACK would be appropriate
>> for other powerpc systems.
>>
>> Please let me know why removing CONFIG_VMAP_STACK fixed the problem on
>> the PB 3400c. Should CONFIG_HAVE_ARCH_VMAP_STACK also be removed?
>>
> 
> When CONFIG_HAVE_ARCH_VMAP_STACK is selected by the architecture,
> CONFIG_VMAP_STACK  is selected by default.
> 
> The point is that your config has CONFIG_ADB_PMU.
> 
> A bug with VMAP stack was detected during 5.9 release cycle for
> platforms selecting CONFIG_ADB_PMU. Because fixing the bug was an heavy
> change, we prefered at that time to disable VMAP stack, so VMAP stack
> was deselected for CONFIG_ADB_PMU by commit
> 4a133eb351ccc275683ad49305d0b04dde903733.
> 
> Then as a second step, the proper fix was implemented and then VMAP
> stack was enabled again by the commit you bisected.
> 
> Taking into account that the problem disappears for you when you
> manually deselect VMAP stacks, it means the problem is not the fix
> itself, but the fact that VMAP stacks are now enable by default.
> 
> We need to understand why VMAP stack doesn't work on your platform, more
> than that why it doesn't boot at all with VMAP stack.
> 
> Could you send me the dmesg output of your system when it properly boots ?
> 
> Did you check with kernel 5.13 ?
> 
> Thanks
> Christophe
> 

Christophe,

Thanks for your response. It looks like I never tested v5.13 (I was
originally just reporting that the default Debian SID kernel,
5.10.0-8-powerpc, hangs at boot on the PB 3400c).

So I rebuilt the stock v5.13 from kernel.org using Finn's
dot-config-powermac-5.13, which got changed slightly at compilation (see
dot-config-v5.13-pmac, attached). It has CONFIG_VMAP_STACK and
CONFIG_ADB_PMU set, and it booted, but there were multiple memory
errors. So it looks like the hang-at-boot problem was fixed sometime
after v5.11, but there are now memory errors (similar to Wallstreet).

With CONFIG_VMAP_STACK not set (CONFIG_ADB_PMU is still set), the
.config file turns into the attached dot-config-v5.13-pmac_NO_VMAP. And
there were still memory errors (dmesg output attached).

The memory errors may be a completely unrelated issue, since they occur
regardless of the 

Re: [PATCH 3/3] isystem: delete global -isystem compile option

2021-08-02 Thread Nathan Chancellor

On 8/2/2021 1:32 PM, Alexey Dobriyan wrote:

On Mon, Aug 02, 2021 at 11:18:32AM -0700, Nathan Chancellor wrote:

On Sun, Aug 01, 2021 at 11:13:36PM +0300, Alexey Dobriyan wrote:

In theory, it enables "leakage" of userspace headers into kernel which
may present licensing problem.

In practice, only stdarg.h was used, stdbool.h is trivial and SIMD
intrinsics are contained to a few architectures and aren't global
problem.

In general, kernel is very self contained code and -isystem removal
will further isolate it from Ring Threeland influence.

nds32 keeps -isystem globally due to intrisics used in entrenched header.

-isystem is selectively reenabled for some files.

Not compile tested on hexagon.


With this series on top of v5.14-rc4 and a tangential patch to fix
another issue, ARCH=hexagon defconfig and allmodconfig show no issues.

Tested-by: Nathan Chancellor  # build (hexagon)


Oh wow, small miracle. Thank you!

Where can I find a cross-compiler? This link doesn't seem to have one
https://mirrors.edge.kernel.org/pub/tools/crosstool/files/bin/x86_64/11.1.0/


Hexagon only builds with LLVM now because of the bump to require gcc 
4.9: https://lore.kernel.org/r/20210623141854.ga32...@lst.de/


Brian Cain has a link in that thread to an LLVM toolchain that works 
well for defconfig (allmodconfig requires LLVM 13/14 from git). 
Otherwise, https://apt.llvm.org or LLVM from your package manager should 
be sufficient for the same targets.


$ make -skj"$(nproc)" ARCH=hexagon CROSS_COMPILE=hexagon-linux-musl- 
LLVM=1 LLVM_IAS=1 defconfig all


should work fine as long as the bin folder for whatever toolchain you 
download is in your PATH.


Cheers,
Nathan


Re: [PATCH 3/3] isystem: delete global -isystem compile option

2021-08-02 Thread Alexey Dobriyan
On Mon, Aug 02, 2021 at 11:18:32AM -0700, Nathan Chancellor wrote:
> On Sun, Aug 01, 2021 at 11:13:36PM +0300, Alexey Dobriyan wrote:
> > In theory, it enables "leakage" of userspace headers into kernel which
> > may present licensing problem.
> > 
> > In practice, only stdarg.h was used, stdbool.h is trivial and SIMD
> > intrinsics are contained to a few architectures and aren't global
> > problem.
> > 
> > In general, kernel is very self contained code and -isystem removal
> > will further isolate it from Ring Threeland influence.
> > 
> > nds32 keeps -isystem globally due to intrisics used in entrenched header.
> > 
> > -isystem is selectively reenabled for some files.
> > 
> > Not compile tested on hexagon.
> 
> With this series on top of v5.14-rc4 and a tangential patch to fix
> another issue, ARCH=hexagon defconfig and allmodconfig show no issues.
> 
> Tested-by: Nathan Chancellor  # build (hexagon)

Oh wow, small miracle. Thank you!

Where can I find a cross-compiler? This link doesn't seem to have one
https://mirrors.edge.kernel.org/pub/tools/crosstool/files/bin/x86_64/11.1.0/


Re: [PATCH 3/3] isystem: delete global -isystem compile option

2021-08-02 Thread Alexey Dobriyan
On Mon, Aug 02, 2021 at 11:47:47AM -0500, Segher Boessenkool wrote:
> On Mon, Aug 02, 2021 at 09:42:45AM +0300, Alexey Dobriyan wrote:
> > On Sun, Aug 01, 2021 at 04:32:47PM -0500, Segher Boessenkool wrote:
> > > On Sun, Aug 01, 2021 at 11:13:36PM +0300, Alexey Dobriyan wrote:
> > > > In theory, it enables "leakage" of userspace headers into kernel which
> > > > may present licensing problem.
> > > 
> > > > -NOSTDINC_FLAGS += -nostdinc -isystem $(shell $(CC) 
> > > > -print-file-name=include)
> > > > +NOSTDINC_FLAGS += -nostdinc
> > > 
> > > This is removing the compiler's own include files.  These are required
> > > for all kinds of basic features, and required to be compliant to the C
> > > standard at all.
> > 
> > No they are not required.
> 
> This is false, they *are* required, whenever you want to use these
> features.  If you do not include the required headers you get undefined
> behaviour.
> 
> > Kernel uses its own bool, uintptr_t and
> > static_assert, memset(), CHAR_BIT.
> 
> Yes, and it occasionally gets it wrong.  Great fun.  See c46bbf5d2def
> for the latest episode in this saga.  (Yes I know this is uapi so maybe
> not the best example here, but it isn't like the kernel gets such things
> wrong so often these days ;-) )
> 
> The kernel *cannot* make up its own types for this.  It has to use the
> types it is required to use (by C, by the ABIs, etc.)  So why
> reimplement this?

Yes, it can. gcc headers have stuff like this:

#define __PTRDIFF_TYPE__ long int
#define __SIZE_TYPE__ long unsigned int

If gcc can defined standard types, kernel can too.

> > noreturn, alignas newest C standard
> > are next.
> 
> What is wrong with  and ?

These two are actually quite nice.

Have you seen ? Loads of macrology crap.
Kernel can ship nicer one.

> > This version changelog didn't mention but kernel would use
> > -ffreestanding too if not other problems with the flag.
> 
> It is still true for freestanding C implementations, you just get a
> severely reduced standard library,
> 
> > > These are not "userspace headers", that is what
> > > -nostdinc takes care of already.
> > 
> > They are userspace headers in the sense they are external to the project
> > just like userspace programs are external to the kernel.
> 
> So you are going to rewrite all of the rest of GCC inside the kernel
> project as well?

What an argument. "the rest of GCC" is already there except for stdarg.h.

> > > In the case of GCC all these headers are GPL-with-runtime-exception, so
> > > claiming this can cause licensing problems is fearmongering.
> > 
> > I agree licensing problem doesn't really exist.
> > It would take gcc drop-in replacement with authors insane enough to not
> > license standard headers properly.
> 
> There does still not exist a drop-in replacement for GCC, not if you
> look closely and/or rely on details (like the kernel does).  Some of the
> differences are hidden by "linux/compiler-*.h", but hardly all.
> 
> > > I strongly advise against doing this.
> > 
> > Kernel chose to be self-contained.
> 
> That is largely historical, imo.  Nowadays this is less necessary.

I kind of agree as in kernel should use int8_t and stuff because they
are standard.

Also, -isystem removal disables  and  which is
desireable.

> Also, the kernel chose to *do* use the compiler include files.  It is
> you who wants to abolish that here.
> 
> > -isystem removal makes sense then.
> 
> -nostdinc -isystem $(shell $(CC) -print-file-name=include)  makes sense
> for that: you do indeed not want the userspace headers.  Maiming the
> compiler (by removing some of its functional parts, namely, its generic
> headers) does not make sense.
> 
> > It will be used for intrinsics where necessary.
> 
> Like, everywhere.

No, where necessary. Patch demostrates there are only a few places which
want -isystem back.


Re: [PATCH 3/3] isystem: delete global -isystem compile option

2021-08-02 Thread Nathan Chancellor
On Sun, Aug 01, 2021 at 11:13:36PM +0300, Alexey Dobriyan wrote:
> In theory, it enables "leakage" of userspace headers into kernel which
> may present licensing problem.
> 
> In practice, only stdarg.h was used, stdbool.h is trivial and SIMD
> intrinsics are contained to a few architectures and aren't global
> problem.
> 
> In general, kernel is very self contained code and -isystem removal
> will further isolate it from Ring Threeland influence.
> 
> nds32 keeps -isystem globally due to intrisics used in entrenched header.
> 
> -isystem is selectively reenabled for some files.
> 
> Not compile tested on hexagon.

With this series on top of v5.14-rc4 and a tangential patch to fix
another issue, ARCH=hexagon defconfig and allmodconfig show no issues.

Tested-by: Nathan Chancellor  # build (hexagon)

> Compile tested on:
> 
> alpha-allmodconfig alpha-allnoconfig alpha-defconfig arm64-allmodconfig
> arm64-allnoconfig arm64-defconfig arm-am200epdkit arm-aspeed_g4
> arm-aspeed_g5 arm-assabet arm-at91_dt arm-axm55xx arm-badge4 arm-bcm2835
> arm-cerfcube arm-clps711x arm-cm_x300 arm-cns3420vb arm-colibri_pxa270
> arm-colibri_pxa300 arm-collie arm-corgi arm-davinci_all arm-dove
> arm-ep93xx arm-eseries_pxa arm-exynos arm-ezx arm-footbridge arm-gemini
> arm-h3600 arm-h5000 arm-hackkit arm-hisi arm-imote2 arm-imx_v4_v5
> arm-imx_v6_v7 arm-integrator arm-iop32x arm-ixp4xx arm-jornada720
> arm-keystone arm-lart arm-lpc18xx arm-lpc32xx arm-lpd270 arm-lubbock
> arm-magician arm-mainstone arm-milbeaut_m10v arm-mini2440 arm-mmp2
> arm-moxart arm-mps2 arm-multi_v4t arm-multi_v5 arm-multi_v7 arm-mv78xx0
> arm-mvebu_v5 arm-mvebu_v7 arm-mxs arm-neponset arm-netwinder arm-nhk8815
> arm-omap1 arm-omap2plus arm-orion5x arm-oxnas_v6 arm-palmz72 arm-pcm027
> arm-pleb arm-pxa arm-pxa168 arm-pxa255-idp arm-pxa3xx arm-pxa910
> arm-qcom arm-realview arm-rpc arm-s3c2410 arm-s3c6400 arm-s5pv210
> arm-sama5 arm-shannon arm-shmobile arm-simpad arm-socfpga arm-spear13xx
> arm-spear3xx arm-spear6xx arm-spitz arm-stm32 arm-sunxi arm-tct_hammer
> arm-tegra arm-trizeps4 arm-u8500 arm-versatile arm-vexpress arm-vf610m4
> arm-viper arm-vt8500_v6_v7 arm-xcep arm-zeus csky-allmodconfig
> csky-allnoconfig csky-defconfig h8300-edosk2674 h8300-h8300h-sim
> h8300-h8s-sim i386-allmodconfig i386-allnoconfig i386-defconfig
> ia64-allmodconfig ia64-allnoconfig ia64-bigsur ia64-generic ia64-gensparse
> ia64-tiger ia64-zx1 m68k-amcore m68k-amiga m68k-apollo m68k-atari
> m68k-bvme6000 m68k-hp300 m68k-m5208evb m68k-m5249evb m68k-m5272c3
> m68k-m5275evb m68k-m5307c3 m68k-m5407c3 m68k-m5475evb m68k-mac
> m68k-multi m68k-mvme147 m68k-mvme16x m68k-q40 m68k-stmark2 m68k-sun3
> m68k-sun3x microblaze-allmodconfig microblaze-allnoconfig microblaze-mmu
> mips-ar7 mips-ath25 mips-ath79 mips-bcm47xx mips-bcm63xx mips-bigsur
> mips-bmips_be mips-bmips_stb mips-capcella mips-cavium_octeon mips-ci20
> mips-cobalt mips-cu1000-neo mips-cu1830-neo mips-db1xxx mips-decstation
> mips-decstation_64 mips-decstation_r4k mips-e55 mips-fuloong2e
> mips-gcw0 mips-generic mips-gpr mips-ip22 mips-ip27 mips-ip28 mips-ip32
> mips-jazz mips-jmr3927 mips-lemote2f mips-loongson1b mips-loongson1c
> mips-loongson2k mips-loongson3 mips-malta mips-maltaaprp mips-malta_kvm
> mips-malta_qemu_32r6 mips-maltasmvp mips-maltasmvp_eva mips-maltaup
> mips-maltaup_xpa mips-mpc30x mips-mtx1 mips-nlm_xlp mips-nlm_xlr
> mips-omega2p mips-pic32mzda mips-pistachio mips-qi_lb60 mips-rb532
> mips-rbtx49xx mips-rm200 mips-rs90 mips-rt305x mips-sb1250_swarm
> mips-tb0219 mips-tb0226 mips-tb0287 mips-vocore2 mips-workpad mips-xway
> nds32-allmodconfig nds32-allnoconfig nds32-defconfig nios2-10m50
> nios2-3c120 nios2-allmodconfig nios2-allnoconfig openrisc-allmodconfig
> openrisc-allnoconfig openrisc-or1klitex openrisc-or1ksim
> openrisc-simple_smp parisc-allnoconfig parisc-generic-32bit
> parisc-generic-64bit powerpc-acadia powerpc-adder875 powerpc-akebono
> powerpc-amigaone powerpc-arches powerpc-asp8347 powerpc-bamboo
> powerpc-bluestone powerpc-canyonlands powerpc-cell powerpc-chrp32
> powerpc-cm5200 powerpc-currituck powerpc-ebony powerpc-eiger
> powerpc-ep8248e powerpc-ep88xc powerpc-fsp2 powerpc-g5 powerpc-gamecube
> powerpc-ge_imp3a powerpc-holly powerpc-icon powerpc-iss476-smp
> powerpc-katmai powerpc-kilauea powerpc-klondike powerpc-kmeter1
> powerpc-ksi8560 powerpc-linkstation powerpc-lite5200b powerpc-makalu
> powerpc-maple powerpc-mgcoge powerpc-microwatt powerpc-motionpro
> powerpc-mpc512x powerpc-mpc5200 powerpc-mpc7448_hpc2 powerpc-mpc8272_ads
> powerpc-mpc8313_rdb powerpc-mpc8315_rdb powerpc-mpc832x_mds
> powerpc-mpc832x_rdb powerpc-mpc834x_itx powerpc-mpc834x_itxgp
> powerpc-mpc834x_mds powerpc-mpc836x_mds powerpc-mpc836x_rdk
> powerpc-mpc837x_mds powerpc-mpc837x_rdb powerpc-mpc83xx
> powerpc-mpc8540_ads powerpc-mpc8560_ads powerpc-mpc85xx_cds
> powerpc-mpc866_ads powerpc-mpc885_ads powerpc-mvme5100 powerpc-obs600
> powerpc-pasemi powerpc-pcm030 powerpc-pmac32 powerpc-powernv
> powerpc-ppa8548 

Re: Possible regression by ab037dd87a2f (powerpc/vdso: Switch VDSO to generic C implementation.)

2021-08-02 Thread Segher Boessenkool
On Mon, Aug 02, 2021 at 04:18:58PM +1000, Michael Ellerman wrote:
> > Go up to this point has only used the vdso function __kernel_clock_gettime; 
> > it
> > is the only entry point which would need to explicitly avoid R30 for
> > Go's sake.
> 
> I thought about limiting the workaround to just that code, but it seemed
> silly and likely to come back to bite us. Once the compilers decides to
> spill a non-volatile there are plenty of other ones to choose from.

It can be cheaper to spill N..31 consecutively, using stmw for example.
For 64-bit Power implementations it doesn't currently make any
difference.  Since none of this will be inlined it doesn't have any real
impact.

(This also happens with -m32 -fpic, which always sets GPR30 as fixed, it
is the offset table register.  With those flags -ffixed-r30 doesn't do
anything btw (r30 already *is* a fixed function register), and this will
not work with a Go that clobbers r30.  But this is academic :-) )


Segher


Re: [PATCH 3/3] isystem: delete global -isystem compile option

2021-08-02 Thread Segher Boessenkool
On Mon, Aug 02, 2021 at 09:42:45AM +0300, Alexey Dobriyan wrote:
> On Sun, Aug 01, 2021 at 04:32:47PM -0500, Segher Boessenkool wrote:
> > On Sun, Aug 01, 2021 at 11:13:36PM +0300, Alexey Dobriyan wrote:
> > > In theory, it enables "leakage" of userspace headers into kernel which
> > > may present licensing problem.
> > 
> > > -NOSTDINC_FLAGS += -nostdinc -isystem $(shell $(CC) 
> > > -print-file-name=include)
> > > +NOSTDINC_FLAGS += -nostdinc
> > 
> > This is removing the compiler's own include files.  These are required
> > for all kinds of basic features, and required to be compliant to the C
> > standard at all.
> 
> No they are not required.

This is false, they *are* required, whenever you want to use these
features.  If you do not include the required headers you get undefined
behaviour.

> Kernel uses its own bool, uintptr_t and
> static_assert, memset(), CHAR_BIT.

Yes, and it occasionally gets it wrong.  Great fun.  See c46bbf5d2def
for the latest episode in this saga.  (Yes I know this is uapi so maybe
not the best example here, but it isn't like the kernel gets such things
wrong so often these days ;-) )

The kernel *cannot* make up its own types for this.  It has to use the
types it is required to use (by C, by the ABIs, etc.)  So why
reimplement this?

> noreturn, alignas newest C standard
> are next.

What is wrong with  and ?

> This version changelog didn't mention but kernel would use
> -ffreestanding too if not other problems with the flag.

It is still true for freestanding C implementations, you just get a
severely reduced standard library,

> > These are not "userspace headers", that is what
> > -nostdinc takes care of already.
> 
> They are userspace headers in the sense they are external to the project
> just like userspace programs are external to the kernel.

So you are going to rewrite all of the rest of GCC inside the kernel
project as well?

> > In the case of GCC all these headers are GPL-with-runtime-exception, so
> > claiming this can cause licensing problems is fearmongering.
> 
> I agree licensing problem doesn't really exist.
> It would take gcc drop-in replacement with authors insane enough to not
> license standard headers properly.

There does still not exist a drop-in replacement for GCC, not if you
look closely and/or rely on details (like the kernel does).  Some of the
differences are hidden by "linux/compiler-*.h", but hardly all.

> > I strongly advise against doing this.
> 
> Kernel chose to be self-contained.

That is largely historical, imo.  Nowadays this is less necessary.

Also, the kernel chose to *do* use the compiler include files.  It is
you who wants to abolish that here.

> -isystem removal makes sense then.

-nostdinc -isystem $(shell $(CC) -print-file-name=include)  makes sense
for that: you do indeed not want the userspace headers.  Maiming the
compiler (by removing some of its functional parts, namely, its generic
headers) does not make sense.

> It will be used for intrinsics where necessary.

Like, everywhere.


Segher


Re: [PATCH v3 31/41] powerpc/32: Dismantle EXC_XFER_STD/LITE/TEMPLATE

2021-08-02 Thread LEROY Christophe




Le 01/08/2021 à 03:21, Finn Thain a écrit :

On Sat, 31 Jul 2021, Christophe Leroy wrote:



Stan Johnson contacted me about a regression in mainline that he
observed on his G3 Powerbooks. Using 'git bisect' we determined that
this patch was the cause of the regression, i.e. commit 4c0104a83fc3
("powerpc/32: Dismantle EXC_XFER_STD/LITE/TEMPLATE").

When testing 4c0104a83fc and all subsequent builds, various user
processes were liable to segfault. Here is the console log that Stan
provided:


Hi, i will be able to look at that more in details next week, however I
have a few preliminary qurstions.

Can you reliabily reproduce the problem with the said commit, and can
you reliabily run without problem with the parent commit ?


Yes and yes. (I already asked Stan to establish those things before I
contacted the list.)


I think I found the problem with that commit. Can you retry with the 
following change:


diff --git a/arch/powerpc/kernel/head_book3s_32.S 
b/arch/powerpc/kernel/head_book3s_32.S

index 0a3d7d4a9ec4..a294103a91a1 100644
--- a/arch/powerpc/kernel/head_book3s_32.S
+++ b/arch/powerpc/kernel/head_book3s_32.S
@@ -299,7 +299,7 @@ ALT_MMU_FTR_SECTION_END_IFSET(MMU_FTR_HPTE_TABLE)
EXCEPTION_PROLOG_1
EXCEPTION_PROLOG_2 0x300 DataAccess handle_dar_dsisr=1
prepare_transfer_to_handler
-   lwz r5, _DSISR(r11)
+   lwz r5, _DSISR(r1)
andis.  r0, r5, DSISR_DABRMATCH@h
bne-1f
bl  do_page_fault
---
Thanks
Christophe


Re: Debian SID kernel doesn't boot on PowerBook 3400c

2021-08-02 Thread Christophe Leroy




Le 31/07/2021 à 20:24, Stan Johnson a écrit :

Hi Christophe,

On 7/31/21 9:58 AM, Christophe Leroy wrote:

Stan Johnson  a écrit :


Hello,

The current Debian SID kernel will not boot on a PowerBook 3400c running
the latest version of Debian SID. If booted using the BootX extension,
the kernel hangs immediately:

"Welcome to Linux, kernel 5.10.0-8-powerpc"

If booted from Mac OS, the Mac OS screen hangs.

Booting also hangs if the "No video driver" option is selected in BootX,
"No video driver" causes "video=ofonly" to be passed to the kernel.

This is the current command line that I'm using in BootX:
root=/dev/sda13 video=chips65550:vmode:14,cmode:16

Kernel v5.9 works as expected.

The config file I'm using is attached.

Here are the results of a git bisect, marking v5.9 as "good" and the
most current kernel as "bad":

$ cd linux
$ git remote update
$ git bisect reset
$ git bisect start
$ git bisect bad
$ git bisect good v5.9

Note: "bad" -> hangs at boot; "good" -> boots to login prompt

  1) 5.11.0-rc5-pmac-00034-g684da7628d9 (bad)
  2) 5.10.0-rc3-pmac-00383-gbb9dd3ce617 (good)
  3) 5.10.0-pmac-06637-g2911ed9f47b (good)
     Note: I had to disable SMP to build this kernel.
  4) 5.10.0-pmac-10584-g9805529ec54 (good)
     Note: I had to disable SMP to build this kernel.
  5) 5.10.0-pmac-12577-g8552d28e140 (bad)
  6) 5.10.0-pmac-11576-g8a5be36b930 (bad)
  7) 5.10.0-pmac-11044-gbe695ee29e8 (good)
     Note: I had to disable SMP to build this kernel.
  8) 5.10.0-rc2-pmac-00288-g59d512e4374 (bad)
  9) 5.10.0-rc2-pmac-00155-gc3d35ddd1ec (good)
10) 5.10.0-rc2-pmac-00221-g7049b288ea8 (good)
11) 5.10.0-rc2-pmac-00254-g4b74a35fc7e (bad)
12) 5.10.0-rc2-pmac-00237-ged22bb8d39f (good)
13) 5.10.0-rc2-pmac-00245-g87b57ea7e10 (good)
14) 5.10.0-rc2-pmac-00249-gf10881a46f8 (bad)
15) 5.10.0-rc2-pmac-00247-gf8a4b277c3c (good)
16) 5.10.0-rc2-pmac-00248-gdb972a3787d (bad)

db972a3787d12b1ce9ba7a31ec376d8a79e04c47 is the first bad commit


Not sure this is really the root of the problem.

Can you try again without CONFIG_VMAP_STACK ?

Thanks
Christophe
...



With CONFIG_VMAP_STACK=y, 5.11.0-rc5-pmac-00034-g684da7628d9 hangs at
boot on the PB 3400c.

Without CONFIG_VMAP_STACK, 5.11.0-rc5-pmac-00034-g684da7628d9 boots as
expected.

I didn't re-build the Debian SID kernel, though I confirmed that the
Debian config file for 5.10.0-8-powerpc includes CONFIG_VMAP_STACK=y.
It's not clear whether removing CONFIG_VMAP_STACK would be appropriate
for other powerpc systems.

Please let me know why removing CONFIG_VMAP_STACK fixed the problem on
the PB 3400c. Should CONFIG_HAVE_ARCH_VMAP_STACK also be removed?



When CONFIG_HAVE_ARCH_VMAP_STACK is selected by the architecture, CONFIG_VMAP_STACK  is selected by 
default.


The point is that your config has CONFIG_ADB_PMU.

A bug with VMAP stack was detected during 5.9 release cycle for platforms selecting CONFIG_ADB_PMU. 
Because fixing the bug was an heavy change, we prefered at that time to disable VMAP stack, so VMAP 
stack was deselected for CONFIG_ADB_PMU by commit 4a133eb351ccc275683ad49305d0b04dde903733.


Then as a second step, the proper fix was implemented and then VMAP stack was enabled again by the 
commit you bisected.


Taking into account that the problem disappears for you when you manually deselect VMAP stacks, it 
means the problem is not the fix itself, but the fact that VMAP stacks are now enable by default.


We need to understand why VMAP stack doesn't work on your platform, more than that why it doesn't 
boot at all with VMAP stack.


Could you send me the dmesg output of your system when it properly boots ?

Did you check with kernel 5.13 ?

Thanks
Christophe


Re: [PATCH] powerpc/xive: Do not skip CPU-less nodes when creating the IPIs

2021-08-02 Thread Cédric Le Goater
On 8/2/21 8:37 AM, Michael Ellerman wrote:
> Cédric Le Goater  writes:
>> On PowerVM, CPU-less nodes can be populated with hot-plugged CPUs at
>> runtime. Today, the IPI is not created for such nodes, and hot-plugged
>> CPUs use a bogus IPI, which leads to soft lockups.
>>
>> We could create the node IPI on demand but it is a bit complex because
>> this code would be called under bringup_up() and some IRQ locking is
>> being done. The simplest solution is to create the IPIs for all nodes
>> at startup.
>>
>> Fixes: 7dcc37b3eff9 ("powerpc/xive: Map one IPI interrupt per node")
>> Cc: sta...@vger.kernel.org # v5.13
>> Reported-by: Geetika Moolchandani 
>> Cc: Srikar Dronamraju 
>> Signed-off-by: Cédric Le Goater 
>> ---
>>
>> This patch breaks old versions of irqbalance (<= v1.4). Possible nodes
>> are collected from /sys/devices/system/node/ but CPU-less nodes are
>> not listed there. When interrupts are scanned, the link representing
>> the node structure is NULL and segfault occurs.
> 
> Breaking userspace is usually frowned upon, even if it is irqbalance.
>
> If CPU-less nodes appeared in /sys/devices/system/node would that fix
> it? Could we do that or is that not possible for other reasons?
> 
>> Version 1.7 seems immune. 
> 
> Which was released in August 2020.
> 
> Looks like some distros still ship 1.6, I take it you're not sure if
> that is broken or not.

I did a bisect on irqbalance and the "bad" commit was introduced between 
version 1.7 and version 1.8 :

  commit 31dea01f3a47 ("Also fetch node info for non-PCI devices") 
  
https://github.com/Irqbalance/irqbalance/commit/31dea01f3a47aa6374560638486879e5129f9c94

which was backported on RHEL 8 in RPM irqbalance-1.4.0-6.el8.

Any distro using irqbalance <= 1.7 without the patch above is fine. 

Since irqbalance handled cleanly irqs referencing offline nodes before 
this patch, I am inclined to think that the irqbalance fix is incomplete. 
Unfortunately, the commit log lacks some context on the non-PCI devices. 

Thanks,

C. 


Re: Possible regression by ab037dd87a2f (powerpc/vdso: Switch VDSO to generic C implementation.)

2021-08-02 Thread Benjamin Herrenschmidt
On Mon, 2021-08-02 at 16:18 +1000, Michael Ellerman wrote:
> 
> But to be fair the ABI of the VDSO has always been a little fishy,
> 
> because the entry points pretend to be a transparent wrapper around
> 
> system calls, but then in a case like this are not.

This is somewhat debatable :-) If your perspective is from an
application, whether your wrapper is glibc, the vdso or *** knows what
else, you can't make assumptions about the state of the registers on a
signal hitting somewhere in your call chain other than what's
guanranteed by the ABI overall (ie, TLS etc...).

Nowhere was it written that a VDSO call behaved strictly like an sc
instruction :-)
> 
> > Go up to this point has only used the vdso function __kernel_clock_gettime; 
> > it
> > is the only entry point which would need to explicitly avoid R30 for
> > Go's sake.
> 
> 
> I thought about limiting the workaround to just that code, but it seemed
> silly and likely to come back to bite us. Once the compilers decides to
> spill a non-volatile there are plenty of other ones to choose from.

Yeah fine graining this is a waste of time, I agree. Just stick with
fixing r30 for the whole vdso, it won't actually hurt, just make sure
this is somewhat documented as to why we do it (I don't remember what
you patch does off hand, I assume at least your git commit has all the
data, a comment near the offending line in the Makefile might be good
too).

Cheers,
Ben.




Re: [PATCH 07/11] treewide: Replace the use of mem_encrypt_active() with prot_guest_has()

2021-08-02 Thread Christophe Leroy




Le 28/07/2021 à 00:26, Tom Lendacky a écrit :

Replace occurrences of mem_encrypt_active() with calls to prot_guest_has()
with the PATTR_MEM_ENCRYPT attribute.



What about 
https://patchwork.ozlabs.org/project/linuxppc-dev/patch/20210730114231.23445-1-w...@kernel.org/ ?


Christophe




Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Borislav Petkov 
Cc: Dave Hansen 
Cc: Andy Lutomirski 
Cc: Peter Zijlstra 
Cc: David Airlie 
Cc: Daniel Vetter 
Cc: Maarten Lankhorst 
Cc: Maxime Ripard 
Cc: Thomas Zimmermann 
Cc: VMware Graphics 
Cc: Joerg Roedel 
Cc: Will Deacon 
Cc: Dave Young 
Cc: Baoquan He 
Signed-off-by: Tom Lendacky 
---
  arch/x86/kernel/head64.c| 4 ++--
  arch/x86/mm/ioremap.c   | 4 ++--
  arch/x86/mm/mem_encrypt.c   | 5 ++---
  arch/x86/mm/pat/set_memory.c| 3 ++-
  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 4 +++-
  drivers/gpu/drm/drm_cache.c | 4 ++--
  drivers/gpu/drm/vmwgfx/vmwgfx_drv.c | 4 ++--
  drivers/gpu/drm/vmwgfx/vmwgfx_msg.c | 6 +++---
  drivers/iommu/amd/iommu.c   | 3 ++-
  drivers/iommu/amd/iommu_v2.c| 3 ++-
  drivers/iommu/iommu.c   | 3 ++-
  fs/proc/vmcore.c| 6 +++---
  kernel/dma/swiotlb.c| 4 ++--
  13 files changed, 29 insertions(+), 24 deletions(-)

diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index de01903c3735..cafed6456d45 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -19,7 +19,7 @@
  #include 
  #include 
  #include 
-#include 
+#include 
  #include 
  
  #include 

@@ -285,7 +285,7 @@ unsigned long __head __startup_64(unsigned long physaddr,
 * there is no need to zero it after changing the memory encryption
 * attribute.
 */
-   if (mem_encrypt_active()) {
+   if (prot_guest_has(PATTR_MEM_ENCRYPT)) {
vaddr = (unsigned long)__start_bss_decrypted;
vaddr_end = (unsigned long)__end_bss_decrypted;
for (; vaddr < vaddr_end; vaddr += PMD_SIZE) {
diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
index 0f2d5ace5986..5e1c1f5cbbe8 100644
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -693,7 +693,7 @@ static bool __init 
early_memremap_is_setup_data(resource_size_t phys_addr,
  bool arch_memremap_can_ram_remap(resource_size_t phys_addr, unsigned long 
size,
 unsigned long flags)
  {
-   if (!mem_encrypt_active())
+   if (!prot_guest_has(PATTR_MEM_ENCRYPT))
return true;
  
  	if (flags & MEMREMAP_ENC)

@@ -723,7 +723,7 @@ pgprot_t __init 
early_memremap_pgprot_adjust(resource_size_t phys_addr,
  {
bool encrypted_prot;
  
-	if (!mem_encrypt_active())

+   if (!prot_guest_has(PATTR_MEM_ENCRYPT))
return prot;
  
  	encrypted_prot = true;

diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
index 451de8e84fce..0f1533dbe81c 100644
--- a/arch/x86/mm/mem_encrypt.c
+++ b/arch/x86/mm/mem_encrypt.c
@@ -364,8 +364,7 @@ int __init early_set_memory_encrypted(unsigned long vaddr, 
unsigned long size)
  /*
   * SME and SEV are very similar but they are not the same, so there are
   * times that the kernel will need to distinguish between SME and SEV. The
- * sme_active() and sev_active() functions are used for this.  When a
- * distinction isn't needed, the mem_encrypt_active() function can be used.
+ * sme_active() and sev_active() functions are used for this.
   *
   * The trampoline code is a good example for this requirement.  Before
   * paging is activated, SME will access all memory as decrypted, but SEV
@@ -451,7 +450,7 @@ void __init mem_encrypt_free_decrypted_mem(void)
 * The unused memory range was mapped decrypted, change the encryption
 * attribute from decrypted to encrypted before freeing it.
 */
-   if (mem_encrypt_active()) {
+   if (sme_me_mask) {
r = set_memory_encrypted(vaddr, npages);
if (r) {
pr_warn("failed to free unused decrypted pages\n");
diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
index ad8a5c586a35..6925f2bb4be1 100644
--- a/arch/x86/mm/pat/set_memory.c
+++ b/arch/x86/mm/pat/set_memory.c
@@ -18,6 +18,7 @@
  #include 
  #include 
  #include 
+#include 
  
  #include 

  #include 
@@ -1986,7 +1987,7 @@ static int __set_memory_enc_dec(unsigned long addr, int 
numpages, bool enc)
int ret;
  
  	/* Nothing to do if memory encryption is not active */

-   if (!mem_encrypt_active())
+   if (!prot_guest_has(PATTR_MEM_ENCRYPT))
return 0;
  
  	/* Should not be working on unaligned addresses */

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index abb928894eac..8407224717df 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -38,6 +38,7 @@
  

Re: [PATCH v3] fpga: dfl: fme: Fix cpu hotplug issue in performance reporting

2021-08-02 Thread kajoljain



On 8/2/21 2:28 PM, Moritz Fischer wrote:
> On Mon, Aug 02, 2021 at 01:15:00PM +0530, kajoljain wrote:
>>
>>
>> On 7/13/21 1:12 PM, Kajol Jain wrote:
>>> The performance reporting driver added cpu hotplug
>>> feature but it didn't add pmu migration call in cpu
>>> offline function.
>>> This can create an issue incase the current designated
>>> cpu being used to collect fme pmu data got offline,
>>> as based on current code we are not migrating fme pmu to
>>> new target cpu. Because of that perf will still try to
>>> fetch data from that offline cpu and hence we will not
>>> get counter data.
>>>
>>> Patch fixed this issue by adding pmu_migrate_context call
>>> in fme_perf_offline_cpu function.
>>>
>>> Fixes: 724142f8c42a ("fpga: dfl: fme: add performance reporting support")
>>> Tested-by: Xu Yilun 
>>> Acked-by: Wu Hao 
>>> Signed-off-by: Kajol Jain 
>>> Cc: sta...@vger.kernel.org
>>> ---
>>
>> Any update on this patch? Please let me know if any changes required.
>>
>> Thanks,
>> Kajol Jain
> 
> It's in my 'fixes' branch.

Thanks Moritz for informing me.

Thanks,
Kajol Jain

> 
> - Moritz
> 


Re: [PATCH] powerpc/vdso: Don't use r30 to avoid breaking Go lang

2021-08-02 Thread Michael Ellerman
Michael Ellerman  writes:
> The Go runtime uses r30 for some special value called 'g'. It assumes
> that value will remain unchanged even when calling VDSO functions.
> Although r30 is non-volatile across function calls, the callee is free
> to use it, as long as the callee saves the value and restores it before
> returning.
>
> It used to be true by accident that the VDSO didn't use r30, because the
> VDSO was hand-written asm. When we switched to building the VDSO from C
> the compiler started using r30, at least in some builds, leading to
> crashes in Go. eg:
>
>   ~/go/src$ ./all.bash
>   Building Go cmd/dist using /usr/lib/go-1.16. (go1.16.2 linux/ppc64le)
>   Building Go toolchain1 using /usr/lib/go-1.16.
>   go build os/exec: /usr/lib/go-1.16/pkg/tool/linux_ppc64le/compile: signal: 
> segmentation fault
>   go build reflect: /usr/lib/go-1.16/pkg/tool/linux_ppc64le/compile: signal: 
> segmentation fault
>   go tool dist: FAILED: /usr/lib/go-1.16/bin/go install -gcflags=-l 
> -tags=math_big_pure_go compiler_bootstrap bootstrap/cmd/...: exit status 1
>
> There are patches in flight to fix Go[1], but until they are released
> and widely deployed we can workaround it in the VDSO by avoiding use of
> r30.
>
> Note this only works with GCC, clang does not support -ffixed-rN.
>
> 1: https://go-review.googlesource.com/c/go/+/328110
>
> Fixes: ab037dd87a2f ("powerpc/vdso: Switch VDSO to generic C implementation.")
> Cc: sta...@vger.kernel.org # v5.11+

In practice, with GCC 10.3.0, that commit doesn't result in r30 being
used by the compiler.

It's commit 74205b3fc2ef ("powerpc/vdso: Add support for time
namespaces"), which went into v5.13-rc1, which causes r30 to be used in
__c_kernel_clock_gettime():

06e0 <__c_kernel_clock_gettime>:
 6e0:   0f 00 03 28 cmplwi  r3,15
 6e4:   ec 00 81 41 bgt 7d0 <__c_kernel_clock_gettime+0xf0>
 6e8:   01 00 20 39 li  r9,1
 6ec:   30 18 29 7d slw r9,r9,r3
 6f0:   83 08 2a 71 andi.   r10,r9,2179
 6f4:   fc 00 82 41 beq 7f0 <__c_kernel_clock_gettime+0x110>
 6f8:   e4 26 63 78 rldicr  r3,r3,4,59
 6fc:   ff 7f 20 3d lis r9,32767
 700:   f0 ff c1 fb std r30,-16(r1)
 704:   f8 ff e1 fb std r31,-8(r1)
 708:   14 1a c5 7c add r6,r5,r3
 70c:   ff ff 2b 61 ori r11,r9,65535
 710:   00 00 05 81 lwz r8,0(r5)
 714:   01 00 09 71 andi.   r9,r8,1
 718:   98 00 82 40 bne 7b0 <__c_kernel_clock_gettime+0xd0>
 71c:   ac 04 20 7c lwsync
 720:   a6 42 cc 7f mftbr30


cheers


Re: [PATCH] powerpc/svm: Don't issue ultracalls if !mem_encrypt_active()

2021-08-02 Thread Michael Ellerman
Will Deacon  writes:
> Commit ad6c00283163 ("swiotlb: Free tbl memory in swiotlb_exit()")
> introduced a set_memory_encrypted() call to swiotlb_exit() so that the
> buffer pages are returned to an encrypted state prior to being freed.
>
> Sachin reports that this leads to the following crash on a Power server:
>
> [0.010799] software IO TLB: tearing down default memory pool
> [0.010805] [ cut here ]
> [0.010808] kernel BUG at arch/powerpc/kernel/interrupt.c:98!
>
> Nick spotted that this is because set_memory_encrypted() is issuing an
> ultracall which doesn't exist for the processor, and should therefore
> be gated by mem_encrypt_active() to mirror the x86 implementation.
>
> Cc: Konrad Rzeszutek Wilk 
> Cc: Claire Chang 
> Cc: Christoph Hellwig 
> Cc: Robin Murphy 
> Fixes: ad6c00283163 ("swiotlb: Free tbl memory in swiotlb_exit()")
> Suggested-by: Nicholas Piggin 
> Reported-by: Sachin Sant 
> Tested-by: Sachin Sant 
> Tested-by: Nathan Chancellor 
> Link: 
> https://lore.kernel.org/r/1905cd70-7656-42ae-99e2-a31fc3812...@linux.vnet.ibm.com/
> Signed-off-by: Will Deacon 
> ---
>  arch/powerpc/platforms/pseries/svm.c | 6 ++
>  1 file changed, 6 insertions(+)

Thanks.

Acked-by: Michael Ellerman 


I assume Konrad will take this via the swiotlb tree?

cheers


Re: [PATCH 08/11] mm: Remove the now unused mem_encrypt_active() function

2021-08-02 Thread Joerg Roedel
On Tue, Jul 27, 2021 at 05:26:11PM -0500, Tom Lendacky wrote:
> The mem_encrypt_active() function has been replaced by prot_guest_has(),
> so remove the implementation.
> 
> Signed-off-by: Tom Lendacky 

Reviewed-by: Joerg Roedel 


Re: [PATCH 09/11] x86/sev: Remove the now unused mem_encrypt_active() function

2021-08-02 Thread Joerg Roedel
On Tue, Jul 27, 2021 at 05:26:12PM -0500, Tom Lendacky wrote:
> The mem_encrypt_active() function has been replaced by prot_guest_has(),
> so remove the implementation.
> 
> Cc: Thomas Gleixner 
> Cc: Ingo Molnar 
> Cc: Borislav Petkov 
> Signed-off-by: Tom Lendacky 

Reviewed-by: Joerg Roedel 


Re: [PATCH 06/11] x86/sev: Replace occurrences of sev_es_active() with prot_guest_has()

2021-08-02 Thread Joerg Roedel
On Tue, Jul 27, 2021 at 05:26:09PM -0500, Tom Lendacky wrote:
> @@ -48,7 +47,7 @@ static void sme_sev_setup_real_mode(struct 
> trampoline_header *th)
>   if (prot_guest_has(PATTR_HOST_MEM_ENCRYPT))
>   th->flags |= TH_FLAGS_SME_ACTIVE;
>  
> - if (sev_es_active()) {
> + if (prot_guest_has(PATTR_GUEST_PROT_STATE)) {
>   /*
>* Skip the call to verify_cpu() in secondary_startup_64 as it
>* will cause #VC exceptions when the AP can't handle them yet.

Not sure how TDX will handle AP booting, are you sure it needs this
special setup as well? Otherwise a check for SEV-ES would be better
instead of the generic PATTR_GUEST_PROT_STATE.

Regards,

Joerg


Re: [PATCH 05/11] x86/sev: Replace occurrences of sev_active() with prot_guest_has()

2021-08-02 Thread Joerg Roedel
On Tue, Jul 27, 2021 at 05:26:08PM -0500, Tom Lendacky wrote:
> Replace occurrences of sev_active() with the more generic prot_guest_has()
> using PATTR_GUEST_MEM_ENCRYPT, except for in arch/x86/mm/mem_encrypt*.c
> where PATTR_SEV will be used. If future support is added for other memory
> encryption technologies, the use of PATTR_GUEST_MEM_ENCRYPT can be
> updated, as required, to use PATTR_SEV.
> 
> Cc: Thomas Gleixner 
> Cc: Ingo Molnar 
> Cc: Borislav Petkov 
> Cc: Dave Hansen 
> Cc: Andy Lutomirski 
> Cc: Peter Zijlstra 
> Cc: Ard Biesheuvel 
> Signed-off-by: Tom Lendacky 

Reviewed-by: Joerg Roedel 


Re: [PATCH 02/11] x86/sev: Add an x86 version of prot_guest_has()

2021-08-02 Thread Joerg Roedel
On Tue, Jul 27, 2021 at 05:26:05PM -0500, Tom Lendacky wrote:
> Introduce an x86 version of the prot_guest_has() function. This will be
> used in the more generic x86 code to replace vendor specific calls like
> sev_active(), etc.
> 
> While the name suggests this is intended mainly for guests, it will
> also be used for host memory encryption checks in place of sme_active().
> 
> The amd_prot_guest_has() function does not use EXPORT_SYMBOL_GPL for the
> same reasons previously stated when changing sme_active(), sev_active and
> sme_me_mask to EXPORT_SYBMOL:
>   commit 87df26175e67 ("x86/mm: Unbreak modules that rely on external 
> PAGE_KERNEL availability")
>   commit 9d5f38ba6c82 ("x86/mm: Unbreak modules that use the DMA API")
> 
> Cc: Thomas Gleixner 
> Cc: Ingo Molnar 
> Cc: Borislav Petkov 
> Cc: Dave Hansen 
> Cc: Andy Lutomirski 
> Cc: Peter Zijlstra 
> Co-developed-by: Andi Kleen 
> Signed-off-by: Andi Kleen 
> Co-developed-by: Kuppuswamy Sathyanarayanan 
> 
> Signed-off-by: Kuppuswamy Sathyanarayanan 
> 
> Signed-off-by: Tom Lendacky 

Reviewed-by: Joerg Roedel 


Re: [PATCH 01/11] mm: Introduce a function to check for virtualization protection features

2021-08-02 Thread Joerg Roedel
On Tue, Jul 27, 2021 at 05:26:04PM -0500, Tom Lendacky wrote:
> In prep for other protected virtualization technologies, introduce a
> generic helper function, prot_guest_has(), that can be used to check
> for specific protection attributes, like memory encryption. This is
> intended to eliminate having to add multiple technology-specific checks
> to the code (e.g. if (sev_active() || tdx_active())).
> 
> Co-developed-by: Andi Kleen 
> Signed-off-by: Andi Kleen 
> Co-developed-by: Kuppuswamy Sathyanarayanan 
> 
> Signed-off-by: Kuppuswamy Sathyanarayanan 
> 
> Signed-off-by: Tom Lendacky 

Reviewed-by: Joerg Roedel 


Re: [PATCH 04/11] x86/sme: Replace occurrences of sme_active() with prot_guest_has()

2021-08-02 Thread Joerg Roedel
On Tue, Jul 27, 2021 at 05:26:07PM -0500, Tom Lendacky wrote:
> Replace occurrences of sme_active() with the more generic prot_guest_has()
> using PATTR_HOST_MEM_ENCRYPT, except for in arch/x86/mm/mem_encrypt*.c
> where PATTR_SME will be used. If future support is added for other memory
> encryption technologies, the use of PATTR_HOST_MEM_ENCRYPT can be
> updated, as required, to use PATTR_SME.
> 
> Cc: Thomas Gleixner 
> Cc: Ingo Molnar 
> Cc: Borislav Petkov 
> Cc: Dave Hansen 
> Cc: Andy Lutomirski 
> Cc: Peter Zijlstra 
> Cc: Joerg Roedel 
> Cc: Will Deacon 
> Signed-off-by: Tom Lendacky 

Reviewed-by: Joerg Roedel 


Re: [PATCH v2 13/21] s390/pci: don't set failed sg dma_address to DMA_MAPPING_ERROR

2021-08-02 Thread Niklas Schnelle
On Fri, 2021-07-23 at 11:50 -0600, Logan Gunthorpe wrote:
> Setting the ->dma_address to DMA_MAPPING_ERROR is not part of
> the ->map_sg calling convention, so remove it.
> 
> Link: https://lore.kernel.org/linux-mips/20210716063241.gc13...@lst.de/
> Suggested-by: Christoph Hellwig 
> Signed-off-by: Logan Gunthorpe 
> Cc: Niklas Schnelle 
> Cc: Gerald Schaefer 
> Cc: Heiko Carstens 
> Cc: Vasily Gorbik 
> Cc: Christian Borntraeger 
> ---
>  arch/s390/pci/pci_dma.c | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/arch/s390/pci/pci_dma.c b/arch/s390/pci/pci_dma.c
> index c78b02012764..be48e5b5bfcf 100644
> --- a/arch/s390/pci/pci_dma.c
> +++ b/arch/s390/pci/pci_dma.c
> @@ -492,7 +492,6 @@ static int s390_dma_map_sg(struct device *dev, struct 
> scatterlist *sg,
>   for (i = 1; i < nr_elements; i++) {
>   s = sg_next(s);
>  
> - s->dma_address = DMA_MAPPING_ERROR;
>   s->dma_length = 0;
>  
>   if (s->offset || (size & ~PAGE_MASK) ||

Acked-by: Niklas Schnelle 

Thanks!



Re: [PATCH v3] fpga: dfl: fme: Fix cpu hotplug issue in performance reporting

2021-08-02 Thread Moritz Fischer
On Mon, Aug 02, 2021 at 01:15:00PM +0530, kajoljain wrote:
> 
> 
> On 7/13/21 1:12 PM, Kajol Jain wrote:
> > The performance reporting driver added cpu hotplug
> > feature but it didn't add pmu migration call in cpu
> > offline function.
> > This can create an issue incase the current designated
> > cpu being used to collect fme pmu data got offline,
> > as based on current code we are not migrating fme pmu to
> > new target cpu. Because of that perf will still try to
> > fetch data from that offline cpu and hence we will not
> > get counter data.
> > 
> > Patch fixed this issue by adding pmu_migrate_context call
> > in fme_perf_offline_cpu function.
> > 
> > Fixes: 724142f8c42a ("fpga: dfl: fme: add performance reporting support")
> > Tested-by: Xu Yilun 
> > Acked-by: Wu Hao 
> > Signed-off-by: Kajol Jain 
> > Cc: sta...@vger.kernel.org
> > ---
> 
> Any update on this patch? Please let me know if any changes required.
> 
> Thanks,
> Kajol Jain

It's in my 'fixes' branch.

- Moritz


Re: [PATCH 2/2] virtio-console: remove unnecessary kmemdup()

2021-08-02 Thread Xianting Tian



在 2021/8/2 下午4:40, Jiri Slaby 写道:

On 02. 08. 21, 10:32, Xianting Tian wrote:


在 2021/8/2 下午3:25, Jiri Slaby 写道:

Hi,

why is this 2/2? I seem (Lore neither) to find 1/2.

You didn't receive 1/2?
[PATCH 1/2] tty: hvc: pass DMA capable memory to put_chars()
https://lkml.org/lkml/2021/8/1/8 


Oh, I did, but it's not properly threaded. PLease fix your setup.

Ok, thanks



On 01. 08. 21, 7:16, Xianting Tian wrote:
hvc framework will never pass stack memory to the put_chars() 
function,


Am I blind or missing something?

hvc_console_print(...)
{
  char c[N_OUTBUF]
...
  cons_ops[index]->put_chars(vtermnos[index], c, i);

The same here:

hvc_poll_put_char(..., char ch)
{
...
   n = hp->ops->put_chars(hp->vtermno, , 1);

AFAICS both of them *pass* a pointer to stack variable.


yes, I discussed the issue with Arnd before in below thread, you can 
get the history, thanks


https://lkml.org/lkml/2021/7/27/494 



So is this a v2? You should have noted that. And what changed from v1 
too.


I think yes, I should mentioned it in this patch, sorry for that:(




So the calling of kmemdup() is unnecessary, remove it.

Fixes: c4baad5029 ("virtio-console: avoid DMA from stack")


This patch doesn't "Fix" -- it reverts the commit. You should've 
CCed the author too.


yes, we discussed ther issue in above thread, which we CCed the author.


I don't see any input from the author?


Anyway, 1/2 does not even build, so you will send v3 with all the 
above fixed, hopefully.


yes, I will send v3 patch after I figured out a better solution based on 
Arnd's comments for the patch '1/2'.


Do you have any other suggestion for the solution?

thanks.



thanks,


Re: [PATCH 2/2] virtio-console: remove unnecessary kmemdup()

2021-08-02 Thread Jiri Slaby

On 02. 08. 21, 10:32, Xianting Tian wrote:


在 2021/8/2 下午3:25, Jiri Slaby 写道:

Hi,

why is this 2/2? I seem (Lore neither) to find 1/2.

You didn't receive 1/2?
[PATCH 1/2] tty: hvc: pass DMA capable memory to put_chars()
https://lkml.org/lkml/2021/8/1/8 


Oh, I did, but it's not properly threaded. PLease fix your setup.


On 01. 08. 21, 7:16, Xianting Tian wrote:

hvc framework will never pass stack memory to the put_chars() function,


Am I blind or missing something?

hvc_console_print(...)
{
  char c[N_OUTBUF]
...
  cons_ops[index]->put_chars(vtermnos[index], c, i);

The same here:

hvc_poll_put_char(..., char ch)
{
...
   n = hp->ops->put_chars(hp->vtermno, , 1);

AFAICS both of them *pass* a pointer to stack variable.


yes, I discussed the issue with Arnd before in below thread,  you can 
get the history, thanks


https://lkml.org/lkml/2021/7/27/494 


So is this a v2? You should have noted that. And what changed from v1 too.


So the calling of kmemdup() is unnecessary, remove it.

Fixes: c4baad5029 ("virtio-console: avoid DMA from stack")


This patch doesn't "Fix" -- it reverts the commit. You should've CCed 
the author too.


yes, we discussed ther issue in above thread, which we CCed the author.


I don't see any input from the author?


Anyway, 1/2 does not even build, so you will send v3 with all the above 
fixed, hopefully.


thanks,
--
js


Re: [PATCH v3] fpga: dfl: fme: Fix cpu hotplug issue in performance reporting

2021-08-02 Thread kajoljain



On 7/13/21 1:12 PM, Kajol Jain wrote:
> The performance reporting driver added cpu hotplug
> feature but it didn't add pmu migration call in cpu
> offline function.
> This can create an issue incase the current designated
> cpu being used to collect fme pmu data got offline,
> as based on current code we are not migrating fme pmu to
> new target cpu. Because of that perf will still try to
> fetch data from that offline cpu and hence we will not
> get counter data.
> 
> Patch fixed this issue by adding pmu_migrate_context call
> in fme_perf_offline_cpu function.
> 
> Fixes: 724142f8c42a ("fpga: dfl: fme: add performance reporting support")
> Tested-by: Xu Yilun 
> Acked-by: Wu Hao 
> Signed-off-by: Kajol Jain 
> Cc: sta...@vger.kernel.org
> ---

Any update on this patch? Please let me know if any changes required.

Thanks,
Kajol Jain

>  drivers/fpga/dfl-fme-perf.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> ---
> Changelog:
> v2 -> v3:
> - Added Acked-by tag
> - Removed comment as suggested by Wu Hao
> - Link to patch v2: https://lkml.org/lkml/2021/7/9/143
> 
> v1 -> v2:
> - Add sta...@vger.kernel.org in cc list
> - Link to patch v1: https://lkml.org/lkml/2021/6/28/275
> 
> RFC -> PATCH v1
> - Remove RFC tag
> - Did nits changes on subject and commit message as suggested by Xu Yilun
> - Added Tested-by tag
> - Link to rfc patch: https://lkml.org/lkml/2021/6/28/112
> ---
> 
> diff --git a/drivers/fpga/dfl-fme-perf.c b/drivers/fpga/dfl-fme-perf.c
> index 4299145ef347..587c82be12f7 100644
> --- a/drivers/fpga/dfl-fme-perf.c
> +++ b/drivers/fpga/dfl-fme-perf.c
> @@ -953,6 +953,8 @@ static int fme_perf_offline_cpu(unsigned int cpu, struct 
> hlist_node *node)
>   return 0;
>  
>   priv->cpu = target;
> + perf_pmu_migrate_context(>pmu, cpu, target);
> +
>   return 0;
>  }
>  
> 


[PATCH v4 4/4] powerpc/papr_scm: Document papr_scm sysfs event format entries

2021-08-02 Thread Kajol Jain
Details is added for the event, cpumask and format attributes
in the ABI documentation.

Acked-by: Peter Zijlstra (Intel) 
Reviewed-by: Madhavan Srinivasan 
Tested-by: Nageswara R Sastry 
Signed-off-by: Kajol Jain 
---
 Documentation/ABI/testing/sysfs-bus-papr-pmem | 31 +++
 1 file changed, 31 insertions(+)

diff --git a/Documentation/ABI/testing/sysfs-bus-papr-pmem 
b/Documentation/ABI/testing/sysfs-bus-papr-pmem
index 95254cec92bf..4d86252448f8 100644
--- a/Documentation/ABI/testing/sysfs-bus-papr-pmem
+++ b/Documentation/ABI/testing/sysfs-bus-papr-pmem
@@ -61,3 +61,34 @@ Description:
* "CchRHCnt" : Cache Read Hit Count
* "CchWHCnt" : Cache Write Hit Count
* "FastWCnt" : Fast Write Count
+
+What:  /sys/devices/nmemX/format
+Date:  June 2021
+Contact:   linuxppc-dev , 
nvd...@lists.linux.dev,
+Description:   (RO) Attribute group to describe the magic bits
+that go into perf_event_attr.config for a particular pmu.
+(See ABI/testing/sysfs-bus-event_source-devices-format).
+
+Each attribute under this group defines a bit range of the
+perf_event_attr.config. Supported attribute is listed
+below::
+
+   event  = "config:0-4"  - event ID
+
+   For example::
+   noopstat = "event=0x1"
+
+What:  /sys/devices/nmemX/events
+Date:  June 2021
+Contact:   linuxppc-dev , 
nvd...@lists.linux.dev,
+Description:(RO) Attribute group to describe performance monitoring
+events specific to papr-scm. Each attribute in this group 
describes
+a single performance monitoring event supported by this nvdimm 
pmu.
+The name of the file is the name of the event.
+(See ABI/testing/sysfs-bus-event_source-devices-events).
+
+What:  /sys/devices/nmemX/cpumask
+Date:  June 2021
+Contact:   linuxppc-dev , 
nvd...@lists.linux.dev,
+Description:   (RO) This sysfs file exposes the cpumask which is designated to 
make
+HCALLs to retrieve nvdimm pmu event counter data.
-- 
2.26.2



[PATCH v4 3/4] powerpc/papr_scm: Add perf interface support

2021-08-02 Thread Kajol Jain
Performance monitoring support for papr-scm nvdimm devices
via perf interface is added which includes addition of pmu
functions like add/del/read/event_init for nvdimm_pmu struture.

A new parameter 'priv' in added to the pdev_archdata structure to save
nvdimm_pmu device pointer, to handle the unregistering of pmu device.

papr_scm_pmu_register function populates the nvdimm_pmu structure
with events, cpumask, attribute groups along with event handling
functions. Finally the populated nvdimm_pmu structure is passed to
register the pmu device.
Event handling functions internally uses hcall to get events and
counter data.

Result in power9 machine with 2 nvdimm device:

Ex: List all event by perf list

command:# perf list nmem

  nmem0/cchrhcnt/[Kernel PMU event]
  nmem0/cchwhcnt/[Kernel PMU event]
  nmem0/critrscu/[Kernel PMU event]
  nmem0/ctlresct/[Kernel PMU event]
  nmem0/ctlrestm/[Kernel PMU event]
  nmem0/fastwcnt/[Kernel PMU event]
  nmem0/hostlcnt/[Kernel PMU event]
  nmem0/hostldur/[Kernel PMU event]
  nmem0/hostscnt/[Kernel PMU event]
  nmem0/hostsdur/[Kernel PMU event]
  nmem0/medrcnt/ [Kernel PMU event]
  nmem0/medrdur/ [Kernel PMU event]
  nmem0/medwcnt/ [Kernel PMU event]
  nmem0/medwdur/ [Kernel PMU event]
  nmem0/memlife/ [Kernel PMU event]
  nmem0/noopstat/[Kernel PMU event]
  nmem0/ponsecs/ [Kernel PMU event]
  nmem1/cchrhcnt/[Kernel PMU event]
  nmem1/cchwhcnt/[Kernel PMU event]
  nmem1/critrscu/[Kernel PMU event]
  ...
  nmem1/noopstat/[Kernel PMU event]
  nmem1/ponsecs/ [Kernel PMU event]

Acked-by: Peter Zijlstra (Intel) 
Reviewed-by: Madhavan Srinivasan 
Tested-by: Nageswara R Sastry 
Signed-off-by: Kajol Jain 
---
 arch/powerpc/include/asm/device.h |   5 +
 arch/powerpc/platforms/pseries/papr_scm.c | 365 ++
 2 files changed, 370 insertions(+)

diff --git a/arch/powerpc/include/asm/device.h 
b/arch/powerpc/include/asm/device.h
index 219559d65864..47ed639f3b8f 100644
--- a/arch/powerpc/include/asm/device.h
+++ b/arch/powerpc/include/asm/device.h
@@ -48,6 +48,11 @@ struct dev_archdata {
 
 struct pdev_archdata {
u64 dma_mask;
+   /*
+* Pointer to nvdimm_pmu structure, to handle the unregistering
+* of pmu device
+*/
+   void *priv;
 };
 
 #endif /* _ASM_POWERPC_DEVICE_H */
diff --git a/arch/powerpc/platforms/pseries/papr_scm.c 
b/arch/powerpc/platforms/pseries/papr_scm.c
index f48e87ac89c9..26900101e638 100644
--- a/arch/powerpc/platforms/pseries/papr_scm.c
+++ b/arch/powerpc/platforms/pseries/papr_scm.c
@@ -19,6 +19,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 
 #define BIND_ANY_ADDR (~0ul)
 
@@ -68,6 +70,8 @@
 #define PAPR_SCM_PERF_STATS_EYECATCHER __stringify(SCMSTATS)
 #define PAPR_SCM_PERF_STATS_VERSION 0x1
 
+#define to_nvdimm_pmu(_pmu)container_of(_pmu, struct nvdimm_pmu, pmu)
+
 /* Struct holding a single performance metric */
 struct papr_scm_perf_stat {
u8 stat_id[8];
@@ -120,6 +124,12 @@ struct papr_scm_priv {
 
/* length of the stat buffer as expected by phyp */
size_t stat_buffer_len;
+
+/* array to have event_code and stat_id mappings */
+   char **nvdimm_events_map;
+
+   /* count of supported events */
+   u32 total_events;
 };
 
 static int papr_scm_pmem_flush(struct nd_region *nd_region,
@@ -340,6 +350,354 @@ static ssize_t drc_pmem_query_stats(struct papr_scm_priv 
*p,
return 0;
 }
 
+PMU_FORMAT_ATTR(event, "config:0-4");
+
+static struct attribute *nvdimm_pmu_format_attr[] = {
+   _attr_event.attr,
+   NULL,
+};
+
+static struct attribute_group nvdimm_pmu_format_group = {
+   .name = "format",
+   .attrs = nvdimm_pmu_format_attr,
+};
+
+static int papr_scm_pmu_get_value(struct perf_event *event, struct device 
*dev, u64 *count)
+{
+   struct papr_scm_perf_stat *stat;
+   struct papr_scm_perf_stats *stats;
+   struct papr_scm_priv *p = (struct papr_scm_priv *)dev->driver_data;
+   int rc, size;
+
+   /* Allocate request buffer enough to hold single performance stat */
+   size = sizeof(struct papr_scm_perf_stats) +
+   sizeof(struct papr_scm_perf_stat);
+
+   if (!p || !p->nvdimm_events_map)
+   

[PATCH v4 2/4] drivers/nvdimm: Add perf interface to expose nvdimm performance stats

2021-08-02 Thread Kajol Jain
A common interface is added to get performance stats reporting
support for nvdimm devices. Added interface includes support for
pmu register/unregister functions, cpu hotplug and pmu event
functions like event_init/add/read/del.
User could use the standard perf tool to access perf
events exposed via pmu.

Acked-by: Peter Zijlstra (Intel) 
Reviewed-by: Madhavan Srinivasan 
Tested-by: Nageswara R Sastry 
Signed-off-by: Kajol Jain 
---
 drivers/nvdimm/Makefile  |   1 +
 drivers/nvdimm/nd_perf.c | 230 +++
 include/linux/nd.h   |   3 +
 3 files changed, 234 insertions(+)
 create mode 100644 drivers/nvdimm/nd_perf.c

diff --git a/drivers/nvdimm/Makefile b/drivers/nvdimm/Makefile
index 29203f3d3069..25dba6095612 100644
--- a/drivers/nvdimm/Makefile
+++ b/drivers/nvdimm/Makefile
@@ -18,6 +18,7 @@ nd_e820-y := e820.o
 libnvdimm-y := core.o
 libnvdimm-y += bus.o
 libnvdimm-y += dimm_devs.o
+libnvdimm-y += nd_perf.o
 libnvdimm-y += dimm.o
 libnvdimm-y += region_devs.o
 libnvdimm-y += region.o
diff --git a/drivers/nvdimm/nd_perf.c b/drivers/nvdimm/nd_perf.c
new file mode 100644
index ..4c49d1bc2a3c
--- /dev/null
+++ b/drivers/nvdimm/nd_perf.c
@@ -0,0 +1,230 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * nd_perf.c: NVDIMM Device Performance Monitoring Unit support
+ *
+ * Perf interface to expose nvdimm performance stats.
+ *
+ * Copyright (C) 2021 IBM Corporation
+ */
+
+#define pr_fmt(fmt) "nvdimm_pmu: " fmt
+
+#include 
+
+static ssize_t nvdimm_pmu_cpumask_show(struct device *dev,
+  struct device_attribute *attr, char *buf)
+{
+   struct pmu *pmu = dev_get_drvdata(dev);
+   struct nvdimm_pmu *nd_pmu;
+
+   nd_pmu = container_of(pmu, struct nvdimm_pmu, pmu);
+
+   return cpumap_print_to_pagebuf(true, buf, cpumask_of(nd_pmu->cpu));
+}
+
+static int nvdimm_pmu_cpu_offline(unsigned int cpu, struct hlist_node *node)
+{
+   struct nvdimm_pmu *nd_pmu;
+   u32 target;
+   int nodeid;
+   const struct cpumask *cpumask;
+
+   nd_pmu = hlist_entry_safe(node, struct nvdimm_pmu, node);
+
+   /* Clear it, incase given cpu is set in nd_pmu->arch_cpumask */
+   cpumask_test_and_clear_cpu(cpu, _pmu->arch_cpumask);
+
+   /*
+* If given cpu is not same as current designated cpu for
+* counter access, just return.
+*/
+   if (cpu != nd_pmu->cpu)
+   return 0;
+
+   /* Check for any active cpu in nd_pmu->arch_cpumask */
+   target = cpumask_any(_pmu->arch_cpumask);
+
+   /*
+* Incase we don't have any active cpu in nd_pmu->arch_cpumask,
+* check in given cpu's numa node list.
+*/
+   if (target >= nr_cpu_ids) {
+   nodeid = cpu_to_node(cpu);
+   cpumask = cpumask_of_node(nodeid);
+   target = cpumask_any_but(cpumask, cpu);
+   }
+   nd_pmu->cpu = target;
+
+   /* Migrate nvdimm pmu events to the new target cpu if valid */
+   if (target >= 0 && target < nr_cpu_ids)
+   perf_pmu_migrate_context(_pmu->pmu, cpu, target);
+
+   return 0;
+}
+
+static int nvdimm_pmu_cpu_online(unsigned int cpu, struct hlist_node *node)
+{
+   struct nvdimm_pmu *nd_pmu;
+
+   nd_pmu = hlist_entry_safe(node, struct nvdimm_pmu, node);
+
+   if (nd_pmu->cpu >= nr_cpu_ids)
+   nd_pmu->cpu = cpu;
+
+   return 0;
+}
+
+static int create_cpumask_attr_group(struct nvdimm_pmu *nd_pmu)
+{
+   struct perf_pmu_events_attr *attr;
+   struct attribute **attrs;
+   struct attribute_group *nvdimm_pmu_cpumask_group;
+
+   attr = kzalloc(sizeof(*attr), GFP_KERNEL);
+   if (!attr)
+   return -ENOMEM;
+
+   attrs = kzalloc(2 * sizeof(struct attribute *), GFP_KERNEL);
+   if (!attrs) {
+   kfree(attr);
+   return -ENOMEM;
+   }
+
+   /* Allocate memory for cpumask attribute group */
+   nvdimm_pmu_cpumask_group = kzalloc(sizeof(*nvdimm_pmu_cpumask_group), 
GFP_KERNEL);
+   if (!nvdimm_pmu_cpumask_group) {
+   kfree(attr);
+   kfree(attrs);
+   return -ENOMEM;
+   }
+
+   sysfs_attr_init(>attr.attr);
+   attr->attr.attr.name = "cpumask";
+   attr->attr.attr.mode = 0444;
+   attr->attr.show = nvdimm_pmu_cpumask_show;
+   attrs[0] = >attr.attr;
+   attrs[1] = NULL;
+
+   nvdimm_pmu_cpumask_group->attrs = attrs;
+   nd_pmu->attr_groups[NVDIMM_PMU_CPUMASK_ATTR] = nvdimm_pmu_cpumask_group;
+   return 0;
+}
+
+static int nvdimm_pmu_cpu_hotplug_init(struct nvdimm_pmu *nd_pmu)
+{
+   int nodeid, rc;
+   const struct cpumask *cpumask;
+
+   /*
+* Incase cpu hotplug is not handled by arch specific code
+* they can still provide required cpumask which can be used
+* to get designatd cpu for counter access.
+* Check for any active cpu in nd_pmu->arch_cpumask.
+*/
+   

[PATCH v4 1/4] drivers/nvdimm: Add nvdimm pmu structure

2021-08-02 Thread Kajol Jain
A structure is added, called nvdimm_pmu, for performance
stats reporting support of nvdimm devices. It can be used to add
nvdimm pmu data such as supported events and pmu event functions
like event_init/add/read/del with cpu hotplug support.

Acked-by: Peter Zijlstra (Intel) 
Reviewed-by: Madhavan Srinivasan 
Tested-by: Nageswara R Sastry 
Signed-off-by: Kajol Jain 
---
 include/linux/nd.h | 43 +++
 1 file changed, 43 insertions(+)

diff --git a/include/linux/nd.h b/include/linux/nd.h
index ee9ad76afbba..712499cf7335 100644
--- a/include/linux/nd.h
+++ b/include/linux/nd.h
@@ -8,6 +8,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 
 enum nvdimm_event {
NVDIMM_REVALIDATE_POISON,
@@ -23,6 +25,47 @@ enum nvdimm_claim_class {
NVDIMM_CCLASS_UNKNOWN,
 };
 
+/* Event attribute array index */
+#define NVDIMM_PMU_FORMAT_ATTR 0
+#define NVDIMM_PMU_EVENT_ATTR  1
+#define NVDIMM_PMU_CPUMASK_ATTR2
+#define NVDIMM_PMU_NULL_ATTR   3
+
+/**
+ * struct nvdimm_pmu - data structure for nvdimm perf driver
+ *
+ * @name: name of the nvdimm pmu device.
+ * @pmu: pmu data structure for nvdimm performance stats.
+ * @dev: nvdimm device pointer.
+ * @functions(event_init/add/del/read): platform specific pmu functions.
+ * @attr_groups: data structure for events, formats and cpumask
+ * @cpu: designated cpu for counter access.
+ * @node: node for cpu hotplug notifier link.
+ * @cpuhp_state: state for cpu hotplug notification.
+ * @arch_cpumask: cpumask to get designated cpu for counter access.
+ */
+struct nvdimm_pmu {
+   const char *name;
+   struct pmu pmu;
+   struct device *dev;
+   int (*event_init)(struct perf_event *event);
+   int  (*add)(struct perf_event *event, int flags);
+   void (*del)(struct perf_event *event, int flags);
+   void (*read)(struct perf_event *event);
+   /*
+* Attribute groups for the nvdimm pmu. Index 0 used for
+* format attribute, index 1 used for event attribute,
+* index 2 used for cpusmask attribute and index 3 kept as NULL.
+*/
+   const struct attribute_group *attr_groups[4];
+   int cpu;
+   struct hlist_node node;
+   enum cpuhp_state cpuhp_state;
+
+   /* cpumask provided by arch/platform specific code */
+   struct cpumask arch_cpumask;
+};
+
 struct nd_device_driver {
struct device_driver drv;
unsigned long type;
-- 
2.26.2



[PATCH v4 0/4] Add perf interface to expose nvdimm

2021-08-02 Thread Kajol Jain
Patchset adds performance stats reporting support for nvdimm.
Added interface includes support for pmu register/unregister
functions. A structure is added called nvdimm_pmu to be used for
adding arch/platform specific data such as supported events, cpumask
pmu event functions like event_init/add/read/del.
User could use the standard perf tool to access perf
events exposed via pmu.

Added implementation to expose IBM pseries platform nmem*
device performance stats using this interface.

Result from power9 pseries lpar with 2 nvdimm device:
command:# perf list nmem
  nmem0/cchrhcnt/[Kernel PMU event]
  nmem0/cchwhcnt/[Kernel PMU event]
  nmem0/critrscu/[Kernel PMU event]
  nmem0/ctlresct/[Kernel PMU event]
  nmem0/ctlrestm/[Kernel PMU event]
  nmem0/fastwcnt/[Kernel PMU event]
  nmem0/hostlcnt/[Kernel PMU event]
  nmem0/hostldur/[Kernel PMU event]
  nmem0/hostscnt/[Kernel PMU event]
  nmem0/hostsdur/[Kernel PMU event]
  nmem0/medrcnt/ [Kernel PMU event]
  nmem0/medrdur/ [Kernel PMU event]
  nmem0/medwcnt/ [Kernel PMU event]
  nmem0/medwdur/ [Kernel PMU event]
  nmem0/memlife/ [Kernel PMU event]
  nmem0/noopstat/[Kernel PMU event]
  nmem0/ponsecs/ [Kernel PMU event]
  nmem1/cchrhcnt/[Kernel PMU event]
  nmem1/cchwhcnt/[Kernel PMU event]
  nmem1/critrscu/[Kernel PMU event]
  ...
  nmem1/noopstat/[Kernel PMU event]
  nmem1/ponsecs/ [Kernel PMU event]

Patch1:
Introduces the nvdimm_pmu structure
Patch2:
Adds common interface to add arch/platform specific data
includes supported events, pmu event functions. It also
adds code for cpu hotplug support.
Patch3:
Add code in arch/powerpc/platform/pseries/papr_scm.c to expose
nmem* pmu. It fills in the nvdimm_pmu structure with event attrs
cpumask andevent functions and then registers the pmu by adding
callbacks to register_nvdimm_pmu.
Patch4:
Sysfs documentation patch

Changelog
---
v3 -> v4
- Rebase code on top of current papr_scm code without any logical
  changes.

- Added Acked-by tag from Peter Zijlstra and Reviewed by tag
  from Madhavan Srinivasan.

- Link to the patchset v3: https://lkml.org/lkml/2021/6/17/605

v2 -> v3
- Added Tested-by tag.

- Fix nvdimm mailing list in the ABI Documentation.

- Link to the patchset v2: https://lkml.org/lkml/2021/6/14/25

v1 -> v2
- Fix hotplug code by adding pmu migration call
  incase current designated cpu got offline. As
  pointed by Peter Zijlstra.

- Removed the retun -1 part from cpu hotplug offline
  function.

- Link to the patchset v1: https://lkml.org/lkml/2021/6/8/500

Kajol Jain (4):
  drivers/nvdimm: Add nvdimm pmu structure
  drivers/nvdimm: Add perf interface to expose nvdimm performance stats
  powerpc/papr_scm: Add perf interface support
  powerpc/papr_scm: Document papr_scm sysfs event format entries

 Documentation/ABI/testing/sysfs-bus-papr-pmem |  31 ++
 arch/powerpc/include/asm/device.h |   5 +
 arch/powerpc/platforms/pseries/papr_scm.c | 365 ++
 drivers/nvdimm/Makefile   |   1 +
 drivers/nvdimm/nd_perf.c  | 230 +++
 include/linux/nd.h|  46 +++
 6 files changed, 678 insertions(+)
 create mode 100644 drivers/nvdimm/nd_perf.c

-- 
2.26.2



Re: [PATCH 2/2] virtio-console: remove unnecessary kmemdup()

2021-08-02 Thread Jiri Slaby

Hi,

why is this 2/2? I seem (Lore neither) to find 1/2.

On 01. 08. 21, 7:16, Xianting Tian wrote:

hvc framework will never pass stack memory to the put_chars() function,


Am I blind or missing something?

hvc_console_print(...)
{
  char c[N_OUTBUF]
...
  cons_ops[index]->put_chars(vtermnos[index], c, i);

The same here:

hvc_poll_put_char(..., char ch)
{
...
   n = hp->ops->put_chars(hp->vtermno, , 1);

AFAICS both of them *pass* a pointer to stack variable.


So the calling of kmemdup() is unnecessary, remove it.

Fixes: c4baad5029 ("virtio-console: avoid DMA from stack")


This patch doesn't "Fix" -- it reverts the commit. You should've CCed 
the author too.



Signed-off-by: Xianting Tian 
---
  drivers/char/virtio_console.c | 12 ++--
  1 file changed, 2 insertions(+), 10 deletions(-)

diff --git a/drivers/char/virtio_console.c b/drivers/char/virtio_console.c
index 7eaf303a7..4ed3ffb1d 100644
--- a/drivers/char/virtio_console.c
+++ b/drivers/char/virtio_console.c
@@ -1117,8 +1117,6 @@ static int put_chars(u32 vtermno, const char *buf, int 
count)
  {
struct port *port;
struct scatterlist sg[1];
-   void *data;
-   int ret;
  
  	if (unlikely(early_put_chars))

return early_put_chars(vtermno, buf, count);
@@ -1127,14 +1125,8 @@ static int put_chars(u32 vtermno, const char *buf, int 
count)
if (!port)
return -EPIPE;
  
-	data = kmemdup(buf, count, GFP_ATOMIC);

-   if (!data)
-   return -ENOMEM;
-
-   sg_init_one(sg, data, count);
-   ret = __send_to_port(port, sg, 1, count, data, false);
-   kfree(data);
-   return ret;
+   sg_init_one(sg, buf, count);
+   return __send_to_port(port, sg, 1, count, (void *)buf, false);
  }
  
  /*





--
js
suse labs


Re: [PATCH 3/3] isystem: delete global -isystem compile option

2021-08-02 Thread Alexey Dobriyan
On Sun, Aug 01, 2021 at 04:32:47PM -0500, Segher Boessenkool wrote:
> On Sun, Aug 01, 2021 at 11:13:36PM +0300, Alexey Dobriyan wrote:
> > In theory, it enables "leakage" of userspace headers into kernel which
> > may present licensing problem.
> 
> > -NOSTDINC_FLAGS += -nostdinc -isystem $(shell $(CC) 
> > -print-file-name=include)
> > +NOSTDINC_FLAGS += -nostdinc
> 
> This is removing the compiler's own include files.  These are required
> for all kinds of basic features, and required to be compliant to the C
> standard at all.

No they are not required. Kernel uses its own bool, uintptr_t and
static_assert, memset(), CHAR_BIT. noreturn, alignas newest C standard
are next.

This version changelog didn't mention but kernel would use
-ffreestanding too if not other problems with the flag.

> These are not "userspace headers", that is what
> -nostdinc takes care of already.

They are userspace headers in the sense they are external to the project
just like userspace programs are external to the kernel.

> In the case of GCC all these headers are GPL-with-runtime-exception, so
> claiming this can cause licensing problems is fearmongering.

I agree licensing problem doesn't really exist.
It would take gcc drop-in replacement with authors insane enough to not
license standard headers properly.

> I strongly advise against doing this.

Kernel chose to be self-contained. -isystem removal makes sense then.
It will be used for intrinsics where necessary.


Re: [PATCH] powerpc/xive: Do not skip CPU-less nodes when creating the IPIs

2021-08-02 Thread Michael Ellerman
Cédric Le Goater  writes:
> On PowerVM, CPU-less nodes can be populated with hot-plugged CPUs at
> runtime. Today, the IPI is not created for such nodes, and hot-plugged
> CPUs use a bogus IPI, which leads to soft lockups.
>
> We could create the node IPI on demand but it is a bit complex because
> this code would be called under bringup_up() and some IRQ locking is
> being done. The simplest solution is to create the IPIs for all nodes
> at startup.
>
> Fixes: 7dcc37b3eff9 ("powerpc/xive: Map one IPI interrupt per node")
> Cc: sta...@vger.kernel.org # v5.13
> Reported-by: Geetika Moolchandani 
> Cc: Srikar Dronamraju 
> Signed-off-by: Cédric Le Goater 
> ---
>
> This patch breaks old versions of irqbalance (<= v1.4). Possible nodes
> are collected from /sys/devices/system/node/ but CPU-less nodes are
> not listed there. When interrupts are scanned, the link representing
> the node structure is NULL and segfault occurs.

Breaking userspace is usually frowned upon, even if it is irqbalance.

If CPU-less nodes appeared in /sys/devices/system/node would that fix
it? Could we do that or is that not possible for other reasons?

> Version 1.7 seems immune. 

Which was released in August 2020.

Looks like some distros still ship 1.6, I take it you're not sure if
that is broken or not.

cheers


RE: Possible regression by ab037dd87a2f (powerpc/vdso: Switch VDSO to generic C implementation.)

2021-08-02 Thread Michael Ellerman
"Paul Murphy"  writes:
>  
> (My apologies for however IBM's email client munges this)
>
>> I heard it is going to be in Go 1.16.7, but I do not know much about Go.
>> Maybe the folks in Cc can chime in.
>  
>  
> We have backports primed and ready for the next point release. They
> are waiting on the release manager to cherrypick them.

OK good, that is still the correct fix in the long run.

> I think we were aware that our VDSO usage may have exploited some
> peculiarities in how the ppc64 version was constructed (i.e hand
> written assembly which just didn't happen to clobber R30).

Yeah I was "somewhat surprised" that Go thought it could use r30 like
that across a VDSO call :D

But to be fair the ABI of the VDSO has always been a little fishy,
because the entry points pretend to be a transparent wrapper around
system calls, but then in a case like this are not.

> Go up to this point has only used the vdso function __kernel_clock_gettime; it
> is the only entry point which would need to explicitly avoid R30 for
> Go's sake.

I thought about limiting the workaround to just that code, but it seemed
silly and likely to come back to bite us. Once the compilers decides to
spill a non-volatile there are plenty of other ones to choose from.

cheers


Re: Possible regression by ab037dd87a2f (powerpc/vdso: Switch VDSO to generic C implementation.)

2021-08-02 Thread Michael Ellerman
Andreas Schwab  writes:
> On Jul 29 2021, Michael Ellerman wrote:
>
>> I haven't been able to reproduce the crash by following the instructions
>> in your bug report, I have go1.13.8, I guess the crash is only in newer
>> versions?
>
> Yes, only go1.14 and later are affected.
>
> https://build.opensuse.org/package/live_build_log/openSUSE:Factory:PowerPC/go1.13/standard/ppc64
> https://build.opensuse.org/package/live_build_log/openSUSE:Factory:PowerPC/go1.14/standard/ppc64

Thanks. That helps explain why I didn't see it, my test boxes have 1.13 
installed.

cheers


Re: [PATCH kernel] powerpc/powernv: Check if powernv_rng is initialized

2021-08-02 Thread Michael Ellerman
Alexey Kardashevskiy  writes:
> The powernv-rng driver has 2 users - the bare metal powernv platform and
> the KVM's H_RANDOM hcall. The hcall handler works fine when it is L0 KVM
> but fails in L1 KVM as there is no support for the HW registers in L1 VMs
> and such support is not advertised either (== no "ibm,power-rng" in
> the FDT). So when a nested VM tries H_RANDOM, the L1 KVM crashes on
> in_be64(rng->regs).
>
> This checks the pointers and returns an error if the feature is not
> set up.
>
> Signed-off-by: Alexey Kardashevskiy 
> ---
>
>
> Randomly randomized H_RANDOM:
>
> 00:00:45 executing program 10:
> r0 = openat$kvm(0xff9c, &(0x7f00), 0x0, 0x0)
> r1 = ioctl$KVM_CREATE_VM(r0, 0x2000ae01, 0x0)
> r2 = ioctl$KVM_CREATE_VCPU(r1, 0x2000ae41, 0x0)
> ioctl$KVM_SET_REGS(r2, 0x8188ae82, &(0x7f0001c0)={[0x0, 0x0, 
> 0xffe1, 0x0, 0x0, 0x20953, 0x0, 0xfffe, 0x0, 0x0, 
> 0x2], 0x2000})
> syz_kvm_setup_cpu$ppc64(0x, r2, 
> &(0x7fe8/0x18)=nil, 0x0, 0x0, 0x0, 0x0, 0x0)
> r3 = openat$kvm(0xff9c, &(0x7f000100), 0x0, 0x0)
> syz_kvm_setup_cpu$ppc64(r1, r2, &(0x7fe7/0x18)=nil, 
> &(0x7f80)=[{0x0, 
> &(0x7f000280)="e03d0080ef61e403ef79ef650900ef61647b007ce03fff63e403ff7bff679952ff6370e63f7e603c6360e403637863640003636018a8803c28bf8460e4038478ef97846436888460b6f6a03c88d6a560e403a5781beda564d879a5602665c03cb08dc660e403c67806b3c664966fc660d53fe03cddf1e760e403e7785c41e7646623e7602244463fb1f2803e00809462e403947a946604009462a6a6607f4abb4c13603f7b63e4037b7b7b679a367b6332d9c17c201c994f7201004cbb7a603f72047b63e4037b7b955f7b6799947b636401607f",
>  0xf0}], 0x1, 0x0, &(0x7fc0)=[@featur2={0x1, 0x1000}], 0x1)
>
>
> cpu 0xd: Vector: 300 (Data Access) at [c0001599f590]
> pc: c011d2bc: powernv_get_random_long+0x4c/0xc0
> lr: c011d298: powernv_get_random_long+0x28/0xc0
> sp: c0001599f830
>msr: 8280b033
>dar: 0
>  dsisr: 4000
>   current = 0xc000614c7f80
>   paca= 0xc000fff81700 irqmask: 0x03   irq_happened: 0x01
> pid   = 31576, comm = syz-executor.10
>
> Linux version 5.14.0-rc2-le_f29cf1ff9a23_a+fstn1 (aik@fstn1-p1) (gcc (Ubuntu 
> 10.3.0-1ubuntu1) 10.3.0, GNU ld (GNU Binutils for Ubuntu) 2.36.1) #263 SMP 
> Thu Jul 29 17:56:12 AEST 2021
> enter ? for help
> [c0001599f860] c01e45f8 kvmppc_pseries_do_hcall+0x5d8/0x2190
> [c0001599f8f0] c01ea2dc kvmppc_vcpu_run_hv+0x31c/0x14d0
> [c0001599f9c0] c01bd518 kvmppc_vcpu_run+0x48/0x60
> [c0001599f9f0] c01b74b0 kvm_arch_vcpu_ioctl_run+0x580/0x7d0
> [c0001599fa90] c019e6f8 kvm_vcpu_ioctl+0x418/0xd00
> [c0001599fc70] c079d8c4 sys_ioctl+0xb44/0x2100
> [c0001599fd90] c003b704 system_call_exception+0x224/0x410
> [c0001599fe10] c000c0e8 system_call_vectored_common+0xe8/0x278

There would be no bug if KVM was using arch_get_random_seed_long(),
because that defers to ppc_md, which is only populated when the RNG is
setup correctly. That seems like a better fix?

cheers