Re: [RFC PATCH 0/4] powerpc/papr_scm: Add support for reporting NVDIMM performance statistics

2020-10-21 Thread Michal Suchánek
Hello,

apparently this has not received any (public) comments.

Maybe resend without the RFC status?

Clearly the kernel interface must be defined first, and then ndctl can
follow and make use of it.

Thanks

Michal

On Mon, May 18, 2020 at 04:38:10PM +0530, Vaibhav Jain wrote:
> The patch-set proposes to add support for fetching and reporting
> performance statistics for PAPR compliant NVDIMMs as described in
> documentation for the H_SCM_PERFORMANCE_STATS hcall (Ref[1]). The patch-set
> also implements mechanisms to expose NVDIMM performance stats via
> sysfs and newly introduced PDSMs[2] for libndctl.
> 
> This patch-set combined with corresponding ndctl and libndctl changes
> proposed at Ref[3] should enable users to fetch performance stats for
> PAPR compliant NVDIMMs using the following command:
> 
>  # ndctl list -D --stats
> [
>   {
> "dev":"nmem0",
> "stats":{
>   "Controller Reset Count":2,
>   "Controller Reset Elapsed Time":603331,
>   "Power-on Seconds":603931,
>   "Life Remaining":"100%",
>   "Critical Resource Utilization":"0%",
>   "Host Load Count":5781028,
>   "Host Store Count":8966800,
>   "Host Load Duration":975895365,
>   "Host Store Duration":716230690,
>   "Media Read Count":0,
>   "Media Write Count":6313,
>   "Media Read Duration":0,
>   "Media Write Duration":9679615,
>   "Cache Read Hit Count":5781028,
>   "Cache Write Hit Count":8442479,
>   "Fast Write Count":8969912
> }
>   }
> ]
> 
> The patch-set depends on the existing patch-set "[PATCH v7 0/5]
> powerpc/papr_scm: Add support for reporting nvdimm health" available
> at Ref[2], which adds support for reporting PAPR compliant NVDIMMs in
> the 'papr_scm' kernel module.
> 
> Structure of the patch-set
> ==
> 
> The patch-set starts with implementing functionality in the papr_scm
> module to issue the H_SCM_PERFORMANCE_STATS hcall, fetch & parse dimm
> performance stats, and expose them as a PAPR specific libnvdimm
> attribute named 'perf_stats'.
> 
> Patch-2 introduces a new PDSM named FETCH_PERF_STATS that can be
> issued by libndctl, asking papr_scm to issue the
> H_SCM_PERFORMANCE_STATS hcall using the helpers introduced earlier
> and store the results in a dimm specific perf-stats-buffer.
> 
> Patch-3 introduces a new PDSM named READ_PERF_STATS that can be
> issued by libndctl to read the perf-stats-buffer in an incremental
> manner to work around the 256-byte envelope limitation of libnvdimm.
> 
> Finally, Patch-4 introduces a new PDSM named GET_PERF_STAT that can be
> issued by libndctl to read the value of a specific NVDIMM performance
> stat such as "Life Remaining".
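
As an illustration of Patch-3's incremental read, a libndctl-side
consumer might loop over the perf-stats-buffer in envelope-sized
chunks. This sketch is hypothetical - the payload layout and the
issue_read_perf_stats() helper are illustrative, not taken from the
patch-set:

#include <linux/types.h>
#include <stddef.h>
#include <string.h>

/* hypothetical payload layout for the READ_PERF_STATS PDSM */
struct read_perf_stats_req {
	__u32 offset;        /* offset into the perf-stats-buffer */
	__u32 length;        /* bytes requested, at most 256 */
	__u8  data[256];     /* filled in by papr_scm */
};

/* stand-in for whatever libndctl wrapper submits the PDSM to a dimm */
extern int issue_read_perf_stats(int fd, struct read_perf_stats_req *req);

/* read the whole perf-stats-buffer one 256-byte envelope at a time */
static int read_all_stats(int fd, void *out, size_t total)
{
	struct read_perf_stats_req req;
	size_t done = 0;

	while (done < total) {
		size_t chunk = total - done;

		if (chunk > sizeof(req.data))
			chunk = sizeof(req.data);
		req.offset = done;
		req.length = chunk;
		if (issue_read_perf_stats(fd, &req))
			return -1;
		memcpy((char *)out + done, req.data, chunk);
		done += chunk;
	}
	return 0;
}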
> 
> References
> ==
> [1] Documentation/powerpc/papr_hcals.rst
> 
> [2] 
> https://lore.kernel.org/linux-nvdimm/20200508104922.72565-1-vaib...@linux.ibm.com/
> 
> [3] https://github.com/vaibhav92/ndctl/tree/papr_scm_stats_v1
> 
> Vaibhav Jain (4):
>   powerpc/papr_scm: Fetch nvdimm performance stats from PHYP
>   powerpc/papr_scm: Add support for PAPR_SCM_PDSM_FETCH_PERF_STATS
>   powerpc/papr_scm: Implement support for PAPR_SCM_PDSM_READ_PERF_STATS
>   powerpc/papr_scm: Add support for PDSM GET_PERF_STAT
> 
>  Documentation/ABI/testing/sysfs-bus-papr-scm  |  27 ++
>  arch/powerpc/include/uapi/asm/papr_scm_pdsm.h |  60 +++
>  arch/powerpc/platforms/pseries/papr_scm.c | 391 ++
>  3 files changed, 478 insertions(+)
> 
> -- 
> 2.26.2
> 


Re: [PATCH v4 2/2] lkdtm/powerpc: Add SLB multihit test

2020-10-19 Thread Michal Suchánek
On Mon, Oct 19, 2020 at 09:59:57PM +1100, Michael Ellerman wrote:
> Hi Ganesh,
> 
> Some comments below ...
> 
> Ganesh Goudar  writes:
> > To check machine check handling, add support to inject slb
> > multihit errors.
> >
> > Cc: Kees Cook 
> > Reviewed-by: Michal Suchánek 
> > Co-developed-by: Mahesh Salgaonkar 
> > Signed-off-by: Mahesh Salgaonkar 
> > Signed-off-by: Ganesh Goudar 
> > ---
> >  drivers/misc/lkdtm/Makefile |   1 +
> >  drivers/misc/lkdtm/core.c   |   3 +
> >  drivers/misc/lkdtm/lkdtm.h  |   3 +
> >  drivers/misc/lkdtm/powerpc.c| 156 
> >  tools/testing/selftests/lkdtm/tests.txt |   1 +
> >  5 files changed, 164 insertions(+)
> >  create mode 100644 drivers/misc/lkdtm/powerpc.c
> >
> ..
> > diff --git a/drivers/misc/lkdtm/powerpc.c b/drivers/misc/lkdtm/powerpc.c
> > new file mode 100644
> > index ..f388b53dccba
> > --- /dev/null
> > +++ b/drivers/misc/lkdtm/powerpc.c
> > @@ -0,0 +1,156 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +
> > +#include "lkdtm.h"
> > +#include 
> > +#include 
> 
> Usual style is to include the linux headers first and then the local header.
> 
> > +
> > +/* Gets index for new slb entry */
> > +static inline unsigned long get_slb_index(void)
> > +{
> > +   unsigned long index;
> > +
> > +   index = get_paca()->stab_rr;
> > +
> > +   /*
> > +* simple round-robin replacement of slb starting at SLB_NUM_BOLTED.
> > +*/
> > +   if (index < (mmu_slb_size - 1))
> > +   index++;
> > +   else
> > +   index = SLB_NUM_BOLTED;
> > +   get_paca()->stab_rr = index;
> > +   return index;
> > +}
> 
> I'm not sure we need that really?
> 
> We can just always insert at SLB_NUM_BOLTED and SLB_NUM_BOLTED + 1.
> 
> Or we could allocate from the top down using mmu_slb_size - 1, and
> mmu_slb_size - 2.
> 
> 
> > +#define slb_esid_mask(ssize)   \
> > +   (((ssize) == MMU_SEGSIZE_256M) ? ESID_MASK : ESID_MASK_1T)
> > +
> > +/* Form the operand for slbmte */
> > +static inline unsigned long mk_esid_data(unsigned long ea, int ssize,
> > +unsigned long slot)
> > +{
> > +   return (ea & slb_esid_mask(ssize)) | SLB_ESID_V | slot;
> > +}
> > +
> > +#define slb_vsid_shift(ssize)  \
> > +   ((ssize) == MMU_SEGSIZE_256M ? SLB_VSID_SHIFT : SLB_VSID_SHIFT_1T)
> > +
> > +/* Form the operand for slbmte */
> > +static inline unsigned long mk_vsid_data(unsigned long ea, int ssize,
> > +unsigned long flags)
> > +{
> > +   return (get_kernel_vsid(ea, ssize) << slb_vsid_shift(ssize)) | flags |
> > +   ((unsigned long)ssize << SLB_VSID_SSIZE_SHIFT);
> > +}
> 
> I realise it's not much code, but I'd rather those were in a header,
> rather than copied from slb.c. That way they can never skew vs the
> versions in slb.c
> 
> Best place I think would be arch/powerpc/include/asm/book3s/64/mmu-hash.h
> 
> 
> > +
> > +/* Inserts new slb entry */
> 
> It inserts two.
> 
> > +static void insert_slb_entry(char *p, int ssize)
> > +{
> > +   unsigned long flags, entry;
> > +
> > +   flags = SLB_VSID_KERNEL | mmu_psize_defs[MMU_PAGE_64K].sllp;
> 
> That won't work if the kernel is built for 4K pages. Or at least it
> won't work the way we want it to.
> 
> You should use mmu_linear_psize.
> 
> But for vmalloc you should use mmu_vmalloc_psize, so it will need to be
> a parameter.
> 
> > +   preempt_disable();
> > +
> > +   entry = get_slb_index();
> > +   asm volatile("slbmte %0,%1" :
> > +   : "r" (mk_vsid_data((unsigned long)p, ssize, flags)),
> > + "r" (mk_esid_data((unsigned long)p, ssize, entry))
> > +   : "memory");
> > +
> > +   entry = get_slb_index();
> > +   asm volatile("slbmte %0,%1" :
> > +   : "r" (mk_vsid_data((unsigned long)p, ssize, flags)),
> > + "r" (mk_esid_data((unsigned long)p, ssize, entry))
> > +   : "memory");
> > +   preempt_enable();
> > +   /*
> > +* This triggers an exception. If handled correctly we must recover
> > +* from this error.
> > +*/
> > +   p[0] = '!';
> 
> That doesn't belong in here, it should be don

Re: KVM on POWER8 host lock up since 10d91611f426 ("powerpc/64s: Reimplement book3s idle code in C")

2020-10-16 Thread Michal Suchánek
On Mon, Sep 07, 2020 at 11:13:47PM +1000, Nicholas Piggin wrote:
> Excerpts from Michael Ellerman's message of August 31, 2020 8:50 pm:
> > Michal Suchánek  writes:
> >> On Mon, Aug 31, 2020 at 11:14:18AM +1000, Nicholas Piggin wrote:
> >>> Excerpts from Michal Suchánek's message of August 31, 2020 6:11 am:
> >>> > Hello,
> >>> > 
> >>> > on POWER8 KVM hosts lock up since commit 10d91611f426 ("powerpc/64s:
> >>> > Reimplement book3s idle code in C").
> >>> > 
> >>> > The symptom is host locking up completely after some hours of KVM
> >>> > workload with messages like
> >>> > 
> >>> > 2020-08-30T10:51:31+00:00 obs-power8-01 kernel: KVM: couldn't grab cpu 
> >>> > 47
> >>> > 2020-08-30T10:51:31+00:00 obs-power8-01 kernel: KVM: couldn't grab cpu 
> >>> > 71
> >>> > 2020-08-30T10:51:31+00:00 obs-power8-01 kernel: KVM: couldn't grab cpu 
> >>> > 47
> >>> > 2020-08-30T10:51:31+00:00 obs-power8-01 kernel: KVM: couldn't grab cpu 
> >>> > 71
> >>> > 2020-08-30T10:51:31+00:00 obs-power8-01 kernel: KVM: couldn't grab cpu 
> >>> > 47
> >>> > 
> >>> > printed before the host locks up.
> >>> > 
> >>> > The machines run sandboxed builds, which is a mixed workload resulting in
> >>> > IO/single core/multiple core load over time, and there are periods of no
> >>> > activity with no VMs running as well. The VMs are short-lived, so VM
> >>> > setup/teardown is somewhat exercised as well.
> >>> > 
> >>> > POWER9 with the new guest entry fast path does not seem to be affected.
> >>> > 
> >>> > Reverted the patch and the followup idle fixes on top of 5.2.14 and
> >>> > re-applied commit a3f3072db6ca ("powerpc/powernv/idle: Restore IAMR
> >>> > after idle") which gives same idle code as 5.1.16 and the kernel seems
> >>> > stable.
> >>> > 
> >>> > Config is attached.
> >>> > 
> >>> > I cannot easily revert this commit, especially if I want to use the same
> >>> > kernel on POWER8 and POWER9 - many of the POWER9 fixes are applicable
> >>> > only to the new idle code.
> >>> > 
> >>> > Any idea what can be the problem?
> >>> 
> >>> So hwthread_state is never getting back to HWTHREAD_IN_IDLE on
> >>> those threads. I wonder what they are doing. POWER8 doesn't have a good
> >>> NMI IPI and I don't know if it supports pdbg dumping registers from the
> >>> BMC unfortunately.
> >>
> >> It may be possible to set up fadump with a later kernel version that
> >> supports it on powernv and dump the whole kernel.
> > 
> > Your firmware won't support it AFAIK.
> > 
> > You could try kdump, but if we have CPUs stuck in KVM then there's a
> > good chance it won't work :/
> 
> I haven't had any luck reproducing this yet. Testing with sub-cores
> in various different combinations, etc. I'll keep trying though.

Hello,

I tried running some KVM guests to simulate the workload, and what I get
is guests failing to start with an RCU stall. I tried both the 5.3 and
5.9 kernels, and QEMU 4.2.1 and 5.1.0.

To start some guests I run

for i in $(seq 0 9) ; do
    /opt/qemu/bin/qemu-system-ppc64 -m 2048 -accel kvm -smp 8 \
        -kernel /boot/vmlinux -initrd /boot/initrd -nodefaults \
        -nographic -serial mon:telnet::444$i,server,wait &
done

To simulate some workload I run

xz -zc9T0 < /dev/zero > /dev/null &
while true; do
killall -STOP xz; sleep 1; killall -CONT xz; sleep 1;
done &

on the host, and add a job to the ramdisk that executes this. However,
most guests never get to the point where the job is executed.

Any idea what might be the problem?

In the past I was able to boot guests quite reliably.

This is boot log of one of the VMs

Trying ::1...
Connected to localhost.
Escape character is '^]'.


SLOF **
QEMU Starting
 Build Date = Jul 17 2020 11:15:24
 FW Version = git-e18ddad8516ff2cf
 Press "s" to enter Open Firmware.

Populating /vdevice methods
Populating /vdevice/vty@7100
Populating /vdevice/nvram@7101
Populating /pci@8002000
No NVRAM common partition, re-initializing...
Scanning USB 
Using default console: /vdevice/vty@7100
Detected RAM kernel at 40 (27c8620 bytes) 
 
  Welcome to Open Firmware

  Copyright (c) 2004, 2017 IBM Corporation All rights reserved.
  This pro

Re: [PATCH v6 02/11] mm/gup: Use functions to track lockless pgtbl walks on gup_pgd_range

2020-10-15 Thread Michal Suchánek
Hello,

On Thu, Feb 06, 2020 at 12:25:18AM -0300, Leonardo Bras wrote:
> On Thu, 2020-02-06 at 00:08 -0300, Leonardo Bras wrote:
> > gup_pgd_range(addr, end, gup_flags, pages, &nr);
> > -   local_irq_enable();
> > +   end_lockless_pgtbl_walk(IRQS_ENABLED);
> > ret = nr;
> > }
> >  
> 
> Just noticed IRQS_ENABLED is not available on other archs than ppc64.
> I will fix this for v7.

Has there been a v7?

I cannot find it.

Thanks

Michal


Re: [PATCH] powerpc/perf: fix Threshold Event Counter Multiplier width for P10

2020-10-13 Thread Michal Suchánek
On Tue, Oct 13, 2020 at 06:27:05PM +0530, Madhavan Srinivasan wrote:
> 
> On 10/12/20 4:59 PM, Michal Suchánek wrote:
> > Hello,
> > 
> > On Mon, Oct 12, 2020 at 04:01:28PM +0530, Madhavan Srinivasan wrote:
> > > Power9 and isa v3.1 has 7bit mantissa field for Threshold Event Counter
> >^^^ Shouldn't this be 3.0?
> 
> My bad, what I meant was
> 
> Power9, ISA v3.0 and ISA v3.1 define a 7 bit mantissa field for Threshold
> Event Counter Multiplier(TECM).
I am really confused.

The following text and the code suggest that the mantissa is 8 bits on
POWER10 and ISA v3.1.
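
For reference, this is the width difference in question - a sketch only;
the pre-P10 7-bit mask is quoted from isa207-common.h as I read it and
may differ in other trees:

/* ISA v3.0 / Power9: 7-bit mantissa (existing macro) */
#define MMCRA_THR_CTR_MANT_MASK		0x7Ful
/* ISA v3.1 / Power10: 8-bit mantissa (added by this patch) */
#define P10_MMCRA_THR_CTR_MANT_MASK	0xFFul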

Thanks

Michal
> 
> Maddy
> 
> > 
> > > Multiplier (TECM). TECM is part of Monitor Mode Control Register A 
> > > (MMCRA).
> > > This field along with Threshold Event Counter Exponent (TECE) is used to
> > > get the threshold counter value. In Power10, the width of the TECM field
> > > is increased to 8 bits. The patch fixes the current code by modifying the
> > > MMCRA[TECM] extraction macro to handle this change.
> > > 
> > > Fixes: 170a315f41c64 ('powerpc/perf: Support to export MMCRA[TEC*] field 
> > > to userspace')
> > > Signed-off-by: Madhavan Srinivasan 
> > > ---
> > >   arch/powerpc/perf/isa207-common.c | 3 +++
> > >   arch/powerpc/perf/isa207-common.h | 4 
> > >   2 files changed, 7 insertions(+)
> > > 
> > > diff --git a/arch/powerpc/perf/isa207-common.c 
> > > b/arch/powerpc/perf/isa207-common.c
> > > index 964437adec18..5fe129f02290 100644
> > > --- a/arch/powerpc/perf/isa207-common.c
> > > +++ b/arch/powerpc/perf/isa207-common.c
> > > @@ -247,6 +247,9 @@ void isa207_get_mem_weight(u64 *weight)
> > >   u64 sier = mfspr(SPRN_SIER);
> > >   u64 val = (sier & ISA207_SIER_TYPE_MASK) >> 
> > > ISA207_SIER_TYPE_SHIFT;
> > > + if (cpu_has_feature(CPU_FTR_ARCH_31))
> > > + mantissa = P10_MMCRA_THR_CTR_MANT(mmcra);
> > > +
> > >   if (val == 0 || val == 7)
> > >   *weight = 0;
> > >   else
> > > diff --git a/arch/powerpc/perf/isa207-common.h 
> > > b/arch/powerpc/perf/isa207-common.h
> > > index 044de65e96b9..71380e854f48 100644
> > > --- a/arch/powerpc/perf/isa207-common.h
> > > +++ b/arch/powerpc/perf/isa207-common.h
> > > @@ -219,6 +219,10 @@
> > >   #define MMCRA_THR_CTR_EXP(v)(((v) >> 
> > > MMCRA_THR_CTR_EXP_SHIFT) &\
> > >   MMCRA_THR_CTR_EXP_MASK)
> > > +#define P10_MMCRA_THR_CTR_MANT_MASK  0xFFul
> > > +#define P10_MMCRA_THR_CTR_MANT(v)(((v) >> 
> > > MMCRA_THR_CTR_MANT_SHIFT) &\
> > > + P10_MMCRA_THR_CTR_MANT_MASK)
> > > +
> > >   /* MMCRA Threshold Compare bit constant for power9 */
> > >   #define p9_MMCRA_THR_CMP_SHIFT  45
> > > -- 
> > > 2.26.2
> > > 


Re: [PATCH] powerpc/perf: fix Threshold Event Counter Multiplier width for P10

2020-10-12 Thread Michal Suchánek
Hello,

On Mon, Oct 12, 2020 at 04:01:28PM +0530, Madhavan Srinivasan wrote:
> Power9 and isa v3.1 has 7bit mantissa field for Threshold Event Counter
  ^^^ Shouldn't this be 3.0?

> Multiplier (TECM). TECM is part of Monitor Mode Control Register A (MMCRA).
> This field along with Threshold Event Counter Exponent (TECE) is used to
> get the threshold counter value. In Power10, the width of the TECM field
> is increased to 8 bits. The patch fixes the current code by modifying the
> MMCRA[TECM] extraction macro to handle this change.
> 
> Fixes: 170a315f41c64 ('powerpc/perf: Support to export MMCRA[TEC*] field to 
> userspace')
> Signed-off-by: Madhavan Srinivasan 
> ---
>  arch/powerpc/perf/isa207-common.c | 3 +++
>  arch/powerpc/perf/isa207-common.h | 4 
>  2 files changed, 7 insertions(+)
> 
> diff --git a/arch/powerpc/perf/isa207-common.c 
> b/arch/powerpc/perf/isa207-common.c
> index 964437adec18..5fe129f02290 100644
> --- a/arch/powerpc/perf/isa207-common.c
> +++ b/arch/powerpc/perf/isa207-common.c
> @@ -247,6 +247,9 @@ void isa207_get_mem_weight(u64 *weight)
>   u64 sier = mfspr(SPRN_SIER);
>   u64 val = (sier & ISA207_SIER_TYPE_MASK) >> ISA207_SIER_TYPE_SHIFT;
>  
> + if (cpu_has_feature(CPU_FTR_ARCH_31))
> + mantissa = P10_MMCRA_THR_CTR_MANT(mmcra);
> +
>   if (val == 0 || val == 7)
>   *weight = 0;
>   else
> diff --git a/arch/powerpc/perf/isa207-common.h 
> b/arch/powerpc/perf/isa207-common.h
> index 044de65e96b9..71380e854f48 100644
> --- a/arch/powerpc/perf/isa207-common.h
> +++ b/arch/powerpc/perf/isa207-common.h
> @@ -219,6 +219,10 @@
>  #define MMCRA_THR_CTR_EXP(v) (((v) >> MMCRA_THR_CTR_EXP_SHIFT) &\
>   MMCRA_THR_CTR_EXP_MASK)
>  
> +#define P10_MMCRA_THR_CTR_MANT_MASK  0xFFul
> +#define P10_MMCRA_THR_CTR_MANT(v)(((v) >> MMCRA_THR_CTR_MANT_SHIFT) &\
> + P10_MMCRA_THR_CTR_MANT_MASK)
> +
>  /* MMCRA Threshold Compare bit constant for power9 */
>  #define p9_MMCRA_THR_CMP_SHIFT   45
>  
> -- 
> 2.26.2
> 


Re: [PATCH v2 2/3] lkdtm/powerpc: Add SLB multihit test

2020-09-29 Thread Michal Suchánek
Hello,

On Fri, Sep 25, 2020 at 12:57:33PM -0700, Kees Cook wrote:
> On Fri, Sep 25, 2020 at 04:01:22PM +0530, Ganesh Goudar wrote:
> > Add support to inject slb multihit errors, to test machine
> > check handling.
> 
> Thank you for more tests in here!
Thanks for working on integrating this.
> 
> > 
> > Based on work by Mahesh Salgaonkar and Michal Suchánek.
> > 
> > Cc: Mahesh Salgaonkar 
> > Cc: Michal Suchánek 
> 
> Should these be Co-developed-by: with S-o-b?

I don't think I wrote any of this code. I packaged it for SUSE and maybe
changed some constants based on test result discussion.

I compared this code to my saved snapshots of past versions of the test
module and this covers all the test cases I have. The only difference is that
the development modules have verbose prints showing what's going on.

It is true that without the verbose prints some explanatory comments
could be helpful.
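
For example, a comment along these lines on insert_slb_entry()
(illustrative wording only) would capture what the verbose prints used
to convey:

/*
 * Deliberately insert two SLB entries translating the same effective
 * address. The store to p[0] afterwards then matches both entries at
 * once, raising an SLB-multihit machine check that the MCE handler is
 * expected to recover from.
 */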

Reviewed-by: Michal Suchánek 
> 
> > Signed-off-by: Ganesh Goudar 
> > ---
> >  drivers/misc/lkdtm/Makefile  |   4 ++
> >  drivers/misc/lkdtm/core.c|   3 +
> >  drivers/misc/lkdtm/lkdtm.h   |   3 +
> >  drivers/misc/lkdtm/powerpc.c | 132 +++
> >  4 files changed, 142 insertions(+)
> >  create mode 100644 drivers/misc/lkdtm/powerpc.c
> > 
> > diff --git a/drivers/misc/lkdtm/Makefile b/drivers/misc/lkdtm/Makefile
> > index c70b3822013f..6a82f407fbcd 100644
> > --- a/drivers/misc/lkdtm/Makefile
> > +++ b/drivers/misc/lkdtm/Makefile
> >  lkdtm-$(CONFIG_LKDTM)  += usercopy.o
> >  lkdtm-$(CONFIG_LKDTM)  += stackleak.o
> >  lkdtm-$(CONFIG_LKDTM)  += cfi.o
> >  
> > +ifeq ($(CONFIG_PPC64),y)
> > +lkdtm-$(CONFIG_LKDTM)  += powerpc.o
> > +endif
> 
> This can just be:
> 
> lkdtm-$(CONFIG_PPC64) += powerpc.o
> 
> > +
> >  KASAN_SANITIZE_stackleak.o := n
> >  KCOV_INSTRUMENT_rodata.o   := n
> >  
> > diff --git a/drivers/misc/lkdtm/core.c b/drivers/misc/lkdtm/core.c
> > index a5e344df9166..8d5db42baa90 100644
> > --- a/drivers/misc/lkdtm/core.c
> > +++ b/drivers/misc/lkdtm/core.c
> > @@ -178,6 +178,9 @@ static const struct crashtype crashtypes[] = {
> >  #ifdef CONFIG_X86_32
> > CRASHTYPE(DOUBLE_FAULT),
> >  #endif
> > +#ifdef CONFIG_PPC64
> > +   CRASHTYPE(PPC_SLB_MULTIHIT),
> > +#endif
> >  };
> >  
> >  
> > diff --git a/drivers/misc/lkdtm/lkdtm.h b/drivers/misc/lkdtm/lkdtm.h
> > index 8878538b2c13..b305bd511ee5 100644
> > --- a/drivers/misc/lkdtm/lkdtm.h
> > +++ b/drivers/misc/lkdtm/lkdtm.h
> > @@ -104,4 +104,7 @@ void lkdtm_STACKLEAK_ERASING(void);
> >  /* cfi.c */
> >  void lkdtm_CFI_FORWARD_PROTO(void);
> >  
> > +/* powerpc.c */
> > +void lkdtm_PPC_SLB_MULTIHIT(void);
> > +
> >  #endif
> > diff --git a/drivers/misc/lkdtm/powerpc.c b/drivers/misc/lkdtm/powerpc.c
> > new file mode 100644
> > index ..d6db18444757
> > --- /dev/null
> > +++ b/drivers/misc/lkdtm/powerpc.c
> > @@ -0,0 +1,132 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +
> 
> Please #include "lkdtm.h" here to get the correct pr_fmt heading (and
> any future header adjustments).
> 
> > +#include 
> > +#include 
> > +
> > +static inline unsigned long get_slb_index(void)
> > +{
> > +   unsigned long index;
> > +
> > +   index = get_paca()->stab_rr;
> > +
> > +   /*
> > +* simple round-robin replacement of slb starting at SLB_NUM_BOLTED.
> > +*/
> > +   if (index < (mmu_slb_size - 1))
> > +   index++;
> > +   else
> > +   index = SLB_NUM_BOLTED;
> > +   get_paca()->stab_rr = index;
> > +   return index;
> > +}
> > +
> > +#define slb_esid_mask(ssize)   \
> > +   (((ssize) == MMU_SEGSIZE_256M) ? ESID_MASK : ESID_MASK_1T)
> > +
> > +static inline unsigned long mk_esid_data(unsigned long ea, int ssize,
> > +unsigned long slot)
> > +{
> > +   return (ea & slb_esid_mask(ssize)) | SLB_ESID_V | slot;
> > +}
> > +
> > +#define slb_vsid_shift(ssize)  \
> > +   ((ssize) == MMU_SEGSIZE_256M ? SLB_VSID_SHIFT : SLB_VSID_SHIFT_1T)
> > +
> > +static inline unsigned long mk_vsid_data(unsigned long ea, int ssize,
> > +unsigned long flags)
> > +{
> > +   return (get_kernel_vsid(ea, ssize) << slb_vsid_shift(ssize)) | flags |
> > +   ((unsigned long)ssize << SLB_VSI

Re: [PATCH 0/3] powerpc/mce: Fix mce handler and add selftest

2020-09-17 Thread Michal Suchánek
Hello,

On Wed, Sep 16, 2020 at 10:52:25PM +0530, Ganesh Goudar wrote:
> This patch series fixes MCE handling for pseries, provides a debugfs
> interface for MCE injection, and adds a selftest to test MCE handling
> on pseries/powernv machines running in hash MMU mode.
> The debugfs interface and selftest are added only for SLB multihit
> injection; we can add other tests in the future if possible.
> 
> Ganesh Goudar (3):
>   powerpc/mce: remove nmi_enter/exit from real mode handler
>   powerpc/mce: Add debugfs interface to inject MCE
>   selftest/powerpc: Add slb multihit selftest

Is the below logic sound? It does not agree with what is added here:

void machine_check_exception(struct pt_regs *regs)
{
int recover = 0;

/*
 * BOOK3S_64 does not call this handler as a non-maskable interrupt
 * (it uses its own early real-mode handler to handle the MCE proper
 * and then raises irq_work to call this handler when interrupts are
 * enabled).
 *
 * This is silly. The BOOK3S_64 should just call a different function
 * rather than expecting semantics to magically change. Something
 * like 'non_nmi_machine_check_exception()', perhaps?
 */
const bool nmi = !IS_ENABLED(CONFIG_PPC_BOOK3S_64);

if (nmi) nmi_enter();

Thanks

Michal


Re: [PATCH 2/3] powerpc/mce: Add debugfs interface to inject MCE

2020-09-17 Thread Michal Suchánek
Hello,

On Wed, Sep 16, 2020 at 10:52:27PM +0530, Ganesh Goudar wrote:
> To test machine check handling, add a debugfs interface to inject
> SLB multihit errors.
> 
> To inject slb multihit:
>  # echo 1 > /sys/kernel/debug/powerpc/mce_error_inject/inject_slb_multihit
> 
> Signed-off-by: Ganesh Goudar 
> Signed-off-by: Mahesh Salgaonkar 
> ---
>  arch/powerpc/Kconfig.debug |   9 ++
>  arch/powerpc/sysdev/Makefile   |   2 +
>  arch/powerpc/sysdev/mce_error_inject.c | 148 +
>  3 files changed, 159 insertions(+)
>  create mode 100644 arch/powerpc/sysdev/mce_error_inject.c
> 
> diff --git a/arch/powerpc/Kconfig.debug b/arch/powerpc/Kconfig.debug
> index b88900f4832f..61db133f2f0d 100644
> --- a/arch/powerpc/Kconfig.debug
> +++ b/arch/powerpc/Kconfig.debug
> @@ -398,3 +398,12 @@ config KASAN_SHADOW_OFFSET
>   hex
>   depends on KASAN
>   default 0xe000
> +
> +config MCE_ERROR_INJECT
> + bool "Enable MCE error injection through debugfs"
> + depends on DEBUG_FS
> + default y
> + help
> +   This option creates an mce_error_inject directory in the
> +   powerpc debugfs directory that allows limited injection of
> +   Machine Check Errors (MCEs).
> diff --git a/arch/powerpc/sysdev/Makefile b/arch/powerpc/sysdev/Makefile
> index 026b3f01a991..7fc10b77 100644
> --- a/arch/powerpc/sysdev/Makefile
> +++ b/arch/powerpc/sysdev/Makefile
> @@ -52,3 +52,5 @@ obj-$(CONFIG_PPC_XICS)  += xics/
>  obj-$(CONFIG_PPC_XIVE)   += xive/
>  
>  obj-$(CONFIG_GE_FPGA)+= ge/
> +
> +obj-$(CONFIG_MCE_ERROR_INJECT)   += mce_error_inject.o
> diff --git a/arch/powerpc/sysdev/mce_error_inject.c 
> b/arch/powerpc/sysdev/mce_error_inject.c
> new file mode 100644
> index ..ca4726bfa2d9
> --- /dev/null
> +++ b/arch/powerpc/sysdev/mce_error_inject.c
> @@ -0,0 +1,148 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Machine Check Exception injection code
> + */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +static inline unsigned long get_slb_index(void)
> +{
> + unsigned long index;
> +
> + index = get_paca()->stab_rr;
> +
> + /*
> +  * simple round-robin replacement of slb starting at SLB_NUM_BOLTED.
> +  */
> + if (index < (mmu_slb_size - 1))
> + index++;
> + else
> + index = SLB_NUM_BOLTED;
> + get_paca()->stab_rr = index;
> + return index;
> +}
> +
> +#define slb_esid_mask(ssize) \
> + (((ssize) == MMU_SEGSIZE_256M) ? ESID_MASK : ESID_MASK_1T)
> +
> +static inline unsigned long mk_esid_data(unsigned long ea, int ssize,
> +  unsigned long slot)
> +{
> + return (ea & slb_esid_mask(ssize)) | SLB_ESID_V | slot;
> +}
> +
> +#define slb_vsid_shift(ssize)\
> + ((ssize) == MMU_SEGSIZE_256M ? SLB_VSID_SHIFT : SLB_VSID_SHIFT_1T)
> +
> +static inline unsigned long mk_vsid_data(unsigned long ea, int ssize,
> +  unsigned long flags)
> +{
> + return (get_kernel_vsid(ea, ssize) << slb_vsid_shift(ssize)) | flags |
> + ((unsigned long)ssize << SLB_VSID_SSIZE_SHIFT);
> +}
> +
> +static void insert_slb_entry(char *p, int ssize)
> +{
> + unsigned long flags, entry;
> + struct paca_struct *paca;
> +
> + flags = SLB_VSID_KERNEL | mmu_psize_defs[MMU_PAGE_64K].sllp;
> +
> + preempt_disable();
> +
> + paca = get_paca();
This seems unused?
> +
> + entry = get_slb_index();
> + asm volatile("slbmte %0,%1" :
> + : "r" (mk_vsid_data((unsigned long)p, ssize, flags)),
> +   "r" (mk_esid_data((unsigned long)p, ssize, entry))
> + : "memory");
> +
> + entry = get_slb_index();
> + asm volatile("slbmte %0,%1" :
> + : "r" (mk_vsid_data((unsigned long)p, ssize, flags)),
> +   "r" (mk_esid_data((unsigned long)p, ssize, entry))
> + : "memory");
> + preempt_enable();
> + p[0] = '!';
> +}
> +
> +static void inject_vmalloc_slb_multihit(void)
> +{
> + char *p;
> +
> + p = vmalloc(2048);
> + if (!p)
> + return;
> +
> + insert_slb_entry(p, MMU_SEGSIZE_1T);
> + vfree(p);
> +}
> +
> +static void inject_kmalloc_slb_multihit(void)
> +{
> + char *p;
> +
> + p = kmalloc(2048, GFP_KERNEL);
> + if (!p)
> + return;
> +
> + insert_slb_entry(p, MMU_SEGSIZE_1T);
> + kfree(p);
> +}
> +
> +static ssize_t inject_slb_multihit(const char __user *u_buf, size_t count)
> +{
> + char buf[32];
> + size_t buf_size;
> +
> + buf_size = min(count, (sizeof(buf) - 1));
> + if (copy_from_user(buf, u_buf, buf_size))
> + return -EFAULT;
> + buf[buf_size] = '\0';
> +
> + if (buf[0] != '1')
> + return -EINVAL;
> +
> + inject_vmalloc_slb_multihit();
> + inject_kmalloc_slb_multihit();
This is 

Re: [PATCH 1/3] powerpc/mce: remove nmi_enter/exit from real mode handler

2020-09-17 Thread Michal Suchánek
Hello,

On Wed, Sep 16, 2020 at 10:52:26PM +0530, Ganesh Goudar wrote:
> Use of nmi_enter/exit in the real mode handler causes the kernel to panic
> and reboot on injecting an SLB multihit on a pseries machine running in
> hash MMU mode, as these calls try to access memory outside the RMO region
> in the real mode handler, where translation is disabled.
> 
> Add a check to not use these calls on a pseries machine running in hash
> MMU mode.
> 
> Fixes: 116ac378bb3f ("powerpc/64s: machine check interrupt update NMI 
> accounting")
> Signed-off-by: Ganesh Goudar 
> ---
>  arch/powerpc/kernel/mce.c | 7 ++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/kernel/mce.c b/arch/powerpc/kernel/mce.c
> index ada59f6c4298..1d42fe0f5f9c 100644
> --- a/arch/powerpc/kernel/mce.c
> +++ b/arch/powerpc/kernel/mce.c
> @@ -591,10 +591,15 @@ EXPORT_SYMBOL_GPL(machine_check_print_event_info);
>  long notrace machine_check_early(struct pt_regs *regs)
>  {
>   long handled = 0;
> - bool nested = in_nmi();
> + bool nested;
> + bool is_pseries_hpt_guest;
>   u8 ftrace_enabled = this_cpu_get_ftrace_enabled();
>  
>   this_cpu_set_ftrace_enabled(0);
> + is_pseries_hpt_guest = machine_is(pseries) &&
> +mmu_has_feature(MMU_FTR_HPTE_TABLE);
> + /* Do not use nmi_enter/exit for pseries hpte guest */
> + nested = is_pseries_hpt_guest ? true : in_nmi();
As pointed out already in another comment, nesting is supported natively
since 69ea03b56ed2c7189ccd0b5910ad39f3cad1df21. You can simply do
nmi_enter and nmi_exit unconditionally - or only based on
is_pseries_hpt_guest.

The other question is what is the value of calling nmi_enter here at
all. It crashes in one case, we simply skip it for that case, and we are
good. Maybe we could skip it altogether?
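
To make the two options concrete, a minimal sketch (not a tested patch;
is_pseries_hpt_guest is the predicate from the patch under review):

/* Option 1: rely on native nmi_enter() nesting (69ea03b56ed2) and
 * gate only on the problematic pseries/HPT case. */
if (!is_pseries_hpt_guest)
	nmi_enter();
/* ... handle the machine check ... */
if (!is_pseries_hpt_guest)
	nmi_exit();

/* Option 2: drop the NMI accounting from machine_check_early()
 * altogether, as questioned above. */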

Thanks

Michal


Re: [PATCH] Revert "powerpc/64s: machine check interrupt update NMI accounting"

2020-09-16 Thread Michal Suchánek
On Tue, Sep 15, 2020 at 08:16:42PM +0200, pet...@infradead.org wrote:
> On Tue, Sep 15, 2020 at 08:06:59PM +0200, Michal Suchanek wrote:
> > This reverts commit 116ac378bb3ff844df333e7609e7604651a0db9d.
> > 
> > This commit causes the kernel to oops and reboot when injecting a SLB
> > multihit which causes a MCE.
> > 
> > Before this commit a SLB multihit was corrected by the kernel and the
> > system continued to operate normally.
> > 
> > cc: sta...@vger.kernel.org
> > Fixes: 116ac378bb3f ("powerpc/64s: machine check interrupt update NMI 
> > accounting")
> > Signed-off-by: Michal Suchanek 
> 
> Ever since 69ea03b56ed2 ("hardirq/nmi: Allow nested nmi_enter()")
> nmi_enter() supports nesting natively.

And this patch was merged in parallel with this native nesting support
and conflicted with it - hence the explicit nesting in the hunk that did
not conflict.

Either way the bug is present on kernels both with and without
69ea03b56ed2. So besides the conflict 69ea03b56ed2 does not affect this
problem.

Thanks

Michal


Injecting SLB miltihit crashes kernel 5.9.0-rc5

2020-09-15 Thread Michal Suchánek
Hello,

Using the SLB multihit injection test module (which I did not write, so I
do not want to post it here) to verify updates on my 5.3 frankenkernel,
I found that the kernel crashes with Oops: kernel bad access.

I tested on the latest upstream kernel build that I have at hand and the
result is the same (minus the message - nothing was logged and the kernel
simply rebooted).

Since the whole effort to write a real mode MCE handler was supposed to
prevent this, maybe the SLB injection module should be added to the
kernel selftests?

Thanks

Michal


Re: [PATCH] powerpc/traps: fix recoverability of machine check handling on book3s/32

2020-09-14 Thread Michal Suchánek
On Fri, Sep 11, 2020 at 11:23:57PM +1000, Michael Ellerman wrote:
> Michal Suchánek  writes:
> > Hello,
> >
> > does this logic apply to "Unrecoverable System Reset" as well?
> 
> Which logic do you mean?
> 
> We do call die() before checking MSR_RI in system_reset_exception():
> 
>   /*
>* No debugger or crash dump registered, print logs then
>* panic.
>*/
>   die("System Reset", regs, SIGABRT);
>   
>   mdelay(2*MSEC_PER_SEC); /* Wait a little while for others to print */
>   add_taint(TAINT_DIE, LOCKDEP_NOW_UNRELIABLE);
>   nmi_panic(regs, "System Reset");
>   
>   out:
>   #ifdef CONFIG_PPC_BOOK3S_64
>   BUG_ON(get_paca()->in_nmi == 0);
>   if (get_paca()->in_nmi > 1)
>   die("Unrecoverable nested System Reset", regs, SIGABRT);
>   #endif
>   /* Must die if the interrupt is not recoverable */
>   if (!(regs->msr & MSR_RI))
>   die("Unrecoverable System Reset", regs, SIGABRT);
> 
> 
> So you should see the output from die("System Reset", ...) even if
> MSR[RI] was clear when you took the system reset.

Indeed, I replied to the wrong patch. I was looking at daf00ae71dad
("powerpc/traps: restore recoverability of machine_check interrupts")
which has a very similar commit message.

Sorry about the confusion.

Thanks

Michal

> 
> cheers
> 
> > On Tue, Jan 22, 2019 at 02:11:24PM +, Christophe Leroy wrote:
> >> Looks like book3s/32 doesn't set RI on machine check, so
> >> checking RI before calling die() will always be fatal
> >> although this is not an issue in most cases.
> >> 
> >> Fixes: b96672dd840f ("powerpc: Machine check interrupt is a non-maskable 
> >> interrupt")
> >> Fixes: daf00ae71dad ("powerpc/traps: restore recoverability of 
> >> machine_check interrupts")
> >> Signed-off-by: Christophe Leroy 
> >> Cc: sta...@vger.kernel.org
> >> ---
> >>  arch/powerpc/kernel/traps.c | 8 
> >>  1 file changed, 4 insertions(+), 4 deletions(-)
> >> 
> >> diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
> >> index 64936b60d521..c740f8bfccc9 100644
> >> --- a/arch/powerpc/kernel/traps.c
> >> +++ b/arch/powerpc/kernel/traps.c
> >> @@ -763,15 +763,15 @@ void machine_check_exception(struct pt_regs *regs)
> >>if (check_io_access(regs))
> >>goto bail;
> >>  
> >> -  /* Must die if the interrupt is not recoverable */
> >> -  if (!(regs->msr & MSR_RI))
> >> -  nmi_panic(regs, "Unrecoverable Machine check");
> >> -
> >>if (!nested)
> >>nmi_exit();
> >>  
> >>die("Machine check", regs, SIGBUS);
> >>  
> >> +  /* Must die if the interrupt is not recoverable */
> >> +  if (!(regs->msr & MSR_RI))
> >> +  nmi_panic(regs, "Unrecoverable Machine check");
> >> +
> >>return;
> >>  
> >>  bail:
> >> -- 
> >> 2.13.3
> >> 


Re: [PATCH] powerpc/traps: fix recoverability of machine check handling on book3s/32

2020-09-11 Thread Michal Suchánek
Hello,

does this logic apply to "Unrecoverable System Reset" as well?

Thanks

Michal

On Tue, Jan 22, 2019 at 02:11:24PM +, Christophe Leroy wrote:
> Looks like book3s/32 doesn't set RI on machine check, so
> checking RI before calling die() will always be fatal
> although this is not an issue in most cases.
> 
> Fixes: b96672dd840f ("powerpc: Machine check interrupt is a non-maskable 
> interrupt")
> Fixes: daf00ae71dad ("powerpc/traps: restore recoverability of machine_check 
> interrupts")
> Signed-off-by: Christophe Leroy 
> Cc: sta...@vger.kernel.org
> ---
>  arch/powerpc/kernel/traps.c | 8 
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
> index 64936b60d521..c740f8bfccc9 100644
> --- a/arch/powerpc/kernel/traps.c
> +++ b/arch/powerpc/kernel/traps.c
> @@ -763,15 +763,15 @@ void machine_check_exception(struct pt_regs *regs)
>   if (check_io_access(regs))
>   goto bail;
>  
> - /* Must die if the interrupt is not recoverable */
> - if (!(regs->msr & MSR_RI))
> - nmi_panic(regs, "Unrecoverable Machine check");
> -
>   if (!nested)
>   nmi_exit();
>  
>   die("Machine check", regs, SIGBUS);
>  
> + /* Must die if the interrupt is not recoverable */
> + if (!(regs->msr & MSR_RI))
> + nmi_panic(regs, "Unrecoverable Machine check");
> +
>   return;
>  
>  bail:
> -- 
> 2.13.3
> 


Re: [PATCH] KVM: PPC: Book3S HV: Do not allocate HPT for a nested guest

2020-09-11 Thread Michal Suchánek
On Fri, Sep 11, 2020 at 10:01:33AM +0200, Greg Kurz wrote:
> On Fri, 11 Sep 2020 09:45:36 +0200
> Greg Kurz  wrote:
> 
> > On Fri, 11 Sep 2020 01:16:07 -0300
> > Fabiano Rosas  wrote:
> > 
> > > The current nested KVM code does not support HPT guests. This is
> > > informed/enforced in some ways:
> > > 
> > > - Hosts < P9 will not be able to enable the nested HV feature;
> > > 
> > > - The nested hypervisor MMU capabilities will not contain
> > >   KVM_CAP_PPC_MMU_HASH_V3;
> > > 
> > > - QEMU reflects the MMU capabilities in the
> > >   'ibm,arch-vec-5-platform-support' device-tree property;
> > > 
> > > - The nested guest, at 'prom_parse_mmu_model' ignores the
> > >   'disable_radix' kernel command line option if HPT is not supported;
> > > 
> > > - The KVM_PPC_CONFIGURE_V3_MMU ioctl will fail if trying to use HPT.
> > > 
> > > There is, however, still a way to start a HPT guest by using
> > > max-compat-cpu=power8 at the QEMU machine options. This leads to the
> > > guest being set to use hash after QEMU calls the KVM_PPC_ALLOCATE_HTAB
> > > ioctl.
> > > 
> > > With the guest set to hash, the nested hypervisor goes through the
> > > entry path that has no knowledge of nesting (kvmppc_run_vcpu) and
> > > crashes when it tries to execute a hypervisor-privileged (mtspr
> > > HDEC) instruction at __kvmppc_vcore_entry:
> > > 
> > > root@L1:~ $ qemu-system-ppc64 -machine pseries,max-cpu-compat=power8 ...
> > > 
> > > 
> > > [  538.543303] CPU: 83 PID: 25185 Comm: CPU 0/KVM Not tainted 5.9.0-rc4 #1
> > > [  538.543355] NIP:  c0080753f388 LR: c0080753f368 CTR: 
> > > c01e5ec0
> > > [  538.543417] REGS: c013e91e33b0 TRAP: 0700   Not tainted  
> > > (5.9.0-rc4)
> > > [  538.543470] MSR:  82843033   CR: 
> > > 22422882  XER: 2004
> > > [  538.543546] CFAR: c0080753f4b0 IRQMASK: 3
> > >GPR00: c008075397a0 c013e91e3640 c0080755e600 
> > > 8000
> > >GPR04:  c013eab19800 c01394de 
> > > 0043a054db72
> > >GPR08: 003b1652   
> > > c008075502e0
> > >GPR12: c01e5ec0 c007ffa74200 c013eab19800 
> > > 0008
> > >GPR16:  c0139676c6c0 c1d23948 
> > > c013e91e38b8
> > >GPR20: 0053  0001 
> > > 
> > >GPR24: 0001 0001  
> > > 0001
> > >GPR28: 0001 0053 c013eab19800 
> > > 0001
> > > [  538.544067] NIP [c0080753f388] __kvmppc_vcore_entry+0x90/0x104 
> > > [kvm_hv]
> > > [  538.544121] LR [c0080753f368] __kvmppc_vcore_entry+0x70/0x104 
> > > [kvm_hv]
> > > [  538.544173] Call Trace:
> > > [  538.544196] [c013e91e3640] [c013e91e3680] 0xc013e91e3680 
> > > (unreliable)
> > > [  538.544260] [c013e91e3820] [c008075397a0] 
> > > kvmppc_run_core+0xbc8/0x19d0 [kvm_hv]
> > > [  538.544325] [c013e91e39e0] [c0080753d99c] 
> > > kvmppc_vcpu_run_hv+0x404/0xc00 [kvm_hv]
> > > [  538.544394] [c013e91e3ad0] [c008072da4fc] 
> > > kvmppc_vcpu_run+0x34/0x48 [kvm]
> > > [  538.544472] [c013e91e3af0] [c008072d61b8] 
> > > kvm_arch_vcpu_ioctl_run+0x310/0x420 [kvm]
> > > [  538.544539] [c013e91e3b80] [c008072c7450] 
> > > kvm_vcpu_ioctl+0x298/0x778 [kvm]
> > > [  538.544605] [c013e91e3ce0] [c04b8c2c] sys_ioctl+0x1dc/0xc90
> > > [  538.544662] [c013e91e3dc0] [c002f9a4] 
> > > system_call_exception+0xe4/0x1c0
> > > [  538.544726] [c013e91e3e20] [c000d140] 
> > > system_call_common+0xf0/0x27c
> > > [  538.544787] Instruction dump:
> > > [  538.544821] f86d1098 6000 6000 4899 e8ad0fe8 e8c500a0 
> > > e9264140 75290002
> > > [  538.544886] 7d1602a6 7cec42a6 40820008 7d0807b4 <7d164ba6> 7d083a14 
> > > f90d10a0 480104fd
> > > [  538.544953] ---[ end trace 74423e2b948c2e0c ]---
> > > 
> > > This patch makes the KVM_PPC_ALLOCATE_HTAB ioctl fail when running in
> > > the nested hypervisor, causing QEMU to abort.
> > > 
> > > Reported-by: Satheesh Rajendran 
> > > Signed-off-by: Fabiano Rosas 
> > > ---
> > 
> > LGTM
> > 
> > Reviewed-by: Greg Kurz 
> > 
> > >  arch/powerpc/kvm/book3s_hv.c | 6 ++
> > >  1 file changed, 6 insertions(+)
> > > 
> > > diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
> > > index 4ba06a2a306c..764b6239ef72 100644
> > > --- a/arch/powerpc/kvm/book3s_hv.c
> > > +++ b/arch/powerpc/kvm/book3s_hv.c
> > > @@ -5250,6 +5250,12 @@ static long kvm_arch_vm_ioctl_hv(struct file *filp,
> > >   case KVM_PPC_ALLOCATE_HTAB: {
> > >   u32 htab_order;
> > >  
> > > + /* If we're a nested hypervisor, we currently only support 
> > > radix */
> > > + if (kvmhv_on_pseries()) {
> > > + r = -EOPNOTSUPP;
> 
> According to 

Re: KVM on POWER8 host lock up since 10d91611f426 ("powerpc/64s: Reimplement book3s idle code in C")

2020-09-07 Thread Michal Suchánek
On Mon, Sep 07, 2020 at 11:13:47PM +1000, Nicholas Piggin wrote:
> Excerpts from Michael Ellerman's message of August 31, 2020 8:50 pm:
> > Michal Suchánek  writes:
> >> On Mon, Aug 31, 2020 at 11:14:18AM +1000, Nicholas Piggin wrote:
> >>> Excerpts from Michal Suchánek's message of August 31, 2020 6:11 am:
> >>> > Hello,
> >>> > 
> >>> > on POWER8 KVM hosts lock up since commit 10d91611f426 ("powerpc/64s:
> >>> > Reimplement book3s idle code in C").
> >>> > 
> >>> > The symptom is host locking up completely after some hours of KVM
> >>> > workload with messages like
> >>> > 
> >>> > 2020-08-30T10:51:31+00:00 obs-power8-01 kernel: KVM: couldn't grab cpu 
> >>> > 47
> >>> > 2020-08-30T10:51:31+00:00 obs-power8-01 kernel: KVM: couldn't grab cpu 
> >>> > 71
> >>> > 2020-08-30T10:51:31+00:00 obs-power8-01 kernel: KVM: couldn't grab cpu 
> >>> > 47
> >>> > 2020-08-30T10:51:31+00:00 obs-power8-01 kernel: KVM: couldn't grab cpu 
> >>> > 71
> >>> > 2020-08-30T10:51:31+00:00 obs-power8-01 kernel: KVM: couldn't grab cpu 
> >>> > 47
> >>> > 
> >>> > printed before the host locks up.
> >>> > 
> >>> > The machines run sandboxed builds, which is a mixed workload resulting in
> >>> > IO/single core/multiple core load over time, and there are periods of no
> >>> > activity with no VMs running as well. The VMs are short-lived, so VM
> >>> > setup/teardown is somewhat exercised as well.
> >>> > 
> >>> > POWER9 with the new guest entry fast path does not seem to be affected.
> >>> > 
> >>> > Reverted the patch and the followup idle fixes on top of 5.2.14 and
> >>> > re-applied commit a3f3072db6ca ("powerpc/powernv/idle: Restore IAMR
> >>> > after idle") which gives same idle code as 5.1.16 and the kernel seems
> >>> > stable.
> >>> > 
> >>> > Config is attached.
> >>> > 
> >>> > I cannot easily revert this commit, especially if I want to use the same
> >>> > kernel on POWER8 and POWER9 - many of the POWER9 fixes are applicable
> >>> > only to the new idle code.
> >>> > 
> >>> > Any idea what can be the problem?
> >>> 
> >>> So hwthread_state is never getting back to HWTHREAD_IN_IDLE on
> >>> those threads. I wonder what they are doing. POWER8 doesn't have a good
> >>> NMI IPI and I don't know if it supports pdbg dumping registers from the
> >>> BMC unfortunately.
> >>
> >> It may be possible to set up fadump with a later kernel version that
> >> supports it on powernv and dump the whole kernel.
> > 
> > Your firmware won't support it AFAIK.
> > 
> > You could try kdump, but if we have CPUs stuck in KVM then there's a
> > good chance it won't work :/
> 
> I haven't had any luck reproducing this yet. Testing with sub-cores
> in various different combinations, etc. I'll keep trying though.
> 
> I don't know if there's much we can add to debug it. Can we run pdbg
> on the BMCs on these things?

I suppose it depends on the machine type?

Thanks

Michal


Re: [PATCH v11] Fixup for "powerpc/vdso: Provide __kernel_clock_gettime64() on vdso32"

2020-09-01 Thread Michal Suchánek
Hello,

can you add Fixes: ?

Thanks

Michal

On Tue, Sep 01, 2020 at 05:28:57AM +, Christophe Leroy wrote:
> Signed-off-by: Christophe Leroy 
> ---
>  arch/powerpc/include/asm/vdso/gettimeofday.h | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/arch/powerpc/include/asm/vdso/gettimeofday.h 
> b/arch/powerpc/include/asm/vdso/gettimeofday.h
> index 59a609a48b63..8da84722729b 100644
> --- a/arch/powerpc/include/asm/vdso/gettimeofday.h
> +++ b/arch/powerpc/include/asm/vdso/gettimeofday.h
> @@ -186,6 +186,8 @@ int __c_kernel_clock_getres(clockid_t clock_id, struct 
> __kernel_timespec *res,
>  #else
>  int __c_kernel_clock_gettime(clockid_t clock, struct old_timespec32 *ts,
>const struct vdso_data *vd);
> +int __c_kernel_clock_gettime64(clockid_t clock, struct __kernel_timespec *ts,
> +const struct vdso_data *vd);
>  int __c_kernel_clock_getres(clockid_t clock_id, struct old_timespec32 *res,
>   const struct vdso_data *vd);
>  #endif
> -- 
> 2.25.0
> 


Re: KVM on POWER8 host lock up since 10d91611f426 ("powerpc/64s: Reimplement book3s idle code in C")

2020-08-31 Thread Michal Suchánek
On Mon, Aug 31, 2020 at 11:14:18AM +1000, Nicholas Piggin wrote:
> Excerpts from Michal Suchánek's message of August 31, 2020 6:11 am:
> > Hello,
> > 
> > on POWER8 KVM hosts lock up since commit 10d91611f426 ("powerpc/64s:
> > Reimplement book3s idle code in C").
> > 
> > The symptom is host locking up completely after some hours of KVM
> > workload with messages like
> > 
> > 2020-08-30T10:51:31+00:00 obs-power8-01 kernel: KVM: couldn't grab cpu 47
> > 2020-08-30T10:51:31+00:00 obs-power8-01 kernel: KVM: couldn't grab cpu 71
> > 2020-08-30T10:51:31+00:00 obs-power8-01 kernel: KVM: couldn't grab cpu 47
> > 2020-08-30T10:51:31+00:00 obs-power8-01 kernel: KVM: couldn't grab cpu 71
> > 2020-08-30T10:51:31+00:00 obs-power8-01 kernel: KVM: couldn't grab cpu 47
> > 
> > printed before the host locks up.
> > 
> > The machines run sandboxed builds, which is a mixed workload resulting in
> > IO/single core/multiple core load over time, and there are periods of no
> > activity with no VMs running as well. The VMs are short-lived, so VM
> > setup/teardown is somewhat exercised as well.
> > 
> > POWER9 with the new guest entry fast path does not seem to be affected.
> > 
> > Reverted the patch and the followup idle fixes on top of 5.2.14 and
> > re-applied commit a3f3072db6ca ("powerpc/powernv/idle: Restore IAMR
> > after idle") which gives same idle code as 5.1.16 and the kernel seems
> > stable.
> > 
> > Config is attached.
> > 
> > I cannot easily revert this commit, especially if I want to use the same
> > kernel on POWER8 and POWER9 - many of the POWER9 fixes are applicable
> > only to the new idle code.
> > 
> > Any idea what can be the problem?
> 
> So hwthread_state is never getting back to HWTHREAD_IN_IDLE on
> those threads. I wonder what they are doing. POWER8 doesn't have a good
> NMI IPI and I don't know if it supports pdbg dumping registers from the
> BMC unfortunately.
It may be possible to set up fadump with a later kernel version that
supports it on powernv and dump the whole kernel.

Thanks

Michal


Re: KVM on POWER8 host lock up since 10d91611f426 ("powerpc/64s: Reimplement book3s idle code in C")

2020-08-31 Thread Michal Suchánek
On Mon, Aug 31, 2020 at 11:14:18AM +1000, Nicholas Piggin wrote:
> Excerpts from Michal Suchánek's message of August 31, 2020 6:11 am:
> > Hello,
> > 
> > on POWER8 KVM hosts lock up since commit 10d91611f426 ("powerpc/64s:
> > Reimplement book3s idle code in C").
> > 
> > The symptom is host locking up completely after some hours of KVM
> > workload with messages like
> > 
> > 2020-08-30T10:51:31+00:00 obs-power8-01 kernel: KVM: couldn't grab cpu 47
> > 2020-08-30T10:51:31+00:00 obs-power8-01 kernel: KVM: couldn't grab cpu 71
> > 2020-08-30T10:51:31+00:00 obs-power8-01 kernel: KVM: couldn't grab cpu 47
> > 2020-08-30T10:51:31+00:00 obs-power8-01 kernel: KVM: couldn't grab cpu 71
> > 2020-08-30T10:51:31+00:00 obs-power8-01 kernel: KVM: couldn't grab cpu 47
> > 
> > printed before the host locks up.
> > 
> > The machines run sandboxed builds, which is a mixed workload resulting in
> > IO/single core/multiple core load over time, and there are periods of no
> > activity with no VMs running as well. The VMs are short-lived, so VM
> > setup/teardown is somewhat exercised as well.
> > 
> > POWER9 with the new guest entry fast path does not seem to be affected.
> > 
> > Reverted the patch and the followup idle fixes on top of 5.2.14 and
> > re-applied commit a3f3072db6ca ("powerpc/powernv/idle: Restore IAMR
> > after idle") which gives same idle code as 5.1.16 and the kernel seems
> > stable.
> > 
> > Config is attached.
> > 
> > I cannot easily revert this commit, especially if I want to use the same
> > kernel on POWER8 and POWER9 - many of the POWER9 fixes are applicable
> > only to the new idle code.
> > 
> > Any idea what can be the problem?
> 
> So hwthread_state is never getting back to HWTHREAD_IN_IDLE on
> those threads. I wonder what they are doing. POWER8 doesn't have a good
> NMI IPI and I don't know if it supports pdbg dumping registers from the
> BMC unfortunately. Do the messages always come in pairs of CPUs?
> 
> I'm not sure where to start with reproducing, I'll have to try. How many
> vCPUs in the guests? Do you have several guests running at once?

The guests are spawned on demand - there are some 20-30 'slots'
configured where a VM may be running, or a slot may be idle with no VM
spawned when there are no jobs available.

Thanks

Michal


Re: [PATCH v5 3/3] mm/page_alloc: Keep memoryless cpuless node 0 offline

2020-08-07 Thread Michal Suchánek
On Fri, Aug 07, 2020 at 08:58:09AM +0200, David Hildenbrand wrote:
> On 07.08.20 06:32, Andrew Morton wrote:
> > On Fri, 3 Jul 2020 18:28:23 +0530 Srikar Dronamraju 
> >  wrote:
> > 
> >>> The memory hotplug changes that somehow because you can hotremove numa
> >>> nodes and therefore make the nodemask sparse but that is not a common
> >>> case. I am not sure what would happen if a completely new node was added
> >>> and its corresponding node was already used by the renumbered one
> >>> though. It would likely conflate the two I am afraid. But I am not sure
> >>> this is really possible with x86 and a lack of a bug report would
> >>> suggest that nobody is doing that at least.
> >>>
> >>
> >> JFYI,
> >> Satheesh, copied in this mail chain, had opened a bug a year ago on a
> >> crash with vcpu hotplug on a memoryless node.
> >>
> >> https://bugzilla.kernel.org/show_bug.cgi?id=202187
> > 
> > So...  do we merge this patch or not?  Seems that the overall view is
> > "risky but nobody is likely to do anything better any time soon"?
> 
> I recall the issue Michal saw was "fix powerpc" vs. "break other
> architectures". @Michal how should we proceed? At least x86-64 won't be
> affected IIUC.
There is a patch to introduce the node remapping on ppc as well which
should eliminate the empty node 0.

https://patchwork.ozlabs.org/project/linuxppc-dev/patch/2020073916.243569-1-aneesh.ku...@linux.ibm.com/

Thanks

Michal


Re: [PATCH v3 4/6] powerpc/64s: implement queued spinlocks and rwlocks

2020-07-23 Thread Michal Suchánek
On Mon, Jul 06, 2020 at 02:35:38PM +1000, Nicholas Piggin wrote:
> These have shown significantly improved performance and fairness when
> spinlock contention is moderate to high on very large systems.
> 
>  [ Numbers hopefully forthcoming after more testing, but initial
>results look good ]
> 
> Thanks to the fast path, single threaded performance is not noticeably
> hurt.
> 
> Signed-off-by: Nicholas Piggin 
> ---
>  arch/powerpc/Kconfig  | 13 
>  arch/powerpc/include/asm/Kbuild   |  2 ++
>  arch/powerpc/include/asm/qspinlock.h  | 25 +++
>  arch/powerpc/include/asm/spinlock.h   |  5 +
>  arch/powerpc/include/asm/spinlock_types.h |  5 +
>  arch/powerpc/lib/Makefile |  3 +++
>  include/asm-generic/qspinlock.h   |  2 ++
>  7 files changed, 55 insertions(+)
>  create mode 100644 arch/powerpc/include/asm/qspinlock.h
> 
> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
> index 24ac85c868db..17663ea57697 100644
> --- a/arch/powerpc/Kconfig
> +++ b/arch/powerpc/Kconfig
> @@ -146,6 +146,8 @@ config PPC
>   select ARCH_SUPPORTS_ATOMIC_RMW
>   select ARCH_USE_BUILTIN_BSWAP
>   select ARCH_USE_CMPXCHG_LOCKREF if PPC64
> + select ARCH_USE_QUEUED_RWLOCKS  if PPC_QUEUED_SPINLOCKS
> + select ARCH_USE_QUEUED_SPINLOCKSif PPC_QUEUED_SPINLOCKS
>   select ARCH_WANT_IPC_PARSE_VERSION
>   select ARCH_WEAK_RELEASE_ACQUIRE
>   select BINFMT_ELF
> @@ -492,6 +494,17 @@ config HOTPLUG_CPU
>  
> Say N if you are unsure.
>  
> +config PPC_QUEUED_SPINLOCKS
> + bool "Queued spinlocks"
> + depends on SMP
> + default "y" if PPC_BOOK3S_64
> + help
> +   Say Y here to use queued spinlocks which are more complex
> +   but give better salability and fairness on large SMP and NUMA
   ^ +c?
Thanks

Michal
> +   systems.
> +
> +   If unsure, say "Y" if you have lots of cores, otherwise "N".
> +
>  config ARCH_CPU_PROBE_RELEASE
>   def_bool y
>   depends on HOTPLUG_CPU
> diff --git a/arch/powerpc/include/asm/Kbuild b/arch/powerpc/include/asm/Kbuild
> index dadbcf3a0b1e..1dd8b6adff5e 100644
> --- a/arch/powerpc/include/asm/Kbuild
> +++ b/arch/powerpc/include/asm/Kbuild
> @@ -6,5 +6,7 @@ generated-y += syscall_table_spu.h
>  generic-y += export.h
>  generic-y += local64.h
>  generic-y += mcs_spinlock.h
> +generic-y += qrwlock.h
> +generic-y += qspinlock.h
>  generic-y += vtime.h
>  generic-y += early_ioremap.h
> diff --git a/arch/powerpc/include/asm/qspinlock.h 
> b/arch/powerpc/include/asm/qspinlock.h
> new file mode 100644
> index ..c49e33e24edd
> --- /dev/null
> +++ b/arch/powerpc/include/asm/qspinlock.h
> @@ -0,0 +1,25 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef _ASM_POWERPC_QSPINLOCK_H
> +#define _ASM_POWERPC_QSPINLOCK_H
> +
> +#include <asm-generic/qspinlock_types.h>
> +
> +#define _Q_PENDING_LOOPS (1 << 9) /* not tuned */
> +
> +#define smp_mb__after_spinlock()   smp_mb()
> +
> +static __always_inline int queued_spin_is_locked(struct qspinlock *lock)
> +{
> + /*
> +  * This barrier was added to simple spinlocks by commit 51d7d5205d338,
> +  * but it should now be possible to remove it, as arm64 has done with
> +  * commit c6f5d02b6a0f.
> +  */
> + smp_mb();
> + return atomic_read(&lock->val);
> +}
> +#define queued_spin_is_locked queued_spin_is_locked
> +
> +#include <asm-generic/qspinlock.h>
> +
> +#endif /* _ASM_POWERPC_QSPINLOCK_H */
> diff --git a/arch/powerpc/include/asm/spinlock.h 
> b/arch/powerpc/include/asm/spinlock.h
> index 21357fe05fe0..434615f1d761 100644
> --- a/arch/powerpc/include/asm/spinlock.h
> +++ b/arch/powerpc/include/asm/spinlock.h
> @@ -3,7 +3,12 @@
>  #define __ASM_SPINLOCK_H
>  #ifdef __KERNEL__
>  
> +#ifdef CONFIG_PPC_QUEUED_SPINLOCKS
> +#include <asm/qspinlock.h>
> +#include <asm/qrwlock.h>
> +#else
>  #include <asm/simple_spinlock.h>
> +#endif
>  
>  #endif /* __KERNEL__ */
>  #endif /* __ASM_SPINLOCK_H */
> diff --git a/arch/powerpc/include/asm/spinlock_types.h 
> b/arch/powerpc/include/asm/spinlock_types.h
> index 3906f52dae65..c5d742f18021 100644
> --- a/arch/powerpc/include/asm/spinlock_types.h
> +++ b/arch/powerpc/include/asm/spinlock_types.h
> @@ -6,6 +6,11 @@
>  # error "please don't include this file directly"
>  #endif
>  
> +#ifdef CONFIG_PPC_QUEUED_SPINLOCKS
> +#include <asm-generic/qspinlock_types.h>
> +#include <asm-generic/qrwlock_types.h>
> +#else
>  #include <asm/simple_spinlock_types.h>
> +#endif
>  
>  #endif
> diff --git a/arch/powerpc/lib/Makefile b/arch/powerpc/lib/Makefile
> index 5e994cda8e40..d66a645503eb 100644
> --- a/arch/powerpc/lib/Makefile
> +++ b/arch/powerpc/lib/Makefile
> @@ -41,7 +41,10 @@ obj-$(CONFIG_PPC_BOOK3S_64) += copyuser_power7.o 
> copypage_power7.o \
>  obj64-y  += copypage_64.o copyuser_64.o mem_64.o hweight_64.o \
>  memcpy_64.o memcpy_mcsafe_64.o
>  
> +ifndef CONFIG_PPC_QUEUED_SPINLOCKS
>  obj64-$(CONFIG_SMP)  += locks.o
> +endif
> +
>  obj64-$(CONFIG_ALTIVEC)  += vmx-helper.o
>  obj64-$(CONFIG_KPROBES_SANITY_TEST)  += test_emulate_step.o \
>  

Re: [PATCH] powerpc/fault: kernel can extend a user process's stack

2020-07-20 Thread Michal Suchánek
Hello,

On Wed, Dec 11, 2019 at 08:37:21PM +1100, Daniel Axtens wrote:
> > Fixes: 14cf11af6cf6 ("powerpc: Merge enough to start building in
> > arch/powerpc.")
> 
> Wow, that's pretty ancient! I'm also not sure it's right - in that same
> patch, arch/ppc64/mm/fault.c contains:
> 
> ^1da177e4c3f4 (Linus Torvalds 2005-04-16 15:20:36 -0700 213)  
>   if (address + 2048 < uregs->gpr[1]
> ^1da177e4c3f4 (Linus Torvalds 2005-04-16 15:20:36 -0700 214)  
>   && (!user_mode(regs) || !store_updates_sp(regs)))
> ^1da177e4c3f4 (Linus Torvalds 2005-04-16 15:20:36 -0700 215)  
>   goto bad_area;
> 
> Which is the same as the new arch/powerpc/mm/fault.c code:
> 
> 14cf11af6cf60 (Paul Mackerras 2005-09-26 16:04:21 +1000 234)if 
> (address + 2048 < uregs->gpr[1]
> 14cf11af6cf60 (Paul Mackerras 2005-09-26 16:04:21 +1000 235)
> && (!user_mode(regs) || !store_updates_sp(regs)))
> 14cf11af6cf60 (Paul Mackerras 2005-09-26 16:04:21 +1000 236)  
>   goto bad_area;
> 
> So either they're both right or they're both wrong, either way I'm not
> sure how this patch is to blame.

Is there any progress on resolving this?

I did not notice any follow-up patch, nor this one being merged or refuted.

Thanks

Michal

> 
> I guess we should also cc stable@...
> 
> Regards,
> Daniel
> 
> >> Reported-by: Tom Lane 
> >> Cc: Daniel Black 
> >> Signed-off-by: Daniel Axtens 
> >> ---
> >>  arch/powerpc/mm/fault.c | 10 ++
> >>  1 file changed, 10 insertions(+)
> >> 
> >> diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
> >> index b5047f9b5dec..00183731ea22 100644
> >> --- a/arch/powerpc/mm/fault.c
> >> +++ b/arch/powerpc/mm/fault.c
> >> @@ -287,7 +287,17 @@ static bool bad_stack_expansion(struct pt_regs *regs, 
> >> unsigned long address,
> >>if (!res)
> >>return !store_updates_sp(inst);
> >>*must_retry = true;
> >> +  } else if ((flags & FAULT_FLAG_WRITE) &&
> >> + !(flags & FAULT_FLAG_USER)) {
> >> +  /*
> >> +   * the kernel can also attempt to write beyond the end
> >> +   * of a process's stack - for example setting up a
> >> +   * signal frame. We assume this is valid, subject to
> >> +   * the checks in expand_stack() later.
> >> +   */
> >> +  return false;
> >>}
> >> +
> >>return true;
> >>}
> >>return false;
> >> -- 
> >> 2.20.1
> >> 


Re: [PATCH v3] powerpc/pseries: detect secure and trusted boot state of the system.

2020-07-17 Thread Michal Suchánek
On Fri, Jul 17, 2020 at 03:58:01PM +1000, Daniel Axtens wrote:
> Michal Suchánek  writes:
> 
> > On Wed, Jul 15, 2020 at 07:52:01AM -0400, Nayna Jain wrote:
> >> The device-tree property to check secure and trusted boot state is
> >> different for guests (pseries) compared to baremetal (powernv).
> >> 
> >> This patch updates the existing is_ppc_secureboot_enabled() and
> >> is_ppc_trustedboot_enabled() functions to add support for pseries.
> >> 
> >> The secureboot and trustedboot state are exposed via device-tree property:
> >> /proc/device-tree/ibm,secure-boot and /proc/device-tree/ibm,trusted-boot
> >> 
> >> The values of ibm,secure-boot under pseries are interpreted as:
> >   ^^^
> >> 
> >> 0 - Disabled
> >> 1 - Enabled in Log-only mode. This patch interprets this value as
> >> disabled, since audit mode is currently not supported for Linux.
> >> 2 - Enabled and enforced.
> >> 3-9 - Enabled and enforcing; requirements are at the discretion of the
> >> operating system.
> >> 
> >> The values of ibm,trusted-boot under pseries are interpreted as:
> >^^^
> > These two should be different I suppose?
> 
> I'm not quite sure what you mean? They'll be documented in a future
> revision of the PAPR, once I get my act together and submit the
> relevant internal paperwork.

Never mind, one talks about secure boot, the other about trusted boot.

Thanks

Michal


Re: [PATCH net-next] ibmvnic: Increase driver logging

2020-07-16 Thread Michal Suchánek
On Thu, Jul 16, 2020 at 10:59:58AM -0500, Thomas Falcon wrote:
> 
> On 7/15/20 8:29 PM, David Miller wrote:
> > From: Jakub Kicinski 
> > Date: Wed, 15 Jul 2020 17:06:32 -0700
> > 
> > > On Wed, 15 Jul 2020 18:51:55 -0500 Thomas Falcon wrote:
> > > > free_netdev(netdev);
> > > > dev_set_drvdata(&dev->dev, NULL);
> > > > +   netdev_info(netdev, "VNIC client device has been successfully 
> > > > removed.\n");
> > > A step too far, perhaps.
> > > 
> > > In general this patch looks a little questionable IMHO, this amount of
> > > logging output is not commonly seen in drivers. All the the info
> > > messages are just static text, not even carrying any extra information.
> > > In an era of ftrace, and bpftrace, do we really need this?
> > Agreed, this is too much.  This is debugging, and thus suitable for tracing
> > facilities, at best.
> 
> Thanks for your feedback. I see now that I was overly aggressive with this
> patch to be sure, but it would help with narrowing down problems at a first
> glance, should they arise. The driver in its current state logs very little
> of what is it doing without the use of additional debugging or tracing
> facilities. Would it be worth it to pursue a less aggressive version or
> would that be dead on arrival? What are acceptable driver operations to log
> at this level?

Also, would it be advisable to add the messages as pr_debug() so that they can
be enabled on demand?

Thanks

Michal
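
For illustration, a minimal sketch of the pr_debug() approach, assuming the
standard dynamic-debug facility; the function and message below are made up
for the example and are not the actual ibmvnic code:

#include <linux/netdevice.h>

/* With CONFIG_DYNAMIC_DEBUG this is compiled in but stays silent until
 * enabled at runtime, e.g.:
 *   echo 'file ibmvnic.c +p' > /sys/kernel/debug/dynamic_debug/control
 */
static void example_log_remove(struct net_device *netdev)
{
	netdev_dbg(netdev, "VNIC client device removed\n");
}

That keeps the messages out of production logs while still making them
available when narrowing down a problem at first glance.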


Re: [PATCH v3] powerpc/pseries: detect secure and trusted boot state of the system.

2020-07-16 Thread Michal Suchánek
On Wed, Jul 15, 2020 at 07:52:01AM -0400, Nayna Jain wrote:
> The device-tree property to check secure and trusted boot state is
> different for guests (pseries) compared to baremetal (powernv).
> 
> This patch updates the existing is_ppc_secureboot_enabled() and
> is_ppc_trustedboot_enabled() functions to add support for pseries.
> 
> The secureboot and trustedboot state are exposed via device-tree property:
> /proc/device-tree/ibm,secure-boot and /proc/device-tree/ibm,trusted-boot
> 
> The values of ibm,secure-boot under pseries are interpreted as:
  ^^^
> 
> 0 - Disabled
> 1 - Enabled in Log-only mode. This patch interprets this value as
> disabled, since audit mode is currently not supported for Linux.
> 2 - Enabled and enforced.
> 3-9 - Enabled and enforcing; requirements are at the discretion of the
> operating system.
> 
> The values of ibm,trusted-boot under pseries are interpreted as:
   ^^^
These two should be different I suppose?

Thanks

Michal
> 0 - Disabled
> 1 - Enabled
> 
> Signed-off-by: Nayna Jain 
> Reviewed-by: Daniel Axtens 
> ---
> v3:
> * fixed double check. Thanks Daniel for noticing it.
> * updated patch description.
> 
> v2:
> * included Michael Ellerman's feedback.
> * added Daniel Axtens's Reviewed-by.
> 
>  arch/powerpc/kernel/secure_boot.c | 19 +--
>  1 file changed, 17 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/secure_boot.c 
> b/arch/powerpc/kernel/secure_boot.c
> index 4b982324d368..118bcb5f79c4 100644
> --- a/arch/powerpc/kernel/secure_boot.c
> +++ b/arch/powerpc/kernel/secure_boot.c
> @@ -6,6 +6,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  static struct device_node *get_ppc_fw_sb_node(void)
>  {
> @@ -23,12 +24,19 @@ bool is_ppc_secureboot_enabled(void)
>  {
>   struct device_node *node;
>   bool enabled = false;
> + u32 secureboot;
>  
>   node = get_ppc_fw_sb_node();
>   enabled = of_property_read_bool(node, "os-secureboot-enforcing");
> -
>   of_node_put(node);
>  
> + if (enabled)
> + goto out;
> +
> + if (!of_property_read_u32(of_root, "ibm,secure-boot", ))
> + enabled = (secureboot > 1);
> +
> +out:
>   pr_info("Secure boot mode %s\n", enabled ? "enabled" : "disabled");
>  
>   return enabled;
> @@ -38,12 +46,19 @@ bool is_ppc_trustedboot_enabled(void)
>  {
>   struct device_node *node;
>   bool enabled = false;
> + u32 trustedboot;
>  
>   node = get_ppc_fw_sb_node();
>   enabled = of_property_read_bool(node, "trusted-enabled");
> -
>   of_node_put(node);
>  
> + if (enabled)
> + goto out;
> +
> + if (!of_property_read_u32(of_root, "ibm,trusted-boot", ))
> + enabled = (trustedboot > 0);
> +
> +out:
>   pr_info("Trusted boot mode %s\n", enabled ? "enabled" : "disabled");
>  
>   return enabled;
> -- 
> 2.26.2
> 


Re: [PATCH v5 3/3] mm/page_alloc: Keep memoryless cpuless node 0 offline

2020-07-03 Thread Michal Suchánek
On Wed, Jul 01, 2020 at 02:21:10PM +0200, Michal Hocko wrote:
> On Wed 01-07-20 13:30:57, David Hildenbrand wrote:
> > On 01.07.20 13:06, David Hildenbrand wrote:
> > > On 01.07.20 13:01, Srikar Dronamraju wrote:
> > >> * David Hildenbrand  [2020-07-01 12:15:54]:
> > >>
> > >>> On 01.07.20 12:04, Srikar Dronamraju wrote:
> >  * Michal Hocko  [2020-07-01 10:42:00]:
> > 
> > >
> > >>
> > >> 2. Also existence of dummy node also leads to inconsistent 
> > >> information. The
> > >> number of online nodes is inconsistent with the information in the
> > >> device-tree and resource-dump
> > >>
> > >> 3. When the dummy node is present, single node non-Numa systems end 
> > >> up showing
> > >> up as NUMA systems and numa_balancing gets enabled. This will mean 
> > >> we take
> > >> the hit from the unnecessary numa hinting faults.
> > >
> > > I have to say that I dislike the node online/offline state and 
> > > directly
> > > exporting that to the userspace. Users should only care whether the 
> > > node
> > > has memory/cpus. Numa nodes can be online without any memory. Just
> > > offline all the present memory blocks but do not physically hot remove
> > > them and you are in the same situation. If users are confused by an
> > > output of tools like numactl -H then those could be updated and hide
> > > nodes without any memory
> > >
> > > The autonuma problem sounds interesting but again this patch doesn't
> > > really solve the underlying problem because I strongly suspect that 
> > > the
> > > problem is still there when a numa node gets all its memory offline as
> > > mentioned above.
> 
> I would really appreciate a feedback to these two as well.
> 
> > > While I completely agree that making node 0 special is wrong, I have
> > > still hard time to review this very simply looking patch because all 
> > > the
> > > numa initialization is so spread around that this might just blow up
> > > at unexpected places. IIRC we have discussed testing in the previous
> > > version and David has provided a way to emulate these configurations
> > > on x86. Did you manage to use those instruction for additional testing
> > > on other than ppc architectures?
> > >
> > 
> >  I have tried all the steps that David mentioned and reported back at
> >  https://lore.kernel.org/lkml/20200511174731.gd1...@linux.vnet.ibm.com/t/#u
> > 
> >  As a summary, David's steps are still not creating a 
> >  memoryless/cpuless on
> >  x86 VM.
> > >>>
> > >>> Now, that is wrong. You get a memoryless/cpuless node, which is *not
> > >>> online*. Once you hotplug some memory, it will switch online. Once you
> > >>> remove memory, it will switch back offline.
> > >>>
> > >>
> > >> Let me clarify, we are looking for a node 0 which is cpuless/memoryless 
> > >> at
> > >> boot.  The code in question tries to handle a cpuless/memoryless node 0 
> > >> at
> > >> boot.
> > > 
> > > I was just correcting your statement, because it was wrong.
> > > 
> > > Could be that x86 code maps PXM 1 to node 0 because PXM 1 does neither
> > > have CPUs nor memory. That would imply that we can, in fact, never have
> > > node 0 offline during boot.
> > > 
> > 
> > Yep, looks like it.
> > 
> > [0.009726] SRAT: PXM 1 -> APIC 0x00 -> Node 0
> > [0.009727] SRAT: PXM 1 -> APIC 0x01 -> Node 0
> > [0.009727] SRAT: PXM 1 -> APIC 0x02 -> Node 0
> > [0.009728] SRAT: PXM 1 -> APIC 0x03 -> Node 0
> > [0.009731] ACPI: SRAT: Node 0 PXM 1 [mem 0x-0x0009]
> > [0.009732] ACPI: SRAT: Node 0 PXM 1 [mem 0x0010-0xbfff]
> > [0.009733] ACPI: SRAT: Node 0 PXM 1 [mem 0x1-0x13fff]
> 
> This begs a question whether ppc can do the same thing?
Or x86 could stop doing it so that you can see what node you are actually running on?

What's the point of this indirection other than another way of avoiding
empty node 0?

Thanks

Michal
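
As a side note, the distinction argued for above is already expressible with
the standard nodemask API; a minimal sketch, illustrative only and not part
of any patch in this thread:

#include <linux/nodemask.h>
#include <linux/printk.h>

/* An online node is not necessarily populated: N_MEMORY and N_CPU track
 * whether a node actually has usable memory or CPUs, which is what tools
 * like numactl ultimately care about.
 */
static void report_empty_nodes(void)
{
	int nid;

	for_each_online_node(nid)
		if (!node_state(nid, N_MEMORY) && !node_state(nid, N_CPU))
			pr_info("node %d is online but has no memory or CPUs\n", nid);
}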


Re: [PATCH v6 6/8] powerpc/pmem: Avoid the barrier in flush routines

2020-06-30 Thread Michal Suchánek
On Mon, Jun 29, 2020 at 06:50:15PM -0700, Dan Williams wrote:
> On Mon, Jun 29, 2020 at 1:41 PM Aneesh Kumar K.V
>  wrote:
> >
> > Michal Suchánek  writes:
> >
> > > Hello,
> > >
> > > On Mon, Jun 29, 2020 at 07:27:20PM +0530, Aneesh Kumar K.V wrote:
> > >> nvdimm expects the flush routines to just mark the cache clean. The
> > >> barrier that marks the store globally visible is done in nvdimm_flush().
> > >>
> > >> Update the papr_scm driver to a simplified nvdimm_flush callback that
> > >> does only the required barrier.
> > >>
> > >> Signed-off-by: Aneesh Kumar K.V 
> > >> ---
> > >>  arch/powerpc/lib/pmem.c   |  6 --
> > >>  arch/powerpc/platforms/pseries/papr_scm.c | 13 +
> > >>  2 files changed, 13 insertions(+), 6 deletions(-)
> > >>
> > >> diff --git a/arch/powerpc/lib/pmem.c b/arch/powerpc/lib/pmem.c
> > >> index 5a61aaeb6930..21210fa676e5 100644
> > >> --- a/arch/powerpc/lib/pmem.c
> > >> +++ b/arch/powerpc/lib/pmem.c
> > >> @@ -19,9 +19,6 @@ static inline void __clean_pmem_range(unsigned long 
> > >> start, unsigned long stop)
> > >>
> > >>  for (i = 0; i < size >> shift; i++, addr += bytes)
> > >>  asm volatile(PPC_DCBSTPS(%0, %1): :"i"(0), "r"(addr): 
> > >> "memory");
> > >> -
> > >> -
> > >> -asm volatile(PPC_PHWSYNC ::: "memory");
> > >>  }
> > >>
> > >>  static inline void __flush_pmem_range(unsigned long start, unsigned 
> > >> long stop)
> > >> @@ -34,9 +31,6 @@ static inline void __flush_pmem_range(unsigned long 
> > >> start, unsigned long stop)
> > >>
> > >>  for (i = 0; i < size >> shift; i++, addr += bytes)
> > >>  asm volatile(PPC_DCBFPS(%0, %1): :"i"(0), "r"(addr): 
> > >> "memory");
> > >> -
> > >> -
> > >> -asm volatile(PPC_PHWSYNC ::: "memory");
> > >>  }
> > >>
> > >>  static inline void clean_pmem_range(unsigned long start, unsigned long 
> > >> stop)
> > >> diff --git a/arch/powerpc/platforms/pseries/papr_scm.c 
> > >> b/arch/powerpc/platforms/pseries/papr_scm.c
> > >> index 9c569078a09f..9a9a0766f8b6 100644
> > >> --- a/arch/powerpc/platforms/pseries/papr_scm.c
> > >> +++ b/arch/powerpc/platforms/pseries/papr_scm.c
> > >> @@ -630,6 +630,18 @@ static int papr_scm_ndctl(struct 
> > >> nvdimm_bus_descriptor *nd_desc,
> > >>
> > >>  return 0;
> > >>  }
> > >> +/*
> > >> + * We have made sure the pmem writes are done such that before calling 
> > >> this
> > >> + * all the caches are flushed/clean. We use dcbf/dcbfps to ensure this. 
> > >> Here
> > >> + * we just need to add the necessary barrier to make sure the above 
> > >> flushes
> > >> + * have updated persistent storage before any data access or data 
> > >> transfer
> > >> + * caused by subsequent instructions is initiated.
> > >> + */
> > >> +static int papr_scm_flush_sync(struct nd_region *nd_region, struct bio 
> > >> *bio)
> > >> +{
> > >> +arch_pmem_flush_barrier();
> > >> +return 0;
> > >> +}
> > >>
> > >>  static ssize_t flags_show(struct device *dev,
> > >>struct device_attribute *attr, char *buf)
> > >> @@ -743,6 +755,7 @@ static int papr_scm_nvdimm_init(struct papr_scm_priv 
> > >> *p)
> > >>  ndr_desc.mapping = &mapping;
> > >>  ndr_desc.num_mappings = 1;
> > >>  ndr_desc.nd_set = &p->nd_set;
> > >> +ndr_desc.flush = papr_scm_flush_sync;
> > >
> > > AFAICT currently the only device that implements flush is virtio_pmem.
> > > How does the nfit driver get away without implementing flush?
> >
> > generic_nvdimm_flush does the required barrier for nfit. The reason for
> > adding ndr_desc.flush call back for papr_scm was to avoid the usage
> > of iomem based deep flushing (ndr_region_data.flush_wpq) which is not
> > supported by papr_scm.
> >
> > BTW we do return NULL for ndrd_get_flush_wpq() on power. So the upstream
> > code also does the same thing, but in a different way.
> >
> >
> > > Also the flush takes arguments that are completely unused but a user of
> > > the pmem region must assume they are used, and call flush() on the
> > > region rather than arch_pmem_flush_barrier() directly.
> >
> > The bio argument can help a pmem driver to do range based flushing in
> > case of pmem_make_request. If bio is null then we must assume a full
> > device flush.
> 
> The bio argument isn't for range based flushing, it is for flush
> operations that need to complete asynchronously.
How does the block layer determine that the pmem device needs
asynchronous flushing?

The flush() callback was designed for that purpose, with the bio argument, and
only virtio_pmem, which is flushed asynchronously, used it. Now that papr_scm
reuses it for a different purpose, how do you tell?

Thanks

Michal
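
For context, the dispatch in drivers/nvdimm/region_devs.c at the time looked
roughly like the sketch below (paraphrased, not a verbatim quote). The block
layer does not decide by itself: pmem_make_request() ends up in
nvdimm_flush(), which either invokes the region's flush callback (virtio_pmem,
and papr_scm after this series) or falls back to the generic barrier plus
write-pending-queue flush:

/* Rough sketch of the nvdimm_flush() dispatch, not the literal code. */
int nvdimm_flush(struct nd_region *nd_region, struct bio *bio)
{
	if (nd_region->flush)
		return nd_region->flush(nd_region, bio); /* driver callback */

	generic_nvdimm_flush(nd_region); /* wmb() + flush_wpq writes */
	return 0;
}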


Re: [PATCH v6 6/8] powerpc/pmem: Avoid the barrier in flush routines

2020-06-29 Thread Michal Suchánek
Hello,

On Mon, Jun 29, 2020 at 07:27:20PM +0530, Aneesh Kumar K.V wrote:
> nvdimm expects the flush routines to just mark the cache clean. The barrier
> that marks the store globally visible is done in nvdimm_flush().
> 
> Update the papr_scm driver to a simplified nvdimm_flush callback that does
> only the required barrier.
> 
> Signed-off-by: Aneesh Kumar K.V 
> ---
>  arch/powerpc/lib/pmem.c   |  6 --
>  arch/powerpc/platforms/pseries/papr_scm.c | 13 +
>  2 files changed, 13 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/powerpc/lib/pmem.c b/arch/powerpc/lib/pmem.c
> index 5a61aaeb6930..21210fa676e5 100644
> --- a/arch/powerpc/lib/pmem.c
> +++ b/arch/powerpc/lib/pmem.c
> @@ -19,9 +19,6 @@ static inline void __clean_pmem_range(unsigned long start, 
> unsigned long stop)
>  
>   for (i = 0; i < size >> shift; i++, addr += bytes)
>   asm volatile(PPC_DCBSTPS(%0, %1): :"i"(0), "r"(addr): "memory");
> -
> -
> - asm volatile(PPC_PHWSYNC ::: "memory");
>  }
>  
>  static inline void __flush_pmem_range(unsigned long start, unsigned long 
> stop)
> @@ -34,9 +31,6 @@ static inline void __flush_pmem_range(unsigned long start, 
> unsigned long stop)
>  
>   for (i = 0; i < size >> shift; i++, addr += bytes)
>   asm volatile(PPC_DCBFPS(%0, %1): :"i"(0), "r"(addr): "memory");
> -
> -
> - asm volatile(PPC_PHWSYNC ::: "memory");
>  }
>  
>  static inline void clean_pmem_range(unsigned long start, unsigned long stop)
> diff --git a/arch/powerpc/platforms/pseries/papr_scm.c 
> b/arch/powerpc/platforms/pseries/papr_scm.c
> index 9c569078a09f..9a9a0766f8b6 100644
> --- a/arch/powerpc/platforms/pseries/papr_scm.c
> +++ b/arch/powerpc/platforms/pseries/papr_scm.c
> @@ -630,6 +630,18 @@ static int papr_scm_ndctl(struct nvdimm_bus_descriptor 
> *nd_desc,
>  
>   return 0;
>  }
> +/*
> + * We have made sure the pmem writes are done such that before calling this
> + * all the caches are flushed/clean. We use dcbf/dcbfps to ensure this. Here
> + * we just need to add the necessary barrier to make sure the above flushes
> + * have updated persistent storage before any data access or data 
> transfer
> + * caused by subsequent instructions is initiated.
> + */
> +static int papr_scm_flush_sync(struct nd_region *nd_region, struct bio *bio)
> +{
> + arch_pmem_flush_barrier();
> + return 0;
> +}
>  
>  static ssize_t flags_show(struct device *dev,
> struct device_attribute *attr, char *buf)
> @@ -743,6 +755,7 @@ static int papr_scm_nvdimm_init(struct papr_scm_priv *p)
>   ndr_desc.mapping = &mapping;
>   ndr_desc.num_mappings = 1;
>   ndr_desc.nd_set = &p->nd_set;
> + ndr_desc.flush = papr_scm_flush_sync;

AFAICT currently the only device that implements flush is virtio_pmem.
How does the nfit driver get away without implementing flush?
Also, the flush callback takes arguments that are completely unused, but a user
of the pmem region must assume they are used, and call flush() on the
region rather than arch_pmem_flush_barrier() directly. This may not
work well with md, as discussed with an earlier iteration of the patchset.

Thanks

Michal


Re: [PATCH v2 3/5] libnvdimm/nvdimm/flush: Allow architecture to override the flush barrier

2020-06-26 Thread Michal Suchánek
On Fri, May 22, 2020 at 09:01:17AM -0400, Mikulas Patocka wrote:
> 
> 
> On Fri, 22 May 2020, Aneesh Kumar K.V wrote:
> 
> > On 5/22/20 3:01 PM, Michal Suchánek wrote:
> > > On Thu, May 21, 2020 at 02:52:30PM -0400, Mikulas Patocka wrote:
> > > > 
> > > > 
> > > > On Thu, 21 May 2020, Dan Williams wrote:
> > > > 
> > > > > On Thu, May 21, 2020 at 10:03 AM Aneesh Kumar K.V
> > > > >  wrote:
> > > > > > 
> > > > > > > Moving on to the patch itself--Aneesh, have you audited other
> > > > > > > persistent
> > > > > > > memory users in the kernel?  For example, 
> > > > > > > drivers/md/dm-writecache.c
> > > > > > > does
> > > > > > > this:
> > > > > > > 
> > > > > > > static void writecache_commit_flushed(struct dm_writecache *wc, 
> > > > > > > bool
> > > > > > > wait_for_ios)
> > > > > > > {
> > > > > > >if (WC_MODE_PMEM(wc))
> > > > > > >wmb(); <==
> > > > > > >   else
> > > > > > >   ssd_commit_flushed(wc, wait_for_ios);
> > > > > > > }
> > > > > > > 
> > > > > > > I believe you'll need to make modifications there.
> > > > > > > 
> > > > > > 
> > > > > > Correct. Thanks for catching that.
> > > > > > 
> > > > > > 
> > > > > > I don't understand dm much, wondering how this will work with
> > > > > > non-synchronous DAX device?
> > > > > 
> > > > > That's a good point. DM-writecache needs to be cognizant of things
> > > > > like virtio-pmem that violate the rule that persisent memory writes
> > > > > can be flushed by CPU functions rather than calling back into the
> > > > > driver. It seems we need to always make the flush case a dax_operation
> > > > > callback to account for this.
> > > > 
> > > > dm-writecache is normally sitting on the top of dm-linear, so it would
> > > > need to pass the wmb() call through the dm core and dm-linear target ...
> > > > that would slow it down ... I remember that you already did it this way
> > > > some time ago and then removed it.
> > > > 
> > > > What's the exact problem with POWER? Could the POWER system have two 
> > > > types
> > > > of persistent memory that need two different ways of flushing?
> > > 
> > > As far as I understand the discussion so far
> > > 
> > >   - on POWER $oldhardware uses $oldinstruction to ensure pmem consistency
> > >   - on POWER $newhardware uses $newinstruction to ensure pmem consistency
> > > (compatible with $oldinstruction on $oldhardware)
> > 
> > Correct.
> > 
> > >   - on some platforms instead of barrier instruction a callback into the
> > > driver is issued to ensure consistency 
> > 
> > This is virtio-pmem only at this point IIUC.
> > 
> > -aneesh
> 
> And does the virtio-pmem driver track which pages are dirty? Or does it 
> need to specify the range of pages to flush in the flush function?
> 
> > > None of this is reflected by the dm driver.
> 
> We could make a new dax method:
> void *(dax_get_flush_function)(void);
> 
> This would return a pointer to "wmb()" on x86 and something else on Power.
> 
> The method "dax_get_flush_function" would be called only once when 
> initializing the writecache driver (because the call would be slow because 
> it would have to go through the DM stack) and then, the returned function 
> would be called each time we need write ordering. The returned function 
> would do just "sfence; ret".

Hello,

as far as I understand the code, virtio_pmem has a flush function defined
which indeed can make use of the region properties, such as the memory
range. If such a function exists you need an equivalent of sync() - a call
into the device in question. If it does not exist, calling
arch_pmem_flush_barrier() instead of wmb() should suffice.

I am not aware of an interface to determine if the flush function exists
for a particular region.

Thanks

Michal
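
To make that concrete, a sketch of the kind of change dm-writecache would
need, assuming the arch_pmem_flush_barrier() helper proposed in this series;
this is not the fix that was actually merged:

/* Sketch only: replace the bare wmb() with the proposed pmem barrier.
 * This still does NOT cover devices such as virtio-pmem, whose flush has
 * to go through a driver callback rather than a CPU instruction.
 */
static void writecache_commit_flushed(struct dm_writecache *wc, bool wait_for_ios)
{
	if (WC_MODE_PMEM(wc))
		arch_pmem_flush_barrier();	/* was: wmb() */
	else
		ssd_commit_flushed(wc, wait_for_ios);
}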


Re: ppc64le and 32-bit LE userland compatibility

2020-06-02 Thread Michal Suchánek
On Tue, Jun 02, 2020 at 05:40:39PM +0200, Daniel Kolesa wrote:
> 
> 
> On Tue, Jun 2, 2020, at 17:27, Michal Suchánek wrote:
> > On Tue, Jun 02, 2020 at 05:13:25PM +0200, Daniel Kolesa wrote:
> > > On Tue, Jun 2, 2020, at 16:23, Michal Suchánek wrote:
> > > > On Tue, Jun 02, 2020 at 01:40:23PM +, Joseph Myers wrote:
> > > > > On Tue, 2 Jun 2020, Daniel Kolesa wrote:
> > > > > 
> > > > > > not be limited to being just userspace under ppc64le, but should be 
> > > > > > runnable on a native kernel as well, which should not be limited to 
> > > > > > any 
> > > > > > particular baseline other than just PowerPC.
> > > > > 
> > > > > This is a fairly unusual approach to bringing up a new ABI.  Since 
> > > > > new 
> > > > > ABIs are more likely to be used on new systems rather than switching 
> > > > > ABI 
> > > > > on an existing installation, and since it can take quite some time 
> > > > > for all 
> > > > > the software support for a new ABI to become widely available in 
> > > > > distributions, people developing new ABIs are likely to think about 
> > > > > what 
> > > > > new systems are going to be relevant in a few years' time when 
> > > > > working out 
> > > > > the minimum hardware requirements for the new ABI.  (The POWER8 
> > > > > minimum 
> > > > > for powerpc64le fits in with that, for example.)
> > > > That means that you cannot run ppc64le on FSL embedded CPUs (which lack
> > > > the vector instructions in LE mode). Which may be fine with you but
> > > > other people may want to support these. Can't really say if that's good
> > > > idea or not but I don't foresee them going away in a few years, either.
> > > 
> > > well, ppc64le already cannot be run on those, as far as I know (I don't 
> > > think it's possible to build ppc64le userland without VSX in any 
> > > configuration)
> > 
> > What hardware are you targetting then? I did not notice anything
> > specific mentioned in the thread.
> > 
> > Naturally on POWER the first cpu that has LE support is POWER8 so you
> > can count on all other POWER8 features to be present. With other
> > architecture variants the situation is different.
> 
> This is not true; nearly every 32-bit PowerPC CPU has LE support (all the way 
> back to 6xx), these would be the native-hardware targets for the port (would 
> need kernel support implemented, but it's technically possible).
I find dealing with memory management issues on 32bit architectures a
pain. There is never enough address space.
> 
> As far as 64-bit CPUs go, POWER7 is the first one that could in practice run 
> the current ppc64le configuration, but in glibc it's limited to POWER8 and in 
> gcc the default for powerpc64le is also POWER8 (however, it is perfectly 
> possible to configure gcc for POWER7 and use musl libc with it).
That's interesting. I guess I was tricked by the glibc limitation.

Thanks

Michal


Re: ppc64le and 32-bit LE userland compatibility

2020-06-02 Thread Michal Suchánek
On Tue, Jun 02, 2020 at 05:13:25PM +0200, Daniel Kolesa wrote:
> On Tue, Jun 2, 2020, at 16:23, Michal Suchánek wrote:
> > On Tue, Jun 02, 2020 at 01:40:23PM +, Joseph Myers wrote:
> > > On Tue, 2 Jun 2020, Daniel Kolesa wrote:
> > > 
> > > > not be limited to being just userspace under ppc64le, but should be 
> > > > runnable on a native kernel as well, which should not be limited to any 
> > > > particular baseline other than just PowerPC.
> > > 
> > > This is a fairly unusual approach to bringing up a new ABI.  Since new 
> > > ABIs are more likely to be used on new systems rather than switching ABI 
> > > on an existing installation, and since it can take quite some time for 
> > > all 
> > > the software support for a new ABI to become widely available in 
> > > distributions, people developing new ABIs are likely to think about what 
> > > new systems are going to be relevant in a few years' time when working 
> > > out 
> > > the minimum hardware requirements for the new ABI.  (The POWER8 minimum 
> > > for powerpc64le fits in with that, for example.)
> > That means that you cannot run ppc64le on FSL embedded CPUs (which lack
> > the vector instructions in LE mode). Which may be fine with you but
> > other people may want to support these. Can't really say if that's good
> > idea or not but I don't foresee them going away in a few years, either.
> 
> well, ppc64le already cannot be run on those, as far as I know (I don't think 
> it's possible to build ppc64le userland without VSX in any configuration)

What hardware are you targetting then? I did not notice anything
specific mentioned in the thread.

Naturally on POWER the first cpu that has LE support is POWER8 so you
can count on all other POWER8 features to be present. With other
architecture variants the situation is different.

Thanks

Michal


Re: ppc64le and 32-bit LE userland compatibility

2020-06-02 Thread Michal Suchánek
On Tue, Jun 02, 2020 at 01:40:23PM +, Joseph Myers wrote:
> On Tue, 2 Jun 2020, Daniel Kolesa wrote:
> 
> > not be limited to being just userspace under ppc64le, but should be 
> > runnable on a native kernel as well, which should not be limited to any 
> > particular baseline other than just PowerPC.
> 
> This is a fairly unusual approach to bringing up a new ABI.  Since new 
> ABIs are more likely to be used on new systems rather than switching ABI 
> on an existing installation, and since it can take quite some time for all 
> the software support for a new ABI to become widely available in 
> distributions, people developing new ABIs are likely to think about what 
> new systems are going to be relevant in a few years' time when working out 
> the minimum hardware requirements for the new ABI.  (The POWER8 minimum 
> for powerpc64le fits in with that, for example.)
That means that you cannot run ppc64le on FSL embedded CPUs (which lack
the vector instructions in LE mode). Which may be fine with you but
other people may want to support these. Can't really say if that's good
idea or not but I don't foresee them going away in a few years, either.

Thanks

Michal


Re: [RFC PATCH 1/2] libnvdimm: Add prctl control for disabling synchronous fault support.

2020-06-01 Thread Michal Suchánek
On Mon, Jun 01, 2020 at 05:31:50PM +0530, Aneesh Kumar K.V wrote:
> On 6/1/20 3:39 PM, Jan Kara wrote:
> > On Fri 29-05-20 16:25:35, Aneesh Kumar K.V wrote:
> > > On 5/29/20 3:22 PM, Jan Kara wrote:
> > > > On Fri 29-05-20 15:07:31, Aneesh Kumar K.V wrote:
> > > > > Thanks Michal. I also missed Jeff in this email thread.
> > > > 
> > > > And I think you'll also need some of the sched maintainers for the prctl
> > > > bits...
> > > > 
> > > > > On 5/29/20 3:03 PM, Michal Suchánek wrote:
> > > > > > Adding Jan
> > > > > > 
> > > > > > On Fri, May 29, 2020 at 11:11:39AM +0530, Aneesh Kumar K.V wrote:
> > > > > > > With POWER10, architecture is adding new pmem flush and sync 
> > > > > > > instructions.
> > > > > > > The kernel should prevent the usage of MAP_SYNC if applications 
> > > > > > > are not using
> > > > > > > the new instructions on newer hardware.
> > > > > > > 
> > > > > > > This patch adds a prctl option MAP_SYNC_ENABLE that can be used 
> > > > > > > to enable
> > > > > > > the usage of MAP_SYNC. The kernel config option is added to allow 
> > > > > > > the user
> > > > > > > to control whether MAP_SYNC should be enabled by default or not.
> > > > > > > 
> > > > > > > Signed-off-by: Aneesh Kumar K.V 
> > > > ...
> > > > > > > diff --git a/kernel/fork.c b/kernel/fork.c
> > > > > > > index 8c700f881d92..d5a9a363e81e 100644
> > > > > > > --- a/kernel/fork.c
> > > > > > > +++ b/kernel/fork.c
> > > > > > > @@ -963,6 +963,12 @@ __cacheline_aligned_in_smp 
> > > > > > > DEFINE_SPINLOCK(mmlist_lock);
> > > > > > > static unsigned long default_dump_filter = 
> > > > > > > MMF_DUMP_FILTER_DEFAULT;
> > > > > > > +#ifdef CONFIG_ARCH_MAP_SYNC_DISABLE
> > > > > > > +unsigned long default_map_sync_mask = MMF_DISABLE_MAP_SYNC_MASK;
> > > > > > > +#else
> > > > > > > +unsigned long default_map_sync_mask = 0;
> > > > > > > +#endif
> > > > > > > +
> > > > 
> > > > I'm not sure CONFIG is really the right approach here. For a distro 
> > > > that would
> > > > basically mean to disable MAP_SYNC for all PPC kernels unless 
> > > > application
> > > > explicitly uses the right prctl. Shouldn't we rather initialize
> > > > default_map_sync_mask on boot based on whether the CPU we run on 
> > > > requires
> > > > new flush instructions or not? Otherwise the patch looks sensible.
> > > > 
> > > 
> > > yes that is correct. We ideally want to deny MAP_SYNC only w.r.t POWER10.
> > > But on a virtualized platform there is no easy way to detect that. We 
> > > could
> > > ideally hook this into the nvdimm driver where we look at the new compat
> > > string ibm,persistent-memory-v2 and then disable MAP_SYNC
> > > if we find a device with the specific value.
> > 
> > Hum, couldn't we set some flag for nvdimm devices with
> > "ibm,persistent-memory-v2" property and then check it during mmap(2) time
> > and when the device has this propery and the mmap(2) caller doesn't have
> > the prctl set, we'd disallow MAP_SYNC? That should make things mostly
> > seamless, shouldn't it? Only apps that want to use MAP_SYNC on these
> > devices would need to use prctl(MMF_DISABLE_MAP_SYNC, 0) but then these
> > applications need to be aware of new instructions so this isn't that much
> > additional burden...
> 
> I am not sure application would want to add that much details/knowledge
> about a platform in their code. I was expecting application to do
> 
> #ifdef __ppc64__
> prctl(MAP_SYNC_ENABLE, 1, 0, 0, 0);
> #endif
> a = mmap(NULL, PAGE_SIZE, PROT_READ|PROT_WRITE,
> MAP_SHARED_VALIDATE | MAP_SYNC, fd, 0);
> 
> 
> For that code all the complexity that we add w.r.t ibm,persistent-memory-v2
> is not useful. Do you see a value in making all these device specific rather
> than a conditional on  __ppc64__?
If the vpmem devices continue to work with the old instruction on
POWER10, then it makes sense to make this per-device.

Also, adding a message to the kernel log in case the application does not do
the prctl would be helpful for people migrating old code to POWER10.

Thanks

Michal
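
For completeness, a userspace sketch of the pattern discussed above. The
PR_SET_MAP_SYNC_ENABLE constant is taken from this RFC and is not in
mainline; the EOPNOTSUPP fallback is what portable MAP_SYNC users need in
any case, since MAP_SHARED_VALIDATE rejects unsupported flags:

#include <sys/mman.h>
#include <sys/prctl.h>
#include <errno.h>

#ifndef PR_SET_MAP_SYNC_ENABLE
#define PR_SET_MAP_SYNC_ENABLE 59	/* value proposed in this RFC only */
#endif
#ifndef MAP_SYNC
#define MAP_SYNC 0x080000
#define MAP_SHARED_VALIDATE 0x03
#endif

static void *map_pmem(int fd, size_t len)
{
	void *p;

#ifdef __powerpc64__
	/* Assert that this binary uses the new flush instructions. */
	prctl(PR_SET_MAP_SYNC_ENABLE, 1, 0, 0, 0);
#endif
	p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		 MAP_SHARED_VALIDATE | MAP_SYNC, fd, 0);
	if (p == MAP_FAILED && errno == EOPNOTSUPP)
		/* No synchronous faults: fall back and call fsync() instead. */
		p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	return p;
}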


Re: [RFC PATCH 1/2] libnvdimm: Add prctl control for disabling synchronous fault support.

2020-05-29 Thread Michal Suchánek
Adding Jan

On Fri, May 29, 2020 at 11:11:39AM +0530, Aneesh Kumar K.V wrote:
> With POWER10, architecture is adding new pmem flush and sync instructions.
> The kernel should prevent the usage of MAP_SYNC if applications are not using
> the new instructions on newer hardware.
> 
> This patch adds a prctl option MAP_SYNC_ENABLE that can be used to enable
> the usage of MAP_SYNC. The kernel config option is added to allow the user
> to control whether MAP_SYNC should be enabled by default or not.
> 
> Signed-off-by: Aneesh Kumar K.V 
> ---
>  include/linux/sched/coredump.h | 13 ++---
>  include/uapi/linux/prctl.h |  3 +++
>  kernel/fork.c  |  8 +++-
>  kernel/sys.c   | 18 ++
>  mm/Kconfig |  3 +++
>  mm/mmap.c  |  4 
>  6 files changed, 45 insertions(+), 4 deletions(-)
> 
> diff --git a/include/linux/sched/coredump.h b/include/linux/sched/coredump.h
> index ecdc6542070f..9ba6b3d5f991 100644
> --- a/include/linux/sched/coredump.h
> +++ b/include/linux/sched/coredump.h
> @@ -72,9 +72,16 @@ static inline int get_dumpable(struct mm_struct *mm)
>  #define MMF_DISABLE_THP  24  /* disable THP for all VMAs */
>  #define MMF_OOM_VICTIM   25  /* mm is the oom victim */
>  #define MMF_OOM_REAP_QUEUED  26  /* mm was queued for oom_reaper */
> -#define MMF_DISABLE_THP_MASK (1 << MMF_DISABLE_THP)
> +#define MMF_DISABLE_MAP_SYNC 27  /* disable MAP_SYNC for all VMAs */
> +#define MMF_DISABLE_THP_MASK (1 << MMF_DISABLE_THP)
> +#define MMF_DISABLE_MAP_SYNC_MASK (1 << MMF_DISABLE_MAP_SYNC)
>  
> -#define MMF_INIT_MASK (MMF_DUMPABLE_MASK | MMF_DUMP_FILTER_MASK |\
> -  MMF_DISABLE_THP_MASK)
> +#define MMF_INIT_MASK (MMF_DUMPABLE_MASK | MMF_DUMP_FILTER_MASK | \
> + MMF_DISABLE_THP_MASK | MMF_DISABLE_MAP_SYNC_MASK)
> +
> +static inline bool map_sync_enabled(struct mm_struct *mm)
> +{
> + return !(mm->flags & MMF_DISABLE_MAP_SYNC_MASK);
> +}
>  
>  #endif /* _LINUX_SCHED_COREDUMP_H */
> diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h
> index 07b4f8131e36..ee4cde32d5cf 100644
> --- a/include/uapi/linux/prctl.h
> +++ b/include/uapi/linux/prctl.h
> @@ -238,4 +238,7 @@ struct prctl_mm_map {
>  #define PR_SET_IO_FLUSHER57
>  #define PR_GET_IO_FLUSHER58
>  
> +#define PR_SET_MAP_SYNC_ENABLE   59
> +#define PR_GET_MAP_SYNC_ENABLE   60
> +
>  #endif /* _LINUX_PRCTL_H */
> diff --git a/kernel/fork.c b/kernel/fork.c
> index 8c700f881d92..d5a9a363e81e 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -963,6 +963,12 @@ __cacheline_aligned_in_smp DEFINE_SPINLOCK(mmlist_lock);
>  
>  static unsigned long default_dump_filter = MMF_DUMP_FILTER_DEFAULT;
>  
> +#ifdef CONFIG_ARCH_MAP_SYNC_DISABLE
> +unsigned long default_map_sync_mask = MMF_DISABLE_MAP_SYNC_MASK;
> +#else
> +unsigned long default_map_sync_mask = 0;
> +#endif
> +
>  static int __init coredump_filter_setup(char *s)
>  {
>   default_dump_filter =
> @@ -1039,7 +1045,7 @@ static struct mm_struct *mm_init(struct mm_struct *mm, 
> struct task_struct *p,
>   mm->flags = current->mm->flags & MMF_INIT_MASK;
>   mm->def_flags = current->mm->def_flags & VM_INIT_DEF_MASK;
>   } else {
> - mm->flags = default_dump_filter;
> + mm->flags = default_dump_filter | default_map_sync_mask;
>   mm->def_flags = 0;
>   }
>  
> diff --git a/kernel/sys.c b/kernel/sys.c
> index d325f3ab624a..f6127cf4128b 100644
> --- a/kernel/sys.c
> +++ b/kernel/sys.c
> @@ -2450,6 +2450,24 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, 
> arg2, unsigned long, arg3,
> >   clear_bit(MMF_DISABLE_THP, &me->mm->flags);
> >   up_write(&me->mm->mmap_sem);
>   break;
> +
> + case PR_GET_MAP_SYNC_ENABLE:
> + if (arg2 || arg3 || arg4 || arg5)
> + return -EINVAL;
> > + error = !test_bit(MMF_DISABLE_MAP_SYNC, &me->mm->flags);
> + break;
> + case PR_SET_MAP_SYNC_ENABLE:
> + if (arg3 || arg4 || arg5)
> + return -EINVAL;
> > + if (down_write_killable(&me->mm->mmap_sem))
> > + return -EINTR;
> > + if (arg2)
> > + clear_bit(MMF_DISABLE_MAP_SYNC, &me->mm->flags);
> > + else
> > + set_bit(MMF_DISABLE_MAP_SYNC, &me->mm->flags);
> > + up_write(&me->mm->mmap_sem);
> + break;
> +
>   case PR_MPX_ENABLE_MANAGEMENT:
>   case PR_MPX_DISABLE_MANAGEMENT:
>   /* No longer implemented: */
> diff --git a/mm/Kconfig b/mm/Kconfig
> index c1acc34c1c35..38fd7cfbfca8 100644
> --- a/mm/Kconfig
> +++ b/mm/Kconfig
> @@ -867,4 +867,7 @@ config ARCH_HAS_HUGEPD
>  config MAPPING_DIRTY_HELPERS
>  bool
>  
> +config ARCH_MAP_SYNC_DISABLE
> +

Re: [PATCH v2 3/5] libnvdimm/nvdimm/flush: Allow architecture to override the flush barrier

2020-05-22 Thread Michal Suchánek
On Thu, May 21, 2020 at 02:52:30PM -0400, Mikulas Patocka wrote:
> 
> 
> On Thu, 21 May 2020, Dan Williams wrote:
> 
> > On Thu, May 21, 2020 at 10:03 AM Aneesh Kumar K.V
> >  wrote:
> > >
> > > > Moving on to the patch itself--Aneesh, have you audited other persistent
> > > > memory users in the kernel?  For example, drivers/md/dm-writecache.c 
> > > > does
> > > > this:
> > > >
> > > > static void writecache_commit_flushed(struct dm_writecache *wc, bool 
> > > > wait_for_ios)
> > > > {
> > > >   if (WC_MODE_PMEM(wc))
> > > >   wmb(); <==
> > > >  else
> > > >  ssd_commit_flushed(wc, wait_for_ios);
> > > > }
> > > >
> > > > I believe you'll need to make modifications there.
> > > >
> > >
> > > Correct. Thanks for catching that.
> > >
> > >
> > > I don't understand dm much, wondering how this will work with
> > > non-synchronous DAX device?
> > 
> > That's a good point. DM-writecache needs to be cognizant of things
> > like virtio-pmem that violate the rule that persistent memory writes
> > can be flushed by CPU functions rather than calling back into the
> > driver. It seems we need to always make the flush case a dax_operation
> > callback to account for this.
> 
> dm-writecache is normally sitting on the top of dm-linear, so it would 
> need to pass the wmb() call through the dm core and dm-linear target ... 
> that would slow it down ... I remember that you already did it this way 
> some time ago and then removed it.
> 
> What's the exact problem with POWER? Could the POWER system have two types 
> of persistent memory that need two different ways of flushing?

As far as I understand the discussion so far

 - on POWER $oldhardware uses $oldinstruction to ensure pmem consistency
 - on POWER $newhardware uses $newinstruction to ensure pmem consistency
   (compatible with $oldinstruction on $oldhardware)
 - on some platforms instead of barrier instruction a callback into the
   driver is issued to ensure consistency

None of this is reflected by the dm driver.

Thanks

Michal


Re: crash in cpuidle_enter_state with 5.7-rc1

2020-04-21 Thread Michal Suchánek
On Tue, Apr 21, 2020 at 10:21:52PM +1000, Michael Ellerman wrote:
> Michal Suchánek  writes:
> > On Mon, Apr 20, 2020 at 08:50:30AM +0200, Michal Suchánek wrote:
> >> On Mon, Apr 20, 2020 at 04:15:39PM +1000, Michael Ellerman wrote:
> >> > Michal Suchánek  writes:
> > ...
> >> > 
> >> > 
> >> > And I've just hit it with your config on a machine here, but the crash
> >> > is different:
> >> That does not look like it.
> >> You don't have this part in the stack trace:
> >> > [1.234899] [c7597420] [] 0x0
> >> > [1.234908] [c7597720] [0a6d] 0xa6d
> >> > [1.234919] [c7597a20] [] 0x0
> >> > [1.234931] [c7597d20] [0004] 0x4
> >> which is somewhat random but at least one such line is always present in
> >> the traces I get. Also I always get a crash in cpuidle_enter_state
> > ..
> >> > I'm going to guess it's STRICT_KERNEL_RWX that's at fault.
> >> I can try without that as well.
> >
> > Can't reproduce without STRICT_KERNEL_RWX either.
> 
> I've reproduced something similar all the way back to v5.5, though it
> seems harder to hit - sometimes 5 boots will succeed before one fails.
I only tried 3 times because I do not have automation in place to
capture these early crashes. I suppose I could tell the kernel to not
reboot on panic and try rebooting several times.
> 
> Are you testing on top of PowerVM or KVM?
PowerVM.

Thanks

Michal


Re: crash in cpuidle_enter_state with 5.7-rc1

2020-04-20 Thread Michal Suchánek
On Mon, Apr 20, 2020 at 08:50:30AM +0200, Michal Suchánek wrote:
> Hello,
> 
> On Mon, Apr 20, 2020 at 04:15:39PM +1000, Michael Ellerman wrote:
> > Michal Suchánek  writes:
...
> > 
> > 
> > And I've just hit it with your config on a machine here, but the crash
> > is different:
> That does not look like it.
> You don't have this part in the stack trace:
> > [1.234899] [c7597420] [] 0x0
> > [1.234908] [c7597720] [0a6d] 0xa6d
> > [1.234919] [c7597a20] [] 0x0
> > [1.234931] [c7597d20] [0004] 0x4
> which is somewhat random but at least one such line is always present in
> the traces I get. Also I always get a crash in cpuidle_enter_state
..
> > I'm going to guess it's STRICT_KERNEL_RWX that's at fault.
> I can try without that as well.
Can't reproduce without STRICT_KERNEL_RWX either.

Thanks

Michal


Re: crash in cpuidle_enter_state with 5.7-rc1

2020-04-20 Thread Michal Suchánek
Hello,

On Mon, Apr 20, 2020 at 04:15:39PM +1000, Michael Ellerman wrote:
> Michal Suchánek  writes:
> > Hello,
> >
> > I observe crash in cpuidle_enter_state in early boot on POWER9 pSeries
> > machine with 5.7-rc1 kernel. The crash is not 100% reliable. Sometimes
> > the machine boots.
> >
> > Attaching config, dmesg, and sample crash message. The stack below
> > cpuidle_enter_state appears random - different in each crash.
> >
> > Any idea what could cause this?
> 
> Nothing immediately springs to mind.
> 
> > Preparing to boot Linux version 5.7.0-rc1-1.g8f6a41f-default 
> > (geeko@buildhost) (gcc version 9.3.1 20200406 [revision 
> > 6db837a5288ee3ca5ec504fbd5a765817e556ac2] (SUSE Linux), GNU ld (GNU 
> > Binutils; openSUSE Tumbleweed) 2.34.0.20200325-1) #1 SMP Fri Apr 17 
> > 10:39:25 UTC 2020 (8f6a41f)
> > Detected machine type: 0101
> > command line: BOOT_IMAGE=/boot/vmlinux-5.7.0-rc1-1.g8f6a41f-default 
> > root=UUID=04f3f652-7c85-470b-9d5f-490601f371f8 mitigations=auto quiet 
> > crashkernel=242M
> > Max number of cores passed to firmware: 256 (NR_CPUS = 2048)
> > Calling ibm,client-architecture-support... done
> > memory layout at init:
> >   memory_limit :  (16 MB aligned)
> >   alloc_bottom : 0e68
> >   alloc_top: 2000
> >   alloc_top_hi : 2000
> >   rmo_top  : 2000
> >   ram_top  : 2000
> > instantiating rtas at 0x1ecb... done
> > prom_hold_cpus: skipped
> > copying OF device tree...
> > Building dt strings...
> > Building dt structure...
> > Device tree strings 0x0e69 -> 0x0e691886
> > Device tree struct  0x0e6a -> 0x0e6b
> > Quiescing Open Firmware ...
> > Booting Linux via __start() @ 0x0a6e ...
> > [1.234639] BUG: Unable to handle kernel data access on read at 
> > 0xc26970e0
> > [1.234654] Faulting instruction address: 0xc00088dc
> > [1.234665] Oops: Kernel access of bad area, sig: 11 [#1]
> > [1.234675] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
> > [1.234686] Modules linked in:
> > [1.234698] CPU: 4 PID: 0 Comm: swapper/4 Not tainted 
> > 5.7.0-rc1-1.g8f6a41f-default #1 openSUSE Tumbleweed (unreleased)
> > [1.234714] NIP:  c00088dc LR: c0aad890 CTR: 
> > c00087a0
> > [1.234727] REGS: c7596e90 TRAP: 0300   Not tainted  
> > (5.7.0-rc1-1.g8f6a41f-default)
> > [1.234742] MSR:  80001033   CR: 2822  
> > XER: 
> 
> MMU was on when we faulted (IR & DR), so it's not real mode weirdness.
> 
> > [1.234760] CFAR: c00087fc DAR: c26970e0 DSISR: 4000 
> > IRQMASK: 0
> > [1.234760] GPR00: c0aa9384 c7597120 c269f000 
> > 
> > [1.234760] GPR04: c25d2778  c800 
> > c26d69c0
> > [1.234760] GPR08: 000e98056e8436d9 0300  
> > 
> > [1.234760] GPR12: 80001033 c0001ec79c00  
> > 1ef3d880
> > [1.234760] GPR16:   c0058990 
> > 
> > [1.234760] GPR20: c25d2778 c003ffa967c8 0001 
> > 0008
> > [1.234760] GPR24: c7538100   
> > 4995ca42
> > [1.234760] GPR28: c25d2778 c003ffa967c8 04945388 
> > 
> > [1.234869] NIP [c00088dc] data_access_common_virt+0x13c/0x170
> > [1.234882] LR [c0aad890] snooze_loop+0x70/0x220
> > [1.234888] Call Trace:
> > [1.234899] [c7597420] [] 0x0
> > [1.234908] [c7597720] [0a6d] 0xa6d
> > [1.234919] [c7597a20] [] 0x0
> > [1.234931] [c7597d20] [0004] 0x4
> > [1.234943] [c7597d50] [c0aa9384] 
> > cpuidle_enter_state+0xa4/0x590
> > [1.234954] Freeing unused kernel memory: 5312K
> > [1.234958] [c7597dd0] [c0aa990c] cpuidle_enter+0x4c/0x70
> > [1.234965] [c7597e10] [c019635c] call_cpuidle+0x4c/0x90
> > [1.234969] [c7597e30] [c0196978] do_idle+0x308/0x420
> > [1.234973] [c7597ed0] [c0196cd8] 
> > cpu_startup_entry+0x38/0x40
> > [1.234977] [c7597f00]

Re: CVE-2020-11669: Linux kernel 4.10 to 5.1: powerpc: guest can cause DoS on POWER9 KVM hosts

2020-04-15 Thread Michal Suchánek
On Wed, Apr 15, 2020 at 10:52:53PM +1000, Andrew Donnellan wrote:
> The Linux kernel for powerpc from v4.10 to v5.1 has a bug where the
> Authority Mask Register (AMR), Authority Mask Override Register (AMOR) and
> User Authority Mask Override Register (UAMOR) are not correctly saved and
> restored when the CPU is going into/coming out of idle state.
> 
> On POWER9 CPUs, this means that a CPU may return from idle with the AMR
> value of another thread on the same core.
> 
> This allows a trivial Denial of Service attack against KVM hosts, by booting
> a guest kernel which makes use of the AMR, such as a v5.2 or later kernel
> with Kernel Userspace Access Prevention (KUAP) enabled.
> 
> The guest kernel will set the AMR to prevent userspace access, then the
> thread will go idle. At a later point, the hardware thread that the guest
> was using may come out of idle and start executing in the host, without
> restoring the host AMR value. The host kernel can get caught in a page fault
> loop, as the AMR is unexpectedly causing memory accesses to fail in the
> host, and the host is eventually rendered unusable.

Hello,

shouldn't the kernel restore the host registers when leaving the guest?

I recall that some code exists for handling the *AM*R registers when leaving
the guest. Can the KVM guest enter idle without exiting to the host?

Thanks

Michal
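
For reference, my understanding is that the eventual fix saved and restored
the authority mask registers across the state-losing stop; a heavily
simplified sketch of the idea, with approximate names (enter_stop_state() is
a placeholder for the actual stop entry, not a real function):

/* Indicative sketch: preserve the AMR family across a state-losing stop
 * so a thread cannot wake up with another context's AMR. Not the literal
 * upstream code.
 */
static unsigned long power9_idle_stop_sketch(unsigned long psscr)
{
	unsigned long amr   = mfspr(SPRN_AMR);
	unsigned long iamr  = mfspr(SPRN_IAMR);
	unsigned long uamor = mfspr(SPRN_UAMOR);
	unsigned long srr1;

	srr1 = enter_stop_state(psscr);	/* SPR state may be lost here */

	mtspr(SPRN_AMR, amr);
	mtspr(SPRN_IAMR, iamr);
	mtspr(SPRN_UAMOR, uamor);
	return srr1;
}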


Re: [PATCH v2 2/2] crypto: Remove unnecessary memzero_explicit()

2020-04-14 Thread Michal Suchánek
On Tue, Apr 14, 2020 at 12:24:36PM -0400, Waiman Long wrote:
> On 4/14/20 2:08 AM, Christophe Leroy wrote:
> >
> >
> > Le 14/04/2020 à 00:28, Waiman Long a écrit :
> >> Since kfree_sensitive() will do an implicit memzero_explicit(), there
> >> is no need to call memzero_explicit() before it. Eliminate those
> >> memzero_explicit() and simplify the call sites. For better correctness,
> >> the setting of keylen is also moved down after the key pointer check.
> >>
> >> Signed-off-by: Waiman Long 
> >> ---
> >>   .../allwinner/sun8i-ce/sun8i-ce-cipher.c  | 19 +-
> >>   .../allwinner/sun8i-ss/sun8i-ss-cipher.c  | 20 +--
> >>   drivers/crypto/amlogic/amlogic-gxl-cipher.c   | 12 +++
> >>   drivers/crypto/inside-secure/safexcel_hash.c  |  3 +--
> >>   4 files changed, 14 insertions(+), 40 deletions(-)
> >>
> >> diff --git a/drivers/crypto/allwinner/sun8i-ce/sun8i-ce-cipher.c
> >> b/drivers/crypto/allwinner/sun8i-ce/sun8i-ce-cipher.c
> >> index aa4e8fdc2b32..8358fac98719 100644
> >> --- a/drivers/crypto/allwinner/sun8i-ce/sun8i-ce-cipher.c
> >> +++ b/drivers/crypto/allwinner/sun8i-ce/sun8i-ce-cipher.c
> >> @@ -366,10 +366,7 @@ void sun8i_ce_cipher_exit(struct crypto_tfm *tfm)
> >>   {
> >>   struct sun8i_cipher_tfm_ctx *op = crypto_tfm_ctx(tfm);
> >>   -    if (op->key) {
> >> -    memzero_explicit(op->key, op->keylen);
> >> -    kfree(op->key);
> >> -    }
> >> +    kfree_sensitive(op->key);
> >>   crypto_free_sync_skcipher(op->fallback_tfm);
> >>   pm_runtime_put_sync_suspend(op->ce->dev);
> >>   }
> >> @@ -391,14 +388,11 @@ int sun8i_ce_aes_setkey(struct crypto_skcipher
> >> *tfm, const u8 *key,
> >>   dev_dbg(ce->dev, "ERROR: Invalid keylen %u\n", keylen);
> >>   return -EINVAL;
> >>   }
> >> -    if (op->key) {
> >> -    memzero_explicit(op->key, op->keylen);
> >> -    kfree(op->key);
> >> -    }
> >> -    op->keylen = keylen;
> >> +    kfree_sensitive(op->key);
> >>   op->key = kmemdup(key, keylen, GFP_KERNEL | GFP_DMA);
> >>   if (!op->key)
> >>   return -ENOMEM;
> >> +    op->keylen = keylen;
> >
> > Does it matter at all to ensure op->keylen is not set when of->key is
> > NULL ? I'm not sure.
> >
> > But if it does, then op->keylen should be set to 0 when freeing op->key. 
> 
> My thinking is that if memory allocation fails, we just don't touch
> anything and return an error code. I will not explicitly set keylen to 0
> in this case unless it is specified in the API documentation.
You already freed the key by now so not touching anything is not
possible. The key is set to NULL on allocation failure so setting keylen
to 0 should be redundant. However, setting keylen to 0 is consistent with
not having a key, and it avoids the possibility of leaking the length
later should that ever cause any problem.

Thanks

Michal
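
To illustrate, a sketch of the setkey error path with keylen cleared, under
the assumption that a NULL key plus keylen == 0 is the canonical "no key"
state; this is a variant for discussion, not the patch as merged:

/* Sketch: keep (key, keylen) consistent even on allocation failure. */
static int example_setkey(struct sun8i_cipher_tfm_ctx *op,
			  const u8 *key, unsigned int keylen)
{
	kfree_sensitive(op->key);	/* zeroizes the old key before freeing */
	op->key = kmemdup(key, keylen, GFP_KERNEL | GFP_DMA);
	if (!op->key) {
		op->keylen = 0;		/* no key => no stale length */
		return -ENOMEM;
	}
	op->keylen = keylen;
	return 0;
}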


Re: [PATCH] powerpcs: perf: consolidate perf_callchain_user_64 and perf_callchain_user_32

2020-04-09 Thread Michal Suchánek
On Tue, Apr 07, 2020 at 07:21:06AM +0200, Christophe Leroy wrote:
> 
> 
> Le 06/04/2020 à 23:00, Michal Suchanek a écrit :
> > perf_callchain_user_64 and perf_callchain_user_32 are nearly identical.
> > Consolidate into one function with thin wrappers.
> > 
> > Suggested-by: Nicholas Piggin 
> > Signed-off-by: Michal Suchanek 
> > ---
> >   arch/powerpc/perf/callchain.h| 24 +++-
> >   arch/powerpc/perf/callchain_32.c | 21 ++---
> >   arch/powerpc/perf/callchain_64.c | 14 --
> >   3 files changed, 29 insertions(+), 30 deletions(-)
> > 
> > diff --git a/arch/powerpc/perf/callchain.h b/arch/powerpc/perf/callchain.h
> > index 7a2cb9e1181a..7540bb71cb60 100644
> > --- a/arch/powerpc/perf/callchain.h
> > +++ b/arch/powerpc/perf/callchain.h
> > @@ -2,7 +2,7 @@
> >   #ifndef _POWERPC_PERF_CALLCHAIN_H
> >   #define _POWERPC_PERF_CALLCHAIN_H
> > -int read_user_stack_slow(void __user *ptr, void *buf, int nb);
> > +int read_user_stack_slow(const void __user *ptr, void *buf, int nb);
> 
> Does the constification of ptr has to be in this patch ?
It was in the original patch. The code is touched anyway.
> Wouldn't it be better to have it as a separate patch ?
I don't care much either way. I can resend it as separate patches.
> 
> >   void perf_callchain_user_64(struct perf_callchain_entry_ctx *entry,
> > struct pt_regs *regs);
> >   void perf_callchain_user_32(struct perf_callchain_entry_ctx *entry,
> > @@ -16,4 +16,26 @@ static inline bool invalid_user_sp(unsigned long sp)
> > return (!sp || (sp & mask) || (sp > top));
> >   }
> > +/*
> > + * On 32-bit we just access the address and let hash_page create a
> > + * HPTE if necessary, so there is no need to fall back to reading
> > + * the page tables.  Since this is called at interrupt level,
> > + * do_page_fault() won't treat a DSI as a page fault.
> > + */
> > +static inline int __read_user_stack(const void __user *ptr, void *ret,
> > +   size_t size)
> > +{
> > +   int rc;
> > +
> > +   if ((unsigned long)ptr > TASK_SIZE - size ||
> > +   ((unsigned long)ptr & (size - 1)))
> > +   return -EFAULT;
> > +   rc = probe_user_read(ret, ptr, size);
> > +
> > +   if (rc && IS_ENABLED(CONFIG_PPC64))
> 
> gcc is probably smart enough to deal with it efficiently, but it would
> be more correct to test rc after checking CONFIG_PPC64.
IS_ENABLED(CONFIG_PPC64) is constant so that part of the check should be
compiled out in any case.

Thanks

Michal
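
The consolidated shape under discussion ends up roughly as below: the single
__read_user_stack() helper quoted above plus thin, type-safe wrappers. This
is a sketch of the intent, not the final committed code:

/* Both wrappers funnel into __read_user_stack(); on 64-bit kernels
 * sizeof(unsigned int) == 4, so the same helper also serves 32-bit tasks.
 */
static int read_user_stack_64(const unsigned long __user *ptr, unsigned long *ret)
{
	return __read_user_stack(ptr, ret, sizeof(*ret));
}

static int read_user_stack_32(const unsigned int __user *ptr, unsigned int *ret)
{
	return __read_user_stack(ptr, ret, sizeof(*ret));
}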


Re: [PATCH v12 5/8] powerpc/64: make buildable without CONFIG_COMPAT

2020-04-07 Thread Michal Suchánek
On Tue, Apr 07, 2020 at 07:50:30AM +0200, Christophe Leroy wrote:
> 
> 
> Le 20/03/2020 à 11:20, Michal Suchanek a écrit :
> > There are numerous references to 32bit functions in generic and 64bit
> > code so ifdef them out.
> > 
> > Signed-off-by: Michal Suchanek 
> > ---
> > v2:
> > - fix 32bit ifdef condition in signal.c
> > - simplify the compat ifdef condition in vdso.c - 64bit is redundant
> > - simplify the compat ifdef condition in callchain.c - 64bit is redundant
> > v3:
> > - use IS_ENABLED and maybe_unused where possible
> > - do not ifdef declarations
> > - clean up Makefile
> > v4:
> > - further makefile cleanup
> > - simplify is_32bit_task conditions
> > - avoid ifdef in condition by using return
> > v5:
> > - avoid unreachable code on 32bit
> > - make is_current_64bit constant on !COMPAT
> > - add stub perf_callchain_user_32 to avoid some ifdefs
> > v6:
> > - consolidate current_is_64bit
> > v7:
> > - remove leftover perf_callchain_user_32 stub from previous series version
> > v8:
> > - fix build again - too trigger-happy with stub removal
> > - remove a vdso.c hunk that causes warning according to kbuild test robot
> > v9:
> > - removed current_is_64bit in previous patch
> > v10:
> > - rebase on top of 70ed86f4de5bd
> > ---
> >   arch/powerpc/include/asm/thread_info.h | 4 ++--
> >   arch/powerpc/kernel/Makefile   | 6 +++---
> >   arch/powerpc/kernel/entry_64.S | 2 ++
> >   arch/powerpc/kernel/signal.c   | 3 +--
> >   arch/powerpc/kernel/syscall_64.c   | 6 ++
> >   arch/powerpc/kernel/vdso.c | 3 ++-
> >   arch/powerpc/perf/callchain.c  | 8 +++-
> >   7 files changed, 19 insertions(+), 13 deletions(-)
> > 
> 
> [...]
> 
> > diff --git a/arch/powerpc/kernel/syscall_64.c 
> > b/arch/powerpc/kernel/syscall_64.c
> > index 87d95b455b83..2dcbfe38f5ac 100644
> > --- a/arch/powerpc/kernel/syscall_64.c
> > +++ b/arch/powerpc/kernel/syscall_64.c
> > @@ -24,7 +24,6 @@ notrace long system_call_exception(long r3, long r4, long 
> > r5,
> >long r6, long r7, long r8,
> >unsigned long r0, struct pt_regs *regs)
> >   {
> > -   unsigned long ti_flags;
> > syscall_fn f;
> > if (IS_ENABLED(CONFIG_PPC_IRQ_SOFT_MASK_DEBUG))
> > @@ -68,8 +67,7 @@ notrace long system_call_exception(long r3, long r4, long 
> > r5,
> > local_irq_enable();
> > -   ti_flags = current_thread_info()->flags;
> > -   if (unlikely(ti_flags & _TIF_SYSCALL_DOTRACE)) {
> > +   if (unlikely(current_thread_info()->flags & _TIF_SYSCALL_DOTRACE)) {
> > /*
> >  * We use the return value of do_syscall_trace_enter() as the
> >  * syscall number. If the syscall was rejected for any reason
> > @@ -94,7 +92,7 @@ notrace long system_call_exception(long r3, long r4, long 
> > r5,
> > /* May be faster to do array_index_nospec? */
> > barrier_nospec();
> > -   if (unlikely(ti_flags & _TIF_32BIT)) {
> > +   if (unlikely(is_32bit_task())) {
> 
> is_compat() should be used here instead, because we dont want to use
is_compat_task()
> compat_sys_call_table() on PPC32.
> 
> > f = (void *)compat_sys_call_table[r0];
> > r3 &= 0xULL;
> 
That only applies once you use this for 32bit as well. Right now it's
64bit only so the two are the same.

Thanks

Michal
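
For readers outside the thread, the distinction is roughly this, simplified
from the powerpc headers of that era (not a verbatim quote):

/* Simplified for illustration. */
#ifdef CONFIG_PPC64
#define is_32bit_task()	(test_thread_flag(TIF_32BIT))	/* compat task on ppc64 */
#else
#define is_32bit_task()	(1)	/* every task is 32-bit on ppc32 */
#endif

On a 64-bit-only code path the two tests coincide, but once the code is
shared with PPC32, is_32bit_task() is true for native 32-bit tasks as well,
while only is_compat_task() implies that compat_sys_call_table exists.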


Re: [PATCH v11 3/8] powerpc/perf: consolidate read_user_stack_32

2020-04-06 Thread Michal Suchánek
On Fri, Apr 03, 2020 at 05:13:25PM +1000, Nicholas Piggin wrote:
> Michal Suchánek's on March 25, 2020 5:38 am:
> > On Tue, Mar 24, 2020 at 06:48:20PM +1000, Nicholas Piggin wrote:
> >> Michal Suchanek's on March 19, 2020 10:19 pm:
> >> > There are two almost identical copies for 32bit and 64bit.
> >> > 
> >> > The function is used only in 32bit code which will be split out in next
> >> > patch so consolidate to one function.
> >> > 
> >> > Signed-off-by: Michal Suchanek 
> >> > Reviewed-by: Christophe Leroy 
> >> > ---
> >> > v6:  new patch
> >> > v8:  move the consolidated function out of the ifdef block.
> >> > v11: rebase on top of def0bfdbd603
> >> > ---
> >> >  arch/powerpc/perf/callchain.c | 48 +--
> >> >  1 file changed, 24 insertions(+), 24 deletions(-)
> >> > 
> >> > diff --git a/arch/powerpc/perf/callchain.c 
> >> > b/arch/powerpc/perf/callchain.c
> >> > index cbc251981209..c9a78c6e4361 100644
> >> > --- a/arch/powerpc/perf/callchain.c
> >> > +++ b/arch/powerpc/perf/callchain.c
> >> > @@ -161,18 +161,6 @@ static int read_user_stack_64(unsigned long __user 
> >> > *ptr, unsigned long *ret)
> >> >  return read_user_stack_slow(ptr, ret, 8);
> >> >  }
> >> >  
> >> > -static int read_user_stack_32(unsigned int __user *ptr, unsigned int 
> >> > *ret)
> >> > -{
> >> > -if ((unsigned long)ptr > TASK_SIZE - sizeof(unsigned int) ||
> >> > -((unsigned long)ptr & 3))
> >> > -return -EFAULT;
> >> > -
> >> > -if (!probe_user_read(ret, ptr, sizeof(*ret)))
> >> > -return 0;
> >> > -
> >> > -return read_user_stack_slow(ptr, ret, 4);
> >> > -}
> >> > -
> >> >  static inline int valid_user_sp(unsigned long sp, int is_64)
> >> >  {
> >> >  if (!sp || (sp & 7) || sp > (is_64 ? TASK_SIZE : 0x1UL) 
> >> > - 32)
> >> > @@ -277,19 +265,9 @@ static void perf_callchain_user_64(struct 
> >> > perf_callchain_entry_ctx *entry,
> >> >  }
> >> >  
> >> >  #else  /* CONFIG_PPC64 */
> >> > -/*
> >> > - * On 32-bit we just access the address and let hash_page create a
> >> > - * HPTE if necessary, so there is no need to fall back to reading
> >> > - * the page tables.  Since this is called at interrupt level,
> >> > - * do_page_fault() won't treat a DSI as a page fault.
> >> > - */
> >> > -static int read_user_stack_32(unsigned int __user *ptr, unsigned int 
> >> > *ret)
> >> > +static int read_user_stack_slow(void __user *ptr, void *buf, int nb)
> >> >  {
> >> > -if ((unsigned long)ptr > TASK_SIZE - sizeof(unsigned int) ||
> >> > -((unsigned long)ptr & 3))
> >> > -return -EFAULT;
> >> > -
> >> > -return probe_user_read(ret, ptr, sizeof(*ret));
> >> > +return 0;
> >> >  }
> >> >  
> >> >  static inline void perf_callchain_user_64(struct 
> >> > perf_callchain_entry_ctx *entry,
> >> > @@ -312,6 +290,28 @@ static inline int valid_user_sp(unsigned long sp, 
> >> > int is_64)
> >> >  
> >> >  #endif /* CONFIG_PPC64 */
> >> >  
> >> > +/*
> >> > + * On 32-bit we just access the address and let hash_page create a
> >> > + * HPTE if necessary, so there is no need to fall back to reading
> >> > + * the page tables.  Since this is called at interrupt level,
> >> > + * do_page_fault() won't treat a DSI as a page fault.
> >> > + */
> >> 
> >> The comment is actually probably better to stay in the 32-bit
> >> read_user_stack_slow implementation. Is that function defined
> >> on 32-bit purely so that you can use IS_ENABLED()? In that case
> > It documents the IS_ENABLED() and that's where it is. The 32bit
> > definition is only a technical detail.
> 
> Sorry for the late reply, busy trying to fix bugs in the C rewrite
> series. I don't think it is the right place, it should be in the
> ppc32 implementation detail.
Which does not exist anymore after the 32bit and 64bit parts are split.
> ppc64 has an equivalent comment at the top of its read_user_stack functions.
> 
> >> I would prefer to put a BUG() there which makes it self documenting.
> > Which will cause checkpatch complaints about introducing a new BUG(),
> > which is frowned upon.
> 
> It's fine in this case, that warning is about not introducing
> runtime bugs, but this wouldn't be.
> 
> But... I actually don't like adding read_user_stack_slow on 32-bit
> and especially not just to make IS_ENABLED work.
That's to avoid breaking the build at this point. Later the function is removed.
> 
> IMO this would be better if you really want to consolidate it
> 
> ---
> 
> diff --git a/arch/powerpc/perf/callchain.c b/arch/powerpc/perf/callchain.c
> index cbc251981209..ca3a599b3f54 100644
> --- a/arch/powerpc/perf/callchain.c
> +++ b/arch/powerpc/perf/callchain.c
> @@ -108,7 +108,7 @@ perf_callchain_kernel(struct perf_callchain_entry_ctx 
> *entry, struct pt_regs *re
>   * interrupt context, so if the access faults, we read the page tables
>   * to find which page (if any) is mapped and access it directly.
>   */
> -static int 

Re: [PATCH v11 3/8] powerpc/perf: consolidate read_user_stack_32

2020-04-03 Thread Michal Suchánek
On Fri, Apr 03, 2020 at 09:26:27PM +1000, Nicholas Piggin wrote:
> Michal Suchánek's on April 3, 2020 8:52 pm:
> > Hello,
> > 
> > there are 3 variants of the function
> > 
> > read_user_stack_64
> > 
> > 32bit read_user_stack_32
64bit read_user_stack_32
> 
> Right.
> 
> > On Fri, Apr 03, 2020 at 05:13:25PM +1000, Nicholas Piggin wrote:
> [...]
> >>  #endif /* CONFIG_PPC64 */
> >>  
> >> +static int read_user_stack_32(unsigned int __user *ptr, unsigned int *ret)
> >> +{
> >> +  return __read_user_stack(ptr, ret, sizeof(*ret));
> > Does not work for 64bit read_user_stack_32 ^ this should be 4.
> > 
> > Other than that it should preserve the existing logic just fine.
> 
> sizeof(int) == 4 on 64bit so it should work.
> 
Right, the type is different for the 32bit and 64bit versions.
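
A trivial standalone check of the LP64 assumption (plain C, nothing
kernel-specific):

        #include <stdio.h>

        /* sizeof(*ptr) is the pointee size: 4 for unsigned int even on LP64 */
        static void show(unsigned int *ret32, unsigned long *ret64)
        {
                printf("32bit variant reads %zu bytes, 64bit variant %zu\n",
                       sizeof(*ret32), sizeof(*ret64));
        }

        int main(void)
        {
                unsigned int a = 0;
                unsigned long b = 0;

                show(&a, &b);   /* prints 4 and 8 on ppc64le */
                return 0;
        }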

Thanks

Michal


Re: [PATCH v11 3/8] powerpc/perf: consolidate read_user_stack_32

2020-04-03 Thread Michal Suchánek
Hello,

there are 3 variants of the function

read_user_stack_64

32bit read_user_stack_32
64bit read_user_stack_32

On Fri, Apr 03, 2020 at 05:13:25PM +1000, Nicholas Piggin wrote:
> Michal Suchánek's on March 25, 2020 5:38 am:
> > On Tue, Mar 24, 2020 at 06:48:20PM +1000, Nicholas Piggin wrote:
> >> Michal Suchanek's on March 19, 2020 10:19 pm:
> >> > There are two almost identical copies for 32bit and 64bit.
> >> > 
> >> > The function is used only in 32bit code which will be split out in next
> >> > patch so consolidate to one function.
> >> > 
> >> > Signed-off-by: Michal Suchanek 
> >> > Reviewed-by: Christophe Leroy 
> >> > ---
> >> > v6:  new patch
> >> > v8:  move the consolidated function out of the ifdef block.
> >> > v11: rebase on top of def0bfdbd603
> >> > ---
> >> >  arch/powerpc/perf/callchain.c | 48 +--
> >> >  1 file changed, 24 insertions(+), 24 deletions(-)
> >> > 
> >> > diff --git a/arch/powerpc/perf/callchain.c 
> >> > b/arch/powerpc/perf/callchain.c
> >> > index cbc251981209..c9a78c6e4361 100644
> >> > --- a/arch/powerpc/perf/callchain.c
> >> > +++ b/arch/powerpc/perf/callchain.c
> >> > @@ -161,18 +161,6 @@ static int read_user_stack_64(unsigned long __user 
> >> > *ptr, unsigned long *ret)
> >> >  return read_user_stack_slow(ptr, ret, 8);
> >> >  }
> >> >  
> >> > -static int read_user_stack_32(unsigned int __user *ptr, unsigned int 
> >> > *ret)
> >> > -{
> >> > -if ((unsigned long)ptr > TASK_SIZE - sizeof(unsigned int) ||
> >> > -((unsigned long)ptr & 3))
> >> > -return -EFAULT;
> >> > -
> >> > -if (!probe_user_read(ret, ptr, sizeof(*ret)))
> >> > -return 0;
> >> > -
> >> > -return read_user_stack_slow(ptr, ret, 4);
> >> > -}
> >> > -
> >> >  static inline int valid_user_sp(unsigned long sp, int is_64)
> >> >  {
> >> >  if (!sp || (sp & 7) || sp > (is_64 ? TASK_SIZE : 0x1UL) 
> >> > - 32)
> >> > @@ -277,19 +265,9 @@ static void perf_callchain_user_64(struct 
> >> > perf_callchain_entry_ctx *entry,
> >> >  }
> >> >  
> >> >  #else  /* CONFIG_PPC64 */
> >> > -/*
> >> > - * On 32-bit we just access the address and let hash_page create a
> >> > - * HPTE if necessary, so there is no need to fall back to reading
> >> > - * the page tables.  Since this is called at interrupt level,
> >> > - * do_page_fault() won't treat a DSI as a page fault.
> >> > - */
> >> > -static int read_user_stack_32(unsigned int __user *ptr, unsigned int 
> >> > *ret)
> >> > +static int read_user_stack_slow(void __user *ptr, void *buf, int nb)
> >> >  {
> >> > -if ((unsigned long)ptr > TASK_SIZE - sizeof(unsigned int) ||
> >> > -((unsigned long)ptr & 3))
> >> > -return -EFAULT;
> >> > -
> >> > -return probe_user_read(ret, ptr, sizeof(*ret));
> >> > +return 0;
> >> >  }
> >> >  
> >> >  static inline void perf_callchain_user_64(struct 
> >> > perf_callchain_entry_ctx *entry,
> >> > @@ -312,6 +290,28 @@ static inline int valid_user_sp(unsigned long sp, 
> >> > int is_64)
> >> >  
> >> >  #endif /* CONFIG_PPC64 */
> >> >  
> >> > +/*
> >> > + * On 32-bit we just access the address and let hash_page create a
> >> > + * HPTE if necessary, so there is no need to fall back to reading
> >> > + * the page tables.  Since this is called at interrupt level,
> >> > + * do_page_fault() won't treat a DSI as a page fault.
> >> > + */
> >> 
> >> The comment is actually probably better to stay in the 32-bit
> >> read_user_stack_slow implementation. Is that function defined
> >> on 32-bit purely so that you can use IS_ENABLED()? In that case
> > It documents the IS_ENABLED() and that's where it is. The 32bit
> > definition is only a technical detail.
> 
> Sorry for the late reply, busy trying to fix bugs in the C rewrite
> series. I don't think it is the right place, it should be in the
> ppc32 implementation detail. ppc64 has an equivalent comment at the
> top of its read_user_stack functions.
> 
> >> I would prefer to put a BUG() there which makes it self documenting.
> > Which will cause checkpatch complaints about introducing new BUG() which
> > is frowned on.
> 
> It's fine in this case, that warning is about not introducing
> runtime bugs, but this wouldn't be.
> 
> But... I actually don't like adding read_user_stack_slow on 32-bit
> and especially not just to make IS_ENABLED work.
> 
> IMO this would be better if you really want to consolidate it
> 
> ---
> 
> diff --git a/arch/powerpc/perf/callchain.c b/arch/powerpc/perf/callchain.c
> index cbc251981209..ca3a599b3f54 100644
> --- a/arch/powerpc/perf/callchain.c
> +++ b/arch/powerpc/perf/callchain.c
> @@ -108,7 +108,7 @@ perf_callchain_kernel(struct perf_callchain_entry_ctx 
> *entry, struct pt_regs *re
>   * interrupt context, so if the access faults, we read the page tables
>   * to find which page (if any) is mapped and access it directly.
>   */
> -static int read_user_stack_slow(void __user 

Re: [PATCH] powerpc/64: Fix section mismatch warnings.

2020-03-26 Thread Michal Suchánek
On Thu, Mar 26, 2020 at 11:22:03PM +0530, Naveen N. Rao wrote:
> Michal Suchanek wrote:
> > Fixes the following warnings:
> > 
> > WARNING: vmlinux.o(.text+0x2d24): Section mismatch in reference from the 
> > variable __boot_from_prom to the function .init.text:prom_init()
> > The function __boot_from_prom() references
> > the function __init prom_init().
> > This is often because __boot_from_prom lacks a __init
> > annotation or the annotation of prom_init is wrong.
> > 
> > WARNING: vmlinux.o(.text+0x2fd0): Section mismatch in reference from the 
> > variable start_here_common to the function .init.text:start_kernel()
> > The function start_here_common() references
> > the function __init start_kernel().
> > This is often because start_here_common lacks a __init
> > annotation or the annotation of start_kernel is wrong.
> > 
> > Signed-off-by: Michal Suchanek 
> > ---
> >  arch/powerpc/kernel/head_64.S | 4 
> >  1 file changed, 4 insertions(+)
> 
> Michael committed a similar patch just earlier today:
> https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git/commit/?id=6eeb9b3b9ce588f14a697737a30d0702b5a20293

Missed it because it did not reach master yet.

Thanks

Michal


Re: [PATCH v11 3/8] powerpc/perf: consolidate read_user_stack_32

2020-03-24 Thread Michal Suchánek
On Tue, Mar 24, 2020 at 06:48:20PM +1000, Nicholas Piggin wrote:
> Michal Suchanek's on March 19, 2020 10:19 pm:
> > There are two almost identical copies for 32bit and 64bit.
> > 
> > The function is used only in 32bit code which will be split out in next
> > patch so consolidate to one function.
> > 
> > Signed-off-by: Michal Suchanek 
> > Reviewed-by: Christophe Leroy 
> > ---
> > v6:  new patch
> > v8:  move the consolidated function out of the ifdef block.
> > v11: rebase on top of def0bfdbd603
> > ---
> >  arch/powerpc/perf/callchain.c | 48 +--
> >  1 file changed, 24 insertions(+), 24 deletions(-)
> > 
> > diff --git a/arch/powerpc/perf/callchain.c b/arch/powerpc/perf/callchain.c
> > index cbc251981209..c9a78c6e4361 100644
> > --- a/arch/powerpc/perf/callchain.c
> > +++ b/arch/powerpc/perf/callchain.c
> > @@ -161,18 +161,6 @@ static int read_user_stack_64(unsigned long __user 
> > *ptr, unsigned long *ret)
> > return read_user_stack_slow(ptr, ret, 8);
> >  }
> >  
> > -static int read_user_stack_32(unsigned int __user *ptr, unsigned int *ret)
> > -{
> > -   if ((unsigned long)ptr > TASK_SIZE - sizeof(unsigned int) ||
> > -   ((unsigned long)ptr & 3))
> > -   return -EFAULT;
> > -
> > -   if (!probe_user_read(ret, ptr, sizeof(*ret)))
> > -   return 0;
> > -
> > -   return read_user_stack_slow(ptr, ret, 4);
> > -}
> > -
> >  static inline int valid_user_sp(unsigned long sp, int is_64)
> >  {
> > if (!sp || (sp & 7) || sp > (is_64 ? TASK_SIZE : 0x1UL) - 32)
> > @@ -277,19 +265,9 @@ static void perf_callchain_user_64(struct 
> > perf_callchain_entry_ctx *entry,
> >  }
> >  
> >  #else  /* CONFIG_PPC64 */
> > -/*
> > - * On 32-bit we just access the address and let hash_page create a
> > - * HPTE if necessary, so there is no need to fall back to reading
> > - * the page tables.  Since this is called at interrupt level,
> > - * do_page_fault() won't treat a DSI as a page fault.
> > - */
> > -static int read_user_stack_32(unsigned int __user *ptr, unsigned int *ret)
> > +static int read_user_stack_slow(void __user *ptr, void *buf, int nb)
> >  {
> > -   if ((unsigned long)ptr > TASK_SIZE - sizeof(unsigned int) ||
> > -   ((unsigned long)ptr & 3))
> > -   return -EFAULT;
> > -
> > -   return probe_user_read(ret, ptr, sizeof(*ret));
> > +   return 0;
> >  }
> >  
> >  static inline void perf_callchain_user_64(struct perf_callchain_entry_ctx 
> > *entry,
> > @@ -312,6 +290,28 @@ static inline int valid_user_sp(unsigned long sp, int 
> > is_64)
> >  
> >  #endif /* CONFIG_PPC64 */
> >  
> > +/*
> > + * On 32-bit we just access the address and let hash_page create a
> > + * HPTE if necessary, so there is no need to fall back to reading
> > + * the page tables.  Since this is called at interrupt level,
> > + * do_page_fault() won't treat a DSI as a page fault.
> > + */
> 
> The comment is actually probably better to stay in the 32-bit
> read_user_stack_slow implementation. Is that function defined
> on 32-bit purely so that you can use IS_ENABLED()? In that case
It documents the IS_ENABLED() and that's where it is. The 32bit
definition is only a technical detail.
> I would prefer to put a BUG() there which makes it self documenting.
Which will cause checkpatch complaints about introducing a new BUG(),
which is frowned upon.

Thanks

Michal


Re: [PATCH v11 5/8] powerpc/64: make buildable without CONFIG_COMPAT

2020-03-24 Thread Michal Suchánek
On Tue, Mar 24, 2020 at 06:54:20PM +1000, Nicholas Piggin wrote:
> Michal Suchanek's on March 19, 2020 10:19 pm:
> > diff --git a/arch/powerpc/kernel/signal.c b/arch/powerpc/kernel/signal.c
> > index 4b0152108f61..a264989626fd 100644
> > --- a/arch/powerpc/kernel/signal.c
> > +++ b/arch/powerpc/kernel/signal.c
> > @@ -247,7 +247,6 @@ static void do_signal(struct task_struct *tsk)
> > sigset_t *oldset = sigmask_to_save();
> > struct ksignal ksig = { .sig = 0 };
> > int ret;
> > -   int is32 = is_32bit_task();
> >  
> > BUG_ON(tsk != current);
> >  
> > @@ -277,7 +276,7 @@ static void do_signal(struct task_struct *tsk)
> >  
> > rseq_signal_deliver(&ksig, tsk->thread.regs);
> >  
> > -   if (is32) {
> > +   if (is_32bit_task()) {
> > if (ksig.ka.sa.sa_flags & SA_SIGINFO)
> > ret = handle_rt_signal32(, oldset, tsk);
> > else
> 
> Unnecessary?
> 
> > diff --git a/arch/powerpc/kernel/syscall_64.c 
> > b/arch/powerpc/kernel/syscall_64.c
> > index 87d95b455b83..2dcbfe38f5ac 100644
> > --- a/arch/powerpc/kernel/syscall_64.c
> > +++ b/arch/powerpc/kernel/syscall_64.c
> > @@ -24,7 +24,6 @@ notrace long system_call_exception(long r3, long r4, long 
> > r5,
> >long r6, long r7, long r8,
> >unsigned long r0, struct pt_regs *regs)
> >  {
> > -   unsigned long ti_flags;
> > syscall_fn f;
> >  
> > if (IS_ENABLED(CONFIG_PPC_IRQ_SOFT_MASK_DEBUG))
> > @@ -68,8 +67,7 @@ notrace long system_call_exception(long r3, long r4, long 
> > r5,
> >  
> > local_irq_enable();
> >  
> > -   ti_flags = current_thread_info()->flags;
> > -   if (unlikely(ti_flags & _TIF_SYSCALL_DOTRACE)) {
> > +   if (unlikely(current_thread_info()->flags & _TIF_SYSCALL_DOTRACE)) {
> > /*
> >  * We use the return value of do_syscall_trace_enter() as the
> >  * syscall number. If the syscall was rejected for any reason
> > @@ -94,7 +92,7 @@ notrace long system_call_exception(long r3, long r4, long 
> > r5,
> > /* May be faster to do array_index_nospec? */
> > barrier_nospec();
> >  
> > -   if (unlikely(ti_flags & _TIF_32BIT)) {
> > +   if (unlikely(is_32bit_task())) {
> 
> Problem is, does this allow the load of ti_flags to be used for both
> tests, or does test_bit make it re-load?
> 
> This could maybe be fixed by testing if(IS_ENABLED(CONFIG_COMPAT) &&
Both points already discussed here:

https://lore.kernel.org/linuxppc-dev/13fa324dc879a7f325290bf2e131b87eb491cd7b.1573576649.git.msucha...@suse.de/
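
For the record, the shape being discussed there (a sketch against the
quoted function, not the actual patch) caches ti_flags once so both
tests can share the load, while still letting the compat branch compile
away:

        /* one load feeds both tests; names as in the quoted diff */
        unsigned long ti_flags = current_thread_info()->flags;

        if (unlikely(ti_flags & _TIF_SYSCALL_DOTRACE))
                r0 = do_syscall_trace_enter(regs);

        if (IS_ENABLED(CONFIG_COMPAT) && unlikely(ti_flags & _TIF_32BIT))
                f = (void *)compat_sys_call_table[r0];
        else
                f = (void *)sys_call_table[r0];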

Thanks

Michal


Re: [PATCH v12 8/8] MAINTAINERS: perf: Add pattern that matches ppc perf to the perf entry.

2020-03-20 Thread Michal Suchánek
On Fri, Mar 20, 2020 at 06:31:57PM +0200, Andy Shevchenko wrote:
> On Fri, Mar 20, 2020 at 07:42:03AM -0700, Joe Perches wrote:
> > On Fri, 2020-03-20 at 14:42 +0200, Andy Shevchenko wrote:
> > > On Fri, Mar 20, 2020 at 12:23:38PM +0100, Michal Suchánek wrote:
> > > > On Fri, Mar 20, 2020 at 12:33:50PM +0200, Andy Shevchenko wrote:
> > > > > On Fri, Mar 20, 2020 at 11:20:19AM +0100, Michal Suchanek wrote:
> > > > > > While at it also simplify the existing perf patterns.
> > > > > And still missed fixes from parse-maintainers.pl.
> > > > 
> > > > Oh, that script UX is truly ingenious.
> > > 
> > > You have at least two options, their combinations, etc:
> > >  - complain to the author :-)
> > >  - send a patch :-)
> > 
> > Recently:
> > 
> > https://lore.kernel.org/lkml/4d5291fa3fb4962b1fa55e8fd9ef421ef0c1b1e5.ca...@perches.com/
> 
> But why?
> 
> Shouldn't we rather run MAINTAINERS clean up once and require people to use
> parse-maintainers.pl for good?

That cleanup did not happen yet, and I am not volunteering for one.
The difference between MAINTAINERS and MAINTAINERS.new is:

 MAINTAINERS | 5510 +--
 1 file changed, 2755 insertions(+), 2755 deletions(-)

Thanks

Michal


Re: [PATCH v12 8/8] MAINTAINERS: perf: Add pattern that matches ppc perf to the perf entry.

2020-03-20 Thread Michal Suchánek
On Fri, Mar 20, 2020 at 07:42:03AM -0700, Joe Perches wrote:
> On Fri, 2020-03-20 at 14:42 +0200, Andy Shevchenko wrote:
> > On Fri, Mar 20, 2020 at 12:23:38PM +0100, Michal Suchánek wrote:
> > > On Fri, Mar 20, 2020 at 12:33:50PM +0200, Andy Shevchenko wrote:
> > > > On Fri, Mar 20, 2020 at 11:20:19AM +0100, Michal Suchanek wrote:
> > > > > While at it also simplify the existing perf patterns.
> > > > And still missed fixes from parse-maintainers.pl.
> > > 
> > > Oh, that script UX is truly ingenious.
> > 
> > You have at least two options, their combinations, etc:
> >  - complain to the author :-)
> >  - send a patch :-)
> 
> Recently:
> 
> https://lore.kernel.org/lkml/4d5291fa3fb4962b1fa55e8fd9ef421ef0c1b1e5.ca...@perches.com/

Can we expect that reordering is taken care of in that discussion then?

Thanks

Michal


Re: [PATCH v12 8/8] MAINTAINERS: perf: Add pattern that matches ppc perf to the perf entry.

2020-03-20 Thread Michal Suchánek
On Fri, Mar 20, 2020 at 12:33:50PM +0200, Andy Shevchenko wrote:
> On Fri, Mar 20, 2020 at 11:20:19AM +0100, Michal Suchanek wrote:
> > While at it also simplify the existing perf patterns.
> > 
> 
> And still missed fixes from parse-maintainers.pl.

Oh, that script UX is truly ingenious. It provides no output and quietly
creates MAINTAINERS.new which is, of course, not included in the patch.

Thanks

Michal

> 
> I see it like below in the linux-next (after the script)
> 
> PERFORMANCE EVENTS SUBSYSTEM
> M:  Peter Zijlstra 
> M:  Ingo Molnar 
> M:  Arnaldo Carvalho de Melo 
> R:  Mark Rutland 
> R:  Alexander Shishkin 
> R:  Jiri Olsa 
> R:  Namhyung Kim 
> L:  linux-ker...@vger.kernel.org
> S:  Supported
> T:  git git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git 
> perf/core
> F:  arch/*/events/*
> F:  arch/*/events/*/*
> F:  arch/*/include/asm/perf_event.h
> F:  arch/*/kernel/*/*/perf_event*.c
> F:  arch/*/kernel/*/perf_event*.c
> F:  arch/*/kernel/perf_callchain.c
> F:  arch/*/kernel/perf_event*.c
> F:  include/linux/perf_event.h
> F:  include/uapi/linux/perf_event.h
> F:  kernel/events/*
> F:  tools/perf/
> 
> > --- a/MAINTAINERS
> > +++ b/MAINTAINERS
> > @@ -13080,7 +13080,7 @@ R:  Namhyung Kim 
> >  L: linux-ker...@vger.kernel.org
> >  T: git git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git perf/core
> >  S: Supported
> > -F: kernel/events/*
> > +F: kernel/events/
> >  F: include/linux/perf_event.h
> >  F: include/uapi/linux/perf_event.h
> >  F: arch/*/kernel/perf_event*.c
> > @@ -13088,8 +13088,8 @@ F:  arch/*/kernel/*/perf_event*.c
> >  F: arch/*/kernel/*/*/perf_event*.c
> >  F: arch/*/include/asm/perf_event.h
> >  F: arch/*/kernel/perf_callchain.c
> > -F: arch/*/events/*
> > -F: arch/*/events/*/*
> > +F: arch/*/events/
> > +F: arch/*/perf/
> >  F: tools/perf/
> >  
> >  PERFORMANCE EVENTS SUBSYSTEM ARM64 PMU EVENTS
> 
> -- 
> With Best Regards,
> Andy Shevchenko
> 
> 


Re: [PATCH -v2] treewide: Rename "unencrypted" to "decrypted"

2020-03-19 Thread Michal Suchánek
On Thu, Mar 19, 2020 at 06:25:49PM +0100, Thomas Gleixner wrote:
> Borislav Petkov  writes:
> 
> > On Thu, Mar 19, 2020 at 11:06:15AM +, Robin Murphy wrote:
> >> Let me add another vote from a native English speaker that "unencrypted" is
> >> the appropriate term to imply the *absence* of encryption, whereas
> >> "decrypted" implies the *reversal* of applied encryption.
Even as a non-native speaker I can clearly see the distinction.
> >> 
> >> Naming things is famously hard, for good reason - names are *important* for
> >> understanding. Just because a decision was already made one way doesn't 
> >> mean
> >> that that decision was necessarily right. Churning one area to be
> >> consistently inaccurate just because it's less work than churning another
> >> area to be consistently accurate isn't really the best excuse.
> >
> > Well, the reason we chose "decrypted" vs something else is so to be as
> > different from "encrypted" as possible. If we called it "unencrypted"
> > you'd have stuff like:
> >
> >if (force_dma_unencrypted(dev))
> > set_memory_encrypted((unsigned long)cpu_addr, 1 << 
> > page_order);

If you want something with a high edit distance from 'encrypted' that
means the opposite, there is already 'cleartext', which was designed for
this exact purpose.

Thanks

Michal


Re: [PATCH v11 4/8] powerpc/perf: consolidate valid_user_sp

2020-03-19 Thread Michal Suchánek
On Thu, Mar 19, 2020 at 03:16:03PM +0100, Christophe Leroy wrote:
> 
> 
> > On 19/03/2020 at 14:35, Andy Shevchenko wrote:
> > On Thu, Mar 19, 2020 at 1:54 PM Michal Suchanek  wrote:
> > > 
> > > Merge the 32bit and 64bit version.
> > > 
> > > Halve the check constants on 32bit.
> > > 
> > > Use STACK_TOP since it is defined.
> > > 
> > > Passing is_64 is now redundant since is_32bit_task() is used to
> > > determine which callchain variant should be used. Use STACK_TOP and
> > > is_32bit_task() directly.
> > > 
> > > This removes a page from the valid 32bit area on 64bit:
> > >   #define TASK_SIZE_USER32 (0x0001UL - (1 * PAGE_SIZE))
> > >   #define STACK_TOP_USER32 TASK_SIZE_USER32
> > 
> > ...
> > 
> > > +static inline int valid_user_sp(unsigned long sp)
> > > +{
> > > +   bool is_64 = !is_32bit_task();
> > > +
> > > +   if (!sp || (sp & (is_64 ? 7 : 3)) || sp > STACK_TOP - (is_64 ? 32 
> > > : 16))
> > > +   return 0;
> > > +   return 1;
> > > +}
> > 
> > Other possibility:
> 
> I prefer this one.
> 
> > 
> >unsigned long align = is_32bit_task() ? 3 : 7;
> 
> I would call it mask instead of align
> 
> >unsigned long top = STACK_TOP - (is_32bit_task() ? 16 : 32);
> > 
> >return !(!sp || (sp & align) || sp > top);
And we can avoid the inversion here, as well as in the !valid_user_sp(sp)
callers, by renaming the function to invalid_user_sp.
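
Putting the suggestions together, something like this untested sketch:

        static inline bool invalid_user_sp(unsigned long sp)
        {
                unsigned long mask = is_32bit_task() ? 3 : 7;
                unsigned long top = STACK_TOP - (is_32bit_task() ? 16 : 32);

                return !sp || (sp & mask) || sp > top;
        }

so the callers can test the error case directly, without the double
negation.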

Thanks

Michal


Re: [PATCH v11 4/8] powerpc/perf: consolidate valid_user_sp

2020-03-19 Thread Michal Suchánek
On Thu, Mar 19, 2020 at 03:35:03PM +0200, Andy Shevchenko wrote:
> On Thu, Mar 19, 2020 at 1:54 PM Michal Suchanek  wrote:
> >
> > Merge the 32bit and 64bit version.
> >
> > Halve the check constants on 32bit.
> >
> > Use STACK_TOP since it is defined.
> >
> > Passing is_64 is now redundant since is_32bit_task() is used to
> > determine which callchain variant should be used. Use STACK_TOP and
> > is_32bit_task() directly.
> >
> > This removes a page from the valid 32bit area on 64bit:
> >  #define TASK_SIZE_USER32 (0x0001UL - (1 * PAGE_SIZE))
> >  #define STACK_TOP_USER32 TASK_SIZE_USER32
> 
> ...
> 
> > +static inline int valid_user_sp(unsigned long sp)
> > +{
> > +   bool is_64 = !is_32bit_task();
> > +
> > +   if (!sp || (sp & (is_64 ? 7 : 3)) || sp > STACK_TOP - (is_64 ? 32 : 
> > 16))
> > +   return 0;
> > +   return 1;
> > +}
> 
> Perhaps better to read
> 
>   if (!sp)
> return 0;
> 
>   if (is_32bit_task()) {
> if (sp & 0x03)
>   return 0;
> if (sp > STACK_TOP - 16)
>   return 0;
>   } else {
> ...
>   }
> 
>   return 1;
> 
> Other possibility:
> 
>   unsigned long align = is_32bit_task() ? 3 : 7;
>   unsigned long top = STACK_TOP - (is_32bit_task() ? 16 : 32);
> 
>   return !(!sp || (sp & align) || sp > top);
Sounds reasonable.

Thanks

Michal
> 
> -- 
> With Best Regards,
> Andy Shevchenko


Re: [PATCH v11 0/8] Disable compat cruft on ppc64le v11

2020-03-19 Thread Michal Suchánek
On Thu, Mar 19, 2020 at 01:36:56PM +0100, Christophe Leroy wrote:
> You sent it twice? Any difference between the two dispatches?
Some headers were broken the first time around.

Thanks

Michal
> 
> Christophe
> 
> On 19/03/2020 at 13:19, Michal Suchanek wrote:
> > Less code means fewer bugs, so add a knob to skip the compat stuff.
> > 
> > Changes in v2: saner CONFIG_COMPAT ifdefs
> > Changes in v3:
> >   - change llseek to 32bit instead of building it unconditionally in fs
> >   - cleanup the makefile conditionals
> >   - remove some ifdefs or convert to IS_DEFINED where possible
> > Changes in v4:
> >   - cleanup is_32bit_task and current_is_64bit
> >   - more makefile cleanup
> > Changes in v5:
> >   - more current_is_64bit cleanup
> >   - split off callchain.c 32bit and 64bit parts
> > Changes in v6:
> >   - cleanup makefile after split
> >   - consolidate read_user_stack_32
> >   - fix some checkpatch warnings
> > Changes in v7:
> >   - add back __ARCH_WANT_SYS_LLSEEK to fix build with llseek
> >   - remove leftover hunk
> >   - add review tags
> > Changes in v8:
> >   - consolidate valid_user_sp to fix it in the split callchain.c
> >   - fix build errors/warnings with PPC64 !COMPAT and PPC32
> > Changes in v9:
> >   - remove current_is_64bit()
> > Changes in v10:
> >   - rebase, sent together with the syscall cleanup
> > Changes in v11:
> >   - rebase
> >   - add MAINTAINERS pattern for ppc perf
> > 
> > Michal Suchanek (8):
> >powerpc: Add back __ARCH_WANT_SYS_LLSEEK macro
> >powerpc: move common register copy functions from signal_32.c to
> >  signal.c
> >powerpc/perf: consolidate read_user_stack_32
> >powerpc/perf: consolidate valid_user_sp
> >powerpc/64: make buildable without CONFIG_COMPAT
> >powerpc/64: Make COMPAT user-selectable disabled on littleendian by
> >  default.
> >powerpc/perf: split callchain.c by bitness
> >MAINTAINERS: perf: Add pattern that matches ppc perf to the perf
> >  entry.
> > 
> >   MAINTAINERS|   2 +
> >   arch/powerpc/Kconfig   |   5 +-
> >   arch/powerpc/include/asm/thread_info.h |   4 +-
> >   arch/powerpc/include/asm/unistd.h  |   1 +
> >   arch/powerpc/kernel/Makefile   |   6 +-
> >   arch/powerpc/kernel/entry_64.S |   2 +
> >   arch/powerpc/kernel/signal.c   | 144 +-
> >   arch/powerpc/kernel/signal_32.c| 140 --
> >   arch/powerpc/kernel/syscall_64.c   |   6 +-
> >   arch/powerpc/kernel/vdso.c |   3 +-
> >   arch/powerpc/perf/Makefile |   5 +-
> >   arch/powerpc/perf/callchain.c  | 356 +
> >   arch/powerpc/perf/callchain.h  |  20 ++
> >   arch/powerpc/perf/callchain_32.c   | 196 ++
> >   arch/powerpc/perf/callchain_64.c   | 174 
> >   fs/read_write.c|   3 +-
> >   16 files changed, 556 insertions(+), 511 deletions(-)
> >   create mode 100644 arch/powerpc/perf/callchain.h
> >   create mode 100644 arch/powerpc/perf/callchain_32.c
> >   create mode 100644 arch/powerpc/perf/callchain_64.c
> > 


Re: [PATCH v11 8/8] MAINTAINERS: perf: Add pattern that matches ppc perf to the perf entry.

2020-03-19 Thread Michal Suchánek
On Thu, Mar 19, 2020 at 03:37:03PM +0200, Andy Shevchenko wrote:
> On Thu, Mar 19, 2020 at 2:21 PM Michal Suchanek  wrote:
> >
> > Signed-off-by: Michal Suchanek 
> > ---
> > v10: new patch
> > ---
> >  MAINTAINERS | 2 ++
> >  1 file changed, 2 insertions(+)
> >
> > diff --git a/MAINTAINERS b/MAINTAINERS
> > index bc8dbe4fe4c9..329bf4a31412 100644
> > --- a/MAINTAINERS
> > +++ b/MAINTAINERS
> > @@ -13088,6 +13088,8 @@ F:  arch/*/kernel/*/perf_event*.c
> >  F: arch/*/kernel/*/*/perf_event*.c
> >  F: arch/*/include/asm/perf_event.h
> >  F: arch/*/kernel/perf_callchain.c
> > +F: arch/*/perf/*
> > +F: arch/*/perf/*/*
> >  F: arch/*/events/*
> >  F: arch/*/events/*/*
> >  F: tools/perf/
> 
> Had you run parse-maintainers.pl?
Did not know it existed. The output is:

scripts/parse-maintainers.pl 
Odd non-pattern line '
Documentation/devicetree/bindings/media/ti,cal.yaml
' for 'TI VPE/CAL DRIVERS' at scripts/parse-maintainers.pl line 147,
<$file> line 16756.

Thanks

Michal


Re: [PATCH v11 0/8] Disable compat cruft on ppc64le v11

2020-03-19 Thread Michal Suchánek
Lost some headers so will resend.

On Thu, Mar 19, 2020 at 12:52:20PM +0100, Michal Suchanek wrote:
> Less code means fewer bugs, so add a knob to skip the compat stuff.
> 
> Changes in v2: saner CONFIG_COMPAT ifdefs
> Changes in v3:
>  - change llseek to 32bit instead of building it unconditionally in fs
>  - cleanup the makefile conditionals
>  - remove some ifdefs or convert to IS_DEFINED where possible
> Changes in v4:
>  - cleanup is_32bit_task and current_is_64bit
>  - more makefile cleanup
> Changes in v5:
>  - more current_is_64bit cleanup
>  - split off callchain.c 32bit and 64bit parts
> Changes in v6:
>  - cleanup makefile after split
>  - consolidate read_user_stack_32
>  - fix some checkpatch warnings
> Changes in v7:
>  - add back __ARCH_WANT_SYS_LLSEEK to fix build with llseek
>  - remove leftover hunk
>  - add review tags
> Changes in v8:
>  - consolidate valid_user_sp to fix it in the split callchain.c
>  - fix build errors/warnings with PPC64 !COMPAT and PPC32
> Changes in v9:
>  - remove current_is_64bit()
> Changes in v10:
>  - rebase, sent together with the syscall cleanup
> Changes in v11:
>  - rebase
>  - add MAINTAINERS pattern for ppc perf
> 
> Michal Suchanek (8):
>   powerpc: Add back __ARCH_WANT_SYS_LLSEEK macro
>   powerpc: move common register copy functions from signal_32.c to
> signal.c
>   powerpc/perf: consolidate read_user_stack_32
>   powerpc/perf: consolidate valid_user_sp
>   powerpc/64: make buildable without CONFIG_COMPAT
>   powerpc/64: Make COMPAT user-selectable disabled on littleendian by
> default.
>   powerpc/perf: split callchain.c by bitness
>   MAINTAINERS: perf: Add pattern that matches ppc perf to the perf
> entry.
> 
>  MAINTAINERS|   2 +
>  arch/powerpc/Kconfig   |   5 +-
>  arch/powerpc/include/asm/thread_info.h |   4 +-
>  arch/powerpc/include/asm/unistd.h  |   1 +
>  arch/powerpc/kernel/Makefile   |   6 +-
>  arch/powerpc/kernel/entry_64.S |   2 +
>  arch/powerpc/kernel/signal.c   | 144 +-
>  arch/powerpc/kernel/signal_32.c| 140 --
>  arch/powerpc/kernel/syscall_64.c   |   6 +-
>  arch/powerpc/kernel/vdso.c |   3 +-
>  arch/powerpc/perf/Makefile |   5 +-
>  arch/powerpc/perf/callchain.c  | 356 +
>  arch/powerpc/perf/callchain.h  |  20 ++
>  arch/powerpc/perf/callchain_32.c   | 196 ++
>  arch/powerpc/perf/callchain_64.c   | 174 
>  fs/read_write.c|   3 +-
>  16 files changed, 556 insertions(+), 511 deletions(-)
>  create mode 100644 arch/powerpc/perf/callchain.h
>  create mode 100644 arch/powerpc/perf/callchain_32.c
>  create mode 100644 arch/powerpc/perf/callchain_64.c
> 
> -- 
> 2.23.0
> 


Re: [RFC PATCH v1] pseries/drmem: don't cache node id in drmem_lmb struct

2020-03-11 Thread Michal Suchánek
On Wed, Mar 11, 2020 at 06:08:15PM -0500, Scott Cheloha wrote:
> At memory hot-remove time we can retrieve an LMB's nid from its
> corresponding memory_block.  There is no need to store the nid
> in multiple locations.
> 
> Signed-off-by: Scott Cheloha 
> ---
> The linear search in powerpc's memory_add_physaddr_to_nid() has become a
> bottleneck at boot on systems with many LMBs.
> 
> As described in this patch here:
> 
> https://lore.kernel.org/linuxppc-dev/20200221172901.1596249-2-chel...@linux.ibm.com/
> 
> the linear search seriously cripples drmem_init().
> 
> The obvious solution (shown in that patch) is to just make the search
> in memory_add_physaddr_to_nid() faster.  An XArray seems well-suited
> to the task of mapping an address range to an LMB object.
> 
> The less obvious approach is to just call memory_add_physaddr_to_nid()
> in fewer places.
> 
> I'm not sure which approach is correct, hence the RFC.

You basically revert the commit below, which will likely cause the very
error that was fixed there:

commit b2d3b5ee66f2a04a918cc043cec0c9ed3de58f40
Author: Nathan Fontenot 
Date:   Tue Oct 2 10:35:59 2018 -0500

powerpc/pseries: Track LMB nid instead of using device tree

When removing memory we need to remove the memory from the node
it was added to instead of looking up the node it should be in
in the device tree.

During testing we have seen scenarios where the affinity for a
LMB changes due to a partition migration or PRRN event. In these
cases the node the LMB exists in may not match the node the device
tree indicates it belongs in. This can lead to a system crash
when trying to DLPAR remove the LMB after a migration or PRRN
event. The current code looks up the node in the device tree to
remove the LMB from, the crash occurs when we try to offline this
node and it does not have any data, i.e. node_data[nid] == NULL.

Thanks

Michal


Re: [PATCH rebased 1/2] powerpc: reserve memory for capture kernel after hugepages init

2020-03-05 Thread Michal Suchánek
Hello,

This seems to cause a crash with a 1GB kdump reservation quite reliably.

Thanks

Michal

On Tue, Feb 18, 2020 at 05:28:34PM +0100, Michal Suchanek wrote:
> From: Hari Bathini 
> 
> Sometimes, memory reservation for KDump/FADump can overlap with memory
> marked for hugepages. This overlap leads to error, hang in KDump case
> and copy error reported by f/w in case of FADump, while trying to
> capture dump. Report error while setting up memory for the capture
> kernel instead of running into issues while capturing dump, by moving
> KDump/FADump reservation below MMU early init and failing gracefully
> when hugepages memory overlaps with capture kernel memory.
> 
> Signed-off-by: Hari Bathini 
> ---
>  arch/powerpc/kernel/prom.c | 16 
>  1 file changed, 8 insertions(+), 8 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c
> index 6620f37abe73..0f14dc9c4dab 100644
> --- a/arch/powerpc/kernel/prom.c
> +++ b/arch/powerpc/kernel/prom.c
> @@ -735,14 +735,6 @@ void __init early_init_devtree(void *params)
>   if (PHYSICAL_START > MEMORY_START)
>   memblock_reserve(MEMORY_START, 0x8000);
>   reserve_kdump_trampoline();
> -#if defined(CONFIG_FA_DUMP) || defined(CONFIG_PRESERVE_FA_DUMP)
> - /*
> -  * If we fail to reserve memory for firmware-assisted dump then
> -  * fallback to kexec based kdump.
> -  */
> - if (fadump_reserve_mem() == 0)
> -#endif
> - reserve_crashkernel();
>   early_reserve_mem();
>  
>   /* Ensure that total memory size is page-aligned. */
> @@ -781,6 +773,14 @@ void __init early_init_devtree(void *params)
>  #endif
>  
>   mmu_early_init_devtree();
> +#if defined(CONFIG_FA_DUMP) || defined(CONFIG_PRESERVE_FA_DUMP)
> + /*
> +  * If we fail to reserve memory for firmware-assisted dump then
> +  * fallback to kexec based kdump.
> +  */
> + if (fadump_reserve_mem() == 0)
> +#endif
> + reserve_crashkernel();
>  
>  #ifdef CONFIG_PPC_POWERNV
>   /* Scan and build the list of machine check recoverable ranges */
> -- 
> 2.23.0
> 


Re: [PATCH 2/2] powerpc: avoid adjusting memory_limit for capture kernel memory reservation

2020-02-18 Thread Michal Suchánek
On Fri, Jun 28, 2019 at 12:51:19AM +0530, Hari Bathini wrote:
> Currently, if memory_limit is specified and it overlaps with memory to
> be reserved for capture kernel, memory_limit is adjusted to accommodate
> capture kernel. With memory reservation for capture kernel moved later
> (after enforcing memory limit), this adjustment no longer holds water.
> So, avoid adjusting memory_limit and error out instead.

The adjustment of memory limit does not look quite sound:
 - There is no code to undo the adjustment in case reservation fails
 - I don't think the reservation is still forced to the end of memory,
   causing the kernel to use memory it was supposed not to
 - The CMA reservation again causes the reserved memory to be used
 - Finally, the CMA reservation makes this obsolete because the reserved
   memory can be used by the system

> 
> Signed-off-by: Hari Bathini 
Reviewed-by: Michal Suchanek 
> ---
>  arch/powerpc/kernel/fadump.c|   16 
>  arch/powerpc/kernel/machine_kexec.c |   22 +++---
>  2 files changed, 11 insertions(+), 27 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
> index 4eab972..a784695 100644
> --- a/arch/powerpc/kernel/fadump.c
> +++ b/arch/powerpc/kernel/fadump.c
> @@ -476,22 +476,6 @@ int __init fadump_reserve_mem(void)
>  #endif
>   }
>  
> - /*
> -  * Calculate the memory boundary.
> -  * If memory_limit is less than actual memory boundary then reserve
> -  * the memory for fadump beyond the memory_limit and adjust the
> -  * memory_limit accordingly, so that the running kernel can run with
> -  * specified memory_limit.
> -  */
> - if (memory_limit && memory_limit < memblock_end_of_DRAM()) {
> - size = get_fadump_area_size();
> - if ((memory_limit + size) < memblock_end_of_DRAM())
> - memory_limit += size;
> - else
> - memory_limit = memblock_end_of_DRAM();
> - printk(KERN_INFO "Adjusted memory_limit for firmware-assisted"
> - " dump, now %#016llx\n", memory_limit);
> - }
>   if (memory_limit)
>   memory_boundary = memory_limit;
>   else
> diff --git a/arch/powerpc/kernel/machine_kexec.c 
> b/arch/powerpc/kernel/machine_kexec.c
> index c4ed328..fc5533b 100644
> --- a/arch/powerpc/kernel/machine_kexec.c
> +++ b/arch/powerpc/kernel/machine_kexec.c
> @@ -125,10 +125,8 @@ void __init reserve_crashkernel(void)
>   crashk_res.end = crash_base + crash_size - 1;
>   }
>  
> - if (crashk_res.end == crashk_res.start) {
> - crashk_res.start = crashk_res.end = 0;
> - return;
> - }
> + if (crashk_res.end == crashk_res.start)
> + goto error_out;
>  
>   /* We might have got these values via the command line or the
>* device tree, either way sanitise them now. */
> @@ -170,15 +168,13 @@ void __init reserve_crashkernel(void)
>   if (overlaps_crashkernel(__pa(_stext), _end - _stext)) {
>   printk(KERN_WARNING
>   "Crash kernel can not overlap current kernel\n");
> - crashk_res.start = crashk_res.end = 0;
> - return;
> + goto error_out;
>   }
>  
>   /* Crash kernel trumps memory limit */
>   if (memory_limit && memory_limit <= crashk_res.end) {
> - memory_limit = crashk_res.end + 1;
> - printk("Adjusted memory limit for crashkernel, now 0x%llx\n",
> -memory_limit);
> + pr_err("Crash kernel size can't exceed memory_limit\n");
> + goto error_out;
>   }
>  
>   printk(KERN_INFO "Reserving %ldMB of memory at %ldMB "
> @@ -190,9 +186,13 @@ void __init reserve_crashkernel(void)
>   if (!memblock_is_region_memory(crashk_res.start, crash_size) ||
>   memblock_reserve(crashk_res.start, crash_size)) {
>   pr_err("Failed to reserve memory for crashkernel!\n");
> - crashk_res.start = crashk_res.end = 0;
> - return;
> + goto error_out;
>   }
> +
> + return;
> +error_out:
> + crashk_res.start = crashk_res.end = 0;
> + return;
>  }
>  
>  int overlaps_crashkernel(unsigned long start, unsigned long size)
> 


Re: [PATCH] powerpc/process: Remove unneccessary #ifdef CONFIG_PPC64 in copy_thread_tls()

2020-01-29 Thread Michal Suchánek
On Wed, Jan 29, 2020 at 07:50:07PM +, Christophe Leroy wrote:
> is_32bit_task() exists on both PPC64 and PPC32, no need for any ifdefery.
> 
> Signed-off-by: Christophe Leroy 
> ---
>  arch/powerpc/kernel/process.c | 2 --
>  1 file changed, 2 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
> index fad50db9dcf2..e730b8e522b0 100644
> --- a/arch/powerpc/kernel/process.c
> +++ b/arch/powerpc/kernel/process.c
> @@ -1634,11 +1634,9 @@ int copy_thread_tls(unsigned long clone_flags, 
> unsigned long usp,
>   p->thread.regs = childregs;
>   childregs->gpr[3] = 0;  /* Result from fork() */
>   if (clone_flags & CLONE_SETTLS) {
> -#ifdef CONFIG_PPC64
>   if (!is_32bit_task())
>   childregs->gpr[13] = tls;
>   else
> -#endif
>   childregs->gpr[2] = tls;
>   }
>  

Reviewed-by: Michal Suchanek 

Thanks

Michal


Re: [PATCH] powerpc/64: system call implement the bulk of the logic in C fix

2020-01-28 Thread Michal Suchánek
On Tue, Jan 28, 2020 at 10:41:02AM +1000, Nicholas Piggin wrote:
> Michal Suchánek's on January 28, 2020 4:08 am:
> > On Tue, Jan 28, 2020 at 12:17:12AM +1000, Nicholas Piggin wrote:
> >> This incremental patch fixes several soft-mask debug and unsafe
> >> smp_processor_id messages due to tracing and false positives in
> >> "unreconciled" code.
> >> 
> >> It also fixes a bug with syscall tracing functions that set registers
> >> (e.g., PTRACE_SETREG) not setting GPRs properly.
> >> 
> >> There was a bug reported with the TM selftests, I haven't been able
> >> to reproduce that one.
> >> 
> >> I can squash this into the main patch and resend the series if it
> >> helps but the incremental helps to see the bug fixes.
> > 
> > There are some whitespace differences between this and the series I have
> > applied locally. What does it apply to?
> > 
> > Is there some revision of the patchset I missed?
> 
> No I may have just missed some of your whitespace cleanups, or maybe I got
> some that Michael made which you don't have in his next-test branch.

Looks like the latter. I will pick patches from next-test.

Thanks

Michal


Re: [PATCH] powerpc/64: system call implement the bulk of the logic in C fix

2020-01-27 Thread Michal Suchánek
On Tue, Jan 28, 2020 at 12:17:12AM +1000, Nicholas Piggin wrote:
> This incremental patch fixes several soft-mask debug and unsafe
> smp_processor_id messages due to tracing and false positives in
> "unreconciled" code.
> 
> It also fixes a bug with syscall tracing functions that set registers
> (e.g., PTRACE_SETREG) not setting GPRs properly.
> 
> There was a bug reported with the TM selftests, I haven't been able
> to reproduce that one.
> 
> I can squash this into the main patch and resend the series if it
> helps but the incremental helps to see the bug fixes.

There are some whitespace differences between this and the series I have
applied locally. What does it apply to?

Is there some revision of the patchset I missed?

Thanks

Michal
> 
> Signed-off-by: Nicholas Piggin 
> ---
>  arch/powerpc/include/asm/cputime.h | 39 +-
>  arch/powerpc/kernel/syscall_64.c   | 26 ++--
>  2 files changed, 41 insertions(+), 24 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/cputime.h 
> b/arch/powerpc/include/asm/cputime.h
> index c43614cffaac..6639a6847cc0 100644
> --- a/arch/powerpc/include/asm/cputime.h
> +++ b/arch/powerpc/include/asm/cputime.h
> @@ -44,6 +44,28 @@ static inline unsigned long cputime_to_usecs(const 
> cputime_t ct)
>  #ifdef CONFIG_PPC64
 #define get_accounting(tsk)  (&get_paca()->accounting)
>  static inline void arch_vtime_task_switch(struct task_struct *tsk) { }
> +
> +/*
> + * account_cpu_user_entry/exit runs "unreconciled", so can't trace,
> + * can't use use get_paca()
> + */
> +static notrace inline void account_cpu_user_entry(void)
> +{
> + unsigned long tb = mftb();
+ struct cpu_accounting_data *acct = &local_paca->accounting;
> +
> + acct->utime += (tb - acct->starttime_user);
> + acct->starttime = tb;
> +}
> +static notrace inline void account_cpu_user_exit(void)
> +{
> + unsigned long tb = mftb();
+ struct cpu_accounting_data *acct = &local_paca->accounting;
> +
> + acct->stime += (tb - acct->starttime);
> + acct->starttime_user = tb;
> +}
> +
>  #else
 #define get_accounting(tsk)  (&task_thread_info(tsk)->accounting)
>  /*
> @@ -60,23 +82,6 @@ static inline void arch_vtime_task_switch(struct 
> task_struct *prev)
>  }
>  #endif
>  
> -static inline void account_cpu_user_entry(void)
> -{
> - unsigned long tb = mftb();
> - struct cpu_accounting_data *acct = get_accounting(current);
> -
> - acct->utime += (tb - acct->starttime_user);
> - acct->starttime = tb;
> -}
> -static inline void account_cpu_user_exit(void)
> -{
> - unsigned long tb = mftb();
> - struct cpu_accounting_data *acct = get_accounting(current);
> -
> - acct->stime += (tb - acct->starttime);
> - acct->starttime_user = tb;
> -}
> -
>  #endif /* __KERNEL__ */
>  #else /* CONFIG_VIRT_CPU_ACCOUNTING_NATIVE */
>  static inline void account_cpu_user_entry(void)
> diff --git a/arch/powerpc/kernel/syscall_64.c 
> b/arch/powerpc/kernel/syscall_64.c
> index 529393a1ff1e..cfe458adde07 100644
> --- a/arch/powerpc/kernel/syscall_64.c
> +++ b/arch/powerpc/kernel/syscall_64.c
> @@ -19,7 +19,8 @@ extern void __noreturn tabort_syscall(void);
>  
>  typedef long (*syscall_fn)(long, long, long, long, long, long);
>  
> -long system_call_exception(long r3, long r4, long r5, long r6, long r7, long 
> r8,
> +/* Has to run notrace because it is entered "unreconciled" */
> +notrace long system_call_exception(long r3, long r4, long r5, long r6, long 
> r7, long r8,
>  unsigned long r0, struct pt_regs *regs)
>  {
>   unsigned long ti_flags;
> @@ -36,7 +37,7 @@ long system_call_exception(long r3, long r4, long r5, long 
> r6, long r7, long r8,
>  #ifdef CONFIG_PPC_SPLPAR
>   if (IS_ENABLED(CONFIG_VIRT_CPU_ACCOUNTING_NATIVE) &&
>   firmware_has_feature(FW_FEATURE_SPLPAR)) {
> - struct lppaca *lp = get_lppaca();
> + struct lppaca *lp = local_paca->lppaca_ptr;
>  
>   if (unlikely(local_paca->dtl_ridx != be64_to_cpu(lp->dtl_idx)))
>   accumulate_stolen_time();
> @@ -71,13 +72,22 @@ long system_call_exception(long r3, long r4, long r5, 
> long r6, long r7, long r8,
>* We use the return value of do_syscall_trace_enter() as the
>* syscall number. If the syscall was rejected for any reason
>* do_syscall_trace_enter() returns an invalid syscall number
> -  * and the test below against NR_syscalls will fail.
> +  * and the test against NR_syscalls will fail and the return
> +  * value to be used is in regs->gpr[3].
>*/
>   r0 = do_syscall_trace_enter(regs);
> - }
> -
> - if (unlikely(r0 >= NR_syscalls))
> + if (unlikely(r0 >= NR_syscalls))
> + return regs->gpr[3];
> + r3 = regs->gpr[3];
> + r4 = regs->gpr[4];
> + r5 = regs->gpr[5];
> + r6 = regs->gpr[6];
> +

Re: [PATCH] powerpc: drmem: avoid NULL pointer dereference when drmem is unavailable

2020-01-23 Thread Michal Suchánek
On Thu, Jan 23, 2020 at 09:56:10AM -0600, Nathan Lynch wrote:
> Hello and thanks for the patch.
> 
> Libor Pechacek  writes:
> > In KVM guests drmem structure is only zero initialized. Trying to
> > manipulate DLPAR parameters results in a crash in this environment.
> 
> I think this statement needs qualification. Unless I'm mistaken, this
> happens only when you boot a guest without any hotpluggable memory
> configured, and then try to add or remove memory.
> 
> 
> > diff --git a/arch/powerpc/include/asm/drmem.h 
> > b/arch/powerpc/include/asm/drmem.h
> > index 3d76e1c388c2..28c3d936fdf3 100644
> > --- a/arch/powerpc/include/asm/drmem.h
> > +++ b/arch/powerpc/include/asm/drmem.h
> > @@ -27,12 +27,12 @@ struct drmem_lmb_info {
> >  extern struct drmem_lmb_info *drmem_info;
> >  
> >  #define for_each_drmem_lmb_in_range(lmb, start, end)   \
> > -   for ((lmb) = (start); (lmb) <= (end); (lmb)++)
> > +   for ((lmb) = (start); (lmb) < (end); (lmb)++)
> >  
> >  #define for_each_drmem_lmb(lmb)\
> > for_each_drmem_lmb_in_range((lmb),  \
> > &drmem_info->lmbs[0],   \
> > -   &drmem_info->lmbs[drmem_info->n_lmbs - 1])
> > +   &drmem_info->lmbs[drmem_info->n_lmbs])
> >  
> >  /*
> >   * The of_drconf_cell_v1 struct defines the layout of the LMB data
> > diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c 
> > b/arch/powerpc/platforms/pseries/hotplug-memory.c
> > index c126b94d1943..4ea6af002e27 100644
> > --- a/arch/powerpc/platforms/pseries/hotplug-memory.c
> > +++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
> > @@ -236,9 +236,9 @@ static int get_lmb_range(u32 drc_index, int n_lmbs,
> > if (!start)
> > return -EINVAL;
> >  
> > -   end = &start[n_lmbs - 1];
> > +   end = &start[n_lmbs];
> >  
> > -   last_lmb = &drmem_info->lmbs[drmem_info->n_lmbs - 1];
> > +   last_lmb = &drmem_info->lmbs[drmem_info->n_lmbs];
> > if (end > last_lmb)
> > return -EINVAL;
> 
> Is this not undefined behavior? I'd rather do this in a way that does
> not involve forming out-of-bounds pointers. Even if it's safe, naming
> that pointer "last_lmb" now actively hinders understanding of the code;
> it should be named "limit" or something.

Indeed, the name might be misleading now.

However, the loop differs from anything else we have in the kernel.

The C standard explicitly allows a pointer to point just past the last
element, precisely to allow expressing the iteration limit without
danger of overflow.
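
A self-contained illustration of the idiom (mock struct, not the kernel
one):

        #include <stdio.h>

        struct lmb { unsigned long base; };

        int main(void)
        {
                struct lmb lmbs[4], *lmb;
                /* one past the last element: valid to form and compare,
                 * never dereferenced */
                struct lmb *end = &lmbs[4];

                for (lmb = &lmbs[0]; lmb < end; lmb++)
                        lmb->base = (unsigned long)(lmb - lmbs);

                for (lmb = &lmbs[0]; lmb < end; lmb++)
                        printf("%lu\n", lmb->base);

                return 0;
        }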

Thanks

Michal


Re: [PATCH] powerpc: drmem: avoid NULL pointer dereference when drmem is unavailable

2020-01-22 Thread Michal Suchánek
On Thu, Jan 16, 2020 at 11:27:58AM +0100, Libor Pechacek wrote:
> In KVM guests drmem structure is only zero initialized. Trying to
> manipulate DLPAR parameters results in a crash in this environment.
> 
> $ echo "memory add count 1" > /sys/kernel/dlpar
> Oops: Kernel access of bad area, sig: 11 [#1]
> LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
> Modules linked in: af_packet(E) rfkill(E) nvram(E) vmx_crypto(E)
> gf128mul(E) e1000(E) virtio_balloon(E) rtc_generic(E) crct10dif_vpmsum(E)
> btrfs(E) blake2b_generic(E) libcrc32c(E) xor(E) raid6_pq(E) virtio_rng(E)
> virtio_blk(E) ohci_pci(E) ehci_pci(E) ohci_hcd(E) ehci_hcd(E)
> crc32c_vpmsum(E) usbcore(E) virtio_pci(E) virtio_ring(E) virtio(E) sg(E)
> dm_multipath(E) dm_mod(E) scsi_dh_rdac(E) scsi_dh_emc(E) scsi_dh_alua(E)
> scsi_mod(E)
> CPU: 1 PID: 4114 Comm: bash Kdump: loaded Tainted: GE 
> 5.5.0-rc6-2-default #1
> NIP:  c00ff294 LR: c00ff248 CTR: 
> REGS: c000fb9d3880 TRAP: 0300   Tainted: GE  
> (5.5.0-rc6-2-default)
> MSR:  80009033   CR: 28242428  XER: 2000
> CFAR: c09a6c10 DAR: 0010 DSISR: 4000 IRQMASK: 0
> GPR00: c00ff248 c000fb9d3b10 c1682e00 0033
> GPR04: c000ff30bf90 c000ff394800 5110 ffe8
> GPR08:   fe1c 
> GPR12: 2200 c0003fffee00  00011cbc37c0
> GPR16: 00011cb27ed0  00011cb6dd10 
> GPR20: 00011cb7db28 01003ce035f0 00011cbc7828 00011cbc6c70
> GPR24: 01003cf01210  c000ffade4e0 c2d7216b
> GPR28: 0001 c2d78560  c15458d0
> NIP [c00ff294] dlpar_memory+0x6e4/0xd00
> LR [c00ff248] dlpar_memory+0x698/0xd00
> Call Trace:
> [c000fb9d3b10] [c00ff248] dlpar_memory+0x698/0xd00 (unreliable)
> [c000fb9d3ba0] [c00f5990] handle_dlpar_errorlog+0xc0/0x190
> [c000fb9d3c10] [c00f5c58] dlpar_store+0x198/0x4a0
> [c000fb9d3cd0] [c0c4cb00] kobj_attr_store+0x30/0x50
> [c000fb9d3cf0] [c05a37b4] sysfs_kf_write+0x64/0x90
> [c000fb9d3d10] [c05a2c90] kernfs_fop_write+0x1b0/0x290
> [c000fb9d3d60] [c04a2bec] __vfs_write+0x3c/0x70
> [c000fb9d3d80] [c04a6560] vfs_write+0xd0/0x260
> [c000fb9d3dd0] [c04a69ac] ksys_write+0xdc/0x130
> [c000fb9d3e20] [c000b478] system_call+0x5c/0x68
> Instruction dump:
> ebc9 1ce70018 38e7ffe8 7cfe3a14 7fbe3840 419dff14 fb610068 7fc9f378
> 3900 480c 6000 4195fef4 <81490010> 39290018 38c80001 7ea93840
> ---[ end trace cc2dd8152608c295 ]---
> 
> Taking closer look at the code, I can see that for_each_drmem_lmb is a
> macro expanding into `for (lmb = &drmem_info->lmbs[0]; lmb <=
> &drmem_info->lmbs[drmem_info->n_lmbs - 1]; lmb++)`. When drmem_info->lmbs
> is NULL, the loop would iterate through the whole address range if it
> weren't stopped by the NULL pointer dereference on the next line.
> 
> This patch aligns for_each_drmem_lmb and for_each_drmem_lmb_in_range macro
> behavior with the common C semantics, where the end marker does not belong
> to the scanned range, and alters get_lmb_range() semantics. As a side
> effect, the wraparound observed in the crash is prevented.
> 
> Fixes: 6c6ea53725b3 ("powerpc/mm: Separate ibm, dynamic-memory data from DT 
> format")
> Cc: Michal Suchanek 
> Cc: sta...@vger.kernel.org
> Signed-off-by: Libor Pechacek 

Reviewed-by: Michal Suchanek 
> ---
>  arch/powerpc/include/asm/drmem.h| 4 ++--
>  arch/powerpc/platforms/pseries/hotplug-memory.c | 4 ++--
>  2 files changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/drmem.h 
> b/arch/powerpc/include/asm/drmem.h
> index 3d76e1c388c2..28c3d936fdf3 100644
> --- a/arch/powerpc/include/asm/drmem.h
> +++ b/arch/powerpc/include/asm/drmem.h
> @@ -27,12 +27,12 @@ struct drmem_lmb_info {
>  extern struct drmem_lmb_info *drmem_info;
>  
>  #define for_each_drmem_lmb_in_range(lmb, start, end) \
> - for ((lmb) = (start); (lmb) <= (end); (lmb)++)
> + for ((lmb) = (start); (lmb) < (end); (lmb)++)
>  
>  #define for_each_drmem_lmb(lmb)  \
>   for_each_drmem_lmb_in_range((lmb),  \
>   &drmem_info->lmbs[0],   \
> - &drmem_info->lmbs[drmem_info->n_lmbs - 1])
> + &drmem_info->lmbs[drmem_info->n_lmbs])
>  
>  /*
>   * The of_drconf_cell_v1 struct defines the layout of the LMB data
> diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c 
> b/arch/powerpc/platforms/pseries/hotplug-memory.c
> index c126b94d1943..4ea6af002e27 100644
> --- a/arch/powerpc/platforms/pseries/hotplug-memory.c
> +++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
> @@ -236,9 +236,9 @@ static int 

Re: [PATCH v6 3/6] powerpc/fadump: reorganize /sys/kernel/fadump_* sysfs files

2020-01-08 Thread Michal Suchánek
On Wed, Dec 11, 2019 at 09:39:07PM +0530, Sourabh Jain wrote:
> As the number of FADump sysfs files increases it is hard to manage all of
> them inside the /sys/kernel directory. It's better to have all the FADump
> related sysfs files in a dedicated directory /sys/kernel/fadump. But in
> order to maintain backward compatibility a symlink has been added for every
> sysfs file that has moved to the new location.

Patched this series into my test kernel and the sysfs files look OK.
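
For reference, what I checked looks roughly like this (file names taken
from the patch descriptions; the exact set and link targets may differ
by config, so treat this as an approximation):

        # new canonical location
        $ ls /sys/kernel/fadump/
        enabled  mem_reserved  registered  release_mem

        # old names kept as compatibility symlinks
        $ readlink /sys/kernel/fadump_enabled
        fadump/enabled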

Tested-by: Michal Suchanek 

Thanks

Michal


Re: [PATCH 2/2] powerpc: avoid adjusting memory_limit for capture kernel memory reservation

2020-01-06 Thread Michal Suchánek
On Wed, Jul 24, 2019 at 11:26:59AM +0530, Mahesh Jagannath Salgaonkar wrote:
> On 7/22/19 11:19 PM, Michal Suchánek wrote:
> > On Fri, 28 Jun 2019 00:51:19 +0530
> > Hari Bathini  wrote:
> > 
> >> Currently, if memory_limit is specified and it overlaps with memory to
> >> be reserved for capture kernel, memory_limit is adjusted to accommodate
> >> capture kernel. With memory reservation for capture kernel moved later
> >> (after enforcing memory limit), this adjustment no longer holds water.
> >> So, avoid adjusting memory_limit and error out instead.
> > 
> > Can you split out the memory limit adjustment out of memory reservation
> > so it can still be adjusted?
> 
> Do you mean adjust the memory limit before we do the actual reservation?

Yes, without that you get a regression in the ability to enable fadump
with limited memory - something like the patch below should fix it. Then
again, there is no code to un-move the memory_limit in case the allocation
fails, and we now have the CMA allocation, for which allocating beyond
memory_limit is dubious. So maybe removing the memory_limit adjustment is
a bugfix that removes a 'feature' which has bitrotted over time.

Thanks

Michal

From: Michal Suchanek 
Date: Mon, 6 Jan 2020 14:55:40 +0100
Subject: [PATCH 2/2] powerpc/fadump: adjust memlimit before MMU early init

Moving the fadump memory reservation after early MMU init made the
memory_limit adjustment that makes room for fadump ineffective.

Move the adjustment back before early MMU init.

Signed-off-by: Michal Suchanek 
---
 arch/powerpc/include/asm/fadump.h |  3 +-
 arch/powerpc/kernel/fadump.c  | 80 +++
 arch/powerpc/kernel/prom.c|  3 ++
 3 files changed, 66 insertions(+), 20 deletions(-)

diff --git a/arch/powerpc/include/asm/fadump.h 
b/arch/powerpc/include/asm/fadump.h
index 526a6a647312..76d3cbe1379c 100644
--- a/arch/powerpc/include/asm/fadump.h
+++ b/arch/powerpc/include/asm/fadump.h
@@ -30,6 +30,7 @@ static inline void fadump_cleanup(void) { }
 #if defined(CONFIG_FA_DUMP) || defined(CONFIG_PRESERVE_FA_DUMP)
 extern int early_init_dt_scan_fw_dump(unsigned long node, const char *uname,
  int depth, void *data);
-extern int fadump_reserve_mem(void);
+int fadump_adjust_memlimit(void);
+int fadump_reserve_mem(void);
 #endif
 #endif /* _ASM_POWERPC_FADUMP_H */
diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
index 8ad6d8d1cdbe..4d76452dcb3d 100644
--- a/arch/powerpc/kernel/fadump.c
+++ b/arch/powerpc/kernel/fadump.c
@@ -431,19 +431,22 @@ static int __init fadump_get_boot_mem_regions(void)
return ret;
 }
 
-int __init fadump_reserve_mem(void)
+static inline u64 fadump_get_reserve_alignment(void)
 {
-   u64 base, size, mem_boundary, bootmem_min, align = PAGE_SIZE;
-   bool is_memblock_bottom_up = memblock_bottom_up();
-   int ret = 1;
+   u64 align = PAGE_SIZE;
 
-   if (!fw_dump.fadump_enabled)
-   return 0;
+#ifdef CONFIG_CMA
+   if (!fw_dump.nocma)
+   align = FADUMP_CMA_ALIGNMENT;
+#endif
 
-   if (!fw_dump.fadump_supported) {
-   pr_info("Firmware-Assisted Dump is not supported on this 
hardware\n");
-   goto error_out;
-   }
+   return align;
+}
+
+static inline u64 fadump_get_bootmem_min(void)
+{
+   u64 bootmem_min = 0;
+   u64 align = fadump_get_reserve_alignment();
 
/*
 * Initialize boot memory size
@@ -455,7 +458,6 @@ int __init fadump_reserve_mem(void)
PAGE_ALIGN(fadump_calculate_reserve_size());
 #ifdef CONFIG_CMA
if (!fw_dump.nocma) {
-   align = FADUMP_CMA_ALIGNMENT;
fw_dump.boot_memory_size =
ALIGN(fw_dump.boot_memory_size, align);
}
@@ -472,8 +474,43 @@ int __init fadump_reserve_mem(void)
pr_err("Too many holes in boot memory area to enable 
fadump\n");
goto error_out;
}
+
+   }
+
+   return bootmem_min;
+error_out:
+   fw_dump.fadump_enabled = 0;
+   return 0;
+}
+
+int __init fadump_adjust_memlimit(void)
+{
+   u64 size, bootmem_min;
+
+   if (!fw_dump.fadump_enabled)
+   return 0;
+
+   if (!fw_dump.fadump_supported) {
+   pr_info("Firmware-Assisted Dump is not supported on this 
hardware\n");
+   fw_dump.fadump_enabled = 0;
+   return 0;
}
 
+#ifdef CONFIG_HUGETLB_PAGE
+   if (fw_dump.dump_active) {
+   /*
+* FADump capture kernel doesn't care much about hugepages.
+* In fact, handling hugepages in capture kernel is asking for
+* trouble. So, disable HugeTLB support when fadump is active.
+*/
+

Re: [PATCH v15 06/23] selftests/vm/pkeys: Typecast the pkey register

2019-12-18 Thread Michal Suchánek
On Wed, Dec 18, 2019 at 01:01:46PM -0800, Dave Hansen wrote:
> On 12/18/19 12:59 PM, Michal Suchánek wrote:
> >> I'd really just rather do %016lx *everywhere* than sprinkle the
> >> PKEY_REG_FMTs around.
> > Does lx work with u32 without warnings?
> 
> Either way, I'd be happy to just make the x86 one u64 to make the whole
> thing look more sane,

So long as it still works with u64 on x86 it should be pretty
future-proof.  Does not look like we are getting 128bit registers for
anything but math units any time soon.

Thanks

Michal


Re: [PATCH v15 06/23] selftests/vm/pkeys: Typecast the pkey register

2019-12-18 Thread Michal Suchánek
On Wed, Dec 18, 2019 at 12:46:50PM -0800, Dave Hansen wrote:
> On 12/17/19 11:51 PM, Sandipan Das wrote:
> > write_pkey_reg(pkey_reg);
> > -   dprintf4("pkey_reg now: %08x\n", read_pkey_reg());
> > +   dprintf4("pkey_reg now: "PKEY_REG_FMT"\n", read_pkey_reg());
> >  }
> >  
> >  #define ARRAY_SIZE(x) (sizeof(x) / sizeof(*(x)))
> > diff --git a/tools/testing/selftests/vm/pkey-x86.h 
> > b/tools/testing/selftests/vm/pkey-x86.h
> > index 2f04ade8ca9c..5f40901219d3 100644
> > --- a/tools/testing/selftests/vm/pkey-x86.h
> > +++ b/tools/testing/selftests/vm/pkey-x86.h
> > @@ -46,6 +46,8 @@
> >  #define HPAGE_SIZE (1UL<<21)
> >  #define PAGE_SIZE  4096
> >  #define MB (1<<20)
> > +#define pkey_reg_t u32
> > +#define PKEY_REG_FMT   "%016x"
> 
> How big is the ppc one?
u64
> 
> I'd really just rather do %016lx *everywhere* than sprinkle the
> PKEY_REG_FMTs around.

Does lx work with u32 without warnings?

It's likely the size difference that requires a format specifier definition.

> 
> BTW, why are you doing a %016lx for a u32?

It's "%016x" without 'l' for x86 and with 'l' for ppc64.

Thanks

Michal


Re: [PATCH] powerpc: Add barrier_nospec to raw_copy_in_user()

2019-12-15 Thread Michal Suchánek
On Thu, Sep 19, 2019 at 12:04:27AM +1000, Michael Ellerman wrote:
> Michal Suchánek  writes:
> > On Wed,  6 Mar 2019 12:10:38 +1100
> > Suraj Jitindar Singh  wrote:
> >
> >> Commit ddf35cf3764b ("powerpc: Use barrier_nospec in copy_from_user()")
> >> Added barrier_nospec before loading from user-controlled pointers.
> >> The intention was to order the load from the potentially user-controlled
> >> pointer vs a previous branch based on an access_ok() check or similar.
> >> 
> >> In order to achieve the same result, add a barrier_nospec to the
> >> raw_copy_in_user() function before loading from such a user-controlled
> >> pointer.
> >> 
> >> Signed-off-by: Suraj Jitindar Singh 
> >> ---
> >>  arch/powerpc/include/asm/uaccess.h | 1 +
> >>  1 file changed, 1 insertion(+)
> >> 
> >> diff --git a/arch/powerpc/include/asm/uaccess.h 
> >> b/arch/powerpc/include/asm/uaccess.h
> >> index e3a731793ea2..bb615592d5bb 100644
> >> --- a/arch/powerpc/include/asm/uaccess.h
> >> +++ b/arch/powerpc/include/asm/uaccess.h
> >> @@ -306,6 +306,7 @@ extern unsigned long __copy_tofrom_user(void __user 
> >> *to,
> >>  static inline unsigned long
> >>  raw_copy_in_user(void __user *to, const void __user *from, unsigned long 
> >> n)
> >>  {
> >> +  barrier_nospec();
> >>return __copy_tofrom_user(to, from, n);
> >>  }
> >>  #endif /* __powerpc64__ */
> >
> > Hello,
> >
> > AFAICT this is not needed. The data cannot be used in the kernel without
> > subsequent copy_from_user which already has the nospec barrier.
> 
> Suraj did talk to me about this before sending the patch. My memory is
> that he came up with a scenario where it was at least theoretically
> useful, but he didn't mention that in the change log. He was working on
> nested KVM at the time.
> 
> I've now forgotten, and he's moved on to other things so probably won't
> remember either, if he's even checking his mail :/

In the absence of any argument for the code, and given the absence of the
same code on other architectures, I would say you were just confused when
merging this. The code is confusing, after all.
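
For reference, the barrier the argument relies on - simplified from
arch/powerpc/include/asm/uaccess.h of this era:

	static inline unsigned long raw_copy_from_user(void *to,
			const void __user *from, unsigned long n)
	{
		barrier_nospec();
		return __copy_tofrom_user((__force void __user *)to, from, n);
	}

Data moved by raw_copy_in_user() stays in user memory, so it can only
become observable to the kernel through a later copy_from_user(), which
already executes this barrier.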

Thanks

Michal


Re: [PATCH] powerpc/fault: kernel can extend a user process's stack

2019-12-10 Thread Michal Suchánek
On Wed, Dec 11, 2019 at 12:43:37PM +1100, Daniel Axtens wrote:
> If a process page-faults trying to write beyond the end of its
> stack, we attempt to grow the stack.
> 
> However, if the kernel attempts to write beyond the end of a
> process's stack, it takes a bad fault. This can occur when the
> kernel is trying to set up a signal frame.
> 
> Permit the kernel to grow a process's stack. The same general
> limits as to how and when the stack can be grown apply: the kernel
> code still passes through expand_stack(), so anything that goes
> beyond e.g. the rlimit should still be blocked.
> 
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=205183
Fixes: 14cf11af6cf6 ("powerpc: Merge enough to start building in arch/powerpc.")
> Reported-by: Tom Lane 
> Cc: Daniel Black 
> Signed-off-by: Daniel Axtens 
> ---
>  arch/powerpc/mm/fault.c | 10 ++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
> index b5047f9b5dec..00183731ea22 100644
> --- a/arch/powerpc/mm/fault.c
> +++ b/arch/powerpc/mm/fault.c
> @@ -287,7 +287,17 @@ static bool bad_stack_expansion(struct pt_regs *regs, 
> unsigned long address,
>   if (!res)
>   return !store_updates_sp(inst);
>   *must_retry = true;
> + } else if ((flags & FAULT_FLAG_WRITE) &&
> +!(flags & FAULT_FLAG_USER)) {
> + /*
> +  * the kernel can also attempt to write beyond the end
> +  * of a process's stack - for example setting up a
> +  * signal frame. We assume this is valid, subject to
> +  * the checks in expand_stack() later.
> +  */
> + return false;
>   }
> +
>   return true;
>   }
>   return false;
> -- 
> 2.20.1
> 
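
For context, the kernel-mode write the changelog describes comes from
signal delivery - roughly this path, simplified from
arch/powerpc/kernel/signal_64.c with error handling omitted:

	/* handle_rt_signal64(), simplified: pick a frame address below
	 * the user stack pointer... */
	frame = (struct rt_sigframe __user *)get_sigframe(ksig, tsk,
							  sizeof(*frame), 0);
	/* ...then store to it from kernel mode; if that address lies
	 * beyond the current stack VMA the store faults, which is the
	 * case the hunk above now sends through expand_stack() */
	err |= __put_user(&frame->info, &frame->pinfo);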


Re: [PATCH v2 29/35] powerpc/perf: remove current_is_64bit()

2019-11-27 Thread Michal Suchánek
On Wed, Nov 27, 2019 at 06:41:09AM +0100, Christophe Leroy wrote:
> 
> 
> Le 26/11/2019 à 21:13, Michal Suchanek a écrit :
> > Since commit ed1cd6deb013 ("powerpc: Activate CONFIG_THREAD_INFO_IN_TASK")
> > current_is_64bit() is equivalent to !is_32bit_task().
> > Remove the redundant function.
> > 
> > Link: https://github.com/linuxppc/issues/issues/275
> > Link: https://lkml.org/lkml/2019/9/12/540
> > 
> > Fixes: linuxppc#275
> > Suggested-by: Christophe Leroy 
> > Signed-off-by: Michal Suchanek 
> 
> This change is already in powerpc/next, see 
> https://github.com/linuxppc/linux/commit/42484d2c0f82b666292faf6668c77b49a3a04bc0

Right, needs rebase.

Thanks

Michal
> 
> Christophe
> 
> > ---
> >   arch/powerpc/perf/callchain.c | 17 +
> >   1 file changed, 1 insertion(+), 16 deletions(-)
> > 
> > diff --git a/arch/powerpc/perf/callchain.c b/arch/powerpc/perf/callchain.c
> > index c84bbd4298a0..35d542515faf 100644
> > --- a/arch/powerpc/perf/callchain.c
> > +++ b/arch/powerpc/perf/callchain.c
> > @@ -284,16 +284,6 @@ static void perf_callchain_user_64(struct 
> > perf_callchain_entry_ctx *entry,
> > }
> >   }
> > -static inline int current_is_64bit(void)
> > -{
> > -   /*
> > -* We can't use test_thread_flag() here because we may be on an
> > -* interrupt stack, and the thread flags don't get copied over
> > -* from the thread_info on the main stack to the interrupt stack.
> > -*/
> > -   return !test_ti_thread_flag(task_thread_info(current), TIF_32BIT);
> > -}
> > -
> >   #else  /* CONFIG_PPC64 */
> >   /*
> >* On 32-bit we just access the address and let hash_page create a
> > @@ -321,11 +311,6 @@ static inline void perf_callchain_user_64(struct 
> > perf_callchain_entry_ctx *entry
> >   {
> >   }
> > -static inline int current_is_64bit(void)
> > -{
> > -   return 0;
> > -}
> > -
> >   static inline int valid_user_sp(unsigned long sp, int is_64)
> >   {
> > if (!sp || (sp & 7) || sp > TASK_SIZE - 32)
> > @@ -486,7 +471,7 @@ static void perf_callchain_user_32(struct 
> > perf_callchain_entry_ctx *entry,
> >   void
> >   perf_callchain_user(struct perf_callchain_entry_ctx *entry, struct 
> > pt_regs *regs)
> >   {
> > -   if (current_is_64bit())
> > +   if (!is_32bit_task())
> > perf_callchain_user_64(entry, regs);
> > else
> > perf_callchain_user_32(entry, regs);
> > 


Re: [PATCH v3 2/4] powerpc/fadump: reorganize /sys/kernel/fadump_* sysfs files

2019-11-24 Thread Michal Suchánek
On Sat, Nov 16, 2019 at 08:07:29PM +0530, Sourabh Jain wrote:
> 
> 
> On 11/9/19 6:29 PM, Michal Suchánek wrote:
> > On Sat, Nov 09, 2019 at 05:53:37PM +0530, Sourabh Jain wrote:
> >> As the number of FADump sysfs files increases it is hard to manage all of
> >> them inside /sys/kernel directory. It's better to have all the FADump
> >> related sysfs files in a dedicated directory /sys/kernel/fadump. But in
> >> order to maintain the backward compatibility the /sys/kernel/fadump_*
> >> sysfs files are replicated inside /sys/kernel/fadump/ and eventually get
> >> removed in future.
> >>
> >> As the FADump sysfs files are now part of dedicated directory there is no
> >> need to prefix their name with fadump_, hence sysfs file names are also
> >> updated. For example fadump_enabled sysfs file is now referred as enabled.
> >>
> >> Also consolidate ABI documentation for all the FADump sysfs files in a
> >> single file Documentation/ABI/testing/sysfs-kernel-fadump.
> >>
> >> Signed-off-by: Sourabh Jain 
> >> ---
> >>  Documentation/ABI/testing/sysfs-kernel-fadump | 41 +++
> >>  arch/powerpc/kernel/fadump.c  | 38 +
> >>  arch/powerpc/platforms/powernv/opal-core.c| 10 +++--
> >>  3 files changed, 86 insertions(+), 3 deletions(-)
> >>  create mode 100644 Documentation/ABI/testing/sysfs-kernel-fadump
> >>
> >> diff --git a/Documentation/ABI/testing/sysfs-kernel-fadump 
> >> b/Documentation/ABI/testing/sysfs-kernel-fadump
> >> new file mode 100644
> >> index ..a77f1a5ba389
> >> --- /dev/null
> >> +++ b/Documentation/ABI/testing/sysfs-kernel-fadump
> >> @@ -0,0 +1,41 @@
> >> +What: /sys/kernel/fadump/*
> >> +Date: Nov 2019
> >> +Contact:  linuxppc-dev@lists.ozlabs.org
> >> +Description:
> >> +  The /sys/kernel/fadump/* is a collection of FADump sysfs
> >> +  files that provide information about the configuration status
> >> +  of Firmware Assisted Dump (FADump).
> >> +
> >> +What: /sys/kernel/fadump/enabled
> >> +Date: Nov 2019
> >> +Contact:  linuxppc-dev@lists.ozlabs.org
> >> +Description:  read only
> >> +  Primarily used to identify whether the FADump is enabled in
> >> +  the kernel or not.
> >> +User: Kdump service
> >> +
> >> +What: /sys/kernel/fadump/registered
> >> +Date: Nov 2019
> >> +Contact:  linuxppc-dev@lists.ozlabs.org
> >> +Description:  read/write
> >> +  Helps to control the dump collect feature from userspace.
> >> +  Setting 1 to this file enables the system to collect the
> >> +  dump and 0 to disable it.
> >> +User: Kdump service
> >> +
> >> +What: /sys/kernel/fadump/release_mem
> >> +Date: Nov 2019
> >> +Contact:  linuxppc-dev@lists.ozlabs.org
> >> +Description:  write only
> >> +  This is a special sysfs file and only available when
> >> +  the system is booted to capture the vmcore using FADump.
> >> +  It is used to release the memory reserved by FADump to
> >> +  save the crash dump.
> >> +
> >> +What: /sys/kernel/fadump/release_opalcore
> >> +Date: Nov 2019
> >> +Contact:  linuxppc-dev@lists.ozlabs.org
> >> +Description:  write only
> >> +  The sysfs file is available when the system is booted to
> >> +  collect the dump on OPAL based machines. It is used to release
> >> +  the memory used to collect the opalcore.
> >> diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
> >> index ed59855430b9..a9591def0c84 100644
> >> --- a/arch/powerpc/kernel/fadump.c
> >> +++ b/arch/powerpc/kernel/fadump.c
> >> @@ -1418,6 +1418,9 @@ static int fadump_region_show(struct seq_file *m, 
> >> void *private)
> >>return 0;
> >>  }
> >>  
> >> +struct kobject *fadump_kobj;
> >> +EXPORT_SYMBOL_GPL(fadump_kobj);
> >> +
> >>  static struct kobj_attribute fadump_release_attr = 
> >> __ATTR(fadump_release_mem,
> >>0200, NULL,
> >>fadump_release_memory_store);
> >> @@ -1428,6 

Re: WARN_ONCE does not warn once

2019-11-15 Thread Michal Suchánek
On Fri, Nov 15, 2019 at 09:44:34AM +0100, Michal Suchánek wrote:
> On Fri, Nov 15, 2019 at 03:43:24PM +1100, Michael Ellerman wrote:
> > Michal Suchánek  writes:
> > > On Thu, Nov 14, 2019 at 05:46:55PM +0100, Michal Suchánek wrote:
> > >> Hello,
> > >> 
> > >> on powernv with 5.3.8 kernel I get flood of messages on boot.
> > >> 
> > >> The messages match WARN_ONCE(1, "opal: OPAL_CONSOLE_FLUSH missing.\n");
> > >
> > > Asking around it was pointed out that WARN_ONCE warns up to as many
> > > times as you have CPUs.
> > 
> > Does it?
> > 
> > > Did not bother counting the messages but it may very well be the case it
> > > is printed once for each CPU.
> > 
> > The way it's implemented is slightly racy, but I'd be surprised if every
> > CPU hit that race all at once.
> 
Printing a WARN_ONCE this early probably forces some peculiar timing.
'grep WARN.*__opal_flush_console dmesg.txt | wc -l' gives exactly the
number of CPUs as shown by lscpu.
> 
And this does not change when enforcing once using an atomic.

diff --git a/arch/powerpc/platforms/powernv/opal.c 
b/arch/powerpc/platforms/powernv/opal.c
index d271accf224b..dd870124b804 100644
--- a/arch/powerpc/platforms/powernv/opal.c
+++ b/arch/powerpc/platforms/powernv/opal.c
@@ -425,6 +425,7 @@ static s64 __opal_flush_console(uint32_t vtermno)
s64 rc;
 
if (!opal_check_token(OPAL_CONSOLE_FLUSH)) {
+   static atomic_t warned;
__be64 evt;
 
/*
@@ -432,7 +433,8 @@ static s64 __opal_flush_console(uint32_t vtermno)
 * the console can still be flushed by calling the polling
 * function while it has OPAL_EVENT_CONSOLE_OUTPUT events.
 */
-   WARN_ONCE(1, "opal: OPAL_CONSOLE_FLUSH missing.\n");
+   if (!atomic_xchg(&warned, 1))
+   WARN(1, "opal: OPAL_CONSOLE_FLUSH missing.\n");
 
opal_poll_events(&evt);
if (!(be64_to_cpu(evt) & OPAL_EVENT_CONSOLE_OUTPUT))

Something more tricky is going on.

Thanks

Michal


Re: WARN_ONCE does not warn once

2019-11-14 Thread Michal Suchánek
On Thu, Nov 14, 2019 at 05:46:55PM +0100, Michal Suchánek wrote:
> Hello,
> 
> on powernv with 5.3.8 kernel I get flood of messages on boot.
> 
> The messages match WARN_ONCE(1, "opal: OPAL_CONSOLE_FLUSH missing.\n");

Asking around it was pointed out that WARN_ONCE warns up to as many
times as you have CPUs.

Did not bother counting the messages but it may very well be the case it
is printed once for each CPU.
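
For reference, the generic once-latch is a plain load/store with no
atomicity - simplified from include/asm-generic/bug.h of this era, details
vary by kernel version:

	#define WARN_ONCE(condition, format...) ({			\
		static bool __section(.data.once) __warned;		\
		int __ret_warn_once = !!(condition);			\
									\
		if (unlikely(__ret_warn_once && !__warned)) {		\
			__warned = true;    /* plain store, not atomic */ \
			WARN(1, format);				\
		}							\
		unlikely(__ret_warn_once);				\
	})

Every CPU that loads __warned before the first writer's store becomes
visible takes the branch, so one warning per CPU is possible in principle.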

Thanks

Michal


WARN_ONCE does not warn once

2019-11-14 Thread Michal Suchánek
Hello,

on powernv with 5.3.8 kernel I get flood of messages on boot.

The messages match WARN_ONCE(1, "opal: OPAL_CONSOLE_FLUSH missing.\n");

[ cut here ]
opal: OPAL_CONSOLE_FLUSH missing.
WARNING: CPU: 0 PID: 0 at ../arch/powerpc/platforms/powernv/opal.c:435 
__opal_flush_console+0xfc/0x110
Modules linked in:
Supported: No, Unreleased kernel
CPU: 0 PID: 0 Comm: swapper Not tainted 5.3.8-11.gea9b07d-default #1 SLE15-SP2 
(unreleased)
NIP:  c00b9b9c LR: c00b9b98 CTR: c0c06ca0
REGS: c158b8c0 TRAP: 0700   Not tainted  (5.3.8-11.gea9b07d-default)
MSR:  90021033   CR: 28004222  XER: 2000
CFAR: c0132df0 IRQMASK: 1 
GPR00: c00b9b98 c158bb50 c158c600 0022 
GPR04: c0e63b7a 0002 4f534e4f435f4c41 4853554c465f454c 
GPR08: 676e697373696d20  0027 90001033 
GPR12: c01cc080 c215   
GPR16: 313114c0  c14641e0  
GPR20:  c1765738 c1763738 0001 
GPR24: c15c1ed8 c1763318 c1762dec 0001 
GPR28:  c2130d18   
NIP [c00b9b9c] __opal_flush_console+0xfc/0x110
LR [c00b9b98] __opal_flush_console+0xf8/0x110
Call Trace:
[c158bb50] [c00b9b98] __opal_flush_console+0xf8/0x110 
(unreliable)
[c158bbd0] [c00b9dbc] opal_flush_console+0x2c/0x60
[c158bc00] [c080c180] udbg_opal_putc+0x80/0xd0
[c158bc50] [c0033660] udbg_write+0x90/0x140
[c158bc90] [c01c888c] console_unlock+0x32c/0x820
[c158bd70] [c01c9768] register_console+0x5d8/0x790
[c158be10] [c0fb9dcc] register_early_udbg_console+0x9c/0xb4
[c158be80] [c0fb99c8] setup_arch+0x78/0x39c
[c158bef0] [c0fb3c00] start_kernel+0xb4/0x6bc
[c158bf90] [c000c4d4] start_here_common+0x1c/0x20
Instruction dump:
6000 3860fffe 4ba8 6000 6000 3c62ff8d 3921 3d42fff6 
38637558 992afea8 480791f5 6000 <0fe0> 4b54 48079715 6000 
random: get_random_bytes called from print_oops_end_marker+0x6c/0xa0 with 
crng_init=0
---[ end trace  ]---

This is repeated numerous times and the kernel eventually boots.

Is that an issue with the WARN_ONCE implementation?

Thanks

Michal


Re: [PATCH] Fix wrong message when RFI Flush is disable

2019-11-14 Thread Michal Suchánek
On Thu, Nov 14, 2019 at 08:07:35PM +1100, Michael Ellerman wrote:
> On Thu, 2019-05-02 at 21:09:07 UTC, Gustavo Walbon wrote:
> > From: "Gustavo L. F. Walbon" 
> > 
> > The issue was showing "Mitigation" message via sysfs whatever the state of
> > "RFI Flush", but it should show "Vulnerable" when it is disabled.
> > 
> > If you have "L1D private" feature enabled and not "RFI Flush" you are
> > vulnerable to meltdown attacks.
> > 
> > "RFI Flush" is the key feature to mitigate the meltdown whatever the
> > "L1D private" state.
> > 
> > SEC_FTR_L1D_THREAD_PRIV is a feature for Power9 only.
> > 
> > So the message should be as the truth table shows.
> > CPU | L1D private | RFI Flush |   sysfs   |
> > | --- | - | - |
> >  P9 |False|   False   | Vulnerable
> >  P9 |False|   True| Mitigation: RFI Flush
> >  P9 |True |   False   | Vulnerable: L1D private per thread
> >  P9 |True |   True| Mitigation: RFI Flush, L1D private per
> > | |   | thread
> >  P8 |False|   False   | Vulnerable
> >  P8 |False|   True| Mitigation: RFI Flush
> > 
> > Output before this fix:
> >  # cat /sys/devices/system/cpu/vulnerabilities/meltdown
> >  Mitigation: RFI Flush, L1D private per thread
> >  # echo 0 > /sys/kernel/debug/powerpc/rfi_flush
> >  # cat /sys/devices/system/cpu/vulnerabilities/meltdown
> >  Mitigation: L1D private per thread
> > 
> > Output after fix:
> >  # cat /sys/devices/system/cpu/vulnerabilities/meltdown
> >  Mitigation: RFI Flush, L1D private per thread
> >  # echo 0 > /sys/kernel/debug/powerpc/rfi_flush
> >  # cat /sys/devices/system/cpu/vulnerabilities/meltdown
> >  Vulnerable: L1D private per thread
> > 
> > Link: https://github.com/linuxppc/issues/issues/243
> > 
> > Signed-off-by: Gustavo L. F. Walbon 
> > Signed-off-by: Mauro S. M. Rodrigues 
> 
> Applied to powerpc next, thanks.
> 
> https://git.kernel.org/powerpc/c/4e706af3cd8e1d0503c25332b30cad33c97ed442
> 
> cheers

Fixes: ff348355e9c7 ("powerpc/64s: Enhance the information in cpu_show_meltdown()")

Thanks

Michal


Re: [PATCH 31/33] powerpc/64: make buildable without CONFIG_COMPAT

2019-11-13 Thread Michal Suchánek
On Wed, Nov 13, 2019 at 01:02:34PM +1000, Nicholas Piggin wrote:
> Michal Suchanek's on November 13, 2019 2:52 am:
> > There are numerous references to 32bit functions in generic and 64bit
> > code so ifdef them out.
> > 
> > Signed-off-by: Michal Suchanek 
> 
> For the most part these seem okay to me.
> 
> > diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
> > index 45f1d5e54671..35874119b398 100644
> > --- a/arch/powerpc/kernel/Makefile
> > +++ b/arch/powerpc/kernel/Makefile
> > @@ -44,16 +44,16 @@ CFLAGS_btext.o += -DDISABLE_BRANCH_PROFILING
> >  endif
> >  
> >  obj-y  := cputable.o ptrace.o syscalls.o \
> > -  irq.o align.o signal_32.o pmc.o vdso.o \
> > +  irq.o align.o signal_$(BITS).o pmc.o vdso.o \
> >process.o systbl.o idle.o \
> >signal.o sysfs.o cacheinfo.o time.o \
> >prom.o traps.o setup-common.o \
> >udbg.o misc.o io.o misc_$(BITS).o \
> >of_platform.o prom_parse.o
> > -obj-$(CONFIG_PPC64)+= setup_64.o sys_ppc32.o \
> > -  signal_64.o ptrace32.o \
> > +obj-$(CONFIG_PPC64)+= setup_64.o \
> >paca.o nvram_64.o firmware.o note.o \
> >syscall_64.o
> > +obj-$(CONFIG_COMPAT)   += sys_ppc32.o ptrace32.o signal_32.o
> >  obj-$(CONFIG_VDSO32)   += vdso32/
> >  obj-$(CONFIG_PPC_WATCHDOG) += watchdog.o
> >  obj-$(CONFIG_HAVE_HW_BREAKPOINT)   += hw_breakpoint.o
> > diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
> > index 00173cc904ef..c339a984958f 100644
> > --- a/arch/powerpc/kernel/entry_64.S
> > +++ b/arch/powerpc/kernel/entry_64.S
> > @@ -52,8 +52,10 @@
> >  SYS_CALL_TABLE:
> > .tc sys_call_table[TC],sys_call_table
> >  
> > +#ifdef CONFIG_COMPAT
> >  COMPAT_SYS_CALL_TABLE:
> > .tc compat_sys_call_table[TC],compat_sys_call_table
> > +#endif
> >  
> >  /* This value is used to mark exception frames on the stack. */
> >  exception_marker:
> > diff --git a/arch/powerpc/kernel/signal.c b/arch/powerpc/kernel/signal.c
> > index 60436432399f..61678cb0e6a1 100644
> > --- a/arch/powerpc/kernel/signal.c
> > +++ b/arch/powerpc/kernel/signal.c
> > @@ -247,7 +247,6 @@ static void do_signal(struct task_struct *tsk)
> > sigset_t *oldset = sigmask_to_save();
> > struct ksignal ksig = { .sig = 0 };
> > int ret;
> > -   int is32 = is_32bit_task();
> >  
> > BUG_ON(tsk != current);
> >  
> > @@ -277,7 +276,7 @@ static void do_signal(struct task_struct *tsk)
> >  
> > rseq_signal_deliver(&ksig, tsk->thread.regs);
> >  
> > -   if (is32) {
> > +   if (is_32bit_task()) {
> > if (ksig.ka.sa.sa_flags & SA_SIGINFO)
> > ret = handle_rt_signal32(&ksig, oldset, tsk);
> > else
> 
> This is just a clean up I guess.

It also expands directly to if(0) or if(1) for the !COMPAT cases. I am
not sure how it would work with the intermediate variable.

There was a more complex change initially, but after some additional
cleanups removing the variable is the only part left.
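
A tiny userspace mimic of the folding - illustrative only, where
IS_ENABLED_PPC32 stands in for the kernel's IS_ENABLED(CONFIG_PPC32) in a
64-bit !COMPAT build:

	#include <stdio.h>

	#define IS_ENABLED_PPC32 0			/* 64-bit, !COMPAT */
	#define is_32bit_task() (IS_ENABLED_PPC32)	/* expands to 0 */

	int main(void)
	{
		if (is_32bit_task())	/* if (0): branch is dead code */
			puts("32-bit/compat path");
		else
			puts("64-bit path");
		return 0;
	}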

> 
> > diff --git a/arch/powerpc/kernel/syscall_64.c 
> > b/arch/powerpc/kernel/syscall_64.c
> > index d00cfc4a39a9..319ebd4f494d 100644
> > --- a/arch/powerpc/kernel/syscall_64.c
> > +++ b/arch/powerpc/kernel/syscall_64.c
> > @@ -17,7 +17,6 @@ typedef long (*syscall_fn)(long, long, long, long, long, 
> > long);
> >  
> >  long system_call_exception(long r3, long r4, long r5, long r6, long r7, 
> > long r8, unsigned long r0, struct pt_regs *regs)
> >  {
> > -   unsigned long ti_flags;
> > syscall_fn f;
> >  
> > if (IS_ENABLED(CONFIG_PPC_BOOK3S))
> > @@ -64,8 +63,7 @@ long system_call_exception(long r3, long r4, long r5, 
> > long r6, long r7, long r8,
> >  
> > __hard_irq_enable();
> >  
> > -   ti_flags = current_thread_info()->flags;
> > -   if (unlikely(ti_flags & _TIF_SYSCALL_DOTRACE)) {
> > +   if (unlikely(current_thread_info()->flags & _TIF_SYSCALL_DOTRACE)) {
> > /*
> >  * We use the return value of do_syscall_trace_enter() as the
> >  * syscall number. If the syscall was rejected for any reason
> > @@ -81,7 +79,7 @@ long system_call_exception(long r3, long r4, long r5, 
> > long r6, long r7, long r8,
> > /* May be faster to do array_index_nospec? */
> > barrier_nospec();
> >  
> > -   if (unlikely(ti_flags & _TIF_32BIT)) {
> > +   if (unlikely(is_32bit_task())) {
> > f = (void *)compat_sys_call_table[r0];
> >  
> > r3 &= 0xULL;
> 
> I guess this is okay, I did want to be careful about where ti_flags
> was loaded exactly, but I think DOTRACE and 32BIT are not volatile.
> Is it possible to define _TIF_32BIT to zero for 64-bit !compat case
> and have the original branch 

Re: [PATCH v3 2/4] powerpc/fadump: reorganize /sys/kernel/fadump_* sysfs files

2019-11-09 Thread Michal Suchánek
On Sat, Nov 09, 2019 at 05:53:37PM +0530, Sourabh Jain wrote:
> As the number of FADump sysfs files increases it is hard to manage all of
> them inside /sys/kernel directory. It's better to have all the FADump
> related sysfs files in a dedicated directory /sys/kernel/fadump. But in
> order to maintain the backward compatibility the /sys/kernel/fadump_*
> sysfs files are replicated inside /sys/kernel/fadump/ and eventually get
> removed in future.
> 
> As the FADump sysfs files are now part of dedicated directory there is no
> need to prefix their name with fadump_, hence sysfs file names are also
> updated. For example fadump_enabled sysfs file is now referred as enabled.
> 
> Also consolidate ABI documentation for all the FADump sysfs files in a
> single file Documentation/ABI/testing/sysfs-kernel-fadump.
> 
> Signed-off-by: Sourabh Jain 
> ---
>  Documentation/ABI/testing/sysfs-kernel-fadump | 41 +++
>  arch/powerpc/kernel/fadump.c  | 38 +
>  arch/powerpc/platforms/powernv/opal-core.c| 10 +++--
>  3 files changed, 86 insertions(+), 3 deletions(-)
>  create mode 100644 Documentation/ABI/testing/sysfs-kernel-fadump
> 
> diff --git a/Documentation/ABI/testing/sysfs-kernel-fadump 
> b/Documentation/ABI/testing/sysfs-kernel-fadump
> new file mode 100644
> index ..a77f1a5ba389
> --- /dev/null
> +++ b/Documentation/ABI/testing/sysfs-kernel-fadump
> @@ -0,0 +1,41 @@
> +What:/sys/kernel/fadump/*
> +Date:Nov 2019
> +Contact: linuxppc-dev@lists.ozlabs.org
> +Description:
> + The /sys/kernel/fadump/* is a collection of FADump sysfs
> + files that provide information about the configuration status
> + of Firmware Assisted Dump (FADump).
> +
> +What:/sys/kernel/fadump/enabled
> +Date:Nov 2019
> +Contact: linuxppc-dev@lists.ozlabs.org
> +Description: read only
> + Primarily used to identify whether the FADump is enabled in
> + the kernel or not.
> +User:Kdump service
> +
> +What:/sys/kernel/fadump/registered
> +Date:Nov 2019
> +Contact: linuxppc-dev@lists.ozlabs.org
> +Description: read/write
> + Helps to control the dump collect feature from userspace.
> + Setting 1 to this file enables the system to collect the
> + dump and 0 to disable it.
> +User:Kdump service
> +
> +What:/sys/kernel/fadump/release_mem
> +Date:Nov 2019
> +Contact: linuxppc-dev@lists.ozlabs.org
> +Description: write only
> + This is a special sysfs file and only available when
> + the system is booted to capture the vmcore using FADump.
> + It is used to release the memory reserved by FADump to
> + save the crash dump.
> +
> +What:/sys/kernel/fadump/release_opalcore
> +Date:Nov 2019
> +Contact: linuxppc-dev@lists.ozlabs.org
> +Description: write only
> + The sysfs file is available when the system is booted to
> + collect the dump on OPAL based machines. It is used to release
> + the memory used to collect the opalcore.
> diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
> index ed59855430b9..a9591def0c84 100644
> --- a/arch/powerpc/kernel/fadump.c
> +++ b/arch/powerpc/kernel/fadump.c
> @@ -1418,6 +1418,9 @@ static int fadump_region_show(struct seq_file *m, void 
> *private)
>   return 0;
>  }
>  
> +struct kobject *fadump_kobj;
> +EXPORT_SYMBOL_GPL(fadump_kobj);
> +
>  static struct kobj_attribute fadump_release_attr = __ATTR(fadump_release_mem,
>   0200, NULL,
>   fadump_release_memory_store);
> @@ -1428,6 +1431,16 @@ static struct kobj_attribute fadump_register_attr = 
> __ATTR(fadump_registered,
>   0644, fadump_register_show,
>   fadump_register_store);
>  
> +static struct kobj_attribute release_attr = __ATTR(release_mem,
> + 0200, NULL,
> + fadump_release_memory_store);
> +static struct kobj_attribute enable_attr = __ATTR(enabled,
> + 0444, fadump_enabled_show,
> + NULL);
> +static struct kobj_attribute register_attr = __ATTR(registered,
> + 0644, fadump_register_show,
> + fadump_register_store);
> +
>  DEFINE_SHOW_ATTRIBUTE(fadump_region);
>  
>  static void fadump_init_files(void)
> @@ -1435,6 +1448,11 @@ static void fadump_init_files(void)
>   struct dentry *debugfs_file;
>   int rc = 0;
>  
> + fadump_kobj = 

Re: [PATCH] powerpc/fadump: Remove duplicate message.

2019-11-07 Thread Michal Suchánek
On Thu, Oct 24, 2019 at 01:16:51PM +0200, Michal Suchánek wrote:
> On Thu, Oct 24, 2019 at 04:08:08PM +0530, Hari Bathini wrote:
> > 
> > Michal, thanks for looking into this.
> > 
> > On 23/10/19 11:26 PM, Michal Suchanek wrote:
> > > There is duplicate message about lack of support by firmware in
> > > fadump_reserve_mem and setup_fadump. Due to different capitalization it
> > > is clear that the one in setup_fadump is shown on boot. Remove the
> > > duplicate that is not shown.
> > 
> > Actually, the message in fadump_reserve_mem() is logged. 
> > fadump_reserve_mem()
> > executes first and sets fw_dump.fadump_enabled to `0`, if fadump is not 
> > supported.
> > So, the other message in setup_fadump() doesn't get logged anymore with 
> > recent
> > changes. The right thing to do would be to remove similar message in 
> > setup_fadump() instead.
> 
> I need to re-check with a recent kernel build. I saw the message from
> setup_fadump and not the one from fadump_reserve_mem but not sure what
> the platform init code looked like in the kernel I tested with.

Indeed, I was missing the patch that changes the capitalization in
fadump_reserve_mem. In my kernel both messages are the same and the one
from fadump_reserve_mem is displayed.

Thanks

Michal


Re: [RFC PATCH 00/27] current interrupt series plus scv syscall

2019-10-30 Thread Michal Suchánek
Hello,

On Wed, Oct 02, 2019 at 01:13:52PM +1000, Nicholas Piggin wrote:
> Michal Suchánek's on September 24, 2019 7:33 pm:
> > Hello,
> > 
can you mark the individual patches with RFC rather than the whole
> > series?
> 
> Hey, thanks for the reviews. I'll resend all but the last two patches
> soon for merge in the next window.

Will you resend these or should I cherry-pick the part I need for the
!COMPAT?

Thanks

Michal


Re: [PATCH] powerpc/fadump: Remove duplicate message.

2019-10-24 Thread Michal Suchánek
On Thu, Oct 24, 2019 at 04:08:08PM +0530, Hari Bathini wrote:
> 
> Michal, thanks for looking into this.
> 
> On 23/10/19 11:26 PM, Michal Suchanek wrote:
> > There is duplicate message about lack of support by firmware in
> > fadump_reserve_mem and setup_fadump. Due to different capitalization it
> > is clear that the one in setup_fadump is shown on boot. Remove the
> > duplicate that is not shown.
> 
> Actually, the message in fadump_reserve_mem() is logged. fadump_reserve_mem()
> executes first and sets fw_dump.fadump_enabled to `0`, if fadump is not 
> supported.
> So, the other message in setup_fadump() doesn't get logged anymore with recent
> changes. The right thing to do would be to remove similar message in 
> setup_fadump() instead.

I need to re-check with a recent kernel build. I saw the message from
setup_fadump and not the one from fadump_reserve_mem but not sure what
the platform init code looked like in the kernel I tested with.

Thanks

Michal


Re: [RFC PATCH 23/27] powerpc/64: system call implement the bulk of the logic in C

2019-10-02 Thread Michal Suchánek
On Sun, Sep 15, 2019 at 11:28:09AM +1000, Nicholas Piggin wrote:
> System call entry and particularly exit code is beyond the limit of what
> is reasonable to implement in asm.
> 
> This conversion moves all conditional branches out of the asm code,
> except for the case that all GPRs should be restored at exit.
> 
> Null syscall test is about 5% faster after this patch, because the exit
> work is handled under local_irq_disable, and the hard mask and pending
> interrupt replay is handled after that, which avoids games with MSR.
> 
> Signed-off-by: Nicholas Piggin 
> 
> v3:
> - Fix !KUAP build [mpe]
> - Fix BookE build/boot [mpe]
> - Don't trace irqs with MSR[RI]=0
> - Don't allow syscall_exit_prepare to be ftraced, because function
>   graph tracing which traces exits barfs after the IRQ state is
>   prepared for kernel exit.
> - Fix BE syscall table to use normal function descriptors now that they
>   are called from C.
> - Comment syscall_exit_prepare.
...
> -#if defined(CONFIG_VIRT_CPU_ACCOUNTING_NATIVE) && defined(CONFIG_PPC_SPLPAR)
> -BEGIN_FW_FTR_SECTION
> - /* see if there are any DTL entries to process */
> - ld  r10,PACALPPACAPTR(r13)  /* get ptr to VPA */
> - ld  r11,PACA_DTL_RIDX(r13)  /* get log read index */
> - addi r10,r10,LPPACA_DTLIDX
> - LDX_BE  r10,0,r10   /* get log write index */
> - cmpd r11,r10
> - beq+ 33f
> - bl  accumulate_stolen_time
> - REST_GPR(0,r1)
> - REST_4GPRS(3,r1)
> - REST_2GPRS(7,r1)
> - addi r9,r1,STACK_FRAME_OVERHEAD
> -33:
> -END_FW_FTR_SECTION_IFSET(FW_FEATURE_SPLPAR)
> -#endif /* CONFIG_VIRT_CPU_ACCOUNTING_NATIVE && CONFIG_PPC_SPLPAR */
...
> diff --git a/arch/powerpc/kernel/syscall_64.c 
> b/arch/powerpc/kernel/syscall_64.c
> new file mode 100644
> index ..1d2529824588
> --- /dev/null
> +++ b/arch/powerpc/kernel/syscall_64.c
> @@ -0,0 +1,195 @@
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +extern void __noreturn tabort_syscall(void);
> +
> +typedef long (*syscall_fn)(long, long, long, long, long, long);
> +
> +long system_call_exception(long r3, long r4, long r5, long r6, long r7, long 
> r8, unsigned long r0, struct pt_regs *regs)
> +{
> + unsigned long ti_flags;
> + syscall_fn f;
> +
> + BUG_ON(!(regs->msr & MSR_PR));
> +
> + if (IS_ENABLED(CONFIG_PPC_TRANSACTIONAL_MEM) &&
> + unlikely(regs->msr & MSR_TS_T))
> + tabort_syscall();
> +
> + account_cpu_user_entry();
> +
> +#ifdef CONFIG_PPC_SPLPAR
> + if (IS_ENABLED(CONFIG_VIRT_CPU_ACCOUNTING_NATIVE) &&
> + firmware_has_feature(FW_FEATURE_SPLPAR)) {
> + struct lppaca *lp = get_lppaca();
> +
> + if (unlikely(local_paca->dtl_ridx != lp->dtl_idx))

sparse complains about this, and in time.c it is converted like this:
if (unlikely(local_paca->dtl_ridx != be64_to_cpu(lp->dtl_idx)))
The removed code has a LDX_BE there so there should be some conversion
involved, right?
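
So presumably the hunk needs to become something like this (untested):

	if (unlikely(local_paca->dtl_ridx != be64_to_cpu(lp->dtl_idx)))
		accumulate_stolen_time();

given that dtl_idx lives in the big-endian lppaca while dtl_ridx is a
native-endian paca field.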

Thanks

Michal


Re: [RFC PATCH 27/27] powerpc/64s: system call support for scv/rfscv instructions

2019-10-02 Thread Michal Suchánek
On Sun, Sep 15, 2019 at 11:28:13AM +1000, Nicholas Piggin wrote:
> Add support for the scv instruction on POWER9 and later CPUs.
> 
> For now this implements the zeroth scv vector 'scv 0', as identical
> to 'sc' system calls, with the exception that lr is not preserved, and
> it is 64-bit only. There may yet be changes made to this ABI, so it's
> for testing only.
> 
> This also doesn't yet properly handle PR KVM, or the case where a guest
> is denied AIL=3 mode. I haven't added real mode entry points, so scv
> must not be used when AIL=0, but I haven't cleared the FSCR in this
> case.
> 
> This also implements a strange hack to handle the non-implemented
> vectors, scheduling a decrementer and expecting it to interrupt and
> replay pending soft masked interrupts. This is unlikely to be a good
> idea, and needs to actually do a proper handler and replay in case an
> interrupt is pending.
> 
> It may also require some errata handling before it can be safely used
> on all POWER9 CPUs, I have to look that up.
> 
> rfscv is implemented to return from scv type system calls. It can not
> be used to return from sc system calls because those are defined to
> preserve lr.
> 
> In a comparison of getpid syscall, the test program had scv taking
> about 3 more cycles in user mode (92 vs 89 for sc), due to lr handling.
> Total cycles taken for a getpid system call on POWER9 are improved from
> 436 to 345 (26.3% faster), mostly due to reducing mtmsr and mtspr.
...
> diff --git a/arch/powerpc/kernel/syscall_64.c 
> b/arch/powerpc/kernel/syscall_64.c
> index 034b52d3d78c..3e8aa5ae8ec8 100644
> --- a/arch/powerpc/kernel/syscall_64.c
> +++ b/arch/powerpc/kernel/syscall_64.c
> @@ -15,6 +15,77 @@ extern void __noreturn tabort_syscall(void);
>  
>  typedef long (*syscall_fn)(long, long, long, long, long, long);
>  
> +#ifdef CONFIG_PPC_BOOK3S
> +long system_call_vectored_exception(long r3, long r4, long r5, long r6, long 
> r7, long r8, unsigned long r0, struct pt_regs *regs)
> +{
> + unsigned long ti_flags;
> + syscall_fn f;
> +
> + BUG_ON(!(regs->msr & MSR_RI));
> + BUG_ON(!(regs->msr & MSR_PR));
> + BUG_ON(!FULL_REGS(regs));
> + BUG_ON(regs->softe != IRQS_ENABLED);
> +
> + if (IS_ENABLED(CONFIG_PPC_TRANSACTIONAL_MEM) &&
> + unlikely(regs->msr & MSR_TS_T))
> + tabort_syscall();
> +
> + account_cpu_user_entry();
> +
> +#ifdef CONFIG_PPC_SPLPAR
> + if (IS_ENABLED(CONFIG_VIRT_CPU_ACCOUNTING_NATIVE) &&
> + firmware_has_feature(FW_FEATURE_SPLPAR)) {
> + struct lppaca *lp = get_lppaca();
> +
> + if (unlikely(local_paca->dtl_ridx != lp->dtl_idx))
This adds another instance of the missing endian conversion issue.
> + accumulate_stolen_time();
> + }
> +#endif

Thanks

Michal


Re: [RFC PATCH 24/27] powerpc/64s: interrupt return in C

2019-10-02 Thread Michal Suchánek
On Sun, Sep 15, 2019 at 11:28:10AM +1000, Nicholas Piggin wrote:
> Implement the bulk of interrupt return logic in C. The asm return code
> must handle a few cases: restoring full GPRs, and emulating stack store.
> 
> The asm return code is moved into 64e for now. The new logic has made
> allowance for 64e, but I don't have a full environment that works well
> to test it, and even booting in emulated qemu is not great for stress
> testing. 64e shouldn't be too far off working with this, given a bit
> more testing and auditing of the logic.
> 
> This is slightly faster on a POWER9 (page fault speed increases about
> 1.1%), probably due to reduced mtmsrd.
> 
> Signed-off-by: Nicholas Piggin 
...
> diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
> index 24621e7e5033..45c1524b6c9e 100644
> --- a/arch/powerpc/kernel/process.c
> +++ b/arch/powerpc/kernel/process.c
> @@ -524,6 +524,7 @@ void giveup_all(struct task_struct *tsk)
>  }
>  EXPORT_SYMBOL(giveup_all);
>  
> +#ifdef CONFIG_PPC_BOOK3S_64
This fails to build on !BOOK3S_64 because restore_fpu and restore_altivec
are used exclusively from restore_math, which is now BOOK3S_64-only; see
the sketch below the quoted hunk.
>  /*
>   * The exception exit path calls restore_math() with interrupts hard disabled
>   * but the soft irq state not "reconciled". ftrace code that calls
> @@ -564,6 +565,7 @@ void notrace restore_math(struct pt_regs *regs)
>  
>   regs->msr = msr;
>  }
> +#endif
>  
>  static void save_all(struct task_struct *tsk)
>  {
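
If the other configs only need the call to go away, an empty stub next to
the new #ifdef should be enough - an untested sketch, not necessarily what
was merged:

	/* untested sketch: keep !PPC_BOOK3S_64 linking when the real
	 * restore_math() is compiled out */
	#ifndef CONFIG_PPC_BOOK3S_64
	void notrace restore_math(struct pt_regs *regs)
	{
	}
	#endif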

Thanks

Michal


Re: [RFC PATCH 00/27] current interrupt series plus scv syscall

2019-10-02 Thread Michal Suchánek
Hello,

can you mark the individual patches with RFC rather than the whole
series?

Thanks

Michal

On Sun, Sep 15, 2019 at 11:27:46AM +1000, Nicholas Piggin wrote:
> My interrupt entry patches have finally collided with syscall and
> interrupt exit patches, so I'll merge the series. Most patches have
> been seen already, however there have been a number of small changes
> and fixes throughout the series.
> 
> The final two patches add support for 'scv' and 'rfscv' instructions.
> 
> I'm posting this out now so we can start considering ABI and userspace
> support. We have the PPC_FEATURE2_SCV hwcap for this.
> 
> Thanks,
> Nick
> 
> Nicholas Piggin (27):
>   powerpc/64s/exception: Introduce INT_DEFINE parameter block for code
> generation
>   powerpc/64s/exception: Add GEN_COMMON macro that uses INT_DEFINE
> parameters
>   powerpc/64s/exception: Add GEN_KVM macro that uses INT_DEFINE
> parameters
>   powerpc/64s/exception: Expand EXC_COMMON and EXC_COMMON_ASYNC macros
>   powerpc/64s/exception: Move all interrupt handlers to new style code
> gen macros
>   powerpc/64s/exception: Remove old INT_ENTRY macro
>   powerpc/64s/exception: Remove old INT_COMMON macro
>   powerpc/64s/exception: Remove old INT_KVM_HANDLER
>   powerpc/64s/exception: Add ISIDE option
>   powerpc/64s/exception: move real->virt switch into the common handler
>   powerpc/64s/exception: move soft-mask test to common code
>   powerpc/64s/exception: move KVM test to common code
>   powerpc/64s/exception: remove confusing IEARLY option
>   powerpc/64s/exception: remove the SPR saving patch code macros
>   powerpc/64s/exception: trim unused arguments from KVMTEST macro
>   powerpc/64s/exception: hdecrementer avoid touching the stack
>   powerpc/64s/exception: re-inline some handlers
>   powerpc/64s/exception: Clean up SRR specifiers
>   powerpc/64s/exception: add more comments for interrupt handlers
>   powerpc/64s/exception: only test KVM in SRR interrupts when PR KVM is
> supported
>   powerpc/64s/exception: soft nmi interrupt should not use
> ret_from_except
>   powerpc/64: system call remove non-volatile GPR save optimisation
>   powerpc/64: system call implement the bulk of the logic in C
>   powerpc/64s: interrupt return in C
>   powerpc/64s/exception: remove lite interrupt return
>   powerpc/64s/exception: treat NIA below __end_interrupts as soft-masked
>   powerpc/64s: system call support for scv/rfscv instructions
> 
>  arch/powerpc/include/asm/asm-prototypes.h |   11 -
>  .../powerpc/include/asm/book3s/64/kup-radix.h |   24 +-
>  arch/powerpc/include/asm/cputime.h|   24 +
>  arch/powerpc/include/asm/exception-64s.h  |4 -
>  arch/powerpc/include/asm/head-64.h|2 +-
>  arch/powerpc/include/asm/hw_irq.h |4 +
>  arch/powerpc/include/asm/ppc_asm.h|2 +
>  arch/powerpc/include/asm/processor.h  |2 +-
>  arch/powerpc/include/asm/ptrace.h |3 +
>  arch/powerpc/include/asm/signal.h |3 +
>  arch/powerpc/include/asm/switch_to.h  |   11 +
>  arch/powerpc/include/asm/time.h   |4 +-
>  arch/powerpc/kernel/Makefile  |3 +-
>  arch/powerpc/kernel/cpu_setup_power.S |2 +-
>  arch/powerpc/kernel/dt_cpu_ftrs.c |1 +
>  arch/powerpc/kernel/entry_64.S|  964 ++--
>  arch/powerpc/kernel/exceptions-64e.S  |  254 +-
>  arch/powerpc/kernel/exceptions-64s.S  | 2046 -
>  arch/powerpc/kernel/process.c |2 +
>  arch/powerpc/kernel/signal.h  |2 -
>  arch/powerpc/kernel/syscall_64.c  |  422 
>  arch/powerpc/kernel/syscalls/syscall.tbl  |   22 +-
>  arch/powerpc/kernel/systbl.S  |9 +-
>  arch/powerpc/kernel/time.c|9 -
>  arch/powerpc/kernel/vector.S  |2 +-
>  arch/powerpc/kvm/book3s_hv_rmhandlers.S   |   11 -
>  arch/powerpc/kvm/book3s_segment.S |7 -
>  27 files changed, 2458 insertions(+), 1392 deletions(-)
>  create mode 100644 arch/powerpc/kernel/syscall_64.c
> 
> -- 
> 2.23.0
> 


Re: [PATCH v7 5/6] powerpc/64: Make COMPAT user-selectable disabled on littleendian by default.

2019-09-14 Thread Michal Suchánek
On Tue, 03 Sep 2019 10:00:57 +1000
Michael Ellerman  wrote:

> Michal Suchánek  writes:
> > On Mon, 02 Sep 2019 12:03:12 +1000
> > Michael Ellerman  wrote:
> >  
> >> Michal Suchanek  writes:  
> >> > On bigendian ppc64 it is common to have 32bit legacy binaries but much
> >> > less so on littleendian.
> >> 
> >> I think the toolchain people will tell you that there is no 32-bit
> >> little endian ABI defined at all, if anything works it's by accident.  
> >
I have seen a piece of software that works around code issues on 64bit
> > by always compiling 32bit code. So it does work in some way.  
> 
> What software is that?

The only one I have seen is stockfish (v9)

> 
> > Also it has been pointed out that you can still switch to BE even with
> > the 'fast-switch' removed.  
> 
> Yes we have a proper syscall for endian switching, sys_switch_endian(),
> which is definitely supported.
> 
> But that *only* switches the endian-ness of the process, it does nothing
> to the syscall layer. So any process that switches to the other endian
> must endian flip syscall arguments (that aren't in registers), or flip
> back to the native endian before calling syscalls.

In other words, just installing a chroot of binaries built for the other
endian won't work. You need something like qemu to do the syscall
translation, or to run a full VM with a kernel that has the swapped-endian
syscall ABI.
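
To make "endian flip syscall arguments" concrete - a sketch with a
hypothetical helper, purely illustrative of what a switched process must
do for every multi-byte value it hands the kernel through memory:

	#include <byteswap.h>
	#include <stdint.h>

	/* hypothetical: a process running opposite-endian to the kernel
	 * swaps a 64-bit value back to the kernel's byte order before
	 * storing it where a syscall will read it */
	static inline uint64_t to_kernel_order(uint64_t native)
	{
		return bswap_64(native);
	}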

Thanks

Michal


Re: [PATCH v8 5/7] powerpc/64: make buildable without CONFIG_COMPAT

2019-09-12 Thread Michal Suchánek
On Thu, 12 Sep 2019 21:36:11 +0200
Christophe Leroy  wrote:

> Le 12/09/2019 à 20:26, Michal Suchánek a écrit :
> > On Thu, 12 Sep 2019 20:02:16 +0200
> > Christophe Leroy  wrote:
> >   
> >> Le 12/09/2019 à 19:26, Michal Suchanek a écrit :  
> >>> There are numerous references to 32bit functions in generic and 64bit
> >>> code so ifdef them out.
> >>>
> >>> Signed-off-by: Michal Suchanek 
> >>> ---
> >>> v2:
> >>> - fix 32bit ifdef condition in signal.c
> >>> - simplify the compat ifdef condition in vdso.c - 64bit is redundant
> >>> - simplify the compat ifdef condition in callchain.c - 64bit is redundant
> >>> v3:
> >>> - use IS_ENABLED and maybe_unused where possible
> >>> - do not ifdef declarations
> >>> - clean up Makefile
> >>> v4:
> >>> - further makefile cleanup
> >>> - simplify is_32bit_task conditions
> >>> - avoid ifdef in condition by using return
> >>> v5:
> >>> - avoid unreachable code on 32bit
> >>> - make is_current_64bit constant on !COMPAT
> >>> - add stub perf_callchain_user_32 to avoid some ifdefs
> >>> v6:
> >>> - consolidate current_is_64bit
> >>> v7:
> >>> - remove leftover perf_callchain_user_32 stub from previous series version
> >>> v8:
> >>> - fix build again - too trigger-happy with stub removal
> >>> - remove a vdso.c hunk that causes warning according to kbuild test robot
> >>> ---
> >>>arch/powerpc/include/asm/thread_info.h |  4 +--
> >>>arch/powerpc/kernel/Makefile   |  7 ++---
> >>>arch/powerpc/kernel/entry_64.S |  2 ++
> >>>arch/powerpc/kernel/signal.c   |  3 +-
> >>>arch/powerpc/kernel/syscall_64.c   |  6 ++--
> >>>arch/powerpc/kernel/vdso.c |  3 +-
> >>>arch/powerpc/perf/callchain.c  | 39 ++
> >>>7 files changed, 33 insertions(+), 31 deletions(-)
> >>>
> >>> diff --git a/arch/powerpc/include/asm/thread_info.h 
> >>> b/arch/powerpc/include/asm/thread_info.h
> >>> index 8e1d0195ac36..c128d8a48ea3 100644
> >>> --- a/arch/powerpc/include/asm/thread_info.h
> >>> +++ b/arch/powerpc/include/asm/thread_info.h
> >>> @@ -144,10 +144,10 @@ static inline bool test_thread_local_flags(unsigned 
> >>> int flags)
> >>>   return (ti->local_flags & flags) != 0;
> >>>}
> >>>
> >>> -#ifdef CONFIG_PPC64
> >>> +#ifdef CONFIG_COMPAT
> >>>#define is_32bit_task()(test_thread_flag(TIF_32BIT))
> >>>#else
> >>> -#define is_32bit_task()  (1)
> >>> +#define is_32bit_task()  (IS_ENABLED(CONFIG_PPC32))
> >>>#endif
> >>>
> >>>#if defined(CONFIG_PPC64)  
> >>
> >> [...]
> >>  
> >>> +static inline int current_is_64bit(void)
> >>> +{
> >>> + if (!IS_ENABLED(CONFIG_COMPAT))
> >>> + return IS_ENABLED(CONFIG_PPC64);
> >>> + /*
> >>> +  * We can't use test_thread_flag() here because we may be on an
> >>> +  * interrupt stack, and the thread flags don't get copied over
> >>> +  * from the thread_info on the main stack to the interrupt stack.
> >>> +  */
> >>> + return !test_ti_thread_flag(task_thread_info(current), TIF_32BIT);
> >>> +}  
> >>
> >>
> >> Since at least commit ed1cd6deb013 ("powerpc: Activate
> >> CONFIG_THREAD_INFO_IN_TASK")
> >> [https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ed1cd6d]
> >> the above comment is wrong and current_is_64bit() is equivalent to
> >> !is_32bit_task()
> >>
> >> See https://github.com/linuxppc/issues/issues/275
> >>
> >> Christophe  
> > 
> > I aim at changing the code as little as possible here. A separate patch
> > on top removing this function would be ok?  
> 
> Yes I agree. Making a separate patch prior to this one, which drops
> current_is_64bit() would be good. And it would reduce the size of this 
> patch by approximately one third.

Indeed, removing it before makes this patch much cleaner.

Thanks

Michal


Re: [PATCH v8 5/7] powerpc/64: make buildable without CONFIG_COMPAT

2019-09-12 Thread Michal Suchánek
On Thu, 12 Sep 2019 20:02:16 +0200
Christophe Leroy  wrote:

> Le 12/09/2019 à 19:26, Michal Suchanek a écrit :
> > There are numerous references to 32bit functions in generic and 64bit
> > code so ifdef them out.
> > 
> > Signed-off-by: Michal Suchanek 
> > ---
> > v2:
> > - fix 32bit ifdef condition in signal.c
> > - simplify the compat ifdef condition in vdso.c - 64bit is redundant
> > - simplify the compat ifdef condition in callchain.c - 64bit is redundant
> > v3:
> > - use IS_ENABLED and maybe_unused where possible
> > - do not ifdef declarations
> > - clean up Makefile
> > v4:
> > - further makefile cleanup
> > - simplify is_32bit_task conditions
> > - avoid ifdef in condition by using return
> > v5:
> > - avoid unreachable code on 32bit
> > - make is_current_64bit constant on !COMPAT
> > - add stub perf_callchain_user_32 to avoid some ifdefs
> > v6:
> > - consolidate current_is_64bit
> > v7:
> > - remove leftover perf_callchain_user_32 stub from previous series version
> > v8:
> > - fix build again - too trigger-happy with stub removal
> > - remove a vdso.c hunk that causes warning according to kbuild test robot
> > ---
> >   arch/powerpc/include/asm/thread_info.h |  4 +--
> >   arch/powerpc/kernel/Makefile   |  7 ++---
> >   arch/powerpc/kernel/entry_64.S |  2 ++
> >   arch/powerpc/kernel/signal.c   |  3 +-
> >   arch/powerpc/kernel/syscall_64.c   |  6 ++--
> >   arch/powerpc/kernel/vdso.c |  3 +-
> >   arch/powerpc/perf/callchain.c  | 39 ++
> >   7 files changed, 33 insertions(+), 31 deletions(-)
> > 
> > diff --git a/arch/powerpc/include/asm/thread_info.h 
> > b/arch/powerpc/include/asm/thread_info.h
> > index 8e1d0195ac36..c128d8a48ea3 100644
> > --- a/arch/powerpc/include/asm/thread_info.h
> > +++ b/arch/powerpc/include/asm/thread_info.h
> > @@ -144,10 +144,10 @@ static inline bool test_thread_local_flags(unsigned 
> > int flags)
> > return (ti->local_flags & flags) != 0;
> >   }
> >   
> > -#ifdef CONFIG_PPC64
> > +#ifdef CONFIG_COMPAT
> >   #define is_32bit_task()   (test_thread_flag(TIF_32BIT))
> >   #else
> > -#define is_32bit_task()(1)
> > +#define is_32bit_task()(IS_ENABLED(CONFIG_PPC32))
> >   #endif
> >   
> >   #if defined(CONFIG_PPC64)  
> 
> [...]
> 
> > +static inline int current_is_64bit(void)
> > +{
> > +   if (!IS_ENABLED(CONFIG_COMPAT))
> > +   return IS_ENABLED(CONFIG_PPC64);
> > +   /*
> > +* We can't use test_thread_flag() here because we may be on an
> > +* interrupt stack, and the thread flags don't get copied over
> > +* from the thread_info on the main stack to the interrupt stack.
> > +*/
> > +   return !test_ti_thread_flag(task_thread_info(current), TIF_32BIT);
> > +}  
> 
> 
> Since at least commit ed1cd6deb013 ("powerpc: Activate 
> CONFIG_THREAD_INFO_IN_TASK") 
> [https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ed1cd6d]
>  
> the above comment is wrong and current_is_64bit() is equivalent to 
> !is_32bit_task()
> 
> See https://github.com/linuxppc/issues/issues/275
> 
> Christophe

I aim at changing the code as little as possible here. A separate patch
on top removing this function would be ok?

Thanks

Michal


Re: [PATCH v3] powerpc/64: system call implement the bulk of the logic in C

2019-09-05 Thread Michal Suchánek
On Thu, 5 Sep 2019 15:25:31 +
Christophe Leroy  wrote:

> On 09/05/2019 12:35 PM, Nicholas Piggin wrote:
> > System call entry and particularly exit code is beyond the limit of what
> > is reasonable to implement in asm.
> > 
> > This conversion moves all conditional branches out of the asm code,
> > except for the case that all GPRs should be restored at exit.
> > 
> > Null syscall test is about 5% faster after this patch, because the exit
> > work is handled under local_irq_disable, and the hard mask and pending
> > interrupt replay is handled after that, which avoids games with MSR.
> > 
> > Signed-off-by: Nicholas Piggin   
> 
> Cannot apply it on latest powerpc/merge. On what does that apply ?

The v2 series had 4 patches so presumably the previous 3 are missing
here.

Thanks

Michal


Re: [PATCH v2] powerpc/fadump: when fadump is supported register the fadump sysfs files.

2019-09-04 Thread Michal Suchánek
On Thu, 29 Aug 2019 10:58:16 +0530
Hari Bathini  wrote:

> On 28/08/19 10:57 PM, Michal Suchanek wrote:
> > Currently it is not possible to distinguish the case when fadump is
> > supported by firmware and disabled in kernel and completely unsupported
> > using the kernel sysfs interface. User can investigate the devicetree
> > but it is more reasonable to provide sysfs files in case we get some
> > fadumpv2 in the future.
> > 
> > With this patch sysfs files are available whenever fadump is supported
> > by firmware.
> > 
> > Signed-off-by: Michal Suchanek 
> > ---  
> 
> [...]
> 
> > -   if (!fw_dump.fadump_supported) {
> > +   if (!fw_dump.fadump_supported && fw_dump.fadump_enabled) {
> > printk(KERN_ERR "Firmware-assisted dump is not supported on"
> > " this hardware\n");
> > -   return 0;
> > }  
> 
> The above hunk is redundant with similar message already logged during
> early boot in fadump_reserve_mem() function. I am not strongly against
> this though. So...

I see this:
[0.00] debug: ignoring loglevel setting.
[0.00] Firmware-assisted dump is not supported on this hardware
[0.00] Reserving 256MB of memory at 128MB for crashkernel (System RAM: 
8192MB)
[0.00] Allocated 5832704 bytes for 2048 pacas at c7a8
[0.00] Page sizes from device-tree:
[0.00] base_shift=12: shift=12, sllp=0x, avpnm=0x, 
tlbiel=1, penc=0
[0.00] base_shift=16: shift=16, sllp=0x0110, avpnm=0x, 
tlbiel=1, penc=1
[0.00] Page orders: linear mapping = 16, virtual = 16, io = 16, vmemmap 
= 16
[0.00] Using 1TB segments
[0.00] Initializing hash mmu with SLB

Clearly the second message is logged from the above code. The duplicate
is capitalized: "Firmware-Assisted Dump is not supported on this
hardware" and I don't see it logged. So if anything is duplicate that
should be removed it is the message in fadump_reserve_mem(). It is not
clear why that one is not seen, though.

Thanks

Michal


Re: [PATCH v7 5/6] powerpc/64: Make COMPAT user-selectable disabled on littleendian by default.

2019-09-02 Thread Michal Suchánek
On Mon, 02 Sep 2019 12:03:12 +1000
Michael Ellerman  wrote:

> Michal Suchanek  writes:
> > On bigendian ppc64 it is common to have 32bit legacy binaries but much
> > less so on littleendian.  
> 
> I think the toolchain people will tell you that there is no 32-bit
> little endian ABI defined at all, if anything works it's by accident.

I have seen a piece of software that works around code issues on 64bit
by always compiling 32bit code. So it does work in some way. Also it
has been pointed out that you can still switch to BE even with the
'fast-switch' removed.

> 
> So I think we should not make this selectable, unless someone puts their
> hand up to say they want it and are willing to test it and keep it
> working.

I don't really care either way.

Thanks

Michal

> 
> cheers
> 
> > v3: make configurable
> > ---
> >  arch/powerpc/Kconfig | 5 +++--
> >  1 file changed, 3 insertions(+), 2 deletions(-)
> >
> > diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
> > index 5bab0bb6b833..b0339e892329 100644
> > --- a/arch/powerpc/Kconfig
> > +++ b/arch/powerpc/Kconfig
> > @@ -264,8 +264,9 @@ config PANIC_TIMEOUT
> > default 180
> >  
> >  config COMPAT
> > -   bool
> > -   default y if PPC64
> > +   bool "Enable support for 32bit binaries"
> > +   depends on PPC64
> > +   default y if !CPU_LITTLE_ENDIAN
> > select COMPAT_BINFMT_ELF
> > select ARCH_WANT_OLD_COMPAT_IPC
> > select COMPAT_OLD_SIGACTION
> > -- 
> > 2.22.0  



Re: [PATCH v7 3/6] powerpc/perf: consolidate read_user_stack_32

2019-09-02 Thread Michal Suchánek
On Mon, 02 Sep 2019 14:01:17 +1000
Michael Ellerman  wrote:

> Michael Ellerman  writes:
> > Michal Suchanek  writes:  
> ...
> >> @@ -295,6 +279,12 @@ static inline int current_is_64bit(void)
> >>  }
> >>  
> >>  #else  /* CONFIG_PPC64 */
> >> +static int read_user_stack_slow(void __user *ptr, void *buf, int nb)
> >> +{
> >> +  return 0;
> >> +}
> >> +#endif /* CONFIG_PPC64 */  
> >
> > Ending the PPC64 else case here, and then restarting it below with an
> > ifndef means we end up with two parts of the file that define 32-bit
> > code, with a common chunk in the middle, which I dislike.
> >
> > I'd rather you add the empty read_user_stack_slow() in the existing
> > #else section and then move read_user_stack_32() below the whole ifdef
> > PPC64/else/endif section.
> >
> > Is there some reason that doesn't work?  
> 
> Gah, I missed that you split the whole file later in the series. Any
> reason you did it in two steps rather than moving patch 6 earlier in the
> series?

To make this patch readable.

Thanks

Michal


Re: [PATCH v7 6/6] powerpc/perf: split callchain.c by bitness

2019-08-31 Thread Michal Suchánek
On Fri, 30 Aug 2019 23:03:43 +0200
Michal Suchanek  wrote:

> Building callchain.c with !COMPAT proved quite ugly with all the
> defines. Splitting out the 32bit and 64bit parts looks better.
> 
> No code change intended.

valid_user_sp is broken in compat. It needs to be ifdefed for the 32bit
case; a sketch follows.
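
Something like keeping both variants under an explicit ifdef would do - an
untested sketch based on the two versions already in the file:

	#ifdef CONFIG_PPC64
	static inline int valid_user_sp(unsigned long sp, int is_64)
	{
		if (!sp || (sp & 7) ||
		    sp > (is_64 ? TASK_SIZE : 0x100000000UL) - 32)
			return 0;
		return 1;
	}
	#else
	static inline int valid_user_sp(unsigned long sp, int is_64)
	{
		if (!sp || (sp & 7) || sp > TASK_SIZE - 32)
			return 0;
		return 1;
	}
	#endif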
> 
> Signed-off-by: Michal Suchanek 
> Reviewed-by: Christophe Leroy 
> ---
> v6:
>  - move current_is_64bit consolidation to earlier patch
>  - move defines to the top of callchain_32.c
>  - Makefile cleanup
> ---
>  arch/powerpc/perf/Makefile   |   5 +-
>  arch/powerpc/perf/callchain.c| 365 +--
>  arch/powerpc/perf/callchain.h|  11 +
>  arch/powerpc/perf/callchain_32.c | 204 +
>  arch/powerpc/perf/callchain_64.c | 185 
>  5 files changed, 405 insertions(+), 365 deletions(-)
>  create mode 100644 arch/powerpc/perf/callchain.h
>  create mode 100644 arch/powerpc/perf/callchain_32.c
>  create mode 100644 arch/powerpc/perf/callchain_64.c
> 
> diff --git a/arch/powerpc/perf/Makefile b/arch/powerpc/perf/Makefile
> index c155dcbb8691..53d614e98537 100644
> --- a/arch/powerpc/perf/Makefile
> +++ b/arch/powerpc/perf/Makefile
> @@ -1,6 +1,9 @@
>  # SPDX-License-Identifier: GPL-2.0
>  
> -obj-$(CONFIG_PERF_EVENTS)+= callchain.o perf_regs.o
> +obj-$(CONFIG_PERF_EVENTS)+= callchain.o callchain_$(BITS).o perf_regs.o
> +ifdef CONFIG_COMPAT
> +obj-$(CONFIG_PERF_EVENTS)+= callchain_32.o
> +endif
>  
>  obj-$(CONFIG_PPC_PERF_CTRS)  += core-book3s.o bhrb.o
>  obj64-$(CONFIG_PPC_PERF_CTRS)+= ppc970-pmu.o power5-pmu.o \
> diff --git a/arch/powerpc/perf/callchain.c b/arch/powerpc/perf/callchain.c
> index 9b8dc822f531..8f30f1b47c78 100644
> --- a/arch/powerpc/perf/callchain.c
> +++ b/arch/powerpc/perf/callchain.c
> @@ -15,11 +15,9 @@
>  #include 
>  #include 
>  #include 
> -#ifdef CONFIG_COMPAT
> -#include "../kernel/ppc32.h"
> -#endif
>  #include 
>  
> +#include "callchain.h"
>  
>  /*
>   * Is sp valid as the address of the next kernel stack frame after prev_sp?
> @@ -102,367 +100,6 @@ perf_callchain_kernel(struct perf_callchain_entry_ctx 
> *entry, struct pt_regs *re
>   }
>  }
>  
> -#ifdef CONFIG_PPC64
> -/*
> - * On 64-bit we don't want to invoke hash_page on user addresses from
> - * interrupt context, so if the access faults, we read the page tables
> - * to find which page (if any) is mapped and access it directly.
> - */
> -static int read_user_stack_slow(void __user *ptr, void *buf, int nb)
> -{
> - int ret = -EFAULT;
> - pgd_t *pgdir;
> - pte_t *ptep, pte;
> - unsigned shift;
> - unsigned long addr = (unsigned long) ptr;
> - unsigned long offset;
> - unsigned long pfn, flags;
> - void *kaddr;
> -
> - pgdir = current->mm->pgd;
> - if (!pgdir)
> - return -EFAULT;
> -
> - local_irq_save(flags);
> - ptep = find_current_mm_pte(pgdir, addr, NULL, &shift);
> - if (!ptep)
> - goto err_out;
> - if (!shift)
> - shift = PAGE_SHIFT;
> -
> - /* align address to page boundary */
> - offset = addr & ((1UL << shift) - 1);
> -
> - pte = READ_ONCE(*ptep);
> - if (!pte_present(pte) || !pte_user(pte))
> - goto err_out;
> - pfn = pte_pfn(pte);
> - if (!page_is_ram(pfn))
> - goto err_out;
> -
> - /* no highmem to worry about here */
> - kaddr = pfn_to_kaddr(pfn);
> - memcpy(buf, kaddr + offset, nb);
> - ret = 0;
> -err_out:
> - local_irq_restore(flags);
> - return ret;
> -}
> -
> -static int read_user_stack_64(unsigned long __user *ptr, unsigned long *ret)
> -{
> - if ((unsigned long)ptr > TASK_SIZE - sizeof(unsigned long) ||
> - ((unsigned long)ptr & 7))
> - return -EFAULT;
> -
> - pagefault_disable();
> - if (!__get_user_inatomic(*ret, ptr)) {
> - pagefault_enable();
> - return 0;
> - }
> - pagefault_enable();
> -
> - return read_user_stack_slow(ptr, ret, 8);
> -}
> -
> -static inline int valid_user_sp(unsigned long sp, int is_64)
> -{
> - if (!sp || (sp & 7) || sp > (is_64 ? TASK_SIZE : 0x100000000UL) - 32)
> - return 0;
> - return 1;
> -}
> -
> -/*
> - * 64-bit user processes use the same stack frame for RT and non-RT signals.
> - */
> -struct signal_frame_64 {
> - char            dummy[__SIGNAL_FRAMESIZE];
> - struct ucontext uc;
> - unsigned long   unused[2];
> - unsigned int    tramp[6];
> - struct siginfo  *pinfo;
> - void            *puc;
> - struct siginfo  info;
> - char            abigap[288];
> -};
> -
> -static int is_sigreturn_64_address(unsigned long nip, unsigned long fp)
> -{
> - if (nip == fp + offsetof(struct signal_frame_64, tramp))
> - return 1;
> - if (vdso64_rt_sigtramp && current->mm->context.vdso_base &&
> - nip == current->mm->context.vdso_base + vdso64_rt_sigtramp)
> - return 1;
> - return 0;
> -}
> -

Re: [PATCH v7 0/6] Disable compat cruft on ppc64le v7

2019-08-31 Thread Michal Suchánek
On Sat, 31 Aug 2019 08:41:58 +0200
Christophe Leroy  wrote:

> Le 30/08/2019 à 23:03, Michal Suchanek a écrit :
> > Less code means less bugs so add a knob to skip the compat stuff.  
> 
> I guess on PPC64 you have Gigabytes of memory and thousands of bogomips, 
> hence you focus on bugs.
> 
> My main focus usually is kernel size and performance, which makes this 
> series interesting as well.
> 
> Anyway, I was wondering, would it make sense (in a following series, not 
> in this one) to make it buildable as a module, just like some of binfmt ?

I think not.

There is the case where 32bit support is not optional because the
kernel itself is 32bit, and the case where it is optional because the
kernel is 64bit. These cases either diverge, or the 32bit case suffers
the penalty of some indirection that makes the functionality loadable.

The indirection requires synchronization so that the 32bit code cannot
be unloaded while it is being called, and possibly counting 32bit tasks
so that it is known when the 32bit code can be unloaded again.

This would add more code which benefits neither performance nor size
nor reliability.

Also you can presumably run 32bit code through binfmt-misc already, but
then things like perf counters are not handled in the kernel because it
is no longer native code.
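
To illustrate the cost, every 32bit entry point would need something
along these lines (a sketch; compat_mod and compat_ops are
hypothetical names, not existing kernel symbols):

  /* pin the compat module so it cannot be unloaded mid-call */
  if (!try_module_get(compat_mod))
  	return -ENOSYS;		/* compat code not loaded */
  ret = compat_ops->handle_rt_signal32(&ksig, oldset, tsk);
  module_put(compat_mod);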

Thanks

Michal

> 
> Christophe
> 
> > 
> > This is tested on ppc64le top of
> > 
> > https://patchwork.ozlabs.org/cover/1153556/
> > 
> > Changes in v2: saner CONFIG_COMPAT ifdefs
> > Changes in v3:
> >   - change llseek to 32bit instead of builing it unconditionally in fs
> >   - clanup the makefile conditionals
> >   - remove some ifdefs or convert to IS_DEFINED where possible
> > Changes in v4:
> >   - cleanup is_32bit_task and current_is_64bit
> >   - more makefile cleanup
> > Changes in v5:
> >   - more current_is_64bit cleanup
> >   - split off callchain.c 32bit and 64bit parts
> > Changes in v6:
> >   - cleanup makefile after split
> >   - consolidate read_user_stack_32
> >   - fix some checkpatch warnings
> > Changes in v7:
> >   - add back __ARCH_WANT_SYS_LLSEEK to fix build with llseek
> >   - remove leftover hunk
> >   - add review tags
> > 
> > Michal Suchanek (6):
> >powerpc: Add back __ARCH_WANT_SYS_LLSEEK macro
> >powerpc: move common register copy functions from signal_32.c to
> >  signal.c
> >powerpc/perf: consolidate read_user_stack_32
> >powerpc/64: make buildable without CONFIG_COMPAT
> >powerpc/64: Make COMPAT user-selectable disabled on littleendian by
> >  default.
> >powerpc/perf: split callchain.c by bitness
> > 
> >   arch/powerpc/Kconfig   |   5 +-
> >   arch/powerpc/include/asm/thread_info.h |   4 +-
> >   arch/powerpc/include/asm/unistd.h  |   1 +
> >   arch/powerpc/kernel/Makefile   |   7 +-
> >   arch/powerpc/kernel/entry_64.S |   2 +
> >   arch/powerpc/kernel/signal.c   | 144 +-
> >   arch/powerpc/kernel/signal_32.c| 140 -
> >   arch/powerpc/kernel/syscall_64.c   |   6 +-
> >   arch/powerpc/kernel/vdso.c |   5 +-
> >   arch/powerpc/perf/Makefile |   5 +-
> >   arch/powerpc/perf/callchain.c  | 377 +
> >   arch/powerpc/perf/callchain.h  |  11 +
> >   arch/powerpc/perf/callchain_32.c   | 204 +
> >   arch/powerpc/perf/callchain_64.c   | 185 
> >   fs/read_write.c|   3 +-
> >   15 files changed, 566 insertions(+), 533 deletions(-)
> >   create mode 100644 arch/powerpc/perf/callchain.h
> >   create mode 100644 arch/powerpc/perf/callchain_32.c
> >   create mode 100644 arch/powerpc/perf/callchain_64.c
> >   



Re: [PATCH v7 4/6] powerpc/64: make buildable without CONFIG_COMPAT

2019-08-31 Thread Michal Suchánek
On Fri, 30 Aug 2019 23:03:41 +0200
Michal Suchanek  wrote:

> There are numerous references to 32bit functions in generic and 64bit
> code so ifdef them out.
> 
> Signed-off-by: Michal Suchanek 
> ---
> v2:
> - fix 32bit ifdef condition in signal.c
> - simplify the compat ifdef condition in vdso.c - 64bit is redundant
> - simplify the compat ifdef condition in callchain.c - 64bit is redundant
> v3:
> - use IS_ENABLED and maybe_unused where possible
> - do not ifdef declarations
> - clean up Makefile
> v4:
> - further makefile cleanup
> - simplify is_32bit_task conditions
> - avoid ifdef in condition by using return
> v5:
> - avoid unreachable code on 32bit
> - make is_current_64bit constant on !COMPAT
> - add stub perf_callchain_user_32 to avoid some ifdefs
> v6:
> - consolidate current_is_64bit
> v7:
> - remove leftover perf_callchain_user_32 stub from previous series version

removed too much stuff here

> ---
>  arch/powerpc/include/asm/thread_info.h |  4 ++--
>  arch/powerpc/kernel/Makefile   |  7 +++---
>  arch/powerpc/kernel/entry_64.S |  2 ++
>  arch/powerpc/kernel/signal.c   |  3 +--
>  arch/powerpc/kernel/syscall_64.c   |  6 ++---
>  arch/powerpc/kernel/vdso.c |  5 ++---
>  arch/powerpc/perf/callchain.c  | 31 +-
>  7 files changed, 27 insertions(+), 31 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/thread_info.h 
> b/arch/powerpc/include/asm/thread_info.h
> index 8e1d0195ac36..c128d8a48ea3 100644
> --- a/arch/powerpc/include/asm/thread_info.h
> +++ b/arch/powerpc/include/asm/thread_info.h
> @@ -144,10 +144,10 @@ static inline bool test_thread_local_flags(unsigned int 
> flags)
>   return (ti->local_flags & flags) != 0;
>  }
>  
> -#ifdef CONFIG_PPC64
> +#ifdef CONFIG_COMPAT
>  #define is_32bit_task()  (test_thread_flag(TIF_32BIT))
>  #else
> -#define is_32bit_task()  (1)
> +#define is_32bit_task()  (IS_ENABLED(CONFIG_PPC32))
>  #endif
>  
>  #if defined(CONFIG_PPC64)
> diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
> index 1d646a94d96c..9d8772e863b9 100644
> --- a/arch/powerpc/kernel/Makefile
> +++ b/arch/powerpc/kernel/Makefile
> @@ -44,16 +44,15 @@ CFLAGS_btext.o += -DDISABLE_BRANCH_PROFILING
>  endif
>  
>  obj-y:= cputable.o ptrace.o syscalls.o \
> -irq.o align.o signal_32.o pmc.o vdso.o \
> +irq.o align.o signal_$(BITS).o pmc.o vdso.o \
>  process.o systbl.o idle.o \
>  signal.o sysfs.o cacheinfo.o time.o \
>  prom.o traps.o setup-common.o \
>  udbg.o misc.o io.o misc_$(BITS).o \
>  of_platform.o prom_parse.o
> -obj-$(CONFIG_PPC64)  += setup_64.o sys_ppc32.o \
> -signal_64.o ptrace32.o \
> -paca.o nvram_64.o firmware.o \
> +obj-$(CONFIG_PPC64)  += setup_64.o paca.o nvram_64.o firmware.o \
>  syscall_64.o
> +obj-$(CONFIG_COMPAT) += sys_ppc32.o ptrace32.o signal_32.o
>  obj-$(CONFIG_VDSO32) += vdso32/
>  obj-$(CONFIG_PPC_WATCHDOG)   += watchdog.o
>  obj-$(CONFIG_HAVE_HW_BREAKPOINT) += hw_breakpoint.o
> diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
> index 2ec825a85f5b..a2dbf216f607 100644
> --- a/arch/powerpc/kernel/entry_64.S
> +++ b/arch/powerpc/kernel/entry_64.S
> @@ -51,8 +51,10 @@
>  SYS_CALL_TABLE:
>   .tc sys_call_table[TC],sys_call_table
>  
> +#ifdef CONFIG_COMPAT
>  COMPAT_SYS_CALL_TABLE:
>   .tc compat_sys_call_table[TC],compat_sys_call_table
> +#endif
>  
>  /* This value is used to mark exception frames on the stack. */
>  exception_marker:
> diff --git a/arch/powerpc/kernel/signal.c b/arch/powerpc/kernel/signal.c
> index 60436432399f..61678cb0e6a1 100644
> --- a/arch/powerpc/kernel/signal.c
> +++ b/arch/powerpc/kernel/signal.c
> @@ -247,7 +247,6 @@ static void do_signal(struct task_struct *tsk)
>   sigset_t *oldset = sigmask_to_save();
>   struct ksignal ksig = { .sig = 0 };
>   int ret;
> - int is32 = is_32bit_task();
>  
>   BUG_ON(tsk != current);
>  
> @@ -277,7 +276,7 @@ static void do_signal(struct task_struct *tsk)
>  
>   rseq_signal_deliver(&ksig, tsk->thread.regs);
>  
> - if (is32) {
> + if (is_32bit_task()) {
>   if (ksig.ka.sa.sa_flags & SA_SIGINFO)
>   ret = handle_rt_signal32(&ksig, oldset, tsk);
>   else
> diff --git a/arch/powerpc/kernel/syscall_64.c 
> b/arch/powerpc/kernel/syscall_64.c
> index 98ed970796d5..0d5cbbe54cf1 100644
> --- a/arch/powerpc/kernel/syscall_64.c
> +++ b/arch/powerpc/kernel/syscall_64.c
> @@ -38,7 +38,6 @@ typedef long (*syscall_fn)(long, long, long, long, long, 
> long);
>  
>  long system_call_exception(long r3, long r4, long r5, long 

Re: [PATCH v6 4/6] powerpc/64: make buildable without CONFIG_COMPAT

2019-08-30 Thread Michal Suchánek
On Fri, 30 Aug 2019 22:21:09 +0200
Christophe Leroy  wrote:

> Le 30/08/2019 à 20:57, Michal Suchanek a écrit :
> > There are numerous references to 32bit functions in generic and 64bit
> > code so ifdef them out.
> > 
> > Signed-off-by: Michal Suchanek 
> > ---
> > v2:
> > - fix 32bit ifdef condition in signal.c
> > - simplify the compat ifdef condition in vdso.c - 64bit is redundant
> > - simplify the compat ifdef condition in callchain.c - 64bit is redundant
> > v3:
> > - use IS_ENABLED and maybe_unused where possible
> > - do not ifdef declarations
> > - clean up Makefile
> > v4:
> > - further makefile cleanup
> > - simplify is_32bit_task conditions
> > - avoid ifdef in condition by using return
> > v5:
> > - avoid unreachable code on 32bit
> > - make is_current_64bit constant on !COMPAT
> > - add stub perf_callchain_user_32 to avoid some ifdefs
> > v6:
> > - consolidate current_is_64bit
> > ---
> >   arch/powerpc/include/asm/thread_info.h |  4 +--
> >   arch/powerpc/kernel/Makefile   |  7 +++--
> >   arch/powerpc/kernel/entry_64.S |  2 ++
> >   arch/powerpc/kernel/signal.c   |  3 +--
> >   arch/powerpc/kernel/syscall_64.c   |  6 ++---
> >   arch/powerpc/kernel/vdso.c |  5 ++--
> >   arch/powerpc/perf/callchain.c  | 37 +++---
> >   7 files changed, 33 insertions(+), 31 deletions(-)
> >   
> 
> [...]
> 
> > diff --git a/arch/powerpc/perf/callchain.c b/arch/powerpc/perf/callchain.c
> > index b7cdcce20280..788ad2c63f18 100644
> > --- a/arch/powerpc/perf/callchain.c
> > +++ b/arch/powerpc/perf/callchain.c
> > @@ -15,7 +15,7 @@
> >   #include 
> >   #include 
> >   #include 
> > -#ifdef CONFIG_PPC64
> > +#ifdef CONFIG_COMPAT
> >   #include "../kernel/ppc32.h"
> >   #endif
> >   #include 
> > @@ -268,16 +268,6 @@ static void perf_callchain_user_64(struct 
> > perf_callchain_entry_ctx *entry,
> > }
> >   }
> >   
> > -static inline int current_is_64bit(void)
> > -{
> > -   /*
> > -* We can't use test_thread_flag() here because we may be on an
> > -* interrupt stack, and the thread flags don't get copied over
> > -* from the thread_info on the main stack to the interrupt stack.
> > -*/
> > -   return !test_ti_thread_flag(task_thread_info(current), TIF_32BIT);
> > -}
> > -
> >   #else  /* CONFIG_PPC64 */
> >   static int read_user_stack_slow(void __user *ptr, void *buf, int nb)
> >   {
> > @@ -314,11 +304,6 @@ static inline void perf_callchain_user_64(struct 
> > perf_callchain_entry_ctx *entry
> >   {
> >   }
> >   
> > -static inline int current_is_64bit(void)
> > -{
> > -   return 0;
> > -}
> > -
> >   static inline int valid_user_sp(unsigned long sp, int is_64)
> >   {
> > if (!sp || (sp & 7) || sp > TASK_SIZE - 32)
> > @@ -334,6 +319,7 @@ static inline int valid_user_sp(unsigned long sp, int 
> > is_64)
> >   
> >   #endif /* CONFIG_PPC64 */
> >   
> > +#if defined(CONFIG_PPC32) || defined(CONFIG_COMPAT)
> >   /*
> >* Layout for non-RT signal frames
> >*/
> > @@ -475,6 +461,25 @@ static void perf_callchain_user_32(struct 
> > perf_callchain_entry_ctx *entry,
> > sp = next_sp;
> > }
> >   }
> > +#else /* 32bit */
> > +static void perf_callchain_user_32(struct perf_callchain_entry_ctx *entry,
> > +  struct pt_regs *regs)
> > +{
> > +   (void)&read_user_stack_32; /* unused if !COMPAT */
> 
> You don't need that anymore do you ?

Yes, this part is not needed. It was removed later in the series
anyway, but in this intermediate state the patch is broken.
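
FWIW the cast can be avoided entirely by annotating the function
instead, which is what the 'maybe_unused' item in the changelog refers
to (sketch):

  static int __maybe_unused read_user_stack_32(unsigned int __user *ptr,
  					     unsigned int *ret)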

Thanks

Michal

> 
> Christophe
> 
> > +}
> > +#endif /* 32bit */
> > +
> > +static inline int current_is_64bit(void)
> > +{
> > +   if (!IS_ENABLED(CONFIG_COMPAT))
> > +   return IS_ENABLED(CONFIG_PPC64);
> > +   /*
> > +* We can't use test_thread_flag() here because we may be on an
> > +* interrupt stack, and the thread flags don't get copied over
> > +* from the thread_info on the main stack to the interrupt stack.
> > +*/
> > +   return !test_ti_thread_flag(task_thread_info(current), TIF_32BIT);
> > +}
> >   
> >   void
> >   perf_callchain_user(struct perf_callchain_entry_ctx *entry, struct 
> > pt_regs *regs)
> >   



Re: [PATCH] Revert "asm-generic: Remove unneeded __ARCH_WANT_SYS_LLSEEK macro"

2019-08-30 Thread Michal Suchánek
On Fri, 30 Aug 2019 21:54:43 +0200
Arnd Bergmann  wrote:

> On Fri, Aug 30, 2019 at 9:46 PM Michal Suchanek  wrote:
> >
> > This reverts commit caf6f9c8a326cffd1d4b3ff3f1cfba75d159d70b.
> >
> > Maybe it was needed after all.
> >
> > When CONFIG_COMPAT is disabled on ppc64 the kernel does not build.
> >
> > There is resistance to both removing the llseek syscall from the 64bit
> > syscall tables and building the llseek interface unconditionally.
> >
> > Link: https://lore.kernel.org/lkml/20190828151552.ga16...@infradead.org/
> > Link: https://lore.kernel.org/lkml/20190829214319.498c7de2@naga/
> >
> > Signed-off-by: Michal Suchanek   
> 
> This seems like the right idea in principle.
> 
> > index 5bbf587f5bc1..2f3c4bb138c4 100644
> > --- a/fs/read_write.c
> > +++ b/fs/read_write.c
> > @@ -331,7 +331,7 @@ COMPAT_SYSCALL_DEFINE3(lseek, unsigned int, fd, 
> > compat_off_t, offset, unsigned i
> >  }
> >  #endif
> >
> > -#if !defined(CONFIG_64BIT) || defined(CONFIG_COMPAT)
> > +#ifdef __ARCH_WANT_SYS_LLSEEK
> >  SYSCALL_DEFINE5(llseek, unsigned int, fd, unsigned long, offset_high,
> > unsigned long, offset_low, loff_t __user *, result,
> > unsigned int, whence)  
> 
> However, only reverting the patch will now break all newly added
> 32-bit architectures that don't define __ARCH_WANT_SYS_LLSEEK:
> at least nds32 and riscv32 come to mind, not sure if there is another.

AFAICT nds32 never had the syscall: its headers were added without
__ARCH_WANT_SYS_LLSEEK even before the define was removed.

The new architecture csky should be handled.

> 
> I think the easiest way however would be to combine the two checks
> above and make it
> 
> #if !defined(CONFIG_64BIT) || defined(CONFIG_COMPAT) ||
> defined(__ARCH_WANT_SYS_LLSEEK)
> 
> and then only set __ARCH_WANT_SYS_LLSEEK for powerpc.

Yes, that limits the use of __ARCH_WANT_SYS_LLSEEK, does not require
resurrecting the old headers, and may fix some architectures like nds32
that forgot to add it.
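
So the guard in fs/read_write.c would become (a sketch of the combined
condition, with powerpc then defining __ARCH_WANT_SYS_LLSEEK in its
asm/unistd.h):

  #if !defined(CONFIG_64BIT) || defined(CONFIG_COMPAT) || \
      defined(__ARCH_WANT_SYS_LLSEEK)
  SYSCALL_DEFINE5(llseek, unsigned int, fd, unsigned long, offset_high,
  		unsigned long, offset_low, loff_t __user *, result,
  		unsigned int, whence)
  ...
  #endif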

Thanks

Michal


Re: [PATCH v2 4/4] powerpc/64: system call implement the bulk of the logic in C

2019-08-30 Thread Michal Suchánek
On Sat, 31 Aug 2019 02:48:26 +0800
kbuild test robot  wrote:

> Hi Nicholas,
> 
> I love your patch! Yet something to improve:
> 
> [auto build test ERROR on linus/master]
> [cannot apply to v5.3-rc6 next-20190830]
> [if your patch is applied to the wrong git tree, please drop us a note to 
> help improve the system]
> 
> url:
> https://github.com/0day-ci/linux/commits/Nicholas-Piggin/powerpc-64-syscalls-in-C/20190828-064221
> config: powerpc64-defconfig (attached as .config)
> compiler: powerpc64-linux-gcc (GCC) 7.4.0
> reproduce:
> wget 
> https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
> ~/bin/make.cross
> chmod +x ~/bin/make.cross
> # save the attached .config to linux build tree
> GCC_VERSION=7.4.0 make.cross ARCH=powerpc64 
> 
> If you fix the issue, kindly add following tag
> Reported-by: kbuild test robot 
> 
> All errors (new ones prefixed by >>):
> 
>powerpc64-linux-ld: warning: orphan section `.gnu.hash' from `linker 
> stubs' being placed in section `.gnu.hash'.
>arch/powerpc/kernel/syscall_64.o: In function `.system_call_exception':
> >> syscall_64.c:(.text+0x180): undefined reference to `.tabort_syscall'  

Interestingly it builds and boots for me. Is this something about
dotted vs dotless symbols depending on some config options?

I see there is _GLOBAL(ret_from_fork) just below so maybe we need
_GLOBAL(tabort_syscall) instead of .globl tabort_syscall as well.
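
The difference in entry_64.S would be roughly this (assuming the
current code declares the label with a plain .globl):

  /* current (assumed): only the dotless symbol exists */
  .globl tabort_syscall
  tabort_syscall:
  	/* ... */

  /* suggested: _GLOBAL() also emits the .tabort_syscall dot symbol /
   * function descriptor required by the big endian ELFv1 ABI, which
   * is exactly what the link error above cannot find */
  _GLOBAL(tabort_syscall)
  	/* ... */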

Thanks

Michal


Re: [PATCH v6 6/6] powerpc/perf: split callchain.c by bitness

2019-08-30 Thread Michal Suchánek
On Fri, 30 Aug 2019 20:57:57 +0200
Michal Suchanek  wrote:

> Building callchain.c with !COMPAT proved quite ugly with all the
> defines. Splitting out the 32bit and 64bit parts looks better.
> 

BTW the powerpc callchain.c does not match any of the file patterns of
the PERF CORE entry in MAINTAINERS (unlike the callchain
implementations on other platforms). Is that intentional?

Thanks

Michal

