Re: [PATCH v4 7/8] lockdep: Change hardirq{s_enabled,_context} to per-cpu variables

2020-06-30 Thread Ahmed S. Darwish
Peter Zijlstra wrote:

...

> -#define lockdep_assert_irqs_disabled()   do {\
> - WARN_ONCE(debug_locks && !current->lockdep_recursion && \
> -   current->hardirqs_enabled,\
> -   "IRQs not disabled as expected\n");   \
> - } while (0)

...

> +#define lockdep_assert_irqs_disabled()   \
> +do { \
> + WARN_ON_ONCE(debug_locks && this_cpu_read(hardirqs_enabled));   \
> +} while (0)

I think it would be nice to keep the "IRQs not disabled as expected"
message. It makes the lockdep splat much more readable.

This is similarly the case for the v3 lockdep preemption macros:

  https://lkml.kernel.org/r/20200630054452.3675847-5-a.darw...@linutronix.de

I did not add a message though to get in-sync with the IRQ macros above.
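
For illustration, keeping the message on top of the new per-cpu check could
look something like the sketch below (untested, not from the posted series):

  #define lockdep_assert_irqs_disabled()                                \
  do {                                                                  \
          WARN_ONCE(debug_locks && this_cpu_read(hardirqs_enabled),     \
                    "IRQs not disabled as expected\n");                 \
  } while (0)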

Thanks,

--
Ahmed S. Darwish
Linutronix GmbH


Re: [PATCH updated] libnvdimm/nvdimm/flush: Allow architecture to override the flush barrier

2020-06-30 Thread Aneesh Kumar K.V

On 6/30/20 12:36 PM, Dan Williams wrote:

On Mon, Jun 29, 2020 at 10:02 PM Aneesh Kumar K.V
 wrote:


Dan Williams  writes:


On Mon, Jun 29, 2020 at 1:29 PM Aneesh Kumar K.V
 wrote:


Architectures like ppc64 provide persistent memory specific barriers
that will ensure that all stores for which the modifications are
written to persistent storage by preceding dcbfps and dcbstps
instructions have updated persistent storage before any data
access or data transfer caused by subsequent instructions is initiated.
This is in addition to the ordering done by wmb()

Update nvdimm core such that architecture can use barriers other than
wmb to ensure all previous writes are architecturally visible for
the platform buffer flush.

Signed-off-by: Aneesh Kumar K.V 
---
  drivers/md/dm-writecache.c   | 2 +-
  drivers/nvdimm/region_devs.c | 8 
  include/linux/libnvdimm.h| 4 
  3 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/drivers/md/dm-writecache.c b/drivers/md/dm-writecache.c
index 74f3c506f084..8c6b6dce64e2 100644
--- a/drivers/md/dm-writecache.c
+++ b/drivers/md/dm-writecache.c
@@ -536,7 +536,7 @@ static void ssd_commit_superblock(struct dm_writecache *wc)
  static void writecache_commit_flushed(struct dm_writecache *wc, bool 
wait_for_ios)
  {
 if (WC_MODE_PMEM(wc))
-   wmb();
+   arch_pmem_flush_barrier();
 else
 ssd_commit_flushed(wc, wait_for_ios);
  }
diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
index 4502f9c4708d..b308ad09b63d 100644
--- a/drivers/nvdimm/region_devs.c
+++ b/drivers/nvdimm/region_devs.c
@@ -1206,13 +1206,13 @@ int generic_nvdimm_flush(struct nd_region *nd_region)
 idx = this_cpu_add_return(flush_idx, hash_32(current->pid + idx, 8));

 /*
-* The first wmb() is needed to 'sfence' all previous writes
-* such that they are architecturally visible for the platform
-* buffer flush.  Note that we've already arranged for pmem
+* The first arch_pmem_flush_barrier() is needed to 'sfence' all
+* previous writes such that they are architecturally visible for
+* the platform buffer flush. Note that we've already arranged for pmem
  * writes to avoid the cache via memcpy_flushcache().  The final
  * wmb() ensures ordering for the NVDIMM flush write.
  */
-   wmb();
+   arch_pmem_flush_barrier();
 for (i = 0; i < nd_region->ndr_mappings; i++)
 if (ndrd_get_flush_wpq(ndrd, i, 0))
 writeq(1, ndrd_get_flush_wpq(ndrd, i, idx));
diff --git a/include/linux/libnvdimm.h b/include/linux/libnvdimm.h
index 18da4059be09..66f6c65bd789 100644
--- a/include/linux/libnvdimm.h
+++ b/include/linux/libnvdimm.h
@@ -286,4 +286,8 @@ static inline void arch_invalidate_pmem(void *addr, size_t 
size)
  }
  #endif

+#ifndef arch_pmem_flush_barrier
+#define arch_pmem_flush_barrier() wmb()
+#endif


I think it is out of place to define this in libnvdimm.h and it is odd
to give it such a long name. The other pmem api helpers like
arch_wb_cache_pmem() and arch_invalidate_pmem() are function calls for
libnvdimm driver operations, this barrier is just an instruction and
is closer to wmb() than the pmem api routine.

Since it is a store fence for pmem, let's just call it pmem_wmb()
and define the generic version in include/linux/compiler.h. It should
probably also be documented alongside dma_wmb() in
Documentation/memory-barriers.txt about why code would use it over
wmb(), and why a symmetric pmem_rmb() is not needed.
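
A minimal sketch of that suggestion, with the generic fallback plus a
hypothetical ppc64 override (illustrative only, not the final patch):

  /* generic fallback, e.g. in include/linux/compiler.h */
  #ifndef pmem_wmb
  #define pmem_wmb()	wmb()
  #endif

  /* possible ppc64 override using the new persistent heavyweight sync */
  #define pmem_wmb()	__asm__ __volatile__(PPC_PHWSYNC ::: "memory")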


How about the below? I used pmem_barrier() instead of pmem_wmb().


Why? A barrier() is a bi-directional ordering mechanic for reads and
writes, and the proposed semantics mechanism only orders writes +
persistence. Otherwise the default fallback to wmb() on archs that
don't override it does not make sense.


I
guess we wanted this to order() any data access not just the following
stores to persistent storage?


Why?


W.r.t why a symmetric pmem_rmb() is not
needed I was not sure how to explain that. Are you suggesting to explain
why a read/load from persistent storage doesn't want to wait for
pmem_barrier() ?


I would expect that the explanation is that a typical rmb() is
sufficient and that there is no pmem-specific semantic for read
ordering for pmem vs normal read-barrier semantics.



modified   Documentation/memory-barriers.txt
@@ -1935,6 +1935,16 @@ There are some more advanced barrier functions:
   relaxed I/O accessors and the Documentation/DMA-API.txt file for more
   information on consistent memory.

+ (*) pmem_barrier();
+
+ These are for use with persistent memory to ensure the ordering of stores
+ to a persistent memory region.


If it was just ordering I would expect a typical wmb() to be
sufficient, why is the pmem-specific instruction needed? I thought it
was handshaking with hardware to ensure acceptance into a persistence
domain *in 

Re: [PATCH updated] libnvdimm/nvdimm/flush: Allow architecture to override the flush barrier

2020-06-30 Thread Dan Williams
On Mon, Jun 29, 2020 at 10:02 PM Aneesh Kumar K.V
 wrote:
>
> Dan Williams  writes:
>
> > On Mon, Jun 29, 2020 at 1:29 PM Aneesh Kumar K.V
> >  wrote:
> >>
> >> Architectures like ppc64 provide persistent memory specific barriers
> >> that will ensure that all stores for which the modifications are
> >> written to persistent storage by preceding dcbfps and dcbstps
> >> instructions have updated persistent storage before any data
> >> access or data transfer caused by subsequent instructions is initiated.
> >> This is in addition to the ordering done by wmb()
> >>
> >> Update nvdimm core such that architecture can use barriers other than
> >> wmb to ensure all previous writes are architecturally visible for
> >> the platform buffer flush.
> >>
> >> Signed-off-by: Aneesh Kumar K.V 
> >> ---
> >>  drivers/md/dm-writecache.c   | 2 +-
> >>  drivers/nvdimm/region_devs.c | 8 
> >>  include/linux/libnvdimm.h| 4 
> >>  3 files changed, 9 insertions(+), 5 deletions(-)
> >>
> >> diff --git a/drivers/md/dm-writecache.c b/drivers/md/dm-writecache.c
> >> index 74f3c506f084..8c6b6dce64e2 100644
> >> --- a/drivers/md/dm-writecache.c
> >> +++ b/drivers/md/dm-writecache.c
> >> @@ -536,7 +536,7 @@ static void ssd_commit_superblock(struct dm_writecache 
> >> *wc)
> >>  static void writecache_commit_flushed(struct dm_writecache *wc, bool 
> >> wait_for_ios)
> >>  {
> >> if (WC_MODE_PMEM(wc))
> >> -   wmb();
> >> +   arch_pmem_flush_barrier();
> >> else
> >> ssd_commit_flushed(wc, wait_for_ios);
> >>  }
> >> diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
> >> index 4502f9c4708d..b308ad09b63d 100644
> >> --- a/drivers/nvdimm/region_devs.c
> >> +++ b/drivers/nvdimm/region_devs.c
> >> @@ -1206,13 +1206,13 @@ int generic_nvdimm_flush(struct nd_region 
> >> *nd_region)
> >> idx = this_cpu_add_return(flush_idx, hash_32(current->pid + idx, 
> >> 8));
> >>
> >> /*
> >> -* The first wmb() is needed to 'sfence' all previous writes
> >> -* such that they are architecturally visible for the platform
> >> -* buffer flush.  Note that we've already arranged for pmem
> >> +* The first arch_pmem_flush_barrier() is needed to 'sfence' all
> >> +* previous writes such that they are architecturally visible for
> >> +* the platform buffer flush. Note that we've already arranged for 
> >> pmem
> >>  * writes to avoid the cache via memcpy_flushcache().  The final
> >>  * wmb() ensures ordering for the NVDIMM flush write.
> >>  */
> >> -   wmb();
> >> +   arch_pmem_flush_barrier();
> >> for (i = 0; i < nd_region->ndr_mappings; i++)
> >> if (ndrd_get_flush_wpq(ndrd, i, 0))
> >> writeq(1, ndrd_get_flush_wpq(ndrd, i, idx));
> >> diff --git a/include/linux/libnvdimm.h b/include/linux/libnvdimm.h
> >> index 18da4059be09..66f6c65bd789 100644
> >> --- a/include/linux/libnvdimm.h
> >> +++ b/include/linux/libnvdimm.h
> >> @@ -286,4 +286,8 @@ static inline void arch_invalidate_pmem(void *addr, 
> >> size_t size)
> >>  }
> >>  #endif
> >>
> >> +#ifndef arch_pmem_flush_barrier
> >> +#define arch_pmem_flush_barrier() wmb()
> >> +#endif
> >
> > I think it is out of place to define this in libnvdimm.h and it is odd
> > to give it such a long name. The other pmem api helpers like
> > arch_wb_cache_pmem() and arch_invalidate_pmem() are function calls for
> > libnvdimm driver operations, this barrier is just an instruction and
> > is closer to wmb() than the pmem api routine.
> >
> > Since it is a store fence for pmem, let's just call it pmem_wmb()
> > and define the generic version in include/linux/compiler.h. It should
> > probably also be documented alongside dma_wmb() in
> > Documentation/memory-barriers.txt about why code would use it over
> > wmb(), and why a symmetric pmem_rmb() is not needed.
>
> How about the below? I used pmem_barrier() instead of pmem_wmb().

Why? A barrier() is a bi-directional ordering mechanic for reads and
writes, and the proposed semantics mechanism only orders writes +
persistence. Otherwise the default fallback to wmb() on archs that
don't override it does not make sense.

> I
> guess we wanted this to order() any data access not just the following
> stores to persistent storage?

Why?

> W.r.t why a symmetric pmem_rmb() is not
> needed I was not sure how to explain that. Are you suggesting to explain
> why a read/load from persistent storage doesn't want to wait for
> pmem_barrier() ?

I would expect that the explanation is that a typical rmb() is
> sufficient and that there is no pmem-specific semantic for read
ordering for pmem vs normal read-barrier semantics.

>
> modified   Documentation/memory-barriers.txt
> @@ -1935,6 +1935,16 @@ There are some more advanced barrier functions:
>   relaxed I/O accessors and the Documentation/DMA-API.txt file for more
>   information on consistent 

Re: [PATCH 04/11] ppc64/kexec_file: avoid stomping memory used by special regions

2020-06-30 Thread Hari Bathini



On 30/06/20 9:00 am, piliu wrote:
> 
> 
> On 06/29/2020 01:55 PM, Hari Bathini wrote:
>>
>>
>> On 28/06/20 7:44 am, piliu wrote:
>>> Hi Hari,
>>
>> Hi Pingfan,
>>
>>>
>>> After a quick through for this series, I have a few question/comment on
>>> this patch for the time being. Pls see comment inline.
>>>
>>> On 06/27/2020 03:05 AM, Hari Bathini wrote:
 crashkernel region could have an overlap with special memory regions
 like opal, rtas, tce-table & such. These regions are referred to as
 exclude memory ranges. Set up these ranges during image probe in order
 to avoid them while finding the buffer for different kdump segments.
>>
>> [...]
>>
 +  /*
 +   * Use the locate_mem_hole logic in kexec_add_buffer() for regular
 +   * kexec_file_load syscall
 +   */
 +  if (kbuf->image->type != KEXEC_TYPE_CRASH)
 +  return 0;
>>> Can the ranges overlap [crashk_res.start, crashk_res.end]?  Otherwise
>>> there is no requirement for @exclude_ranges.
>>
>> The ranges like rtas, opal are loaded by f/w. They almost always overlap with
>> crashkernel region. So, @exclude_ranges is required to support kdump.
> f/w passes rtas/opal as services, so must f/w mark these ranges as
> fdt_reserved_mem in order to make the kernel aware not to use these ranges?

It does. Actually, reserve_map + reserved-ranges are reserved as soon as
memblock allocator is ready but not before crashkernel reservation.
Check early_reserve_mem() call in kernel/prom.c

> Otherwise kernel memory allocation besides kdump can also overwrite
> these ranges.
> Hmm, revisiting reserve_crashkernel(). It seems not to take any reserved
> memory into consider except kernel text. Could it work based on memblock
> allocator?

So, kdump could possibly overwrite these regions which is why an exclude
range list is needed. Same thing was done in kexec-tools as well.
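
For context, the check needed against each excluded range is the usual
interval-overlap test; a hypothetical sketch (helper name assumed, inclusive
end address):

  /* Does the candidate buffer [start, start + size) overlap an excluded range? */
  static bool overlaps_range(u64 start, u64 size, u64 excl_start, u64 excl_end)
  {
          return (start + size - 1) >= excl_start && start <= excl_end;
  }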

Thanks
Hari


[PATCH 4/4] powerpc sstep: add testcases for vsx load/store instructions

2020-06-30 Thread Balamuruhan S
add testcases for vsx load/store vector paired instructions,
* Load VSX Vector Paired (lxvp)
* Load VSX Vector Paired Indexed (lxvpx)
* Prefixed Load VSX Vector Paired (plxvp)
* Store VSX Vector Paired (stxvp)
* Store VSX Vector Paired Indexed (stxvpx)
* Prefixed Store VSX Vector Paired (pstxvp)

Signed-off-by: Balamuruhan S 
---
 arch/powerpc/include/asm/ppc-opcode.h |   7 +
 arch/powerpc/lib/test_emulate_step.c  | 273 ++
 2 files changed, 280 insertions(+)

diff --git a/arch/powerpc/include/asm/ppc-opcode.h 
b/arch/powerpc/include/asm/ppc-opcode.h
index 558efd25683b..9bc9b184db6e 100644
--- a/arch/powerpc/include/asm/ppc-opcode.h
+++ b/arch/powerpc/include/asm/ppc-opcode.h
@@ -384,6 +384,10 @@
 #define PPC_INST_VCMPEQUD  0x10c7
 #define PPC_INST_VCMPEQUB  0x1006
 
+/* Prefixes */
+#define PPC_PREFIX_MLS 0x0600
+#define PPC_PREFIX_8LS 0x0400
+
 /* macros to insert fields into opcodes */
 #define ___PPC_RA(a)   (((a) & 0x1f) << 16)
 #define ___PPC_RB(b)   (((b) & 0x1f) << 11)
@@ -415,6 +419,9 @@
 #define __PPC_CT(t)(((t) & 0x0f) << 21)
 #define __PPC_SPR(r)   ((((r) & 0x1f) << 16) | ((((r) >> 5) & 0x1f) << 11))
 #define __PPC_RC21 (0x1 << 10)
+#define __PPC_PRFX_R(r)(((r) & 0x1) << 20)
+#define __PPC_TP(tp)   (((tp) & 0xf) << 22)
+#define __PPC_TX(tx)   (((tx) & 0x1) << 21)
 
 /*
  * Both low and high 16 bits are added as SIGNED additions, so if low 16 bits
diff --git a/arch/powerpc/lib/test_emulate_step.c 
b/arch/powerpc/lib/test_emulate_step.c
index 46af80279ebc..98ecbc66bef8 100644
--- a/arch/powerpc/lib/test_emulate_step.c
+++ b/arch/powerpc/lib/test_emulate_step.c
@@ -14,7 +14,13 @@
 #include 
 
 #define IMM_L(i)   ((uintptr_t)(i) & 0xffff)
+#define IMM_H(i)   (((uintptr_t)(i) >> 16) & 0x3)
 #define IMM_DS(i)  ((uintptr_t)(i) & 0xfffc)
+#define IMM_DQ(i)  (((uintptr_t)(i) & 0xfff) << 4)
+
+#define PLXVP_EX_OP0xe800
+#define PSTXVP_EX_OP   0xf800
+
 
 /*
  * Defined with TEST_ prefix so it does not conflict with other
@@ -47,6 +53,21 @@
___PPC_RA(a) | ___PPC_RB(b))
 #define TEST_LXVD2X(s, a, b)   ppc_inst(PPC_INST_LXVD2X | VSX_XX1((s), R##a, 
R##b))
 #define TEST_STXVD2X(s, a, b)  ppc_inst(PPC_INST_STXVD2X | VSX_XX1((s), R##a, 
R##b))
+#define TEST_LXVP(tp, tx, a, i) \
+   (PPC_INST_LXVP | __PPC_TP(tp) | __PPC_TX(tx) | ___PPC_RA(a) | IMM_DQ(i))
+#define TEST_STXVP(sp, sx, a, i) \
+   (PPC_INST_STXVP | __PPC_TP(sp) | __PPC_TX(sx) | ___PPC_RA(a) | 
IMM_DQ(i) | 0x1)
+#define TEST_LXVPX(tp, tx, a, b) \
+   (PPC_INST_LXVPX | __PPC_TP(tp) | __PPC_TX(tx) | ___PPC_RA(a) | 
___PPC_RB(b))
+#define TEST_STXVPX(sp, sx, a, b) \
+   (PPC_INST_STXVPX | __PPC_TP(sp) | __PPC_TX(sx) | ___PPC_RA(a) | 
___PPC_RB(b))
+#define TEST_PLXVP(a, i, pr, tp, tx) \
+   ((PPC_PREFIX_8LS | __PPC_PRFX_R(pr) | IMM_H(i)) << 32 | \
+(PLXVP_EX_OP | __PPC_TP(tp) | __PPC_TX(tx) | ___PPC_RA(a) | IMM_L(i)))
+#define TEST_PSTXVP(a, i, pr, sp, sx) \
+   ((PPC_PREFIX_8LS | __PPC_PRFX_R(pr) | IMM_H(i)) << 32 | \
+(PSTXVP_EX_OP | __PPC_TP(sp) | __PPC_TX(sx) | ___PPC_RA(a) | IMM_L(i)))
+
 #define TEST_ADD(t, a, b)  ppc_inst(PPC_INST_ADD | ___PPC_RT(t) |  
\
___PPC_RA(a) | ___PPC_RB(b))
 #define TEST_ADD_DOT(t, a, b)  ppc_inst(PPC_INST_ADD | ___PPC_RT(t) |  
\
@@ -444,6 +465,255 @@ static void __init test_lxvd2x_stxvd2x(void)
 }
 #endif /* CONFIG_VSX */
 
+#ifdef CONFIG_VSX
+static void __init test_lxvp_stxvp(void)
+{
+   struct pt_regs regs;
+   union {
+   vector128 a[2];
+   u32 b[8];
+   } c;
+   u32 cached_b[8];
+   int stepped = -1;
+
+   init_pt_regs();
+
+   /*** lxvp ***/
+
+   cached_b[0] = c.b[0] = 18233;
+   cached_b[1] = c.b[1] = 34863571;
+   cached_b[2] = c.b[2] = 834;
+   cached_b[3] = c.b[3] = 6138911;
+   cached_b[4] = c.b[4] = 1234;
+   cached_b[5] = c.b[5] = 5678;
+   cached_b[6] = c.b[6] = 91011;
+   cached_b[7] = c.b[7] = 121314;
+
+   regs.gpr[4] = (unsigned long)
+
+   /*
+* lxvp XTp,DQ(RA)
+* XTp = 32 * TX + 2 * Tp
+* let TX=1 Tp=1 RA=4 DQ=0
+*/
+   stepped = emulate_step(, ppc_inst(TEST_LXVP(1, 1, 4, 0)));
+
+   if (stepped == 1 && cpu_has_feature(CPU_FTR_VSX)) {
+   show_result("lxvp", "PASS");
+   } else {
+   if (!cpu_has_feature(CPU_FTR_VSX))
+   show_result("lxvp", "PASS (!CPU_FTR_VSX)");
+   else
+   show_result("lxvp", "FAIL");
+   }
+
+   /*** stxvp ***/
+
+   c.b[0] = 21379463;
+   c.b[1] = 87;
+   c.b[2] = 374234;
+   c.b[3] = 4;
+   c.b[4] = 90;
+   c.b[5] = 122;
+   c.b[6] = 555;
+   c.b[7] = 

Re: [PATCH v6 5/8] powerpc/pmem/of_pmem: Update of_pmem to use the new barrier instruction.

2020-06-30 Thread Dan Williams
On Mon, Jun 29, 2020 at 10:05 PM Aneesh Kumar K.V
 wrote:
>
> Dan Williams  writes:
>
> > On Mon, Jun 29, 2020 at 6:58 AM Aneesh Kumar K.V
> >  wrote:
> >>
> >> of_pmem on POWER10 can now use phwsync instead of hwsync to ensure
> >> all previous writes are architecturally visible for the platform
> >> buffer flush.
> >>
> >> Signed-off-by: Aneesh Kumar K.V 
> >> ---
> >>  arch/powerpc/include/asm/cacheflush.h | 7 +++
> >>  1 file changed, 7 insertions(+)
> >>
> >> diff --git a/arch/powerpc/include/asm/cacheflush.h 
> >> b/arch/powerpc/include/asm/cacheflush.h
> >> index 54764c6e922d..95782f77d768 100644
> >> --- a/arch/powerpc/include/asm/cacheflush.h
> >> +++ b/arch/powerpc/include/asm/cacheflush.h
> >> @@ -98,6 +98,13 @@ static inline void invalidate_dcache_range(unsigned 
> >> long start,
> >> mb();   /* sync */
> >>  }
> >>
> >> +#define arch_pmem_flush_barrier arch_pmem_flush_barrier
> >> +static inline void  arch_pmem_flush_barrier(void)
> >> +{
> >> +   if (cpu_has_feature(CPU_FTR_ARCH_207S))
> >> +   asm volatile(PPC_PHWSYNC ::: "memory");
> >
> > Shouldn't this fallback to a compatible store-fence in an else statement?
>
> The idea was to avoid calling this on anything else. We ensure that by
> making sure that pmem devices are not initialized on systems without that
> cpu feature. Patch 1 does that. Also, the last patch adds a WARN_ON() to
> catch the usage of this outside pmem devices and on systems without that
> cpu feature.

If patch1 handles this why re-check the cpu-feature in this helper? If
the intent is for these routines to be generic why not have them fall
back to the P8 barrier instructions for example like x86 clwb(). Any
kernel code can call it, and it falls back to a compatible clflush()
call on older cpus. I otherwise don't get the point of patch7.
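
A fallback along those lines could be as simple as the sketch below (an
illustration of the suggestion only, not code from this series):

  static inline void arch_pmem_flush_barrier(void)
  {
          if (cpu_has_feature(CPU_FTR_ARCH_207S))
                  asm volatile(PPC_PHWSYNC ::: "memory");
          else
                  asm volatile("sync" ::: "memory");  /* plain heavyweight sync */
  }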


[PATCH v2 2/3] powerpc/pseries: Use doorbells even if XIVE is available

2020-06-30 Thread Nicholas Piggin
KVM supports msgsndp in guests by trapping and emulating the
instruction, so it was decided to always use XIVE for IPIs if it is
available. However on PowerVM systems, msgsndp can be used and gives
better performance. On large systems, high XIVE interrupt rates can
have sub-linear scaling, and using msgsndp can reduce the load on
the interrupt controller.

So switch to using core local doorbells even if XIVE is available.
This reduces performance for KVM guests with an SMT topology by
about 50% for ping-pong context switching between SMT vCPUs. An
option vector (or dt-cpu-ftrs) could be defined to disable msgsndp
to get KVM performance back.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/platforms/pseries/smp.c | 54 ++--
 1 file changed, 36 insertions(+), 18 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/smp.c 
b/arch/powerpc/platforms/pseries/smp.c
index 6891710833be..67e6ad5076ce 100644
--- a/arch/powerpc/platforms/pseries/smp.c
+++ b/arch/powerpc/platforms/pseries/smp.c
@@ -188,13 +188,16 @@ static int pseries_smp_prepare_cpu(int cpu)
return 0;
 }
 
-static void smp_pseries_cause_ipi(int cpu)
+/* Cause IPI as setup by the interrupt controller (xics or xive) */
+static void (*ic_cause_ipi)(int cpu) __ro_after_init;
+
+/* Use msgsndp doorbells if target is a sibling, else use interrupt controller */
+static void dbell_or_ic_cause_ipi(int cpu)
 {
-   /* POWER9 should not use this handler */
if (doorbell_try_core_ipi(cpu))
return;
 
-   icp_ops->cause_ipi(cpu);
+   ic_cause_ipi(cpu);
 }
 
 static int pseries_cause_nmi_ipi(int cpu)
@@ -218,26 +221,41 @@ static int pseries_cause_nmi_ipi(int cpu)
return 0;
 }
 
-static __init void pSeries_smp_probe_xics(void)
-{
-   xics_smp_probe();
-
-   if (cpu_has_feature(CPU_FTR_DBELL) && !is_secure_guest())
-   smp_ops->cause_ipi = smp_pseries_cause_ipi;
-   else
-   smp_ops->cause_ipi = icp_ops->cause_ipi;
-}
-
 static __init void pSeries_smp_probe(void)
 {
if (xive_enabled())
-   /*
-* Don't use P9 doorbells when XIVE is enabled. IPIs
-* using MMIOs should be faster
-*/
xive_smp_probe();
else
-   pSeries_smp_probe_xics();
+   xics_smp_probe();
+
+   /* No doorbell facility, must use the interrupt controller for IPIs */
+   if (!cpu_has_feature(CPU_FTR_DBELL))
+   return;
+
+   /* Doorbells can only be used for IPIs between SMT siblings */
+   if (!cpu_has_feature(CPU_FTR_SMT))
+   return;
+
+   /*
+* KVM emulates doorbells by disabling FSCR[MSGP] so msgsndp faults
+* to the hypervisor which then reads the instruction from guest
+* memory. This can't be done if the guest is secure, so don't use
+* doorbells in secure guests.
+*
+* Under PowerVM, FSCR[MSGP] is enabled so doorbells could be used
+* by secure guests if we distinguished this from KVM.
+*/
+   if (is_secure_guest())
+   return;
+
+   /*
+* The guest can use doorbells for SMT sibling IPIs, which stay in
+* the core rather than going to the interrupt controller. This
+* tends to be slower under KVM where doorbells are emulated, but
+* faster for PowerVM where they're enabled.
+*/
+   ic_cause_ipi = smp_ops->cause_ipi;
+   smp_ops->cause_ipi = dbell_or_ic_cause_ipi;
 }
 
 static struct smp_ops_t pseries_smp_ops = {
-- 
2.23.0



Re: [PATCH 2/2] powerpc/mm/books64/pkeys: Rename is_pkey_enabled()

2020-06-30 Thread Michael Ellerman
"Aneesh Kumar K.V"  writes:
> Rename is_pkey_enabled() to is_pkey_masked() to better indicate that
> this check is to make sure the key is available for userspace usage.

I don't think the new name makes that any clearer. Unless you know that
"masked" means not "available for userspace".

It's also not clear if masked means 00 or 11.

Now that there's only one caller why not just fold it in, that way it
doesn't need a name at all.

> diff --git a/arch/powerpc/mm/book3s64/pkeys.c 
> b/arch/powerpc/mm/book3s64/pkeys.c
> index ca5fcb4bff32..70d760ade922 100644
> --- a/arch/powerpc/mm/book3s64/pkeys.c
> +++ b/arch/powerpc/mm/book3s64/pkeys.c
> @@ -206,18 +206,16 @@ static inline void write_uamor(u64 value)
>   mtspr(SPRN_UAMOR, value);
>  }
>  
> -static bool is_pkey_enabled(int pkey)
> +static bool is_pkey_masked(int pkey)
>  {
>   u64 uamor = read_uamor();
>   u64 pkey_bits = 0x3ul << pkeyshift(pkey);
>   u64 uamor_pkey_bits = (uamor & pkey_bits);
>  
>   /*
> -  * Both the bits in UAMOR corresponding to the key should be set or
> -  * reset.
> +  * Both the bits in UAMOR corresponding to the key should be set
>*/
> - WARN_ON(uamor_pkey_bits && (uamor_pkey_bits != pkey_bits));
> - return !!(uamor_pkey_bits);
> + return (uamor_pkey_bits != pkey_bits);
>  }
>  
>  static inline void init_amr(int pkey, u8 init_bits)
> @@ -246,7 +244,7 @@ int __arch_set_user_pkey_access(struct task_struct *tsk, 
> int pkey,
>   u64 new_amr_bits = 0x0ul;
>   u64 new_iamr_bits = 0x0ul;
>  
> - if (!is_pkey_enabled(pkey))
> + if (is_pkey_masked(pkey))
>   return -EINVAL;

eg:
u64 pkey_bits = 0x3ul << pkeyshift(pkey);

if ((read_uamor() & pkey_bits) != pkey_bits)
return -EINVAL;

>  
>   if (init_val & PKEY_DISABLE_EXECUTE) {


cheers


Re: [PATCH v6 6/8] powerpc/pmem: Avoid the barrier in flush routines

2020-06-30 Thread Michal Suchánek
On Mon, Jun 29, 2020 at 06:50:15PM -0700, Dan Williams wrote:
> On Mon, Jun 29, 2020 at 1:41 PM Aneesh Kumar K.V
>  wrote:
> >
> > Michal Suchánek  writes:
> >
> > > Hello,
> > >
> > > On Mon, Jun 29, 2020 at 07:27:20PM +0530, Aneesh Kumar K.V wrote:
> > >> nvdimm expects the flush routines to just mark the cache clean. The 
> > >> barrier
> > >> that marks the store globally visible is done in nvdimm_flush().
> > >>
> > >> Update the papr_scm driver to a simplified nvdimm_flush callback that does
> > >> only the required barrier.
> > >>
> > >> Signed-off-by: Aneesh Kumar K.V 
> > >> ---
> > >>  arch/powerpc/lib/pmem.c   |  6 --
> > >>  arch/powerpc/platforms/pseries/papr_scm.c | 13 +
> > >>  2 files changed, 13 insertions(+), 6 deletions(-)
> > >>
> > >> diff --git a/arch/powerpc/lib/pmem.c b/arch/powerpc/lib/pmem.c
> > >> index 5a61aaeb6930..21210fa676e5 100644
> > >> --- a/arch/powerpc/lib/pmem.c
> > >> +++ b/arch/powerpc/lib/pmem.c
> > >> @@ -19,9 +19,6 @@ static inline void __clean_pmem_range(unsigned long 
> > >> start, unsigned long stop)
> > >>
> > >>  for (i = 0; i < size >> shift; i++, addr += bytes)
> > >>  asm volatile(PPC_DCBSTPS(%0, %1): :"i"(0), "r"(addr): 
> > >> "memory");
> > >> -
> > >> -
> > >> -asm volatile(PPC_PHWSYNC ::: "memory");
> > >>  }
> > >>
> > >>  static inline void __flush_pmem_range(unsigned long start, unsigned 
> > >> long stop)
> > >> @@ -34,9 +31,6 @@ static inline void __flush_pmem_range(unsigned long 
> > >> start, unsigned long stop)
> > >>
> > >>  for (i = 0; i < size >> shift; i++, addr += bytes)
> > >>  asm volatile(PPC_DCBFPS(%0, %1): :"i"(0), "r"(addr): 
> > >> "memory");
> > >> -
> > >> -
> > >> -asm volatile(PPC_PHWSYNC ::: "memory");
> > >>  }
> > >>
> > >>  static inline void clean_pmem_range(unsigned long start, unsigned long 
> > >> stop)
> > >> diff --git a/arch/powerpc/platforms/pseries/papr_scm.c 
> > >> b/arch/powerpc/platforms/pseries/papr_scm.c
> > >> index 9c569078a09f..9a9a0766f8b6 100644
> > >> --- a/arch/powerpc/platforms/pseries/papr_scm.c
> > >> +++ b/arch/powerpc/platforms/pseries/papr_scm.c
> > >> @@ -630,6 +630,18 @@ static int papr_scm_ndctl(struct 
> > >> nvdimm_bus_descriptor *nd_desc,
> > >>
> > >>  return 0;
> > >>  }
> > >> +/*
> > >> + * We have made sure the pmem writes are done such that before calling 
> > >> this
> > >> + * all the caches are flushed/clean. We use dcbf/dcbfps to ensure this. 
> > >> Here
> > >> + * we just need to add the necessary barrier to make sure the above 
> > >> flushes
> > >> + * have updated persistent storage before any data access or data 
> > >> transfer
> > >> + * caused by subsequent instructions is initiated.
> > >> + */
> > >> +static int papr_scm_flush_sync(struct nd_region *nd_region, struct bio 
> > >> *bio)
> > >> +{
> > >> +arch_pmem_flush_barrier();
> > >> +return 0;
> > >> +}
> > >>
> > >>  static ssize_t flags_show(struct device *dev,
> > >>struct device_attribute *attr, char *buf)
> > >> @@ -743,6 +755,7 @@ static int papr_scm_nvdimm_init(struct papr_scm_priv 
> > >> *p)
> > >>  ndr_desc.mapping = 
> > >>  ndr_desc.num_mappings = 1;
> > >>  ndr_desc.nd_set = >nd_set;
> > >> +ndr_desc.flush = papr_scm_flush_sync;
> > >
> > > AFAICT currently the only device that implements flush is virtio_pmem.
> > > How does the nfit driver get away without implementing flush?
> >
> > generic_nvdimm_flush does the required barrier for nfit. The reason for
> > adding ndr_desc.flush call back for papr_scm was to avoid the usage
> > of iomem based deep flushing (ndr_region_data.flush_wpq) which is not
> > supported by papr_scm.
> >
> > BTW we do return NULL for ndrd_get_flush_wpq() on power. So the upstream
> > code also does the same thing, but in a different way.
> >
> >
> > > Also the flush takes arguments that are completely unused but a user of
> > > the pmem region must assume they are used, and call flush() on the
> > > region rather than arch_pmem_flush_barrier() directly.
> >
> > The bio argument can help a pmem driver to do range based flushing in
> > case of pmem_make_request. If bio is null then we must assume a full
> > device flush.
> 
> The bio argument isn't for range based flushing, it is for flush
> operations that need to complete asynchronously.
How does the block layer determine that the pmem device needs
asynchronous flushing?

The flush() was designed for that purpose with the bio argument, and only
virtio_pmem, which is flushed asynchronously, used it. Now that papr_scm
reuses it for a different purpose, how do you tell?

Thanks

Michal


Re: [PATCH v4 7/8] lockdep: Change hardirq{s_enabled,_context} to per-cpu variables

2020-06-30 Thread Peter Zijlstra
On Tue, Jun 30, 2020 at 07:59:39AM +0200, Ahmed S. Darwish wrote:
> Peter Zijlstra wrote:
> 
> ...
> 
> > -#define lockdep_assert_irqs_disabled() do {\
> > -   WARN_ONCE(debug_locks && !current->lockdep_recursion && \
> > - current->hardirqs_enabled,\
> > - "IRQs not disabled as expected\n");   \
> > -   } while (0)
> 
> ...
> 
> > +#define lockdep_assert_irqs_disabled() \
> > +do {   
> > \
> > +   WARN_ON_ONCE(debug_locks && this_cpu_read(hardirqs_enabled));   \
> > +} while (0)
> 
> I think it would be nice to keep the "IRQs not disabled as expected"
> message. It makes the lockdep splat much more readable.
> 
> This is similarly the case for the v3 lockdep preemption macros:
> 
>   https://lkml.kernel.org/r/20200630054452.3675847-5-a.darw...@linutronix.de
> 
> I did not add a message though to get in-sync with the IRQ macros above.

Hurmph.. the file:line output of a splat is usually all I look at, also
__WARN_printf() generates such atrocious crap code that I try not to use
it.

I suppose I should do a __WARN_str() or something, but then people are
unlikely to want to use that, too much variation etc. :/

Cursed if you do, cursed if you don't.


[PATCH v2 0/3] powerpc/pseries: IPI doorbell improvements

2020-06-30 Thread Nicholas Piggin
Since v1:
- Fixed SMP compile error.
- Fixed EPAPR / KVM_GUEST breakage.
- Expanded patch 3 changelog a bit.

Thanks,
Nick

Nicholas Piggin (3):
  powerpc: inline doorbell sending functions
  powerpc/pseries: Use doorbells even if XIVE is available
  powerpc/pseries: Add KVM guest doorbell restrictions

 arch/powerpc/include/asm/dbell.h | 63 ++--
 arch/powerpc/include/asm/firmware.h  |  6 +++
 arch/powerpc/include/asm/kvm_para.h  | 26 ++--
 arch/powerpc/kernel/Makefile |  5 +--
 arch/powerpc/kernel/dbell.c  | 55 
 arch/powerpc/kernel/firmware.c   | 19 +
 arch/powerpc/platforms/pseries/smp.c | 62 +++
 7 files changed, 134 insertions(+), 102 deletions(-)

-- 
2.23.0



Re: [PATCH 1/2] powerpc/mm/book3s54/pkeys: make pkey access check work on execute_only_key

2020-06-30 Thread Michael Ellerman
On Sat, 27 Jun 2020 12:31:46 +0530, Aneesh Kumar K.V wrote:
> pkey_access_permitted() should not check whether the pkey is available in UAMOR or
> not.
> The kernel needs to do that check only while allocating keys. This also makes
> sure execute_only_key which is marked as non-manageable via UAMOR gives the
> right access check return w.r.t pkey_access_permitted().
> 
> This fixes the page fault loop when using PROT_EXEC as below
> 
> [...]

Patch 1 applied to powerpc/fixes.

[1/2] powerpc/mm/pkeys: Make pkey access check work on execute_only_key
  https://git.kernel.org/powerpc/c/19ab500edb5d6020010caba48ce3b4ce4182ab63

cheers


Re: [PATCH] crypto: af_alg - Fix regression on empty requests

2020-06-30 Thread Naresh Kamboju
On Fri, 26 Jun 2020 at 12:00, Herbert Xu  wrote:
>
> On Tue, Jun 23, 2020 at 10:02:17AM -0700, Eric Biggers wrote:
> >
> > The source code for the two failing AF_ALG tests is here:
> >
> > https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/crypto/af_alg02.c
> > https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/crypto/af_alg05.c
> >
> > They use read() and write(), not send() and recv().
> >
> > af_alg02 uses read() to read from a "salsa20" request socket without writing
> > anything to it.  It is expected that this returns 0, i.e. that behaves like
> > encrypting an empty message.

Since we are on this subject,
LTP af_alg02  test case fails on stable 4.9 and stable 4.4
This is not a regression because the test case has been failing from
the beginning.

Is this test case expected to fail on stable 4.9 and 4.4 ?
or any chance to fix this on these older branches ?

Test output:
af_alg02.c:52: BROK: Timed out while reading from request socket.

ref:
https://qa-reports.linaro.org/lkft/linux-stable-rc-4.9-oe/build/v4.9.228-191-g082e807235d7/testrun/2884917/suite/ltp-crypto-tests/test/af_alg02/history/
https://qa-reports.linaro.org/lkft/linux-stable-rc-4.9-oe/build/v4.9.228-191-g082e807235d7/testrun/2884606/suite/ltp-crypto-tests/test/af_alg02/log

- Naresh


Re: [PATCH v6 6/8] powerpc/pmem: Avoid the barrier in flush routines

2020-06-30 Thread Aneesh Kumar K.V

On 6/30/20 2:24 PM, Michal Suchánek wrote:

On Mon, Jun 29, 2020 at 06:50:15PM -0700, Dan Williams wrote:

On Mon, Jun 29, 2020 at 1:41 PM Aneesh Kumar K.V
 wrote:


Michal Suchánek  writes:


Hello,

On Mon, Jun 29, 2020 at 07:27:20PM +0530, Aneesh Kumar K.V wrote:

nvdimm expects the flush routines to just mark the cache clean. The barrier
that marks the store globally visible is done in nvdimm_flush().

Update the papr_scm driver to a simplified nvdimm_flush callback that does
only the required barrier.

Signed-off-by: Aneesh Kumar K.V 
---
  arch/powerpc/lib/pmem.c   |  6 --
  arch/powerpc/platforms/pseries/papr_scm.c | 13 +
  2 files changed, 13 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/lib/pmem.c b/arch/powerpc/lib/pmem.c
index 5a61aaeb6930..21210fa676e5 100644
--- a/arch/powerpc/lib/pmem.c
+++ b/arch/powerpc/lib/pmem.c
@@ -19,9 +19,6 @@ static inline void __clean_pmem_range(unsigned long start, 
unsigned long stop)

  for (i = 0; i < size >> shift; i++, addr += bytes)
  asm volatile(PPC_DCBSTPS(%0, %1): :"i"(0), "r"(addr): "memory");
-
-
-asm volatile(PPC_PHWSYNC ::: "memory");
  }

  static inline void __flush_pmem_range(unsigned long start, unsigned long stop)
@@ -34,9 +31,6 @@ static inline void __flush_pmem_range(unsigned long start, 
unsigned long stop)

  for (i = 0; i < size >> shift; i++, addr += bytes)
  asm volatile(PPC_DCBFPS(%0, %1): :"i"(0), "r"(addr): "memory");
-
-
-asm volatile(PPC_PHWSYNC ::: "memory");
  }

  static inline void clean_pmem_range(unsigned long start, unsigned long stop)
diff --git a/arch/powerpc/platforms/pseries/papr_scm.c 
b/arch/powerpc/platforms/pseries/papr_scm.c
index 9c569078a09f..9a9a0766f8b6 100644
--- a/arch/powerpc/platforms/pseries/papr_scm.c
+++ b/arch/powerpc/platforms/pseries/papr_scm.c
@@ -630,6 +630,18 @@ static int papr_scm_ndctl(struct nvdimm_bus_descriptor 
*nd_desc,

  return 0;
  }
+/*
+ * We have made sure the pmem writes are done such that before calling this
+ * all the caches are flushed/clean. We use dcbf/dcbfps to ensure this. Here
+ * we just need to add the necessary barrier to make sure the above flushes
+ * have updated persistent storage before any data access or data transfer
+ * caused by subsequent instructions is initiated.
+ */
+static int papr_scm_flush_sync(struct nd_region *nd_region, struct bio *bio)
+{
+arch_pmem_flush_barrier();
+return 0;
+}

  static ssize_t flags_show(struct device *dev,
struct device_attribute *attr, char *buf)
@@ -743,6 +755,7 @@ static int papr_scm_nvdimm_init(struct papr_scm_priv *p)
  ndr_desc.mapping = 
  ndr_desc.num_mappings = 1;
  ndr_desc.nd_set = >nd_set;
+ndr_desc.flush = papr_scm_flush_sync;


AFAICT currently the only device that implements flush is virtio_pmem.
How does the nfit driver get away without implementing flush?


generic_nvdimm_flush does the required barrier for nfit. The reason for
adding ndr_desc.flush call back for papr_scm was to avoid the usage
of iomem based deep flushing (ndr_region_data.flush_wpq) which is not
supported by papr_scm.

BTW we do return NULL for ndrd_get_flush_wpq() on power. So the upstream
code also does the same thing, but in a different way.
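
For reference, the dispatch in the nvdimm core is roughly the following
(simplified sketch of nvdimm_flush(), not the exact upstream code):

  int nvdimm_flush(struct nd_region *nd_region, struct bio *bio)
  {
          if (nd_region->flush)
                  return nd_region->flush(nd_region, bio) ? -EIO : 0;

          return generic_nvdimm_flush(nd_region);
  }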



Also the flush takes arguments that are completely unused but a user of
the pmem region must assume they are used, and call flush() on the
region rather than arch_pmem_flush_barrier() directly.


The bio argument can help a pmem driver to do range based flushing in
case of pmem_make_request. If bio is null then we must assume a full
device flush.


The bio argument isn't for range based flushing, it is for flush
operations that need to complete asynchronously.

How does the block layer determine that the pmem device needs
asynchronous flushing?



set_bit(ND_REGION_ASYNC, &ndr_desc.flags);

and dax_synchronous(dev)
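
i.e. something along the lines of the virtio_pmem-style registration below
(sketch only; callback name taken from virtio_pmem, the rest simplified):

  ndr_desc.flush = async_pmem_flush;              /* completes asynchronously */
  set_bit(ND_REGION_ASYNC, &ndr_desc.flags);
  nd_region = nvdimm_pmem_region_create(nvdimm_bus, &ndr_desc);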


The flush() was designed for that purpose with the bio argument, and only
virtio_pmem, which is flushed asynchronously, used it. Now that papr_scm
reuses it for a different purpose, how do you tell?



-aneesh


Re: [PATCH updated] libnvdimm/nvdimm/flush: Allow architecture to override the flush barrier

2020-06-30 Thread Aneesh Kumar K.V

On 6/30/20 12:52 PM, Aneesh Kumar K.V wrote:

On 6/30/20 12:36 PM, Dan Williams wrote:

On Mon, Jun 29, 2020 at 10:02 PM Aneesh Kumar K.V
 wrote:


Dan Williams  writes:


On Mon, Jun 29, 2020 at 1:29 PM Aneesh Kumar K.V
 wrote:


Architectures like ppc64 provide persistent memory specific barriers
that will ensure that all stores for which the modifications are
written to persistent storage by preceding dcbfps and dcbstps
instructions have updated persistent storage before any data
access or data transfer caused by subsequent instructions is 
initiated.

This is in addition to the ordering done by wmb()

Update nvdimm core such that architecture can use barriers other than
wmb to ensure all previous writes are architecturally visible for
the platform buffer flush.

Signed-off-by: Aneesh Kumar K.V 
---
  drivers/md/dm-writecache.c   | 2 +-
  drivers/nvdimm/region_devs.c | 8 
  include/linux/libnvdimm.h    | 4 
  3 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/drivers/md/dm-writecache.c b/drivers/md/dm-writecache.c
index 74f3c506f084..8c6b6dce64e2 100644
--- a/drivers/md/dm-writecache.c
+++ b/drivers/md/dm-writecache.c
@@ -536,7 +536,7 @@ static void ssd_commit_superblock(struct 
dm_writecache *wc)
  static void writecache_commit_flushed(struct dm_writecache *wc, 
bool wait_for_ios)

  {
 if (WC_MODE_PMEM(wc))
-   wmb();
+   arch_pmem_flush_barrier();
 else
 ssd_commit_flushed(wc, wait_for_ios);
  }
diff --git a/drivers/nvdimm/region_devs.c 
b/drivers/nvdimm/region_devs.c

index 4502f9c4708d..b308ad09b63d 100644
--- a/drivers/nvdimm/region_devs.c
+++ b/drivers/nvdimm/region_devs.c
@@ -1206,13 +1206,13 @@ int generic_nvdimm_flush(struct nd_region 
*nd_region)
 idx = this_cpu_add_return(flush_idx, hash_32(current->pid 
+ idx, 8));


 /*
-    * The first wmb() is needed to 'sfence' all previous writes
-    * such that they are architecturally visible for the platform
-    * buffer flush.  Note that we've already arranged for pmem
+    * The first arch_pmem_flush_barrier() is needed to 
'sfence' all
+    * previous writes such that they are architecturally 
visible for
+    * the platform buffer flush. Note that we've already 
arranged for pmem
  * writes to avoid the cache via memcpy_flushcache().  The 
final

  * wmb() ensures ordering for the NVDIMM flush write.
  */
-   wmb();
+   arch_pmem_flush_barrier();
 for (i = 0; i < nd_region->ndr_mappings; i++)
 if (ndrd_get_flush_wpq(ndrd, i, 0))
 writeq(1, ndrd_get_flush_wpq(ndrd, i, idx));
diff --git a/include/linux/libnvdimm.h b/include/linux/libnvdimm.h
index 18da4059be09..66f6c65bd789 100644
--- a/include/linux/libnvdimm.h
+++ b/include/linux/libnvdimm.h
@@ -286,4 +286,8 @@ static inline void arch_invalidate_pmem(void 
*addr, size_t size)

  }
  #endif

+#ifndef arch_pmem_flush_barrier
+#define arch_pmem_flush_barrier() wmb()
+#endif


I think it is out of place to define this in libnvdimm.h and it is odd
to give it such a long name. The other pmem api helpers like
arch_wb_cache_pmem() and arch_invalidate_pmem() are function calls for
libnvdimm driver operations, this barrier is just an instruction and
is closer to wmb() than the pmem api routine.

Since it is a store fence for pmem, let's just call it pmem_wmb()
and define the generic version in include/linux/compiler.h. It should
probably also be documented alongside dma_wmb() in
Documentation/memory-barriers.txt about why code would use it over
wmb(), and why a symmetric pmem_rmb() is not needed.


How about the below? I used pmem_barrier() instead of pmem_wmb().


Why? A barrier() is a bi-directional ordering mechanic for reads and
writes, and the proposed semantics mechanism only orders writes +
persistence. Otherwise the default fallback to wmb() on archs that
don't override it does not make sense.


I
guess we wanted this to order() any data access not just the following
stores to persistent storage?


Why?


W.r.t why a symmetric pmem_rmb() is not
needed I was not sure how to explain that. Are you suggesting to explain
why a read/load from persistent storage doesn't want to wait for
pmem_barrier() ?


I would expect that the explanation is that a typical rmb() is
sufficient and that there is no pmem-specific semantic for read
ordering for pmem vs normal read-barrier semantics.



Should that be rmb()? A smp_rmb() would suffice right?


-aneesh


Re: [PATCH 3/3] powerpc/pseries: Add KVM guest doorbell restrictions

2020-06-30 Thread Nicholas Piggin
Excerpts from Paul Mackerras's message of June 30, 2020 6:26 pm:
> On Tue, Jun 30, 2020 at 03:35:08PM +1000, Nicholas Piggin wrote:
>> Excerpts from Paul Mackerras's message of June 30, 2020 12:27 pm:
>> > On Sun, Jun 28, 2020 at 01:04:28AM +1000, Nicholas Piggin wrote:
>> >> KVM guests have certain restrictions and performance quirks when
>> >> using doorbells. This patch tests for KVM environment in doorbell
>> >> setup, and optimises IPI performance:
>> >> 
>> >>  - PowerVM guests may now use doorbells even if they are secure.
>> >> 
>> >>  - KVM guests no longer use doorbells if XIVE is available.
>> > 
>> > It seems, from the fact that you completely remove
>> > kvm_para_available(), that you perhaps haven't tried building with
>> > CONFIG_KVM_GUEST=y.
>> 
>> It's still there and builds:
> 
> OK, good, I missed that.
> 
>> static inline int kvm_para_available(void)
>> {
>> return IS_ENABLED(CONFIG_KVM_GUEST) && is_kvm_guest();
>> }
>> 
>> but...
>> 
>> > Somewhat confusingly, that option is not used or
>> > needed when building for a PAPR guest (i.e. the "pseries" platform)
>> > but is used on non-IBM platforms using the "epapr" hypervisor
>> > interface.
>> 
>> ... is_kvm_guest() returns false on !PSERIES now.
> 
> And therefore kvm_para_available() returns false on all the platforms
> where the code that depends on it could actually be used.
> 
> It's not correct to assume that !PSERIES means not a KVM guest.

Yep, thanks for catching it.

>> Not intended
>> to break EPAPR. I'm not sure of a good way to share this between
>> EPAPR and PSERIES, I might just make a copy of it but I'll see.
> 
> OK, so you're doing a new version?

Just sent.

Thanks,
Nick


[PATCH v2 3/3] powerpc/pseries: Add KVM guest doorbell restrictions

2020-06-30 Thread Nicholas Piggin
KVM guests have certain restrictions and performance quirks when using
doorbells. This patch moves the EPAPR KVM guest test so it can be shared
with PSERIES, and uses that in doorbell setup code to apply the KVM
guest quirks and improve IPI performance for two cases:

 - PowerVM guests may now use doorbells even if they are secure.

 - KVM guests no longer use doorbells if XIVE is available.

There is a valid complaint that "KVM guest" is not a very reasonable
thing to test for, it's preferable for the hypervisor to advertise
particular behaviours to the guest so they could change if the
hypervisor implementation or configuration changes. However in this case
we were already assuming a KVM guest worst case, so this patch is about
containing those quirks. If KVM later advertises fast doorbells, we
should test for that and override the quirks.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/include/asm/firmware.h  |  6 +
 arch/powerpc/include/asm/kvm_para.h  | 26 +++
 arch/powerpc/kernel/Makefile |  5 ++--
 arch/powerpc/kernel/firmware.c   | 19 ++
 arch/powerpc/platforms/pseries/smp.c | 38 +---
 5 files changed, 53 insertions(+), 41 deletions(-)

diff --git a/arch/powerpc/include/asm/firmware.h 
b/arch/powerpc/include/asm/firmware.h
index 6003c2e533a0..f67efbaba17f 100644
--- a/arch/powerpc/include/asm/firmware.h
+++ b/arch/powerpc/include/asm/firmware.h
@@ -132,6 +132,12 @@ extern int ibm_nmi_interlock_token;
 
 extern unsigned int __start___fw_ftr_fixup, __stop___fw_ftr_fixup;
 
+#if defined(CONFIG_PPC_PSERIES) || defined(CONFIG_KVM_GUEST)
+bool is_kvm_guest(void);
+#else
+static inline bool is_kvm_guest(void) { return false; }
+#endif
+
 #ifdef CONFIG_PPC_PSERIES
 void pseries_probe_fw_features(void);
 #else
diff --git a/arch/powerpc/include/asm/kvm_para.h 
b/arch/powerpc/include/asm/kvm_para.h
index 9c1f6b4b9bbf..744612054c94 100644
--- a/arch/powerpc/include/asm/kvm_para.h
+++ b/arch/powerpc/include/asm/kvm_para.h
@@ -8,35 +8,15 @@
 #ifndef __POWERPC_KVM_PARA_H__
 #define __POWERPC_KVM_PARA_H__
 
-#include 
-
-#ifdef CONFIG_KVM_GUEST
-
-#include 
-
-static inline int kvm_para_available(void)
-{
-   struct device_node *hyper_node;
-
-   hyper_node = of_find_node_by_path("/hypervisor");
-   if (!hyper_node)
-   return 0;
+#include 
 
-   if (!of_device_is_compatible(hyper_node, "linux,kvm"))
-   return 0;
-
-   return 1;
-}
-
-#else
+#include 
 
 static inline int kvm_para_available(void)
 {
-   return 0;
+   return IS_ENABLED(CONFIG_KVM_GUEST) && is_kvm_guest();
 }
 
-#endif
-
 static inline unsigned int kvm_arch_para_features(void)
 {
unsigned long r;
diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
index 244542ae2a91..852164439dcb 100644
--- a/arch/powerpc/kernel/Makefile
+++ b/arch/powerpc/kernel/Makefile
@@ -45,11 +45,10 @@ obj-y   := cputable.o 
syscalls.o \
   signal.o sysfs.o cacheinfo.o time.o \
   prom.o traps.o setup-common.o \
   udbg.o misc.o io.o misc_$(BITS).o \
-  of_platform.o prom_parse.o
+  of_platform.o prom_parse.o firmware.o
 obj-y  += ptrace/
 obj-$(CONFIG_PPC64)+= setup_64.o \
-  paca.o nvram_64.o firmware.o note.o \
-  syscall_64.o
+  paca.o nvram_64.o note.o syscall_64.o
 obj-$(CONFIG_COMPAT)   += sys_ppc32.o signal_32.o
 obj-$(CONFIG_VDSO32)   += vdso32/
 obj-$(CONFIG_PPC_WATCHDOG) += watchdog.o
diff --git a/arch/powerpc/kernel/firmware.c b/arch/powerpc/kernel/firmware.c
index cc4a5e3f51f1..fe48d319d490 100644
--- a/arch/powerpc/kernel/firmware.c
+++ b/arch/powerpc/kernel/firmware.c
@@ -11,8 +11,27 @@
 
 #include 
 #include 
+#include 
 
 #include 
 
+#ifdef CONFIG_PPC64
 unsigned long powerpc_firmware_features __read_mostly;
 EXPORT_SYMBOL_GPL(powerpc_firmware_features);
+#endif
+
+#if defined(CONFIG_PPC_PSERIES) || defined(CONFIG_KVM_GUEST)
+bool is_kvm_guest(void)
+{
+   struct device_node *hyper_node;
+
+   hyper_node = of_find_node_by_path("/hypervisor");
+   if (!hyper_node)
+   return 0;
+
+   if (!of_device_is_compatible(hyper_node, "linux,kvm"))
+   return 0;
+
+   return 1;
+}
+#endif
diff --git a/arch/powerpc/platforms/pseries/smp.c 
b/arch/powerpc/platforms/pseries/smp.c
index 67e6ad5076ce..7af0003b40b6 100644
--- a/arch/powerpc/platforms/pseries/smp.c
+++ b/arch/powerpc/platforms/pseries/smp.c
@@ -236,24 +236,32 @@ static __init void pSeries_smp_probe(void)
if (!cpu_has_feature(CPU_FTR_SMT))
return;
 
-   /*
-* KVM emulates doorbells by disabling FSCR[MSGP] so msgsndp faults
-* to the 

Re: [PATCH 04/11] ppc64/kexec_file: avoid stomping memory used by special regions

2020-06-30 Thread piliu



On 06/30/2020 02:10 PM, Hari Bathini wrote:
> 
> 
> On 30/06/20 9:00 am, piliu wrote:
>>
>>
>> On 06/29/2020 01:55 PM, Hari Bathini wrote:
>>>
>>>
>>> On 28/06/20 7:44 am, piliu wrote:
 Hi Hari,
>>>
>>> Hi Pingfan,
>>>

 After a quick through for this series, I have a few question/comment on
 this patch for the time being. Pls see comment inline.

 On 06/27/2020 03:05 AM, Hari Bathini wrote:
> crashkernel region could have an overlap with special memory regions
> like opal, rtas, tce-table & such. These regions are referred to as
> exclude memory ranges. Set up these ranges during image probe in order
> to avoid them while finding the buffer for different kdump segments.
>>>
>>> [...]
>>>
> + /*
> +  * Use the locate_mem_hole logic in kexec_add_buffer() for regular
> +  * kexec_file_load syscall
> +  */
> + if (kbuf->image->type != KEXEC_TYPE_CRASH)
> + return 0;
 Can the ranges overlap [crashk_res.start, crashk_res.end]?  Otherwise
 there is no requirement for @exclude_ranges.
>>>
>>> The ranges like rtas, opal are loaded by f/w. They almost always overlap 
>>> with
>>> crashkernel region. So, @exclude_ranges is required to support kdump.
>> f/w passes rtas/opal as services, so must f/w mark these ranges as
>> fdt_reserved_mem in order to make the kernel aware not to use these ranges?
> 
> It does. Actually, reserve_map + reserved-ranges are reserved as soon as
> memblock allocator is ready but not before crashkernel reservation.
> Check early_reserve_mem() call in kernel/prom.c
> 
>> Otherwise kernel memory allocation besides kdump can also overwrite
>> these ranges.
>> Hmm, revisiting reserve_crashkernel(). It seems not to take any reserved
>> memory into consideration except kernel text. Could it work based on memblock
>> allocator?
> 
> So, kdump could possibly overwrite these regions which is why an exclude
> range list is needed. Same thing was done in kexec-tools as well.
OK, got it.

Thanks,
Pingfan
> 
> Thanks
> Hari
> 
> ___
> kexec mailing list
> ke...@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/kexec
> 



[PATCH v2] ASoC: fsl_asrc: Add an option to select internal ratio mode

2020-06-30 Thread Shengjiu Wang
The ASRC not only supports ideal ratio mode, but also supports
internal ratio mode.

For internal ratio mode, the rate of the clock source must be divisible
with no remainder by the sample rate, otherwise there is sound
distortion.
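
For example (values assumed for illustration): a 24.576 MHz clock divides a
48 kHz sample rate exactly (divider 512, within the 1024 limit), while
44.1 kHz leaves a remainder and falls back to ideal ratio mode:

  #include <stdbool.h>

  /* Illustrative check mirroring the driver's criterion */
  static bool clk_usable(unsigned int clk_rate, unsigned int rate)
  {
          return clk_rate % rate == 0 && clk_rate / rate <= 1024;
  }

  /* clk_usable(24576000, 48000) -> true, clk_usable(24576000, 44100) -> false */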

Add function fsl_asrc_select_clk() to find proper clock source for
internal ratio mode, if the clock source is available then internal
ratio mode will be selected.

With this change, the ideal ratio mode is not the only option for the user.

Signed-off-by: Shengjiu Wang 
---
changes in v2
- update according to Nicolin's comments

 sound/soc/fsl/fsl_asrc.c | 54 ++--
 1 file changed, 52 insertions(+), 2 deletions(-)

diff --git a/sound/soc/fsl/fsl_asrc.c b/sound/soc/fsl/fsl_asrc.c
index 95f6a9617b0b..4105ef2c4f99 100644
--- a/sound/soc/fsl/fsl_asrc.c
+++ b/sound/soc/fsl/fsl_asrc.c
@@ -582,11 +582,55 @@ static int fsl_asrc_dai_startup(struct snd_pcm_substream 
*substream,
SNDRV_PCM_HW_PARAM_RATE, _asrc_rate_constraints);
 }
 
+/**
+ * Select proper clock source for internal ratio mode
+ */
+static int fsl_asrc_select_clk(struct fsl_asrc_priv *asrc_priv,
+  struct fsl_asrc_pair *pair,
+  int in_rate,
+  int out_rate)
+{
+   struct fsl_asrc_pair_priv *pair_priv = pair->private;
+   struct asrc_config *config = pair_priv->config;
+   int rate[2], select_clk[2]; /* Array size 2 means IN and OUT */
+   int clk_rate, clk_index;
+   int i = 0, j = 0;
+
+   rate[0] = in_rate;
+   rate[1] = out_rate;
+
+   /* Select proper clock source for internal ratio mode */
+   for (j = 0; j < 2; j++) {
+   for (i = 0; i < ASRC_CLK_MAP_LEN; i++) {
+   clk_index = asrc_priv->clk_map[j][i];
+   clk_rate = 
clk_get_rate(asrc_priv->asrck_clk[clk_index]);
+   /* Only match a perfect clock source with no remainder 
*/
+   if (clk_rate != 0 && (clk_rate / rate[j]) <= 1024 &&
+   (clk_rate % rate[j]) == 0)
+   break;
+   }
+
+   select_clk[j] = i;
+   }
+
+   /* Switch to ideal ratio mode if there is no proper clock source */
+   if (select_clk[IN] == ASRC_CLK_MAP_LEN || select_clk[OUT] == 
ASRC_CLK_MAP_LEN) {
+   select_clk[IN] = INCLK_NONE;
+   select_clk[OUT] = OUTCLK_ASRCK1_CLK;
+   }
+
+   config->inclk = select_clk[IN];
+   config->outclk = select_clk[OUT];
+
+   return 0;
+}
+
 static int fsl_asrc_dai_hw_params(struct snd_pcm_substream *substream,
  struct snd_pcm_hw_params *params,
  struct snd_soc_dai *dai)
 {
struct fsl_asrc *asrc = snd_soc_dai_get_drvdata(dai);
+   struct fsl_asrc_priv *asrc_priv = asrc->private;
struct snd_pcm_runtime *runtime = substream->runtime;
struct fsl_asrc_pair *pair = runtime->private_data;
struct fsl_asrc_pair_priv *pair_priv = pair->private;
@@ -605,8 +649,6 @@ static int fsl_asrc_dai_hw_params(struct snd_pcm_substream 
*substream,
 
config.pair = pair->index;
config.channel_num = channels;
-   config.inclk = INCLK_NONE;
-   config.outclk = OUTCLK_ASRCK1_CLK;
 
if (substream->stream == SNDRV_PCM_STREAM_PLAYBACK) {
config.input_format   = params_format(params);
@@ -620,6 +662,14 @@ static int fsl_asrc_dai_hw_params(struct snd_pcm_substream 
*substream,
config.output_sample_rate = rate;
}
 
+   ret = fsl_asrc_select_clk(asrc_priv, pair,
+ config.input_sample_rate,
+ config.output_sample_rate);
+   if (ret) {
+   dev_err(dai->dev, "fail to select clock\n");
+   return ret;
+   }
+
ret = fsl_asrc_config_pair(pair, false);
if (ret) {
dev_err(dai->dev, "fail to config asrc pair\n");
-- 
2.21.0



Re: [PATCH 00/13] iommu: Remove usage of dev->archdata.iommu

2020-06-30 Thread Joerg Roedel
On Thu, Jun 25, 2020 at 03:08:23PM +0200, Joerg Roedel wrote:
> Joerg Roedel (13):
>   iommu/exynos: Use dev_iommu_priv_get/set()
>   iommu/vt-d: Use dev_iommu_priv_get/set()
>   iommu/msm: Use dev_iommu_priv_get/set()
>   iommu/omap: Use dev_iommu_priv_get/set()
>   iommu/rockchip: Use dev_iommu_priv_get/set()
>   iommu/tegra: Use dev_iommu_priv_get/set()
>   iommu/pamu: Use dev_iommu_priv_get/set()
>   iommu/mediatek: Do no use dev->archdata.iommu
>   x86: Remove dev->archdata.iommu pointer
>   ia64: Remove dev->archdata.iommu pointer
>   arm: Remove dev->archdata.iommu pointer
>   arm64: Remove dev->archdata.iommu pointer
>   powerpc/dma: Remove dev->archdata.iommu_domain

Applied.


Re: [PATCH 3/3] powerpc/pseries: Add KVM guest doorbell restrictions

2020-06-30 Thread Paul Mackerras
On Tue, Jun 30, 2020 at 03:35:08PM +1000, Nicholas Piggin wrote:
> Excerpts from Paul Mackerras's message of June 30, 2020 12:27 pm:
> > On Sun, Jun 28, 2020 at 01:04:28AM +1000, Nicholas Piggin wrote:
> >> KVM guests have certain restrictions and performance quirks when
> >> using doorbells. This patch tests for KVM environment in doorbell
> >> setup, and optimises IPI performance:
> >> 
> >>  - PowerVM guests may now use doorbells even if they are secure.
> >> 
> >>  - KVM guests no longer use doorbells if XIVE is available.
> > 
> > It seems, from the fact that you completely remove
> > kvm_para_available(), that you perhaps haven't tried building with
> > CONFIG_KVM_GUEST=y.
> 
> It's still there and builds:

OK, good, I missed that.

> static inline int kvm_para_available(void)
> {
> return IS_ENABLED(CONFIG_KVM_GUEST) && is_kvm_guest();
> }
> 
> but...
> 
> > Somewhat confusingly, that option is not used or
> > needed when building for a PAPR guest (i.e. the "pseries" platform)
> > but is used on non-IBM platforms using the "epapr" hypervisor
> > interface.
> 
> ... is_kvm_guest() returns false on !PSERIES now.

And therefore kvm_para_available() returns false on all the platforms
where the code that depends on it could actually be used.

It's not correct to assume that !PSERIES means not a KVM guest.

> Not intended
> to break EPAPR. I'm not sure of a good way to share this between
> EPAPR and PSERIES, I might just make a copy of it but I'll see.

OK, so you're doing a new version?

Regards,
Paul.


[PATCH v2 1/3] powerpc: inline doorbell sending functions

2020-06-30 Thread Nicholas Piggin
These are only called in one place for a given platform, so inline them
for performance.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/include/asm/dbell.h | 63 ++--
 arch/powerpc/kernel/dbell.c  | 55 
 2 files changed, 60 insertions(+), 58 deletions(-)

diff --git a/arch/powerpc/include/asm/dbell.h b/arch/powerpc/include/asm/dbell.h
index 4ce6808deed3..f19d2282e3f8 100644
--- a/arch/powerpc/include/asm/dbell.h
+++ b/arch/powerpc/include/asm/dbell.h
@@ -13,6 +13,7 @@
 
 #include 
 #include 
+#include 
 
 #define PPC_DBELL_MSG_BRDCAST  (0x0400)
 #define PPC_DBELL_TYPE(x)  (((x) & 0xf) << (63-36))
@@ -87,9 +88,6 @@ static inline void ppc_msgsync(void)
 
 #endif /* CONFIG_PPC_BOOK3S */
 
-extern void doorbell_global_ipi(int cpu);
-extern void doorbell_core_ipi(int cpu);
-extern int doorbell_try_core_ipi(int cpu);
 extern void doorbell_exception(struct pt_regs *regs);
 
 static inline void ppc_msgsnd(enum ppc_dbell type, u32 flags, u32 tag)
@@ -100,4 +98,63 @@ static inline void ppc_msgsnd(enum ppc_dbell type, u32 
flags, u32 tag)
_ppc_msgsnd(msg);
 }
 
+#ifdef CONFIG_SMP
+
+/*
+ * Doorbells must only be used if CPU_FTR_DBELL is available.
+ * msgsnd is used in HV, and msgsndp is used in !HV.
+ *
+ * These should be used by platform code that is aware of restrictions.
+ * Other arch code should use ->cause_ipi.
+ *
+ * doorbell_global_ipi() sends a dbell to any target CPU.
+ * Must be used only by architectures that address msgsnd target
+ * by PIR/get_hard_smp_processor_id.
+ */
+static inline void doorbell_global_ipi(int cpu)
+{
+   u32 tag = get_hard_smp_processor_id(cpu);
+
+   kvmppc_set_host_ipi(cpu);
+   /* Order previous accesses vs. msgsnd, which is treated as a store */
+   ppc_msgsnd_sync();
+   ppc_msgsnd(PPC_DBELL_MSGTYPE, 0, tag);
+}
+
+/*
+ * doorbell_core_ipi() sends a dbell to a target CPU in the same core.
+ * Must be used only by architectures that address msgsnd target
+ * by TIR/cpu_thread_in_core.
+ */
+static inline void doorbell_core_ipi(int cpu)
+{
+   u32 tag = cpu_thread_in_core(cpu);
+
+   kvmppc_set_host_ipi(cpu);
+   /* Order previous accesses vs. msgsnd, which is treated as a store */
+   ppc_msgsnd_sync();
+   ppc_msgsnd(PPC_DBELL_MSGTYPE, 0, tag);
+}
+
+/*
+ * Attempt to cause a core doorbell if destination is on the same core.
+ * Returns 1 on success, 0 on failure.
+ */
+static inline int doorbell_try_core_ipi(int cpu)
+{
+   int this_cpu = get_cpu();
+   int ret = 0;
+
+   if (cpumask_test_cpu(cpu, cpu_sibling_mask(this_cpu))) {
+   doorbell_core_ipi(cpu);
+   ret = 1;
+   }
+
+   put_cpu();
+
+   return ret;
+}
+
+#endif /* CONFIG_SMP */
+
 #endif /* _ASM_POWERPC_DBELL_H */
diff --git a/arch/powerpc/kernel/dbell.c b/arch/powerpc/kernel/dbell.c
index f17ff1200eaa..52680cf07c9d 100644
--- a/arch/powerpc/kernel/dbell.c
+++ b/arch/powerpc/kernel/dbell.c
@@ -18,61 +18,6 @@
 
 #ifdef CONFIG_SMP
 
-/*
- * Doorbells must only be used if CPU_FTR_DBELL is available.
- * msgsnd is used in HV, and msgsndp is used in !HV.
- *
- * These should be used by platform code that is aware of restrictions.
- * Other arch code should use ->cause_ipi.
- *
- * doorbell_global_ipi() sends a dbell to any target CPU.
- * Must be used only by architectures that address msgsnd target
- * by PIR/get_hard_smp_processor_id.
- */
-void doorbell_global_ipi(int cpu)
-{
-   u32 tag = get_hard_smp_processor_id(cpu);
-
-   kvmppc_set_host_ipi(cpu);
-   /* Order previous accesses vs. msgsnd, which is treated as a store */
-   ppc_msgsnd_sync();
-   ppc_msgsnd(PPC_DBELL_MSGTYPE, 0, tag);
-}
-
-/*
- * doorbell_core_ipi() sends a dbell to a target CPU in the same core.
- * Must be used only by architectures that address msgsnd target
- * by TIR/cpu_thread_in_core.
- */
-void doorbell_core_ipi(int cpu)
-{
-   u32 tag = cpu_thread_in_core(cpu);
-
-   kvmppc_set_host_ipi(cpu);
-   /* Order previous accesses vs. msgsnd, which is treated as a store */
-   ppc_msgsnd_sync();
-   ppc_msgsnd(PPC_DBELL_MSGTYPE, 0, tag);
-}
-
-/*
- * Attempt to cause a core doorbell if destination is on the same core.
- * Returns 1 on success, 0 on failure.
- */
-int doorbell_try_core_ipi(int cpu)
-{
-   int this_cpu = get_cpu();
-   int ret = 0;
-
-   if (cpumask_test_cpu(cpu, cpu_sibling_mask(this_cpu))) {
-   doorbell_core_ipi(cpu);
-   ret = 1;
-   }
-
-   put_cpu();
-
-   return ret;
-}
-
 void doorbell_exception(struct pt_regs *regs)
 {
struct pt_regs *old_regs = set_irq_regs(regs);
-- 
2.23.0



Re: [PATCH] selftests/powerpc: Fix build issue with output directory

2020-06-30 Thread Michael Ellerman
On Thu, 25 Jun 2020 22:27:21 +0530, Harish wrote:
> We use OUTPUT directory as TMPOUT for checking no-pie option. When
> building powerpc/ from selftests directory, the OUTPUT directory
> eventually points to powerpc/pmu/ebb/ and gets removed when
> checking for -no-pie option in try-run routine, subsequently build
> fails with the following
> 
> $ make -C powerpc
> ...
> ...
> TARGET=ebb; BUILD_TARGET=$OUTPUT/$TARGET; mkdir -p $BUILD_TARGET; make 
> OUTPUT=$BUILD_TARGET -k -C $TARGET all
> make[2]: Entering directory 
> '/home/linux-master/tools/testing/selftests/powerpc/pmu/ebb'
> make[2]: *** No rule to make target 'Makefile'.
> make[2]: Failed to remake makefile 'Makefile'.
> make[2]: *** No rule to make target 'ebb.c', needed by 
> '/home/linux-master/tools/testing/selftests/powerpc/pmu/ebb/reg_access_test'.
> make[2]: *** No rule to make target 'ebb_handler.S', needed by 
> '/home/linux-master/tools/testing/selftests/powerpc/pmu/ebb/reg_access_test'.
> make[2]: *** No rule to make target 'trace.c', needed by 
> '/home/linux-master/tools/testing/selftests/powerpc/pmu/ebb/reg_access_test'.
> make[2]: *** No rule to make target 'busy_loop.S', needed by 
> '/home/linux-master/tools/testing/selftests/powerpc/pmu/ebb/reg_access_test'.
> make[2]: Target 'all' not remade because of errors.
> 
> [...]

Applied to powerpc/fixes.

[1/1] selftests/powerpc: Fix build failure in ebb tests
  https://git.kernel.org/powerpc/c/896066aa0685af3434637998b76218c2045142a8

cheers


[PATCH v3] ASoC: fsl_asrc: Add an option to select internal ratio mode

2020-06-30 Thread Shengjiu Wang
The ASRC not only supports ideal ratio mode, but also supports
internal ratio mode.

For internal ratio mode, the rate of the clock source must be divisible
by the sample rate with no remainder, otherwise there is sound
distortion.
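
A minimal standalone sketch of that divisor test, using illustrative rates
(a 24.576 MHz clock against 48 kHz and 44.1 kHz; the numbers are not taken
from this patch):

  /* build with: gcc -o ratio_check ratio_check.c */
  #include <stdio.h>

  static int clk_usable(unsigned long clk_rate, unsigned long sample_rate)
  {
          /* usable for internal ratio mode: non-zero clock, divider fits
           * in 1024, and the division leaves no remainder */
          return clk_rate != 0 &&
                 clk_rate / sample_rate <= 1024 &&
                 clk_rate % sample_rate == 0;
  }

  int main(void)
  {
          printf("%d\n", clk_usable(24576000, 48000)); /* 1: internal ratio mode */
          printf("%d\n", clk_usable(24576000, 44100)); /* 0: fall back to ideal ratio */
          return 0;
  }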

Add function fsl_asrc_select_clk() to find a proper clock source for
internal ratio mode; if such a clock source is available, internal
ratio mode will be selected.

With this change, ideal ratio mode is no longer the only option for
the user.

Signed-off-by: Shengjiu Wang 
---
changes in v3
- convert fsl_asrc_select_clk to void type

changes in v2
- update according to Nicolin's comments

 sound/soc/fsl/fsl_asrc.c | 46 ++--
 1 file changed, 44 insertions(+), 2 deletions(-)

diff --git a/sound/soc/fsl/fsl_asrc.c b/sound/soc/fsl/fsl_asrc.c
index 95f6a9617b0b..462ce9f9ab48 100644
--- a/sound/soc/fsl/fsl_asrc.c
+++ b/sound/soc/fsl/fsl_asrc.c
@@ -582,11 +582,51 @@ static int fsl_asrc_dai_startup(struct snd_pcm_substream 
*substream,
SNDRV_PCM_HW_PARAM_RATE, _asrc_rate_constraints);
 }
 
+/* Select proper clock source for internal ratio mode */
+static void fsl_asrc_select_clk(struct fsl_asrc_priv *asrc_priv,
+   struct fsl_asrc_pair *pair,
+   int in_rate,
+   int out_rate)
+{
+   struct fsl_asrc_pair_priv *pair_priv = pair->private;
+   struct asrc_config *config = pair_priv->config;
+   int rate[2], select_clk[2]; /* Array size 2 means IN and OUT */
+   int clk_rate, clk_index;
+   int i = 0, j = 0;
+
+   rate[IN] = in_rate;
+   rate[OUT] = out_rate;
+
+   /* Select proper clock source for internal ratio mode */
+   for (j = 0; j < 2; j++) {
+   for (i = 0; i < ASRC_CLK_MAP_LEN; i++) {
+   clk_index = asrc_priv->clk_map[j][i];
+   clk_rate = 
clk_get_rate(asrc_priv->asrck_clk[clk_index]);
+   /* Only match a perfect clock source with no remainder 
*/
+   if (clk_rate != 0 && (clk_rate / rate[j]) <= 1024 &&
+   (clk_rate % rate[j]) == 0)
+   break;
+   }
+
+   select_clk[j] = i;
+   }
+
+   /* Switch to ideal ratio mode if there is no proper clock source */
+   if (select_clk[IN] == ASRC_CLK_MAP_LEN || select_clk[OUT] == 
ASRC_CLK_MAP_LEN) {
+   select_clk[IN] = INCLK_NONE;
+   select_clk[OUT] = OUTCLK_ASRCK1_CLK;
+   }
+
+   config->inclk = select_clk[IN];
+   config->outclk = select_clk[OUT];
+}
+
 static int fsl_asrc_dai_hw_params(struct snd_pcm_substream *substream,
  struct snd_pcm_hw_params *params,
  struct snd_soc_dai *dai)
 {
struct fsl_asrc *asrc = snd_soc_dai_get_drvdata(dai);
+   struct fsl_asrc_priv *asrc_priv = asrc->private;
struct snd_pcm_runtime *runtime = substream->runtime;
struct fsl_asrc_pair *pair = runtime->private_data;
struct fsl_asrc_pair_priv *pair_priv = pair->private;
@@ -605,8 +645,6 @@ static int fsl_asrc_dai_hw_params(struct snd_pcm_substream 
*substream,
 
config.pair = pair->index;
config.channel_num = channels;
-   config.inclk = INCLK_NONE;
-   config.outclk = OUTCLK_ASRCK1_CLK;
 
if (substream->stream == SNDRV_PCM_STREAM_PLAYBACK) {
config.input_format   = params_format(params);
@@ -620,6 +658,10 @@ static int fsl_asrc_dai_hw_params(struct snd_pcm_substream 
*substream,
config.output_sample_rate = rate;
}
 
+   fsl_asrc_select_clk(asrc_priv, pair,
+   config.input_sample_rate,
+   config.output_sample_rate);
+
ret = fsl_asrc_config_pair(pair, false);
if (ret) {
dev_err(dai->dev, "fail to config asrc pair\n");
-- 
2.21.0



Re: [PATCH] ASoC: fsl_asrc: Add an option to select internal ratio mode

2020-06-30 Thread Fabio Estevam
Hi Shengjiu,

On Mon, Jun 29, 2020 at 11:10 AM Shengjiu Wang  wrote:

> +/**

"/**" notation may confuse 'make htmldocs". Since this is a single
line comment you could do:

/* Select proper clock source for internal ratio mode */


> + * Select proper clock source for internal ratio mode
> + */
> +static int fsl_asrc_select_clk(struct fsl_asrc_priv *asrc_priv,
> +  struct fsl_asrc_pair *pair,
> +  int in_rate,
> +  int out_rate)
> +{
> +   struct fsl_asrc_pair_priv *pair_priv = pair->private;
> +   struct asrc_config *config = pair_priv->config;
> +   int rate[2], select_clk[2]; /* Array size 2 means IN and OUT */
> +   int clk_rate, clk_index;
> +   int i = 0, j = 0;
> +   bool clk_sel[2];
> +
> +   rate[0] = in_rate;
> +   rate[1] = out_rate;
> +
> +   /* Select proper clock source for internal ratio mode */
> +   for (j = 0; j < 2; j++) {
> +   for (i = 0; i < ASRC_CLK_MAP_LEN; i++) {
> +   clk_index = asrc_priv->clk_map[j][i];
> +   clk_rate = 
> clk_get_rate(asrc_priv->asrck_clk[clk_index]);
> +   if (clk_rate != 0 && (clk_rate / rate[j]) <= 1024 &&
> +   (clk_rate % rate[j]) == 0)
> +   break;
> +   }
> +
> +   if (i == ASRC_CLK_MAP_LEN) {
> +   select_clk[j] = OUTCLK_ASRCK1_CLK;
> +   clk_sel[j] = false;
> +   } else {
> +   select_clk[j] = i;
> +   clk_sel[j] = true;
> +   }
> +   }
> +
> +   /* Switch to ideal ratio mode if there is no proper clock source */
> +   if (!clk_sel[IN] || !clk_sel[OUT])
> +   select_clk[IN] = INCLK_NONE;
> +
> +   config->inclk = select_clk[IN];
> +   config->outclk = select_clk[OUT];
> +
> +   return 0;

This new function always returns 0. Should it be converted to 'void'
type instead?

> +   ret = fsl_asrc_select_clk(asrc_priv, pair,
> + config.input_sample_rate,
> + config.output_sample_rate);
> +   if (ret) {
> +   dev_err(dai->dev, "fail to select clock\n");

fsl_asrc_select_clk() does not return error, so you could skip the
error checking.


Re: [PATCH] ASoC: fsl_asrc: Add an option to select internal ratio mode

2020-06-30 Thread Shengjiu Wang
On Tue, Jun 30, 2020 at 8:38 PM Fabio Estevam  wrote:
>
> Hi Shengjiu,
>
> On Mon, Jun 29, 2020 at 11:10 AM Shengjiu Wang  wrote:
>
> > +/**
>
> "/**" notation may confuse 'make htmldocs". Since this is a single
> line comment you could do:
>
> /* Select proper clock source for internal ratio mode */
>
>
> > + * Select proper clock source for internal ratio mode
> > + */
> > +static int fsl_asrc_select_clk(struct fsl_asrc_priv *asrc_priv,
> > +  struct fsl_asrc_pair *pair,
> > +  int in_rate,
> > +  int out_rate)
> > +{
> > +   struct fsl_asrc_pair_priv *pair_priv = pair->private;
> > +   struct asrc_config *config = pair_priv->config;
> > +   int rate[2], select_clk[2]; /* Array size 2 means IN and OUT */
> > +   int clk_rate, clk_index;
> > +   int i = 0, j = 0;
> > +   bool clk_sel[2];
> > +
> > +   rate[0] = in_rate;
> > +   rate[1] = out_rate;
> > +
> > +   /* Select proper clock source for internal ratio mode */
> > +   for (j = 0; j < 2; j++) {
> > +   for (i = 0; i < ASRC_CLK_MAP_LEN; i++) {
> > +   clk_index = asrc_priv->clk_map[j][i];
> > +   clk_rate = 
> > clk_get_rate(asrc_priv->asrck_clk[clk_index]);
> > +   if (clk_rate != 0 && (clk_rate / rate[j]) <= 1024 &&
> > +   (clk_rate % rate[j]) == 0)
> > +   break;
> > +   }
> > +
> > +   if (i == ASRC_CLK_MAP_LEN) {
> > +   select_clk[j] = OUTCLK_ASRCK1_CLK;
> > +   clk_sel[j] = false;
> > +   } else {
> > +   select_clk[j] = i;
> > +   clk_sel[j] = true;
> > +   }
> > +   }
> > +
> > +   /* Switch to ideal ratio mode if there is no proper clock source */
> > +   if (!clk_sel[IN] || !clk_sel[OUT])
> > +   select_clk[IN] = INCLK_NONE;
> > +
> > +   config->inclk = select_clk[IN];
> > +   config->outclk = select_clk[OUT];
> > +
> > +   return 0;
>
> This new function always returns 0. Should it be converted to 'void'
> type instead?
>
> > +   ret = fsl_asrc_select_clk(asrc_priv, pair,
> > + config.input_sample_rate,
> > + config.output_sample_rate);
> > +   if (ret) {
> > +   dev_err(dai->dev, "fail to select clock\n");
>
> fsl_asrc_select_clk() does not return error, so you could skip the
> error checking.

ok, will update the patch

best regards
wang shengjiu


[PATCH 30/30] misc: cxl: flash: Remove unused pointer

2020-06-30 Thread Lee Jones
The DRC index pointer is updated on an OPCODE_ADD, but never
actually read.  Remove the unused pointer and shift OPCODE_ADD up
to group it with OPCODE_DELETE, which is also a no-op.

Fixes the following W=1 kernel build warning:

 drivers/misc/cxl/flash.c: In function ‘update_devicetree’:
 drivers/misc/cxl/flash.c:178:16: warning: variable ‘drc_index’ set but not 
used [-Wunused-but-set-variable]
 178 | __be32 *data, drc_index, phandle;
 | ^

Cc: Frederic Barrat 
Cc: Andrew Donnellan 
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Lee Jones 
---
 drivers/misc/cxl/flash.c | 7 ++-
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/drivers/misc/cxl/flash.c b/drivers/misc/cxl/flash.c
index cb9cca35a2263..24e3dfcc91a74 100644
--- a/drivers/misc/cxl/flash.c
+++ b/drivers/misc/cxl/flash.c
@@ -175,7 +175,7 @@ static int update_devicetree(struct cxl *adapter, s32 scope)
struct update_nodes_workarea *unwa;
u32 action, node_count;
int token, rc, i;
-   __be32 *data, drc_index, phandle;
+   __be32 *data, phandle;
char *buf;
 
token = rtas_token("ibm,update-nodes");
@@ -206,15 +206,12 @@ static int update_devicetree(struct cxl *adapter, s32 
scope)
 
switch (action) {
case OPCODE_DELETE:
+   case OPCODE_ADD:
/* nothing to do */
break;
case OPCODE_UPDATE:
update_node(phandle, scope);
break;
-   case OPCODE_ADD:
-   /* nothing to do, just move pointer */
-   drc_index = *data++;
-   break;
}
}
}
-- 
2.25.1



Re: [PATCH v2] powerpc/uaccess: Use flexible addressing with __put_user()/__get_user()

2020-06-30 Thread Christophe Leroy




Le 30/06/2020 à 03:19, Michael Ellerman a écrit :

Michael Ellerman  writes:

Christophe Leroy  writes:

Hi Michael,

I see this patch is marked as "defered" in patchwork, but I can't see
any related discussion. Is it normal ?


Because it uses the "m<>" constraint which didn't work on GCC 4.6.

https://github.com/linuxppc/issues/issues/297

So we should be able to pick it up for v5.9 hopefully.


It seems to break the build with the kernel.org 4.9.4 compiler and
corenet64_smp_defconfig:


Looks like 4.9.4 doesn't accept "m<>" constraint either.
Changing it to "m" makes it build.

Christophe



+ make -s CC=powerpc64-linux-gnu-gcc -j 160
In file included from /linux/include/linux/uaccess.h:11:0,
  from /linux/include/linux/sched/task.h:11,
  from /linux/include/linux/sched/signal.h:9,
  from /linux/include/linux/rcuwait.h:6,
  from /linux/include/linux/percpu-rwsem.h:7,
  from /linux/include/linux/fs.h:33,
  from /linux/include/linux/huge_mm.h:8,
  from /linux/include/linux/mm.h:675,
  from /linux/arch/powerpc/kernel/signal_32.c:17:
/linux/arch/powerpc/kernel/signal_32.c: In function 
'save_user_regs.isra.14.constprop':
/linux/arch/powerpc/include/asm/uaccess.h:161:2: error: 'asm' operand has 
impossible constraints
   __asm__ __volatile__( \
   ^
/linux/arch/powerpc/include/asm/uaccess.h:197:12: note: in expansion of macro 
'__put_user_asm'
 case 4: __put_user_asm(x, ptr, retval, "stw"); break; \
 ^
/linux/arch/powerpc/include/asm/uaccess.h:206:2: note: in expansion of macro 
'__put_user_size_allowed'
   __put_user_size_allowed(x, ptr, size, retval);  \
   ^
/linux/arch/powerpc/include/asm/uaccess.h:220:2: note: in expansion of macro 
'__put_user_size'
   __put_user_size(__pu_val, __pu_addr, __pu_size, __pu_err); \
   ^
/linux/arch/powerpc/include/asm/uaccess.h:96:2: note: in expansion of macro 
'__put_user_nocheck'
   __put_user_nocheck((__typeof__(*(ptr)))(x), (ptr), sizeof(*(ptr)))
   ^
/linux/arch/powerpc/kernel/signal_32.c:120:7: note: in expansion of macro 
'__put_user'
if (__put_user((unsigned int)gregs[i], >mc_gregs[i]))
^
/linux/scripts/Makefile.build:280: recipe for target 
'arch/powerpc/kernel/signal_32.o' failed
make[3]: *** [arch/powerpc/kernel/signal_32.o] Error 1
make[3]: *** Waiting for unfinished jobs
In file included from /linux/include/linux/uaccess.h:11:0,
  from /linux/include/linux/sched/task.h:11,
  from /linux/include/linux/sched/signal.h:9,
  from /linux/include/linux/rcuwait.h:6,
  from /linux/include/linux/percpu-rwsem.h:7,
  from /linux/include/linux/fs.h:33,
  from /linux/include/linux/huge_mm.h:8,
  from /linux/include/linux/mm.h:675,
  from /linux/arch/powerpc/kernel/signal_64.c:12:
/linux/arch/powerpc/kernel/signal_64.c: In function '__se_sys_swapcontext':
/linux/arch/powerpc/include/asm/uaccess.h:319:2: error: 'asm' operand has 
impossible constraints
   __asm__ __volatile__(\
   ^
/linux/arch/powerpc/include/asm/uaccess.h:359:10: note: in expansion of macro 
'__get_user_asm'
   case 1: __get_user_asm(x, (u8 __user *)ptr, retval, "lbz"); break; \
   ^
/linux/arch/powerpc/include/asm/uaccess.h:370:2: note: in expansion of macro 
'__get_user_size_allowed'
   __get_user_size_allowed(x, ptr, size, retval);  \
   ^
/linux/arch/powerpc/include/asm/uaccess.h:393:3: note: in expansion of macro 
'__get_user_size'
__get_user_size(__gu_val, __gu_addr, __gu_size, __gu_err); \
^
/linux/arch/powerpc/include/asm/uaccess.h:94:2: note: in expansion of macro 
'__get_user_nocheck'
   __get_user_nocheck((x), (ptr), sizeof(*(ptr)), true)
   ^
/linux/arch/powerpc/kernel/signal_64.c:672:9: note: in expansion of macro 
'__get_user'
   || __get_user(tmp, (u8 __user *) new_ctx + ctx_size - 1))
  ^
/linux/scripts/Makefile.build:280: recipe for target 
'arch/powerpc/kernel/signal_64.o' failed
make[3]: *** [arch/powerpc/kernel/signal_64.o] Error 1
/linux/scripts/Makefile.build:497: recipe for target 'arch/powerpc/kernel' 
failed
make[2]: *** [arch/powerpc/kernel] Error 2
/linux/Makefile:1756: recipe for target 'arch/powerpc' failed
make[1]: *** [arch/powerpc] Error 2
Makefile:185: recipe for target '__sub-make' failed
make: *** [__sub-make] Error 2


cheers



[Bug 208181] BUG: KASAN: stack-out-of-bounds in strcmp+0x58/0xd8

2020-06-30 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=208181

--- Comment #8 from Christophe Leroy (christophe.le...@csgroup.eu) ---
block_address_translation contains funny sizes, but the addresses seem ok.
So it shows you have a 24 MB text+rodata area. 8 BATs are used
(16+8+8+32+64+128+256+256).
By increasing CONFIG_DATA_SHIFT to 25, you'll get a 32 MB alignment,
so you will have only 6 BATs used (32+32+64+128+256+256) and two additional
data BATs will be available for KASAN.

But regardless of the BAT stuff, KASAN should work properly.

-- 
You are receiving this mail because:
You are watching the assignee of the bug.

Re: [PATCH 11/20] fs: remove a weird comment in submit_bh_wbc

2020-06-30 Thread Jens Axboe
On 6/29/20 1:39 PM, Christoph Hellwig wrote:
> All bios can get remapped if submitted to partitions.  No need to
> comment on that.

I'm pretty sure that comment is from me, dating back to when the bio
code was introduced in 2001. The point wasn't the remapping, just
that from here on down the IO was purely bio based, not buffer_heads.
Anyway, totally agree that it should just die, it's not that
interesting or useful anymore.

-- 
Jens Axboe



[PATCH 28/30] misc: ocxl: config: Provide correct formatting to function headers

2020-06-30 Thread Lee Jones
A nice attempt was made to provide kerneldoc headers for
read_template_version() and read_afu_lpc_memory_info(); however,
the provided formatting does not match what is expected by
kerneldoc.

Fixes the following W=1 warnings:

 drivers/misc/ocxl/config.c:286: warning: Function parameter or member 'dev' 
not described in 'read_template_version'
 drivers/misc/ocxl/config.c:286: warning: Function parameter or member 'fn' not 
described in 'read_template_version'
 drivers/misc/ocxl/config.c:286: warning: Function parameter or member 'len' 
not described in 'read_template_version'
 drivers/misc/ocxl/config.c:286: warning: Function parameter or member 
'version' not described in 'read_template_version'
 drivers/misc/ocxl/config.c:489: warning: Function parameter or member 'dev' 
not described in 'read_afu_lpc_memory_info'
 drivers/misc/ocxl/config.c:489: warning: Function parameter or member 'fn' not 
described in 'read_afu_lpc_memory_info'
 drivers/misc/ocxl/config.c:489: warning: Function parameter or member 'afu' 
not described in 'read_afu_lpc_memory_info'

Cc: Frederic Barrat 
Cc: Andrew Donnellan 
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Lee Jones 
---
 drivers/misc/ocxl/config.c | 18 +-
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/drivers/misc/ocxl/config.c b/drivers/misc/ocxl/config.c
index c8e19bfb5ef90..e3b99a39d207e 100644
--- a/drivers/misc/ocxl/config.c
+++ b/drivers/misc/ocxl/config.c
@@ -273,11 +273,11 @@ static int read_afu_info(struct pci_dev *dev, struct 
ocxl_fn_config *fn,
 }
 
 /**
- * Read the template version from the AFU
- * dev: the device for the AFU
- * fn: the AFU offsets
- * len: outputs the template length
- * version: outputs the major<<8,minor version
+ * read_template_version - Read the template version from the AFU
+ * @dev: the device for the AFU
+ * @fn: the AFU offsets
+ * @len: outputs the template length
+ * @version: outputs the major<<8,minor version
  *
  * Returns 0 on success, negative on failure
  */
@@ -476,10 +476,10 @@ static int validate_afu(struct pci_dev *dev, struct 
ocxl_afu_config *afu)
 }
 
 /**
- * Populate AFU metadata regarding LPC memory
- * dev: the device for the AFU
- * fn: the AFU offsets
- * afu: the AFU struct to populate the LPC metadata into
+ * read_afu_lpc_memory_info - Populate AFU metadata regarding LPC memory
+ * @dev: the device for the AFU
+ * @fn: the AFU offsets
+ * @afu: the AFU struct to populate the LPC metadata into
  *
  * Returns 0 on success, negative on failure
  */
-- 
2.25.1



Re: [PATCH v3] ASoC: fsl_asrc: Add an option to select internal ratio mode

2020-06-30 Thread Fabio Estevam
On Tue, Jun 30, 2020 at 11:07 AM Shengjiu Wang  wrote:
>
> The ASRC not only supports ideal ratio mode, but also supports
> internal ratio mode.
>
> For internal rato mode, the rate of clock source should be divided
> with no remainder by sample rate, otherwise there is sound
> distortion.
>
> Add function fsl_asrc_select_clk() to find proper clock source for
> internal ratio mode, if the clock source is available then internal
> ratio mode will be selected.
>
> With change, the ideal ratio mode is not the only option for user.
>
> Signed-off-by: Shengjiu Wang 

Reviewed-by: Fabio Estevam 


Re: [PATCH updated] libnvdimm/nvdimm/flush: Allow architecture to override the flush barrier

2020-06-30 Thread Aneesh Kumar K.V


Updated patch.

>From 1e6aa6c4182e14ec5d6bf878ae44c3f69ebff745 Mon Sep 17 00:00:00 2001
From: "Aneesh Kumar K.V" 
Date: Tue, 12 May 2020 20:58:33 +0530
Subject: [PATCH] libnvdimm/nvdimm/flush: Allow architecture to override the
 flush barrier

Architectures like ppc64 provide persistent memory specific barriers
that will ensure that all stores for which the modifications are
written to persistent storage by preceding dcbfps and dcbstps
instructions have updated persistent storage before any data
access or data transfer caused by subsequent instructions is initiated.
This is in addition to the ordering done by wmb()

Update nvdimm core such that architecture can use barriers other than
wmb to ensure all previous writes are architecturally visible for
the platform buffer flush.
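
A short usage sketch of the resulting call pattern (kernel-style pseudo-code,
not from this patch; persist_record() is a made-up name, while
memcpy_flushcache() and pmem_wmb() are the helpers discussed here):

  /* Copy a record into a pmem mapping and make it durable. */
  static void persist_record(void *pmem_dst, const void *src, size_t len)
  {
          memcpy_flushcache(pmem_dst, src, len);  /* stores bypass the CPU cache */
          pmem_wmb();     /* stores reach the platform durability domain before
                           * any subsequent data access or transfer is initiated */
  }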

Signed-off-by: Aneesh Kumar K.V 
---
 Documentation/memory-barriers.txt | 14 ++
 drivers/md/dm-writecache.c|  2 +-
 drivers/nvdimm/region_devs.c  |  8 
 include/asm-generic/barrier.h | 10 ++
 4 files changed, 29 insertions(+), 5 deletions(-)

diff --git a/Documentation/memory-barriers.txt 
b/Documentation/memory-barriers.txt
index eaabc3134294..340273a6b18e 100644
--- a/Documentation/memory-barriers.txt
+++ b/Documentation/memory-barriers.txt
@@ -1935,6 +1935,20 @@ There are some more advanced barrier functions:
  relaxed I/O accessors and the Documentation/DMA-API.txt file for more
  information on consistent memory.
 
+ (*) pmem_wmb();
+
+ This is for use with persistent memory to ensure that stores for which
+ modifications are written to persistent storage have updated the 
persistent
+ storage.
+
+ For example, after a non-temporal write to pmem region, we use pmem_wmb()
+ to ensures that stores have updated the persistent storage. This ensures
+ that stores have updated persistent storage before any data access or
+ data transfer caused by subsequent instructions is initiated. This is
+ in addition to the ordering done by wmb().
+
+ For load from persistent memory, existing read memory barriers are 
sufficient
+ to ensure read ordering.
 
 ===
 IMPLICIT KERNEL MEMORY BARRIERS
diff --git a/drivers/md/dm-writecache.c b/drivers/md/dm-writecache.c
index 74f3c506f084..00534fa4a384 100644
--- a/drivers/md/dm-writecache.c
+++ b/drivers/md/dm-writecache.c
@@ -536,7 +536,7 @@ static void ssd_commit_superblock(struct dm_writecache *wc)
 static void writecache_commit_flushed(struct dm_writecache *wc, bool 
wait_for_ios)
 {
if (WC_MODE_PMEM(wc))
-   wmb();
+   pmem_wmb();
else
ssd_commit_flushed(wc, wait_for_ios);
 }
diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
index 4502f9c4708d..2333b290bdcf 100644
--- a/drivers/nvdimm/region_devs.c
+++ b/drivers/nvdimm/region_devs.c
@@ -1206,13 +1206,13 @@ int generic_nvdimm_flush(struct nd_region *nd_region)
idx = this_cpu_add_return(flush_idx, hash_32(current->pid + idx, 8));
 
/*
-* The first wmb() is needed to 'sfence' all previous writes
-* such that they are architecturally visible for the platform
-* buffer flush.  Note that we've already arranged for pmem
+* The first arch_pmem_flush_barrier() is needed to 'sfence' all
+* previous writes such that they are architecturally visible for
+* the platform buffer flush. Note that we've already arranged for pmem
 * writes to avoid the cache via memcpy_flushcache().  The final
 * wmb() ensures ordering for the NVDIMM flush write.
 */
-   wmb();
+   pmem_wmb();
for (i = 0; i < nd_region->ndr_mappings; i++)
if (ndrd_get_flush_wpq(ndrd, i, 0))
writeq(1, ndrd_get_flush_wpq(ndrd, i, idx));
diff --git a/include/asm-generic/barrier.h b/include/asm-generic/barrier.h
index 2eacaf7d62f6..879d68faec1d 100644
--- a/include/asm-generic/barrier.h
+++ b/include/asm-generic/barrier.h
@@ -257,5 +257,15 @@ do {   
\
 })
 #endif
 
+/*
+ * pmem_barrier() ensures that all stores for which the modification
+ * are written to persistent storage by preceding instructions have
+ * updated persistent storage before any data  access or data transfer
+ * caused by subsequent instructions is initiated.
+ */
+#ifndef pmem_wmb
+#define pmem_wmb() wmb()
+#endif
+
 #endif /* !__ASSEMBLY__ */
 #endif /* __ASM_GENERIC_BARRIER_H */
-- 
2.26.2



Re: rename ->make_request_fn and move it to the block_device_operations

2020-06-30 Thread Jens Axboe
On 6/29/20 1:39 PM, Christoph Hellwig wrote:
> Hi Jens,
> 
> this series moves the make_request_fn method into block_device_operations
> with the much more descriptive ->submit_bio name.  It then also gives
> generic_make_request a more descriptive name, and further optimize the
> path to issue to blk-mq, removing the need for the direct_make_request
> bypass.

Looks good to me, and it's a nice cleanup as well. Applied.

-- 
Jens Axboe



[PATCH 27/30] misc: cxl: hcalls: Demote half-assed kerneldoc attempt

2020-06-30 Thread Lee Jones
Function headers will need a lot of work before they reach the
standards expected of kerneldoc.  Demote them down to basic
comments/headers, for now at least.

Fixes the following W=1 kernel build warnings:

 drivers/misc/cxl/hcalls.c:175: warning: Function parameter or member 
'unit_address' not described in 'cxl_h_detach_process'
 drivers/misc/cxl/hcalls.c:175: warning: Function parameter or member 
'process_token' not described in 'cxl_h_detach_process'
 drivers/misc/cxl/hcalls.c:207: warning: Function parameter or member 
'unit_address' not described in 'cxl_h_control_function'
 drivers/misc/cxl/hcalls.c:207: warning: Function parameter or member 'op' not 
described in 'cxl_h_control_function'
 drivers/misc/cxl/hcalls.c:207: warning: Function parameter or member 'p1' not 
described in 'cxl_h_control_function'
 drivers/misc/cxl/hcalls.c:207: warning: Function parameter or member 'p2' not 
described in 'cxl_h_control_function'
 drivers/misc/cxl/hcalls.c:207: warning: Function parameter or member 'p3' not 
described in 'cxl_h_control_function'
 drivers/misc/cxl/hcalls.c:207: warning: Function parameter or member 'p4' not 
described in 'cxl_h_control_function'
 drivers/misc/cxl/hcalls.c:207: warning: Function parameter or member 'out' not 
described in 'cxl_h_control_function'
 drivers/misc/cxl/hcalls.c:245: warning: Function parameter or member 
'unit_address' not described in 'cxl_h_reset_afu'
 drivers/misc/cxl/hcalls.c:258: warning: Function parameter or member 
'unit_address' not described in 'cxl_h_suspend_process'
 drivers/misc/cxl/hcalls.c:258: warning: Function parameter or member 
'process_token' not described in 'cxl_h_suspend_process'
 drivers/misc/cxl/hcalls.c:271: warning: Function parameter or member 
'unit_address' not described in 'cxl_h_resume_process'
 drivers/misc/cxl/hcalls.c:271: warning: Function parameter or member 
'process_token' not described in 'cxl_h_resume_process'
 drivers/misc/cxl/hcalls.c:284: warning: Function parameter or member 
'unit_address' not described in 'cxl_h_read_error_state'
 drivers/misc/cxl/hcalls.c:284: warning: Function parameter or member 'state' 
not described in 'cxl_h_read_error_state'
 drivers/misc/cxl/hcalls.c:300: warning: Function parameter or member 
'unit_address' not described in 'cxl_h_get_afu_err'
 drivers/misc/cxl/hcalls.c:300: warning: Function parameter or member 'offset' 
not described in 'cxl_h_get_afu_err'
 drivers/misc/cxl/hcalls.c:300: warning: Function parameter or member 
'buf_address' not described in 'cxl_h_get_afu_err'
 drivers/misc/cxl/hcalls.c:300: warning: Function parameter or member 'len' not 
described in 'cxl_h_get_afu_err'
 drivers/misc/cxl/hcalls.c:320: warning: Function parameter or member 
'unit_address' not described in 'cxl_h_get_config'
 drivers/misc/cxl/hcalls.c:320: warning: Function parameter or member 'cr_num' 
not described in 'cxl_h_get_config'
 drivers/misc/cxl/hcalls.c:320: warning: Function parameter or member 'offset' 
not described in 'cxl_h_get_config'
 drivers/misc/cxl/hcalls.c:320: warning: Function parameter or member 
'buf_address' not described in 'cxl_h_get_config'
 drivers/misc/cxl/hcalls.c:320: warning: Function parameter or member 'len' not 
described in 'cxl_h_get_config'
 drivers/misc/cxl/hcalls.c:333: warning: Function parameter or member 
'unit_address' not described in 'cxl_h_terminate_process'
 drivers/misc/cxl/hcalls.c:333: warning: Function parameter or member 
'process_token' not described in 'cxl_h_terminate_process'
 drivers/misc/cxl/hcalls.c:351: warning: Function parameter or member 
'unit_address' not described in 'cxl_h_collect_vpd'
 drivers/misc/cxl/hcalls.c:351: warning: Function parameter or member 'record' 
not described in 'cxl_h_collect_vpd'
 drivers/misc/cxl/hcalls.c:351: warning: Function parameter or member 
'list_address' not described in 'cxl_h_collect_vpd'
 drivers/misc/cxl/hcalls.c:351: warning: Function parameter or member 'num' not 
described in 'cxl_h_collect_vpd'
 drivers/misc/cxl/hcalls.c:351: warning: Function parameter or member 'out' not 
described in 'cxl_h_collect_vpd'
 drivers/misc/cxl/hcalls.c:362: warning: Function parameter or member 
'unit_address' not described in 'cxl_h_get_fn_error_interrupt'
 drivers/misc/cxl/hcalls.c:362: warning: Function parameter or member 'reg' not 
described in 'cxl_h_get_fn_error_interrupt'
 drivers/misc/cxl/hcalls.c:374: warning: Function parameter or member 
'unit_address' not described in 'cxl_h_ack_fn_error_interrupt'
 drivers/misc/cxl/hcalls.c:374: warning: Function parameter or member 'value' 
not described in 'cxl_h_ack_fn_error_interrupt'
 drivers/misc/cxl/hcalls.c:386: warning: Function parameter or member 
'unit_address' not described in 'cxl_h_get_error_log'
 drivers/misc/cxl/hcalls.c:386: warning: Function parameter or member 'value' 
not described in 'cxl_h_get_error_log'
 drivers/misc/cxl/hcalls.c:399: warning: Function parameter or member 
'unit_address' not described in 'cxl_h_collect_int_info'
 

Re: [PATCH v2] powerpc/uaccess: Use flexible addressing with __put_user()/__get_user()

2020-06-30 Thread Christophe Leroy




On 06/30/2020 04:33 PM, Segher Boessenkool wrote:

On Tue, Jun 30, 2020 at 04:55:05PM +0200, Christophe Leroy wrote:

Le 30/06/2020 à 03:19, Michael Ellerman a écrit :

Michael Ellerman  writes:

Because it uses the "m<>" constraint which didn't work on GCC 4.6.

https://github.com/linuxppc/issues/issues/297

So we should be able to pick it up for v5.9 hopefully.


It seems to break the build with the kernel.org 4.9.4 compiler and
corenet64_smp_defconfig:


Looks like 4.9.4 doesn't accept "m<>" constraint either.


The evidence contradicts this assertion.


Changing it to "m" make it build.


But that just means something else is wrong.


+ make -s CC=powerpc64-linux-gnu-gcc -j 160
In file included from /linux/include/linux/uaccess.h:11:0,
  from /linux/include/linux/sched/task.h:11,
  from /linux/include/linux/sched/signal.h:9,
  from /linux/include/linux/rcuwait.h:6,
  from /linux/include/linux/percpu-rwsem.h:7,
  from /linux/include/linux/fs.h:33,
  from /linux/include/linux/huge_mm.h:8,
  from /linux/include/linux/mm.h:675,
  from /linux/arch/powerpc/kernel/signal_32.c:17:
/linux/arch/powerpc/kernel/signal_32.c: In function
'save_user_regs.isra.14.constprop':
/linux/arch/powerpc/include/asm/uaccess.h:161:2: error: 'asm' operand has
impossible constraints
   __asm__ __volatile__( \
   ^
/linux/arch/powerpc/include/asm/uaccess.h:197:12: note: in expansion of
macro '__put_user_asm'
 case 4: __put_user_asm(x, ptr, retval, "stw"); break; \
 ^
/linux/arch/powerpc/include/asm/uaccess.h:206:2: note: in expansion of
macro '__put_user_size_allowed'
   __put_user_size_allowed(x, ptr, size, retval);  \
   ^
/linux/arch/powerpc/include/asm/uaccess.h:220:2: note: in expansion of
macro '__put_user_size'
   __put_user_size(__pu_val, __pu_addr, __pu_size, __pu_err); \
   ^
/linux/arch/powerpc/include/asm/uaccess.h:96:2: note: in expansion of
macro '__put_user_nocheck'
   __put_user_nocheck((__typeof__(*(ptr)))(x), (ptr), sizeof(*(ptr)))
   ^
/linux/arch/powerpc/kernel/signal_32.c:120:7: note: in expansion of macro
'__put_user'
if (__put_user((unsigned int)gregs[i], >mc_gregs[i]))
^


Can we see what that was after the macro jungle?  Like, the actual
preprocessed code?



Sorry for the previous misunderstanding.

Here is the code:

#define __put_user_asm(x, addr, err, op)\
__asm__ __volatile__(   \
"1:" op "%U2%X2 %1,%2# put_user\n"  \
"2:\n"\
".section .fixup,\"ax\"\n"  \
"3:li %0,%3\n"\
"  b 2b\n"\
".previous\n" \
EX_TABLE(1b, 3b)\
: "=r" (err)  \
: "r" (x), "m<>" (*addr), "i" (-EFAULT), "0" (err))

Christophe


[Bug 208181] BUG: KASAN: stack-out-of-bounds in strcmp+0x58/0xd8

2020-06-30 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=208181

--- Comment #9 from Erhard F. (erhar...@mailbox.org) ---
Ok, thanks for the clarification! So if KASAN works properly something else
must be causing this hit. I will start a bisect in the next few days and see how that
turns out...

-- 
You are receiving this mail because:
You are watching the assignee of the bug.

Re: [PATCH V4 0/3] cpufreq: Allow default governor on cmdline and fix locking issues

2020-06-30 Thread Rafael J. Wysocki
On Mon, Jun 29, 2020 at 10:58 PM Viresh Kumar  wrote:
>
> Hi,
>
> I have picked Quentin's series over my patch, modified both and tested.
>
> V3->V4:
> - Do __module_get() for cpufreq_default_governor() case as well and get
>   rid of an extra variable.
> - Use a single character array, default_governor, instead of two of them.
>
> V2->V3:
> - default_governor is a string now and we don't set it on governor
>   registration or unregistration anymore.
> - Fixed locking issues in cpufreq_init_policy().
>
> --
> Viresh
>
> Original cover letter fro Quentin:
>
> This series enables users of prebuilt kernels (e.g. distro kernels) to
> specify their CPUfreq governor of choice using the kernel command line,
> instead of having to wait for the system to fully boot to userspace to
> switch using the sysfs interface. This is helpful for 2 reasons:
>   1. users get to choose the governor that runs during the actual boot;
>   2. it simplifies the userspace boot procedure a bit (one less thing to
>  worry about).
>
> To enable this, the first patch moves all governor init calls to
> core_initcall, to make sure they are registered by the time the drivers
> probe. This should be relatively low impact as registering a governor
> is a simple procedure (it gets added to a llist), and all governors
> already load at core_initcall anyway when they're set as the default
> in Kconfig. This also allows to clean-up the governors' init/exit code,
> and reduces boilerplate.
>
> The second patch introduces the new command line parameter, inspired by
> its cpuidle counterpart. More details can be found in the respective
> patch headers.
>
> Changes in v2:
>  - added Viresh's ack to patch 01
>  - moved the assignment of 'default_governor' in patch 02 to the governor
>registration path instead of the driver registration (Viresh)
>
> Quentin Perret (2):
>   cpufreq: Register governors at core_initcall
>   cpufreq: Specify default governor on command line
>
> Viresh Kumar (1):
>   cpufreq: Fix locking issues with governors
>
>  .../admin-guide/kernel-parameters.txt |  5 ++
>  Documentation/admin-guide/pm/cpufreq.rst  |  6 +-
>  .../platforms/cell/cpufreq_spudemand.c| 26 +-
>  drivers/cpufreq/cpufreq.c | 87 ---
>  drivers/cpufreq/cpufreq_conservative.c| 22 ++---
>  drivers/cpufreq/cpufreq_ondemand.c| 24 ++---
>  drivers/cpufreq/cpufreq_performance.c | 14 +--
>  drivers/cpufreq/cpufreq_powersave.c   | 18 +---
>  drivers/cpufreq/cpufreq_userspace.c   | 18 +---
>  include/linux/cpufreq.h   | 14 +++
>  kernel/sched/cpufreq_schedutil.c  |  6 +-
>  11 files changed, 100 insertions(+), 140 deletions(-)
>
> --

All three patches applied as 5.9 material, thanks!


Re: rename ->make_request_fn and move it to the block_device_operations

2020-06-30 Thread Jens Axboe
On 6/30/20 12:21 PM, Jens Axboe wrote:
> On 6/30/20 12:19 PM, Christoph Hellwig wrote:
>> On Tue, Jun 30, 2020 at 09:43:31AM -0600, Jens Axboe wrote:
>>> On 6/30/20 7:57 AM, Jens Axboe wrote:
 On 6/29/20 1:39 PM, Christoph Hellwig wrote:
> Hi Jens,
>
> this series moves the make_request_fn method into block_device_operations
> with the much more descriptive ->submit_bio name.  It then also gives
> generic_make_request a more descriptive name, and further optimize the
> path to issue to blk-mq, removing the need for the direct_make_request
> bypass.

 Looks good to me, and it's a nice cleanup as well. Applied.
>>>
>>> Dropped, insta-crashes with dm:
>>
>> Hmm.  Can you send me what is at "submit_bio_noacct+0x1f6" from gdb?
>> Or your .config?
> 
> I'd have to apply and compile again. But it's a bad RIP, so I'm guessing
> it's ->submit_bio == NULL. Let me know if you really need it, and I can
> re-generate the OOPS and have the vmlinux too.

Here's the .config

-- 
Jens Axboe

#
# Automatically generated file; DO NOT EDIT.
# Linux/x86 5.8.0-rc1 Kernel Configuration
#
CONFIG_CC_VERSION_TEXT="gcc (Ubuntu 10.1.0-2ubuntu1~18.04) 10.1.0"
CONFIG_CC_IS_GCC=y
CONFIG_GCC_VERSION=100100
CONFIG_LD_VERSION=23000
CONFIG_CLANG_VERSION=0
CONFIG_CC_CAN_LINK=y
CONFIG_CC_CAN_LINK_STATIC=y
CONFIG_CC_HAS_ASM_GOTO=y
CONFIG_CC_HAS_ASM_INLINE=y
CONFIG_IRQ_WORK=y
CONFIG_BUILDTIME_TABLE_SORT=y
CONFIG_THREAD_INFO_IN_TASK=y

#
# General setup
#
CONFIG_INIT_ENV_ARG_LIMIT=32
# CONFIG_COMPILE_TEST is not set
CONFIG_LOCALVERSION=""
# CONFIG_LOCALVERSION_AUTO is not set
CONFIG_BUILD_SALT=""
CONFIG_HAVE_KERNEL_GZIP=y
CONFIG_HAVE_KERNEL_BZIP2=y
CONFIG_HAVE_KERNEL_LZMA=y
CONFIG_HAVE_KERNEL_XZ=y
CONFIG_HAVE_KERNEL_LZO=y
CONFIG_HAVE_KERNEL_LZ4=y
CONFIG_KERNEL_GZIP=y
# CONFIG_KERNEL_BZIP2 is not set
# CONFIG_KERNEL_LZMA is not set
# CONFIG_KERNEL_XZ is not set
# CONFIG_KERNEL_LZO is not set
# CONFIG_KERNEL_LZ4 is not set
CONFIG_DEFAULT_INIT=""
CONFIG_DEFAULT_HOSTNAME="(none)"
CONFIG_SWAP=y
CONFIG_SYSVIPC=y
CONFIG_SYSVIPC_SYSCTL=y
CONFIG_POSIX_MQUEUE=y
CONFIG_POSIX_MQUEUE_SYSCTL=y
# CONFIG_WATCH_QUEUE is not set
CONFIG_CROSS_MEMORY_ATTACH=y
# CONFIG_USELIB is not set
# CONFIG_AUDIT is not set
CONFIG_HAVE_ARCH_AUDITSYSCALL=y

#
# IRQ subsystem
#
CONFIG_GENERIC_IRQ_PROBE=y
CONFIG_GENERIC_IRQ_SHOW=y
CONFIG_GENERIC_IRQ_EFFECTIVE_AFF_MASK=y
CONFIG_GENERIC_PENDING_IRQ=y
CONFIG_GENERIC_IRQ_MIGRATION=y
CONFIG_HARDIRQS_SW_RESEND=y
CONFIG_IRQ_DOMAIN=y
CONFIG_IRQ_DOMAIN_HIERARCHY=y
CONFIG_GENERIC_MSI_IRQ=y
CONFIG_GENERIC_MSI_IRQ_DOMAIN=y
CONFIG_GENERIC_IRQ_MATRIX_ALLOCATOR=y
CONFIG_GENERIC_IRQ_RESERVATION_MODE=y
CONFIG_IRQ_FORCED_THREADING=y
CONFIG_SPARSE_IRQ=y
# CONFIG_GENERIC_IRQ_DEBUGFS is not set
# end of IRQ subsystem

CONFIG_CLOCKSOURCE_WATCHDOG=y
CONFIG_ARCH_CLOCKSOURCE_INIT=y
CONFIG_CLOCKSOURCE_VALIDATE_LAST_CYCLE=y
CONFIG_GENERIC_TIME_VSYSCALL=y
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y
CONFIG_GENERIC_CLOCKEVENTS_MIN_ADJUST=y
CONFIG_GENERIC_CMOS_UPDATE=y

#
# Timers subsystem
#
CONFIG_TICK_ONESHOT=y
CONFIG_NO_HZ_COMMON=y
# CONFIG_HZ_PERIODIC is not set
CONFIG_NO_HZ_IDLE=y
# CONFIG_NO_HZ_FULL is not set
# CONFIG_NO_HZ is not set
CONFIG_HIGH_RES_TIMERS=y
# end of Timers subsystem

# CONFIG_PREEMPT_NONE is not set
# CONFIG_PREEMPT_VOLUNTARY is not set
CONFIG_PREEMPT=y
CONFIG_PREEMPT_COUNT=y
CONFIG_PREEMPTION=y

#
# CPU/Task time and stats accounting
#
CONFIG_TICK_CPU_ACCOUNTING=y
# CONFIG_VIRT_CPU_ACCOUNTING_GEN is not set
# CONFIG_IRQ_TIME_ACCOUNTING is not set
# CONFIG_SCHED_THERMAL_PRESSURE is not set
CONFIG_BSD_PROCESS_ACCT=y
CONFIG_BSD_PROCESS_ACCT_V3=y
CONFIG_TASKSTATS=y
CONFIG_TASK_DELAY_ACCT=y
CONFIG_TASK_XACCT=y
CONFIG_TASK_IO_ACCOUNTING=y
# CONFIG_PSI is not set
# end of CPU/Task time and stats accounting

# CONFIG_CPU_ISOLATION is not set

#
# RCU Subsystem
#
CONFIG_TREE_RCU=y
CONFIG_PREEMPT_RCU=y
# CONFIG_RCU_EXPERT is not set
CONFIG_SRCU=y
CONFIG_TREE_SRCU=y
CONFIG_TASKS_RCU_GENERIC=y
CONFIG_TASKS_RCU=y
CONFIG_TASKS_RUDE_RCU=y
CONFIG_RCU_STALL_COMMON=y
CONFIG_RCU_NEED_SEGCBLIST=y
# end of RCU Subsystem

CONFIG_IKCONFIG=y
CONFIG_IKCONFIG_PROC=y
# CONFIG_IKHEADERS is not set
CONFIG_LOG_BUF_SHIFT=18
CONFIG_LOG_CPU_MAX_BUF_SHIFT=12
CONFIG_PRINTK_SAFE_LOG_BUF_SHIFT=13
CONFIG_HAVE_UNSTABLE_SCHED_CLOCK=y

#
# Scheduler features
#
# CONFIG_UCLAMP_TASK is not set
# end of Scheduler features

CONFIG_ARCH_SUPPORTS_NUMA_BALANCING=y
CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH=y
CONFIG_CC_HAS_INT128=y
CONFIG_ARCH_SUPPORTS_INT128=y
CONFIG_NUMA_BALANCING=y
CONFIG_NUMA_BALANCING_DEFAULT_ENABLED=y
CONFIG_CGROUPS=y
# CONFIG_MEMCG is not set
CONFIG_BLK_CGROUP=y
CONFIG_CGROUP_SCHED=y
CONFIG_FAIR_GROUP_SCHED=y
# CONFIG_CFS_BANDWIDTH is not set
# CONFIG_RT_GROUP_SCHED is not set
# CONFIG_CGROUP_PIDS is not set
# CONFIG_CGROUP_RDMA is not set
# CONFIG_CGROUP_FREEZER is not set
# CONFIG_CGROUP_HUGETLB is not set
# CONFIG_CPUSETS is not set
# CONFIG_CGROUP_DEVICE is not set
# CONFIG_CGROUP_CPUACCT 

Re: rename ->make_request_fn and move it to the block_device_operations

2020-06-30 Thread Jens Axboe
On 6/30/20 7:57 AM, Jens Axboe wrote:
> On 6/29/20 1:39 PM, Christoph Hellwig wrote:
>> Hi Jens,
>>
>> this series moves the make_request_fn method into block_device_operations
>> with the much more descriptive ->submit_bio name.  It then also gives
>> generic_make_request a more descriptive name, and further optimize the
>> path to issue to blk-mq, removing the need for the direct_make_request
>> bypass.
> 
> Looks good to me, and it's a nice cleanup as well. Applied.

Dropped, insta-crashes with dm:

[   10.240134] BUG: kernel NULL pointer dereference, address: 
[   10.241000] #PF: supervisor instruction fetch in kernel mode
[   10.241666] #PF: error_code(0x0010) - not-present page
[   10.242280] PGD 0 P4D 0 
[   10.242600] Oops: 0010 [#1] PREEMPT SMP
[   10.243073] CPU: 1 PID: 2110 Comm: systemd-udevd Not tainted 5.8.0-rc3+ #6655
[   10.243939] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
1.13.0-1ubuntu1 04/01/2014
[   10.245012] RIP: 0010:0x0
[   10.245322] Code: Bad RIP value.
[   10.245695] RSP: 0018:c92f7af8 EFLAGS: 00010246
[   10.246333] RAX: 81c83520 RBX: 8881b805dea8 RCX: 88819e844070
[   10.247227] RDX:  RSI:  RDI: 88819e844070
[   10.248112] RBP: c92f7b48 R08: 8881b6f38800 R09: 88818ff0ea58
[   10.248994] R10:  R11: 88818ff0ea58 R12: 88819e844070
[   10.250077] R13:  R14:  R15: 888107812948
[   10.251168] FS:  7f5c3ed66a80() GS:8881b9c8() 
knlGS:
[   10.252161] CS:  0010 DS:  ES:  CR0: 80050033
[   10.253189] CR2: ffd6 CR3: 0001b2953003 CR4: 001606e0
[   10.254157] DR0:  DR1:  DR2: 
[   10.255279] DR3:  DR6: fffe0ff0 DR7: 0400
[   10.256365] Call Trace:
[   10.256781]  submit_bio_noacct+0x1f6/0x3d0
[   10.257297]  submit_bio+0x37/0x130
[   10.257780]  ? guard_bio_eod+0x2e/0x70
[   10.258418]  mpage_readahead+0x13c/0x180
[   10.259096]  ? blkdev_direct_IO+0x490/0x490
[   10.259654]  read_pages+0x68/0x2d0
[   10.260051]  page_cache_readahead_unbounded+0x1b7/0x220
[   10.260818]  generic_file_buffered_read+0x865/0xc80
[   10.261587]  ? _copy_to_user+0x6d/0x80
[   10.262171]  ? cp_new_stat+0x119/0x130
[   10.262680]  new_sync_read+0xfe/0x170
[   10.263155]  vfs_read+0xc8/0x180
[   10.263647]  ksys_read+0x53/0xc0
[   10.264209]  do_syscall_64+0x3c/0x70
[   10.264759]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[   10.265200] RIP: 0033:0x7f5c3fcc9ab2
[   10.265510] Code: Bad RIP value.
[   10.265775] RSP: 002b:7ffc8e0cf9c8 EFLAGS: 0246 ORIG_RAX: 

[   10.266426] RAX: ffda RBX: 55d5eca76c68 RCX: 7f5c3fcc9ab2
[   10.267012] RDX: 0040 RSI: 55d5eca76c78 RDI: 0006
[   10.267591] RBP: 55d5eca44890 R08: 55d5eca76c50 R09: 7f5c3fd99a40
[   10.268168] R10: 0008 R11: 0246 R12: 3bd9
[   10.268744] R13: 0040 R14: 55d5eca76c50 R15: 55d5eca448e0
[   10.269319] Modules linked in:
[   10.269562] CR2: 
[   10.269845] ---[ end trace f09b8963e5a3593b ]---

-- 
Jens Axboe



[PATCH] selftests/seccomp: fix ptrace tests on powerpc

2020-06-30 Thread Thadeu Lima de Souza Cascardo
As pointed out by Michael Ellerman, the ptrace ABI on powerpc does not
allow or require the return code to be set on syscall entry when
skipping the syscall. It will always return ENOSYS and the return code
must be set on syscall exit.

This code does that, behaving more similarly to strace. It still sets
the return code on entry, which is overridden on powerpc, and it will
always repeat the same on exit. Also, on powerpc, the errno is not
inverted, and depends on ccr.so being set.
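
Roughly, the arch difference described above looks like this (sketch only,
reusing the test's SYSCALL_RET macro; the 0x10000000 mask for cr0.SO is an
assumption and should be checked against the pt_regs/CR definitions):

  regs.SYSCALL_RET = -ESRCH;       /* generic: negative errno convention */
  #ifdef __powerpc__
  regs.SYSCALL_RET = ESRCH;        /* powerpc: errno stays positive... */
  regs.ccr |= 0x10000000;          /* ...and cr0.SO marks the failure */
  #endif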

This has been tested on powerpc and amd64.

Cc: Michael Ellerman 
Cc: Kees Cook 
Signed-off-by: Thadeu Lima de Souza Cascardo 
---
 tools/testing/selftests/seccomp/seccomp_bpf.c | 24 +++
 1 file changed, 20 insertions(+), 4 deletions(-)

diff --git a/tools/testing/selftests/seccomp/seccomp_bpf.c 
b/tools/testing/selftests/seccomp/seccomp_bpf.c
index 252140a52553..b90a9190ba88 100644
--- a/tools/testing/selftests/seccomp/seccomp_bpf.c
+++ b/tools/testing/selftests/seccomp/seccomp_bpf.c
@@ -1738,6 +1738,14 @@ void change_syscall(struct __test_metadata *_metadata,
TH_LOG("Can't modify syscall return on this architecture");
 #else
regs.SYSCALL_RET = result;
+# if defined(__powerpc__)
+   if (result < 0) {
+   regs.SYSCALL_RET = -result;
+   regs.ccr |= 0x1000;
+   } else {
+   regs.ccr &= ~0x1000;
+   }
+# endif
 #endif
 
 #ifdef HAVE_GETREGS
@@ -1796,6 +1804,7 @@ void tracer_ptrace(struct __test_metadata *_metadata, 
pid_t tracee,
int ret, nr;
unsigned long msg;
static bool entry;
+   int *syscall_nr = args;
 
/*
 * The traditional way to tell PTRACE_SYSCALL entry/exit
@@ -1809,10 +1818,15 @@ void tracer_ptrace(struct __test_metadata *_metadata, 
pid_t tracee,
EXPECT_EQ(entry ? PTRACE_EVENTMSG_SYSCALL_ENTRY
: PTRACE_EVENTMSG_SYSCALL_EXIT, msg);
 
-   if (!entry)
+   if (!entry && !syscall_nr)
return;
 
-   nr = get_syscall(_metadata, tracee);
+   if (entry)
+   nr = get_syscall(_metadata, tracee);
+   else
+   nr = *syscall_nr;
+   if (syscall_nr)
+   *syscall_nr = nr;
 
if (nr == __NR_getpid)
change_syscall(_metadata, tracee, __NR_getppid, 0);
@@ -1889,9 +1903,10 @@ TEST_F(TRACE_syscall, ptrace_syscall_redirected)
 
 TEST_F(TRACE_syscall, ptrace_syscall_errno)
 {
+   int syscall_nr = -1;
/* Swap SECCOMP_RET_TRACE tracer for PTRACE_SYSCALL tracer. */
teardown_trace_fixture(_metadata, self->tracer);
-   self->tracer = setup_trace_fixture(_metadata, tracer_ptrace, NULL,
+   self->tracer = setup_trace_fixture(_metadata, tracer_ptrace, 
_nr,
   true);
 
/* Tracer should skip the open syscall, resulting in ESRCH. */
@@ -1900,9 +1915,10 @@ TEST_F(TRACE_syscall, ptrace_syscall_errno)
 
 TEST_F(TRACE_syscall, ptrace_syscall_faked)
 {
+   int syscall_nr = -1;
/* Swap SECCOMP_RET_TRACE tracer for PTRACE_SYSCALL tracer. */
teardown_trace_fixture(_metadata, self->tracer);
-   self->tracer = setup_trace_fixture(_metadata, tracer_ptrace, NULL,
+   self->tracer = setup_trace_fixture(_metadata, tracer_ptrace, 
_nr,
   true);
 
/* Tracer should skip the gettid syscall, resulting fake pid. */
-- 
2.25.1



Re: rename ->make_request_fn and move it to the block_device_operations

2020-06-30 Thread Christoph Hellwig
On Tue, Jun 30, 2020 at 09:43:31AM -0600, Jens Axboe wrote:
> On 6/30/20 7:57 AM, Jens Axboe wrote:
> > On 6/29/20 1:39 PM, Christoph Hellwig wrote:
> >> Hi Jens,
> >>
> >> this series moves the make_request_fn method into block_device_operations
> >> with the much more descriptive ->submit_bio name.  It then also gives
> >> generic_make_request a more descriptive name, and further optimize the
> >> path to issue to blk-mq, removing the need for the direct_make_request
> >> bypass.
> > 
> > Looks good to me, and it's a nice cleanup as well. Applied.
> 
> Dropped, insta-crashes with dm:

Hmm.  Can you send me what is at "submit_bio_noacct+0x1f6" from gdb?
Or your .config?


Re: [PATCH v2] powerpc/uaccess: Use flexible addressing with __put_user()/__get_user()

2020-06-30 Thread Segher Boessenkool
On Tue, Jun 30, 2020 at 04:55:05PM +0200, Christophe Leroy wrote:
> Le 30/06/2020 à 03:19, Michael Ellerman a écrit :
> >Michael Ellerman  writes:
> >>Because it uses the "m<>" constraint which didn't work on GCC 4.6.
> >>
> >>https://github.com/linuxppc/issues/issues/297
> >>
> >>So we should be able to pick it up for v5.9 hopefully.
> >
> >It seems to break the build with the kernel.org 4.9.4 compiler and
> >corenet64_smp_defconfig:
> 
> Looks like 4.9.4 doesn't accept "m<>" constraint either.

The evidence contradicts this assertion.

> Changing it to "m" make it build.

But that just means something else is wrong.

> >+ make -s CC=powerpc64-linux-gnu-gcc -j 160
> >In file included from /linux/include/linux/uaccess.h:11:0,
> >  from /linux/include/linux/sched/task.h:11,
> >  from /linux/include/linux/sched/signal.h:9,
> >  from /linux/include/linux/rcuwait.h:6,
> >  from /linux/include/linux/percpu-rwsem.h:7,
> >  from /linux/include/linux/fs.h:33,
> >  from /linux/include/linux/huge_mm.h:8,
> >  from /linux/include/linux/mm.h:675,
> >  from /linux/arch/powerpc/kernel/signal_32.c:17:
> >/linux/arch/powerpc/kernel/signal_32.c: In function 
> >'save_user_regs.isra.14.constprop':
> >/linux/arch/powerpc/include/asm/uaccess.h:161:2: error: 'asm' operand has 
> >impossible constraints
> >   __asm__ __volatile__( \
> >   ^
> >/linux/arch/powerpc/include/asm/uaccess.h:197:12: note: in expansion of 
> >macro '__put_user_asm'
> > case 4: __put_user_asm(x, ptr, retval, "stw"); break; \
> > ^
> >/linux/arch/powerpc/include/asm/uaccess.h:206:2: note: in expansion of 
> >macro '__put_user_size_allowed'
> >   __put_user_size_allowed(x, ptr, size, retval);  \
> >   ^
> >/linux/arch/powerpc/include/asm/uaccess.h:220:2: note: in expansion of 
> >macro '__put_user_size'
> >   __put_user_size(__pu_val, __pu_addr, __pu_size, __pu_err); \
> >   ^
> >/linux/arch/powerpc/include/asm/uaccess.h:96:2: note: in expansion of 
> >macro '__put_user_nocheck'
> >   __put_user_nocheck((__typeof__(*(ptr)))(x), (ptr), sizeof(*(ptr)))
> >   ^
> >/linux/arch/powerpc/kernel/signal_32.c:120:7: note: in expansion of macro 
> >'__put_user'
> >if (__put_user((unsigned int)gregs[i], >mc_gregs[i]))
> >^

Can we see what that was after the macro jungle?  Like, the actual
preprocessed code?

Also, what GCC version *does* work on this?


Segher


Re: [PATCH updated] libnvdimm/nvdimm/flush: Allow architecture to override the flush barrier

2020-06-30 Thread Dan Williams
On Tue, Jun 30, 2020 at 5:48 AM Aneesh Kumar K.V
 wrote:
>
>
> Update patch.
>
> From 1e6aa6c4182e14ec5d6bf878ae44c3f69ebff745 Mon Sep 17 00:00:00 2001
> From: "Aneesh Kumar K.V" 
> Date: Tue, 12 May 2020 20:58:33 +0530
> Subject: [PATCH] libnvdimm/nvdimm/flush: Allow architecture to override the
>  flush barrier
>
> Architectures like ppc64 provide persistent memory specific barriers
> that will ensure that all stores for which the modifications are
> written to persistent storage by preceding dcbfps and dcbstps
> instructions have updated persistent storage before any data
> access or data transfer caused by subsequent instructions is initiated.
> This is in addition to the ordering done by wmb()
>
> Update nvdimm core such that architecture can use barriers other than
> wmb to ensure all previous writes are architecturally visible for
> the platform buffer flush.

Looks good, after a few minor fixups below you can add:

Reviewed-by: Dan Williams 

I'm expecting that these will be merged through the powerpc tree since
they mostly impact powerpc with only minor touches to libnvdimm.

> Signed-off-by: Aneesh Kumar K.V 
> ---
>  Documentation/memory-barriers.txt | 14 ++
>  drivers/md/dm-writecache.c|  2 +-
>  drivers/nvdimm/region_devs.c  |  8 
>  include/asm-generic/barrier.h | 10 ++
>  4 files changed, 29 insertions(+), 5 deletions(-)
>
> diff --git a/Documentation/memory-barriers.txt 
> b/Documentation/memory-barriers.txt
> index eaabc3134294..340273a6b18e 100644
> --- a/Documentation/memory-barriers.txt
> +++ b/Documentation/memory-barriers.txt
> @@ -1935,6 +1935,20 @@ There are some more advanced barrier functions:
>   relaxed I/O accessors and the Documentation/DMA-API.txt file for more
>   information on consistent memory.
>
> + (*) pmem_wmb();
> +
> + This is for use with persistent memory to ensure that stores for which
> + modifications are written to persistent storage have updated the 
> persistent
> + storage.

I think this should be:

s/updated the persistent storage/reached a platform durability domain/

> +
> + For example, after a non-temporal write to pmem region, we use 
> pmem_wmb()
> + to ensures that stores have updated the persistent storage. This ensures

s/ensures/ensure/

...and the same comment about "persistent storage" because pmem_wmb()
as implemented on x86 does not guarantee that the writes have reached
storage; it ensures that writes have reached buffers / queues that are
within the ADR (platform persistence / durability) domain.

> + that stores have updated persistent storage before any data access or
> + data transfer caused by subsequent instructions is initiated. This is
> + in addition to the ordering done by wmb().
> +
> + For load from persistent memory, existing read memory barriers are 
> sufficient
> + to ensure read ordering.
>
>  ===
>  IMPLICIT KERNEL MEMORY BARRIERS
> diff --git a/drivers/md/dm-writecache.c b/drivers/md/dm-writecache.c
> index 74f3c506f084..00534fa4a384 100644
> --- a/drivers/md/dm-writecache.c
> +++ b/drivers/md/dm-writecache.c
> @@ -536,7 +536,7 @@ static void ssd_commit_superblock(struct dm_writecache 
> *wc)
>  static void writecache_commit_flushed(struct dm_writecache *wc, bool 
> wait_for_ios)
>  {
> if (WC_MODE_PMEM(wc))
> -   wmb();
> +   pmem_wmb();
> else
> ssd_commit_flushed(wc, wait_for_ios);
>  }
> diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
> index 4502f9c4708d..2333b290bdcf 100644
> --- a/drivers/nvdimm/region_devs.c
> +++ b/drivers/nvdimm/region_devs.c
> @@ -1206,13 +1206,13 @@ int generic_nvdimm_flush(struct nd_region *nd_region)
> idx = this_cpu_add_return(flush_idx, hash_32(current->pid + idx, 8));
>
> /*
> -* The first wmb() is needed to 'sfence' all previous writes
> -* such that they are architecturally visible for the platform
> -* buffer flush.  Note that we've already arranged for pmem
> +* The first arch_pmem_flush_barrier() is needed to 'sfence' all

One missed arch_pmem_flush_barrier() rename.

> +* previous writes such that they are architecturally visible for
> +* the platform buffer flush. Note that we've already arranged for 
> pmem
>  * writes to avoid the cache via memcpy_flushcache().  The final
>  * wmb() ensures ordering for the NVDIMM flush write.
>  */
> -   wmb();
> +   pmem_wmb();
> for (i = 0; i < nd_region->ndr_mappings; i++)
> if (ndrd_get_flush_wpq(ndrd, i, 0))
> writeq(1, ndrd_get_flush_wpq(ndrd, i, idx));
> diff --git a/include/asm-generic/barrier.h b/include/asm-generic/barrier.h
> index 2eacaf7d62f6..879d68faec1d 100644
> --- a/include/asm-generic/barrier.h
> +++ b/include/asm-generic/barrier.h
> @@ -257,5 +257,15 @@ do { 

Re: rename ->make_request_fn and move it to the block_device_operations

2020-06-30 Thread Jens Axboe
On 6/30/20 12:19 PM, Christoph Hellwig wrote:
> On Tue, Jun 30, 2020 at 09:43:31AM -0600, Jens Axboe wrote:
>> On 6/30/20 7:57 AM, Jens Axboe wrote:
>>> On 6/29/20 1:39 PM, Christoph Hellwig wrote:
 Hi Jens,

 this series moves the make_request_fn method into block_device_operations
 with the much more descriptive ->submit_bio name.  It then also gives
 generic_make_request a more descriptive name, and further optimize the
 path to issue to blk-mq, removing the need for the direct_make_request
 bypass.
>>>
>>> Looks good to me, and it's a nice cleanup as well. Applied.
>>
>> Dropped, insta-crashes with dm:
> 
> Hmm.  Can you send me what is at "submit_bio_noacct+0x1f6" from gdb?
> Or your .config?

I'd have to apply and compile again. But it's a bad RIP, so I'm guessing
it's ->submit_bio == NULL. Let me know if you really need it, and I can
re-generate the OOPS and have the vmlinux too.
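
If it is that, the fix is presumably just guarding the new hook -- a rough
sketch, assuming the series' block_device_operations::submit_bio and a
blk-mq fallback (names illustrative):

/* Sketch only: request-based (blk-mq) disks leave ->submit_bio NULL, so
 * calling it unconditionally jumps through a NULL pointer (the bad RIP). */
static blk_qc_t __submit_bio(struct bio *bio)
{
	struct gendisk *disk = bio->bi_disk;

	if (disk->fops->submit_bio)
		return disk->fops->submit_bio(bio);

	return blk_mq_submit_bio(bio);	/* illustrative blk-mq fallback */
}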

-- 
Jens Axboe



Re: [PATCH v2] powerpc/uaccess: Use flexible addressing with __put_user()/__get_user()

2020-06-30 Thread Segher Boessenkool
Hi again,

Thanks for your work so far!

On Tue, Jun 30, 2020 at 06:53:39PM +, Christophe Leroy wrote:
> On 06/30/2020 04:33 PM, Segher Boessenkool wrote:
> >>>+ make -s CC=powerpc64-linux-gnu-gcc -j 160
> >>>In file included from /linux/include/linux/uaccess.h:11:0,
> >>>  from /linux/include/linux/sched/task.h:11,
> >>>  from /linux/include/linux/sched/signal.h:9,
> >>>  from /linux/include/linux/rcuwait.h:6,
> >>>  from /linux/include/linux/percpu-rwsem.h:7,
> >>>  from /linux/include/linux/fs.h:33,
> >>>  from /linux/include/linux/huge_mm.h:8,
> >>>  from /linux/include/linux/mm.h:675,
> >>>  from /linux/arch/powerpc/kernel/signal_32.c:17:
> >>>/linux/arch/powerpc/kernel/signal_32.c: In function
> >>>'save_user_regs.isra.14.constprop':
> >>>/linux/arch/powerpc/include/asm/uaccess.h:161:2: error: 'asm' operand has
> >>>impossible constraints
> >>>   __asm__ __volatile__( \
> >>>   ^
> >>>/linux/arch/powerpc/include/asm/uaccess.h:197:12: note: in expansion of
> >>>macro '__put_user_asm'
> >>> case 4: __put_user_asm(x, ptr, retval, "stw"); break; \
> >>> ^
> >>>/linux/arch/powerpc/include/asm/uaccess.h:206:2: note: in expansion of
> >>>macro '__put_user_size_allowed'
> >>>   __put_user_size_allowed(x, ptr, size, retval);  \
> >>>   ^
> >>>/linux/arch/powerpc/include/asm/uaccess.h:220:2: note: in expansion of
> >>>macro '__put_user_size'
> >>>   __put_user_size(__pu_val, __pu_addr, __pu_size, __pu_err); \
> >>>   ^
> >>>/linux/arch/powerpc/include/asm/uaccess.h:96:2: note: in expansion of
> >>>macro '__put_user_nocheck'
> >>>   __put_user_nocheck((__typeof__(*(ptr)))(x), (ptr), sizeof(*(ptr)))
> >>>   ^
> >>>/linux/arch/powerpc/kernel/signal_32.c:120:7: note: in expansion of macro
> >>>'__put_user'
> >>>if (__put_user((unsigned int)gregs[i], &frame->mc_gregs[i]))
> >>>^
> >
> >Can we see what that was after the macro jungle?  Like, the actual
> >preprocessed code?
> 
> Sorry for the previous misunderstanding.
> 
> Here is the code:
> 
> #define __put_user_asm(x, addr, err, op)  \
>   __asm__ __volatile__(   \
>   "1: " op "%U2%X2 %1,%2  # put_user\n"   \
>   "2:\n"  \
>   ".section .fixup,\"ax\"\n"  \
>   "3: li %0,%3\n" \
>   "   b 2b\n" \
>   ".previous\n"   \
>   EX_TABLE(1b, 3b)\
>   : "=r" (err)\
>   : "r" (x), "m<>" (*addr), "i" (-EFAULT), "0" (err))

Yeah I don't see it.  I'll have to look at compiler debug dumps, but I
don't have any working 4.9 around, and I cannot reproduce this with
either older or newer compilers.

It is complaining that constrain_operands just does not work *at all* on
this "m<>" constraint apparently, which doesn't make much sense.

I'll try later when I have more time, sorry :-/


Segher


Re: [PATCH 0/8 v2] PCI: Align return values of PCIe capability and PCI accessors

2020-06-30 Thread Jason Gunthorpe
On Mon, Jun 15, 2020 at 09:32:17AM +0200, refactormys...@gmail.com wrote:
> Bolarinwa Olayemi Saheed (8):
>   IB/hfi1: Convert PCIBIOS_* errors to generic -E* errors
>   IB/hfi1: Convert PCIBIOS_* errors to generic -E* errors

Applied to rdma for-next thanks

Jason


[PATCH v4 16/26] mm/powerpc: Use general page fault accounting

2020-06-30 Thread Peter Xu
Use the general page fault accounting by passing regs into handle_mm_fault().

CC: Michael Ellerman 
CC: Benjamin Herrenschmidt 
CC: Paul Mackerras 
CC: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Peter Xu 
---
 arch/powerpc/mm/fault.c | 11 +++
 1 file changed, 3 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
index 25dee001d8e1..00259e9b452d 100644
--- a/arch/powerpc/mm/fault.c
+++ b/arch/powerpc/mm/fault.c
@@ -607,7 +607,7 @@ static int __do_page_fault(struct pt_regs *regs, unsigned 
long address,
 * make sure we exit gracefully rather than endlessly redo
 * the fault.
 */
-   fault = handle_mm_fault(vma, address, flags, NULL);
+   fault = handle_mm_fault(vma, address, flags, regs);
 
major |= fault & VM_FAULT_MAJOR;
 
@@ -633,14 +633,9 @@ static int __do_page_fault(struct pt_regs *regs, unsigned 
long address,
/*
 * Major/minor page fault accounting.
 */
-   if (major) {
-   current->maj_flt++;
-   perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MAJ, 1, regs, address);
+   if (major)
cmo_account_page_fault();
-   } else {
-   current->min_flt++;
-   perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MIN, 1, regs, address);
-   }
+
return 0;
 }
 NOKPROBE_SYMBOL(__do_page_fault);
-- 
2.26.2



Re: [PATCH v6 6/8] powerpc/pmem: Avoid the barrier in flush routines

2020-06-30 Thread Dan Williams
On Tue, Jun 30, 2020 at 2:21 AM Aneesh Kumar K.V
 wrote:
[..]
> >> The bio argument isn't for range based flushing, it is for flush
> >> operations that need to complete asynchronously.
> > How does the block layer determine that the pmem device needs
> > asynchronous flushing?
> >
>
> set_bit(ND_REGION_ASYNC, &ndr_desc.flags);
>
> and dax_synchronous(dev)

Yes, but I think it is overkill to have an indirect function call just
for a single instruction.

How about something like this instead, to share a common pmem_wmb()
across x86 and powerpc.

diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
index 20ff30c2ab93..b14009060c83 100644
--- a/drivers/nvdimm/region_devs.c
+++ b/drivers/nvdimm/region_devs.c
@@ -1180,6 +1180,13 @@ int nvdimm_flush(struct nd_region *nd_region,
struct bio *bio)
 {
int rc = 0;

+   /*
+* pmem_wmb() is needed to 'sfence' all previous writes such
+* that they are architecturally visible for the platform buffer
+* flush.
+*/
+   pmem_wmb();
+
if (!nd_region->flush)
rc = generic_nvdimm_flush(nd_region);
else {
@@ -1206,17 +1213,14 @@ int generic_nvdimm_flush(struct nd_region *nd_region)
idx = this_cpu_add_return(flush_idx, hash_32(current->pid + idx, 8));

/*
-* The first wmb() is needed to 'sfence' all previous writes
-* such that they are architecturally visible for the platform
-* buffer flush.  Note that we've already arranged for pmem
-* writes to avoid the cache via memcpy_flushcache().  The final
-* wmb() ensures ordering for the NVDIMM flush write.
+* Note that we've already arranged for pmem writes to avoid the
+* cache via memcpy_flushcache().  The final wmb() ensures
+* ordering for the NVDIMM flush write.
 */
-   wmb();
for (i = 0; i < nd_region->ndr_mappings; i++)
if (ndrd_get_flush_wpq(ndrd, i, 0))
writeq(1, ndrd_get_flush_wpq(ndrd, i, idx));
-   wmb();
+   pmem_wmb();

return 0;
 }


Re: [PATCH v6 6/8] powerpc/pmem: Avoid the barrier in flush routines

2020-06-30 Thread Aneesh Kumar K.V

On 7/1/20 1:15 AM, Dan Williams wrote:

On Tue, Jun 30, 2020 at 2:21 AM Aneesh Kumar K.V
 wrote:
[..]

The bio argument isn't for range based flushing, it is for flush
operations that need to complete asynchronously.

How does the block layer determine that the pmem device needs
asynchronous flushing?



 set_bit(ND_REGION_ASYNC, &ndr_desc.flags);

and dax_synchronous(dev)


Yes, but I think it is overkill to have an indirect function call just
for a single instruction.

How about something like this instead, to share a common pmem_wmb()
across x86 and powerpc.

diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
index 20ff30c2ab93..b14009060c83 100644
--- a/drivers/nvdimm/region_devs.c
+++ b/drivers/nvdimm/region_devs.c
@@ -1180,6 +1180,13 @@ int nvdimm_flush(struct nd_region *nd_region,
struct bio *bio)
  {
 int rc = 0;

+   /*
+* pmem_wmb() is needed to 'sfence' all previous writes such
+* that they are architecturally visible for the platform buffer
+* flush.
+*/
+   pmem_wmb();
+
 if (!nd_region->flush)
 rc = generic_nvdimm_flush(nd_region);
 else {
@@ -1206,17 +1213,14 @@ int generic_nvdimm_flush(struct nd_region *nd_region)
 idx = this_cpu_add_return(flush_idx, hash_32(current->pid + idx, 8));

 /*
-* The first wmb() is needed to 'sfence' all previous writes
-* such that they are architecturally visible for the platform
-* buffer flush.  Note that we've already arranged for pmem
-* writes to avoid the cache via memcpy_flushcache().  The final
-* wmb() ensures ordering for the NVDIMM flush write.
+* Note that we've already arranged for pmem writes to avoid the
+* cache via memcpy_flushcache().  The final wmb() ensures
+* ordering for the NVDIMM flush write.
  */
-   wmb();



The series already converts this to pmem_wmb().


 for (i = 0; i < nd_region->ndr_mappings; i++)
 if (ndrd_get_flush_wpq(ndrd, i, 0))
 writeq(1, ndrd_get_flush_wpq(ndrd, i, idx));
-   wmb();
+   pmem_wmb();



Should this be pmem_wmb()? This is ordering the above writeq(), right?



 return 0;
  }



This still results in two pmem_wmb() on platforms that don't have
flush_wpq. I was trying to avoid that by adding an nd_region->flush
callback.
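
i.e. something along these lines on the ppc64 side (sketch only, names
illustrative):

/* Region-specific flush callback: just order the preceding cache flushes,
 * bypassing generic_nvdimm_flush() and its extra barriers entirely. */
static int papr_scm_pmem_flush(struct nd_region *nd_region, struct bio *bio)
{
	pmem_wmb();
	return 0;
}

/* and at region registration time (illustrative):
 *	ndr_desc.flush = papr_scm_pmem_flush;
 *	set_bit(ND_REGION_ASYNC, &ndr_desc.flags);
 */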


-aneesh


[PATCH 1/2] powerpc/mmu_gather: enable RCU_TABLE_FREE even for !SMP case

2020-06-30 Thread Santosh Sivaraj
From: "Aneesh Kumar K.V" 

commit 12e4d53f3f04e81f9e83d6fc10edc7314ab9f6b9 upstream

The TLB flush optimisation (a46cc7a90f: powerpc/mm/radix: Improve TLB/PWC
flushes) may result in random memory corruption.

On any SMP system, freeing page directories should observe the exact same
order as normal page freeing:

 1) unhook page/directory
 2) TLB invalidate
 3) free page/directory

Without this, any concurrent page-table walk could end up with a
Use-after-Free.  This is esp.  trivial for anything that has software
page-table walkers (HAVE_FAST_GUP / software TLB fill) or the hardware
caches partial page-walks (ie.  caches page directories).

Even on UP this might give issues since mmu_gather is preemptible these
days.  An interrupt or preempted task accessing user pages might stumble
into the free page if the hardware caches page directories.

!SMP case is right now broken for radix translation w.r.t page walk
cache flush.  We can get interrupted in between page table free and
that would imply we have page walk cache entries pointing to tables
which got freed already.  Michael said "both our platforms that run on
Power9 force SMP on in Kconfig, so the !SMP case is unlikely to be a
problem for anyone in practice, unless they've hacked their kernel to
build it !SMP."

Link: 
http://lkml.kernel.org/r/20200116064531.483522-2-aneesh.ku...@linux.ibm.com
Signed-off-by: Aneesh Kumar K.V 
Cc:  # 4.19
Signed-off-by: Santosh Sivaraj 
---
 arch/powerpc/Kconfig | 2 +-
 arch/powerpc/include/asm/book3s/32/pgalloc.h | 8 
 arch/powerpc/include/asm/book3s/64/pgalloc.h | 2 --
 arch/powerpc/include/asm/nohash/32/pgalloc.h | 8 
 arch/powerpc/mm/pgtable-book3s64.c   | 7 ---
 5 files changed, 1 insertion(+), 26 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index f38d153d25861..4863fc0dd945a 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -215,7 +215,7 @@ config PPC
select HAVE_HARDLOCKUP_DETECTOR_PERFif PERF_EVENTS && 
HAVE_PERF_EVENTS_NMI && !HAVE_HARDLOCKUP_DETECTOR_ARCH
select HAVE_PERF_REGS
select HAVE_PERF_USER_STACK_DUMP
-   select HAVE_RCU_TABLE_FREE  if SMP
+   select HAVE_RCU_TABLE_FREE
select HAVE_REGS_AND_STACK_ACCESS_API
select HAVE_RELIABLE_STACKTRACE if PPC64 && CPU_LITTLE_ENDIAN
select HAVE_SYSCALL_TRACEPOINTS
diff --git a/arch/powerpc/include/asm/book3s/32/pgalloc.h 
b/arch/powerpc/include/asm/book3s/32/pgalloc.h
index 82e44b1a00ae9..79ba3fbb512e3 100644
--- a/arch/powerpc/include/asm/book3s/32/pgalloc.h
+++ b/arch/powerpc/include/asm/book3s/32/pgalloc.h
@@ -110,7 +110,6 @@ static inline void pgtable_free(void *table, unsigned 
index_size)
 #define check_pgt_cache()  do { } while (0)
 #define get_hugepd_cache_index(x)  (x)
 
-#ifdef CONFIG_SMP
 static inline void pgtable_free_tlb(struct mmu_gather *tlb,
void *table, int shift)
 {
@@ -127,13 +126,6 @@ static inline void __tlb_remove_table(void *_table)
 
pgtable_free(table, shift);
 }
-#else
-static inline void pgtable_free_tlb(struct mmu_gather *tlb,
-   void *table, int shift)
-{
-   pgtable_free(table, shift);
-}
-#endif
 
 static inline void __pte_free_tlb(struct mmu_gather *tlb, pgtable_t table,
  unsigned long address)
diff --git a/arch/powerpc/include/asm/book3s/64/pgalloc.h 
b/arch/powerpc/include/asm/book3s/64/pgalloc.h
index f9019b579903a..1013c02142139 100644
--- a/arch/powerpc/include/asm/book3s/64/pgalloc.h
+++ b/arch/powerpc/include/asm/book3s/64/pgalloc.h
@@ -47,9 +47,7 @@ extern pmd_t *pmd_fragment_alloc(struct mm_struct *, unsigned 
long);
 extern void pte_fragment_free(unsigned long *, int);
 extern void pmd_fragment_free(unsigned long *);
 extern void pgtable_free_tlb(struct mmu_gather *tlb, void *table, int shift);
-#ifdef CONFIG_SMP
 extern void __tlb_remove_table(void *_table);
-#endif
 
 static inline pgd_t *radix__pgd_alloc(struct mm_struct *mm)
 {
diff --git a/arch/powerpc/include/asm/nohash/32/pgalloc.h 
b/arch/powerpc/include/asm/nohash/32/pgalloc.h
index 8825953c225b2..96eed46d56842 100644
--- a/arch/powerpc/include/asm/nohash/32/pgalloc.h
+++ b/arch/powerpc/include/asm/nohash/32/pgalloc.h
@@ -111,7 +111,6 @@ static inline void pgtable_free(void *table, unsigned 
index_size)
 #define check_pgt_cache()  do { } while (0)
 #define get_hugepd_cache_index(x)  (x)
 
-#ifdef CONFIG_SMP
 static inline void pgtable_free_tlb(struct mmu_gather *tlb,
void *table, int shift)
 {
@@ -128,13 +127,6 @@ static inline void __tlb_remove_table(void *_table)
 
pgtable_free(table, shift);
 }
-#else
-static inline void pgtable_free_tlb(struct mmu_gather *tlb,
-   void *table, int shift)
-{
-   pgtable_free(table, shift);
-}
-#endif
 
 static inline void __pte_free_tlb(struct mmu_gather 

[powerpc:next-test] BUILD SUCCESS 8b1bcbb263cc04d8bb9e57d64eda7cffc3dbc1aa

2020-06-30 Thread kernel test robot
tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git  
next-test
branch HEAD: 8b1bcbb263cc04d8bb9e57d64eda7cffc3dbc1aa  MAINTAINERS: Remove self 
from powerpc EEH

elapsed time: 1028m

configs tested: 98
configs skipped: 1

The following configs have been built successfully.
More configs may be tested in the coming days.

arm defconfig
arm  allyesconfig
arm  allmodconfig
arm   allnoconfig
arm64allyesconfig
arm64   defconfig
arm64allmodconfig
arm64 allnoconfig
i386  allnoconfig
i386 allyesconfig
i386defconfig
i386  debian-10.3
ia64 allmodconfig
ia64defconfig
ia64  allnoconfig
ia64 allyesconfig
m68k allmodconfig
m68k  allnoconfig
m68k   sun3_defconfig
m68kdefconfig
m68k allyesconfig
nios2   defconfig
nios2allyesconfig
openriscdefconfig
c6x  allyesconfig
c6x   allnoconfig
openrisc allyesconfig
nds32   defconfig
nds32 allnoconfig
csky allyesconfig
cskydefconfig
alpha   defconfig
alphaallyesconfig
xtensa   allyesconfig
h8300allyesconfig
h8300allmodconfig
xtensa  defconfig
arc defconfig
arc  allyesconfig
sh   allmodconfig
shallnoconfig
microblazeallnoconfig
mips allyesconfig
mips  allnoconfig
mips allmodconfig
pariscallnoconfig
parisc  defconfig
parisc   allyesconfig
parisc   allmodconfig
powerpc defconfig
powerpc  allyesconfig
powerpc  rhel-kconfig
powerpc  allmodconfig
powerpc   allnoconfig
i386 randconfig-a002-20200701
i386 randconfig-a001-20200701
i386 randconfig-a006-20200701
i386 randconfig-a005-20200701
i386 randconfig-a004-20200701
i386 randconfig-a003-20200701
x86_64   randconfig-a012-20200701
x86_64   randconfig-a016-20200701
x86_64   randconfig-a014-20200701
x86_64   randconfig-a011-20200701
x86_64   randconfig-a015-20200701
x86_64   randconfig-a013-20200701
i386 randconfig-a011-20200701
i386 randconfig-a015-20200701
i386 randconfig-a014-20200701
i386 randconfig-a016-20200701
i386 randconfig-a012-20200701
i386 randconfig-a013-20200701
riscvallyesconfig
riscv allnoconfig
riscv   defconfig
riscvallmodconfig
s390 allyesconfig
s390  allnoconfig
s390 allmodconfig
s390defconfig
sparcallyesconfig
sparc   defconfig
sparc64 defconfig
sparc64   allnoconfig
sparc64  allyesconfig
sparc64  allmodconfig
um   allmodconfig
umallnoconfig
um   allyesconfig
um  defconfig
x86_64   rhel-7.6
x86_64   rhel
x86_64 rhel-7.2-clear
x86_64lkp
x86_64  fedora-25
x86_64rhel-7.6-kselftests
x86_64   rhel-8.3
x86_64  kexec

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org


Re: [PATCH v6 6/8] powerpc/pmem: Avoid the barrier in flush routines

2020-06-30 Thread Dan Williams
On Tue, Jun 30, 2020 at 8:09 PM Aneesh Kumar K.V
 wrote:
>
> On 7/1/20 1:15 AM, Dan Williams wrote:
> > On Tue, Jun 30, 2020 at 2:21 AM Aneesh Kumar K.V
> >  wrote:
> > [..]
>  The bio argument isn't for range based flushing, it is for flush
>  operations that need to complete asynchronously.
> >>> How does the block layer determine that the pmem device needs
> >>> asynchronous flushing?
> >>>
> >>
> >>  set_bit(ND_REGION_ASYNC, _desc.flags);
> >>
> >> and dax_synchronous(dev)
> >
> > Yes, but I think it is overkill to have an indirect function call just
> > for a single instruction.
> >
> > How about something like this instead, to share a common pmem_wmb()
> > across x86 and powerpc.
> >
> > diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
> > index 20ff30c2ab93..b14009060c83 100644
> > --- a/drivers/nvdimm/region_devs.c
> > +++ b/drivers/nvdimm/region_devs.c
> > @@ -1180,6 +1180,13 @@ int nvdimm_flush(struct nd_region *nd_region,
> > struct bio *bio)
> >   {
> >  int rc = 0;
> >
> > +   /*
> > +* pmem_wmb() is needed to 'sfence' all previous writes such
> > +* that they are architecturally visible for the platform buffer
> > +* flush.
> > +*/
> > +   pmem_wmb();
> > +
> >  if (!nd_region->flush)
> >  rc = generic_nvdimm_flush(nd_region);
> >  else {
> > @@ -1206,17 +1213,14 @@ int generic_nvdimm_flush(struct nd_region 
> > *nd_region)
> >  idx = this_cpu_add_return(flush_idx, hash_32(current->pid + idx, 
> > 8));
> >
> >  /*
> > -* The first wmb() is needed to 'sfence' all previous writes
> > -* such that they are architecturally visible for the platform
> > -* buffer flush.  Note that we've already arranged for pmem
> > -* writes to avoid the cache via memcpy_flushcache().  The final
> > -* wmb() ensures ordering for the NVDIMM flush write.
> > +* Note that we've already arranged for pmem writes to avoid the
> > +* cache via memcpy_flushcache().  The final wmb() ensures
> > +* ordering for the NVDIMM flush write.
> >   */
> > -   wmb();
>
>
> The series already converts this to pmem_wmb().
>
> >  for (i = 0; i < nd_region->ndr_mappings; i++)
> >  if (ndrd_get_flush_wpq(ndrd, i, 0))
> >  writeq(1, ndrd_get_flush_wpq(ndrd, i, idx));
> > -   wmb();
> > +   pmem_wmb();
>
>
> Should this be pmem_wmb()? This is ordering the above writeq(), right?

Correct, this can just be wmb().

>
> >
> >  return 0;
> >   }
> >
>
> This still results in two pmem_wmb() on platforms that don't have
> flush_wpq. I was trying to avoid that by adding an nd_region->flush
> callback.

How about skipping generic_nvdimm_flush(), or exiting early out of it,
when ndrd->flush_wpq is NULL? That still saves an indirect branch at the
cost of another conditional, but that should still be worth it.
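
Roughly (sketch only, untested, reusing the existing ndrd_get_flush_wpq()
helper):

/* Early exit when the region has no WPQ flush hints: the pmem_wmb() issued
 * in nvdimm_flush() has already ordered the preceding stores. */
int generic_nvdimm_flush(struct nd_region *nd_region)
{
	struct nd_region_data *ndrd = dev_get_drvdata(&nd_region->dev);

	if (!ndrd || !ndrd_get_flush_wpq(ndrd, 0, 0))
		return 0;

	/* ... existing hint-hashing, writeq() loop and final wmb() ... */
	return 0;
}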


[PATCH 2/2] mm/mmu_gather: invalidate TLB correctly on batch allocation failure and flush

2020-06-30 Thread Santosh Sivaraj
From: Peter Zijlstra 

commit 0ed1325967ab5f7a4549a2641c6ebe115f76e228 upstream

Architectures for which we have hardware walkers of Linux page table
should flush TLB on mmu gather batch allocation failures and batch flush.
Some architectures like POWER supports multiple translation modes (hash
and radix) and in the case of POWER only radix translation mode needs the
above TLBI.  This is because for hash translation mode kernel wants to
avoid this extra flush since there are no hardware walkers of linux page
table.  With radix translation, the hardware also walks linux page table
and with that, kernel needs to make sure to TLB invalidate page walk cache
before page table pages are freed.

More details in commit d86564a2f085 ("mm/tlb, x86/mm: Support invalidating
TLB caches for RCU_TABLE_FREE")

The changes to sparc are to make sure we keep the old behavior since we
are now removing HAVE_RCU_TABLE_INVALIDATE.  The default value for
tlb_needs_table_invalidate is to always force an invalidate and sparc can
avoid the table invalidate.  Hence we define tlb_needs_table_invalidate to
false for sparc architecture.

Link: 
http://lkml.kernel.org/r/20200116064531.483522-3-aneesh.ku...@linux.ibm.com
Fixes: a46cc7a90fd8 ("powerpc/mm/radix: Improve TLB/PWC flushes")
Signed-off-by: Peter Zijlstra (Intel) 
Cc:   # 4.19
Signed-off-by: Santosh Sivaraj 
---
 arch/Kconfig|  3 ---
 arch/powerpc/include/asm/tlb.h  | 11 +++
 arch/sparc/include/asm/tlb_64.h |  9 +
 include/asm-generic/tlb.h   | 15 +++
 mm/memory.c | 16 
 5 files changed, 43 insertions(+), 11 deletions(-)

diff --git a/arch/Kconfig b/arch/Kconfig
index a336548487e69..3abbdb0cea447 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -363,9 +363,6 @@ config HAVE_ARCH_JUMP_LABEL
 config HAVE_RCU_TABLE_FREE
bool
 
-config HAVE_RCU_TABLE_INVALIDATE
-   bool
-
 config ARCH_HAVE_NMI_SAFE_CMPXCHG
bool
 
diff --git a/arch/powerpc/include/asm/tlb.h b/arch/powerpc/include/asm/tlb.h
index f0e571b2dc7c8..63418275f402e 100644
--- a/arch/powerpc/include/asm/tlb.h
+++ b/arch/powerpc/include/asm/tlb.h
@@ -30,6 +30,17 @@
 #define tlb_remove_check_page_size_change tlb_remove_check_page_size_change
 
 extern void tlb_flush(struct mmu_gather *tlb);
+/*
+ * book3s:
+ * Hash does not use the linux page-tables, so we can avoid
+ * the TLB invalidate for page-table freeing, Radix otoh does use the
+ * page-tables and needs the TLBI.
+ *
+ * nohash:
+ * We still do TLB invalidate in the __pte_free_tlb routine before we
+ * add the page table pages to mmu gather table batch.
+ */
+#define tlb_needs_table_invalidate()   radix_enabled()
 
 /* Get the generic bits... */
 #include 
diff --git a/arch/sparc/include/asm/tlb_64.h b/arch/sparc/include/asm/tlb_64.h
index a2f3fa61ee36a..8cb8f3833239a 100644
--- a/arch/sparc/include/asm/tlb_64.h
+++ b/arch/sparc/include/asm/tlb_64.h
@@ -28,6 +28,15 @@ void flush_tlb_pending(void);
 #define __tlb_remove_tlb_entry(tlb, ptep, address) do { } while (0)
 #define tlb_flush(tlb) flush_tlb_pending()
 
+/*
+ * SPARC64's hardware TLB fill does not use the Linux page-tables
+ * and therefore we don't need a TLBI when freeing page-table pages.
+ */
+
+#ifdef CONFIG_HAVE_RCU_TABLE_FREE
+#define tlb_needs_table_invalidate()   (false)
+#endif
+
 #include 
 
 #endif /* _SPARC64_TLB_H */
diff --git a/include/asm-generic/tlb.h b/include/asm-generic/tlb.h
index b3353e21f3b3e..92dcfd01e0ee4 100644
--- a/include/asm-generic/tlb.h
+++ b/include/asm-generic/tlb.h
@@ -61,8 +61,23 @@ struct mmu_table_batch {
 extern void tlb_table_flush(struct mmu_gather *tlb);
 extern void tlb_remove_table(struct mmu_gather *tlb, void *table);
 
+/*
+ * This allows an architecture that does not use the linux page-tables for
+ * hardware to skip the TLBI when freeing page tables.
+ */
+#ifndef tlb_needs_table_invalidate
+#define tlb_needs_table_invalidate() (true)
 #endif
 
+#else
+
+#ifdef tlb_needs_table_invalidate
+#error tlb_needs_table_invalidate() requires HAVE_RCU_TABLE_FREE
+#endif
+
+#endif /* CONFIG_HAVE_RCU_TABLE_FREE */
+
+
 /*
  * If we can't allocate a page to make a big batch of page pointers
  * to work on, then just handle a few from the on-stack structure.
diff --git a/mm/memory.c b/mm/memory.c
index bbf0cc4066c84..7656714c9b7c4 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -325,14 +325,14 @@ bool __tlb_remove_page_size(struct mmu_gather *tlb, 
struct page *page, int page_
  */
 static inline void tlb_table_invalidate(struct mmu_gather *tlb)
 {
-#ifdef CONFIG_HAVE_RCU_TABLE_INVALIDATE
-   /*
-* Invalidate page-table caches used by hardware walkers. Then we still
-* need to RCU-sched wait while freeing the pages because software
-* walkers can still be in-flight.
-*/
-   tlb_flush_mmu_tlbonly(tlb);
-#endif
+   if (tlb_needs_table_invalidate()) {
+   /*
+* Invalidate page-table caches used by hardware