[PATCH] powerpc/64/syscall: Disable sanitisers for C syscall entry/exit code

2020-05-28 Thread Daniel Axtens
syzkaller is picking up a bunch of crashes that look like this:

Unrecoverable exception 380 at c037ed60 (msr=80001031)
Oops: Unrecoverable exception, sig: 6 [#1]
LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
Modules linked in:
CPU: 0 PID: 874 Comm: syz-executor.0 Not tainted 
5.7.0-rc7-syzkaller-00016-gb0c3ba31be3e #0
NIP:  c037ed60 LR: c004bac8 CTR: c0030990
REGS: c000555a7230 TRAP: 0380   Not tainted  
(5.7.0-rc7-syzkaller-00016-gb0c3ba31be3e)
MSR:  80001031   CR: 48222882  XER: 2000
CFAR: c004bac4 IRQMASK: 0
GPR00: c004bb68 c000555a74c0 c24b3500 0005
GPR04:   c004bb88 c0080091
GPR08: 000b c004bac8 00016000 c2503500
GPR12: c0030990 c319 106a5898 106a
GPR16: 106a5890 c7a92000 c8180e00 c7a8f700
GPR20: c7a904b0 1011 c259d318 5deadbeef100
GPR24: 5deadbeef122 c00078422700 c9ee88b8 c00078422778
GPR28: 0001 8280b033  c000555a75a0
NIP [c037ed60] __sanitizer_cov_trace_pc+0x40/0x50
LR [c004bac8] interrupt_exit_kernel_prepare+0x118/0x310
Call Trace:
[c000555a74c0] [c004bb68] interrupt_exit_kernel_prepare+0x1b8/0x310 
(unreliable)
[c000555a7530] [c000f9a8] interrupt_return+0x118/0x1c0
--- interrupt: 900 at __sanitizer_cov_trace_pc+0x0/0x50
..

That looks like the KCOV helper accessing memory that's not safe to
access in the interrupt handling context.

Do not instrument the new syscall entry/exit code with KCOV, GCOV or
UBSAN.

Cc: Nicholas Piggin 
Fixes: 68b34588e202 ("powerpc/64/sycall: Implement syscall entry/exit logic in C")
Signed-off-by: Daniel Axtens 

---

Be warned: I haven't attempted to reproduce the crash yet,
nor have I been able to test that this fixes it. I will attempt to do
that soon. Logically though, it does seem like this would be a
good thing to do regardless.
---
 arch/powerpc/kernel/Makefile | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
index 1c4385852d3d..1d443a7dc8a7 100644
--- a/arch/powerpc/kernel/Makefile
+++ b/arch/powerpc/kernel/Makefile
@@ -156,12 +156,19 @@ obj-$(CONFIG_PPC_SECVAR_SYSFS) += secvar-sysfs.o
 GCOV_PROFILE_prom_init.o := n
 KCOV_INSTRUMENT_prom_init.o := n
 UBSAN_SANITIZE_prom_init.o := n
+
 GCOV_PROFILE_kprobes.o := n
 KCOV_INSTRUMENT_kprobes.o := n
 UBSAN_SANITIZE_kprobes.o := n
+
 GCOV_PROFILE_kprobes-ftrace.o := n
 KCOV_INSTRUMENT_kprobes-ftrace.o := n
 UBSAN_SANITIZE_kprobes-ftrace.o := n
+
+GCOV_PROFILE_syscall_64.o := n
+KCOV_INSTRUMENT_syscall_64.o := n
+UBSAN_SANITIZE_syscall_64.o := n
+
 UBSAN_SANITIZE_vdso.o := n
 
 # Necessary for booting with kcov enabled on book3e machines
-- 
2.20.1



[RFC PATCH 2/2] powerpc/pmem: Disable synchronous fault by default.

2020-05-28 Thread Aneesh Kumar K.V
This adds a kernel config option that controls whether MAP_SYNC is enabled by
default. With POWER10, the architecture adds new pmem flush and sync
instructions. The kernel should prevent the usage of MAP_SYNC if applications
are not using the new instructions on newer hardware.

This config option allows the user to control whether MAP_SYNC should be
enabled by default or not.

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/platforms/Kconfig.cputype | 9 +
 1 file changed, 9 insertions(+)

diff --git a/arch/powerpc/platforms/Kconfig.cputype b/arch/powerpc/platforms/Kconfig.cputype
index 27a81c291be8..f8694838ad4e 100644
--- a/arch/powerpc/platforms/Kconfig.cputype
+++ b/arch/powerpc/platforms/Kconfig.cputype
@@ -383,6 +383,15 @@ config PPC_KUEP
 
  If you're unsure, say Y.
 
+config ARCH_MAP_SYNC_DISABLE
+   bool "Disable synchronous fault support (MAP_SYNC)"
+   default y
+   help
+ Disable support for synchronous fault with nvdimm namespaces.
+
+ If you're unsure, say Y.
+
+
 config PPC_HAVE_KUAP
bool
 
-- 
2.26.2



[RFC PATCH 1/2] libnvdimm: Add prctl control for disabling synchronous fault support.

2020-05-28 Thread Aneesh Kumar K.V
With POWER10, the architecture adds new pmem flush and sync instructions.
The kernel should prevent the usage of MAP_SYNC if applications are not using
the new instructions on newer hardware.

This patch adds a prctl option MAP_SYNC_ENABLE that can be used to enable
the usage of MAP_SYNC. A kernel config option is added to allow the user
to control whether MAP_SYNC should be enabled by default or not.

Signed-off-by: Aneesh Kumar K.V 
---
 include/linux/sched/coredump.h | 13 ++---
 include/uapi/linux/prctl.h |  3 +++
 kernel/fork.c  |  8 +++-
 kernel/sys.c   | 18 ++
 mm/Kconfig |  3 +++
 mm/mmap.c  |  4 
 6 files changed, 45 insertions(+), 4 deletions(-)

diff --git a/include/linux/sched/coredump.h b/include/linux/sched/coredump.h
index ecdc6542070f..9ba6b3d5f991 100644
--- a/include/linux/sched/coredump.h
+++ b/include/linux/sched/coredump.h
@@ -72,9 +72,16 @@ static inline int get_dumpable(struct mm_struct *mm)
 #define MMF_DISABLE_THP24  /* disable THP for all VMAs */
 #define MMF_OOM_VICTIM 25  /* mm is the oom victim */
 #define MMF_OOM_REAP_QUEUED26  /* mm was queued for oom_reaper */
-#define MMF_DISABLE_THP_MASK   (1 << MMF_DISABLE_THP)
+#define MMF_DISABLE_MAP_SYNC   27  /* disable MAP_SYNC for all VMAs */
+#define MMF_DISABLE_THP_MASK   (1 << MMF_DISABLE_THP)
+#define MMF_DISABLE_MAP_SYNC_MASK  (1 << MMF_DISABLE_MAP_SYNC)
 
-#define MMF_INIT_MASK  (MMF_DUMPABLE_MASK | MMF_DUMP_FILTER_MASK |\
-MMF_DISABLE_THP_MASK)
+#define MMF_INIT_MASK  (MMF_DUMPABLE_MASK | MMF_DUMP_FILTER_MASK | \
+   MMF_DISABLE_THP_MASK | MMF_DISABLE_MAP_SYNC_MASK)
+
+static inline bool map_sync_enabled(struct mm_struct *mm)
+{
+   return !(mm->flags & MMF_DISABLE_MAP_SYNC_MASK);
+}
 
 #endif /* _LINUX_SCHED_COREDUMP_H */
diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h
index 07b4f8131e36..ee4cde32d5cf 100644
--- a/include/uapi/linux/prctl.h
+++ b/include/uapi/linux/prctl.h
@@ -238,4 +238,7 @@ struct prctl_mm_map {
 #define PR_SET_IO_FLUSHER  57
 #define PR_GET_IO_FLUSHER  58
 
+#define PR_SET_MAP_SYNC_ENABLE 59
+#define PR_GET_MAP_SYNC_ENABLE 60
+
 #endif /* _LINUX_PRCTL_H */
diff --git a/kernel/fork.c b/kernel/fork.c
index 8c700f881d92..d5a9a363e81e 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -963,6 +963,12 @@ __cacheline_aligned_in_smp DEFINE_SPINLOCK(mmlist_lock);
 
 static unsigned long default_dump_filter = MMF_DUMP_FILTER_DEFAULT;
 
+#ifdef CONFIG_ARCH_MAP_SYNC_DISABLE
+unsigned long default_map_sync_mask = MMF_DISABLE_MAP_SYNC_MASK;
+#else
+unsigned long default_map_sync_mask = 0;
+#endif
+
 static int __init coredump_filter_setup(char *s)
 {
default_dump_filter =
@@ -1039,7 +1045,7 @@ static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct *p,
mm->flags = current->mm->flags & MMF_INIT_MASK;
mm->def_flags = current->mm->def_flags & VM_INIT_DEF_MASK;
} else {
-   mm->flags = default_dump_filter;
+   mm->flags = default_dump_filter | default_map_sync_mask;
mm->def_flags = 0;
}
 
diff --git a/kernel/sys.c b/kernel/sys.c
index d325f3ab624a..f6127cf4128b 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -2450,6 +2450,24 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
clear_bit(MMF_DISABLE_THP, &me->mm->flags);
up_write(&me->mm->mmap_sem);
break;
+
+   case PR_GET_MAP_SYNC_ENABLE:
+   if (arg2 || arg3 || arg4 || arg5)
+   return -EINVAL;
+   error = !test_bit(MMF_DISABLE_MAP_SYNC, &me->mm->flags);
+   break;
+   case PR_SET_MAP_SYNC_ENABLE:
+   if (arg3 || arg4 || arg5)
+   return -EINVAL;
+   if (down_write_killable(&me->mm->mmap_sem))
+   return -EINTR;
+   if (arg2)
+   clear_bit(MMF_DISABLE_MAP_SYNC, &me->mm->flags);
+   else
+   set_bit(MMF_DISABLE_MAP_SYNC, &me->mm->flags);
+   up_write(&me->mm->mmap_sem);
+   break;
+
case PR_MPX_ENABLE_MANAGEMENT:
case PR_MPX_DISABLE_MANAGEMENT:
/* No longer implemented: */
diff --git a/mm/Kconfig b/mm/Kconfig
index c1acc34c1c35..38fd7cfbfca8 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -867,4 +867,7 @@ config ARCH_HAS_HUGEPD
 config MAPPING_DIRTY_HELPERS
 bool
 
+config ARCH_MAP_SYNC_DISABLE
+   bool
+
 endmenu
diff --git a/mm/mmap.c b/mm/mmap.c
index f609e9ec4a25..613e5894f178 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1464,6 +1464,10 @@ unsigned long do_mmap(struct file *file, unsigned long addr,
case MAP_SHARED_VAL

[PATCH v4 8/8] powerpc/pmem: Initialize pmem device on newer hardware

2020-05-28 Thread Aneesh Kumar K.V
With the kernel now supporting the new pmem flush/sync instructions, we can
enable it to initialize these devices. On P10 these devices will appear with
a new compatible string. For the PAPR device we have

compatible   "ibm,pmemory-v2"

and for OF pmem device we have

compatible   "pmem-region-v2"

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/platforms/pseries/papr_scm.c | 1 +
 drivers/nvdimm/of_pmem.c  | 1 +
 2 files changed, 2 insertions(+)

diff --git a/arch/powerpc/platforms/pseries/papr_scm.c b/arch/powerpc/platforms/pseries/papr_scm.c
index ad506e7003c9..407e08f08157 100644
--- a/arch/powerpc/platforms/pseries/papr_scm.c
+++ b/arch/powerpc/platforms/pseries/papr_scm.c
@@ -483,6 +483,7 @@ static int papr_scm_remove(struct platform_device *pdev)
 
 static const struct of_device_id papr_scm_match[] = {
{ .compatible = "ibm,pmemory" },
+   { .compatible = "ibm,pmemory-v2" },
{ },
 };
 
diff --git a/drivers/nvdimm/of_pmem.c b/drivers/nvdimm/of_pmem.c
index 6826a274a1f1..10dbdcdfb9ce 100644
--- a/drivers/nvdimm/of_pmem.c
+++ b/drivers/nvdimm/of_pmem.c
@@ -90,6 +90,7 @@ static int of_pmem_region_remove(struct platform_device *pdev)
 
 static const struct of_device_id of_pmem_region_match[] = {
{ .compatible = "pmem-region" },
+   { .compatible = "pmem-region-v2" },
{ },
 };
 
-- 
2.26.2



[PATCH v4 7/8] powerpc/book3s/pmem: Add WARN_ONCE to catch the wrong usage of pmem flush functions.

2020-05-28 Thread Aneesh Kumar K.V
We only support persistent memory on P8 and above. This is enforced by the
firmware and further checked on virtualized platforms during platform init.
Add WARN_ONCE in the pmem flush routines to catch wrong usage of these functions.

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/cacheflush.h | 2 ++
 arch/powerpc/lib/pmem.c   | 2 ++
 2 files changed, 4 insertions(+)

diff --git a/arch/powerpc/include/asm/cacheflush.h b/arch/powerpc/include/asm/cacheflush.h
index bc3ea009cf14..865fae8a226e 100644
--- a/arch/powerpc/include/asm/cacheflush.h
+++ b/arch/powerpc/include/asm/cacheflush.h
@@ -125,6 +125,8 @@ static inline void  arch_pmem_flush_barrier(void)
 {
if (cpu_has_feature(CPU_FTR_ARCH_207S))
asm volatile(PPC_PHWSYNC ::: "memory");
+   else
+   WARN_ONCE(1, "Using pmem flush on older hardware.");
 }
 #endif /* __KERNEL__ */
 
diff --git a/arch/powerpc/lib/pmem.c b/arch/powerpc/lib/pmem.c
index 21210fa676e5..f40bd908d28d 100644
--- a/arch/powerpc/lib/pmem.c
+++ b/arch/powerpc/lib/pmem.c
@@ -37,12 +37,14 @@ static inline void clean_pmem_range(unsigned long start, unsigned long stop)
 {
if (cpu_has_feature(CPU_FTR_ARCH_207S))
return __clean_pmem_range(start, stop);
+   WARN_ONCE(1, "Using pmem flush on older hardware.");
 }
 
 static inline void flush_pmem_range(unsigned long start, unsigned long stop)
 {
if (cpu_has_feature(CPU_FTR_ARCH_207S))
return __flush_pmem_range(start, stop);
+   WARN_ONCE(1, "Using pmem flush on older hardware.");
 }
 
 /*
-- 
2.26.2



[PATCH v4 6/8] powerpc/pmem: Avoid the barrier in flush routines

2020-05-28 Thread Aneesh Kumar K.V
nvdimm expects the flush routines to just mark the cache lines clean. The
barrier that makes the stores globally visible is issued in nvdimm_flush().

Update the papr_scm driver with a simplified nvdimm_flush callback that
performs only the required barrier.

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/lib/pmem.c   |  6 --
 arch/powerpc/platforms/pseries/papr_scm.c | 13 +
 2 files changed, 13 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/lib/pmem.c b/arch/powerpc/lib/pmem.c
index 5a61aaeb6930..21210fa676e5 100644
--- a/arch/powerpc/lib/pmem.c
+++ b/arch/powerpc/lib/pmem.c
@@ -19,9 +19,6 @@ static inline void __clean_pmem_range(unsigned long start, unsigned long stop)
 
for (i = 0; i < size >> shift; i++, addr += bytes)
asm volatile(PPC_DCBSTPS(%0, %1): :"i"(0), "r"(addr): "memory");
-
-
-   asm volatile(PPC_PHWSYNC ::: "memory");
 }
 
 static inline void __flush_pmem_range(unsigned long start, unsigned long stop)
@@ -34,9 +31,6 @@ static inline void __flush_pmem_range(unsigned long start, unsigned long stop)
 
for (i = 0; i < size >> shift; i++, addr += bytes)
asm volatile(PPC_DCBFPS(%0, %1): :"i"(0), "r"(addr): "memory");
-
-
-   asm volatile(PPC_PHWSYNC ::: "memory");
 }
 
 static inline void clean_pmem_range(unsigned long start, unsigned long stop)
diff --git a/arch/powerpc/platforms/pseries/papr_scm.c b/arch/powerpc/platforms/pseries/papr_scm.c
index f35592423380..ad506e7003c9 100644
--- a/arch/powerpc/platforms/pseries/papr_scm.c
+++ b/arch/powerpc/platforms/pseries/papr_scm.c
@@ -285,6 +285,18 @@ static int papr_scm_ndctl(struct nvdimm_bus_descriptor *nd_desc,
 
return 0;
 }
+/*
+ * We have made sure the pmem writes are done such that before calling this
+ * all the caches are flushed/clean. We use dcbf/dcbfps to ensure this. Here
+ * we just need to add the necessary barrier to make sure the above flushes
+ * have updated persistent storage before any data access or data transfer
+ * caused by subsequent instructions is initiated.
+ */
+static int papr_scm_flush_sync(struct nd_region *nd_region, struct bio *bio)
+{
+   arch_pmem_flush_barrier();
+   return 0;
+}
 
 static int papr_scm_nvdimm_init(struct papr_scm_priv *p)
 {
@@ -340,6 +352,7 @@ static int papr_scm_nvdimm_init(struct papr_scm_priv *p)
ndr_desc.mapping = &mapping;
ndr_desc.num_mappings = 1;
ndr_desc.nd_set = &p->nd_set;
+   ndr_desc.flush = papr_scm_flush_sync;
 
if (p->is_volatile)
p->region = nvdimm_volatile_region_create(p->bus, &ndr_desc);
-- 
2.26.2



[PATCH v4 4/8] libnvdimm/nvdimm/flush: Allow architecture to override the flush barrier

2020-05-28 Thread Aneesh Kumar K.V
Architectures like ppc64 provide persistent memory specific barriers
that will ensure that all stores for which the modifications are
written to persistent storage by preceding dcbfps and dcbstps
instructions have updated persistent storage before any data
access or data transfer caused by subsequent instructions is initiated.
This is in addition to the ordering done by wmb().

Update the nvdimm core such that architectures can use barriers other than
wmb() to ensure all previous writes are architecturally visible for
the platform buffer flush.

Signed-off-by: Aneesh Kumar K.V 
---
 drivers/md/dm-writecache.c   | 2 +-
 drivers/nvdimm/region_devs.c | 8 
 include/linux/libnvdimm.h| 4 
 3 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/drivers/md/dm-writecache.c b/drivers/md/dm-writecache.c
index 613c171b1b6d..904fdbf2b089 100644
--- a/drivers/md/dm-writecache.c
+++ b/drivers/md/dm-writecache.c
@@ -540,7 +540,7 @@ static void ssd_commit_superblock(struct dm_writecache *wc)
static void writecache_commit_flushed(struct dm_writecache *wc, bool wait_for_ios)
 {
if (WC_MODE_PMEM(wc))
-   wmb();
+   arch_pmem_flush_barrier();
else
ssd_commit_flushed(wc, wait_for_ios);
 }
diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
index ccbb5b43b8b2..88ea34a9c7fd 100644
--- a/drivers/nvdimm/region_devs.c
+++ b/drivers/nvdimm/region_devs.c
@@ -1216,13 +1216,13 @@ int generic_nvdimm_flush(struct nd_region *nd_region)
idx = this_cpu_add_return(flush_idx, hash_32(current->pid + idx, 8));
 
/*
-* The first wmb() is needed to 'sfence' all previous writes
-* such that they are architecturally visible for the platform
-* buffer flush.  Note that we've already arranged for pmem
+* The first arch_pmem_flush_barrier() is needed to 'sfence' all
+* previous writes such that they are architecturally visible for
+* the platform buffer flush. Note that we've already arranged for pmem
 * writes to avoid the cache via memcpy_flushcache().  The final
 * wmb() ensures ordering for the NVDIMM flush write.
 */
-   wmb();
+   arch_pmem_flush_barrier();
for (i = 0; i < nd_region->ndr_mappings; i++)
if (ndrd_get_flush_wpq(ndrd, i, 0))
writeq(1, ndrd_get_flush_wpq(ndrd, i, idx));
diff --git a/include/linux/libnvdimm.h b/include/linux/libnvdimm.h
index 18da4059be09..66f6c65bd789 100644
--- a/include/linux/libnvdimm.h
+++ b/include/linux/libnvdimm.h
@@ -286,4 +286,8 @@ static inline void arch_invalidate_pmem(void *addr, size_t size)
 }
 #endif
 
+#ifndef arch_pmem_flush_barrier
+#define arch_pmem_flush_barrier() wmb()
+#endif
+
 #endif /* __LIBNVDIMM_H__ */
-- 
2.26.2



[PATCH v4 5/8] powerpc/pmem/of_pmem: Update of_pmem to use the new barrier instruction.

2020-05-28 Thread Aneesh Kumar K.V
of_pmem on POWER10 can now use phwsync instead of hwsync to ensure
all previous writes are architecturally visible for the platform
buffer flush.

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/cacheflush.h | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/arch/powerpc/include/asm/cacheflush.h b/arch/powerpc/include/asm/cacheflush.h
index e92191b390f3..bc3ea009cf14 100644
--- a/arch/powerpc/include/asm/cacheflush.h
+++ b/arch/powerpc/include/asm/cacheflush.h
@@ -119,6 +119,13 @@ static inline void invalidate_dcache_range(unsigned long start,
 #define copy_from_user_page(vma, page, vaddr, dst, src, len) \
memcpy(dst, src, len)
 
+
+#define arch_pmem_flush_barrier arch_pmem_flush_barrier
+static inline void  arch_pmem_flush_barrier(void)
+{
+   if (cpu_has_feature(CPU_FTR_ARCH_207S))
+   asm volatile(PPC_PHWSYNC ::: "memory");
+}
 #endif /* __KERNEL__ */
 
 #endif /* _ASM_POWERPC_CACHEFLUSH_H */
-- 
2.26.2



[PATCH v4 3/8] powerpc/pmem: Add flush routines using new pmem store and sync instruction

2020-05-28 Thread Aneesh Kumar K.V
Start using the dcbstps; phwsync; sequence for flushing persistent memory
ranges. The new instructions are implemented as variants of dcbf and hwsync,
so on P8 and P9 they will be executed as those instructions. We avoid using
them on older hardware, which helps avoid difficult-to-debug issues.

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/lib/pmem.c | 50 +
 1 file changed, 46 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/lib/pmem.c b/arch/powerpc/lib/pmem.c
index 0666a8d29596..5a61aaeb6930 100644
--- a/arch/powerpc/lib/pmem.c
+++ b/arch/powerpc/lib/pmem.c
@@ -9,20 +9,62 @@
 
 #include 
 
+static inline void __clean_pmem_range(unsigned long start, unsigned long stop)
+{
+   unsigned long shift = l1_dcache_shift();
+   unsigned long bytes = l1_dcache_bytes();
+   void *addr = (void *)(start & ~(bytes - 1));
+   unsigned long size = stop - (unsigned long)addr + (bytes - 1);
+   unsigned long i;
+
+   for (i = 0; i < size >> shift; i++, addr += bytes)
+   asm volatile(PPC_DCBSTPS(%0, %1): :"i"(0), "r"(addr): "memory");
+
+
+   asm volatile(PPC_PHWSYNC ::: "memory");
+}
+
+static inline void __flush_pmem_range(unsigned long start, unsigned long stop)
+{
+   unsigned long shift = l1_dcache_shift();
+   unsigned long bytes = l1_dcache_bytes();
+   void *addr = (void *)(start & ~(bytes - 1));
+   unsigned long size = stop - (unsigned long)addr + (bytes - 1);
+   unsigned long i;
+
+   for (i = 0; i < size >> shift; i++, addr += bytes)
+   asm volatile(PPC_DCBFPS(%0, %1): :"i"(0), "r"(addr): "memory");
+
+
+   asm volatile(PPC_PHWSYNC ::: "memory");
+}
+
+static inline void clean_pmem_range(unsigned long start, unsigned long stop)
+{
+   if (cpu_has_feature(CPU_FTR_ARCH_207S))
+   return __clean_pmem_range(start, stop);
+}
+
+static inline void flush_pmem_range(unsigned long start, unsigned long stop)
+{
+   if (cpu_has_feature(CPU_FTR_ARCH_207S))
+   return __flush_pmem_range(start, stop);
+}
+
 /*
  * CONFIG_ARCH_HAS_PMEM_API symbols
  */
 void arch_wb_cache_pmem(void *addr, size_t size)
 {
unsigned long start = (unsigned long) addr;
-   flush_dcache_range(start, start + size);
+   clean_pmem_range(start, start + size);
 }
 EXPORT_SYMBOL_GPL(arch_wb_cache_pmem);
 
 void arch_invalidate_pmem(void *addr, size_t size)
 {
unsigned long start = (unsigned long) addr;
-   flush_dcache_range(start, start + size);
+   flush_pmem_range(start, start + size);
 }
 EXPORT_SYMBOL_GPL(arch_invalidate_pmem);
 
@@ -35,7 +77,7 @@ long __copy_from_user_flushcache(void *dest, const void __user *src,
unsigned long copied, start = (unsigned long) dest;
 
copied = __copy_from_user(dest, src, size);
-   flush_dcache_range(start, start + size);
+   clean_pmem_range(start, start + size);
 
return copied;
 }
@@ -45,7 +87,7 @@ void *memcpy_flushcache(void *dest, const void *src, size_t size)
unsigned long start = (unsigned long) dest;
 
memcpy(dest, src, size);
-   flush_dcache_range(start, start + size);
+   clean_pmem_range(start, start + size);
 
return dest;
 }
-- 
2.26.2



[PATCH v4 2/8] powerpc/pmem: Add new instructions for persistent storage and sync

2020-05-28 Thread Aneesh Kumar K.V
POWER10 introduces two new variants of dcbf instructions (dcbstps and dcbfps)
that can be used to write modified locations back to persistent storage.

Additionally, POWER10 also introduces phwsync and plwsync, which can be used
to establish the order of these writes to persistent storage.

This patch exposes these instructions to the rest of the kernel. The existing
dcbf and hwsync instructions in P8 and P9 are adequate to enable appropriate
synchronization with OpenCAPI-hosted persistent storage. Hence the new
instructions are added as a variant of the old ones that old hardware
won't differentiate.

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/ppc-opcode.h | 12 
 1 file changed, 12 insertions(+)

diff --git a/arch/powerpc/include/asm/ppc-opcode.h b/arch/powerpc/include/asm/ppc-opcode.h
index 2a39c716c343..1ad014e4633e 100644
--- a/arch/powerpc/include/asm/ppc-opcode.h
+++ b/arch/powerpc/include/asm/ppc-opcode.h
@@ -219,6 +219,8 @@
 #define PPC_INST_STWCX 0x7c00012d
 #define PPC_INST_LWSYNC0x7c2004ac
 #define PPC_INST_SYNC  0x7c0004ac
+#define PPC_INST_PHWSYNC   0x7c8004ac
+#define PPC_INST_PLWSYNC   0x7ca004ac
 #define PPC_INST_SYNC_MASK 0xfc0007fe
 #define PPC_INST_ISYNC 0x4c00012c
 #define PPC_INST_LXVD2X0x7c000698
@@ -284,6 +286,8 @@
 #define PPC_INST_TABORT0x7c00071d
 #define PPC_INST_TSR   0x7c0005dd
 
+#define PPC_INST_DCBF  0x7c0000ac
+
 #define PPC_INST_NAP   0x4c000364
 #define PPC_INST_SLEEP 0x4c0003a4
 #define PPC_INST_WINKLE0x4c0003e4
@@ -532,6 +536,14 @@
 #define STBCIX(s,a,b)  stringify_in_c(.long PPC_INST_STBCIX | \
   __PPC_RS(s) | __PPC_RA(a) | __PPC_RB(b))
 
+#define PPC_DCBFPS(a, b)	stringify_in_c(.long PPC_INST_DCBF | \
+				___PPC_RA(a) | ___PPC_RB(b) | (4 << 21))
+#define PPC_DCBSTPS(a, b)	stringify_in_c(.long PPC_INST_DCBF | \
+				___PPC_RA(a) | ___PPC_RB(b) | (6 << 21))
+
+#define PPC_PHWSYNC		stringify_in_c(.long PPC_INST_PHWSYNC)
+#define PPC_PLWSYNC		stringify_in_c(.long PPC_INST_PLWSYNC)
+
 /*
  * Define what the VSX XX1 form instructions will look like, then add
  * the 128 bit load store instructions based on that.
-- 
2.26.2



[PATCH v4 0/8] Support new pmem flush and sync instructions for POWER

2020-05-28 Thread Aneesh Kumar K.V
This patch series enables the usage of new pmem flush and sync instructions
on the POWER architecture. POWER10 introduces two new variants of dcbf
instructions (dcbstps and dcbfps) that can be used to write modified
locations back to persistent storage. Additionally, POWER10 also introduces
phwsync and plwsync, which can be used to establish the order of these
writes to persistent storage.

This series exposes these instructions to the rest of the kernel. The existing
dcbf and hwsync instructions in P8 and P9 are adequate to enable appropriate
synchronization with OpenCAPI-hosted persistent storage. Hence the new
instructions are added as variants of the old ones that old hardware
won't differentiate.

On POWER10, pmem devices will be represented by different device tree compat
strings. This ensures that older kernels won't initialize pmem devices on
POWER10.

Changes from V3:
* Add new compat string to be used for the device.
* Use arch_pmem_flush_barrier() in dm-writecache.

Aneesh Kumar K.V (8):
  powerpc/pmem: Restrict papr_scm to P8 and above.
  powerpc/pmem: Add new instructions for persistent storage and sync
  powerpc/pmem: Add flush routines using new pmem store and sync
instruction
  libnvdimm/nvdimm/flush: Allow architecture to override the flush
barrier
  powerpc/pmem/of_pmem: Update of_pmem to use the new barrier
instruction.
  powerpc/pmem: Avoid the barrier in flush routines
  powerpc/book3s/pmem: Add WARN_ONCE to catch the wrong usage of pmem
flush functions.
  powerpc/pmem: Initialize pmem device on newer hardware

 arch/powerpc/include/asm/cacheflush.h |  9 +
 arch/powerpc/include/asm/ppc-opcode.h | 12 ++
 arch/powerpc/lib/pmem.c   | 46 +--
 arch/powerpc/platforms/pseries/papr_scm.c | 14 +++
 arch/powerpc/platforms/pseries/pmem.c |  6 +++
 drivers/md/dm-writecache.c|  2 +-
 drivers/nvdimm/of_pmem.c  |  1 +
 drivers/nvdimm/region_devs.c  |  8 ++--
 include/linux/libnvdimm.h |  4 ++
 9 files changed, 93 insertions(+), 9 deletions(-)

-- 
2.26.2



[PATCH v4 1/8] powerpc/pmem: Restrict papr_scm to P8 and above.

2020-05-28 Thread Aneesh Kumar K.V
The PAPR based virtualized persistent memory devices are only supported on
POWER9 and above. In a followup patch, the kernel will switch the persistent
memory cache flush functions to use new `dcbf` variant instructions. The new
instructions, even though added in ISA 3.1, work even on P8 and P9 because
they are implemented as variants of the existing `dcbf` and `hwsync` and
behave as such there.

Considering these devices are only supported on P8 and above, update the
driver to prevent a P7-compat guest from using persistent memory devices.

We don't update the of_pmem driver with the same condition because, on
bare metal, the firmware enables pmem support only on P9 and above. There the
kernel depends on OPAL firmware to restrict exposing persistent memory
related device tree entries on older hardware. of_pmem.ko is written without
any arch dependency and we don't want to add a ppc64-specific cpu feature
check in the of_pmem driver.

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/platforms/pseries/pmem.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/arch/powerpc/platforms/pseries/pmem.c b/arch/powerpc/platforms/pseries/pmem.c
index f860a897a9e0..2347e1038f58 100644
--- a/arch/powerpc/platforms/pseries/pmem.c
+++ b/arch/powerpc/platforms/pseries/pmem.c
@@ -147,6 +147,12 @@ const struct of_device_id drc_pmem_match[] = {
 
 static int pseries_pmem_init(void)
 {
+   /*
+* Only supported on POWER8 and above.
+*/
+   if (!cpu_has_feature(CPU_FTR_ARCH_207S))
+   return 0;
+
pmem_node = of_find_node_by_type(NULL, "ibm,persistent-memory");
if (!pmem_node)
return 0;
-- 
2.26.2



Re: [PATCH] powerpc/xive: Enforce load-after-store ordering when StoreEOI is active

2020-05-28 Thread Michael Ellerman
On Thu, 2020-02-20 at 08:15:06 UTC, Cédric Le Goater wrote:
> When an interrupt has been handled, the OS notifies the interrupt
> controller with a EOI sequence. On a POWER9 system using the XIVE
> interrupt controller, this can be done with a load or a store
> operation on the ESB interrupt management page of the interrupt. The
> StoreEOI operation has less latency and improves interrupt handling
> performance but it was deactivated during the POWER9 DD2.0 timeframe
> because of ordering issues. We use the LoadEOI today but we plan to
> reactivate StoreEOI in future architectures.
> 
> There is usually no need to enforce ordering between ESB load and
> store operations as they should lead to the same result. E.g. a store
> trigger and a load EOI can be executed in any order. Assuming the
> interrupt state is PQ=10, a store trigger followed by a load EOI will
> return a Q bit. In the reverse order, it will create a new interrupt
> trigger from HW. In both cases, the handler processing interrupts is
> notified.
> 
> In some cases, the XIVE_ESB_SET_PQ_10 load operation is used to
> disable temporarily the interrupt source (mask/unmask). When the
> source is reenabled, the OS can detect if interrupts were received
> while the source was disabled and reinject them. This process needs
> special care when StoreEOI is activated. The ESB load and store
> operations should be correctly ordered because a XIVE_ESB_STORE_EOI
> operation could leave the source enabled if it has not completed
> before the loads.
> 
> For those cases, we enforce Load-after-Store ordering with a special
> load operation offset. To avoid performance impact, this ordering is
> only enforced when really needed, that is when interrupt sources are
> temporarily disabled with the XIVE_ESB_SET_PQ_10 load. It should not
> be needed for other loads.
> 
> Signed-off-by: Cédric Le Goater 

Applied to powerpc topic/ppc-kvm, thanks.

https://git.kernel.org/powerpc/c/b1f9be9392f090f08e4ad9e2c68963aeff03bd67

cheers


Re: [PATCH] powerpc/uaccess: Don't use "m<>" constraint

2020-05-28 Thread Michael Ellerman
On Thu, 2020-05-07 at 12:33:24 UTC, Michael Ellerman wrote:
> The "m<>" constraint breaks compilation with GCC 4.6.x era compilers.
> 
> The use of the constraint allows the compiler to use update-form
> instructions, however in practice current compilers never generate
> those forms for any of the current uses of __put_user_asm_goto().
> 
> We anticipate that GCC 4.6 will be declared unsupported for building
> the kernel in the not too distant future. So for now just switch to
> the "m" constraint.
> 
> Fixes: 334710b1496a ("powerpc/uaccess: Implement unsafe_put_user() using 'asm goto'")
> Signed-off-by: Michael Ellerman 

Applied to powerpc topic/uaccess-ppc.

https://git.kernel.org/powerpc/c/e2a8b49e79553bd8ec48f73cead84e6146c09408

cheers


Re: [PATCH v4 2/2] powerpc/uaccess: Implement unsafe_copy_to_user() as a simple loop

2020-05-28 Thread Michael Ellerman
On Fri, 2020-04-17 at 17:08:52 UTC, Christophe Leroy wrote:
> At the time being, unsafe_copy_to_user() is based on
> raw_copy_to_user() which calls __copy_tofrom_user().
> 
> __copy_tofrom_user() is a big optimised function to copy big amount
> of data. It aligns destinations to cache line in order to use
> dcbz instruction.
> 
> Today unsafe_copy_to_user() is called only from filldir().
> It is used to mainly copy small amount of data like filenames,
> so __copy_tofrom_user() is not fit.
> 
> Also, unsafe_copy_to_user() is used within user_access_begin/end
> sections. In those section, it is preferable to not call functions.
> 
> Rewrite unsafe_copy_to_user() as a macro that uses __put_user_goto().
> We first perform a loop of long, then we finish with necessary
> complements.
> 
> unsafe_copy_to_user() might be used in the near future to copy
> fixed-size data, like pt_regs structs during signal processing.
> Having it as a macro allows GCC to optimise it for instead when
> it knows the size in advance, it can unloop loops, drop complements
> when the size is a multiple of longs, etc ...
> 
> Signed-off-by: Christophe Leroy 

Applied to powerpc topic/uaccess-ppc, thanks.

https://git.kernel.org/powerpc/c/17bc43367fc2a720400d21c745db641c654c1e6b

cheers


Re: [PATCH v4 1/2] powerpc/uaccess: Implement unsafe_put_user() using 'asm goto'

2020-05-28 Thread Michael Ellerman
On Fri, 2020-04-17 at 17:08:51 UTC, Christophe Leroy wrote:
> unsafe_put_user() is designed to take benefit of 'asm goto'.
> 
> Instead of using the standard __put_user() approach and branch
> based on the returned error, use 'asm goto' and make the
> exception code branch directly to the error label. There is
> no code anymore in the fixup section.
> 
> This change significantly simplifies functions using
> unsafe_put_user()
...
> 
> Signed-off-by: Christophe Leroy 
> Reviewed-by: Segher Boessenkool 

Applied to powerpc topic/uaccess-ppc, thanks.

https://git.kernel.org/powerpc/c/334710b1496af8a0960e70121f850e209c20958f

cheers


Re: [PATCH v3] powerpc/uaccess: evaluate macro arguments once, before user access is allowed

2020-05-28 Thread Michael Ellerman
On Tue, 2020-04-07 at 04:12:45 UTC, Nicholas Piggin wrote:
> get/put_user can be called with nontrivial arguments. fs/proc/page.c
> has a good example:
> 
> if (put_user(stable_page_flags(ppage), out)) {
> 
> stable_page_flags is quite a lot of code, including spin locks in the
> page allocator.
> 
> Ensure these arguments are evaluated before user access is allowed.
> This improves security by reducing code with access to userspace, but
> it also fixes a PREEMPT bug with KUAP on powerpc/64s:
> stable_page_flags is currently called with AMR set to allow writes,
> it ends up calling spin_unlock(), which can call preempt_schedule. But
> the task switch code cannot be called with AMR set (it relies on
> interrupts saving the register), so this blows up.
> 
> It's fine if the code inside allow_user_access is preemptible, because
> a timer or IPI will save the AMR, but it's not okay to explicitly
> cause a reschedule.
> 
> Signed-off-by: Nicholas Piggin 
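The fix described above is a change in macro evaluation order. This hedged userspace sketch shows the principle; `put_user_model()` and the flag-flipping allow/prevent calls are illustrative stand-ins, not the kernel's KUAP implementation.

```c
/* Evaluate the value expression into a temporary *before* opening
 * the user-access window, so arbitrary code in the argument (like
 * stable_page_flags() in the commit message) never runs with user
 * access allowed. */
static int access_open;
static int ran_with_access_open;

static int expensive_expr(void)
{
	if (access_open)		/* would be the bug: arbitrary  */
		ran_with_access_open = 1;	/* code inside the window */
	return 42;
}

#define put_user_model(x, ptr)					\
do {								\
	__typeof__(x) __val = (x);	/* evaluated first */	\
	access_open = 1;	/* allow_user_access()   */	\
	*(ptr) = __val;						\
	access_open = 0;	/* prevent_user_access() */	\
} while (0)
```

Because `__val` is assigned before the window opens, anything the argument does — taking locks, scheduling — happens with user access still blocked.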

Applied to powerpc topic/uaccess-ppc, thanks.

https://git.kernel.org/powerpc/c/d02f6b7dab8228487268298ea1f21081c0b4b3eb

cheers


Re: [PATCH v2 4/5] powerpc/uaccess: Implement user_read_access_begin and user_write_access_begin

2020-05-28 Thread Michael Ellerman
On Fri, 2020-04-03 at 07:20:53 UTC, Christophe Leroy wrote:
> Add support for selective read or write user access with
> user_read_access_begin/end and user_write_access_begin/end.
> 
> Signed-off-by: Christophe Leroy 
> Reviewed-by: Kees Cook 

Applied to powerpc topic/uaccess-ppc, thanks.

https://git.kernel.org/powerpc/c/4fe5cda9f89d0aea8e915b7c96ae34bda4e12e51

cheers


Re: [PATCH v2 3/5] drm/i915/gem: Replace user_access_begin by user_write_access_begin

2020-05-28 Thread Michael Ellerman
On Fri, 2020-04-03 at 07:20:52 UTC, Christophe Leroy wrote:
> When i915_gem_execbuffer2_ioctl() is using user_access_begin(),
> that's only to perform unsafe_put_user(), so use
> user_write_access_begin() in order to only open write access.
> 
> Signed-off-by: Christophe Leroy 
> Reviewed-by: Kees Cook 

Applied to powerpc topic/uaccess, thanks.

https://git.kernel.org/powerpc/c/b44f687386875b714dae2afa768e73401e45c21c

cheers


Re: [PATCH v2 2/5] uaccess: Selectively open read or write user access

2020-05-28 Thread Michael Ellerman
On Fri, 2020-04-03 at 07:20:51 UTC, Christophe Leroy wrote:
> When opening user access to only perform reads, only open read access.
> When opening user access to only perform writes, only open write
> access.
> 
> Signed-off-by: Christophe Leroy 
> Reviewed-by: Kees Cook 

Applied to powerpc topic/uaccess, thanks.

https://git.kernel.org/powerpc/c/41cd780524674082b037e7c8461f90c5e42103f0

cheers


Re: [PATCH v2 1/5] uaccess: Add user_read_access_begin/end and user_write_access_begin/end

2020-05-28 Thread Michael Ellerman
On Fri, 2020-04-03 at 07:20:50 UTC, Christophe Leroy wrote:
> Some architectures like powerpc64 have the capability to separate
> read access and write access protection.
> For get_user() and copy_from_user(), powerpc64 only opens read access.
> For put_user() and copy_to_user(), powerpc64 only opens write access.
> But when using unsafe_get_user() or unsafe_put_user(),
> user_access_begin opens both read and write.
> 
> Other architectures like powerpc book3s 32-bit only allow write
> access protection. And on this architecture protection is a heavy
> operation as it requires locking/unlocking per segment of 256 Mbytes.
> On those architectures it is therefore desirable to do the unlocking
> only for write access. (Note that book3s/32 ranges from very old
> PowerMacs from the 90s with the PowerPC 601 processor to modern
> ADSL boxes with PowerQUICC II processors, for instance, so it
> is still worth considering.)
> 
> In order to avoid any risk of hacked variable parameters
> passed to user_access_begin/end leaving user access open or
> opening too much, it is preferable to
> use dedicated static functions that can't be overridden.
> 
> Add a user_read_access_begin and user_read_access_end to only open
> read access.
> 
> Add a user_write_access_begin and user_write_access_end to only open
> write access.
> 
> By default, when undefined, those new access helpers default to the
> existing user_access_begin and user_access_end.
> 
> Signed-off-by: Christophe Leroy 
> Reviewed-by: Kees Cook 
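The compatibility defaults described above can be sketched as simple fallback defines. This is a hedged userspace model — the boolean flags and stub functions are illustrative, standing in for the architecture's real open/close of user access.

```c
#include <stdbool.h>

/* Stand-ins for the architecture's all-access helpers. */
static bool read_open, write_open;
static void user_access_begin_model(void) { read_open = write_open = true; }
static void user_access_end_model(void)   { read_open = write_open = false; }

/* The pattern from the patch: if an architecture does not provide
 * the selective helpers, they fall back to the existing ones that
 * open both read and write. */
#ifndef user_read_access_begin
#define user_read_access_begin  user_access_begin_model
#define user_read_access_end    user_access_end_model
#endif
#ifndef user_write_access_begin
#define user_write_access_begin user_access_begin_model
#define user_write_access_end   user_access_end_model
#endif
```

An architecture like powerpc64 overrides these with narrower versions; everyone else silently keeps the old (wider) behaviour, which is why the change is safe to merge tree-wide.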

Applied to powerpc topic/uaccess, thanks.

https://git.kernel.org/powerpc/c/999a22890cb183b918e4372395d24426a755cef2

cheers


Re: [PATCH] powerpc/nvram: Replace kmalloc with kzalloc in the error message

2020-05-28 Thread Michael Ellerman
Yi Wang  writes:
> From: Liao Pingfang 
>
> Use kzalloc instead of kmalloc in the error message to match
> the previous kzalloc() call.

Please just remove the message instead, it's a tiny allocation that's
unlikely to ever fail, and the caller will print an error anyway.

cheers

> diff --git a/arch/powerpc/kernel/nvram_64.c b/arch/powerpc/kernel/nvram_64.c
> index fb4f610..c3a0c8d 100644
> --- a/arch/powerpc/kernel/nvram_64.c
> +++ b/arch/powerpc/kernel/nvram_64.c
> @@ -892,7 +892,7 @@ loff_t __init nvram_create_partition(const char *name, 
> int sig,
>   /* Create our OS partition */
>   new_part = kzalloc(sizeof(*new_part), GFP_KERNEL);
>   if (!new_part) {
> - pr_err("%s: kmalloc failed\n", __func__);
> + pr_err("%s: kzalloc failed\n", __func__);
>   return -ENOMEM;
>   }
>  
> -- 
> 2.9.5


Re: powerpc/pci: [PATCH 1/1 V3] PCIE PHB reset

2020-05-28 Thread Sam Bobroff
On Tue, May 26, 2020 at 08:21:59AM -0500, wenxi...@linux.vnet.ibm.com wrote:
> From: Wen Xiong 
> 
> Several device drivers hit EEH (Extended Error Handling) when triggering
> kdump on pSeries PowerVM. This patch implements a reset of the PHBs
> in generic PCI code when triggering kdump. A PHB reset stops all PCI
> transactions from the normal kernel. We have tested the patch in several
> environments:
> - direct slot adapters
> - adapters under the switch
> - a VF adapter in PowerVM
> - a VF adapter/adapter in KVM guest.
> 
> Signed-off-by: Wen Xiong 

Looks good to me.
Reviewed-by: Sam Bobroff 

(It would be easier to review if you included a patchset change log and
CC'd people who reviewed earlier versions.)

Cheers,
Sam.

> 
> ---
>  arch/powerpc/platforms/pseries/pci.c | 152 +++
>  1 file changed, 152 insertions(+)
> 
> diff --git a/arch/powerpc/platforms/pseries/pci.c 
> b/arch/powerpc/platforms/pseries/pci.c
> index 911534b89c85..cb7e4276cf04 100644
> --- a/arch/powerpc/platforms/pseries/pci.c
> +++ b/arch/powerpc/platforms/pseries/pci.c
> @@ -11,6 +11,8 @@
>  #include 
>  #include 
>  #include 
> +#include 
> +#include 
>  
>  #include 
>  #include 
> @@ -354,3 +356,153 @@ int pseries_root_bridge_prepare(struct pci_host_bridge 
> *bridge)
>  
>   return 0;
>  }
> +
> +/**
> + * pseries_get_pdn_addr - Retrieve PHB address
> + * @pe: EEH PE
> + *
> + * Retrieve the assocated PHB address. Actually, there're 2 RTAS
> + * function calls dedicated for the purpose. We need implement
> + * it through the new function and then the old one. Besides,
> + * you should make sure the config address is figured out from
> + * FDT node before calling the function.
> + *
> + */
> +static int pseries_get_pdn_addr(struct pci_controller *phb)
> +{
> + int ret = -1;
> + int rets[3];
> + int ibm_get_config_addr_info;
> + int ibm_get_config_addr_info2;
> + int config_addr = 0;
> + struct pci_dn *root_pdn, *pdn;
> +
> + ibm_get_config_addr_info2   = rtas_token("ibm,get-config-addr-info2");
> + ibm_get_config_addr_info= rtas_token("ibm,get-config-addr-info");
> +
> + root_pdn = PCI_DN(phb->dn);
> + pdn = list_first_entry(&root_pdn->child_list, struct pci_dn, list);
> + config_addr = (pdn->busno << 16) | (pdn->devfn << 8);
> +
> + if (ibm_get_config_addr_info2 != RTAS_UNKNOWN_SERVICE) {
> + /*
> +  * First of all, we need to make sure there is one PE
> +  * associated with the device. If option is 1, it
> +  * queries if config address is supported in a PE or not.
> +  * If option is 0, it returns PE config address or config
> +  * address for the PE primary bus.
> +  */
> + ret = rtas_call(ibm_get_config_addr_info2, 4, 2, rets,
> + config_addr, BUID_HI(pdn->phb->buid),
> + BUID_LO(pdn->phb->buid), 1);
> + if (ret || (rets[0] == 0)) {
> + pr_warn("%s: Failed to get address for PHB#%x-PE# 
> option=%d config_addr=%x\n",
> + __func__, pdn->phb->global_number, 1, rets[0]);
> + return -1;
> + }
> +
> + /* Retrieve the associated PE config address */
> + ret = rtas_call(ibm_get_config_addr_info2, 4, 2, rets,
> + config_addr, BUID_HI(pdn->phb->buid),
> + BUID_LO(pdn->phb->buid), 0);
> + if (ret) {
> + pr_warn("%s: Failed to get address for PHB#%x-PE# 
> option=%d config_addr=%x\n",
> + __func__, pdn->phb->global_number, 0, rets[0]);
> + return -1;
> + }
> + return rets[0];
> + }
> +
> + if (ibm_get_config_addr_info != RTAS_UNKNOWN_SERVICE) {
> + ret = rtas_call(ibm_get_config_addr_info, 4, 2, rets,
> + config_addr, BUID_HI(pdn->phb->buid),
> + BUID_LO(pdn->phb->buid), 0);
> + if (ret || rets[0]) {
> + pr_warn("%s: Failed to get address for PHB#%x-PE# 
> config_addr=%x\n",
> + __func__, pdn->phb->global_number, rets[0]);
> + return -1;
> + }
> + return rets[0];
> + }
> +
> + return ret;
> +}
> +
> +static int __init pseries_phb_reset(void)
> +{
> + struct pci_controller *phb;
> + int config_addr;
> + int ibm_set_slot_reset;
> + int ibm_configure_pe;
> + int ret;
> +
> + if (is_kdump_kernel() || reset_devices) {
> + pr_info("Issue PHB reset ...\n");
> + ibm_set_slot_reset = rtas_token("ibm,set-slot-reset");
> + ibm_configure_pe = rtas_token("ibm,configure-pe");
> +
> + if (ibm_set_slot_reset == RTAS_UNKNOWN_SERVICE ||
> + ibm_configure_pe == RTAS_UNKNOWN_SERVICE) {
> + pr_info("%s: EEH 

Re: powerpc/pci: [PATCH 1/1 V3] PCIE PHB reset

2020-05-28 Thread Oliver O'Halloran
On Wed, May 27, 2020 at 12:48 AM  wrote:
>
> From: Wen Xiong 
>
> Several device drivers hit EEH (Extended Error Handling) when triggering
> kdump on pSeries PowerVM. This patch implements a reset of the PHBs
> in generic PCI code when triggering kdump. A PHB reset stops all PCI
> transactions from the normal kernel. We have tested the patch in several
> environments:
> - direct slot adapters
> - adapters under the switch
> - a VF adapter in PowerVM
> - a VF adapter/adapter in KVM guest.
>
> Signed-off-by: Wen Xiong 

Can you include a patch changelog in future? It's helpful to see what
we need to look at specifically when you post a new revision of a patch.
Looks good to me otherwise.

Reviewed-by: Oliver O'Halloran 


Re: [PATCH v2] selftests: powerpc: Add test for execute-disabled pkeys

2020-05-28 Thread Michael Ellerman
Hi Sandipan,

A few comments below ...

Sandipan Das  writes:
> Apart from read and write access, memory protection keys can
> also be used for restricting execute permission of pages on
> powerpc. This adds a test to verify if the feature works as
> expected.
>
> Signed-off-by: Sandipan Das 
> ---
>
> Previous versions can be found at
> v1: 
> https://lore.kernel.org/linuxppc-dev/20200508162332.65316-1-sandi...@linux.ibm.com/
>
> Changes in v2:
> - Added .gitignore entry for test binary.
> - Fixed builds for older distros where siginfo_t might not have si_pkey as
>   a formal member based on discussion with Michael.
>
> ---
>  tools/testing/selftests/powerpc/mm/.gitignore |   1 +
>  tools/testing/selftests/powerpc/mm/Makefile   |   3 +-
>  .../selftests/powerpc/mm/pkey_exec_prot.c | 336 ++
>  3 files changed, 339 insertions(+), 1 deletion(-)
>  create mode 100644 tools/testing/selftests/powerpc/mm/pkey_exec_prot.c
>
> diff --git a/tools/testing/selftests/powerpc/mm/.gitignore 
> b/tools/testing/selftests/powerpc/mm/.gitignore
> index 2ca523255b1b..8f841f925baa 100644
> --- a/tools/testing/selftests/powerpc/mm/.gitignore
> +++ b/tools/testing/selftests/powerpc/mm/.gitignore
> @@ -8,3 +8,4 @@ wild_bctr
>  large_vm_fork_separation
>  bad_accesses
>  tlbie_test
> +pkey_exec_prot
> diff --git a/tools/testing/selftests/powerpc/mm/Makefile 
> b/tools/testing/selftests/powerpc/mm/Makefile
> index b9103c4bb414..2816229f648b 100644
> --- a/tools/testing/selftests/powerpc/mm/Makefile
> +++ b/tools/testing/selftests/powerpc/mm/Makefile
> @@ -3,7 +3,7 @@ noarg:
>   $(MAKE) -C ../
>  
>  TEST_GEN_PROGS := hugetlb_vs_thp_test subpage_prot prot_sao segv_errors 
> wild_bctr \
> -   large_vm_fork_separation bad_accesses
> +   large_vm_fork_separation bad_accesses pkey_exec_prot
>  TEST_GEN_PROGS_EXTENDED := tlbie_test
>  TEST_GEN_FILES := tempfile
>  
> @@ -17,6 +17,7 @@ $(OUTPUT)/prot_sao: ../utils.c
>  $(OUTPUT)/wild_bctr: CFLAGS += -m64
>  $(OUTPUT)/large_vm_fork_separation: CFLAGS += -m64
>  $(OUTPUT)/bad_accesses: CFLAGS += -m64
> +$(OUTPUT)/pkey_exec_prot: CFLAGS += -m64
>  
>  $(OUTPUT)/tempfile:
>   dd if=/dev/zero of=$@ bs=64k count=1
> diff --git a/tools/testing/selftests/powerpc/mm/pkey_exec_prot.c 
> b/tools/testing/selftests/powerpc/mm/pkey_exec_prot.c
> new file mode 100644
> index ..147fb9ed47d5
> --- /dev/null
> +++ b/tools/testing/selftests/powerpc/mm/pkey_exec_prot.c
> @@ -0,0 +1,336 @@
> +// SPDX-License-Identifier: GPL-2.0+
> +
> +/*
> + * Copyright 2020, Sandipan Das, IBM Corp.
> + *
> + * Test if applying execute protection on pages using memory
> + * protection keys works as expected.
> + */
> +
> +#define _GNU_SOURCE
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include 
> +#include 
> +#include 
> +
> +#include "utils.h"
> +
> +/* Override definitions as they might be inconsistent */

Can you please expand the comment to say why/where you've seen problems,
so one day we can drop these once those old libcs are no longer around.

> +#undef PKEY_DISABLE_ACCESS
> +#define PKEY_DISABLE_ACCESS  0x3
> +
> +#undef PKEY_DISABLE_WRITE
> +#define PKEY_DISABLE_WRITE   0x2
> +
> +#undef PKEY_DISABLE_EXECUTE
> +#define PKEY_DISABLE_EXECUTE 0x4
> +
> +/* Older distros might not define this */
> +#ifndef SEGV_PKUERR
> +#define SEGV_PKUERR  4
> +#endif
> +
> +#define SI_PKEY_OFFSET   0x20
> +
> +#define SYS_pkey_mprotect386
> +#define SYS_pkey_alloc   384
> +#define SYS_pkey_free385
> +
> +#define PKEY_BITS_PER_PKEY   2
> +#define NR_PKEYS 32
> +
> +#define PKEY_BITS_MASK   ((1UL << PKEY_BITS_PER_PKEY) - 1)

If you include "reg.h" then there's a mfspr()/mtspr() macro you can use.

> +static unsigned long pkeyreg_get(void)
> +{
> + unsigned long uamr;

The SPR is AMR not uamr?

> + asm volatile("mfspr %0, 0xd" : "=r"(uamr));
> + return uamr;
> +}
> +
> +static void pkeyreg_set(unsigned long uamr)
> +{
> + asm volatile("isync; mtspr  0xd, %0; isync;" : : "r"(uamr));
> +}

You can use mtspr() there, but you'll need to add the isync's yourself.

> +static void pkey_set_rights(int pkey, unsigned long rights)
> +{
> + unsigned long uamr, shift;
> +
> + shift = (NR_PKEYS - pkey - 1) * PKEY_BITS_PER_PKEY;
> + uamr = pkeyreg_get();
> + uamr &= ~(PKEY_BITS_MASK << shift);
> + uamr |= (rights & PKEY_BITS_MASK) << shift;
> + pkeyreg_set(uamr);
> +}
> +
> +static int sys_pkey_mprotect(void *addr, size_t len, int prot, int pkey)
> +{
> + return syscall(SYS_pkey_mprotect, addr, len, prot, pkey);
> +}
> +
> +static int sys_pkey_alloc(unsigned long flags, unsigned long rights)
> +{
> + return syscall(SYS_pkey_alloc, flags, rights);
> +}
> +
> +static int sys_pkey_free(int pkey)
> +{
> + return syscall(SYS_pkey_free, pkey);
> +}
> +
> +static volatile int fpkey, fcode, ftype, faults;

The "proper" type to use for things accessed in signal

Re: [RFC PATCH 1/4] powerpc/64s: Don't init FSCR_DSCR in __init_FSCR()

2020-05-28 Thread Alistair Popple
For what it's worth I tested this series on Mambo PowerNV and it seems to 
correctly enable/disable the prefix FSCR bit based on the cpu feature so feel 
free to add:

Tested-by: Alistair Popple 

Mikey is going to test out pseries.

- Alistair

On Thursday, 28 May 2020 12:58:40 AM AEST Michael Ellerman wrote:
> __init_FSCR() was added originally in commit 2468dcf641e4 ("powerpc:
> Add support for context switching the TAR register") (Feb 2013), and
> only set FSCR_TAR.
> 
> At that point FSCR (Facility Status and Control Register) was not
> context switched, so the setting was permanent after boot.
> 
> Later we added initialisation of FSCR_DSCR to __init_FSCR(), in commit
> 54c9b2253d34 ("powerpc: Set DSCR bit in FSCR setup") (Mar 2013), again
> that was permanent after boot.
> 
> Then commit 2517617e0de6 ("powerpc: Fix context switch DSCR on
> POWER8") (Aug 2013) added a limited context switch of FSCR, just the
> FSCR_DSCR bit was context switched based on thread.dscr_inherit. That
> commit said "This clears the H/FSCR DSCR bit initially", but it
> didn't, it left the initialisation of FSCR_DSCR in __init_FSCR().
> However the initial context switch from init_task to pid 1 would clear
> FSCR_DSCR because thread.dscr_inherit was 0.
> 
> That commit also introduced the requirement that FSCR_DSCR be clear
> for user processes, so that we can take the facility unavailable
> interrupt in order to manage dscr_inherit.
> 
> Then in commit 152d523e6307 ("powerpc: Create context switch helpers
> save_sprs() and restore_sprs()") (Dec 2015) FSCR was added to
> thread_struct. However it still wasn't fully context switched, we just
> took the existing value and set FSCR_DSCR if the new thread had
> dscr_inherit set. FSCR was still initialised at boot to FSCR_DSCR |
> FSCR_TAR, but that value was not propagated into the thread_struct, so
> the initial context switch set FSCR_DSCR back to 0.
> 
> Finally commit b57bd2de8c6c ("powerpc: Improve FSCR init and context
> switching") (Jun 2016) added a full context switch of the FSCR, and
> added an initialisation of init_task.thread.fscr to FSCR_TAR |
> FSCR_EBB, but omitted FSCR_DSCR.
> 
> The end result is that swapper runs with FSCR_DSCR set because of the
> initialisation in __init_FSCR(), but no other processes do, they use
> the value from init_task.thread.fscr.
> 
> Having FSCR_DSCR set for swapper allows it to access SPR 3 from
> userspace, but swapper never runs userspace, so it has no useful
> effect. It's also confusing to have the value initialised in two
> places to two different values.
> 
> So remove FSCR_DSCR from __init_FSCR(), this at least gets us to the
> point where there's a single value of FSCR, even if it's still set in
> two places.
> 
> Signed-off-by: Michael Ellerman 
> ---
>  arch/powerpc/kernel/cpu_setup_power.S | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/kernel/cpu_setup_power.S
> b/arch/powerpc/kernel/cpu_setup_power.S index a460298c7ddb..f91ecb10d0ae
> 100644
> --- a/arch/powerpc/kernel/cpu_setup_power.S
> +++ b/arch/powerpc/kernel/cpu_setup_power.S
> @@ -184,7 +184,7 @@ _GLOBAL(__restore_cpu_power9)
> 
>  __init_FSCR:
>   mfspr   r3,SPRN_FSCR
> - ori r3,r3,FSCR_TAR|FSCR_DSCR|FSCR_EBB
> + ori r3,r3,FSCR_TAR|FSCR_EBB
>   mtspr   SPRN_FSCR,r3
>   blr






[PATCH] powerpc/nvram: Replace kmalloc with kzalloc in the error message

2020-05-28 Thread Yi Wang
From: Liao Pingfang 

Use kzalloc instead of kmalloc in the error message to match
the previous kzalloc() call.

Signed-off-by: Liao Pingfang 
---
 arch/powerpc/kernel/nvram_64.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/nvram_64.c b/arch/powerpc/kernel/nvram_64.c
index fb4f610..c3a0c8d 100644
--- a/arch/powerpc/kernel/nvram_64.c
+++ b/arch/powerpc/kernel/nvram_64.c
@@ -892,7 +892,7 @@ loff_t __init nvram_create_partition(const char *name, int 
sig,
/* Create our OS partition */
new_part = kzalloc(sizeof(*new_part), GFP_KERNEL);
if (!new_part) {
-   pr_err("%s: kmalloc failed\n", __func__);
+   pr_err("%s: kzalloc failed\n", __func__);
return -ENOMEM;
}
 
-- 
2.9.5



[powerpc:next-test] BUILD SUCCESS bc26d22277c297c35013700e40f276e37b991980

2020-05-28 Thread kbuild test robot
tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git  
next-test
branch HEAD: bc26d22277c297c35013700e40f276e37b991980  powerpc/pseries: Update 
hv-24x7 information after migration

Warning in current branch:

kernel/events/hw_breakpoint.c:216:12: warning: no previous prototype for 
'arch_reserve_bp_slot' [-Wmissing-prototypes]
kernel/events/hw_breakpoint.c:221:13: warning: no previous prototype for 
'arch_release_bp_slot' [-Wmissing-prototypes]

Warning ids grouped by kconfigs:

recent_errors
|-- i386-allnoconfig
|   |-- 
kernel-events-hw_breakpoint.c:warning:no-previous-prototype-for-arch_release_bp_slot
|   `-- 
kernel-events-hw_breakpoint.c:warning:no-previous-prototype-for-arch_reserve_bp_slot
|-- i386-allyesconfig
|   |-- 
kernel-events-hw_breakpoint.c:warning:no-previous-prototype-for-arch_release_bp_slot
|   `-- 
kernel-events-hw_breakpoint.c:warning:no-previous-prototype-for-arch_reserve_bp_slot
|-- i386-debian-10.3
|   |-- 
kernel-events-hw_breakpoint.c:warning:no-previous-prototype-for-arch_release_bp_slot
|   `-- 
kernel-events-hw_breakpoint.c:warning:no-previous-prototype-for-arch_reserve_bp_slot
|-- i386-defconfig
|   |-- 
kernel-events-hw_breakpoint.c:warning:no-previous-prototype-for-arch_release_bp_slot
|   `-- 
kernel-events-hw_breakpoint.c:warning:no-previous-prototype-for-arch_reserve_bp_slot
|-- i386-randconfig-c001-20200526
|   |-- 
kernel-events-hw_breakpoint.c:warning:no-previous-prototype-for-arch_release_bp_slot
|   `-- 
kernel-events-hw_breakpoint.c:warning:no-previous-prototype-for-arch_reserve_bp_slot
|-- sh-allmodconfig
|   |-- 
kernel-events-hw_breakpoint.c:warning:no-previous-prototype-for-arch_release_bp_slot
|   `-- 
kernel-events-hw_breakpoint.c:warning:no-previous-prototype-for-arch_reserve_bp_slot
|-- sh-allnoconfig
|   |-- 
kernel-events-hw_breakpoint.c:warning:no-previous-prototype-for-arch_release_bp_slot
|   `-- 
kernel-events-hw_breakpoint.c:warning:no-previous-prototype-for-arch_reserve_bp_slot
`-- x86_64-randconfig-c002-20200526
|-- 
kernel-events-hw_breakpoint.c:warning:no-previous-prototype-for-arch_release_bp_slot
`-- 
kernel-events-hw_breakpoint.c:warning:no-previous-prototype-for-arch_reserve_bp_slot

elapsed time: 3491m

configs tested: 79
configs skipped: 1

The following configs have been built successfully.
More configs may be tested in the coming days.

arm64allyesconfig
arm64   defconfig
arm64allmodconfig
arm64 allnoconfig
arm defconfig
arm  allyesconfig
arm  allmodconfig
arm   allnoconfig
i386 allyesconfig
i386defconfig
i386  debian-10.3
i386  allnoconfig
ia64 allmodconfig
ia64defconfig
ia64  allnoconfig
ia64 allyesconfig
m68k allmodconfig
m68k  allnoconfig
m68k   sun3_defconfig
m68kdefconfig
m68k allyesconfig
nios2   defconfig
nios2allyesconfig
openriscdefconfig
c6x  allyesconfig
c6x   allnoconfig
openrisc allyesconfig
nds32   defconfig
nds32 allnoconfig
csky allyesconfig
cskydefconfig
alpha   defconfig
alphaallyesconfig
xtensa   allyesconfig
xtensa  defconfig
h8300allyesconfig
h8300allmodconfig
arc defconfig
arc  allyesconfig
sh   allmodconfig
shallnoconfig
microblazeallnoconfig
mips allyesconfig
mips  allnoconfig
mips allmodconfig
pariscallnoconfig
parisc  defconfig
parisc   allyesconfig
parisc   allmodconfig
powerpc defconfig
powerpc  allyesconfig
powerpc  rhel-kconfig
powerpc  allmodconfig
powerpc   allnoconfig
riscvallyesconfig
riscv allnoconfig
riscv   defco

Re: [PATCH] powerpc/bpf: Enable bpf_probe_read{, str}() on powerpc again

2020-05-28 Thread Michael Ellerman
Daniel Borkmann  writes:
> On 5/28/20 2:23 PM, Michael Ellerman wrote:
>> Petr Mladek  writes:
>>> On Thu 2020-05-28 11:03:43, Michael Ellerman wrote:
 Petr Mladek  writes:
> The commit 0ebeea8ca8a4d1d453a ("bpf: Restrict bpf_probe_read{, str}() only
> to archs where they work") meant that the bpf_probe_read{, str}() functions
> were no longer available on architectures where the same logical address
> might have different content in the kernel and user memory mappings. These
> architectures should use the probe_read_{user,kernel}_str helpers.
>
> For backward compatibility, the problematic functions are still available
> on architectures where the user and kernel address spaces are not
> overlapping. This is defined 
> CONFIG_ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE.
>
> At the moment, these backward compatible functions are enabled only
> on x86_64, arm, and arm64. Let's do it also on powerpc that has
> the non overlapping address space as well.
>
> Signed-off-by: Petr Mladek 

 This seems like it should have a Fixes: tag and go into v5.7?
>>>
>>> Good point:
>>>
>>> Fixes: commit 0ebeea8ca8a4d1d4 ("bpf: Restrict bpf_probe_read{, str}() only 
>>> to archs where they work")
>>>
>>> And yes, it should ideally go into v5.7 either directly or via stable.
>>>
>>> Should I resend the patch with Fixes and
>>> Cc: sta...@vger.kernel.org #v5.7 lines, please?
>> 
>> If it goes into v5.7 then it doesn't need a Cc: stable, and I guess a
>> Fixes: tag is nice to have but not so important as it already mentions
>> the commit that caused the problem. So a resend probably isn't
>> necessary.
>> 
>> Acked-by: Michael Ellerman 
>> 
>> Daniel can you pick this up, or should I?
>
> Yeah I'll take it into bpf tree for v5.7.

Thanks.

cheers


[PATCH] powerpc: Fix misleading small cores print

2020-05-28 Thread Michael Neuling
Currently when we boot on a big core system, we get this print:
  [0.040500] Using small cores at SMT level

This is misleading as we've actually detected big cores.

This patch clears up the print to say we've detected big cores but are
using small cores for scheduling.

Signed-off-by: Michael Neuling 
---
 arch/powerpc/kernel/smp.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index 6d2a3a3666..c820c95162 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -1383,7 +1383,7 @@ void __init smp_cpus_done(unsigned int max_cpus)
 
 #ifdef CONFIG_SCHED_SMT
if (has_big_cores) {
-   pr_info("Using small cores at SMT level\n");
+   pr_info("Big cores detected but using small core scheduling\n");
power9_topology[0].mask = smallcore_smt_mask;
powerpc_topology[0].mask = smallcore_smt_mask;
}
-- 
2.26.2



Re: [PATCH v8 1/5] powerpc: Document details on H_SCM_HEALTH hcall

2020-05-28 Thread Vaibhav Jain
Thanks for looking into this patchset Dan,


Dan Williams  writes:

> On Tue, May 26, 2020 at 9:13 PM Vaibhav Jain  wrote:
>>
>> Add documentation to 'papr_hcalls.rst' describing the bitmap flags
>> that are returned from H_SCM_HEALTH hcall as per the PAPR-SCM
>> specification.
>>
>
> Please do a global s/SCM/PMEM/ or s/SCM/NVDIMM/. It's unfortunate that
> we already have 2 ways to describe persistent memory devices, let's
> not perpetuate a third so that "grep" has a chance to find
> interrelated code across architectures. Other than that this looks
> good to me.

Sure, will use PAPR_NVDIMM instead of PAPR_SCM for new code being
introduced. However, certain identifiers like H_SCM_HEALTH are taken from
the PAPR specification and hence need to use the same name.

>
>> Cc: "Aneesh Kumar K . V" 
>> Cc: Dan Williams 
>> Cc: Michael Ellerman 
>> Cc: Ira Weiny 
>> Signed-off-by: Vaibhav Jain 
>> ---
>> Changelog:
>> v7..v8:
>> * Added a clarification on bit-ordering of Health Bitmap
>>
>> Resend:
>> * None
>>
>> v6..v7:
>> * None
>>
>> v5..v6:
>> * New patch in the series
>> ---
>>  Documentation/powerpc/papr_hcalls.rst | 45 ---
>>  1 file changed, 41 insertions(+), 4 deletions(-)
>>
>> diff --git a/Documentation/powerpc/papr_hcalls.rst 
>> b/Documentation/powerpc/papr_hcalls.rst
>> index 3493631a60f8..45063f305813 100644
>> --- a/Documentation/powerpc/papr_hcalls.rst
>> +++ b/Documentation/powerpc/papr_hcalls.rst
>> @@ -220,13 +220,50 @@ from the LPAR memory.
>>  **H_SCM_HEALTH**
>>
>>  | Input: drcIndex
>> -| Out: *health-bitmap, health-bit-valid-bitmap*
>> +| Out: *health-bitmap (r4), health-bit-valid-bitmap (r5)*
>>  | Return Value: *H_Success, H_Parameter, H_Hardware*
>>
>>  Given a DRC Index return the info on predictive failure and overall health 
>> of
>> -the NVDIMM. The asserted bits in the health-bitmap indicate a single 
>> predictive
>> -failure and health-bit-valid-bitmap indicate which bits in health-bitmap are
>> -valid.
>> +the NVDIMM. The asserted bits in the health-bitmap indicate one or more 
>> states
>> +(described in table below) of the NVDIMM and health-bit-valid-bitmap 
>> indicate
>> +which bits in health-bitmap are valid. The bits are reported in
>> +reverse bit ordering; for example, a value of 0xC400
>> +indicates bits 0, 1, and 5 are valid.
>> +
>> +Health Bitmap Flags:
>> +
>> ++--+---+
>> +|  Bit |   Definition   
>>|
>> ++==+===+
>> +|  00  | SCM device is unable to persist memory contents.   
>>|
>> +|  | If the system is powered down, nothing will be saved.  
>>|
>> ++--+---+
>> +|  01  | SCM device failed to persist memory contents. Either contents were 
>> not|
>> +|  | saved successfully on power down or were not restored properly on  
>>|
>> +|  | power up.  
>>|
>> ++--+---+
>> +|  02  | SCM device contents are persisted from previous IPL. The data from 
>>|
>> +|  | the last boot were successfully restored.  
>>|
>> ++--+---+
>> +|  03  | SCM device contents are not persisted from previous IPL. There was 
>> no |
>> +|  | data to restore from the last boot.
>>|
>> ++--+---+
>> +|  04  | SCM device memory life remaining is critically low 
>>|
>> ++--+---+
>> +|  05  | SCM device will be garded off next IPL due to failure  
>>|
>> ++--+---+
>> +|  06  | SCM contents cannot persist due to current platform health status. 
>> A  |
>> +|  | hardware failure may prevent data from being saved or restored.
>>|
>> ++--+---+
>> +|  07  | SCM device is unable to persist memory contents in certain 
>> conditions |
>> ++--+---+
>> +|  08  | SCM device is encrypted
>>|
>> ++--+---+
>> +|  09  | SCM device has successfully completed a requested erase or secure  
>>|
>> +|  | erase procedure.   
>>|
>> ++--+---+
>> +|10

[PATCH net] drivers/net/ibmvnic: Update VNIC protocol version reporting

2020-05-28 Thread Thomas Falcon
VNIC protocol version is reported in big-endian format, but it
is not byteswapped before logging. Fix that, and remove the version
comparison as only one protocol version exists at this time.

Signed-off-by: Thomas Falcon 
---
 drivers/net/ethernet/ibm/ibmvnic.c | 8 +++-
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/ibm/ibmvnic.c 
b/drivers/net/ethernet/ibm/ibmvnic.c
index 621a6e0..f2c7160 100644
--- a/drivers/net/ethernet/ibm/ibmvnic.c
+++ b/drivers/net/ethernet/ibm/ibmvnic.c
@@ -4689,12 +4689,10 @@ static void ibmvnic_handle_crq(union ibmvnic_crq *crq,
dev_err(dev, "Error %ld in VERSION_EXCHG_RSP\n", rc);
break;
}
-   dev_info(dev, "Partner protocol version is %d\n",
-crq->version_exchange_rsp.version);
-   if (be16_to_cpu(crq->version_exchange_rsp.version) <
-   ibmvnic_version)
-   ibmvnic_version =
+   ibmvnic_version =
be16_to_cpu(crq->version_exchange_rsp.version);
+   dev_info(dev, "Partner protocol version is %d\n",
+ibmvnic_version);
send_cap_queries(adapter);
break;
case QUERY_CAPABILITY_RSP:
-- 
1.8.3.1
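The bug fixed above is the classic log-before-byteswap mistake: the CRQ field arrives as a big-endian __be16, so printing it raw yields garbage on a little-endian host. A hedged userspace model of the conversion (`be16_to_cpu_model()` is an illustrative stand-in for the kernel's be16_to_cpu()):

```c
#include <stdint.h>

/* Reassemble a big-endian 16-bit wire value into host order by
 * reading it byte-by-byte, which is endian-independent. */
static uint16_t be16_to_cpu_model(const uint8_t wire[2])
{
	return (uint16_t)((wire[0] << 8) | wire[1]);
}
```

Assigning the converted value before the dev_info() call, as the patch does, guarantees the logged number and the stored `ibmvnic_version` agree.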



Re: [PATCH] powerpc/bpf: Enable bpf_probe_read{, str}() on powerpc again

2020-05-28 Thread Daniel Borkmann

On 5/28/20 2:23 PM, Michael Ellerman wrote:

Petr Mladek  writes:

On Thu 2020-05-28 11:03:43, Michael Ellerman wrote:

Petr Mladek  writes:

The commit 0ebeea8ca8a4d1d453a ("bpf: Restrict bpf_probe_read{, str}() only
to archs where they work") meant that the bpf_probe_read{, str}() functions
were no longer available on architectures where the same logical address
might have different content in the kernel and user memory mappings. These
architectures should use the probe_read_{user,kernel}_str helpers.

For backward compatibility, the problematic functions are still available
on architectures where the user and kernel address spaces are not
overlapping. This is defined CONFIG_ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE.

At the moment, these backward compatible functions are enabled only
on x86_64, arm, and arm64. Let's do it also on powerpc that has
the non overlapping address space as well.

Signed-off-by: Petr Mladek 


This seems like it should have a Fixes: tag and go into v5.7?


Good point:

Fixes: 0ebeea8ca8a4d1d4 ("bpf: Restrict bpf_probe_read{, str}() only to archs where they work")

And yes, it should ideally go into v5.7 either directly or via stable.

Should I resend the patch with Fixes and
Cc: sta...@vger.kernel.org #v5.7 lines, please?


If it goes into v5.7 then it doesn't need a Cc: stable, and I guess a
Fixes: tag is nice to have but not so important as it already mentions
the commit that caused the problem. So a resend probably isn't
necessary.

Acked-by: Michael Ellerman 

Daniel can you pick this up, or should I?


Yeah I'll take it into bpf tree for v5.7.

Thanks everyone,
Daniel


Re: [PATCH] powerpc/64s: Fix restore of NV GPRs after facility unavailable exception

2020-05-28 Thread Michael Ellerman
On Tue, 26 May 2020 16:18:08 +1000, Michael Ellerman wrote:
> Commit 702f09805222 ("powerpc/64s/exception: Remove lite interrupt
> return") changed the interrupt return path to not restore non-volatile
> registers by default, and explicitly restore them in paths where it is
> required.
> 
> But it missed that the facility unavailable exception can sometimes
> modify user registers, i.e. when it does emulation of move from DSCR.
> 
> [...]

Applied to powerpc/fixes.

[1/1] powerpc/64s: Fix restore of NV GPRs after facility unavailable exception
  https://git.kernel.org/powerpc/c/595d153dd1022392083ac93a1550382cbee127e0

cheers


Re: [PATCH] powerpc/64s: Disable STRICT_KERNEL_RWX

2020-05-28 Thread Michael Ellerman
On Wed, 20 May 2020 23:36:05 +1000, Michael Ellerman wrote:
> Several strange crashes have been eventually traced back to
> STRICT_KERNEL_RWX and its interaction with code patching.
> 
> Various paths in our ftrace, kprobes and other patching code need to
> be hardened against patching failures, otherwise we can end up running
> with partially/incorrectly patched ftrace paths, kprobes or jump
> labels, which can then cause strange crashes.
> 
> [...]

Applied to powerpc/fixes.

[1/1] powerpc/64s: Disable STRICT_KERNEL_RWX
  https://git.kernel.org/powerpc/c/8659a0e0efdd975c73355dbc033f79ba3b31e82c

cheers


Re: [PATCH] Revert "powerpc/32s: reorder Linux PTE bits to better match Hash PTE bits."

2020-05-28 Thread Michael Ellerman
On Wed, 20 May 2020 10:23:45 + (UTC), Christophe Leroy wrote:
> This reverts commit 697ece78f8f749aeea40f2711389901f0974017a.
> 
> The implementation of SWAP on powerpc requires page protection
> bits to not be one of the least significant PTE bits.
> 
> Until the SWAP implementation is changed and this requirement becomes void,
> we have to keep at least _PAGE_RW outside of the last 3 bits.
> 
> [...]

Applied to powerpc/fixes.

[1/1] Revert "powerpc/32s: reorder Linux PTE bits to better match Hash PTE bits."
  https://git.kernel.org/powerpc/c/40bb0e904212cf7d6f041a98c58c8341b2016670

cheers


Re: [PATCH v2] powerpc/wii: Fix declaration made after definition

2020-05-28 Thread Michael Ellerman
Nathan Chancellor  writes:
> A 0day randconfig uncovered an error with clang, trimmed for brevity:
>
> arch/powerpc/platforms/embedded6xx/wii.c:195:7: error: attribute
> declaration must precede definition [-Werror,-Wignored-attributes]
> if (!machine_is(wii))
>  ^
>
> The macro machine_is declares mach_##name but define_machine actually
> defines mach_##name, hence the warning.
>
> To fix this, move define_machine after the machine_is usage.
>
> Fixes: 5a7ee3198dfa ("powerpc: wii: platform support")
> Reported-by: kbuild test robot 
> Link: https://github.com/ClangBuiltLinux/linux/issues/989
> Reviewed-by: Nick Desaulniers 
> Signed-off-by: Nathan Chancellor 
> ---
>
> v1 -> v2:
>
> * s/is_machine/machine_is/ (Nick)
>
> * Add Nick's reviewed-by tag.

I already picked up v1, and it's behind a merge so I don't want to
rebase it. I'll live with the typo. Thanks.

cheers


Re: [PATCH 2/3] powerpc/pci: unmap legacy INTx interrupts of passthrough IO adapters

2020-05-28 Thread Michael Ellerman
Cédric Le Goater  writes:
> On 4/29/20 9:51 AM, Cédric Le Goater wrote:
>> When a passthrough IO adapter is removed from a pseries machine using
>> hash MMU and the XIVE interrupt mode, the POWER hypervisor, pHyp,
>> expects the guest OS to have cleared all page table entries related to
>> the adapter. If some are still present, the RTAS call which isolates
>> the PCI slot returns error 9001 "valid outstanding translations" and
>> the removal of the IO adapter fails.
>> 
>> INTx interrupt numbers need special care because Linux maps the
>> interrupts automatically in the Linux interrupt number space if they
>> are presented in the device tree node describing the IO adapter. These
>> interrupts are not unmapped automatically, and in the case of a hot-plug
>> adapter, the PCI hot-plug layer needs to handle the cleanup to make
>> sure that all the page table entries of the XIVE ESB pages are
>> cleared.
>
> It seems this patch needs more digging to make sure we are handling
> the IRQ unmapping in the correct PCI handler. Could you please keep
> it back for the moment ? 

Yep no worries.

cheers


Re: [PATCH v3 1/3] riscv: Move kernel mapping to vmalloc zone

2020-05-28 Thread Alex Ghiti

Hi Zong,

On 5/27/20 3:29 AM, Alex Ghiti wrote:

On 5/27/20 2:05 AM, Zong Li wrote:

On Wed, May 27, 2020 at 1:06 AM Alex Ghiti  wrote:

Hi Zong,

On 5/26/20 5:43 AM, Zong Li wrote:

On Sun, May 24, 2020 at 4:54 PM Alexandre Ghiti  wrote:

This is a preparatory patch for relocatable kernel.

The kernel used to be linked at PAGE_OFFSET address and used to be loaded
physically at the beginning of the main memory. Therefore, we could use
the linear mapping for the kernel mapping.

But the relocated kernel base address will be different from PAGE_OFFSET
and since in the linear mapping, two different virtual addresses cannot
point to the same physical address, the kernel mapping needs to lie outside
the linear mapping.

In addition, because modules and BPF must be close to the kernel (inside
a +-2GB window), the kernel is placed at the end of the vmalloc zone minus
2GB, which leaves room for modules and BPF. The kernel could not be
placed at the beginning of the vmalloc zone since other vmalloc
allocations from the kernel could get all the +-2GB window around the
kernel, which would prevent new modules and BPF programs from being loaded.

Signed-off-by: Alexandre Ghiti 
---
   arch/riscv/boot/loader.lds.S |  3 +-
   arch/riscv/include/asm/page.h    | 10 +-
   arch/riscv/include/asm/pgtable.h | 37 +---
   arch/riscv/kernel/head.S |  3 +-
   arch/riscv/kernel/module.c   |  4 +--
   arch/riscv/kernel/vmlinux.lds.S  |  3 +-
   arch/riscv/mm/init.c | 58 +---
   arch/riscv/mm/physaddr.c |  2 +-
   8 files changed, 87 insertions(+), 33 deletions(-)

diff --git a/arch/riscv/boot/loader.lds.S b/arch/riscv/boot/loader.lds.S
index 47a5003c2e28..62d94696a19c 100644
--- a/arch/riscv/boot/loader.lds.S
+++ b/arch/riscv/boot/loader.lds.S
@@ -1,13 +1,14 @@
   /* SPDX-License-Identifier: GPL-2.0 */

   #include 
+#include 

   OUTPUT_ARCH(riscv)
   ENTRY(_start)

   SECTIONS
   {
-   . = PAGE_OFFSET;
+   . = KERNEL_LINK_ADDR;

  .payload : {
  *(.payload)
diff --git a/arch/riscv/include/asm/page.h b/arch/riscv/include/asm/page.h
index 2d50f76efe48..48bb09b6a9b7 100644
--- a/arch/riscv/include/asm/page.h
+++ b/arch/riscv/include/asm/page.h
@@ -90,18 +90,26 @@ typedef struct page *pgtable_t;

   #ifdef CONFIG_MMU
   extern unsigned long va_pa_offset;
+extern unsigned long va_kernel_pa_offset;
   extern unsigned long pfn_base;
   #define ARCH_PFN_OFFSET    (pfn_base)
   #else
   #define va_pa_offset   0
+#define va_kernel_pa_offset    0
   #define ARCH_PFN_OFFSET    (PAGE_OFFSET >> PAGE_SHIFT)
   #endif /* CONFIG_MMU */

   extern unsigned long max_low_pfn;
   extern unsigned long min_low_pfn;
+extern unsigned long kernel_virt_addr;

   #define __pa_to_va_nodebug(x)  ((void *)((unsigned long) (x) + va_pa_offset))

-#define __va_to_pa_nodebug(x)  ((unsigned long)(x) - va_pa_offset)
+#define linear_mapping_va_to_pa(x) ((unsigned long)(x) - va_pa_offset)

+#define kernel_mapping_va_to_pa(x) \
+   ((unsigned long)(x) - va_kernel_pa_offset)
+#define __va_to_pa_nodebug(x)  \
+   (((x) >= PAGE_OFFSET) ? \
+   linear_mapping_va_to_pa(x) : kernel_mapping_va_to_pa(x))


   #ifdef CONFIG_DEBUG_VIRTUAL
   extern phys_addr_t __virt_to_phys(unsigned long x);
diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
index 35b60035b6b0..25213cfaf680 100644
--- a/arch/riscv/include/asm/pgtable.h
+++ b/arch/riscv/include/asm/pgtable.h
@@ -11,23 +11,29 @@

   #include 

-#ifndef __ASSEMBLY__
-
-/* Page Upper Directory not used in RISC-V */
-#include 
-#include 
-#include 
-#include 
-
-#ifdef CONFIG_MMU
+#ifndef CONFIG_MMU
+#define KERNEL_VIRT_ADDR   PAGE_OFFSET
+#define KERNEL_LINK_ADDR   PAGE_OFFSET
+#else
+/*
+ * Leave 2GB for modules and BPF that must lie within a 2GB range around
+ * the kernel.
+ */
+#define KERNEL_VIRT_ADDR   (VMALLOC_END - SZ_2G + 1)
+#define KERNEL_LINK_ADDR   KERNEL_VIRT_ADDR

   #define VMALLOC_SIZE (KERN_VIRT_SIZE >> 1)
   #define VMALLOC_END  (PAGE_OFFSET - 1)
   #define VMALLOC_START    (PAGE_OFFSET - VMALLOC_SIZE)

   #define BPF_JIT_REGION_SIZE    (SZ_128M)
-#define BPF_JIT_REGION_START   (PAGE_OFFSET - BPF_JIT_REGION_SIZE)
-#define BPF_JIT_REGION_END (VMALLOC_END)
+#define BPF_JIT_REGION_START   (kernel_virt_addr)
+#define BPF_JIT_REGION_END (kernel_virt_addr + BPF_JIT_REGION_SIZE)

It seems there is a potential risk here: the BPF region overlaps the
kernel mapping, so if the kernel size is bigger than 128MB, the BPF
region would be occupied and exhausted by the kernel mapping.

Is there a risk here, as I mentioned?



Sorry, I forgot to answer this one: I was confident that 128MB was
large enough for the kernel and BPF. But I see no reason to leave this
risk, so I'll change kernel_virt_addr to _end so that BPF will have its
128MB reserved.

Thanks !

Alex



Re: [PATCH] powerpc/64: Remove unused generic_secondary_thread_init()

2020-05-28 Thread Michael Ellerman
Jordan Niethe  writes:
> On Tue, May 26, 2020 at 4:36 PM Michael Ellerman  wrote:
>>
>> The last caller was removed in 2014 in commit fb5a515704d7 ("powerpc:
>> Remove platforms/wsp and associated pieces").
>>
>> Once generic_secondary_thread_init() is removed there are no longer
>> any uses of book3e_secondary_thread_init() or
>> generic_secondary_common_init so remove them too.
>>
>> Signed-off-by: Michael Ellerman 
>> ---
>>  arch/powerpc/include/asm/smp.h   |  1 -
>>  arch/powerpc/kernel/exceptions-64e.S |  4 
>>  arch/powerpc/kernel/head_64.S| 18 --
>>  3 files changed, 23 deletions(-)
>>
>> diff --git a/arch/powerpc/include/asm/smp.h b/arch/powerpc/include/asm/smp.h
>> index 49a25e2400f2..81a49566ccd8 100644
>> --- a/arch/powerpc/include/asm/smp.h
>> +++ b/arch/powerpc/include/asm/smp.h
>> @@ -243,7 +243,6 @@ extern void arch_send_call_function_ipi_mask(const struct cpumask *mask);
>>   * 64-bit but defining them all here doesn't harm
>>   */
>>  extern void generic_secondary_smp_init(void);
>> -extern void generic_secondary_thread_init(void);
>>  extern unsigned long __secondary_hold_spinloop;
>>  extern unsigned long __secondary_hold_acknowledge;
>>  extern char __secondary_hold;
>> diff --git a/arch/powerpc/kernel/exceptions-64e.S b/arch/powerpc/kernel/exceptions-64e.S
>> index d9ed79415100..9f9e8686798b 100644
>> --- a/arch/powerpc/kernel/exceptions-64e.S
>> +++ b/arch/powerpc/kernel/exceptions-64e.S
>> @@ -1814,10 +1814,6 @@ _GLOBAL(book3e_secondary_core_init)
>>  1: mtlrr28
>> blr
>>
>> -_GLOBAL(book3e_secondary_thread_init)
>> -   mflrr28
>> -   b   3b
>> -
>> .globl init_core_book3e
>>  init_core_book3e:
>> /* Establish the interrupt vector base */
>> diff --git a/arch/powerpc/kernel/head_64.S b/arch/powerpc/kernel/head_64.S
>> index 0e05a9a47a4b..4ae2c18c5fc6 100644
>> --- a/arch/powerpc/kernel/head_64.S
>> +++ b/arch/powerpc/kernel/head_64.S
>> @@ -302,23 +302,6 @@ _GLOBAL(fsl_secondary_thread_init)
>>  1:
>>  #endif
>
> Nothing directly calls generic_secondary_thread_init() but I think
> fsl_secondary_thread_init() which is directly above "falls through"
> into it. fsl_secondary_thread_init() still has callers.

Damnit, you're right, I love deleting code! Thanks for reviewing.

I'll send a v2.

cheers


Re: [PATCH] powerpc/book3s64/kvm: Fix secondary page table walk warning during migration

2020-05-28 Thread Aneesh Kumar K.V

On 5/28/20 6:23 PM, Michael Ellerman wrote:

"Aneesh Kumar K.V"  writes:

This patch fixes the below warning reported during migration.

  find_kvm_secondary_pte called with kvm mmu_lock not held
  CPU: 23 PID: 5341 Comm: qemu-system-ppc Tainted: GW 
5.7.0-rc5-kvm-00211-g9ccf10d6d088 #432
  NIP:  c00800fe848c LR: c00800fe8488 CTR: 
  REGS: c01e19f077e0 TRAP: 0700   Tainted: GW  
(5.7.0-rc5-kvm-00211-g9ccf10d6d088)
  MSR:  90029033   CR: 4422  XER: 2004
  CFAR: c012f5ac IRQMASK: 0
  GPR00: c00800fe8488 c01e19f07a70 c00800ffe200 0039
  GPR04: 0001 c01ffc8b4900 00018840 0007
  GPR08: 0003 0001 0007 0001
  GPR12: 2000 c01fff6d9400 00011f884678 7fff70b7
  GPR16: 7fff7137cb90 7fff7dcb4410 0001 
  GPR20: 0ffe  0001 
  GPR24: 8000 0001 c01e1f67e600 c01e1fd82410
  GPR28: 1000 c01e2e41 0fff 0ffe
  NIP [c00800fe848c] kvmppc_hv_get_dirty_log_radix+0x2e4/0x340 [kvm_hv]
  LR [c00800fe8488] kvmppc_hv_get_dirty_log_radix+0x2e0/0x340 [kvm_hv]
  Call Trace:
  [c01e19f07a70] [c00800fe8488] 
kvmppc_hv_get_dirty_log_radix+0x2e0/0x340 [kvm_hv] (unreliable)
  [c01e19f07b50] [c00800fd42e4] 
kvm_vm_ioctl_get_dirty_log_hv+0x33c/0x3c0 [kvm_hv]
  [c01e19f07be0] [c00800eea878] kvm_vm_ioctl_get_dirty_log+0x30/0x50 
[kvm]
  [c01e19f07c00] [c00800edc818] kvm_vm_ioctl+0x2b0/0xc00 [kvm]
  [c01e19f07d50] [c046e148] ksys_ioctl+0xf8/0x150
  [c01e19f07da0] [c046e1c8] sys_ioctl+0x28/0x80
  [c01e19f07dc0] [c003652c] system_call_exception+0x16c/0x240
  [c01e19f07e20] [c000d070] system_call_common+0xf0/0x278
  Instruction dump:
  7d3a512a 4200ffd0 7ffefb78 4bfffdc4 6000 3c82 e8848468 3c62
  e86384a8 38840010 4800673d e8410018 <0fe0> 4bfffdd4 6000 6000

Signed-off-by: Aneesh Kumar K.V 
---
  arch/powerpc/include/asm/kvm_book3s_64.h |  9 ++
  arch/powerpc/kvm/book3s_64_mmu_radix.c   | 35 
  2 files changed, 38 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h b/arch/powerpc/include/asm/kvm_book3s_64.h
index c58e64a0a74f..cd5bad08b8d3 100644
--- a/arch/powerpc/include/asm/kvm_book3s_64.h
+++ b/arch/powerpc/include/asm/kvm_book3s_64.h
@@ -635,6 +635,15 @@ extern void kvmhv_remove_nest_rmap_range(struct kvm *kvm,
unsigned long gpa, unsigned long hpa,
unsigned long nbytes);
  
+static inline pte_t *__find_kvm_secondary_pte(struct kvm *kvm, unsigned long ea,
+					      unsigned *hshift)
+{
+   pte_t *pte;
+
+   pte = __find_linux_pte(kvm->arch.pgtable, ea, NULL, hshift);
+   return pte;
+}


Why not just open code this in the single caller?


We could do that. But I thought it was confusing, and we want to avoid
using the Linux page table walker (__find_linux_pte()) directly from
within KVM code to walk the partition-scoped table.




Leaving it here someone will invariably decide to call it one day.



I was looking at documenting it at the call site. We can possibly add a
comment here explaining that this should not be called directly, and that
any use should have a comment around it explaining why it is safe.




If you think it's worth keeping then it should have a comment explaining
why it doesn't check the lock, and find_kvm_secondary_pte() should call
it no?



I was trying to avoid that indirection. I guess we should add a comment
which suggests avoiding __find_kvm_secondary_pte(), and if it is used we
should ask for a comment at the call site explaining why it is safe?


-aneesh



Re: [PATCH V4 1/2] powerpc/perf: Add support for outputting extended regs in perf intr_regs

2020-05-28 Thread Nageswara R Sastry

"Linuxppc-dev"  wrote on 27/05/2020 03:20:17 PM:

> From: "Athira Rajeev" 
> To: linuxppc-dev@lists.ozlabs.org
> Cc: ravi.bango...@linux.ibm.com, atraj...@linux.vnet.ibm.com,
> ma...@linux.vnet.ibm.com, linux-ker...@vger.kernel.org,
> a...@kernel.org, a...@linux.vnet.ibm.com, jo...@kernel.org
> Date: 28/05/2020 02:46 PM
> Subject: [PATCH V4 1/2] powerpc/perf: Add support for outputting
> extended regs in perf intr_regs
> Sent by: "Linuxppc-dev"  +maddy=linux.vnet.ibm@lists.ozlabs.org>
>
> From: Anju T Sudhakar 
>
> Add support for perf extended register capability in powerpc.
> The capability flag PERF_PMU_CAP_EXTENDED_REGS, is used to indicate the
> PMU which support extended registers. The generic code define the mask
> of extended registers as 0 for non supported architectures.
>
> The patch adds extended regs support for the power9 platform by
> exposing the MMCR0, MMCR1 and MMCR2 registers.
>
> The REG_RESERVED mask needs an update to include the extended regs.
> `PERF_REG_EXTENDED_MASK`, which contains the mask value of the supported
> registers, is defined at runtime in the kernel based on platform, since
> the supported registers may differ from one processor version to another,
> and hence so does the MASK value.
>
> with patch
> --
>
> available registers: r0 r1 r2 r3 r4 r5 r6 r7 r8 r9 r10 r11
> r12 r13 r14 r15 r16 r17 r18 r19 r20 r21 r22 r23 r24 r25 r26
> r27 r28 r29 r30 r31 nip msr orig_r3 ctr link xer ccr softe
> trap dar dsisr sier mmcra mmcr0 mmcr1 mmcr2
>
> PERF_RECORD_SAMPLE(IP, 0x1): 4784/4784: 0 period: 1 addr: 0
> ... intr regs: mask 0x ABI 64-bit
>  r00xc012b77c
>  r10xc03fe5e03930
>  r20xc1b0e000
>  r30xc03fdcddf800
>  r40xc03fc788
>  r50x9c422724be
>  r60xc03fe5e03908
>  r70xff63bddc8706
>  r80x9e4
>  r90x0
>  r10   0x1
>  r11   0x0
>  r12   0xc01299c0
>  r13   0xc03c4800
>  r14   0x0
>  r15   0x7fffdd8b8b00
>  r16   0x0
>  r17   0x7fffdd8be6b8
>  r18   0x7e7076607730
>  r19   0x2f
>  r20   0xc0001fc26c68
>  r21   0xc0002041e4227e00
>  r22   0xc0002018fb60
>  r23   0x1
>  r24   0xc03ffec4d900
>  r25   0x8000
>  r26   0x0
>  r27   0x1
>  r28   0x1
>  r29   0xc1be1260
>  r30   0x6008010
>  r31   0xc03ffebb7218
>  nip   0xc012b910
>  msr   0x90009033
>  orig_r3 0xc012b86c
>  ctr   0xc01299c0
>  link  0xc012b77c
>  xer   0x0
>  ccr   0x2800
>  softe 0x1
>  trap  0xf00
>  dar   0x0
>  dsisr 0x800
>  sier  0x0
>  mmcra 0x800
>  mmcr0 0x82008090
>  mmcr1 0x1e00
>  mmcr2 0x0
>  ... thread: perf:4784
>
> Signed-off-by: Anju T Sudhakar 
> [Defined PERF_REG_EXTENDED_MASK at run time to add support for
> different platforms ]
> Signed-off-by: Athira Rajeev 
> Reviewed-by: Madhavan Srinivasan 

Tested with 5.7.0-rc2
Tested-by: Nageswara R Sastry 


> ---
>  arch/powerpc/include/asm/perf_event_server.h |  8 +++
>  arch/powerpc/include/uapi/asm/perf_regs.h| 14 +++-
>  arch/powerpc/perf/core-book3s.c  |  1 +
>  arch/powerpc/perf/perf_regs.c| 34 +---
>  arch/powerpc/perf/power9-pmu.c   |  6 +
>  5 files changed, 59 insertions(+), 4 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/perf_event_server.h b/arch/powerpc/include/asm/perf_event_server.h
> index 3e9703f..1458e1a 100644
> --- a/arch/powerpc/include/asm/perf_event_server.h
> +++ b/arch/powerpc/include/asm/perf_event_server.h
> @@ -15,6 +15,9 @@
>  #define MAX_EVENT_ALTERNATIVES   8
>  #define MAX_LIMITED_HWCOUNTERS   2
>
> +extern u64 mask_var;
> +#define PERF_REG_EXTENDED_MASK  mask_var
> +
>  struct perf_event;
>
>  /*
> @@ -55,6 +58,11 @@ struct power_pmu {
> int   *blacklist_ev;
> /* BHRB entries in the PMU */
> int  bhrb_nr;
> +   /*
> +* set this flag with `PERF_PMU_CAP_EXTENDED_REGS` if
> +* the pmu supports extended perf regs capability
> +*/
> +   int  capabilities;
>  };
>
>  /*
> diff --git a/arch/powerpc/include/uapi/asm/perf_regs.h b/arch/powerpc/include/uapi/asm/perf_regs.h
> index f599064..485b1d5 100644
> --- a/arch/powerpc/include/uapi/asm/perf_regs.h
> +++ b/arch/powerpc/include/uapi/asm/perf_regs.h
> @@ -48,6 +48,18 @@ enum perf_event_powerpc_regs {
> PERF_REG_POWERPC_DSISR,
> PERF_REG_POWERPC_SIER,
> PERF_REG_POWERPC_MMCRA,
> -   PERF_REG_POWERPC_MAX,
> +   /* Extended registers */
> +   PERF_REG_POWERPC_MMCR0,
> +   PERF_REG_POWERPC_MMCR1,
> +   PERF_REG_POWERPC_MMCR2,
> +   /* Max regs without the extended regs */
> +   PERF_REG_POWERPC_MAX = PERF_REG_POWERPC_MMCRA + 1,
>  };
> +
> +#define PERF_REG_PMU_MASK   ((1ULL << PERF_REG_POWERPC_MAX) - 1)
> +
> +/* PERF_REG_EXTENDED_MASK value for CPU_FTR_ARCH_300 */
> +#def

Re: [PATCH V4 2/2] tools/perf: Add perf tools support for extended register capability in powerpc

2020-05-28 Thread Nageswara R Sastry

"Athira Rajeev"  wrote on 27/05/2020 03:20:18 PM:

> From: "Athira Rajeev" 
> To: linuxppc-dev@lists.ozlabs.org
> Cc: linux-ker...@vger.kernel.org, ravi.bango...@linux.ibm.com,
> ma...@linux.vnet.ibm.com, a...@kernel.org, a...@linux.vnet.ibm.com,
> jo...@kernel.org, m...@ellerman.id.au, atraj...@linux.vnet.ibm.com
> Date: 28/05/2020 02:46 PM
> Subject: [PATCH V4 2/2] tools/perf: Add perf tools support for
> extended register capability in powerpc
>
> From: Anju T Sudhakar 
>
> Add the extended regs to sample_reg_mask on the tools side for use
> with the `-I?` option. The perf tools side uses the extended mask to
> display the platform-supported register names (with the -I? option) to
> the user, and also sends this mask to the kernel to capture the extended
> registers in each sample. Hence the mask value is decided based on the
> processor version.
>
> Signed-off-by: Anju T Sudhakar 
> [Decide extended mask at run time based on platform]
> Signed-off-by: Athira Rajeev 
> Reviewed-by: Madhavan Srinivasan 

Tested-by: Nageswara R Sastry 
Tested with 5.7.0-rc2
Tested the following scenarios
1. perf record -I
2. perf report -D  # in output check for the registers
3. perf record -I
4. perf record -I
5. perf record -I
6. perf record -I -e 

> ---
>  tools/arch/powerpc/include/uapi/asm/perf_regs.h | 14 ++-
>  tools/perf/arch/powerpc/include/perf_regs.h |  5 ++-
>  tools/perf/arch/powerpc/util/perf_regs.c| 55 +++++
>  3 files changed, 72 insertions(+), 2 deletions(-)
>
> diff --git a/tools/arch/powerpc/include/uapi/asm/perf_regs.h b/tools/arch/powerpc/include/uapi/asm/perf_regs.h
> index f599064..485b1d5 100644
> --- a/tools/arch/powerpc/include/uapi/asm/perf_regs.h
> +++ b/tools/arch/powerpc/include/uapi/asm/perf_regs.h
> @@ -48,6 +48,18 @@ enum perf_event_powerpc_regs {
> PERF_REG_POWERPC_DSISR,
> PERF_REG_POWERPC_SIER,
> PERF_REG_POWERPC_MMCRA,
> -   PERF_REG_POWERPC_MAX,
> +   /* Extended registers */
> +   PERF_REG_POWERPC_MMCR0,
> +   PERF_REG_POWERPC_MMCR1,
> +   PERF_REG_POWERPC_MMCR2,
> +   /* Max regs without the extended regs */
> +   PERF_REG_POWERPC_MAX = PERF_REG_POWERPC_MMCRA + 1,
>  };
> +
> +#define PERF_REG_PMU_MASK   ((1ULL << PERF_REG_POWERPC_MAX) - 1)
> +
> +/* PERF_REG_EXTENDED_MASK value for CPU_FTR_ARCH_300 */
> +#define PERF_REG_PMU_MASK_300   (((1ULL << (PERF_REG_POWERPC_MMCR2 + 1)) - 1) \
> +- PERF_REG_PMU_MASK)
> +
>  #endif /* _UAPI_ASM_POWERPC_PERF_REGS_H */
> diff --git a/tools/perf/arch/powerpc/include/perf_regs.h b/tools/perf/arch/powerpc/include/perf_regs.h
> index e18a355..46ed00d 100644
> --- a/tools/perf/arch/powerpc/include/perf_regs.h
> +++ b/tools/perf/arch/powerpc/include/perf_regs.h
> @@ -64,7 +64,10 @@
> [PERF_REG_POWERPC_DAR] = "dar",
> [PERF_REG_POWERPC_DSISR] = "dsisr",
> [PERF_REG_POWERPC_SIER] = "sier",
> -   [PERF_REG_POWERPC_MMCRA] = "mmcra"
> +   [PERF_REG_POWERPC_MMCRA] = "mmcra",
> +   [PERF_REG_POWERPC_MMCR0] = "mmcr0",
> +   [PERF_REG_POWERPC_MMCR1] = "mmcr1",
> +   [PERF_REG_POWERPC_MMCR2] = "mmcr2",
>  };
>
>  static inline const char *perf_reg_name(int id)
> diff --git a/tools/perf/arch/powerpc/util/perf_regs.c b/tools/perf/arch/powerpc/util/perf_regs.c
> index 0a52429..9179230 100644
> --- a/tools/perf/arch/powerpc/util/perf_regs.c
> +++ b/tools/perf/arch/powerpc/util/perf_regs.c
> @@ -6,9 +6,14 @@
>
>  #include "../../../util/perf_regs.h"
>  #include "../../../util/debug.h"
> +#include "../../../util/event.h"
> +#include "../../../util/header.h"
> +#include "../../../perf-sys.h"
>
>  #include 
>
> +#define PVR_POWER9  0x004E
> +
>  const struct sample_reg sample_reg_masks[] = {
> SMPL_REG(r0, PERF_REG_POWERPC_R0),
> SMPL_REG(r1, PERF_REG_POWERPC_R1),
> @@ -55,6 +60,9 @@
> SMPL_REG(dsisr, PERF_REG_POWERPC_DSISR),
> SMPL_REG(sier, PERF_REG_POWERPC_SIER),
> SMPL_REG(mmcra, PERF_REG_POWERPC_MMCRA),
> +   SMPL_REG(mmcr0, PERF_REG_POWERPC_MMCR0),
> +   SMPL_REG(mmcr1, PERF_REG_POWERPC_MMCR1),
> +   SMPL_REG(mmcr2, PERF_REG_POWERPC_MMCR2),
> SMPL_REG_END
>  };
>
> @@ -163,3 +171,50 @@ int arch_sdt_arg_parse_op(char *old_op, char
**new_op)
>
> return SDT_ARG_VALID;
>  }
> +
> +uint64_t arch__intr_reg_mask(void)
> +{
> +   struct perf_event_attr attr = {
> +  .type   = PERF_TYPE_HARDWARE,
> +  .config = PERF_COUNT_HW_CPU_CYCLES,
> +  .sample_type= PERF_SAMPLE_REGS_INTR,
> +  .precise_ip = 1,
> +  .disabled   = 1,
> +  .exclude_kernel = 1,
> +   };
> +   int fd, ret;
> +   char buffer[64];
> +   u32 version;
> +   u64 extended_mask = 0;
> +
> +   /* Get the PVR value to set the extended
> +* mask specific to platform
> +*/
> +   get_cpuid(buffer, sizeof(buffer));
> +   ret = sscanf(buffer, "%u,", &version);
> +
> +   if (ret != 1) {
> +  pr_debug("Failed to get the processor version, unable to output extended registers\n");
> +  return PERF_REGS_MASK;
> +   }

Re: [PATCH] powerpc/book3s64/kvm: Fix secondary page table walk warning during migration

2020-05-28 Thread Michael Ellerman
"Aneesh Kumar K.V"  writes:
> This patch fixes the below warning reported during migration.
>
>  find_kvm_secondary_pte called with kvm mmu_lock not held
>  CPU: 23 PID: 5341 Comm: qemu-system-ppc Tainted: GW 
> 5.7.0-rc5-kvm-00211-g9ccf10d6d088 #432
>  NIP:  c00800fe848c LR: c00800fe8488 CTR: 
>  REGS: c01e19f077e0 TRAP: 0700   Tainted: GW  
> (5.7.0-rc5-kvm-00211-g9ccf10d6d088)
>  MSR:  90029033   CR: 4422  XER: 2004
>  CFAR: c012f5ac IRQMASK: 0
>  GPR00: c00800fe8488 c01e19f07a70 c00800ffe200 0039
>  GPR04: 0001 c01ffc8b4900 00018840 0007
>  GPR08: 0003 0001 0007 0001
>  GPR12: 2000 c01fff6d9400 00011f884678 7fff70b7
>  GPR16: 7fff7137cb90 7fff7dcb4410 0001 
>  GPR20: 0ffe  0001 
>  GPR24: 8000 0001 c01e1f67e600 c01e1fd82410
>  GPR28: 1000 c01e2e41 0fff 0ffe
>  NIP [c00800fe848c] kvmppc_hv_get_dirty_log_radix+0x2e4/0x340 [kvm_hv]
>  LR [c00800fe8488] kvmppc_hv_get_dirty_log_radix+0x2e0/0x340 [kvm_hv]
>  Call Trace:
>  [c01e19f07a70] [c00800fe8488] 
> kvmppc_hv_get_dirty_log_radix+0x2e0/0x340 [kvm_hv] (unreliable)
>  [c01e19f07b50] [c00800fd42e4] 
> kvm_vm_ioctl_get_dirty_log_hv+0x33c/0x3c0 [kvm_hv]
>  [c01e19f07be0] [c00800eea878] kvm_vm_ioctl_get_dirty_log+0x30/0x50 
> [kvm]
>  [c01e19f07c00] [c00800edc818] kvm_vm_ioctl+0x2b0/0xc00 [kvm]
>  [c01e19f07d50] [c046e148] ksys_ioctl+0xf8/0x150
>  [c01e19f07da0] [c046e1c8] sys_ioctl+0x28/0x80
>  [c01e19f07dc0] [c003652c] system_call_exception+0x16c/0x240
>  [c01e19f07e20] [c000d070] system_call_common+0xf0/0x278
>  Instruction dump:
>  7d3a512a 4200ffd0 7ffefb78 4bfffdc4 6000 3c82 e8848468 3c62
>  e86384a8 38840010 4800673d e8410018 <0fe0> 4bfffdd4 6000 6000
>
> Signed-off-by: Aneesh Kumar K.V 
> ---
>  arch/powerpc/include/asm/kvm_book3s_64.h |  9 ++
>  arch/powerpc/kvm/book3s_64_mmu_radix.c   | 35 
>  2 files changed, 38 insertions(+), 6 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h 
> b/arch/powerpc/include/asm/kvm_book3s_64.h
> index c58e64a0a74f..cd5bad08b8d3 100644
> --- a/arch/powerpc/include/asm/kvm_book3s_64.h
> +++ b/arch/powerpc/include/asm/kvm_book3s_64.h
> @@ -635,6 +635,15 @@ extern void kvmhv_remove_nest_rmap_range(struct kvm *kvm,
>   unsigned long gpa, unsigned long hpa,
>   unsigned long nbytes);
>  
> +static inline pte_t *__find_kvm_secondary_pte(struct kvm *kvm, unsigned long 
> ea,
> +   unsigned *hshift)
> +{
> + pte_t *pte;
> +
> + pte = __find_linux_pte(kvm->arch.pgtable, ea, NULL, hshift);
> + return pte;
> +}

Why not just open code this in the single caller?

Leaving it here someone will invariably decide to call it one day.

If you think it's worth keeping then it should have a comment explaining
why it doesn't check the lock, and find_kvm_secondary_pte() should call
it no?

cheers


Re: [PATCH] powerpc/bpf: Enable bpf_probe_read{, str}() on powerpc again

2020-05-28 Thread Michael Ellerman
Petr Mladek  writes:
> On Thu 2020-05-28 11:03:43, Michael Ellerman wrote:
>> Petr Mladek  writes:
>> > The commit 0ebeea8ca8a4d1d453a ("bpf: Restrict bpf_probe_read{, str}() only
>> > to archs where they work") meant that the bpf_probe_read{, str}() functions
>> > were no longer available on architectures where the same logical address
>> > might have different content in kernel and user memory mappings. These
>> > architectures should use the probe_read_{user,kernel}_str helpers instead.
>> >
>> > For backward compatibility, the problematic functions are still available
>> > on architectures where the user and kernel address spaces are not
>> > overlapping. This is defined by CONFIG_ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE.
>> >
>> > At the moment, these backward-compatible functions are enabled only
>> > on x86_64, arm, and arm64. Let's do it also on powerpc, which has
>> > a non-overlapping address space as well.
>> >
>> > Signed-off-by: Petr Mladek 
>> 
>> This seems like it should have a Fixes: tag and go into v5.7?
>
> Good point:
>
> Fixes: 0ebeea8ca8a4d1d4 ("bpf: Restrict bpf_probe_read{, str}() only to archs where they work")
>
> And yes, it should ideally go into v5.7 either directly or via stable.
>
> Should I resend the patch with Fixes and
> Cc: sta...@vger.kernel.org #v5.7 lines, please?

If it goes into v5.7 then it doesn't need a Cc: stable, and I guess a
Fixes: tag is nice to have but not so important as it already mentions
the commit that caused the problem. So a resend probably isn't
necessary.

Acked-by: Michael Ellerman 


Daniel can you pick this up, or should I?

cheers



[PATCH AUTOSEL 4.4 4/7] net/ethernet/freescale: rework quiesce/activate for ucc_geth

2020-05-28 Thread Sasha Levin
From: Valentin Longchamp 

[ Upstream commit 79dde73cf9bcf1dd317a2667f78b758e9fe139ed ]

ugeth_quiesce/activate are used to halt the controller when there is a
link change that requires the MAC to be reconfigured.

The previous implementation called netif_device_detach(). This however
causes the initial activation of the netdevice to fail precisely because
it's detached. For details, see [1].

A possible workaround was the revert of commit
net: linkwatch: add check for netdevice being present to linkwatch_do_dev
However, the check introduced in the above commit is correct and shall be
kept.

The netif_device_detach() call is thus replaced with
netif_tx_stop_all_queues(), which prevents any transmission. This allows
the MAC config change required by the link change to be performed without
detaching the corresponding netdevice, and thus without preventing its
initial activation.

[1] https://lists.openwall.net/netdev/2020/01/08/201

Signed-off-by: Valentin Longchamp 
Acked-by: Matteo Ghidoni 
Signed-off-by: David S. Miller 
Signed-off-by: Sasha Levin 
---
 drivers/net/ethernet/freescale/ucc_geth.c | 13 +++--
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/freescale/ucc_geth.c b/drivers/net/ethernet/freescale/ucc_geth.c
index 55ac00055977..96a1f62cc148 100644
--- a/drivers/net/ethernet/freescale/ucc_geth.c
+++ b/drivers/net/ethernet/freescale/ucc_geth.c
@@ -45,6 +45,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "ucc_geth.h"
 
@@ -1551,11 +1552,8 @@ static int ugeth_disable(struct ucc_geth_private *ugeth, enum comm_dir mode)
 
 static void ugeth_quiesce(struct ucc_geth_private *ugeth)
 {
-   /* Prevent any further xmits, plus detach the device. */
-   netif_device_detach(ugeth->ndev);
-
-   /* Wait for any current xmits to finish. */
-   netif_tx_disable(ugeth->ndev);
+   /* Prevent any further xmits */
+   netif_tx_stop_all_queues(ugeth->ndev);
 
/* Disable the interrupt to avoid NAPI rescheduling. */
disable_irq(ugeth->ug_info->uf_info.irq);
@@ -1568,7 +1566,10 @@ static void ugeth_activate(struct ucc_geth_private *ugeth)
 {
napi_enable(&ugeth->napi);
enable_irq(ugeth->ug_info->uf_info.irq);
-   netif_device_attach(ugeth->ndev);
+
+   /* allow to xmit again  */
+   netif_tx_wake_all_queues(ugeth->ndev);
+   __netdev_watchdog_up(ugeth->ndev);
 }
 
 /* Called every time the controller might need to be made
-- 
2.25.1



[PATCH AUTOSEL 4.9 5/9] net/ethernet/freescale: rework quiesce/activate for ucc_geth

2020-05-28 Thread Sasha Levin
From: Valentin Longchamp 

[ Upstream commit 79dde73cf9bcf1dd317a2667f78b758e9fe139ed ]

ugeth_quiesce/activate are used to halt the controller when there is a
link change that requires reconfiguring the MAC.

The previous implementation called netif_device_detach(). This, however,
causes the initial activation of the netdevice to fail, precisely
because it is detached. For details, see [1].

A possible workaround was to revert commit
"net: linkwatch: add check for netdevice being present to linkwatch_do_dev".
However, the check introduced by that commit is correct and shall be
kept.

netif_device_detach() is thus replaced with
netif_tx_stop_all_queues(), which prevents any transmission. This allows
the MAC config change required by the link change to be performed
without detaching the corresponding netdevice, and thus without
preventing its initial activation.

[1] https://lists.openwall.net/netdev/2020/01/08/201

Signed-off-by: Valentin Longchamp 
Acked-by: Matteo Ghidoni 
Signed-off-by: David S. Miller 
Signed-off-by: Sasha Levin 
---
 drivers/net/ethernet/freescale/ucc_geth.c | 13 +++--
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/freescale/ucc_geth.c b/drivers/net/ethernet/freescale/ucc_geth.c
index 714593023bbc..af922bac19ae 100644
--- a/drivers/net/ethernet/freescale/ucc_geth.c
+++ b/drivers/net/ethernet/freescale/ucc_geth.c
@@ -45,6 +45,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "ucc_geth.h"
 
@@ -1551,11 +1552,8 @@ static int ugeth_disable(struct ucc_geth_private *ugeth, enum comm_dir mode)
 
 static void ugeth_quiesce(struct ucc_geth_private *ugeth)
 {
-   /* Prevent any further xmits, plus detach the device. */
-   netif_device_detach(ugeth->ndev);
-
-   /* Wait for any current xmits to finish. */
-   netif_tx_disable(ugeth->ndev);
+   /* Prevent any further xmits */
+   netif_tx_stop_all_queues(ugeth->ndev);
 
/* Disable the interrupt to avoid NAPI rescheduling. */
disable_irq(ugeth->ug_info->uf_info.irq);
@@ -1568,7 +1566,10 @@ static void ugeth_activate(struct ucc_geth_private *ugeth)
 {
napi_enable(&ugeth->napi);
enable_irq(ugeth->ug_info->uf_info.irq);
-   netif_device_attach(ugeth->ndev);
+
+   /* allow to xmit again  */
+   netif_tx_wake_all_queues(ugeth->ndev);
+   __netdev_watchdog_up(ugeth->ndev);
 }
 
 /* Called every time the controller might need to be made
-- 
2.25.1



[PATCH AUTOSEL 4.14 08/13] net/ethernet/freescale: rework quiesce/activate for ucc_geth

2020-05-28 Thread Sasha Levin
From: Valentin Longchamp 

[ Upstream commit 79dde73cf9bcf1dd317a2667f78b758e9fe139ed ]

ugeth_quiesce/activate are used to halt the controller when there is a
link change that requires reconfiguring the MAC.

The previous implementation called netif_device_detach(). This, however,
causes the initial activation of the netdevice to fail, precisely
because it is detached. For details, see [1].

A possible workaround was to revert commit
"net: linkwatch: add check for netdevice being present to linkwatch_do_dev".
However, the check introduced by that commit is correct and shall be
kept.

netif_device_detach() is thus replaced with
netif_tx_stop_all_queues(), which prevents any transmission. This allows
the MAC config change required by the link change to be performed
without detaching the corresponding netdevice, and thus without
preventing its initial activation.

[1] https://lists.openwall.net/netdev/2020/01/08/201

Signed-off-by: Valentin Longchamp 
Acked-by: Matteo Ghidoni 
Signed-off-by: David S. Miller 
Signed-off-by: Sasha Levin 
---
 drivers/net/ethernet/freescale/ucc_geth.c | 13 +++--
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/freescale/ucc_geth.c b/drivers/net/ethernet/freescale/ucc_geth.c
index bddf4c25ee6e..7c2a9fd4dc1a 100644
--- a/drivers/net/ethernet/freescale/ucc_geth.c
+++ b/drivers/net/ethernet/freescale/ucc_geth.c
@@ -45,6 +45,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "ucc_geth.h"
 
@@ -1551,11 +1552,8 @@ static int ugeth_disable(struct ucc_geth_private *ugeth, enum comm_dir mode)
 
 static void ugeth_quiesce(struct ucc_geth_private *ugeth)
 {
-   /* Prevent any further xmits, plus detach the device. */
-   netif_device_detach(ugeth->ndev);
-
-   /* Wait for any current xmits to finish. */
-   netif_tx_disable(ugeth->ndev);
+   /* Prevent any further xmits */
+   netif_tx_stop_all_queues(ugeth->ndev);
 
/* Disable the interrupt to avoid NAPI rescheduling. */
disable_irq(ugeth->ug_info->uf_info.irq);
@@ -1568,7 +1566,10 @@ static void ugeth_activate(struct ucc_geth_private *ugeth)
 {
napi_enable(&ugeth->napi);
enable_irq(ugeth->ug_info->uf_info.irq);
-   netif_device_attach(ugeth->ndev);
+
+   /* allow to xmit again  */
+   netif_tx_wake_all_queues(ugeth->ndev);
+   __netdev_watchdog_up(ugeth->ndev);
 }
 
 /* Called every time the controller might need to be made
-- 
2.25.1



[PATCH AUTOSEL 4.19 12/17] net/ethernet/freescale: rework quiesce/activate for ucc_geth

2020-05-28 Thread Sasha Levin
From: Valentin Longchamp 

[ Upstream commit 79dde73cf9bcf1dd317a2667f78b758e9fe139ed ]

ugeth_quiesce/activate are used to halt the controller when there is a
link change that requires reconfiguring the MAC.

The previous implementation called netif_device_detach(). This, however,
causes the initial activation of the netdevice to fail, precisely
because it is detached. For details, see [1].

A possible workaround was to revert commit
"net: linkwatch: add check for netdevice being present to linkwatch_do_dev".
However, the check introduced by that commit is correct and shall be
kept.

netif_device_detach() is thus replaced with
netif_tx_stop_all_queues(), which prevents any transmission. This allows
the MAC config change required by the link change to be performed
without detaching the corresponding netdevice, and thus without
preventing its initial activation.

[1] https://lists.openwall.net/netdev/2020/01/08/201

Signed-off-by: Valentin Longchamp 
Acked-by: Matteo Ghidoni 
Signed-off-by: David S. Miller 
Signed-off-by: Sasha Levin 
---
 drivers/net/ethernet/freescale/ucc_geth.c | 13 +++--
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/freescale/ucc_geth.c b/drivers/net/ethernet/freescale/ucc_geth.c
index a5bf02ae4bc5..5de6f7c73c1f 100644
--- a/drivers/net/ethernet/freescale/ucc_geth.c
+++ b/drivers/net/ethernet/freescale/ucc_geth.c
@@ -45,6 +45,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "ucc_geth.h"
 
@@ -1551,11 +1552,8 @@ static int ugeth_disable(struct ucc_geth_private *ugeth, enum comm_dir mode)
 
 static void ugeth_quiesce(struct ucc_geth_private *ugeth)
 {
-   /* Prevent any further xmits, plus detach the device. */
-   netif_device_detach(ugeth->ndev);
-
-   /* Wait for any current xmits to finish. */
-   netif_tx_disable(ugeth->ndev);
+   /* Prevent any further xmits */
+   netif_tx_stop_all_queues(ugeth->ndev);
 
/* Disable the interrupt to avoid NAPI rescheduling. */
disable_irq(ugeth->ug_info->uf_info.irq);
@@ -1568,7 +1566,10 @@ static void ugeth_activate(struct ucc_geth_private *ugeth)
 {
napi_enable(&ugeth->napi);
enable_irq(ugeth->ug_info->uf_info.irq);
-   netif_device_attach(ugeth->ndev);
+
+   /* allow to xmit again  */
+   netif_tx_wake_all_queues(ugeth->ndev);
+   __netdev_watchdog_up(ugeth->ndev);
 }
 
 /* Called every time the controller might need to be made
-- 
2.25.1



[PATCH AUTOSEL 5.4 15/26] net/ethernet/freescale: rework quiesce/activate for ucc_geth

2020-05-28 Thread Sasha Levin
From: Valentin Longchamp 

[ Upstream commit 79dde73cf9bcf1dd317a2667f78b758e9fe139ed ]

ugeth_quiesce/activate are used to halt the controller when there is a
link change that requires reconfiguring the MAC.

The previous implementation called netif_device_detach(). This, however,
causes the initial activation of the netdevice to fail, precisely
because it is detached. For details, see [1].

A possible workaround was to revert commit
"net: linkwatch: add check for netdevice being present to linkwatch_do_dev".
However, the check introduced by that commit is correct and shall be
kept.

netif_device_detach() is thus replaced with
netif_tx_stop_all_queues(), which prevents any transmission. This allows
the MAC config change required by the link change to be performed
without detaching the corresponding netdevice, and thus without
preventing its initial activation.

[1] https://lists.openwall.net/netdev/2020/01/08/201

Signed-off-by: Valentin Longchamp 
Acked-by: Matteo Ghidoni 
Signed-off-by: David S. Miller 
Signed-off-by: Sasha Levin 
---
 drivers/net/ethernet/freescale/ucc_geth.c | 13 +++--
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/freescale/ucc_geth.c b/drivers/net/ethernet/freescale/ucc_geth.c
index f839fa94ebdd..d3b8ce734c1b 100644
--- a/drivers/net/ethernet/freescale/ucc_geth.c
+++ b/drivers/net/ethernet/freescale/ucc_geth.c
@@ -42,6 +42,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "ucc_geth.h"
 
@@ -1548,11 +1549,8 @@ static int ugeth_disable(struct ucc_geth_private *ugeth, enum comm_dir mode)
 
 static void ugeth_quiesce(struct ucc_geth_private *ugeth)
 {
-   /* Prevent any further xmits, plus detach the device. */
-   netif_device_detach(ugeth->ndev);
-
-   /* Wait for any current xmits to finish. */
-   netif_tx_disable(ugeth->ndev);
+   /* Prevent any further xmits */
+   netif_tx_stop_all_queues(ugeth->ndev);
 
/* Disable the interrupt to avoid NAPI rescheduling. */
disable_irq(ugeth->ug_info->uf_info.irq);
@@ -1565,7 +1563,10 @@ static void ugeth_activate(struct ucc_geth_private *ugeth)
 {
napi_enable(&ugeth->napi);
enable_irq(ugeth->ug_info->uf_info.irq);
-   netif_device_attach(ugeth->ndev);
+
+   /* allow to xmit again  */
+   netif_tx_wake_all_queues(ugeth->ndev);
+   __netdev_watchdog_up(ugeth->ndev);
 }
 
 /* Called every time the controller might need to be made
-- 
2.25.1



[PATCH AUTOSEL 5.6 30/47] net/ethernet/freescale: rework quiesce/activate for ucc_geth

2020-05-28 Thread Sasha Levin
From: Valentin Longchamp 

[ Upstream commit 79dde73cf9bcf1dd317a2667f78b758e9fe139ed ]

ugeth_quiesce/activate are used to halt the controller when there is a
link change that requires reconfiguring the MAC.

The previous implementation called netif_device_detach(). This, however,
causes the initial activation of the netdevice to fail, precisely
because it is detached. For details, see [1].

A possible workaround was to revert commit
"net: linkwatch: add check for netdevice being present to linkwatch_do_dev".
However, the check introduced by that commit is correct and shall be
kept.

netif_device_detach() is thus replaced with
netif_tx_stop_all_queues(), which prevents any transmission. This allows
the MAC config change required by the link change to be performed
without detaching the corresponding netdevice, and thus without
preventing its initial activation.

[1] https://lists.openwall.net/netdev/2020/01/08/201

Signed-off-by: Valentin Longchamp 
Acked-by: Matteo Ghidoni 
Signed-off-by: David S. Miller 
Signed-off-by: Sasha Levin 
---
 drivers/net/ethernet/freescale/ucc_geth.c | 13 +++--
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/freescale/ucc_geth.c b/drivers/net/ethernet/freescale/ucc_geth.c
index 0d101c00286f..ab1b4a77b4a3 100644
--- a/drivers/net/ethernet/freescale/ucc_geth.c
+++ b/drivers/net/ethernet/freescale/ucc_geth.c
@@ -42,6 +42,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "ucc_geth.h"
 
@@ -1548,11 +1549,8 @@ static int ugeth_disable(struct ucc_geth_private *ugeth, enum comm_dir mode)
 
 static void ugeth_quiesce(struct ucc_geth_private *ugeth)
 {
-   /* Prevent any further xmits, plus detach the device. */
-   netif_device_detach(ugeth->ndev);
-
-   /* Wait for any current xmits to finish. */
-   netif_tx_disable(ugeth->ndev);
+   /* Prevent any further xmits */
+   netif_tx_stop_all_queues(ugeth->ndev);
 
/* Disable the interrupt to avoid NAPI rescheduling. */
disable_irq(ugeth->ug_info->uf_info.irq);
@@ -1565,7 +1563,10 @@ static void ugeth_activate(struct ucc_geth_private *ugeth)
 {
napi_enable(&ugeth->napi);
enable_irq(ugeth->ug_info->uf_info.irq);
-   netif_device_attach(ugeth->ndev);
+
+   /* allow to xmit again  */
+   netif_tx_wake_all_queues(ugeth->ndev);
+   __netdev_watchdog_up(ugeth->ndev);
 }
 
 /* Called every time the controller might need to be made
-- 
2.25.1



[PATCH] powerpc/32: disable KASAN with pages bigger than 16k

2020-05-28 Thread Christophe Leroy
The mapping of the early shadow area is implemented using a single
static page table whose entries all point to the same early shadow
page. The shadow area must therefore occupy full PGD entries.

The shadow area has a size of 128 Mbytes, starting at 0xf800.
With 4k pages, a PGD entry covers 4 Mbytes.
With 16k pages, a PGD entry covers 64 Mbytes.
With 64k pages, a PGD entry covers 256 Mbytes, which is too big.

Until we rework the early shadow mapping, disable KASAN when the
page size is too big.

Reported-by: kbuild test robot 
Fixes: 2edb16efc899 ("powerpc/32: Add KASAN support")
Cc: sta...@vger.kernel.org
Signed-off-by: Christophe Leroy 
---
 arch/powerpc/Kconfig | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 1dfa59126fcf..066b36bf9efa 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -169,8 +169,8 @@ config PPC
select HAVE_ARCH_AUDITSYSCALL
	select HAVE_ARCH_HUGE_VMAP		if PPC_BOOK3S_64 && PPC_RADIX_MMU
select HAVE_ARCH_JUMP_LABEL
-   select HAVE_ARCH_KASAN  if PPC32
-   select HAVE_ARCH_KASAN_VMALLOC  if PPC32
+   select HAVE_ARCH_KASAN  if PPC32 && PPC_PAGE_SHIFT <= 14
+   select HAVE_ARCH_KASAN_VMALLOC  if PPC32 && PPC_PAGE_SHIFT <= 14
select HAVE_ARCH_KGDB
select HAVE_ARCH_MMAP_RND_BITS
select HAVE_ARCH_MMAP_RND_COMPAT_BITS   if COMPAT
-- 
2.25.0



Re: [PATCH] powerpc/bpf: Enable bpf_probe_read{, str}() on powerpc again

2020-05-28 Thread Petr Mladek
On Thu 2020-05-28 11:03:43, Michael Ellerman wrote:
> Petr Mladek  writes:
> > The commit 0ebeea8ca8a4d1d453a ("bpf: Restrict bpf_probe_read{, str}() only
> > to archs where they work") caused the bpf_probe_read{, str}() functions
> > to no longer be available on architectures where the same logical address
> > might have different content in the kernel and user memory mappings. These
> > architectures should use the probe_read_{user,kernel}_str helpers instead.
> >
> > For backward compatibility, the problematic functions are still available
> > on architectures where the user and kernel address spaces are not
> > overlapping. This is indicated by CONFIG_ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE.
> >
> > At the moment, these backward compatible functions are enabled only
> > on x86_64, arm, and arm64. Let's do the same on powerpc, which has
> > non-overlapping address spaces as well.
> >
> > Signed-off-by: Petr Mladek 
> 
> This seems like it should have a Fixes: tag and go into v5.7?

Good point:

Fixes: 0ebeea8ca8a4d1d4 ("bpf: Restrict bpf_probe_read{, str}() only to archs where they work")

And yes, it should ideally go into v5.7 either directly or via stable.

Should I resend the patch with Fixes and
Cc: sta...@vger.kernel.org #v5.7 lines, please?

Best Regards,
Petr


Re: [RFC PATCH 1/4] powerpc/64s: Don't init FSCR_DSCR in __init_FSCR()

2020-05-28 Thread Nicholas Piggin
Excerpts from Michael Ellerman's message of May 28, 2020 12:58 am:
> __init_FSCR() was added originally in commit 2468dcf641e4 ("powerpc:
> Add support for context switching the TAR register") (Feb 2013), and
> only set FSCR_TAR.
> 
> At that point FSCR (Facility Status and Control Register) was not
> context switched, so the setting was permanent after boot.
> 
> Later we added initialisation of FSCR_DSCR to __init_FSCR(), in commit
> 54c9b2253d34 ("powerpc: Set DSCR bit in FSCR setup") (Mar 2013), again
> that was permanent after boot.
> 
> Then commit 2517617e0de6 ("powerpc: Fix context switch DSCR on
> POWER8") (Aug 2013) added a limited context switch of FSCR, just the
> FSCR_DSCR bit was context switched based on thread.dscr_inherit. That
> commit said "This clears the H/FSCR DSCR bit initially", but it
> didn't, it left the initialisation of FSCR_DSCR in __init_FSCR().
> However the initial context switch from init_task to pid 1 would clear
> FSCR_DSCR because thread.dscr_inherit was 0.
> 
> That commit also introduced the requirement that FSCR_DSCR be clear
> for user processes, so that we can take the facility unavailable
> interrupt in order to manage dscr_inherit.
> 
> Then in commit 152d523e6307 ("powerpc: Create context switch helpers
> save_sprs() and restore_sprs()") (Dec 2015) FSCR was added to
> thread_struct. However it still wasn't fully context switched, we just
> took the existing value and set FSCR_DSCR if the new thread had
> dscr_inherit set. FSCR was still initialised at boot to FSCR_DSCR |
> FSCR_TAR, but that value was not propagated into the thread_struct, so
> the initial context switch set FSCR_DSCR back to 0.
> 
> Finally commit b57bd2de8c6c ("powerpc: Improve FSCR init and context
> switching") (Jun 2016) added a full context switch of the FSCR, and
> added an initialisation of init_task.thread.fscr to FSCR_TAR |
> FSCR_EBB, but omitted FSCR_DSCR.
> 
> The end result is that swapper runs with FSCR_DSCR set because of the
> initialisation in __init_FSCR(), but no other processes do, they use
> the value from init_task.thread.fscr.
> 
> Having FSCR_DSCR set for swapper allows it to access SPR 3 from
> userspace, but swapper never runs userspace, so it has no useful
> effect. It's also confusing to have the value initialised in two
> places to two different values.
> 
> So remove FSCR_DSCR from __init_FSCR(), this at least gets us to the
> point where there's a single value of FSCR, even if it's still set in
> two places.

Thanks for sorting out this mess, everything looks good to me.

Thanks,
Nick


[PATCH] powerpc/book3s64/kvm: Fix secondary page table walk warning during migration

2020-05-28 Thread Aneesh Kumar K.V
This patch fixes the below warning, reported during migration.

 find_kvm_secondary_pte called with kvm mmu_lock not held
 CPU: 23 PID: 5341 Comm: qemu-system-ppc Tainted: GW 
5.7.0-rc5-kvm-00211-g9ccf10d6d088 #432
 NIP:  c00800fe848c LR: c00800fe8488 CTR: 
 REGS: c01e19f077e0 TRAP: 0700   Tainted: GW  
(5.7.0-rc5-kvm-00211-g9ccf10d6d088)
 MSR:  90029033   CR: 4422  XER: 2004
 CFAR: c012f5ac IRQMASK: 0
 GPR00: c00800fe8488 c01e19f07a70 c00800ffe200 0039
 GPR04: 0001 c01ffc8b4900 00018840 0007
 GPR08: 0003 0001 0007 0001
 GPR12: 2000 c01fff6d9400 00011f884678 7fff70b7
 GPR16: 7fff7137cb90 7fff7dcb4410 0001 
 GPR20: 0ffe  0001 
 GPR24: 8000 0001 c01e1f67e600 c01e1fd82410
 GPR28: 1000 c01e2e41 0fff 0ffe
 NIP [c00800fe848c] kvmppc_hv_get_dirty_log_radix+0x2e4/0x340 [kvm_hv]
 LR [c00800fe8488] kvmppc_hv_get_dirty_log_radix+0x2e0/0x340 [kvm_hv]
 Call Trace:
 [c01e19f07a70] [c00800fe8488] 
kvmppc_hv_get_dirty_log_radix+0x2e0/0x340 [kvm_hv] (unreliable)
 [c01e19f07b50] [c00800fd42e4] 
kvm_vm_ioctl_get_dirty_log_hv+0x33c/0x3c0 [kvm_hv]
 [c01e19f07be0] [c00800eea878] kvm_vm_ioctl_get_dirty_log+0x30/0x50 
[kvm]
 [c01e19f07c00] [c00800edc818] kvm_vm_ioctl+0x2b0/0xc00 [kvm]
 [c01e19f07d50] [c046e148] ksys_ioctl+0xf8/0x150
 [c01e19f07da0] [c046e1c8] sys_ioctl+0x28/0x80
 [c01e19f07dc0] [c003652c] system_call_exception+0x16c/0x240
 [c01e19f07e20] [c000d070] system_call_common+0xf0/0x278
 Instruction dump:
 7d3a512a 4200ffd0 7ffefb78 4bfffdc4 6000 3c82 e8848468 3c62
 e86384a8 38840010 4800673d e8410018 <0fe0> 4bfffdd4 6000 6000

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/kvm_book3s_64.h |  9 ++
 arch/powerpc/kvm/book3s_64_mmu_radix.c   | 35 
 2 files changed, 38 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h b/arch/powerpc/include/asm/kvm_book3s_64.h
index c58e64a0a74f..cd5bad08b8d3 100644
--- a/arch/powerpc/include/asm/kvm_book3s_64.h
+++ b/arch/powerpc/include/asm/kvm_book3s_64.h
@@ -635,6 +635,15 @@ extern void kvmhv_remove_nest_rmap_range(struct kvm *kvm,
unsigned long gpa, unsigned long hpa,
unsigned long nbytes);
 
+static inline pte_t *__find_kvm_secondary_pte(struct kvm *kvm, unsigned long ea,
+ unsigned *hshift)
+{
+   pte_t *pte;
+
+   pte = __find_linux_pte(kvm->arch.pgtable, ea, NULL, hshift);
+   return pte;
+}
+
 static inline pte_t *find_kvm_secondary_pte(struct kvm *kvm, unsigned long ea,
unsigned *hshift)
 {
diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c b/arch/powerpc/kvm/book3s_64_mmu_radix.c
index 271f1c3d8443..a328db6a5ef8 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_radix.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c
@@ -1040,7 +1040,7 @@ static int kvm_radix_test_clear_dirty(struct kvm *kvm,
 {
unsigned long gfn = memslot->base_gfn + pagenum;
unsigned long gpa = gfn << PAGE_SHIFT;
-   pte_t *ptep;
+   pte_t *ptep, pte;
unsigned int shift;
int ret = 0;
unsigned long old, *rmapp;
@@ -1048,12 +1048,35 @@ static int kvm_radix_test_clear_dirty(struct kvm *kvm,
if (kvm->arch.secure_guest & KVMPPC_SECURE_INIT_DONE)
return ret;
 
-   ptep = find_kvm_secondary_pte(kvm, gpa, &shift);
-   if (ptep && pte_present(*ptep) && pte_dirty(*ptep)) {
-   ret = 1;
-   if (shift)
-   ret = 1 << (shift - PAGE_SHIFT);
+   /*
+* For performance reasons we don't hold kvm->mmu_lock
+* while walking the partition scoped table.
+*/
+   ptep = __find_kvm_secondary_pte(kvm, gpa, &shift);
+   if (!ptep)
+   return 0;
+
+   pte = READ_ONCE(*ptep);
+   if (pte_present(pte) && pte_dirty(pte)) {
spin_lock(&kvm->mmu_lock);
+   /*
+* Recheck the pte again
+*/
+   if (pte_val(pte) != pte_val(*ptep)) {
+   /*
+* We have KVM_MEM_LOG_DIRTY_PAGES enabled. Hence we
+* can only find PAGE_SIZE pte entries here. We can
+* continue to use the pte addr returned by above
+* page table walk.
+*/
+   if (!pte_present(*ptep) || !pte_dirty(*ptep)) {
+   spin_unlock(&kvm->mmu_lock);

Re: [PATCH v4 09/22] powerpc/kvm/book3s: Add helper to walk partition scoped linux page table.

2020-05-28 Thread Paul Mackerras
On Thu, May 28, 2020 at 11:31:04AM +0530, Aneesh Kumar K.V wrote:
> On 5/28/20 7:13 AM, Paul Mackerras wrote:
> > On Tue, May 05, 2020 at 12:47:16PM +0530, Aneesh Kumar K.V wrote:
> > > The locking rules for walking partition scoped table is different from 
> > > process
> > > scoped table. Hence add a helper for secondary linux page table walk and 
> > > also
> > > add check whether we are holding the right locks.
> > 
> > This patch is causing new warnings to appear when testing migration,
> > like this:
> > 
> > [  142.090159] [ cut here ]
> > [  142.090160] find_kvm_secondary_pte called with kvm mmu_lock not held
> > [  142.090176] WARNING: CPU: 23 PID: 5341 at 
> > arch/powerpc/include/asm/kvm_book3s_64.h:644 
> > kvmppc_hv_get_dirty_log_radix+0x2e4/0x340 [kvm_hv]
> > [  142.090177] Modules linked in: xt_conntrack nf_conntrack nf_defrag_ipv6 
> > nf_defrag_ipv4 libcrc32c bridge stp llc ebtable_filter ebtables 
> > ip6table_filter ip6_tables bpfilter overlay binfmt_misc input_leds 
> > raid_class scsi_transport_sas sch_fq_codel sunrpc kvm_hv kvm
> > [  142.090188] CPU: 23 PID: 5341 Comm: qemu-system-ppc Tainted: GW  
> >5.7.0-rc5-kvm-00211-g9ccf10d6d088 #432
> > [  142.090189] NIP:  c00800fe848c LR: c00800fe8488 CTR: 
> > 
> > [  142.090190] REGS: c01e19f077e0 TRAP: 0700   Tainted: GW  
> > (5.7.0-rc5-kvm-00211-g9ccf10d6d088)
> > [  142.090191] MSR:  90029033   CR: 
> > 4422  XER: 2004
> > [  142.090196] CFAR: c012f5ac IRQMASK: 0
> > GPR00: c00800fe8488 c01e19f07a70 c00800ffe200 
> > 0039
> > GPR04: 0001 c01ffc8b4900 00018840 
> > 0007
> > GPR08: 0003 0001 0007 
> > 0001
> > GPR12: 2000 c01fff6d9400 00011f884678 
> > 7fff70b7
> > GPR16: 7fff7137cb90 7fff7dcb4410 0001 
> > 
> > GPR20: 0ffe  0001 
> > 
> > GPR24: 8000 0001 c01e1f67e600 
> > c01e1fd82410
> > GPR28: 1000 c01e2e41 0fff 
> > 0ffe
> > [  142.090217] NIP [c00800fe848c] 
> > kvmppc_hv_get_dirty_log_radix+0x2e4/0x340 [kvm_hv]
> > [  142.090223] LR [c00800fe8488] 
> > kvmppc_hv_get_dirty_log_radix+0x2e0/0x340 [kvm_hv]
> > [  142.090224] Call Trace:
> > [  142.090230] [c01e19f07a70] [c00800fe8488] 
> > kvmppc_hv_get_dirty_log_radix+0x2e0/0x340 [kvm_hv] (unreliable)
> > [  142.090236] [c01e19f07b50] [c00800fd42e4] 
> > kvm_vm_ioctl_get_dirty_log_hv+0x33c/0x3c0 [kvm_hv]
> > [  142.090292] [c01e19f07be0] [c00800eea878] 
> > kvm_vm_ioctl_get_dirty_log+0x30/0x50 [kvm]
> > [  142.090300] [c01e19f07c00] [c00800edc818] 
> > kvm_vm_ioctl+0x2b0/0xc00 [kvm]
> > [  142.090302] [c01e19f07d50] [c046e148] ksys_ioctl+0xf8/0x150
> > [  142.090305] [c01e19f07da0] [c046e1c8] sys_ioctl+0x28/0x80
> > [  142.090307] [c01e19f07dc0] [c003652c] 
> > system_call_exception+0x16c/0x240
> > [  142.090309] [c01e19f07e20] [c000d070] 
> > system_call_common+0xf0/0x278
> > [  142.090310] Instruction dump:
> > [  142.090312] 7d3a512a 4200ffd0 7ffefb78 4bfffdc4 6000 3c82 
> > e8848468 3c62
> > [  142.090317] e86384a8 38840010 4800673d e8410018 <0fe0> 4bfffdd4 
> > 6000 6000
> > [  142.090322] ---[ end trace 619d45057b6919e0 ]---
> > 
> > Indeed, kvm_radix_test_clear_dirty() tests the PTE dirty bit
> > locklessly, and only takes the kvm->mmu_lock once it finds a dirty
> > PTE.  I think that is important for performance, since on any given
> > scan of the guest real address space we may only find a small
> > proportion of the guest pages to be dirty.
> > 
> > Are you now relying on the kvm->mmu_lock to protect the existence of
> > the PTEs, or just their content?
> > 
> 
> The patch series should not change any rules w.r.t. the kvm partition scoped
> page table walk. We only added helpers to make it explicit that this is
> different from the regular linux page table walk. kvm->mmu_lock is what was
> used to protect the partition scoped table walk earlier.
> 
> In this specific case, what we probably need is an open-coded kvm partition
> scoped walk with a comment explaining why it is ok to walk the
> partition scoped table without taking kvm->mmu_lock.
> 
> What happens when a parallel invalidate happens to the QEMU address space?
> Since we are not holding kvm->mmu_lock, the mmu notifier will complete and
> we will go ahead with unmapping the partition scoped table.
> 
> Do we need a change like below?
> 
> @@ -1040,7 +1040,7 @@ static int kvm_radix_test_clear_dirty(struct kvm *kvm,
>  {
>   unsigned long gfn = memslot->base_gfn + pagenum;
>   u