[PATCH] cpuidle/powernv : Remove dead code block

2020-07-05 Thread Abhishek Goel
Commit 1961acad2f88559c2cdd2ef67c58c3627f1f6e54 removed the last usage of
the function "validate_dt_prop_sizes". This patch removes this now-unused
function.

Signed-off-by: Abhishek Goel 
---
 drivers/cpuidle/cpuidle-powernv.c | 14 --
 1 file changed, 14 deletions(-)

diff --git a/drivers/cpuidle/cpuidle-powernv.c 
b/drivers/cpuidle/cpuidle-powernv.c
index 1b299e801f74..addaa6e6718b 100644
--- a/drivers/cpuidle/cpuidle-powernv.c
+++ b/drivers/cpuidle/cpuidle-powernv.c
@@ -244,20 +244,6 @@ static inline void add_powernv_state(int index, const char 
*name,
stop_psscr_table[index].mask = psscr_mask;
 }
 
-/*
- * Returns 0 if prop1_len == prop2_len. Else returns -1
- */
-static inline int validate_dt_prop_sizes(const char *prop1, int prop1_len,
-const char *prop2, int prop2_len)
-{
-   if (prop1_len == prop2_len)
-   return 0;
-
-   pr_warn("cpuidle-powernv: array sizes don't match for %s and %s\n",
-   prop1, prop2);
-   return -1;
-}
-
 extern u32 pnv_get_supported_cpuidle_states(void);
 static int powernv_add_idle_states(void)
 {
-- 
2.17.1



Re: Using Firefox hangs system

2020-07-05 Thread Paul Menzel

Dear Nicholas,


Thank you for the quick response.


Am 06.07.20 um 02:41 schrieb Nicholas Piggin:

Excerpts from Paul Menzel's message of July 5, 2020 8:30 pm:



Am 05.07.20 um 11:22 schrieb Paul Menzel:


With an IBM S822LC with Ubuntu 20.04, after updating to Firefox 78.0,
using Firefox seems to hang the system. This happened with self-built
Linux 5.7-rc5+ and now with 5.8-rc3+.

(At least I believe the Firefox update is causing this.)

Logging in is impossible, and using Serial over LAN over IPMI shows the
messages below.


[ 2620.579187] watchdog: BUG: soft lockup - CPU#125 stuck for 22s!
[swapper/125:0]
[ 2620.579378] Modules linked in: tcp_diag inet_diag unix_diag
xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4
xt_tcpudp ip6table_mangle ip6table_nat iptable_mangle iptable_nat
nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables nfnetlink
ip6table_filter ip6_tables iptable_filter bridge stp llc overlay xfs
kvm_hv kvm joydev binfmt_misc uas usb_storage vmx_crypto ofpart
cmdlinepart bnx2x powernv_flash mtd mdio crct10dif_vpmsum at24
ibmpowernv ipmi_powernv ipmi_devintf powernv_rng ipmi_msghandler
opal_prd sch_fq_codel parport_pc nfsd ppdev lp auth_rpcgss nfs_acl
parport lockd grace sunrpc ip_tables x_tables autofs4 btrfs
blake2b_generic libcrc32c xor zstd_compress raid6_pq input_leds
mac_hid hid_generic ast drm_vram_helper drm_ttm_helper i2c_algo_bit
ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm
drm_panel_orientation_quirks ahci libahci usbhid hid crc32c_vpmsum
uio_pdrv_genirq uio
[ 2620.579537] CPU: 125 PID: 0 Comm: swapper/125 Tainted: G  D
W    L    5.8.0-rc3+ #1
[ 2620.579552] NIP:  c10dad38 LR: c10dad30 CTR:
c0237830
[ 2620.579568] REGS: c0ffcb8c7600 TRAP: 0900   Tainted: G  D
W    L (5.8.0-rc3+)
[ 2620.579582] MSR:  90009033   CR:
44004228  XER: 
[ 2620.579599] CFAR: c10dad44 IRQMASK: 0 [ 2620.579599] GPR00:
c023718c c0ffcb8c7890 c1f9a900  [
2620.579599] GPR04: c1fce438 0078 00010008c1f2
 [ 2620.579599] GPR08: 00ffd96a
8087  c1fd25e0 [ 2620.579599]
GPR12: 4400 c072f680 c1ea36d8
c0ffcb859800 [ 2620.579599] GPR16: c166c880
c16f8e00 000a c0ffcb859800 [ 2620.579599]
GPR20: 0100 c166c918 c1fd21e8
c0ffcb859800 [ 2620.579599] GPR24: 00ffd96a
c1d44b80 c1d53780 0008 [ 2620.579599]
GPR28: c1fd21e0 0001 
c1d44b80 [ 2620.579711] NIP [c10dad38]
_raw_spin_lock_irqsave+0x98/0x120
[ 2620.579724] LR [c10dad30] _raw_spin_lock_irqsave+0x90/0x120
[ 2620.579737] Call Trace:
[ 2620.579746] [c0ffcb8c7890] [c13c84a0]
ncsi_ops+0x209f50/0x2dc1d8 (unreliable)
[ 2620.579763] [c0ffcb8c78d0] [c023718c] rcu_core+0xfc/0x7a0
[ 2620.579777] [c0ffcb8c7970] [c10db81c]
__do_softirq+0x17c/0x534
[ 2620.579791] [c0ffcb8c7aa0] [c01786f4] irq_exit+0xd4/0x130
[ 2620.579805] [c0ffcb8c7ad0] [c0025eec]
timer_interrupt+0x13c/0x370
[ 2620.579821] [c0ffcb8c7b40] [c00165c0]
replay_soft_interrupts+0x320/0x3f0
[ 2620.579837] [c0ffcb8c7d30] [c00166d8]
arch_local_irq_restore+0x48/0xa0
[ 2620.579853] [c0ffcb8c7d50] [c0de2fe0]
cpuidle_enter_state+0x100/0x780


[snip]


I have to warm reset the system to get it working again.


I am unable to reproduce this with Ubuntu’s Linux


Okay, not sure what that would be from, looks like RCU perhaps. Anyway
if it comes up again, let us know.


Ah, it’s a different trace. I think it’s just an effect of the first 
error (as below), as some CPUs lock up. I wasn’t able to capture the 
start of the trace above. In the attachment for the hang *below* you can 
also see


[  664.705193] watchdog: BUG: soft lockup - CPU#134 stuck for 26s! 
[swapper/134:0]


after the first Oops.


With Linux 5.8-rc3+, I have now captured the beginning of the Linux messages.


[  572.253008] Oops: Exception in kernel mode, sig: 5 [#1]
[  572.253198] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA PowerNV
[  572.253232] Modules linked in: tcp_diag inet_diag unix_diag xt_CHECKSUM 
xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp ip6table_mangle 
ip6table_nat iptable_mangle iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 
nf_defrag_ipv4 nf_tables nfnetlink ip6table_filter ip6_tables iptable_filter 
bridge stp llc overlay xfs kvm_hv kvm binfmt_misc joydev uas usb_storage 
vmx_crypto bnx2x crct10dif_vpmsum ofpart cmdlinepart powernv_flash mtd mdio 
ibmpowernv at24 ipmi_powernv ipmi_devintf ipmi_msghandler opal_prd powernv_rng 
sch_fq_codel parport_pc ppdev lp nfsd parport auth_rpcgss nfs_acl lockd grace 
sunrpc ip_tables x_tables autofs4 btrfs blake2b_generic libcrc32c xor 
zstd_compress raid6_pq input_leds mac_hid 

[RFC v2 1/2] powerpc/powernv : Add support for pre-entry and post-exit of stop state using OPAL V4 OS services

2020-07-05 Thread Abhishek Goel
This patch provides a kernel framework for OPAL support of save/restore
of SPRs in the idle stop loop. OPAL support for stop states is needed to
selectively enable stop states or to quickly introduce a quirk in case
a buggy stop state is present.

We make an OPAL call from the kernel if firmware-stop-support for stop
states is present and enabled. All the quirks for pre-entry of a stop
state are handled inside OPAL. A call from OPAL is made into the kernel,
where we execute stop after saving the NVGPRs.
After waking up from the 0x100 vector in the kernel, we enter back into
OPAL. All the quirks in the post-exit path, if any, are then handled in
OPAL, from where we return successfully back to the kernel.

Signed-off-by: Abhishek Goel 
---
v1->v2 : Rebased the patch on Nick's Opal V4 OS patchset

 arch/powerpc/include/asm/opal-api.h|  4 +++-
 arch/powerpc/include/asm/opal.h|  1 +
 arch/powerpc/platforms/powernv/idle.c  | 12 
 arch/powerpc/platforms/powernv/opal-call.c |  1 +
 arch/powerpc/platforms/powernv/opal.c  | 15 +++
 5 files changed, 32 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/opal-api.h 
b/arch/powerpc/include/asm/opal-api.h
index 97c5e5423827..437b6937685d 100644
--- a/arch/powerpc/include/asm/opal-api.h
+++ b/arch/powerpc/include/asm/opal-api.h
@@ -219,7 +219,8 @@
 #define OPAL_REPORT_TRAP   183
 #define OPAL_FIND_VM_AREA  184
 #define OPAL_REGISTER_OS_OPS   185
-#define OPAL_LAST  185
+#define OPAL_CPU_IDLE  186
+#define OPAL_LAST  186
 
 #define QUIESCE_HOLD   1 /* Spin all calls at entry */
 #define QUIESCE_REJECT 2 /* Fail all calls with OPAL_BUSY */
@@ -1207,6 +1208,7 @@ struct opal_os_ops {
__be64  os_printf; /* void printf(int32_t level, const char *str) */
__be64  os_vm_map; /* int64_t os_vm_map(uint64_t ea, uint64_t pa, 
uint64_t flags) */
__be64  os_vm_unmap; /* void os_vm_unmap(uint64_t ea) */
+   __be64  os_idle_stop; /* void os_idle_stop(uint64_t srr1_addr, uint64_t 
psscr) */
 };
 
 #endif /* __ASSEMBLY__ */
diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
index 09985b7718b3..1774c056acb8 100644
--- a/arch/powerpc/include/asm/opal.h
+++ b/arch/powerpc/include/asm/opal.h
@@ -407,6 +407,7 @@ void opal_sensor_groups_init(void);
 
 int64_t opal_find_vm_area(uint64_t addr, struct opal_vm_area *opal_vm_area);
 int64_t opal_register_os_ops(struct opal_os_ops *ops, uint64_t size);
+int64_t opal_cpu_idle(uint64_t srr1_addr, uint64_t psscr);
 
 #endif /* __ASSEMBLY__ */
 
diff --git a/arch/powerpc/platforms/powernv/idle.c 
b/arch/powerpc/platforms/powernv/idle.c
index 78599bca66c2..3afd4293f729 100644
--- a/arch/powerpc/platforms/powernv/idle.c
+++ b/arch/powerpc/platforms/powernv/idle.c
@@ -805,6 +805,18 @@ static unsigned long power9_idle_stop(unsigned long psscr, 
bool mmu_on)
return srr1;
 }
 
+static unsigned long power9_firmware_idle_stop(unsigned long psscr, bool 
mmu_on)
+{
+   unsigned long srr1;
+   int rc;
+
+   rc = opal_cpu_idle(cpu_to_be64(&srr1), (uint64_t) psscr);
+
+   if (mmu_on)
+   mtmsr(MSR_KERNEL);
+   return srr1;
+}
+
 #ifdef CONFIG_HOTPLUG_CPU
 static unsigned long power9_offline_stop(unsigned long psscr)
 {
diff --git a/arch/powerpc/platforms/powernv/opal-call.c 
b/arch/powerpc/platforms/powernv/opal-call.c
index 11f419e76059..79076ca2de03 100644
--- a/arch/powerpc/platforms/powernv/opal-call.c
+++ b/arch/powerpc/platforms/powernv/opal-call.c
@@ -351,3 +351,4 @@ OPAL_CALL(opal_sym_to_addr, 
OPAL_SYM_TO_ADDR);
 OPAL_CALL(opal_report_trap,OPAL_REPORT_TRAP);
 OPAL_CALL(opal_find_vm_area,   OPAL_FIND_VM_AREA);
 OPAL_CALL(opal_register_os_ops,OPAL_REGISTER_OS_OPS);
+OPAL_CALL(opal_cpu_idle,   OPAL_CPU_IDLE);
diff --git a/arch/powerpc/platforms/powernv/opal.c 
b/arch/powerpc/platforms/powernv/opal.c
index 93b9afaf33b3..1fbf7065f918 100644
--- a/arch/powerpc/platforms/powernv/opal.c
+++ b/arch/powerpc/platforms/powernv/opal.c
@@ -1150,6 +1150,20 @@ static void os_vm_unmap(uint64_t ea)
local_flush_tlb_mm(mm);
 }
 
+int64_t os_idle_stop(uint64_t srr1_addr, uint64_t psscr)
+{
+   /*
+* For a lite state, which does not lose even GPRs, we call
+* idle_stop_noloss, while for all other states we call
+* idle_stop_mayloss. Saving and restoring any additional
+* SPRs, if required, is handled in OPAL. All the quirks are
+* also handled in OPAL.
+*/
+   if (!(psscr & (PSSCR_EC|PSSCR_ESL)))
+   return isa300_idle_stop_noloss(psscr);
+   return isa300_idle_stop_mayloss(psscr);
+}
+
 static int __init opal_init_mm(void)
 {
struct mm_struct *mm;
@@ -1231,6 +1245,7 @@ static int __init opal_init_early(void)
  

[PATCH v3 6/6] powerpc/qspinlock: optimised atomic_try_cmpxchg_lock that adds the lock hint

2020-07-05 Thread Nicholas Piggin
This brings the behaviour of the uncontended fast path back to
roughly equivalent to simple spinlocks -- a single atomic op with
lock hint.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/include/asm/atomic.h| 28 
 arch/powerpc/include/asm/qspinlock.h |  2 +-
 2 files changed, 29 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/atomic.h 
b/arch/powerpc/include/asm/atomic.h
index 498785ffc25f..f6a3d145ffb7 100644
--- a/arch/powerpc/include/asm/atomic.h
+++ b/arch/powerpc/include/asm/atomic.h
@@ -193,6 +193,34 @@ static __inline__ int atomic_dec_return_relaxed(atomic_t 
*v)
 #define atomic_xchg(v, new) (xchg(&((v)->counter), new))
 #define atomic_xchg_relaxed(v, new) xchg_relaxed(&((v)->counter), (new))
 
+/*
+ * Don't want to override the generic atomic_try_cmpxchg_acquire, because
+ * we add a lock hint to the lwarx, which may not be wanted for the
+ * _acquire case (and is not used by the other _acquire variants so it
+ * would be a surprise).
+ */
+static __always_inline bool
+atomic_try_cmpxchg_lock(atomic_t *v, int *old, int new)
+{
+   int r, o = *old;
+
+   __asm__ __volatile__ (
+"1:\t" PPC_LWARX(%0,0,%2,1) "  # atomic_try_cmpxchg_acquire\n"
+"  cmpw0,%0,%3 \n"
+"  bne-2f  \n"
+"  stwcx.  %4,0,%2 \n"
+"  bne-1b  \n"
+"\t"   PPC_ACQUIRE_BARRIER "   \n"
+"2:\n"
+   : "=" (r), "+m" (v->counter)
+   : "r" (>counter), "r" (o), "r" (new)
+   : "cr0", "memory");
+
+   if (unlikely(r != o))
+   *old = r;
+   return likely(r == o);
+}
+
 /**
  * atomic_fetch_add_unless - add unless the number is a given value
  * @v: pointer of type atomic_t
diff --git a/arch/powerpc/include/asm/qspinlock.h 
b/arch/powerpc/include/asm/qspinlock.h
index f5066f00a08c..b752d34517b3 100644
--- a/arch/powerpc/include/asm/qspinlock.h
+++ b/arch/powerpc/include/asm/qspinlock.h
@@ -37,7 +37,7 @@ static __always_inline void queued_spin_lock(struct qspinlock 
*lock)
 {
u32 val = 0;
 
-   if (likely(atomic_try_cmpxchg_acquire(&lock->val, &val, _Q_LOCKED_VAL)))
+   if (likely(atomic_try_cmpxchg_lock(&lock->val, &val, _Q_LOCKED_VAL)))
return;
 
queued_spin_lock_slowpath(lock, val);
-- 
2.23.0



[PATCH v3 5/6] powerpc/pseries: implement paravirt qspinlocks for SPLPAR

2020-07-05 Thread Nicholas Piggin
Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/include/asm/paravirt.h   | 28 
 arch/powerpc/include/asm/qspinlock.h  | 66 +++
 arch/powerpc/include/asm/qspinlock_paravirt.h |  7 ++
 arch/powerpc/platforms/pseries/Kconfig|  5 ++
 arch/powerpc/platforms/pseries/setup.c|  6 +-
 include/asm-generic/qspinlock.h   |  2 +
 6 files changed, 113 insertions(+), 1 deletion(-)
 create mode 100644 arch/powerpc/include/asm/qspinlock_paravirt.h

diff --git a/arch/powerpc/include/asm/paravirt.h 
b/arch/powerpc/include/asm/paravirt.h
index 7a8546660a63..f2d51f929cf5 100644
--- a/arch/powerpc/include/asm/paravirt.h
+++ b/arch/powerpc/include/asm/paravirt.h
@@ -29,6 +29,16 @@ static inline void yield_to_preempted(int cpu, u32 
yield_count)
 {
plpar_hcall_norets(H_CONFER, get_hard_smp_processor_id(cpu), 
yield_count);
 }
+
+static inline void prod_cpu(int cpu)
+{
+   plpar_hcall_norets(H_PROD, get_hard_smp_processor_id(cpu));
+}
+
+static inline void yield_to_any(void)
+{
+   plpar_hcall_norets(H_CONFER, -1, 0);
+}
 #else
 static inline bool is_shared_processor(void)
 {
@@ -45,6 +55,19 @@ static inline void yield_to_preempted(int cpu, u32 
yield_count)
 {
___bad_yield_to_preempted(); /* This would be a bug */
 }
+
+extern void ___bad_yield_to_any(void);
+static inline void yield_to_any(void)
+{
+   ___bad_yield_to_any(); /* This would be a bug */
+}
+
+extern void ___bad_prod_cpu(void);
+static inline void prod_cpu(int cpu)
+{
+   ___bad_prod_cpu(); /* This would be a bug */
+}
+
 #endif
 
 #define vcpu_is_preempted vcpu_is_preempted
@@ -57,5 +80,10 @@ static inline bool vcpu_is_preempted(int cpu)
return false;
 }
 
+static inline bool pv_is_native_spin_unlock(void)
+{
+ return !is_shared_processor();
+}
+
 #endif /* __KERNEL__ */
 #endif /* __ASM_PARAVIRT_H */
diff --git a/arch/powerpc/include/asm/qspinlock.h 
b/arch/powerpc/include/asm/qspinlock.h
index c49e33e24edd..f5066f00a08c 100644
--- a/arch/powerpc/include/asm/qspinlock.h
+++ b/arch/powerpc/include/asm/qspinlock.h
@@ -3,9 +3,47 @@
 #define _ASM_POWERPC_QSPINLOCK_H
 
 #include <asm-generic/qspinlock_types.h>
+#include <asm/paravirt.h>
 
 #define _Q_PENDING_LOOPS   (1 << 9) /* not tuned */
 
+#ifdef CONFIG_PARAVIRT_SPINLOCKS
+extern void native_queued_spin_lock_slowpath(struct qspinlock *lock, u32 val);
+extern void __pv_queued_spin_lock_slowpath(struct qspinlock *lock, u32 val);
+extern void __pv_queued_spin_unlock(struct qspinlock *lock);
+
+static __always_inline void queued_spin_lock_slowpath(struct qspinlock *lock, 
u32 val)
+{
+   if (!is_shared_processor())
+   native_queued_spin_lock_slowpath(lock, val);
+   else
+   __pv_queued_spin_lock_slowpath(lock, val);
+}
+
+#define queued_spin_unlock queued_spin_unlock
+static inline void queued_spin_unlock(struct qspinlock *lock)
+{
+   if (!is_shared_processor())
+   smp_store_release(&lock->locked, 0);
+   else
+   __pv_queued_spin_unlock(lock);
+}
+
+#else
+extern void queued_spin_lock_slowpath(struct qspinlock *lock, u32 val);
+#endif
+
+static __always_inline void queued_spin_lock(struct qspinlock *lock)
+{
+   u32 val = 0;
+
+   if (likely(atomic_try_cmpxchg_acquire(&lock->val, &val, _Q_LOCKED_VAL)))
+   return;
+
+   queued_spin_lock_slowpath(lock, val);
+}
+#define queued_spin_lock queued_spin_lock
+
 #define smp_mb__after_spinlock()   smp_mb()
 
 static __always_inline int queued_spin_is_locked(struct qspinlock *lock)
@@ -20,6 +58,34 @@ static __always_inline int queued_spin_is_locked(struct 
qspinlock *lock)
 }
 #define queued_spin_is_locked queued_spin_is_locked
 
+#ifdef CONFIG_PARAVIRT_SPINLOCKS
+#define SPIN_THRESHOLD (1<<15) /* not tuned */
+
+static __always_inline void pv_wait(u8 *ptr, u8 val)
+{
+   if (*ptr != val)
+   return;
+   yield_to_any();
+   /*
+* We could pass in a CPU here if waiting in the queue and yield to
+* the previous CPU in the queue.
+*/
+}
+
+static __always_inline void pv_kick(int cpu)
+{
+   prod_cpu(cpu);
+}
+
+extern void __pv_init_lock_hash(void);
+
+static inline void pv_spinlocks_init(void)
+{
+   __pv_init_lock_hash();
+}
+
+#endif
+
 #include <asm-generic/qspinlock.h>
 
 #endif /* _ASM_POWERPC_QSPINLOCK_H */
diff --git a/arch/powerpc/include/asm/qspinlock_paravirt.h 
b/arch/powerpc/include/asm/qspinlock_paravirt.h
new file mode 100644
index ..750d1b5e0202
--- /dev/null
+++ b/arch/powerpc/include/asm/qspinlock_paravirt.h
@@ -0,0 +1,7 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+#ifndef __ASM_QSPINLOCK_PARAVIRT_H
+#define __ASM_QSPINLOCK_PARAVIRT_H
+
+EXPORT_SYMBOL(__pv_queued_spin_unlock);
+
+#endif /* __ASM_QSPINLOCK_PARAVIRT_H */
diff --git a/arch/powerpc/platforms/pseries/Kconfig 
b/arch/powerpc/platforms/pseries/Kconfig
index 24c18362e5ea..756e727b383f 100644
--- a/arch/powerpc/platforms/pseries/Kconfig
+++ b/arch/powerpc/platforms/pseries/Kconfig
@@ -25,9 +25,14 @@ config 

[PATCH v3 4/6] powerpc/64s: implement queued spinlocks and rwlocks

2020-07-05 Thread Nicholas Piggin
These have shown significantly improved performance and fairness when
spinlock contention is moderate to high on very large systems.

 [ Numbers hopefully forthcoming after more testing, but initial
   results look good ]

Thanks to the fast path, single threaded performance is not noticeably
hurt.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/Kconfig  | 13 
 arch/powerpc/include/asm/Kbuild   |  2 ++
 arch/powerpc/include/asm/qspinlock.h  | 25 +++
 arch/powerpc/include/asm/spinlock.h   |  5 +
 arch/powerpc/include/asm/spinlock_types.h |  5 +
 arch/powerpc/lib/Makefile |  3 +++
 include/asm-generic/qspinlock.h   |  2 ++
 7 files changed, 55 insertions(+)
 create mode 100644 arch/powerpc/include/asm/qspinlock.h

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 24ac85c868db..17663ea57697 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -146,6 +146,8 @@ config PPC
select ARCH_SUPPORTS_ATOMIC_RMW
select ARCH_USE_BUILTIN_BSWAP
select ARCH_USE_CMPXCHG_LOCKREF if PPC64
+   select ARCH_USE_QUEUED_RWLOCKS  if PPC_QUEUED_SPINLOCKS
+   select ARCH_USE_QUEUED_SPINLOCKSif PPC_QUEUED_SPINLOCKS
select ARCH_WANT_IPC_PARSE_VERSION
select ARCH_WEAK_RELEASE_ACQUIRE
select BINFMT_ELF
@@ -492,6 +494,17 @@ config HOTPLUG_CPU
 
  Say N if you are unsure.
 
+config PPC_QUEUED_SPINLOCKS
+   bool "Queued spinlocks"
+   depends on SMP
+   default "y" if PPC_BOOK3S_64
+   help
+ Say Y here to use queued spinlocks which are more complex
+ but give better scalability and fairness on large SMP and NUMA
+ systems.
+
+ If unsure, say "Y" if you have lots of cores, otherwise "N".
+
 config ARCH_CPU_PROBE_RELEASE
def_bool y
depends on HOTPLUG_CPU
diff --git a/arch/powerpc/include/asm/Kbuild b/arch/powerpc/include/asm/Kbuild
index dadbcf3a0b1e..1dd8b6adff5e 100644
--- a/arch/powerpc/include/asm/Kbuild
+++ b/arch/powerpc/include/asm/Kbuild
@@ -6,5 +6,7 @@ generated-y += syscall_table_spu.h
 generic-y += export.h
 generic-y += local64.h
 generic-y += mcs_spinlock.h
+generic-y += qrwlock.h
+generic-y += qspinlock.h
 generic-y += vtime.h
 generic-y += early_ioremap.h
diff --git a/arch/powerpc/include/asm/qspinlock.h 
b/arch/powerpc/include/asm/qspinlock.h
new file mode 100644
index ..c49e33e24edd
--- /dev/null
+++ b/arch/powerpc/include/asm/qspinlock.h
@@ -0,0 +1,25 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_POWERPC_QSPINLOCK_H
+#define _ASM_POWERPC_QSPINLOCK_H
+
+#include <asm-generic/qspinlock_types.h>
+
+#define _Q_PENDING_LOOPS   (1 << 9) /* not tuned */
+
+#define smp_mb__after_spinlock()   smp_mb()
+
+static __always_inline int queued_spin_is_locked(struct qspinlock *lock)
+{
+   /*
+* This barrier was added to simple spinlocks by commit 51d7d5205d338,
+* but it should now be possible to remove it, as arm64 has done with
+* commit c6f5d02b6a0f.
+*/
+   smp_mb();
+   return atomic_read(&lock->val);
+}
+#define queued_spin_is_locked queued_spin_is_locked
+
+#include <asm-generic/qspinlock.h>
+
+#endif /* _ASM_POWERPC_QSPINLOCK_H */
diff --git a/arch/powerpc/include/asm/spinlock.h 
b/arch/powerpc/include/asm/spinlock.h
index 21357fe05fe0..434615f1d761 100644
--- a/arch/powerpc/include/asm/spinlock.h
+++ b/arch/powerpc/include/asm/spinlock.h
@@ -3,7 +3,12 @@
 #define __ASM_SPINLOCK_H
 #ifdef __KERNEL__
 
+#ifdef CONFIG_PPC_QUEUED_SPINLOCKS
+#include <asm/qspinlock.h>
+#include <asm/qrwlock.h>
+#else
 #include <asm/simple_spinlock.h>
+#endif
 
 #endif /* __KERNEL__ */
 #endif /* __ASM_SPINLOCK_H */
diff --git a/arch/powerpc/include/asm/spinlock_types.h 
b/arch/powerpc/include/asm/spinlock_types.h
index 3906f52dae65..c5d742f18021 100644
--- a/arch/powerpc/include/asm/spinlock_types.h
+++ b/arch/powerpc/include/asm/spinlock_types.h
@@ -6,6 +6,11 @@
 # error "please don't include this file directly"
 #endif
 
+#ifdef CONFIG_PPC_QUEUED_SPINLOCKS
+#include <asm-generic/qspinlock_types.h>
+#include <asm-generic/qrwlock_types.h>
+#else
 #include <asm/simple_spinlock_types.h>
+#endif
 
 #endif
diff --git a/arch/powerpc/lib/Makefile b/arch/powerpc/lib/Makefile
index 5e994cda8e40..d66a645503eb 100644
--- a/arch/powerpc/lib/Makefile
+++ b/arch/powerpc/lib/Makefile
@@ -41,7 +41,10 @@ obj-$(CONFIG_PPC_BOOK3S_64) += copyuser_power7.o 
copypage_power7.o \
 obj64-y+= copypage_64.o copyuser_64.o mem_64.o hweight_64.o \
   memcpy_64.o memcpy_mcsafe_64.o
 
+ifndef CONFIG_PPC_QUEUED_SPINLOCKS
 obj64-$(CONFIG_SMP)+= locks.o
+endif
+
 obj64-$(CONFIG_ALTIVEC)+= vmx-helper.o
 obj64-$(CONFIG_KPROBES_SANITY_TEST)+= test_emulate_step.o \
   test_emulate_step_exec_instr.o
diff --git a/include/asm-generic/qspinlock.h b/include/asm-generic/qspinlock.h
index fde943d180e0..fb0a814d4395 100644
--- a/include/asm-generic/qspinlock.h
+++ b/include/asm-generic/qspinlock.h
@@ -12,6 +12,7 @@
 
 #include 
 
+#ifndef queued_spin_is_locked
 /**
  * 

[PATCH v3 3/6] powerpc: move spinlock implementation to simple_spinlock

2020-07-05 Thread Nicholas Piggin
To prepare for queued spinlocks. This is a simple rename, except for
updating the preprocessor guard name and a file reference.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/include/asm/simple_spinlock.h| 292 ++
 .../include/asm/simple_spinlock_types.h   |  21 ++
 arch/powerpc/include/asm/spinlock.h   | 285 +
 arch/powerpc/include/asm/spinlock_types.h |  12 +-
 4 files changed, 315 insertions(+), 295 deletions(-)
 create mode 100644 arch/powerpc/include/asm/simple_spinlock.h
 create mode 100644 arch/powerpc/include/asm/simple_spinlock_types.h

diff --git a/arch/powerpc/include/asm/simple_spinlock.h 
b/arch/powerpc/include/asm/simple_spinlock.h
new file mode 100644
index ..e048c041c4a9
--- /dev/null
+++ b/arch/powerpc/include/asm/simple_spinlock.h
@@ -0,0 +1,292 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+#ifndef __ASM_SIMPLE_SPINLOCK_H
+#define __ASM_SIMPLE_SPINLOCK_H
+#ifdef __KERNEL__
+
+/*
+ * Simple spin lock operations.  
+ *
+ * Copyright (C) 2001-2004 Paul Mackerras , IBM
+ * Copyright (C) 2001 Anton Blanchard , IBM
+ * Copyright (C) 2002 Dave Engebretsen , IBM
+ * Rework to support virtual processors
+ *
+ * Type of int is used as a full 64b word is not necessary.
+ *
+ * (the type definitions are in asm/simple_spinlock_types.h)
+ */
+#include <linux/irqflags.h>
+#include <asm/paravirt.h>
+#ifdef CONFIG_PPC64
+#include <asm/paca.h>
+#endif
+#include <asm/synch.h>
+#include <asm/ppc-opcode.h>
+
+#ifdef CONFIG_PPC64
+/* use 0x800000yy when locked, where yy == CPU number */
+#ifdef __BIG_ENDIAN__
+#define LOCK_TOKEN (*(u32 *)(&get_paca()->lock_token))
+#else
+#define LOCK_TOKEN (*(u32 *)(&get_paca()->paca_index))
+#endif
+#else
+#define LOCK_TOKEN 1
+#endif
+
+static __always_inline int arch_spin_value_unlocked(arch_spinlock_t lock)
+{
+   return lock.slock == 0;
+}
+
+static inline int arch_spin_is_locked(arch_spinlock_t *lock)
+{
+   smp_mb();
+   return !arch_spin_value_unlocked(*lock);
+}
+
+/*
+ * This returns the old value in the lock, so we succeeded
+ * in getting the lock if the return value is 0.
+ */
+static inline unsigned long __arch_spin_trylock(arch_spinlock_t *lock)
+{
+   unsigned long tmp, token;
+
+   token = LOCK_TOKEN;
+   __asm__ __volatile__(
+"1:" PPC_LWARX(%0,0,%2,1) "\n\
+   cmpwi   0,%0,0\n\
+   bne-2f\n\
+   stwcx.  %1,0,%2\n\
+   bne-1b\n"
+   PPC_ACQUIRE_BARRIER
+"2:"
+   : "=" (tmp)
+   : "r" (token), "r" (>slock)
+   : "cr0", "memory");
+
+   return tmp;
+}
+
+static inline int arch_spin_trylock(arch_spinlock_t *lock)
+{
+   return __arch_spin_trylock(lock) == 0;
+}
+
+/*
+ * On a system with shared processors (that is, where a physical
+ * processor is multiplexed between several virtual processors),
+ * there is no point spinning on a lock if the holder of the lock
+ * isn't currently scheduled on a physical processor.  Instead
+ * we detect this situation and ask the hypervisor to give the
+ * rest of our timeslice to the lock holder.
+ *
+ * So that we can tell which virtual processor is holding a lock,
+ * we put 0x8000 | smp_processor_id() in the lock when it is
+ * held.  Conveniently, we have a word in the paca that holds this
+ * value.
+ */
+
+#if defined(CONFIG_PPC_SPLPAR)
+/* We only yield to the hypervisor if we are in shared processor mode */
+void splpar_spin_yield(arch_spinlock_t *lock);
+void splpar_rw_yield(arch_rwlock_t *lock);
+#else /* SPLPAR */
+static inline void splpar_spin_yield(arch_spinlock_t *lock) {};
+static inline void splpar_rw_yield(arch_rwlock_t *lock) {};
+#endif
+
+static inline void spin_yield(arch_spinlock_t *lock)
+{
+   if (is_shared_processor())
+   splpar_spin_yield(lock);
+   else
+   barrier();
+}
+
+static inline void rw_yield(arch_rwlock_t *lock)
+{
+   if (is_shared_processor())
+   splpar_rw_yield(lock);
+   else
+   barrier();
+}
+
+static inline void arch_spin_lock(arch_spinlock_t *lock)
+{
+   while (1) {
+   if (likely(__arch_spin_trylock(lock) == 0))
+   break;
+   do {
+   HMT_low();
+   if (is_shared_processor())
+   splpar_spin_yield(lock);
+   } while (unlikely(lock->slock != 0));
+   HMT_medium();
+   }
+}
+
+static inline
+void arch_spin_lock_flags(arch_spinlock_t *lock, unsigned long flags)
+{
+   unsigned long flags_dis;
+
+   while (1) {
+   if (likely(__arch_spin_trylock(lock) == 0))
+   break;
+   local_save_flags(flags_dis);
+   local_irq_restore(flags);
+   do {
+   HMT_low();
+   if (is_shared_processor())
+   splpar_spin_yield(lock);
+   } while (unlikely(lock->slock != 0));
+   HMT_medium();
+   

[PATCH v3 2/6] powerpc/pseries: move some PAPR paravirt functions to their own file

2020-07-05 Thread Nicholas Piggin
Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/include/asm/paravirt.h | 61 +
 arch/powerpc/include/asm/spinlock.h | 24 +---
 arch/powerpc/lib/locks.c| 12 +++---
 3 files changed, 68 insertions(+), 29 deletions(-)
 create mode 100644 arch/powerpc/include/asm/paravirt.h

diff --git a/arch/powerpc/include/asm/paravirt.h 
b/arch/powerpc/include/asm/paravirt.h
new file mode 100644
index ..7a8546660a63
--- /dev/null
+++ b/arch/powerpc/include/asm/paravirt.h
@@ -0,0 +1,61 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+#ifndef __ASM_PARAVIRT_H
+#define __ASM_PARAVIRT_H
+#ifdef __KERNEL__
+
+#include <linux/jump_label.h>
+#include <asm/smp.h>
+#ifdef CONFIG_PPC64
+#include <asm/paca.h>
+#include <asm/hvcall.h>
+#endif
+
+#ifdef CONFIG_PPC_SPLPAR
+DECLARE_STATIC_KEY_FALSE(shared_processor);
+
+static inline bool is_shared_processor(void)
+{
+   return static_branch_unlikely(&shared_processor);
+}
+
+/* If bit 0 is set, the cpu has been preempted */
+static inline u32 yield_count_of(int cpu)
+{
+   __be32 yield_count = READ_ONCE(lppaca_of(cpu).yield_count);
+   return be32_to_cpu(yield_count);
+}
+
+static inline void yield_to_preempted(int cpu, u32 yield_count)
+{
+   plpar_hcall_norets(H_CONFER, get_hard_smp_processor_id(cpu), 
yield_count);
+}
+#else
+static inline bool is_shared_processor(void)
+{
+   return false;
+}
+
+static inline u32 yield_count_of(int cpu)
+{
+   return 0;
+}
+
+extern void ___bad_yield_to_preempted(void);
+static inline void yield_to_preempted(int cpu, u32 yield_count)
+{
+   ___bad_yield_to_preempted(); /* This would be a bug */
+}
+#endif
+
+#define vcpu_is_preempted vcpu_is_preempted
+static inline bool vcpu_is_preempted(int cpu)
+{
+   if (!is_shared_processor())
+   return false;
+   if (yield_count_of(cpu) & 1)
+   return true;
+   return false;
+}
+
+#endif /* __KERNEL__ */
+#endif /* __ASM_PARAVIRT_H */
diff --git a/arch/powerpc/include/asm/spinlock.h 
b/arch/powerpc/include/asm/spinlock.h
index 2d620896cdae..79be9bb10bbb 100644
--- a/arch/powerpc/include/asm/spinlock.h
+++ b/arch/powerpc/include/asm/spinlock.h
@@ -15,11 +15,10 @@
  *
  * (the type definitions are in asm/spinlock_types.h)
  */
-#include <linux/jump_label.h>
 #include <linux/irqflags.h>
+#include <asm/paravirt.h>
 #ifdef CONFIG_PPC64
 #include <asm/paca.h>
-#include <asm/hvcall.h>
 #endif
 #include <asm/synch.h>
 #include <asm/ppc-opcode.h>
@@ -35,18 +34,6 @@
 #define LOCK_TOKEN 1
 #endif
 
-#ifdef CONFIG_PPC_PSERIES
-DECLARE_STATIC_KEY_FALSE(shared_processor);
-
-#define vcpu_is_preempted vcpu_is_preempted
-static inline bool vcpu_is_preempted(int cpu)
-{
-   if (!static_branch_unlikely(&shared_processor))
-   return false;
-   return !!(be32_to_cpu(lppaca_of(cpu).yield_count) & 1);
-}
-#endif
-
 static __always_inline int arch_spin_value_unlocked(arch_spinlock_t lock)
 {
return lock.slock == 0;
@@ -110,15 +97,6 @@ static inline void splpar_spin_yield(arch_spinlock_t *lock) 
{};
 static inline void splpar_rw_yield(arch_rwlock_t *lock) {};
 #endif
 
-static inline bool is_shared_processor(void)
-{
-#ifdef CONFIG_PPC_SPLPAR
-   return static_branch_unlikely(&shared_processor);
-#else
-   return false;
-#endif
-}
-
 static inline void spin_yield(arch_spinlock_t *lock)
 {
if (is_shared_processor())
diff --git a/arch/powerpc/lib/locks.c b/arch/powerpc/lib/locks.c
index 6440d5943c00..04165b7a163f 100644
--- a/arch/powerpc/lib/locks.c
+++ b/arch/powerpc/lib/locks.c
@@ -27,14 +27,14 @@ void splpar_spin_yield(arch_spinlock_t *lock)
return;
holder_cpu = lock_value & 0x;
BUG_ON(holder_cpu >= NR_CPUS);
-   yield_count = be32_to_cpu(lppaca_of(holder_cpu).yield_count);
+
+   yield_count = yield_count_of(holder_cpu);
if ((yield_count & 1) == 0)
return; /* virtual cpu is currently running */
rmb();
if (lock->slock != lock_value)
return; /* something has changed */
-   plpar_hcall_norets(H_CONFER,
-   get_hard_smp_processor_id(holder_cpu), yield_count);
+   yield_to_preempted(holder_cpu, yield_count);
 }
 EXPORT_SYMBOL_GPL(splpar_spin_yield);
 
@@ -53,13 +53,13 @@ void splpar_rw_yield(arch_rwlock_t *rw)
return; /* no write lock at present */
holder_cpu = lock_value & 0x;
BUG_ON(holder_cpu >= NR_CPUS);
-   yield_count = be32_to_cpu(lppaca_of(holder_cpu).yield_count);
+
+   yield_count = yield_count_of(holder_cpu);
if ((yield_count & 1) == 0)
return; /* virtual cpu is currently running */
rmb();
if (rw->lock != lock_value)
return; /* something has changed */
-   plpar_hcall_norets(H_CONFER,
-   get_hard_smp_processor_id(holder_cpu), yield_count);
+   yield_to_preempted(holder_cpu, yield_count);
 }
 #endif
-- 
2.23.0



[PATCH v3 1/6] powerpc/powernv: must include hvcall.h to get PAPR defines

2020-07-05 Thread Nicholas Piggin
An include goes away in future patches, which would break compilation
without this change.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/platforms/powernv/pci-ioda-tce.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda-tce.c 
b/arch/powerpc/platforms/powernv/pci-ioda-tce.c
index f923359d8afc..8eba6ece7808 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda-tce.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda-tce.c
@@ -15,6 +15,7 @@
 
 #include 
 #include 
+#include <asm/hvcall.h> /* share error returns with PAPR */
 #include "pci.h"
 
 unsigned long pnv_ioda_parse_tce_sizes(struct pnv_phb *phb)
-- 
2.23.0



[PATCH v3 0/6] powerpc: queued spinlocks and rwlocks

2020-07-05 Thread Nicholas Piggin
v3 is updated to use __pv_queued_spin_unlock, noticed by Waiman (thank you).

Thanks,
Nick

Nicholas Piggin (6):
  powerpc/powernv: must include hvcall.h to get PAPR defines
  powerpc/pseries: move some PAPR paravirt functions to their own file
  powerpc: move spinlock implementation to simple_spinlock
  powerpc/64s: implement queued spinlocks and rwlocks
  powerpc/pseries: implement paravirt qspinlocks for SPLPAR
  powerpc/qspinlock: optimised atomic_try_cmpxchg_lock that adds the
lock hint

 arch/powerpc/Kconfig  |  13 +
 arch/powerpc/include/asm/Kbuild   |   2 +
 arch/powerpc/include/asm/atomic.h |  28 ++
 arch/powerpc/include/asm/paravirt.h   |  89 +
 arch/powerpc/include/asm/qspinlock.h  |  91 ++
 arch/powerpc/include/asm/qspinlock_paravirt.h |   7 +
 arch/powerpc/include/asm/simple_spinlock.h| 292 +
 .../include/asm/simple_spinlock_types.h   |  21 ++
 arch/powerpc/include/asm/spinlock.h   | 308 +-
 arch/powerpc/include/asm/spinlock_types.h |  17 +-
 arch/powerpc/lib/Makefile |   3 +
 arch/powerpc/lib/locks.c  |  12 +-
 arch/powerpc/platforms/powernv/pci-ioda-tce.c |   1 +
 arch/powerpc/platforms/pseries/Kconfig|   5 +
 arch/powerpc/platforms/pseries/setup.c|   6 +-
 include/asm-generic/qspinlock.h   |   4 +
 16 files changed, 577 insertions(+), 322 deletions(-)
 create mode 100644 arch/powerpc/include/asm/paravirt.h
 create mode 100644 arch/powerpc/include/asm/qspinlock.h
 create mode 100644 arch/powerpc/include/asm/qspinlock_paravirt.h
 create mode 100644 arch/powerpc/include/asm/simple_spinlock.h
 create mode 100644 arch/powerpc/include/asm/simple_spinlock_types.h

-- 
2.23.0



Re: [PATCH v3 1/3] powerpc/mm: Enable radix GTSE only if supported.

2020-07-05 Thread Bharata B Rao
On Mon, Jul 06, 2020 at 07:19:02AM +0530, Santosh Sivaraj wrote:
> 
> Hi Bharata,
> 
> Bharata B Rao  writes:
> 
> > Make GTSE an MMU feature and enable it by default for radix.
> > However for guest, conditionally enable it if hypervisor supports
> > it via OV5 vector. Let prom_init ask for radix GTSE only if the
> > support exists.
> >
> > Having GTSE as an MMU feature will make it easy to enable radix
> > without GTSE. Currently radix assumes GTSE is enabled by default.
> >
> > Signed-off-by: Bharata B Rao 
> > Reviewed-by: Aneesh Kumar K.V 
> > ---
> >  arch/powerpc/include/asm/mmu.h|  4 
> >  arch/powerpc/kernel/dt_cpu_ftrs.c |  1 +
> >  arch/powerpc/kernel/prom_init.c   | 13 -
> >  arch/powerpc/mm/init_64.c |  5 -
> >  4 files changed, 17 insertions(+), 6 deletions(-)
> >
> > diff --git a/arch/powerpc/include/asm/mmu.h b/arch/powerpc/include/asm/mmu.h
> > index f4ac25d4df05..884d51995934 100644
> > --- a/arch/powerpc/include/asm/mmu.h
> > +++ b/arch/powerpc/include/asm/mmu.h
> > @@ -28,6 +28,9 @@
> >   * Individual features below.
> >   */
> >  
> > +/* Guest Translation Shootdown Enable */
> > +#define MMU_FTR_GTSE   ASM_CONST(0x1000)
> > +
> >  /*
> >   * Support for 68 bit VA space. We added that from ISA 2.05
> >   */
> > @@ -173,6 +176,7 @@ enum {
> >  #endif
> >  #ifdef CONFIG_PPC_RADIX_MMU
> > MMU_FTR_TYPE_RADIX |
> > +   MMU_FTR_GTSE |
> >  #ifdef CONFIG_PPC_KUAP
> > MMU_FTR_RADIX_KUAP |
> >  #endif /* CONFIG_PPC_KUAP */
> > diff --git a/arch/powerpc/kernel/dt_cpu_ftrs.c 
> > b/arch/powerpc/kernel/dt_cpu_ftrs.c
> > index a0edeb391e3e..ac650c233cd9 100644
> > --- a/arch/powerpc/kernel/dt_cpu_ftrs.c
> > +++ b/arch/powerpc/kernel/dt_cpu_ftrs.c
> > @@ -336,6 +336,7 @@ static int __init feat_enable_mmu_radix(struct 
> > dt_cpu_feature *f)
> >  #ifdef CONFIG_PPC_RADIX_MMU
> > cur_cpu_spec->mmu_features |= MMU_FTR_TYPE_RADIX;
> > cur_cpu_spec->mmu_features |= MMU_FTRS_HASH_BASE;
> > +   cur_cpu_spec->mmu_features |= MMU_FTR_GTSE;
> > cur_cpu_spec->cpu_user_features |= PPC_FEATURE_HAS_MMU;
> >  
> > return 1;
> > diff --git a/arch/powerpc/kernel/prom_init.c 
> > b/arch/powerpc/kernel/prom_init.c
> > index 90c604d00b7d..cbc605cfdec0 100644
> > --- a/arch/powerpc/kernel/prom_init.c
> > +++ b/arch/powerpc/kernel/prom_init.c
> > @@ -1336,12 +1336,15 @@ static void __init prom_check_platform_support(void)
> > }
> > }
> >  
> > -   if (supported.radix_mmu && supported.radix_gtse &&
> > -   IS_ENABLED(CONFIG_PPC_RADIX_MMU)) {
> > -   /* Radix preferred - but we require GTSE for now */
> > -   prom_debug("Asking for radix with GTSE\n");
> > +   if (supported.radix_mmu && IS_ENABLED(CONFIG_PPC_RADIX_MMU)) {
> > +   /* Radix preferred - Check if GTSE is also supported */
> > +   prom_debug("Asking for radix\n");
> > ibm_architecture_vec.vec5.mmu = OV5_FEAT(OV5_MMU_RADIX);
> > -   ibm_architecture_vec.vec5.radix_ext = OV5_FEAT(OV5_RADIX_GTSE);
> > +   if (supported.radix_gtse)
> > +   ibm_architecture_vec.vec5.radix_ext =
> > +   OV5_FEAT(OV5_RADIX_GTSE);
> > +   else
> > +   prom_debug("Radix GTSE isn't supported\n");
> > } else if (supported.hash_mmu) {
> > /* Default to hash mmu (if we can) */
> > prom_debug("Asking for hash\n");
> > diff --git a/arch/powerpc/mm/init_64.c b/arch/powerpc/mm/init_64.c
> > index bc73abf0bc25..152aa0200cef 100644
> > --- a/arch/powerpc/mm/init_64.c
> > +++ b/arch/powerpc/mm/init_64.c
> > @@ -407,12 +407,15 @@ static void __init early_check_vec5(void)
> > if (!(vec5[OV5_INDX(OV5_RADIX_GTSE)] &
> > OV5_FEAT(OV5_RADIX_GTSE))) {
> > pr_warn("WARNING: Hypervisor doesn't support RADIX with 
> > GTSE\n");
> > -   }
> > +   cur_cpu_spec->mmu_features &= ~MMU_FTR_GTSE;
> > +   } else
> > +   cur_cpu_spec->mmu_features |= MMU_FTR_GTSE;
> 
> The GTSE flag is set by default in feat_enable_mmu_radix(), should it
> be set again here?

Strictly speaking no, but it makes things a bit more explicit and also
follows what the related feature does below:

> > /* Do radix anyway - the hypervisor said we had to */
> > cur_cpu_spec->mmu_features |= MMU_FTR_TYPE_RADIX;

Regards,
Bharata.


Re: [PATCH v3 1/2] powerpc/perf/hv-24x7: Add cpu hotplug support

2020-07-05 Thread Michael Ellerman
Kajol Jain  writes:
> This patch adds cpu hotplug functions to the hv_24x7 pmu.
> A new cpuhp_state "CPUHP_AP_PERF_POWERPC_HV_24x7_ONLINE" enum
> is added.
>
> The online callback function updates the cpumask only if it's
> empty, as the primary intention of adding hotplug support
> is to designate a CPU to make the HCALL to collect the
> counter data.
>
> The offline function tests and clears the corresponding cpu in the
> cpumask and updates the cpumask to any other active cpu.
>
> Signed-off-by: Kajol Jain 
> Reviewed-by: Gautham R. Shenoy 
> ---
>  arch/powerpc/perf/hv-24x7.c | 45 +
>  include/linux/cpuhotplug.h  |  1 +
>  2 files changed, 46 insertions(+)
>
> diff --git a/arch/powerpc/perf/hv-24x7.c b/arch/powerpc/perf/hv-24x7.c
> index db213eb7cb02..ce4739e2b407 100644
> --- a/arch/powerpc/perf/hv-24x7.c
> +++ b/arch/powerpc/perf/hv-24x7.c
> @@ -31,6 +31,8 @@ static int interface_version;
>  /* Whether we have to aggregate result data for some domains. */
>  static bool aggregate_result_elements;
>  
> +static cpumask_t hv_24x7_cpumask;
> +
>  static bool domain_is_valid(unsigned domain)
>  {
>   switch (domain) {
> @@ -1641,6 +1643,44 @@ static struct pmu h_24x7_pmu = {
>   .capabilities = PERF_PMU_CAP_NO_EXCLUDE,
>  };
>  
> +static int ppc_hv_24x7_cpu_online(unsigned int cpu)
> +{
> + /* Make this CPU the designated target for counter collection */

The comment implies every newly onlined CPU will become the target, but
actually it's only the first onlined CPU.

So I think the comment needs updating, or you could just drop the
comment, I think the code is fairly clear by itself.

> + if (cpumask_empty(&hv_24x7_cpumask))
> + cpumask_set_cpu(cpu, &hv_24x7_cpumask);
> +
> + return 0;
> +}
> +
> +static int ppc_hv_24x7_cpu_offline(unsigned int cpu)
> +{
> + int target = -1;

No need to initialise target, you assign to it unconditionally below.

> + /* Check if exiting cpu is used for collecting 24x7 events */
> + if (!cpumask_test_and_clear_cpu(cpu, &hv_24x7_cpumask))
> + return 0;
> +
> + /* Find a new cpu to collect 24x7 events */
> + target = cpumask_last(cpu_active_mask);

Any reason to use cpumask_last() vs cpumask_first(), or a randomly
chosen CPU?

> + if (target < 0 || target >= nr_cpu_ids)
> + return -1;
> +
> + /* Migrate 24x7 events to the new target */
> + cpumask_set_cpu(target, &hv_24x7_cpumask);
> + perf_pmu_migrate_context(&h_24x7_pmu, cpu, target);
> +
> + return 0;
> +}
> +
> +static int hv_24x7_cpu_hotplug_init(void)
> +{
> + return cpuhp_setup_state(CPUHP_AP_PERF_POWERPC_HV_24x7_ONLINE,
> +   "perf/powerpc/hv_24x7:online",
> +   ppc_hv_24x7_cpu_online,
> +   ppc_hv_24x7_cpu_offline);
> +}
> +
>  static int hv_24x7_init(void)
>  {
>   int r;
> @@ -1685,6 +1725,11 @@ static int hv_24x7_init(void)
>   if (r)
>   return r;
>  
> + /* init cpuhotplug */
> + r = hv_24x7_cpu_hotplug_init();
> + if (r)
> + pr_err("hv_24x7: CPU hotplug init failed\n");
> +

The hotplug initialisation shouldn't fail unless something is badly
wrong. I think you should just fail initialisation of the entire PMU if
that happens, which will make the error handling in the next patch much
simpler.

cheers

>   r = perf_pmu_register(&h_24x7_pmu, h_24x7_pmu.name, -1);
>   if (r)
>   return r;


[PATCH V4 2/3] mm/sparsemem: Enable vmem_altmap support in vmemmap_alloc_block_buf()

2020-07-05 Thread Anshuman Khandual
There are many instances where vmemmap allocation is switched between
regular memory and device memory just based on whether an altmap is
available or not. vmemmap_alloc_block_buf() is used on various platforms
to allocate vmemmap mappings. Let's also enable it to handle altmap based
device memory allocation along with the existing regular memory
allocations. This will help avoid the altmap based allocation switch in
many places. To summarize, there are two different ways to call
vmemmap_alloc_block_buf():

vmemmap_alloc_block_buf(size, node, NULL)   /* Allocate from system RAM */
vmemmap_alloc_block_buf(size, node, altmap) /* Allocate from altmap */

This converts altmap_alloc_block_buf() into a static function, drops its
entry from the header and updates Documentation/vm/memory-model.rst.

Cc: Jonathan Corbet 
Cc: Catalin Marinas 
Cc: Will Deacon 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Michael Ellerman 
Cc: Dave Hansen 
Cc: Andy Lutomirski 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Borislav Petkov 
Cc: "H. Peter Anvin" 
Cc: Andrew Morton 
Cc: linux-...@vger.kernel.org
Cc: x...@kernel.org
Cc: linux-arm-ker...@lists.infradead.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux...@kvack.org
Cc: linux-ker...@vger.kernel.org
Tested-by: Jia He 
Suggested-by: Robin Murphy 
Signed-off-by: Anshuman Khandual 
---
 Documentation/vm/memory-model.rst |  2 +-
 arch/arm64/mm/mmu.c   |  2 +-
 arch/powerpc/mm/init_64.c |  4 ++--
 arch/x86/mm/init_64.c |  5 +
 include/linux/mm.h|  4 ++--
 mm/sparse-vmemmap.c   | 28 +---
 6 files changed, 20 insertions(+), 25 deletions(-)

diff --git a/Documentation/vm/memory-model.rst 
b/Documentation/vm/memory-model.rst
index 91228044ed16..f26142cf24f2 100644
--- a/Documentation/vm/memory-model.rst
+++ b/Documentation/vm/memory-model.rst
@@ -178,7 +178,7 @@ for persistent memory devices in pre-allocated storage on 
those
 devices. This storage is represented with :c:type:`struct vmem_altmap`
 that is eventually passed to vmemmap_populate() through a long chain
 of function calls. The vmemmap_populate() implementation may use the
-`vmem_altmap` along with :c:func:`altmap_alloc_block_buf` helper to
+`vmem_altmap` along with :c:func:`vmemmap_alloc_block_buf` helper to
 allocate memory map on the persistent memory device.
 
 ZONE_DEVICE
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 63b74fd56cd8..9c08d1882106 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -1101,7 +1101,7 @@ int __meminit vmemmap_populate(unsigned long start, 
unsigned long end, int node,
if (pmd_none(READ_ONCE(*pmdp))) {
void *p = NULL;
 
-   p = vmemmap_alloc_block_buf(PMD_SIZE, node);
+   p = vmemmap_alloc_block_buf(PMD_SIZE, node, NULL);
if (!p)
return -ENOMEM;
 
diff --git a/arch/powerpc/mm/init_64.c b/arch/powerpc/mm/init_64.c
index bc73abf0bc25..3fd504d72c5e 100644
--- a/arch/powerpc/mm/init_64.c
+++ b/arch/powerpc/mm/init_64.c
@@ -225,12 +225,12 @@ int __meminit vmemmap_populate(unsigned long start, 
unsigned long end, int node,
 * fall back to system memory if the altmap allocation fail.
 */
if (altmap && !altmap_cross_boundary(altmap, start, page_size)) 
{
-   p = altmap_alloc_block_buf(page_size, altmap);
+   p = vmemmap_alloc_block_buf(page_size, node, altmap);
if (!p)
pr_debug("altmap block allocation failed, 
falling back to system memory");
}
if (!p)
-   p = vmemmap_alloc_block_buf(page_size, node);
+   p = vmemmap_alloc_block_buf(page_size, node, NULL);
if (!p)
return -ENOMEM;
 
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index 19c0ed3271a3..5a7a45e7c5ea 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -1463,10 +1463,7 @@ static int __meminit vmemmap_populate_hugepages(unsigned 
long start,
if (pmd_none(*pmd)) {
void *p;
 
-   if (altmap)
-   p = altmap_alloc_block_buf(PMD_SIZE, altmap);
-   else
-   p = vmemmap_alloc_block_buf(PMD_SIZE, node);
+   p = vmemmap_alloc_block_buf(PMD_SIZE, node, altmap);
if (p) {
pte_t entry;
 
diff --git a/include/linux/mm.h b/include/linux/mm.h
index e40ac543d248..1973872ed3ab 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -3015,8 +3015,8 @@ pte_t *vmemmap_pte_populate(pmd_t *pmd, unsigned long 
addr, int node,
struct vmem_altmap *altmap);
 void 

[PATCH V4 0/3] arm64: Enable vmemmap mapping from device memory

2020-07-05 Thread Anshuman Khandual
This series enables vmemmap backing memory allocation from device memory
ranges on arm64. But before that, it enables vmemmap_populate_basepages()
and vmemmap_alloc_block_buf() to accommodate struct vmem_altmap based
allocation requests.

This series applies on 5.8-rc4.

Changes in V4:

- Dropped 'fallback' from vmemmap_alloc_block_buf() per Catalin

Changes in V3: 
(https://patchwork.kernel.org/project/linux-mm/list/?series=304707)

- Dropped comment from free_hotplug_page_range() per Robin
- Modified comment in unmap_hotplug_range() per Robin
- Enabled altmap support in vmemmap_alloc_block_buf() per Robin

Changes in V2: (https://lkml.org/lkml/2020/3/4/475)

- Rebased on latest hot-remove series (v14) adding P4D page table support

Changes in V1: (https://lkml.org/lkml/2020/1/23/12)

- Added an WARN_ON() in unmap_hotplug_range() when altmap is
  provided without the page table backing memory being freed

Changes in RFC V2: (https://lkml.org/lkml/2019/10/21/11)

- Changed the commit message on 1/2 patch per Will
- Changed the commit message on 2/2 patch as well
- Rebased on arm64 memory hot remove series (v10)

RFC V1: (https://lkml.org/lkml/2019/6/28/32)

Cc: Catalin Marinas 
Cc: Will Deacon 
Cc: Mark Rutland 
Cc: Paul Walmsley 
Cc: Palmer Dabbelt 
Cc: Tony Luck 
Cc: Fenghua Yu 
Cc: Dave Hansen 
Cc: Andy Lutomirski 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: David Hildenbrand 
Cc: Mike Rapoport 
Cc: Michal Hocko 
Cc: "Matthew Wilcox (Oracle)" 
Cc: "Kirill A. Shutemov" 
Cc: Andrew Morton 
Cc: Dan Williams 
Cc: Pavel Tatashin 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Michael Ellerman 
Cc: linux-arm-ker...@lists.infradead.org
Cc: linux-i...@vger.kernel.org
Cc: linux-ri...@lists.infradead.org
Cc: x...@kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux...@kvack.org
Cc: linux-ker...@vger.kernel.org


Anshuman Khandual (3):
  mm/sparsemem: Enable vmem_altmap support in vmemmap_populate_basepages()
  mm/sparsemem: Enable vmem_altmap support in vmemmap_alloc_block_buf()
  arm64/mm: Enable vmem_altmap support for vmemmap mappings

 Documentation/vm/memory-model.rst |  2 +-
 arch/arm64/mm/mmu.c   | 58 ---
 arch/ia64/mm/discontig.c  |  2 +-
 arch/powerpc/mm/init_64.c |  4 +--
 arch/riscv/mm/init.c  |  2 +-
 arch/x86/mm/init_64.c | 11 +++---
 include/linux/mm.h|  9 ++---
 mm/sparse-vmemmap.c   | 36 ++-
 8 files changed, 72 insertions(+), 52 deletions(-)

-- 
2.20.1



[PATCH] powerpc: select ARCH_HAS_MEMBARRIER_SYNC_CORE

2020-07-05 Thread Nicholas Piggin
powerpc return from interrupt and return from system call sequences are
context synchronising.

Signed-off-by: Nicholas Piggin 
---
 .../features/sched/membarrier-sync-core/arch-support.txt  | 4 ++--
 arch/powerpc/Kconfig  | 1 +
 arch/powerpc/include/asm/exception-64s.h  | 4 
 3 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/Documentation/features/sched/membarrier-sync-core/arch-support.txt 
b/Documentation/features/sched/membarrier-sync-core/arch-support.txt
index 8a521a622966..52ad74a25f54 100644
--- a/Documentation/features/sched/membarrier-sync-core/arch-support.txt
+++ b/Documentation/features/sched/membarrier-sync-core/arch-support.txt
@@ -5,7 +5,7 @@
 #
 # Architecture requirements
 #
-# * arm/arm64
+# * arm/arm64/powerpc
 #
 # Rely on implicit context synchronization as a result of exception return
 # when returning from IPI handler, and when returning to user-space.
@@ -45,7 +45,7 @@
 |   nios2: | TODO |
 |openrisc: | TODO |
 |  parisc: | TODO |
-| powerpc: | TODO |
+| powerpc: |  ok  |
 |   riscv: | TODO |
 |s390: | TODO |
 |  sh: | TODO |
diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 9fa23eb320ff..920c4e3ca4ef 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -131,6 +131,7 @@ config PPC
select ARCH_HAS_PTE_DEVMAP  if PPC_BOOK3S_64
select ARCH_HAS_PTE_SPECIAL
select ARCH_HAS_MEMBARRIER_CALLBACKS
+   select ARCH_HAS_MEMBARRIER_SYNC_CORE
select ARCH_HAS_SCALED_CPUTIME  if VIRT_CPU_ACCOUNTING_NATIVE 
&& PPC_BOOK3S_64
select ARCH_HAS_STRICT_KERNEL_RWX   if (PPC32 && !HIBERNATION)
select ARCH_HAS_TICK_BROADCAST  if GENERIC_CLOCKEVENTS_BROADCAST
diff --git a/arch/powerpc/include/asm/exception-64s.h 
b/arch/powerpc/include/asm/exception-64s.h
index 47bd4ea0837d..b88cb3a989b6 100644
--- a/arch/powerpc/include/asm/exception-64s.h
+++ b/arch/powerpc/include/asm/exception-64s.h
@@ -68,6 +68,10 @@
  *
  * The nop instructions allow us to insert one or more instructions to flush 
the
  * L1-D cache when returning to userspace or a guest.
+ *
+ * powerpc relies on return from interrupt/syscall being context synchronising
+ * (which hrfid, rfid, and rfscv are) to support ARCH_HAS_MEMBARRIER_SYNC_CORE
+ * without additional synchronisation instructions.
  */
 #define RFI_FLUSH_SLOT \
RFI_FLUSH_FIXUP_SECTION;\
-- 
2.23.0



Re: [PATCH v3 1/3] powerpc/mm: Enable radix GTSE only if supported.

2020-07-05 Thread Santosh Sivaraj


Hi Bharata,

Bharata B Rao  writes:

> Make GTSE an MMU feature and enable it by default for radix.
> However for guest, conditionally enable it if hypervisor supports
> it via OV5 vector. Let prom_init ask for radix GTSE only if the
> support exists.
>
> Having GTSE as an MMU feature will make it easy to enable radix
> without GTSE. Currently radix assumes GTSE is enabled by default.
>
> Signed-off-by: Bharata B Rao 
> Reviewed-by: Aneesh Kumar K.V 
> ---
>  arch/powerpc/include/asm/mmu.h|  4 
>  arch/powerpc/kernel/dt_cpu_ftrs.c |  1 +
>  arch/powerpc/kernel/prom_init.c   | 13 -
>  arch/powerpc/mm/init_64.c |  5 -
>  4 files changed, 17 insertions(+), 6 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/mmu.h b/arch/powerpc/include/asm/mmu.h
> index f4ac25d4df05..884d51995934 100644
> --- a/arch/powerpc/include/asm/mmu.h
> +++ b/arch/powerpc/include/asm/mmu.h
> @@ -28,6 +28,9 @@
>   * Individual features below.
>   */
>  
> +/* Guest Translation Shootdown Enable */
> +#define MMU_FTR_GTSE ASM_CONST(0x1000)
> +
>  /*
>   * Support for 68 bit VA space. We added that from ISA 2.05
>   */
> @@ -173,6 +176,7 @@ enum {
>  #endif
>  #ifdef CONFIG_PPC_RADIX_MMU
>   MMU_FTR_TYPE_RADIX |
> + MMU_FTR_GTSE |
>  #ifdef CONFIG_PPC_KUAP
>   MMU_FTR_RADIX_KUAP |
>  #endif /* CONFIG_PPC_KUAP */
> diff --git a/arch/powerpc/kernel/dt_cpu_ftrs.c 
> b/arch/powerpc/kernel/dt_cpu_ftrs.c
> index a0edeb391e3e..ac650c233cd9 100644
> --- a/arch/powerpc/kernel/dt_cpu_ftrs.c
> +++ b/arch/powerpc/kernel/dt_cpu_ftrs.c
> @@ -336,6 +336,7 @@ static int __init feat_enable_mmu_radix(struct 
> dt_cpu_feature *f)
>  #ifdef CONFIG_PPC_RADIX_MMU
>   cur_cpu_spec->mmu_features |= MMU_FTR_TYPE_RADIX;
>   cur_cpu_spec->mmu_features |= MMU_FTRS_HASH_BASE;
> + cur_cpu_spec->mmu_features |= MMU_FTR_GTSE;
>   cur_cpu_spec->cpu_user_features |= PPC_FEATURE_HAS_MMU;
>  
>   return 1;
> diff --git a/arch/powerpc/kernel/prom_init.c b/arch/powerpc/kernel/prom_init.c
> index 90c604d00b7d..cbc605cfdec0 100644
> --- a/arch/powerpc/kernel/prom_init.c
> +++ b/arch/powerpc/kernel/prom_init.c
> @@ -1336,12 +1336,15 @@ static void __init prom_check_platform_support(void)
>   }
>   }
>  
> - if (supported.radix_mmu && supported.radix_gtse &&
> - IS_ENABLED(CONFIG_PPC_RADIX_MMU)) {
> - /* Radix preferred - but we require GTSE for now */
> - prom_debug("Asking for radix with GTSE\n");
> + if (supported.radix_mmu && IS_ENABLED(CONFIG_PPC_RADIX_MMU)) {
> + /* Radix preferred - Check if GTSE is also supported */
> + prom_debug("Asking for radix\n");
>   ibm_architecture_vec.vec5.mmu = OV5_FEAT(OV5_MMU_RADIX);
> - ibm_architecture_vec.vec5.radix_ext = OV5_FEAT(OV5_RADIX_GTSE);
> + if (supported.radix_gtse)
> + ibm_architecture_vec.vec5.radix_ext =
> + OV5_FEAT(OV5_RADIX_GTSE);
> + else
> + prom_debug("Radix GTSE isn't supported\n");
>   } else if (supported.hash_mmu) {
>   /* Default to hash mmu (if we can) */
>   prom_debug("Asking for hash\n");
> diff --git a/arch/powerpc/mm/init_64.c b/arch/powerpc/mm/init_64.c
> index bc73abf0bc25..152aa0200cef 100644
> --- a/arch/powerpc/mm/init_64.c
> +++ b/arch/powerpc/mm/init_64.c
> @@ -407,12 +407,15 @@ static void __init early_check_vec5(void)
>   if (!(vec5[OV5_INDX(OV5_RADIX_GTSE)] &
>   OV5_FEAT(OV5_RADIX_GTSE))) {
>   pr_warn("WARNING: Hypervisor doesn't support RADIX with 
> GTSE\n");
> - }
> + cur_cpu_spec->mmu_features &= ~MMU_FTR_GTSE;
> + } else
> + cur_cpu_spec->mmu_features |= MMU_FTR_GTSE;

The GTSE flag is set by default in feat_enable_mmu_radix(), should it
be set again here?

Thanks,
Santosh
>   /* Do radix anyway - the hypervisor said we had to */
>   cur_cpu_spec->mmu_features |= MMU_FTR_TYPE_RADIX;
>   } else if (mmu_supported == OV5_FEAT(OV5_MMU_HASH)) {
>   /* Hypervisor only supports hash - disable radix */
>   cur_cpu_spec->mmu_features &= ~MMU_FTR_TYPE_RADIX;
> + cur_cpu_spec->mmu_features &= ~MMU_FTR_GTSE;
>   }
>  }
>  
> -- 
> 2.21.3


[PATCH 14/14] powerpc/eeh: Move PE tree setup into the platform

2020-07-05 Thread Oliver O'Halloran
The EEH core has a concept of a "PE tree" to support PowerNV. The PE tree
follows the PCI bus structures because a reset asserted on an upstream
bridge will be propagated to the downstream bridges. On pseries there's a
1-1 correspondence between what the guest sees as a PHB and a PE, so the
"tree" is really just a single node.

Currently the EEH core is responsible for setting up this PE tree, which
it does by traversing the pci_dn tree. The structure of the pci_dn tree
matches the bus tree on PowerNV, which leads to the PE tree being
"correct", but this setup method doesn't make a whole lot of sense and
it's actively confusing for the pseries case, where it doesn't really do
anything.

We want to remove the dependence on pci_dn anyway, so this patch moves
the choice of where to insert a new PE into the platform code rather
than keeping it in the generic EEH code. For PowerNV this simplifies the
tree building logic and removes the use of pci_dn. For pseries we
keep the existing logic. I'm not really convinced it does anything
due to the 1-1 PE-to-PHB correspondence, since every device under that
PHB should be in the same PE, but I'd rather not remove it entirely
until we've had a chance to look at it more deeply.

Signed-off-by: Oliver O'Halloran 
---
 arch/powerpc/include/asm/eeh.h   |  2 +-
 arch/powerpc/kernel/eeh_pe.c | 70 ++--
 arch/powerpc/platforms/powernv/eeh-powernv.c | 27 +++-
 arch/powerpc/platforms/pseries/eeh_pseries.c | 59 +++--
 4 files changed, 101 insertions(+), 57 deletions(-)

diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
index 8d34e5b790c2..1cab629dbc74 100644
--- a/arch/powerpc/include/asm/eeh.h
+++ b/arch/powerpc/include/asm/eeh.h
@@ -283,7 +283,7 @@ struct eeh_pe *eeh_phb_pe_get(struct pci_controller *phb);
 struct eeh_pe *eeh_pe_next(struct eeh_pe *pe, struct eeh_pe *root);
 struct eeh_pe *eeh_pe_get(struct pci_controller *phb,
  int pe_no, int config_addr);
-int eeh_pe_tree_insert(struct eeh_dev *edev);
+int eeh_pe_tree_insert(struct eeh_dev *edev, struct eeh_pe *new_pe_parent);
 int eeh_pe_tree_remove(struct eeh_dev *edev);
 void eeh_pe_update_time_stamp(struct eeh_pe *pe);
 void *eeh_pe_traverse(struct eeh_pe *root,
diff --git a/arch/powerpc/kernel/eeh_pe.c b/arch/powerpc/kernel/eeh_pe.c
index 898205829a8f..ea2f8b362d18 100644
--- a/arch/powerpc/kernel/eeh_pe.c
+++ b/arch/powerpc/kernel/eeh_pe.c
@@ -318,53 +318,20 @@ struct eeh_pe *eeh_pe_get(struct pci_controller *phb,
return pe;
 }
 
-/**
- * eeh_pe_get_parent - Retrieve the parent PE
- * @edev: EEH device
- *
- * The whole PEs existing in the system are organized as hierarchy
- * tree. The function is used to retrieve the parent PE according
- * to the parent EEH device.
- */
-static struct eeh_pe *eeh_pe_get_parent(struct eeh_dev *edev)
-{
-   struct eeh_dev *parent;
-   struct pci_dn *pdn = eeh_dev_to_pdn(edev);
-
-   /*
-* It might have the case for the indirect parent
-* EEH device already having associated PE, but
-* the direct parent EEH device doesn't have yet.
-*/
-   if (edev->physfn)
-   pdn = pci_get_pdn(edev->physfn);
-   else
-   pdn = pdn ? pdn->parent : NULL;
-   while (pdn) {
-   /* We're poking out of PCI territory */
-   parent = pdn_to_eeh_dev(pdn);
-   if (!parent)
-   return NULL;
-
-   if (parent->pe)
-   return parent->pe;
-
-   pdn = pdn->parent;
-   }
-
-   return NULL;
-}
-
 /**
  * eeh_pe_tree_insert - Add EEH device to parent PE
  * @edev: EEH device
+ * @new_pe_parent: PE to create additional PEs under
  *
- * Add EEH device to the parent PE. If the parent PE already
- * exists, the PE type will be changed to EEH_PE_BUS. Otherwise,
- * we have to create new PE to hold the EEH device and the new
- * PE will be linked to its parent PE as well.
+ * Add EEH device to the PE in edev->pe_config_addr. If a PE already
+ * exists with that address then @edev is added to that PE. Otherwise
+ * a new PE is created and inserted into the PE tree as a child of
+ * @new_pe_parent.
+ *
+ * If @new_pe_parent is NULL then the new PE will be inserted
+ * directly under the PHB.
  */
-int eeh_pe_tree_insert(struct eeh_dev *edev)
+int eeh_pe_tree_insert(struct eeh_dev *edev, struct eeh_pe *new_pe_parent)
 {
struct pci_controller *hose = edev->controller;
struct eeh_pe *pe, *parent;
@@ -399,7 +366,7 @@ int eeh_pe_tree_insert(struct eeh_dev *edev)
}
 
eeh_edev_dbg(edev,
-"Added to device PE (parent: PE#%x)\n",
+"Added to existing PE (parent: PE#%x)\n",
 pe->parent->addr);
} else {
/* Mark the PE as type of PCI bus */

[PATCH 13/14] powerpc/eeh: Drop pdn use in eeh_pe_tree_insert()

2020-07-05 Thread Oliver O'Halloran
This is mostly just to make the subsequent diffs less noisy. No functional
changes.

One thing that needs calling out is the removal of the "config_addr"
variable and replacing it with edev->bdfn. The contents of edev->bdfn are
the same, however it's worth pointing out that what RTAS calls a
"config_addr" isn't the same as the bdfn. The config_addr is supposed to
be: <bus><dev><fn> with each field being an 8-bit number. Various parts
of the EEH code use BDFN and "config_addr" as interchangeable quantities
even though they aren't really.

Signed-off-by: Oliver O'Halloran 
---
 arch/powerpc/kernel/eeh_pe.c | 15 +++
 1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/kernel/eeh_pe.c b/arch/powerpc/kernel/eeh_pe.c
index 97bf09db2ecd..898205829a8f 100644
--- a/arch/powerpc/kernel/eeh_pe.c
+++ b/arch/powerpc/kernel/eeh_pe.c
@@ -366,9 +366,8 @@ static struct eeh_pe *eeh_pe_get_parent(struct eeh_dev 
*edev)
  */
 int eeh_pe_tree_insert(struct eeh_dev *edev)
 {
+   struct pci_controller *hose = edev->controller;
struct eeh_pe *pe, *parent;
-   struct pci_dn *pdn = eeh_dev_to_pdn(edev);
-   int config_addr = (pdn->busno << 8) | (pdn->devfn);
 
/* Check if the PE number is valid */
if (!eeh_has_flag(EEH_VALID_PE_ZERO) && !edev->pe_config_addr) {
@@ -382,7 +381,7 @@ int eeh_pe_tree_insert(struct eeh_dev *edev)
 * PE should be composed of PCI bus and its subordinate
 * components.
 */
-   pe = eeh_pe_get(pdn->phb, edev->pe_config_addr, config_addr);
+   pe = eeh_pe_get(hose, edev->pe_config_addr, edev->bdfn);
if (pe) {
if (pe->type & EEH_PE_INVALID) {
			list_add_tail(&edev->entry, &pe->edevs);
@@ -416,15 +415,15 @@ int eeh_pe_tree_insert(struct eeh_dev *edev)
 
/* Create a new EEH PE */
if (edev->physfn)
-   pe = eeh_pe_alloc(pdn->phb, EEH_PE_VF);
+   pe = eeh_pe_alloc(hose, EEH_PE_VF);
else
-   pe = eeh_pe_alloc(pdn->phb, EEH_PE_DEVICE);
+   pe = eeh_pe_alloc(hose, EEH_PE_DEVICE);
if (!pe) {
pr_err("%s: out of memory!\n", __func__);
return -ENOMEM;
}
	pe->addr = edev->pe_config_addr;
-   pe->config_addr = config_addr;
+   pe->config_addr = edev->bdfn;
 
/*
 * Put the new EEH PE into hierarchy tree. If the parent
@@ -434,10 +433,10 @@ int eeh_pe_tree_insert(struct eeh_dev *edev)
 */
parent = eeh_pe_get_parent(edev);
if (!parent) {
-   parent = eeh_phb_pe_get(pdn->phb);
+   parent = eeh_phb_pe_get(hose);
if (!parent) {
pr_err("%s: No PHB PE is found (PHB Domain=%d)\n",
-   __func__, pdn->phb->global_number);
+   __func__, hose->global_number);
edev->pe = NULL;
kfree(pe);
return -EEXIST;
-- 
2.26.2



[PATCH 12/14] powerpc/eeh: Rename eeh_{add_to|remove_from}_parent_pe()

2020-07-05 Thread Oliver O'Halloran
The naming of eeh_{add_to|remove_from}_parent_pe() doesn't really reflect
what they actually do. If the PE referred to by edev->pe_config_addr
already exists under that PHB then the edev is added to that PE. However,
if the PE doesn't exist then a new one is created for the edev.

The bulk of the implementation of eeh_add_to_parent_pe() covers that
second case. Similarly, most of eeh_remove_from_parent_pe() is
determining when it's safe to delete a PE.

Signed-off-by: Oliver O'Halloran 
---
 arch/powerpc/include/asm/eeh.h   | 4 ++--
 arch/powerpc/kernel/eeh.c| 4 ++--
 arch/powerpc/kernel/eeh_driver.c | 2 +-
 arch/powerpc/kernel/eeh_pe.c | 8 
 arch/powerpc/kernel/pci_dn.c | 2 +-
 arch/powerpc/platforms/powernv/eeh-powernv.c | 2 +-
 arch/powerpc/platforms/pseries/eeh_pseries.c | 8 
 7 files changed, 15 insertions(+), 15 deletions(-)

diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
index a2f7ed204ece..8d34e5b790c2 100644
--- a/arch/powerpc/include/asm/eeh.h
+++ b/arch/powerpc/include/asm/eeh.h
@@ -283,8 +283,8 @@ struct eeh_pe *eeh_phb_pe_get(struct pci_controller *phb);
 struct eeh_pe *eeh_pe_next(struct eeh_pe *pe, struct eeh_pe *root);
 struct eeh_pe *eeh_pe_get(struct pci_controller *phb,
  int pe_no, int config_addr);
-int eeh_add_to_parent_pe(struct eeh_dev *edev);
-int eeh_rmv_from_parent_pe(struct eeh_dev *edev);
+int eeh_pe_tree_insert(struct eeh_dev *edev);
+int eeh_pe_tree_remove(struct eeh_dev *edev);
 void eeh_pe_update_time_stamp(struct eeh_pe *pe);
 void *eeh_pe_traverse(struct eeh_pe *root,
  eeh_pe_traverse_func fn, void *flag);
diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
index f203ffc5c57d..94682382fc8c 100644
--- a/arch/powerpc/kernel/eeh.c
+++ b/arch/powerpc/kernel/eeh.c
@@ -1107,7 +1107,7 @@ void eeh_probe_device(struct pci_dev *dev)
 * FIXME: HEY MA, LOOK AT ME, NO LOCKING!
 */
if (edev->pdev && edev->pdev != dev) {
-   eeh_rmv_from_parent_pe(edev);
+   eeh_pe_tree_remove(edev);
eeh_addr_cache_rmv_dev(edev->pdev);
eeh_sysfs_remove_device(edev->pdev);
 
@@ -1186,7 +1186,7 @@ void eeh_remove_device(struct pci_dev *dev)
edev->in_error = false;
dev->dev.archdata.edev = NULL;
if (!(edev->pe->state & EEH_PE_KEEP))
-   eeh_rmv_from_parent_pe(edev);
+   eeh_pe_tree_remove(edev);
else
edev->mode |= EEH_DEV_DISCONNECTED;
 }
diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c
index b84d3cb2532e..4197e4559f65 100644
--- a/arch/powerpc/kernel/eeh_driver.c
+++ b/arch/powerpc/kernel/eeh_driver.c
@@ -542,7 +542,7 @@ static void *eeh_pe_detach_dev(struct eeh_pe *pe, void 
*userdata)
continue;
 
edev->mode &= ~(EEH_DEV_DISCONNECTED | EEH_DEV_IRQ_DISABLED);
-   eeh_rmv_from_parent_pe(edev);
+   eeh_pe_tree_remove(edev);
}
 
return NULL;
diff --git a/arch/powerpc/kernel/eeh_pe.c b/arch/powerpc/kernel/eeh_pe.c
index f20fb0ee6aec..97bf09db2ecd 100644
--- a/arch/powerpc/kernel/eeh_pe.c
+++ b/arch/powerpc/kernel/eeh_pe.c
@@ -356,7 +356,7 @@ static struct eeh_pe *eeh_pe_get_parent(struct eeh_dev 
*edev)
 }
 
 /**
- * eeh_add_to_parent_pe - Add EEH device to parent PE
+ * eeh_pe_tree_insert - Add EEH device to parent PE
  * @edev: EEH device
  *
  * Add EEH device to the parent PE. If the parent PE already
@@ -364,7 +364,7 @@ static struct eeh_pe *eeh_pe_get_parent(struct eeh_dev 
*edev)
  * we have to create new PE to hold the EEH device and the new
  * PE will be linked to its parent PE as well.
  */
-int eeh_add_to_parent_pe(struct eeh_dev *edev)
+int eeh_pe_tree_insert(struct eeh_dev *edev)
 {
struct eeh_pe *pe, *parent;
struct pci_dn *pdn = eeh_dev_to_pdn(edev);
@@ -459,7 +459,7 @@ int eeh_add_to_parent_pe(struct eeh_dev *edev)
 }
 
 /**
- * eeh_rmv_from_parent_pe - Remove one EEH device from the associated PE
+ * eeh_pe_tree_remove - Remove one EEH device from the associated PE
  * @edev: EEH device
  *
  * The PE hierarchy tree might be changed when doing PCI hotplug.
@@ -467,7 +467,7 @@ int eeh_add_to_parent_pe(struct eeh_dev *edev)
  * during EEH recovery. So we have to call the function remove the
  * corresponding PE accordingly if necessary.
  */
-int eeh_rmv_from_parent_pe(struct eeh_dev *edev)
+int eeh_pe_tree_remove(struct eeh_dev *edev)
 {
struct eeh_pe *pe, *parent, *child;
bool keep, recover;
diff --git a/arch/powerpc/kernel/pci_dn.c b/arch/powerpc/kernel/pci_dn.c
index bf11ac8427ac..e99b7c547d7e 100644
--- a/arch/powerpc/kernel/pci_dn.c
+++ b/arch/powerpc/kernel/pci_dn.c
@@ -263,7 +263,7 @@ void remove_sriov_vf_pdns(struct pci_dev *pdev)
 * have a configured PE.
  

[PATCH 11/14] powerpc/eeh: Remove class code field from edev

2020-07-05 Thread Oliver O'Halloran
The edev->class_code field is never referenced anywhere except for the
platform specific probe functions. The same information is available in
the pci_dev for PowerNV and in the pci_dn on pseries so we can remove
the field.

Signed-off-by: Oliver O'Halloran 
---
 arch/powerpc/include/asm/eeh.h   | 1 -
 arch/powerpc/platforms/powernv/eeh-powernv.c | 5 ++---
 arch/powerpc/platforms/pseries/eeh_pseries.c | 3 +--
 3 files changed, 3 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
index 293a55dc803b..a2f7ed204ece 100644
--- a/arch/powerpc/include/asm/eeh.h
+++ b/arch/powerpc/include/asm/eeh.h
@@ -133,7 +133,6 @@ static inline bool eeh_pe_passed(struct eeh_pe *pe)
 
 struct eeh_dev {
int mode;   /* EEH mode */
-   int class_code; /* Class code of the device */
int bdfn;   /* bdfn of device (for cfg ops) */
struct pci_controller *controller;
int pe_config_addr; /* PE config address*/
diff --git a/arch/powerpc/platforms/powernv/eeh-powernv.c 
b/arch/powerpc/platforms/powernv/eeh-powernv.c
index c9f2f454d053..7cbb03a97a61 100644
--- a/arch/powerpc/platforms/powernv/eeh-powernv.c
+++ b/arch/powerpc/platforms/powernv/eeh-powernv.c
@@ -372,19 +372,18 @@ static struct eeh_dev *pnv_eeh_probe(struct pci_dev *pdev)
}
 
/* Skip for PCI-ISA bridge */
-   if ((pdn->class_code >> 8) == PCI_CLASS_BRIDGE_ISA)
+   if ((pdev->class >> 8) == PCI_CLASS_BRIDGE_ISA)
return NULL;
 
eeh_edev_dbg(edev, "Probing device\n");
 
/* Initialize eeh device */
-   edev->class_code = pdn->class_code;
edev->mode  &= 0xFF00;
edev->pcix_cap = pnv_eeh_find_cap(pdn, PCI_CAP_ID_PCIX);
edev->pcie_cap = pnv_eeh_find_cap(pdn, PCI_CAP_ID_EXP);
edev->af_cap   = pnv_eeh_find_cap(pdn, PCI_CAP_ID_AF);
edev->aer_cap  = pnv_eeh_find_ecap(pdn, PCI_EXT_CAP_ID_ERR);
-   if ((edev->class_code >> 8) == PCI_CLASS_BRIDGE_PCI) {
+   if ((pdev->class >> 8) == PCI_CLASS_BRIDGE_PCI) {
edev->mode |= EEH_DEV_BRIDGE;
if (edev->pcie_cap) {
pnv_pci_cfg_read(pdn, edev->pcie_cap + PCI_EXP_FLAGS,
diff --git a/arch/powerpc/platforms/pseries/eeh_pseries.c 
b/arch/powerpc/platforms/pseries/eeh_pseries.c
index e75579b857ce..daf6caeca8f0 100644
--- a/arch/powerpc/platforms/pseries/eeh_pseries.c
+++ b/arch/powerpc/platforms/pseries/eeh_pseries.c
@@ -273,12 +273,11 @@ void pseries_eeh_init_edev(struct pci_dn *pdn)
 * correctly reflects that current device is root port
 * or PCIe switch downstream port.
 */
-   edev->class_code = pdn->class_code;
edev->pcix_cap = pseries_eeh_find_cap(pdn, PCI_CAP_ID_PCIX);
edev->pcie_cap = pseries_eeh_find_cap(pdn, PCI_CAP_ID_EXP);
edev->aer_cap = pseries_eeh_find_ecap(pdn, PCI_EXT_CAP_ID_ERR);
edev->mode &= 0xFF00;
-   if ((edev->class_code >> 8) == PCI_CLASS_BRIDGE_PCI) {
+   if ((pdn->class_code >> 8) == PCI_CLASS_BRIDGE_PCI) {
edev->mode |= EEH_DEV_BRIDGE;
if (edev->pcie_cap) {
rtas_read_config(pdn, edev->pcie_cap + PCI_EXP_FLAGS,
-- 
2.26.2



[PATCH 10/14] powerpc/eeh: Remove spurious use of pci_dn in eeh_dump_dev_log

2020-07-05 Thread Oliver O'Halloran
Retrieve the domain, bus, device, and function numbers from the edev.

Signed-off-by: Oliver O'Halloran 
---
 arch/powerpc/kernel/eeh.c | 14 --
 1 file changed, 4 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
index 1a12c8bdf61e..f203ffc5c57d 100644
--- a/arch/powerpc/kernel/eeh.c
+++ b/arch/powerpc/kernel/eeh.c
@@ -167,23 +167,17 @@ void eeh_show_enabled(void)
  */
 static size_t eeh_dump_dev_log(struct eeh_dev *edev, char *buf, size_t len)
 {
-   struct pci_dn *pdn = eeh_dev_to_pdn(edev);
u32 cfg;
int cap, i;
int n = 0, l = 0;
char buffer[128];
 
-   if (!pdn) {
-   pr_warn("EEH: Note: No error log for absent device.\n");
-   return 0;
-   }
-
n += scnprintf(buf+n, len-n, "%04x:%02x:%02x.%01x\n",
-  pdn->phb->global_number, pdn->busno,
-  PCI_SLOT(pdn->devfn), PCI_FUNC(pdn->devfn));
+   edev->pe->phb->global_number, edev->bdfn >> 8,
+   PCI_SLOT(edev->bdfn), PCI_FUNC(edev->bdfn));
pr_warn("EEH: of node=%04x:%02x:%02x.%01x\n",
-   pdn->phb->global_number, pdn->busno,
-   PCI_SLOT(pdn->devfn), PCI_FUNC(pdn->devfn));
+   edev->pe->phb->global_number, edev->bdfn >> 8,
+   PCI_SLOT(edev->bdfn), PCI_FUNC(edev->bdfn));
 
eeh_ops->read_config(edev, PCI_VENDOR_ID, 4, );
n += scnprintf(buf+n, len-n, "dev/vend:%08x\n", cfg);
-- 
2.26.2



[PATCH 09/14] powerpc/eeh: Pass eeh_dev to eeh_ops->{read|write}_config()

2020-07-05 Thread Oliver O'Halloran
Mechanical conversion of the eeh_ops interfaces to use eeh_dev to reference
a specific device rather than pci_dn. No functional changes.

Signed-off-by: Oliver O'Halloran 
---
 arch/powerpc/include/asm/eeh.h   |  4 +-
 arch/powerpc/kernel/eeh.c| 22 +
 arch/powerpc/kernel/eeh_pe.c | 47 +---
 arch/powerpc/platforms/powernv/eeh-powernv.c | 43 ++
 arch/powerpc/platforms/pseries/eeh_pseries.c | 16 ---
 5 files changed, 68 insertions(+), 64 deletions(-)

diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
index 2728a3790f6c..293a55dc803b 100644
--- a/arch/powerpc/include/asm/eeh.h
+++ b/arch/powerpc/include/asm/eeh.h
@@ -226,8 +226,8 @@ struct eeh_ops {
int (*configure_bridge)(struct eeh_pe *pe);
int (*err_inject)(struct eeh_pe *pe, int type, int func,
  unsigned long addr, unsigned long mask);
-   int (*read_config)(struct pci_dn *pdn, int where, int size, u32 *val);
-   int (*write_config)(struct pci_dn *pdn, int where, int size, u32 val);
+   int (*read_config)(struct eeh_dev *edev, int where, int size, u32 *val);
+   int (*write_config)(struct eeh_dev *edev, int where, int size, u32 val);
int (*next_error)(struct eeh_pe **pe);
int (*restore_config)(struct eeh_dev *edev);
int (*notify_resume)(struct eeh_dev *edev);
diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
index 1cef0f4bb2d5..1a12c8bdf61e 100644
--- a/arch/powerpc/kernel/eeh.c
+++ b/arch/powerpc/kernel/eeh.c
@@ -185,21 +185,21 @@ static size_t eeh_dump_dev_log(struct eeh_dev *edev, char 
*buf, size_t len)
pdn->phb->global_number, pdn->busno,
PCI_SLOT(pdn->devfn), PCI_FUNC(pdn->devfn));
 
-   eeh_ops->read_config(pdn, PCI_VENDOR_ID, 4, );
+   eeh_ops->read_config(edev, PCI_VENDOR_ID, 4, );
n += scnprintf(buf+n, len-n, "dev/vend:%08x\n", cfg);
pr_warn("EEH: PCI device/vendor: %08x\n", cfg);
 
-   eeh_ops->read_config(pdn, PCI_COMMAND, 4, );
+   eeh_ops->read_config(edev, PCI_COMMAND, 4, );
n += scnprintf(buf+n, len-n, "cmd/stat:%x\n", cfg);
pr_warn("EEH: PCI cmd/status register: %08x\n", cfg);
 
/* Gather bridge-specific registers */
if (edev->mode & EEH_DEV_BRIDGE) {
-   eeh_ops->read_config(pdn, PCI_SEC_STATUS, 2, );
+   eeh_ops->read_config(edev, PCI_SEC_STATUS, 2, );
n += scnprintf(buf+n, len-n, "sec stat:%x\n", cfg);
pr_warn("EEH: Bridge secondary status: %04x\n", cfg);
 
-   eeh_ops->read_config(pdn, PCI_BRIDGE_CONTROL, 2, );
+   eeh_ops->read_config(edev, PCI_BRIDGE_CONTROL, 2, );
n += scnprintf(buf+n, len-n, "brdg ctl:%x\n", cfg);
pr_warn("EEH: Bridge control: %04x\n", cfg);
}
@@ -207,11 +207,11 @@ static size_t eeh_dump_dev_log(struct eeh_dev *edev, char 
*buf, size_t len)
/* Dump out the PCI-X command and status regs */
cap = edev->pcix_cap;
if (cap) {
-   eeh_ops->read_config(pdn, cap, 4, );
+   eeh_ops->read_config(edev, cap, 4, );
n += scnprintf(buf+n, len-n, "pcix-cmd:%x\n", cfg);
pr_warn("EEH: PCI-X cmd: %08x\n", cfg);
 
-   eeh_ops->read_config(pdn, cap+4, 4, );
+   eeh_ops->read_config(edev, cap+4, 4, );
n += scnprintf(buf+n, len-n, "pcix-stat:%x\n", cfg);
pr_warn("EEH: PCI-X status: %08x\n", cfg);
}
@@ -223,7 +223,7 @@ static size_t eeh_dump_dev_log(struct eeh_dev *edev, char 
*buf, size_t len)
pr_warn("EEH: PCI-E capabilities and status follow:\n");
 
for (i=0; i<=8; i++) {
-   eeh_ops->read_config(pdn, cap+4*i, 4, );
+   eeh_ops->read_config(edev, cap+4*i, 4, );
n += scnprintf(buf+n, len-n, "%02x:%x\n", 4*i, cfg);
 
if ((i % 4) == 0) {
@@ -250,7 +250,7 @@ static size_t eeh_dump_dev_log(struct eeh_dev *edev, char 
*buf, size_t len)
pr_warn("EEH: PCI-E AER capability register set follows:\n");
 
for (i=0; i<=13; i++) {
-   eeh_ops->read_config(pdn, cap+4*i, 4, );
+   eeh_ops->read_config(edev, cap+4*i, 4, );
n += scnprintf(buf+n, len-n, "%02x:%x\n", 4*i, cfg);
 
if ((i % 4) == 0) {
@@ -917,15 +917,13 @@ int eeh_pe_reset_full(struct eeh_pe *pe, bool 
include_passed)
  */
 void eeh_save_bars(struct eeh_dev *edev)
 {
-   struct pci_dn *pdn;
int i;
 
-   pdn = eeh_dev_to_pdn(edev);
-   if (!pdn)
+   if (!edev)
return;
 
for (i = 0; i < 16; i++)
-   eeh_ops->read_config(pdn, i * 4, 4, >config_space[i]);
+   eeh_ops->read_config(edev, i * 4, 4, >config_space[i]);
 
   

[PATCH 08/14] powerpc/eeh: Pass eeh_dev to eeh_ops->resume_notify()

2020-07-05 Thread Oliver O'Halloran
Mechanical conversion of the eeh_ops interfaces to use eeh_dev to reference
a specific device rather than pci_dn. No functional changes.

Signed-off-by: Oliver O'Halloran 
---
 arch/powerpc/include/asm/eeh.h   | 2 +-
 arch/powerpc/kernel/eeh_driver.c | 4 ++--
 arch/powerpc/kernel/eeh_sysfs.c  | 2 +-
 arch/powerpc/platforms/pseries/eeh_pseries.c | 4 +---
 4 files changed, 5 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
index 3eeaa5ef852f..2728a3790f6c 100644
--- a/arch/powerpc/include/asm/eeh.h
+++ b/arch/powerpc/include/asm/eeh.h
@@ -230,7 +230,7 @@ struct eeh_ops {
int (*write_config)(struct pci_dn *pdn, int where, int size, u32 val);
int (*next_error)(struct eeh_pe **pe);
int (*restore_config)(struct eeh_dev *edev);
-   int (*notify_resume)(struct pci_dn *pdn);
+   int (*notify_resume)(struct eeh_dev *edev);
 };
 
 extern int eeh_subsystem_flags;
diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c
index b70b9273f45a..b84d3cb2532e 100644
--- a/arch/powerpc/kernel/eeh_driver.c
+++ b/arch/powerpc/kernel/eeh_driver.c
@@ -425,8 +425,8 @@ static enum pci_ers_result eeh_report_resume(struct eeh_dev 
*edev,
 
pci_uevent_ers(edev->pdev, PCI_ERS_RESULT_RECOVERED);
 #ifdef CONFIG_PCI_IOV
-   if (eeh_ops->notify_resume && eeh_dev_to_pdn(edev))
-   eeh_ops->notify_resume(eeh_dev_to_pdn(edev));
+   if (eeh_ops->notify_resume)
+   eeh_ops->notify_resume(edev);
 #endif
return PCI_ERS_RESULT_NONE;
 }
diff --git a/arch/powerpc/kernel/eeh_sysfs.c b/arch/powerpc/kernel/eeh_sysfs.c
index 4fb0f1e1017a..429620da73ba 100644
--- a/arch/powerpc/kernel/eeh_sysfs.c
+++ b/arch/powerpc/kernel/eeh_sysfs.c
@@ -99,7 +99,7 @@ static ssize_t eeh_notify_resume_store(struct device *dev,
if (!edev || !edev->pe || !eeh_ops->notify_resume)
return -ENODEV;
 
-   if (eeh_ops->notify_resume(pci_get_pdn(pdev)))
+   if (eeh_ops->notify_resume(edev))
return -EIO;
 
return count;
diff --git a/arch/powerpc/platforms/pseries/eeh_pseries.c 
b/arch/powerpc/platforms/pseries/eeh_pseries.c
index 83122bf65a8c..7a4c7a373241 100644
--- a/arch/powerpc/platforms/pseries/eeh_pseries.c
+++ b/arch/powerpc/platforms/pseries/eeh_pseries.c
@@ -793,10 +793,8 @@ static int pseries_call_allow_unfreeze(struct eeh_dev 
*edev)
return rc;
 }
 
-static int pseries_notify_resume(struct pci_dn *pdn)
+static int pseries_notify_resume(struct eeh_dev *edev)
 {
-   struct eeh_dev *edev = pdn_to_eeh_dev(pdn);
-
if (!edev)
return -EEXIST;
 
-- 
2.26.2



[PATCH 07/14] powerpc/eeh: Pass eeh_dev to eeh_ops->restore_config()

2020-07-05 Thread Oliver O'Halloran
Mechanical conversion of the eeh_ops interfaces to use eeh_dev to reference
a specific device rather than pci_dn. No functional changes.

Signed-off-by: Oliver O'Halloran 
---
 arch/powerpc/include/asm/eeh.h   | 2 +-
 arch/powerpc/kernel/eeh.c| 5 ++---
 arch/powerpc/kernel/eeh_pe.c | 6 ++
 arch/powerpc/platforms/powernv/eeh-powernv.c | 6 ++
 4 files changed, 7 insertions(+), 12 deletions(-)

diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
index 046c5a2fe411..3eeaa5ef852f 100644
--- a/arch/powerpc/include/asm/eeh.h
+++ b/arch/powerpc/include/asm/eeh.h
@@ -229,7 +229,7 @@ struct eeh_ops {
int (*read_config)(struct pci_dn *pdn, int where, int size, u32 *val);
int (*write_config)(struct pci_dn *pdn, int where, int size, u32 val);
int (*next_error)(struct eeh_pe **pe);
-   int (*restore_config)(struct pci_dn *pdn);
+   int (*restore_config)(struct eeh_dev *edev);
int (*notify_resume)(struct pci_dn *pdn);
 };
 
diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
index a4df6f6de0bd..1cef0f4bb2d5 100644
--- a/arch/powerpc/kernel/eeh.c
+++ b/arch/powerpc/kernel/eeh.c
@@ -726,7 +726,6 @@ static void eeh_disable_and_save_dev_state(struct eeh_dev 
*edev,
 
 static void eeh_restore_dev_state(struct eeh_dev *edev, void *userdata)
 {
-   struct pci_dn *pdn = eeh_dev_to_pdn(edev);
struct pci_dev *pdev = eeh_dev_to_pci_dev(edev);
struct pci_dev *dev = userdata;
 
@@ -734,8 +733,8 @@ static void eeh_restore_dev_state(struct eeh_dev *edev, 
void *userdata)
return;
 
/* Apply customization from firmware */
-   if (pdn && eeh_ops->restore_config)
-   eeh_ops->restore_config(pdn);
+   if (eeh_ops->restore_config)
+   eeh_ops->restore_config(edev);
 
/* The caller should restore state for the specified device */
if (pdev != dev)
diff --git a/arch/powerpc/kernel/eeh_pe.c b/arch/powerpc/kernel/eeh_pe.c
index 177852e39a25..d71493f66917 100644
--- a/arch/powerpc/kernel/eeh_pe.c
+++ b/arch/powerpc/kernel/eeh_pe.c
@@ -843,16 +843,14 @@ static void eeh_restore_device_bars(struct eeh_dev *edev)
  */
 static void eeh_restore_one_device_bars(struct eeh_dev *edev, void *flag)
 {
-   struct pci_dn *pdn = eeh_dev_to_pdn(edev);
-
/* Do special restore for bridges */
if (edev->mode & EEH_DEV_BRIDGE)
eeh_restore_bridge_bars(edev);
else
eeh_restore_device_bars(edev);
 
-   if (eeh_ops->restore_config && pdn)
-   eeh_ops->restore_config(pdn);
+   if (eeh_ops->restore_config)
+   eeh_ops->restore_config(edev);
 }
 
 /**
diff --git a/arch/powerpc/platforms/powernv/eeh-powernv.c 
b/arch/powerpc/platforms/powernv/eeh-powernv.c
index 8f3a7611efc1..a41e67f674e6 100644
--- a/arch/powerpc/platforms/powernv/eeh-powernv.c
+++ b/arch/powerpc/platforms/powernv/eeh-powernv.c
@@ -1619,12 +1619,10 @@ static int pnv_eeh_next_error(struct eeh_pe **pe)
return ret;
 }
 
-static int pnv_eeh_restore_config(struct pci_dn *pdn)
+static int pnv_eeh_restore_config(struct eeh_dev *edev)
 {
-   struct eeh_dev *edev = pdn_to_eeh_dev(pdn);
struct pnv_phb *phb;
s64 ret = 0;
-   int config_addr = (pdn->busno << 8) | (pdn->devfn);
 
if (!edev)
return -EEXIST;
@@ -1638,7 +1636,7 @@ static int pnv_eeh_restore_config(struct pci_dn *pdn)
 
if (ret) {
pr_warn("%s: Can't reinit PCI dev 0x%x (%lld)\n",
-   __func__, config_addr, ret);
+   __func__, edev->bdfn, ret);
return -EIO;
}
 
-- 
2.26.2



[PATCH 06/14] powerpc/eeh: Remove VF config space restoration

2020-07-05 Thread Oliver O'Halloran
There's a bunch of strange things about this code. First up is that none of
the fields being written to are functional for a VF. The SR-IOV
specification lists them as "Reserved, but OS should preserve" so writing
new values to them doesn't do anything and is clearly wrong from a
correctness perspective.

However, since VFs are designed to be managed by the OS there is an
argument to be made that we should be saving and restoring some parts of
config space. We already sort of do that by saving the first 64 bytes of
config space in the eeh_dev (see eeh_dev->config_space[]). This is
inadequate since it doesn't even consider saving and restoring the PCI
capability structures. However, this is a problem with EEH in general and
that needs to be fixed for non-VF devices too.

There's no real reason to keep this around, so delete it.

Signed-off-by: Oliver O'Halloran 
---
 arch/powerpc/include/asm/eeh.h   |  1 -
 arch/powerpc/kernel/eeh.c| 59 
 arch/powerpc/platforms/powernv/eeh-powernv.c | 20 ++-
 arch/powerpc/platforms/pseries/eeh_pseries.c | 26 +
 4 files changed, 7 insertions(+), 99 deletions(-)

diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
index 1bddc0dfe099..046c5a2fe411 100644
--- a/arch/powerpc/include/asm/eeh.h
+++ b/arch/powerpc/include/asm/eeh.h
@@ -314,7 +314,6 @@ int eeh_pe_reset(struct eeh_pe *pe, int option, bool 
include_passed);
 int eeh_pe_configure(struct eeh_pe *pe);
 int eeh_pe_inject_err(struct eeh_pe *pe, int type, int func,
  unsigned long addr, unsigned long mask);
-int eeh_restore_vf_config(struct pci_dn *pdn);
 
 /**
  * EEH_POSSIBLE_ERROR() -- test for possible MMIO failure.
diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
index 859f76020256..a4df6f6de0bd 100644
--- a/arch/powerpc/kernel/eeh.c
+++ b/arch/powerpc/kernel/eeh.c
@@ -742,65 +742,6 @@ static void eeh_restore_dev_state(struct eeh_dev *edev, 
void *userdata)
pci_restore_state(pdev);
 }
 
-int eeh_restore_vf_config(struct pci_dn *pdn)
-{
-   struct eeh_dev *edev = pdn_to_eeh_dev(pdn);
-   u32 devctl, cmd, cap2, aer_capctl;
-   int old_mps;
-
-   if (edev->pcie_cap) {
-   /* Restore MPS */
-   old_mps = (ffs(pdn->mps) - 8) << 5;
-   eeh_ops->read_config(pdn, edev->pcie_cap + PCI_EXP_DEVCTL,
-2, );
-   devctl &= ~PCI_EXP_DEVCTL_PAYLOAD;
-   devctl |= old_mps;
-   eeh_ops->write_config(pdn, edev->pcie_cap + PCI_EXP_DEVCTL,
- 2, devctl);
-
-   /* Disable Completion Timeout if possible */
-   eeh_ops->read_config(pdn, edev->pcie_cap + PCI_EXP_DEVCAP2,
-4, );
-   if (cap2 & PCI_EXP_DEVCAP2_COMP_TMOUT_DIS) {
-   eeh_ops->read_config(pdn,
-edev->pcie_cap + PCI_EXP_DEVCTL2,
-4, );
-   cap2 |= PCI_EXP_DEVCTL2_COMP_TMOUT_DIS;
-   eeh_ops->write_config(pdn,
- edev->pcie_cap + PCI_EXP_DEVCTL2,
- 4, cap2);
-   }
-   }
-
-   /* Enable SERR and parity checking */
-   eeh_ops->read_config(pdn, PCI_COMMAND, 2, );
-   cmd |= (PCI_COMMAND_PARITY | PCI_COMMAND_SERR);
-   eeh_ops->write_config(pdn, PCI_COMMAND, 2, cmd);
-
-   /* Enable report various errors */
-   if (edev->pcie_cap) {
-   eeh_ops->read_config(pdn, edev->pcie_cap + PCI_EXP_DEVCTL,
-2, );
-   devctl &= ~PCI_EXP_DEVCTL_CERE;
-   devctl |= (PCI_EXP_DEVCTL_NFERE |
-  PCI_EXP_DEVCTL_FERE |
-  PCI_EXP_DEVCTL_URRE);
-   eeh_ops->write_config(pdn, edev->pcie_cap + PCI_EXP_DEVCTL,
- 2, devctl);
-   }
-
-   /* Enable ECRC generation and check */
-   if (edev->pcie_cap && edev->aer_cap) {
-   eeh_ops->read_config(pdn, edev->aer_cap + PCI_ERR_CAP,
-4, _capctl);
-   aer_capctl |= (PCI_ERR_CAP_ECRC_GENE | PCI_ERR_CAP_ECRC_CHKE);
-   eeh_ops->write_config(pdn, edev->aer_cap + PCI_ERR_CAP,
- 4, aer_capctl);
-   }
-
-   return 0;
-}
-
 /**
  * pcibios_set_pcie_reset_state - Set PCI-E reset state
  * @dev: pci device struct
diff --git a/arch/powerpc/platforms/powernv/eeh-powernv.c 
b/arch/powerpc/platforms/powernv/eeh-powernv.c
index bcd0515d8f79..8f3a7611efc1 100644
--- a/arch/powerpc/platforms/powernv/eeh-powernv.c
+++ b/arch/powerpc/platforms/powernv/eeh-powernv.c
@@ -1629,20 +1629,12 @@ static int pnv_eeh_restore_config(struct pci_dn *pdn)
if 

[PATCH 05/14] powerpc/eeh: Kill off eeh_ops->get_pe_addr()

2020-07-05 Thread Oliver O'Halloran
This is used in precisely one place, which is in pseries-specific platform
code. There's no need to have the callback in eeh_ops since the platform
chooses the EEH PE addresses anyway. The PowerNV implementation has always
been a stub too so remove it.

Signed-off-by: Oliver O'Halloran 
---
 arch/powerpc/include/asm/eeh.h   |  1 -
 arch/powerpc/platforms/powernv/eeh-powernv.c | 13 
 arch/powerpc/platforms/pseries/eeh_pseries.c | 22 ++--
 3 files changed, 11 insertions(+), 25 deletions(-)

diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
index 3d648e042835..1bddc0dfe099 100644
--- a/arch/powerpc/include/asm/eeh.h
+++ b/arch/powerpc/include/asm/eeh.h
@@ -220,7 +220,6 @@ struct eeh_ops {
int (*init)(void);
struct eeh_dev *(*probe)(struct pci_dev *pdev);
int (*set_option)(struct eeh_pe *pe, int option);
-   int (*get_pe_addr)(struct eeh_pe *pe);
int (*get_state)(struct eeh_pe *pe, int *delay);
int (*reset)(struct eeh_pe *pe, int option);
int (*get_log)(struct eeh_pe *pe, int severity, char *drv_log, unsigned 
long len);
diff --git a/arch/powerpc/platforms/powernv/eeh-powernv.c 
b/arch/powerpc/platforms/powernv/eeh-powernv.c
index 79409e005fcd..bcd0515d8f79 100644
--- a/arch/powerpc/platforms/powernv/eeh-powernv.c
+++ b/arch/powerpc/platforms/powernv/eeh-powernv.c
@@ -535,18 +535,6 @@ static int pnv_eeh_set_option(struct eeh_pe *pe, int 
option)
return 0;
 }
 
-/**
- * pnv_eeh_get_pe_addr - Retrieve PE address
- * @pe: EEH PE
- *
- * Retrieve the PE address according to the given tranditional
- * PCI BDF (Bus/Device/Function) address.
- */
-static int pnv_eeh_get_pe_addr(struct eeh_pe *pe)
-{
-   return pe->addr;
-}
-
 static void pnv_eeh_get_phb_diag(struct eeh_pe *pe)
 {
struct pnv_phb *phb = pe->phb->private_data;
@@ -1670,7 +1658,6 @@ static struct eeh_ops pnv_eeh_ops = {
.init   = pnv_eeh_init,
.probe  = pnv_eeh_probe,
.set_option = pnv_eeh_set_option,
-   .get_pe_addr= pnv_eeh_get_pe_addr,
.get_state  = pnv_eeh_get_state,
.reset  = pnv_eeh_reset,
.get_log= pnv_eeh_get_log,
diff --git a/arch/powerpc/platforms/pseries/eeh_pseries.c 
b/arch/powerpc/platforms/pseries/eeh_pseries.c
index 18a2522b9b5e..088771fa38be 100644
--- a/arch/powerpc/platforms/pseries/eeh_pseries.c
+++ b/arch/powerpc/platforms/pseries/eeh_pseries.c
@@ -32,6 +32,8 @@
 #include 
 #include 
 
+static int pseries_eeh_get_pe_addr(struct pci_dn *pdn);
+
 /* RTAS tokens */
 static int ibm_set_eeh_option;
 static int ibm_set_slot_reset;
@@ -301,7 +303,7 @@ void pseries_eeh_init_edev(struct pci_dn *pdn)
eeh_edev_dbg(edev, "EEH failed to enable on device (code 
%d)\n", ret);
} else {
/* Retrieve PE address */
-   edev->pe_config_addr = eeh_ops->get_pe_addr();
+   edev->pe_config_addr = pseries_eeh_get_pe_addr(pdn);
pe.addr = edev->pe_config_addr;
 
/* Some older systems (Power4) allow the ibm,set-eeh-option
@@ -431,8 +433,10 @@ static int pseries_eeh_set_option(struct eeh_pe *pe, int 
option)
  * It's notable that zero'ed return value means invalid PE config
  * address.
  */
-static int pseries_eeh_get_pe_addr(struct eeh_pe *pe)
+static int pseries_eeh_get_pe_addr(struct pci_dn *pdn)
 {
+   int config_addr = rtas_config_addr(pdn->busno, pdn->devfn, 0);
+   int buid = pdn->phb->buid;
int ret = 0;
int rets[3];
 
@@ -443,18 +447,16 @@ static int pseries_eeh_get_pe_addr(struct eeh_pe *pe)
 * meaningless.
 */
ret = rtas_call(ibm_get_config_addr_info2, 4, 2, rets,
-   pe->config_addr, BUID_HI(pe->phb->buid),
-   BUID_LO(pe->phb->buid), 1);
+   config_addr, BUID_HI(buid), BUID_LO(buid), 1);
if (ret || (rets[0] == 0))
return 0;
 
/* Retrieve the associated PE config address */
ret = rtas_call(ibm_get_config_addr_info2, 4, 2, rets,
-   pe->config_addr, BUID_HI(pe->phb->buid),
-   BUID_LO(pe->phb->buid), 0);
+   config_addr, BUID_HI(buid), BUID_LO(buid), 0);
if (ret) {
pr_warn("%s: Failed to get address for PHB#%x-PE#%x\n",
-   __func__, pe->phb->global_number, 
pe->config_addr);
+   __func__, pdn->phb->global_number, config_addr);
return 0;
}
 
@@ -463,11 +465,10 @@ static int pseries_eeh_get_pe_addr(struct eeh_pe *pe)
 
if (ibm_get_config_addr_info != RTAS_UNKNOWN_SERVICE) {
ret = 

[PATCH 04/14] powerpc/pseries: Stop using pdn->pe_number

2020-07-05 Thread Oliver O'Halloran
The pci_dn->pe_number field is mainly used to track the IODA PE number of a
device on PowerNV. At some point it grew a user in the pseries SR-IOV
support which muddies the waters a bit, so remove it.

Signed-off-by: Oliver O'Halloran 
---
 arch/powerpc/platforms/pseries/eeh_pseries.c | 10 --
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/eeh_pseries.c 
b/arch/powerpc/platforms/pseries/eeh_pseries.c
index ace117f99d94..18a2522b9b5e 100644
--- a/arch/powerpc/platforms/pseries/eeh_pseries.c
+++ b/arch/powerpc/platforms/pseries/eeh_pseries.c
@@ -52,8 +52,6 @@ void pseries_pcibios_bus_add_device(struct pci_dev *pdev)
dev_dbg(>dev, "EEH: Setting up device\n");
 #ifdef CONFIG_PCI_IOV
if (pdev->is_virtfn) {
-   struct pci_dn *physfn_pdn;
-
pdn->device_id  =  pdev->device;
pdn->vendor_id  =  pdev->vendor;
pdn->class_code =  pdev->class;
@@ -63,8 +61,6 @@ void pseries_pcibios_bus_add_device(struct pci_dev *pdev)
 * completion from platform.
 */
pdn->last_allow_rc =  0;
-   physfn_pdn  =  pci_get_pdn(pdev->physfn);
-   pdn->pe_number  =  physfn_pdn->pe_num_map[pdn->vf_index];
}
 #endif
pseries_eeh_init_edev(pdn);
@@ -772,8 +768,8 @@ int pseries_send_allow_unfreeze(struct pci_dn *pdn,
 
 static int pseries_call_allow_unfreeze(struct eeh_dev *edev)
 {
+   int cur_vfs = 0, rc = 0, vf_index, bus, devfn, vf_pe_num;
struct pci_dn *pdn, *tmp, *parent, *physfn_pdn;
-   int cur_vfs = 0, rc = 0, vf_index, bus, devfn;
u16 *vf_pe_array;
 
vf_pe_array = kzalloc(RTAS_DATA_BUF_SIZE, GFP_KERNEL);
@@ -806,8 +802,10 @@ static int pseries_call_allow_unfreeze(struct eeh_dev 
*edev)
}
} else {
pdn = pci_get_pdn(edev->pdev);
-   vf_pe_array[0] = cpu_to_be16(pdn->pe_number);
physfn_pdn = pci_get_pdn(edev->physfn);
+
+   vf_pe_num = physfn_pdn->pe_num_map[edev->vf_index];
+   vf_pe_array[0] = cpu_to_be16(vf_pe_num);
rc = pseries_send_allow_unfreeze(physfn_pdn,
 vf_pe_array, 1);
pdn->last_allow_rc = rc;
-- 
2.26.2



[PATCH 03/14] powerpc/eeh: Move vf_index out of pci_dn and into eeh_dev

2020-07-05 Thread Oliver O'Halloran
Drivers that do not support the PCI error handling callbacks are handled by
tearing down the device and re-probing it. If the device to be removed is
a virtual function we need to know the index of the VF so that we can
remove it with the pci_iov_{add|remove}_virtfn() API.

Currently this is handled by looking up the pci_dn, and using the vf_index
that was stashed there when the pci_dn for the VF was created in
pcibios_sriov_enable(). We would like to eliminate the use of pci_dn
outside of pseries though so we need to provide the generic EEH code with
some other way to find the vf_index.

The easiest thing to do here is move the vf_index field out of pci_dn and
into eeh_dev.  Currently pci_dn and eeh_dev are allocated and initialized
together so this is a fairly minimal change in preparation for splitting
pci_dn and eeh_dev in the future.

Signed-off-by: Oliver O'Halloran 
---
 arch/powerpc/include/asm/eeh.h| 3 +++
 arch/powerpc/include/asm/pci-bridge.h | 1 -
 arch/powerpc/kernel/eeh_driver.c  | 6 ++
 arch/powerpc/kernel/pci_dn.c  | 7 ---
 4 files changed, 9 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
index e22881a0c415..3d648e042835 100644
--- a/arch/powerpc/include/asm/eeh.h
+++ b/arch/powerpc/include/asm/eeh.h
@@ -148,7 +148,10 @@ struct eeh_dev {
struct pci_dn *pdn; /* Associated PCI device node   */
struct pci_dev *pdev;   /* Associated PCI device*/
bool in_error;  /* Error flag for edev  */
+
+   /* VF specific properties */
struct pci_dev *physfn; /* Associated SRIOV PF  */
+   int vf_index;   /* Index of this VF */
 };
 
 /* "fmt" must be a simple literal string */
diff --git a/arch/powerpc/include/asm/pci-bridge.h 
b/arch/powerpc/include/asm/pci-bridge.h
index b92e81b256e5..d2a2a14e56f9 100644
--- a/arch/powerpc/include/asm/pci-bridge.h
+++ b/arch/powerpc/include/asm/pci-bridge.h
@@ -202,7 +202,6 @@ struct pci_dn {
 #define IODA_INVALID_PE0x
unsigned int pe_number;
 #ifdef CONFIG_PCI_IOV
-   int vf_index;   /* VF index in the PF */
u16 vfs_expanded;   /* number of VFs IOV BAR expanded */
u16 num_vfs;/* number of VFs enabled*/
unsigned int *pe_num_map;   /* PE# for the first VF PE or array */
diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c
index 7b048cee767c..b70b9273f45a 100644
--- a/arch/powerpc/kernel/eeh_driver.c
+++ b/arch/powerpc/kernel/eeh_driver.c
@@ -477,7 +477,7 @@ static void *eeh_add_virt_device(struct eeh_dev *edev)
}
 
 #ifdef CONFIG_PCI_IOV
-   pci_iov_add_virtfn(edev->physfn, eeh_dev_to_pdn(edev)->vf_index);
+   pci_iov_add_virtfn(edev->physfn, edev->vf_index);
 #endif
return NULL;
 }
@@ -521,9 +521,7 @@ static void eeh_rmv_device(struct eeh_dev *edev, void 
*userdata)
 
if (edev->physfn) {
 #ifdef CONFIG_PCI_IOV
-   struct pci_dn *pdn = eeh_dev_to_pdn(edev);
-
-   pci_iov_remove_virtfn(edev->physfn, pdn->vf_index);
+   pci_iov_remove_virtfn(edev->physfn, edev->vf_index);
edev->pdev = NULL;
 #endif
if (rmv_data)
diff --git a/arch/powerpc/kernel/pci_dn.c b/arch/powerpc/kernel/pci_dn.c
index f790a8d06f50..bf11ac8427ac 100644
--- a/arch/powerpc/kernel/pci_dn.c
+++ b/arch/powerpc/kernel/pci_dn.c
@@ -146,7 +146,6 @@ static struct eeh_dev *eeh_dev_init(struct pci_dn *pdn)
 
 #ifdef CONFIG_PCI_IOV
 static struct pci_dn *add_one_sriov_vf_pdn(struct pci_dn *parent,
-  int vf_index,
   int busno, int devfn)
 {
struct pci_dn *pdn;
@@ -163,7 +162,6 @@ static struct pci_dn *add_one_sriov_vf_pdn(struct pci_dn 
*parent,
pdn->parent = parent;
pdn->busno = busno;
pdn->devfn = devfn;
-   pdn->vf_index = vf_index;
pdn->pe_number = IODA_INVALID_PE;
INIT_LIST_HEAD(>child_list);
INIT_LIST_HEAD(>list);
@@ -194,7 +192,7 @@ struct pci_dn *add_sriov_vf_pdns(struct pci_dev *pdev)
for (i = 0; i < pci_sriov_get_totalvfs(pdev); i++) {
struct eeh_dev *edev __maybe_unused;
 
-   pdn = add_one_sriov_vf_pdn(parent, i,
+   pdn = add_one_sriov_vf_pdn(parent,
   pci_iov_virtfn_bus(pdev, i),
   pci_iov_virtfn_devfn(pdev, i));
if (!pdn) {
@@ -207,7 +205,10 @@ struct pci_dn *add_sriov_vf_pdns(struct pci_dev *pdev)
/* Create the EEH device for the VF */
edev = eeh_dev_init(pdn);
BUG_ON(!edev);
+
+   /* FIXME: these should probably be populated by the EEH probe */
edev->physfn = pdev;
+  

[PATCH 02/14] powerpc/eeh: Remove eeh_dev.c

2020-07-05 Thread Oliver O'Halloran
The only thing in this file is eeh_dev_init(), which allocates and
initialises an eeh_dev based on a pci_dn. This is only ever called from
pci_dn.c so move it there and remove the file.

Signed-off-by: Oliver O'Halloran 
---
 arch/powerpc/include/asm/eeh.h |  6 
 arch/powerpc/kernel/Makefile   |  2 +-
 arch/powerpc/kernel/eeh_dev.c  | 54 --
 arch/powerpc/kernel/pci_dn.c   | 20 +
 4 files changed, 21 insertions(+), 61 deletions(-)
 delete mode 100644 arch/powerpc/kernel/eeh_dev.c

diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
index 646307481493..e22881a0c415 100644
--- a/arch/powerpc/include/asm/eeh.h
+++ b/arch/powerpc/include/asm/eeh.h
@@ -293,7 +293,6 @@ void eeh_pe_restore_bars(struct eeh_pe *pe);
 const char *eeh_pe_loc_get(struct eeh_pe *pe);
 struct pci_bus *eeh_pe_bus_get(struct eeh_pe *pe);
 
-struct eeh_dev *eeh_dev_init(struct pci_dn *pdn);
 void eeh_show_enabled(void);
 int __init eeh_ops_register(struct eeh_ops *ops);
 int __exit eeh_ops_unregister(const char *name);
@@ -339,11 +338,6 @@ static inline bool eeh_enabled(void)
 
 static inline void eeh_show_enabled(void) { }
 
-static inline void *eeh_dev_init(struct pci_dn *pdn, void *data)
-{
-   return NULL;
-}
-
 static inline void eeh_dev_phb_init_dynamic(struct pci_controller *phb) { }
 
 static inline int eeh_check_failure(const volatile void __iomem *token)
diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
index 244542ae2a91..c5211bdcf1b6 100644
--- a/arch/powerpc/kernel/Makefile
+++ b/arch/powerpc/kernel/Makefile
@@ -71,7 +71,7 @@ obj-$(CONFIG_PPC_RTAS_DAEMON) += rtasd.o
 obj-$(CONFIG_RTAS_FLASH)   += rtas_flash.o
 obj-$(CONFIG_RTAS_PROC)+= rtas-proc.o
 obj-$(CONFIG_PPC_DT_CPU_FTRS)  += dt_cpu_ftrs.o
-obj-$(CONFIG_EEH)  += eeh.o eeh_pe.o eeh_dev.o eeh_cache.o \
+obj-$(CONFIG_EEH)  += eeh.o eeh_pe.o eeh_cache.o \
  eeh_driver.o eeh_event.o eeh_sysfs.o
 obj-$(CONFIG_GENERIC_TBSYNC)   += smp-tbsync.o
 obj-$(CONFIG_CRASH_DUMP)   += crash_dump.o
diff --git a/arch/powerpc/kernel/eeh_dev.c b/arch/powerpc/kernel/eeh_dev.c
deleted file mode 100644
index 8e159a12f10c..
--- a/arch/powerpc/kernel/eeh_dev.c
+++ /dev/null
@@ -1,54 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0-or-later
-/*
- * The file intends to implement dynamic creation of EEH device, which will
- * be bound with OF node and PCI device simutaneously. The EEH devices would
- * be foundamental information for EEH core components to work proerly. Besides,
- * We have to support multiple situations where dynamic creation of EEH device
- * is required:
- *
- * 1) Before PCI emunation starts, we need create EEH devices according to the
- *PCI sensitive OF nodes.
- * 2) When PCI emunation is done, we need do the binding between PCI device and
- *the associated EEH device.
- * 3) DR (Dynamic Reconfiguration) would create PCI sensitive OF node. EEH device
- *will be created while PCI sensitive OF node is detected from DR.
- * 4) PCI hotplug needs redoing the binding between PCI device and EEH device. If
- *PHB is newly inserted, we also need create EEH devices accordingly.
- *
- * Copyright Benjamin Herrenschmidt & Gavin Shan, IBM Corporation 2012.
- */
-
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-
-#include 
-#include 
-
-/**
- * eeh_dev_init - Create EEH device according to OF node
- * @pdn: PCI device node
- *
- * It will create EEH device according to the given OF node. The function
- * might be called by PCI emunation, DR, PHB hotplug.
- */
-struct eeh_dev *eeh_dev_init(struct pci_dn *pdn)
-{
-   struct eeh_dev *edev;
-
-   /* Allocate EEH device */
-   edev = kzalloc(sizeof(*edev), GFP_KERNEL);
-   if (!edev)
-   return NULL;
-
-   /* Associate EEH device with OF node */
-   pdn->edev = edev;
-   edev->pdn = pdn;
-   edev->bdfn = (pdn->busno << 8) | pdn->devfn;
-   edev->controller = pdn->phb;
-
-   return edev;
-}
diff --git a/arch/powerpc/kernel/pci_dn.c b/arch/powerpc/kernel/pci_dn.c
index 4e654df55969..f790a8d06f50 100644
--- a/arch/powerpc/kernel/pci_dn.c
+++ b/arch/powerpc/kernel/pci_dn.c
@@ -124,6 +124,26 @@ struct pci_dn *pci_get_pdn(struct pci_dev *pdev)
return NULL;
 }
 
+#ifdef CONFIG_EEH
+static struct eeh_dev *eeh_dev_init(struct pci_dn *pdn)
+{
+   struct eeh_dev *edev;
+
+   /* Allocate EEH device */
+   edev = kzalloc(sizeof(*edev), GFP_KERNEL);
+   if (!edev)
+   return NULL;
+
+   /* Associate EEH device with OF node */
+   pdn->edev = edev;
+   edev->pdn = pdn;
+   edev->bdfn = (pdn->busno << 8) | pdn->devfn;
+   edev->controller = pdn->phb;
+
+   return edev;
+}
+#endif /* CONFIG_EEH */
+
 #ifdef CONFIG_PCI_IOV
 static struct pci_dn *add_one_sriov_vf_pdn(struct pci_dn *parent,
  

[PATCH 01/14] powerpc/eeh: Remove eeh_dev_phb_init_dynamic()

2020-07-05 Thread Oliver O'Halloran
This function is a one line wrapper around eeh_phb_pe_create() and despite
the name it doesn't create any eeh_dev structures. Replace it with direct
calls to eeh_phb_pe_create() since that does what it says on the tin
and removes a layer of indirection.

Signed-off-by: Oliver O'Halloran 
---
 arch/powerpc/include/asm/eeh.h |  1 -
 arch/powerpc/kernel/eeh.c  |  2 +-
 arch/powerpc/kernel/eeh_dev.c  | 13 -
 arch/powerpc/kernel/of_platform.c  |  4 ++--
 arch/powerpc/platforms/pseries/pci_dlpar.c |  2 +-
 5 files changed, 4 insertions(+), 18 deletions(-)

diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
index 964a54292b36..646307481493 100644
--- a/arch/powerpc/include/asm/eeh.h
+++ b/arch/powerpc/include/asm/eeh.h
@@ -294,7 +294,6 @@ const char *eeh_pe_loc_get(struct eeh_pe *pe);
 struct pci_bus *eeh_pe_bus_get(struct eeh_pe *pe);
 
 struct eeh_dev *eeh_dev_init(struct pci_dn *pdn);
-void eeh_dev_phb_init_dynamic(struct pci_controller *phb);
 void eeh_show_enabled(void);
 int __init eeh_ops_register(struct eeh_ops *ops);
 int __exit eeh_ops_unregister(const char *name);
diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
index d407981dec76..859f76020256 100644
--- a/arch/powerpc/kernel/eeh.c
+++ b/arch/powerpc/kernel/eeh.c
@@ -1096,7 +1096,7 @@ static int eeh_init(void)
 
/* Initialize PHB PEs */
	list_for_each_entry_safe(hose, tmp, &hose_list, list_node)
-   eeh_dev_phb_init_dynamic(hose);
+   eeh_phb_pe_create(hose);
 
eeh_addr_cache_init();
 
diff --git a/arch/powerpc/kernel/eeh_dev.c b/arch/powerpc/kernel/eeh_dev.c
index 7370185c7a05..8e159a12f10c 100644
--- a/arch/powerpc/kernel/eeh_dev.c
+++ b/arch/powerpc/kernel/eeh_dev.c
@@ -52,16 +52,3 @@ struct eeh_dev *eeh_dev_init(struct pci_dn *pdn)
 
return edev;
 }
-
-/**
- * eeh_dev_phb_init_dynamic - Create EEH devices for devices included in PHB
- * @phb: PHB
- *
- * Scan the PHB OF node and its child association, then create the
- * EEH devices accordingly
- */
-void eeh_dev_phb_init_dynamic(struct pci_controller *phb)
-{
-   /* EEH PE for PHB */
-   eeh_phb_pe_create(phb);
-}
diff --git a/arch/powerpc/kernel/of_platform.c b/arch/powerpc/kernel/of_platform.c
index 71a3f97dc988..f89376ff633e 100644
--- a/arch/powerpc/kernel/of_platform.c
+++ b/arch/powerpc/kernel/of_platform.c
@@ -62,8 +62,8 @@ static int of_pci_phb_probe(struct platform_device *dev)
/* Init pci_dn data structures */
pci_devs_phb_init_dynamic(phb);
 
-   /* Create EEH PEs for the PHB */
-   eeh_dev_phb_init_dynamic(phb);
+   /* Create EEH PE for the PHB */
+   eeh_phb_pe_create(phb);
 
/* Scan the bus */
pcibios_scan_phb(phb);
diff --git a/arch/powerpc/platforms/pseries/pci_dlpar.c b/arch/powerpc/platforms/pseries/pci_dlpar.c
index b3a38f5a6b68..f9ae17e8a0f4 100644
--- a/arch/powerpc/platforms/pseries/pci_dlpar.c
+++ b/arch/powerpc/platforms/pseries/pci_dlpar.c
@@ -34,7 +34,7 @@ struct pci_controller *init_phb_dynamic(struct device_node *dn)
pci_devs_phb_init_dynamic(phb);
 
/* Create EEH devices for the PHB */
-   eeh_dev_phb_init_dynamic(phb);
+   eeh_phb_pe_create(phb);
 
if (dn->child)
pseries_eeh_init_edev_recursive(PCI_DN(dn));
-- 
2.26.2



EEH core pci_dn de-lousing

2020-07-05 Thread Oliver O'Halloran
Removes most of the uses of pci_dn in the EEH core. There are a few
stragglers remaining in the pseries-specific bits of kernel/eeh*.c, mainly
the support for "open sriov" where the hypervisor allows the guest
to manage SR-IOV physical functions. We can largely ignore that on
non-pseries platforms though.

There'll be a follow up to this which actually removes the use of pci_dn
from PowerNV entirely and we can start looking at properly supporting
native PCIe. At last.

Oliver





Re: [RFC PATCH 4/5] powerpc/mm: Remove custom stack expansion checking

2020-07-05 Thread Nicholas Piggin
Excerpts from Christophe Leroy's message of July 6, 2020 3:49 am:
> 
> 
> On 03/07/2020 at 16:13, Michael Ellerman wrote:
>> We have powerpc specific logic in our page fault handling to decide if
>> an access to an unmapped address below the stack pointer should expand
>> the stack VMA.
>> 
>> The logic aims to prevent userspace from doing bad accesses below the
>> stack pointer. However as long as the stack is < 1MB in size, we allow
>> all accesses without further checks. Adding some debug I see that I
>> can do a full kernel build and LTP run, and not a single process has
>> used more than 1MB of stack. So for the majority of processes the
>> logic never even fires.
>> 
>> We also recently found a nasty bug in this code which could cause
>> userspace programs to be killed during signal delivery. It went
>> unnoticed presumably because most processes use < 1MB of stack.
>> 
>> The generic mm code has also grown support for stack guard pages since
>> this code was originally written, so the most heinous case of the
>> stack expanding into other mappings is now handled for us.
>> 
>> Finally although some other arches have special logic in this path,
>> from what I can tell none of x86, arm64, arm and s390 impose any extra
>> checks other than those in expand_stack().
>> 
>> So drop our complicated logic and like other architectures just let
>> the stack expand as long as its within the rlimit.
> 
> I agree that's probably not worth a so complicated logic that is nowhere 
> documented.

Agreed.

>> @@ -569,30 +488,15 @@ static int __do_page_fault(struct pt_regs *regs, unsigned long address,
>>  vma = find_vma(mm, address);
>>  if (unlikely(!vma))
>>  return bad_area(regs, address);
>> -if (likely(vma->vm_start <= address))
>> -goto good_area;
>> -if (unlikely(!(vma->vm_flags & VM_GROWSDOWN)))
>> -return bad_area(regs, address);
>>   
>> -/* The stack is being expanded, check if it's valid */
>> -if (unlikely(bad_stack_expansion(regs, address, vma, flags,
>> -  &must_retry))) {
>> -if (!must_retry)
>> +if (unlikely(vma->vm_start > address)) {
>> +if (unlikely(!(vma->vm_flags & VM_GROWSDOWN)))
> 
> We are already in an unlikely() branch, I don't think it is worth having 
> a second level of unlikely(), better let gcc decide what's most efficient.

I'm not sure being nested matters. It does in terms of how the code is 
generated and how much it might actually matter, but if we say we 
optimise the expand stack case rather than the segfault case, then 
unlikely is fine here. I find it can be a readability aid as well.

Thanks,
Nick


Re: [PATCH V4 0/4] mm/debug_vm_pgtable: Add some more tests

2020-07-05 Thread Anshuman Khandual


On 07/06/2020 06:18 AM, Anshuman Khandual wrote:
> This series adds some more arch page table helper validation tests which
> are related to core and advanced memory functions. This also adds
> documentation listing the expected semantics for all page table helpers as
> suggested by Mike Rapoport previously (https://lkml.org/lkml/2020/1/30/40).
> 
> There are many TRANSPARENT_HUGEPAGE and ARCH_HAS_TRANSPARENT_HUGEPAGE_PUD
> ifdefs scattered across the test. But consolidating all the fallback stubs
> is not very straightforward because ARCH_HAS_TRANSPARENT_HUGEPAGE_PUD is
> not explicitly dependent on ARCH_HAS_TRANSPARENT_HUGEPAGE.
> 
> Tested on arm64, x86 platforms but only build tested on all other enabled
> platforms through ARCH_HAS_DEBUG_VM_PGTABLE i.e powerpc, arc, s390. The
> following failure on arm64 still exists which was mentioned previously. It
> will be fixed with the upcoming THP migration on arm64 enablement series.
> 
> WARNING  mm/debug_vm_pgtable.c:860 debug_vm_pgtable+0x940/0xa54
> WARN_ON(!pmd_present(pmd_mkinvalid(pmd_mkhuge(pmd
> 
> This series is based on v5.8-rc4.
> 
> Changes in V4:
> 
> - Replaced READ_ONCE() with ptep_get() while accessing PTE pointers per Christophe
> - Fixed function argument alignments per Christophe

+Cc - Zi Yan, Gerald Schaefer, Christophe Leroy

The Cc list was not complete. Please do let me know if you could
not retrieve all four patches of the series from the list.

- Anshuman


[PATCH V4 4/4] Documentation/mm: Add descriptions for arch page table helpers

2020-07-05 Thread Anshuman Khandual
This adds a specific description file for all arch page table helpers which
is in sync with the semantics being tested via CONFIG_DEBUG_VM_PGTABLE. All
future changes either to these descriptions here or the debug test should
always remain in sync.

Cc: Jonathan Corbet 
Cc: Andrew Morton 
Cc: Mike Rapoport 
Cc: Vineet Gupta 
Cc: Catalin Marinas 
Cc: Will Deacon 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Michael Ellerman 
Cc: Heiko Carstens 
Cc: Vasily Gorbik 
Cc: Christian Borntraeger 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Borislav Petkov 
Cc: "H. Peter Anvin" 
Cc: Kirill A. Shutemov 
Cc: Paul Walmsley 
Cc: Palmer Dabbelt 
Cc: linux-snps-...@lists.infradead.org
Cc: linux-arm-ker...@lists.infradead.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-s...@vger.kernel.org
Cc: linux-ri...@lists.infradead.org
Cc: x...@kernel.org
Cc: linux-a...@vger.kernel.org
Cc: linux...@kvack.org
Cc: linux-...@vger.kernel.org
Cc: linux-ker...@vger.kernel.org
Acked-by: Mike Rapoport 
Suggested-by: Mike Rapoport 
Signed-off-by: Anshuman Khandual 
---
 Documentation/vm/arch_pgtable_helpers.rst | 258 ++
 mm/debug_vm_pgtable.c |   6 +
 2 files changed, 264 insertions(+)
 create mode 100644 Documentation/vm/arch_pgtable_helpers.rst

diff --git a/Documentation/vm/arch_pgtable_helpers.rst b/Documentation/vm/arch_pgtable_helpers.rst
new file mode 100644
index ..cd7609b05446
--- /dev/null
+++ b/Documentation/vm/arch_pgtable_helpers.rst
@@ -0,0 +1,258 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+.. _arch_page_table_helpers:
+
+===============================
+Architecture Page Table Helpers
+===============================
+
+Generic MM expects architectures (with MMU) to provide helpers to create, access
+and modify page table entries at various level for different memory functions.
+These page table helpers need to conform to a common semantics across platforms.
+Following tables describe the expected semantics which can also be tested during
+boot via CONFIG_DEBUG_VM_PGTABLE option. All future changes in here or the debug
+test need to be in sync.
+
+======================
+PTE Page Table Helpers
+======================
+
++--------------------+---------------------------------------------+
+| pte_same           | Tests whether both PTE entries are the same |
++--------------------+---------------------------------------------+
+| pte_bad            | Tests a non-table mapped PTE                |
++--------------------+---------------------------------------------+
+| pte_present        | Tests a valid mapped PTE                    |
++--------------------+---------------------------------------------+
+| pte_young          | Tests a young PTE                           |
++--------------------+---------------------------------------------+
+| pte_dirty          | Tests a dirty PTE                           |
++--------------------+---------------------------------------------+
+| pte_write          | Tests a writable PTE                        |
++--------------------+---------------------------------------------+
+| pte_special        | Tests a special PTE                         |
++--------------------+---------------------------------------------+
+| pte_protnone       | Tests a PROT_NONE PTE                       |
++--------------------+---------------------------------------------+
+| pte_devmap         | Tests a ZONE_DEVICE mapped PTE              |
++--------------------+---------------------------------------------+
+| pte_soft_dirty     | Tests a soft dirty PTE                      |
++--------------------+---------------------------------------------+
+| pte_swp_soft_dirty | Tests a soft dirty swapped PTE              |
++--------------------+---------------------------------------------+
+| pte_mkyoung        | Creates a young PTE                         |
++--------------------+---------------------------------------------+
+| pte_mkold          | Creates an old PTE                          |
++--------------------+---------------------------------------------+
+| pte_mkdirty        | Creates a dirty PTE                         |
++--------------------+---------------------------------------------+
+| pte_mkclean        | Creates a clean PTE                         |
++--------------------+---------------------------------------------+
+| pte_mkwrite        | Creates a writable PTE                      |
++--------------------+---------------------------------------------+
+| pte_mkwrprotect    | Creates a write protected PTE               |
++--------------------+---------------------------------------------+

[PATCH V4 3/4] mm/debug_vm_pgtable: Add debug prints for individual tests

2020-07-05 Thread Anshuman Khandual
This adds debug print information that lists all the tests being executed
on a given platform. With dynamic debug enabled, the following information
will be printed during boot. For compactness, both the time stamp and the
prefix (i.e. debug_vm_pgtable) have been dropped from this sample output.

[debug_vm_pgtable  ]: Validating architecture page table helpers
[pte_basic_tests   ]: Validating PTE basic
[pmd_basic_tests   ]: Validating PMD basic
[p4d_basic_tests   ]: Validating P4D basic
[pgd_basic_tests   ]: Validating PGD basic
[pte_clear_tests   ]: Validating PTE clear
[pmd_clear_tests   ]: Validating PMD clear
[pte_advanced_tests]: Validating PTE advanced
[pmd_advanced_tests]: Validating PMD advanced
[hugetlb_advanced_tests]: Validating HugeTLB advanced
[pmd_leaf_tests]: Validating PMD leaf
[pmd_huge_tests]: Validating PMD huge
[pte_savedwrite_tests  ]: Validating PTE saved write
[pmd_savedwrite_tests  ]: Validating PMD saved write
[pmd_populate_tests]: Validating PMD populate
[pte_special_tests ]: Validating PTE special
[pte_protnone_tests]: Validating PTE protnone
[pmd_protnone_tests]: Validating PMD protnone
[pte_devmap_tests  ]: Validating PTE devmap
[pmd_devmap_tests  ]: Validating PMD devmap
[pte_swap_tests]: Validating PTE swap
[swap_migration_tests  ]: Validating swap migration
[hugetlb_basic_tests   ]: Validating HugeTLB basic
[pmd_thp_tests ]: Validating PMD based THP
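
For anyone wanting to reproduce this output: the pr_debug() calls above are only emitted once dynamic debug is switched on for this file. Assuming a kernel built with CONFIG_DYNAMIC_DEBUG, the standard ways to enable them are (the file path comes from this patch; the rest is stock dynamic-debug usage):

```shell
# At boot, append to the kernel command line:
#   dyndbg="file mm/debug_vm_pgtable.c +p"

# Or at runtime, through debugfs:
echo 'file mm/debug_vm_pgtable.c +p' > /sys/kernel/debug/dynamic_debug/control
```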

Cc: Andrew Morton 
Cc: Mike Rapoport 
Cc: Vineet Gupta 
Cc: Catalin Marinas 
Cc: Will Deacon 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Michael Ellerman 
Cc: Heiko Carstens 
Cc: Vasily Gorbik 
Cc: Christian Borntraeger 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Borislav Petkov 
Cc: "H. Peter Anvin" 
Cc: Kirill A. Shutemov 
Cc: Paul Walmsley 
Cc: Palmer Dabbelt 
Cc: linux-snps-...@lists.infradead.org
Cc: linux-arm-ker...@lists.infradead.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-s...@vger.kernel.org
Cc: linux-ri...@lists.infradead.org
Cc: x...@kernel.org
Cc: linux...@kvack.org
Cc: linux-a...@vger.kernel.org
Cc: linux-ker...@vger.kernel.org
Tested-by: Vineet Gupta#arc
Signed-off-by: Anshuman Khandual 
---
 mm/debug_vm_pgtable.c | 46 ++-
 1 file changed, 45 insertions(+), 1 deletion(-)

diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
index dc72825f94a4..a9ae8cb7e832 100644
--- a/mm/debug_vm_pgtable.c
+++ b/mm/debug_vm_pgtable.c
@@ -8,7 +8,7 @@
  *
  * Author: Anshuman Khandual 
  */
-#define pr_fmt(fmt) "debug_vm_pgtable: %s: " fmt, __func__
+#define pr_fmt(fmt) "debug_vm_pgtable: [%-25s]: " fmt, __func__
 
 #include 
 #include 
@@ -48,6 +48,7 @@ static void __init pte_basic_tests(unsigned long pfn, pgprot_t prot)
 {
pte_t pte = pfn_pte(pfn, prot);
 
+   pr_debug("Validating PTE basic\n");
WARN_ON(!pte_same(pte, pte));
WARN_ON(!pte_young(pte_mkyoung(pte_mkold(pte;
WARN_ON(!pte_dirty(pte_mkdirty(pte_mkclean(pte;
@@ -64,6 +65,7 @@ static void __init pte_advanced_tests(struct mm_struct *mm,
 {
pte_t pte = pfn_pte(pfn, prot);
 
+   pr_debug("Validating PTE advanced\n");
pte = pfn_pte(pfn, prot);
set_pte_at(mm, vaddr, ptep, pte);
ptep_set_wrprotect(mm, vaddr, ptep);
@@ -103,6 +105,7 @@ static void __init pte_savedwrite_tests(unsigned long pfn, pgprot_t prot)
 {
pte_t pte = pfn_pte(pfn, prot);
 
+   pr_debug("Validating PTE saved write\n");
WARN_ON(!pte_savedwrite(pte_mk_savedwrite(pte_clear_savedwrite(pte;
WARN_ON(pte_savedwrite(pte_clear_savedwrite(pte_mk_savedwrite(pte;
 }
@@ -114,6 +117,7 @@ static void __init pmd_basic_tests(unsigned long pfn, pgprot_t prot)
if (!has_transparent_hugepage())
return;
 
+   pr_debug("Validating PMD basic\n");
WARN_ON(!pmd_same(pmd, pmd));
WARN_ON(!pmd_young(pmd_mkyoung(pmd_mkold(pmd;
WARN_ON(!pmd_dirty(pmd_mkdirty(pmd_mkclean(pmd;
@@ -138,6 +142,7 @@ static void __init pmd_advanced_tests(struct mm_struct *mm,
if (!has_transparent_hugepage())
return;
 
+   pr_debug("Validating PMD advanced\n");
/* Align the address wrt HPAGE_PMD_SIZE */
vaddr = (vaddr & HPAGE_PMD_MASK) + HPAGE_PMD_SIZE;
 
@@ -180,6 +185,7 @@ static void __init pmd_leaf_tests(unsigned long pfn, pgprot_t prot)
 {
pmd_t pmd = pfn_pmd(pfn, prot);
 
+   pr_debug("Validating PMD leaf\n");
/*
 * PMD based THP is a leaf entry.
 */
@@ -193,6 +199,8 @@ static void __init pmd_huge_tests(pmd_t *pmdp, unsigned long pfn, pgprot_t prot)
 
if (!IS_ENABLED(CONFIG_HAVE_ARCH_HUGE_VMAP))
return;
+
+   pr_debug("Validating PMD huge\n");
/*
 * X86 defined pmd_set_huge() verifies that the given
 * PMD is not a populated non-leaf entry.
@@ -208,6 +216,7 @@ static void __init 

[PATCH V4 2/4] mm/debug_vm_pgtable: Add tests validating advanced arch page table helpers

2020-07-05 Thread Anshuman Khandual
This adds new tests validating the following advanced arch page table
helpers. These tests create and test specific mapping types at various page
table levels.

1. pxxp_set_wrprotect()
2. pxxp_get_and_clear()
3. pxxp_set_access_flags()
4. pxxp_get_and_clear_full()
5. pxxp_test_and_clear_young()
6. pxx_leaf()
7. pxx_set_huge()
8. pxx_(clear|mk)_savedwrite()
9. huge_pxxp_xxx()

Cc: Andrew Morton 
Cc: Mike Rapoport 
Cc: Vineet Gupta 
Cc: Catalin Marinas 
Cc: Will Deacon 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Michael Ellerman 
Cc: Heiko Carstens 
Cc: Vasily Gorbik 
Cc: Christian Borntraeger 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Borislav Petkov 
Cc: "H. Peter Anvin" 
Cc: Kirill A. Shutemov 
Cc: Paul Walmsley 
Cc: Palmer Dabbelt 
Cc: linux-snps-...@lists.infradead.org
Cc: linux-arm-ker...@lists.infradead.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-s...@vger.kernel.org
Cc: linux-ri...@lists.infradead.org
Cc: x...@kernel.org
Cc: linux...@kvack.org
Cc: linux-a...@vger.kernel.org
Cc: linux-ker...@vger.kernel.org
Tested-by: Vineet Gupta#arc
Suggested-by: Catalin Marinas 
Signed-off-by: Anshuman Khandual 
---
 mm/debug_vm_pgtable.c | 312 ++
 1 file changed, 312 insertions(+)

diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
index 2fac47db3eb7..dc72825f94a4 100644
--- a/mm/debug_vm_pgtable.c
+++ b/mm/debug_vm_pgtable.c
@@ -21,6 +21,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -28,6 +29,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #define VMFLAGS(VM_READ|VM_WRITE|VM_EXEC)
 
@@ -55,6 +57,55 @@ static void __init pte_basic_tests(unsigned long pfn, pgprot_t prot)
WARN_ON(pte_write(pte_wrprotect(pte_mkwrite(pte;
 }
 
+static void __init pte_advanced_tests(struct mm_struct *mm,
+ struct vm_area_struct *vma, pte_t *ptep,
+ unsigned long pfn, unsigned long vaddr,
+ pgprot_t prot)
+{
+   pte_t pte = pfn_pte(pfn, prot);
+
+   pte = pfn_pte(pfn, prot);
+   set_pte_at(mm, vaddr, ptep, pte);
+   ptep_set_wrprotect(mm, vaddr, ptep);
+   pte = ptep_get(ptep);
+   WARN_ON(pte_write(pte));
+
+   pte = pfn_pte(pfn, prot);
+   set_pte_at(mm, vaddr, ptep, pte);
+   ptep_get_and_clear(mm, vaddr, ptep);
+   pte = ptep_get(ptep);
+   WARN_ON(!pte_none(pte));
+
+   pte = pfn_pte(pfn, prot);
+   pte = pte_wrprotect(pte);
+   pte = pte_mkclean(pte);
+   set_pte_at(mm, vaddr, ptep, pte);
+   pte = pte_mkwrite(pte);
+   pte = pte_mkdirty(pte);
+   ptep_set_access_flags(vma, vaddr, ptep, pte, 1);
+   pte = ptep_get(ptep);
+   WARN_ON(!(pte_write(pte) && pte_dirty(pte)));
+
+   pte = pfn_pte(pfn, prot);
+   set_pte_at(mm, vaddr, ptep, pte);
+   ptep_get_and_clear_full(mm, vaddr, ptep, 1);
+   pte = ptep_get(ptep);
+   WARN_ON(!pte_none(pte));
+
+   pte = pte_mkyoung(pte);
+   set_pte_at(mm, vaddr, ptep, pte);
+   ptep_test_and_clear_young(vma, vaddr, ptep);
+   pte = ptep_get(ptep);
+   WARN_ON(pte_young(pte));
+}
+
+static void __init pte_savedwrite_tests(unsigned long pfn, pgprot_t prot)
+{
+   pte_t pte = pfn_pte(pfn, prot);
+
+   WARN_ON(!pte_savedwrite(pte_mk_savedwrite(pte_clear_savedwrite(pte;
+   WARN_ON(pte_savedwrite(pte_clear_savedwrite(pte_mk_savedwrite(pte;
+}
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 static void __init pmd_basic_tests(unsigned long pfn, pgprot_t prot)
 {
@@ -77,6 +128,90 @@ static void __init pmd_basic_tests(unsigned long pfn, pgprot_t prot)
WARN_ON(!pmd_bad(pmd_mkhuge(pmd)));
 }
 
+static void __init pmd_advanced_tests(struct mm_struct *mm,
+ struct vm_area_struct *vma, pmd_t *pmdp,
+ unsigned long pfn, unsigned long vaddr,
+ pgprot_t prot)
+{
+   pmd_t pmd = pfn_pmd(pfn, prot);
+
+   if (!has_transparent_hugepage())
+   return;
+
+   /* Align the address wrt HPAGE_PMD_SIZE */
+   vaddr = (vaddr & HPAGE_PMD_MASK) + HPAGE_PMD_SIZE;
+
+   pmd = pfn_pmd(pfn, prot);
+   set_pmd_at(mm, vaddr, pmdp, pmd);
+   pmdp_set_wrprotect(mm, vaddr, pmdp);
+   pmd = READ_ONCE(*pmdp);
+   WARN_ON(pmd_write(pmd));
+
+   pmd = pfn_pmd(pfn, prot);
+   set_pmd_at(mm, vaddr, pmdp, pmd);
+   pmdp_huge_get_and_clear(mm, vaddr, pmdp);
+   pmd = READ_ONCE(*pmdp);
+   WARN_ON(!pmd_none(pmd));
+
+   pmd = pfn_pmd(pfn, prot);
+   pmd = pmd_wrprotect(pmd);
+   pmd = pmd_mkclean(pmd);
+   set_pmd_at(mm, vaddr, pmdp, pmd);
+   pmd = pmd_mkwrite(pmd);
+   pmd = pmd_mkdirty(pmd);
+   pmdp_set_access_flags(vma, vaddr, pmdp, pmd, 1);
+   pmd = READ_ONCE(*pmdp);
+   WARN_ON(!(pmd_write(pmd) && pmd_dirty(pmd)));
+
+   pmd = pmd_mkhuge(pfn_pmd(pfn, 

[PATCH V4 1/4] mm/debug_vm_pgtable: Add tests validating arch helpers for core MM features

2020-07-05 Thread Anshuman Khandual
This adds new tests validating arch page table helpers for the following
core memory features. These tests create and test specific mapping types at
various page table levels.

1. SPECIAL mapping
2. PROTNONE mapping
3. DEVMAP mapping
4. SOFTDIRTY mapping
5. SWAP mapping
6. MIGRATION mapping
7. HUGETLB mapping
8. THP mapping

Cc: Andrew Morton 
Cc: Mike Rapoport 
Cc: Vineet Gupta 
Cc: Catalin Marinas 
Cc: Will Deacon 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Michael Ellerman 
Cc: Heiko Carstens 
Cc: Vasily Gorbik 
Cc: Christian Borntraeger 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Borislav Petkov 
Cc: "H. Peter Anvin" 
Cc: Kirill A. Shutemov 
Cc: Paul Walmsley 
Cc: Palmer Dabbelt 
Cc: linux-snps-...@lists.infradead.org
Cc: linux-arm-ker...@lists.infradead.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-s...@vger.kernel.org
Cc: linux-ri...@lists.infradead.org
Cc: x...@kernel.org
Cc: linux...@kvack.org
Cc: linux-a...@vger.kernel.org
Cc: linux-ker...@vger.kernel.org
Tested-by: Vineet Gupta#arc
Reviewed-by: Zi Yan 
Suggested-by: Catalin Marinas 
Signed-off-by: Anshuman Khandual 
---
 mm/debug_vm_pgtable.c | 302 +-
 1 file changed, 301 insertions(+), 1 deletion(-)

diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
index 61ab16fb2e36..2fac47db3eb7 100644
--- a/mm/debug_vm_pgtable.c
+++ b/mm/debug_vm_pgtable.c
@@ -282,6 +282,278 @@ static void __init pmd_populate_tests(struct mm_struct *mm, pmd_t *pmdp,
WARN_ON(pmd_bad(pmd));
 }
 
+static void __init pte_special_tests(unsigned long pfn, pgprot_t prot)
+{
+   pte_t pte = pfn_pte(pfn, prot);
+
+   if (!IS_ENABLED(CONFIG_ARCH_HAS_PTE_SPECIAL))
+   return;
+
+   WARN_ON(!pte_special(pte_mkspecial(pte)));
+}
+
+static void __init pte_protnone_tests(unsigned long pfn, pgprot_t prot)
+{
+   pte_t pte = pfn_pte(pfn, prot);
+
+   if (!IS_ENABLED(CONFIG_NUMA_BALANCING))
+   return;
+
+   WARN_ON(!pte_protnone(pte));
+   WARN_ON(!pte_present(pte));
+}
+
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+static void __init pmd_protnone_tests(unsigned long pfn, pgprot_t prot)
+{
+   pmd_t pmd = pmd_mkhuge(pfn_pmd(pfn, prot));
+
+   if (!IS_ENABLED(CONFIG_NUMA_BALANCING))
+   return;
+
+   WARN_ON(!pmd_protnone(pmd));
+   WARN_ON(!pmd_present(pmd));
+}
+#else  /* !CONFIG_TRANSPARENT_HUGEPAGE */
+static void __init pmd_protnone_tests(unsigned long pfn, pgprot_t prot) { }
+#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
+
+#ifdef CONFIG_ARCH_HAS_PTE_DEVMAP
+static void __init pte_devmap_tests(unsigned long pfn, pgprot_t prot)
+{
+   pte_t pte = pfn_pte(pfn, prot);
+
+   WARN_ON(!pte_devmap(pte_mkdevmap(pte)));
+}
+
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+static void __init pmd_devmap_tests(unsigned long pfn, pgprot_t prot)
+{
+   pmd_t pmd = pfn_pmd(pfn, prot);
+
+   WARN_ON(!pmd_devmap(pmd_mkdevmap(pmd)));
+}
+
+#ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
+static void __init pud_devmap_tests(unsigned long pfn, pgprot_t prot)
+{
+   pud_t pud = pfn_pud(pfn, prot);
+
+   WARN_ON(!pud_devmap(pud_mkdevmap(pud)));
+}
+#else  /* !CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD */
+static void __init pud_devmap_tests(unsigned long pfn, pgprot_t prot) { }
+#endif /* CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD */
+#else  /* CONFIG_TRANSPARENT_HUGEPAGE */
+static void __init pmd_devmap_tests(unsigned long pfn, pgprot_t prot) { }
+static void __init pud_devmap_tests(unsigned long pfn, pgprot_t prot) { }
+#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
+#else
+static void __init pte_devmap_tests(unsigned long pfn, pgprot_t prot) { }
+static void __init pmd_devmap_tests(unsigned long pfn, pgprot_t prot) { }
+static void __init pud_devmap_tests(unsigned long pfn, pgprot_t prot) { }
+#endif /* CONFIG_ARCH_HAS_PTE_DEVMAP */
+
+static void __init pte_soft_dirty_tests(unsigned long pfn, pgprot_t prot)
+{
+   pte_t pte = pfn_pte(pfn, prot);
+
+   if (!IS_ENABLED(CONFIG_MEM_SOFT_DIRTY))
+   return;
+
+   WARN_ON(!pte_soft_dirty(pte_mksoft_dirty(pte)));
+   WARN_ON(pte_soft_dirty(pte_clear_soft_dirty(pte)));
+}
+
+static void __init pte_swap_soft_dirty_tests(unsigned long pfn, pgprot_t prot)
+{
+   pte_t pte = pfn_pte(pfn, prot);
+
+   if (!IS_ENABLED(CONFIG_MEM_SOFT_DIRTY))
+   return;
+
+   WARN_ON(!pte_swp_soft_dirty(pte_swp_mksoft_dirty(pte)));
+   WARN_ON(pte_swp_soft_dirty(pte_swp_clear_soft_dirty(pte)));
+}
+
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+static void __init pmd_soft_dirty_tests(unsigned long pfn, pgprot_t prot)
+{
+   pmd_t pmd = pfn_pmd(pfn, prot);
+
+   if (!IS_ENABLED(CONFIG_MEM_SOFT_DIRTY))
+   return;
+
+   WARN_ON(!pmd_soft_dirty(pmd_mksoft_dirty(pmd)));
+   WARN_ON(pmd_soft_dirty(pmd_clear_soft_dirty(pmd)));
+}
+
+static void __init pmd_swap_soft_dirty_tests(unsigned long pfn, pgprot_t prot)
+{
+   pmd_t pmd = pfn_pmd(pfn, prot);
+
+   if 

[PATCH V4 0/4] mm/debug_vm_pgtable: Add some more tests

2020-07-05 Thread Anshuman Khandual
This series adds some more arch page table helper validation tests which
are related to core and advanced memory functions. This also creates a
documentation, enlisting expected semantics for all page table helpers as
suggested by Mike Rapoport previously (https://lkml.org/lkml/2020/1/30/40).

There are many TRANSPARENT_HUGEPAGE and ARCH_HAS_TRANSPARENT_HUGEPAGE_PUD
ifdefs scattered across the test. But consolidating all the fallback stubs
is not very straightforward because ARCH_HAS_TRANSPARENT_HUGEPAGE_PUD is
not explicitly dependent on ARCH_HAS_TRANSPARENT_HUGEPAGE.

Tested on arm64, x86 platforms but only build tested on all other enabled
platforms through ARCH_HAS_DEBUG_VM_PGTABLE i.e powerpc, arc, s390. The
following failure on arm64 still exists which was mentioned previously. It
will be fixed with the upcoming THP migration on arm64 enablement series.

WARNING  mm/debug_vm_pgtable.c:860 debug_vm_pgtable+0x940/0xa54
WARN_ON(!pmd_present(pmd_mkinvalid(pmd_mkhuge(pmd

This series is based on v5.8-rc4.

Changes in V4:

- Replaced READ_ONCE() with ptep_get() while accessing PTE pointers per Christophe
- Fixed function argument alignments per Christophe

Changes in V3: (https://patchwork.kernel.org/project/linux-mm/list/?series=302483)

- Replaced HAVE_ARCH_SOFT_DIRTY with MEM_SOFT_DIRTY
- Added HAVE_ARCH_HUGE_VMAP checks in pxx_huge_tests() per Gerald
- Updated documentation for pmd_thp_tests() per Zi Yan
- Replaced READ_ONCE() with huge_ptep_get() per Gerald
- Added pte_mkhuge() and masking with PMD_MASK per Gerald
- Replaced pte_same() with holding pfn check in pxx_swap_tests()
- Added documentation for all (#ifdef #else #endif) per Gerald
- Updated pmd_protnone_tests() per Gerald
- Updated HugeTLB PTE creation in hugetlb_advanced_tests() per Gerald
- Replaced [pmd|pud]_mknotpresent() with [pmd|pud]_mkinvalid()
- Added has_transparent_hugepage() check for PMD and PUD tests
- Added a patch which debug prints all individual tests being executed
- Updated documentation for renamed [pmd|pud]_mkinvalid() helpers

Changes in V2: (https://patchwork.kernel.org/project/linux-mm/list/?series=260573)

- Dropped CONFIG_ARCH_HAS_PTE_SPECIAL per Christophe
- Dropped CONFIG_NUMA_BALANCING per Christophe
- Dropped CONFIG_HAVE_ARCH_SOFT_DIRTY per Christophe
- Dropped CONFIG_MIGRATION per Christophe
- Replaced CONFIG_S390 with __HAVE_ARCH_PMDP_INVALIDATE
- Moved page allocation & free inside swap_migration_tests() per Christophe
- Added CONFIG_TRANSPARENT_HUGEPAGE to protect pfn_pmd()
- Added CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD to protect pfn_pud()
- Added a patch for other arch advanced page table helper tests
- Added a patch creating a documentation for page table helper semantics

Changes in V1: (https://patchwork.kernel.org/patch/11408253/)

Cc: Jonathan Corbet 
Cc: Andrew Morton 
Cc: Mike Rapoport 
Cc: Vineet Gupta 
Cc: Catalin Marinas 
Cc: Will Deacon 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Michael Ellerman 
Cc: Heiko Carstens 
Cc: Vasily Gorbik 
Cc: Christian Borntraeger 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Borislav Petkov 
Cc: "H. Peter Anvin" 
Cc: Kirill A. Shutemov 
Cc: Paul Walmsley 
Cc: Palmer Dabbelt 
Cc: linux-snps-...@lists.infradead.org
Cc: linux-arm-ker...@lists.infradead.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-s...@vger.kernel.org
Cc: linux-ri...@lists.infradead.org
Cc: x...@kernel.org
Cc: linux...@kvack.org
Cc: linux-...@vger.kernel.org
Cc: linux-a...@vger.kernel.org
Cc: linux-ker...@vger.kernel.org


Anshuman Khandual (4):
  mm/debug_vm_pgtable: Add tests validating arch helpers for core MM features
  mm/debug_vm_pgtable: Add tests validating advanced arch page table helpers
  mm/debug_vm_pgtable: Add debug prints for individual tests
  Documentation/mm: Add descriptions for arch page table helpers

 Documentation/vm/arch_pgtable_helpers.rst | 258 +
 mm/debug_vm_pgtable.c | 666 +-
 2 files changed, 922 insertions(+), 2 deletions(-)
 create mode 100644 Documentation/vm/arch_pgtable_helpers.rst

-- 
2.20.1



Re: Using Firefox hangs system

2020-07-05 Thread Nicholas Piggin
Excerpts from Paul Menzel's message of July 5, 2020 8:30 pm:
> [Removed Rafael from CC]
> 
> Dear Linux folks,
> 
> 
> Am 05.07.20 um 11:22 schrieb Paul Menzel:
> 
>> With an IBM S822LC with Ubuntu 20.04, after updating to Firefox 78.0, 
>> using Firefox seems to hang the system. This happened with self-built 
>> Linux 5.7-rc5+ and now with 5.8-rc3+.
>> 
>> (At least I believe the Firefox update is causing this.)
>> 
>> Log in is impossible, and using the Serial over LAN over IPMI shows the 
>> messages below.
>> 
>>> [ 2620.579187] watchdog: BUG: soft lockup - CPU#125 stuck for 22s! 
>>> [swapper/125:0]
>>> [ 2620.579378] Modules linked in: tcp_diag inet_diag unix_diag 
>>> xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 
>>> xt_tcpudp ip6table_mangle ip6table_nat iptable_mangle iptable_nat 
>>> nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables nfnetlink 
>>> ip6table_filter ip6_tables iptable_filter bridge stp llc overlay xfs 
>>> kvm_hv kvm joydev binfmt_misc uas usb_storage vmx_crypto ofpart 
>>> cmdlinepart bnx2x powernv_flash mtd mdio crct10dif_vpmsum at24 
>>> ibmpowernv ipmi_powernv ipmi_devintf powernv_rng ipmi_msghandler 
>>> opal_prd sch_fq_codel parport_pc nfsd ppdev lp auth_rpcgss nfs_acl 
>>> parport lockd grace sunrpc ip_tables x_tables autofs4 btrfs 
>>> blake2b_generic libcrc32c xor zstd_compress raid6_pq input_leds 
>>> mac_hid hid_generic ast drm_vram_helper drm_ttm_helper i2c_algo_bit 
>>> ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm 
>>> drm_panel_orientation_quirks ahci libahci usbhid hid crc32c_vpmsum 
>>> uio_pdrv_genirq uio
>>> [ 2620.579537] CPU: 125 PID: 0 Comm: swapper/125 Tainted: G  D 
>>> W    L    5.8.0-rc3+ #1
>>> [ 2620.579552] NIP:  c10dad38 LR: c10dad30 CTR: 
>>> c0237830
>>> [ 2620.579568] REGS: c0ffcb8c7600 TRAP: 0900   Tainted: G  D 
>>> W    L (5.8.0-rc3+)
>>> [ 2620.579582] MSR:  90009033   CR: 
>>> 44004228  XER: 
>>> [ 2620.579599] CFAR: c10dad44 IRQMASK: 0 [ 2620.579599] GPR00: 
>>> c023718c c0ffcb8c7890 c1f9a900  [ 
>>> 2620.579599] GPR04: c1fce438 0078 00010008c1f2 
>>>  [ 2620.579599] GPR08: 00ffd96a 
>>> 8087  c1fd25e0 [ 2620.579599] 
>>> GPR12: 4400 c072f680 c1ea36d8 
>>> c0ffcb859800 [ 2620.579599] GPR16: c166c880 
>>> c16f8e00 000a c0ffcb859800 [ 2620.579599] 
>>> GPR20: 0100 c166c918 c1fd21e8 
>>> c0ffcb859800 [ 2620.579599] GPR24: 00ffd96a 
>>> c1d44b80 c1d53780 0008 [ 2620.579599] 
>>> GPR28: c1fd21e0 0001  
>>> c1d44b80 [ 2620.579711] NIP [c10dad38] 
>>> _raw_spin_lock_irqsave+0x98/0x120
>>> [ 2620.579724] LR [c10dad30] _raw_spin_lock_irqsave+0x90/0x120
>>> [ 2620.579737] Call Trace:
>>> [ 2620.579746] [c0ffcb8c7890] [c13c84a0] 
>>> ncsi_ops+0x209f50/0x2dc1d8 (unreliable)
>>> [ 2620.579763] [c0ffcb8c78d0] [c023718c] rcu_core+0xfc/0x7a0
>>> [ 2620.579777] [c0ffcb8c7970] [c10db81c] 
>>> __do_softirq+0x17c/0x534
>>> [ 2620.579791] [c0ffcb8c7aa0] [c01786f4] irq_exit+0xd4/0x130
>>> [ 2620.579805] [c0ffcb8c7ad0] [c0025eec] 
>>> timer_interrupt+0x13c/0x370
>>> [ 2620.579821] [c0ffcb8c7b40] [c00165c0] 
>>> replay_soft_interrupts+0x320/0x3f0
>>> [ 2620.579837] [c0ffcb8c7d30] [c00166d8] 
>>> arch_local_irq_restore+0x48/0xa0
>>> [ 2620.579853] [c0ffcb8c7d50] [c0de2fe0] 
>>> cpuidle_enter_state+0x100/0x780

[snip]

>> 
>> I have to warm reset the system to get it working again.
> 
> I am unable to reproduce this with Ubuntu’s Linux

Okay, not sure what that would be from, looks like RCU perhaps. Anyway 
if it comes up again, let us know.

> With Linux 5.8-rc3+, I got now the beginning of the Linux messages.
> 
>> [  572.253008] Oops: Exception in kernel mode, sig: 5 [#1]
>> [  572.253198] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA PowerNV
>> [  572.253232] Modules linked in: tcp_diag inet_diag unix_diag xt_CHECKSUM 
>> xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp 
>> ip6table_mangle ip6table_nat iptable_mangle iptable_nat nf_nat nf_conntrack 
>> nf_defrag_ipv6 nf_defrag_ipv4 nf_tables nfnetlink ip6table_filter ip6_tables 
>> iptable_filter bridge stp llc overlay xfs kvm_hv kvm binfmt_misc joydev uas 
>> usb_storage vmx_crypto bnx2x crct10dif_vpmsum ofpart cmdlinepart 
>> powernv_flash mtd mdio ibmpowernv at24 ipmi_powernv ipmi_devintf 
>> ipmi_msghandler opal_prd powernv_rng sch_fq_codel parport_pc ppdev lp nfsd 
>> parport auth_rpcgss nfs_acl lockd grace sunrpc ip_tables x_tables autofs4 
>> btrfs blake2b_generic libcrc32c xor zstd_compress raid6_pq input_leds 
>> mac_hid hid_generic ast drm_vram_helper drm_ttm_helper 

Re: [PATCH v2 5/6] powerpc/pseries: implement paravirt qspinlocks for SPLPAR

2020-07-05 Thread Nicholas Piggin
Excerpts from Waiman Long's message of July 6, 2020 5:00 am:
> On 7/3/20 3:35 AM, Nicholas Piggin wrote:
>> Signed-off-by: Nicholas Piggin 
>> ---
>>   arch/powerpc/include/asm/paravirt.h   | 28 ++
>>   arch/powerpc/include/asm/qspinlock.h  | 55 +++
>>   arch/powerpc/include/asm/qspinlock_paravirt.h |  5 ++
>>   arch/powerpc/platforms/pseries/Kconfig|  5 ++
>>   arch/powerpc/platforms/pseries/setup.c|  6 +-
>>   include/asm-generic/qspinlock.h   |  2 +
>>   6 files changed, 100 insertions(+), 1 deletion(-)
>>   create mode 100644 arch/powerpc/include/asm/qspinlock_paravirt.h
>>
>> diff --git a/arch/powerpc/include/asm/paravirt.h 
>> b/arch/powerpc/include/asm/paravirt.h
>> index 7a8546660a63..f2d51f929cf5 100644
>> --- a/arch/powerpc/include/asm/paravirt.h
>> +++ b/arch/powerpc/include/asm/paravirt.h
>> @@ -29,6 +29,16 @@ static inline void yield_to_preempted(int cpu, u32 
>> yield_count)
>>   {
>>  plpar_hcall_norets(H_CONFER, get_hard_smp_processor_id(cpu), 
>> yield_count);
>>   }
>> +
>> +static inline void prod_cpu(int cpu)
>> +{
>> +plpar_hcall_norets(H_PROD, get_hard_smp_processor_id(cpu));
>> +}
>> +
>> +static inline void yield_to_any(void)
>> +{
>> +plpar_hcall_norets(H_CONFER, -1, 0);
>> +}
>>   #else
>>   static inline bool is_shared_processor(void)
>>   {
>> @@ -45,6 +55,19 @@ static inline void yield_to_preempted(int cpu, u32 
>> yield_count)
>>   {
>>  ___bad_yield_to_preempted(); /* This would be a bug */
>>   }
>> +
>> +extern void ___bad_yield_to_any(void);
>> +static inline void yield_to_any(void)
>> +{
>> +___bad_yield_to_any(); /* This would be a bug */
>> +}
>> +
>> +extern void ___bad_prod_cpu(void);
>> +static inline void prod_cpu(int cpu)
>> +{
>> +___bad_prod_cpu(); /* This would be a bug */
>> +}
>> +
>>   #endif
>>   
>>   #define vcpu_is_preempted vcpu_is_preempted
>> @@ -57,5 +80,10 @@ static inline bool vcpu_is_preempted(int cpu)
>>  return false;
>>   }
>>   
>> +static inline bool pv_is_native_spin_unlock(void)
>> +{
>> + return !is_shared_processor();
>> +}
>> +
>>   #endif /* __KERNEL__ */
>>   #endif /* __ASM_PARAVIRT_H */
>> diff --git a/arch/powerpc/include/asm/qspinlock.h 
>> b/arch/powerpc/include/asm/qspinlock.h
>> index c49e33e24edd..0960a0de2467 100644
>> --- a/arch/powerpc/include/asm/qspinlock.h
>> +++ b/arch/powerpc/include/asm/qspinlock.h
>> @@ -3,9 +3,36 @@
>>   #define _ASM_POWERPC_QSPINLOCK_H
>>   
>>   #include 
>> +#include 
>>   
>>   #define _Q_PENDING_LOOPS   (1 << 9) /* not tuned */
>>   
>> +#ifdef CONFIG_PARAVIRT_SPINLOCKS
>> +extern void native_queued_spin_lock_slowpath(struct qspinlock *lock, u32 
>> val);
>> +extern void __pv_queued_spin_lock_slowpath(struct qspinlock *lock, u32 val);
>> +
>> +static __always_inline void queued_spin_lock_slowpath(struct qspinlock 
>> *lock, u32 val)
>> +{
>> +if (!is_shared_processor())
>> +native_queued_spin_lock_slowpath(lock, val);
>> +else
>> +__pv_queued_spin_lock_slowpath(lock, val);
>> +}
> 
> In a previous mail, I said that:

Hey, yeah I read that right after sending the series out. Thanks for the 
thorough review.

> You may need to match the use of __pv_queued_spin_lock_slowpath() with 
> the corresponding __pv_queued_spin_unlock(), e.g.
> 
> #define queued_spin_unlock queued_spin_unlock
> static inline void queued_spin_unlock(struct qspinlock *lock)
> {
>      if (!is_shared_processor())
>      smp_store_release(&lock->locked, 0);
>      else
>      __pv_queued_spin_unlock(lock);
> }
> 
> Otherwise, pv_kick() will never be called.
> 
> Maybe PowerPC HMT is different in that the shared cpus can still process 
> instructions, though more slowly, so that cpu kicking like what was done in 
> kvm is not really necessary. If that is the case, I think we should document 
> that.

It does stop dispatch, but it will wake up by itself after all other 
vCPUs have had a chance to dispatch. I will re-test with the fix in
place and see if there's any significant performance differences.

Thanks,
Nick



Re: [PATCH v2 5/6] powerpc/pseries: implement paravirt qspinlocks for SPLPAR

2020-07-05 Thread Waiman Long

On 7/3/20 3:35 AM, Nicholas Piggin wrote:

Signed-off-by: Nicholas Piggin 
---
  arch/powerpc/include/asm/paravirt.h   | 28 ++
  arch/powerpc/include/asm/qspinlock.h  | 55 +++
  arch/powerpc/include/asm/qspinlock_paravirt.h |  5 ++
  arch/powerpc/platforms/pseries/Kconfig|  5 ++
  arch/powerpc/platforms/pseries/setup.c|  6 +-
  include/asm-generic/qspinlock.h   |  2 +
  6 files changed, 100 insertions(+), 1 deletion(-)
  create mode 100644 arch/powerpc/include/asm/qspinlock_paravirt.h

diff --git a/arch/powerpc/include/asm/paravirt.h 
b/arch/powerpc/include/asm/paravirt.h
index 7a8546660a63..f2d51f929cf5 100644
--- a/arch/powerpc/include/asm/paravirt.h
+++ b/arch/powerpc/include/asm/paravirt.h
@@ -29,6 +29,16 @@ static inline void yield_to_preempted(int cpu, u32 
yield_count)
  {
plpar_hcall_norets(H_CONFER, get_hard_smp_processor_id(cpu), 
yield_count);
  }
+
+static inline void prod_cpu(int cpu)
+{
+   plpar_hcall_norets(H_PROD, get_hard_smp_processor_id(cpu));
+}
+
+static inline void yield_to_any(void)
+{
+   plpar_hcall_norets(H_CONFER, -1, 0);
+}
  #else
  static inline bool is_shared_processor(void)
  {
@@ -45,6 +55,19 @@ static inline void yield_to_preempted(int cpu, u32 
yield_count)
  {
___bad_yield_to_preempted(); /* This would be a bug */
  }
+
+extern void ___bad_yield_to_any(void);
+static inline void yield_to_any(void)
+{
+   ___bad_yield_to_any(); /* This would be a bug */
+}
+
+extern void ___bad_prod_cpu(void);
+static inline void prod_cpu(int cpu)
+{
+   ___bad_prod_cpu(); /* This would be a bug */
+}
+
  #endif
  
  #define vcpu_is_preempted vcpu_is_preempted

@@ -57,5 +80,10 @@ static inline bool vcpu_is_preempted(int cpu)
return false;
  }
  
+static inline bool pv_is_native_spin_unlock(void)

+{
+ return !is_shared_processor();
+}
+
  #endif /* __KERNEL__ */
  #endif /* __ASM_PARAVIRT_H */
diff --git a/arch/powerpc/include/asm/qspinlock.h 
b/arch/powerpc/include/asm/qspinlock.h
index c49e33e24edd..0960a0de2467 100644
--- a/arch/powerpc/include/asm/qspinlock.h
+++ b/arch/powerpc/include/asm/qspinlock.h
@@ -3,9 +3,36 @@
  #define _ASM_POWERPC_QSPINLOCK_H
  
  #include 

+#include 
  
  #define _Q_PENDING_LOOPS	(1 << 9) /* not tuned */
  
+#ifdef CONFIG_PARAVIRT_SPINLOCKS

+extern void native_queued_spin_lock_slowpath(struct qspinlock *lock, u32 val);
+extern void __pv_queued_spin_lock_slowpath(struct qspinlock *lock, u32 val);
+
+static __always_inline void queued_spin_lock_slowpath(struct qspinlock *lock, 
u32 val)
+{
+   if (!is_shared_processor())
+   native_queued_spin_lock_slowpath(lock, val);
+   else
+   __pv_queued_spin_lock_slowpath(lock, val);
+}


In a previous mail, I said that:

You may need to match the use of __pv_queued_spin_lock_slowpath() with 
the corresponding __pv_queued_spin_unlock(), e.g.


#define queued_spin_unlock queued_spin_unlock
static inline void queued_spin_unlock(struct qspinlock *lock)
{
    if (!is_shared_processor())
    smp_store_release(&lock->locked, 0);
    else
    __pv_queued_spin_unlock(lock);
}

Otherwise, pv_kick() will never be called.

Maybe PowerPC HMT is different in that the shared cpus can still process 
instructions, though more slowly, so that cpu kicking like what was done in 
kvm is not really necessary. If that is the case, I think we should document 
that.


Cheers,
Longman



Re: [PATCH 3/5] selftests/powerpc: Update the stack expansion test

2020-07-05 Thread Christophe Leroy




On 03/07/2020 at 16:13, Michael Ellerman wrote:

Update the stack expansion load/store test to take into account the
new allowance of 4096 bytes below the stack pointer.


[I didn't receive patch 2, don't know why, hence commenting patch 2 here.]

Shouldn't patch 2 carry a Fixes: tag and be Cc'ed to stable for 
application to previous kernel releases?


Christophe



Signed-off-by: Michael Ellerman 
---
  .../selftests/powerpc/mm/stack_expansion_ldst.c| 10 +-
  1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/tools/testing/selftests/powerpc/mm/stack_expansion_ldst.c 
b/tools/testing/selftests/powerpc/mm/stack_expansion_ldst.c
index 0587e11437f5..95c3f3de16a1 100644
--- a/tools/testing/selftests/powerpc/mm/stack_expansion_ldst.c
+++ b/tools/testing/selftests/powerpc/mm/stack_expansion_ldst.c
@@ -186,17 +186,17 @@ static void test_one_type(enum access_type type, unsigned 
long page_size, unsign
// But if we go past the rlimit it should fail
assert(test_one(DEFAULT_SIZE, rlim_cur + 1, type) != 0);
  
-	// Above 1MB powerpc only allows accesses within 2048 bytes of

+   // Above 1MB powerpc only allows accesses within 4096 bytes of
// r1 for accesses that aren't stdu
-   assert(test_one(1 * _MB + page_size - 128, -2048, type) == 0);
+   assert(test_one(1 * _MB + page_size - 128, -4096, type) == 0);
  #ifdef __powerpc__
-   assert(test_one(1 * _MB + page_size - 128, -2049, type) != 0);
+   assert(test_one(1 * _MB + page_size - 128, -4097, type) != 0);
  #else
-   assert(test_one(1 * _MB + page_size - 128, -2049, type) == 0);
+   assert(test_one(1 * _MB + page_size - 128, -4097, type) == 0);
  #endif
  
  	// By consuming 2MB of stack we test the stdu case

-   assert(test_one(2 * _MB + page_size - 128, -2048, type) == 0);
+   assert(test_one(2 * _MB + page_size - 128, -4096, type) == 0);
  }
  
  static int test(void)




Re: [RFC PATCH 4/5] powerpc/mm: Remove custom stack expansion checking

2020-07-05 Thread Christophe Leroy




On 03/07/2020 at 16:13, Michael Ellerman wrote:

We have powerpc specific logic in our page fault handling to decide if
an access to an unmapped address below the stack pointer should expand
the stack VMA.

The logic aims to prevent userspace from doing bad accesses below the
stack pointer. However, as long as the stack is < 1MB in size, we allow
all accesses without further checks. Adding some debug I see that I
can do a full kernel build and LTP run, and not a single process has
used more than 1MB of stack. So for the majority of processes the
logic never even fires.

We also recently found a nasty bug in this code which could cause
userspace programs to be killed during signal delivery. It went
unnoticed presumably because most processes use < 1MB of stack.

The generic mm code has also grown support for stack guard pages since
this code was originally written, so the most heinous case of the
stack expanding into other mappings is now handled for us.

Finally although some other arches have special logic in this path,
from what I can tell none of x86, arm64, arm and s390 impose any extra
checks other than those in expand_stack().

So drop our complicated logic and, like other architectures, just let
the stack expand as long as it's within the rlimit.


I agree it's probably not worth such complicated logic, which is nowhere 
documented.


This patch looks good to me, minor comments below.



Signed-off-by: Michael Ellerman 
---
  arch/powerpc/mm/fault.c | 106 ++--
  1 file changed, 5 insertions(+), 101 deletions(-)

diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
index ed01329dd12b..925a7231abb3 100644
--- a/arch/powerpc/mm/fault.c
+++ b/arch/powerpc/mm/fault.c
@@ -42,39 +42,7 @@
  #include 
  #include 
  
-/*

- * Check whether the instruction inst is a store using
- * an update addressing form which will update r1.
- */
-static bool store_updates_sp(struct ppc_inst inst)
-{
-   /* check for 1 in the rA field */
-   if (((ppc_inst_val(inst) >> 16) & 0x1f) != 1)
-   return false;
-   /* check major opcode */
-   switch (ppc_inst_primary_opcode(inst)) {
-   case OP_STWU:
-   case OP_STBU:
-   case OP_STHU:
-   case OP_STFSU:
-   case OP_STFDU:
-   return true;
-   case OP_STD:/* std or stdu */
-   return (ppc_inst_val(inst) & 3) == 1;
-   case OP_31:
-   /* check minor opcode */
-   switch ((ppc_inst_val(inst) >> 1) & 0x3ff) {
-   case OP_31_XOP_STDUX:
-   case OP_31_XOP_STWUX:
-   case OP_31_XOP_STBUX:
-   case OP_31_XOP_STHUX:
-   case OP_31_XOP_STFSUX:
-   case OP_31_XOP_STFDUX:
-   return true;
-   }
-   }
-   return false;
-}
+


Do we need this additional blank line?


  /*
   * do_page_fault error handling helpers
   */
@@ -267,54 +235,6 @@ static bool bad_kernel_fault(struct pt_regs *regs, 
unsigned long error_code,
return false;
  }
  
-static bool bad_stack_expansion(struct pt_regs *regs, unsigned long address,

-   struct vm_area_struct *vma, unsigned int flags,
-   bool *must_retry)
-{
-   /*
-* N.B. The POWER/Open ABI allows programs to access up to
-* 288 bytes below the stack pointer.
-* The kernel signal delivery code writes up to 4KB
-* below the stack pointer (r1) before decrementing it.
-* The exec code can write slightly over 640kB to the stack
-* before setting the user r1.  Thus we allow the stack to
-* expand to 1MB without further checks.
-*/
-   if (address + 0x100000 < vma->vm_end) {
-   struct ppc_inst __user *nip = (struct ppc_inst __user 
*)regs->nip;
-   /* get user regs even if this fault is in kernel mode */
-   struct pt_regs *uregs = current->thread.regs;
-   if (uregs == NULL)
-   return true;
-
-   /*
-* A user-mode access to an address a long way below
-* the stack pointer is only valid if the instruction
-* is one which would update the stack pointer to the
-* address accessed if the instruction completed,
-* i.e. either stwu rs,n(r1) or stwux rs,r1,rb
-* (or the byte, halfword, float or double forms).
-*
-* If we don't check this then any write to the area
-* between the last mapped region and the stack will
-* expand the stack rather than segfaulting.
-*/
-   if (address + 4096 >= uregs->gpr[1])
-   return false;
-
-   if ((flags & FAULT_FLAG_WRITE) && (flags & FAULT_FLAG_USER) &&
-   access_ok(nip, sizeof(*nip))) {
-   struct 

Re: [GIT PULL] Please pull powerpc/linux.git powerpc-5.8-5 tag

2020-07-05 Thread pr-tracker-bot
The pull request you sent on Sat, 04 Jul 2020 23:33:05 +1000:

> https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git 
> tags/powerpc-5.8-5

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/9bc0b029a8889f2c67c988760aba66a8d7b22af5

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.wiki.kernel.org/userdoc/prtracker


[PATCH 2/2] powerpc/powernv: Move pnv_ioda_setup_bus_dma under CONFIG_IOMMU_API

2020-07-05 Thread Oliver O'Halloran
pnv_ioda_setup_bus_dma() is only used when a passed through PE is
returned to the host. If the kernel is built without IOMMU support,
this is dead code. Move it under the #ifdef with the rest of the
IOMMU API support.

Reported-by: kernel test robot 
Signed-off-by: Oliver O'Halloran 
---
 arch/powerpc/platforms/powernv/pci-ioda.c | 26 +++
 1 file changed, 13 insertions(+), 13 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
b/arch/powerpc/platforms/powernv/pci-ioda.c
index c2d46d28114b..31c3e6d58c41 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -1885,19 +1885,6 @@ static bool pnv_pci_ioda_iommu_bypass_supported(struct 
pci_dev *pdev,
return false;
 }
 
-static void pnv_ioda_setup_bus_dma(struct pnv_ioda_pe *pe, struct pci_bus *bus)
-{
-   struct pci_dev *dev;
-
-   list_for_each_entry(dev, &bus->devices, bus_list) {
-   set_iommu_table_base(&dev->dev, pe->table_group.tables[0]);
-   dev->dev.archdata.dma_offset = pe->tce_bypass_base;
-
-   if ((pe->flags & PNV_IODA_PE_BUS_ALL) && dev->subordinate)
-   pnv_ioda_setup_bus_dma(pe, dev->subordinate);
-   }
-}
-
 static inline __be64 __iomem *pnv_ioda_get_inval_reg(struct pnv_phb *phb,
 bool real_mode)
 {
@@ -2547,6 +2534,19 @@ static long pnv_pci_ioda2_create_table_userspace(
return ret;
 }
 
+static void pnv_ioda_setup_bus_dma(struct pnv_ioda_pe *pe, struct pci_bus *bus)
+{
+   struct pci_dev *dev;
+
+   list_for_each_entry(dev, &bus->devices, bus_list) {
+   set_iommu_table_base(&dev->dev, pe->table_group.tables[0]);
+   dev->dev.archdata.dma_offset = pe->tce_bypass_base;
+
+   if ((pe->flags & PNV_IODA_PE_BUS_ALL) && dev->subordinate)
+   pnv_ioda_setup_bus_dma(pe, dev->subordinate);
+   }
+}
+
 static void pnv_ioda2_take_ownership(struct iommu_table_group *table_group)
 {
struct pnv_ioda_pe *pe = container_of(table_group, struct pnv_ioda_pe,
-- 
2.26.2



[PATCH 1/2] powerpc/powernv: Make pnv_pci_sriov_enable() and friends static

2020-07-05 Thread Oliver O'Halloran
The kernel test robot noticed these are non-static, which causes Clang to
print some warnings. They are called via ppc_md function pointers, so
there's no need for them to be non-static.

Reported-by: kernel test robot 
Signed-off-by: Oliver O'Halloran 
---
 arch/powerpc/platforms/powernv/pci-ioda.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
b/arch/powerpc/platforms/powernv/pci-ioda.c
index 73a63efcf855..c2d46d28114b 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -1490,7 +1490,7 @@ static void pnv_ioda_release_vf_PE(struct pci_dev *pdev)
}
 }
 
-void pnv_pci_sriov_disable(struct pci_dev *pdev)
+static void pnv_pci_sriov_disable(struct pci_dev *pdev)
 {
struct pci_bus*bus;
struct pci_controller *hose;
@@ -1600,7 +1600,7 @@ static void pnv_ioda_setup_vf_PE(struct pci_dev *pdev, 
u16 num_vfs)
}
 }
 
-int pnv_pci_sriov_enable(struct pci_dev *pdev, u16 num_vfs)
+static int pnv_pci_sriov_enable(struct pci_dev *pdev, u16 num_vfs)
 {
struct pci_bus*bus;
struct pci_controller *hose;
@@ -1715,7 +1715,7 @@ int pnv_pci_sriov_enable(struct pci_dev *pdev, u16 
num_vfs)
return ret;
 }
 
-int pnv_pcibios_sriov_disable(struct pci_dev *pdev)
+static int pnv_pcibios_sriov_disable(struct pci_dev *pdev)
 {
pnv_pci_sriov_disable(pdev);
 
@@ -1724,7 +1724,7 @@ int pnv_pcibios_sriov_disable(struct pci_dev *pdev)
return 0;
 }
 
-int pnv_pcibios_sriov_enable(struct pci_dev *pdev, u16 num_vfs)
+static int pnv_pcibios_sriov_enable(struct pci_dev *pdev, u16 num_vfs)
 {
/* Allocate PCI data */
add_sriov_vf_pdns(pdev);
-- 
2.26.2



[Bug 208197] OF: /pci@f2000000/mac-io@17/gpio@50/...: could not find phandle

2020-07-05 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=208197

--- Comment #2 from Erhard F. (erhar...@mailbox.org) ---
Created attachment 290097
  --> https://bugzilla.kernel.org/attachment.cgi?id=290097&action=edit
bisect.log

Did a bisect but got quite a few skips due to the system not finishing booting.

 # git bisect skip | tee -a ~/bisect02.log
There are only 'skip'ped commits left to test.
The first bad commit could be any of:
388bcc6ecc609fca1b4920de7dc3806c98ec535e
48ebea5026d692c5ab0a7d303f0fe1f8ba046e0f
c78c31b374a68be79cb4a03ef5b6c187f034e903
c8be6af9ef16cf44d690fc227a0d2dd7a526ef05
eb7fbc9fb1185a7f89adeb2de724c2c96ff608e9
42926ac3cd50937346c23c0005817264af4357a7
baf1d9c182935e88aab08701b0a0b22871117fe0
5f5377eaddfc24e5d7562e588d0ff84f9264d7c1
96fa72ffb2155dba9ba8c5d282a1ff19ed32f177
716a7a25969003d82ab738179c3f1068a120ed11
fbc35b45f9f6a971341b9462c6e94c257e779fb5
45bb08de65b418959313593f527c619e102c2d57
93d2e4322aa74c1ad1e8c2160608eb9a960d69ff
69b07ee33eb12a505d55e3e716fc7452496b9041
fefcfc968723caf93318613a08e1f3ad07a6154f
0f605db5bdd42edfbfcac36acaf8f72cfe9ce774
c82c83c330654c5639960ebc3dabbae53c43f79e
114dbb4fa7c4053a51964d112e2851e818e085c6
55623260bb33e2ab849af76edf2253bc04cb241f
2cd38fd15e4ebcfe917a443734820269f8b5ba2b
ab7c1e163b525316a870a494dd4ea196e7a6c455
f7d8f3f092d001f8d91552d2697643e727694942
We cannot bisect more!

-- 
You are receiving this mail because:
You are watching the assignee of the bug.

Using Firefox hangs system

2020-07-05 Thread Paul Menzel

Dear Linux folks,


With an IBM S822LC with Ubuntu 20.04, after updating to Firefox 78.0, 
using Firefox seems to hang the system. This happened with self-built 
Linux 5.7-rc5+ and now with 5.8-rc3+.


(At least I believe the Firefox update is causing this.)

Log in is impossible, and using the Serial over LAN over IPMI shows the 
messages below.



[ 2620.579187] watchdog: BUG: soft lockup - CPU#125 stuck for 22s! 
[swapper/125:0]
[ 2620.579378] Modules linked in: tcp_diag inet_diag unix_diag xt_CHECKSUM 
xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp ip6table_mangle 
ip6table_nat iptable_mangle iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 
nf_defrag_ipv4 nf_tables nfnetlink ip6table_filter ip6_tables iptable_filter 
bridge stp llc overlay xfs kvm_hv kvm joydev binfmt_misc uas usb_storage 
vmx_crypto ofpart cmdlinepart bnx2x powernv_flash mtd mdio crct10dif_vpmsum 
at24 ibmpowernv ipmi_powernv ipmi_devintf powernv_rng ipmi_msghandler opal_prd 
sch_fq_codel parport_pc nfsd ppdev lp auth_rpcgss nfs_acl parport lockd grace 
sunrpc ip_tables x_tables autofs4 btrfs blake2b_generic libcrc32c xor 
zstd_compress raid6_pq input_leds mac_hid hid_generic ast drm_vram_helper 
drm_ttm_helper i2c_algo_bit ttm drm_kms_helper syscopyarea sysfillrect 
sysimgblt fb_sys_fops drm drm_panel_orientation_quirks ahci libahci usbhid hid 
crc32c_vpmsum uio_pdrv_genirq uio
[ 2620.579537] CPU: 125 PID: 0 Comm: swapper/125 Tainted: G  D WL
5.8.0-rc3+ #1
[ 2620.579552] NIP:  c10dad38 LR: c10dad30 CTR: c0237830
[ 2620.579568] REGS: c0ffcb8c7600 TRAP: 0900   Tainted: G  D WL 
(5.8.0-rc3+)
[ 2620.579582] MSR:  90009033   CR: 44004228  
XER: 
[ 2620.579599] CFAR: c10dad44 IRQMASK: 0 
[ 2620.579599] GPR00: c023718c c0ffcb8c7890 c1f9a900  
[ 2620.579599] GPR04: c1fce438 0078 00010008c1f2  
[ 2620.579599] GPR08: 00ffd96a 8087  c1fd25e0 
[ 2620.579599] GPR12: 4400 c072f680 c1ea36d8 c0ffcb859800 
[ 2620.579599] GPR16: c166c880 c16f8e00 000a c0ffcb859800 
[ 2620.579599] GPR20: 0100 c166c918 c1fd21e8 c0ffcb859800 
[ 2620.579599] GPR24: 00ffd96a c1d44b80 c1d53780 0008 
[ 2620.579599] GPR28: c1fd21e0 0001  c1d44b80 
[ 2620.579711] NIP [c10dad38] _raw_spin_lock_irqsave+0x98/0x120

[ 2620.579724] LR [c10dad30] _raw_spin_lock_irqsave+0x90/0x120
[ 2620.579737] Call Trace:
[ 2620.579746] [c0ffcb8c7890] [c13c84a0] ncsi_ops+0x209f50/0x2dc1d8 
(unreliable)
[ 2620.579763] [c0ffcb8c78d0] [c023718c] rcu_core+0xfc/0x7a0
[ 2620.579777] [c0ffcb8c7970] [c10db81c] __do_softirq+0x17c/0x534
[ 2620.579791] [c0ffcb8c7aa0] [c01786f4] irq_exit+0xd4/0x130
[ 2620.579805] [c0ffcb8c7ad0] [c0025eec] timer_interrupt+0x13c/0x370
[ 2620.579821] [c0ffcb8c7b40] [c00165c0] 
replay_soft_interrupts+0x320/0x3f0
[ 2620.579837] [c0ffcb8c7d30] [c00166d8] 
arch_local_irq_restore+0x48/0xa0
[ 2620.579853] [c0ffcb8c7d50] [c0de2fe0] 
cpuidle_enter_state+0x100/0x780
[ 2620.579869] [c0ffcb8c7dd0] [c0de36fc] cpuidle_enter+0x4c/0x70
[ 2620.579883] [c0ffcb8c7e10] [c01c6bb4] do_idle+0x3c4/0x590
[ 2620.579896] [c0ffcb8c7ee0] [c01c6fcc] cpu_startup_entry+0x3c/0x50
[ 2620.579911] [c0ffcb8c7f10] [c00615f4] start_secondary+0x2d4/0x3b0
[ 2620.579927] [c0ffcb8c7f90] [c000c454] 
start_secondary_prolog+0x10/0x14
[ 2620.579941] Instruction dump:
[ 2620.579950] 6000 6000 7c0802a6 fba10028 fbe10038 7c7f1b78 f8010050 8bad0988 
[ 2620.579967] 7fc3f378 4af3b96d 6000 7c210b78 <6000> 813f 2c29 4082fff0 
[ 2645.907192] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:

[ 2660.067201] watchdog: CPU 0 detected hard LOCKUP on other CPUs 113
[ 2660.067385] watchdog: CPU 0 TB:1390608252047, last SMP heartbeat 
TB:1382840188990 (15171ms ago)
[ 2708.927190] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
[ 2724.067205] watchdog: CPU 0 detected hard LOCKUP on other CPUs 87
[ 2724.067396] watchdog: CPU 0 TB:1423376252137, last SMP heartbeat 
TB:1415618427864 (15152ms ago)
[ 2771.947188] rcu: INFO: rcu_sched detected stalls on CPUs/tasks: