Re: [PATCH 5/6] dma-mapping: support fsl-mc bus

2018-03-07 Thread Christoph Hellwig
On Tue, Mar 06, 2018 at 04:41:56AM +, Nipun Gupta wrote:
> Sorry for asking a trivial question - looking into dma_configure() I see that
> PCI is used at the start and the end of the API.
> At the end, pci_put_host_bridge_device() is called.
> So would two bus callbacks, something like 'dma_config_start' and
> 'dma_config_end', be required, where the former returns "dma_dev"?

I'd just use dma_configure as the callback.

Currently the of_dma_configure and acpi_dma_configure are only used
for PCI anyway, as no one else sets a non-NULL dma dev.  For fsl-mc
I suspect only of_dma_configure is relevant, so just call that directly.
If at some point we get enough buses with either OF or ACPI we could
create a helper called from ->dma_configure for that.
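
A minimal sketch of what that could look like, assuming the proposed
->dma_configure bus callback and the two-argument of_dma_configure() of
this era; the function below and the way the MC bus node is looked up
are illustrative, not code from this thread:

/* Hypothetical fsl-mc bus ->dma_configure hook -- a sketch only. */
static int fsl_mc_dma_configure(struct device *dev)
{
	/* fsl-mc devices take DMA/IOMMU setup from the bus's DT node */
	struct device_node *mc_node = dev->parent->of_node;	/* assumed */

	return of_dma_configure(dev, mc_node);
}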


[PATCH RFC 5/5] KVM: PPC: Book3S HV: Work around TEXASR bug in fake suspend mode

2018-03-07 Thread Paul Mackerras
This works around a hardware bug in "Nimbus" POWER9 DD2.2 processors,
where the contents of the TEXASR can get corrupted while a thread is
in fake suspend state.  The workaround is for the instruction emulation
code to use the value saved at the most recent guest exit in real
suspend mode.  We achieve this by simply not saving the TEXASR into
the vcpu struct on an exit in fake suspend state.  We also have to
take care to set the orig_texasr field only on guest exit in real
suspend state.

This also means that on guest entry in fake suspend state, TEXASR
will be restored to the value it had on the last exit in real suspend
state, effectively counteracting any hardware-caused corruption.  This
works because TEXASR may not be written in suspend state.

With this, the guest might see the wrong values in TEXASR if it reads
it while in suspend state, but will see the correct value in
non-transactional state (e.g. after a treclaim), and treclaim will
work correctly.

Signed-off-by: Paul Mackerras 
---
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 17 ++---
 1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S 
b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index 7b932f1..f374073 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -3105,10 +3105,6 @@ kvmppc_save_tm:
li  r3, TM_CAUSE_KVM_RESCHED
 
 BEGIN_FTR_SECTION
-   /* Emulation of the treclaim instruction needs TEXASR before treclaim */
-   mfspr   r6, SPRN_TEXASR
-   std r6, VCPU_ORIG_TEXASR(r9)
-
lbz r0, HSTATE_FAKE_SUSPEND(r13) /* Were we fake suspended? */
cmpwi   r0, 0
beq 3f
@@ -3116,7 +3112,12 @@ BEGIN_FTR_SECTION
beq 4f
bl  pnv_power9_force_smt4_catch
nop
+   b   6f
 3:
+   /* Emulation of the treclaim instruction needs TEXASR before treclaim */
+   mfspr   r6, SPRN_TEXASR
+   std r6, VCPU_ORIG_TEXASR(r9)
+6:
 END_FTR_SECTION_IFSET(CPU_FTR_P9_TM_EMUL)
 
/* Clear the MSR RI since r1, r13 are all going to be foobar. */
@@ -3160,7 +3161,8 @@ BEGIN_FTR_SECTION
andc    r3, r3, r0
mtspr   SPRN_PSSCR, r3
ld  r9, HSTATE_KVM_VCPU(r13)
-   b   1f
+   /* Don't save TEXASR, use value from last exit in real suspend state */
+   b   11f
 2:
 END_FTR_SECTION_IFSET(CPU_FTR_P9_TM_EMUL)
 
@@ -3234,12 +3236,13 @@ END_FTR_SECTION_IFSET(CPU_FTR_P9_TM_EMUL)
 * change these outside of a transaction, so they must always be
 * context switched.
 */
+   mfspr   r7, SPRN_TEXASR
+   std r7, VCPU_TEXASR(r9)
+11:
mfspr   r5, SPRN_TFHAR
mfspr   r6, SPRN_TFIAR
-   mfspr   r7, SPRN_TEXASR
std r5, VCPU_TFHAR(r9)
std r6, VCPU_TFIAR(r9)
-   std r7, VCPU_TEXASR(r9)
 
addi    r1, r1, PPC_MIN_STKFRM
ld  r0, PPC_LR_STKOFF(r1)
-- 
2.7.4
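
In C-like pseudocode, the save-side policy the assembly above implements
is roughly the following; the fields match the asm offsets
(VCPU_ORIG_TEXASR, VCPU_TEXASR, HSTATE_FAKE_SUSPEND), but the control
flow is simplified and is not code from the patch:

/* Sketch of kvmppc_save_tm's TEXASR handling after this patch. */
if (!local_paca->kvm_hstate.fake_suspend) {
	/* Real-suspend exit: record TEXASR for treclaim emulation... */
	vcpu->arch.orig_texasr = mfspr(SPRN_TEXASR);
	/* ...and snapshot it into the vcpu state for the guest. */
	vcpu->arch.texasr = mfspr(SPRN_TEXASR);
}
/*
 * Fake-suspend exit: save nothing.  The next guest entry restores the
 * TEXASR value from the last real-suspend exit, which masks any
 * hardware-caused corruption, since TEXASR cannot be written while in
 * suspend state.
 */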



[PATCH RFC 0/5] powerpc & KVM: Work around POWER9 TM hardware bugs

2018-03-07 Thread Paul Mackerras
POWER9 has some shortcomings in its implementation of transactional
memory.  Starting with v2.2 of the "Nimbus" chip, some changes have
been made to the hardware which make it able to generate hypervisor
interrupts in the situations where hardware needs the hypervisor to
provide some assistance with the implementation.  Specifically, the
core does not have enough storage to store a complete checkpoint of
all the architected state for all 4 threads, and therefore needs to
be able to offload the checkpointed state of threads which are in
transactional suspended state (for threads that are in transactional
state, the hardware can simply abort the transaction).

This series implements the hypervisor assistance for TM for KVM
guests, thus allowing them to use TM.  This in turn means that guests
running on POWER8, which may be using TM, can be live-migrated to
POWER9 hosts.

 arch/powerpc/include/asm/asm-prototypes.h |   3 +
 arch/powerpc/include/asm/cputable.h   |   5 +-
 arch/powerpc/include/asm/kvm_asm.h|   2 +
 arch/powerpc/include/asm/kvm_book3s.h |   4 +
 arch/powerpc/include/asm/kvm_book3s_64.h  |  43 ++
 arch/powerpc/include/asm/kvm_book3s_asm.h |   1 +
 arch/powerpc/include/asm/kvm_host.h   |   1 +
 arch/powerpc/include/asm/paca.h   |   3 +
 arch/powerpc/include/asm/powernv.h|   1 +
 arch/powerpc/include/asm/ppc-opcode.h |   4 +
 arch/powerpc/include/asm/reg.h|   7 +
 arch/powerpc/kernel/asm-offsets.c |   3 +
 arch/powerpc/kernel/cputable.c|  23 +++-
 arch/powerpc/kernel/dt_cpu_ftrs.c |   3 +
 arch/powerpc/kernel/exceptions-64s.S  |   4 +-
 arch/powerpc/kernel/idle_book3s.S |  19 +++
 arch/powerpc/kvm/Makefile |   7 +
 arch/powerpc/kvm/book3s_hv.c  |  18 ++-
 arch/powerpc/kvm/book3s_hv_rmhandlers.S   | 150 -
 arch/powerpc/kvm/book3s_hv_tm.c   | 216 ++
 arch/powerpc/kvm/book3s_hv_tm_builtin.c   | 109 +++
 arch/powerpc/kvm/powerpc.c|   5 +-
 arch/powerpc/platforms/powernv/idle.c |  77 +++
 23 files changed, 694 insertions(+), 14 deletions(-)




[PATCH RFC 3/5] KVM: PPC: Book3S HV: Work around transactional memory bugs in POWER9

2018-03-07 Thread Paul Mackerras
POWER9 has hardware bugs relating to transactional memory and thread
reconfiguration (changes to hardware SMT mode).  Specifically, the core
does not have enough storage to store a complete checkpoint of all the
architected state for all four threads.  The DD2.2 version of POWER9
includes hardware modifications designed to allow hypervisor software
to implement workarounds for these problems.  This patch implements
those workarounds in KVM code so that KVM guests see a full, working
transactional memory implementation.

The problems center around the use of TM suspended state, where the
CPU has a checkpointed state but execution is not transactional.  The
workaround is to implement a "fake suspend" state, which looks to the
guest like suspended state but the CPU does not store a checkpoint.
In this state, any instruction that would cause a transition to
transactional state (rfid, rfebb, mtmsrd, tresume) or would use the
checkpointed state (treclaim) causes a "soft patch" interrupt (vector
0x1500) to the hypervisor so that it can be emulated.  The trechkpt
instruction also causes a soft patch interrupt.

On POWER9 DD2.2, we avoid returning to the guest in any state which
would require a checkpoint to be present.  The trechkpt in the guest
entry path which would normally create that checkpoint is replaced by
either a transition to fake suspend state, if the guest is in suspend
state, or a rollback to the pre-transactional state if the guest is in
transactional state.  Fake suspend state is indicated by a flag in the
PACA plus a new bit in the PSSCR.  The new PSSCR bit is write-only and
reads back as 0.

On exit from the guest, if the guest is in fake suspend state, we still
do the treclaim instruction as we would in real suspend state, in order
to get into non-transactional state, but we do not save the resulting
register state since there was no checkpoint.

Emulation of the instructions that cause a softpatch interrupt is
handled in two paths.  If the guest is in real suspend mode, we call
kvmhv_p9_tm_emulation_early() to handle the cases where the guest is
transitioning to transactional state.  This is called before we do the
treclaim in the guest exit path; because we haven't done treclaim, we
can get back to the guest with the transaction still active.  If the
instruction is a case that kvmhv_p9_tm_emulation_early() doesn't
handle, or if the guest is in fake suspend state, then we proceed to
do the complete guest exit path and subsequently call
kvmhv_p9_tm_emulation() in host context with the MMU on.  This handles
all the cases including the cases that generate program interrupts
(illegal instruction or TM Bad Thing) and facility unavailable
interrupts.

The emulation is reasonably straightforward and is mostly concerned
with checking for exception conditions and updating the state of
registers such as MSR and CR0.  The treclaim emulation takes care to
ensure that the TEXASR register gets updated as if it were the guest
treclaim instruction that had done failure recording, not the treclaim
done in hypervisor state in the guest exit path.

With this, the KVM_CAP_PPC_HTM capability returns true (1) even if
transactional memory is not available to host userspace.

Signed-off-by: Paul Mackerras 
---
 arch/powerpc/include/asm/kvm_asm.h|   2 +
 arch/powerpc/include/asm/kvm_book3s.h |   4 +
 arch/powerpc/include/asm/kvm_book3s_64.h  |  43 ++
 arch/powerpc/include/asm/kvm_book3s_asm.h |   1 +
 arch/powerpc/include/asm/kvm_host.h   |   1 +
 arch/powerpc/include/asm/ppc-opcode.h |   4 +
 arch/powerpc/include/asm/reg.h|   7 +
 arch/powerpc/kernel/asm-offsets.c |   2 +
 arch/powerpc/kernel/cputable.c|   1 -
 arch/powerpc/kernel/exceptions-64s.S  |   4 +-
 arch/powerpc/kvm/Makefile |   7 +
 arch/powerpc/kvm/book3s_hv.c  |  18 ++-
 arch/powerpc/kvm/book3s_hv_rmhandlers.S   | 131 +-
 arch/powerpc/kvm/book3s_hv_tm.c   | 216 ++
 arch/powerpc/kvm/book3s_hv_tm_builtin.c   | 109 +++
 arch/powerpc/kvm/powerpc.c|   5 +-
 16 files changed, 545 insertions(+), 10 deletions(-)
 create mode 100644 arch/powerpc/kvm/book3s_hv_tm.c
 create mode 100644 arch/powerpc/kvm/book3s_hv_tm_builtin.c

diff --git a/arch/powerpc/include/asm/kvm_asm.h 
b/arch/powerpc/include/asm/kvm_asm.h
index 09a802b..a790d5c 100644
--- a/arch/powerpc/include/asm/kvm_asm.h
+++ b/arch/powerpc/include/asm/kvm_asm.h
@@ -108,6 +108,8 @@
 
 /* book3s_hv */
 
+#define BOOK3S_INTERRUPT_HV_SOFTPATCH  0x1500
+
 /*
  * Special trap used to indicate to host that this is a
  * passthrough interrupt that could not be handled
diff --git a/arch/powerpc/include/asm/kvm_book3s.h 
b/arch/powerpc/include/asm/kvm_book3s.h
index 376ae80..4c02a73 100644
--- a/arch/powerpc/include/asm/kvm_book3s.h
+++ b/arch/powerpc/include/asm/kvm_book3s.h
@@ -241,6 +241,10 @@ extern void kvmppc_update_lpcr(struct 
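
A skeleton of the host-context half of the two-level dispatch the
changelog describes; the entry points kvmhv_p9_tm_emulation_early() and
kvmhv_p9_tm_emulation() are from the patch, while the decode helpers
below are hypothetical stand-ins:

/* Host-context softpatch (0x1500) handler -- shape only, not the patch. */
int kvmhv_p9_tm_emulation(struct kvm_vcpu *vcpu)
{
	u32 instr = vcpu->arch.emul_inst;	/* image of the trapping insn */

	if (is_treclaim(instr))			/* hypothetical decoder */
		return emulate_treclaim(vcpu);	/* records failure in TEXASR */
	if (is_trechkpt(instr))
		return emulate_trechkpt(vcpu);
	if (is_tm_state_entry(instr))		/* rfid/rfebb/mtmsrd/tresume */
		return emulate_tm_entry(vcpu);

	/*
	 * Remaining cases: program interrupt (illegal instruction, TM Bad
	 * Thing) or facility unavailable interrupt, as described above.
	 */
	return deliver_program_interrupt(vcpu);	/* hypothetical */
}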

[PATCH RFC 1/5] powerpc: Add a CPU feature bit for TM bug workarounds on POWER9 v2.2

2018-03-07 Thread Paul Mackerras
This adds a CPU feature bit which is set for POWER9 "Nimbus" DD2.2
processors which will be used to enable the hypervisor to assist
hardware with the handling of checkpointed register values while
the CPU is in suspend state, in order to work around hardware
bugs.

When the dt_cpu_ftrs subsystem is in use, the software assistance
can be enabled using a "tm-suspend-hypervisor-assist" node in the
device tree.  In the absence of such a node, a quirk enables the
assistance for POWER9 "Nimbus" DD2.2 processors.

Signed-off-by: Paul Mackerras 
---
 arch/powerpc/include/asm/cputable.h |  5 -
 arch/powerpc/kernel/cputable.c  | 24 ++--
 arch/powerpc/kernel/dt_cpu_ftrs.c   |  3 +++
 3 files changed, 29 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/cputable.h 
b/arch/powerpc/include/asm/cputable.h
index a2c5c95..46343ac 100644
--- a/arch/powerpc/include/asm/cputable.h
+++ b/arch/powerpc/include/asm/cputable.h
@@ -203,6 +203,7 @@ static inline void cpu_feature_keys_init(void) { }
 #define CPU_FTR_DAWR   LONG_ASM_CONST(0x0400)
 #define CPU_FTR_DABRX  LONG_ASM_CONST(0x0800)
 #define CPU_FTR_PMAO_BUG   LONG_ASM_CONST(0x1000)
+#define CPU_FTR_P9_TM_EMUL LONG_ASM_CONST(0x2000)
 #define CPU_FTR_POWER9_DD1 LONG_ASM_CONST(0x4000)
 #define CPU_FTR_POWER9_DD2_1   LONG_ASM_CONST(0x8000)
 
@@ -470,6 +471,7 @@ static inline void cpu_feature_keys_init(void) { }
 (~CPU_FTR_SAO))
 #define CPU_FTRS_POWER9_DD2_0 CPU_FTRS_POWER9
 #define CPU_FTRS_POWER9_DD2_1 (CPU_FTRS_POWER9 | CPU_FTR_POWER9_DD2_1)
+#define CPU_FTRS_POWER9_DD2_2 (CPU_FTRS_POWER9 | CPU_FTR_P9_TM_EMUL)
 #define CPU_FTRS_CELL  (CPU_FTR_USE_TB | CPU_FTR_LWSYNC | \
CPU_FTR_PPCAS_ARCH_V2 | CPU_FTR_CTRL | \
CPU_FTR_ALTIVEC_COMP | CPU_FTR_MMCRA | CPU_FTR_SMT | \
@@ -489,7 +491,8 @@ static inline void cpu_feature_keys_init(void) { }
 CPU_FTRS_POWER6 | CPU_FTRS_POWER7 | CPU_FTRS_POWER8E | \
 CPU_FTRS_POWER8 | CPU_FTRS_POWER8_DD1 | CPU_FTRS_CELL | \
 CPU_FTRS_PA6T | CPU_FTR_VSX | CPU_FTRS_POWER9 | \
-CPU_FTRS_POWER9_DD1 | CPU_FTRS_POWER9_DD2_1)
+CPU_FTRS_POWER9_DD1 | CPU_FTRS_POWER9_DD2_1 | \
+CPU_FTRS_POWER9_DD2_2)
 #endif
 #else
 enum {
diff --git a/arch/powerpc/kernel/cputable.c b/arch/powerpc/kernel/cputable.c
index c40a9fc..68052ea 100644
--- a/arch/powerpc/kernel/cputable.c
+++ b/arch/powerpc/kernel/cputable.c
@@ -553,11 +553,31 @@ static struct cpu_spec __initdata cpu_specs[] = {
.machine_check_early= __machine_check_early_realmode_p9,
.platform   = "power9",
},
-   {   /* Power9 DD 2.1 or later (see DD2.0 above) */
+   {   /* Power9 DD 2.1 */
+   .pvr_mask   = 0xefff,
+   .pvr_value  = 0x004e0201,
+   .cpu_name   = "POWER9 (raw)",
+   .cpu_features   = CPU_FTRS_POWER9_DD2_1,
+   .cpu_user_features  = COMMON_USER_POWER9,
+   .cpu_user_features2 = COMMON_USER2_POWER9,
+   .mmu_features   = MMU_FTRS_POWER9,
+   .icache_bsize   = 128,
+   .dcache_bsize   = 128,
+   .num_pmcs   = 6,
+   .pmc_type   = PPC_PMC_IBM,
+   .oprofile_cpu_type  = "ppc64/power9",
+   .oprofile_type  = PPC_OPROFILE_INVALID,
+   .cpu_setup  = __setup_cpu_power9,
+   .cpu_restore= __restore_cpu_power9,
+   .flush_tlb  = __flush_tlb_power9,
+   .machine_check_early= __machine_check_early_realmode_p9,
+   .platform   = "power9",
+   },
+   {   /* Power9 DD2.2 or later */
.pvr_mask   = 0x,
.pvr_value  = 0x004e,
.cpu_name   = "POWER9 (raw)",
-   .cpu_features   = CPU_FTRS_POWER9_DD2_1,
+   .cpu_features   = CPU_FTRS_POWER9_DD2_2,
.cpu_user_features  = COMMON_USER_POWER9,
.cpu_user_features2 = COMMON_USER2_POWER9,
.mmu_features   = MMU_FTRS_POWER9,
diff --git a/arch/powerpc/kernel/dt_cpu_ftrs.c 
b/arch/powerpc/kernel/dt_cpu_ftrs.c
index 945e2c2..e181266 100644
--- a/arch/powerpc/kernel/dt_cpu_ftrs.c
+++ b/arch/powerpc/kernel/dt_cpu_ftrs.c
@@ -590,6 +590,7 @@ static struct dt_cpu_feature_match __initdata
{"virtual-page-class-key-protection", feat_enable, 0},
{"transactional-memory", feat_enable_tm, CPU_FTR_TM},
{"transactional-memory-v3", feat_enable_tm, 0},
+   
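
Elsewhere in the kernel the new bit is consumed like any other CPU
feature: the C side checks it with cpu_has_feature(), while the assembly
side uses the BEGIN_FTR_SECTION/END_FTR_SECTION_IFSET(CPU_FTR_P9_TM_EMUL)
pairs visible in the later patches.  An illustrative (not literal) use:

if (cpu_has_feature(CPU_FTR_P9_TM_EMUL)) {
	/* this CPU needs hypervisor assistance for TM suspend state */
	setup_tm_suspend_workarounds();		/* hypothetical helper */
}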

[PATCH RFC 2/5] powerpc/powernv: Provide a way to force a core into SMT4 mode

2018-03-07 Thread Paul Mackerras
POWER9 processors up to and including "Nimbus" v2.2 have hardware
bugs relating to transactional memory and thread reconfiguration.
One of these bugs has a workaround which is to get the core into
SMT4 state temporarily.  This workaround is only needed when
running bare-metal.

This patch provides a function which gets the core into SMT4 mode
by preventing threads from going to a stop state, and waking up
those which are already in a stop state.  Once at least 3 threads
are not in a stop state, the core will be in SMT4 and we can
continue.

To do this, we add a "dont_stop" flag to the paca to tell the
thread not to go into a stop state.  If this flag is set,
power9_idle_stop() just returns immediately with a return value
of 0.  The pnv_power9_force_smt4_catch() function does the following:

1. Set the dont_stop flag for each thread in the core, except
   ourselves (in fact we use an atomic_inc() in case more than
   one thread is calling this function concurrently).
2. See how many threads are awake, indicated by their
   requested_psscr field in the paca being 0.  If this is at
   least 3, skip to step 5.
3. Send a doorbell interrupt to each thread that was seen as
   being in a stop state in step 2.
4. Until at least 3 threads are awake, scan the threads to which
   we sent a doorbell interrupt and check if they are awake now.

This relies on the following properties:

- Once dont_stop is non-zero, requested_psscr can't go from zero to
  non-zero, except transiently (and without the thread doing stop).
- requested_psscr being zero guarantees that the thread isn't in
  a state-losing stop state where thread reconfiguration could occur.
- Doing stop with a PSSCR value of 0 won't be a state-losing stop
  and thus won't allow thread reconfiguration.
- Once threads_per_core/2 + 1 (i.e. 3) threads are awake, the core
  must be in SMT4 mode, since SMT modes are powers of 2.

This does add a sync to power9_idle_stop(), which is necessary to
provide the correct ordering between setting requested_psscr and
checking dont_stop.  The overhead of the sync should be unnoticeable
compared to the latency of going into and out of a stop state.

In order to cater for uses where the caller has an operation that
has to be done while the core is in SMT4, the core continues to be
kept in SMT4 after pnv_power9_force_smt4_catch() function returns,
until the pnv_power9_force_smt4_release() function is called.
It undoes the effect of step 1 above and allows the other threads
to go into a stop state.

Signed-off-by: Paul Mackerras 
---
 arch/powerpc/include/asm/asm-prototypes.h |  3 ++
 arch/powerpc/include/asm/paca.h   |  3 ++
 arch/powerpc/include/asm/powernv.h|  1 +
 arch/powerpc/kernel/asm-offsets.c |  1 +
 arch/powerpc/kernel/idle_book3s.S | 19 
 arch/powerpc/platforms/powernv/idle.c | 77 +++
 6 files changed, 104 insertions(+)

diff --git a/arch/powerpc/include/asm/asm-prototypes.h 
b/arch/powerpc/include/asm/asm-prototypes.h
index 7330150..4e14d23 100644
--- a/arch/powerpc/include/asm/asm-prototypes.h
+++ b/arch/powerpc/include/asm/asm-prototypes.h
@@ -126,4 +126,7 @@ extern int __ucmpdi2(u64, u64);
 void _mcount(void);
 unsigned long prepare_ftrace_return(unsigned long parent, unsigned long ip);
 
+void pnv_power9_force_smt4_catch(void);
+void pnv_power9_force_smt4_release(void);
+
 #endif /* _ASM_POWERPC_ASM_PROTOTYPES_H */
diff --git a/arch/powerpc/include/asm/paca.h b/arch/powerpc/include/asm/paca.h
index b62c310..4803cc1 100644
--- a/arch/powerpc/include/asm/paca.h
+++ b/arch/powerpc/include/asm/paca.h
@@ -32,6 +32,7 @@
 #include 
 #include 
 #include 
+#include 
 
 register struct paca_struct *local_paca asm("r13");
 
@@ -177,6 +178,8 @@ struct paca_struct {
u8 thread_mask;
/* Mask to denote subcore sibling threads */
u8 subcore_sibling_mask;
+   /* Flag to request this thread not to stop */
+   atomic_t dont_stop;
/*
 * Pointer to an array which contains pointer
 * to the sibling threads' paca.
diff --git a/arch/powerpc/include/asm/powernv.h 
b/arch/powerpc/include/asm/powernv.h
index dc5f6a5..d1c2d2e6 100644
--- a/arch/powerpc/include/asm/powernv.h
+++ b/arch/powerpc/include/asm/powernv.h
@@ -40,6 +40,7 @@ static inline int pnv_npu2_handle_fault(struct npu_context 
*context,
 }
 
 static inline void pnv_tm_init(void) { }
+static inline void pnv_power9_force_smt4(void) { }
 #endif
 
 #endif /* _ASM_POWERNV_H */
diff --git a/arch/powerpc/kernel/asm-offsets.c 
b/arch/powerpc/kernel/asm-offsets.c
index ea5eb91..dbefe30 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -759,6 +759,7 @@ int main(void)
OFFSET(PACA_SUBCORE_SIBLING_MASK, paca_struct, subcore_sibling_mask);
OFFSET(PACA_SIBLING_PACA_PTRS, paca_struct, thread_sibling_pacas);
OFFSET(PACA_REQ_PSSCR, paca_struct, requested_psscr);
+   
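
Putting the changelog's steps 1-4 together, the catch side could be
sketched as below.  The paca fields (dont_stop, requested_psscr) are
from this patch; the doorbell call and the exact loop structure are
illustrative:

void pnv_power9_force_smt4_catch(void)
{
	int cpu = smp_processor_id();
	int cpu0 = cpu & ~(threads_per_core - 1);
	int need = threads_per_core / 2 + 1;	/* i.e. 3 for SMT4 */
	int thr, awake = 0, poked = 0;

	/* Step 1: keep the siblings from stopping (atomic: callers race) */
	for (thr = 0; thr < threads_per_core; ++thr) {
		if (cpu0 + thr != cpu)
			atomic_inc(&paca[cpu0 + thr].dont_stop);
	}

	/* Step 2: count threads already awake (requested_psscr == 0) */
	for (thr = 0; thr < threads_per_core; ++thr) {
		if (!paca[cpu0 + thr].requested_psscr)
			++awake;
		else
			poked |= 1 << thr;
	}

	/* Step 3: doorbell each thread seen in a stop state */
	if (awake < need) {
		for (thr = 0; thr < threads_per_core; ++thr) {
			if (poked & (1 << thr))
				ppc_msgsnd(PPC_DBELL_MSGTYPE, 0,
					get_hard_smp_processor_id(cpu0 + thr));
		}
	}

	/* Step 4: rescan the poked threads until enough are awake */
	while (awake < need) {
		for (thr = 0; thr < threads_per_core; ++thr) {
			if ((poked & (1 << thr)) &&
			    !paca[cpu0 + thr].requested_psscr) {
				++awake;
				poked &= ~(1 << thr);
			}
		}
	}
}

pnv_power9_force_smt4_release() then just undoes step 1 with an
atomic_dec() on each sibling's dont_stop.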

[PATCH RFC 4/5] KVM: PPC: Book3S HV: Work around XER[SO] bug in fake suspend mode

2018-03-07 Thread Paul Mackerras
From: Suraj Jitindar Singh 

This works around a hardware bug in "Nimbus" POWER9 DD2.2 processors,
where a treclaim performed in fake suspend mode can cause subsequent
reads from the XER register to return inconsistent values for the SO
(summary overflow) bit.  The inconsistent SO bit state can potentially
be observed on any thread in the core.  We have to do the treclaim
because that is the only way to get the thread out of suspend state
(fake or real) and into non-transactional state.

The workaround for the bug is to force the core into SMT4 mode before
doing the treclaim.  This patch adds the code to do that.

Signed-off-by: Suraj Jitindar Singh 
Signed-off-by: Paul Mackerras 
---
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 20 
 1 file changed, 16 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S 
b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index f73eba6..7b932f1 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -3089,6 +3089,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_ALTIVEC)
 kvmppc_save_tm:
mflr    r0
std r0, PPC_LR_STKOFF(r1)
+   stdu    r1, -PPC_MIN_STKFRM(r1)
 
/* Turn on TM. */
mfmsr   r8
@@ -3108,8 +3109,14 @@ BEGIN_FTR_SECTION
mfspr   r6, SPRN_TEXASR
std r6, VCPU_ORIG_TEXASR(r9)
 
-   rldicl. r8, r8, 64 - MSR_TS_S_LG, 62
+   lbz r0, HSTATE_FAKE_SUSPEND(r13) /* Were we fake suspended? */
+   cmpwi   r0, 0
beq 3f
+   rldicl. r8, r8, 64 - MSR_TS_S_LG, 62 /* Did we actually hrfid? */
+   beq 4f
+   bl  pnv_power9_force_smt4_catch
+   nop
+3:
 END_FTR_SECTION_IFSET(CPU_FTR_P9_TM_EMUL)
 
/* Clear the MSR RI since r1, r13 are all going to be foobar. */
@@ -3126,7 +3133,6 @@ END_FTR_SECTION_IFSET(CPU_FTR_P9_TM_EMUL)
 
/* If doing TM emulation on POWER9 DD2.2, check for fake suspend mode */
 BEGIN_FTR_SECTION
-3:
lbz r9, HSTATE_FAKE_SUSPEND(r13)
cmpwi   r9, 0
beq 2f
@@ -3138,13 +3144,16 @@ BEGIN_FTR_SECTION
/* Reload stack pointer and TOC. */
ld  r1, HSTATE_HOST_R1(r13)
ld  r2, PACATOC(r13)
+   /* Set MSR RI now we have r1 and r13 back. */
li  r5, MSR_RI
mtmsrd  r5, 1
HMT_MEDIUM
ld  r6, HSTATE_DSCR(r13)
mtspr   SPRN_DSCR, r6
-   li  r0, 0
-   stb r0, HSTATE_FAKE_SUSPEND(r13)
+   bl  pnv_power9_force_smt4_release
+   nop
+
+4:
mfspr   r3, SPRN_PSSCR
/* PSSCR_FAKE_SUSPEND is a write-only bit, but clear it anyway */
li  r0, PSSCR_FAKE_SUSPEND
@@ -3232,6 +3241,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_P9_TM_EMUL)
std r6, VCPU_TFIAR(r9)
std r7, VCPU_TEXASR(r9)
 
+   addi    r1, r1, PPC_MIN_STKFRM
ld  r0, PPC_LR_STKOFF(r1)
mtlr    r0
blr
@@ -3266,6 +3276,8 @@ kvmppc_restore_tm:
mtspr   SPRN_TFIAR, r6
mtspr   SPRN_TEXASR, r7
 
+   li  r0, 0
+   stb r0, HSTATE_FAKE_SUSPEND(r13)
ld  r5, VCPU_MSR(r4)
rldicl. r5, r5, 64 - MSR_TS_S_LG, 62
beqlr   /* TM not active in guest */
-- 
2.7.4



Re: [PATCH v3] powerpc/npu-dma.c: Fix deadlock in mmio_invalidate

2018-03-07 Thread Alistair Popple
Michael,

This won't apply cleanly on top of Balbir's MMIO ATSD Flushing patch
(https://patchwork.ozlabs.org/patch/848343/). I will resend a v4 which applies
cleanly on top of that as the rebase/merge is non-trivial.

- Alistair

On Friday, 2 March 2018 4:18:45 PM AEDT Alistair Popple wrote:
> When sending TLB invalidates to the NPU we need to send extra flushes due
> to a hardware issue. The original implementation would lock all the
> ATSD MMIO registers sequentially before unlocking and relocking each of
> them sequentially to do the extra flush.
> 
> This introduced a deadlock as it is possible for one thread to hold one
> ATSD register whilst waiting for another register to be freed while the
> other thread is holding that register waiting for the one in the first
> thread to be freed.
> 
> For example if there are two threads and two ATSD registers:
> 
> Thread A      Thread B
> Acquire 1
> Acquire 2
> Release 1     Acquire 1
> Wait 1        Wait 2
> 
> Both threads will be stuck waiting to acquire a register resulting in an
> RCU stall warning or soft lockup.
> 
> This patch solves the deadlock by refactoring the code to ensure registers
> are not released between flushes and to ensure all registers are either
> acquired or released together and in order.
> 
> Fixes: bbd5ff50afff ("powerpc/powernv/npu-dma: Add explicit flush when 
> sending an ATSD")
> Signed-off-by: Alistair Popple 
> 
> ---
> 
> v2:
>  - Added memory barriers around ATSD register acquisition/release
>  - Added compiler barriers around npdev[][] assignment
> 
> v3:
>  - Added comments to describe required locking
> 
> ---
>  arch/powerpc/platforms/powernv/npu-dma.c | 228 
> +++
>  1 file changed, 139 insertions(+), 89 deletions(-)
> 
> diff --git a/arch/powerpc/platforms/powernv/npu-dma.c 
> b/arch/powerpc/platforms/powernv/npu-dma.c
> index 0a253b64ac5f..0dec96eb3358 100644
> --- a/arch/powerpc/platforms/powernv/npu-dma.c
> +++ b/arch/powerpc/platforms/powernv/npu-dma.c
> @@ -410,6 +410,11 @@ struct npu_context {
>   void *priv;
>  };
>  
> +struct mmio_atsd_reg {
> + struct npu *npu;
> + int reg;
> +};
> +
>  /*
>   * Find a free MMIO ATSD register and mark it in use. Return -ENOSPC
>   * if none are available.
> @@ -419,7 +424,7 @@ static int get_mmio_atsd_reg(struct npu *npu)
>   int i;
>  
>   for (i = 0; i < npu->mmio_atsd_count; i++) {
> - if (!test_and_set_bit(i, &npu->mmio_atsd_usage))
> + if (!test_and_set_bit_lock(i, &npu->mmio_atsd_usage))
>   return i;
>   }
>  
> @@ -428,86 +433,90 @@ static int get_mmio_atsd_reg(struct npu *npu)
>  
>  static void put_mmio_atsd_reg(struct npu *npu, int reg)
>  {
> - clear_bit(reg, &npu->mmio_atsd_usage);
> + clear_bit_unlock(reg, &npu->mmio_atsd_usage);
>  }
>  
>  /* MMIO ATSD register offsets */
>  #define XTS_ATSD_AVA  1
>  #define XTS_ATSD_STAT 2
>  
> -static int mmio_launch_invalidate(struct npu *npu, unsigned long launch,
> - unsigned long va)
> +static void mmio_launch_invalidate(struct mmio_atsd_reg *mmio_atsd_reg,
> + unsigned long launch, unsigned long va)
>  {
> - int mmio_atsd_reg;
> -
> - do {
> - mmio_atsd_reg = get_mmio_atsd_reg(npu);
> - cpu_relax();
> - } while (mmio_atsd_reg < 0);
> + struct npu *npu = mmio_atsd_reg->npu;
> + int reg = mmio_atsd_reg->reg;
>  
>   __raw_writeq(cpu_to_be64(va),
> - npu->mmio_atsd_regs[mmio_atsd_reg] + XTS_ATSD_AVA);
> + npu->mmio_atsd_regs[reg] + XTS_ATSD_AVA);
>   eieio();
> - __raw_writeq(cpu_to_be64(launch), npu->mmio_atsd_regs[mmio_atsd_reg]);
> -
> - return mmio_atsd_reg;
> + __raw_writeq(cpu_to_be64(launch), npu->mmio_atsd_regs[reg]);
>  }
>  
> -static int mmio_invalidate_pid(struct npu *npu, unsigned long pid, bool 
> flush)
> +static void mmio_invalidate_pid(struct mmio_atsd_reg 
> mmio_atsd_reg[NV_MAX_NPUS],
> + unsigned long pid, bool flush)
>  {
> + int i;
>   unsigned long launch;
>  
> - /* IS set to invalidate matching PID */
> - launch = PPC_BIT(12);
> + for (i = 0; i <= max_npu2_index; i++) {
> + if (mmio_atsd_reg[i].reg < 0)
> + continue;
> +
> + /* IS set to invalidate matching PID */
> + launch = PPC_BIT(12);
>  
> - /* PRS set to process-scoped */
> - launch |= PPC_BIT(13);
> + /* PRS set to process-scoped */
> + launch |= PPC_BIT(13);
>  
> - /* AP */
> - launch |= (u64) mmu_get_ap(mmu_virtual_psize) << PPC_BITLSHIFT(17);
> + /* AP */
> + launch |= (u64)
> + mmu_get_ap(mmu_virtual_psize) << PPC_BITLSHIFT(17);
>  
> - /* PID */
> - launch |= pid << PPC_BITLSHIFT(38);
> + /* PID */
> + launch |= pid << PPC_BITLSHIFT(38);
>  
> - /* No 
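
The locking rule the changelog describes -- take one register per NPU,
always in the same order, and hold them across all flushes -- can be
sketched as follows.  The names come from the diff above; the per-NPU
lookup is an assumption:

/* Acquire an ATSD register on every NPU, in NPU order -- sketch only. */
static void acquire_atsd_reg(struct npu_context *npu_context,
			     struct mmio_atsd_reg mmio_atsd_reg[NV_MAX_NPUS])
{
	int i;
	struct npu *npu;

	for (i = 0; i <= max_npu2_index; i++) {
		mmio_atsd_reg[i].reg = -1;
		npu = npu_context_to_npu(npu_context, i);	/* assumed */
		if (!npu)
			continue;

		/* spin for a free register; never release out of order */
		do {
			mmio_atsd_reg[i].reg = get_mmio_atsd_reg(npu);
			cpu_relax();
		} while (mmio_atsd_reg[i].reg < 0);
		mmio_atsd_reg[i].npu = npu;
	}
}

Because every thread acquires registers in the same global order and
releases them only after all flushes complete, the circular wait shown
in the two-thread example above can no longer form.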

Re: [RFC PATCH 1/1] powerpc/ftrace: Exclude real mode code from

2018-03-07 Thread Michael Ellerman
"Naveen N. Rao"  writes:

> We can't take a trap in most parts of real mode code. Instead of adding
> the 'notrace' annotation to all C functions that can be invoked from
> real mode, detect that we are in real mode on ftrace entry and bail
> out early.
>
> Signed-off-by: Naveen N. Rao 
> ---
> This RFC only handles -mprofile-kernel to demonstrate the approach being 
> considered. We will need to handle other ftrace entry if we decide to 
> continue down this path.

Paul and I were talking about having a paca flag for this, ie.
paca->safe_to_ftrace (or whatever). I'm not sure if you've talked to
him and decided this is a better approach.

I guess I'm 50/50 on which is better, they both have pluses and minuses.

cheers

> diff --git a/arch/powerpc/kernel/trace/ftrace_64_mprofile.S 
> b/arch/powerpc/kernel/trace/ftrace_64_mprofile.S
> index 3f3e81852422..ecc0e8e38ead 100644
> --- a/arch/powerpc/kernel/trace/ftrace_64_mprofile.S
> +++ b/arch/powerpc/kernel/trace/ftrace_64_mprofile.S
> @@ -56,6 +56,21 @@ _GLOBAL(ftrace_caller)
>  
>   /* Load special regs for save below */
>   mfmsr   r8
> +
> + /* Only proceed if we are not in real mode and can take interrupts */
> + andi.   r9, r8, MSR_IR|MSR_DR|MSR_RI
> + cmpdi   r9, MSR_IR|MSR_DR|MSR_RI
> + beq 1f
> + mflr    r8
> + mtctr   r8
> + REST_GPR(9, r1)
> + REST_GPR(8, r1)
> + addi    r1, r1, SWITCH_FRAME_SIZE
> + ld  r0, LRSAVE(r1)
> + mtlr    r0
> + bctr
> +
> +1:
>   mfctr   r9
>   mfxer   r10
>   mfcrr11
> -- 
> 2.16.1
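
For comparison, the paca-flag alternative mentioned above might look
roughly like this; only the field name safe_to_ftrace comes from the
mail, the helper and the asm-side test are assumptions:

/* C side: flip the flag around regions where tracing is unsafe. */
static inline void this_cpu_set_ftrace_safe(u8 safe)
{
	get_paca()->safe_to_ftrace = safe;	/* hypothetical u8 field */
}

/*
 * ftrace_caller would then test the flag instead of the MSR bits:
 *
 *	lbz	r9, PACA_SAFE_TO_FTRACE(r13)
 *	cmpdi	r9, 0
 *	beq	ftrace_no_trace
 */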


[PATCH 4/4] powerpc/xmon: Move empty plpar_set_ciabr() into plpar_wrappers.h

2018-03-07 Thread Michael Ellerman
Now that plpar_wrappers.h has an #ifdef PSERIES we can move the empty
version of plpar_set_ciabr() which xmon wants into there.

Signed-off-by: Michael Ellerman 
---
 arch/powerpc/include/asm/plpar_wrappers.h | 6 ++
 arch/powerpc/xmon/xmon.c  | 7 +--
 2 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/include/asm/plpar_wrappers.h 
b/arch/powerpc/include/asm/plpar_wrappers.h
index 9233b84f489a..96c1a46acbd0 100644
--- a/arch/powerpc/include/asm/plpar_wrappers.h
+++ b/arch/powerpc/include/asm/plpar_wrappers.h
@@ -334,6 +334,12 @@ static inline long plpar_get_cpu_characteristics(struct 
h_cpu_char_result *p)
return rc;
 }
 
+#else /* !CONFIG_PPC_PSERIES */
+
+static inline long plpar_set_ciabr(unsigned long ciabr)
+{
+   return 0;
+}
 #endif /* CONFIG_PPC_PSERIES */
 
 #endif /* _ASM_POWERPC_PLPAR_WRAPPERS_H */
diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c
index ecbb34eb1ddf..4118f723ed00 100644
--- a/arch/powerpc/xmon/xmon.c
+++ b/arch/powerpc/xmon/xmon.c
@@ -41,6 +41,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -61,12 +62,6 @@
 #include 
 #endif
 
-#if defined(CONFIG_PPC_SPLPAR)
-#include 
-#else
-static inline long plpar_set_ciabr(unsigned long ciabr) {return 0; };
-#endif
-
 #include "nonstdio.h"
 #include "dis-asm.h"
 
-- 
2.14.1



[PATCH 3/4] powerpc: Rename plapr routines to plpar

2018-03-07 Thread Michael Ellerman
Back in 2013 we added some hypercall wrappers which misspelled
"plpar" (P-series Logical PARtition) as "plapr".

Visually they're hard to distinguish and it almost doesn't matter, but
it is confusing when grepping to miss some calls because of the typo.

They've also started spreading, so before they take over let's fix
them all to be "plpar".

Signed-off-by: Michael Ellerman 
---
 arch/powerpc/include/asm/plpar_wrappers.h | 6 +++---
 arch/powerpc/platforms/pseries/setup.c| 2 +-
 arch/powerpc/platforms/pseries/smp.c  | 2 +-
 arch/powerpc/xmon/xmon.c  | 4 ++--
 4 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/include/asm/plpar_wrappers.h 
b/arch/powerpc/include/asm/plpar_wrappers.h
index 09cb26816b2d..9233b84f489a 100644
--- a/arch/powerpc/include/asm/plpar_wrappers.h
+++ b/arch/powerpc/include/asm/plpar_wrappers.h
@@ -305,17 +305,17 @@ static inline long enable_little_endian_exceptions(void)
return plpar_set_mode(1, H_SET_MODE_RESOURCE_LE, 0, 0);
 }
 
-static inline long plapr_set_ciabr(unsigned long ciabr)
+static inline long plpar_set_ciabr(unsigned long ciabr)
 {
return plpar_set_mode(0, H_SET_MODE_RESOURCE_SET_CIABR, ciabr, 0);
 }
 
-static inline long plapr_set_watchpoint0(unsigned long dawr0, unsigned long 
dawrx0)
+static inline long plpar_set_watchpoint0(unsigned long dawr0, unsigned long 
dawrx0)
 {
return plpar_set_mode(0, H_SET_MODE_RESOURCE_SET_DAWR, dawr0, dawrx0);
 }
 
-static inline long plapr_signal_sys_reset(long cpu)
+static inline long plpar_signal_sys_reset(long cpu)
 {
return plpar_hcall_norets(H_SIGNAL_SYS_RESET, cpu);
 }
diff --git a/arch/powerpc/platforms/pseries/setup.c 
b/arch/powerpc/platforms/pseries/setup.c
index 8ef396dd11ca..60b608eb8fed 100644
--- a/arch/powerpc/platforms/pseries/setup.c
+++ b/arch/powerpc/platforms/pseries/setup.c
@@ -739,7 +739,7 @@ static int pseries_set_dawr(unsigned long dawr, unsigned 
long dawrx)
/* PAPR says we can't set HYP */
dawrx &= ~DAWRX_HYP;
 
-   return  plapr_set_watchpoint0(dawr, dawrx);
+   return  plpar_set_watchpoint0(dawr, dawrx);
 }
 
 #define CMO_CHARACTERISTICS_TOKEN 44
diff --git a/arch/powerpc/platforms/pseries/smp.c 
b/arch/powerpc/platforms/pseries/smp.c
index d506bf661f0f..3df46123cce3 100644
--- a/arch/powerpc/platforms/pseries/smp.c
+++ b/arch/powerpc/platforms/pseries/smp.c
@@ -215,7 +215,7 @@ static int pseries_cause_nmi_ipi(int cpu)
hwcpu = get_hard_smp_processor_id(cpu);
}
 
-   if (plapr_signal_sys_reset(hwcpu) == H_SUCCESS)
+   if (plpar_signal_sys_reset(hwcpu) == H_SUCCESS)
return 1;
 
return 0;
diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c
index b6574b6f7d4a..ecbb34eb1ddf 100644
--- a/arch/powerpc/xmon/xmon.c
+++ b/arch/powerpc/xmon/xmon.c
@@ -64,7 +64,7 @@
 #if defined(CONFIG_PPC_SPLPAR)
 #include 
 #else
-static inline long plapr_set_ciabr(unsigned long ciabr) {return 0; };
+static inline long plpar_set_ciabr(unsigned long ciabr) {return 0; };
 #endif
 
 #include "nonstdio.h"
@@ -328,7 +328,7 @@ static void write_ciabr(unsigned long ciabr)
mtspr(SPRN_CIABR, ciabr);
return;
}
-   plapr_set_ciabr(ciabr);
+   plpar_set_ciabr(ciabr);
 }
 
 /**
-- 
2.14.1



[PATCH 2/4] powerpc/pseries: Make plpar_wrappers.h safe to include when PSERIES=n

2018-03-07 Thread Michael Ellerman
Currently plpar_wrappers.h is not safe to include when
CONFIG_PPC_PSERIES=n, or at least it can be, depending on other config
options and so on.

Fix that by wrapping the entire content in an ifdef.

Signed-off-by: Michael Ellerman 
---
 arch/powerpc/include/asm/plpar_wrappers.h | 4 
 1 file changed, 4 insertions(+)

diff --git a/arch/powerpc/include/asm/plpar_wrappers.h 
b/arch/powerpc/include/asm/plpar_wrappers.h
index 1776af9e0118..09cb26816b2d 100644
--- a/arch/powerpc/include/asm/plpar_wrappers.h
+++ b/arch/powerpc/include/asm/plpar_wrappers.h
@@ -2,6 +2,8 @@
 #ifndef _ASM_POWERPC_PLPAR_WRAPPERS_H
 #define _ASM_POWERPC_PLPAR_WRAPPERS_H
 
+#ifdef CONFIG_PPC_PSERIES
+
 #include 
 #include 
 
@@ -332,4 +334,6 @@ static inline long plpar_get_cpu_characteristics(struct 
h_cpu_char_result *p)
return rc;
 }
 
+#endif /* CONFIG_PPC_PSERIES */
+
 #endif /* _ASM_POWERPC_PLPAR_WRAPPERS_H */
-- 
2.14.1



[PATCH 1/4] powerpc/pseries: Move smp_query_cpu_stopped() etc. out of plpar_wrappers.h

2018-03-07 Thread Michael Ellerman
smp_query_cpu_stopped() and related #defines are currently in
plpar_wrappers.h. The function actually does an RTAS call, not an
hcall, and basically has nothing to do with plpar_wrappers.h.

Move it into pseries.h, where it can easily be used by the only two
callers in pseries/smp.c and pseries/hotplug-cpu.c.

Signed-off-by: Michael Ellerman 
---
 arch/powerpc/include/asm/plpar_wrappers.h | 8 
 arch/powerpc/platforms/pseries/pseries.h  | 8 
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/include/asm/plpar_wrappers.h 
b/arch/powerpc/include/asm/plpar_wrappers.h
index 55eddf50d149..1776af9e0118 100644
--- a/arch/powerpc/include/asm/plpar_wrappers.h
+++ b/arch/powerpc/include/asm/plpar_wrappers.h
@@ -9,14 +9,6 @@
 #include 
 #include 
 
-/* Get state of physical CPU from query_cpu_stopped */
-int smp_query_cpu_stopped(unsigned int pcpu);
-#define QCSS_STOPPED 0
-#define QCSS_STOPPING 1
-#define QCSS_NOT_STOPPED 2
-#define QCSS_HARDWARE_ERROR -1
-#define QCSS_HARDWARE_BUSY -2
-
 static inline long poll_pending(void)
 {
return plpar_hcall_norets(H_POLL_PENDING);
diff --git a/arch/powerpc/platforms/pseries/pseries.h 
b/arch/powerpc/platforms/pseries/pseries.h
index 1ae1d9f4dbe9..c73351cea276 100644
--- a/arch/powerpc/platforms/pseries/pseries.h
+++ b/arch/powerpc/platforms/pseries/pseries.h
@@ -27,6 +27,14 @@ extern int pSeries_machine_check_exception(struct pt_regs 
*regs);
 
 #ifdef CONFIG_SMP
 extern void smp_init_pseries(void);
+
+/* Get state of physical CPU from query_cpu_stopped */
+int smp_query_cpu_stopped(unsigned int pcpu);
+#define QCSS_STOPPED 0
+#define QCSS_STOPPING 1
+#define QCSS_NOT_STOPPED 2
+#define QCSS_HARDWARE_ERROR -1
+#define QCSS_HARDWARE_BUSY -2
 #else
 static inline void smp_init_pseries(void) { };
 #endif
-- 
2.14.1
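
A typical caller pattern for the moved interface, shown only as an
illustration of how the QCSS_* values are consumed:

#include "pseries.h"

static bool cpu_is_stopped(unsigned int pcpu)
{
	int status = smp_query_cpu_stopped(pcpu);

	if (status < 0) {	/* QCSS_HARDWARE_ERROR / QCSS_HARDWARE_BUSY */
		pr_warn("query-cpu-stopped-state failed for CPU %u: %d\n",
			pcpu, status);
		return false;
	}
	return status == QCSS_STOPPED;
}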



Re: [PATCH 1/2] powerpc/mm/keys: Move pte bits to correct headers

2018-03-07 Thread Aneesh Kumar K.V

On 03/08/2018 01:58 AM, Ram Pai wrote:

On Wed, Mar 07, 2018 at 07:06:44PM +0530, Aneesh Kumar K.V wrote:

Memory keys are supported only with hash translation mode. Instead of #ifdef in
generic code move the key related pte bits to respective headers

Signed-off-by: Aneesh Kumar K.V 
---
  arch/powerpc/include/asm/book3s/64/hash-4k.h  |  7 +++
  arch/powerpc/include/asm/book3s/64/hash-64k.h |  7 +++
  arch/powerpc/include/asm/book3s/64/pgtable.h  | 19 ---
  3 files changed, 14 insertions(+), 19 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/hash-4k.h 
b/arch/powerpc/include/asm/book3s/64/hash-4k.h
index fc3dc6a93939..4103bfc7c223 100644
--- a/arch/powerpc/include/asm/book3s/64/hash-4k.h
+++ b/arch/powerpc/include/asm/book3s/64/hash-4k.h
@@ -33,6 +33,13 @@
  #define H_PAGE_THP_HUGE 0x0
  #define H_PAGE_COMBO  0x0

+/* memory key bits, only 8 keys supported */
+#define H_PTE_PKEY_BIT0  0
+#define H_PTE_PKEY_BIT1  0
+#define H_PTE_PKEY_BIT2  _RPAGE_RSV3
+#define H_PTE_PKEY_BIT3  _RPAGE_RSV4
+#define H_PTE_PKEY_BIT4  _RPAGE_RSV5
+



If CONFIG_PPC_MEM_KEYS is not defined, all of them have to be 0.  How is
that handled here?


why? Conditional defines of pte bits always result in errors, like
checking for an overloaded key bit in some code path and taking the
wrong action.





  /* 8 bytes per each pte entry */
  #define H_PTE_FRAG_SIZE_SHIFT  (H_PTE_INDEX_SIZE + 3)
  #define H_PTE_FRAG_NR (PAGE_SIZE >> H_PTE_FRAG_SIZE_SHIFT)
diff --git a/arch/powerpc/include/asm/book3s/64/hash-64k.h 
b/arch/powerpc/include/asm/book3s/64/hash-64k.h
index e53728ff29a0..bb880c97b87d 100644
--- a/arch/powerpc/include/asm/book3s/64/hash-64k.h
+++ b/arch/powerpc/include/asm/book3s/64/hash-64k.h
@@ -16,6 +16,13 @@
  #define H_PAGE_BUSY   _RPAGE_RPN44 /* software: PTE & hash are busy */
  #define H_PAGE_HASHPTE   _RPAGE_RPN43  /* PTE has associated HPTE */

+/* memory key bits. */
+#define H_PTE_PKEY_BIT0  _RPAGE_RSV1
+#define H_PTE_PKEY_BIT1  _RPAGE_RSV2
+#define H_PTE_PKEY_BIT2  _RPAGE_RSV3
+#define H_PTE_PKEY_BIT3  _RPAGE_RSV4
+#define H_PTE_PKEY_BIT4  _RPAGE_RSV5
+


same comment as above.



-aneesh
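
A contrived example of the hazard Aneesh describes with conditional
defines; the overloading bit and helper names are hypothetical:

#ifdef CONFIG_PPC_MEM_KEYS
#define H_PTE_PKEY_BIT4	_RPAGE_RSV5
#else
#define H_PTE_PKEY_BIT4	0		/* conditional define */
#endif

/* elsewhere, the "free" reserved bit gets reused when keys are off */
#ifndef CONFIG_PPC_MEM_KEYS
#define H_PAGE_SOMETHING	_RPAGE_RSV5	/* hypothetical overload */
#endif

/*
 * A test like (pte & H_PTE_PKEY_BIT4) now silently matches the
 * overloaded bit in one config and constant-folds to 0 in the other --
 * exactly the "wrong action" failure mode described above.
 */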



Re: [PATCH 00/14] numa aware allocation for pacas, stacks, pagetables

2018-03-07 Thread Nicholas Piggin
On Wed, 07 Mar 2018 21:50:04 +1100
Michael Ellerman  wrote:

> Nicholas Piggin  writes:
> 
> > This series allows numa aware allocations for various early data
> > structures for radix. Hash still has a bolted SLB limitation that
> > prevents at least pacas and stacks from node-affine allocations.
> >
> > Fixed up a number of bugs, got pSeries working, added a couple more
> > cases where page tables can be allocated node-local.  
> 
> Few problems in here:
> 
> FAILURE kernel-build-linux » powerpc,gcc_ubuntu_be,pmac32
>   arch/powerpc/kernel/prom.c:748:2: error: implicit declaration of function 
> 'allocate_paca_ptrs' [-Werror=implicit-function-declaration]
> 
> FAILURE kernel-build-linux » powerpc,gcc_ubuntu_le,powernv
>   arch/powerpc/include/asm/paca.h:49:33: error: 'struct paca_struct' has no 
> member named 'lppaca_ptr'
>   arch/powerpc/include/asm/paca.h:49:33: error: 'struct paca_struct' has no 
> member named 'lppaca_ptr'
> 
> Did I miss a follow-up or something?

Here's a patch that applies to "powerpc/64: defer paca allocation
until memory topology is discovered". The first hunk fixes the ppc32
issue, and the second hunk avoids freeing the cpu_to_phys_id array
if the platform didn't allocate it. But I've just realized that
should go into the previous patch (which is missing the
memblock_free).
--

diff --git a/arch/powerpc/include/asm/paca.h b/arch/powerpc/include/asm/paca.h
index 8aaa697701f1..fff29b8057d9 100644
--- a/arch/powerpc/include/asm/paca.h
+++ b/arch/powerpc/include/asm/paca.h
@@ -258,7 +258,8 @@ extern void free_unused_pacas(void);
 
 #else /* CONFIG_PPC64 */
 
-static inline void allocate_pacas(void) { };
+static inline void allocate_paca_ptrs(void) { };
+static inline void allocate_paca(int cpu) { };
 static inline void free_unused_pacas(void) { };
 
 #endif /* CONFIG_PPC64 */
diff --git a/arch/powerpc/kernel/setup-common.c 
b/arch/powerpc/kernel/setup-common.c
index 56f7a2b793e0..2ba05acc2973 100644
--- a/arch/powerpc/kernel/setup-common.c
+++ b/arch/powerpc/kernel/setup-common.c
@@ -854,8 +854,10 @@ static void smp_setup_pacas(void)
set_hard_smp_processor_id(cpu, cpu_to_phys_id[cpu]);
}
 
-   memblock_free(__pa(cpu_to_phys_id), nr_cpu_ids * sizeof(u32));
-   cpu_to_phys_id = NULL;
+   if (cpu_to_phys_id) {
+   memblock_free(__pa(cpu_to_phys_id), nr_cpu_ids * sizeof(u32));
+   cpu_to_phys_id = NULL;
+   }
 }
 #endif
 



Re: [PATCH] powerpc/powernv/mce: Don't silently restart the machine

2018-03-07 Thread Stewart Smith
Balbir Singh  writes:
> On MCE the current code will restart the machine with
> ppc_md.restart(). This case was extremely unlikely since
> prior to that a skiboot call is made and that resulted in
> a checkstop for analysis.
>
> With newer skiboots, on P9 we don't checkstop the box by
> default, instead we return back to the kernel to extract
> useful information at the time of the MCE. While we still
> get this information, this patch converts the restart to
> a panic(), so that, if configured, a dump can be taken and
> we can track and probably debug the potential issue causing
> the MCE.

This will likely change again, but I can send a patch that changes the
comment (along with the logic of decoding it all and having enough
information to make sensible decisions). But... I kind of don't want to
bikeshed a comment to death :)

I reckon the panic() here is the right thing to do no matter
what.

Reviewed-by: Stewart Smith 

-- 
Stewart Smith
OPAL Architect, IBM.



[PATCH] powerpc/powernv/mce: Don't silently restart the machine

2018-03-07 Thread Balbir Singh
On MCE the current code will restart the machine with
ppc_md.restart(). This case was extremely unlikely since
prior to that a skiboot call is made and that resulted in
a checkstop for analysis.

With newer skiboots, on P9 we don't checkstop the box by
default, instead we return back to the kernel to extract
useful information at the time of the MCE. While we still
get this information, this patch converts the restart to
a panic(), so that, if configured, a dump can be taken and
we can track and probably debug the potential issue causing
the MCE.

Signed-off-by: Balbir Singh 
Reviewed-by: Nicholas Piggin 
---

Changelog
- Change panic to use the passed msg, instead of a hard
coded MCE one, since this code path is common to both
HMI's and MCE's

 arch/powerpc/platforms/powernv/opal.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/powernv/opal.c 
b/arch/powerpc/platforms/powernv/opal.c
index c15182765ff5..516e23de5a3d 100644
--- a/arch/powerpc/platforms/powernv/opal.c
+++ b/arch/powerpc/platforms/powernv/opal.c
@@ -490,9 +490,12 @@ void pnv_platform_error_reboot(struct pt_regs *regs, const 
char *msg)
 *opal to trigger checkstop explicitly for error analysis.
 *The FSP PRD component would have already got notified
 *about this error through other channels.
+* 4. We are running on a newer skiboot that by default does
+*not cause a checkstop, drops us back to the kernel to
+*extract context and state at the time of the error.
 */
 
-   ppc_md.restart(NULL);
+   panic(msg);
 }
 
 int opal_machine_check(struct pt_regs *regs)
-- 
2.13.6



Re: [RFC v2 1/3] hotplug/mobility: Apply assoc updates for Post Migration Topo

2018-03-07 Thread Michael Bringmann
Accepted Tyrel's change to dlpar_cpu_readd_by_index.  The amendment
will be included in the next version of the RFC.

Michael

On 03/07/2018 01:32 PM, Tyrel Datwyler wrote:
> On 02/26/2018 12:52 PM, Michael Bringmann wrote:
>> hotplug/mobility: Recognize more changes to the associativity of
>> memory blocks described by the 'ibm,dynamic-memory' and 'cpu'
>> properties when processing the topology of LPARS in Post Migration
>> events.  Previous efforts only recognized whether a memory block's
>> assignment had changed in the property.  Changes here include:
>>
>> * Checking the aa_index values of the old/new properties and 'readd'
>>   any block for which the setting has changed.
>> * Checking for changes in cpu associativity and making 'readd' calls
>>   when differences are observed.
>>
>> Signed-off-by: Michael Bringmann 
>> ---
>> Changes in RFC:
>>   -- Simplify code to update CPU nodes during mobility checks.
>>  Remove functions to generate extra HP_ELOG messages in favor
>>  of direct function calls to dlpar_cpu_readd_by_index.
>>   -- Move check for "cpu" node type from pseries_update_cpu to
>>  pseries_smp_notifier in 'hotplug-cpu.c'
>>   -- Remove functions 'pseries_memory_readd_by_index' and
>>  'pseries_cpu_readd_by_index' as no longer needed outside of
>>  'mobility.c'.
>> ---
>>  arch/powerpc/platforms/pseries/hotplug-cpu.c|   69 
>> +++
>>  arch/powerpc/platforms/pseries/hotplug-memory.c |6 ++
>>  2 files changed, 75 insertions(+)
>>
>> diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c 
>> b/arch/powerpc/platforms/pseries/hotplug-cpu.c
>> index a7d14aa7..91ef22a 100644
>> --- a/arch/powerpc/platforms/pseries/hotplug-cpu.c
>> +++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c
>> @@ -636,6 +636,27 @@ static int dlpar_cpu_remove_by_index(u32 drc_index)
>>  return rc;
>>  }
>>  
>> +static int dlpar_cpu_readd_by_index(u32 drc_index)
>> +{
>> +int rc = 0;
>> +
>> +pr_info("Attempting to update CPU, drc index %x\n", drc_index);
>> +
>> +if (dlpar_cpu_remove_by_index(drc_index))
>> +rc = -EINVAL;
>> +else if (dlpar_cpu_add(drc_index))
>> +rc = -EINVAL;
> 
> While this if block appears to do the right thing it looks a little icky to 
> me as I find it hard to follow the flow. To me the natural way of thinking 
> about this is if the remove succeeds then add the cpu back. Further, you are 
> masking the return codes from the dlpar code by reporting EINVAL instead of 
> capturing the actual return values. EINVAL implies that there was something 
> wrong with the drc_index supplied. I would do something more like the 
> following which captures the return codes and only relies on a single 
> conditional if statement.
> 
> rc = dlpar_cpu_remove_by_index(drc_index);
> if (!rc)
>   rc = dlpar_cpu_add(drc_index);
> 
> -Tyrel
> 
>> +
>> +if (rc)
>> +pr_info("Failed to update cpu at drc_index %lx\n",
>> +(unsigned long int)drc_index);
>> +else
>> +pr_info("CPU at drc_index %lx was updated\n",
>> +(unsigned long int)drc_index);
>> +
>> +return rc;
>> +}
>> +
>>  static int find_dlpar_cpus_to_remove(u32 *cpu_drcs, int cpus_to_remove)
>>  {
>>  struct device_node *dn;
>> @@ -826,6 +847,9 @@ int dlpar_cpu(struct pseries_hp_errorlog *hp_elog)
>>  else
>>  rc = -EINVAL;
>>  break;
>> +case PSERIES_HP_ELOG_ACTION_READD:
>> +rc = dlpar_cpu_readd_by_index(drc_index);
>> +break;
>>  default:
>>  pr_err("Invalid action (%d) specified\n", hp_elog->action);
>>  rc = -EINVAL;
>> @@ -876,12 +900,53 @@ static ssize_t dlpar_cpu_release(const char *buf, 
>> size_t count)
>>  
>>  #endif /* CONFIG_ARCH_CPU_PROBE_RELEASE */
>>  
>> +static int pseries_update_cpu(struct of_reconfig_data *pr)
>> +{
>> +u32 old_entries, new_entries;
>> +__be32 *p, *old_assoc, *new_assoc;
>> +int rc = 0;
>> +
>> +/* So far, we only handle the 'ibm,associativity' property,
>> + * here.
>> + * The first int of the property is the number of domains
>> + * described.  This is followed by an array of level values.
>> + */
>> +p = (__be32 *) pr->old_prop->value;
>> +if (!p)
>> +return -EINVAL;
>> +old_entries = be32_to_cpu(*p++);
>> +old_assoc = p;
>> +
>> +p = (__be32 *)pr->prop->value;
>> +if (!p)
>> +return -EINVAL;
>> +new_entries = be32_to_cpu(*p++);
>> +new_assoc = p;
>> +
>> +if (old_entries == new_entries) {
>> +int sz = old_entries * sizeof(int);
>> +
>> +if (!memcmp(old_assoc, new_assoc, sz))
>> +rc = dlpar_cpu_readd_by_index(
>> +be32_to_cpu(pr->dn->phandle));
>> +
>> +} else {
>> +rc = dlpar_cpu_readd_by_index(
>> +

Re: [PATCH 1/6] Docs: dt: add fsl-mc iommu-parent device-tree binding

2018-03-07 Thread Rob Herring
On Mon, Mar 05, 2018 at 07:59:21PM +0530, Nipun Gupta wrote:
> The existing IOMMU bindings cannot be used to specify the relationship
> between fsl-mc devices and IOMMUs. This patch adds a binding for
> mapping fsl-mc devices to IOMMUs, using a new iommu-parent property.
> 
> Signed-off-by: Nipun Gupta 
> ---
>  .../devicetree/bindings/misc/fsl,qoriq-mc.txt  | 31 
> ++
>  1 file changed, 31 insertions(+)
> 
> diff --git a/Documentation/devicetree/bindings/misc/fsl,qoriq-mc.txt 
> b/Documentation/devicetree/bindings/misc/fsl,qoriq-mc.txt
> index 6611a7c..011c7d6 100644
> --- a/Documentation/devicetree/bindings/misc/fsl,qoriq-mc.txt
> +++ b/Documentation/devicetree/bindings/misc/fsl,qoriq-mc.txt
> @@ -9,6 +9,24 @@ blocks that can be used to create functional hardware 
> objects/devices
>  such as network interfaces, crypto accelerator instances, L2 switches,
>  etc.
>  
> +For an overview of the DPAA2 architecture and fsl-mc bus see:
> +drivers/staging/fsl-mc/README.txt
> +
> +As described in the above overview, all DPAA2 objects in a DPRC share the
> +same hardware "isolation context" and a 10-bit value called an ICID
> +(isolation context id) is expressed by the hardware to identify
> +the requester.
> +
> +The generic 'iommus' property cannot be used to describe the relationship
> +between fsl-mc and IOMMUs, so an iommu-parent property is used to define
> +the same.

Why not? It is just a link between 2 nodes.

> +
> +For generic IOMMU bindings, see
> +Documentation/devicetree/bindings/iommu/iommu.txt.
> +
> +For arm-smmu binding, see:
> +Documentation/devicetree/bindings/iommu/arm,smmu.txt.
> +
>  Required properties:
>  
>  - compatible
> @@ -88,14 +106,27 @@ Sub-nodes:
>Value type: <phandle>
>Definition: Specifies the phandle to the PHY device node
>associated with this dpmac.
> +Optional properties:
> +
> +- iommu-parent: Maps the devices on fsl-mc bus to an IOMMU.
> +  The property specifies the IOMMU behind which the devices on
> +  fsl-mc bus are residing.

If you want a generic property, this should be documented in the common 
binding.

Couldn't you have more than 1 IOMMU upstream of a MC?

>  
>  Example:
>  
> +smmu: iommu@500 {
> +   compatible = "arm,mmu-500";
> +   #iommu-cells = <1>;
> +   stream-match-mask = <0x7C00>;
> +   ...
> +};
> +
>  fsl_mc: fsl-mc@80c00 {
>  compatible = "fsl,qoriq-mc";
>  reg = <0x0008 0x0c00 0 0x40>,/* MC portal base */
><0x 0x0834 0 0x4>; /* MC control reg */
>msi-parent = <&its>;
> +iommu-parent = <&smmu>;
>  #address-cells = <3>;
>  #size-cells = <1>;
>  
> -- 
> 1.9.1
> 
> 
> ___
> linux-arm-kernel mailing list
> linux-arm-ker...@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel


Re: [PATCH 17/21] powerpc: Add missing prototype for sys_debug_setcontext

2018-03-07 Thread Mathieu Malaterre
On Sun, Mar 4, 2018 at 11:54 AM, Michael Ellerman  wrote:
> Mathieu Malaterre  writes:
>
>> In commit 81e7009ea46c ("powerpc: merge ppc signal.c and ppc64 signal32.c")
>> the function sys_debug_setcontext was added without a prototype.
>>
>> Fix compilation warning (treated as error in W=1):
>>
>>   CC  arch/powerpc/kernel/signal_32.o
>> arch/powerpc/kernel/signal_32.c:1227:5: error: no previous prototype for 
>> ‘sys_debug_setcontext’ [-Werror=missing-prototypes]
>>  int sys_debug_setcontext(struct ucontext __user *ctx,
>>  ^~~~
>> cc1: all warnings being treated as errors
>
> This one should actually be using the SYSCALL_DEFINE syntax, so that it
> can be used with CONFIG_FTRACE_SYSCALLS.
>
> See eg. our mmap:
>
>   SYSCALL_DEFINE6(mmap, unsigned long, addr, size_t, len,
> unsigned long, prot, unsigned long, flags,
> unsigned long, fd, off_t, offset)
>   {
> return do_mmap2(addr, len, prot, flags, fd, offset, PAGE_SHIFT);
>   }
>
>
> We probably still need this patch, but I'm not entirely sure because the
> SYSCALL_DEFINE macro does all sorts of shenanigans.

I see. Could you please drop this patch then. The patch does not look
that trivial anymore. I'll need to dig a bit more on how to do the
syscall stuff with a 7 params function.

Thanks


[PATCH v2 05/21] powerpc: Avoid comparison of unsigned long >= 0 in pfn_valid

2018-03-07 Thread Mathieu Malaterre
Rewrite comparison since all values compared are of type `unsigned long`.

Instead of using unsigned properties and rewriting the original code as:
(originally suggested by Segher Boessenkool )

  #define pfn_valid(pfn) \
   (((pfn) - ARCH_PFN_OFFSET) < (max_mapnr - ARCH_PFN_OFFSET))

Prefer a static inline function to make code as readable as possible.

Fix a warning (treated as error in W=1):

  CC  arch/powerpc/kernel/irq.o
In file included from ./include/linux/bug.h:5:0,
 from ./include/linux/cpumask.h:13,
 from ./include/linux/smp.h:13,
 from ./include/linux/kernel_stat.h:5,
 from arch/powerpc/kernel/irq.c:35:
./include/linux/dma-mapping.h: In function ‘dma_map_resource’:
./arch/powerpc/include/asm/page.h:129:32: error: comparison of unsigned 
expression >= 0 is always true [-Werror=type-limits]
 #define pfn_valid(pfn)  ((pfn) >= ARCH_PFN_OFFSET && (pfn) < max_mapnr)
^
Suggested-by: Christophe Leroy 
Signed-off-by: Mathieu Malaterre 
---
 arch/powerpc/include/asm/page.h | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/page.h b/arch/powerpc/include/asm/page.h
index 8da5d4c1cab2..6f74938483b7 100644
--- a/arch/powerpc/include/asm/page.h
+++ b/arch/powerpc/include/asm/page.h
@@ -126,7 +126,15 @@ extern long long virt_phys_offset;
 
 #ifdef CONFIG_FLATMEM
 #define ARCH_PFN_OFFSET  ((unsigned long)(MEMORY_START >> PAGE_SHIFT))
-#define pfn_valid(pfn) ((pfn) >= ARCH_PFN_OFFSET && (pfn) < max_mapnr)
+#ifndef __ASSEMBLY__
+extern unsigned long max_mapnr;
+static inline bool pfn_valid(unsigned long pfn)
+{
+   unsigned long min_pfn = ARCH_PFN_OFFSET;
+
+   return pfn >= min_pfn && pfn < max_mapnr;
+}
+#endif
 #endif
 
 #define virt_to_pfn(kaddr) (__pa(kaddr) >> PAGE_SHIFT)
-- 
2.11.0
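
For reference, the single-comparison form suggested by Segher relies on
unsigned wrap-around: if pfn is below ARCH_PFN_OFFSET, the subtraction
wraps to a huge value and the test still fails.  A standalone
demonstration (illustrative only):

#include <stdio.h>

int main(void)
{
	unsigned long min = 16, max = 1024;
	unsigned long below = 7, inside = 100, above = 4096;

	/* below - min wraps to a huge value, so the test correctly fails */
	printf("%d %d %d\n",
	       (below - min) < (max - min),	/* 0 */
	       (inside - min) < (max - min),	/* 1 */
	       (above - min) < (max - min));	/* 0 */
	return 0;
}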



[PATCH v2 15/21] powerpc: Make function MMU_setup static

2018-03-07 Thread Mathieu Malaterre
Since function `MMU_setup` is not meant to be exported, change the
signature to `static`. Fix warning (treated as error with W=1):

  CC  kernel/sys.o
arch/powerpc/mm/init_32.c:102:13: error: no previous prototype for ‘MMU_setup’ 
[-Werror=missing-prototypes]
 void __init MMU_setup(void)
 ^
cc1: all warnings being treated as errors

Signed-off-by: Mathieu Malaterre 
---
 arch/powerpc/mm/init_32.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/mm/init_32.c b/arch/powerpc/mm/init_32.c
index 6419b33ca309..a2bf6965d04f 100644
--- a/arch/powerpc/mm/init_32.c
+++ b/arch/powerpc/mm/init_32.c
@@ -99,7 +99,7 @@ unsigned long __max_low_memory = MAX_LOW_MEM;
 /*
  * Check for command-line options that affect what MMU_init will do.
  */
-void __init MMU_setup(void)
+static void __init MMU_setup(void)
 {
/* Check for nobats option (used in mapin_ram). */
if (strstr(boot_command_line, "nobats")) {
-- 
2.11.0



Re: [PATCH 1/2] powerpc/mm/keys: Move pte bits to correct headers

2018-03-07 Thread Ram Pai
On Wed, Mar 07, 2018 at 07:06:44PM +0530, Aneesh Kumar K.V wrote:
> Memory keys are supported only with hash translation mode. Instead of #ifdef 
> in
> generic code move the key related pte bits to respective headers
> 
> Signed-off-by: Aneesh Kumar K.V 
> ---
>  arch/powerpc/include/asm/book3s/64/hash-4k.h  |  7 +++
>  arch/powerpc/include/asm/book3s/64/hash-64k.h |  7 +++
>  arch/powerpc/include/asm/book3s/64/pgtable.h  | 19 ---
>  3 files changed, 14 insertions(+), 19 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/book3s/64/hash-4k.h 
> b/arch/powerpc/include/asm/book3s/64/hash-4k.h
> index fc3dc6a93939..4103bfc7c223 100644
> --- a/arch/powerpc/include/asm/book3s/64/hash-4k.h
> +++ b/arch/powerpc/include/asm/book3s/64/hash-4k.h
> @@ -33,6 +33,13 @@
>  #define H_PAGE_THP_HUGE 0x0
>  #define H_PAGE_COMBO 0x0
> 
> +/* memory key bits, only 8 keys supported */
> +#define H_PTE_PKEY_BIT0  0
> +#define H_PTE_PKEY_BIT1  0
> +#define H_PTE_PKEY_BIT2  _RPAGE_RSV3
> +#define H_PTE_PKEY_BIT3  _RPAGE_RSV4
> +#define H_PTE_PKEY_BIT4  _RPAGE_RSV5
> +


If CONFIG_PPC_MEM_KEYS is not defined, all of them have to be 0.  How is
that handled here? 

>  /* 8 bytes per each pte entry */
>  #define H_PTE_FRAG_SIZE_SHIFT  (H_PTE_INDEX_SIZE + 3)
>  #define H_PTE_FRAG_NR  (PAGE_SIZE >> H_PTE_FRAG_SIZE_SHIFT)
> diff --git a/arch/powerpc/include/asm/book3s/64/hash-64k.h 
> b/arch/powerpc/include/asm/book3s/64/hash-64k.h
> index e53728ff29a0..bb880c97b87d 100644
> --- a/arch/powerpc/include/asm/book3s/64/hash-64k.h
> +++ b/arch/powerpc/include/asm/book3s/64/hash-64k.h
> @@ -16,6 +16,13 @@
>  #define H_PAGE_BUSY  _RPAGE_RPN44 /* software: PTE & hash are busy */
>  #define H_PAGE_HASHPTE   _RPAGE_RPN43/* PTE has associated HPTE */
> 
> +/* memory key bits. */
> +#define H_PTE_PKEY_BIT0  _RPAGE_RSV1
> +#define H_PTE_PKEY_BIT1  _RPAGE_RSV2
> +#define H_PTE_PKEY_BIT2  _RPAGE_RSV3
> +#define H_PTE_PKEY_BIT3  _RPAGE_RSV4
> +#define H_PTE_PKEY_BIT4  _RPAGE_RSV5
> +

same comment as above.

RP
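
For reference, one shape the fallback could take so that the
!CONFIG_PPC_MEM_KEYS case stays zero, illustrating the concern above (a
sketch only, not part of the patch under review):

/* Sketch: keep the pkey pte bits at 0 when memory keys are disabled. */
#ifdef CONFIG_PPC_MEM_KEYS
#define H_PTE_PKEY_BIT2	_RPAGE_RSV3
#define H_PTE_PKEY_BIT3	_RPAGE_RSV4
#define H_PTE_PKEY_BIT4	_RPAGE_RSV5
#else
#define H_PTE_PKEY_BIT2	0
#define H_PTE_PKEY_BIT3	0
#define H_PTE_PKEY_BIT4	0
#endif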



Re: [RFC v2 1/3] hotplug/mobility: Apply assoc updates for Post Migration Topo

2018-03-07 Thread Tyrel Datwyler
On 02/26/2018 12:52 PM, Michael Bringmann wrote:
> hotplug/mobility: Recognize more changes to the associativity of
> memory blocks described by the 'ibm,dynamic-memory' and 'cpu'
> properties when processing the topology of LPARs in Post Migration
> events.  Previous efforts only recognized whether a memory block's
> assignment had changed in the property.  Changes here include:
> 
> * Checking the aa_index values of the old/new properties and 'readd'
>   any block for which the setting has changed.
> * Checking for changes in cpu associativity and making 'readd' calls
>   when differences are observed.
> 
> Signed-off-by: Michael Bringmann 
> ---
> Changes in RFC:
>   -- Simplify code to update CPU nodes during mobility checks.
>  Remove functions to generate extra HP_ELOG messages in favor
>  of direct function calls to dlpar_cpu_readd_by_index.
>   -- Move check for "cpu" node type from pseries_update_cpu to
>  pseries_smp_notifier in 'hotplug-cpu.c'
>   -- Remove functions 'pseries_memory_readd_by_index' and
>  'pseries_cpu_readd_by_index' as no longer needed outside of
>  'mobility.c'.
> ---
>  arch/powerpc/platforms/pseries/hotplug-cpu.c|   69 
> +++
>  arch/powerpc/platforms/pseries/hotplug-memory.c |6 ++
>  2 files changed, 75 insertions(+)
> 
> diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c 
> b/arch/powerpc/platforms/pseries/hotplug-cpu.c
> index a7d14aa7..91ef22a 100644
> --- a/arch/powerpc/platforms/pseries/hotplug-cpu.c
> +++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c
> @@ -636,6 +636,27 @@ static int dlpar_cpu_remove_by_index(u32 drc_index)
>   return rc;
>  }
>  
> +static int dlpar_cpu_readd_by_index(u32 drc_index)
> +{
> + int rc = 0;
> +
> + pr_info("Attempting to update CPU, drc index %x\n", drc_index);
> +
> + if (dlpar_cpu_remove_by_index(drc_index))
> + rc = -EINVAL;
> + else if (dlpar_cpu_add(drc_index))
> + rc = -EINVAL;

While this if block appears to do the right thing, it looks a little icky to me
and I find it hard to follow the flow. To me the natural way of thinking about
this is: if the remove succeeds, then add the cpu back. Further, you are
masking the return codes from the dlpar code by reporting EINVAL instead of
capturing the actual return values. EINVAL implies that there was something
wrong with the drc_index supplied. I would do something more like the
following, which captures the return codes and relies on a single conditional:

rc = dlpar_cpu_remove_by_index(drc_index);
if (!rc)
rc = dlpar_cpu_add(drc_index);

-Tyrel

> +
> + if (rc)
> + pr_info("Failed to update cpu at drc_index %lx\n",
> + (unsigned long int)drc_index);
> + else
> + pr_info("CPU at drc_index %lx was updated\n",
> + (unsigned long int)drc_index);
> +
> + return rc;
> +}
> +
>  static int find_dlpar_cpus_to_remove(u32 *cpu_drcs, int cpus_to_remove)
>  {
>   struct device_node *dn;
> @@ -826,6 +847,9 @@ int dlpar_cpu(struct pseries_hp_errorlog *hp_elog)
>   else
>   rc = -EINVAL;
>   break;
> + case PSERIES_HP_ELOG_ACTION_READD:
> + rc = dlpar_cpu_readd_by_index(drc_index);
> + break;
>   default:
>   pr_err("Invalid action (%d) specified\n", hp_elog->action);
>   rc = -EINVAL;
> @@ -876,12 +900,53 @@ static ssize_t dlpar_cpu_release(const char *buf, 
> size_t count)
>  
>  #endif /* CONFIG_ARCH_CPU_PROBE_RELEASE */
>  
> +static int pseries_update_cpu(struct of_reconfig_data *pr)
> +{
> + u32 old_entries, new_entries;
> + __be32 *p, *old_assoc, *new_assoc;
> + int rc = 0;
> +
> + /* So far, we only handle the 'ibm,associativity' property,
> +  * here.
> +  * The first int of the property is the number of domains
> +  * described.  This is followed by an array of level values.
> +  */
> + p = (__be32 *) pr->old_prop->value;
> + if (!p)
> + return -EINVAL;
> + old_entries = be32_to_cpu(*p++);
> + old_assoc = p;
> +
> + p = (__be32 *)pr->prop->value;
> + if (!p)
> + return -EINVAL;
> + new_entries = be32_to_cpu(*p++);
> + new_assoc = p;
> +
> + if (old_entries == new_entries) {
> + int sz = old_entries * sizeof(int);
> +
> + if (!memcmp(old_assoc, new_assoc, sz))
> + rc = dlpar_cpu_readd_by_index(
> + be32_to_cpu(pr->dn->phandle));
> +
> + } else {
> + rc = dlpar_cpu_readd_by_index(
> + be32_to_cpu(pr->dn->phandle));
> + }
> +
> + return rc;
> +}
> +
>  static int pseries_smp_notifier(struct notifier_block *nb,
>   unsigned long action, void *data)
>  {
>   struct of_reconfig_data *rd = data;

Re: [RFC PATCH 1/1] powerpc/ftrace: Exclude real mode code from

2018-03-07 Thread Steven Rostedt
On Thu, 08 Mar 2018 00:07:07 +0530
"Naveen N. Rao"  wrote:

> Yes, that's negligible.
> Though, to be honest, I will have to introduce a 'mfmsr' for the older 
> -pg variant. I still think that the improved reliability far outweighs 
> the minor slowdown there.

In that case, can you introduce a read_mostly variable that can be
tested before calling the mfmsr. Why punish normal ftrace tracing if
kvm is not enabled or running?

Both should probably have an #ifdef CONFIG_KVM encapsulating the code.

-- Steve
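
A sketch in C of the gate being suggested (the flag name and helper are
hypothetical; the flag would be set on guest entry and cleared on exit):

#ifdef CONFIG_KVM
extern bool kvm_hv_guest_active __read_mostly;
#endif

static inline bool ftrace_entry_unsafe(void)
{
#ifdef CONFIG_KVM
	unsigned long msr;

	/* Cheap test first; skip the mfmsr unless a guest may be running. */
	if (!READ_ONCE(kvm_hv_guest_active))
		return false;
	msr = mfmsr();
	return (msr & (MSR_IR | MSR_DR | MSR_RI)) !=
	       (MSR_IR | MSR_DR | MSR_RI);
#else
	return false;
#endif
}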



Re: [RFC PATCH 1/1] powerpc/ftrace: Exclude real mode code from

2018-03-07 Thread Naveen N. Rao

Hi Steve,

Steven Rostedt wrote:

On Wed,  7 Mar 2018 22:16:19 +0530
"Naveen N. Rao"  wrote:


We can't take a trap in most parts of real mode code. Instead of adding
the 'notrace' annotation to all C functions that can be invoked from
real mode, detect that we are in real mode on ftrace entry and return
back.

Signed-off-by: Naveen N. Rao 
---
This RFC only handles -mprofile-kernel to demonstrate the approach being 
considered. We will need to handle other ftrace entry if we decide to 
continue down this path.


I do prefer this trade off.


Great, thanks!





diff --git a/arch/powerpc/kernel/trace/ftrace_64_mprofile.S 
b/arch/powerpc/kernel/trace/ftrace_64_mprofile.S
index 3f3e81852422..ecc0e8e38ead 100644
--- a/arch/powerpc/kernel/trace/ftrace_64_mprofile.S
+++ b/arch/powerpc/kernel/trace/ftrace_64_mprofile.S
@@ -56,6 +56,21 @@ _GLOBAL(ftrace_caller)
 
 	/* Load special regs for save below */

mfmsr   r8
+
+   /* Only proceed if we are not in real mode and can take interrupts */
+   andi.   r9, r8, MSR_IR|MSR_DR|MSR_RI
+   cmpdi   r9, MSR_IR|MSR_DR|MSR_RI
+   beq 1f


OK, I assume this check and branch is negligible compared to the mfmsr
call?


Yes, that's negligible.
Though, to be honest, I will have to introduce a 'mfmsr' for the older 
-pg variant. I still think that the improved reliability far outweighs 
the minor slowdown there.


- Naveen




Re: [RFC PATCH 1/1] powerpc/ftrace: Exclude real mode code from

2018-03-07 Thread Steven Rostedt
On Wed,  7 Mar 2018 22:16:19 +0530
"Naveen N. Rao"  wrote:

> We can't take a trap in most parts of real mode code. Instead of adding
> the 'notrace' annotation to all C functions that can be invoked from
> real mode, detect that we are in real mode on ftrace entry and return
> back.
> 
> Signed-off-by: Naveen N. Rao 
> ---
> This RFC only handles -mprofile-kernel to demonstrate the approach being 
> considered. We will need to handle other ftrace entry if we decide to 
> continue down this path.

I do prefer this trade off.


> diff --git a/arch/powerpc/kernel/trace/ftrace_64_mprofile.S 
> b/arch/powerpc/kernel/trace/ftrace_64_mprofile.S
> index 3f3e81852422..ecc0e8e38ead 100644
> --- a/arch/powerpc/kernel/trace/ftrace_64_mprofile.S
> +++ b/arch/powerpc/kernel/trace/ftrace_64_mprofile.S
> @@ -56,6 +56,21 @@ _GLOBAL(ftrace_caller)
>  
>   /* Load special regs for save below */
>   mfmsr   r8
> +
> + /* Only proceed if we are not in real mode and can take interrupts */
> + andi.   r9, r8, MSR_IR|MSR_DR|MSR_RI
> + cmpdi   r9, MSR_IR|MSR_DR|MSR_RI
> + beq 1f

OK, I assume this check and branch is negligible compared to the mfmsr
call?

-- Steve


> +	mflr	r8
> +	mtctr	r8
> +	REST_GPR(9, r1)
> +	REST_GPR(8, r1)
> +	addi	r1, r1, SWITCH_FRAME_SIZE
> +	ld	r0, LRSAVE(r1)
> +	mtlr	r0
> +	bctr
> +
> +1:
>   mfctr   r9
>   mfxer   r10
>   mfcr   r11



[RFC PATCH 1/1] powerpc/ftrace: Exclude real mode code from

2018-03-07 Thread Naveen N. Rao
We can't take a trap in most parts of real mode code. Instead of adding
the 'notrace' annotation to all C functions that can be invoked from
real mode, detect that we are in real mode on ftrace entry and return
back.

Signed-off-by: Naveen N. Rao 
---
This RFC only handles -mprofile-kernel to demonstrate the approach being 
considered. We will need to handle other ftrace entry if we decide to 
continue down this path.

- Naveen


 arch/powerpc/kernel/trace/ftrace_64_mprofile.S | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/arch/powerpc/kernel/trace/ftrace_64_mprofile.S 
b/arch/powerpc/kernel/trace/ftrace_64_mprofile.S
index 3f3e81852422..ecc0e8e38ead 100644
--- a/arch/powerpc/kernel/trace/ftrace_64_mprofile.S
+++ b/arch/powerpc/kernel/trace/ftrace_64_mprofile.S
@@ -56,6 +56,21 @@ _GLOBAL(ftrace_caller)
 
/* Load special regs for save below */
mfmsr   r8
+
+   /* Only proceed if we are not in real mode and can take interrupts */
+   andi.   r9, r8, MSR_IR|MSR_DR|MSR_RI
+   cmpdi   r9, MSR_IR|MSR_DR|MSR_RI
+   beq 1f
+	mflr	r8
+	mtctr	r8
+	REST_GPR(9, r1)
+	REST_GPR(8, r1)
+	addi	r1, r1, SWITCH_FRAME_SIZE
+	ld	r0, LRSAVE(r1)
+	mtlr	r0
+	bctr
+
+1:
mfctr   r9
mfxer   r10
	mfcr	r11
-- 
2.16.1



[RFC PATCH 0/1] Exclude real mode code from ftrace

2018-03-07 Thread Naveen N. Rao
If the function tracer is enabled when starting a guest, we get the 
below oops:

[ cut here ]
Delta way too big! 17582052940437522358 ts=17582052944931114496 write stamp = 4493592138
Oops: Bad interrupt in KVM entry/exit code, sig: 6 [#1]
LE SMP NR_CPUS=2048 NUMA PowerNV
Modules linked in:
CPU: 0 PID: 1380 Comm: qemu-system-ppc Not tainted 4.16.0-rc3-nnr+ #148
NIP:  c02635f8 LR: c02635f4 CTR: c01c1384
REGS: c000fffd1d80 TRAP: 0700   Not tainted  (4.16.0-rc3-nnr+)
MSR:  92823003   CR: 2824  XER: 2000
CFAR: c0144f94 SOFTE: 3 
GPR00: c02635f4 c000a26931d0 c13fbe00 0058 
GPR04: 0001  0001  
GPR08: fe8d c1287368 c1287368 
28242224 GPR12: 2000 cfac 
c012cd04 c000f0279a00 GPR16: c000a26938e0 
c0de2044 c15c5488  GPR20: 
 0001  0001 
GPR24: 0001   
0003 GPR28:   
03e8 c000a2693260 NIP [c02635f8] 
rb_handle_timestamp+0x88/0x90
LR [c02635f4] rb_handle_timestamp+0x84/0x90
Call Trace:
[c000a26931d0] [c02635f4] rb_handle_timestamp+0x84/0x90 (unreliable)
[c000a2693240] [c0266d84] ring_buffer_lock_reserve+0x174/0x5d0
[c000a26932b0] [c02728a0] trace_function+0x50/0x190
[c000a2693310] [c027f000] function_trace_call+0x140/0x170
[c000a2693340] [c0064c80] ftrace_call+0x4/0xb8
[c000a2693510] [c012720c] kvmppc_hv_entry+0x148/0x164
[c000a26935b0] [c0126ce0] kvmppc_call_hv_entry+0x28/0x124
[c000a2693620] [c011dd84] __kvmppc_vcore_entry+0x13c/0x1b8
[c000a26937f0] [c011a8c0] kvmppc_run_core+0xec0/0x1e50
[c000a26939b0] [c011c6e4] kvmppc_vcpu_run_hv+0x484/0x1270
[c000a2693b30] [c00f8ea8] kvmppc_vcpu_run+0x38/0x50
[c000a2693b50] [c00f4a8c] kvm_arch_vcpu_ioctl_run+0x28c/0x380
[c000a2693be0] [c00e6978] kvm_vcpu_ioctl+0x4c8/0x780
[c000a2693d40] [c03e64e8] do_vfs_ioctl+0xd8/0x900
[c000a2693de0] [c03e6d7c] SyS_ioctl+0x6c/0x100
[c000a2693e30] [c000bc60] system_call+0x58/0x6c
Instruction dump:
2f89 409effd4 e8c300b0 e8bf 3921 3ce2ffc9 3c62ffc2 38e78808 
38638058 992a7032 4bee1939 6000 <0fe0> 4ba4 3c4c011a 38428800 
---[ end trace 6c43107948f7546d ]---

The KVM entry code updates the timebase register based on the guest's 
tb_offset, which upsets ftrace ring buffer time stamps resulting in a 
WARN_ONCE() in rb_handle_timestamp(). Furthermore, WARN() inserts a trap 
instruction which is now hit while we are in guest MMU context, 
resulting in the oops above.
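
The "Delta way too big!" value in the log is exactly the timebase jump:
reproducing the arithmetic from the numbers in the oops (standalone, for
illustration only):

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

int main(void)
{
	/* Values taken from the oops above. */
	uint64_t ts = 17582052944931114496ULL;	/* after tb += tb_offset */
	uint64_t write_stamp = 4493592138ULL;

	/* Prints 17582052940437522358, the "Delta way too big!" value. */
	printf("delta = %" PRIu64 "\n", ts - write_stamp);
	return 0;
}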

The obvious way to address this is to exclude all KVM C code that can be 
run when we are in KVM_GUEST_MODE_HOST_HV from ftrace using the 
'notrace' annotation (*). But, there are a few problems doing that:
- the list grows quickly since we need to blacklist not just the top 
  level function, but every other function which those can call and any 
  and all functions that those can in turn call, and so on...
- even if we do the above, it is hard to ensure that all functions are 
  covered and that this continues to be the case due to code refactoring 
  adding new functions.

The other ways to handle this need a slightly larger hammer:
1. exclude all KVM code from ftrace
2. exclude all real mode code from ftrace

(1) is fairly easy to do, but is still not sufficient since we do call 
into various mm/ helpers and they will need to be additionally excluded.  
It also ends up excluding a lot of KVM code that can still be traced.

(2) is the approach implemented by the subsequent patch (+) and looks 
like a reasonable tradeoff since it additionally excludes all real mode 
code, rather than just the KVM code. However, I am not completely sure 
how much real mode C code we have that we would like to be able to 
trace. So, it would be good to hear what is preferable.

Please let me know your thoughts.


Thanks,
Naveen

-
(*) Afaics, KVM real mode code is not segregated into a separate file,
and segregating it would not be trivial. If this is not true, then this
may be an option to consider.
(+) This RFC only handles -mprofile-kernel, and would need to be updated 
to deal with other ftrace entry code.



Naveen N. Rao (1):
  powerpc/ftrace: Exclude real mode code from being traced

 arch/powerpc/kernel/trace/ftrace_64_mprofile.S | 15 +++
 1 file changed, 15 insertions(+)

-- 
2.16.1



[PATCH V2] powerpc/mm/hugetlb: initialize the pagetable cache correctly for hugetlb

2018-03-07 Thread Aneesh Kumar K.V
With 64k page size, we have hugetlb pte entries at the pmd and pud level for
book3s64. We don't need to create a separate page table cache for that. With 4k
we need to make sure hugepd page table cache for 16M is placed at PUD level
and 16G at the PGD level.

Simplify all this by not using HUGEPD_PGD_SHIFT/HUGEPD_PUD_SHIFT, which are confusing for book3s64.

Without this patch, with 64k page size we create page table caches with shift
values 10 and 7 which are not used at all.

Fixes: 419df06eea5bfa81 ("powerpc: Reduce the PTE_INDEX_SIZE")

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/mm/hugetlbpage.c | 15 ---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c
index 876da2bc1796..3b509b268030 100644
--- a/arch/powerpc/mm/hugetlbpage.c
+++ b/arch/powerpc/mm/hugetlbpage.c
@@ -122,9 +122,6 @@ static int __hugepte_alloc(struct mm_struct *mm, hugepd_t 
*hpdp,
 #if defined(CONFIG_PPC_FSL_BOOK3E) || defined(CONFIG_PPC_8xx)
 #define HUGEPD_PGD_SHIFT PGDIR_SHIFT
 #define HUGEPD_PUD_SHIFT PUD_SHIFT
-#else
-#define HUGEPD_PGD_SHIFT PUD_SHIFT
-#define HUGEPD_PUD_SHIFT PMD_SHIFT
 #endif
 
 /*
@@ -669,12 +666,24 @@ static int __init hugetlbpage_init(void)
if (add_huge_page_size(1ULL << shift) < 0)
continue;
 
+
+#ifdef CONFIG_PPC_BOOK3S_64
+   if (shift > PGDIR_SHIFT)
+   BUG();
+   else if (shift > PUD_SHIFT)
+   pdshift = PGDIR_SHIFT;
+   else if (shift > PMD_SHIFT)
+   pdshift = PUD_SHIFT;
+   else
+   pdshift = PMD_SHIFT;
+#else
if (shift < HUGEPD_PUD_SHIFT)
pdshift = PMD_SHIFT;
else if (shift < HUGEPD_PGD_SHIFT)
pdshift = PUD_SHIFT;
else
pdshift = PGDIR_SHIFT;
+#endif
/*
 * if we have pdshift and shift value same, we don't
 * use pgt cache for hugepd.
-- 
2.14.3



Re: [PATCH 06/10] powerpc/mm/slice: implement slice_check_range_fits

2018-03-07 Thread Christophe LEROY



Le 07/03/2018 à 08:16, Nicholas Piggin a écrit :

On Wed, 7 Mar 2018 07:12:23 +0100
Christophe LEROY  wrote:


Le 07/03/2018 à 00:12, Nicholas Piggin a écrit :

On Tue, 6 Mar 2018 15:41:00 +0100
Christophe LEROY  wrote:
   

Le 06/03/2018 à 14:25, Nicholas Piggin a écrit :



@@ -596,10 +601,11 @@ unsigned long slice_get_unmapped_area(unsigned long addr, 
unsigned long len,
slice_or_mask(&potential_mask, &compat_mask);
slice_print_mask(" potential", &potential_mask);

-	if ((addr != 0 || fixed) &&

-   slice_check_fit(mm, &mask, &potential_mask)) {
-   slice_dbg(" fits potential !\n");
-   goto convert;
+   if (addr || fixed) {
+   if (slice_check_range_fits(mm, &potential_mask, addr, len)) {
+   slice_dbg(" fits potential !\n");
+   goto convert;
+   }


Why not keep the original structure and just replace slice_check_fit()
with slice_check_range_fits()?

I believe cleanups should not be mixed with real feature changes. If
needed, you should have a cleanup patch up front in the series.


For code that is already changing, I think minor cleanups are okay if
they're very simple. Maybe this is getting to the point of needing
another patch. You've made valid points for a lot of other unnecessary
cleanups though, so I'll fix all of those.


Ok, that's not a big point, but I like when patches really modify
only the lines they need to modify.


Fair point, and in the end I mostly agree they should do that. But not
entirely: I think you can make the code slightly better as you go
(again, so long as the change is obvious). I think having extra
patches for trivial cleanups is not that great either.


Why do we need a double if?

Why not just the following? With proper alignment of the second line
with the open parenthesis, it fits in one line:

if ((addr != 0 || fixed) &&
-   slice_check_fit(mm, &mask, &potential_mask)) {
+   slice_check_range_fits(mm, &potential_mask, addr, len)) {
slice_dbg(" fits potential !\n");
goto convert;


For this case the main motivation was to make this test match the
form of the same test (with a different mask) above. Doing the
same thing with different coding styles annoys me.


Yes good point.

Christophe



I think I kept this one but fixed all your other suggestions in
the v2 series.

Thanks,
Nick



[PATCH 2/2] powerpc/mm/keys: Update documentation in key fault handling

2018-03-07 Thread Aneesh Kumar K.V
No functional change in this patch; it adds more code comments. We also remove
an unnecessary pkey check that came after we had already checked for a pkey error.

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/mm/fault.c | 28 
 arch/powerpc/mm/pkeys.c | 11 ---
 2 files changed, 16 insertions(+), 23 deletions(-)

diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
index 866446cf2d9a..c01d627e687a 100644
--- a/arch/powerpc/mm/fault.c
+++ b/arch/powerpc/mm/fault.c
@@ -297,7 +297,12 @@ static bool access_error(bool is_write, bool is_exec,
 
if (unlikely(!(vma->vm_flags & (VM_READ | VM_EXEC | VM_WRITE
return true;
-
+   /*
+* We should ideally do the vma pkey access check here. But in the
+* fault path, handle_mm_fault() also does the same check. To avoid
+* these multiple checks, we skip it here and handle access error due
+* to pkeys later.
+*/
return false;
 }
 
@@ -518,25 +523,16 @@ static int __do_page_fault(struct pt_regs *regs, unsigned 
long address,
 
 #ifdef CONFIG_PPC_MEM_KEYS
/*
-* if the HPTE is not hashed, hardware will not detect
-* a key fault. Lets check if we failed because of a
-* software detected key fault.
+* we skipped checking for access error due to key earlier.
+* Check that using handle_mm_fault error return.
 */
if (unlikely(fault & VM_FAULT_SIGSEGV) &&
-   !arch_vma_access_permitted(vma, flags & FAULT_FLAG_WRITE,
-   is_exec, 0)) {
-   /*
-* The PGD-PDT...PMD-PTE tree may not have been fully setup.
-* Hence we cannot walk the tree to locate the PTE, to locate
-* the key. Hence let's use vma_pkey() to get the key; instead
-* of get_mm_addr_key().
-*/
+   !arch_vma_access_permitted(vma, is_write, is_exec, 0)) {
+
int pkey = vma_pkey(vma);
 
-   if (likely(pkey)) {
-   up_read(&mm->mmap_sem);
-   return bad_key_fault_exception(regs, address, pkey);
-   }
+   up_read(&mm->mmap_sem);
+   return bad_key_fault_exception(regs, address, pkey);
}
 #endif /* CONFIG_PPC_MEM_KEYS */
 
diff --git a/arch/powerpc/mm/pkeys.c b/arch/powerpc/mm/pkeys.c
index ba71c5481f42..56d33056a559 100644
--- a/arch/powerpc/mm/pkeys.c
+++ b/arch/powerpc/mm/pkeys.c
@@ -119,18 +119,15 @@ int pkey_initialize(void)
 #else
os_reserved = 0;
 #endif
+   initial_allocation_mask = ~0x0;
+   pkey_amr_uamor_mask = ~0x0ul;
+   pkey_iamr_mask = ~0x0ul;
/*
-* Bits are in LE format. NOTE: 1, 0 are reserved.
+* key 0, 1 are reserved.
 * key 0 is the default key, which allows read/write/execute.
 * key 1 is recommended not to be used. PowerISA(3.0) page 1015,
 * programming note.
 */
-   initial_allocation_mask = ~0x0;
-
-   /* register mask is in BE format */
-   pkey_amr_uamor_mask = ~0x0ul;
-   pkey_iamr_mask = ~0x0ul;
-
for (i = 2; i < (pkeys_total - os_reserved); i++) {
initial_allocation_mask &= ~(0x1 << i);
pkey_amr_uamor_mask &= ~(0x3ul << pkeyshift(i));
-- 
2.14.3



[PATCH 1/2] powerpc/mm/keys: Move pte bits to correct headers

2018-03-07 Thread Aneesh Kumar K.V
Memory keys are supported only with hash translation mode. Instead of using
#ifdef in generic code, move the key-related pte bits to the respective headers.

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/book3s/64/hash-4k.h  |  7 +++
 arch/powerpc/include/asm/book3s/64/hash-64k.h |  7 +++
 arch/powerpc/include/asm/book3s/64/pgtable.h  | 19 ---
 3 files changed, 14 insertions(+), 19 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/hash-4k.h 
b/arch/powerpc/include/asm/book3s/64/hash-4k.h
index fc3dc6a93939..4103bfc7c223 100644
--- a/arch/powerpc/include/asm/book3s/64/hash-4k.h
+++ b/arch/powerpc/include/asm/book3s/64/hash-4k.h
@@ -33,6 +33,13 @@
 #define H_PAGE_THP_HUGE 0x0
 #define H_PAGE_COMBO   0x0
 
+/* memory key bits, only 8 keys supported */
+#define H_PTE_PKEY_BIT00
+#define H_PTE_PKEY_BIT10
+#define H_PTE_PKEY_BIT2_RPAGE_RSV3
+#define H_PTE_PKEY_BIT3_RPAGE_RSV4
+#define H_PTE_PKEY_BIT4_RPAGE_RSV5
+
 /* 8 bytes per each pte entry */
 #define H_PTE_FRAG_SIZE_SHIFT  (H_PTE_INDEX_SIZE + 3)
 #define H_PTE_FRAG_NR  (PAGE_SIZE >> H_PTE_FRAG_SIZE_SHIFT)
diff --git a/arch/powerpc/include/asm/book3s/64/hash-64k.h 
b/arch/powerpc/include/asm/book3s/64/hash-64k.h
index e53728ff29a0..bb880c97b87d 100644
--- a/arch/powerpc/include/asm/book3s/64/hash-64k.h
+++ b/arch/powerpc/include/asm/book3s/64/hash-64k.h
@@ -16,6 +16,13 @@
 #define H_PAGE_BUSY_RPAGE_RPN44 /* software: PTE & hash are busy */
 #define H_PAGE_HASHPTE _RPAGE_RPN43/* PTE has associated HPTE */
 
+/* memory key bits. */
+#define H_PTE_PKEY_BIT0_RPAGE_RSV1
+#define H_PTE_PKEY_BIT1_RPAGE_RSV2
+#define H_PTE_PKEY_BIT2_RPAGE_RSV3
+#define H_PTE_PKEY_BIT3_RPAGE_RSV4
+#define H_PTE_PKEY_BIT4_RPAGE_RSV5
+
 /*
  * We need to differentiate between explicit huge page and THP huge
  * page, since THP huge page also need to track real subpage details
diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h 
b/arch/powerpc/include/asm/book3s/64/pgtable.h
index 973199bd4654..c233915abb68 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -60,25 +60,6 @@
 /* Max physical address bit as per radix table */
 #define _RPAGE_PA_MAX  57
 
-#ifdef CONFIG_PPC_MEM_KEYS
-#ifdef CONFIG_PPC_64K_PAGES
-#define H_PTE_PKEY_BIT0_RPAGE_RSV1
-#define H_PTE_PKEY_BIT1_RPAGE_RSV2
-#else /* CONFIG_PPC_64K_PAGES */
-#define H_PTE_PKEY_BIT00 /* _RPAGE_RSV1 is not available */
-#define H_PTE_PKEY_BIT10 /* _RPAGE_RSV2 is not available */
-#endif /* CONFIG_PPC_64K_PAGES */
-#define H_PTE_PKEY_BIT2_RPAGE_RSV3
-#define H_PTE_PKEY_BIT3_RPAGE_RSV4
-#define H_PTE_PKEY_BIT4_RPAGE_RSV5
-#else /*  CONFIG_PPC_MEM_KEYS */
-#define H_PTE_PKEY_BIT00
-#define H_PTE_PKEY_BIT10
-#define H_PTE_PKEY_BIT20
-#define H_PTE_PKEY_BIT30
-#define H_PTE_PKEY_BIT40
-#endif /*  CONFIG_PPC_MEM_KEYS */
-
 /*
  * Max physical address bit we will use for now.
  *
-- 
2.14.3



[RFC PATCH] powerpc/mm/radix: Parse disable_radix commandline correctly.

2018-03-07 Thread Aneesh Kumar K.V
The kernel parameter disable_radix takes different options:
disable_radix=yes|no|1|0, or just disable_radix. When using the latter format
we get the error below.

 `Malformed early option 'disable_radix'`

We also update the command line parsing in prom_init to handle the new format.

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/kernel/prom_init.c| 16 +---
 arch/powerpc/kernel/prom_init_check.sh |  2 +-
 arch/powerpc/mm/init_64.c  |  2 +-
 3 files changed, 15 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/kernel/prom_init.c b/arch/powerpc/kernel/prom_init.c
index d22c41c26bb3..77735a7655ee 100644
--- a/arch/powerpc/kernel/prom_init.c
+++ b/arch/powerpc/kernel/prom_init.c
@@ -171,7 +171,7 @@ static unsigned long __initdata prom_tce_alloc_start;
 static unsigned long __initdata prom_tce_alloc_end;
 #endif
 
-static bool __initdata prom_radix_disable;
+static bool __initdata prom_radix_disable = !IS_ENABLED(CONFIG_PPC_RADIX_MMU_DEFAULT);
 
 struct platform_support {
bool hash_mmu;
@@ -641,9 +641,19 @@ static void __init early_cmdline_parse(void)
 
opt = strstr(prom_cmd_line, "disable_radix");
if (opt) {
-   prom_debug("Radix disabled from cmdline\n");
-   prom_radix_disable = true;
+   opt += 13;
+   if (*opt && *opt == '=') {
+   bool val;
+
+   if (kstrtobool(++opt, ))
+   prom_radix_disable = true;
+   else
+   prom_radix_disable = val;
+   } else
+   prom_radix_disable = true;
}
+   if (prom_radix_disable)
+   prom_debug("Radix disabled from cmdline\n");
 }
 
 #if defined(CONFIG_PPC_PSERIES) || defined(CONFIG_PPC_POWERNV)
diff --git a/arch/powerpc/kernel/prom_init_check.sh 
b/arch/powerpc/kernel/prom_init_check.sh
index 12640f7e726b..acb6b9226352 100644
--- a/arch/powerpc/kernel/prom_init_check.sh
+++ b/arch/powerpc/kernel/prom_init_check.sh
@@ -19,7 +19,7 @@
 WHITELIST="add_reloc_offset __bss_start __bss_stop copy_and_flush
 _end enter_prom memcpy memset reloc_offset __secondary_hold
 __secondary_hold_acknowledge __secondary_hold_spinloop __start
-strcmp strcpy strlcpy strlen strncmp strstr logo_linux_clut224
+strcmp strcpy strlcpy strlen strncmp strstr kstrtobool logo_linux_clut224
 reloc_got2 kernstart_addr memstart_addr linux_banner _stext
 __prom_init_toc_start __prom_init_toc_end btext_setup_display TOC."
 
diff --git a/arch/powerpc/mm/init_64.c b/arch/powerpc/mm/init_64.c
index 63470b06c502..51ce091914f9 100644
--- a/arch/powerpc/mm/init_64.c
+++ b/arch/powerpc/mm/init_64.c
@@ -366,7 +366,7 @@ static int __init parse_disable_radix(char *p)
 {
bool val;
 
-   if (strlen(p) == 0)
+   if (!p)
val = true;
else if (kstrtobool(p, ))
return -EINVAL;
-- 
2.14.3
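
For reference, a simplified userspace model of the forms the parser accepts
after this change (the kernel code above defers to kstrtobool(), which also
understands y/n/on/off; this sketch only covers the forms named in the commit
message):

#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Illustrative only; not the kernel implementation. */
static bool radix_disabled(const char *cmdline)
{
	const char *opt = strstr(cmdline, "disable_radix");

	if (!opt)
		return false;			/* option absent */
	opt += strlen("disable_radix");
	if (*opt == '=')
		return strcmp(opt + 1, "no") != 0 &&
		       strcmp(opt + 1, "0") != 0;
	return true;				/* bare "disable_radix" */
}

int main(void)
{
	printf("%d %d %d\n",
	       radix_disabled("disable_radix"),		/* 1 */
	       radix_disabled("disable_radix=no"),	/* 0 */
	       radix_disabled("disable_radix=1"));	/* 1 */
	return 0;
}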



[PATCH] powerpc/mm/hash: Move the slb_addr_limit check within PPC_MM_SLICES

2018-03-07 Thread Aneesh Kumar K.V
This should not have any impact, because we always select PPC_MM_SLICES these
days. Nevertheless, it is good to indicate that slb_addr_limit is available
only with the slice code.

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/mm/slb_low.S | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/mm/slb_low.S b/arch/powerpc/mm/slb_low.S
index c66cb06e73a1..337ef162851d 100644
--- a/arch/powerpc/mm/slb_low.S
+++ b/arch/powerpc/mm/slb_low.S
@@ -166,6 +166,8 @@ END_MMU_FTR_SECTION_IFCLR(MMU_FTR_1T_SEGMENT)
 */
cmpdi   r9, 0
bne-8f
+
+#ifdef CONFIG_PPC_MM_SLICES
 /*
  * user space make sure we are within the allowed limit
 */
@@ -183,7 +185,6 @@ END_MMU_FTR_SECTION_IFCLR(MMU_FTR_1T_SEGMENT)
 * really do dynamic patching unfortunately as processes might flip
 * between 4k and 64k standard page size
 */
-#ifdef CONFIG_PPC_MM_SLICES
/* r10 have esid */
cmpldi  r10,16
/* below SLICE_LOW_TOP */
-- 
2.14.3



[PATCH V4 0/3] Add support for 4PB virtual address space on hash

2018-03-07 Thread Aneesh Kumar K.V
This patch series extends the max virtual address space value from 512TB
to 4PB with 64K page size. We do that by allocating one vsid context for
each 512TB range. More details of that are explained in patch 3.
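
Back-of-the-envelope view of the scheme (the constant is from patch 1, where
the in-tree helper is get_esid_context(); the function name below is
illustrative): with 64K pages the default context covers addresses below 2^49
(512TB), so a 52-bit address space needs at most 2^(52 - 49) = 8 contexts.

#define H_BITS_FIRST_CONTEXT	49	/* 512TB per context with 64K pages */

/* Illustrative: which of the (up to 8) contexts an address falls in. */
static inline int context_index(unsigned long ea)
{
	return ea >> H_BITS_FIRST_CONTEXT;
}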

Changes from V3:
* move extended_id to be a union with mm_context_t id. This reduces some
 array index complexity.
* Add addr_limit check when handling slb miss for extended context


Changes from V2:
* Rebased on top of slice_mask series from Nick Piggin
* Fixed segfault when mmap with 512TB hint address

Aneesh Kumar K.V (3):
  powerpc/mm: Add support for handling > 512TB address in SLB miss
  powerpc/mm/hash64: Increase the VA range
  powerpc/mm/hash: Don't memset pgd table if not needed

 arch/powerpc/include/asm/book3s/64/hash-4k.h  |   6 +
 arch/powerpc/include/asm/book3s/64/hash-64k.h |   7 +-
 arch/powerpc/include/asm/book3s/64/mmu-hash.h |   6 +-
 arch/powerpc/include/asm/book3s/64/mmu.h  |  26 -
 arch/powerpc/include/asm/book3s/64/pgalloc.h  |  13 ++-
 arch/powerpc/include/asm/processor.h  |  16 ++-
 arch/powerpc/kernel/exceptions-64s.S  |  12 +-
 arch/powerpc/mm/copro_fault.c |   2 +-
 arch/powerpc/mm/hash_utils_64.c   |   4 +-
 arch/powerpc/mm/init_64.c |   6 -
 arch/powerpc/mm/mmu_context_book3s64.c|  15 ++-
 arch/powerpc/mm/pgtable-hash64.c  |   2 +-
 arch/powerpc/mm/pgtable_64.c  |   5 -
 arch/powerpc/mm/slb.c | 154 ++
 arch/powerpc/mm/slb_low.S |   6 +-
 arch/powerpc/mm/tlb_hash64.c  |   2 +-
 16 files changed, 252 insertions(+), 30 deletions(-)

-- 
2.14.3



[PATCH V4 3/3] powerpc/mm/hash: Don't memset pgd table if not needed

2018-03-07 Thread Aneesh Kumar K.V
We need to zero out the pgd table only if we share the slab cache with the
pud/pmd level caches. With the support for 4PB, we don't share the slab cache
anymore. Instead of removing the code completely, hide it within an #ifdef. We
don't need to do this for any other page table level, because they all allocate
a table of double the size and we take care of initializing the first half
correctly during page table zap.

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/book3s/64/pgalloc.h | 13 -
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/book3s/64/pgalloc.h 
b/arch/powerpc/include/asm/book3s/64/pgalloc.h
index 4746bc68d446..07f0dbac479f 100644
--- a/arch/powerpc/include/asm/book3s/64/pgalloc.h
+++ b/arch/powerpc/include/asm/book3s/64/pgalloc.h
@@ -80,8 +80,19 @@ static inline pgd_t *pgd_alloc(struct mm_struct *mm)
 
pgd = kmem_cache_alloc(PGT_CACHE(PGD_INDEX_SIZE),
   pgtable_gfp_flags(mm, GFP_KERNEL));
+   /*
+* With hugetlb, we don't clear the second half of the page table.
+* If we share the same slab cache with the pmd or pud level table,
+* we need to make sure we zero out the full table on alloc.
+* With 4K we don't store slot in the second half. Hence we don't
+* need to do this for 4k.
+*/
+#if (H_PGD_INDEX_SIZE == H_PUD_CACHE_INDEX) || \
+   (H_PGD_INDEX_SIZE == H_PMD_CACHE_INDEX)
+#if defined(CONFIG_HUGETLB_PAGE) && defined(CONFIG_PPC_64K_PAGES)
memset(pgd, 0, PGD_TABLE_SIZE);
-
+#endif
+#endif
return pgd;
 }
 
-- 
2.14.3



[PATCH V4 2/3] powerpc/mm/hash64: Increase the VA range

2018-03-07 Thread Aneesh Kumar K.V
This patch increases the max virtual address value to 4PB. With a 4K page size
config we continue to limit ourselves to 64TB.

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/book3s/64/hash-64k.h | 2 +-
 arch/powerpc/include/asm/processor.h  | 9 -
 arch/powerpc/mm/init_64.c | 6 --
 arch/powerpc/mm/pgtable_64.c  | 5 -
 4 files changed, 9 insertions(+), 13 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/hash-64k.h 
b/arch/powerpc/include/asm/book3s/64/hash-64k.h
index 0ee0fc1ad675..02098d7fe177 100644
--- a/arch/powerpc/include/asm/book3s/64/hash-64k.h
+++ b/arch/powerpc/include/asm/book3s/64/hash-64k.h
@@ -4,7 +4,7 @@
 
 #define H_PTE_INDEX_SIZE  8
 #define H_PMD_INDEX_SIZE  10
-#define H_PUD_INDEX_SIZE  7
+#define H_PUD_INDEX_SIZE  10
 #define H_PGD_INDEX_SIZE  8
 /*
  * No of address bits below which we use the default context
diff --git a/arch/powerpc/include/asm/processor.h 
b/arch/powerpc/include/asm/processor.h
index 70d65b482504..a621a068880a 100644
--- a/arch/powerpc/include/asm/processor.h
+++ b/arch/powerpc/include/asm/processor.h
@@ -109,6 +109,13 @@ void release_thread(struct task_struct *);
 #define TASK_SIZE_64TB  (0x4000UL)
 #define TASK_SIZE_128TB (0x8000UL)
 #define TASK_SIZE_512TB (0x0002UL)
+#define TASK_SIZE_1PB   (0x0004UL)
+#define TASK_SIZE_2PB   (0x0008UL)
+/*
+ * With 52 bits in the address we can support
+ * up to 4PB of range.
+ */
+#define TASK_SIZE_4PB   (0x0010UL)
 
 /*
  * For now 512TB is only supported with book3s and 64K linux page size.
@@ -117,7 +124,7 @@ void release_thread(struct task_struct *);
 /*
  * Max value currently used:
  */
-#define TASK_SIZE_USER64   TASK_SIZE_512TB
+#define TASK_SIZE_USER64   TASK_SIZE_4PB
 #define DEFAULT_MAP_WINDOW_USER64  TASK_SIZE_128TB
 #define TASK_CONTEXT_SIZE  TASK_SIZE_512TB
 #else
diff --git a/arch/powerpc/mm/init_64.c b/arch/powerpc/mm/init_64.c
index fdb424a29f03..63470b06c502 100644
--- a/arch/powerpc/mm/init_64.c
+++ b/arch/powerpc/mm/init_64.c
@@ -68,12 +68,6 @@
 
 #include "mmu_decl.h"
 
-#ifdef CONFIG_PPC_BOOK3S_64
-#if H_PGTABLE_RANGE > USER_VSID_RANGE
-#warning Limited user VSID range means pagetable space is wasted
-#endif
-#endif /* CONFIG_PPC_BOOK3S_64 */
-
 phys_addr_t memstart_addr = ~0;
 EXPORT_SYMBOL_GPL(memstart_addr);
 phys_addr_t kernstart_addr;
diff --git a/arch/powerpc/mm/pgtable_64.c b/arch/powerpc/mm/pgtable_64.c
index 28c980eb4422..16636bdf3331 100644
--- a/arch/powerpc/mm/pgtable_64.c
+++ b/arch/powerpc/mm/pgtable_64.c
@@ -57,11 +57,6 @@
 
 #include "mmu_decl.h"
 
-#ifdef CONFIG_PPC_BOOK3S_64
-#if TASK_SIZE_USER64 > (1UL << (ESID_BITS + SID_SHIFT))
-#error TASK_SIZE_USER64 exceeds user VSID range
-#endif
-#endif
 
 #ifdef CONFIG_PPC_BOOK3S_64
 /*
-- 
2.14.3



[PATCH V4 1/3] powerpc/mm: Add support for handling > 512TB address in SLB miss

2018-03-07 Thread Aneesh Kumar K.V
For addresses above 512TB we allocate an additional mmu context. To keep it all
simple, addresses above 512TB are handled with IR/DR=1 and with a stack frame
set up.

We do the additional context allocation in the SLB miss handler. If the context
is not allocated, we enable interrupts, allocate the context, and retry the
access, which will again result in an SLB miss.

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/book3s/64/hash-4k.h  |   6 +
 arch/powerpc/include/asm/book3s/64/hash-64k.h |   5 +
 arch/powerpc/include/asm/book3s/64/mmu-hash.h |   6 +-
 arch/powerpc/include/asm/book3s/64/mmu.h  |  26 -
 arch/powerpc/include/asm/processor.h  |   7 ++
 arch/powerpc/kernel/exceptions-64s.S  |  12 +-
 arch/powerpc/mm/copro_fault.c |   2 +-
 arch/powerpc/mm/hash_utils_64.c   |   4 +-
 arch/powerpc/mm/mmu_context_book3s64.c|  15 ++-
 arch/powerpc/mm/pgtable-hash64.c  |   2 +-
 arch/powerpc/mm/slb.c | 154 ++
 arch/powerpc/mm/slb_low.S |   6 +-
 arch/powerpc/mm/tlb_hash64.c  |   2 +-
 13 files changed, 231 insertions(+), 16 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/hash-4k.h 
b/arch/powerpc/include/asm/book3s/64/hash-4k.h
index 67c5475311ee..af2ba9875f18 100644
--- a/arch/powerpc/include/asm/book3s/64/hash-4k.h
+++ b/arch/powerpc/include/asm/book3s/64/hash-4k.h
@@ -11,6 +11,12 @@
 #define H_PUD_INDEX_SIZE  9
 #define H_PGD_INDEX_SIZE  9
 
+/*
+ * No of address bits below which we use the default context
+ * for slb allocation. For 4k this is 64TB.
+ */
+#define H_BITS_FIRST_CONTEXT   46
+
 #ifndef __ASSEMBLY__
 #define H_PTE_TABLE_SIZE   (sizeof(pte_t) << H_PTE_INDEX_SIZE)
 #define H_PMD_TABLE_SIZE   (sizeof(pmd_t) << H_PMD_INDEX_SIZE)
diff --git a/arch/powerpc/include/asm/book3s/64/hash-64k.h 
b/arch/powerpc/include/asm/book3s/64/hash-64k.h
index 3bcf269f8f55..0ee0fc1ad675 100644
--- a/arch/powerpc/include/asm/book3s/64/hash-64k.h
+++ b/arch/powerpc/include/asm/book3s/64/hash-64k.h
@@ -6,6 +6,11 @@
 #define H_PMD_INDEX_SIZE  10
 #define H_PUD_INDEX_SIZE  7
 #define H_PGD_INDEX_SIZE  8
+/*
+ * No of address bits below which we use the default context
+ * for slb allocation. For 64k this is 512TB.
+ */
+#define H_BITS_FIRST_CONTEXT   49
 
 /*
  * 64k aligned address free up few of the lower bits of RPN for us
diff --git a/arch/powerpc/include/asm/book3s/64/mmu-hash.h 
b/arch/powerpc/include/asm/book3s/64/mmu-hash.h
index 50ed64fba4ae..8ee83f6e9c84 100644
--- a/arch/powerpc/include/asm/book3s/64/mmu-hash.h
+++ b/arch/powerpc/include/asm/book3s/64/mmu-hash.h
@@ -691,8 +691,8 @@ static inline int user_segment_size(unsigned long addr)
return MMU_SEGSIZE_256M;
 }
 
-static inline unsigned long get_vsid(unsigned long context, unsigned long ea,
-int ssize)
+static inline unsigned long __get_vsid(unsigned long context, unsigned long ea,
+  int ssize)
 {
unsigned long va_bits = VA_BITS;
unsigned long vsid_bits;
@@ -744,7 +744,7 @@ static inline unsigned long get_kernel_vsid(unsigned long 
ea, int ssize)
 */
context = (ea >> 60) - KERNEL_REGION_CONTEXT_OFFSET;
 
-   return get_vsid(context, ea, ssize);
+   return __get_vsid(context, ea, ssize);
 }
 
 unsigned htab_shift_for_mem_size(unsigned long mem_size);
diff --git a/arch/powerpc/include/asm/book3s/64/mmu.h 
b/arch/powerpc/include/asm/book3s/64/mmu.h
index 78579305..a70adbb7ec56 100644
--- a/arch/powerpc/include/asm/book3s/64/mmu.h
+++ b/arch/powerpc/include/asm/book3s/64/mmu.h
@@ -91,7 +91,15 @@ struct slice_mask {
 };
 
 typedef struct {
-   mm_context_id_t id;
+   union {
+   /*
+* One context for each 512TB.
+* First 512TB context is saved in id and is also used
+* as PIDR.
+*/
+   mm_context_id_t id;
+   mm_context_id_t extended_id[TASK_SIZE_USER64/TASK_CONTEXT_SIZE];
+   };
u16 user_psize; /* page size index */
 
/* Number of bits in the mm_cpumask */
@@ -193,5 +201,21 @@ extern void radix_init_pseries(void);
 static inline void radix_init_pseries(void) { };
 #endif
 
+static inline int get_esid_context(mm_context_t *ctx, unsigned long ea)
+{
+   int index = ea >> H_BITS_FIRST_CONTEXT;
+
+   return ctx->extended_id[index];
+}
+
+static inline unsigned long get_user_vsid(mm_context_t *ctx,
+ unsigned long ea, int ssize)
+{
+   unsigned long context = get_esid_context(ctx, ea);
+
+   return __get_vsid(context, ea, ssize);
+}
+
+
 #endif /* __ASSEMBLY__ */
 #endif /* _ASM_POWERPC_BOOK3S_64_MMU_H_ */
diff --git a/arch/powerpc/include/asm/processor.h 
b/arch/powerpc/include/asm/processor.h
index 01299cdc9806..70d65b482504 100644
--- 

Re: [PATCH 00/14] numa aware allocation for pacas, stacks, pagetables

2018-03-07 Thread Nicholas Piggin
On Wed, 07 Mar 2018 21:50:04 +1100
Michael Ellerman  wrote:

> Nicholas Piggin  writes:
> 
> > This series allows numa aware allocations for various early data
> > structures for radix. Hash still has a bolted SLB limitation that
> > prevents at least pacas and stacks from node-affine allocations.
> >
> > Fixed up a number of bugs, got pSeries working, added a couple more
> > cases where page tables can be allocated node-local.  
> 
> Few problems in here:
> 
> FAILURE kernel-build-linux » powerpc,gcc_ubuntu_be,pmac32
>   arch/powerpc/kernel/prom.c:748:2: error: implicit declaration of function 
> 'allocate_paca_ptrs' [-Werror=implicit-function-declaration]
> 
> FAILURE kernel-build-linux » powerpc,gcc_ubuntu_le,powernv
>   arch/powerpc/include/asm/paca.h:49:33: error: 'struct paca_struct' has no 
> member named 'lppaca_ptr'
>   arch/powerpc/include/asm/paca.h:49:33: error: 'struct paca_struct' has no 
> member named 'lppaca_ptr'
> 
> Did I miss a follow-up or something?

No, I probably just don't do enough compile testing on ppc32. Not
sure about the powernv error, probably just missed testing a config.
Do you have more logs?

Thanks,
Nick


Re: [PATCH 00/14] numa aware allocation for pacas, stacks, pagetables

2018-03-07 Thread Michael Ellerman
Nicholas Piggin  writes:

> This series allows numa aware allocations for various early data
> structures for radix. Hash still has a bolted SLB limitation that
> prevents at least pacas and stacks from node-affine allocations.
>
> Fixed up a number of bugs, got pSeries working, added a couple more
> cases where page tables can be allocated node-local.

Few problems in here:

FAILURE kernel-build-linux » powerpc,gcc_ubuntu_be,pmac32
  arch/powerpc/kernel/prom.c:748:2: error: implicit declaration of function 
'allocate_paca_ptrs' [-Werror=implicit-function-declaration]

FAILURE kernel-build-linux » powerpc,gcc_ubuntu_le,powernv
  arch/powerpc/include/asm/paca.h:49:33: error: 'struct paca_struct' has no 
member named 'lppaca_ptr'
  arch/powerpc/include/asm/paca.h:49:33: error: 'struct paca_struct' has no 
member named 'lppaca_ptr'

Did I miss a follow-up or something?

cheers


Re: [PATCH v2 10/10] powerpc/mm/slice: use the dynamic high slice size to limit bitmap operations

2018-03-07 Thread Michael Ellerman
Nicholas Piggin  writes:
> On Wed,  7 Mar 2018 11:37:18 +1000
> Nicholas Piggin  wrote:
>
>> The number of high slices a process might use now depends on its
>> address space size, and what allocation address it has requested.
>> 
>> This patch uses that limit throughout call chains where possible,
>> rather than use the fixed SLICE_NUM_HIGH for bitmap operations.
>> This saves some cost for processes that don't use very large address
>> spaces.
>> 
>> Performance numbers aren't changed significantly; this may change
>> with larger address spaces or different mmap access patterns that
>> require more slice mask building.
>
>
> Ignore this patch in the series. I didn't intend to send it.

Oops :D

I'll drop it and rebuild.

cheers

kisskb: Failed 206/268
http://kisskb.ellerman.id.au/kisskb/head/82bc47f26969f6ed290cda529b9893941923c0f4/
  Failed: powerpc-next/pseries_le_defconfig+NO_NUMA/ppc64le 
(http://kisskb.ellerman.id.au/kisskb/buildresult/13296349/log/)
/kisskb/src/arch/powerpc/mm/slice.c:81:2: error: #error XXX: bitmap_zero 
with non-const size? Good code?
/kisskb/src/arch/powerpc/mm/slice.c:660:2: error: #error XXX: are we sure 
high_slices is > 0 here when SLICE_NUM_HIGH? Perhaps not for 32-bit tasks on 64?
  Failed: powerpc-next/pseries_le_defconfig+NO_SPLPAR/ppc64le   
(http://kisskb.ellerman.id.au/kisskb/buildresult/13296347/log/)
/kisskb/src/arch/powerpc/mm/slice.c:81:2: error: #error XXX: bitmap_zero 
with non-const size? Good code?
/kisskb/src/arch/powerpc/mm/slice.c:660:2: error: #error XXX: are we sure 
high_slices is > 0 here when SLICE_NUM_HIGH? Perhaps not for 32-bit tasks on 64?
  Failed: powerpc-next/pseries_defconfig+NO_SPLPAR/powerpc-5.3  
(http://kisskb.ellerman.id.au/kisskb/buildresult/13296346/log/)
/kisskb/src/arch/powerpc/mm/slice.c:81:2: error: #error XXX: bitmap_zero 
with non-const size? Good code?
/kisskb/src/arch/powerpc/mm/slice.c:660:2: error: #error XXX: are we sure 
high_slices is > 0 here when SLICE_NUM_HIGH? Perhaps not for 32-bit tasks on 64?
  Failed: powerpc-next/pseries_defconfig+NO_SPLPAR/powerpc  
(http://kisskb.ellerman.id.au/kisskb/buildresult/13296345/log/)
/kisskb/src/arch/powerpc/mm/slice.c:81:2: error: #error XXX: bitmap_zero 
with non-const size? Good code?
/kisskb/src/arch/powerpc/mm/slice.c:660:2: error: #error XXX: are we sure 
high_slices is > 0 here when SLICE_NUM_HIGH? Perhaps not for 32-bit tasks on 64?
  Failed: powerpc-next/pmac32_defconfig+KVM/powerpc 
(http://kisskb.ellerman.id.au/kisskb/buildresult/13296343/log/)
/kisskb/src/arch/powerpc/kvm/powerpc.c:1361:2: error: 'emulated' may be 
used uninitialized in this function [-Werror=uninitialized]
  Failed: powerpc-next/allmodconfig+64K_PAGES/powerpc-5.3   
(http://kisskb.ellerman.id.au/kisskb/buildresult/13296342/log/)
/kisskb/src/arch/powerpc/mm/slice.c:81:2: error: #error XXX: bitmap_zero 
with non-const size? Good code?
/kisskb/src/arch/powerpc/mm/slice.c:660:2: error: #error XXX: are we sure 
high_slices is > 0 here when SLICE_NUM_HIGH? Perhaps not for 32-bit tasks on 64?
  Failed: powerpc-next/allmodconfig+64K_PAGES/powerpc   
(http://kisskb.ellerman.id.au/kisskb/buildresult/13296341/log/)
/kisskb/src/arch/powerpc/mm/slice.c:81:2: error: #error XXX: bitmap_zero 
with non-const size? Good code?
/kisskb/src/arch/powerpc/mm/slice.c:660:2: error: #error XXX: are we sure 
high_slices is > 0 here when SLICE_NUM_HIGH? Perhaps not for 32-bit tasks on 64?
  Failed: powerpc-next/skiroot_defconfig/ppc64le
(http://kisskb.ellerman.id.au/kisskb/buildresult/13296339/log/)
/kisskb/src/arch/powerpc/mm/slice.c:81:2: error: #error XXX: bitmap_zero 
with non-const size? Good code?
/kisskb/src/arch/powerpc/mm/slice.c:660:2: error: #error XXX: are we sure 
high_slices is > 0 here when SLICE_NUM_HIGH? Perhaps not for 32-bit tasks on 64?
  Failed: powerpc-next/powernv_defconfig+NO_NUMA/ppc64le
(http://kisskb.ellerman.id.au/kisskb/buildresult/13296338/log/)
/kisskb/src/arch/powerpc/mm/slice.c:81:2: error: #error XXX: bitmap_zero 
with non-const size? Good code?
/kisskb/src/arch/powerpc/mm/slice.c:660:2: error: #error XXX: are we sure 
high_slices is > 0 here when SLICE_NUM_HIGH? Perhaps not for 32-bit tasks on 64?
  Failed: powerpc-next/powernv_defconfig+STRICT_RWX/ppc64le 
(http://kisskb.ellerman.id.au/kisskb/buildresult/13296337/log/)
/kisskb/src/arch/powerpc/mm/slice.c:81:2: error: #error XXX: bitmap_zero 
with non-const size? Good code?
/kisskb/src/arch/powerpc/mm/slice.c:660:2: error: #error XXX: are we sure 
high_slices is > 0 here when SLICE_NUM_HIGH? Perhaps not for 32-bit tasks on 64?
  Failed: powerpc-next/corenet64_smp_defconfig/powerpc-5.3  
(http://kisskb.ellerman.id.au/kisskb/buildresult/13296332/log/)
/kisskb/src/arch/powerpc/include/asm/hugetlb.h:101:10: 

[PATCH] PCI/hotplug: ppc: correct a php_slot usage after free

2018-03-07 Thread wei . guo . simon
From: Simon Guo 

In pnv_php_unregister_one(), pnv_php_put_slot() might kfree() the
php_slot structure, but pci_hp_deregister() is called after that and
still uses the php_slot reference.

This patch moves pnv_php_put_slot() to the end of the function.

Signed-off-by: Simon Guo 
---
 drivers/pci/hotplug/pnv_php.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/pci/hotplug/pnv_php.c b/drivers/pci/hotplug/pnv_php.c
index 74f6a17..eb60692e 100644
--- a/drivers/pci/hotplug/pnv_php.c
+++ b/drivers/pci/hotplug/pnv_php.c
@@ -930,8 +930,8 @@ static void pnv_php_unregister_one(struct device_node *dn)
return;
 
php_slot->state = PNV_PHP_STATE_OFFLINE;
-   pnv_php_put_slot(php_slot);
	pci_hp_deregister(&php_slot->slot);
+   pnv_php_put_slot(php_slot);
 }
 
 static void pnv_php_unregister(struct device_node *dn)
-- 
1.8.3.1