Re: [PATCH 2/2] powerpc/64s: reimplement book3s idle code in C

2018-08-09 Thread Nicholas Piggin
On Thu, 9 Aug 2018 20:15:04 +0530
Gautham R Shenoy  wrote:

> Hello Nicholas,
> On Fri, Aug 03, 2018 at 02:13:50PM +1000, Nicholas Piggin wrote:
> > Reimplement Book3S idle code in C, moving POWER7/8/9 implementation
> > speific HV idle code to the powernv platform code. Generic book3s
> > assembly stubs are kept in common code and used only to save and
> > restore the stack frame and non-volatile GPRs just before going to
> > idle and just after waking up, which can return into C code.
> > 
> > Moving the HMI, SPR, OPAL, locking, etc. to C is the only real
> > way this stuff will cope with non-trivial new CPU implementation
> > details, firmware changes, etc., without becoming unmaintainable.
> > 
> > This is not a strict translation to C code, there are some
> > differences.
> > 
> > - Idle wakeup no longer uses the ->cpu_restore call to reinit SPRs,
> >   but saves and restores them all explicitly.
> > 
> > - The optimisation where EC=ESL=0 idle modes did not have to save
> >   GPRs or change MSR is restored, because it's now simple to do.
> >   State loss modes that did not actually lose GPRs can use this
> >   optimization too.
> > 
> > - KVM secondary entry code is now more of a call/return style
> >   rather than spaghetti. nap_state_lost is not required beccause
> >   KVM always returns via NVGPR restorig path.
> > 
> > This seems pretty solid, so needs more review and testing now. The
> > KVM changes are pretty significant and complicated. POWER7 needs to
> > be tested too.
> > 
> > Open question:
> > - Why does P9 restore some of the PMU SPRs (but not others), and
> >   P8 only zeroes them?  
> 
> We are restoring MMCR0 (from the value saved in the stack) and MMCR1,
> MMCR2, and MMCRA in the stop_sprs in PACA. We saw that MMCRH and MMCRC
> are cleared on both POWER8 and POWER9. Hence we didn't restore
> them. MMCRS is being initialized by the KVM code.
> 
> Is there anything apart from these that need to be restored ?

No I'm wondering why it is we restore those on POWER9? POWER8 does not
restore them, only zeroes. What is the difference with POWER9?

I will leave that in for now so we don't change too much with one patch,
but it would be nice to document a bit better the reasons for saving or
clearing SPRs.

> 
> > 
> > Since RFC v1:
> > - Now tested and working with POWER9 hash and radix.
> > - KVM support added. This took a bit of work to untangle and might
> >   still have some issues, but POWER9 seems to work including hash on
> >   radix with dependent threads mode.
> > - This snowballed a bit because of KVM and other details making it
> >   not feasible to leave POWER7/8 code alone. That's only half done
> >   at the moment.
> > - So far this trades about 800 lines of asm for 500 of C. With POWER7/8
> >   support done it might be another hundred or so lines of C.
> > 
> > Since RFC v2:
> > - Fixed deep state SLB reloading
> > - Now tested and working with POWER8.
> > - Accounted for most feedback.
> > 
> > Since RFC v3:
> > - Rebased to powerpc merge + idle state bugfix
> > - Split SLB flush/restore code out and shared with MCE code (pseries
> >   MCE patches can also use).
> > - More testing on POWER8 including KVM with secondaries.
> > - Performance testing looks good. EC=ESL=0 is about 5% faster, other
> >   stop states look a little faster too.
> > - Adjusted SPR saving to handler POWER7, haven't tested it.  
> 
> 
> This patch looks good to me.
> 
> A couple of comments below. 
> 
> > ---  
> 
> [... snip ..]
> 
> > @@ -178,23 +177,30 @@ struct paca_struct {
> >  #endif
> > 
> >  #ifdef CONFIG_PPC_POWERNV
> > -   /* Per-core mask tracking idle threads and a lock bit-[L][] */
> > -   u32 *core_idle_state_ptr;
> > -   u8 thread_idle_state;   /* PNV_THREAD_RUNNING/NAP/SLEEP */
> > -   /* Mask to indicate thread id in core */
> > -   u8 thread_mask;
> > -   /* Mask to denote subcore sibling threads */
> > -   u8 subcore_sibling_mask;
> > -   /* Flag to request this thread not to stop */
> > -   atomic_t dont_stop;
> > -   /* The PSSCR value that the kernel requested before going to stop */
> > -   u64 requested_psscr;
> > +   /* PowerNV idle fields */
> > +   /* PNV_CORE_IDLE_* bits, all siblings work on thread 0 paca */
> > +   unsigned long idle_state;
> > +   union {
> > +   /* P7/P8 specific fields */
> > +   struct {
> > +   /* PNV_THREAD_RUNNING/NAP/SLEEP */
> > +   u8 thread_idle_state;
> > +   /* Mask to indicate thread id in core */
> > +   u8 thread_mask;  
> 
> This is no longer needed. We can get this from cpu_thread_in_core()
> from the C code.
> 
> The only place where we are currently using this is to DUMP the value
> of the thread_mask from xmon but not anywhere else in the idle entry
> code.

Good catch, removed it.

> > +   /* Mask to denote subcore sibling threads */
> > +   u8 subcore_sibling_mask;
> > +   };
> > 
> > -   /*
> > -*

Re: [PATCH 2/2] powerpc/64s: reimplement book3s idle code in C

2018-08-09 Thread Gautham R Shenoy
Hello Nicholas,
On Fri, Aug 03, 2018 at 02:13:50PM +1000, Nicholas Piggin wrote:
> Reimplement Book3S idle code in C, moving POWER7/8/9 implementation
> speific HV idle code to the powernv platform code. Generic book3s
> assembly stubs are kept in common code and used only to save and
> restore the stack frame and non-volatile GPRs just before going to
> idle and just after waking up, which can return into C code.
> 
> Moving the HMI, SPR, OPAL, locking, etc. to C is the only real
> way this stuff will cope with non-trivial new CPU implementation
> details, firmware changes, etc., without becoming unmaintainable.
> 
> This is not a strict translation to C code, there are some
> differences.
> 
> - Idle wakeup no longer uses the ->cpu_restore call to reinit SPRs,
>   but saves and restores them all explicitly.
> 
> - The optimisation where EC=ESL=0 idle modes did not have to save
>   GPRs or change MSR is restored, because it's now simple to do.
>   State loss modes that did not actually lose GPRs can use this
>   optimization too.
> 
> - KVM secondary entry code is now more of a call/return style
>   rather than spaghetti. nap_state_lost is not required beccause
>   KVM always returns via NVGPR restorig path.
> 
> This seems pretty solid, so needs more review and testing now. The
> KVM changes are pretty significant and complicated. POWER7 needs to
> be tested too.
> 
> Open question:
> - Why does P9 restore some of the PMU SPRs (but not others), and
>   P8 only zeroes them?

We are restoring MMCR0 (from the value saved in the stack) and MMCR1,
MMCR2, and MMCRA in the stop_sprs in PACA. We saw that MMCRH and MMCRC
are cleared on both POWER8 and POWER9. Hence we didn't restore
them. MMCRS is being initialized by the KVM code.

Is there anything apart from these that need to be restored ?

> 
> Since RFC v1:
> - Now tested and working with POWER9 hash and radix.
> - KVM support added. This took a bit of work to untangle and might
>   still have some issues, but POWER9 seems to work including hash on
>   radix with dependent threads mode.
> - This snowballed a bit because of KVM and other details making it
>   not feasible to leave POWER7/8 code alone. That's only half done
>   at the moment.
> - So far this trades about 800 lines of asm for 500 of C. With POWER7/8
>   support done it might be another hundred or so lines of C.
> 
> Since RFC v2:
> - Fixed deep state SLB reloading
> - Now tested and working with POWER8.
> - Accounted for most feedback.
> 
> Since RFC v3:
> - Rebased to powerpc merge + idle state bugfix
> - Split SLB flush/restore code out and shared with MCE code (pseries
>   MCE patches can also use).
> - More testing on POWER8 including KVM with secondaries.
> - Performance testing looks good. EC=ESL=0 is about 5% faster, other
>   stop states look a little faster too.
> - Adjusted SPR saving to handler POWER7, haven't tested it.


This patch looks good to me.

A couple of comments below. 

> ---

[... snip ..]

> @@ -178,23 +177,30 @@ struct paca_struct {
>  #endif
> 
>  #ifdef CONFIG_PPC_POWERNV
> - /* Per-core mask tracking idle threads and a lock bit-[L][] */
> - u32 *core_idle_state_ptr;
> - u8 thread_idle_state;   /* PNV_THREAD_RUNNING/NAP/SLEEP */
> - /* Mask to indicate thread id in core */
> - u8 thread_mask;
> - /* Mask to denote subcore sibling threads */
> - u8 subcore_sibling_mask;
> - /* Flag to request this thread not to stop */
> - atomic_t dont_stop;
> - /* The PSSCR value that the kernel requested before going to stop */
> - u64 requested_psscr;
> + /* PowerNV idle fields */
> + /* PNV_CORE_IDLE_* bits, all siblings work on thread 0 paca */
> + unsigned long idle_state;
> + union {
> + /* P7/P8 specific fields */
> + struct {
> + /* PNV_THREAD_RUNNING/NAP/SLEEP */
> + u8 thread_idle_state;
> + /* Mask to indicate thread id in core */
> + u8 thread_mask;

This is no longer needed. We can get this from cpu_thread_in_core()
from the C code.

The only place where we are currently using this is to DUMP the value
of the thread_mask from xmon but not anywhere else in the idle entry
code.



> + /* Mask to denote subcore sibling threads */
> + u8 subcore_sibling_mask;
> + };
> 
> - /*
> -  * Save area for additional SPRs that need to be
> -  * saved/restored during cpuidle stop.
> -  */
> - struct stop_sprs stop_sprs;
> + /* P9 specific fields */
> + struct {
> +#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
> + /* The PSSCR value that the kernel requested before 
> going to stop */
> + u64 requested_psscr;
> + /* Flag to request this thread not to stop */
> + atomic_t dont_stop;
> +#endif
> + };
> + };
>  #endif
[..snip..]

>

[PATCH 2/2] powerpc/64s: reimplement book3s idle code in C

2018-08-02 Thread Nicholas Piggin
Reimplement Book3S idle code in C, moving POWER7/8/9 implementation
speific HV idle code to the powernv platform code. Generic book3s
assembly stubs are kept in common code and used only to save and
restore the stack frame and non-volatile GPRs just before going to
idle and just after waking up, which can return into C code.

Moving the HMI, SPR, OPAL, locking, etc. to C is the only real
way this stuff will cope with non-trivial new CPU implementation
details, firmware changes, etc., without becoming unmaintainable.

This is not a strict translation to C code, there are some
differences.

- Idle wakeup no longer uses the ->cpu_restore call to reinit SPRs,
  but saves and restores them all explicitly.

- The optimisation where EC=ESL=0 idle modes did not have to save
  GPRs or change MSR is restored, because it's now simple to do.
  State loss modes that did not actually lose GPRs can use this
  optimization too.

- KVM secondary entry code is now more of a call/return style
  rather than spaghetti. nap_state_lost is not required beccause
  KVM always returns via NVGPR restorig path.

This seems pretty solid, so needs more review and testing now. The
KVM changes are pretty significant and complicated. POWER7 needs to
be tested too.

Open question:
- Why does P9 restore some of the PMU SPRs (but not others), and
  P8 only zeroes them?

Since RFC v1:
- Now tested and working with POWER9 hash and radix.
- KVM support added. This took a bit of work to untangle and might
  still have some issues, but POWER9 seems to work including hash on
  radix with dependent threads mode.
- This snowballed a bit because of KVM and other details making it
  not feasible to leave POWER7/8 code alone. That's only half done
  at the moment.
- So far this trades about 800 lines of asm for 500 of C. With POWER7/8
  support done it might be another hundred or so lines of C.

Since RFC v2:
- Fixed deep state SLB reloading
- Now tested and working with POWER8.
- Accounted for most feedback.

Since RFC v3:
- Rebased to powerpc merge + idle state bugfix
- Split SLB flush/restore code out and shared with MCE code (pseries
  MCE patches can also use).
- More testing on POWER8 including KVM with secondaries.
- Performance testing looks good. EC=ESL=0 is about 5% faster, other
  stop states look a little faster too.
- Adjusted SPR saving to handler POWER7, haven't tested it.
---
 arch/powerpc/include/asm/cpuidle.h   |  19 +-
 arch/powerpc/include/asm/paca.h  |  40 +-
 arch/powerpc/include/asm/processor.h |   9 +-
 arch/powerpc/include/asm/reg.h   |   7 +-
 arch/powerpc/kernel/asm-offsets.c|  18 -
 arch/powerpc/kernel/dt_cpu_ftrs.c|  21 +-
 arch/powerpc/kernel/exceptions-64s.S |  17 +-
 arch/powerpc/kernel/idle_book3s.S| 998 ++-
 arch/powerpc/kernel/setup-common.c   |   4 +-
 arch/powerpc/kvm/book3s_hv_rmhandlers.S  |  90 +-
 arch/powerpc/platforms/powernv/idle.c| 816 ++
 arch/powerpc/platforms/powernv/subcore.c |   2 +-
 arch/powerpc/xmon/xmon.c |  25 +-
 13 files changed, 859 insertions(+), 1207 deletions(-)

diff --git a/arch/powerpc/include/asm/cpuidle.h 
b/arch/powerpc/include/asm/cpuidle.h
index 43e5f31fe64d..9844b3ded187 100644
--- a/arch/powerpc/include/asm/cpuidle.h
+++ b/arch/powerpc/include/asm/cpuidle.h
@@ -27,10 +27,11 @@
  * the THREAD_WINKLE_BITS are set, which indicate which threads have not
  * yet woken from the winkle state.
  */
-#define PNV_CORE_IDLE_LOCK_BIT 0x1000
+#define NR_PNV_CORE_IDLE_LOCK_BIT  28
+#define PNV_CORE_IDLE_LOCK_BIT (1ULL << 
NR_PNV_CORE_IDLE_LOCK_BIT)
 
+#define PNV_CORE_IDLE_WINKLE_COUNT_SHIFT   16
 #define PNV_CORE_IDLE_WINKLE_COUNT 0x0001
-#define PNV_CORE_IDLE_WINKLE_COUNT_ALL_BIT 0x0008
 #define PNV_CORE_IDLE_WINKLE_COUNT_BITS0x000F
 #define PNV_CORE_IDLE_THREAD_WINKLE_BITS_SHIFT 8
 #define PNV_CORE_IDLE_THREAD_WINKLE_BITS   0xFF00
@@ -68,16 +69,6 @@
 #define ERR_DEEP_STATE_ESL_MISMATCH-2
 
 #ifndef __ASSEMBLY__
-/* Additional SPRs that need to be saved/restored during stop */
-struct stop_sprs {
-   u64 pid;
-   u64 ldbar;
-   u64 fscr;
-   u64 hfscr;
-   u64 mmcr1;
-   u64 mmcr2;
-   u64 mmcra;
-};
 
 #define PNV_IDLE_NAME_LEN16
 struct pnv_idle_states_t {
@@ -92,10 +83,6 @@ struct pnv_idle_states_t {
 
 extern struct pnv_idle_states_t *pnv_idle_states;
 extern int nr_pnv_idle_states;
-extern u32 pnv_fastsleep_workaround_at_entry[];
-extern u32 pnv_fastsleep_workaround_at_exit[];
-
-extern u64 pnv_first_deep_stop_state;
 
 unsigned long pnv_cpu_offline(unsigned int cpu);
 int validate_psscr_val_mask(u64 *psscr_val, u64 *psscr_mask, u32 flags);
diff --git a/arch/powerpc/include/asm/paca.h b/arch/powerpc/include/asm/paca.h
index 4e9cede5a7e7..d2cee5ebaaa1 100644
--- a/arch/powerpc/include/asm/paca.h
+++ b/arch/powerpc/include/asm/paca.h
@@ -168,7 +