from:"Akshay Adiga"

Re: [PATCH v5 02/10] powerpc/powernv/idle: Restore AMR/UAMOR/AMOR after idle

2019-03-13 Thread Akshay Adiga

On Fri, Mar 08, 2019 at 12:16:11PM +1100, Michael Ellerman wrote:
> In order to implement KUAP (Kernel Userspace Access Protection) on
> Power9 we will be using the AMR, and therefore indirectly the
> UAMOR/AMOR.
> 
> So save/restore these regs in the idle code.
> 
> Signed-off-by: Michael Ellerman 
> ---
> v5: Unchanged.
> v4: New.
> 
>  arch/powerpc/kernel/idle_book3s.S | 27 +++
>  1 file changed, 23 insertions(+), 4 deletions(-)

Opps.. i posted a comment on the v4. 

It would be good if we can make AMOR/UAMOR/AMR save-restore
code power9 only.

Re: [PATCH v4 2/9] powerpc/powernv/idle: Restore AMR/UAMOR/AMOR after idle

2019-03-13 Thread Akshay Adiga

On Fri, Mar 01, 2019 at 01:49:10AM +1100, Michael Ellerman wrote:
> In order to implement KUAP (Kernel Userspace Access Protection) on
> Power9 we will be using the AMR, and therefore indirectly the
> UAMOR/AMOR.
> 
> So save/restore these regs in the idle code.
> 
> Signed-off-by: Michael Ellerman 
> ---
>  arch/powerpc/kernel/idle_book3s.S | 27 +++
>  1 file changed, 23 insertions(+), 4 deletions(-)
> 
> v4: New.
> 
> diff --git a/arch/powerpc/kernel/idle_book3s.S 
> b/arch/powerpc/kernel/idle_book3s.S
> index 36178000a2f2..4a860d3b9229 100644
> --- a/arch/powerpc/kernel/idle_book3s.S
> +++ b/arch/powerpc/kernel/idle_book3s.S
> @@ -170,8 +170,11 @@ END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_300)
>   bne-core_idle_lock_held
>   blr
> 
> -/* Reuse an unused pt_regs slot for IAMR */
> +/* Reuse some unused pt_regs slots for AMR/IAMR/UAMOR/UAMOR */
> +#define PNV_POWERSAVE_AMR_TRAP
>  #define PNV_POWERSAVE_IAMR   _DAR
> +#define PNV_POWERSAVE_UAMOR  _DSISR
> +#define PNV_POWERSAVE_AMOR   RESULT
> 
>  /*
>   * Pass requested state in r3:
> @@ -205,8 +208,16 @@ END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_300)
>   SAVE_NVGPRS(r1)
> 
>  BEGIN_FTR_SECTION
> + mfspr   r4, SPRN_AMR
>   mfspr   r5, SPRN_IAMR
> + mfspr   r6, SPRN_UAMOR
> + std r4, PNV_POWERSAVE_AMR(r1)
>   std r5, PNV_POWERSAVE_IAMR(r1)
> + std r6, PNV_POWERSAVE_UAMOR(r1)
> +BEGIN_FTR_SECTION_NESTED(42)
> + mfspr   r7, SPRN_AMOR
> + std r7, PNV_POWERSAVE_AMOR(r1)
> +END_FTR_SECTION_NESTED_IFSET(CPU_FTR_HVMODE, 42)
>  END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)

Can we save/restore for ARCH_300S only ?

I checked the user manual "POWER8 Processor User’s Manual for the
Single-Chip Module" v1.3 dated 16 March 2016. In section 9.7.2.2,
It mentions that AMOR and UAMOR will be preserved by hardware.

Excrept : 
" When an SMT thread switch is enabled, the napping thread’s
 resources can be given to other threads to improve core
 performance. Software must restore the architected state
 of the dormant thread upon exiting nap mode except for
 the following architected facilities, which are preserved
 by the hardware:
 - SLB State
 - All Hypervisor Special Registers (includes PURR, SPURR, AMOR,
 UAMOR, AMR, ACOP)
"

Keeping 207S has no harm, its just that we may spend a few
more cycles saving and restoring the sprs.

> 
>   mfcrr5
> @@ -935,12 +946,20 @@ END_FTR_SECTION_IFSET(CPU_FTR_HVMODE)
>   REST_GPR(2, r1)
> 
>  BEGIN_FTR_SECTION
> - /* IAMR was saved in pnv_powersave_common() */
> + /* These regs were saved in pnv_powersave_common() */
> + ld  r4, PNV_POWERSAVE_AMR(r1)
>   ld  r5, PNV_POWERSAVE_IAMR(r1)
> + ld  r6, PNV_POWERSAVE_UAMOR(r1)
> + mtspr   SPRN_AMR, r4
>   mtspr   SPRN_IAMR, r5
> + mtspr   SPRN_UAMOR, r6
> +BEGIN_FTR_SECTION_NESTED(42)
> + ld  r7, PNV_POWERSAVE_AMOR(r1)
> + mtspr   SPRN_AMOR, r7
> +END_FTR_SECTION_NESTED_IFSET(CPU_FTR_HVMODE, 42)
>   /*
> -  * We don't need an isync here because the upcoming mtmsrd is
> -  * execution synchronizing.
> +  * We don't need an isync here after restoring IAMR because the upcoming
> +  * mtmsrd is execution synchronizing.
>*/
>  END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
> 
> -- 
> 2.20.1
>

Re: [PATCH v2] powerpc/powernv/idle: Restore IAMR after idle

2019-02-27 Thread Akshay Adiga

The changes look good to me.

On Fri, Feb 08, 2019 at 10:11:03PM +1100, Russell Currey wrote:
> Without restoring the IAMR after idle, execution prevention on POWER9
> with Radix MMU is overwritten and the kernel can freely execute userspace 
> without
> faulting.
> 
> This is necessary when returning from any stop state that modifies user
> state, as well as hypervisor state.
> 
> To test how this fails without this patch, load the lkdtm driver and
> do the following:
> 
>echo EXEC_USERSPACE > /sys/kernel/debug/provoke-crash/DIRECT
> 
> which won't fault, then boot the kernel with powersave=off, where it
> will fault.  Applying this patch will fix this.
> 
> Fixes: 3b10d0095a1e ("powerpc/mm/radix: Prevent kernel execution of user
> space")
> Cc: 
> Signed-off-by: Russell Currey 

Reviewed-by: Akshay Adiga

Re: [PATCH] powerpc/powernv/idle: Restore IAMR after idle

2019-02-20 Thread Akshay Adiga

On Wed, Feb 06, 2019 at 05:28:37PM +1100, Russell Currey wrote:
> Without restoring the IAMR after idle, execution prevention on POWER9
> with Radix MMU is overwritten and the kernel can freely execute userspace 
> without
> faulting.
> 
> This is necessary when returning from any stop state that modifies user
> state, as well as hypervisor state.
> 
> To test how this fails without this patch, load the lkdtm driver and
> do the following:
> 
>echo EXEC_USERSPACE > /sys/kernel/debug/provoke-crash/DIRECT
> 
> which won't fault, then boot the kernel with powersave=off, where it
> will fault.  Applying this patch will fix this.
> 
> Fixes: 3b10d0095a1e ("powerpc/mm/radix: Prevent kernel execution of user
> space")
> Cc: 
> Signed-off-by: Russell Currey 
> ---
>  arch/powerpc/include/asm/cpuidle.h |  1 +
>  arch/powerpc/kernel/asm-offsets.c  |  1 +
>  arch/powerpc/kernel/idle_book3s.S  | 20 
>  3 files changed, 22 insertions(+)
> 
> diff --git a/arch/powerpc/include/asm/cpuidle.h 
> b/arch/powerpc/include/asm/cpuidle.h
> index 43e5f31fe64d..ad67dbe59498 100644
> --- a/arch/powerpc/include/asm/cpuidle.h
> +++ b/arch/powerpc/include/asm/cpuidle.h
> @@ -77,6 +77,7 @@ struct stop_sprs {
>   u64 mmcr1;
>   u64 mmcr2;
>   u64 mmcra;
> + u64 iamr;
>  };
> 
>  #define PNV_IDLE_NAME_LEN16
> diff --git a/arch/powerpc/kernel/asm-offsets.c 
> b/arch/powerpc/kernel/asm-offsets.c
> index 9ffc72ded73a..10e0314c2b0d 100644
> --- a/arch/powerpc/kernel/asm-offsets.c
> +++ b/arch/powerpc/kernel/asm-offsets.c
> @@ -774,6 +774,7 @@ int main(void)
>   STOP_SPR(STOP_MMCR1, mmcr1);
>   STOP_SPR(STOP_MMCR2, mmcr2);
>   STOP_SPR(STOP_MMCRA, mmcra);
> + STOP_SPR(STOP_IAMR, iamr);
>  #endif
> 
>   DEFINE(PPC_DBELL_SERVER, PPC_DBELL_SERVER);
> diff --git a/arch/powerpc/kernel/idle_book3s.S 
> b/arch/powerpc/kernel/idle_book3s.S
> index 7f5ac2e8581b..bb4f552f6c7e 100644
> --- a/arch/powerpc/kernel/idle_book3s.S
> +++ b/arch/powerpc/kernel/idle_book3s.S
> @@ -200,6 +200,12 @@ pnv_powersave_common:
>   /* Continue saving state */
>   SAVE_GPR(2, r1)
>   SAVE_NVGPRS(r1)
> +
> +BEGIN_FTR_SECTION
> + mfspr   r5, SPRN_IAMR
> + std r5, STOP_IAMR(r13)
> +END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
> +
>   mfcrr5
>   std r5,_CCR(r1)
>   std r1,PACAR1(r13)
> @@ -924,6 +930,13 @@ BEGIN_FTR_SECTION
>  END_FTR_SECTION_IFSET(CPU_FTR_HVMODE)
>   REST_NVGPRS(r1)
>   REST_GPR(2, r1)
> +
> +BEGIN_FTR_SECTION
> + ld  r4, STOP_IAMR(r13)
> + mtspr   SPRN_IAMR, r4
> + isync
> +END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
> +
>   ld  r4,PACAKMSR(r13)
>   ld  r5,_LINK(r1)
>   ld  r6,_CCR(r1)
> @@ -946,6 +959,13 @@ pnv_wakeup_noloss:
>  BEGIN_FTR_SECTION
>   CHECK_HMI_INTERRUPT
>  END_FTR_SECTION_IFSET(CPU_FTR_HVMODE)
> +
> +BEGIN_FTR_SECTION
> + ld  r4, STOP_IAMR(r13)
> + mtspr   SPRN_IAMR, r4
> + isync
> +END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
> +

pnv_wakeup_noloss gets called from two paths:
1) cpu wakes up at 0x100  (ESL=EC=1)
2) cpu wakes up at next instruction  (ESL=EC=0)

We know for the fact that its not lost with ESL=EC=0 , so we
should put it somewhere else.

I would like to put the restore code in pnv_restore_hyp_resource_arch300
it already has some work arounds we can add another.

By this time we cr3 has comparison of SRR1[46:47] with 2

  542 pnv_restore_hyp_resource_arch300:
  543 /*
  544  * Workaround for POWER9, if we lost resources, the ERAT
  545  * might have been mixed up and needs flushing. We also need
  546  * to reload MMCR0 (see comment above). We also need to set
  547  * then clear bit 60 in MMCRA to ensure the PMU starts running.
  548  */
  549 blt cr3,1f
  550 BEGIN_FTR_SECTION
  551 PPC_INVALIDATE_ERAT
  552 ld  r1,PACAR1(r13)
  553 ld  r4,_MMCR0(r1)
  554 mtspr   SPRN_MMCR0,r4
  555 END_FTR_SECTION_IFCLR(CPU_FTR_POWER9_DD2_1)
  556 mfspr   r4,SPRN_MMCRA
  557 ori r4,r4,(1 << (63-60))
  558 mtspr   SPRN_MMCRA,r4
  559 xorir4,r4,(1 << (63-60))
  560 mtspr   SPRN_MMCRA,r4
+ 561 BEGIN_FTR_SECTION
+ 562 ld  r4,STOP_IAMR(r13)
+ 563 mtspr   SPRN_IAMR,r4
+ 564 END_FTR_SECTION_IFSET(CPU_FTR_ARCH_300)
  565 1:
  566 /* 

I have a patchset to handle both AMOR and IAMR,
need to test it on power8 before posting.


>   ld  r4,PACAKMSR(r13)
>   ld  r5,_NIP(r1)
>   ld  r6,_CCR(r1)
> -- 
> 2.20.1
>

Re: [PATCH] powerpc/powernv/idle: Restore IAMR after idle

2019-02-19 Thread Akshay Adiga

On Wed, Feb 06, 2019 at 05:28:37PM +1100, Russell Currey wrote:
> Without restoring the IAMR after idle, execution prevention on POWER9
> with Radix MMU is overwritten and the kernel can freely execute userspace 
> without
> faulting.
> 
> This is necessary when returning from any stop state that modifies user
> state, as well as hypervisor state.
> 
> To test how this fails without this patch, load the lkdtm driver and
> do the following:
> 
>echo EXEC_USERSPACE > /sys/kernel/debug/provoke-crash/DIRECT
> 
> which won't fault, then boot the kernel with powersave=off, where it
> will fault.  Applying this patch will fix this.
> 
> Fixes: 3b10d0095a1e ("powerpc/mm/radix: Prevent kernel execution of user
> space")
> Cc: 
> Signed-off-by: Russell Currey 
> ---
>  arch/powerpc/include/asm/cpuidle.h |  1 +
>  arch/powerpc/kernel/asm-offsets.c  |  1 +
>  arch/powerpc/kernel/idle_book3s.S  | 20 
>  3 files changed, 22 insertions(+)
> 
> diff --git a/arch/powerpc/include/asm/cpuidle.h 
> b/arch/powerpc/include/asm/cpuidle.h
> index 43e5f31fe64d..ad67dbe59498 100644
> --- a/arch/powerpc/include/asm/cpuidle.h
> +++ b/arch/powerpc/include/asm/cpuidle.h
> @@ -77,6 +77,7 @@ struct stop_sprs {
>   u64 mmcr1;
>   u64 mmcr2;
>   u64 mmcra;
> + u64 iamr;
>  };
> 
>  #define PNV_IDLE_NAME_LEN16
> diff --git a/arch/powerpc/kernel/asm-offsets.c 
> b/arch/powerpc/kernel/asm-offsets.c
> index 9ffc72ded73a..10e0314c2b0d 100644
> --- a/arch/powerpc/kernel/asm-offsets.c
> +++ b/arch/powerpc/kernel/asm-offsets.c
> @@ -774,6 +774,7 @@ int main(void)
>   STOP_SPR(STOP_MMCR1, mmcr1);
>   STOP_SPR(STOP_MMCR2, mmcr2);
>   STOP_SPR(STOP_MMCRA, mmcra);
> + STOP_SPR(STOP_IAMR, iamr);
>  #endif
> 
>   DEFINE(PPC_DBELL_SERVER, PPC_DBELL_SERVER);
> diff --git a/arch/powerpc/kernel/idle_book3s.S 
> b/arch/powerpc/kernel/idle_book3s.S
> index 7f5ac2e8581b..bb4f552f6c7e 100644
> --- a/arch/powerpc/kernel/idle_book3s.S
> +++ b/arch/powerpc/kernel/idle_book3s.S
> @@ -200,6 +200,12 @@ pnv_powersave_common:
>   /* Continue saving state */
>   SAVE_GPR(2, r1)
>   SAVE_NVGPRS(r1)
> +
> +BEGIN_FTR_SECTION
> + mfspr   r5, SPRN_IAMR
> + std r5, STOP_IAMR(r13)
> +END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
> +

Are we trying to add for both power8 and power9 ?
power9 would be CPU_FTR_ARCH_300.

Re: [PATCH] powerpc/powernv/idle: Restore IAMR after idle

2019-02-19 Thread Akshay Adiga

On Tue, Feb 19, 2019 at 02:21:04PM +1000, Nicholas Piggin wrote:
> Michael Ellerman's on February 8, 2019 11:04 am:
> > Nicholas Piggin  writes:
> >> Russell Currey's on February 6, 2019 4:28 pm:
> >>> Without restoring the IAMR after idle, execution prevention on POWER9
> >>> with Radix MMU is overwritten and the kernel can freely execute userspace 
> >>> without
> >>> faulting.
> >>> 
> >>> This is necessary when returning from any stop state that modifies user
> >>> state, as well as hypervisor state.
> >>> 
> >>> To test how this fails without this patch, load the lkdtm driver and
> >>> do the following:
> >>> 
> >>>echo EXEC_USERSPACE > /sys/kernel/debug/provoke-crash/DIRECT
> >>> 
> >>> which won't fault, then boot the kernel with powersave=off, where it
> >>> will fault.  Applying this patch will fix this.
> >>> 
> >>> Fixes: 3b10d0095a1e ("powerpc/mm/radix: Prevent kernel execution of user
> >>> space")
> >>> Cc: 
> >>> Signed-off-by: Russell Currey 
> >>
> >> Good catch and debugging. This really should be a quirk, we don't want 
> >> to have to restore this thing on a thread switch.
> > 
> > I'm not sure I follow. We don't context switch it on Radix, but we do
> > on hash if pkeys are enabled.
> 
> Badly worded, I mean a hardware quirk. It should follow thread
> switches. Still, avoiding it for the no-loss case is better than
> nothing. We can just revisit it as an optimization if future
> hardware does not require the restore.

Apparently, the POWER9 Processor User’s Manual v2.0 documents that
IAMR can be lost, and that is not just the end.

Pasting excerpt from "Section 23.5.9.2 State Loss and Restoration,Page 309"

  On the POWER9 core, the only state that can be lost for
  Stop levels less than four, when PSSCR[ESL] = ‘1’ are the
  following SPRs: CR, FPSCR, VSCR, XER, DSCR, AMR, IAMR, UAMOR,
  AMOR, DAWR, DAWRX.

My observation is that AMOR is being used in kernel as of today
and AMOR is also lost (recreated in similar scenarios where
IAMR is lost).

Re: [PATCH] cpufreq: powernv: fix missing check of return value in init_powernv_pstates()

2019-02-17 Thread Akshay Adiga

On Sat, Feb 16, 2019 at 12:06:23PM -0500, Yangtao Li wrote:
> kmalloc() could fail, so insert a check of its return value. And
> if it fails, returns -ENOMEM.
> 
> And remove (struct pstate_idx_revmap_data *) to fix coccinelle WARNING
> by the way.
> 
> WARNING: casting value returned by memory allocation function to (struct
> pstate_idx_revmap_data *) is useless.
> 
> Signed-off-by: Yangtao Li 
> ---
>  drivers/cpufreq/powernv-cpufreq.c | 10 +++---
>  1 file changed, 7 insertions(+), 3 deletions(-)
> 
Looks good to me. Thanks for fixing this.

[RFC PATCH v2 3/3] cpuidle/powernv: save-restore sprs in opal

2018-10-11 Thread Akshay Adiga

From: Abhishek Goel 

This patch moves the saving and restoring of sprs for P9 cpuidle
from kernel to opal.
In an attempt to make the powernv idle code backward compatible,
and to some extent forward compatible, add support for pre-stop entry
and post-stop exit actions in OPAL. If a kernel knows about this
opal call, then just a firmware supporting newer hardware is required,
instead of waiting for kernel updates.

Signed-off-by: Abhishek Goel 
Signed-off-by: Akshay Adiga 
---
Changes from v1 :
 - Code is rebased on Nick Piggin's v4 patch "powerpc/64s: reimplement book3s
   idle code in C"
 - Set a global variable "request_opal_call" to indicate that deep
   states should make opal_call.
 - All the states that loses hypervisor states will be handled by OPAL
 - All the decision making such as identifying first thread in
   the core and taking locks before restoring in such cases have also been
   moved to OPAL
 arch/powerpc/include/asm/opal-api.h   |  4 +-
 arch/powerpc/include/asm/opal.h   |  3 +
 arch/powerpc/include/asm/processor.h  |  3 +-
 arch/powerpc/kernel/idle_book3s.S |  6 +-
 arch/powerpc/platforms/powernv/idle.c | 88 +--
 .../powerpc/platforms/powernv/opal-wrappers.S |  2 +
 6 files changed, 77 insertions(+), 29 deletions(-)

diff --git a/arch/powerpc/include/asm/opal-api.h 
b/arch/powerpc/include/asm/opal-api.h
index 8365353330b4..93ea1f79e295 100644
--- a/arch/powerpc/include/asm/opal-api.h
+++ b/arch/powerpc/include/asm/opal-api.h
@@ -210,7 +210,9 @@
 #define OPAL_PCI_GET_PBCQ_TUNNEL_BAR   164
 #define OPAL_PCI_SET_PBCQ_TUNNEL_BAR   165
 #defineOPAL_NX_COPROC_INIT 167
-#define OPAL_LAST  167
+#define OPAL_IDLE_SAVE 170
+#define OPAL_IDLE_RESTORE  171
+#define OPAL_LAST  171
 
 #define QUIESCE_HOLD   1 /* Spin all calls at entry */
 #define QUIESCE_REJECT 2 /* Fail all calls with OPAL_BUSY */
diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
index ff3866473afe..26995e16171e 100644
--- a/arch/powerpc/include/asm/opal.h
+++ b/arch/powerpc/include/asm/opal.h
@@ -356,6 +356,9 @@ extern int opal_handle_hmi_exception(struct pt_regs *regs);
 extern void opal_shutdown(void);
 extern int opal_resync_timebase(void);
 
+extern int opal_cpuidle_save(u64 psscr);
+extern int opal_cpuidle_restore(u64 psscr, u64 srr1);
+
 extern void opal_lpc_init(void);
 
 extern void opal_kmsg_init(void);
diff --git a/arch/powerpc/include/asm/processor.h 
b/arch/powerpc/include/asm/processor.h
index 822d3236ad7f..26fa6c1836f4 100644
--- a/arch/powerpc/include/asm/processor.h
+++ b/arch/powerpc/include/asm/processor.h
@@ -510,7 +510,8 @@ static inline unsigned long get_clean_sp(unsigned long sp, 
int is_32)
 
 /* asm stubs */
 extern unsigned long isa300_idle_stop_noloss(unsigned long psscr_val);
-extern unsigned long isa300_idle_stop_mayloss(unsigned long psscr_val);
+extern unsigned long isa300_idle_stop_mayloss(unsigned long psscr_val,
+   bool request_opal_call);
 extern unsigned long isa206_idle_insn_mayloss(unsigned long type);
 
 extern unsigned long cpuidle_disable;
diff --git a/arch/powerpc/kernel/idle_book3s.S 
b/arch/powerpc/kernel/idle_book3s.S
index ffdee1ab4388..a2014d152035 100644
--- a/arch/powerpc/kernel/idle_book3s.S
+++ b/arch/powerpc/kernel/idle_book3s.S
@@ -52,14 +52,16 @@ _GLOBAL(isa300_idle_stop_noloss)
 _GLOBAL(isa300_idle_stop_mayloss)
mtspr   SPRN_PSSCR,r3
std r1,PACAR1(r13)
-   mflrr4
+   mflrr7
mfcrr5
/* use stack red zone rather than a new frame */
addir6,r1,-INT_FRAME_SIZE
SAVE_GPR(2, r6)
SAVE_NVGPRS(r6)
-   std r4,_LINK(r6)
+   std r7,_LINK(r6)
std r5,_CCR(r6)
+   cmpwi   r4,0
+   bne opal_cpuidle_save
PPC_STOP
b   .   /* catch bugs */
 
diff --git a/arch/powerpc/platforms/powernv/idle.c 
b/arch/powerpc/platforms/powernv/idle.c
index 681a23a066bb..bcfe08022e65 100644
--- a/arch/powerpc/platforms/powernv/idle.c
+++ b/arch/powerpc/platforms/powernv/idle.c
@@ -171,6 +171,7 @@ static void pnv_fastsleep_workaround_apply(void *info)
 
 static bool power7_fastsleep_workaround_entry = true;
 static bool power7_fastsleep_workaround_exit = true;
+static bool request_opal_call = false;
 
 /*
  * Used to store fastsleep workaround state
@@ -604,6 +605,7 @@ static unsigned long power9_idle_stop(unsigned long psscr, 
bool mmu_on)
unsigned long mmcr0 = 0;
struct p9_sprs sprs;
bool sprs_saved = false;
+   bool is_hv_loss = false;
 
memset(, 0, sizeof(sprs));
 
@@ -648,7 +650,9 @@ static unsigned long power9_idle_stop(unsigned long psscr, 
bool mmu_on)
  */
mmcr0

[RFC PATCH v2 2/3] powernv/cpuidle: Pass pointers instead of values to stop loop

2018-10-11 Thread Akshay Adiga

Passing pointer to the pnv_idle_state instead of psscr value and mask.
This helps us to pass more information to the stop loop. This will help to
figure out the method to enter/exit idle state.

Signed-off-by: Akshay Adiga 

---
Changes from v1 :
 - Code is rebased on Nick Piggin's v4 patch "powerpc/64s: reimplement book3s
   idle code in C"

 arch/powerpc/include/asm/processor.h  |  5 ++-
 arch/powerpc/platforms/powernv/idle.c | 47 ++-
 drivers/cpuidle/cpuidle-powernv.c | 15 +++--
 3 files changed, 24 insertions(+), 43 deletions(-)

diff --git a/arch/powerpc/include/asm/processor.h 
b/arch/powerpc/include/asm/processor.h
index 936795acba48..822d3236ad7f 100644
--- a/arch/powerpc/include/asm/processor.h
+++ b/arch/powerpc/include/asm/processor.h
@@ -43,6 +43,7 @@
 #include 
 #include 
 #include 
+#include 
 
 /* We do _not_ want to define new machine types at all, those must die
  * in favor of using the device-tree
@@ -518,9 +519,7 @@ enum idle_boot_override {IDLE_NO_OVERRIDE = 0, 
IDLE_POWERSAVE_OFF};
 extern int powersave_nap;  /* set if nap mode can be used in idle loop */
 
 extern void power7_idle_type(unsigned long type);
-extern void power9_idle_type(unsigned long stop_psscr_val,
- unsigned long stop_psscr_mask);
-
+extern void power9_idle_type(struct pnv_idle_states_t *state);
 extern void flush_instruction_cache(void);
 extern void hard_reset_now(void);
 extern void poweroff_now(void);
diff --git a/arch/powerpc/platforms/powernv/idle.c 
b/arch/powerpc/platforms/powernv/idle.c
index 755918402591..681a23a066bb 100644
--- a/arch/powerpc/platforms/powernv/idle.c
+++ b/arch/powerpc/platforms/powernv/idle.c
@@ -44,8 +44,7 @@ int nr_pnv_idle_states;
  * The default stop state that will be used by ppc_md.power_save
  * function on platforms that support stop instruction.
  */
-static u64 pnv_default_stop_val;
-static u64 pnv_default_stop_mask;
+struct pnv_idle_states_t *pnv_default_state;
 static bool default_stop_found;
 
 /*
@@ -72,9 +71,7 @@ const int nr_known_versions = 1;
  * psscr value and mask of the deepest stop idle state.
  * Used when a cpu is offlined.
  */
-static u64 pnv_deepest_stop_psscr_val;
-static u64 pnv_deepest_stop_psscr_mask;
-static u64 pnv_deepest_stop_flag;
+static struct pnv_idle_states_t *pnv_deepest_state;
 static bool deepest_stop_found;
 
 static unsigned long power7_offline_type;
@@ -96,7 +93,7 @@ static int pnv_save_sprs_for_deep_states(void)
uint64_t hid5_val   = mfspr(SPRN_HID5);
uint64_t hmeer_val  = mfspr(SPRN_HMEER);
uint64_t msr_val = MSR_IDLE;
-   uint64_t psscr_val = pnv_deepest_stop_psscr_val;
+   uint64_t psscr_val = pnv_deepest_state->psscr_val;
 
for_each_present_cpu(cpu) {
uint64_t pir = get_hard_smp_processor_id(cpu);
@@ -820,17 +817,15 @@ static unsigned long power9_offline_stop(unsigned long 
psscr)
return srr1;
 }
 
-static unsigned long __power9_idle_type(unsigned long stop_psscr_val,
- unsigned long stop_psscr_mask)
+static unsigned long __power9_idle_type(struct pnv_idle_states_t *state)
 {
unsigned long psscr;
unsigned long srr1;
 
if (!prep_irq_for_idle_irqsoff())
return 0;
-
psscr = mfspr(SPRN_PSSCR);
-   psscr = (psscr & ~stop_psscr_mask) | stop_psscr_val;
+   psscr = (psscr & ~state->psscr_mask) | state->psscr_val;
 
__ppc64_runlatch_off();
srr1 = power9_idle_stop(psscr, true);
@@ -841,12 +836,10 @@ static unsigned long __power9_idle_type(unsigned long 
stop_psscr_val,
return srr1;
 }
 
-void power9_idle_type(unsigned long stop_psscr_val,
- unsigned long stop_psscr_mask)
+void power9_idle_type(struct pnv_idle_states_t *state)
 {
unsigned long srr1;
-
-   srr1 = __power9_idle_type(stop_psscr_val, stop_psscr_mask);
+   srr1 = __power9_idle_type(state);
irq_set_pending_from_srr1(srr1);
 }
 
@@ -855,7 +848,7 @@ void power9_idle_type(unsigned long stop_psscr_val,
  */
 void power9_idle(void)
 {
-   power9_idle_type(pnv_default_stop_val, pnv_default_stop_mask);
+   power9_idle_type(pnv_default_state);
 }
 
 #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
@@ -974,8 +967,8 @@ unsigned long pnv_cpu_offline(unsigned int cpu)
unsigned long psscr;
 
psscr = mfspr(SPRN_PSSCR);
-   psscr = (psscr & ~pnv_deepest_stop_psscr_mask) |
-   pnv_deepest_stop_psscr_val;
+   psscr = (psscr & ~pnv_deepest_state->psscr_mask) |
+   pnv_deepest_state->psscr_val;
srr1 = power9_offline_stop(psscr);
} else if (cpu_has_feature(CPU_FTR_ARCH_206) && power7_offline_type) {
srr1 = power7_offline();
@@ -1123,16 +1116,1

[RFC PATCH v2 1/3] cpuidle/powernv: Add support for states with ibm, cpuidle-state-v1

2018-10-11 Thread Akshay Adiga

This patch adds support for new device-tree format for idle state
description.

Previously if a older kernel runs on a newer firmware, it may enable
all available states irrespective of its capability of handling it.
New device tree format adds a compatible flag, so that only kernel
which has the capability to handle the version of stop state will enable
it.

Older kernel will still see stop0 and stop0_lite in older format and we
will depricate it after some time.

1) Idea is to bump up the version in firmware if we find a bug or
regression in stop states. A fix will be provided in linux which would
now know about the bumped up version of stop states, where as kernel
without fixes would ignore the states.

2) Slowly deprecate cpuidle /cpuhotplug threshold which is hard-coded
into cpuidle-powernv driver. Instead use compatible strings to indicate
if idle state is suitable for cpuidle and hotplug.

New idle state device tree format :
   power-mgt {
...
 ibm,enabled-stop-levels = <0xec00>;
 ibm,cpu-idle-state-psscr-mask = <0x0 0x3003ff 0x0 0x3003ff>;
 ibm,cpu-idle-state-latencies-ns = <0x3e8 0x7d0>;
 ibm,cpu-idle-state-psscr = <0x0 0x330 0x0 0x300330>;
 ibm,cpu-idle-state-flags = <0x10 0x101000>;
 ibm,cpu-idle-state-residency-ns = <0x2710 0x4e20>;
 ibm,idle-states {
 stop4 {
 flags = <0x207000>;
 compatible = "ibm,state-v1",
  "opal-supported";
 type = "cpuidle";
 psscr-mask = <0x0 0x3003ff>;
 handle = <0x102>;
 latency-ns = <0x186a0>;
 residency-ns = <0x989680>;
 psscr = <0x0 0x300374>;
  };
...
stop11 {
 ...
 compatible = "ibm,state-v1",
  "opal-supported";
 type = "cpuoffline";
 ...
  };
 };
type strings :
"cpuidle" : indicates it should be used by cpuidle-driver
"cpuoffline" : indicates it should be used by hotplug driver

compatible strings :
"ibm,state-v1" : kernel checks if it knows about this version
"opal-supported" : indicates kernel can fall back to use opal
   for stop-transitions

Signed-off-by: Akshay Adiga 
---

Changes from v1 :
 - Code is rebased on Nick Piggin's v4 patch "powerpc/64s: reimplement book3s
   idle code in C"
 - Moved "cpuidle" and "cpuoffline" as seperate property called
   "type"
 

 arch/powerpc/include/asm/cpuidle.h|   9 ++
 arch/powerpc/platforms/powernv/idle.c | 132 +-
 drivers/cpuidle/cpuidle-powernv.c |  31 --
 3 files changed, 160 insertions(+), 12 deletions(-)

diff --git a/arch/powerpc/include/asm/cpuidle.h 
b/arch/powerpc/include/asm/cpuidle.h
index 9844b3ded187..e920a15e797f 100644
--- a/arch/powerpc/include/asm/cpuidle.h
+++ b/arch/powerpc/include/asm/cpuidle.h
@@ -70,14 +70,23 @@
 
 #ifndef __ASSEMBLY__
 
+enum idle_state_type_t {
+   CPUIDLE_TYPE,
+   CPUOFFLINE_TYPE
+};
+
+#define POWERNV_THRESHOLD_LATENCY_NS 20
+#define PNV_VER_NAME_LEN32
 #define PNV_IDLE_NAME_LEN16
 struct pnv_idle_states_t {
char name[PNV_IDLE_NAME_LEN];
+   char version[PNV_VER_NAME_LEN];
u32 latency_ns;
u32 residency_ns;
u64 psscr_val;
u64 psscr_mask;
u32 flags;
+   enum idle_state_type_t type;
bool valid;
 };
 
diff --git a/arch/powerpc/platforms/powernv/idle.c 
b/arch/powerpc/platforms/powernv/idle.c
index 96186af9e953..755918402591 100644
--- a/arch/powerpc/platforms/powernv/idle.c
+++ b/arch/powerpc/platforms/powernv/idle.c
@@ -54,6 +54,20 @@ static bool default_stop_found;
 static u64 pnv_first_tb_loss_level = MAX_STOP_STATE + 1;
 static u64 pnv_first_hv_loss_level = MAX_STOP_STATE + 1;
 
+
+static int parse_dt_v1(struct device_node *np);
+struct stop_version_t {
+   const char name[PNV_VER_NAME_LEN];
+   int (*parser_fn)(struct device_node *np);
+};
+struct stop_version_t known_versions[] = {
+   {
+   .name =  "ibm,state-v1",
+   .parser_fn = parse_dt_v1,
+   }
+   };
+const int nr_known_versions = 1;
+
 /*
  * psscr value and mask of the deepest stop idle state.
  * Used when a cpu is offlined.
@@ -1195,6 +1209,77 @@ static void __init pnv_probe_idle_states(void)
supported_cpuidle_states |= pnv_idle_states[i].flags;
 }
 
+static int parse_dt_v1(struct device_node *dt_node)
+{
+   const char *temp_str;

[RFC PATCH v2 0/3] New device-tree format and Opal based idle save-restore

2018-10-11 Thread Akshay Adiga

Previously if a older kernel runs on a newer firmware, it may enable
all available states irrespective of its capability of handling it.
New device tree format adds a compatible flag, so that only kernel
which has the capability to handle the version of stop state will enable
it.

Older kernel will still see stop0 and stop0_lite in older format and we
will depricate it after some time.

1) Idea is to bump up the version string in firmware if we find a bug or
regression in stop states. A fix will be provided in linux which would
now know about the bumped up version of stop states, where as kernel
without fixes would ignore the states.

2) Slowly deprecate cpuidle/cpuhotplug threshold which is hard-coded
into cpuidle-powernv driver. Instead use compatible strings to indicate
if idle state is suitable for cpuidle and hotplug.

New idle state device tree format :
   power-mgt {
...
 ibm,enabled-stop-levels = <0xec00>;
 ibm,cpu-idle-state-psscr-mask = <0x0 0x3003ff 0x0 0x3003ff>;
 ibm,cpu-idle-state-latencies-ns = <0x3e8 0x7d0>;
 ibm,cpu-idle-state-psscr = <0x0 0x330 0x0 0x300330>;
 ibm,cpu-idle-state-flags = <0x10 0x101000>;
 ibm,cpu-idle-state-residency-ns = <0x2710 0x4e20>;
 ibm,idle-states {
 stop4 {
 flags = <0x207000>;
 compatible = "ibm,state-v1",
  "opal-support";
 type = "cpuidle";
 psscr-mask = <0x0 0x3003ff>;
 handle = <0x102>;
 latency-ns = <0x186a0>;
 residency-ns = <0x989680>;
 psscr = <0x0 0x300374>;
  };
...
stop11 {
 ...
 compatible = "ibm,state-v1",
  "opal-support";
 type = "cpuoffline";
 ...
  };
 };

High-level parsing algorithm :

Say Known version string = "ibm,state-v1"

for each stop state node in device tree:
if (compatible has known version string)
kernel takes care of stop-transitions
else if (compatible has "opal-support")
OPAL takes care of stop-transitions
else
Skip All deeper states

When a state does not have both version support and opal support,
Its possible to exit from a shallower state. Hence skipping all
deeper states.

OPAL support for idle states


With this patch series, all the states that loose hypervisor state
will be handled through opal_call.

Patch 3 adds support for Saving/restoring of SPRs and resync-timebase
in OPAL. Also all the decision making such as identifying first thread
in the core and taking locks before restoring, etc are implemented in
OPAL.

How does it work ?
---

Consider a case that stop4 has a bug. We take the following steps to
mitigate the problem.

1) Change compatible string for stop4 in OPAL to "ibm-state-v2" and
remove "opal-supported". ship the new firmware.
The kernel ignores stop4 and all deeper states. But we will still have
shallower states. Prevents from completely disabling stop states.

2) Implement workaround in OPAL and add "opal-supported". Ship new firmware
The kernel uses opal for stop-transtion , which has workaround implemented.
We get stop4 and deeper states working without kernel changes and backports.
(and considerably less time)

3) Implement workaround in kernel and add "ibm-state-v2" as known versions
The kernel will now be able to handle stop4 and deeper states.

Changes from v1 :
 - Code is rebased on Nick Piggin's v4 patch "powerpc/64s: reimplement book3s
   idle code in C"
http://patchwork.ozlabs.org/patch/969596/
 - All the states that loses hypervisor states will be handled by OPAL
 - All the decision making such as identifying first thread in
   the core and taking locks before restoring in such cases have also been
   moved to OPAL


Abhishek Goel (1):
  cpuidle/powernv: save-restore sprs in opal

Akshay Adiga (2):
  cpuidle/powernv: Add support for states with ibm,cpuidle-state-v1
  powernv/cpuidle: Pass pointers instead of values to  stop loop

 arch/powerpc/include/asm/cpuidle.h|   9 +
 arch/powerpc/include/asm/opal-api.h   |   4 +-
 arch/powerpc/include/asm/opal.h   |   3 +
 arch/powerpc/include/asm/processor.h  |   8 +-
 arch/powerpc/kernel/idle_book3s.S |   6 +-
 arch/powerpc/platforms/powernv/idle.c | 247 ++
 .../powerpc/platforms/powernv/opal-wrappers.S |   2 +
 drivers/cpuidle/cpuidle-powernv.c |  46 ++--
 8 files changed, 251 insertions(+), 74 deletions(-)

-- 
2.17.1

Re: [PATCH v4] powerpc/64s: reimplement book3s idle code in C

2018-09-26 Thread Akshay Adiga

On Fri, Sep 14, 2018 at 11:52:40AM +1000, Nicholas Piggin wrote:
> +
> + /*
> +  * On POWER9, SRR1 bits do not match exactly as expected.
> +  * SRR1_WS_GPRLOSS (10b) can also result in SPR loss, so
> +  * always test PSSCR if there is any state loss.
> +  */
> + if (likely((psscr & PSSCR_RL_MASK) < pnv_first_hv_loss_level)) {

Shouldn't we check PLS field to see if the cpu/core woke up from hv loss ?

Currently, a cpu requested stop4 (RL=4) and exited from a shallower state
(PLS=2), SPR's are unecessarily restored.

We can do something like : 

#define PSSCR_PLS_SHIFT 60
if (likely((psscr & PSSCR_PLS) >> PSSCR_PLS_SHIFT) < pnv_first_hv_loss_level)
> + if (sprs_saved)
> + atomic_stop_thread_idle();
> + goto out;
> + }
> +
> + /* HV state loss */
> + BUG_ON(!sprs_saved);
> +
> + atomic_lock_thread_idle();
> +
> + if ((*state & ((1 << threads_per_core) - 1)) != 0)
> + goto core_woken;
> +
> + /* Per-core SPRs */
> + mtspr(SPRN_PTCR,sprs.ptcr);
> + mtspr(SPRN_RPR, sprs.rpr);
> + mtspr(SPRN_TSCR,sprs.tscr);
> + mtspr(SPRN_LDBAR,   sprs.ldbar);
> + mtspr(SPRN_AMOR,sprs.amor);
> +
> + if ((psscr & PSSCR_RL_MASK) >= pnv_first_tb_loss_level) {
> + /* TB loss */
> + if (opal_resync_timebase() != OPAL_SUCCESS)
> + BUG();
> + }

Re: [PATCH] powernv/cpuidle: Fix idle states all being marked invalid

2018-08-03 Thread Akshay Adiga

On Fri, Aug 03, 2018 at 01:39:51AM +1000, Nicholas Piggin wrote:
> Commit 9c7b185ab2 ("powernv/cpuidle: Parse dt idle properties into
> global structure") parses dt idle states into structs, but never
> marks them valid. This results in all idle states being lost.
>
My bad. Thanks nick for fixing this. We definatetely need this.


> Cc: Akshay Adiga 
> Cc: Gautham R. Shenoy 
> Signed-off-by: Nicholas Piggin 

Acked-by: Akshay Adiga 

> ---
>  arch/powerpc/platforms/powernv/idle.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/platforms/powernv/idle.c 
> b/arch/powerpc/platforms/powernv/idle.c
> index 3116bab10aa3..ecb002c5db83 100644
> --- a/arch/powerpc/platforms/powernv/idle.c
> +++ b/arch/powerpc/platforms/powernv/idle.c
> @@ -651,11 +651,12 @@ static int __init pnv_power9_idle_init(void)
> >psscr_mask,
> state->flags);
>   if (err) {
> - state->valid = false;
>   report_invalid_psscr_val(state->psscr_val, err);
>   continue;
>   }
> 
> + state->valid = true;
> +
>   if (max_residency_ns < state->residency_ns) {
>   max_residency_ns = state->residency_ns;
>   pnv_deepest_stop_psscr_val = state->psscr_val;
> -- 
> 2.17.0
>

[RFC PATCH 3/3] cpuidle/powernv: Conditionally save-restore sprs using opal

2018-08-01 Thread Akshay Adiga

From: Abhishek Goel 

If a state has "opal-supported" compat flag in device-tree, an opal call
needs to be made during the entry and exit of the stop state. This patch
passes a hint to the power9_idle_stop and power9_offline_stop.

This patch moves the saving and restoring of sprs for P9 cpuidle
from kernel to opal. This patch still uses existing code to detect
first thread in core.
In an attempt to make the powernv idle code backward compatible,
and to some extent forward compatible, add support for pre-stop entry
and post-stop exit actions in OPAL. If a kernel knows about this
opal call, then just a firmware supporting newer hardware is required,
instead of waiting for kernel updates.

Signed-off-by: Abhishek Goel 
Signed-off-by: Akshay Adiga 
---
 arch/powerpc/include/asm/cpuidle.h|  1 +
 arch/powerpc/include/asm/opal-api.h   |  4 +-
 arch/powerpc/include/asm/opal.h   |  3 +
 arch/powerpc/include/asm/paca.h   |  5 +-
 arch/powerpc/include/asm/processor.h  |  6 +-
 arch/powerpc/kernel/asm-offsets.c |  3 +
 arch/powerpc/kernel/idle_book3s.S | 55 ++-
 arch/powerpc/platforms/powernv/idle.c | 41 ++
 .../powerpc/platforms/powernv/opal-wrappers.S |  2 +
 arch/powerpc/xmon/xmon.c  | 14 ++---
 10 files changed, 109 insertions(+), 25 deletions(-)

diff --git a/arch/powerpc/include/asm/cpuidle.h 
b/arch/powerpc/include/asm/cpuidle.h
index b965066560cc..2fb2324d15fc 100644
--- a/arch/powerpc/include/asm/cpuidle.h
+++ b/arch/powerpc/include/asm/cpuidle.h
@@ -96,6 +96,7 @@ struct pnv_idle_states_t {
u64 psscr_val;
u64 psscr_mask;
u32 flags;
+   bool req_opal_call;
enum idle_state_type_t type;
bool valid;
 };
diff --git a/arch/powerpc/include/asm/opal-api.h 
b/arch/powerpc/include/asm/opal-api.h
index 3bab299eda49..6792a737bc9a 100644
--- a/arch/powerpc/include/asm/opal-api.h
+++ b/arch/powerpc/include/asm/opal-api.h
@@ -208,7 +208,9 @@
 #define OPAL_SENSOR_READ_U64   162
 #define OPAL_PCI_GET_PBCQ_TUNNEL_BAR   164
 #define OPAL_PCI_SET_PBCQ_TUNNEL_BAR   165
-#define OPAL_LAST  165
+#define OPAL_IDLE_SAVE 168
+#define OPAL_IDLE_RESTORE  169
+#define OPAL_LAST  169
 
 #define QUIESCE_HOLD   1 /* Spin all calls at entry */
 #define QUIESCE_REJECT 2 /* Fail all calls with OPAL_BUSY */
diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
index e1b2910c6e81..12d57aeacde2 100644
--- a/arch/powerpc/include/asm/opal.h
+++ b/arch/powerpc/include/asm/opal.h
@@ -356,6 +356,9 @@ extern void opal_kmsg_init(void);
 
 extern int opal_event_request(unsigned int opal_event_nr);
 
+extern int opal_cpuidle_save(u64 *stop_sprs, int scope, u64 psscr);
+extern int opal_cpuidle_restore(u64 *stop_sprs, int scope, u64 psscr, u64 
srr1);
+
 struct opal_sg_list *opal_vmalloc_to_sg_list(void *vmalloc_addr,
 unsigned long vmalloc_size);
 void opal_free_sg_list(struct opal_sg_list *sg);
diff --git a/arch/powerpc/include/asm/paca.h b/arch/powerpc/include/asm/paca.h
index 6d34bd71139d..586059594443 100644
--- a/arch/powerpc/include/asm/paca.h
+++ b/arch/powerpc/include/asm/paca.h
@@ -195,11 +195,14 @@ struct paca_struct {
/* The PSSCR value that the kernel requested before going to stop */
u64 requested_psscr;
 
+   u64 wakeup_psscr;
+   u8 req_opal_call;
/*
-* Save area for additional SPRs that need to be
+* Save area for SPRs that need to be
 * saved/restored during cpuidle stop.
 */
struct stop_sprs stop_sprs;
+   u64 *opal_stop_sprs;
 #endif
 
 #ifdef CONFIG_PPC_BOOK3S_64
diff --git a/arch/powerpc/include/asm/processor.h 
b/arch/powerpc/include/asm/processor.h
index 34f572056add..9f9fb1f11dd6 100644
--- a/arch/powerpc/include/asm/processor.h
+++ b/arch/powerpc/include/asm/processor.h
@@ -513,8 +513,10 @@ enum idle_boot_override {IDLE_NO_OVERRIDE = 0, 
IDLE_POWERSAVE_OFF};
 extern int powersave_nap;  /* set if nap mode can be used in idle loop */
 extern unsigned long power7_idle_insn(unsigned long type); /* 
PNV_THREAD_NAP/etc*/
 extern void power7_idle_type(unsigned long type);
-extern unsigned long power9_idle_stop(unsigned long psscr_val);
-extern unsigned long power9_offline_stop(unsigned long psscr_val);
+extern unsigned long power9_idle_stop(unsigned long psscr_val,
+ bool opal_enabled);
+extern unsigned long power9_offline_stop(unsigned long psscr_val,
+bool opal_enabled);
 extern void power9_idle_type(int index);
 
 extern void flush_instruction_cache(void);
diff --git a/arch/powerpc/kernel/asm-offsets.c 
b/arch/powerpc/kernel/asm-offsets.c
index 262c44a90ea1..740ae068e

[RFC PATCH 2/3] powernv/cpuidle: Pass pointers instead of values to stop loop

2018-08-01 Thread Akshay Adiga

Passing pointer to the pnv_idle_state instead of psscr value and mask.
This helps us to pass more information to the stop loop. This will help to
figure out the method to enter/exit idle state.

Signed-off-by: Akshay Adiga 
---
 arch/powerpc/include/asm/processor.h  |  3 +-
 arch/powerpc/platforms/powernv/idle.c | 58 ---
 drivers/cpuidle/cpuidle-powernv.c | 16 +++-
 3 files changed, 32 insertions(+), 45 deletions(-)

diff --git a/arch/powerpc/include/asm/processor.h 
b/arch/powerpc/include/asm/processor.h
index 5debe337ea9d..34f572056add 100644
--- a/arch/powerpc/include/asm/processor.h
+++ b/arch/powerpc/include/asm/processor.h
@@ -515,8 +515,7 @@ extern unsigned long power7_idle_insn(unsigned long type); 
/* PNV_THREAD_NAP/etc
 extern void power7_idle_type(unsigned long type);
 extern unsigned long power9_idle_stop(unsigned long psscr_val);
 extern unsigned long power9_offline_stop(unsigned long psscr_val);
-extern void power9_idle_type(unsigned long stop_psscr_val,
- unsigned long stop_psscr_mask);
+extern void power9_idle_type(int index);
 
 extern void flush_instruction_cache(void);
 extern void hard_reset_now(void);
diff --git a/arch/powerpc/platforms/powernv/idle.c 
b/arch/powerpc/platforms/powernv/idle.c
index 93accece92e3..a6ef9b68e27b 100644
--- a/arch/powerpc/platforms/powernv/idle.c
+++ b/arch/powerpc/platforms/powernv/idle.c
@@ -43,8 +43,7 @@ int nr_pnv_idle_states;
  * The default stop state that will be used by ppc_md.power_save
  * function on platforms that support stop instruction.
  */
-static u64 pnv_default_stop_val;
-static u64 pnv_default_stop_mask;
+struct pnv_idle_states_t *pnv_default_state;
 static bool default_stop_found;
 
 static int parse_dt_v1(struct device_node *np);
@@ -70,9 +69,7 @@ u64 pnv_first_deep_stop_state = MAX_STOP_STATE;
  * psscr value and mask of the deepest stop idle state.
  * Used when a cpu is offlined.
  */
-static u64 pnv_deepest_stop_psscr_val;
-static u64 pnv_deepest_stop_psscr_mask;
-static u64 pnv_deepest_stop_flag;
+static struct pnv_idle_states_t *pnv_deepest_state;
 static bool deepest_stop_found;
 
 static int pnv_save_sprs_for_deep_states(void)
@@ -92,7 +89,7 @@ static int pnv_save_sprs_for_deep_states(void)
uint64_t hid5_val = mfspr(SPRN_HID5);
uint64_t hmeer_val = mfspr(SPRN_HMEER);
uint64_t msr_val = MSR_IDLE;
-   uint64_t psscr_val = pnv_deepest_stop_psscr_val;
+   uint64_t psscr_val = pnv_deepest_state->psscr_val;
 
for_each_present_cpu(cpu) {
uint64_t pir = get_hard_smp_processor_id(cpu);
@@ -218,18 +215,15 @@ static void pnv_alloc_idle_core_states(void)
pr_warn("cpuidle-powernv: Idle power-savings, CPU-Hotplug 
affected\n");
 
if (cpu_has_feature(CPU_FTR_ARCH_300) &&
-   (pnv_deepest_stop_flag & OPAL_PM_LOSE_FULL_CONTEXT)) {
+   (pnv_deepest_state->flags & OPAL_PM_LOSE_FULL_CONTEXT)) {
/*
 * Use the default stop state for CPU-Hotplug
 * if available.
 */
if (default_stop_found) {
-   pnv_deepest_stop_psscr_val =
-   pnv_default_stop_val;
-   pnv_deepest_stop_psscr_mask =
-   pnv_default_stop_mask;
+   pnv_deepest_state = pnv_default_state;
pr_warn("cpuidle-powernv: Offlined CPUs will 
stop with psscr = 0x%016llx\n",
-   pnv_deepest_stop_psscr_val);
+   pnv_deepest_state->psscr_val);
} else { /* Fallback to snooze loop for CPU-Hotplug */
deepest_stop_found = false;
pr_warn("cpuidle-powernv: Offlined CPUs will 
busy wait\n");
@@ -365,20 +359,20 @@ void power7_idle(void)
power7_idle_type(PNV_THREAD_NAP);
 }
 
-static unsigned long __power9_idle_type(unsigned long stop_psscr_val,
- unsigned long stop_psscr_mask)
+static unsigned long __power9_idle_type(struct pnv_idle_states_t *state)
 {
-   unsigned long psscr;
+   unsigned long psscr, stop_psscr_mask, stop_psscr_val;
unsigned long srr1;
 
+   stop_psscr_mask = state->psscr_mask;
+   stop_psscr_val = state->psscr_val;
if (!prep_irq_for_idle_irqsoff())
return 0;
 
psscr = mfspr(SPRN_PSSCR);
psscr = (psscr & ~stop_psscr_mask) | stop_psscr_val;
-
__ppc64_runlatch_off();
-   srr1 = power9_idle_stop(psscr);
+   srr1 = power9_idle_stop(psscr, state->opal_supported);
__ppc64_runlatch_on();
 
fini_irq_for_idle_irqsoff();
@@ -386,12 +380,11 @@ static unsigned l

[RFC PATCH 1/3] cpuidle/powernv: Add support for states with ibm, cpuidle-state-v1

2018-08-01 Thread Akshay Adiga

This patch adds support for new device-tree format for idle state
description.

Previously if a older kernel runs on a newer firmware, it may enable
all available states irrespective of its capability of handling it.
New device tree format adds a compatible flag, so that only kernel
which has the capability to handle the version of stop state will enable
it.

Older kernel will still see stop0 and stop0_lite in older format and we
will depricate it after some time.

1) Idea is to bump up the version in firmware if we find a bug or
regression in stop states. A fix will be provided in linux which would
now know about the bumped up version of stop states, where as kernel
without fixes would ignore the states.

2) Slowly deprecate cpuidle /cpuhotplug threshold which is hard-coded
into cpuidle-powernv driver. Instead use compatible strings to indicate
if idle state is suitable for cpuidle and hotplug.

New idle state device tree format :
   power-mgt {
...
 ibm,enabled-stop-levels = <0xec00>;
 ibm,cpu-idle-state-psscr-mask = <0x0 0x3003ff 0x0 0x3003ff>;
 ibm,cpu-idle-state-latencies-ns = <0x3e8 0x7d0>;
 ibm,cpu-idle-state-psscr = <0x0 0x330 0x0 0x300330>;
 ibm,cpu-idle-state-flags = <0x10 0x101000>;
 ibm,cpu-idle-state-residency-ns = <0x2710 0x4e20>;
 ibm,idle-states {
 stop4 {
 flags = <0x207000>;
 compatible = "ibm,state-v1",
  "cpuidle",
  "opal-supported";
 psscr-mask = <0x0 0x3003ff>;
 handle = <0x102>;
 latency-ns = <0x186a0>;
 residency-ns = <0x989680>;
 psscr = <0x0 0x300374>;
  };
...
stop11 {
 ...
 compatible = "ibm,state-v1",
  "cpuoffline",
  "opal-supported";
 ...
  };
 };

compatible strings :
"cpuidle" : indicates it should be used by cpuidle-driver
"cpuoffline" : indicates it should be used by hotplug driver
"ibm,state-v1" : kernel checks if it knows about this version
"opal-supported" : indicates kernel can fall back to use opal
   for stop-transitions

Signed-off-by: Akshay Adiga 
---
 arch/powerpc/include/asm/cpuidle.h|  11 ++
 arch/powerpc/platforms/powernv/idle.c | 139 +-
 drivers/cpuidle/cpuidle-powernv.c |  50 +
 3 files changed, 175 insertions(+), 25 deletions(-)

diff --git a/arch/powerpc/include/asm/cpuidle.h 
b/arch/powerpc/include/asm/cpuidle.h
index 43e5f31fe64d..b965066560cc 100644
--- a/arch/powerpc/include/asm/cpuidle.h
+++ b/arch/powerpc/include/asm/cpuidle.h
@@ -79,17 +79,28 @@ struct stop_sprs {
u64 mmcra;
 };
 
+enum idle_state_type_t {
+   CPUIDLE_TYPE,
+   CPUOFFLINE_TYPE
+};
+
+
+#define POWERNV_THRESHOLD_LATENCY_NS 20
+#define PNV_VER_NAME_LEN32
 #define PNV_IDLE_NAME_LEN16
 struct pnv_idle_states_t {
char name[PNV_IDLE_NAME_LEN];
+   char version[PNV_VER_NAME_LEN];
u32 latency_ns;
u32 residency_ns;
u64 psscr_val;
u64 psscr_mask;
u32 flags;
+   enum idle_state_type_t type;
bool valid;
 };
 
+
 extern struct pnv_idle_states_t *pnv_idle_states;
 extern int nr_pnv_idle_states;
 extern u32 pnv_fastsleep_workaround_at_entry[];
diff --git a/arch/powerpc/platforms/powernv/idle.c 
b/arch/powerpc/platforms/powernv/idle.c
index 7cf71b3e03a1..93accece92e3 100644
--- a/arch/powerpc/platforms/powernv/idle.c
+++ b/arch/powerpc/platforms/powernv/idle.c
@@ -47,6 +47,19 @@ static u64 pnv_default_stop_val;
 static u64 pnv_default_stop_mask;
 static bool default_stop_found;
 
+static int parse_dt_v1(struct device_node *np);
+struct stop_version_t {
+   const char name[PNV_VER_NAME_LEN];
+   int (*parser_fn)(struct device_node *np);
+};
+struct stop_version_t known_versions[] = {
+   {
+   .name =  "ibm,state-v1",
+   .parser_fn = parse_dt_v1,
+   }
+};
+const int nr_known_versions = 1;
+
 /*
  * First deep stop state. Used to figure out when to save/restore
  * hypervisor context.
@@ -659,8 +672,14 @@ static int __init pnv_power9_idle_init(void)
state->valid = false;
report_invalid_psscr_val(state->psscr_val, err);
continue;
+   } else {
+   state->valid = true;
}
 
+   /*

[RFC PATCH 0/3] New device-tree format and Opal based idle save-restore

2018-08-01 Thread Akshay Adiga

Previously if a older kernel runs on a newer firmware, it may enable
all available states irrespective of its capability of handling it.
New device tree format adds a compatible flag, so that only kernel
which has the capability to handle the version of stop state will enable
it.

Older kernel will still see stop0 and stop0_lite in older format and we
will depricate it after some time.

1) Idea is to bump up the version string in firmware if we find a bug or
regression in stop states. A fix will be provided in linux which would
now know about the bumped up version of stop states, where as kernel
without fixes would ignore the states.

2) Slowly deprecate cpuidle/cpuhotplug threshold which is hard-coded
into cpuidle-powernv driver. Instead use compatible strings to indicate
if idle state is suitable for cpuidle and hotplug.

New idle state device tree format :
   power-mgt {
...
 ibm,enabled-stop-levels = <0xec00>;
 ibm,cpu-idle-state-psscr-mask = <0x0 0x3003ff 0x0 0x3003ff>;
 ibm,cpu-idle-state-latencies-ns = <0x3e8 0x7d0>;
 ibm,cpu-idle-state-psscr = <0x0 0x330 0x0 0x300330>;
 ibm,cpu-idle-state-flags = <0x10 0x101000>;
 ibm,cpu-idle-state-residency-ns = <0x2710 0x4e20>;
 ibm,idle-states {
 stop4 {
 flags = <0x207000>;
 compatible = "ibm,state-v1",
  "cpuidle",
  "opal-supported";
 psscr-mask = <0x0 0x3003ff>;
 handle = <0x102>;
 latency-ns = <0x186a0>;
 residency-ns = <0x989680>;
 psscr = <0x0 0x300374>;
  };
...
stop11 {
 ...
 compatible = "ibm,state-v1",
  "cpuoffline",
  "opal-supported";
 ...
  };
 };

Skiboot patch-set for device-tree is posted here :
https://patchwork.ozlabs.org/project/skiboot/list/?series=58934

High-level parsing algorithm :

Say Known version string = "ibm,state-v1"

for each stop state node in device tree:
if (compatible has known version string)
kernel takes care of stop-transitions
else if (compatible has "opal-supported")
OPAL takes care of stop-transitions
else
Skip All deeper states

When a state does not have both version support and opal support,
Its possible to exit from a shallower state. Hence skipping all
deeper states.

How does it work ?
---

Consider a case that stop4 has a bug. We take the following steps to
mitigate the problem.

1) Change compatible string for stop4 in OPAL to "ibm-state-v2" and
remove "opal-supported". ship the new firmware.
The kernel ignores stop4 and all deeper states. But we will still have
shallower states. Prevents from completely disabling stop states.

2) Implement workaround in OPAL and add "opal-supported". Ship new firmware
The kernel uses opal for stop-transtion , which has workaround implemented.
We get stop4 and deeper states working without kernel changes and backports.
(and considerably less time)

3) Implement workaround in kernel and add "ibm-state-v2" as known versions
The kernel will now be able to handle stop4 and deeper states.

Also includes Abhishek's RFC which was posted there :
 https://patchwork.ozlabs.org/patch/947568/

This patch-set is on top of mpe-next



Abhishek Goel (1):
  cpuidle/powernv: Conditionally save-restore sprs using opal

Akshay Adiga (2):
  cpuidle/powernv: Add support for states with ibm,cpuidle-state-v1
  powernv/cpuidle: Pass pointers instead of values to stop loop

 arch/powerpc/include/asm/cpuidle.h|  12 +
 arch/powerpc/include/asm/opal-api.h   |   4 +-
 arch/powerpc/include/asm/opal.h   |   3 +
 arch/powerpc/include/asm/paca.h   |   5 +-
 arch/powerpc/include/asm/processor.h  |   9 +-
 arch/powerpc/kernel/asm-offsets.c |   3 +
 arch/powerpc/kernel/idle_book3s.S |  55 -
 arch/powerpc/platforms/powernv/idle.c | 214 +++---
 .../powerpc/platforms/powernv/opal-wrappers.S |   2 +
 arch/powerpc/xmon/xmon.c  |  14 +-
 drivers/cpuidle/cpuidle-powernv.c |  66 +++---
 11 files changed, 304 insertions(+), 83 deletions(-)

-- 
2.18.0.rc2.85.g1fb9df7

[PATCH v4 2/2] powernv/cpuidle: Use parsed device tree values for cpuidle_init

2018-07-05 Thread Akshay Adiga

Export pnv_idle_states and nr_pnv_idle_states so that its accessible to
cpuidle driver. Use properties from pnv_idle_states structure for powernv
cpuidle_init.

Signed-off-by: Akshay Adiga 
Reviewed-by: Nicholas Piggin 
Reviewed-by: Gautham R. Shenoy 
---
 arch/powerpc/include/asm/cpuidle.h |   2 +
 drivers/cpuidle/cpuidle-powernv.c  | 158 +
 2 files changed, 28 insertions(+), 132 deletions(-)

diff --git a/arch/powerpc/include/asm/cpuidle.h 
b/arch/powerpc/include/asm/cpuidle.h
index 574b0ce1d671..43e5f31fe64d 100644
--- a/arch/powerpc/include/asm/cpuidle.h
+++ b/arch/powerpc/include/asm/cpuidle.h
@@ -90,6 +90,8 @@ struct pnv_idle_states_t {
bool valid;
 };
 
+extern struct pnv_idle_states_t *pnv_idle_states;
+extern int nr_pnv_idle_states;
 extern u32 pnv_fastsleep_workaround_at_entry[];
 extern u32 pnv_fastsleep_workaround_at_exit[];
 
diff --git a/drivers/cpuidle/cpuidle-powernv.c 
b/drivers/cpuidle/cpuidle-powernv.c
index d29e4f041efe..84b1ebe212b3 100644
--- a/drivers/cpuidle/cpuidle-powernv.c
+++ b/drivers/cpuidle/cpuidle-powernv.c
@@ -242,6 +242,7 @@ static inline void add_powernv_state(int index, const char 
*name,
powernv_states[index].target_residency = target_residency;
powernv_states[index].exit_latency = exit_latency;
powernv_states[index].enter = idle_fn;
+   /* For power8 and below psscr_* will be 0 */
stop_psscr_table[index].val = psscr_val;
stop_psscr_table[index].mask = psscr_mask;
 }
@@ -263,186 +264,80 @@ static inline int validate_dt_prop_sizes(const char 
*prop1, int prop1_len,
 extern u32 pnv_get_supported_cpuidle_states(void);
 static int powernv_add_idle_states(void)
 {
-   struct device_node *power_mgt;
int nr_idle_states = 1; /* Snooze */
-   int dt_idle_states, count;
-   u32 latency_ns[CPUIDLE_STATE_MAX];
-   u32 residency_ns[CPUIDLE_STATE_MAX];
-   u32 flags[CPUIDLE_STATE_MAX];
-   u64 psscr_val[CPUIDLE_STATE_MAX];
-   u64 psscr_mask[CPUIDLE_STATE_MAX];
-   const char *names[CPUIDLE_STATE_MAX];
+   int dt_idle_states;
u32 has_stop_states = 0;
-   int i, rc;
+   int i;
u32 supported_flags = pnv_get_supported_cpuidle_states();
 
 
/* Currently we have snooze statically defined */
-
-   power_mgt = of_find_node_by_path("/ibm,opal/power-mgt");
-   if (!power_mgt) {
-   pr_warn("opal: PowerMgmt Node not found\n");
-   goto out;
-   }
-
-   /* Read values of any property to determine the num of idle states */
-   dt_idle_states = of_property_count_u32_elems(power_mgt, 
"ibm,cpu-idle-state-flags");
-   if (dt_idle_states < 0) {
-   pr_warn("cpuidle-powernv: no idle states found in the DT\n");
+   if (nr_pnv_idle_states <= 0) {
+   pr_warn("cpuidle-powernv : Only Snooze is available\n");
goto out;
}
 
-   count = of_property_count_u32_elems(power_mgt,
-   "ibm,cpu-idle-state-latencies-ns");
-
-   if (validate_dt_prop_sizes("ibm,cpu-idle-state-flags", dt_idle_states,
-  "ibm,cpu-idle-state-latencies-ns",
-  count) != 0)
-   goto out;
-
-   count = of_property_count_strings(power_mgt,
- "ibm,cpu-idle-state-names");
-   if (validate_dt_prop_sizes("ibm,cpu-idle-state-flags", dt_idle_states,
-  "ibm,cpu-idle-state-names",
-  count) != 0)
-   goto out;
+   /* TODO: Count only states which are eligible for cpuidle */
+   dt_idle_states = nr_pnv_idle_states;
 
/*
 * Since snooze is used as first idle state, max idle states allowed is
 * CPUIDLE_STATE_MAX -1
 */
-   if (dt_idle_states > CPUIDLE_STATE_MAX - 1) {
+   if (nr_pnv_idle_states > CPUIDLE_STATE_MAX - 1) {
pr_warn("cpuidle-powernv: discovered idle states more than 
allowed");
dt_idle_states = CPUIDLE_STATE_MAX - 1;
}
 
-   if (of_property_read_u32_array(power_mgt,
-   "ibm,cpu-idle-state-flags", flags, dt_idle_states)) {
-   pr_warn("cpuidle-powernv : missing ibm,cpu-idle-state-flags in 
DT\n");
-   goto out;
-   }
-
-   if (of_property_read_u32_array(power_mgt,
-   "ibm,cpu-idle-state-latencies-ns", latency_ns,
-   dt_idle_states)) {
-   pr_warn("cpuidle-powernv: missing 
ibm,cpu-idle-state-latencies-ns in DT\n");
-   goto out;
-   }
-   if (of_property_read_string_array(power_mgt,
-   "ibm,cpu-idle-state-names", names, dt_idle_states) < 0) {
-

[PATCH v4 1/2] powernv/cpuidle: Parse dt idle properties into global structure

2018-07-05 Thread Akshay Adiga

Device-tree parsing happens twice, once while deciding idle state to be
used for hotplug and once during cpuidle init. Hence, parsing the device
tree and caching it will reduce code duplication. Parsing code has been
moved to pnv_parse_cpuidle_dt() from pnv_probe_idle_states(). In addition
to the properties in the device tree the number of available states is
also required.

Signed-off-by: Akshay Adiga 
Reviewed-by: Nicholas Piggin 
Reviewed-by: Gautham R. Shenoy 
---
 arch/powerpc/include/asm/cpuidle.h|  11 ++
 arch/powerpc/platforms/powernv/idle.c | 216 --
 2 files changed, 149 insertions(+), 78 deletions(-)

diff --git a/arch/powerpc/include/asm/cpuidle.h 
b/arch/powerpc/include/asm/cpuidle.h
index e210a83eb196..574b0ce1d671 100644
--- a/arch/powerpc/include/asm/cpuidle.h
+++ b/arch/powerpc/include/asm/cpuidle.h
@@ -79,6 +79,17 @@ struct stop_sprs {
u64 mmcra;
 };
 
+#define PNV_IDLE_NAME_LEN16
+struct pnv_idle_states_t {
+   char name[PNV_IDLE_NAME_LEN];
+   u32 latency_ns;
+   u32 residency_ns;
+   u64 psscr_val;
+   u64 psscr_mask;
+   u32 flags;
+   bool valid;
+};
+
 extern u32 pnv_fastsleep_workaround_at_entry[];
 extern u32 pnv_fastsleep_workaround_at_exit[];
 
diff --git a/arch/powerpc/platforms/powernv/idle.c 
b/arch/powerpc/platforms/powernv/idle.c
index 1c5d0675b43c..7cf71b3e03a1 100644
--- a/arch/powerpc/platforms/powernv/idle.c
+++ b/arch/powerpc/platforms/powernv/idle.c
@@ -36,6 +36,8 @@
 #define P9_STOP_SPR_PSSCR  855
 
 static u32 supported_cpuidle_states;
+struct pnv_idle_states_t *pnv_idle_states;
+int nr_pnv_idle_states;
 
 /*
  * The default stop state that will be used by ppc_md.power_save
@@ -622,48 +624,10 @@ int validate_psscr_val_mask(u64 *psscr_val, u64 
*psscr_mask, u32 flags)
  * @dt_idle_states: Number of idle state entries
  * Returns 0 on success
  */
-static int __init pnv_power9_idle_init(struct device_node *np, u32 *flags,
-   int dt_idle_states)
+static int __init pnv_power9_idle_init(void)
 {
-   u64 *psscr_val = NULL;
-   u64 *psscr_mask = NULL;
-   u32 *residency_ns = NULL;
u64 max_residency_ns = 0;
-   int rc = 0, i;
-
-   psscr_val = kcalloc(dt_idle_states, sizeof(*psscr_val), GFP_KERNEL);
-   psscr_mask = kcalloc(dt_idle_states, sizeof(*psscr_mask), GFP_KERNEL);
-   residency_ns = kcalloc(dt_idle_states, sizeof(*residency_ns),
-  GFP_KERNEL);
-
-   if (!psscr_val || !psscr_mask || !residency_ns) {
-   rc = -1;
-   goto out;
-   }
-
-   if (of_property_read_u64_array(np,
-   "ibm,cpu-idle-state-psscr",
-   psscr_val, dt_idle_states)) {
-   pr_warn("cpuidle-powernv: missing ibm,cpu-idle-state-psscr in 
DT\n");
-   rc = -1;
-   goto out;
-   }
-
-   if (of_property_read_u64_array(np,
-  "ibm,cpu-idle-state-psscr-mask",
-  psscr_mask, dt_idle_states)) {
-   pr_warn("cpuidle-powernv: missing ibm,cpu-idle-state-psscr-mask 
in DT\n");
-   rc = -1;
-   goto out;
-   }
-
-   if (of_property_read_u32_array(np,
-  "ibm,cpu-idle-state-residency-ns",
-   residency_ns, dt_idle_states)) {
-   pr_warn("cpuidle-powernv: missing 
ibm,cpu-idle-state-residency-ns in DT\n");
-   rc = -1;
-   goto out;
-   }
+   int i;
 
/*
 * Set pnv_first_deep_stop_state, pnv_deepest_stop_psscr_{val,mask},
@@ -679,33 +643,36 @@ static int __init pnv_power9_idle_init(struct device_node 
*np, u32 *flags,
 * the shallowest (OPAL_PM_STOP_INST_FAST) loss-less stop state.
 */
pnv_first_deep_stop_state = MAX_STOP_STATE;
-   for (i = 0; i < dt_idle_states; i++) {
+   for (i = 0; i < nr_pnv_idle_states; i++) {
int err;
-   u64 psscr_rl = psscr_val[i] & PSSCR_RL_MASK;
+   struct pnv_idle_states_t *state = _idle_states[i];
+   u64 psscr_rl = state->psscr_val & PSSCR_RL_MASK;
 
-   if ((flags[i] & OPAL_PM_LOSE_FULL_CONTEXT) &&
-(pnv_first_deep_stop_state > psscr_rl))
+   if ((state->flags & OPAL_PM_LOSE_FULL_CONTEXT) &&
+   pnv_first_deep_stop_state > psscr_rl)
pnv_first_deep_stop_state = psscr_rl;
 
-   err = validate_psscr_val_mask(_val[i], _mask[i],
- flags[i]);
+   err = validate_psscr_val_mask(>psscr_val,
+ >psscr_mask,
+ state->flags);

[PATCH v4 0/2] powernv/cpuidle Device-tree parsing cleanup

2018-07-05 Thread Akshay Adiga

Device-tree parsed multiple time in powernv cpuidle and powernv
hotplug code.

First to identify supported flags. Second time, to identify deepest_state
and first deep state. Third time, during cpuidle init to find the available
idle states. Any change in device-tree format will lead to make changes in
these 3 places. Errors in device-tree can be handled in a better manner.

This series adds code to parse device tree once and save in global structure.

Changes from v3 :
 - Removed a stale comment
Changes from v2 :
 - Fix build error (moved a hunk from patch 1 to patch 2)
Changes from v1 :
 - fold first 2 patches into 1
 - rename pm_ctrl_reg_* as psscr_*
 - added comment stating removal of pmicr parsing code
 - removed parsing code for pmicr
 - add member valid in pnv_idle_states_t to indicate if the psscr-mask/val
are valid combination,
 - Change function description of pnv_parse_cpuidle_dt
 - Added error handling code.


Akshay Adiga (2):
  powernv/cpuidle: Parse dt idle properties into global structure
  powernv/cpuidle: Use parsed device tree values for cpuidle_init

 arch/powerpc/include/asm/cpuidle.h|  13 ++
 arch/powerpc/platforms/powernv/idle.c | 216 --
 drivers/cpuidle/cpuidle-powernv.c | 158 ---
 3 files changed, 177 insertions(+), 210 deletions(-)

-- 
2.18.0.rc2.85.g1fb9df7

[PATCH v3 2/2] powernv/cpuidle: Use parsed device tree values for cpuidle_init

2018-07-03 Thread Akshay Adiga

Export pnv_idle_states and nr_pnv_idle_states so that its accessible to
cpuidle driver. Use properties from pnv_idle_states structure for powernv
cpuidle_init.

Signed-off-by: Akshay Adiga 
Reviewed-by: Nicholas Piggin 
---
 arch/powerpc/include/asm/cpuidle.h |   2 +
 drivers/cpuidle/cpuidle-powernv.c  | 154 +
 2 files changed, 28 insertions(+), 128 deletions(-)

diff --git a/arch/powerpc/include/asm/cpuidle.h 
b/arch/powerpc/include/asm/cpuidle.h
index 574b0ce1d671..43e5f31fe64d 100644
--- a/arch/powerpc/include/asm/cpuidle.h
+++ b/arch/powerpc/include/asm/cpuidle.h
@@ -90,6 +90,8 @@ struct pnv_idle_states_t {
bool valid;
 };
 
+extern struct pnv_idle_states_t *pnv_idle_states;
+extern int nr_pnv_idle_states;
 extern u32 pnv_fastsleep_workaround_at_entry[];
 extern u32 pnv_fastsleep_workaround_at_exit[];
 
diff --git a/drivers/cpuidle/cpuidle-powernv.c 
b/drivers/cpuidle/cpuidle-powernv.c
index d29e4f041efe..208d57c77305 100644
--- a/drivers/cpuidle/cpuidle-powernv.c
+++ b/drivers/cpuidle/cpuidle-powernv.c
@@ -242,6 +242,7 @@ static inline void add_powernv_state(int index, const char 
*name,
powernv_states[index].target_residency = target_residency;
powernv_states[index].exit_latency = exit_latency;
powernv_states[index].enter = idle_fn;
+   /* For power8 and below psscr_* will be 0 */
stop_psscr_table[index].val = psscr_val;
stop_psscr_table[index].mask = psscr_mask;
 }
@@ -263,186 +264,84 @@ static inline int validate_dt_prop_sizes(const char 
*prop1, int prop1_len,
 extern u32 pnv_get_supported_cpuidle_states(void);
 static int powernv_add_idle_states(void)
 {
-   struct device_node *power_mgt;
int nr_idle_states = 1; /* Snooze */
-   int dt_idle_states, count;
-   u32 latency_ns[CPUIDLE_STATE_MAX];
-   u32 residency_ns[CPUIDLE_STATE_MAX];
-   u32 flags[CPUIDLE_STATE_MAX];
-   u64 psscr_val[CPUIDLE_STATE_MAX];
-   u64 psscr_mask[CPUIDLE_STATE_MAX];
-   const char *names[CPUIDLE_STATE_MAX];
+   int dt_idle_states;
u32 has_stop_states = 0;
-   int i, rc;
+   int i;
u32 supported_flags = pnv_get_supported_cpuidle_states();
 
 
/* Currently we have snooze statically defined */
-
-   power_mgt = of_find_node_by_path("/ibm,opal/power-mgt");
-   if (!power_mgt) {
-   pr_warn("opal: PowerMgmt Node not found\n");
+   if (nr_pnv_idle_states <= 0) {
+   pr_warn("cpuidle-powernv : Only Snooze is available\n");
goto out;
}
 
-   /* Read values of any property to determine the num of idle states */
-   dt_idle_states = of_property_count_u32_elems(power_mgt, 
"ibm,cpu-idle-state-flags");
-   if (dt_idle_states < 0) {
-   pr_warn("cpuidle-powernv: no idle states found in the DT\n");
-   goto out;
-   }
-
-   count = of_property_count_u32_elems(power_mgt,
-   "ibm,cpu-idle-state-latencies-ns");
-
-   if (validate_dt_prop_sizes("ibm,cpu-idle-state-flags", dt_idle_states,
-  "ibm,cpu-idle-state-latencies-ns",
-  count) != 0)
-   goto out;
-
-   count = of_property_count_strings(power_mgt,
- "ibm,cpu-idle-state-names");
-   if (validate_dt_prop_sizes("ibm,cpu-idle-state-flags", dt_idle_states,
-  "ibm,cpu-idle-state-names",
-  count) != 0)
-   goto out;
+   /* TODO: Count only states which are eligible for cpuidle */
+   dt_idle_states = nr_pnv_idle_states;
 
/*
 * Since snooze is used as first idle state, max idle states allowed is
 * CPUIDLE_STATE_MAX -1
 */
-   if (dt_idle_states > CPUIDLE_STATE_MAX - 1) {
+   if (nr_pnv_idle_states > CPUIDLE_STATE_MAX - 1) {
pr_warn("cpuidle-powernv: discovered idle states more than 
allowed");
dt_idle_states = CPUIDLE_STATE_MAX - 1;
}
 
-   if (of_property_read_u32_array(power_mgt,
-   "ibm,cpu-idle-state-flags", flags, dt_idle_states)) {
-   pr_warn("cpuidle-powernv : missing ibm,cpu-idle-state-flags in 
DT\n");
-   goto out;
-   }
-
-   if (of_property_read_u32_array(power_mgt,
-   "ibm,cpu-idle-state-latencies-ns", latency_ns,
-   dt_idle_states)) {
-   pr_warn("cpuidle-powernv: missing 
ibm,cpu-idle-state-latencies-ns in DT\n");
-   goto out;
-   }
-   if (of_property_read_string_array(power_mgt,
-   "ibm,cpu-idle-state-names", names, dt_idle_states) < 0) {
-   pr_warn(&

[PATCH v3 1/2] powernv/cpuidle: Parse dt idle properties into global structure

2018-07-03 Thread Akshay Adiga

Device-tree parsing happens twice, once while deciding idle state to be
used for hotplug and once during cpuidle init. Hence, parsing the device
tree and caching it will reduce code duplication. Parsing code has been
moved to pnv_parse_cpuidle_dt() from pnv_probe_idle_states(). In addition
to the properties in the device tree the number of available states is
also required.

Signed-off-by: Akshay Adiga 
Reviewed-by: Nicholas Piggin 
---
 arch/powerpc/include/asm/cpuidle.h|  11 ++
 arch/powerpc/platforms/powernv/idle.c | 216 --
 2 files changed, 149 insertions(+), 78 deletions(-)

diff --git a/arch/powerpc/include/asm/cpuidle.h 
b/arch/powerpc/include/asm/cpuidle.h
index e210a83eb196..574b0ce1d671 100644
--- a/arch/powerpc/include/asm/cpuidle.h
+++ b/arch/powerpc/include/asm/cpuidle.h
@@ -79,6 +79,17 @@ struct stop_sprs {
u64 mmcra;
 };
 
+#define PNV_IDLE_NAME_LEN16
+struct pnv_idle_states_t {
+   char name[PNV_IDLE_NAME_LEN];
+   u32 latency_ns;
+   u32 residency_ns;
+   u64 psscr_val;
+   u64 psscr_mask;
+   u32 flags;
+   bool valid;
+};
+
 extern u32 pnv_fastsleep_workaround_at_entry[];
 extern u32 pnv_fastsleep_workaround_at_exit[];
 
diff --git a/arch/powerpc/platforms/powernv/idle.c 
b/arch/powerpc/platforms/powernv/idle.c
index 1c5d0675b43c..7cf71b3e03a1 100644
--- a/arch/powerpc/platforms/powernv/idle.c
+++ b/arch/powerpc/platforms/powernv/idle.c
@@ -36,6 +36,8 @@
 #define P9_STOP_SPR_PSSCR  855
 
 static u32 supported_cpuidle_states;
+struct pnv_idle_states_t *pnv_idle_states;
+int nr_pnv_idle_states;
 
 /*
  * The default stop state that will be used by ppc_md.power_save
@@ -622,48 +624,10 @@ int validate_psscr_val_mask(u64 *psscr_val, u64 
*psscr_mask, u32 flags)
  * @dt_idle_states: Number of idle state entries
  * Returns 0 on success
  */
-static int __init pnv_power9_idle_init(struct device_node *np, u32 *flags,
-   int dt_idle_states)
+static int __init pnv_power9_idle_init(void)
 {
-   u64 *psscr_val = NULL;
-   u64 *psscr_mask = NULL;
-   u32 *residency_ns = NULL;
u64 max_residency_ns = 0;
-   int rc = 0, i;
-
-   psscr_val = kcalloc(dt_idle_states, sizeof(*psscr_val), GFP_KERNEL);
-   psscr_mask = kcalloc(dt_idle_states, sizeof(*psscr_mask), GFP_KERNEL);
-   residency_ns = kcalloc(dt_idle_states, sizeof(*residency_ns),
-  GFP_KERNEL);
-
-   if (!psscr_val || !psscr_mask || !residency_ns) {
-   rc = -1;
-   goto out;
-   }
-
-   if (of_property_read_u64_array(np,
-   "ibm,cpu-idle-state-psscr",
-   psscr_val, dt_idle_states)) {
-   pr_warn("cpuidle-powernv: missing ibm,cpu-idle-state-psscr in 
DT\n");
-   rc = -1;
-   goto out;
-   }
-
-   if (of_property_read_u64_array(np,
-  "ibm,cpu-idle-state-psscr-mask",
-  psscr_mask, dt_idle_states)) {
-   pr_warn("cpuidle-powernv: missing ibm,cpu-idle-state-psscr-mask 
in DT\n");
-   rc = -1;
-   goto out;
-   }
-
-   if (of_property_read_u32_array(np,
-  "ibm,cpu-idle-state-residency-ns",
-   residency_ns, dt_idle_states)) {
-   pr_warn("cpuidle-powernv: missing 
ibm,cpu-idle-state-residency-ns in DT\n");
-   rc = -1;
-   goto out;
-   }
+   int i;
 
/*
 * Set pnv_first_deep_stop_state, pnv_deepest_stop_psscr_{val,mask},
@@ -679,33 +643,36 @@ static int __init pnv_power9_idle_init(struct device_node 
*np, u32 *flags,
 * the shallowest (OPAL_PM_STOP_INST_FAST) loss-less stop state.
 */
pnv_first_deep_stop_state = MAX_STOP_STATE;
-   for (i = 0; i < dt_idle_states; i++) {
+   for (i = 0; i < nr_pnv_idle_states; i++) {
int err;
-   u64 psscr_rl = psscr_val[i] & PSSCR_RL_MASK;
+   struct pnv_idle_states_t *state = _idle_states[i];
+   u64 psscr_rl = state->psscr_val & PSSCR_RL_MASK;
 
-   if ((flags[i] & OPAL_PM_LOSE_FULL_CONTEXT) &&
-(pnv_first_deep_stop_state > psscr_rl))
+   if ((state->flags & OPAL_PM_LOSE_FULL_CONTEXT) &&
+   pnv_first_deep_stop_state > psscr_rl)
pnv_first_deep_stop_state = psscr_rl;
 
-   err = validate_psscr_val_mask(_val[i], _mask[i],
- flags[i]);
+   err = validate_psscr_val_mask(>psscr_val,
+ >psscr_mask,
+ state->flags);
if (e

[PATCH v3 0/2] powernv/cpuidle Device-tree parsing cleanup

2018-07-03 Thread Akshay Adiga



Device-tree parsed multiple time in powernv cpuidle and powernv
hotplug code. 

First to identify supported flags. Second time, to identify deepest_state
and first deep state. Third time, during cpuidle init to find the available
idle states. Any change in device-tree format will lead to make changes in
these 3 places. Errors in device-tree can be handled in a better manner.

This series adds code to parse device tree once and save in global structure.

Changes from v2 :
 - Fix build error (moved a hunk from patch 1 to patch 2)
Changes from v1 :
 - fold first 2 patches into 1
 - rename pm_ctrl_reg_* as psscr_*
 - added comment stating removal of pmicr parsing code
 - removed parsing code for pmicr
 - add member valid in pnv_idle_states_t to indicate if the psscr-mask/val
are valid combination,
 - Change function description of pnv_parse_cpuidle_dt
 - Added error handling code.


Akshay Adiga (2):
  powernv/cpuidle: Parse dt idle properties into global structure
  powernv/cpuidle: Use parsed device tree values for cpuidle_init

 arch/powerpc/include/asm/cpuidle.h|  13 ++
 arch/powerpc/platforms/powernv/idle.c | 216 --
 drivers/cpuidle/cpuidle-powernv.c | 154 --
 3 files changed, 177 insertions(+), 206 deletions(-)

-- 
2.18.0.rc2.85.g1fb9df7

[PATCH v2 2/2] powernv/cpuidle: Use parsed device tree values for cpuidle_init

2018-07-02 Thread Akshay Adiga

Export pnv_idle_states and nr_pnv_idle_states so that its accessible to
cpuidle driver. Use properties from pnv_idle_states structure for powernv
cpuidle_init.

Signed-off-by: Akshay Adiga 
---
 arch/powerpc/include/asm/cpuidle.h |   2 +
 drivers/cpuidle/cpuidle-powernv.c  | 143 +
 2 files changed, 26 insertions(+), 119 deletions(-)

diff --git a/arch/powerpc/include/asm/cpuidle.h 
b/arch/powerpc/include/asm/cpuidle.h
index 574b0ce1d671..43e5f31fe64d 100644
--- a/arch/powerpc/include/asm/cpuidle.h
+++ b/arch/powerpc/include/asm/cpuidle.h
@@ -90,6 +90,8 @@ struct pnv_idle_states_t {
bool valid;
 };
 
+extern struct pnv_idle_states_t *pnv_idle_states;
+extern int nr_pnv_idle_states;
 extern u32 pnv_fastsleep_workaround_at_entry[];
 extern u32 pnv_fastsleep_workaround_at_exit[];
 
diff --git a/drivers/cpuidle/cpuidle-powernv.c 
b/drivers/cpuidle/cpuidle-powernv.c
index 7ab613d4dca1..ec93b2ae7b17 100644
--- a/drivers/cpuidle/cpuidle-powernv.c
+++ b/drivers/cpuidle/cpuidle-powernv.c
@@ -242,6 +242,7 @@ static inline void add_powernv_state(int index, const char 
*name,
powernv_states[index].target_residency = target_residency;
powernv_states[index].exit_latency = exit_latency;
powernv_states[index].enter = idle_fn;
+   /* For power8 and below psscr_* will be 0 */
stop_psscr_table[index].val = psscr_val;
stop_psscr_table[index].mask = psscr_mask;
 }
@@ -263,179 +264,84 @@ static inline int validate_dt_prop_sizes(const char 
*prop1, int prop1_len,
 extern u32 pnv_get_supported_cpuidle_states(void);
 static int powernv_add_idle_states(void)
 {
-   struct device_node *power_mgt;
int nr_idle_states = 1; /* Snooze */
-   int dt_idle_states, count;
-   u32 latency_ns[CPUIDLE_STATE_MAX];
-   u32 residency_ns[CPUIDLE_STATE_MAX];
-   u32 flags[CPUIDLE_STATE_MAX];
-   u64 psscr_val[CPUIDLE_STATE_MAX];
-   u64 psscr_mask[CPUIDLE_STATE_MAX];
-   const char *names[CPUIDLE_STATE_MAX];
+   int dt_idle_states;
u32 has_stop_states = 0;
-   int i, rc;
+   int i;
u32 supported_flags = pnv_get_supported_cpuidle_states();
 
 
/* Currently we have snooze statically defined */
-
-   power_mgt = of_find_node_by_path("/ibm,opal/power-mgt");
-   if (!power_mgt) {
-   pr_warn("opal: PowerMgmt Node not found\n");
+   if (nr_pnv_idle_states <= 0) {
+   pr_warn("cpuidle-powernv : Only Snooze is available\n");
goto out;
}
 
-   /* Read values of any property to determine the num of idle states */
-   dt_idle_states = of_property_count_u32_elems(power_mgt, 
"ibm,cpu-idle-state-flags");
-   if (dt_idle_states < 0) {
-   pr_warn("cpuidle-powernv: no idle states found in the DT\n");
-   goto out;
-   }
-
-   count = of_property_count_u32_elems(power_mgt,
-   "ibm,cpu-idle-state-latencies-ns");
-
-   if (validate_dt_prop_sizes("ibm,cpu-idle-state-flags", dt_idle_states,
-  "ibm,cpu-idle-state-latencies-ns",
-  count) != 0)
-   goto out;
-
-   count = of_property_count_strings(power_mgt,
- "ibm,cpu-idle-state-names");
-   if (validate_dt_prop_sizes("ibm,cpu-idle-state-flags", dt_idle_states,
-  "ibm,cpu-idle-state-names",
-  count) != 0)
-   goto out;
+   /* TODO: Count only states which are eligible for cpuidle */
+   dt_idle_states = nr_pnv_idle_states;
 
/*
 * Since snooze is used as first idle state, max idle states allowed is
 * CPUIDLE_STATE_MAX -1
 */
-   if (dt_idle_states > CPUIDLE_STATE_MAX - 1) {
+   if (nr_pnv_idle_states > CPUIDLE_STATE_MAX - 1) {
pr_warn("cpuidle-powernv: discovered idle states more than 
allowed");
dt_idle_states = CPUIDLE_STATE_MAX - 1;
}
 
-   if (of_property_read_u32_array(power_mgt,
-   "ibm,cpu-idle-state-flags", flags, dt_idle_states)) {
-   pr_warn("cpuidle-powernv : missing ibm,cpu-idle-state-flags in 
DT\n");
-   goto out;
-   }
-
-   if (of_property_read_u32_array(power_mgt,
-   "ibm,cpu-idle-state-latencies-ns", latency_ns,
-   dt_idle_states)) {
-   pr_warn("cpuidle-powernv: missing 
ibm,cpu-idle-state-latencies-ns in DT\n");
-   goto out;
-   }
-   if (of_property_read_string_array(power_mgt,
-   "ibm,cpu-idle-state-names", names, dt_idle_states) < 0) {
-   pr_warn("cpuidle-powernv: missing ib

[PATCH v2 1/2] powernv/cpuidle: Parse dt idle properties into global structure

2018-07-02 Thread Akshay Adiga

Device-tree parsing happens twice, once while deciding idle state to be
used for hotplug and once during cpuidle init. Hence, parsing the device
tree and caching it will reduce code duplication. Parsing code has been
moved to pnv_parse_cpuidle_dt() from pnv_probe_idle_states(). In addition
to the properties in the device tree the number of available states is
also required.

Signed-off-by: Akshay Adiga 
---
 arch/powerpc/include/asm/cpuidle.h|  11 ++
 arch/powerpc/platforms/powernv/idle.c | 216 --
 drivers/cpuidle/cpuidle-powernv.c |  11 +-
 3 files changed, 151 insertions(+), 87 deletions(-)

diff --git a/arch/powerpc/include/asm/cpuidle.h 
b/arch/powerpc/include/asm/cpuidle.h
index e210a83eb196..574b0ce1d671 100644
--- a/arch/powerpc/include/asm/cpuidle.h
+++ b/arch/powerpc/include/asm/cpuidle.h
@@ -79,6 +79,17 @@ struct stop_sprs {
u64 mmcra;
 };
 
+#define PNV_IDLE_NAME_LEN16
+struct pnv_idle_states_t {
+   char name[PNV_IDLE_NAME_LEN];
+   u32 latency_ns;
+   u32 residency_ns;
+   u64 psscr_val;
+   u64 psscr_mask;
+   u32 flags;
+   bool valid;
+};
+
 extern u32 pnv_fastsleep_workaround_at_entry[];
 extern u32 pnv_fastsleep_workaround_at_exit[];
 
diff --git a/arch/powerpc/platforms/powernv/idle.c 
b/arch/powerpc/platforms/powernv/idle.c
index 1c5d0675b43c..7cf71b3e03a1 100644
--- a/arch/powerpc/platforms/powernv/idle.c
+++ b/arch/powerpc/platforms/powernv/idle.c
@@ -36,6 +36,8 @@
 #define P9_STOP_SPR_PSSCR  855
 
 static u32 supported_cpuidle_states;
+struct pnv_idle_states_t *pnv_idle_states;
+int nr_pnv_idle_states;
 
 /*
  * The default stop state that will be used by ppc_md.power_save
@@ -622,48 +624,10 @@ int validate_psscr_val_mask(u64 *psscr_val, u64 
*psscr_mask, u32 flags)
  * @dt_idle_states: Number of idle state entries
  * Returns 0 on success
  */
-static int __init pnv_power9_idle_init(struct device_node *np, u32 *flags,
-   int dt_idle_states)
+static int __init pnv_power9_idle_init(void)
 {
-   u64 *psscr_val = NULL;
-   u64 *psscr_mask = NULL;
-   u32 *residency_ns = NULL;
u64 max_residency_ns = 0;
-   int rc = 0, i;
-
-   psscr_val = kcalloc(dt_idle_states, sizeof(*psscr_val), GFP_KERNEL);
-   psscr_mask = kcalloc(dt_idle_states, sizeof(*psscr_mask), GFP_KERNEL);
-   residency_ns = kcalloc(dt_idle_states, sizeof(*residency_ns),
-  GFP_KERNEL);
-
-   if (!psscr_val || !psscr_mask || !residency_ns) {
-   rc = -1;
-   goto out;
-   }
-
-   if (of_property_read_u64_array(np,
-   "ibm,cpu-idle-state-psscr",
-   psscr_val, dt_idle_states)) {
-   pr_warn("cpuidle-powernv: missing ibm,cpu-idle-state-psscr in 
DT\n");
-   rc = -1;
-   goto out;
-   }
-
-   if (of_property_read_u64_array(np,
-  "ibm,cpu-idle-state-psscr-mask",
-  psscr_mask, dt_idle_states)) {
-   pr_warn("cpuidle-powernv: missing ibm,cpu-idle-state-psscr-mask 
in DT\n");
-   rc = -1;
-   goto out;
-   }
-
-   if (of_property_read_u32_array(np,
-  "ibm,cpu-idle-state-residency-ns",
-   residency_ns, dt_idle_states)) {
-   pr_warn("cpuidle-powernv: missing 
ibm,cpu-idle-state-residency-ns in DT\n");
-   rc = -1;
-   goto out;
-   }
+   int i;
 
/*
 * Set pnv_first_deep_stop_state, pnv_deepest_stop_psscr_{val,mask},
@@ -679,33 +643,36 @@ static int __init pnv_power9_idle_init(struct device_node 
*np, u32 *flags,
 * the shallowest (OPAL_PM_STOP_INST_FAST) loss-less stop state.
 */
pnv_first_deep_stop_state = MAX_STOP_STATE;
-   for (i = 0; i < dt_idle_states; i++) {
+   for (i = 0; i < nr_pnv_idle_states; i++) {
int err;
-   u64 psscr_rl = psscr_val[i] & PSSCR_RL_MASK;
+   struct pnv_idle_states_t *state = _idle_states[i];
+   u64 psscr_rl = state->psscr_val & PSSCR_RL_MASK;
 
-   if ((flags[i] & OPAL_PM_LOSE_FULL_CONTEXT) &&
-(pnv_first_deep_stop_state > psscr_rl))
+   if ((state->flags & OPAL_PM_LOSE_FULL_CONTEXT) &&
+   pnv_first_deep_stop_state > psscr_rl)
pnv_first_deep_stop_state = psscr_rl;
 
-   err = validate_psscr_val_mask(_val[i], _mask[i],
- flags[i]);
+   err = validate_psscr_val_mask(>psscr_val,
+ >psscr_mask,
+ state->flags);
if (e

[PATCH v2 0/2] powernv/cpuidle Device-tree parsing cleanup

2018-07-02 Thread Akshay Adiga

Device-tree parsed multiple time in powernv cpuidle and powernv
hotplug code. 

First to identify supported flags. Second time, to identify deepest_state
and first deep state. Third time, during cpuidle init to find the available
idle states. Any change in device-tree format will lead to make changes in
these 3 places. Errors in device-tree can be handled in a better manner.

This series adds code to parse device tree once and save in global structure.

Changes from v1 :  
 - folded first 2 patches into 1 
 - rename pm_ctrl_reg_* as psscr_* 
 - added comment stating removal of pmicr parsing code 
 - removed parsing code for pmicr  
 - add member valid in pnv_idle_states_t to indicate if the psscr-mask/val 
are valid combination, 
 - Change function description of pnv_parse_cpuidle_dt 
 - Added error handling code.  

Akshay Adiga (2):
  powernv/cpuidle: Parse dt idle properties into global structure
  powernv/cpuidle: Use parsed device tree values for cpuidle_init

 arch/powerpc/include/asm/cpuidle.h|  13 ++
 arch/powerpc/platforms/powernv/idle.c | 216 --
 drivers/cpuidle/cpuidle-powernv.c | 156 ---
 3 files changed, 178 insertions(+), 207 deletions(-)

-- 
2.18.0.rc2.85.g1fb9df7

[PATCH 3/3] powernv/cpuidle: Use parsed device tree values for cpuidle_init

2018-06-18 Thread Akshay Adiga

Export pnv_idle_states and nr_pnv_idle_states so that its accessible to
cpuidle driver. Use properties from pnv_idle_states structure for powernv
cpuidle_init.

Signed-off-by: Akshay Adiga 
---
 arch/powerpc/include/asm/cpuidle.h |  2 ++
 drivers/cpuidle/cpuidle-powernv.c  | 49 +++---
 2 files changed, 32 insertions(+), 19 deletions(-)

diff --git a/arch/powerpc/include/asm/cpuidle.h 
b/arch/powerpc/include/asm/cpuidle.h
index 55ee7e3..1446747 100644
--- a/arch/powerpc/include/asm/cpuidle.h
+++ b/arch/powerpc/include/asm/cpuidle.h
@@ -93,6 +93,8 @@ struct pnv_idle_states_t {
u32 flags;
 };
 
+extern struct pnv_idle_states_t *pnv_idle_states;
+extern int nr_pnv_idle_states;
 extern u32 pnv_fastsleep_workaround_at_entry[];
 extern u32 pnv_fastsleep_workaround_at_exit[];
 
diff --git a/drivers/cpuidle/cpuidle-powernv.c 
b/drivers/cpuidle/cpuidle-powernv.c
index d29e4f0..de8ba26 100644
--- a/drivers/cpuidle/cpuidle-powernv.c
+++ b/drivers/cpuidle/cpuidle-powernv.c
@@ -285,6 +285,11 @@ static int powernv_add_idle_states(void)
goto out;
}
 
+   if (nr_pnv_idle_states <= 0) {
+   pr_warn("opal: No idle states found\n");
+   goto out;
+   }
+
/* Read values of any property to determine the num of idle states */
dt_idle_states = of_property_count_u32_elems(power_mgt, 
"ibm,cpu-idle-state-flags");
if (dt_idle_states < 0) {
@@ -338,7 +343,7 @@ static int powernv_add_idle_states(void)
 * If the idle states use stop instruction, probe for psscr values
 * and psscr mask which are necessary to specify required stop level.
 */
-   has_stop_states = (flags[0] &
+   has_stop_states = (pnv_idle_states[0].flags &
   (OPAL_PM_STOP_INST_FAST | OPAL_PM_STOP_INST_DEEP));
if (has_stop_states) {
count = of_property_count_u64_elems(power_mgt,
@@ -387,51 +392,55 @@ static int powernv_add_idle_states(void)
residency_ns, dt_idle_states);
}
 
-   for (i = 0; i < dt_idle_states; i++) {
+   for (i = 0; i < nr_pnv_idle_states; i++) {
unsigned int exit_latency, target_residency;
bool stops_timebase = false;
+   struct pnv_idle_states_t *state = _idle_states[i];
 
/*
 * Skip the platform idle state whose flag isn't in
-* the supported_cpuidle_states flag mask.
+* the supported_pnv_idle_states flag mask.
 */
-   if ((flags[i] & supported_flags) != flags[i])
+   if ((state->flags & supported_flags) !=
+   state->flags)
continue;
/*
 * If an idle state has exit latency beyond
 * POWERNV_THRESHOLD_LATENCY_NS then don't use it
 * in cpu-idle.
 */
-   if (latency_ns[i] > POWERNV_THRESHOLD_LATENCY_NS)
+   if (state->latency_ns > POWERNV_THRESHOLD_LATENCY_NS)
continue;
/*
 * Firmware passes residency and latency values in ns.
 * cpuidle expects it in us.
 */
-   exit_latency = DIV_ROUND_UP(latency_ns[i], 1000);
+   exit_latency = DIV_ROUND_UP(state->latency_ns, 1000);
if (!rc)
-   target_residency = DIV_ROUND_UP(residency_ns[i], 1000);
+   target_residency = DIV_ROUND_UP(state->residency_ns, 
1000);
else
target_residency = 0;
 
if (has_stop_states) {
-   int err = validate_psscr_val_mask(_val[i],
- _mask[i],
- flags[i]);
+   int err;
+   err = validate_psscr_val_mask(>pm_ctrl_reg_val,
+ >pm_ctrl_reg_mask,
+ state->flags);
if (err) {
-   report_invalid_psscr_val(psscr_val[i], err);
+   report_invalid_psscr_val(state->pm_ctrl_reg_val,
+err);
continue;
}
}
 
-   if (flags[i] & OPAL_PM_TIMEBASE_STOP)
+   if (state->flags & OPAL_PM_TIMEBASE_STOP)
stops_timebase = true;
 
/*
 * For nap and fastsleep, use default target_residency
 * values if f/w does not expose it.
 */
-

[PATCH 2/3] cpuidle/powernv: Change platform init to avoid reparsing dt

2018-06-18 Thread Akshay Adiga

The required data is accessible from cpuidle_states structure and
nr_cpu_idle_states. This patch makes changes to avoid reparsing and use
data from these structures.

Signed-off-by: Akshay Adiga 
---
 arch/powerpc/platforms/powernv/idle.c | 37 ---
 1 file changed, 8 insertions(+), 29 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/idle.c 
b/arch/powerpc/platforms/powernv/idle.c
index 07be984..0a62611 100644
--- a/arch/powerpc/platforms/powernv/idle.c
+++ b/arch/powerpc/platforms/powernv/idle.c
@@ -624,8 +624,7 @@ int validate_psscr_val_mask(u64 *psscr_val, u64 
*psscr_mask, u32 flags)
  * @dt_idle_states: Number of idle state entries
  * Returns 0 on success
  */
-static int __init pnv_power9_idle_init(struct device_node *np, u32 *flags,
-   int dt_idle_states)
+static int __init pnv_power9_idle_init(void)
 {
u64 max_residency_ns = 0;
int i;
@@ -644,7 +643,7 @@ static int __init pnv_power9_idle_init(struct device_node 
*np, u32 *flags,
 * the shallowest (OPAL_PM_STOP_INST_FAST) loss-less stop state.
 */
pnv_first_deep_stop_state = MAX_STOP_STATE;
-   for (i = 0; i < dt_idle_states; i++) {
+   for (i = 0; i < nr_pnv_idle_states; i++) {
int err;
struct pnv_idle_states_t *state = _idle_states[i];
u64 psscr_rl = state->pm_ctrl_reg_val & PSSCR_RL_MASK;
@@ -704,41 +703,21 @@ static int __init pnv_power9_idle_init(struct device_node 
*np, u32 *flags,
  */
 static void __init pnv_probe_idle_states(void)
 {
-   struct device_node *np;
-   int dt_idle_states;
-   u32 *flags = NULL;
int i;
 
-   np = of_find_node_by_path("/ibm,opal/power-mgt");
-   if (!np) {
-   pr_warn("opal: PowerMgmt Node not found\n");
-   goto out;
-   }
-   dt_idle_states = of_property_count_u32_elems(np,
-   "ibm,cpu-idle-state-flags");
-   if (dt_idle_states < 0) {
+   if (nr_pnv_idle_states < 0) {
pr_warn("cpuidle-powernv: no idle states found in the DT\n");
-   goto out;
-   }
-
-   flags = kcalloc(dt_idle_states, sizeof(*flags),  GFP_KERNEL);
-
-   if (of_property_read_u32_array(np,
-   "ibm,cpu-idle-state-flags", flags, dt_idle_states)) {
-   pr_warn("cpuidle-powernv: missing ibm,cpu-idle-state-flags in 
DT\n");
-   goto out;
+   return;
}
 
if (cpu_has_feature(CPU_FTR_ARCH_300)) {
-   if (pnv_power9_idle_init(np, flags, dt_idle_states))
-   goto out;
+   if (pnv_power9_idle_init())
+   return;
}
 
-   for (i = 0; i < dt_idle_states; i++)
-   supported_cpuidle_states |= flags[i];
+   for (i = 0; i < nr_pnv_idle_states; i++)
+   supported_cpuidle_states |= pnv_idle_states[i].flags;
 
-out:
-   kfree(flags);
 }
 
 /*
-- 
2.5.5

[PATCH 1/3] powernv/cpuidle: Parse dt idle properties into global structure

2018-06-18 Thread Akshay Adiga

Device-tree parsing happens in twice, once while deciding idle state to
be used for hotplug and once during cpuidle init. Hence, parsing the
device tree and caching it will reduce code duplication. Parsing code
has been moved to pnv_parse_cpuidle_dt() from pnv_probe_idle_states().

Setting up things so that number of available idle states can be
accessible to cpuidle-powernv driver. Hence adding nr_pnv_idle_states to
track number of idle states.

Signed-off-by: Akshay Adiga 
---
 arch/powerpc/include/asm/cpuidle.h|  14 +++
 arch/powerpc/platforms/powernv/idle.c | 197 --
 2 files changed, 152 insertions(+), 59 deletions(-)

diff --git a/arch/powerpc/include/asm/cpuidle.h 
b/arch/powerpc/include/asm/cpuidle.h
index e210a83..55ee7e3 100644
--- a/arch/powerpc/include/asm/cpuidle.h
+++ b/arch/powerpc/include/asm/cpuidle.h
@@ -79,6 +79,20 @@ struct stop_sprs {
u64 mmcra;
 };
 
+#define PNV_IDLE_NAME_LEN16
+struct pnv_idle_states_t {
+   char name[PNV_IDLE_NAME_LEN];
+   u32 latency_ns;
+   u32 residency_ns;
+   /*
+* Register value/mask used to select different idle states.
+* PMICR in POWER8 and PSSCR in POWER9
+*/
+   u64 pm_ctrl_reg_val;
+   u64 pm_ctrl_reg_mask;
+   u32 flags;
+};
+
 extern u32 pnv_fastsleep_workaround_at_entry[];
 extern u32 pnv_fastsleep_workaround_at_exit[];
 
diff --git a/arch/powerpc/platforms/powernv/idle.c 
b/arch/powerpc/platforms/powernv/idle.c
index 1c5d067..07be984 100644
--- a/arch/powerpc/platforms/powernv/idle.c
+++ b/arch/powerpc/platforms/powernv/idle.c
@@ -36,6 +36,8 @@
 #define P9_STOP_SPR_PSSCR  855
 
 static u32 supported_cpuidle_states;
+struct pnv_idle_states_t *pnv_idle_states;
+int nr_pnv_idle_states;
 
 /*
  * The default stop state that will be used by ppc_md.power_save
@@ -625,45 +627,8 @@ int validate_psscr_val_mask(u64 *psscr_val, u64 
*psscr_mask, u32 flags)
 static int __init pnv_power9_idle_init(struct device_node *np, u32 *flags,
int dt_idle_states)
 {
-   u64 *psscr_val = NULL;
-   u64 *psscr_mask = NULL;
-   u32 *residency_ns = NULL;
u64 max_residency_ns = 0;
-   int rc = 0, i;
-
-   psscr_val = kcalloc(dt_idle_states, sizeof(*psscr_val), GFP_KERNEL);
-   psscr_mask = kcalloc(dt_idle_states, sizeof(*psscr_mask), GFP_KERNEL);
-   residency_ns = kcalloc(dt_idle_states, sizeof(*residency_ns),
-  GFP_KERNEL);
-
-   if (!psscr_val || !psscr_mask || !residency_ns) {
-   rc = -1;
-   goto out;
-   }
-
-   if (of_property_read_u64_array(np,
-   "ibm,cpu-idle-state-psscr",
-   psscr_val, dt_idle_states)) {
-   pr_warn("cpuidle-powernv: missing ibm,cpu-idle-state-psscr in 
DT\n");
-   rc = -1;
-   goto out;
-   }
-
-   if (of_property_read_u64_array(np,
-  "ibm,cpu-idle-state-psscr-mask",
-  psscr_mask, dt_idle_states)) {
-   pr_warn("cpuidle-powernv: missing ibm,cpu-idle-state-psscr-mask 
in DT\n");
-   rc = -1;
-   goto out;
-   }
-
-   if (of_property_read_u32_array(np,
-  "ibm,cpu-idle-state-residency-ns",
-   residency_ns, dt_idle_states)) {
-   pr_warn("cpuidle-powernv: missing 
ibm,cpu-idle-state-residency-ns in DT\n");
-   rc = -1;
-   goto out;
-   }
+   int i;
 
/*
 * Set pnv_first_deep_stop_state, pnv_deepest_stop_psscr_{val,mask},
@@ -681,31 +646,33 @@ static int __init pnv_power9_idle_init(struct device_node 
*np, u32 *flags,
pnv_first_deep_stop_state = MAX_STOP_STATE;
for (i = 0; i < dt_idle_states; i++) {
int err;
-   u64 psscr_rl = psscr_val[i] & PSSCR_RL_MASK;
+   struct pnv_idle_states_t *state = _idle_states[i];
+   u64 psscr_rl = state->pm_ctrl_reg_val & PSSCR_RL_MASK;
 
-   if ((flags[i] & OPAL_PM_LOSE_FULL_CONTEXT) &&
-(pnv_first_deep_stop_state > psscr_rl))
+   if ((state->flags & OPAL_PM_LOSE_FULL_CONTEXT) &&
+   (pnv_first_deep_stop_state > psscr_rl))
pnv_first_deep_stop_state = psscr_rl;
 
-   err = validate_psscr_val_mask(_val[i], _mask[i],
- flags[i]);
+   err = validate_psscr_val_mask(>pm_ctrl_reg_val,
+ >pm_ctrl_reg_mask,
+ state->flags);
if (err) {
-   report_invalid_psscr_val(psscr_val[i], e

[PATCH 0/3] powernv/cpuidle Device-tree parsing cleanup

2018-06-18 Thread Akshay Adiga

Device-tree parsed multiple time in powernv cpuidle and powernv
hotplug code. 

First to identify supported flags, secondly, to identify deepest_state
and first deep state, Thirdly , during cpudidle init to find the available
idle states. Any change in device-tree format will lead to make changes in
these 3 places.

This series adds code to parse device tree once and save in global structure.

Akshay Adiga (3):
  powernv/cpuidle: Parse dt idle properties into global structure
  cpuidle/powernv: Change platform init to avoid reparsing dt
  powernv/cpuidle: Use parsed device tree values for cpuidle_init

 arch/powerpc/include/asm/cpuidle.h|  16 +++
 arch/powerpc/platforms/powernv/idle.c | 214 +-
 drivers/cpuidle/cpuidle-powernv.c |  49 +---
 3 files changed, 182 insertions(+), 97 deletions(-)

-- 
2.5.5

Re: [PATCH v2] cpuidle/powernv : Add Description for cpuidle state

2018-06-05 Thread Akshay Adiga

On Tue, Jun 05, 2018 at 02:24:39PM +0530, Abhishek wrote:
> 
> 
> On 06/04/2018 05:15 PM, Akshay Adiga wrote:
> > On Mon, Jun 04, 2018 at 07:04:14PM +1000, Benjamin Herrenschmidt wrote:
> > > Is this a new property ? I'm not fan of adding yet another of those
> > > silly arrays.
> > > 
> > > I would say this is the right time now to switch over to a node per
> > > state instead, as we discussed with Vaidy.
> 
> It is not a new property. Name was being used for description as description
> was not present in device tree. A skiboot patch adding description to device
> tree have been posted. This patch reads those description instead of copying
> name itself into description. And we fall back to reading name into
> description to not break the comaptibility with older firmware.

>From a cpuidle point of view this is a exisiting property, but for
powernv there was no device-tree property describing this. 

Abhishek has added the following skiboot patch for adding description
for each idle state in device-tree .
https://patchwork.ozlabs.org/patch/924879/

I agree this can go into new device tree format which each idle state
as a node. Probably i will roll this patch into mine in the next
version.


> 
> Thanks
> Abhishek
> 
> > I posted  the node based device tree here :
> > skiboot patch :  https://patchwork.ozlabs.org/patch/923120/
> > kernel patch : https://lkml.org/lkml/2018/5/30/1146
> > 
> > Do you have any inputs for this design ?
> > 
> > > Additionally, while doing that, we can provide the versioning mechanism
> > > I proposed so we can deal with state specific issues and erratas.
> > > 
> > > Cheers,
> > > Ben.
> > > 
>

Re: [PATCH v2] cpuidle/powernv : Add Description for cpuidle state

2018-06-04 Thread Akshay Adiga

On Mon, Jun 04, 2018 at 07:04:14PM +1000, Benjamin Herrenschmidt wrote:
> Is this a new property ? I'm not fan of adding yet another of those
> silly arrays.
> 
> I would say this is the right time now to switch over to a node per
> state instead, as we discussed with Vaidy.

I posted  the node based device tree here :
skiboot patch :  https://patchwork.ozlabs.org/patch/923120/
kernel patch : https://lkml.org/lkml/2018/5/30/1146

Do you have any inputs for this design ?

> 
> Additionally, while doing that, we can provide the versioning mechanism
> I proposed so we can deal with state specific issues and erratas.
> 
> Cheers,
> Ben.
>

Re: [PATCH] [SCHEME 2]powernv/cpuidle: Add support for new idle state device-tree format

2018-05-30 Thread Akshay Adiga

On Thu, May 31, 2018 at 08:23:20AM +0530, Akshay Adiga wrote:
> This patch adds support for new device-tree format for idle state
> description.
> 
> Previously if a older kernel runs on a newer firmware, it may enable
> all available states irrespective of its capability of handling it.
> New device tree format adds a compatible flag, so that only kernel
> which has the capability to handle the version of stop state will enable
> it.
> 
> Older kernel will still see stop0 and stop0_lite in older format and we
> will depricate it after some time.
> 
> Idea is to bump up the version in firmware if we find a bug or
> regression in stop states. A fix will be provided in linux which would
> now know about the bumped up version of stop states, where as kernel
> without fixes would ignore the states.
> 
> New idle state device tree format :
> 
> power-mgt {
> ibm,cpu-idle-state-names = "stop0_lite", "stop0";
> ibm,cpu-idle-state-residency-ns = <0x2710 0x4e20>;
> ibm,cpu-idle-state-latencies-ns = <0x3e8 0x7d0>;
> ibm,cpu-idle-state-psscr-mask = <0x0 0x3003ff 0x0 0x3003ff>;
> ibm,cpu-idle-state-psscr = <0x0 0x330 0x0 0x300330>;
> ibm,cpu-idle-state-flags = <0x10 0x101000>;
> ibm,enabled-stop-levels = <0xec00>;
> ibm,idle-states {
>   ibm,cpu-idle-state-names = "stop1", "stop2", "stop4", "stop5";
>   ibm,cpu-idle-state-residency-ns = <0xc350 0x186a0 0x989680
> 0x1312d00>;
>   ibm,cpu-idle-state-latencies-ns = <0x1388 0x2710 0x186a0
> 0x30d40>;
>   ibm,cpu-idle-state-psscr-mask = <0x0 0x3003ff 0x0 0x3003ff 0x0
> 0x3003ff 0x0 0x3003ff>;
>   ibm,cpu-idle-state-psscr = <0x0 0x300331 0x0 0x300332 0x0
> 0x300374 0x0 0x300375>;
>   ibm,cpu-idle-state-versions = "ibm,idle-state-v1
> ", "ibm,idle-state-v1", "ibm,idle-state-v1", "ibm,idle-state-v1";
>   ibm,cpu-idle-state-flags = <0x101000 0x101000 0x207000
> 0x207000>;
>}
>  }
> 
> Signed-off-by: Akshay Adiga 
> ---

This patch is intended to be a RFC.
The skiboot patch has been posted here :
https://patchwork.ozlabs.org/patch/923121/

Re: [PATCH] [SCHEME 1] Add support for new idle device tree format

2018-05-30 Thread Akshay Adiga

On Thu, May 31, 2018 at 08:23:04AM +0530, Akshay Adiga wrote:
> This patch adds support for new device-tree format for idle state
> description.
> 
> Previously if a older kernel runs on a newer firmware, it may enable
> all available states irrespective of its capability of handling it.
> New device tree format adds a compatible flag, so that only kernel
> which has the capability to handle the version of stop state will enable
> it.
> 
> Older kernel will still see stop0 and stop0_lite in older format and we
> will depricate it after some time.
> 
> Idea is to bump up the version in firmware if we find a bug or
> regression in stop states. A fix will be provided in linux which would
> now know about the bumped up version of stop states, where as kernel
> without fixes would ignore the states.
> 
> New idle state device tree format :
>power-mgt {
> ...
> ibm,enabled-stop-levels = <0xec00>;
> ibm,cpu-idle-state-psscr-mask = <0x0 0x3003ff 0x0 0x3003ff>;
> ibm,cpu-idle-state-latencies-ns = <0x3e8 0x7d0>;
> ibm,cpu-idle-state-psscr = <0x0 0x330 0x0 0x300330>;
> ibm,cpu-idle-state-flags = <0x10 0x101000>;
> ibm,cpu-idle-state-residency-ns = <0x2710 0x4e20>;
> ibm,idle-states {
> phandle = <0xff>;
> stop4 {
> flags = <0x207000>;
> compatible = "ibm,cpuidle-state-v1";
> psscr-mask = <0x0 0x3003ff>;
> handle = <0x102>;
> latency-ns = <0x186a0>;
> residency-ns = <0x989680>;
> psscr = <0x0 0x300374>;
>  };
> 
> ...
>         stop11 {
> ...
> compatible = "ibm,cpuoffline-state-v1";
> ...
> 
>};
>};
> 
> Signed-off-by: Akshay Adiga 

This is intended to be a RFC patch. 
The skiboot patch for the device tree is posted here :
https://patchwork.ozlabs.org/patch/923120/

[PATCH] [SCHEME 2]powernv/cpuidle: Add support for new idle state device-tree format

2018-05-30 Thread Akshay Adiga

This patch adds support for new device-tree format for idle state
description.

Previously if a older kernel runs on a newer firmware, it may enable
all available states irrespective of its capability of handling it.
New device tree format adds a compatible flag, so that only kernel
which has the capability to handle the version of stop state will enable
it.

Older kernel will still see stop0 and stop0_lite in older format and we
will depricate it after some time.

Idea is to bump up the version in firmware if we find a bug or
regression in stop states. A fix will be provided in linux which would
now know about the bumped up version of stop states, where as kernel
without fixes would ignore the states.

New idle state device tree format :

power-mgt {
ibm,cpu-idle-state-names = "stop0_lite", "stop0";
ibm,cpu-idle-state-residency-ns = <0x2710 0x4e20>;
ibm,cpu-idle-state-latencies-ns = <0x3e8 0x7d0>;
ibm,cpu-idle-state-psscr-mask = <0x0 0x3003ff 0x0 0x3003ff>;
ibm,cpu-idle-state-psscr = <0x0 0x330 0x0 0x300330>;
ibm,cpu-idle-state-flags = <0x10 0x101000>;
ibm,enabled-stop-levels = <0xec00>;
ibm,idle-states {
  ibm,cpu-idle-state-names = "stop1", "stop2", "stop4", "stop5";
  ibm,cpu-idle-state-residency-ns = <0xc350 0x186a0 0x989680
0x1312d00>;
  ibm,cpu-idle-state-latencies-ns = <0x1388 0x2710 0x186a0
0x30d40>;
  ibm,cpu-idle-state-psscr-mask = <0x0 0x3003ff 0x0 0x3003ff 0x0
0x3003ff 0x0 0x3003ff>;
  ibm,cpu-idle-state-psscr = <0x0 0x300331 0x0 0x300332 0x0
0x300374 0x0 0x300375>;
  ibm,cpu-idle-state-versions = "ibm,idle-state-v1
", "ibm,idle-state-v1", "ibm,idle-state-v1", "ibm,idle-state-v1";
  ibm,cpu-idle-state-flags = <0x101000 0x101000 0x207000
0x207000>;
   }
 }

Signed-off-by: Akshay Adiga 
---
 arch/powerpc/platforms/powernv/idle.c | 60 ++--
 drivers/cpuidle/cpuidle-powernv.c | 65 +--
 2 files changed, 112 insertions(+), 13 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/idle.c 
b/arch/powerpc/platforms/powernv/idle.c
index 1f12ab1..ab52665 100644
--- a/arch/powerpc/platforms/powernv/idle.c
+++ b/arch/powerpc/platforms/powernv/idle.c
@@ -622,8 +622,8 @@ int validate_psscr_val_mask(u64 *psscr_val, u64 
*psscr_mask, u32 flags)
  * @dt_idle_states: Number of idle state entries
  * Returns 0 on success
  */
-static int __init pnv_power9_idle_init(struct device_node *np, u32 *flags,
-   int dt_idle_states)
+static int __init pnv_power9_idle_init(struct device_node *np, struct 
device_node *np_new,u32 *flags,
+   int dt_idle_states, int 
additional_states)
 {
u64 *psscr_val = NULL;
u64 *psscr_mask = NULL;
@@ -631,9 +631,10 @@ static int __init pnv_power9_idle_init(struct device_node 
*np, u32 *flags,
u64 max_residency_ns = 0;
int rc = 0, i;
 
-   psscr_val = kcalloc(dt_idle_states, sizeof(*psscr_val), GFP_KERNEL);
-   psscr_mask = kcalloc(dt_idle_states, sizeof(*psscr_mask), GFP_KERNEL);
-   residency_ns = kcalloc(dt_idle_states, sizeof(*residency_ns),
+   /* TODO: remove ugliness of using additional_states count*/
+   psscr_val = kcalloc(dt_idle_states+additional_states, 
sizeof(*psscr_val), GFP_KERNEL);
+   psscr_mask = kcalloc(dt_idle_states+additional_states, 
sizeof(*psscr_mask), GFP_KERNEL);
+   residency_ns = kcalloc(dt_idle_states+additional_states, 
sizeof(*residency_ns),
   GFP_KERNEL);
 
if (!psscr_val || !psscr_mask || !residency_ns) {
@@ -648,6 +649,13 @@ static int __init pnv_power9_idle_init(struct device_node 
*np, u32 *flags,
rc = -1;
goto out;
}
+   if (of_property_read_u64_array(np_new,
+   "ibm,cpu-idle-state-psscr",
+   psscr_val + dt_idle_states, additional_states)) {
+   pr_warn("cpuidle-powernv: missing addtional 
ibm,cpu-idle-state-psscr in DT\n");
+   rc = -1;
+   goto out;
+   }
 
if (of_property_read_u64_array(np,
   "ibm,cpu-idle-state-psscr-mask",
@@ -656,6 +664,13 @@ static int __init pnv_power9_idle_init(struct device_node 
*np, u32 *flags,
rc = -1;
goto out;
}
+   if (of_property_read_u64_array(np_new,
+  "ibm,cpu-idle-state-psscr-mask",
+  psscr_mask + dt_idle_states, 
additional_states)) {
+   pr_warn("cpuidle-powernv: missing ibm,cpu-idle-state-psscr-mask 
in DT\n");
+

[PATCH] [SCHEME 1] Add support for new idle device tree format

2018-05-30 Thread Akshay Adiga

This patch adds support for new device-tree format for idle state
description.

Previously if a older kernel runs on a newer firmware, it may enable
all available states irrespective of its capability of handling it.
New device tree format adds a compatible flag, so that only kernel
which has the capability to handle the version of stop state will enable
it.

Older kernel will still see stop0 and stop0_lite in older format and we
will depricate it after some time.

Idea is to bump up the version in firmware if we find a bug or
regression in stop states. A fix will be provided in linux which would
now know about the bumped up version of stop states, where as kernel
without fixes would ignore the states.

New idle state device tree format :
   power-mgt {
...
ibm,enabled-stop-levels = <0xec00>;
ibm,cpu-idle-state-psscr-mask = <0x0 0x3003ff 0x0 0x3003ff>;
ibm,cpu-idle-state-latencies-ns = <0x3e8 0x7d0>;
ibm,cpu-idle-state-psscr = <0x0 0x330 0x0 0x300330>;
ibm,cpu-idle-state-flags = <0x10 0x101000>;
ibm,cpu-idle-state-residency-ns = <0x2710 0x4e20>;
ibm,idle-states {
phandle = <0xff>;
stop4 {
flags = <0x207000>;
compatible = "ibm,cpuidle-state-v1";
psscr-mask = <0x0 0x3003ff>;
handle = <0x102>;
latency-ns = <0x186a0>;
residency-ns = <0x989680>;
psscr = <0x0 0x300374>;
 };

...
stop11 {
...
compatible = "ibm,cpuoffline-state-v1";
...

   };
   };

Signed-off-by: Akshay Adiga 
---
 arch/powerpc/platforms/powernv/idle.c | 70 ++-
 drivers/cpuidle/cpuidle-powernv.c | 26 -
 2 files changed, 85 insertions(+), 11 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/idle.c 
b/arch/powerpc/platforms/powernv/idle.c
index 1f12ab1..074381d 100644
--- a/arch/powerpc/platforms/powernv/idle.c
+++ b/arch/powerpc/platforms/powernv/idle.c
@@ -623,17 +623,19 @@ int validate_psscr_val_mask(u64 *psscr_val, u64 
*psscr_mask, u32 flags)
  * Returns 0 on success
  */
 static int __init pnv_power9_idle_init(struct device_node *np, u32 *flags,
-   int dt_idle_states)
+   int dt_idle_states, int 
additional_states)
 {
u64 *psscr_val = NULL;
u64 *psscr_mask = NULL;
u32 *residency_ns = NULL;
u64 max_residency_ns = 0;
-   int rc = 0, i;
-
-   psscr_val = kcalloc(dt_idle_states, sizeof(*psscr_val), GFP_KERNEL);
-   psscr_mask = kcalloc(dt_idle_states, sizeof(*psscr_mask), GFP_KERNEL);
-   residency_ns = kcalloc(dt_idle_states, sizeof(*residency_ns),
+   int rc = 0, i, orig_dt_idle_states;
+   struct device_node *dt_node;
+   orig_dt_idle_states = dt_idle_states;
+   /* TODO: remove ugliness of using additional_states count*/
+   psscr_val = kcalloc(dt_idle_states+additional_states, 
sizeof(*psscr_val), GFP_KERNEL);
+   psscr_mask = kcalloc(dt_idle_states+additional_states, 
sizeof(*psscr_mask), GFP_KERNEL);
+   residency_ns = kcalloc(dt_idle_states+additional_states, 
sizeof(*residency_ns),
   GFP_KERNEL);
 
if (!psscr_val || !psscr_mask || !residency_ns) {
@@ -664,7 +666,37 @@ static int __init pnv_power9_idle_init(struct device_node 
*np, u32 *flags,
rc = -1;
goto out;
}
+   for_each_compatible_node( dt_node, NULL, "ibm,cpuoffline-state_v1" ) {
+   printk("Found a state\n");
+   rc = of_property_read_u32(dt_node, "residency-ns" , 
&(residency_ns[dt_idle_states]));
+   if (rc)
+   printk("error reading residency rc= %d\n",rc);
+   rc = of_property_read_u64(dt_node, "psscr-mask" , 
&(psscr_mask[dt_idle_states]));
+   if (rc)
+   printk("error reading psscr-mask rc= %d\n",rc);
+   rc = of_property_read_u64(dt_node, "psscr" , 
&(psscr_val[dt_idle_states]));
+   if (rc)
+   printk("error reading psscr rc= %d\n",rc);
+   dt_idle_states++;
+   }
+   /* Fall back if no cpuoffline state found */
+   if ( orig_dt_idle_states == dt_idle_states) {
 
+   for_each_compatible_node( dt_node, NULL, "ibm,cpuidle-state_v1" 
) {
+   printk("Found a state\n");
+   rc = of_property_read_u32(dt_node, "residency-ns" , 
&(reside

Re: [PATCH] cpuidle/powernv : init all present cpus for deep states

2018-05-24 Thread Akshay Adiga

On Wed, May 16, 2018 at 05:32:14PM +0530, Akshay Adiga wrote:
> Init all present cpus for deep states instead of "all possible" cpus.
> Init fails if the possible cpu is gaurded. Resulting in making only
> non-deep states available for cpuidle/hotplug.
> 
> Signed-off-by: Akshay Adiga <akshay.ad...@linux.vnet.ibm.com>
> ---
>  arch/powerpc/platforms/powernv/idle.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/powerpc/platforms/powernv/idle.c 
> b/arch/powerpc/platforms/powernv/idle.c
> index 1f12ab1..1c5d067 100644
> --- a/arch/powerpc/platforms/powernv/idle.c
> +++ b/arch/powerpc/platforms/powernv/idle.c
> @@ -79,7 +79,7 @@ static int pnv_save_sprs_for_deep_states(void)
>   uint64_t msr_val = MSR_IDLE;
>   uint64_t psscr_val = pnv_deepest_stop_psscr_val;
> 
> - for_each_possible_cpu(cpu) {
> + for_each_present_cpu(cpu) {
>   uint64_t pir = get_hard_smp_processor_id(cpu);
>   uint64_t hsprg0_val = (uint64_t)paca_ptrs[cpu];
> 
> @@ -814,7 +814,7 @@ static int __init pnv_init_idle_states(void)
>   int cpu;
> 
>   pr_info("powernv: idle: Saving PACA pointers of all CPUs in 
> their thread sibling PACA\n");
> - for_each_possible_cpu(cpu) {
> + for_each_present_cpu(cpu) {
>   int base_cpu = cpu_first_thread_sibling(cpu);
>   int idx = cpu_thread_in_core(cpu);
>   int i;
> -- 
> 2.5.5
>

Missed CC'ing to  m...@ellerman.id.au

Re: [PATCH] cpuidle/powernv : init all present cpus for deep states

2018-05-16 Thread Akshay Adiga

Yes this needs to be sent to  stable. 

Fixes: d405a98c ("powerpc/powernv: Move cpuidle related code from setup.c
to new file")

[PATCH] cpuidle/powernv : init all present cpus for deep states

2018-05-16 Thread Akshay Adiga

Init all present cpus for deep states instead of "all possible" cpus.
Init fails if the possible cpu is gaurded. Resulting in making only
non-deep states available for cpuidle/hotplug.

Signed-off-by: Akshay Adiga <akshay.ad...@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/powernv/idle.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/idle.c 
b/arch/powerpc/platforms/powernv/idle.c
index 1f12ab1..1c5d067 100644
--- a/arch/powerpc/platforms/powernv/idle.c
+++ b/arch/powerpc/platforms/powernv/idle.c
@@ -79,7 +79,7 @@ static int pnv_save_sprs_for_deep_states(void)
uint64_t msr_val = MSR_IDLE;
uint64_t psscr_val = pnv_deepest_stop_psscr_val;
 
-   for_each_possible_cpu(cpu) {
+   for_each_present_cpu(cpu) {
uint64_t pir = get_hard_smp_processor_id(cpu);
uint64_t hsprg0_val = (uint64_t)paca_ptrs[cpu];
 
@@ -814,7 +814,7 @@ static int __init pnv_init_idle_states(void)
int cpu;
 
pr_info("powernv: idle: Saving PACA pointers of all CPUs in 
their thread sibling PACA\n");
-   for_each_possible_cpu(cpu) {
+   for_each_present_cpu(cpu) {
int base_cpu = cpu_first_thread_sibling(cpu);
int idx = cpu_thread_in_core(cpu);
int i;
-- 
2.5.5

Re: [Skiboot] [PATCH 1/2] SLW: Remove stop1_lite and stop0 stop states

2018-05-10 Thread Akshay Adiga

On Thu, May 03, 2018 at 08:15:59PM +1000, Nicholas Piggin wrote:
> On Thu, 03 May 2018 20:03:55 +1000
> Stewart Smith <stew...@linux.vnet.ibm.com> wrote:
> 
> > Nicholas Piggin <npig...@gmail.com> writes:
> > > On Thu, 3 May 2018 14:36:47 +0530
> > > Akshay Adiga <akshay.ad...@linux.vnet.ibm.com> wrote:
> > >  
> > >> On Tue, May 01, 2018 at 01:47:23PM +1000, Nicholas Piggin wrote:  
> > >> > On Mon, 30 Apr 2018 14:42:08 +0530
> > >> > Akshay Adiga <akshay.ad...@linux.vnet.ibm.com> wrote:
> > >> > 
> > >> > > Powersaving for stop0_lite and stop1_lite is observed to be quite 
> > >> > > similar
> > >> > > and both states resume without state loss. Using context_switch test 
> > >> > > [1]
> > >> > > we observe that stop0_lite has slightly lower latency, hence removing
> > >> > > stop1_lite.
> > >> > > 
> > >> > > [1] linux/tools/testing/selftests/powerpc/benchmarks/context_switch.c
> > >> > > 
> > >> > > Signed-off-by: Akshay Adiga <akshay.ad...@linux.vnet.ibm.com>
> > >> > 
> > >> > I'm okay for removing stop1_lite and stop2_lite -- SMT switching
> > >> > is very latency critical. If we decide to actually start saving
> > >> > real power then SMT should already have been switched.
> > >> > 
> > >> > So I would put stop1_lite and stop2_lite removal in the same patch.
> > >> 
> > >> I can do this.
> > >>   
> > >> > 
> > >> > Then what do we have? stop0_lite, stop0, stop1 for our fast idle
> > >> > states.
> > >> 
> > >> Currently we were looking at  stop0_lite , stop1 as the fast idle states
> > >> because stop0 and stop1 have similar latency and powersaving.
> > >> Having so many low latency states does not make sense.
> > >>   
> > >> > 
> > >> > I would be against removing stop0 if that is our fastest way to
> > >> > release SMT resources, even if there is only a small advantage. Why
> > >> > not remove stop1 instead?
> > >> >
> > >> SMT-folding comes into picture only when we have at least one thread
> > >> running in the core. stop0 and stop1 has exactly same power-saving and
> > >> both will release SMT resources if at least one thread in the core is
> > >> running.  
> > >
> > > Right, but you don't know that other threads are running or will remain
> > > running when you enter stop. If not, then latency is higher for stop1,
> > > no? So we need to be using stop0.
> > >  
> > >> 
> > >> As soon as all threads are idle core enters stop0/stop1, where stop1
> > >> does a bit more powersaving than stop0.
> > >>   
> > >> > We also need to better evaluate stop0_lite. How much advantage does
> > >> > that have over snooze?
> > >> 
> > >> I evaluated snooze and stop0_lite, there is an additional ipi latency of
> > >> a few microseconds in case of stop0_lite. So snooze cannot still be
> > >> replaced by stop0_lite.  
> > >
> > > I meant the other way around. Replace stop0_lite with snooze.
> > >
> > > So we would have snooze, stop0, stop2, and stop4 and/or 5.  
> > 
> > Slightly stupid question: should we be disabling these here or should
> > Linux be better and deciding what states to use?
> 
> Yeah not a bad question, I don't have a good answer. I don't know how
> smart Linux is at deciding what to use and when.
> 
> I am pretty sure the way we set our _lite states wrong -- we don't
> want to go into stop2_lite as a deeper sleep state than stop0 for
> example, because that then prevents SMT folding.

I think we should keep both stop0 and stop1, i was not able to get
a good enough reason to remove stop0.

I a diffrent patch we need to tweak residencies so that we can bias
to more useful stop states.

Re: [Skiboot] [PATCH 1/2] SLW: Remove stop1_lite and stop0 stop states

2018-05-03 Thread Akshay Adiga

On Tue, May 01, 2018 at 01:47:23PM +1000, Nicholas Piggin wrote:
> On Mon, 30 Apr 2018 14:42:08 +0530
> Akshay Adiga <akshay.ad...@linux.vnet.ibm.com> wrote:
> 
> > Powersaving for stop0_lite and stop1_lite is observed to be quite similar
> > and both states resume without state loss. Using context_switch test [1]
> > we observe that stop0_lite has slightly lower latency, hence removing
> > stop1_lite.
> > 
> > [1] linux/tools/testing/selftests/powerpc/benchmarks/context_switch.c
> > 
> > Signed-off-by: Akshay Adiga <akshay.ad...@linux.vnet.ibm.com>
> 
> I'm okay for removing stop1_lite and stop2_lite -- SMT switching
> is very latency critical. If we decide to actually start saving
> real power then SMT should already have been switched.
> 
> So I would put stop1_lite and stop2_lite removal in the same patch.

I can do this.

> 
> Then what do we have? stop0_lite, stop0, stop1 for our fast idle
> states.

Currently we were looking at  stop0_lite , stop1 as the fast idle states
because stop0 and stop1 have similar latency and powersaving.
Having so many low latency states does not make sense.

> 
> I would be against removing stop0 if that is our fastest way to
> release SMT resources, even if there is only a small advantage. Why
> not remove stop1 instead?
>
SMT-folding comes into picture only when we have at least one thread
running in the core. stop0 and stop1 has exactly same power-saving and
both will release SMT resources if at least one thread in the core is
running.

As soon as all threads are idle core enters stop0/stop1, where stop1
does a bit more powersaving than stop0.

> We also need to better evaluate stop0_lite. How much advantage does
> that have over snooze?

I evaluated snooze and stop0_lite, there is an additional ipi latency of
a few microseconds in case of stop0_lite. So snooze cannot still be
replaced by stop0_lite.

> 
> Thanks,
> Nick
> 
> 
> > ---
> >  hw/slw.c | 30 --
> >  1 file changed, 30 deletions(-)
> > 
> > diff --git a/hw/slw.c b/hw/slw.c
> > index 3f9abaa..edfc783 100644
> > --- a/hw/slw.c
> > +++ b/hw/slw.c
> > @@ -521,36 +521,6 @@ static struct cpu_idle_states power9_cpu_idle_states[] 
> > = {
> >  | OPAL_PM_PSSCR_TR(3),
> > .pm_ctrl_reg_mask = OPAL_PM_PSSCR_MASK },
> > {
> > -   .name = "stop0",
> > -   .latency_ns = 2000,
> > -   .residency_ns = 2,
> > -   .flags = 0*OPAL_PM_DEC_STOP \
> > -  | 0*OPAL_PM_TIMEBASE_STOP  \
> > -  | 1*OPAL_PM_LOSE_USER_CONTEXT \
> > -  | 0*OPAL_PM_LOSE_HYP_CONTEXT \
> > -  | 0*OPAL_PM_LOSE_FULL_CONTEXT \
> > -  | 1*OPAL_PM_STOP_INST_FAST,
> > -   .pm_ctrl_reg_val = OPAL_PM_PSSCR_RL(0) \
> > -| OPAL_PM_PSSCR_MTL(3) \
> > -| OPAL_PM_PSSCR_TR(3) \
> > -| OPAL_PM_PSSCR_ESL \
> > -| OPAL_PM_PSSCR_EC,
> > -   .pm_ctrl_reg_mask = OPAL_PM_PSSCR_MASK },
> > -   {
> > -   .name = "stop1_lite", /* Enter stop1 with no state loss */
> > -   .latency_ns = 4900,
> > -   .residency_ns = 49000,
> > -   .flags = 0*OPAL_PM_DEC_STOP \
> > -  | 0*OPAL_PM_TIMEBASE_STOP  \
> > -  | 0*OPAL_PM_LOSE_USER_CONTEXT \
> > -  | 0*OPAL_PM_LOSE_HYP_CONTEXT \
> > -  | 0*OPAL_PM_LOSE_FULL_CONTEXT \
> > -  | 1*OPAL_PM_STOP_INST_FAST,
> > -   .pm_ctrl_reg_val = OPAL_PM_PSSCR_RL(1) \
> > -| OPAL_PM_PSSCR_MTL(3) \
> > -| OPAL_PM_PSSCR_TR(3),
> > -   .pm_ctrl_reg_mask = OPAL_PM_PSSCR_MASK },
> > -   {
> > .name = "stop1",
> > .latency_ns = 5000,
> > .residency_ns = 5,
>

[RESEND][PATCH] cpuidle/powernv : Restore different PSSCR for idle and hotplug

2018-02-28 Thread Akshay Adiga

commit 1e1601b38e6e ("powerpc/powernv/idle: Restore SPRs for deep idle
states via stop API.") uses stop-api provided by the firmware to restore
PSSCR. PSSCR restore is required for handling special wakeup. When special
wakeup is completed, the core enters stop state based on restored PSSCR.

Currently PSSCR is restored to deepest available stop state, causing
a idle cpu to enter deeper stop state on a special wakeup, which causes
the cpu to hang on wakeup.

A "sensors" command which reads temperature (through DTS sensors) on idle
cpu can trigger special wakeup.

Failed Scenario :
Request restore of PSSCR with RL = 11
cpu enters idle state (stop5)
  user triggers "sensors" command
   Assert special wakeup on cpu
 Restores PSSCR with RL = 11  < Done by firmware
  Read DTS sensor
   Deassert special wakeup
  cpu enters idle state (stop11) <-- Instead of stop5

Cpu hang is caused because cpu ended up in a deeper state than it requested

This patch fixes instability caused by special wakeup when stop11 is
enabled. Requests restore of PSSCR to deepest stop state used by cpuidle.
Only when offlining cpu, request restore of PSSCR to deepest stop state.
On onlining cpu, request restore of PSSCR to deepest stop state used by
cpuidle.

Cc: <sta...@vger.kernel.org> # v4.14+
Fixes : 1e1601b38e6e ("powerpc/powernv/idle: Restore SPRs for deep idle
states via stop API.")
Reported-by: Pridhiviraj Paidipeddi <ppaid...@linux.vnet.ibm.com>
Signed-off-by: Akshay Adiga <akshay.ad...@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/cpuidle.h|  2 ++
 arch/powerpc/platforms/powernv/idle.c | 46 ---
 drivers/cpuidle/cpuidle-powernv.c |  1 -
 3 files changed, 45 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/include/asm/cpuidle.h 
b/arch/powerpc/include/asm/cpuidle.h
index e210a83..f52e9f1 100644
--- a/arch/powerpc/include/asm/cpuidle.h
+++ b/arch/powerpc/include/asm/cpuidle.h
@@ -67,6 +67,8 @@
 #define ERR_EC_ESL_MISMATCH-1
 #define ERR_DEEP_STATE_ESL_MISMATCH-2
 
+#define POWERNV_THRESHOLD_LATENCY_NS 20
+
 #ifndef __ASSEMBLY__
 /* Additional SPRs that need to be saved/restored during stop */
 struct stop_sprs {
diff --git a/arch/powerpc/platforms/powernv/idle.c 
b/arch/powerpc/platforms/powernv/idle.c
index 443d5ca..4b0c7d24 100644
--- a/arch/powerpc/platforms/powernv/idle.c
+++ b/arch/powerpc/platforms/powernv/idle.c
@@ -56,8 +56,11 @@ u64 pnv_first_deep_stop_state = MAX_STOP_STATE;
  */
 static u64 pnv_deepest_stop_psscr_val;
 static u64 pnv_deepest_stop_psscr_mask;
+static u64 pnv_deepest_cpuidle_psscr_val;
+static u64 pnv_deepest_cpuidle_psscr_mask;
 static u64 pnv_deepest_stop_flag;
 static bool deepest_stop_found;
+static bool deepest_cpuidle_found;
 
 static int pnv_save_sprs_for_deep_states(void)
 {
@@ -76,7 +79,14 @@ static int pnv_save_sprs_for_deep_states(void)
uint64_t hid5_val = mfspr(SPRN_HID5);
uint64_t hmeer_val = mfspr(SPRN_HMEER);
uint64_t msr_val = MSR_IDLE;
-   uint64_t psscr_val = pnv_deepest_stop_psscr_val;
+
+   /*
+* Pick deepest cpuidle psscr as the value to be
+* restored through wakeup engine.
+* We will request a deeper state to be restored
+* in hotplug path
+*/
+   uint64_t psscr_val = pnv_deepest_cpuidle_psscr_val;
 
for_each_possible_cpu(cpu) {
uint64_t pir = get_hard_smp_processor_id(cpu);
@@ -409,7 +419,7 @@ static void pnv_program_cpu_hotplug_lpcr(unsigned int cpu, 
u64 lpcr_val)
  */
 unsigned long pnv_cpu_offline(unsigned int cpu)
 {
-   unsigned long srr1;
+   u64 srr1;
u32 idle_states = pnv_get_supported_cpuidle_states();
u64 lpcr_val;
 
@@ -429,12 +439,18 @@ unsigned long pnv_cpu_offline(unsigned int cpu)
__ppc64_runlatch_off();
 
if (cpu_has_feature(CPU_FTR_ARCH_300) && deepest_stop_found) {
-   unsigned long psscr;
+   u64 psscr;
+   u64 pir = get_hard_smp_processor_id(cpu);
 
psscr = mfspr(SPRN_PSSCR);
psscr = (psscr & ~pnv_deepest_stop_psscr_mask) |
pnv_deepest_stop_psscr_val;
+   if (pnv_deepest_stop_psscr_val != pnv_deepest_cpuidle_psscr_val)
+   opal_slw_set_reg(pir, P9_STOP_SPR_PSSCR, psscr);
srr1 = power9_idle_stop(psscr);
+   psscr = (psscr & ~pnv_deepest_cpuidle_psscr_mask) |
+   pnv_deepest_cpuidle_psscr_val;
+   opal_slw_set_reg(pir, P9_STOP_SPR_PSSCR, psscr);
 
} else if ((idle_states & OPAL_PM_WINKLE_ENABLED) &&
   (idle_states & OPAL_PM_LOSE_FULL_CONTEXT)) {
@@ -555,6 +571,7 @@ static int __init pnv_power9_idle_init(struct device_node 
*np, u32 *flags,
u64 *psscr_val = NULL;
u64 *

Re: [PATCH] cpuidle/powernv : Restore different PSSCR for idle and hotplug

2018-02-28 Thread Akshay Adiga

On Mon, Feb 26, 2018 at 03:47:12PM +1100, Stewart Smith wrote:
> Akshay Adiga <akshay.ad...@linux.vnet.ibm.com> writes:
> > commit 1e1601b38e6e ("powerpc/powernv/idle: Restore SPRs for deep idle
> > states via stop API.") uses stop-api provided by the firmware to restore
> > PSSCR. PSSCR restore is required for handling special wakeup. When special
> > wakeup is completed, the core enters stop state based on restored PSSCR.
> >
> > Currently PSSCR is restored to deepest available stop state, causing
> > a idle cpu to enter deeper stop state on a special wakeup, which causes
> > the cpu to hang on wakeup.
> >
> > A "sensors" command which reads temperature (through DTS sensors) on idle
> > cpu can trigger special wakeup.
> >
> > Failed Scenario :
> > Request restore of PSSCR with RL = 11
> > cpu enters idle state (stop5)
> >   user triggers "sensors" command
> >Assert special wakeup on cpu
> >  Restores PSSCR with RL = 11  < Done by firmware
> >   Read DTS sensor
> >Deassert special wakeup
> >   cpu enters idle state (stop11) <-- Instead of stop5
> >
> > Cpu hang is caused because cpu ended up in a deeper state than it requested
> >
> > This patch fixes instability caused by special wakeup when stop11 is
> > enabled. Requests restore of PSSCR to deepest stop state used by cpuidle.
> > Only when offlining cpu, request restore of PSSCR to deepest stop state.
> > On onlining cpu, request restore of PSSCR to deepest stop state used by
> > cpuidle.
> >
> > Fixes : 1e1601b38e6e ("powerpc/powernv/idle: Restore SPRs for deep idle
> > states via stop API.")
> 
> This should CC stable ?
> 
> We'll need this to enable stop11 in firmware and not break things, right?

Yes I will resend and CC it to stable.
> 
> -- 
> Stewart Smith
> OPAL Architect, IBM.
>

[PATCH] cpuidle/powernv : Restore different PSSCR for idle and hotplug

2018-02-19 Thread Akshay Adiga

commit 1e1601b38e6e ("powerpc/powernv/idle: Restore SPRs for deep idle
states via stop API.") uses stop-api provided by the firmware to restore
PSSCR. PSSCR restore is required for handling special wakeup. When special
wakeup is completed, the core enters stop state based on restored PSSCR.

Currently PSSCR is restored to deepest available stop state, causing
a idle cpu to enter deeper stop state on a special wakeup, which causes
the cpu to hang on wakeup.

A "sensors" command which reads temperature (through DTS sensors) on idle
cpu can trigger special wakeup.

Failed Scenario :
Request restore of PSSCR with RL = 11
cpu enters idle state (stop5)
  user triggers "sensors" command
   Assert special wakeup on cpu
 Restores PSSCR with RL = 11  < Done by firmware
  Read DTS sensor
   Deassert special wakeup
  cpu enters idle state (stop11) <-- Instead of stop5

Cpu hang is caused because cpu ended up in a deeper state than it requested

This patch fixes instability caused by special wakeup when stop11 is
enabled. Requests restore of PSSCR to deepest stop state used by cpuidle.
Only when offlining cpu, request restore of PSSCR to deepest stop state.
On onlining cpu, request restore of PSSCR to deepest stop state used by
cpuidle.

Fixes : 1e1601b38e6e ("powerpc/powernv/idle: Restore SPRs for deep idle
states via stop API.")
Reported-by: Pridhiviraj Paidipeddi <ppaid...@linux.vnet.ibm.com>
Signed-off-by: Akshay Adiga <akshay.ad...@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/cpuidle.h|  2 ++
 arch/powerpc/platforms/powernv/idle.c | 46 ---
 drivers/cpuidle/cpuidle-powernv.c |  1 -
 3 files changed, 45 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/include/asm/cpuidle.h 
b/arch/powerpc/include/asm/cpuidle.h
index e210a83..f52e9f1 100644
--- a/arch/powerpc/include/asm/cpuidle.h
+++ b/arch/powerpc/include/asm/cpuidle.h
@@ -67,6 +67,8 @@
 #define ERR_EC_ESL_MISMATCH-1
 #define ERR_DEEP_STATE_ESL_MISMATCH-2
 
+#define POWERNV_THRESHOLD_LATENCY_NS 20
+
 #ifndef __ASSEMBLY__
 /* Additional SPRs that need to be saved/restored during stop */
 struct stop_sprs {
diff --git a/arch/powerpc/platforms/powernv/idle.c 
b/arch/powerpc/platforms/powernv/idle.c
index 443d5ca..4b0c7d24 100644
--- a/arch/powerpc/platforms/powernv/idle.c
+++ b/arch/powerpc/platforms/powernv/idle.c
@@ -56,8 +56,11 @@ u64 pnv_first_deep_stop_state = MAX_STOP_STATE;
  */
 static u64 pnv_deepest_stop_psscr_val;
 static u64 pnv_deepest_stop_psscr_mask;
+static u64 pnv_deepest_cpuidle_psscr_val;
+static u64 pnv_deepest_cpuidle_psscr_mask;
 static u64 pnv_deepest_stop_flag;
 static bool deepest_stop_found;
+static bool deepest_cpuidle_found;
 
 static int pnv_save_sprs_for_deep_states(void)
 {
@@ -76,7 +79,14 @@ static int pnv_save_sprs_for_deep_states(void)
uint64_t hid5_val = mfspr(SPRN_HID5);
uint64_t hmeer_val = mfspr(SPRN_HMEER);
uint64_t msr_val = MSR_IDLE;
-   uint64_t psscr_val = pnv_deepest_stop_psscr_val;
+
+   /*
+* Pick deepest cpuidle psscr as the value to be
+* restored through wakeup engine.
+* We will request a deeper state to be restored
+* in hotplug path
+*/
+   uint64_t psscr_val = pnv_deepest_cpuidle_psscr_val;
 
for_each_possible_cpu(cpu) {
uint64_t pir = get_hard_smp_processor_id(cpu);
@@ -409,7 +419,7 @@ static void pnv_program_cpu_hotplug_lpcr(unsigned int cpu, 
u64 lpcr_val)
  */
 unsigned long pnv_cpu_offline(unsigned int cpu)
 {
-   unsigned long srr1;
+   u64 srr1;
u32 idle_states = pnv_get_supported_cpuidle_states();
u64 lpcr_val;
 
@@ -429,12 +439,18 @@ unsigned long pnv_cpu_offline(unsigned int cpu)
__ppc64_runlatch_off();
 
if (cpu_has_feature(CPU_FTR_ARCH_300) && deepest_stop_found) {
-   unsigned long psscr;
+   u64 psscr;
+   u64 pir = get_hard_smp_processor_id(cpu);
 
psscr = mfspr(SPRN_PSSCR);
psscr = (psscr & ~pnv_deepest_stop_psscr_mask) |
pnv_deepest_stop_psscr_val;
+   if (pnv_deepest_stop_psscr_val != pnv_deepest_cpuidle_psscr_val)
+   opal_slw_set_reg(pir, P9_STOP_SPR_PSSCR, psscr);
srr1 = power9_idle_stop(psscr);
+   psscr = (psscr & ~pnv_deepest_cpuidle_psscr_mask) |
+   pnv_deepest_cpuidle_psscr_val;
+   opal_slw_set_reg(pir, P9_STOP_SPR_PSSCR, psscr);
 
} else if ((idle_states & OPAL_PM_WINKLE_ENABLED) &&
   (idle_states & OPAL_PM_LOSE_FULL_CONTEXT)) {
@@ -555,6 +571,7 @@ static int __init pnv_power9_idle_init(struct device_node 
*np, u32 *flags,
u64 *psscr_val = NULL;
u64 *psscr_mask = NULL;
u32 *residency_ns = NUL

Re: [PATCH] powerpc/powernv: Clear LPCR[PECE1] via stop-api only for deep state offline

2017-09-19 Thread Akshay Adiga


Hi Michael,

Any comments on this patch ?

On 09/06/2017 02:32 PM, pavrampu wrote:

On 2017-08-31 17:17, Gautham R. Shenoy wrote:
> From: "Gautham R. Shenoy" 
>
> commit 24be85a23d1f ("powerpc/powernv: Clear PECE1 in LPCR via
> stop-api only on Hotplug") clears the PECE1 bit of the LPCR via
> stop-api during CPU-Hotplug to prevent wakeup due to a decrementer on
> an offlined CPU which is in a deep stop state.
>
> In the case where the stop-api support is found to be lacking, the
> commit 785a12afdb4a ("powerpc/powernv/idle: Disable LOSE_FULL_CONTEXT
> states when stop-api fails") disables deep states that lose hypervisor
> context. Thus in this case, the offlined CPU will be put to some
> shallow idle state.
>
> However, we currently unconditionally clear the PECE1 in LPCR via
> stop-api during CPU-Hotplug even when deep states are disabled due to
> stop-api failure.
>
> Fix this by clearing PECE1 of LPCR via stop-api during CPU-Hotplug
> *only* when the offlined CPU will be put to a deep state that loses
> hypervisor context.
>
> Fixes: commit 24be85a23d1f ("powerpc/powernv: Clear PECE1 in LPCR via
> stop-api only on Hotplug")
>
> Reported-by: Pavithra Prakash 
> Signed-off-by: Gautham R. Shenoy 

Tested-by: Pavithra Prakash 

> ---
>  arch/powerpc/platforms/powernv/idle.c | 8 +++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/platforms/powernv/idle.c
> b/arch/powerpc/platforms/powernv/idle.c
> index 9f59041..23f8fba 100644
> --- a/arch/powerpc/platforms/powernv/idle.c
> +++ b/arch/powerpc/platforms/powernv/idle.c
> @@ -393,7 +393,13 @@ static void pnv_program_cpu_hotplug_lpcr(unsigned
> int cpu, u64 lpcr_val)
>  u64 pir = get_hard_smp_processor_id(cpu);
>
>  mtspr(SPRN_LPCR, lpcr_val);
> -opal_slw_set_reg(pir, SPRN_LPCR, lpcr_val);
> +
> +/*
> + * Program the LPCR via stop-api only for deepest stop state
> + * can lose hypervisor context.
> + */
> +if (supported_cpuidle_states & OPAL_PM_LOSE_FULL_CONTEXT)
> +opal_slw_set_reg(pir, SPRN_LPCR, lpcr_val);
>  }
>
>  /*

Re: [PATCH] powerpc/powernv: Clear LPCR[PECE1] via stop-api only for deep state offline

2017-09-01 Thread Akshay Adiga



On 08/31/2017 05:37 PM, Nicholas Piggin wrote:

On Thu, 31 Aug 2017 17:17:41 +0530
"Gautham R. Shenoy"  wrote:

> From: "Gautham R. Shenoy" 
>
> commit 24be85a23d1f ("powerpc/powernv: Clear PECE1 in LPCR via
> stop-api only on Hotplug") clears the PECE1 bit of the LPCR via
> stop-api during CPU-Hotplug to prevent wakeup due to a decrementer on
> an offlined CPU which is in a deep stop state.
>
> In the case where the stop-api support is found to be lacking, the
> commit 785a12afdb4a ("powerpc/powernv/idle: Disable LOSE_FULL_CONTEXT
> states when stop-api fails") disables deep states that lose hypervisor
> context. Thus in this case, the offlined CPU will be put to some
> shallow idle state.
>
> However, we currently unconditionally clear the PECE1 in LPCR via
> stop-api during CPU-Hotplug even when deep states are disabled due to
> stop-api failure.
>
> Fix this by clearing PECE1 of LPCR via stop-api during CPU-Hotplug
> *only* when the offlined CPU will be put to a deep state that loses
> hypervisor context.

This looks okay to me. The bug is due to calling opal_slw_set_reg when
firmware has not enabled that feature, right?

Yes,

In the case where the stop-api support is found to be lacking, the 
commit 785a12afdb4a ("powerpc/powernv/idle: Disable LOSE_FULL_CONTEXT 
states when stop-api fails") disables deep states that lose hypervisor 
context. Thus in this case, the offlined CPU will be put to some shallow 
idle state.


If a shallow state ( < stop4 ) is being chosen for cpu hotplug, then :
1) this opal call is not required.
2) may not be supported.

Hence  should call opal_slw_set_reg() only if a deep state chosen for 
cpu hotplug.


>
> Fixes: commit 24be85a23d1f ("powerpc/powernv: Clear PECE1 in LPCR via
> stop-api only on Hotplug")
>
> Reported-by: Pavithra Prakash 
> Signed-off-by: Gautham R. Shenoy 
> ---
>  arch/powerpc/platforms/powernv/idle.c | 8 +++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/platforms/powernv/idle.c
b/arch/powerpc/platforms/powernv/idle.c
> index 9f59041..23f8fba 100644
> --- a/arch/powerpc/platforms/powernv/idle.c
> +++ b/arch/powerpc/platforms/powernv/idle.c
> @@ -393,7 +393,13 @@ static void
pnv_program_cpu_hotplug_lpcr(unsigned int cpu, u64 lpcr_val)
>  u64 pir = get_hard_smp_processor_id(cpu);
>
>  mtspr(SPRN_LPCR, lpcr_val);
> -opal_slw_set_reg(pir, SPRN_LPCR, lpcr_val);
> +
> +/*
> + * Program the LPCR via stop-api only for deepest stop state
> + * can lose hypervisor context.
> + */
> +if (supported_cpuidle_states & OPAL_PM_LOSE_FULL_CONTEXT)
> +opal_slw_set_reg(pir, SPRN_LPCR, lpcr_val);
>  }
>
>  /*

[PATCH] powernv:idle: Clear r12 on wakeup from stop lite

2017-06-27 Thread Akshay Adiga

pnv_wakeup_noloss expects R12 to contain SRR1 value to determine if
the wakeup reason is an HMI in CHECK_HMI_INTERRUPT.

When we wakeup with ESL=0, SRR1 will not contain the wakeup reason, so
there is no point setting R12 to SRR1.

However, we don't set R12 at all and R12 contains garbage, and still
being used to check HMI assuming that it had SRR1. causing the
OPAL msglog to be filled with the following print:
HMI: Received HMI interrupt: HMER = 0x0040

This patch clears R12 after waking up from stop with ESL=EC=0, so that
we don't accidentally enter the HMI handler in pnv_wakeup_noloss if
the R12[42:45] corresponds to HMI as wakeup reason.

Bug existed prior to "commit 9d29250136f6 ("powerpc/64s/idle: Avoid SRR
usage in idle sleep/wake paths")  but was never hit in practice

Signed-off-by: Akshay Adiga <akshay.ad...@linux.vnet.ibm.com>
Fixes: 9d29250136f6 ("powerpc/64s/idle: Avoid SRR usage in idle
sleep/wake paths")
---
 arch/powerpc/kernel/idle_book3s.S | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/arch/powerpc/kernel/idle_book3s.S 
b/arch/powerpc/kernel/idle_book3s.S
index 1ea14b9..34794fd 100644
--- a/arch/powerpc/kernel/idle_book3s.S
+++ b/arch/powerpc/kernel/idle_book3s.S
@@ -256,6 +256,21 @@ power_enter_stop:
bne  .Lhandle_esl_ec_set
IDLE_STATE_ENTER_SEQ(PPC_STOP)
li  r3,0  /* Since we didn't lose state, return 0 */
+   /*
+* pnv_wakeup_noloss expects R12 to contain SRR1 value
+* to determine if the wakeup reason is an HMI in
+* CHECK_HMI_INTERRUPT.
+*
+* However, when we wakeup with ESL=0,
+* SRR1 will not contain the wakeup reason,
+* so there is no point setting R12 to SRR1.
+*
+* Further, we clear R12 here, so that we
+* don't accidentally enter the HMI
+* in pnv_wakeup_noloss if the
+* R12[42:45] == WAKE_HMI.
+*/
+   li  r12, 0
b   pnv_wakeup_noloss
 
 .Lhandle_esl_ec_set:
-- 
2.5.5

[PATCH -next] powernv: cpufreq: Fix uninitialized lpstate_idx in gpstates_timer_handler

2016-11-14 Thread Akshay Adiga

lpstate_idx remains uninitialized in the case when elapsed_time
is greater than MAX_RAMP_DOWN_TIME. At the end of rampdown
global pstate should be equal to local pstate.

Fixes: 20b15b766354 ("cpufreq: powernv: Use PMCR to verify global and
localpstate")
Reported-by: Stephen Rothwell <s...@canb.auug.org.au>
Signed-off-by: Akshay Adiga <akshay.ad...@linux.vnet.ibm.com>
---
 drivers/cpufreq/powernv-cpufreq.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/cpufreq/powernv-cpufreq.c 
b/drivers/cpufreq/powernv-cpufreq.c
index c82304b..c5c5bc3 100644
--- a/drivers/cpufreq/powernv-cpufreq.c
+++ b/drivers/cpufreq/powernv-cpufreq.c
@@ -624,6 +624,7 @@ void gpstate_timer_handler(unsigned long data)
 
if (gpstates->elapsed_time > MAX_RAMP_DOWN_TIME) {
gpstate_idx = pstate_to_idx(freq_data.pstate_id);
+   lpstate_idx = gpstate_idx;
reset_gpstates(policy);
gpstates->highest_lpstate_idx = gpstate_idx;
} else {
-- 
2.5.5

Re: [PATCH v2 2/2] cpufreq: powernv: Use PMCR to verify global and local pstate

2016-11-08 Thread Akshay Adiga


Thanks gautham for the review.

Good point, I have made the macros more generic in the next version as 
you have mentioned.


I will post a separate patch to set pstates using these macros. :)

On 11/08/2016 09:10 AM, Gautham R Shenoy wrote:

On Mon, Nov 07, 2016 at 01:09:09PM +0530, Akshay Adiga wrote:

As fast_switch() may get called with interrupt disable mode, we cannot
hold a mutex to update the global_pstate_info. So currently, fast_switch()
does not update the global_pstate_info and it will end up with stale data
whenever pstate is updated through fast_switch().

As the gpstate_timer can fire after fast_switch() has updated the pstates,
the timer handler cannot rely on the cached values of local and global
pstate and needs to read it from the PMCR.

Only gpstate_timer_handler() is affected by the stale cached pstate data
beacause either fast_switch() or target_index() routines will be called
for a given govenor, but gpstate_timer can fire after the governor has
changed to schedutil.


Signed-off-by: Akshay Adiga <akshay.ad...@linux.vnet.ibm.com>
---

Changes from v1 :
- Corrected Commit message
- Type cast pstate values read from PMCR to type s8
- Added Macros to get local and global pstates from PMCR

Thanks for this. Could you also send a (separate patch) to set the
local and global pstates to PMCR in set_pstate?



  drivers/cpufreq/powernv-cpufreq.c | 34 --
  1 file changed, 24 insertions(+), 10 deletions(-)

diff --git a/drivers/cpufreq/powernv-cpufreq.c 
b/drivers/cpufreq/powernv-cpufreq.c
index 4a4380d..bf4bc585 100644
--- a/drivers/cpufreq/powernv-cpufreq.c
+++ b/drivers/cpufreq/powernv-cpufreq.c
@@ -42,6 +42,8 @@
  #define PMSR_PSAFE_ENABLE (1UL << 30)
  #define PMSR_SPR_EM_DISABLE   (1UL << 31)
  #define PMSR_MAX(x)   ((x >> 32) & 0xFF)
+#define PMCR_LPSTATE(x)(((x) >> 48) & 0xFF)
+#define PMCR_GPSTATE(x)(((x) >> 56) & 0xFF)

You define:
#define LPSTATE_SHIFT48
#define GPSTATE_SHIFT56

since we can use this in the set_variants.

Moreover, the LPSTATE, GPSTATE retreival is applicable to both PMCR and PMSR. So
could you rename these functions to GET_LPSTATE, GET_GPSTATE.

Similarly, we might want to have a SET_LPSTATE, SET_GPSTATE and fix
the hard coded values that we have in set_pstate.



  #define MAX_RAMP_DOWN_TIME5120
  /*
@@ -592,7 +594,8 @@ void gpstate_timer_handler(unsigned long data)
  {
struct cpufreq_policy *policy = (struct cpufreq_policy *)data;
struct global_pstate_info *gpstates = policy->driver_data;
-   int gpstate_idx;
+   int gpstate_idx, lpstate_idx;
+   unsigned long val;
unsigned int time_diff = jiffies_to_msecs(jiffies)
- gpstates->last_sampled_time;
struct powernv_smp_call_data freq_data;
@@ -600,21 +603,36 @@ void gpstate_timer_handler(unsigned long data)
if (!spin_trylock(>gpstate_lock))
return;

+   /*
+* If PMCR was last updated was using fast_swtich then
+* We may have wrong in gpstate->last_lpstate_idx
+* value. Hence, read from PMCR to get correct data.
+*/
+   val = get_pmspr(SPRN_PMCR);
+   freq_data.gpstate_id = (s8)PMCR_GPSTATE(val);
+   freq_data.pstate_id = (s8)PMCR_LPSTATE(val);
+   if (freq_data.gpstate_id  == freq_data.pstate_id) {
+   reset_gpstates(policy);
+   spin_unlock(>gpstate_lock);
+   return;
+   }
+
gpstates->last_sampled_time += time_diff;
gpstates->elapsed_time += time_diff;
-   freq_data.pstate_id = idx_to_pstate(gpstates->last_lpstate_idx);

-   if ((gpstates->last_gpstate_idx == gpstates->last_lpstate_idx) ||
-   (gpstates->elapsed_time > MAX_RAMP_DOWN_TIME)) {
+   if (gpstates->elapsed_time > MAX_RAMP_DOWN_TIME) {
gpstate_idx = pstate_to_idx(freq_data.pstate_id);
reset_gpstates(policy);
gpstates->highest_lpstate_idx = gpstate_idx;
} else {
+   lpstate_idx = pstate_to_idx(freq_data.pstate_id);
gpstate_idx = calc_global_pstate(gpstates->elapsed_time,
 gpstates->highest_lpstate_idx,
-gpstates->last_lpstate_idx);
+lpstate_idx);
}
-
+   freq_data.gpstate_id = idx_to_pstate(gpstate_idx);
+   gpstates->last_gpstate_idx = gpstate_idx;
+   gpstates->last_lpstate_idx = lpstate_idx;
/*
 * If local pstate is equal to global pstate, rampdown is over
 * So timer is not required to be queued.
@@ -622,10 +640,6 @@ void gpstate_timer_handler(unsigned long data)
if (gpstate_idx != gpstates->last_lpstate_idx)

[PATCH v3 2/2] cpufreq: powernv: Use PMCR to verify global and local pstate

2016-11-08 Thread Akshay Adiga

As fast_switch() may get called with interrupt disable mode, we cannot
hold a mutex to update the global_pstate_info. So currently, fast_switch()
does not update the global_pstate_info and it will end up with stale data
whenever pstate is updated through fast_switch().

As the gpstate_timer can fire after fast_switch() has updated the pstates,
the timer handler cannot rely on the cached values of local and global
pstate and needs to read it from the PMCR.

Only gpstate_timer_handler() is affected by the stale cached pstate data
beacause either fast_switch() or target_index() routines will be called
for a given govenor, but gpstate_timer can fire after the governor has
changed to schedutil.

Signed-off-by: Akshay Adiga <akshay.ad...@linux.vnet.ibm.com>
Reviewed-by: Gautham R. Shenoy <e...@linux.vnet.ibm.com>
Acked-by: Viresh Kumar <viresh.ku...@linaro.org>
---
Changes from v2 :
- Added generic macros GET_LPSTATE and GET_GPSTATE
 instead of making it specific to PMCR.
Changes from v1 :
- Corrected Commit message
- Type cast pstate values read from PMCR to type s8
- Added Macros to get local and global pstates from PMCR

 drivers/cpufreq/powernv-cpufreq.c | 36 ++--
 1 file changed, 26 insertions(+), 10 deletions(-)

diff --git a/drivers/cpufreq/powernv-cpufreq.c 
b/drivers/cpufreq/powernv-cpufreq.c
index 4a4380d..c82304b 100644
--- a/drivers/cpufreq/powernv-cpufreq.c
+++ b/drivers/cpufreq/powernv-cpufreq.c
@@ -42,6 +42,10 @@
 #define PMSR_PSAFE_ENABLE  (1UL << 30)
 #define PMSR_SPR_EM_DISABLE(1UL << 31)
 #define PMSR_MAX(x)((x >> 32) & 0xFF)
+#define LPSTATE_SHIFT  48
+#define GPSTATE_SHIFT  56
+#define GET_LPSTATE(x) (((x) >> LPSTATE_SHIFT) & 0xFF)
+#define GET_GPSTATE(x) (((x) >> GPSTATE_SHIFT) & 0xFF)
 
 #define MAX_RAMP_DOWN_TIME 5120
 /*
@@ -592,7 +596,8 @@ void gpstate_timer_handler(unsigned long data)
 {
struct cpufreq_policy *policy = (struct cpufreq_policy *)data;
struct global_pstate_info *gpstates = policy->driver_data;
-   int gpstate_idx;
+   int gpstate_idx, lpstate_idx;
+   unsigned long val;
unsigned int time_diff = jiffies_to_msecs(jiffies)
- gpstates->last_sampled_time;
struct powernv_smp_call_data freq_data;
@@ -600,21 +605,36 @@ void gpstate_timer_handler(unsigned long data)
if (!spin_trylock(>gpstate_lock))
return;
 
+   /*
+* If PMCR was last updated was using fast_swtich then
+* We may have wrong in gpstate->last_lpstate_idx
+* value. Hence, read from PMCR to get correct data.
+*/
+   val = get_pmspr(SPRN_PMCR);
+   freq_data.gpstate_id = (s8)GET_GPSTATE(val);
+   freq_data.pstate_id = (s8)GET_LPSTATE(val);
+   if (freq_data.gpstate_id  == freq_data.pstate_id) {
+   reset_gpstates(policy);
+   spin_unlock(>gpstate_lock);
+   return;
+   }
+
gpstates->last_sampled_time += time_diff;
gpstates->elapsed_time += time_diff;
-   freq_data.pstate_id = idx_to_pstate(gpstates->last_lpstate_idx);
 
-   if ((gpstates->last_gpstate_idx == gpstates->last_lpstate_idx) ||
-   (gpstates->elapsed_time > MAX_RAMP_DOWN_TIME)) {
+   if (gpstates->elapsed_time > MAX_RAMP_DOWN_TIME) {
gpstate_idx = pstate_to_idx(freq_data.pstate_id);
reset_gpstates(policy);
gpstates->highest_lpstate_idx = gpstate_idx;
} else {
+   lpstate_idx = pstate_to_idx(freq_data.pstate_id);
gpstate_idx = calc_global_pstate(gpstates->elapsed_time,
 gpstates->highest_lpstate_idx,
-gpstates->last_lpstate_idx);
+lpstate_idx);
}
-
+   freq_data.gpstate_id = idx_to_pstate(gpstate_idx);
+   gpstates->last_gpstate_idx = gpstate_idx;
+   gpstates->last_lpstate_idx = lpstate_idx;
/*
 * If local pstate is equal to global pstate, rampdown is over
 * So timer is not required to be queued.
@@ -622,10 +642,6 @@ void gpstate_timer_handler(unsigned long data)
if (gpstate_idx != gpstates->last_lpstate_idx)
queue_gpstate_timer(gpstates);
 
-   freq_data.gpstate_id = idx_to_pstate(gpstate_idx);
-   gpstates->last_gpstate_idx = pstate_to_idx(freq_data.gpstate_id);
-   gpstates->last_lpstate_idx = pstate_to_idx(freq_data.pstate_id);
-
spin_unlock(>gpstate_lock);
 
/* Timer may get migrated to a different cpu on cpu hot unplug */
-- 
2.5.5

[PATCH v3 1/2] cpufreq: powernv: Adding fast_switch for schedutil

2016-11-08 Thread Akshay Adiga

Adding fast_switch which does light weight operation to set the desired
pstate. Both global and local pstates are set to the same desired pstate.

Signed-off-by: Akshay Adiga <akshay.ad...@linux.vnet.ibm.com>
Reviewed-by: Gautham R. Shenoy <e...@linux.vnet.ibm.com>
Acked-by: Viresh Kumar <viresh.ku...@linaro.org>
---
Changes from v2 :
- No changes.
Changes from v1 :
- Removed unnecessary check for index out of bound.

 drivers/cpufreq/powernv-cpufreq.c | 20 +++-
 1 file changed, 19 insertions(+), 1 deletion(-)

diff --git a/drivers/cpufreq/powernv-cpufreq.c 
b/drivers/cpufreq/powernv-cpufreq.c
index d3ffde8..4a4380d 100644
--- a/drivers/cpufreq/powernv-cpufreq.c
+++ b/drivers/cpufreq/powernv-cpufreq.c
@@ -752,9 +752,12 @@ static int powernv_cpufreq_cpu_init(struct cpufreq_policy 
*policy)
spin_lock_init(>gpstate_lock);
ret = cpufreq_table_validate_and_show(policy, powernv_freqs);
 
-   if (ret < 0)
+   if (ret < 0) {
kfree(policy->driver_data);
+   return ret;
+   }
 
+   policy->fast_switch_possible = true;
return ret;
 }
 
@@ -897,6 +900,20 @@ static void powernv_cpufreq_stop_cpu(struct cpufreq_policy 
*policy)
del_timer_sync(>timer);
 }
 
+static unsigned int powernv_fast_switch(struct cpufreq_policy *policy,
+   unsigned int target_freq)
+{
+   int index;
+   struct powernv_smp_call_data freq_data;
+
+   index = cpufreq_table_find_index_dl(policy, target_freq);
+   freq_data.pstate_id = powernv_freqs[index].driver_data;
+   freq_data.gpstate_id = powernv_freqs[index].driver_data;
+   set_pstate(_data);
+
+   return powernv_freqs[index].frequency;
+}
+
 static struct cpufreq_driver powernv_cpufreq_driver = {
.name   = "powernv-cpufreq",
.flags  = CPUFREQ_CONST_LOOPS,
@@ -904,6 +921,7 @@ static struct cpufreq_driver powernv_cpufreq_driver = {
.exit   = powernv_cpufreq_cpu_exit,
.verify = cpufreq_generic_frequency_table_verify,
.target_index   = powernv_cpufreq_target_index,
+   .fast_switch= powernv_fast_switch,
.get= powernv_cpufreq_get,
.stop_cpu   = powernv_cpufreq_stop_cpu,
.attr   = powernv_cpu_freq_attr,
-- 
2.5.5

[PATCH v2 1/2] cpufreq: powernv: Adding fast_switch for schedutil

2016-11-06 Thread Akshay Adiga

Adding fast_switch which does light weight operation to set the desired
pstate. Both global and local pstates are set to the same desired pstate.

Signed-off-by: Akshay Adiga <akshay.ad...@linux.vnet.ibm.com>
---
Changes from v1 :
- Removed unnecessary check for index out of bound.

 drivers/cpufreq/powernv-cpufreq.c | 20 +++-
 1 file changed, 19 insertions(+), 1 deletion(-)

diff --git a/drivers/cpufreq/powernv-cpufreq.c 
b/drivers/cpufreq/powernv-cpufreq.c
index d3ffde8..4a4380d 100644
--- a/drivers/cpufreq/powernv-cpufreq.c
+++ b/drivers/cpufreq/powernv-cpufreq.c
@@ -752,9 +752,12 @@ static int powernv_cpufreq_cpu_init(struct cpufreq_policy 
*policy)
spin_lock_init(>gpstate_lock);
ret = cpufreq_table_validate_and_show(policy, powernv_freqs);
 
-   if (ret < 0)
+   if (ret < 0) {
kfree(policy->driver_data);
+   return ret;
+   }
 
+   policy->fast_switch_possible = true;
return ret;
 }
 
@@ -897,6 +900,20 @@ static void powernv_cpufreq_stop_cpu(struct cpufreq_policy 
*policy)
del_timer_sync(>timer);
 }
 
+static unsigned int powernv_fast_switch(struct cpufreq_policy *policy,
+   unsigned int target_freq)
+{
+   int index;
+   struct powernv_smp_call_data freq_data;
+
+   index = cpufreq_table_find_index_dl(policy, target_freq);
+   freq_data.pstate_id = powernv_freqs[index].driver_data;
+   freq_data.gpstate_id = powernv_freqs[index].driver_data;
+   set_pstate(_data);
+
+   return powernv_freqs[index].frequency;
+}
+
 static struct cpufreq_driver powernv_cpufreq_driver = {
.name   = "powernv-cpufreq",
.flags  = CPUFREQ_CONST_LOOPS,
@@ -904,6 +921,7 @@ static struct cpufreq_driver powernv_cpufreq_driver = {
.exit   = powernv_cpufreq_cpu_exit,
.verify = cpufreq_generic_frequency_table_verify,
.target_index   = powernv_cpufreq_target_index,
+   .fast_switch= powernv_fast_switch,
.get= powernv_cpufreq_get,
.stop_cpu   = powernv_cpufreq_stop_cpu,
.attr   = powernv_cpu_freq_attr,
-- 
2.5.5

[PATCH v2 2/2] cpufreq: powernv: Use PMCR to verify global and local pstate

2016-11-06 Thread Akshay Adiga

As fast_switch() may get called with interrupt disable mode, we cannot
hold a mutex to update the global_pstate_info. So currently, fast_switch()
does not update the global_pstate_info and it will end up with stale data
whenever pstate is updated through fast_switch().

As the gpstate_timer can fire after fast_switch() has updated the pstates,
the timer handler cannot rely on the cached values of local and global
pstate and needs to read it from the PMCR.

Only gpstate_timer_handler() is affected by the stale cached pstate data
beacause either fast_switch() or target_index() routines will be called
for a given govenor, but gpstate_timer can fire after the governor has
changed to schedutil.


Signed-off-by: Akshay Adiga <akshay.ad...@linux.vnet.ibm.com>
---

Changes from v1 :
- Corrected Commit message
- Type cast pstate values read from PMCR to type s8
- Added Macros to get local and global pstates from PMCR


 drivers/cpufreq/powernv-cpufreq.c | 34 --
 1 file changed, 24 insertions(+), 10 deletions(-)

diff --git a/drivers/cpufreq/powernv-cpufreq.c 
b/drivers/cpufreq/powernv-cpufreq.c
index 4a4380d..bf4bc585 100644
--- a/drivers/cpufreq/powernv-cpufreq.c
+++ b/drivers/cpufreq/powernv-cpufreq.c
@@ -42,6 +42,8 @@
 #define PMSR_PSAFE_ENABLE  (1UL << 30)
 #define PMSR_SPR_EM_DISABLE(1UL << 31)
 #define PMSR_MAX(x)((x >> 32) & 0xFF)
+#define PMCR_LPSTATE(x)(((x) >> 48) & 0xFF)
+#define PMCR_GPSTATE(x)(((x) >> 56) & 0xFF)
 
 #define MAX_RAMP_DOWN_TIME 5120
 /*
@@ -592,7 +594,8 @@ void gpstate_timer_handler(unsigned long data)
 {
struct cpufreq_policy *policy = (struct cpufreq_policy *)data;
struct global_pstate_info *gpstates = policy->driver_data;
-   int gpstate_idx;
+   int gpstate_idx, lpstate_idx;
+   unsigned long val;
unsigned int time_diff = jiffies_to_msecs(jiffies)
- gpstates->last_sampled_time;
struct powernv_smp_call_data freq_data;
@@ -600,21 +603,36 @@ void gpstate_timer_handler(unsigned long data)
if (!spin_trylock(>gpstate_lock))
return;
 
+   /*
+* If PMCR was last updated was using fast_swtich then
+* We may have wrong in gpstate->last_lpstate_idx
+* value. Hence, read from PMCR to get correct data.
+*/
+   val = get_pmspr(SPRN_PMCR);
+   freq_data.gpstate_id = (s8)PMCR_GPSTATE(val);
+   freq_data.pstate_id = (s8)PMCR_LPSTATE(val);
+   if (freq_data.gpstate_id  == freq_data.pstate_id) {
+   reset_gpstates(policy);
+   spin_unlock(>gpstate_lock);
+   return;
+   }
+
gpstates->last_sampled_time += time_diff;
gpstates->elapsed_time += time_diff;
-   freq_data.pstate_id = idx_to_pstate(gpstates->last_lpstate_idx);
 
-   if ((gpstates->last_gpstate_idx == gpstates->last_lpstate_idx) ||
-   (gpstates->elapsed_time > MAX_RAMP_DOWN_TIME)) {
+   if (gpstates->elapsed_time > MAX_RAMP_DOWN_TIME) {
gpstate_idx = pstate_to_idx(freq_data.pstate_id);
reset_gpstates(policy);
gpstates->highest_lpstate_idx = gpstate_idx;
} else {
+   lpstate_idx = pstate_to_idx(freq_data.pstate_id);
gpstate_idx = calc_global_pstate(gpstates->elapsed_time,
 gpstates->highest_lpstate_idx,
-gpstates->last_lpstate_idx);
+lpstate_idx);
}
-
+   freq_data.gpstate_id = idx_to_pstate(gpstate_idx);
+   gpstates->last_gpstate_idx = gpstate_idx;
+   gpstates->last_lpstate_idx = lpstate_idx;
/*
 * If local pstate is equal to global pstate, rampdown is over
 * So timer is not required to be queued.
@@ -622,10 +640,6 @@ void gpstate_timer_handler(unsigned long data)
if (gpstate_idx != gpstates->last_lpstate_idx)
queue_gpstate_timer(gpstates);
 
-   freq_data.gpstate_id = idx_to_pstate(gpstate_idx);
-   gpstates->last_gpstate_idx = pstate_to_idx(freq_data.gpstate_id);
-   gpstates->last_lpstate_idx = pstate_to_idx(freq_data.pstate_id);
-
spin_unlock(>gpstate_lock);
 
/* Timer may get migrated to a different cpu on cpu hot unplug */
-- 
2.5.5

Re: [PATCH 2/2] cpufreq: powernv: Use PMSR to verify global and local pstate

2016-11-06 Thread Akshay Adiga


Thanks Viresh for taking a look at it.

I will make the mentioned changes in the next version of the patch and
will add Shilpa and Gautham to the mail chain.

Regards

Akshay Adiga


On 11/04/2016 12:11 PM, Viresh Kumar wrote:

On 04-11-16, 10:57, Akshay Adiga wrote:

As fast_switch may get called in interrupt disable mode, it does not

s/in interrupt disable mode/with interrupts disabled
s/it does/it may


update the global_pstate_info data structure. Hence the global_pstate_info
has stale data whenever pstate is updated through fast_swtich().

s/has/may have
s/swtich/switch


So the gpstate_timer can fire after a fast_switch() call has update

s/So the/The
s/a fast_swtich() call has update/the fast_switch() call has updated


the pstates to a different value. Hence the timer handler cannot rely
on the cached values of local and global pstate and needs to read it
from the PMSR.

Signed-off-by: Akshay Adiga <akshay.ad...@linux.vnet.ibm.com>

---
  drivers/cpufreq/powernv-cpufreq.c | 32 ++--
  1 file changed, 22 insertions(+), 10 deletions(-)

I am not the best guy to judge the code changes here. Can you please include
Shilpa and Gautham to the mail chain and get there feedback.

Re: [PATCH 1/2] cpufreq: powernv: Adding fast_switch for schedutil

2016-11-06 Thread Akshay Adiga


Thanks Viresh for taking a look at it.

I will make the mentioned changes in the next version of the patch.


Regards

Akshay Adiga


On 11/04/2016 12:03 PM, Viresh Kumar wrote:

On 04-11-16, 10:57, Akshay Adiga wrote:

Adding fast_switch which does light weight operation to
set the desired pstate.

Signed-off-by: Akshay Adiga <akshay.ad...@linux.vnet.ibm.com>
---
  drivers/cpufreq/powernv-cpufreq.c | 22 +-
  1 file changed, 21 insertions(+), 1 deletion(-)

diff --git a/drivers/cpufreq/powernv-cpufreq.c 
b/drivers/cpufreq/powernv-cpufreq.c
index d3ffde8..09a0496 100644
--- a/drivers/cpufreq/powernv-cpufreq.c
+++ b/drivers/cpufreq/powernv-cpufreq.c
@@ -752,9 +752,12 @@ static int powernv_cpufreq_cpu_init(struct cpufreq_policy 
*policy)
spin_lock_init(>gpstate_lock);
ret = cpufreq_table_validate_and_show(policy, powernv_freqs);
  
-	if (ret < 0)

+   if (ret < 0) {
kfree(policy->driver_data);
+   return ret;
+   }
  
+	policy->fast_switch_possible = true;

return ret;
  }
  
@@ -897,6 +900,22 @@ static void powernv_cpufreq_stop_cpu(struct cpufreq_policy *policy)

del_timer_sync(>timer);
  }
  
+static unsigned int powernv_fast_switch(struct cpufreq_policy *policy,

+   unsigned int target_freq)
+{
+   int index;
+   struct powernv_smp_call_data freq_data;
+
+   index = cpufreq_table_find_index_dl(policy, target_freq);
+   if (index < 0 || index >= powernv_pstate_info.nr_pstates)
+   return CPUFREQ_ENTRY_INVALID;

I don't think such a check is required at all. It wouldn't happen without a BUG
in kernel.

+   freq_data.pstate_id = powernv_freqs[index].driver_data;
+   freq_data.gpstate_id = powernv_freqs[index].driver_data;
+   set_pstate(_data);
+
+   return powernv_freqs[index].frequency;
+}
+
  static struct cpufreq_driver powernv_cpufreq_driver = {
.name   = "powernv-cpufreq",
.flags  = CPUFREQ_CONST_LOOPS,
@@ -904,6 +923,7 @@ static struct cpufreq_driver powernv_cpufreq_driver = {
.exit   = powernv_cpufreq_cpu_exit,
.verify = cpufreq_generic_frequency_table_verify,
.target_index   = powernv_cpufreq_target_index,
+   .fast_switch= powernv_fast_switch,
.get= powernv_cpufreq_get,
.stop_cpu   = powernv_cpufreq_stop_cpu,
.attr   = powernv_cpu_freq_attr,
--
2.7.4

[PATCH 1/2] cpufreq: powernv: Adding fast_switch for schedutil

2016-11-03 Thread Akshay Adiga

Adding fast_switch which does light weight operation to
set the desired pstate.

Signed-off-by: Akshay Adiga <akshay.ad...@linux.vnet.ibm.com>
---
 drivers/cpufreq/powernv-cpufreq.c | 22 +-
 1 file changed, 21 insertions(+), 1 deletion(-)

diff --git a/drivers/cpufreq/powernv-cpufreq.c 
b/drivers/cpufreq/powernv-cpufreq.c
index d3ffde8..09a0496 100644
--- a/drivers/cpufreq/powernv-cpufreq.c
+++ b/drivers/cpufreq/powernv-cpufreq.c
@@ -752,9 +752,12 @@ static int powernv_cpufreq_cpu_init(struct cpufreq_policy 
*policy)
spin_lock_init(>gpstate_lock);
ret = cpufreq_table_validate_and_show(policy, powernv_freqs);
 
-   if (ret < 0)
+   if (ret < 0) {
kfree(policy->driver_data);
+   return ret;
+   }
 
+   policy->fast_switch_possible = true;
return ret;
 }
 
@@ -897,6 +900,22 @@ static void powernv_cpufreq_stop_cpu(struct cpufreq_policy 
*policy)
del_timer_sync(>timer);
 }
 
+static unsigned int powernv_fast_switch(struct cpufreq_policy *policy,
+   unsigned int target_freq)
+{
+   int index;
+   struct powernv_smp_call_data freq_data;
+
+   index = cpufreq_table_find_index_dl(policy, target_freq);
+   if (index < 0 || index >= powernv_pstate_info.nr_pstates)
+   return CPUFREQ_ENTRY_INVALID;
+   freq_data.pstate_id = powernv_freqs[index].driver_data;
+   freq_data.gpstate_id = powernv_freqs[index].driver_data;
+   set_pstate(_data);
+
+   return powernv_freqs[index].frequency;
+}
+
 static struct cpufreq_driver powernv_cpufreq_driver = {
.name   = "powernv-cpufreq",
.flags  = CPUFREQ_CONST_LOOPS,
@@ -904,6 +923,7 @@ static struct cpufreq_driver powernv_cpufreq_driver = {
.exit   = powernv_cpufreq_cpu_exit,
.verify = cpufreq_generic_frequency_table_verify,
.target_index   = powernv_cpufreq_target_index,
+   .fast_switch= powernv_fast_switch,
.get= powernv_cpufreq_get,
.stop_cpu   = powernv_cpufreq_stop_cpu,
.attr   = powernv_cpu_freq_attr,
-- 
2.7.4

[PATCH 2/2] cpufreq: powernv: Use PMSR to verify global and local pstate

2016-11-03 Thread Akshay Adiga

As fast_switch may get called in interrupt disable mode, it does not
update the global_pstate_info data structure. Hence the global_pstate_info
has stale data whenever pstate is updated through fast_swtich().

So the gpstate_timer can fire after a fast_switch() call has update
the pstates to a different value. Hence the timer handler cannot rely
on the cached values of local and global pstate and needs to read it
from the PMSR. 

Signed-off-by: Akshay Adiga <akshay.ad...@linux.vnet.ibm.com>

---
 drivers/cpufreq/powernv-cpufreq.c | 32 ++--
 1 file changed, 22 insertions(+), 10 deletions(-)

diff --git a/drivers/cpufreq/powernv-cpufreq.c 
b/drivers/cpufreq/powernv-cpufreq.c
index 09a0496..57713b5 100644
--- a/drivers/cpufreq/powernv-cpufreq.c
+++ b/drivers/cpufreq/powernv-cpufreq.c
@@ -592,7 +592,8 @@ void gpstate_timer_handler(unsigned long data)
 {
struct cpufreq_policy *policy = (struct cpufreq_policy *)data;
struct global_pstate_info *gpstates = policy->driver_data;
-   int gpstate_idx;
+   int gpstate_idx, lpstate_idx;
+   unsigned long val;
unsigned int time_diff = jiffies_to_msecs(jiffies)
- gpstates->last_sampled_time;
struct powernv_smp_call_data freq_data;
@@ -600,21 +601,36 @@ void gpstate_timer_handler(unsigned long data)
if (!spin_trylock(>gpstate_lock))
return;
 
+   /*
+* If PMCR was last updated was using fast_swtich then
+* We may have wrong in gpstate->last_lpstate_idx
+* value. Hence, read from PMCR to get correct data.
+*/
+   val = get_pmspr(SPRN_PMCR);
+   freq_data.gpstate_id = (val >> (56)) & 0xFF;
+   freq_data.pstate_id = (val >> (48)) & 0xFF;
+   if (freq_data.gpstate_id  == freq_data.pstate_id) {
+   reset_gpstates(policy);
+   spin_unlock(>gpstate_lock);
+   return;
+   }
+
gpstates->last_sampled_time += time_diff;
gpstates->elapsed_time += time_diff;
-   freq_data.pstate_id = idx_to_pstate(gpstates->last_lpstate_idx);
 
-   if ((gpstates->last_gpstate_idx == gpstates->last_lpstate_idx) ||
-   (gpstates->elapsed_time > MAX_RAMP_DOWN_TIME)) {
+   if (gpstates->elapsed_time > MAX_RAMP_DOWN_TIME) {
gpstate_idx = pstate_to_idx(freq_data.pstate_id);
reset_gpstates(policy);
gpstates->highest_lpstate_idx = gpstate_idx;
} else {
+   lpstate_idx = pstate_to_idx(freq_data.pstate_id);
gpstate_idx = calc_global_pstate(gpstates->elapsed_time,
 gpstates->highest_lpstate_idx,
-gpstates->last_lpstate_idx);
+lpstate_idx);
}
-
+   freq_data.gpstate_id = idx_to_pstate(gpstate_idx);
+   gpstates->last_gpstate_idx = gpstate_idx;
+   gpstates->last_lpstate_idx = lpstate_idx;
/*
 * If local pstate is equal to global pstate, rampdown is over
 * So timer is not required to be queued.
@@ -622,10 +638,6 @@ void gpstate_timer_handler(unsigned long data)
if (gpstate_idx != gpstates->last_lpstate_idx)
queue_gpstate_timer(gpstates);
 
-   freq_data.gpstate_id = idx_to_pstate(gpstate_idx);
-   gpstates->last_gpstate_idx = pstate_to_idx(freq_data.gpstate_id);
-   gpstates->last_lpstate_idx = pstate_to_idx(freq_data.pstate_id);
-
spin_unlock(>gpstate_lock);
 
/* Timer may get migrated to a different cpu on cpu hot unplug */
-- 
2.7.4

Re: [PATCH] Work around for enabling CONFIG_CMDLINE on ppc64le

2016-09-27 Thread Akshay Adiga


Hi Michael,

Here is the link to the bug raised on launchpad.
https://bugs.launchpad.net/ubuntu/+source/gcc-5/+bug/1628207


On 09/23/2016 09:45 AM, Akshay Adiga wrote:

Hi Michael,

Anton found this bug and raised it against gcc v7.0 and a fix is 
available

 in upstream gcc.

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71709

Currently, gcc v5.4.0  and v6.1.1 shipped with Ubuntu 16.04 and 16.10  
respectively,

 are hitting this problem.

I have also raised bug against Ubuntu for fixing gcc for 16.04.

https://bugzilla.linux.ibm.com/show_bug.cgi?id=146668


On 09/22/2016 03:51 PM, Michael Ellerman wrote:

Akshay Adiga <akshay.ad...@linux.vnet.ibm.com> writes:


Observed that boot arguments (passed as CONFIG_CMDLINE)  are not being
picked up by kernel while using gcc-ppc64-linux-gnu v5.4.0 and v6.1.1.
While it works as expected with v5.3.1 .

Found that in init/main.c in  setup_command_line() the pointers 
passed to

strcpy() is messed up.

Hi Akshay,

Thanks for debugging this.


The problem goes away when compiler optimization is restricted to -O1.
diff --git a/init/main.c b/init/main.c
index a8a58e2..4259c42 100644
--- a/init/main.c
+++ b/init/main.c
@@ -358,7 +358,13 @@ static inline void smp_prepare_cpus(unsigned 
int maxcpus) { }

   * parsing is performed in place, and we should allow a component to
   * store reference of name/value for future reference.
   */
-static void __init setup_command_line(char *command_line)
+static void __init
+#ifdef CONFIG_PPC64
+#if  GCC_VERSION > 50301
+__attribute__((optimize("-O1")))
+#endif
+#endif
+setup_command_line(char *command_line)
  {
  saved_command_line =
  memblock_virt_alloc(strlen(boot_command_line) + 1, 0);

But I can't merge that patch.

Our options are one or both of:
  - get GCC fixed and backport the fix to the compilers we care about.
  - blacklist the broken compiler versions.

Is there a GCC bug filed for this?

cheers

Re: [PATCH] Work around for enabling CONFIG_CMDLINE on ppc64le

2016-09-22 Thread Akshay Adiga


Hi Michael,

Anton found this bug and raised it against gcc v7.0 and a fix is available
 in upstream gcc.

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71709

Currently, gcc v5.4.0  and v6.1.1 shipped with Ubuntu 16.04 and 16.10  
respectively,
 are hitting this problem.

I have also raised bug against Ubuntu for fixing gcc for 16.04.

https://bugzilla.linux.ibm.com/show_bug.cgi?id=146668


On 09/22/2016 03:51 PM, Michael Ellerman wrote:

Akshay Adiga <akshay.ad...@linux.vnet.ibm.com> writes:


Observed that boot arguments (passed as CONFIG_CMDLINE)  are not being
picked up by kernel while using gcc-ppc64-linux-gnu v5.4.0 and v6.1.1.
While it works as expected with v5.3.1 .

Found that in init/main.c in  setup_command_line() the pointers passed to
strcpy() is messed up.

Hi Akshay,

Thanks for debugging this.


The problem goes away when compiler optimization is restricted to -O1.
diff --git a/init/main.c b/init/main.c
index a8a58e2..4259c42 100644
--- a/init/main.c
+++ b/init/main.c
@@ -358,7 +358,13 @@ static inline void smp_prepare_cpus(unsigned int maxcpus) 
{ }
   * parsing is performed in place, and we should allow a component to
   * store reference of name/value for future reference.
   */
-static void __init setup_command_line(char *command_line)
+static void __init
+#ifdef CONFIG_PPC64
+   #if  GCC_VERSION > 50301
+   __attribute__((optimize("-O1")))
+   #endif
+#endif
+   setup_command_line(char *command_line)
  {
saved_command_line =
memblock_virt_alloc(strlen(boot_command_line) + 1, 0);

But I can't merge that patch.

Our options are one or both of:
  - get GCC fixed and backport the fix to the compilers we care about.
  - blacklist the broken compiler versions.

Is there a GCC bug filed for this?

cheers

[PATCH] Work around for enabling CONFIG_CMDLINE on ppc64le

2016-09-22 Thread Akshay Adiga

Observed that boot arguments (passed as CONFIG_CMDLINE)  are not being
picked up by kernel while using gcc-ppc64-linux-gnu v5.4.0 and v6.1.1.
While it works as expected with v5.3.1 .

Found that in init/main.c in  setup_command_line() the pointers passed to
strcpy() is messed up.

source for setup_command_line from init/main.c:
void setup_command_line(char *command_line)
{
saved_command_line =
memblock_virt_alloc(strlen(boot_command_line) + 1, 0);
initcall_command_line =
memblock_virt_alloc(strlen(boot_command_line) + 1, 0);
static_command_line = memblock_virt_alloc(strlen(command_line) + 1, 0);
strcpy(saved_command_line, boot_command_line);
strcpy(static_command_line, command_line);
}

Following is the asm dump for strcpy:

char *strcpy(char *dest, const char *src)
{
c0161408:   ff ff 84 38 addir4,r4,-1
c016140c:   ff ff 43 39 addir10,r3,-1
char *tmp = dest;

while ((*dest++ = *src++) != '\0')
c0161410:   01 00 24 8d lbzur9,1(r4)
c0161414:   00 00 a9 2f cmpdi   cr7,r9,0
c0161418:   01 00 2a 9d stbur9,1(r10)
c016141c:   f4 ff 9e 40 bne cr7,c0161410
<strcpy+0x8>
/* nothing */;
return tmp;
}

Following are the asm dump for the working and non working binaries which
concluded that the argument for the second strcpy() is not loaded into r3 and
is getting clobbered with the return value of previous strcpy().

Not Working asm dump :

c03308d8:   38 c4 6a f8 std r3,-15304(r10)
strcpy(saved_command_line, boot_command_line);
c03308dc:   06 00 62 3c addis   r3,r2,6
c03308e0:   28 c4 63 e8 ld  r3,-15320(r3)
c03308e4:   25 0b e3 4b bl  c0161408

c03308e8:   00 00 00 60 nop
strcpy(static_command_line, command_line);
c03308ec:   78 f3 c4 7f mr  r4,r30
c03308f0:   19 0b e3 4b bl  c0161408

c03308f4:   00 00 00 60 nop

Working asm dump :

c03308d4:   38 c4 c3 fb std r30,-15304(r3)
strcpy(saved_command_line, boot_command_line);
c03308d8:   06 00 62 3c addis   r3,r2,6
c03308dc:   28 c4 63 e8 ld  r3,-15320(r3)
c03308e0:   6d 08 e3 4b bl  c016114c

c03308e4:   00 00 00 60 nop
strcpy(static_command_line, command_line);
c03308e8:   78 eb a4 7f mr  r4,r29
c03308ec:   78 f3 c3 7f mr  r3,r30
c03308f0:   5d 08 e3 4b bl  c016114c

c03308f4:   00 00 00 60 nop

The problem goes away when compiler optimization is restricted to -O1.

Reported-by: Madhavan Srinivasan <ma...@linux.vnet.ibm.com>
Signed-off-by: Akshay Adiga <akshay.ad...@linux.vnet.ibm.com>
---
 init/main.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/init/main.c b/init/main.c
index a8a58e2..4259c42 100644
--- a/init/main.c
+++ b/init/main.c
@@ -358,7 +358,13 @@ static inline void smp_prepare_cpus(unsigned int maxcpus) 
{ }
  * parsing is performed in place, and we should allow a component to
  * store reference of name/value for future reference.
  */
-static void __init setup_command_line(char *command_line)
+static void __init
+#ifdef CONFIG_PPC64
+   #if  GCC_VERSION > 50301
+   __attribute__((optimize("-O1")))
+   #endif
+#endif
+   setup_command_line(char *command_line)
 {
saved_command_line =
memblock_virt_alloc(strlen(boot_command_line) + 1, 0);
-- 
2.5.5

[PATCH] cpufreq: powernv: Fix crash in gpstate_timer_handler

2016-08-04 Thread Akshay Adiga

'commit 09ca4c9b5958 ("cpufreq: powernv: Replacing pstate_id with
frequency table index")' changes calc_global_pstate() to use
cpufreq_table index instead of pstate_id.

But in gpstate_timer_handler() pstate_id was being passed instead
of cpufreq_table index, which caused the index_to_pstate() to access
out of bound indices, leading to this crash.

Adding sanity check for index and pstate, to ensure only valid pstate
and index values are returned.

Call Trace:
[c0078d66b130] [c011d224] __free_irq+0x234/0x360
(unreliable)
[c0078d66b1c0] [c011d44c] free_irq+0x6c/0xa0
[c0078d66b1f0] [c006c4f8] opal_event_shutdown+0x88/0xd0
[c0078d66b230] [c0067a4c] opal_shutdown+0x1c/0x90
[c0078d66b260] [c0063a00] pnv_shutdown+0x20/0x40
[c0078d66b280] [c0021538] machine_restart+0x38/0x90
[c78d66b310] [c0965ea0] panic+0x284/0x300
[c0078d66b3a0] [c001f508] die+0x388/0x450
[c0078d66b430] [c0045a50] bad_page_fault+0xd0/0x140
[c0078d66b4a0] [c0008964] handle_page_fault+0x2c/0x30
   interrupt: 300 at gpstate_timer_handler+0x150/0x260
LR = gpstate_timer_handler+0x130/0x260
[c0078d66b7f0] [c0132b58] call_timer_fn+0x58/0x1c0
[c0078d66b880] [c0132e20] expire_timers+0x130/0x1d0
[c0078d66b8f0] [c0133068] run_timer_softirq+0x1a8/0x230
[c0078d66b980] [c00b535c] __do_softirq+0x18c/0x400
[c0078d66ba70] [c00b5828] irq_exit+0xc8/0x100
[c0078d66ba90] [c001e214] timer_interrupt+0xa4/0xe0
[c0078d66bac0] [c00027d0] decrementer_common+0x150/0x180
   interrupt: 901 at arch_local_irq_restore+0x74/0x90
  0] [c0106b34] call_cpuidle+0x44/0x90
[c0078d66be50] [c010708c] cpu_startup_entry+0x38c/0x460
[c0078d66bf20] [c003d930] start_secondary+0x330/0x380
[c0078d66bf90] [c0008e6c] start_secondary_prolog+0x10/0x14

Fixes: 08d27eb ("cpufreq: powernv: Replacing pstate_id with
frequency table index")
Reported-by: Madhavan Srinivasan <ma...@linux.vnet.ibm.com>
Signed-off-by: Akshay Adiga <akshay.ad...@linux.vnet.ibm.com>
---
 drivers/cpufreq/powernv-cpufreq.c | 21 -
 1 file changed, 20 insertions(+), 1 deletion(-)

diff --git a/drivers/cpufreq/powernv-cpufreq.c 
b/drivers/cpufreq/powernv-cpufreq.c
index 87796e0..d3ffde8 100644
--- a/drivers/cpufreq/powernv-cpufreq.c
+++ b/drivers/cpufreq/powernv-cpufreq.c
@@ -145,11 +145,30 @@ static struct powernv_pstate_info {
 /* Use following macros for conversions between pstate_id and index */
 static inline int idx_to_pstate(unsigned int i)
 {
+   if (unlikely(i >= powernv_pstate_info.nr_pstates)) {
+   pr_warn_once("index %u is out of bound\n", i);
+   return powernv_freqs[powernv_pstate_info.nominal].driver_data;
+   }
+
return powernv_freqs[i].driver_data;
 }
 
 static inline unsigned int pstate_to_idx(int pstate)
 {
+   int min = powernv_freqs[powernv_pstate_info.min].driver_data;
+   int max = powernv_freqs[powernv_pstate_info.max].driver_data;
+
+   if (min > 0) {
+   if (unlikely((pstate < max) || (pstate > min))) {
+   pr_warn_once("pstate %d is out of bound\n", pstate);
+   return powernv_pstate_info.nominal;
+   }
+   } else {
+   if (unlikely((pstate > max) || (pstate < min))) {
+   pr_warn_once("pstate %d is out of bound\n", pstate);
+   return powernv_pstate_info.nominal;
+   }
+   }
/*
 * abs() is deliberately used so that is works with
 * both monotonically increasing and decreasing
@@ -593,7 +612,7 @@ void gpstate_timer_handler(unsigned long data)
} else {
gpstate_idx = calc_global_pstate(gpstates->elapsed_time,
 gpstates->highest_lpstate_idx,
-freq_data.pstate_id);
+gpstates->last_lpstate_idx);
}
 
/*
-- 
2.1.4

Re: [PATCH v2] cpufreq: powernv: Replacing pstate_id with frequency table index

2016-07-06 Thread Akshay Adiga




On 06/30/2016 11:53 AM, Akshay Adiga wrote:

Refactoring code to use frequency table index instead of pstate_id.
This abstraction will make the code independent of the pstate values.

- No functional changes
- The highest frequency is at frequency table index 0 and the frequency
   decreases as the index increases.
- Macros pstates_to_idx() and idx_to_pstate() can be used for conversion
   between pstate_id and index.
- powernv_pstate_info now contains frequency table index to min, max and
   nominal frequency (instead of pstate_ids)
- global_pstate_info new stores index values instead pstate ids.
- variables renamed as *_idx which now store index instead of pstate

Signed-off-by: Akshay Adiga <akshay.ad...@linux.vnet.ibm.com>
Reviewed-by: Gautham R. Shenoy <e...@linux.vnet.ibm.com>
---
Changes from v1:
   - changed macro names from get_pstate()/ get_index() to
idx_to_pstate()/ pstate_to_idx()
   - Renamed variables that store index instead of pstate_id to *_idx
   - Retained previous printk's

v1 : http://marc.info/?l=linux-pm=146677701501225=1

  drivers/cpufreq/powernv-cpufreq.c | 177 ++
  1 file changed, 102 insertions(+), 75 deletions(-)

diff --git a/drivers/cpufreq/powernv-cpufreq.c 
b/drivers/cpufreq/powernv-cpufreq.c
index b29c5c2..72c91d8 100644
--- a/drivers/cpufreq/powernv-cpufreq.c
+++ b/drivers/cpufreq/powernv-cpufreq.c
@@ -64,12 +64,14 @@
  /**
   * struct global_pstate_info -Per policy data structure to maintain 
history of
   *global pstates
- * @highest_lpstate:   The local pstate from which we are ramping down
+ * @highest_lpstate_idx:   The local pstate index from which we are
+ * ramping down
   * @elapsed_time: Time in ms spent in ramping down from
- * highest_lpstate
+ * highest_lpstate_idx
   * @last_sampled_time:Time from boot in ms when global 
pstates were
   *last set
- * @last_lpstate,last_gpstate: Last set values for local and global pstates
+ * @last_lpstate_idx,  Last set value of local pstate and global
+ * last_gpstate_idxpstate in terms of cpufreq table index
   * @timer:Is used for ramping down if cpu goes idle for
   *a long time with global pstate held high
   * @gpstate_lock: A spinlock to maintain synchronization between
@@ -77,11 +79,11 @@
   *governer's target_index calls
   */
  struct global_pstate_info {
-   int highest_lpstate;
+   int highest_lpstate_idx;
unsigned int elapsed_time;
unsigned int last_sampled_time;
-   int last_lpstate;
-   int last_gpstate;
+   int last_lpstate_idx;
+   int last_gpstate_idx;
spinlock_t gpstate_lock;
struct timer_list timer;
  };
@@ -124,29 +126,47 @@ static int nr_chips;
  static DEFINE_PER_CPU(struct chip *, chip_info);
  
  /*

- * Note: The set of pstates consists of contiguous integers, the
- * smallest of which is indicated by powernv_pstate_info.min, the
- * largest of which is indicated by powernv_pstate_info.max.
+ * Note:
+ * The set of pstates consists of contiguous integers.
+ * powernv_pstate_info stores the index of the frequency table for
+ * max, min and nominal frequencies. It also stores number of
+ * available frequencies.
   *
- * The nominal pstate is the highest non-turbo pstate in this
- * platform. This is indicated by powernv_pstate_info.nominal.
+ * powernv_pstate_info.nominal indicates the index to the highest
+ * non-turbo frequency.
   */
  static struct powernv_pstate_info {
-   int min;
-   int max;
-   int nominal;
-   int nr_pstates;
+   unsigned int min;
+   unsigned int max;
+   unsigned int nominal;
+   unsigned int nr_pstates;
  } powernv_pstate_info;
  
+/* Use following macros for conversions between pstate_id and index */

+static inline int idx_to_pstate(unsigned int i)
+{
+   return powernv_freqs[i].driver_data;
+}
+
+static inline unsigned int pstate_to_idx(int pstate)
+{
+   /*
+* abs() is deliberately used so that is works with
+* both monotonically increasing and decreasing
+* pstate values
+*/
+   return abs(pstate - idx_to_pstate(powernv_pstate_info.max));
+}
+
  static inline void reset_gpstates(struct cpufreq_policy *policy)
  {
struct global_pstate_info *gpstates = policy->driver_data;
  
-	gpstates->highest_lpstate = 0;

+   gpstates->highest_lpstate_idx = 0;
gpstates->elapsed_time = 0;
gpstates->last_sampled_time = 0;
-   gpstates->last_lpstate = 0;
-   gpstates->last_gpstate = 0;
+   gpstates->last_lpstate_idx = 0;
+   gpstates->last_gpstate_idx = 0;
  }
  
  /*

@@ -156,9 +176,10 @@ static inline void reset_gpstates(struc

[PATCH v2] cpufreq: powernv: Replacing pstate_id with frequency table index

2016-06-30 Thread Akshay Adiga

Refactoring code to use frequency table index instead of pstate_id.
This abstraction will make the code independent of the pstate values.

- No functional changes
- The highest frequency is at frequency table index 0 and the frequency
  decreases as the index increases.
- Macros pstates_to_idx() and idx_to_pstate() can be used for conversion
  between pstate_id and index.
- powernv_pstate_info now contains frequency table index to min, max and
  nominal frequency (instead of pstate_ids)
- global_pstate_info new stores index values instead pstate ids.
- variables renamed as *_idx which now store index instead of pstate

Signed-off-by: Akshay Adiga <akshay.ad...@linux.vnet.ibm.com>
Reviewed-by: Gautham R. Shenoy <e...@linux.vnet.ibm.com>
---
Changes from v1:
  - changed macro names from get_pstate()/ get_index() to 
idx_to_pstate()/ pstate_to_idx()
  - Renamed variables that store index instead of pstate_id to *_idx
  - Retained previous printk's 

v1 : http://marc.info/?l=linux-pm=146677701501225=1

 drivers/cpufreq/powernv-cpufreq.c | 177 ++
 1 file changed, 102 insertions(+), 75 deletions(-)

diff --git a/drivers/cpufreq/powernv-cpufreq.c 
b/drivers/cpufreq/powernv-cpufreq.c
index b29c5c2..72c91d8 100644
--- a/drivers/cpufreq/powernv-cpufreq.c
+++ b/drivers/cpufreq/powernv-cpufreq.c
@@ -64,12 +64,14 @@
 /**
  * struct global_pstate_info - Per policy data structure to maintain history of
  * global pstates
- * @highest_lpstate:   The local pstate from which we are ramping down
+ * @highest_lpstate_idx:   The local pstate index from which we are
+ * ramping down
  * @elapsed_time:  Time in ms spent in ramping down from
- * highest_lpstate
+ * highest_lpstate_idx
  * @last_sampled_time: Time from boot in ms when global pstates were
  * last set
- * @last_lpstate,last_gpstate: Last set values for local and global pstates
+ * @last_lpstate_idx,  Last set value of local pstate and global
+ * last_gpstate_idxpstate in terms of cpufreq table index
  * @timer: Is used for ramping down if cpu goes idle for
  * a long time with global pstate held high
  * @gpstate_lock:  A spinlock to maintain synchronization between
@@ -77,11 +79,11 @@
  * governer's target_index calls
  */
 struct global_pstate_info {
-   int highest_lpstate;
+   int highest_lpstate_idx;
unsigned int elapsed_time;
unsigned int last_sampled_time;
-   int last_lpstate;
-   int last_gpstate;
+   int last_lpstate_idx;
+   int last_gpstate_idx;
spinlock_t gpstate_lock;
struct timer_list timer;
 };
@@ -124,29 +126,47 @@ static int nr_chips;
 static DEFINE_PER_CPU(struct chip *, chip_info);
 
 /*
- * Note: The set of pstates consists of contiguous integers, the
- * smallest of which is indicated by powernv_pstate_info.min, the
- * largest of which is indicated by powernv_pstate_info.max.
+ * Note:
+ * The set of pstates consists of contiguous integers.
+ * powernv_pstate_info stores the index of the frequency table for
+ * max, min and nominal frequencies. It also stores number of
+ * available frequencies.
  *
- * The nominal pstate is the highest non-turbo pstate in this
- * platform. This is indicated by powernv_pstate_info.nominal.
+ * powernv_pstate_info.nominal indicates the index to the highest
+ * non-turbo frequency.
  */
 static struct powernv_pstate_info {
-   int min;
-   int max;
-   int nominal;
-   int nr_pstates;
+   unsigned int min;
+   unsigned int max;
+   unsigned int nominal;
+   unsigned int nr_pstates;
 } powernv_pstate_info;
 
+/* Use following macros for conversions between pstate_id and index */
+static inline int idx_to_pstate(unsigned int i)
+{
+   return powernv_freqs[i].driver_data;
+}
+
+static inline unsigned int pstate_to_idx(int pstate)
+{
+   /*
+* abs() is deliberately used so that is works with
+* both monotonically increasing and decreasing
+* pstate values
+*/
+   return abs(pstate - idx_to_pstate(powernv_pstate_info.max));
+}
+
 static inline void reset_gpstates(struct cpufreq_policy *policy)
 {
struct global_pstate_info *gpstates = policy->driver_data;
 
-   gpstates->highest_lpstate = 0;
+   gpstates->highest_lpstate_idx = 0;
gpstates->elapsed_time = 0;
gpstates->last_sampled_time = 0;
-   gpstates->last_lpstate = 0;
-   gpstates->last_gpstate = 0;
+   gpstates->last_lpstate_idx = 0;
+   gpstates->last_gpstate_idx = 0;
 }
 
 /*
@@ -156,9 +176,10 @@ static inline void reset_gpstates(struct cpufreq_policy 
*policy)
 static int init_powernv_pstates(void)
 {
struct device_nod

Re: [PATCH] cpufreq: powernv: Replacing pstate_id with frequency table index

2016-06-27 Thread Akshay Adiga


Hi viresh,

My apologies. I realize that i have messed it up a quite a few places. Surely 
with the checkpatch as well. I will send a v2 with corrections.

On 06/27/2016 12:00 PM, Viresh Kumar wrote:


Hi Akshay,

Did you try running checkpatch for this?

On 24-06-16, 19:33, Akshay Adiga wrote:

diff --git a/drivers/cpufreq/powernv-cpufreq.c 
b/drivers/cpufreq/powernv-cpufreq.c
index b29c5c2..f6ce6f0 100644
--- a/drivers/cpufreq/powernv-cpufreq.c
+++ b/drivers/cpufreq/powernv-cpufreq.c
@@ -43,6 +43,7 @@
  #define PMSR_SPR_EM_DISABLE   (1UL << 31)
  #define PMSR_MAX(x)   ((x >> 32) & 0xFF)
  
+

?


  #define MAX_RAMP_DOWN_TIME5120
  /*
   * On an idle system we want the global pstate to ramp-down from max value to
@@ -124,20 +125,29 @@ static int nr_chips;
  static DEFINE_PER_CPU(struct chip *, chip_info);
  
  /*

- * Note: The set of pstates consists of contiguous integers, the
- * smallest of which is indicated by powernv_pstate_info.min, the
- * largest of which is indicated by powernv_pstate_info.max.
+ * Note: The set of pstates consists of contiguous integers,
+ *
+ * powernv_pstate_info stores the index of the frequency table
+ *  instead of pstate itself for each of the pstates referred
   *
   * The nominal pstate is the highest non-turbo pstate in this
   * platform. This is indicated by powernv_pstate_info.nominal.
   */
  static struct powernv_pstate_info {
-   int min;
-   int max;
-   int nominal;
-   int nr_pstates;
+   unsigned int min;
+   unsigned int max;
+   unsigned int nominal;
+   unsigned int nr_pstates;
  } powernv_pstate_info;
  
+/* Use following macros for conversions between pstate_id and index */

+static inline int get_pstate(unsigned int i) {

Read coding-styles please on how to write functions.


+   return powernv_freqs[i].driver_data;
+}

Add a blank line here please.


+static inline unsigned int get_index(int pstate) {
+   return abs(pstate - get_pstate(powernv_pstate_info.max));
+}
+
  static inline void reset_gpstates(struct cpufreq_policy *policy)
  {
struct global_pstate_info *gpstates = policy->driver_data;
@@ -208,23 +218,28 @@ static int init_powernv_pstates(void)
return -ENODEV;
}
  
+	powernv_pstate_info.nr_pstates = nr_pstates;

pr_debug("NR PStates %d\n", nr_pstates);
for (i = 0; i < nr_pstates; i++) {
u32 id = be32_to_cpu(pstate_ids[i]);
u32 freq = be32_to_cpu(pstate_freqs[i]);
  
-		pr_debug("PState id %d freq %d MHz\n", id, freq);

?


powernv_freqs[i].frequency = freq * 1000; /* kHz */
powernv_freqs[i].driver_data = id;

Will it be possible for Shilpa who was earlier on this to review this patch? As
we don't really have great knowledge of the internals of this driver.



___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH] cpufreq: powernv: Replacing pstate_id with frequency table index

2016-06-24 Thread Akshay Adiga

Refactoring code to use frequency table index instead of pstate_id.
This abstraction will make the code independent of the pstate values.

- No functional changes
- The highest frequency is at frequency table index 0 and the frequency
  decreases as the index increases.
- Macros get_index() and get_pstate() can be used for conversion between
  pstate_id and index.
- powernv_pstate_info now contains frequency table index to min, max and
  nominal frequency (instead of pstate_ids)

Signed-off-by: Akshay Adiga <akshay.ad...@linux.vnet.ibm.com>
---
 drivers/cpufreq/powernv-cpufreq.c | 107 ++
 1 file changed, 61 insertions(+), 46 deletions(-)

diff --git a/drivers/cpufreq/powernv-cpufreq.c 
b/drivers/cpufreq/powernv-cpufreq.c
index b29c5c2..f6ce6f0 100644
--- a/drivers/cpufreq/powernv-cpufreq.c
+++ b/drivers/cpufreq/powernv-cpufreq.c
@@ -43,6 +43,7 @@
 #define PMSR_SPR_EM_DISABLE(1UL << 31)
 #define PMSR_MAX(x)((x >> 32) & 0xFF)
 
+
 #define MAX_RAMP_DOWN_TIME 5120
 /*
  * On an idle system we want the global pstate to ramp-down from max value to
@@ -124,20 +125,29 @@ static int nr_chips;
 static DEFINE_PER_CPU(struct chip *, chip_info);
 
 /*
- * Note: The set of pstates consists of contiguous integers, the
- * smallest of which is indicated by powernv_pstate_info.min, the
- * largest of which is indicated by powernv_pstate_info.max.
+ * Note: The set of pstates consists of contiguous integers,
+ *
+ * powernv_pstate_info stores the index of the frequency table
+ *  instead of pstate itself for each of the pstates referred
  *
  * The nominal pstate is the highest non-turbo pstate in this
  * platform. This is indicated by powernv_pstate_info.nominal.
  */
 static struct powernv_pstate_info {
-   int min;
-   int max;
-   int nominal;
-   int nr_pstates;
+   unsigned int min;
+   unsigned int max;
+   unsigned int nominal;
+   unsigned int nr_pstates;
 } powernv_pstate_info;
 
+/* Use following macros for conversions between pstate_id and index */
+static inline int get_pstate(unsigned int i) {
+   return powernv_freqs[i].driver_data;
+}
+static inline unsigned int get_index(int pstate) {
+   return abs(pstate - get_pstate(powernv_pstate_info.max));
+}
+
 static inline void reset_gpstates(struct cpufreq_policy *policy)
 {
struct global_pstate_info *gpstates = policy->driver_data;
@@ -208,23 +218,28 @@ static int init_powernv_pstates(void)
return -ENODEV;
}
 
+   powernv_pstate_info.nr_pstates = nr_pstates;
pr_debug("NR PStates %d\n", nr_pstates);
for (i = 0; i < nr_pstates; i++) {
u32 id = be32_to_cpu(pstate_ids[i]);
u32 freq = be32_to_cpu(pstate_freqs[i]);
 
-   pr_debug("PState id %d freq %d MHz\n", id, freq);
powernv_freqs[i].frequency = freq * 1000; /* kHz */
powernv_freqs[i].driver_data = id;
+
+   if (id == pstate_max)
+   powernv_pstate_info.max = i;
+   else if (id == pstate_nominal)
+   powernv_pstate_info.nominal = i;
+   else if (id == pstate_min)
+   powernv_pstate_info.min = i;
}
+
+   pr_info("pstate_info: min %d nominal %d max %d\n",
+   powernv_pstate_info.min, powernv_pstate_info.nominal,
+   powernv_pstate_info.max);
/* End of list marker entry */
powernv_freqs[i].frequency = CPUFREQ_TABLE_END;
-
-   powernv_pstate_info.min = pstate_min;
-   powernv_pstate_info.max = pstate_max;
-   powernv_pstate_info.nominal = pstate_nominal;
-   powernv_pstate_info.nr_pstates = nr_pstates;
-
return 0;
 }
 
@@ -233,15 +248,15 @@ static unsigned int pstate_id_to_freq(int pstate_id)
 {
int i;
 
-   i = powernv_pstate_info.max - pstate_id;
+   i = get_index(pstate_id);
if (i >= powernv_pstate_info.nr_pstates || i < 0) {
pr_warn("PState id %d outside of PState table, "
"reporting nominal id %d instead\n",
pstate_id, powernv_pstate_info.nominal);
-   i = powernv_pstate_info.max - powernv_pstate_info.nominal;
+   i = powernv_pstate_info.nominal;
}
 
-   return powernv_freqs[i].frequency;
+   return get_pstate(i);
 }
 
 /*
@@ -252,7 +267,7 @@ static ssize_t cpuinfo_nominal_freq_show(struct 
cpufreq_policy *policy,
char *buf)
 {
return sprintf(buf, "%u\n",
-   pstate_id_to_freq(powernv_pstate_info.nominal));
+   get_pstate(powernv_pstate_info.nominal));
 }
 
 struct freq_attr cpufreq_freq_attr_cpuinfo_nominal_freq =
@@ -426,7 +441,7 @@ static void set_pstate(void *data)
  */
 static inline unsigned int g

Re: [PATCH-next v2 0/2] cpufreq: powernv: Fixes for Global pstate management

2016-05-09 Thread Akshay Adiga


On 05/03/2016 08:49 PM, Akshay Adiga wrote:


Fixes are based on patch https://patchwork.ozlabs.org/patch/612058/ which
is in Rafael's linux-next.

- Patch [1] fixes WARN_ON in powernv_target_index()
- Patch [2] Deleting any pending timer to saves an unnecessary irq call
  in powernv_target_index()

Akshay Adiga (2):
   cpufreq: powernv: Move smp_call_function_any() out of irq safe block
   cpufreq: powernv: del_timer_sync when global and local pstate are
 equal

  drivers/cpufreq/powernv-cpufreq.c | 11 +++
  1 file changed, 7 insertions(+), 4 deletions(-)


Does this look good ? Anything i should do ?


Regards
Akshay Adiga

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH-next v2 0/2] cpufreq: powernv: Fixes for Global pstate management

2016-05-03 Thread Akshay Adiga

Fixes are based on patch https://patchwork.ozlabs.org/patch/612058/ which
is in Rafael's linux-next.

- Patch [1] fixes WARN_ON in powernv_target_index()
- Patch [2] Deleting any pending timer to saves an unnecessary irq call
 in powernv_target_index()

Akshay Adiga (2):
  cpufreq: powernv: Move smp_call_function_any() out of irq safe block
  cpufreq: powernv: del_timer_sync when global and local pstate are
equal

 drivers/cpufreq/powernv-cpufreq.c | 11 +++
 1 file changed, 7 insertions(+), 4 deletions(-)

-- 
2.5.5

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH-next v2 1/2] cpufreq: powernv: Move smp_call_function_any() out of irq safe block

2016-05-03 Thread Akshay Adiga

Fix a WARN_ON caused by smp_call_function_any() when irq is disabled,
because of changes made in the patch ('cpufreq: powernv: Ramp-down
 global pstate slower than local-pstate')
https://patchwork.ozlabs.org/patch/612058/

 WARNING: CPU: 0 PID: 4 at kernel/smp.c:291
smp_call_function_single+0x170/0x180

 Call Trace:
 [c007f648f9f0] [c007f648fa90] 0xc007f648fa90 (unreliable)
 [c007f648fa30] [c01430e0] smp_call_function_any+0x170/0x1c0
 [c007f648fa90] [c07b4b00]
powernv_cpufreq_target_index+0xe0/0x250
 [c007f648fb00] [c07ac9dc]
__cpufreq_driver_target+0x20c/0x3d0
 [c007f648fbc0] [c07b1b4c] od_dbs_timer+0xcc/0x260
 [c007f648fc10] [c07b3024] dbs_work_handler+0x54/0xa0
 [c007f648fc50] [c00c49a8] process_one_work+0x1d8/0x590
 [c007f648fce0] [c00c4e08] worker_thread+0xa8/0x660
 [c007f648fd80] [c00cca88] kthread+0x108/0x130
 [c007f648fe30] [c00095e8] ret_from_kernel_thread+0x5c/0x74

- Calling smp_call_function_any() with interrupt disabled (through
 spin_lock_irqsave) could cause a deadlock, as smp_call_function_any()
 relies on the IPI to complete. This is detected in the
 smp_call_function_any() call and hence the WARN_ON.

- As the spinlock (gpstates->lock) is only used to synchronize access of
 global_pstate_info  between timer irq handler and target_index calls. And
 the timer irq handler just try_locks() hence it would not cause a
 deadlock. Hence could do without making spinlocks irq safe.

- As the smp_call_function_any() is a blocking call and does not access
 global_pstates_info, it could reduce the critcal section by moving
 smp_call_function_any() after giving up the lock.

Reported-by: Abdul Haleem <abdha...@linux.vnet.linux.com>
Signed-off-by: Akshay Adiga <akshay.ad...@linux.vnet.ibm.com>
---
Patch is based on Rafael's linux-next
 drivers/cpufreq/powernv-cpufreq.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/cpufreq/powernv-cpufreq.c 
b/drivers/cpufreq/powernv-cpufreq.c
index 144c732..1f0e20c 100644
--- a/drivers/cpufreq/powernv-cpufreq.c
+++ b/drivers/cpufreq/powernv-cpufreq.c
@@ -581,9 +581,10 @@ void gpstate_timer_handler(unsigned long data)
gpstates->last_gpstate = freq_data.gpstate_id;
gpstates->last_lpstate = freq_data.pstate_id;
 
+   spin_unlock(>gpstate_lock);
+
/* Timer may get migrated to a different cpu on cpu hot unplug */
smp_call_function_any(policy->cpus, set_pstate, _data, 1);
-   spin_unlock(>gpstate_lock);
 }
 
 /*
@@ -596,7 +597,6 @@ static int powernv_cpufreq_target_index(struct 
cpufreq_policy *policy,
 {
struct powernv_smp_call_data freq_data;
unsigned int cur_msec, gpstate_id;
-   unsigned long flags;
struct global_pstate_info *gpstates = policy->driver_data;
 
if (unlikely(rebooting) && new_index != get_nominal_index())
@@ -607,7 +607,7 @@ static int powernv_cpufreq_target_index(struct 
cpufreq_policy *policy,
 
cur_msec = jiffies_to_msecs(get_jiffies_64());
 
-   spin_lock_irqsave(>gpstate_lock, flags);
+   spin_lock(>gpstate_lock);
freq_data.pstate_id = powernv_freqs[new_index].driver_data;
 
if (!gpstates->last_sampled_time) {
@@ -654,13 +654,14 @@ gpstates_done:
gpstates->last_gpstate = freq_data.gpstate_id;
gpstates->last_lpstate = freq_data.pstate_id;
 
+   spin_unlock(>gpstate_lock);
+
/*
 * Use smp_call_function to send IPI and execute the
 * mtspr on target CPU.  We could do that without IPI
 * if current CPU is within policy->cpus (core)
 */
smp_call_function_any(policy->cpus, set_pstate, _data, 1);
-   spin_unlock_irqrestore(>gpstate_lock, flags);
return 0;
 }
 
-- 
2.5.5

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH-next v2 2/2] cpufreq: powernv: del_timer_sync when global and local pstate are equal

2016-05-03 Thread Akshay Adiga

When global and local pstate are equal in a powernv_target_index() call,
we don't queue a timer. But we may have timer already queued for future.
This could cause the timer to fire one additional time for no use.

Signed-off-by: Akshay Adiga <akshay.ad...@linux.vnet.ibm.com>
---
Patch is based on Rafael's linux-next
 drivers/cpufreq/powernv-cpufreq.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/cpufreq/powernv-cpufreq.c 
b/drivers/cpufreq/powernv-cpufreq.c
index 1f0e20c..54c4536 100644
--- a/drivers/cpufreq/powernv-cpufreq.c
+++ b/drivers/cpufreq/powernv-cpufreq.c
@@ -647,6 +647,8 @@ static int powernv_cpufreq_target_index(struct 
cpufreq_policy *policy,
 */
if (gpstate_id != freq_data.pstate_id)
queue_gpstate_timer(gpstates);
+   else
+   del_timer_sync(>timer);
 
 gpstates_done:
freq_data.gpstate_id = gpstate_id;
-- 
2.5.5

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH -next 1/2] cpufreq: powernv: Move smp_call_function_any() out of irq safe block

2016-05-03 Thread Akshay Adiga



Hi Viresh,

On 05/03/2016 05:19 PM, Viresh Kumar wrote:


On 03-05-16, 15:10, Akshay Adiga wrote:

Fixing a WARN_ON caused by smp_call_function_any() when irq is disabled,
because of changes made in the patch
('cpufreq: powernv: Ramp-down global pstate slower than local-pstate')
https://patchwork.ozlabs.org/patch/612058/

  WARNING: CPU: 0 PID: 4 at kernel/smp.c:291
smp_call_function_single+0x170/0x180

  Call Trace:
  [c007f648f9f0] [c007f648fa90] 0xc007f648fa90 (unreliable)
  [c007f648fa30] [c01430e0] smp_call_function_any+0x170/0x1c0
  [c007f648fa90] [c07b4b00]
powernv_cpufreq_target_index+0xe0/0x250
  [c007f648fb00] [c07ac9dc]
__cpufreq_driver_target+0x20c/0x3d0
  [c007f648fbc0] [c07b1b4c] od_dbs_timer+0xcc/0x260
  [c007f648fc10] [c07b3024] dbs_work_handler+0x54/0xa0
  [c007f648fc50] [c00c49a8] process_one_work+0x1d8/0x590
  [c007f648fce0] [c00c4e08] worker_thread+0xa8/0x660
  [c007f648fd80] [c00cca88] kthread+0x108/0x130
  [c007f648fe30] [c00095e8] ret_from_kernel_thread+0x5c/0x74

Moving smp_call_function_any() out of the critical section and changing
irq safe spinlocks to normal spinlocks.

Reported-by: Abdul Haleem <abdha...@linux.vnet.linux.com>
Signed-off-by: Akshay Adiga <akshay.ad...@linux.vnet.ibm.com>
---
Patch is based on Rafael's linux-next
  drivers/cpufreq/powernv-cpufreq.c | 9 +
  1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/cpufreq/powernv-cpufreq.c 
b/drivers/cpufreq/powernv-cpufreq.c
index 144c732..1f0e20c 100644
--- a/drivers/cpufreq/powernv-cpufreq.c
+++ b/drivers/cpufreq/powernv-cpufreq.c
@@ -581,9 +581,10 @@ void gpstate_timer_handler(unsigned long data)
gpstates->last_gpstate = freq_data.gpstate_id;
gpstates->last_lpstate = freq_data.pstate_id;
  
+	spin_unlock(>gpstate_lock);

+
/* Timer may get migrated to a different cpu on cpu hot unplug */
smp_call_function_any(policy->cpus, set_pstate, _data, 1);
-   spin_unlock(>gpstate_lock);
  }
  
  /*

@@ -596,7 +597,6 @@ static int powernv_cpufreq_target_index(struct 
cpufreq_policy *policy,
  {
struct powernv_smp_call_data freq_data;
unsigned int cur_msec, gpstate_id;
-   unsigned long flags;
struct global_pstate_info *gpstates = policy->driver_data;
  
  	if (unlikely(rebooting) && new_index != get_nominal_index())

@@ -607,7 +607,7 @@ static int powernv_cpufreq_target_index(struct 
cpufreq_policy *policy,
  
  	cur_msec = jiffies_to_msecs(get_jiffies_64());
  
-	spin_lock_irqsave(>gpstate_lock, flags);

+   spin_lock(>gpstate_lock);

You don't necessarily have to write 'what you are doing' in the commit log, but
tell us why you are doing that.

Please explain, why is this changed and why will things continue to work
without this.


Thanks for reviewing. I have tried to convey that in the first line of commit 
message,

"WARN_ON caused by smp_call_function_any() when irq is disabled,
because of changes made in the patch"

I see, i have not explained why i am changing irq safe spinlock to normal 
spinlock.
will add some explanation.

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH -next 0/2] cpufreq: powernv: Fixes for Global pstate management

2016-05-03 Thread Akshay Adiga

Fixes are based on patch https://patchwork.ozlabs.org/patch/612058/ which is
in Rafael's linux-next.

- Patch [1] fixes WARN_ON in powernv_target_index()
- Patch [2] Deleting any pending timer to saves an unnecessary irq call
 in powernv_target_index()


Akshay Adiga (2):
  cpufreq: powernv: Move smp_call_function_any() out of irq safe block
  cpufreq: powernv: del_timer_sync when global and local pstate are
equal

 drivers/cpufreq/powernv-cpufreq.c | 11 +++
 1 file changed, 7 insertions(+), 4 deletions(-)

-- 
2.5.5

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH -next 2/2] cpufreq: powernv: del_timer_sync when global and local pstate are equal

2016-05-03 Thread Akshay Adiga

Deleting pending gpstates->timer for the policy when global and local pstate
are equal while executing target_index(). This saves an unnecessary
irq call.

Signed-off-by: Akshay Adiga <akshay.ad...@linux.vnet.ibm.com>
---
Patch is based on Rafael's linux-next
 drivers/cpufreq/powernv-cpufreq.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/cpufreq/powernv-cpufreq.c 
b/drivers/cpufreq/powernv-cpufreq.c
index 1f0e20c..54c4536 100644
--- a/drivers/cpufreq/powernv-cpufreq.c
+++ b/drivers/cpufreq/powernv-cpufreq.c
@@ -647,6 +647,8 @@ static int powernv_cpufreq_target_index(struct 
cpufreq_policy *policy,
 */
if (gpstate_id != freq_data.pstate_id)
queue_gpstate_timer(gpstates);
+   else
+   del_timer_sync(>timer);
 
 gpstates_done:
freq_data.gpstate_id = gpstate_id;
-- 
2.5.5

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH -next 1/2] cpufreq: powernv: Move smp_call_function_any() out of irq safe block

2016-05-03 Thread Akshay Adiga

Fixing a WARN_ON caused by smp_call_function_any() when irq is disabled,
because of changes made in the patch
('cpufreq: powernv: Ramp-down global pstate slower than local-pstate')
https://patchwork.ozlabs.org/patch/612058/

 WARNING: CPU: 0 PID: 4 at kernel/smp.c:291
smp_call_function_single+0x170/0x180

 Call Trace:
 [c007f648f9f0] [c007f648fa90] 0xc007f648fa90 (unreliable)
 [c007f648fa30] [c01430e0] smp_call_function_any+0x170/0x1c0
 [c007f648fa90] [c07b4b00]
powernv_cpufreq_target_index+0xe0/0x250
 [c007f648fb00] [c07ac9dc]
__cpufreq_driver_target+0x20c/0x3d0
 [c007f648fbc0] [c07b1b4c] od_dbs_timer+0xcc/0x260
 [c007f648fc10] [c07b3024] dbs_work_handler+0x54/0xa0
 [c007f648fc50] [c00c49a8] process_one_work+0x1d8/0x590
 [c007f648fce0] [c00c4e08] worker_thread+0xa8/0x660
 [c007f648fd80] [c00cca88] kthread+0x108/0x130
 [c007f648fe30] [c00095e8] ret_from_kernel_thread+0x5c/0x74

Moving smp_call_function_any() out of the critical section and changing
irq safe spinlocks to normal spinlocks.

Reported-by: Abdul Haleem <abdha...@linux.vnet.linux.com>
Signed-off-by: Akshay Adiga <akshay.ad...@linux.vnet.ibm.com>
---
Patch is based on Rafael's linux-next
 drivers/cpufreq/powernv-cpufreq.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/cpufreq/powernv-cpufreq.c 
b/drivers/cpufreq/powernv-cpufreq.c
index 144c732..1f0e20c 100644
--- a/drivers/cpufreq/powernv-cpufreq.c
+++ b/drivers/cpufreq/powernv-cpufreq.c
@@ -581,9 +581,10 @@ void gpstate_timer_handler(unsigned long data)
gpstates->last_gpstate = freq_data.gpstate_id;
gpstates->last_lpstate = freq_data.pstate_id;
 
+   spin_unlock(>gpstate_lock);
+
/* Timer may get migrated to a different cpu on cpu hot unplug */
smp_call_function_any(policy->cpus, set_pstate, _data, 1);
-   spin_unlock(>gpstate_lock);
 }
 
 /*
@@ -596,7 +597,6 @@ static int powernv_cpufreq_target_index(struct 
cpufreq_policy *policy,
 {
struct powernv_smp_call_data freq_data;
unsigned int cur_msec, gpstate_id;
-   unsigned long flags;
struct global_pstate_info *gpstates = policy->driver_data;
 
if (unlikely(rebooting) && new_index != get_nominal_index())
@@ -607,7 +607,7 @@ static int powernv_cpufreq_target_index(struct 
cpufreq_policy *policy,
 
cur_msec = jiffies_to_msecs(get_jiffies_64());
 
-   spin_lock_irqsave(>gpstate_lock, flags);
+   spin_lock(>gpstate_lock);
freq_data.pstate_id = powernv_freqs[new_index].driver_data;
 
if (!gpstates->last_sampled_time) {
@@ -654,13 +654,14 @@ gpstates_done:
gpstates->last_gpstate = freq_data.gpstate_id;
gpstates->last_lpstate = freq_data.pstate_id;
 
+   spin_unlock(>gpstate_lock);
+
/*
 * Use smp_call_function to send IPI and execute the
 * mtspr on target CPU.  We could do that without IPI
 * if current CPU is within policy->cpus (core)
 */
smp_call_function_any(policy->cpus, set_pstate, _data, 1);
-   spin_unlock_irqrestore(>gpstate_lock, flags);
return 0;
 }
 
-- 
2.5.5

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v2 2/2] cpufreq: powernv: Ramp-down global pstate slower than local-pstate

2016-04-22 Thread Akshay Adiga


Hi Stewart,

On 04/20/2016 03:41 AM, Stewart Smith wrote:

Akshay Adiga<akshay.ad...@linux.vnet.ibm.com>  writes:

Iozone results show fairly consistent performance boost.
YCSB on redis shows improved Max latencies in most cases.

What about power consumption?


Iozone write/rewite test were made with filesizes 200704Kb and 401408Kb
with different record sizes . The following table shows IOoperations/sec
with and without patch.
Iozone Results ( in op/sec) ( mean over 3 iterations )

What's the variance between runs?


Re-Ran Iozone test

w/o : without patch,  w : with patch , stdev : standard deviation , avg ; 
average

Iozone Results for ReWrite
+--++---++---+---+-+
| filesize | reclen |  w/o(avg) | w/o(stdev) |   w(avg)  |  w(stdev) | change% |
+--++---++---+---+-+
|  200704  |   1|  795070.4 |  5813.51   |  805127.8 |  16872.59 |  1.264  |
|  200704  |   2| 1448973.8 |  23058.79  | 1472098.8 |  18062.73 |  1.595  |
|  200704  |   4|  2413444  |  85988.09  | 2562535.8 |  48649.35 |  6.177  |
|  200704  |   8|  3827453  |  87710.52  | 3846888.2 |  86438.51 |  0.507  |
|  200704  |   16   | 5276096.8 |  73208.19  | 5425961.6 | 170774.75 |  2.840  |
|  200704  |   32   | 6742930.6 |  22789.45  | 6848904.4 | 257768.84 |  1.571  |
|  200704  |   64   | 7059479.2 | 300725.26  |  7373635  | 285106.90 |  4.450  |
|  200704  |  128   | 7097647.2 | 408171.71  |  7716500  | 266139.68 |  8.719  |
|  200704  |  256   |  6710810  | 314594.13  | 7661752.6 | 454049.27 |  14.170 |
|  200704  |  512   | 7034675.4 | 516152.97  | 7378583.2 | 613617.57 |  4.888  |
|  200704  |  1024  | 6265317.2 | 446101.38  | 7540629.6 | 294865.20 |  20.355 |
|  401408  |   1|  802233.2 |  4263.92   |   817507  |  17727.09 |  1.903  |
|  401408  |   2| 1461892.8 |  53678.12  |  1482872  |  45670.30 |  1.435  |
|  401408  |   4| 2629686.8 |  24365.33  | 2673196.2 |  41576.78 |  1.654  |
|  401408  |   8| 4156353.8 |  70636.85  | 4149330.4 |  56521.84 |  -0.168 |
|  401408  |   16   |  5895437  |  63762.43  | 5924167.4 | 396311.75 |  0.487  |
|  401408  |   32   | 7330826.6 | 167080.53  | 7785889.2 | 245434.99 |  6.207  |
|  401408  |   64   | 8298555.2 | 328890.89  | 8482416.8 | 249698.02 |  2.215  |
|  401408  |  128   | 8241108.6 | 490560.96  |  8686478  | 224816.21 |  5.404  |
|  401408  |  256   | 8038080.6 | 327704.66  | 8372327.4 | 210978.18 |  4.158  |
|  401408  |  512   | 8229523.4 | 371701.73  | 8654695.2 | 296715.07 |  5.166  |
+--++---++---+---+-+

Iozone results for Write
+--++---++---++-+
| filesize | reclen |  w/o(avg) | w/o(stdev) |   w(avg)  |  w(stdev)  | change% 
|
+--++---++---++-+
|  200704  |   1|   575825  |  7,876.69  |  569388.4 |  6,699.59  |  -1.12  
|
|  200704  |   2| 1061229.4 |  7,589.50  | 1045193.2 | 19,785.85  |  -1.51  
|
|  200704  |   4|  1808329  | 13,040.67  | 1798138.4 | 50,367.19  |  -0.56  
|
|  200704  |   8| 2822953.4 | 19,948.89  | 2830305.6 | 21,202.77  |   0.26  
|
|  200704  |   16   |  3976987  | 62,201.72  | 3909063.8 | 268,640.51 |  -1.71  
|
|  200704  |   32   | 4959358.2 | 112,052.99 |  4760303  | 330,343.73 |  -4.01  
|
|  200704  |   64   | 5452454.6 | 628,078.72 | 5692265.6 | 190,562.91 |   4.40  
|
|  200704  |  128   | 5645246.8 | 10,455.85  | 5653330.2 | 18,153.76  |   0.14  
|
|  200704  |  256   | 5855897.2 | 184,854.25 |  5402069  | 538,523.04 |  -7.75  
|
|  200704  |  512   |  5515904  | 326,198.86 | 5639976.4 |  8,480.46  |   2.25  
|
|  200704  |  1024  | 5471718.2 | 415,179.15 | 5399414.6 | 686,124.50 |  -1.32  
|
|  401408  |   1|  584786.6 |  1,256.59  |  587237.2 |  6,552.55  |   0.42  
|
|  401408  |   2| 1047018.8 | 26,567.72  | 1040926.8 | 16,495.93  |  -0.58  
|
|  401408  |   4| 1815465.8 | 16,426.92  | 1773652.6 | 38,169.02  |  -2.30  
|
|  401408  |   8|  2814285  | 27,374.53  |  2756608  | 96,689.13  |  -2.05  
|
|  401408  |   16   |  3931646  | 129,648.79 | 3805793.4 | 141,368.40 |  -3.20  
|
|  401408  |   32   | 4875353.4 | 146,203.70 |  4884084  | 265,484.01 |   0.18  
|
|  401408  |   64   | 5479805.8 | 349,995.36 | 5565292.2 | 20,645.45  |   1.56  
|
|  401408  |  128   |  5598486  | 195,680.23 |  5645125  | 62,017.38  |   0.83  
|
|  401408  |  256   |  5803148  | 328,683.02 |  5657215  | 20,579.28  |  -2.51  
|
|  401408  |  512   | 5565091.4 | 166,123.57 | 5725974.4 | 169,506.29 |   2.89  
|
+--++---++---++-+


Tested with YCSB workload (50% update + 50% read) over redis for 1 million
records and 1 million operation. Each test was carried out with target
operations per second and persistence disabled.

Max-latency

[PATCH v3 2/2] cpufreq: powernv: Ramp-down global pstate slower than local-pstate

2016-04-19 Thread Akshay Adiga

iterations )
---
op/sOperation   with patch  without patch   %change
---
15000   Read61480.6 50261.4 22.32
15000   cleanup 215.2   293.6   -26.70
15000   update  25666.2 25163.8 2.00

25000   Read32626.2 89525.4 -63.56
25000   cleanup 292.2   263.0   11.10
25000   update  32293.4 90255.0 -64.22

35000   Read34783.0 33119.0 5.02
35000   cleanup 321.2   395.8   -18.8
35000   update  36047.0 38747.8 -6.97

4   Read38562.2 42357.4 -8.96
4   cleanup 371.8   384.6   -3.33
4   update  27861.4 41547.8 -32.94

45000   Read42271.0 88120.6 -52.03
45000   cleanup 263.6   383.0   -31.17
45000   update  29755.8 81359.0 -63.43

(test without target op/s)
47659   Read83061.4 136440.6-39.12
47659   cleanup 195.8   193.8   1.03
47659   update  73429.4 124971.8-41.24

Signed-off-by: Akshay Adiga <akshay.ad...@linux.vnet.ibm.com>
Reviewed-by: Gautham R. Shenoy <e...@linux.vnet.ibm.com>
Acked-by: Viresh Kumar <viresh.ku...@linaro.org>
---
 drivers/cpufreq/powernv-cpufreq.c | 258 --
 1 file changed, 251 insertions(+), 7 deletions(-)

diff --git a/drivers/cpufreq/powernv-cpufreq.c 
b/drivers/cpufreq/powernv-cpufreq.c
index e2e2219..144c732 100644
--- a/drivers/cpufreq/powernv-cpufreq.c
+++ b/drivers/cpufreq/powernv-cpufreq.c
@@ -36,12 +36,56 @@
 #include 
 #include  /* Required for cpu_sibling_mask() in UP configs */
 #include 
+#include 
 
 #define POWERNV_MAX_PSTATES256
 #define PMSR_PSAFE_ENABLE  (1UL << 30)
 #define PMSR_SPR_EM_DISABLE(1UL << 31)
 #define PMSR_MAX(x)((x >> 32) & 0xFF)
 
+#define MAX_RAMP_DOWN_TIME 5120
+/*
+ * On an idle system we want the global pstate to ramp-down from max value to
+ * min over a span of ~5 secs. Also we want it to initially ramp-down slowly 
and
+ * then ramp-down rapidly later on.
+ *
+ * This gives a percentage rampdown for time elapsed in milliseconds.
+ * ramp_down_percentage = ((ms * ms) >> 18)
+ * ~= 3.8 * (sec * sec)
+ *
+ * At 0 ms ramp_down_percent = 0
+ * At 5120 ms  ramp_down_percent = 100
+ */
+#define ramp_down_percent(time)((time * time) >> 18)
+
+/* Interval after which the timer is queued to bring down global pstate */
+#define GPSTATE_TIMER_INTERVAL 2000
+
+/**
+ * struct global_pstate_info - Per policy data structure to maintain history of
+ * global pstates
+ * @highest_lpstate:   The local pstate from which we are ramping down
+ * @elapsed_time:  Time in ms spent in ramping down from
+ * highest_lpstate
+ * @last_sampled_time: Time from boot in ms when global pstates were
+ * last set
+ * @last_lpstate,last_gpstate: Last set values for local and global pstates
+ * @timer: Is used for ramping down if cpu goes idle for
+ * a long time with global pstate held high
+ * @gpstate_lock:  A spinlock to maintain synchronization between
+ * routines called by the timer handler and
+ * governer's target_index calls
+ */
+struct global_pstate_info {
+   int highest_lpstate;
+   unsigned int elapsed_time;
+   unsigned int last_sampled_time;
+   int last_lpstate;
+   int last_gpstate;
+   spinlock_t gpstate_lock;
+   struct timer_list timer;
+};
+
 static struct cpufreq_frequency_table powernv_freqs[POWERNV_MAX_PSTATES+1];
 static bool rebooting, throttled, occ_reset;
 
@@ -94,6 +138,17 @@ static struct powernv_pstate_info {
int nr_pstates;
 } powernv_pstate_info;
 
+static inline void reset_gpstates(struct cpufreq_policy *policy)
+{
+   struct global_pstate_info *gpstates = policy->driver_data;
+
+   gpstates->highest_lpstate = 0;
+   gpstates->elapsed_time = 0;
+   gpstates->last_sampled_time = 0;
+   gpstates->last_lpstate = 0;
+   gpstates->last_gpstate = 0;
+}
+
 /*
  * Initialize the freq table based on data obtained
  * from the firmware passed via device-tree
@@ -285,6 +340,7 @@ static inline void set_pmspr(unsigned long sprn, unsigned 
long val)
 struct powernv_smp_call_data {
unsigned int freq;
int pstate_id;
+   int gpstate_id;
 };
 
 /*
@@ -343,19 +399,21 @@ static unsigned int powernv_cpufreq_get(unsigned int c

[PATCH v3 0/2] cpufreq: powernv: Ramp-down global pstate slower than local-pstate

2016-04-19 Thread Akshay Adiga

The frequency transition latency from pmin to pmax is observed to be in few
millisecond granurality. And it usually happens to take a performance penalty
during sudden frequency rampup requests.

This patch set solves this problem by using a chip-level entity called "global
pstates". Global pstate manages elements across other dependent core chiplets.
Typically, the element that needs to be managed is the voltage setting.
So by holding global pstates higher than local pstate for some amount of time
( ~5 seconds) the subsequent rampups could be made faster.

(1/2) patch removes the flag from cpufreq_policy->driver_data, so that it can
be used for tracking global pstates.

(2/2) patch adds code for global pstate management.
- The iozone results with this patchset, shows improvements in almost all cases.
- YCSB workload on redis with various  target operations per second shows 
better MaxLatency with this patch.

Changes from v1:
- Fixed coding style
- Added a routine to reset global_pstate_info instead of hacky memset
- Handled case where cpufreq_table_validate_and_show() fails
- changed int queue_gpstate_timer() to void queue_gpstate_timer()

Changes from v2:
- dropped the unreated change. 

Akshay Adiga (1):
  cpufreq: powernv: Ramp-down global pstate slower than local-pstate

Shilpasri G Bhat (1):
  cpufreq: powernv: Remove flag use-case of policy->driver_data

 drivers/cpufreq/powernv-cpufreq.c | 269 --
 1 file changed, 256 insertions(+), 13 deletions(-)

-- 
2.5.5

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH v3 1/2] cpufreq: powernv: Remove flag use-case of policy->driver_data

2016-04-19 Thread Akshay Adiga

From: Shilpasri G Bhat <shilpa.b...@linux.vnet.ibm.com>

commit 1b0289848d5d ("cpufreq: powernv: Add sysfs attributes to show
throttle stats") used policy->driver_data as a flag for one-time creation
of throttle sysfs files. Instead of this use 'kernfs_find_and_get()' to
check if the attribute already exists. This is required as
policy->driver_data is used for other purposes in the later patch.

Signed-off-by: Shilpasri G Bhat <shilpa.b...@linux.vnet.ibm.com>
Signed-off-by: Akshay Adiga <akshay.ad...@linux.vnet.ibm.com>
Acked-by: Viresh Kumar <viresh.ku...@linaro.org>
---
 drivers/cpufreq/powernv-cpufreq.c | 11 +--
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/drivers/cpufreq/powernv-cpufreq.c 
b/drivers/cpufreq/powernv-cpufreq.c
index 39ac78c..e2e2219 100644
--- a/drivers/cpufreq/powernv-cpufreq.c
+++ b/drivers/cpufreq/powernv-cpufreq.c
@@ -455,13 +455,15 @@ static int powernv_cpufreq_target_index(struct 
cpufreq_policy *policy,
 static int powernv_cpufreq_cpu_init(struct cpufreq_policy *policy)
 {
int base, i;
+   struct kernfs_node *kn;
 
base = cpu_first_thread_sibling(policy->cpu);
 
for (i = 0; i < threads_per_core; i++)
cpumask_set_cpu(base + i, policy->cpus);
 
-   if (!policy->driver_data) {
+   kn = kernfs_find_and_get(policy->kobj.sd, throttle_attr_grp.name);
+   if (!kn) {
int ret;
 
ret = sysfs_create_group(>kobj, _attr_grp);
@@ -470,11 +472,8 @@ static int powernv_cpufreq_cpu_init(struct cpufreq_policy 
*policy)
policy->cpu);
return ret;
}
-   /*
-* policy->driver_data is used as a flag for one-time
-* creation of throttle sysfs files.
-*/
-   policy->driver_data = policy;
+   } else {
+   kernfs_put(kn);
}
return cpufreq_table_validate_and_show(policy, powernv_freqs);
 }
-- 
2.5.5

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v2 2/2] cpufreq: powernv: Ramp-down global pstate slower than local-pstate

2016-04-19 Thread Akshay Adiga


Hi Viresh,

On 04/18/2016 03:48 PM, Viresh Kumar wrote:

On 15-04-16, 11:58, Akshay Adiga wrote:

  static int powernv_cpufreq_reboot_notifier(struct notifier_block *nb,
-   unsigned long action, void *unused)
+  unsigned long action, void *unused)

Unrelated change.. better don't add such changes..


Posting out v3 with out this unrelated change.


  {
int cpu;
struct cpufreq_policy cpu_policy;
@@ -603,15 +843,18 @@ static struct notifier_block powernv_cpufreq_opal_nb = {
  static void powernv_cpufreq_stop_cpu(struct cpufreq_policy *policy)
  {
struct powernv_smp_call_data freq_data;
-
+   struct global_pstate_info *gpstates = policy->driver_data;

You removed a blank line here and I feel the code looks better with
that.


freq_data.pstate_id = powernv_pstate_info.min;
+   freq_data.gpstate_id = powernv_pstate_info.min;
smp_call_function_single(policy->cpu, set_pstate, _data, 1);
+   del_timer_sync(>timer);
  }
  
  static struct cpufreq_driver powernv_cpufreq_driver = {

.name   = "powernv-cpufreq",
.flags  = CPUFREQ_CONST_LOOPS,
.init   = powernv_cpufreq_cpu_init,
+   .exit   = powernv_cpufreq_cpu_exit,
.verify = cpufreq_generic_frequency_table_verify,
.target_index   = powernv_cpufreq_target_index,
.get= powernv_cpufreq_get,

None of the above comments are mandatory for you to fix..

Acked-by: Viresh Kumar <viresh.ku...@linaro.org>


Thanks for Ack  :)

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH v2 2/2] cpufreq: powernv: Ramp-down global pstate slower than local-pstate

2016-04-15 Thread Akshay Adiga

iterations )
---
op/sOperation   with patch  without patch   %change
---
15000   Read61480.6 50261.4 22.32
15000   cleanup 215.2   293.6   -26.70
15000   update  25666.2 25163.8 2.00

25000   Read32626.2 89525.4 -63.56
25000   cleanup 292.2   263.0   11.10
25000   update  32293.4 90255.0 -64.22

35000   Read34783.0 33119.0 5.02
35000   cleanup 321.2   395.8   -18.8
35000   update  36047.0 38747.8 -6.97

4   Read38562.2 42357.4 -8.96
4   cleanup 371.8   384.6   -3.33
4   update  27861.4 41547.8 -32.94

45000   Read42271.0 88120.6 -52.03
45000   cleanup 263.6   383.0   -31.17
45000   update  29755.8 81359.0 -63.43

(test without target op/s)
47659   Read83061.4 136440.6-39.12
47659   cleanup 195.8   193.8   1.03
47659   update  73429.4 124971.8-41.24

Signed-off-by: Akshay Adiga <akshay.ad...@linux.vnet.ibm.com>
Reviewed-by: Gautham R. Shenoy <e...@linux.vnet.ibm.com>
---
 drivers/cpufreq/powernv-cpufreq.c | 261 --
 1 file changed, 252 insertions(+), 9 deletions(-)

diff --git a/drivers/cpufreq/powernv-cpufreq.c 
b/drivers/cpufreq/powernv-cpufreq.c
index e2e2219..78388c0 100644
--- a/drivers/cpufreq/powernv-cpufreq.c
+++ b/drivers/cpufreq/powernv-cpufreq.c
@@ -36,12 +36,56 @@
 #include 
 #include  /* Required for cpu_sibling_mask() in UP configs */
 #include 
+#include 
 
 #define POWERNV_MAX_PSTATES256
 #define PMSR_PSAFE_ENABLE  (1UL << 30)
 #define PMSR_SPR_EM_DISABLE(1UL << 31)
 #define PMSR_MAX(x)((x >> 32) & 0xFF)
 
+#define MAX_RAMP_DOWN_TIME 5120
+/*
+ * On an idle system we want the global pstate to ramp-down from max value to
+ * min over a span of ~5 secs. Also we want it to initially ramp-down slowly 
and
+ * then ramp-down rapidly later on.
+ *
+ * This gives a percentage rampdown for time elapsed in milliseconds.
+ * ramp_down_percentage = ((ms * ms) >> 18)
+ * ~= 3.8 * (sec * sec)
+ *
+ * At 0 ms ramp_down_percent = 0
+ * At 5120 ms  ramp_down_percent = 100
+ */
+#define ramp_down_percent(time)((time * time) >> 18)
+
+/* Interval after which the timer is queued to bring down global pstate */
+#define GPSTATE_TIMER_INTERVAL 2000
+
+/**
+ * struct global_pstate_info - Per policy data structure to maintain history of
+ * global pstates
+ * @highest_lpstate:   The local pstate from which we are ramping down
+ * @elapsed_time:  Time in ms spent in ramping down from
+ * highest_lpstate
+ * @last_sampled_time: Time from boot in ms when global pstates were
+ * last set
+ * @last_lpstate,last_gpstate: Last set values for local and global pstates
+ * @timer: Is used for ramping down if cpu goes idle for
+ * a long time with global pstate held high
+ * @gpstate_lock:  A spinlock to maintain synchronization between
+ * routines called by the timer handler and
+ * governer's target_index calls
+ */
+struct global_pstate_info {
+   int highest_lpstate;
+   unsigned int elapsed_time;
+   unsigned int last_sampled_time;
+   int last_lpstate;
+   int last_gpstate;
+   spinlock_t gpstate_lock;
+   struct timer_list timer;
+};
+
 static struct cpufreq_frequency_table powernv_freqs[POWERNV_MAX_PSTATES+1];
 static bool rebooting, throttled, occ_reset;
 
@@ -94,6 +138,17 @@ static struct powernv_pstate_info {
int nr_pstates;
 } powernv_pstate_info;
 
+static inline void reset_gpstates(struct cpufreq_policy *policy)
+{
+   struct global_pstate_info *gpstates = policy->driver_data;
+
+   gpstates->highest_lpstate = 0;
+   gpstates->elapsed_time = 0;
+   gpstates->last_sampled_time = 0;
+   gpstates->last_lpstate = 0;
+   gpstates->last_gpstate = 0;
+}
+
 /*
  * Initialize the freq table based on data obtained
  * from the firmware passed via device-tree
@@ -285,6 +340,7 @@ static inline void set_pmspr(unsigned long sprn, unsigned 
long val)
 struct powernv_smp_call_data {
unsigned int freq;
int pstate_id;
+   int gpstate_id;
 };
 
 /*
@@ -343,19 +399,21 @@ static unsigned int powernv_cpufreq_get(unsigned int cpu)
  * (struct powernv_smp_call_data *) and the psta

[PATCH v2 1/2] cpufreq: powernv: Remove flag use-case of policy->driver_data

2016-04-15 Thread Akshay Adiga

From: Shilpasri G Bhat <shilpa.b...@linux.vnet.ibm.com>

commit 1b0289848d5d ("cpufreq: powernv: Add sysfs attributes to show
throttle stats") used policy->driver_data as a flag for one-time creation
of throttle sysfs files. Instead of this use 'kernfs_find_and_get()' to
check if the attribute already exists. This is required as
policy->driver_data is used for other purposes in the later patch.

Signed-off-by: Shilpasri G Bhat <shilpa.b...@linux.vnet.ibm.com>
Signed-off-by: Akshay Adiga <akshay.ad...@linux.vnet.ibm.com>
---
 drivers/cpufreq/powernv-cpufreq.c | 11 +--
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/drivers/cpufreq/powernv-cpufreq.c 
b/drivers/cpufreq/powernv-cpufreq.c
index 39ac78c..e2e2219 100644
--- a/drivers/cpufreq/powernv-cpufreq.c
+++ b/drivers/cpufreq/powernv-cpufreq.c
@@ -455,13 +455,15 @@ static int powernv_cpufreq_target_index(struct 
cpufreq_policy *policy,
 static int powernv_cpufreq_cpu_init(struct cpufreq_policy *policy)
 {
int base, i;
+   struct kernfs_node *kn;
 
base = cpu_first_thread_sibling(policy->cpu);
 
for (i = 0; i < threads_per_core; i++)
cpumask_set_cpu(base + i, policy->cpus);
 
-   if (!policy->driver_data) {
+   kn = kernfs_find_and_get(policy->kobj.sd, throttle_attr_grp.name);
+   if (!kn) {
int ret;
 
ret = sysfs_create_group(>kobj, _attr_grp);
@@ -470,11 +472,8 @@ static int powernv_cpufreq_cpu_init(struct cpufreq_policy 
*policy)
policy->cpu);
return ret;
}
-   /*
-* policy->driver_data is used as a flag for one-time
-* creation of throttle sysfs files.
-*/
-   policy->driver_data = policy;
+   } else {
+   kernfs_put(kn);
}
return cpufreq_table_validate_and_show(policy, powernv_freqs);
 }
-- 
2.5.5

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH v2 0/2] cpufreq: powernv: Ramp-down global pstate slower than local-pstate

2016-04-15 Thread Akshay Adiga

The frequency transition latency from pmin to pmax is observed to be in few
millisecond granurality. And it usually happens to take a performance penalty
during sudden frequency rampup requests.

This patch set solves this problem by using a chip-level entity called "global
 pstates". Global pstate manages elements across other dependent core chiplets.
Typically, the element that needs to be managed is the voltage setting.
So by holding global pstates higher than local pstate for some amount of time
( ~5 seconds) the subsequent rampups could be made faster.

(1/2) patch removes the flag from cpufreq_policy->driver_data, so that it can
be used for tracking global pstates.

(2/2) patch adds code for global pstate management.

- The iozone results with this patchset, shows improvements in almost all cases.
- YCSB workload on redis with various  target operations per second shows 
better MaxLatency with this patch.

Changes from v1:
- Fixed coding style
- Added a routine to reset global_pstate_info instead of hacky memset
- Handled case when cpufreq_table_validate_and_show() fails
- changed int queue_gpstate_timer() to void queue_gpstate_timer()


Akshay Adiga (1):
  cpufreq: powernv: Ramp-down global pstate slower than local-pstate

Shilpasri G Bhat (1):
  cpufreq: powernv: Remove flag use-case of policy->driver_data

 drivers/cpufreq/powernv-cpufreq.c | 272 +++---
 1 file changed, 257 insertions(+), 15 deletions(-)

-- 
2.5.5

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH 2/2] cpufreq: powernv: Ramp-down global pstate slower than local-pstate

2016-04-14 Thread Akshay Adiga


Hi Balbir,

On 04/14/2016 11:10 AM, Balbir Singh wrote:


On 13/04/16 04:06, Akshay Adiga wrote:

This patch brings down global pstate at a slower rate than the local
pstate. As the frequency transition latency from pmin to pmax is
observed to be in few millisecond granurality. It takes a performance
penalty during sudden frequency rampup. Hence by holding global pstates
higher than local pstate makes the subsequent rampups faster.

What domains does local and global refer to?


The *global pstate* is a Chip-level entity as such, so the global entity
(Voltage)  is managed across several cores. But with a DCM (Dual Chip Modules),
its more of a socket-level entity and hence Voltage is managed across both 
chips.

The *local pstate* is a Core-level entity, so the local entity (frequency) is
managed across threads.

I will include this in the commit message. Thanks.




+/*
+ * Quadratic equation which gives the percentage rampdown for time elapsed in
+ * milliseconds. time can be between 0 and MAX_RAMP_DOWN_TIME ( milliseconds )
+ * This equation approximates to y = -4e-6 x^2

Thanks for documenting this, but I think it will also be good to explain why we
use y = -4 e-6*x^2 as opposed to any other magic numbers.


Well it empirically worked out best this way.

On an idle system we want the Global Pstate to ramp-down from max value to min 
over
a span of ~5 secs. Also we want initially ramp-down slowly and ramp-down 
rapidly later
on, hence the equation.

I will try to make this in the comment more informative.

Regards

Akshay Adiga

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH 2/2] cpufreq: powernv: Ramp-down global pstate slower than local-pstate

2016-04-13 Thread Akshay Adiga


Hi Viresh ,

Thanks for reviewing in detail.
I will correct all comments related to coding standards in my next patch.

On 04/13/2016 10:33 AM, Viresh Kumar wrote:


Comments mostly on the coding standards which you have *not* followed.

Also, please run checkpatch --strict next time you send patches
upstream.


Thanks for pointing out the --strict option, was not aware of that. I will
run checkpatch --strict on the next versions.


On 12-04-16, 23:36, Akshay Adiga wrote:
+
+/*
+ * While resetting we don't want "timer" fields to be set to zero as we
+ * may lose track of timer and will not be able to cleanly remove it
+ */
+#define reset_gpstates(policy)   memset(policy->driver_data, 0,\
+   sizeof(struct global_pstate_info)-\
+   sizeof(struct timer_list)-\
+   sizeof(spinlock_t))
That's super *ugly*. Why don't you create a simple routine which will
set the 5 integer variables to 0 in a straight forward way ?


Yeh, will create a routine.


@@ -348,14 +395,17 @@ static void set_pstate(void *freq_data)
unsigned long val;
unsigned long pstate_ul =
((struct powernv_smp_call_data *) freq_data)->pstate_id;
+   unsigned long gpstate_ul =
+   ((struct powernv_smp_call_data *) freq_data)->gpstate_id;

Remove these unnecessary casts and do:

struct powernv_smp_call_data *freq_data = data; //Name func arg as data

And then use freq_data->*.


Ok. Will do that.


+/*
+ * gpstate_timer_handler
+ *
+ * @data: pointer to cpufreq_policy on which timer was queued
+ *
+ * This handler brings down the global pstate closer to the local pstate
+ * according quadratic equation. Queues a new timer if it is still not equal
+ * to local pstate
+ */
+void gpstate_timer_handler(unsigned long data)
+{
+   struct cpufreq_policy *policy = (struct cpufreq_policy *) data;

no need to cast.


May be i need a cast here,  because data is unsigned long ( unlike other places 
where its void *).
On building without cast, it throws me a warning.


+   struct global_pstate_info *gpstates = (struct global_pstate_info *)
+   struct powernv_smp_call_data freq_data;
+   int ret;
+
+   ret = spin_trylock(>gpstate_lock);

no need of 'ret' for just this, simply do: if (!spin_trylock())...


Sure will do that.


a

+   if (!ret)
+   return;
+
+   gpstates->last_sampled_time += time_diff;
+   gpstates->elapsed_time += time_diff;
+   freq_data.pstate_id = gpstates->last_lpstate;
+   if ((gpstates->last_gpstate == freq_data.pstate_id) ||
+   (gpstates->elapsed_time > MAX_RAMP_DOWN_TIME)) {
+   freq_data.gpstate_id = freq_data.pstate_id;
+   reset_gpstates(policy);
+   gpstates->highest_lpstate = freq_data.pstate_id;
+   } else {
+   freq_data.gpstate_id = calculate_global_pstate(

You can't break a line after ( of a function call :)

Let it go beyond 80 columns if it has to.


May be i will try to get it inside 80 columns with a temporary variable instead 
of
freq_data.gpstate_id.


+   gpstates->elapsed_time, gpstates->highest_lpstate,
+   freq_data.pstate_id);
+   }
+
+   /* If local pstate is equal to global pstate, rampdown is over

Bad style again.


+* So timer is not required to be queued.
+*/
+   if (freq_data.gpstate_id != freq_data.pstate_id)
+   ret = queue_gpstate_timer(gpstates);

ret not used.


Should i make it void instead of returning int?, as i cannot do much even if it 
fails, except for notifying.


+gpstates_done:
+   gpstates->last_sampled_time = cur_msec;
+   gpstates->last_gpstate = freq_data.gpstate_id;
+   gpstates->last_lpstate = freq_data.pstate_id;
+
/*
 * Use smp_call_function to send IPI and execute the
 * mtspr on target CPU.  We could do that without IPI
 * if current CPU is within policy->cpus (core)
 */
smp_call_function_any(policy->cpus, set_pstate, _data, 1);
+   spin_unlock_irqrestore(>gpstate_lock, flags);
+   return 0;
+}
  
+static int powernv_cpufreq_cpu_exit(struct cpufreq_policy *policy)

Add this after the init() routine.


Ok will do it.


+   policy->driver_data = gpstates;
+
+   /* initialize timer */
+   init_timer_deferrable(>timer);
+   gpstates->timer.data = (unsigned long) policy;
+   gpstates->timer.function = gpstate_timer_handler;
+   gpstates->timer.expires = jiffies +
+   msecs_to_jiffies(GPSTATE_TIMER_INTERVAL);
+
+   pr_info("Added global_pstate_info & timer for %d cpu\n", base);
return cpufreq_table_validate_and_show(policy, powernv_freqs);

Who will free gpstates if this fails ?


Thanks for pointing out.

[PATCH 2/2] cpufreq: powernv: Ramp-down global pstate slower than local-pstate

2016-04-12 Thread Akshay Adiga

   cleanup 292.2   263.0   11.10
25000   update  32293.4 90255.0 -64.22

35000   Read34783.0 33119.0 5.02
35000   cleanup 321.2   395.8   -18.8
35000   update  36047.0 38747.8 -6.97

4   Read38562.2 42357.4 -8.96
4   cleanup 371.8   384.6   -3.33
4   update  27861.4 41547.8 -32.94

45000   Read42271.0 88120.6 -52.03
45000   cleanup 263.6   383.0   -31.17
45000   update  29755.8 81359.0 -63.43

(test without target op/s)
47659   Read83061.4 136440.6-39.12
47659   cleanup 195.8   193.8   1.03
47659   update  73429.4 124971.8-41.24

Signed-off-by: Akshay Adiga <akshay.ad...@linux.vnet.ibm.com>
Reviewed-by: Gautham R. Shenoy <e...@linux.vnet.ibm.com>
---
 drivers/cpufreq/powernv-cpufreq.c | 239 +-
 1 file changed, 237 insertions(+), 2 deletions(-)

diff --git a/drivers/cpufreq/powernv-cpufreq.c 
b/drivers/cpufreq/powernv-cpufreq.c
index e2e2219..288fa10 100644
--- a/drivers/cpufreq/powernv-cpufreq.c
+++ b/drivers/cpufreq/powernv-cpufreq.c
@@ -36,12 +36,58 @@
 #include 
 #include  /* Required for cpu_sibling_mask() in UP configs */
 #include 
+#include 
 
 #define POWERNV_MAX_PSTATES256
 #define PMSR_PSAFE_ENABLE  (1UL << 30)
 #define PMSR_SPR_EM_DISABLE(1UL << 31)
 #define PMSR_MAX(x)((x >> 32) & 0xFF)
 
+/*
+ * Quadratic equation which gives the percentage rampdown for time elapsed in
+ * milliseconds. time can be between 0 and MAX_RAMP_DOWN_TIME ( milliseconds )
+ * This equation approximates to y = -4e-6 x^2
+ *
+ * At 0 seconds x= ramp_down_percent=0
+ * At MAX_RAMP_DOWN_TIME x=5120 ramp_down_percent=100
+ */
+#define MAX_RAMP_DOWN_TIME 5120
+#define ramp_down_percent(time)((time * time)>>18)
+
+/*Interval after which the timer is queued to bring down global pstate*/
+#define GPSTATE_TIMER_INTERVAL 2000
+/*
+ * global_pstate_info :
+ * per policy data structure to maintain history of global pstates
+ *
+ * @highest_lpstate : the local pstate from which we are ramping down
+ * @elapsed_time : time in ms spent in ramping down from highest_lpstate
+ * @last_sampled_time : time from boot in ms when global pstates were last set
+ * @last_lpstate , last_gpstate : last set values for local and global pstates
+ * @timer : is used for ramping down if cpu goes idle for a long time with
+ * global pstate held high
+ * @gpstate_lock : a spinlock to maintain synchronization between routines
+ * called by the timer handler and governer's target_index calls
+ */
+struct global_pstate_info {
+   int highest_lpstate;
+   unsigned int elapsed_time;
+   unsigned int last_sampled_time;
+   int last_lpstate;
+   int last_gpstate;
+   spinlock_t gpstate_lock;
+   struct timer_list timer;
+};
+
+/*
+ * While resetting we don't want "timer" fields to be set to zero as we
+ * may lose track of timer and will not be able to cleanly remove it
+ */
+#define reset_gpstates(policy)   memset(policy->driver_data, 0,\
+   sizeof(struct global_pstate_info)-\
+   sizeof(struct timer_list)-\
+   sizeof(spinlock_t))
+
 static struct cpufreq_frequency_table powernv_freqs[POWERNV_MAX_PSTATES+1];
 static bool rebooting, throttled, occ_reset;
 
@@ -285,6 +331,7 @@ static inline void set_pmspr(unsigned long sprn, unsigned 
long val)
 struct powernv_smp_call_data {
unsigned int freq;
int pstate_id;
+   int gpstate_id;
 };
 
 /*
@@ -348,14 +395,17 @@ static void set_pstate(void *freq_data)
unsigned long val;
unsigned long pstate_ul =
((struct powernv_smp_call_data *) freq_data)->pstate_id;
+   unsigned long gpstate_ul =
+   ((struct powernv_smp_call_data *) freq_data)->gpstate_id;
 
val = get_pmspr(SPRN_PMCR);
val = val & 0xULL;
 
pstate_ul = pstate_ul & 0xFF;
+   gpstate_ul = gpstate_ul & 0xFF;
 
/* Set both global(bits 56..63) and local(bits 48..55) PStates */
-   val = val | (pstate_ul << 56) | (pstate_ul << 48);
+   val = val | (gpstate_ul << 56) | (pstate_ul << 48);
 
pr_debug("Setting cpu %d pmcr to %016lX\n",
raw_smp_processor_id(), val);
@@ -425,6 +475,109 @@ next:
 }
 
 /*
+ * calcuate_global_pstate:
+ *
+ * @elapsed_time : elapsed time in milliseconds
+ * @local_pstate : new local pstate
+ * @highest_lpstate : pstate from which its ramping down
+ *

[PATCH 1/2] cpufreq: powernv: Remove flag use-case of policy->driver_data

2016-04-12 Thread Akshay Adiga

From: Shilpasri G Bhat <shilpa.b...@linux.vnet.ibm.com>

'commit 1b0289848d5d ("cpufreq: powernv: Add sysfs attributes to show throttle
stats")' used policy->driver_data as a flag for one-time creation
of throttle sysfs files. Instead of this use 'kernfs_find_and_get()'
to check if the attribute already exists. This is required as
policy->driver_data is used for other purposes in the later patch.

Signed-off-by: Shilpasri G Bhat <shilpa.b...@linux.vnet.ibm.com>
Signed-off-by: Akshay Adiga <akshay.ad...@linux.vnet.ibm.com>
Reviewed-by: Shreyas B. Prabhu <shre...@linux.vnet.ibm.com>
---
 drivers/cpufreq/powernv-cpufreq.c | 11 +--
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/drivers/cpufreq/powernv-cpufreq.c 
b/drivers/cpufreq/powernv-cpufreq.c
index 39ac78c..e2e2219 100644
--- a/drivers/cpufreq/powernv-cpufreq.c
+++ b/drivers/cpufreq/powernv-cpufreq.c
@@ -455,13 +455,15 @@ static int powernv_cpufreq_target_index(struct 
cpufreq_policy *policy,
 static int powernv_cpufreq_cpu_init(struct cpufreq_policy *policy)
 {
int base, i;
+   struct kernfs_node *kn;
 
base = cpu_first_thread_sibling(policy->cpu);
 
for (i = 0; i < threads_per_core; i++)
cpumask_set_cpu(base + i, policy->cpus);
 
-   if (!policy->driver_data) {
+   kn = kernfs_find_and_get(policy->kobj.sd, throttle_attr_grp.name);
+   if (!kn) {
int ret;
 
ret = sysfs_create_group(>kobj, _attr_grp);
@@ -470,11 +472,8 @@ static int powernv_cpufreq_cpu_init(struct cpufreq_policy 
*policy)
policy->cpu);
return ret;
}
-   /*
-* policy->driver_data is used as a flag for one-time
-* creation of throttle sysfs files.
-*/
-   policy->driver_data = policy;
+   } else {
+   kernfs_put(kn);
}
return cpufreq_table_validate_and_show(policy, powernv_freqs);
 }
-- 
2.5.5

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH 0/2] cpufreq: powernv: Ramp-down global pstate slower than local-pstate

2016-04-12 Thread Akshay Adiga

The frequency transition latency from pmin to pmax is observed to be in few 
millisecond granurality. And it usually happens to take a performance penalty
during sudden frequency rampup requests. 

This patch set solves this problem by using a chip-level entity called "global
 pstates". Global pstate manages elements across other dependent core chiplets.
Typically, the element that needs to be managed is the voltage setting.
So by holding global pstates higher than local pstate for some amount of time 
( ~5 seconds) the subsequent rampups could be made faster.

(1/2) patch removes the flag from cpufreq_policy->driver_data, so that it can
be used for tracking global pstates.

(2/2) patch adds code for global pstate management. 
- The iozone results with this patchset, shows improvements in almost all cases.
- YCSB workload on redis with various  target operations per second shows 
better MaxLatency with this patch.

Akshay Adiga (1):
  powernv: Ramp-down global pstate slower than local-pstate

Shilpasri G Bhat (1):
  cpufreq: powernv: Remove flag use-case of policy->driver_data

 drivers/cpufreq/powernv-cpufreq.c | 250 --
 1 file changed, 242 insertions(+), 8 deletions(-)

-- 
2.5.5

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

86 matches

Mail list logo