Re: [PATCH] powerpc/perf: Fix for core/nest imc call trace on cpuhotplug

2017-10-05 Thread Anju T Sudhakar

Hi Santosh,


On Thursday 05 October 2017 03:20 PM, Santosh Sivaraj wrote:

> * Anju T Sudhakar wrote (on 2017-10-04 06:50:52 +):
>
>> Nest/core pmu units are enabled only when it is used. A reference count is
>> maintained for the events which uses the nest/core pmu units. Currently in
>> *_imc_counters_release function a WARN() is used for notification of any
>> underflow of ref count.
>>
>> The case where event ref count hit a negative value is, when perf session is
>> started, followed by offlining of all cpus in a given core.
>> i.e. in cpuhotplug offline path ppc_core_imc_cpu_offline() function set the
>> ref->count to zero, if the current cpu which is about to offline is the last
>> cpu in a given core and make an OPAL call to disable the engine in that core.
>> And on perf session termination, perf->destroy (core_imc_counters_release) will
>> first decrement the ref->count for this core and based on the ref->count value
>> an opal call is made to disable the core-imc engine.
>> Now, since cpuhotplug path already clears the ref->count for core and disabled
>> the engine, perf->destroy() decrementing again at event termination make it
>> negative which in turn fires the WARN_ON. The same happens for nest units.
>>
>> Add a check to see if the reference count is already zero, before decrementing
>> the count, so that the ref count will not hit a negative value.
>>
>> Signed-off-by: Anju T Sudhakar
>
> Reviewed-by: Santosh Sivaraj


Thanks for reviewing.

-Anju

>> ---
>>  arch/powerpc/perf/imc-pmu.c | 28
>>  1 file changed, 28 insertions(+)
>>
>> diff --git a/arch/powerpc/perf/imc-pmu.c b/arch/powerpc/perf/imc-pmu.c
>> index 9ccac86f3463..e3a1f65933b5 100644
>> --- a/arch/powerpc/perf/imc-pmu.c
>> +++ b/arch/powerpc/perf/imc-pmu.c
>> @@ -399,6 +399,20 @@ static void nest_imc_counters_release(struct perf_event *event)
>>  
>>  	/* Take the mutex lock for this node and then decrement the reference count */
>>  	mutex_lock(&ref->lock);
>> +	if (ref->refc == 0) {
>> +		/*
>> +		 * The scenario where this is true is, when perf session is
>> +		 * started, followed by offlining of all cpus in a given node.
>> +		 *
>> +		 * In the cpuhotplug offline path, ppc_nest_imc_cpu_offline()
>> +		 * function set the ref->count to zero, if the cpu which is
>> +		 * about to offline is the last cpu in a given node and make
>> +		 * an OPAL call to disable the engine in that node.
>> +		 *
>> +		 */
>> +		mutex_unlock(&ref->lock);
>> +		return;
>> +	}
>>  	ref->refc--;
>>  	if (ref->refc == 0) {
>>  		rc = opal_imc_counters_stop(OPAL_IMC_COUNTERS_NEST,
>> @@ -646,6 +660,20 @@ static void core_imc_counters_release(struct perf_event *event)
>>  		return;
>>  
>>  	mutex_lock(&ref->lock);
>> +	if (ref->refc == 0) {
>> +		/*
>> +		 * The scenario where this is true is, when perf session is
>> +		 * started, followed by offlining of all cpus in a given core.
>> +		 *
>> +		 * In the cpuhotplug offline path, ppc_core_imc_cpu_offline()
>> +		 * function set the ref->count to zero, if the cpu which is
>> +		 * about to offline is the last cpu in a given core and make
>> +		 * an OPAL call to disable the engine in that core.
>> +		 *
>> +		 */
>> +		mutex_unlock(&ref->lock);
>> +		return;
>> +	}
>>  	ref->refc--;
>>  	if (ref->refc == 0) {
>>  		rc = opal_imc_counters_stop(OPAL_IMC_COUNTERS_CORE,








Re: [PATCH] powerpc/perf: Fix for core/nest imc call trace on cpuhotplug

2017-10-05 Thread Santosh Sivaraj
* Anju T Sudhakar wrote (on 2017-10-04 06:50:52 +):

> Nest/core pmu units are enabled only when it is used. A reference count is
> maintained for the events which uses the nest/core pmu units. Currently in
> *_imc_counters_release function a WARN() is used for notification of any
> underflow of ref count.
>
> The case where event ref count hit a negative value is, when perf session is
> started, followed by offlining of all cpus in a given core.
> i.e. in cpuhotplug offline path ppc_core_imc_cpu_offline() function set the
> ref->count to zero, if the current cpu which is about to offline is the last
> cpu in a given core and make an OPAL call to disable the engine in that core.
> And on perf session termination, perf->destroy (core_imc_counters_release) will
> first decrement the ref->count for this core and based on the ref->count value
> an opal call is made to disable the core-imc engine.
> Now, since cpuhotplug path already clears the ref->count for core and disabled
> the engine, perf->destroy() decrementing again at event termination make it
> negative which in turn fires the WARN_ON. The same happens for nest units.
>
> Add a check to see if the reference count is already zero, before decrementing
> the count, so that the ref count will not hit a negative value.
>
> Signed-off-by: Anju T Sudhakar

Reviewed-by: Santosh Sivaraj 
> ---
>  arch/powerpc/perf/imc-pmu.c | 28
>  1 file changed, 28 insertions(+)
>
> diff --git a/arch/powerpc/perf/imc-pmu.c b/arch/powerpc/perf/imc-pmu.c
> index 9ccac86f3463..e3a1f65933b5 100644
> --- a/arch/powerpc/perf/imc-pmu.c
> +++ b/arch/powerpc/perf/imc-pmu.c
> @@ -399,6 +399,20 @@ static void nest_imc_counters_release(struct perf_event *event)
>  
>  	/* Take the mutex lock for this node and then decrement the reference count */
>  	mutex_lock(&ref->lock);
> +	if (ref->refc == 0) {
> +		/*
> +		 * The scenario where this is true is, when perf session is
> +		 * started, followed by offlining of all cpus in a given node.
> +		 *
> +		 * In the cpuhotplug offline path, ppc_nest_imc_cpu_offline()
> +		 * function set the ref->count to zero, if the cpu which is
> +		 * about to offline is the last cpu in a given node and make
> +		 * an OPAL call to disable the engine in that node.
> +		 *
> +		 */
> +		mutex_unlock(&ref->lock);
> +		return;
> +	}
>  	ref->refc--;
>  	if (ref->refc == 0) {
>  		rc = opal_imc_counters_stop(OPAL_IMC_COUNTERS_NEST,
> @@ -646,6 +660,20 @@ static void core_imc_counters_release(struct perf_event *event)
>  		return;
>  
>  	mutex_lock(&ref->lock);
> +	if (ref->refc == 0) {
> +		/*
> +		 * The scenario where this is true is, when perf session is
> +		 * started, followed by offlining of all cpus in a given core.
> +		 *
> +		 * In the cpuhotplug offline path, ppc_core_imc_cpu_offline()
> +		 * function set the ref->count to zero, if the cpu which is
> +		 * about to offline is the last cpu in a given core and make
> +		 * an OPAL call to disable the engine in that core.
> +		 *
> +		 */
> +		mutex_unlock(&ref->lock);
> +		return;
> +	}
>  	ref->refc--;
>  	if (ref->refc == 0) {
>  		rc = opal_imc_counters_stop(OPAL_IMC_COUNTERS_CORE,
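
The check added in both hunks follows a general pattern: test for zero while holding the lock and return early instead of decrementing. Below is a minimal user-space sketch of that pattern, not the kernel code. It assumes a pthread mutex in place of the kernel mutex and a plain counter in place of opal_imc_counters_stop(); struct imc_ref_sketch, stop_calls and sketch_release_guarded() are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-node/per-core reference structure. */
struct imc_ref_sketch {
	pthread_mutex_t lock;
	int refc;        /* number of events currently using the engine */
	int stop_calls;  /* counts simulated "stop the engine" requests */
};

/*
 * Drop one reference. Returns true if a reference was dropped, false if
 * the count was already zero (for example, cleared by another path).
 */
bool sketch_release_guarded(struct imc_ref_sketch *ref)
{
	pthread_mutex_lock(&ref->lock);

	/* Invariant the guard is meant to preserve: never negative. */
	assert(ref->refc >= 0);

	if (ref->refc == 0) {
		/* Nothing left to release; do not decrement below zero. */
		pthread_mutex_unlock(&ref->lock);
		return false;
	}

	ref->refc--;
	if (ref->refc == 0)
		ref->stop_calls++;  /* stand-in for the engine stop request */

	pthread_mutex_unlock(&ref->lock);
	return true;
}
```

With a structure initialised to refc = 1, the first call drops the count to zero and records one stop request; a second call returns false and leaves both fields unchanged.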

-- 




Re: [PATCH] powerpc/perf: Fix for core/nest imc call trace on cpuhotplug

2017-09-25 Thread Anju T Sudhakar

Hi mpe,


On Thursday 21 September 2017 10:04 AM, Michael Ellerman wrote:

> Anju T Sudhakar writes:
>
>> Nest/core pmu units are enabled only when it is used. A reference count is
>> maintained for the events which uses the nest/core pmu units. Currently in
>> *_imc_counters_release function a WARN() is used for notification of any
>> underflow of ref count. Replace WARN() with a pr_info since it is an overkill.
>
> As discussed elsewhere this is not the right solution.
>
> If it's OK for the reference count to be negative, then we shouldn't
> print anything when it is.
>
> But I don't understand how it can be OK for the refcount to be negative.
> That means someone has a negative number of references to something?
>
> cheers



The scenario where this happens is a stress test in which a perf session is
started, all cpus in a given core are then offlined, and finally the perf
session is terminated.

In the cpuhotplug offline path, ppc_core_imc_cpu_offline() sets the
ref->count to zero if the cpu which is about to go offline is the last cpu
in a given core, and makes an OPAL call to disable the engine in that core.
On perf session termination, perf->destroy (core_imc_counters_release)
first decrements the ref->count for this core and, based on the ref->count
value, an OPAL call is made to disable the core-imc engine.

Now, since the cpuhotplug path has already cleared the ref->count for the
core and disabled the engine, perf->destroy() decrementing again at event
termination makes it negative, which in turn fires the WARN_ON.
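
The sequence described above can be replayed in user space. This is only a minimal sketch, not the kernel code: struct imc_ref_sketch and every sketch_* function are hypothetical stand-ins, a boolean replaces the OPAL enable/disable calls, and locking is left out because the replay is single-threaded.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-core reference structure. */
struct imc_ref_sketch {
	int refc;        /* number of events currently using the engine */
	bool engine_on;  /* true while the simulated engine is enabled */
};

/* Event init: the first user enables the engine, then takes a reference. */
static void sketch_event_init(struct imc_ref_sketch *ref)
{
	if (ref->refc == 0)
		ref->engine_on = true;
	ref->refc++;
}

/* Last cpu of the core goes offline: count cleared, engine disabled. */
static void sketch_last_cpu_offline(struct imc_ref_sketch *ref)
{
	ref->refc = 0;
	ref->engine_on = false;
}

/* Release with no zero check, i.e. the behaviour that triggers the warning. */
static void sketch_release_unguarded(struct imc_ref_sketch *ref)
{
	ref->refc--;
	if (ref->refc == 0)
		ref->engine_on = false;
}

/* Replays the stress-test ordering and returns the final count. */
int sketch_replay_underflow(void)
{
	struct imc_ref_sketch ref = { .refc = 0, .engine_on = false };

	sketch_event_init(&ref);          /* perf session starts: 0 -> 1 */
	assert(ref.refc == 1 && ref.engine_on);

	sketch_last_cpu_offline(&ref);    /* hotplug path clears: 1 -> 0 */
	assert(ref.refc == 0 && !ref.engine_on);

	sketch_release_unguarded(&ref);   /* perf session ends: 0 -> -1 */
	return ref.refc;
}
```

In this replay sketch_replay_underflow() returns -1: the offline path and the release path each account for the same single reference.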


So we would prefer to remove the message, as this won't happen in normal
operation and the core counters are working as expected.

I will send out a patch removing the message asap.


Thanks,
Anju








Re: [PATCH] powerpc/perf: Fix for core/nest imc call trace on cpuhotplug

2017-09-20 Thread Michael Ellerman
Anju T Sudhakar  writes:

> Nest/core pmu units are enabled only when it is used. A reference count is
> maintained for the events which uses the nest/core pmu units. Currently in
> *_imc_counters_release function a WARN() is used for notification of any
> underflow of ref count. Replace WARN() with a pr_info since it is an overkill.

As discussed elsewhere this is not the right solution.

If it's OK for the reference count to be negative, then we shouldn't
print anything when it is.

But I don't understand how it can be OK for the refcount to be negative.
That means someone has a negative number of references to something?

cheers

