Re: [PATCH] Fix the message in facility unavailable exception

2016-11-06 Thread Michael Ellerman
Balbir Singh  writes:

> I ran into this during some testing on qemu. The current
> facility_strings[] are correct when the trap address is
> 0xf80 (hypervisor facility unavailable). When the trap
> address is 0xf60, IC (Interruption Cause) a.k.a status
> in the code is undefined for values 0 and 1.

OK. But how did you generate an exception with an undefined status code?

> This patch
> adds a check to prevent printing the wrong information
> and helps better direct debugging effort.
>
> diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
> index d26605d..da0f634 100644
> --- a/arch/powerpc/kernel/traps.c
> +++ b/arch/powerpc/kernel/traps.c
> @@ -1520,8 +1520,14 @@ void facility_unavailable_exception(struct pt_regs *regs)
>   }
>  
>   if ((status < ARRAY_SIZE(facility_strings)) &&
> - facility_strings[status])
> - facility = facility_strings[status];
> + facility_strings[status]) {
> + if (!hv && status < 2) {
> + pr_warn("Unexpected facility unavailable exception "
> + "interruption cause %d\n", status);

Please don't add un-ratelimited printks() in this function, otherwise if
they're user triggerable (which some are) it gives the user a way to
scrub the kernel log.
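
To illustrate the rate-limiting concern, here is a minimal user-space sketch, not kernel code. In the kernel itself the pr_warn_ratelimited() family of helpers would normally be used instead. The struct ratelimit type, the ratelimit_allow() helper and the explicit "now" tick argument are all hypothetical names introduced only for this example.

```c
#include <stdbool.h>

/*
 * Hypothetical user-space model of a message rate limiter:
 * at most `burst` messages are allowed per `interval` ticks.
 */
struct ratelimit {
	unsigned long interval;     /* window length in ticks */
	unsigned int burst;         /* messages allowed per window */
	unsigned long window_start; /* tick at which the current window began */
	unsigned int printed;       /* messages emitted in the current window */
};

/* Returns true if a message may be emitted at tick `now`. */
static bool ratelimit_allow(struct ratelimit *rl, unsigned long now)
{
	/* Open a fresh window once the previous one has expired. */
	if (now - rl->window_start >= rl->interval) {
		rl->window_start = now;
		rl->printed = 0;
	}

	if (rl->printed >= rl->burst)
		return false;

	rl->printed++;
	return true;
}
```

With interval = 5 and burst = 2, a caller that tries to log on every tick gets two messages through per window, so a user who can trigger the exception at will cannot push older entries out of the log at an unbounded rate.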

> + facility = "Unknown";
> + } else
> + facility = facility_strings[status];
> + }

I think we should instead tighten the condition on that top-level if, and
have an else clause for all cases that uses "Unknown". eg.

	if ((hv || status >= 2) &&
	    (status < ARRAY_SIZE(facility_strings)) &&
	    facility_strings[status])
	{
		facility = facility_strings[status];
	} else {
		facility = "Unknown";
	}

And then if you want to we can also print the hex status value in the
existing printk().
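
The tightened condition can be exercised outside the kernel with a small stand-alone sketch. The table contents below are illustrative placeholders rather than the real facility_strings[] from traps.c, and facility_name() is a hypothetical helper that only mirrors the suggested if/else.

```c
#include <stdbool.h>
#include <stddef.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Illustrative table only; the real facility_strings[] lives in traps.c. */
static const char *const facility_strings[] = {
	[0] = "FPU",
	[1] = "VMX/VSX",
	[2] = "DSCR",
	[3] = "PMU SPRs",
	/* index 4 is deliberately left NULL to model a hole in the table */
	[5] = "TM",
};

/*
 * Mirrors the suggested condition: status values 0 and 1 are only
 * meaningful for the hypervisor variant of the exception (hv == true).
 * Everything else that cannot be looked up falls back to "Unknown".
 */
static const char *facility_name(bool hv, unsigned int status)
{
	if ((hv || status >= 2) &&
	    status < ARRAY_SIZE(facility_strings) &&
	    facility_strings[status])
		return facility_strings[status];

	return "Unknown";
}
```

With this shape there is a single fallback path, so out-of-range values, holes in the table and the undefined non-hypervisor codes 0 and 1 all report "Unknown" without an extra printk.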

cheers


[PATCH v2 1/2] cpufreq: powernv: Adding fast_switch for schedutil

2016-11-06 Thread Akshay Adiga
Adding fast_switch which does light weight operation to set the desired
pstate. Both global and local pstates are set to the same desired pstate.

Signed-off-by: Akshay Adiga 
---
Changes from v1 :
- Removed unnecessary check for index out of bound.

 drivers/cpufreq/powernv-cpufreq.c | 20 +++-
 1 file changed, 19 insertions(+), 1 deletion(-)

diff --git a/drivers/cpufreq/powernv-cpufreq.c b/drivers/cpufreq/powernv-cpufreq.c
index d3ffde8..4a4380d 100644
--- a/drivers/cpufreq/powernv-cpufreq.c
+++ b/drivers/cpufreq/powernv-cpufreq.c
@@ -752,9 +752,12 @@ static int powernv_cpufreq_cpu_init(struct cpufreq_policy *policy)
spin_lock_init(&gpstates->gpstate_lock);
ret = cpufreq_table_validate_and_show(policy, powernv_freqs);
 
-   if (ret < 0)
+   if (ret < 0) {
kfree(policy->driver_data);
+   return ret;
+   }
 
+   policy->fast_switch_possible = true;
return ret;
 }
 
@@ -897,6 +900,20 @@ static void powernv_cpufreq_stop_cpu(struct cpufreq_policy *policy)
del_timer_sync(&gpstates->timer);
 }
 
+static unsigned int powernv_fast_switch(struct cpufreq_policy *policy,
+   unsigned int target_freq)
+{
+   int index;
+   struct powernv_smp_call_data freq_data;
+
+   index = cpufreq_table_find_index_dl(policy, target_freq);
+   freq_data.pstate_id = powernv_freqs[index].driver_data;
+   freq_data.gpstate_id = powernv_freqs[index].driver_data;
+   set_pstate(&freq_data);
+
+   return powernv_freqs[index].frequency;
+}
+
 static struct cpufreq_driver powernv_cpufreq_driver = {
.name   = "powernv-cpufreq",
.flags  = CPUFREQ_CONST_LOOPS,
@@ -904,6 +921,7 @@ static struct cpufreq_driver powernv_cpufreq_driver = {
.exit   = powernv_cpufreq_cpu_exit,
.verify = cpufreq_generic_frequency_table_verify,
.target_index   = powernv_cpufreq_target_index,
+   .fast_switch    = powernv_fast_switch,
.get= powernv_cpufreq_get,
.stop_cpu   = powernv_cpufreq_stop_cpu,
.attr   = powernv_cpu_freq_attr,
-- 
2.5.5
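
The lookup-then-set flow of powernv_fast_switch() can be modelled in user space as follows. This is only a sketch under stated assumptions: the table is taken to be sorted in descending frequency order and non-empty, find_index_dl() reflects my reading of cpufreq_table_find_index_dl() (lowest frequency at or above the target), and struct freq_entry, struct pstate_request and fast_switch_model() are hypothetical stand-ins for the driver's own types.

```c
#include <stddef.h>

/* Hypothetical stand-in for one cpufreq table entry. */
struct freq_entry {
	unsigned int frequency; /* kHz */
	int pstate_id;          /* plays the role of driver_data */
};

/* Hypothetical stand-in for the data handed to set_pstate(). */
struct pstate_request {
	int pstate_id;
	int gpstate_id;
};

/*
 * Return the index of the lowest frequency at or above `target` in a
 * table sorted in descending order. If every entry is below the target,
 * the first (highest) entry is returned. Returns -1 for an empty table.
 */
static int find_index_dl(const struct freq_entry *table, size_t n,
			 unsigned int target)
{
	int best = -1;

	for (size_t i = 0; i < n; i++) {
		if (table[i].frequency == target)
			return (int)i;
		if (table[i].frequency > target) {
			best = (int)i;
			continue;
		}
		/* First entry below the target: use it only if nothing was above. */
		return best == -1 ? (int)i : best;
	}
	return best;
}

/*
 * Model of the fast switch path: pick an entry and request the same
 * value for both the local and the global pstate. Assumes n > 0.
 */
static unsigned int fast_switch_model(const struct freq_entry *table, size_t n,
				      unsigned int target,
				      struct pstate_request *req)
{
	int index = find_index_dl(table, n, target);

	req->pstate_id = table[index].pstate_id;
	req->gpstate_id = table[index].pstate_id;
	return table[index].frequency;
}
```

Because the lookup always lands on a valid entry for a non-empty table, the model needs no separate bounds check, which matches the v2 changelog entry above.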



[PATCH v2 2/2] cpufreq: powernv: Use PMCR to verify global and local pstate

2016-11-06 Thread Akshay Adiga
As fast_switch() may get called with interrupts disabled, we cannot
hold a mutex to update the global_pstate_info. So currently, fast_switch()
does not update the global_pstate_info and it will end up with stale data
whenever the pstate is updated through fast_switch().

As the gpstate_timer can fire after fast_switch() has updated the pstates,
the timer handler cannot rely on the cached values of local and global
pstate and needs to read them from the PMCR.

Only gpstate_timer_handler() is affected by the stale cached pstate data,
because either fast_switch() or target_index() routines will be called
for a given governor, but gpstate_timer can fire after the governor has
changed to schedutil.


Signed-off-by: Akshay Adiga 
---

Changes from v1 :
- Corrected Commit message
- Type cast pstate values read from PMCR to type s8
- Added Macros to get local and global pstates from PMCR


 drivers/cpufreq/powernv-cpufreq.c | 34 --
 1 file changed, 24 insertions(+), 10 deletions(-)

diff --git a/drivers/cpufreq/powernv-cpufreq.c b/drivers/cpufreq/powernv-cpufreq.c
index 4a4380d..bf4bc585 100644
--- a/drivers/cpufreq/powernv-cpufreq.c
+++ b/drivers/cpufreq/powernv-cpufreq.c
@@ -42,6 +42,8 @@
 #define PMSR_PSAFE_ENABLE  (1UL << 30)
 #define PMSR_SPR_EM_DISABLE (1UL << 31)
 #define PMSR_MAX(x)         ((x >> 32) & 0xFF)
+#define PMCR_LPSTATE(x)     (((x) >> 48) & 0xFF)
+#define PMCR_GPSTATE(x)     (((x) >> 56) & 0xFF)
 
 #define MAX_RAMP_DOWN_TIME 5120
 /*
@@ -592,7 +594,8 @@ void gpstate_timer_handler(unsigned long data)
 {
struct cpufreq_policy *policy = (struct cpufreq_policy *)data;
struct global_pstate_info *gpstates = policy->driver_data;
-   int gpstate_idx;
+   int gpstate_idx, lpstate_idx;
+   unsigned long val;
unsigned int time_diff = jiffies_to_msecs(jiffies)
- gpstates->last_sampled_time;
struct powernv_smp_call_data freq_data;
@@ -600,21 +603,36 @@ void gpstate_timer_handler(unsigned long data)
if (!spin_trylock(&gpstates->gpstate_lock))
return;
 
+   /*
+* If PMCR was last updated using fast_switch, then
+* gpstates->last_lpstate_idx may hold a stale value.
+* Hence, read from PMCR to get correct data.
+*/
+   val = get_pmspr(SPRN_PMCR);
+   freq_data.gpstate_id = (s8)PMCR_GPSTATE(val);
+   freq_data.pstate_id = (s8)PMCR_LPSTATE(val);
+   if (freq_data.gpstate_id  == freq_data.pstate_id) {
+   reset_gpstates(policy);
+   spin_unlock(&gpstates->gpstate_lock);
+   return;
+   }
+
gpstates->last_sampled_time += time_diff;
gpstates->elapsed_time += time_diff;
-   freq_data.pstate_id = idx_to_pstate(gpstates->last_lpstate_idx);
 
-   if ((gpstates->last_gpstate_idx == gpstates->last_lpstate_idx) ||
-   (gpstates->elapsed_time > MAX_RAMP_DOWN_TIME)) {
+   if (gpstates->elapsed_time > MAX_RAMP_DOWN_TIME) {
gpstate_idx = pstate_to_idx(freq_data.pstate_id);
reset_gpstates(policy);
gpstates->highest_lpstate_idx = gpstate_idx;
} else {
+   lpstate_idx = pstate_to_idx(freq_data.pstate_id);
gpstate_idx = calc_global_pstate(gpstates->elapsed_time,
 gpstates->highest_lpstate_idx,
-gpstates->last_lpstate_idx);
+lpstate_idx);
}
-
+   freq_data.gpstate_id = idx_to_pstate(gpstate_idx);
+   gpstates->last_gpstate_idx = gpstate_idx;
+   gpstates->last_lpstate_idx = lpstate_idx;
/*
 * If local pstate is equal to global pstate, rampdown is over
 * So timer is not required to be queued.
@@ -622,10 +640,6 @@ void gpstate_timer_handler(unsigned long data)
if (gpstate_idx != gpstates->last_lpstate_idx)
queue_gpstate_timer(gpstates);
 
-   freq_data.gpstate_id = idx_to_pstate(gpstate_idx);
-   gpstates->last_gpstate_idx = pstate_to_idx(freq_data.gpstate_id);
-   gpstates->last_lpstate_idx = pstate_to_idx(freq_data.pstate_id);
-
spin_unlock(&gpstates->gpstate_lock);
 
/* Timer may get migrated to a different cpu on cpu hot unplug */
-- 
2.5.5
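
The PMCR decoding added by this patch can be tried in isolation. The sketch below reuses the two macros from the diff; struct pstates, pmcr_decode() and rampdown_done() are hypothetical helpers introduced for the example, and the sample register values are made up. The int8_t cast stands in for the kernel's s8, which the patch presumably uses because pstate ids may be negative.

```c
#include <stdint.h>

/* Field extractors as added by the patch. */
#define PMCR_LPSTATE(x) (((x) >> 48) & 0xFF)
#define PMCR_GPSTATE(x) (((x) >> 56) & 0xFF)

/* Hypothetical container for the two decoded pstate ids. */
struct pstates {
	int8_t local;
	int8_t global;
};

/*
 * Decode local and global pstate from a raw PMCR value. The 8-bit
 * fields are reinterpreted as signed values, as the patch does with s8.
 * (The narrowing conversion relies on the usual two's complement
 * behaviour of mainstream compilers.)
 */
static struct pstates pmcr_decode(uint64_t pmcr)
{
	struct pstates p;

	p.global = (int8_t)PMCR_GPSTATE(pmcr);
	p.local = (int8_t)PMCR_LPSTATE(pmcr);
	return p;
}

/* The timer handler treats global == local as "ramp-down finished". */
static int rampdown_done(uint64_t pmcr)
{
	struct pstates p = pmcr_decode(pmcr);

	return p.global == p.local;
}
```

For a made-up value such as 0xFEFD000000000000 the global pstate decodes to -2 and the local pstate to -3, so the ramp-down would still be in progress and the handler would carry on instead of resetting.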



Re: [PATCH v2] ppc: cpufreq: disable preemption while checking CPU throttling state

2016-11-06 Thread Michael Ellerman
Denis Kirjanov  writes:

> [   67.700897] BUG: using smp_processor_id() in preemptible [] 
> code: cat/7343
> [   67.700988] caller is .powernv_cpufreq_throttle_check+0x2c/0x710
> [   67.700998] CPU: 13 PID: 7343 Comm: cat Not tainted 4.8.0-rc5-dirty #1
> [   67.701038] Call Trace:
> [   67.701066] [c007d25b75b0] [c0971378] 
> .dump_stack+0xe4/0x150 (unreliable)
> [   67.701153] [c007d25b7640] [c05162e4] 
> .check_preemption_disabled+0x134/0x150
> [   67.701238] [c007d25b76e0] [c07b63ac] 
> .powernv_cpufreq_throttle_check+0x2c/0x710
> [   67.701322] [c007d25b7790] [c07b6d18] 
> .powernv_cpufreq_target_index+0x288/0x360
> [   67.701407] [c007d25b7870] [c07acee4] 
> .__cpufreq_driver_target+0x394/0x8c0
> [   67.701491] [c007d25b7920] [c07b22ac] 
> .cpufreq_set+0x7c/0xd0
> [   67.701565] [c007d25b79b0] [c07adf50] 
> .store_scaling_setspeed+0x80/0xc0
> [   67.701650] [c007d25b7a40] [c07ae270] .store+0xa0/0x100
> [   67.701723] [c007d25b7ae0] [c03566e8] 
> .sysfs_kf_write+0x88/0xb0
> [   67.701796] [c007d25b7b70] [c03553b8] 
> .kernfs_fop_write+0x178/0x260
> [   67.701881] [c007d25b7c10] [c02ac3cc] 
> .__vfs_write+0x3c/0x1c0
> [   67.701954] [c007d25b7cf0] [c02ad584] .vfs_write+0xc4/0x230
> [   67.702027] [c007d25b7d90] [c02aeef8] .SyS_write+0x58/0x100
> [   67.702101] [c007d25b7e30] [c000bfec] system_call+0x38/0xfc
>
> Signed-off-by: Denis Kirjanov 
>
> v2:  wrap powernv_cpufreq_throttle_check()
> as suggested by Gautham R Shenoy

That should be below the "---".

When did this break?

cheers


Re: [PATCH 2/2] cpufreq: powernv: Use PMSR to verify global and local pstate

2016-11-06 Thread Akshay Adiga

Thanks Viresh for taking a look at it.

I will make the mentioned changes in the next version of the patch and
will add Shilpa and Gautham to the mail chain.

Regards

Akshay Adiga


On 11/04/2016 12:11 PM, Viresh Kumar wrote:

On 04-11-16, 10:57, Akshay Adiga wrote:

As fast_switch may get called in interrupt disable mode, it does not

s/in interrupt disable mode/with interrupts disabled
s/it does/it may


update the global_pstate_info data structure. Hence the global_pstate_info
has stale data whenever pstate is updated through fast_swtich().

s/has/may have
s/swtich/switch


So the gpstate_timer can fire after a fast_switch() call has update

s/So the/The
s/a fast_swtich() call has update/the fast_switch() call has updated


the pstates to a different value. Hence the timer handler cannot rely
on the cached values of local and global pstate and needs to read it
from the PMSR.

Signed-off-by: Akshay Adiga 

---
  drivers/cpufreq/powernv-cpufreq.c | 32 ++--
  1 file changed, 22 insertions(+), 10 deletions(-)

I am not the best guy to judge the code changes here. Can you please add
Shilpa and Gautham to the mail chain and get their feedback?








Re: [PATCH 1/2] cpufreq: powernv: Adding fast_switch for schedutil

2016-11-06 Thread Akshay Adiga

Thanks Viresh for taking a look at it.

I will make the mentioned changes in the next version of the patch.


Regards

Akshay Adiga


On 11/04/2016 12:03 PM, Viresh Kumar wrote:

On 04-11-16, 10:57, Akshay Adiga wrote:

Adding fast_switch which does light weight operation to
set the desired pstate.

Signed-off-by: Akshay Adiga 
---
  drivers/cpufreq/powernv-cpufreq.c | 22 +-
  1 file changed, 21 insertions(+), 1 deletion(-)

diff --git a/drivers/cpufreq/powernv-cpufreq.c b/drivers/cpufreq/powernv-cpufreq.c
index d3ffde8..09a0496 100644
--- a/drivers/cpufreq/powernv-cpufreq.c
+++ b/drivers/cpufreq/powernv-cpufreq.c
@@ -752,9 +752,12 @@ static int powernv_cpufreq_cpu_init(struct cpufreq_policy *policy)
spin_lock_init(&gpstates->gpstate_lock);
ret = cpufreq_table_validate_and_show(policy, powernv_freqs);
  
-	if (ret < 0)

+   if (ret < 0) {
kfree(policy->driver_data);
+   return ret;
+   }
  
+	policy->fast_switch_possible = true;

return ret;
  }
  
@@ -897,6 +900,22 @@ static void powernv_cpufreq_stop_cpu(struct cpufreq_policy *policy)

del_timer_sync(&gpstates->timer);
  }
  
+static unsigned int powernv_fast_switch(struct cpufreq_policy *policy,

+   unsigned int target_freq)
+{
+   int index;
+   struct powernv_smp_call_data freq_data;
+
+   index = cpufreq_table_find_index_dl(policy, target_freq);
+   if (index < 0 || index >= powernv_pstate_info.nr_pstates)
+   return CPUFREQ_ENTRY_INVALID;

I don't think such a check is required at all. It wouldn't happen without a BUG
in kernel.

+   freq_data.pstate_id = powernv_freqs[index].driver_data;
+   freq_data.gpstate_id = powernv_freqs[index].driver_data;
+   set_pstate(&freq_data);
+
+   return powernv_freqs[index].frequency;
+}
+
  static struct cpufreq_driver powernv_cpufreq_driver = {
.name   = "powernv-cpufreq",
.flags  = CPUFREQ_CONST_LOOPS,
@@ -904,6 +923,7 @@ static struct cpufreq_driver powernv_cpufreq_driver = {
.exit   = powernv_cpufreq_cpu_exit,
.verify = cpufreq_generic_frequency_table_verify,
.target_index   = powernv_cpufreq_target_index,
+   .fast_switch    = powernv_fast_switch,
.get= powernv_cpufreq_get,
.stop_cpu   = powernv_cpufreq_stop_cpu,
.attr   = powernv_cpu_freq_attr,
--
2.7.4




Re: [PATCH v2] ppc: cpufreq: disable preemption while checking CPU throttling state

2016-11-06 Thread Denis Kirjanov
On Monday, November 7, 2016, Gautham R Shenoy 
wrote:

> Hi Denis,
>
> On Fri, Nov 04, 2016 at 07:08:38AM -0400, Denis Kirjanov wrote:
>
> You can provide the config option with which this bug was found in the
> change log. I suppose you had enabled CONFIG_DEBUG_PREEMPT.
>
>
that's why I put the comment




Re: [PATCH v2] ppc: cpufreq: disable preemption while checking CPU throttling state

2016-11-06 Thread Gautham R Shenoy
Hi Denis,

On Fri, Nov 04, 2016 at 07:08:38AM -0400, Denis Kirjanov wrote:

You can provide the config option with which this bug was found in the
change log. I suppose you had enabled CONFIG_DEBUG_PREEMPT.

> [   67.700897] BUG: using smp_processor_id() in preemptible [] 
> code: cat/7343
> [   67.700988] caller is .powernv_cpufreq_throttle_check+0x2c/0x710
> [   67.700998] CPU: 13 PID: 7343 Comm: cat Not tainted 4.8.0-rc5-dirty #1
> [   67.701038] Call Trace:
> [   67.701066] [c007d25b75b0] [c0971378] 
> .dump_stack+0xe4/0x150 (unreliable)
> [   67.701153] [c007d25b7640] [c05162e4] 
> .check_preemption_disabled+0x134/0x150
> [   67.701238] [c007d25b76e0] [c07b63ac] 
> .powernv_cpufreq_throttle_check+0x2c/0x710
> [   67.701322] [c007d25b7790] [c07b6d18] 
> .powernv_cpufreq_target_index+0x288/0x360
> [   67.701407] [c007d25b7870] [c07acee4] 
> .__cpufreq_driver_target+0x394/0x8c0
> [   67.701491] [c007d25b7920] [c07b22ac] 
> .cpufreq_set+0x7c/0xd0
> [   67.701565] [c007d25b79b0] [c07adf50] 
> .store_scaling_setspeed+0x80/0xc0
> [   67.701650] [c007d25b7a40] [c07ae270] .store+0xa0/0x100
> [   67.701723] [c007d25b7ae0] [c03566e8] 
> .sysfs_kf_write+0x88/0xb0
> [   67.701796] [c007d25b7b70] [c03553b8] 
> .kernfs_fop_write+0x178/0x260
> [   67.701881] [c007d25b7c10] [c02ac3cc] 
> .__vfs_write+0x3c/0x1c0
> [   67.701954] [c007d25b7cf0] [c02ad584] .vfs_write+0xc4/0x230
> [   67.702027] [c007d25b7d90] [c02aeef8] .SyS_write+0x58/0x100
> [   67.702101] [c007d25b7e30] [c000bfec] system_call+0x38/0xfc
> 
> Signed-off-by: Denis Kirjanov 
> 
> v2:  wrap powernv_cpufreq_throttle_check()
> as suggested by Gautham R Shenoy

Looks good otherwise.

Reviewed-by: Gautham R. Shenoy 

> ---
>  drivers/cpufreq/powernv-cpufreq.c | 9 -
>  1 file changed, 8 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/cpufreq/powernv-cpufreq.c 
> b/drivers/cpufreq/powernv-cpufreq.c
> index d3ffde8..112e0e2 100644
> --- a/drivers/cpufreq/powernv-cpufreq.c
> +++ b/drivers/cpufreq/powernv-cpufreq.c
> @@ -647,8 +647,15 @@ static int powernv_cpufreq_target_index(struct 
> cpufreq_policy *policy,
>   if (unlikely(rebooting) && new_index != get_nominal_index())
>   return 0;
> 
> - if (!throttled)
> + if (!throttled) {
> + /*
> +  * we don't want to be preempted while
> +  * checking if the CPU frequency has been throttled
> +  */
> + preempt_disable();
>   powernv_cpufreq_throttle_check(NULL);
> + preempt_enable();
> +}
> 
>   cur_msec = jiffies_to_msecs(get_jiffies_64());
> 
> -- 
> 1.8.3.1
> 



[PATCH 4/4] powerpc/perf: macros for PowerISA v3.0 format encoding

2016-11-06 Thread Madhavan Srinivasan
Add macros and constants to support the PowerISA v3.0 raw event
encoding format. A couple of new functions are added to handle the
new width and location of bit fields such as PMCxCOMB and THRESH_CMP
within MMCR* in PowerISA v3.0.

Signed-off-by: Madhavan Srinivasan 
---
 arch/powerpc/perf/isa207-common.c | 88 ---
 arch/powerpc/perf/isa207-common.h | 27 +++-
 2 files changed, 108 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/perf/isa207-common.c b/arch/powerpc/perf/isa207-common.c
index 2a2040ea5f99..a3d8a6f31226 100644
--- a/arch/powerpc/perf/isa207-common.c
+++ b/arch/powerpc/perf/isa207-common.c
@@ -55,6 +55,81 @@ static inline bool event_is_fab_match(u64 event)
return (event == 0x30056 || event == 0x4f052);
 }
 
+static bool is_event_valid(u64 event)
+{
+   if (cpu_has_feature(CPU_FTR_ARCH_300) &&
+   (cpu_has_feature(CPU_FTR_POWER9_DD1)) &&
+   (event & ~EVENT_VALID_MASK))
+   return false;
+   else if (cpu_has_feature(CPU_FTR_ARCH_300) &&
+   (event & ~ISA300_EVENT_VALID_MASK))
+   return false;
+   else if (event & ~EVENT_VALID_MASK)
+   return false;
+
+   return true;
+}
+
+static u64 mmcra_sdar_mode(u64 event)
+{
+   u64 sm;
+
+   if (cpu_has_feature(CPU_FTR_ARCH_300) &&
+  (cpu_has_feature(CPU_FTR_POWER9_DD1))) {
+   goto sm_tlb;
+   } else if (cpu_has_feature(CPU_FTR_ARCH_300)) {
+   sm = (event >> ISA300_SDAR_MODE_SHIFT) & ISA300_SDAR_MODE_MASK;
+   if (sm)
+   return sm<> EVENT_COMBINE_SHIFT) & EVENT_COMBINE_MASK;
+   else if (cpu_has_feature(CPU_FTR_ARCH_300))
+   combine = (event >> ISA300_EVENT_COMBINE_SHIFT) & ISA300_EVENT_COMBINE_MASK;
+   else
+   combine = (event >> EVENT_COMBINE_SHIFT) & EVENT_COMBINE_MASK;
+
+   return combine;
+}
+
+static unsigned long combine_shift(unsigned long pmc)
+{
+   if (cpu_has_feature(CPU_FTR_ARCH_300) &&
+  (cpu_has_feature(CPU_FTR_POWER9_DD1)))
+   goto comb_shift;
+   else if (cpu_has_feature(CPU_FTR_ARCH_300))
+   return ISA300_MMCR1_COMBINE_SHIFT(pmc);
+   else
+   goto comb_shift;
+
+comb_shift:
+   return MMCR1_COMBINE_SHIFT(pmc);
+}
+
 int isa207_get_constraint(u64 event, unsigned long *maskp, unsigned long *valp)
 {
unsigned int unit, pmc, cache, ebb;
@@ -62,7 +137,7 @@ int isa207_get_constraint(u64 event, unsigned long *maskp, unsigned long *valp)
 
mask = value = 0;
 
-   if (event & ~EVENT_VALID_MASK)
+   if (!is_event_valid(event))
return -1;
 
pmc   = (event >> EVENT_PMC_SHIFT)& EVENT_PMC_MASK;
@@ -189,15 +264,13 @@ int isa207_compute_mmcr(u64 event[], int n_ev,
pmc_inuse |= 1 << pmc;
}
 
-   /* In continuous sampling mode, update SDAR on TLB miss */
-   mmcra = MMCRA_SDAR_MODE_TLB;
mmcr1 = mmcr2 = 0;
 
/* Second pass: assign PMCs, set all MMCR1 fields */
for (i = 0; i < n_ev; ++i) {
pmc = (event[i] >> EVENT_PMC_SHIFT) & EVENT_PMC_MASK;
unit= (event[i] >> EVENT_UNIT_SHIFT) & EVENT_UNIT_MASK;
-   combine = (event[i] >> EVENT_COMBINE_SHIFT) & EVENT_COMBINE_MASK;
+   combine = combine_from_event(event[i]);
psel=  event[i] & EVENT_PSEL_MASK;
 
if (!pmc) {
@@ -211,10 +284,13 @@ int isa207_compute_mmcr(u64 event[], int n_ev,
 
if (pmc <= 4) {
mmcr1 |= unit << MMCR1_UNIT_SHIFT(pmc);
-   mmcr1 |= combine << MMCR1_COMBINE_SHIFT(pmc);
+   mmcr1 |= combine << combine_shift(pmc);
mmcr1 |= psel << MMCR1_PMCSEL_SHIFT(pmc);
}
 
+   /* In continuous sampling mode, update SDAR on TLB miss */
+   mmcra |= mmcra_sdar_mode(event[i]);
+
if (event[i] & EVENT_IS_L1) {
cache = event[i] >> EVENT_CACHE_SEL_SHIFT;
mmcr1 |= (cache & 1) << MMCR1_IC_QUAL_SHIFT;
@@ -245,7 +321,7 @@ int 

[PATCH 3/4] powerpc/perf: PowerISA v3.0 raw event format encoding

2016-11-06 Thread Madhavan Srinivasan
Patch to update the PowerISA v3.0 raw event encoding format
information and add support for the same in Power9.

Signed-off-by: Madhavan Srinivasan 
---
 arch/powerpc/perf/power9-pmu.c | 134 +
 1 file changed, 134 insertions(+)

diff --git a/arch/powerpc/perf/power9-pmu.c b/arch/powerpc/perf/power9-pmu.c
index d1782fd644e9..928d0e739ed4 100644
--- a/arch/powerpc/perf/power9-pmu.c
+++ b/arch/powerpc/perf/power9-pmu.c
@@ -16,6 +16,78 @@
 #include "isa207-common.h"
 
 /*
+ * Raw event encoding for PowerISA v3.0:
+ *
+ *        60        56        52        48        44        40        36        32
+ * | - - - - | - - - - | - - - - | - - - - | - - - - | - - - - | - - - - | - - - - |
+ *   | | [ ]                       [ ] [      thresh_cmp     ]   [  thresh_ctl   ]
+ *   | |  |                         |                                    |
+ *   | |  *- IFM (Linux)            |    thresh start/stop OR FAB match -*
+ *   | *- BHRB (Linux)              *sm
+ *   *- EBB (Linux)
+ *
+ *        28        24        20        16        12         8         4         0
+ * | - - - - | - - - - | - - - - | - - - - | - - - - | - - - - | - - - - | - - - - |
+ *   [   ] [  sample ]   [cache]   [ pmc ]   [unit ]   [ ]   m   [    pmcxsel    ]
+ *     |        |           |                           |    |
+ *     |        |           |                           |    *- mark
+ *     |        |           *- L1/L2/L3 cache_sel       |
+ *     |        |                                       |
+ *     |        *- sampling mode for marked events      *- combine
+ *     |
+ *     *- thresh_sel
+ * Below uses IBM bit numbering.
+ *
+ * MMCR1[x:y] = unit(PMCxUNIT)
+ * MMCR1[24]   = pmc1combine[0]
+ * MMCR1[25]   = pmc1combine[1]
+ * MMCR1[26]   = pmc2combine[0]
+ * MMCR1[27]   = pmc2combine[1]
+ * MMCR1[28]   = pmc3combine[0]
+ * MMCR1[29]   = pmc3combine[1]
+ * MMCR1[30]   = pmc4combine[0]
+ * MMCR1[31]   = pmc4combine[1]
+ *
+ * if pmc == 3 and unit == 0 and pmcxsel[0:6] == 0b0101011
+ * # PM_MRK_FAB_RSP_MATCH
+ * MMCR1[20:27] = thresh_ctl   (FAB_CRESP_MATCH / FAB_TYPE_MATCH)
+ * else if pmc == 4 and unit == 0xf and pmcxsel[0:6] == 0b0101001
+ * # PM_MRK_FAB_RSP_MATCH_CYC
+ * MMCR1[20:27] = thresh_ctl   (FAB_CRESP_MATCH / FAB_TYPE_MATCH)
+ * else
+ * MMCRA[48:55] = thresh_ctl   (THRESH START/END)
+ *
+ * if thresh_sel:
+ * MMCRA[45:47] = thresh_sel
+ *
+ * if thresh_cmp:
+ * MMCRA[9:11] = thresh_cmp[0:2]
+ * MMCRA[12:18] = thresh_cmp[3:9]
+ *
+ * if unit == 6 or unit == 7
+ * MMCRC[53:55] = cache_sel[1:3]  (L2EVENT_SEL)
+ * else if unit == 8 or unit == 9:
+ * if cache_sel[0] == 0: # L3 bank
+ * MMCRC[47:49] = cache_sel[1:3]  (L3EVENT_SEL0)
+ * else if cache_sel[0] == 1:
+ * MMCRC[50:51] = cache_sel[2:3]  (L3EVENT_SEL1)
+ * else if cache_sel[1]: # L1 event
+ * MMCR1[16] = cache_sel[2]
+ * MMCR1[17] = cache_sel[3]
+ *
+ * if mark:
+ * MMCRA[63]    = 1            (SAMPLE_ENABLE)
+ * MMCRA[57:59] = sample[0:2]  (RAND_SAMP_ELIG)
+ * MMCRA[61:62] = sample[3:4]  (RAND_SAMP_MODE)
+ *
+ * if EBB and BHRB:
+ * MMCRA[32:33] = IFM
+ *
+ * MMCRA[SDAR_MODE]  = sm
+ */
+
+/*
  * Some power9 event codes.
  */
 #define EVENT(_name, _code) _name = _code,
@@ -99,6 +171,48 @@ static const struct attribute_group *power9_isa207_pmu_attr_groups[] = {
NULL,
 };
 
+PMU_FORMAT_ATTR(event, "config:0-51");
+PMU_FORMAT_ATTR(pmcxsel,   "config:0-7");
+PMU_FORMAT_ATTR(mark,  "config:8");
+PMU_FORMAT_ATTR(combine,   "config:10-11");
+PMU_FORMAT_ATTR(unit,  "config:12-15");
+PMU_FORMAT_ATTR(pmc,   "config:16-19");
+PMU_FORMAT_ATTR(cache_sel, "config:20-23");
+PMU_FORMAT_ATTR(sample_mode,   "config:24-28");
+PMU_FORMAT_ATTR(thresh_sel,"config:29-31");
+PMU_FORMAT_ATTR(thresh_stop,   "config:32-35");
+PMU_FORMAT_ATTR(thresh_start,  "config:36-39");
+PMU_FORMAT_ATTR(thresh_cmp,"config:40-49");
+PMU_FORMAT_ATTR(sdar_mode, "config:50-51");
+
+static struct attribute *power9_pmu_format_attr[] = {
+   &format_attr_event.attr,
+   &format_attr_pmcxsel.attr,
+   &format_attr_mark.attr,
+   &format_attr_combine.attr,
+   &format_attr_unit.attr,
+   &format_attr_pmc.attr,
+   &format_attr_cache_sel.attr,
+   &format_attr_sample_mode.attr,
+   &format_attr_thresh_sel.attr,
+   &format_attr_thresh_stop.attr,
+   &format_attr_thresh_start.attr,
+   &format_attr_thresh_cmp.attr,
+   &format_attr_sdar_mode.attr,
+   NULL,
+};
+
+static struct attribute_group power9_pmu_format_group = {
+   .name = "format",
+   .attrs = power9_pmu_format_attr,
+};
+
+static const struct attribute_group *power9_pmu_attr_groups[] = {
+   &power9_pmu_format_group,
+   &power9_pmu_events_group,
+   NULL,
+};
+
 static int power9_generic_events[] = {
[PERF_COUNT_HW_CPU_CYCLES] =PM_CYC,
[PERF_COUNT_HW_STALLED_CYCLES_FRONTEND] =   

[PATCH 2/4] powerpc/perf: update attribute_group data structure

2016-11-06 Thread Madhavan Srinivasan
Rename the power_pmu and attribute_group variables that
support PowerISA v2.07. Add a cpu feature flag check to pick
the PowerISA v2.07 format structures to support.

Signed-off-by: Madhavan Srinivasan 
---
 arch/powerpc/perf/power9-pmu.c | 11 +++
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/perf/power9-pmu.c b/arch/powerpc/perf/power9-pmu.c
index 443511b18bc5..d1782fd644e9 100644
--- a/arch/powerpc/perf/power9-pmu.c
+++ b/arch/powerpc/perf/power9-pmu.c
@@ -93,7 +93,7 @@ static struct attribute_group power9_pmu_events_group = {
.attrs = power9_events_attr,
 };
 
-static const struct attribute_group *power9_pmu_attr_groups[] = {
+static const struct attribute_group *power9_isa207_pmu_attr_groups[] = {
&isa207_pmu_format_group,
&power9_pmu_events_group,
NULL,
@@ -260,7 +260,7 @@ static int power9_cache_events[C(MAX)][C(OP_MAX)][C(RESULT_MAX)] = {
 
 #undef C
 
-static struct power_pmu power9_pmu = {
+static struct power_pmu power9_isa207_pmu = {
.name   = "POWER9",
.n_counter  = MAX_PMU_COUNTERS,
.add_fields = ISA207_ADD_FIELDS,
@@ -274,7 +274,7 @@ static struct power_pmu power9_pmu = {
.n_generic  = ARRAY_SIZE(power9_generic_events),
.generic_events = power9_generic_events,
.cache_events   = &power9_cache_events,
-   .attr_groups= power9_pmu_attr_groups,
+   .attr_groups= power9_isa207_pmu_attr_groups,
.bhrb_nr= 32,
 };
 
@@ -287,7 +287,10 @@ static int __init init_power9_pmu(void)
strcmp(cur_cpu_spec->oprofile_cpu_type, "ppc64/power9"))
return -ENODEV;
 
-   rc = register_power_pmu(&power9_pmu);
+   if (cpu_has_feature(CPU_FTR_POWER9_DD1)) {
+   rc = register_power_pmu(&power9_isa207_pmu);
+   }
+
if (rc)
return rc;
 
-- 
2.7.4



[PATCH 1/4] powerpc/perf: factor out the event format field

2016-11-06 Thread Madhavan Srinivasan
Factor out the format field structure for PowerISA v2.07.

Signed-off-by: Madhavan Srinivasan 
---
 arch/powerpc/perf/isa207-common.c | 34 ++
 arch/powerpc/perf/power8-pmu.c| 39 ---
 arch/powerpc/perf/power9-pmu.c| 39 ---
 3 files changed, 42 insertions(+), 70 deletions(-)

diff --git a/arch/powerpc/perf/isa207-common.c b/arch/powerpc/perf/isa207-common.c
index 6143c99f3ec5..2a2040ea5f99 100644
--- a/arch/powerpc/perf/isa207-common.c
+++ b/arch/powerpc/perf/isa207-common.c
@@ -12,6 +12,40 @@
  */
 #include "isa207-common.h"
 
+PMU_FORMAT_ATTR(event, "config:0-49");
+PMU_FORMAT_ATTR(pmcxsel,   "config:0-7");
+PMU_FORMAT_ATTR(mark,  "config:8");
+PMU_FORMAT_ATTR(combine,   "config:11");
+PMU_FORMAT_ATTR(unit,  "config:12-15");
+PMU_FORMAT_ATTR(pmc,   "config:16-19");
+PMU_FORMAT_ATTR(cache_sel, "config:20-23");
+PMU_FORMAT_ATTR(sample_mode,   "config:24-28");
+PMU_FORMAT_ATTR(thresh_sel,"config:29-31");
+PMU_FORMAT_ATTR(thresh_stop,   "config:32-35");
+PMU_FORMAT_ATTR(thresh_start,  "config:36-39");
+PMU_FORMAT_ATTR(thresh_cmp,"config:40-49");
+
+struct attribute *isa207_pmu_format_attr[] = {
+   &format_attr_event.attr,
+   &format_attr_pmcxsel.attr,
+   &format_attr_mark.attr,
+   &format_attr_combine.attr,
+   &format_attr_unit.attr,
+   &format_attr_pmc.attr,
+   &format_attr_cache_sel.attr,
+   &format_attr_sample_mode.attr,
+   &format_attr_thresh_sel.attr,
+   &format_attr_thresh_stop.attr,
+   &format_attr_thresh_start.attr,
+   &format_attr_thresh_cmp.attr,
+   NULL,
+};
+
+struct attribute_group isa207_pmu_format_group = {
+   .name = "format",
+   .attrs = isa207_pmu_format_attr,
+};
+
 static inline bool event_is_fab_match(u64 event)
 {
/* Only check pmc, unit and pmcxsel, ignore the edge bit (0) */
diff --git a/arch/powerpc/perf/power8-pmu.c b/arch/powerpc/perf/power8-pmu.c
index ab830d106ec5..d07186382f3a 100644
--- a/arch/powerpc/perf/power8-pmu.c
+++ b/arch/powerpc/perf/power8-pmu.c
@@ -30,6 +30,9 @@ enum {
 #define POWER8_MMCRA_IFM2   0x8000UL
 #define POWER8_MMCRA_IFM3   0xC000UL
 
+/* PowerISA v2.07 format attribute structure*/
+extern struct attribute_group isa207_pmu_format_group;
+
 /* Table of alternatives, sorted by column 0 */
 static const unsigned int event_alternatives[][MAX_ALT] = {
{ PM_MRK_ST_CMPL,   PM_MRK_ST_CMPL_ALT },
@@ -175,42 +178,8 @@ static struct attribute_group power8_pmu_events_group = {
.attrs = power8_events_attr,
 };
 
-PMU_FORMAT_ATTR(event, "config:0-49");
-PMU_FORMAT_ATTR(pmcxsel,   "config:0-7");
-PMU_FORMAT_ATTR(mark,  "config:8");
-PMU_FORMAT_ATTR(combine,   "config:11");
-PMU_FORMAT_ATTR(unit,  "config:12-15");
-PMU_FORMAT_ATTR(pmc,   "config:16-19");
-PMU_FORMAT_ATTR(cache_sel, "config:20-23");
-PMU_FORMAT_ATTR(sample_mode,   "config:24-28");
-PMU_FORMAT_ATTR(thresh_sel,"config:29-31");
-PMU_FORMAT_ATTR(thresh_stop,   "config:32-35");
-PMU_FORMAT_ATTR(thresh_start,  "config:36-39");
-PMU_FORMAT_ATTR(thresh_cmp,"config:40-49");
-
-static struct attribute *power8_pmu_format_attr[] = {
-   &format_attr_event.attr,
-   &format_attr_pmcxsel.attr,
-   &format_attr_mark.attr,
-   &format_attr_combine.attr,
-   &format_attr_unit.attr,
-   &format_attr_pmc.attr,
-   &format_attr_cache_sel.attr,
-   &format_attr_sample_mode.attr,
-   &format_attr_thresh_sel.attr,
-   &format_attr_thresh_stop.attr,
-   &format_attr_thresh_start.attr,
-   &format_attr_thresh_cmp.attr,
-   NULL,
-};
-
-static struct attribute_group power8_pmu_format_group = {
-   .name = "format",
-   .attrs = power8_pmu_format_attr,
-};
-
 static const struct attribute_group *power8_pmu_attr_groups[] = {
-   &power8_pmu_format_group,
+   &isa207_pmu_format_group,
&power8_pmu_events_group,
NULL,
 };
diff --git a/arch/powerpc/perf/power9-pmu.c b/arch/powerpc/perf/power9-pmu.c
index 8e9a81967ff8..443511b18bc5 100644
--- a/arch/powerpc/perf/power9-pmu.c
+++ b/arch/powerpc/perf/power9-pmu.c
@@ -31,6 +31,9 @@ enum {
 #define POWER9_MMCRA_IFM2  0x8000UL
 #define POWER9_MMCRA_IFM3  0xC000UL
 
+/* PowerISA v2.07 format attribute structure*/
+extern struct attribute_group isa207_pmu_format_group;
+
 GENERIC_EVENT_ATTR(cpu-cycles, PM_CYC);
 GENERIC_EVENT_ATTR(stalled-cycles-frontend,PM_ICT_NOSLOT_CYC);
 GENERIC_EVENT_ATTR(stalled-cycles-backend, PM_CMPLU_STALL);
@@ -90,42 +93,8 @@ static struct attribute_group power9_pmu_events_group = {
.attrs = power9_events_attr,
 };
 
-PMU_FORMAT_ATTR(event, "config:0-49");
-PMU_FORMAT_ATTR(pmcxsel,   "config:0-7");
-PMU_FORMAT_ATTR(mark,  "config:8");
-PMU_FORMAT_ATTR(combine,   "config:11");
-PMU_FORMAT_ATTR(unit,  "config:12-15");
-PMU_FORMAT_ATTR(pmc,   

[PATCH 0/4]Support PowerISA v3.0 PMU Raw event format

2016-11-06 Thread Madhavan Srinivasan
Patchset to factor out the PowerISA v2.07 PMU raw event
format encoding and add support to the PowerISA v3.0 PMU
raw event format encoding.

Madhavan Srinivasan (4):
  powerpc/perf: factor out the event format field
  powerpc/perf: update attribute_group data structure
  powerpc/perf: PowerISA v3.0 raw event format encoding
  powerpc/perf: macros for PowerISA v3.0 format encoding

 arch/powerpc/perf/isa207-common.c | 122 --
 arch/powerpc/perf/isa207-common.h |  27 -
 arch/powerpc/perf/power8-pmu.c|  39 ++--
 arch/powerpc/perf/power9-pmu.c| 112 +-
 4 files changed, 255 insertions(+), 45 deletions(-)

-- 
2.7.4



Re: [RESEND] [PATCH v3] cxl: Prevent adapter reset if an active context exists

2016-11-06 Thread Andrew Donnellan

On 04/11/16 23:07, Frederic Barrat wrote:

>> When I inject an EEH error, this patch causes the following WARN.
>> Thoughts?
>
> mmm, hard to see a relation with that patch. I couldn't reproduce
> either. Could it bear any relation with the patch you're working on
> (lspci called while the capi device is unconfigured)?

No, this was without any other patches...


[   60.593116] pci :01 : [PE# 000] Switching PHB to CXL
[   60.622727] Adapter context unlocked with 0 active contexts
[   60.622762] [ cut here ]
[   60.622771] WARNING: CPU: 12 PID: 627 at
../drivers/misc/cxl/main.c:325 cxl_adapter_context_unlock+0x60/0x80 [cxl]
[   60.622772] Modules linked in: fuse powernv_rng rng_core leds_powernv
powernv_op_panel led_class vmx_crypto ib_iser rdma_cm iw_cm ib_cm
ib_core libiscsi scsi_transport_iscsi autofs4 btrfs raid10 raid456
async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq
multipath bnx2x mdio libcrc32c cxl
[   60.622794] CPU: 12 PID: 627 Comm: eehd Not tainted
4.9.0-rc1-ajd-6-g6fb17cc #4
[   60.622795] task: c003be084900 task.stack: c003be108000
[   60.622797] NIP: d4350be0 LR: d4350bdc CTR:
c0492fd0
[   60.622799] REGS: c003be10b660 TRAP: 0700   Not tainted
(4.9.0-rc1-ajd-6-g6fb17cc)
[   60.622800] MSR: 90010282b033

[   60.622810]   CR: 28000282  XER: 2000
[   60.622811] SOFTE: 1 CFAR: c094fc88
[   60.622814] GPR00: d4350bdc c003be10b8e0 d4379ae8
002f
[   60.622818] GPR04: 0001  03b8

[   60.622822] GPR08:   
0001
[   60.622826] GPR12:  cfe03000 c00baac8
c003c5166500
[   60.622830] GPR16:   

[   60.622834] GPR20:   
c0b14fe8
[   60.622837] GPR24: c0b14fc0 c003afc10400 c003b0c4

[   60.622841] GPR28: c003c505a098  c003afc10400
0006
[   60.622850] NIP [d4350be0]
cxl_adapter_context_unlock+0x60/0x80 [cxl]
[   60.622856] LR [d4350bdc]
cxl_adapter_context_unlock+0x5c/0x80 [cxl]
[   60.622857] Call Trace:
[   60.622863] [c003be10b8e0] [d4350bdc]
cxl_adapter_context_unlock+0x5c/0x80 [cxl] (unreliable)
[   60.622871] [c003be10b940] [d435e810]
cxl_configure_adapter+0x930/0x960 [cxl]
[   60.622879] [c003be10b9f0] [d435e88c]
cxl_pci_slot_reset+0x4c/0x230 [cxl]
[   60.622883] [c003be10baa0] [c0032cd4]
eeh_report_reset+0x164/0x1a0
[   60.622887] [c003be10bae0] [c0031220]
eeh_pe_dev_traverse+0x90/0x170
[   60.622890] [c003be10bb70] [c0033354]
eeh_handle_normal_event+0x3d4/0x520
[   60.622892] [c003be10bc20] [c0033624]
eeh_handle_event+0x44/0x360
[   60.622895] [c003be10bcd0] [c0033a58]
eeh_event_handler+0x118/0x1d0
[   60.622898] [c003be10bd80] [c00babc8] kthread+0x108/0x130
[   60.622902] [c003be10be30] [c000c0a0]
ret_from_kernel_thread+0x5c/0xbc
[   60.622903] Instruction dump:
[   60.622905] 2f84 4dfe0020 7c0802a6 7c8407b4 3920 f8010010
f821ffa1 91230348
[   60.622911] 3c62 e8638070 48016639 e8410018 <0fe0> 38210060
e8010010 7c0803a6
[   60.622918] ---[ end trace d358551c9a007b4f ]---
[   60.622959] cxl afu0.0: Activating AFU directed mode
[   60.623097] EEH: Notify device driver to resume


That *definitely* looks related to this patch...


Andrew

--
Andrew Donnellan  OzLabs, ADL Canberra
andrew.donnel...@au1.ibm.com  IBM Australia Limited



Re: [RESEND] [PATCH v3] cxl: Prevent adapter reset if an active context exists

2016-11-06 Thread Andrew Donnellan

On 05/11/16 00:15, Uma Krishnan wrote:

> Frederic/Andrew,
>
> Just recently this issue has been reported by system test without either
> of the two patches you are suspecting (neither this patch nor the lspci
> patch). I was hoping the lspci patch from Andrew can possibly solve it.
> System test CQ is SW370625. The stack reported there is the same:
>
> [ 5895.245959] EEH: PHB#2 failure detected, location: N/A
> [ 5895.246078] CPU: 19 PID: 121774 Comm: lspci Not tainted
> 3.10.0-514.el7.ppc64le #1
> [ 5895.246240] Call Trace:
> [ 5895.246307] [c009f3707a60] [c0017ce0]
> show_stack+0x80/0x330 (unreliable)
> [ 5895.246501] [c009f3707b10] [c09b22f4]
> dump_stack+0x30/0x44
> [ 5895.246665] [c009f3707b30] [c003b9ac]
> eeh_dev_check_failure+0x21c/0x580
> [ 5895.246855] [c009f3707bd0] [c00879dc]
> pnv_pci_read_config+0xbc/0x160
> [ 5895.247045] [c009f3707c10] [c0527d54]
> pci_user_read_config_dword+0x84/0x160
> [ 5895.247233] [c009f3707c60] [c0547224]
> pci_read_config+0xf4/0x2e0
> [ 5895.247398] [c009f3707ce0] [c03efb3c] read+0x10c/0x2a0
> [ 5895.247561] [c009f3707da0] [c031d160]
> vfs_read+0x110/0x290
> [ 5895.247726] [c009f3707de0] [c031ec70]
> SyS_pread64+0xb0/0xd0


This isn't a WARN; this stack trace is printed explicitly by the EEH
code in the case of a PHB failure. See arch/powerpc/kernel/eeh.c, line 403.



Andrew

--
Andrew Donnellan  OzLabs, ADL Canberra
andrew.donnel...@au1.ibm.com  IBM Australia Limited



Re: [PATCH 00/22] mtd: nand: return error code of nand_scan(_ident,_tail) on error

2016-11-06 Thread Boris Brezillon
On Fri,  4 Nov 2016 19:42:48 +0900
Masahiro Yamada  wrote:

> nand_scan(), nand_scan_ident(), nand_scan_tail() return
> an appropriate negative value on error.
> 
> Most of drivers return the value from them on error,
> but some of them return the fixed error code -ENXIO
> (and a few return -ENODEV).
> 
> This series make those drivers return more precise error code.

Applied and fixed the bug I found in patch 13.

Thanks,

Boris

> 
> 
> Masahiro Yamada (22):
>   mtd: nand: ams-delta: return error code of nand_scan() on error
>   mtd: nand: cmx270: return error code of nand_scan() on error
>   mtd: nand: cs553x: return error code of nand_scan() on error
>   mtd: nand: gpio: return error code of nand_scan() on error
>   mtd: nand: mpc5121: return error code of nand_scan() on error
>   mtd: nand: tmio: return error code of nand_scan() on error
>   mtd: nand: orion: return error code of nand_scan() on error
>   mtd: nand: pasemi: return error code of nand_scan() on error
>   mtd: nand: plat_nand: return error code of nand_scan() on error
>   mtd: nand: atmel: return error code of nand_scan_ident/tail() on error
>   mtd: nand: brcmnand: return error code of nand_scan_ident/tail() on
> error
>   mtd: nand: fsmc: return error code of nand_scan_ident/tail() on error
>   mtd: nand: lpc32xx: return error code of nand_scan_ident/tail() on
> error
>   mtd: nand: mediatek: return error code of nand_scan_ident/tail() on
> error
>   mtd: nand: mxc: return error code of nand_scan_ident/tail() on error
>   mtd: nand: omap2: return error code of nand_scan_ident/tail() on error
>   mtd: nand: vf610: return error code of nand_scan_ident/tail() on error
>   mtd: nand: cafe: return error code of nand_scan_ident() on error
>   mtd: nand: hisi504: return error code of nand_scan_ident() on error
>   mtd: nand: pxa3xx: return error code of nand_scan_ident() on error
>   mtd: nand: nandsim: remove unneeded checks for nand_scan_ident/tail()
>   mtd: nand: socrates: use nand_scan() for nand_scan_ident/tail() combo
> 
>  drivers/mtd/nand/ams-delta.c |  5 ++---
>  drivers/mtd/nand/atmel_nand.c| 10 --
>  drivers/mtd/nand/brcmnand/brcmnand.c | 10 ++
>  drivers/mtd/nand/cafe_nand.c |  5 ++---
>  drivers/mtd/nand/cmx270_nand.c   |  4 ++--
>  drivers/mtd/nand/cs553x_nand.c   |  5 ++---
>  drivers/mtd/nand/fsmc_nand.c |  9 -
>  drivers/mtd/nand/gpio.c  |  5 ++---
>  drivers/mtd/nand/hisi504_nand.c  |  4 +---
>  drivers/mtd/nand/lpc32xx_mlc.c   | 10 --
>  drivers/mtd/nand/lpc32xx_slc.c   |  9 +++--
>  drivers/mtd/nand/mpc5121_nfc.c   |  4 ++--
>  drivers/mtd/nand/mtk_nand.c  |  4 ++--
>  drivers/mtd/nand/mxc_nand.c  | 10 --
>  drivers/mtd/nand/nandsim.c   |  4 
>  drivers/mtd/nand/omap2.c |  9 -
>  drivers/mtd/nand/orion_nand.c|  5 ++---
>  drivers/mtd/nand/pasemi_nand.c   |  5 ++---
>  drivers/mtd/nand/plat_nand.c |  5 ++---
>  drivers/mtd/nand/pxa3xx_nand.c   |  5 +++--
>  drivers/mtd/nand/socrates_nand.c | 12 ++--
>  drivers/mtd/nand/tmio_nand.c |  6 +++---
>  drivers/mtd/nand/vf610_nfc.c | 10 --
>  23 files changed, 62 insertions(+), 93 deletions(-)
> 
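The change the series applies across these drivers can be sketched in a few lines of userspace C. This is only an illustration: fake_nand_scan(), probe_fixed_code() and probe_propagate() are hypothetical names, and the stub merely stands in for nand_scan() returning 0 or a negative errno.

```c
#include <errno.h>

/* Hypothetical stand-in for nand_scan(): returns 0 on success or a
 * negative errno value on failure. The caller picks the result so that
 * both paths can be exercised without hardware. */
static int fake_nand_scan(int simulated_result)
{
	return simulated_result;
}

/* Old pattern: every failure is collapsed into a fixed -ENXIO, so the
 * real reason (for example -ENOMEM) is lost to the caller. */
static int probe_fixed_code(int simulated_result)
{
	if (fake_nand_scan(simulated_result))
		return -ENXIO;
	return 0;
}

/* New pattern: keep the scan's return value and hand it back as is. */
static int probe_propagate(int simulated_result)
{
	int err = fake_nand_scan(simulated_result);

	if (err)
		return err;
	return 0;
}
```

With a simulated -ENOMEM, probe_fixed_code() reports -ENXIO while probe_propagate() reports -ENOMEM, which is the more precise error code the cover letter asks for.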



Re: [PATCH net-next] ibmveth: v1 calculate correct gso_size and set gso_type

2016-11-06 Thread Jonathan Maxwell
On Thu, Nov 3, 2016 at 8:40 AM, Brian King  wrote:
> On 10/27/2016 10:26 AM, Eric Dumazet wrote:
>> On Wed, 2016-10-26 at 11:09 +1100, Jon Maxwell wrote:
>>> We recently encountered a bug where a few customers using ibmveth on the
>>> same LPAR hit an issue where a TCP session hung when large receive was
>>> enabled. Closer analysis revealed that the session was stuck because the
>>> one side was advertising a zero window repeatedly.
>>>
>>> We narrowed this down to the fact the ibmveth driver did not set gso_size
>>> which is translated by TCP into the MSS later up the stack. The MSS is
>>> used to calculate the TCP window size and as that was abnormally large,
>>> it was calculating a zero window, even although the sockets receive buffer
>>> was completely empty.
>>>
>>> We were able to reproduce this and worked with IBM to fix this. Thanks Tom
>>> and Marcelo for all your help and review on this.
>>>
>>> The patch fixes both our internal reproduction tests and our customers 
>>> tests.
>>>
>>> Signed-off-by: Jon Maxwell 
>>> ---
>>>  drivers/net/ethernet/ibm/ibmveth.c | 20 
>>>  1 file changed, 20 insertions(+)
>>>
>>> diff --git a/drivers/net/ethernet/ibm/ibmveth.c 
>>> b/drivers/net/ethernet/ibm/ibmveth.c
>>> index 29c05d0..c51717e 100644
>>> --- a/drivers/net/ethernet/ibm/ibmveth.c
>>> +++ b/drivers/net/ethernet/ibm/ibmveth.c
>>> @@ -1182,6 +1182,8 @@ static int ibmveth_poll(struct napi_struct *napi, int 
>>> budget)
>>>  int frames_processed = 0;
>>>  unsigned long lpar_rc;
>>>  struct iphdr *iph;
>>> +bool large_packet = 0;
>>> +u16 hdr_len = ETH_HLEN + sizeof(struct tcphdr);
>>>
>>>  restart_poll:
>>>  while (frames_processed < budget) {
>>> @@ -1236,10 +1238,28 @@ static int ibmveth_poll(struct napi_struct *napi, 
>>> int budget)
>>>  iph->check = 0;
>>>  iph->check = 
>>> ip_fast_csum((unsigned char *)iph, iph->ihl);
>>>  adapter->rx_large_packets++;
>>> +large_packet = 1;
>>>  }
>>>  }
>>>  }
>>>
>>> +if (skb->len > netdev->mtu) {
>>> +iph = (struct iphdr *)skb->data;
>>> +if (be16_to_cpu(skb->protocol) == ETH_P_IP &&
>>> +iph->protocol == IPPROTO_TCP) {
>>> +hdr_len += sizeof(struct iphdr);
>>> +skb_shinfo(skb)->gso_type = 
>>> SKB_GSO_TCPV4;
>>> +skb_shinfo(skb)->gso_size = 
>>> netdev->mtu - hdr_len;
>>> +} else if (be16_to_cpu(skb->protocol) == 
>>> ETH_P_IPV6 &&
>>> +   iph->protocol == IPPROTO_TCP) {
>>> +hdr_len += sizeof(struct ipv6hdr);
>>> +skb_shinfo(skb)->gso_type = 
>>> SKB_GSO_TCPV6;
>>> +skb_shinfo(skb)->gso_size = 
>>> netdev->mtu - hdr_len;
>>> +}
>>> +if (!large_packet)
>>> +adapter->rx_large_packets++;
>>> +}
>>> +
>>>
>>
>> This might break forwarding and PMTU discovery.
>>
>> You force gso_size to device mtu, regardless of real MSS used by the TCP
>> sender.
>>
>> Don't you have the MSS provided in RX descriptor, instead of guessing
>> the value ?
>
> We've had some further discussions on this with the Virtual I/O Server (VIOS)
> development team. The large receive aggregation in the VIOS (AIX based) is
> actually being done by software in the VIOS. What they may be able to do is
> when performing this aggregation, they could look at the packet lengths of
> all the packets being aggregated and take the largest packet size within the
> aggregation unit, minus the header length and return that to the virtual
> ethernet client which we could then stuff into gso_size. They are currently
> assessing how feasible this would be to do and whether it would impact other
> bits of the code. However, assuming this does end up being an option, would
> this address the concerns here or is that going to break something else I'm
> not thinking of?

I was discussing this with a colleague, and although this is better than
what we have so far, we wonder if there could be a corner case where
it ends up with a smaller value than the current MSS. For example, if
the application sent a burst of small TCP packets with the PUSH bit set,
they may not be coalesced by GRO. The VIOS could probably be coded to
detect that condition and use the previous MSS, but that may not
necessarily be the current MSS.
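To make the corner case concrete, here is a minimal userspace sketch of the two ways of picking gso_size discussed above. It is not driver code: gso_size_from_mtu() and gso_size_from_largest() are hypothetical helpers, packet lengths are assumed to be IP packet lengths, and hdr_len is assumed to be the IP plus TCP header bytes with no options.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers for illustration only; none of these exist in
 * ibmveth. */

/* MTU-based guess: assume every aggregated packet was a full-sized
 * segment, so gso_size is the MTU minus the header bytes. */
static uint16_t gso_size_from_mtu(uint16_t mtu, uint16_t hdr_len)
{
	return mtu > hdr_len ? (uint16_t)(mtu - hdr_len) : 0;
}

/* Largest-packet rule: take the biggest packet seen inside one
 * aggregation unit and subtract the header bytes. Returns 0 when the
 * unit is empty or no packet is larger than the headers. */
static uint16_t gso_size_from_largest(const uint16_t *pkt_len, size_t count,
				      uint16_t hdr_len)
{
	uint16_t largest = 0;
	size_t i;

	assert(pkt_len != NULL || count == 0);
	for (i = 0; i < count; i++)
		if (pkt_len[i] > largest)
			largest = pkt_len[i];
	return largest > hdr_len ? (uint16_t)(largest - hdr_len) : 0;
}
```

With an MTU of 1500 and 40 header bytes, an aggregation of 1500, 1500 and 900 byte packets gives 1460 either way. A burst of 140, 100 and 120 byte packets gives 100 under the largest-packet rule, well below a real MSS of 1460, which is the case we are worried about.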

The ibmveth driver passes 

Re: [PATCH 1/3] powerpc: Emulation support for load/store instructions on LE

2016-11-06 Thread Ravi Bangoria


On Sunday 06 November 2016 01:01 AM, Anton Blanchard wrote:
> Hi,
>
>> kprobe, uprobe, hw-breakpoint and xmon are the only user of
>> emulate_step.
>>
>> Kprobe / uprobe single-steps instruction if they can't emulate it, so
>> there is no problem with them. As I mention, hw-breakpoint is broken.
>> However I'm not sure about xmon, I need to check that.
> I was mostly concerned that it would impact kprobes. Sounds like we are
> ok there.
>
>> So yes, there is no user-visible feature that depends on this.
> Aren't hardware breakpoints exposed via perf? I'd call perf
> user-visible.


Thanks Anton, that's a good catch. I tried this on ppc64le:

  $ sudo cat /proc/kallsyms  | grep pid_max
c116998c D pid_max

  $ sudo ./perf record -a --event=mem:0xc116998c sleep 10


Before patch:
  It does not record any data and throws the warning below.

  $ dmesg
[  817.895573] Unable to handle hardware breakpoint. Breakpoint at 
0xc116998c will be disabled.
[  817.895581] [ cut here ]
[  817.895588] WARNING: CPU: 24 PID: 2032 at 
arch/powerpc/kernel/hw_breakpoint.c:277 hw_breakpoint_handler+0x124/0x230
...

After patch:
  It records data properly.

  $ sudo ./perf report --stdio
...
# Samples: 36  of event 'mem:0xc116998c'
# Event count (approx.): 36
#
# Overhead  Command        Shared Object     Symbol
# ........  .............  ................  .............
#
    63.89%  kdumpctl       [kernel.vmlinux]  [k] alloc_pid
    27.78%  opal_errd      [kernel.vmlinux]  [k] alloc_pid
     5.56%  kworker/u97:4  [kernel.vmlinux]  [k] alloc_pid
     2.78%  systemd        [kernel.vmlinux]  [k] alloc_pid
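
The lookup step used above (find the symbol in /proc/kallsyms, then pass its address to perf as mem:ADDR) could be scripted roughly as follows. This is only a sketch: kallsyms_event_for() is a hypothetical helper, and it assumes the plain "address type name" line layout shown in the grep output.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: parse one /proc/kallsyms line of the form
 * "<hex address> <type> <name>" and, if the name matches 'symbol',
 * write a perf breakpoint event spec ("mem:0x<address>") into 'out'.
 * Returns 0 on success and -1 on a parse error, a name mismatch, or a
 * buffer that is too small. */
static int kallsyms_event_for(const char *line, const char *symbol,
			      char *out, size_t out_len)
{
	unsigned long long addr;
	char type;
	char name[128];
	int n;

	if (sscanf(line, "%llx %c %127s", &addr, &type, name) != 3)
		return -1;
	if (strcmp(name, symbol) != 0)
		return -1;

	n = snprintf(out, out_len, "mem:0x%llx", addr);
	if (n < 0 || (size_t)n >= out_len)
		return -1;
	return 0;
}
```

Fed the pid_max line from the grep output, this produces the same mem:0xc116998c string that was passed to perf record --event.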


-Ravi



Linux 4.9: Reported regressions as of Sunday, 2016-11-06

2016-11-06 Thread Thorsten Leemhuis
Hi! Here is my third regression report for Linux 4.9. It lists 17
regressions I'm aware of. 6 of them are new; 3 got fixed since
last week's report (a fourth looks fixed as well). The console
problem ("console: don't prefer first registered [...]") got
reported to me multiple times, but the revert to finally get
this fixed is in -mm already.

As always: Are you aware of any other regressions? Then please let me
know (simply CC regressi...@leemhuis.info). And please tell me if there
is anything in the report that shouldn't be there.

Ciao, Thorsten

== Current regressions ==

Desc: thinkpad x60: BIOS limit stops working,
Repo: 16-11-05 
https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1264916.html
Stat: n/a 
Note: WIP

Desc: thinkpad x60:  thermal passive cooling can not prevent the system from 
overheating, when there is no BIOS limit.
Repo: 16-11-05 
https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1264916.html
Stat: n/a 
Note: WIP

Desc: test failures of sendfile(2) and splice(2) 
Repo: 16-11-01 
https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1262400.html
Stat: 16-11-01 
https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1262648.html
Note: WIP, patch available

Desc: amdgpu, topaz: powerplay initialization failed
Repo: 16-10-31 https://bugzilla.kernel.org/show_bug.cgi?id=185681 
https://bugs.freedesktop.org/show_bug.cgi?id=98357#
Stat: 16-11-04 https://bugzilla.kernel.org/show_bug.cgi?id=185681#c7
Note: WIP

Desc: mangled display since -rc1 (two systems: one with intel, one with nvidia 
gpu)
Repo: 16-10-31 
https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1261699.html
Stat: n/a 
https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1262493.html
Note: root cause unknown, proper bisect needed (would be good if somebody could 
help the reporter)

Desc: build regression: make.cross ARCH=mips fails with "No rule to make 
target 'alchemy/devboards/'."
Repo: 16-10-30 
https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1262410.html 
https://marc.info/?l=linux-kernel&m=147780880425626
Stat: n/a 
Note: nothing happened yet; BTW: Should build regressions be on this list at 
all?

Desc: tpm0: TPM self test failed & can't request region for resource
Repo: 16-10-28 
https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1259943.html 
https://bugzilla.kernel.org/show_bug.cgi?id=185631
Stat: 16-11-03 
https://www.mail-archive.com/tpmdd-devel@lists.sourceforge.net/msg02010.html
Note: Partly fixed by 
https://git.kernel.org/torvalds/c/befd99656c5eb765fe9d96045c4cba099fd938db , 
but it seems more fixes are needed (and available!)

Desc: boot failure of Intel Mobile Internet Devices due to a change in the PCI 
subsystem that appeared in v4.9-rc1.
Repo: 16-10-23 
https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1255643.html
Stat: 16-10-26 
https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1258579.html
Note: Poked list, as it looks like the proposed fix got forgotten

Desc: Radeon Oops on shutdown / Panic on shutdown in routine 
radeon_connector_unregister()
Repo: 16-10-19 https://bugzilla.kernel.org/show_bug.cgi?id=178421 
https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1261699.html
Stat: 16-10-30 https://bugzilla.kernel.org/show_bug.cgi?id=178421#c6
Note: Patch available

Desc: "console: don't prefer first registered if DT specifies stdout-path" 
breaks console on video outputs of various ARM boards; breaks some ppc machines 
as well
Repo: 16-10-18 
https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1264523.html 
https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1253391.html 
https://www.linux-mips.org/archives/linux-mips/16-10/msg00176.html
Stat: 16-11-06 
https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1265059.html 
https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1264422.html
Note: revert discussed and also in -mm; Side note: this seems to be a 
regression that annoys quite a lot of people

Desc: unable to handle kernel NULL pointer dereference at fuse_setattr
Repo: 16-10-17 https://bugzilla.kernel.org/show_bug.cgi?id=177801
Stat: 16-10-18 https://bugzilla.kernel.org/show_bug.cgi?id=177801#c5
Note: poked Miklos, as the fix is not yet upstream afaics

Desc: Skylake gen6 suspend/resume video regression
Repo: 16-10-16 https://bugzilla.kernel.org/show_bug.cgi?id=177731 
https://bugs.freedesktop.org/show_bug.cgi?id=98517
Stat: 16-10-25 https://bugzilla.kernel.org/show_bug.cgi?id=177731#c3
Note: WIP

Desc: warning in intel_dp_aux_transfer: CPU: 0 PID: 4 at 
drivers/gpu/drm/i915/intel_dp.c:1062 intel_dp_aux_transfer+0x1ed/0x230#
Repo: 16-10-16 https://bugzilla.kernel.org/show_bug.cgi?id=177701
Stat: 16-10-27 https://bugs.freedesktop.org/show_bug.cgi?id=97344
Note: Poked Janni a week ago to give a status update, but didn't hear anything 
yet

Desc: module loading broken due to kbuild changes
Repo: 16-10-15 

Re: Linux 4.9: Reported regressions as of Sunday, 2016-10-30

2016-11-06 Thread Thorsten Leemhuis
Lo! On 01.11.2016 09:18, Paul Bolle wrote:
> On Sun, 2016-10-30 at 14:20 +0100, Thorsten Leemhuis wrote:
>> As always: Are you aware of any other regressions? Then please let me
>> know (simply CC regressi...@leemhuis.info).
> Do build regressions count?

That's a good question.

> Because I was trying to fix an obscure build issue in arch/mips, choose
> a random configuration that should hit that issue, and promptly ran
> into
> https://lkml.kernel.org/r/<201610301405.k82kqqw0%25fengguang...@intel.com>
> The same configuration does build under v4.8, I tested that of course.

I'd say it's a practical problem that users run into and hence it's a
regression. Sure, in this case it hits only those that compile kernels
themselves; but those are users, too, and we don't want to scare them
away with things that suddenly stop working.

IOW: I'll include it in this week's report.

Ciao, Thorsten