Re: [PATCH] Fix the message in facility unavailable exception
Balbir Singh writes:

> I ran into this during some testing on qemu. The current
> facility_strings[] are correct when the trap address is
> 0xf80 (hypervisor facility unavailable). When the trap
> address is 0xf60, IC (Interruption Cause) a.k.a status
> in the code is undefined for values 0 and 1.

OK. But how did you generate an exception with an undefined status code?

> This patch adds a check to prevent printing the wrong information
> and helps better direct debugging effort.
>
> diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
> index d26605d..da0f634 100644
> --- a/arch/powerpc/kernel/traps.c
> +++ b/arch/powerpc/kernel/traps.c
> @@ -1520,8 +1520,14 @@ void facility_unavailable_exception(struct pt_regs *regs)
>  	}
>
>  	if ((status < ARRAY_SIZE(facility_strings)) &&
> -	    facility_strings[status])
> -		facility = facility_strings[status];
> +	    facility_strings[status]) {
> +		if (!hv && status < 2) {
> +			pr_warn("Unexpected facility unavailable exception "
> +				"interruption cause %d\n", status);

Please don't add un-ratelimited printk()s in this function, otherwise if
they're user triggerable (which some are) it gives the user a way to
scrub the kernel log.

> +			facility = "Unknown";
> +		} else
> +			facility = facility_strings[status];
> +	}

I think we should instead tighten the condition on that top-level if,
and have an else clause for all cases that uses "Unknown". eg.

	if ((hv || status >= 2) &&
	    (status < ARRAY_SIZE(facility_strings)) &&
	    facility_strings[status]) {
		facility = facility_strings[status];
	} else {
		facility = "Unknown";
	}

And then if you want to we can also print the hex status value in the
existing printk().

cheers
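[Editor's note] The tightened lookup suggested above can be modelled in userspace. This is a sketch with hypothetical table contents (the real `facility_strings[]` lives in arch/powerpc/kernel/traps.c); the point is that statuses 0 and 1 only resolve to a name for the hypervisor (0xf80) flavour of the exception, and everything else falls through to "Unknown":

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical placeholder entries; the real table is in traps.c. */
static const char *facility_strings[] = {
    [0] = "FPU",
    [1] = "VMX/VSX",
};

#define NELEMS(a) (sizeof(a) / sizeof((a)[0]))

/* Mirror of the suggested condition: statuses 0 and 1 are only
 * defined for the HV (0xf80) trap, so a non-HV trap with status < 2
 * must not index the table. */
static const char *facility_name(int hv, size_t status)
{
    if ((hv || status >= 2) &&
        status < NELEMS(facility_strings) &&
        facility_strings[status])
        return facility_strings[status];
    return "Unknown";
}
```

A single condition with a catch-all else also removes the need for the extra printk entirely, which sidesteps the rate-limiting concern.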
[PATCH v2 1/2] cpufreq: powernv: Adding fast_switch for schedutil
Adding fast_switch which does light weight operation to set the desired
pstate. Both global and local pstates are set to the same desired
pstate.

Signed-off-by: Akshay Adiga
---
Changes from v1:
 - Removed unnecessary check for index out of bound.

 drivers/cpufreq/powernv-cpufreq.c | 20 +++-
 1 file changed, 19 insertions(+), 1 deletion(-)

diff --git a/drivers/cpufreq/powernv-cpufreq.c b/drivers/cpufreq/powernv-cpufreq.c
index d3ffde8..4a4380d 100644
--- a/drivers/cpufreq/powernv-cpufreq.c
+++ b/drivers/cpufreq/powernv-cpufreq.c
@@ -752,9 +752,12 @@ static int powernv_cpufreq_cpu_init(struct cpufreq_policy *policy)
 	spin_lock_init(&gpstates->gpstate_lock);
 	ret = cpufreq_table_validate_and_show(policy, powernv_freqs);

-	if (ret < 0)
+	if (ret < 0) {
 		kfree(policy->driver_data);
+		return ret;
+	}

+	policy->fast_switch_possible = true;
 	return ret;
 }

@@ -897,6 +900,20 @@ static void powernv_cpufreq_stop_cpu(struct cpufreq_policy *policy)
 	del_timer_sync(&gpstates->timer);
 }

+static unsigned int powernv_fast_switch(struct cpufreq_policy *policy,
+					unsigned int target_freq)
+{
+	int index;
+	struct powernv_smp_call_data freq_data;
+
+	index = cpufreq_table_find_index_dl(policy, target_freq);
+	freq_data.pstate_id = powernv_freqs[index].driver_data;
+	freq_data.gpstate_id = powernv_freqs[index].driver_data;
+	set_pstate(&freq_data);
+
+	return powernv_freqs[index].frequency;
+}
+
 static struct cpufreq_driver powernv_cpufreq_driver = {
 	.name		= "powernv-cpufreq",
 	.flags		= CPUFREQ_CONST_LOOPS,
@@ -904,6 +921,7 @@ static struct cpufreq_driver powernv_cpufreq_driver = {
 	.exit		= powernv_cpufreq_cpu_exit,
 	.verify		= cpufreq_generic_frequency_table_verify,
 	.target_index	= powernv_cpufreq_target_index,
+	.fast_switch	= powernv_fast_switch,
 	.get		= powernv_cpufreq_get,
 	.stop_cpu	= powernv_cpufreq_stop_cpu,
 	.attr		= powernv_cpu_freq_attr,
--
2.5.5
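[Editor's note] The patch leans on `cpufreq_table_find_index_dl()` for the frequency-to-index lookup. As I read that helper's name, "d" means the table is sorted in descending order and "l" means CPUFREQ_RELATION_L (lowest frequency at or above the target, falling back when nothing qualifies). A simplified userspace model of that semantic, not the kernel helper itself:

```c
#include <assert.h>

/* Sketch of a descending-table, RELATION_L-style lookup: return the
 * index of the lowest frequency >= target; clamp to the fastest entry
 * when the target exceeds the table, and to the slowest entry when
 * every frequency is below the target. Assumes n >= 1 and freqs[]
 * strictly descending. */
static int find_index_dl(const unsigned int *freqs, int n, unsigned int target)
{
    int best = -1;

    if (target >= freqs[0])
        return 0;                     /* clamp to the fastest pstate */

    for (int i = 0; i < n; i++) {
        if (freqs[i] >= target)
            best = i;                 /* keep walking down to the lowest >= target */
        else
            break;                    /* descending table: nothing further qualifies */
    }
    return best >= 0 ? best : n - 1;  /* all entries below target: slowest */
}
```

Because fast_switch() runs in scheduler context, doing only this table walk plus the `set_pstate()` SPR write (no locks, no sleeping) is what makes the path "light weight".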
[PATCH v2 2/2] cpufreq: powernv: Use PMCR to verify global and local pstate
As fast_switch() may get called with interrupts disabled, we cannot
hold a mutex to update the global_pstate_info. So currently,
fast_switch() does not update the global_pstate_info, and it will end
up with stale data whenever the pstate is updated through
fast_switch().

As the gpstate_timer can fire after fast_switch() has updated the
pstates, the timer handler cannot rely on the cached values of local
and global pstate and needs to read them from the PMCR.

Only gpstate_timer_handler() is affected by the stale cached pstate
data, because either the fast_switch() or the target_index() routine
will be called for a given governor, but the gpstate_timer can fire
after the governor has changed to schedutil.

Signed-off-by: Akshay Adiga
---
Changes from v1:
 - Corrected commit message
 - Type cast pstate values read from PMCR to type s8
 - Added macros to get local and global pstates from PMCR

 drivers/cpufreq/powernv-cpufreq.c | 34 --
 1 file changed, 24 insertions(+), 10 deletions(-)

diff --git a/drivers/cpufreq/powernv-cpufreq.c b/drivers/cpufreq/powernv-cpufreq.c
index 4a4380d..bf4bc585 100644
--- a/drivers/cpufreq/powernv-cpufreq.c
+++ b/drivers/cpufreq/powernv-cpufreq.c
@@ -42,6 +42,8 @@
 #define PMSR_PSAFE_ENABLE	(1UL << 30)
 #define PMSR_SPR_EM_DISABLE	(1UL << 31)
 #define PMSR_MAX(x)		((x >> 32) & 0xFF)
+#define PMCR_LPSTATE(x)		(((x) >> 48) & 0xFF)
+#define PMCR_GPSTATE(x)		(((x) >> 56) & 0xFF)
 #define MAX_RAMP_DOWN_TIME	5120
 /*
@@ -592,7 +594,8 @@ void gpstate_timer_handler(unsigned long data)
 {
 	struct cpufreq_policy *policy = (struct cpufreq_policy *)data;
 	struct global_pstate_info *gpstates = policy->driver_data;
-	int gpstate_idx;
+	int gpstate_idx, lpstate_idx;
+	unsigned long val;
 	unsigned int time_diff = jiffies_to_msecs(jiffies)
 					- gpstates->last_sampled_time;
 	struct powernv_smp_call_data freq_data;
@@ -600,21 +603,36 @@ void gpstate_timer_handler(unsigned long data)
 	if (!spin_trylock(&gpstates->gpstate_lock))
 		return;

+	/*
+	 * If PMCR was last updated using fast_switch(), then
+	 * gpstates->last_lpstate_idx may hold a wrong value.
+	 * Hence, read from PMCR to get correct data.
+	 */
+	val = get_pmspr(SPRN_PMCR);
+	freq_data.gpstate_id = (s8)PMCR_GPSTATE(val);
+	freq_data.pstate_id = (s8)PMCR_LPSTATE(val);
+	if (freq_data.gpstate_id == freq_data.pstate_id) {
+		reset_gpstates(policy);
+		spin_unlock(&gpstates->gpstate_lock);
+		return;
+	}
+
 	gpstates->last_sampled_time += time_diff;
 	gpstates->elapsed_time += time_diff;
-	freq_data.pstate_id = idx_to_pstate(gpstates->last_lpstate_idx);

-	if ((gpstates->last_gpstate_idx == gpstates->last_lpstate_idx) ||
-	    (gpstates->elapsed_time > MAX_RAMP_DOWN_TIME)) {
+	if (gpstates->elapsed_time > MAX_RAMP_DOWN_TIME) {
 		gpstate_idx = pstate_to_idx(freq_data.pstate_id);
 		reset_gpstates(policy);
 		gpstates->highest_lpstate_idx = gpstate_idx;
 	} else {
+		lpstate_idx = pstate_to_idx(freq_data.pstate_id);
 		gpstate_idx = calc_global_pstate(gpstates->elapsed_time,
 						 gpstates->highest_lpstate_idx,
-						 gpstates->last_lpstate_idx);
+						 lpstate_idx);
 	}
-
+	freq_data.gpstate_id = idx_to_pstate(gpstate_idx);
+	gpstates->last_gpstate_idx = gpstate_idx;
+	gpstates->last_lpstate_idx = lpstate_idx;
 	/*
 	 * If local pstate is equal to global pstate, rampdown is over
 	 * So timer is not required to be queued.
@@ -622,10 +640,6 @@ void gpstate_timer_handler(unsigned long data)
 	if (gpstate_idx != gpstates->last_lpstate_idx)
 		queue_gpstate_timer(gpstates);

-	freq_data.gpstate_id = idx_to_pstate(gpstate_idx);
-	gpstates->last_gpstate_idx = pstate_to_idx(freq_data.gpstate_id);
-	gpstates->last_lpstate_idx = pstate_to_idx(freq_data.pstate_id);
-
	spin_unlock(&gpstates->gpstate_lock);

 	/* Timer may get migrated to a different cpu on cpu hot unplug */
--
2.5.5
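[Editor's note] The PMCR field extraction the patch adds can be exercised in userspace. Per the patch, the local pstate sits in bits 48-55 of the PMCR image and the global pstate in bits 56-63, each a signed 8-bit value (negative pstates run slower than nominal), which is why the changelog mentions the `(s8)` cast:

```c
#include <assert.h>
#include <stdint.h>

/* Same extraction macros as the patch, modelled with stdint types. */
#define PMCR_LPSTATE(x) (((x) >> 48) & 0xFF)
#define PMCR_GPSTATE(x) (((x) >> 56) & 0xFF)

/* The cast to a signed 8-bit type recovers negative pstate numbers. */
static int8_t pmcr_lpstate(uint64_t pmcr) { return (int8_t)PMCR_LPSTATE(pmcr); }
static int8_t pmcr_gpstate(uint64_t pmcr) { return (int8_t)PMCR_GPSTATE(pmcr); }
```

Reading both fields from one PMCR read is what lets the timer handler notice the "global == local, rampdown already over" case without trusting the cached indices.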
Re: [PATCH v2] ppc: cpufreq: disable preemption while checking CPU throttling state
Denis Kirjanov writes:

> [ 67.700897] BUG: using smp_processor_id() in preemptible [] code: cat/7343
> [ 67.700988] caller is .powernv_cpufreq_throttle_check+0x2c/0x710
> [ 67.700998] CPU: 13 PID: 7343 Comm: cat Not tainted 4.8.0-rc5-dirty #1
> [ 67.701038] Call Trace:
> [ 67.701066] [c007d25b75b0] [c0971378] .dump_stack+0xe4/0x150 (unreliable)
> [ 67.701153] [c007d25b7640] [c05162e4] .check_preemption_disabled+0x134/0x150
> [ 67.701238] [c007d25b76e0] [c07b63ac] .powernv_cpufreq_throttle_check+0x2c/0x710
> [ 67.701322] [c007d25b7790] [c07b6d18] .powernv_cpufreq_target_index+0x288/0x360
> [ 67.701407] [c007d25b7870] [c07acee4] .__cpufreq_driver_target+0x394/0x8c0
> [ 67.701491] [c007d25b7920] [c07b22ac] .cpufreq_set+0x7c/0xd0
> [ 67.701565] [c007d25b79b0] [c07adf50] .store_scaling_setspeed+0x80/0xc0
> [ 67.701650] [c007d25b7a40] [c07ae270] .store+0xa0/0x100
> [ 67.701723] [c007d25b7ae0] [c03566e8] .sysfs_kf_write+0x88/0xb0
> [ 67.701796] [c007d25b7b70] [c03553b8] .kernfs_fop_write+0x178/0x260
> [ 67.701881] [c007d25b7c10] [c02ac3cc] .__vfs_write+0x3c/0x1c0
> [ 67.701954] [c007d25b7cf0] [c02ad584] .vfs_write+0xc4/0x230
> [ 67.702027] [c007d25b7d90] [c02aeef8] .SyS_write+0x58/0x100
> [ 67.702101] [c007d25b7e30] [c000bfec] system_call+0x38/0xfc
>
> Signed-off-by: Denis Kirjanov
>
> v2: wrap powernv_cpufreq_throttle_check()
> as suggested by Gautham R Shenoy

That should be below the "---".

When did this break?

cheers
Re: [PATCH 2/2] cpufreq: powernv: Use PMSR to verify global and local pstate
Thanks Viresh for taking a look at it.

I will make the mentioned changes in the next version of the patch and
will add Shilpa and Gautham to the mail chain.

Regards
Akshay Adiga

On 11/04/2016 12:11 PM, Viresh Kumar wrote:
> On 04-11-16, 10:57, Akshay Adiga wrote:
>> As fast_switch may get called in interrupt disable mode, it does not
>
> s/in interrupt disable mode/with interrupts disabled
> s/it does/it may
>
>> update the global_pstate_info data structure. Hence the
>> global_pstate_info has stale data whenever pstate is updated through
>> fast_swtich().
>
> s/has/may have
> s/swtich/switch
>
>> So the gpstate_timer can fire after a fast_switch() call has update
>
> s/So the/The
> s/a fast_swtich() call has update/the fast_switch() call has updated
>
>> the pstates to a different value. Hence the timer handler cannot rely
>> on the cached values of local and global pstate and needs to read it
>> from the PMSR.
>>
>> Signed-off-by: Akshay Adiga
>> ---
>>  drivers/cpufreq/powernv-cpufreq.c | 32 ++--
>>  1 file changed, 22 insertions(+), 10 deletions(-)
>
> I am not the best guy to judge the code changes here. Can you please
> include Shilpa and Gautham to the mail chain and get their feedback.
Re: [PATCH 1/2] cpufreq: powernv: Adding fast_switch for schedutil
Thanks Viresh for taking a look at it.

I will make the mentioned changes in the next version of the patch.

Regards
Akshay Adiga

On 11/04/2016 12:03 PM, Viresh Kumar wrote:
> On 04-11-16, 10:57, Akshay Adiga wrote:
>> Adding fast_switch which does light weight operation to set the
>> desired pstate.
>>
>> Signed-off-by: Akshay Adiga
>> ---
>>  drivers/cpufreq/powernv-cpufreq.c | 22 +-
>>  1 file changed, 21 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/cpufreq/powernv-cpufreq.c b/drivers/cpufreq/powernv-cpufreq.c
>> index d3ffde8..09a0496 100644
>> --- a/drivers/cpufreq/powernv-cpufreq.c
>> +++ b/drivers/cpufreq/powernv-cpufreq.c
>> @@ -752,9 +752,12 @@ static int powernv_cpufreq_cpu_init(struct cpufreq_policy *policy)
>>  	spin_lock_init(&gpstates->gpstate_lock);
>>  	ret = cpufreq_table_validate_and_show(policy, powernv_freqs);
>>
>> -	if (ret < 0)
>> +	if (ret < 0) {
>>  		kfree(policy->driver_data);
>> +		return ret;
>> +	}
>>
>> +	policy->fast_switch_possible = true;
>>  	return ret;
>>  }
>>
>> @@ -897,6 +900,22 @@ static void powernv_cpufreq_stop_cpu(struct cpufreq_policy *policy)
>>  	del_timer_sync(&gpstates->timer);
>>  }
>>
>> +static unsigned int powernv_fast_switch(struct cpufreq_policy *policy,
>> +					unsigned int target_freq)
>> +{
>> +	int index;
>> +	struct powernv_smp_call_data freq_data;
>> +
>> +	index = cpufreq_table_find_index_dl(policy, target_freq);
>> +	if (index < 0 || index >= powernv_pstate_info.nr_pstates)
>> +		return CPUFREQ_ENTRY_INVALID;
>
> I don't think such a check is required at all. It wouldn't happen
> without a BUG in kernel.
>
>> +	freq_data.pstate_id = powernv_freqs[index].driver_data;
>> +	freq_data.gpstate_id = powernv_freqs[index].driver_data;
>> +	set_pstate(&freq_data);
>> +
>> +	return powernv_freqs[index].frequency;
>> +}
>> +
>>  static struct cpufreq_driver powernv_cpufreq_driver = {
>>  	.name		= "powernv-cpufreq",
>>  	.flags		= CPUFREQ_CONST_LOOPS,
>> @@ -904,6 +923,7 @@ static struct cpufreq_driver powernv_cpufreq_driver = {
>>  	.exit		= powernv_cpufreq_cpu_exit,
>>  	.verify		= cpufreq_generic_frequency_table_verify,
>>  	.target_index	= powernv_cpufreq_target_index,
>> +	.fast_switch	= powernv_fast_switch,
>>  	.get		= powernv_cpufreq_get,
>>  	.stop_cpu	= powernv_cpufreq_stop_cpu,
>>  	.attr		= powernv_cpu_freq_attr,
>> --
>> 2.7.4
Re: [PATCH v2] ppc: cpufreq: disable preemption while checking CPU throttling state
On Monday, November 7, 2016, Gautham R Shenoy wrote:
> Hi Denis,
>
> On Fri, Nov 04, 2016 at 07:08:38AM -0400, Denis Kirjanov wrote:
>
> You can provide the config option with which this bug was found in the
> change log. I suppose you had enabled CONFIG_DEBUG_PREEMPT.

that's why I put the comment

> > [ 67.700897] BUG: using smp_processor_id() in preemptible [] code: cat/7343
> > [ 67.700988] caller is .powernv_cpufreq_throttle_check+0x2c/0x710
> > [ 67.700998] CPU: 13 PID: 7343 Comm: cat Not tainted 4.8.0-rc5-dirty #1
> > [ 67.701038] Call Trace:
> > [ 67.701066] [c007d25b75b0] [c0971378] .dump_stack+0xe4/0x150 (unreliable)
> > [ 67.701153] [c007d25b7640] [c05162e4] .check_preemption_disabled+0x134/0x150
> > [ 67.701238] [c007d25b76e0] [c07b63ac] .powernv_cpufreq_throttle_check+0x2c/0x710
> > [ 67.701322] [c007d25b7790] [c07b6d18] .powernv_cpufreq_target_index+0x288/0x360
> > [ 67.701407] [c007d25b7870] [c07acee4] .__cpufreq_driver_target+0x394/0x8c0
> > [ 67.701491] [c007d25b7920] [c07b22ac] .cpufreq_set+0x7c/0xd0
> > [ 67.701565] [c007d25b79b0] [c07adf50] .store_scaling_setspeed+0x80/0xc0
> > [ 67.701650] [c007d25b7a40] [c07ae270] .store+0xa0/0x100
> > [ 67.701723] [c007d25b7ae0] [c03566e8] .sysfs_kf_write+0x88/0xb0
> > [ 67.701796] [c007d25b7b70] [c03553b8] .kernfs_fop_write+0x178/0x260
> > [ 67.701881] [c007d25b7c10] [c02ac3cc] .__vfs_write+0x3c/0x1c0
> > [ 67.701954] [c007d25b7cf0] [c02ad584] .vfs_write+0xc4/0x230
> > [ 67.702027] [c007d25b7d90] [c02aeef8] .SyS_write+0x58/0x100
> > [ 67.702101] [c007d25b7e30] [c000bfec] system_call+0x38/0xfc
> >
> > Signed-off-by: Denis Kirjanov
> >
> > v2: wrap powernv_cpufreq_throttle_check()
> > as suggested by Gautham R Shenoy
>
> Looks good otherwise.
>
> Reviewed-by: Gautham R. Shenoy
>
> > ---
> >  drivers/cpufreq/powernv-cpufreq.c | 9 -
> >  1 file changed, 8 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/cpufreq/powernv-cpufreq.c b/drivers/cpufreq/powernv-cpufreq.c
> > index d3ffde8..112e0e2 100644
> > --- a/drivers/cpufreq/powernv-cpufreq.c
> > +++ b/drivers/cpufreq/powernv-cpufreq.c
> > @@ -647,8 +647,15 @@ static int powernv_cpufreq_target_index(struct cpufreq_policy *policy,
> >  	if (unlikely(rebooting) && new_index != get_nominal_index())
> >  		return 0;
> >
> > -	if (!throttled)
> > +	if (!throttled) {
> > +		/*
> > +		 * we don't want to be preempted while
> > +		 * checking if the CPU frequency has been throttled
> > +		 */
> > +		preempt_disable();
> >  		powernv_cpufreq_throttle_check(NULL);
> > +		preempt_enable();
> > +	}
> >
> >  	cur_msec = jiffies_to_msecs(get_jiffies_64());
> >
> > --
> > 1.8.3.1
Re: [PATCH v2] ppc: cpufreq: disable preemption while checking CPU throttling state
Hi Denis,

On Fri, Nov 04, 2016 at 07:08:38AM -0400, Denis Kirjanov wrote:

You can provide the config option with which this bug was found in the
change log. I suppose you had enabled CONFIG_DEBUG_PREEMPT.

> [ 67.700897] BUG: using smp_processor_id() in preemptible [] code: cat/7343
> [ 67.700988] caller is .powernv_cpufreq_throttle_check+0x2c/0x710
> [ 67.700998] CPU: 13 PID: 7343 Comm: cat Not tainted 4.8.0-rc5-dirty #1
> [ 67.701038] Call Trace:
> [ 67.701066] [c007d25b75b0] [c0971378] .dump_stack+0xe4/0x150 (unreliable)
> [ 67.701153] [c007d25b7640] [c05162e4] .check_preemption_disabled+0x134/0x150
> [ 67.701238] [c007d25b76e0] [c07b63ac] .powernv_cpufreq_throttle_check+0x2c/0x710
> [ 67.701322] [c007d25b7790] [c07b6d18] .powernv_cpufreq_target_index+0x288/0x360
> [ 67.701407] [c007d25b7870] [c07acee4] .__cpufreq_driver_target+0x394/0x8c0
> [ 67.701491] [c007d25b7920] [c07b22ac] .cpufreq_set+0x7c/0xd0
> [ 67.701565] [c007d25b79b0] [c07adf50] .store_scaling_setspeed+0x80/0xc0
> [ 67.701650] [c007d25b7a40] [c07ae270] .store+0xa0/0x100
> [ 67.701723] [c007d25b7ae0] [c03566e8] .sysfs_kf_write+0x88/0xb0
> [ 67.701796] [c007d25b7b70] [c03553b8] .kernfs_fop_write+0x178/0x260
> [ 67.701881] [c007d25b7c10] [c02ac3cc] .__vfs_write+0x3c/0x1c0
> [ 67.701954] [c007d25b7cf0] [c02ad584] .vfs_write+0xc4/0x230
> [ 67.702027] [c007d25b7d90] [c02aeef8] .SyS_write+0x58/0x100
> [ 67.702101] [c007d25b7e30] [c000bfec] system_call+0x38/0xfc
>
> Signed-off-by: Denis Kirjanov
>
> v2: wrap powernv_cpufreq_throttle_check()
> as suggested by Gautham R Shenoy

Looks good otherwise.

Reviewed-by: Gautham R. Shenoy

> ---
>  drivers/cpufreq/powernv-cpufreq.c | 9 -
>  1 file changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/cpufreq/powernv-cpufreq.c b/drivers/cpufreq/powernv-cpufreq.c
> index d3ffde8..112e0e2 100644
> --- a/drivers/cpufreq/powernv-cpufreq.c
> +++ b/drivers/cpufreq/powernv-cpufreq.c
> @@ -647,8 +647,15 @@ static int powernv_cpufreq_target_index(struct cpufreq_policy *policy,
>  	if (unlikely(rebooting) && new_index != get_nominal_index())
>  		return 0;
>
> -	if (!throttled)
> +	if (!throttled) {
> +		/*
> +		 * we don't want to be preempted while
> +		 * checking if the CPU frequency has been throttled
> +		 */
> +		preempt_disable();
>  		powernv_cpufreq_throttle_check(NULL);
> +		preempt_enable();
> +	}
>
>  	cur_msec = jiffies_to_msecs(get_jiffies_64());
>
> --
> 1.8.3.1
[PATCH 4/4] powerpc/perf: macros for PowerISA v3.0 format encoding
Patch to add macros and constants to support the PowerISA v3.0 raw
event encoding format. A couple of new functions are added to support
the new width and location of bit fields like PMCxCOMB and THRESH_CMP
within MMCR* in PowerISA v3.0.

Signed-off-by: Madhavan Srinivasan
---
 arch/powerpc/perf/isa207-common.c | 88 ---
 arch/powerpc/perf/isa207-common.h | 27 +++-
 2 files changed, 108 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/perf/isa207-common.c b/arch/powerpc/perf/isa207-common.c
index 2a2040ea5f99..a3d8a6f31226 100644
--- a/arch/powerpc/perf/isa207-common.c
+++ b/arch/powerpc/perf/isa207-common.c
@@ -55,6 +55,81 @@ static inline bool event_is_fab_match(u64 event)
 	return (event == 0x30056 || event == 0x4f052);
 }

+static bool is_event_valid(u64 event)
+{
+	if (cpu_has_feature(CPU_FTR_ARCH_300) &&
+			(cpu_has_feature(CPU_FTR_POWER9_DD1)) &&
+			(event & ~EVENT_VALID_MASK))
+		return false;
+	else if (cpu_has_feature(CPU_FTR_ARCH_300) &&
+			(event & ~ISA300_EVENT_VALID_MASK))
+		return false;
+	else if (event & ~EVENT_VALID_MASK)
+		return false;
+
+	return true;
+}
+
+static u64 mmcra_sdar_mode(u64 event)
+{
+	u64 sm;
+
+	if (cpu_has_feature(CPU_FTR_ARCH_300) &&
+			(cpu_has_feature(CPU_FTR_POWER9_DD1))) {
+		goto sm_tlb;
+	} else if (cpu_has_feature(CPU_FTR_ARCH_300)) {
+		sm = (event >> ISA300_SDAR_MODE_SHIFT) & ISA300_SDAR_MODE_MASK;
+		if (sm)
+			return sm << MMCRA_SDAR_MODE_SHIFT;
+	}
+
+sm_tlb:
+	return MMCRA_SDAR_MODE_TLB;
+}
+
+static u64 combine_from_event(u64 event)
+{
+	u64 combine;
+
+	if (cpu_has_feature(CPU_FTR_ARCH_300) &&
+			(cpu_has_feature(CPU_FTR_POWER9_DD1)))
+		combine = (event >> EVENT_COMBINE_SHIFT) & EVENT_COMBINE_MASK;
+	else if (cpu_has_feature(CPU_FTR_ARCH_300))
+		combine = (event >> ISA300_EVENT_COMBINE_SHIFT) &
+				ISA300_EVENT_COMBINE_MASK;
+	else
+		combine = (event >> EVENT_COMBINE_SHIFT) & EVENT_COMBINE_MASK;
+
+	return combine;
+}
+
+static unsigned long combine_shift(unsigned long pmc)
+{
+	if (cpu_has_feature(CPU_FTR_ARCH_300) &&
+			(cpu_has_feature(CPU_FTR_POWER9_DD1)))
+		goto comb_shift;
+	else if (cpu_has_feature(CPU_FTR_ARCH_300))
+		return ISA300_MMCR1_COMBINE_SHIFT(pmc);
+	else
+		goto comb_shift;
+
+comb_shift:
+	return MMCR1_COMBINE_SHIFT(pmc);
+}
+
 int isa207_get_constraint(u64 event, unsigned long *maskp, unsigned long *valp)
 {
 	unsigned int unit, pmc, cache, ebb;
@@ -62,7 +137,7 @@ int isa207_get_constraint(u64 event, unsigned long *maskp, unsigned long *valp)

 	mask = value = 0;

-	if (event & ~EVENT_VALID_MASK)
+	if (!is_event_valid(event))
 		return -1;

 	pmc   = (event >> EVENT_PMC_SHIFT) & EVENT_PMC_MASK;
@@ -189,15 +264,13 @@ int isa207_compute_mmcr(u64 event[], int n_ev,
 		pmc_inuse |= 1 << pmc;
 	}

-	/* In continuous sampling mode, update SDAR on TLB miss */
-	mmcra = MMCRA_SDAR_MODE_TLB;
 	mmcr1 = mmcr2 = 0;

 	/* Second pass: assign PMCs, set all MMCR1 fields */
 	for (i = 0; i < n_ev; ++i) {
 		pmc     = (event[i] >> EVENT_PMC_SHIFT) & EVENT_PMC_MASK;
 		unit    = (event[i] >> EVENT_UNIT_SHIFT) & EVENT_UNIT_MASK;
-		combine = (event[i] >> EVENT_COMBINE_SHIFT) & EVENT_COMBINE_MASK;
+		combine = combine_from_event(event[i]);
 		psel    =  event[i] & EVENT_PSEL_MASK;

 		if (!pmc) {
@@ -211,10 +284,13 @@ int isa207_compute_mmcr(u64 event[], int n_ev,

 		if (pmc <= 4) {
 			mmcr1 |= unit << MMCR1_UNIT_SHIFT(pmc);
-			mmcr1 |= combine << MMCR1_COMBINE_SHIFT(pmc);
+			mmcr1 |= combine << combine_shift(pmc);
 			mmcr1 |= psel << MMCR1_PMCSEL_SHIFT(pmc);
 		}

+		/* In continuous sampling mode, update SDAR on TLB miss */
+		mmcra |= mmcra_sdar_mode(event[i]);
+
 		if (event[i] & EVENT_IS_L1) {
 			cache = event[i] >> EVENT_CACHE_SEL_SHIFT;
 			mmcr1 |= (cache & 1) << MMCR1_IC_QUAL_SHIFT;
@@ -245,7 +321,7 @@ int
[PATCH 3/4] powerpc/perf: PowerISA v3.0 raw event format encoding
Patch to update the PowerISA v3.0 raw event encoding format information
and add support for the same in Power9.

Signed-off-by: Madhavan Srinivasan
---
 arch/powerpc/perf/power9-pmu.c | 134 +
 1 file changed, 134 insertions(+)

diff --git a/arch/powerpc/perf/power9-pmu.c b/arch/powerpc/perf/power9-pmu.c
index d1782fd644e9..928d0e739ed4 100644
--- a/arch/powerpc/perf/power9-pmu.c
+++ b/arch/powerpc/perf/power9-pmu.c
@@ -16,6 +16,78 @@
 #include "isa207-common.h"

 /*
+ * Raw event encoding for PowerISA v3.0:
+ *
+ *        60        56        52        48        44        40        36        32
+ * | - - - - | - - - - | - - - - | - - - - | - - - - | - - - - | - - - - | - - - - |
+ *   | |                 [ ] [ ]   [ thresh_cmp ]     [  thresh_ctl   ]
+ *   | |                  |   |          |                    |
+ *   | |                  *- IFM (Linux) |   thresh start/stop OR FAB match -*
+ *   | *- BHRB (Linux)        *sm
+ *   *- EBB (Linux)
+ *
+ *        28        24        20        16        12         8         4         0
+ * | - - - - | - - - - | - - - - | - - - - | - - - - | - - - - | - - - - | - - - - |
+ *   [   ] [  sample ]   [cache]   [ pmc ]   [unit ]   [ ] m   [    pmcxsel    ]
+ *     |        |           |                   |        |  |
+ *     |        |           |                   |        |  *- mark
+ *     |        |           *- L1/L2/L3 cache_sel        |
+ *     |        |                                        *- combine
+ *     |        *- sampling mode for marked events
+ *     *- thresh_sel
+ *
+ * Below uses IBM bit numbering.
+ *
+ * MMCR1[x:y] = unit (PMCxUNIT)
+ * MMCR1[24]  = pmc1combine[0]
+ * MMCR1[25]  = pmc1combine[1]
+ * MMCR1[26]  = pmc2combine[0]
+ * MMCR1[27]  = pmc2combine[1]
+ * MMCR1[28]  = pmc3combine[0]
+ * MMCR1[29]  = pmc3combine[1]
+ * MMCR1[30]  = pmc4combine[0]
+ * MMCR1[31]  = pmc4combine[1]
+ *
+ * if pmc == 3 and unit == 0 and pmcxsel[0:6] == 0b0101011
+ *	# PM_MRK_FAB_RSP_MATCH
+ *	MMCR1[20:27] = thresh_ctl (FAB_CRESP_MATCH / FAB_TYPE_MATCH)
+ * else if pmc == 4 and unit == 0xf and pmcxsel[0:6] == 0b0101001
+ *	# PM_MRK_FAB_RSP_MATCH_CYC
+ *	MMCR1[20:27] = thresh_ctl (FAB_CRESP_MATCH / FAB_TYPE_MATCH)
+ * else
+ *	MMCRA[48:55] = thresh_ctl (THRESH START/END)
+ *
+ * if thresh_sel:
+ *	MMCRA[45:47] = thresh_sel
+ *
+ * if thresh_cmp:
+ *	MMCRA[9:11]  = thresh_cmp[0:2]
+ *	MMCRA[12:18] = thresh_cmp[3:9]
+ *
+ * if unit == 6 or unit == 7
+ *	MMCRC[53:55] = cache_sel[1:3] (L2EVENT_SEL)
+ * else if unit == 8 or unit == 9:
+ *	if cache_sel[0] == 0: # L3 bank
+ *		MMCRC[47:49] = cache_sel[1:3] (L3EVENT_SEL0)
+ *	else if cache_sel[0] == 1:
+ *		MMCRC[50:51] = cache_sel[2:3] (L3EVENT_SEL1)
+ * else if cache_sel[1]: # L1 event
+ *	MMCR1[16] = cache_sel[2]
+ *	MMCR1[17] = cache_sel[3]
+ *
+ * if mark:
+ *	MMCRA[63]    = 1		(SAMPLE_ENABLE)
+ *	MMCRA[57:59] = sample[0:2]	(RAND_SAMP_ELIG)
+ *	MMCRA[61:62] = sample[3:4]	(RAND_SAMP_MODE)
+ *
+ * if EBB and BHRB:
+ *	MMCRA[32:33] = IFM
+ *
+ * MMCRA[SDAR_MODE] = sm
+ */
+
+/*
  * Some power9 event codes.
  */
 #define EVENT(_name, _code)	_name = _code,
@@ -99,6 +171,48 @@ static const struct attribute_group *power9_isa207_pmu_attr_groups[] = {
 	NULL,
 };

+PMU_FORMAT_ATTR(event,		"config:0-51");
+PMU_FORMAT_ATTR(pmcxsel,	"config:0-7");
+PMU_FORMAT_ATTR(mark,		"config:8");
+PMU_FORMAT_ATTR(combine,	"config:10-11");
+PMU_FORMAT_ATTR(unit,		"config:12-15");
+PMU_FORMAT_ATTR(pmc,		"config:16-19");
+PMU_FORMAT_ATTR(cache_sel,	"config:20-23");
+PMU_FORMAT_ATTR(sample_mode,	"config:24-28");
+PMU_FORMAT_ATTR(thresh_sel,	"config:29-31");
+PMU_FORMAT_ATTR(thresh_stop,	"config:32-35");
+PMU_FORMAT_ATTR(thresh_start,	"config:36-39");
+PMU_FORMAT_ATTR(thresh_cmp,	"config:40-49");
+PMU_FORMAT_ATTR(sdar_mode,	"config:50-51");
+
+static struct attribute *power9_pmu_format_attr[] = {
+	&format_attr_event.attr,
+	&format_attr_pmcxsel.attr,
+	&format_attr_mark.attr,
+	&format_attr_combine.attr,
+	&format_attr_unit.attr,
+	&format_attr_pmc.attr,
+	&format_attr_cache_sel.attr,
+	&format_attr_sample_mode.attr,
+	&format_attr_thresh_sel.attr,
+	&format_attr_thresh_stop.attr,
+	&format_attr_thresh_start.attr,
+	&format_attr_thresh_cmp.attr,
+	&format_attr_sdar_mode.attr,
+	NULL,
+};
+
+static struct attribute_group power9_pmu_format_group = {
+	.name = "format",
+	.attrs = power9_pmu_format_attr,
+};
+
+static const struct attribute_group *power9_pmu_attr_groups[] = {
+	&power9_pmu_format_group,
+	&power9_pmu_events_group,
+	NULL,
+};
+
 static int power9_generic_events[] = {
 	[PERF_COUNT_HW_CPU_CYCLES] =			PM_CYC,
 	[PERF_COUNT_HW_STALLED_CYCLES_FRONTEND] =
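[Editor's note] The `PMU_FORMAT_ATTR()` lines in this patch advertise, via sysfs, which bit range of `perf_event_attr.config` each field occupies (pmcxsel config:0-7, mark config:8, combine config:10-11, unit config:12-15, pmc config:16-19, sdar_mode config:50-51, ...). A userspace sketch of slicing a raw event code along those boundaries — a model for illustration, not the kernel's decoder:

```c
#include <assert.h>
#include <stdint.h>

/* Extract the inclusive bit range [lo, hi] from a raw config value,
 * matching the "config:lo-hi" strings exported through sysfs. */
#define FIELD(cfg, lo, hi) (((cfg) >> (lo)) & ((1ULL << ((hi) - (lo) + 1)) - 1))

static uint64_t ev_pmcxsel(uint64_t cfg)   { return FIELD(cfg, 0, 7);   }
static uint64_t ev_mark(uint64_t cfg)      { return FIELD(cfg, 8, 8);   }
static uint64_t ev_combine(uint64_t cfg)   { return FIELD(cfg, 10, 11); }
static uint64_t ev_unit(uint64_t cfg)      { return FIELD(cfg, 12, 15); }
static uint64_t ev_pmc(uint64_t cfg)       { return FIELD(cfg, 16, 19); }
static uint64_t ev_sdar_mode(uint64_t cfg) { return FIELD(cfg, 50, 51); }
```

This is also why the v3.0 `event` field widens to config:0-51 here, versus config:0-49 on v2.07: the two new sdar_mode bits sit above the old event mask.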
[PATCH 2/4] powerpc/perf: update attribute_group data structure
Rename the power_pmu and attribute_group variables to indicate that
they support PowerISA v2.07. Add a cpu feature flag check to pick the
PowerISA v2.07 format structures.

Signed-off-by: Madhavan Srinivasan
---
 arch/powerpc/perf/power9-pmu.c | 11 +++
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/perf/power9-pmu.c b/arch/powerpc/perf/power9-pmu.c
index 443511b18bc5..d1782fd644e9 100644
--- a/arch/powerpc/perf/power9-pmu.c
+++ b/arch/powerpc/perf/power9-pmu.c
@@ -93,7 +93,7 @@ static struct attribute_group power9_pmu_events_group = {
 	.attrs = power9_events_attr,
 };

-static const struct attribute_group *power9_pmu_attr_groups[] = {
+static const struct attribute_group *power9_isa207_pmu_attr_groups[] = {
 	&power9_pmu_format_group,
 	&power9_pmu_events_group,
 	NULL,
@@ -260,7 +260,7 @@ static int power9_cache_events[C(MAX)][C(OP_MAX)][C(RESULT_MAX)] = {

 #undef C

-static struct power_pmu power9_pmu = {
+static struct power_pmu power9_isa207_pmu = {
 	.name			= "POWER9",
 	.n_counter		= MAX_PMU_COUNTERS,
 	.add_fields		= ISA207_ADD_FIELDS,
@@ -274,7 +274,7 @@ static struct power_pmu power9_pmu = {
 	.n_generic		= ARRAY_SIZE(power9_generic_events),
 	.generic_events		= power9_generic_events,
 	.cache_events		= &power9_cache_events,
-	.attr_groups		= power9_pmu_attr_groups,
+	.attr_groups		= power9_isa207_pmu_attr_groups,
 	.bhrb_nr		= 32,
 };

@@ -287,7 +287,10 @@ static int __init init_power9_pmu(void)
 	    strcmp(cur_cpu_spec->oprofile_cpu_type, "ppc64/power9"))
 		return -ENODEV;

-	rc = register_power_pmu(&power9_pmu);
+	if (cpu_has_feature(CPU_FTR_POWER9_DD1)) {
+		rc = register_power_pmu(&power9_isa207_pmu);
+	}
+
 	if (rc)
 		return rc;

--
2.7.4
[PATCH 1/4] powerpc/perf: factor out the event format field
Factor out the format field structure for PowerISA v2.07.

Signed-off-by: Madhavan Srinivasan
---
 arch/powerpc/perf/isa207-common.c | 34 ++
 arch/powerpc/perf/power8-pmu.c    | 39 ---
 arch/powerpc/perf/power9-pmu.c    | 39 ---
 3 files changed, 42 insertions(+), 70 deletions(-)

diff --git a/arch/powerpc/perf/isa207-common.c b/arch/powerpc/perf/isa207-common.c
index 6143c99f3ec5..2a2040ea5f99 100644
--- a/arch/powerpc/perf/isa207-common.c
+++ b/arch/powerpc/perf/isa207-common.c
@@ -12,6 +12,40 @@
  */
 #include "isa207-common.h"

+PMU_FORMAT_ATTR(event,		"config:0-49");
+PMU_FORMAT_ATTR(pmcxsel,	"config:0-7");
+PMU_FORMAT_ATTR(mark,		"config:8");
+PMU_FORMAT_ATTR(combine,	"config:11");
+PMU_FORMAT_ATTR(unit,		"config:12-15");
+PMU_FORMAT_ATTR(pmc,		"config:16-19");
+PMU_FORMAT_ATTR(cache_sel,	"config:20-23");
+PMU_FORMAT_ATTR(sample_mode,	"config:24-28");
+PMU_FORMAT_ATTR(thresh_sel,	"config:29-31");
+PMU_FORMAT_ATTR(thresh_stop,	"config:32-35");
+PMU_FORMAT_ATTR(thresh_start,	"config:36-39");
+PMU_FORMAT_ATTR(thresh_cmp,	"config:40-49");
+
+struct attribute *isa207_pmu_format_attr[] = {
+	&format_attr_event.attr,
+	&format_attr_pmcxsel.attr,
+	&format_attr_mark.attr,
+	&format_attr_combine.attr,
+	&format_attr_unit.attr,
+	&format_attr_pmc.attr,
+	&format_attr_cache_sel.attr,
+	&format_attr_sample_mode.attr,
+	&format_attr_thresh_sel.attr,
+	&format_attr_thresh_stop.attr,
+	&format_attr_thresh_start.attr,
+	&format_attr_thresh_cmp.attr,
+	NULL,
+};
+
+struct attribute_group isa207_pmu_format_group = {
+	.name = "format",
+	.attrs = isa207_pmu_format_attr,
+};
+
 static inline bool event_is_fab_match(u64 event)
 {
 	/* Only check pmc, unit and pmcxsel, ignore the edge bit (0) */
diff --git a/arch/powerpc/perf/power8-pmu.c b/arch/powerpc/perf/power8-pmu.c
index ab830d106ec5..d07186382f3a 100644
--- a/arch/powerpc/perf/power8-pmu.c
+++ b/arch/powerpc/perf/power8-pmu.c
@@ -30,6 +30,9 @@ enum {
 #define	POWER8_MMCRA_IFM2		0x8000UL
 #define	POWER8_MMCRA_IFM3		0xC000UL

+/* PowerISA v2.07 format attribute structure */
+extern struct attribute_group isa207_pmu_format_group;
+
 /* Table of alternatives, sorted by column 0 */
 static const unsigned int event_alternatives[][MAX_ALT] = {
 	{ PM_MRK_ST_CMPL,		PM_MRK_ST_CMPL_ALT },
@@ -175,42 +178,8 @@ static struct attribute_group power8_pmu_events_group = {
 	.attrs = power8_events_attr,
 };

-PMU_FORMAT_ATTR(event,		"config:0-49");
-PMU_FORMAT_ATTR(pmcxsel,	"config:0-7");
-PMU_FORMAT_ATTR(mark,		"config:8");
-PMU_FORMAT_ATTR(combine,	"config:11");
-PMU_FORMAT_ATTR(unit,		"config:12-15");
-PMU_FORMAT_ATTR(pmc,		"config:16-19");
-PMU_FORMAT_ATTR(cache_sel,	"config:20-23");
-PMU_FORMAT_ATTR(sample_mode,	"config:24-28");
-PMU_FORMAT_ATTR(thresh_sel,	"config:29-31");
-PMU_FORMAT_ATTR(thresh_stop,	"config:32-35");
-PMU_FORMAT_ATTR(thresh_start,	"config:36-39");
-PMU_FORMAT_ATTR(thresh_cmp,	"config:40-49");
-
-static struct attribute *power8_pmu_format_attr[] = {
-	&format_attr_event.attr,
-	&format_attr_pmcxsel.attr,
-	&format_attr_mark.attr,
-	&format_attr_combine.attr,
-	&format_attr_unit.attr,
-	&format_attr_pmc.attr,
-	&format_attr_cache_sel.attr,
-	&format_attr_sample_mode.attr,
-	&format_attr_thresh_sel.attr,
-	&format_attr_thresh_stop.attr,
-	&format_attr_thresh_start.attr,
-	&format_attr_thresh_cmp.attr,
-	NULL,
-};
-
-static struct attribute_group power8_pmu_format_group = {
-	.name = "format",
-	.attrs = power8_pmu_format_attr,
-};
-
 static const struct attribute_group *power8_pmu_attr_groups[] = {
-	&power8_pmu_format_group,
+	&isa207_pmu_format_group,
 	&power8_pmu_events_group,
 	NULL,
 };
diff --git a/arch/powerpc/perf/power9-pmu.c b/arch/powerpc/perf/power9-pmu.c
index 8e9a81967ff8..443511b18bc5 100644
--- a/arch/powerpc/perf/power9-pmu.c
+++ b/arch/powerpc/perf/power9-pmu.c
@@ -31,6 +31,9 @@ enum {
 #define POWER9_MMCRA_IFM2		0x8000UL
 #define POWER9_MMCRA_IFM3		0xC000UL

+/* PowerISA v2.07 format attribute structure */
+extern struct attribute_group isa207_pmu_format_group;
+
 GENERIC_EVENT_ATTR(cpu-cycles,			PM_CYC);
 GENERIC_EVENT_ATTR(stalled-cycles-frontend,	PM_ICT_NOSLOT_CYC);
 GENERIC_EVENT_ATTR(stalled-cycles-backend,	PM_CMPLU_STALL);
@@ -90,42 +93,8 @@ static struct attribute_group power9_pmu_events_group = {
 	.attrs = power9_events_attr,
 };

-PMU_FORMAT_ATTR(event,		"config:0-49");
-PMU_FORMAT_ATTR(pmcxsel,	"config:0-7");
-PMU_FORMAT_ATTR(mark,		"config:8");
-PMU_FORMAT_ATTR(combine,	"config:11");
-PMU_FORMAT_ATTR(unit,		"config:12-15");
-PMU_FORMAT_ATTR(pmc,
[PATCH 0/4] Support PowerISA v3.0 PMU Raw event format
Patchset to factor out the PowerISA v2.07 PMU raw event format encoding
and add support to the PowerISA v3.0 PMU raw event format encoding.

Madhavan Srinivasan (4):
  powerpc/perf: factor out the event format field
  powerpc/perf: update attribute_group data structure
  powerpc/perf: PowerISA v3.0 raw event format encoding
  powerpc/perf: macros for PowerISA v3.0 format encoding

 arch/powerpc/perf/isa207-common.c | 122 --
 arch/powerpc/perf/isa207-common.h |  27 -
 arch/powerpc/perf/power8-pmu.c    |  39 ++--
 arch/powerpc/perf/power9-pmu.c    | 112 +-
 4 files changed, 255 insertions(+), 45 deletions(-)

--
2.7.4
Re: [RESEND] [PATCH v3] cxl: Prevent adapter reset if an active context exists
On 04/11/16 23:07, Frederic Barrat wrote:
>> When I inject an EEH error, this patch causes the following WARN.
>> Thoughts?
>
> mmm, hard to see a relation with that patch. I couldn't reproduce
> either. Could it bear any relation with the patch you're working on
> (lspci called while the capi device is unconfigured)?

No, this was without any other patches...

[ 60.593116] pci :01 : [PE# 000] Switching PHB to CXL
[ 60.622727] Adapter context unlocked with 0 active contexts
[ 60.622762] [ cut here ]
[ 60.622771] WARNING: CPU: 12 PID: 627 at ../drivers/misc/cxl/main.c:325 cxl_adapter_context_unlock+0x60/0x80 [cxl]
[ 60.622772] Modules linked in: fuse powernv_rng rng_core leds_powernv powernv_op_panel led_class vmx_crypto ib_iser rdma_cm iw_cm ib_cm ib_core libiscsi scsi_transport_iscsi autofs4 btrfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq multipath bnx2x mdio libcrc32c cxl
[ 60.622794] CPU: 12 PID: 627 Comm: eehd Not tainted 4.9.0-rc1-ajd-6-g6fb17cc #4
[ 60.622795] task: c003be084900 task.stack: c003be108000
[ 60.622797] NIP: d4350be0 LR: d4350bdc CTR: c0492fd0
[ 60.622799] REGS: c003be10b660 TRAP: 0700 Not tainted (4.9.0-rc1-ajd-6-g6fb17cc)
[ 60.622800] MSR: 90010282b033
[ 60.622810] CR: 28000282 XER: 2000
[ 60.622811] SOFTE: 1 CFAR: c094fc88
[ 60.622814] GPR00: d4350bdc c003be10b8e0 d4379ae8 002f
[ 60.622818] GPR04: 0001 03b8
[ 60.622822] GPR08: 0001
[ 60.622826] GPR12: cfe03000 c00baac8 c003c5166500
[ 60.622830] GPR16:
[ 60.622834] GPR20: c0b14fe8
[ 60.622837] GPR24: c0b14fc0 c003afc10400 c003b0c4
[ 60.622841] GPR28: c003c505a098 c003afc10400 0006
[ 60.622850] NIP [d4350be0] cxl_adapter_context_unlock+0x60/0x80 [cxl]
[ 60.622856] LR [d4350bdc] cxl_adapter_context_unlock+0x5c/0x80 [cxl]
[ 60.622857] Call Trace:
[ 60.622863] [c003be10b8e0] [d4350bdc] cxl_adapter_context_unlock+0x5c/0x80 [cxl] (unreliable)
[ 60.622871] [c003be10b940] [d435e810] cxl_configure_adapter+0x930/0x960 [cxl]
[ 60.622879] [c003be10b9f0] [d435e88c] cxl_pci_slot_reset+0x4c/0x230 [cxl]
[ 60.622883] [c003be10baa0] [c0032cd4] eeh_report_reset+0x164/0x1a0
[ 60.622887] [c003be10bae0] [c0031220] eeh_pe_dev_traverse+0x90/0x170
[ 60.622890] [c003be10bb70] [c0033354] eeh_handle_normal_event+0x3d4/0x520
[ 60.622892] [c003be10bc20] [c0033624] eeh_handle_event+0x44/0x360
[ 60.622895] [c003be10bcd0] [c0033a58] eeh_event_handler+0x118/0x1d0
[ 60.622898] [c003be10bd80] [c00babc8] kthread+0x108/0x130
[ 60.622902] [c003be10be30] [c000c0a0] ret_from_kernel_thread+0x5c/0xbc
[ 60.622903] Instruction dump:
[ 60.622905] 2f84 4dfe0020 7c0802a6 7c8407b4 3920 f8010010 f821ffa1 91230348
[ 60.622911] 3c62 e8638070 48016639 e8410018 <0fe0> 38210060 e8010010 7c0803a6
[ 60.622918] ---[ end trace d358551c9a007b4f ]---
[ 60.622959] cxl afu0.0: Activating AFU directed mode
[ 60.623097] EEH: Notify device driver to resume

That *definitely* looks related to this patch...

Andrew

--
Andrew Donnellan              OzLabs, ADL Canberra
andrew.donnel...@au1.ibm.com  IBM Australia Limited
Re: [RESEND] [PATCH v3] cxl: Prevent adapter reset if an active context exists
On 05/11/16 00:15, Uma Krishnan wrote:
> Frederic/Andrew,
>
> Just recently this issue has been reported by system test without any of
> the two patches you are suspecting - this patch nor the lspci patch. I was
> hoping the lspci patch from Andrew can possibly solve it.
>
> System test CQ is SW370625. The stack reported in that is same,
>
> [ 5895.245959] EEH: PHB#2 failure detected, location: N/A
> [ 5895.246078] CPU: 19 PID: 121774 Comm: lspci Not tainted 3.10.0-514.el7.ppc64le #1
> [ 5895.246240] Call Trace:
> [ 5895.246307] [c009f3707a60] [c0017ce0] show_stack+0x80/0x330 (unreliable)
> [ 5895.246501] [c009f3707b10] [c09b22f4] dump_stack+0x30/0x44
> [ 5895.246665] [c009f3707b30] [c003b9ac] eeh_dev_check_failure+0x21c/0x580
> [ 5895.246855] [c009f3707bd0] [c00879dc] pnv_pci_read_config+0xbc/0x160
> [ 5895.247045] [c009f3707c10] [c0527d54] pci_user_read_config_dword+0x84/0x160
> [ 5895.247233] [c009f3707c60] [c0547224] pci_read_config+0xf4/0x2e0
> [ 5895.247398] [c009f3707ce0] [c03efb3c] read+0x10c/0x2a0
> [ 5895.247561] [c009f3707da0] [c031d160] vfs_read+0x110/0x290
> [ 5895.247726] [c009f3707de0] [c031ec70] SyS_pread64+0xb0/0xd0

This isn't a WARN - this stack trace is printed explicitly by the EEH code in the case of a PHB failure. arch/powerpc/kernel/eeh.c, line 403.

Andrew

-- 
Andrew Donnellan              OzLabs, ADL Canberra
andrew.donnel...@au1.ibm.com  IBM Australia Limited
Re: [PATCH 00/22] mtd: nand: return error code of nand_scan(_ident,_tail) on error
On Fri, 4 Nov 2016 19:42:48 +0900 Masahiro Yamada wrote:

> nand_scan(), nand_scan_ident(), nand_scan_tail() return
> an appropriate negative value on error.
>
> Most of drivers return the value from them on error,
> but some of them return the fixed error code -ENXIO
> (and a few return -ENODEV).
>
> This series makes those drivers return more precise error codes.

Applied and fixed the bug I found in patch 13.

Thanks,

Boris

>
> Masahiro Yamada (22):
>   mtd: nand: ams-delta: return error code of nand_scan() on error
>   mtd: nand: cmx270: return error code of nand_scan() on error
>   mtd: nand: cs553x: return error code of nand_scan() on error
>   mtd: nand: gpio: return error code of nand_scan() on error
>   mtd: nand: mpc5121: return error code of nand_scan() on error
>   mtd: nand: tmio: return error code of nand_scan() on error
>   mtd: nand: orion: return error code of nand_scan() on error
>   mtd: nand: pasemi: return error code of nand_scan() on error
>   mtd: nand: plat_nand: return error code of nand_scan() on error
>   mtd: nand: atmel: return error code of nand_scan_ident/tail() on error
>   mtd: nand: brcmnand: return error code of nand_scan_ident/tail() on error
>   mtd: nand: fsmc: return error code of nand_scan_ident/tail() on error
>   mtd: nand: lpc32xx: return error code of nand_scan_ident/tail() on error
>   mtd: nand: mediatek: return error code of nand_scan_ident/tail() on error
>   mtd: nand: mxc: return error code of nand_scan_ident/tail() on error
>   mtd: nand: omap2: return error code of nand_scan_ident/tail() on error
>   mtd: nand: vf610: return error code of nand_scan_ident/tail() on error
>   mtd: nand: cafe: return error code of nand_scan_ident() on error
>   mtd: nand: hisi504: return error code of nand_scan_ident() on error
>   mtd: nand: pxa3xx: return error code of nand_scan_ident() on error
>   mtd: nand: nandsim: remove unneeded checks for nand_scan_ident/tail()
>   mtd: nand: socrates: use nand_scan() for nand_scan_ident/tail() combo
>
>  drivers/mtd/nand/ams-delta.c         |  5 ++---
>  drivers/mtd/nand/atmel_nand.c        | 10 --
>  drivers/mtd/nand/brcmnand/brcmnand.c | 10 ++
>  drivers/mtd/nand/cafe_nand.c         |  5 ++---
>  drivers/mtd/nand/cmx270_nand.c       |  4 ++--
>  drivers/mtd/nand/cs553x_nand.c       |  5 ++---
>  drivers/mtd/nand/fsmc_nand.c         |  9 -
>  drivers/mtd/nand/gpio.c              |  5 ++---
>  drivers/mtd/nand/hisi504_nand.c      |  4 +---
>  drivers/mtd/nand/lpc32xx_mlc.c       | 10 --
>  drivers/mtd/nand/lpc32xx_slc.c       |  9 +++--
>  drivers/mtd/nand/mpc5121_nfc.c       |  4 ++--
>  drivers/mtd/nand/mtk_nand.c          |  4 ++--
>  drivers/mtd/nand/mxc_nand.c          | 10 --
>  drivers/mtd/nand/nandsim.c           |  4 
>  drivers/mtd/nand/omap2.c             |  9 -
>  drivers/mtd/nand/orion_nand.c        |  5 ++---
>  drivers/mtd/nand/pasemi_nand.c       |  5 ++---
>  drivers/mtd/nand/plat_nand.c         |  5 ++---
>  drivers/mtd/nand/pxa3xx_nand.c       |  5 +++--
>  drivers/mtd/nand/socrates_nand.c     | 12 ++--
>  drivers/mtd/nand/tmio_nand.c         |  6 +++---
>  drivers/mtd/nand/vf610_nfc.c         | 10 --
>  23 files changed, 62 insertions(+), 93 deletions(-)
>
Re: [PATCH net-next] ibmveth: v1 calculate correct gso_size and set gso_type
On Thu, Nov 3, 2016 at 8:40 AM, Brian King wrote:
> On 10/27/2016 10:26 AM, Eric Dumazet wrote:
>> On Wed, 2016-10-26 at 11:09 +1100, Jon Maxwell wrote:
>>> We recently encountered a bug where a few customers using ibmveth on the
>>> same LPAR hit an issue where a TCP session hung when large receive was
>>> enabled. Closer analysis revealed that the session was stuck because the
>>> one side was advertising a zero window repeatedly.
>>>
>>> We narrowed this down to the fact the ibmveth driver did not set gso_size
>>> which is translated by TCP into the MSS later up the stack. The MSS is
>>> used to calculate the TCP window size and as that was abnormally large,
>>> it was calculating a zero window, even though the socket's receive buffer
>>> was completely empty.
>>>
>>> We were able to reproduce this and worked with IBM to fix this. Thanks Tom
>>> and Marcelo for all your help and review on this.
>>>
>>> The patch fixes both our internal reproduction tests and our customers
>>> tests.
>>>
>>> Signed-off-by: Jon Maxwell
>>> ---
>>>  drivers/net/ethernet/ibm/ibmveth.c | 20 
>>>  1 file changed, 20 insertions(+)
>>>
>>> diff --git a/drivers/net/ethernet/ibm/ibmveth.c b/drivers/net/ethernet/ibm/ibmveth.c
>>> index 29c05d0..c51717e 100644
>>> --- a/drivers/net/ethernet/ibm/ibmveth.c
>>> +++ b/drivers/net/ethernet/ibm/ibmveth.c
>>> @@ -1182,6 +1182,8 @@ static int ibmveth_poll(struct napi_struct *napi, int budget)
>>>  	int frames_processed = 0;
>>>  	unsigned long lpar_rc;
>>>  	struct iphdr *iph;
>>> +	bool large_packet = 0;
>>> +	u16 hdr_len = ETH_HLEN + sizeof(struct tcphdr);
>>>
>>>  restart_poll:
>>>  	while (frames_processed < budget) {
>>> @@ -1236,10 +1238,28 @@ static int ibmveth_poll(struct napi_struct *napi, int budget)
>>>  				iph->check = 0;
>>>  				iph->check = ip_fast_csum((unsigned char *)iph, iph->ihl);
>>>  				adapter->rx_large_packets++;
>>> +				large_packet = 1;
>>>  			}
>>>  		}
>>>  	}
>>>
>>> +	if (skb->len > netdev->mtu) {
>>> +		iph = (struct iphdr *)skb->data;
>>> +		if (be16_to_cpu(skb->protocol) == ETH_P_IP &&
>>> +		    iph->protocol == IPPROTO_TCP) {
>>> +			hdr_len += sizeof(struct iphdr);
>>> +			skb_shinfo(skb)->gso_type = SKB_GSO_TCPV4;
>>> +			skb_shinfo(skb)->gso_size = netdev->mtu - hdr_len;
>>> +		} else if (be16_to_cpu(skb->protocol) == ETH_P_IPV6 &&
>>> +			   iph->protocol == IPPROTO_TCP) {
>>> +			hdr_len += sizeof(struct ipv6hdr);
>>> +			skb_shinfo(skb)->gso_type = SKB_GSO_TCPV6;
>>> +			skb_shinfo(skb)->gso_size = netdev->mtu - hdr_len;
>>> +		}
>>> +		if (!large_packet)
>>> +			adapter->rx_large_packets++;
>>> +	}
>>> +
>>
>> This might break forwarding and PMTU discovery.
>>
>> You force gso_size to device mtu, regardless of real MSS used by the TCP
>> sender.
>>
>> Don't you have the MSS provided in RX descriptor, instead of guessing
>> the value ?
>
> We've had some further discussions on this with the Virtual I/O Server (VIOS)
> development team. The large receive aggregation in the VIOS (AIX based) is actually
> being done by software in the VIOS. What they may be able to do is when performing
> this aggregation, they could look at the packet lengths of all the packets being
> aggregated and take the largest packet size within the aggregation unit, minus the
> header length, and return that to the virtual ethernet client which we could then stuff
> into gso_size. They are currently assessing how feasible this would be to do and whether
> it would impact other bits of the code. However, assuming this does end up being an option,
> would this address the concerns here or is that going to break something else I'm
> not thinking of?

I was discussing this with a colleague, and although this is better than what we have so far, we wonder if there could be a corner case where it ends up with a smaller value than the current MSS. For example, if the application sent a burst of small TCP packets with the PUSH bit set. In that case they may not be coalesced by GRO.
The VIOS could probably be coded to detect that condition and use the previous MSS. But that may not necessarily be the current MSS. The ibmveth driver passes
Re: [PATCH 1/3] powerpc: Emulation support for load/store instructions on LE
On Sunday 06 November 2016 01:01 AM, Anton Blanchard wrote:
> Hi,
>
>> kprobe, uprobe, hw-breakpoint and xmon are the only users of
>> emulate_step.
>>
>> Kprobe / uprobe single-steps instruction if they can't emulate it, so
>> there is no problem with them. As I mention, hw-breakpoint is broken.
>> However I'm not sure about xmon, I need to check that.
>
> I was mostly concerned that it would impact kprobes. Sounds like we are
> ok there.
>
>> So yes, there is no user-visible feature that depends on this.
>
> Aren't hardware breakpoints exposed via perf? I'd call perf
> user-visible.

Thanks Anton, that's a good catch. I tried this on ppc64le:

$ sudo cat /proc/kallsyms | grep pid_max
c116998c D pid_max

$ sudo ./perf record -a --event=mem:0xc116998c sleep 10

Before patch: it does not record any data and throws the warning below.

$ dmesg
[  817.895573] Unable to handle hardware breakpoint. Breakpoint at 0xc116998c will be disabled.
[  817.895581] ------------[ cut here ]------------
[  817.895588] WARNING: CPU: 24 PID: 2032 at arch/powerpc/kernel/hw_breakpoint.c:277 hw_breakpoint_handler+0x124/0x230
...

After patch: it records data properly.

$ sudo ./perf report --stdio
...
# Samples: 36  of event 'mem:0xc116998c'
# Event count (approx.): 36
#
# Overhead  Command        Shared Object     Symbol
# ........  .............  ................  .............
#
    63.89%  kdumpctl       [kernel.vmlinux]  [k] alloc_pid
    27.78%  opal_errd      [kernel.vmlinux]  [k] alloc_pid
     5.56%  kworker/u97:4  [kernel.vmlinux]  [k] alloc_pid
     2.78%  systemd        [kernel.vmlinux]  [k] alloc_pid

-Ravi
Linux 4.9: Reported regressions as of Sunday, 2016-11-06
Hi! Here is my third regression report for Linux 4.9. It lists 17 regressions I'm aware of. 6 of them are new; 3 got fixed since last week's report (a fourth looks fixed as well).

The console problem ("console: don't prefer first registered [...]") got reported to me multiple times, but the revert to finally get this fixed is in -mm already.

As always: Are you aware of any other regressions? Then please let me know (simply CC regressi...@leemhuis.info). And please tell me if there is anything in the report that shouldn't be there.

Ciao, Thorsten

== Current regressions ==

Desc: thinkpad x60: BIOS limit stops working
Repo: 2016-11-05 https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1264916.html
Stat: n/a
Note: WIP

Desc: thinkpad x60: thermal passive cooling can not prevent the system from overheating, when there is no BIOS limit
Repo: 2016-11-05 https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1264916.html
Stat: n/a
Note: WIP

Desc: test failures of sendfile(2) and splice(2)
Repo: 2016-11-01 https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1262400.html
Stat: 2016-11-01 https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1262648.html
Note: WIP, patch available

Desc: amdgpu, topaz: powerplay initialization failed
Repo: 2016-10-31 https://bugzilla.kernel.org/show_bug.cgi?id=185681 https://bugs.freedesktop.org/show_bug.cgi?id=98357
Stat: 2016-11-04 https://bugzilla.kernel.org/show_bug.cgi?id=185681#c7
Note: WIP

Desc: mangled display since -rc1 (two systems: one with intel, one with nvidia gpu)
Repo: 2016-10-31 https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1261699.html
Stat: n/a https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1262493.html
Note: root cause unknown, proper bisect needed (would be good if somebody could help the reporter)

Desc: build regression: make.cross ARCH=mips fails with "No rule to make target 'alchemy/devboards/'."
Repo: 2016-10-30 https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1262410.html https://marc.info/?l=linux-kernel&m=147780880425626
Stat: n/a
Note: nothing happened yet; BTW: should build regressions be on this list at all?

Desc: tpm0: TPM self test failed & can't request region for resource
Repo: 2016-10-28 https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1259943.html https://bugzilla.kernel.org/show_bug.cgi?id=185631
Stat: 2016-11-03 https://www.mail-archive.com/tpmdd-devel@lists.sourceforge.net/msg02010.html
Note: partly fixed by https://git.kernel.org/torvalds/c/befd99656c5eb765fe9d96045c4cba099fd938db , but it seems more fixes are needed (and available!)

Desc: boot failure of Intel Mobile Internet Devices due to a change in the PCI subsystem that appeared in v4.9-rc1
Repo: 2016-10-23 https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1255643.html
Stat: 2016-10-26 https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1258579.html
Note: poked list, as it looks like the proposed fix got forgotten

Desc: Radeon Oops on shutdown / panic on shutdown in routine radeon_connector_unregister()
Repo: 2016-10-19 https://bugzilla.kernel.org/show_bug.cgi?id=178421 https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1261699.html
Stat: 2016-10-30 https://bugzilla.kernel.org/show_bug.cgi?id=178421#c6
Note: patch available

Desc: "console: don't prefer first registered if DT specifies stdout-path" breaks console on video outputs of various ARM boards; breaks some ppc machines as well
Repo: 2016-10-18 https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1264523.html https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1253391.html https://www.linux-mips.org/archives/linux-mips/2016-10/msg00176.html
Stat: 2016-11-06 https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1265059.html https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1264422.html
Note: revert discussed and also in -mm; side note: this seems to be a regression that annoys quite a lot of people

Desc: unable to handle kernel NULL pointer dereference at fuse_setattr
Repo: 2016-10-17 https://bugzilla.kernel.org/show_bug.cgi?id=177801
Stat: 2016-10-18 https://bugzilla.kernel.org/show_bug.cgi?id=177801#c5
Note: poked Miklos, as the fix is not yet upstream afaics

Desc: Skylake gen6 suspend/resume video regression
Repo: 2016-10-16 https://bugzilla.kernel.org/show_bug.cgi?id=177731 https://bugs.freedesktop.org/show_bug.cgi?id=98517
Stat: 2016-10-25 https://bugzilla.kernel.org/show_bug.cgi?id=177731#c3
Note: WIP

Desc: warning in intel_dp_aux_transfer: CPU: 0 PID: 4 at drivers/gpu/drm/i915/intel_dp.c:1062 intel_dp_aux_transfer+0x1ed/0x230
Repo: 2016-10-16 https://bugzilla.kernel.org/show_bug.cgi?id=177701
Stat: 2016-10-27 https://bugs.freedesktop.org/show_bug.cgi?id=97344
Note: poked Janni a week ago to give a status update, but didn't hear anything yet

Desc: module loading broken due to kbuild changes
Repo: 2016-10-15
Re: Linux 4.9: Reported regressions as of Sunday, 2016-10-30
Lo!

On 01.11.2016 09:18, Paul Bolle wrote:
> On Sun, 2016-10-30 at 14:20 +0100, Thorsten Leemhuis wrote:
>> As always: Are you aware of any other regressions? Then please let me
>> know (simply CC regressi...@leemhuis.info).
>
> Do build regressions count?

That's a good question.

> Because I was trying to fix an obscure build issue in arch/mips, choose
> a random configuration that should hit that issue, and promptly ran into
> https://lkml.kernel.org/r/<201610301405.k82kqqw0%25fengguang...@intel.com>
> The same configuration does build under v4.8, I tested that of course.

I'd say it's a practical problem that users run into, and hence it's a regression. Sure, in this case it hits only those who compile kernels themselves; but those are users too, and we don't want to scare them away with things that suddenly stop working. IOW: I'll include it in this week's report.

Ciao, Thorsten