Re: [PATCH] perf test: Update event group check for support of uncore event
On 07-Dec-22 10:28 PM, Athira Rajeev wrote: > Event group test checks group creation for combinations of > hw, sw and uncore PMU events. Some of the uncore pmus may > require additional permission to access the counters. > For example, in case of hv_24x7, partition need to have > permissions to access hv_24x7 pmu counters. If not, event_open > will fail. Hence add a sanity check to see if event_open > succeeds before proceeding with the test. > > Fixes: b20d9215a35f ("perf test: Add event group test for events in multiple > PMUs") > Signed-off-by: Athira Rajeev Acked-by: Ravi Bangoria Thanks, Ravi
Re: [PATCH v2] perf: Rewrite core context handling
On 14-Oct-22 3:26 PM, Ravi Bangoria wrote: > On 13-Oct-22 4:29 PM, Peter Zijlstra wrote: >> On Thu, Oct 13, 2022 at 03:37:23PM +0530, Ravi Bangoria wrote: >> >>>> - refcount_t refcount; >>>> + refcount_t refcount; /* event <-> ctx */ >>> >>> Ok. We need to remove all those // XXX get/put_ctx() from code >>> which we added to make refcount a pmu_ctx <-> ctx. >> >> Them already gone :-) I've not yet fixed up the typoes, but current >> version should be here: >> >> >> https://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git/log/?h=perf/core >> >> Thanks! > > I've been running perf-fuzzer on Xeon machine since yesterday and I don't see > any > issue. Will do the same on my AMD machine as well over the weekend. Only one WARN_ON() hit. Otherwise all good. https://lore.kernel.org/lkml/8d91528b-e830-6ad0-8a92-621ce9f94...@amd.com Thanks, Ravi
Re: [PATCH v2] perf: Rewrite core context handling
On 13-Oct-22 4:29 PM, Peter Zijlstra wrote: > On Thu, Oct 13, 2022 at 03:37:23PM +0530, Ravi Bangoria wrote: > >>> - refcount_t refcount; >>> + refcount_t refcount; /* event <-> ctx */ >> >> Ok. We need to remove all those // XXX get/put_ctx() from code >> which we added to make refcount a pmu_ctx <-> ctx. > > Them already gone :-) I've not yet fixed up the typoes, but current > version should be here: > > > https://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git/log/?h=perf/core > > Thanks! I've been running perf-fuzzer on Xeon machine since yesterday and I don't see any issue. Will do the same on my AMD machine as well over the weekend. Thanks, Ravi
Re: [PATCH v2] perf: Rewrite core context handling
On 13-Oct-22 2:17 AM, Peter Zijlstra wrote: > On Wed, Oct 12, 2022 at 02:16:29PM +0200, Peter Zijlstra wrote: > >> That's the intent yeah. But due to not always holding ctx->mutex over >> put_pmu_ctx() this might be moot. I'm almost through auditing epc usage >> and I think ctx->lock is sufficient, fingers crossed. > > So the very last epc usage threw a spanner into the works and made > things complicated. > > Specifically sys_perf_event_open()'s group_leader case uses > event->pmu_ctx while only holding ctx->mutex. Therefore we can't fully > let go of ctx->mutex locking and purely rely on ctx->lock. > > Now the good news is that the annoying put_pmu_ctx() without holding > ctx->mutex case doesn't actually matter here. Since we hold a reference > on the group_leader (per the filedesc) the event can't go away, > therefore it must have a pmu_ctx, and then holding ctx->mutex ensures > the pmu_ctx is stable -- iow it serializes against > sys_perf_event_open()'s move_group and perf_pmu_migrate_context() > changing the epc around. > > So we're going with the normal mutex+lock for modification rule, but > allow the weird put_pmu_ctx() exception. > > I have the below delta. > > I'm hoping we can call this done -- I'm going to see if I can bribe Mark > to take a look at the arm64 thing soon and then hopefully queue the > whole thing once -rc1 happens. That should give us a good long soak > until the next merge window. Sounds good. Thanks for all the help! I've glanced through the changes and they looks fine, below are few minor points. > + * Specificially, sys_perf_event_open()'s group_leader case depends on > + * ctx->mutex pinning the configuration. Since we hold a reference on > + * group_leader (through the filedesc) it can't fo away, therefore it's typo: can't go away > - refcount_t refcount; > + refcount_t refcount; /* event <-> ctx */ Ok. We need to remove all those // XXX get/put_ctx() from code which we added to make refcount a pmu_ctx <-> ctx. > +#define double_list_for_each_entry(pos1, pos2, head1, head2, member) \ > + for (pos1 = list_first_entry(head1, typeof(*pos1), member), \ > + pos2 = list_first_entry(head2, typeof(*pos2), member); \ > + !list_entry_is_head(pos1, head1, member) &&\ > + !list_entry_is_head(pos2, head2, member); \ > + pos1 = list_next_entry(pos1, member), \ > + pos2 = list_next_entry(pos2, member)) > + > static void perf_event_swap_task_ctx_data(struct perf_event_context > *prev_ctx, > struct perf_event_context *next_ctx) While this is unrelated to this patch, shouldn't we also need to swap event->hw.target? A purely hypothetical scenario: Consider two processes having clone contexts (for example, two children of the same parent). While process switch between these two, the perf event context would get swapped but event->hw.target will point to other sibling's task_struct. If any one process exit just after single context swap, _free_event() will call put_task_context() on sibling process' task_struct. > @@ -12436,6 +12463,9 @@ SYSCALL_DEFINE5(perf_event_open, >* Allow the addition of software events to hw >* groups, this is safe because software events >* never fail to schedule. > + * > + * Note the comment that goes with struct > + * pmu_event_pmu_context. typo: perf_event_pmu_context The good (or bad? ;)) news is, perf test and Vince's perf_event_tests are running fine without any regression on my machine. Thanks, Ravi
Re: [PATCH v2] perf: Rewrite core context handling
On 11-Oct-22 11:17 PM, Peter Zijlstra wrote: > On Tue, Oct 11, 2022 at 04:02:56PM +0200, Peter Zijlstra wrote: >> On Tue, Oct 11, 2022 at 06:49:55PM +0530, Ravi Bangoria wrote: >>> On 11-Oct-22 4:59 PM, Peter Zijlstra wrote: >>>> On Sat, Oct 08, 2022 at 11:54:24AM +0530, Ravi Bangoria wrote: >>>> >>>>> +static void perf_event_swap_task_ctx_data(struct perf_event_context >>>>> *prev_ctx, >>>>> + struct perf_event_context *next_ctx) >>>>> +{ >>>>> + struct perf_event_pmu_context *prev_epc, *next_epc; >>>>> + >>>>> + if (!prev_ctx->nr_task_data) >>>>> + return; >>>>> + >>>>> + prev_epc = list_first_entry(_ctx->pmu_ctx_list, >>>>> + struct perf_event_pmu_context, >>>>> + pmu_ctx_entry); >>>>> + next_epc = list_first_entry(_ctx->pmu_ctx_list, >>>>> + struct perf_event_pmu_context, >>>>> + pmu_ctx_entry); >>>>> + >>>>> + while (_epc->pmu_ctx_entry != _ctx->pmu_ctx_list && >>>>> +_epc->pmu_ctx_entry != _ctx->pmu_ctx_list) { >>>>> + >>>>> + WARN_ON_ONCE(prev_epc->pmu != next_epc->pmu); >>>>> + >>>>> + /* >>>>> + * PMU specific parts of task perf context can require >>>>> + * additional synchronization. As an example of such >>>>> + * synchronization see implementation details of Intel >>>>> + * LBR call stack data profiling; >>>>> + */ >>>>> + if (prev_epc->pmu->swap_task_ctx) >>>>> + prev_epc->pmu->swap_task_ctx(prev_epc, next_epc); >>>>> + else >>>>> + swap(prev_epc->task_ctx_data, next_epc->task_ctx_data); >>>> >>>> Did I forget to advance the iterators here? >>> >>> Yeah. Seems so. I overlooked it too. >> >> OK; so I'm not slowly going crazy staring at this code ;-) Let me go add >> it now then. :-) >> >> But first I gotta taxi the kids around for a bit, bbl. > > OK, so I've been going over the perf_event_pmu_context life-time thing > as well, there were a bunch of XXXs there and I'm not sure Im happy with > things, but I'd also forgotten most of it. > > Ideally epc works like it's a regular member of ctx -- locking wise that > is, but I'm not sure we can make that stick -- see the ctx->mutex issues > we have with put_ctx(). > > As such, I'm going to have to re-audit all the epc usage to see if > pure ctx->lock is sufficient. > > I did do make epc RCU freed, because pretty much everything is and that > was easy enough to make happen -- it means we don't need to worry about > that. > > But I'm going cross-eyes from staring at this all day, so more tomorrow. > The below is what I currently have. > > --- > --- a/include/linux/perf_event.h > +++ b/include/linux/perf_event.h > @@ -833,13 +833,13 @@ struct perf_event { > * `[1:n]-' `-[n:1]-> pmu <-[1:n]-' > * > * > - * XXX destroy epc when empty > - * refcount, !rcu > + * epc lifetime is refcount based and RCU freed (similar to > perf_event_context). > + * epc locking is as if it were a member of perf_event_context; specifically: > * > - * XXX epc locking > + * modification, both: ctx->mutex && ctx->lock > + * reading, either: ctx->mutex || ctx->lock > * > - * event->pmu_ctxctx->mutex && inactive > - * ctx->pmu_ctx_list ctx->mutex && ctx->lock > + * XXX except this isn't true ... see put_pmu_ctx(). > * > */ > struct perf_event_pmu_context { > @@ -857,6 +857,7 @@ struct perf_event_pmu_context { > unsigned intnr_events; > > atomic_trefcount; /* event <-> epc */ > + struct rcu_head rcu_head; > > void*task_ctx_data; /* pmu specific data */ > /* > --- a/kernel/events/core.c > +++ b/kernel/events/core.c > @@ -1727,6 +1727,10 @@ perf_event_groups_next(struct perf_event > return NULL; > } > > +#define perf_event_groups_for_cpu_pmu(event, groups, cpu, pmu) > \ > + for (event = perf_event_groups_first(groups, cpu,
Re: [PATCH v2] perf: Rewrite core context handling
On 11-Oct-22 4:59 PM, Peter Zijlstra wrote: > On Sat, Oct 08, 2022 at 11:54:24AM +0530, Ravi Bangoria wrote: > >> +static void perf_event_swap_task_ctx_data(struct perf_event_context >> *prev_ctx, >> + struct perf_event_context *next_ctx) >> +{ >> +struct perf_event_pmu_context *prev_epc, *next_epc; >> + >> +if (!prev_ctx->nr_task_data) >> +return; >> + >> +prev_epc = list_first_entry(_ctx->pmu_ctx_list, >> +struct perf_event_pmu_context, >> +pmu_ctx_entry); >> +next_epc = list_first_entry(_ctx->pmu_ctx_list, >> +struct perf_event_pmu_context, >> +pmu_ctx_entry); >> + >> +while (_epc->pmu_ctx_entry != _ctx->pmu_ctx_list && >> + _epc->pmu_ctx_entry != _ctx->pmu_ctx_list) { >> + >> +WARN_ON_ONCE(prev_epc->pmu != next_epc->pmu); >> + >> +/* >> + * PMU specific parts of task perf context can require >> + * additional synchronization. As an example of such >> + * synchronization see implementation details of Intel >> + * LBR call stack data profiling; >> + */ >> +if (prev_epc->pmu->swap_task_ctx) >> +prev_epc->pmu->swap_task_ctx(prev_epc, next_epc); >> +else >> +swap(prev_epc->task_ctx_data, next_epc->task_ctx_data); > > Did I forget to advance the iterators here? Yeah. Seems so. I overlooked it too. Thanks, Ravi
Re: [PATCH] perf: Rewrite core context handling
On 10-Oct-22 3:53 PM, Peter Zijlstra wrote: > On Tue, Sep 06, 2022 at 11:20:53AM +0530, Ravi Bangoria wrote: > >> This one was simple enough so I prepared a patch for this. Let >> me know if you see any issues with below diff. > > I've extraed this as a separate patch since it's not strictly required > for correctness and the patch is a quite large enough. Sure. I'll keep it separate. Thanks, Ravi
Re: [PATCH] perf: Rewrite core context handling
On 10-Oct-22 3:44 PM, Peter Zijlstra wrote: > On Wed, Sep 07, 2022 at 04:58:49PM +0530, Ravi Bangoria wrote: >>> -static void >>> -ctx_flexible_sched_in(struct perf_event_context *ctx, >>> - struct perf_cpu_context *cpuctx) >>> +/* XXX .busy thingy from Peter's patch */ >>> +static void ctx_flexible_sched_in(struct perf_event_context *ctx, struct >>> pmu *pmu) >> >> This one turned out to be very easy. Given that, we iterate over each >> pmu, we can just return error if we fail to schedule any flexible event. >> (It wouldn't be straight forward like this if we needed to implement >> pmu=NULL optimization.) >> >> --- >> diff --git a/kernel/events/core.c b/kernel/events/core.c >> index e0232e0bb74e..923656af73fe 100644 >> --- a/kernel/events/core.c >> +++ b/kernel/events/core.c >> @@ -3751,6 +3751,7 @@ static int merge_sched_in(struct perf_event *event, >> void *data) >> cpc = >> this_cpu_ptr(event->pmu_ctx->pmu->cpu_pmu_context); >> perf_mux_hrtimer_restart(cpc); >> group_update_userpage(event); >> +return -EBUSY; >> } >> } >> > > I'm afraid this breaks things; consider: > > f79256532682 ("perf/core: fix userpage->time_enabled of inactive events") > > I totally hate this -- because it means we *HAVE* to iterate the > inactive events, but alas. Sure. Will drop this. Thanks, Ravi
Re: [PATCH] perf: Rewrite core context handling
> -static void > -ctx_flexible_sched_in(struct perf_event_context *ctx, > - struct perf_cpu_context *cpuctx) > +/* XXX .busy thingy from Peter's patch */ > +static void ctx_flexible_sched_in(struct perf_event_context *ctx, struct pmu > *pmu) This one turned out to be very easy. Given that, we iterate over each pmu, we can just return error if we fail to schedule any flexible event. (It wouldn't be straight forward like this if we needed to implement pmu=NULL optimization.) --- diff --git a/kernel/events/core.c b/kernel/events/core.c index e0232e0bb74e..923656af73fe 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -3751,6 +3751,7 @@ static int merge_sched_in(struct perf_event *event, void *data) cpc = this_cpu_ptr(event->pmu_ctx->pmu->cpu_pmu_context); perf_mux_hrtimer_restart(cpc); group_update_userpage(event); + return -EBUSY; } } @@ -3776,7 +3777,6 @@ static void ctx_pinned_sched_in(struct perf_event_context *ctx, struct pmu *pmu) } } -/* XXX .busy thingy from Peter's patch */ static void ctx_flexible_sched_in(struct perf_event_context *ctx, struct pmu *pmu) { struct perf_event_pmu_context *pmu_ctx; --- Thanks, Ravi
Re: [PATCH] perf: Rewrite core context handling
> @@ -9752,10 +9889,13 @@ void perf_tp_event(u16 event_type, u64 count, void > *record, int entry_size, > struct trace_entry *entry = record; > > rcu_read_lock(); > - ctx = rcu_dereference(task->perf_event_ctxp[perf_sw_context]); > + ctx = rcu_dereference(task->perf_event_ctxp); > if (!ctx) > goto unlock; > > + // XXX iterate groups instead, we should be able to > + // find the subtree for the perf_tracepoint pmu and CPU. > + > list_for_each_entry_rcu(event, >event_list, event_entry) { > if (event->cpu != smp_processor_id()) > continue; This one was simple enough so I prepared a patch for this. Let me know if you see any issues with below diff. --- diff --git a/kernel/events/core.c b/kernel/events/core.c index 820c56c66b26..e0232e0bb74e 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -9807,6 +9807,44 @@ static struct pmu perf_swevent = { #ifdef CONFIG_EVENT_TRACING +static void tp_perf_event_destroy(struct perf_event *event) +{ + perf_trace_destroy(event); +} + +static int perf_tp_event_init(struct perf_event *event) +{ + int err; + + if (event->attr.type != PERF_TYPE_TRACEPOINT) + return -ENOENT; + + /* +* no branch sampling for tracepoint events +*/ + if (has_branch_stack(event)) + return -EOPNOTSUPP; + + err = perf_trace_init(event); + if (err) + return err; + + event->destroy = tp_perf_event_destroy; + + return 0; +} + +static struct pmu perf_tracepoint = { + .task_ctx_nr= perf_sw_context, + + .event_init = perf_tp_event_init, + .add= perf_trace_add, + .del= perf_trace_del, + .start = perf_swevent_start, + .stop = perf_swevent_stop, + .read = perf_swevent_read, +}; + static int perf_tp_filter_match(struct perf_event *event, struct perf_sample_data *data) { @@ -9856,6 +9894,49 @@ void perf_trace_run_bpf_submit(void *raw_data, int size, int rctx, } EXPORT_SYMBOL_GPL(perf_trace_run_bpf_submit); + +static void __perf_tp_event_target_task(u64 count, void *record, + struct pt_regs *regs, + struct perf_sample_data *data, + struct perf_event *event) +{ + struct trace_entry *entry = record; + + if (event->attr.config != entry->type) + return; + /* Cannot deliver synchronous signal to other task. */ + if (event->attr.sigtrap) + return; + if (perf_tp_event_match(event, data, regs)) + perf_swevent_event(event, count, data, regs); +} + +static void perf_tp_event_target_task(u64 count, void *record, + struct pt_regs *regs, + struct perf_sample_data *data, + struct perf_event_context *ctx) +{ + struct perf_event *event, *sibling; + + event = perf_event_groups_first(>pinned_groups, smp_processor_id(), + _tracepoint, NULL); + for (; event; event = perf_event_groups_next(event, _tracepoint)) { + __perf_tp_event_target_task(count, record, regs, data, event); + for_each_sibling_event(sibling, event) { + __perf_tp_event_target_task(count, record, regs, data, sibling); + } + } + + event = perf_event_groups_first(>flexible_groups, smp_processor_id(), + _tracepoint, NULL); + for (; event; event = perf_event_groups_next(event, _tracepoint)) { + __perf_tp_event_target_task(count, record, regs, data, event); + for_each_sibling_event(sibling, event) { + __perf_tp_event_target_task(count, record, regs, data, sibling); + } + } +} + void perf_tp_event(u16 event_type, u64 count, void *record, int entry_size, struct pt_regs *regs, struct hlist_head *head, int rctx, struct task_struct *task) @@ -9886,29 +9967,15 @@ void perf_tp_event(u16 event_type, u64 count, void *record, int entry_size, */ if (task && task != current) { struct perf_event_context *ctx; - struct trace_entry *entry = record; rcu_read_lock(); ctx = rcu_dereference(task->perf_event_ctxp); if (!ctx) goto unlock; - // XXX iterate groups instead, we should be able to - // find the subtree for the perf_tracepoint pmu and CPU. - - list_for_each_entry_rcu(event, >event_list, event_entry) { -
Re: [PATCH] perf: Rewrite core context handling
> So the basic issue I mentioned is that: > > > /* > * ,[1:n]-. > * V V > * perf_event_context <-[1:n]-> perf_event_pmu_context <--- perf_event > * ^ ^ | | > * `[1:n]-' `-[n:1]-> pmu <-[1:n]-' > * > * > * XXX destroy epc when empty > * refcount, !rcu > * > * XXX epc locking > * > * event->pmu_ctxctx->mutex && inactive > * ctx->pmu_ctx_list ctx->mutex && ctx->lock > * > */ > struct perf_event_pmu_context { > ... > atomic_trefcount; /* event <-> epc */ > ... > } > > But that then also suggests that: > > struct perf_event_context { > ... > refcount_t refcount; > ... > } > > should be: ctx <-> epc, and that is not so, the ctx::refcount still > counts the number of events. > > Now this might not be bad, but it is confusing. I don't have strong opinion, but we store events in ctx, not in pmu_ctx. So, I think it makes sense to keep refcount as ctx <-> event? [...] > +void perf_pmu_migrate_context(struct pmu *pmu, int src_cpu, int dst_cpu) > +{ > + struct perf_event_context *src_ctx, *dst_ctx; > + LIST_HEAD(events); > + > + src_ctx = _cpu_ptr(_context, src_cpu)->ctx; > + dst_ctx = _cpu_ptr(_context, dst_cpu)->ctx; > + > + /* > + * See perf_event_ctx_lock() for comments on the details > + * of swizzling perf_event::ctx. > + */ > + mutex_lock_double(_ctx->mutex, _ctx->mutex); > + > + __perf_pmu_remove(src_ctx, src_cpu, pmu, _src->pinned_groups, > ); > + __perf_pmu_remove(src_ctx, src_cpu, pmu, _src->flexible_groups, > ); Rbtrees does not contain sibling events. Shouldn't we continue using event_list here? Thanks, Ravi
Re: [PATCH] perf: Rewrite core context handling
On 29-Aug-22 8:10 PM, Peter Zijlstra wrote: > On Mon, Aug 29, 2022 at 02:04:33PM +0200, Peter Zijlstra wrote: >> On Mon, Aug 29, 2022 at 05:03:47PM +0530, Ravi Bangoria wrote: >>> @@ -12598,6 +12590,7 @@ EXPORT_SYMBOL_GPL(perf_event_create_kernel_counter); >>> >>> void perf_pmu_migrate_context(struct pmu *pmu, int src_cpu, int dst_cpu) >>> { >>> +#if 0 // XXX buggered - cpu hotplug, who cares >>> struct perf_event_context *src_ctx; >>> struct perf_event_context *dst_ctx; >>> struct perf_event *event, *tmp; >>> @@ -12658,6 +12651,7 @@ void perf_pmu_migrate_context(struct pmu *pmu, int >>> src_cpu, int dst_cpu) >>> } >>> mutex_unlock(_ctx->mutex); >>> mutex_unlock(_ctx->mutex); >>> +#endif >>> } >>> EXPORT_SYMBOL_GPL(perf_pmu_migrate_context); >>> >> >> Note to self; fix this :-) I'll see if I have time for that later today. > > Urgh, while going through that it appears the whole refcounting thing > isn't fully done either. Not sure I follow. Can you please elaborate. Thanks, Ravi
[PATCH] perf: Rewrite core context handling
From: Peter Zijlstra There have been various issues and limitations with the way perf uses (task) contexts to track events. Most notable is the single hardware PMU task context, which has resulted in a number of yucky things (both proposed and merged). Notably: - HW breakpoint PMU - ARM big.little PMU / Intel ADL PMU - Intel Branch Monitoring PMU - AMD IBS PMU - S390 cpum_cf PMU - PowerPC trace_imc PMU *Current design:* Currently we have a per task and per cpu perf_event_contexts: task_struct::perf_events_ctxp[] <-> perf_event_context <-> perf_cpu_context ^ |^ | ^ `-'| `--> pmu ---' v ^ perf_event --' Each task has an array of pointers to a perf_event_context. Each perf_event_context has a direct relation to a PMU and a group of events for that PMU. The task related perf_event_context's have a pointer back to that task. Each PMU has a per-cpu pointer to a per-cpu perf_cpu_context, which includes a perf_event_context, which again has a direct relation to that PMU, and a group of events for that PMU. The perf_cpu_context also tracks which task context is currently associated with that CPU and includes a few other things like the hrtimer for rotation etc. Each perf_event is then associated with its PMU and one perf_event_context. *Proposed design:* New design proposed by this patch reduce to a single task context and a single CPU context but adds some intermediate data-structures: task_struct::perf_event_ctxp -> perf_event_context <- perf_cpu_context ^ | ^ ^ `---' | | | |perf_cpu_pmu_context <--. | `.^ | | || | | vv | | ,--> perf_event_pmu_context | | || | || v v| perf_event ---> pmu ' With new design, perf_event_context will hold all pmu events in the respective(pinned/flexible) rbtrees. This can be achieved by adding pmu to rbtree key: {cpu, pmu, cgroup, group_index} Each perf_event_context carry a list of perf_event_pmu_context which is used to hold per-pmu-per-context state. For ex, it keeps track of currently active events for that pmu, a pmu specific task_ctx_data, a flag to tell whether rotation is required or not etc. Similarly, perf_cpu_pmu_context is used to hold per-pmu-per-cpu state like hrtimer details to drive the event rotation, a pointer to perf_event_pmu_context of currently running task and some other ancillary information. Each perf_event is associated to it's pmu, perf_event_context and perf_event_pmu_context. *Tests:* With this, perf test and perf_event_test runs fine without any issues on Intel and AMD machines. However, one WARN_ON() was reported on Intel machine[1] with perf fuzzer. Note that, so far I've run these tests to verify stability of the change only, not functional correctness. Also, some of the known issues marked with XXX needs to be resolved. Further optimizations to current implementation are possible. For ex, ctx_resched() can be optimized to reschedule only single pmu events. ctx_sched_in() can be optimized to use pmu=NULL while scheduling entire context. However, this will lead to significant changes to rbtree and minheap layouts. [1]: https://lore.kernel.org/r/8d91528b-e830-6ad0-8a92-621ce9f944ca%40amd.com Signed-off-by: Peter Zijlstra Signed-off-by: Ravi Bangoria --- This is a 3rd version of perf event context rework and it's quite stable now, so I thought to remove RFC tag. Previous versions: RFC v2: https://lore.kernel.org/lkml/20220113134743.1292-1-ravi.bango...@amd.com RFC v1: https://lore.kernel.org/lkml/20181010104559.go5...@hirez.programming.kicks-ass.net RFC v2 -> PATCH: - Rebased to v6.0-rc2 - Fixed locking issues reported by lockdep - Fixed kernel crash caused by stale cpc->task_epc - Resolved few bugs and warnings reported while running perf test, perf_event_test and perf fuzzer - Removed cgrp_cpuctx_list. No need to maintain list as we have single perf_event_context for each cpu - Removed x86_pmu_update_cpu_context() quirk - Modify pmu->filter callback (including s/filter_match/filter/). arch/arm64/kernel/perf_event.c | 24 +- arch/powerpc/perf/core-book3s.c|8 +- arch/s390/kernel/perf_pai_crypto.c |2 +- arch/x86/events/amd/brs.c |2 +- ar
Re: [PATCH 4/4] bpf powerpc: Add addr > TASK_SIZE_MAX explicit check
@@ -763,6 +771,14 @@ int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, struct codegen_context * /* dst = *(u16 *)(ul) (src + off) */ case BPF_LDX | BPF_MEM | BPF_H: case BPF_LDX | BPF_PROBE_MEM | BPF_H: + if (BPF_MODE(code) == BPF_PROBE_MEM) { + EMIT(PPC_RAW_ADDI(b2p[TMP_REG_1], src_reg, off)); + PPC_LI64(b2p[TMP_REG_2], TASK_SIZE_MAX); + EMIT(PPC_RAW_CMPLD(b2p[TMP_REG_1], b2p[TMP_REG_2])); + PPC_BCC(COND_GT, (ctx->idx + 4) * 4); + EMIT(PPC_RAW_XOR(dst_reg, dst_reg, dst_reg)); + PPC_JMP((ctx->idx + 2) * 4); + } That code seems strictly identical to the previous one and the next one. Can you refactor in a function ? I'll check this. EMIT(PPC_RAW_LHZ(dst_reg, src_reg, off)); if (insn_is_zext([i + 1])) addrs[++i] = ctx->idx * 4; @@ -773,6 +789,14 @@ int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, struct codegen_context * /* dst = *(u32 *)(ul) (src + off) */ case BPF_LDX | BPF_MEM | BPF_W: case BPF_LDX | BPF_PROBE_MEM | BPF_W: + if (BPF_MODE(code) == BPF_PROBE_MEM) { + EMIT(PPC_RAW_ADDI(b2p[TMP_REG_1], src_reg, off)); + PPC_LI64(b2p[TMP_REG_2], TASK_SIZE_MAX); + EMIT(PPC_RAW_CMPLD(b2p[TMP_REG_1], b2p[TMP_REG_2])); + PPC_BCC(COND_GT, (ctx->idx + 4) * 4); + EMIT(PPC_RAW_XOR(dst_reg, dst_reg, dst_reg)); + PPC_JMP((ctx->idx + 2) * 4); + } EMIT(PPC_RAW_LWZ(dst_reg, src_reg, off)); if (insn_is_zext([i + 1])) addrs[++i] = ctx->idx * 4; @@ -783,6 +807,20 @@ int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, struct codegen_context * /* dst = *(u64 *)(ul) (src + off) */ case BPF_LDX | BPF_MEM | BPF_DW: case BPF_LDX | BPF_PROBE_MEM | BPF_DW: + if (BPF_MODE(code) == BPF_PROBE_MEM) { + EMIT(PPC_RAW_ADDI(b2p[TMP_REG_1], src_reg, off)); + PPC_LI64(b2p[TMP_REG_2], TASK_SIZE_MAX); + EMIT(PPC_RAW_CMPLD(b2p[TMP_REG_1], b2p[TMP_REG_2])); + if (off % 4) That test is worth a comment. (off % 4) test is based on how PPC_BPF_LL() emits instruction. And I'd prefer if (off & 3) { PPC_BCC(COND_GT, (ctx->idx + 5) * 4); EMIT(PPC_RAW_XOR(dst_reg, dst_reg, dst_reg)); PPC_JMP((ctx->idx + 3) * 4); } else { PPC_BCC(COND_GT, (ctx->idx + 4) * 4); EMIT(PPC_RAW_XOR(dst_reg, dst_reg, dst_reg)); PPC_JMP((ctx->idx + 2) * 4); } Yes this is neat. Thanks for the review, Ravi
Re: [PATCH 3/4] bpf powerpc: Add BPF_PROBE_MEM support for 64bit JIT
On 7/6/21 3:23 PM, Christophe Leroy wrote: Le 06/07/2021 à 09:32, Ravi Bangoria a écrit : BPF load instruction with BPF_PROBE_MEM mode can cause a fault inside kernel. Append exception table for such instructions within BPF program. Can you do the same for 32bit ? Sure. But before that, do you think the approach is fine(including patch #4)? Because it's little bit different from what other archs do. [...] @@ -89,6 +89,8 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp) { u32 proglen; u32 alloclen; + u32 extable_len = 0; + u32 fixup_len = 0; Setting those to 0 doesn't seem to be needed, as it doesn't seem to exist any path to skip the setting below. You should not perform unnecessary init at declaration as it is error prone. Ok. [...] @@ -234,7 +247,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp) fp->bpf_func = (void *)image; fp->jited = 1; - fp->jited_len = alloclen; + fp->jited_len = proglen + FUNCTION_DESCR_SIZE; bpf_flush_icache(bpf_hdr, (u8 *)bpf_hdr + (bpf_hdr->pages * PAGE_SIZE)); if (!fp->is_func || extra_pass) { This hunk does not apply on latest powerpc tree. You are missing commit 62e3d4210ac9c Ok. I prepared this on a bpf/master. Will rebase it to powerpc/next. [...] +static int add_extable_entry(struct bpf_prog *fp, u32 *image, int pass, + u32 code, struct codegen_context *ctx, int dst_reg) +{ + off_t offset; + unsigned long pc; + struct exception_table_entry *ex; + u32 *fixup; + + /* Populate extable entries only in the last pass */ + if (pass != 2 || BPF_MODE(code) != BPF_PROBE_MEM) 'code' is only used for that test, can you do the test before calling add_extable_entry() ? Ok. + return 0; + + if (!fp->aux->extable || + WARN_ON_ONCE(ctx->exentry_idx >= fp->aux->num_exentries)) + return -EINVAL; + + pc = (unsigned long)[ctx->idx - 1]; You should call this function before incrementing ctx->idx Ok. + + fixup = (void *)fp->aux->extable - + (fp->aux->num_exentries * BPF_FIXUP_LEN) + + (ctx->exentry_idx * BPF_FIXUP_LEN); + + fixup[0] = PPC_RAW_XOR(dst_reg, dst_reg, dst_reg); Prefered way to clear a reg in according to ISA is to do 'li reg, 0' Sure I'll use 'li reg, 0' But can you point me to where in ISA this is mentioned? + fixup[1] = (PPC_INST_BRANCH | + (((long)(pc + 4) - (long)[1]) & 0x03fc)); Would be nice if we could have a PPC_RAW_BRANCH() stuff, we could do something like PPC_RAW_BRANCH((long)(pc + 4) - (long)[1]) Ok. [...] @@ -710,25 +752,41 @@ int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, struct codegen_context * */ /* dst = *(u8 *)(ul) (src + off) */ case BPF_LDX | BPF_MEM | BPF_B: + case BPF_LDX | BPF_PROBE_MEM | BPF_B: Could do: + case BPF_LDX | BPF_PROBE_MEM | BPF_B: + ret = add_extable_entry(fp, image, pass, code, ctx, dst_reg); + if (ret) + return ret; case BPF_LDX | BPF_MEM | BPF_B: Yes this is neat. Thanks for the review. Ravi
[PATCH 4/4] bpf powerpc: Add addr > TASK_SIZE_MAX explicit check
On PowerPC with KUAP enabled, any kernel code which wants to access userspace needs to be surrounded by disable-enable KUAP. But that is not happening for BPF_PROBE_MEM load instruction. So, when BPF program tries to access invalid userspace address, page-fault handler considers it as bad KUAP fault: Kernel attempted to read user page (d000) - exploit attempt? (uid: 0) Considering the fact that PTR_TO_BTF_ID (which uses BPF_PROBE_MEM mode) could either be a valid kernel pointer or NULL but should never be a pointer to userspace address, execute BPF_PROBE_MEM load only if addr > TASK_SIZE_MAX, otherwise set dst_reg=0 and move on. This will catch NULL, valid or invalid userspace pointers. Only bad kernel pointer will be handled by BPF exception table. [Alexei suggested for x86] Suggested-by: Alexei Starovoitov Signed-off-by: Ravi Bangoria --- arch/powerpc/net/bpf_jit_comp64.c | 38 +++ 1 file changed, 38 insertions(+) diff --git a/arch/powerpc/net/bpf_jit_comp64.c b/arch/powerpc/net/bpf_jit_comp64.c index 1884c6dca89a..46becae76210 100644 --- a/arch/powerpc/net/bpf_jit_comp64.c +++ b/arch/powerpc/net/bpf_jit_comp64.c @@ -753,6 +753,14 @@ int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, struct codegen_context * /* dst = *(u8 *)(ul) (src + off) */ case BPF_LDX | BPF_MEM | BPF_B: case BPF_LDX | BPF_PROBE_MEM | BPF_B: + if (BPF_MODE(code) == BPF_PROBE_MEM) { + EMIT(PPC_RAW_ADDI(b2p[TMP_REG_1], src_reg, off)); + PPC_LI64(b2p[TMP_REG_2], TASK_SIZE_MAX); + EMIT(PPC_RAW_CMPLD(b2p[TMP_REG_1], b2p[TMP_REG_2])); + PPC_BCC(COND_GT, (ctx->idx + 4) * 4); + EMIT(PPC_RAW_XOR(dst_reg, dst_reg, dst_reg)); + PPC_JMP((ctx->idx + 2) * 4); + } EMIT(PPC_RAW_LBZ(dst_reg, src_reg, off)); if (insn_is_zext([i + 1])) addrs[++i] = ctx->idx * 4; @@ -763,6 +771,14 @@ int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, struct codegen_context * /* dst = *(u16 *)(ul) (src + off) */ case BPF_LDX | BPF_MEM | BPF_H: case BPF_LDX | BPF_PROBE_MEM | BPF_H: + if (BPF_MODE(code) == BPF_PROBE_MEM) { + EMIT(PPC_RAW_ADDI(b2p[TMP_REG_1], src_reg, off)); + PPC_LI64(b2p[TMP_REG_2], TASK_SIZE_MAX); + EMIT(PPC_RAW_CMPLD(b2p[TMP_REG_1], b2p[TMP_REG_2])); + PPC_BCC(COND_GT, (ctx->idx + 4) * 4); + EMIT(PPC_RAW_XOR(dst_reg, dst_reg, dst_reg)); + PPC_JMP((ctx->idx + 2) * 4); + } EMIT(PPC_RAW_LHZ(dst_reg, src_reg, off)); if (insn_is_zext([i + 1])) addrs[++i] = ctx->idx * 4; @@ -773,6 +789,14 @@ int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, struct codegen_context * /* dst = *(u32 *)(ul) (src + off) */ case BPF_LDX | BPF_MEM | BPF_W: case BPF_LDX | BPF_PROBE_MEM | BPF_W: + if (BPF_MODE(code) == BPF_PROBE_MEM) { + EMIT(PPC_RAW_ADDI(b2p[TMP_REG_1], src_reg, off)); + PPC_LI64(b2p[TMP_REG_2], TASK_SIZE_MAX); + EMIT(PPC_RAW_CMPLD(b2p[TMP_REG_1], b2p[TMP_REG_2])); + PPC_BCC(COND_GT, (ctx->idx + 4) * 4); + EMIT(PPC_RAW_XOR(dst_reg, dst_reg, dst_reg)); + PPC_JMP((ctx->idx + 2) * 4); + } EMIT(PPC_RAW_LWZ(dst_reg, src_reg, off)); if (insn_is_zext([i + 1])) addrs[++i] = ctx->idx * 4; @@ -783,6 +807,20 @@ int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, struct codegen_context * /* dst = *(u64 *)(ul) (src + off) */ case BPF_LDX | BPF_MEM | BPF_DW: case BPF_LDX | BPF_PROBE_MEM | BPF_DW: + if (BPF_MODE(code) == BPF_PROBE_MEM) { + EMIT(PPC_RAW_ADDI(b2p[TMP_REG_1], src_reg, off)); + PPC_LI64(b2p[TMP_REG_2], TASK_SIZE_MAX); + EMIT(PPC_RAW_CMPLD(b2p[TMP_REG_1], b2p[TMP_REG_2])); + if (off % 4) + PPC_BCC(COND_GT, (ctx->idx + 5) * 4); + else + PPC_BCC(COND_GT, (ctx->idx + 4) * 4); + EMIT(P
[PATCH 3/4] bpf powerpc: Add BPF_PROBE_MEM support for 64bit JIT
BPF load instruction with BPF_PROBE_MEM mode can cause a fault inside kernel. Append exception table for such instructions within BPF program. Unlike other archs which uses extable 'fixup' field to pass dest_reg and nip, BPF exception table on PowerPC follows the generic PowerPC exception table design, where it populates both fixup and extable sections witin BPF program. fixup section contains two instructions, first instruction clears dest_reg and 2nd jumps to next instruction in the BPF code. extable 'insn' field contains relative offset of the instruction and 'fixup' field contains relative offset of the fixup entry. Example layout of BPF program with extable present: +--+ | | | | 0x4020 -->| ld r27,4(r3) | | | | | 0x40ac -->| lwz r3,0(r4)| | | | | |--| 0x4280 -->| xor r27,r27,r27 | \ fixup entry | b 0x4024 | / 0x4288 -->| xor r3,r3,r3 | | b 0x40b0 | |--| 0x4290 -->| insn=0xfd90 | \ extable entry | fixup=0xffec | / 0x4298 -->| insn=0xfe14 | | fixup=0xffec | +--+ (Addresses shown here are chosen random, not real) Signed-off-by: Ravi Bangoria --- arch/powerpc/net/bpf_jit.h| 5 ++- arch/powerpc/net/bpf_jit_comp.c | 25 + arch/powerpc/net/bpf_jit_comp32.c | 2 +- arch/powerpc/net/bpf_jit_comp64.c | 60 ++- 4 files changed, 83 insertions(+), 9 deletions(-) diff --git a/arch/powerpc/net/bpf_jit.h b/arch/powerpc/net/bpf_jit.h index 411c63d945c7..e9408ad190d3 100644 --- a/arch/powerpc/net/bpf_jit.h +++ b/arch/powerpc/net/bpf_jit.h @@ -141,8 +141,11 @@ struct codegen_context { unsigned int idx; unsigned int stack_size; int b2p[ARRAY_SIZE(b2p)]; + unsigned int exentry_idx; }; +#define BPF_FIXUP_LEN 8 /* Two instructions */ + static inline void bpf_flush_icache(void *start, void *end) { smp_wmb(); /* smp write barrier */ @@ -166,7 +169,7 @@ static inline void bpf_clear_seen_register(struct codegen_context *ctx, int i) void bpf_jit_emit_func_call_rel(u32 *image, struct codegen_context *ctx, u64 func); int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, struct codegen_context *ctx, - u32 *addrs); + u32 *addrs, int pass); void bpf_jit_build_prologue(u32 *image, struct codegen_context *ctx); void bpf_jit_build_epilogue(u32 *image, struct codegen_context *ctx); void bpf_jit_realloc_regs(struct codegen_context *ctx); diff --git a/arch/powerpc/net/bpf_jit_comp.c b/arch/powerpc/net/bpf_jit_comp.c index a9585e52a88d..3ebd8897cf09 100644 --- a/arch/powerpc/net/bpf_jit_comp.c +++ b/arch/powerpc/net/bpf_jit_comp.c @@ -89,6 +89,8 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp) { u32 proglen; u32 alloclen; + u32 extable_len = 0; + u32 fixup_len = 0; u8 *image = NULL; u32 *code_base; u32 *addrs; @@ -131,7 +133,6 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp) image = jit_data->image; bpf_hdr = jit_data->header; proglen = jit_data->proglen; - alloclen = proglen + FUNCTION_DESCR_SIZE; extra_pass = true; goto skip_init_ctx; } @@ -149,7 +150,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp) cgctx.stack_size = round_up(fp->aux->stack_depth, 16); /* Scouting faux-generate pass 0 */ - if (bpf_jit_build_body(fp, 0, , addrs)) { + if (bpf_jit_build_body(fp, 0, , addrs, 0)) { /* We hit something illegal or unsupported. */ fp = org_fp; goto out_addrs; @@ -162,7 +163,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp) */ if (cgctx.seen & SEEN_TAILCALL) { cgctx.idx = 0; - if (bpf_jit_build_body(fp, 0, , addrs)) { + if (bpf_jit_build_body(fp, 0, , addrs, 0)) { fp = org_fp; goto out_addrs; } @@ -177,8 +178,11 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp) bpf_jit_build_prologue(0, ); bpf_jit_build_epilogue(0, ); + fixup_len = fp->aux->num_exentries * BPF_FIXUP_LEN; + extable_len = fp->aux->num_exentries * sizeof(struct exception_table_entry); + proglen = cgctx.idx * 4; - alloclen = proglen + FUNCTION_DESCR_SIZE; + alloclen = proglen + FUNCTION_DESCR_SIZE + fixup_len + extable_len; bpf_hdr = bpf_jit_binary_alloc(alloclen, , 4, bpf_jit_fill_ill_
[PATCH 2/4] bpf powerpc: Remove extra_pass from bpf_jit_build_body()
In case of extra_pass, we always skips usual JIT passes. Thus extra_pass is always false while calling bpf_jit_build_body() and thus it can be removed. Signed-off-by: Ravi Bangoria --- arch/powerpc/net/bpf_jit.h| 2 +- arch/powerpc/net/bpf_jit_comp.c | 6 +++--- arch/powerpc/net/bpf_jit_comp32.c | 4 ++-- arch/powerpc/net/bpf_jit_comp64.c | 4 ++-- 4 files changed, 8 insertions(+), 8 deletions(-) diff --git a/arch/powerpc/net/bpf_jit.h b/arch/powerpc/net/bpf_jit.h index d6267e93027a..411c63d945c7 100644 --- a/arch/powerpc/net/bpf_jit.h +++ b/arch/powerpc/net/bpf_jit.h @@ -166,7 +166,7 @@ static inline void bpf_clear_seen_register(struct codegen_context *ctx, int i) void bpf_jit_emit_func_call_rel(u32 *image, struct codegen_context *ctx, u64 func); int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, struct codegen_context *ctx, - u32 *addrs, bool extra_pass); + u32 *addrs); void bpf_jit_build_prologue(u32 *image, struct codegen_context *ctx); void bpf_jit_build_epilogue(u32 *image, struct codegen_context *ctx); void bpf_jit_realloc_regs(struct codegen_context *ctx); diff --git a/arch/powerpc/net/bpf_jit_comp.c b/arch/powerpc/net/bpf_jit_comp.c index 798ac4350a82..a9585e52a88d 100644 --- a/arch/powerpc/net/bpf_jit_comp.c +++ b/arch/powerpc/net/bpf_jit_comp.c @@ -149,7 +149,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp) cgctx.stack_size = round_up(fp->aux->stack_depth, 16); /* Scouting faux-generate pass 0 */ - if (bpf_jit_build_body(fp, 0, , addrs, false)) { + if (bpf_jit_build_body(fp, 0, , addrs)) { /* We hit something illegal or unsupported. */ fp = org_fp; goto out_addrs; @@ -162,7 +162,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp) */ if (cgctx.seen & SEEN_TAILCALL) { cgctx.idx = 0; - if (bpf_jit_build_body(fp, 0, , addrs, false)) { + if (bpf_jit_build_body(fp, 0, , addrs)) { fp = org_fp; goto out_addrs; } @@ -210,7 +210,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp) /* Now build the prologue, body code & epilogue for real. */ cgctx.idx = 0; bpf_jit_build_prologue(code_base, ); - bpf_jit_build_body(fp, code_base, , addrs, extra_pass); + bpf_jit_build_body(fp, code_base, , addrs); bpf_jit_build_epilogue(code_base, ); if (bpf_jit_enable > 1) diff --git a/arch/powerpc/net/bpf_jit_comp32.c b/arch/powerpc/net/bpf_jit_comp32.c index bbb16099e8c7..1f81bea35aab 100644 --- a/arch/powerpc/net/bpf_jit_comp32.c +++ b/arch/powerpc/net/bpf_jit_comp32.c @@ -266,7 +266,7 @@ static void bpf_jit_emit_tail_call(u32 *image, struct codegen_context *ctx, u32 /* Assemble the body code between the prologue & epilogue */ int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, struct codegen_context *ctx, - u32 *addrs, bool extra_pass) + u32 *addrs) { const struct bpf_insn *insn = fp->insnsi; int flen = fp->len; @@ -846,7 +846,7 @@ int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, struct codegen_context * case BPF_JMP | BPF_CALL: ctx->seen |= SEEN_FUNC; - ret = bpf_jit_get_func_addr(fp, [i], extra_pass, + ret = bpf_jit_get_func_addr(fp, [i], false, _addr, _addr_fixed); if (ret < 0) return ret; diff --git a/arch/powerpc/net/bpf_jit_comp64.c b/arch/powerpc/net/bpf_jit_comp64.c index 57a8c1153851..984177d9d394 100644 --- a/arch/powerpc/net/bpf_jit_comp64.c +++ b/arch/powerpc/net/bpf_jit_comp64.c @@ -272,7 +272,7 @@ static void bpf_jit_emit_tail_call(u32 *image, struct codegen_context *ctx, u32 /* Assemble the body code between the prologue & epilogue */ int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, struct codegen_context *ctx, - u32 *addrs, bool extra_pass) + u32 *addrs) { const struct bpf_insn *insn = fp->insnsi; int flen = fp->len; @@ -763,7 +763,7 @@ int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, struct codegen_context * case BPF_JMP | BPF_CALL: ctx->seen |= SEEN_FUNC; - ret = bpf_jit_get_func_addr(fp, [i], extra_pass, + ret = bpf_jit_get_func_addr(fp, [i], false, _addr, _addr_fixed); if (ret < 0) return ret; -- 2.26.3
[PATCH 0/4] bpf powerpc: Add BPF_PROBE_MEM support for 64bit JIT
Patch #1, #2 are simple cleanup patches. Patch #3 adds BPF_PROBE_MEM support with PowerPC 64bit JIT compiler. Patch #4 adds explicit addr > TASK_SIZE_MAX check to handle bad userspace pointers. Ravi Bangoria (4): bpf powerpc: Remove unused SEEN_STACK bpf powerpc: Remove extra_pass from bpf_jit_build_body() bpf powerpc: Add BPF_PROBE_MEM support for 64bit JIT bpf powerpc: Add addr > TASK_SIZE_MAX explicit check arch/powerpc/net/bpf_jit.h| 8 ++- arch/powerpc/net/bpf_jit_comp.c | 25 ++-- arch/powerpc/net/bpf_jit_comp32.c | 4 +- arch/powerpc/net/bpf_jit_comp64.c | 100 +- 4 files changed, 124 insertions(+), 13 deletions(-) -- 2.26.3
[PATCH 1/4] bpf powerpc: Remove unused SEEN_STACK
SEEN_STACK is unused on PowerPC. Remove it. Also, have SEEN_TAILCALL use 0x4000. Signed-off-by: Ravi Bangoria --- arch/powerpc/net/bpf_jit.h | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/arch/powerpc/net/bpf_jit.h b/arch/powerpc/net/bpf_jit.h index 99fad093f43e..d6267e93027a 100644 --- a/arch/powerpc/net/bpf_jit.h +++ b/arch/powerpc/net/bpf_jit.h @@ -116,8 +116,7 @@ static inline bool is_nearbranch(int offset) #define COND_LE(CR0_GT | COND_CMP_FALSE) #define SEEN_FUNC 0x2000 /* might call external helpers */ -#define SEEN_STACK 0x4000 /* uses BPF stack */ -#define SEEN_TAILCALL 0x8000 /* uses tail calls */ +#define SEEN_TAILCALL 0x4000 /* uses tail calls */ #define SEEN_VREG_MASK 0x1ff8 /* Volatile registers r3-r12 */ #define SEEN_NVREG_MASK0x0003 /* Non volatile registers r14-r31 */ -- 2.26.3
[PATCH v3 4/4] powerpc/selftests: Add selftest to test concurrent perf/ptrace events
ptrace and perf watchpoints can't co-exists if their address range overlaps. See commit 29da4f91c0c1 ("powerpc/watchpoint: Don't allow concurrent perf and ptrace events") for more detail. Add selftest for the same. Sample o/p: # ./ptrace-perf-hwbreak test: ptrace-perf-hwbreak tags: git_version:powerpc-5.8-7-118-g937fa174a15d-dirty perf cpu event -> ptrace thread event (Overlapping): Ok perf cpu event -> ptrace thread event (Non-overlapping): Ok perf thread event -> ptrace same thread event (Overlapping): Ok perf thread event -> ptrace same thread event (Non-overlapping): Ok perf thread event -> ptrace other thread event: Ok ptrace thread event -> perf kernel event: Ok ptrace thread event -> perf same thread event (Overlapping): Ok ptrace thread event -> perf same thread event (Non-overlapping): Ok ptrace thread event -> perf other thread event: Ok ptrace thread event -> perf cpu event (Overlapping): Ok ptrace thread event -> perf cpu event (Non-overlapping): Ok ptrace thread event -> perf same thread & cpu event (Overlapping): Ok ptrace thread event -> perf same thread & cpu event (Non-overlapping): Ok ptrace thread event -> perf other thread & cpu event: Ok success: ptrace-perf-hwbreak Signed-off-by: Ravi Bangoria --- .../selftests/powerpc/ptrace/.gitignore | 1 + .../testing/selftests/powerpc/ptrace/Makefile | 2 +- .../powerpc/ptrace/ptrace-perf-hwbreak.c | 659 ++ 3 files changed, 661 insertions(+), 1 deletion(-) create mode 100644 tools/testing/selftests/powerpc/ptrace/ptrace-perf-hwbreak.c diff --git a/tools/testing/selftests/powerpc/ptrace/.gitignore b/tools/testing/selftests/powerpc/ptrace/.gitignore index 0e96150b7c7e..eb75e5360e31 100644 --- a/tools/testing/selftests/powerpc/ptrace/.gitignore +++ b/tools/testing/selftests/powerpc/ptrace/.gitignore @@ -14,3 +14,4 @@ perf-hwbreak core-pkey ptrace-pkey ptrace-syscall +ptrace-perf-hwbreak diff --git a/tools/testing/selftests/powerpc/ptrace/Makefile b/tools/testing/selftests/powerpc/ptrace/Makefile index 8d3f006c98cc..a500639da97a 100644 --- a/tools/testing/selftests/powerpc/ptrace/Makefile +++ b/tools/testing/selftests/powerpc/ptrace/Makefile @@ -2,7 +2,7 @@ TEST_GEN_PROGS := ptrace-gpr ptrace-tm-gpr ptrace-tm-spd-gpr \ ptrace-tar ptrace-tm-tar ptrace-tm-spd-tar ptrace-vsx ptrace-tm-vsx \ ptrace-tm-spd-vsx ptrace-tm-spr ptrace-hwbreak ptrace-pkey core-pkey \ - perf-hwbreak ptrace-syscall + perf-hwbreak ptrace-syscall ptrace-perf-hwbreak top_srcdir = ../../../../.. include ../../lib.mk diff --git a/tools/testing/selftests/powerpc/ptrace/ptrace-perf-hwbreak.c b/tools/testing/selftests/powerpc/ptrace/ptrace-perf-hwbreak.c new file mode 100644 index ..fad14f926b98 --- /dev/null +++ b/tools/testing/selftests/powerpc/ptrace/ptrace-perf-hwbreak.c @@ -0,0 +1,659 @@ +// SPDX-License-Identifier: GPL-2.0+ +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include "ptrace.h" + +char data[16]; + +/* Overlapping address range */ +volatile __u64 *ptrace_data1 = (__u64 *)[0]; +volatile __u64 *perf_data1 = (__u64 *)[4]; + +/* Non-overlapping address range */ +volatile __u64 *ptrace_data2 = (__u64 *)[0]; +volatile __u64 *perf_data2 = (__u64 *)[8]; + +static unsigned long pid_max_addr(void) +{ + FILE *fp; + char *line, *c; + char addr[100]; + size_t len = 0; + + fp = fopen("/proc/kallsyms", "r"); + if (!fp) { + printf("Failed to read /proc/kallsyms. Exiting..\n"); + exit(EXIT_FAILURE); + } + + while (getline(, , fp) != -1) { + if (!strstr(line, "pid_max") || strstr(line, "pid_max_max") || + strstr(line, "pid_max_min")) + continue; + + strncpy(addr, line, len < 100 ? len : 100); + c = strchr(addr, ' '); + *c = '\0'; + return strtoul(addr, , 16); + } + fclose(fp); + printf("Could not find pix_max. Exiting..\n"); + exit(EXIT_FAILURE); + return -1; +} + +static void perf_user_event_attr_set(struct perf_event_attr *attr, __u64 addr, __u64 len) +{ + memset(attr, 0, sizeof(struct perf_event_attr)); + attr->type = PERF_TYPE_BREAKPOINT; + attr->size = sizeof(struct perf_event_attr); + attr->bp_type= HW_BREAKPOINT_R; + attr->bp_addr= addr; + attr->bp_len = len; + attr->exclude_kernel = 1; + attr->exclude_hv = 1; +} + +static void perf_kernel_event_attr_set(struct perf_event_attr *attr) +{ + memset(attr, 0, sizeof(struct perf_event_attr)); + attr->type = PERF_
[PATCH v3 3/4] powerpc/selftests/perf-hwbreak: Add testcases for 2nd DAWR
Extend perf-hwbreak.c selftest to test multiple DAWRs. Also add testcase for testing 512 byte boundary removal. Sample o/p: # ./perf-hwbreak ... TESTED: Process specific, Two events, diff addr TESTED: Process specific, Two events, same addr TESTED: Process specific, Two events, diff addr, one is RO, other is WO TESTED: Process specific, Two events, same addr, one is RO, other is WO TESTED: Systemwide, Two events, diff addr TESTED: Systemwide, Two events, same addr TESTED: Systemwide, Two events, diff addr, one is RO, other is WO TESTED: Systemwide, Two events, same addr, one is RO, other is WO TESTED: Process specific, 512 bytes, unaligned success: perf_hwbreak Signed-off-by: Ravi Bangoria --- .../selftests/powerpc/ptrace/perf-hwbreak.c | 552 +- 1 file changed, 551 insertions(+), 1 deletion(-) diff --git a/tools/testing/selftests/powerpc/ptrace/perf-hwbreak.c b/tools/testing/selftests/powerpc/ptrace/perf-hwbreak.c index b1d5a14d7722..419506636ebe 100644 --- a/tools/testing/selftests/powerpc/ptrace/perf-hwbreak.c +++ b/tools/testing/selftests/powerpc/ptrace/perf-hwbreak.c @@ -21,8 +21,13 @@ #include #include #include +#include #include #include +#include +#include +#include +#include #include #include #include @@ -34,6 +39,12 @@ #define DAWR_LENGTH_MAX ((0x3f + 1) * 8) +int nprocs; + +static volatile int a = 10; +static volatile int b = 10; +static volatile char c[512 + 8] __attribute__((aligned(512))); + static void perf_event_attr_set(struct perf_event_attr *attr, __u32 type, __u64 addr, __u64 len, bool exclude_user) @@ -68,6 +79,76 @@ static int perf_process_event_open(__u32 type, __u64 addr, __u64 len) return syscall(__NR_perf_event_open, , getpid(), -1, -1, 0); } +static int perf_cpu_event_open(long cpu, __u32 type, __u64 addr, __u64 len) +{ + struct perf_event_attr attr; + + perf_event_attr_set(, type, addr, len, 0); + return syscall(__NR_perf_event_open, , -1, cpu, -1, 0); +} + +static void close_fds(int *fd, int n) +{ + int i; + + for (i = 0; i < n; i++) + close(fd[i]); +} + +static unsigned long read_fds(int *fd, int n) +{ + int i; + unsigned long c = 0; + unsigned long count = 0; + size_t res; + + for (i = 0; i < n; i++) { + res = read(fd[i], , sizeof(c)); + assert(res == sizeof(unsigned long long)); + count += c; + } + return count; +} + +static void reset_fds(int *fd, int n) +{ + int i; + + for (i = 0; i < n; i++) + ioctl(fd[i], PERF_EVENT_IOC_RESET); +} + +static void enable_fds(int *fd, int n) +{ + int i; + + for (i = 0; i < n; i++) + ioctl(fd[i], PERF_EVENT_IOC_ENABLE); +} + +static void disable_fds(int *fd, int n) +{ + int i; + + for (i = 0; i < n; i++) + ioctl(fd[i], PERF_EVENT_IOC_DISABLE); +} + +static int perf_systemwide_event_open(int *fd, __u32 type, __u64 addr, __u64 len) +{ + int i = 0; + + /* Assume online processors are 0 to nprocs for simplisity */ + for (i = 0; i < nprocs; i++) { + fd[i] = perf_cpu_event_open(i, type, addr, len); + if (fd[i] < 0) { + close_fds(fd, i); + return fd[i]; + } + } + return 0; +} + static inline bool breakpoint_test(int len) { int fd; @@ -262,11 +343,467 @@ static int runtest_dar_outside(void) return fail; } +static void multi_dawr_workload(void) +{ + a += 10; + b += 10; + c[512 + 1] += 'a'; +} + +static int test_process_multi_diff_addr(void) +{ + unsigned long long breaks1 = 0, breaks2 = 0; + int fd1, fd2; + char *desc = "Process specific, Two events, diff addr"; + size_t res; + + fd1 = perf_process_event_open(HW_BREAKPOINT_RW, (__u64), (__u64)sizeof(a)); + if (fd1 < 0) { + perror("perf_process_event_open"); + exit(EXIT_FAILURE); + } + + fd2 = perf_process_event_open(HW_BREAKPOINT_RW, (__u64), (__u64)sizeof(b)); + if (fd2 < 0) { + close(fd1); + perror("perf_process_event_open"); + exit(EXIT_FAILURE); + } + + ioctl(fd1, PERF_EVENT_IOC_RESET); + ioctl(fd2, PERF_EVENT_IOC_RESET); + ioctl(fd1, PERF_EVENT_IOC_ENABLE); + ioctl(fd2, PERF_EVENT_IOC_ENABLE); + multi_dawr_workload(); + ioctl(fd1, PERF_EVENT_IOC_DISABLE); + ioctl(fd2, PERF_EVENT_IOC_DISABLE); + + res = read(fd1, , sizeof(breaks1)); + assert(res == sizeof(unsigned long long)); + res = read(fd2, , sizeof(breaks2)); + assert(res == sizeof(unsigned long long)); + + close(fd1); + close(fd2); + + if (breaks1 != 2 || breaks2 != 2
[PATCH v3 2/4] powerpc/selftests/perf-hwbreak: Coalesce event creation code
perf-hwbreak selftest opens hw-breakpoint event at multiple places for which it has same code repeated. Coalesce that code into a function. Signed-off-by: Ravi Bangoria --- .../selftests/powerpc/ptrace/perf-hwbreak.c | 79 +-- 1 file changed, 39 insertions(+), 40 deletions(-) diff --git a/tools/testing/selftests/powerpc/ptrace/perf-hwbreak.c b/tools/testing/selftests/powerpc/ptrace/perf-hwbreak.c index c1f324afdbf3..b1d5a14d7722 100644 --- a/tools/testing/selftests/powerpc/ptrace/perf-hwbreak.c +++ b/tools/testing/selftests/powerpc/ptrace/perf-hwbreak.c @@ -34,28 +34,46 @@ #define DAWR_LENGTH_MAX ((0x3f + 1) * 8) -static inline int sys_perf_event_open(struct perf_event_attr *attr, pid_t pid, - int cpu, int group_fd, - unsigned long flags) +static void perf_event_attr_set(struct perf_event_attr *attr, + __u32 type, __u64 addr, __u64 len, + bool exclude_user) { - attr->size = sizeof(*attr); - return syscall(__NR_perf_event_open, attr, pid, cpu, group_fd, flags); + memset(attr, 0, sizeof(struct perf_event_attr)); + attr->type = PERF_TYPE_BREAKPOINT; + attr->size = sizeof(struct perf_event_attr); + attr->bp_type= type; + attr->bp_addr= addr; + attr->bp_len = len; + attr->exclude_kernel = 1; + attr->exclude_hv = 1; + attr->exclude_guest = 1; + attr->exclude_user = exclude_user; + attr->disabled = 1; } -static inline bool breakpoint_test(int len) +static int +perf_process_event_open_exclude_user(__u32 type, __u64 addr, __u64 len, bool exclude_user) { struct perf_event_attr attr; + + perf_event_attr_set(, type, addr, len, exclude_user); + return syscall(__NR_perf_event_open, , getpid(), -1, -1, 0); +} + +static int perf_process_event_open(__u32 type, __u64 addr, __u64 len) +{ + struct perf_event_attr attr; + + perf_event_attr_set(, type, addr, len, 0); + return syscall(__NR_perf_event_open, , getpid(), -1, -1, 0); +} + +static inline bool breakpoint_test(int len) +{ int fd; - /* setup counters */ - memset(, 0, sizeof(attr)); - attr.disabled = 1; - attr.type = PERF_TYPE_BREAKPOINT; - attr.bp_type = HW_BREAKPOINT_R; /* bp_addr can point anywhere but needs to be aligned */ - attr.bp_addr = (__u64)() & 0xf800; - attr.bp_len = len; - fd = sys_perf_event_open(, 0, -1, -1, 0); + fd = perf_process_event_open(HW_BREAKPOINT_R, (__u64)() & 0xf800, len); if (fd < 0) return false; close(fd); @@ -75,7 +93,6 @@ static inline bool dawr_supported(void) static int runtestsingle(int readwriteflag, int exclude_user, int arraytest) { int i,j; - struct perf_event_attr attr; size_t res; unsigned long long breaks, needed; int readint; @@ -85,6 +102,7 @@ static int runtestsingle(int readwriteflag, int exclude_user, int arraytest) int break_fd; int loop_num = MAX_LOOPS - (rand() % 100); /* provide some variability */ volatile int *k; + __u64 len; /* align to 0x400 boundary as required by DAWR */ readintalign = (int *)(((unsigned long)readintarraybig + 0x7ff) & @@ -94,19 +112,11 @@ static int runtestsingle(int readwriteflag, int exclude_user, int arraytest) if (arraytest) ptr = [0]; - /* setup counters */ - memset(, 0, sizeof(attr)); - attr.disabled = 1; - attr.type = PERF_TYPE_BREAKPOINT; - attr.bp_type = readwriteflag; - attr.bp_addr = (__u64)ptr; - attr.bp_len = sizeof(int); - if (arraytest) - attr.bp_len = DAWR_LENGTH_MAX; - attr.exclude_user = exclude_user; - break_fd = sys_perf_event_open(, 0, -1, -1, 0); + len = arraytest ? DAWR_LENGTH_MAX : sizeof(int); + break_fd = perf_process_event_open_exclude_user(readwriteflag, (__u64)ptr, + len, exclude_user); if (break_fd < 0) { - perror("sys_perf_event_open"); + perror("perf_process_event_open_exclude_user"); exit(1); } @@ -153,7 +163,6 @@ static int runtest_dar_outside(void) void *target; volatile __u16 temp16; volatile __u64 temp64; - struct perf_event_attr attr; int break_fd; unsigned long long breaks; int fail = 0; @@ -165,21 +174,11 @@ static int runtest_dar_outside(void) exit(EXIT_FAILURE); } - /* setup counters */ - memset(, 0, sizeof(attr)); - attr.disabled = 1; - attr.type = PERF_TYPE_BREAKPOINT; - attr.exclude_
[PATCH v3 1/4] powerpc/selftests/ptrace-hwbreak: Add testcases for 2nd DAWR
Add selftests to test multiple active DAWRs with ptrace interface. Sample o/p: $ ./ptrace-hwbreak ... PPC_PTRACE_SETHWDEBUG 2, MODE_RANGE, DW ALIGNED, WO, len: 6: Ok PPC_PTRACE_SETHWDEBUG 2, MODE_RANGE, DW UNALIGNED, RO, len: 6: Ok PPC_PTRACE_SETHWDEBUG 2, MODE_RANGE, DAWR Overlap, WO, len: 6: Ok PPC_PTRACE_SETHWDEBUG 2, MODE_RANGE, DAWR Overlap, RO, len: 6: Ok Signed-off-by: Ravi Bangoria Reviewed-by: Daniel Axtens --- .../selftests/powerpc/ptrace/ptrace-hwbreak.c | 79 +++ 1 file changed, 79 insertions(+) diff --git a/tools/testing/selftests/powerpc/ptrace/ptrace-hwbreak.c b/tools/testing/selftests/powerpc/ptrace/ptrace-hwbreak.c index 2e0d86e0687e..a0635a3819aa 100644 --- a/tools/testing/selftests/powerpc/ptrace/ptrace-hwbreak.c +++ b/tools/testing/selftests/powerpc/ptrace/ptrace-hwbreak.c @@ -194,6 +194,18 @@ static void test_workload(void) big_var[rand() % DAWR_MAX_LEN] = 'a'; else cvar = big_var[rand() % DAWR_MAX_LEN]; + + /* PPC_PTRACE_SETHWDEBUG 2, MODE_RANGE, DW ALIGNED, WO test */ + gstruct.a[rand() % A_LEN] = 'a'; + + /* PPC_PTRACE_SETHWDEBUG 2, MODE_RANGE, DW UNALIGNED, RO test */ + cvar = gstruct.b[rand() % B_LEN]; + + /* PPC_PTRACE_SETHWDEBUG 2, MODE_RANGE, DAWR Overlap, WO test */ + gstruct.a[rand() % A_LEN] = 'a'; + + /* PPC_PTRACE_SETHWDEBUG 2, MODE_RANGE, DAWR Overlap, RO test */ + cvar = gstruct.a[rand() % A_LEN]; } static void check_success(pid_t child_pid, const char *name, const char *type, @@ -417,6 +429,69 @@ static void test_sethwdebug_range_aligned(pid_t child_pid) ptrace_delhwdebug(child_pid, wh); } +static void test_multi_sethwdebug_range(pid_t child_pid) +{ + struct ppc_hw_breakpoint info1, info2; + unsigned long wp_addr1, wp_addr2; + char *name1 = "PPC_PTRACE_SETHWDEBUG 2, MODE_RANGE, DW ALIGNED"; + char *name2 = "PPC_PTRACE_SETHWDEBUG 2, MODE_RANGE, DW UNALIGNED"; + int len1, len2; + int wh1, wh2; + + wp_addr1 = (unsigned long) + wp_addr2 = (unsigned long) + len1 = A_LEN; + len2 = B_LEN; + get_ppc_hw_breakpoint(, PPC_BREAKPOINT_TRIGGER_WRITE, wp_addr1, len1); + get_ppc_hw_breakpoint(, PPC_BREAKPOINT_TRIGGER_READ, wp_addr2, len2); + + /* PPC_PTRACE_SETHWDEBUG 2, MODE_RANGE, DW ALIGNED, WO test */ + wh1 = ptrace_sethwdebug(child_pid, ); + + /* PPC_PTRACE_SETHWDEBUG 2, MODE_RANGE, DW UNALIGNED, RO test */ + wh2 = ptrace_sethwdebug(child_pid, ); + + ptrace(PTRACE_CONT, child_pid, NULL, 0); + check_success(child_pid, name1, "WO", wp_addr1, len1); + + ptrace(PTRACE_CONT, child_pid, NULL, 0); + check_success(child_pid, name2, "RO", wp_addr2, len2); + + ptrace_delhwdebug(child_pid, wh1); + ptrace_delhwdebug(child_pid, wh2); +} + +static void test_multi_sethwdebug_range_dawr_overlap(pid_t child_pid) +{ + struct ppc_hw_breakpoint info1, info2; + unsigned long wp_addr1, wp_addr2; + char *name = "PPC_PTRACE_SETHWDEBUG 2, MODE_RANGE, DAWR Overlap"; + int len1, len2; + int wh1, wh2; + + wp_addr1 = (unsigned long) + wp_addr2 = (unsigned long) + len1 = A_LEN; + len2 = A_LEN; + get_ppc_hw_breakpoint(, PPC_BREAKPOINT_TRIGGER_WRITE, wp_addr1, len1); + get_ppc_hw_breakpoint(, PPC_BREAKPOINT_TRIGGER_READ, wp_addr2, len2); + + /* PPC_PTRACE_SETHWDEBUG 2, MODE_RANGE, DAWR Overlap, WO test */ + wh1 = ptrace_sethwdebug(child_pid, ); + + /* PPC_PTRACE_SETHWDEBUG 2, MODE_RANGE, DAWR Overlap, RO test */ + wh2 = ptrace_sethwdebug(child_pid, ); + + ptrace(PTRACE_CONT, child_pid, NULL, 0); + check_success(child_pid, name, "WO", wp_addr1, len1); + + ptrace(PTRACE_CONT, child_pid, NULL, 0); + check_success(child_pid, name, "RO", wp_addr2, len2); + + ptrace_delhwdebug(child_pid, wh1); + ptrace_delhwdebug(child_pid, wh2); +} + static void test_sethwdebug_range_unaligned(pid_t child_pid) { struct ppc_hw_breakpoint info; @@ -504,6 +579,10 @@ run_tests(pid_t child_pid, struct ppc_debug_info *dbginfo, bool dawr) test_sethwdebug_range_unaligned(child_pid); test_sethwdebug_range_unaligned_dar(child_pid); test_sethwdebug_dawr_max_range(child_pid); + if (dbginfo->num_data_bps > 1) { + test_multi_sethwdebug_range(child_pid); + test_multi_sethwdebug_range_dawr_overlap(child_pid); + } } } } -- 2.27.0
[PATCH v3 0/4] powerpc/selftests: Add Power10 2nd DAWR selftests
Power10 introduced 2nd watchpoint (DAWR). ISA 3.1, Book 3S, Ch 9 - 'Debug Facilities' covers the feature in detail. Kernel patches to enable the 2nd DAWR are already in[1], including kvm enablement[2]. These patches adds selftests for 2nd DAWR. [1]: https://git.kernel.org/torvalds/c/deb2bd9bcc8428d4b65b6ba640ba8b57c1b20b17 [2]: https://git.kernel.org/torvalds/c/bd1de1a0e6eff4bde5ceae969673b85b8446fd6a v2: https://lore.kernel.org/r/20210407054938.312857-1-ravi.bango...@linux.ibm.com v2->v3: - Fixed some checkpatch warnings v1: https://lore.kernel.org/r/20200723102058.312282-1-ravi.bango...@linux.ibm.com v1->v2: - Kvm patches are already upstream - Rebased selftests to powerpc/next Ravi Bangoria (4): powerpc/selftests/ptrace-hwbreak: Add testcases for 2nd DAWR powerpc/selftests/perf-hwbreak: Coalesce event creation code powerpc/selftests/perf-hwbreak: Add testcases for 2nd DAWR powerpc/selftests: Add selftest to test concurrent perf/ptrace events .../selftests/powerpc/ptrace/.gitignore | 1 + .../testing/selftests/powerpc/ptrace/Makefile | 2 +- .../selftests/powerpc/ptrace/perf-hwbreak.c | 631 +++-- .../selftests/powerpc/ptrace/ptrace-hwbreak.c | 79 +++ .../powerpc/ptrace/ptrace-perf-hwbreak.c | 659 ++ 5 files changed, 1330 insertions(+), 42 deletions(-) create mode 100644 tools/testing/selftests/powerpc/ptrace/ptrace-perf-hwbreak.c -- 2.27.0
Re: [PATCH v2 2/4] powerpc/selftests/perf-hwbreak: Coalesce event creation code
On 4/9/21 12:49 PM, Daniel Axtens wrote: Hi Ravi, perf-hwbreak selftest opens hw-breakpoint event at multiple places for which it has same code repeated. Coalesce that code into a function. Signed-off-by: Ravi Bangoria --- .../selftests/powerpc/ptrace/perf-hwbreak.c | 78 +-- This doesn't simplify things very much for the code as it stands now, but I think your next patch adds a bunch of calls to these functions, so I agree that it makes sense to consolidate them now. Right. This helps in next patch. 1 file changed, 38 insertions(+), 40 deletions(-) diff --git a/tools/testing/selftests/powerpc/ptrace/perf-hwbreak.c b/tools/testing/selftests/powerpc/ptrace/perf-hwbreak.c index c1f324afdbf3..bde475341c8a 100644 --- a/tools/testing/selftests/powerpc/ptrace/perf-hwbreak.c +++ b/tools/testing/selftests/powerpc/ptrace/perf-hwbreak.c @@ -34,28 +34,46 @@ #define DAWR_LENGTH_MAX ((0x3f + 1) * 8) -static inline int sys_perf_event_open(struct perf_event_attr *attr, pid_t pid, - int cpu, int group_fd, - unsigned long flags) +static void perf_event_attr_set(struct perf_event_attr *attr, + __u32 type, __u64 addr, __u64 len, + bool exclude_user) { - attr->size = sizeof(*attr); - return syscall(__NR_perf_event_open, attr, pid, cpu, group_fd, flags); + memset(attr, 0, sizeof(struct perf_event_attr)); + attr->type = PERF_TYPE_BREAKPOINT; + attr->size = sizeof(struct perf_event_attr); + attr->bp_type= type; + attr->bp_addr= addr; + attr->bp_len = len; + attr->exclude_kernel = 1; + attr->exclude_hv = 1; + attr->exclude_guest = 1; Only 1 of the calls to perf sets exclude_{kernel,hv,guest} - I assume there's no issue with setting them always but I wanted to check. True. there is no issue in setting them as this test does all load/stores in userspace. So all events will be recorded irrespective of settings of exclude_{kernel,hv,guest}. + attr->exclude_user = exclude_user; + attr->disabled = 1; } - /* setup counters */ - memset(, 0, sizeof(attr)); - attr.disabled = 1; - attr.type = PERF_TYPE_BREAKPOINT; - attr.bp_type = readwriteflag; - attr.bp_addr = (__u64)ptr; - attr.bp_len = sizeof(int); - if (arraytest) - attr.bp_len = DAWR_LENGTH_MAX; - attr.exclude_user = exclude_user; - break_fd = sys_perf_event_open(, 0, -1, -1, 0); + break_fd = perf_process_event_open_exclude_user(readwriteflag, (__u64)ptr, + arraytest ? DAWR_LENGTH_MAX : sizeof(int), + exclude_user); checkpatch doesn't like this very much: CHECK: Alignment should match open parenthesis #103: FILE: tools/testing/selftests/powerpc/ptrace/perf-hwbreak.c:115: + break_fd = perf_process_event_open_exclude_user(readwriteflag, (__u64)ptr, + arraytest ? DAWR_LENGTH_MAX : sizeof(int), Sure will fix it. Apart from that, this seems good but I haven't checked in super fine detail just yet :) Thanks for the review. -Ravi
Re: [PATCH v2 1/4] powerpc/selftests/ptrace-hwbreak: Add testcases for 2nd DAWR
On 4/9/21 12:22 PM, Daniel Axtens wrote: Hi Ravi, Add selftests to test multiple active DAWRs with ptrace interface. It would be good if somewhere (maybe in the cover letter) you explain what DAWR stands for and where to find more information about it. I found the Power ISA v3.1 Book 3 Chapter 9 very helpful. Sure. Will add the details in v2 cover letter. Apart from that, I don't have any specific comments about this patch. It looks good to me, it seems to do what it says, and there are no comments from checkpatch. It is a bit sparse in terms of comments but it is consistent with the rest of the file so I can't really complain there :) Reviewed-by: Daniel Axtens Thanks Ravi
[PATCH v2 4/4] powerpc/selftests: Add selftest to test concurrent perf/ptrace events
ptrace and perf watchpoints can't co-exists if their address range overlaps. See commit 29da4f91c0c1 ("powerpc/watchpoint: Don't allow concurrent perf and ptrace events") for more detail. Add selftest for the same. Sample o/p: # ./ptrace-perf-hwbreak test: ptrace-perf-hwbreak tags: git_version:powerpc-5.8-7-118-g937fa174a15d-dirty perf cpu event -> ptrace thread event (Overlapping): Ok perf cpu event -> ptrace thread event (Non-overlapping): Ok perf thread event -> ptrace same thread event (Overlapping): Ok perf thread event -> ptrace same thread event (Non-overlapping): Ok perf thread event -> ptrace other thread event: Ok ptrace thread event -> perf kernel event: Ok ptrace thread event -> perf same thread event (Overlapping): Ok ptrace thread event -> perf same thread event (Non-overlapping): Ok ptrace thread event -> perf other thread event: Ok ptrace thread event -> perf cpu event (Overlapping): Ok ptrace thread event -> perf cpu event (Non-overlapping): Ok ptrace thread event -> perf same thread & cpu event (Overlapping): Ok ptrace thread event -> perf same thread & cpu event (Non-overlapping): Ok ptrace thread event -> perf other thread & cpu event: Ok success: ptrace-perf-hwbreak Signed-off-by: Ravi Bangoria --- .../selftests/powerpc/ptrace/.gitignore | 1 + .../testing/selftests/powerpc/ptrace/Makefile | 2 +- .../powerpc/ptrace/ptrace-perf-hwbreak.c | 659 ++ 3 files changed, 661 insertions(+), 1 deletion(-) create mode 100644 tools/testing/selftests/powerpc/ptrace/ptrace-perf-hwbreak.c diff --git a/tools/testing/selftests/powerpc/ptrace/.gitignore b/tools/testing/selftests/powerpc/ptrace/.gitignore index 0e96150b7c7e..eb75e5360e31 100644 --- a/tools/testing/selftests/powerpc/ptrace/.gitignore +++ b/tools/testing/selftests/powerpc/ptrace/.gitignore @@ -14,3 +14,4 @@ perf-hwbreak core-pkey ptrace-pkey ptrace-syscall +ptrace-perf-hwbreak diff --git a/tools/testing/selftests/powerpc/ptrace/Makefile b/tools/testing/selftests/powerpc/ptrace/Makefile index 8d3f006c98cc..a500639da97a 100644 --- a/tools/testing/selftests/powerpc/ptrace/Makefile +++ b/tools/testing/selftests/powerpc/ptrace/Makefile @@ -2,7 +2,7 @@ TEST_GEN_PROGS := ptrace-gpr ptrace-tm-gpr ptrace-tm-spd-gpr \ ptrace-tar ptrace-tm-tar ptrace-tm-spd-tar ptrace-vsx ptrace-tm-vsx \ ptrace-tm-spd-vsx ptrace-tm-spr ptrace-hwbreak ptrace-pkey core-pkey \ - perf-hwbreak ptrace-syscall + perf-hwbreak ptrace-syscall ptrace-perf-hwbreak top_srcdir = ../../../../.. include ../../lib.mk diff --git a/tools/testing/selftests/powerpc/ptrace/ptrace-perf-hwbreak.c b/tools/testing/selftests/powerpc/ptrace/ptrace-perf-hwbreak.c new file mode 100644 index ..6b8804a4942e --- /dev/null +++ b/tools/testing/selftests/powerpc/ptrace/ptrace-perf-hwbreak.c @@ -0,0 +1,659 @@ +// SPDX-License-Identifier: GPL-2.0+ +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include "ptrace.h" + +char data[16]; + +/* Overlapping address range */ +volatile __u64 *ptrace_data1 = (__u64 *)[0]; +volatile __u64 *perf_data1 = (__u64 *)[4]; + +/* Non-overlapping address range */ +volatile __u64 *ptrace_data2 = (__u64 *)[0]; +volatile __u64 *perf_data2 = (__u64 *)[8]; + +static unsigned long pid_max_addr(void) +{ + FILE *fp; + char *line, *c; + char addr[100]; + size_t len = 0; + + fp = fopen("/proc/kallsyms", "r"); + if (!fp) { + printf("Failed to read /proc/kallsyms. Exiting..\n"); + exit(EXIT_FAILURE); + } + + while (getline(, , fp) != -1) { + if (!strstr(line, "pid_max") || strstr(line, "pid_max_max") || + strstr(line, "pid_max_min")) + continue; + + strncpy(addr, line, len < 100 ? len : 100); + c = strchr(addr, ' '); + *c = '\0'; + return strtoul(addr, , 16); + } + fclose(fp); + printf("Could not find pix_max. Exiting..\n"); + exit(EXIT_FAILURE); + return -1; +} + +static void perf_user_event_attr_set(struct perf_event_attr *attr, __u64 addr, __u64 len) +{ + memset(attr, 0, sizeof(struct perf_event_attr)); + attr->type = PERF_TYPE_BREAKPOINT; + attr->size = sizeof(struct perf_event_attr); + attr->bp_type= HW_BREAKPOINT_R; + attr->bp_addr= addr; + attr->bp_len = len; + attr->exclude_kernel = 1; + attr->exclude_hv = 1; +} + +static void perf_kernel_event_attr_set(struct perf_event_attr *attr) +{ + memset(attr, 0, sizeof(struct perf_event_attr)); + attr->type = PERF_
[PATCH v2 3/4] powerpc/selftests/perf-hwbreak: Add testcases for 2nd DAWR
Extend perf-hwbreak.c selftest to test multiple DAWRs. Also add testcase for testing 512 byte boundary removal. Sample o/p: # ./perf-hwbreak ... TESTED: Process specific, Two events, diff addr TESTED: Process specific, Two events, same addr TESTED: Process specific, Two events, diff addr, one is RO, other is WO TESTED: Process specific, Two events, same addr, one is RO, other is WO TESTED: Systemwide, Two events, diff addr TESTED: Systemwide, Two events, same addr TESTED: Systemwide, Two events, diff addr, one is RO, other is WO TESTED: Systemwide, Two events, same addr, one is RO, other is WO TESTED: Process specific, 512 bytes, unaligned success: perf_hwbreak Signed-off-by: Ravi Bangoria --- .../selftests/powerpc/ptrace/perf-hwbreak.c | 568 +- 1 file changed, 567 insertions(+), 1 deletion(-) diff --git a/tools/testing/selftests/powerpc/ptrace/perf-hwbreak.c b/tools/testing/selftests/powerpc/ptrace/perf-hwbreak.c index bde475341c8a..1dafba42c23d 100644 --- a/tools/testing/selftests/powerpc/ptrace/perf-hwbreak.c +++ b/tools/testing/selftests/powerpc/ptrace/perf-hwbreak.c @@ -21,8 +21,13 @@ #include #include #include +#include #include #include +#include +#include +#include +#include #include #include #include @@ -34,6 +39,12 @@ #define DAWR_LENGTH_MAX ((0x3f + 1) * 8) +int nprocs; + +static volatile int a = 10; +static volatile int b = 10; +static volatile char c[512 + 8] __attribute__((aligned(512))); + static void perf_event_attr_set(struct perf_event_attr *attr, __u32 type, __u64 addr, __u64 len, bool exclude_user) @@ -68,6 +79,76 @@ static int perf_process_event_open(__u32 type, __u64 addr, __u64 len) return syscall(__NR_perf_event_open, , getpid(), -1, -1, 0); } +static int perf_cpu_event_open(long cpu, __u32 type, __u64 addr, __u64 len) +{ + struct perf_event_attr attr; + + perf_event_attr_set(, type, addr, len, 0); + return syscall(__NR_perf_event_open, , -1, cpu, -1, 0); +} + +static void close_fds(int *fd, int n) +{ + int i; + + for (i = 0; i < n; i++) + close(fd[i]); +} + +static unsigned long read_fds(int *fd, int n) +{ + int i; + unsigned long c = 0; + unsigned long count = 0; + size_t res; + + for (i = 0; i < n; i++) { + res = read(fd[i], , sizeof(c)); + assert(res == sizeof(unsigned long long)); + count += c; + } + return count; +} + +static void reset_fds(int *fd, int n) +{ + int i; + + for (i = 0; i < n; i++) + ioctl(fd[i], PERF_EVENT_IOC_RESET); +} + +static void enable_fds(int *fd, int n) +{ + int i; + + for (i = 0; i < n; i++) + ioctl(fd[i], PERF_EVENT_IOC_ENABLE); +} + +static void disable_fds(int *fd, int n) +{ + int i; + + for (i = 0; i < n; i++) + ioctl(fd[i], PERF_EVENT_IOC_DISABLE); +} + +static int perf_systemwide_event_open(int *fd, __u32 type, __u64 addr, __u64 len) +{ + int i = 0; + + /* Assume online processors are 0 to nprocs for simplisity */ + for (i = 0; i < nprocs; i++) { + fd[i] = perf_cpu_event_open(i, type, addr, len); + if (fd[i] < 0) { + close_fds(fd, i); + return fd[i]; + } + } + return 0; +} + static inline bool breakpoint_test(int len) { int fd; @@ -261,11 +342,483 @@ static int runtest_dar_outside(void) return fail; } +static void multi_dawr_workload(void) +{ + a += 10; + b += 10; + c[512 + 1] += 'a'; +} + +static int test_process_multi_diff_addr(void) +{ + unsigned long long breaks1 = 0, breaks2 = 0; + int fd1, fd2; + char *desc = "Process specific, Two events, diff addr"; + size_t res; + + fd1 = perf_process_event_open(HW_BREAKPOINT_RW, (__u64), (__u64)sizeof(a)); + if (fd1 < 0) { + perror("perf_process_event_open"); + exit(EXIT_FAILURE); + } + + fd2 = perf_process_event_open(HW_BREAKPOINT_RW, (__u64), (__u64)sizeof(b)); + if (fd2 < 0) { + close(fd1); + perror("perf_process_event_open"); + exit(EXIT_FAILURE); + } + + ioctl(fd1, PERF_EVENT_IOC_RESET); + ioctl(fd2, PERF_EVENT_IOC_RESET); + ioctl(fd1, PERF_EVENT_IOC_ENABLE); + ioctl(fd2, PERF_EVENT_IOC_ENABLE); + multi_dawr_workload(); + ioctl(fd1, PERF_EVENT_IOC_DISABLE); + ioctl(fd2, PERF_EVENT_IOC_DISABLE); + + res = read(fd1, , sizeof(breaks1)); + assert(res == sizeof(unsigned long long)); + res = read(fd2, , sizeof(breaks2)); + assert(res == sizeof(unsigned long long)); + + close(fd1); + close(fd2); + + if (breaks1 != 2 || breaks2 != 2
[PATCH v2 2/4] powerpc/selftests/perf-hwbreak: Coalesce event creation code
perf-hwbreak selftest opens hw-breakpoint event at multiple places for which it has same code repeated. Coalesce that code into a function. Signed-off-by: Ravi Bangoria --- .../selftests/powerpc/ptrace/perf-hwbreak.c | 78 +-- 1 file changed, 38 insertions(+), 40 deletions(-) diff --git a/tools/testing/selftests/powerpc/ptrace/perf-hwbreak.c b/tools/testing/selftests/powerpc/ptrace/perf-hwbreak.c index c1f324afdbf3..bde475341c8a 100644 --- a/tools/testing/selftests/powerpc/ptrace/perf-hwbreak.c +++ b/tools/testing/selftests/powerpc/ptrace/perf-hwbreak.c @@ -34,28 +34,46 @@ #define DAWR_LENGTH_MAX ((0x3f + 1) * 8) -static inline int sys_perf_event_open(struct perf_event_attr *attr, pid_t pid, - int cpu, int group_fd, - unsigned long flags) +static void perf_event_attr_set(struct perf_event_attr *attr, + __u32 type, __u64 addr, __u64 len, + bool exclude_user) { - attr->size = sizeof(*attr); - return syscall(__NR_perf_event_open, attr, pid, cpu, group_fd, flags); + memset(attr, 0, sizeof(struct perf_event_attr)); + attr->type = PERF_TYPE_BREAKPOINT; + attr->size = sizeof(struct perf_event_attr); + attr->bp_type= type; + attr->bp_addr= addr; + attr->bp_len = len; + attr->exclude_kernel = 1; + attr->exclude_hv = 1; + attr->exclude_guest = 1; + attr->exclude_user = exclude_user; + attr->disabled = 1; } -static inline bool breakpoint_test(int len) +static int +perf_process_event_open_exclude_user(__u32 type, __u64 addr, __u64 len, bool exclude_user) { struct perf_event_attr attr; + + perf_event_attr_set(, type, addr, len, exclude_user); + return syscall(__NR_perf_event_open, , getpid(), -1, -1, 0); +} + +static int perf_process_event_open(__u32 type, __u64 addr, __u64 len) +{ + struct perf_event_attr attr; + + perf_event_attr_set(, type, addr, len, 0); + return syscall(__NR_perf_event_open, , getpid(), -1, -1, 0); +} + +static inline bool breakpoint_test(int len) +{ int fd; - /* setup counters */ - memset(, 0, sizeof(attr)); - attr.disabled = 1; - attr.type = PERF_TYPE_BREAKPOINT; - attr.bp_type = HW_BREAKPOINT_R; /* bp_addr can point anywhere but needs to be aligned */ - attr.bp_addr = (__u64)() & 0xf800; - attr.bp_len = len; - fd = sys_perf_event_open(, 0, -1, -1, 0); + fd = perf_process_event_open(HW_BREAKPOINT_R, (__u64)() & 0xf800, len); if (fd < 0) return false; close(fd); @@ -75,7 +93,6 @@ static inline bool dawr_supported(void) static int runtestsingle(int readwriteflag, int exclude_user, int arraytest) { int i,j; - struct perf_event_attr attr; size_t res; unsigned long long breaks, needed; int readint; @@ -94,19 +111,11 @@ static int runtestsingle(int readwriteflag, int exclude_user, int arraytest) if (arraytest) ptr = [0]; - /* setup counters */ - memset(, 0, sizeof(attr)); - attr.disabled = 1; - attr.type = PERF_TYPE_BREAKPOINT; - attr.bp_type = readwriteflag; - attr.bp_addr = (__u64)ptr; - attr.bp_len = sizeof(int); - if (arraytest) - attr.bp_len = DAWR_LENGTH_MAX; - attr.exclude_user = exclude_user; - break_fd = sys_perf_event_open(, 0, -1, -1, 0); + break_fd = perf_process_event_open_exclude_user(readwriteflag, (__u64)ptr, + arraytest ? DAWR_LENGTH_MAX : sizeof(int), + exclude_user); if (break_fd < 0) { - perror("sys_perf_event_open"); + perror("perf_process_event_open_exclude_user"); exit(1); } @@ -153,7 +162,6 @@ static int runtest_dar_outside(void) void *target; volatile __u16 temp16; volatile __u64 temp64; - struct perf_event_attr attr; int break_fd; unsigned long long breaks; int fail = 0; @@ -165,21 +173,11 @@ static int runtest_dar_outside(void) exit(EXIT_FAILURE); } - /* setup counters */ - memset(, 0, sizeof(attr)); - attr.disabled = 1; - attr.type = PERF_TYPE_BREAKPOINT; - attr.exclude_kernel = 1; - attr.exclude_hv = 1; - attr.exclude_guest = 1; - attr.bp_type = HW_BREAKPOINT_RW; /* watch middle half of target array */ - attr.bp_addr = (__u64)(target + 2); - attr.bp_len = 4; - break_fd = sys_perf_event_open(, 0, -1, -1, 0); + break_fd = perf_process_event_open(HW_BREAKPOINT_RW, (__u64)(target + 2), 4); if (break_fd < 0) {
[PATCH v2 1/4] powerpc/selftests/ptrace-hwbreak: Add testcases for 2nd DAWR
Add selftests to test multiple active DAWRs with ptrace interface. Sample o/p: $ ./ptrace-hwbreak ... PPC_PTRACE_SETHWDEBUG 2, MODE_RANGE, DW ALIGNED, WO, len: 6: Ok PPC_PTRACE_SETHWDEBUG 2, MODE_RANGE, DW UNALIGNED, RO, len: 6: Ok PPC_PTRACE_SETHWDEBUG 2, MODE_RANGE, DAWR Overlap, WO, len: 6: Ok PPC_PTRACE_SETHWDEBUG 2, MODE_RANGE, DAWR Overlap, RO, len: 6: Ok Signed-off-by: Ravi Bangoria --- .../selftests/powerpc/ptrace/ptrace-hwbreak.c | 79 +++ 1 file changed, 79 insertions(+) diff --git a/tools/testing/selftests/powerpc/ptrace/ptrace-hwbreak.c b/tools/testing/selftests/powerpc/ptrace/ptrace-hwbreak.c index 2e0d86e0687e..a0635a3819aa 100644 --- a/tools/testing/selftests/powerpc/ptrace/ptrace-hwbreak.c +++ b/tools/testing/selftests/powerpc/ptrace/ptrace-hwbreak.c @@ -194,6 +194,18 @@ static void test_workload(void) big_var[rand() % DAWR_MAX_LEN] = 'a'; else cvar = big_var[rand() % DAWR_MAX_LEN]; + + /* PPC_PTRACE_SETHWDEBUG 2, MODE_RANGE, DW ALIGNED, WO test */ + gstruct.a[rand() % A_LEN] = 'a'; + + /* PPC_PTRACE_SETHWDEBUG 2, MODE_RANGE, DW UNALIGNED, RO test */ + cvar = gstruct.b[rand() % B_LEN]; + + /* PPC_PTRACE_SETHWDEBUG 2, MODE_RANGE, DAWR Overlap, WO test */ + gstruct.a[rand() % A_LEN] = 'a'; + + /* PPC_PTRACE_SETHWDEBUG 2, MODE_RANGE, DAWR Overlap, RO test */ + cvar = gstruct.a[rand() % A_LEN]; } static void check_success(pid_t child_pid, const char *name, const char *type, @@ -417,6 +429,69 @@ static void test_sethwdebug_range_aligned(pid_t child_pid) ptrace_delhwdebug(child_pid, wh); } +static void test_multi_sethwdebug_range(pid_t child_pid) +{ + struct ppc_hw_breakpoint info1, info2; + unsigned long wp_addr1, wp_addr2; + char *name1 = "PPC_PTRACE_SETHWDEBUG 2, MODE_RANGE, DW ALIGNED"; + char *name2 = "PPC_PTRACE_SETHWDEBUG 2, MODE_RANGE, DW UNALIGNED"; + int len1, len2; + int wh1, wh2; + + wp_addr1 = (unsigned long) + wp_addr2 = (unsigned long) + len1 = A_LEN; + len2 = B_LEN; + get_ppc_hw_breakpoint(, PPC_BREAKPOINT_TRIGGER_WRITE, wp_addr1, len1); + get_ppc_hw_breakpoint(, PPC_BREAKPOINT_TRIGGER_READ, wp_addr2, len2); + + /* PPC_PTRACE_SETHWDEBUG 2, MODE_RANGE, DW ALIGNED, WO test */ + wh1 = ptrace_sethwdebug(child_pid, ); + + /* PPC_PTRACE_SETHWDEBUG 2, MODE_RANGE, DW UNALIGNED, RO test */ + wh2 = ptrace_sethwdebug(child_pid, ); + + ptrace(PTRACE_CONT, child_pid, NULL, 0); + check_success(child_pid, name1, "WO", wp_addr1, len1); + + ptrace(PTRACE_CONT, child_pid, NULL, 0); + check_success(child_pid, name2, "RO", wp_addr2, len2); + + ptrace_delhwdebug(child_pid, wh1); + ptrace_delhwdebug(child_pid, wh2); +} + +static void test_multi_sethwdebug_range_dawr_overlap(pid_t child_pid) +{ + struct ppc_hw_breakpoint info1, info2; + unsigned long wp_addr1, wp_addr2; + char *name = "PPC_PTRACE_SETHWDEBUG 2, MODE_RANGE, DAWR Overlap"; + int len1, len2; + int wh1, wh2; + + wp_addr1 = (unsigned long) + wp_addr2 = (unsigned long) + len1 = A_LEN; + len2 = A_LEN; + get_ppc_hw_breakpoint(, PPC_BREAKPOINT_TRIGGER_WRITE, wp_addr1, len1); + get_ppc_hw_breakpoint(, PPC_BREAKPOINT_TRIGGER_READ, wp_addr2, len2); + + /* PPC_PTRACE_SETHWDEBUG 2, MODE_RANGE, DAWR Overlap, WO test */ + wh1 = ptrace_sethwdebug(child_pid, ); + + /* PPC_PTRACE_SETHWDEBUG 2, MODE_RANGE, DAWR Overlap, RO test */ + wh2 = ptrace_sethwdebug(child_pid, ); + + ptrace(PTRACE_CONT, child_pid, NULL, 0); + check_success(child_pid, name, "WO", wp_addr1, len1); + + ptrace(PTRACE_CONT, child_pid, NULL, 0); + check_success(child_pid, name, "RO", wp_addr2, len2); + + ptrace_delhwdebug(child_pid, wh1); + ptrace_delhwdebug(child_pid, wh2); +} + static void test_sethwdebug_range_unaligned(pid_t child_pid) { struct ppc_hw_breakpoint info; @@ -504,6 +579,10 @@ run_tests(pid_t child_pid, struct ppc_debug_info *dbginfo, bool dawr) test_sethwdebug_range_unaligned(child_pid); test_sethwdebug_range_unaligned_dar(child_pid); test_sethwdebug_dawr_max_range(child_pid); + if (dbginfo->num_data_bps > 1) { + test_multi_sethwdebug_range(child_pid); + test_multi_sethwdebug_range_dawr_overlap(child_pid); + } } } } -- 2.27.0
[PATCH v2 0/4] powerpc/selftests: Add Power10 2nd DAWR selftests
Add selftests for 2nd DAWR supported by Power10. v1: https://lore.kernel.org/r/20200723102058.312282-1-ravi.bango...@linux.ibm.com v1->v2: - Kvm patches are already upstream - Rebased selftests to powerpc/next Ravi Bangoria (4): powerpc/selftests/ptrace-hwbreak: Add testcases for 2nd DAWR powerpc/selftests/perf-hwbreak: Coalesce event creation code powerpc/selftests/perf-hwbreak: Add testcases for 2nd DAWR powerpc/selftests: Add selftest to test concurrent perf/ptrace events .../selftests/powerpc/ptrace/.gitignore | 1 + .../testing/selftests/powerpc/ptrace/Makefile | 2 +- .../selftests/powerpc/ptrace/perf-hwbreak.c | 646 +++-- .../selftests/powerpc/ptrace/ptrace-hwbreak.c | 79 +++ .../powerpc/ptrace/ptrace-perf-hwbreak.c | 659 ++ 5 files changed, 1345 insertions(+), 42 deletions(-) create mode 100644 tools/testing/selftests/powerpc/ptrace/ptrace-perf-hwbreak.c -- 2.27.0
[PATCH v5] powerpc/uprobes: Validation for prefixed instruction
As per ISA 3.1, prefixed instruction should not cross 64-byte boundary. So don't allow Uprobe on such prefixed instruction. There are two ways probed instruction is changed in mapped pages. First, when Uprobe is activated, it searches for all the relevant pages and replace instruction in them. In this case, if that probe is on the 64-byte unaligned prefixed instruction, error out directly. Second, when Uprobe is already active and user maps a relevant page via mmap(), instruction is replaced via mmap() code path. But because Uprobe is invalid, entire mmap() operation can not be stopped. In this case just print an error and continue. Signed-off-by: Ravi Bangoria Acked-by: Naveen N. Rao Acked-by: Sandipan Das --- v4: https://lore.kernel.org/r/20210305115433.140769-1-ravi.bango...@linux.ibm.com v4->v5: - Replace SZ_ macros with numbers arch/powerpc/kernel/uprobes.c | 7 +++ 1 file changed, 7 insertions(+) diff --git a/arch/powerpc/kernel/uprobes.c b/arch/powerpc/kernel/uprobes.c index e8a63713e655..186f69b11e94 100644 --- a/arch/powerpc/kernel/uprobes.c +++ b/arch/powerpc/kernel/uprobes.c @@ -41,6 +41,13 @@ int arch_uprobe_analyze_insn(struct arch_uprobe *auprobe, if (addr & 0x03) return -EINVAL; + if (cpu_has_feature(CPU_FTR_ARCH_31) && + ppc_inst_prefixed(auprobe->insn) && + (addr & 0x3f) == 60) { + pr_info_ratelimited("Cannot register a uprobe on 64 byte unaligned prefixed instruction\n"); + return -EINVAL; + } + return 0; } -- 2.27.0
Re: [PATCH v4] powerpc/uprobes: Validation for prefixed instruction
On 3/9/21 4:51 PM, Naveen N. Rao wrote: On 2021/03/09 08:54PM, Michael Ellerman wrote: Ravi Bangoria writes: As per ISA 3.1, prefixed instruction should not cross 64-byte boundary. So don't allow Uprobe on such prefixed instruction. There are two ways probed instruction is changed in mapped pages. First, when Uprobe is activated, it searches for all the relevant pages and replace instruction in them. In this case, if that probe is on the 64-byte unaligned prefixed instruction, error out directly. Second, when Uprobe is already active and user maps a relevant page via mmap(), instruction is replaced via mmap() code path. But because Uprobe is invalid, entire mmap() operation can not be stopped. In this case just print an error and continue. Signed-off-by: Ravi Bangoria Acked-by: Naveen N. Rao Do we have a Fixes: tag for this? Since this is an additional check we are adding, I don't think we should add a Fixes: tag. Nothing is broken per-se -- we're just adding more checks to catch simple mistakes. Also, like Oleg pointed out, there are still many other ways for users to shoot themselves in the foot with uprobes and prefixed instructions, if they so desire. However, if you still think we should add a Fixes: tag, we can perhaps use the below commit since I didn't see any specific commit adding support for prefixed instructions for uprobes: Fixes: 650b55b707fdfa ("powerpc: Add prefixed instructions to instruction data type") True. IMO, It doesn't really need any Fixes tag. --- v3: https://lore.kernel.org/r/20210304050529.59391-1-ravi.bango...@linux.ibm.com v3->v4: - CONFIG_PPC64 check was not required, remove it. - Use SZ_ macros instead of hardcoded numbers. arch/powerpc/kernel/uprobes.c | 7 +++ 1 file changed, 7 insertions(+) diff --git a/arch/powerpc/kernel/uprobes.c b/arch/powerpc/kernel/uprobes.c index e8a63713e655..4cbfff6e94a3 100644 --- a/arch/powerpc/kernel/uprobes.c +++ b/arch/powerpc/kernel/uprobes.c @@ -41,6 +41,13 @@ int arch_uprobe_analyze_insn(struct arch_uprobe *auprobe, if (addr & 0x03) return -EINVAL; + if (cpu_has_feature(CPU_FTR_ARCH_31) && + ppc_inst_prefixed(auprobe->insn) && + (addr & (SZ_64 - 4)) == SZ_64 - 4) { + pr_info_ratelimited("Cannot register a uprobe on 64 byte unaligned prefixed instruction\n"); + return -EINVAL; I realise we already did the 0x03 check above, but I still think this would be clearer simply as: (addr & 0x3f == 60) Indeed, I like the use of `60' there -- hex is overrated ;) Sure. Will resend. Ravi
[PATCH v4] powerpc/uprobes: Validation for prefixed instruction
As per ISA 3.1, prefixed instruction should not cross 64-byte boundary. So don't allow Uprobe on such prefixed instruction. There are two ways probed instruction is changed in mapped pages. First, when Uprobe is activated, it searches for all the relevant pages and replace instruction in them. In this case, if that probe is on the 64-byte unaligned prefixed instruction, error out directly. Second, when Uprobe is already active and user maps a relevant page via mmap(), instruction is replaced via mmap() code path. But because Uprobe is invalid, entire mmap() operation can not be stopped. In this case just print an error and continue. Signed-off-by: Ravi Bangoria Acked-by: Naveen N. Rao --- v3: https://lore.kernel.org/r/20210304050529.59391-1-ravi.bango...@linux.ibm.com v3->v4: - CONFIG_PPC64 check was not required, remove it. - Use SZ_ macros instead of hardcoded numbers. arch/powerpc/kernel/uprobes.c | 7 +++ 1 file changed, 7 insertions(+) diff --git a/arch/powerpc/kernel/uprobes.c b/arch/powerpc/kernel/uprobes.c index e8a63713e655..4cbfff6e94a3 100644 --- a/arch/powerpc/kernel/uprobes.c +++ b/arch/powerpc/kernel/uprobes.c @@ -41,6 +41,13 @@ int arch_uprobe_analyze_insn(struct arch_uprobe *auprobe, if (addr & 0x03) return -EINVAL; + if (cpu_has_feature(CPU_FTR_ARCH_31) && + ppc_inst_prefixed(auprobe->insn) && + (addr & (SZ_64 - 4)) == SZ_64 - 4) { + pr_info_ratelimited("Cannot register a uprobe on 64 byte unaligned prefixed instruction\n"); + return -EINVAL; + } + return 0; } -- 2.27.0
Re: [PATCH v3] powerpc/uprobes: Validation for prefixed instruction
On 3/4/21 4:21 PM, Christophe Leroy wrote: Le 04/03/2021 à 11:13, Ravi Bangoria a écrit : On 3/4/21 1:02 PM, Christophe Leroy wrote: Le 04/03/2021 à 06:05, Ravi Bangoria a écrit : As per ISA 3.1, prefixed instruction should not cross 64-byte boundary. So don't allow Uprobe on such prefixed instruction. There are two ways probed instruction is changed in mapped pages. First, when Uprobe is activated, it searches for all the relevant pages and replace instruction in them. In this case, if that probe is on the 64-byte unaligned prefixed instruction, error out directly. Second, when Uprobe is already active and user maps a relevant page via mmap(), instruction is replaced via mmap() code path. But because Uprobe is invalid, entire mmap() operation can not be stopped. In this case just print an error and continue. Signed-off-by: Ravi Bangoria --- v2: https://lore.kernel.org/r/20210204104703.273429-1-ravi.bango...@linux.ibm.com v2->v3: - Drop restriction for Uprobe on suffix of prefixed instruction. It needs lot of code change including generic code but what we get in return is not worth it. arch/powerpc/kernel/uprobes.c | 8 1 file changed, 8 insertions(+) diff --git a/arch/powerpc/kernel/uprobes.c b/arch/powerpc/kernel/uprobes.c index e8a63713e655..c400971ebe70 100644 --- a/arch/powerpc/kernel/uprobes.c +++ b/arch/powerpc/kernel/uprobes.c @@ -41,6 +41,14 @@ int arch_uprobe_analyze_insn(struct arch_uprobe *auprobe, if (addr & 0x03) return -EINVAL; + if (!IS_ENABLED(CONFIG_PPC64) || !cpu_has_feature(CPU_FTR_ARCH_31)) cpu_has_feature(CPU_FTR_ARCH_31) should return 'false' when CONFIG_PPC64 is not enabled, no need to double check. Ok. I'm going to drop CONFIG_PPC64 check because it's not really required as I replied to Naveen. So, I'll keep CPU_FTR_ARCH_31 check as is. + return 0; + + if (ppc_inst_prefixed(auprobe->insn) && (addr & 0x3F) == 0x3C) { Maybe 3C instead of 4F ? : (addr & 0x3C) == 0x3C Didn't follow. It's not (addr & 0x3C), it's (addr & 0x3F). Sorry I meant 3c instead of 3f (And usually we don't use capital letters for that). The last two bits are supposed to always be 0, so it doesn't really matter, I just thought it would look better having the same value both sides of the test, ie (addr & 0x3c) == 0x3c. Ok yeah makes sense. Thanks. What about (addr & (SZ_64 - 4)) == SZ_64 - 4 to make it more explicit ? Yes this is bit better. Though, it should be: (addr & (SZ_64 - 1)) == SZ_64 - 4 -1 or -4 should give the same results as instructions are always 32 bits aligned though. Got it. Ravi
Re: [PATCH v3] powerpc/uprobes: Validation for prefixed instruction
On 3/4/21 1:02 PM, Christophe Leroy wrote: Le 04/03/2021 à 06:05, Ravi Bangoria a écrit : As per ISA 3.1, prefixed instruction should not cross 64-byte boundary. So don't allow Uprobe on such prefixed instruction. There are two ways probed instruction is changed in mapped pages. First, when Uprobe is activated, it searches for all the relevant pages and replace instruction in them. In this case, if that probe is on the 64-byte unaligned prefixed instruction, error out directly. Second, when Uprobe is already active and user maps a relevant page via mmap(), instruction is replaced via mmap() code path. But because Uprobe is invalid, entire mmap() operation can not be stopped. In this case just print an error and continue. Signed-off-by: Ravi Bangoria --- v2: https://lore.kernel.org/r/20210204104703.273429-1-ravi.bango...@linux.ibm.com v2->v3: - Drop restriction for Uprobe on suffix of prefixed instruction. It needs lot of code change including generic code but what we get in return is not worth it. arch/powerpc/kernel/uprobes.c | 8 1 file changed, 8 insertions(+) diff --git a/arch/powerpc/kernel/uprobes.c b/arch/powerpc/kernel/uprobes.c index e8a63713e655..c400971ebe70 100644 --- a/arch/powerpc/kernel/uprobes.c +++ b/arch/powerpc/kernel/uprobes.c @@ -41,6 +41,14 @@ int arch_uprobe_analyze_insn(struct arch_uprobe *auprobe, if (addr & 0x03) return -EINVAL; + if (!IS_ENABLED(CONFIG_PPC64) || !cpu_has_feature(CPU_FTR_ARCH_31)) cpu_has_feature(CPU_FTR_ARCH_31) should return 'false' when CONFIG_PPC64 is not enabled, no need to double check. Ok. I'm going to drop CONFIG_PPC64 check because it's not really required as I replied to Naveen. So, I'll keep CPU_FTR_ARCH_31 check as is. + return 0; + + if (ppc_inst_prefixed(auprobe->insn) && (addr & 0x3F) == 0x3C) { Maybe 3C instead of 4F ? : (addr & 0x3C) == 0x3C Didn't follow. It's not (addr & 0x3C), it's (addr & 0x3F). What about (addr & (SZ_64 - 4)) == SZ_64 - 4 to make it more explicit ? Yes this is bit better. Though, it should be: (addr & (SZ_64 - 1)) == SZ_64 - 4 Ravi
Re: [PATCH v3] powerpc/uprobes: Validation for prefixed instruction
@@ -41,6 +41,14 @@ int arch_uprobe_analyze_insn(struct arch_uprobe *auprobe, if (addr & 0x03) return -EINVAL; + if (!IS_ENABLED(CONFIG_PPC64) || !cpu_has_feature(CPU_FTR_ARCH_31)) + return 0; Sorry, I missed this last time, but I think we can drop the above checks. ppc_inst_prefixed() already factors in the dependency on CONFIG_PPC64, Yeah, makes sense. I initially added CONFIG_PPC64 check because I was using ppc_inst_prefix(x, y) macro which is not available for !CONFIG_PPC64. and I don't think we need to confirm if we're running on a ISA V3.1 for the below check. Prefixed instructions would be supported only when ARCH_31 is set. (Ignoring insane scenario where user probes on prefixed instruction on p10 predecessors). So I guess I still need ARCH_31 check? Ravi
[PATCH v3] powerpc/uprobes: Validation for prefixed instruction
As per ISA 3.1, prefixed instruction should not cross 64-byte boundary. So don't allow Uprobe on such prefixed instruction. There are two ways probed instruction is changed in mapped pages. First, when Uprobe is activated, it searches for all the relevant pages and replace instruction in them. In this case, if that probe is on the 64-byte unaligned prefixed instruction, error out directly. Second, when Uprobe is already active and user maps a relevant page via mmap(), instruction is replaced via mmap() code path. But because Uprobe is invalid, entire mmap() operation can not be stopped. In this case just print an error and continue. Signed-off-by: Ravi Bangoria --- v2: https://lore.kernel.org/r/20210204104703.273429-1-ravi.bango...@linux.ibm.com v2->v3: - Drop restriction for Uprobe on suffix of prefixed instruction. It needs lot of code change including generic code but what we get in return is not worth it. arch/powerpc/kernel/uprobes.c | 8 1 file changed, 8 insertions(+) diff --git a/arch/powerpc/kernel/uprobes.c b/arch/powerpc/kernel/uprobes.c index e8a63713e655..c400971ebe70 100644 --- a/arch/powerpc/kernel/uprobes.c +++ b/arch/powerpc/kernel/uprobes.c @@ -41,6 +41,14 @@ int arch_uprobe_analyze_insn(struct arch_uprobe *auprobe, if (addr & 0x03) return -EINVAL; + if (!IS_ENABLED(CONFIG_PPC64) || !cpu_has_feature(CPU_FTR_ARCH_31)) + return 0; + + if (ppc_inst_prefixed(auprobe->insn) && (addr & 0x3F) == 0x3C) { + pr_info_ratelimited("Cannot register a uprobe on 64 byte unaligned prefixed instruction\n"); + return -EINVAL; + } + return 0; } -- 2.27.0
Re: [PATCH] powerpc/sstep: Fix VSX instruction emulation
On 2/25/21 8:49 AM, Jordan Niethe wrote: Commit af99da74333b ("powerpc/sstep: Support VSX vector paired storage access instructions") added loading and storing 32 word long data into adjacent VSRs. However the calculation used to determine if two VSRs needed to be loaded/stored inadvertently prevented the load/storing taking place for instructions with a data length less than 16 words. This causes the emulation to not function correctly, which can be seen by the alignment_handler selftest: $ ./alignment_handler [snip] test: test_alignment_handler_vsx_207 tags: git_version:powerpc-5.12-1-0-g82d2c16b350f VSX: 2.07B Doing lxsspx: PASSED Doing lxsiwax: FAILED: Wrong Data Doing lxsiwzx: PASSED Doing stxsspx: PASSED Doing stxsiwx: PASSED failure: test_alignment_handler_vsx_207 test: test_alignment_handler_vsx_300 tags: git_version:powerpc-5.12-1-0-g82d2c16b350f VSX: 3.00B Doing lxsd: PASSED Doing lxsibzx: PASSED Doing lxsihzx: PASSED Doing lxssp:FAILED: Wrong Data Doing lxv: PASSED Doing lxvb16x: PASSED Doing lxvh8x: PASSED Doing lxvx: PASSED Doing lxvwsx: FAILED: Wrong Data Doing lxvl: PASSED Doing lxvll:PASSED Doing stxsd:PASSED Doing stxsibx: PASSED Doing stxsihx: PASSED Doing stxssp: PASSED Doing stxv: PASSED Doing stxvb16x: PASSED Doing stxvh8x: PASSED Doing stxvx:PASSED Doing stxvl:PASSED Doing stxvll: PASSED failure: test_alignment_handler_vsx_300 [snip] Fix this by making sure all VSX instruction emulation correctly load/store from the VSRs. Fixes: af99da74333b ("powerpc/sstep: Support VSX vector paired storage access instructions") Signed-off-by: Jordan Niethe Yikes! Reviewed-by: Ravi Bangoria
Re: [PATCH v2] powerpc/uprobes: Validation for prefixed instruction
On 2/4/21 6:38 PM, Naveen N. Rao wrote: On 2021/02/04 04:17PM, Ravi Bangoria wrote: Don't allow Uprobe on 2nd word of a prefixed instruction. As per ISA 3.1, prefixed instruction should not cross 64-byte boundary. So don't allow Uprobe on such prefixed instruction as well. There are two ways probed instruction is changed in mapped pages. First, when Uprobe is activated, it searches for all the relevant pages and replace instruction in them. In this case, if we notice that probe is on the 2nd word of prefixed instruction, error out directly. Second, when Uprobe is already active and user maps a relevant page via mmap(), instruction is replaced via mmap() code path. But because Uprobe is invalid, entire mmap() operation can not be stopped. In this case just print an error and continue. Signed-off-by: Ravi Bangoria --- v1: http://lore.kernel.org/r/20210119091234.76317-1-ravi.bango...@linux.ibm.com v1->v2: - Instead of introducing new arch hook from verify_opcode(), use existing hook arch_uprobe_analyze_insn(). - Add explicit check for prefixed instruction crossing 64-byte boundary. If probe is on such instruction, throw an error. arch/powerpc/kernel/uprobes.c | 66 ++- 1 file changed, 65 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/kernel/uprobes.c b/arch/powerpc/kernel/uprobes.c index e8a63713e655..485d19a2a31f 100644 --- a/arch/powerpc/kernel/uprobes.c +++ b/arch/powerpc/kernel/uprobes.c @@ -7,6 +7,7 @@ * Adapted from the x86 port by Ananth N Mavinakayanahalli */ #include +#include #include #include #include @@ -28,6 +29,69 @@ bool is_trap_insn(uprobe_opcode_t *insn) return (is_trap(*insn)); } +#ifdef CONFIG_PPC64 +static int get_instr(struct mm_struct *mm, unsigned long addr, u32 *instr) +{ + struct page *page; + struct vm_area_struct *vma; + void *kaddr; + unsigned int gup_flags = FOLL_FORCE | FOLL_SPLIT_PMD; + + if (get_user_pages_remote(mm, addr, 1, gup_flags, , , NULL) <= 0) + return -EINVAL; + + kaddr = kmap_atomic(page); + *instr = *((u32 *)(kaddr + (addr & ~PAGE_MASK))); + kunmap_atomic(kaddr); + put_page(page); + return 0; +} + +static int validate_prefixed_instr(struct mm_struct *mm, unsigned long addr) +{ + struct ppc_inst inst; + u32 prefix, suffix; + + /* +* No need to check if addr is pointing to beginning of the +* page. Even if probe is on a suffix of page-unaligned +* prefixed instruction, hw will raise exception and kernel +* will send SIGBUS. +*/ + if (!(addr & ~PAGE_MASK)) + return 0; + + if (get_instr(mm, addr, ) < 0) + return -EINVAL; + if (get_instr(mm, addr + 4, ) < 0) + return -EINVAL; + + inst = ppc_inst_prefix(prefix, suffix); + if (ppc_inst_prefixed(inst) && (addr & 0x3F) == 0x3C) { + printk_ratelimited("Cannot register a uprobe on 64 byte " ^^ pr_info_ratelimited() It should be sufficient to check the primary opcode to determine if it is a prefixed instruction. You don't have to read the suffix. I see that we don't have a helper to do this currently, so you could do: if (ppc_inst_primary_opcode(ppc_inst(prefix)) == 1) Ok. In the future, it might be worthwhile to add IS_PREFIX() as a macro similar to IS_MTMSRD() if there are more such uses. Along with this, if you also add the below to the start of this function, you can get rid of the #ifdef: if (!IS_ENABLED(CONFIG_PPC64)) return 0; Yeah this is better. Thanks for the review, Naveen! Ravi
Re: [PATCH v2] powerpc/uprobes: Validation for prefixed instruction
On 2/4/21 9:42 PM, Naveen N. Rao wrote: On 2021/02/04 06:38PM, Naveen N. Rao wrote: On 2021/02/04 04:17PM, Ravi Bangoria wrote: Don't allow Uprobe on 2nd word of a prefixed instruction. As per ISA 3.1, prefixed instruction should not cross 64-byte boundary. So don't allow Uprobe on such prefixed instruction as well. There are two ways probed instruction is changed in mapped pages. First, when Uprobe is activated, it searches for all the relevant pages and replace instruction in them. In this case, if we notice that probe is on the 2nd word of prefixed instruction, error out directly. Second, when Uprobe is already active and user maps a relevant page via mmap(), instruction is replaced via mmap() code path. But because Uprobe is invalid, entire mmap() operation can not be stopped. In this case just print an error and continue. Signed-off-by: Ravi Bangoria --- v1: http://lore.kernel.org/r/20210119091234.76317-1-ravi.bango...@linux.ibm.com v1->v2: - Instead of introducing new arch hook from verify_opcode(), use existing hook arch_uprobe_analyze_insn(). - Add explicit check for prefixed instruction crossing 64-byte boundary. If probe is on such instruction, throw an error. arch/powerpc/kernel/uprobes.c | 66 ++- 1 file changed, 65 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/kernel/uprobes.c b/arch/powerpc/kernel/uprobes.c index e8a63713e655..485d19a2a31f 100644 --- a/arch/powerpc/kernel/uprobes.c +++ b/arch/powerpc/kernel/uprobes.c @@ -7,6 +7,7 @@ * Adapted from the x86 port by Ananth N Mavinakayanahalli */ #include +#include #include #include #include @@ -28,6 +29,69 @@ bool is_trap_insn(uprobe_opcode_t *insn) return (is_trap(*insn)); } +#ifdef CONFIG_PPC64 +static int get_instr(struct mm_struct *mm, unsigned long addr, u32 *instr) +{ + struct page *page; + struct vm_area_struct *vma; + void *kaddr; + unsigned int gup_flags = FOLL_FORCE | FOLL_SPLIT_PMD; + + if (get_user_pages_remote(mm, addr, 1, gup_flags, , , NULL) <= 0) + return -EINVAL; + + kaddr = kmap_atomic(page); + *instr = *((u32 *)(kaddr + (addr & ~PAGE_MASK))); + kunmap_atomic(kaddr); + put_page(page); + return 0; +} + +static int validate_prefixed_instr(struct mm_struct *mm, unsigned long addr) +{ + struct ppc_inst inst; + u32 prefix, suffix; + + /* +* No need to check if addr is pointing to beginning of the +* page. Even if probe is on a suffix of page-unaligned +* prefixed instruction, hw will raise exception and kernel +* will send SIGBUS. +*/ + if (!(addr & ~PAGE_MASK)) + return 0; + + if (get_instr(mm, addr, ) < 0) + return -EINVAL; + if (get_instr(mm, addr + 4, ) < 0) + return -EINVAL; + + inst = ppc_inst_prefix(prefix, suffix); + if (ppc_inst_prefixed(inst) && (addr & 0x3F) == 0x3C) { + printk_ratelimited("Cannot register a uprobe on 64 byte " ^^ pr_info_ratelimited() It should be sufficient to check the primary opcode to determine if it is a prefixed instruction. You don't have to read the suffix. I see that we don't have a helper to do this currently, so you could do: if (ppc_inst_primary_opcode(ppc_inst(prefix)) == 1) Seeing the kprobes code, I realized that we have to check for another scenario (Thanks, Jordan!). If this is the suffix of a prefix instruction for which a uprobe has already been installed, then the previous word will be a 'trap' instruction. You need to check if there is a uprobe at the previous word, and if the original instruction there was a prefix instruction. Yes, this patch will fail to detect such scenario. I think I should read the instruction directly from file, like what copy_insn() does. With that, I'll get original instruction rather that 'trap'. I'll think more along this line. Ravi
Re: [PATCH v2] powerpc/uprobes: Validation for prefixed instruction
On 2/4/21 6:45 PM, Naveen N. Rao wrote: On 2021/02/04 04:19PM, Ravi Bangoria wrote: On 2/4/21 4:17 PM, Ravi Bangoria wrote: Don't allow Uprobe on 2nd word of a prefixed instruction. As per ISA 3.1, prefixed instruction should not cross 64-byte boundary. So don't allow Uprobe on such prefixed instruction as well. There are two ways probed instruction is changed in mapped pages. First, when Uprobe is activated, it searches for all the relevant pages and replace instruction in them. In this case, if we notice that probe is on the 2nd word of prefixed instruction, error out directly. Second, when Uprobe is already active and user maps a relevant page via mmap(), instruction is replaced via mmap() code path. But because Uprobe is invalid, entire mmap() operation can not be stopped. In this case just print an error and continue. @mpe, arch_uprobe_analyze_insn() can return early if cpu_has_feature(CPU_FTR_ARCH_31) is not set. But that will miss out a rare scenario of user running binary with prefixed instruction on p10 predecessors. Please let me know if I should add cpu_has_feature(CPU_FTR_ARCH_31) or not. The check you are adding is very specific to prefixed instructions, so it makes sense to add a cpu feature check for v3.1. On older processors, those are invalid instructions like any other. The instruction emulation infrastructure will refuse to emulate it and the instruction will be single stepped. Sure will add it. Ravi
Re: [PATCH v2] powerpc/uprobes: Validation for prefixed instruction
On 2/6/21 11:36 PM, Oleg Nesterov wrote: On 02/04, Ravi Bangoria wrote: +static int get_instr(struct mm_struct *mm, unsigned long addr, u32 *instr) +{ + struct page *page; + struct vm_area_struct *vma; + void *kaddr; + unsigned int gup_flags = FOLL_FORCE | FOLL_SPLIT_PMD; + + if (get_user_pages_remote(mm, addr, 1, gup_flags, , , NULL) <= 0) + return -EINVAL; "vma" is not used, Ok. and I don't think you need FOLL_SPLIT_PMD. Isn't it needed if the target page is hugepage? Otherwise I can't really comment this ppc-specific change. To be honest, I don't even understand why do we need this fix. Sure, the breakpoint in the middle of 64-bit insn won't work, why do we care? The user should know what does he do. That's a valid point. This patch is to protract user from doing invalid thing. Though, there is one minor scenario where this patch will help. If the original prefixed instruction is 64 byte unaligned, and say user probes it, Uprobe infrastructure will emulate such instruction transparently without notifying user that the instruction is improperly aligned. Not to mention we can't really trust get_user_pages() in that this page can be modified by mm owner or debugger... As Naveen pointed out, there might be existing uprobe on the prefix and this patch will fail to detect such scenario. So I'm thinking to read the instruction directly from file backed page (like copy_insn), in which case I won't use get_user_pages(). Thanks Oleg for the review! Ravi
Re: [PATCH v2] powerpc/uprobes: Validation for prefixed instruction
On 2/4/21 4:17 PM, Ravi Bangoria wrote: Don't allow Uprobe on 2nd word of a prefixed instruction. As per ISA 3.1, prefixed instruction should not cross 64-byte boundary. So don't allow Uprobe on such prefixed instruction as well. There are two ways probed instruction is changed in mapped pages. First, when Uprobe is activated, it searches for all the relevant pages and replace instruction in them. In this case, if we notice that probe is on the 2nd word of prefixed instruction, error out directly. Second, when Uprobe is already active and user maps a relevant page via mmap(), instruction is replaced via mmap() code path. But because Uprobe is invalid, entire mmap() operation can not be stopped. In this case just print an error and continue. @mpe, arch_uprobe_analyze_insn() can return early if cpu_has_feature(CPU_FTR_ARCH_31) is not set. But that will miss out a rare scenario of user running binary with prefixed instruction on p10 predecessors. Please let me know if I should add cpu_has_feature(CPU_FTR_ARCH_31) or not. - Ravi
[PATCH v2] powerpc/uprobes: Validation for prefixed instruction
Don't allow Uprobe on 2nd word of a prefixed instruction. As per ISA 3.1, prefixed instruction should not cross 64-byte boundary. So don't allow Uprobe on such prefixed instruction as well. There are two ways probed instruction is changed in mapped pages. First, when Uprobe is activated, it searches for all the relevant pages and replace instruction in them. In this case, if we notice that probe is on the 2nd word of prefixed instruction, error out directly. Second, when Uprobe is already active and user maps a relevant page via mmap(), instruction is replaced via mmap() code path. But because Uprobe is invalid, entire mmap() operation can not be stopped. In this case just print an error and continue. Signed-off-by: Ravi Bangoria --- v1: http://lore.kernel.org/r/20210119091234.76317-1-ravi.bango...@linux.ibm.com v1->v2: - Instead of introducing new arch hook from verify_opcode(), use existing hook arch_uprobe_analyze_insn(). - Add explicit check for prefixed instruction crossing 64-byte boundary. If probe is on such instruction, throw an error. arch/powerpc/kernel/uprobes.c | 66 ++- 1 file changed, 65 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/kernel/uprobes.c b/arch/powerpc/kernel/uprobes.c index e8a63713e655..485d19a2a31f 100644 --- a/arch/powerpc/kernel/uprobes.c +++ b/arch/powerpc/kernel/uprobes.c @@ -7,6 +7,7 @@ * Adapted from the x86 port by Ananth N Mavinakayanahalli */ #include +#include #include #include #include @@ -28,6 +29,69 @@ bool is_trap_insn(uprobe_opcode_t *insn) return (is_trap(*insn)); } +#ifdef CONFIG_PPC64 +static int get_instr(struct mm_struct *mm, unsigned long addr, u32 *instr) +{ + struct page *page; + struct vm_area_struct *vma; + void *kaddr; + unsigned int gup_flags = FOLL_FORCE | FOLL_SPLIT_PMD; + + if (get_user_pages_remote(mm, addr, 1, gup_flags, , , NULL) <= 0) + return -EINVAL; + + kaddr = kmap_atomic(page); + *instr = *((u32 *)(kaddr + (addr & ~PAGE_MASK))); + kunmap_atomic(kaddr); + put_page(page); + return 0; +} + +static int validate_prefixed_instr(struct mm_struct *mm, unsigned long addr) +{ + struct ppc_inst inst; + u32 prefix, suffix; + + /* +* No need to check if addr is pointing to beginning of the +* page. Even if probe is on a suffix of page-unaligned +* prefixed instruction, hw will raise exception and kernel +* will send SIGBUS. +*/ + if (!(addr & ~PAGE_MASK)) + return 0; + + if (get_instr(mm, addr, ) < 0) + return -EINVAL; + if (get_instr(mm, addr + 4, ) < 0) + return -EINVAL; + + inst = ppc_inst_prefix(prefix, suffix); + if (ppc_inst_prefixed(inst) && (addr & 0x3F) == 0x3C) { + printk_ratelimited("Cannot register a uprobe on 64 byte " + "unaligned prefixed instruction\n"); + return -EINVAL; + } + + suffix = prefix; + if (get_instr(mm, addr - 4, ) < 0) + return -EINVAL; + + inst = ppc_inst_prefix(prefix, suffix); + if (ppc_inst_prefixed(inst)) { + printk_ratelimited("Cannot register a uprobe on the second " + "word of prefixed instruction\n"); + return -EINVAL; + } + return 0; +} +#else +static int validate_prefixed_instr(struct mm_struct *mm, unsigned long addr) +{ + return 0; +} +#endif + /** * arch_uprobe_analyze_insn * @mm: the probed address space. @@ -41,7 +105,7 @@ int arch_uprobe_analyze_insn(struct arch_uprobe *auprobe, if (addr & 0x03) return -EINVAL; - return 0; + return validate_prefixed_instr(mm, addr); } /* -- 2.26.2
Re: [PATCH] powerpc/sstep: Fix array out of bound warning
On 1/28/21 10:50 PM, Naveen N. Rao wrote: On 2021/01/15 11:46AM, Ravi Bangoria wrote: Compiling kernel with -Warray-bounds throws below warning: In function 'emulate_vsx_store': warning: array subscript is above array bounds [-Warray-bounds] buf.d[2] = byterev_8(reg->d[1]); ~^~~ buf.d[3] = byterev_8(reg->d[0]); ~^~~ Fix it by converting local variable 'union vsx_reg buf' into an array. Also consider function argument 'union vsx_reg *reg' as array instead of pointer because callers are actually passing an array to it. I think you should change the function prototype to reflect this. However, while I agree with this change in principle, it looks to be a lot of code churn for a fairly narrow use. Perhaps we should just address the specific bug. Something like the below (not tested)? Yes, this would serve the same purpose and it's more compact as well. Sent v2. @@ -818,13 +818,15 @@ void emulate_vsx_store(struct instruction_op *op, const union vsx_reg *reg, break; if (rev) { /* reverse 32 bytes */ - buf.d[0] = byterev_8(reg->d[3]); - buf.d[1] = byterev_8(reg->d[2]); - buf.d[2] = byterev_8(reg->d[1]); - buf.d[3] = byterev_8(reg->d[0]); - reg = + union vsx_reg buf32[2]; + buf32[0].d[0] = byterev_8(reg[1].d[1]); + buf32[0].d[1] = byterev_8(reg[1].d[0]); + buf32[1].d[0] = byterev_8(reg[0].d[1]); + buf32[1].d[1] = byterev_8(reg[0].d[0]); + memcpy(mem, buf32, size); + } else { + memcpy(mem, reg, size); } - memcpy(mem, reg, size); break; case 16: /* stxv, stxvx, stxvl, stxvll */ - Naveen
[PATCH v2] powerpc/sstep: Fix array out of bound warning
Compiling kernel with -Warray-bounds throws below warning: In function 'emulate_vsx_store': warning: array subscript is above array bounds [-Warray-bounds] buf.d[2] = byterev_8(reg->d[1]); ~^~~ buf.d[3] = byterev_8(reg->d[0]); ~^~~ Fix it by using temporary array variable 'union vsx_reg buf32[]' in that code block. Also, with element_size = 32, 'union vsx_reg *reg' is an array of size 2. So, use 'reg' as an array instead of pointer in the same code block. Fixes: af99da74333b ("powerpc/sstep: Support VSX vector paired storage access instructions") Suggested-by: Naveen N. Rao Signed-off-by: Ravi Bangoria --- v1: http://lore.kernel.org/r/20210115061620.692500-1-ravi.bango...@linux.ibm.com v1->v2: - Change code only in the affected block arch/powerpc/lib/sstep.c | 14 -- 1 file changed, 8 insertions(+), 6 deletions(-) diff --git a/arch/powerpc/lib/sstep.c b/arch/powerpc/lib/sstep.c index bf7a7d62ae8b..ede093e96234 100644 --- a/arch/powerpc/lib/sstep.c +++ b/arch/powerpc/lib/sstep.c @@ -818,13 +818,15 @@ void emulate_vsx_store(struct instruction_op *op, const union vsx_reg *reg, break; if (rev) { /* reverse 32 bytes */ - buf.d[0] = byterev_8(reg->d[3]); - buf.d[1] = byterev_8(reg->d[2]); - buf.d[2] = byterev_8(reg->d[1]); - buf.d[3] = byterev_8(reg->d[0]); - reg = + union vsx_reg buf32[2]; + buf32[0].d[0] = byterev_8(reg[1].d[1]); + buf32[0].d[1] = byterev_8(reg[1].d[0]); + buf32[1].d[0] = byterev_8(reg[0].d[1]); + buf32[1].d[1] = byterev_8(reg[0].d[0]); + memcpy(mem, buf32, size); + } else { + memcpy(mem, reg, size); } - memcpy(mem, reg, size); break; case 16: /* stxv, stxvx, stxvl, stxvll */ -- 2.26.2
Re: [PATCH] powerpc/uprobes: Don't allow probe on suffix of prefixed instruction
On 1/19/21 10:56 PM, Oleg Nesterov wrote: On 01/19, Ravi Bangoria wrote: Probe on 2nd word of a prefixed instruction is invalid scenario and should be restricted. I don't understand this ppc-specific problem, but... So far (upto Power9), instruction size was fixed - 4 bytes. But Power10 introduced a prefixed instruction which consist of 8 bytes, where first 4 bytes is prefix and remaining is suffix. This patch checks whether the Uprobe is on the 2nd word (suffix) of a prefixed instruction. If so, consider it as invalid Uprobe. +#ifdef CONFIG_PPC64 +int arch_uprobe_verify_opcode(struct page *page, unsigned long vaddr, + uprobe_opcode_t opcode) +{ + uprobe_opcode_t prefix; + void *kaddr; + struct ppc_inst inst; + + /* Don't check if vaddr is pointing to the beginning of page */ + if (!(vaddr & ~PAGE_MASK)) + return 0; So the fix is incomplete? Or insn at the start of page can't be prefixed? Prefixed instruction can not cross 64 byte boundary. If it does, kernel generates SIGBUS. Considering all powerpc supported page sizes to be multiple of 64 bytes, there will never be a scenario where prefix and suffix will be on different pages. i.e. a beginning of the page should never be a suffix. +int __weak arch_uprobe_verify_opcode(struct page *page, unsigned long vaddr, +uprobe_opcode_t opcode) +{ + return 0; +} + static int verify_opcode(struct page *page, unsigned long vaddr, uprobe_opcode_t *new_opcode) { uprobe_opcode_t old_opcode; @@ -275,6 +281,8 @@ static int verify_opcode(struct page *page, unsigned long vaddr, uprobe_opcode_t if (is_swbp_insn(new_opcode)) { if (is_swbp)/* register: already installed? */ return 0; + if (arch_uprobe_verify_opcode(page, vaddr, old_opcode)) + return -EINVAL; Well, this doesn't look good... To me it would be better to change the prepare_uprobe() path to copy the potential prefix into uprobe->arch and check ppc_inst_prefixed() in arch_uprobe_analyze_insn(). What do you think? Agreed. The only reason I was checking via verify_opcode() is to make the code more simpler. If I need to check via prepare_uprobe(), I'll need to abuse uprobe->offset by setting it to uprobe->offset - 4 to read previous 4 bytes of current instruction. Which, IMHO, is not that straightforward with current implementation of prepare_uprobe(). But while replying here, I'm thinking... I should be able to grab a page using mm and vaddr, which are already available in arch_uprobe_analyze_insn(). With that, I should be able to do all this inside arch_uprobe_analyze_insn() only. I'll try this and send v2 if that works. Thanks for the review. Ravi
[PATCH] powerpc/uprobes: Don't allow probe on suffix of prefixed instruction
Probe on 2nd word of a prefixed instruction is invalid scenario and should be restricted. There are two ways probed instruction is changed in mapped pages. First, when Uprobe is activated, it searches for all the relevant pages and replace instruction in them. In this case, if we notice that probe is on the 2nd word of prefixed instruction, error out directly. Second, when Uprobe is already active and user maps a relevant page via mmap(), instruction is replaced via mmap() code path. But because Uprobe is invalid, entire mmap() operation can not be stopped. In this case just print an error and continue. Signed-off-by: Ravi Bangoria --- arch/powerpc/kernel/uprobes.c | 28 include/linux/uprobes.h | 1 + kernel/events/uprobes.c | 8 3 files changed, 37 insertions(+) diff --git a/arch/powerpc/kernel/uprobes.c b/arch/powerpc/kernel/uprobes.c index e8a63713e655..c73d5a397164 100644 --- a/arch/powerpc/kernel/uprobes.c +++ b/arch/powerpc/kernel/uprobes.c @@ -7,6 +7,7 @@ * Adapted from the x86 port by Ananth N Mavinakayanahalli */ #include +#include #include #include #include @@ -44,6 +45,33 @@ int arch_uprobe_analyze_insn(struct arch_uprobe *auprobe, return 0; } +#ifdef CONFIG_PPC64 +int arch_uprobe_verify_opcode(struct page *page, unsigned long vaddr, + uprobe_opcode_t opcode) +{ + uprobe_opcode_t prefix; + void *kaddr; + struct ppc_inst inst; + + /* Don't check if vaddr is pointing to the beginning of page */ + if (!(vaddr & ~PAGE_MASK)) + return 0; + + kaddr = kmap_atomic(page); + memcpy(, kaddr + ((vaddr - 4) & ~PAGE_MASK), UPROBE_SWBP_INSN_SIZE); + kunmap_atomic(kaddr); + + inst = ppc_inst_prefix(prefix, opcode); + + if (ppc_inst_prefixed(inst)) { + printk_ratelimited("Cannot register a uprobe on the second " + "word of prefixed instruction\n"); + return -1; + } + return 0; +} +#endif + /* * arch_uprobe_pre_xol - prepare to execute out of line. * @auprobe: the probepoint information. diff --git a/include/linux/uprobes.h b/include/linux/uprobes.h index f46e0ca0169c..5a3b45878e13 100644 --- a/include/linux/uprobes.h +++ b/include/linux/uprobes.h @@ -128,6 +128,7 @@ extern bool uprobe_deny_signal(void); extern bool arch_uprobe_skip_sstep(struct arch_uprobe *aup, struct pt_regs *regs); extern void uprobe_clear_state(struct mm_struct *mm); extern int arch_uprobe_analyze_insn(struct arch_uprobe *aup, struct mm_struct *mm, unsigned long addr); +int arch_uprobe_verify_opcode(struct page *page, unsigned long vaddr, uprobe_opcode_t opcode); extern int arch_uprobe_pre_xol(struct arch_uprobe *aup, struct pt_regs *regs); extern int arch_uprobe_post_xol(struct arch_uprobe *aup, struct pt_regs *regs); extern bool arch_uprobe_xol_was_trapped(struct task_struct *tsk); diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c index bf9edd8d75be..be02e6c26e3f 100644 --- a/kernel/events/uprobes.c +++ b/kernel/events/uprobes.c @@ -255,6 +255,12 @@ static void copy_to_page(struct page *page, unsigned long vaddr, const void *src kunmap_atomic(kaddr); } +int __weak arch_uprobe_verify_opcode(struct page *page, unsigned long vaddr, +uprobe_opcode_t opcode) +{ + return 0; +} + static int verify_opcode(struct page *page, unsigned long vaddr, uprobe_opcode_t *new_opcode) { uprobe_opcode_t old_opcode; @@ -275,6 +281,8 @@ static int verify_opcode(struct page *page, unsigned long vaddr, uprobe_opcode_t if (is_swbp_insn(new_opcode)) { if (is_swbp)/* register: already installed? */ return 0; + if (arch_uprobe_verify_opcode(page, vaddr, old_opcode)) + return -EINVAL; } else { if (!is_swbp) /* unregister: was it changed by us? */ return 0; -- 2.26.2
[PATCH] powerpc/sstep: Fix array out of bound warning
Compiling kernel with -Warray-bounds throws below warning: In function 'emulate_vsx_store': warning: array subscript is above array bounds [-Warray-bounds] buf.d[2] = byterev_8(reg->d[1]); ~^~~ buf.d[3] = byterev_8(reg->d[0]); ~^~~ Fix it by converting local variable 'union vsx_reg buf' into an array. Also consider function argument 'union vsx_reg *reg' as array instead of pointer because callers are actually passing an array to it. Fixes: af99da74333b ("powerpc/sstep: Support VSX vector paired storage access instructions") Signed-off-by: Ravi Bangoria --- arch/powerpc/lib/sstep.c | 61 1 file changed, 31 insertions(+), 30 deletions(-) diff --git a/arch/powerpc/lib/sstep.c b/arch/powerpc/lib/sstep.c index bf7a7d62ae8b..5b4281ade5b6 100644 --- a/arch/powerpc/lib/sstep.c +++ b/arch/powerpc/lib/sstep.c @@ -723,7 +723,8 @@ void emulate_vsx_load(struct instruction_op *op, union vsx_reg *reg, const unsigned char *bp; size = GETSIZE(op->type); - reg->d[0] = reg->d[1] = 0; + reg[0].d[0] = reg[0].d[1] = 0; + reg[1].d[0] = reg[1].d[1] = 0; switch (op->element_size) { case 32: @@ -742,25 +743,25 @@ void emulate_vsx_load(struct instruction_op *op, union vsx_reg *reg, /* scalar loads, lxvd2x, lxvdsx */ read_size = (size >= 8) ? 8 : size; i = IS_LE ? 8 : 8 - read_size; - memcpy(>b[i], mem, read_size); + memcpy([0].b[i], mem, read_size); if (rev) - do_byte_reverse(>b[i], 8); + do_byte_reverse([0].b[i], 8); if (size < 8) { if (op->type & SIGNEXT) { /* size == 4 is the only case here */ - reg->d[IS_LE] = (signed int) reg->d[IS_LE]; + reg[0].d[IS_LE] = (signed int)reg[0].d[IS_LE]; } else if (op->vsx_flags & VSX_FPCONV) { preempt_disable(); - conv_sp_to_dp(>fp[1 + IS_LE], - >dp[IS_LE]); + conv_sp_to_dp([0].fp[1 + IS_LE], + [0].dp[IS_LE]); preempt_enable(); } } else { if (size == 16) { unsigned long v = *(unsigned long *)(mem + 8); - reg->d[IS_BE] = !rev ? v : byterev_8(v); + reg[0].d[IS_BE] = !rev ? v : byterev_8(v); } else if (op->vsx_flags & VSX_SPLAT) - reg->d[IS_BE] = reg->d[IS_LE]; + reg[0].d[IS_BE] = reg[0].d[IS_LE]; } break; case 4: @@ -768,13 +769,13 @@ void emulate_vsx_load(struct instruction_op *op, union vsx_reg *reg, wp = mem; for (j = 0; j < size / 4; ++j) { i = IS_LE ? 3 - j : j; - reg->w[i] = !rev ? *wp++ : byterev_4(*wp++); + reg[0].w[i] = !rev ? *wp++ : byterev_4(*wp++); } if (op->vsx_flags & VSX_SPLAT) { - u32 val = reg->w[IS_LE ? 3 : 0]; + u32 val = reg[0].w[IS_LE ? 3 : 0]; for (; j < 4; ++j) { i = IS_LE ? 3 - j : j; - reg->w[i] = val; + reg[0].w[i] = val; } } break; @@ -783,7 +784,7 @@ void emulate_vsx_load(struct instruction_op *op, union vsx_reg *reg, hp = mem; for (j = 0; j < size / 2; ++j) { i = IS_LE ? 7 - j : j; - reg->h[i] = !rev ? *hp++ : byterev_2(*hp++); + reg[0].h[i] = !rev ? *hp++ : byterev_2(*hp++); } break; case 1: @@ -791,7 +792,7 @@ void emulate_vsx_load(struct instruction_op *op, union vsx_reg *reg, bp = mem; for (j = 0; j < size; ++j) { i = IS_LE ? 15 - j : j; - reg->b[i] = *bp++; + reg[0].b[i] = *bp++; } break; } @@ -804,7 +805,7 @@ void emulate_vsx_store(struct instruction_op *op, const union vsx_reg *reg, { int size, write_size; int i, j; - union vsx_reg buf; + union vsx_reg buf[2]; unsigned int *wp; unsigned short *hp; unsigned char *bp; @@ -818,11 +819,11 @@ void emulate_vs
[PATCH v3 1/4] KVM: PPC: Allow nested guest creation when L0 hv_guest_state > L1
On powerpc, L1 hypervisor takes help of L0 using H_ENTER_NESTED hcall to load L2 guest state in cpu. L1 hypervisor prepares the L2 state in struct hv_guest_state and passes a pointer to it via hcall. Using that pointer, L0 reads/writes that state directly from/to L1 memory. Thus L0 must be aware of hv_guest_state layout of L1. Currently it uses version field to achieve this. i.e. If L0 hv_guest_state.version != L1 hv_guest_state.version, L0 won't allow nested kvm guest. This restriction can be loosen up a bit. L0 can be taught to understand older layout of hv_guest_state, if we restrict the new member to be added only at the end. i.e. we can allow nested guest even when L0 hv_guest_state.version > L1 hv_guest_state.version. Though, the other way around is not possible. Signed-off-by: Ravi Bangoria Reviewed-by: Fabiano Rosas --- arch/powerpc/include/asm/hvcall.h | 17 +++-- arch/powerpc/kvm/book3s_hv_nested.c | 55 +++-- 2 files changed, 60 insertions(+), 12 deletions(-) diff --git a/arch/powerpc/include/asm/hvcall.h b/arch/powerpc/include/asm/hvcall.h index c1fbccb04390..ca6840239f90 100644 --- a/arch/powerpc/include/asm/hvcall.h +++ b/arch/powerpc/include/asm/hvcall.h @@ -526,9 +526,12 @@ struct h_cpu_char_result { u64 behaviour; }; -/* Register state for entering a nested guest with H_ENTER_NESTED */ +/* + * Register state for entering a nested guest with H_ENTER_NESTED. + * New member must be added at the end. + */ struct hv_guest_state { - u64 version;/* version of this structure layout */ + u64 version;/* version of this structure layout, must be first */ u32 lpid; u32 vcpu_token; /* These registers are hypervisor privileged (at least for writing) */ @@ -562,6 +565,16 @@ struct hv_guest_state { /* Latest version of hv_guest_state structure */ #define HV_GUEST_STATE_VERSION 1 +static inline int hv_guest_state_size(unsigned int version) +{ + switch (version) { + case 1: + return offsetofend(struct hv_guest_state, ppr); + default: + return -1; + } +} + /* * From the document "H_GetPerformanceCounterInfo Interface" v1.07 * diff --git a/arch/powerpc/kvm/book3s_hv_nested.c b/arch/powerpc/kvm/book3s_hv_nested.c index 33b58549a9aa..937dd5114300 100644 --- a/arch/powerpc/kvm/book3s_hv_nested.c +++ b/arch/powerpc/kvm/book3s_hv_nested.c @@ -215,12 +215,51 @@ static void kvmhv_nested_mmio_needed(struct kvm_vcpu *vcpu, u64 regs_ptr) } } +static int kvmhv_read_guest_state_and_regs(struct kvm_vcpu *vcpu, + struct hv_guest_state *l2_hv, + struct pt_regs *l2_regs, + u64 hv_ptr, u64 regs_ptr) +{ + int size; + + if (kvm_vcpu_read_guest(vcpu, hv_ptr, _hv->version, + sizeof(l2_hv->version))) + return -1; + + if (kvmppc_need_byteswap(vcpu)) + l2_hv->version = swab64(l2_hv->version); + + size = hv_guest_state_size(l2_hv->version); + if (size < 0) + return -1; + + return kvm_vcpu_read_guest(vcpu, hv_ptr, l2_hv, size) || + kvm_vcpu_read_guest(vcpu, regs_ptr, l2_regs, + sizeof(struct pt_regs)); +} + +static int kvmhv_write_guest_state_and_regs(struct kvm_vcpu *vcpu, + struct hv_guest_state *l2_hv, + struct pt_regs *l2_regs, + u64 hv_ptr, u64 regs_ptr) +{ + int size; + + size = hv_guest_state_size(l2_hv->version); + if (size < 0) + return -1; + + return kvm_vcpu_write_guest(vcpu, hv_ptr, l2_hv, size) || + kvm_vcpu_write_guest(vcpu, regs_ptr, l2_regs, +sizeof(struct pt_regs)); +} + long kvmhv_enter_nested_guest(struct kvm_vcpu *vcpu) { long int err, r; struct kvm_nested_guest *l2; struct pt_regs l2_regs, saved_l1_regs; - struct hv_guest_state l2_hv, saved_l1_hv; + struct hv_guest_state l2_hv = {0}, saved_l1_hv; struct kvmppc_vcore *vc = vcpu->arch.vcore; u64 hv_ptr, regs_ptr; u64 hdec_exp; @@ -235,17 +274,15 @@ long kvmhv_enter_nested_guest(struct kvm_vcpu *vcpu) hv_ptr = kvmppc_get_gpr(vcpu, 4); regs_ptr = kvmppc_get_gpr(vcpu, 5); vcpu->srcu_idx = srcu_read_lock(>kvm->srcu); - err = kvm_vcpu_read_guest(vcpu, hv_ptr, _hv, - sizeof(struct hv_guest_state)) || - kvm_vcpu_read_guest(vcpu, regs_ptr, _regs, - sizeof(struct pt_regs)); + err = kvmhv_read_guest_state_and_regs(vcpu, _hv, _regs, + hv_ptr, reg
[PATCH v3 4/4] KVM: PPC: Introduce new capability for 2nd DAWR
Introduce KVM_CAP_PPC_DAWR1 which can be used by Qemu to query whether kvm supports 2nd DAWR or not. The capability is by default disabled even when the underlying CPU supports 2nd DAWR. Qemu needs to check and enable it manually to use the feature. Signed-off-by: Ravi Bangoria --- Documentation/virt/kvm/api.rst | 10 ++ arch/powerpc/include/asm/kvm_ppc.h | 1 + arch/powerpc/kvm/book3s_hv.c | 12 arch/powerpc/kvm/powerpc.c | 10 ++ include/uapi/linux/kvm.h | 1 + tools/include/uapi/linux/kvm.h | 1 + 6 files changed, 35 insertions(+) diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst index abb24575bdf9..049f07ebf197 100644 --- a/Documentation/virt/kvm/api.rst +++ b/Documentation/virt/kvm/api.rst @@ -6016,6 +6016,16 @@ KVM_EXIT_X86_RDMSR and KVM_EXIT_X86_WRMSR exit notifications which user space can then handle to implement model specific MSR handling and/or user notifications to inform a user that an MSR was not handled. +7.22 KVM_CAP_PPC_DAWR1 +-- + +:Architectures: ppc +:Parameters: none +:Returns: 0 on success, -EINVAL when CPU doesn't support 2nd DAWR + +This capability can be used to check / enable 2nd DAWR feature provided +by POWER10 processor. + 8. Other capabilities. == diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h index 0a056c64c317..13c39d24dda5 100644 --- a/arch/powerpc/include/asm/kvm_ppc.h +++ b/arch/powerpc/include/asm/kvm_ppc.h @@ -314,6 +314,7 @@ struct kvmppc_ops { int size); int (*enable_svm)(struct kvm *kvm); int (*svm_off)(struct kvm *kvm); + int (*enable_dawr1)(struct kvm *kvm); }; extern struct kvmppc_ops *kvmppc_hv_ops; diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c index b7a30c0692a7..04c02344bd3f 100644 --- a/arch/powerpc/kvm/book3s_hv.c +++ b/arch/powerpc/kvm/book3s_hv.c @@ -5625,6 +5625,17 @@ static int kvmhv_svm_off(struct kvm *kvm) return ret; } +static int kvmhv_enable_dawr1(struct kvm *kvm) +{ + if (!cpu_has_feature(CPU_FTR_DAWR1)) + return -ENODEV; + + /* kvm == NULL means the caller is testing if the capability exists */ + if (kvm) + kvm->arch.dawr1_enabled = true; + return 0; +} + static struct kvmppc_ops kvm_ops_hv = { .get_sregs = kvm_arch_vcpu_ioctl_get_sregs_hv, .set_sregs = kvm_arch_vcpu_ioctl_set_sregs_hv, @@ -5668,6 +5679,7 @@ static struct kvmppc_ops kvm_ops_hv = { .store_to_eaddr = kvmhv_store_to_eaddr, .enable_svm = kvmhv_enable_svm, .svm_off = kvmhv_svm_off, + .enable_dawr1 = kvmhv_enable_dawr1, }; static int kvm_init_subcore_bitmap(void) diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c index 13999123b735..380656528b5b 100644 --- a/arch/powerpc/kvm/powerpc.c +++ b/arch/powerpc/kvm/powerpc.c @@ -678,6 +678,10 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext) r = hv_enabled && kvmppc_hv_ops->enable_svm && !kvmppc_hv_ops->enable_svm(NULL); break; + case KVM_CAP_PPC_DAWR1: + r = !!(hv_enabled && kvmppc_hv_ops->enable_dawr1 && + !kvmppc_hv_ops->enable_dawr1(NULL)); + break; #endif default: r = 0; @@ -2187,6 +2191,12 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm, break; r = kvm->arch.kvm_ops->enable_svm(kvm); break; + case KVM_CAP_PPC_DAWR1: + r = -EINVAL; + if (!is_kvmppc_hv_enabled(kvm) || !kvm->arch.kvm_ops->enable_dawr1) + break; + r = kvm->arch.kvm_ops->enable_dawr1(kvm); + break; #endif default: r = -EINVAL; diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h index ca41220b40b8..f1210f99a52d 100644 --- a/include/uapi/linux/kvm.h +++ b/include/uapi/linux/kvm.h @@ -1053,6 +1053,7 @@ struct kvm_ppc_resize_hpt { #define KVM_CAP_X86_USER_SPACE_MSR 188 #define KVM_CAP_X86_MSR_FILTER 189 #define KVM_CAP_ENFORCE_PV_FEATURE_CPUID 190 +#define KVM_CAP_PPC_DAWR1 191 #ifdef KVM_CAP_IRQ_ROUTING diff --git a/tools/include/uapi/linux/kvm.h b/tools/include/uapi/linux/kvm.h index ca41220b40b8..f1210f99a52d 100644 --- a/tools/include/uapi/linux/kvm.h +++ b/tools/include/uapi/linux/kvm.h @@ -1053,6 +1053,7 @@ struct kvm_ppc_resize_hpt { #define KVM_CAP_X86_USER_SPACE_MSR 188 #define KVM_CAP_X86_MSR_FILTER 189 #define KVM_CAP_ENFORCE_PV_FEATURE_CPUID 190 +#define KVM_CAP_PPC_DAWR1 191 #ifdef KVM_CAP_IRQ_ROUTING -- 2.26.2
[PATCH v3 0/4] KVM: PPC: Power10 2nd DAWR enablement
Enable p10 2nd DAWR feature for Book3S kvm guest. DAWR is a hypervisor resource and thus H_SET_MODE hcall is used to set/unset it. A new case H_SET_MODE_RESOURCE_SET_DAWR1 is introduced in H_SET_MODE hcall for setting/unsetting 2nd DAWR. Also, new capability KVM_CAP_PPC_DAWR1 has been added to query 2nd DAWR support via kvm ioctl. This feature also needs to be enabled in Qemu to really use it. I'll post Qemu patches once kvm patches get accepted. v2: https://lore.kernel.org/kvm/20201124105953.39325-1-ravi.bango...@linux.ibm.com v2->v3: - Patch #1. If L0 version > L1, L0 hv_guest_state will contain some additional fields which won't be filled while reading from L1 memory and thus they can contain garbage. Initialize l2_hv with 0s to avoid such situations. - Patch #3. Introduce per vm flag dawr1_enabled. - Patch #4. Instead of auto enabling KVM_CAP_PPC_DAWR1, let user check and enable it manually. Also move KVM_CAP_PPC_DAWR1 check / enable logic inside #if defined(CONFIG_KVM_BOOK3S_HV_POSSIBLE). - Explain KVM_CAP_PPC_DAWR1 in Documentation/virt/kvm/api.rst - Rebased on top of 5.10-rc3. v1->v2: - patch #1: New patch - patch #2: Don't rename KVM_REG_PPC_DAWR, it's an uapi macro - patch #3: Increment HV_GUEST_STATE_VERSION - Split kvm and selftests patches into different series - Patches rebased to paulus/kvm-ppc-next (cf59eb13e151) + few other watchpoint patches which are yet to be merged in paulus/kvm-ppc-next. Ravi Bangoria (4): KVM: PPC: Allow nested guest creation when L0 hv_guest_state > L1 KVM: PPC: Rename current DAWR macros and variables KVM: PPC: Add infrastructure to support 2nd DAWR KVM: PPC: Introduce new capability for 2nd DAWR Documentation/virt/kvm/api.rst| 12 arch/powerpc/include/asm/hvcall.h | 25 ++- arch/powerpc/include/asm/kvm_host.h | 7 +- arch/powerpc/include/asm/kvm_ppc.h| 1 + arch/powerpc/include/uapi/asm/kvm.h | 2 + arch/powerpc/kernel/asm-offsets.c | 6 +- arch/powerpc/kvm/book3s_hv.c | 79 +++ arch/powerpc/kvm/book3s_hv_nested.c | 70 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 43 +--- arch/powerpc/kvm/powerpc.c| 10 +++ include/uapi/linux/kvm.h | 1 + tools/arch/powerpc/include/uapi/asm/kvm.h | 2 + tools/include/uapi/linux/kvm.h| 1 + 13 files changed, 216 insertions(+), 43 deletions(-) -- 2.26.2
[PATCH v3 2/4] KVM: PPC: Rename current DAWR macros and variables
Power10 is introducing second DAWR. Use real register names (with suffix 0) from ISA for current macros and variables used by kvm. One exception is KVM_REG_PPC_DAWR. Keep it as it is because it's uapi so changing it will break userspace. Signed-off-by: Ravi Bangoria --- arch/powerpc/include/asm/kvm_host.h | 4 ++-- arch/powerpc/kernel/asm-offsets.c | 4 ++-- arch/powerpc/kvm/book3s_hv.c| 24 arch/powerpc/kvm/book3s_hv_nested.c | 8 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 20 ++-- 5 files changed, 30 insertions(+), 30 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h index d67a470e95a3..62cadf1a596e 100644 --- a/arch/powerpc/include/asm/kvm_host.h +++ b/arch/powerpc/include/asm/kvm_host.h @@ -584,8 +584,8 @@ struct kvm_vcpu_arch { u32 ctrl; u32 dabrx; ulong dabr; - ulong dawr; - ulong dawrx; + ulong dawr0; + ulong dawrx0; ulong ciabr; ulong cfar; ulong ppr; diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c index c2722ff36e98..5a77aac516ba 100644 --- a/arch/powerpc/kernel/asm-offsets.c +++ b/arch/powerpc/kernel/asm-offsets.c @@ -548,8 +548,8 @@ int main(void) OFFSET(VCPU_CTRL, kvm_vcpu, arch.ctrl); OFFSET(VCPU_DABR, kvm_vcpu, arch.dabr); OFFSET(VCPU_DABRX, kvm_vcpu, arch.dabrx); - OFFSET(VCPU_DAWR, kvm_vcpu, arch.dawr); - OFFSET(VCPU_DAWRX, kvm_vcpu, arch.dawrx); + OFFSET(VCPU_DAWR0, kvm_vcpu, arch.dawr0); + OFFSET(VCPU_DAWRX0, kvm_vcpu, arch.dawrx0); OFFSET(VCPU_CIABR, kvm_vcpu, arch.ciabr); OFFSET(VCPU_HFLAGS, kvm_vcpu, arch.hflags); OFFSET(VCPU_DEC, kvm_vcpu, arch.dec); diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c index e3b1839fc251..bcbad8daa974 100644 --- a/arch/powerpc/kvm/book3s_hv.c +++ b/arch/powerpc/kvm/book3s_hv.c @@ -782,8 +782,8 @@ static int kvmppc_h_set_mode(struct kvm_vcpu *vcpu, unsigned long mflags, return H_UNSUPPORTED_FLAG_START; if (value2 & DABRX_HYP) return H_P4; - vcpu->arch.dawr = value1; - vcpu->arch.dawrx = value2; + vcpu->arch.dawr0 = value1; + vcpu->arch.dawrx0 = value2; return H_SUCCESS; case H_SET_MODE_RESOURCE_ADDR_TRANS_MODE: /* KVM does not support mflags=2 (AIL=2) */ @@ -1747,10 +1747,10 @@ static int kvmppc_get_one_reg_hv(struct kvm_vcpu *vcpu, u64 id, *val = get_reg_val(id, vcpu->arch.vcore->vtb); break; case KVM_REG_PPC_DAWR: - *val = get_reg_val(id, vcpu->arch.dawr); + *val = get_reg_val(id, vcpu->arch.dawr0); break; case KVM_REG_PPC_DAWRX: - *val = get_reg_val(id, vcpu->arch.dawrx); + *val = get_reg_val(id, vcpu->arch.dawrx0); break; case KVM_REG_PPC_CIABR: *val = get_reg_val(id, vcpu->arch.ciabr); @@ -1979,10 +1979,10 @@ static int kvmppc_set_one_reg_hv(struct kvm_vcpu *vcpu, u64 id, vcpu->arch.vcore->vtb = set_reg_val(id, *val); break; case KVM_REG_PPC_DAWR: - vcpu->arch.dawr = set_reg_val(id, *val); + vcpu->arch.dawr0 = set_reg_val(id, *val); break; case KVM_REG_PPC_DAWRX: - vcpu->arch.dawrx = set_reg_val(id, *val) & ~DAWRX_HYP; + vcpu->arch.dawrx0 = set_reg_val(id, *val) & ~DAWRX_HYP; break; case KVM_REG_PPC_CIABR: vcpu->arch.ciabr = set_reg_val(id, *val); @@ -3437,8 +3437,8 @@ static int kvmhv_load_hv_regs_and_go(struct kvm_vcpu *vcpu, u64 time_limit, int trap; unsigned long host_hfscr = mfspr(SPRN_HFSCR); unsigned long host_ciabr = mfspr(SPRN_CIABR); - unsigned long host_dawr = mfspr(SPRN_DAWR0); - unsigned long host_dawrx = mfspr(SPRN_DAWRX0); + unsigned long host_dawr0 = mfspr(SPRN_DAWR0); + unsigned long host_dawrx0 = mfspr(SPRN_DAWRX0); unsigned long host_psscr = mfspr(SPRN_PSSCR); unsigned long host_pidr = mfspr(SPRN_PID); @@ -3477,8 +3477,8 @@ static int kvmhv_load_hv_regs_and_go(struct kvm_vcpu *vcpu, u64 time_limit, mtspr(SPRN_SPURR, vcpu->arch.spurr); if (dawr_enabled()) { - mtspr(SPRN_DAWR0, vcpu->arch.dawr); - mtspr(SPRN_DAWRX0, vcpu->arch.dawrx); + mtspr(SPRN_DAWR0, vcpu->arch.dawr0); + mtspr(SPRN_DAWRX0, vcpu->arch.dawrx0); } mtspr(SPRN_CIABR, vcpu->arch.ciabr); mtspr(SPRN_IC, vcpu->arch.ic); @@ -3530,8 +3530,8 @@ static int kvmhv_load_hv_regs_and_go(struct kvm_vcpu *vcpu, u64
[PATCH v3 3/4] KVM: PPC: Add infrastructure to support 2nd DAWR
kvm code assumes single DAWR everywhere. Add code to support 2nd DAWR. DAWR is a hypervisor resource and thus H_SET_MODE hcall is used to set/ unset it. Introduce new case H_SET_MODE_RESOURCE_SET_DAWR1 for 2nd DAWR. Also, kvm will support 2nd DAWR only if CPU_FTR_DAWR1 is set. Signed-off-by: Ravi Bangoria --- Documentation/virt/kvm/api.rst| 2 ++ arch/powerpc/include/asm/hvcall.h | 8 - arch/powerpc/include/asm/kvm_host.h | 3 ++ arch/powerpc/include/uapi/asm/kvm.h | 2 ++ arch/powerpc/kernel/asm-offsets.c | 2 ++ arch/powerpc/kvm/book3s_hv.c | 43 +++ arch/powerpc/kvm/book3s_hv_nested.c | 7 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 23 tools/arch/powerpc/include/uapi/asm/kvm.h | 2 ++ 9 files changed, 91 insertions(+), 1 deletion(-) diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst index 36d5f1f3c6dd..abb24575bdf9 100644 --- a/Documentation/virt/kvm/api.rst +++ b/Documentation/virt/kvm/api.rst @@ -2249,6 +2249,8 @@ registers, find a list below: PPC KVM_REG_PPC_PSSCR 64 PPC KVM_REG_PPC_DEC_EXPIRY 64 PPC KVM_REG_PPC_PTCR64 + PPC KVM_REG_PPC_DAWR1 64 + PPC KVM_REG_PPC_DAWRX1 64 PPC KVM_REG_PPC_TM_GPR0 64 ... PPC KVM_REG_PPC_TM_GPR3164 diff --git a/arch/powerpc/include/asm/hvcall.h b/arch/powerpc/include/asm/hvcall.h index ca6840239f90..98afa58b619a 100644 --- a/arch/powerpc/include/asm/hvcall.h +++ b/arch/powerpc/include/asm/hvcall.h @@ -560,16 +560,22 @@ struct hv_guest_state { u64 pidr; u64 cfar; u64 ppr; + /* Version 1 ends here */ + u64 dawr1; + u64 dawrx1; + /* Version 2 ends here */ }; /* Latest version of hv_guest_state structure */ -#define HV_GUEST_STATE_VERSION 1 +#define HV_GUEST_STATE_VERSION 2 static inline int hv_guest_state_size(unsigned int version) { switch (version) { case 1: return offsetofend(struct hv_guest_state, ppr); + case 2: + return offsetofend(struct hv_guest_state, dawrx1); default: return -1; } diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h index 62cadf1a596e..a93cfb672421 100644 --- a/arch/powerpc/include/asm/kvm_host.h +++ b/arch/powerpc/include/asm/kvm_host.h @@ -307,6 +307,7 @@ struct kvm_arch { u8 svm_enabled; bool threads_indep; bool nested_enable; + bool dawr1_enabled; pgd_t *pgtable; u64 process_table; struct dentry *debugfs_dir; @@ -586,6 +587,8 @@ struct kvm_vcpu_arch { ulong dabr; ulong dawr0; ulong dawrx0; + ulong dawr1; + ulong dawrx1; ulong ciabr; ulong cfar; ulong ppr; diff --git a/arch/powerpc/include/uapi/asm/kvm.h b/arch/powerpc/include/uapi/asm/kvm.h index c3af3f324c5a..9f18fa090f1f 100644 --- a/arch/powerpc/include/uapi/asm/kvm.h +++ b/arch/powerpc/include/uapi/asm/kvm.h @@ -644,6 +644,8 @@ struct kvm_ppc_cpu_char { #define KVM_REG_PPC_MMCR3 (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0xc1) #define KVM_REG_PPC_SIER2 (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0xc2) #define KVM_REG_PPC_SIER3 (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0xc3) +#define KVM_REG_PPC_DAWR1 (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0xc4) +#define KVM_REG_PPC_DAWRX1 (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0xc5) /* Transactional Memory checkpointed state: * This is all GPRs, all VSX regs and a subset of SPRs diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c index 5a77aac516ba..a35ea4e19360 100644 --- a/arch/powerpc/kernel/asm-offsets.c +++ b/arch/powerpc/kernel/asm-offsets.c @@ -550,6 +550,8 @@ int main(void) OFFSET(VCPU_DABRX, kvm_vcpu, arch.dabrx); OFFSET(VCPU_DAWR0, kvm_vcpu, arch.dawr0); OFFSET(VCPU_DAWRX0, kvm_vcpu, arch.dawrx0); + OFFSET(VCPU_DAWR1, kvm_vcpu, arch.dawr1); + OFFSET(VCPU_DAWRX1, kvm_vcpu, arch.dawrx1); OFFSET(VCPU_CIABR, kvm_vcpu, arch.ciabr); OFFSET(VCPU_HFLAGS, kvm_vcpu, arch.hflags); OFFSET(VCPU_DEC, kvm_vcpu, arch.dec); diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c index bcbad8daa974..b7a30c0692a7 100644 --- a/arch/powerpc/kvm/book3s_hv.c +++ b/arch/powerpc/kvm/book3s_hv.c @@ -785,6 +785,22 @@ static int kvmppc_h_set_mode(struct kvm_vcpu *vcpu, unsigned long mflags, vcpu->arch.dawr0 = value1; vcpu->arch.dawrx0 = value2; return H_SUCCESS; + case H_SET_MODE_RESOURCE_SET_DAWR1: + if (!kvmppc_power8_compatible(vcpu)) + return H_P2; + if (!ppc_breakpoint_available()) + return H_P2; + if (!cpu_has_feature(CPU_FTR_DAWR1)) +
[PATCH] powerpc/xmon: Fix build failure for 8xx
With CONFIG_PPC_8xx and CONFIG_XMON set, kernel build fails with arch/powerpc/xmon/xmon.c:1379:12: error: 'find_free_data_bpt' defined but not used [-Werror=unused-function] Fix it by enclosing find_free_data_bpt() inside #ifndef CONFIG_PPC_8xx. Reported-by: kernel test robot Fixes: 30df74d67d48 ("powerpc/watchpoint/xmon: Support 2nd DAWR") Signed-off-by: Ravi Bangoria --- arch/powerpc/xmon/xmon.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c index 55c43a6c9111..5559edf36756 100644 --- a/arch/powerpc/xmon/xmon.c +++ b/arch/powerpc/xmon/xmon.c @@ -1383,6 +1383,7 @@ static long check_bp_loc(unsigned long addr) return 1; } +#ifndef CONFIG_PPC_8xx static int find_free_data_bpt(void) { int i; @@ -1394,6 +1395,7 @@ static int find_free_data_bpt(void) printf("Couldn't find free breakpoint register\n"); return -1; } +#endif static void print_data_bpts(void) { -- 2.17.1
Re: [PATCH 1/2] powerpc: sstep: Fix load and update instructions
On 11/19/20 11:11 AM, Sandipan Das wrote: The Power ISA says that the fixed-point load and update instructions must neither use R0 for the base address (RA) nor have the destination (RT) and the base address (RA) as the same register. In these cases, the instruction is invalid. This applies to the following instructions. * Load Byte and Zero with Update (lbzu) * Load Byte and Zero with Update Indexed (lbzux) * Load Halfword and Zero with Update (lhzu) * Load Halfword and Zero with Update Indexed (lhzux) * Load Halfword Algebraic with Update (lhau) * Load Halfword Algebraic with Update Indexed (lhaux) * Load Word and Zero with Update (lwzu) * Load Word and Zero with Update Indexed (lwzux) * Load Word Algebraic with Update Indexed (lwaux) * Load Doubleword with Update (ldu) * Load Doubleword with Update Indexed (ldux) However, the following behaviour is observed using some invalid opcodes where RA = RT. An userspace program using an invalid instruction word like 0xe9ce0001, i.e. "ldu r14, 0(r14)", runs and exits without getting terminated abruptly. The instruction performs the load operation but does not write the effective address to the base address register. Attaching an uprobe at that instruction's address results in emulation which writes the effective address to the base register. Thus, the final value of the base address register is different. To remove any inconsistencies, this adds an additional check for the aforementioned instructions to make sure that they are treated as unknown by the emulation infrastructure when RA = 0 or RA = RT. The kernel will then fallback to executing the instruction on hardware. Signed-off-by: Sandipan Das For the series: Reviewed-by: Ravi Bangoria
Re: [PATCH 1/2] powerpc: sstep: Fix load and update instructions
diff --git a/arch/powerpc/lib/sstep.c b/arch/powerpc/lib/sstep.c index 855457ed09b5..25a5436be6c6 100644 --- a/arch/powerpc/lib/sstep.c +++ b/arch/powerpc/lib/sstep.c @@ -2157,11 +2157,15 @@ int analyse_instr(struct instruction_op *op, const struct pt_regs *regs, case 23: /* lwzx */ case 55:/* lwzux */ + if (u && (ra == 0 || ra == rd)) + return -1; I guess you also need to split case 23 and 55? - Ravi
[PATCH v2 4/4] KVM: PPC: Introduce new capability for 2nd DAWR
Introduce KVM_CAP_PPC_DAWR1 which can be used by Qemu to query whether kvm supports 2nd DAWR or not. Signed-off-by: Ravi Bangoria --- arch/powerpc/kvm/powerpc.c | 3 +++ include/uapi/linux/kvm.h | 1 + 2 files changed, 4 insertions(+) diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c index 13999123b735..48763fe59fc5 100644 --- a/arch/powerpc/kvm/powerpc.c +++ b/arch/powerpc/kvm/powerpc.c @@ -679,6 +679,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext) !kvmppc_hv_ops->enable_svm(NULL); break; #endif + case KVM_CAP_PPC_DAWR1: + r = cpu_has_feature(CPU_FTR_DAWR1); + break; default: r = 0; break; diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h index f6d86033c4fa..0f32d6cbabc2 100644 --- a/include/uapi/linux/kvm.h +++ b/include/uapi/linux/kvm.h @@ -1035,6 +1035,7 @@ struct kvm_ppc_resize_hpt { #define KVM_CAP_LAST_CPU 184 #define KVM_CAP_SMALLER_MAXPHYADDR 185 #define KVM_CAP_S390_DIAG318 186 +#define KVM_CAP_PPC_DAWR1 187 #ifdef KVM_CAP_IRQ_ROUTING -- 2.26.2
[PATCH v2 0/4] KVM: PPC: Power10 2nd DAWR enablement
Enable p10 2nd DAWR feature for Book3S kvm guest. DAWR is a hypervisor resource and thus H_SET_MODE hcall is used to set/unset it. A new case H_SET_MODE_RESOURCE_SET_DAWR1 is introduced in H_SET_MODE hcall for setting/unsetting 2nd DAWR. Also, new capability KVM_CAP_PPC_DAWR1 has been added to query 2nd DAWR support via kvm ioctl. This feature also needs to be enabled in Qemu to really use it. I'll post Qemu patches once kvm patches get accepted. v1: https://lore.kernel.org/r/20200723102058.312282-1-ravi.bango...@linux.ibm.com v1->v2: - patch #1: New patch - patch #2: Don't rename KVM_REG_PPC_DAWR, it's an uapi macro - patch #3: Increment HV_GUEST_STATE_VERSION - Split kvm and selftests patches into different series - Patches rebased to paulus/kvm-ppc-next (cf59eb13e151) + few other watchpoint patches which are yet to be merged in paulus/kvm-ppc-next. Ravi Bangoria (4): KVM: PPC: Allow nested guest creation when L0 hv_guest_state > L1 KVM: PPC: Rename current DAWR macros and variables KVM: PPC: Add infrastructure to support 2nd DAWR KVM: PPC: Introduce new capability for 2nd DAWR Documentation/virt/kvm/api.rst| 2 + arch/powerpc/include/asm/hvcall.h | 25 - arch/powerpc/include/asm/kvm_host.h | 6 +- arch/powerpc/include/uapi/asm/kvm.h | 2 + arch/powerpc/kernel/asm-offsets.c | 6 +- arch/powerpc/kvm/book3s_hv.c | 65 ++ arch/powerpc/kvm/book3s_hv_nested.c | 68 ++- arch/powerpc/kvm/book3s_hv_rmhandlers.S | 43 ++ arch/powerpc/kvm/powerpc.c| 3 + include/uapi/linux/kvm.h | 1 + tools/arch/powerpc/include/uapi/asm/kvm.h | 2 + 11 files changed, 181 insertions(+), 42 deletions(-) -- 2.26.2
[PATCH v2 1/4] KVM: PPC: Allow nested guest creation when L0 hv_guest_state > L1
On powerpc, L1 hypervisor takes help of L0 using H_ENTER_NESTED hcall to load L2 guest state in cpu. L1 hypervisor prepares the L2 state in struct hv_guest_state and passes a pointer to it via hcall. Using that pointer, L0 reads/writes that state directly from/to L1 memory. Thus L0 must be aware of hv_guest_state layout of L1. Currently it uses version field to achieve this. i.e. If L0 hv_guest_state.version != L1 hv_guest_state.version, L0 won't allow nested kvm guest. This restriction can be loosen up a bit. L0 can be taught to understand older layout of hv_guest_state, if we restrict the new member to be added only at the end. i.e. we can allow nested guest even when L0 hv_guest_state.version > L1 hv_guest_state.version. Though, the other way around is not possible. Signed-off-by: Ravi Bangoria --- arch/powerpc/include/asm/hvcall.h | 17 +++-- arch/powerpc/kvm/book3s_hv_nested.c | 53 - 2 files changed, 59 insertions(+), 11 deletions(-) diff --git a/arch/powerpc/include/asm/hvcall.h b/arch/powerpc/include/asm/hvcall.h index fbb377055471..a7073fddb657 100644 --- a/arch/powerpc/include/asm/hvcall.h +++ b/arch/powerpc/include/asm/hvcall.h @@ -524,9 +524,12 @@ struct h_cpu_char_result { u64 behaviour; }; -/* Register state for entering a nested guest with H_ENTER_NESTED */ +/* + * Register state for entering a nested guest with H_ENTER_NESTED. + * New member must be added at the end. + */ struct hv_guest_state { - u64 version;/* version of this structure layout */ + u64 version;/* version of this structure layout, must be first */ u32 lpid; u32 vcpu_token; /* These registers are hypervisor privileged (at least for writing) */ @@ -560,6 +563,16 @@ struct hv_guest_state { /* Latest version of hv_guest_state structure */ #define HV_GUEST_STATE_VERSION 1 +static inline int hv_guest_state_size(unsigned int version) +{ + switch (version) { + case 1: + return offsetofend(struct hv_guest_state, ppr); + default: + return -1; + } +} + #endif /* __ASSEMBLY__ */ #endif /* __KERNEL__ */ #endif /* _ASM_POWERPC_HVCALL_H */ diff --git a/arch/powerpc/kvm/book3s_hv_nested.c b/arch/powerpc/kvm/book3s_hv_nested.c index 33b58549a9aa..2b433c3bacea 100644 --- a/arch/powerpc/kvm/book3s_hv_nested.c +++ b/arch/powerpc/kvm/book3s_hv_nested.c @@ -215,6 +215,45 @@ static void kvmhv_nested_mmio_needed(struct kvm_vcpu *vcpu, u64 regs_ptr) } } +static int kvmhv_read_guest_state_and_regs(struct kvm_vcpu *vcpu, + struct hv_guest_state *l2_hv, + struct pt_regs *l2_regs, + u64 hv_ptr, u64 regs_ptr) +{ + int size; + + if (kvm_vcpu_read_guest(vcpu, hv_ptr, &(l2_hv->version), + sizeof(l2_hv->version))) + return -1; + + if (kvmppc_need_byteswap(vcpu)) + l2_hv->version = swab64(l2_hv->version); + + size = hv_guest_state_size(l2_hv->version); + if (size < 0) + return -1; + + return kvm_vcpu_read_guest(vcpu, hv_ptr, l2_hv, size) || + kvm_vcpu_read_guest(vcpu, regs_ptr, l2_regs, + sizeof(struct pt_regs)); +} + +static int kvmhv_write_guest_state_and_regs(struct kvm_vcpu *vcpu, + struct hv_guest_state *l2_hv, + struct pt_regs *l2_regs, + u64 hv_ptr, u64 regs_ptr) +{ + int size; + + size = hv_guest_state_size(l2_hv->version); + if (size < 0) + return -1; + + return kvm_vcpu_write_guest(vcpu, hv_ptr, l2_hv, size) || + kvm_vcpu_write_guest(vcpu, regs_ptr, l2_regs, +sizeof(struct pt_regs)); +} + long kvmhv_enter_nested_guest(struct kvm_vcpu *vcpu) { long int err, r; @@ -235,17 +274,15 @@ long kvmhv_enter_nested_guest(struct kvm_vcpu *vcpu) hv_ptr = kvmppc_get_gpr(vcpu, 4); regs_ptr = kvmppc_get_gpr(vcpu, 5); vcpu->srcu_idx = srcu_read_lock(>kvm->srcu); - err = kvm_vcpu_read_guest(vcpu, hv_ptr, _hv, - sizeof(struct hv_guest_state)) || - kvm_vcpu_read_guest(vcpu, regs_ptr, _regs, - sizeof(struct pt_regs)); + err = kvmhv_read_guest_state_and_regs(vcpu, _hv, _regs, + hv_ptr, regs_ptr); srcu_read_unlock(>kvm->srcu, vcpu->srcu_idx); if (err) return H_PARAMETER; if (kvmppc_need_byteswap(vcpu)) byteswap_hv_regs(_hv); - if (l2_hv.version != HV_GUEST_STATE_VERSION) + if (l2_hv.version >
[PATCH v2 3/4] KVM: PPC: Add infrastructure to support 2nd DAWR
kvm code assumes single DAWR everywhere. Add code to support 2nd DAWR. DAWR is a hypervisor resource and thus H_SET_MODE hcall is used to set/ unset it. Introduce new case H_SET_MODE_RESOURCE_SET_DAWR1 for 2nd DAWR. Also, kvm will support 2nd DAWR only if CPU_FTR_DAWR1 is set. Signed-off-by: Ravi Bangoria --- Documentation/virt/kvm/api.rst| 2 ++ arch/powerpc/include/asm/hvcall.h | 8 - arch/powerpc/include/asm/kvm_host.h | 2 ++ arch/powerpc/include/uapi/asm/kvm.h | 2 ++ arch/powerpc/kernel/asm-offsets.c | 2 ++ arch/powerpc/kvm/book3s_hv.c | 41 +++ arch/powerpc/kvm/book3s_hv_nested.c | 7 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 23 + tools/arch/powerpc/include/uapi/asm/kvm.h | 2 ++ 9 files changed, 88 insertions(+), 1 deletion(-) diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst index eb3a1316f03e..72c98735aa52 100644 --- a/Documentation/virt/kvm/api.rst +++ b/Documentation/virt/kvm/api.rst @@ -2249,6 +2249,8 @@ registers, find a list below: PPC KVM_REG_PPC_PSSCR 64 PPC KVM_REG_PPC_DEC_EXPIRY 64 PPC KVM_REG_PPC_PTCR64 + PPC KVM_REG_PPC_DAWR1 64 + PPC KVM_REG_PPC_DAWRX1 64 PPC KVM_REG_PPC_TM_GPR0 64 ... PPC KVM_REG_PPC_TM_GPR3164 diff --git a/arch/powerpc/include/asm/hvcall.h b/arch/powerpc/include/asm/hvcall.h index a7073fddb657..4bacd27a348b 100644 --- a/arch/powerpc/include/asm/hvcall.h +++ b/arch/powerpc/include/asm/hvcall.h @@ -558,16 +558,22 @@ struct hv_guest_state { u64 pidr; u64 cfar; u64 ppr; + /* Version 1 ends here */ + u64 dawr1; + u64 dawrx1; + /* Version 2 ends here */ }; /* Latest version of hv_guest_state structure */ -#define HV_GUEST_STATE_VERSION 1 +#define HV_GUEST_STATE_VERSION 2 static inline int hv_guest_state_size(unsigned int version) { switch (version) { case 1: return offsetofend(struct hv_guest_state, ppr); + case 2: + return offsetofend(struct hv_guest_state, dawrx1); default: return -1; } diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h index 62cadf1a596e..9804afdf8578 100644 --- a/arch/powerpc/include/asm/kvm_host.h +++ b/arch/powerpc/include/asm/kvm_host.h @@ -586,6 +586,8 @@ struct kvm_vcpu_arch { ulong dabr; ulong dawr0; ulong dawrx0; + ulong dawr1; + ulong dawrx1; ulong ciabr; ulong cfar; ulong ppr; diff --git a/arch/powerpc/include/uapi/asm/kvm.h b/arch/powerpc/include/uapi/asm/kvm.h index c3af3f324c5a..9f18fa090f1f 100644 --- a/arch/powerpc/include/uapi/asm/kvm.h +++ b/arch/powerpc/include/uapi/asm/kvm.h @@ -644,6 +644,8 @@ struct kvm_ppc_cpu_char { #define KVM_REG_PPC_MMCR3 (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0xc1) #define KVM_REG_PPC_SIER2 (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0xc2) #define KVM_REG_PPC_SIER3 (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0xc3) +#define KVM_REG_PPC_DAWR1 (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0xc4) +#define KVM_REG_PPC_DAWRX1 (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0xc5) /* Transactional Memory checkpointed state: * This is all GPRs, all VSX regs and a subset of SPRs diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c index e4256f5b4602..26d4fa8fe51e 100644 --- a/arch/powerpc/kernel/asm-offsets.c +++ b/arch/powerpc/kernel/asm-offsets.c @@ -549,6 +549,8 @@ int main(void) OFFSET(VCPU_DABRX, kvm_vcpu, arch.dabrx); OFFSET(VCPU_DAWR0, kvm_vcpu, arch.dawr0); OFFSET(VCPU_DAWRX0, kvm_vcpu, arch.dawrx0); + OFFSET(VCPU_DAWR1, kvm_vcpu, arch.dawr1); + OFFSET(VCPU_DAWRX1, kvm_vcpu, arch.dawrx1); OFFSET(VCPU_CIABR, kvm_vcpu, arch.ciabr); OFFSET(VCPU_HFLAGS, kvm_vcpu, arch.hflags); OFFSET(VCPU_DEC, kvm_vcpu, arch.dec); diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c index d5c6efc8a76e..2ff645789e9e 100644 --- a/arch/powerpc/kvm/book3s_hv.c +++ b/arch/powerpc/kvm/book3s_hv.c @@ -785,6 +785,20 @@ static int kvmppc_h_set_mode(struct kvm_vcpu *vcpu, unsigned long mflags, vcpu->arch.dawr0 = value1; vcpu->arch.dawrx0 = value2; return H_SUCCESS; + case H_SET_MODE_RESOURCE_SET_DAWR1: + if (!kvmppc_power8_compatible(vcpu)) + return H_P2; + if (!ppc_breakpoint_available()) + return H_P2; + if (!cpu_has_feature(CPU_FTR_DAWR1)) + return H_P2; + if (mflags) + return H_UNSUPPORTED_FLAG_START; + if (value2 & DABRX_HYP) + return H_P4; + vcpu->arch.dawr1 = value1; +
[PATCH v2 2/4] KVM: PPC: Rename current DAWR macros and variables
Power10 is introducing second DAWR. Use real register names (with suffix 0) from ISA for current macros and variables used by kvm. One exception is KVM_REG_PPC_DAWR. Keep it as it is because it's uapi so changing it will break userspace. Signed-off-by: Ravi Bangoria --- arch/powerpc/include/asm/kvm_host.h | 4 ++-- arch/powerpc/kernel/asm-offsets.c | 4 ++-- arch/powerpc/kvm/book3s_hv.c| 24 arch/powerpc/kvm/book3s_hv_nested.c | 8 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 20 ++-- 5 files changed, 30 insertions(+), 30 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h index d67a470e95a3..62cadf1a596e 100644 --- a/arch/powerpc/include/asm/kvm_host.h +++ b/arch/powerpc/include/asm/kvm_host.h @@ -584,8 +584,8 @@ struct kvm_vcpu_arch { u32 ctrl; u32 dabrx; ulong dabr; - ulong dawr; - ulong dawrx; + ulong dawr0; + ulong dawrx0; ulong ciabr; ulong cfar; ulong ppr; diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c index 8711c2164b45..e4256f5b4602 100644 --- a/arch/powerpc/kernel/asm-offsets.c +++ b/arch/powerpc/kernel/asm-offsets.c @@ -547,8 +547,8 @@ int main(void) OFFSET(VCPU_CTRL, kvm_vcpu, arch.ctrl); OFFSET(VCPU_DABR, kvm_vcpu, arch.dabr); OFFSET(VCPU_DABRX, kvm_vcpu, arch.dabrx); - OFFSET(VCPU_DAWR, kvm_vcpu, arch.dawr); - OFFSET(VCPU_DAWRX, kvm_vcpu, arch.dawrx); + OFFSET(VCPU_DAWR0, kvm_vcpu, arch.dawr0); + OFFSET(VCPU_DAWRX0, kvm_vcpu, arch.dawrx0); OFFSET(VCPU_CIABR, kvm_vcpu, arch.ciabr); OFFSET(VCPU_HFLAGS, kvm_vcpu, arch.hflags); OFFSET(VCPU_DEC, kvm_vcpu, arch.dec); diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c index 490a0f6a7285..d5c6efc8a76e 100644 --- a/arch/powerpc/kvm/book3s_hv.c +++ b/arch/powerpc/kvm/book3s_hv.c @@ -782,8 +782,8 @@ static int kvmppc_h_set_mode(struct kvm_vcpu *vcpu, unsigned long mflags, return H_UNSUPPORTED_FLAG_START; if (value2 & DABRX_HYP) return H_P4; - vcpu->arch.dawr = value1; - vcpu->arch.dawrx = value2; + vcpu->arch.dawr0 = value1; + vcpu->arch.dawrx0 = value2; return H_SUCCESS; case H_SET_MODE_RESOURCE_ADDR_TRANS_MODE: /* KVM does not support mflags=2 (AIL=2) */ @@ -1747,10 +1747,10 @@ static int kvmppc_get_one_reg_hv(struct kvm_vcpu *vcpu, u64 id, *val = get_reg_val(id, vcpu->arch.vcore->vtb); break; case KVM_REG_PPC_DAWR: - *val = get_reg_val(id, vcpu->arch.dawr); + *val = get_reg_val(id, vcpu->arch.dawr0); break; case KVM_REG_PPC_DAWRX: - *val = get_reg_val(id, vcpu->arch.dawrx); + *val = get_reg_val(id, vcpu->arch.dawrx0); break; case KVM_REG_PPC_CIABR: *val = get_reg_val(id, vcpu->arch.ciabr); @@ -1979,10 +1979,10 @@ static int kvmppc_set_one_reg_hv(struct kvm_vcpu *vcpu, u64 id, vcpu->arch.vcore->vtb = set_reg_val(id, *val); break; case KVM_REG_PPC_DAWR: - vcpu->arch.dawr = set_reg_val(id, *val); + vcpu->arch.dawr0 = set_reg_val(id, *val); break; case KVM_REG_PPC_DAWRX: - vcpu->arch.dawrx = set_reg_val(id, *val) & ~DAWRX_HYP; + vcpu->arch.dawrx0 = set_reg_val(id, *val) & ~DAWRX_HYP; break; case KVM_REG_PPC_CIABR: vcpu->arch.ciabr = set_reg_val(id, *val); @@ -3437,8 +3437,8 @@ static int kvmhv_load_hv_regs_and_go(struct kvm_vcpu *vcpu, u64 time_limit, int trap; unsigned long host_hfscr = mfspr(SPRN_HFSCR); unsigned long host_ciabr = mfspr(SPRN_CIABR); - unsigned long host_dawr = mfspr(SPRN_DAWR0); - unsigned long host_dawrx = mfspr(SPRN_DAWRX0); + unsigned long host_dawr0 = mfspr(SPRN_DAWR0); + unsigned long host_dawrx0 = mfspr(SPRN_DAWRX0); unsigned long host_psscr = mfspr(SPRN_PSSCR); unsigned long host_pidr = mfspr(SPRN_PID); @@ -3477,8 +3477,8 @@ static int kvmhv_load_hv_regs_and_go(struct kvm_vcpu *vcpu, u64 time_limit, mtspr(SPRN_SPURR, vcpu->arch.spurr); if (dawr_enabled()) { - mtspr(SPRN_DAWR0, vcpu->arch.dawr); - mtspr(SPRN_DAWRX0, vcpu->arch.dawrx); + mtspr(SPRN_DAWR0, vcpu->arch.dawr0); + mtspr(SPRN_DAWRX0, vcpu->arch.dawrx0); } mtspr(SPRN_CIABR, vcpu->arch.ciabr); mtspr(SPRN_IC, vcpu->arch.ic); @@ -3530,8 +3530,8 @@ static int kvmhv_load_hv_regs_and_go(struct kvm_vcpu *vcpu, u64
[PATCH v3] powerpc/watchpoint: Workaround P10 DD1 issue with VSX-32 byte instructions
POWER10 DD1 has an issue where it generates watchpoint exceptions when it shouldn't. The conditions where this occur are: - octword op - ending address of DAWR range is less than starting address of op - those addresses need to be in the same or in two consecutive 512B blocks - 'op address + 64B' generates an address that has a carry into bit 52 (crosses 2K boundary) Handle such spurious exception by considering them as extraneous and emulating/single-steeping instruction without generating an event. Signed-off-by: Ravi Bangoria [Fixed build warning reported by kernel test robot] Reported-by: kernel test robot --- v2->v3: - v2: https://lore.kernel.org/r/20201022034039.330365-1-ravi.bango...@linux.ibm.com - Drop first patch which introduced CPU_FTRS_POWER10_DD1. Instead use P1 DD1 PVR direclty in if condition. We can't set CPU_FTRS_POWER10_DD1 inside guest as guest can be migrated to futur version of cpu. Dependency: VSX-32 byte emulation support patches https://lore.kernel.org/r/20201011050908.72173-1-ravi.bango...@linux.ibm.com arch/powerpc/kernel/hw_breakpoint.c | 68 - 1 file changed, 66 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/kernel/hw_breakpoint.c b/arch/powerpc/kernel/hw_breakpoint.c index 1f4a1efa0074..67297aea5d94 100644 --- a/arch/powerpc/kernel/hw_breakpoint.c +++ b/arch/powerpc/kernel/hw_breakpoint.c @@ -644,6 +644,11 @@ static bool is_larx_stcx_instr(int type) return type == LARX || type == STCX; } +static bool is_octword_vsx_instr(int type, int size) +{ + return ((type == LOAD_VSX || type == STORE_VSX) && size == 32); +} + /* * We've failed in reliably handling the hw-breakpoint. Unregister * it and throw a warning message to let the user know about it. @@ -694,6 +699,58 @@ static bool stepping_handler(struct pt_regs *regs, struct perf_event **bp, return true; } +static void handle_p10dd1_spurious_exception(struct arch_hw_breakpoint **info, +int *hit, unsigned long ea) +{ + int i; + unsigned long hw_end_addr; + + /* +* Handle spurious exception only when any bp_per_reg is set. +* Otherwise this might be created by xmon and not actually a +* spurious exception. +*/ + for (i = 0; i < nr_wp_slots(); i++) { + if (!info[i]) + continue; + + hw_end_addr = ALIGN(info[i]->address + info[i]->len, HW_BREAKPOINT_SIZE); + + /* +* Ending address of DAWR range is less than starting +* address of op. +*/ + if ((hw_end_addr - 1) >= ea) + continue; + + /* +* Those addresses need to be in the same or in two +* consecutive 512B blocks; +*/ + if (((hw_end_addr - 1) >> 10) != (ea >> 10)) + continue; + + /* +* 'op address + 64B' generates an address that has a +* carry into bit 52 (crosses 2K boundary). +*/ + if ((ea & 0x800) == ((ea + 64) & 0x800)) + continue; + + break; + } + + if (i == nr_wp_slots()) + return; + + for (i = 0; i < nr_wp_slots(); i++) { + if (info[i]) { + hit[i] = 1; + info[i]->type |= HW_BRK_TYPE_EXTRANEOUS_IRQ; + } + } +} + int hw_breakpoint_handler(struct die_args *args) { bool err = false; @@ -752,8 +809,15 @@ int hw_breakpoint_handler(struct die_args *args) goto reset; if (!nr_hit) { - rc = NOTIFY_DONE; - goto out; + /* Workaround for Power10 DD1 */ + if (mfspr(SPRN_PVR) == 0x800100 && + !IS_ENABLED(CONFIG_PPC_8xx) && + is_octword_vsx_instr(type, size)) { + handle_p10dd1_spurious_exception(info, hit, ea); + } else { + rc = NOTIFY_DONE; + goto out; + } } /* -- 2.25.1
Re: [PATCH 1/2] powerpc: Introduce POWER10_DD1 feature
Hi Michael, +static void __init fixup_cpu_features(void) +{ + unsigned long version = mfspr(SPRN_PVR); + + if ((version & 0x) == 0x00800100) + cur_cpu_spec->cpu_features |= CPU_FTR_POWER10_DD1; +} + static int __init early_init_dt_scan_cpus(unsigned long node, const char *uname, int depth, void *data) @@ -378,6 +386,7 @@ static int __init early_init_dt_scan_cpus(unsigned long node, check_cpu_feature_properties(node); check_cpu_pa_features(node); + fixup_cpu_features(); } This is not the way we normally do CPU features. In the past we have always added a raw entry in cputable.c, see eg. the Power9 DD 2.0, 2.1 entries. True. But that won't work PowerVM guests because kernel overwrites "raw" mode cpu_features with "architected" mode cpu_features. Doing it here is not really safe, if you're running with an architected PVR (via cpu-version property), you can't set the DD1 feature, because you might be migrated to a future CPU that doesn't have the DD1 quirks. Okay.. I suppose, that mean kernel need to check real PVR every time when it encounters spurious exception. i.e. instead of using if (cpu_has_feature(CPU_FTR_POWER10_DD1) && ...) it need to use if (mfspr(SPRN_PVR) == 0x800100 && ...) in patch 2. Right? Thanks, Ravi
Re: [PATCH v2 1/2] powerpc: Introduce POWER10_DD1 feature
+static void __init fixup_cpu_features(void) +{ + unsigned long version = mfspr(SPRN_PVR); + + if ((version & 0x) == 0x00800100) + cur_cpu_spec->cpu_features |= CPU_FTR_POWER10_DD1; +} + I am just wondering why this is needed here, but the same thing is not done for, say, CPU_FTR_POWER9_DD2_1? When we don't use DT cpu_features (PowerVM / kvm geusts), we call identify_cpu() twice. First with Real PVR which sets "raw" cpu_spec as cur_cpu_spec and then 2nd time with Logical PVR (0x0f...) which (mostly) overwrites the cur_cpu_spec with "architected" mode cpu_spec. I don't see DD version specific entries for "architected" mode in cpu_specs[] for any previous processors. So I've introduced this function to tweak cpu_features. Though, I don't know why we don't have similar thing for CPU_FTR_POWER9_DD2_1. I've to check that. And should we get a /* Power10 DD 1 */ added to cpu_specs[]? IIUC, we don't need such entry. For PowerVM / kvm guests, we overwrite cpu_spec, so /* Power10 */ "raw" entry is sufficient. And For baremetal, we don't use cpu_specs[] at all. I think even for powernv, using dt features can be disabled by the cmdline with dt_cpu_ftrs=off, then cpu_specs[] will then be used. Ok... with dt_cpu_ftrs=off, we seem to be using raw mode cpu_specs[] entry on baremetal. So yeah, I'll add /* Power10 DD1 */ raw mode entry into cpu_specs[]. Thanks for pointing it out. -Ravi
Re: [PATCH v2 1/2] powerpc: Introduce POWER10_DD1 feature
On 10/22/20 10:41 AM, Jordan Niethe wrote: On Thu, Oct 22, 2020 at 2:40 PM Ravi Bangoria wrote: POWER10_DD1 feature flag will be needed while adding conditional code that applies only for Power10 DD1. Signed-off-by: Ravi Bangoria --- arch/powerpc/include/asm/cputable.h | 8 ++-- arch/powerpc/kernel/dt_cpu_ftrs.c | 3 +++ arch/powerpc/kernel/prom.c | 9 + 3 files changed, 18 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/include/asm/cputable.h b/arch/powerpc/include/asm/cputable.h index 93bc70d4c9a1..d486f56c0d33 100644 --- a/arch/powerpc/include/asm/cputable.h +++ b/arch/powerpc/include/asm/cputable.h @@ -216,6 +216,7 @@ static inline void cpu_feature_keys_init(void) { } #define CPU_FTR_P9_RADIX_PREFETCH_BUG LONG_ASM_CONST(0x0002) #define CPU_FTR_ARCH_31 LONG_ASM_CONST(0x0004) #define CPU_FTR_DAWR1 LONG_ASM_CONST(0x0008) +#define CPU_FTR_POWER10_DD1LONG_ASM_CONST(0x0010) #ifndef __ASSEMBLY__ @@ -479,6 +480,7 @@ static inline void cpu_feature_keys_init(void) { } CPU_FTR_DBELL | CPU_FTR_HAS_PPR | CPU_FTR_ARCH_207S | \ CPU_FTR_TM_COMP | CPU_FTR_ARCH_300 | CPU_FTR_ARCH_31 | \ CPU_FTR_DAWR | CPU_FTR_DAWR1) +#define CPU_FTRS_POWER10_DD1 (CPU_FTRS_POWER10 | CPU_FTR_POWER10_DD1) #define CPU_FTRS_CELL (CPU_FTR_LWSYNC | \ CPU_FTR_PPCAS_ARCH_V2 | CPU_FTR_CTRL | \ CPU_FTR_ALTIVEC_COMP | CPU_FTR_MMCRA | CPU_FTR_SMT | \ @@ -497,14 +499,16 @@ static inline void cpu_feature_keys_init(void) { } #define CPU_FTRS_POSSIBLE \ (CPU_FTRS_POWER7 | CPU_FTRS_POWER8E | CPU_FTRS_POWER8 | \ CPU_FTR_ALTIVEC_COMP | CPU_FTR_VSX_COMP | CPU_FTRS_POWER9 | \ -CPU_FTRS_POWER9_DD2_1 | CPU_FTRS_POWER9_DD2_2 | CPU_FTRS_POWER10) +CPU_FTRS_POWER9_DD2_1 | CPU_FTRS_POWER9_DD2_2 | CPU_FTRS_POWER10 | \ +CPU_FTRS_POWER10_DD1) #else #define CPU_FTRS_POSSIBLE \ (CPU_FTRS_PPC970 | CPU_FTRS_POWER5 | \ CPU_FTRS_POWER6 | CPU_FTRS_POWER7 | CPU_FTRS_POWER8E | \ CPU_FTRS_POWER8 | CPU_FTRS_CELL | CPU_FTRS_PA6T | \ CPU_FTR_VSX_COMP | CPU_FTR_ALTIVEC_COMP | CPU_FTRS_POWER9 | \ -CPU_FTRS_POWER9_DD2_1 | CPU_FTRS_POWER9_DD2_2 | CPU_FTRS_POWER10) +CPU_FTRS_POWER9_DD2_1 | CPU_FTRS_POWER9_DD2_2 | CPU_FTRS_POWER10 | \ +CPU_FTRS_POWER10_DD1) #endif /* CONFIG_CPU_LITTLE_ENDIAN */ #endif #else diff --git a/arch/powerpc/kernel/dt_cpu_ftrs.c b/arch/powerpc/kernel/dt_cpu_ftrs.c index 1098863e17ee..b2327f2967ff 100644 --- a/arch/powerpc/kernel/dt_cpu_ftrs.c +++ b/arch/powerpc/kernel/dt_cpu_ftrs.c @@ -811,6 +811,9 @@ static __init void cpufeatures_cpu_quirks(void) } update_tlbie_feature_flag(version); + + if ((version & 0x) == 0x00800100) + cur_cpu_spec->cpu_features |= CPU_FTR_POWER10_DD1; } static void __init cpufeatures_setup_finished(void) diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c index c1545f22c077..c778c81284f7 100644 --- a/arch/powerpc/kernel/prom.c +++ b/arch/powerpc/kernel/prom.c @@ -305,6 +305,14 @@ static void __init check_cpu_feature_properties(unsigned long node) } } +static void __init fixup_cpu_features(void) +{ + unsigned long version = mfspr(SPRN_PVR); + + if ((version & 0x) == 0x00800100) + cur_cpu_spec->cpu_features |= CPU_FTR_POWER10_DD1; +} + I am just wondering why this is needed here, but the same thing is not done for, say, CPU_FTR_POWER9_DD2_1? When we don't use DT cpu_features (PowerVM / kvm geusts), we call identify_cpu() twice. First with Real PVR which sets "raw" cpu_spec as cur_cpu_spec and then 2nd time with Logical PVR (0x0f...) which (mostly) overwrites the cur_cpu_spec with "architected" mode cpu_spec. I don't see DD version specific entries for "architected" mode in cpu_specs[] for any previous processors. So I've introduced this function to tweak cpu_features. Though, I don't know why we don't have similar thing for CPU_FTR_POWER9_DD2_1. I've to check that. And should we get a /* Power10 DD 1 */ added to cpu_specs[]? IIUC, we don't need such entry. For PowerVM / kvm guests, we overwrite cpu_spec, so /* Power10 */ "raw" entry is sufficient. And For baremetal, we don't use cpu_specs[] at all. static int __init early_init_dt_scan_cpus(unsigned long node, const char *uname, int depth, void *data) @@ -378,6 +386,7 @@ static int __init early_init_dt_scan_cpus(unsigned long node, check_cpu_feature_properties(node); check_cpu_pa_features(node); + fixup_cpu_features(); } identical_pvr_fixup(node); -- 2.25.1
[PATCH v2 2/2] powerpc/watchpoint: Workaround P10 DD1 issue with VSX-32 byte instructions
POWER10 DD1 has an issue where it generates watchpoint exceptions when it shouldn't. The conditions where this occur are: - octword op - ending address of DAWR range is less than starting address of op - those addresses need to be in the same or in two consecutive 512B blocks - 'op address + 64B' generates an address that has a carry into bit 52 (crosses 2K boundary) Handle such spurious exception by considering them as extraneous and emulating/single-steeping instruction without generating an event. Signed-off-by: Ravi Bangoria [Fixed build warning reported by kernel test robot] Reported-by: kernel test robot --- Dependency: VSX-32 byte emulation support patches https://lore.kernel.org/r/20201011050908.72173-1-ravi.bango...@linux.ibm.com arch/powerpc/kernel/hw_breakpoint.c | 67 - 1 file changed, 65 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/kernel/hw_breakpoint.c b/arch/powerpc/kernel/hw_breakpoint.c index f4e8f21046f5..9e83dd3d2ec5 100644 --- a/arch/powerpc/kernel/hw_breakpoint.c +++ b/arch/powerpc/kernel/hw_breakpoint.c @@ -499,6 +499,11 @@ static bool is_larx_stcx_instr(int type) return type == LARX || type == STCX; } +static bool is_octword_vsx_instr(int type, int size) +{ + return ((type == LOAD_VSX || type == STORE_VSX) && size == 32); +} + /* * We've failed in reliably handling the hw-breakpoint. Unregister * it and throw a warning message to let the user know about it. @@ -549,6 +554,58 @@ static bool stepping_handler(struct pt_regs *regs, struct perf_event **bp, return true; } +static void handle_p10dd1_spurious_exception(struct arch_hw_breakpoint **info, +int *hit, unsigned long ea) +{ + int i; + unsigned long hw_end_addr; + + /* +* Handle spurious exception only when any bp_per_reg is set. +* Otherwise this might be created by xmon and not actually a +* spurious exception. +*/ + for (i = 0; i < nr_wp_slots(); i++) { + if (!info[i]) + continue; + + hw_end_addr = ALIGN(info[i]->address + info[i]->len, HW_BREAKPOINT_SIZE); + + /* +* Ending address of DAWR range is less than starting +* address of op. +*/ + if ((hw_end_addr - 1) >= ea) + continue; + + /* +* Those addresses need to be in the same or in two +* consecutive 512B blocks; +*/ + if (((hw_end_addr - 1) >> 10) != (ea >> 10)) + continue; + + /* +* 'op address + 64B' generates an address that has a +* carry into bit 52 (crosses 2K boundary). +*/ + if ((ea & 0x800) == ((ea + 64) & 0x800)) + continue; + + break; + } + + if (i == nr_wp_slots()) + return; + + for (i = 0; i < nr_wp_slots(); i++) { + if (info[i]) { + hit[i] = 1; + info[i]->type |= HW_BRK_TYPE_EXTRANEOUS_IRQ; + } + } +} + int hw_breakpoint_handler(struct die_args *args) { bool err = false; @@ -607,8 +664,14 @@ int hw_breakpoint_handler(struct die_args *args) goto reset; if (!nr_hit) { - rc = NOTIFY_DONE; - goto out; + if (cpu_has_feature(CPU_FTR_POWER10_DD1) && + !IS_ENABLED(CONFIG_PPC_8xx) && + is_octword_vsx_instr(type, size)) { + handle_p10dd1_spurious_exception(info, hit, ea); + } else { + rc = NOTIFY_DONE; + goto out; + } } /* -- 2.25.1
[PATCH v2 1/2] powerpc: Introduce POWER10_DD1 feature
POWER10_DD1 feature flag will be needed while adding conditional code that applies only for Power10 DD1. Signed-off-by: Ravi Bangoria --- arch/powerpc/include/asm/cputable.h | 8 ++-- arch/powerpc/kernel/dt_cpu_ftrs.c | 3 +++ arch/powerpc/kernel/prom.c | 9 + 3 files changed, 18 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/include/asm/cputable.h b/arch/powerpc/include/asm/cputable.h index 93bc70d4c9a1..d486f56c0d33 100644 --- a/arch/powerpc/include/asm/cputable.h +++ b/arch/powerpc/include/asm/cputable.h @@ -216,6 +216,7 @@ static inline void cpu_feature_keys_init(void) { } #define CPU_FTR_P9_RADIX_PREFETCH_BUG LONG_ASM_CONST(0x0002) #define CPU_FTR_ARCH_31 LONG_ASM_CONST(0x0004) #define CPU_FTR_DAWR1 LONG_ASM_CONST(0x0008) +#define CPU_FTR_POWER10_DD1LONG_ASM_CONST(0x0010) #ifndef __ASSEMBLY__ @@ -479,6 +480,7 @@ static inline void cpu_feature_keys_init(void) { } CPU_FTR_DBELL | CPU_FTR_HAS_PPR | CPU_FTR_ARCH_207S | \ CPU_FTR_TM_COMP | CPU_FTR_ARCH_300 | CPU_FTR_ARCH_31 | \ CPU_FTR_DAWR | CPU_FTR_DAWR1) +#define CPU_FTRS_POWER10_DD1 (CPU_FTRS_POWER10 | CPU_FTR_POWER10_DD1) #define CPU_FTRS_CELL (CPU_FTR_LWSYNC | \ CPU_FTR_PPCAS_ARCH_V2 | CPU_FTR_CTRL | \ CPU_FTR_ALTIVEC_COMP | CPU_FTR_MMCRA | CPU_FTR_SMT | \ @@ -497,14 +499,16 @@ static inline void cpu_feature_keys_init(void) { } #define CPU_FTRS_POSSIBLE \ (CPU_FTRS_POWER7 | CPU_FTRS_POWER8E | CPU_FTRS_POWER8 | \ CPU_FTR_ALTIVEC_COMP | CPU_FTR_VSX_COMP | CPU_FTRS_POWER9 | \ -CPU_FTRS_POWER9_DD2_1 | CPU_FTRS_POWER9_DD2_2 | CPU_FTRS_POWER10) +CPU_FTRS_POWER9_DD2_1 | CPU_FTRS_POWER9_DD2_2 | CPU_FTRS_POWER10 | \ +CPU_FTRS_POWER10_DD1) #else #define CPU_FTRS_POSSIBLE \ (CPU_FTRS_PPC970 | CPU_FTRS_POWER5 | \ CPU_FTRS_POWER6 | CPU_FTRS_POWER7 | CPU_FTRS_POWER8E | \ CPU_FTRS_POWER8 | CPU_FTRS_CELL | CPU_FTRS_PA6T | \ CPU_FTR_VSX_COMP | CPU_FTR_ALTIVEC_COMP | CPU_FTRS_POWER9 | \ -CPU_FTRS_POWER9_DD2_1 | CPU_FTRS_POWER9_DD2_2 | CPU_FTRS_POWER10) +CPU_FTRS_POWER9_DD2_1 | CPU_FTRS_POWER9_DD2_2 | CPU_FTRS_POWER10 | \ +CPU_FTRS_POWER10_DD1) #endif /* CONFIG_CPU_LITTLE_ENDIAN */ #endif #else diff --git a/arch/powerpc/kernel/dt_cpu_ftrs.c b/arch/powerpc/kernel/dt_cpu_ftrs.c index 1098863e17ee..b2327f2967ff 100644 --- a/arch/powerpc/kernel/dt_cpu_ftrs.c +++ b/arch/powerpc/kernel/dt_cpu_ftrs.c @@ -811,6 +811,9 @@ static __init void cpufeatures_cpu_quirks(void) } update_tlbie_feature_flag(version); + + if ((version & 0x) == 0x00800100) + cur_cpu_spec->cpu_features |= CPU_FTR_POWER10_DD1; } static void __init cpufeatures_setup_finished(void) diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c index c1545f22c077..c778c81284f7 100644 --- a/arch/powerpc/kernel/prom.c +++ b/arch/powerpc/kernel/prom.c @@ -305,6 +305,14 @@ static void __init check_cpu_feature_properties(unsigned long node) } } +static void __init fixup_cpu_features(void) +{ + unsigned long version = mfspr(SPRN_PVR); + + if ((version & 0x) == 0x00800100) + cur_cpu_spec->cpu_features |= CPU_FTR_POWER10_DD1; +} + static int __init early_init_dt_scan_cpus(unsigned long node, const char *uname, int depth, void *data) @@ -378,6 +386,7 @@ static int __init early_init_dt_scan_cpus(unsigned long node, check_cpu_feature_properties(node); check_cpu_pa_features(node); + fixup_cpu_features(); } identical_pvr_fixup(node); -- 2.25.1
[PATCH 1/2] powerpc: Introduce POWER10_DD1 feature
POWER10_DD1 feature flag will be needed while adding conditional code that applies only for Power10 DD1. Signed-off-by: Ravi Bangoria --- arch/powerpc/include/asm/cputable.h | 8 ++-- arch/powerpc/kernel/dt_cpu_ftrs.c | 3 +++ arch/powerpc/kernel/prom.c | 9 + 3 files changed, 18 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/include/asm/cputable.h b/arch/powerpc/include/asm/cputable.h index 93bc70d4c9a1..d486f56c0d33 100644 --- a/arch/powerpc/include/asm/cputable.h +++ b/arch/powerpc/include/asm/cputable.h @@ -216,6 +216,7 @@ static inline void cpu_feature_keys_init(void) { } #define CPU_FTR_P9_RADIX_PREFETCH_BUG LONG_ASM_CONST(0x0002) #define CPU_FTR_ARCH_31 LONG_ASM_CONST(0x0004) #define CPU_FTR_DAWR1 LONG_ASM_CONST(0x0008) +#define CPU_FTR_POWER10_DD1LONG_ASM_CONST(0x0010) #ifndef __ASSEMBLY__ @@ -479,6 +480,7 @@ static inline void cpu_feature_keys_init(void) { } CPU_FTR_DBELL | CPU_FTR_HAS_PPR | CPU_FTR_ARCH_207S | \ CPU_FTR_TM_COMP | CPU_FTR_ARCH_300 | CPU_FTR_ARCH_31 | \ CPU_FTR_DAWR | CPU_FTR_DAWR1) +#define CPU_FTRS_POWER10_DD1 (CPU_FTRS_POWER10 | CPU_FTR_POWER10_DD1) #define CPU_FTRS_CELL (CPU_FTR_LWSYNC | \ CPU_FTR_PPCAS_ARCH_V2 | CPU_FTR_CTRL | \ CPU_FTR_ALTIVEC_COMP | CPU_FTR_MMCRA | CPU_FTR_SMT | \ @@ -497,14 +499,16 @@ static inline void cpu_feature_keys_init(void) { } #define CPU_FTRS_POSSIBLE \ (CPU_FTRS_POWER7 | CPU_FTRS_POWER8E | CPU_FTRS_POWER8 | \ CPU_FTR_ALTIVEC_COMP | CPU_FTR_VSX_COMP | CPU_FTRS_POWER9 | \ -CPU_FTRS_POWER9_DD2_1 | CPU_FTRS_POWER9_DD2_2 | CPU_FTRS_POWER10) +CPU_FTRS_POWER9_DD2_1 | CPU_FTRS_POWER9_DD2_2 | CPU_FTRS_POWER10 | \ +CPU_FTRS_POWER10_DD1) #else #define CPU_FTRS_POSSIBLE \ (CPU_FTRS_PPC970 | CPU_FTRS_POWER5 | \ CPU_FTRS_POWER6 | CPU_FTRS_POWER7 | CPU_FTRS_POWER8E | \ CPU_FTRS_POWER8 | CPU_FTRS_CELL | CPU_FTRS_PA6T | \ CPU_FTR_VSX_COMP | CPU_FTR_ALTIVEC_COMP | CPU_FTRS_POWER9 | \ -CPU_FTRS_POWER9_DD2_1 | CPU_FTRS_POWER9_DD2_2 | CPU_FTRS_POWER10) +CPU_FTRS_POWER9_DD2_1 | CPU_FTRS_POWER9_DD2_2 | CPU_FTRS_POWER10 | \ +CPU_FTRS_POWER10_DD1) #endif /* CONFIG_CPU_LITTLE_ENDIAN */ #endif #else diff --git a/arch/powerpc/kernel/dt_cpu_ftrs.c b/arch/powerpc/kernel/dt_cpu_ftrs.c index 1098863e17ee..b2327f2967ff 100644 --- a/arch/powerpc/kernel/dt_cpu_ftrs.c +++ b/arch/powerpc/kernel/dt_cpu_ftrs.c @@ -811,6 +811,9 @@ static __init void cpufeatures_cpu_quirks(void) } update_tlbie_feature_flag(version); + + if ((version & 0x) == 0x00800100) + cur_cpu_spec->cpu_features |= CPU_FTR_POWER10_DD1; } static void __init cpufeatures_setup_finished(void) diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c index c1545f22c077..c778c81284f7 100644 --- a/arch/powerpc/kernel/prom.c +++ b/arch/powerpc/kernel/prom.c @@ -305,6 +305,14 @@ static void __init check_cpu_feature_properties(unsigned long node) } } +static void __init fixup_cpu_features(void) +{ + unsigned long version = mfspr(SPRN_PVR); + + if ((version & 0x) == 0x00800100) + cur_cpu_spec->cpu_features |= CPU_FTR_POWER10_DD1; +} + static int __init early_init_dt_scan_cpus(unsigned long node, const char *uname, int depth, void *data) @@ -378,6 +386,7 @@ static int __init early_init_dt_scan_cpus(unsigned long node, check_cpu_feature_properties(node); check_cpu_pa_features(node); + fixup_cpu_features(); } identical_pvr_fixup(node); -- 2.25.1
[PATCH 2/2] powerpc/watchpoint: Workaround P10 DD1 issue with VSX-32 byte instructions
POWER10 DD1 has an issue where it generates watchpoint exceptions when it shouldn't. The conditions where this occur are: - octword op - ending address of DAWR range is less than starting address of op - those addresses need to be in the same or in two consecutive 512B blocks - 'op address + 64B' generates an address that has a carry into bit 52 (crosses 2K boundary) Handle such spurious exception by considering them as extraneous and emulating/single-steeping instruction without generating an event. Signed-off-by: Ravi Bangoria --- Dependency: VSX-32 byte emulation support patches https://lore.kernel.org/r/20201011050908.72173-1-ravi.bango...@linux.ibm.com arch/powerpc/kernel/hw_breakpoint.c | 69 - 1 file changed, 67 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/kernel/hw_breakpoint.c b/arch/powerpc/kernel/hw_breakpoint.c index f4e8f21046f5..4514745d27c3 100644 --- a/arch/powerpc/kernel/hw_breakpoint.c +++ b/arch/powerpc/kernel/hw_breakpoint.c @@ -499,6 +499,11 @@ static bool is_larx_stcx_instr(int type) return type == LARX || type == STCX; } +static bool is_octword_vsx_instr(int type, int size) +{ + return ((type == LOAD_VSX || type == STORE_VSX) && size == 32); +} + /* * We've failed in reliably handling the hw-breakpoint. Unregister * it and throw a warning message to let the user know about it. @@ -549,6 +554,60 @@ static bool stepping_handler(struct pt_regs *regs, struct perf_event **bp, return true; } +static void handle_p10dd1_spurious_exception(struct arch_hw_breakpoint **info, +int *hit, unsigned long ea) +{ + int i; + unsigned long hw_start_addr; + unsigned long hw_end_addr; + + /* +* Handle spurious exception only when any bp_per_reg is set. +* Otherwise this might be created by xmon and not actually a +* spurious exception. +*/ + for (i = 0; i < nr_wp_slots(); i++) { + if (!info[i]) + continue; + + hw_start_addr = ALIGN_DOWN(info[i]->address, HW_BREAKPOINT_SIZE); + hw_end_addr = ALIGN(info[i]->address + info[i]->len, HW_BREAKPOINT_SIZE); + + /* +* Ending address of DAWR range is less than starting +* address of op. +*/ + if ((hw_end_addr - 1) >= ea) + continue; + + /* +* Those addresses need to be in the same or in two +* consecutive 512B blocks; +*/ + if (((hw_end_addr - 1) >> 10) != (ea >> 10)) + continue; + + /* +* 'op address + 64B' generates an address that has a +* carry into bit 52 (crosses 2K boundary). +*/ + if ((ea & 0x800) == ((ea + 64) & 0x800)) + continue; + + break; + } + + if (i == nr_wp_slots()) + return; + + for (i = 0; i < nr_wp_slots(); i++) { + if (info[i]) { + hit[i] = 1; + info[i]->type |= HW_BRK_TYPE_EXTRANEOUS_IRQ; + } + } +} + int hw_breakpoint_handler(struct die_args *args) { bool err = false; @@ -607,8 +666,14 @@ int hw_breakpoint_handler(struct die_args *args) goto reset; if (!nr_hit) { - rc = NOTIFY_DONE; - goto out; + if (cpu_has_feature(CPU_FTR_POWER10_DD1) && + !IS_ENABLED(CONFIG_PPC_8xx) && + is_octword_vsx_instr(type, size)) { + handle_p10dd1_spurious_exception(info, hit, ea); + } else { + rc = NOTIFY_DONE; + goto out; + } } /* -- 2.25.1
Re: [PATCH v5 1/5] powerpc/sstep: Emulate prefixed instructions only when CPU_FTR_ARCH_31 is set
Hi Daniel, On 10/12/20 7:14 PM, Daniel Axtens wrote: Hi, To review this, I looked through the supported instructions to see if there were any that I thought might have been missed. I didn't find any other v3.1 ones, although I don't have a v3.1 ISA to hand so I was basically looking for instructions I didn't recognise. I did, however, find a number of instructions that are new in ISA 3.0 that aren't guarded: - addpcis - lxvl/stxvl - lxvll/stxvll - lxvwsx - stxvx - lxsipzx - lxvh8x - lxsihzx - lxvb16x/stxvb16x - stxsibx - stxsihx - lxvb16x - lxsd/stxsd - lxssp/stxssp - lxv/stxv Also, I don't know how bothered we are about P7, but if I'm reading the ISA correctly, lqarx/stqcx. are not supported before ISA 2.07. Likewise a number of the vector instructions like lxsiwzx and lxsiwax (and the companion stores). I realise it's not really the point of this particular patch, so I don't think this should block acceptance. What I would like to know - and maybe this is something where we need mpe to weigh in - is whether we need consistent guards for 2.07 and 3.0. We have some 3.0 guards already but clearly not everywhere. Yes, those needs to be handled properly. Otherwise they can be harmful when emulated on a non-supporting platform. Will work on it when I get some time. Thanks for reporting it. With all that said - the patch does what it says it does, and looks good to me: Reviewed-by: Daniel Axtens Thanks! Ravi
Re: [PATCH v5 1/5] powerpc/sstep: Emulate prefixed instructions only when CPU_FTR_ARCH_31 is set
Hi Daniel, On 10/12/20 7:21 AM, Daniel Axtens wrote: Hi, Apologies if this has come up in a previous revision. case 1: + if (!cpu_has_feature(CPU_FTR_ARCH_31)) + return -1; + prefix_r = GET_PREFIX_R(word); ra = GET_PREFIX_RA(suffix); The comment above analyse_instr reads in part: * Return value is 1 if the instruction can be emulated just by * updating *regs with the information in *op, -1 if we need the * GPRs but *regs doesn't contain the full register set, or 0 * otherwise. I was wondering why returning -1 if the instruction isn't supported the right thing to do - it seemed to me that it should return 0? I did look and see that there are other cases where the code returns -1 for an unsupported operation, e.g.: #ifdef __powerpc64__ case 4: if (!cpu_has_feature(CPU_FTR_ARCH_300)) return -1; switch (word & 0x3f) { case 48:/* maddhd */ That's from commit 930d6288a267 ("powerpc: sstep: Add support for maddhd, maddhdu, maddld instructions"), but it's not explained there either I see the same pattern in a number of commits: commit 6324320de609 ("powerpc sstep: Add support for modsd, modud instructions"), commit 6c180071509a ("powerpc sstep: Add support for modsw, moduw instructions"), commit a23987ef267a ("powerpc: sstep: Add support for darn instruction") and a few others, all of which seem to have come through Sandipan in February 2019. I haven't spotted any explanation for why -1 was chosen, but I haven't checked the mailing list archives. If -1 is OK, would it be possible to update the comment to explain why? Agreed. As you rightly pointed out, there are many more such cases and, yes, we are aware of this issue and it's being worked upon as a separate patch. (Sandipan did send a fix patch internally some time back). Thanks for pointing it out! Ravi
[PATCH v5 5/5] powerpc/sstep: Add testcases for VSX vector paired load/store instructions
From: Balamuruhan S Add testcases for VSX vector paired load/store instructions. Sample o/p: emulate_step_test: lxvp : PASS emulate_step_test: stxvp : PASS emulate_step_test: lxvpx : PASS emulate_step_test: stxvpx : PASS emulate_step_test: plxvp : PASS emulate_step_test: pstxvp : PASS Signed-off-by: Balamuruhan S Signed-off-by: Ravi Bangoria --- arch/powerpc/lib/test_emulate_step.c | 270 +++ 1 file changed, 270 insertions(+) diff --git a/arch/powerpc/lib/test_emulate_step.c b/arch/powerpc/lib/test_emulate_step.c index 0a201b771477..783d1b85ecfe 100644 --- a/arch/powerpc/lib/test_emulate_step.c +++ b/arch/powerpc/lib/test_emulate_step.c @@ -612,6 +612,273 @@ static void __init test_lxvd2x_stxvd2x(void) } #endif /* CONFIG_VSX */ +#ifdef CONFIG_VSX +static void __init test_lxvp_stxvp(void) +{ + struct pt_regs regs; + union { + vector128 a; + u32 b[4]; + } c[2]; + u32 cached_b[8]; + int stepped = -1; + + if (!cpu_has_feature(CPU_FTR_ARCH_31)) { + show_result("lxvp", "SKIP (!CPU_FTR_ARCH_31)"); + show_result("stxvp", "SKIP (!CPU_FTR_ARCH_31)"); + return; + } + + init_pt_regs(); + + /*** lxvp ***/ + + cached_b[0] = c[0].b[0] = 18233; + cached_b[1] = c[0].b[1] = 34863571; + cached_b[2] = c[0].b[2] = 834; + cached_b[3] = c[0].b[3] = 6138911; + cached_b[4] = c[1].b[0] = 1234; + cached_b[5] = c[1].b[1] = 5678; + cached_b[6] = c[1].b[2] = 91011; + cached_b[7] = c[1].b[3] = 121314; + + regs.gpr[4] = (unsigned long)[0].a; + + /* +* lxvp XTp,DQ(RA) +* XTp = 32xTX + 2xTp +* let TX=1 Tp=1 RA=4 DQ=0 +*/ + stepped = emulate_step(, ppc_inst(PPC_RAW_LXVP(34, 4, 0))); + + if (stepped == 1 && cpu_has_feature(CPU_FTR_VSX)) { + show_result("lxvp", "PASS"); + } else { + if (!cpu_has_feature(CPU_FTR_VSX)) + show_result("lxvp", "PASS (!CPU_FTR_VSX)"); + else + show_result("lxvp", "FAIL"); + } + + /*** stxvp ***/ + + c[0].b[0] = 21379463; + c[0].b[1] = 87; + c[0].b[2] = 374234; + c[0].b[3] = 4; + c[1].b[0] = 90; + c[1].b[1] = 122; + c[1].b[2] = 555; + c[1].b[3] = 32144; + + /* +* stxvp XSp,DQ(RA) +* XSp = 32xSX + 2xSp +* let SX=1 Sp=1 RA=4 DQ=0 +*/ + stepped = emulate_step(, ppc_inst(PPC_RAW_STXVP(34, 4, 0))); + + if (stepped == 1 && cached_b[0] == c[0].b[0] && cached_b[1] == c[0].b[1] && + cached_b[2] == c[0].b[2] && cached_b[3] == c[0].b[3] && + cached_b[4] == c[1].b[0] && cached_b[5] == c[1].b[1] && + cached_b[6] == c[1].b[2] && cached_b[7] == c[1].b[3] && + cpu_has_feature(CPU_FTR_VSX)) { + show_result("stxvp", "PASS"); + } else { + if (!cpu_has_feature(CPU_FTR_VSX)) + show_result("stxvp", "PASS (!CPU_FTR_VSX)"); + else + show_result("stxvp", "FAIL"); + } +} +#else +static void __init test_lxvp_stxvp(void) +{ + show_result("lxvp", "SKIP (CONFIG_VSX is not set)"); + show_result("stxvp", "SKIP (CONFIG_VSX is not set)"); +} +#endif /* CONFIG_VSX */ + +#ifdef CONFIG_VSX +static void __init test_lxvpx_stxvpx(void) +{ + struct pt_regs regs; + union { + vector128 a; + u32 b[4]; + } c[2]; + u32 cached_b[8]; + int stepped = -1; + + if (!cpu_has_feature(CPU_FTR_ARCH_31)) { + show_result("lxvpx", "SKIP (!CPU_FTR_ARCH_31)"); + show_result("stxvpx", "SKIP (!CPU_FTR_ARCH_31)"); + return; + } + + init_pt_regs(); + + /*** lxvpx ***/ + + cached_b[0] = c[0].b[0] = 18233; + cached_b[1] = c[0].b[1] = 34863571; + cached_b[2] = c[0].b[2] = 834; + cached_b[3] = c[0].b[3] = 6138911; + cached_b[4] = c[1].b[0] = 1234; + cached_b[5] = c[1].b[1] = 5678; + cached_b[6] = c[1].b[2] = 91011; + cached_b[7] = c[1].b[3] = 121314; + + regs.gpr[3] = (unsigned long)[0].a; + regs.gpr[4] = 0; + + /* +* lxvpx XTp,RA,RB +* XTp = 32xTX + 2xTp +* let TX=1 Tp=1 RA=3 RB=4 +*/ + stepped = emulate_step(, ppc_inst(PPC_RAW_LXVPX(34, 3, 4))); + + if (stepped == 1 && cpu_has_feature(CPU_FTR_VSX)) { + show_result("lxvpx", "PASS&qu
[PATCH v5 4/5] powerpc/ppc-opcode: Add encoding macros for VSX vector paired instructions
From: Balamuruhan S Add instruction encodings, DQ, D0, D1 immediate, XTP, XSP operands as macros for new VSX vector paired instructions, * Load VSX Vector Paired (lxvp) * Load VSX Vector Paired Indexed (lxvpx) * Prefixed Load VSX Vector Paired (plxvp) * Store VSX Vector Paired (stxvp) * Store VSX Vector Paired Indexed (stxvpx) * Prefixed Store VSX Vector Paired (pstxvp) Suggested-by: Naveen N. Rao Signed-off-by: Balamuruhan S Signed-off-by: Ravi Bangoria --- arch/powerpc/include/asm/ppc-opcode.h | 13 + 1 file changed, 13 insertions(+) diff --git a/arch/powerpc/include/asm/ppc-opcode.h b/arch/powerpc/include/asm/ppc-opcode.h index a6e3700c4566..5e7918ca4fb7 100644 --- a/arch/powerpc/include/asm/ppc-opcode.h +++ b/arch/powerpc/include/asm/ppc-opcode.h @@ -78,6 +78,9 @@ #define IMM_L(i) ((uintptr_t)(i) & 0x) #define IMM_DS(i) ((uintptr_t)(i) & 0xfffc) +#define IMM_DQ(i) ((uintptr_t)(i) & 0xfff0) +#define IMM_D0(i) (((uintptr_t)(i) >> 16) & 0x3) +#define IMM_D1(i) IMM_L(i) /* * 16-bit immediate helper macros: HA() is for use with sign-extending instrs @@ -295,6 +298,8 @@ #define __PPC_XB(b)b) & 0x1f) << 11) | (((b) & 0x20) >> 4)) #define __PPC_XS(s)s) & 0x1f) << 21) | (((s) & 0x20) >> 5)) #define __PPC_XT(s)__PPC_XS(s) +#define __PPC_XSP(s) s) & 0x1e) | (((s) >> 5) & 0x1)) << 21) +#define __PPC_XTP(s) __PPC_XSP(s) #define __PPC_T_TLB(t) (((t) & 0x3) << 21) #define __PPC_WC(w)(((w) & 0x3) << 21) #define __PPC_WS(w)(((w) & 0x1f) << 11) @@ -395,6 +400,14 @@ #define PPC_RAW_XVCPSGNDP(t, a, b) ((0xf780 | VSX_XX3((t), (a), (b #define PPC_RAW_VPERMXOR(vrt, vra, vrb, vrc) \ ((0x102d | ___PPC_RT(vrt) | ___PPC_RA(vra) | ___PPC_RB(vrb) | (((vrc) & 0x1f) << 6))) +#define PPC_RAW_LXVP(xtp, a, i)(0x1800 | __PPC_XTP(xtp) | ___PPC_RA(a) | IMM_DQ(i)) +#define PPC_RAW_STXVP(xsp, a, i) (0x1801 | __PPC_XSP(xsp) | ___PPC_RA(a) | IMM_DQ(i)) +#define PPC_RAW_LXVPX(xtp, a, b) (0x7c00029a | __PPC_XTP(xtp) | ___PPC_RA(a) | ___PPC_RB(b)) +#define PPC_RAW_STXVPX(xsp, a, b) (0x7c00039a | __PPC_XSP(xsp) | ___PPC_RA(a) | ___PPC_RB(b)) +#define PPC_RAW_PLXVP(xtp, i, a, pr) \ + ((PPC_PREFIX_8LS | __PPC_PRFX_R(pr) | IMM_D0(i)) << 32 | (0xe800 | __PPC_XTP(xtp) | ___PPC_RA(a) | IMM_D1(i))) +#define PPC_RAW_PSTXVP(xsp, i, a, pr) \ + ((PPC_PREFIX_8LS | __PPC_PRFX_R(pr) | IMM_D0(i)) << 32 | (0xf800 | __PPC_XSP(xsp) | ___PPC_RA(a) | IMM_D1(i))) #define PPC_RAW_NAP(0x4c000364) #define PPC_RAW_SLEEP (0x4c0003a4) #define PPC_RAW_WINKLE (0x4c0003e4) -- 2.26.2
[PATCH v5 3/5] powerpc/sstep: Support VSX vector paired storage access instructions
From: Balamuruhan S VSX Vector Paired instructions loads/stores an octword (32 bytes) from/to storage into two sequential VSRs. Add emulation support for these new instructions: * Load VSX Vector Paired (lxvp) * Load VSX Vector Paired Indexed (lxvpx) * Prefixed Load VSX Vector Paired (plxvp) * Store VSX Vector Paired (stxvp) * Store VSX Vector Paired Indexed (stxvpx) * Prefixed Store VSX Vector Paired (pstxvp) Suggested-by: Naveen N. Rao Signed-off-by: Balamuruhan S Signed-off-by: Ravi Bangoria [kernel test robot reported a build failure] Reported-by: kernel test robot --- arch/powerpc/lib/sstep.c | 150 +-- 1 file changed, 129 insertions(+), 21 deletions(-) diff --git a/arch/powerpc/lib/sstep.c b/arch/powerpc/lib/sstep.c index faf0bbf3efb7..96ca813a65e7 100644 --- a/arch/powerpc/lib/sstep.c +++ b/arch/powerpc/lib/sstep.c @@ -32,6 +32,10 @@ extern char system_call_vectored_emulate[]; #define XER_OV32 0x0008U #define XER_CA32 0x0004U +#ifdef CONFIG_VSX +#define VSX_REGISTER_XTP(rd) rd) & 1) << 5) | ((rd) & 0xfe)) +#endif + #ifdef CONFIG_PPC_FPU /* * Functions in ldstfp.S @@ -279,6 +283,19 @@ static nokprobe_inline void do_byte_reverse(void *ptr, int nb) up[1] = tmp; break; } + case 32: { + unsigned long *up = (unsigned long *)ptr; + unsigned long tmp; + + tmp = byterev_8(up[0]); + up[0] = byterev_8(up[3]); + up[3] = tmp; + tmp = byterev_8(up[2]); + up[2] = byterev_8(up[1]); + up[1] = tmp; + break; + } + #endif default: WARN_ON_ONCE(1); @@ -709,6 +726,8 @@ void emulate_vsx_load(struct instruction_op *op, union vsx_reg *reg, reg->d[0] = reg->d[1] = 0; switch (op->element_size) { + case 32: + /* [p]lxvp[x] */ case 16: /* whole vector; lxv[x] or lxvl[l] */ if (size == 0) @@ -717,7 +736,7 @@ void emulate_vsx_load(struct instruction_op *op, union vsx_reg *reg, if (IS_LE && (op->vsx_flags & VSX_LDLEFT)) rev = !rev; if (rev) - do_byte_reverse(reg, 16); + do_byte_reverse(reg, size); break; case 8: /* scalar loads, lxvd2x, lxvdsx */ @@ -793,6 +812,20 @@ void emulate_vsx_store(struct instruction_op *op, const union vsx_reg *reg, size = GETSIZE(op->type); switch (op->element_size) { + case 32: + /* [p]stxvp[x] */ + if (size == 0) + break; + if (rev) { + /* reverse 32 bytes */ + buf.d[0] = byterev_8(reg->d[3]); + buf.d[1] = byterev_8(reg->d[2]); + buf.d[2] = byterev_8(reg->d[1]); + buf.d[3] = byterev_8(reg->d[0]); + reg = + } + memcpy(mem, reg, size); + break; case 16: /* stxv, stxvx, stxvl, stxvll */ if (size == 0) @@ -861,28 +894,43 @@ static nokprobe_inline int do_vsx_load(struct instruction_op *op, bool cross_endian) { int reg = op->reg; - u8 mem[16]; - union vsx_reg buf; + int i, j, nr_vsx_regs; + u8 mem[32]; + union vsx_reg buf[2]; int size = GETSIZE(op->type); if (!address_ok(regs, ea, size) || copy_mem_in(mem, ea, size, regs)) return -EFAULT; - emulate_vsx_load(op, , mem, cross_endian); + nr_vsx_regs = size / sizeof(__vector128); + emulate_vsx_load(op, buf, mem, cross_endian); preempt_disable(); if (reg < 32) { /* FP regs + extensions */ if (regs->msr & MSR_FP) { - load_vsrn(reg, ); + for (i = 0; i < nr_vsx_regs; i++) { + j = IS_LE ? nr_vsx_regs - i - 1 : i; + load_vsrn(reg + i, [j].v); + } } else { - current->thread.fp_state.fpr[reg][0] = buf.d[0]; - current->thread.fp_state.fpr[reg][1] = buf.d[1]; + for (i = 0; i < nr_vsx_regs; i++) { + j = IS_LE ? nr_vsx_regs - i - 1 : i; + current->thread.fp_state.fpr[reg + i][0] = buf[j].d[0]; + current->thread.fp_state.fpr[reg + i][1] = buf[j].d[1]; + } } } else { - if (regs->msr & MSR_VEC) - load_vsrn(r
[PATCH v5 2/5] powerpc/sstep: Cover new VSX instructions under CONFIG_VSX
Recently added Power10 prefixed VSX instruction are included unconditionally in the kernel. If they are executed on a machine without VSX support, it might create issues. Fix that. Also fix one mnemonics spelling mistake in comment. Fixes: 50b80a12e4cc ("powerpc sstep: Add support for prefixed load/stores") Signed-off-by: Ravi Bangoria --- arch/powerpc/lib/sstep.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/lib/sstep.c b/arch/powerpc/lib/sstep.c index e6242744c71b..faf0bbf3efb7 100644 --- a/arch/powerpc/lib/sstep.c +++ b/arch/powerpc/lib/sstep.c @@ -2757,6 +2757,7 @@ int analyse_instr(struct instruction_op *op, const struct pt_regs *regs, case 41:/* plwa */ op->type = MKOP(LOAD, PREFIXED | SIGNEXT, 4); break; +#ifdef CONFIG_VSX case 42:/* plxsd */ op->reg = rd + 32; op->type = MKOP(LOAD_VSX, PREFIXED, 8); @@ -2797,13 +2798,14 @@ int analyse_instr(struct instruction_op *op, const struct pt_regs *regs, op->element_size = 16; op->vsx_flags = VSX_CHECK_VEC; break; +#endif /* CONFIG_VSX */ case 56:/* plq */ op->type = MKOP(LOAD, PREFIXED, 16); break; case 57:/* pld */ op->type = MKOP(LOAD, PREFIXED, 8); break; - case 60:/* stq */ + case 60:/* pstq */ op->type = MKOP(STORE, PREFIXED, 16); break; case 61:/* pstd */ -- 2.26.2
[PATCH v5 1/5] powerpc/sstep: Emulate prefixed instructions only when CPU_FTR_ARCH_31 is set
From: Balamuruhan S Unconditional emulation of prefixed instructions will allow emulation of them on Power10 predecessors which might cause issues. Restrict that. Fixes: 3920742b92f5 ("powerpc sstep: Add support for prefixed fixed-point arithmetic") Fixes: 50b80a12e4cc ("powerpc sstep: Add support for prefixed load/stores") Signed-off-by: Balamuruhan S Signed-off-by: Ravi Bangoria --- arch/powerpc/lib/sstep.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/arch/powerpc/lib/sstep.c b/arch/powerpc/lib/sstep.c index e9dcaba9a4f8..e6242744c71b 100644 --- a/arch/powerpc/lib/sstep.c +++ b/arch/powerpc/lib/sstep.c @@ -1346,6 +1346,9 @@ int analyse_instr(struct instruction_op *op, const struct pt_regs *regs, switch (opcode) { #ifdef __powerpc64__ case 1: + if (!cpu_has_feature(CPU_FTR_ARCH_31)) + return -1; + prefix_r = GET_PREFIX_R(word); ra = GET_PREFIX_RA(suffix); rd = (suffix >> 21) & 0x1f; @@ -2733,6 +2736,9 @@ int analyse_instr(struct instruction_op *op, const struct pt_regs *regs, } break; case 1: /* Prefixed instructions */ + if (!cpu_has_feature(CPU_FTR_ARCH_31)) + return -1; + prefix_r = GET_PREFIX_R(word); ra = GET_PREFIX_RA(suffix); op->update_reg = ra; -- 2.26.2
[PATCH v5 0/5] powerpc/sstep: VSX 32-byte vector paired load/store instructions
VSX vector paired instructions operates with octword (32-byte) operand for loads and stores between storage and a pair of two sequential Vector-Scalar Registers (VSRs). There are 4 word instructions and 2 prefixed instructions that provides this 32-byte storage access operations - lxvp, lxvpx, stxvp, stxvpx, plxvp, pstxvp. Emulation infrastructure doesn't have support for these instructions, to operate with 32-byte storage access and to operate with 2 VSX registers. This patch series enables the instruction emulation support and adds test cases for them respectively. v4: https://lore.kernel.org/r/20201008072726.233086-1-ravi.bango...@linux.ibm.com Changes in v5: - * Fix build breakage reported by Kernel test robote * Patch #2 is new. CONFIG_VSX check was missing for some VSX instructions. Patch #2 adds that check. Changes in v4: - * Patch #1 is (kind of) new. * Patch #2 now enables both analyse_instr() and emulate_step() unlike prev series where both were in separate patches. * Patch #2 also has important fix for emulation on LE. * Patch #3 and #4. Added XSP/XTP, D0/D1 instruction operands, removed *_EX_OP, __PPC_T[P][X] macros which are incorrect, and adhered to PPC_RAW_* convention. * Added `CPU_FTR_ARCH_31` check in testcases to avoid failing in p8/p9. * Some consmetic changes. * Rebased to powerpc/next Changes in v3: - Worked on review comments and suggestions from Ravi and Naveen, * Fix the do_vsx_load() to handle vsx instructions if MSR_FP/MSR_VEC cleared in exception conditions and it reaches to read/write to thread_struct member fp_state/vr_state respectively. * Fix wrongly used `__vector128 v[2]` in struct vsx_reg as it should hold a single vsx register size. * Remove unnecessary `VSX_CHECK_VEC` flag set and condition to check `VSX_LDLEFT` that is not applicable for these vsx instructions. * Fix comments in emulate_vsx_load() that were misleading. * Rebased on latest powerpc next branch. Changes in v2: - * Fix suggestion from Sandipan, wrap ISA 3.1 instructions with cpu_has_feature(CPU_FTR_ARCH_31) check. Balamuruhan S (4): powerpc/sstep: Emulate prefixed instructions only when CPU_FTR_ARCH_31 is set powerpc/sstep: Support VSX vector paired storage access instructions powerpc/ppc-opcode: Add encoding macros for VSX vector paired instructions powerpc/sstep: Add testcases for VSX vector paired load/store instructions Ravi Bangoria (1): powerpc/sstep: Cover new VSX instructions under CONFIG_VSX arch/powerpc/include/asm/ppc-opcode.h | 13 ++ arch/powerpc/lib/sstep.c | 160 --- arch/powerpc/lib/test_emulate_step.c | 270 ++ 3 files changed, 421 insertions(+), 22 deletions(-) -- 2.26.2
[PATCH v4 4/4] powerpc/sstep: Add testcases for VSX vector paired load/store instructions
From: Balamuruhan S Add testcases for VSX vector paired load/store instructions. Sample o/p: emulate_step_test: lxvp : PASS emulate_step_test: stxvp : PASS emulate_step_test: lxvpx : PASS emulate_step_test: stxvpx : PASS emulate_step_test: plxvp : PASS emulate_step_test: pstxvp : PASS Signed-off-by: Balamuruhan S Signed-off-by: Ravi Bangoria --- arch/powerpc/lib/test_emulate_step.c | 270 +++ 1 file changed, 270 insertions(+) diff --git a/arch/powerpc/lib/test_emulate_step.c b/arch/powerpc/lib/test_emulate_step.c index 0a201b771477..783d1b85ecfe 100644 --- a/arch/powerpc/lib/test_emulate_step.c +++ b/arch/powerpc/lib/test_emulate_step.c @@ -612,6 +612,273 @@ static void __init test_lxvd2x_stxvd2x(void) } #endif /* CONFIG_VSX */ +#ifdef CONFIG_VSX +static void __init test_lxvp_stxvp(void) +{ + struct pt_regs regs; + union { + vector128 a; + u32 b[4]; + } c[2]; + u32 cached_b[8]; + int stepped = -1; + + if (!cpu_has_feature(CPU_FTR_ARCH_31)) { + show_result("lxvp", "SKIP (!CPU_FTR_ARCH_31)"); + show_result("stxvp", "SKIP (!CPU_FTR_ARCH_31)"); + return; + } + + init_pt_regs(); + + /*** lxvp ***/ + + cached_b[0] = c[0].b[0] = 18233; + cached_b[1] = c[0].b[1] = 34863571; + cached_b[2] = c[0].b[2] = 834; + cached_b[3] = c[0].b[3] = 6138911; + cached_b[4] = c[1].b[0] = 1234; + cached_b[5] = c[1].b[1] = 5678; + cached_b[6] = c[1].b[2] = 91011; + cached_b[7] = c[1].b[3] = 121314; + + regs.gpr[4] = (unsigned long)[0].a; + + /* +* lxvp XTp,DQ(RA) +* XTp = 32xTX + 2xTp +* let TX=1 Tp=1 RA=4 DQ=0 +*/ + stepped = emulate_step(, ppc_inst(PPC_RAW_LXVP(34, 4, 0))); + + if (stepped == 1 && cpu_has_feature(CPU_FTR_VSX)) { + show_result("lxvp", "PASS"); + } else { + if (!cpu_has_feature(CPU_FTR_VSX)) + show_result("lxvp", "PASS (!CPU_FTR_VSX)"); + else + show_result("lxvp", "FAIL"); + } + + /*** stxvp ***/ + + c[0].b[0] = 21379463; + c[0].b[1] = 87; + c[0].b[2] = 374234; + c[0].b[3] = 4; + c[1].b[0] = 90; + c[1].b[1] = 122; + c[1].b[2] = 555; + c[1].b[3] = 32144; + + /* +* stxvp XSp,DQ(RA) +* XSp = 32xSX + 2xSp +* let SX=1 Sp=1 RA=4 DQ=0 +*/ + stepped = emulate_step(, ppc_inst(PPC_RAW_STXVP(34, 4, 0))); + + if (stepped == 1 && cached_b[0] == c[0].b[0] && cached_b[1] == c[0].b[1] && + cached_b[2] == c[0].b[2] && cached_b[3] == c[0].b[3] && + cached_b[4] == c[1].b[0] && cached_b[5] == c[1].b[1] && + cached_b[6] == c[1].b[2] && cached_b[7] == c[1].b[3] && + cpu_has_feature(CPU_FTR_VSX)) { + show_result("stxvp", "PASS"); + } else { + if (!cpu_has_feature(CPU_FTR_VSX)) + show_result("stxvp", "PASS (!CPU_FTR_VSX)"); + else + show_result("stxvp", "FAIL"); + } +} +#else +static void __init test_lxvp_stxvp(void) +{ + show_result("lxvp", "SKIP (CONFIG_VSX is not set)"); + show_result("stxvp", "SKIP (CONFIG_VSX is not set)"); +} +#endif /* CONFIG_VSX */ + +#ifdef CONFIG_VSX +static void __init test_lxvpx_stxvpx(void) +{ + struct pt_regs regs; + union { + vector128 a; + u32 b[4]; + } c[2]; + u32 cached_b[8]; + int stepped = -1; + + if (!cpu_has_feature(CPU_FTR_ARCH_31)) { + show_result("lxvpx", "SKIP (!CPU_FTR_ARCH_31)"); + show_result("stxvpx", "SKIP (!CPU_FTR_ARCH_31)"); + return; + } + + init_pt_regs(); + + /*** lxvpx ***/ + + cached_b[0] = c[0].b[0] = 18233; + cached_b[1] = c[0].b[1] = 34863571; + cached_b[2] = c[0].b[2] = 834; + cached_b[3] = c[0].b[3] = 6138911; + cached_b[4] = c[1].b[0] = 1234; + cached_b[5] = c[1].b[1] = 5678; + cached_b[6] = c[1].b[2] = 91011; + cached_b[7] = c[1].b[3] = 121314; + + regs.gpr[3] = (unsigned long)[0].a; + regs.gpr[4] = 0; + + /* +* lxvpx XTp,RA,RB +* XTp = 32xTX + 2xTp +* let TX=1 Tp=1 RA=3 RB=4 +*/ + stepped = emulate_step(, ppc_inst(PPC_RAW_LXVPX(34, 3, 4))); + + if (stepped == 1 && cpu_has_feature(CPU_FTR_VSX)) { + show_result("lxvpx", "PASS&qu
[PATCH v4 3/4] powerpc/ppc-opcode: Add encoding macros for VSX vector paired instructions
From: Balamuruhan S Add instruction encodings, DQ, D0, D1 immediate, XTP, XSP operands as macros for new VSX vector paired instructions, * Load VSX Vector Paired (lxvp) * Load VSX Vector Paired Indexed (lxvpx) * Prefixed Load VSX Vector Paired (plxvp) * Store VSX Vector Paired (stxvp) * Store VSX Vector Paired Indexed (stxvpx) * Prefixed Store VSX Vector Paired (pstxvp) Suggested-by: Naveen N. Rao Signed-off-by: Balamuruhan S Signed-off-by: Ravi Bangoria --- arch/powerpc/include/asm/ppc-opcode.h | 13 + 1 file changed, 13 insertions(+) diff --git a/arch/powerpc/include/asm/ppc-opcode.h b/arch/powerpc/include/asm/ppc-opcode.h index a6e3700c4566..5e7918ca4fb7 100644 --- a/arch/powerpc/include/asm/ppc-opcode.h +++ b/arch/powerpc/include/asm/ppc-opcode.h @@ -78,6 +78,9 @@ #define IMM_L(i) ((uintptr_t)(i) & 0x) #define IMM_DS(i) ((uintptr_t)(i) & 0xfffc) +#define IMM_DQ(i) ((uintptr_t)(i) & 0xfff0) +#define IMM_D0(i) (((uintptr_t)(i) >> 16) & 0x3) +#define IMM_D1(i) IMM_L(i) /* * 16-bit immediate helper macros: HA() is for use with sign-extending instrs @@ -295,6 +298,8 @@ #define __PPC_XB(b)b) & 0x1f) << 11) | (((b) & 0x20) >> 4)) #define __PPC_XS(s)s) & 0x1f) << 21) | (((s) & 0x20) >> 5)) #define __PPC_XT(s)__PPC_XS(s) +#define __PPC_XSP(s) s) & 0x1e) | (((s) >> 5) & 0x1)) << 21) +#define __PPC_XTP(s) __PPC_XSP(s) #define __PPC_T_TLB(t) (((t) & 0x3) << 21) #define __PPC_WC(w)(((w) & 0x3) << 21) #define __PPC_WS(w)(((w) & 0x1f) << 11) @@ -395,6 +400,14 @@ #define PPC_RAW_XVCPSGNDP(t, a, b) ((0xf780 | VSX_XX3((t), (a), (b #define PPC_RAW_VPERMXOR(vrt, vra, vrb, vrc) \ ((0x102d | ___PPC_RT(vrt) | ___PPC_RA(vra) | ___PPC_RB(vrb) | (((vrc) & 0x1f) << 6))) +#define PPC_RAW_LXVP(xtp, a, i)(0x1800 | __PPC_XTP(xtp) | ___PPC_RA(a) | IMM_DQ(i)) +#define PPC_RAW_STXVP(xsp, a, i) (0x1801 | __PPC_XSP(xsp) | ___PPC_RA(a) | IMM_DQ(i)) +#define PPC_RAW_LXVPX(xtp, a, b) (0x7c00029a | __PPC_XTP(xtp) | ___PPC_RA(a) | ___PPC_RB(b)) +#define PPC_RAW_STXVPX(xsp, a, b) (0x7c00039a | __PPC_XSP(xsp) | ___PPC_RA(a) | ___PPC_RB(b)) +#define PPC_RAW_PLXVP(xtp, i, a, pr) \ + ((PPC_PREFIX_8LS | __PPC_PRFX_R(pr) | IMM_D0(i)) << 32 | (0xe800 | __PPC_XTP(xtp) | ___PPC_RA(a) | IMM_D1(i))) +#define PPC_RAW_PSTXVP(xsp, i, a, pr) \ + ((PPC_PREFIX_8LS | __PPC_PRFX_R(pr) | IMM_D0(i)) << 32 | (0xf800 | __PPC_XSP(xsp) | ___PPC_RA(a) | IMM_D1(i))) #define PPC_RAW_NAP(0x4c000364) #define PPC_RAW_SLEEP (0x4c0003a4) #define PPC_RAW_WINKLE (0x4c0003e4) -- 2.26.2
[PATCH v4 2/4] powerpc/sstep: Support VSX vector paired storage access instructions
From: Balamuruhan S VSX Vector Paired instructions loads/stores an octword (32 bytes) from/to storage into two sequential VSRs. Add emulation support for these new instructions: * Load VSX Vector Paired (lxvp) * Load VSX Vector Paired Indexed (lxvpx) * Prefixed Load VSX Vector Paired (plxvp) * Store VSX Vector Paired (stxvp) * Store VSX Vector Paired Indexed (stxvpx) * Prefixed Store VSX Vector Paired (pstxvp) Suggested-by: Naveen N. Rao Signed-off-by: Balamuruhan S Signed-off-by: Ravi Bangoria --- arch/powerpc/lib/sstep.c | 146 +-- 1 file changed, 125 insertions(+), 21 deletions(-) diff --git a/arch/powerpc/lib/sstep.c b/arch/powerpc/lib/sstep.c index e6242744c71b..e39ee1651636 100644 --- a/arch/powerpc/lib/sstep.c +++ b/arch/powerpc/lib/sstep.c @@ -32,6 +32,10 @@ extern char system_call_vectored_emulate[]; #define XER_OV32 0x0008U #define XER_CA32 0x0004U +#ifdef CONFIG_VSX +#define VSX_REGISTER_XTP(rd) rd) & 1) << 5) | ((rd) & 0xfe)) +#endif + #ifdef CONFIG_PPC_FPU /* * Functions in ldstfp.S @@ -279,6 +283,19 @@ static nokprobe_inline void do_byte_reverse(void *ptr, int nb) up[1] = tmp; break; } + case 32: { + unsigned long *up = (unsigned long *)ptr; + unsigned long tmp; + + tmp = byterev_8(up[0]); + up[0] = byterev_8(up[3]); + up[3] = tmp; + tmp = byterev_8(up[2]); + up[2] = byterev_8(up[1]); + up[1] = tmp; + break; + } + #endif default: WARN_ON_ONCE(1); @@ -709,6 +726,8 @@ void emulate_vsx_load(struct instruction_op *op, union vsx_reg *reg, reg->d[0] = reg->d[1] = 0; switch (op->element_size) { + case 32: + /* [p]lxvp[x] */ case 16: /* whole vector; lxv[x] or lxvl[l] */ if (size == 0) @@ -717,7 +736,7 @@ void emulate_vsx_load(struct instruction_op *op, union vsx_reg *reg, if (IS_LE && (op->vsx_flags & VSX_LDLEFT)) rev = !rev; if (rev) - do_byte_reverse(reg, 16); + do_byte_reverse(reg, size); break; case 8: /* scalar loads, lxvd2x, lxvdsx */ @@ -793,6 +812,20 @@ void emulate_vsx_store(struct instruction_op *op, const union vsx_reg *reg, size = GETSIZE(op->type); switch (op->element_size) { + case 32: + /* [p]stxvp[x] */ + if (size == 0) + break; + if (rev) { + /* reverse 32 bytes */ + buf.d[0] = byterev_8(reg->d[3]); + buf.d[1] = byterev_8(reg->d[2]); + buf.d[2] = byterev_8(reg->d[1]); + buf.d[3] = byterev_8(reg->d[0]); + reg = + } + memcpy(mem, reg, size); + break; case 16: /* stxv, stxvx, stxvl, stxvll */ if (size == 0) @@ -861,28 +894,43 @@ static nokprobe_inline int do_vsx_load(struct instruction_op *op, bool cross_endian) { int reg = op->reg; - u8 mem[16]; - union vsx_reg buf; + int i, j, nr_vsx_regs; + u8 mem[32]; + union vsx_reg buf[2]; int size = GETSIZE(op->type); if (!address_ok(regs, ea, size) || copy_mem_in(mem, ea, size, regs)) return -EFAULT; - emulate_vsx_load(op, , mem, cross_endian); + nr_vsx_regs = size / sizeof(__vector128); + emulate_vsx_load(op, buf, mem, cross_endian); preempt_disable(); if (reg < 32) { /* FP regs + extensions */ if (regs->msr & MSR_FP) { - load_vsrn(reg, ); + for (i = 0; i < nr_vsx_regs; i++) { + j = IS_LE ? nr_vsx_regs - i - 1 : i; + load_vsrn(reg + i, [j].v); + } } else { - current->thread.fp_state.fpr[reg][0] = buf.d[0]; - current->thread.fp_state.fpr[reg][1] = buf.d[1]; + for (i = 0; i < nr_vsx_regs; i++) { + j = IS_LE ? nr_vsx_regs - i - 1 : i; + current->thread.fp_state.fpr[reg + i][0] = buf[j].d[0]; + current->thread.fp_state.fpr[reg + i][1] = buf[j].d[1]; + } } } else { - if (regs->msr & MSR_VEC) - load_vsrn(reg, ); - else - current->thread.vr_state.vr
[PATCH v4 0/4] powerpc/sstep: VSX 32-byte vector paired load/store instructions
VSX vector paired instructions operates with octword (32-byte) operand for loads and stores between storage and a pair of two sequential Vector-Scalar Registers (VSRs). There are 4 word instructions and 2 prefixed instructions that provides this 32-byte storage access operations - lxvp, lxvpx, stxvp, stxvpx, plxvp, pstxvp. Emulation infrastructure doesn't have support for these instructions, to operate with 32-byte storage access and to operate with 2 VSX registers. This patch series enables the instruction emulation support and adds test cases for them respectively. v3: https://lore.kernel.org/linuxppc-dev/20200731081637.1837559-1-bal...@linux.ibm.com/ Changes in v4: - * Patch #1 is (kind of) new. * Patch #2 now enables both analyse_instr() and emulate_step() unlike prev series where both were in separate patches. * Patch #2 also has important fix for emulation on LE. * Patch #3 and #4. Added XSP/XTP, D0/D1 instruction operands, removed *_EX_OP, __PPC_T[P][X] macros which are incorrect, and adhered to PPC_RAW_* convention. * Added `CPU_FTR_ARCH_31` check in testcases to avoid failing in p8/p9. * Some consmetic changes. * Rebased to powerpc/next Changes in v3: - Worked on review comments and suggestions from Ravi and Naveen, * Fix the do_vsx_load() to handle vsx instructions if MSR_FP/MSR_VEC cleared in exception conditions and it reaches to read/write to thread_struct member fp_state/vr_state respectively. * Fix wrongly used `__vector128 v[2]` in struct vsx_reg as it should hold a single vsx register size. * Remove unnecessary `VSX_CHECK_VEC` flag set and condition to check `VSX_LDLEFT` that is not applicable for these vsx instructions. * Fix comments in emulate_vsx_load() that were misleading. * Rebased on latest powerpc next branch. Changes in v2: - * Fix suggestion from Sandipan, wrap ISA 3.1 instructions with cpu_has_feature(CPU_FTR_ARCH_31) check. * Rebase on latest powerpc next branch. Balamuruhan S (4): powerpc/sstep: Emulate prefixed instructions only when CPU_FTR_ARCH_31 is set powerpc/sstep: Support VSX vector paired storage access instructions powerpc/ppc-opcode: Add encoding macros for VSX vector paired instructions powerpc/sstep: Add testcases for VSX vector paired load/store instructions arch/powerpc/include/asm/ppc-opcode.h | 13 ++ arch/powerpc/lib/sstep.c | 152 +-- arch/powerpc/lib/test_emulate_step.c | 270 ++ 3 files changed, 414 insertions(+), 21 deletions(-) -- 2.26.2
[PATCH v4 1/4] powerpc/sstep: Emulate prefixed instructions only when CPU_FTR_ARCH_31 is set
From: Balamuruhan S Unconditional emulation of prefixed instructions will allow emulation of them on Power10 predecessors which might cause issues. Restrict that. Signed-off-by: Balamuruhan S Signed-off-by: Ravi Bangoria --- arch/powerpc/lib/sstep.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/arch/powerpc/lib/sstep.c b/arch/powerpc/lib/sstep.c index e9dcaba9a4f8..e6242744c71b 100644 --- a/arch/powerpc/lib/sstep.c +++ b/arch/powerpc/lib/sstep.c @@ -1346,6 +1346,9 @@ int analyse_instr(struct instruction_op *op, const struct pt_regs *regs, switch (opcode) { #ifdef __powerpc64__ case 1: + if (!cpu_has_feature(CPU_FTR_ARCH_31)) + return -1; + prefix_r = GET_PREFIX_R(word); ra = GET_PREFIX_RA(suffix); rd = (suffix >> 21) & 0x1f; @@ -2733,6 +2736,9 @@ int analyse_instr(struct instruction_op *op, const struct pt_regs *regs, } break; case 1: /* Prefixed instructions */ + if (!cpu_has_feature(CPU_FTR_ARCH_31)) + return -1; + prefix_r = GET_PREFIX_R(word); ra = GET_PREFIX_RA(suffix); op->update_reg = ra; -- 2.26.2
Re: [PATCH v6 0/8] powerpc/watchpoint: Bug fixes plus new feature flag
On 9/17/20 6:54 PM, Rogerio Alves wrote: On 9/2/20 1:29 AM, Ravi Bangoria wrote: Patch #1 fixes issue for quardword instruction on p10 predecessors. Patch #2 fixes issue for vector instructions. Patch #3 fixes a bug about watchpoint not firing when created with ptrace PPC_PTRACE_SETHWDEBUG and CONFIG_HAVE_HW_BREAKPOINT=N. The fix uses HW_BRK_TYPE_PRIV_ALL for ptrace user which, I guess, should be fine because we don't leak any kernel addresses and PRIV_ALL will also help to cover scenarios when kernel accesses user memory. Patch #4,#5 fixes infinite exception bug, again the bug happens only with CONFIG_HAVE_HW_BREAKPOINT=N. Patch #6 fixes two places where we are missing to set hw_len. Patch #7 introduce new feature bit PPC_DEBUG_FEATURE_DATA_BP_ARCH_31 which will be set when running on ISA 3.1 compliant machine. Patch #8 finally adds selftest to test scenarios fixed by patch#2,#3 and also moves MODE_EXACT tests outside of BP_RANGE condition. [...] Tested this patch set for: - SETHWDEBUG when CONFIG_HAVE_HW_BREAKPOINT=N = OK - Fix exception handling for CONFIG_HAVE_HW_BREAKPOINT=N = OK - Check for PPC_DEBUG_FEATURE_DATA_BP_ARCH_31 = OK - Fix quarword instruction handling on p10 predecessors = OK - Fix handling of vector instructions = OK Also tested for: - Set second watchpoint (P10 Mambo) = OK - Infinity loop on sc instruction = OK Thanks Rogerio! Ravi
Re: [PATCH 0/7] powerpc/watchpoint: 2nd DAWR kvm enablement + selftests
Hi Paul, On 9/2/20 8:02 AM, Paul Mackerras wrote: On Thu, Jul 23, 2020 at 03:50:51PM +0530, Ravi Bangoria wrote: Patch #1, #2 and #3 enables p10 2nd DAWR feature for Book3S kvm guest. DAWR is a hypervisor resource and thus H_SET_MODE hcall is used to set/unset it. A new case H_SET_MODE_RESOURCE_SET_DAWR1 is introduced in H_SET_MODE hcall for setting/unsetting 2nd DAWR. Also, new capability KVM_CAP_PPC_DAWR1 has been added to query 2nd DAWR support via kvm ioctl. This feature also needs to be enabled in Qemu to really use it. I'll reply link to qemu patches once I post them in qemu-devel mailing list. Patch #4, #5, #6 and #7 adds selftests to test 2nd DAWR. If/when you resubmit these patches, please split the KVM patches into a separate series, since the KVM patches would go via my tree whereas I expect the selftests/powerpc patches would go through Michael Ellerman's tree. Sure. Will split it. Thanks, Ravi
Re: [PATCH 2/7] powerpc/watchpoint/kvm: Add infrastructure to support 2nd DAWR
Hi Paul, diff --git a/arch/powerpc/include/asm/hvcall.h b/arch/powerpc/include/asm/hvcall.h index 33793444144c..03f401d7be41 100644 --- a/arch/powerpc/include/asm/hvcall.h +++ b/arch/powerpc/include/asm/hvcall.h @@ -538,6 +538,8 @@ struct hv_guest_state { s64 tb_offset; u64 dawr0; u64 dawrx0; + u64 dawr1; + u64 dawrx1; u64 ciabr; u64 hdec_expiry; u64 purr; After this struct, there is a macro HV_GUEST_STATE_VERSION, I guess that also needs to be incremented because I'm adding new members in the struct? Thanks, Ravi
Re: [PATCH 1/7] powerpc/watchpoint/kvm: Rename current DAWR macros and variables
Hi Paul, On 9/2/20 7:19 AM, Paul Mackerras wrote: On Thu, Jul 23, 2020 at 03:50:52PM +0530, Ravi Bangoria wrote: Power10 is introducing second DAWR. Use real register names (with suffix 0) from ISA for current macros and variables used by kvm. Most of this looks fine, but I think we should not change the existing names in arch/powerpc/include/uapi/asm/kvm.h (and therefore also Documentation/virt/kvm/api.rst). Missed that I'm changing uapi. I'll rename only those macros/variables which are not uapi. Thanks, Ravi
Re: [PATCH 2/7] powerpc/watchpoint/kvm: Add infrastructure to support 2nd DAWR
Hi Paul, On 9/2/20 7:31 AM, Paul Mackerras wrote: On Thu, Jul 23, 2020 at 03:50:53PM +0530, Ravi Bangoria wrote: kvm code assumes single DAWR everywhere. Add code to support 2nd DAWR. DAWR is a hypervisor resource and thus H_SET_MODE hcall is used to set/ unset it. Introduce new case H_SET_MODE_RESOURCE_SET_DAWR1 for 2nd DAWR. Is this the same interface as will be defined in PAPR and available under PowerVM, or is it a new/different interface for KVM? Yes, kvm hcall interface for 2nd DAWR is same as PowerVM, as defined in PAPR. Also, kvm will support 2nd DAWR only if CPU_FTR_DAWR1 is set. In general QEMU wants to be able to control all aspects of the virtual machine presented to the guest, meaning that just because a host has a particular hardware capability does not mean we should automatically present that capability to the guest. In this case, QEMU will want a way to control whether the guest sees the availability of the second DAWR/X registers or not, i.e. whether a H_SET_MODE to set DAWR[X]1 will succeed or fail. Patch #3 adds new kvm capability KVM_CAP_PPC_DAWR1 that can be checked by Qemu. Also, as suggested by David in Qemu patch[1], I'm planning to add new machine capability in Qemu: -machine cap-dawr1=ON/OFF cap-dawr1 will be default ON when PPC_FEATURE2_ARCH_3_10 is set and OFF otherwise. Is this correct approach? [1]: https://lore.kernel.org/kvm/20200724045613.ga8...@umbus.fritz.box Thanks, Ravi
[PATCH v6 8/8] powerpc/watchpoint/selftests: Tests for kernel accessing user memory
Introduce tests to cover simple scenarios where user is watching memory which can be accessed by kernel as well. We also support _MODE_EXACT with _SETHWDEBUG interface. Move those testcases out- side of _BP_RANGE condition. This will help to test _MODE_EXACT scenarios when CONFIG_HAVE_HW_BREAKPOINT is not set, eg: $ ./ptrace-hwbreak ... PTRACE_SET_DEBUGREG, Kernel Access Userspace, len: 8: Ok PPC_PTRACE_SETHWDEBUG, MODE_EXACT, WO, len: 1: Ok PPC_PTRACE_SETHWDEBUG, MODE_EXACT, RO, len: 1: Ok PPC_PTRACE_SETHWDEBUG, MODE_EXACT, RW, len: 1: Ok PPC_PTRACE_SETHWDEBUG, MODE_EXACT, Kernel Access Userspace, len: 1: Ok success: ptrace-hwbreak Suggested-by: Pedro Miraglia Franco de Carvalho Signed-off-by: Ravi Bangoria --- .../selftests/powerpc/ptrace/ptrace-hwbreak.c | 48 ++- 1 file changed, 46 insertions(+), 2 deletions(-) diff --git a/tools/testing/selftests/powerpc/ptrace/ptrace-hwbreak.c b/tools/testing/selftests/powerpc/ptrace/ptrace-hwbreak.c index fc477dfe86a2..2e0d86e0687e 100644 --- a/tools/testing/selftests/powerpc/ptrace/ptrace-hwbreak.c +++ b/tools/testing/selftests/powerpc/ptrace/ptrace-hwbreak.c @@ -20,6 +20,8 @@ #include #include #include +#include +#include #include "ptrace.h" #define SPRN_PVR 0x11F @@ -44,6 +46,7 @@ struct gstruct { }; static volatile struct gstruct gstruct __attribute__((aligned(512))); +static volatile char cwd[PATH_MAX] __attribute__((aligned(8))); static void get_dbginfo(pid_t child_pid, struct ppc_debug_info *dbginfo) { @@ -138,6 +141,9 @@ static void test_workload(void) write_var(len); } + /* PTRACE_SET_DEBUGREG, Kernel Access Userspace test */ + syscall(__NR_getcwd, , PATH_MAX); + /* PPC_PTRACE_SETHWDEBUG, MODE_EXACT, WO test */ write_var(1); @@ -150,6 +156,9 @@ static void test_workload(void) else read_var(1); + /* PPC_PTRACE_SETHWDEBUG, MODE_EXACT, Kernel Access Userspace test */ + syscall(__NR_getcwd, , PATH_MAX); + /* PPC_PTRACE_SETHWDEBUG, MODE_RANGE, DW ALIGNED, WO test */ gstruct.a[rand() % A_LEN] = 'a'; @@ -293,6 +302,24 @@ static int test_set_debugreg(pid_t child_pid) return 0; } +static int test_set_debugreg_kernel_userspace(pid_t child_pid) +{ + unsigned long wp_addr = (unsigned long)cwd; + char *name = "PTRACE_SET_DEBUGREG"; + + /* PTRACE_SET_DEBUGREG, Kernel Access Userspace test */ + wp_addr &= ~0x7UL; + wp_addr |= (1Ul << DABR_READ_SHIFT); + wp_addr |= (1UL << DABR_WRITE_SHIFT); + wp_addr |= (1UL << DABR_TRANSLATION_SHIFT); + ptrace_set_debugreg(child_pid, wp_addr); + ptrace(PTRACE_CONT, child_pid, NULL, 0); + check_success(child_pid, name, "Kernel Access Userspace", wp_addr, 8); + + ptrace_set_debugreg(child_pid, 0); + return 0; +} + static void get_ppc_hw_breakpoint(struct ppc_hw_breakpoint *info, int type, unsigned long addr, int len) { @@ -338,6 +365,22 @@ static void test_sethwdebug_exact(pid_t child_pid) ptrace_delhwdebug(child_pid, wh); } +static void test_sethwdebug_exact_kernel_userspace(pid_t child_pid) +{ + struct ppc_hw_breakpoint info; + unsigned long wp_addr = (unsigned long) + char *name = "PPC_PTRACE_SETHWDEBUG, MODE_EXACT"; + int len = 1; /* hardcoded in kernel */ + int wh; + + /* PPC_PTRACE_SETHWDEBUG, MODE_EXACT, Kernel Access Userspace test */ + get_ppc_hw_breakpoint(, PPC_BREAKPOINT_TRIGGER_WRITE, wp_addr, 0); + wh = ptrace_sethwdebug(child_pid, ); + ptrace(PTRACE_CONT, child_pid, NULL, 0); + check_success(child_pid, name, "Kernel Access Userspace", wp_addr, len); + ptrace_delhwdebug(child_pid, wh); +} + static void test_sethwdebug_range_aligned(pid_t child_pid) { struct ppc_hw_breakpoint info; @@ -452,9 +495,10 @@ static void run_tests(pid_t child_pid, struct ppc_debug_info *dbginfo, bool dawr) { test_set_debugreg(child_pid); + test_set_debugreg_kernel_userspace(child_pid); + test_sethwdebug_exact(child_pid); + test_sethwdebug_exact_kernel_userspace(child_pid); if (dbginfo->features & PPC_DEBUG_FEATURE_DATA_BP_RANGE) { - test_sethwdebug_exact(child_pid); - test_sethwdebug_range_aligned(child_pid); if (dawr || is_8xx) { test_sethwdebug_range_unaligned(child_pid); -- 2.26.2
[PATCH v6 7/8] powerpc/watchpoint/ptrace: Introduce PPC_DEBUG_FEATURE_DATA_BP_ARCH_31
PPC_DEBUG_FEATURE_DATA_BP_ARCH_31 can be used to determine whether we are running on an ISA 3.1 compliant machine. Which is needed to determine DAR behaviour, 512 byte boundary limit etc. This was requested by Pedro Miraglia Franco de Carvalho for extending watchpoint features in gdb. Note that availability of 2nd DAWR is independent of this flag and should be checked using ppc_debug_info->num_data_bps. Signed-off-by: Ravi Bangoria --- Documentation/powerpc/ptrace.rst | 1 + arch/powerpc/include/uapi/asm/ptrace.h| 1 + arch/powerpc/kernel/ptrace/ptrace-noadv.c | 2 ++ 3 files changed, 4 insertions(+) diff --git a/Documentation/powerpc/ptrace.rst b/Documentation/powerpc/ptrace.rst index 864d4b61..77725d69eb4a 100644 --- a/Documentation/powerpc/ptrace.rst +++ b/Documentation/powerpc/ptrace.rst @@ -46,6 +46,7 @@ features will have bits indicating whether there is support for:: #define PPC_DEBUG_FEATURE_DATA_BP_RANGE 0x4 #define PPC_DEBUG_FEATURE_DATA_BP_MASK 0x8 #define PPC_DEBUG_FEATURE_DATA_BP_DAWR 0x10 + #define PPC_DEBUG_FEATURE_DATA_BP_ARCH_310x20 2. PTRACE_SETHWDEBUG diff --git a/arch/powerpc/include/uapi/asm/ptrace.h b/arch/powerpc/include/uapi/asm/ptrace.h index f5f1ccc740fc..7004cfea3f5f 100644 --- a/arch/powerpc/include/uapi/asm/ptrace.h +++ b/arch/powerpc/include/uapi/asm/ptrace.h @@ -222,6 +222,7 @@ struct ppc_debug_info { #define PPC_DEBUG_FEATURE_DATA_BP_RANGE0x0004 #define PPC_DEBUG_FEATURE_DATA_BP_MASK 0x0008 #define PPC_DEBUG_FEATURE_DATA_BP_DAWR 0x0010 +#define PPC_DEBUG_FEATURE_DATA_BP_ARCH_31 0x0020 #ifndef __ASSEMBLY__ diff --git a/arch/powerpc/kernel/ptrace/ptrace-noadv.c b/arch/powerpc/kernel/ptrace/ptrace-noadv.c index 48c52426af80..aa36fcad36cd 100644 --- a/arch/powerpc/kernel/ptrace/ptrace-noadv.c +++ b/arch/powerpc/kernel/ptrace/ptrace-noadv.c @@ -57,6 +57,8 @@ void ppc_gethwdinfo(struct ppc_debug_info *dbginfo) } else { dbginfo->features = 0; } + if (cpu_has_feature(CPU_FTR_ARCH_31)) + dbginfo->features |= PPC_DEBUG_FEATURE_DATA_BP_ARCH_31; } int ptrace_get_debugreg(struct task_struct *child, unsigned long addr, -- 2.26.2
[PATCH v6 6/8] powerpc/watchpoint: Add hw_len wherever missing
There are couple of places where we set len but not hw_len. For ptrace/perf watchpoints, when CONFIG_HAVE_HW_BREAKPOINT=Y, hw_len will be calculated and set internally while parsing watchpoint. But when CONFIG_HAVE_HW_BREAKPOINT=N, we need to manually set 'hw_len'. Similarly for xmon as well, hw_len needs to be set directly. Fixes: b57aeab811db ("powerpc/watchpoint: Fix length calculation for unaligned target") Signed-off-by: Ravi Bangoria --- arch/powerpc/kernel/ptrace/ptrace-noadv.c | 1 + arch/powerpc/xmon/xmon.c | 1 + 2 files changed, 2 insertions(+) diff --git a/arch/powerpc/kernel/ptrace/ptrace-noadv.c b/arch/powerpc/kernel/ptrace/ptrace-noadv.c index c9122ed91340..48c52426af80 100644 --- a/arch/powerpc/kernel/ptrace/ptrace-noadv.c +++ b/arch/powerpc/kernel/ptrace/ptrace-noadv.c @@ -219,6 +219,7 @@ long ppc_set_hwdebug(struct task_struct *child, struct ppc_hw_breakpoint *bp_inf brk.address = ALIGN_DOWN(bp_info->addr, HW_BREAKPOINT_SIZE); brk.type = HW_BRK_TYPE_TRANSLATE | HW_BRK_TYPE_PRIV_ALL; brk.len = DABR_MAX_LEN; + brk.hw_len = DABR_MAX_LEN; if (bp_info->trigger_type & PPC_BREAKPOINT_TRIGGER_READ) brk.type |= HW_BRK_TYPE_READ; if (bp_info->trigger_type & PPC_BREAKPOINT_TRIGGER_WRITE) diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c index df7bca00f5ec..55c43a6c9111 100644 --- a/arch/powerpc/xmon/xmon.c +++ b/arch/powerpc/xmon/xmon.c @@ -969,6 +969,7 @@ static void insert_cpu_bpts(void) brk.address = dabr[i].address; brk.type = (dabr[i].enabled & HW_BRK_TYPE_DABR) | HW_BRK_TYPE_PRIV_ALL; brk.len = 8; + brk.hw_len = 8; __set_breakpoint(i, ); } } -- 2.26.2
[PATCH v6 5/8] powerpc/watchpoint: Fix exception handling for CONFIG_HAVE_HW_BREAKPOINT=N
On powerpc, ptrace watchpoint works in one-shot mode. i.e. kernel disables event every time it fires and user has to re-enable it. Also, in case of ptrace watchpoint, kernel notifies ptrace user before executing instruction. With CONFIG_HAVE_HW_BREAKPOINT=N, kernel is missing to disable ptrace event and thus it's causing infinite loop of exceptions. This is especially harmful when user watches on a data which is also read/written by kernel, eg syscall parameters. In such case, infinite exceptions happens in kernel mode which causes soft-lockup. Fixes: 9422de3e953d ("powerpc: Hardware breakpoints rewrite to handle non DABR breakpoint registers") Reported-by: Pedro Miraglia Franco de Carvalho Signed-off-by: Ravi Bangoria --- arch/powerpc/include/asm/hw_breakpoint.h | 3 ++ arch/powerpc/kernel/process.c | 48 +++ arch/powerpc/kernel/ptrace/ptrace-noadv.c | 4 +- 3 files changed, 54 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/include/asm/hw_breakpoint.h b/arch/powerpc/include/asm/hw_breakpoint.h index 81872c420476..abebfbee5b1c 100644 --- a/arch/powerpc/include/asm/hw_breakpoint.h +++ b/arch/powerpc/include/asm/hw_breakpoint.h @@ -18,6 +18,7 @@ struct arch_hw_breakpoint { u16 type; u16 len; /* length of the target data symbol */ u16 hw_len; /* length programmed in hw */ + u8 flags; }; /* Note: Don't change the first 6 bits below as they are in the same order @@ -37,6 +38,8 @@ struct arch_hw_breakpoint { #define HW_BRK_TYPE_PRIV_ALL (HW_BRK_TYPE_USER | HW_BRK_TYPE_KERNEL | \ HW_BRK_TYPE_HYP) +#define HW_BRK_FLAG_DISABLED 0x1 + /* Minimum granularity */ #ifdef CONFIG_PPC_8xx #define HW_BREAKPOINT_SIZE 0x4 diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c index 016bd831908e..160fbbf41d40 100644 --- a/arch/powerpc/kernel/process.c +++ b/arch/powerpc/kernel/process.c @@ -636,6 +636,44 @@ void do_send_trap(struct pt_regs *regs, unsigned long address, (void __user *)address); } #else /* !CONFIG_PPC_ADV_DEBUG_REGS */ + +static void do_break_handler(struct pt_regs *regs) +{ + struct arch_hw_breakpoint null_brk = {0}; + struct arch_hw_breakpoint *info; + struct ppc_inst instr = ppc_inst(0); + int type = 0; + int size = 0; + unsigned long ea; + int i; + + /* +* If underneath hw supports only one watchpoint, we know it +* caused exception. 8xx also falls into this category. +*/ + if (nr_wp_slots() == 1) { + __set_breakpoint(0, _brk); + current->thread.hw_brk[0] = null_brk; + current->thread.hw_brk[0].flags |= HW_BRK_FLAG_DISABLED; + return; + } + + /* Otherwise findout which DAWR caused exception and disable it. */ + wp_get_instr_detail(regs, , , , ); + + for (i = 0; i < nr_wp_slots(); i++) { + info = >thread.hw_brk[i]; + if (!info->address) + continue; + + if (wp_check_constraints(regs, instr, ea, type, size, info)) { + __set_breakpoint(i, _brk); + current->thread.hw_brk[i] = null_brk; + current->thread.hw_brk[i].flags |= HW_BRK_FLAG_DISABLED; + } + } +} + void do_break (struct pt_regs *regs, unsigned long address, unsigned long error_code) { @@ -647,6 +685,16 @@ void do_break (struct pt_regs *regs, unsigned long address, if (debugger_break_match(regs)) return; + /* +* We reach here only when watchpoint exception is generated by ptrace +* event (or hw is buggy!). Now if CONFIG_HAVE_HW_BREAKPOINT is set, +* watchpoint is already handled by hw_breakpoint_handler() so we don't +* have to do anything. But when CONFIG_HAVE_HW_BREAKPOINT is not set, +* we need to manually handle the watchpoint here. +*/ + if (!IS_ENABLED(CONFIG_HAVE_HW_BREAKPOINT)) + do_break_handler(regs); + /* Deliver the signal to userspace */ force_sig_fault(SIGTRAP, TRAP_HWBKPT, (void __user *)address); } diff --git a/arch/powerpc/kernel/ptrace/ptrace-noadv.c b/arch/powerpc/kernel/ptrace/ptrace-noadv.c index 57a0ab822334..c9122ed91340 100644 --- a/arch/powerpc/kernel/ptrace/ptrace-noadv.c +++ b/arch/powerpc/kernel/ptrace/ptrace-noadv.c @@ -286,11 +286,13 @@ long ppc_del_hwdebug(struct task_struct *child, long data) } return ret; #else /* CONFIG_HAVE_HW_BREAKPOINT */ - if (child->thread.hw_brk[data - 1].address == 0) + if (!(child->thread.hw_brk[data - 1].flags & HW_BRK_FLAG_DISABLED) && + child->thread.hw_brk[data - 1].address == 0) return -ENOENT; child
[PATCH v6 4/8] powerpc/watchpoint: Move DAWR detection logic outside of hw_breakpoint.c
Power10 hw has multiple DAWRs but hw doesn't tell which DAWR caused the exception. So we have a sw logic to detect that in hw_breakpoint.c. But hw_breakpoint.c gets compiled only with CONFIG_HAVE_HW_BREAKPOINT=Y. Move DAWR detection logic outside of hw_breakpoint.c so that it can be reused when CONFIG_HAVE_HW_BREAKPOINT is not set. Signed-off-by: Ravi Bangoria --- arch/powerpc/include/asm/hw_breakpoint.h | 8 + arch/powerpc/kernel/Makefile | 3 +- arch/powerpc/kernel/hw_breakpoint.c | 159 + .../kernel/hw_breakpoint_constraints.c| 162 ++ 4 files changed, 174 insertions(+), 158 deletions(-) create mode 100644 arch/powerpc/kernel/hw_breakpoint_constraints.c diff --git a/arch/powerpc/include/asm/hw_breakpoint.h b/arch/powerpc/include/asm/hw_breakpoint.h index 9b68eafebf43..81872c420476 100644 --- a/arch/powerpc/include/asm/hw_breakpoint.h +++ b/arch/powerpc/include/asm/hw_breakpoint.h @@ -10,6 +10,7 @@ #define _PPC_BOOK3S_64_HW_BREAKPOINT_H #include +#include #ifdef __KERNEL__ struct arch_hw_breakpoint { @@ -52,6 +53,13 @@ static inline int nr_wp_slots(void) return cpu_has_feature(CPU_FTR_DAWR1) ? 2 : 1; } +bool wp_check_constraints(struct pt_regs *regs, struct ppc_inst instr, + unsigned long ea, int type, int size, + struct arch_hw_breakpoint *info); + +void wp_get_instr_detail(struct pt_regs *regs, struct ppc_inst *instr, +int *type, int *size, unsigned long *ea); + #ifdef CONFIG_HAVE_HW_BREAKPOINT #include #include diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile index cbf41fb4ee89..a5550c2b24c4 100644 --- a/arch/powerpc/kernel/Makefile +++ b/arch/powerpc/kernel/Makefile @@ -45,7 +45,8 @@ obj-y := cputable.o syscalls.o \ signal.o sysfs.o cacheinfo.o time.o \ prom.o traps.o setup-common.o \ udbg.o misc.o io.o misc_$(BITS).o \ - of_platform.o prom_parse.o firmware.o + of_platform.o prom_parse.o firmware.o \ + hw_breakpoint_constraints.o obj-y += ptrace/ obj-$(CONFIG_PPC64)+= setup_64.o \ paca.o nvram_64.o note.o syscall_64.o diff --git a/arch/powerpc/kernel/hw_breakpoint.c b/arch/powerpc/kernel/hw_breakpoint.c index f6b24838ca3c..f4e8f21046f5 100644 --- a/arch/powerpc/kernel/hw_breakpoint.c +++ b/arch/powerpc/kernel/hw_breakpoint.c @@ -494,161 +494,6 @@ void thread_change_pc(struct task_struct *tsk, struct pt_regs *regs) } } -static bool dar_in_user_range(unsigned long dar, struct arch_hw_breakpoint *info) -{ - return ((info->address <= dar) && (dar - info->address < info->len)); -} - -static bool ea_user_range_overlaps(unsigned long ea, int size, - struct arch_hw_breakpoint *info) -{ - return ((ea < info->address + info->len) && - (ea + size > info->address)); -} - -static bool dar_in_hw_range(unsigned long dar, struct arch_hw_breakpoint *info) -{ - unsigned long hw_start_addr, hw_end_addr; - - hw_start_addr = ALIGN_DOWN(info->address, HW_BREAKPOINT_SIZE); - hw_end_addr = ALIGN(info->address + info->len, HW_BREAKPOINT_SIZE); - - return ((hw_start_addr <= dar) && (hw_end_addr > dar)); -} - -static bool ea_hw_range_overlaps(unsigned long ea, int size, -struct arch_hw_breakpoint *info) -{ - unsigned long hw_start_addr, hw_end_addr; - unsigned long align_size = HW_BREAKPOINT_SIZE; - - /* -* On p10 predecessors, quadword is handle differently then -* other instructions. -*/ - if (!cpu_has_feature(CPU_FTR_ARCH_31) && size == 16) - align_size = HW_BREAKPOINT_SIZE_QUADWORD; - - hw_start_addr = ALIGN_DOWN(info->address, align_size); - hw_end_addr = ALIGN(info->address + info->len, align_size); - - return ((ea < hw_end_addr) && (ea + size > hw_start_addr)); -} - -/* - * If hw has multiple DAWR registers, we also need to check all - * dawrx constraint bits to confirm this is _really_ a valid event. - * If type is UNKNOWN, but privilege level matches, consider it as - * a positive match. - */ -static bool check_dawrx_constraints(struct pt_regs *regs, int type, - struct arch_hw_breakpoint *info) -{ - if (OP_IS_LOAD(type) && !(info->type & HW_BRK_TYPE_READ)) - return false; - - /* -* The Cache Management instructions other than dcbz never -* cause a match. i.e. if type is CACHEOP, the instruction -* is dcbz, and dcbz is tr
[PATCH v6 3/8] powerpc/watchpoint/ptrace: Fix SETHWDEBUG when CONFIG_HAVE_HW_BREAKPOINT=N
When kernel is compiled with CONFIG_HAVE_HW_BREAKPOINT=N, user can still create watchpoint using PPC_PTRACE_SETHWDEBUG, with limited functionalities. But, such watchpoints are never firing because of the missing privilege settings. Fix that. It's safe to set HW_BRK_TYPE_PRIV_ALL because we don't really leak any kernel address in signal info. Setting HW_BRK_TYPE_PRIV_ALL will also help to find scenarios when kernel accesses user memory. Reported-by: Pedro Miraglia Franco de Carvalho Suggested-by: Pedro Miraglia Franco de Carvalho Signed-off-by: Ravi Bangoria --- arch/powerpc/kernel/ptrace/ptrace-noadv.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/powerpc/kernel/ptrace/ptrace-noadv.c b/arch/powerpc/kernel/ptrace/ptrace-noadv.c index 697c7e4b5877..57a0ab822334 100644 --- a/arch/powerpc/kernel/ptrace/ptrace-noadv.c +++ b/arch/powerpc/kernel/ptrace/ptrace-noadv.c @@ -217,7 +217,7 @@ long ppc_set_hwdebug(struct task_struct *child, struct ppc_hw_breakpoint *bp_inf return -EIO; brk.address = ALIGN_DOWN(bp_info->addr, HW_BREAKPOINT_SIZE); - brk.type = HW_BRK_TYPE_TRANSLATE; + brk.type = HW_BRK_TYPE_TRANSLATE | HW_BRK_TYPE_PRIV_ALL; brk.len = DABR_MAX_LEN; if (bp_info->trigger_type & PPC_BREAKPOINT_TRIGGER_READ) brk.type |= HW_BRK_TYPE_READ; -- 2.26.2
[PATCH v6 2/8] powerpc/watchpoint: Fix handling of vector instructions
Vector load/store instructions are special because they are always aligned. Thus unaligned EA needs to be aligned down before comparing it with watch ranges. Otherwise we might consider valid event as invalid. Fixes: 74c6881019b7 ("powerpc/watchpoint: Prepare handler to handle more than one watchpoint") Signed-off-by: Ravi Bangoria --- arch/powerpc/kernel/hw_breakpoint.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/arch/powerpc/kernel/hw_breakpoint.c b/arch/powerpc/kernel/hw_breakpoint.c index 9f7df1c37233..f6b24838ca3c 100644 --- a/arch/powerpc/kernel/hw_breakpoint.c +++ b/arch/powerpc/kernel/hw_breakpoint.c @@ -644,6 +644,8 @@ static void get_instr_detail(struct pt_regs *regs, struct ppc_inst *instr, if (*type == CACHEOP) { *size = cache_op_size(); *ea &= ~(*size - 1); + } else if (*type == LOAD_VMX || *type == STORE_VMX) { + *ea &= ~(*size - 1); } } -- 2.26.2
[PATCH v6 1/8] powerpc/watchpoint: Fix quarword instruction handling on p10 predecessors
On p10 predecessors, watchpoint with quarword access is compared at quardword length. If the watch range is doubleword or less than that in a first half of quarword aligned 16 bytes, and if there is any unaligned quadword access which will access only the 2nd half, the handler should consider it as extraneous and emulate/single-step it before continuing. Reported-by: Pedro Miraglia Franco de Carvalho Fixes: 74c6881019b7 ("powerpc/watchpoint: Prepare handler to handle more than one watchpoint") Signed-off-by: Ravi Bangoria --- arch/powerpc/include/asm/hw_breakpoint.h | 1 + arch/powerpc/kernel/hw_breakpoint.c | 12 ++-- 2 files changed, 11 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/include/asm/hw_breakpoint.h b/arch/powerpc/include/asm/hw_breakpoint.h index db206a7f38e2..9b68eafebf43 100644 --- a/arch/powerpc/include/asm/hw_breakpoint.h +++ b/arch/powerpc/include/asm/hw_breakpoint.h @@ -42,6 +42,7 @@ struct arch_hw_breakpoint { #else #define HW_BREAKPOINT_SIZE 0x8 #endif +#define HW_BREAKPOINT_SIZE_QUADWORD0x10 #define DABR_MAX_LEN 8 #define DAWR_MAX_LEN 512 diff --git a/arch/powerpc/kernel/hw_breakpoint.c b/arch/powerpc/kernel/hw_breakpoint.c index 1f4a1efa0074..9f7df1c37233 100644 --- a/arch/powerpc/kernel/hw_breakpoint.c +++ b/arch/powerpc/kernel/hw_breakpoint.c @@ -520,9 +520,17 @@ static bool ea_hw_range_overlaps(unsigned long ea, int size, struct arch_hw_breakpoint *info) { unsigned long hw_start_addr, hw_end_addr; + unsigned long align_size = HW_BREAKPOINT_SIZE; - hw_start_addr = ALIGN_DOWN(info->address, HW_BREAKPOINT_SIZE); - hw_end_addr = ALIGN(info->address + info->len, HW_BREAKPOINT_SIZE); + /* +* On p10 predecessors, quadword is handle differently then +* other instructions. +*/ + if (!cpu_has_feature(CPU_FTR_ARCH_31) && size == 16) + align_size = HW_BREAKPOINT_SIZE_QUADWORD; + + hw_start_addr = ALIGN_DOWN(info->address, align_size); + hw_end_addr = ALIGN(info->address + info->len, align_size); return ((ea < hw_end_addr) && (ea + size > hw_start_addr)); } -- 2.26.2
[PATCH v6 0/8] powerpc/watchpoint: Bug fixes plus new feature flag
Patch #1 fixes issue for quardword instruction on p10 predecessors. Patch #2 fixes issue for vector instructions. Patch #3 fixes a bug about watchpoint not firing when created with ptrace PPC_PTRACE_SETHWDEBUG and CONFIG_HAVE_HW_BREAKPOINT=N. The fix uses HW_BRK_TYPE_PRIV_ALL for ptrace user which, I guess, should be fine because we don't leak any kernel addresses and PRIV_ALL will also help to cover scenarios when kernel accesses user memory. Patch #4,#5 fixes infinite exception bug, again the bug happens only with CONFIG_HAVE_HW_BREAKPOINT=N. Patch #6 fixes two places where we are missing to set hw_len. Patch #7 introduce new feature bit PPC_DEBUG_FEATURE_DATA_BP_ARCH_31 which will be set when running on ISA 3.1 compliant machine. Patch #8 finally adds selftest to test scenarios fixed by patch#2,#3 and also moves MODE_EXACT tests outside of BP_RANGE condition. Christophe, let me know if this series breaks something for 8xx. v5: https://lore.kernel.org/r/20200825043617.1073634-1-ravi.bango...@linux.ibm.com v5->v6: - Fix build faulure reported by kernel test robot - patch #5. Use more compact if condition, suggested by Christophe Ravi Bangoria (8): powerpc/watchpoint: Fix quarword instruction handling on p10 predecessors powerpc/watchpoint: Fix handling of vector instructions powerpc/watchpoint/ptrace: Fix SETHWDEBUG when CONFIG_HAVE_HW_BREAKPOINT=N powerpc/watchpoint: Move DAWR detection logic outside of hw_breakpoint.c powerpc/watchpoint: Fix exception handling for CONFIG_HAVE_HW_BREAKPOINT=N powerpc/watchpoint: Add hw_len wherever missing powerpc/watchpoint/ptrace: Introduce PPC_DEBUG_FEATURE_DATA_BP_ARCH_31 powerpc/watchpoint/selftests: Tests for kernel accessing user memory Documentation/powerpc/ptrace.rst | 1 + arch/powerpc/include/asm/hw_breakpoint.h | 12 ++ arch/powerpc/include/uapi/asm/ptrace.h| 1 + arch/powerpc/kernel/Makefile | 3 +- arch/powerpc/kernel/hw_breakpoint.c | 149 +--- .../kernel/hw_breakpoint_constraints.c| 162 ++ arch/powerpc/kernel/process.c | 48 ++ arch/powerpc/kernel/ptrace/ptrace-noadv.c | 9 +- arch/powerpc/xmon/xmon.c | 1 + .../selftests/powerpc/ptrace/ptrace-hwbreak.c | 48 +- 10 files changed, 282 insertions(+), 152 deletions(-) create mode 100644 arch/powerpc/kernel/hw_breakpoint_constraints.c -- 2.26.2
Re: [PATCH v5 4/8] powerpc/watchpoint: Move DAWR detection logic outside of hw_breakpoint.c
Hi Christophe, +static int cache_op_size(void) +{ +#ifdef __powerpc64__ + return ppc64_caches.l1d.block_size; +#else + return L1_CACHE_BYTES; +#endif +} You've got l1_dcache_bytes() in arch/powerpc/include/asm/cache.h to do that. + +void wp_get_instr_detail(struct pt_regs *regs, struct ppc_inst *instr, + int *type, int *size, unsigned long *ea) +{ + struct instruction_op op; + + if (__get_user_instr_inatomic(*instr, (void __user *)regs->nip)) + return; + + analyse_instr(, regs, *instr); + *type = GETTYPE(op.type); + *ea = op.ea; +#ifdef __powerpc64__ + if (!(regs->msr & MSR_64BIT)) + *ea &= 0xUL; +#endif This #ifdef is unneeded, it should build fine on a 32 bits too. This patch is just a code movement from one file to another. I don't really change the logic. Would you mind if I do a separate patch for these changes (not a part of this series)? Thanks for review, Ravi