from:"Ravi Bangoria"

Re: [PATCH] perf test: Update event group check for support of uncore event

2022-12-07 Thread Ravi Bangoria

On 07-Dec-22 10:28 PM, Athira Rajeev wrote:
> Event group test checks group creation for combinations of
> hw, sw and uncore PMU events. Some of the uncore pmus may
> require additional permission to access the counters.
> For example, in case of hv_24x7, partition need to have
> permissions to access hv_24x7 pmu counters. If not, event_open
> will fail. Hence add a sanity check to see if event_open
> succeeds before proceeding with the test.
> 
> Fixes: b20d9215a35f ("perf test: Add event group test for events in multiple 
> PMUs")
> Signed-off-by: Athira Rajeev 

Acked-by: Ravi Bangoria 

Thanks,
Ravi

Re: [PATCH v2] perf: Rewrite core context handling

2022-10-17 Thread Ravi Bangoria

On 14-Oct-22 3:26 PM, Ravi Bangoria wrote:
> On 13-Oct-22 4:29 PM, Peter Zijlstra wrote:
>> On Thu, Oct 13, 2022 at 03:37:23PM +0530, Ravi Bangoria wrote:
>>
>>>> -  refcount_t  refcount;
>>>> +  refcount_t  refcount; /* event <-> ctx */
>>>
>>> Ok. We need to remove all those // XXX get/put_ctx() from code
>>> which we added to make refcount a pmu_ctx <-> ctx.
>>
>> Them already gone :-) I've not yet fixed up the typoes, but current
>> version should be here:
>>
>>   
>> https://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git/log/?h=perf/core
>>
>> Thanks!
> 
> I've been running perf-fuzzer on Xeon machine since yesterday and I don't see 
> any
> issue. Will do the same on my AMD machine as well over the weekend.

Only one WARN_ON() hit. Otherwise all good.
https://lore.kernel.org/lkml/8d91528b-e830-6ad0-8a92-621ce9f94...@amd.com

Thanks,
Ravi

Re: [PATCH v2] perf: Rewrite core context handling

2022-10-14 Thread Ravi Bangoria

On 13-Oct-22 4:29 PM, Peter Zijlstra wrote:
> On Thu, Oct 13, 2022 at 03:37:23PM +0530, Ravi Bangoria wrote:
> 
>>> -   refcount_t  refcount;
>>> +   refcount_t  refcount; /* event <-> ctx */
>>
>> Ok. We need to remove all those // XXX get/put_ctx() from code
>> which we added to make refcount a pmu_ctx <-> ctx.
> 
> Them already gone :-) I've not yet fixed up the typoes, but current
> version should be here:
> 
>   
> https://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git/log/?h=perf/core
> 
> Thanks!

I've been running perf-fuzzer on Xeon machine since yesterday and I don't see 
any
issue. Will do the same on my AMD machine as well over the weekend.

Thanks,
Ravi

Re: [PATCH v2] perf: Rewrite core context handling

2022-10-13 Thread Ravi Bangoria

On 13-Oct-22 2:17 AM, Peter Zijlstra wrote:
> On Wed, Oct 12, 2022 at 02:16:29PM +0200, Peter Zijlstra wrote:
> 
>> That's the intent yeah. But due to not always holding ctx->mutex over
>> put_pmu_ctx() this might be moot. I'm almost through auditing epc usage
>> and I think ctx->lock is sufficient, fingers crossed.
> 
> So the very last epc usage threw a spanner into the works and made
> things complicated.
> 
> Specifically sys_perf_event_open()'s group_leader case uses
> event->pmu_ctx while only holding ctx->mutex. Therefore we can't fully
> let go of ctx->mutex locking and purely rely on ctx->lock.
> 
> Now the good news is that the annoying put_pmu_ctx() without holding
> ctx->mutex case doesn't actually matter here. Since we hold a reference
> on the group_leader (per the filedesc) the event can't go away,
> therefore it must have a pmu_ctx, and then holding ctx->mutex ensures
> the pmu_ctx is stable -- iow it serializes against
> sys_perf_event_open()'s move_group and perf_pmu_migrate_context()
> changing the epc around.
> 
> So we're going with the normal mutex+lock for modification rule, but
> allow the weird put_pmu_ctx() exception.
> 
> I have the below delta.
> 
> I'm hoping we can call this done -- I'm going to see if I can bribe Mark
> to take a look at the arm64 thing soon and then hopefully queue the
> whole thing once -rc1 happens. That should give us a good long soak
> until the next merge window.

Sounds good. Thanks for all the help!

I've glanced through the changes and they looks fine, below are few minor
points.

> + * Specificially, sys_perf_event_open()'s group_leader case depends on
> + * ctx->mutex pinning the configuration. Since we hold a reference on
> + * group_leader (through the filedesc) it can't fo away, therefore it's

typo: can't go away

> - refcount_t  refcount;
> + refcount_t  refcount; /* event <-> ctx */

Ok. We need to remove all those // XXX get/put_ctx() from code
which we added to make refcount a pmu_ctx <-> ctx.

> +#define double_list_for_each_entry(pos1, pos2, head1, head2, member) \
> + for (pos1 = list_first_entry(head1, typeof(*pos1), member), \
> +  pos2 = list_first_entry(head2, typeof(*pos2), member); \
> +  !list_entry_is_head(pos1, head1, member) &&\
> +  !list_entry_is_head(pos2, head2, member);  \
> +  pos1 = list_next_entry(pos1, member),  \
> +  pos2 = list_next_entry(pos2, member))
> +
>  static void perf_event_swap_task_ctx_data(struct perf_event_context 
> *prev_ctx,
> struct perf_event_context *next_ctx)

While this is unrelated to this patch, shouldn't we also need to swap
event->hw.target? A purely hypothetical scenario: Consider two processes
having clone contexts (for example, two children of the same parent).
While process switch between these two, the perf event context would get
swapped but event->hw.target will point to other sibling's task_struct.
If any one process exit just after single context swap, _free_event()
will call put_task_context() on sibling process' task_struct.

> @@ -12436,6 +12463,9 @@ SYSCALL_DEFINE5(perf_event_open,
>* Allow the addition of software events to hw
>* groups, this is safe because software events
>* never fail to schedule.
> +  *
> +  * Note the comment that goes with struct
> +  * pmu_event_pmu_context.

typo: perf_event_pmu_context

The good (or bad? ;)) news is, perf test and Vince's perf_event_tests
are running fine without any regression on my machine.

Thanks,
Ravi

Re: [PATCH v2] perf: Rewrite core context handling

2022-10-12 Thread Ravi Bangoria

On 11-Oct-22 11:17 PM, Peter Zijlstra wrote:
> On Tue, Oct 11, 2022 at 04:02:56PM +0200, Peter Zijlstra wrote:
>> On Tue, Oct 11, 2022 at 06:49:55PM +0530, Ravi Bangoria wrote:
>>> On 11-Oct-22 4:59 PM, Peter Zijlstra wrote:
>>>> On Sat, Oct 08, 2022 at 11:54:24AM +0530, Ravi Bangoria wrote:
>>>>
>>>>> +static void perf_event_swap_task_ctx_data(struct perf_event_context 
>>>>> *prev_ctx,
>>>>> +   struct perf_event_context *next_ctx)
>>>>> +{
>>>>> + struct perf_event_pmu_context *prev_epc, *next_epc;
>>>>> +
>>>>> + if (!prev_ctx->nr_task_data)
>>>>> + return;
>>>>> +
>>>>> + prev_epc = list_first_entry(_ctx->pmu_ctx_list,
>>>>> + struct perf_event_pmu_context,
>>>>> + pmu_ctx_entry);
>>>>> + next_epc = list_first_entry(_ctx->pmu_ctx_list,
>>>>> + struct perf_event_pmu_context,
>>>>> + pmu_ctx_entry);
>>>>> +
>>>>> + while (_epc->pmu_ctx_entry != _ctx->pmu_ctx_list &&
>>>>> +_epc->pmu_ctx_entry != _ctx->pmu_ctx_list) {
>>>>> +
>>>>> + WARN_ON_ONCE(prev_epc->pmu != next_epc->pmu);
>>>>> +
>>>>> + /*
>>>>> +  * PMU specific parts of task perf context can require
>>>>> +  * additional synchronization. As an example of such
>>>>> +  * synchronization see implementation details of Intel
>>>>> +  * LBR call stack data profiling;
>>>>> +  */
>>>>> + if (prev_epc->pmu->swap_task_ctx)
>>>>> + prev_epc->pmu->swap_task_ctx(prev_epc, next_epc);
>>>>> + else
>>>>> + swap(prev_epc->task_ctx_data, next_epc->task_ctx_data);
>>>>
>>>> Did I forget to advance the iterators here?
>>>
>>> Yeah. Seems so. I overlooked it too.
>>
>> OK; so I'm not slowly going crazy staring at this code ;-) Let me go add
>> it now then. :-)
>>
>> But first I gotta taxi the kids around for a bit, bbl.
> 
> OK, so I've been going over the perf_event_pmu_context life-time thing
> as well, there were a bunch of XXXs there and I'm not sure Im happy with
> things, but I'd also forgotten most of it.
> 
> Ideally epc works like it's a regular member of ctx -- locking wise that
> is, but I'm not sure we can make that stick -- see the ctx->mutex issues
> we have with put_ctx().
> 
> As such, I'm going to have to re-audit all the epc usage to see if
> pure ctx->lock is sufficient.
> 
> I did do make epc RCU freed, because pretty much everything is and that
> was easy enough to make happen -- it means we don't need to worry about
> that.
> 
> But I'm going cross-eyes from staring at this all day, so more tomorrow.
> The below is what I currently have.
> 
> ---
> --- a/include/linux/perf_event.h
> +++ b/include/linux/perf_event.h
> @@ -833,13 +833,13 @@ struct perf_event {
>   *   `[1:n]-' `-[n:1]-> pmu <-[1:n]-'
>   *
>   *
> - * XXX destroy epc when empty
> - *   refcount, !rcu
> + * epc lifetime is refcount based and RCU freed (similar to 
> perf_event_context).
> + * epc locking is as if it were a member of perf_event_context; specifically:
>   *
> - * XXX epc locking
> + *   modification, both: ctx->mutex && ctx->lock
> + *   reading, either: ctx->mutex || ctx->lock
>   *
> - *   event->pmu_ctxctx->mutex && inactive
> - *   ctx->pmu_ctx_list ctx->mutex && ctx->lock
> + * XXX except this isn't true ... see put_pmu_ctx().
>   *
>   */
>  struct perf_event_pmu_context {
> @@ -857,6 +857,7 @@ struct perf_event_pmu_context {
>   unsigned intnr_events;
>  
>   atomic_trefcount; /* event <-> epc */
> + struct rcu_head rcu_head;
>  
>   void*task_ctx_data; /* pmu specific data */
>   /*
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -1727,6 +1727,10 @@ perf_event_groups_next(struct perf_event
>   return NULL;
>  }
>  
> +#define perf_event_groups_for_cpu_pmu(event, groups, cpu, pmu)   
> \
> + for (event = perf_event_groups_first(groups, cpu,

Re: [PATCH v2] perf: Rewrite core context handling

2022-10-11 Thread Ravi Bangoria

On 11-Oct-22 4:59 PM, Peter Zijlstra wrote:
> On Sat, Oct 08, 2022 at 11:54:24AM +0530, Ravi Bangoria wrote:
> 
>> +static void perf_event_swap_task_ctx_data(struct perf_event_context 
>> *prev_ctx,
>> +  struct perf_event_context *next_ctx)
>> +{
>> +struct perf_event_pmu_context *prev_epc, *next_epc;
>> +
>> +if (!prev_ctx->nr_task_data)
>> +return;
>> +
>> +prev_epc = list_first_entry(_ctx->pmu_ctx_list,
>> +struct perf_event_pmu_context,
>> +pmu_ctx_entry);
>> +next_epc = list_first_entry(_ctx->pmu_ctx_list,
>> +struct perf_event_pmu_context,
>> +pmu_ctx_entry);
>> +
>> +while (_epc->pmu_ctx_entry != _ctx->pmu_ctx_list &&
>> +   _epc->pmu_ctx_entry != _ctx->pmu_ctx_list) {
>> +
>> +WARN_ON_ONCE(prev_epc->pmu != next_epc->pmu);
>> +
>> +/*
>> + * PMU specific parts of task perf context can require
>> + * additional synchronization. As an example of such
>> + * synchronization see implementation details of Intel
>> + * LBR call stack data profiling;
>> + */
>> +if (prev_epc->pmu->swap_task_ctx)
>> +prev_epc->pmu->swap_task_ctx(prev_epc, next_epc);
>> +else
>> +swap(prev_epc->task_ctx_data, next_epc->task_ctx_data);
> 
> Did I forget to advance the iterators here?

Yeah. Seems so. I overlooked it too.

Thanks,
Ravi

Re: [PATCH] perf: Rewrite core context handling

2022-10-10 Thread Ravi Bangoria

On 10-Oct-22 3:53 PM, Peter Zijlstra wrote:
> On Tue, Sep 06, 2022 at 11:20:53AM +0530, Ravi Bangoria wrote:
> 
>> This one was simple enough so I prepared a patch for this. Let
>> me know if you see any issues with below diff.
> 
> I've extraed this as a separate patch since it's not strictly required
> for correctness and the patch is a quite large enough.

Sure. I'll keep it separate.

Thanks,
Ravi

Re: [PATCH] perf: Rewrite core context handling

2022-10-10 Thread Ravi Bangoria

On 10-Oct-22 3:44 PM, Peter Zijlstra wrote:
> On Wed, Sep 07, 2022 at 04:58:49PM +0530, Ravi Bangoria wrote:
>>> -static void
>>> -ctx_flexible_sched_in(struct perf_event_context *ctx,
>>> - struct perf_cpu_context *cpuctx)
>>> +/* XXX .busy thingy from Peter's patch */
>>> +static void ctx_flexible_sched_in(struct perf_event_context *ctx, struct 
>>> pmu *pmu)
>>
>> This one turned out to be very easy. Given that, we iterate over each
>> pmu, we can just return error if we fail to schedule any flexible event.
>> (It wouldn't be straight forward like this if we needed to implement
>> pmu=NULL optimization.)
>>
>> ---
>> diff --git a/kernel/events/core.c b/kernel/events/core.c
>> index e0232e0bb74e..923656af73fe 100644
>> --- a/kernel/events/core.c
>> +++ b/kernel/events/core.c
>> @@ -3751,6 +3751,7 @@ static int merge_sched_in(struct perf_event *event, 
>> void *data)
>>  cpc = 
>> this_cpu_ptr(event->pmu_ctx->pmu->cpu_pmu_context);
>>  perf_mux_hrtimer_restart(cpc);
>>  group_update_userpage(event);
>> +return -EBUSY;
>>  }
>>  }
>>  
> 
> I'm afraid this breaks things; consider:
> 
>   f79256532682 ("perf/core: fix userpage->time_enabled of inactive events")
> 
> I totally hate this -- because it means we *HAVE* to iterate the
> inactive events, but alas.

Sure. Will drop this.

Thanks,
Ravi

Re: [PATCH] perf: Rewrite core context handling

2022-09-07 Thread Ravi Bangoria

> -static void
> -ctx_flexible_sched_in(struct perf_event_context *ctx,
> -   struct perf_cpu_context *cpuctx)
> +/* XXX .busy thingy from Peter's patch */
> +static void ctx_flexible_sched_in(struct perf_event_context *ctx, struct pmu 
> *pmu)

This one turned out to be very easy. Given that, we iterate over each
pmu, we can just return error if we fail to schedule any flexible event.
(It wouldn't be straight forward like this if we needed to implement
pmu=NULL optimization.)

---
diff --git a/kernel/events/core.c b/kernel/events/core.c
index e0232e0bb74e..923656af73fe 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -3751,6 +3751,7 @@ static int merge_sched_in(struct perf_event *event, void 
*data)
cpc = 
this_cpu_ptr(event->pmu_ctx->pmu->cpu_pmu_context);
perf_mux_hrtimer_restart(cpc);
group_update_userpage(event);
+   return -EBUSY;
}
}
 
@@ -3776,7 +3777,6 @@ static void ctx_pinned_sched_in(struct perf_event_context 
*ctx, struct pmu *pmu)
}
 }
 
-/* XXX .busy thingy from Peter's patch */
 static void ctx_flexible_sched_in(struct perf_event_context *ctx, struct pmu 
*pmu)
 {
struct perf_event_pmu_context *pmu_ctx;
---

Thanks,
Ravi

Re: [PATCH] perf: Rewrite core context handling

2022-09-05 Thread Ravi Bangoria

> @@ -9752,10 +9889,13 @@ void perf_tp_event(u16 event_type, u64 count, void 
> *record, int entry_size,
>   struct trace_entry *entry = record;
>  
>   rcu_read_lock();
> - ctx = rcu_dereference(task->perf_event_ctxp[perf_sw_context]);
> + ctx = rcu_dereference(task->perf_event_ctxp);
>   if (!ctx)
>   goto unlock;
>  
> + // XXX iterate groups instead, we should be able to
> + // find the subtree for the perf_tracepoint pmu and CPU.
> +
>   list_for_each_entry_rcu(event, >event_list, event_entry) {
>   if (event->cpu != smp_processor_id())
>   continue;

This one was simple enough so I prepared a patch for this. Let
me know if you see any issues with below diff.

---
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 820c56c66b26..e0232e0bb74e 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -9807,6 +9807,44 @@ static struct pmu perf_swevent = {
 
 #ifdef CONFIG_EVENT_TRACING
 
+static void tp_perf_event_destroy(struct perf_event *event)
+{
+   perf_trace_destroy(event);
+}
+
+static int perf_tp_event_init(struct perf_event *event)
+{
+   int err;
+
+   if (event->attr.type != PERF_TYPE_TRACEPOINT)
+   return -ENOENT;
+
+   /*
+* no branch sampling for tracepoint events
+*/
+   if (has_branch_stack(event))
+   return -EOPNOTSUPP;
+
+   err = perf_trace_init(event);
+   if (err)
+   return err;
+
+   event->destroy = tp_perf_event_destroy;
+
+   return 0;
+}
+
+static struct pmu perf_tracepoint = {
+   .task_ctx_nr= perf_sw_context,
+
+   .event_init = perf_tp_event_init,
+   .add= perf_trace_add,
+   .del= perf_trace_del,
+   .start  = perf_swevent_start,
+   .stop   = perf_swevent_stop,
+   .read   = perf_swevent_read,
+};
+
 static int perf_tp_filter_match(struct perf_event *event,
struct perf_sample_data *data)
 {
@@ -9856,6 +9894,49 @@ void perf_trace_run_bpf_submit(void *raw_data, int size, 
int rctx,
 }
 EXPORT_SYMBOL_GPL(perf_trace_run_bpf_submit);
 
+
+static void __perf_tp_event_target_task(u64 count, void *record,
+   struct pt_regs *regs,
+   struct perf_sample_data *data,
+   struct perf_event *event)
+{
+   struct trace_entry *entry = record;
+
+   if (event->attr.config != entry->type)
+   return;
+   /* Cannot deliver synchronous signal to other task. */
+   if (event->attr.sigtrap)
+   return;
+   if (perf_tp_event_match(event, data, regs))
+   perf_swevent_event(event, count, data, regs);
+}
+
+static void perf_tp_event_target_task(u64 count, void *record,
+ struct pt_regs *regs,
+ struct perf_sample_data *data,
+ struct perf_event_context *ctx)
+{
+   struct perf_event *event, *sibling;
+
+   event = perf_event_groups_first(>pinned_groups, smp_processor_id(),
+   _tracepoint, NULL);
+   for (; event; event = perf_event_groups_next(event, _tracepoint)) {
+   __perf_tp_event_target_task(count, record, regs, data, event);
+   for_each_sibling_event(sibling, event) {
+   __perf_tp_event_target_task(count, record, regs, data, 
sibling);
+   }
+   }
+
+   event = perf_event_groups_first(>flexible_groups, 
smp_processor_id(),
+   _tracepoint, NULL);
+   for (; event; event = perf_event_groups_next(event, _tracepoint)) {
+   __perf_tp_event_target_task(count, record, regs, data, event);
+   for_each_sibling_event(sibling, event) {
+   __perf_tp_event_target_task(count, record, regs, data, 
sibling);
+   }
+   }
+}
+
 void perf_tp_event(u16 event_type, u64 count, void *record, int entry_size,
   struct pt_regs *regs, struct hlist_head *head, int rctx,
   struct task_struct *task)
@@ -9886,29 +9967,15 @@ void perf_tp_event(u16 event_type, u64 count, void 
*record, int entry_size,
 */
if (task && task != current) {
struct perf_event_context *ctx;
-   struct trace_entry *entry = record;
 
rcu_read_lock();
ctx = rcu_dereference(task->perf_event_ctxp);
if (!ctx)
goto unlock;
 
-   // XXX iterate groups instead, we should be able to
-   // find the subtree for the perf_tracepoint pmu and CPU.
-
-   list_for_each_entry_rcu(event, >event_list, event_entry) {
-

Re: [PATCH] perf: Rewrite core context handling

2022-09-04 Thread Ravi Bangoria

> So the basic issue I mentioned is that:
> 
> 
> /*
>  *   ,[1:n]-.
>  *   V  V
>  * perf_event_context <-[1:n]-> perf_event_pmu_context <--- perf_event
>  *   ^  ^ | |
>  *   `[1:n]-' `-[n:1]-> pmu <-[1:n]-'
>  *
>  *
>  * XXX destroy epc when empty
>  *   refcount, !rcu
>  *
>  * XXX epc locking
>  *
>  *   event->pmu_ctxctx->mutex && inactive
>  *   ctx->pmu_ctx_list ctx->mutex && ctx->lock
>  *
>  */
> struct perf_event_pmu_context {
>   ...
> atomic_trefcount; /* event <-> epc */
>   ...
> }
> 
> But that then also suggests that:
> 
> struct perf_event_context {
>   ...
> refcount_t  refcount;
>   ...
> }
> 
> should be: ctx <-> epc, and that is not so, the ctx::refcount still
> counts the number of events.
> 
> Now this might not be bad, but it is confusing.

I don't have strong opinion, but we store events in ctx, not in pmu_ctx.
So, I think it makes sense to keep refcount as ctx <-> event?

[...]

> +void perf_pmu_migrate_context(struct pmu *pmu, int src_cpu, int dst_cpu)
> +{
> + struct perf_event_context *src_ctx, *dst_ctx;
> + LIST_HEAD(events);
> +
> + src_ctx = _cpu_ptr(_context, src_cpu)->ctx;
> + dst_ctx = _cpu_ptr(_context, dst_cpu)->ctx;
> +
> + /*
> +  * See perf_event_ctx_lock() for comments on the details
> +  * of swizzling perf_event::ctx.
> +  */
> + mutex_lock_double(_ctx->mutex, _ctx->mutex);
> +
> + __perf_pmu_remove(src_ctx, src_cpu, pmu, _src->pinned_groups, 
> );
> + __perf_pmu_remove(src_ctx, src_cpu, pmu, _src->flexible_groups, 
> );

Rbtrees does not contain sibling events. Shouldn't we continue using
event_list here?

Thanks,
Ravi

Re: [PATCH] perf: Rewrite core context handling

2022-09-01 Thread Ravi Bangoria

On 29-Aug-22 8:10 PM, Peter Zijlstra wrote:
> On Mon, Aug 29, 2022 at 02:04:33PM +0200, Peter Zijlstra wrote:
>> On Mon, Aug 29, 2022 at 05:03:47PM +0530, Ravi Bangoria wrote:
>>> @@ -12598,6 +12590,7 @@ EXPORT_SYMBOL_GPL(perf_event_create_kernel_counter);
>>>  
>>>  void perf_pmu_migrate_context(struct pmu *pmu, int src_cpu, int dst_cpu)
>>>  {
>>> +#if 0 // XXX buggered - cpu hotplug, who cares
>>> struct perf_event_context *src_ctx;
>>> struct perf_event_context *dst_ctx;
>>> struct perf_event *event, *tmp;
>>> @@ -12658,6 +12651,7 @@ void perf_pmu_migrate_context(struct pmu *pmu, int 
>>> src_cpu, int dst_cpu)
>>> }
>>> mutex_unlock(_ctx->mutex);
>>> mutex_unlock(_ctx->mutex);
>>> +#endif
>>>  }
>>>  EXPORT_SYMBOL_GPL(perf_pmu_migrate_context);
>>>  
>>
>> Note to self; fix this :-) I'll see if I have time for that later today.
> 
> Urgh, while going through that it appears the whole refcounting thing
> isn't fully done either.

Not sure I follow. Can you please elaborate.

Thanks,
Ravi

[PATCH] perf: Rewrite core context handling

2022-08-29 Thread Ravi Bangoria

From: Peter Zijlstra 

There have been various issues and limitations with the way perf uses
(task) contexts to track events. Most notable is the single hardware
PMU task context, which has resulted in a number of yucky things (both
proposed and merged).

Notably:
 - HW breakpoint PMU
 - ARM big.little PMU / Intel ADL PMU
 - Intel Branch Monitoring PMU
 - AMD IBS PMU
 - S390 cpum_cf PMU
 - PowerPC trace_imc PMU

*Current design:*
Currently we have a per task and per cpu perf_event_contexts:

  task_struct::perf_events_ctxp[] <-> perf_event_context <-> perf_cpu_context
   ^ |^ |   ^
   `-'| `--> pmu ---'
  v   ^
 perf_event --'

Each task has an array of pointers to a perf_event_context. Each
perf_event_context has a direct relation to a PMU and a group of
events for that PMU. The task related perf_event_context's have a
pointer back to that task.

Each PMU has a per-cpu pointer to a per-cpu perf_cpu_context, which
includes a perf_event_context, which again has a direct relation to
that PMU, and a group of events for that PMU.

The perf_cpu_context also tracks which task context is currently
associated with that CPU and includes a few other things like the
hrtimer for rotation etc.

Each perf_event is then associated with its PMU and one
perf_event_context.

*Proposed design:*
New design proposed by this patch reduce to a single task context and
a single CPU context but adds some intermediate data-structures:

  task_struct::perf_event_ctxp -> perf_event_context <- perf_cpu_context
   ^   |   ^ ^
   `---'   | |
   | |perf_cpu_pmu_context <--.
   | `.^  |
   |  ||  |
   |  vv  |
   | ,--> perf_event_pmu_context  |
   | ||
   | ||
   v v|
  perf_event ---> pmu '

With new design, perf_event_context will hold all pmu events in the
respective(pinned/flexible) rbtrees. This can be achieved by adding
pmu to rbtree key:

  {cpu, pmu, cgroup, group_index}

Each perf_event_context carry a list of perf_event_pmu_context which
is used to hold per-pmu-per-context state. For ex, it keeps track of
currently active events for that pmu, a pmu specific task_ctx_data,
a flag to tell whether rotation is required or not etc.

Similarly, perf_cpu_pmu_context is used to hold per-pmu-per-cpu state
like hrtimer details to drive the event rotation, a pointer to
perf_event_pmu_context of currently running task and some other
ancillary information.

Each perf_event is associated to it's pmu, perf_event_context and
perf_event_pmu_context.

*Tests:*
With this, perf test and perf_event_test runs fine without any issues
on Intel and AMD machines. However, one WARN_ON() was reported on Intel
machine[1] with perf fuzzer. Note that, so far I've run these tests to
verify stability of the change only, not functional correctness. Also,
some of the known issues marked with XXX needs to be resolved.

Further optimizations to current implementation are possible. For ex,
ctx_resched() can be optimized to reschedule only single pmu events.
ctx_sched_in() can be optimized to use pmu=NULL while scheduling entire
context. However, this will lead to significant changes to rbtree and
minheap layouts.

[1]: https://lore.kernel.org/r/8d91528b-e830-6ad0-8a92-621ce9f944ca%40amd.com

Signed-off-by: Peter Zijlstra 
Signed-off-by: Ravi Bangoria 
---

This is a 3rd version of perf event context rework and it's quite
stable now, so I thought to remove RFC tag. Previous versions:

RFC v2: https://lore.kernel.org/lkml/20220113134743.1292-1-ravi.bango...@amd.com
RFC v1: 
https://lore.kernel.org/lkml/20181010104559.go5...@hirez.programming.kicks-ass.net

RFC v2 -> PATCH:
 - Rebased to v6.0-rc2
 - Fixed locking issues reported by lockdep
 - Fixed kernel crash caused by stale cpc->task_epc
 - Resolved few bugs and warnings reported while running perf test,
   perf_event_test and perf fuzzer
 - Removed cgrp_cpuctx_list. No need to maintain list as we have
   single perf_event_context for each cpu
 - Removed x86_pmu_update_cpu_context() quirk
 - Modify pmu->filter callback (including s/filter_match/filter/).


 arch/arm64/kernel/perf_event.c |   24 +-
 arch/powerpc/perf/core-book3s.c|8 +-
 arch/s390/kernel/perf_pai_crypto.c |2 +-
 arch/x86/events/amd/brs.c  |2 +-
 ar

Re: [PATCH 4/4] bpf powerpc: Add addr > TASK_SIZE_MAX explicit check

2021-07-06 Thread Ravi Bangoria





@@ -763,6 +771,14 @@ int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, 
struct codegen_context *
  /* dst = *(u16 *)(ul) (src + off) */
  case BPF_LDX | BPF_MEM | BPF_H:
  case BPF_LDX | BPF_PROBE_MEM | BPF_H:
+    if (BPF_MODE(code) == BPF_PROBE_MEM) {
+    EMIT(PPC_RAW_ADDI(b2p[TMP_REG_1], src_reg, off));
+    PPC_LI64(b2p[TMP_REG_2], TASK_SIZE_MAX);
+    EMIT(PPC_RAW_CMPLD(b2p[TMP_REG_1], b2p[TMP_REG_2]));
+    PPC_BCC(COND_GT, (ctx->idx + 4) * 4);
+    EMIT(PPC_RAW_XOR(dst_reg, dst_reg, dst_reg));
+    PPC_JMP((ctx->idx + 2) * 4);
+    }


That code seems strictly identical to the previous one and the next one.
Can you refactor in a function ?


I'll check this.




  EMIT(PPC_RAW_LHZ(dst_reg, src_reg, off));
  if (insn_is_zext([i + 1]))
  addrs[++i] = ctx->idx * 4;
@@ -773,6 +789,14 @@ int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, 
struct codegen_context *
  /* dst = *(u32 *)(ul) (src + off) */
  case BPF_LDX | BPF_MEM | BPF_W:
  case BPF_LDX | BPF_PROBE_MEM | BPF_W:
+    if (BPF_MODE(code) == BPF_PROBE_MEM) {
+    EMIT(PPC_RAW_ADDI(b2p[TMP_REG_1], src_reg, off));
+    PPC_LI64(b2p[TMP_REG_2], TASK_SIZE_MAX);
+    EMIT(PPC_RAW_CMPLD(b2p[TMP_REG_1], b2p[TMP_REG_2]));
+    PPC_BCC(COND_GT, (ctx->idx + 4) * 4);
+    EMIT(PPC_RAW_XOR(dst_reg, dst_reg, dst_reg));
+    PPC_JMP((ctx->idx + 2) * 4);
+    }
  EMIT(PPC_RAW_LWZ(dst_reg, src_reg, off));
  if (insn_is_zext([i + 1]))
  addrs[++i] = ctx->idx * 4;
@@ -783,6 +807,20 @@ int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, 
struct codegen_context *
  /* dst = *(u64 *)(ul) (src + off) */
  case BPF_LDX | BPF_MEM | BPF_DW:
  case BPF_LDX | BPF_PROBE_MEM | BPF_DW:
+    if (BPF_MODE(code) == BPF_PROBE_MEM) {
+    EMIT(PPC_RAW_ADDI(b2p[TMP_REG_1], src_reg, off));
+    PPC_LI64(b2p[TMP_REG_2], TASK_SIZE_MAX);
+    EMIT(PPC_RAW_CMPLD(b2p[TMP_REG_1], b2p[TMP_REG_2]));
+    if (off % 4)


That test is worth a comment.


(off % 4) test is based on how PPC_BPF_LL() emits instruction.



And I'd prefer

 if (off & 3) {
     PPC_BCC(COND_GT, (ctx->idx + 5) * 4);
     EMIT(PPC_RAW_XOR(dst_reg, dst_reg, dst_reg));
     PPC_JMP((ctx->idx + 3) * 4);
 } else {
     PPC_BCC(COND_GT, (ctx->idx + 4) * 4);
     EMIT(PPC_RAW_XOR(dst_reg, dst_reg, dst_reg));
     PPC_JMP((ctx->idx + 2) * 4);
 }


Yes this is neat.

Thanks for the review,
Ravi

Re: [PATCH 3/4] bpf powerpc: Add BPF_PROBE_MEM support for 64bit JIT

2021-07-06 Thread Ravi Bangoria





On 7/6/21 3:23 PM, Christophe Leroy wrote:



Le 06/07/2021 à 09:32, Ravi Bangoria a écrit :

BPF load instruction with BPF_PROBE_MEM mode can cause a fault
inside kernel. Append exception table for such instructions
within BPF program.


Can you do the same for 32bit ?


Sure. But before that, do you think the approach is fine(including
patch #4)? Because it's little bit different from what other archs do.

[...]

@@ -89,6 +89,8 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
  {
  u32 proglen;
  u32 alloclen;
+    u32 extable_len = 0;
+    u32 fixup_len = 0;


Setting those to 0 doesn't seem to be needed, as it doesn't seem to exist any 
path to skip the setting below. You should not perform unnecessary init at 
declaration as it is error prone.


Ok.

[...]

@@ -234,7 +247,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
  fp->bpf_func = (void *)image;
  fp->jited = 1;
-    fp->jited_len = alloclen;
+    fp->jited_len = proglen + FUNCTION_DESCR_SIZE;
  bpf_flush_icache(bpf_hdr, (u8 *)bpf_hdr + (bpf_hdr->pages * PAGE_SIZE));
  if (!fp->is_func || extra_pass) {


This hunk does not apply on latest powerpc tree. You are missing commit 
62e3d4210ac9c


Ok. I prepared this on a bpf/master. Will rebase it to powerpc/next.

[...]

+static int add_extable_entry(struct bpf_prog *fp, u32 *image, int pass,
+ u32 code, struct codegen_context *ctx, int dst_reg)
+{
+    off_t offset;
+    unsigned long pc;
+    struct exception_table_entry *ex;
+    u32 *fixup;
+
+    /* Populate extable entries only in the last pass */
+    if (pass != 2 || BPF_MODE(code) != BPF_PROBE_MEM)


'code' is only used for that test, can you do the test before calling 
add_extable_entry() ?


Ok.




+    return 0;
+
+    if (!fp->aux->extable ||
+    WARN_ON_ONCE(ctx->exentry_idx >= fp->aux->num_exentries))
+    return -EINVAL;
+
+    pc = (unsigned long)[ctx->idx - 1];


You should call this function before incrementing ctx->idx


Ok.




+
+    fixup = (void *)fp->aux->extable -
+    (fp->aux->num_exentries * BPF_FIXUP_LEN) +
+    (ctx->exentry_idx * BPF_FIXUP_LEN);
+
+    fixup[0] = PPC_RAW_XOR(dst_reg, dst_reg, dst_reg);


Prefered way to clear a reg in according to ISA is to do 'li reg, 0'


Sure I'll use 'li reg, 0' But can you point me to where in ISA this
is mentioned?




+    fixup[1] = (PPC_INST_BRANCH |
+   (((long)(pc + 4) - (long)[1]) & 0x03fc));


Would be nice if we could have a PPC_RAW_BRANCH() stuff, we could do something like 
PPC_RAW_BRANCH((long)(pc + 4) - (long)[1])


Ok.

[...]

@@ -710,25 +752,41 @@ int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, 
struct codegen_context *
   */
  /* dst = *(u8 *)(ul) (src + off) */
  case BPF_LDX | BPF_MEM | BPF_B:
+    case BPF_LDX | BPF_PROBE_MEM | BPF_B:


Could do:
+    case BPF_LDX | BPF_PROBE_MEM | BPF_B:
+    ret = add_extable_entry(fp, image, pass, code, ctx, dst_reg);
+    if (ret)
+    return ret;
   case BPF_LDX | BPF_MEM | BPF_B:


Yes this is neat.

Thanks for the review.
Ravi

[PATCH 4/4] bpf powerpc: Add addr > TASK_SIZE_MAX explicit check

2021-07-06 Thread Ravi Bangoria

On PowerPC with KUAP enabled, any kernel code which wants to
access userspace needs to be surrounded by disable-enable KUAP.
But that is not happening for BPF_PROBE_MEM load instruction.
So, when BPF program tries to access invalid userspace address,
page-fault handler considers it as bad KUAP fault:

  Kernel attempted to read user page (d000) - exploit attempt? (uid: 0)

Considering the fact that PTR_TO_BTF_ID (which uses BPF_PROBE_MEM
mode) could either be a valid kernel pointer or NULL but should
never be a pointer to userspace address, execute BPF_PROBE_MEM load
only if addr > TASK_SIZE_MAX, otherwise set dst_reg=0 and move on.

This will catch NULL, valid or invalid userspace pointers. Only bad
kernel pointer will be handled by BPF exception table.

[Alexei suggested for x86]
Suggested-by: Alexei Starovoitov 
Signed-off-by: Ravi Bangoria 
---
 arch/powerpc/net/bpf_jit_comp64.c | 38 +++
 1 file changed, 38 insertions(+)

diff --git a/arch/powerpc/net/bpf_jit_comp64.c 
b/arch/powerpc/net/bpf_jit_comp64.c
index 1884c6dca89a..46becae76210 100644
--- a/arch/powerpc/net/bpf_jit_comp64.c
+++ b/arch/powerpc/net/bpf_jit_comp64.c
@@ -753,6 +753,14 @@ int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, 
struct codegen_context *
/* dst = *(u8 *)(ul) (src + off) */
case BPF_LDX | BPF_MEM | BPF_B:
case BPF_LDX | BPF_PROBE_MEM | BPF_B:
+   if (BPF_MODE(code) == BPF_PROBE_MEM) {
+   EMIT(PPC_RAW_ADDI(b2p[TMP_REG_1], src_reg, 
off));
+   PPC_LI64(b2p[TMP_REG_2], TASK_SIZE_MAX);
+   EMIT(PPC_RAW_CMPLD(b2p[TMP_REG_1], 
b2p[TMP_REG_2]));
+   PPC_BCC(COND_GT, (ctx->idx + 4) * 4);
+   EMIT(PPC_RAW_XOR(dst_reg, dst_reg, dst_reg));
+   PPC_JMP((ctx->idx + 2) * 4);
+   }
EMIT(PPC_RAW_LBZ(dst_reg, src_reg, off));
if (insn_is_zext([i + 1]))
addrs[++i] = ctx->idx * 4;
@@ -763,6 +771,14 @@ int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, 
struct codegen_context *
/* dst = *(u16 *)(ul) (src + off) */
case BPF_LDX | BPF_MEM | BPF_H:
case BPF_LDX | BPF_PROBE_MEM | BPF_H:
+   if (BPF_MODE(code) == BPF_PROBE_MEM) {
+   EMIT(PPC_RAW_ADDI(b2p[TMP_REG_1], src_reg, 
off));
+   PPC_LI64(b2p[TMP_REG_2], TASK_SIZE_MAX);
+   EMIT(PPC_RAW_CMPLD(b2p[TMP_REG_1], 
b2p[TMP_REG_2]));
+   PPC_BCC(COND_GT, (ctx->idx + 4) * 4);
+   EMIT(PPC_RAW_XOR(dst_reg, dst_reg, dst_reg));
+   PPC_JMP((ctx->idx + 2) * 4);
+   }
EMIT(PPC_RAW_LHZ(dst_reg, src_reg, off));
if (insn_is_zext([i + 1]))
addrs[++i] = ctx->idx * 4;
@@ -773,6 +789,14 @@ int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, 
struct codegen_context *
/* dst = *(u32 *)(ul) (src + off) */
case BPF_LDX | BPF_MEM | BPF_W:
case BPF_LDX | BPF_PROBE_MEM | BPF_W:
+   if (BPF_MODE(code) == BPF_PROBE_MEM) {
+   EMIT(PPC_RAW_ADDI(b2p[TMP_REG_1], src_reg, 
off));
+   PPC_LI64(b2p[TMP_REG_2], TASK_SIZE_MAX);
+   EMIT(PPC_RAW_CMPLD(b2p[TMP_REG_1], 
b2p[TMP_REG_2]));
+   PPC_BCC(COND_GT, (ctx->idx + 4) * 4);
+   EMIT(PPC_RAW_XOR(dst_reg, dst_reg, dst_reg));
+   PPC_JMP((ctx->idx + 2) * 4);
+   }
EMIT(PPC_RAW_LWZ(dst_reg, src_reg, off));
if (insn_is_zext([i + 1]))
addrs[++i] = ctx->idx * 4;
@@ -783,6 +807,20 @@ int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, 
struct codegen_context *
/* dst = *(u64 *)(ul) (src + off) */
case BPF_LDX | BPF_MEM | BPF_DW:
case BPF_LDX | BPF_PROBE_MEM | BPF_DW:
+   if (BPF_MODE(code) == BPF_PROBE_MEM) {
+   EMIT(PPC_RAW_ADDI(b2p[TMP_REG_1], src_reg, 
off));
+   PPC_LI64(b2p[TMP_REG_2], TASK_SIZE_MAX);
+   EMIT(PPC_RAW_CMPLD(b2p[TMP_REG_1], 
b2p[TMP_REG_2]));
+   if (off % 4)
+   PPC_BCC(COND_GT, (ctx->idx + 5) * 4);
+   else
+   PPC_BCC(COND_GT, (ctx->idx + 4) * 4);
+   EMIT(P

[PATCH 3/4] bpf powerpc: Add BPF_PROBE_MEM support for 64bit JIT

2021-07-06 Thread Ravi Bangoria

BPF load instruction with BPF_PROBE_MEM mode can cause a fault
inside kernel. Append exception table for such instructions
within BPF program.

Unlike other archs which uses extable 'fixup' field to pass dest_reg
and nip, BPF exception table on PowerPC follows the generic PowerPC
exception table design, where it populates both fixup and extable
sections witin BPF program. fixup section contains two instructions,
first instruction clears dest_reg and 2nd jumps to next instruction
in the BPF code. extable 'insn' field contains relative offset of
the instruction and 'fixup' field contains relative offset of the
fixup entry. Example layout of BPF program with extable present:

 +--+
 |  |
 |  |
   0x4020 -->| ld   r27,4(r3)   |
 |  |
 |  |
   0x40ac -->| lwz  r3,0(r4)|
 |  |
 |  |
 |--|
   0x4280 -->| xor r27,r27,r27  |  \ fixup entry
 | b   0x4024   |  /
   0x4288 -->| xor r3,r3,r3 |
 | b   0x40b0   |
 |--|
   0x4290 -->| insn=0xfd90  |  \ extable entry
 | fixup=0xffec |  /
   0x4298 -->| insn=0xfe14  |
 | fixup=0xffec |
 +--+

   (Addresses shown here are chosen random, not real)

Signed-off-by: Ravi Bangoria 
---
 arch/powerpc/net/bpf_jit.h|  5 ++-
 arch/powerpc/net/bpf_jit_comp.c   | 25 +
 arch/powerpc/net/bpf_jit_comp32.c |  2 +-
 arch/powerpc/net/bpf_jit_comp64.c | 60 ++-
 4 files changed, 83 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/net/bpf_jit.h b/arch/powerpc/net/bpf_jit.h
index 411c63d945c7..e9408ad190d3 100644
--- a/arch/powerpc/net/bpf_jit.h
+++ b/arch/powerpc/net/bpf_jit.h
@@ -141,8 +141,11 @@ struct codegen_context {
unsigned int idx;
unsigned int stack_size;
int b2p[ARRAY_SIZE(b2p)];
+   unsigned int exentry_idx;
 };
 
+#define BPF_FIXUP_LEN  8 /* Two instructions */
+
 static inline void bpf_flush_icache(void *start, void *end)
 {
smp_wmb();  /* smp write barrier */
@@ -166,7 +169,7 @@ static inline void bpf_clear_seen_register(struct 
codegen_context *ctx, int i)
 
 void bpf_jit_emit_func_call_rel(u32 *image, struct codegen_context *ctx, u64 
func);
 int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, struct codegen_context 
*ctx,
-  u32 *addrs);
+  u32 *addrs, int pass);
 void bpf_jit_build_prologue(u32 *image, struct codegen_context *ctx);
 void bpf_jit_build_epilogue(u32 *image, struct codegen_context *ctx);
 void bpf_jit_realloc_regs(struct codegen_context *ctx);
diff --git a/arch/powerpc/net/bpf_jit_comp.c b/arch/powerpc/net/bpf_jit_comp.c
index a9585e52a88d..3ebd8897cf09 100644
--- a/arch/powerpc/net/bpf_jit_comp.c
+++ b/arch/powerpc/net/bpf_jit_comp.c
@@ -89,6 +89,8 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
 {
u32 proglen;
u32 alloclen;
+   u32 extable_len = 0;
+   u32 fixup_len = 0;
u8 *image = NULL;
u32 *code_base;
u32 *addrs;
@@ -131,7 +133,6 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
image = jit_data->image;
bpf_hdr = jit_data->header;
proglen = jit_data->proglen;
-   alloclen = proglen + FUNCTION_DESCR_SIZE;
extra_pass = true;
goto skip_init_ctx;
}
@@ -149,7 +150,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
cgctx.stack_size = round_up(fp->aux->stack_depth, 16);
 
/* Scouting faux-generate pass 0 */
-   if (bpf_jit_build_body(fp, 0, , addrs)) {
+   if (bpf_jit_build_body(fp, 0, , addrs, 0)) {
/* We hit something illegal or unsupported. */
fp = org_fp;
goto out_addrs;
@@ -162,7 +163,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
 */
if (cgctx.seen & SEEN_TAILCALL) {
cgctx.idx = 0;
-   if (bpf_jit_build_body(fp, 0, , addrs)) {
+   if (bpf_jit_build_body(fp, 0, , addrs, 0)) {
fp = org_fp;
goto out_addrs;
}
@@ -177,8 +178,11 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
bpf_jit_build_prologue(0, );
bpf_jit_build_epilogue(0, );
 
+   fixup_len = fp->aux->num_exentries * BPF_FIXUP_LEN;
+   extable_len = fp->aux->num_exentries * sizeof(struct 
exception_table_entry);
+
proglen = cgctx.idx * 4;
-   alloclen = proglen + FUNCTION_DESCR_SIZE;
+   alloclen = proglen + FUNCTION_DESCR_SIZE + fixup_len + extable_len;
 
bpf_hdr = bpf_jit_binary_alloc(alloclen, , 4, 
bpf_jit_fill_ill_

[PATCH 2/4] bpf powerpc: Remove extra_pass from bpf_jit_build_body()

2021-07-06 Thread Ravi Bangoria

In case of extra_pass, we always skips usual JIT passes. Thus
extra_pass is always false while calling bpf_jit_build_body()
and thus it can be removed.

Signed-off-by: Ravi Bangoria 
---
 arch/powerpc/net/bpf_jit.h| 2 +-
 arch/powerpc/net/bpf_jit_comp.c   | 6 +++---
 arch/powerpc/net/bpf_jit_comp32.c | 4 ++--
 arch/powerpc/net/bpf_jit_comp64.c | 4 ++--
 4 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/net/bpf_jit.h b/arch/powerpc/net/bpf_jit.h
index d6267e93027a..411c63d945c7 100644
--- a/arch/powerpc/net/bpf_jit.h
+++ b/arch/powerpc/net/bpf_jit.h
@@ -166,7 +166,7 @@ static inline void bpf_clear_seen_register(struct 
codegen_context *ctx, int i)
 
 void bpf_jit_emit_func_call_rel(u32 *image, struct codegen_context *ctx, u64 
func);
 int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, struct codegen_context 
*ctx,
-  u32 *addrs, bool extra_pass);
+  u32 *addrs);
 void bpf_jit_build_prologue(u32 *image, struct codegen_context *ctx);
 void bpf_jit_build_epilogue(u32 *image, struct codegen_context *ctx);
 void bpf_jit_realloc_regs(struct codegen_context *ctx);
diff --git a/arch/powerpc/net/bpf_jit_comp.c b/arch/powerpc/net/bpf_jit_comp.c
index 798ac4350a82..a9585e52a88d 100644
--- a/arch/powerpc/net/bpf_jit_comp.c
+++ b/arch/powerpc/net/bpf_jit_comp.c
@@ -149,7 +149,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
cgctx.stack_size = round_up(fp->aux->stack_depth, 16);
 
/* Scouting faux-generate pass 0 */
-   if (bpf_jit_build_body(fp, 0, , addrs, false)) {
+   if (bpf_jit_build_body(fp, 0, , addrs)) {
/* We hit something illegal or unsupported. */
fp = org_fp;
goto out_addrs;
@@ -162,7 +162,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
 */
if (cgctx.seen & SEEN_TAILCALL) {
cgctx.idx = 0;
-   if (bpf_jit_build_body(fp, 0, , addrs, false)) {
+   if (bpf_jit_build_body(fp, 0, , addrs)) {
fp = org_fp;
goto out_addrs;
}
@@ -210,7 +210,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
/* Now build the prologue, body code & epilogue for real. */
cgctx.idx = 0;
bpf_jit_build_prologue(code_base, );
-   bpf_jit_build_body(fp, code_base, , addrs, extra_pass);
+   bpf_jit_build_body(fp, code_base, , addrs);
bpf_jit_build_epilogue(code_base, );
 
if (bpf_jit_enable > 1)
diff --git a/arch/powerpc/net/bpf_jit_comp32.c 
b/arch/powerpc/net/bpf_jit_comp32.c
index bbb16099e8c7..1f81bea35aab 100644
--- a/arch/powerpc/net/bpf_jit_comp32.c
+++ b/arch/powerpc/net/bpf_jit_comp32.c
@@ -266,7 +266,7 @@ static void bpf_jit_emit_tail_call(u32 *image, struct 
codegen_context *ctx, u32
 
 /* Assemble the body code between the prologue & epilogue */
 int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, struct codegen_context 
*ctx,
-  u32 *addrs, bool extra_pass)
+  u32 *addrs)
 {
const struct bpf_insn *insn = fp->insnsi;
int flen = fp->len;
@@ -846,7 +846,7 @@ int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, 
struct codegen_context *
case BPF_JMP | BPF_CALL:
ctx->seen |= SEEN_FUNC;
 
-   ret = bpf_jit_get_func_addr(fp, [i], extra_pass,
+   ret = bpf_jit_get_func_addr(fp, [i], false,
_addr, 
_addr_fixed);
if (ret < 0)
return ret;
diff --git a/arch/powerpc/net/bpf_jit_comp64.c 
b/arch/powerpc/net/bpf_jit_comp64.c
index 57a8c1153851..984177d9d394 100644
--- a/arch/powerpc/net/bpf_jit_comp64.c
+++ b/arch/powerpc/net/bpf_jit_comp64.c
@@ -272,7 +272,7 @@ static void bpf_jit_emit_tail_call(u32 *image, struct 
codegen_context *ctx, u32
 
 /* Assemble the body code between the prologue & epilogue */
 int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, struct codegen_context 
*ctx,
-  u32 *addrs, bool extra_pass)
+  u32 *addrs)
 {
const struct bpf_insn *insn = fp->insnsi;
int flen = fp->len;
@@ -763,7 +763,7 @@ int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, 
struct codegen_context *
case BPF_JMP | BPF_CALL:
ctx->seen |= SEEN_FUNC;
 
-   ret = bpf_jit_get_func_addr(fp, [i], extra_pass,
+   ret = bpf_jit_get_func_addr(fp, [i], false,
_addr, 
_addr_fixed);
if (ret < 0)
return ret;
-- 
2.26.3

[PATCH 0/4] bpf powerpc: Add BPF_PROBE_MEM support for 64bit JIT

2021-07-06 Thread Ravi Bangoria

Patch #1, #2 are simple cleanup patches. Patch #3 adds
BPF_PROBE_MEM support with PowerPC 64bit JIT compiler.
Patch #4 adds explicit addr > TASK_SIZE_MAX check to
handle bad userspace pointers.

Ravi Bangoria (4):
  bpf powerpc: Remove unused SEEN_STACK
  bpf powerpc: Remove extra_pass from bpf_jit_build_body()
  bpf powerpc: Add BPF_PROBE_MEM support for 64bit JIT
  bpf powerpc: Add addr > TASK_SIZE_MAX explicit check

 arch/powerpc/net/bpf_jit.h|   8 ++-
 arch/powerpc/net/bpf_jit_comp.c   |  25 ++--
 arch/powerpc/net/bpf_jit_comp32.c |   4 +-
 arch/powerpc/net/bpf_jit_comp64.c | 100 +-
 4 files changed, 124 insertions(+), 13 deletions(-)

-- 
2.26.3

[PATCH 1/4] bpf powerpc: Remove unused SEEN_STACK

2021-07-06 Thread Ravi Bangoria

SEEN_STACK is unused on PowerPC. Remove it. Also, have
SEEN_TAILCALL use 0x4000.

Signed-off-by: Ravi Bangoria 
---
 arch/powerpc/net/bpf_jit.h | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/powerpc/net/bpf_jit.h b/arch/powerpc/net/bpf_jit.h
index 99fad093f43e..d6267e93027a 100644
--- a/arch/powerpc/net/bpf_jit.h
+++ b/arch/powerpc/net/bpf_jit.h
@@ -116,8 +116,7 @@ static inline bool is_nearbranch(int offset)
 #define COND_LE(CR0_GT | COND_CMP_FALSE)
 
 #define SEEN_FUNC  0x2000 /* might call external helpers */
-#define SEEN_STACK 0x4000 /* uses BPF stack */
-#define SEEN_TAILCALL  0x8000 /* uses tail calls */
+#define SEEN_TAILCALL  0x4000 /* uses tail calls */
 
 #define SEEN_VREG_MASK 0x1ff8 /* Volatile registers r3-r12 */
 #define SEEN_NVREG_MASK0x0003 /* Non volatile registers r14-r31 */
-- 
2.26.3

[PATCH v3 4/4] powerpc/selftests: Add selftest to test concurrent perf/ptrace events

2021-04-12 Thread Ravi Bangoria

ptrace and perf watchpoints can't co-exists if their address range
overlaps. See commit 29da4f91c0c1 ("powerpc/watchpoint: Don't allow
concurrent perf and ptrace events") for more detail. Add selftest
for the same.

Sample o/p:
  # ./ptrace-perf-hwbreak
  test: ptrace-perf-hwbreak
  tags: git_version:powerpc-5.8-7-118-g937fa174a15d-dirty
  perf cpu event -> ptrace thread event (Overlapping): Ok
  perf cpu event -> ptrace thread event (Non-overlapping): Ok
  perf thread event -> ptrace same thread event (Overlapping): Ok
  perf thread event -> ptrace same thread event (Non-overlapping): Ok
  perf thread event -> ptrace other thread event: Ok
  ptrace thread event -> perf kernel event: Ok
  ptrace thread event -> perf same thread event (Overlapping): Ok
  ptrace thread event -> perf same thread event (Non-overlapping): Ok
  ptrace thread event -> perf other thread event: Ok
  ptrace thread event -> perf cpu event (Overlapping): Ok
  ptrace thread event -> perf cpu event (Non-overlapping): Ok
  ptrace thread event -> perf same thread & cpu event (Overlapping): Ok
  ptrace thread event -> perf same thread & cpu event (Non-overlapping): Ok
  ptrace thread event -> perf other thread & cpu event: Ok
  success: ptrace-perf-hwbreak

Signed-off-by: Ravi Bangoria 
---
 .../selftests/powerpc/ptrace/.gitignore   |   1 +
 .../testing/selftests/powerpc/ptrace/Makefile |   2 +-
 .../powerpc/ptrace/ptrace-perf-hwbreak.c  | 659 ++
 3 files changed, 661 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/powerpc/ptrace/ptrace-perf-hwbreak.c

diff --git a/tools/testing/selftests/powerpc/ptrace/.gitignore 
b/tools/testing/selftests/powerpc/ptrace/.gitignore
index 0e96150b7c7e..eb75e5360e31 100644
--- a/tools/testing/selftests/powerpc/ptrace/.gitignore
+++ b/tools/testing/selftests/powerpc/ptrace/.gitignore
@@ -14,3 +14,4 @@ perf-hwbreak
 core-pkey
 ptrace-pkey
 ptrace-syscall
+ptrace-perf-hwbreak
diff --git a/tools/testing/selftests/powerpc/ptrace/Makefile 
b/tools/testing/selftests/powerpc/ptrace/Makefile
index 8d3f006c98cc..a500639da97a 100644
--- a/tools/testing/selftests/powerpc/ptrace/Makefile
+++ b/tools/testing/selftests/powerpc/ptrace/Makefile
@@ -2,7 +2,7 @@
 TEST_GEN_PROGS := ptrace-gpr ptrace-tm-gpr ptrace-tm-spd-gpr \
   ptrace-tar ptrace-tm-tar ptrace-tm-spd-tar ptrace-vsx 
ptrace-tm-vsx \
   ptrace-tm-spd-vsx ptrace-tm-spr ptrace-hwbreak ptrace-pkey 
core-pkey \
-  perf-hwbreak ptrace-syscall
+  perf-hwbreak ptrace-syscall ptrace-perf-hwbreak
 
 top_srcdir = ../../../../..
 include ../../lib.mk
diff --git a/tools/testing/selftests/powerpc/ptrace/ptrace-perf-hwbreak.c 
b/tools/testing/selftests/powerpc/ptrace/ptrace-perf-hwbreak.c
new file mode 100644
index ..fad14f926b98
--- /dev/null
+++ b/tools/testing/selftests/powerpc/ptrace/ptrace-perf-hwbreak.c
@@ -0,0 +1,659 @@
+// SPDX-License-Identifier: GPL-2.0+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include "ptrace.h"
+
+char data[16];
+
+/* Overlapping address range */
+volatile __u64 *ptrace_data1 = (__u64 *)[0];
+volatile __u64 *perf_data1 = (__u64 *)[4];
+
+/* Non-overlapping address range */
+volatile __u64 *ptrace_data2 = (__u64 *)[0];
+volatile __u64 *perf_data2 = (__u64 *)[8];
+
+static unsigned long pid_max_addr(void)
+{
+   FILE *fp;
+   char *line, *c;
+   char addr[100];
+   size_t len = 0;
+
+   fp = fopen("/proc/kallsyms", "r");
+   if (!fp) {
+   printf("Failed to read /proc/kallsyms. Exiting..\n");
+   exit(EXIT_FAILURE);
+   }
+
+   while (getline(, , fp) != -1) {
+   if (!strstr(line, "pid_max") || strstr(line, "pid_max_max") ||
+   strstr(line, "pid_max_min"))
+   continue;
+
+   strncpy(addr, line, len < 100 ? len : 100);
+   c = strchr(addr, ' ');
+   *c = '\0';
+   return strtoul(addr, , 16);
+   }
+   fclose(fp);
+   printf("Could not find pix_max. Exiting..\n");
+   exit(EXIT_FAILURE);
+   return -1;
+}
+
+static void perf_user_event_attr_set(struct perf_event_attr *attr, __u64 addr, 
__u64 len)
+{
+   memset(attr, 0, sizeof(struct perf_event_attr));
+   attr->type   = PERF_TYPE_BREAKPOINT;
+   attr->size   = sizeof(struct perf_event_attr);
+   attr->bp_type= HW_BREAKPOINT_R;
+   attr->bp_addr= addr;
+   attr->bp_len = len;
+   attr->exclude_kernel = 1;
+   attr->exclude_hv = 1;
+}
+
+static void perf_kernel_event_attr_set(struct perf_event_attr *attr)
+{
+   memset(attr, 0, sizeof(struct perf_event_attr));
+   attr->type   = PERF_

[PATCH v3 3/4] powerpc/selftests/perf-hwbreak: Add testcases for 2nd DAWR

2021-04-12 Thread Ravi Bangoria

Extend perf-hwbreak.c selftest to test multiple DAWRs. Also add
testcase for testing 512 byte boundary removal.

Sample o/p:
  # ./perf-hwbreak
  ...
  TESTED: Process specific, Two events, diff addr
  TESTED: Process specific, Two events, same addr
  TESTED: Process specific, Two events, diff addr, one is RO, other is WO
  TESTED: Process specific, Two events, same addr, one is RO, other is WO
  TESTED: Systemwide, Two events, diff addr
  TESTED: Systemwide, Two events, same addr
  TESTED: Systemwide, Two events, diff addr, one is RO, other is WO
  TESTED: Systemwide, Two events, same addr, one is RO, other is WO
  TESTED: Process specific, 512 bytes, unaligned
  success: perf_hwbreak

Signed-off-by: Ravi Bangoria 
---
 .../selftests/powerpc/ptrace/perf-hwbreak.c   | 552 +-
 1 file changed, 551 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/powerpc/ptrace/perf-hwbreak.c 
b/tools/testing/selftests/powerpc/ptrace/perf-hwbreak.c
index b1d5a14d7722..419506636ebe 100644
--- a/tools/testing/selftests/powerpc/ptrace/perf-hwbreak.c
+++ b/tools/testing/selftests/powerpc/ptrace/perf-hwbreak.c
@@ -21,8 +21,13 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
+#include 
+#include 
+#include 
+#include 
 #include 
 #include 
 #include 
@@ -34,6 +39,12 @@
 
 #define DAWR_LENGTH_MAX ((0x3f + 1) * 8)
 
+int nprocs;
+
+static volatile int a = 10;
+static volatile int b = 10;
+static volatile char c[512 + 8] __attribute__((aligned(512)));
+
 static void perf_event_attr_set(struct perf_event_attr *attr,
__u32 type, __u64 addr, __u64 len,
bool exclude_user)
@@ -68,6 +79,76 @@ static int perf_process_event_open(__u32 type, __u64 addr, 
__u64 len)
return syscall(__NR_perf_event_open, , getpid(), -1, -1, 0);
 }
 
+static int perf_cpu_event_open(long cpu, __u32 type, __u64 addr, __u64 len)
+{
+   struct perf_event_attr attr;
+
+   perf_event_attr_set(, type, addr, len, 0);
+   return syscall(__NR_perf_event_open, , -1, cpu, -1, 0);
+}
+
+static void close_fds(int *fd, int n)
+{
+   int i;
+
+   for (i = 0; i < n; i++)
+   close(fd[i]);
+}
+
+static unsigned long read_fds(int *fd, int n)
+{
+   int i;
+   unsigned long c = 0;
+   unsigned long count = 0;
+   size_t res;
+
+   for (i = 0; i < n; i++) {
+   res = read(fd[i], , sizeof(c));
+   assert(res == sizeof(unsigned long long));
+   count += c;
+   }
+   return count;
+}
+
+static void reset_fds(int *fd, int n)
+{
+   int i;
+
+   for (i = 0; i < n; i++)
+   ioctl(fd[i], PERF_EVENT_IOC_RESET);
+}
+
+static void enable_fds(int *fd, int n)
+{
+   int i;
+
+   for (i = 0; i < n; i++)
+   ioctl(fd[i], PERF_EVENT_IOC_ENABLE);
+}
+
+static void disable_fds(int *fd, int n)
+{
+   int i;
+
+   for (i = 0; i < n; i++)
+   ioctl(fd[i], PERF_EVENT_IOC_DISABLE);
+}
+
+static int perf_systemwide_event_open(int *fd, __u32 type, __u64 addr, __u64 
len)
+{
+   int i = 0;
+
+   /* Assume online processors are 0 to nprocs for simplisity */
+   for (i = 0; i < nprocs; i++) {
+   fd[i] = perf_cpu_event_open(i, type, addr, len);
+   if (fd[i] < 0) {
+   close_fds(fd, i);
+   return fd[i];
+   }
+   }
+   return 0;
+}
+
 static inline bool breakpoint_test(int len)
 {
int fd;
@@ -262,11 +343,467 @@ static int runtest_dar_outside(void)
return fail;
 }
 
+static void multi_dawr_workload(void)
+{
+   a += 10;
+   b += 10;
+   c[512 + 1] += 'a';
+}
+
+static int test_process_multi_diff_addr(void)
+{
+   unsigned long long breaks1 = 0, breaks2 = 0;
+   int fd1, fd2;
+   char *desc = "Process specific, Two events, diff addr";
+   size_t res;
+
+   fd1 = perf_process_event_open(HW_BREAKPOINT_RW, (__u64), 
(__u64)sizeof(a));
+   if (fd1 < 0) {
+   perror("perf_process_event_open");
+   exit(EXIT_FAILURE);
+   }
+
+   fd2 = perf_process_event_open(HW_BREAKPOINT_RW, (__u64), 
(__u64)sizeof(b));
+   if (fd2 < 0) {
+   close(fd1);
+   perror("perf_process_event_open");
+   exit(EXIT_FAILURE);
+   }
+
+   ioctl(fd1, PERF_EVENT_IOC_RESET);
+   ioctl(fd2, PERF_EVENT_IOC_RESET);
+   ioctl(fd1, PERF_EVENT_IOC_ENABLE);
+   ioctl(fd2, PERF_EVENT_IOC_ENABLE);
+   multi_dawr_workload();
+   ioctl(fd1, PERF_EVENT_IOC_DISABLE);
+   ioctl(fd2, PERF_EVENT_IOC_DISABLE);
+
+   res = read(fd1, , sizeof(breaks1));
+   assert(res == sizeof(unsigned long long));
+   res = read(fd2, , sizeof(breaks2));
+   assert(res == sizeof(unsigned long long));
+
+   close(fd1);
+   close(fd2);
+
+   if (breaks1 != 2 || breaks2 != 2

[PATCH v3 2/4] powerpc/selftests/perf-hwbreak: Coalesce event creation code

2021-04-12 Thread Ravi Bangoria

perf-hwbreak selftest opens hw-breakpoint event at multiple places for
which it has same code repeated. Coalesce that code into a function.

Signed-off-by: Ravi Bangoria 
---
 .../selftests/powerpc/ptrace/perf-hwbreak.c   | 79 +--
 1 file changed, 39 insertions(+), 40 deletions(-)

diff --git a/tools/testing/selftests/powerpc/ptrace/perf-hwbreak.c 
b/tools/testing/selftests/powerpc/ptrace/perf-hwbreak.c
index c1f324afdbf3..b1d5a14d7722 100644
--- a/tools/testing/selftests/powerpc/ptrace/perf-hwbreak.c
+++ b/tools/testing/selftests/powerpc/ptrace/perf-hwbreak.c
@@ -34,28 +34,46 @@
 
 #define DAWR_LENGTH_MAX ((0x3f + 1) * 8)
 
-static inline int sys_perf_event_open(struct perf_event_attr *attr, pid_t pid,
- int cpu, int group_fd,
- unsigned long flags)
+static void perf_event_attr_set(struct perf_event_attr *attr,
+   __u32 type, __u64 addr, __u64 len,
+   bool exclude_user)
 {
-   attr->size = sizeof(*attr);
-   return syscall(__NR_perf_event_open, attr, pid, cpu, group_fd, flags);
+   memset(attr, 0, sizeof(struct perf_event_attr));
+   attr->type   = PERF_TYPE_BREAKPOINT;
+   attr->size   = sizeof(struct perf_event_attr);
+   attr->bp_type= type;
+   attr->bp_addr= addr;
+   attr->bp_len = len;
+   attr->exclude_kernel = 1;
+   attr->exclude_hv = 1;
+   attr->exclude_guest  = 1;
+   attr->exclude_user   = exclude_user;
+   attr->disabled   = 1;
 }
 
-static inline bool breakpoint_test(int len)
+static int
+perf_process_event_open_exclude_user(__u32 type, __u64 addr, __u64 len, bool 
exclude_user)
 {
struct perf_event_attr attr;
+
+   perf_event_attr_set(, type, addr, len, exclude_user);
+   return syscall(__NR_perf_event_open, , getpid(), -1, -1, 0);
+}
+
+static int perf_process_event_open(__u32 type, __u64 addr, __u64 len)
+{
+   struct perf_event_attr attr;
+
+   perf_event_attr_set(, type, addr, len, 0);
+   return syscall(__NR_perf_event_open, , getpid(), -1, -1, 0);
+}
+
+static inline bool breakpoint_test(int len)
+{
int fd;
 
-   /* setup counters */
-   memset(, 0, sizeof(attr));
-   attr.disabled = 1;
-   attr.type = PERF_TYPE_BREAKPOINT;
-   attr.bp_type = HW_BREAKPOINT_R;
/* bp_addr can point anywhere but needs to be aligned */
-   attr.bp_addr = (__u64)() & 0xf800;
-   attr.bp_len = len;
-   fd = sys_perf_event_open(, 0, -1, -1, 0);
+   fd = perf_process_event_open(HW_BREAKPOINT_R, (__u64)() & 
0xf800, len);
if (fd < 0)
return false;
close(fd);
@@ -75,7 +93,6 @@ static inline bool dawr_supported(void)
 static int runtestsingle(int readwriteflag, int exclude_user, int arraytest)
 {
int i,j;
-   struct perf_event_attr attr;
size_t res;
unsigned long long breaks, needed;
int readint;
@@ -85,6 +102,7 @@ static int runtestsingle(int readwriteflag, int 
exclude_user, int arraytest)
int break_fd;
int loop_num = MAX_LOOPS - (rand() % 100); /* provide some variability 
*/
volatile int *k;
+   __u64 len;
 
/* align to 0x400 boundary as required by DAWR */
readintalign = (int *)(((unsigned long)readintarraybig + 0x7ff) &
@@ -94,19 +112,11 @@ static int runtestsingle(int readwriteflag, int 
exclude_user, int arraytest)
if (arraytest)
ptr = [0];
 
-   /* setup counters */
-   memset(, 0, sizeof(attr));
-   attr.disabled = 1;
-   attr.type = PERF_TYPE_BREAKPOINT;
-   attr.bp_type = readwriteflag;
-   attr.bp_addr = (__u64)ptr;
-   attr.bp_len = sizeof(int);
-   if (arraytest)
-   attr.bp_len = DAWR_LENGTH_MAX;
-   attr.exclude_user = exclude_user;
-   break_fd = sys_perf_event_open(, 0, -1, -1, 0);
+   len = arraytest ? DAWR_LENGTH_MAX : sizeof(int);
+   break_fd = perf_process_event_open_exclude_user(readwriteflag, 
(__u64)ptr,
+   len, exclude_user);
if (break_fd < 0) {
-   perror("sys_perf_event_open");
+   perror("perf_process_event_open_exclude_user");
exit(1);
}
 
@@ -153,7 +163,6 @@ static int runtest_dar_outside(void)
void *target;
volatile __u16 temp16;
volatile __u64 temp64;
-   struct perf_event_attr attr;
int break_fd;
unsigned long long breaks;
int fail = 0;
@@ -165,21 +174,11 @@ static int runtest_dar_outside(void)
exit(EXIT_FAILURE);
}
 
-   /* setup counters */
-   memset(, 0, sizeof(attr));
-   attr.disabled = 1;
-   attr.type = PERF_TYPE_BREAKPOINT;
-   attr.exclude_

[PATCH v3 1/4] powerpc/selftests/ptrace-hwbreak: Add testcases for 2nd DAWR

2021-04-12 Thread Ravi Bangoria

Add selftests to test multiple active DAWRs with ptrace interface.

Sample o/p:
  $ ./ptrace-hwbreak
  ...
  PPC_PTRACE_SETHWDEBUG 2, MODE_RANGE, DW ALIGNED, WO, len: 6: Ok
  PPC_PTRACE_SETHWDEBUG 2, MODE_RANGE, DW UNALIGNED, RO, len: 6: Ok
  PPC_PTRACE_SETHWDEBUG 2, MODE_RANGE, DAWR Overlap, WO, len: 6: Ok
  PPC_PTRACE_SETHWDEBUG 2, MODE_RANGE, DAWR Overlap, RO, len: 6: Ok

Signed-off-by: Ravi Bangoria 
Reviewed-by: Daniel Axtens 
---
 .../selftests/powerpc/ptrace/ptrace-hwbreak.c | 79 +++
 1 file changed, 79 insertions(+)

diff --git a/tools/testing/selftests/powerpc/ptrace/ptrace-hwbreak.c 
b/tools/testing/selftests/powerpc/ptrace/ptrace-hwbreak.c
index 2e0d86e0687e..a0635a3819aa 100644
--- a/tools/testing/selftests/powerpc/ptrace/ptrace-hwbreak.c
+++ b/tools/testing/selftests/powerpc/ptrace/ptrace-hwbreak.c
@@ -194,6 +194,18 @@ static void test_workload(void)
big_var[rand() % DAWR_MAX_LEN] = 'a';
else
cvar = big_var[rand() % DAWR_MAX_LEN];
+
+   /* PPC_PTRACE_SETHWDEBUG 2, MODE_RANGE, DW ALIGNED, WO test */
+   gstruct.a[rand() % A_LEN] = 'a';
+
+   /* PPC_PTRACE_SETHWDEBUG 2, MODE_RANGE, DW UNALIGNED, RO test */
+   cvar = gstruct.b[rand() % B_LEN];
+
+   /* PPC_PTRACE_SETHWDEBUG 2, MODE_RANGE, DAWR Overlap, WO test */
+   gstruct.a[rand() % A_LEN] = 'a';
+
+   /* PPC_PTRACE_SETHWDEBUG 2, MODE_RANGE, DAWR Overlap, RO test */
+   cvar = gstruct.a[rand() % A_LEN];
 }
 
 static void check_success(pid_t child_pid, const char *name, const char *type,
@@ -417,6 +429,69 @@ static void test_sethwdebug_range_aligned(pid_t child_pid)
ptrace_delhwdebug(child_pid, wh);
 }
 
+static void test_multi_sethwdebug_range(pid_t child_pid)
+{
+   struct ppc_hw_breakpoint info1, info2;
+   unsigned long wp_addr1, wp_addr2;
+   char *name1 = "PPC_PTRACE_SETHWDEBUG 2, MODE_RANGE, DW ALIGNED";
+   char *name2 = "PPC_PTRACE_SETHWDEBUG 2, MODE_RANGE, DW UNALIGNED";
+   int len1, len2;
+   int wh1, wh2;
+
+   wp_addr1 = (unsigned long)
+   wp_addr2 = (unsigned long)
+   len1 = A_LEN;
+   len2 = B_LEN;
+   get_ppc_hw_breakpoint(, PPC_BREAKPOINT_TRIGGER_WRITE, wp_addr1, 
len1);
+   get_ppc_hw_breakpoint(, PPC_BREAKPOINT_TRIGGER_READ, wp_addr2, 
len2);
+
+   /* PPC_PTRACE_SETHWDEBUG 2, MODE_RANGE, DW ALIGNED, WO test */
+   wh1 = ptrace_sethwdebug(child_pid, );
+
+   /* PPC_PTRACE_SETHWDEBUG 2, MODE_RANGE, DW UNALIGNED, RO test */
+   wh2 = ptrace_sethwdebug(child_pid, );
+
+   ptrace(PTRACE_CONT, child_pid, NULL, 0);
+   check_success(child_pid, name1, "WO", wp_addr1, len1);
+
+   ptrace(PTRACE_CONT, child_pid, NULL, 0);
+   check_success(child_pid, name2, "RO", wp_addr2, len2);
+
+   ptrace_delhwdebug(child_pid, wh1);
+   ptrace_delhwdebug(child_pid, wh2);
+}
+
+static void test_multi_sethwdebug_range_dawr_overlap(pid_t child_pid)
+{
+   struct ppc_hw_breakpoint info1, info2;
+   unsigned long wp_addr1, wp_addr2;
+   char *name = "PPC_PTRACE_SETHWDEBUG 2, MODE_RANGE, DAWR Overlap";
+   int len1, len2;
+   int wh1, wh2;
+
+   wp_addr1 = (unsigned long)
+   wp_addr2 = (unsigned long)
+   len1 = A_LEN;
+   len2 = A_LEN;
+   get_ppc_hw_breakpoint(, PPC_BREAKPOINT_TRIGGER_WRITE, wp_addr1, 
len1);
+   get_ppc_hw_breakpoint(, PPC_BREAKPOINT_TRIGGER_READ, wp_addr2, 
len2);
+
+   /* PPC_PTRACE_SETHWDEBUG 2, MODE_RANGE, DAWR Overlap, WO test */
+   wh1 = ptrace_sethwdebug(child_pid, );
+
+   /* PPC_PTRACE_SETHWDEBUG 2, MODE_RANGE, DAWR Overlap, RO test */
+   wh2 = ptrace_sethwdebug(child_pid, );
+
+   ptrace(PTRACE_CONT, child_pid, NULL, 0);
+   check_success(child_pid, name, "WO", wp_addr1, len1);
+
+   ptrace(PTRACE_CONT, child_pid, NULL, 0);
+   check_success(child_pid, name, "RO", wp_addr2, len2);
+
+   ptrace_delhwdebug(child_pid, wh1);
+   ptrace_delhwdebug(child_pid, wh2);
+}
+
 static void test_sethwdebug_range_unaligned(pid_t child_pid)
 {
struct ppc_hw_breakpoint info;
@@ -504,6 +579,10 @@ run_tests(pid_t child_pid, struct ppc_debug_info *dbginfo, 
bool dawr)
test_sethwdebug_range_unaligned(child_pid);
test_sethwdebug_range_unaligned_dar(child_pid);
test_sethwdebug_dawr_max_range(child_pid);
+   if (dbginfo->num_data_bps > 1) {
+   test_multi_sethwdebug_range(child_pid);
+   
test_multi_sethwdebug_range_dawr_overlap(child_pid);
+   }
}
}
 }
-- 
2.27.0

[PATCH v3 0/4] powerpc/selftests: Add Power10 2nd DAWR selftests

2021-04-12 Thread Ravi Bangoria

Power10 introduced 2nd watchpoint (DAWR). ISA 3.1, Book 3S, Ch 9 -
'Debug Facilities' covers the feature in detail. Kernel patches to
enable the 2nd DAWR are already in[1], including kvm enablement[2].
These patches adds selftests for 2nd DAWR. 

[1]: https://git.kernel.org/torvalds/c/deb2bd9bcc8428d4b65b6ba640ba8b57c1b20b17
[2]: https://git.kernel.org/torvalds/c/bd1de1a0e6eff4bde5ceae969673b85b8446fd6a

v2: 
https://lore.kernel.org/r/20210407054938.312857-1-ravi.bango...@linux.ibm.com
v2->v3:
 - Fixed some checkpatch warnings

v1: 
https://lore.kernel.org/r/20200723102058.312282-1-ravi.bango...@linux.ibm.com
v1->v2:
 - Kvm patches are already upstream
 - Rebased selftests to powerpc/next

Ravi Bangoria (4):
  powerpc/selftests/ptrace-hwbreak: Add testcases for 2nd DAWR
  powerpc/selftests/perf-hwbreak: Coalesce event creation code
  powerpc/selftests/perf-hwbreak: Add testcases for 2nd DAWR
  powerpc/selftests: Add selftest to test concurrent perf/ptrace events

 .../selftests/powerpc/ptrace/.gitignore   |   1 +
 .../testing/selftests/powerpc/ptrace/Makefile |   2 +-
 .../selftests/powerpc/ptrace/perf-hwbreak.c   | 631 +++--
 .../selftests/powerpc/ptrace/ptrace-hwbreak.c |  79 +++
 .../powerpc/ptrace/ptrace-perf-hwbreak.c  | 659 ++
 5 files changed, 1330 insertions(+), 42 deletions(-)
 create mode 100644 tools/testing/selftests/powerpc/ptrace/ptrace-perf-hwbreak.c

-- 
2.27.0

Re: [PATCH v2 2/4] powerpc/selftests/perf-hwbreak: Coalesce event creation code

2021-04-11 Thread Ravi Bangoria





On 4/9/21 12:49 PM, Daniel Axtens wrote:

Hi Ravi,


perf-hwbreak selftest opens hw-breakpoint event at multiple places for
which it has same code repeated. Coalesce that code into a function.

Signed-off-by: Ravi Bangoria 
---
  .../selftests/powerpc/ptrace/perf-hwbreak.c   | 78 +--


This doesn't simplify things very much for the code as it stands now,
but I think your next patch adds a bunch of calls to these functions, so
I agree that it makes sense to consolidate them now.


Right. This helps in next patch.


  1 file changed, 38 insertions(+), 40 deletions(-)

diff --git a/tools/testing/selftests/powerpc/ptrace/perf-hwbreak.c 
b/tools/testing/selftests/powerpc/ptrace/perf-hwbreak.c
index c1f324afdbf3..bde475341c8a 100644
--- a/tools/testing/selftests/powerpc/ptrace/perf-hwbreak.c
+++ b/tools/testing/selftests/powerpc/ptrace/perf-hwbreak.c
@@ -34,28 +34,46 @@
  
  #define DAWR_LENGTH_MAX ((0x3f + 1) * 8)
  
-static inline int sys_perf_event_open(struct perf_event_attr *attr, pid_t pid,

- int cpu, int group_fd,
- unsigned long flags)
+static void perf_event_attr_set(struct perf_event_attr *attr,
+   __u32 type, __u64 addr, __u64 len,
+   bool exclude_user)
  {
-   attr->size = sizeof(*attr);
-   return syscall(__NR_perf_event_open, attr, pid, cpu, group_fd, flags);
+   memset(attr, 0, sizeof(struct perf_event_attr));
+   attr->type   = PERF_TYPE_BREAKPOINT;
+   attr->size   = sizeof(struct perf_event_attr);
+   attr->bp_type= type;
+   attr->bp_addr= addr;
+   attr->bp_len = len;
+   attr->exclude_kernel = 1;
+   attr->exclude_hv = 1;
+   attr->exclude_guest  = 1;


Only 1 of the calls to perf sets exclude_{kernel,hv,guest} - I assume
there's no issue with setting them always but I wanted to check.


True. there is no issue in setting them as this test does all
load/stores in userspace. So all events will be recorded
irrespective of settings of exclude_{kernel,hv,guest}.




+   attr->exclude_user   = exclude_user;
+   attr->disabled   = 1;
  }
  
-	/* setup counters */

-   memset(, 0, sizeof(attr));
-   attr.disabled = 1;
-   attr.type = PERF_TYPE_BREAKPOINT;
-   attr.bp_type = readwriteflag;
-   attr.bp_addr = (__u64)ptr;
-   attr.bp_len = sizeof(int);
-   if (arraytest)
-   attr.bp_len = DAWR_LENGTH_MAX;
-   attr.exclude_user = exclude_user;
-   break_fd = sys_perf_event_open(, 0, -1, -1, 0);
+   break_fd = perf_process_event_open_exclude_user(readwriteflag, 
(__u64)ptr,
+   arraytest ? DAWR_LENGTH_MAX : sizeof(int),
+   exclude_user);


checkpatch doesn't like this very much:

CHECK: Alignment should match open parenthesis
#103: FILE: tools/testing/selftests/powerpc/ptrace/perf-hwbreak.c:115:
+   break_fd = perf_process_event_open_exclude_user(readwriteflag, 
(__u64)ptr,
+   arraytest ? DAWR_LENGTH_MAX : sizeof(int),


Sure will fix it.



Apart from that, this seems good but I haven't checked in super fine
detail just yet :)


Thanks for the review.
-Ravi

Re: [PATCH v2 1/4] powerpc/selftests/ptrace-hwbreak: Add testcases for 2nd DAWR

2021-04-11 Thread Ravi Bangoria




On 4/9/21 12:22 PM, Daniel Axtens wrote:

Hi Ravi,


Add selftests to test multiple active DAWRs with ptrace interface.


It would be good if somewhere (maybe in the cover letter) you explain
what DAWR stands for and where to find more information about it. I
found the Power ISA v3.1 Book 3 Chapter 9 very helpful.


Sure. Will add the details in v2 cover letter.


Apart from that, I don't have any specific comments about this patch. It
looks good to me, it seems to do what it says, and there are no comments
from checkpatch. It is a bit sparse in terms of comments but it is
consistent with the rest of the file so I can't really complain there :)

Reviewed-by: Daniel Axtens 


Thanks
Ravi

[PATCH v2 4/4] powerpc/selftests: Add selftest to test concurrent perf/ptrace events

2021-04-06 Thread Ravi Bangoria

ptrace and perf watchpoints can't co-exists if their address range
overlaps. See commit 29da4f91c0c1 ("powerpc/watchpoint: Don't allow
concurrent perf and ptrace events") for more detail. Add selftest
for the same.

Sample o/p:
  # ./ptrace-perf-hwbreak
  test: ptrace-perf-hwbreak
  tags: git_version:powerpc-5.8-7-118-g937fa174a15d-dirty
  perf cpu event -> ptrace thread event (Overlapping): Ok
  perf cpu event -> ptrace thread event (Non-overlapping): Ok
  perf thread event -> ptrace same thread event (Overlapping): Ok
  perf thread event -> ptrace same thread event (Non-overlapping): Ok
  perf thread event -> ptrace other thread event: Ok
  ptrace thread event -> perf kernel event: Ok
  ptrace thread event -> perf same thread event (Overlapping): Ok
  ptrace thread event -> perf same thread event (Non-overlapping): Ok
  ptrace thread event -> perf other thread event: Ok
  ptrace thread event -> perf cpu event (Overlapping): Ok
  ptrace thread event -> perf cpu event (Non-overlapping): Ok
  ptrace thread event -> perf same thread & cpu event (Overlapping): Ok
  ptrace thread event -> perf same thread & cpu event (Non-overlapping): Ok
  ptrace thread event -> perf other thread & cpu event: Ok
  success: ptrace-perf-hwbreak

Signed-off-by: Ravi Bangoria 
---
 .../selftests/powerpc/ptrace/.gitignore   |   1 +
 .../testing/selftests/powerpc/ptrace/Makefile |   2 +-
 .../powerpc/ptrace/ptrace-perf-hwbreak.c  | 659 ++
 3 files changed, 661 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/powerpc/ptrace/ptrace-perf-hwbreak.c

diff --git a/tools/testing/selftests/powerpc/ptrace/.gitignore 
b/tools/testing/selftests/powerpc/ptrace/.gitignore
index 0e96150b7c7e..eb75e5360e31 100644
--- a/tools/testing/selftests/powerpc/ptrace/.gitignore
+++ b/tools/testing/selftests/powerpc/ptrace/.gitignore
@@ -14,3 +14,4 @@ perf-hwbreak
 core-pkey
 ptrace-pkey
 ptrace-syscall
+ptrace-perf-hwbreak
diff --git a/tools/testing/selftests/powerpc/ptrace/Makefile 
b/tools/testing/selftests/powerpc/ptrace/Makefile
index 8d3f006c98cc..a500639da97a 100644
--- a/tools/testing/selftests/powerpc/ptrace/Makefile
+++ b/tools/testing/selftests/powerpc/ptrace/Makefile
@@ -2,7 +2,7 @@
 TEST_GEN_PROGS := ptrace-gpr ptrace-tm-gpr ptrace-tm-spd-gpr \
   ptrace-tar ptrace-tm-tar ptrace-tm-spd-tar ptrace-vsx 
ptrace-tm-vsx \
   ptrace-tm-spd-vsx ptrace-tm-spr ptrace-hwbreak ptrace-pkey 
core-pkey \
-  perf-hwbreak ptrace-syscall
+  perf-hwbreak ptrace-syscall ptrace-perf-hwbreak
 
 top_srcdir = ../../../../..
 include ../../lib.mk
diff --git a/tools/testing/selftests/powerpc/ptrace/ptrace-perf-hwbreak.c 
b/tools/testing/selftests/powerpc/ptrace/ptrace-perf-hwbreak.c
new file mode 100644
index ..6b8804a4942e
--- /dev/null
+++ b/tools/testing/selftests/powerpc/ptrace/ptrace-perf-hwbreak.c
@@ -0,0 +1,659 @@
+// SPDX-License-Identifier: GPL-2.0+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include "ptrace.h"
+
+char data[16];
+
+/* Overlapping address range */
+volatile __u64 *ptrace_data1 = (__u64 *)[0];
+volatile __u64 *perf_data1 = (__u64 *)[4];
+
+/* Non-overlapping address range */
+volatile __u64 *ptrace_data2 = (__u64 *)[0];
+volatile __u64 *perf_data2 = (__u64 *)[8];
+
+static unsigned long pid_max_addr(void)
+{
+   FILE *fp;
+   char *line, *c;
+   char addr[100];
+   size_t len = 0;
+
+   fp = fopen("/proc/kallsyms", "r");
+   if (!fp) {
+   printf("Failed to read /proc/kallsyms. Exiting..\n");
+   exit(EXIT_FAILURE);
+   }
+
+   while (getline(, , fp) != -1) {
+   if (!strstr(line, "pid_max") || strstr(line, "pid_max_max") ||
+   strstr(line, "pid_max_min"))
+   continue;
+
+   strncpy(addr, line, len < 100 ? len : 100);
+   c = strchr(addr, ' ');
+   *c = '\0';
+   return strtoul(addr, , 16);
+   }
+   fclose(fp);
+   printf("Could not find pix_max. Exiting..\n");
+   exit(EXIT_FAILURE);
+   return -1;
+}
+
+static void perf_user_event_attr_set(struct perf_event_attr *attr, __u64 addr, 
__u64 len)
+{
+   memset(attr, 0, sizeof(struct perf_event_attr));
+   attr->type   = PERF_TYPE_BREAKPOINT;
+   attr->size   = sizeof(struct perf_event_attr);
+   attr->bp_type= HW_BREAKPOINT_R;
+   attr->bp_addr= addr;
+   attr->bp_len = len;
+   attr->exclude_kernel = 1;
+   attr->exclude_hv = 1;
+}
+
+static void perf_kernel_event_attr_set(struct perf_event_attr *attr)
+{
+   memset(attr, 0, sizeof(struct perf_event_attr));
+   attr->type   = PERF_

[PATCH v2 3/4] powerpc/selftests/perf-hwbreak: Add testcases for 2nd DAWR

2021-04-06 Thread Ravi Bangoria

Extend perf-hwbreak.c selftest to test multiple DAWRs. Also add
testcase for testing 512 byte boundary removal.

Sample o/p:
  # ./perf-hwbreak
  ...
  TESTED: Process specific, Two events, diff addr
  TESTED: Process specific, Two events, same addr
  TESTED: Process specific, Two events, diff addr, one is RO, other is WO
  TESTED: Process specific, Two events, same addr, one is RO, other is WO
  TESTED: Systemwide, Two events, diff addr
  TESTED: Systemwide, Two events, same addr
  TESTED: Systemwide, Two events, diff addr, one is RO, other is WO
  TESTED: Systemwide, Two events, same addr, one is RO, other is WO
  TESTED: Process specific, 512 bytes, unaligned
  success: perf_hwbreak

Signed-off-by: Ravi Bangoria 
---
 .../selftests/powerpc/ptrace/perf-hwbreak.c   | 568 +-
 1 file changed, 567 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/powerpc/ptrace/perf-hwbreak.c 
b/tools/testing/selftests/powerpc/ptrace/perf-hwbreak.c
index bde475341c8a..1dafba42c23d 100644
--- a/tools/testing/selftests/powerpc/ptrace/perf-hwbreak.c
+++ b/tools/testing/selftests/powerpc/ptrace/perf-hwbreak.c
@@ -21,8 +21,13 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
+#include 
+#include 
+#include 
+#include 
 #include 
 #include 
 #include 
@@ -34,6 +39,12 @@
 
 #define DAWR_LENGTH_MAX ((0x3f + 1) * 8)
 
+int nprocs;
+
+static volatile int a = 10;
+static volatile int b = 10;
+static volatile char c[512 + 8] __attribute__((aligned(512)));
+
 static void perf_event_attr_set(struct perf_event_attr *attr,
__u32 type, __u64 addr, __u64 len,
bool exclude_user)
@@ -68,6 +79,76 @@ static int perf_process_event_open(__u32 type, __u64 addr, 
__u64 len)
return syscall(__NR_perf_event_open, , getpid(), -1, -1, 0);
 }
 
+static int perf_cpu_event_open(long cpu, __u32 type, __u64 addr, __u64 len)
+{
+   struct perf_event_attr attr;
+
+   perf_event_attr_set(, type, addr, len, 0);
+   return syscall(__NR_perf_event_open, , -1, cpu, -1, 0);
+}
+
+static void close_fds(int *fd, int n)
+{
+   int i;
+
+   for (i = 0; i < n; i++)
+   close(fd[i]);
+}
+
+static unsigned long read_fds(int *fd, int n)
+{
+   int i;
+   unsigned long c = 0;
+   unsigned long count = 0;
+   size_t res;
+
+   for (i = 0; i < n; i++) {
+   res = read(fd[i], , sizeof(c));
+   assert(res == sizeof(unsigned long long));
+   count += c;
+   }
+   return count;
+}
+
+static void reset_fds(int *fd, int n)
+{
+   int i;
+
+   for (i = 0; i < n; i++)
+   ioctl(fd[i], PERF_EVENT_IOC_RESET);
+}
+
+static void enable_fds(int *fd, int n)
+{
+   int i;
+
+   for (i = 0; i < n; i++)
+   ioctl(fd[i], PERF_EVENT_IOC_ENABLE);
+}
+
+static void disable_fds(int *fd, int n)
+{
+   int i;
+
+   for (i = 0; i < n; i++)
+   ioctl(fd[i], PERF_EVENT_IOC_DISABLE);
+}
+
+static int perf_systemwide_event_open(int *fd, __u32 type, __u64 addr, __u64 
len)
+{
+   int i = 0;
+
+   /* Assume online processors are 0 to nprocs for simplisity */
+   for (i = 0; i < nprocs; i++) {
+   fd[i] = perf_cpu_event_open(i, type, addr, len);
+   if (fd[i] < 0) {
+   close_fds(fd, i);
+   return fd[i];
+   }
+   }
+   return 0;
+}
+
 static inline bool breakpoint_test(int len)
 {
int fd;
@@ -261,11 +342,483 @@ static int runtest_dar_outside(void)
return fail;
 }
 
+static void multi_dawr_workload(void)
+{
+   a += 10;
+   b += 10;
+   c[512 + 1] += 'a';
+}
+
+static int test_process_multi_diff_addr(void)
+{
+   unsigned long long breaks1 = 0, breaks2 = 0;
+   int fd1, fd2;
+   char *desc = "Process specific, Two events, diff addr";
+   size_t res;
+
+   fd1 = perf_process_event_open(HW_BREAKPOINT_RW, (__u64), 
(__u64)sizeof(a));
+   if (fd1 < 0) {
+   perror("perf_process_event_open");
+   exit(EXIT_FAILURE);
+   }
+
+   fd2 = perf_process_event_open(HW_BREAKPOINT_RW, (__u64), 
(__u64)sizeof(b));
+   if (fd2 < 0) {
+   close(fd1);
+   perror("perf_process_event_open");
+   exit(EXIT_FAILURE);
+   }
+
+   ioctl(fd1, PERF_EVENT_IOC_RESET);
+   ioctl(fd2, PERF_EVENT_IOC_RESET);
+   ioctl(fd1, PERF_EVENT_IOC_ENABLE);
+   ioctl(fd2, PERF_EVENT_IOC_ENABLE);
+   multi_dawr_workload();
+   ioctl(fd1, PERF_EVENT_IOC_DISABLE);
+   ioctl(fd2, PERF_EVENT_IOC_DISABLE);
+
+   res = read(fd1, , sizeof(breaks1));
+   assert(res == sizeof(unsigned long long));
+   res = read(fd2, , sizeof(breaks2));
+   assert(res == sizeof(unsigned long long));
+
+   close(fd1);
+   close(fd2);
+
+   if (breaks1 != 2 || breaks2 != 2

[PATCH v2 2/4] powerpc/selftests/perf-hwbreak: Coalesce event creation code

2021-04-06 Thread Ravi Bangoria

perf-hwbreak selftest opens hw-breakpoint event at multiple places for
which it has same code repeated. Coalesce that code into a function.

Signed-off-by: Ravi Bangoria 
---
 .../selftests/powerpc/ptrace/perf-hwbreak.c   | 78 +--
 1 file changed, 38 insertions(+), 40 deletions(-)

diff --git a/tools/testing/selftests/powerpc/ptrace/perf-hwbreak.c 
b/tools/testing/selftests/powerpc/ptrace/perf-hwbreak.c
index c1f324afdbf3..bde475341c8a 100644
--- a/tools/testing/selftests/powerpc/ptrace/perf-hwbreak.c
+++ b/tools/testing/selftests/powerpc/ptrace/perf-hwbreak.c
@@ -34,28 +34,46 @@
 
 #define DAWR_LENGTH_MAX ((0x3f + 1) * 8)
 
-static inline int sys_perf_event_open(struct perf_event_attr *attr, pid_t pid,
- int cpu, int group_fd,
- unsigned long flags)
+static void perf_event_attr_set(struct perf_event_attr *attr,
+   __u32 type, __u64 addr, __u64 len,
+   bool exclude_user)
 {
-   attr->size = sizeof(*attr);
-   return syscall(__NR_perf_event_open, attr, pid, cpu, group_fd, flags);
+   memset(attr, 0, sizeof(struct perf_event_attr));
+   attr->type   = PERF_TYPE_BREAKPOINT;
+   attr->size   = sizeof(struct perf_event_attr);
+   attr->bp_type= type;
+   attr->bp_addr= addr;
+   attr->bp_len = len;
+   attr->exclude_kernel = 1;
+   attr->exclude_hv = 1;
+   attr->exclude_guest  = 1;
+   attr->exclude_user   = exclude_user;
+   attr->disabled   = 1;
 }
 
-static inline bool breakpoint_test(int len)
+static int
+perf_process_event_open_exclude_user(__u32 type, __u64 addr, __u64 len, bool 
exclude_user)
 {
struct perf_event_attr attr;
+
+   perf_event_attr_set(, type, addr, len, exclude_user);
+   return syscall(__NR_perf_event_open, , getpid(), -1, -1, 0);
+}
+
+static int perf_process_event_open(__u32 type, __u64 addr, __u64 len)
+{
+   struct perf_event_attr attr;
+
+   perf_event_attr_set(, type, addr, len, 0);
+   return syscall(__NR_perf_event_open, , getpid(), -1, -1, 0);
+}
+
+static inline bool breakpoint_test(int len)
+{
int fd;
 
-   /* setup counters */
-   memset(, 0, sizeof(attr));
-   attr.disabled = 1;
-   attr.type = PERF_TYPE_BREAKPOINT;
-   attr.bp_type = HW_BREAKPOINT_R;
/* bp_addr can point anywhere but needs to be aligned */
-   attr.bp_addr = (__u64)() & 0xf800;
-   attr.bp_len = len;
-   fd = sys_perf_event_open(, 0, -1, -1, 0);
+   fd = perf_process_event_open(HW_BREAKPOINT_R, (__u64)() & 
0xf800, len);
if (fd < 0)
return false;
close(fd);
@@ -75,7 +93,6 @@ static inline bool dawr_supported(void)
 static int runtestsingle(int readwriteflag, int exclude_user, int arraytest)
 {
int i,j;
-   struct perf_event_attr attr;
size_t res;
unsigned long long breaks, needed;
int readint;
@@ -94,19 +111,11 @@ static int runtestsingle(int readwriteflag, int 
exclude_user, int arraytest)
if (arraytest)
ptr = [0];
 
-   /* setup counters */
-   memset(, 0, sizeof(attr));
-   attr.disabled = 1;
-   attr.type = PERF_TYPE_BREAKPOINT;
-   attr.bp_type = readwriteflag;
-   attr.bp_addr = (__u64)ptr;
-   attr.bp_len = sizeof(int);
-   if (arraytest)
-   attr.bp_len = DAWR_LENGTH_MAX;
-   attr.exclude_user = exclude_user;
-   break_fd = sys_perf_event_open(, 0, -1, -1, 0);
+   break_fd = perf_process_event_open_exclude_user(readwriteflag, 
(__u64)ptr,
+   arraytest ? DAWR_LENGTH_MAX : sizeof(int),
+   exclude_user);
if (break_fd < 0) {
-   perror("sys_perf_event_open");
+   perror("perf_process_event_open_exclude_user");
exit(1);
}
 
@@ -153,7 +162,6 @@ static int runtest_dar_outside(void)
void *target;
volatile __u16 temp16;
volatile __u64 temp64;
-   struct perf_event_attr attr;
int break_fd;
unsigned long long breaks;
int fail = 0;
@@ -165,21 +173,11 @@ static int runtest_dar_outside(void)
exit(EXIT_FAILURE);
}
 
-   /* setup counters */
-   memset(, 0, sizeof(attr));
-   attr.disabled = 1;
-   attr.type = PERF_TYPE_BREAKPOINT;
-   attr.exclude_kernel = 1;
-   attr.exclude_hv = 1;
-   attr.exclude_guest = 1;
-   attr.bp_type = HW_BREAKPOINT_RW;
/* watch middle half of target array */
-   attr.bp_addr = (__u64)(target + 2);
-   attr.bp_len = 4;
-   break_fd = sys_perf_event_open(, 0, -1, -1, 0);
+   break_fd = perf_process_event_open(HW_BREAKPOINT_RW, (__u64)(target + 
2), 4);
if (break_fd < 0) {

[PATCH v2 1/4] powerpc/selftests/ptrace-hwbreak: Add testcases for 2nd DAWR

2021-04-06 Thread Ravi Bangoria

Add selftests to test multiple active DAWRs with ptrace interface.

Sample o/p:
  $ ./ptrace-hwbreak
  ...
  PPC_PTRACE_SETHWDEBUG 2, MODE_RANGE, DW ALIGNED, WO, len: 6: Ok
  PPC_PTRACE_SETHWDEBUG 2, MODE_RANGE, DW UNALIGNED, RO, len: 6: Ok
  PPC_PTRACE_SETHWDEBUG 2, MODE_RANGE, DAWR Overlap, WO, len: 6: Ok
  PPC_PTRACE_SETHWDEBUG 2, MODE_RANGE, DAWR Overlap, RO, len: 6: Ok

Signed-off-by: Ravi Bangoria 
---
 .../selftests/powerpc/ptrace/ptrace-hwbreak.c | 79 +++
 1 file changed, 79 insertions(+)

diff --git a/tools/testing/selftests/powerpc/ptrace/ptrace-hwbreak.c 
b/tools/testing/selftests/powerpc/ptrace/ptrace-hwbreak.c
index 2e0d86e0687e..a0635a3819aa 100644
--- a/tools/testing/selftests/powerpc/ptrace/ptrace-hwbreak.c
+++ b/tools/testing/selftests/powerpc/ptrace/ptrace-hwbreak.c
@@ -194,6 +194,18 @@ static void test_workload(void)
big_var[rand() % DAWR_MAX_LEN] = 'a';
else
cvar = big_var[rand() % DAWR_MAX_LEN];
+
+   /* PPC_PTRACE_SETHWDEBUG 2, MODE_RANGE, DW ALIGNED, WO test */
+   gstruct.a[rand() % A_LEN] = 'a';
+
+   /* PPC_PTRACE_SETHWDEBUG 2, MODE_RANGE, DW UNALIGNED, RO test */
+   cvar = gstruct.b[rand() % B_LEN];
+
+   /* PPC_PTRACE_SETHWDEBUG 2, MODE_RANGE, DAWR Overlap, WO test */
+   gstruct.a[rand() % A_LEN] = 'a';
+
+   /* PPC_PTRACE_SETHWDEBUG 2, MODE_RANGE, DAWR Overlap, RO test */
+   cvar = gstruct.a[rand() % A_LEN];
 }
 
 static void check_success(pid_t child_pid, const char *name, const char *type,
@@ -417,6 +429,69 @@ static void test_sethwdebug_range_aligned(pid_t child_pid)
ptrace_delhwdebug(child_pid, wh);
 }
 
+static void test_multi_sethwdebug_range(pid_t child_pid)
+{
+   struct ppc_hw_breakpoint info1, info2;
+   unsigned long wp_addr1, wp_addr2;
+   char *name1 = "PPC_PTRACE_SETHWDEBUG 2, MODE_RANGE, DW ALIGNED";
+   char *name2 = "PPC_PTRACE_SETHWDEBUG 2, MODE_RANGE, DW UNALIGNED";
+   int len1, len2;
+   int wh1, wh2;
+
+   wp_addr1 = (unsigned long)
+   wp_addr2 = (unsigned long)
+   len1 = A_LEN;
+   len2 = B_LEN;
+   get_ppc_hw_breakpoint(, PPC_BREAKPOINT_TRIGGER_WRITE, wp_addr1, 
len1);
+   get_ppc_hw_breakpoint(, PPC_BREAKPOINT_TRIGGER_READ, wp_addr2, 
len2);
+
+   /* PPC_PTRACE_SETHWDEBUG 2, MODE_RANGE, DW ALIGNED, WO test */
+   wh1 = ptrace_sethwdebug(child_pid, );
+
+   /* PPC_PTRACE_SETHWDEBUG 2, MODE_RANGE, DW UNALIGNED, RO test */
+   wh2 = ptrace_sethwdebug(child_pid, );
+
+   ptrace(PTRACE_CONT, child_pid, NULL, 0);
+   check_success(child_pid, name1, "WO", wp_addr1, len1);
+
+   ptrace(PTRACE_CONT, child_pid, NULL, 0);
+   check_success(child_pid, name2, "RO", wp_addr2, len2);
+
+   ptrace_delhwdebug(child_pid, wh1);
+   ptrace_delhwdebug(child_pid, wh2);
+}
+
+static void test_multi_sethwdebug_range_dawr_overlap(pid_t child_pid)
+{
+   struct ppc_hw_breakpoint info1, info2;
+   unsigned long wp_addr1, wp_addr2;
+   char *name = "PPC_PTRACE_SETHWDEBUG 2, MODE_RANGE, DAWR Overlap";
+   int len1, len2;
+   int wh1, wh2;
+
+   wp_addr1 = (unsigned long)
+   wp_addr2 = (unsigned long)
+   len1 = A_LEN;
+   len2 = A_LEN;
+   get_ppc_hw_breakpoint(, PPC_BREAKPOINT_TRIGGER_WRITE, wp_addr1, 
len1);
+   get_ppc_hw_breakpoint(, PPC_BREAKPOINT_TRIGGER_READ, wp_addr2, 
len2);
+
+   /* PPC_PTRACE_SETHWDEBUG 2, MODE_RANGE, DAWR Overlap, WO test */
+   wh1 = ptrace_sethwdebug(child_pid, );
+
+   /* PPC_PTRACE_SETHWDEBUG 2, MODE_RANGE, DAWR Overlap, RO test */
+   wh2 = ptrace_sethwdebug(child_pid, );
+
+   ptrace(PTRACE_CONT, child_pid, NULL, 0);
+   check_success(child_pid, name, "WO", wp_addr1, len1);
+
+   ptrace(PTRACE_CONT, child_pid, NULL, 0);
+   check_success(child_pid, name, "RO", wp_addr2, len2);
+
+   ptrace_delhwdebug(child_pid, wh1);
+   ptrace_delhwdebug(child_pid, wh2);
+}
+
 static void test_sethwdebug_range_unaligned(pid_t child_pid)
 {
struct ppc_hw_breakpoint info;
@@ -504,6 +579,10 @@ run_tests(pid_t child_pid, struct ppc_debug_info *dbginfo, 
bool dawr)
test_sethwdebug_range_unaligned(child_pid);
test_sethwdebug_range_unaligned_dar(child_pid);
test_sethwdebug_dawr_max_range(child_pid);
+   if (dbginfo->num_data_bps > 1) {
+   test_multi_sethwdebug_range(child_pid);
+   
test_multi_sethwdebug_range_dawr_overlap(child_pid);
+   }
}
}
 }
-- 
2.27.0

[PATCH v2 0/4] powerpc/selftests: Add Power10 2nd DAWR selftests

2021-04-06 Thread Ravi Bangoria

Add selftests for 2nd DAWR supported by Power10.

v1: 
https://lore.kernel.org/r/20200723102058.312282-1-ravi.bango...@linux.ibm.com
v1->v2:
 - Kvm patches are already upstream
 - Rebased selftests to powerpc/next

Ravi Bangoria (4):
  powerpc/selftests/ptrace-hwbreak: Add testcases for 2nd DAWR
  powerpc/selftests/perf-hwbreak: Coalesce event creation code
  powerpc/selftests/perf-hwbreak: Add testcases for 2nd DAWR
  powerpc/selftests: Add selftest to test concurrent perf/ptrace events

 .../selftests/powerpc/ptrace/.gitignore   |   1 +
 .../testing/selftests/powerpc/ptrace/Makefile |   2 +-
 .../selftests/powerpc/ptrace/perf-hwbreak.c   | 646 +++--
 .../selftests/powerpc/ptrace/ptrace-hwbreak.c |  79 +++
 .../powerpc/ptrace/ptrace-perf-hwbreak.c  | 659 ++
 5 files changed, 1345 insertions(+), 42 deletions(-)
 create mode 100644 tools/testing/selftests/powerpc/ptrace/ptrace-perf-hwbreak.c

-- 
2.27.0

[PATCH v5] powerpc/uprobes: Validation for prefixed instruction

2021-03-11 Thread Ravi Bangoria

As per ISA 3.1, prefixed instruction should not cross 64-byte
boundary. So don't allow Uprobe on such prefixed instruction.

There are two ways probed instruction is changed in mapped pages.
First, when Uprobe is activated, it searches for all the relevant
pages and replace instruction in them. In this case, if that probe
is on the 64-byte unaligned prefixed instruction, error out
directly. Second, when Uprobe is already active and user maps a
relevant page via mmap(), instruction is replaced via mmap() code
path. But because Uprobe is invalid, entire mmap() operation can
not be stopped. In this case just print an error and continue.

Signed-off-by: Ravi Bangoria 
Acked-by: Naveen N. Rao 
Acked-by: Sandipan Das 
---
v4: 
https://lore.kernel.org/r/20210305115433.140769-1-ravi.bango...@linux.ibm.com
v4->v5:
  - Replace SZ_ macros with numbers

 arch/powerpc/kernel/uprobes.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/arch/powerpc/kernel/uprobes.c b/arch/powerpc/kernel/uprobes.c
index e8a63713e655..186f69b11e94 100644
--- a/arch/powerpc/kernel/uprobes.c
+++ b/arch/powerpc/kernel/uprobes.c
@@ -41,6 +41,13 @@ int arch_uprobe_analyze_insn(struct arch_uprobe *auprobe,
if (addr & 0x03)
return -EINVAL;
 
+   if (cpu_has_feature(CPU_FTR_ARCH_31) &&
+   ppc_inst_prefixed(auprobe->insn) &&
+   (addr & 0x3f) == 60) {
+   pr_info_ratelimited("Cannot register a uprobe on 64 byte 
unaligned prefixed instruction\n");
+   return -EINVAL;
+   }
+
return 0;
 }
 
-- 
2.27.0

Re: [PATCH v4] powerpc/uprobes: Validation for prefixed instruction

2021-03-09 Thread Ravi Bangoria





On 3/9/21 4:51 PM, Naveen N. Rao wrote:

On 2021/03/09 08:54PM, Michael Ellerman wrote:

Ravi Bangoria  writes:

As per ISA 3.1, prefixed instruction should not cross 64-byte
boundary. So don't allow Uprobe on such prefixed instruction.

There are two ways probed instruction is changed in mapped pages.
First, when Uprobe is activated, it searches for all the relevant
pages and replace instruction in them. In this case, if that probe
is on the 64-byte unaligned prefixed instruction, error out
directly. Second, when Uprobe is already active and user maps a
relevant page via mmap(), instruction is replaced via mmap() code
path. But because Uprobe is invalid, entire mmap() operation can
not be stopped. In this case just print an error and continue.

Signed-off-by: Ravi Bangoria 
Acked-by: Naveen N. Rao 


Do we have a Fixes: tag for this?


Since this is an additional check we are adding, I don't think we should
add a Fixes: tag. Nothing is broken per-se -- we're just adding more
checks to catch simple mistakes. Also, like Oleg pointed out, there are
still many other ways for users to shoot themselves in the foot with
uprobes and prefixed instructions, if they so desire.

However, if you still think we should add a Fixes: tag, we can perhaps
use the below commit since I didn't see any specific commit adding
support for prefixed instructions for uprobes:

Fixes: 650b55b707fdfa ("powerpc: Add prefixed instructions to
instruction data type")


True. IMO, It doesn't really need any Fixes tag.






---
v3: https://lore.kernel.org/r/20210304050529.59391-1-ravi.bango...@linux.ibm.com
v3->v4:
   - CONFIG_PPC64 check was not required, remove it.
   - Use SZ_ macros instead of hardcoded numbers.

  arch/powerpc/kernel/uprobes.c | 7 +++
  1 file changed, 7 insertions(+)

diff --git a/arch/powerpc/kernel/uprobes.c b/arch/powerpc/kernel/uprobes.c
index e8a63713e655..4cbfff6e94a3 100644
--- a/arch/powerpc/kernel/uprobes.c
+++ b/arch/powerpc/kernel/uprobes.c
@@ -41,6 +41,13 @@ int arch_uprobe_analyze_insn(struct arch_uprobe *auprobe,
if (addr & 0x03)
return -EINVAL;
  
+	if (cpu_has_feature(CPU_FTR_ARCH_31) &&

+   ppc_inst_prefixed(auprobe->insn) &&
+   (addr & (SZ_64 - 4)) == SZ_64 - 4) {
+   pr_info_ratelimited("Cannot register a uprobe on 64 byte unaligned 
prefixed instruction\n");
+   return -EINVAL;


I realise we already did the 0x03 check above, but I still think this
would be clearer simply as:

(addr & 0x3f == 60)


Indeed, I like the use of `60' there -- hex is overrated ;)


Sure. Will resend.

Ravi

[PATCH v4] powerpc/uprobes: Validation for prefixed instruction

2021-03-05 Thread Ravi Bangoria

As per ISA 3.1, prefixed instruction should not cross 64-byte
boundary. So don't allow Uprobe on such prefixed instruction.

There are two ways probed instruction is changed in mapped pages.
First, when Uprobe is activated, it searches for all the relevant
pages and replace instruction in them. In this case, if that probe
is on the 64-byte unaligned prefixed instruction, error out
directly. Second, when Uprobe is already active and user maps a
relevant page via mmap(), instruction is replaced via mmap() code
path. But because Uprobe is invalid, entire mmap() operation can
not be stopped. In this case just print an error and continue.

Signed-off-by: Ravi Bangoria 
Acked-by: Naveen N. Rao 
---
v3: https://lore.kernel.org/r/20210304050529.59391-1-ravi.bango...@linux.ibm.com
v3->v4:
  - CONFIG_PPC64 check was not required, remove it.
  - Use SZ_ macros instead of hardcoded numbers.

 arch/powerpc/kernel/uprobes.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/arch/powerpc/kernel/uprobes.c b/arch/powerpc/kernel/uprobes.c
index e8a63713e655..4cbfff6e94a3 100644
--- a/arch/powerpc/kernel/uprobes.c
+++ b/arch/powerpc/kernel/uprobes.c
@@ -41,6 +41,13 @@ int arch_uprobe_analyze_insn(struct arch_uprobe *auprobe,
if (addr & 0x03)
return -EINVAL;
 
+   if (cpu_has_feature(CPU_FTR_ARCH_31) &&
+   ppc_inst_prefixed(auprobe->insn) &&
+   (addr & (SZ_64 - 4)) == SZ_64 - 4) {
+   pr_info_ratelimited("Cannot register a uprobe on 64 byte 
unaligned prefixed instruction\n");
+   return -EINVAL;
+   }
+
return 0;
 }
 
-- 
2.27.0

Re: [PATCH v3] powerpc/uprobes: Validation for prefixed instruction

2021-03-04 Thread Ravi Bangoria





On 3/4/21 4:21 PM, Christophe Leroy wrote:



Le 04/03/2021 à 11:13, Ravi Bangoria a écrit :



On 3/4/21 1:02 PM, Christophe Leroy wrote:



Le 04/03/2021 à 06:05, Ravi Bangoria a écrit :

As per ISA 3.1, prefixed instruction should not cross 64-byte
boundary. So don't allow Uprobe on such prefixed instruction.

There are two ways probed instruction is changed in mapped pages.
First, when Uprobe is activated, it searches for all the relevant
pages and replace instruction in them. In this case, if that probe
is on the 64-byte unaligned prefixed instruction, error out
directly. Second, when Uprobe is already active and user maps a
relevant page via mmap(), instruction is replaced via mmap() code
path. But because Uprobe is invalid, entire mmap() operation can
not be stopped. In this case just print an error and continue.

Signed-off-by: Ravi Bangoria 
---
v2: 
https://lore.kernel.org/r/20210204104703.273429-1-ravi.bango...@linux.ibm.com
v2->v3:
   - Drop restriction for Uprobe on suffix of prefixed instruction.
 It needs lot of code change including generic code but what
 we get in return is not worth it.

  arch/powerpc/kernel/uprobes.c | 8 
  1 file changed, 8 insertions(+)

diff --git a/arch/powerpc/kernel/uprobes.c b/arch/powerpc/kernel/uprobes.c
index e8a63713e655..c400971ebe70 100644
--- a/arch/powerpc/kernel/uprobes.c
+++ b/arch/powerpc/kernel/uprobes.c
@@ -41,6 +41,14 @@ int arch_uprobe_analyze_insn(struct arch_uprobe *auprobe,
  if (addr & 0x03)
  return -EINVAL;
+    if (!IS_ENABLED(CONFIG_PPC64) || !cpu_has_feature(CPU_FTR_ARCH_31))


cpu_has_feature(CPU_FTR_ARCH_31) should return 'false' when CONFIG_PPC64 is not 
enabled, no need to double check.


Ok.

I'm going to drop CONFIG_PPC64 check because it's not really
required as I replied to Naveen. So, I'll keep CPU_FTR_ARCH_31
check as is.




+    return 0;
+
+    if (ppc_inst_prefixed(auprobe->insn) && (addr & 0x3F) == 0x3C) {


Maybe 3C instead of 4F ? : (addr & 0x3C) == 0x3C


Didn't follow. It's not (addr & 0x3C), it's (addr & 0x3F).


Sorry I meant 3c instead of 3f (And usually we don't use capital letters for 
that).
The last two bits are supposed to always be 0, so it doesn't really matter, I just 
thought it would look better having the same value both sides of the test, ie (addr 
& 0x3c) == 0x3c.


Ok yeah makes sense. Thanks.







What about

(addr & (SZ_64 - 4)) == SZ_64 - 4 to make it more explicit ?


Yes this is bit better. Though, it should be:

 (addr & (SZ_64 - 1)) == SZ_64 - 4


-1 or -4 should give the same results as instructions are always 32 bits 
aligned though.


Got it.

Ravi

Re: [PATCH v3] powerpc/uprobes: Validation for prefixed instruction

2021-03-04 Thread Ravi Bangoria





On 3/4/21 1:02 PM, Christophe Leroy wrote:



Le 04/03/2021 à 06:05, Ravi Bangoria a écrit :

As per ISA 3.1, prefixed instruction should not cross 64-byte
boundary. So don't allow Uprobe on such prefixed instruction.

There are two ways probed instruction is changed in mapped pages.
First, when Uprobe is activated, it searches for all the relevant
pages and replace instruction in them. In this case, if that probe
is on the 64-byte unaligned prefixed instruction, error out
directly. Second, when Uprobe is already active and user maps a
relevant page via mmap(), instruction is replaced via mmap() code
path. But because Uprobe is invalid, entire mmap() operation can
not be stopped. In this case just print an error and continue.

Signed-off-by: Ravi Bangoria 
---
v2: 
https://lore.kernel.org/r/20210204104703.273429-1-ravi.bango...@linux.ibm.com
v2->v3:
   - Drop restriction for Uprobe on suffix of prefixed instruction.
 It needs lot of code change including generic code but what
 we get in return is not worth it.

  arch/powerpc/kernel/uprobes.c | 8 
  1 file changed, 8 insertions(+)

diff --git a/arch/powerpc/kernel/uprobes.c b/arch/powerpc/kernel/uprobes.c
index e8a63713e655..c400971ebe70 100644
--- a/arch/powerpc/kernel/uprobes.c
+++ b/arch/powerpc/kernel/uprobes.c
@@ -41,6 +41,14 @@ int arch_uprobe_analyze_insn(struct arch_uprobe *auprobe,
  if (addr & 0x03)
  return -EINVAL;
+    if (!IS_ENABLED(CONFIG_PPC64) || !cpu_has_feature(CPU_FTR_ARCH_31))


cpu_has_feature(CPU_FTR_ARCH_31) should return 'false' when CONFIG_PPC64 is not 
enabled, no need to double check.


Ok.

I'm going to drop CONFIG_PPC64 check because it's not really
required as I replied to Naveen. So, I'll keep CPU_FTR_ARCH_31
check as is.




+    return 0;
+
+    if (ppc_inst_prefixed(auprobe->insn) && (addr & 0x3F) == 0x3C) {


Maybe 3C instead of 4F ? : (addr & 0x3C) == 0x3C


Didn't follow. It's not (addr & 0x3C), it's (addr & 0x3F).



What about

(addr & (SZ_64 - 4)) == SZ_64 - 4 to make it more explicit ?


Yes this is bit better. Though, it should be:

(addr & (SZ_64 - 1)) == SZ_64 - 4

Ravi

Re: [PATCH v3] powerpc/uprobes: Validation for prefixed instruction

2021-03-03 Thread Ravi Bangoria





@@ -41,6 +41,14 @@ int arch_uprobe_analyze_insn(struct arch_uprobe *auprobe,
if (addr & 0x03)
return -EINVAL;
  
+	if (!IS_ENABLED(CONFIG_PPC64) || !cpu_has_feature(CPU_FTR_ARCH_31))

+   return 0;


Sorry, I missed this last time, but I think we can drop the above
checks. ppc_inst_prefixed() already factors in the dependency on
CONFIG_PPC64,


Yeah, makes sense. I initially added CONFIG_PPC64 check because
I was using ppc_inst_prefix(x, y) macro which is not available
for !CONFIG_PPC64.


and I don't think we need to confirm if we're running on a
ISA V3.1 for the below check.


Prefixed instructions would be supported only when ARCH_31 is set.
(Ignoring insane scenario where user probes on prefixed instruction
on p10 predecessors). So I guess I still need ARCH_31 check?

Ravi

[PATCH v3] powerpc/uprobes: Validation for prefixed instruction

2021-03-03 Thread Ravi Bangoria

As per ISA 3.1, prefixed instruction should not cross 64-byte
boundary. So don't allow Uprobe on such prefixed instruction.

There are two ways probed instruction is changed in mapped pages.
First, when Uprobe is activated, it searches for all the relevant
pages and replace instruction in them. In this case, if that probe
is on the 64-byte unaligned prefixed instruction, error out
directly. Second, when Uprobe is already active and user maps a
relevant page via mmap(), instruction is replaced via mmap() code
path. But because Uprobe is invalid, entire mmap() operation can
not be stopped. In this case just print an error and continue.

Signed-off-by: Ravi Bangoria 
---
v2: 
https://lore.kernel.org/r/20210204104703.273429-1-ravi.bango...@linux.ibm.com
v2->v3:
  - Drop restriction for Uprobe on suffix of prefixed instruction.
It needs lot of code change including generic code but what
we get in return is not worth it.

 arch/powerpc/kernel/uprobes.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/arch/powerpc/kernel/uprobes.c b/arch/powerpc/kernel/uprobes.c
index e8a63713e655..c400971ebe70 100644
--- a/arch/powerpc/kernel/uprobes.c
+++ b/arch/powerpc/kernel/uprobes.c
@@ -41,6 +41,14 @@ int arch_uprobe_analyze_insn(struct arch_uprobe *auprobe,
if (addr & 0x03)
return -EINVAL;
 
+   if (!IS_ENABLED(CONFIG_PPC64) || !cpu_has_feature(CPU_FTR_ARCH_31))
+   return 0;
+
+   if (ppc_inst_prefixed(auprobe->insn) && (addr & 0x3F) == 0x3C) {
+   pr_info_ratelimited("Cannot register a uprobe on 64 byte 
unaligned prefixed instruction\n");
+   return -EINVAL;
+   }
+
return 0;
 }
 
-- 
2.27.0

Re: [PATCH] powerpc/sstep: Fix VSX instruction emulation

2021-02-26 Thread Ravi Bangoria





On 2/25/21 8:49 AM, Jordan Niethe wrote:

Commit af99da74333b ("powerpc/sstep: Support VSX vector paired storage
access instructions") added loading and storing 32 word long data into
adjacent VSRs. However the calculation used to determine if two VSRs
needed to be loaded/stored inadvertently prevented the load/storing
taking place for instructions with a data length less than 16 words.

This causes the emulation to not function correctly, which can be seen
by the alignment_handler selftest:

$ ./alignment_handler
[snip]
test: test_alignment_handler_vsx_207
tags: git_version:powerpc-5.12-1-0-g82d2c16b350f
VSX: 2.07B
 Doing lxsspx:   PASSED
 Doing lxsiwax:  FAILED: Wrong Data
 Doing lxsiwzx:  PASSED
 Doing stxsspx:  PASSED
 Doing stxsiwx:  PASSED
failure: test_alignment_handler_vsx_207
test: test_alignment_handler_vsx_300
tags: git_version:powerpc-5.12-1-0-g82d2c16b350f
VSX: 3.00B
 Doing lxsd: PASSED
 Doing lxsibzx:  PASSED
 Doing lxsihzx:  PASSED
 Doing lxssp:FAILED: Wrong Data
 Doing lxv:  PASSED
 Doing lxvb16x:  PASSED
 Doing lxvh8x:   PASSED
 Doing lxvx: PASSED
 Doing lxvwsx:   FAILED: Wrong Data
 Doing lxvl: PASSED
 Doing lxvll:PASSED
 Doing stxsd:PASSED
 Doing stxsibx:  PASSED
 Doing stxsihx:  PASSED
 Doing stxssp:   PASSED
 Doing stxv: PASSED
 Doing stxvb16x: PASSED
 Doing stxvh8x:  PASSED
 Doing stxvx:PASSED
 Doing stxvl:PASSED
 Doing stxvll:   PASSED
failure: test_alignment_handler_vsx_300
[snip]

Fix this by making sure all VSX instruction emulation correctly
load/store from the VSRs.

Fixes: af99da74333b ("powerpc/sstep: Support VSX vector paired storage access 
instructions")
Signed-off-by: Jordan Niethe 


Yikes!

Reviewed-by: Ravi Bangoria

Re: [PATCH v2] powerpc/uprobes: Validation for prefixed instruction

2021-02-08 Thread Ravi Bangoria





On 2/4/21 6:38 PM, Naveen N. Rao wrote:

On 2021/02/04 04:17PM, Ravi Bangoria wrote:

Don't allow Uprobe on 2nd word of a prefixed instruction. As per
ISA 3.1, prefixed instruction should not cross 64-byte boundary.
So don't allow Uprobe on such prefixed instruction as well.

There are two ways probed instruction is changed in mapped pages.
First, when Uprobe is activated, it searches for all the relevant
pages and replace instruction in them. In this case, if we notice
that probe is on the 2nd word of prefixed instruction, error out
directly. Second, when Uprobe is already active and user maps a
relevant page via mmap(), instruction is replaced via mmap() code
path. But because Uprobe is invalid, entire mmap() operation can
not be stopped. In this case just print an error and continue.

Signed-off-by: Ravi Bangoria 
---
v1: http://lore.kernel.org/r/20210119091234.76317-1-ravi.bango...@linux.ibm.com
v1->v2:
   - Instead of introducing new arch hook from verify_opcode(), use
 existing hook arch_uprobe_analyze_insn().
   - Add explicit check for prefixed instruction crossing 64-byte
 boundary. If probe is on such instruction, throw an error.

  arch/powerpc/kernel/uprobes.c | 66 ++-
  1 file changed, 65 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/uprobes.c b/arch/powerpc/kernel/uprobes.c
index e8a63713e655..485d19a2a31f 100644
--- a/arch/powerpc/kernel/uprobes.c
+++ b/arch/powerpc/kernel/uprobes.c
@@ -7,6 +7,7 @@
   * Adapted from the x86 port by Ananth N Mavinakayanahalli 
   */
  #include 
+#include 
  #include 
  #include 
  #include 
@@ -28,6 +29,69 @@ bool is_trap_insn(uprobe_opcode_t *insn)
return (is_trap(*insn));
  }
  
+#ifdef CONFIG_PPC64

+static int get_instr(struct mm_struct *mm, unsigned long addr, u32 *instr)
+{
+   struct page *page;
+   struct vm_area_struct *vma;
+   void *kaddr;
+   unsigned int gup_flags = FOLL_FORCE | FOLL_SPLIT_PMD;
+
+   if (get_user_pages_remote(mm, addr, 1, gup_flags, , , NULL) <= 
0)
+   return -EINVAL;
+
+   kaddr = kmap_atomic(page);
+   *instr = *((u32 *)(kaddr + (addr & ~PAGE_MASK)));
+   kunmap_atomic(kaddr);
+   put_page(page);
+   return 0;
+}
+
+static int validate_prefixed_instr(struct mm_struct *mm, unsigned long addr)
+{
+   struct ppc_inst inst;
+   u32 prefix, suffix;
+
+   /*
+* No need to check if addr is pointing to beginning of the
+* page. Even if probe is on a suffix of page-unaligned
+* prefixed instruction, hw will raise exception and kernel
+* will send SIGBUS.
+*/
+   if (!(addr & ~PAGE_MASK))
+   return 0;
+
+   if (get_instr(mm, addr, ) < 0)
+   return -EINVAL;
+   if (get_instr(mm, addr + 4, ) < 0)
+   return -EINVAL;
+
+   inst = ppc_inst_prefix(prefix, suffix);
+   if (ppc_inst_prefixed(inst) && (addr & 0x3F) == 0x3C) {
+   printk_ratelimited("Cannot register a uprobe on 64 byte "

^^ pr_info_ratelimited()

It should be sufficient to check the primary opcode to determine if it
is a prefixed instruction. You don't have to read the suffix. I see that
we don't have a helper to do this currently, so you could do:

if (ppc_inst_primary_opcode(ppc_inst(prefix)) == 1)


Ok.



In the future, it might be worthwhile to add IS_PREFIX() as a macro
similar to IS_MTMSRD() if there are more such uses.

Along with this, if you also add the below to the start of this
function, you can get rid of the #ifdef:

if (!IS_ENABLED(CONFIG_PPC64))
return 0;


Yeah this is better.

Thanks for the review, Naveen!
Ravi

Re: [PATCH v2] powerpc/uprobes: Validation for prefixed instruction

2021-02-08 Thread Ravi Bangoria





On 2/4/21 9:42 PM, Naveen N. Rao wrote:

On 2021/02/04 06:38PM, Naveen N. Rao wrote:

On 2021/02/04 04:17PM, Ravi Bangoria wrote:

Don't allow Uprobe on 2nd word of a prefixed instruction. As per
ISA 3.1, prefixed instruction should not cross 64-byte boundary.
So don't allow Uprobe on such prefixed instruction as well.

There are two ways probed instruction is changed in mapped pages.
First, when Uprobe is activated, it searches for all the relevant
pages and replace instruction in them. In this case, if we notice
that probe is on the 2nd word of prefixed instruction, error out
directly. Second, when Uprobe is already active and user maps a
relevant page via mmap(), instruction is replaced via mmap() code
path. But because Uprobe is invalid, entire mmap() operation can
not be stopped. In this case just print an error and continue.

Signed-off-by: Ravi Bangoria 
---
v1: http://lore.kernel.org/r/20210119091234.76317-1-ravi.bango...@linux.ibm.com
v1->v2:
   - Instead of introducing new arch hook from verify_opcode(), use
 existing hook arch_uprobe_analyze_insn().
   - Add explicit check for prefixed instruction crossing 64-byte
 boundary. If probe is on such instruction, throw an error.

  arch/powerpc/kernel/uprobes.c | 66 ++-
  1 file changed, 65 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/uprobes.c b/arch/powerpc/kernel/uprobes.c
index e8a63713e655..485d19a2a31f 100644
--- a/arch/powerpc/kernel/uprobes.c
+++ b/arch/powerpc/kernel/uprobes.c
@@ -7,6 +7,7 @@
   * Adapted from the x86 port by Ananth N Mavinakayanahalli 
   */
  #include 
+#include 
  #include 
  #include 
  #include 
@@ -28,6 +29,69 @@ bool is_trap_insn(uprobe_opcode_t *insn)
return (is_trap(*insn));
  }
  
+#ifdef CONFIG_PPC64

+static int get_instr(struct mm_struct *mm, unsigned long addr, u32 *instr)
+{
+   struct page *page;
+   struct vm_area_struct *vma;
+   void *kaddr;
+   unsigned int gup_flags = FOLL_FORCE | FOLL_SPLIT_PMD;
+
+   if (get_user_pages_remote(mm, addr, 1, gup_flags, , , NULL) <= 
0)
+   return -EINVAL;
+
+   kaddr = kmap_atomic(page);
+   *instr = *((u32 *)(kaddr + (addr & ~PAGE_MASK)));
+   kunmap_atomic(kaddr);
+   put_page(page);
+   return 0;
+}
+
+static int validate_prefixed_instr(struct mm_struct *mm, unsigned long addr)
+{
+   struct ppc_inst inst;
+   u32 prefix, suffix;
+
+   /*
+* No need to check if addr is pointing to beginning of the
+* page. Even if probe is on a suffix of page-unaligned
+* prefixed instruction, hw will raise exception and kernel
+* will send SIGBUS.
+*/
+   if (!(addr & ~PAGE_MASK))
+   return 0;
+
+   if (get_instr(mm, addr, ) < 0)
+   return -EINVAL;
+   if (get_instr(mm, addr + 4, ) < 0)
+   return -EINVAL;
+
+   inst = ppc_inst_prefix(prefix, suffix);
+   if (ppc_inst_prefixed(inst) && (addr & 0x3F) == 0x3C) {
+   printk_ratelimited("Cannot register a uprobe on 64 byte "

^^ pr_info_ratelimited()

It should be sufficient to check the primary opcode to determine if it
is a prefixed instruction. You don't have to read the suffix. I see that
we don't have a helper to do this currently, so you could do:

if (ppc_inst_primary_opcode(ppc_inst(prefix)) == 1)


Seeing the kprobes code, I realized that we have to check for another
scenario (Thanks, Jordan!). If this is the suffix of a prefix
instruction for which a uprobe has already been installed, then the
previous word will be a 'trap' instruction. You need to check if there
is a uprobe at the previous word, and if the original instruction there
was a prefix instruction.


Yes, this patch will fail to detect such scenario. I think I should
read the instruction directly from file, like what copy_insn() does.
With that, I'll get original instruction rather that 'trap'.

I'll think more along this line.

Ravi

Re: [PATCH v2] powerpc/uprobes: Validation for prefixed instruction

2021-02-08 Thread Ravi Bangoria





On 2/4/21 6:45 PM, Naveen N. Rao wrote:

On 2021/02/04 04:19PM, Ravi Bangoria wrote:



On 2/4/21 4:17 PM, Ravi Bangoria wrote:

Don't allow Uprobe on 2nd word of a prefixed instruction. As per
ISA 3.1, prefixed instruction should not cross 64-byte boundary.
So don't allow Uprobe on such prefixed instruction as well.

There are two ways probed instruction is changed in mapped pages.
First, when Uprobe is activated, it searches for all the relevant
pages and replace instruction in them. In this case, if we notice
that probe is on the 2nd word of prefixed instruction, error out
directly. Second, when Uprobe is already active and user maps a
relevant page via mmap(), instruction is replaced via mmap() code
path. But because Uprobe is invalid, entire mmap() operation can
not be stopped. In this case just print an error and continue.


@mpe,

arch_uprobe_analyze_insn() can return early if
cpu_has_feature(CPU_FTR_ARCH_31) is not set. But that will
miss out a rare scenario of user running binary with prefixed
instruction on p10 predecessors. Please let me know if I
should add cpu_has_feature(CPU_FTR_ARCH_31) or not.


The check you are adding is very specific to prefixed instructions, so
it makes sense to add a cpu feature check for v3.1.

On older processors, those are invalid instructions like any other. The
instruction emulation infrastructure will refuse to emulate it and the
instruction will be single stepped.


Sure will add it.

Ravi

Re: [PATCH v2] powerpc/uprobes: Validation for prefixed instruction

2021-02-08 Thread Ravi Bangoria





On 2/6/21 11:36 PM, Oleg Nesterov wrote:

On 02/04, Ravi Bangoria wrote:


+static int get_instr(struct mm_struct *mm, unsigned long addr, u32 *instr)
+{
+   struct page *page;
+   struct vm_area_struct *vma;
+   void *kaddr;
+   unsigned int gup_flags = FOLL_FORCE | FOLL_SPLIT_PMD;
+
+   if (get_user_pages_remote(mm, addr, 1, gup_flags, , , NULL) <= 
0)
+   return -EINVAL;


"vma" is not used,


Ok.


and I don't think you need FOLL_SPLIT_PMD.

Isn't it needed if the target page is hugepage?


Otherwise I can't really comment this ppc-specific change.

To be honest, I don't even understand why do we need this fix. Sure, the
breakpoint in the middle of 64-bit insn won't work, why do we care? The
user should know what does he do.


That's a valid point. This patch is to protract user from doing
invalid thing.

Though, there is one minor scenario where this patch will help. If
the original prefixed instruction is 64 byte unaligned, and say
user probes it, Uprobe infrastructure will emulate such instruction
transparently without notifying user that the instruction is
improperly aligned.


Not to mention we can't really trust get_user_pages() in that this page
can be modified by mm owner or debugger...


As Naveen pointed out, there might be existing uprobe on the prefix
and this patch will fail to detect such scenario. So I'm thinking to
read the instruction directly from file backed page (like copy_insn),
in which case I won't use get_user_pages().

Thanks Oleg for the review!

Ravi

Re: [PATCH v2] powerpc/uprobes: Validation for prefixed instruction

2021-02-04 Thread Ravi Bangoria





On 2/4/21 4:17 PM, Ravi Bangoria wrote:

Don't allow Uprobe on 2nd word of a prefixed instruction. As per
ISA 3.1, prefixed instruction should not cross 64-byte boundary.
So don't allow Uprobe on such prefixed instruction as well.

There are two ways probed instruction is changed in mapped pages.
First, when Uprobe is activated, it searches for all the relevant
pages and replace instruction in them. In this case, if we notice
that probe is on the 2nd word of prefixed instruction, error out
directly. Second, when Uprobe is already active and user maps a
relevant page via mmap(), instruction is replaced via mmap() code
path. But because Uprobe is invalid, entire mmap() operation can
not be stopped. In this case just print an error and continue.


@mpe,

arch_uprobe_analyze_insn() can return early if
cpu_has_feature(CPU_FTR_ARCH_31) is not set. But that will
miss out a rare scenario of user running binary with prefixed
instruction on p10 predecessors. Please let me know if I
should add cpu_has_feature(CPU_FTR_ARCH_31) or not.

- Ravi

[PATCH v2] powerpc/uprobes: Validation for prefixed instruction

2021-02-04 Thread Ravi Bangoria

Don't allow Uprobe on 2nd word of a prefixed instruction. As per
ISA 3.1, prefixed instruction should not cross 64-byte boundary.
So don't allow Uprobe on such prefixed instruction as well.

There are two ways probed instruction is changed in mapped pages.
First, when Uprobe is activated, it searches for all the relevant
pages and replace instruction in them. In this case, if we notice
that probe is on the 2nd word of prefixed instruction, error out
directly. Second, when Uprobe is already active and user maps a
relevant page via mmap(), instruction is replaced via mmap() code
path. But because Uprobe is invalid, entire mmap() operation can
not be stopped. In this case just print an error and continue.

Signed-off-by: Ravi Bangoria 
---
v1: http://lore.kernel.org/r/20210119091234.76317-1-ravi.bango...@linux.ibm.com
v1->v2:
  - Instead of introducing new arch hook from verify_opcode(), use
existing hook arch_uprobe_analyze_insn().
  - Add explicit check for prefixed instruction crossing 64-byte
boundary. If probe is on such instruction, throw an error.

 arch/powerpc/kernel/uprobes.c | 66 ++-
 1 file changed, 65 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/uprobes.c b/arch/powerpc/kernel/uprobes.c
index e8a63713e655..485d19a2a31f 100644
--- a/arch/powerpc/kernel/uprobes.c
+++ b/arch/powerpc/kernel/uprobes.c
@@ -7,6 +7,7 @@
  * Adapted from the x86 port by Ananth N Mavinakayanahalli 
  */
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -28,6 +29,69 @@ bool is_trap_insn(uprobe_opcode_t *insn)
return (is_trap(*insn));
 }
 
+#ifdef CONFIG_PPC64
+static int get_instr(struct mm_struct *mm, unsigned long addr, u32 *instr)
+{
+   struct page *page;
+   struct vm_area_struct *vma;
+   void *kaddr;
+   unsigned int gup_flags = FOLL_FORCE | FOLL_SPLIT_PMD;
+
+   if (get_user_pages_remote(mm, addr, 1, gup_flags, , , NULL) <= 
0)
+   return -EINVAL;
+
+   kaddr = kmap_atomic(page);
+   *instr = *((u32 *)(kaddr + (addr & ~PAGE_MASK)));
+   kunmap_atomic(kaddr);
+   put_page(page);
+   return 0;
+}
+
+static int validate_prefixed_instr(struct mm_struct *mm, unsigned long addr)
+{
+   struct ppc_inst inst;
+   u32 prefix, suffix;
+
+   /*
+* No need to check if addr is pointing to beginning of the
+* page. Even if probe is on a suffix of page-unaligned
+* prefixed instruction, hw will raise exception and kernel
+* will send SIGBUS.
+*/
+   if (!(addr & ~PAGE_MASK))
+   return 0;
+
+   if (get_instr(mm, addr, ) < 0)
+   return -EINVAL;
+   if (get_instr(mm, addr + 4, ) < 0)
+   return -EINVAL;
+
+   inst = ppc_inst_prefix(prefix, suffix);
+   if (ppc_inst_prefixed(inst) && (addr & 0x3F) == 0x3C) {
+   printk_ratelimited("Cannot register a uprobe on 64 byte "
+  "unaligned prefixed instruction\n");
+   return -EINVAL;
+   }
+
+   suffix = prefix;
+   if (get_instr(mm, addr - 4, ) < 0)
+   return -EINVAL;
+
+   inst = ppc_inst_prefix(prefix, suffix);
+   if (ppc_inst_prefixed(inst)) {
+   printk_ratelimited("Cannot register a uprobe on the second "
+  "word of prefixed instruction\n");
+   return -EINVAL;
+   }
+   return 0;
+}
+#else
+static int validate_prefixed_instr(struct mm_struct *mm, unsigned long addr)
+{
+   return 0;
+}
+#endif
+
 /**
  * arch_uprobe_analyze_insn
  * @mm: the probed address space.
@@ -41,7 +105,7 @@ int arch_uprobe_analyze_insn(struct arch_uprobe *auprobe,
if (addr & 0x03)
return -EINVAL;
 
-   return 0;
+   return validate_prefixed_instr(mm, addr);
 }
 
 /*
-- 
2.26.2

Re: [PATCH] powerpc/sstep: Fix array out of bound warning

2021-01-28 Thread Ravi Bangoria





On 1/28/21 10:50 PM, Naveen N. Rao wrote:

On 2021/01/15 11:46AM, Ravi Bangoria wrote:

Compiling kernel with -Warray-bounds throws below warning:

   In function 'emulate_vsx_store':
   warning: array subscript is above array bounds [-Warray-bounds]
   buf.d[2] = byterev_8(reg->d[1]);
   ~^~~
   buf.d[3] = byterev_8(reg->d[0]);
   ~^~~

Fix it by converting local variable 'union vsx_reg buf' into an array.
Also consider function argument 'union vsx_reg *reg' as array instead
of pointer because callers are actually passing an array to it.


I think you should change the function prototype to reflect this.

However, while I agree with this change in principle, it looks to be a
lot of code churn for a fairly narrow use. Perhaps we should just
address the specific bug. Something like the below (not tested)?


Yes, this would serve the same purpose and it's more compact as well. Sent v2.



@@ -818,13 +818,15 @@ void emulate_vsx_store(struct instruction_op *op, const 
union vsx_reg *reg,
 break;
 if (rev) {
 /* reverse 32 bytes */
-   buf.d[0] = byterev_8(reg->d[3]);
-   buf.d[1] = byterev_8(reg->d[2]);
-   buf.d[2] = byterev_8(reg->d[1]);
-   buf.d[3] = byterev_8(reg->d[0]);
-   reg = 
+   union vsx_reg buf32[2];
+   buf32[0].d[0] = byterev_8(reg[1].d[1]);
+   buf32[0].d[1] = byterev_8(reg[1].d[0]);
+   buf32[1].d[0] = byterev_8(reg[0].d[1]);
+   buf32[1].d[1] = byterev_8(reg[0].d[0]);
+   memcpy(mem, buf32, size);
+   } else {
+   memcpy(mem, reg, size);
 }
-   memcpy(mem, reg, size);
 break;
 case 16:
 /* stxv, stxvx, stxvl, stxvll */


- Naveen

[PATCH v2] powerpc/sstep: Fix array out of bound warning

2021-01-28 Thread Ravi Bangoria

Compiling kernel with -Warray-bounds throws below warning:

  In function 'emulate_vsx_store':
  warning: array subscript is above array bounds [-Warray-bounds]
  buf.d[2] = byterev_8(reg->d[1]);
  ~^~~
  buf.d[3] = byterev_8(reg->d[0]);
  ~^~~

Fix it by using temporary array variable 'union vsx_reg buf32[]' in
that code block. Also, with element_size = 32, 'union vsx_reg *reg'
is an array of size 2. So, use 'reg' as an array instead of pointer
in the same code block.

Fixes: af99da74333b ("powerpc/sstep: Support VSX vector paired storage access 
instructions")
Suggested-by: Naveen N. Rao 
Signed-off-by: Ravi Bangoria 
---
v1: http://lore.kernel.org/r/20210115061620.692500-1-ravi.bango...@linux.ibm.com
v1->v2:
  - Change code only in the affected block

 arch/powerpc/lib/sstep.c | 14 --
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/lib/sstep.c b/arch/powerpc/lib/sstep.c
index bf7a7d62ae8b..ede093e96234 100644
--- a/arch/powerpc/lib/sstep.c
+++ b/arch/powerpc/lib/sstep.c
@@ -818,13 +818,15 @@ void emulate_vsx_store(struct instruction_op *op, const 
union vsx_reg *reg,
break;
if (rev) {
/* reverse 32 bytes */
-   buf.d[0] = byterev_8(reg->d[3]);
-   buf.d[1] = byterev_8(reg->d[2]);
-   buf.d[2] = byterev_8(reg->d[1]);
-   buf.d[3] = byterev_8(reg->d[0]);
-   reg = 
+   union vsx_reg buf32[2];
+   buf32[0].d[0] = byterev_8(reg[1].d[1]);
+   buf32[0].d[1] = byterev_8(reg[1].d[0]);
+   buf32[1].d[0] = byterev_8(reg[0].d[1]);
+   buf32[1].d[1] = byterev_8(reg[0].d[0]);
+   memcpy(mem, buf32, size);
+   } else {
+   memcpy(mem, reg, size);
}
-   memcpy(mem, reg, size);
break;
case 16:
/* stxv, stxvx, stxvl, stxvll */
-- 
2.26.2

Re: [PATCH] powerpc/uprobes: Don't allow probe on suffix of prefixed instruction

2021-01-20 Thread Ravi Bangoria





On 1/19/21 10:56 PM, Oleg Nesterov wrote:

On 01/19, Ravi Bangoria wrote:


Probe on 2nd word of a prefixed instruction is invalid scenario and
should be restricted.


I don't understand this ppc-specific problem, but...


So far (upto Power9), instruction size was fixed - 4 bytes. But Power10
introduced a prefixed instruction which consist of 8 bytes, where first
4 bytes is prefix and remaining is suffix.

This patch checks whether the Uprobe is on the 2nd word (suffix) of a
prefixed instruction. If so, consider it as invalid Uprobe.




+#ifdef CONFIG_PPC64
+int arch_uprobe_verify_opcode(struct page *page, unsigned long vaddr,
+ uprobe_opcode_t opcode)
+{
+   uprobe_opcode_t prefix;
+   void *kaddr;
+   struct ppc_inst inst;
+
+   /* Don't check if vaddr is pointing to the beginning of page */
+   if (!(vaddr & ~PAGE_MASK))
+   return 0;


So the fix is incomplete? Or insn at the start of page can't be prefixed?


Prefixed instruction can not cross 64 byte boundary. If it does, kernel
generates SIGBUS. Considering all powerpc supported page sizes to be
multiple of 64 bytes, there will never be a scenario where prefix and
suffix will be on different pages. i.e. a beginning of the page should
never be a suffix.




+int __weak arch_uprobe_verify_opcode(struct page *page, unsigned long vaddr,
+uprobe_opcode_t opcode)
+{
+   return 0;
+}
+
  static int verify_opcode(struct page *page, unsigned long vaddr, 
uprobe_opcode_t *new_opcode)
  {
uprobe_opcode_t old_opcode;
@@ -275,6 +281,8 @@ static int verify_opcode(struct page *page, unsigned long 
vaddr, uprobe_opcode_t
if (is_swbp_insn(new_opcode)) {
if (is_swbp)/* register: already installed? */
return 0;
+   if (arch_uprobe_verify_opcode(page, vaddr, old_opcode))
+   return -EINVAL;


Well, this doesn't look good...

To me it would be better to change the prepare_uprobe() path to copy
the potential prefix into uprobe->arch and check ppc_inst_prefixed()
in arch_uprobe_analyze_insn(). What do you think?


Agreed. The only reason I was checking via verify_opcode() is to make the
code more simpler. If I need to check via prepare_uprobe(), I'll need to
abuse uprobe->offset by setting it to uprobe->offset - 4 to read previous
4 bytes of current instruction. Which, IMHO, is not that straightforward
with current implementation of prepare_uprobe().

But while replying here, I'm thinking... I should be able to grab a page
using mm and vaddr, which are already available in arch_uprobe_analyze_insn().
With that, I should be able to do all this inside arch_uprobe_analyze_insn()
only. I'll try this and send v2 if that works.

Thanks for the review.
Ravi

[PATCH] powerpc/uprobes: Don't allow probe on suffix of prefixed instruction

2021-01-19 Thread Ravi Bangoria

Probe on 2nd word of a prefixed instruction is invalid scenario and
should be restricted.

There are two ways probed instruction is changed in mapped pages.
First, when Uprobe is activated, it searches for all the relevant
pages and replace instruction in them. In this case, if we notice
that probe is on the 2nd word of prefixed instruction, error out
directly. Second, when Uprobe is already active and user maps a
relevant page via mmap(), instruction is replaced via mmap() code
path. But because Uprobe is invalid, entire mmap() operation can
not be stopped. In this case just print an error and continue.

Signed-off-by: Ravi Bangoria 
---
 arch/powerpc/kernel/uprobes.c | 28 
 include/linux/uprobes.h   |  1 +
 kernel/events/uprobes.c   |  8 
 3 files changed, 37 insertions(+)

diff --git a/arch/powerpc/kernel/uprobes.c b/arch/powerpc/kernel/uprobes.c
index e8a63713e655..c73d5a397164 100644
--- a/arch/powerpc/kernel/uprobes.c
+++ b/arch/powerpc/kernel/uprobes.c
@@ -7,6 +7,7 @@
  * Adapted from the x86 port by Ananth N Mavinakayanahalli 
  */
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -44,6 +45,33 @@ int arch_uprobe_analyze_insn(struct arch_uprobe *auprobe,
return 0;
 }
 
+#ifdef CONFIG_PPC64
+int arch_uprobe_verify_opcode(struct page *page, unsigned long vaddr,
+ uprobe_opcode_t opcode)
+{
+   uprobe_opcode_t prefix;
+   void *kaddr;
+   struct ppc_inst inst;
+
+   /* Don't check if vaddr is pointing to the beginning of page */
+   if (!(vaddr & ~PAGE_MASK))
+   return 0;
+
+   kaddr = kmap_atomic(page);
+   memcpy(, kaddr + ((vaddr - 4) & ~PAGE_MASK), 
UPROBE_SWBP_INSN_SIZE);
+   kunmap_atomic(kaddr);
+
+   inst = ppc_inst_prefix(prefix, opcode);
+
+   if (ppc_inst_prefixed(inst)) {
+   printk_ratelimited("Cannot register a uprobe on the second "
+  "word of prefixed instruction\n");
+   return -1;
+   }
+   return 0;
+}
+#endif
+
 /*
  * arch_uprobe_pre_xol - prepare to execute out of line.
  * @auprobe: the probepoint information.
diff --git a/include/linux/uprobes.h b/include/linux/uprobes.h
index f46e0ca0169c..5a3b45878e13 100644
--- a/include/linux/uprobes.h
+++ b/include/linux/uprobes.h
@@ -128,6 +128,7 @@ extern bool uprobe_deny_signal(void);
 extern bool arch_uprobe_skip_sstep(struct arch_uprobe *aup, struct pt_regs 
*regs);
 extern void uprobe_clear_state(struct mm_struct *mm);
 extern int  arch_uprobe_analyze_insn(struct arch_uprobe *aup, struct mm_struct 
*mm, unsigned long addr);
+int arch_uprobe_verify_opcode(struct page *page, unsigned long vaddr, 
uprobe_opcode_t opcode);
 extern int  arch_uprobe_pre_xol(struct arch_uprobe *aup, struct pt_regs *regs);
 extern int  arch_uprobe_post_xol(struct arch_uprobe *aup, struct pt_regs 
*regs);
 extern bool arch_uprobe_xol_was_trapped(struct task_struct *tsk);
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index bf9edd8d75be..be02e6c26e3f 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -255,6 +255,12 @@ static void copy_to_page(struct page *page, unsigned long 
vaddr, const void *src
kunmap_atomic(kaddr);
 }
 
+int __weak arch_uprobe_verify_opcode(struct page *page, unsigned long vaddr,
+uprobe_opcode_t opcode)
+{
+   return 0;
+}
+
 static int verify_opcode(struct page *page, unsigned long vaddr, 
uprobe_opcode_t *new_opcode)
 {
uprobe_opcode_t old_opcode;
@@ -275,6 +281,8 @@ static int verify_opcode(struct page *page, unsigned long 
vaddr, uprobe_opcode_t
if (is_swbp_insn(new_opcode)) {
if (is_swbp)/* register: already installed? */
return 0;
+   if (arch_uprobe_verify_opcode(page, vaddr, old_opcode))
+   return -EINVAL;
} else {
if (!is_swbp)   /* unregister: was it changed by us? */
return 0;
-- 
2.26.2

[PATCH] powerpc/sstep: Fix array out of bound warning

2021-01-14 Thread Ravi Bangoria

Compiling kernel with -Warray-bounds throws below warning:

  In function 'emulate_vsx_store':
  warning: array subscript is above array bounds [-Warray-bounds]
  buf.d[2] = byterev_8(reg->d[1]);
  ~^~~
  buf.d[3] = byterev_8(reg->d[0]);
  ~^~~

Fix it by converting local variable 'union vsx_reg buf' into an array.
Also consider function argument 'union vsx_reg *reg' as array instead
of pointer because callers are actually passing an array to it.

Fixes: af99da74333b ("powerpc/sstep: Support VSX vector paired storage access 
instructions")
Signed-off-by: Ravi Bangoria 
---
 arch/powerpc/lib/sstep.c | 61 
 1 file changed, 31 insertions(+), 30 deletions(-)

diff --git a/arch/powerpc/lib/sstep.c b/arch/powerpc/lib/sstep.c
index bf7a7d62ae8b..5b4281ade5b6 100644
--- a/arch/powerpc/lib/sstep.c
+++ b/arch/powerpc/lib/sstep.c
@@ -723,7 +723,8 @@ void emulate_vsx_load(struct instruction_op *op, union 
vsx_reg *reg,
const unsigned char *bp;
 
size = GETSIZE(op->type);
-   reg->d[0] = reg->d[1] = 0;
+   reg[0].d[0] = reg[0].d[1] = 0;
+   reg[1].d[0] = reg[1].d[1] = 0;
 
switch (op->element_size) {
case 32:
@@ -742,25 +743,25 @@ void emulate_vsx_load(struct instruction_op *op, union 
vsx_reg *reg,
/* scalar loads, lxvd2x, lxvdsx */
read_size = (size >= 8) ? 8 : size;
i = IS_LE ? 8 : 8 - read_size;
-   memcpy(>b[i], mem, read_size);
+   memcpy([0].b[i], mem, read_size);
if (rev)
-   do_byte_reverse(>b[i], 8);
+   do_byte_reverse([0].b[i], 8);
if (size < 8) {
if (op->type & SIGNEXT) {
/* size == 4 is the only case here */
-   reg->d[IS_LE] = (signed int) reg->d[IS_LE];
+   reg[0].d[IS_LE] = (signed int)reg[0].d[IS_LE];
} else if (op->vsx_flags & VSX_FPCONV) {
preempt_disable();
-   conv_sp_to_dp(>fp[1 + IS_LE],
- >dp[IS_LE]);
+   conv_sp_to_dp([0].fp[1 + IS_LE],
+ [0].dp[IS_LE]);
preempt_enable();
}
} else {
if (size == 16) {
unsigned long v = *(unsigned long *)(mem + 8);
-   reg->d[IS_BE] = !rev ? v : byterev_8(v);
+   reg[0].d[IS_BE] = !rev ? v : byterev_8(v);
} else if (op->vsx_flags & VSX_SPLAT)
-   reg->d[IS_BE] = reg->d[IS_LE];
+   reg[0].d[IS_BE] = reg[0].d[IS_LE];
}
break;
case 4:
@@ -768,13 +769,13 @@ void emulate_vsx_load(struct instruction_op *op, union 
vsx_reg *reg,
wp = mem;
for (j = 0; j < size / 4; ++j) {
i = IS_LE ? 3 - j : j;
-   reg->w[i] = !rev ? *wp++ : byterev_4(*wp++);
+   reg[0].w[i] = !rev ? *wp++ : byterev_4(*wp++);
}
if (op->vsx_flags & VSX_SPLAT) {
-   u32 val = reg->w[IS_LE ? 3 : 0];
+   u32 val = reg[0].w[IS_LE ? 3 : 0];
for (; j < 4; ++j) {
i = IS_LE ? 3 - j : j;
-   reg->w[i] = val;
+   reg[0].w[i] = val;
}
}
break;
@@ -783,7 +784,7 @@ void emulate_vsx_load(struct instruction_op *op, union 
vsx_reg *reg,
hp = mem;
for (j = 0; j < size / 2; ++j) {
i = IS_LE ? 7 - j : j;
-   reg->h[i] = !rev ? *hp++ : byterev_2(*hp++);
+   reg[0].h[i] = !rev ? *hp++ : byterev_2(*hp++);
}
break;
case 1:
@@ -791,7 +792,7 @@ void emulate_vsx_load(struct instruction_op *op, union 
vsx_reg *reg,
bp = mem;
for (j = 0; j < size; ++j) {
i = IS_LE ? 15 - j : j;
-   reg->b[i] = *bp++;
+   reg[0].b[i] = *bp++;
}
break;
}
@@ -804,7 +805,7 @@ void emulate_vsx_store(struct instruction_op *op, const 
union vsx_reg *reg,
 {
int size, write_size;
int i, j;
-   union vsx_reg buf;
+   union vsx_reg buf[2];
unsigned int *wp;
unsigned short *hp;
unsigned char *bp;
@@ -818,11 +819,11 @@ void emulate_vs

[PATCH v3 1/4] KVM: PPC: Allow nested guest creation when L0 hv_guest_state > L1

2020-12-16 Thread Ravi Bangoria

On powerpc, L1 hypervisor takes help of L0 using H_ENTER_NESTED
hcall to load L2 guest state in cpu. L1 hypervisor prepares the
L2 state in struct hv_guest_state and passes a pointer to it via
hcall. Using that pointer, L0 reads/writes that state directly
from/to L1 memory. Thus L0 must be aware of hv_guest_state layout
of L1. Currently it uses version field to achieve this. i.e. If
L0 hv_guest_state.version != L1 hv_guest_state.version, L0 won't
allow nested kvm guest.

This restriction can be loosen up a bit. L0 can be taught to
understand older layout of hv_guest_state, if we restrict the
new member to be added only at the end. i.e. we can allow
nested guest even when L0 hv_guest_state.version > L1
hv_guest_state.version. Though, the other way around is not
possible.

Signed-off-by: Ravi Bangoria 
Reviewed-by: Fabiano Rosas 
---
 arch/powerpc/include/asm/hvcall.h   | 17 +++--
 arch/powerpc/kvm/book3s_hv_nested.c | 55 +++--
 2 files changed, 60 insertions(+), 12 deletions(-)

diff --git a/arch/powerpc/include/asm/hvcall.h 
b/arch/powerpc/include/asm/hvcall.h
index c1fbccb04390..ca6840239f90 100644
--- a/arch/powerpc/include/asm/hvcall.h
+++ b/arch/powerpc/include/asm/hvcall.h
@@ -526,9 +526,12 @@ struct h_cpu_char_result {
u64 behaviour;
 };
 
-/* Register state for entering a nested guest with H_ENTER_NESTED */
+/*
+ * Register state for entering a nested guest with H_ENTER_NESTED.
+ * New member must be added at the end.
+ */
 struct hv_guest_state {
-   u64 version;/* version of this structure layout */
+   u64 version;/* version of this structure layout, must be 
first */
u32 lpid;
u32 vcpu_token;
/* These registers are hypervisor privileged (at least for writing) */
@@ -562,6 +565,16 @@ struct hv_guest_state {
 /* Latest version of hv_guest_state structure */
 #define HV_GUEST_STATE_VERSION 1
 
+static inline int hv_guest_state_size(unsigned int version)
+{
+   switch (version) {
+   case 1:
+   return offsetofend(struct hv_guest_state, ppr);
+   default:
+   return -1;
+   }
+}
+
 /*
  * From the document "H_GetPerformanceCounterInfo Interface" v1.07
  *
diff --git a/arch/powerpc/kvm/book3s_hv_nested.c 
b/arch/powerpc/kvm/book3s_hv_nested.c
index 33b58549a9aa..937dd5114300 100644
--- a/arch/powerpc/kvm/book3s_hv_nested.c
+++ b/arch/powerpc/kvm/book3s_hv_nested.c
@@ -215,12 +215,51 @@ static void kvmhv_nested_mmio_needed(struct kvm_vcpu 
*vcpu, u64 regs_ptr)
}
 }
 
+static int kvmhv_read_guest_state_and_regs(struct kvm_vcpu *vcpu,
+  struct hv_guest_state *l2_hv,
+  struct pt_regs *l2_regs,
+  u64 hv_ptr, u64 regs_ptr)
+{
+   int size;
+
+   if (kvm_vcpu_read_guest(vcpu, hv_ptr, _hv->version,
+   sizeof(l2_hv->version)))
+   return -1;
+
+   if (kvmppc_need_byteswap(vcpu))
+   l2_hv->version = swab64(l2_hv->version);
+
+   size = hv_guest_state_size(l2_hv->version);
+   if (size < 0)
+   return -1;
+
+   return kvm_vcpu_read_guest(vcpu, hv_ptr, l2_hv, size) ||
+   kvm_vcpu_read_guest(vcpu, regs_ptr, l2_regs,
+   sizeof(struct pt_regs));
+}
+
+static int kvmhv_write_guest_state_and_regs(struct kvm_vcpu *vcpu,
+   struct hv_guest_state *l2_hv,
+   struct pt_regs *l2_regs,
+   u64 hv_ptr, u64 regs_ptr)
+{
+   int size;
+
+   size = hv_guest_state_size(l2_hv->version);
+   if (size < 0)
+   return -1;
+
+   return kvm_vcpu_write_guest(vcpu, hv_ptr, l2_hv, size) ||
+   kvm_vcpu_write_guest(vcpu, regs_ptr, l2_regs,
+sizeof(struct pt_regs));
+}
+
 long kvmhv_enter_nested_guest(struct kvm_vcpu *vcpu)
 {
long int err, r;
struct kvm_nested_guest *l2;
struct pt_regs l2_regs, saved_l1_regs;
-   struct hv_guest_state l2_hv, saved_l1_hv;
+   struct hv_guest_state l2_hv = {0}, saved_l1_hv;
struct kvmppc_vcore *vc = vcpu->arch.vcore;
u64 hv_ptr, regs_ptr;
u64 hdec_exp;
@@ -235,17 +274,15 @@ long kvmhv_enter_nested_guest(struct kvm_vcpu *vcpu)
hv_ptr = kvmppc_get_gpr(vcpu, 4);
regs_ptr = kvmppc_get_gpr(vcpu, 5);
vcpu->srcu_idx = srcu_read_lock(>kvm->srcu);
-   err = kvm_vcpu_read_guest(vcpu, hv_ptr, _hv,
- sizeof(struct hv_guest_state)) ||
-   kvm_vcpu_read_guest(vcpu, regs_ptr, _regs,
-   sizeof(struct pt_regs));
+   err = kvmhv_read_guest_state_and_regs(vcpu, _hv, _regs,
+ hv_ptr, reg

[PATCH v3 4/4] KVM: PPC: Introduce new capability for 2nd DAWR

2020-12-16 Thread Ravi Bangoria

Introduce KVM_CAP_PPC_DAWR1 which can be used by Qemu to query whether
kvm supports 2nd DAWR or not. The capability is by default disabled
even when the underlying CPU supports 2nd DAWR. Qemu needs to check
and enable it manually to use the feature.

Signed-off-by: Ravi Bangoria 
---
 Documentation/virt/kvm/api.rst | 10 ++
 arch/powerpc/include/asm/kvm_ppc.h |  1 +
 arch/powerpc/kvm/book3s_hv.c   | 12 
 arch/powerpc/kvm/powerpc.c | 10 ++
 include/uapi/linux/kvm.h   |  1 +
 tools/include/uapi/linux/kvm.h |  1 +
 6 files changed, 35 insertions(+)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index abb24575bdf9..049f07ebf197 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -6016,6 +6016,16 @@ KVM_EXIT_X86_RDMSR and KVM_EXIT_X86_WRMSR exit 
notifications which user space
 can then handle to implement model specific MSR handling and/or user 
notifications
 to inform a user that an MSR was not handled.
 
+7.22 KVM_CAP_PPC_DAWR1
+--
+
+:Architectures: ppc
+:Parameters: none
+:Returns: 0 on success, -EINVAL when CPU doesn't support 2nd DAWR
+
+This capability can be used to check / enable 2nd DAWR feature provided
+by POWER10 processor.
+
 8. Other capabilities.
 ==
 
diff --git a/arch/powerpc/include/asm/kvm_ppc.h 
b/arch/powerpc/include/asm/kvm_ppc.h
index 0a056c64c317..13c39d24dda5 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -314,6 +314,7 @@ struct kvmppc_ops {
  int size);
int (*enable_svm)(struct kvm *kvm);
int (*svm_off)(struct kvm *kvm);
+   int (*enable_dawr1)(struct kvm *kvm);
 };
 
 extern struct kvmppc_ops *kvmppc_hv_ops;
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index b7a30c0692a7..04c02344bd3f 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -5625,6 +5625,17 @@ static int kvmhv_svm_off(struct kvm *kvm)
return ret;
 }
 
+static int kvmhv_enable_dawr1(struct kvm *kvm)
+{
+   if (!cpu_has_feature(CPU_FTR_DAWR1))
+   return -ENODEV;
+
+   /* kvm == NULL means the caller is testing if the capability exists */
+   if (kvm)
+   kvm->arch.dawr1_enabled = true;
+   return 0;
+}
+
 static struct kvmppc_ops kvm_ops_hv = {
.get_sregs = kvm_arch_vcpu_ioctl_get_sregs_hv,
.set_sregs = kvm_arch_vcpu_ioctl_set_sregs_hv,
@@ -5668,6 +5679,7 @@ static struct kvmppc_ops kvm_ops_hv = {
.store_to_eaddr = kvmhv_store_to_eaddr,
.enable_svm = kvmhv_enable_svm,
.svm_off = kvmhv_svm_off,
+   .enable_dawr1 = kvmhv_enable_dawr1,
 };
 
 static int kvm_init_subcore_bitmap(void)
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 13999123b735..380656528b5b 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -678,6 +678,10 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
r = hv_enabled && kvmppc_hv_ops->enable_svm &&
!kvmppc_hv_ops->enable_svm(NULL);
break;
+   case KVM_CAP_PPC_DAWR1:
+   r = !!(hv_enabled && kvmppc_hv_ops->enable_dawr1 &&
+  !kvmppc_hv_ops->enable_dawr1(NULL));
+   break;
 #endif
default:
r = 0;
@@ -2187,6 +2191,12 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
break;
r = kvm->arch.kvm_ops->enable_svm(kvm);
break;
+   case KVM_CAP_PPC_DAWR1:
+   r = -EINVAL;
+   if (!is_kvmppc_hv_enabled(kvm) || 
!kvm->arch.kvm_ops->enable_dawr1)
+   break;
+   r = kvm->arch.kvm_ops->enable_dawr1(kvm);
+   break;
 #endif
default:
r = -EINVAL;
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index ca41220b40b8..f1210f99a52d 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1053,6 +1053,7 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_X86_USER_SPACE_MSR 188
 #define KVM_CAP_X86_MSR_FILTER 189
 #define KVM_CAP_ENFORCE_PV_FEATURE_CPUID 190
+#define KVM_CAP_PPC_DAWR1 191
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
diff --git a/tools/include/uapi/linux/kvm.h b/tools/include/uapi/linux/kvm.h
index ca41220b40b8..f1210f99a52d 100644
--- a/tools/include/uapi/linux/kvm.h
+++ b/tools/include/uapi/linux/kvm.h
@@ -1053,6 +1053,7 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_X86_USER_SPACE_MSR 188
 #define KVM_CAP_X86_MSR_FILTER 189
 #define KVM_CAP_ENFORCE_PV_FEATURE_CPUID 190
+#define KVM_CAP_PPC_DAWR1 191
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
-- 
2.26.2

[PATCH v3 0/4] KVM: PPC: Power10 2nd DAWR enablement

2020-12-16 Thread Ravi Bangoria

Enable p10 2nd DAWR feature for Book3S kvm guest. DAWR is a hypervisor
resource and thus H_SET_MODE hcall is used to set/unset it. A new case
H_SET_MODE_RESOURCE_SET_DAWR1 is introduced in H_SET_MODE hcall for
setting/unsetting 2nd DAWR. Also, new capability KVM_CAP_PPC_DAWR1 has
been added to query 2nd DAWR support via kvm ioctl.

This feature also needs to be enabled in Qemu to really use it. I'll
post Qemu patches once kvm patches get accepted.

v2: 
https://lore.kernel.org/kvm/20201124105953.39325-1-ravi.bango...@linux.ibm.com

v2->v3:
 - Patch #1. If L0 version > L1, L0 hv_guest_state will contain some
   additional fields which won't be filled while reading from L1
   memory and thus they can contain garbage. Initialize l2_hv with 0s
   to avoid such situations.
 - Patch #3. Introduce per vm flag dawr1_enabled.
 - Patch #4. Instead of auto enabling KVM_CAP_PPC_DAWR1, let user check
   and enable it manually. Also move KVM_CAP_PPC_DAWR1 check / enable
   logic inside #if defined(CONFIG_KVM_BOOK3S_HV_POSSIBLE).
 - Explain KVM_CAP_PPC_DAWR1 in Documentation/virt/kvm/api.rst 
 - Rebased on top of 5.10-rc3.

v1->v2:
 - patch #1: New patch
 - patch #2: Don't rename KVM_REG_PPC_DAWR, it's an uapi macro
 - patch #3: Increment HV_GUEST_STATE_VERSION
 - Split kvm and selftests patches into different series
 - Patches rebased to paulus/kvm-ppc-next (cf59eb13e151) + few
   other watchpoint patches which are yet to be merged in
   paulus/kvm-ppc-next.

Ravi Bangoria (4):
  KVM: PPC: Allow nested guest creation when L0 hv_guest_state > L1
  KVM: PPC: Rename current DAWR macros and variables
  KVM: PPC: Add infrastructure to support 2nd DAWR
  KVM: PPC: Introduce new capability for 2nd DAWR

 Documentation/virt/kvm/api.rst| 12 
 arch/powerpc/include/asm/hvcall.h | 25 ++-
 arch/powerpc/include/asm/kvm_host.h   |  7 +-
 arch/powerpc/include/asm/kvm_ppc.h|  1 +
 arch/powerpc/include/uapi/asm/kvm.h   |  2 +
 arch/powerpc/kernel/asm-offsets.c |  6 +-
 arch/powerpc/kvm/book3s_hv.c  | 79 +++
 arch/powerpc/kvm/book3s_hv_nested.c   | 70 
 arch/powerpc/kvm/book3s_hv_rmhandlers.S   | 43 +---
 arch/powerpc/kvm/powerpc.c| 10 +++
 include/uapi/linux/kvm.h  |  1 +
 tools/arch/powerpc/include/uapi/asm/kvm.h |  2 +
 tools/include/uapi/linux/kvm.h|  1 +
 13 files changed, 216 insertions(+), 43 deletions(-)

-- 
2.26.2

[PATCH v3 2/4] KVM: PPC: Rename current DAWR macros and variables

2020-12-16 Thread Ravi Bangoria

Power10 is introducing second DAWR. Use real register names (with
suffix 0) from ISA for current macros and variables used by kvm.
One exception is KVM_REG_PPC_DAWR. Keep it as it is because it's
uapi so changing it will break userspace.

Signed-off-by: Ravi Bangoria 
---
 arch/powerpc/include/asm/kvm_host.h |  4 ++--
 arch/powerpc/kernel/asm-offsets.c   |  4 ++--
 arch/powerpc/kvm/book3s_hv.c| 24 
 arch/powerpc/kvm/book3s_hv_nested.c |  8 
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 20 ++--
 5 files changed, 30 insertions(+), 30 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h 
b/arch/powerpc/include/asm/kvm_host.h
index d67a470e95a3..62cadf1a596e 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -584,8 +584,8 @@ struct kvm_vcpu_arch {
u32 ctrl;
u32 dabrx;
ulong dabr;
-   ulong dawr;
-   ulong dawrx;
+   ulong dawr0;
+   ulong dawrx0;
ulong ciabr;
ulong cfar;
ulong ppr;
diff --git a/arch/powerpc/kernel/asm-offsets.c 
b/arch/powerpc/kernel/asm-offsets.c
index c2722ff36e98..5a77aac516ba 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -548,8 +548,8 @@ int main(void)
OFFSET(VCPU_CTRL, kvm_vcpu, arch.ctrl);
OFFSET(VCPU_DABR, kvm_vcpu, arch.dabr);
OFFSET(VCPU_DABRX, kvm_vcpu, arch.dabrx);
-   OFFSET(VCPU_DAWR, kvm_vcpu, arch.dawr);
-   OFFSET(VCPU_DAWRX, kvm_vcpu, arch.dawrx);
+   OFFSET(VCPU_DAWR0, kvm_vcpu, arch.dawr0);
+   OFFSET(VCPU_DAWRX0, kvm_vcpu, arch.dawrx0);
OFFSET(VCPU_CIABR, kvm_vcpu, arch.ciabr);
OFFSET(VCPU_HFLAGS, kvm_vcpu, arch.hflags);
OFFSET(VCPU_DEC, kvm_vcpu, arch.dec);
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index e3b1839fc251..bcbad8daa974 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -782,8 +782,8 @@ static int kvmppc_h_set_mode(struct kvm_vcpu *vcpu, 
unsigned long mflags,
return H_UNSUPPORTED_FLAG_START;
if (value2 & DABRX_HYP)
return H_P4;
-   vcpu->arch.dawr  = value1;
-   vcpu->arch.dawrx = value2;
+   vcpu->arch.dawr0  = value1;
+   vcpu->arch.dawrx0 = value2;
return H_SUCCESS;
case H_SET_MODE_RESOURCE_ADDR_TRANS_MODE:
/* KVM does not support mflags=2 (AIL=2) */
@@ -1747,10 +1747,10 @@ static int kvmppc_get_one_reg_hv(struct kvm_vcpu *vcpu, 
u64 id,
*val = get_reg_val(id, vcpu->arch.vcore->vtb);
break;
case KVM_REG_PPC_DAWR:
-   *val = get_reg_val(id, vcpu->arch.dawr);
+   *val = get_reg_val(id, vcpu->arch.dawr0);
break;
case KVM_REG_PPC_DAWRX:
-   *val = get_reg_val(id, vcpu->arch.dawrx);
+   *val = get_reg_val(id, vcpu->arch.dawrx0);
break;
case KVM_REG_PPC_CIABR:
*val = get_reg_val(id, vcpu->arch.ciabr);
@@ -1979,10 +1979,10 @@ static int kvmppc_set_one_reg_hv(struct kvm_vcpu *vcpu, 
u64 id,
vcpu->arch.vcore->vtb = set_reg_val(id, *val);
break;
case KVM_REG_PPC_DAWR:
-   vcpu->arch.dawr = set_reg_val(id, *val);
+   vcpu->arch.dawr0 = set_reg_val(id, *val);
break;
case KVM_REG_PPC_DAWRX:
-   vcpu->arch.dawrx = set_reg_val(id, *val) & ~DAWRX_HYP;
+   vcpu->arch.dawrx0 = set_reg_val(id, *val) & ~DAWRX_HYP;
break;
case KVM_REG_PPC_CIABR:
vcpu->arch.ciabr = set_reg_val(id, *val);
@@ -3437,8 +3437,8 @@ static int kvmhv_load_hv_regs_and_go(struct kvm_vcpu 
*vcpu, u64 time_limit,
int trap;
unsigned long host_hfscr = mfspr(SPRN_HFSCR);
unsigned long host_ciabr = mfspr(SPRN_CIABR);
-   unsigned long host_dawr = mfspr(SPRN_DAWR0);
-   unsigned long host_dawrx = mfspr(SPRN_DAWRX0);
+   unsigned long host_dawr0 = mfspr(SPRN_DAWR0);
+   unsigned long host_dawrx0 = mfspr(SPRN_DAWRX0);
unsigned long host_psscr = mfspr(SPRN_PSSCR);
unsigned long host_pidr = mfspr(SPRN_PID);
 
@@ -3477,8 +3477,8 @@ static int kvmhv_load_hv_regs_and_go(struct kvm_vcpu 
*vcpu, u64 time_limit,
mtspr(SPRN_SPURR, vcpu->arch.spurr);
 
if (dawr_enabled()) {
-   mtspr(SPRN_DAWR0, vcpu->arch.dawr);
-   mtspr(SPRN_DAWRX0, vcpu->arch.dawrx);
+   mtspr(SPRN_DAWR0, vcpu->arch.dawr0);
+   mtspr(SPRN_DAWRX0, vcpu->arch.dawrx0);
}
mtspr(SPRN_CIABR, vcpu->arch.ciabr);
mtspr(SPRN_IC, vcpu->arch.ic);
@@ -3530,8 +3530,8 @@ static int kvmhv_load_hv_regs_and_go(struct kvm_vcpu 
*vcpu, u64

[PATCH v3 3/4] KVM: PPC: Add infrastructure to support 2nd DAWR

2020-12-16 Thread Ravi Bangoria

kvm code assumes single DAWR everywhere. Add code to support 2nd DAWR.
DAWR is a hypervisor resource and thus H_SET_MODE hcall is used to set/
unset it. Introduce new case H_SET_MODE_RESOURCE_SET_DAWR1 for 2nd DAWR.
Also, kvm will support 2nd DAWR only if CPU_FTR_DAWR1 is set.

Signed-off-by: Ravi Bangoria 
---
 Documentation/virt/kvm/api.rst|  2 ++
 arch/powerpc/include/asm/hvcall.h |  8 -
 arch/powerpc/include/asm/kvm_host.h   |  3 ++
 arch/powerpc/include/uapi/asm/kvm.h   |  2 ++
 arch/powerpc/kernel/asm-offsets.c |  2 ++
 arch/powerpc/kvm/book3s_hv.c  | 43 +++
 arch/powerpc/kvm/book3s_hv_nested.c   |  7 
 arch/powerpc/kvm/book3s_hv_rmhandlers.S   | 23 
 tools/arch/powerpc/include/uapi/asm/kvm.h |  2 ++
 9 files changed, 91 insertions(+), 1 deletion(-)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 36d5f1f3c6dd..abb24575bdf9 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -2249,6 +2249,8 @@ registers, find a list below:
   PPC KVM_REG_PPC_PSSCR   64
   PPC KVM_REG_PPC_DEC_EXPIRY  64
   PPC KVM_REG_PPC_PTCR64
+  PPC KVM_REG_PPC_DAWR1   64
+  PPC KVM_REG_PPC_DAWRX1  64
   PPC KVM_REG_PPC_TM_GPR0 64
   ...
   PPC KVM_REG_PPC_TM_GPR3164
diff --git a/arch/powerpc/include/asm/hvcall.h 
b/arch/powerpc/include/asm/hvcall.h
index ca6840239f90..98afa58b619a 100644
--- a/arch/powerpc/include/asm/hvcall.h
+++ b/arch/powerpc/include/asm/hvcall.h
@@ -560,16 +560,22 @@ struct hv_guest_state {
u64 pidr;
u64 cfar;
u64 ppr;
+   /* Version 1 ends here */
+   u64 dawr1;
+   u64 dawrx1;
+   /* Version 2 ends here */
 };
 
 /* Latest version of hv_guest_state structure */
-#define HV_GUEST_STATE_VERSION 1
+#define HV_GUEST_STATE_VERSION 2
 
 static inline int hv_guest_state_size(unsigned int version)
 {
switch (version) {
case 1:
return offsetofend(struct hv_guest_state, ppr);
+   case 2:
+   return offsetofend(struct hv_guest_state, dawrx1);
default:
return -1;
}
diff --git a/arch/powerpc/include/asm/kvm_host.h 
b/arch/powerpc/include/asm/kvm_host.h
index 62cadf1a596e..a93cfb672421 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -307,6 +307,7 @@ struct kvm_arch {
u8 svm_enabled;
bool threads_indep;
bool nested_enable;
+   bool dawr1_enabled;
pgd_t *pgtable;
u64 process_table;
struct dentry *debugfs_dir;
@@ -586,6 +587,8 @@ struct kvm_vcpu_arch {
ulong dabr;
ulong dawr0;
ulong dawrx0;
+   ulong dawr1;
+   ulong dawrx1;
ulong ciabr;
ulong cfar;
ulong ppr;
diff --git a/arch/powerpc/include/uapi/asm/kvm.h 
b/arch/powerpc/include/uapi/asm/kvm.h
index c3af3f324c5a..9f18fa090f1f 100644
--- a/arch/powerpc/include/uapi/asm/kvm.h
+++ b/arch/powerpc/include/uapi/asm/kvm.h
@@ -644,6 +644,8 @@ struct kvm_ppc_cpu_char {
 #define KVM_REG_PPC_MMCR3  (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0xc1)
 #define KVM_REG_PPC_SIER2  (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0xc2)
 #define KVM_REG_PPC_SIER3  (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0xc3)
+#define KVM_REG_PPC_DAWR1  (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0xc4)
+#define KVM_REG_PPC_DAWRX1 (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0xc5)
 
 /* Transactional Memory checkpointed state:
  * This is all GPRs, all VSX regs and a subset of SPRs
diff --git a/arch/powerpc/kernel/asm-offsets.c 
b/arch/powerpc/kernel/asm-offsets.c
index 5a77aac516ba..a35ea4e19360 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -550,6 +550,8 @@ int main(void)
OFFSET(VCPU_DABRX, kvm_vcpu, arch.dabrx);
OFFSET(VCPU_DAWR0, kvm_vcpu, arch.dawr0);
OFFSET(VCPU_DAWRX0, kvm_vcpu, arch.dawrx0);
+   OFFSET(VCPU_DAWR1, kvm_vcpu, arch.dawr1);
+   OFFSET(VCPU_DAWRX1, kvm_vcpu, arch.dawrx1);
OFFSET(VCPU_CIABR, kvm_vcpu, arch.ciabr);
OFFSET(VCPU_HFLAGS, kvm_vcpu, arch.hflags);
OFFSET(VCPU_DEC, kvm_vcpu, arch.dec);
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index bcbad8daa974..b7a30c0692a7 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -785,6 +785,22 @@ static int kvmppc_h_set_mode(struct kvm_vcpu *vcpu, 
unsigned long mflags,
vcpu->arch.dawr0  = value1;
vcpu->arch.dawrx0 = value2;
return H_SUCCESS;
+   case H_SET_MODE_RESOURCE_SET_DAWR1:
+   if (!kvmppc_power8_compatible(vcpu))
+   return H_P2;
+   if (!ppc_breakpoint_available())
+   return H_P2;
+   if (!cpu_has_feature(CPU_FTR_DAWR1))
+

[PATCH] powerpc/xmon: Fix build failure for 8xx

2020-11-29 Thread Ravi Bangoria

With CONFIG_PPC_8xx and CONFIG_XMON set, kernel build fails with

  arch/powerpc/xmon/xmon.c:1379:12: error: 'find_free_data_bpt' defined
  but not used [-Werror=unused-function]

Fix it by enclosing find_free_data_bpt() inside #ifndef CONFIG_PPC_8xx.

Reported-by: kernel test robot 
Fixes: 30df74d67d48 ("powerpc/watchpoint/xmon: Support 2nd DAWR")
Signed-off-by: Ravi Bangoria 
---
 arch/powerpc/xmon/xmon.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c
index 55c43a6c9111..5559edf36756 100644
--- a/arch/powerpc/xmon/xmon.c
+++ b/arch/powerpc/xmon/xmon.c
@@ -1383,6 +1383,7 @@ static long check_bp_loc(unsigned long addr)
return 1;
 }
 
+#ifndef CONFIG_PPC_8xx
 static int find_free_data_bpt(void)
 {
int i;
@@ -1394,6 +1395,7 @@ static int find_free_data_bpt(void)
printf("Couldn't find free breakpoint register\n");
return -1;
 }
+#endif
 
 static void print_data_bpts(void)
 {
-- 
2.17.1

Re: [PATCH 1/2] powerpc: sstep: Fix load and update instructions

2020-11-26 Thread Ravi Bangoria





On 11/19/20 11:11 AM, Sandipan Das wrote:

The Power ISA says that the fixed-point load and update
instructions must neither use R0 for the base address (RA)
nor have the destination (RT) and the base address (RA) as
the same register. In these cases, the instruction is
invalid. This applies to the following instructions.
   * Load Byte and Zero with Update (lbzu)
   * Load Byte and Zero with Update Indexed (lbzux)
   * Load Halfword and Zero with Update (lhzu)
   * Load Halfword and Zero with Update Indexed (lhzux)
   * Load Halfword Algebraic with Update (lhau)
   * Load Halfword Algebraic with Update Indexed (lhaux)
   * Load Word and Zero with Update (lwzu)
   * Load Word and Zero with Update Indexed (lwzux)
   * Load Word Algebraic with Update Indexed (lwaux)
   * Load Doubleword with Update (ldu)
   * Load Doubleword with Update Indexed (ldux)

However, the following behaviour is observed using some
invalid opcodes where RA = RT.

An userspace program using an invalid instruction word like
0xe9ce0001, i.e. "ldu r14, 0(r14)", runs and exits without
getting terminated abruptly. The instruction performs the
load operation but does not write the effective address to
the base address register. Attaching an uprobe at that
instruction's address results in emulation which writes the
effective address to the base register. Thus, the final value
of the base address register is different.

To remove any inconsistencies, this adds an additional check
for the aforementioned instructions to make sure that they
are treated as unknown by the emulation infrastructure when
RA = 0 or RA = RT. The kernel will then fallback to executing
the instruction on hardware.

Signed-off-by: Sandipan Das 


For the series:
Reviewed-by: Ravi Bangoria

Re: [PATCH 1/2] powerpc: sstep: Fix load and update instructions

2020-11-25 Thread Ravi Bangoria





diff --git a/arch/powerpc/lib/sstep.c b/arch/powerpc/lib/sstep.c
index 855457ed09b5..25a5436be6c6 100644
--- a/arch/powerpc/lib/sstep.c
+++ b/arch/powerpc/lib/sstep.c
@@ -2157,11 +2157,15 @@ int analyse_instr(struct instruction_op *op, const 
struct pt_regs *regs,
  
  		case 23:	/* lwzx */

case 55:/* lwzux */
+   if (u && (ra == 0 || ra == rd))
+   return -1;


I guess you also need to split case 23 and 55?

- Ravi

[PATCH v2 4/4] KVM: PPC: Introduce new capability for 2nd DAWR

2020-11-24 Thread Ravi Bangoria

Introduce KVM_CAP_PPC_DAWR1 which can be used by Qemu to query whether
kvm supports 2nd DAWR or not.

Signed-off-by: Ravi Bangoria 
---
 arch/powerpc/kvm/powerpc.c | 3 +++
 include/uapi/linux/kvm.h   | 1 +
 2 files changed, 4 insertions(+)

diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 13999123b735..48763fe59fc5 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -679,6 +679,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
!kvmppc_hv_ops->enable_svm(NULL);
break;
 #endif
+   case KVM_CAP_PPC_DAWR1:
+   r = cpu_has_feature(CPU_FTR_DAWR1);
+   break;
default:
r = 0;
break;
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index f6d86033c4fa..0f32d6cbabc2 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1035,6 +1035,7 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_LAST_CPU 184
 #define KVM_CAP_SMALLER_MAXPHYADDR 185
 #define KVM_CAP_S390_DIAG318 186
+#define KVM_CAP_PPC_DAWR1 187
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
-- 
2.26.2

[PATCH v2 0/4] KVM: PPC: Power10 2nd DAWR enablement

2020-11-24 Thread Ravi Bangoria

Enable p10 2nd DAWR feature for Book3S kvm guest. DAWR is a hypervisor
resource and thus H_SET_MODE hcall is used to set/unset it. A new case
H_SET_MODE_RESOURCE_SET_DAWR1 is introduced in H_SET_MODE hcall for
setting/unsetting 2nd DAWR. Also, new capability KVM_CAP_PPC_DAWR1 has
been added to query 2nd DAWR support via kvm ioctl.

This feature also needs to be enabled in Qemu to really use it. I'll
post Qemu patches once kvm patches get accepted.

v1: 
https://lore.kernel.org/r/20200723102058.312282-1-ravi.bango...@linux.ibm.com

v1->v2:
 - patch #1: New patch
 - patch #2: Don't rename KVM_REG_PPC_DAWR, it's an uapi macro
 - patch #3: Increment HV_GUEST_STATE_VERSION
 - Split kvm and selftests patches into different series
 - Patches rebased to paulus/kvm-ppc-next (cf59eb13e151) + few
   other watchpoint patches which are yet to be merged in
   paulus/kvm-ppc-next.

Ravi Bangoria (4):
  KVM: PPC: Allow nested guest creation when L0 hv_guest_state > L1
  KVM: PPC: Rename current DAWR macros and variables
  KVM: PPC: Add infrastructure to support 2nd DAWR
  KVM: PPC: Introduce new capability for 2nd DAWR

 Documentation/virt/kvm/api.rst|  2 +
 arch/powerpc/include/asm/hvcall.h | 25 -
 arch/powerpc/include/asm/kvm_host.h   |  6 +-
 arch/powerpc/include/uapi/asm/kvm.h   |  2 +
 arch/powerpc/kernel/asm-offsets.c |  6 +-
 arch/powerpc/kvm/book3s_hv.c  | 65 ++
 arch/powerpc/kvm/book3s_hv_nested.c   | 68 ++-
 arch/powerpc/kvm/book3s_hv_rmhandlers.S   | 43 ++
 arch/powerpc/kvm/powerpc.c|  3 +
 include/uapi/linux/kvm.h  |  1 +
 tools/arch/powerpc/include/uapi/asm/kvm.h |  2 +
 11 files changed, 181 insertions(+), 42 deletions(-)

-- 
2.26.2

[PATCH v2 1/4] KVM: PPC: Allow nested guest creation when L0 hv_guest_state > L1

2020-11-24 Thread Ravi Bangoria

On powerpc, L1 hypervisor takes help of L0 using H_ENTER_NESTED
hcall to load L2 guest state in cpu. L1 hypervisor prepares the
L2 state in struct hv_guest_state and passes a pointer to it via
hcall. Using that pointer, L0 reads/writes that state directly
from/to L1 memory. Thus L0 must be aware of hv_guest_state layout
of L1. Currently it uses version field to achieve this. i.e. If
L0 hv_guest_state.version != L1 hv_guest_state.version, L0 won't
allow nested kvm guest.

This restriction can be loosen up a bit. L0 can be taught to
understand older layout of hv_guest_state, if we restrict the
new member to be added only at the end. i.e. we can allow
nested guest even when L0 hv_guest_state.version > L1
hv_guest_state.version. Though, the other way around is not
possible.

Signed-off-by: Ravi Bangoria 
---
 arch/powerpc/include/asm/hvcall.h   | 17 +++--
 arch/powerpc/kvm/book3s_hv_nested.c | 53 -
 2 files changed, 59 insertions(+), 11 deletions(-)

diff --git a/arch/powerpc/include/asm/hvcall.h 
b/arch/powerpc/include/asm/hvcall.h
index fbb377055471..a7073fddb657 100644
--- a/arch/powerpc/include/asm/hvcall.h
+++ b/arch/powerpc/include/asm/hvcall.h
@@ -524,9 +524,12 @@ struct h_cpu_char_result {
u64 behaviour;
 };
 
-/* Register state for entering a nested guest with H_ENTER_NESTED */
+/*
+ * Register state for entering a nested guest with H_ENTER_NESTED.
+ * New member must be added at the end.
+ */
 struct hv_guest_state {
-   u64 version;/* version of this structure layout */
+   u64 version;/* version of this structure layout, must be 
first */
u32 lpid;
u32 vcpu_token;
/* These registers are hypervisor privileged (at least for writing) */
@@ -560,6 +563,16 @@ struct hv_guest_state {
 /* Latest version of hv_guest_state structure */
 #define HV_GUEST_STATE_VERSION 1
 
+static inline int hv_guest_state_size(unsigned int version)
+{
+   switch (version) {
+   case 1:
+   return offsetofend(struct hv_guest_state, ppr);
+   default:
+   return -1;
+   }
+}
+
 #endif /* __ASSEMBLY__ */
 #endif /* __KERNEL__ */
 #endif /* _ASM_POWERPC_HVCALL_H */
diff --git a/arch/powerpc/kvm/book3s_hv_nested.c 
b/arch/powerpc/kvm/book3s_hv_nested.c
index 33b58549a9aa..2b433c3bacea 100644
--- a/arch/powerpc/kvm/book3s_hv_nested.c
+++ b/arch/powerpc/kvm/book3s_hv_nested.c
@@ -215,6 +215,45 @@ static void kvmhv_nested_mmio_needed(struct kvm_vcpu 
*vcpu, u64 regs_ptr)
}
 }
 
+static int kvmhv_read_guest_state_and_regs(struct kvm_vcpu *vcpu,
+  struct hv_guest_state *l2_hv,
+  struct pt_regs *l2_regs,
+  u64 hv_ptr, u64 regs_ptr)
+{
+   int size;
+
+   if (kvm_vcpu_read_guest(vcpu, hv_ptr, &(l2_hv->version),
+   sizeof(l2_hv->version)))
+   return -1;
+
+   if (kvmppc_need_byteswap(vcpu))
+   l2_hv->version = swab64(l2_hv->version);
+
+   size = hv_guest_state_size(l2_hv->version);
+   if (size < 0)
+   return -1;
+
+   return kvm_vcpu_read_guest(vcpu, hv_ptr, l2_hv, size) ||
+   kvm_vcpu_read_guest(vcpu, regs_ptr, l2_regs,
+   sizeof(struct pt_regs));
+}
+
+static int kvmhv_write_guest_state_and_regs(struct kvm_vcpu *vcpu,
+   struct hv_guest_state *l2_hv,
+   struct pt_regs *l2_regs,
+   u64 hv_ptr, u64 regs_ptr)
+{
+   int size;
+
+   size = hv_guest_state_size(l2_hv->version);
+   if (size < 0)
+   return -1;
+
+   return kvm_vcpu_write_guest(vcpu, hv_ptr, l2_hv, size) ||
+   kvm_vcpu_write_guest(vcpu, regs_ptr, l2_regs,
+sizeof(struct pt_regs));
+}
+
 long kvmhv_enter_nested_guest(struct kvm_vcpu *vcpu)
 {
long int err, r;
@@ -235,17 +274,15 @@ long kvmhv_enter_nested_guest(struct kvm_vcpu *vcpu)
hv_ptr = kvmppc_get_gpr(vcpu, 4);
regs_ptr = kvmppc_get_gpr(vcpu, 5);
vcpu->srcu_idx = srcu_read_lock(>kvm->srcu);
-   err = kvm_vcpu_read_guest(vcpu, hv_ptr, _hv,
- sizeof(struct hv_guest_state)) ||
-   kvm_vcpu_read_guest(vcpu, regs_ptr, _regs,
-   sizeof(struct pt_regs));
+   err = kvmhv_read_guest_state_and_regs(vcpu, _hv, _regs,
+ hv_ptr, regs_ptr);
srcu_read_unlock(>kvm->srcu, vcpu->srcu_idx);
if (err)
return H_PARAMETER;
 
if (kvmppc_need_byteswap(vcpu))
byteswap_hv_regs(_hv);
-   if (l2_hv.version != HV_GUEST_STATE_VERSION)
+   if (l2_hv.version >

[PATCH v2 3/4] KVM: PPC: Add infrastructure to support 2nd DAWR

2020-11-24 Thread Ravi Bangoria

kvm code assumes single DAWR everywhere. Add code to support 2nd DAWR.
DAWR is a hypervisor resource and thus H_SET_MODE hcall is used to set/
unset it. Introduce new case H_SET_MODE_RESOURCE_SET_DAWR1 for 2nd DAWR.
Also, kvm will support 2nd DAWR only if CPU_FTR_DAWR1 is set.

Signed-off-by: Ravi Bangoria 
---
 Documentation/virt/kvm/api.rst|  2 ++
 arch/powerpc/include/asm/hvcall.h |  8 -
 arch/powerpc/include/asm/kvm_host.h   |  2 ++
 arch/powerpc/include/uapi/asm/kvm.h   |  2 ++
 arch/powerpc/kernel/asm-offsets.c |  2 ++
 arch/powerpc/kvm/book3s_hv.c  | 41 +++
 arch/powerpc/kvm/book3s_hv_nested.c   |  7 
 arch/powerpc/kvm/book3s_hv_rmhandlers.S   | 23 +
 tools/arch/powerpc/include/uapi/asm/kvm.h |  2 ++
 9 files changed, 88 insertions(+), 1 deletion(-)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index eb3a1316f03e..72c98735aa52 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -2249,6 +2249,8 @@ registers, find a list below:
   PPC KVM_REG_PPC_PSSCR   64
   PPC KVM_REG_PPC_DEC_EXPIRY  64
   PPC KVM_REG_PPC_PTCR64
+  PPC KVM_REG_PPC_DAWR1   64
+  PPC KVM_REG_PPC_DAWRX1  64
   PPC KVM_REG_PPC_TM_GPR0 64
   ...
   PPC KVM_REG_PPC_TM_GPR3164
diff --git a/arch/powerpc/include/asm/hvcall.h 
b/arch/powerpc/include/asm/hvcall.h
index a7073fddb657..4bacd27a348b 100644
--- a/arch/powerpc/include/asm/hvcall.h
+++ b/arch/powerpc/include/asm/hvcall.h
@@ -558,16 +558,22 @@ struct hv_guest_state {
u64 pidr;
u64 cfar;
u64 ppr;
+   /* Version 1 ends here */
+   u64 dawr1;
+   u64 dawrx1;
+   /* Version 2 ends here */
 };
 
 /* Latest version of hv_guest_state structure */
-#define HV_GUEST_STATE_VERSION 1
+#define HV_GUEST_STATE_VERSION 2
 
 static inline int hv_guest_state_size(unsigned int version)
 {
switch (version) {
case 1:
return offsetofend(struct hv_guest_state, ppr);
+   case 2:
+   return offsetofend(struct hv_guest_state, dawrx1);
default:
return -1;
}
diff --git a/arch/powerpc/include/asm/kvm_host.h 
b/arch/powerpc/include/asm/kvm_host.h
index 62cadf1a596e..9804afdf8578 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -586,6 +586,8 @@ struct kvm_vcpu_arch {
ulong dabr;
ulong dawr0;
ulong dawrx0;
+   ulong dawr1;
+   ulong dawrx1;
ulong ciabr;
ulong cfar;
ulong ppr;
diff --git a/arch/powerpc/include/uapi/asm/kvm.h 
b/arch/powerpc/include/uapi/asm/kvm.h
index c3af3f324c5a..9f18fa090f1f 100644
--- a/arch/powerpc/include/uapi/asm/kvm.h
+++ b/arch/powerpc/include/uapi/asm/kvm.h
@@ -644,6 +644,8 @@ struct kvm_ppc_cpu_char {
 #define KVM_REG_PPC_MMCR3  (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0xc1)
 #define KVM_REG_PPC_SIER2  (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0xc2)
 #define KVM_REG_PPC_SIER3  (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0xc3)
+#define KVM_REG_PPC_DAWR1  (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0xc4)
+#define KVM_REG_PPC_DAWRX1 (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0xc5)
 
 /* Transactional Memory checkpointed state:
  * This is all GPRs, all VSX regs and a subset of SPRs
diff --git a/arch/powerpc/kernel/asm-offsets.c 
b/arch/powerpc/kernel/asm-offsets.c
index e4256f5b4602..26d4fa8fe51e 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -549,6 +549,8 @@ int main(void)
OFFSET(VCPU_DABRX, kvm_vcpu, arch.dabrx);
OFFSET(VCPU_DAWR0, kvm_vcpu, arch.dawr0);
OFFSET(VCPU_DAWRX0, kvm_vcpu, arch.dawrx0);
+   OFFSET(VCPU_DAWR1, kvm_vcpu, arch.dawr1);
+   OFFSET(VCPU_DAWRX1, kvm_vcpu, arch.dawrx1);
OFFSET(VCPU_CIABR, kvm_vcpu, arch.ciabr);
OFFSET(VCPU_HFLAGS, kvm_vcpu, arch.hflags);
OFFSET(VCPU_DEC, kvm_vcpu, arch.dec);
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index d5c6efc8a76e..2ff645789e9e 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -785,6 +785,20 @@ static int kvmppc_h_set_mode(struct kvm_vcpu *vcpu, 
unsigned long mflags,
vcpu->arch.dawr0  = value1;
vcpu->arch.dawrx0 = value2;
return H_SUCCESS;
+   case H_SET_MODE_RESOURCE_SET_DAWR1:
+   if (!kvmppc_power8_compatible(vcpu))
+   return H_P2;
+   if (!ppc_breakpoint_available())
+   return H_P2;
+   if (!cpu_has_feature(CPU_FTR_DAWR1))
+   return H_P2;
+   if (mflags)
+   return H_UNSUPPORTED_FLAG_START;
+   if (value2 & DABRX_HYP)
+   return H_P4;
+   vcpu->arch.dawr1  = value1;
+

[PATCH v2 2/4] KVM: PPC: Rename current DAWR macros and variables

2020-11-24 Thread Ravi Bangoria

Power10 is introducing second DAWR. Use real register names (with
suffix 0) from ISA for current macros and variables used by kvm.
One exception is KVM_REG_PPC_DAWR. Keep it as it is because it's
uapi so changing it will break userspace.

Signed-off-by: Ravi Bangoria 
---
 arch/powerpc/include/asm/kvm_host.h |  4 ++--
 arch/powerpc/kernel/asm-offsets.c   |  4 ++--
 arch/powerpc/kvm/book3s_hv.c| 24 
 arch/powerpc/kvm/book3s_hv_nested.c |  8 
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 20 ++--
 5 files changed, 30 insertions(+), 30 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h 
b/arch/powerpc/include/asm/kvm_host.h
index d67a470e95a3..62cadf1a596e 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -584,8 +584,8 @@ struct kvm_vcpu_arch {
u32 ctrl;
u32 dabrx;
ulong dabr;
-   ulong dawr;
-   ulong dawrx;
+   ulong dawr0;
+   ulong dawrx0;
ulong ciabr;
ulong cfar;
ulong ppr;
diff --git a/arch/powerpc/kernel/asm-offsets.c 
b/arch/powerpc/kernel/asm-offsets.c
index 8711c2164b45..e4256f5b4602 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -547,8 +547,8 @@ int main(void)
OFFSET(VCPU_CTRL, kvm_vcpu, arch.ctrl);
OFFSET(VCPU_DABR, kvm_vcpu, arch.dabr);
OFFSET(VCPU_DABRX, kvm_vcpu, arch.dabrx);
-   OFFSET(VCPU_DAWR, kvm_vcpu, arch.dawr);
-   OFFSET(VCPU_DAWRX, kvm_vcpu, arch.dawrx);
+   OFFSET(VCPU_DAWR0, kvm_vcpu, arch.dawr0);
+   OFFSET(VCPU_DAWRX0, kvm_vcpu, arch.dawrx0);
OFFSET(VCPU_CIABR, kvm_vcpu, arch.ciabr);
OFFSET(VCPU_HFLAGS, kvm_vcpu, arch.hflags);
OFFSET(VCPU_DEC, kvm_vcpu, arch.dec);
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 490a0f6a7285..d5c6efc8a76e 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -782,8 +782,8 @@ static int kvmppc_h_set_mode(struct kvm_vcpu *vcpu, 
unsigned long mflags,
return H_UNSUPPORTED_FLAG_START;
if (value2 & DABRX_HYP)
return H_P4;
-   vcpu->arch.dawr  = value1;
-   vcpu->arch.dawrx = value2;
+   vcpu->arch.dawr0  = value1;
+   vcpu->arch.dawrx0 = value2;
return H_SUCCESS;
case H_SET_MODE_RESOURCE_ADDR_TRANS_MODE:
/* KVM does not support mflags=2 (AIL=2) */
@@ -1747,10 +1747,10 @@ static int kvmppc_get_one_reg_hv(struct kvm_vcpu *vcpu, 
u64 id,
*val = get_reg_val(id, vcpu->arch.vcore->vtb);
break;
case KVM_REG_PPC_DAWR:
-   *val = get_reg_val(id, vcpu->arch.dawr);
+   *val = get_reg_val(id, vcpu->arch.dawr0);
break;
case KVM_REG_PPC_DAWRX:
-   *val = get_reg_val(id, vcpu->arch.dawrx);
+   *val = get_reg_val(id, vcpu->arch.dawrx0);
break;
case KVM_REG_PPC_CIABR:
*val = get_reg_val(id, vcpu->arch.ciabr);
@@ -1979,10 +1979,10 @@ static int kvmppc_set_one_reg_hv(struct kvm_vcpu *vcpu, 
u64 id,
vcpu->arch.vcore->vtb = set_reg_val(id, *val);
break;
case KVM_REG_PPC_DAWR:
-   vcpu->arch.dawr = set_reg_val(id, *val);
+   vcpu->arch.dawr0 = set_reg_val(id, *val);
break;
case KVM_REG_PPC_DAWRX:
-   vcpu->arch.dawrx = set_reg_val(id, *val) & ~DAWRX_HYP;
+   vcpu->arch.dawrx0 = set_reg_val(id, *val) & ~DAWRX_HYP;
break;
case KVM_REG_PPC_CIABR:
vcpu->arch.ciabr = set_reg_val(id, *val);
@@ -3437,8 +3437,8 @@ static int kvmhv_load_hv_regs_and_go(struct kvm_vcpu 
*vcpu, u64 time_limit,
int trap;
unsigned long host_hfscr = mfspr(SPRN_HFSCR);
unsigned long host_ciabr = mfspr(SPRN_CIABR);
-   unsigned long host_dawr = mfspr(SPRN_DAWR0);
-   unsigned long host_dawrx = mfspr(SPRN_DAWRX0);
+   unsigned long host_dawr0 = mfspr(SPRN_DAWR0);
+   unsigned long host_dawrx0 = mfspr(SPRN_DAWRX0);
unsigned long host_psscr = mfspr(SPRN_PSSCR);
unsigned long host_pidr = mfspr(SPRN_PID);
 
@@ -3477,8 +3477,8 @@ static int kvmhv_load_hv_regs_and_go(struct kvm_vcpu 
*vcpu, u64 time_limit,
mtspr(SPRN_SPURR, vcpu->arch.spurr);
 
if (dawr_enabled()) {
-   mtspr(SPRN_DAWR0, vcpu->arch.dawr);
-   mtspr(SPRN_DAWRX0, vcpu->arch.dawrx);
+   mtspr(SPRN_DAWR0, vcpu->arch.dawr0);
+   mtspr(SPRN_DAWRX0, vcpu->arch.dawrx0);
}
mtspr(SPRN_CIABR, vcpu->arch.ciabr);
mtspr(SPRN_IC, vcpu->arch.ic);
@@ -3530,8 +3530,8 @@ static int kvmhv_load_hv_regs_and_go(struct kvm_vcpu 
*vcpu, u64

[PATCH v3] powerpc/watchpoint: Workaround P10 DD1 issue with VSX-32 byte instructions

2020-11-05 Thread Ravi Bangoria

POWER10 DD1 has an issue where it generates watchpoint exceptions when
it shouldn't. The conditions where this occur are:

 - octword op
 - ending address of DAWR range is less than starting address of op
 - those addresses need to be in the same or in two consecutive 512B
   blocks
 - 'op address + 64B' generates an address that has a carry into bit
   52 (crosses 2K boundary)

Handle such spurious exception by considering them as extraneous and
emulating/single-steeping instruction without generating an event.

Signed-off-by: Ravi Bangoria 
[Fixed build warning reported by kernel test robot]
Reported-by: kernel test robot 
---
v2->v3:
 - v2: 
https://lore.kernel.org/r/20201022034039.330365-1-ravi.bango...@linux.ibm.com
 - Drop first patch which introduced CPU_FTRS_POWER10_DD1. Instead use
   P1 DD1 PVR direclty in if condition. We can't set CPU_FTRS_POWER10_DD1
   inside guest as guest can be migrated to futur version of cpu.

Dependency: VSX-32 byte emulation support patches
  https://lore.kernel.org/r/20201011050908.72173-1-ravi.bango...@linux.ibm.com

 arch/powerpc/kernel/hw_breakpoint.c | 68 -
 1 file changed, 66 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/hw_breakpoint.c 
b/arch/powerpc/kernel/hw_breakpoint.c
index 1f4a1efa0074..67297aea5d94 100644
--- a/arch/powerpc/kernel/hw_breakpoint.c
+++ b/arch/powerpc/kernel/hw_breakpoint.c
@@ -644,6 +644,11 @@ static bool is_larx_stcx_instr(int type)
return type == LARX || type == STCX;
 }
 
+static bool is_octword_vsx_instr(int type, int size)
+{
+   return ((type == LOAD_VSX || type == STORE_VSX) && size == 32);
+}
+
 /*
  * We've failed in reliably handling the hw-breakpoint. Unregister
  * it and throw a warning message to let the user know about it.
@@ -694,6 +699,58 @@ static bool stepping_handler(struct pt_regs *regs, struct 
perf_event **bp,
return true;
 }
 
+static void handle_p10dd1_spurious_exception(struct arch_hw_breakpoint **info,
+int *hit, unsigned long ea)
+{
+   int i;
+   unsigned long hw_end_addr;
+
+   /*
+* Handle spurious exception only when any bp_per_reg is set.
+* Otherwise this might be created by xmon and not actually a
+* spurious exception.
+*/
+   for (i = 0; i < nr_wp_slots(); i++) {
+   if (!info[i])
+   continue;
+
+   hw_end_addr = ALIGN(info[i]->address + info[i]->len, 
HW_BREAKPOINT_SIZE);
+
+   /*
+* Ending address of DAWR range is less than starting
+* address of op.
+*/
+   if ((hw_end_addr - 1) >= ea)
+   continue;
+
+   /*
+* Those addresses need to be in the same or in two
+* consecutive 512B blocks;
+*/
+   if (((hw_end_addr - 1) >> 10) != (ea >> 10))
+   continue;
+
+   /*
+* 'op address + 64B' generates an address that has a
+* carry into bit 52 (crosses 2K boundary).
+*/
+   if ((ea & 0x800) == ((ea + 64) & 0x800))
+   continue;
+
+   break;
+   }
+
+   if (i == nr_wp_slots())
+   return;
+
+   for (i = 0; i < nr_wp_slots(); i++) {
+   if (info[i]) {
+   hit[i] = 1;
+   info[i]->type |= HW_BRK_TYPE_EXTRANEOUS_IRQ;
+   }
+   }
+}
+
 int hw_breakpoint_handler(struct die_args *args)
 {
bool err = false;
@@ -752,8 +809,15 @@ int hw_breakpoint_handler(struct die_args *args)
goto reset;
 
if (!nr_hit) {
-   rc = NOTIFY_DONE;
-   goto out;
+   /* Workaround for Power10 DD1 */
+   if (mfspr(SPRN_PVR) == 0x800100 &&
+   !IS_ENABLED(CONFIG_PPC_8xx) &&
+   is_octword_vsx_instr(type, size)) {
+   handle_p10dd1_spurious_exception(info, hit, ea);
+   } else {
+   rc = NOTIFY_DONE;
+   goto out;
+   }
}
 
/*
-- 
2.25.1

Re: [PATCH 1/2] powerpc: Introduce POWER10_DD1 feature

2020-10-26 Thread Ravi Bangoria


Hi Michael,


+static void __init fixup_cpu_features(void)
+{
+   unsigned long version = mfspr(SPRN_PVR);
+
+   if ((version & 0x) == 0x00800100)
+   cur_cpu_spec->cpu_features |= CPU_FTR_POWER10_DD1;
+}
+
  static int __init early_init_dt_scan_cpus(unsigned long node,
  const char *uname, int depth,
  void *data)
@@ -378,6 +386,7 @@ static int __init early_init_dt_scan_cpus(unsigned long 
node,
  
  		check_cpu_feature_properties(node);

check_cpu_pa_features(node);
+   fixup_cpu_features();
}


This is not the way we normally do CPU features.

In the past we have always added a raw entry in cputable.c, see eg. the
Power9 DD 2.0, 2.1 entries.


True. But that won't work PowerVM guests because kernel overwrites "raw"
mode cpu_features with "architected" mode cpu_features.



Doing it here is not really safe, if you're running with an architected
PVR (via cpu-version property), you can't set the DD1 feature, because
you might be migrated to a future CPU that doesn't have the DD1 quirks.


Okay.. I suppose, that mean kernel need to check real PVR every time when it
encounters spurious exception. i.e. instead of using

  if (cpu_has_feature(CPU_FTR_POWER10_DD1) && ...)

it need to use

  if (mfspr(SPRN_PVR) == 0x800100 && ...)

in patch 2. Right?

Thanks,
Ravi

Re: [PATCH v2 1/2] powerpc: Introduce POWER10_DD1 feature

2020-10-23 Thread Ravi Bangoria





+static void __init fixup_cpu_features(void)
+{
+   unsigned long version = mfspr(SPRN_PVR);
+
+   if ((version & 0x) == 0x00800100)
+   cur_cpu_spec->cpu_features |= CPU_FTR_POWER10_DD1;
+}
+

I am just wondering why this is needed here, but the same thing is not
done for, say, CPU_FTR_POWER9_DD2_1?


When we don't use DT cpu_features (PowerVM / kvm geusts), we call
identify_cpu() twice. First with Real PVR which sets "raw" cpu_spec
as cur_cpu_spec and then 2nd time with Logical PVR (0x0f...) which
(mostly) overwrites the cur_cpu_spec with "architected" mode cpu_spec.
I don't see DD version specific entries for "architected" mode in
cpu_specs[] for any previous processors. So I've introduced this
function to tweak cpu_features.

Though, I don't know why we don't have similar thing for
CPU_FTR_POWER9_DD2_1. I've to check that.


And should we get a /* Power10 DD 1 */ added to cpu_specs[]?


IIUC, we don't need such entry. For PowerVM / kvm guests, we overwrite
cpu_spec, so /* Power10 */ "raw" entry is sufficient. And For baremetal,
we don't use cpu_specs[] at all.

I think even for powernv, using dt features can be disabled by the
cmdline with dt_cpu_ftrs=off, then cpu_specs[] will then be used.


Ok... with dt_cpu_ftrs=off, we seem to be using raw mode cpu_specs[] entry on
baremetal. So yeah, I'll add /* Power10 DD1 */ raw mode entry into cpu_specs[].
Thanks for pointing it out.

-Ravi

Re: [PATCH v2 1/2] powerpc: Introduce POWER10_DD1 feature

2020-10-21 Thread Ravi Bangoria





On 10/22/20 10:41 AM, Jordan Niethe wrote:

On Thu, Oct 22, 2020 at 2:40 PM Ravi Bangoria
 wrote:


POWER10_DD1 feature flag will be needed while adding
conditional code that applies only for Power10 DD1.

Signed-off-by: Ravi Bangoria 
---
  arch/powerpc/include/asm/cputable.h | 8 ++--
  arch/powerpc/kernel/dt_cpu_ftrs.c   | 3 +++
  arch/powerpc/kernel/prom.c  | 9 +
  3 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/cputable.h 
b/arch/powerpc/include/asm/cputable.h
index 93bc70d4c9a1..d486f56c0d33 100644
--- a/arch/powerpc/include/asm/cputable.h
+++ b/arch/powerpc/include/asm/cputable.h
@@ -216,6 +216,7 @@ static inline void cpu_feature_keys_init(void) { }
  #define CPU_FTR_P9_RADIX_PREFETCH_BUG  LONG_ASM_CONST(0x0002)
  #define CPU_FTR_ARCH_31
LONG_ASM_CONST(0x0004)
  #define CPU_FTR_DAWR1  LONG_ASM_CONST(0x0008)
+#define CPU_FTR_POWER10_DD1LONG_ASM_CONST(0x0010)

  #ifndef __ASSEMBLY__

@@ -479,6 +480,7 @@ static inline void cpu_feature_keys_init(void) { }
 CPU_FTR_DBELL | CPU_FTR_HAS_PPR | CPU_FTR_ARCH_207S | \
 CPU_FTR_TM_COMP | CPU_FTR_ARCH_300 | CPU_FTR_ARCH_31 | \
 CPU_FTR_DAWR | CPU_FTR_DAWR1)
+#define CPU_FTRS_POWER10_DD1   (CPU_FTRS_POWER10 | CPU_FTR_POWER10_DD1)
  #define CPU_FTRS_CELL  (CPU_FTR_LWSYNC | \
 CPU_FTR_PPCAS_ARCH_V2 | CPU_FTR_CTRL | \
 CPU_FTR_ALTIVEC_COMP | CPU_FTR_MMCRA | CPU_FTR_SMT | \
@@ -497,14 +499,16 @@ static inline void cpu_feature_keys_init(void) { }
  #define CPU_FTRS_POSSIBLE  \
 (CPU_FTRS_POWER7 | CPU_FTRS_POWER8E | CPU_FTRS_POWER8 | \
  CPU_FTR_ALTIVEC_COMP | CPU_FTR_VSX_COMP | CPU_FTRS_POWER9 | \
-CPU_FTRS_POWER9_DD2_1 | CPU_FTRS_POWER9_DD2_2 | CPU_FTRS_POWER10)
+CPU_FTRS_POWER9_DD2_1 | CPU_FTRS_POWER9_DD2_2 | CPU_FTRS_POWER10 | 
\
+CPU_FTRS_POWER10_DD1)
  #else
  #define CPU_FTRS_POSSIBLE  \
 (CPU_FTRS_PPC970 | CPU_FTRS_POWER5 | \
  CPU_FTRS_POWER6 | CPU_FTRS_POWER7 | CPU_FTRS_POWER8E | \
  CPU_FTRS_POWER8 | CPU_FTRS_CELL | CPU_FTRS_PA6T | \
  CPU_FTR_VSX_COMP | CPU_FTR_ALTIVEC_COMP | CPU_FTRS_POWER9 | \
-CPU_FTRS_POWER9_DD2_1 | CPU_FTRS_POWER9_DD2_2 | CPU_FTRS_POWER10)
+CPU_FTRS_POWER9_DD2_1 | CPU_FTRS_POWER9_DD2_2 | CPU_FTRS_POWER10 | 
\
+CPU_FTRS_POWER10_DD1)
  #endif /* CONFIG_CPU_LITTLE_ENDIAN */
  #endif
  #else
diff --git a/arch/powerpc/kernel/dt_cpu_ftrs.c 
b/arch/powerpc/kernel/dt_cpu_ftrs.c
index 1098863e17ee..b2327f2967ff 100644
--- a/arch/powerpc/kernel/dt_cpu_ftrs.c
+++ b/arch/powerpc/kernel/dt_cpu_ftrs.c
@@ -811,6 +811,9 @@ static __init void cpufeatures_cpu_quirks(void)
 }

 update_tlbie_feature_flag(version);
+
+   if ((version & 0x) == 0x00800100)
+   cur_cpu_spec->cpu_features |= CPU_FTR_POWER10_DD1;
  }

  static void __init cpufeatures_setup_finished(void)
diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c
index c1545f22c077..c778c81284f7 100644
--- a/arch/powerpc/kernel/prom.c
+++ b/arch/powerpc/kernel/prom.c
@@ -305,6 +305,14 @@ static void __init check_cpu_feature_properties(unsigned 
long node)
 }
  }

+static void __init fixup_cpu_features(void)
+{
+   unsigned long version = mfspr(SPRN_PVR);
+
+   if ((version & 0x) == 0x00800100)
+   cur_cpu_spec->cpu_features |= CPU_FTR_POWER10_DD1;
+}
+

I am just wondering why this is needed here, but the same thing is not
done for, say, CPU_FTR_POWER9_DD2_1?


When we don't use DT cpu_features (PowerVM / kvm geusts), we call
identify_cpu() twice. First with Real PVR which sets "raw" cpu_spec
as cur_cpu_spec and then 2nd time with Logical PVR (0x0f...) which
(mostly) overwrites the cur_cpu_spec with "architected" mode cpu_spec.
I don't see DD version specific entries for "architected" mode in
cpu_specs[] for any previous processors. So I've introduced this
function to tweak cpu_features.

Though, I don't know why we don't have similar thing for
CPU_FTR_POWER9_DD2_1. I've to check that.


And should we get a /* Power10 DD 1 */ added to cpu_specs[]?


IIUC, we don't need such entry. For PowerVM / kvm guests, we overwrite
cpu_spec, so /* Power10 */ "raw" entry is sufficient. And For baremetal,
we don't use cpu_specs[] at all.




  static int __init early_init_dt_scan_cpus(unsigned long node,
   const char *uname, int depth,
   void *data)
@@ -378,6 +386,7 @@ static int __init early_init_dt_scan_cpus(unsigned long 
node,

 check_cpu_feature_properties(node);
 check_cpu_pa_features(node);
+   fixup_cpu_features();
 }

 identical_pvr_fixup(node);
--
2.25.1

[PATCH v2 2/2] powerpc/watchpoint: Workaround P10 DD1 issue with VSX-32 byte instructions

2020-10-21 Thread Ravi Bangoria

POWER10 DD1 has an issue where it generates watchpoint exceptions when it
shouldn't. The conditions where this occur are:

 - octword op
 - ending address of DAWR range is less than starting address of op
 - those addresses need to be in the same or in two consecutive 512B
   blocks
 - 'op address + 64B' generates an address that has a carry into bit
   52 (crosses 2K boundary)

Handle such spurious exception by considering them as extraneous and
emulating/single-steeping instruction without generating an event.

Signed-off-by: Ravi Bangoria 
[Fixed build warning reported by kernel test robot]
Reported-by: kernel test robot 
---

Dependency: VSX-32 byte emulation support patches
  https://lore.kernel.org/r/20201011050908.72173-1-ravi.bango...@linux.ibm.com

 arch/powerpc/kernel/hw_breakpoint.c | 67 -
 1 file changed, 65 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/hw_breakpoint.c 
b/arch/powerpc/kernel/hw_breakpoint.c
index f4e8f21046f5..9e83dd3d2ec5 100644
--- a/arch/powerpc/kernel/hw_breakpoint.c
+++ b/arch/powerpc/kernel/hw_breakpoint.c
@@ -499,6 +499,11 @@ static bool is_larx_stcx_instr(int type)
return type == LARX || type == STCX;
 }
 
+static bool is_octword_vsx_instr(int type, int size)
+{
+   return ((type == LOAD_VSX || type == STORE_VSX) && size == 32);
+}
+
 /*
  * We've failed in reliably handling the hw-breakpoint. Unregister
  * it and throw a warning message to let the user know about it.
@@ -549,6 +554,58 @@ static bool stepping_handler(struct pt_regs *regs, struct 
perf_event **bp,
return true;
 }
 
+static void handle_p10dd1_spurious_exception(struct arch_hw_breakpoint **info,
+int *hit, unsigned long ea)
+{
+   int i;
+   unsigned long hw_end_addr;
+
+   /*
+* Handle spurious exception only when any bp_per_reg is set.
+* Otherwise this might be created by xmon and not actually a
+* spurious exception.
+*/
+   for (i = 0; i < nr_wp_slots(); i++) {
+   if (!info[i])
+   continue;
+
+   hw_end_addr = ALIGN(info[i]->address + info[i]->len, 
HW_BREAKPOINT_SIZE);
+
+   /*
+* Ending address of DAWR range is less than starting
+* address of op.
+*/
+   if ((hw_end_addr - 1) >= ea)
+   continue;
+
+   /*
+* Those addresses need to be in the same or in two
+* consecutive 512B blocks;
+*/
+   if (((hw_end_addr - 1) >> 10) != (ea >> 10))
+   continue;
+
+   /*
+* 'op address + 64B' generates an address that has a
+* carry into bit 52 (crosses 2K boundary).
+*/
+   if ((ea & 0x800) == ((ea + 64) & 0x800))
+   continue;
+
+   break;
+   }
+
+   if (i == nr_wp_slots())
+   return;
+
+   for (i = 0; i < nr_wp_slots(); i++) {
+   if (info[i]) {
+   hit[i] = 1;
+   info[i]->type |= HW_BRK_TYPE_EXTRANEOUS_IRQ;
+   }
+   }
+}
+
 int hw_breakpoint_handler(struct die_args *args)
 {
bool err = false;
@@ -607,8 +664,14 @@ int hw_breakpoint_handler(struct die_args *args)
goto reset;
 
if (!nr_hit) {
-   rc = NOTIFY_DONE;
-   goto out;
+   if (cpu_has_feature(CPU_FTR_POWER10_DD1) &&
+   !IS_ENABLED(CONFIG_PPC_8xx) &&
+   is_octword_vsx_instr(type, size)) {
+   handle_p10dd1_spurious_exception(info, hit, ea);
+   } else {
+   rc = NOTIFY_DONE;
+   goto out;
+   }
}
 
/*
-- 
2.25.1

[PATCH v2 1/2] powerpc: Introduce POWER10_DD1 feature

2020-10-21 Thread Ravi Bangoria

POWER10_DD1 feature flag will be needed while adding
conditional code that applies only for Power10 DD1.

Signed-off-by: Ravi Bangoria 
---
 arch/powerpc/include/asm/cputable.h | 8 ++--
 arch/powerpc/kernel/dt_cpu_ftrs.c   | 3 +++
 arch/powerpc/kernel/prom.c  | 9 +
 3 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/cputable.h 
b/arch/powerpc/include/asm/cputable.h
index 93bc70d4c9a1..d486f56c0d33 100644
--- a/arch/powerpc/include/asm/cputable.h
+++ b/arch/powerpc/include/asm/cputable.h
@@ -216,6 +216,7 @@ static inline void cpu_feature_keys_init(void) { }
 #define CPU_FTR_P9_RADIX_PREFETCH_BUG  LONG_ASM_CONST(0x0002)
 #define CPU_FTR_ARCH_31
LONG_ASM_CONST(0x0004)
 #define CPU_FTR_DAWR1  LONG_ASM_CONST(0x0008)
+#define CPU_FTR_POWER10_DD1LONG_ASM_CONST(0x0010)
 
 #ifndef __ASSEMBLY__
 
@@ -479,6 +480,7 @@ static inline void cpu_feature_keys_init(void) { }
CPU_FTR_DBELL | CPU_FTR_HAS_PPR | CPU_FTR_ARCH_207S | \
CPU_FTR_TM_COMP | CPU_FTR_ARCH_300 | CPU_FTR_ARCH_31 | \
CPU_FTR_DAWR | CPU_FTR_DAWR1)
+#define CPU_FTRS_POWER10_DD1   (CPU_FTRS_POWER10 | CPU_FTR_POWER10_DD1)
 #define CPU_FTRS_CELL  (CPU_FTR_LWSYNC | \
CPU_FTR_PPCAS_ARCH_V2 | CPU_FTR_CTRL | \
CPU_FTR_ALTIVEC_COMP | CPU_FTR_MMCRA | CPU_FTR_SMT | \
@@ -497,14 +499,16 @@ static inline void cpu_feature_keys_init(void) { }
 #define CPU_FTRS_POSSIBLE  \
(CPU_FTRS_POWER7 | CPU_FTRS_POWER8E | CPU_FTRS_POWER8 | \
 CPU_FTR_ALTIVEC_COMP | CPU_FTR_VSX_COMP | CPU_FTRS_POWER9 | \
-CPU_FTRS_POWER9_DD2_1 | CPU_FTRS_POWER9_DD2_2 | CPU_FTRS_POWER10)
+CPU_FTRS_POWER9_DD2_1 | CPU_FTRS_POWER9_DD2_2 | CPU_FTRS_POWER10 | 
\
+CPU_FTRS_POWER10_DD1)
 #else
 #define CPU_FTRS_POSSIBLE  \
(CPU_FTRS_PPC970 | CPU_FTRS_POWER5 | \
 CPU_FTRS_POWER6 | CPU_FTRS_POWER7 | CPU_FTRS_POWER8E | \
 CPU_FTRS_POWER8 | CPU_FTRS_CELL | CPU_FTRS_PA6T | \
 CPU_FTR_VSX_COMP | CPU_FTR_ALTIVEC_COMP | CPU_FTRS_POWER9 | \
-CPU_FTRS_POWER9_DD2_1 | CPU_FTRS_POWER9_DD2_2 | CPU_FTRS_POWER10)
+CPU_FTRS_POWER9_DD2_1 | CPU_FTRS_POWER9_DD2_2 | CPU_FTRS_POWER10 | 
\
+CPU_FTRS_POWER10_DD1)
 #endif /* CONFIG_CPU_LITTLE_ENDIAN */
 #endif
 #else
diff --git a/arch/powerpc/kernel/dt_cpu_ftrs.c 
b/arch/powerpc/kernel/dt_cpu_ftrs.c
index 1098863e17ee..b2327f2967ff 100644
--- a/arch/powerpc/kernel/dt_cpu_ftrs.c
+++ b/arch/powerpc/kernel/dt_cpu_ftrs.c
@@ -811,6 +811,9 @@ static __init void cpufeatures_cpu_quirks(void)
}
 
update_tlbie_feature_flag(version);
+
+   if ((version & 0x) == 0x00800100)
+   cur_cpu_spec->cpu_features |= CPU_FTR_POWER10_DD1;
 }
 
 static void __init cpufeatures_setup_finished(void)
diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c
index c1545f22c077..c778c81284f7 100644
--- a/arch/powerpc/kernel/prom.c
+++ b/arch/powerpc/kernel/prom.c
@@ -305,6 +305,14 @@ static void __init check_cpu_feature_properties(unsigned 
long node)
}
 }
 
+static void __init fixup_cpu_features(void)
+{
+   unsigned long version = mfspr(SPRN_PVR);
+
+   if ((version & 0x) == 0x00800100)
+   cur_cpu_spec->cpu_features |= CPU_FTR_POWER10_DD1;
+}
+
 static int __init early_init_dt_scan_cpus(unsigned long node,
  const char *uname, int depth,
  void *data)
@@ -378,6 +386,7 @@ static int __init early_init_dt_scan_cpus(unsigned long 
node,
 
check_cpu_feature_properties(node);
check_cpu_pa_features(node);
+   fixup_cpu_features();
}
 
identical_pvr_fixup(node);
-- 
2.25.1

[PATCH 1/2] powerpc: Introduce POWER10_DD1 feature

2020-10-19 Thread Ravi Bangoria

POWER10_DD1 feature flag will be needed while adding
conditional code that applies only for Power10 DD1.

Signed-off-by: Ravi Bangoria 
---
 arch/powerpc/include/asm/cputable.h | 8 ++--
 arch/powerpc/kernel/dt_cpu_ftrs.c   | 3 +++
 arch/powerpc/kernel/prom.c  | 9 +
 3 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/cputable.h 
b/arch/powerpc/include/asm/cputable.h
index 93bc70d4c9a1..d486f56c0d33 100644
--- a/arch/powerpc/include/asm/cputable.h
+++ b/arch/powerpc/include/asm/cputable.h
@@ -216,6 +216,7 @@ static inline void cpu_feature_keys_init(void) { }
 #define CPU_FTR_P9_RADIX_PREFETCH_BUG  LONG_ASM_CONST(0x0002)
 #define CPU_FTR_ARCH_31
LONG_ASM_CONST(0x0004)
 #define CPU_FTR_DAWR1  LONG_ASM_CONST(0x0008)
+#define CPU_FTR_POWER10_DD1LONG_ASM_CONST(0x0010)
 
 #ifndef __ASSEMBLY__
 
@@ -479,6 +480,7 @@ static inline void cpu_feature_keys_init(void) { }
CPU_FTR_DBELL | CPU_FTR_HAS_PPR | CPU_FTR_ARCH_207S | \
CPU_FTR_TM_COMP | CPU_FTR_ARCH_300 | CPU_FTR_ARCH_31 | \
CPU_FTR_DAWR | CPU_FTR_DAWR1)
+#define CPU_FTRS_POWER10_DD1   (CPU_FTRS_POWER10 | CPU_FTR_POWER10_DD1)
 #define CPU_FTRS_CELL  (CPU_FTR_LWSYNC | \
CPU_FTR_PPCAS_ARCH_V2 | CPU_FTR_CTRL | \
CPU_FTR_ALTIVEC_COMP | CPU_FTR_MMCRA | CPU_FTR_SMT | \
@@ -497,14 +499,16 @@ static inline void cpu_feature_keys_init(void) { }
 #define CPU_FTRS_POSSIBLE  \
(CPU_FTRS_POWER7 | CPU_FTRS_POWER8E | CPU_FTRS_POWER8 | \
 CPU_FTR_ALTIVEC_COMP | CPU_FTR_VSX_COMP | CPU_FTRS_POWER9 | \
-CPU_FTRS_POWER9_DD2_1 | CPU_FTRS_POWER9_DD2_2 | CPU_FTRS_POWER10)
+CPU_FTRS_POWER9_DD2_1 | CPU_FTRS_POWER9_DD2_2 | CPU_FTRS_POWER10 | 
\
+CPU_FTRS_POWER10_DD1)
 #else
 #define CPU_FTRS_POSSIBLE  \
(CPU_FTRS_PPC970 | CPU_FTRS_POWER5 | \
 CPU_FTRS_POWER6 | CPU_FTRS_POWER7 | CPU_FTRS_POWER8E | \
 CPU_FTRS_POWER8 | CPU_FTRS_CELL | CPU_FTRS_PA6T | \
 CPU_FTR_VSX_COMP | CPU_FTR_ALTIVEC_COMP | CPU_FTRS_POWER9 | \
-CPU_FTRS_POWER9_DD2_1 | CPU_FTRS_POWER9_DD2_2 | CPU_FTRS_POWER10)
+CPU_FTRS_POWER9_DD2_1 | CPU_FTRS_POWER9_DD2_2 | CPU_FTRS_POWER10 | 
\
+CPU_FTRS_POWER10_DD1)
 #endif /* CONFIG_CPU_LITTLE_ENDIAN */
 #endif
 #else
diff --git a/arch/powerpc/kernel/dt_cpu_ftrs.c 
b/arch/powerpc/kernel/dt_cpu_ftrs.c
index 1098863e17ee..b2327f2967ff 100644
--- a/arch/powerpc/kernel/dt_cpu_ftrs.c
+++ b/arch/powerpc/kernel/dt_cpu_ftrs.c
@@ -811,6 +811,9 @@ static __init void cpufeatures_cpu_quirks(void)
}
 
update_tlbie_feature_flag(version);
+
+   if ((version & 0x) == 0x00800100)
+   cur_cpu_spec->cpu_features |= CPU_FTR_POWER10_DD1;
 }
 
 static void __init cpufeatures_setup_finished(void)
diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c
index c1545f22c077..c778c81284f7 100644
--- a/arch/powerpc/kernel/prom.c
+++ b/arch/powerpc/kernel/prom.c
@@ -305,6 +305,14 @@ static void __init check_cpu_feature_properties(unsigned 
long node)
}
 }
 
+static void __init fixup_cpu_features(void)
+{
+   unsigned long version = mfspr(SPRN_PVR);
+
+   if ((version & 0x) == 0x00800100)
+   cur_cpu_spec->cpu_features |= CPU_FTR_POWER10_DD1;
+}
+
 static int __init early_init_dt_scan_cpus(unsigned long node,
  const char *uname, int depth,
  void *data)
@@ -378,6 +386,7 @@ static int __init early_init_dt_scan_cpus(unsigned long 
node,
 
check_cpu_feature_properties(node);
check_cpu_pa_features(node);
+   fixup_cpu_features();
}
 
identical_pvr_fixup(node);
-- 
2.25.1

[PATCH 2/2] powerpc/watchpoint: Workaround P10 DD1 issue with VSX-32 byte instructions

2020-10-19 Thread Ravi Bangoria

POWER10 DD1 has an issue where it generates watchpoint exceptions when it
shouldn't. The conditions where this occur are:

 - octword op
 - ending address of DAWR range is less than starting address of op
 - those addresses need to be in the same or in two consecutive 512B
   blocks
 - 'op address + 64B' generates an address that has a carry into bit
   52 (crosses 2K boundary)

Handle such spurious exception by considering them as extraneous and
emulating/single-steeping instruction without generating an event.

Signed-off-by: Ravi Bangoria 
---

Dependency: VSX-32 byte emulation support patches
  https://lore.kernel.org/r/20201011050908.72173-1-ravi.bango...@linux.ibm.com

 arch/powerpc/kernel/hw_breakpoint.c | 69 -
 1 file changed, 67 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/hw_breakpoint.c 
b/arch/powerpc/kernel/hw_breakpoint.c
index f4e8f21046f5..4514745d27c3 100644
--- a/arch/powerpc/kernel/hw_breakpoint.c
+++ b/arch/powerpc/kernel/hw_breakpoint.c
@@ -499,6 +499,11 @@ static bool is_larx_stcx_instr(int type)
return type == LARX || type == STCX;
 }
 
+static bool is_octword_vsx_instr(int type, int size)
+{
+   return ((type == LOAD_VSX || type == STORE_VSX) && size == 32);
+}
+
 /*
  * We've failed in reliably handling the hw-breakpoint. Unregister
  * it and throw a warning message to let the user know about it.
@@ -549,6 +554,60 @@ static bool stepping_handler(struct pt_regs *regs, struct 
perf_event **bp,
return true;
 }
 
+static void handle_p10dd1_spurious_exception(struct arch_hw_breakpoint **info,
+int *hit, unsigned long ea)
+{
+   int i;
+   unsigned long hw_start_addr;
+   unsigned long hw_end_addr;
+
+   /*
+* Handle spurious exception only when any bp_per_reg is set.
+* Otherwise this might be created by xmon and not actually a
+* spurious exception.
+*/
+   for (i = 0; i < nr_wp_slots(); i++) {
+   if (!info[i])
+   continue;
+
+   hw_start_addr = ALIGN_DOWN(info[i]->address, 
HW_BREAKPOINT_SIZE);
+   hw_end_addr = ALIGN(info[i]->address + info[i]->len, 
HW_BREAKPOINT_SIZE);
+
+   /*
+* Ending address of DAWR range is less than starting
+* address of op.
+*/
+   if ((hw_end_addr - 1) >= ea)
+   continue;
+
+   /*
+* Those addresses need to be in the same or in two
+* consecutive 512B blocks;
+*/
+   if (((hw_end_addr - 1) >> 10) != (ea >> 10))
+   continue;
+
+   /*
+* 'op address + 64B' generates an address that has a
+* carry into bit 52 (crosses 2K boundary).
+*/
+   if ((ea & 0x800) == ((ea + 64) & 0x800))
+   continue;
+
+   break;
+   }
+
+   if (i == nr_wp_slots())
+   return;
+
+   for (i = 0; i < nr_wp_slots(); i++) {
+   if (info[i]) {
+   hit[i] = 1;
+   info[i]->type |= HW_BRK_TYPE_EXTRANEOUS_IRQ;
+   }
+   }
+}
+
 int hw_breakpoint_handler(struct die_args *args)
 {
bool err = false;
@@ -607,8 +666,14 @@ int hw_breakpoint_handler(struct die_args *args)
goto reset;
 
if (!nr_hit) {
-   rc = NOTIFY_DONE;
-   goto out;
+   if (cpu_has_feature(CPU_FTR_POWER10_DD1) &&
+   !IS_ENABLED(CONFIG_PPC_8xx) &&
+   is_octword_vsx_instr(type, size)) {
+   handle_p10dd1_spurious_exception(info, hit, ea);
+   } else {
+   rc = NOTIFY_DONE;
+   goto out;
+   }
}
 
/*
-- 
2.25.1

Re: [PATCH v5 1/5] powerpc/sstep: Emulate prefixed instructions only when CPU_FTR_ARCH_31 is set

2020-10-14 Thread Ravi Bangoria


Hi Daniel,

On 10/12/20 7:14 PM, Daniel Axtens wrote:

Hi,

To review this, I looked through the supported instructions to see if
there were any that I thought might have been missed.

I didn't find any other v3.1 ones, although I don't have a v3.1 ISA to
hand so I was basically looking for instructions I didn't recognise.

I did, however, find a number of instructions that are new in ISA 3.0
that aren't guarded:

  - addpcis
  - lxvl/stxvl
  - lxvll/stxvll
  - lxvwsx
  - stxvx
  - lxsipzx
  - lxvh8x
  - lxsihzx
  - lxvb16x/stxvb16x
  - stxsibx
  - stxsihx
  - lxvb16x
  - lxsd/stxsd
  - lxssp/stxssp
  - lxv/stxv
  
Also, I don't know how bothered we are about P7, but if I'm reading the

ISA correctly, lqarx/stqcx. are not supported before ISA 2.07. Likewise
a number of the vector instructions like lxsiwzx and lxsiwax (and the
companion stores).

I realise it's not really the point of this particular patch, so I don't
think this should block acceptance. What I would like to know - and
maybe this is something where we need mpe to weigh in - is whether we
need consistent guards for 2.07 and 3.0. We have some 3.0 guards already
but clearly not everywhere.


Yes, those needs to be handled properly. Otherwise they can be harmful
when emulated on a non-supporting platform. Will work on it when I get
some time. Thanks for reporting it.



With all that said - the patch does what it says it does, and looks good
to me:

Reviewed-by: Daniel Axtens 


Thanks!
Ravi

Re: [PATCH v5 1/5] powerpc/sstep: Emulate prefixed instructions only when CPU_FTR_ARCH_31 is set

2020-10-12 Thread Ravi Bangoria


Hi Daniel,

On 10/12/20 7:21 AM, Daniel Axtens wrote:

Hi,

Apologies if this has come up in a previous revision.



case 1:
+   if (!cpu_has_feature(CPU_FTR_ARCH_31))
+   return -1;
+
prefix_r = GET_PREFIX_R(word);
ra = GET_PREFIX_RA(suffix);


The comment above analyse_instr reads in part:

  * Return value is 1 if the instruction can be emulated just by
  * updating *regs with the information in *op, -1 if we need the
  * GPRs but *regs doesn't contain the full register set, or 0
  * otherwise.

I was wondering why returning -1 if the instruction isn't supported the
right thing to do - it seemed to me that it should return 0?

I did look and see that there are other cases where the code returns -1
for an unsupported operation, e.g.:

#ifdef __powerpc64__
case 4:
if (!cpu_has_feature(CPU_FTR_ARCH_300))
return -1;

switch (word & 0x3f) {
case 48:/* maddhd */

That's from commit 930d6288a267 ("powerpc: sstep: Add support for
maddhd, maddhdu, maddld instructions"), but it's not explained there either

I see the same pattern in a number of commits: commit 6324320de609
("powerpc sstep: Add support for modsd, modud instructions"), commit
6c180071509a ("powerpc sstep: Add support for modsw, moduw
instructions"), commit a23987ef267a ("powerpc: sstep: Add support for
darn instruction") and a few others, all of which seem to have come
through Sandipan in February 2019. I haven't spotted any explanation for
why -1 was chosen, but I haven't checked the mailing list archives.

If -1 is OK, would it be possible to update the comment to explain why?


Agreed. As you rightly pointed out, there are many more such cases and, yes,
we are aware of this issue and it's being worked upon as a separate patch.
(Sandipan did send a fix patch internally some time back).

Thanks for pointing it out!
Ravi

[PATCH v5 5/5] powerpc/sstep: Add testcases for VSX vector paired load/store instructions

2020-10-10 Thread Ravi Bangoria

From: Balamuruhan S 

Add testcases for VSX vector paired load/store instructions.
Sample o/p:

  emulate_step_test: lxvp   : PASS
  emulate_step_test: stxvp  : PASS
  emulate_step_test: lxvpx  : PASS
  emulate_step_test: stxvpx : PASS
  emulate_step_test: plxvp  : PASS
  emulate_step_test: pstxvp : PASS

Signed-off-by: Balamuruhan S 
Signed-off-by: Ravi Bangoria 
---
 arch/powerpc/lib/test_emulate_step.c | 270 +++
 1 file changed, 270 insertions(+)

diff --git a/arch/powerpc/lib/test_emulate_step.c 
b/arch/powerpc/lib/test_emulate_step.c
index 0a201b771477..783d1b85ecfe 100644
--- a/arch/powerpc/lib/test_emulate_step.c
+++ b/arch/powerpc/lib/test_emulate_step.c
@@ -612,6 +612,273 @@ static void __init test_lxvd2x_stxvd2x(void)
 }
 #endif /* CONFIG_VSX */
 
+#ifdef CONFIG_VSX
+static void __init test_lxvp_stxvp(void)
+{
+   struct pt_regs regs;
+   union {
+   vector128 a;
+   u32 b[4];
+   } c[2];
+   u32 cached_b[8];
+   int stepped = -1;
+
+   if (!cpu_has_feature(CPU_FTR_ARCH_31)) {
+   show_result("lxvp", "SKIP (!CPU_FTR_ARCH_31)");
+   show_result("stxvp", "SKIP (!CPU_FTR_ARCH_31)");
+   return;
+   }
+
+   init_pt_regs();
+
+   /*** lxvp ***/
+
+   cached_b[0] = c[0].b[0] = 18233;
+   cached_b[1] = c[0].b[1] = 34863571;
+   cached_b[2] = c[0].b[2] = 834;
+   cached_b[3] = c[0].b[3] = 6138911;
+   cached_b[4] = c[1].b[0] = 1234;
+   cached_b[5] = c[1].b[1] = 5678;
+   cached_b[6] = c[1].b[2] = 91011;
+   cached_b[7] = c[1].b[3] = 121314;
+
+   regs.gpr[4] = (unsigned long)[0].a;
+
+   /*
+* lxvp XTp,DQ(RA)
+* XTp = 32xTX + 2xTp
+* let TX=1 Tp=1 RA=4 DQ=0
+*/
+   stepped = emulate_step(, ppc_inst(PPC_RAW_LXVP(34, 4, 0)));
+
+   if (stepped == 1 && cpu_has_feature(CPU_FTR_VSX)) {
+   show_result("lxvp", "PASS");
+   } else {
+   if (!cpu_has_feature(CPU_FTR_VSX))
+   show_result("lxvp", "PASS (!CPU_FTR_VSX)");
+   else
+   show_result("lxvp", "FAIL");
+   }
+
+   /*** stxvp ***/
+
+   c[0].b[0] = 21379463;
+   c[0].b[1] = 87;
+   c[0].b[2] = 374234;
+   c[0].b[3] = 4;
+   c[1].b[0] = 90;
+   c[1].b[1] = 122;
+   c[1].b[2] = 555;
+   c[1].b[3] = 32144;
+
+   /*
+* stxvp XSp,DQ(RA)
+* XSp = 32xSX + 2xSp
+* let SX=1 Sp=1 RA=4 DQ=0
+*/
+   stepped = emulate_step(, ppc_inst(PPC_RAW_STXVP(34, 4, 0)));
+
+   if (stepped == 1 && cached_b[0] == c[0].b[0] && cached_b[1] == 
c[0].b[1] &&
+   cached_b[2] == c[0].b[2] && cached_b[3] == c[0].b[3] &&
+   cached_b[4] == c[1].b[0] && cached_b[5] == c[1].b[1] &&
+   cached_b[6] == c[1].b[2] && cached_b[7] == c[1].b[3] &&
+   cpu_has_feature(CPU_FTR_VSX)) {
+   show_result("stxvp", "PASS");
+   } else {
+   if (!cpu_has_feature(CPU_FTR_VSX))
+   show_result("stxvp", "PASS (!CPU_FTR_VSX)");
+   else
+   show_result("stxvp", "FAIL");
+   }
+}
+#else
+static void __init test_lxvp_stxvp(void)
+{
+   show_result("lxvp", "SKIP (CONFIG_VSX is not set)");
+   show_result("stxvp", "SKIP (CONFIG_VSX is not set)");
+}
+#endif /* CONFIG_VSX */
+
+#ifdef CONFIG_VSX
+static void __init test_lxvpx_stxvpx(void)
+{
+   struct pt_regs regs;
+   union {
+   vector128 a;
+   u32 b[4];
+   } c[2];
+   u32 cached_b[8];
+   int stepped = -1;
+
+   if (!cpu_has_feature(CPU_FTR_ARCH_31)) {
+   show_result("lxvpx", "SKIP (!CPU_FTR_ARCH_31)");
+   show_result("stxvpx", "SKIP (!CPU_FTR_ARCH_31)");
+   return;
+   }
+
+   init_pt_regs();
+
+   /*** lxvpx ***/
+
+   cached_b[0] = c[0].b[0] = 18233;
+   cached_b[1] = c[0].b[1] = 34863571;
+   cached_b[2] = c[0].b[2] = 834;
+   cached_b[3] = c[0].b[3] = 6138911;
+   cached_b[4] = c[1].b[0] = 1234;
+   cached_b[5] = c[1].b[1] = 5678;
+   cached_b[6] = c[1].b[2] = 91011;
+   cached_b[7] = c[1].b[3] = 121314;
+
+   regs.gpr[3] = (unsigned long)[0].a;
+   regs.gpr[4] = 0;
+
+   /*
+* lxvpx XTp,RA,RB
+* XTp = 32xTX + 2xTp
+* let TX=1 Tp=1 RA=3 RB=4
+*/
+   stepped = emulate_step(, ppc_inst(PPC_RAW_LXVPX(34, 3, 4)));
+
+   if (stepped == 1 && cpu_has_feature(CPU_FTR_VSX)) {
+   show_result("lxvpx", "PASS&qu

[PATCH v5 4/5] powerpc/ppc-opcode: Add encoding macros for VSX vector paired instructions

2020-10-10 Thread Ravi Bangoria

From: Balamuruhan S 

Add instruction encodings, DQ, D0, D1 immediate, XTP, XSP operands as
macros for new VSX vector paired instructions,
  * Load VSX Vector Paired (lxvp)
  * Load VSX Vector Paired Indexed (lxvpx)
  * Prefixed Load VSX Vector Paired (plxvp)
  * Store VSX Vector Paired (stxvp)
  * Store VSX Vector Paired Indexed (stxvpx)
  * Prefixed Store VSX Vector Paired (pstxvp)

Suggested-by: Naveen N. Rao 
Signed-off-by: Balamuruhan S 
Signed-off-by: Ravi Bangoria 
---
 arch/powerpc/include/asm/ppc-opcode.h | 13 +
 1 file changed, 13 insertions(+)

diff --git a/arch/powerpc/include/asm/ppc-opcode.h 
b/arch/powerpc/include/asm/ppc-opcode.h
index a6e3700c4566..5e7918ca4fb7 100644
--- a/arch/powerpc/include/asm/ppc-opcode.h
+++ b/arch/powerpc/include/asm/ppc-opcode.h
@@ -78,6 +78,9 @@
 
 #define IMM_L(i)   ((uintptr_t)(i) & 0x)
 #define IMM_DS(i)  ((uintptr_t)(i) & 0xfffc)
+#define IMM_DQ(i)  ((uintptr_t)(i) & 0xfff0)
+#define IMM_D0(i)  (((uintptr_t)(i) >> 16) & 0x3)
+#define IMM_D1(i)  IMM_L(i)
 
 /*
  * 16-bit immediate helper macros: HA() is for use with sign-extending instrs
@@ -295,6 +298,8 @@
 #define __PPC_XB(b)b) & 0x1f) << 11) | (((b) & 0x20) >> 4))
 #define __PPC_XS(s)s) & 0x1f) << 21) | (((s) & 0x20) >> 5))
 #define __PPC_XT(s)__PPC_XS(s)
+#define __PPC_XSP(s)   s) & 0x1e) | (((s) >> 5) & 0x1)) << 21)
+#define __PPC_XTP(s)   __PPC_XSP(s)
 #define __PPC_T_TLB(t) (((t) & 0x3) << 21)
 #define __PPC_WC(w)(((w) & 0x3) << 21)
 #define __PPC_WS(w)(((w) & 0x1f) << 11)
@@ -395,6 +400,14 @@
 #define PPC_RAW_XVCPSGNDP(t, a, b) ((0xf780 | VSX_XX3((t), (a), (b
 #define PPC_RAW_VPERMXOR(vrt, vra, vrb, vrc) \
((0x102d | ___PPC_RT(vrt) | ___PPC_RA(vra) | ___PPC_RB(vrb) | 
(((vrc) & 0x1f) << 6)))
+#define PPC_RAW_LXVP(xtp, a, i)(0x1800 | __PPC_XTP(xtp) | 
___PPC_RA(a) | IMM_DQ(i))
+#define PPC_RAW_STXVP(xsp, a, i)   (0x1801 | __PPC_XSP(xsp) | 
___PPC_RA(a) | IMM_DQ(i))
+#define PPC_RAW_LXVPX(xtp, a, b)   (0x7c00029a | __PPC_XTP(xtp) | 
___PPC_RA(a) | ___PPC_RB(b))
+#define PPC_RAW_STXVPX(xsp, a, b)  (0x7c00039a | __PPC_XSP(xsp) | 
___PPC_RA(a) | ___PPC_RB(b))
+#define PPC_RAW_PLXVP(xtp, i, a, pr) \
+   ((PPC_PREFIX_8LS | __PPC_PRFX_R(pr) | IMM_D0(i)) << 32 | (0xe800 | 
__PPC_XTP(xtp) | ___PPC_RA(a) | IMM_D1(i)))
+#define PPC_RAW_PSTXVP(xsp, i, a, pr) \
+   ((PPC_PREFIX_8LS | __PPC_PRFX_R(pr) | IMM_D0(i)) << 32 | (0xf800 | 
__PPC_XSP(xsp) | ___PPC_RA(a) | IMM_D1(i)))
 #define PPC_RAW_NAP(0x4c000364)
 #define PPC_RAW_SLEEP  (0x4c0003a4)
 #define PPC_RAW_WINKLE (0x4c0003e4)
-- 
2.26.2

[PATCH v5 3/5] powerpc/sstep: Support VSX vector paired storage access instructions

2020-10-10 Thread Ravi Bangoria

From: Balamuruhan S 

VSX Vector Paired instructions loads/stores an octword (32 bytes)
from/to storage into two sequential VSRs. Add emulation support
for these new instructions:
  * Load VSX Vector Paired (lxvp)
  * Load VSX Vector Paired Indexed (lxvpx)
  * Prefixed Load VSX Vector Paired (plxvp)
  * Store VSX Vector Paired (stxvp)
  * Store VSX Vector Paired Indexed (stxvpx)
  * Prefixed Store VSX Vector Paired (pstxvp)

Suggested-by: Naveen N. Rao 
Signed-off-by: Balamuruhan S 
Signed-off-by: Ravi Bangoria 
[kernel test robot reported a build failure]
Reported-by: kernel test robot 
---
 arch/powerpc/lib/sstep.c | 150 +--
 1 file changed, 129 insertions(+), 21 deletions(-)

diff --git a/arch/powerpc/lib/sstep.c b/arch/powerpc/lib/sstep.c
index faf0bbf3efb7..96ca813a65e7 100644
--- a/arch/powerpc/lib/sstep.c
+++ b/arch/powerpc/lib/sstep.c
@@ -32,6 +32,10 @@ extern char system_call_vectored_emulate[];
 #define XER_OV32   0x0008U
 #define XER_CA32   0x0004U
 
+#ifdef CONFIG_VSX
+#define VSX_REGISTER_XTP(rd)   rd) & 1) << 5) | ((rd) & 0xfe))
+#endif
+
 #ifdef CONFIG_PPC_FPU
 /*
  * Functions in ldstfp.S
@@ -279,6 +283,19 @@ static nokprobe_inline void do_byte_reverse(void *ptr, int 
nb)
up[1] = tmp;
break;
}
+   case 32: {
+   unsigned long *up = (unsigned long *)ptr;
+   unsigned long tmp;
+
+   tmp = byterev_8(up[0]);
+   up[0] = byterev_8(up[3]);
+   up[3] = tmp;
+   tmp = byterev_8(up[2]);
+   up[2] = byterev_8(up[1]);
+   up[1] = tmp;
+   break;
+   }
+
 #endif
default:
WARN_ON_ONCE(1);
@@ -709,6 +726,8 @@ void emulate_vsx_load(struct instruction_op *op, union 
vsx_reg *reg,
reg->d[0] = reg->d[1] = 0;
 
switch (op->element_size) {
+   case 32:
+   /* [p]lxvp[x] */
case 16:
/* whole vector; lxv[x] or lxvl[l] */
if (size == 0)
@@ -717,7 +736,7 @@ void emulate_vsx_load(struct instruction_op *op, union 
vsx_reg *reg,
if (IS_LE && (op->vsx_flags & VSX_LDLEFT))
rev = !rev;
if (rev)
-   do_byte_reverse(reg, 16);
+   do_byte_reverse(reg, size);
break;
case 8:
/* scalar loads, lxvd2x, lxvdsx */
@@ -793,6 +812,20 @@ void emulate_vsx_store(struct instruction_op *op, const 
union vsx_reg *reg,
size = GETSIZE(op->type);
 
switch (op->element_size) {
+   case 32:
+   /* [p]stxvp[x] */
+   if (size == 0)
+   break;
+   if (rev) {
+   /* reverse 32 bytes */
+   buf.d[0] = byterev_8(reg->d[3]);
+   buf.d[1] = byterev_8(reg->d[2]);
+   buf.d[2] = byterev_8(reg->d[1]);
+   buf.d[3] = byterev_8(reg->d[0]);
+   reg = 
+   }
+   memcpy(mem, reg, size);
+   break;
case 16:
/* stxv, stxvx, stxvl, stxvll */
if (size == 0)
@@ -861,28 +894,43 @@ static nokprobe_inline int do_vsx_load(struct 
instruction_op *op,
   bool cross_endian)
 {
int reg = op->reg;
-   u8 mem[16];
-   union vsx_reg buf;
+   int i, j, nr_vsx_regs;
+   u8 mem[32];
+   union vsx_reg buf[2];
int size = GETSIZE(op->type);
 
if (!address_ok(regs, ea, size) || copy_mem_in(mem, ea, size, regs))
return -EFAULT;
 
-   emulate_vsx_load(op, , mem, cross_endian);
+   nr_vsx_regs = size / sizeof(__vector128);
+   emulate_vsx_load(op, buf, mem, cross_endian);
preempt_disable();
if (reg < 32) {
/* FP regs + extensions */
if (regs->msr & MSR_FP) {
-   load_vsrn(reg, );
+   for (i = 0; i < nr_vsx_regs; i++) {
+   j = IS_LE ? nr_vsx_regs - i - 1 : i;
+   load_vsrn(reg + i, [j].v);
+   }
} else {
-   current->thread.fp_state.fpr[reg][0] = buf.d[0];
-   current->thread.fp_state.fpr[reg][1] = buf.d[1];
+   for (i = 0; i < nr_vsx_regs; i++) {
+   j = IS_LE ? nr_vsx_regs - i - 1 : i;
+   current->thread.fp_state.fpr[reg + i][0] = 
buf[j].d[0];
+   current->thread.fp_state.fpr[reg + i][1] = 
buf[j].d[1];
+   }
}
} else {
-   if (regs->msr & MSR_VEC)
-   load_vsrn(r

[PATCH v5 2/5] powerpc/sstep: Cover new VSX instructions under CONFIG_VSX

2020-10-10 Thread Ravi Bangoria

Recently added Power10 prefixed VSX instruction are included
unconditionally in the kernel. If they are executed on a
machine without VSX support, it might create issues. Fix that.
Also fix one mnemonics spelling mistake in comment.

Fixes: 50b80a12e4cc ("powerpc sstep: Add support for prefixed load/stores")
Signed-off-by: Ravi Bangoria 
---
 arch/powerpc/lib/sstep.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/lib/sstep.c b/arch/powerpc/lib/sstep.c
index e6242744c71b..faf0bbf3efb7 100644
--- a/arch/powerpc/lib/sstep.c
+++ b/arch/powerpc/lib/sstep.c
@@ -2757,6 +2757,7 @@ int analyse_instr(struct instruction_op *op, const struct 
pt_regs *regs,
case 41:/* plwa */
op->type = MKOP(LOAD, PREFIXED | SIGNEXT, 4);
break;
+#ifdef CONFIG_VSX
case 42:/* plxsd */
op->reg = rd + 32;
op->type = MKOP(LOAD_VSX, PREFIXED, 8);
@@ -2797,13 +2798,14 @@ int analyse_instr(struct instruction_op *op, const 
struct pt_regs *regs,
op->element_size = 16;
op->vsx_flags = VSX_CHECK_VEC;
break;
+#endif /* CONFIG_VSX */
case 56:/* plq */
op->type = MKOP(LOAD, PREFIXED, 16);
break;
case 57:/* pld */
op->type = MKOP(LOAD, PREFIXED, 8);
break;
-   case 60:/* stq */
+   case 60:/* pstq */
op->type = MKOP(STORE, PREFIXED, 16);
break;
case 61:/* pstd */
-- 
2.26.2

[PATCH v5 1/5] powerpc/sstep: Emulate prefixed instructions only when CPU_FTR_ARCH_31 is set

2020-10-10 Thread Ravi Bangoria

From: Balamuruhan S 

Unconditional emulation of prefixed instructions will allow
emulation of them on Power10 predecessors which might cause
issues. Restrict that.

Fixes: 3920742b92f5 ("powerpc sstep: Add support for prefixed fixed-point 
arithmetic")
Fixes: 50b80a12e4cc ("powerpc sstep: Add support for prefixed load/stores")
Signed-off-by: Balamuruhan S 
Signed-off-by: Ravi Bangoria 
---
 arch/powerpc/lib/sstep.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/arch/powerpc/lib/sstep.c b/arch/powerpc/lib/sstep.c
index e9dcaba9a4f8..e6242744c71b 100644
--- a/arch/powerpc/lib/sstep.c
+++ b/arch/powerpc/lib/sstep.c
@@ -1346,6 +1346,9 @@ int analyse_instr(struct instruction_op *op, const struct 
pt_regs *regs,
switch (opcode) {
 #ifdef __powerpc64__
case 1:
+   if (!cpu_has_feature(CPU_FTR_ARCH_31))
+   return -1;
+
prefix_r = GET_PREFIX_R(word);
ra = GET_PREFIX_RA(suffix);
rd = (suffix >> 21) & 0x1f;
@@ -2733,6 +2736,9 @@ int analyse_instr(struct instruction_op *op, const struct 
pt_regs *regs,
}
break;
case 1: /* Prefixed instructions */
+   if (!cpu_has_feature(CPU_FTR_ARCH_31))
+   return -1;
+
prefix_r = GET_PREFIX_R(word);
ra = GET_PREFIX_RA(suffix);
op->update_reg = ra;
-- 
2.26.2

[PATCH v5 0/5] powerpc/sstep: VSX 32-byte vector paired load/store instructions

2020-10-10 Thread Ravi Bangoria

VSX vector paired instructions operates with octword (32-byte)
operand for loads and stores between storage and a pair of two
sequential Vector-Scalar Registers (VSRs). There are 4 word
instructions and 2 prefixed instructions that provides this
32-byte storage access operations - lxvp, lxvpx, stxvp, stxvpx,
plxvp, pstxvp.

Emulation infrastructure doesn't have support for these instructions,
to operate with 32-byte storage access and to operate with 2 VSX
registers. This patch series enables the instruction emulation
support and adds test cases for them respectively.

v4: 
https://lore.kernel.org/r/20201008072726.233086-1-ravi.bango...@linux.ibm.com

Changes in v5:
-
* Fix build breakage reported by Kernel test robote
* Patch #2 is new. CONFIG_VSX check was missing for some VSX
  instructions. Patch #2 adds that check.

Changes in v4:
-
* Patch #1 is (kind of) new.
* Patch #2 now enables both analyse_instr() and emulate_step()
  unlike prev series where both were in separate patches.
* Patch #2 also has important fix for emulation on LE.
* Patch #3 and #4. Added XSP/XTP, D0/D1 instruction operands,
  removed *_EX_OP, __PPC_T[P][X] macros which are incorrect,
  and adhered to PPC_RAW_* convention.
* Added `CPU_FTR_ARCH_31` check in testcases to avoid failing
  in p8/p9.
* Some consmetic changes.
* Rebased to powerpc/next

Changes in v3:
-
Worked on review comments and suggestions from Ravi and Naveen,

* Fix the do_vsx_load() to handle vsx instructions if MSR_FP/MSR_VEC
  cleared in exception conditions and it reaches to read/write to
  thread_struct member fp_state/vr_state respectively.
* Fix wrongly used `__vector128 v[2]` in struct vsx_reg as it should
  hold a single vsx register size.
* Remove unnecessary `VSX_CHECK_VEC` flag set and condition to check
  `VSX_LDLEFT` that is not applicable for these vsx instructions.
* Fix comments in emulate_vsx_load() that were misleading.
* Rebased on latest powerpc next branch.

Changes in v2:
-
* Fix suggestion from Sandipan, wrap ISA 3.1 instructions with
  cpu_has_feature(CPU_FTR_ARCH_31) check.


Balamuruhan S (4):
  powerpc/sstep: Emulate prefixed instructions only when CPU_FTR_ARCH_31
is set
  powerpc/sstep: Support VSX vector paired storage access instructions
  powerpc/ppc-opcode: Add encoding macros for VSX vector paired
instructions
  powerpc/sstep: Add testcases for VSX vector paired load/store
instructions

Ravi Bangoria (1):
  powerpc/sstep: Cover new VSX instructions under CONFIG_VSX

 arch/powerpc/include/asm/ppc-opcode.h |  13 ++
 arch/powerpc/lib/sstep.c  | 160 ---
 arch/powerpc/lib/test_emulate_step.c  | 270 ++
 3 files changed, 421 insertions(+), 22 deletions(-)

-- 
2.26.2

[PATCH v4 4/4] powerpc/sstep: Add testcases for VSX vector paired load/store instructions

2020-10-08 Thread Ravi Bangoria

From: Balamuruhan S 

Add testcases for VSX vector paired load/store instructions.
Sample o/p:

  emulate_step_test: lxvp   : PASS
  emulate_step_test: stxvp  : PASS
  emulate_step_test: lxvpx  : PASS
  emulate_step_test: stxvpx : PASS
  emulate_step_test: plxvp  : PASS
  emulate_step_test: pstxvp : PASS

Signed-off-by: Balamuruhan S 
Signed-off-by: Ravi Bangoria 
---
 arch/powerpc/lib/test_emulate_step.c | 270 +++
 1 file changed, 270 insertions(+)

diff --git a/arch/powerpc/lib/test_emulate_step.c 
b/arch/powerpc/lib/test_emulate_step.c
index 0a201b771477..783d1b85ecfe 100644
--- a/arch/powerpc/lib/test_emulate_step.c
+++ b/arch/powerpc/lib/test_emulate_step.c
@@ -612,6 +612,273 @@ static void __init test_lxvd2x_stxvd2x(void)
 }
 #endif /* CONFIG_VSX */
 
+#ifdef CONFIG_VSX
+static void __init test_lxvp_stxvp(void)
+{
+   struct pt_regs regs;
+   union {
+   vector128 a;
+   u32 b[4];
+   } c[2];
+   u32 cached_b[8];
+   int stepped = -1;
+
+   if (!cpu_has_feature(CPU_FTR_ARCH_31)) {
+   show_result("lxvp", "SKIP (!CPU_FTR_ARCH_31)");
+   show_result("stxvp", "SKIP (!CPU_FTR_ARCH_31)");
+   return;
+   }
+
+   init_pt_regs();
+
+   /*** lxvp ***/
+
+   cached_b[0] = c[0].b[0] = 18233;
+   cached_b[1] = c[0].b[1] = 34863571;
+   cached_b[2] = c[0].b[2] = 834;
+   cached_b[3] = c[0].b[3] = 6138911;
+   cached_b[4] = c[1].b[0] = 1234;
+   cached_b[5] = c[1].b[1] = 5678;
+   cached_b[6] = c[1].b[2] = 91011;
+   cached_b[7] = c[1].b[3] = 121314;
+
+   regs.gpr[4] = (unsigned long)[0].a;
+
+   /*
+* lxvp XTp,DQ(RA)
+* XTp = 32xTX + 2xTp
+* let TX=1 Tp=1 RA=4 DQ=0
+*/
+   stepped = emulate_step(, ppc_inst(PPC_RAW_LXVP(34, 4, 0)));
+
+   if (stepped == 1 && cpu_has_feature(CPU_FTR_VSX)) {
+   show_result("lxvp", "PASS");
+   } else {
+   if (!cpu_has_feature(CPU_FTR_VSX))
+   show_result("lxvp", "PASS (!CPU_FTR_VSX)");
+   else
+   show_result("lxvp", "FAIL");
+   }
+
+   /*** stxvp ***/
+
+   c[0].b[0] = 21379463;
+   c[0].b[1] = 87;
+   c[0].b[2] = 374234;
+   c[0].b[3] = 4;
+   c[1].b[0] = 90;
+   c[1].b[1] = 122;
+   c[1].b[2] = 555;
+   c[1].b[3] = 32144;
+
+   /*
+* stxvp XSp,DQ(RA)
+* XSp = 32xSX + 2xSp
+* let SX=1 Sp=1 RA=4 DQ=0
+*/
+   stepped = emulate_step(, ppc_inst(PPC_RAW_STXVP(34, 4, 0)));
+
+   if (stepped == 1 && cached_b[0] == c[0].b[0] && cached_b[1] == 
c[0].b[1] &&
+   cached_b[2] == c[0].b[2] && cached_b[3] == c[0].b[3] &&
+   cached_b[4] == c[1].b[0] && cached_b[5] == c[1].b[1] &&
+   cached_b[6] == c[1].b[2] && cached_b[7] == c[1].b[3] &&
+   cpu_has_feature(CPU_FTR_VSX)) {
+   show_result("stxvp", "PASS");
+   } else {
+   if (!cpu_has_feature(CPU_FTR_VSX))
+   show_result("stxvp", "PASS (!CPU_FTR_VSX)");
+   else
+   show_result("stxvp", "FAIL");
+   }
+}
+#else
+static void __init test_lxvp_stxvp(void)
+{
+   show_result("lxvp", "SKIP (CONFIG_VSX is not set)");
+   show_result("stxvp", "SKIP (CONFIG_VSX is not set)");
+}
+#endif /* CONFIG_VSX */
+
+#ifdef CONFIG_VSX
+static void __init test_lxvpx_stxvpx(void)
+{
+   struct pt_regs regs;
+   union {
+   vector128 a;
+   u32 b[4];
+   } c[2];
+   u32 cached_b[8];
+   int stepped = -1;
+
+   if (!cpu_has_feature(CPU_FTR_ARCH_31)) {
+   show_result("lxvpx", "SKIP (!CPU_FTR_ARCH_31)");
+   show_result("stxvpx", "SKIP (!CPU_FTR_ARCH_31)");
+   return;
+   }
+
+   init_pt_regs();
+
+   /*** lxvpx ***/
+
+   cached_b[0] = c[0].b[0] = 18233;
+   cached_b[1] = c[0].b[1] = 34863571;
+   cached_b[2] = c[0].b[2] = 834;
+   cached_b[3] = c[0].b[3] = 6138911;
+   cached_b[4] = c[1].b[0] = 1234;
+   cached_b[5] = c[1].b[1] = 5678;
+   cached_b[6] = c[1].b[2] = 91011;
+   cached_b[7] = c[1].b[3] = 121314;
+
+   regs.gpr[3] = (unsigned long)[0].a;
+   regs.gpr[4] = 0;
+
+   /*
+* lxvpx XTp,RA,RB
+* XTp = 32xTX + 2xTp
+* let TX=1 Tp=1 RA=3 RB=4
+*/
+   stepped = emulate_step(, ppc_inst(PPC_RAW_LXVPX(34, 3, 4)));
+
+   if (stepped == 1 && cpu_has_feature(CPU_FTR_VSX)) {
+   show_result("lxvpx", "PASS&qu

[PATCH v4 3/4] powerpc/ppc-opcode: Add encoding macros for VSX vector paired instructions

2020-10-08 Thread Ravi Bangoria

From: Balamuruhan S 

Add instruction encodings, DQ, D0, D1 immediate, XTP, XSP operands as
macros for new VSX vector paired instructions,
  * Load VSX Vector Paired (lxvp)
  * Load VSX Vector Paired Indexed (lxvpx)
  * Prefixed Load VSX Vector Paired (plxvp)
  * Store VSX Vector Paired (stxvp)
  * Store VSX Vector Paired Indexed (stxvpx)
  * Prefixed Store VSX Vector Paired (pstxvp)

Suggested-by: Naveen N. Rao 
Signed-off-by: Balamuruhan S 
Signed-off-by: Ravi Bangoria 
---
 arch/powerpc/include/asm/ppc-opcode.h | 13 +
 1 file changed, 13 insertions(+)

diff --git a/arch/powerpc/include/asm/ppc-opcode.h 
b/arch/powerpc/include/asm/ppc-opcode.h
index a6e3700c4566..5e7918ca4fb7 100644
--- a/arch/powerpc/include/asm/ppc-opcode.h
+++ b/arch/powerpc/include/asm/ppc-opcode.h
@@ -78,6 +78,9 @@
 
 #define IMM_L(i)   ((uintptr_t)(i) & 0x)
 #define IMM_DS(i)  ((uintptr_t)(i) & 0xfffc)
+#define IMM_DQ(i)  ((uintptr_t)(i) & 0xfff0)
+#define IMM_D0(i)  (((uintptr_t)(i) >> 16) & 0x3)
+#define IMM_D1(i)  IMM_L(i)
 
 /*
  * 16-bit immediate helper macros: HA() is for use with sign-extending instrs
@@ -295,6 +298,8 @@
 #define __PPC_XB(b)b) & 0x1f) << 11) | (((b) & 0x20) >> 4))
 #define __PPC_XS(s)s) & 0x1f) << 21) | (((s) & 0x20) >> 5))
 #define __PPC_XT(s)__PPC_XS(s)
+#define __PPC_XSP(s)   s) & 0x1e) | (((s) >> 5) & 0x1)) << 21)
+#define __PPC_XTP(s)   __PPC_XSP(s)
 #define __PPC_T_TLB(t) (((t) & 0x3) << 21)
 #define __PPC_WC(w)(((w) & 0x3) << 21)
 #define __PPC_WS(w)(((w) & 0x1f) << 11)
@@ -395,6 +400,14 @@
 #define PPC_RAW_XVCPSGNDP(t, a, b) ((0xf780 | VSX_XX3((t), (a), (b
 #define PPC_RAW_VPERMXOR(vrt, vra, vrb, vrc) \
((0x102d | ___PPC_RT(vrt) | ___PPC_RA(vra) | ___PPC_RB(vrb) | 
(((vrc) & 0x1f) << 6)))
+#define PPC_RAW_LXVP(xtp, a, i)(0x1800 | __PPC_XTP(xtp) | 
___PPC_RA(a) | IMM_DQ(i))
+#define PPC_RAW_STXVP(xsp, a, i)   (0x1801 | __PPC_XSP(xsp) | 
___PPC_RA(a) | IMM_DQ(i))
+#define PPC_RAW_LXVPX(xtp, a, b)   (0x7c00029a | __PPC_XTP(xtp) | 
___PPC_RA(a) | ___PPC_RB(b))
+#define PPC_RAW_STXVPX(xsp, a, b)  (0x7c00039a | __PPC_XSP(xsp) | 
___PPC_RA(a) | ___PPC_RB(b))
+#define PPC_RAW_PLXVP(xtp, i, a, pr) \
+   ((PPC_PREFIX_8LS | __PPC_PRFX_R(pr) | IMM_D0(i)) << 32 | (0xe800 | 
__PPC_XTP(xtp) | ___PPC_RA(a) | IMM_D1(i)))
+#define PPC_RAW_PSTXVP(xsp, i, a, pr) \
+   ((PPC_PREFIX_8LS | __PPC_PRFX_R(pr) | IMM_D0(i)) << 32 | (0xf800 | 
__PPC_XSP(xsp) | ___PPC_RA(a) | IMM_D1(i)))
 #define PPC_RAW_NAP(0x4c000364)
 #define PPC_RAW_SLEEP  (0x4c0003a4)
 #define PPC_RAW_WINKLE (0x4c0003e4)
-- 
2.26.2

[PATCH v4 2/4] powerpc/sstep: Support VSX vector paired storage access instructions

2020-10-08 Thread Ravi Bangoria

From: Balamuruhan S 

VSX Vector Paired instructions loads/stores an octword (32 bytes)
from/to storage into two sequential VSRs. Add emulation support
for these new instructions:
  * Load VSX Vector Paired (lxvp)
  * Load VSX Vector Paired Indexed (lxvpx)
  * Prefixed Load VSX Vector Paired (plxvp)
  * Store VSX Vector Paired (stxvp)
  * Store VSX Vector Paired Indexed (stxvpx)
  * Prefixed Store VSX Vector Paired (pstxvp)

Suggested-by: Naveen N. Rao 
Signed-off-by: Balamuruhan S 
Signed-off-by: Ravi Bangoria 
---
 arch/powerpc/lib/sstep.c | 146 +--
 1 file changed, 125 insertions(+), 21 deletions(-)

diff --git a/arch/powerpc/lib/sstep.c b/arch/powerpc/lib/sstep.c
index e6242744c71b..e39ee1651636 100644
--- a/arch/powerpc/lib/sstep.c
+++ b/arch/powerpc/lib/sstep.c
@@ -32,6 +32,10 @@ extern char system_call_vectored_emulate[];
 #define XER_OV32   0x0008U
 #define XER_CA32   0x0004U
 
+#ifdef CONFIG_VSX
+#define VSX_REGISTER_XTP(rd)   rd) & 1) << 5) | ((rd) & 0xfe))
+#endif
+
 #ifdef CONFIG_PPC_FPU
 /*
  * Functions in ldstfp.S
@@ -279,6 +283,19 @@ static nokprobe_inline void do_byte_reverse(void *ptr, int 
nb)
up[1] = tmp;
break;
}
+   case 32: {
+   unsigned long *up = (unsigned long *)ptr;
+   unsigned long tmp;
+
+   tmp = byterev_8(up[0]);
+   up[0] = byterev_8(up[3]);
+   up[3] = tmp;
+   tmp = byterev_8(up[2]);
+   up[2] = byterev_8(up[1]);
+   up[1] = tmp;
+   break;
+   }
+
 #endif
default:
WARN_ON_ONCE(1);
@@ -709,6 +726,8 @@ void emulate_vsx_load(struct instruction_op *op, union 
vsx_reg *reg,
reg->d[0] = reg->d[1] = 0;
 
switch (op->element_size) {
+   case 32:
+   /* [p]lxvp[x] */
case 16:
/* whole vector; lxv[x] or lxvl[l] */
if (size == 0)
@@ -717,7 +736,7 @@ void emulate_vsx_load(struct instruction_op *op, union 
vsx_reg *reg,
if (IS_LE && (op->vsx_flags & VSX_LDLEFT))
rev = !rev;
if (rev)
-   do_byte_reverse(reg, 16);
+   do_byte_reverse(reg, size);
break;
case 8:
/* scalar loads, lxvd2x, lxvdsx */
@@ -793,6 +812,20 @@ void emulate_vsx_store(struct instruction_op *op, const 
union vsx_reg *reg,
size = GETSIZE(op->type);
 
switch (op->element_size) {
+   case 32:
+   /* [p]stxvp[x] */
+   if (size == 0)
+   break;
+   if (rev) {
+   /* reverse 32 bytes */
+   buf.d[0] = byterev_8(reg->d[3]);
+   buf.d[1] = byterev_8(reg->d[2]);
+   buf.d[2] = byterev_8(reg->d[1]);
+   buf.d[3] = byterev_8(reg->d[0]);
+   reg = 
+   }
+   memcpy(mem, reg, size);
+   break;
case 16:
/* stxv, stxvx, stxvl, stxvll */
if (size == 0)
@@ -861,28 +894,43 @@ static nokprobe_inline int do_vsx_load(struct 
instruction_op *op,
   bool cross_endian)
 {
int reg = op->reg;
-   u8 mem[16];
-   union vsx_reg buf;
+   int i, j, nr_vsx_regs;
+   u8 mem[32];
+   union vsx_reg buf[2];
int size = GETSIZE(op->type);
 
if (!address_ok(regs, ea, size) || copy_mem_in(mem, ea, size, regs))
return -EFAULT;
 
-   emulate_vsx_load(op, , mem, cross_endian);
+   nr_vsx_regs = size / sizeof(__vector128);
+   emulate_vsx_load(op, buf, mem, cross_endian);
preempt_disable();
if (reg < 32) {
/* FP regs + extensions */
if (regs->msr & MSR_FP) {
-   load_vsrn(reg, );
+   for (i = 0; i < nr_vsx_regs; i++) {
+   j = IS_LE ? nr_vsx_regs - i - 1 : i;
+   load_vsrn(reg + i, [j].v);
+   }
} else {
-   current->thread.fp_state.fpr[reg][0] = buf.d[0];
-   current->thread.fp_state.fpr[reg][1] = buf.d[1];
+   for (i = 0; i < nr_vsx_regs; i++) {
+   j = IS_LE ? nr_vsx_regs - i - 1 : i;
+   current->thread.fp_state.fpr[reg + i][0] = 
buf[j].d[0];
+   current->thread.fp_state.fpr[reg + i][1] = 
buf[j].d[1];
+   }
}
} else {
-   if (regs->msr & MSR_VEC)
-   load_vsrn(reg, );
-   else
-   current->thread.vr_state.vr

[PATCH v4 0/4] powerpc/sstep: VSX 32-byte vector paired load/store instructions

2020-10-08 Thread Ravi Bangoria

VSX vector paired instructions operates with octword (32-byte)
operand for loads and stores between storage and a pair of two
sequential Vector-Scalar Registers (VSRs). There are 4 word
instructions and 2 prefixed instructions that provides this
32-byte storage access operations - lxvp, lxvpx, stxvp, stxvpx,
plxvp, pstxvp.

Emulation infrastructure doesn't have support for these instructions,
to operate with 32-byte storage access and to operate with 2 VSX
registers. This patch series enables the instruction emulation
support and adds test cases for them respectively.

v3: 
https://lore.kernel.org/linuxppc-dev/20200731081637.1837559-1-bal...@linux.ibm.com/
 

Changes in v4:
-
* Patch #1 is (kind of) new.
* Patch #2 now enables both analyse_instr() and emulate_step()
  unlike prev series where both were in separate patches.
* Patch #2 also has important fix for emulation on LE.
* Patch #3 and #4. Added XSP/XTP, D0/D1 instruction operands,
  removed *_EX_OP, __PPC_T[P][X] macros which are incorrect,
  and adhered to PPC_RAW_* convention.
* Added `CPU_FTR_ARCH_31` check in testcases to avoid failing
  in p8/p9.
* Some consmetic changes.
* Rebased to powerpc/next

Changes in v3:
-
Worked on review comments and suggestions from Ravi and Naveen,

* Fix the do_vsx_load() to handle vsx instructions if MSR_FP/MSR_VEC
  cleared in exception conditions and it reaches to read/write to
  thread_struct member fp_state/vr_state respectively.
* Fix wrongly used `__vector128 v[2]` in struct vsx_reg as it should
  hold a single vsx register size.
* Remove unnecessary `VSX_CHECK_VEC` flag set and condition to check
  `VSX_LDLEFT` that is not applicable for these vsx instructions.
* Fix comments in emulate_vsx_load() that were misleading.
* Rebased on latest powerpc next branch.

Changes in v2:
-
* Fix suggestion from Sandipan, wrap ISA 3.1 instructions with
  cpu_has_feature(CPU_FTR_ARCH_31) check.
* Rebase on latest powerpc next branch.


Balamuruhan S (4):
  powerpc/sstep: Emulate prefixed instructions only when CPU_FTR_ARCH_31
is set
  powerpc/sstep: Support VSX vector paired storage access instructions
  powerpc/ppc-opcode: Add encoding macros for VSX vector paired
instructions
  powerpc/sstep: Add testcases for VSX vector paired load/store
instructions

 arch/powerpc/include/asm/ppc-opcode.h |  13 ++
 arch/powerpc/lib/sstep.c  | 152 +--
 arch/powerpc/lib/test_emulate_step.c  | 270 ++
 3 files changed, 414 insertions(+), 21 deletions(-)

-- 
2.26.2

[PATCH v4 1/4] powerpc/sstep: Emulate prefixed instructions only when CPU_FTR_ARCH_31 is set

2020-10-08 Thread Ravi Bangoria

From: Balamuruhan S 

Unconditional emulation of prefixed instructions will allow
emulation of them on Power10 predecessors which might cause
issues. Restrict that.

Signed-off-by: Balamuruhan S 
Signed-off-by: Ravi Bangoria 
---
 arch/powerpc/lib/sstep.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/arch/powerpc/lib/sstep.c b/arch/powerpc/lib/sstep.c
index e9dcaba9a4f8..e6242744c71b 100644
--- a/arch/powerpc/lib/sstep.c
+++ b/arch/powerpc/lib/sstep.c
@@ -1346,6 +1346,9 @@ int analyse_instr(struct instruction_op *op, const struct 
pt_regs *regs,
switch (opcode) {
 #ifdef __powerpc64__
case 1:
+   if (!cpu_has_feature(CPU_FTR_ARCH_31))
+   return -1;
+
prefix_r = GET_PREFIX_R(word);
ra = GET_PREFIX_RA(suffix);
rd = (suffix >> 21) & 0x1f;
@@ -2733,6 +2736,9 @@ int analyse_instr(struct instruction_op *op, const struct 
pt_regs *regs,
}
break;
case 1: /* Prefixed instructions */
+   if (!cpu_has_feature(CPU_FTR_ARCH_31))
+   return -1;
+
prefix_r = GET_PREFIX_R(word);
ra = GET_PREFIX_RA(suffix);
op->update_reg = ra;
-- 
2.26.2

Re: [PATCH v6 0/8] powerpc/watchpoint: Bug fixes plus new feature flag

2020-09-18 Thread Ravi Bangoria





On 9/17/20 6:54 PM, Rogerio Alves wrote:

On 9/2/20 1:29 AM, Ravi Bangoria wrote:

Patch #1 fixes issue for quardword instruction on p10 predecessors.
Patch #2 fixes issue for vector instructions.
Patch #3 fixes a bug about watchpoint not firing when created with
  ptrace PPC_PTRACE_SETHWDEBUG and CONFIG_HAVE_HW_BREAKPOINT=N.
  The fix uses HW_BRK_TYPE_PRIV_ALL for ptrace user which, I
  guess, should be fine because we don't leak any kernel
  addresses and PRIV_ALL will also help to cover scenarios when
  kernel accesses user memory.
Patch #4,#5 fixes infinite exception bug, again the bug happens only
  with CONFIG_HAVE_HW_BREAKPOINT=N.
Patch #6 fixes two places where we are missing to set hw_len.
Patch #7 introduce new feature bit PPC_DEBUG_FEATURE_DATA_BP_ARCH_31
  which will be set when running on ISA 3.1 compliant machine.
Patch #8 finally adds selftest to test scenarios fixed by patch#2,#3
  and also moves MODE_EXACT tests outside of BP_RANGE condition.


[...]



Tested this patch set for:
- SETHWDEBUG when CONFIG_HAVE_HW_BREAKPOINT=N = OK
- Fix exception handling for CONFIG_HAVE_HW_BREAKPOINT=N = OK
- Check for PPC_DEBUG_FEATURE_DATA_BP_ARCH_31 = OK
- Fix quarword instruction handling on p10 predecessors = OK
- Fix handling of vector instructions = OK

Also tested for:
- Set second watchpoint (P10 Mambo) = OK
- Infinity loop on sc instruction = OK


Thanks Rogerio!

Ravi

Re: [PATCH 0/7] powerpc/watchpoint: 2nd DAWR kvm enablement + selftests

2020-09-01 Thread Ravi Bangoria


Hi Paul,

On 9/2/20 8:02 AM, Paul Mackerras wrote:

On Thu, Jul 23, 2020 at 03:50:51PM +0530, Ravi Bangoria wrote:

Patch #1, #2 and #3 enables p10 2nd DAWR feature for Book3S kvm guest. DAWR
is a hypervisor resource and thus H_SET_MODE hcall is used to set/unset it.
A new case H_SET_MODE_RESOURCE_SET_DAWR1 is introduced in H_SET_MODE hcall
for setting/unsetting 2nd DAWR. Also, new capability KVM_CAP_PPC_DAWR1 has
been added to query 2nd DAWR support via kvm ioctl.

This feature also needs to be enabled in Qemu to really use it. I'll reply
link to qemu patches once I post them in qemu-devel mailing list.

Patch #4, #5, #6 and #7 adds selftests to test 2nd DAWR.


If/when you resubmit these patches, please split the KVM patches into
a separate series, since the KVM patches would go via my tree whereas
I expect the selftests/powerpc patches would go through Michael
Ellerman's tree.


Sure. Will split it.

Thanks,
Ravi

Re: [PATCH 2/7] powerpc/watchpoint/kvm: Add infrastructure to support 2nd DAWR

2020-09-01 Thread Ravi Bangoria


Hi Paul,


diff --git a/arch/powerpc/include/asm/hvcall.h 
b/arch/powerpc/include/asm/hvcall.h
index 33793444144c..03f401d7be41 100644
--- a/arch/powerpc/include/asm/hvcall.h
+++ b/arch/powerpc/include/asm/hvcall.h
@@ -538,6 +538,8 @@ struct hv_guest_state {
s64 tb_offset;
u64 dawr0;
u64 dawrx0;
+   u64 dawr1;
+   u64 dawrx1;
u64 ciabr;
u64 hdec_expiry;
u64 purr;


After this struct, there is a macro HV_GUEST_STATE_VERSION, I guess that
also needs to be incremented because I'm adding new members in the struct?

Thanks,
Ravi

Re: [PATCH 1/7] powerpc/watchpoint/kvm: Rename current DAWR macros and variables

2020-09-01 Thread Ravi Bangoria


Hi Paul,

On 9/2/20 7:19 AM, Paul Mackerras wrote:

On Thu, Jul 23, 2020 at 03:50:52PM +0530, Ravi Bangoria wrote:

Power10 is introducing second DAWR. Use real register names (with
suffix 0) from ISA for current macros and variables used by kvm.


Most of this looks fine, but I think we should not change the existing
names in arch/powerpc/include/uapi/asm/kvm.h (and therefore also
Documentation/virt/kvm/api.rst).


Missed that I'm changing uapi. I'll rename only those macros/variables
which are not uapi.

Thanks,
Ravi

Re: [PATCH 2/7] powerpc/watchpoint/kvm: Add infrastructure to support 2nd DAWR

2020-09-01 Thread Ravi Bangoria


Hi Paul,

On 9/2/20 7:31 AM, Paul Mackerras wrote:

On Thu, Jul 23, 2020 at 03:50:53PM +0530, Ravi Bangoria wrote:

kvm code assumes single DAWR everywhere. Add code to support 2nd DAWR.
DAWR is a hypervisor resource and thus H_SET_MODE hcall is used to set/
unset it. Introduce new case H_SET_MODE_RESOURCE_SET_DAWR1 for 2nd DAWR.


Is this the same interface as will be defined in PAPR and available
under PowerVM, or is it a new/different interface for KVM?


Yes, kvm hcall interface for 2nd DAWR is same as PowerVM, as defined in PAPR.




Also, kvm will support 2nd DAWR only if CPU_FTR_DAWR1 is set.


In general QEMU wants to be able to control all aspects of the virtual
machine presented to the guest, meaning that just because a host has a
particular hardware capability does not mean we should automatically
present that capability to the guest.

In this case, QEMU will want a way to control whether the guest sees
the availability of the second DAWR/X registers or not, i.e. whether a
H_SET_MODE to set DAWR[X]1 will succeed or fail.


Patch #3 adds new kvm capability KVM_CAP_PPC_DAWR1 that can be checked
by Qemu. Also, as suggested by David in Qemu patch[1], I'm planning to
add new machine capability in Qemu:

  -machine cap-dawr1=ON/OFF

cap-dawr1 will be default ON when PPC_FEATURE2_ARCH_3_10 is set and OFF
otherwise.

Is this correct approach?

[1]: https://lore.kernel.org/kvm/20200724045613.ga8...@umbus.fritz.box

Thanks,
Ravi

[PATCH v6 8/8] powerpc/watchpoint/selftests: Tests for kernel accessing user memory

2020-09-01 Thread Ravi Bangoria

Introduce tests to cover simple scenarios where user is watching
memory which can be accessed by kernel as well. We also support
_MODE_EXACT with _SETHWDEBUG interface. Move those testcases out-
side of _BP_RANGE condition. This will help to test _MODE_EXACT
scenarios when CONFIG_HAVE_HW_BREAKPOINT is not set, eg:

  $ ./ptrace-hwbreak
  ...
  PTRACE_SET_DEBUGREG, Kernel Access Userspace, len: 8: Ok
  PPC_PTRACE_SETHWDEBUG, MODE_EXACT, WO, len: 1: Ok
  PPC_PTRACE_SETHWDEBUG, MODE_EXACT, RO, len: 1: Ok
  PPC_PTRACE_SETHWDEBUG, MODE_EXACT, RW, len: 1: Ok
  PPC_PTRACE_SETHWDEBUG, MODE_EXACT, Kernel Access Userspace, len: 1: Ok
  success: ptrace-hwbreak

Suggested-by: Pedro Miraglia Franco de Carvalho 
Signed-off-by: Ravi Bangoria 
---
 .../selftests/powerpc/ptrace/ptrace-hwbreak.c | 48 ++-
 1 file changed, 46 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/powerpc/ptrace/ptrace-hwbreak.c 
b/tools/testing/selftests/powerpc/ptrace/ptrace-hwbreak.c
index fc477dfe86a2..2e0d86e0687e 100644
--- a/tools/testing/selftests/powerpc/ptrace/ptrace-hwbreak.c
+++ b/tools/testing/selftests/powerpc/ptrace/ptrace-hwbreak.c
@@ -20,6 +20,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 #include "ptrace.h"
 
 #define SPRN_PVR   0x11F
@@ -44,6 +46,7 @@ struct gstruct {
 };
 static volatile struct gstruct gstruct __attribute__((aligned(512)));
 
+static volatile char cwd[PATH_MAX] __attribute__((aligned(8)));
 
 static void get_dbginfo(pid_t child_pid, struct ppc_debug_info *dbginfo)
 {
@@ -138,6 +141,9 @@ static void test_workload(void)
write_var(len);
}
 
+   /* PTRACE_SET_DEBUGREG, Kernel Access Userspace test */
+   syscall(__NR_getcwd, , PATH_MAX);
+
/* PPC_PTRACE_SETHWDEBUG, MODE_EXACT, WO test */
write_var(1);
 
@@ -150,6 +156,9 @@ static void test_workload(void)
else
read_var(1);
 
+   /* PPC_PTRACE_SETHWDEBUG, MODE_EXACT, Kernel Access Userspace test */
+   syscall(__NR_getcwd, , PATH_MAX);
+
/* PPC_PTRACE_SETHWDEBUG, MODE_RANGE, DW ALIGNED, WO test */
gstruct.a[rand() % A_LEN] = 'a';
 
@@ -293,6 +302,24 @@ static int test_set_debugreg(pid_t child_pid)
return 0;
 }
 
+static int test_set_debugreg_kernel_userspace(pid_t child_pid)
+{
+   unsigned long wp_addr = (unsigned long)cwd;
+   char *name = "PTRACE_SET_DEBUGREG";
+
+   /* PTRACE_SET_DEBUGREG, Kernel Access Userspace test */
+   wp_addr &= ~0x7UL;
+   wp_addr |= (1Ul << DABR_READ_SHIFT);
+   wp_addr |= (1UL << DABR_WRITE_SHIFT);
+   wp_addr |= (1UL << DABR_TRANSLATION_SHIFT);
+   ptrace_set_debugreg(child_pid, wp_addr);
+   ptrace(PTRACE_CONT, child_pid, NULL, 0);
+   check_success(child_pid, name, "Kernel Access Userspace", wp_addr, 8);
+
+   ptrace_set_debugreg(child_pid, 0);
+   return 0;
+}
+
 static void get_ppc_hw_breakpoint(struct ppc_hw_breakpoint *info, int type,
  unsigned long addr, int len)
 {
@@ -338,6 +365,22 @@ static void test_sethwdebug_exact(pid_t child_pid)
ptrace_delhwdebug(child_pid, wh);
 }
 
+static void test_sethwdebug_exact_kernel_userspace(pid_t child_pid)
+{
+   struct ppc_hw_breakpoint info;
+   unsigned long wp_addr = (unsigned long)
+   char *name = "PPC_PTRACE_SETHWDEBUG, MODE_EXACT";
+   int len = 1; /* hardcoded in kernel */
+   int wh;
+
+   /* PPC_PTRACE_SETHWDEBUG, MODE_EXACT, Kernel Access Userspace test */
+   get_ppc_hw_breakpoint(, PPC_BREAKPOINT_TRIGGER_WRITE, wp_addr, 0);
+   wh = ptrace_sethwdebug(child_pid, );
+   ptrace(PTRACE_CONT, child_pid, NULL, 0);
+   check_success(child_pid, name, "Kernel Access Userspace", wp_addr, len);
+   ptrace_delhwdebug(child_pid, wh);
+}
+
 static void test_sethwdebug_range_aligned(pid_t child_pid)
 {
struct ppc_hw_breakpoint info;
@@ -452,9 +495,10 @@ static void
 run_tests(pid_t child_pid, struct ppc_debug_info *dbginfo, bool dawr)
 {
test_set_debugreg(child_pid);
+   test_set_debugreg_kernel_userspace(child_pid);
+   test_sethwdebug_exact(child_pid);
+   test_sethwdebug_exact_kernel_userspace(child_pid);
if (dbginfo->features & PPC_DEBUG_FEATURE_DATA_BP_RANGE) {
-   test_sethwdebug_exact(child_pid);
-
test_sethwdebug_range_aligned(child_pid);
if (dawr || is_8xx) {
test_sethwdebug_range_unaligned(child_pid);
-- 
2.26.2

[PATCH v6 7/8] powerpc/watchpoint/ptrace: Introduce PPC_DEBUG_FEATURE_DATA_BP_ARCH_31

2020-09-01 Thread Ravi Bangoria

PPC_DEBUG_FEATURE_DATA_BP_ARCH_31 can be used to determine whether
we are running on an ISA 3.1 compliant machine. Which is needed to
determine DAR behaviour, 512 byte boundary limit etc. This was
requested by Pedro Miraglia Franco de Carvalho for extending
watchpoint features in gdb. Note that availability of 2nd DAWR is
independent of this flag and should be checked using
ppc_debug_info->num_data_bps.

Signed-off-by: Ravi Bangoria 
---
 Documentation/powerpc/ptrace.rst  | 1 +
 arch/powerpc/include/uapi/asm/ptrace.h| 1 +
 arch/powerpc/kernel/ptrace/ptrace-noadv.c | 2 ++
 3 files changed, 4 insertions(+)

diff --git a/Documentation/powerpc/ptrace.rst b/Documentation/powerpc/ptrace.rst
index 864d4b61..77725d69eb4a 100644
--- a/Documentation/powerpc/ptrace.rst
+++ b/Documentation/powerpc/ptrace.rst
@@ -46,6 +46,7 @@ features will have bits indicating whether there is support 
for::
   #define PPC_DEBUG_FEATURE_DATA_BP_RANGE  0x4
   #define PPC_DEBUG_FEATURE_DATA_BP_MASK   0x8
   #define PPC_DEBUG_FEATURE_DATA_BP_DAWR   0x10
+  #define PPC_DEBUG_FEATURE_DATA_BP_ARCH_310x20
 
 2. PTRACE_SETHWDEBUG
 
diff --git a/arch/powerpc/include/uapi/asm/ptrace.h 
b/arch/powerpc/include/uapi/asm/ptrace.h
index f5f1ccc740fc..7004cfea3f5f 100644
--- a/arch/powerpc/include/uapi/asm/ptrace.h
+++ b/arch/powerpc/include/uapi/asm/ptrace.h
@@ -222,6 +222,7 @@ struct ppc_debug_info {
 #define PPC_DEBUG_FEATURE_DATA_BP_RANGE0x0004
 #define PPC_DEBUG_FEATURE_DATA_BP_MASK 0x0008
 #define PPC_DEBUG_FEATURE_DATA_BP_DAWR 0x0010
+#define PPC_DEBUG_FEATURE_DATA_BP_ARCH_31  0x0020
 
 #ifndef __ASSEMBLY__
 
diff --git a/arch/powerpc/kernel/ptrace/ptrace-noadv.c 
b/arch/powerpc/kernel/ptrace/ptrace-noadv.c
index 48c52426af80..aa36fcad36cd 100644
--- a/arch/powerpc/kernel/ptrace/ptrace-noadv.c
+++ b/arch/powerpc/kernel/ptrace/ptrace-noadv.c
@@ -57,6 +57,8 @@ void ppc_gethwdinfo(struct ppc_debug_info *dbginfo)
} else {
dbginfo->features = 0;
}
+   if (cpu_has_feature(CPU_FTR_ARCH_31))
+   dbginfo->features |= PPC_DEBUG_FEATURE_DATA_BP_ARCH_31;
 }
 
 int ptrace_get_debugreg(struct task_struct *child, unsigned long addr,
-- 
2.26.2

[PATCH v6 6/8] powerpc/watchpoint: Add hw_len wherever missing

2020-09-01 Thread Ravi Bangoria

There are couple of places where we set len but not hw_len. For
ptrace/perf watchpoints, when CONFIG_HAVE_HW_BREAKPOINT=Y, hw_len
will be calculated and set internally while parsing watchpoint.
But when CONFIG_HAVE_HW_BREAKPOINT=N, we need to manually set
'hw_len'. Similarly for xmon as well, hw_len needs to be set
directly.

Fixes: b57aeab811db ("powerpc/watchpoint: Fix length calculation for unaligned 
target")
Signed-off-by: Ravi Bangoria 
---
 arch/powerpc/kernel/ptrace/ptrace-noadv.c | 1 +
 arch/powerpc/xmon/xmon.c  | 1 +
 2 files changed, 2 insertions(+)

diff --git a/arch/powerpc/kernel/ptrace/ptrace-noadv.c 
b/arch/powerpc/kernel/ptrace/ptrace-noadv.c
index c9122ed91340..48c52426af80 100644
--- a/arch/powerpc/kernel/ptrace/ptrace-noadv.c
+++ b/arch/powerpc/kernel/ptrace/ptrace-noadv.c
@@ -219,6 +219,7 @@ long ppc_set_hwdebug(struct task_struct *child, struct 
ppc_hw_breakpoint *bp_inf
brk.address = ALIGN_DOWN(bp_info->addr, HW_BREAKPOINT_SIZE);
brk.type = HW_BRK_TYPE_TRANSLATE | HW_BRK_TYPE_PRIV_ALL;
brk.len = DABR_MAX_LEN;
+   brk.hw_len = DABR_MAX_LEN;
if (bp_info->trigger_type & PPC_BREAKPOINT_TRIGGER_READ)
brk.type |= HW_BRK_TYPE_READ;
if (bp_info->trigger_type & PPC_BREAKPOINT_TRIGGER_WRITE)
diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c
index df7bca00f5ec..55c43a6c9111 100644
--- a/arch/powerpc/xmon/xmon.c
+++ b/arch/powerpc/xmon/xmon.c
@@ -969,6 +969,7 @@ static void insert_cpu_bpts(void)
brk.address = dabr[i].address;
brk.type = (dabr[i].enabled & HW_BRK_TYPE_DABR) | 
HW_BRK_TYPE_PRIV_ALL;
brk.len = 8;
+   brk.hw_len = 8;
__set_breakpoint(i, );
}
}
-- 
2.26.2

[PATCH v6 5/8] powerpc/watchpoint: Fix exception handling for CONFIG_HAVE_HW_BREAKPOINT=N

2020-09-01 Thread Ravi Bangoria

On powerpc, ptrace watchpoint works in one-shot mode. i.e. kernel
disables event every time it fires and user has to re-enable it.
Also, in case of ptrace watchpoint, kernel notifies ptrace user
before executing instruction.

With CONFIG_HAVE_HW_BREAKPOINT=N, kernel is missing to disable
ptrace event and thus it's causing infinite loop of exceptions.
This is especially harmful when user watches on a data which is
also read/written by kernel, eg syscall parameters. In such case,
infinite exceptions happens in kernel mode which causes soft-lockup.

Fixes: 9422de3e953d ("powerpc: Hardware breakpoints rewrite to handle non DABR 
breakpoint registers")
Reported-by: Pedro Miraglia Franco de Carvalho 
Signed-off-by: Ravi Bangoria 
---
 arch/powerpc/include/asm/hw_breakpoint.h  |  3 ++
 arch/powerpc/kernel/process.c | 48 +++
 arch/powerpc/kernel/ptrace/ptrace-noadv.c |  4 +-
 3 files changed, 54 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/hw_breakpoint.h 
b/arch/powerpc/include/asm/hw_breakpoint.h
index 81872c420476..abebfbee5b1c 100644
--- a/arch/powerpc/include/asm/hw_breakpoint.h
+++ b/arch/powerpc/include/asm/hw_breakpoint.h
@@ -18,6 +18,7 @@ struct arch_hw_breakpoint {
u16 type;
u16 len; /* length of the target data symbol */
u16 hw_len; /* length programmed in hw */
+   u8  flags;
 };
 
 /* Note: Don't change the first 6 bits below as they are in the same order
@@ -37,6 +38,8 @@ struct arch_hw_breakpoint {
 #define HW_BRK_TYPE_PRIV_ALL   (HW_BRK_TYPE_USER | HW_BRK_TYPE_KERNEL | \
 HW_BRK_TYPE_HYP)
 
+#define HW_BRK_FLAG_DISABLED   0x1
+
 /* Minimum granularity */
 #ifdef CONFIG_PPC_8xx
 #define HW_BREAKPOINT_SIZE  0x4
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 016bd831908e..160fbbf41d40 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -636,6 +636,44 @@ void do_send_trap(struct pt_regs *regs, unsigned long 
address,
(void __user *)address);
 }
 #else  /* !CONFIG_PPC_ADV_DEBUG_REGS */
+
+static void do_break_handler(struct pt_regs *regs)
+{
+   struct arch_hw_breakpoint null_brk = {0};
+   struct arch_hw_breakpoint *info;
+   struct ppc_inst instr = ppc_inst(0);
+   int type = 0;
+   int size = 0;
+   unsigned long ea;
+   int i;
+
+   /*
+* If underneath hw supports only one watchpoint, we know it
+* caused exception. 8xx also falls into this category.
+*/
+   if (nr_wp_slots() == 1) {
+   __set_breakpoint(0, _brk);
+   current->thread.hw_brk[0] = null_brk;
+   current->thread.hw_brk[0].flags |= HW_BRK_FLAG_DISABLED;
+   return;
+   }
+
+   /* Otherwise findout which DAWR caused exception and disable it. */
+   wp_get_instr_detail(regs, , , , );
+
+   for (i = 0; i < nr_wp_slots(); i++) {
+   info = >thread.hw_brk[i];
+   if (!info->address)
+   continue;
+
+   if (wp_check_constraints(regs, instr, ea, type, size, info)) {
+   __set_breakpoint(i, _brk);
+   current->thread.hw_brk[i] = null_brk;
+   current->thread.hw_brk[i].flags |= HW_BRK_FLAG_DISABLED;
+   }
+   }
+}
+
 void do_break (struct pt_regs *regs, unsigned long address,
unsigned long error_code)
 {
@@ -647,6 +685,16 @@ void do_break (struct pt_regs *regs, unsigned long address,
if (debugger_break_match(regs))
return;
 
+   /*
+* We reach here only when watchpoint exception is generated by ptrace
+* event (or hw is buggy!). Now if CONFIG_HAVE_HW_BREAKPOINT is set,
+* watchpoint is already handled by hw_breakpoint_handler() so we don't
+* have to do anything. But when CONFIG_HAVE_HW_BREAKPOINT is not set,
+* we need to manually handle the watchpoint here.
+*/
+   if (!IS_ENABLED(CONFIG_HAVE_HW_BREAKPOINT))
+   do_break_handler(regs);
+
/* Deliver the signal to userspace */
force_sig_fault(SIGTRAP, TRAP_HWBKPT, (void __user *)address);
 }
diff --git a/arch/powerpc/kernel/ptrace/ptrace-noadv.c 
b/arch/powerpc/kernel/ptrace/ptrace-noadv.c
index 57a0ab822334..c9122ed91340 100644
--- a/arch/powerpc/kernel/ptrace/ptrace-noadv.c
+++ b/arch/powerpc/kernel/ptrace/ptrace-noadv.c
@@ -286,11 +286,13 @@ long ppc_del_hwdebug(struct task_struct *child, long data)
}
return ret;
 #else /* CONFIG_HAVE_HW_BREAKPOINT */
-   if (child->thread.hw_brk[data - 1].address == 0)
+   if (!(child->thread.hw_brk[data - 1].flags & HW_BRK_FLAG_DISABLED) &&
+   child->thread.hw_brk[data - 1].address == 0)
return -ENOENT;
 
child

[PATCH v6 4/8] powerpc/watchpoint: Move DAWR detection logic outside of hw_breakpoint.c

2020-09-01 Thread Ravi Bangoria

Power10 hw has multiple DAWRs but hw doesn't tell which DAWR caused
the exception. So we have a sw logic to detect that in hw_breakpoint.c.
But hw_breakpoint.c gets compiled only with CONFIG_HAVE_HW_BREAKPOINT=Y.
Move DAWR detection logic outside of hw_breakpoint.c so that it can be
reused when CONFIG_HAVE_HW_BREAKPOINT is not set.

Signed-off-by: Ravi Bangoria 
---
 arch/powerpc/include/asm/hw_breakpoint.h  |   8 +
 arch/powerpc/kernel/Makefile  |   3 +-
 arch/powerpc/kernel/hw_breakpoint.c   | 159 +
 .../kernel/hw_breakpoint_constraints.c| 162 ++
 4 files changed, 174 insertions(+), 158 deletions(-)
 create mode 100644 arch/powerpc/kernel/hw_breakpoint_constraints.c

diff --git a/arch/powerpc/include/asm/hw_breakpoint.h 
b/arch/powerpc/include/asm/hw_breakpoint.h
index 9b68eafebf43..81872c420476 100644
--- a/arch/powerpc/include/asm/hw_breakpoint.h
+++ b/arch/powerpc/include/asm/hw_breakpoint.h
@@ -10,6 +10,7 @@
 #define _PPC_BOOK3S_64_HW_BREAKPOINT_H
 
 #include 
+#include 
 
 #ifdef __KERNEL__
 struct arch_hw_breakpoint {
@@ -52,6 +53,13 @@ static inline int nr_wp_slots(void)
return cpu_has_feature(CPU_FTR_DAWR1) ? 2 : 1;
 }
 
+bool wp_check_constraints(struct pt_regs *regs, struct ppc_inst instr,
+ unsigned long ea, int type, int size,
+ struct arch_hw_breakpoint *info);
+
+void wp_get_instr_detail(struct pt_regs *regs, struct ppc_inst *instr,
+int *type, int *size, unsigned long *ea);
+
 #ifdef CONFIG_HAVE_HW_BREAKPOINT
 #include 
 #include 
diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
index cbf41fb4ee89..a5550c2b24c4 100644
--- a/arch/powerpc/kernel/Makefile
+++ b/arch/powerpc/kernel/Makefile
@@ -45,7 +45,8 @@ obj-y := cputable.o syscalls.o \
   signal.o sysfs.o cacheinfo.o time.o \
   prom.o traps.o setup-common.o \
   udbg.o misc.o io.o misc_$(BITS).o \
-  of_platform.o prom_parse.o firmware.o
+  of_platform.o prom_parse.o firmware.o \
+  hw_breakpoint_constraints.o
 obj-y  += ptrace/
 obj-$(CONFIG_PPC64)+= setup_64.o \
   paca.o nvram_64.o note.o syscall_64.o
diff --git a/arch/powerpc/kernel/hw_breakpoint.c 
b/arch/powerpc/kernel/hw_breakpoint.c
index f6b24838ca3c..f4e8f21046f5 100644
--- a/arch/powerpc/kernel/hw_breakpoint.c
+++ b/arch/powerpc/kernel/hw_breakpoint.c
@@ -494,161 +494,6 @@ void thread_change_pc(struct task_struct *tsk, struct 
pt_regs *regs)
}
 }
 
-static bool dar_in_user_range(unsigned long dar, struct arch_hw_breakpoint 
*info)
-{
-   return ((info->address <= dar) && (dar - info->address < info->len));
-}
-
-static bool ea_user_range_overlaps(unsigned long ea, int size,
-  struct arch_hw_breakpoint *info)
-{
-   return ((ea < info->address + info->len) &&
-   (ea + size > info->address));
-}
-
-static bool dar_in_hw_range(unsigned long dar, struct arch_hw_breakpoint *info)
-{
-   unsigned long hw_start_addr, hw_end_addr;
-
-   hw_start_addr = ALIGN_DOWN(info->address, HW_BREAKPOINT_SIZE);
-   hw_end_addr = ALIGN(info->address + info->len, HW_BREAKPOINT_SIZE);
-
-   return ((hw_start_addr <= dar) && (hw_end_addr > dar));
-}
-
-static bool ea_hw_range_overlaps(unsigned long ea, int size,
-struct arch_hw_breakpoint *info)
-{
-   unsigned long hw_start_addr, hw_end_addr;
-   unsigned long align_size = HW_BREAKPOINT_SIZE;
-
-   /*
-* On p10 predecessors, quadword is handle differently then
-* other instructions.
-*/
-   if (!cpu_has_feature(CPU_FTR_ARCH_31) && size == 16)
-   align_size = HW_BREAKPOINT_SIZE_QUADWORD;
-
-   hw_start_addr = ALIGN_DOWN(info->address, align_size);
-   hw_end_addr = ALIGN(info->address + info->len, align_size);
-
-   return ((ea < hw_end_addr) && (ea + size > hw_start_addr));
-}
-
-/*
- * If hw has multiple DAWR registers, we also need to check all
- * dawrx constraint bits to confirm this is _really_ a valid event.
- * If type is UNKNOWN, but privilege level matches, consider it as
- * a positive match.
- */
-static bool check_dawrx_constraints(struct pt_regs *regs, int type,
-   struct arch_hw_breakpoint *info)
-{
-   if (OP_IS_LOAD(type) && !(info->type & HW_BRK_TYPE_READ))
-   return false;
-
-   /*
-* The Cache Management instructions other than dcbz never
-* cause a match. i.e. if type is CACHEOP, the instruction
-* is dcbz, and dcbz is tr

[PATCH v6 3/8] powerpc/watchpoint/ptrace: Fix SETHWDEBUG when CONFIG_HAVE_HW_BREAKPOINT=N

2020-09-01 Thread Ravi Bangoria

When kernel is compiled with CONFIG_HAVE_HW_BREAKPOINT=N, user can
still create watchpoint using PPC_PTRACE_SETHWDEBUG, with limited
functionalities. But, such watchpoints are never firing because of
the missing privilege settings. Fix that.

It's safe to set HW_BRK_TYPE_PRIV_ALL because we don't really leak
any kernel address in signal info. Setting HW_BRK_TYPE_PRIV_ALL will
also help to find scenarios when kernel accesses user memory.

Reported-by: Pedro Miraglia Franco de Carvalho 
Suggested-by: Pedro Miraglia Franco de Carvalho 
Signed-off-by: Ravi Bangoria 
---
 arch/powerpc/kernel/ptrace/ptrace-noadv.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/ptrace/ptrace-noadv.c 
b/arch/powerpc/kernel/ptrace/ptrace-noadv.c
index 697c7e4b5877..57a0ab822334 100644
--- a/arch/powerpc/kernel/ptrace/ptrace-noadv.c
+++ b/arch/powerpc/kernel/ptrace/ptrace-noadv.c
@@ -217,7 +217,7 @@ long ppc_set_hwdebug(struct task_struct *child, struct 
ppc_hw_breakpoint *bp_inf
return -EIO;
 
brk.address = ALIGN_DOWN(bp_info->addr, HW_BREAKPOINT_SIZE);
-   brk.type = HW_BRK_TYPE_TRANSLATE;
+   brk.type = HW_BRK_TYPE_TRANSLATE | HW_BRK_TYPE_PRIV_ALL;
brk.len = DABR_MAX_LEN;
if (bp_info->trigger_type & PPC_BREAKPOINT_TRIGGER_READ)
brk.type |= HW_BRK_TYPE_READ;
-- 
2.26.2

[PATCH v6 2/8] powerpc/watchpoint: Fix handling of vector instructions

2020-09-01 Thread Ravi Bangoria

Vector load/store instructions are special because they are always
aligned. Thus unaligned EA needs to be aligned down before comparing
it with watch ranges. Otherwise we might consider valid event as
invalid.

Fixes: 74c6881019b7 ("powerpc/watchpoint: Prepare handler to handle more than 
one watchpoint")
Signed-off-by: Ravi Bangoria 
---
 arch/powerpc/kernel/hw_breakpoint.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/powerpc/kernel/hw_breakpoint.c 
b/arch/powerpc/kernel/hw_breakpoint.c
index 9f7df1c37233..f6b24838ca3c 100644
--- a/arch/powerpc/kernel/hw_breakpoint.c
+++ b/arch/powerpc/kernel/hw_breakpoint.c
@@ -644,6 +644,8 @@ static void get_instr_detail(struct pt_regs *regs, struct 
ppc_inst *instr,
if (*type == CACHEOP) {
*size = cache_op_size();
*ea &= ~(*size - 1);
+   } else if (*type == LOAD_VMX || *type == STORE_VMX) {
+   *ea &= ~(*size - 1);
}
 }
 
-- 
2.26.2

[PATCH v6 1/8] powerpc/watchpoint: Fix quarword instruction handling on p10 predecessors

2020-09-01 Thread Ravi Bangoria

On p10 predecessors, watchpoint with quarword access is compared at
quardword length. If the watch range is doubleword or less than that
in a first half of quarword aligned 16 bytes, and if there is any
unaligned quadword access which will access only the 2nd half, the
handler should consider it as extraneous and emulate/single-step it
before continuing.

Reported-by: Pedro Miraglia Franco de Carvalho 
Fixes: 74c6881019b7 ("powerpc/watchpoint: Prepare handler to handle more than 
one watchpoint")
Signed-off-by: Ravi Bangoria 
---
 arch/powerpc/include/asm/hw_breakpoint.h |  1 +
 arch/powerpc/kernel/hw_breakpoint.c  | 12 ++--
 2 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/hw_breakpoint.h 
b/arch/powerpc/include/asm/hw_breakpoint.h
index db206a7f38e2..9b68eafebf43 100644
--- a/arch/powerpc/include/asm/hw_breakpoint.h
+++ b/arch/powerpc/include/asm/hw_breakpoint.h
@@ -42,6 +42,7 @@ struct arch_hw_breakpoint {
 #else
 #define HW_BREAKPOINT_SIZE  0x8
 #endif
+#define HW_BREAKPOINT_SIZE_QUADWORD0x10
 
 #define DABR_MAX_LEN   8
 #define DAWR_MAX_LEN   512
diff --git a/arch/powerpc/kernel/hw_breakpoint.c 
b/arch/powerpc/kernel/hw_breakpoint.c
index 1f4a1efa0074..9f7df1c37233 100644
--- a/arch/powerpc/kernel/hw_breakpoint.c
+++ b/arch/powerpc/kernel/hw_breakpoint.c
@@ -520,9 +520,17 @@ static bool ea_hw_range_overlaps(unsigned long ea, int 
size,
 struct arch_hw_breakpoint *info)
 {
unsigned long hw_start_addr, hw_end_addr;
+   unsigned long align_size = HW_BREAKPOINT_SIZE;
 
-   hw_start_addr = ALIGN_DOWN(info->address, HW_BREAKPOINT_SIZE);
-   hw_end_addr = ALIGN(info->address + info->len, HW_BREAKPOINT_SIZE);
+   /*
+* On p10 predecessors, quadword is handle differently then
+* other instructions.
+*/
+   if (!cpu_has_feature(CPU_FTR_ARCH_31) && size == 16)
+   align_size = HW_BREAKPOINT_SIZE_QUADWORD;
+
+   hw_start_addr = ALIGN_DOWN(info->address, align_size);
+   hw_end_addr = ALIGN(info->address + info->len, align_size);
 
return ((ea < hw_end_addr) && (ea + size > hw_start_addr));
 }
-- 
2.26.2

[PATCH v6 0/8] powerpc/watchpoint: Bug fixes plus new feature flag

2020-09-01 Thread Ravi Bangoria

Patch #1 fixes issue for quardword instruction on p10 predecessors.
Patch #2 fixes issue for vector instructions.
Patch #3 fixes a bug about watchpoint not firing when created with
 ptrace PPC_PTRACE_SETHWDEBUG and CONFIG_HAVE_HW_BREAKPOINT=N.
 The fix uses HW_BRK_TYPE_PRIV_ALL for ptrace user which, I
 guess, should be fine because we don't leak any kernel
 addresses and PRIV_ALL will also help to cover scenarios when
 kernel accesses user memory.
Patch #4,#5 fixes infinite exception bug, again the bug happens only
 with CONFIG_HAVE_HW_BREAKPOINT=N.
Patch #6 fixes two places where we are missing to set hw_len.
Patch #7 introduce new feature bit PPC_DEBUG_FEATURE_DATA_BP_ARCH_31
 which will be set when running on ISA 3.1 compliant machine.
Patch #8 finally adds selftest to test scenarios fixed by patch#2,#3
 and also moves MODE_EXACT tests outside of BP_RANGE condition.

Christophe, let me know if this series breaks something for 8xx.

v5: 
https://lore.kernel.org/r/20200825043617.1073634-1-ravi.bango...@linux.ibm.com

v5->v6:
 - Fix build faulure reported by kernel test robot
 - patch #5. Use more compact if condition, suggested by Christophe


Ravi Bangoria (8):
  powerpc/watchpoint: Fix quarword instruction handling on p10
predecessors
  powerpc/watchpoint: Fix handling of vector instructions
  powerpc/watchpoint/ptrace: Fix SETHWDEBUG when
CONFIG_HAVE_HW_BREAKPOINT=N
  powerpc/watchpoint: Move DAWR detection logic outside of
hw_breakpoint.c
  powerpc/watchpoint: Fix exception handling for
CONFIG_HAVE_HW_BREAKPOINT=N
  powerpc/watchpoint: Add hw_len wherever missing
  powerpc/watchpoint/ptrace: Introduce PPC_DEBUG_FEATURE_DATA_BP_ARCH_31
  powerpc/watchpoint/selftests: Tests for kernel accessing user memory

 Documentation/powerpc/ptrace.rst  |   1 +
 arch/powerpc/include/asm/hw_breakpoint.h  |  12 ++
 arch/powerpc/include/uapi/asm/ptrace.h|   1 +
 arch/powerpc/kernel/Makefile  |   3 +-
 arch/powerpc/kernel/hw_breakpoint.c   | 149 +---
 .../kernel/hw_breakpoint_constraints.c| 162 ++
 arch/powerpc/kernel/process.c |  48 ++
 arch/powerpc/kernel/ptrace/ptrace-noadv.c |   9 +-
 arch/powerpc/xmon/xmon.c  |   1 +
 .../selftests/powerpc/ptrace/ptrace-hwbreak.c |  48 +-
 10 files changed, 282 insertions(+), 152 deletions(-)
 create mode 100644 arch/powerpc/kernel/hw_breakpoint_constraints.c

-- 
2.26.2

Re: [PATCH v5 4/8] powerpc/watchpoint: Move DAWR detection logic outside of hw_breakpoint.c

2020-08-25 Thread Ravi Bangoria


Hi Christophe,


+static int cache_op_size(void)
+{
+#ifdef __powerpc64__
+    return ppc64_caches.l1d.block_size;
+#else
+    return L1_CACHE_BYTES;
+#endif
+}


You've got l1_dcache_bytes() in arch/powerpc/include/asm/cache.h to do that.


+
+void wp_get_instr_detail(struct pt_regs *regs, struct ppc_inst *instr,
+ int *type, int *size, unsigned long *ea)
+{
+    struct instruction_op op;
+
+    if (__get_user_instr_inatomic(*instr, (void __user *)regs->nip))
+    return;
+
+    analyse_instr(, regs, *instr);
+    *type = GETTYPE(op.type);
+    *ea = op.ea;
+#ifdef __powerpc64__
+    if (!(regs->msr & MSR_64BIT))
+    *ea &= 0xUL;
+#endif


This #ifdef is unneeded, it should build fine on a 32 bits too.


This patch is just a code movement from one file to another.
I don't really change the logic. Would you mind if I do a
separate patch for these changes (not a part of this series)?

Thanks for review,
Ravi

1 2 3 4 5 6 >

1 - 100 of 558 matches

Mail list logo