Re: [PATCH v2] perf: Rewrite core context handling
On 14-Oct-22 3:26 PM, Ravi Bangoria wrote:
> On 13-Oct-22 4:29 PM, Peter Zijlstra wrote:
>> On Thu, Oct 13, 2022 at 03:37:23PM +0530, Ravi Bangoria wrote:
>>
>>>> -	refcount_t			refcount;
>>>> +	refcount_t			refcount; /* event <-> ctx */
>>>
>>> Ok. We need to remove all those // XXX get/put_ctx() from code
>>> which we added to make refcount a pmu_ctx <-> ctx.
>>
>> Them already gone :-) I've not yet fixed up the typoes, but current
>> version should be here:
>>
>>   https://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git/log/?h=perf/core
>>
>> Thanks!
>
> I've been running perf-fuzzer on Xeon machine since yesterday and I don't
> see any issue. Will do the same on my AMD machine as well over the weekend.

Only one WARN_ON() hit. Otherwise all good.
https://lore.kernel.org/lkml/8d91528b-e830-6ad0-8a92-621ce9f94...@amd.com

Thanks,
Ravi
Re: [PATCH v2] perf: Rewrite core context handling
On Fri, Oct 14, 2022 at 03:26:07PM +0530, Ravi Bangoria wrote:
> On 13-Oct-22 4:29 PM, Peter Zijlstra wrote:
> > On Thu, Oct 13, 2022 at 03:37:23PM +0530, Ravi Bangoria wrote:
> >
> >>> -	refcount_t			refcount;
> >>> +	refcount_t			refcount; /* event <-> ctx */
> >>
> >> Ok. We need to remove all those // XXX get/put_ctx() from code
> >> which we added to make refcount a pmu_ctx <-> ctx.
> >
> > Them already gone :-) I've not yet fixed up the typoes, but current
> > version should be here:
> >
> >   https://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git/log/?h=perf/core
> >
> > Thanks!
>
> I've been running perf-fuzzer on Xeon machine since yesterday and I don't
> see any issue. Will do the same on my AMD machine as well over the weekend.

Awesome -- I've started fuzzing on the ADL (with the big.LITTLE PMU setup)
and I've had it run on my very aged IVB-EP machine. Both so far (knock on
wood) with no issues.

The most modern AMD machine I have at hand is a 2 socket Interlagos, and I
doubt anybody really much cares about that these days -- but I can run it
for giggles.
Re: [PATCH v2] perf: Rewrite core context handling
On 13-Oct-22 4:29 PM, Peter Zijlstra wrote:
> On Thu, Oct 13, 2022 at 03:37:23PM +0530, Ravi Bangoria wrote:
>
>>> -	refcount_t			refcount;
>>> +	refcount_t			refcount; /* event <-> ctx */
>>
>> Ok. We need to remove all those // XXX get/put_ctx() from code
>> which we added to make refcount a pmu_ctx <-> ctx.
>
> Them already gone :-) I've not yet fixed up the typoes, but current
> version should be here:
>
>   https://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git/log/?h=perf/core
>
> Thanks!

I've been running perf-fuzzer on Xeon machine since yesterday and I don't
see any issue. Will do the same on my AMD machine as well over the weekend.

Thanks,
Ravi
Re: [PATCH v2] perf: Rewrite core context handling
On Thu, Oct 13, 2022 at 03:37:23PM +0530, Ravi Bangoria wrote:
> > -	refcount_t			refcount;
> > +	refcount_t			refcount; /* event <-> ctx */
>
> Ok. We need to remove all those // XXX get/put_ctx() from code
> which we added to make refcount a pmu_ctx <-> ctx.

Them already gone :-) I've not yet fixed up the typoes, but current
version should be here:

  https://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git/log/?h=perf/core

Thanks!
Re: [PATCH v2] perf: Rewrite core context handling
On 13-Oct-22 2:17 AM, Peter Zijlstra wrote:
> On Wed, Oct 12, 2022 at 02:16:29PM +0200, Peter Zijlstra wrote:
>
>> That's the intent yeah. But due to not always holding ctx->mutex over
>> put_pmu_ctx() this might be moot. I'm almost through auditing epc usage
>> and I think ctx->lock is sufficient, fingers crossed.
>
> So the very last epc usage threw a spanner into the works and made
> things complicated.
>
> Specifically sys_perf_event_open()'s group_leader case uses
> event->pmu_ctx while only holding ctx->mutex. Therefore we can't fully
> let go of ctx->mutex locking and purely rely on ctx->lock.
>
> Now the good news is that the annoying put_pmu_ctx() without holding
> ctx->mutex case doesn't actually matter here. Since we hold a reference
> on the group_leader (per the filedesc) the event can't go away,
> therefore it must have a pmu_ctx, and then holding ctx->mutex ensures
> the pmu_ctx is stable -- iow it serializes against
> sys_perf_event_open()'s move_group and perf_pmu_migrate_context()
> changing the epc around.
>
> So we're going with the normal mutex+lock for modification rule, but
> allow the weird put_pmu_ctx() exception.
>
> I have the below delta.
>
> I'm hoping we can call this done -- I'm going to see if I can bribe Mark
> to take a look at the arm64 thing soon and then hopefully queue the
> whole thing once -rc1 happens. That should give us a good long soak
> until the next merge window.

Sounds good. Thanks for all the help! I've glanced through the changes and
they look fine; below are a few minor points.

> + * Specificially, sys_perf_event_open()'s group_leader case depends on
> + * ctx->mutex pinning the configuration. Since we hold a reference on
> + * group_leader (through the filedesc) it can't fo away, therefore it's

typo: can't go away

> -	refcount_t			refcount;
> +	refcount_t			refcount; /* event <-> ctx */

Ok. We need to remove all those // XXX get/put_ctx() from code
which we added to make refcount a pmu_ctx <-> ctx.
> +#define double_list_for_each_entry(pos1, pos2, head1, head2, member)	\
> +	for (pos1 = list_first_entry(head1, typeof(*pos1), member),	\
> +	     pos2 = list_first_entry(head2, typeof(*pos2), member);	\
> +	     !list_entry_is_head(pos1, head1, member) &&		\
> +	     !list_entry_is_head(pos2, head2, member);			\
> +	     pos1 = list_next_entry(pos1, member),			\
> +	     pos2 = list_next_entry(pos2, member))
> +
>  static void perf_event_swap_task_ctx_data(struct perf_event_context *prev_ctx,
>  					  struct perf_event_context *next_ctx)

While this is unrelated to this patch, shouldn't we also swap
event->hw.target? A purely hypothetical scenario: consider two processes
having cloned contexts (for example, two children of the same parent).
While these two switch between each other, the perf event contexts would
get swapped but event->hw.target will point to the other sibling's
task_struct. If either process exits just after a single context swap,
_free_event() will call put_task_context() on the sibling process'
task_struct.

> @@ -12436,6 +12463,9 @@ SYSCALL_DEFINE5(perf_event_open,
>  	 * Allow the addition of software events to hw
>  	 * groups, this is safe because software events
>  	 * never fail to schedule.
> +	 *
> +	 * Note the comment that goes with struct
> +	 * pmu_event_pmu_context.

typo: perf_event_pmu_context

The good (or bad? ;)) news is, perf test and Vince's perf_event_tests are
running fine without any regression on my machine.

Thanks,
Ravi
Re: [PATCH v2] perf: Rewrite core context handling
On Wed, Oct 12, 2022 at 02:16:29PM +0200, Peter Zijlstra wrote:

> That's the intent yeah. But due to not always holding ctx->mutex over
> put_pmu_ctx() this might be moot. I'm almost through auditing epc usage
> and I think ctx->lock is sufficient, fingers crossed.

So the very last epc usage threw a spanner into the works and made
things complicated.

Specifically sys_perf_event_open()'s group_leader case uses
event->pmu_ctx while only holding ctx->mutex. Therefore we can't fully
let go of ctx->mutex locking and purely rely on ctx->lock.

Now the good news is that the annoying put_pmu_ctx() without holding
ctx->mutex case doesn't actually matter here. Since we hold a reference
on the group_leader (per the filedesc) the event can't go away,
therefore it must have a pmu_ctx, and then holding ctx->mutex ensures
the pmu_ctx is stable -- iow it serializes against
sys_perf_event_open()'s move_group and perf_pmu_migrate_context()
changing the epc around.

So we're going with the normal mutex+lock for modification rule, but
allow the weird put_pmu_ctx() exception.

I have the below delta.

I'm hoping we can call this done -- I'm going to see if I can bribe Mark
to take a look at the arm64 thing soon and then hopefully queue the
whole thing once -rc1 happens. That should give us a good long soak
until the next merge window.

---
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -826,21 +826,28 @@ struct perf_event {
 };
 
 /*
- *           ,[1:n]-.
+ *           ,---[1:n]--.
  *           V         V
  * perf_event_context <-[1:n]-> perf_event_pmu_context <--- perf_event
  *           ^         ^    |                               |
  *           `[1:n]-'  `-[n:1]-> pmu <-[1:n]-'
  *
  *
- * XXX destroy epc when empty
- *     refcount, !rcu
- *
- * XXX epc locking
- *
- * event->pmu_ctx        ctx->mutex && inactive
- * ctx->pmu_ctx_list     ctx->mutex && ctx->lock
- *
+ * struct perf_event_pmu_context lifetime is refcount based and RCU freed
+ * (similar to perf_event_context). Locking is as if it were a member of
+ * perf_event_context; specifically:
+ *
+ * modification, both:  ctx->mutex && ctx->lock
+ * reading, either:     ctx->mutex || ctx->lock
+ *
+ * There is one exception to this; namely put_pmu_ctx() isn't always called
+ * with ctx->mutex held; this means that as long as we can guarantee the epc
+ * has events the above rules hold.
+ *
+ * Specificially, sys_perf_event_open()'s group_leader case depends on
+ * ctx->mutex pinning the configuration. Since we hold a reference on
+ * group_leader (through the filedesc) it can't fo away, therefore it's
+ * associated pmu_ctx must exist and cannot change due to ctx->mutex.
  */
 struct perf_event_pmu_context {
 	struct pmu			*pmu;
@@ -857,6 +864,7 @@ struct perf_event_pmu_context {
 	unsigned int			nr_events;
 
 	atomic_t			refcount; /* event <-> epc */
+	struct rcu_head			rcu_head;
 
 	void				*task_ctx_data; /* pmu specific data */
 	/*
@@ -906,7 +914,7 @@ struct perf_event_context {
 	int				nr_freq;
 	int				rotate_disable;
 
-	refcount_t			refcount;
+	refcount_t			refcount; /* event <-> ctx */
 
 	struct task_struct		*task;
 
 	/*
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -1727,6 +1727,10 @@ perf_event_groups_next(struct perf_event
 	return NULL;
 }
 
+#define perf_event_groups_for_cpu_pmu(event, groups, cpu, pmu)		\
+	for (event = perf_event_groups_first(groups, cpu, pmu, NULL);	\
+	     event; event = perf_event_groups_next(event, pmu))
+
 /*
  * Iterate through the whole groups tree.
  */
@@ -3366,6 +3370,14 @@ static void perf_event_sync_stat(struct
 	}
 }
 
+#define double_list_for_each_entry(pos1, pos2, head1, head2, member)	\
+	for (pos1 = list_first_entry(head1, typeof(*pos1), member),	\
+	     pos2 = list_first_entry(head2, typeof(*pos2), member);	\
+	     !list_entry_is_head(pos1, head1, member) &&		\
+	     !list_entry_is_head(pos2, head2, member);			\
+	     pos1 = list_next_entry(pos1, member),			\
+	     pos2 = list_next_entry(pos2, member))
+
 static void perf_event_swap_task_ctx_data(struct perf_event_context *prev_ctx,
 					  struct perf_event_context *next_ctx)
 {
@@ -3374,17 +3386,12 @@ static void perf_event_swap_task_ctx_dat
 	if (!prev_ctx->nr_task_data)
 		return;
 
-	prev_epc = list_first_entry(&prev_ctx->pmu_ctx_list,
-				    struct perf_event_pmu_context,
-
Re: [PATCH v2] perf: Rewrite core context handling
On Wed, Oct 12, 2022 at 02:09:00PM +0530, Ravi Bangoria wrote:

> > @@ -3366,6 +3370,14 @@ static void perf_event_sync_stat(struct
> >  	}
> >  }
> >  
> > +#define list_for_each_entry_double(pos1, pos2, head1, head2, member)	\
> > +	for (pos1 = list_first_entry(head1, typeof(*pos1), member),	\
> > +	     pos2 = list_first_entry(head2, typeof(*pos2), member);	\
> > +	     !list_entry_is_head(pos1, head1, member) &&		\
> > +	     !list_entry_is_head(pos2, head2, member);			\
> > +	     pos1 = list_next_entry(pos1, member),			\
> > +	     pos2 = list_next_entry(pos2, member))
> > +
> >  static void perf_event_swap_task_ctx_data(struct perf_event_context *prev_ctx,
> >  					  struct perf_event_context *next_ctx)
> >  {
> > @@ -3374,16 +3386,9 @@ static void perf_event_swap_task_ctx_dat
> >  	if (!prev_ctx->nr_task_data)
> >  		return;
> >  
> > -	prev_epc = list_first_entry(&prev_ctx->pmu_ctx_list,
> > -				    struct perf_event_pmu_context,
> > -				    pmu_ctx_entry);
> > -	next_epc = list_first_entry(&next_ctx->pmu_ctx_list,
> > -				    struct perf_event_pmu_context,
> > -				    pmu_ctx_entry);
> > -
> > -	while (&prev_epc->pmu_ctx_entry != &prev_ctx->pmu_ctx_list &&
> > -	       &next_epc->pmu_ctx_entry != &next_ctx->pmu_ctx_list) {
> > -
> > +	list_for_each_entry_double(prev_epc, next_epc,
> > +				   &prev_ctx->pmu_ctx_list,
> > +				   &next_ctx->pmu_ctx_list,
> > +				   pmu_ctx_entry) {
>
> There are more places which can use list_for_each_entry_double().
> I'll fix those.

I've gone and renamed it: double_list_for_each_entry(), but yeah, didn't
look too hard for other users.

> > @@ -4859,7 +4879,14 @@ static void put_pmu_ctx(struct perf_even
> >  	if (epc->ctx) {
> >  		struct perf_event_context *ctx = epc->ctx;
> >  
> > -		// XXX ctx->mutex
> > +		/*
> > +		 * XXX
> > +		 *
> > +		 * lockdep_assert_held(&ctx->mutex);
> > +		 *
> > +		 * can't because of the call-site in _free_event()/put_event()
> > +		 * which isn't always called under ctx->mutex.
> > +		 */
>
> Yes. I came across the same and could not figure out how to solve
> this. So just kept the XXX as is.

Yeah, I can sorta fix it, but it's ugly so there we are.

> >  		WARN_ON_ONCE(list_empty(&epc->pmu_ctx_entry));
> >  		raw_spin_lock_irqsave(&ctx->lock, flags);
> > @@ -12657,6 +12675,13 @@ perf_event_create_kernel_counter(struct
> >  		goto err_unlock;
> >  	}
> >  
> > +	pmu_ctx = find_get_pmu_context(pmu, ctx, event);
> > +	if (IS_ERR(pmu_ctx)) {
> > +		err = PTR_ERR(pmu_ctx);
> > +		goto err_unlock;
> > +	}
> > +	event->pmu_ctx = pmu_ctx;
>
> We should call find_get_pmu_context() with ctx->mutex held, hence the
> above perf_event_create_kernel_counter() change. Is my understanding
> correct?

That's the intent yeah. But due to not always holding ctx->mutex over
put_pmu_ctx() this might be moot. I'm almost through auditing epc usage
and I think ctx->lock is sufficient, fingers crossed.

> > +
> >  	if (!task) {
> >  		/*
> >  		 * Check if the @cpu we're creating an event for is online.
> > @@ -12998,7 +13022,7 @@ void perf_event_free_task(struct task_st
> >  	struct perf_event_context *ctx;
> >  	struct perf_event *event, *tmp;
> >  
> > -	ctx = rcu_dereference(task->perf_event_ctxp);
> > +	ctx = rcu_access_pointer(task->perf_event_ctxp);
>
> We dereference the ctx pointer, but with the mutex and lock held, and
> thus rcu_access_pointer() is sufficient. Is my understanding correct?

We do not in fact hold ctx->lock here IIRC; but this is a NULL test, if
it is !NULL we know we have a reference on it and are good.
Re: [PATCH v2] perf: Rewrite core context handling
On 11-Oct-22 11:17 PM, Peter Zijlstra wrote:
> On Tue, Oct 11, 2022 at 04:02:56PM +0200, Peter Zijlstra wrote:
>> On Tue, Oct 11, 2022 at 06:49:55PM +0530, Ravi Bangoria wrote:
>>> On 11-Oct-22 4:59 PM, Peter Zijlstra wrote:
>>>> On Sat, Oct 08, 2022 at 11:54:24AM +0530, Ravi Bangoria wrote:
>>>>
>>>>> +static void perf_event_swap_task_ctx_data(struct perf_event_context *prev_ctx,
>>>>> +					  struct perf_event_context *next_ctx)
>>>>> +{
>>>>> +	struct perf_event_pmu_context *prev_epc, *next_epc;
>>>>> +
>>>>> +	if (!prev_ctx->nr_task_data)
>>>>> +		return;
>>>>> +
>>>>> +	prev_epc = list_first_entry(&prev_ctx->pmu_ctx_list,
>>>>> +				    struct perf_event_pmu_context,
>>>>> +				    pmu_ctx_entry);
>>>>> +	next_epc = list_first_entry(&next_ctx->pmu_ctx_list,
>>>>> +				    struct perf_event_pmu_context,
>>>>> +				    pmu_ctx_entry);
>>>>> +
>>>>> +	while (&prev_epc->pmu_ctx_entry != &prev_ctx->pmu_ctx_list &&
>>>>> +	       &next_epc->pmu_ctx_entry != &next_ctx->pmu_ctx_list) {
>>>>> +
>>>>> +		WARN_ON_ONCE(prev_epc->pmu != next_epc->pmu);
>>>>> +
>>>>> +		/*
>>>>> +		 * PMU specific parts of task perf context can require
>>>>> +		 * additional synchronization. As an example of such
>>>>> +		 * synchronization see implementation details of Intel
>>>>> +		 * LBR call stack data profiling;
>>>>> +		 */
>>>>> +		if (prev_epc->pmu->swap_task_ctx)
>>>>> +			prev_epc->pmu->swap_task_ctx(prev_epc, next_epc);
>>>>> +		else
>>>>> +			swap(prev_epc->task_ctx_data, next_epc->task_ctx_data);
>>>>
>>>> Did I forget to advance the iterators here?
>>>
>>> Yeah. Seems so. I overlooked it too.
>>
>> OK; so I'm not slowly going crazy staring at this code ;-) Let me go add
>> it now then. :-)
>>
>> But first I gotta taxi the kids around for a bit, bbl.
>
> OK, so I've been going over the perf_event_pmu_context life-time thing
> as well, there were a bunch of XXXs there and I'm not sure I'm happy with
> things, but I'd also forgotten most of it.
>
> Ideally epc works like it's a regular member of ctx -- locking wise that
> is, but I'm not sure we can make that stick -- see the ctx->mutex issues
> we have with put_ctx().
>
> As such, I'm going to have to re-audit all the epc usage to see if
> pure ctx->lock is sufficient.
>
> I did make epc RCU freed, because pretty much everything is and that
> was easy enough to make happen -- it means we don't need to worry about
> that.
>
> But I'm going cross-eyed from staring at this all day, so more tomorrow.
> The below is what I currently have.
>
> ---
> --- a/include/linux/perf_event.h
> +++ b/include/linux/perf_event.h
> @@ -833,13 +833,13 @@ struct perf_event {
>   *           `[1:n]-'  `-[n:1]-> pmu <-[1:n]-'
>   *
>   *
> - * XXX destroy epc when empty
> - *     refcount, !rcu
> + * epc lifetime is refcount based and RCU freed (similar to perf_event_context).
> + * epc locking is as if it were a member of perf_event_context; specifically:
>   *
> - * XXX epc locking
> + * modification, both: ctx->mutex && ctx->lock
> + * reading, either:    ctx->mutex || ctx->lock
>   *
> - * event->pmu_ctx        ctx->mutex && inactive
> - * ctx->pmu_ctx_list     ctx->mutex && ctx->lock
> + * XXX except this isn't true ... see put_pmu_ctx().
>   *
>   */
>  struct perf_event_pmu_context {
> @@ -857,6 +857,7 @@ struct perf_event_pmu_context {
>  	unsigned int			nr_events;
>  
>  	atomic_t			refcount; /* event <-> epc */
> +	struct rcu_head			rcu_head;
>  
>  	void				*task_ctx_data; /* pmu specific data */
>  	/*
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -1727,6 +1727,10 @@ perf_event_groups_next(struct perf_event
>  	return NULL;
>  }
>  
> +#define perf_event_groups_for_cpu_pmu(event, groups, cpu, pmu)		\
> +	for (event = perf_event_groups_first(groups, cpu, pmu, NULL);	\
> +	     event; event = perf_event_groups_next(event, pmu))
> +
>  /*
>   * Iterate through the whole groups tree.
>   */
> @@ -3366,6 +3370,14 @@ static void perf_event_sync_stat(struct
>  	}
>  }
>  
> +#define list_for_each_entry_double(pos1, pos2, head1, head2, member)	\
> +	for (pos1 = list_first_entry(head1, typeof(*pos1), member),	\
> +	     pos2 = list_first_entry(head2, typeof(*pos2), member);	\
> +	     !list_entry_is_head(pos1, head1, member) &&		\
> +	     !list_entry_is_head(pos2, head2, member);			\
> +	     pos1 = list_next_entry(pos1, member),			\
> +	     pos2 = list_next_entry(pos2, member))
> +
>  static void perf_event_swap_task_ctx_data(struct perf_event_context *prev_ctx,
>  	stru
Re: [PATCH v2] perf: Rewrite core context handling
On Tue, Oct 11, 2022 at 04:02:56PM +0200, Peter Zijlstra wrote:
> On Tue, Oct 11, 2022 at 06:49:55PM +0530, Ravi Bangoria wrote:
> > On 11-Oct-22 4:59 PM, Peter Zijlstra wrote:
> > > On Sat, Oct 08, 2022 at 11:54:24AM +0530, Ravi Bangoria wrote:
> > >
> > >> +static void perf_event_swap_task_ctx_data(struct perf_event_context *prev_ctx,
> > >> +					  struct perf_event_context *next_ctx)
> > >> +{
> > >> +	struct perf_event_pmu_context *prev_epc, *next_epc;
> > >> +
> > >> +	if (!prev_ctx->nr_task_data)
> > >> +		return;
> > >> +
> > >> +	prev_epc = list_first_entry(&prev_ctx->pmu_ctx_list,
> > >> +				    struct perf_event_pmu_context,
> > >> +				    pmu_ctx_entry);
> > >> +	next_epc = list_first_entry(&next_ctx->pmu_ctx_list,
> > >> +				    struct perf_event_pmu_context,
> > >> +				    pmu_ctx_entry);
> > >> +
> > >> +	while (&prev_epc->pmu_ctx_entry != &prev_ctx->pmu_ctx_list &&
> > >> +	       &next_epc->pmu_ctx_entry != &next_ctx->pmu_ctx_list) {
> > >> +
> > >> +		WARN_ON_ONCE(prev_epc->pmu != next_epc->pmu);
> > >> +
> > >> +		/*
> > >> +		 * PMU specific parts of task perf context can require
> > >> +		 * additional synchronization. As an example of such
> > >> +		 * synchronization see implementation details of Intel
> > >> +		 * LBR call stack data profiling;
> > >> +		 */
> > >> +		if (prev_epc->pmu->swap_task_ctx)
> > >> +			prev_epc->pmu->swap_task_ctx(prev_epc, next_epc);
> > >> +		else
> > >> +			swap(prev_epc->task_ctx_data, next_epc->task_ctx_data);
> > >
> > > Did I forget to advance the iterators here?
> >
> > Yeah. Seems so. I overlooked it too.
>
> OK; so I'm not slowly going crazy staring at this code ;-) Let me go add
> it now then. :-)
>
> But first I gotta taxi the kids around for a bit, bbl.

OK, so I've been going over the perf_event_pmu_context life-time thing
as well, there were a bunch of XXXs there and I'm not sure I'm happy with
things, but I'd also forgotten most of it.

Ideally epc works like it's a regular member of ctx -- locking wise that
is, but I'm not sure we can make that stick -- see the ctx->mutex issues
we have with put_ctx().

As such, I'm going to have to re-audit all the epc usage to see if
pure ctx->lock is sufficient.

I did make epc RCU freed, because pretty much everything is and that
was easy enough to make happen -- it means we don't need to worry about
that.

But I'm going cross-eyed from staring at this all day, so more tomorrow.
The below is what I currently have.

---
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -833,13 +833,13 @@ struct perf_event {
  *           `[1:n]-'  `-[n:1]-> pmu <-[1:n]-'
  *
  *
- * XXX destroy epc when empty
- *     refcount, !rcu
+ * epc lifetime is refcount based and RCU freed (similar to perf_event_context).
+ * epc locking is as if it were a member of perf_event_context; specifically:
  *
- * XXX epc locking
+ * modification, both: ctx->mutex && ctx->lock
+ * reading, either:    ctx->mutex || ctx->lock
  *
- * event->pmu_ctx        ctx->mutex && inactive
- * ctx->pmu_ctx_list     ctx->mutex && ctx->lock
+ * XXX except this isn't true ... see put_pmu_ctx().
  *
  */
 struct perf_event_pmu_context {
@@ -857,6 +857,7 @@ struct perf_event_pmu_context {
 	unsigned int			nr_events;
 
 	atomic_t			refcount; /* event <-> epc */
+	struct rcu_head			rcu_head;
 
 	void				*task_ctx_data; /* pmu specific data */
 	/*
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -1727,6 +1727,10 @@ perf_event_groups_next(struct perf_event
 	return NULL;
 }
 
+#define perf_event_groups_for_cpu_pmu(event, groups, cpu, pmu)		\
+	for (event = perf_event_groups_first(groups, cpu, pmu, NULL);	\
+	     event; event = perf_event_groups_next(event, pmu))
+
 /*
  * Iterate through the whole groups tree.
  */
@@ -3366,6 +3370,14 @@ static void perf_event_sync_stat(struct
 	}
 }
 
+#define list_for_each_entry_double(pos1, pos2, head1, head2, member)	\
+	for (pos1 = list_first_entry(head1, typeof(*pos1), member),	\
+	     pos2 = list_first_entry(head2, typeof(*pos2), member);	\
+	     !list_entry_is_head(pos1, head1, member) &&		\
+	     !list_entry_is_head(pos2, head2, member);			\
+	     pos1 = list_next_entry(pos1, member),			\
+	     pos2 = list_next_entry(pos2, member))
+
 static void perf_event_swap_task_ctx_data(struct perf_event_context *prev_ctx,
Re: [PATCH v2] perf: Rewrite core context handling
On Tue, Oct 11, 2022 at 06:49:55PM +0530, Ravi Bangoria wrote:
> On 11-Oct-22 4:59 PM, Peter Zijlstra wrote:
> > On Sat, Oct 08, 2022 at 11:54:24AM +0530, Ravi Bangoria wrote:
> >
> >> +static void perf_event_swap_task_ctx_data(struct perf_event_context *prev_ctx,
> >> +					  struct perf_event_context *next_ctx)
> >> +{
> >> +	struct perf_event_pmu_context *prev_epc, *next_epc;
> >> +
> >> +	if (!prev_ctx->nr_task_data)
> >> +		return;
> >> +
> >> +	prev_epc = list_first_entry(&prev_ctx->pmu_ctx_list,
> >> +				    struct perf_event_pmu_context,
> >> +				    pmu_ctx_entry);
> >> +	next_epc = list_first_entry(&next_ctx->pmu_ctx_list,
> >> +				    struct perf_event_pmu_context,
> >> +				    pmu_ctx_entry);
> >> +
> >> +	while (&prev_epc->pmu_ctx_entry != &prev_ctx->pmu_ctx_list &&
> >> +	       &next_epc->pmu_ctx_entry != &next_ctx->pmu_ctx_list) {
> >> +
> >> +		WARN_ON_ONCE(prev_epc->pmu != next_epc->pmu);
> >> +
> >> +		/*
> >> +		 * PMU specific parts of task perf context can require
> >> +		 * additional synchronization. As an example of such
> >> +		 * synchronization see implementation details of Intel
> >> +		 * LBR call stack data profiling;
> >> +		 */
> >> +		if (prev_epc->pmu->swap_task_ctx)
> >> +			prev_epc->pmu->swap_task_ctx(prev_epc, next_epc);
> >> +		else
> >> +			swap(prev_epc->task_ctx_data, next_epc->task_ctx_data);
> >
> > Did I forget to advance the iterators here?
>
> Yeah. Seems so. I overlooked it too.

OK; so I'm not slowly going crazy staring at this code ;-) Let me go add
it now then. :-)

But first I gotta taxi the kids around for a bit, bbl.
Re: [PATCH v2] perf: Rewrite core context handling
On 11-Oct-22 4:59 PM, Peter Zijlstra wrote:
> On Sat, Oct 08, 2022 at 11:54:24AM +0530, Ravi Bangoria wrote:
>
>> +static void perf_event_swap_task_ctx_data(struct perf_event_context *prev_ctx,
>> +					  struct perf_event_context *next_ctx)
>> +{
>> +	struct perf_event_pmu_context *prev_epc, *next_epc;
>> +
>> +	if (!prev_ctx->nr_task_data)
>> +		return;
>> +
>> +	prev_epc = list_first_entry(&prev_ctx->pmu_ctx_list,
>> +				    struct perf_event_pmu_context,
>> +				    pmu_ctx_entry);
>> +	next_epc = list_first_entry(&next_ctx->pmu_ctx_list,
>> +				    struct perf_event_pmu_context,
>> +				    pmu_ctx_entry);
>> +
>> +	while (&prev_epc->pmu_ctx_entry != &prev_ctx->pmu_ctx_list &&
>> +	       &next_epc->pmu_ctx_entry != &next_ctx->pmu_ctx_list) {
>> +
>> +		WARN_ON_ONCE(prev_epc->pmu != next_epc->pmu);
>> +
>> +		/*
>> +		 * PMU specific parts of task perf context can require
>> +		 * additional synchronization. As an example of such
>> +		 * synchronization see implementation details of Intel
>> +		 * LBR call stack data profiling;
>> +		 */
>> +		if (prev_epc->pmu->swap_task_ctx)
>> +			prev_epc->pmu->swap_task_ctx(prev_epc, next_epc);
>> +		else
>> +			swap(prev_epc->task_ctx_data, next_epc->task_ctx_data);
>
> Did I forget to advance the iterators here?

Yeah. Seems so. I overlooked it too.

Thanks,
Ravi
Re: [PATCH v2] perf: Rewrite core context handling
On Sat, Oct 08, 2022 at 11:54:24AM +0530, Ravi Bangoria wrote:
> +static void perf_event_swap_task_ctx_data(struct perf_event_context *prev_ctx,
> +					  struct perf_event_context *next_ctx)
> +{
> +	struct perf_event_pmu_context *prev_epc, *next_epc;
> +
> +	if (!prev_ctx->nr_task_data)
> +		return;
> +
> +	prev_epc = list_first_entry(&prev_ctx->pmu_ctx_list,
> +				    struct perf_event_pmu_context,
> +				    pmu_ctx_entry);
> +	next_epc = list_first_entry(&next_ctx->pmu_ctx_list,
> +				    struct perf_event_pmu_context,
> +				    pmu_ctx_entry);
> +
> +	while (&prev_epc->pmu_ctx_entry != &prev_ctx->pmu_ctx_list &&
> +	       &next_epc->pmu_ctx_entry != &next_ctx->pmu_ctx_list) {
> +
> +		WARN_ON_ONCE(prev_epc->pmu != next_epc->pmu);
> +
> +		/*
> +		 * PMU specific parts of task perf context can require
> +		 * additional synchronization. As an example of such
> +		 * synchronization see implementation details of Intel
> +		 * LBR call stack data profiling;
> +		 */
> +		if (prev_epc->pmu->swap_task_ctx)
> +			prev_epc->pmu->swap_task_ctx(prev_epc, next_epc);
> +		else
> +			swap(prev_epc->task_ctx_data, next_epc->task_ctx_data);

Did I forget to advance the iterators here?

> +	}
> +}