Re: [PATCH] perf: update perf_cgroup time for ancestor cgroup(s)

2018-03-12 Thread Song Liu


> On Mar 12, 2018, at 5:38 AM, Peter Zijlstra  wrote:
> 
> On Sun, Mar 11, 2018 at 10:32:44PM -0700, Song Liu wrote:
>> When a perf_event is attached to parent cgroup, it should count events
>> for all children cgroups:
>> 
>> parent_group   < perf_event
>>   \
>>- child_group  < process(es)
>> 
>> However, in our tests, we found this perf_event cannot report reliable
>> results. This is because perf_event->cgrp and cpuctx->cgrp are not
>> identical, thus perf_event->cgrp are not updated properly.
> 
> It might help now and in for our older selves, if you could provide a
> simple reproducer for this.

I will add a repro to v2. 

> 
>> Signed-off-by: Song Liu 
>> Reported-by: Ephraim Park 
>> ---
>> kernel/events/core.c | 68 
>> +++-
>> 1 file changed, 67 insertions(+), 1 deletion(-)
>> 
>> diff --git a/kernel/events/core.c b/kernel/events/core.c
>> index 5789810..623d38f 100644
>> --- a/kernel/events/core.c
>> +++ b/kernel/events/core.c
> 
>> @@ -766,6 +821,17 @@ perf_cgroup_set_timestamp(struct task_struct *task,
>>  cgrp = perf_cgroup_from_task(task, ctx);
>>  info = this_cpu_ptr(cgrp->info);
>>  info->timestamp = ctx->timestamp;
>> +
>> +/* set timestamp for ancestor cgroups */
>> +if (cgrp->css.cgroup->level > 1) {
>> +struct perf_cgroup_info *info = this_cpu_ptr(cgrp->info);
>> +struct perf_event *event;
>> +
>> +list_for_each_entry(event, >pinned_groups, group_entry)
>> +perf_ancestor_cgroup_set_timestamp(event, cgrp, info);
>> +list_for_each_entry(event, >flexible_groups, group_entry)
>> +perf_ancestor_cgroup_set_timestamp(event, cgrp, info);
>> +}
>> }
> 
> That doesn't make any kind of sense... if we're interested in ancestor
> groups, then WTH are you iterating _events_ ?
> 
> I'm expecting something like:
> 
>   struct cgroup_subsys_state *css;
> 
>   for (css = cgrp->css; css; css = css->parent) {
>   /* fudge time */
>   }

I guess this is what I have been looking for! Thanks Peter! 

V2 coming soon...

Song



Re: [PATCH] perf: update perf_cgroup time for ancestor cgroup(s)

2018-03-12 Thread Peter Zijlstra
On Sun, Mar 11, 2018 at 10:32:44PM -0700, Song Liu wrote:
> When a perf_event is attached to parent cgroup, it should count events
> for all children cgroups:
> 
>  parent_group   < perf_event
>\
> - child_group  < process(es)
> 
> However, in our tests, we found this perf_event cannot report reliable
> results. This is because perf_event->cgrp and cpuctx->cgrp are not
> identical, thus perf_event->cgrp are not updated properly.

It might help now and in for our older selves, if you could provide a
simple reproducer for this.

> Signed-off-by: Song Liu 
> Reported-by: Ephraim Park 
> ---
>  kernel/events/core.c | 68 
> +++-
>  1 file changed, 67 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index 5789810..623d38f 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c

> @@ -766,6 +821,17 @@ perf_cgroup_set_timestamp(struct task_struct *task,
>   cgrp = perf_cgroup_from_task(task, ctx);
>   info = this_cpu_ptr(cgrp->info);
>   info->timestamp = ctx->timestamp;
> +
> + /* set timestamp for ancestor cgroups */
> + if (cgrp->css.cgroup->level > 1) {
> + struct perf_cgroup_info *info = this_cpu_ptr(cgrp->info);
> + struct perf_event *event;
> +
> + list_for_each_entry(event, >pinned_groups, group_entry)
> + perf_ancestor_cgroup_set_timestamp(event, cgrp, info);
> + list_for_each_entry(event, >flexible_groups, group_entry)
> + perf_ancestor_cgroup_set_timestamp(event, cgrp, info);
> + }
>  }

That doesn't make any kind of sense... if we're interested in ancestor
groups, then WTH are you iterating _events_ ?

I'm expecting something like:

struct cgroup_subsys_state *css;

for (css = cgrp->css; css; css = css->parent) {
/* fudge time */
}