Re: [PATCH 02/11] perf: Do not get values from disabled counters in group format read
On Wed, Oct 24, 2012 at 02:14:06PM +0200, Jiri Olsa wrote: > On Wed, Oct 24, 2012 at 02:01:18PM +0200, Peter Zijlstra wrote: SNIP > > Right, so I don't object to the patch per-se, I was just curious how you > > ran into it, because ISTR what you just said, we enable all this stuff > > together. > > > > Also, why would disabled counters give strange values? They'd simply > > return the same old value time after time, right? > > well, x86_pmu_read calls x86_perf_event_update, which expects the event > is active.. if it's not it'll update the count from whatever left in > event.hw.idx counter.. could be uninitialized or used by others.. > > I can easily reproduce this one, so let's see.. ;) ok, the problem code path is like this: - running "perf record -e '{cycles,cache-misses}:S' -a sleep 1" which creates group of counters, that are enabled by perf via ioctl - within the __perf_event_enable function the __perf_event_mark_enabled only change state for leader, so following group_sched_in will fail to schedule group siblings, because of the state check in event_sched_in: static int event_sched_in(struct perf_event *event, struct perf_cpu_context *cpuctx, struct perf_event_context *ctx) { u64 tstamp = perf_event_time(event); if (event->state <= PERF_EVENT_STATE_OFF) return 0; - ending up with only leader enabled - all the other events in group are enabled by perf after the leader, but meanwhile leader can hit sample.. and read group events.. ;) attached patch fixies this for me and I was wondering we want same behaviour for disable path as well (included below not tested) I also think that we should keep that state check before calling pmu->read() in the perf sample read thanks, jirka --- diff --git a/kernel/events/core.c b/kernel/events/core.c index dabfc5d..119a57e 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -1253,6 +1253,16 @@ retry: raw_spin_unlock_irq(>lock); } +static void __perf_event_mark_disabled(struct perf_event *event) +{ + struct perf_event *sub; + + event->state = PERF_EVENT_STATE_OFF; + + list_for_each_entry(sub, >sibling_list, group_entry) + sub->state = PERF_EVENT_STATE_OFF; +} + /* * Cross CPU call to disable a performance event */ @@ -1286,7 +1296,8 @@ int __perf_event_disable(void *info) group_sched_out(event, cpuctx, ctx); else event_sched_out(event, cpuctx, ctx); - event->state = PERF_EVENT_STATE_OFF; + + __perf_event_mark_disabled(event); } raw_spin_unlock(>lock); @@ -1685,8 +1696,8 @@ retry: /* * Put a event into inactive state and update time fields. * Enabling the leader of a group effectively enables all - * the group members that aren't explicitly disabled, so we - * have to update their ->tstamp_enabled also. + * the group members, so we have to update their ->tstamp_enabled + * also. * Note: this works for group members as well as group leaders * since the non-leader members' sibling_lists will be empty. */ @@ -1697,9 +1708,10 @@ static void __perf_event_mark_enabled(struct perf_event *event) event->state = PERF_EVENT_STATE_INACTIVE; event->tstamp_enabled = tstamp - event->total_time_enabled; + list_for_each_entry(sub, >sibling_list, group_entry) { - if (sub->state >= PERF_EVENT_STATE_INACTIVE) - sub->tstamp_enabled = tstamp - sub->total_time_enabled; + sub->state = PERF_EVENT_STATE_INACTIVE; + sub->tstamp_enabled = tstamp - sub->total_time_enabled; } } -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 02/11] perf: Do not get values from disabled counters in group format read
On Wed, 2012-10-24 at 14:14 +0200, Jiri Olsa wrote: > > well, x86_pmu_read calls x86_perf_event_update, which expects the > event > is active.. if it's not it'll update the count from whatever left in > event.hw.idx counter.. could be uninitialized or used by others.. > Oh right, we shouldn't call ->read() unless ->state == PERF_EVENT_STATE_ACTIVE. Aside from that, perf_event_count() should return the 'right' value regardless state (although excluding ERROR might make sense). -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 02/11] perf: Do not get values from disabled counters in group format read
On Wed, Oct 24, 2012 at 02:01:18PM +0200, Peter Zijlstra wrote: > On Tue, 2012-10-23 at 18:50 +0200, Jiri Olsa wrote: > > On Tue, Oct 23, 2012 at 06:13:09PM +0200, Peter Zijlstra wrote: > > > On Sat, 2012-10-20 at 16:33 +0200, Jiri Olsa wrote: > > > > It's possible some of the counters in the group could be > > > > disabled when sampling member of the event group is reading > > > > the rest via PERF_SAMPLE_READ sample type processing. Disabled > > > > counters could then produce wrong numbers. > > > > > > > > Fixing that by reading only enabled counters for PERF_SAMPLE_READ > > > > sample type processing. > > > > > > > > > > However did you run into this? > > > > yep, with perf record -a > > > > hm, I just checked and we enable/disable event groups atomicaly.. > > I haven't checked that before because it seemed obvious :-/ > > > > So, I'm not sure now about the exact code path that triggered it > > in my test.. however you could always disable child event from > > group and hit this issue, but thats not what happened in perf. > > > > might be some other bug... I'll check > > Right, so I don't object to the patch per-se, I was just curious how you > ran into it, because ISTR what you just said, we enable all this stuff > together. > > Also, why would disabled counters give strange values? They'd simply > return the same old value time after time, right? well, x86_pmu_read calls x86_perf_event_update, which expects the event is active.. if it's not it'll update the count from whatever left in event.hw.idx counter.. could be uninitialized or used by others.. I can easily reproduce this one, so let's see.. ;) jirka -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 02/11] perf: Do not get values from disabled counters in group format read
On Tue, 2012-10-23 at 18:50 +0200, Jiri Olsa wrote: > On Tue, Oct 23, 2012 at 06:13:09PM +0200, Peter Zijlstra wrote: > > On Sat, 2012-10-20 at 16:33 +0200, Jiri Olsa wrote: > > > It's possible some of the counters in the group could be > > > disabled when sampling member of the event group is reading > > > the rest via PERF_SAMPLE_READ sample type processing. Disabled > > > counters could then produce wrong numbers. > > > > > > Fixing that by reading only enabled counters for PERF_SAMPLE_READ > > > sample type processing. > > > > > > > However did you run into this? > > yep, with perf record -a > > hm, I just checked and we enable/disable event groups atomicaly.. > I haven't checked that before because it seemed obvious :-/ > > So, I'm not sure now about the exact code path that triggered it > in my test.. however you could always disable child event from > group and hit this issue, but thats not what happened in perf. > > might be some other bug... I'll check Right, so I don't object to the patch per-se, I was just curious how you ran into it, because ISTR what you just said, we enable all this stuff together. Also, why would disabled counters give strange values? They'd simply return the same old value time after time, right? -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 02/11] perf: Do not get values from disabled counters in group format read
On Tue, 2012-10-23 at 18:50 +0200, Jiri Olsa wrote: On Tue, Oct 23, 2012 at 06:13:09PM +0200, Peter Zijlstra wrote: On Sat, 2012-10-20 at 16:33 +0200, Jiri Olsa wrote: It's possible some of the counters in the group could be disabled when sampling member of the event group is reading the rest via PERF_SAMPLE_READ sample type processing. Disabled counters could then produce wrong numbers. Fixing that by reading only enabled counters for PERF_SAMPLE_READ sample type processing. However did you run into this? yep, with perf record -a hm, I just checked and we enable/disable event groups atomicaly.. I haven't checked that before because it seemed obvious :-/ So, I'm not sure now about the exact code path that triggered it in my test.. however you could always disable child event from group and hit this issue, but thats not what happened in perf. might be some other bug... I'll check Right, so I don't object to the patch per-se, I was just curious how you ran into it, because ISTR what you just said, we enable all this stuff together. Also, why would disabled counters give strange values? They'd simply return the same old value time after time, right? -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 02/11] perf: Do not get values from disabled counters in group format read
On Wed, Oct 24, 2012 at 02:01:18PM +0200, Peter Zijlstra wrote: On Tue, 2012-10-23 at 18:50 +0200, Jiri Olsa wrote: On Tue, Oct 23, 2012 at 06:13:09PM +0200, Peter Zijlstra wrote: On Sat, 2012-10-20 at 16:33 +0200, Jiri Olsa wrote: It's possible some of the counters in the group could be disabled when sampling member of the event group is reading the rest via PERF_SAMPLE_READ sample type processing. Disabled counters could then produce wrong numbers. Fixing that by reading only enabled counters for PERF_SAMPLE_READ sample type processing. However did you run into this? yep, with perf record -a hm, I just checked and we enable/disable event groups atomicaly.. I haven't checked that before because it seemed obvious :-/ So, I'm not sure now about the exact code path that triggered it in my test.. however you could always disable child event from group and hit this issue, but thats not what happened in perf. might be some other bug... I'll check Right, so I don't object to the patch per-se, I was just curious how you ran into it, because ISTR what you just said, we enable all this stuff together. Also, why would disabled counters give strange values? They'd simply return the same old value time after time, right? well, x86_pmu_read calls x86_perf_event_update, which expects the event is active.. if it's not it'll update the count from whatever left in event.hw.idx counter.. could be uninitialized or used by others.. I can easily reproduce this one, so let's see.. ;) jirka -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 02/11] perf: Do not get values from disabled counters in group format read
On Wed, 2012-10-24 at 14:14 +0200, Jiri Olsa wrote: well, x86_pmu_read calls x86_perf_event_update, which expects the event is active.. if it's not it'll update the count from whatever left in event.hw.idx counter.. could be uninitialized or used by others.. Oh right, we shouldn't call -read() unless -state == PERF_EVENT_STATE_ACTIVE. Aside from that, perf_event_count() should return the 'right' value regardless state (although excluding ERROR might make sense). -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 02/11] perf: Do not get values from disabled counters in group format read
On Wed, Oct 24, 2012 at 02:14:06PM +0200, Jiri Olsa wrote: On Wed, Oct 24, 2012 at 02:01:18PM +0200, Peter Zijlstra wrote: SNIP Right, so I don't object to the patch per-se, I was just curious how you ran into it, because ISTR what you just said, we enable all this stuff together. Also, why would disabled counters give strange values? They'd simply return the same old value time after time, right? well, x86_pmu_read calls x86_perf_event_update, which expects the event is active.. if it's not it'll update the count from whatever left in event.hw.idx counter.. could be uninitialized or used by others.. I can easily reproduce this one, so let's see.. ;) ok, the problem code path is like this: - running perf record -e '{cycles,cache-misses}:S' -a sleep 1 which creates group of counters, that are enabled by perf via ioctl - within the __perf_event_enable function the __perf_event_mark_enabled only change state for leader, so following group_sched_in will fail to schedule group siblings, because of the state check in event_sched_in: static int event_sched_in(struct perf_event *event, struct perf_cpu_context *cpuctx, struct perf_event_context *ctx) { u64 tstamp = perf_event_time(event); if (event-state = PERF_EVENT_STATE_OFF) return 0; - ending up with only leader enabled - all the other events in group are enabled by perf after the leader, but meanwhile leader can hit sample.. and read group events.. ;) attached patch fixies this for me and I was wondering we want same behaviour for disable path as well (included below not tested) I also think that we should keep that state check before calling pmu-read() in the perf sample read thanks, jirka --- diff --git a/kernel/events/core.c b/kernel/events/core.c index dabfc5d..119a57e 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -1253,6 +1253,16 @@ retry: raw_spin_unlock_irq(ctx-lock); } +static void __perf_event_mark_disabled(struct perf_event *event) +{ + struct perf_event *sub; + + event-state = PERF_EVENT_STATE_OFF; + + list_for_each_entry(sub, event-sibling_list, group_entry) + sub-state = PERF_EVENT_STATE_OFF; +} + /* * Cross CPU call to disable a performance event */ @@ -1286,7 +1296,8 @@ int __perf_event_disable(void *info) group_sched_out(event, cpuctx, ctx); else event_sched_out(event, cpuctx, ctx); - event-state = PERF_EVENT_STATE_OFF; + + __perf_event_mark_disabled(event); } raw_spin_unlock(ctx-lock); @@ -1685,8 +1696,8 @@ retry: /* * Put a event into inactive state and update time fields. * Enabling the leader of a group effectively enables all - * the group members that aren't explicitly disabled, so we - * have to update their -tstamp_enabled also. + * the group members, so we have to update their -tstamp_enabled + * also. * Note: this works for group members as well as group leaders * since the non-leader members' sibling_lists will be empty. */ @@ -1697,9 +1708,10 @@ static void __perf_event_mark_enabled(struct perf_event *event) event-state = PERF_EVENT_STATE_INACTIVE; event-tstamp_enabled = tstamp - event-total_time_enabled; + list_for_each_entry(sub, event-sibling_list, group_entry) { - if (sub-state = PERF_EVENT_STATE_INACTIVE) - sub-tstamp_enabled = tstamp - sub-total_time_enabled; + sub-state = PERF_EVENT_STATE_INACTIVE; + sub-tstamp_enabled = tstamp - sub-total_time_enabled; } } -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 02/11] perf: Do not get values from disabled counters in group format read
On Tue, Oct 23, 2012 at 06:13:09PM +0200, Peter Zijlstra wrote: > On Sat, 2012-10-20 at 16:33 +0200, Jiri Olsa wrote: > > It's possible some of the counters in the group could be > > disabled when sampling member of the event group is reading > > the rest via PERF_SAMPLE_READ sample type processing. Disabled > > counters could then produce wrong numbers. > > > > Fixing that by reading only enabled counters for PERF_SAMPLE_READ > > sample type processing. > > > > However did you run into this? yep, with perf record -a hm, I just checked and we enable/disable event groups atomicaly.. I haven't checked that before because it seemed obvious :-/ So, I'm not sure now about the exact code path that triggered it in my test.. however you could always disable child event from group and hit this issue, but thats not what happened in perf. might be some other bug... I'll check > > > --- > > kernel/events/core.c | 18 +++--- > > 1 file changed, 15 insertions(+), 3 deletions(-) > > > > diff --git a/kernel/events/core.c b/kernel/events/core.c > > index 32aec40..5220d01 100644 > > --- a/kernel/events/core.c > > +++ b/kernel/events/core.c > > @@ -4012,12 +4012,24 @@ static void perf_output_read_group(struct > > perf_output_handle *handle, > > __output_copy(handle, values, n * sizeof(u64)); > > > > list_for_each_entry(sub, >sibling_list, group_entry) { > > + u64 value = 0; > > n = 0; > > > > - if (sub != event) > > - sub->pmu->read(sub); > > + /* > > +* We are NOT interested in disabled counters, > > +* giving us strange values and keeping us from > > +* good night sleep!!! > > +*/ > > + if (sub->state > PERF_EVENT_STATE_OFF) { > > + > > superfluous whitespace there... ;-) yea.. v2 ;) thanks, jirka -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 02/11] perf: Do not get values from disabled counters in group format read
On Sat, 2012-10-20 at 16:33 +0200, Jiri Olsa wrote: > It's possible some of the counters in the group could be > disabled when sampling member of the event group is reading > the rest via PERF_SAMPLE_READ sample type processing. Disabled > counters could then produce wrong numbers. > > Fixing that by reading only enabled counters for PERF_SAMPLE_READ > sample type processing. > However did you run into this? > --- > kernel/events/core.c | 18 +++--- > 1 file changed, 15 insertions(+), 3 deletions(-) > > diff --git a/kernel/events/core.c b/kernel/events/core.c > index 32aec40..5220d01 100644 > --- a/kernel/events/core.c > +++ b/kernel/events/core.c > @@ -4012,12 +4012,24 @@ static void perf_output_read_group(struct > perf_output_handle *handle, > __output_copy(handle, values, n * sizeof(u64)); > > list_for_each_entry(sub, >sibling_list, group_entry) { > + u64 value = 0; > n = 0; > > - if (sub != event) > - sub->pmu->read(sub); > + /* > + * We are NOT interested in disabled counters, > + * giving us strange values and keeping us from > + * good night sleep!!! > + */ > + if (sub->state > PERF_EVENT_STATE_OFF) { > + superfluous whitespace there... ;-) > + if (sub != event) > + sub->pmu->read(sub); > + > + value = perf_event_count(sub); > + } > + > + values[n++] = value; > > - values[n++] = perf_event_count(sub); > if (read_format & PERF_FORMAT_ID) > values[n++] = primary_event_id(sub); > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 02/11] perf: Do not get values from disabled counters in group format read
On Sat, 2012-10-20 at 16:33 +0200, Jiri Olsa wrote: It's possible some of the counters in the group could be disabled when sampling member of the event group is reading the rest via PERF_SAMPLE_READ sample type processing. Disabled counters could then produce wrong numbers. Fixing that by reading only enabled counters for PERF_SAMPLE_READ sample type processing. However did you run into this? --- kernel/events/core.c | 18 +++--- 1 file changed, 15 insertions(+), 3 deletions(-) diff --git a/kernel/events/core.c b/kernel/events/core.c index 32aec40..5220d01 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -4012,12 +4012,24 @@ static void perf_output_read_group(struct perf_output_handle *handle, __output_copy(handle, values, n * sizeof(u64)); list_for_each_entry(sub, leader-sibling_list, group_entry) { + u64 value = 0; n = 0; - if (sub != event) - sub-pmu-read(sub); + /* + * We are NOT interested in disabled counters, + * giving us strange values and keeping us from + * good night sleep!!! + */ + if (sub-state PERF_EVENT_STATE_OFF) { + superfluous whitespace there... ;-) + if (sub != event) + sub-pmu-read(sub); + + value = perf_event_count(sub); + } + + values[n++] = value; - values[n++] = perf_event_count(sub); if (read_format PERF_FORMAT_ID) values[n++] = primary_event_id(sub); -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 02/11] perf: Do not get values from disabled counters in group format read
On Tue, Oct 23, 2012 at 06:13:09PM +0200, Peter Zijlstra wrote: On Sat, 2012-10-20 at 16:33 +0200, Jiri Olsa wrote: It's possible some of the counters in the group could be disabled when sampling member of the event group is reading the rest via PERF_SAMPLE_READ sample type processing. Disabled counters could then produce wrong numbers. Fixing that by reading only enabled counters for PERF_SAMPLE_READ sample type processing. However did you run into this? yep, with perf record -a hm, I just checked and we enable/disable event groups atomicaly.. I haven't checked that before because it seemed obvious :-/ So, I'm not sure now about the exact code path that triggered it in my test.. however you could always disable child event from group and hit this issue, but thats not what happened in perf. might be some other bug... I'll check --- kernel/events/core.c | 18 +++--- 1 file changed, 15 insertions(+), 3 deletions(-) diff --git a/kernel/events/core.c b/kernel/events/core.c index 32aec40..5220d01 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -4012,12 +4012,24 @@ static void perf_output_read_group(struct perf_output_handle *handle, __output_copy(handle, values, n * sizeof(u64)); list_for_each_entry(sub, leader-sibling_list, group_entry) { + u64 value = 0; n = 0; - if (sub != event) - sub-pmu-read(sub); + /* +* We are NOT interested in disabled counters, +* giving us strange values and keeping us from +* good night sleep!!! +*/ + if (sub-state PERF_EVENT_STATE_OFF) { + superfluous whitespace there... ;-) yea.. v2 ;) thanks, jirka -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/