Re: [REGRESSION] x86, perf: counter freezing breaks rr
On Tue, Nov 27, 2018 at 03:36:15PM -0800, Andi Kleen wrote:
> > It does seem that FREEZE_PERFMON_ON_PMI (misnamed as it is) is of
> > rather limited use (or even negative, in our case) to a counter that's
> > already restricted to ring 3.
>
> It's much faster. The PMI cost goes down dramatically.

Faster doesn't mean much when it's broken.
Re: [REGRESSION] x86, perf: counter freezing breaks rr
On 11/27/2018 8:25 PM, Stephane Eranian wrote:
> On Tue, Nov 27, 2018 at 3:36 PM Andi Kleen wrote:
> > > It does seem that FREEZE_PERFMON_ON_PMI (misnamed as it is) is of
> > > rather limited use (or even negative, in our case) to a counter that's
> > > already restricted to ring 3.
> >
> > It's much faster. The PMI cost goes down dramatically.
> >
> > I still think the right fix is to add a perf event opt-out and let it
> > be used by rr.
> >
> > V3 is without counter freezing.
> > V4 is with counter freezing.
> > The value is the average cost of the PMI handler.
> > (lower is better)
> >
> > perf options              V3(ns)  V4(ns)  delta
> > -c 10                     1088    894     -18%
> > -g -c 10                  1862    1646    -12%
> > --call-graph lbr -c 10    3649    3367    -8%
> > --c.g. dwarf -c 10        2248    1982    -12%
>
> Is that measured on the same machine, i.e., do you force V3 on Skylake?

Yes, it's measured on the same Kaby Lake machine with the
counter_freezing option disabled/enabled.

> All it does, I think, is save one wrmsr(GLOBAL_CTRL) on entry to the
> PMU interrupt handler or am I missing something? Or does it save two?
> The wrmsr(GLOBAL_CTRL) at the end to reactivate.

__intel_pmu_disable_all() and __intel_pmu_enable_all() are not called in
the V4 handler, so it saves at least two wrmsrl() calls.

Thanks,
Kan
Re: [REGRESSION] x86, perf: counter freezing breaks rr
On Tue, Nov 27, 2018 at 3:36 PM Andi Kleen wrote:
> > It does seem that FREEZE_PERFMON_ON_PMI (misnamed as it is) is of
> > rather limited use (or even negative, in our case) to a counter that's
> > already restricted to ring 3.
>
> It's much faster. The PMI cost goes down dramatically.
>
> I still think the right fix is to add a perf event opt-out and let it
> be used by rr.
>
> V3 is without counter freezing.
> V4 is with counter freezing.
> The value is the average cost of the PMI handler.
> (lower is better)
>
> perf options              V3(ns)  V4(ns)  delta
> -c 10                     1088    894     -18%
> -g -c 10                  1862    1646    -12%
> --call-graph lbr -c 10    3649    3367    -8%
> --c.g. dwarf -c 10        2248    1982    -12%
>
> -Andi

Is that measured on the same machine, i.e., do you force V3 on Skylake?

All it does, I think, is save one wrmsr(GLOBAL_CTRL) on entry to the PMU
interrupt handler or am I missing something? Or does it save two? The
wrmsr(GLOBAL_CTRL) at the end to reactivate.
Re: [REGRESSION] x86, perf: counter freezing breaks rr
> It does seem that FREEZE_PERFMON_ON_PMI (misnamed as it is) is of
> rather limited use (or even negative, in our case) to a counter that's
> already restricted to ring 3.

It's much faster. The PMI cost goes down dramatically.

I still think the right fix is to add a perf event opt-out and let it
be used by rr.

V3 is without counter freezing.
V4 is with counter freezing.
The value is the average cost of the PMI handler.
(lower is better)

perf options              V3(ns)  V4(ns)  delta
-c 10                     1088    894     -18%
-g -c 10                  1862    1646    -12%
--call-graph lbr -c 10    3649    3367    -8%
--c.g. dwarf -c 10        2248    1982    -12%

-Andi
Re: [REGRESSION] x86, perf: counter freezing breaks rr
On Wed, Nov 21, 2018 at 12:14 AM Peter Zijlstra wrote:
> On Tue, Nov 20, 2018 at 02:38:54PM -0800, Andi Kleen wrote:
> > > In fact, I'll argue FREEZE_ON_OVERFLOW is unfixably broken for
> > > independent counters, because while one counter overflows, we'll stall
> > > counting on all others until we've handled the PMI.
> > >
> > > Even though the PMI might not be for them and they very much want/need
> > > to continue counting.
> >
> > We stop all counters in any case for the PMI. With freeze-on-PMI it just
> > happens slightly earlier.
>
> Hiding the PMI is fine and good. The PMI is not the workload. Stopping
> it earlier is _NOT_ good, it hides your actual workload.

It does seem that FREEZE_PERFMON_ON_PMI (misnamed as it is) is of rather
limited use (or even negative, in our case) to a counter that's already
restricted to ring 3.

- Kyle
Re: [REGRESSION] x86, perf: counter freezing breaks rr
On Tue, Nov 20, 2018 at 02:38:54PM -0800, Andi Kleen wrote:
> > In fact, I'll argue FREEZE_ON_OVERFLOW is unfixably broken for
> > independent counters, because while one counter overflows, we'll stall
> > counting on all others until we've handled the PMI.
> >
> > Even though the PMI might not be for them and they very much want/need
> > to continue counting.
>
> We stop all counters in any case for the PMI. With freeze-on-PMI it just
> happens slightly earlier.

Hiding the PMI is fine and good. The PMI is not the workload. Stopping
it earlier is _NOT_ good, it hides your actual workload.
Re: [REGRESSION] x86, perf: counter freezing breaks rr
> In fact, I'll argue FREEZE_ON_OVERFLOW is unfixably broken for
> independent counters, because while one counter overflows, we'll stall
> counting on all others until we've handled the PMI.
>
> Even though the PMI might not be for them and they very much want/need
> to continue counting.

We stop all counters in any case for the PMI. With freeze-on-PMI it just
happens slightly earlier.

-Andi
Re: [REGRESSION] x86, perf: counter freezing breaks rr
On Tue, Nov 20, 2018 at 11:16:42PM +0100, Peter Zijlstra wrote:
> Ooh, so the thing does FREEZE_ON_OVERFLOW _not_ FREEZE_ON_PMI. Yes, that
> can be a big difference.
>
> See, FREEZE_ON_PMI, as advertised by the name, should have no observable
> effect on counters limited to USR. But something like FREEZE_ON_OVERFLOW
> will lose everything between the overflow and the eventual PMI, and by
> freezing early we can't even compensate for it anymore either,
> introducing drift in the period.
>
> And I don't buy the over-count argument, the counter register shows how
> far over you are; it triggers the overflow when we cross 0, it then
> continues counting. So if you really care, you can throw away the
> 'over-count' at PMI time. That doesn't make it more reliable. We don't
> magically get pt_regs from earlier on or any other state.
>
> The only thing where it might make a difference is if you're running
> multiple counters (groups in perf speak) and want to correlate the count
> values. Then, and only then, does it matter.

In fact, I'll argue FREEZE_ON_OVERFLOW is unfixably broken for
independent counters, because while one counter overflows, we'll stall
counting on all others until we've handled the PMI.

Even though the PMI might not be for them and they very much want/need
to continue counting.
Re: [REGRESSION] x86, perf: counter freezing breaks rr
On Tue, Nov 20, 2018 at 01:46:05PM -0800, Kyle Huey wrote:
> On Tue, Nov 20, 2018 at 1:18 PM Andi Kleen wrote:
> > > I suppose that's fair that it's better for some use cases. The flip
> > > side is that it's no longer possible to get exactly accurate counts
> > > from user space if you're using the PMI (because any events between
> > > the overflow itself and the transition to the PMI handler are
> > > permanently lost) which is catastrophically bad for us :)
> >
> > Yes that's a fair point. For most usages it doesn't matter.
> >
> > I suspect that's a case for supporting opt-out for freezing
> > per perf event, and rr using that.
>
> I don't see how you could easily opt out on a per-perf-event basis. If
> I'm reading the SDM correctly the Freeze_PerfMon_On_PMI setting is
> global and affects all counters on that CPU. Even counters that don't
> use the PMI at all will still be frozen if another counter overflows
> and counter freezing is enabled. It would seem that a counter that
> wants to use counter freezing and a counter that wants the behavior we
> want would be mutually exclusive. I suppose the kernel could handle
> all of that but it's a bit involved.

Yes, it's a per-CPU setting. You wouldn't be able to opt in. If anyone
opts out on a CPU it would be disabled on that CPU while that event is
active.

-Andi
Re: [REGRESSION] x86, perf: counter freezing breaks rr
On Tue, Nov 20, 2018 at 12:11:44PM -0800, Andi Kleen wrote:
> > > > Given that we're already at rc3, and that this renders rr unusable,
> > > > we'd ask that counter freezing be disabled for the 4.20 release.
> > >
> > > The boot option should be good enough for the release?
> >
> > I'm not entirely sure what you mean here. We want you to flip the
> > default boot option so this feature is off for this release. i.e. rr
> > should work by default on 4.20 and people should have to opt into the
> > inaccurate behavior if they want faster PMI servicing.
>
> I don't think it's inaccurate, it's just different
> than what you are used to.
>
> For profiling including the kernel it's actually far more accurate
> because the count is stopped much earlier near the sampling
> point. Otherwise there is a considerable over count into
> the PMI handler.
>
> In your case you limit the count to ring 3 so it's always cut off
> at the transition point into the kernel, while with freezing
> it's at the overflow point.

Ooh, so the thing does FREEZE_ON_OVERFLOW _not_ FREEZE_ON_PMI. Yes, that
can be a big difference.

See, FREEZE_ON_PMI, as advertised by the name, should have no observable
effect on counters limited to USR. But something like FREEZE_ON_OVERFLOW
will lose everything between the overflow and the eventual PMI, and by
freezing early we can't even compensate for it anymore either,
introducing drift in the period.

And I don't buy the over-count argument, the counter register shows how
far over you are; it triggers the overflow when we cross 0, it then
continues counting. So if you really care, you can throw away the
'over-count' at PMI time. That doesn't make it more reliable. We don't
magically get pt_regs from earlier on or any other state.

The only thing where it might make a difference is if you're running
multiple counters (groups in perf speak) and want to correlate the count
values. Then, and only then, does it matter.

Bah.
Re: [REGRESSION] x86, perf: counter freezing breaks rr
On Tue, Nov 20, 2018 at 1:18 PM Andi Kleen wrote:
> > I suppose that's fair that it's better for some use cases. The flip
> > side is that it's no longer possible to get exactly accurate counts
> > from user space if you're using the PMI (because any events between
> > the overflow itself and the transition to the PMI handler are
> > permanently lost) which is catastrophically bad for us :)
>
> Yes that's a fair point. For most usages it doesn't matter.
>
> I suspect that's a case for supporting opt-out for freezing
> per perf event, and rr using that.

I don't see how you could easily opt out on a per-perf-event basis. If
I'm reading the SDM correctly the Freeze_PerfMon_On_PMI setting is
global and affects all counters on that CPU. Even counters that don't
use the PMI at all will still be frozen if another counter overflows
and counter freezing is enabled. It would seem that a counter that
wants to use counter freezing and a counter that wants the behavior we
want would be mutually exclusive. I suppose the kernel could handle
all of that but it's a bit involved.

- Kyle
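[Editorial sketch: the global setting discussed above is a bit in the per-CPU IA32_DEBUGCTL MSR. The bit position and MSR number below are taken from my reading of the SDM and the kernel's msr-index.h, not from this thread, so treat them as assumptions; the helper name is hypothetical.]

```c
#include <assert.h>
#include <stdint.h>

/* Freeze_PerfMon_On_PMI is bit 12 of IA32_DEBUGCTL (MSR 0x1d9), per my
 * reading of the SDM. The MSR is per-logical-CPU state, which is why
 * the setting cannot be scoped to a single perf event. */
#define MSR_IA32_DEBUGCTL               0x000001d9
#define DEBUGCTL_FREEZE_PERFMON_ON_PMI  (1ULL << 12)

/* Compute the new DEBUGCTL value. An actual wrmsr of this value would
 * affect every counter on the CPU, including counters that never use
 * the PMI -- the mutual-exclusion problem described above. */
static uint64_t debugctl_with_freezing(uint64_t debugctl, int enable)
{
    return enable ? (debugctl |  DEBUGCTL_FREEZE_PERFMON_ON_PMI)
                  : (debugctl & ~DEBUGCTL_FREEZE_PERFMON_ON_PMI);
}
```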
Re: [REGRESSION] x86, perf: counter freezing breaks rr
On Tue, Nov 20, 2018 at 1:19 PM Stephane Eranian wrote:
> [...]
> Let me make sure I got this right. During recording, you count on
> retired-cond-branch and you record the value of the PMU counter at
> specific locations, e.g., syscalls. During replay, you program the
> branch-conditional-retired to overflow on interrupt at each recorded
> value. So if you sampled the event at 1,000,000 and then at 1,500,000,
> then you program the event with a period of 1,000,000 first; on
> overflow the counter interrupts and you get a signal. Then, you
> reprogram the event for a new period of 500,000. During recording and
> replay the event is limited to ring 3 (user level). Am I understanding
> this right?

This is largely correct, except that we only program the interrupt for
events that we would not naturally stop at during the course of
execution, such as asynchronous signals or context switch points. At
events that we would naturally stop at (i.e. we can stop at syscalls
via ptrace) we simply check that the counters match to find any
discrepancies faster, before they affect an async signal delivery.

Let's say I have the following event sequence:

1. alarm syscall at rbc=1000
2. SIGALRM delivery at rbc=8000
3. exit syscall at rbc=9000

During replay, we begin the program and run to the syscall via a
PTRACE_SYSCALL ptrace. When the replayed process stops, we check that
the value of the rbc counter is 1000 (we also check that all registers
match what we recorded) and then we emulate the effects of the syscall
on the replayed process's registers and memory.

Then we see that the next event is an asynchronous signal, and we
program the rbc counter to interrupt after an additional
(8000 - 1000 - SKID_SIZE) events (where SKID_SIZE has been chosen by
experimentation to ensure that the PMU interrupt is not delivered
*after* the point in the program we care about; for Skylake this value
is 100). We then resume the program with a PTRACE_CONT ptrace and wait
for the PMI to stop the replayed tracee. We advance the program to the
exact point that we care about through a combination of breakpoints and
singlestepping, and then deliver the SIGALRM.

Once that is done, we see that the next event is the exit syscall, and
we again do a PTRACE_SYSCALL ptrace to get to it. Once there we check
that the rbc counter value and registers match what was recorded, and
perform the syscall.
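[Editorial sketch: the period arithmetic described above, with the SKID_SIZE value and example counts taken from this mail. The function name is hypothetical, not rr's actual code.]

```c
#include <stdint.h>

/* Empirically chosen safety margin for Skylake, per the mail above:
 * the PMI must never land past the point of interest. */
#define SKID_SIZE 100

/* How many further retired conditional branches to count before
 * requesting a PMI, given the rbc value at the last synchronous stop
 * and the recorded rbc value of the next asynchronous event. */
static int64_t rbc_period_for_async_event(int64_t current_rbc,
                                          int64_t target_rbc)
{
    /* Stop SKID_SIZE branches early; the tracee is then advanced the
     * rest of the way with breakpoints and single-stepping. */
    return target_rbc - current_rbc - SKID_SIZE;
}
```

For the example sequence above, stopping at the alarm syscall (rbc=1000) and aiming for the SIGALRM delivery (rbc=8000) gives a programmed period of 6900.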
Our counters are always restricted to ring 3 in both recording and
replay.

- Kyle
Re: [REGRESSION] x86, perf: counter freezing breaks rr
On Tue, Nov 20, 2018 at 12:53 PM Kyle Huey wrote:
> On Tue, Nov 20, 2018 at 12:11 PM Andi Kleen wrote:
> > [...]
> > In your case you limit the count to ring 3 so it's always cut off
> > at the transition point into the kernel, while with freezing
> > it's at the overflow point.
>
> I suppose that's fair that it's better for some use cases. The flip
> side is that it's no longer possible to get exactly accurate counts
> from user space if you're using the PMI (because any events between
> the overflow itself and the transition to the PMI handler are
> permanently lost) which is catastrophically bad for us :)

Let me make sure I got this right. During recording, you count on
retired-cond-branch and you record the value of the PMU counter at
specific locations, e.g., syscalls. During replay, you program the
branch-conditional-retired to overflow on interrupt at each recorded
value. So if you sampled the event at 1,000,000 and then at 1,500,000,
then you program the event with a period of 1,000,000 first; on
overflow the counter interrupts and you get a signal. Then, you
reprogram the event for a new period of 500,000. During recording and
replay the event is limited to ring 3 (user level). Am I understanding
this right?
Re: [REGRESSION] x86, perf: counter freezing breaks rr
> I suppose that's fair that it's better for some use cases. The flip
> side is that it's no longer possible to get exactly accurate counts
> from user space if you're using the PMI (because any events between
> the overflow itself and the transition to the PMI handler are
> permanently lost) which is catastrophically bad for us :)

Yes that's a fair point. For most usages it doesn't matter.

I suspect that's a case for supporting opt-out for freezing
per perf event, and rr using that.

-Andi
Re: [REGRESSION] x86, perf: counter freezing breaks rr
On Tue, Nov 20, 2018 at 12:11 PM Andi Kleen wrote:
> > > > Given that we're already at rc3, and that this renders rr unusable,
> > > > we'd ask that counter freezing be disabled for the 4.20 release.
> > >
> > > The boot option should be good enough for the release?
> >
> > I'm not entirely sure what you mean here. We want you to flip the
> > default boot option so this feature is off for this release. i.e. rr
> > should work by default on 4.20 and people should have to opt into the
> > inaccurate behavior if they want faster PMI servicing.
>
> I don't think it's inaccurate, it's just different
> than what you are used to.
>
> For profiling including the kernel it's actually far more accurate
> because the count is stopped much earlier near the sampling
> point. Otherwise there is a considerable over count into
> the PMI handler.
>
> In your case you limit the count to ring 3 so it's always cut off
> at the transition point into the kernel, while with freezing
> it's at the overflow point.

I suppose that's fair that it's better for some use cases. The flip
side is that it's no longer possible to get exactly accurate counts
from user space if you're using the PMI (because any events between
the overflow itself and the transition to the PMI handler are
permanently lost) which is catastrophically bad for us :)

- Kyle
Re: [REGRESSION] x86, perf: counter freezing breaks rr
> > > Given that we're already at rc3, and that this renders rr unusable,
> > > we'd ask that counter freezing be disabled for the 4.20 release.
> >
> > The boot option should be good enough for the release?
>
> I'm not entirely sure what you mean here. We want you to flip the
> default boot option so this feature is off for this release. i.e. rr
> should work by default on 4.20 and people should have to opt into the
> inaccurate behavior if they want faster PMI servicing.

I don't think it's inaccurate, it's just different
than what you are used to.

For profiling including the kernel it's actually far more accurate
because the count is stopped much earlier near the sampling
point. Otherwise there is a considerable over count into
the PMI handler.

In your case you limit the count to ring 3 so it's always cut off
at the transition point into the kernel, while with freezing
it's at the overflow point.

-Andi
Re: [REGRESSION] x86, perf: counter freezing breaks rr
On Tue, Nov 20, 2018 at 11:41 AM Andi Kleen wrote:
> > rr, a userspace record and replay debugger[0], uses the PMU interrupt
> > (PMI) to stop a program during replay to inject asynchronous events
> > such as signals. With perf counter freezing enabled we are reliably
> > seeing perf event overcounts during replay. This behavior is easily
>
> It's hard to see how it could over count since the PMU freezes
> earlier than the PMI with freezing. So it should count less.
> Did you mean under count?

Yes, I did mean under count, see my last email.

> > Given that we're already at rc3, and that this renders rr unusable,
> > we'd ask that counter freezing be disabled for the 4.20 release.
>
> The boot option should be good enough for the release?

I'm not entirely sure what you mean here. We want you to flip the
default boot option so this feature is off for this release. i.e. rr
should work by default on 4.20 and people should have to opt into the
inaccurate behavior if they want faster PMI servicing.

> A reasonable future option would be to expose an option to disable it in
> the perf_event. Then rr could set it.

- Kyle
Re: [REGRESSION] x86, perf: counter freezing breaks rr
On Tue, Nov 20, 2018 at 10:16 AM Stephane Eranian wrote:
> I would like to understand better the PMU behavior you are relying upon
> and why the V4 freeze approach is breaking it. Could you elaborate?

I investigated a bit more to write this response and discovered that my
initial characterization of the problem as an overcount during replay is
incorrect; what we are actually seeing is an undercount during
recording.

rr relies on the userspace retired-conditional-branches counter being
exactly the same between recording and replay. The primary reason we do
this is to establish a program "timeline", allowing us to find the
correct place to inject asynchronous signals during replay (the program
counter plus the retired-conditional-branches counter value uniquely
identifies a point in most programs). Because we run the rcb counter
during recording, we piggyback on it by programming it to interrupt the
program every few hundred thousand branches to give us a chance to
context switch to a different program thread.

We've found that with counter freezing enabled, when the PMI fires, the
reported value of the retired conditional branches counter is low by
something on the order of 10 branches. In a single threaded program,
although the PMI fires, we don't actually record a context switch or the
counter value at this point. We continue on to the next tracee event
(e.g. a syscall) and record the counter value at that point. Then,
during replay, we replay to the syscall and check that the replay
counter value matches the recorded value and find that it is too high.
(NB: during a single threaded replay the PMI is not used here because
there is no asynchronous event.)

Repeatedly recording the same program produces traces that have
different recorded retired-conditional-branch counter values after the
first PMI fired during recording, but during replay we always count off
the same number of branches, further suggesting that the replay value is
correct.
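[Editorial sketch: a counter of the sort described above -- userspace-only retired conditional branches, interrupting every so many branches -- could be configured roughly as follows. The raw event encoding 0x01c4 (BR_INST_RETIRED.CONDITIONAL on recent Intel cores) is my assumption, not taken from this mail; this is not rr's actual code.]

```c
#include <stdint.h>
#include <string.h>
#include <linux/perf_event.h>

/* Build a perf_event_attr for a ring-3-only retired-conditional-branch
 * counter that raises a PMI every 'period' branches. */
static struct perf_event_attr rcb_attr(uint64_t period)
{
    struct perf_event_attr attr;

    memset(&attr, 0, sizeof(attr));
    attr.size = sizeof(attr);
    attr.type = PERF_TYPE_RAW;
    attr.config = 0x01c4;        /* umask 0x01, event 0xc4: assumed
                                  * BR_INST_RETIRED.CONDITIONAL */
    attr.exclude_kernel = 1;     /* count ring 3 only */
    attr.exclude_hv = 1;
    attr.sample_period = period; /* overflow interrupt cadence */
    return attr;
}
```

An fd from perf_event_open() on such an attr would then be wired to a signal; the point here is only that the count is cut off at the ring-3 boundary regardless of whether freezing is enabled.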
And finally, recordings made on a kernel with counter freezing active
still fail to replay on a kernel without counter freezing active.

I don't know what the underlying mechanism for the loss of counter
events is (e.g. whether it's incorrect code in the interrupt handler, a
silicon bug, or what) but it's clear that the counter freezing
implementation is causing events to be lost.

- Kyle
Re: [REGRESSION] x86, perf: counter freezing breaks rr
> rr, a userspace record and replay debugger[0], uses the PMU interrupt
> (PMI) to stop a program during replay to inject asynchronous events
> such as signals. With perf counter freezing enabled we are reliably
> seeing perf event overcounts during replay. This behavior is easily

It's hard to see how it could over count since the PMU freezes
earlier than the PMI with freezing. So it should count less.
Did you mean under count?

> Given that we're already at rc3, and that this renders rr unusable,
> we'd ask that counter freezing be disabled for the 4.20 release.

The boot option should be good enough for the release?

A reasonable future option would be to expose an option to disable it in
the perf_event. Then rr could set it.

-Andi
Re: [REGRESSION] x86, perf: counter freezing breaks rr
On Tue, Nov 20, 2018 at 9:08 AM Peter Zijlstra wrote:
> On Tue, Nov 20, 2018 at 08:19:39AM -0800, Kyle Huey wrote:
> > tl;dr: rr is currently broken on 4.20rc2, which I bisected to
> > af3bdb991a5cb57c189d34aadbd3aa88995e0d9f. I further confirmed that
> > booting the 4.20rc2 kernel with `disable_counter_freezing=true` allows
> > rr to work.
> >
> > rr, a userspace record and replay debugger[0], uses the PMU interrupt
> > (PMI) to stop a program during replay to inject asynchronous events
> > such as signals. With perf counter freezing enabled we are reliably
> > seeing perf event overcounts during replay. This behavior is easily
> > demonstrated by attempting to record and replay the `alarm` test from
> > rr's test suite. Through bisection I determined that [1] is the first
> > bad commit, and further testing showed that booting the kernel with
> > `disable_counter_freezing=true` fixes rr.

I would like to understand better the PMU behavior you are relying upon
and why the V4 freeze approach is breaking it. Could you elaborate?

> > This behavior has been observed on two different CPUs (a Core i7-6700K
> > and a Xeon E3-1505M v5). We have no reason to believe it is limited to
> > specific CPU models, this information is included only for
> > completeness.
> >
> > Given that we're already at rc3, and that this renders rr unusable,
> > we'd ask that counter freezing be disabled for the 4.20 release.
>
> Andi, can you have a look at this?
>
> Meanwhile, I suppose we should do something along these lines.
>
> ---
> Subject: perf/x86/intel: Default disable perfmon v4 interrupt handling
>
> Rework the 'disable_counter_freezing' __setup() parameter such that we
> can explicitly enable/disable it and switch to default disabled.
>
> To this purpose, rename the parameter to "perf_v4_pmi=" which is a much
> better description and allows requiring a bool argument.
>
> Signed-off-by: Peter Zijlstra (Intel)
> ---
>  Documentation/admin-guide/kernel-parameters.txt |  3 ++-
>  arch/x86/events/intel/core.c                    | 12 ++++++++----
>  2 files changed, 10 insertions(+), 5 deletions(-)
>
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index 76c82c01bf5e..ff6d1d4229e0 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -856,7 +856,8 @@
>  			causing system reset or hang due to sending
>  			INIT from AP to BSP.
>
> -	disable_counter_freezing [HW]
> +	perf_v4_pmi=	[X86,INTEL]
> +			Format: <bool>
>  			Disable Intel PMU counter freezing feature.
>  			The feature only exists starting from
>  			Arch Perfmon v4 (Skylake and newer).
>
> diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
> index 273c62e81546..af8bea9d4006 100644
> --- a/arch/x86/events/intel/core.c
> +++ b/arch/x86/events/intel/core.c
> @@ -2306,14 +2306,18 @@ static int handle_pmi_common(struct pt_regs *regs, u64 status)
>  	return handled;
>  }
>
> -static bool disable_counter_freezing;
> +static bool disable_counter_freezing = true;
>  static int __init intel_perf_counter_freezing_setup(char *s)
>  {
> -	disable_counter_freezing = true;
> -	pr_info("Intel PMU Counter freezing feature disabled\n");
> +	bool res;
> +
> +	if (kstrtobool(s, &res))
> +		return -EINVAL;
> +
> +	disable_counter_freezing = !res;
>  	return 1;
>  }
> -__setup("disable_counter_freezing", intel_perf_counter_freezing_setup);
> +__setup("perf_v4_pmi=", intel_perf_counter_freezing_setup);
>
>  /*
>   * Simplified handler for Arch Perfmon v4:
Re: [REGRESSION] x86, perf: counter freezing breaks rr
On Tue, Nov 20, 2018 at 08:19:39AM -0800, Kyle Huey wrote:
> tl;dr: rr is currently broken on 4.20rc2, which I bisected to
> af3bdb991a5cb57c189d34aadbd3aa88995e0d9f. I further confirmed that
> booting the 4.20rc2 kernel with `disable_counter_freezing=true` allows
> rr to work.
>
> rr, a userspace record and replay debugger[0], uses the PMU interrupt
> (PMI) to stop a program during replay to inject asynchronous events
> such as signals. With perf counter freezing enabled we are reliably
> seeing perf event overcounts during replay. This behavior is easily
> demonstrated by attempting to record and replay the `alarm` test from
> rr's test suite. Through bisection I determined that [1] is the first
> bad commit, and further testing showed that booting the kernel with
> `disable_counter_freezing=true` fixes rr.
>
> This behavior has been observed on two different CPUs (a Core i7-6700K
> and a Xeon E3-1505M v5). We have no reason to believe it is limited to
> specific CPU models, this information is included only for
> completeness.
>
> Given that we're already at rc3, and that this renders rr unusable,
> we'd ask that counter freezing be disabled for the 4.20 release.

Andi, can you have a look at this?

Meanwhile, I suppose we should do something along these lines.

---
Subject: perf/x86/intel: Default disable perfmon v4 interrupt handling

Rework the 'disable_counter_freezing' __setup() parameter such that we
can explicitly enable/disable it and switch to default disabled.

To this purpose, rename the parameter to "perf_v4_pmi=" which is a much
better description and allows requiring a bool argument.

Signed-off-by: Peter Zijlstra (Intel)
---
 Documentation/admin-guide/kernel-parameters.txt |  3 ++-
 arch/x86/events/intel/core.c                    | 12 ++++++++----
 2 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 76c82c01bf5e..ff6d1d4229e0 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -856,7 +856,8 @@
 			causing system reset or hang due to sending
 			INIT from AP to BSP.
 
-	disable_counter_freezing [HW]
+	perf_v4_pmi=	[X86,INTEL]
+			Format: <bool>
 			Disable Intel PMU counter freezing feature.
 			The feature only exists starting from
 			Arch Perfmon v4 (Skylake and newer).

diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 273c62e81546..af8bea9d4006 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -2306,14 +2306,18 @@ static int handle_pmi_common(struct pt_regs *regs, u64 status)
 	return handled;
 }
 
-static bool disable_counter_freezing;
+static bool disable_counter_freezing = true;
 static int __init intel_perf_counter_freezing_setup(char *s)
 {
-	disable_counter_freezing = true;
-	pr_info("Intel PMU Counter freezing feature disabled\n");
+	bool res;
+
+	if (kstrtobool(s, &res))
+		return -EINVAL;
+
+	disable_counter_freezing = !res;
 	return 1;
 }
-__setup("disable_counter_freezing", intel_perf_counter_freezing_setup);
+__setup("perf_v4_pmi=", intel_perf_counter_freezing_setup);
 
 /*
  * Simplified handler for Arch Perfmon v4:
[REGRESSION] x86, perf: counter freezing breaks rr
tl;dr: rr is currently broken on 4.20rc2, which I bisected to
af3bdb991a5cb57c189d34aadbd3aa88995e0d9f. I further confirmed that
booting the 4.20rc2 kernel with `disable_counter_freezing=true` allows
rr to work.

rr, a userspace record and replay debugger[0], uses the PMU interrupt
(PMI) to stop a program during replay to inject asynchronous events
such as signals. With perf counter freezing enabled we are reliably
seeing perf event overcounts during replay. This behavior is easily
demonstrated by attempting to record and replay the `alarm` test from
rr's test suite. Through bisection I determined that [1] is the first
bad commit, and further testing showed that booting the kernel with
`disable_counter_freezing=true` fixes rr.

This behavior has been observed on two different CPUs (a Core i7-6700K
and a Xeon E3-1505M v5). We have no reason to believe it is limited to
specific CPU models; this information is included only for completeness.

Given that we're already at rc3, and that this renders rr unusable,
we'd ask that counter freezing be disabled for the 4.20 release.

Thanks,

- Kyle

[0] https://rr-project.org/
[1] https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/commit/?id=af3bdb991a5cb57c189d34aadbd3aa88995e0d9f