Re: [REGRESSION] x86, perf: counter freezing breaks rr
On Tue, Nov 27, 2018 at 03:36:15PM -0800, Andi Kleen wrote:
> > It does seem that FREEZE_PERFMON_ON_PMI (misnamed as it is) is of
> > rather limited use (or even negative, in our case) to a counter that's
> > already restricted to ring 3.
>
> It's much faster. The PMI cost goes down dramatically.

Faster doesn't mean much when it's broken.
Re: [REGRESSION] x86, perf: counter freezing breaks rr
On 11/27/2018 8:25 PM, Stephane Eranian wrote:
> On Tue, Nov 27, 2018 at 3:36 PM Andi Kleen wrote:
> > > It does seem that FREEZE_PERFMON_ON_PMI (misnamed as it is) is of
> > > rather limited use (or even negative, in our case) to a counter that's
> > > already restricted to ring 3.
> >
> > It's much faster. The PMI cost goes down dramatically.
> >
> > I still think the right fix is to add a perf event opt-out and let it
> > be used by rr.
> >
> > V3 is without counter freezing.
> > V4 is with counter freezing.
> > The value is the average cost of the PMI handler.
> > (lower is better)
> >
> > perf options              V3(ns)  V4(ns)  delta
> > -c 10                     1088    894     -18%
> > -g -c 10                  1862    1646    -12%
> > --call-graph lbr -c 10    3649    3367    -8%
> > --c.g. dwarf -c 10        2248    1982    -12%
>
> Is that measured on the same machine, i.e., do you force V3 on Skylake?

Yes, it's measured on the same Kaby Lake machine with the
counter_freezing option disabled/enabled.

> All it does, I think, is save one wrmsr(GLOBAL_CTRL) on entry to the
> PMU interrupt handler or am I missing something? Or does it save two?
> The wrmsr(GLOBAL_CTRL) at the end to reactivate.

__intel_pmu_disable_all() and __intel_pmu_enable_all() are not called in
the V4 handler, so it saves at least two wrmsrl() calls.

Thanks,
Kan
Re: [REGRESSION] x86, perf: counter freezing breaks rr
On Tue, Nov 27, 2018 at 3:36 PM Andi Kleen wrote:
> > It does seem that FREEZE_PERFMON_ON_PMI (misnamed as it is) is of
> > rather limited use (or even negative, in our case) to a counter that's
> > already restricted to ring 3.
>
> It's much faster. The PMI cost goes down dramatically.
>
> I still think the right fix is to add a perf event opt-out and let it
> be used by rr.
>
> V3 is without counter freezing.
> V4 is with counter freezing.
> The value is the average cost of the PMI handler.
> (lower is better)
>
> perf options              V3(ns)  V4(ns)  delta
> -c 10                     1088    894     -18%
> -g -c 10                  1862    1646    -12%
> --call-graph lbr -c 10    3649    3367    -8%
> --c.g. dwarf -c 10        2248    1982    -12%
>
> -Andi

Is that measured on the same machine, i.e., do you force V3 on Skylake?

All it does, I think, is save one wrmsr(GLOBAL_CTRL) on entry to the PMU
interrupt handler or am I missing something? Or does it save two? The
wrmsr(GLOBAL_CTRL) at the end to reactivate.
Re: [REGRESSION] x86, perf: counter freezing breaks rr
> It does seem that FREEZE_PERFMON_ON_PMI (misnamed as it is) is of
> rather limited use (or even negative, in our case) to a counter that's
> already restricted to ring 3.

It's much faster. The PMI cost goes down dramatically.

I still think the right fix is to add a perf event opt-out and let it
be used by rr.

V3 is without counter freezing.
V4 is with counter freezing.
The value is the average cost of the PMI handler.
(lower is better)

perf options              V3(ns)  V4(ns)  delta
-c 10                     1088    894     -18%
-g -c 10                  1862    1646    -12%
--call-graph lbr -c 10    3649    3367    -8%
--c.g. dwarf -c 10        2248    1982    -12%

-Andi
Re: [REGRESSION] x86, perf: counter freezing breaks rr
On Wed, Nov 21, 2018 at 12:14 AM Peter Zijlstra wrote:
> On Tue, Nov 20, 2018 at 02:38:54PM -0800, Andi Kleen wrote:
> > > In fact, I'll argue FREEZE_ON_OVERFLOW is unfixably broken for
> > > independent counters, because while one counter overflows, we'll stall
> > > counting on all others until we've handled the PMI.
> > >
> > > Even though the PMI might not be for them and they very much want/need
> > > to continue counting.
> >
> > We stop all counters in any case for the PMI. With freeze-on-PMI it just
> > happens slightly earlier.
>
> Hiding the PMI is fine and good. The PMI is not the workload. Stopping
> it earlier is _NOT_ good, it hides your actual workload.

It does seem that FREEZE_PERFMON_ON_PMI (misnamed as it is) is of rather
limited use (or even negative, in our case) to a counter that's already
restricted to ring 3.

- Kyle
Re: [REGRESSION] x86, perf: counter freezing breaks rr
On Tue, Nov 20, 2018 at 02:38:54PM -0800, Andi Kleen wrote:
> > In fact, I'll argue FREEZE_ON_OVERFLOW is unfixably broken for
> > independent counters, because while one counter overflows, we'll stall
> > counting on all others until we've handled the PMI.
> >
> > Even though the PMI might not be for them and they very much want/need
> > to continue counting.
>
> We stop all counters in any case for the PMI. With freeze-on-PMI it just
> happens slightly earlier.

Hiding the PMI is fine and good. The PMI is not the workload. Stopping
it earlier is _NOT_ good, it hides your actual workload.
Re: [REGRESSION] x86, perf: counter freezing breaks rr
> In fact, I'll argue FREEZE_ON_OVERFLOW is unfixably broken for
> independent counters, because while one counter overflows, we'll stall
> counting on all others until we've handled the PMI.
>
> Even though the PMI might not be for them and they very much want/need
> to continue counting.

We stop all counters in any case for the PMI. With freeze-on-PMI it just
happens slightly earlier.

-Andi
Re: [REGRESSION] x86, perf: counter freezing breaks rr
On Tue, Nov 20, 2018 at 11:16:42PM +0100, Peter Zijlstra wrote:
> Ooh, so the thing does FREEZE_ON_OVERFLOW _not_ FREEZE_ON_PMI. Yes, that
> can be a big difference.
>
> See, FREEZE_ON_PMI, as advertised by the name, should have no observable
> effect on counters limited to USR. But something like FREEZE_ON_OVERFLOW
> will lose everything between the overflow and the eventual PMI, and by
> freezing early we can't even compensate for it anymore either,
> introducing drift in the period.
>
> And I don't buy the over-count argument, the counter register shows how
> far over you are; it triggers the overflow when we cross 0, it then
> continues counting. So if you really care, you can throw away the
> 'over-count' at PMI time. That doesn't make it more reliable. We don't
> magically get pt_regs from earlier on or any other state.
>
> The only thing where it might make a difference is if you're running
> multiple counters (groups in perf speak) and want to correlate the count
> values. Then, and only then, does it matter.

In fact, I'll argue FREEZE_ON_OVERFLOW is unfixably broken for
independent counters, because while one counter overflows, we'll stall
counting on all others until we've handled the PMI.

Even though the PMI might not be for them and they very much want/need
to continue counting.
Re: [REGRESSION] x86, perf: counter freezing breaks rr
On Tue, Nov 20, 2018 at 01:46:05PM -0800, Kyle Huey wrote:
> On Tue, Nov 20, 2018 at 1:18 PM Andi Kleen wrote:
> > > I suppose that's fair that it's better for some use cases. The flip
> > > side is that it's no longer possible to get exactly accurate counts
> > > from user space if you're using the PMI (because any events between
> > > the overflow itself and the transition to the PMI handler are
> > > permanently lost) which is catastrophically bad for us :)
> >
> > Yes that's a fair point. For most usages it doesn't matter.
> >
> > I suspect that's a case for supporting opt-out for freezing
> > per perf event, and rr using that.
>
> I don't see how you could easily opt out on a per-perf-event basis. If
> I'm reading the SDM correctly the Freeze_PerfMon_On_PMI setting is
> global and affects all counters on that CPU. Even counters that don't
> use the PMI at all will still be frozen if another counter overflows
> and counter freezing is enabled. It would seem that a counter that
> wants to use counter freezing and a counter that wants the behavior we
> want would be mutually exclusive. I suppose the kernel could handle
> all of that but it's a bit involved.

Yes, it's a per-CPU setting. You wouldn't be able to opt in. If anyone
opts out on a CPU it would be disabled on that CPU while that event is
active.

-Andi
Re: [REGRESSION] x86, perf: counter freezing breaks rr
On Tue, Nov 20, 2018 at 12:11:44PM -0800, Andi Kleen wrote:
> > > > Given that we're already at rc3, and that this renders rr unusable,
> > > > we'd ask that counter freezing be disabled for the 4.20 release.
> > >
> > > The boot option should be good enough for the release?
> >
> > I'm not entirely sure what you mean here. We want you to flip the
> > default boot option so this feature is off for this release. i.e. rr
> > should work by default on 4.20 and people should have to opt into the
> > inaccurate behavior if they want faster PMI servicing.
>
> I don't think it's inaccurate, it's just different
> than what you are used to.
>
> For profiling including the kernel it's actually far more accurate
> because the count is stopped much earlier near the sampling
> point. Otherwise there is a considerable over count into
> the PMI handler.
>
> In your case you limit the count to ring 3 so it's always cut off
> at the transition point into the kernel, while with freezing
> it's at the overflow point.

Ooh, so the thing does FREEZE_ON_OVERFLOW _not_ FREEZE_ON_PMI. Yes, that
can be a big difference.

See, FREEZE_ON_PMI, as advertised by the name, should have no observable
effect on counters limited to USR. But something like FREEZE_ON_OVERFLOW
will lose everything between the overflow and the eventual PMI, and by
freezing early we can't even compensate for it anymore either,
introducing drift in the period.

And I don't buy the over-count argument, the counter register shows how
far over you are; it triggers the overflow when we cross 0, it then
continues counting. So if you really care, you can throw away the
'over-count' at PMI time. That doesn't make it more reliable. We don't
magically get pt_regs from earlier on or any other state.

The only thing where it might make a difference is if you're running
multiple counters (groups in perf speak) and want to correlate the count
values. Then, and only then, does it matter.

Bah.
Re: [REGRESSION] x86, perf: counter freezing breaks rr
On Tue, Nov 20, 2018 at 1:18 PM Andi Kleen wrote:
> > I suppose that's fair that it's better for some use cases. The flip
> > side is that it's no longer possible to get exactly accurate counts
> > from user space if you're using the PMI (because any events between
> > the overflow itself and the transition to the PMI handler are
> > permanently lost) which is catastrophically bad for us :)
>
> Yes that's a fair point. For most usages it doesn't matter.
>
> I suspect that's a case for supporting opt-out for freezing
> per perf event, and rr using that.

I don't see how you could easily opt out on a per-perf-event basis. If
I'm reading the SDM correctly the Freeze_PerfMon_On_PMI setting is
global and affects all counters on that CPU. Even counters that don't
use the PMI at all will still be frozen if another counter overflows
and counter freezing is enabled. It would seem that a counter that
wants to use counter freezing and a counter that wants the behavior we
want would be mutually exclusive. I suppose the kernel could handle
all of that but it's a bit involved.

- Kyle
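[Editorial sketch: the global setting discussed above is a bit in the per-CPU IA32_DEBUGCTL MSR. The bit position and MSR number below are taken from my reading of the SDM and the kernel's msr-index.h, not from this thread, so treat them as assumptions; the helper name is hypothetical.]

```c
#include <assert.h>
#include <stdint.h>

/* Freeze_PerfMon_On_PMI is bit 12 of IA32_DEBUGCTL (MSR 0x1d9), per my
 * reading of the SDM. The MSR is per-logical-CPU state, which is why
 * the setting cannot be scoped to a single perf event. */
#define MSR_IA32_DEBUGCTL               0x000001d9
#define DEBUGCTL_FREEZE_PERFMON_ON_PMI  (1ULL << 12)

/* Compute the new DEBUGCTL value. An actual wrmsr of this value would
 * affect every counter on the CPU, including counters that never use
 * the PMI -- the mutual-exclusion problem described above. */
static uint64_t debugctl_with_freezing(uint64_t debugctl, int enable)
{
    return enable ? (debugctl |  DEBUGCTL_FREEZE_PERFMON_ON_PMI)
                  : (debugctl & ~DEBUGCTL_FREEZE_PERFMON_ON_PMI);
}
```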
Re: [REGRESSION] x86, perf: counter freezing breaks rr
On Tue, Nov 20, 2018 at 1:19 PM Stephane Eranian wrote:
> [...]
> Let me make sure I got this right. During recording, you count on
> retired-cond-branch and you record the value of the PMU counter at
> specific locations, e.g., syscalls. During replay, you program the
> branch-conditional-retired to overflow on interrupt at each recorded
> value. So if you sampled the event at 1,000,000 and then at 1,500,000,
> then you program the event with a period of 1,000,000 first; on
> overflow the counter interrupts and you get a signal. Then, you
> reprogram the event for a new period of 500,000. During recording and
> replay the event is limited to ring 3 (user level). Am I understanding
> this right?

This is largely correct, except that we only program the interrupt for
events that we would not naturally stop at during the course of
execution, such as asynchronous signals or context switch points. At
events that we would naturally stop at (i.e. we can stop at syscalls
via ptrace) we simply check that the counters match to find any
discrepancies faster, before they affect an async signal delivery.

Let's say I have the following event sequence:

1. alarm syscall at rbc=1000
2. SIGALRM delivery at rbc=8000
3. exit syscall at rbc=9000

During replay, we begin the program and run to the syscall via a
PTRACE_SYSCALL ptrace. When the replayed process stops, we check that
the value of the rbc counter is 1000 (we also check that all registers
match what we recorded) and then we emulate the effects of the syscall
on the replayed process's registers and memory.

Then we see that the next event is an asynchronous signal, and we
program the rbc counter to interrupt after an additional
(8000 - 1000 - SKID_SIZE) events (where SKID_SIZE has been chosen by
experimentation to ensure that the PMU interrupt is not delivered
*after* the point in the program we care about; for Skylake this value
is 100). We then resume the program with a PTRACE_CONT ptrace and wait
for the PMI to stop the replayed tracee. We advance the program to the
exact point that we care about through a combination of breakpoints and
singlestepping, and then deliver the SIGALRM.

Once that is done, we see that the next event is the exit syscall, and
we again do a PTRACE_SYSCALL ptrace to get to it. Once there we check
that the rbc counter value and registers match what was recorded, and
perform the syscall.
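[Editorial sketch: the period arithmetic described above, with the SKID_SIZE value and example counts taken from this mail. The function name is hypothetical, not rr's actual code.]

```c
#include <stdint.h>

/* Empirically chosen safety margin for Skylake, per the mail above:
 * the PMI must never land past the point of interest. */
#define SKID_SIZE 100

/* How many further retired conditional branches to count before
 * requesting a PMI, given the rbc value at the last synchronous stop
 * and the recorded rbc value of the next asynchronous event. */
static int64_t rbc_period_for_async_event(int64_t current_rbc,
                                          int64_t target_rbc)
{
    /* Stop SKID_SIZE branches early; the tracee is then advanced the
     * rest of the way with breakpoints and single-stepping. */
    return target_rbc - current_rbc - SKID_SIZE;
}
```

For the example sequence above, stopping at the alarm syscall (rbc=1000) and aiming for the SIGALRM delivery (rbc=8000) gives a programmed period of 6900.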
Our counters are always restricted to ring 3 in both recording and
replay.

- Kyle
Re: [REGRESSION] x86, perf: counter freezing breaks rr
On Tue, Nov 20, 2018 at 12:53 PM Kyle Huey wrote:
> On Tue, Nov 20, 2018 at 12:11 PM Andi Kleen wrote:
> > [...]
> > In your case you limit the count to ring 3 so it's always cut off
> > at the transition point into the kernel, while with freezing
> > it's at the overflow point.
>
> I suppose that's fair that it's better for some use cases. The flip
> side is that it's no longer possible to get exactly accurate counts
> from user space if you're using the PMI (because any events between
> the overflow itself and the transition to the PMI handler are
> permanently lost) which is catastrophically bad for us :)

Let me make sure I got this right. During recording, you count on
retired-cond-branch and you record the value of the PMU counter at
specific locations, e.g., syscalls. During replay, you program the
branch-conditional-retired to overflow on interrupt at each recorded
value. So if you sampled the event at 1,000,000 and then at 1,500,000,
then you program the event with a period of 1,000,000 first; on
overflow the counter interrupts and you get a signal. Then, you
reprogram the event for a new period of 500,000. During recording and
replay the event is limited to ring 3 (user level). Am I understanding
this right?
Re: [REGRESSION] x86, perf: counter freezing breaks rr
> I suppose that's fair that it's better for some use cases. The flip
> side is that it's no longer possible to get exactly accurate counts
> from user space if you're using the PMI (because any events between
> the overflow itself and the transition to the PMI handler are
> permanently lost) which is catastrophically bad for us :)

Yes that's a fair point. For most usages it doesn't matter.

I suspect that's a case for supporting opt-out for freezing
per perf event, and rr using that.

-Andi
Re: [REGRESSION] x86, perf: counter freezing breaks rr
On Tue, Nov 20, 2018 at 12:11 PM Andi Kleen wrote:
> > > > Given that we're already at rc3, and that this renders rr unusable,
> > > > we'd ask that counter freezing be disabled for the 4.20 release.
> > >
> > > The boot option should be good enough for the release?
> >
> > I'm not entirely sure what you mean here. We want you to flip the
> > default boot option so this feature is off for this release. i.e. rr
> > should work by default on 4.20 and people should have to opt into the
> > inaccurate behavior if they want faster PMI servicing.
>
> I don't think it's inaccurate, it's just different
> than what you are used to.
>
> For profiling including the kernel it's actually far more accurate
> because the count is stopped much earlier near the sampling
> point. Otherwise there is a considerable over count into
> the PMI handler.
>
> In your case you limit the count to ring 3 so it's always cut off
> at the transition point into the kernel, while with freezing
> it's at the overflow point.

I suppose that's fair that it's better for some use cases. The flip
side is that it's no longer possible to get exactly accurate counts
from user space if you're using the PMI (because any events between
the overflow itself and the transition to the PMI handler are
permanently lost) which is catastrophically bad for us :)

- Kyle
Re: [REGRESSION] x86, perf: counter freezing breaks rr
> > > Given that we're already at rc3, and that this renders rr unusable,
> > > we'd ask that counter freezing be disabled for the 4.20 release.
> >
> > The boot option should be good enough for the release?
>
> I'm not entirely sure what you mean here. We want you to flip the
> default boot option so this feature is off for this release. i.e. rr
> should work by default on 4.20 and people should have to opt into the
> inaccurate behavior if they want faster PMI servicing.

I don't think it's inaccurate, it's just different
than what you are used to.

For profiling including the kernel it's actually far more accurate
because the count is stopped much earlier near the sampling
point. Otherwise there is a considerable over count into
the PMI handler.

In your case you limit the count to ring 3 so it's always cut off
at the transition point into the kernel, while with freezing
it's at the overflow point.

-Andi
Re: [REGRESSION] x86, perf: counter freezing breaks rr
On Tue, Nov 20, 2018 at 11:41 AM Andi Kleen wrote:
> > rr, a userspace record and replay debugger[0], uses the PMU interrupt
> > (PMI) to stop a program during replay to inject asynchronous events
> > such as signals. With perf counter freezing enabled we are reliably
> > seeing perf event overcounts during replay. This behavior is easily
>
> It's hard to see how it could over count since the PMU freezes
> earlier than the PMI with freezing. So it should count less.
> Did you mean under count?

Yes, I did mean under count, see my last email.

> > Given that we're already at rc3, and that this renders rr unusable,
> > we'd ask that counter freezing be disabled for the 4.20 release.
>
> The boot option should be good enough for the release?

I'm not entirely sure what you mean here. We want you to flip the
default boot option so this feature is off for this release. i.e. rr
should work by default on 4.20 and people should have to opt into the
inaccurate behavior if they want faster PMI servicing.

> A reasonable future option would be to expose an option to disable it in
> the perf_event. Then rr could set it.

- Kyle
Re: [REGRESSION] x86, perf: counter freezing breaks rr
On Tue, Nov 20, 2018 at 10:16 AM Stephane Eranian wrote:
> I would like to understand better the PMU behavior you are relying upon
> and why the V4 freeze approach is breaking it. Could you elaborate?

I investigated a bit more to write this response and discovered that my
initial characterization of the problem as an overcount during replay is
incorrect; what we are actually seeing is an undercount during
recording.

rr relies on the userspace retired-conditional-branches counter being
exactly the same between recording and replay. The primary reason we do
this is to establish a program "timeline", allowing us to find the
correct place to inject asynchronous signals during replay (the program
counter plus the retired-conditional-branches counter value uniquely
identifies a point in most programs). Because we run the rcb counter
during recording, we piggyback on it by programming it to interrupt the
program every few hundred thousand branches to give us a chance to
context switch to a different program thread.

We've found that with counter freezing enabled, when the PMI fires, the
reported value of the retired conditional branches counter is low by
something on the order of 10 branches. In a single threaded program,
although the PMI fires, we don't actually record a context switch or the
counter value at this point. We continue on to the next tracee event
(e.g. a syscall) and record the counter value at that point. Then,
during replay, we replay to the syscall and check that the replay
counter value matches the recorded value and find that it is too high.
(NB: during a single threaded replay the PMI is not used here because
there is no asynchronous event.)

Repeatedly recording the same program produces traces that have
different recorded retired-conditional-branch counter values after the
first PMI fired during recording, but during replay we always count off
the same number of branches, further suggesting that the replay value is
correct.
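[Editorial sketch: a counter of the sort described above -- userspace-only retired conditional branches, interrupting every so many branches -- could be configured roughly as follows. The raw event encoding 0x01c4 (BR_INST_RETIRED.CONDITIONAL on recent Intel cores) is my assumption, not taken from this mail; this is not rr's actual code.]

```c
#include <stdint.h>
#include <string.h>
#include <linux/perf_event.h>

/* Build a perf_event_attr for a ring-3-only retired-conditional-branch
 * counter that raises a PMI every 'period' branches. */
static struct perf_event_attr rcb_attr(uint64_t period)
{
    struct perf_event_attr attr;

    memset(&attr, 0, sizeof(attr));
    attr.size = sizeof(attr);
    attr.type = PERF_TYPE_RAW;
    attr.config = 0x01c4;        /* umask 0x01, event 0xc4: assumed
                                  * BR_INST_RETIRED.CONDITIONAL */
    attr.exclude_kernel = 1;     /* count ring 3 only */
    attr.exclude_hv = 1;
    attr.sample_period = period; /* overflow interrupt cadence */
    return attr;
}
```

An fd from perf_event_open() on such an attr would then be wired to a signal; the point here is only that the count is cut off at the ring-3 boundary regardless of whether freezing is enabled.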
And finally, recordings made on a kernel with counter freezing active
still fail to replay on a kernel without counter freezing active.

I don't know what the underlying mechanism for the loss of counter
events is (e.g. whether it's incorrect code in the interrupt handler, a
silicon bug, or what) but it's clear that the counter freezing
implementation is causing events to be lost.

- Kyle
Re: [REGRESSION] x86, perf: counter freezing breaks rr
> rr, a userspace record and replay debugger[0], uses the PMU interrupt
> (PMI) to stop a program during replay to inject asynchronous events
> such as signals. With perf counter freezing enabled we are reliably
> seeing perf event overcounts during replay. This behavior is easily

It's hard to see how it could over count since the PMU freezes
earlier than the PMI with freezing. So it should count less.
Did you mean under count?

> Given that we're already at rc3, and that this renders rr unusable,
> we'd ask that counter freezing be disabled for the 4.20 release.

The boot option should be good enough for the release?

A reasonable future option would be to expose an option to disable it in
the perf_event. Then rr could set it.

-Andi
Re: [REGRESSION] x86, perf: counter freezing breaks rr
On Tue, Nov 20, 2018 at 9:08 AM Peter Zijlstra wrote:
> On Tue, Nov 20, 2018 at 08:19:39AM -0800, Kyle Huey wrote:
> > tl;dr: rr is currently broken on 4.20rc2, which I bisected to
> > af3bdb991a5cb57c189d34aadbd3aa88995e0d9f. I further confirmed that
> > booting the 4.20rc2 kernel with `disable_counter_freezing=true` allows
> > rr to work.
> >
> > rr, a userspace record and replay debugger[0], uses the PMU interrupt
> > (PMI) to stop a program during replay to inject asynchronous events
> > such as signals. With perf counter freezing enabled we are reliably
> > seeing perf event overcounts during replay. This behavior is easily
> > demonstrated by attempting to record and replay the `alarm` test from
> > rr's test suite. Through bisection I determined that [1] is the first
> > bad commit, and further testing showed that booting the kernel with
> > `disable_counter_freezing=true` fixes rr.

I would like to understand better the PMU behavior you are relying upon
and why the V4 freeze approach is breaking it. Could you elaborate?

> > This behavior has been observed on two different CPUs (a Core i7-6700K
> > and a Xeon E3-1505M v5). We have no reason to believe it is limited to
> > specific CPU models, this information is included only for
> > completeness.
> >
> > Given that we're already at rc3, and that this renders rr unusable,
> > we'd ask that counter freezing be disabled for the 4.20 release.
>
> Andi, can you have a look at this?
>
> Meanwhile, I suppose we should do something along these lines.
>
> ---
> Subject: perf/x86/intel: Default disable perfmon v4 interrupt handling
>
> Rework the 'disable_counter_freezing' __setup() parameter such that we
> can explicitly enable/disable it and switch to default disabled.
>
> To this purpose, rename the parameter to "perf_v4_pmi=" which is a much
> better description and allows requiring a bool argument.
>
> Signed-off-by: Peter Zijlstra (Intel)
> ---
>  Documentation/admin-guide/kernel-parameters.txt |  3 ++-
>  arch/x86/events/intel/core.c                    | 12 ++++++++----
>  2 files changed, 10 insertions(+), 5 deletions(-)
>
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index 76c82c01bf5e..ff6d1d4229e0 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -856,7 +856,8 @@
>  			causing system reset or hang due to sending
>  			INIT from AP to BSP.
>
> -	disable_counter_freezing [HW]
> +	perf_v4_pmi=	[X86,INTEL]
> +			Format: <bool>
>  			Disable Intel PMU counter freezing feature.
>  			The feature only exists starting from
>  			Arch Perfmon v4 (Skylake and newer).
>
> diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
> index 273c62e81546..af8bea9d4006 100644
> --- a/arch/x86/events/intel/core.c
> +++ b/arch/x86/events/intel/core.c
> @@ -2306,14 +2306,18 @@ static int handle_pmi_common(struct pt_regs *regs, u64 status)
>  	return handled;
>  }
>
> -static bool disable_counter_freezing;
> +static bool disable_counter_freezing = true;
>  static int __init intel_perf_counter_freezing_setup(char *s)
>  {
> -	disable_counter_freezing = true;
> -	pr_info("Intel PMU Counter freezing feature disabled\n");
> +	bool res;
> +
> +	if (kstrtobool(s, &res))
> +		return -EINVAL;
> +
> +	disable_counter_freezing = !res;
>  	return 1;
>  }
> -__setup("disable_counter_freezing", intel_perf_counter_freezing_setup);
> +__setup("perf_v4_pmi=", intel_perf_counter_freezing_setup);
>
>  /*
>   * Simplified handler for Arch Perfmon v4:
Re: [REGRESSION] x86, perf: counter freezing breaks rr
On Tue, Nov 20, 2018 at 08:19:39AM -0800, Kyle Huey wrote:
> tl;dr: rr is currently broken on 4.20rc2, which I bisected to
> af3bdb991a5cb57c189d34aadbd3aa88995e0d9f. I further confirmed that
> booting the 4.20rc2 kernel with `disable_counter_freezing=true` allows
> rr to work.
>
> rr, a userspace record and replay debugger[0], uses the PMU interrupt
> (PMI) to stop a program during replay to inject asynchronous events
> such as signals. With perf counter freezing enabled we are reliably
> seeing perf event overcounts during replay. This behavior is easily
> demonstrated by attempting to record and replay the `alarm` test from
> rr's test suite. Through bisection I determined that [1] is the first
> bad commit, and further testing showed that booting the kernel with
> `disable_counter_freezing=true` fixes rr.
>
> This behavior has been observed on two different CPUs (a Core i7-6700K
> and a Xeon E3-1505M v5). We have no reason to believe it is limited to
> specific CPU models, this information is included only for
> completeness.
>
> Given that we're already at rc3, and that this renders rr unusable,
> we'd ask that counter freezing be disabled for the 4.20 release.

Andi, can you have a look at this?

Meanwhile, I suppose we should do something along these lines.

---
Subject: perf/x86/intel: Default disable perfmon v4 interrupt handling

Rework the 'disable_counter_freezing' __setup() parameter such that we
can explicitly enable/disable it and switch to default disabled.

To this purpose, rename the parameter to "perf_v4_pmi=" which is a much
better description and allows requiring a bool argument.

Signed-off-by: Peter Zijlstra (Intel)
---
 Documentation/admin-guide/kernel-parameters.txt |  3 ++-
 arch/x86/events/intel/core.c                    | 12 ++++++++----
 2 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 76c82c01bf5e..ff6d1d4229e0 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -856,7 +856,8 @@
 			causing system reset or hang due to sending
 			INIT from AP to BSP.
 
-	disable_counter_freezing [HW]
+	perf_v4_pmi=	[X86,INTEL]
+			Format: <bool>
 			Disable Intel PMU counter freezing feature.
 			The feature only exists starting from
 			Arch Perfmon v4 (Skylake and newer).

diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 273c62e81546..af8bea9d4006 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -2306,14 +2306,18 @@ static int handle_pmi_common(struct pt_regs *regs, u64 status)
 	return handled;
 }
 
-static bool disable_counter_freezing;
+static bool disable_counter_freezing = true;
 static int __init intel_perf_counter_freezing_setup(char *s)
 {
-	disable_counter_freezing = true;
-	pr_info("Intel PMU Counter freezing feature disabled\n");
+	bool res;
+
+	if (kstrtobool(s, &res))
+		return -EINVAL;
+
+	disable_counter_freezing = !res;
 	return 1;
 }
-__setup("disable_counter_freezing", intel_perf_counter_freezing_setup);
+__setup("perf_v4_pmi=", intel_perf_counter_freezing_setup);
 
 /*
  * Simplified handler for Arch Perfmon v4:
[REGRESSION] x86, perf: counter freezing breaks rr
tl;dr: rr is currently broken on 4.20rc2, which I bisected to
af3bdb991a5cb57c189d34aadbd3aa88995e0d9f. I further confirmed that
booting the 4.20rc2 kernel with `disable_counter_freezing=true` allows
rr to work.

rr, a userspace record and replay debugger[0], uses the PMU interrupt
(PMI) to stop a program during replay to inject asynchronous events
such as signals. With perf counter freezing enabled we are reliably
seeing perf event overcounts during replay. This behavior is easily
demonstrated by attempting to record and replay the `alarm` test from
rr's test suite. Through bisection I determined that [1] is the first
bad commit, and further testing showed that booting the kernel with
`disable_counter_freezing=true` fixes rr.

This behavior has been observed on two different CPUs (a Core i7-6700K
and a Xeon E3-1505M v5). We have no reason to believe it is limited to
specific CPU models; this information is included only for completeness.

Given that we're already at rc3, and that this renders rr unusable,
we'd ask that counter freezing be disabled for the 4.20 release.

Thanks,

- Kyle

[0] https://rr-project.org/
[1] https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/commit/?id=af3bdb991a5cb57c189d34aadbd3aa88995e0d9f