Re: x86_pmu_start WARN_ON.

2014-02-25 Thread Vince Weaver
On Mon, 24 Feb 2014, Peter Zijlstra wrote: > On Fri, Feb 21, 2014 at 03:18:38PM -0500, Vince Weaver wrote: > > I've applied the patch and have been unable to trigger the warning with > > either my testcase or a few hours of fuzzing. > > Yay. > > > My only comment on the patch is it could always

Re: x86_pmu_start WARN_ON.

2014-02-24 Thread Peter Zijlstra
On Fri, Feb 21, 2014 at 03:18:38PM -0500, Vince Weaver wrote: > I've applied the patch and have been unable to trigger the warning with > either my testcase or a few hours of fuzzing. Yay. > My only comment on the patch is it could always use some comments. > > The perf_event code is really har

Re: x86_pmu_start WARN_ON.

2014-02-21 Thread Vince Weaver
On Fri, 21 Feb 2014, Peter Zijlstra wrote: > group_sched_in() that fails (for whatever reason), and without x86_pmu > TXN support (because the leader is !x86_pmu), will corrupt the n_added > state. > > If this all is correct; the below ought to cure things. I've applied the patch and have been u

Re: x86_pmu_start WARN_ON.

2014-02-21 Thread Peter Zijlstra
On Thu, Feb 20, 2014 at 07:23:00PM +0100, Peter Zijlstra wrote: > This is I think the relevant bit: > >pec_1076_warn-2804 [000] d... 147.926153: x86_pmu_disable: > x86_pmu_disable >pec_1076_warn-2804 [000] d... 147.926153: x86_pmu_state: Events: { >pec_1076_warn-2804 [000] d...

Re: x86_pmu_start WARN_ON.

2014-02-21 Thread Vince Weaver
and the perf_fuzzer overnight triggered this possibly related warning in x86_pmu_stop(). I assume it's this code (the line numbers don't match up for some reason): if (__test_and_clear_bit(hwc->idx, cpuc->active_mask)) { x86_pmu.disable(event); cpuc->events

Re: x86_pmu_start WARN_ON.

2014-02-20 Thread Vince Weaver
On Thu, 20 Feb 2014, Vince Weaver wrote: > On Thu, 20 Feb 2014, Vince Weaver wrote: > > > Might be relevant: check the last_cpu values. Right before the above > > it looks like the thread gets moved from CPU 1 to CPU 0 > > (possibly as a result of the long chain started with the > > close() of t

Re: x86_pmu_start WARN_ON.

2014-02-20 Thread Vince Weaver
On Thu, 20 Feb 2014, Vince Weaver wrote: > Might be relevant: check the last_cpu values. Right before the above > it looks like the thread gets moved from CPU 1 to CPU 0 > (possibly as a result of the long chain started with the > close() of the tracepoint event), > so the problem NMI watchdog ev

Re: x86_pmu_start WARN_ON.

2014-02-20 Thread Steven Rostedt
On Thu, 20 Feb 2014 19:15:38 +0100 Peter Zijlstra wrote: > I think by using the /debug/tracing/events/ftrace/function event, but > I'm not actually sure, I've never used it nor did I write the code to do > it. Jolsa did all that IIRC. > > All I know is that we had some 'fun' bugs around there s

Re: x86_pmu_start WARN_ON.

2014-02-20 Thread Vince Weaver
On Thu, 20 Feb 2014, Peter Zijlstra wrote: > On Thu, Feb 20, 2014 at 01:03:16PM -0500, Vince Weaver wrote: > > attached, it's not very big. > > This is I think the relevant bit: > >pec_1076_warn-2804 [000] d... 147.926153: x86_pmu_disable: > x86_pmu_disable >pec_1076_warn-2804 [000]

Re: x86_pmu_start WARN_ON.

2014-02-20 Thread Andi Kleen
On Thu, Feb 20, 2014 at 07:15:38PM +0100, Peter Zijlstra wrote: > On Thu, Feb 20, 2014 at 09:31:19AM -0800, Andi Kleen wrote: > > Peter Zijlstra writes: > > > > > > It will; trace_printk() works without -pg, I think you didn't read the > > > instructions very well. > > > > Ok, you enable and disa

Re: x86_pmu_start WARN_ON.

2014-02-20 Thread Peter Zijlstra
On Thu, Feb 20, 2014 at 01:03:16PM -0500, Vince Weaver wrote: > attached, it's not very big. This is I think the relevant bit: pec_1076_warn-2804 [000] d... 147.926153: x86_pmu_disable: x86_pmu_disable pec_1076_warn-2804 [000] d... 147.926153: x86_pmu_state: Events: { pec_1076_warn

Re: x86_pmu_start WARN_ON.

2014-02-20 Thread Peter Zijlstra
On Thu, Feb 20, 2014 at 12:46:12PM -0500, Steven Rostedt wrote: > On Thu, 20 Feb 2014 12:43:51 -0500 > Steven Rostedt wrote: > > > As a disable_trace_on_warning is more of a modification to the kernel, > > I'm leaning to adding a /proc/sys/kernel/ftrace_disable_on_warning > > file. This keeps it

Re: x86_pmu_start WARN_ON.

2014-02-20 Thread Peter Zijlstra
On Thu, Feb 20, 2014 at 09:31:19AM -0800, Andi Kleen wrote: > Peter Zijlstra writes: > > > > It will; trace_printk() works without -pg, I think you didn't read the > > instructions very well. > > Ok, you enable and disable it again. I won't guess why you do that. To grow the trace buffers; it s

Re: x86_pmu_start WARN_ON.

2014-02-20 Thread Vince Weaver
On Thu, 20 Feb 2014, Peter Zijlstra wrote: > On Wed, Feb 19, 2014 at 05:34:49PM -0500, Vince Weaver wrote: > > So where would the NMI counter event get disabled? Would it never get > > disabled, just because it's always running and always gets the same fixed > > slot? Why isn't this a problem

Re: x86_pmu_start WARN_ON.

2014-02-20 Thread Steven Rostedt
On Thu, 20 Feb 2014 12:43:51 -0500 Steven Rostedt wrote: > As a disable_trace_on_warning is more of a modification to the kernel, > I'm leaning to adding a /proc/sys/kernel/ftrace_disable_on_warning > file. This keeps it in line with ftrace_dump_on_oops, which is the most > similar feature. Neve

Re: x86_pmu_start WARN_ON.

2014-02-20 Thread Steven Rostedt
On Thu, 20 Feb 2014 18:00:18 +0100 Peter Zijlstra wrote: > On Thu, Feb 20, 2014 at 11:26:00AM -0500, Steven Rostedt wrote: > > On Thu, 20 Feb 2014 11:08:30 +0100 > > Peter Zijlstra wrote: > > > > > @rostedt: WTF is disable_trace_on_warning a boot option only? > > > > Laziness. > > > > > > I'

Re: x86_pmu_start WARN_ON.

2014-02-20 Thread Andi Kleen
Peter Zijlstra writes: > > It will; trace_printk() works without -pg, I think you didn't read the > instructions very well. Ok, you enable and disable it again. I won't guess why you do that. > > And there's a very good reason not to apply your patch; you can route > the function tracer into pe

Re: x86_pmu_start WARN_ON.

2014-02-20 Thread Peter Zijlstra
On Thu, Feb 20, 2014 at 11:26:00AM -0500, Steven Rostedt wrote: > On Thu, 20 Feb 2014 11:08:30 +0100 > Peter Zijlstra wrote: > > > @rostedt: WTF is disable_trace_on_warning a boot option only? > > Laziness. > > > I'll add a sysctl for it in 3.15. /debug/tracing/options/ was where I was lookin

Re: x86_pmu_start WARN_ON.

2014-02-20 Thread Steven Rostedt
On Thu, 20 Feb 2014 11:08:30 +0100 Peter Zijlstra wrote: > @rostedt: WTF is disable_trace_on_warning a boot option only? Laziness. I'll add a sysctl for it in 3.15. -- Steve

Re: x86_pmu_start WARN_ON.

2014-02-20 Thread Andi Kleen
Peter Zijlstra writes: > On Wed, Feb 19, 2014 at 05:34:49PM -0500, Vince Weaver wrote: >> So where would the NMI counter event get disabled? Would it never get >> disabled, just because it's always running and always gets the same fixed >> slot? Why isn't this a problem all the time, not just

Re: x86_pmu_start WARN_ON.

2014-02-20 Thread Peter Zijlstra
On Thu, Feb 20, 2014 at 07:47:23AM -0800, Andi Kleen wrote: > Peter Zijlstra writes: > > > On Wed, Feb 19, 2014 at 05:34:49PM -0500, Vince Weaver wrote: > >> So where would the NMI counter event get disabled? Would it never get > >> disabled, just because it's always running and always gets the

Re: x86_pmu_start WARN_ON.

2014-02-20 Thread Peter Zijlstra
On Wed, Feb 19, 2014 at 05:34:49PM -0500, Vince Weaver wrote: > So where would the NMI counter event get disabled? Would it never get > disabled, just because it's always running and always gets the same fixed > slot? Why isn't this a problem all the time, not just with corner cases? Well it c

Re: x86_pmu_start WARN_ON.

2014-02-19 Thread Vince Weaver
On Wed, 19 Feb 2014, Peter Zijlstra wrote: > So when we add a new event (or more) we compute a mapping from event to > counter. Then we disable all (pre existing) events that moved to a new > location, then we enable all events (insert HES_ARCH) that were running > but got relocated and the new ev

Re: x86_pmu_start WARN_ON.

2014-02-19 Thread Peter Zijlstra
On Tue, Feb 18, 2014 at 05:20:57PM -0500, Vince Weaver wrote: > On Tue, 18 Feb 2014, Vince Weaver wrote: > > > On Mon, 17 Feb 2014, Peter Zijlstra wrote: > > > > > Enable CONFIG_FRAME_POINTER for better stack traces; I suspect the > > > list_del_event() is just random stack garbage. The path that

Re: x86_pmu_start WARN_ON.

2014-02-18 Thread Vince Weaver
On Tue, 18 Feb 2014, Vince Weaver wrote: > On Mon, 17 Feb 2014, Peter Zijlstra wrote: > > > Enable CONFIG_FRAME_POINTER for better stack traces; I suspect the > > list_del_event() is just random stack garbage. The path that makes sense > > is: > > wait_rcu()->__wait_for_common()->schedule_timeo

Re: x86_pmu_start WARN_ON.

2014-02-18 Thread Vince Weaver
On Mon, 17 Feb 2014, Peter Zijlstra wrote: > Enable CONFIG_FRAME_POINTER for better stack traces; I suspect the > list_del_event() is just random stack garbage. The path that makes sense > is: > wait_rcu()->__wait_for_common()->schedule_timeout() Here's an updated stack trace on 3.14-rc3 with C

Re: x86_pmu_start WARN_ON.

2014-02-17 Thread Peter Zijlstra
On Thu, Feb 13, 2014 at 05:13:20PM -0500, Vince Weaver wrote: > On Thu, 13 Feb 2014, Vince Weaver wrote: > > > The plot thickens. The WARN_ON is not caused by the cycles event that we > > open, but it's caused by the NMI Watchdog cycles event. > > The WARN_ON_ONCE at line 1076 in perf_event.c i

Re: x86_pmu_start WARN_ON.

2014-02-13 Thread Vince Weaver
On Thu, 13 Feb 2014, Vince Weaver wrote: > The plot thickens. The WARN_ON is not caused by the cycles event that we > open, but it's caused by the NMI Watchdog cycles event. The WARN_ON_ONCE at line 1076 in perf_event.c is triggering because x86_pmu_enable() is calling x86_pmu_start() for al

Re: x86_pmu_start WARN_ON.

2014-02-13 Thread Vince Weaver
On Thu, 13 Feb 2014, Vince Weaver wrote: > On Wed, 12 Feb 2014, Vince Weaver wrote: > > > > It is triggered in this case when you have: > > > > An event group of breakpoint, cycles, branches > > An event of instructions with precise=1 > > A tracepoint > > > > and then you close the tracepo

Re: x86_pmu_start WARN_ON.

2014-02-13 Thread Vince Weaver
On Wed, 12 Feb 2014, Vince Weaver wrote: > On Tue, 11 Feb 2014, Peter Zijlstra wrote: > > > > I'll see if I can run through the reproduction case by hand. > > I've come up with an even simpler test case with all of the extraneous > settings removed. Included below. > > It is triggered in this

Re: x86_pmu_start WARN_ON.

2014-02-12 Thread Vince Weaver
On Tue, 11 Feb 2014, Peter Zijlstra wrote: > > I'll see if I can run through the reproduction case by hand. I've come up with an even simpler test case with all of the extraneous settings removed. Included below. It is triggered in this case when you have: An event group of breakpoint, cycl

Re: x86_pmu_start WARN_ON.

2014-02-11 Thread Peter Zijlstra
On Mon, Feb 10, 2014 at 04:26:29PM -0500, Vince Weaver wrote: > On Thu, 30 Jan 2014, Dave Jones wrote: > > > I gave Vince's perf_fuzzer a run, hoping to trigger a different perf bug > > that I've been seeing. Instead I hit a different bug. > > I've been seeing that WARN_ON for months but it was h

Re: x86_pmu_start WARN_ON.

2014-02-10 Thread Vince Weaver
On Thu, 30 Jan 2014, Dave Jones wrote: > I gave Vince's perf_fuzzer a run, hoping to trigger a different perf bug > that I've been seeing. Instead I hit a different bug. I've been seeing that WARN_ON for months but it was hard to reproduce. After a lot of hassle (and scores of reboots) I managed

x86_pmu_start WARN_ON.

2014-01-30 Thread Dave Jones
I gave Vince's perf_fuzzer a run, hoping to trigger a different perf bug that I've been seeing. Instead I hit a different bug. WARNING: CPU: 1 PID: 9277 at arch/x86/kernel/cpu/perf_event.c:1076 x86_pmu_start+0xd1/0x110() CPU: 1 PID: 9277 Comm: perf_fuzzer Not tainted 3.13.0+ #101 000