On Sat, Mar 01, 2014 at 08:50:17AM -0800, H. Peter Anvin wrote:
> The bottom line is that if we want hard numbers we probably have to
> measure.
>
> Hoisting the cr2 read is a no-brainer, might even help performance...
Btw, I just got word that on AMD, a read from CR2 is 4 cycles on family
0x15 and 3
On Sat, Mar 01, 2014 at 10:16:50AM +0100, Ingo Molnar wrote:
> We read CR2 in the page fault hot path, so it's on the top of CPU
> architects' minds and it's reasonably optimized. A couple of cycles
> IIRC, but would be nice to hear actual numbers.
Well, going with what Linus found; it looks like
On Fri, 28 Feb 2014, Steven Rostedt wrote:
> On Fri, 28 Feb 2014 18:34:00 -0500 (EST)
> Vince Weaver wrote:
>
> But perf_event bug finder is a much more prestigious title than
> "college professor" ;-)
yes, it's something to fall back on if/when I get denied tenure :)
I do enjoy tracking down
On Sat, 1 Mar 2014, Andi Kleen wrote:
> Steven Rostedt writes:
> >
> > BTW, is the perf_fuzzer code posted somewhere? It sounds like it can be
> > really useful for us to do our own testing too.
>
> I believe it's part of trinity.
>
> http://codemonkey.org.uk/projects/trinity/
>
> Perhaps it
The bottom line is that if we want hard numbers we probably have to measure.
Hoisting the cr2 read is a no-brainer, might even help performance...
On March 1, 2014 1:50:42 AM PST, Borislav Petkov wrote:
>On Sat, Mar 01, 2014 at 10:16:50AM +0100, Ingo Molnar wrote:
>>
>> * Steven Rostedt
Steven Rostedt writes:
>
> BTW, is the perf_fuzzer code posted somewhere? It sounds like it can be
> really useful for us to do our own testing too.
I believe it's part of trinity.
http://codemonkey.org.uk/projects/trinity/
Perhaps it should have a "ftracer fuzzer" too?
-Andi
--
To
On Sat, Mar 01, 2014 at 10:16:50AM +0100, Ingo Molnar wrote:
>
> * Steven Rostedt wrote:
>
> > > Also, this function is called a _LOT_ under certain workloads, I
> > > don't know how cheap a CR2 read is, but it had better be really
> > > cheap.
> >
> > That's a HPA question.
>
> We read CR2
* Steven Rostedt wrote:
> > Also, this function is called a _LOT_ under certain workloads, I
> > don't know how cheap a CR2 read is, but it had better be really
> > cheap.
>
> That's a HPA question.
We read CR2 in the page fault hot path, so it's on the top of CPU
architects' minds and
On Fri, 28 Feb 2014 18:34:00 -0500 (EST)
Vince Weaver wrote:
> > I was poking fun at you on IRC for this exact reason:
> >
> > poor Vince, I keep sending him new patches. "No, don't test this
> > patch, now test this one. Oh wait, try this one instead"
> > * peterz sees Vince thinking:
On 02/28/2014 03:34 PM, Vince Weaver wrote:
>
> Well while it might appear that I spend all of my days finding perf_event
> bugs, I actually am a college professor so I do occasionally have to run
> off to teach a class, meet with students, or write papers/grants for other
> academics to
On Fri, 28 Feb 2014, Steven Rostedt wrote:
> On Fri, 28 Feb 2014 16:18:23 -0500 (EST)
> Vince Weaver wrote:
>
> > I was away from the computer this afternoon and of course I have scores of
> > e-mails on this topic now with lots of competing patches. Is there one
> > in particular I'm
On Fri, Feb 28, 2014 at 05:05:53PM -0500, Steven Rostedt wrote:
> On Fri, 28 Feb 2014 22:55:11 +0100
> Peter Zijlstra wrote:
>
> > On Fri, Feb 28, 2014 at 01:51:50PM -0800, Paul E. McKenney wrote:
> > > On Fri, Feb 28, 2014 at 10:27:00PM +0100, Peter Zijlstra wrote:
> > > > On Fri, Feb 28, 2014
On Fri, 28 Feb 2014 22:55:11 +0100
Peter Zijlstra wrote:
> On Fri, Feb 28, 2014 at 01:51:50PM -0800, Paul E. McKenney wrote:
> > On Fri, Feb 28, 2014 at 10:27:00PM +0100, Peter Zijlstra wrote:
> > > On Fri, Feb 28, 2014 at 01:17:33PM -0800, Paul E. McKenney wrote:
> > > > This code isn't running
On Fri, Feb 28, 2014 at 01:51:50PM -0800, Paul E. McKenney wrote:
> On Fri, Feb 28, 2014 at 10:27:00PM +0100, Peter Zijlstra wrote:
> > On Fri, Feb 28, 2014 at 01:17:33PM -0800, Paul E. McKenney wrote:
> > > This code isn't running in idle context is it? If so, RCU will happily
> > > free out
On Fri, Feb 28, 2014 at 10:27:00PM +0100, Peter Zijlstra wrote:
> On Fri, Feb 28, 2014 at 01:17:33PM -0800, Paul E. McKenney wrote:
> > This code isn't running in idle context is it? If so, RCU will happily
> > free out from under it. CONFIG_PROVE_RCU should detect this sort of thing,
> >
On Fri, 28 Feb 2014 16:18:23 -0500 (EST)
Vince Weaver wrote:
> I was away from the computer this afternoon and of course I have scores of
> e-mails on this topic now with lots of competing patches. Is there one
> in particular I'm supposed to be testing?
I was poking fun at you on IRC for
On Fri, Feb 28, 2014 at 01:17:33PM -0800, Paul E. McKenney wrote:
> This code isn't running in idle context is it? If so, RCU will happily
> free out from under it. CONFIG_PROVE_RCU should detect this sort of thing,
> though.
Well, interrupts/NMIs can happen when idle, but the interrupt/NMI
On Fri, Feb 28, 2014 at 09:54:09PM +0100, Peter Zijlstra wrote:
> On Fri, Feb 28, 2014 at 03:47:16PM -0500, Steven Rostedt wrote:
> > > > I'll try your patch momentarily, first I had some other changes I
> > > > started
> > > > running before I left work (for some reason it recompiled the whole
On Fri, 28 Feb 2014, H. Peter Anvin wrote:
> Now we need to figure out if the reboot problem and the segfault problem
> are actually the same... I have a nasty feeling they might be different
> problems.
I'm currently running a script that tries setting EBP to all possible
32-bit pages and
On Fri, 28 Feb 2014 21:56:38 +0100
Peter Zijlstra wrote:
> Like already said; _trace is an absolutely abysmal name. Also you
> _really_ don't want an unconditional CR2 write in there, that's just
> stupidly expensive.
But a read isn't. Which is why we only do a write if the copy caused a
page
On Fri, Feb 28, 2014 at 11:29:46AM -0500, Steven Rostedt wrote:
> On Fri, 28 Feb 2014 08:15:11 -0800
> "H. Peter Anvin" wrote:
>
> > Well, I was talking about the assumption spelled out in the comment
> > above copy_from_user_nmi() which pretty much states "cr2 is safe because
> > cr2 is
On Fri, Feb 28, 2014 at 08:15:11AM -0800, H. Peter Anvin wrote:
> On 02/28/2014 07:40 AM, Peter Zijlstra wrote:
> > On Fri, Feb 28, 2014 at 07:13:06AM -0800, H. Peter Anvin wrote:
> >> If I'm reading this right we end up going from the page fault
> >> tracepoint to copy_from_user_nmi() without
On Fri, Feb 28, 2014 at 03:47:16PM -0500, Steven Rostedt wrote:
> > > I'll try your patch momentarily, first I had some other changes I started
> > > running before I left work (for some reason it recompiled the whole
> > > kernel).
> > >
> > > 8: function: perf_output_begin
> > >
On Fri, 28 Feb 2014 12:34:05 -0800
"Paul E. McKenney" wrote:
> On Thu, Feb 27, 2014 at 08:00:04PM -0500, Vince Weaver wrote:
> > On Thu, 27 Feb 2014, H. Peter Anvin wrote:
> >
> > > On 02/27/2014 03:30 PM, Steven Rostedt wrote:
> > > > On Thu, 27 Feb 2014 14:52:54 -0800
> > > > "H. Peter Anvin"
On Fri, 28 Feb 2014 12:38:52 -0800
"H. Peter Anvin" wrote:
> Now we need to figure out if the reboot problem and the segfault problem are
> actually the same... I have a nasty feeling they might be different problems.
I wonder if there was any recursion problem. Although, I believe perf
has
Now we need to figure out if the reboot problem and the segfault problem are
actually the same... I have a nasty feeling they might be different problems.
On February 28, 2014 7:07:29 AM PST, Vince Weaver
wrote:
>On Fri, 28 Feb 2014, Steven Rostedt wrote:
>
>> Interesting. Are you doing a perf
On Thu, Feb 27, 2014 at 08:00:04PM -0500, Vince Weaver wrote:
> On Thu, 27 Feb 2014, H. Peter Anvin wrote:
>
> > On 02/27/2014 03:30 PM, Steven Rostedt wrote:
> > > On Thu, 27 Feb 2014 14:52:54 -0800
> > > "H. Peter Anvin" wrote:
> > >
> > >> On 02/27/2014 02:31 PM, Steven Rostedt wrote:
> >
On Fri, 28 Feb 2014 08:15:11 -0800
"H. Peter Anvin" wrote:
> Well, I was talking about the assumption spelled out in the comment
> above copy_from_user_nmi() which pretty much states "cr2 is safe because
> cr2 is saved/restored in the NMI wrappers."
Yeah, it seems that the name
On 02/28/2014 07:40 AM, Peter Zijlstra wrote:
> On Fri, Feb 28, 2014 at 07:13:06AM -0800, H. Peter Anvin wrote:
>> If I'm reading this right we end up going from the page fault
>> tracepoint to copy_from_user_nmi() without going through NMI, and the
>> cr2 corruption is obvious. I guess the
On Fri, Feb 28, 2014 at 07:13:06AM -0800, H. Peter Anvin wrote:
> If I'm reading this right we end up going from the page fault
> tracepoint to copy_from_user_nmi() without going through NMI, and the
> cr2 corruption is obvious. I guess the assumption that only the NMI
> path needed to save cr2
On Fri, 28 Feb 2014 10:20:00 -0500
Steven Rostedt wrote:
> Below is a patch that should fix this. Please remove all other patches
> and try this out.
Updated patch, as Peter Zijlstra on IRC asked me if the
exception_enter() can be traced. And looking at it, it sure can be.
-- Steve
diff --git
On Fri, 28 Feb 2014 10:07:29 -0500 (EST)
Vince Weaver wrote:
> On Fri, 28 Feb 2014, Steven Rostedt wrote:
> 199.900696: function: __module_address
> ...
> 199.900705: function: __kernel_text_address
> 199.900809: kernel_stack:
> =>
If I'm reading this right we end up going from the page fault tracepoint to
copy_from_user_nmi() without going through NMI, and the cr2 corruption is
obvious. I guess the assumption that only the NMI path needed to save cr2 is
flawed?
On February 28, 2014 7:07:29 AM PST, Vince Weaver
wrote:
On Fri, 28 Feb 2014, Steven Rostedt wrote:
> Interesting. Are you doing a perf function trace?
>
> And just in case, can you add this patch and make sure the copy is
> called by NMI.
199.900682: function: trace_do_page_fault
199.900683: page_fault_user: address=__per_cpu_end
On Fri, 28 Feb 2014 09:15:33 -0500 (EST)
Vince Weaver wrote:
> On Thu, 27 Feb 2014, Steven Rostedt wrote:
>
> > On Thu, 27 Feb 2014 20:34:34 -0500 (EST)
> > Vince Weaver wrote:
> >
> >
> > > > I would actually suggest we do the equivalent on i386 as well.
> > > >
> > > > Vince, could you
On Thu, 27 Feb 2014, Steven Rostedt wrote:
> On Thu, 27 Feb 2014 20:34:34 -0500 (EST)
> Vince Weaver wrote:
>
>
> > > I would actually suggest we do the equivalent on i386 as well.
> > >
> > > Vince, could you try this patch as an experiment?
> >
> > OK with your patch applied it does not
On Fri, 28 Feb 2014 12:11:11 +0100
Peter Zijlstra wrote:
> On Thu, Feb 27, 2014 at 09:57:26PM -0500, Steven Rostedt wrote:
> > @@ -512,8 +508,21 @@ static inline void nmi_nesting_postprocess(void)
> > dotraplinkage notrace __kprobes void
> > do_nmi(struct pt_regs *regs, long error_code)
> > {
On Thu, Feb 27, 2014 at 09:57:26PM -0500, Steven Rostedt wrote:
> @@ -512,8 +508,21 @@ static inline void nmi_nesting_postprocess(void)
> dotraplinkage notrace __kprobes void
> do_nmi(struct pt_regs *regs, long error_code)
> {
> + unsigned long cr2;
> +
> nmi_nesting_preprocess(regs);
On Thu, Feb 27, 2014 at 05:31:50PM -0500, Steven Rostedt wrote:
> Well, the perf ring buffer is vmalloced, right? That can cause a page
> fault too.
On x86 they're typically not -- although we have a debug CONFIG option
to test that code on x86 too. On SPARC/ARM etc.. we have to use
vmalloc_user()
On 02/28/2014 03:34 PM, Vince Weaver wrote:
Well while it might appear that I spend all of my days finding perf_event
bugs, I actually am a college professor so I do occasionally have to run
off to teach a class, meet with students, or write papers/grants for other
academics to reject.
On Fri, 28 Feb 2014 18:34:00 -0500 (EST)
Vince Weaver vincent.wea...@maine.edu wrote:
I was poking fun at you on IRC for this exact reason:
rostedt poor Vince, I keep sending him new patches. No, don't test this
patch, now test this one. Oh wait, try this one instead
* peterz sees
On Thu, 27 Feb 2014 20:34:34 -0500 (EST)
Vince Weaver wrote:
> > I would actually suggest we do the equivalent on i386 as well.
> >
> > Vince, could you try this patch as an experiment?
>
> OK with your patch applied it does not segfault.
>
Vince, Great! Can you remove Peter's patch, and
Ok... I think we're definitely talking about a cr2 leak. The reboot might be a
race condition in the NMI nesting handling maybe?
On February 27, 2014 5:34:34 PM PST, Vince Weaver
wrote:
>On Thu, 27 Feb 2014, H. Peter Anvin wrote:
>
>> On 02/27/2014 02:31 PM, Steven Rostedt wrote:
>> >
>> >
On Thu, 27 Feb 2014, H. Peter Anvin wrote:
> On 02/27/2014 02:31 PM, Steven Rostedt wrote:
> >
> > Yeah, something is getting messed up.
> >
>
> What it *looks* like to me is that we try to nest the cr2 save/restore,
> which doesn't nest because it is a percpu variable.
>
> ... except in the
On Thu, 27 Feb 2014, H. Peter Anvin wrote:
> On 02/27/2014 03:30 PM, Steven Rostedt wrote:
> > On Thu, 27 Feb 2014 14:52:54 -0800
> > "H. Peter Anvin" wrote:
> >
> >> On 02/27/2014 02:31 PM, Steven Rostedt wrote:
> >>>
> >>> Yeah, something is getting messed up.
> >>>
> >>
> >> What it *looks*
On 02/27/2014 03:30 PM, Steven Rostedt wrote:
> On Thu, 27 Feb 2014 14:52:54 -0800
> "H. Peter Anvin" wrote:
>
>> On 02/27/2014 02:31 PM, Steven Rostedt wrote:
>>>
>>> Yeah, something is getting messed up.
>>>
>>
>> What it *looks* like to me is that we try to nest the cr2 save/restore,
>> which
On Thu, 27 Feb 2014 14:52:54 -0800
"H. Peter Anvin" wrote:
> On 02/27/2014 02:31 PM, Steven Rostedt wrote:
> >
> > Yeah, something is getting messed up.
> >
>
> What it *looks* like to me is that we try to nest the cr2 save/restore,
> which doesn't nest because it is a percpu variable.
>
>
On 02/27/2014 02:31 PM, Steven Rostedt wrote:
>
> Yeah, something is getting messed up.
>
What it *looks* like to me is that we try to nest the cr2 save/restore,
which doesn't nest because it is a percpu variable.
... except in the x86-64 case, we *ALSO* save/restore cr2 inside
entry_64.S,
On Thu, 27 Feb 2014 17:06:36 -0500 (EST)
Vince Weaver wrote:
>
> I spent some more time on this.
> I managed to get a trace that exhibited the bug practically right
> away, but still unable to generate a reproducible trace :(
>
> So instead I'm adding WARN's and trace_printks to see what I
I spent some more time on this.
I managed to get a trace that exhibited the bug practically right
away, but still unable to generate a reproducible trace :(
So instead I'm adding WARN's and trace_printks to see what I can find out.
Here's a summary of what I think is happening. Please let
On Tue, 25 Feb 2014, Vince Weaver wrote:
> On Tue, 25 Feb 2014, Steven Rostedt wrote:
>
> > On Tue, 25 Feb 2014 06:34:55 -0800
> > "H. Peter Anvin" wrote:
> >
> > > #2 is what I really don't understand.
> > >
> > > I worry something else is going on there
> >
> > Yeah, me too.
> >
>
> OK,
On Tue, 25 Feb 2014, Steven Rostedt wrote:
> On Tue, 25 Feb 2014 06:34:55 -0800
> "H. Peter Anvin" wrote:
>
> > #2 is what I really don't understand.
> >
> > I worry something else is going on there
>
> Yeah, me too.
>
OK, well I'll work on isolating that next, I was hoping the segfault
On Tue, 25 Feb 2014 06:34:55 -0800
"H. Peter Anvin" wrote:
> #2 is what I really don't understand.
>
> I worry something else is going on there
Yeah, me too.
-- Steve
>
> >
> >While the missing cr2 issue made debugging frustrating, I find the
> >other
> >aspects of the bug more serious:
>