Re: perf_fuzzer compiled for x32 causes reboot

2014-03-04 Thread Borislav Petkov
On Sat, Mar 01, 2014 at 08:50:17AM -0800, H. Peter Anvin wrote:
> The bottom line is that if we want hard numbers we probably have to
> measure.
>
> Hoisting the cr2 read is a no-brainer, might even help performance...

Btw, I just got word that on AMD, a read from CR2 is 4 cycles on family
0x15 and 3 cycles on family 0x10. If we're in the same ballpark on
Intel, I'd say we're fine with those numbers even in hot paths. The OoO
is an additional bonus which should hide latency even more.

-- 
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: perf_fuzzer compiled for x32 causes reboot

2014-03-04 Thread Borislav Petkov
On Sat, Mar 01, 2014 at 08:50:17AM -0800, H. Peter Anvin wrote:
 The bottom line is that if we want hard numbers we probably have to
 measure.

 Hoisting the cr2 read is a no-brainer, might even help performance...

Btw, I just got word that on AMD, a read from CR2 is 4 cycles on family
0x15 and 3 cycles on family 0x10. If we're in the same ballpark on
Intel, I'd say we're fine with those numbers even in hot paths. The OoO
is an additional bonus which should hide latency even more.

-- 
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: perf_fuzzer compiled for x32 causes reboot

2014-03-03 Thread Peter Zijlstra
On Sat, Mar 01, 2014 at 10:16:50AM +0100, Ingo Molnar wrote:
> We read CR2 in the page fault hot path, so it's on the top of CPU 
> architects' minds and it's reasonably optimized. A couple of cycles 
> IIRC, but would be nice to hear actual numbers.

Well, going with what Linus found; it looks like they made page faults
more expensive on Haswell as compared to previous generations. So its
not _that_ high on their list :/
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: perf_fuzzer compiled for x32 causes reboot

2014-03-03 Thread Peter Zijlstra
On Sat, Mar 01, 2014 at 10:16:50AM +0100, Ingo Molnar wrote:
 We read CR2 in the page fault hot path, so it's on the top of CPU 
 architects' minds and it's reasonably optimized. A couple of cycles 
 IIRC, but would be nice to hear actual numbers.

Well, going with what Linus found; it looks like they made page faults
more expensive on Haswell as compared to previous generations. So its
not _that_ high on their list :/
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: perf_fuzzer compiled for x32 causes reboot

2014-03-02 Thread Vince Weaver
On Fri, 28 Feb 2014, Steven Rostedt wrote:
> On Fri, 28 Feb 2014 18:34:00 -0500 (EST)
> Vince Weaver  wrote:
> 
> But perf_event bug finder is a much more prestigious title than
> "college professor" ;-)

yes, it's something to fall back on if/when I get denied tenure :)

I do enjoy tracking down bugs like this, it's why I spend a lot of time on 
it when I should probably be doing other things.

> BTW, is the perf_fuzzer code posted somewhere? It sounds like it can be
> really useful for us to do our own testing too.

yes, code is fully available though possibly not publicized well.  I 
posted the full info later in the thread as I seem to be responding to 
e-mails in reverse order today for some reason.

Vince
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: perf_fuzzer compiled for x32 causes reboot

2014-03-02 Thread Vince Weaver
On Sat, 1 Mar 2014, Andi Kleen wrote:

> Steven Rostedt  writes:
> >
> > BTW, is the perf_fuzzer code posted somewhere? It sounds like it can be
> > really useful for us to do our own testing too.
> 
> I believe it's part of trinity.
> 
> http://codemonkey.org.uk/projects/trinity/
> 
> Perhaps it should have a "ftracer fuzzer" too?

It's not part of trinity, it's a separate project that re-uses some 
of trinity's codebase.

http://web.eece.maine.edu/~vweaver/projects/perf_events/fuzzer/

You can get the source as part of the perf_event_tests package
https://github.com/deater/perf_event_tests

go into the fuzzer directory, run make, then "./perf_fuzzer".
You can also run the ./fast_repro.sh script instead, it's what I've
been using to track this kernel issue, although it tripped over the issue
a lot faster when compiled with -mx32 than it did when complied as native 
x86_64.

I think the code as found in the git tree should be working, I made a lot 
of changes when tracking this problem and I need to make sure I only check 
in the proper ones and not let any weird debugging patches slip in.

I'm aware of at least two other WARN messages and possibly one or two 
hard-lockup bugs in addition to the potential system reboot that can be 
triggered by the fuzzer, it just takes a while to isolate these things.

Vince


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: perf_fuzzer compiled for x32 causes reboot

2014-03-02 Thread Vince Weaver
On Sat, 1 Mar 2014, Andi Kleen wrote:

 Steven Rostedt rost...@goodmis.org writes:
 
  BTW, is the perf_fuzzer code posted somewhere? It sounds like it can be
  really useful for us to do our own testing too.
 
 I believe it's part of trinity.
 
 http://codemonkey.org.uk/projects/trinity/
 
 Perhaps it should have a ftracer fuzzer too?

It's not part of trinity, it's a separate project that re-uses some 
of trinity's codebase.

http://web.eece.maine.edu/~vweaver/projects/perf_events/fuzzer/

You can get the source as part of the perf_event_tests package
https://github.com/deater/perf_event_tests

go into the fuzzer directory, run make, then ./perf_fuzzer.
You can also run the ./fast_repro.sh script instead, it's what I've
been using to track this kernel issue, although it tripped over the issue
a lot faster when compiled with -mx32 than it did when complied as native 
x86_64.

I think the code as found in the git tree should be working, I made a lot 
of changes when tracking this problem and I need to make sure I only check 
in the proper ones and not let any weird debugging patches slip in.

I'm aware of at least two other WARN messages and possibly one or two 
hard-lockup bugs in addition to the potential system reboot that can be 
triggered by the fuzzer, it just takes a while to isolate these things.

Vince


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: perf_fuzzer compiled for x32 causes reboot

2014-03-02 Thread Vince Weaver
On Fri, 28 Feb 2014, Steven Rostedt wrote:
 On Fri, 28 Feb 2014 18:34:00 -0500 (EST)
 Vince Weaver vincent.wea...@maine.edu wrote:
 
 But perf_event bug finder is a much more prestigious title than
 college professor ;-)

yes, it's something to fall back on if/when I get denied tenure :)

I do enjoy tracking down bugs like this, it's why I spend a lot of time on 
it when I should probably be doing other things.

 BTW, is the perf_fuzzer code posted somewhere? It sounds like it can be
 really useful for us to do our own testing too.

yes, code is fully available though possibly not publicized well.  I 
posted the full info later in the thread as I seem to be responding to 
e-mails in reverse order today for some reason.

Vince
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: perf_fuzzer compiled for x32 causes reboot

2014-03-01 Thread H. Peter Anvin
The bottom line is that if we want hard numbers we probably have to measure.

Hoisting the cr2 read is a no-brainer, might even help performance...

On March 1, 2014 1:50:42 AM PST, Borislav Petkov  wrote:
>On Sat, Mar 01, 2014 at 10:16:50AM +0100, Ingo Molnar wrote:
>> 
>> * Steven Rostedt  wrote:
>> 
>> > > Also, this function is called a _LOT_ under certain workloads, I 
>> > > don't know how cheap a CR2 read is, but it had better be really 
>> > > cheap.
>> > 
>> > That's a HPA question.
>> 
>> We read CR2 in the page fault hot path, so it's on the top of CPU 
>> architects' minds and it's reasonably optimized. A couple of cycles 
>> IIRC, but would be nice to hear actual numbers.
>
>Yeah, we were discussing this last night on IRC.
>
>And hpa actually meant that the optimization potential was there but no
>one did do it, except maybe Transmeta. :-)
>
>So the expensive thing is writing to CR2 because it is a serializing
>instruction. In fact, all writes to control registers except CR8 are
>serializing.
>
>The reading from CR2 should be cheaper but not as cheap as a normal
>MOV %reg %reg is. On AMD, MOV %reg, %cr2 is done with microcode so
>definitely at least a couple of cycles and I'd guess it is not a
>trivial
>MOV on Intel too.
>
>Maybe a way to hide this cost is the OoO, as hpa suggested, depending
>on
>how much parallelism that particular code region can offer (serializing
>instructions close by).

-- 
Sent from my mobile phone.  Please pardon brevity and lack of formatting.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: perf_fuzzer compiled for x32 causes reboot

2014-03-01 Thread Andi Kleen
Steven Rostedt  writes:
>
> BTW, is the perf_fuzzer code posted somewhere? It sounds like it can be
> really useful for us to do our own testing too.

I believe it's part of trinity.

http://codemonkey.org.uk/projects/trinity/

Perhaps it should have a "ftracer fuzzer" too?

-Andi
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: perf_fuzzer compiled for x32 causes reboot

2014-03-01 Thread Borislav Petkov
On Sat, Mar 01, 2014 at 10:16:50AM +0100, Ingo Molnar wrote:
> 
> * Steven Rostedt  wrote:
> 
> > > Also, this function is called a _LOT_ under certain workloads, I 
> > > don't know how cheap a CR2 read is, but it had better be really 
> > > cheap.
> > 
> > That's a HPA question.
> 
> We read CR2 in the page fault hot path, so it's on the top of CPU 
> architects' minds and it's reasonably optimized. A couple of cycles 
> IIRC, but would be nice to hear actual numbers.

Yeah, we were discussing this last night on IRC.

And hpa actually meant that the optimization potential was there but no
one did do it, except maybe Transmeta. :-)

So the expensive thing is writing to CR2 because it is a serializing
instruction. In fact, all writes to control registers except CR8 are
serializing.

The reading from CR2 should be cheaper but not as cheap as a normal
MOV %reg %reg is. On AMD, MOV %reg, %cr2 is done with microcode so
definitely at least a couple of cycles and I'd guess it is not a trivial
MOV on Intel too.

Maybe a way to hide this cost is the OoO, as hpa suggested, depending on
how much parallelism that particular code region can offer (serializing
instructions close by).

-- 
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: perf_fuzzer compiled for x32 causes reboot

2014-03-01 Thread Ingo Molnar

* Steven Rostedt  wrote:

> > Also, this function is called a _LOT_ under certain workloads, I 
> > don't know how cheap a CR2 read is, but it had better be really 
> > cheap.
> 
> That's a HPA question.

We read CR2 in the page fault hot path, so it's on the top of CPU 
architects' minds and it's reasonably optimized. A couple of cycles 
IIRC, but would be nice to hear actual numbers.

Thanks,

Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: perf_fuzzer compiled for x32 causes reboot

2014-03-01 Thread Ingo Molnar

* Steven Rostedt rost...@goodmis.org wrote:

  Also, this function is called a _LOT_ under certain workloads, I 
  don't know how cheap a CR2 read is, but it had better be really 
  cheap.
 
 That's a HPA question.

We read CR2 in the page fault hot path, so it's on the top of CPU 
architects' minds and it's reasonably optimized. A couple of cycles 
IIRC, but would be nice to hear actual numbers.

Thanks,

Ingo
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: perf_fuzzer compiled for x32 causes reboot

2014-03-01 Thread Borislav Petkov
On Sat, Mar 01, 2014 at 10:16:50AM +0100, Ingo Molnar wrote:
 
 * Steven Rostedt rost...@goodmis.org wrote:
 
   Also, this function is called a _LOT_ under certain workloads, I 
   don't know how cheap a CR2 read is, but it had better be really 
   cheap.
  
  That's a HPA question.
 
 We read CR2 in the page fault hot path, so it's on the top of CPU 
 architects' minds and it's reasonably optimized. A couple of cycles 
 IIRC, but would be nice to hear actual numbers.

Yeah, we were discussing this last night on IRC.

And hpa actually meant that the optimization potential was there but no
one did do it, except maybe Transmeta. :-)

So the expensive thing is writing to CR2 because it is a serializing
instruction. In fact, all writes to control registers except CR8 are
serializing.

The reading from CR2 should be cheaper but not as cheap as a normal
MOV %reg %reg is. On AMD, MOV %reg, %cr2 is done with microcode so
definitely at least a couple of cycles and I'd guess it is not a trivial
MOV on Intel too.

Maybe a way to hide this cost is the OoO, as hpa suggested, depending on
how much parallelism that particular code region can offer (serializing
instructions close by).

-- 
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: perf_fuzzer compiled for x32 causes reboot

2014-03-01 Thread Andi Kleen
Steven Rostedt rost...@goodmis.org writes:

 BTW, is the perf_fuzzer code posted somewhere? It sounds like it can be
 really useful for us to do our own testing too.

I believe it's part of trinity.

http://codemonkey.org.uk/projects/trinity/

Perhaps it should have a ftracer fuzzer too?

-Andi
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: perf_fuzzer compiled for x32 causes reboot

2014-03-01 Thread H. Peter Anvin
The bottom line is that if we want hard numbers we probably have to measure.

Hoisting the cr2 read is a no-brainer, might even help performance...

On March 1, 2014 1:50:42 AM PST, Borislav Petkov b...@alien8.de wrote:
On Sat, Mar 01, 2014 at 10:16:50AM +0100, Ingo Molnar wrote:
 
 * Steven Rostedt rost...@goodmis.org wrote:
 
   Also, this function is called a _LOT_ under certain workloads, I 
   don't know how cheap a CR2 read is, but it had better be really 
   cheap.
  
  That's a HPA question.
 
 We read CR2 in the page fault hot path, so it's on the top of CPU 
 architects' minds and it's reasonably optimized. A couple of cycles 
 IIRC, but would be nice to hear actual numbers.

Yeah, we were discussing this last night on IRC.

And hpa actually meant that the optimization potential was there but no
one did do it, except maybe Transmeta. :-)

So the expensive thing is writing to CR2 because it is a serializing
instruction. In fact, all writes to control registers except CR8 are
serializing.

The reading from CR2 should be cheaper but not as cheap as a normal
MOV %reg %reg is. On AMD, MOV %reg, %cr2 is done with microcode so
definitely at least a couple of cycles and I'd guess it is not a
trivial
MOV on Intel too.

Maybe a way to hide this cost is the OoO, as hpa suggested, depending
on
how much parallelism that particular code region can offer (serializing
instructions close by).

-- 
Sent from my mobile phone.  Please pardon brevity and lack of formatting.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: perf_fuzzer compiled for x32 causes reboot

2014-02-28 Thread Steven Rostedt
On Fri, 28 Feb 2014 18:34:00 -0500 (EST)
Vince Weaver  wrote:

> > I was poking fun at you on IRC for this exact reason:
> > 
> >  poor Vince, I keep sending him new patches. "No, don't test this 
> > patch, now test this one. Oh wait, try this one instead"
> > * peterz sees Vince thinking: "stop... sending.. me.. damn... patches... 
> > already... !!!11!"
> >  or at least, "Let me finish this test before I cancel it again 
> > for another damn patch"
> >  then he's probably doing "I'm not going to run any tests now, 
> > until I wait a while to see if there's a new patch to test"

I hope you were not offended by this. It was actually a testament to
the work you've given us.

> 
> Well while it might appear that I spend all of my days finding perf_event 
> bugs, I actually am a college professor so I do occasionally have to run 
> off to teach a class, meet with students, or write papers/grants for other 
> academics to reject.

But perf_event bug finder is a much more prestigious title than
"college professor" ;-)

> 
> It's nice others can reproduce the issue now, it would have saved me a lot 
> of trouble, although now in theory I have a much better handle of how to 
> use/abuse ftrace so I guess it was worth it.

I always enjoy finding bugs in other people's code. That's usually the
best way to learn what their code does.

> 
> Once the fix gets into git I'm sure the relentless perf_fuzzer will let us 
> know if there are any other issues left.  I do look forward to the day 
> when I can leave it running overnight and have a clean syslog the next 
> morning.

BTW, is the perf_fuzzer code posted somewhere? It sounds like it can be
really useful for us to do our own testing too.

Thanks,

-- Steve
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: perf_fuzzer compiled for x32 causes reboot

2014-02-28 Thread H. Peter Anvin
On 02/28/2014 03:34 PM, Vince Weaver wrote:
> 
> Well while it might appear that I spend all of my days finding perf_event 
> bugs, I actually am a college professor so I do occasionally have to run 
> off to teach a class, meet with students, or write papers/grants for other 
> academics to reject.
> 

We really appreciate your help.  This has been really critical.

>
> It's nice others can reproduce the issue now, it would have saved me a lot 
> of trouble, although now in theory I have a much better handle of how to 
> use/abuse ftrace so I guess it was worth it.
> 
> Once the fix gets into git I'm sure the relentless perf_fuzzer will let us 
> know if there are any other issues left.  I do look forward to the day 
> when I can leave it running overnight and have a clean syslog the next 
> morning.
> 

We all do, definitely, and your help has been a huge step in that direction.

-hpa


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: perf_fuzzer compiled for x32 causes reboot

2014-02-28 Thread Vince Weaver
On Fri, 28 Feb 2014, Steven Rostedt wrote:

> On Fri, 28 Feb 2014 16:18:23 -0500 (EST)
> Vince Weaver  wrote:
> 
> > I was away from the computer this afternoon and of course I have scores of 
> > e-mails on this topic now with lots of competing patches.  Is there one 
> > in particular I'm supposed to be testing?
> 
> I was poking fun at you on IRC for this exact reason:
> 
>  poor Vince, I keep sending him new patches. "No, don't test this 
> patch, now test this one. Oh wait, try this one instead"
> * peterz sees Vince thinking: "stop... sending.. me.. damn... patches... 
> already... !!!11!"
>  or at least, "Let me finish this test before I cancel it again for 
> another damn patch"
>  then he's probably doing "I'm not going to run any tests now, until 
> I wait a while to see if there's a new patch to test"

Well while it might appear that I spend all of my days finding perf_event 
bugs, I actually am a college professor so I do occasionally have to run 
off to teach a class, meet with students, or write papers/grants for other 
academics to reject.

It's nice others can reproduce the issue now, it would have saved me a lot 
of trouble, although now in theory I have a much better handle of how to 
use/abuse ftrace so I guess it was worth it.

Once the fix gets into git I'm sure the relentless perf_fuzzer will let us 
know if there are any other issues left.  I do look forward to the day 
when I can leave it running overnight and have a clean syslog the next 
morning.

Vince
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: perf_fuzzer compiled for x32 causes reboot

2014-02-28 Thread Paul E. McKenney
On Fri, Feb 28, 2014 at 05:05:53PM -0500, Steven Rostedt wrote:
> On Fri, 28 Feb 2014 22:55:11 +0100
> Peter Zijlstra  wrote:
> 
> > On Fri, Feb 28, 2014 at 01:51:50PM -0800, Paul E. McKenney wrote:
> > > On Fri, Feb 28, 2014 at 10:27:00PM +0100, Peter Zijlstra wrote:
> > > > On Fri, Feb 28, 2014 at 01:17:33PM -0800, Paul E. McKenney wrote:
> > > > > This code isn't running in idle context is it?  If so, RCU will 
> > > > > happily
> > > > > free out from under it.  CONFIG_PROVE_RCU should detect this sort of 
> > > > > thing,
> > > > > though.
> > > > 
> > > > Well, interrupts/NMIs can happen when idle, but the interrupt/NMI
> > > > entry code deals with the idle state AFAIK.
> > > 
> > > Yep, rcu_irq_enter() and rcu_nmi_enter() deal with that.  More worried
> > > about this being code invoked from some energy-efficiency driver or
> > > another within the idle loop.
> > 
> > Right, so any tracepoint can end up there; but I thought there was
> > already the rule that tracepoints needed RCU enabled.
> 
> There is and we have special tracepoint caller for those cases we want a
> tracepoint out of RCU scope. These will reactivate rcu in the
> tracepoint code.
> 
>   trace__rcuidle(...)

OK, I finally looked at the emails leading up to this in the thread...
I believe that I am doing premature debugging.

Thanx, Paul

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: perf_fuzzer compiled for x32 causes reboot

2014-02-28 Thread Steven Rostedt
On Fri, 28 Feb 2014 22:55:11 +0100
Peter Zijlstra  wrote:

> On Fri, Feb 28, 2014 at 01:51:50PM -0800, Paul E. McKenney wrote:
> > On Fri, Feb 28, 2014 at 10:27:00PM +0100, Peter Zijlstra wrote:
> > > On Fri, Feb 28, 2014 at 01:17:33PM -0800, Paul E. McKenney wrote:
> > > > This code isn't running in idle context is it?  If so, RCU will happily
> > > > free out from under it.  CONFIG_PROVE_RCU should detect this sort of 
> > > > thing,
> > > > though.
> > > 
> > > Well, interrupts/NMIs can happen when idle, but the interrupt/NMI
> > > entry code deals with the idle state AFAIK.
> > 
> > Yep, rcu_irq_enter() and rcu_nmi_enter() deal with that.  More worried
> > about this being code invoked from some energy-efficiency driver or
> > another within the idle loop.
> 
> Right, so any tracepoint can end up there; but I thought there was
> already the rule that tracepoints needed RCU enabled.

There is and we have special tracepoint caller for those cases we want a
tracepoint out of RCU scope. These will reactivate rcu in the
tracepoint code.

  trace__rcuidle(...)

-- Steve
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: perf_fuzzer compiled for x32 causes reboot

2014-02-28 Thread Peter Zijlstra
On Fri, Feb 28, 2014 at 01:51:50PM -0800, Paul E. McKenney wrote:
> On Fri, Feb 28, 2014 at 10:27:00PM +0100, Peter Zijlstra wrote:
> > On Fri, Feb 28, 2014 at 01:17:33PM -0800, Paul E. McKenney wrote:
> > > This code isn't running in idle context is it?  If so, RCU will happily
> > > free out from under it.  CONFIG_PROVE_RCU should detect this sort of 
> > > thing,
> > > though.
> > 
> > Well, interrupts/NMIs can happen when idle, but the interrupt/NMI
> > entry code deals with the idle state AFAIK.
> 
> Yep, rcu_irq_enter() and rcu_nmi_enter() deal with that.  More worried
> about this being code invoked from some energy-efficiency driver or
> another within the idle loop.

Right, so any tracepoint can end up there; but I thought there was
already the rule that tracepoints needed RCU enabled.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: perf_fuzzer compiled for x32 causes reboot

2014-02-28 Thread Paul E. McKenney
On Fri, Feb 28, 2014 at 10:27:00PM +0100, Peter Zijlstra wrote:
> On Fri, Feb 28, 2014 at 01:17:33PM -0800, Paul E. McKenney wrote:
> > This code isn't running in idle context is it?  If so, RCU will happily
> > free out from under it.  CONFIG_PROVE_RCU should detect this sort of thing,
> > though.
> 
> Well, interrupts/NMIs can happen when idle, but the interrupt/NMI
> entry code deals with the idle state AFAIK.

Yep, rcu_irq_enter() and rcu_nmi_enter() deal with that.  More worried
about this being code invoked from some energy-efficiency driver or
another within the idle loop.

Thanx, Paul

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: perf_fuzzer compiled for x32 causes reboot

2014-02-28 Thread Steven Rostedt
On Fri, 28 Feb 2014 16:18:23 -0500 (EST)
Vince Weaver  wrote:

> I was away from the computer this afternoon and of course I have scores of 
> e-mails on this topic now with lots of competing patches.  Is there one 
> in particular I'm supposed to be testing?

I was poking fun at you on IRC for this exact reason:

 poor Vince, I keep sending him new patches. "No, don't test this 
patch, now test this one. Oh wait, try this one instead"
* peterz sees Vince thinking: "stop... sending.. me.. damn... patches... 
already... !!!11!"
 or at least, "Let me finish this test before I cancel it again for 
another damn patch"
 then he's probably doing "I'm not going to run any tests now, until I 
wait a while to see if there's a new patch to test"

 :-)

Anyway, I would say wait a bit while we sort this out. At least we have
a strong idea of what the bug is. Now we need to agree on the solution.

Vince, you have been really terrific in helping us figure out what was
wrong. I don't want to waste your time until we can all agree on what
the proper fix is. When we do, it would be great if you can test it.

But for now, have a great weekend and thanks for all your hard work in
narrowing down this issue.

-- Steve
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: perf_fuzzer compiled for x32 causes reboot

2014-02-28 Thread Peter Zijlstra
On Fri, Feb 28, 2014 at 01:17:33PM -0800, Paul E. McKenney wrote:
> This code isn't running in idle context is it?  If so, RCU will happily
> free out from under it.  CONFIG_PROVE_RCU should detect this sort of thing,
> though.

Well, interrupts/NMIs can happen when idle, but the interrupt/NMI
entry code deals with the idle state AFAIK.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: perf_fuzzer compiled for x32 causes reboot

2014-02-28 Thread Paul E. McKenney
On Fri, Feb 28, 2014 at 09:54:09PM +0100, Peter Zijlstra wrote:
> On Fri, Feb 28, 2014 at 03:47:16PM -0500, Steven Rostedt wrote:
> > > > I'll try your patch momentarily, first I had some other changes I 
> > > > started 
> > > > running before I left work (for some reason it recompiled the whole 
> > > > kernel).
> > > > 
> > > > 8: function: perf_output_begin
> > > > 8: bprint:   perf_output_begin: VMW: event type 2 config 2a 
> > > > st: 2c3e
> > > > 8: bputs:perf_output_begin: VMW: before rcu_dereference
> > > > 9: function: __do_page_fault
> > > > 9: function:down_read_trylock
> > > > 9: function:_cond_resched
> > > > 9: function:find_vma
> > > > 
> > > > so it looks like the fault happens 
> > > > 
> > > > rcu_read_lock();
> > > > 
> > > > 116 /*
> > > > 117  * For inherited events we send all the output towards the 
> > > > parent.
> > > > 118  */
> > > > 119 if (event->parent)
> > > > 120 event = event->parent;
> > > > 121 
> > > > 
> > > > somewhere between here
> > > > 
> > > > 122 rb = rcu_dereference(event->rb);
> > > > 123 if (unlikely(!rb))
> > > > 124 goto out;
> > > > 
> > > > and here
> > > > 
> > > > 125 
> > > > 126 if (unlikely(!rb->nr_pages))
> > > > 127 goto out;
> > > > 
> > > > although if rcu locks do anything to turn off tracing then this could 
> > > > be 
> > > > suspect.
> > > 
> > > The most likely suspect is of course event->rb in the rcu_dereference.
> > > I have to defer to Steven on how rcu_read_lock() and rcu_read_unlock()
> > > currently interact with tracing.  ;-)
> > 
> > These are all perf related. You'll need to defer to Peter Zijlstra ;-)
> 
> I'm lost.. :/
> 
> pretty much all perf objects are RCU freed. 

This code isn't running in idle context is it?  If so, RCU will happily
free out from under it.  CONFIG_PROVE_RCU should detect this sort of thing,
though.

Thanx, Paul

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: perf_fuzzer compiled for x32 causes reboot

2014-02-28 Thread Vince Weaver
On Fri, 28 Feb 2014, H. Peter Anvin wrote:

> Now we need to figure out if the reboot problem and the segfault problem 
> are actually the same... I have a nasty feeling they might be different 
> problems.

I'm currently running a script that tries setting EBP to all possible 
32-bit pages and running the test to see if that triggers anything.

If I look at my notes the original reboot crash might have happened when I 
had the fuzzer also generating overflow signals (my current tests do not) 
so I'm not sure if having all this mess triggered from inside a signal 
handler could make it reboot somehow.

I was away from the computer this afternoon and of course I have scores of 
e-mails on this topic now with lots of competing patches.  Is there one 
in particular I'm supposed to be testing?

Thanks,

Vince
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: perf_fuzzer compiled for x32 causes reboot

2014-02-28 Thread Steven Rostedt
On Fri, 28 Feb 2014 21:56:38 +0100
Peter Zijlstra  wrote:


> Like already said; _trace is an absolutely abysmal name. Also you
> _really_ don't want an unconditional CR2 write in there, that's just
> stupidly expensive.

But a read isn't. Which is why we only do a write if the copy caused a
page fault (which is already expensive).

The proposed patch will have:

if (cr2 != read_cr2())
write_cr2(cr2);

> 
> Also, this function is called a _LOT_ under certain workloads, I don't
> know how cheap a CR2 read is, but it had better be really cheap.

That's a HPA question.

-- Steve
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: perf_fuzzer compiled for x32 causes reboot

2014-02-28 Thread Peter Zijlstra
On Fri, Feb 28, 2014 at 11:29:46AM -0500, Steven Rostedt wrote:
> On Fri, 28 Feb 2014 08:15:11 -0800
> "H. Peter Anvin"  wrote:
> 
> > Well, I was talking about the assumption spelled out in the comment
> > above copy_from_user_nmi() which pretty much states "cr2 is safe because
> > cr2 is saved/restored in the NMI wrappers."
> 
> Yeah, it seems that the name "copy_from_user_nmi()" is a misnomer. As
> it can be called outside of nmi context. Perhaps we should have a
> copy_from_user_trace() that does the save and restore of the cr2.
> 
> As that's the only place that faults, it may be the best answer.
> 
> Arnaldo,
> 
> Can you test this patch and see if it fixes the bug for you too. Vince
> already said it fixes it for him.
> 
> I'm attaching it below (it's from H. Peter).
> 
> Peter Z., as both Jiri's and my patch fixed the callers of the problem
> area, and as we have been discussing, there may be more problem areas,
> I'm thinking the best solution is to just use H. Peter's patch instead.
> And then we should rename it to copy_from_user_trace().

Like already said; _trace is an absolutely abysmal name. Also you
_really_ don't want an unconditional CR2 write in there, that's just
stupidly expensive.

Also, this function is called a _LOT_ under certain workloads, I don't
know how cheap a CR2 read is, but it had better be really cheap.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: perf_fuzzer compiled for x32 causes reboot

2014-02-28 Thread Peter Zijlstra
On Fri, Feb 28, 2014 at 08:15:11AM -0800, H. Peter Anvin wrote:
> On 02/28/2014 07:40 AM, Peter Zijlstra wrote:
> > On Fri, Feb 28, 2014 at 07:13:06AM -0800, H. Peter Anvin wrote:
> >> If I'm reading this right we end up going from the page fault
> >> tracepoint to copy_from_user_nmi() without going through NMI, and the
> >> cr2 corruption is obvious.  I guess the assumption that only the NMI
> >> path needed to save cr2 is flawed?
> > 
> > It was never assumed it would only go through NMI, but that it would be
> > NMI safe -- and as it turns out, it is that.
> > 
> > What I did assume was that any other callsites would be safe, seeing how
> > they'd already be running in 'normal' contexts.
> > 
> > I had not considered people putting tracepoints _that_ early in the
> > exception paths.
> > 
> > Note that there's more tracepoints there than the one mentioned.
> > 
> 
> Well, I was talking about the assumption spelled out in the comment
> above copy_from_user_nmi() which pretty much states "cr2 is safe because
> cr2 is saved/restored in the NMI wrappers."

That is because we assumed NMI was the only thing that could interrupt a
fault handler before it would read CR2. Never thinking someone would put
a tracepoint there.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: perf_fuzzer compiled for x32 causes reboot

2014-02-28 Thread Peter Zijlstra
On Fri, Feb 28, 2014 at 03:47:16PM -0500, Steven Rostedt wrote:
> > > I'll try your patch momentarily, first I had some other changes I started 
> > > running before I left work (for some reason it recompiled the whole 
> > > kernel).
> > > 
> > > 8: function: perf_output_begin
> > > 8: bprint:   perf_output_begin: VMW: event type 2 config 2a 
> > > st: 2c3e
> > > 8: bputs:perf_output_begin: VMW: before rcu_dereference
> > > 9: function: __do_page_fault
> > > 9: function:down_read_trylock
> > > 9: function:_cond_resched
> > > 9: function:find_vma
> > > 
> > > so it looks like the fault happens 
> > > 
> > > rcu_read_lock();
> > > 
> > > 116 /*
> > > 117  * For inherited events we send all the output towards the 
> > > parent.
> > > 118  */
> > > 119 if (event->parent)
> > > 120 event = event->parent;
> > > 121 
> > > 
> > > somewhere between here
> > > 
> > > 122 rb = rcu_dereference(event->rb);
> > > 123 if (unlikely(!rb))
> > > 124 goto out;
> > > 
> > > and here
> > > 
> > > 125 
> > > 126 if (unlikely(!rb->nr_pages))
> > > 127 goto out;
> > > 
> > > although if rcu locks do anything to turn off tracing then this could be 
> > > suspect.
> > 
> > The most likely suspect is of course event->rb in the rcu_dereference.
> > I have to defer to Steven on how rcu_read_lock() and rcu_read_unlock()
> > currently interact with tracing.  ;-)
> 
> These are all perf related. You'll need to defer to Peter Zijlstra ;-)

I'm lost.. :/

pretty much all perf objects are RCU freed. 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: perf_fuzzer compiled for x32 causes reboot

2014-02-28 Thread Steven Rostedt
On Fri, 28 Feb 2014 12:34:05 -0800
"Paul E. McKenney"  wrote:

> On Thu, Feb 27, 2014 at 08:00:04PM -0500, Vince Weaver wrote:
> > On Thu, 27 Feb 2014, H. Peter Anvin wrote:
> > 
> > > On 02/27/2014 03:30 PM, Steven Rostedt wrote:
> > > > On Thu, 27 Feb 2014 14:52:54 -0800
> > > > "H. Peter Anvin"  wrote:
> > > > 
> > > >> On 02/27/2014 02:31 PM, Steven Rostedt wrote:
> > > >>>
> > > >>> Yeah, something is getting mesed up.
> > > >>>
> > > >>
> > > >> What it *looks* like to me is that we try to nest the cr2 save/restore,
> > > >> which doesn't nest because it is a percpu variable.
> > > >>
> > > >> ... except in the x86-64 case, we *ALSO* save/restore cr2 inside
> > > >> entry_64.S, which makes the stuff in do_nmi completely redundant and
> > > >> there for no good reason.
> > > > 
> > > > Peter, look at the code. That percpu cr2 is in a #ifdef CONFIG_X86_32
> > > > section. That is, it isn't even executed. That's i386 code. The only
> > > > place the cr2 is saved for x86_64 is in entry_64.S.
> > > > 
> > > 
> > > Right, egg on my face.  However, I still think it would make more sense
> > > for it to nest the way entry_64.S does if at all possible.
> > > 
> > > That makes this even more confusing, though.  I would still like to see
> > > what happens with the patch I sent Vince.
> > 
> > I'll try your patch momentarily, first I had some other changes I started 
> > running before I left work (for some reason it recompiled the whole 
> > kernel).
> > 
> > 8: function: perf_output_begin
> > 8: bprint:   perf_output_begin: VMW: event type 2 config 2a st: 
> > 2c3e
> > 8: bputs:perf_output_begin: VMW: before rcu_dereference
> > 9: function: __do_page_fault
> > 9: function:down_read_trylock
> > 9: function:_cond_resched
> > 9: function:find_vma
> > 
> > so it looks like the fault happens 
> > 
> > rcu_read_lock();
> > 
> > 116 /*
> > 117  * For inherited events we send all the output towards the 
> > parent.
> > 118  */
> > 119 if (event->parent)
> > 120 event = event->parent;
> > 121 
> > 
> > somewhere between here
> > 
> > 122 rb = rcu_dereference(event->rb);
> > 123 if (unlikely(!rb))
> > 124 goto out;
> > 
> > and here
> > 
> > 125 
> > 126 if (unlikely(!rb->nr_pages))
> > 127 goto out;
> > 
> > although if rcu locks do anything to turn off tracing then this could be 
> > suspect.
> 
> The most likely suspect is of course event->rb in the rcu_dereference.
> I have to defer to Steven on how rcu_read_lock() and rcu_read_unlock()
> currently interact with tracing.  ;-)

These are all perf related. You'll need to defer to Peter Zijlstra ;-)

-- Steve
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: perf_fuzzer compiled for x32 causes reboot

2014-02-28 Thread Steven Rostedt
On Fri, 28 Feb 2014 12:38:52 -0800
"H. Peter Anvin"  wrote:

> Now we need to figure out if the reboot problem and the segfault problem are 
> actually the same... I have a nasty feeling they might be different problems.

I wonder if there was any recursion problem. Although, I believe perf
has recursion checks. But perhaps something recursed before the checks?

-- Steve
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: perf_fuzzer compiled for x32 causes reboot

2014-02-28 Thread H. Peter Anvin
Now we need to figure out if the reboot problem and the segfault problem are 
actually the same... I have a nasty feeling they might be different problems.

On February 28, 2014 7:07:29 AM PST, Vince Weaver  
wrote:
>On Fri, 28 Feb 2014, Steven Rostedt wrote:
>
>> Interesting. Are you doing a perf function trace?
>> 
>> And just in case, can you add this patch and make sure the copy is
>> called by NMI.
>
>199.900682: function: trace_do_page_fault
>199.900683: page_fault_user:  address=__per_cpu_end
>ip=__per_cpu_end error_code=0x6
>199.900683: function: perf_swevent_get_recursion_context
>199.900684: function: perf_tp_event
>199.900684: function:perf_swevent_event
>199.900684: function:   perf_swevent_overflow
>199.900684: function:  __perf_event_overflow
>199.900685: function: perf_prepare_sample
>199.900685: function:   
>__perf_event_header__init_id
>199.900685: function:   task_tgid_nr_ns
>199.900685: function:   perf_event_tid
>199.900686: function:  __task_pid_nr_ns
>199.900686: function:perf_callchain
>199.900687: function: copy_from_user_nmi
>199.900687: function: trace_do_page_fault
>199.900687: page_fault_kernel:address=irq_stack_union
>ip=copy_user_generic_string error_code=0x0
>199.900688: function:__do_page_fault
>199.900688: function:   bad_area_nosemaphore
>199.900688: function:  __bad_area_nosemaphore
>199.900689: function: no_context
>199.900689: function:fixup_exception
>199.900689: function:  
>search_exception_tables
>199.900689: function:  search_extable
>199.900691: function: copy_user_handle_tail
>199.900691: function: trace_do_page_fault
>199.900691: page_fault_kernel:address=irq_stack_union
>ip=copy_user_handle_tail error_code=0x0
>199.900691: function:__do_page_fault
>199.900692: function:   bad_area_nosemaphore
>199.900692: function:  __bad_area_nosemaphore
>199.900692: function: no_context
>199.900692: function:fixup_exception
>199.900692: function:  
>search_exception_tables
>199.900692: function:  search_extable
>199.900693: function: save_stack_trace
>199.900693: function:dump_trace
>199.900694: function:   print_context_stack
>199.900694: function:  __kernel_text_address
>199.900694: function: is_module_text_address
>199.900695: function:__module_text_address
>199.900695: function:   __module_address
>199.900695: function:  __kernel_text_address
>199.900695: function: is_module_text_address
>199.900696: function:__module_text_address
>199.900696: function:   __module_address
>...
>199.900705: function:  __kernel_text_address
>199.900809: kernel_stack: 
>=> perf_callchain (810d35a2)
>=> perf_prepare_sample (810cfae3)
>=> __perf_event_overflow (810d02f4)
>=> perf_swevent_overflow (810d04e3)
>=> perf_swevent_event (810d0574)
>=> perf_tp_event (810d070c)
>=> perf_trace_x86_exceptions (810341b6)
>=> trace_do_page_fault (81537702)
>=> trace_page_fault (81534772)
>199.900810: function: perf_output_begin
>199.900810: function: __do_page_fault
>199.900810: function:__perf_sw_event
>199.900810: function:  
>perf_swevent_get_recursion_context
>199.900811: function:down_read_trylock
>199.900811: function:_cond_resched
>199.900811: function:find_vma
>199.900811: function:bad_area
>199.900812: function:   up_read
>199.900812: function:   __bad_area_nosemaphore
>199.900812: function:  is_prefetch
>199.900812: function: convert_ip_to_linear
>199.900813: function:  unhandled_signal
>199.900813: function:  __printk_ratelimit
>199.900813: function: _raw_spin_trylock
>199.900813: function: _raw_spin_unlock_irqrestore
>199.900814: function:  printk
>199.900814: function: vprintk_emit

-- 
Sent from my mobile phone.  Please pardon brevity and lack of formatting.
--
To unsubscribe from this list: send the line "unsubscribe 

Re: perf_fuzzer compiled for x32 causes reboot

2014-02-28 Thread Paul E. McKenney
On Thu, Feb 27, 2014 at 08:00:04PM -0500, Vince Weaver wrote:
> On Thu, 27 Feb 2014, H. Peter Anvin wrote:
> 
> > On 02/27/2014 03:30 PM, Steven Rostedt wrote:
> > > On Thu, 27 Feb 2014 14:52:54 -0800
> > > "H. Peter Anvin"  wrote:
> > > 
> > >> On 02/27/2014 02:31 PM, Steven Rostedt wrote:
> > >>>
> > >>> Yeah, something is getting mesed up.
> > >>>
> > >>
> > >> What it *looks* like to me is that we try to nest the cr2 save/restore,
> > >> which doesn't nest because it is a percpu variable.
> > >>
> > >> ... except in the x86-64 case, we *ALSO* save/restore cr2 inside
> > >> entry_64.S, which makes the stuff in do_nmi completely redundant and
> > >> there for no good reason.
> > > 
> > > Peter, look at the code. That percpu cr2 is in a #ifdef CONFIG_X86_32
> > > section. That is, it isn't even executed. That's i386 code. The only
> > > place the cr2 is saved for x86_64 is in entry_64.S.
> > > 
> > 
> > Right, egg on my face.  However, I still think it would make more sense
> > for it to nest the way entry_64.S does if at all possible.
> > 
> > That makes this even more confusing, though.  I would still like to see
> > what happens with the patch I sent Vince.
> 
> I'll try your patch momentarily, first I had some other changes I started 
> running before I left work (for some reason it recompiled the whole 
> kernel).
> 
> 8: function: perf_output_begin
> 8: bprint:   perf_output_begin: VMW: event type 2 config 2a st: 
> 2c3e
> 8: bputs:perf_output_begin: VMW: before rcu_dereference
> 9: function: __do_page_fault
> 9: function:down_read_trylock
> 9: function:_cond_resched
> 9: function:find_vma
> 
> so it looks like the fault happens 
> 
> rcu_read_lock();
> 
> 116 /*
> 117  * For inherited events we send all the output towards the parent.
> 118  */
> 119 if (event->parent)
> 120 event = event->parent;
> 121 
> 
> somewhere between here
> 
> 122 rb = rcu_dereference(event->rb);
> 123 if (unlikely(!rb))
> 124 goto out;
> 
> and here
> 
> 125 
> 126 if (unlikely(!rb->nr_pages))
> 127 goto out;
> 
> although if rcu locks do anything to turn off tracing then this could be 
> suspect.

The most likely suspect is of course event->rb in the rcu_dereference.
I have to defer to Steven on how rcu_read_lock() and rcu_read_unlock()
currently interact with tracing.  ;-)

Thanx, Paul

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: perf_fuzzer compiled for x32 causes reboot

2014-02-28 Thread Steven Rostedt
On Fri, 28 Feb 2014 08:15:11 -0800
"H. Peter Anvin"  wrote:

> Well, I was talking about the assumption spelled out in the comment
> above copy_from_user_nmi() which pretty much states "cr2 is safe because
> cr2 is saved/restored in the NMI wrappers."

Yeah, it seems that the name "copy_from_user_nmi()" is a misnomer. As
it can be called outside of nmi context. Perhaps we should have a
copy_from_user_trace() that does the save and restore of the cr2.

As that's the only place that faults, it may be the best answer.

Arnaldo,

Can you test this patch and see if it fixes the bug for you too. Vince
already said it fixes it for him.

I'm attaching it below (it's from H. Peter).

Peter Z., as both Jiri's and my patch fixed the callers of the problem
area, and as we have been discussing, there may be more problem areas,
I'm thinking the best solution is to just use H. Peter's patch instead.
And then we should rename it to copy_from_user_trace().

-- Steve


diff --git a/arch/x86/lib/usercopy.c b/arch/x86/lib/usercopy.c
index ddf9ecb..938e45c 100644
--- a/arch/x86/lib/usercopy.c
+++ b/arch/x86/lib/usercopy.c
@@ -10,6 +10,8 @@
 #include 
 #include 
 
+#include 
+
 /*
  * We rely on the nested NMI work to allow atomic faults from the NMI path; the
  * nested NMI paths are careful to preserve CR2.
@@ -18,6 +20,7 @@ unsigned long
 copy_from_user_nmi(void *to, const void __user *from, unsigned long n)
 {
unsigned long ret;
+   unsigned long cr2;
 
if (__range_not_ok(from, n, TASK_SIZE))
return 0;
@@ -27,9 +30,11 @@ copy_from_user_nmi(void *to, const void __user *from, 
unsigned long n)
 * disable pagefaults so that its behaviour is consistent even when
 * called form other contexts.
 */
+   cr2 = read_cr2();
pagefault_disable();
ret = __copy_from_user_inatomic(to, from, n);
pagefault_enable();
+   write_cr2(cr2);
 
return ret;
 }

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: perf_fuzzer compiled for x32 causes reboot

2014-02-28 Thread H. Peter Anvin
On 02/28/2014 07:40 AM, Peter Zijlstra wrote:
> On Fri, Feb 28, 2014 at 07:13:06AM -0800, H. Peter Anvin wrote:
>> If I'm reading this right we end up going from the page fault
>> tracepoint to copy_from_user_nmi() without going through NMI, and the
>> cr2 corruption is obvious.  I guess the assumption that only the NMI
>> path needed to save cr2 is flawed?
> 
> It was never assumed it would only go through NMI, but that it would be
> NMI safe -- and as it turns out, it is that.
> 
> What I did assume was that any other callsites would be safe, seeing how
> they'd already be running in 'normal' contexts.
> 
> I had not considered people putting tracepoints _that_ early in the
> exception paths.
> 
> Note that there's more tracepoints there than the one mentioned.
> 

Well, I was talking about the assumption spelled out in the comment
above copy_from_user_nmi() which pretty much states "cr2 is safe because
cr2 is saved/restored in the NMI wrappers."

-hpa
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: perf_fuzzer compiled for x32 causes reboot

2014-02-28 Thread Peter Zijlstra
On Fri, Feb 28, 2014 at 07:13:06AM -0800, H. Peter Anvin wrote:
> If I'm reading this right we end up going from the page fault
> tracepoint to copy_from_user_nmi() without going through NMI, and the
> cr2 corruption is obvious.  I guess the assumption that only the NMI
> path needed to save cr2 is flawed?

It was never assumed it would only go through NMI, but that it would be
NMI safe -- and as it turns out, it is that.

What I did assume was that any other callsites would be safe, seeing how
they'd already be running in 'normal' contexts.

I had not considered people putting tracepoints _that_ early in the
exception paths.

Note that there's more tracepoints there than the one mentioned.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: perf_fuzzer compiled for x32 causes reboot

2014-02-28 Thread Steven Rostedt
On Fri, 28 Feb 2014 10:20:00 -0500
Steven Rostedt  wrote:

> Below is a patch that should fix this. Please remove all other patches
> and try this out.

Updated patch, as Peter Zijlstra on IRC asked me if the
exception_enter() can be traced. And looking at it, it sure can be.

-- Steve

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 6dea040..f606b67 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -1271,9 +1271,15 @@ dotraplinkage void __kprobes
 trace_do_page_fault(struct pt_regs *regs, unsigned long error_code)
 {
enum ctx_state prev_state;
+   unsigned long cr2;
 
+   /* The trace might fault, save the cr2 register */
+   cr2 = read_cr2();
prev_state = exception_enter();
trace_page_fault_entries(regs, error_code);
+   /* Put back the original cr2 if needed */
+   if (cr2 != read_cr2())
+   write_cr2(cr2);
__do_page_fault(regs, error_code);
exception_exit(prev_state);
 }
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: perf_fuzzer compiled for x32 causes reboot

2014-02-28 Thread Steven Rostedt
On Fri, 28 Feb 2014 10:07:29 -0500 (EST)
Vince Weaver  wrote:

> On Fri, 28 Feb 2014, Steven Rostedt wrote:

> 199.900696: function:   __module_address
> ...
> 199.900705: function:  __kernel_text_address
> 199.900809: kernel_stack: 
> => perf_callchain (810d35a2)
> => perf_prepare_sample (810cfae3)
> => __perf_event_overflow (810d02f4)
> => perf_swevent_overflow (810d04e3)
> => perf_swevent_event (810d0574)
> => perf_tp_event (810d070c)
> => perf_trace_x86_exceptions (810341b6)
> => trace_do_page_fault (81537702)
> => trace_page_fault (81534772)

Thank you!!! You just found the bug :-)

The bug was caused by:

commit 25c74b10bacead867478480170083f69cfc0db48
x86, trace: Register exception handler to trace IDT

With this code:

dotraplinkage void __kprobes
trace_do_page_fault(struct pt_regs *regs, unsigned long error_code)
{
enum ctx_state prev_state;

prev_state = exception_enter();
trace_page_fault_entries(regs, error_code);
__do_page_fault(regs, error_code);
exception_exit(prev_state);
}

The trace_page_fault_entries() which is called before the cr2 is saved
can fault by perf doing a userspace stack trace. But the cr2 is not
restored when calling __do_page_fault() and that gets the wrong cr2.

Below is a patch that should fix this. Please remove all other patches
and try this out.

Thanks,

-- Steve

> 199.900810: function: perf_output_begin
> 199.900810: function: __do_page_fault
> 199.900810: function:__perf_sw_event
> 199.900810: function:   perf_swevent_get_recursion_context
> 199.900811: function:down_read_trylock
> 199.900811: function:_cond_resched
> 199.900811: function:find_vma
> 199.900811: function:bad_area
> 199.900812: function:   up_read
> 199.900812: function:   __bad_area_nosemaphore
> 199.900812: function:  is_prefetch
> 199.900812: function: convert_ip_to_linear
> 199.900813: function:  unhandled_signal
> 199.900813: function:  __printk_ratelimit
> 199.900813: function: _raw_spin_trylock
> 199.900813: function: _raw_spin_unlock_irqrestore
> 199.900814: function:  printk
> 199.900814: function: vprintk_emit

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 6dea040..66b636d 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -1271,9 +1271,15 @@ dotraplinkage void __kprobes
 trace_do_page_fault(struct pt_regs *regs, unsigned long error_code)
 {
enum ctx_state prev_state;
+   unsigned long cr2;
 
prev_state = exception_enter();
+   /* The trace might fault, save the cr2 register */
+   cr2 = read_cr2();
trace_page_fault_entries(regs, error_code);
+   /* Put back the original cr2 if needed */
+   if (cr2 != read_cr2())
+   write_cr2(cr2);
__do_page_fault(regs, error_code);
exception_exit(prev_state);
 }
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: perf_fuzzer compiled for x32 causes reboot

2014-02-28 Thread H. Peter Anvin
If I'm reading this right we end up going from the page fault tracepoint to 
copy_from_user_nmi() without going through NMI, and the cr2 corruption is 
obvious.  I guess the assumption that only the NMI path needed to save cr2 is 
flawed?

On February 28, 2014 7:07:29 AM PST, Vince Weaver  
wrote:
>On Fri, 28 Feb 2014, Steven Rostedt wrote:
>
>> Interesting. Are you doing a perf function trace?
>> 
>> And just in case, can you add this patch and make sure the copy is
>> called by NMI.
>
>199.900682: function: trace_do_page_fault
>199.900683: page_fault_user:  address=__per_cpu_end
>ip=__per_cpu_end error_code=0x6
>199.900683: function: perf_swevent_get_recursion_context
>199.900684: function: perf_tp_event
>199.900684: function:perf_swevent_event
>199.900684: function:   perf_swevent_overflow
>199.900684: function:  __perf_event_overflow
>199.900685: function: perf_prepare_sample
>199.900685: function:   
>__perf_event_header__init_id
>199.900685: function:   task_tgid_nr_ns
>199.900685: function:   perf_event_tid
>199.900686: function:  __task_pid_nr_ns
>199.900686: function:perf_callchain
>199.900687: function: copy_from_user_nmi
>199.900687: function: trace_do_page_fault
>199.900687: page_fault_kernel:address=irq_stack_union
>ip=copy_user_generic_string error_code=0x0
>199.900688: function:__do_page_fault
>199.900688: function:   bad_area_nosemaphore
>199.900688: function:  __bad_area_nosemaphore
>199.900689: function: no_context
>199.900689: function:fixup_exception
>199.900689: function:  
>search_exception_tables
>199.900689: function:  search_extable
>199.900691: function: copy_user_handle_tail
>199.900691: function: trace_do_page_fault
>199.900691: page_fault_kernel:address=irq_stack_union
>ip=copy_user_handle_tail error_code=0x0
>199.900691: function:__do_page_fault
>199.900692: function:   bad_area_nosemaphore
>199.900692: function:  __bad_area_nosemaphore
>199.900692: function: no_context
>199.900692: function:fixup_exception
>199.900692: function:  
>search_exception_tables
>199.900692: function:  search_extable
>199.900693: function: save_stack_trace
>199.900693: function:dump_trace
>199.900694: function:   print_context_stack
>199.900694: function:  __kernel_text_address
>199.900694: function: is_module_text_address
>199.900695: function:__module_text_address
>199.900695: function:   __module_address
>199.900695: function:  __kernel_text_address
>199.900695: function: is_module_text_address
>199.900696: function:__module_text_address
>199.900696: function:   __module_address
>...
>199.900705: function:  __kernel_text_address
>199.900809: kernel_stack: 
>=> perf_callchain (810d35a2)
>=> perf_prepare_sample (810cfae3)
>=> __perf_event_overflow (810d02f4)
>=> perf_swevent_overflow (810d04e3)
>=> perf_swevent_event (810d0574)
>=> perf_tp_event (810d070c)
>=> perf_trace_x86_exceptions (810341b6)
>=> trace_do_page_fault (81537702)
>=> trace_page_fault (81534772)
>199.900810: function: perf_output_begin
>199.900810: function: __do_page_fault
>199.900810: function:__perf_sw_event
>199.900810: function:  
>perf_swevent_get_recursion_context
>199.900811: function:down_read_trylock
>199.900811: function:_cond_resched
>199.900811: function:find_vma
>199.900811: function:bad_area
>199.900812: function:   up_read
>199.900812: function:   __bad_area_nosemaphore
>199.900812: function:  is_prefetch
>199.900812: function: convert_ip_to_linear
>199.900813: function:  unhandled_signal
>199.900813: function:  __printk_ratelimit
>199.900813: function: _raw_spin_trylock
>199.900813: function: _raw_spin_unlock_irqrestore
>199.900814: function:  printk
>199.900814: function: vprintk_emit

-- 
Sent from my mobile phone.  Please pardon brevity and lack of 

Re: perf_fuzzer compiled for x32 causes reboot

2014-02-28 Thread Vince Weaver
On Fri, 28 Feb 2014, Steven Rostedt wrote:

> Interesting. Are you doing a perf function trace?
> 
> And just in case, can you add this patch and make sure the copy is
> called by NMI.

199.900682: function: trace_do_page_fault
199.900683: page_fault_user:  address=__per_cpu_end ip=__per_cpu_end 
error_code=0x6
199.900683: function: perf_swevent_get_recursion_context
199.900684: function: perf_tp_event
199.900684: function:perf_swevent_event
199.900684: function:   perf_swevent_overflow
199.900684: function:  __perf_event_overflow
199.900685: function: perf_prepare_sample
199.900685: function:__perf_event_header__init_id
199.900685: function:   task_tgid_nr_ns
199.900685: function:   perf_event_tid
199.900686: function:  __task_pid_nr_ns
199.900686: function:perf_callchain
199.900687: function: copy_from_user_nmi
199.900687: function: trace_do_page_fault
199.900687: page_fault_kernel:address=irq_stack_union 
ip=copy_user_generic_string error_code=0x0
199.900688: function:__do_page_fault
199.900688: function:   bad_area_nosemaphore
199.900688: function:  __bad_area_nosemaphore
199.900689: function: no_context
199.900689: function:fixup_exception
199.900689: function:   search_exception_tables
199.900689: function:  search_extable
199.900691: function: copy_user_handle_tail
199.900691: function: trace_do_page_fault
199.900691: page_fault_kernel:address=irq_stack_union 
ip=copy_user_handle_tail error_code=0x0
199.900691: function:__do_page_fault
199.900692: function:   bad_area_nosemaphore
199.900692: function:  __bad_area_nosemaphore
199.900692: function: no_context
199.900692: function:fixup_exception
199.900692: function:   search_exception_tables
199.900692: function:  search_extable
199.900693: function: save_stack_trace
199.900693: function:dump_trace
199.900694: function:   print_context_stack
199.900694: function:  __kernel_text_address
199.900694: function: is_module_text_address
199.900695: function:__module_text_address
199.900695: function:   __module_address
199.900695: function:  __kernel_text_address
199.900695: function: is_module_text_address
199.900696: function:__module_text_address
199.900696: function:   __module_address
...
199.900705: function:  __kernel_text_address
199.900809: kernel_stack: 
=> perf_callchain (810d35a2)
=> perf_prepare_sample (810cfae3)
=> __perf_event_overflow (810d02f4)
=> perf_swevent_overflow (810d04e3)
=> perf_swevent_event (810d0574)
=> perf_tp_event (810d070c)
=> perf_trace_x86_exceptions (810341b6)
=> trace_do_page_fault (81537702)
=> trace_page_fault (81534772)
199.900810: function: perf_output_begin
199.900810: function: __do_page_fault
199.900810: function:__perf_sw_event
199.900810: function:   perf_swevent_get_recursion_context
199.900811: function:down_read_trylock
199.900811: function:_cond_resched
199.900811: function:find_vma
199.900811: function:bad_area
199.900812: function:   up_read
199.900812: function:   __bad_area_nosemaphore
199.900812: function:  is_prefetch
199.900812: function: convert_ip_to_linear
199.900813: function:  unhandled_signal
199.900813: function:  __printk_ratelimit
199.900813: function: _raw_spin_trylock
199.900813: function: _raw_spin_unlock_irqrestore
199.900814: function:  printk
199.900814: function: vprintk_emit

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: perf_fuzzer compiled for x32 causes reboot

2014-02-28 Thread Steven Rostedt
On Fri, 28 Feb 2014 09:15:33 -0500 (EST)
Vince Weaver  wrote:

> On Thu, 27 Feb 2014, Steven Rostedt wrote:
> 
> > On Thu, 27 Feb 2014 20:34:34 -0500 (EST)
> > Vince Weaver  wrote:
> > 
> > 
> > > > I would actually suggest we do the equivalent on i386 as well.
> > > > 
> > > > Vince, could you try this patch as an experiment?
> > > 
> > > OK with your patch applied it does not segfault.
> > > 
> > 
> > Vince, Great! Can you remove Peter's patch, and try this one. It
> > removes the crud to save the cr2 from entry_64.S and makes both i386
> > and x86_64 do the same thing in regards to cr2 handling.
> 
> no, with only this patch applied it segfaults as per previous:
> 
> [  126.396049] perf_fuzzer[2904]: segfault at 17a0 ip 004017fd sp 
> ffaff3f0 error 6 in perf_fuzzer[40+d1000]

Interesting. Are you doing a perf function trace?

And just in case, can you add this patch and make sure the copy is
called by NMI.

-- Steve

diff --git a/arch/x86/lib/usercopy.c b/arch/x86/lib/usercopy.c
index ddf9ecb..ca943cd 100644
--- a/arch/x86/lib/usercopy.c
+++ b/arch/x86/lib/usercopy.c
@@ -29,6 +29,7 @@ copy_from_user_nmi(void *to, const void __user *from, 
unsigned long n)
 */
pagefault_disable();
ret = __copy_from_user_inatomic(to, from, n);
+   trace_dump_stack(2)
pagefault_enable();
 
return ret;
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: perf_fuzzer compiled for x32 causes reboot

2014-02-28 Thread Vince Weaver
On Thu, 27 Feb 2014, Steven Rostedt wrote:

> On Thu, 27 Feb 2014 20:34:34 -0500 (EST)
> Vince Weaver  wrote:
> 
> 
> > > I would actually suggest we do the equivalent on i386 as well.
> > > 
> > > Vince, could you try this patch as an experiment?
> > 
> > OK with your patch applied it does not segfault.
> > 
> 
> Vince, Great! Can you remove Peter's patch, and try this one. It
> removes the crud to save the cr2 from entry_64.S and makes both i386
> and x86_64 do the same thing in regards to cr2 handling.

no, with only this patch applied it segfaults as per previous:

[  126.396049] perf_fuzzer[2904]: segfault at 17a0 ip 004017fd sp 
ffaff3f0 error 6 in perf_fuzzer[40+d1000]

Vince
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: perf_fuzzer compiled for x32 causes reboot

2014-02-28 Thread Steven Rostedt
On Fri, 28 Feb 2014 12:11:11 +0100
Peter Zijlstra  wrote:

> On Thu, Feb 27, 2014 at 09:57:26PM -0500, Steven Rostedt wrote:
> > @@ -512,8 +508,21 @@ static inline void nmi_nesting_postprocess(void)
> >  dotraplinkage notrace __kprobes void
> >  do_nmi(struct pt_regs *regs, long error_code)
> >  {
> > +   unsigned long cr2;
> > +   
> > nmi_nesting_preprocess(regs);
> >  
> > +   /*
> > +* Save off the CR2 register. If we take a page fault in the NMI then
> > +* it could corrupt the CR2 value. If the NMI preempts a page fault
> > +* handler before it was able to read the CR2 register, and then the
> > +* NMI itself takes a page fault, the page fault that was preempted
> > +* will read the information from the NMI page fault and not the
> > +* origin fault. Save it off and restore it if it changes.
> 
> > +* Use the r12 callee-saved register.
> 
> You might want to make that line go away :-)
> 

Gah! I noticed that before sending and deleted the line, recreated the
patch and sent that out. The one step I forgot to do was to save the
file after deleting it :-p

-- Steve
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: perf_fuzzer compiled for x32 causes reboot

2014-02-28 Thread Peter Zijlstra
On Thu, Feb 27, 2014 at 09:57:26PM -0500, Steven Rostedt wrote:
> @@ -512,8 +508,21 @@ static inline void nmi_nesting_postprocess(void)
>  dotraplinkage notrace __kprobes void
>  do_nmi(struct pt_regs *regs, long error_code)
>  {
> + unsigned long cr2;
> + 
>   nmi_nesting_preprocess(regs);
>  
> + /*
> +  * Save off the CR2 register. If we take a page fault in the NMI then
> +  * it could corrupt the CR2 value. If the NMI preempts a page fault
> +  * handler before it was able to read the CR2 register, and then the
> +  * NMI itself takes a page fault, the page fault that was preempted
> +  * will read the information from the NMI page fault and not the
> +  * origin fault. Save it off and restore it if it changes.

> +  * Use the r12 callee-saved register.

You might want to make that line go away :-)

> +  */
> + cr2 = read_cr2();
> +
>   nmi_enter();
>  
>   inc_irq_stat(__nmi_count);
> @@ -523,6 +532,10 @@ do_nmi(struct pt_regs *regs, long error_code)
>  
>   nmi_exit();
>  
> + /* Reads are cheaper than writes */
> + if (unlikely(cr2 != read_cr2()))
> + write_cr2(cr2);
> +
>   /* On i386, may loop back to preprocess */
>   nmi_nesting_postprocess();
>  }
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: perf_fuzzer compiled for x32 causes reboot

2014-02-28 Thread Peter Zijlstra
On Thu, Feb 27, 2014 at 05:31:50PM -0500, Steven Rostedt wrote:
> Well, the perf ring buffer is vmalloced, right? That can cause a page
> fault too.

On x86 they're typically not -- although we have a debug CONFIG option
to test that code on x86 too. On SPARC/ARM etc.. we have to use
vmalloc_user() because of cache aliasing (yay for VIPT).
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: perf_fuzzer compiled for x32 causes reboot

2014-02-28 Thread Peter Zijlstra
On Thu, Feb 27, 2014 at 05:31:50PM -0500, Steven Rostedt wrote:
 Well, the perf ring buffer is vmalloced, right? That can cause a page
 fault too.

On x86 they're typically not -- although we have a debug CONFIG option
to test that code on x86 too. On SPARC/ARM etc.. we have to use
vmalloc_user() because of cache aliasing (yay for VIPT).
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: perf_fuzzer compiled for x32 causes reboot

2014-02-28 Thread Peter Zijlstra
On Thu, Feb 27, 2014 at 09:57:26PM -0500, Steven Rostedt wrote:
 @@ -512,8 +508,21 @@ static inline void nmi_nesting_postprocess(void)
  dotraplinkage notrace __kprobes void
  do_nmi(struct pt_regs *regs, long error_code)
  {
 + unsigned long cr2;
 + 
   nmi_nesting_preprocess(regs);
  
 + /*
 +  * Save off the CR2 register. If we take a page fault in the NMI then
 +  * it could corrupt the CR2 value. If the NMI preempts a page fault
 +  * handler before it was able to read the CR2 register, and then the
 +  * NMI itself takes a page fault, the page fault that was preempted
 +  * will read the information from the NMI page fault and not the
 +  * origin fault. Save it off and restore it if it changes.

 +  * Use the r12 callee-saved register.

You might want to make that line go away :-)

 +  */
 + cr2 = read_cr2();
 +
   nmi_enter();
  
   inc_irq_stat(__nmi_count);
 @@ -523,6 +532,10 @@ do_nmi(struct pt_regs *regs, long error_code)
  
   nmi_exit();
  
 + /* Reads are cheaper than writes */
 + if (unlikely(cr2 != read_cr2()))
 + write_cr2(cr2);
 +
   /* On i386, may loop back to preprocess */
   nmi_nesting_postprocess();
  }
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: perf_fuzzer compiled for x32 causes reboot

2014-02-28 Thread Steven Rostedt
On Fri, 28 Feb 2014 12:11:11 +0100
Peter Zijlstra pet...@infradead.org wrote:

 On Thu, Feb 27, 2014 at 09:57:26PM -0500, Steven Rostedt wrote:
  @@ -512,8 +508,21 @@ static inline void nmi_nesting_postprocess(void)
   dotraplinkage notrace __kprobes void
   do_nmi(struct pt_regs *regs, long error_code)
   {
  +   unsigned long cr2;
  +   
  nmi_nesting_preprocess(regs);
   
  +   /*
  +* Save off the CR2 register. If we take a page fault in the NMI then
  +* it could corrupt the CR2 value. If the NMI preempts a page fault
  +* handler before it was able to read the CR2 register, and then the
  +* NMI itself takes a page fault, the page fault that was preempted
  +* will read the information from the NMI page fault and not the
  +* origin fault. Save it off and restore it if it changes.
 
  +* Use the r12 callee-saved register.
 
 You might want to make that line go away :-)
 

Gah! I noticed that before sending and deleted the line, recreated the
patch and sent that out. The one step I forgot to do was to save the
file after deleting it :-p

-- Steve
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: perf_fuzzer compiled for x32 causes reboot

2014-02-28 Thread Vince Weaver
On Thu, 27 Feb 2014, Steven Rostedt wrote:

 On Thu, 27 Feb 2014 20:34:34 -0500 (EST)
 Vince Weaver vincent.wea...@maine.edu wrote:
 
 
   I would actually suggest we do the equivalent on i386 as well.
   
   Vince, could you try this patch as an experiment?
  
  OK with your patch applied it does not segfault.
  
 
 Vince, Great! Can you remove Peter's patch, and try this one. It
 removes the crud to save the cr2 from entry_64.S and makes both i386
 and x86_64 do the same thing in regards to cr2 handling.

no, with only this patch applied it segfaults as per previous:

[  126.396049] perf_fuzzer[2904]: segfault at 17a0 ip 004017fd sp 
ffaff3f0 error 6 in perf_fuzzer[40+d1000]

Vince
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: perf_fuzzer compiled for x32 causes reboot

2014-02-28 Thread Steven Rostedt
On Fri, 28 Feb 2014 09:15:33 -0500 (EST)
Vince Weaver vincent.wea...@maine.edu wrote:

 On Thu, 27 Feb 2014, Steven Rostedt wrote:
 
  On Thu, 27 Feb 2014 20:34:34 -0500 (EST)
  Vince Weaver vincent.wea...@maine.edu wrote:
  
  
I would actually suggest we do the equivalent on i386 as well.

Vince, could you try this patch as an experiment?
   
   OK with your patch applied it does not segfault.
   
  
  Vince, Great! Can you remove Peter's patch, and try this one. It
  removes the crud to save the cr2 from entry_64.S and makes both i386
  and x86_64 do the same thing in regards to cr2 handling.
 
 no, with only this patch applied it segfaults as per previous:
 
 [  126.396049] perf_fuzzer[2904]: segfault at 17a0 ip 004017fd sp 
 ffaff3f0 error 6 in perf_fuzzer[40+d1000]

Interesting. Are you doing a perf function trace?

And just in case, can you add this patch and make sure the copy is
called by NMI.

-- Steve

diff --git a/arch/x86/lib/usercopy.c b/arch/x86/lib/usercopy.c
index ddf9ecb..ca943cd 100644
--- a/arch/x86/lib/usercopy.c
+++ b/arch/x86/lib/usercopy.c
@@ -29,6 +29,7 @@ copy_from_user_nmi(void *to, const void __user *from, 
unsigned long n)
 */
pagefault_disable();
ret = __copy_from_user_inatomic(to, from, n);
+   trace_dump_stack(2)
pagefault_enable();
 
return ret;
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: perf_fuzzer compiled for x32 causes reboot

2014-02-28 Thread Vince Weaver
On Fri, 28 Feb 2014, Steven Rostedt wrote:

 Interesting. Are you doing a perf function trace?
 
 And just in case, can you add this patch and make sure the copy is
 called by NMI.

199.900682: function: trace_do_page_fault
199.900683: page_fault_user:  address=__per_cpu_end ip=__per_cpu_end 
error_code=0x6
199.900683: function: perf_swevent_get_recursion_context
199.900684: function: perf_tp_event
199.900684: function:perf_swevent_event
199.900684: function:   perf_swevent_overflow
199.900684: function:  __perf_event_overflow
199.900685: function: perf_prepare_sample
199.900685: function:__perf_event_header__init_id
199.900685: function:   task_tgid_nr_ns
199.900685: function:   perf_event_tid
199.900686: function:  __task_pid_nr_ns
199.900686: function:perf_callchain
199.900687: function: copy_from_user_nmi
199.900687: function: trace_do_page_fault
199.900687: page_fault_kernel:address=irq_stack_union 
ip=copy_user_generic_string error_code=0x0
199.900688: function:__do_page_fault
199.900688: function:   bad_area_nosemaphore
199.900688: function:  __bad_area_nosemaphore
199.900689: function: no_context
199.900689: function:fixup_exception
199.900689: function:   search_exception_tables
199.900689: function:  search_extable
199.900691: function: copy_user_handle_tail
199.900691: function: trace_do_page_fault
199.900691: page_fault_kernel:address=irq_stack_union 
ip=copy_user_handle_tail error_code=0x0
199.900691: function:__do_page_fault
199.900692: function:   bad_area_nosemaphore
199.900692: function:  __bad_area_nosemaphore
199.900692: function: no_context
199.900692: function:fixup_exception
199.900692: function:   search_exception_tables
199.900692: function:  search_extable
199.900693: function: save_stack_trace
199.900693: function:dump_trace
199.900694: function:   print_context_stack
199.900694: function:  __kernel_text_address
199.900694: function: is_module_text_address
199.900695: function:__module_text_address
199.900695: function:   __module_address
199.900695: function:  __kernel_text_address
199.900695: function: is_module_text_address
199.900696: function:__module_text_address
199.900696: function:   __module_address
...
199.900705: function:  __kernel_text_address
199.900809: kernel_stack: stack trace
= perf_callchain (810d35a2)
= perf_prepare_sample (810cfae3)
= __perf_event_overflow (810d02f4)
= perf_swevent_overflow (810d04e3)
= perf_swevent_event (810d0574)
= perf_tp_event (810d070c)
= perf_trace_x86_exceptions (810341b6)
= trace_do_page_fault (81537702)
= trace_page_fault (81534772)
199.900810: function: perf_output_begin
199.900810: function: __do_page_fault
199.900810: function:__perf_sw_event
199.900810: function:   perf_swevent_get_recursion_context
199.900811: function:down_read_trylock
199.900811: function:_cond_resched
199.900811: function:find_vma
199.900811: function:bad_area
199.900812: function:   up_read
199.900812: function:   __bad_area_nosemaphore
199.900812: function:  is_prefetch
199.900812: function: convert_ip_to_linear
199.900813: function:  unhandled_signal
199.900813: function:  __printk_ratelimit
199.900813: function: _raw_spin_trylock
199.900813: function: _raw_spin_unlock_irqrestore
199.900814: function:  printk
199.900814: function: vprintk_emit

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: perf_fuzzer compiled for x32 causes reboot

2014-02-28 Thread H. Peter Anvin
If I'm reading this right we end up going from the page fault tracepoint to 
copy_from_user_nmi() without going through NMI, and the cr2 corruption is 
obvious.  I guess the assumption that only the NMI path needed to save cr2 is 
flawed?

On February 28, 2014 7:07:29 AM PST, Vince Weaver vincent.wea...@maine.edu 
wrote:
On Fri, 28 Feb 2014, Steven Rostedt wrote:

 Interesting. Are you doing a perf function trace?
 
 And just in case, can you add this patch and make sure the copy is
 called by NMI.

199.900682: function: trace_do_page_fault
199.900683: page_fault_user:  address=__per_cpu_end
ip=__per_cpu_end error_code=0x6
199.900683: function: perf_swevent_get_recursion_context
199.900684: function: perf_tp_event
199.900684: function:perf_swevent_event
199.900684: function:   perf_swevent_overflow
199.900684: function:  __perf_event_overflow
199.900685: function: perf_prepare_sample
199.900685: function:   
__perf_event_header__init_id
199.900685: function:   task_tgid_nr_ns
199.900685: function:   perf_event_tid
199.900686: function:  __task_pid_nr_ns
199.900686: function:perf_callchain
199.900687: function: copy_from_user_nmi
199.900687: function: trace_do_page_fault
199.900687: page_fault_kernel:address=irq_stack_union
ip=copy_user_generic_string error_code=0x0
199.900688: function:__do_page_fault
199.900688: function:   bad_area_nosemaphore
199.900688: function:  __bad_area_nosemaphore
199.900689: function: no_context
199.900689: function:fixup_exception
199.900689: function:  
search_exception_tables
199.900689: function:  search_extable
199.900691: function: copy_user_handle_tail
199.900691: function: trace_do_page_fault
199.900691: page_fault_kernel:address=irq_stack_union
ip=copy_user_handle_tail error_code=0x0
199.900691: function:__do_page_fault
199.900692: function:   bad_area_nosemaphore
199.900692: function:  __bad_area_nosemaphore
199.900692: function: no_context
199.900692: function:fixup_exception
199.900692: function:  
search_exception_tables
199.900692: function:  search_extable
199.900693: function: save_stack_trace
199.900693: function:dump_trace
199.900694: function:   print_context_stack
199.900694: function:  __kernel_text_address
199.900694: function: is_module_text_address
199.900695: function:__module_text_address
199.900695: function:   __module_address
199.900695: function:  __kernel_text_address
199.900695: function: is_module_text_address
199.900696: function:__module_text_address
199.900696: function:   __module_address
...
199.900705: function:  __kernel_text_address
199.900809: kernel_stack: stack trace
= perf_callchain (810d35a2)
= perf_prepare_sample (810cfae3)
= __perf_event_overflow (810d02f4)
= perf_swevent_overflow (810d04e3)
= perf_swevent_event (810d0574)
= perf_tp_event (810d070c)
= perf_trace_x86_exceptions (810341b6)
= trace_do_page_fault (81537702)
= trace_page_fault (81534772)
199.900810: function: perf_output_begin
199.900810: function: __do_page_fault
199.900810: function:__perf_sw_event
199.900810: function:  
perf_swevent_get_recursion_context
199.900811: function:down_read_trylock
199.900811: function:_cond_resched
199.900811: function:find_vma
199.900811: function:bad_area
199.900812: function:   up_read
199.900812: function:   __bad_area_nosemaphore
199.900812: function:  is_prefetch
199.900812: function: convert_ip_to_linear
199.900813: function:  unhandled_signal
199.900813: function:  __printk_ratelimit
199.900813: function: _raw_spin_trylock
199.900813: function: _raw_spin_unlock_irqrestore
199.900814: function:  printk
199.900814: function: vprintk_emit

-- 
Sent from my mobile phone.  Please pardon brevity and lack of formatting.
--
To unsubscribe from this list: send the line 

Re: perf_fuzzer compiled for x32 causes reboot

2014-02-28 Thread Steven Rostedt
On Fri, 28 Feb 2014 10:07:29 -0500 (EST)
Vince Weaver vincent.wea...@maine.edu wrote:

 On Fri, 28 Feb 2014, Steven Rostedt wrote:

 199.900696: function:   __module_address
 ...
 199.900705: function:  __kernel_text_address
 199.900809: kernel_stack: stack trace
 = perf_callchain (810d35a2)
 = perf_prepare_sample (810cfae3)
 = __perf_event_overflow (810d02f4)
 = perf_swevent_overflow (810d04e3)
 = perf_swevent_event (810d0574)
 = perf_tp_event (810d070c)
 = perf_trace_x86_exceptions (810341b6)
 = trace_do_page_fault (81537702)
 = trace_page_fault (81534772)

Thank you!!! You just found the bug :-)

The bug was caused by:

commit 25c74b10bacead867478480170083f69cfc0db48
x86, trace: Register exception handler to trace IDT

With this code:

dotraplinkage void __kprobes
trace_do_page_fault(struct pt_regs *regs, unsigned long error_code)
{
enum ctx_state prev_state;

prev_state = exception_enter();
trace_page_fault_entries(regs, error_code);
__do_page_fault(regs, error_code);
exception_exit(prev_state);
}

The trace_page_fault_entries() which is called before the cr2 is saved
can fault by perf doing a userspace stack trace. But the cr2 is not
restored when calling __do_page_fault() and that gets the wrong cr2.

Below is a patch that should fix this. Please remove all other patches
and try this out.

Thanks,

-- Steve

 199.900810: function: perf_output_begin
 199.900810: function: __do_page_fault
 199.900810: function:__perf_sw_event
 199.900810: function:   perf_swevent_get_recursion_context
 199.900811: function:down_read_trylock
 199.900811: function:_cond_resched
 199.900811: function:find_vma
 199.900811: function:bad_area
 199.900812: function:   up_read
 199.900812: function:   __bad_area_nosemaphore
 199.900812: function:  is_prefetch
 199.900812: function: convert_ip_to_linear
 199.900813: function:  unhandled_signal
 199.900813: function:  __printk_ratelimit
 199.900813: function: _raw_spin_trylock
 199.900813: function: _raw_spin_unlock_irqrestore
 199.900814: function:  printk
 199.900814: function: vprintk_emit

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 6dea040..66b636d 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -1271,9 +1271,15 @@ dotraplinkage void __kprobes
 trace_do_page_fault(struct pt_regs *regs, unsigned long error_code)
 {
enum ctx_state prev_state;
+   unsigned long cr2;
 
prev_state = exception_enter();
+   /* The trace might fault, save the cr2 register */
+   cr2 = read_cr2();
trace_page_fault_entries(regs, error_code);
+   /* Put back the original cr2 if needed */
+   if (cr2 != read_cr2())
+   write_cr2(cr2);
__do_page_fault(regs, error_code);
exception_exit(prev_state);
 }
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: perf_fuzzer compiled for x32 causes reboot

2014-02-28 Thread Steven Rostedt
On Fri, 28 Feb 2014 10:20:00 -0500
Steven Rostedt rost...@goodmis.org wrote:

 Below is a patch that should fix this. Please remove all other patches
 and try this out.

Updated patch, as Peter Zijlstra on IRC asked me if the
exception_enter() can be traced. And looking at it, it sure can be.

-- Steve

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 6dea040..f606b67 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -1271,9 +1271,15 @@ dotraplinkage void __kprobes
 trace_do_page_fault(struct pt_regs *regs, unsigned long error_code)
 {
enum ctx_state prev_state;
+   unsigned long cr2;
 
+   /* The trace might fault, save the cr2 register */
+   cr2 = read_cr2();
prev_state = exception_enter();
trace_page_fault_entries(regs, error_code);
+   /* Put back the original cr2 if needed */
+   if (cr2 != read_cr2())
+   write_cr2(cr2);
__do_page_fault(regs, error_code);
exception_exit(prev_state);
 }
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: perf_fuzzer compiled for x32 causes reboot

2014-02-28 Thread Peter Zijlstra
On Fri, Feb 28, 2014 at 07:13:06AM -0800, H. Peter Anvin wrote:
 If I'm reading this right we end up going from the page fault
 tracepoint to copy_from_user_nmi() without going through NMI, and the
 cr2 corruption is obvious.  I guess the assumption that only the NMI
 path needed to save cr2 is flawed?

It was never assumed it would only go through NMI, but that it would be
NMI safe -- and as it turns out, it is that.

What I did assume was that any other callsites would be safe, seeing how
they'd already be running in 'normal' contexts.

I had not considered people putting tracepoints _that_ early in the
exception paths.

Note that there's more tracepoints there than the one mentioned.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: perf_fuzzer compiled for x32 causes reboot

2014-02-28 Thread H. Peter Anvin
On 02/28/2014 07:40 AM, Peter Zijlstra wrote:
 On Fri, Feb 28, 2014 at 07:13:06AM -0800, H. Peter Anvin wrote:
 If I'm reading this right we end up going from the page fault
 tracepoint to copy_from_user_nmi() without going through NMI, and the
 cr2 corruption is obvious.  I guess the assumption that only the NMI
 path needed to save cr2 is flawed?
 
 It was never assumed it would only go through NMI, but that it would be
 NMI safe -- and as it turns out, it is that.
 
 What I did assume was that any other callsites would be safe, seeing how
 they'd already be running in 'normal' contexts.
 
 I had not considered people putting tracepoints _that_ early in the
 exception paths.
 
 Note that there's more tracepoints there than the one mentioned.
 

Well, I was talking about the assumption spelled out in the comment
above copy_from_user_nmi() which pretty much states cr2 is safe because
cr2 is saved/restored in the NMI wrappers.

-hpa
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: perf_fuzzer compiled for x32 causes reboot

2014-02-28 Thread Steven Rostedt
On Fri, 28 Feb 2014 08:15:11 -0800
H. Peter Anvin h...@zytor.com wrote:

 Well, I was talking about the assumption spelled out in the comment
 above copy_from_user_nmi() which pretty much states cr2 is safe because
 cr2 is saved/restored in the NMI wrappers.

Yeah, it seems that the name copy_from_user_nmi() is a misnomer. As
it can be called outside of nmi context. Perhaps we should have a
copy_from_user_trace() that does the save and restore of the cr2.

As that's the only place that faults, it may be the best answer.

Arnaldo,

Can you test this patch and see if it fixes the bug for you too. Vince
already said it fixes it for him.

I'm attaching it below (it's from H. Peter).

Peter Z., as both Jiri's and my patch fixed the callers of the problem
area, and as we have been discussing, there may be more problem areas,
I'm thinking the best solution is to just use H. Peter's patch instead.
And then we should rename it to copy_from_user_trace().

-- Steve


diff --git a/arch/x86/lib/usercopy.c b/arch/x86/lib/usercopy.c
index ddf9ecb..938e45c 100644
--- a/arch/x86/lib/usercopy.c
+++ b/arch/x86/lib/usercopy.c
@@ -10,6 +10,8 @@
 #include asm/word-at-a-time.h
 #include linux/sched.h
 
+#include asm/processor.h
+
 /*
  * We rely on the nested NMI work to allow atomic faults from the NMI path; the
  * nested NMI paths are careful to preserve CR2.
@@ -18,6 +20,7 @@ unsigned long
 copy_from_user_nmi(void *to, const void __user *from, unsigned long n)
 {
unsigned long ret;
+   unsigned long cr2;
 
if (__range_not_ok(from, n, TASK_SIZE))
return 0;
@@ -27,9 +30,11 @@ copy_from_user_nmi(void *to, const void __user *from, 
unsigned long n)
 * disable pagefaults so that its behaviour is consistent even when
 * called form other contexts.
 */
+   cr2 = read_cr2();
pagefault_disable();
ret = __copy_from_user_inatomic(to, from, n);
pagefault_enable();
+   write_cr2(cr2);
 
return ret;
 }

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: perf_fuzzer compiled for x32 causes reboot

2014-02-28 Thread Paul E. McKenney
On Thu, Feb 27, 2014 at 08:00:04PM -0500, Vince Weaver wrote:
 On Thu, 27 Feb 2014, H. Peter Anvin wrote:
 
  On 02/27/2014 03:30 PM, Steven Rostedt wrote:
   On Thu, 27 Feb 2014 14:52:54 -0800
   H. Peter Anvin h...@zytor.com wrote:
   
   On 02/27/2014 02:31 PM, Steven Rostedt wrote:
  
   Yeah, something is getting mesed up.
  
  
   What it *looks* like to me is that we try to nest the cr2 save/restore,
   which doesn't nest because it is a percpu variable.
  
   ... except in the x86-64 case, we *ALSO* save/restore cr2 inside
   entry_64.S, which makes the stuff in do_nmi completely redundant and
   there for no good reason.
   
   Peter, look at the code. That percpu cr2 is in a #ifdef CONFIG_X86_32
   section. That is, it isn't even executed. That's i386 code. The only
   place the cr2 is saved for x86_64 is in entry_64.S.
   
  
  Right, egg on my face.  However, I still think it would make more sense
  for it to nest the way entry_64.S does if at all possible.
  
  That makes this even more confusing, though.  I would still like to see
  what happens with the patch I sent Vince.
 
 I'll try your patch momentarily, first I had some other changes I started 
 running before I left work (for some reason it recompiled the whole 
 kernel).
 
 8: function: perf_output_begin
 8: bprint:   perf_output_begin: VMW: event type 2 config 2a st: 
 2c3e
 8: bputs:perf_output_begin: VMW: before rcu_dereference
 9: function: __do_page_fault
 9: function:down_read_trylock
 9: function:_cond_resched
 9: function:find_vma
 
 so it looks like the fault happens 
 
 rcu_read_lock();
 
 116 /*
 117  * For inherited events we send all the output towards the parent.
 118  */
 119 if (event-parent)
 120 event = event-parent;
 121 
 
 somewhere between here
 
 122 rb = rcu_dereference(event-rb);
 123 if (unlikely(!rb))
 124 goto out;
 
 and here
 
 125 
 126 if (unlikely(!rb-nr_pages))
 127 goto out;
 
 although if rcu locks do anything to turn off tracing then this could be 
 suspect.

The most likely suspect is of course event-rb in the rcu_dereference.
I have to defer to Steven on how rcu_read_lock() and rcu_read_unlock()
currently interact with tracing.  ;-)

Thanx, Paul

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: perf_fuzzer compiled for x32 causes reboot

2014-02-28 Thread H. Peter Anvin
Now we need to figure out if the reboot problem and the segfault problem are 
actually the same... I have a nasty feeling they might be different problems.

On February 28, 2014 7:07:29 AM PST, Vince Weaver vincent.wea...@maine.edu 
wrote:
On Fri, 28 Feb 2014, Steven Rostedt wrote:

 Interesting. Are you doing a perf function trace?
 
 And just in case, can you add this patch and make sure the copy is
 called by NMI.

199.900682: function: trace_do_page_fault
199.900683: page_fault_user:  address=__per_cpu_end
ip=__per_cpu_end error_code=0x6
199.900683: function: perf_swevent_get_recursion_context
199.900684: function: perf_tp_event
199.900684: function:perf_swevent_event
199.900684: function:   perf_swevent_overflow
199.900684: function:  __perf_event_overflow
199.900685: function: perf_prepare_sample
199.900685: function:   
__perf_event_header__init_id
199.900685: function:   task_tgid_nr_ns
199.900685: function:   perf_event_tid
199.900686: function:  __task_pid_nr_ns
199.900686: function:perf_callchain
199.900687: function: copy_from_user_nmi
199.900687: function: trace_do_page_fault
199.900687: page_fault_kernel:address=irq_stack_union
ip=copy_user_generic_string error_code=0x0
199.900688: function:__do_page_fault
199.900688: function:   bad_area_nosemaphore
199.900688: function:  __bad_area_nosemaphore
199.900689: function: no_context
199.900689: function:fixup_exception
199.900689: function:  
search_exception_tables
199.900689: function:  search_extable
199.900691: function: copy_user_handle_tail
199.900691: function: trace_do_page_fault
199.900691: page_fault_kernel:address=irq_stack_union
ip=copy_user_handle_tail error_code=0x0
199.900691: function:__do_page_fault
199.900692: function:   bad_area_nosemaphore
199.900692: function:  __bad_area_nosemaphore
199.900692: function: no_context
199.900692: function:fixup_exception
199.900692: function:  
search_exception_tables
199.900692: function:  search_extable
199.900693: function: save_stack_trace
199.900693: function:dump_trace
199.900694: function:   print_context_stack
199.900694: function:  __kernel_text_address
199.900694: function: is_module_text_address
199.900695: function:__module_text_address
199.900695: function:   __module_address
199.900695: function:  __kernel_text_address
199.900695: function: is_module_text_address
199.900696: function:__module_text_address
199.900696: function:   __module_address
...
199.900705: function:  __kernel_text_address
199.900809: kernel_stack: stack trace
= perf_callchain (810d35a2)
= perf_prepare_sample (810cfae3)
= __perf_event_overflow (810d02f4)
= perf_swevent_overflow (810d04e3)
= perf_swevent_event (810d0574)
= perf_tp_event (810d070c)
= perf_trace_x86_exceptions (810341b6)
= trace_do_page_fault (81537702)
= trace_page_fault (81534772)
199.900810: function: perf_output_begin
199.900810: function: __do_page_fault
199.900810: function:__perf_sw_event
199.900810: function:  
perf_swevent_get_recursion_context
199.900811: function:down_read_trylock
199.900811: function:_cond_resched
199.900811: function:find_vma
199.900811: function:bad_area
199.900812: function:   up_read
199.900812: function:   __bad_area_nosemaphore
199.900812: function:  is_prefetch
199.900812: function: convert_ip_to_linear
199.900813: function:  unhandled_signal
199.900813: function:  __printk_ratelimit
199.900813: function: _raw_spin_trylock
199.900813: function: _raw_spin_unlock_irqrestore
199.900814: function:  printk
199.900814: function: vprintk_emit

-- 
Sent from my mobile phone.  Please pardon brevity and lack of formatting.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More 

Re: perf_fuzzer compiled for x32 causes reboot

2014-02-28 Thread Steven Rostedt
On Fri, 28 Feb 2014 12:38:52 -0800
H. Peter Anvin h...@zytor.com wrote:

 Now we need to figure out if the reboot problem and the segfault problem are 
 actually the same... I have a nasty feeling they might be different problems.

I wonder if there was any recursion problem. Although, I believe perf
has recursion checks. But perhaps something recursed before the checks?

-- Steve
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: perf_fuzzer compiled for x32 causes reboot

2014-02-28 Thread Steven Rostedt
On Fri, 28 Feb 2014 12:34:05 -0800
Paul E. McKenney paul...@linux.vnet.ibm.com wrote:

 On Thu, Feb 27, 2014 at 08:00:04PM -0500, Vince Weaver wrote:
  On Thu, 27 Feb 2014, H. Peter Anvin wrote:
  
   On 02/27/2014 03:30 PM, Steven Rostedt wrote:
On Thu, 27 Feb 2014 14:52:54 -0800
H. Peter Anvin h...@zytor.com wrote:

On 02/27/2014 02:31 PM, Steven Rostedt wrote:
   
Yeah, something is getting mesed up.
   
   
What it *looks* like to me is that we try to nest the cr2 save/restore,
which doesn't nest because it is a percpu variable.
   
... except in the x86-64 case, we *ALSO* save/restore cr2 inside
entry_64.S, which makes the stuff in do_nmi completely redundant and
there for no good reason.

Peter, look at the code. That percpu cr2 is in a #ifdef CONFIG_X86_32
section. That is, it isn't even executed. That's i386 code. The only
place the cr2 is saved for x86_64 is in entry_64.S.

   
   Right, egg on my face.  However, I still think it would make more sense
   for it to nest the way entry_64.S does if at all possible.
   
   That makes this even more confusing, though.  I would still like to see
   what happens with the patch I sent Vince.
  
  I'll try your patch momentarily, first I had some other changes I started 
  running before I left work (for some reason it recompiled the whole 
  kernel).
  
  8: function: perf_output_begin
  8: bprint:   perf_output_begin: VMW: event type 2 config 2a st: 
  2c3e
  8: bputs:perf_output_begin: VMW: before rcu_dereference
  9: function: __do_page_fault
  9: function:down_read_trylock
  9: function:_cond_resched
  9: function:find_vma
  
  so it looks like the fault happens 
  
  rcu_read_lock();
  
  116 /*
  117  * For inherited events we send all the output towards the 
  parent.
  118  */
  119 if (event-parent)
  120 event = event-parent;
  121 
  
  somewhere between here
  
  122 rb = rcu_dereference(event-rb);
  123 if (unlikely(!rb))
  124 goto out;
  
  and here
  
  125 
  126 if (unlikely(!rb-nr_pages))
  127 goto out;
  
  although if rcu locks do anything to turn off tracing then this could be 
  suspect.
 
 The most likely suspect is of course event-rb in the rcu_dereference.
 I have to defer to Steven on how rcu_read_lock() and rcu_read_unlock()
 currently interact with tracing.  ;-)

These are all perf related. You'll need to defer to Peter Zijlstra ;-)

-- Steve
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: perf_fuzzer compiled for x32 causes reboot

2014-02-28 Thread Peter Zijlstra
On Fri, Feb 28, 2014 at 03:47:16PM -0500, Steven Rostedt wrote:
   I'll try your patch momentarily, first I had some other changes I started 
   running before I left work (for some reason it recompiled the whole 
   kernel).
   
   8: function: perf_output_begin
   8: bprint:   perf_output_begin: VMW: event type 2 config 2a 
   st: 2c3e
   8: bputs:perf_output_begin: VMW: before rcu_dereference
   9: function: __do_page_fault
   9: function:down_read_trylock
   9: function:_cond_resched
   9: function:find_vma
   
   so it looks like the fault happens 
   
   rcu_read_lock();
   
   116 /*
   117  * For inherited events we send all the output towards the 
   parent.
   118  */
   119 if (event-parent)
   120 event = event-parent;
   121 
   
   somewhere between here
   
   122 rb = rcu_dereference(event-rb);
   123 if (unlikely(!rb))
   124 goto out;
   
   and here
   
   125 
   126 if (unlikely(!rb-nr_pages))
   127 goto out;
   
   although if rcu locks do anything to turn off tracing then this could be 
   suspect.
  
  The most likely suspect is of course event-rb in the rcu_dereference.
  I have to defer to Steven on how rcu_read_lock() and rcu_read_unlock()
  currently interact with tracing.  ;-)
 
 These are all perf related. You'll need to defer to Peter Zijlstra ;-)

I'm lost.. :/

pretty much all perf objects are RCU freed. 
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: perf_fuzzer compiled for x32 causes reboot

2014-02-28 Thread Peter Zijlstra
On Fri, Feb 28, 2014 at 08:15:11AM -0800, H. Peter Anvin wrote:
 On 02/28/2014 07:40 AM, Peter Zijlstra wrote:
  On Fri, Feb 28, 2014 at 07:13:06AM -0800, H. Peter Anvin wrote:
  If I'm reading this right we end up going from the page fault
  tracepoint to copy_from_user_nmi() without going through NMI, and the
  cr2 corruption is obvious.  I guess the assumption that only the NMI
  path needed to save cr2 is flawed?
  
  It was never assumed it would only go through NMI, but that it would be
  NMI safe -- and as it turns out, it is that.
  
  What I did assume was that any other callsites would be safe, seeing how
  they'd already be running in 'normal' contexts.
  
  I had not considered people putting tracepoints _that_ early in the
  exception paths.
  
  Note that there's more tracepoints there than the one mentioned.
  
 
 Well, I was talking about the assumption spelled out in the comment
 above copy_from_user_nmi() which pretty much states cr2 is safe because
 cr2 is saved/restored in the NMI wrappers.

That is because we assumed NMI was the only thing that could interrupt a
fault handler before it would read CR2. Never thinking someone would put
a tracepoint there.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: perf_fuzzer compiled for x32 causes reboot

2014-02-28 Thread Peter Zijlstra
On Fri, Feb 28, 2014 at 11:29:46AM -0500, Steven Rostedt wrote:
 On Fri, 28 Feb 2014 08:15:11 -0800
 H. Peter Anvin h...@zytor.com wrote:
 
  Well, I was talking about the assumption spelled out in the comment
  above copy_from_user_nmi() which pretty much states cr2 is safe because
  cr2 is saved/restored in the NMI wrappers.
 
 Yeah, it seems that the name copy_from_user_nmi() is a misnomer. As
 it can be called outside of nmi context. Perhaps we should have a
 copy_from_user_trace() that does the save and restore of the cr2.
 
 As that's the only place that faults, it may be the best answer.
 
 Arnaldo,
 
 Can you test this patch and see if it fixes the bug for you too. Vince
 already said it fixes it for him.
 
 I'm attaching it below (it's from H. Peter).
 
 Peter Z., as both Jiri's and my patch fixed the callers of the problem
 area, and as we have been discussing, there may be more problem areas,
 I'm thinking the best solution is to just use H. Peter's patch instead.
 And then we should rename it to copy_from_user_trace().

Like already said; _trace is an absolutely abysmal name. Also you
_really_ don't want an unconditional CR2 write in there, that's just
stupidly expensive.

Also, this function is called a _LOT_ under certain workloads, I don't
know how cheap a CR2 read is, but it had better be really cheap.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: perf_fuzzer compiled for x32 causes reboot

2014-02-28 Thread Steven Rostedt
On Fri, 28 Feb 2014 21:56:38 +0100
Peter Zijlstra pet...@infradead.org wrote:


 Like already said; _trace is an absolutely abysmal name. Also you
 _really_ don't want an unconditional CR2 write in there, that's just
 stupidly expensive.

But a read isn't. Which is why we only do a write if the copy caused a
page fault (which is already expensive).

The proposed patch will have:

if (cr2 != read_cr2())
write_cr2(cr2);

 
 Also, this function is called a _LOT_ under certain workloads, I don't
 know how cheap a CR2 read is, but it had better be really cheap.

That's a HPA question.

-- Steve
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: perf_fuzzer compiled for x32 causes reboot

2014-02-28 Thread Vince Weaver
On Fri, 28 Feb 2014, H. Peter Anvin wrote:

 Now we need to figure out if the reboot problem and the segfault problem 
 are actually the same... I have a nasty feeling they might be different 
 problems.

I'm currently running a script that tries setting EBP to all possible 
32-bit pages and running the test to see if that triggers anything.

If I look at my notes the original reboot crash might have happened when I 
had the fuzzer also generating overflow signals (my current tests do not) 
so I'm not sure if having all this mess triggered from inside a signal 
handler could make it reboot somehow.

I was away from the computer this afternoon and of course I have scores of 
e-mails on this topic now with lots of competing patches.  Is there one 
in particular I'm supposed to be testing?

Thanks,

Vince
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: perf_fuzzer compiled for x32 causes reboot

2014-02-28 Thread Paul E. McKenney
On Fri, Feb 28, 2014 at 09:54:09PM +0100, Peter Zijlstra wrote:
 On Fri, Feb 28, 2014 at 03:47:16PM -0500, Steven Rostedt wrote:
I'll try your patch momentarily, first I had some other changes I 
started 
running before I left work (for some reason it recompiled the whole 
kernel).

8: function: perf_output_begin
8: bprint:   perf_output_begin: VMW: event type 2 config 2a 
st: 2c3e
8: bputs:perf_output_begin: VMW: before rcu_dereference
9: function: __do_page_fault
9: function:down_read_trylock
9: function:_cond_resched
9: function:find_vma

so it looks like the fault happens 

rcu_read_lock();

116 /*
117  * For inherited events we send all the output towards the 
parent.
118  */
119 if (event-parent)
120 event = event-parent;
121 

somewhere between here

122 rb = rcu_dereference(event-rb);
123 if (unlikely(!rb))
124 goto out;

and here

125 
126 if (unlikely(!rb-nr_pages))
127 goto out;

although if rcu locks do anything to turn off tracing then this could 
be 
suspect.
   
   The most likely suspect is of course event-rb in the rcu_dereference.
   I have to defer to Steven on how rcu_read_lock() and rcu_read_unlock()
   currently interact with tracing.  ;-)
  
  These are all perf related. You'll need to defer to Peter Zijlstra ;-)
 
 I'm lost.. :/
 
 pretty much all perf objects are RCU freed. 

This code isn't running in idle context is it?  If so, RCU will happily
free out from under it.  CONFIG_PROVE_RCU should detect this sort of thing,
though.

Thanx, Paul

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: perf_fuzzer compiled for x32 causes reboot

2014-02-28 Thread Peter Zijlstra
On Fri, Feb 28, 2014 at 01:17:33PM -0800, Paul E. McKenney wrote:
 This code isn't running in idle context is it?  If so, RCU will happily
 free out from under it.  CONFIG_PROVE_RCU should detect this sort of thing,
 though.

Well, interrupts/NMIs can happen when idle, but the interrupt/NMI
entry code deals with the idle state AFAIK.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: perf_fuzzer compiled for x32 causes reboot

2014-02-28 Thread Steven Rostedt
On Fri, 28 Feb 2014 16:18:23 -0500 (EST)
Vince Weaver vincent.wea...@maine.edu wrote:

 I was away from the computer this afternoon and of course I have scores of 
 e-mails on this topic now with lots of competing patches.  Is there one 
 in particular I'm supposed to be testing?

I was poking fun at you on IRC for this exact reason:

rostedt poor Vince, I keep sending him new patches. No, don't test this 
patch, now test this one. Oh wait, try this one instead
* peterz sees Vince thinking: stop... sending.. me.. damn... patches... 
already... !!!11!
rostedt or at least, Let me finish this test before I cancel it again for 
another damn patch
rostedt then he's probably doing I'm not going to run any tests now, until I 
wait a while to see if there's a new patch to test

 :-)

Anyway, I would say wait a bit while we sort this out. At least we have
a strong idea of what the bug is. Now we need to agree on the solution.

Vince, you have been really terrific in helping us figure out what was
wrong. I don't want to waste your time until we can all agree on what
the proper fix is. When we do, it would be great if you can test it.

But for now, have a great weekend and thanks for all your hard work in
narrowing down this issue.

-- Steve
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: perf_fuzzer compiled for x32 causes reboot

2014-02-28 Thread Paul E. McKenney
On Fri, Feb 28, 2014 at 10:27:00PM +0100, Peter Zijlstra wrote:
 On Fri, Feb 28, 2014 at 01:17:33PM -0800, Paul E. McKenney wrote:
  This code isn't running in idle context is it?  If so, RCU will happily
  free out from under it.  CONFIG_PROVE_RCU should detect this sort of thing,
  though.
 
 Well, interrupts/NMIs can happen when idle, but the interrupt/NMI
 entry code deals with the idle state AFAIK.

Yep, rcu_irq_enter() and rcu_nmi_enter() deal with that.  More worried
about this being code invoked from some energy-efficiency driver or
another within the idle loop.

Thanx, Paul

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: perf_fuzzer compiled for x32 causes reboot

2014-02-28 Thread Peter Zijlstra
On Fri, Feb 28, 2014 at 01:51:50PM -0800, Paul E. McKenney wrote:
 On Fri, Feb 28, 2014 at 10:27:00PM +0100, Peter Zijlstra wrote:
  On Fri, Feb 28, 2014 at 01:17:33PM -0800, Paul E. McKenney wrote:
   This code isn't running in idle context is it?  If so, RCU will happily
   free out from under it.  CONFIG_PROVE_RCU should detect this sort of 
   thing,
   though.
  
  Well, interrupts/NMIs can happen when idle, but the interrupt/NMI
  entry code deals with the idle state AFAIK.
 
 Yep, rcu_irq_enter() and rcu_nmi_enter() deal with that.  More worried
 about this being code invoked from some energy-efficiency driver or
 another within the idle loop.

Right, so any tracepoint can end up there; but I thought there was
already the rule that tracepoints needed RCU enabled.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: perf_fuzzer compiled for x32 causes reboot

2014-02-28 Thread Steven Rostedt
On Fri, 28 Feb 2014 22:55:11 +0100
Peter Zijlstra pet...@infradead.org wrote:

 On Fri, Feb 28, 2014 at 01:51:50PM -0800, Paul E. McKenney wrote:
  On Fri, Feb 28, 2014 at 10:27:00PM +0100, Peter Zijlstra wrote:
   On Fri, Feb 28, 2014 at 01:17:33PM -0800, Paul E. McKenney wrote:
This code isn't running in idle context is it?  If so, RCU will happily
free out from under it.  CONFIG_PROVE_RCU should detect this sort of 
thing,
though.
   
   Well, interrupts/NMIs can happen when idle, but the interrupt/NMI
   entry code deals with the idle state AFAIK.
  
  Yep, rcu_irq_enter() and rcu_nmi_enter() deal with that.  More worried
  about this being code invoked from some energy-efficiency driver or
  another within the idle loop.
 
 Right, so any tracepoint can end up there; but I thought there was
 already the rule that tracepoints needed RCU enabled.

There is and we have special tracepoint caller for those cases we want a
tracepoint out of RCU scope. These will reactivate rcu in the
tracepoint code.

  trace_tp_name_rcuidle(...)

-- Steve
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: perf_fuzzer compiled for x32 causes reboot

2014-02-28 Thread Paul E. McKenney
On Fri, Feb 28, 2014 at 05:05:53PM -0500, Steven Rostedt wrote:
 On Fri, 28 Feb 2014 22:55:11 +0100
 Peter Zijlstra pet...@infradead.org wrote:
 
  On Fri, Feb 28, 2014 at 01:51:50PM -0800, Paul E. McKenney wrote:
   On Fri, Feb 28, 2014 at 10:27:00PM +0100, Peter Zijlstra wrote:
On Fri, Feb 28, 2014 at 01:17:33PM -0800, Paul E. McKenney wrote:
 This code isn't running in idle context is it?  If so, RCU will 
 happily
 free out from under it.  CONFIG_PROVE_RCU should detect this sort of 
 thing,
 though.

Well, interrupts/NMIs can happen when idle, but the interrupt/NMI
entry code deals with the idle state AFAIK.
   
   Yep, rcu_irq_enter() and rcu_nmi_enter() deal with that.  More worried
   about this being code invoked from some energy-efficiency driver or
   another within the idle loop.
  
  Right, so any tracepoint can end up there; but I thought there was
  already the rule that tracepoints needed RCU enabled.
 
 There is and we have special tracepoint caller for those cases we want a
 tracepoint out of RCU scope. These will reactivate rcu in the
 tracepoint code.
 
   trace_tp_name_rcuidle(...)

OK, I finally looked at the emails leading up to this in the thread...
I believe that I am doing premature debugging.

Thanx, Paul

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: perf_fuzzer compiled for x32 causes reboot

2014-02-28 Thread Vince Weaver
On Fri, 28 Feb 2014, Steven Rostedt wrote:

 On Fri, 28 Feb 2014 16:18:23 -0500 (EST)
 Vince Weaver vincent.wea...@maine.edu wrote:
 
  I was away from the computer this afternoon and of course I have scores of 
  e-mails on this topic now with lots of competing patches.  Is there one 
  in particular I'm supposed to be testing?
 
 I was poking fun at you on IRC for this exact reason:
 
 rostedt poor Vince, I keep sending him new patches. No, don't test this 
 patch, now test this one. Oh wait, try this one instead
 * peterz sees Vince thinking: stop... sending.. me.. damn... patches... 
 already... !!!11!
 rostedt or at least, Let me finish this test before I cancel it again for 
 another damn patch
 rostedt then he's probably doing I'm not going to run any tests now, until 
 I wait a while to see if there's a new patch to test

Well while it might appear that I spend all of my days finding perf_event 
bugs, I actually am a college professor so I do occasionally have to run 
off to teach a class, meet with students, or write papers/grants for other 
academics to reject.

It's nice others can reproduce the issue now, it would have saved me a lot 
of trouble, although now in theory I have a much better handle of how to 
use/abuse ftrace so I guess it was worth it.

Once the fix gets into git I'm sure the relentless perf_fuzzer will let us 
know if there are any other issues left.  I do look forward to the day 
when I can leave it running overnight and have a clean syslog the next 
morning.

Vince
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: perf_fuzzer compiled for x32 causes reboot

2014-02-28 Thread H. Peter Anvin
On 02/28/2014 03:34 PM, Vince Weaver wrote:
 
 Well while it might appear that I spend all of my days finding perf_event 
 bugs, I actually am a college professor so I do occasionally have to run 
 off to teach a class, meet with students, or write papers/grants for other 
 academics to reject.
 

We really appreciate your help.  This has been really critical.


 It's nice others can reproduce the issue now, it would have saved me a lot 
 of trouble, although now in theory I have a much better handle of how to 
 use/abuse ftrace so I guess it was worth it.
 
 Once the fix gets into git I'm sure the relentless perf_fuzzer will let us 
 know if there are any other issues left.  I do look forward to the day 
 when I can leave it running overnight and have a clean syslog the next 
 morning.
 

We all do, definitely, and your help has been a huge step in that direction.

-hpa


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: perf_fuzzer compiled for x32 causes reboot

2014-02-28 Thread Steven Rostedt
On Fri, 28 Feb 2014 18:34:00 -0500 (EST)
Vince Weaver vincent.wea...@maine.edu wrote:

  I was poking fun at you on IRC for this exact reason:
  
  rostedt poor Vince, I keep sending him new patches. No, don't test this 
  patch, now test this one. Oh wait, try this one instead
  * peterz sees Vince thinking: stop... sending.. me.. damn... patches... 
  already... !!!11!
  rostedt or at least, Let me finish this test before I cancel it again 
  for another damn patch
  rostedt then he's probably doing I'm not going to run any tests now, 
  until I wait a while to see if there's a new patch to test

I hope you were not offended by this. It was actually a testament to
the work you've given us.

 
 Well while it might appear that I spend all of my days finding perf_event 
 bugs, I actually am a college professor so I do occasionally have to run 
 off to teach a class, meet with students, or write papers/grants for other 
 academics to reject.

But perf_event bug finder is a much more prestigious title than
college professor ;-)

 
 It's nice others can reproduce the issue now, it would have saved me a lot 
 of trouble, although now in theory I have a much better handle of how to 
 use/abuse ftrace so I guess it was worth it.

I always enjoy finding bugs in other people's code. That's usually the
best way to learn what their code does.

 
 Once the fix gets into git I'm sure the relentless perf_fuzzer will let us 
 know if there are any other issues left.  I do look forward to the day 
 when I can leave it running overnight and have a clean syslog the next 
 morning.

BTW, is the perf_fuzzer code posted somewhere? It sounds like it can be
really useful for us to do our own testing too.

Thanks,

-- Steve
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: perf_fuzzer compiled for x32 causes reboot

2014-02-27 Thread Steven Rostedt
On Thu, 27 Feb 2014 20:34:34 -0500 (EST)
Vince Weaver  wrote:


> > I would actually suggest we do the equivalent on i386 as well.
> > 
> > Vince, could you try this patch as an experiment?
> 
> OK with your patch applied it does not segfault.
> 

Vince, Great! Can you remove Peter's patch, and try this one. It
removes the crud to save the cr2 from entry_64.S and makes both i386
and x86_64 do the same thing in regards to cr2 handling.

-- Steve

diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index 1e96c36..937cb8d 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -1854,29 +1854,11 @@ end_repeat_nmi:
call save_paranoid
DEFAULT_FRAME 0
 
-   /*
-* Save off the CR2 register. If we take a page fault in the NMI then
-* it could corrupt the CR2 value. If the NMI preempts a page fault
-* handler before it was able to read the CR2 register, and then the
-* NMI itself takes a page fault, the page fault that was preempted
-* will read the information from the NMI page fault and not the
-* origin fault. Save it off and restore it if it changes.
-* Use the r12 callee-saved register.
-*/
-   movq %cr2, %r12
-
/* paranoidentry do_nmi, 0; without TRACE_IRQS_OFF */
movq %rsp,%rdi
movq $-1,%rsi
call do_nmi
 
-   /* Did the NMI take a page fault? Restore cr2 if it did */
-   movq %cr2, %rcx
-   cmpq %rcx, %r12
-   je 1f
-   movq %r12, %cr2
-1:
-   
testl %ebx,%ebx /* swapgs needed? */
jnz nmi_restore
 nmi_swapgs:
diff --git a/arch/x86/kernel/nmi.c b/arch/x86/kernel/nmi.c
index 6fcb49c..f1a6294 100644
--- a/arch/x86/kernel/nmi.c
+++ b/arch/x86/kernel/nmi.c
@@ -443,7 +443,6 @@ enum nmi_states {
NMI_LATCHED,
 };
 static DEFINE_PER_CPU(enum nmi_states, nmi_state);
-static DEFINE_PER_CPU(unsigned long, nmi_cr2);
 
 #define nmi_nesting_preprocess(regs)   \
do {\
@@ -452,14 +451,11 @@ static DEFINE_PER_CPU(unsigned long, nmi_cr2);
return; \
}   \
this_cpu_write(nmi_state, NMI_EXECUTING);   \
-   this_cpu_write(nmi_cr2, read_cr2());\
} while (0);\
nmi_restart:
 
 #define nmi_nesting_postprocess()  \
do {\
-   if (unlikely(this_cpu_read(nmi_cr2) != read_cr2())) \
-   write_cr2(this_cpu_read(nmi_cr2));  \
if (this_cpu_dec_return(nmi_state)) \
goto nmi_restart;   \
} while (0)
@@ -512,8 +508,21 @@ static inline void nmi_nesting_postprocess(void)
 dotraplinkage notrace __kprobes void
 do_nmi(struct pt_regs *regs, long error_code)
 {
+   unsigned long cr2;
+   
nmi_nesting_preprocess(regs);
 
+   /*
+* Save off the CR2 register. If we take a page fault in the NMI then
+* it could corrupt the CR2 value. If the NMI preempts a page fault
+* handler before it was able to read the CR2 register, and then the
+* NMI itself takes a page fault, the page fault that was preempted
+* will read the information from the NMI page fault and not the
+* origin fault. Save it off and restore it if it changes.
+* Use the r12 callee-saved register.
+*/
+   cr2 = read_cr2();
+
nmi_enter();
 
inc_irq_stat(__nmi_count);
@@ -523,6 +532,10 @@ do_nmi(struct pt_regs *regs, long error_code)
 
nmi_exit();
 
+   /* Reads are cheaper than writes */
+   if (unlikely(cr2 != read_cr2()))
+   write_cr2(cr2);
+
/* On i386, may loop back to preprocess */
nmi_nesting_postprocess();
 }
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: perf_fuzzer compiled for x32 causes reboot

2014-02-27 Thread H. Peter Anvin
Ok... I think we're definitely talking about a cr2 leak.  The reboot might be a 
race condition in the NMI nesting handling maybe?

On February 27, 2014 5:34:34 PM PST, Vince Weaver  
wrote:
>On Thu, 27 Feb 2014, H. Peter Anvin wrote:
>
>> On 02/27/2014 02:31 PM, Steven Rostedt wrote:
>> > 
>> > Yeah, something is getting mesed up.
>> > 
>> 
>> What it *looks* like to me is that we try to nest the cr2
>save/restore,
>> which doesn't nest because it is a percpu variable.
>> 
>> ... except in the x86-64 case, we *ALSO* save/restore cr2 inside
>> entry_64.S, which makes the stuff in do_nmi completely redundant and
>> there for no good reason.
>> 
>> I would actually suggest we do the equivalent on i386 as well.
>> 
>> Vince, could you try this patch as an experiment?
>
>OK with your patch applied it does not segfault.
>
>The trace looks like you'd expect too:
>
>perf_fuzzer-2910  [000]   255.355331: page_fault_kernel:   
>address=irq_stack_union ip=copy_user_handle_tail error_code=0x0
>perf_fuzzer-2910  [000]   255.355331: function:   
>__do_page_fault
>perf_fuzzer-2910  [000]   255.355331: function:  
>bad_area_nosemaphore
>perf_fuzzer-2910  [000]   255.355331: function: 
>__bad_area_nosemaphore
>perf_fuzzer-2910  [000]   255.355332: function:
>no_context
>perf_fuzzer-2910  [000]   255.355332: function:
>   fixup_exception
>perf_fuzzer-2910  [000]   255.355332: function:
>  search_exception_tables
>perf_fuzzer-2910  [000]   255.355332: function:
> search_extable
>perf_fuzzer-2910  [000]   255.355333: function:
>perf_output_begin
>perf_fuzzer-2910  [000]   255.355333: bprint:  
>perf_output_begin: VMW: event type 2 config 2a st: 2c3e7
>perf_fuzzer-2910  [000]   255.355333: bprint:  
>perf_output_begin: VMW: before event->parent (nil)
>perf_fuzzer-2910  [000]   255.355334: bprint:  
>perf_output_begin: VMW: before rcu_dereference (nil)
>perf_fuzzer-2910  [000]   255.355334: function:
>__do_page_fault
>perf_fuzzer-2910  [000]   255.355334: function:   
>down_read_trylock
>perf_fuzzer-2910  [000]   255.355334: function:   
>_cond_resched
>perf_fuzzer-2910  [000]   255.355335: function:find_vma
>perf_fuzzer-2910  [000]   255.355335: function:   
>handle_mm_fault
>perf_fuzzer-2910  [000]   255.355335: function:  
>__do_fault
>perf_fuzzer-2910  [000]   255.355335: bputs:   
>perf_mmap_fault: VMW: perf_mmap_fault
>perf_fuzzer-2910  [000]   255.355336: bprint:  
>perf_mmap_fault: VMW: perf_mmap_fault 0x8800cba6a980
>perf_fuzzer-2910  [000]   255.355336: function:
>perf_mmap_to_page
>perf_fuzzer-2910  [000]   255.355336: function:
>_cond_resched
>perf_fuzzer-2910  [000]   255.355336: function: 
>unlock_page
>perf_fuzzer-2910  [000]   255.355336: function:
>page_waitqueue
>perf_fuzzer-2910  [000]   255.355337: function:
>__wake_up_bit
>perf_fuzzer-2910  [000]   255.355337: bputs:   
>perf_mmap_fault: VMW: perf_mmap_fault
>perf_fuzzer-2910  [000]   255.355337: function:
>_cond_resched
>perf_fuzzer-2910  [000]   255.355337: function: 
>_raw_spin_lock
>perf_fuzzer-2910  [000]   255.355337: function: 
>page_add_file_rmap
>perf_fuzzer-2910  [000]   255.355338: function:
>__inc_zone_page_state
>perf_fuzzer-2910  [000]   255.355338: function:
>   __inc_zone_state
>perf_fuzzer-2910  [000]   255.355338: function: 
>set_page_dirty
>perf_fuzzer-2910  [000]   255.355338: function:
>page_mapping
>perf_fuzzer-2910  [000]   255.355338: function:
>anon_set_page_dirty
>perf_fuzzer-2910  [000]   255.355339: function: 
>unlock_page
>
>
>Vince

-- 
Sent from my mobile phone.  Please pardon brevity and lack of formatting.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: perf_fuzzer compiled for x32 causes reboot

2014-02-27 Thread Vince Weaver
On Thu, 27 Feb 2014, H. Peter Anvin wrote:

> On 02/27/2014 02:31 PM, Steven Rostedt wrote:
> > 
> > Yeah, something is getting mesed up.
> > 
> 
> What it *looks* like to me is that we try to nest the cr2 save/restore,
> which doesn't nest because it is a percpu variable.
> 
> ... except in the x86-64 case, we *ALSO* save/restore cr2 inside
> entry_64.S, which makes the stuff in do_nmi completely redundant and
> there for no good reason.
> 
> I would actually suggest we do the equivalent on i386 as well.
> 
> Vince, could you try this patch as an experiment?

OK with your patch applied it does not segfault.

The trace looks like you'd expect too:

 perf_fuzzer-2910  [000]   255.355331: page_fault_kernel:
address=irq_stack_union ip=copy_user_handle_tail error_code=0x0
 perf_fuzzer-2910  [000]   255.355331: function:
__do_page_fault
 perf_fuzzer-2910  [000]   255.355331: function:   
bad_area_nosemaphore
 perf_fuzzer-2910  [000]   255.355331: function:  
__bad_area_nosemaphore
 perf_fuzzer-2910  [000]   255.355332: function: 
no_context
 perf_fuzzer-2910  [000]   255.355332: function:
fixup_exception
 perf_fuzzer-2910  [000]   255.355332: function:
   search_exception_tables
 perf_fuzzer-2910  [000]   255.355332: function:
  search_extable
 perf_fuzzer-2910  [000]   255.355333: function: 
perf_output_begin
 perf_fuzzer-2910  [000]   255.355333: bprint:   
perf_output_begin: VMW: event type 2 config 2a st: 2c3e7
 perf_fuzzer-2910  [000]   255.355333: bprint:   
perf_output_begin: VMW: before event->parent (nil)
 perf_fuzzer-2910  [000]   255.355334: bprint:   
perf_output_begin: VMW: before rcu_dereference (nil)
 perf_fuzzer-2910  [000]   255.355334: function: __do_page_fault
 perf_fuzzer-2910  [000]   255.355334: function:
down_read_trylock
 perf_fuzzer-2910  [000]   255.355334: function:
_cond_resched
 perf_fuzzer-2910  [000]   255.355335: function:find_vma
 perf_fuzzer-2910  [000]   255.355335: function:
handle_mm_fault
 perf_fuzzer-2910  [000]   255.355335: function:   
__do_fault
 perf_fuzzer-2910  [000]   255.355335: bputs:
perf_mmap_fault: VMW: perf_mmap_fault
 perf_fuzzer-2910  [000]   255.355336: bprint:   
perf_mmap_fault: VMW: perf_mmap_fault 0x8800cba6a980
 perf_fuzzer-2910  [000]   255.355336: function: 
perf_mmap_to_page
 perf_fuzzer-2910  [000]   255.355336: function: _cond_resched
 perf_fuzzer-2910  [000]   255.355336: function:  
unlock_page
 perf_fuzzer-2910  [000]   255.355336: function: 
page_waitqueue
 perf_fuzzer-2910  [000]   255.355337: function: 
__wake_up_bit
 perf_fuzzer-2910  [000]   255.355337: bputs:
perf_mmap_fault: VMW: perf_mmap_fault
 perf_fuzzer-2910  [000]   255.355337: function: _cond_resched
 perf_fuzzer-2910  [000]   255.355337: function:  
_raw_spin_lock
 perf_fuzzer-2910  [000]   255.355337: function:  
page_add_file_rmap
 perf_fuzzer-2910  [000]   255.355338: function: 
__inc_zone_page_state
 perf_fuzzer-2910  [000]   255.355338: function:
__inc_zone_state
 perf_fuzzer-2910  [000]   255.355338: function:  
set_page_dirty
 perf_fuzzer-2910  [000]   255.355338: function: 
page_mapping
 perf_fuzzer-2910  [000]   255.355338: function: 
anon_set_page_dirty
 perf_fuzzer-2910  [000]   255.355339: function:  
unlock_page


Vince

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: perf_fuzzer compiled for x32 causes reboot

2014-02-27 Thread Vince Weaver
On Thu, 27 Feb 2014, H. Peter Anvin wrote:

> On 02/27/2014 03:30 PM, Steven Rostedt wrote:
> > On Thu, 27 Feb 2014 14:52:54 -0800
> > "H. Peter Anvin"  wrote:
> > 
> >> On 02/27/2014 02:31 PM, Steven Rostedt wrote:
> >>>
> >>> Yeah, something is getting mesed up.
> >>>
> >>
> >> What it *looks* like to me is that we try to nest the cr2 save/restore,
> >> which doesn't nest because it is a percpu variable.
> >>
> >> ... except in the x86-64 case, we *ALSO* save/restore cr2 inside
> >> entry_64.S, which makes the stuff in do_nmi completely redundant and
> >> there for no good reason.
> > 
> > Peter, look at the code. That percpu cr2 is in a #ifdef CONFIG_X86_32
> > section. That is, it isn't even executed. That's i386 code. The only
> > place the cr2 is saved for x86_64 is in entry_64.S.
> > 
> 
> Right, egg on my face.  However, I still think it would make more sense
> for it to nest the way entry_64.S does if at all possible.
> 
> That makes this even more confusing, though.  I would still like to see
> what happens with the patch I sent Vince.

I'll try your patch momentarily, first I had some other changes I started 
running before I left work (for some reason it recompiled the whole 
kernel).

8: function: perf_output_begin
8: bprint:   perf_output_begin: VMW: event type 2 config 2a st: 2c3e
8: bputs:perf_output_begin: VMW: before rcu_dereference
9: function: __do_page_fault
9: function:down_read_trylock
9: function:_cond_resched
9: function:find_vma

so it looks like the fault happens 

rcu_read_lock();

116 /*
117  * For inherited events we send all the output towards the parent.
118  */
119 if (event->parent)
120 event = event->parent;
121 

somewhere between here

122 rb = rcu_dereference(event->rb);
123 if (unlikely(!rb))
124 goto out;

and here

125 
126 if (unlikely(!rb->nr_pages))
127 goto out;

although if rcu locks do anything to turn off tracing then this could be 
suspect.

Vince
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: perf_fuzzer compiled for x32 causes reboot

2014-02-27 Thread H. Peter Anvin
On 02/27/2014 03:30 PM, Steven Rostedt wrote:
> On Thu, 27 Feb 2014 14:52:54 -0800
> "H. Peter Anvin"  wrote:
> 
>> On 02/27/2014 02:31 PM, Steven Rostedt wrote:
>>>
>>> Yeah, something is getting mesed up.
>>>
>>
>> What it *looks* like to me is that we try to nest the cr2 save/restore,
>> which doesn't nest because it is a percpu variable.
>>
>> ... except in the x86-64 case, we *ALSO* save/restore cr2 inside
>> entry_64.S, which makes the stuff in do_nmi completely redundant and
>> there for no good reason.
> 
> Peter, look at the code. That percpu cr2 is in a #ifdef CONFIG_X86_32
> section. That is, it isn't even executed. That's i386 code. The only
> place the cr2 is saved for x86_64 is in entry_64.S.
> 

Right, egg on my face.  However, I still think it would make more sense
for it to nest the way entry_64.S does if at all possible.

That makes this even more confusing, though.  I would still like to see
what happens with the patch I sent Vince.

-hpa


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: perf_fuzzer compiled for x32 causes reboot

2014-02-27 Thread Steven Rostedt
On Thu, 27 Feb 2014 14:52:54 -0800
"H. Peter Anvin"  wrote:

> On 02/27/2014 02:31 PM, Steven Rostedt wrote:
> > 
> > Yeah, something is getting mesed up.
> > 
> 
> What it *looks* like to me is that we try to nest the cr2 save/restore,
> which doesn't nest because it is a percpu variable.
> 
> ... except in the x86-64 case, we *ALSO* save/restore cr2 inside
> entry_64.S, which makes the stuff in do_nmi completely redundant and
> there for no good reason.

Peter, look at the code. That percpu cr2 is in a #ifdef CONFIG_X86_32
section. That is, it isn't even executed. That's i386 code. The only
place the cr2 is saved for x86_64 is in entry_64.S.

-- Steve

> 
> I would actually suggest we do the equivalent on i386 as well.
> 
> Vince, could you try this patch as an experiment?
> 
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: perf_fuzzer compiled for x32 causes reboot

2014-02-27 Thread H. Peter Anvin
On 02/27/2014 02:31 PM, Steven Rostedt wrote:
> 
> Yeah, something is getting mesed up.
> 

What it *looks* like to me is that we try to nest the cr2 save/restore,
which doesn't nest because it is a percpu variable.

... except in the x86-64 case, we *ALSO* save/restore cr2 inside
entry_64.S, which makes the stuff in do_nmi completely redundant and
there for no good reason.

I would actually suggest we do the equivalent on i386 as well.

Vince, could you try this patch as an experiment?


diff --git a/arch/x86/lib/usercopy.c b/arch/x86/lib/usercopy.c
index ddf9ecb..938e45c 100644
--- a/arch/x86/lib/usercopy.c
+++ b/arch/x86/lib/usercopy.c
@@ -10,6 +10,8 @@
 #include 
 #include 
 
+#include 
+
 /*
  * We rely on the nested NMI work to allow atomic faults from the NMI path; the
  * nested NMI paths are careful to preserve CR2.
@@ -18,6 +20,7 @@ unsigned long
 copy_from_user_nmi(void *to, const void __user *from, unsigned long n)
 {
unsigned long ret;
+   unsigned long cr2;
 
if (__range_not_ok(from, n, TASK_SIZE))
return 0;
@@ -27,9 +30,11 @@ copy_from_user_nmi(void *to, const void __user *from, 
unsigned long n)
 * disable pagefaults so that its behaviour is consistent even when
 * called form other contexts.
 */
+   cr2 = read_cr2();
pagefault_disable();
ret = __copy_from_user_inatomic(to, from, n);
pagefault_enable();
+   write_cr2(cr2);
 
return ret;
 }


Re: perf_fuzzer compiled for x32 causes reboot

2014-02-27 Thread Steven Rostedt
On Thu, 27 Feb 2014 17:06:36 -0500 (EST)
Vince Weaver  wrote:

> 
> I spent some more time on this.
> I managed to get a trace that exhibited the bug practically right 
> away, but still unable to generate a reproducible trace :(  
> 
> So instead I'm adding WARN's and trace_printks to see what I can find out.
> 
> Here's a summary of what I think is happneing.  Please let me know if I'm 
> wildly wrong in analyzing this.
> 
> 
> 
> Userspace accesses the perf 
> ring-buffer-user-page of an event for the first time (in this trace
> it's a SW_CPU_CLOCK event).
> 
> This triggers a page_fault to bring in the page.
> 
> This triggers a TRACEPOINT event (task/task_newtask) which
> has 
> PERF_SAMPLE_IP|PERF_SAMPLE_TID|PERF_SAMPLE_TIME|PERF_SAMPLE_CALLCHAIN|PERF_SAMPLE_ID|PERF_SAMPLE_CPU|PERF_SAMPLE_PERIOD|PERF_SAMPLE_STREAM_ID|PERF_SAMPLE_WEIGHT|PERF_SAMPLE_DATA_SRC|PERF_SAMPLE_TRANSACTION;
> and exclude_callchain_kernel=1, triggering a perf_callchain().
> 
> Here is a dump of the stack right after we enter perf_callchain()
> 
> vince@core2:~$ [  202.320444] [ cut here ]
> [  202.324001] WARNING: CPU: 1 PID: 2873 at kernel/events/callchain.c:168 
> perf_callchain+0x67/0x211()
> hermal_sys ehci_pci ehci_hcd sg sd_mod usbcore usb_common
> [  202.324001] CPU: 1 PID: 2873 Comm: perf_fuzzer Not tainted 3.14.0-rc3+ #24
> [  202.324001] Hardware name: AOpen   DE7000/nMCP7ALPx-DE R1.06 Oct.19.2012, 
> BIOS 080015  10/19/2012
> [  202.324001]  00a8 880119ec1b48 81531dc1 
> 00a8
> [  202.324001]   880119ec1b88 81040ce4 
> 880119ec1b88
> [  202.324001]  810d416f 880119ec1e38 8800cba67800 
> 
> [  202.324001] Call Trace:
> [  202.324001]  [] dump_stack+0x49/0x60
> [  202.324001]  [] warn_slowpath_common+0x81/0x9b
> [  202.324001]  [] ? perf_callchain+0x67/0x211
> [  202.324001]  [] warn_slowpath_null+0x1a/0x1c
> [  202.324001]  [] perf_callchain+0x67/0x211
> [  202.324001]  [] ? local_clock+0x1b/0x24
> [  202.324001]  [] perf_prepare_sample+0x7b/0x304
> [  202.324001]  [] __perf_event_overflow+0x156/0x1c1
> [  202.324001]  [] ? free_pgtables+0xa7/0xc9
> [  202.324001]  [] perf_swevent_overflow+0x41/0x5b
> [  202.324001]  [] perf_swevent_event+0x72/0x74
> [  202.324001]  [] perf_tp_event+0xea/0x1ef
> [  202.324001]  [] ? perf_trace_x86_exceptions+0x4c/0xba
> [  202.324001]  [] perf_trace_x86_exceptions+0xa8/0xba
> [  202.324001]  [] ? perf_trace_x86_exceptions+0x4c/0xba
> [  202.324001]  [] trace_do_page_fault+0x48/0x99
> [  202.324001]  [] trace_page_fault+0x22/0x30
> [  202.324001] ---[ end trace 31ccd31b4e82cb42 ]---
> 
> This triggers a kernel pagefault because the BP register is not valid and
> copy_from_user_nmi() tries to copy the user stack area from there.
> 
> This *should* be OK?  And it maybe looks like it works, but then
> 
>   copy_user_handle_tail()
> 
> causes another page fault again at the invaid BP register.
> 
> But then the code continues to
>   perf_output_begin()
> 
> 
> 237.265689: page_fault_kernel:address=0x17a0 ip=copy_user_handle_tail 
> error_code=0x0
> 237.265689: function:__do_page_fault
> 237.265689: function:   bad_area_nosemaphore
> 237.265690: function:  __bad_area_nosemaphore
> 237.265690: function: no_context
> 237.265690: function:fixup_exception
> 237.265690: function:   search_exception_tables
> 237.265690: function:  search_extable
> 237.265691: function: perf_output_begin

You can also allow perf to be traced. Remove the CFLAGS_REMOVE_core.o
line from kernel/events/Makefile.

> 237.265692: bprint:   perf_output_begin: VMW: event type 2 config 
> 2a st: 2c3e7
> 
> how are we back in __do_page_fault again here?

Well, the perf ring buffer is vmalloced, right? That can cause a page
fault too.

> 
> 237.265692: function: __do_page_fault

Try adding a trace_printk() to see all the page faults?

> 237.265692: function:down_read_trylock
> 237.265692: function:_cond_resched
> 237.265693: function:find_vma
> 237.265693: function:bad_area
> 237.265693: function:   up_read
> 237.265693: function:   __bad_area_nosemaphore
> 237.265694: function:  is_prefetch
> 237.265694: function: convert_ip_to_linear
> 237.265695: function:  unhandled_signal
> 237.265695: function:  __printk_ratelimit
> 237.265695: function: _raw_spin_trylock
> 237.265695: function: _raw_spin_unlock_irqrestore
> 237.265696: function:  printk
> 237.265696: function: vprintk_emit
> 
> [  202.877004] perf_fuzzer[2873]: segfault at 17a0 ip 

Re: perf_fuzzer compiled for x32 causes reboot

2014-02-27 Thread Vince Weaver

I spent some more time on this.
I managed to get a trace that exhibited the bug practically right 
away, but still unable to generate a reproducible trace :(  

So instead I'm adding WARN's and trace_printks to see what I can find out.

Here's a summary of what I think is happneing.  Please let me know if I'm 
wildly wrong in analyzing this.



Userspace accesses the perf 
ring-buffer-user-page of an event for the first time (in this trace
it's a SW_CPU_CLOCK event).

This triggers a page_fault to bring in the page.

This triggers a TRACEPOINT event (task/task_newtask) which
has 
PERF_SAMPLE_IP|PERF_SAMPLE_TID|PERF_SAMPLE_TIME|PERF_SAMPLE_CALLCHAIN|PERF_SAMPLE_ID|PERF_SAMPLE_CPU|PERF_SAMPLE_PERIOD|PERF_SAMPLE_STREAM_ID|PERF_SAMPLE_WEIGHT|PERF_SAMPLE_DATA_SRC|PERF_SAMPLE_TRANSACTION;
and exclude_callchain_kernel=1, triggering a perf_callchain().

Here is a dump of the stack right after we enter perf_callchain()

vince@core2:~$ [  202.320444] [ cut here ]
[  202.324001] WARNING: CPU: 1 PID: 2873 at kernel/events/callchain.c:168 
perf_callchain+0x67/0x211()
hermal_sys ehci_pci ehci_hcd sg sd_mod usbcore usb_common
[  202.324001] CPU: 1 PID: 2873 Comm: perf_fuzzer Not tainted 3.14.0-rc3+ #24
[  202.324001] Hardware name: AOpen   DE7000/nMCP7ALPx-DE R1.06 Oct.19.2012, 
BIOS 080015  10/19/2012
[  202.324001]  00a8 880119ec1b48 81531dc1 
00a8
[  202.324001]   880119ec1b88 81040ce4 
880119ec1b88
[  202.324001]  810d416f 880119ec1e38 8800cba67800 

[  202.324001] Call Trace:
[  202.324001]  [] dump_stack+0x49/0x60
[  202.324001]  [] warn_slowpath_common+0x81/0x9b
[  202.324001]  [] ? perf_callchain+0x67/0x211
[  202.324001]  [] warn_slowpath_null+0x1a/0x1c
[  202.324001]  [] perf_callchain+0x67/0x211
[  202.324001]  [] ? local_clock+0x1b/0x24
[  202.324001]  [] perf_prepare_sample+0x7b/0x304
[  202.324001]  [] __perf_event_overflow+0x156/0x1c1
[  202.324001]  [] ? free_pgtables+0xa7/0xc9
[  202.324001]  [] perf_swevent_overflow+0x41/0x5b
[  202.324001]  [] perf_swevent_event+0x72/0x74
[  202.324001]  [] perf_tp_event+0xea/0x1ef
[  202.324001]  [] ? perf_trace_x86_exceptions+0x4c/0xba
[  202.324001]  [] perf_trace_x86_exceptions+0xa8/0xba
[  202.324001]  [] ? perf_trace_x86_exceptions+0x4c/0xba
[  202.324001]  [] trace_do_page_fault+0x48/0x99
[  202.324001]  [] trace_page_fault+0x22/0x30
[  202.324001] ---[ end trace 31ccd31b4e82cb42 ]---

This triggers a kernel pagefault because the BP register is not valid and
copy_from_user_nmi() tries to copy the user stack area from there.

This *should* be OK?  And it maybe looks like it works, but then

copy_user_handle_tail()

causes another page fault again at the invaid BP register.

But then the code continues to
perf_output_begin()


237.265689: page_fault_kernel:address=0x17a0 ip=copy_user_handle_tail 
error_code=0x0
237.265689: function:__do_page_fault
237.265689: function:   bad_area_nosemaphore
237.265690: function:  __bad_area_nosemaphore
237.265690: function: no_context
237.265690: function:fixup_exception
237.265690: function:   search_exception_tables
237.265690: function:  search_extable
237.265691: function: perf_output_begin
237.265692: bprint:   perf_output_begin: VMW: event type 2 config 
2a st: 2c3e7

how are we back in __do_page_fault again here?

237.265692: function: __do_page_fault
237.265692: function:down_read_trylock
237.265692: function:_cond_resched
237.265693: function:find_vma
237.265693: function:bad_area
237.265693: function:   up_read
237.265693: function:   __bad_area_nosemaphore
237.265694: function:  is_prefetch
237.265694: function: convert_ip_to_linear
237.265695: function:  unhandled_signal
237.265695: function:  __printk_ratelimit
237.265695: function: _raw_spin_trylock
237.265695: function: _raw_spin_unlock_irqrestore
237.265696: function:  printk
237.265696: function: vprintk_emit

[  202.877004] perf_fuzzer[2873]: segfault at 17a0 ip 004017fd sp 
ffd19d10 error 6 in perf_fuzzer[40+d1000]


Vince
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: perf_fuzzer compiled for x32 causes reboot

2014-02-27 Thread Vince Weaver

I spent some more time on this.
I managed to get a trace that exhibited the bug practically right 
away, but still unable to generate a reproducible trace :(  

So instead I'm adding WARN's and trace_printks to see what I can find out.

Here's a summary of what I think is happneing.  Please let me know if I'm 
wildly wrong in analyzing this.



Userspace accesses the perf 
ring-buffer-user-page of an event for the first time (in this trace
it's a SW_CPU_CLOCK event).

This triggers a page_fault to bring in the page.

This triggers a TRACEPOINT event (task/task_newtask) which
has 
PERF_SAMPLE_IP|PERF_SAMPLE_TID|PERF_SAMPLE_TIME|PERF_SAMPLE_CALLCHAIN|PERF_SAMPLE_ID|PERF_SAMPLE_CPU|PERF_SAMPLE_PERIOD|PERF_SAMPLE_STREAM_ID|PERF_SAMPLE_WEIGHT|PERF_SAMPLE_DATA_SRC|PERF_SAMPLE_TRANSACTION;
and exclude_callchain_kernel=1, triggering a perf_callchain().

Here is a dump of the stack right after we enter perf_callchain()

vince@core2:~$ [  202.320444] [ cut here ]
[  202.324001] WARNING: CPU: 1 PID: 2873 at kernel/events/callchain.c:168 
perf_callchain+0x67/0x211()
hermal_sys ehci_pci ehci_hcd sg sd_mod usbcore usb_common
[  202.324001] CPU: 1 PID: 2873 Comm: perf_fuzzer Not tainted 3.14.0-rc3+ #24
[  202.324001] Hardware name: AOpen   DE7000/nMCP7ALPx-DE R1.06 Oct.19.2012, 
BIOS 080015  10/19/2012
[  202.324001]  00a8 880119ec1b48 81531dc1 
00a8
[  202.324001]   880119ec1b88 81040ce4 
880119ec1b88
[  202.324001]  810d416f 880119ec1e38 8800cba67800 

[  202.324001] Call Trace:
[  202.324001]  [81531dc1] dump_stack+0x49/0x60
[  202.324001]  [81040ce4] warn_slowpath_common+0x81/0x9b
[  202.324001]  [810d416f] ? perf_callchain+0x67/0x211
[  202.324001]  [81040d18] warn_slowpath_null+0x1a/0x1c
[  202.324001]  [810d416f] perf_callchain+0x67/0x211
[  202.324001]  [8106b0ab] ? local_clock+0x1b/0x24
[  202.324001]  [810d0872] perf_prepare_sample+0x7b/0x304
[  202.324001]  [810d1079] __perf_event_overflow+0x156/0x1c1
[  202.324001]  [810f4613] ? free_pgtables+0xa7/0xc9
[  202.324001]  [810d125e] perf_swevent_overflow+0x41/0x5b
[  202.324001]  [810d12ea] perf_swevent_event+0x72/0x74
[  202.324001]  [810d1478] perf_tp_event+0xea/0x1ef
[  202.324001]  [8103515a] ? perf_trace_x86_exceptions+0x4c/0xba
[  202.324001]  [810351b6] perf_trace_x86_exceptions+0xa8/0xba
[  202.324001]  [8103515a] ? perf_trace_x86_exceptions+0x4c/0xba
[  202.324001]  [81538439] trace_do_page_fault+0x48/0x99
[  202.324001]  [815354b2] trace_page_fault+0x22/0x30
[  202.324001] ---[ end trace 31ccd31b4e82cb42 ]---

This triggers a kernel pagefault because the BP register is not valid and
copy_from_user_nmi() tries to copy the user stack area from there.

This *should* be OK?  And it maybe looks like it works, but then

copy_user_handle_tail()

causes another page fault again at the invaid BP register.

But then the code continues to
perf_output_begin()


237.265689: page_fault_kernel:address=0x17a0 ip=copy_user_handle_tail 
error_code=0x0
237.265689: function:__do_page_fault
237.265689: function:   bad_area_nosemaphore
237.265690: function:  __bad_area_nosemaphore
237.265690: function: no_context
237.265690: function:fixup_exception
237.265690: function:   search_exception_tables
237.265690: function:  search_extable
237.265691: function: perf_output_begin
237.265692: bprint:   perf_output_begin: VMW: event type 2 config 
2a st: 2c3e7

how are we back in __do_page_fault again here?

237.265692: function: __do_page_fault
237.265692: function:down_read_trylock
237.265692: function:_cond_resched
237.265693: function:find_vma
237.265693: function:bad_area
237.265693: function:   up_read
237.265693: function:   __bad_area_nosemaphore
237.265694: function:  is_prefetch
237.265694: function: convert_ip_to_linear
237.265695: function:  unhandled_signal
237.265695: function:  __printk_ratelimit
237.265695: function: _raw_spin_trylock
237.265695: function: _raw_spin_unlock_irqrestore
237.265696: function:  printk
237.265696: function: vprintk_emit

[  202.877004] perf_fuzzer[2873]: segfault at 17a0 ip 004017fd sp 
ffd19d10 error 6 in perf_fuzzer[40+d1000]


Vince
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  

Re: perf_fuzzer compiled for x32 causes reboot

2014-02-27 Thread Steven Rostedt
On Thu, 27 Feb 2014 17:06:36 -0500 (EST)
Vince Weaver vincent.wea...@maine.edu wrote:

 
 I spent some more time on this.
 I managed to get a trace that exhibited the bug practically right 
 away, but still unable to generate a reproducible trace :(  
 
 So instead I'm adding WARN's and trace_printks to see what I can find out.
 
 Here's a summary of what I think is happneing.  Please let me know if I'm 
 wildly wrong in analyzing this.
 
 
 
 Userspace accesses the perf 
 ring-buffer-user-page of an event for the first time (in this trace
 it's a SW_CPU_CLOCK event).
 
 This triggers a page_fault to bring in the page.
 
 This triggers a TRACEPOINT event (task/task_newtask) which
 has 
 PERF_SAMPLE_IP|PERF_SAMPLE_TID|PERF_SAMPLE_TIME|PERF_SAMPLE_CALLCHAIN|PERF_SAMPLE_ID|PERF_SAMPLE_CPU|PERF_SAMPLE_PERIOD|PERF_SAMPLE_STREAM_ID|PERF_SAMPLE_WEIGHT|PERF_SAMPLE_DATA_SRC|PERF_SAMPLE_TRANSACTION;
 and exclude_callchain_kernel=1, triggering a perf_callchain().
 
 Here is a dump of the stack right after we enter perf_callchain()
 
 vince@core2:~$ [  202.320444] [ cut here ]
 [  202.324001] WARNING: CPU: 1 PID: 2873 at kernel/events/callchain.c:168 
 perf_callchain+0x67/0x211()
 hermal_sys ehci_pci ehci_hcd sg sd_mod usbcore usb_common
 [  202.324001] CPU: 1 PID: 2873 Comm: perf_fuzzer Not tainted 3.14.0-rc3+ #24
 [  202.324001] Hardware name: AOpen   DE7000/nMCP7ALPx-DE R1.06 Oct.19.2012, 
 BIOS 080015  10/19/2012
 [  202.324001]  00a8 880119ec1b48 81531dc1 
 00a8
 [  202.324001]   880119ec1b88 81040ce4 
 880119ec1b88
 [  202.324001]  810d416f 880119ec1e38 8800cba67800 
 
 [  202.324001] Call Trace:
 [  202.324001]  [81531dc1] dump_stack+0x49/0x60
 [  202.324001]  [81040ce4] warn_slowpath_common+0x81/0x9b
 [  202.324001]  [810d416f] ? perf_callchain+0x67/0x211
 [  202.324001]  [81040d18] warn_slowpath_null+0x1a/0x1c
 [  202.324001]  [810d416f] perf_callchain+0x67/0x211
 [  202.324001]  [8106b0ab] ? local_clock+0x1b/0x24
 [  202.324001]  [810d0872] perf_prepare_sample+0x7b/0x304
 [  202.324001]  [810d1079] __perf_event_overflow+0x156/0x1c1
 [  202.324001]  [810f4613] ? free_pgtables+0xa7/0xc9
 [  202.324001]  [810d125e] perf_swevent_overflow+0x41/0x5b
 [  202.324001]  [810d12ea] perf_swevent_event+0x72/0x74
 [  202.324001]  [810d1478] perf_tp_event+0xea/0x1ef
 [  202.324001]  [8103515a] ? perf_trace_x86_exceptions+0x4c/0xba
 [  202.324001]  [810351b6] perf_trace_x86_exceptions+0xa8/0xba
 [  202.324001]  [8103515a] ? perf_trace_x86_exceptions+0x4c/0xba
 [  202.324001]  [81538439] trace_do_page_fault+0x48/0x99
 [  202.324001]  [815354b2] trace_page_fault+0x22/0x30
 [  202.324001] ---[ end trace 31ccd31b4e82cb42 ]---
 
 This triggers a kernel pagefault because the BP register is not valid and
 copy_from_user_nmi() tries to copy the user stack area from there.
 
 This *should* be OK?  And it maybe looks like it works, but then
 
   copy_user_handle_tail()
 
 causes another page fault again at the invaid BP register.
 
 But then the code continues to
   perf_output_begin()
 
 
 237.265689: page_fault_kernel:address=0x17a0 ip=copy_user_handle_tail 
 error_code=0x0
 237.265689: function:__do_page_fault
 237.265689: function:   bad_area_nosemaphore
 237.265690: function:  __bad_area_nosemaphore
 237.265690: function: no_context
 237.265690: function:fixup_exception
 237.265690: function:   search_exception_tables
 237.265690: function:  search_extable
 237.265691: function: perf_output_begin

You can also allow perf to be traced. Remove the CFLAGS_REMOVE_core.o
line from kernel/events/Makefile.

 237.265692: bprint:   perf_output_begin: VMW: event type 2 config 
 2a st: 2c3e7
 
 how are we back in __do_page_fault again here?

Well, the perf ring buffer is vmalloced, right? That can cause a page
fault too.

 
 237.265692: function: __do_page_fault

Try adding a trace_printk() to see all the page faults?

 237.265692: function:down_read_trylock
 237.265692: function:_cond_resched
 237.265693: function:find_vma
 237.265693: function:bad_area
 237.265693: function:   up_read
 237.265693: function:   __bad_area_nosemaphore
 237.265694: function:  is_prefetch
 237.265694: function: convert_ip_to_linear
 237.265695: function:  unhandled_signal
 237.265695: function:  __printk_ratelimit
 237.265695: function: _raw_spin_trylock
 237.265695: function: 

Re: perf_fuzzer compiled for x32 causes reboot

2014-02-27 Thread H. Peter Anvin
On 02/27/2014 02:31 PM, Steven Rostedt wrote:
 
 Yeah, something is getting mesed up.
 

What it *looks* like to me is that we try to nest the cr2 save/restore,
which doesn't nest because it is a percpu variable.

... except in the x86-64 case, we *ALSO* save/restore cr2 inside
entry_64.S, which makes the stuff in do_nmi completely redundant and
there for no good reason.

I would actually suggest we do the equivalent on i386 as well.

Vince, could you try this patch as an experiment?


diff --git a/arch/x86/lib/usercopy.c b/arch/x86/lib/usercopy.c
index ddf9ecb..938e45c 100644
--- a/arch/x86/lib/usercopy.c
+++ b/arch/x86/lib/usercopy.c
@@ -10,6 +10,8 @@
 #include asm/word-at-a-time.h
 #include linux/sched.h
 
+#include asm/processor.h
+
 /*
  * We rely on the nested NMI work to allow atomic faults from the NMI path; the
  * nested NMI paths are careful to preserve CR2.
@@ -18,6 +20,7 @@ unsigned long
 copy_from_user_nmi(void *to, const void __user *from, unsigned long n)
 {
unsigned long ret;
+   unsigned long cr2;
 
if (__range_not_ok(from, n, TASK_SIZE))
return 0;
@@ -27,9 +30,11 @@ copy_from_user_nmi(void *to, const void __user *from, 
unsigned long n)
 * disable pagefaults so that its behaviour is consistent even when
 * called form other contexts.
 */
+   cr2 = read_cr2();
pagefault_disable();
ret = __copy_from_user_inatomic(to, from, n);
pagefault_enable();
+   write_cr2(cr2);
 
return ret;
 }


Re: perf_fuzzer compiled for x32 causes reboot

2014-02-27 Thread Steven Rostedt
On Thu, 27 Feb 2014 14:52:54 -0800
H. Peter Anvin h...@zytor.com wrote:

 On 02/27/2014 02:31 PM, Steven Rostedt wrote:
  
  Yeah, something is getting mesed up.
  
 
 What it *looks* like to me is that we try to nest the cr2 save/restore,
 which doesn't nest because it is a percpu variable.
 
 ... except in the x86-64 case, we *ALSO* save/restore cr2 inside
 entry_64.S, which makes the stuff in do_nmi completely redundant and
 there for no good reason.

Peter, look at the code. That percpu cr2 is in a #ifdef CONFIG_X86_32
section. That is, it isn't even executed. That's i386 code. The only
place the cr2 is saved for x86_64 is in entry_64.S.

-- Steve

 
 I would actually suggest we do the equivalent on i386 as well.
 
 Vince, could you try this patch as an experiment?
 
 

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: perf_fuzzer compiled for x32 causes reboot

2014-02-27 Thread H. Peter Anvin
On 02/27/2014 03:30 PM, Steven Rostedt wrote:
 On Thu, 27 Feb 2014 14:52:54 -0800
 H. Peter Anvin h...@zytor.com wrote:
 
 On 02/27/2014 02:31 PM, Steven Rostedt wrote:

 Yeah, something is getting mesed up.


 What it *looks* like to me is that we try to nest the cr2 save/restore,
 which doesn't nest because it is a percpu variable.

 ... except in the x86-64 case, we *ALSO* save/restore cr2 inside
 entry_64.S, which makes the stuff in do_nmi completely redundant and
 there for no good reason.
 
 Peter, look at the code. That percpu cr2 is in a #ifdef CONFIG_X86_32
 section. That is, it isn't even executed. That's i386 code. The only
 place the cr2 is saved for x86_64 is in entry_64.S.
 

Right, egg on my face.  However, I still think it would make more sense
for it to nest the way entry_64.S does if at all possible.

That makes this even more confusing, though.  I would still like to see
what happens with the patch I sent Vince.

-hpa


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: perf_fuzzer compiled for x32 causes reboot

2014-02-27 Thread Vince Weaver
On Thu, 27 Feb 2014, H. Peter Anvin wrote:

 On 02/27/2014 03:30 PM, Steven Rostedt wrote:
  On Thu, 27 Feb 2014 14:52:54 -0800
  H. Peter Anvin h...@zytor.com wrote:
  
  On 02/27/2014 02:31 PM, Steven Rostedt wrote:
 
  Yeah, something is getting mesed up.
 
 
  What it *looks* like to me is that we try to nest the cr2 save/restore,
  which doesn't nest because it is a percpu variable.
 
  ... except in the x86-64 case, we *ALSO* save/restore cr2 inside
  entry_64.S, which makes the stuff in do_nmi completely redundant and
  there for no good reason.
  
  Peter, look at the code. That percpu cr2 is in a #ifdef CONFIG_X86_32
  section. That is, it isn't even executed. That's i386 code. The only
  place the cr2 is saved for x86_64 is in entry_64.S.
  
 
 Right, egg on my face.  However, I still think it would make more sense
 for it to nest the way entry_64.S does if at all possible.
 
 That makes this even more confusing, though.  I would still like to see
 what happens with the patch I sent Vince.

I'll try your patch momentarily, first I had some other changes I started 
running before I left work (for some reason it recompiled the whole 
kernel).

8: function: perf_output_begin
8: bprint:   perf_output_begin: VMW: event type 2 config 2a st: 2c3e
8: bputs:perf_output_begin: VMW: before rcu_dereference
9: function: __do_page_fault
9: function:down_read_trylock
9: function:_cond_resched
9: function:find_vma

so it looks like the fault happens 

rcu_read_lock();

116 /*
117  * For inherited events we send all the output towards the parent.
118  */
119 if (event-parent)
120 event = event-parent;
121 

somewhere between here

122 rb = rcu_dereference(event-rb);
123 if (unlikely(!rb))
124 goto out;

and here

125 
126 if (unlikely(!rb-nr_pages))
127 goto out;

although if rcu locks do anything to turn off tracing then this could be 
suspect.

Vince
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: perf_fuzzer compiled for x32 causes reboot

2014-02-27 Thread Vince Weaver
On Thu, 27 Feb 2014, H. Peter Anvin wrote:

 On 02/27/2014 02:31 PM, Steven Rostedt wrote:
  
  Yeah, something is getting mesed up.
  
 
 What it *looks* like to me is that we try to nest the cr2 save/restore,
 which doesn't nest because it is a percpu variable.
 
 ... except in the x86-64 case, we *ALSO* save/restore cr2 inside
 entry_64.S, which makes the stuff in do_nmi completely redundant and
 there for no good reason.
 
 I would actually suggest we do the equivalent on i386 as well.
 
 Vince, could you try this patch as an experiment?

OK with your patch applied it does not segfault.

The trace looks like you'd expect too:

 perf_fuzzer-2910  [000]   255.355331: page_fault_kernel:
address=irq_stack_union ip=copy_user_handle_tail error_code=0x0
 perf_fuzzer-2910  [000]   255.355331: function:
__do_page_fault
 perf_fuzzer-2910  [000]   255.355331: function:   
bad_area_nosemaphore
 perf_fuzzer-2910  [000]   255.355331: function:  
__bad_area_nosemaphore
 perf_fuzzer-2910  [000]   255.355332: function: 
no_context
 perf_fuzzer-2910  [000]   255.355332: function:
fixup_exception
 perf_fuzzer-2910  [000]   255.355332: function:
   search_exception_tables
 perf_fuzzer-2910  [000]   255.355332: function:
  search_extable
 perf_fuzzer-2910  [000]   255.355333: function: 
perf_output_begin
 perf_fuzzer-2910  [000]   255.355333: bprint:   
perf_output_begin: VMW: event type 2 config 2a st: 2c3e7
 perf_fuzzer-2910  [000]   255.355333: bprint:   
perf_output_begin: VMW: before event-parent (nil)
 perf_fuzzer-2910  [000]   255.355334: bprint:   
perf_output_begin: VMW: before rcu_dereference (nil)
 perf_fuzzer-2910  [000]   255.355334: function: __do_page_fault
 perf_fuzzer-2910  [000]   255.355334: function:
down_read_trylock
 perf_fuzzer-2910  [000]   255.355334: function:
_cond_resched
 perf_fuzzer-2910  [000]   255.355335: function:find_vma
 perf_fuzzer-2910  [000]   255.355335: function:
handle_mm_fault
 perf_fuzzer-2910  [000]   255.355335: function:   
__do_fault
 perf_fuzzer-2910  [000]   255.355335: bputs:
perf_mmap_fault: VMW: perf_mmap_fault
 perf_fuzzer-2910  [000]   255.355336: bprint:   
perf_mmap_fault: VMW: perf_mmap_fault 0x8800cba6a980
 perf_fuzzer-2910  [000]   255.355336: function: 
perf_mmap_to_page
 perf_fuzzer-2910  [000]   255.355336: function: _cond_resched
 perf_fuzzer-2910  [000]   255.355336: function:  
unlock_page
 perf_fuzzer-2910  [000]   255.355336: function: 
page_waitqueue
 perf_fuzzer-2910  [000]   255.355337: function: 
__wake_up_bit
 perf_fuzzer-2910  [000]   255.355337: bputs:
perf_mmap_fault: VMW: perf_mmap_fault
 perf_fuzzer-2910  [000]   255.355337: function: _cond_resched
 perf_fuzzer-2910  [000]   255.355337: function:  
_raw_spin_lock
 perf_fuzzer-2910  [000]   255.355337: function:  
page_add_file_rmap
 perf_fuzzer-2910  [000]   255.355338: function: 
__inc_zone_page_state
 perf_fuzzer-2910  [000]   255.355338: function:
__inc_zone_state
 perf_fuzzer-2910  [000]   255.355338: function:  
set_page_dirty
 perf_fuzzer-2910  [000]   255.355338: function: 
page_mapping
 perf_fuzzer-2910  [000]   255.355338: function: 
anon_set_page_dirty
 perf_fuzzer-2910  [000]   255.355339: function:  
unlock_page


Vince

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: perf_fuzzer compiled for x32 causes reboot

2014-02-27 Thread H. Peter Anvin
Ok... I think we're definitely talking about a cr2 leak.  The reboot might be a 
race condition in the NMI nesting handling maybe?

On February 27, 2014 5:34:34 PM PST, Vince Weaver vincent.wea...@maine.edu 
wrote:
On Thu, 27 Feb 2014, H. Peter Anvin wrote:

 On 02/27/2014 02:31 PM, Steven Rostedt wrote:
  
  Yeah, something is getting mesed up.
  
 
 What it *looks* like to me is that we try to nest the cr2
save/restore,
 which doesn't nest because it is a percpu variable.
 
 ... except in the x86-64 case, we *ALSO* save/restore cr2 inside
 entry_64.S, which makes the stuff in do_nmi completely redundant and
 there for no good reason.
 
 I would actually suggest we do the equivalent on i386 as well.
 
 Vince, could you try this patch as an experiment?

OK with your patch applied it does not segfault.

The trace looks like you'd expect too:

perf_fuzzer-2910  [000]   255.355331: page_fault_kernel:   
address=irq_stack_union ip=copy_user_handle_tail error_code=0x0
perf_fuzzer-2910  [000]   255.355331: function:   
__do_page_fault
perf_fuzzer-2910  [000]   255.355331: function:  
bad_area_nosemaphore
perf_fuzzer-2910  [000]   255.355331: function: 
__bad_area_nosemaphore
perf_fuzzer-2910  [000]   255.355332: function:
no_context
perf_fuzzer-2910  [000]   255.355332: function:
   fixup_exception
perf_fuzzer-2910  [000]   255.355332: function:
  search_exception_tables
perf_fuzzer-2910  [000]   255.355332: function:
 search_extable
perf_fuzzer-2910  [000]   255.355333: function:
perf_output_begin
perf_fuzzer-2910  [000]   255.355333: bprint:  
perf_output_begin: VMW: event type 2 config 2a st: 2c3e7
perf_fuzzer-2910  [000]   255.355333: bprint:  
perf_output_begin: VMW: before event-parent (nil)
perf_fuzzer-2910  [000]   255.355334: bprint:  
perf_output_begin: VMW: before rcu_dereference (nil)
perf_fuzzer-2910  [000]   255.355334: function:
__do_page_fault
perf_fuzzer-2910  [000]   255.355334: function:   
down_read_trylock
perf_fuzzer-2910  [000]   255.355334: function:   
_cond_resched
perf_fuzzer-2910  [000]   255.355335: function:find_vma
perf_fuzzer-2910  [000]   255.355335: function:   
handle_mm_fault
perf_fuzzer-2910  [000]   255.355335: function:  
__do_fault
perf_fuzzer-2910  [000]   255.355335: bputs:   
perf_mmap_fault: VMW: perf_mmap_fault
perf_fuzzer-2910  [000]   255.355336: bprint:  
perf_mmap_fault: VMW: perf_mmap_fault 0x8800cba6a980
perf_fuzzer-2910  [000]   255.355336: function:
perf_mmap_to_page
perf_fuzzer-2910  [000]   255.355336: function:
_cond_resched
perf_fuzzer-2910  [000]   255.355336: function: 
unlock_page
perf_fuzzer-2910  [000]   255.355336: function:
page_waitqueue
perf_fuzzer-2910  [000]   255.355337: function:
__wake_up_bit
perf_fuzzer-2910  [000]   255.355337: bputs:   
perf_mmap_fault: VMW: perf_mmap_fault
perf_fuzzer-2910  [000]   255.355337: function:
_cond_resched
perf_fuzzer-2910  [000]   255.355337: function: 
_raw_spin_lock
perf_fuzzer-2910  [000]   255.355337: function: 
page_add_file_rmap
perf_fuzzer-2910  [000]   255.355338: function:
__inc_zone_page_state
perf_fuzzer-2910  [000]   255.355338: function:
   __inc_zone_state
perf_fuzzer-2910  [000]   255.355338: function: 
set_page_dirty
perf_fuzzer-2910  [000]   255.355338: function:
page_mapping
perf_fuzzer-2910  [000]   255.355338: function:
anon_set_page_dirty
perf_fuzzer-2910  [000]   255.355339: function: 
unlock_page


Vince

-- 
Sent from my mobile phone.  Please pardon brevity and lack of formatting.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: perf_fuzzer compiled for x32 causes reboot

2014-02-27 Thread Steven Rostedt
On Thu, 27 Feb 2014 20:34:34 -0500 (EST)
Vince Weaver vincent.wea...@maine.edu wrote:


  I would actually suggest we do the equivalent on i386 as well.
  
  Vince, could you try this patch as an experiment?
 
 OK with your patch applied it does not segfault.
 

Vince, Great! Can you remove Peter's patch, and try this one. It
removes the crud to save the cr2 from entry_64.S and makes both i386
and x86_64 do the same thing in regards to cr2 handling.

-- Steve

diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index 1e96c36..937cb8d 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -1854,29 +1854,11 @@ end_repeat_nmi:
call save_paranoid
DEFAULT_FRAME 0
 
-   /*
-* Save off the CR2 register. If we take a page fault in the NMI then
-* it could corrupt the CR2 value. If the NMI preempts a page fault
-* handler before it was able to read the CR2 register, and then the
-* NMI itself takes a page fault, the page fault that was preempted
-* will read the information from the NMI page fault and not the
-* origin fault. Save it off and restore it if it changes.
-* Use the r12 callee-saved register.
-*/
-   movq %cr2, %r12
-
/* paranoidentry do_nmi, 0; without TRACE_IRQS_OFF */
movq %rsp,%rdi
movq $-1,%rsi
call do_nmi
 
-   /* Did the NMI take a page fault? Restore cr2 if it did */
-   movq %cr2, %rcx
-   cmpq %rcx, %r12
-   je 1f
-   movq %r12, %cr2
-1:
-   
testl %ebx,%ebx /* swapgs needed? */
jnz nmi_restore
 nmi_swapgs:
diff --git a/arch/x86/kernel/nmi.c b/arch/x86/kernel/nmi.c
index 6fcb49c..f1a6294 100644
--- a/arch/x86/kernel/nmi.c
+++ b/arch/x86/kernel/nmi.c
@@ -443,7 +443,6 @@ enum nmi_states {
NMI_LATCHED,
 };
 static DEFINE_PER_CPU(enum nmi_states, nmi_state);
-static DEFINE_PER_CPU(unsigned long, nmi_cr2);
 
 #define nmi_nesting_preprocess(regs)   \
do {\
@@ -452,14 +451,11 @@ static DEFINE_PER_CPU(unsigned long, nmi_cr2);
return; \
}   \
this_cpu_write(nmi_state, NMI_EXECUTING);   \
-   this_cpu_write(nmi_cr2, read_cr2());\
} while (0);\
nmi_restart:
 
 #define nmi_nesting_postprocess()  \
do {\
-   if (unlikely(this_cpu_read(nmi_cr2) != read_cr2())) \
-   write_cr2(this_cpu_read(nmi_cr2));  \
if (this_cpu_dec_return(nmi_state)) \
goto nmi_restart;   \
} while (0)
@@ -512,8 +508,21 @@ static inline void nmi_nesting_postprocess(void)
 dotraplinkage notrace __kprobes void
 do_nmi(struct pt_regs *regs, long error_code)
 {
+   unsigned long cr2;
+   
nmi_nesting_preprocess(regs);
 
+   /*
+* Save off the CR2 register. If we take a page fault in the NMI then
+* it could corrupt the CR2 value. If the NMI preempts a page fault
+* handler before it was able to read the CR2 register, and then the
+* NMI itself takes a page fault, the page fault that was preempted
+* will read the information from the NMI page fault and not the
+* origin fault. Save it off and restore it if it changes.
+* Use the r12 callee-saved register.
+*/
+   cr2 = read_cr2();
+
nmi_enter();
 
inc_irq_stat(__nmi_count);
@@ -523,6 +532,10 @@ do_nmi(struct pt_regs *regs, long error_code)
 
nmi_exit();
 
+   /* Reads are cheaper than writes */
+   if (unlikely(cr2 != read_cr2()))
+   write_cr2(cr2);
+
/* On i386, may loop back to preprocess */
nmi_nesting_postprocess();
 }
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: perf_fuzzer compiled for x32 causes reboot

2014-02-26 Thread Vince Weaver
On Tue, 25 Feb 2014, Vince Weaver wrote:

> On Tue, 25 Feb 2014, Steven Rostedt wrote:
> 
> > On Tue, 25 Feb 2014 06:34:55 -0800
> > "H. Peter Anvin"  wrote:
> > 
> > > #2 is what I really don't understand.
> > > 
> > > I worry something else is going on there
> > 
> > Yeah, me too.
> > 
> 
> OK, well I'll work on isolating that next, I was hoping the segfault issue 
> was related.  It's a lot harder getting traces out of a machine that 
> insta-reboots so fast that it can't even finish printing the oops message.

OK, dug up the random seeds that are causing the system to reboot, and it 
turns out that it's hard to reproduce the actual reboot, but in the cases 
where it doesn't reboot it segfaults in the same code path that was 
causing the other segfaults in this thread.  So I think it's all the same 
bug, although I can investigate further if needed.

Vince
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: perf_fuzzer compiled for x32 causes reboot

2014-02-26 Thread Vince Weaver
On Tue, 25 Feb 2014, Vince Weaver wrote:

 On Tue, 25 Feb 2014, Steven Rostedt wrote:
 
  On Tue, 25 Feb 2014 06:34:55 -0800
  H. Peter Anvin h...@zytor.com wrote:
  
   #2 is what I really don't understand.
   
   I worry something else is going on there
  
  Yeah, me too.
  
 
 OK, well I'll work on isolating that next, I was hoping the segfault issue 
 was related.  It's a lot harder getting traces out of a machine that 
 insta-reboots so fast that it can't even finish printing the oops message.

OK, dug up the random seeds that are causing the system to reboot, and it 
turns out that it's hard to reproduce the actual reboot, but in the cases 
where it doesn't reboot it segfaults in the same code path that was 
causing the other segfaults in this thread.  So I think it's all the same 
bug, although I can investigate further if needed.

Vince
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: perf_fuzzer compiled for x32 causes reboot

2014-02-25 Thread Vince Weaver
On Tue, 25 Feb 2014, Steven Rostedt wrote:

> On Tue, 25 Feb 2014 06:34:55 -0800
> "H. Peter Anvin"  wrote:
> 
> > #2 is what I really don't understand.
> > 
> > I worry something else is going on there
> 
> Yeah, me too.
> 

OK, well I'll work on isolating that next, I was hoping the segfault issue 
was related.  It's a lot harder getting traces out of a machine that 
insta-reboots so fast that it can't even finish printing the oops message.

I need my testing machine for some unrelated experiments so I might not 
get a chance to work on this further for a day or two.

Vince
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: perf_fuzzer compiled for x32 causes reboot

2014-02-25 Thread Steven Rostedt
On Tue, 25 Feb 2014 06:34:55 -0800
"H. Peter Anvin"  wrote:

> #2 is what I really don't understand.
> 
> I worry something else is going on there

Yeah, me too.

-- Steve

> 
> >
> >While the missing cr2 issue made debugging frustrating, I find the
> >other 
> >aspects of the bug more serious:
> >
> >  1.  Programs that are doing valid memory accesses can segfault
> >and worse
> >2.  This bug can cause an instant-reboot of the system (I assume
> >somehow
> >  with the right combination of memory accesses  it causes a 
> >  triple-fault?)
> >
> >#2 is why I spent all of this time tracking this down, I couldn't leave
> >a 
> >machine fuzzing overnight without the machine rebooting.
> >
> >Vince
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


  1   2   >