Re: perf_fuzzer compiled for x32 causes reboot
On Sat, Mar 01, 2014 at 08:50:17AM -0800, H. Peter Anvin wrote: > The bottom line is that if we want hard numbers we probably have to > measure. > > Hoisting the cr2 read is a no-brainer, might even help performance... Btw, I just got word that on AMD, a read from CR2 is 4 cycles on family 0x15 and 3 cycles on family 0x10. If we're in the same ballpark on Intel, I'd say we're fine with those numbers even in hot paths. The OoO is an additional bonus which should hide latency even more. -- Regards/Gruss, Boris. Sent from a fat crate under my desk. Formatting is fine. -- -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: perf_fuzzer compiled for x32 causes reboot
On Sat, Mar 01, 2014 at 08:50:17AM -0800, H. Peter Anvin wrote: The bottom line is that if we want hard numbers we probably have to measure. Hoisting the cr2 read is a no-brainer, might even help performance... Btw, I just got word that on AMD, a read from CR2 is 4 cycles on family 0x15 and 3 cycles on family 0x10. If we're in the same ballpark on Intel, I'd say we're fine with those numbers even in hot paths. The OoO is an additional bonus which should hide latency even more. -- Regards/Gruss, Boris. Sent from a fat crate under my desk. Formatting is fine. -- -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: perf_fuzzer compiled for x32 causes reboot
On Sat, Mar 01, 2014 at 10:16:50AM +0100, Ingo Molnar wrote: > We read CR2 in the page fault hot path, so it's on the top of CPU > architects' minds and it's reasonably optimized. A couple of cycles > IIRC, but would be nice to hear actual numbers. Well, going with what Linus found; it looks like they made page faults more expensive on Haswell as compared to previous generations. So its not _that_ high on their list :/ -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: perf_fuzzer compiled for x32 causes reboot
On Sat, Mar 01, 2014 at 10:16:50AM +0100, Ingo Molnar wrote: We read CR2 in the page fault hot path, so it's on the top of CPU architects' minds and it's reasonably optimized. A couple of cycles IIRC, but would be nice to hear actual numbers. Well, going with what Linus found; it looks like they made page faults more expensive on Haswell as compared to previous generations. So its not _that_ high on their list :/ -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: perf_fuzzer compiled for x32 causes reboot
On Fri, 28 Feb 2014, Steven Rostedt wrote: > On Fri, 28 Feb 2014 18:34:00 -0500 (EST) > Vince Weaver wrote: > > But perf_event bug finder is a much more prestigious title than > "college professor" ;-) yes, it's something to fall back on if/when I get denied tenure :) I do enjoy tracking down bugs like this, it's why I spend a lot of time on it when I should probably be doing other things. > BTW, is the perf_fuzzer code posted somewhere? It sounds like it can be > really useful for us to do our own testing too. yes, code is fully available though possibly not publicized well. I posted the full info later in the thread as I seem to be responding to e-mails in reverse order today for some reason. Vince -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: perf_fuzzer compiled for x32 causes reboot
On Sat, 1 Mar 2014, Andi Kleen wrote: > Steven Rostedt writes: > > > > BTW, is the perf_fuzzer code posted somewhere? It sounds like it can be > > really useful for us to do our own testing too. > > I believe it's part of trinity. > > http://codemonkey.org.uk/projects/trinity/ > > Perhaps it should have a "ftracer fuzzer" too? It's not part of trinity, it's a separate project that re-uses some of trinity's codebase. http://web.eece.maine.edu/~vweaver/projects/perf_events/fuzzer/ You can get the source as part of the perf_event_tests package https://github.com/deater/perf_event_tests go into the fuzzer directory, run make, then "./perf_fuzzer". You can also run the ./fast_repro.sh script instead, it's what I've been using to track this kernel issue, although it tripped over the issue a lot faster when compiled with -mx32 than it did when complied as native x86_64. I think the code as found in the git tree should be working, I made a lot of changes when tracking this problem and I need to make sure I only check in the proper ones and not let any weird debugging patches slip in. I'm aware of at least two other WARN messages and possibly one or two hard-lockup bugs in addition to the potential system reboot that can be triggered by the fuzzer, it just takes a while to isolate these things. Vince -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: perf_fuzzer compiled for x32 causes reboot
On Sat, 1 Mar 2014, Andi Kleen wrote: Steven Rostedt rost...@goodmis.org writes: BTW, is the perf_fuzzer code posted somewhere? It sounds like it can be really useful for us to do our own testing too. I believe it's part of trinity. http://codemonkey.org.uk/projects/trinity/ Perhaps it should have a ftracer fuzzer too? It's not part of trinity, it's a separate project that re-uses some of trinity's codebase. http://web.eece.maine.edu/~vweaver/projects/perf_events/fuzzer/ You can get the source as part of the perf_event_tests package https://github.com/deater/perf_event_tests go into the fuzzer directory, run make, then ./perf_fuzzer. You can also run the ./fast_repro.sh script instead, it's what I've been using to track this kernel issue, although it tripped over the issue a lot faster when compiled with -mx32 than it did when complied as native x86_64. I think the code as found in the git tree should be working, I made a lot of changes when tracking this problem and I need to make sure I only check in the proper ones and not let any weird debugging patches slip in. I'm aware of at least two other WARN messages and possibly one or two hard-lockup bugs in addition to the potential system reboot that can be triggered by the fuzzer, it just takes a while to isolate these things. Vince -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: perf_fuzzer compiled for x32 causes reboot
On Fri, 28 Feb 2014, Steven Rostedt wrote: On Fri, 28 Feb 2014 18:34:00 -0500 (EST) Vince Weaver vincent.wea...@maine.edu wrote: But perf_event bug finder is a much more prestigious title than college professor ;-) yes, it's something to fall back on if/when I get denied tenure :) I do enjoy tracking down bugs like this, it's why I spend a lot of time on it when I should probably be doing other things. BTW, is the perf_fuzzer code posted somewhere? It sounds like it can be really useful for us to do our own testing too. yes, code is fully available though possibly not publicized well. I posted the full info later in the thread as I seem to be responding to e-mails in reverse order today for some reason. Vince -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: perf_fuzzer compiled for x32 causes reboot
The bottom line is that if we want hard numbers we probably have to measure. Hoisting the cr2 read is a no-brainer, might even help performance... On March 1, 2014 1:50:42 AM PST, Borislav Petkov wrote: >On Sat, Mar 01, 2014 at 10:16:50AM +0100, Ingo Molnar wrote: >> >> * Steven Rostedt wrote: >> >> > > Also, this function is called a _LOT_ under certain workloads, I >> > > don't know how cheap a CR2 read is, but it had better be really >> > > cheap. >> > >> > That's a HPA question. >> >> We read CR2 in the page fault hot path, so it's on the top of CPU >> architects' minds and it's reasonably optimized. A couple of cycles >> IIRC, but would be nice to hear actual numbers. > >Yeah, we were discussing this last night on IRC. > >And hpa actually meant that the optimization potential was there but no >one did do it, except maybe Transmeta. :-) > >So the expensive thing is writing to CR2 because it is a serializing >instruction. In fact, all writes to control registers except CR8 are >serializing. > >The reading from CR2 should be cheaper but not as cheap as a normal >MOV %reg %reg is. On AMD, MOV %reg, %cr2 is done with microcode so >definitely at least a couple of cycles and I'd guess it is not a >trivial >MOV on Intel too. > >Maybe a way to hide this cost is the OoO, as hpa suggested, depending >on >how much parallelism that particular code region can offer (serializing >instructions close by). -- Sent from my mobile phone. Please pardon brevity and lack of formatting. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: perf_fuzzer compiled for x32 causes reboot
Steven Rostedt writes: > > BTW, is the perf_fuzzer code posted somewhere? It sounds like it can be > really useful for us to do our own testing too. I believe it's part of trinity. http://codemonkey.org.uk/projects/trinity/ Perhaps it should have a "ftracer fuzzer" too? -Andi -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: perf_fuzzer compiled for x32 causes reboot
On Sat, Mar 01, 2014 at 10:16:50AM +0100, Ingo Molnar wrote: > > * Steven Rostedt wrote: > > > > Also, this function is called a _LOT_ under certain workloads, I > > > don't know how cheap a CR2 read is, but it had better be really > > > cheap. > > > > That's a HPA question. > > We read CR2 in the page fault hot path, so it's on the top of CPU > architects' minds and it's reasonably optimized. A couple of cycles > IIRC, but would be nice to hear actual numbers. Yeah, we were discussing this last night on IRC. And hpa actually meant that the optimization potential was there but no one did do it, except maybe Transmeta. :-) So the expensive thing is writing to CR2 because it is a serializing instruction. In fact, all writes to control registers except CR8 are serializing. The reading from CR2 should be cheaper but not as cheap as a normal MOV %reg %reg is. On AMD, MOV %reg, %cr2 is done with microcode so definitely at least a couple of cycles and I'd guess it is not a trivial MOV on Intel too. Maybe a way to hide this cost is the OoO, as hpa suggested, depending on how much parallelism that particular code region can offer (serializing instructions close by). -- Regards/Gruss, Boris. Sent from a fat crate under my desk. Formatting is fine. -- -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: perf_fuzzer compiled for x32 causes reboot
* Steven Rostedt wrote: > > Also, this function is called a _LOT_ under certain workloads, I > > don't know how cheap a CR2 read is, but it had better be really > > cheap. > > That's a HPA question. We read CR2 in the page fault hot path, so it's on the top of CPU architects' minds and it's reasonably optimized. A couple of cycles IIRC, but would be nice to hear actual numbers. Thanks, Ingo -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: perf_fuzzer compiled for x32 causes reboot
* Steven Rostedt rost...@goodmis.org wrote: Also, this function is called a _LOT_ under certain workloads, I don't know how cheap a CR2 read is, but it had better be really cheap. That's a HPA question. We read CR2 in the page fault hot path, so it's on the top of CPU architects' minds and it's reasonably optimized. A couple of cycles IIRC, but would be nice to hear actual numbers. Thanks, Ingo -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: perf_fuzzer compiled for x32 causes reboot
On Sat, Mar 01, 2014 at 10:16:50AM +0100, Ingo Molnar wrote: * Steven Rostedt rost...@goodmis.org wrote: Also, this function is called a _LOT_ under certain workloads, I don't know how cheap a CR2 read is, but it had better be really cheap. That's a HPA question. We read CR2 in the page fault hot path, so it's on the top of CPU architects' minds and it's reasonably optimized. A couple of cycles IIRC, but would be nice to hear actual numbers. Yeah, we were discussing this last night on IRC. And hpa actually meant that the optimization potential was there but no one did do it, except maybe Transmeta. :-) So the expensive thing is writing to CR2 because it is a serializing instruction. In fact, all writes to control registers except CR8 are serializing. The reading from CR2 should be cheaper but not as cheap as a normal MOV %reg %reg is. On AMD, MOV %reg, %cr2 is done with microcode so definitely at least a couple of cycles and I'd guess it is not a trivial MOV on Intel too. Maybe a way to hide this cost is the OoO, as hpa suggested, depending on how much parallelism that particular code region can offer (serializing instructions close by). -- Regards/Gruss, Boris. Sent from a fat crate under my desk. Formatting is fine. -- -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: perf_fuzzer compiled for x32 causes reboot
Steven Rostedt rost...@goodmis.org writes: BTW, is the perf_fuzzer code posted somewhere? It sounds like it can be really useful for us to do our own testing too. I believe it's part of trinity. http://codemonkey.org.uk/projects/trinity/ Perhaps it should have a ftracer fuzzer too? -Andi -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: perf_fuzzer compiled for x32 causes reboot
The bottom line is that if we want hard numbers we probably have to measure. Hoisting the cr2 read is a no-brainer, might even help performance... On March 1, 2014 1:50:42 AM PST, Borislav Petkov b...@alien8.de wrote: On Sat, Mar 01, 2014 at 10:16:50AM +0100, Ingo Molnar wrote: * Steven Rostedt rost...@goodmis.org wrote: Also, this function is called a _LOT_ under certain workloads, I don't know how cheap a CR2 read is, but it had better be really cheap. That's a HPA question. We read CR2 in the page fault hot path, so it's on the top of CPU architects' minds and it's reasonably optimized. A couple of cycles IIRC, but would be nice to hear actual numbers. Yeah, we were discussing this last night on IRC. And hpa actually meant that the optimization potential was there but no one did do it, except maybe Transmeta. :-) So the expensive thing is writing to CR2 because it is a serializing instruction. In fact, all writes to control registers except CR8 are serializing. The reading from CR2 should be cheaper but not as cheap as a normal MOV %reg %reg is. On AMD, MOV %reg, %cr2 is done with microcode so definitely at least a couple of cycles and I'd guess it is not a trivial MOV on Intel too. Maybe a way to hide this cost is the OoO, as hpa suggested, depending on how much parallelism that particular code region can offer (serializing instructions close by). -- Sent from my mobile phone. Please pardon brevity and lack of formatting. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: perf_fuzzer compiled for x32 causes reboot
On Fri, 28 Feb 2014 18:34:00 -0500 (EST) Vince Weaver wrote: > > I was poking fun at you on IRC for this exact reason: > > > > poor Vince, I keep sending him new patches. "No, don't test this > > patch, now test this one. Oh wait, try this one instead" > > * peterz sees Vince thinking: "stop... sending.. me.. damn... patches... > > already... !!!11!" > > or at least, "Let me finish this test before I cancel it again > > for another damn patch" > > then he's probably doing "I'm not going to run any tests now, > > until I wait a while to see if there's a new patch to test" I hope you were not offended by this. It was actually a testament to the work you've given us. > > Well while it might appear that I spend all of my days finding perf_event > bugs, I actually am a college professor so I do occasionally have to run > off to teach a class, meet with students, or write papers/grants for other > academics to reject. But perf_event bug finder is a much more prestigious title than "college professor" ;-) > > It's nice others can reproduce the issue now, it would have saved me a lot > of trouble, although now in theory I have a much better handle of how to > use/abuse ftrace so I guess it was worth it. I always enjoy finding bugs in other people's code. That's usually the best way to learn what their code does. > > Once the fix gets into git I'm sure the relentless perf_fuzzer will let us > know if there are any other issues left. I do look forward to the day > when I can leave it running overnight and have a clean syslog the next > morning. BTW, is the perf_fuzzer code posted somewhere? It sounds like it can be really useful for us to do our own testing too. Thanks, -- Steve -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: perf_fuzzer compiled for x32 causes reboot
On 02/28/2014 03:34 PM, Vince Weaver wrote: > > Well while it might appear that I spend all of my days finding perf_event > bugs, I actually am a college professor so I do occasionally have to run > off to teach a class, meet with students, or write papers/grants for other > academics to reject. > We really appreciate your help. This has been really critical. > > It's nice others can reproduce the issue now, it would have saved me a lot > of trouble, although now in theory I have a much better handle of how to > use/abuse ftrace so I guess it was worth it. > > Once the fix gets into git I'm sure the relentless perf_fuzzer will let us > know if there are any other issues left. I do look forward to the day > when I can leave it running overnight and have a clean syslog the next > morning. > We all do, definitely, and your help has been a huge step in that direction. -hpa -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: perf_fuzzer compiled for x32 causes reboot
On Fri, 28 Feb 2014, Steven Rostedt wrote: > On Fri, 28 Feb 2014 16:18:23 -0500 (EST) > Vince Weaver wrote: > > > I was away from the computer this afternoon and of course I have scores of > > e-mails on this topic now with lots of competing patches. Is there one > > in particular I'm supposed to be testing? > > I was poking fun at you on IRC for this exact reason: > > poor Vince, I keep sending him new patches. "No, don't test this > patch, now test this one. Oh wait, try this one instead" > * peterz sees Vince thinking: "stop... sending.. me.. damn... patches... > already... !!!11!" > or at least, "Let me finish this test before I cancel it again for > another damn patch" > then he's probably doing "I'm not going to run any tests now, until > I wait a while to see if there's a new patch to test" Well while it might appear that I spend all of my days finding perf_event bugs, I actually am a college professor so I do occasionally have to run off to teach a class, meet with students, or write papers/grants for other academics to reject. It's nice others can reproduce the issue now, it would have saved me a lot of trouble, although now in theory I have a much better handle of how to use/abuse ftrace so I guess it was worth it. Once the fix gets into git I'm sure the relentless perf_fuzzer will let us know if there are any other issues left. I do look forward to the day when I can leave it running overnight and have a clean syslog the next morning. Vince -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: perf_fuzzer compiled for x32 causes reboot
On Fri, Feb 28, 2014 at 05:05:53PM -0500, Steven Rostedt wrote: > On Fri, 28 Feb 2014 22:55:11 +0100 > Peter Zijlstra wrote: > > > On Fri, Feb 28, 2014 at 01:51:50PM -0800, Paul E. McKenney wrote: > > > On Fri, Feb 28, 2014 at 10:27:00PM +0100, Peter Zijlstra wrote: > > > > On Fri, Feb 28, 2014 at 01:17:33PM -0800, Paul E. McKenney wrote: > > > > > This code isn't running in idle context is it? If so, RCU will > > > > > happily > > > > > free out from under it. CONFIG_PROVE_RCU should detect this sort of > > > > > thing, > > > > > though. > > > > > > > > Well, interrupts/NMIs can happen when idle, but the interrupt/NMI > > > > entry code deals with the idle state AFAIK. > > > > > > Yep, rcu_irq_enter() and rcu_nmi_enter() deal with that. More worried > > > about this being code invoked from some energy-efficiency driver or > > > another within the idle loop. > > > > Right, so any tracepoint can end up there; but I thought there was > > already the rule that tracepoints needed RCU enabled. > > There is and we have special tracepoint caller for those cases we want a > tracepoint out of RCU scope. These will reactivate rcu in the > tracepoint code. > > trace__rcuidle(...) OK, I finally looked at the emails leading up to this in the thread... I believe that I am doing premature debugging. Thanx, Paul -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: perf_fuzzer compiled for x32 causes reboot
On Fri, 28 Feb 2014 22:55:11 +0100 Peter Zijlstra wrote: > On Fri, Feb 28, 2014 at 01:51:50PM -0800, Paul E. McKenney wrote: > > On Fri, Feb 28, 2014 at 10:27:00PM +0100, Peter Zijlstra wrote: > > > On Fri, Feb 28, 2014 at 01:17:33PM -0800, Paul E. McKenney wrote: > > > > This code isn't running in idle context is it? If so, RCU will happily > > > > free out from under it. CONFIG_PROVE_RCU should detect this sort of > > > > thing, > > > > though. > > > > > > Well, interrupts/NMIs can happen when idle, but the interrupt/NMI > > > entry code deals with the idle state AFAIK. > > > > Yep, rcu_irq_enter() and rcu_nmi_enter() deal with that. More worried > > about this being code invoked from some energy-efficiency driver or > > another within the idle loop. > > Right, so any tracepoint can end up there; but I thought there was > already the rule that tracepoints needed RCU enabled. There is and we have special tracepoint caller for those cases we want a tracepoint out of RCU scope. These will reactivate rcu in the tracepoint code. trace__rcuidle(...) -- Steve -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: perf_fuzzer compiled for x32 causes reboot
On Fri, Feb 28, 2014 at 01:51:50PM -0800, Paul E. McKenney wrote: > On Fri, Feb 28, 2014 at 10:27:00PM +0100, Peter Zijlstra wrote: > > On Fri, Feb 28, 2014 at 01:17:33PM -0800, Paul E. McKenney wrote: > > > This code isn't running in idle context is it? If so, RCU will happily > > > free out from under it. CONFIG_PROVE_RCU should detect this sort of > > > thing, > > > though. > > > > Well, interrupts/NMIs can happen when idle, but the interrupt/NMI > > entry code deals with the idle state AFAIK. > > Yep, rcu_irq_enter() and rcu_nmi_enter() deal with that. More worried > about this being code invoked from some energy-efficiency driver or > another within the idle loop. Right, so any tracepoint can end up there; but I thought there was already the rule that tracepoints needed RCU enabled. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: perf_fuzzer compiled for x32 causes reboot
On Fri, Feb 28, 2014 at 10:27:00PM +0100, Peter Zijlstra wrote: > On Fri, Feb 28, 2014 at 01:17:33PM -0800, Paul E. McKenney wrote: > > This code isn't running in idle context is it? If so, RCU will happily > > free out from under it. CONFIG_PROVE_RCU should detect this sort of thing, > > though. > > Well, interrupts/NMIs can happen when idle, but the interrupt/NMI > entry code deals with the idle state AFAIK. Yep, rcu_irq_enter() and rcu_nmi_enter() deal with that. More worried about this being code invoked from some energy-efficiency driver or another within the idle loop. Thanx, Paul -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: perf_fuzzer compiled for x32 causes reboot
On Fri, 28 Feb 2014 16:18:23 -0500 (EST) Vince Weaver wrote: > I was away from the computer this afternoon and of course I have scores of > e-mails on this topic now with lots of competing patches. Is there one > in particular I'm supposed to be testing? I was poking fun at you on IRC for this exact reason: poor Vince, I keep sending him new patches. "No, don't test this patch, now test this one. Oh wait, try this one instead" * peterz sees Vince thinking: "stop... sending.. me.. damn... patches... already... !!!11!" or at least, "Let me finish this test before I cancel it again for another damn patch" then he's probably doing "I'm not going to run any tests now, until I wait a while to see if there's a new patch to test" :-) Anyway, I would say wait a bit while we sort this out. At least we have a strong idea of what the bug is. Now we need to agree on the solution. Vince, you have been really terrific in helping us figure out what was wrong. I don't want to waste your time until we can all agree on what the proper fix is. When we do, it would be great if you can test it. But for now, have a great weekend and thanks for all your hard work in narrowing down this issue. -- Steve -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: perf_fuzzer compiled for x32 causes reboot
On Fri, Feb 28, 2014 at 01:17:33PM -0800, Paul E. McKenney wrote: > This code isn't running in idle context is it? If so, RCU will happily > free out from under it. CONFIG_PROVE_RCU should detect this sort of thing, > though. Well, interrupts/NMIs can happen when idle, but the interrupt/NMI entry code deals with the idle state AFAIK. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: perf_fuzzer compiled for x32 causes reboot
On Fri, Feb 28, 2014 at 09:54:09PM +0100, Peter Zijlstra wrote: > On Fri, Feb 28, 2014 at 03:47:16PM -0500, Steven Rostedt wrote: > > > > I'll try your patch momentarily, first I had some other changes I > > > > started > > > > running before I left work (for some reason it recompiled the whole > > > > kernel). > > > > > > > > 8: function: perf_output_begin > > > > 8: bprint: perf_output_begin: VMW: event type 2 config 2a > > > > st: 2c3e > > > > 8: bputs:perf_output_begin: VMW: before rcu_dereference > > > > 9: function: __do_page_fault > > > > 9: function:down_read_trylock > > > > 9: function:_cond_resched > > > > 9: function:find_vma > > > > > > > > so it looks like the fault happens > > > > > > > > rcu_read_lock(); > > > > > > > > 116 /* > > > > 117 * For inherited events we send all the output towards the > > > > parent. > > > > 118 */ > > > > 119 if (event->parent) > > > > 120 event = event->parent; > > > > 121 > > > > > > > > somewhere between here > > > > > > > > 122 rb = rcu_dereference(event->rb); > > > > 123 if (unlikely(!rb)) > > > > 124 goto out; > > > > > > > > and here > > > > > > > > 125 > > > > 126 if (unlikely(!rb->nr_pages)) > > > > 127 goto out; > > > > > > > > although if rcu locks do anything to turn off tracing then this could > > > > be > > > > suspect. > > > > > > The most likely suspect is of course event->rb in the rcu_dereference. > > > I have to defer to Steven on how rcu_read_lock() and rcu_read_unlock() > > > currently interact with tracing. ;-) > > > > These are all perf related. You'll need to defer to Peter Zijlstra ;-) > > I'm lost.. :/ > > pretty much all perf objects are RCU freed. This code isn't running in idle context is it? If so, RCU will happily free out from under it. CONFIG_PROVE_RCU should detect this sort of thing, though. Thanx, Paul -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: perf_fuzzer compiled for x32 causes reboot
On Fri, 28 Feb 2014, H. Peter Anvin wrote: > Now we need to figure out if the reboot problem and the segfault problem > are actually the same... I have a nasty feeling they might be different > problems. I'm currently running a script that tries setting EBP to all possible 32-bit pages and running the test to see if that triggers anything. If I look at my notes the original reboot crash might have happened when I had the fuzzer also generating overflow signals (my current tests do not) so I'm not sure if having all this mess triggered from inside a signal handler could make it reboot somehow. I was away from the computer this afternoon and of course I have scores of e-mails on this topic now with lots of competing patches. Is there one in particular I'm supposed to be testing? Thanks, Vince -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: perf_fuzzer compiled for x32 causes reboot
On Fri, 28 Feb 2014 21:56:38 +0100 Peter Zijlstra wrote: > Like already said; _trace is an absolutely abysmal name. Also you > _really_ don't want an unconditional CR2 write in there, that's just > stupidly expensive. But a read isn't. Which is why we only do a write if the copy caused a page fault (which is already expensive). The proposed patch will have: if (cr2 != read_cr2()) write_cr2(cr2); > > Also, this function is called a _LOT_ under certain workloads, I don't > know how cheap a CR2 read is, but it had better be really cheap. That's a HPA question. -- Steve -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: perf_fuzzer compiled for x32 causes reboot
On Fri, Feb 28, 2014 at 11:29:46AM -0500, Steven Rostedt wrote: > On Fri, 28 Feb 2014 08:15:11 -0800 > "H. Peter Anvin" wrote: > > > Well, I was talking about the assumption spelled out in the comment > > above copy_from_user_nmi() which pretty much states "cr2 is safe because > > cr2 is saved/restored in the NMI wrappers." > > Yeah, it seems that the name "copy_from_user_nmi()" is a misnomer. As > it can be called outside of nmi context. Perhaps we should have a > copy_from_user_trace() that does the save and restore of the cr2. > > As that's the only place that faults, it may be the best answer. > > Arnaldo, > > Can you test this patch and see if it fixes the bug for you too. Vince > already said it fixes it for him. > > I'm attaching it below (it's from H. Peter). > > Peter Z., as both Jiri's and my patch fixed the callers of the problem > area, and as we have been discussing, there may be more problem areas, > I'm thinking the best solution is to just use H. Peter's patch instead. > And then we should rename it to copy_from_user_trace(). Like already said; _trace is an absolutely abysmal name. Also you _really_ don't want an unconditional CR2 write in there, that's just stupidly expensive. Also, this function is called a _LOT_ under certain workloads, I don't know how cheap a CR2 read is, but it had better be really cheap. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: perf_fuzzer compiled for x32 causes reboot
On Fri, Feb 28, 2014 at 08:15:11AM -0800, H. Peter Anvin wrote: > On 02/28/2014 07:40 AM, Peter Zijlstra wrote: > > On Fri, Feb 28, 2014 at 07:13:06AM -0800, H. Peter Anvin wrote: > >> If I'm reading this right we end up going from the page fault > >> tracepoint to copy_from_user_nmi() without going through NMI, and the > >> cr2 corruption is obvious. I guess the assumption that only the NMI > >> path needed to save cr2 is flawed? > > > > It was never assumed it would only go through NMI, but that it would be > > NMI safe -- and as it turns out, it is that. > > > > What I did assume was that any other callsites would be safe, seeing how > > they'd already be running in 'normal' contexts. > > > > I had not considered people putting tracepoints _that_ early in the > > exception paths. > > > > Note that there's more tracepoints there than the one mentioned. > > > > Well, I was talking about the assumption spelled out in the comment > above copy_from_user_nmi() which pretty much states "cr2 is safe because > cr2 is saved/restored in the NMI wrappers." That is because we assumed NMI was the only thing that could interrupt a fault handler before it would read CR2. Never thinking someone would put a tracepoint there. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: perf_fuzzer compiled for x32 causes reboot
On Fri, Feb 28, 2014 at 03:47:16PM -0500, Steven Rostedt wrote: > > > I'll try your patch momentarily, first I had some other changes I started > > > running before I left work (for some reason it recompiled the whole > > > kernel). > > > > > > 8: function: perf_output_begin > > > 8: bprint: perf_output_begin: VMW: event type 2 config 2a > > > st: 2c3e > > > 8: bputs:perf_output_begin: VMW: before rcu_dereference > > > 9: function: __do_page_fault > > > 9: function:down_read_trylock > > > 9: function:_cond_resched > > > 9: function:find_vma > > > > > > so it looks like the fault happens > > > > > > rcu_read_lock(); > > > > > > 116 /* > > > 117 * For inherited events we send all the output towards the > > > parent. > > > 118 */ > > > 119 if (event->parent) > > > 120 event = event->parent; > > > 121 > > > > > > somewhere between here > > > > > > 122 rb = rcu_dereference(event->rb); > > > 123 if (unlikely(!rb)) > > > 124 goto out; > > > > > > and here > > > > > > 125 > > > 126 if (unlikely(!rb->nr_pages)) > > > 127 goto out; > > > > > > although if rcu locks do anything to turn off tracing then this could be > > > suspect. > > > > The most likely suspect is of course event->rb in the rcu_dereference. > > I have to defer to Steven on how rcu_read_lock() and rcu_read_unlock() > > currently interact with tracing. ;-) > > These are all perf related. You'll need to defer to Peter Zijlstra ;-) I'm lost.. :/ pretty much all perf objects are RCU freed. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: perf_fuzzer compiled for x32 causes reboot
On Fri, 28 Feb 2014 12:34:05 -0800 "Paul E. McKenney" wrote: > On Thu, Feb 27, 2014 at 08:00:04PM -0500, Vince Weaver wrote: > > On Thu, 27 Feb 2014, H. Peter Anvin wrote: > > > > > On 02/27/2014 03:30 PM, Steven Rostedt wrote: > > > > On Thu, 27 Feb 2014 14:52:54 -0800 > > > > "H. Peter Anvin" wrote: > > > > > > > >> On 02/27/2014 02:31 PM, Steven Rostedt wrote: > > > >>> > > > >>> Yeah, something is getting mesed up. > > > >>> > > > >> > > > >> What it *looks* like to me is that we try to nest the cr2 save/restore, > > > >> which doesn't nest because it is a percpu variable. > > > >> > > > >> ... except in the x86-64 case, we *ALSO* save/restore cr2 inside > > > >> entry_64.S, which makes the stuff in do_nmi completely redundant and > > > >> there for no good reason. > > > > > > > > Peter, look at the code. That percpu cr2 is in a #ifdef CONFIG_X86_32 > > > > section. That is, it isn't even executed. That's i386 code. The only > > > > place the cr2 is saved for x86_64 is in entry_64.S. > > > > > > > > > > Right, egg on my face. However, I still think it would make more sense > > > for it to nest the way entry_64.S does if at all possible. > > > > > > That makes this even more confusing, though. I would still like to see > > > what happens with the patch I sent Vince. > > > > I'll try your patch momentarily, first I had some other changes I started > > running before I left work (for some reason it recompiled the whole > > kernel). > > > > 8: function: perf_output_begin > > 8: bprint: perf_output_begin: VMW: event type 2 config 2a st: > > 2c3e > > 8: bputs:perf_output_begin: VMW: before rcu_dereference > > 9: function: __do_page_fault > > 9: function:down_read_trylock > > 9: function:_cond_resched > > 9: function:find_vma > > > > so it looks like the fault happens > > > > rcu_read_lock(); > > > > 116 /* > > 117 * For inherited events we send all the output towards the > > parent. > > 118 */ > > 119 if (event->parent) > > 120 event = event->parent; > > 121 > > > > somewhere between here > > > > 122 rb = rcu_dereference(event->rb); > > 123 if (unlikely(!rb)) > > 124 goto out; > > > > and here > > > > 125 > > 126 if (unlikely(!rb->nr_pages)) > > 127 goto out; > > > > although if rcu locks do anything to turn off tracing then this could be > > suspect. > > The most likely suspect is of course event->rb in the rcu_dereference. > I have to defer to Steven on how rcu_read_lock() and rcu_read_unlock() > currently interact with tracing. ;-) These are all perf related. You'll need to defer to Peter Zijlstra ;-) -- Steve -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: perf_fuzzer compiled for x32 causes reboot
On Fri, 28 Feb 2014 12:38:52 -0800 "H. Peter Anvin" wrote: > Now we need to figure out if the reboot problem and the segfault problem are > actually the same... I have a nasty feeling they might be different problems. I wonder if there was any recursion problem. Although, I believe perf has recursion checks. But perhaps something recursed before the checks? -- Steve -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: perf_fuzzer compiled for x32 causes reboot
Now we need to figure out if the reboot problem and the segfault problem are actually the same... I have a nasty feeling they might be different problems. On February 28, 2014 7:07:29 AM PST, Vince Weaver wrote: >On Fri, 28 Feb 2014, Steven Rostedt wrote: > >> Interesting. Are you doing a perf function trace? >> >> And just in case, can you add this patch and make sure the copy is >> called by NMI. > >199.900682: function: trace_do_page_fault >199.900683: page_fault_user: address=__per_cpu_end >ip=__per_cpu_end error_code=0x6 >199.900683: function: perf_swevent_get_recursion_context >199.900684: function: perf_tp_event >199.900684: function:perf_swevent_event >199.900684: function: perf_swevent_overflow >199.900684: function: __perf_event_overflow >199.900685: function: perf_prepare_sample >199.900685: function: >__perf_event_header__init_id >199.900685: function: task_tgid_nr_ns >199.900685: function: perf_event_tid >199.900686: function: __task_pid_nr_ns >199.900686: function:perf_callchain >199.900687: function: copy_from_user_nmi >199.900687: function: trace_do_page_fault >199.900687: page_fault_kernel:address=irq_stack_union >ip=copy_user_generic_string error_code=0x0 >199.900688: function:__do_page_fault >199.900688: function: bad_area_nosemaphore >199.900688: function: __bad_area_nosemaphore >199.900689: function: no_context >199.900689: function:fixup_exception >199.900689: function: >search_exception_tables >199.900689: function: search_extable >199.900691: function: copy_user_handle_tail >199.900691: function: trace_do_page_fault >199.900691: page_fault_kernel:address=irq_stack_union >ip=copy_user_handle_tail error_code=0x0 >199.900691: function:__do_page_fault >199.900692: function: bad_area_nosemaphore >199.900692: function: __bad_area_nosemaphore >199.900692: function: no_context >199.900692: function:fixup_exception >199.900692: function: >search_exception_tables >199.900692: function: search_extable >199.900693: function: save_stack_trace >199.900693: function:dump_trace >199.900694: function: print_context_stack >199.900694: function: __kernel_text_address >199.900694: function: is_module_text_address >199.900695: function:__module_text_address >199.900695: function: __module_address >199.900695: function: __kernel_text_address >199.900695: function: is_module_text_address >199.900696: function:__module_text_address >199.900696: function: __module_address >... >199.900705: function: __kernel_text_address >199.900809: kernel_stack: >=> perf_callchain (810d35a2) >=> perf_prepare_sample (810cfae3) >=> __perf_event_overflow (810d02f4) >=> perf_swevent_overflow (810d04e3) >=> perf_swevent_event (810d0574) >=> perf_tp_event (810d070c) >=> perf_trace_x86_exceptions (810341b6) >=> trace_do_page_fault (81537702) >=> trace_page_fault (81534772) >199.900810: function: perf_output_begin >199.900810: function: __do_page_fault >199.900810: function:__perf_sw_event >199.900810: function: >perf_swevent_get_recursion_context >199.900811: function:down_read_trylock >199.900811: function:_cond_resched >199.900811: function:find_vma >199.900811: function:bad_area >199.900812: function: up_read >199.900812: function: __bad_area_nosemaphore >199.900812: function: is_prefetch >199.900812: function: convert_ip_to_linear >199.900813: function: unhandled_signal >199.900813: function: __printk_ratelimit >199.900813: function: _raw_spin_trylock >199.900813: function: _raw_spin_unlock_irqrestore >199.900814: function: printk >199.900814: function: vprintk_emit -- Sent from my mobile phone. Please pardon brevity and lack of formatting. -- To unsubscribe from this list: send the line "unsubscribe
Re: perf_fuzzer compiled for x32 causes reboot
On Thu, Feb 27, 2014 at 08:00:04PM -0500, Vince Weaver wrote: > On Thu, 27 Feb 2014, H. Peter Anvin wrote: > > > On 02/27/2014 03:30 PM, Steven Rostedt wrote: > > > On Thu, 27 Feb 2014 14:52:54 -0800 > > > "H. Peter Anvin" wrote: > > > > > >> On 02/27/2014 02:31 PM, Steven Rostedt wrote: > > >>> > > >>> Yeah, something is getting mesed up. > > >>> > > >> > > >> What it *looks* like to me is that we try to nest the cr2 save/restore, > > >> which doesn't nest because it is a percpu variable. > > >> > > >> ... except in the x86-64 case, we *ALSO* save/restore cr2 inside > > >> entry_64.S, which makes the stuff in do_nmi completely redundant and > > >> there for no good reason. > > > > > > Peter, look at the code. That percpu cr2 is in a #ifdef CONFIG_X86_32 > > > section. That is, it isn't even executed. That's i386 code. The only > > > place the cr2 is saved for x86_64 is in entry_64.S. > > > > > > > Right, egg on my face. However, I still think it would make more sense > > for it to nest the way entry_64.S does if at all possible. > > > > That makes this even more confusing, though. I would still like to see > > what happens with the patch I sent Vince. > > I'll try your patch momentarily, first I had some other changes I started > running before I left work (for some reason it recompiled the whole > kernel). > > 8: function: perf_output_begin > 8: bprint: perf_output_begin: VMW: event type 2 config 2a st: > 2c3e > 8: bputs:perf_output_begin: VMW: before rcu_dereference > 9: function: __do_page_fault > 9: function:down_read_trylock > 9: function:_cond_resched > 9: function:find_vma > > so it looks like the fault happens > > rcu_read_lock(); > > 116 /* > 117 * For inherited events we send all the output towards the parent. > 118 */ > 119 if (event->parent) > 120 event = event->parent; > 121 > > somewhere between here > > 122 rb = rcu_dereference(event->rb); > 123 if (unlikely(!rb)) > 124 goto out; > > and here > > 125 > 126 if (unlikely(!rb->nr_pages)) > 127 goto out; > > although if rcu locks do anything to turn off tracing then this could be > suspect. The most likely suspect is of course event->rb in the rcu_dereference. I have to defer to Steven on how rcu_read_lock() and rcu_read_unlock() currently interact with tracing. ;-) Thanx, Paul -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: perf_fuzzer compiled for x32 causes reboot
On Fri, 28 Feb 2014 08:15:11 -0800 "H. Peter Anvin" wrote: > Well, I was talking about the assumption spelled out in the comment > above copy_from_user_nmi() which pretty much states "cr2 is safe because > cr2 is saved/restored in the NMI wrappers." Yeah, it seems that the name "copy_from_user_nmi()" is a misnomer. As it can be called outside of nmi context. Perhaps we should have a copy_from_user_trace() that does the save and restore of the cr2. As that's the only place that faults, it may be the best answer. Arnaldo, Can you test this patch and see if it fixes the bug for you too. Vince already said it fixes it for him. I'm attaching it below (it's from H. Peter). Peter Z., as both Jiri's and my patch fixed the callers of the problem area, and as we have been discussing, there may be more problem areas, I'm thinking the best solution is to just use H. Peter's patch instead. And then we should rename it to copy_from_user_trace(). -- Steve diff --git a/arch/x86/lib/usercopy.c b/arch/x86/lib/usercopy.c index ddf9ecb..938e45c 100644 --- a/arch/x86/lib/usercopy.c +++ b/arch/x86/lib/usercopy.c @@ -10,6 +10,8 @@ #include #include +#include + /* * We rely on the nested NMI work to allow atomic faults from the NMI path; the * nested NMI paths are careful to preserve CR2. @@ -18,6 +20,7 @@ unsigned long copy_from_user_nmi(void *to, const void __user *from, unsigned long n) { unsigned long ret; + unsigned long cr2; if (__range_not_ok(from, n, TASK_SIZE)) return 0; @@ -27,9 +30,11 @@ copy_from_user_nmi(void *to, const void __user *from, unsigned long n) * disable pagefaults so that its behaviour is consistent even when * called form other contexts. */ + cr2 = read_cr2(); pagefault_disable(); ret = __copy_from_user_inatomic(to, from, n); pagefault_enable(); + write_cr2(cr2); return ret; } -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: perf_fuzzer compiled for x32 causes reboot
On 02/28/2014 07:40 AM, Peter Zijlstra wrote: > On Fri, Feb 28, 2014 at 07:13:06AM -0800, H. Peter Anvin wrote: >> If I'm reading this right we end up going from the page fault >> tracepoint to copy_from_user_nmi() without going through NMI, and the >> cr2 corruption is obvious. I guess the assumption that only the NMI >> path needed to save cr2 is flawed? > > It was never assumed it would only go through NMI, but that it would be > NMI safe -- and as it turns out, it is that. > > What I did assume was that any other callsites would be safe, seeing how > they'd already be running in 'normal' contexts. > > I had not considered people putting tracepoints _that_ early in the > exception paths. > > Note that there's more tracepoints there than the one mentioned. > Well, I was talking about the assumption spelled out in the comment above copy_from_user_nmi() which pretty much states "cr2 is safe because cr2 is saved/restored in the NMI wrappers." -hpa -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: perf_fuzzer compiled for x32 causes reboot
On Fri, Feb 28, 2014 at 07:13:06AM -0800, H. Peter Anvin wrote: > If I'm reading this right we end up going from the page fault > tracepoint to copy_from_user_nmi() without going through NMI, and the > cr2 corruption is obvious. I guess the assumption that only the NMI > path needed to save cr2 is flawed? It was never assumed it would only go through NMI, but that it would be NMI safe -- and as it turns out, it is that. What I did assume was that any other callsites would be safe, seeing how they'd already be running in 'normal' contexts. I had not considered people putting tracepoints _that_ early in the exception paths. Note that there's more tracepoints there than the one mentioned. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: perf_fuzzer compiled for x32 causes reboot
On Fri, 28 Feb 2014 10:20:00 -0500 Steven Rostedt wrote: > Below is a patch that should fix this. Please remove all other patches > and try this out. Updated patch, as Peter Zijlstra on IRC asked me if the exception_enter() can be traced. And looking at it, it sure can be. -- Steve diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c index 6dea040..f606b67 100644 --- a/arch/x86/mm/fault.c +++ b/arch/x86/mm/fault.c @@ -1271,9 +1271,15 @@ dotraplinkage void __kprobes trace_do_page_fault(struct pt_regs *regs, unsigned long error_code) { enum ctx_state prev_state; + unsigned long cr2; + /* The trace might fault, save the cr2 register */ + cr2 = read_cr2(); prev_state = exception_enter(); trace_page_fault_entries(regs, error_code); + /* Put back the original cr2 if needed */ + if (cr2 != read_cr2()) + write_cr2(cr2); __do_page_fault(regs, error_code); exception_exit(prev_state); } -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: perf_fuzzer compiled for x32 causes reboot
On Fri, 28 Feb 2014 10:07:29 -0500 (EST) Vince Weaver wrote: > On Fri, 28 Feb 2014, Steven Rostedt wrote: > 199.900696: function: __module_address > ... > 199.900705: function: __kernel_text_address > 199.900809: kernel_stack: > => perf_callchain (810d35a2) > => perf_prepare_sample (810cfae3) > => __perf_event_overflow (810d02f4) > => perf_swevent_overflow (810d04e3) > => perf_swevent_event (810d0574) > => perf_tp_event (810d070c) > => perf_trace_x86_exceptions (810341b6) > => trace_do_page_fault (81537702) > => trace_page_fault (81534772) Thank you!!! You just found the bug :-) The bug was caused by: commit 25c74b10bacead867478480170083f69cfc0db48 x86, trace: Register exception handler to trace IDT With this code: dotraplinkage void __kprobes trace_do_page_fault(struct pt_regs *regs, unsigned long error_code) { enum ctx_state prev_state; prev_state = exception_enter(); trace_page_fault_entries(regs, error_code); __do_page_fault(regs, error_code); exception_exit(prev_state); } The trace_page_fault_entries() which is called before the cr2 is saved can fault by perf doing a userspace stack trace. But the cr2 is not restored when calling __do_page_fault() and that gets the wrong cr2. Below is a patch that should fix this. Please remove all other patches and try this out. Thanks, -- Steve > 199.900810: function: perf_output_begin > 199.900810: function: __do_page_fault > 199.900810: function:__perf_sw_event > 199.900810: function: perf_swevent_get_recursion_context > 199.900811: function:down_read_trylock > 199.900811: function:_cond_resched > 199.900811: function:find_vma > 199.900811: function:bad_area > 199.900812: function: up_read > 199.900812: function: __bad_area_nosemaphore > 199.900812: function: is_prefetch > 199.900812: function: convert_ip_to_linear > 199.900813: function: unhandled_signal > 199.900813: function: __printk_ratelimit > 199.900813: function: _raw_spin_trylock > 199.900813: function: _raw_spin_unlock_irqrestore > 199.900814: function: printk > 199.900814: function: vprintk_emit diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c index 6dea040..66b636d 100644 --- a/arch/x86/mm/fault.c +++ b/arch/x86/mm/fault.c @@ -1271,9 +1271,15 @@ dotraplinkage void __kprobes trace_do_page_fault(struct pt_regs *regs, unsigned long error_code) { enum ctx_state prev_state; + unsigned long cr2; prev_state = exception_enter(); + /* The trace might fault, save the cr2 register */ + cr2 = read_cr2(); trace_page_fault_entries(regs, error_code); + /* Put back the original cr2 if needed */ + if (cr2 != read_cr2()) + write_cr2(cr2); __do_page_fault(regs, error_code); exception_exit(prev_state); } -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: perf_fuzzer compiled for x32 causes reboot
If I'm reading this right we end up going from the page fault tracepoint to copy_from_user_nmi() without going through NMI, and the cr2 corruption is obvious. I guess the assumption that only the NMI path needed to save cr2 is flawed? On February 28, 2014 7:07:29 AM PST, Vince Weaver wrote: >On Fri, 28 Feb 2014, Steven Rostedt wrote: > >> Interesting. Are you doing a perf function trace? >> >> And just in case, can you add this patch and make sure the copy is >> called by NMI. > >199.900682: function: trace_do_page_fault >199.900683: page_fault_user: address=__per_cpu_end >ip=__per_cpu_end error_code=0x6 >199.900683: function: perf_swevent_get_recursion_context >199.900684: function: perf_tp_event >199.900684: function:perf_swevent_event >199.900684: function: perf_swevent_overflow >199.900684: function: __perf_event_overflow >199.900685: function: perf_prepare_sample >199.900685: function: >__perf_event_header__init_id >199.900685: function: task_tgid_nr_ns >199.900685: function: perf_event_tid >199.900686: function: __task_pid_nr_ns >199.900686: function:perf_callchain >199.900687: function: copy_from_user_nmi >199.900687: function: trace_do_page_fault >199.900687: page_fault_kernel:address=irq_stack_union >ip=copy_user_generic_string error_code=0x0 >199.900688: function:__do_page_fault >199.900688: function: bad_area_nosemaphore >199.900688: function: __bad_area_nosemaphore >199.900689: function: no_context >199.900689: function:fixup_exception >199.900689: function: >search_exception_tables >199.900689: function: search_extable >199.900691: function: copy_user_handle_tail >199.900691: function: trace_do_page_fault >199.900691: page_fault_kernel:address=irq_stack_union >ip=copy_user_handle_tail error_code=0x0 >199.900691: function:__do_page_fault >199.900692: function: bad_area_nosemaphore >199.900692: function: __bad_area_nosemaphore >199.900692: function: no_context >199.900692: function:fixup_exception >199.900692: function: >search_exception_tables >199.900692: function: search_extable >199.900693: function: save_stack_trace >199.900693: function:dump_trace >199.900694: function: print_context_stack >199.900694: function: __kernel_text_address >199.900694: function: is_module_text_address >199.900695: function:__module_text_address >199.900695: function: __module_address >199.900695: function: __kernel_text_address >199.900695: function: is_module_text_address >199.900696: function:__module_text_address >199.900696: function: __module_address >... >199.900705: function: __kernel_text_address >199.900809: kernel_stack: >=> perf_callchain (810d35a2) >=> perf_prepare_sample (810cfae3) >=> __perf_event_overflow (810d02f4) >=> perf_swevent_overflow (810d04e3) >=> perf_swevent_event (810d0574) >=> perf_tp_event (810d070c) >=> perf_trace_x86_exceptions (810341b6) >=> trace_do_page_fault (81537702) >=> trace_page_fault (81534772) >199.900810: function: perf_output_begin >199.900810: function: __do_page_fault >199.900810: function:__perf_sw_event >199.900810: function: >perf_swevent_get_recursion_context >199.900811: function:down_read_trylock >199.900811: function:_cond_resched >199.900811: function:find_vma >199.900811: function:bad_area >199.900812: function: up_read >199.900812: function: __bad_area_nosemaphore >199.900812: function: is_prefetch >199.900812: function: convert_ip_to_linear >199.900813: function: unhandled_signal >199.900813: function: __printk_ratelimit >199.900813: function: _raw_spin_trylock >199.900813: function: _raw_spin_unlock_irqrestore >199.900814: function: printk >199.900814: function: vprintk_emit -- Sent from my mobile phone. Please pardon brevity and lack of
Re: perf_fuzzer compiled for x32 causes reboot
On Fri, 28 Feb 2014, Steven Rostedt wrote: > Interesting. Are you doing a perf function trace? > > And just in case, can you add this patch and make sure the copy is > called by NMI. 199.900682: function: trace_do_page_fault 199.900683: page_fault_user: address=__per_cpu_end ip=__per_cpu_end error_code=0x6 199.900683: function: perf_swevent_get_recursion_context 199.900684: function: perf_tp_event 199.900684: function:perf_swevent_event 199.900684: function: perf_swevent_overflow 199.900684: function: __perf_event_overflow 199.900685: function: perf_prepare_sample 199.900685: function:__perf_event_header__init_id 199.900685: function: task_tgid_nr_ns 199.900685: function: perf_event_tid 199.900686: function: __task_pid_nr_ns 199.900686: function:perf_callchain 199.900687: function: copy_from_user_nmi 199.900687: function: trace_do_page_fault 199.900687: page_fault_kernel:address=irq_stack_union ip=copy_user_generic_string error_code=0x0 199.900688: function:__do_page_fault 199.900688: function: bad_area_nosemaphore 199.900688: function: __bad_area_nosemaphore 199.900689: function: no_context 199.900689: function:fixup_exception 199.900689: function: search_exception_tables 199.900689: function: search_extable 199.900691: function: copy_user_handle_tail 199.900691: function: trace_do_page_fault 199.900691: page_fault_kernel:address=irq_stack_union ip=copy_user_handle_tail error_code=0x0 199.900691: function:__do_page_fault 199.900692: function: bad_area_nosemaphore 199.900692: function: __bad_area_nosemaphore 199.900692: function: no_context 199.900692: function:fixup_exception 199.900692: function: search_exception_tables 199.900692: function: search_extable 199.900693: function: save_stack_trace 199.900693: function:dump_trace 199.900694: function: print_context_stack 199.900694: function: __kernel_text_address 199.900694: function: is_module_text_address 199.900695: function:__module_text_address 199.900695: function: __module_address 199.900695: function: __kernel_text_address 199.900695: function: is_module_text_address 199.900696: function:__module_text_address 199.900696: function: __module_address ... 199.900705: function: __kernel_text_address 199.900809: kernel_stack: => perf_callchain (810d35a2) => perf_prepare_sample (810cfae3) => __perf_event_overflow (810d02f4) => perf_swevent_overflow (810d04e3) => perf_swevent_event (810d0574) => perf_tp_event (810d070c) => perf_trace_x86_exceptions (810341b6) => trace_do_page_fault (81537702) => trace_page_fault (81534772) 199.900810: function: perf_output_begin 199.900810: function: __do_page_fault 199.900810: function:__perf_sw_event 199.900810: function: perf_swevent_get_recursion_context 199.900811: function:down_read_trylock 199.900811: function:_cond_resched 199.900811: function:find_vma 199.900811: function:bad_area 199.900812: function: up_read 199.900812: function: __bad_area_nosemaphore 199.900812: function: is_prefetch 199.900812: function: convert_ip_to_linear 199.900813: function: unhandled_signal 199.900813: function: __printk_ratelimit 199.900813: function: _raw_spin_trylock 199.900813: function: _raw_spin_unlock_irqrestore 199.900814: function: printk 199.900814: function: vprintk_emit -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: perf_fuzzer compiled for x32 causes reboot
On Fri, 28 Feb 2014 09:15:33 -0500 (EST) Vince Weaver wrote: > On Thu, 27 Feb 2014, Steven Rostedt wrote: > > > On Thu, 27 Feb 2014 20:34:34 -0500 (EST) > > Vince Weaver wrote: > > > > > > > > I would actually suggest we do the equivalent on i386 as well. > > > > > > > > Vince, could you try this patch as an experiment? > > > > > > OK with your patch applied it does not segfault. > > > > > > > Vince, Great! Can you remove Peter's patch, and try this one. It > > removes the crud to save the cr2 from entry_64.S and makes both i386 > > and x86_64 do the same thing in regards to cr2 handling. > > no, with only this patch applied it segfaults as per previous: > > [ 126.396049] perf_fuzzer[2904]: segfault at 17a0 ip 004017fd sp > ffaff3f0 error 6 in perf_fuzzer[40+d1000] Interesting. Are you doing a perf function trace? And just in case, can you add this patch and make sure the copy is called by NMI. -- Steve diff --git a/arch/x86/lib/usercopy.c b/arch/x86/lib/usercopy.c index ddf9ecb..ca943cd 100644 --- a/arch/x86/lib/usercopy.c +++ b/arch/x86/lib/usercopy.c @@ -29,6 +29,7 @@ copy_from_user_nmi(void *to, const void __user *from, unsigned long n) */ pagefault_disable(); ret = __copy_from_user_inatomic(to, from, n); + trace_dump_stack(2) pagefault_enable(); return ret; -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: perf_fuzzer compiled for x32 causes reboot
On Thu, 27 Feb 2014, Steven Rostedt wrote: > On Thu, 27 Feb 2014 20:34:34 -0500 (EST) > Vince Weaver wrote: > > > > > I would actually suggest we do the equivalent on i386 as well. > > > > > > Vince, could you try this patch as an experiment? > > > > OK with your patch applied it does not segfault. > > > > Vince, Great! Can you remove Peter's patch, and try this one. It > removes the crud to save the cr2 from entry_64.S and makes both i386 > and x86_64 do the same thing in regards to cr2 handling. no, with only this patch applied it segfaults as per previous: [ 126.396049] perf_fuzzer[2904]: segfault at 17a0 ip 004017fd sp ffaff3f0 error 6 in perf_fuzzer[40+d1000] Vince -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: perf_fuzzer compiled for x32 causes reboot
On Fri, 28 Feb 2014 12:11:11 +0100 Peter Zijlstra wrote: > On Thu, Feb 27, 2014 at 09:57:26PM -0500, Steven Rostedt wrote: > > @@ -512,8 +508,21 @@ static inline void nmi_nesting_postprocess(void) > > dotraplinkage notrace __kprobes void > > do_nmi(struct pt_regs *regs, long error_code) > > { > > + unsigned long cr2; > > + > > nmi_nesting_preprocess(regs); > > > > + /* > > +* Save off the CR2 register. If we take a page fault in the NMI then > > +* it could corrupt the CR2 value. If the NMI preempts a page fault > > +* handler before it was able to read the CR2 register, and then the > > +* NMI itself takes a page fault, the page fault that was preempted > > +* will read the information from the NMI page fault and not the > > +* origin fault. Save it off and restore it if it changes. > > > +* Use the r12 callee-saved register. > > You might want to make that line go away :-) > Gah! I noticed that before sending and deleted the line, recreated the patch and sent that out. The one step I forgot to do was to save the file after deleting it :-p -- Steve -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: perf_fuzzer compiled for x32 causes reboot
On Thu, Feb 27, 2014 at 09:57:26PM -0500, Steven Rostedt wrote: > @@ -512,8 +508,21 @@ static inline void nmi_nesting_postprocess(void) > dotraplinkage notrace __kprobes void > do_nmi(struct pt_regs *regs, long error_code) > { > + unsigned long cr2; > + > nmi_nesting_preprocess(regs); > > + /* > + * Save off the CR2 register. If we take a page fault in the NMI then > + * it could corrupt the CR2 value. If the NMI preempts a page fault > + * handler before it was able to read the CR2 register, and then the > + * NMI itself takes a page fault, the page fault that was preempted > + * will read the information from the NMI page fault and not the > + * origin fault. Save it off and restore it if it changes. > + * Use the r12 callee-saved register. You might want to make that line go away :-) > + */ > + cr2 = read_cr2(); > + > nmi_enter(); > > inc_irq_stat(__nmi_count); > @@ -523,6 +532,10 @@ do_nmi(struct pt_regs *regs, long error_code) > > nmi_exit(); > > + /* Reads are cheaper than writes */ > + if (unlikely(cr2 != read_cr2())) > + write_cr2(cr2); > + > /* On i386, may loop back to preprocess */ > nmi_nesting_postprocess(); > } -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: perf_fuzzer compiled for x32 causes reboot
On Thu, Feb 27, 2014 at 05:31:50PM -0500, Steven Rostedt wrote: > Well, the perf ring buffer is vmalloced, right? That can cause a page > fault too. On x86 they're typically not -- although we have a debug CONFIG option to test that code on x86 too. On SPARC/ARM etc.. we have to use vmalloc_user() because of cache aliasing (yay for VIPT). -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: perf_fuzzer compiled for x32 causes reboot
On Thu, Feb 27, 2014 at 05:31:50PM -0500, Steven Rostedt wrote: Well, the perf ring buffer is vmalloced, right? That can cause a page fault too. On x86 they're typically not -- although we have a debug CONFIG option to test that code on x86 too. On SPARC/ARM etc.. we have to use vmalloc_user() because of cache aliasing (yay for VIPT). -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: perf_fuzzer compiled for x32 causes reboot
On Thu, Feb 27, 2014 at 09:57:26PM -0500, Steven Rostedt wrote: @@ -512,8 +508,21 @@ static inline void nmi_nesting_postprocess(void) dotraplinkage notrace __kprobes void do_nmi(struct pt_regs *regs, long error_code) { + unsigned long cr2; + nmi_nesting_preprocess(regs); + /* + * Save off the CR2 register. If we take a page fault in the NMI then + * it could corrupt the CR2 value. If the NMI preempts a page fault + * handler before it was able to read the CR2 register, and then the + * NMI itself takes a page fault, the page fault that was preempted + * will read the information from the NMI page fault and not the + * origin fault. Save it off and restore it if it changes. + * Use the r12 callee-saved register. You might want to make that line go away :-) + */ + cr2 = read_cr2(); + nmi_enter(); inc_irq_stat(__nmi_count); @@ -523,6 +532,10 @@ do_nmi(struct pt_regs *regs, long error_code) nmi_exit(); + /* Reads are cheaper than writes */ + if (unlikely(cr2 != read_cr2())) + write_cr2(cr2); + /* On i386, may loop back to preprocess */ nmi_nesting_postprocess(); } -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: perf_fuzzer compiled for x32 causes reboot
On Fri, 28 Feb 2014 12:11:11 +0100 Peter Zijlstra pet...@infradead.org wrote: On Thu, Feb 27, 2014 at 09:57:26PM -0500, Steven Rostedt wrote: @@ -512,8 +508,21 @@ static inline void nmi_nesting_postprocess(void) dotraplinkage notrace __kprobes void do_nmi(struct pt_regs *regs, long error_code) { + unsigned long cr2; + nmi_nesting_preprocess(regs); + /* +* Save off the CR2 register. If we take a page fault in the NMI then +* it could corrupt the CR2 value. If the NMI preempts a page fault +* handler before it was able to read the CR2 register, and then the +* NMI itself takes a page fault, the page fault that was preempted +* will read the information from the NMI page fault and not the +* origin fault. Save it off and restore it if it changes. +* Use the r12 callee-saved register. You might want to make that line go away :-) Gah! I noticed that before sending and deleted the line, recreated the patch and sent that out. The one step I forgot to do was to save the file after deleting it :-p -- Steve -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: perf_fuzzer compiled for x32 causes reboot
On Thu, 27 Feb 2014, Steven Rostedt wrote: On Thu, 27 Feb 2014 20:34:34 -0500 (EST) Vince Weaver vincent.wea...@maine.edu wrote: I would actually suggest we do the equivalent on i386 as well. Vince, could you try this patch as an experiment? OK with your patch applied it does not segfault. Vince, Great! Can you remove Peter's patch, and try this one. It removes the crud to save the cr2 from entry_64.S and makes both i386 and x86_64 do the same thing in regards to cr2 handling. no, with only this patch applied it segfaults as per previous: [ 126.396049] perf_fuzzer[2904]: segfault at 17a0 ip 004017fd sp ffaff3f0 error 6 in perf_fuzzer[40+d1000] Vince -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: perf_fuzzer compiled for x32 causes reboot
On Fri, 28 Feb 2014 09:15:33 -0500 (EST) Vince Weaver vincent.wea...@maine.edu wrote: On Thu, 27 Feb 2014, Steven Rostedt wrote: On Thu, 27 Feb 2014 20:34:34 -0500 (EST) Vince Weaver vincent.wea...@maine.edu wrote: I would actually suggest we do the equivalent on i386 as well. Vince, could you try this patch as an experiment? OK with your patch applied it does not segfault. Vince, Great! Can you remove Peter's patch, and try this one. It removes the crud to save the cr2 from entry_64.S and makes both i386 and x86_64 do the same thing in regards to cr2 handling. no, with only this patch applied it segfaults as per previous: [ 126.396049] perf_fuzzer[2904]: segfault at 17a0 ip 004017fd sp ffaff3f0 error 6 in perf_fuzzer[40+d1000] Interesting. Are you doing a perf function trace? And just in case, can you add this patch and make sure the copy is called by NMI. -- Steve diff --git a/arch/x86/lib/usercopy.c b/arch/x86/lib/usercopy.c index ddf9ecb..ca943cd 100644 --- a/arch/x86/lib/usercopy.c +++ b/arch/x86/lib/usercopy.c @@ -29,6 +29,7 @@ copy_from_user_nmi(void *to, const void __user *from, unsigned long n) */ pagefault_disable(); ret = __copy_from_user_inatomic(to, from, n); + trace_dump_stack(2) pagefault_enable(); return ret; -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: perf_fuzzer compiled for x32 causes reboot
On Fri, 28 Feb 2014, Steven Rostedt wrote: Interesting. Are you doing a perf function trace? And just in case, can you add this patch and make sure the copy is called by NMI. 199.900682: function: trace_do_page_fault 199.900683: page_fault_user: address=__per_cpu_end ip=__per_cpu_end error_code=0x6 199.900683: function: perf_swevent_get_recursion_context 199.900684: function: perf_tp_event 199.900684: function:perf_swevent_event 199.900684: function: perf_swevent_overflow 199.900684: function: __perf_event_overflow 199.900685: function: perf_prepare_sample 199.900685: function:__perf_event_header__init_id 199.900685: function: task_tgid_nr_ns 199.900685: function: perf_event_tid 199.900686: function: __task_pid_nr_ns 199.900686: function:perf_callchain 199.900687: function: copy_from_user_nmi 199.900687: function: trace_do_page_fault 199.900687: page_fault_kernel:address=irq_stack_union ip=copy_user_generic_string error_code=0x0 199.900688: function:__do_page_fault 199.900688: function: bad_area_nosemaphore 199.900688: function: __bad_area_nosemaphore 199.900689: function: no_context 199.900689: function:fixup_exception 199.900689: function: search_exception_tables 199.900689: function: search_extable 199.900691: function: copy_user_handle_tail 199.900691: function: trace_do_page_fault 199.900691: page_fault_kernel:address=irq_stack_union ip=copy_user_handle_tail error_code=0x0 199.900691: function:__do_page_fault 199.900692: function: bad_area_nosemaphore 199.900692: function: __bad_area_nosemaphore 199.900692: function: no_context 199.900692: function:fixup_exception 199.900692: function: search_exception_tables 199.900692: function: search_extable 199.900693: function: save_stack_trace 199.900693: function:dump_trace 199.900694: function: print_context_stack 199.900694: function: __kernel_text_address 199.900694: function: is_module_text_address 199.900695: function:__module_text_address 199.900695: function: __module_address 199.900695: function: __kernel_text_address 199.900695: function: is_module_text_address 199.900696: function:__module_text_address 199.900696: function: __module_address ... 199.900705: function: __kernel_text_address 199.900809: kernel_stack: stack trace = perf_callchain (810d35a2) = perf_prepare_sample (810cfae3) = __perf_event_overflow (810d02f4) = perf_swevent_overflow (810d04e3) = perf_swevent_event (810d0574) = perf_tp_event (810d070c) = perf_trace_x86_exceptions (810341b6) = trace_do_page_fault (81537702) = trace_page_fault (81534772) 199.900810: function: perf_output_begin 199.900810: function: __do_page_fault 199.900810: function:__perf_sw_event 199.900810: function: perf_swevent_get_recursion_context 199.900811: function:down_read_trylock 199.900811: function:_cond_resched 199.900811: function:find_vma 199.900811: function:bad_area 199.900812: function: up_read 199.900812: function: __bad_area_nosemaphore 199.900812: function: is_prefetch 199.900812: function: convert_ip_to_linear 199.900813: function: unhandled_signal 199.900813: function: __printk_ratelimit 199.900813: function: _raw_spin_trylock 199.900813: function: _raw_spin_unlock_irqrestore 199.900814: function: printk 199.900814: function: vprintk_emit -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: perf_fuzzer compiled for x32 causes reboot
If I'm reading this right we end up going from the page fault tracepoint to copy_from_user_nmi() without going through NMI, and the cr2 corruption is obvious. I guess the assumption that only the NMI path needed to save cr2 is flawed? On February 28, 2014 7:07:29 AM PST, Vince Weaver vincent.wea...@maine.edu wrote: On Fri, 28 Feb 2014, Steven Rostedt wrote: Interesting. Are you doing a perf function trace? And just in case, can you add this patch and make sure the copy is called by NMI. 199.900682: function: trace_do_page_fault 199.900683: page_fault_user: address=__per_cpu_end ip=__per_cpu_end error_code=0x6 199.900683: function: perf_swevent_get_recursion_context 199.900684: function: perf_tp_event 199.900684: function:perf_swevent_event 199.900684: function: perf_swevent_overflow 199.900684: function: __perf_event_overflow 199.900685: function: perf_prepare_sample 199.900685: function: __perf_event_header__init_id 199.900685: function: task_tgid_nr_ns 199.900685: function: perf_event_tid 199.900686: function: __task_pid_nr_ns 199.900686: function:perf_callchain 199.900687: function: copy_from_user_nmi 199.900687: function: trace_do_page_fault 199.900687: page_fault_kernel:address=irq_stack_union ip=copy_user_generic_string error_code=0x0 199.900688: function:__do_page_fault 199.900688: function: bad_area_nosemaphore 199.900688: function: __bad_area_nosemaphore 199.900689: function: no_context 199.900689: function:fixup_exception 199.900689: function: search_exception_tables 199.900689: function: search_extable 199.900691: function: copy_user_handle_tail 199.900691: function: trace_do_page_fault 199.900691: page_fault_kernel:address=irq_stack_union ip=copy_user_handle_tail error_code=0x0 199.900691: function:__do_page_fault 199.900692: function: bad_area_nosemaphore 199.900692: function: __bad_area_nosemaphore 199.900692: function: no_context 199.900692: function:fixup_exception 199.900692: function: search_exception_tables 199.900692: function: search_extable 199.900693: function: save_stack_trace 199.900693: function:dump_trace 199.900694: function: print_context_stack 199.900694: function: __kernel_text_address 199.900694: function: is_module_text_address 199.900695: function:__module_text_address 199.900695: function: __module_address 199.900695: function: __kernel_text_address 199.900695: function: is_module_text_address 199.900696: function:__module_text_address 199.900696: function: __module_address ... 199.900705: function: __kernel_text_address 199.900809: kernel_stack: stack trace = perf_callchain (810d35a2) = perf_prepare_sample (810cfae3) = __perf_event_overflow (810d02f4) = perf_swevent_overflow (810d04e3) = perf_swevent_event (810d0574) = perf_tp_event (810d070c) = perf_trace_x86_exceptions (810341b6) = trace_do_page_fault (81537702) = trace_page_fault (81534772) 199.900810: function: perf_output_begin 199.900810: function: __do_page_fault 199.900810: function:__perf_sw_event 199.900810: function: perf_swevent_get_recursion_context 199.900811: function:down_read_trylock 199.900811: function:_cond_resched 199.900811: function:find_vma 199.900811: function:bad_area 199.900812: function: up_read 199.900812: function: __bad_area_nosemaphore 199.900812: function: is_prefetch 199.900812: function: convert_ip_to_linear 199.900813: function: unhandled_signal 199.900813: function: __printk_ratelimit 199.900813: function: _raw_spin_trylock 199.900813: function: _raw_spin_unlock_irqrestore 199.900814: function: printk 199.900814: function: vprintk_emit -- Sent from my mobile phone. Please pardon brevity and lack of formatting. -- To unsubscribe from this list: send the line
Re: perf_fuzzer compiled for x32 causes reboot
On Fri, 28 Feb 2014 10:07:29 -0500 (EST) Vince Weaver vincent.wea...@maine.edu wrote: On Fri, 28 Feb 2014, Steven Rostedt wrote: 199.900696: function: __module_address ... 199.900705: function: __kernel_text_address 199.900809: kernel_stack: stack trace = perf_callchain (810d35a2) = perf_prepare_sample (810cfae3) = __perf_event_overflow (810d02f4) = perf_swevent_overflow (810d04e3) = perf_swevent_event (810d0574) = perf_tp_event (810d070c) = perf_trace_x86_exceptions (810341b6) = trace_do_page_fault (81537702) = trace_page_fault (81534772) Thank you!!! You just found the bug :-) The bug was caused by: commit 25c74b10bacead867478480170083f69cfc0db48 x86, trace: Register exception handler to trace IDT With this code: dotraplinkage void __kprobes trace_do_page_fault(struct pt_regs *regs, unsigned long error_code) { enum ctx_state prev_state; prev_state = exception_enter(); trace_page_fault_entries(regs, error_code); __do_page_fault(regs, error_code); exception_exit(prev_state); } The trace_page_fault_entries() which is called before the cr2 is saved can fault by perf doing a userspace stack trace. But the cr2 is not restored when calling __do_page_fault() and that gets the wrong cr2. Below is a patch that should fix this. Please remove all other patches and try this out. Thanks, -- Steve 199.900810: function: perf_output_begin 199.900810: function: __do_page_fault 199.900810: function:__perf_sw_event 199.900810: function: perf_swevent_get_recursion_context 199.900811: function:down_read_trylock 199.900811: function:_cond_resched 199.900811: function:find_vma 199.900811: function:bad_area 199.900812: function: up_read 199.900812: function: __bad_area_nosemaphore 199.900812: function: is_prefetch 199.900812: function: convert_ip_to_linear 199.900813: function: unhandled_signal 199.900813: function: __printk_ratelimit 199.900813: function: _raw_spin_trylock 199.900813: function: _raw_spin_unlock_irqrestore 199.900814: function: printk 199.900814: function: vprintk_emit diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c index 6dea040..66b636d 100644 --- a/arch/x86/mm/fault.c +++ b/arch/x86/mm/fault.c @@ -1271,9 +1271,15 @@ dotraplinkage void __kprobes trace_do_page_fault(struct pt_regs *regs, unsigned long error_code) { enum ctx_state prev_state; + unsigned long cr2; prev_state = exception_enter(); + /* The trace might fault, save the cr2 register */ + cr2 = read_cr2(); trace_page_fault_entries(regs, error_code); + /* Put back the original cr2 if needed */ + if (cr2 != read_cr2()) + write_cr2(cr2); __do_page_fault(regs, error_code); exception_exit(prev_state); } -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: perf_fuzzer compiled for x32 causes reboot
On Fri, 28 Feb 2014 10:20:00 -0500 Steven Rostedt rost...@goodmis.org wrote: Below is a patch that should fix this. Please remove all other patches and try this out. Updated patch, as Peter Zijlstra on IRC asked me if the exception_enter() can be traced. And looking at it, it sure can be. -- Steve diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c index 6dea040..f606b67 100644 --- a/arch/x86/mm/fault.c +++ b/arch/x86/mm/fault.c @@ -1271,9 +1271,15 @@ dotraplinkage void __kprobes trace_do_page_fault(struct pt_regs *regs, unsigned long error_code) { enum ctx_state prev_state; + unsigned long cr2; + /* The trace might fault, save the cr2 register */ + cr2 = read_cr2(); prev_state = exception_enter(); trace_page_fault_entries(regs, error_code); + /* Put back the original cr2 if needed */ + if (cr2 != read_cr2()) + write_cr2(cr2); __do_page_fault(regs, error_code); exception_exit(prev_state); } -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: perf_fuzzer compiled for x32 causes reboot
On Fri, Feb 28, 2014 at 07:13:06AM -0800, H. Peter Anvin wrote: If I'm reading this right we end up going from the page fault tracepoint to copy_from_user_nmi() without going through NMI, and the cr2 corruption is obvious. I guess the assumption that only the NMI path needed to save cr2 is flawed? It was never assumed it would only go through NMI, but that it would be NMI safe -- and as it turns out, it is that. What I did assume was that any other callsites would be safe, seeing how they'd already be running in 'normal' contexts. I had not considered people putting tracepoints _that_ early in the exception paths. Note that there's more tracepoints there than the one mentioned. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: perf_fuzzer compiled for x32 causes reboot
On 02/28/2014 07:40 AM, Peter Zijlstra wrote: On Fri, Feb 28, 2014 at 07:13:06AM -0800, H. Peter Anvin wrote: If I'm reading this right we end up going from the page fault tracepoint to copy_from_user_nmi() without going through NMI, and the cr2 corruption is obvious. I guess the assumption that only the NMI path needed to save cr2 is flawed? It was never assumed it would only go through NMI, but that it would be NMI safe -- and as it turns out, it is that. What I did assume was that any other callsites would be safe, seeing how they'd already be running in 'normal' contexts. I had not considered people putting tracepoints _that_ early in the exception paths. Note that there's more tracepoints there than the one mentioned. Well, I was talking about the assumption spelled out in the comment above copy_from_user_nmi() which pretty much states cr2 is safe because cr2 is saved/restored in the NMI wrappers. -hpa -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: perf_fuzzer compiled for x32 causes reboot
On Fri, 28 Feb 2014 08:15:11 -0800 H. Peter Anvin h...@zytor.com wrote: Well, I was talking about the assumption spelled out in the comment above copy_from_user_nmi() which pretty much states cr2 is safe because cr2 is saved/restored in the NMI wrappers. Yeah, it seems that the name copy_from_user_nmi() is a misnomer. As it can be called outside of nmi context. Perhaps we should have a copy_from_user_trace() that does the save and restore of the cr2. As that's the only place that faults, it may be the best answer. Arnaldo, Can you test this patch and see if it fixes the bug for you too. Vince already said it fixes it for him. I'm attaching it below (it's from H. Peter). Peter Z., as both Jiri's and my patch fixed the callers of the problem area, and as we have been discussing, there may be more problem areas, I'm thinking the best solution is to just use H. Peter's patch instead. And then we should rename it to copy_from_user_trace(). -- Steve diff --git a/arch/x86/lib/usercopy.c b/arch/x86/lib/usercopy.c index ddf9ecb..938e45c 100644 --- a/arch/x86/lib/usercopy.c +++ b/arch/x86/lib/usercopy.c @@ -10,6 +10,8 @@ #include asm/word-at-a-time.h #include linux/sched.h +#include asm/processor.h + /* * We rely on the nested NMI work to allow atomic faults from the NMI path; the * nested NMI paths are careful to preserve CR2. @@ -18,6 +20,7 @@ unsigned long copy_from_user_nmi(void *to, const void __user *from, unsigned long n) { unsigned long ret; + unsigned long cr2; if (__range_not_ok(from, n, TASK_SIZE)) return 0; @@ -27,9 +30,11 @@ copy_from_user_nmi(void *to, const void __user *from, unsigned long n) * disable pagefaults so that its behaviour is consistent even when * called form other contexts. */ + cr2 = read_cr2(); pagefault_disable(); ret = __copy_from_user_inatomic(to, from, n); pagefault_enable(); + write_cr2(cr2); return ret; } -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: perf_fuzzer compiled for x32 causes reboot
On Thu, Feb 27, 2014 at 08:00:04PM -0500, Vince Weaver wrote: On Thu, 27 Feb 2014, H. Peter Anvin wrote: On 02/27/2014 03:30 PM, Steven Rostedt wrote: On Thu, 27 Feb 2014 14:52:54 -0800 H. Peter Anvin h...@zytor.com wrote: On 02/27/2014 02:31 PM, Steven Rostedt wrote: Yeah, something is getting mesed up. What it *looks* like to me is that we try to nest the cr2 save/restore, which doesn't nest because it is a percpu variable. ... except in the x86-64 case, we *ALSO* save/restore cr2 inside entry_64.S, which makes the stuff in do_nmi completely redundant and there for no good reason. Peter, look at the code. That percpu cr2 is in a #ifdef CONFIG_X86_32 section. That is, it isn't even executed. That's i386 code. The only place the cr2 is saved for x86_64 is in entry_64.S. Right, egg on my face. However, I still think it would make more sense for it to nest the way entry_64.S does if at all possible. That makes this even more confusing, though. I would still like to see what happens with the patch I sent Vince. I'll try your patch momentarily, first I had some other changes I started running before I left work (for some reason it recompiled the whole kernel). 8: function: perf_output_begin 8: bprint: perf_output_begin: VMW: event type 2 config 2a st: 2c3e 8: bputs:perf_output_begin: VMW: before rcu_dereference 9: function: __do_page_fault 9: function:down_read_trylock 9: function:_cond_resched 9: function:find_vma so it looks like the fault happens rcu_read_lock(); 116 /* 117 * For inherited events we send all the output towards the parent. 118 */ 119 if (event-parent) 120 event = event-parent; 121 somewhere between here 122 rb = rcu_dereference(event-rb); 123 if (unlikely(!rb)) 124 goto out; and here 125 126 if (unlikely(!rb-nr_pages)) 127 goto out; although if rcu locks do anything to turn off tracing then this could be suspect. The most likely suspect is of course event-rb in the rcu_dereference. I have to defer to Steven on how rcu_read_lock() and rcu_read_unlock() currently interact with tracing. ;-) Thanx, Paul -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: perf_fuzzer compiled for x32 causes reboot
Now we need to figure out if the reboot problem and the segfault problem are actually the same... I have a nasty feeling they might be different problems. On February 28, 2014 7:07:29 AM PST, Vince Weaver vincent.wea...@maine.edu wrote: On Fri, 28 Feb 2014, Steven Rostedt wrote: Interesting. Are you doing a perf function trace? And just in case, can you add this patch and make sure the copy is called by NMI. 199.900682: function: trace_do_page_fault 199.900683: page_fault_user: address=__per_cpu_end ip=__per_cpu_end error_code=0x6 199.900683: function: perf_swevent_get_recursion_context 199.900684: function: perf_tp_event 199.900684: function:perf_swevent_event 199.900684: function: perf_swevent_overflow 199.900684: function: __perf_event_overflow 199.900685: function: perf_prepare_sample 199.900685: function: __perf_event_header__init_id 199.900685: function: task_tgid_nr_ns 199.900685: function: perf_event_tid 199.900686: function: __task_pid_nr_ns 199.900686: function:perf_callchain 199.900687: function: copy_from_user_nmi 199.900687: function: trace_do_page_fault 199.900687: page_fault_kernel:address=irq_stack_union ip=copy_user_generic_string error_code=0x0 199.900688: function:__do_page_fault 199.900688: function: bad_area_nosemaphore 199.900688: function: __bad_area_nosemaphore 199.900689: function: no_context 199.900689: function:fixup_exception 199.900689: function: search_exception_tables 199.900689: function: search_extable 199.900691: function: copy_user_handle_tail 199.900691: function: trace_do_page_fault 199.900691: page_fault_kernel:address=irq_stack_union ip=copy_user_handle_tail error_code=0x0 199.900691: function:__do_page_fault 199.900692: function: bad_area_nosemaphore 199.900692: function: __bad_area_nosemaphore 199.900692: function: no_context 199.900692: function:fixup_exception 199.900692: function: search_exception_tables 199.900692: function: search_extable 199.900693: function: save_stack_trace 199.900693: function:dump_trace 199.900694: function: print_context_stack 199.900694: function: __kernel_text_address 199.900694: function: is_module_text_address 199.900695: function:__module_text_address 199.900695: function: __module_address 199.900695: function: __kernel_text_address 199.900695: function: is_module_text_address 199.900696: function:__module_text_address 199.900696: function: __module_address ... 199.900705: function: __kernel_text_address 199.900809: kernel_stack: stack trace = perf_callchain (810d35a2) = perf_prepare_sample (810cfae3) = __perf_event_overflow (810d02f4) = perf_swevent_overflow (810d04e3) = perf_swevent_event (810d0574) = perf_tp_event (810d070c) = perf_trace_x86_exceptions (810341b6) = trace_do_page_fault (81537702) = trace_page_fault (81534772) 199.900810: function: perf_output_begin 199.900810: function: __do_page_fault 199.900810: function:__perf_sw_event 199.900810: function: perf_swevent_get_recursion_context 199.900811: function:down_read_trylock 199.900811: function:_cond_resched 199.900811: function:find_vma 199.900811: function:bad_area 199.900812: function: up_read 199.900812: function: __bad_area_nosemaphore 199.900812: function: is_prefetch 199.900812: function: convert_ip_to_linear 199.900813: function: unhandled_signal 199.900813: function: __printk_ratelimit 199.900813: function: _raw_spin_trylock 199.900813: function: _raw_spin_unlock_irqrestore 199.900814: function: printk 199.900814: function: vprintk_emit -- Sent from my mobile phone. Please pardon brevity and lack of formatting. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More
Re: perf_fuzzer compiled for x32 causes reboot
On Fri, 28 Feb 2014 12:38:52 -0800 H. Peter Anvin h...@zytor.com wrote: Now we need to figure out if the reboot problem and the segfault problem are actually the same... I have a nasty feeling they might be different problems. I wonder if there was any recursion problem. Although, I believe perf has recursion checks. But perhaps something recursed before the checks? -- Steve -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: perf_fuzzer compiled for x32 causes reboot
On Fri, 28 Feb 2014 12:34:05 -0800 Paul E. McKenney paul...@linux.vnet.ibm.com wrote: On Thu, Feb 27, 2014 at 08:00:04PM -0500, Vince Weaver wrote: On Thu, 27 Feb 2014, H. Peter Anvin wrote: On 02/27/2014 03:30 PM, Steven Rostedt wrote: On Thu, 27 Feb 2014 14:52:54 -0800 H. Peter Anvin h...@zytor.com wrote: On 02/27/2014 02:31 PM, Steven Rostedt wrote: Yeah, something is getting mesed up. What it *looks* like to me is that we try to nest the cr2 save/restore, which doesn't nest because it is a percpu variable. ... except in the x86-64 case, we *ALSO* save/restore cr2 inside entry_64.S, which makes the stuff in do_nmi completely redundant and there for no good reason. Peter, look at the code. That percpu cr2 is in a #ifdef CONFIG_X86_32 section. That is, it isn't even executed. That's i386 code. The only place the cr2 is saved for x86_64 is in entry_64.S. Right, egg on my face. However, I still think it would make more sense for it to nest the way entry_64.S does if at all possible. That makes this even more confusing, though. I would still like to see what happens with the patch I sent Vince. I'll try your patch momentarily, first I had some other changes I started running before I left work (for some reason it recompiled the whole kernel). 8: function: perf_output_begin 8: bprint: perf_output_begin: VMW: event type 2 config 2a st: 2c3e 8: bputs:perf_output_begin: VMW: before rcu_dereference 9: function: __do_page_fault 9: function:down_read_trylock 9: function:_cond_resched 9: function:find_vma so it looks like the fault happens rcu_read_lock(); 116 /* 117 * For inherited events we send all the output towards the parent. 118 */ 119 if (event-parent) 120 event = event-parent; 121 somewhere between here 122 rb = rcu_dereference(event-rb); 123 if (unlikely(!rb)) 124 goto out; and here 125 126 if (unlikely(!rb-nr_pages)) 127 goto out; although if rcu locks do anything to turn off tracing then this could be suspect. The most likely suspect is of course event-rb in the rcu_dereference. I have to defer to Steven on how rcu_read_lock() and rcu_read_unlock() currently interact with tracing. ;-) These are all perf related. You'll need to defer to Peter Zijlstra ;-) -- Steve -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: perf_fuzzer compiled for x32 causes reboot
On Fri, Feb 28, 2014 at 03:47:16PM -0500, Steven Rostedt wrote: I'll try your patch momentarily, first I had some other changes I started running before I left work (for some reason it recompiled the whole kernel). 8: function: perf_output_begin 8: bprint: perf_output_begin: VMW: event type 2 config 2a st: 2c3e 8: bputs:perf_output_begin: VMW: before rcu_dereference 9: function: __do_page_fault 9: function:down_read_trylock 9: function:_cond_resched 9: function:find_vma so it looks like the fault happens rcu_read_lock(); 116 /* 117 * For inherited events we send all the output towards the parent. 118 */ 119 if (event-parent) 120 event = event-parent; 121 somewhere between here 122 rb = rcu_dereference(event-rb); 123 if (unlikely(!rb)) 124 goto out; and here 125 126 if (unlikely(!rb-nr_pages)) 127 goto out; although if rcu locks do anything to turn off tracing then this could be suspect. The most likely suspect is of course event-rb in the rcu_dereference. I have to defer to Steven on how rcu_read_lock() and rcu_read_unlock() currently interact with tracing. ;-) These are all perf related. You'll need to defer to Peter Zijlstra ;-) I'm lost.. :/ pretty much all perf objects are RCU freed. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: perf_fuzzer compiled for x32 causes reboot
On Fri, Feb 28, 2014 at 08:15:11AM -0800, H. Peter Anvin wrote: On 02/28/2014 07:40 AM, Peter Zijlstra wrote: On Fri, Feb 28, 2014 at 07:13:06AM -0800, H. Peter Anvin wrote: If I'm reading this right we end up going from the page fault tracepoint to copy_from_user_nmi() without going through NMI, and the cr2 corruption is obvious. I guess the assumption that only the NMI path needed to save cr2 is flawed? It was never assumed it would only go through NMI, but that it would be NMI safe -- and as it turns out, it is that. What I did assume was that any other callsites would be safe, seeing how they'd already be running in 'normal' contexts. I had not considered people putting tracepoints _that_ early in the exception paths. Note that there's more tracepoints there than the one mentioned. Well, I was talking about the assumption spelled out in the comment above copy_from_user_nmi() which pretty much states cr2 is safe because cr2 is saved/restored in the NMI wrappers. That is because we assumed NMI was the only thing that could interrupt a fault handler before it would read CR2. Never thinking someone would put a tracepoint there. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: perf_fuzzer compiled for x32 causes reboot
On Fri, Feb 28, 2014 at 11:29:46AM -0500, Steven Rostedt wrote: On Fri, 28 Feb 2014 08:15:11 -0800 H. Peter Anvin h...@zytor.com wrote: Well, I was talking about the assumption spelled out in the comment above copy_from_user_nmi() which pretty much states cr2 is safe because cr2 is saved/restored in the NMI wrappers. Yeah, it seems that the name copy_from_user_nmi() is a misnomer. As it can be called outside of nmi context. Perhaps we should have a copy_from_user_trace() that does the save and restore of the cr2. As that's the only place that faults, it may be the best answer. Arnaldo, Can you test this patch and see if it fixes the bug for you too. Vince already said it fixes it for him. I'm attaching it below (it's from H. Peter). Peter Z., as both Jiri's and my patch fixed the callers of the problem area, and as we have been discussing, there may be more problem areas, I'm thinking the best solution is to just use H. Peter's patch instead. And then we should rename it to copy_from_user_trace(). Like already said; _trace is an absolutely abysmal name. Also you _really_ don't want an unconditional CR2 write in there, that's just stupidly expensive. Also, this function is called a _LOT_ under certain workloads, I don't know how cheap a CR2 read is, but it had better be really cheap. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: perf_fuzzer compiled for x32 causes reboot
On Fri, 28 Feb 2014 21:56:38 +0100 Peter Zijlstra pet...@infradead.org wrote: Like already said; _trace is an absolutely abysmal name. Also you _really_ don't want an unconditional CR2 write in there, that's just stupidly expensive. But a read isn't. Which is why we only do a write if the copy caused a page fault (which is already expensive). The proposed patch will have: if (cr2 != read_cr2()) write_cr2(cr2); Also, this function is called a _LOT_ under certain workloads, I don't know how cheap a CR2 read is, but it had better be really cheap. That's a HPA question. -- Steve -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: perf_fuzzer compiled for x32 causes reboot
On Fri, 28 Feb 2014, H. Peter Anvin wrote: Now we need to figure out if the reboot problem and the segfault problem are actually the same... I have a nasty feeling they might be different problems. I'm currently running a script that tries setting EBP to all possible 32-bit pages and running the test to see if that triggers anything. If I look at my notes the original reboot crash might have happened when I had the fuzzer also generating overflow signals (my current tests do not) so I'm not sure if having all this mess triggered from inside a signal handler could make it reboot somehow. I was away from the computer this afternoon and of course I have scores of e-mails on this topic now with lots of competing patches. Is there one in particular I'm supposed to be testing? Thanks, Vince -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: perf_fuzzer compiled for x32 causes reboot
On Fri, Feb 28, 2014 at 09:54:09PM +0100, Peter Zijlstra wrote: On Fri, Feb 28, 2014 at 03:47:16PM -0500, Steven Rostedt wrote: I'll try your patch momentarily, first I had some other changes I started running before I left work (for some reason it recompiled the whole kernel). 8: function: perf_output_begin 8: bprint: perf_output_begin: VMW: event type 2 config 2a st: 2c3e 8: bputs:perf_output_begin: VMW: before rcu_dereference 9: function: __do_page_fault 9: function:down_read_trylock 9: function:_cond_resched 9: function:find_vma so it looks like the fault happens rcu_read_lock(); 116 /* 117 * For inherited events we send all the output towards the parent. 118 */ 119 if (event-parent) 120 event = event-parent; 121 somewhere between here 122 rb = rcu_dereference(event-rb); 123 if (unlikely(!rb)) 124 goto out; and here 125 126 if (unlikely(!rb-nr_pages)) 127 goto out; although if rcu locks do anything to turn off tracing then this could be suspect. The most likely suspect is of course event-rb in the rcu_dereference. I have to defer to Steven on how rcu_read_lock() and rcu_read_unlock() currently interact with tracing. ;-) These are all perf related. You'll need to defer to Peter Zijlstra ;-) I'm lost.. :/ pretty much all perf objects are RCU freed. This code isn't running in idle context is it? If so, RCU will happily free out from under it. CONFIG_PROVE_RCU should detect this sort of thing, though. Thanx, Paul -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: perf_fuzzer compiled for x32 causes reboot
On Fri, Feb 28, 2014 at 01:17:33PM -0800, Paul E. McKenney wrote: This code isn't running in idle context is it? If so, RCU will happily free out from under it. CONFIG_PROVE_RCU should detect this sort of thing, though. Well, interrupts/NMIs can happen when idle, but the interrupt/NMI entry code deals with the idle state AFAIK. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: perf_fuzzer compiled for x32 causes reboot
On Fri, 28 Feb 2014 16:18:23 -0500 (EST) Vince Weaver vincent.wea...@maine.edu wrote: I was away from the computer this afternoon and of course I have scores of e-mails on this topic now with lots of competing patches. Is there one in particular I'm supposed to be testing? I was poking fun at you on IRC for this exact reason: rostedt poor Vince, I keep sending him new patches. No, don't test this patch, now test this one. Oh wait, try this one instead * peterz sees Vince thinking: stop... sending.. me.. damn... patches... already... !!!11! rostedt or at least, Let me finish this test before I cancel it again for another damn patch rostedt then he's probably doing I'm not going to run any tests now, until I wait a while to see if there's a new patch to test :-) Anyway, I would say wait a bit while we sort this out. At least we have a strong idea of what the bug is. Now we need to agree on the solution. Vince, you have been really terrific in helping us figure out what was wrong. I don't want to waste your time until we can all agree on what the proper fix is. When we do, it would be great if you can test it. But for now, have a great weekend and thanks for all your hard work in narrowing down this issue. -- Steve -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: perf_fuzzer compiled for x32 causes reboot
On Fri, Feb 28, 2014 at 10:27:00PM +0100, Peter Zijlstra wrote: On Fri, Feb 28, 2014 at 01:17:33PM -0800, Paul E. McKenney wrote: This code isn't running in idle context is it? If so, RCU will happily free out from under it. CONFIG_PROVE_RCU should detect this sort of thing, though. Well, interrupts/NMIs can happen when idle, but the interrupt/NMI entry code deals with the idle state AFAIK. Yep, rcu_irq_enter() and rcu_nmi_enter() deal with that. More worried about this being code invoked from some energy-efficiency driver or another within the idle loop. Thanx, Paul -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: perf_fuzzer compiled for x32 causes reboot
On Fri, Feb 28, 2014 at 01:51:50PM -0800, Paul E. McKenney wrote: On Fri, Feb 28, 2014 at 10:27:00PM +0100, Peter Zijlstra wrote: On Fri, Feb 28, 2014 at 01:17:33PM -0800, Paul E. McKenney wrote: This code isn't running in idle context is it? If so, RCU will happily free out from under it. CONFIG_PROVE_RCU should detect this sort of thing, though. Well, interrupts/NMIs can happen when idle, but the interrupt/NMI entry code deals with the idle state AFAIK. Yep, rcu_irq_enter() and rcu_nmi_enter() deal with that. More worried about this being code invoked from some energy-efficiency driver or another within the idle loop. Right, so any tracepoint can end up there; but I thought there was already the rule that tracepoints needed RCU enabled. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: perf_fuzzer compiled for x32 causes reboot
On Fri, 28 Feb 2014 22:55:11 +0100 Peter Zijlstra pet...@infradead.org wrote: On Fri, Feb 28, 2014 at 01:51:50PM -0800, Paul E. McKenney wrote: On Fri, Feb 28, 2014 at 10:27:00PM +0100, Peter Zijlstra wrote: On Fri, Feb 28, 2014 at 01:17:33PM -0800, Paul E. McKenney wrote: This code isn't running in idle context is it? If so, RCU will happily free out from under it. CONFIG_PROVE_RCU should detect this sort of thing, though. Well, interrupts/NMIs can happen when idle, but the interrupt/NMI entry code deals with the idle state AFAIK. Yep, rcu_irq_enter() and rcu_nmi_enter() deal with that. More worried about this being code invoked from some energy-efficiency driver or another within the idle loop. Right, so any tracepoint can end up there; but I thought there was already the rule that tracepoints needed RCU enabled. There is and we have special tracepoint caller for those cases we want a tracepoint out of RCU scope. These will reactivate rcu in the tracepoint code. trace_tp_name_rcuidle(...) -- Steve -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: perf_fuzzer compiled for x32 causes reboot
On Fri, Feb 28, 2014 at 05:05:53PM -0500, Steven Rostedt wrote: On Fri, 28 Feb 2014 22:55:11 +0100 Peter Zijlstra pet...@infradead.org wrote: On Fri, Feb 28, 2014 at 01:51:50PM -0800, Paul E. McKenney wrote: On Fri, Feb 28, 2014 at 10:27:00PM +0100, Peter Zijlstra wrote: On Fri, Feb 28, 2014 at 01:17:33PM -0800, Paul E. McKenney wrote: This code isn't running in idle context is it? If so, RCU will happily free out from under it. CONFIG_PROVE_RCU should detect this sort of thing, though. Well, interrupts/NMIs can happen when idle, but the interrupt/NMI entry code deals with the idle state AFAIK. Yep, rcu_irq_enter() and rcu_nmi_enter() deal with that. More worried about this being code invoked from some energy-efficiency driver or another within the idle loop. Right, so any tracepoint can end up there; but I thought there was already the rule that tracepoints needed RCU enabled. There is and we have special tracepoint caller for those cases we want a tracepoint out of RCU scope. These will reactivate rcu in the tracepoint code. trace_tp_name_rcuidle(...) OK, I finally looked at the emails leading up to this in the thread... I believe that I am doing premature debugging. Thanx, Paul -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: perf_fuzzer compiled for x32 causes reboot
On Fri, 28 Feb 2014, Steven Rostedt wrote: On Fri, 28 Feb 2014 16:18:23 -0500 (EST) Vince Weaver vincent.wea...@maine.edu wrote: I was away from the computer this afternoon and of course I have scores of e-mails on this topic now with lots of competing patches. Is there one in particular I'm supposed to be testing? I was poking fun at you on IRC for this exact reason: rostedt poor Vince, I keep sending him new patches. No, don't test this patch, now test this one. Oh wait, try this one instead * peterz sees Vince thinking: stop... sending.. me.. damn... patches... already... !!!11! rostedt or at least, Let me finish this test before I cancel it again for another damn patch rostedt then he's probably doing I'm not going to run any tests now, until I wait a while to see if there's a new patch to test Well while it might appear that I spend all of my days finding perf_event bugs, I actually am a college professor so I do occasionally have to run off to teach a class, meet with students, or write papers/grants for other academics to reject. It's nice others can reproduce the issue now, it would have saved me a lot of trouble, although now in theory I have a much better handle of how to use/abuse ftrace so I guess it was worth it. Once the fix gets into git I'm sure the relentless perf_fuzzer will let us know if there are any other issues left. I do look forward to the day when I can leave it running overnight and have a clean syslog the next morning. Vince -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: perf_fuzzer compiled for x32 causes reboot
On 02/28/2014 03:34 PM, Vince Weaver wrote: Well while it might appear that I spend all of my days finding perf_event bugs, I actually am a college professor so I do occasionally have to run off to teach a class, meet with students, or write papers/grants for other academics to reject. We really appreciate your help. This has been really critical. It's nice others can reproduce the issue now, it would have saved me a lot of trouble, although now in theory I have a much better handle of how to use/abuse ftrace so I guess it was worth it. Once the fix gets into git I'm sure the relentless perf_fuzzer will let us know if there are any other issues left. I do look forward to the day when I can leave it running overnight and have a clean syslog the next morning. We all do, definitely, and your help has been a huge step in that direction. -hpa -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: perf_fuzzer compiled for x32 causes reboot
On Fri, 28 Feb 2014 18:34:00 -0500 (EST) Vince Weaver vincent.wea...@maine.edu wrote: I was poking fun at you on IRC for this exact reason: rostedt poor Vince, I keep sending him new patches. No, don't test this patch, now test this one. Oh wait, try this one instead * peterz sees Vince thinking: stop... sending.. me.. damn... patches... already... !!!11! rostedt or at least, Let me finish this test before I cancel it again for another damn patch rostedt then he's probably doing I'm not going to run any tests now, until I wait a while to see if there's a new patch to test I hope you were not offended by this. It was actually a testament to the work you've given us. Well while it might appear that I spend all of my days finding perf_event bugs, I actually am a college professor so I do occasionally have to run off to teach a class, meet with students, or write papers/grants for other academics to reject. But perf_event bug finder is a much more prestigious title than college professor ;-) It's nice others can reproduce the issue now, it would have saved me a lot of trouble, although now in theory I have a much better handle of how to use/abuse ftrace so I guess it was worth it. I always enjoy finding bugs in other people's code. That's usually the best way to learn what their code does. Once the fix gets into git I'm sure the relentless perf_fuzzer will let us know if there are any other issues left. I do look forward to the day when I can leave it running overnight and have a clean syslog the next morning. BTW, is the perf_fuzzer code posted somewhere? It sounds like it can be really useful for us to do our own testing too. Thanks, -- Steve -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: perf_fuzzer compiled for x32 causes reboot
On Thu, 27 Feb 2014 20:34:34 -0500 (EST) Vince Weaver wrote: > > I would actually suggest we do the equivalent on i386 as well. > > > > Vince, could you try this patch as an experiment? > > OK with your patch applied it does not segfault. > Vince, Great! Can you remove Peter's patch, and try this one. It removes the crud to save the cr2 from entry_64.S and makes both i386 and x86_64 do the same thing in regards to cr2 handling. -- Steve diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S index 1e96c36..937cb8d 100644 --- a/arch/x86/kernel/entry_64.S +++ b/arch/x86/kernel/entry_64.S @@ -1854,29 +1854,11 @@ end_repeat_nmi: call save_paranoid DEFAULT_FRAME 0 - /* -* Save off the CR2 register. If we take a page fault in the NMI then -* it could corrupt the CR2 value. If the NMI preempts a page fault -* handler before it was able to read the CR2 register, and then the -* NMI itself takes a page fault, the page fault that was preempted -* will read the information from the NMI page fault and not the -* origin fault. Save it off and restore it if it changes. -* Use the r12 callee-saved register. -*/ - movq %cr2, %r12 - /* paranoidentry do_nmi, 0; without TRACE_IRQS_OFF */ movq %rsp,%rdi movq $-1,%rsi call do_nmi - /* Did the NMI take a page fault? Restore cr2 if it did */ - movq %cr2, %rcx - cmpq %rcx, %r12 - je 1f - movq %r12, %cr2 -1: - testl %ebx,%ebx /* swapgs needed? */ jnz nmi_restore nmi_swapgs: diff --git a/arch/x86/kernel/nmi.c b/arch/x86/kernel/nmi.c index 6fcb49c..f1a6294 100644 --- a/arch/x86/kernel/nmi.c +++ b/arch/x86/kernel/nmi.c @@ -443,7 +443,6 @@ enum nmi_states { NMI_LATCHED, }; static DEFINE_PER_CPU(enum nmi_states, nmi_state); -static DEFINE_PER_CPU(unsigned long, nmi_cr2); #define nmi_nesting_preprocess(regs) \ do {\ @@ -452,14 +451,11 @@ static DEFINE_PER_CPU(unsigned long, nmi_cr2); return; \ } \ this_cpu_write(nmi_state, NMI_EXECUTING); \ - this_cpu_write(nmi_cr2, read_cr2());\ } while (0);\ nmi_restart: #define nmi_nesting_postprocess() \ do {\ - if (unlikely(this_cpu_read(nmi_cr2) != read_cr2())) \ - write_cr2(this_cpu_read(nmi_cr2)); \ if (this_cpu_dec_return(nmi_state)) \ goto nmi_restart; \ } while (0) @@ -512,8 +508,21 @@ static inline void nmi_nesting_postprocess(void) dotraplinkage notrace __kprobes void do_nmi(struct pt_regs *regs, long error_code) { + unsigned long cr2; + nmi_nesting_preprocess(regs); + /* +* Save off the CR2 register. If we take a page fault in the NMI then +* it could corrupt the CR2 value. If the NMI preempts a page fault +* handler before it was able to read the CR2 register, and then the +* NMI itself takes a page fault, the page fault that was preempted +* will read the information from the NMI page fault and not the +* origin fault. Save it off and restore it if it changes. +* Use the r12 callee-saved register. +*/ + cr2 = read_cr2(); + nmi_enter(); inc_irq_stat(__nmi_count); @@ -523,6 +532,10 @@ do_nmi(struct pt_regs *regs, long error_code) nmi_exit(); + /* Reads are cheaper than writes */ + if (unlikely(cr2 != read_cr2())) + write_cr2(cr2); + /* On i386, may loop back to preprocess */ nmi_nesting_postprocess(); } -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: perf_fuzzer compiled for x32 causes reboot
Ok... I think we're definitely talking about a cr2 leak. The reboot might be a race condition in the NMI nesting handling maybe? On February 27, 2014 5:34:34 PM PST, Vince Weaver wrote: >On Thu, 27 Feb 2014, H. Peter Anvin wrote: > >> On 02/27/2014 02:31 PM, Steven Rostedt wrote: >> > >> > Yeah, something is getting mesed up. >> > >> >> What it *looks* like to me is that we try to nest the cr2 >save/restore, >> which doesn't nest because it is a percpu variable. >> >> ... except in the x86-64 case, we *ALSO* save/restore cr2 inside >> entry_64.S, which makes the stuff in do_nmi completely redundant and >> there for no good reason. >> >> I would actually suggest we do the equivalent on i386 as well. >> >> Vince, could you try this patch as an experiment? > >OK with your patch applied it does not segfault. > >The trace looks like you'd expect too: > >perf_fuzzer-2910 [000] 255.355331: page_fault_kernel: >address=irq_stack_union ip=copy_user_handle_tail error_code=0x0 >perf_fuzzer-2910 [000] 255.355331: function: >__do_page_fault >perf_fuzzer-2910 [000] 255.355331: function: >bad_area_nosemaphore >perf_fuzzer-2910 [000] 255.355331: function: >__bad_area_nosemaphore >perf_fuzzer-2910 [000] 255.355332: function: >no_context >perf_fuzzer-2910 [000] 255.355332: function: > fixup_exception >perf_fuzzer-2910 [000] 255.355332: function: > search_exception_tables >perf_fuzzer-2910 [000] 255.355332: function: > search_extable >perf_fuzzer-2910 [000] 255.355333: function: >perf_output_begin >perf_fuzzer-2910 [000] 255.355333: bprint: >perf_output_begin: VMW: event type 2 config 2a st: 2c3e7 >perf_fuzzer-2910 [000] 255.355333: bprint: >perf_output_begin: VMW: before event->parent (nil) >perf_fuzzer-2910 [000] 255.355334: bprint: >perf_output_begin: VMW: before rcu_dereference (nil) >perf_fuzzer-2910 [000] 255.355334: function: >__do_page_fault >perf_fuzzer-2910 [000] 255.355334: function: >down_read_trylock >perf_fuzzer-2910 [000] 255.355334: function: >_cond_resched >perf_fuzzer-2910 [000] 255.355335: function:find_vma >perf_fuzzer-2910 [000] 255.355335: function: >handle_mm_fault >perf_fuzzer-2910 [000] 255.355335: function: >__do_fault >perf_fuzzer-2910 [000] 255.355335: bputs: >perf_mmap_fault: VMW: perf_mmap_fault >perf_fuzzer-2910 [000] 255.355336: bprint: >perf_mmap_fault: VMW: perf_mmap_fault 0x8800cba6a980 >perf_fuzzer-2910 [000] 255.355336: function: >perf_mmap_to_page >perf_fuzzer-2910 [000] 255.355336: function: >_cond_resched >perf_fuzzer-2910 [000] 255.355336: function: >unlock_page >perf_fuzzer-2910 [000] 255.355336: function: >page_waitqueue >perf_fuzzer-2910 [000] 255.355337: function: >__wake_up_bit >perf_fuzzer-2910 [000] 255.355337: bputs: >perf_mmap_fault: VMW: perf_mmap_fault >perf_fuzzer-2910 [000] 255.355337: function: >_cond_resched >perf_fuzzer-2910 [000] 255.355337: function: >_raw_spin_lock >perf_fuzzer-2910 [000] 255.355337: function: >page_add_file_rmap >perf_fuzzer-2910 [000] 255.355338: function: >__inc_zone_page_state >perf_fuzzer-2910 [000] 255.355338: function: > __inc_zone_state >perf_fuzzer-2910 [000] 255.355338: function: >set_page_dirty >perf_fuzzer-2910 [000] 255.355338: function: >page_mapping >perf_fuzzer-2910 [000] 255.355338: function: >anon_set_page_dirty >perf_fuzzer-2910 [000] 255.355339: function: >unlock_page > > >Vince -- Sent from my mobile phone. Please pardon brevity and lack of formatting. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: perf_fuzzer compiled for x32 causes reboot
On Thu, 27 Feb 2014, H. Peter Anvin wrote: > On 02/27/2014 02:31 PM, Steven Rostedt wrote: > > > > Yeah, something is getting mesed up. > > > > What it *looks* like to me is that we try to nest the cr2 save/restore, > which doesn't nest because it is a percpu variable. > > ... except in the x86-64 case, we *ALSO* save/restore cr2 inside > entry_64.S, which makes the stuff in do_nmi completely redundant and > there for no good reason. > > I would actually suggest we do the equivalent on i386 as well. > > Vince, could you try this patch as an experiment? OK with your patch applied it does not segfault. The trace looks like you'd expect too: perf_fuzzer-2910 [000] 255.355331: page_fault_kernel: address=irq_stack_union ip=copy_user_handle_tail error_code=0x0 perf_fuzzer-2910 [000] 255.355331: function: __do_page_fault perf_fuzzer-2910 [000] 255.355331: function: bad_area_nosemaphore perf_fuzzer-2910 [000] 255.355331: function: __bad_area_nosemaphore perf_fuzzer-2910 [000] 255.355332: function: no_context perf_fuzzer-2910 [000] 255.355332: function: fixup_exception perf_fuzzer-2910 [000] 255.355332: function: search_exception_tables perf_fuzzer-2910 [000] 255.355332: function: search_extable perf_fuzzer-2910 [000] 255.355333: function: perf_output_begin perf_fuzzer-2910 [000] 255.355333: bprint: perf_output_begin: VMW: event type 2 config 2a st: 2c3e7 perf_fuzzer-2910 [000] 255.355333: bprint: perf_output_begin: VMW: before event->parent (nil) perf_fuzzer-2910 [000] 255.355334: bprint: perf_output_begin: VMW: before rcu_dereference (nil) perf_fuzzer-2910 [000] 255.355334: function: __do_page_fault perf_fuzzer-2910 [000] 255.355334: function: down_read_trylock perf_fuzzer-2910 [000] 255.355334: function: _cond_resched perf_fuzzer-2910 [000] 255.355335: function:find_vma perf_fuzzer-2910 [000] 255.355335: function: handle_mm_fault perf_fuzzer-2910 [000] 255.355335: function: __do_fault perf_fuzzer-2910 [000] 255.355335: bputs: perf_mmap_fault: VMW: perf_mmap_fault perf_fuzzer-2910 [000] 255.355336: bprint: perf_mmap_fault: VMW: perf_mmap_fault 0x8800cba6a980 perf_fuzzer-2910 [000] 255.355336: function: perf_mmap_to_page perf_fuzzer-2910 [000] 255.355336: function: _cond_resched perf_fuzzer-2910 [000] 255.355336: function: unlock_page perf_fuzzer-2910 [000] 255.355336: function: page_waitqueue perf_fuzzer-2910 [000] 255.355337: function: __wake_up_bit perf_fuzzer-2910 [000] 255.355337: bputs: perf_mmap_fault: VMW: perf_mmap_fault perf_fuzzer-2910 [000] 255.355337: function: _cond_resched perf_fuzzer-2910 [000] 255.355337: function: _raw_spin_lock perf_fuzzer-2910 [000] 255.355337: function: page_add_file_rmap perf_fuzzer-2910 [000] 255.355338: function: __inc_zone_page_state perf_fuzzer-2910 [000] 255.355338: function: __inc_zone_state perf_fuzzer-2910 [000] 255.355338: function: set_page_dirty perf_fuzzer-2910 [000] 255.355338: function: page_mapping perf_fuzzer-2910 [000] 255.355338: function: anon_set_page_dirty perf_fuzzer-2910 [000] 255.355339: function: unlock_page Vince -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: perf_fuzzer compiled for x32 causes reboot
On Thu, 27 Feb 2014, H. Peter Anvin wrote: > On 02/27/2014 03:30 PM, Steven Rostedt wrote: > > On Thu, 27 Feb 2014 14:52:54 -0800 > > "H. Peter Anvin" wrote: > > > >> On 02/27/2014 02:31 PM, Steven Rostedt wrote: > >>> > >>> Yeah, something is getting mesed up. > >>> > >> > >> What it *looks* like to me is that we try to nest the cr2 save/restore, > >> which doesn't nest because it is a percpu variable. > >> > >> ... except in the x86-64 case, we *ALSO* save/restore cr2 inside > >> entry_64.S, which makes the stuff in do_nmi completely redundant and > >> there for no good reason. > > > > Peter, look at the code. That percpu cr2 is in a #ifdef CONFIG_X86_32 > > section. That is, it isn't even executed. That's i386 code. The only > > place the cr2 is saved for x86_64 is in entry_64.S. > > > > Right, egg on my face. However, I still think it would make more sense > for it to nest the way entry_64.S does if at all possible. > > That makes this even more confusing, though. I would still like to see > what happens with the patch I sent Vince. I'll try your patch momentarily, first I had some other changes I started running before I left work (for some reason it recompiled the whole kernel). 8: function: perf_output_begin 8: bprint: perf_output_begin: VMW: event type 2 config 2a st: 2c3e 8: bputs:perf_output_begin: VMW: before rcu_dereference 9: function: __do_page_fault 9: function:down_read_trylock 9: function:_cond_resched 9: function:find_vma so it looks like the fault happens rcu_read_lock(); 116 /* 117 * For inherited events we send all the output towards the parent. 118 */ 119 if (event->parent) 120 event = event->parent; 121 somewhere between here 122 rb = rcu_dereference(event->rb); 123 if (unlikely(!rb)) 124 goto out; and here 125 126 if (unlikely(!rb->nr_pages)) 127 goto out; although if rcu locks do anything to turn off tracing then this could be suspect. Vince -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: perf_fuzzer compiled for x32 causes reboot
On 02/27/2014 03:30 PM, Steven Rostedt wrote: > On Thu, 27 Feb 2014 14:52:54 -0800 > "H. Peter Anvin" wrote: > >> On 02/27/2014 02:31 PM, Steven Rostedt wrote: >>> >>> Yeah, something is getting mesed up. >>> >> >> What it *looks* like to me is that we try to nest the cr2 save/restore, >> which doesn't nest because it is a percpu variable. >> >> ... except in the x86-64 case, we *ALSO* save/restore cr2 inside >> entry_64.S, which makes the stuff in do_nmi completely redundant and >> there for no good reason. > > Peter, look at the code. That percpu cr2 is in a #ifdef CONFIG_X86_32 > section. That is, it isn't even executed. That's i386 code. The only > place the cr2 is saved for x86_64 is in entry_64.S. > Right, egg on my face. However, I still think it would make more sense for it to nest the way entry_64.S does if at all possible. That makes this even more confusing, though. I would still like to see what happens with the patch I sent Vince. -hpa -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: perf_fuzzer compiled for x32 causes reboot
On Thu, 27 Feb 2014 14:52:54 -0800 "H. Peter Anvin" wrote: > On 02/27/2014 02:31 PM, Steven Rostedt wrote: > > > > Yeah, something is getting mesed up. > > > > What it *looks* like to me is that we try to nest the cr2 save/restore, > which doesn't nest because it is a percpu variable. > > ... except in the x86-64 case, we *ALSO* save/restore cr2 inside > entry_64.S, which makes the stuff in do_nmi completely redundant and > there for no good reason. Peter, look at the code. That percpu cr2 is in a #ifdef CONFIG_X86_32 section. That is, it isn't even executed. That's i386 code. The only place the cr2 is saved for x86_64 is in entry_64.S. -- Steve > > I would actually suggest we do the equivalent on i386 as well. > > Vince, could you try this patch as an experiment? > > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: perf_fuzzer compiled for x32 causes reboot
On 02/27/2014 02:31 PM, Steven Rostedt wrote: > > Yeah, something is getting mesed up. > What it *looks* like to me is that we try to nest the cr2 save/restore, which doesn't nest because it is a percpu variable. ... except in the x86-64 case, we *ALSO* save/restore cr2 inside entry_64.S, which makes the stuff in do_nmi completely redundant and there for no good reason. I would actually suggest we do the equivalent on i386 as well. Vince, could you try this patch as an experiment? diff --git a/arch/x86/lib/usercopy.c b/arch/x86/lib/usercopy.c index ddf9ecb..938e45c 100644 --- a/arch/x86/lib/usercopy.c +++ b/arch/x86/lib/usercopy.c @@ -10,6 +10,8 @@ #include #include +#include + /* * We rely on the nested NMI work to allow atomic faults from the NMI path; the * nested NMI paths are careful to preserve CR2. @@ -18,6 +20,7 @@ unsigned long copy_from_user_nmi(void *to, const void __user *from, unsigned long n) { unsigned long ret; + unsigned long cr2; if (__range_not_ok(from, n, TASK_SIZE)) return 0; @@ -27,9 +30,11 @@ copy_from_user_nmi(void *to, const void __user *from, unsigned long n) * disable pagefaults so that its behaviour is consistent even when * called form other contexts. */ + cr2 = read_cr2(); pagefault_disable(); ret = __copy_from_user_inatomic(to, from, n); pagefault_enable(); + write_cr2(cr2); return ret; }
Re: perf_fuzzer compiled for x32 causes reboot
On Thu, 27 Feb 2014 17:06:36 -0500 (EST) Vince Weaver wrote: > > I spent some more time on this. > I managed to get a trace that exhibited the bug practically right > away, but still unable to generate a reproducible trace :( > > So instead I'm adding WARN's and trace_printks to see what I can find out. > > Here's a summary of what I think is happneing. Please let me know if I'm > wildly wrong in analyzing this. > > > > Userspace accesses the perf > ring-buffer-user-page of an event for the first time (in this trace > it's a SW_CPU_CLOCK event). > > This triggers a page_fault to bring in the page. > > This triggers a TRACEPOINT event (task/task_newtask) which > has > PERF_SAMPLE_IP|PERF_SAMPLE_TID|PERF_SAMPLE_TIME|PERF_SAMPLE_CALLCHAIN|PERF_SAMPLE_ID|PERF_SAMPLE_CPU|PERF_SAMPLE_PERIOD|PERF_SAMPLE_STREAM_ID|PERF_SAMPLE_WEIGHT|PERF_SAMPLE_DATA_SRC|PERF_SAMPLE_TRANSACTION; > and exclude_callchain_kernel=1, triggering a perf_callchain(). > > Here is a dump of the stack right after we enter perf_callchain() > > vince@core2:~$ [ 202.320444] [ cut here ] > [ 202.324001] WARNING: CPU: 1 PID: 2873 at kernel/events/callchain.c:168 > perf_callchain+0x67/0x211() > hermal_sys ehci_pci ehci_hcd sg sd_mod usbcore usb_common > [ 202.324001] CPU: 1 PID: 2873 Comm: perf_fuzzer Not tainted 3.14.0-rc3+ #24 > [ 202.324001] Hardware name: AOpen DE7000/nMCP7ALPx-DE R1.06 Oct.19.2012, > BIOS 080015 10/19/2012 > [ 202.324001] 00a8 880119ec1b48 81531dc1 > 00a8 > [ 202.324001] 880119ec1b88 81040ce4 > 880119ec1b88 > [ 202.324001] 810d416f 880119ec1e38 8800cba67800 > > [ 202.324001] Call Trace: > [ 202.324001] [] dump_stack+0x49/0x60 > [ 202.324001] [] warn_slowpath_common+0x81/0x9b > [ 202.324001] [] ? perf_callchain+0x67/0x211 > [ 202.324001] [] warn_slowpath_null+0x1a/0x1c > [ 202.324001] [] perf_callchain+0x67/0x211 > [ 202.324001] [] ? local_clock+0x1b/0x24 > [ 202.324001] [] perf_prepare_sample+0x7b/0x304 > [ 202.324001] [] __perf_event_overflow+0x156/0x1c1 > [ 202.324001] [] ? free_pgtables+0xa7/0xc9 > [ 202.324001] [] perf_swevent_overflow+0x41/0x5b > [ 202.324001] [] perf_swevent_event+0x72/0x74 > [ 202.324001] [] perf_tp_event+0xea/0x1ef > [ 202.324001] [] ? perf_trace_x86_exceptions+0x4c/0xba > [ 202.324001] [] perf_trace_x86_exceptions+0xa8/0xba > [ 202.324001] [] ? perf_trace_x86_exceptions+0x4c/0xba > [ 202.324001] [] trace_do_page_fault+0x48/0x99 > [ 202.324001] [] trace_page_fault+0x22/0x30 > [ 202.324001] ---[ end trace 31ccd31b4e82cb42 ]--- > > This triggers a kernel pagefault because the BP register is not valid and > copy_from_user_nmi() tries to copy the user stack area from there. > > This *should* be OK? And it maybe looks like it works, but then > > copy_user_handle_tail() > > causes another page fault again at the invaid BP register. > > But then the code continues to > perf_output_begin() > > > 237.265689: page_fault_kernel:address=0x17a0 ip=copy_user_handle_tail > error_code=0x0 > 237.265689: function:__do_page_fault > 237.265689: function: bad_area_nosemaphore > 237.265690: function: __bad_area_nosemaphore > 237.265690: function: no_context > 237.265690: function:fixup_exception > 237.265690: function: search_exception_tables > 237.265690: function: search_extable > 237.265691: function: perf_output_begin You can also allow perf to be traced. Remove the CFLAGS_REMOVE_core.o line from kernel/events/Makefile. > 237.265692: bprint: perf_output_begin: VMW: event type 2 config > 2a st: 2c3e7 > > how are we back in __do_page_fault again here? Well, the perf ring buffer is vmalloced, right? That can cause a page fault too. > > 237.265692: function: __do_page_fault Try adding a trace_printk() to see all the page faults? > 237.265692: function:down_read_trylock > 237.265692: function:_cond_resched > 237.265693: function:find_vma > 237.265693: function:bad_area > 237.265693: function: up_read > 237.265693: function: __bad_area_nosemaphore > 237.265694: function: is_prefetch > 237.265694: function: convert_ip_to_linear > 237.265695: function: unhandled_signal > 237.265695: function: __printk_ratelimit > 237.265695: function: _raw_spin_trylock > 237.265695: function: _raw_spin_unlock_irqrestore > 237.265696: function: printk > 237.265696: function: vprintk_emit > > [ 202.877004] perf_fuzzer[2873]: segfault at 17a0 ip
Re: perf_fuzzer compiled for x32 causes reboot
I spent some more time on this. I managed to get a trace that exhibited the bug practically right away, but still unable to generate a reproducible trace :( So instead I'm adding WARN's and trace_printks to see what I can find out. Here's a summary of what I think is happneing. Please let me know if I'm wildly wrong in analyzing this. Userspace accesses the perf ring-buffer-user-page of an event for the first time (in this trace it's a SW_CPU_CLOCK event). This triggers a page_fault to bring in the page. This triggers a TRACEPOINT event (task/task_newtask) which has PERF_SAMPLE_IP|PERF_SAMPLE_TID|PERF_SAMPLE_TIME|PERF_SAMPLE_CALLCHAIN|PERF_SAMPLE_ID|PERF_SAMPLE_CPU|PERF_SAMPLE_PERIOD|PERF_SAMPLE_STREAM_ID|PERF_SAMPLE_WEIGHT|PERF_SAMPLE_DATA_SRC|PERF_SAMPLE_TRANSACTION; and exclude_callchain_kernel=1, triggering a perf_callchain(). Here is a dump of the stack right after we enter perf_callchain() vince@core2:~$ [ 202.320444] [ cut here ] [ 202.324001] WARNING: CPU: 1 PID: 2873 at kernel/events/callchain.c:168 perf_callchain+0x67/0x211() hermal_sys ehci_pci ehci_hcd sg sd_mod usbcore usb_common [ 202.324001] CPU: 1 PID: 2873 Comm: perf_fuzzer Not tainted 3.14.0-rc3+ #24 [ 202.324001] Hardware name: AOpen DE7000/nMCP7ALPx-DE R1.06 Oct.19.2012, BIOS 080015 10/19/2012 [ 202.324001] 00a8 880119ec1b48 81531dc1 00a8 [ 202.324001] 880119ec1b88 81040ce4 880119ec1b88 [ 202.324001] 810d416f 880119ec1e38 8800cba67800 [ 202.324001] Call Trace: [ 202.324001] [] dump_stack+0x49/0x60 [ 202.324001] [] warn_slowpath_common+0x81/0x9b [ 202.324001] [] ? perf_callchain+0x67/0x211 [ 202.324001] [] warn_slowpath_null+0x1a/0x1c [ 202.324001] [] perf_callchain+0x67/0x211 [ 202.324001] [] ? local_clock+0x1b/0x24 [ 202.324001] [] perf_prepare_sample+0x7b/0x304 [ 202.324001] [] __perf_event_overflow+0x156/0x1c1 [ 202.324001] [] ? free_pgtables+0xa7/0xc9 [ 202.324001] [] perf_swevent_overflow+0x41/0x5b [ 202.324001] [] perf_swevent_event+0x72/0x74 [ 202.324001] [] perf_tp_event+0xea/0x1ef [ 202.324001] [] ? perf_trace_x86_exceptions+0x4c/0xba [ 202.324001] [] perf_trace_x86_exceptions+0xa8/0xba [ 202.324001] [] ? perf_trace_x86_exceptions+0x4c/0xba [ 202.324001] [] trace_do_page_fault+0x48/0x99 [ 202.324001] [] trace_page_fault+0x22/0x30 [ 202.324001] ---[ end trace 31ccd31b4e82cb42 ]--- This triggers a kernel pagefault because the BP register is not valid and copy_from_user_nmi() tries to copy the user stack area from there. This *should* be OK? And it maybe looks like it works, but then copy_user_handle_tail() causes another page fault again at the invaid BP register. But then the code continues to perf_output_begin() 237.265689: page_fault_kernel:address=0x17a0 ip=copy_user_handle_tail error_code=0x0 237.265689: function:__do_page_fault 237.265689: function: bad_area_nosemaphore 237.265690: function: __bad_area_nosemaphore 237.265690: function: no_context 237.265690: function:fixup_exception 237.265690: function: search_exception_tables 237.265690: function: search_extable 237.265691: function: perf_output_begin 237.265692: bprint: perf_output_begin: VMW: event type 2 config 2a st: 2c3e7 how are we back in __do_page_fault again here? 237.265692: function: __do_page_fault 237.265692: function:down_read_trylock 237.265692: function:_cond_resched 237.265693: function:find_vma 237.265693: function:bad_area 237.265693: function: up_read 237.265693: function: __bad_area_nosemaphore 237.265694: function: is_prefetch 237.265694: function: convert_ip_to_linear 237.265695: function: unhandled_signal 237.265695: function: __printk_ratelimit 237.265695: function: _raw_spin_trylock 237.265695: function: _raw_spin_unlock_irqrestore 237.265696: function: printk 237.265696: function: vprintk_emit [ 202.877004] perf_fuzzer[2873]: segfault at 17a0 ip 004017fd sp ffd19d10 error 6 in perf_fuzzer[40+d1000] Vince -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: perf_fuzzer compiled for x32 causes reboot
I spent some more time on this. I managed to get a trace that exhibited the bug practically right away, but still unable to generate a reproducible trace :( So instead I'm adding WARN's and trace_printks to see what I can find out. Here's a summary of what I think is happneing. Please let me know if I'm wildly wrong in analyzing this. Userspace accesses the perf ring-buffer-user-page of an event for the first time (in this trace it's a SW_CPU_CLOCK event). This triggers a page_fault to bring in the page. This triggers a TRACEPOINT event (task/task_newtask) which has PERF_SAMPLE_IP|PERF_SAMPLE_TID|PERF_SAMPLE_TIME|PERF_SAMPLE_CALLCHAIN|PERF_SAMPLE_ID|PERF_SAMPLE_CPU|PERF_SAMPLE_PERIOD|PERF_SAMPLE_STREAM_ID|PERF_SAMPLE_WEIGHT|PERF_SAMPLE_DATA_SRC|PERF_SAMPLE_TRANSACTION; and exclude_callchain_kernel=1, triggering a perf_callchain(). Here is a dump of the stack right after we enter perf_callchain() vince@core2:~$ [ 202.320444] [ cut here ] [ 202.324001] WARNING: CPU: 1 PID: 2873 at kernel/events/callchain.c:168 perf_callchain+0x67/0x211() hermal_sys ehci_pci ehci_hcd sg sd_mod usbcore usb_common [ 202.324001] CPU: 1 PID: 2873 Comm: perf_fuzzer Not tainted 3.14.0-rc3+ #24 [ 202.324001] Hardware name: AOpen DE7000/nMCP7ALPx-DE R1.06 Oct.19.2012, BIOS 080015 10/19/2012 [ 202.324001] 00a8 880119ec1b48 81531dc1 00a8 [ 202.324001] 880119ec1b88 81040ce4 880119ec1b88 [ 202.324001] 810d416f 880119ec1e38 8800cba67800 [ 202.324001] Call Trace: [ 202.324001] [81531dc1] dump_stack+0x49/0x60 [ 202.324001] [81040ce4] warn_slowpath_common+0x81/0x9b [ 202.324001] [810d416f] ? perf_callchain+0x67/0x211 [ 202.324001] [81040d18] warn_slowpath_null+0x1a/0x1c [ 202.324001] [810d416f] perf_callchain+0x67/0x211 [ 202.324001] [8106b0ab] ? local_clock+0x1b/0x24 [ 202.324001] [810d0872] perf_prepare_sample+0x7b/0x304 [ 202.324001] [810d1079] __perf_event_overflow+0x156/0x1c1 [ 202.324001] [810f4613] ? free_pgtables+0xa7/0xc9 [ 202.324001] [810d125e] perf_swevent_overflow+0x41/0x5b [ 202.324001] [810d12ea] perf_swevent_event+0x72/0x74 [ 202.324001] [810d1478] perf_tp_event+0xea/0x1ef [ 202.324001] [8103515a] ? perf_trace_x86_exceptions+0x4c/0xba [ 202.324001] [810351b6] perf_trace_x86_exceptions+0xa8/0xba [ 202.324001] [8103515a] ? perf_trace_x86_exceptions+0x4c/0xba [ 202.324001] [81538439] trace_do_page_fault+0x48/0x99 [ 202.324001] [815354b2] trace_page_fault+0x22/0x30 [ 202.324001] ---[ end trace 31ccd31b4e82cb42 ]--- This triggers a kernel pagefault because the BP register is not valid and copy_from_user_nmi() tries to copy the user stack area from there. This *should* be OK? And it maybe looks like it works, but then copy_user_handle_tail() causes another page fault again at the invaid BP register. But then the code continues to perf_output_begin() 237.265689: page_fault_kernel:address=0x17a0 ip=copy_user_handle_tail error_code=0x0 237.265689: function:__do_page_fault 237.265689: function: bad_area_nosemaphore 237.265690: function: __bad_area_nosemaphore 237.265690: function: no_context 237.265690: function:fixup_exception 237.265690: function: search_exception_tables 237.265690: function: search_extable 237.265691: function: perf_output_begin 237.265692: bprint: perf_output_begin: VMW: event type 2 config 2a st: 2c3e7 how are we back in __do_page_fault again here? 237.265692: function: __do_page_fault 237.265692: function:down_read_trylock 237.265692: function:_cond_resched 237.265693: function:find_vma 237.265693: function:bad_area 237.265693: function: up_read 237.265693: function: __bad_area_nosemaphore 237.265694: function: is_prefetch 237.265694: function: convert_ip_to_linear 237.265695: function: unhandled_signal 237.265695: function: __printk_ratelimit 237.265695: function: _raw_spin_trylock 237.265695: function: _raw_spin_unlock_irqrestore 237.265696: function: printk 237.265696: function: vprintk_emit [ 202.877004] perf_fuzzer[2873]: segfault at 17a0 ip 004017fd sp ffd19d10 error 6 in perf_fuzzer[40+d1000] Vince -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at
Re: perf_fuzzer compiled for x32 causes reboot
On Thu, 27 Feb 2014 17:06:36 -0500 (EST) Vince Weaver vincent.wea...@maine.edu wrote: I spent some more time on this. I managed to get a trace that exhibited the bug practically right away, but still unable to generate a reproducible trace :( So instead I'm adding WARN's and trace_printks to see what I can find out. Here's a summary of what I think is happneing. Please let me know if I'm wildly wrong in analyzing this. Userspace accesses the perf ring-buffer-user-page of an event for the first time (in this trace it's a SW_CPU_CLOCK event). This triggers a page_fault to bring in the page. This triggers a TRACEPOINT event (task/task_newtask) which has PERF_SAMPLE_IP|PERF_SAMPLE_TID|PERF_SAMPLE_TIME|PERF_SAMPLE_CALLCHAIN|PERF_SAMPLE_ID|PERF_SAMPLE_CPU|PERF_SAMPLE_PERIOD|PERF_SAMPLE_STREAM_ID|PERF_SAMPLE_WEIGHT|PERF_SAMPLE_DATA_SRC|PERF_SAMPLE_TRANSACTION; and exclude_callchain_kernel=1, triggering a perf_callchain(). Here is a dump of the stack right after we enter perf_callchain() vince@core2:~$ [ 202.320444] [ cut here ] [ 202.324001] WARNING: CPU: 1 PID: 2873 at kernel/events/callchain.c:168 perf_callchain+0x67/0x211() hermal_sys ehci_pci ehci_hcd sg sd_mod usbcore usb_common [ 202.324001] CPU: 1 PID: 2873 Comm: perf_fuzzer Not tainted 3.14.0-rc3+ #24 [ 202.324001] Hardware name: AOpen DE7000/nMCP7ALPx-DE R1.06 Oct.19.2012, BIOS 080015 10/19/2012 [ 202.324001] 00a8 880119ec1b48 81531dc1 00a8 [ 202.324001] 880119ec1b88 81040ce4 880119ec1b88 [ 202.324001] 810d416f 880119ec1e38 8800cba67800 [ 202.324001] Call Trace: [ 202.324001] [81531dc1] dump_stack+0x49/0x60 [ 202.324001] [81040ce4] warn_slowpath_common+0x81/0x9b [ 202.324001] [810d416f] ? perf_callchain+0x67/0x211 [ 202.324001] [81040d18] warn_slowpath_null+0x1a/0x1c [ 202.324001] [810d416f] perf_callchain+0x67/0x211 [ 202.324001] [8106b0ab] ? local_clock+0x1b/0x24 [ 202.324001] [810d0872] perf_prepare_sample+0x7b/0x304 [ 202.324001] [810d1079] __perf_event_overflow+0x156/0x1c1 [ 202.324001] [810f4613] ? free_pgtables+0xa7/0xc9 [ 202.324001] [810d125e] perf_swevent_overflow+0x41/0x5b [ 202.324001] [810d12ea] perf_swevent_event+0x72/0x74 [ 202.324001] [810d1478] perf_tp_event+0xea/0x1ef [ 202.324001] [8103515a] ? perf_trace_x86_exceptions+0x4c/0xba [ 202.324001] [810351b6] perf_trace_x86_exceptions+0xa8/0xba [ 202.324001] [8103515a] ? perf_trace_x86_exceptions+0x4c/0xba [ 202.324001] [81538439] trace_do_page_fault+0x48/0x99 [ 202.324001] [815354b2] trace_page_fault+0x22/0x30 [ 202.324001] ---[ end trace 31ccd31b4e82cb42 ]--- This triggers a kernel pagefault because the BP register is not valid and copy_from_user_nmi() tries to copy the user stack area from there. This *should* be OK? And it maybe looks like it works, but then copy_user_handle_tail() causes another page fault again at the invaid BP register. But then the code continues to perf_output_begin() 237.265689: page_fault_kernel:address=0x17a0 ip=copy_user_handle_tail error_code=0x0 237.265689: function:__do_page_fault 237.265689: function: bad_area_nosemaphore 237.265690: function: __bad_area_nosemaphore 237.265690: function: no_context 237.265690: function:fixup_exception 237.265690: function: search_exception_tables 237.265690: function: search_extable 237.265691: function: perf_output_begin You can also allow perf to be traced. Remove the CFLAGS_REMOVE_core.o line from kernel/events/Makefile. 237.265692: bprint: perf_output_begin: VMW: event type 2 config 2a st: 2c3e7 how are we back in __do_page_fault again here? Well, the perf ring buffer is vmalloced, right? That can cause a page fault too. 237.265692: function: __do_page_fault Try adding a trace_printk() to see all the page faults? 237.265692: function:down_read_trylock 237.265692: function:_cond_resched 237.265693: function:find_vma 237.265693: function:bad_area 237.265693: function: up_read 237.265693: function: __bad_area_nosemaphore 237.265694: function: is_prefetch 237.265694: function: convert_ip_to_linear 237.265695: function: unhandled_signal 237.265695: function: __printk_ratelimit 237.265695: function: _raw_spin_trylock 237.265695: function:
Re: perf_fuzzer compiled for x32 causes reboot
On 02/27/2014 02:31 PM, Steven Rostedt wrote: Yeah, something is getting mesed up. What it *looks* like to me is that we try to nest the cr2 save/restore, which doesn't nest because it is a percpu variable. ... except in the x86-64 case, we *ALSO* save/restore cr2 inside entry_64.S, which makes the stuff in do_nmi completely redundant and there for no good reason. I would actually suggest we do the equivalent on i386 as well. Vince, could you try this patch as an experiment? diff --git a/arch/x86/lib/usercopy.c b/arch/x86/lib/usercopy.c index ddf9ecb..938e45c 100644 --- a/arch/x86/lib/usercopy.c +++ b/arch/x86/lib/usercopy.c @@ -10,6 +10,8 @@ #include asm/word-at-a-time.h #include linux/sched.h +#include asm/processor.h + /* * We rely on the nested NMI work to allow atomic faults from the NMI path; the * nested NMI paths are careful to preserve CR2. @@ -18,6 +20,7 @@ unsigned long copy_from_user_nmi(void *to, const void __user *from, unsigned long n) { unsigned long ret; + unsigned long cr2; if (__range_not_ok(from, n, TASK_SIZE)) return 0; @@ -27,9 +30,11 @@ copy_from_user_nmi(void *to, const void __user *from, unsigned long n) * disable pagefaults so that its behaviour is consistent even when * called form other contexts. */ + cr2 = read_cr2(); pagefault_disable(); ret = __copy_from_user_inatomic(to, from, n); pagefault_enable(); + write_cr2(cr2); return ret; }
Re: perf_fuzzer compiled for x32 causes reboot
On Thu, 27 Feb 2014 14:52:54 -0800 H. Peter Anvin h...@zytor.com wrote: On 02/27/2014 02:31 PM, Steven Rostedt wrote: Yeah, something is getting mesed up. What it *looks* like to me is that we try to nest the cr2 save/restore, which doesn't nest because it is a percpu variable. ... except in the x86-64 case, we *ALSO* save/restore cr2 inside entry_64.S, which makes the stuff in do_nmi completely redundant and there for no good reason. Peter, look at the code. That percpu cr2 is in a #ifdef CONFIG_X86_32 section. That is, it isn't even executed. That's i386 code. The only place the cr2 is saved for x86_64 is in entry_64.S. -- Steve I would actually suggest we do the equivalent on i386 as well. Vince, could you try this patch as an experiment? -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: perf_fuzzer compiled for x32 causes reboot
On 02/27/2014 03:30 PM, Steven Rostedt wrote: On Thu, 27 Feb 2014 14:52:54 -0800 H. Peter Anvin h...@zytor.com wrote: On 02/27/2014 02:31 PM, Steven Rostedt wrote: Yeah, something is getting mesed up. What it *looks* like to me is that we try to nest the cr2 save/restore, which doesn't nest because it is a percpu variable. ... except in the x86-64 case, we *ALSO* save/restore cr2 inside entry_64.S, which makes the stuff in do_nmi completely redundant and there for no good reason. Peter, look at the code. That percpu cr2 is in a #ifdef CONFIG_X86_32 section. That is, it isn't even executed. That's i386 code. The only place the cr2 is saved for x86_64 is in entry_64.S. Right, egg on my face. However, I still think it would make more sense for it to nest the way entry_64.S does if at all possible. That makes this even more confusing, though. I would still like to see what happens with the patch I sent Vince. -hpa -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: perf_fuzzer compiled for x32 causes reboot
On Thu, 27 Feb 2014, H. Peter Anvin wrote: On 02/27/2014 03:30 PM, Steven Rostedt wrote: On Thu, 27 Feb 2014 14:52:54 -0800 H. Peter Anvin h...@zytor.com wrote: On 02/27/2014 02:31 PM, Steven Rostedt wrote: Yeah, something is getting mesed up. What it *looks* like to me is that we try to nest the cr2 save/restore, which doesn't nest because it is a percpu variable. ... except in the x86-64 case, we *ALSO* save/restore cr2 inside entry_64.S, which makes the stuff in do_nmi completely redundant and there for no good reason. Peter, look at the code. That percpu cr2 is in a #ifdef CONFIG_X86_32 section. That is, it isn't even executed. That's i386 code. The only place the cr2 is saved for x86_64 is in entry_64.S. Right, egg on my face. However, I still think it would make more sense for it to nest the way entry_64.S does if at all possible. That makes this even more confusing, though. I would still like to see what happens with the patch I sent Vince. I'll try your patch momentarily, first I had some other changes I started running before I left work (for some reason it recompiled the whole kernel). 8: function: perf_output_begin 8: bprint: perf_output_begin: VMW: event type 2 config 2a st: 2c3e 8: bputs:perf_output_begin: VMW: before rcu_dereference 9: function: __do_page_fault 9: function:down_read_trylock 9: function:_cond_resched 9: function:find_vma so it looks like the fault happens rcu_read_lock(); 116 /* 117 * For inherited events we send all the output towards the parent. 118 */ 119 if (event-parent) 120 event = event-parent; 121 somewhere between here 122 rb = rcu_dereference(event-rb); 123 if (unlikely(!rb)) 124 goto out; and here 125 126 if (unlikely(!rb-nr_pages)) 127 goto out; although if rcu locks do anything to turn off tracing then this could be suspect. Vince -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: perf_fuzzer compiled for x32 causes reboot
On Thu, 27 Feb 2014, H. Peter Anvin wrote: On 02/27/2014 02:31 PM, Steven Rostedt wrote: Yeah, something is getting mesed up. What it *looks* like to me is that we try to nest the cr2 save/restore, which doesn't nest because it is a percpu variable. ... except in the x86-64 case, we *ALSO* save/restore cr2 inside entry_64.S, which makes the stuff in do_nmi completely redundant and there for no good reason. I would actually suggest we do the equivalent on i386 as well. Vince, could you try this patch as an experiment? OK with your patch applied it does not segfault. The trace looks like you'd expect too: perf_fuzzer-2910 [000] 255.355331: page_fault_kernel: address=irq_stack_union ip=copy_user_handle_tail error_code=0x0 perf_fuzzer-2910 [000] 255.355331: function: __do_page_fault perf_fuzzer-2910 [000] 255.355331: function: bad_area_nosemaphore perf_fuzzer-2910 [000] 255.355331: function: __bad_area_nosemaphore perf_fuzzer-2910 [000] 255.355332: function: no_context perf_fuzzer-2910 [000] 255.355332: function: fixup_exception perf_fuzzer-2910 [000] 255.355332: function: search_exception_tables perf_fuzzer-2910 [000] 255.355332: function: search_extable perf_fuzzer-2910 [000] 255.355333: function: perf_output_begin perf_fuzzer-2910 [000] 255.355333: bprint: perf_output_begin: VMW: event type 2 config 2a st: 2c3e7 perf_fuzzer-2910 [000] 255.355333: bprint: perf_output_begin: VMW: before event-parent (nil) perf_fuzzer-2910 [000] 255.355334: bprint: perf_output_begin: VMW: before rcu_dereference (nil) perf_fuzzer-2910 [000] 255.355334: function: __do_page_fault perf_fuzzer-2910 [000] 255.355334: function: down_read_trylock perf_fuzzer-2910 [000] 255.355334: function: _cond_resched perf_fuzzer-2910 [000] 255.355335: function:find_vma perf_fuzzer-2910 [000] 255.355335: function: handle_mm_fault perf_fuzzer-2910 [000] 255.355335: function: __do_fault perf_fuzzer-2910 [000] 255.355335: bputs: perf_mmap_fault: VMW: perf_mmap_fault perf_fuzzer-2910 [000] 255.355336: bprint: perf_mmap_fault: VMW: perf_mmap_fault 0x8800cba6a980 perf_fuzzer-2910 [000] 255.355336: function: perf_mmap_to_page perf_fuzzer-2910 [000] 255.355336: function: _cond_resched perf_fuzzer-2910 [000] 255.355336: function: unlock_page perf_fuzzer-2910 [000] 255.355336: function: page_waitqueue perf_fuzzer-2910 [000] 255.355337: function: __wake_up_bit perf_fuzzer-2910 [000] 255.355337: bputs: perf_mmap_fault: VMW: perf_mmap_fault perf_fuzzer-2910 [000] 255.355337: function: _cond_resched perf_fuzzer-2910 [000] 255.355337: function: _raw_spin_lock perf_fuzzer-2910 [000] 255.355337: function: page_add_file_rmap perf_fuzzer-2910 [000] 255.355338: function: __inc_zone_page_state perf_fuzzer-2910 [000] 255.355338: function: __inc_zone_state perf_fuzzer-2910 [000] 255.355338: function: set_page_dirty perf_fuzzer-2910 [000] 255.355338: function: page_mapping perf_fuzzer-2910 [000] 255.355338: function: anon_set_page_dirty perf_fuzzer-2910 [000] 255.355339: function: unlock_page Vince -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: perf_fuzzer compiled for x32 causes reboot
Ok... I think we're definitely talking about a cr2 leak. The reboot might be a race condition in the NMI nesting handling maybe? On February 27, 2014 5:34:34 PM PST, Vince Weaver vincent.wea...@maine.edu wrote: On Thu, 27 Feb 2014, H. Peter Anvin wrote: On 02/27/2014 02:31 PM, Steven Rostedt wrote: Yeah, something is getting mesed up. What it *looks* like to me is that we try to nest the cr2 save/restore, which doesn't nest because it is a percpu variable. ... except in the x86-64 case, we *ALSO* save/restore cr2 inside entry_64.S, which makes the stuff in do_nmi completely redundant and there for no good reason. I would actually suggest we do the equivalent on i386 as well. Vince, could you try this patch as an experiment? OK with your patch applied it does not segfault. The trace looks like you'd expect too: perf_fuzzer-2910 [000] 255.355331: page_fault_kernel: address=irq_stack_union ip=copy_user_handle_tail error_code=0x0 perf_fuzzer-2910 [000] 255.355331: function: __do_page_fault perf_fuzzer-2910 [000] 255.355331: function: bad_area_nosemaphore perf_fuzzer-2910 [000] 255.355331: function: __bad_area_nosemaphore perf_fuzzer-2910 [000] 255.355332: function: no_context perf_fuzzer-2910 [000] 255.355332: function: fixup_exception perf_fuzzer-2910 [000] 255.355332: function: search_exception_tables perf_fuzzer-2910 [000] 255.355332: function: search_extable perf_fuzzer-2910 [000] 255.355333: function: perf_output_begin perf_fuzzer-2910 [000] 255.355333: bprint: perf_output_begin: VMW: event type 2 config 2a st: 2c3e7 perf_fuzzer-2910 [000] 255.355333: bprint: perf_output_begin: VMW: before event-parent (nil) perf_fuzzer-2910 [000] 255.355334: bprint: perf_output_begin: VMW: before rcu_dereference (nil) perf_fuzzer-2910 [000] 255.355334: function: __do_page_fault perf_fuzzer-2910 [000] 255.355334: function: down_read_trylock perf_fuzzer-2910 [000] 255.355334: function: _cond_resched perf_fuzzer-2910 [000] 255.355335: function:find_vma perf_fuzzer-2910 [000] 255.355335: function: handle_mm_fault perf_fuzzer-2910 [000] 255.355335: function: __do_fault perf_fuzzer-2910 [000] 255.355335: bputs: perf_mmap_fault: VMW: perf_mmap_fault perf_fuzzer-2910 [000] 255.355336: bprint: perf_mmap_fault: VMW: perf_mmap_fault 0x8800cba6a980 perf_fuzzer-2910 [000] 255.355336: function: perf_mmap_to_page perf_fuzzer-2910 [000] 255.355336: function: _cond_resched perf_fuzzer-2910 [000] 255.355336: function: unlock_page perf_fuzzer-2910 [000] 255.355336: function: page_waitqueue perf_fuzzer-2910 [000] 255.355337: function: __wake_up_bit perf_fuzzer-2910 [000] 255.355337: bputs: perf_mmap_fault: VMW: perf_mmap_fault perf_fuzzer-2910 [000] 255.355337: function: _cond_resched perf_fuzzer-2910 [000] 255.355337: function: _raw_spin_lock perf_fuzzer-2910 [000] 255.355337: function: page_add_file_rmap perf_fuzzer-2910 [000] 255.355338: function: __inc_zone_page_state perf_fuzzer-2910 [000] 255.355338: function: __inc_zone_state perf_fuzzer-2910 [000] 255.355338: function: set_page_dirty perf_fuzzer-2910 [000] 255.355338: function: page_mapping perf_fuzzer-2910 [000] 255.355338: function: anon_set_page_dirty perf_fuzzer-2910 [000] 255.355339: function: unlock_page Vince -- Sent from my mobile phone. Please pardon brevity and lack of formatting. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: perf_fuzzer compiled for x32 causes reboot
On Thu, 27 Feb 2014 20:34:34 -0500 (EST) Vince Weaver vincent.wea...@maine.edu wrote: I would actually suggest we do the equivalent on i386 as well. Vince, could you try this patch as an experiment? OK with your patch applied it does not segfault. Vince, Great! Can you remove Peter's patch, and try this one. It removes the crud to save the cr2 from entry_64.S and makes both i386 and x86_64 do the same thing in regards to cr2 handling. -- Steve diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S index 1e96c36..937cb8d 100644 --- a/arch/x86/kernel/entry_64.S +++ b/arch/x86/kernel/entry_64.S @@ -1854,29 +1854,11 @@ end_repeat_nmi: call save_paranoid DEFAULT_FRAME 0 - /* -* Save off the CR2 register. If we take a page fault in the NMI then -* it could corrupt the CR2 value. If the NMI preempts a page fault -* handler before it was able to read the CR2 register, and then the -* NMI itself takes a page fault, the page fault that was preempted -* will read the information from the NMI page fault and not the -* origin fault. Save it off and restore it if it changes. -* Use the r12 callee-saved register. -*/ - movq %cr2, %r12 - /* paranoidentry do_nmi, 0; without TRACE_IRQS_OFF */ movq %rsp,%rdi movq $-1,%rsi call do_nmi - /* Did the NMI take a page fault? Restore cr2 if it did */ - movq %cr2, %rcx - cmpq %rcx, %r12 - je 1f - movq %r12, %cr2 -1: - testl %ebx,%ebx /* swapgs needed? */ jnz nmi_restore nmi_swapgs: diff --git a/arch/x86/kernel/nmi.c b/arch/x86/kernel/nmi.c index 6fcb49c..f1a6294 100644 --- a/arch/x86/kernel/nmi.c +++ b/arch/x86/kernel/nmi.c @@ -443,7 +443,6 @@ enum nmi_states { NMI_LATCHED, }; static DEFINE_PER_CPU(enum nmi_states, nmi_state); -static DEFINE_PER_CPU(unsigned long, nmi_cr2); #define nmi_nesting_preprocess(regs) \ do {\ @@ -452,14 +451,11 @@ static DEFINE_PER_CPU(unsigned long, nmi_cr2); return; \ } \ this_cpu_write(nmi_state, NMI_EXECUTING); \ - this_cpu_write(nmi_cr2, read_cr2());\ } while (0);\ nmi_restart: #define nmi_nesting_postprocess() \ do {\ - if (unlikely(this_cpu_read(nmi_cr2) != read_cr2())) \ - write_cr2(this_cpu_read(nmi_cr2)); \ if (this_cpu_dec_return(nmi_state)) \ goto nmi_restart; \ } while (0) @@ -512,8 +508,21 @@ static inline void nmi_nesting_postprocess(void) dotraplinkage notrace __kprobes void do_nmi(struct pt_regs *regs, long error_code) { + unsigned long cr2; + nmi_nesting_preprocess(regs); + /* +* Save off the CR2 register. If we take a page fault in the NMI then +* it could corrupt the CR2 value. If the NMI preempts a page fault +* handler before it was able to read the CR2 register, and then the +* NMI itself takes a page fault, the page fault that was preempted +* will read the information from the NMI page fault and not the +* origin fault. Save it off and restore it if it changes. +* Use the r12 callee-saved register. +*/ + cr2 = read_cr2(); + nmi_enter(); inc_irq_stat(__nmi_count); @@ -523,6 +532,10 @@ do_nmi(struct pt_regs *regs, long error_code) nmi_exit(); + /* Reads are cheaper than writes */ + if (unlikely(cr2 != read_cr2())) + write_cr2(cr2); + /* On i386, may loop back to preprocess */ nmi_nesting_postprocess(); } -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: perf_fuzzer compiled for x32 causes reboot
On Tue, 25 Feb 2014, Vince Weaver wrote: > On Tue, 25 Feb 2014, Steven Rostedt wrote: > > > On Tue, 25 Feb 2014 06:34:55 -0800 > > "H. Peter Anvin" wrote: > > > > > #2 is what I really don't understand. > > > > > > I worry something else is going on there > > > > Yeah, me too. > > > > OK, well I'll work on isolating that next, I was hoping the segfault issue > was related. It's a lot harder getting traces out of a machine that > insta-reboots so fast that it can't even finish printing the oops message. OK, dug up the random seeds that are causing the system to reboot, and it turns out that it's hard to reproduce the actual reboot, but in the cases where it doesn't reboot it segfaults in the same code path that was causing the other segfaults in this thread. So I think it's all the same bug, although I can investigate further if needed. Vince -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: perf_fuzzer compiled for x32 causes reboot
On Tue, 25 Feb 2014, Vince Weaver wrote: On Tue, 25 Feb 2014, Steven Rostedt wrote: On Tue, 25 Feb 2014 06:34:55 -0800 H. Peter Anvin h...@zytor.com wrote: #2 is what I really don't understand. I worry something else is going on there Yeah, me too. OK, well I'll work on isolating that next, I was hoping the segfault issue was related. It's a lot harder getting traces out of a machine that insta-reboots so fast that it can't even finish printing the oops message. OK, dug up the random seeds that are causing the system to reboot, and it turns out that it's hard to reproduce the actual reboot, but in the cases where it doesn't reboot it segfaults in the same code path that was causing the other segfaults in this thread. So I think it's all the same bug, although I can investigate further if needed. Vince -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: perf_fuzzer compiled for x32 causes reboot
On Tue, 25 Feb 2014, Steven Rostedt wrote: > On Tue, 25 Feb 2014 06:34:55 -0800 > "H. Peter Anvin" wrote: > > > #2 is what I really don't understand. > > > > I worry something else is going on there > > Yeah, me too. > OK, well I'll work on isolating that next, I was hoping the segfault issue was related. It's a lot harder getting traces out of a machine that insta-reboots so fast that it can't even finish printing the oops message. I need my testing machine for some unrelated experiments so I might not get a chance to work on this further for a day or two. Vince -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: perf_fuzzer compiled for x32 causes reboot
On Tue, 25 Feb 2014 06:34:55 -0800 "H. Peter Anvin" wrote: > #2 is what I really don't understand. > > I worry something else is going on there Yeah, me too. -- Steve > > > > >While the missing cr2 issue made debugging frustrating, I find the > >other > >aspects of the bug more serious: > > > > 1. Programs that are doing valid memory accesses can segfault > >and worse > >2. This bug can cause an instant-reboot of the system (I assume > >somehow > > with the right combination of memory accesses it causes a > > triple-fault?) > > > >#2 is why I spent all of this time tracking this down, I couldn't leave > >a > >machine fuzzing overnight without the machine rebooting. > > > >Vince > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/