Re: [Bug] Reproducible data corruption on i5-3340M: Please continue your great work! :-)
On Thu, Aug 15, 2013 at 5:33 PM, Linus Torvalds wrote:
> I'll probably delay committing it until tomorrow, in the hope that
> somebody using one of the other architectures will at least ack that
> it compiles. I'm re-attaching the patch (with the two "logn" -> "long"
> fixes) just to encourage that. Hint hint, everybody..

I see I'm too late to supply an Ack for the commit, because it is already in. But just for completeness sake - all my ia64 configs build OK, and the couple that get boot tested still appear to be working too.

-Tony
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [Bug] Reproducible data corruption on i5-3340M: Please continue your great work! :-)
On Fri, Aug 16, 2013 at 01:00:31PM +0200, Michal Hocko wrote:
> I was thinking about teaching __tlb_remove_page to update the range
> automatically from the given address.

The mmu_gather unification stuff I had did it differently still:

http://permalink.gmane.org/gmane.linux.kernel.mm/81287

That said, I do like Linus' approach. The only thing I haven't considered is if it does the right thing for tile,mips-r4k which have 'special' rules for VM_HUGETLB. Although I don't think it changes those archs enough to break anything.

I should find some time to finally finish that series :/
Re: [Bug] Reproducible data corruption on i5-3340M: Please continue your great work! :-)
On Thu 15-08-13 17:33:28, Linus Torvalds wrote:
> On Thu, Aug 15, 2013 at 4:05 PM, Ben Tebulin wrote:
> >
> >> Ben, please test. I'm worried that the problem you see is something
> >> even more fundamentally wrong with the whole "oops, must flush in the
> >> middle" logic, but I'm _hoping_ this fixes it.
> >
> > It's gone.
> >
> > Really!
> >
> > I git-fsck'ed successfully around 30 times in a row.
> > And even all the other things still seem to work ;-)
>
> Goodie. I think I'm just going to commit it (with the speling fixes
> for other architectures) asap. It's bigger than I'd like, but it's a
> lot simpler than the alternatives of trying to figure out exactly
> which call chain got things wrong with the previous confusing model.

I was thinking about teaching __tlb_remove_page to update the range automatically from the given address. But your patch looks good to me as well.

Feel free to add
Reviewed-by: Michal Hocko

Thanks!
--
Michal Hocko
SUSE Labs
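The alternative Michal mentions, having the page-removal helper widen the flush range from each address it is given, can be sketched roughly as below. This is only an illustration under stated assumptions: `struct gather`, `gather_init` and `gather_remove_page` are hypothetical names, and the structure is a much simplified stand-in for the kernel's mmu_gather, not its actual layout or API.

```c
#include <assert.h>
#include <limits.h>

#define SKETCH_PAGE_SIZE 4096UL /* assumed page size for the illustration */

/* Hypothetical, simplified stand-in for the kernel's struct mmu_gather. */
struct gather {
    unsigned long start; /* lowest address queued so far */
    unsigned long end;   /* one past the highest byte queued so far */
    unsigned int nr;     /* number of pages queued */
};

/* Start with an empty range (start > end) that the first page will replace. */
static void gather_init(struct gather *g)
{
    g->start = ULONG_MAX;
    g->end = 0;
    g->nr = 0;
}

/* Queue one page and widen the flush range so that it covers the page. */
static void gather_remove_page(struct gather *g, unsigned long addr)
{
    assert(addr % SKETCH_PAGE_SIZE == 0); /* callers pass page-aligned addresses */

    if (addr < g->start)
        g->start = addr;
    if (addr + SKETCH_PAGE_SIZE > g->end)
        g->end = addr + SKETCH_PAGE_SIZE;
    g->nr++;
}
```

With this shape, callers never set the range themselves: queuing pages at 0x7000 and 0x3000 leaves the gather covering [0x3000, 0x8000), so a later flush cannot miss a queued page.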
Re: [Bug] Reproducible data corruption on i5-3340M: Please continue your great work! :-)
On Fri, Aug 16, 2013 at 2:33 AM, Linus Torvalds wrote:
> I'll probably delay committing it until tomorrow, in the hope that
> somebody using one of the other architectures will at least ack that
> it compiles. I'm re-attaching the patch (with the two "logn" -> "long"
> fixes) just to encourage that. Hint hint, everybody..

/me tested arch/um, so far everything looks good. :-)

--
Thanks,
//richard
Re: [Bug] Reproducible data corruption on i5-3340M: Please continue your great work! :-)
Hi Linus,

On Thu, 15 Aug 2013 17:33:28 -0700 Linus Torvalds wrote:
>
> I'll probably delay committing it until tomorrow, in the hope that
> somebody using one of the other architectures will at least ack that
> it compiles. I'm re-attaching the patch (with the two "logn" -> "long"
> fixes) just to encourage that. Hint hint, everybody..

I built all the (major) PowerPC defconfigs, allnoconfig and allmodconfig and they built as well as they did before this patch (i.e. some failed for other reasons). I have not done any boot testing on PowerPC.

--
Cheers,
Stephen Rothwell s...@canb.auug.org.au
Re: [Bug] Reproducible data corruption on i5-3340M: Please continue your great work! :-)
On Thu, Aug 15, 2013 at 4:05 PM, Ben Tebulin wrote:
>
>> Ben, please test. I'm worried that the problem you see is something
>> even more fundamentally wrong with the whole "oops, must flush in the
>> middle" logic, but I'm _hoping_ this fixes it.
>
> It's gone.
>
> Really!
>
> I git-fsck'ed successfully around 30 times in a row.
> And even all the other things still seem to work ;-)

Goodie. I think I'm just going to commit it (with the speling fixes for other architectures) asap. It's bigger than I'd like, but it's a lot simpler than the alternatives of trying to figure out exactly which call chain got things wrong with the previous confusing model.

Thanks for bisecting and testing.

> Honestly I have to confess that I'm deeply impressed how this finally
> worked out: I just threw a particular, innocent-looking commit hash and
> nothing more into the round.

Being able to bisect the exact commit that introduced the bad behavior is a *very* powerful debugging aid, and in fact the smaller and more innocent-looking the bisected commit is, the easier it generally is to then say "ok, it must be related to this one particular issue".

So the bisection really pinpointed the area. After that it was just a matter of reading the source code and seeing what looked suspicious.

I'll probably delay committing it until tomorrow, in the hope that somebody using one of the other architectures will at least ack that it compiles. I'm re-attaching the patch (with the two "logn" -> "long" fixes) just to encourage that. Hint hint, everybody..

Linus

Attachment: patch.diff
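The bisection Linus credits here is essentially a binary search for the first commit that shows the bug. A minimal sketch of that search follows, assuming a strictly linear history in which every commit after the culprit is also bad. The names `first_bad_commit`, `is_bad_fn` and `example_is_bad` are hypothetical, and the example predicate stands in for whatever real reproduction test is used (in this thread, repeated git-fsck runs).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical test callback: nonzero if the commit at `index` shows the bug. */
typedef int (*is_bad_fn)(size_t index);

/*
 * Return the index of the first bad commit, given that commit `good` is
 * known good, commit `bad` is known bad, and good < bad. Each step halves
 * the remaining range, so about log2(bad - good) tests are needed.
 */
static size_t first_bad_commit(size_t good, size_t bad, is_bad_fn is_bad)
{
    assert(good < bad);

    while (bad - good > 1) {
        size_t mid = good + (bad - good) / 2;

        if (is_bad(mid))
            bad = mid;  /* culprit is at mid or earlier */
        else
            good = mid; /* culprit is after mid */
    }
    return bad;
}

/* Example predicate for illustration only: pretend commit 37 introduced the bug. */
static int example_is_bad(size_t index)
{
    return index >= 37;
}
```

Searching 100 commits this way takes at most seven test runs, which is why a reliable reproducer plus a known-good starting point is usually enough to land on a single commit.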
Re: [Bug] Reproducible data corruption on i5-3340M: Please continue your great work! :-)
On 15.08.2013 20:00, Linus Torvalds wrote:
> Ok, so I've slept on it, and here's my current thinking.
> [...]

Many thoughts which as a user I am unable to follow ;-)

> This patch tries to fix the interface instead of trying to patch up
> the individual places that *should* set the range some particular way
> [...]
> This patch is against current git, so to apply you need to have
> that commit e6c495a96ce0 cherry-picked to older kernels first.

I took a shot based on 3.9.11 + e6c495a96ce0. The reason why I don't simply use the current git master is that, for some reason, my linux-image-*.deb packages have become 750MB and larger since 3.10.y, and I have no clue at all why or what to do about it.

The patch failed. Due to my outstanding incompetence I resorted to applying it onto master, cherry-picking that back and trying to resolve the remaining conflicts correctly.

> - I have no idea whether this will fix the problem Ben sees, but I
> feel happier about the code, because now any place that forgets to set
> up start/end will work just fine, because they are always valid.

Simpler code? Resilient API? Happy people? Great!

> Ben, please test. I'm worried that the problem you see is something
> even more fundamentally wrong with the whole "oops, must flush in the
> middle" logic, but I'm _hoping_ this fixes it.

It's gone.

Really!

I git-fsck'ed successfully around 30 times in a row.
And even all the other things still seem to work ;-)

Honestly I have to confess that I'm deeply impressed how this finally worked out: I just threw a particular, innocent-looking commit hash and nothing more into the round. And while still being unsure if this might be a plain user space issue, only 24h later I received an 11kb sized kernel patch (with blatant typos in it !1! *g* ) apparently solving my issue.

/me happy now, too! :)

- Ben
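The point Linus makes in the quoted text, that a range which is valid from the start protects every call site that forgets to set it, can be sketched roughly as follows. This is an illustration only: `struct range_gather` and its helpers are hypothetical names, not the kernel's mmu_gather interface, and the "flush" here just counts calls instead of touching any TLB.

```c
#include <assert.h>

/* Hypothetical, simplified gather state; not the kernel's struct mmu_gather. */
struct range_gather {
    unsigned long start;    /* range fixed when gathering begins */
    unsigned long end;
    unsigned int pending;   /* pages queued since the last flush */
    unsigned int flushes;   /* number of flushes issued so far */
    unsigned int batch_max; /* a full batch forces a flush */
};

/* The range is a mandatory argument, so it is valid from the first call on. */
static void range_gather_begin(struct range_gather *g, unsigned long start,
                               unsigned long end, unsigned int batch_max)
{
    assert(start < end);
    assert(batch_max > 0);

    g->start = start;
    g->end = end;
    g->pending = 0;
    g->flushes = 0;
    g->batch_max = batch_max;
}

/* Stand-in for an architecture flush over [start, end). */
static void range_gather_flush(struct range_gather *g)
{
    assert(g->start < g->end); /* holds even for a flush in the middle */
    g->pending = 0;
    g->flushes++;
}

/* Queue one page; filling the batch triggers a flush in the middle of the walk. */
static void range_gather_queue_page(struct range_gather *g)
{
    if (++g->pending == g->batch_max)
        range_gather_flush(g);
}

/* Flush whatever is still queued when the walk ends. */
static void range_gather_finish(struct range_gather *g)
{
    if (g->pending)
        range_gather_flush(g);
}
```

Because no caller sets `start` or `end` after `range_gather_begin`, a forced flush halfway through cannot run with an unset range, which is the class of mistake the thread attributes to the earlier model.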