Re: [boot crash] Re: [GIT PULL[ block drivers bits for 3.8
On 2012-12-19 17:29, Linus Torvalds wrote:
>> Of course it's been tested. Granted it got moved over too late (as 1 of
>> 2 that did), but I've run the branch on a multitude of systems.
>> Apparently none of them hit the case of having a zero granularity
>> reported, so never hit the bug.
>
> It presumably happens on pretty much anything that doesn't have
> discard. Of course, I've personally gotten rid of any rotating devices
> I have, but it still sounds like there's a big testing hole somewhere.

It doesn't, though it seems so. Otherwise I definitely would have seen
it. It only happens if discard max sectors is set, but alignment isn't.
I suspect because that first divide is ordered after the
!max_discard_sectors check. At least here.

And I suspect we would have seen a lot more reports if it DID trigger on
anything that didn't have discard :-)

--
Jens Axboe

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
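The condition Jens describes can be written down as a small predicate. This is only a sketch, not kernel code: `struct discard_limits_view` and `old_code_divides_by_zero()` are hypothetical names. It assumes, as Jens suspects, that the compiler moved the first division below the `!max_discard_sectors` check, and it reads "alignment isn't [set]" as a zero `discard_granularity`, since that field is the divisor in the function under discussion.

```c
#include <stdbool.h>

/* Hypothetical view of the two queue limit fields that matter here. */
struct discard_limits_view {
	unsigned int max_discard_sectors;  /* 0 means the device has no discard */
	unsigned int discard_granularity;  /* bytes; 0 means "not reported" */
};

/*
 * True when the code before the fix would reach a division by zero,
 * ASSUMING the compiler sank the first divide below the early return.
 */
static bool old_code_divides_by_zero(const struct discard_limits_view *lim)
{
	/* Devices without discard return before any division happens. */
	if (!lim->max_discard_sectors)
		return false;

	/* Discard is advertised, so the granularity is used as a divisor. */
	return lim->discard_granularity == 0;
}
```

Under these assumptions a device with no discard support never crashes, which would be consistent with the bug not showing up on most test systems.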
Re: [boot crash] Re: [GIT PULL[ block drivers bits for 3.8
On Wed, Dec 19, 2012 at 8:29 AM, Linus Torvalds wrote:
> It presumably happens on pretty much anything that doesn't have
> discard. Of course, I've personally gotten rid of any rotating devices
> I have, but it still sounds like there's a big testing hole somewhere.

For what it's worth, I have tested Linus's patch on my ARM Chromebook
(which was reproducing the divide by 0 yesterday). With the patch the
system has no divide by 0 and still boots fine.

Interestingly enough the divide by 0 appeared to happen at probe time
once for each partition of both the internal eMMC and the external SD
card. Later in the boot sequence the code runs with a non-zero discard.
I'm not familiar enough with this part of the kernel to speculate why
this system behaves differently than the systems that Jens tested on.
Re: [boot crash] Re: [GIT PULL[ block drivers bits for 3.8
On Wed, Dec 19, 2012 at 6:47 AM, Jens Axboe wrote:
>
> It should all just be in sectors. The limits set are in bytes (could be
> sectors too, but doesn't matter so much), but any interface operates in
> sectors.
>
> I'm happy with your proposed fix. I think you should shove it in there,
> then I'll make sure we get it cleaned up for 3.9.

Ok, committed and pushed out.

Neil, can you please test it on whatever raid setting you can have? In
particular, it would be interesting to make sure it works correctly even
without LBD support on 32-bit devices on partitions that start more than
4GB into the device, because that's the case I think was broken before.

> Of course it's been tested. Granted it got moved over too late (as 1 of
> 2 that did), but I've run the branch on a multitude of systems.
> Apparently none of them hit the case of having a zero granularity
> reported, so never hit the bug.

It presumably happens on pretty much anything that doesn't have
discard. Of course, I've personally gotten rid of any rotating devices
I have, but it still sounds like there's a big testing hole somewhere.

               Linus
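The "partition starts more than 4GB into the device" case can be illustrated in userspace. The following is a sketch under stated assumptions, not kernel code: `sector32_t` is a hypothetical stand-in for a 32-bit `sector_t` (no LBD support), and both helper names are made up for the example. It contrasts the byte-based shift with a remainder taken directly in sectors. Note that when the granularity is a power of two the truncated value still has the same remainder, so the two only disagree for other granularities, such as the 1536-byte value used in the usage note.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for sector_t on a 32-bit kernel without LBD. */
typedef uint32_t sector32_t;

/* Byte offset computed in the sector type: bits above 4 GiB are lost. */
static uint32_t start_in_bytes_truncated(sector32_t sector)
{
	return (uint32_t)(sector << 9);
}

/* Remainder taken in sectors, so the byte value is never needed. */
static uint32_t start_mod_granularity(sector32_t sector,
				      uint32_t granularity_bytes)
{
	uint32_t granularity_sectors = granularity_bytes >> 9;

	/* A zero divisor must be rejected by the caller. */
	assert(granularity_sectors != 0);
	return sector % granularity_sectors;
}
```

For a partition starting at sector 8388609 (512 bytes past the 4 GiB mark), `start_in_bytes_truncated()` yields 512, whose remainder modulo 1536 is 512 bytes, while `start_mod_granularity(8388609, 1536)` yields 0 sectors, which is the correct remainder.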
Re: [boot crash] Re: [GIT PULL[ block drivers bits for 3.8
On 2012-12-18 17:49, Linus Torvalds wrote:
> On Tue, Dec 18, 2012 at 3:42 AM, Jens Axboe wrote:
>>
>> Bah. Does the below fix it up for you?
>
> Grr. This is still bullshit.
>
> Doing this:
>
>     alignment = sector << 9;
>
> is fundamentally crap, because 'sector_t' may well be 32-bit
> (non-large-block device case). And we're supposed (surprise surprise)
> to be able to handle devices larger than 4GB in size.
>
> So doing *any* of these calculations in bytes is pure and utter crap.
> You need to do them in sectors. That's what "sector_t" means, and
> that's damn well how everything should work. Anything that works in
> bytes is simply pure crap. And don't talk to me about 64-bit math and
> doing it in "u64" or "loff_t", that's just utterly moronic too.
>
> Besides, "sector_div()" is only sensible when you're looking for the
> remainder of a sector number. That's true in the first case (sector
> really is a sector number - it's the starting sector of the
> partition), but the source of alignment and granularity are actually
> just "unsigned int" (and that's in bytes, not sectors), so using
> sector_t afterwards is crazy too. You should have used just '%'.
> Looking around, there are other places where this idiocy happens too
> (blkdev_issue_discard() seems to think the granularity/alignments are
> sector_t's too, for example).
>
> Anyway, here's a patch to fix the crazy types and the bogus second
> "sector_div()". It's whitespace-damaged, because not only have I not
> tested it, I also think somebody needs to look at things in general.
> The whole "discard_alignment" handling is extremely odd. I don't think
> it should be called "alignment" at all - because it isn't. It's an
> alignment *offset*. Look at the normal (non-discard) case, where it's
> called "alignment_offset" like it should be.
>
> So the math is confused, the types are confused, and the naming is
> confused. Please, somebody check this out, because now *I* am
> confused.

It should all just be in sectors. The limits set are in bytes (could be
sectors too, but doesn't matter so much), but any interface operates in
sectors.

I'm happy with your proposed fix. I think you should shove it in there,
then I'll make sure we get it cleaned up for 3.9.

> And btw, that whole commit happened too f*cking late too. When I get a
> pull request, it should damn well have been tested already, and it
> should have been developed *before* the merge window started. Not the
> day before the pull request.

Of course it's been tested. Granted it got moved over too late (as 1 of
2 that did), but I've run the branch on a multitude of systems.
Apparently none of them hit the case of having a zero granularity
reported, so never hit the bug.

--
Jens Axboe
Re: [boot crash] Re: [GIT PULL[ block drivers bits for 3.8
On Tue, Dec 18, 2012 at 3:42 AM, Jens Axboe wrote:
>
> Bah. Does the below fix it up for you?

Grr. This is still bullshit.

Doing this:

    alignment = sector << 9;

is fundamentally crap, because 'sector_t' may well be 32-bit
(non-large-block device case). And we're supposed (surprise surprise)
to be able to handle devices larger than 4GB in size.

So doing *any* of these calculations in bytes is pure and utter crap.
You need to do them in sectors. That's what "sector_t" means, and
that's damn well how everything should work. Anything that works in
bytes is simply pure crap. And don't talk to me about 64-bit math and
doing it in "u64" or "loff_t", that's just utterly moronic too.

Besides, "sector_div()" is only sensible when you're looking for the
remainder of a sector number. That's true in the first case (sector
really is a sector number - it's the starting sector of the
partition), but the source of alignment and granularity are actually
just "unsigned int" (and that's in bytes, not sectors), so using
sector_t afterwards is crazy too. You should have used just '%'.
Looking around, there are other places where this idiocy happens too
(blkdev_issue_discard() seems to think the granularity/alignments are
sector_t's too, for example).

Anyway, here's a patch to fix the crazy types and the bogus second
"sector_div()". It's whitespace-damaged, because not only have I not
tested it, I also think somebody needs to look at things in general.
The whole "discard_alignment" handling is extremely odd. I don't think
it should be called "alignment" at all - because it isn't. It's an
alignment *offset*. Look at the normal (non-discard) case, where it's
called "alignment_offset" like it should be.

So the math is confused, the types are confused, and the naming is
confused. Please, somebody check this out, because now *I* am
confused.

And btw, that whole commit happened too f*cking late too. When I get a
pull request, it should damn well have been tested already, and it
should have been developed *before* the merge window started. Not the
day before the pull request.

I'm grumpy, because all of this code is UTTER SH*T, and it was sent to
me. Why?

               Linus

---
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index acb4f7bbbd32..c23cae25a0c0 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1188,14 +1188,25 @@ static inline int queue_discard_alignment(struct request_queue *q)
 
 static inline int queue_limit_discard_alignment(struct queue_limits *lim, sector_t sector)
 {
-	sector_t alignment = sector << 9;
-	alignment = sector_div(alignment, lim->discard_granularity);
+	/* Why are these in bytes, not sectors? */
+	unsigned int alignment, granularity, offset;
 
 	if (!lim->max_discard_sectors)
 		return 0;
 
-	alignment = lim->discard_granularity + lim->discard_alignment - alignment;
-	return sector_div(alignment, lim->discard_granularity);
+	alignment = lim->discard_alignment >> 9;
+	granularity = lim->discard_granularity >> 9;
+	if (!alignment || !granularity)
+		return 0;
+
+	/* Offset of the partition start in 'granularity' sectors */
+	offset = sector_div(sector, granularity);
+
+	/* And why do we do this modulus *again* in blkdev_issue_discard()? */
+	offset = (granularity + alignment - offset) % granularity;
+
+	/* Turn it back into bytes, gaah */
+	return offset << 9;
 }
 
 static inline int bdev_discard_alignment(struct block_device *bdev)
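To make the arithmetic in the patch above easier to follow, here is a userspace model of the patched function. It is a sketch only: `struct discard_limits` and `discard_alignment_for()` are hypothetical names that mirror the three `queue_limits` fields the patch reads, and a plain `%` on a 64-bit sector number stands in for the kernel's `sector_div()`, which is an assumption made so the example builds outside the kernel.

```c
#include <stdint.h>

/* Hypothetical mirror of the queue_limits fields used by the patch. */
struct discard_limits {
	unsigned int max_discard_sectors;
	unsigned int discard_granularity;  /* bytes */
	unsigned int discard_alignment;    /* bytes */
};

/* Userspace model of the patched queue_limit_discard_alignment(). */
static int discard_alignment_for(const struct discard_limits *lim,
				 uint64_t sector)
{
	unsigned int alignment, granularity, offset;

	/* No discard support: nothing to align. */
	if (!lim->max_discard_sectors)
		return 0;

	/* Convert the byte-based limits to 512-byte sectors. */
	alignment = lim->discard_alignment >> 9;
	granularity = lim->discard_granularity >> 9;
	if (!alignment || !granularity)
		return 0;

	/* Offset of the partition start within one granularity unit. */
	offset = (unsigned int)(sector % granularity);

	/* Distance from that offset to the device's alignment offset. */
	offset = (granularity + alignment - offset) % granularity;

	/* Back to bytes, as the callers expect. */
	return (int)(offset << 9);
}
```

With a 4096-byte granularity and a 512-byte alignment offset, a partition starting at sector 3 gives (8 + 1 - 3) % 8 = 6 sectors, so the model returns 3072 bytes. A zero granularity now returns 0 instead of dividing by zero.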
Re: [boot crash] Re: [GIT PULL[ block drivers bits for 3.8
On 2012-12-18 10:25, Ingo Molnar wrote:
>
> * Jens Axboe wrote:
>
>> Hi Linus,
>>
>> Now that the core bits are in, here are the driver bits for 3.8. The
>> branch contains:
>
> FYI, I'm getting a divide-by-zero boot crash (serial log capture
> below) with the attached config.
>
> Reproduced with 848b81415c42.
>
> The bug might have gone upstream between 8874e81 (Linus's tree
> from yesterday) and 848b81415c42 (Linus's tree from today). Or
> it's from earlier and I only triggered it today.
>
> ( Note that every log line is duplicated, haven't tracked that
>   down yet, earlyprintk=,keep might be busted. )

Bah. Does the below fix it up for you?

diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index acb4f7b..067f195 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1188,12 +1188,13 @@ static inline int queue_discard_alignment(struct request_queue *q)
 
 static inline int queue_limit_discard_alignment(struct queue_limits *lim, sector_t sector)
 {
-	sector_t alignment = sector << 9;
-	alignment = sector_div(alignment, lim->discard_granularity);
+	sector_t alignment;
 
-	if (!lim->max_discard_sectors)
+	if (!lim->max_discard_sectors || !lim->discard_granularity)
 		return 0;
 
+	alignment = sector << 9;
+	alignment = sector_div(alignment, lim->discard_granularity);
 	alignment = lim->discard_granularity + lim->discard_alignment - alignment;
 	return sector_div(alignment, lim->discard_granularity);
 }

--
Jens Axboe