Re: [boot crash] Re: [GIT PULL[ block drivers bits for 3.8

2012-12-19 Thread Jens Axboe
On 2012-12-19 17:29, Linus Torvalds wrote:
>> Of course it's been tested. Granted it got moved over too late (as 1 of
>> 2 that did), but I've run the branch on a multitude of systems.
>> Apparently none of them hit the case of having a zero granularity
>> reported, so never hit the bug.
> 
> It presumably happens on pretty much anything that doesn't have
> discard. Of course, I've personally gotten rid of any rotating devices
> I have, but it still sounds like there's a big testing hole somewhere.

It doesn't, though it seems so. Otherwise I definitely would have seen
it. It only happens if discard max sectors is set, but alignment isn't.
I suspect because that first divide is ordered after the
!max_discard_sectors check. At least here.

And I suspect we would have seen a lot more reports if it DID trigger
on anything that didn't have discard :-)
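
(Illustration, not part of the original mail.) A minimal userspace sketch of the two orderings being contrasted above, assuming the pre-fix function body quoted elsewhere in this thread, where the divisor is `discard_granularity`. `struct limits_model` and both predicate names are hypothetical; they only model when a zero divisor would be reached and never actually divide.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the three queue_limits fields discussed here. */
struct limits_model {
    unsigned int max_discard_sectors;
    unsigned int discard_granularity;  /* bytes */
    unsigned int discard_alignment;    /* bytes */
};

/*
 * Source order of the pre-fix code: the sector_div() by discard_granularity
 * comes before the !max_discard_sectors early return, so a zero granularity
 * is always reached.
 */
static bool traps_in_source_order(const struct limits_model *lim)
{
    return lim->discard_granularity == 0;
}

/*
 * The suspicion voiced above: the compiler placed that first divide below
 * the early return, so a zero granularity is only reached when discard is
 * enabled (max_discard_sectors != 0).
 */
static bool traps_if_divide_follows_check(const struct limits_model *lim)
{
    if (!lim->max_discard_sectors)
        return false;
    return lim->discard_granularity == 0;
}
```

For a device with no discard support at all (all three fields zero), only the first predicate is true. For a device with `max_discard_sectors` set and a zero granularity, both are, which is the combination described above.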

-- 
Jens Axboe

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [boot crash] Re: [GIT PULL[ block drivers bits for 3.8

2012-12-19 Thread Doug Anderson
On Wed, Dec 19, 2012 at 8:29 AM, Linus Torvalds wrote:
> It presumably happens on pretty much anything that doesn't have
> discard. Of course, I've personally gotten rid of any rotating devices
> I have, but it still sounds like there's a big testing hole somewhere.

For what it's worth, I have tested Linus's patch on my ARM Chromebook
(which was reproducing the divide by 0 yesterday).  With the patch the
system has no divide by 0 and still boots fine.

Interestingly enough the divide by 0 appeared to happen at probe time
once for each partition of both the internal eMMC and the external SD
card.  Later in the boot sequence the code runs with a non-zero
discard.  I'm not familiar enough with this part of the kernel to
speculate why this system behaves differently than the systems that
Jens tested on.


Re: [boot crash] Re: [GIT PULL[ block drivers bits for 3.8

2012-12-19 Thread Linus Torvalds
On Wed, Dec 19, 2012 at 6:47 AM, Jens Axboe wrote:
>
> It should all just be in sectors. The limits set are in bytes (could be
> sectors too, but doesn't matter so much), but any interface operates in
> sectors.
>
> I'm happy with your proposed fix. I think you should shove it in there,
> then I'll make sure we get it cleaned up for 3.9.

Ok, committed and pushed out. Neil, can you please test it on whatever
raid setting you can have?

In particular, it would be interesting to make sure it works correctly
even without LBD support on 32-bit devices on partitions that start
more than 4GB into the device, because that's the case I think was
broken before.

> Of course it's been tested. Granted it got moved over too late (as 1 of
> 2 that did), but I've run the branch on a multitude of systems.
> Apparently none of them hit the case of having a zero granularity
> reported, so never hit the bug.

It presumably happens on pretty much anything that doesn't have
discard. Of course, I've personally gotten rid of any rotating devices
I have, but it still sounds like there's a big testing hole somewhere.

  Linus


Re: [boot crash] Re: [GIT PULL[ block drivers bits for 3.8

2012-12-19 Thread Jens Axboe
On 2012-12-18 17:49, Linus Torvalds wrote:
> On Tue, Dec 18, 2012 at 3:42 AM, Jens Axboe wrote:
>>
>> Bah. Does the below fix it up for you?
> 
> Grr. This is still bullshit.
> 
> Doing this:
> 
> alignment = sector << 9;
> 
> is fundamentally crap, because 'sector_t' may well be 32-bit
> (non-large-block device case). And we're supposed (surprise surprise)
> to be able to handle devices larger than 4GB in size.
> 
> So doing *any* of these calculations in bytes is pure and utter crap.
> You need to do them in sectors. That's what "sector_t" means, and
> that's damn well how everything should work. Anything that works in
> bytes is simply pure crap. And don't talk to me about 64-bit math and
> doing it in "u64" or "loff_t", that's just utterly moronic too.
> 
> Besides, "sector_div()" is only sensible when you're looking for the
> remainder of a sector number. That's true in the first case (sector
> really is a sector number - it's the starting sector of the
> partition), but the source of alignment and granularity are actually
> just "unsigned int" (and that's in bytes, not sectors), so using
> sector_t afterwards is crazy too. You should have used just '%'.
> Looking around, there are other places where this idiocy happens too
> (blkdev_issue_discard() seems to think the granularity/alignments are
> sector_t's too, for example).
> 
> Anyway, here's a patch to fix the crazy types and the bogus second
> "sector_div()". It's whitespace-damaged, because not only have I not
> tested it, I also think somebody needs to look at things in general.
> The whole "discard_alignment" handling is extremely odd. I don't think
> it should be called "alignment" at all - because it isn't. It's an
> alignment *offset*. Look at the normal (non-discard) case, where it's
> called "alignment_offset" like it should be.
> 
> So the math is confused, the types are confused, and the naming is
> confused. Please, somebody check this out, because now *I* am
> confused.

It should all just be in sectors. The limits set are in bytes (could be
sectors too, but doesn't matter so much), but any interface operates in
sectors.

I'm happy with your proposed fix. I think you should shove it in there,
then I'll make sure we get it cleaned up for 3.9.

> And btw, that whole commit happened too f*cking late too. When I get a
> pull request, it should damn well have been tested already, and it
> should have been developed *before* the merge window started. Not the
> day before the pull request.

Of course it's been tested. Granted it got moved over too late (as 1 of
2 that did), but I've run the branch on a multitude of systems.
Apparently none of them hit the case of having a zero granularity
reported, so never hit the bug.

-- 
Jens Axboe



Re: [boot crash] Re: [GIT PULL[ block drivers bits for 3.8

2012-12-18 Thread Linus Torvalds
On Tue, Dec 18, 2012 at 3:42 AM, Jens Axboe wrote:
>
> Bah. Does the below fix it up for you?

Grr. This is still bullshit.

Doing this:

alignment = sector << 9;

is fundamentally crap, because 'sector_t' may well be 32-bit
(non-large-block device case). And we're supposed (surprise surprise)
to be able to handle devices larger than 4GB in size.
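
(Illustration, not part of the original mail.) To make the overflow concrete, here is a minimal userspace sketch, assuming 512-byte sectors and modelling a 32-bit `sector_t` with `uint32_t`. The helper names are hypothetical. Sector 2^23 = 8388608 sits exactly 4 GiB into the device, and shifting it left by 9 in 32 bits wraps to 0.

```c
#include <stdint.h>

/* Byte offset of a sector when sector_t is modelled as 32 bits wide. */
static uint32_t sector_to_bytes_32(uint32_t sector)
{
    return sector << 9;   /* the upper bits are silently dropped */
}

/* Same conversion in a 64-bit type, shown only for comparison. */
static uint64_t sector_to_bytes_64(uint64_t sector)
{
    return sector << 9;
}
```

`sector_to_bytes_32(8388608)` yields 0, while `sector_to_bytes_64(8388608)` yields 4294967296. The 64-bit helper is there only to show what was lost; the approach argued for in this mail is to stay in sectors rather than widen the type.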

So doing *any* of these calculations in bytes is pure and utter crap.
You need to do them in sectors. That's what "sector_t" means, and
that's damn well how everything should work. Anything that works in
bytes is simply pure crap. And don't talk to me about 64-bit math and
doing it in "u64" or "loff_t", that's just utterly moronic too.

Besides, "sector_div()" is only sensible when you're looking for the
remainder of a sector number. That's true in the first case (sector
really is a sector number - it's the starting sector of the
partition), but the source of alignment and granularity are actually
just "unsigned int" (and that's in bytes, not sectors), so using
sector_t afterwards is crazy too. You should have used just '%'.
Looking around, there are other places where this idiocy happens too
(blkdev_issue_discard() seems to think the granularity/alignments are
sector_t's too, for example).

Anyway, here's a patch to fix the crazy types and the bogus second
"sector_div()". It's whitespace-damaged, because not only have I not
tested it, I also think somebody needs to look at things in general.
The whole "discard_alignment" handling is extremely odd. I don't think
it should be called "alignment" at all - because it isn't. It's an
alignment *offset*. Look at the normal (non-discard) case, where it's
called "alignment_offset" like it should be.

So the math is confused, the types are confused, and the naming is
confused. Please, somebody check this out, because now *I* am
confused.

And btw, that whole commit happened too f*cking late too. When I get a
pull request, it should damn well have been tested already, and it
should have been developed *before* the merge window started. Not the
day before the pull request.

I'm grumpy, because all of this code is UTTER SH*T, and it was sent to me. Why?

Linus

---
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index acb4f7bbbd32..c23cae25a0c0 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1188,14 +1188,25 @@ static inline int queue_discard_alignment(struct request_queue *q)

 static inline int queue_limit_discard_alignment(struct queue_limits *lim, sector_t sector)
 {
-   sector_t alignment = sector << 9;
-   alignment = sector_div(alignment, lim->discard_granularity);
+   /* Why are these in bytes, not sectors? */
+   unsigned int alignment, granularity, offset;

    if (!lim->max_discard_sectors)
        return 0;

-   alignment = lim->discard_granularity + lim->discard_alignment - alignment;
-   return sector_div(alignment, lim->discard_granularity);
+   alignment = lim->discard_alignment >> 9;
+   granularity = lim->discard_granularity >> 9;
+   if (!alignment || !granularity)
+       return 0;
+
+   /* Offset of the partition start in 'granularity' sectors */
+   offset = sector_div(sector, granularity);
+
+   /* And why do we do this modulus *again* in blkdev_issue_discard()? */
+   offset = (granularity + alignment - offset) % granularity;
+
+   /* Turn it back into bytes, gaah */
+   return offset << 9;
 }

 static inline int bdev_discard_alignment(struct block_device *bdev)
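
(Illustration, not part of the original mail.) The arithmetic in the patch above can be tried out in userspace with a sketch like the following. It is a model under assumptions, not the kernel function: `struct discard_limits` and `discard_alignment_model()` are hypothetical names, the sector is passed as `uint64_t`, and plain `%` stands in for the remainder that `sector_div()` returns.

```c
#include <stdint.h>

/* Hypothetical stand-in for the queue_limits fields the patch reads. */
struct discard_limits {
    unsigned int max_discard_sectors;
    unsigned int discard_granularity;  /* bytes */
    unsigned int discard_alignment;    /* bytes */
};

/* Userspace model of the patched queue_limit_discard_alignment(). */
static int discard_alignment_model(const struct discard_limits *lim,
                                   uint64_t sector)
{
    unsigned int alignment, granularity, offset;

    if (!lim->max_discard_sectors)
        return 0;

    /* Convert the byte-based limits to 512-byte sectors. */
    alignment = lim->discard_alignment >> 9;
    granularity = lim->discard_granularity >> 9;
    if (!alignment || !granularity)
        return 0;

    /* Remainder of the partition start within one granularity unit. */
    offset = (unsigned int)(sector % granularity);

    /* Distance from the partition start to the next aligned boundary. */
    offset = (granularity + alignment - offset) % granularity;

    /* Back to bytes, as the callers expect. */
    return (int)(offset << 9);
}
```

With a granularity of 4096 bytes (8 sectors) and an alignment of 512 bytes (1 sector), a partition starting at sector 10 gives (8 + 1 - 2) % 8 = 7 sectors, so the model returns 3584 bytes. A start 4 GiB further in, at sector 8388618, gives the same result because every intermediate value stays in sectors.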


Re: [boot crash] Re: [GIT PULL[ block drivers bits for 3.8

2012-12-18 Thread Jens Axboe
On 2012-12-18 10:25, Ingo Molnar wrote:
> 
> * Jens Axboe wrote:
> 
>> Hi Linus,
>>
>> Now that the core bits are in, here are the driver bits for 3.8. The
>> branch contains:
> 
> FYI, I'm getting a divide-by-zero boot crash (serial log capture 
> below) with the attached config.
> 
> Reproduced with 848b81415c42.
> 
> The bug might have gone upstream between 8874e81 (Linus's tree 
> from yesterday) and 848b81415c42 (Linus's tree from today). Or 
> it's from earlier and I only triggered it today.
> 
> ( Note that every log line is duplicated, haven't tracked that
>   down yet, earlyprintk=,keep might be busted. )

Bah. Does the below fix it up for you?

diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index acb4f7b..067f195 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1188,12 +1188,13 @@ static inline int queue_discard_alignment(struct request_queue *q)
 
 static inline int queue_limit_discard_alignment(struct queue_limits *lim, sector_t sector)
 {
-   sector_t alignment = sector << 9;
-   alignment = sector_div(alignment, lim->discard_granularity);
+   sector_t alignment;
 
-   if (!lim->max_discard_sectors)
+   if (!lim->max_discard_sectors || !lim->discard_granularity)
        return 0;
 
+   alignment = sector << 9;
+   alignment = sector_div(alignment, lim->discard_granularity);
    alignment = lim->discard_granularity + lim->discard_alignment - alignment;
    return sector_div(alignment, lim->discard_granularity);
 }

-- 
Jens Axboe
