Re: kernel BUG at drivers/scsi/scsi_lib.c:1096!

2015-12-04 Thread Takashi Iwai
On Fri, 04 Dec 2015 18:02:58 +0100,
Jens Axboe wrote:
> 
> On 12/04/2015 09:59 AM, Takashi Iwai wrote:
> > On Wed, 25 Nov 2015 20:01:47 +0100,
> > Hannes Reinecke wrote:
> >>
> >> On 11/25/2015 07:01 PM, Mike Snitzer wrote:
> >>> On Wed, Nov 25 2015 at  4:04am -0500,
> >>> Hannes Reinecke  wrote:
> >>>
>  On 11/20/2015 04:28 PM, Ewan Milne wrote:
> > On Fri, 2015-11-20 at 15:55 +0100, Hannes Reinecke wrote:
> >> Can't we have a joint effort here?
> >> I've been spending a _LOT_ of time trying to debug things here, but
> >> none of the ideas I've come up with have been able to fix anything.
> >
> > Yes.  I'm not the one primarily looking at it, and we don't have a
> > reproducer in-house.  We just have the one dump right now.
> >
> >>
> >> I'm almost tempted to increase the count from scsi_alloc_sgtable()
> >> by one and be done with ...
> >>
> >
> > That might not fix it if it is a problem with the merge code, though.
> >
>  And indeed, it doesn't.
> >>>
> >>> How did you arrive at that?  Do you have a reproducer now?
> >>>
> >> Not a reproducer, but several dumps for analysis.
> >>
>  Seems I finally found the culprit.
> 
>  What happens is this:
>  We have two paths, with these seg_boundary_masks:
> 
>  path-1:seg_boundary_mask = 65535,
>  path-2:seg_boundary_mask = 4294967295,
> 
>  consequently the DM request queue has this:
> 
>  md-1:seg_boundary_mask = 65535,
> 
>  What happens now is that a request is being formatted, and sent
>  to path 2. During submission req->nr_phys_segments is formatted
>  with the limits of path 2, arriving at a count of 3.
>  Now the request gets retried on path 1, but as the NOMERGE request
>  flag is set req->nr_phys_segments is never updated.
>  But blk_rq_map_sg() ignores all counters, and just uses the
>  bi_vec directly, resulting in a count of 4 -> boom.
> 
>  So the culprit here is the NOMERGE flag,
> >>>
> >>> NOMERGE is always set in __blk_rq_prep_clone() for cloned requests.
> >>>
> >> Yes.
> >>
>  which is evaluated via
>  ->dm_dispatch_request()
>  ->blk_insert_cloned_request()
>    ->blk_rq_check_limits()
> >>>
> >>> blk_insert_cloned_request() is the only caller of blk_rq_check_limits();
> >>> anyway after reading your mail I'm still left wondering if your proposed
> >>> patch is correct.
> >>>
>  If the above assessment is correct, the following patch should
>  fix it:
> 
>  diff --git a/block/blk-core.c b/block/blk-core.c
>  index 801ced7..12cccd6 100644
>  --- a/block/blk-core.c
>  +++ b/block/blk-core.c
>  @@ -1928,7 +1928,7 @@ EXPORT_SYMBOL(submit_bio);
>  */
> int blk_rq_check_limits(struct request_queue *q, struct request *rq)
> {
>  -   if (!rq_mergeable(rq))
>  +   if (rq->cmd_type != REQ_TYPE_FS)
>    return 0;
> 
>    if (blk_rq_sectors(rq) > blk_queue_get_max_sectors(q,
>  rq->cmd_flags)) {
> 
> 
>  Mike? Jens?
>  Can you comment on it?
> >>>
> >>> You're not explaining the actual change in the patch very well; I think
> >>> you're correct but you're leaving the justification as an exercise to
> >>> the reviewer:
> >>>
> >>> blk_rq_check_limits() will call blk_recalc_rq_segments() after the
> >>> !rq_mergeable() check but you're saying for this case in question we
> >>> never get there -- due to the cloned request having NOMERGE set.
> >>>
> >>> So in blk_rq_check_limits() you've unrolled rq_mergeable() and
> >>> open-coded the lone remaining check (rq->cmd_type != REQ_TYPE_FS)
> >>>
> >>> I agree that the (rq->cmd_flags & REQ_NOMERGE_FLAGS) check in
> >>> the blk_insert_cloned_request() call-chain (via rq_mergeable()) makes no
> >>> sense for cloned requests that always have NOMERGE set.
> >>>
> >>> So you're saying that by having blk_rq_check_limits() go on to call
> >>> blk_recalc_rq_segments() this bug will be fixed?
> >>>
> >> That is the idea.
> >>
> >> I've already established that in all instances I have seen so far
> >> req->nr_phys_segments is _less_ than req->bio->bi_phys_segments.
> >>
> >> As it turns out, req->nr_phys_segemnts _would_ have been updated in
> >> blk_rq_check_limits(), but isn't due to the NOMERGE flag being set
> >> for the cloned request.
> >> So each cloned request inherits the values from the original request,
> >> despite the fact that req->nr_phys_segments _has_ to be evaluated in
> >> the final request_queue context, as the queue limits _might_ be
> >> different from the original (merged) queue limits of the multipath
> >> request queue.
> >>
> >>> BTW, I think blk_rq_check_limits()'s export should be removed and the
> >>> function made static and renamed to blk_clone_rq_check_limits(), again:
> >>> blk_insert_cloned_request() is the only caller of blk_rq_check_limits()
> >>>
> >> Actually, seeing Jens'

Re: kernel BUG at drivers/scsi/scsi_lib.c:1096!

2015-12-04 Thread Jens Axboe

On 12/04/2015 09:59 AM, Takashi Iwai wrote:

On Wed, 25 Nov 2015 20:01:47 +0100,
Hannes Reinecke wrote:


On 11/25/2015 07:01 PM, Mike Snitzer wrote:

On Wed, Nov 25 2015 at  4:04am -0500,
Hannes Reinecke  wrote:


On 11/20/2015 04:28 PM, Ewan Milne wrote:

On Fri, 2015-11-20 at 15:55 +0100, Hannes Reinecke wrote:

Can't we have a joint effort here?
I've been spending a _LOT_ of time trying to debug things here, but
none of the ideas I've come up with have been able to fix anything.


Yes.  I'm not the one primarily looking at it, and we don't have a
reproducer in-house.  We just have the one dump right now.



I'm almost tempted to increase the count from scsi_alloc_sgtable()
by one and be done with ...



That might not fix it if it is a problem with the merge code, though.


And indeed, it doesn't.


How did you arrive at that?  Do you have a reproducer now?


Not a reproducer, but several dumps for analysis.


Seems I finally found the culprit.

What happens is this:
We have two paths, with these seg_boundary_masks:

path-1:seg_boundary_mask = 65535,
path-2:seg_boundary_mask = 4294967295,

consequently the DM request queue has this:

md-1:seg_boundary_mask = 65535,

What happens now is that a request is being formatted, and sent
to path 2. During submission req->nr_phys_segments is formatted
with the limits of path 2, arriving at a count of 3.
Now the request gets retried on path 1, but as the NOMERGE request
flag is set req->nr_phys_segments is never updated.
But blk_rq_map_sg() ignores all counters, and just uses the
bi_vec directly, resulting in a count of 4 -> boom.

So the culprit here is the NOMERGE flag,


NOMERGE is always set in __blk_rq_prep_clone() for cloned requests.


Yes.


which is evaluated via
->dm_dispatch_request()
->blk_insert_cloned_request()
  ->blk_rq_check_limits()


blk_insert_cloned_request() is the only caller of blk_rq_check_limits();
anyway after reading your mail I'm still left wondering if your proposed
patch is correct.


If the above assessment is correct, the following patch should
fix it:

diff --git a/block/blk-core.c b/block/blk-core.c
index 801ced7..12cccd6 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1928,7 +1928,7 @@ EXPORT_SYMBOL(submit_bio);
*/
   int blk_rq_check_limits(struct request_queue *q, struct request *rq)
   {
-   if (!rq_mergeable(rq))
+   if (rq->cmd_type != REQ_TYPE_FS)
  return 0;

  if (blk_rq_sectors(rq) > blk_queue_get_max_sectors(q,
rq->cmd_flags)) {


Mike? Jens?
Can you comment on it?


You're not explaining the actual change in the patch very well; I think
you're correct but you're leaving the justification as an exercise to
the reviewer:

blk_rq_check_limits() will call blk_recalc_rq_segments() after the
!rq_mergeable() check but you're saying for this case in question we
never get there -- due to the cloned request having NOMERGE set.

So in blk_rq_check_limits() you've unrolled rq_mergeable() and
open-coded the lone remaining check (rq->cmd_type != REQ_TYPE_FS)

I agree that the (rq->cmd_flags & REQ_NOMERGE_FLAGS) check in
the blk_insert_cloned_request() call-chain (via rq_mergeable()) makes no
sense for cloned requests that always have NOMERGE set.

So you're saying that by having blk_rq_check_limits() go on to call
blk_recalc_rq_segments() this bug will be fixed?


That is the idea.

I've already established that in all instances I have seen so far
req->nr_phys_segments is _less_ than req->bio->bi_phys_segments.

As it turns out, req->nr_phys_segemnts _would_ have been updated in
blk_rq_check_limits(), but isn't due to the NOMERGE flag being set
for the cloned request.
So each cloned request inherits the values from the original request,
despite the fact that req->nr_phys_segments _has_ to be evaluated in
the final request_queue context, as the queue limits _might_ be
different from the original (merged) queue limits of the multipath
request queue.


BTW, I think blk_rq_check_limits()'s export should be removed and the
function made static and renamed to blk_clone_rq_check_limits(), again:
blk_insert_cloned_request() is the only caller of blk_rq_check_limits()


Actually, seeing Jens' last comment the check for REQ_TYPE_FS is
pointless, too, so we might as well remove the entire if-clause.


Seems prudent to make that change now to be clear that this code is only
used by cloned requests.


Yeah, that would make sense. I'll be preparing a patch.
With a more detailed description :-)


Do we have already a fix?  Right now I got (likely) this kernel BUG()
on the almost latest Linus tree (commit 25364a9e54fb8296).  It
happened while I started a KVM right after a fresh boot.  The machine
paniced even before that, so I hit this twice today.


Update to the tree as-of yesterday (or today) and it should work. 
25364a9e54fb8296 doesn't include the latest block fixes that were sent 
in yesterday, that should fix it. You need commit a88d32af18b8 or newer.


--
Jens Axboe

--
To 

Re: kernel BUG at drivers/scsi/scsi_lib.c:1096!

2015-12-04 Thread Takashi Iwai
On Wed, 25 Nov 2015 20:01:47 +0100,
Hannes Reinecke wrote:
> 
> On 11/25/2015 07:01 PM, Mike Snitzer wrote:
> > On Wed, Nov 25 2015 at  4:04am -0500,
> > Hannes Reinecke  wrote:
> >
> >> On 11/20/2015 04:28 PM, Ewan Milne wrote:
> >>> On Fri, 2015-11-20 at 15:55 +0100, Hannes Reinecke wrote:
>  Can't we have a joint effort here?
>  I've been spending a _LOT_ of time trying to debug things here, but
>  none of the ideas I've come up with have been able to fix anything.
> >>>
> >>> Yes.  I'm not the one primarily looking at it, and we don't have a
> >>> reproducer in-house.  We just have the one dump right now.
> >>>
> 
>  I'm almost tempted to increase the count from scsi_alloc_sgtable()
>  by one and be done with ...
> 
> >>>
> >>> That might not fix it if it is a problem with the merge code, though.
> >>>
> >> And indeed, it doesn't.
> >
> > How did you arrive at that?  Do you have a reproducer now?
> >
> Not a reproducer, but several dumps for analysis.
> 
> >> Seems I finally found the culprit.
> >>
> >> What happens is this:
> >> We have two paths, with these seg_boundary_masks:
> >>
> >> path-1:seg_boundary_mask = 65535,
> >> path-2:seg_boundary_mask = 4294967295,
> >>
> >> consequently the DM request queue has this:
> >>
> >> md-1:seg_boundary_mask = 65535,
> >>
> >> What happens now is that a request is being formatted, and sent
> >> to path 2. During submission req->nr_phys_segments is formatted
> >> with the limits of path 2, arriving at a count of 3.
> >> Now the request gets retried on path 1, but as the NOMERGE request
> >> flag is set req->nr_phys_segments is never updated.
> >> But blk_rq_map_sg() ignores all counters, and just uses the
> >> bi_vec directly, resulting in a count of 4 -> boom.
> >>
> >> So the culprit here is the NOMERGE flag,
> >
> > NOMERGE is always set in __blk_rq_prep_clone() for cloned requests.
> >
> Yes.
> 
> >> which is evaluated via
> >> ->dm_dispatch_request()
> >>->blk_insert_cloned_request()
> >>  ->blk_rq_check_limits()
> >
> > blk_insert_cloned_request() is the only caller of blk_rq_check_limits();
> > anyway after reading your mail I'm still left wondering if your proposed
> > patch is correct.
> >
> >> If the above assessment is correct, the following patch should
> >> fix it:
> >>
> >> diff --git a/block/blk-core.c b/block/blk-core.c
> >> index 801ced7..12cccd6 100644
> >> --- a/block/blk-core.c
> >> +++ b/block/blk-core.c
> >> @@ -1928,7 +1928,7 @@ EXPORT_SYMBOL(submit_bio);
> >>*/
> >>   int blk_rq_check_limits(struct request_queue *q, struct request *rq)
> >>   {
> >> -   if (!rq_mergeable(rq))
> >> +   if (rq->cmd_type != REQ_TYPE_FS)
> >>  return 0;
> >>
> >>  if (blk_rq_sectors(rq) > blk_queue_get_max_sectors(q,
> >> rq->cmd_flags)) {
> >>
> >>
> >> Mike? Jens?
> >> Can you comment on it?
> >
> > You're not explaining the actual change in the patch very well; I think
> > you're correct but you're leaving the justification as an exercise to
> > the reviewer:
> >
> > blk_rq_check_limits() will call blk_recalc_rq_segments() after the
> > !rq_mergeable() check but you're saying for this case in question we
> > never get there -- due to the cloned request having NOMERGE set.
> >
> > So in blk_rq_check_limits() you've unrolled rq_mergeable() and
> > open-coded the lone remaining check (rq->cmd_type != REQ_TYPE_FS)
> >
> > I agree that the (rq->cmd_flags & REQ_NOMERGE_FLAGS) check in
> > the blk_insert_cloned_request() call-chain (via rq_mergeable()) makes no
> > sense for cloned requests that always have NOMERGE set.
> >
> > So you're saying that by having blk_rq_check_limits() go on to call
> > blk_recalc_rq_segments() this bug will be fixed?
> >
> That is the idea.
> 
> I've already established that in all instances I have seen so far
> req->nr_phys_segments is _less_ than req->bio->bi_phys_segments.
> 
> As it turns out, req->nr_phys_segemnts _would_ have been updated in
> blk_rq_check_limits(), but isn't due to the NOMERGE flag being set
> for the cloned request.
> So each cloned request inherits the values from the original request,
> despite the fact that req->nr_phys_segments _has_ to be evaluated in
> the final request_queue context, as the queue limits _might_ be 
> different from the original (merged) queue limits of the multipath
> request queue.
> 
> > BTW, I think blk_rq_check_limits()'s export should be removed and the
> > function made static and renamed to blk_clone_rq_check_limits(), again:
> > blk_insert_cloned_request() is the only caller of blk_rq_check_limits()
> >
> Actually, seeing Jens' last comment the check for REQ_TYPE_FS is 
> pointless, too, so we might as well remove the entire if-clause.
> 
> > Seems prudent to make that change now to be clear that this code is only
> > used by cloned requests.
> >
> Yeah, that would make sense. I'll be preparing a patch.
> With a more detailed description :-)

Do we have already a fix?  Right now I got (likely) 

Re: kernel BUG at drivers/scsi/scsi_lib.c:1096!

2015-11-25 Thread Mike Snitzer
On Wed, Nov 25 2015 at  3:23pm -0500,
Mike Snitzer  wrote:

> On Wed, Nov 25 2015 at  2:24pm -0500,
> Jens Axboe  wrote:
> 
> > On 11/25/2015 12:10 PM, Hannes Reinecke wrote:
> > >The problem is that NOMERGE does too much, as it inhibits _any_ merging.
> > 
> > Right, that is the point of the flag from the block layer view,
> > where it was originally added for the case mentioned.
> 
> And we really don't want _any_ merging.  The merging, if any, will have
> already happened in upper DM-multipath's elevator.  So there should be
> no need to have the underlying SCSI paths do any merging.
> 
> > >Unfortunately, the req->nr_phys_segments value is evaluated in the final
> > >_driver_ context _after_ the merging happend; cf
> > >scsi_lib.c:scsi_init_sgtable().
> > >As nr_phys_segments is inherited from the original request (and never
> > >recalculated with the new request queue limits) the following
> > >blk_rq_map_sg() call might end up at a different calculation, especially
> > >after retrying a request on another path.
> > 
> > That all sounds pretty horrible. Why is blk_rq_check_limits()
> > checking for mergeable at all? If merging is disabled on the
> > request, I'm assuming that's an attempt at an optimization since we
> > know it won't change. But that should be tracked separately, like
> > how it's done on the bio.
> 
> Not clear to me why it was checking for merging...

Ewan pointed out that blk_rq_check_limits()'s call to rq_mergable() was
introduced as part of Martin's DISCARD cleanup that prepared for
WRITE_SAME, see: commit e2a60da74 ("block: Clean up special command handling 
logic")

And prior to that, blk_rq_check_limits()'s (rq->cmd_flags & REQ_DISCARD)
check was introduced by some guy crazy doppelganger name "Ike Snitzer",
see commit: 3383977f ("block: update request stacking methods to support 
discards")

So basically we probably never needed the extra check in
blk_rq_check_limits() to begin with.. Ike was probably paranoid. ;)
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: kernel BUG at drivers/scsi/scsi_lib.c:1096!

2015-11-25 Thread Mike Snitzer
On Wed, Nov 25 2015 at  2:24pm -0500,
Jens Axboe  wrote:

> On 11/25/2015 12:10 PM, Hannes Reinecke wrote:
> >On 11/25/2015 06:56 PM, Jens Axboe wrote:
> >>On 11/25/2015 02:04 AM, Hannes Reinecke wrote:
> >>>On 11/20/2015 04:28 PM, Ewan Milne wrote:
> On Fri, 2015-11-20 at 15:55 +0100, Hannes Reinecke wrote:
> >Can't we have a joint effort here?
> >I've been spending a _LOT_ of time trying to debug things here, but
> >none of the ideas I've come up with have been able to fix anything.
> 
> Yes.  I'm not the one primarily looking at it, and we don't have a
> reproducer in-house.  We just have the one dump right now.
> 
> >
> >I'm almost tempted to increase the count from scsi_alloc_sgtable()
> >by one and be done with ...
> >
> 
> That might not fix it if it is a problem with the merge code, though.
> 
> >>>And indeed, it doesn't.
> >>>Seems I finally found the culprit.
> >>>
> >>>What happens is this:
> >>>We have two paths, with these seg_boundary_masks:
> >>>
> >>>path-1:seg_boundary_mask = 65535,
> >>>path-2:seg_boundary_mask = 4294967295,
> >>>
> >>>consequently the DM request queue has this:
> >>>
> >>>md-1:seg_boundary_mask = 65535,
> >>>
> >>>What happens now is that a request is being formatted, and sent
> >>>to path 2. During submission req->nr_phys_segments is formatted
> >>>with the limits of path 2, arriving at a count of 3.
> >>>Now the request gets retried on path 1, but as the NOMERGE request
> >>>flag is set req->nr_phys_segments is never updated.
> >>>But blk_rq_map_sg() ignores all counters, and just uses the
> >>>bi_vec directly, resulting in a count of 4 -> boom.
> >>>
> >>>So the culprit here is the NOMERGE flag, which is evaluated
> >>>via
> >>>->dm_dispatch_request()
> >>>   ->blk_insert_cloned_request()
> >>> ->blk_rq_check_limits()
> >>>
> >>>If the above assessment is correct, the following patch should
> >>>fix it:
> >>>
> >>>diff --git a/block/blk-core.c b/block/blk-core.c
> >>>index 801ced7..12cccd6 100644
> >>>--- a/block/blk-core.c
> >>>+++ b/block/blk-core.c
> >>>@@ -1928,7 +1928,7 @@ EXPORT_SYMBOL(submit_bio);
> >>>   */
> >>>  int blk_rq_check_limits(struct request_queue *q, struct request *rq)
> >>>  {
> >>>-   if (!rq_mergeable(rq))
> >>>+   if (rq->cmd_type != REQ_TYPE_FS)
> >>> return 0;
> >>>
> >>> if (blk_rq_sectors(rq) > blk_queue_get_max_sectors(q,
> >>>rq->cmd_flags)) {
> >>>
> >>>
> >>>Mike? Jens?
> >>>Can you comment on it?
> >>
> >>We only support merging on REQ_TYPE_FS already, so how is the above
> >>making it any different? In general, NOMERGE being set or not should not
> >>make a difference. It's only a hint that we need not check further if we
> >>should be merging on this request, since we already tried it once, found
> >>we'd exceed various limits, then set NOMERGE to reflect that.
> >>
> >The problem is that NOMERGE does too much, as it inhibits _any_ merging.
> 
> Right, that is the point of the flag from the block layer view,
> where it was originally added for the case mentioned.

And we really don't want _any_ merging.  The merging, if any, will have
already happened in upper DM-multipath's elevator.  So there should be
no need to have the underlying SCSI paths do any merging.

> >Unfortunately, the req->nr_phys_segments value is evaluated in the final
> >_driver_ context _after_ the merging happend; cf
> >scsi_lib.c:scsi_init_sgtable().
> >As nr_phys_segments is inherited from the original request (and never
> >recalculated with the new request queue limits) the following
> >blk_rq_map_sg() call might end up at a different calculation, especially
> >after retrying a request on another path.
> 
> That all sounds pretty horrible. Why is blk_rq_check_limits()
> checking for mergeable at all? If merging is disabled on the
> request, I'm assuming that's an attempt at an optimization since we
> know it won't change. But that should be tracked separately, like
> how it's done on the bio.

Not clear to me why it was checking for merging...
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: kernel BUG at drivers/scsi/scsi_lib.c:1096!

2015-11-25 Thread Jens Axboe

On 11/25/2015 12:10 PM, Hannes Reinecke wrote:

On 11/25/2015 06:56 PM, Jens Axboe wrote:

On 11/25/2015 02:04 AM, Hannes Reinecke wrote:

On 11/20/2015 04:28 PM, Ewan Milne wrote:

On Fri, 2015-11-20 at 15:55 +0100, Hannes Reinecke wrote:

Can't we have a joint effort here?
I've been spending a _LOT_ of time trying to debug things here, but
none of the ideas I've come up with have been able to fix anything.


Yes.  I'm not the one primarily looking at it, and we don't have a
reproducer in-house.  We just have the one dump right now.



I'm almost tempted to increase the count from scsi_alloc_sgtable()
by one and be done with ...



That might not fix it if it is a problem with the merge code, though.


And indeed, it doesn't.
Seems I finally found the culprit.

What happens is this:
We have two paths, with these seg_boundary_masks:

path-1:seg_boundary_mask = 65535,
path-2:seg_boundary_mask = 4294967295,

consequently the DM request queue has this:

md-1:seg_boundary_mask = 65535,

What happens now is that a request is being formatted, and sent
to path 2. During submission req->nr_phys_segments is formatted
with the limits of path 2, arriving at a count of 3.
Now the request gets retried on path 1, but as the NOMERGE request
flag is set req->nr_phys_segments is never updated.
But blk_rq_map_sg() ignores all counters, and just uses the
bi_vec directly, resulting in a count of 4 -> boom.

So the culprit here is the NOMERGE flag, which is evaluated
via
->dm_dispatch_request()
   ->blk_insert_cloned_request()
 ->blk_rq_check_limits()

If the above assessment is correct, the following patch should
fix it:

diff --git a/block/blk-core.c b/block/blk-core.c
index 801ced7..12cccd6 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1928,7 +1928,7 @@ EXPORT_SYMBOL(submit_bio);
   */
  int blk_rq_check_limits(struct request_queue *q, struct request *rq)
  {
-   if (!rq_mergeable(rq))
+   if (rq->cmd_type != REQ_TYPE_FS)
 return 0;

 if (blk_rq_sectors(rq) > blk_queue_get_max_sectors(q,
rq->cmd_flags)) {


Mike? Jens?
Can you comment on it?


We only support merging on REQ_TYPE_FS already, so how is the above
making it any different? In general, NOMERGE being set or not should not
make a difference. It's only a hint that we need not check further if we
should be merging on this request, since we already tried it once, found
we'd exceed various limits, then set NOMERGE to reflect that.


The problem is that NOMERGE does too much, as it inhibits _any_ merging.


Right, that is the point of the flag from the block layer view, where it 
was originally added for the case mentioned.



Unfortunately, the req->nr_phys_segments value is evaluated in the final
_driver_ context _after_ the merging happend; cf
scsi_lib.c:scsi_init_sgtable().
As nr_phys_segments is inherited from the original request (and never
recalculated with the new request queue limits) the following
blk_rq_map_sg() call might end up at a different calculation, especially
after retrying a request on another path.


That all sounds pretty horrible. Why is blk_rq_check_limits() checking 
for mergeable at all? If merging is disabled on the request, I'm 
assuming that's an attempt at an optimization since we know it won't 
change. But that should be tracked separately, like how it's done on the 
bio.



--
Jens Axboe

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: kernel BUG at drivers/scsi/scsi_lib.c:1096!

2015-11-25 Thread Hannes Reinecke

On 11/25/2015 06:56 PM, Jens Axboe wrote:

On 11/25/2015 02:04 AM, Hannes Reinecke wrote:

On 11/20/2015 04:28 PM, Ewan Milne wrote:

On Fri, 2015-11-20 at 15:55 +0100, Hannes Reinecke wrote:

Can't we have a joint effort here?
I've been spending a _LOT_ of time trying to debug things here, but
none of the ideas I've come up with have been able to fix anything.


Yes.  I'm not the one primarily looking at it, and we don't have a
reproducer in-house.  We just have the one dump right now.



I'm almost tempted to increase the count from scsi_alloc_sgtable()
by one and be done with ...



That might not fix it if it is a problem with the merge code, though.


And indeed, it doesn't.
Seems I finally found the culprit.

What happens is this:
We have two paths, with these seg_boundary_masks:

path-1:seg_boundary_mask = 65535,
path-2:seg_boundary_mask = 4294967295,

consequently the DM request queue has this:

md-1:seg_boundary_mask = 65535,

What happens now is that a request is being formatted, and sent
to path 2. During submission req->nr_phys_segments is formatted
with the limits of path 2, arriving at a count of 3.
Now the request gets retried on path 1, but as the NOMERGE request
flag is set req->nr_phys_segments is never updated.
But blk_rq_map_sg() ignores all counters, and just uses the
bi_vec directly, resulting in a count of 4 -> boom.

So the culprit here is the NOMERGE flag, which is evaluated
via
->dm_dispatch_request()
   ->blk_insert_cloned_request()
 ->blk_rq_check_limits()

If the above assessment is correct, the following patch should
fix it:

diff --git a/block/blk-core.c b/block/blk-core.c
index 801ced7..12cccd6 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1928,7 +1928,7 @@ EXPORT_SYMBOL(submit_bio);
   */
  int blk_rq_check_limits(struct request_queue *q, struct request *rq)
  {
-   if (!rq_mergeable(rq))
+   if (rq->cmd_type != REQ_TYPE_FS)
 return 0;

 if (blk_rq_sectors(rq) > blk_queue_get_max_sectors(q,
rq->cmd_flags)) {


Mike? Jens?
Can you comment on it?


We only support merging on REQ_TYPE_FS already, so how is the above
making it any different? In general, NOMERGE being set or not should not
make a difference. It's only a hint that we need not check further if we
should be merging on this request, since we already tried it once, found
we'd exceed various limits, then set NOMERGE to reflect that.


The problem is that NOMERGE does too much, as it inhibits _any_ merging.

Unfortunately, the req->nr_phys_segments value is evaluated in the final
_driver_ context _after_ the merging happend; cf 
scsi_lib.c:scsi_init_sgtable().

As nr_phys_segments is inherited from the original request (and never
recalculated with the new request queue limits) the following 
blk_rq_map_sg() call might end up at a different calculation, especially

after retrying a request on another path.

Cheers,

Hannes
--
Dr. Hannes ReineckezSeries & Storage
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: kernel BUG at drivers/scsi/scsi_lib.c:1096!

2015-11-25 Thread Hannes Reinecke

On 11/25/2015 07:01 PM, Mike Snitzer wrote:

On Wed, Nov 25 2015 at  4:04am -0500,
Hannes Reinecke  wrote:


On 11/20/2015 04:28 PM, Ewan Milne wrote:

On Fri, 2015-11-20 at 15:55 +0100, Hannes Reinecke wrote:

Can't we have a joint effort here?
I've been spending a _LOT_ of time trying to debug things here, but
none of the ideas I've come up with have been able to fix anything.


Yes.  I'm not the one primarily looking at it, and we don't have a
reproducer in-house.  We just have the one dump right now.



I'm almost tempted to increase the count from scsi_alloc_sgtable()
by one and be done with ...



That might not fix it if it is a problem with the merge code, though.


And indeed, it doesn't.


How did you arrive at that?  Do you have a reproducer now?


Not a reproducer, but several dumps for analysis.


Seems I finally found the culprit.

What happens is this:
We have two paths, with these seg_boundary_masks:

path-1:seg_boundary_mask = 65535,
path-2:seg_boundary_mask = 4294967295,

consequently the DM request queue has this:

md-1:seg_boundary_mask = 65535,

What happens now is that a request is being formatted, and sent
to path 2. During submission req->nr_phys_segments is formatted
with the limits of path 2, arriving at a count of 3.
Now the request gets retried on path 1, but as the NOMERGE request
flag is set req->nr_phys_segments is never updated.
But blk_rq_map_sg() ignores all counters, and just uses the
bi_vec directly, resulting in a count of 4 -> boom.

So the culprit here is the NOMERGE flag,


NOMERGE is always set in __blk_rq_prep_clone() for cloned requests.


Yes.


which is evaluated via
->dm_dispatch_request()
   ->blk_insert_cloned_request()
 ->blk_rq_check_limits()


blk_insert_cloned_request() is the only caller of blk_rq_check_limits();
anyway after reading your mail I'm still left wondering if your proposed
patch is correct.


If the above assessment is correct, the following patch should
fix it:

diff --git a/block/blk-core.c b/block/blk-core.c
index 801ced7..12cccd6 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1928,7 +1928,7 @@ EXPORT_SYMBOL(submit_bio);
   */
  int blk_rq_check_limits(struct request_queue *q, struct request *rq)
  {
-   if (!rq_mergeable(rq))
+   if (rq->cmd_type != REQ_TYPE_FS)
 return 0;

 if (blk_rq_sectors(rq) > blk_queue_get_max_sectors(q,
rq->cmd_flags)) {


Mike? Jens?
Can you comment on it?


You're not explaining the actual change in the patch very well; I think
you're correct but you're leaving the justification as an exercise to
the reviewer:

blk_rq_check_limits() will call blk_recalc_rq_segments() after the
!rq_mergeable() check but you're saying for this case in question we
never get there -- due to the cloned request having NOMERGE set.

So in blk_rq_check_limits() you've unrolled rq_mergeable() and
open-coded the lone remaining check (rq->cmd_type != REQ_TYPE_FS)

I agree that the (rq->cmd_flags & REQ_NOMERGE_FLAGS) check in
the blk_insert_cloned_request() call-chain (via rq_mergeable()) makes no
sense for cloned requests that always have NOMERGE set.

So you're saying that by having blk_rq_check_limits() go on to call
blk_recalc_rq_segments() this bug will be fixed?


That is the idea.

I've already established that in all instances I have seen so far
req->nr_phys_segments is _less_ than req->bio->bi_phys_segments.

As it turns out, req->nr_phys_segemnts _would_ have been updated in
blk_rq_check_limits(), but isn't due to the NOMERGE flag being set
for the cloned request.
So each cloned request inherits the values from the original request,
despite the fact that req->nr_phys_segments _has_ to be evaluated in
the final request_queue context, as the queue limits _might_ be 
different from the original (merged) queue limits of the multipath

request queue.


BTW, I think blk_rq_check_limits()'s export should be removed and the
function made static and renamed to blk_clone_rq_check_limits(), again:
blk_insert_cloned_request() is the only caller of blk_rq_check_limits()

Actually, seeing Jens' last comment the check for REQ_TYPE_FS is 
pointless, too, so we might as well remove the entire if-clause.



Seems prudent to make that change now to be clear that this code is only
used by cloned requests.


Yeah, that would make sense. I'll be preparing a patch.
With a more detailed description :-)

Cheers,

Hannes

--
Dr. Hannes ReineckezSeries & Storage
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: kernel BUG at drivers/scsi/scsi_lib.c:1096!

2015-11-25 Thread Mike Snitzer
On Wed, Nov 25 2015 at  4:04am -0500,
Hannes Reinecke  wrote:

> On 11/20/2015 04:28 PM, Ewan Milne wrote:
> > On Fri, 2015-11-20 at 15:55 +0100, Hannes Reinecke wrote:
> >> Can't we have a joint effort here?
> >> I've been spending a _LOT_ of time trying to debug things here, but
> >> none of the ideas I've come up with have been able to fix anything.
> > 
> > Yes.  I'm not the one primarily looking at it, and we don't have a
> > reproducer in-house.  We just have the one dump right now.
> > 
> >>
> >> I'm almost tempted to increase the count from scsi_alloc_sgtable()
> >> by one and be done with ...
> >>
> > 
> > That might not fix it if it is a problem with the merge code, though.
> > 
> And indeed, it doesn't.

How did you arrive at that?  Do you have a reproducer now?

> Seems I finally found the culprit.
> 
> What happens is this:
> We have two paths, with these seg_boundary_masks:
> 
> path-1:seg_boundary_mask = 65535,
> path-2:seg_boundary_mask = 4294967295,
> 
> consequently the DM request queue has this:
> 
> md-1:seg_boundary_mask = 65535,
> 
> What happens now is that a request is being formatted, and sent
> to path 2. During submission req->nr_phys_segments is formatted
> with the limits of path 2, arriving at a count of 3.
> Now the request gets retried on path 1, but as the NOMERGE request
> flag is set req->nr_phys_segments is never updated.
> But blk_rq_map_sg() ignores all counters, and just uses the
> bi_vec directly, resulting in a count of 4 -> boom.
> 
> So the culprit here is the NOMERGE flag,

NOMERGE is always set in __blk_rq_prep_clone() for cloned requests.

> which is evaluated via
> ->dm_dispatch_request()
>   ->blk_insert_cloned_request()
> ->blk_rq_check_limits()

blk_insert_cloned_request() is the only caller of blk_rq_check_limits();
anyway after reading your mail I'm still left wondering if your proposed
patch is correct.

> If the above assessment is correct, the following patch should
> fix it:
> 
> diff --git a/block/blk-core.c b/block/blk-core.c
> index 801ced7..12cccd6 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -1928,7 +1928,7 @@ EXPORT_SYMBOL(submit_bio);
>   */
>  int blk_rq_check_limits(struct request_queue *q, struct request *rq)
>  {
> -   if (!rq_mergeable(rq))
> +   if (rq->cmd_type != REQ_TYPE_FS)
> return 0;
> 
> if (blk_rq_sectors(rq) > blk_queue_get_max_sectors(q,
> rq->cmd_flags)) {
> 
> 
> Mike? Jens?
> Can you comment on it?

You're not explaining the actual change in the patch very well; I think
you're correct but you're leaving the justification as an exercise to 
the reviewer:

blk_rq_check_limits() will call blk_recalc_rq_segments() after the
!rq_mergeable() check but you're saying for this case in question we
never get there -- due to the cloned request having NOMERGE set.

So in blk_rq_check_limits() you've unrolled rq_mergeable() and
open-coded the lone remaining check (rq->cmd_type != REQ_TYPE_FS)

I agree that the (rq->cmd_flags & REQ_NOMERGE_FLAGS) check in
the blk_insert_cloned_request() call-chain (via rq_mergeable()) makes no
sense for cloned requests that always have NOMERGE set.

So you're saying that by having blk_rq_check_limits() go on to call
blk_recalc_rq_segments() this bug will be fixed?

BTW, I think blk_rq_check_limits()'s export should be removed and the
function made static and renamed to blk_clone_rq_check_limits(), again:
blk_insert_cloned_request() is the only caller of blk_rq_check_limits()

Seems prudent to make that change now to be clear that this code is only
used by cloned requests.

Thanks,
Mike
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: kernel BUG at drivers/scsi/scsi_lib.c:1096!

2015-11-25 Thread Jens Axboe

On 11/25/2015 02:04 AM, Hannes Reinecke wrote:

On 11/20/2015 04:28 PM, Ewan Milne wrote:

On Fri, 2015-11-20 at 15:55 +0100, Hannes Reinecke wrote:

Can't we have a joint effort here?
I've been spending a _LOT_ of time trying to debug things here, but
none of the ideas I've come up with have been able to fix anything.


Yes.  I'm not the one primarily looking at it, and we don't have a
reproducer in-house.  We just have the one dump right now.



I'm almost tempted to increase the count from scsi_alloc_sgtable()
by one and be done with ...



That might not fix it if it is a problem with the merge code, though.


And indeed, it doesn't.
Seems I finally found the culprit.

What happens is this:
We have two paths, with these seg_boundary_masks:

path-1:seg_boundary_mask = 65535,
path-2:seg_boundary_mask = 4294967295,

consequently the DM request queue has this:

md-1:seg_boundary_mask = 65535,

What happens now is that a request is being formatted, and sent
to path 2. During submission req->nr_phys_segments is formatted
with the limits of path 2, arriving at a count of 3.
Now the request gets retried on path 1, but as the NOMERGE request
flag is set req->nr_phys_segments is never updated.
But blk_rq_map_sg() ignores all counters, and just uses the
bi_vec directly, resulting in a count of 4 -> boom.

So the culprit here is the NOMERGE flag, which is evaluated
via
->dm_dispatch_request()
   ->blk_insert_cloned_request()
 ->blk_rq_check_limits()

If the above assessment is correct, the following patch should
fix it:

diff --git a/block/blk-core.c b/block/blk-core.c
index 801ced7..12cccd6 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1928,7 +1928,7 @@ EXPORT_SYMBOL(submit_bio);
   */
  int blk_rq_check_limits(struct request_queue *q, struct request *rq)
  {
-   if (!rq_mergeable(rq))
+   if (rq->cmd_type != REQ_TYPE_FS)
 return 0;

 if (blk_rq_sectors(rq) > blk_queue_get_max_sectors(q,
rq->cmd_flags)) {


Mike? Jens?
Can you comment on it?


We only support merging on REQ_TYPE_FS already, so how is the above 
making it any different? In general, NOMERGE being set or not should not 
make a difference. It's only a hint that we need not check further if we 
should be merging on this request, since we already tried it once, found 
we'd exceed various limits, then set NOMERGE to reflect that.



--
Jens Axboe

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: kernel BUG at drivers/scsi/scsi_lib.c:1096!

2015-11-25 Thread Hannes Reinecke
On 11/20/2015 04:28 PM, Ewan Milne wrote:
> On Fri, 2015-11-20 at 15:55 +0100, Hannes Reinecke wrote:
>> Can't we have a joint effort here?
>> I've been spending a _LOT_ of time trying to debug things here, but
>> none of the ideas I've come up with have been able to fix anything.
> 
> Yes.  I'm not the one primarily looking at it, and we don't have a
> reproducer in-house.  We just have the one dump right now.
> 
>>
>> I'm almost tempted to increase the count from scsi_alloc_sgtable()
>> by one and be done with ...
>>
> 
> That might not fix it if it is a problem with the merge code, though.
> 
And indeed, it doesn't.
Seems I finally found the culprit.

What happens is this:
We have two paths, with these seg_boundary_masks:

path-1:seg_boundary_mask = 65535,
path-2:seg_boundary_mask = 4294967295,

consequently the DM request queue has this:

md-1:seg_boundary_mask = 65535,

What happens now is that a request is being formatted, and sent
to path 2. During submission req->nr_phys_segments is formatted
with the limits of path 2, arriving at a count of 3.
Now the request gets retried on path 1, but as the NOMERGE request
flag is set req->nr_phys_segments is never updated.
But blk_rq_map_sg() ignores all counters, and just uses the
bi_vec directly, resulting in a count of 4 -> boom.

So the culprit here is the NOMERGE flag, which is evaluated
via
->dm_dispatch_request()
  ->blk_insert_cloned_request()
->blk_rq_check_limits()

If the above assessment is correct, the following patch should
fix it:

diff --git a/block/blk-core.c b/block/blk-core.c
index 801ced7..12cccd6 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1928,7 +1928,7 @@ EXPORT_SYMBOL(submit_bio);
  */
 int blk_rq_check_limits(struct request_queue *q, struct request *rq)
 {
-   if (!rq_mergeable(rq))
+   if (rq->cmd_type != REQ_TYPE_FS)
return 0;

if (blk_rq_sectors(rq) > blk_queue_get_max_sectors(q,
rq->cmd_flags)) {


Mike? Jens?
Can you comment on it?

Cheers,

Hannes
-- 
Dr. Hannes ReineckezSeries & Storage
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: kernel BUG at drivers/scsi/scsi_lib.c:1096!

2015-11-24 Thread Alan Ott

On 11/23/2015 10:21 AM, Ming Lei wrote:

On Mon, 23 Nov 2015 10:46:20 +0800
Ming Lei  wrote:


Hi Mark,

On Mon, Nov 23, 2015 at 9:50 AM, Mark Salter  wrote:

On Mon, 2015-11-23 at 08:36 +0800, Ming Lei wrote:

On Mon, Nov 23, 2015 at 7:20 AM, Mark Salter  wrote:

On Sun, 2015-11-22 at 00:56 +0800, Ming Lei wrote:

On Sat, 21 Nov 2015 12:30:14 +0100
Laurent Dufour  wrote:


On 20/11/2015 13:10, Michael Ellerman wrote:

On Thu, 2015-11-19 at 00:23 -0800, Christoph Hellwig wrote:


It's pretty much guaranteed a block layer bug, most likely in the
merge bios to request infrastucture where we don't obey the merging
limits properly.

Does either of you have a known good and first known bad kernel?


Not me, I've only hit it one or two times. All I can say is I have hit it in
4.4-rc1.

Laurent, can you narrow it down at all?


It seems that the panic is triggered by the commit bdced438acd8 ("block:
setup bi_phys_segments after splitting") which has been pulled by the
merge d9734e0d1ccf ("Merge branch 'for-4.4/core' of
git://git.kernel.dk/linux-block").

My system is panicing promptly when running a kernel built at
d9734e0d1ccf, while reverting the commit bdced438acd8, it can run hours
without panicing.

This being said, I can't explain what's going wrong.

May Ming shed some light here ?


Laurent, looks there is one bug in blk_bio_segment_split(), would you
mind testing the following patch to see if it fixes your issue?

---
 From 6fc701231dcc000bc8bc4b9105583380d9aa31f4 Mon Sep 17 00:00:00 2001
From: Ming Lei 
Date: Sun, 22 Nov 2015 00:47:13 +0800
Subject: [PATCH] block: fix segment split

Inside blk_bio_segment_split(), previous bvec pointer('bvprvp')
always points to the iterator local variable, which is obviously
wrong, so fix it by pointing to the local variable of 'bvprv'.

Signed-off-by: Ming Lei 
---
  block/blk-merge.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/block/blk-merge.c b/block/blk-merge.c
index de5716d8..f2efe8a 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -98,7 +98,7 @@ static struct bio *blk_bio_segment_split(struct request_queue 
*q,

   seg_size += bv.bv_len;
   bvprv = bv;
- bvprvp = &bv;
+ bvprvp = &bvprv;
   sectors += bv.bv_len >> 9;
   continue;
   }
@@ -108,7 +108,7 @@ new_segment:

   nsegs++;
   bvprv = bv;
- bvprvp = &bv;
+ bvprvp = &bvprv;
   seg_size = bv.bv_len;
   sectors += bv.bv_len >> 9;
   }


I'm still hitting the BUG even with this patch applied on top of 4.4-rc1.


OK, looks there are still other bugs, care to share us how to reproduce
it on arm64?

thanks,
Ming


Unfortunately, the best reproducer I have is to boot the platform. I have seen 
the
BUG a few times post-boot, but I don't have a consistant reproducer. I am using
upstream 4.4-rc1 with this config:

   http://people.redhat.com/msalter/fh_defconfig

With 4.4-rc1 on an APM Mustang platform, I see the BUG about once every 6-7 
boots.
On an AMD Seattle platform, about every 9 boots.


Thanks for the input, and I will try to reproduce the issue on mustang with
your kernel config.


I can reproduce the issue on mustang, and looks I may understand the story now.

When 64K page size is used on arm64, and the default segment size of block
is 65536, then one segment should only include one page at most.

Commit bdced438acd83a(block: setup bi_phys_segments after splitting) does not
compute bio->bi_seg_front_size and bio->bi_seg_back_size, then one less segment
may be obtained because blk_phys_contig_segment() thought the last bvec in 1st
bio and the 1st bvec in the 2nd bio is in one physical segment, so cause the
regression.

Looks the following patch can fix the issue by figuring bio->bi_seg_front_size
and bio->bi_seg_back_size in blk_bio_segment_split().

Mark, thanks again for providing the reproduction steps, and could you run your
test to see if it can fix your issue?

---
 From 86b5f33d48715c1150fdcfd9a76e495e7aa913aa Mon Sep 17 00:00:00 2001
From: Ming Lei 
Date: Mon, 23 Nov 2015 20:27:23 +0800
Subject: [PATCH 2/2] blk-merge: fix blk_bio_segment_split

Commit bdced438acd83a(block: setup bi_phys_segments after
splitting) introduces function of computing bio->bi_phys_segments
during bio splitting.

Unfortunately both bio->bi_seg_front_size and bio->bi_seg_back_size
arn't computed, so too many physical segments may be obtained
for one request since both the two are used to check if one segment
across two bios can be possible.

This patch fixes the issue by computing the two variables in
blk_bio_segment_split().

Reported-by: Michael Ellerman 
Reported-by: Mark Salter 
Fixes: bdced438acd83a(block: setup bi_phys_segments after splitting)
Signed-off-by: Ming Lei 
---
  block/blk-merge.c | 22 +++---
  1 file changed, 19 insertions(+), 3 deletions(-)

diff 

Re: kernel BUG at drivers/scsi/scsi_lib.c:1096!

2015-11-23 Thread Mark Salter
On Mon, 2015-11-23 at 23:27 +0800, Ming Lei wrote:
> On Mon, Nov 23, 2015 at 11:20 PM, Laurent Dufour
>  wrote:
> > > 
> > > Reverting above commit on top if 4.4-rc1 seems to fix the problem for me.
> > 
> > That's what I mentioned earlier ;)
> > 
> > Now Ming send an additional patch with seems to fix the bug introduced
> > through the commit bdced438acd8. When testing with this new patch I
> > can't get the panic anymore, but Mark reported he is still hitting it.
> 
> Laurent, thanks for your test on the 1st patch, and looks there are at
> least two problems, and my 2nd patch sent just now should address
> Mark's issue which is caused by bdced438acd83a.
> 
> Once the 2nd one is tested OK, I will send out the two together.
> 
> Thanks,
> Ming

Thanks Ming.
With both patches applied, I have been unable to reproduce the problem.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: kernel BUG at drivers/scsi/scsi_lib.c:1096!

2015-11-23 Thread Laurent Dufour
On 23/11/2015 16:27, Ming Lei wrote:
> On Mon, Nov 23, 2015 at 11:20 PM, Laurent Dufour
>  wrote:
>>>
>>> Reverting above commit on top if 4.4-rc1 seems to fix the problem for me.
>>
>> That's what I mentioned earlier ;)
>>
>> Now Ming send an additional patch with seems to fix the bug introduced
>> through the commit bdced438acd8. When testing with this new patch I
>> can't get the panic anymore, but Mark reported he is still hitting it.
> 
> Laurent, thanks for your test on the 1st patch, and looks there are at
> least two problems, and my 2nd patch sent just now should address
> Mark's issue which is caused by bdced438acd83a.
> 
> Once the 2nd one is tested OK, I will send out the two together.

FWIW, I applied the 2nd patch, and my system is still running like a charm.

Cheers,
Laurent.


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: kernel BUG at drivers/scsi/scsi_lib.c:1096!

2015-11-23 Thread Ming Lei
On Mon, Nov 23, 2015 at 11:20 PM, Laurent Dufour
 wrote:
>>
>> Reverting above commit on top if 4.4-rc1 seems to fix the problem for me.
>
> That's what I mentioned earlier ;)
>
> Now Ming send an additional patch with seems to fix the bug introduced
> through the commit bdced438acd8. When testing with this new patch I
> can't get the panic anymore, but Mark reported he is still hitting it.

Laurent, thanks for your test on the 1st patch, and looks there are at
least two problems, and my 2nd patch sent just now should address
Mark's issue which is caused by bdced438acd83a.

Once the 2nd one is tested OK, I will send out the two together.

Thanks,
Ming
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: kernel BUG at drivers/scsi/scsi_lib.c:1096!

2015-11-23 Thread Ming Lei
On Mon, 23 Nov 2015 10:46:20 +0800
Ming Lei  wrote:

> Hi Mark,
> 
> On Mon, Nov 23, 2015 at 9:50 AM, Mark Salter  wrote:
> > On Mon, 2015-11-23 at 08:36 +0800, Ming Lei wrote:
> >> On Mon, Nov 23, 2015 at 7:20 AM, Mark Salter  wrote:
> >> > On Sun, 2015-11-22 at 00:56 +0800, Ming Lei wrote:
> >> > > On Sat, 21 Nov 2015 12:30:14 +0100
> >> > > Laurent Dufour  wrote:
> >> > >
> >> > > > On 20/11/2015 13:10, Michael Ellerman wrote:
> >> > > > > On Thu, 2015-11-19 at 00:23 -0800, Christoph Hellwig wrote:
> >> > > > >
> >> > > > > > It's pretty much guaranteed a block layer bug, most likely in the
> >> > > > > > merge bios to request infrastucture where we don't obey the 
> >> > > > > > merging
> >> > > > > > limits properly.
> >> > > > > >
> >> > > > > > Does either of you have a known good and first known bad kernel?
> >> > > > >
> >> > > > > Not me, I've only hit it one or two times. All I can say is I have 
> >> > > > > hit it in
> >> > > > > 4.4-rc1.
> >> > > > >
> >> > > > > Laurent, can you narrow it down at all?
> >> > > >
> >> > > > It seems that the panic is triggered by the commit bdced438acd8 
> >> > > > ("block:
> >> > > > setup bi_phys_segments after splitting") which has been pulled by the
> >> > > > merge d9734e0d1ccf ("Merge branch 'for-4.4/core' of
> >> > > > git://git.kernel.dk/linux-block").
> >> > > >
> >> > > > My system is panicing promptly when running a kernel built at
> >> > > > d9734e0d1ccf, while reverting the commit bdced438acd8, it can run 
> >> > > > hours
> >> > > > without panicing.
> >> > > >
> >> > > > This being said, I can't explain what's going wrong.
> >> > > >
> >> > > > May Ming shed some light here ?
> >> > >
> >> > > Laurent, looks there is one bug in blk_bio_segment_split(), would you
> >> > > mind testing the following patch to see if it fixes your issue?
> >> > >
> >> > > ---
> >> > > From 6fc701231dcc000bc8bc4b9105583380d9aa31f4 Mon Sep 17 00:00:00 2001
> >> > > From: Ming Lei 
> >> > > Date: Sun, 22 Nov 2015 00:47:13 +0800
> >> > > Subject: [PATCH] block: fix segment split
> >> > >
> >> > > Inside blk_bio_segment_split(), previous bvec pointer('bvprvp')
> >> > > always points to the iterator local variable, which is obviously
> >> > > wrong, so fix it by pointing to the local variable of 'bvprv'.
> >> > >
> >> > > Signed-off-by: Ming Lei 
> >> > > ---
> >> > >  block/blk-merge.c | 4 ++--
> >> > >  1 file changed, 2 insertions(+), 2 deletions(-)
> >> > >
> >> > > diff --git a/block/blk-merge.c b/block/blk-merge.c
> >> > > index de5716d8..f2efe8a 100644
> >> > > --- a/block/blk-merge.c
> >> > > +++ b/block/blk-merge.c
> >> > > @@ -98,7 +98,7 @@ static struct bio *blk_bio_segment_split(struct 
> >> > > request_queue *q,
> >> > >
> >> > >   seg_size += bv.bv_len;
> >> > >   bvprv = bv;
> >> > > - bvprvp = &bv;
> >> > > + bvprvp = &bvprv;
> >> > >   sectors += bv.bv_len >> 9;
> >> > >   continue;
> >> > >   }
> >> > > @@ -108,7 +108,7 @@ new_segment:
> >> > >
> >> > >   nsegs++;
> >> > >   bvprv = bv;
> >> > > - bvprvp = &bv;
> >> > > + bvprvp = &bvprv;
> >> > >   seg_size = bv.bv_len;
> >> > >   sectors += bv.bv_len >> 9;
> >> > >   }
> >> >
> >> > I'm still hitting the BUG even with this patch applied on top of 4.4-rc1.
> >>
> >> OK, looks there are still other bugs, care to share us how to reproduce
> >> it on arm64?
> >>
> >> thanks,
> >> Ming
> >
> > Unfortunately, the best reproducer I have is to boot the platform. I have 
> > seen the
> > BUG a few times post-boot, but I don't have a consistant reproducer. I am 
> > using
> > upstream 4.4-rc1 with this config:
> >
> >   http://people.redhat.com/msalter/fh_defconfig
> >
> > With 4.4-rc1 on an APM Mustang platform, I see the BUG about once every 6-7 
> > boots.
> > On an AMD Seattle platform, about every 9 boots.
> 
> Thanks for the input, and I will try to reproduce the issue on mustang with
> your kernel config.

I can reproduce the issue on mustang, and looks I may understand the story now.

When 64K page size is used on arm64, and the default segment size of block
is 65536, then one segment should only include one page at most.

Commit bdced438acd83a(block: setup bi_phys_segments after splitting) does not
compute bio->bi_seg_front_size and bio->bi_seg_back_size, then one less segment
may be obtained because blk_phys_contig_segment() thought the last bvec in 1st
bio and the 1st bvec in the 2nd bio is in one physical segment, so cause the
regression.

Looks the following patch can fix the issue by figuring bio->bi_seg_front_size
and bio->bi_seg_back_size in blk_bio_segment_split().

Mark, thanks again for providing the reproduction steps, and could you run your
test to see if it can fix your issue?

---
>From 86b5f33d48715c1150fdcfd9a76e495e7aa913aa Mon Sep 17 00:00:00 2001
From: Ming Lei 
Date: Mon, 23 Nov

Re: kernel BUG at drivers/scsi/scsi_lib.c:1096!

2015-11-23 Thread Laurent Dufour
On 23/11/2015 16:13, Pratyush Anand wrote:
> On 23/11/2015:02:57:19 PM, Laurent Dufour wrote:
>> On 23/11/2015 00:20, Mark Salter wrote:
>>> On Sun, 2015-11-22 at 00:56 +0800, Ming Lei wrote:
 On Sat, 21 Nov 2015 12:30:14 +0100
 Laurent Dufour  wrote:

> On 20/11/2015 13:10, Michael Ellerman wrote:
>> On Thu, 2015-11-19 at 00:23 -0800, Christoph Hellwig wrote:
>>
>>> It's pretty much guaranteed a block layer bug, most likely in the
>>> merge bios to request infrastucture where we don't obey the merging
>>> limits properly.
>>>
>>> Does either of you have a known good and first known bad kernel?
>>
>> Not me, I've only hit it one or two times. All I can say is I have hit 
>> it in
>> 4.4-rc1.
>>
>> Laurent, can you narrow it down at all?
>
> It seems that the panic is triggered by the commit bdced438acd8 ("block:
> setup bi_phys_segments after splitting") which has been pulled by the
> merge d9734e0d1ccf ("Merge branch 'for-4.4/core' of
> git://git.kernel.dk/linux-block").
>
> My system is panicing promptly when running a kernel built at
> d9734e0d1ccf, while reverting the commit bdced438acd8, it can run hours
> without panicing.
>
> This being said, I can't explain what's going wrong.
>
> May Ming shed some light here ?

 Laurent, looks there is one bug in blk_bio_segment_split(), would you
 mind testing the following patch to see if it fixes your issue?

 ---
 From 6fc701231dcc000bc8bc4b9105583380d9aa31f4 Mon Sep 17 00:00:00 2001
 From: Ming Lei 
 Date: Sun, 22 Nov 2015 00:47:13 +0800
 Subject: [PATCH] block: fix segment split

 Inside blk_bio_segment_split(), previous bvec pointer('bvprvp')
 always points to the iterator local variable, which is obviously
 wrong, so fix it by pointing to the local variable of 'b
> ~Pratyush
> 
vprv'.

 Signed-off-by: Ming Lei 
 ---
  block/blk-merge.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

 diff --git a/block/blk-merge.c b/block/blk-merge.c
 index de5716d8..f2efe8a 100644
 --- a/block/blk-merge.c
 +++ b/block/blk-merge.c
 @@ -98,7 +98,7 @@ static struct bio *blk_bio_segment_split(struct 
 request_queue *q,
  
seg_size += bv.bv_len;bdced438acd8
bvprv = bv;
 -  bvprvp = &bv;
 +  bvprvp = &bvprv;
sectors += bv.bv_len >> 9;
continue;
}
 @@ -108,7 +108,7 @@ new_segment:
  
nsegs++;
bvprv = bv;
 -  bvprvp = &bv;
 +  bvprvp = &bvprv;
seg_size = bv.bv_len;
sectors += bv.bv_len >> 9;
}
>>>
>>> I'm still hitting the BUG even with this patch applied on top of 4.4-rc1.
>>
>> On my side, with the patch applied on top of 4.4-rc1, I can't get the
>> panic anymore.
> 
> git bisect shows:
> 
> bdced438acd83ad83a6c6fc7f50099b820245ddb is the first bad commit
> commit bdced438acd83ad83a6c6fc7f50099b820245ddb
> Author: Ming Lei 
> Date:   Tue Oct 20 23:13:52 2015 +0800 
> 
> block: setup bi_phys_segments after splitting
> 
> Reverting above commit on top if 4.4-rc1 seems to fix the problem for me.

That's what I mentioned earlier ;)

Now Ming send an additional patch with seems to fix the bug introduced
through the commit bdced438acd8. When testing with this new patch I
can't get the panic anymore, but Mark reported he is still hitting it.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: kernel BUG at drivers/scsi/scsi_lib.c:1096!

2015-11-23 Thread Pratyush Anand
On 23/11/2015:02:57:19 PM, Laurent Dufour wrote:
> On 23/11/2015 00:20, Mark Salter wrote:
> > On Sun, 2015-11-22 at 00:56 +0800, Ming Lei wrote:
> >> On Sat, 21 Nov 2015 12:30:14 +0100
> >> Laurent Dufour  wrote:
> >>
> >>> On 20/11/2015 13:10, Michael Ellerman wrote:
>  On Thu, 2015-11-19 at 00:23 -0800, Christoph Hellwig wrote:
> 
> > It's pretty much guaranteed a block layer bug, most likely in the
> > merge bios to request infrastucture where we don't obey the merging
> > limits properly.
> >
> > Does either of you have a known good and first known bad kernel?
> 
>  Not me, I've only hit it one or two times. All I can say is I have hit 
>  it in
>  4.4-rc1.
> 
>  Laurent, can you narrow it down at all?
> >>>
> >>> It seems that the panic is triggered by the commit bdced438acd8 ("block:
> >>> setup bi_phys_segments after splitting") which has been pulled by the
> >>> merge d9734e0d1ccf ("Merge branch 'for-4.4/core' of
> >>> git://git.kernel.dk/linux-block").
> >>>
> >>> My system is panicing promptly when running a kernel built at
> >>> d9734e0d1ccf, while reverting the commit bdced438acd8, it can run hours
> >>> without panicing.
> >>>
> >>> This being said, I can't explain what's going wrong.
> >>>
> >>> May Ming shed some light here ?
> >>
> >> Laurent, looks there is one bug in blk_bio_segment_split(), would you
> >> mind testing the following patch to see if it fixes your issue?
> >>
> >> ---
> >> From 6fc701231dcc000bc8bc4b9105583380d9aa31f4 Mon Sep 17 00:00:00 2001
> >> From: Ming Lei 
> >> Date: Sun, 22 Nov 2015 00:47:13 +0800
> >> Subject: [PATCH] block: fix segment split
> >>
> >> Inside blk_bio_segment_split(), previous bvec pointer('bvprvp')
> >> always points to the iterator local variable, which is obviously
> >> wrong, so fix it by pointing to the local variable of 'bvprv'.
> >>
> >> Signed-off-by: Ming Lei 
> >> ---
> >>  block/blk-merge.c | 4 ++--
> >>  1 file changed, 2 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/block/blk-merge.c b/block/blk-merge.c
> >> index de5716d8..f2efe8a 100644
> >> --- a/block/blk-merge.c
> >> +++ b/block/blk-merge.c
> >> @@ -98,7 +98,7 @@ static struct bio *blk_bio_segment_split(struct 
> >> request_queue *q,
> >>  
> >>seg_size += bv.bv_len;
> >>bvprv = bv;
> >> -  bvprvp = &bv;
> >> +  bvprvp = &bvprv;
> >>sectors += bv.bv_len >> 9;
> >>continue;
> >>}
> >> @@ -108,7 +108,7 @@ new_segment:
> >>  
> >>nsegs++;
> >>bvprv = bv;
> >> -  bvprvp = &bv;
> >> +  bvprvp = &bvprv;
> >>seg_size = bv.bv_len;
> >>sectors += bv.bv_len >> 9;
> >>}
> > 
> > I'm still hitting the BUG even with this patch applied on top of 4.4-rc1.
> 
> On my side, with the patch applied on top of 4.4-rc1, I can't get the
> panic anymore.

git bisect shows:

bdced438acd83ad83a6c6fc7f50099b820245ddb is the first bad commit
commit bdced438acd83ad83a6c6fc7f50099b820245ddb
Author: Ming Lei 
Date:   Tue Oct 20 23:13:52 2015 +0800 

block: setup bi_phys_segments after splitting

Reverting above commit on top if 4.4-rc1 seems to fix the problem for me.

~Pratyush
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: kernel BUG at drivers/scsi/scsi_lib.c:1096!

2015-11-23 Thread Laurent Dufour
On 23/11/2015 00:20, Mark Salter wrote:
> On Sun, 2015-11-22 at 00:56 +0800, Ming Lei wrote:
>> On Sat, 21 Nov 2015 12:30:14 +0100
>> Laurent Dufour  wrote:
>>
>>> On 20/11/2015 13:10, Michael Ellerman wrote:
 On Thu, 2015-11-19 at 00:23 -0800, Christoph Hellwig wrote:

> It's pretty much guaranteed a block layer bug, most likely in the
> merge bios to request infrastucture where we don't obey the merging
> limits properly.
>
> Does either of you have a known good and first known bad kernel?

 Not me, I've only hit it one or two times. All I can say is I have hit it 
 in
 4.4-rc1.

 Laurent, can you narrow it down at all?
>>>
>>> It seems that the panic is triggered by the commit bdced438acd8 ("block:
>>> setup bi_phys_segments after splitting") which has been pulled by the
>>> merge d9734e0d1ccf ("Merge branch 'for-4.4/core' of
>>> git://git.kernel.dk/linux-block").
>>>
>>> My system is panicing promptly when running a kernel built at
>>> d9734e0d1ccf, while reverting the commit bdced438acd8, it can run hours
>>> without panicing.
>>>
>>> This being said, I can't explain what's going wrong.
>>>
>>> May Ming shed some light here ?
>>
>> Laurent, looks there is one bug in blk_bio_segment_split(), would you
>> mind testing the following patch to see if it fixes your issue?
>>
>> ---
>> From 6fc701231dcc000bc8bc4b9105583380d9aa31f4 Mon Sep 17 00:00:00 2001
>> From: Ming Lei 
>> Date: Sun, 22 Nov 2015 00:47:13 +0800
>> Subject: [PATCH] block: fix segment split
>>
>> Inside blk_bio_segment_split(), previous bvec pointer('bvprvp')
>> always points to the iterator local variable, which is obviously
>> wrong, so fix it by pointing to the local variable of 'bvprv'.
>>
>> Signed-off-by: Ming Lei 
>> ---
>>  block/blk-merge.c | 4 ++--
>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/block/blk-merge.c b/block/blk-merge.c
>> index de5716d8..f2efe8a 100644
>> --- a/block/blk-merge.c
>> +++ b/block/blk-merge.c
>> @@ -98,7 +98,7 @@ static struct bio *blk_bio_segment_split(struct 
>> request_queue *q,
>>  
>>  seg_size += bv.bv_len;
>>  bvprv = bv;
>> -bvprvp = &bv;
>> +bvprvp = &bvprv;
>>  sectors += bv.bv_len >> 9;
>>  continue;
>>  }
>> @@ -108,7 +108,7 @@ new_segment:
>>  
>>  nsegs++;
>>  bvprv = bv;
>> -bvprvp = &bv;
>> +bvprvp = &bvprv;
>>  seg_size = bv.bv_len;
>>  sectors += bv.bv_len >> 9;
>>  }
> 
> I'm still hitting the BUG even with this patch applied on top of 4.4-rc1.

On my side, with the patch applied on top of 4.4-rc1, I can't get the
panic anymore.


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: kernel BUG at drivers/scsi/scsi_lib.c:1096!

2015-11-22 Thread Hannes Reinecke

On 11/20/2015 04:28 PM, Ewan Milne wrote:

On Fri, 2015-11-20 at 15:55 +0100, Hannes Reinecke wrote:

Can't we have a joint effort here?
I've been spending a _LOT_ of time trying to debug things here, but
none of the ideas I've come up with have been able to fix anything.


Yes.  I'm not the one primarily looking at it, and we don't have a
reproducer in-house.  We just have the one dump right now.


Oh, I got plenty of them :-(



I'm almost tempted to increase the count from scsi_alloc_sgtable()
by one and be done with ...



That might not fix it if it is a problem with the merge code, though.


Of course not. But it'll be a band-aid to keep the customer happy.

Cheers,

Hannes
--
Dr. Hannes ReineckezSeries & Storage
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: kernel BUG at drivers/scsi/scsi_lib.c:1096!

2015-11-22 Thread Ming Lei
Hi Mark,

On Mon, Nov 23, 2015 at 9:50 AM, Mark Salter  wrote:
> On Mon, 2015-11-23 at 08:36 +0800, Ming Lei wrote:
>> On Mon, Nov 23, 2015 at 7:20 AM, Mark Salter  wrote:
>> > On Sun, 2015-11-22 at 00:56 +0800, Ming Lei wrote:
>> > > On Sat, 21 Nov 2015 12:30:14 +0100
>> > > Laurent Dufour  wrote:
>> > >
>> > > > On 20/11/2015 13:10, Michael Ellerman wrote:
>> > > > > On Thu, 2015-11-19 at 00:23 -0800, Christoph Hellwig wrote:
>> > > > >
>> > > > > > It's pretty much guaranteed a block layer bug, most likely in the
>> > > > > > merge bios to request infrastucture where we don't obey the merging
>> > > > > > limits properly.
>> > > > > >
>> > > > > > Does either of you have a known good and first known bad kernel?
>> > > > >
>> > > > > Not me, I've only hit it one or two times. All I can say is I have 
>> > > > > hit it in
>> > > > > 4.4-rc1.
>> > > > >
>> > > > > Laurent, can you narrow it down at all?
>> > > >
>> > > > It seems that the panic is triggered by the commit bdced438acd8 
>> > > > ("block:
>> > > > setup bi_phys_segments after splitting") which has been pulled by the
>> > > > merge d9734e0d1ccf ("Merge branch 'for-4.4/core' of
>> > > > git://git.kernel.dk/linux-block").
>> > > >
>> > > > My system is panicing promptly when running a kernel built at
>> > > > d9734e0d1ccf, while reverting the commit bdced438acd8, it can run hours
>> > > > without panicing.
>> > > >
>> > > > This being said, I can't explain what's going wrong.
>> > > >
>> > > > May Ming shed some light here ?
>> > >
>> > > Laurent, looks there is one bug in blk_bio_segment_split(), would you
>> > > mind testing the following patch to see if it fixes your issue?
>> > >
>> > > ---
>> > > From 6fc701231dcc000bc8bc4b9105583380d9aa31f4 Mon Sep 17 00:00:00 2001
>> > > From: Ming Lei 
>> > > Date: Sun, 22 Nov 2015 00:47:13 +0800
>> > > Subject: [PATCH] block: fix segment split
>> > >
>> > > Inside blk_bio_segment_split(), previous bvec pointer('bvprvp')
>> > > always points to the iterator local variable, which is obviously
>> > > wrong, so fix it by pointing to the local variable of 'bvprv'.
>> > >
>> > > Signed-off-by: Ming Lei 
>> > > ---
>> > >  block/blk-merge.c | 4 ++--
>> > >  1 file changed, 2 insertions(+), 2 deletions(-)
>> > >
>> > > diff --git a/block/blk-merge.c b/block/blk-merge.c
>> > > index de5716d8..f2efe8a 100644
>> > > --- a/block/blk-merge.c
>> > > +++ b/block/blk-merge.c
>> > > @@ -98,7 +98,7 @@ static struct bio *blk_bio_segment_split(struct 
>> > > request_queue *q,
>> > >
>> > >   seg_size += bv.bv_len;
>> > >   bvprv = bv;
>> > > - bvprvp = &bv;
>> > > + bvprvp = &bvprv;
>> > >   sectors += bv.bv_len >> 9;
>> > >   continue;
>> > >   }
>> > > @@ -108,7 +108,7 @@ new_segment:
>> > >
>> > >   nsegs++;
>> > >   bvprv = bv;
>> > > - bvprvp = &bv;
>> > > + bvprvp = &bvprv;
>> > >   seg_size = bv.bv_len;
>> > >   sectors += bv.bv_len >> 9;
>> > >   }
>> >
>> > I'm still hitting the BUG even with this patch applied on top of 4.4-rc1.
>>
>> OK, looks there are still other bugs, care to share us how to reproduce
>> it on arm64?
>>
>> thanks,
>> Ming
>
> Unfortunately, the best reproducer I have is to boot the platform. I have 
> seen the
> BUG a few times post-boot, but I don't have a consistant reproducer. I am 
> using
> upstream 4.4-rc1 with this config:
>
>   http://people.redhat.com/msalter/fh_defconfig
>
> With 4.4-rc1 on an APM Mustang platform, I see the BUG about once every 6-7 
> boots.
> On an AMD Seattle platform, about every 9 boots.

Thanks for the input, and I will try to reproduce the issue on mustang with
your kernel config.

>
> I have a script that loops through an ssh command to reboot the platform 
> under test.
> I manually install test kernels and then run the script and wait for failure. 
> While
> debugging, I have tried more minimal configs with which I have been unable to
> reproduce the problem even after several hours of reboots. With the above 
> mentioned
> fh_defconfig, I have been able to get a failure within 20 or so boots with 
> most
> kernel builds but at certain kernel commits, the failure has taken a longer 
> time to
> reproduce.
>
> From my POV, I can't say which commit causes the problem. So far, I have not 
> been
> able to reproduce at all before commit d9734e0d1ccf but I am currently trying 
> to
> reproduce with commit 0d51ce9ca1116 (one merge earlier than d9734e0d1ccf).

The patch for fixing 'bvprvp' is better to be included for test,
because that issue
may have a big effect on computing physical seg count.

Also I appreciate if you, Laurent or anyone may provide debug log which
can be captured with the attached debug patch when this issue is trigered .


Thanks,
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index dd8ad2a..a0db1fe 100644
--- a/drivers/s

Re: kernel BUG at drivers/scsi/scsi_lib.c:1096!

2015-11-22 Thread Mark Salter
On Mon, 2015-11-23 at 08:36 +0800, Ming Lei wrote:
> On Mon, Nov 23, 2015 at 7:20 AM, Mark Salter  wrote:
> > On Sun, 2015-11-22 at 00:56 +0800, Ming Lei wrote:
> > > On Sat, 21 Nov 2015 12:30:14 +0100
> > > Laurent Dufour  wrote:
> > > 
> > > > On 20/11/2015 13:10, Michael Ellerman wrote:
> > > > > On Thu, 2015-11-19 at 00:23 -0800, Christoph Hellwig wrote:
> > > > > 
> > > > > > It's pretty much guaranteed a block layer bug, most likely in the
> > > > > > merge bios to request infrastucture where we don't obey the merging
> > > > > > limits properly.
> > > > > > 
> > > > > > Does either of you have a known good and first known bad kernel?
> > > > > 
> > > > > Not me, I've only hit it one or two times. All I can say is I have 
> > > > > hit it in
> > > > > 4.4-rc1.
> > > > > 
> > > > > Laurent, can you narrow it down at all?
> > > > 
> > > > It seems that the panic is triggered by the commit bdced438acd8 ("block:
> > > > setup bi_phys_segments after splitting") which has been pulled by the
> > > > merge d9734e0d1ccf ("Merge branch 'for-4.4/core' of
> > > > git://git.kernel.dk/linux-block").
> > > > 
> > > > My system is panicing promptly when running a kernel built at
> > > > d9734e0d1ccf, while reverting the commit bdced438acd8, it can run hours
> > > > without panicing.
> > > > 
> > > > This being said, I can't explain what's going wrong.
> > > > 
> > > > May Ming shed some light here ?
> > > 
> > > Laurent, looks there is one bug in blk_bio_segment_split(), would you
> > > mind testing the following patch to see if it fixes your issue?
> > > 
> > > ---
> > > From 6fc701231dcc000bc8bc4b9105583380d9aa31f4 Mon Sep 17 00:00:00 2001
> > > From: Ming Lei 
> > > Date: Sun, 22 Nov 2015 00:47:13 +0800
> > > Subject: [PATCH] block: fix segment split
> > > 
> > > Inside blk_bio_segment_split(), previous bvec pointer('bvprvp')
> > > always points to the iterator local variable, which is obviously
> > > wrong, so fix it by pointing to the local variable of 'bvprv'.
> > > 
> > > Signed-off-by: Ming Lei 
> > > ---
> > >  block/blk-merge.c | 4 ++--
> > >  1 file changed, 2 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/block/blk-merge.c b/block/blk-merge.c
> > > index de5716d8..f2efe8a 100644
> > > --- a/block/blk-merge.c
> > > +++ b/block/blk-merge.c
> > > @@ -98,7 +98,7 @@ static struct bio *blk_bio_segment_split(struct 
> > > request_queue *q,
> > > 
> > >   seg_size += bv.bv_len;
> > >   bvprv = bv;
> > > - bvprvp = &bv;
> > > + bvprvp = &bvprv;
> > >   sectors += bv.bv_len >> 9;
> > >   continue;
> > >   }
> > > @@ -108,7 +108,7 @@ new_segment:
> > > 
> > >   nsegs++;
> > >   bvprv = bv;
> > > - bvprvp = &bv;
> > > + bvprvp = &bvprv;
> > >   seg_size = bv.bv_len;
> > >   sectors += bv.bv_len >> 9;
> > >   }
> > 
> > I'm still hitting the BUG even with this patch applied on top of 4.4-rc1.
> 
> OK, looks there are still other bugs, care to share us how to reproduce
> it on arm64?
> 
> thanks,
> Ming

Unfortunately, the best reproducer I have is to boot the platform. I have seen 
the
BUG a few times post-boot, but I don't have a consistant reproducer. I am using
upstream 4.4-rc1 with this config:

  http://people.redhat.com/msalter/fh_defconfig

With 4.4-rc1 on an APM Mustang platform, I see the BUG about once every 6-7 
boots.
On an AMD Seattle platform, about every 9 boots.

I have a script that loops through an ssh command to reboot the platform under 
test.
I manually install test kernels and then run the script and wait for failure. 
While
debugging, I have tried more minimal configs with which I have been unable to
reproduce the problem even after several hours of reboots. With the above 
mentioned
fh_defconfig, I have been able to get a failure within 20 or so boots with most
kernel builds but at certain kernel commits, the failure has taken a longer 
time to
reproduce.

>From my POV, I can't say which commit causes the problem. So far, I have not 
>been
able to reproduce at all before commit d9734e0d1ccf but I am currently trying to
reproduce with commit 0d51ce9ca1116 (one merge earlier than d9734e0d1ccf).


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: kernel BUG at drivers/scsi/scsi_lib.c:1096!

2015-11-22 Thread Ming Lei
On Mon, Nov 23, 2015 at 7:20 AM, Mark Salter  wrote:
> On Sun, 2015-11-22 at 00:56 +0800, Ming Lei wrote:
>> On Sat, 21 Nov 2015 12:30:14 +0100
>> Laurent Dufour  wrote:
>>
>> > On 20/11/2015 13:10, Michael Ellerman wrote:
>> > > On Thu, 2015-11-19 at 00:23 -0800, Christoph Hellwig wrote:
>> > >
>> > > > It's pretty much guaranteed a block layer bug, most likely in the
>> > > > merge bios to request infrastucture where we don't obey the merging
>> > > > limits properly.
>> > > >
>> > > > Does either of you have a known good and first known bad kernel?
>> > >
>> > > Not me, I've only hit it one or two times. All I can say is I have hit 
>> > > it in
>> > > 4.4-rc1.
>> > >
>> > > Laurent, can you narrow it down at all?
>> >
>> > It seems that the panic is triggered by the commit bdced438acd8 ("block:
>> > setup bi_phys_segments after splitting") which has been pulled by the
>> > merge d9734e0d1ccf ("Merge branch 'for-4.4/core' of
>> > git://git.kernel.dk/linux-block").
>> >
>> > My system is panicing promptly when running a kernel built at
>> > d9734e0d1ccf, while reverting the commit bdced438acd8, it can run hours
>> > without panicing.
>> >
>> > This being said, I can't explain what's going wrong.
>> >
>> > May Ming shed some light here ?
>>
>> Laurent, looks there is one bug in blk_bio_segment_split(), would you
>> mind testing the following patch to see if it fixes your issue?
>>
>> ---
>> From 6fc701231dcc000bc8bc4b9105583380d9aa31f4 Mon Sep 17 00:00:00 2001
>> From: Ming Lei 
>> Date: Sun, 22 Nov 2015 00:47:13 +0800
>> Subject: [PATCH] block: fix segment split
>>
>> Inside blk_bio_segment_split(), previous bvec pointer('bvprvp')
>> always points to the iterator local variable, which is obviously
>> wrong, so fix it by pointing to the local variable of 'bvprv'.
>>
>> Signed-off-by: Ming Lei 
>> ---
>>  block/blk-merge.c | 4 ++--
>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/block/blk-merge.c b/block/blk-merge.c
>> index de5716d8..f2efe8a 100644
>> --- a/block/blk-merge.c
>> +++ b/block/blk-merge.c
>> @@ -98,7 +98,7 @@ static struct bio *blk_bio_segment_split(struct 
>> request_queue *q,
>>
>>   seg_size += bv.bv_len;
>>   bvprv = bv;
>> - bvprvp = &bv;
>> + bvprvp = &bvprv;
>>   sectors += bv.bv_len >> 9;
>>   continue;
>>   }
>> @@ -108,7 +108,7 @@ new_segment:
>>
>>   nsegs++;
>>   bvprv = bv;
>> - bvprvp = &bv;
>> + bvprvp = &bvprv;
>>   seg_size = bv.bv_len;
>>   sectors += bv.bv_len >> 9;
>>   }
>
> I'm still hitting the BUG even with this patch applied on top of 4.4-rc1.

OK, looks there are still other bugs, care to share us how to reproduce
it on arm64?

thanks,
Ming
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: kernel BUG at drivers/scsi/scsi_lib.c:1096!

2015-11-22 Thread Mark Salter
On Sun, 2015-11-22 at 00:56 +0800, Ming Lei wrote:
> On Sat, 21 Nov 2015 12:30:14 +0100
> Laurent Dufour  wrote:
> 
> > On 20/11/2015 13:10, Michael Ellerman wrote:
> > > On Thu, 2015-11-19 at 00:23 -0800, Christoph Hellwig wrote:
> > > 
> > > > It's pretty much guaranteed a block layer bug, most likely in the
> > > > merge bios to request infrastucture where we don't obey the merging
> > > > limits properly.
> > > > 
> > > > Does either of you have a known good and first known bad kernel?
> > > 
> > > Not me, I've only hit it one or two times. All I can say is I have hit it 
> > > in
> > > 4.4-rc1.
> > > 
> > > Laurent, can you narrow it down at all?
> > 
> > It seems that the panic is triggered by the commit bdced438acd8 ("block:
> > setup bi_phys_segments after splitting") which has been pulled by the
> > merge d9734e0d1ccf ("Merge branch 'for-4.4/core' of
> > git://git.kernel.dk/linux-block").
> > 
> > My system is panicing promptly when running a kernel built at
> > d9734e0d1ccf, while reverting the commit bdced438acd8, it can run hours
> > without panicing.
> > 
> > This being said, I can't explain what's going wrong.
> > 
> > May Ming shed some light here ?
> 
> Laurent, looks there is one bug in blk_bio_segment_split(), would you
> mind testing the following patch to see if it fixes your issue?
> 
> ---
> From 6fc701231dcc000bc8bc4b9105583380d9aa31f4 Mon Sep 17 00:00:00 2001
> From: Ming Lei 
> Date: Sun, 22 Nov 2015 00:47:13 +0800
> Subject: [PATCH] block: fix segment split
> 
> Inside blk_bio_segment_split(), previous bvec pointer('bvprvp')
> always points to the iterator local variable, which is obviously
> wrong, so fix it by pointing to the local variable of 'bvprv'.
> 
> Signed-off-by: Ming Lei 
> ---
>  block/blk-merge.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/block/blk-merge.c b/block/blk-merge.c
> index de5716d8..f2efe8a 100644
> --- a/block/blk-merge.c
> +++ b/block/blk-merge.c
> @@ -98,7 +98,7 @@ static struct bio *blk_bio_segment_split(struct 
> request_queue *q,
>  
>   seg_size += bv.bv_len;
>   bvprv = bv;
> - bvprvp = &bv;
> + bvprvp = &bvprv;
>   sectors += bv.bv_len >> 9;
>   continue;
>   }
> @@ -108,7 +108,7 @@ new_segment:
>  
>   nsegs++;
>   bvprv = bv;
> - bvprvp = &bv;
> + bvprvp = &bvprv;
>   seg_size = bv.bv_len;
>   sectors += bv.bv_len >> 9;
>   }

I'm still hitting the BUG even with this patch applied on top of 4.4-rc1.


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: kernel BUG at drivers/scsi/scsi_lib.c:1096!

2015-11-21 Thread Ming Lei
On Sat, 21 Nov 2015 12:30:14 +0100
Laurent Dufour  wrote:

> On 20/11/2015 13:10, Michael Ellerman wrote:
> > On Thu, 2015-11-19 at 00:23 -0800, Christoph Hellwig wrote:
> > 
> >> It's pretty much guaranteed a block layer bug, most likely in the
> >> merge bios to request infrastucture where we don't obey the merging
> >> limits properly.
> >>
> >> Does either of you have a known good and first known bad kernel?
> > 
> > Not me, I've only hit it one or two times. All I can say is I have hit it in
> > 4.4-rc1.
> > 
> > Laurent, can you narrow it down at all?
> 
> It seems that the panic is triggered by the commit bdced438acd8 ("block:
> setup bi_phys_segments after splitting") which has been pulled by the
> merge d9734e0d1ccf ("Merge branch 'for-4.4/core' of
> git://git.kernel.dk/linux-block").
> 
> My system is panicing promptly when running a kernel built at
> d9734e0d1ccf, while reverting the commit bdced438acd8, it can run hours
> without panicing.
> 
> This being said, I can't explain what's going wrong.
> 
> May Ming shed some light here ?

Laurent, looks there is one bug in blk_bio_segment_split(), would you
mind testing the following patch to see if it fixes your issue?

---
>From 6fc701231dcc000bc8bc4b9105583380d9aa31f4 Mon Sep 17 00:00:00 2001
From: Ming Lei 
Date: Sun, 22 Nov 2015 00:47:13 +0800
Subject: [PATCH] block: fix segment split

Inside blk_bio_segment_split(), previous bvec pointer('bvprvp')
always points to the iterator local variable, which is obviously
wrong, so fix it by pointing to the local variable of 'bvprv'.

Signed-off-by: Ming Lei 
---
 block/blk-merge.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/block/blk-merge.c b/block/blk-merge.c
index de5716d8..f2efe8a 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -98,7 +98,7 @@ static struct bio *blk_bio_segment_split(struct request_queue 
*q,
 
seg_size += bv.bv_len;
bvprv = bv;
-   bvprvp = &bv;
+   bvprvp = &bvprv;
sectors += bv.bv_len >> 9;
continue;
}
@@ -108,7 +108,7 @@ new_segment:
 
nsegs++;
bvprv = bv;
-   bvprvp = &bv;
+   bvprvp = &bvprv;
seg_size = bv.bv_len;
sectors += bv.bv_len >> 9;
}
-- 
1.9.1



Thanks,
Ming


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: kernel BUG at drivers/scsi/scsi_lib.c:1096!

2015-11-21 Thread Laurent Dufour
On 20/11/2015 13:10, Michael Ellerman wrote:
> On Thu, 2015-11-19 at 00:23 -0800, Christoph Hellwig wrote:
> 
>> It's pretty much guaranteed a block layer bug, most likely in the
>> merge bios to request infrastucture where we don't obey the merging
>> limits properly.
>>
>> Does either of you have a known good and first known bad kernel?
> 
> Not me, I've only hit it one or two times. All I can say is I have hit it in
> 4.4-rc1.
> 
> Laurent, can you narrow it down at all?

It seems that the panic is triggered by the commit bdced438acd8 ("block:
setup bi_phys_segments after splitting") which has been pulled by the
merge d9734e0d1ccf ("Merge branch 'for-4.4/core' of
git://git.kernel.dk/linux-block").

My system is panicing promptly when running a kernel built at
d9734e0d1ccf, while reverting the commit bdced438acd8, it can run hours
without panicing.

This being said, I can't explain what's going wrong.

May Ming shed some light here ?

Cheers


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: kernel BUG at drivers/scsi/scsi_lib.c:1096!

2015-11-20 Thread Ewan Milne
On Fri, 2015-11-20 at 15:55 +0100, Hannes Reinecke wrote:
> Can't we have a joint effort here?
> I've been spending a _LOT_ of time trying to debug things here, but
> none of the ideas I've come up with have been able to fix anything.

Yes.  I'm not the one primarily looking at it, and we don't have a
reproducer in-house.  We just have the one dump right now.

> 
> I'm almost tempted to increase the count from scsi_alloc_sgtable()
> by one and be done with ...
> 

That might not fix it if it is a problem with the merge code, though.


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: kernel BUG at drivers/scsi/scsi_lib.c:1096!

2015-11-20 Thread Hannes Reinecke
On 11/20/2015 03:38 PM, Ewan Milne wrote:
> On Thu, 2015-11-19 at 16:35 +0100, Hannes Reinecke wrote:
>> On 11/19/2015 09:23 AM, Christoph Hellwig wrote:
>>> It's pretty much guaranteed a block layer bug, most likely in the
>>> merge bios to request infrastucture where we don't obey the merging
>>> limits properly.
>>>
>>> Does either of you have a known good and first known bad kernel?
>>
>> Well, I have been fighting a similar issue for several months now,
>> albeit with multipath enabled. Haven't had much progress with this,
>> sadly.
>> Seeing that this is our distro kernel it might or might not be
>> related; however, as the symptoms are identical there still is a
>> chance that this is actually a generic block-layer problem.
>>
>> Cheers,
>>
>> Hannes
> 
> We have seen this also.  (e.g.  req->nr_phys_segments was 3, but
> blk_rq_map_sg() returned 4.)  I was suspicious of the patch:
> 
> bio: modify __bio_add_page() to accept pages that don't start a new segment
> 
> But we put some debugging code in and didn't hit it.  We haven't
> found the problem yet, either, though.  We're still looking.
> 
Can't we have a joint effort here?
I've been spending a _LOT_ of time trying to debug things here, but
none of the ideas I've come up with have been able to fix anything.

I'm almost tempted to increase the count from scsi_alloc_sgtable()
by one and be done with ...

> As Christoph said, it would seem to be a problem with the block layer
> merging.
> 
> The API for this seems defective, in that blk_rq_map_sg() should
> never be returning a value indicating that it overwrote past the
> end of the supplied SG array and depend on the caller to check it.
> (We could get data corruption on another I/O if it used adjacent
> memory for a different SG list, for example.)
> 
Yeah, the API is bloody useless.
By the time you hit the BUG_ON you've already caused a memory
corruption, so no way you can recover there.

At the very least we should be passing in the sg list count into
blk_map_rq_sg(), but that's a core blocklayer API and changes here
would require changes by quite a set of drivers. Plus it wouldn't
help me for a distribution kernel ...

Cheers,

Hannes
-- 
Dr. Hannes ReineckezSeries & Storage
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: kernel BUG at drivers/scsi/scsi_lib.c:1096!

2015-11-20 Thread Ewan Milne
On Thu, 2015-11-19 at 16:35 +0100, Hannes Reinecke wrote:
> On 11/19/2015 09:23 AM, Christoph Hellwig wrote:
> > It's pretty much guaranteed a block layer bug, most likely in the
> > merge bios to request infrastucture where we don't obey the merging
> > limits properly.
> > 
> > Does either of you have a known good and first known bad kernel?
> 
> Well, I have been fighting a similar issue for several months now,
> albeit with multipath enabled. Haven't had much progress with this,
> sadly.
> Seeing that this is our distro kernel it might or might not be
> related; however, as the symptoms are identical there still is a
> chance that this is actually a generic block-layer problem.
> 
> Cheers,
> 
> Hannes

We have seen this also.  (e.g.  req->nr_phys_segments was 3, but
blk_rq_map_sg() returned 4.)  I was suspicious of the patch:

bio: modify __bio_add_page() to accept pages that don't start a new segment

But we put some debugging code in and didn't hit it.  We haven't
found the problem yet, either, though.  We're still looking.

As Christoph said, it would seem to be a problem with the block layer
merging.

The API for this seems defective, in that blk_rq_map_sg() should
never be returning a value indicating that it overwrote past the
end of the supplied SG array and depend on the caller to check it.
(We could get data corruption on another I/O if it used adjacent
memory for a different SG list, for example.)

-Ewan





--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: kernel BUG at drivers/scsi/scsi_lib.c:1096!

2015-11-20 Thread Mark Salter
On Fri, 2015-11-20 at 13:56 +0100, Laurent Dufour wrote:
> On 20/11/2015 13:10, Michael Ellerman wrote:
> > On Thu, 2015-11-19 at 00:23 -0800, Christoph Hellwig wrote:
> > 
> > > It's pretty much guaranteed a block layer bug, most likely in the
> > > merge bios to request infrastucture where we don't obey the merging
> > > limits properly.
> > > 
> > > Does either of you have a known good and first known bad kernel?
> > 
> > Not me, I've only hit it one or two times. All I can say is I have hit it in
> > 4.4-rc1.
> > 
> > Laurent, can you narrow it down at all?
> 
> I'm trying bisect it but it's very time consuming, and I think I missed
> some bad kernel because I didn't let enough time to the system to panic.
> 
> But I'm still on...
> 
I usually hit it at boot time. Maybe one out of five times. I'll take a stab
at bisecting today...

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: kernel BUG at drivers/scsi/scsi_lib.c:1096!

2015-11-20 Thread Laurent Dufour
On 20/11/2015 13:10, Michael Ellerman wrote:
> On Thu, 2015-11-19 at 00:23 -0800, Christoph Hellwig wrote:
> 
>> It's pretty much guaranteed a block layer bug, most likely in the
>> merge bios to request infrastucture where we don't obey the merging
>> limits properly.
>>
>> Does either of you have a known good and first known bad kernel?
> 
> Not me, I've only hit it one or two times. All I can say is I have hit it in
> 4.4-rc1.
> 
> Laurent, can you narrow it down at all?

I'm trying bisect it but it's very time consuming, and I think I missed
some bad kernel because I didn't let enough time to the system to panic.

But I'm still on...

Cheers

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: kernel BUG at drivers/scsi/scsi_lib.c:1096!

2015-11-20 Thread Michael Ellerman
On Thu, 2015-11-19 at 00:23 -0800, Christoph Hellwig wrote:

> It's pretty much guaranteed a block layer bug, most likely in the
> merge bios to request infrastucture where we don't obey the merging
> limits properly.
> 
> Does either of you have a known good and first known bad kernel?

Not me, I've only hit it one or two times. All I can say is I have hit it in
4.4-rc1.

Laurent, can you narrow it down at all?

cheers

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: kernel BUG at drivers/scsi/scsi_lib.c:1096!

2015-11-19 Thread Hannes Reinecke
On 11/19/2015 09:23 AM, Christoph Hellwig wrote:
> It's pretty much guaranteed a block layer bug, most likely in the
> merge bios to request infrastucture where we don't obey the merging
> limits properly.
> 
> Does either of you have a known good and first known bad kernel?

Well, I have been fighting a similar issue for several months now,
albeit with multipath enabled. Haven't had much progress with this,
sadly.
Seeing that this is our distro kernel it might or might not be
related; however, as the symptoms are identical there still is a
chance that this is actually a generic block-layer problem.

Cheers,

Hannes
-- 
Dr. Hannes ReineckezSeries & Storage
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: kernel BUG at drivers/scsi/scsi_lib.c:1096!

2015-11-19 Thread Christoph Hellwig
It's pretty much guaranteed a block layer bug, most likely in the
merge bios to request infrastucture where we don't obey the merging
limits properly.

Does either of you have a known good and first known bad kernel?
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: kernel BUG at drivers/scsi/scsi_lib.c:1096!

2015-11-18 Thread Michael Ellerman
On Wed, 2015-11-18 at 09:03 -0500, Mark Salter wrote:
> On Wed, 2015-11-18 at 20:18 +1100, Michael Ellerman wrote:
> > Hi folks,
> > 
> > I'm intermittently seeing the following oops on at least one powerpc box.
> > 
> > The BUG_ON() is from:
> > 
> > static int scsi_init_sgtable(struct request *req, struct scsi_data_buffer 
> > *sdb)
> > {
> > ...
> > count = blk_rq_map_sg(req->q, req, sdb->table.sgl);
> > BUG_ON(count > sdb->table.nents);
> > 
> > Looking at the dump it looks like count was 2, I can't work out what nents 
> > was.
> > 
> > The machine's just a fairly boring bare metal setup, with a single IPR 
> > adapter:
> > 
> > 0001:08:00.0 RAID bus controller: IBM PCI-E IPR SAS Adapter (ASIC) (rev 02)
> > Subsystem: IBM PCIe3 x8 SAS RAID Internal Adapter 6Gb (57D7)
> > Flags: bus master, fast devsel, latency 0
> > Kernel driver in use: ipr
> > 
> > 
> > Anyone seen it before or have any ideas?
> 
> I'm also seeing it on arm64 in 4.4-rc1

Ah thanks, that's a good data point. I was assuming it was a driver bug, but I
assume you're not using IPR :)

> [6.859003] Call trace:
>   
> [6.861439] [] scsi_init_sgtable+0x84/0x88   
>   
> [6.867072] [] scsi_init_io+0x4c/0x1ac   
>   
> [6.872358] [] sd_setup_read_write_cmnd+0x44/0x844   
>   
> [6.878682] [] sd_init_command+0x38/0xb0 
>   
> [6.884141] [] scsi_setup_cmnd+0xd8/0x13c
>   
> [6.889686] [] scsi_prep_fn+0xc0/0x140   
>   
> [6.894973] [] blk_peek_request+0x148/0x24c  
>   
> [6.900692] [] scsi_request_fn+0x58/0x648
>   
> [6.906237] [] __blk_run_queue+0x40/0x58 
>   
> [6.911696] [] blk_run_queue+0x30/0x48   
>   
> [6.916983] [] scsi_run_queue+0x204/0x294
>   
> [6.922528] [] scsi_end_request+0x13c/0x1a0  
>   
> [6.928247] [] scsi_io_completion+0xf0/0x564 
>   
> [6.934052] [] scsi_finish_command+0xe4/0x144
>   
> [6.939943] [] scsi_softirq_done+0x148/0x178 
>   
> [6.945748] [] blk_done_softirq+0x7c/0x94
>   
> [6.951295] [] __do_softirq+0x114/0x2a0  
>   
> [6.956667] [] irq_exit+0x8c/0xe4
>   
> [6.961522] [] handle_IPI+0x170/0x228
>   
> [6.966721] [] gic_handle_irq+0xa0/0xb8  
>   
> [6.972093] Exception stack(0xfe03dc143de0 to 0xfe03dc143f00)  
>   

cheers

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: kernel BUG at drivers/scsi/scsi_lib.c:1096!

2015-11-18 Thread Mark Salter
On Wed, 2015-11-18 at 20:18 +1100, Michael Ellerman wrote:
> Hi folks,
> 
> I'm intermittently seeing the following oops on at least one powerpc box.
> 
> The BUG_ON() is from:
> 
> static int scsi_init_sgtable(struct request *req, struct scsi_data_buffer 
> *sdb)
> {
>   ...
>   count = blk_rq_map_sg(req->q, req, sdb->table.sgl);
>   BUG_ON(count > sdb->table.nents);
> 
> Looking at the dump it looks like count was 2, I can't work out what nents 
> was.
> 
> The machine's just a fairly boring bare metal setup, with a single IPR 
> adapter:
> 
> 0001:08:00.0 RAID bus controller: IBM PCI-E IPR SAS Adapter (ASIC) (rev 02)
>   Subsystem: IBM PCIe3 x8 SAS RAID Internal Adapter 6Gb (57D7)
>   Flags: bus master, fast devsel, latency 0
>   Kernel driver in use: ipr
> 
> 
> Anyone seen it before or have any ideas?

I'm also seeing it on arm64 in 4.4-rc1

[6.859003] Call trace:  

[6.861439] [] scsi_init_sgtable+0x84/0x88 

[6.867072] [] scsi_init_io+0x4c/0x1ac 

[6.872358] [] sd_setup_read_write_cmnd+0x44/0x844 

[6.878682] [] sd_init_command+0x38/0xb0   

[6.884141] [] scsi_setup_cmnd+0xd8/0x13c  

[6.889686] [] scsi_prep_fn+0xc0/0x140 

[6.894973] [] blk_peek_request+0x148/0x24c

[6.900692] [] scsi_request_fn+0x58/0x648  

[6.906237] [] __blk_run_queue+0x40/0x58   

[6.911696] [] blk_run_queue+0x30/0x48 

[6.916983] [] scsi_run_queue+0x204/0x294  

[6.922528] [] scsi_end_request+0x13c/0x1a0

[6.928247] [] scsi_io_completion+0xf0/0x564   

[6.934052] [] scsi_finish_command+0xe4/0x144  

[6.939943] [] scsi_softirq_done+0x148/0x178   

[6.945748] [] blk_done_softirq+0x7c/0x94  

[6.951295] [] __do_softirq+0x114/0x2a0

[6.956667] [] irq_exit+0x8c/0xe4  

[6.961522] [] handle_IPI+0x170/0x228  

[6.966721] [] gic_handle_irq+0xa0/0xb8

[6.972093] Exception stack(0xfe03dc143de0 to 0xfe03dc143f00)


> 
> cheers
> 
> 
> systemd[1]: Starting Uncomplicated firewall...
>  Starting Uncomplicated firewall...
> [ cut here ]
> kernel BUG at drivers/scsi/scsi_lib.c:1096!
> Oops: Exception in kernel mode, sig: 5 [#1]
> SMP NR_CPUS=2048 NUMA PowerNV
> Modules linked in:
> CPU: 132 PID: 2699 Comm: kworker/132:1H Not tainted 
> 4.4.0-rc1-54939-ge22a248-dirty #77
> Workqueue: kblockd cfq_kick_queue
> task: c00fef147400 ti: c00feb384000 task.ti: c00feb384000
> NIP: c05ab4a8 LR: c05ab490 CTR: 
> REGS: c00feb387620 TRAP: 0700   Not tainted  
> (4.4.0-rc1-54939-ge22a248-dirty)
> MSR: 900100029033   CR: 24002228  XER: 
> CFAR: c0464950 SOFTE: 0 
> GPR00: c05ab490 c00feb3878a0 c0d77d00 0002 
> GPR04: c00ff2030158 c00ff47a0c00  1000 
> GPR08:  0001  fff7 
> GPR12: 2200 cfde5200 c00c8098 c0b39858 
> GPR16: c0ae12c8 c0b39948   
> GPR20: c05ab9c0 c017f5800144  c01e551a6850 
> GPR24: c017f5800140 c007efb9c800  c01e551a6800 
> GPR28:   c00ff2030158 c00feb420240 
> NIP [c05ab4a8] scsi_init_sgtable+0xa8/0x180
> LR [c05ab490] scsi_init_sgtable+0x90/0x180
> Call Trace:
> [c00feb3878a0] [c05ab490] scsi_init_sgtable+0x90/0x180 
> (unreliable)
> [c00feb3878e0] [c05ab5d4] scsi_init_io+0x54/0x160
> [c00feb387930] [c05fb43c] sd_init_command+0x6c/0xb00
> [c00feb3879f0] [c05ac2f8] scsi_setup_cmnd+0x108/0x1a0
> [c00feb387a30] [c05ac594] scsi_prep_fn+0x104/0x1c0
> [c00feb387a70] [c045d85c] blk_peek_request+0x20c/0x390
> [c00feb387af0] [c05ad6a8] scsi_request_fn+0xb8/0x7f0
> [c00feb387bf0] [c04583b4] __blk_run_queue+0x54/0x80
> [c00feb387c20] [c047e178] cfq_kick_queue+0x38/0xc0
> [c00feb387c50] [c00c00cc] process_one_work+0x2ac/0x560
> [c00feb387ce0] [c00c0510] worker_thread+0x190/0x660
> [c00feb387d80] [c00c8198] kthread+0x108/0x130
> [c00feb387e30] [c