Re: bug in tag handling in blk-mq?

2018-05-09 Thread Mike Galbraith
On Wed, 2018-05-09 at 13:50 -0600, Jens Axboe wrote:
> On 5/9/18 12:31 PM, Mike Galbraith wrote:
> > On Wed, 2018-05-09 at 11:01 -0600, Jens Axboe wrote:
> >> On 5/9/18 10:57 AM, Mike Galbraith wrote:
> >>
> > Confirmed.  Impressive high speed bug stomping.
> 
>  Well, that's good news. Can I get you to try this patch?
> >>>
> >>> Sure thing.  The original hang (minus provocation patch) being
> >>> annoyingly non-deterministic, this will (hopefully) take a while.
> >>
> >> You can verify with the provocation patch as well first, if you wish.
> > 
> > Done, box still seems fine.
> 
> Omar had some (valid) complaints, can you try this one as well? You
> can also find it as a series here:
> 
> http://git.kernel.dk/cgit/linux-block/log/?h=bfq-cleanups
> 
> I'll repost the series shortly, need to check if it actually builds and
> boots.

I applied the series (+ provocation), all is well.

-Mike


Re: bug in tag handling in blk-mq?

2018-05-09 Thread Jens Axboe
On 5/9/18 12:31 PM, Mike Galbraith wrote:
> On Wed, 2018-05-09 at 11:01 -0600, Jens Axboe wrote:
>> On 5/9/18 10:57 AM, Mike Galbraith wrote:
>>
> Confirmed.  Impressive high speed bug stomping.

 Well, that's good news. Can I get you to try this patch?
>>>
>>> Sure thing.  The original hang (minus provocation patch) being
>>> annoyingly non-deterministic, this will (hopefully) take a while.
>>
>> You can verify with the provocation patch as well first, if you wish.
> 
> Done, box still seems fine.

Omar had some (valid) complaints, can you try this one as well? You
can also find it as a series here:

http://git.kernel.dk/cgit/linux-block/log/?h=bfq-cleanups

I'll repost the series shortly, need to check if it actually builds and
boots.

diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index ebc264c87a09..cba6e82153a2 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -487,46 +487,6 @@ static struct request *bfq_choose_req(struct bfq_data *bfqd,
 }
 
 /*
- * See the comments on bfq_limit_depth for the purpose of
- * the depths set in the function.
- */
-static void bfq_update_depths(struct bfq_data *bfqd, struct sbitmap_queue *bt)
-{
-   bfqd->sb_shift = bt->sb.shift;
-
-   /*
-* In-word depths if no bfq_queue is being weight-raised:
-* leaving 25% of tags only for sync reads.
-*
-* In next formulas, right-shift the value
-* (1U<<bfqd->sb_shift), instead of computing directly
-* (1U<<(bfqd->sb_shift - something)), to be robust against
-* any possible value of bfqd->sb_shift, without having to
-* limit 'something'.
-*/
-   /* no more than 50% of tags for async I/O */
-   bfqd->word_depths[0][0] = max((1U<<bfqd->sb_shift)>>1, 1U);
-   /*
-* no more than 75% of tags for sync writes (25% extra tags
-* w.r.t. async I/O, to prevent async I/O from starving sync
-* writes)
-*/
-   bfqd->word_depths[0][1] = max(((1U<<bfqd->sb_shift) * 3)>>2, 1U);
-
-   /*
-* In-word depths in case some bfq_queue is being weight-
-* raised: leaving ~63% of tags for sync reads. This is the
-* highest percentage for which, in our tests, application
-* start-up times didn't suffer from any regression due to tag
-* shortage.
-*/
-   /* no more than ~18% of tags for async I/O */
-   bfqd->word_depths[1][0] = max(((1U<<bfqd->sb_shift) * 3)>>4, 1U);
-   /* no more than ~37% of tags for sync writes (~20% extra tags) */
-   bfqd->word_depths[1][1] = max(((1U<<bfqd->sb_shift) * 6)>>4, 1U);
-}
-
-/*
  * Async I/O can easily starve sync I/O (both sync reads and sync
  * writes), by consuming all tags. Similarly, storms of sync writes,
  * such as those that sync(2) may trigger, can starve sync reads.
@@ -535,25 +495,11 @@ static void bfq_update_depths(struct bfq_data *bfqd, struct sbitmap_queue *bt)
  */
 static void bfq_limit_depth(unsigned int op, struct blk_mq_alloc_data *data)
 {
-   struct blk_mq_tags *tags = blk_mq_tags_from_data(data);
struct bfq_data *bfqd = data->q->elevator->elevator_data;
-   struct sbitmap_queue *bt;
 
if (op_is_sync(op) && !op_is_write(op))
return;
 
-   if (data->flags & BLK_MQ_REQ_RESERVED) {
-   if (unlikely(!tags->nr_reserved_tags)) {
-   WARN_ON_ONCE(1);
-   return;
-   }
-   bt = &tags->breserved_tags;
-   } else
-   bt = &tags->bitmap_tags;
-
-   if (unlikely(bfqd->sb_shift != bt->sb.shift))
-   bfq_update_depths(bfqd, bt);
-
data->shallow_depth =
bfqd->word_depths[!!bfqd->wr_busy_queues][op_is_sync(op)];
 
@@ -5105,6 +5051,66 @@ void bfq_put_async_queues(struct bfq_data *bfqd, struct bfq_group *bfqg)
__bfq_put_async_bfqq(bfqd, &bfqg->async_idle_bfqq);
 }
 
+/*
+ * See the comments on bfq_limit_depth for the purpose of
+ * the depths set in the function. Return minimum shallow depth we'll use.
+ */
+static unsigned int bfq_update_depths(struct bfq_data *bfqd,
+ struct sbitmap_queue *bt)
+{
+   unsigned int i, j, min_shallow = UINT_MAX;
+
+   bfqd->sb_shift = bt->sb.shift;
+
+   /*
+* In-word depths if no bfq_queue is being weight-raised:
+* leaving 25% of tags only for sync reads.
+*
+* In next formulas, right-shift the value
+* (1U<<bfqd->sb_shift), instead of computing directly
+* (1U<<(bfqd->sb_shift - something)), to be robust against
+* any possible value of bfqd->sb_shift, without having to
+* limit 'something'.
+*/
+   /* no more than 50% of tags for async I/O */
+   bfqd->word_depths[0][0] = max((1U<<bfqd->sb_shift)>>1, 1U);
+   /*
+* no more than 75% of tags for sync writes (25% extra tags
+* w.r.t. async I/O, to prevent async I/O from starving sync
+* writes)
+*/
+   bfqd->word
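
The archived copy of this patch is truncated above, but the visible hunks show
the direction of the series: bfq_limit_depth() no longer recomputes the depth
table in the allocation hot path, and the reworked bfq_update_depths() now
returns the minimum shallow depth BFQ will ever use.  A rough sketch of how
that return value is presumably consumed at init time follows; the hook and
the sbitmap helper name are assumptions, since neither appears in the hunks
above.

/* Sketch only, kernel context assumed; names below are not from the hunks above. */
static int bfq_init_hctx_sketch(struct blk_mq_hw_ctx *hctx, unsigned int index)
{
        struct bfq_data *bfqd = hctx->queue->elevator->elevator_data;
        struct blk_mq_tags *tags = hctx->sched_tags;
        unsigned int min_shallow;

        /* Compute the word_depths table once, outside the allocation path. */
        min_shallow = bfq_update_depths(bfqd, &tags->bitmap_tags);

        /*
         * Tell sbitmap the smallest shallow depth it will ever be asked to
         * honour, so the wake batch can be sized accordingly (assumed helper).
         */
        sbitmap_queue_min_shallow_depth(&tags->bitmap_tags, min_shallow);

        return 0;
}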

Re: bug in tag handling in blk-mq?

2018-05-09 Thread Mike Galbraith
On Wed, 2018-05-09 at 11:01 -0600, Jens Axboe wrote:
> On 5/9/18 10:57 AM, Mike Galbraith wrote:
> 
> >>> Confirmed.  Impressive high speed bug stomping.
> >>
> >> Well, that's good news. Can I get you to try this patch?
> > 
> > Sure thing.  The original hang (minus provocation patch) being
> > annoyingly non-deterministic, this will (hopefully) take a while.
> 
> You can verify with the provocation patch as well first, if you wish.

Done, box still seems fine.

-Mike


Re: bug in tag handling in blk-mq?

2018-05-09 Thread Jens Axboe
On 5/9/18 10:57 AM, Mike Galbraith wrote:
> On Wed, 2018-05-09 at 09:18 -0600, Jens Axboe wrote:
>> On 5/8/18 10:11 PM, Mike Galbraith wrote:
>>> On Tue, 2018-05-08 at 19:09 -0600, Jens Axboe wrote:

 Alright, I managed to reproduce it. What I think is happening is that
 BFQ is limiting the inflight case to something less than the wake
 batch for sbitmap, which can lead to stalls. I don't have time to test
 this tonight, but perhaps you can give it a go when you are back at it.
 If not, I'll try tomorrow morning.

 If this is the issue, I can turn it into a real patch. This is just to
 confirm that the issue goes away with the below.
>>>
>>> Confirmed.  Impressive high speed bug stomping.
>>
>> Well, that's good news. Can I get you to try this patch?
> 
> Sure thing.  The original hang (minus provocation patch) being
> annoyingly non-deterministic, this will (hopefully) take a while.

You can verify with the provocation patch as well first, if you wish.
Just need to hand-apply since it'll conflict with this patch in
bfq. But it's a trivial resolve.

-- 
Jens Axboe



Re: bug in tag handling in blk-mq?

2018-05-09 Thread Mike Galbraith
On Wed, 2018-05-09 at 09:18 -0600, Jens Axboe wrote:
> On 5/8/18 10:11 PM, Mike Galbraith wrote:
> > On Tue, 2018-05-08 at 19:09 -0600, Jens Axboe wrote:
> >>
> >> Alright, I managed to reproduce it. What I think is happening is that
> >> BFQ is limiting the inflight case to something less than the wake
> >> batch for sbitmap, which can lead to stalls. I don't have time to test
> >> this tonight, but perhaps you can give it a go when you are back at it.
> >> If not, I'll try tomorrow morning.
> >>
> >> If this is the issue, I can turn it into a real patch. This is just to
> >> confirm that the issue goes away with the below.
> > 
> > Confirmed.  Impressive high speed bug stomping.
> 
> Well, that's good news. Can I get you to try this patch?

Sure thing.  The original hang (minus provocation patch) being
annoyingly non-deterministic, this will (hopefully) take a while.

-Mike


Re: bug in tag handling in blk-mq?

2018-05-09 Thread Jens Axboe
On 5/8/18 10:11 PM, Mike Galbraith wrote:
> On Tue, 2018-05-08 at 19:09 -0600, Jens Axboe wrote:
>>
>> Alright, I managed to reproduce it. What I think is happening is that
>> BFQ is limiting the inflight case to something less than the wake
>> batch for sbitmap, which can lead to stalls. I don't have time to test
>> this tonight, but perhaps you can give it a go when you are back at it.
>> If not, I'll try tomorrow morning.
>>
>> If this is the issue, I can turn it into a real patch. This is just to
>> confirm that the issue goes away with the below.
> 
> Confirmed.  Impressive high speed bug stomping.

Well, that's good news. Can I get you to try this patch? Needs to be
split, but it'll be good to know if this fixes it too (since it's an
ACTUAL attempt at a fix, not just a masking).


diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index ebc264c87a09..b0dbfd297d20 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -533,19 +533,20 @@ static void bfq_update_depths(struct bfq_data *bfqd, struct sbitmap_queue *bt)
  * Limit depths of async I/O and sync writes so as to counter both
  * problems.
  */
-static void bfq_limit_depth(unsigned int op, struct blk_mq_alloc_data *data)
+static int bfq_limit_depth(unsigned int op, struct blk_mq_alloc_data *data)
 {
struct blk_mq_tags *tags = blk_mq_tags_from_data(data);
struct bfq_data *bfqd = data->q->elevator->elevator_data;
struct sbitmap_queue *bt;
+   int old_depth;
 
if (op_is_sync(op) && !op_is_write(op))
-   return;
+   return 0;
 
if (data->flags & BLK_MQ_REQ_RESERVED) {
if (unlikely(!tags->nr_reserved_tags)) {
WARN_ON_ONCE(1);
-   return;
+   return 0;
}
bt = &tags->breserved_tags;
} else
@@ -554,12 +555,18 @@ static void bfq_limit_depth(unsigned int op, struct blk_mq_alloc_data *data)
if (unlikely(bfqd->sb_shift != bt->sb.shift))
bfq_update_depths(bfqd, bt);
 
+   old_depth = data->shallow_depth;
data->shallow_depth =
bfqd->word_depths[!!bfqd->wr_busy_queues][op_is_sync(op)];
 
bfq_log(bfqd, "[%s] wr_busy %d sync %d depth %u",
__func__, bfqd->wr_busy_queues, op_is_sync(op),
data->shallow_depth);
+
+   if (old_depth != data->shallow_depth)
+   return data->shallow_depth;
+
+   return 0;
 }
 
 static struct bfq_queue *
diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index 25c14c58385c..0c53a254671f 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -16,6 +16,32 @@
 #include "blk-mq-tag.h"
 #include "blk-wbt.h"
 
+void blk_mq_sched_limit_depth(struct elevator_queue *e,
+ struct blk_mq_alloc_data *data, unsigned int op)
+{
+   struct blk_mq_tags *tags = blk_mq_tags_from_data(data);
+   struct sbitmap_queue *bt;
+   int ret;
+
+   /*
+* Flush requests are special and go directly to the
+* dispatch list.
+*/
+   if (op_is_flush(op) || !e->type->ops.mq.limit_depth)
+   return;
+
+   ret = e->type->ops.mq.limit_depth(op, data);
+   if (!ret)
+   return;
+
+   if (data->flags & BLK_MQ_REQ_RESERVED)
+   bt = &tags->breserved_tags;
+   else
+   bt = &tags->bitmap_tags;
+
+   sbitmap_queue_shallow_depth(bt, ret);
+}
+
 void blk_mq_sched_free_hctx_data(struct request_queue *q,
 void (*exit)(struct blk_mq_hw_ctx *))
 {
diff --git a/block/blk-mq-sched.h b/block/blk-mq-sched.h
index 1e9c9018ace1..6abebc1b9ae0 100644
--- a/block/blk-mq-sched.h
+++ b/block/blk-mq-sched.h
@@ -5,6 +5,9 @@
 #include "blk-mq.h"
 #include "blk-mq-tag.h"
 
+void blk_mq_sched_limit_depth(struct elevator_queue *e,
+ struct blk_mq_alloc_data *data, unsigned int op);
+
 void blk_mq_sched_free_hctx_data(struct request_queue *q,
 void (*exit)(struct blk_mq_hw_ctx *));
 
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 4e9d83594cca..1bb7aa40c192 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -357,13 +357,7 @@ static struct request *blk_mq_get_request(struct request_queue *q,
 
if (e) {
data->flags |= BLK_MQ_REQ_INTERNAL;
-
-   /*
-* Flush requests are special and go directly to the
-* dispatch list.
-*/
-   if (!op_is_flush(op) && e->type->ops.mq.limit_depth)
-   e->type->ops.mq.limit_depth(op, data);
+   blk_mq_sched_limit_depth(e, data, op);
}
 
tag = blk_mq_get_tag(data);
diff --git a/block/kyber-iosched.c b/block/kyber-iosched.c
index 564967fafe5f..d2622386c115 100644
--- a/block/kyber-iosched.c
+++ b/block/kyber-iosched.c
@@ -433,17 +433,23 @@ static void rq_clear_domain_t
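
The kyber hunk is cut off by the archive here, but the key point of the patch
is visible above: ->limit_depth() now returns the shallow depth it imposed,
and blk_mq_sched_limit_depth() forwards that value to
sbitmap_queue_shallow_depth().  A hedged sketch of the guarantee that sbitmap
call presumably has to provide (the struct layout and the recalculation below
are assumptions, not shown anywhere in this thread):

/* Sketch only: keep the wake batch no larger than the shallow limit allows. */
#define SBQ_WAIT_QUEUES 8

struct sbq_sketch {
        unsigned int depth;        /* full depth of the tag map */
        unsigned int wake_batch;   /* frees needed before waking waiters */
};

void sbq_sketch_shallow_depth(struct sbq_sketch *sbq, unsigned int shallow_depth)
{
        /* Recompute the batch as if the map were only shallow_depth deep. */
        unsigned int batch = shallow_depth / SBQ_WAIT_QUEUES;

        if (batch < 1)
                batch = 1;
        if (batch < sbq->wake_batch)
                sbq->wake_batch = batch;
}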

Re: bug in tag handling in blk-mq?

2018-05-08 Thread Mike Galbraith
On Tue, 2018-05-08 at 14:37 -0600, Jens Axboe wrote:
> 
> - sdd has nothing pending, yet has 6 active waitqueues.

sdd is where ccache storage lives, which should have been the only
activity on that drive, as I built source in sdb and was doing nothing
else that utilizes sdd.

-Mike


Re: bug in tag handling in blk-mq?

2018-05-08 Thread Paolo Valente


> On 9 May 2018, at 06:11, Mike Galbraith wrote:
> 
> On Tue, 2018-05-08 at 19:09 -0600, Jens Axboe wrote:
>> 
>> Alright, I managed to reproduce it. What I think is happening is that
>> BFQ is limiting the inflight case to something less than the wake
>> batch for sbitmap, which can lead to stalls. I don't have time to test
>> this tonight, but perhaps you can give it a go when you are back at it.
>> If not, I'll try tomorrow morning.
>> 
>> If this is the issue, I can turn it into a real patch. This is just to
>> confirm that the issue goes away with the below.
> 
> Confirmed.  Impressive high speed bug stomping.
> 

Great! It's a real relief that this ghost is gone.

Thank you both,
Paolo

>> diff --git a/lib/sbitmap.c b/lib/sbitmap.c
>> index e6a9c06ec70c..94ced15b6428 100644
>> --- a/lib/sbitmap.c
>> +++ b/lib/sbitmap.c
>> @@ -272,6 +272,7 @@ EXPORT_SYMBOL_GPL(sbitmap_bitmap_show);
>> 
>> static unsigned int sbq_calc_wake_batch(unsigned int depth)
>> {
>> +#if 0
>>  unsigned int wake_batch;
>> 
>>  /*
>> @@ -284,6 +285,9 @@ static unsigned int sbq_calc_wake_batch(unsigned int depth)
>>  wake_batch = max(1U, depth / SBQ_WAIT_QUEUES);
>> 
>>  return wake_batch;
>> +#else
>> +return 1;
>> +#endif
>> }
>> 
>> int sbitmap_queue_init_node(struct sbitmap_queue *sbq, unsigned int depth,
>> 



Re: bug in tag handling in blk-mq?

2018-05-08 Thread Mike Galbraith
On Tue, 2018-05-08 at 19:09 -0600, Jens Axboe wrote:
> 
> Alright, I managed to reproduce it. What I think is happening is that
> BFQ is limiting the inflight case to something less than the wake
> batch for sbitmap, which can lead to stalls. I don't have time to test
> this tonight, but perhaps you can give it a go when you are back at it.
> If not, I'll try tomorrow morning.
> 
> If this is the issue, I can turn it into a real patch. This is just to
> confirm that the issue goes away with the below.

Confirmed.  Impressive high speed bug stomping.

> diff --git a/lib/sbitmap.c b/lib/sbitmap.c
> index e6a9c06ec70c..94ced15b6428 100644
> --- a/lib/sbitmap.c
> +++ b/lib/sbitmap.c
> @@ -272,6 +272,7 @@ EXPORT_SYMBOL_GPL(sbitmap_bitmap_show);
>  
>  static unsigned int sbq_calc_wake_batch(unsigned int depth)
>  {
> +#if 0
>   unsigned int wake_batch;
>  
>   /*
> @@ -284,6 +285,9 @@ static unsigned int sbq_calc_wake_batch(unsigned int depth)
>   wake_batch = max(1U, depth / SBQ_WAIT_QUEUES);
>  
>   return wake_batch;
> +#else
> + return 1;
> +#endif
>  }
>  
>  int sbitmap_queue_init_node(struct sbitmap_queue *sbq, unsigned int depth,
> 


Re: bug in tag handling in blk-mq?

2018-05-08 Thread Jens Axboe
On 5/8/18 3:19 PM, Jens Axboe wrote:
> On 5/8/18 2:37 PM, Jens Axboe wrote:
>> On 5/8/18 10:42 AM, Mike Galbraith wrote:
>>> On Tue, 2018-05-08 at 08:55 -0600, Jens Axboe wrote:

 All the block debug files are empty...
>>>
>>> Sigh.  Take 2, this time cat debug files, having turned block tracing
>>> off before doing anything else (so trace bits in dmesg.txt should end
>>> AT the stall).
>>
>> OK, that's better. What I see from the traces:
>>
>> - You have regular IO and some non-fs IO (from scsi_execute()). This mix
>>   may be key.
>>
>> - sdd has nothing pending, yet has 6 active waitqueues.
>>
>> I'm going to see if I can reproduce this. Paolo, what kind of attempts
>> to reproduce this have you done?
> 
> No luck so far. Out of the patches you referenced, I can only find the
> shallow depth change, since that's in the parent of this email. Can
> you send those as well?
> 
> Perhaps also expand a bit on exactly what you are running. File system,
> mount options, etc.

Alright, I managed to reproduce it. What I think is happening is that
BFQ is limiting the inflight case to something less than the wake
batch for sbitmap, which can lead to stalls. I don't have time to test
this tonight, but perhaps you can give it a go when you are back at it.
If not, I'll try tomorrow morning.

If this is the issue, I can turn it into a real patch. This is just to
confirm that the issue goes away with the below.

diff --git a/lib/sbitmap.c b/lib/sbitmap.c
index e6a9c06ec70c..94ced15b6428 100644
--- a/lib/sbitmap.c
+++ b/lib/sbitmap.c
@@ -272,6 +272,7 @@ EXPORT_SYMBOL_GPL(sbitmap_bitmap_show);
 
 static unsigned int sbq_calc_wake_batch(unsigned int depth)
 {
+#if 0
unsigned int wake_batch;
 
/*
@@ -284,6 +285,9 @@ static unsigned int sbq_calc_wake_batch(unsigned int depth)
wake_batch = max(1U, depth / SBQ_WAIT_QUEUES);
 
return wake_batch;
+#else
+   return 1;
+#endif
 }
 
 int sbitmap_queue_init_node(struct sbitmap_queue *sbq, unsigned int depth,

-- 
Jens Axboe
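
A minimal userspace sketch of the arithmetic behind the diagnosis above: the
wake-batch formula mirrors the sbq_calc_wake_batch() code quoted in the patch,
while the stall check, the depth of 62, and the shallow limits are illustrative
assumptions, not values from Mike's machine.

#include <stdio.h>

#define SBQ_WAIT_QUEUES 8

/* Same formula as the quoted lib/sbitmap.c helper. */
static unsigned int calc_wake_batch(unsigned int depth)
{
        unsigned int wake_batch = depth / SBQ_WAIT_QUEUES;

        return wake_batch > 1 ? wake_batch : 1;
}

int main(void)
{
        unsigned int depth = 62;                 /* hypothetical tag-map depth */
        unsigned int shallow[] = { 1, 4, 32 };   /* BFQ-style shallow limits */
        unsigned int batch = calc_wake_batch(depth);

        printf("wake batch for depth %u: %u\n", depth, batch);
        for (int i = 0; i < 3; i++) {
                /*
                 * If at most shallow[i] tags can ever be in flight, at most
                 * that many completions can arrive while allocators sleep,
                 * so a wake batch larger than the limit may never be hit.
                 */
                printf("shallow %2u: %s\n", shallow[i],
                       shallow[i] < batch ? "stall possible" : "ok");
        }
        return 0;
}

With a depth of 62 the wake batch is 7, so forcing shallow_depth to 1 (Paolo's
provocation one-liner from the original report) can stall, while forcing the
wake batch itself to 1 (the #if 0 hunk above) masks the problem.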



Re: bug in tag handling in blk-mq?

2018-05-08 Thread Jens Axboe
On 5/8/18 2:37 PM, Jens Axboe wrote:
> On 5/8/18 10:42 AM, Mike Galbraith wrote:
>> On Tue, 2018-05-08 at 08:55 -0600, Jens Axboe wrote:
>>>
>>> All the block debug files are empty...
>>
>> Sigh.  Take 2, this time cat debug files, having turned block tracing
>> off before doing anything else (so trace bits in dmesg.txt should end
>> AT the stall).
> 
> OK, that's better. What I see from the traces:
> 
> - You have regular IO and some non-fs IO (from scsi_execute()). This mix
>   may be key.
> 
> - sdd has nothing pending, yet has 6 active waitqueues.
> 
> I'm going to see if I can reproduce this. Paolo, what kind of attempts
> to reproduce this have you done?

No luck so far. Out of the patches you referenced, I can only find the
shallow depth change, since that's in the parent of this email. Can
you send those as well?

Perhaps also expand a bit on exactly what you are running. File system,
mount options, etc.

-- 
Jens Axboe



Re: bug in tag handling in blk-mq?

2018-05-08 Thread Jens Axboe
On 5/8/18 10:42 AM, Mike Galbraith wrote:
> On Tue, 2018-05-08 at 08:55 -0600, Jens Axboe wrote:
>>
>> All the block debug files are empty...
> 
> Sigh.  Take 2, this time cat debug files, having turned block tracing
> off before doing anything else (so trace bits in dmesg.txt should end
> AT the stall).

OK, that's better. What I see from the traces:

- You have regular IO and some non-fs IO (from scsi_execute()). This mix
  may be key.

- sdd has nothing pending, yet has 6 active waitqueues.

I'm going to see if I can reproduce this. Paolo, what kind of attempts
to reproduce this have you done?

-- 
Jens Axboe



Re: bug in tag handling in blk-mq?

2018-05-08 Thread Mike Galbraith
On Tue, 2018-05-08 at 08:55 -0600, Jens Axboe wrote:
> 
> All the block debug files are empty...

Sigh.  Take 2, this time cat debug files, having turned block tracing
off before doing anything else (so trace bits in dmesg.txt should end
AT the stall).

-Mike

dmesg.xz
Description: application/xz


dmesg.txt.xz
Description: application/xz


block_debug.xz
Description: application/xz


Re: bug in tag handling in blk-mq?

2018-05-08 Thread Jens Axboe
On 5/8/18 2:37 AM, Mike Galbraith wrote:
> On Tue, 2018-05-08 at 06:51 +0200, Mike Galbraith wrote:
>>
>> I'm deadlined ATM, but will get to it.
> 
> (Bah, even a zombie can type ccache -C; make -j8 and stare...)
> 
> kbuild again hung on the first go (yay), and post hang data written to
> sdd1 survived (kernel source lives in sdb3).  Full ftrace buffer (echo
> 1 > events/block/enable) available off list if desired.  dmesg.txt.xz
> is dmesg from post hang crashdump, attached because it contains the
> tail of trace buffer, so _might_ be useful.
> 
> homer:~ # df|grep sd
> /dev/sdb3  959074776 785342824 172741072  82% /
> /dev/sdc3  959074776 455464912 502618984  48% /backup
> /dev/sdb1 159564  7980    151584   6% /boot/efi
> /dev/sdd1  961301832 393334868 519112540  44% /abuild
> 
> Kernel is virgin modulo these...
> 
> patches/remove_irritating_plus.diff
> patches/add-scm-version-to-EXTRAVERSION.patch
> patches/block-bfq:-postpone-rq-preparation-to-insert-or-merge.patch
> patches/block-bfq:-test.patch  (hang provocation hack from Paolo)

All the block debug files are empty...

-- 
Jens Axboe



Re: bug in tag handling in blk-mq?

2018-05-08 Thread Mike Galbraith
On Tue, 2018-05-08 at 06:51 +0200, Mike Galbraith wrote:
> 
> I'm deadlined ATM, but will get to it.

(Bah, even a zombie can type ccache -C; make -j8 and stare...)

kbuild again hung on the first go (yay), and post hang data written to
sdd1 survived (kernel source lives in sdb3).  Full ftrace buffer (echo
1 > events/block/enable) available off list if desired.  dmesg.txt.xz
is dmesg from post hang crashdump, attached because it contains the
tail of trace buffer, so _might_ be useful.

homer:~ # df|grep sd
/dev/sdb3  959074776 785342824 172741072  82% /
/dev/sdc3  959074776 455464912 502618984  48% /backup
/dev/sdb1 159564  7980    151584   6% /boot/efi
/dev/sdd1  961301832 393334868 519112540  44% /abuild

Kernel is virgin modulo these...

patches/remove_irritating_plus.diff
patches/add-scm-version-to-EXTRAVERSION.patch
patches/block-bfq:-postpone-rq-preparation-to-insert-or-merge.patch
patches/block-bfq:-test.patch  (hang provocation hack from Paolo)

-Mike

block_debug.tar.xz
Description: application/xz-compressed-tar


dmesg.xz
Description: application/xz


dmesg.txt.xz
Description: application/xz


Re: bug in tag handling in blk-mq?

2018-05-07 Thread Mike Galbraith
On Mon, 2018-05-07 at 20:02 +0200, Paolo Valente wrote:
> 
> 
> > Is there a reproducer?

Just building fat config kernels works for me.  It was highly non-
deterministic, but reproduced quickly twice in a row with Paolo's hack.
  
> Ok Mike, I guess it's your turn now, for at least a stack trace.

Sure.  I'm deadlined ATM, but will get to it.

-Mike


Re: bug in tag handling in blk-mq?

2018-05-07 Thread Paolo Valente


> On 7 May 2018, at 18:39, Jens Axboe wrote:
> 
> On 5/7/18 8:03 AM, Paolo Valente wrote:
>> Hi Jens, Christoph, all,
>> Mike Galbraith has been experiencing hangs, on blk_mq_get_tag, only
>> with bfq [1].  Symptoms seem to clearly point to a problem in I/O-tag
>> handling, triggered by bfq because it limits the number of tags for
>> async and sync write requests (in bfq_limit_depth).
>> 
>> Fortunately, I just happened to find a way to apparently confirm it.
>> With the following one-liner for block/bfq-iosched.c:
>> 
>> @@ -554,8 +554,7 @@ static void bfq_limit_depth(unsigned int op, struct blk_mq_alloc_data *data)
>>if (unlikely(bfqd->sb_shift != bt->sb.shift))
>>bfq_update_depths(bfqd, bt);
>> 
>> -   data->shallow_depth =
>> -   bfqd->word_depths[!!bfqd->wr_busy_queues][op_is_sync(op)];
>> +   data->shallow_depth = 1;
>> 
>>bfq_log(bfqd, "[%s] wr_busy %d sync %d depth %u",
>>__func__, bfqd->wr_busy_queues, op_is_sync(op),
>> 
>> Mike's machine now crashes soon and systematically, while nothing bad
>> happens on my machines, even with heavy workloads (apart from an
>> expected throughput drop).
>> 
>> This change simply reduces to 1 the maximum possible value for the sum
>> of the number of async requests and of sync write requests.
>> 
>> This email is basically a request for help to knowledgeable people.  To
>> start, here are my first doubts/questions:
>> 1) Just to be certain, I guess it is not normal that blk-mq hangs if
>> async requests and sync write requests can be at most one, right?
>> 2) Do you have any hint to where I could look for, to chase this bug?
>> Of course, the bug may be in bfq, i.e, it may be a somehow unrelated
>> bfq bug that causes this hang in blk-mq, indirectly.  But it is hard
>> for me to understand how.
> 
> CC Omar, since he implemented the shallow part. But we'll need some
> traces to show where we are hung, probably also the value of the
> /sys/kernel/debug/block/<dev>/ directory. For the crash mentioned, a
> trace as well. Otherwise we'll be wasting a lot of time on this.
> 
> Is there a reproducer?
> 

Ok Mike, I guess it's your turn now, for at least a stack trace.

Thanks,
Paolo

> -- 
> Jens Axboe



Re: bug in tag handling in blk-mq?

2018-05-07 Thread Jens Axboe
On 5/7/18 8:03 AM, Paolo Valente wrote:
> Hi Jens, Christoph, all,
> Mike Galbraith has been experiencing hangs, on blk_mq_get_tag, only
> with bfq [1].  Symptoms seem to clearly point to a problem in I/O-tag
> handling, triggered by bfq because it limits the number of tags for
> async and sync write requests (in bfq_limit_depth).
> 
> Fortunately, I just happened to find a way to apparently confirm it.
> With the following one-liner for block/bfq-iosched.c:
> 
> @@ -554,8 +554,7 @@ static void bfq_limit_depth(unsigned int op, struct blk_mq_alloc_data *data)
> if (unlikely(bfqd->sb_shift != bt->sb.shift))
> bfq_update_depths(bfqd, bt);
>  
> -   data->shallow_depth =
> -   bfqd->word_depths[!!bfqd->wr_busy_queues][op_is_sync(op)];
> +   data->shallow_depth = 1;
>  
> bfq_log(bfqd, "[%s] wr_busy %d sync %d depth %u",
> __func__, bfqd->wr_busy_queues, op_is_sync(op),
> 
> Mike's machine now crashes soon and systematically, while nothing bad
> happens on my machines, even with heavy workloads (apart from an
> expected throughput drop).
> 
> This change simply reduces to 1 the maximum possible value for the sum
> of the number of async requests and of sync write requests.
> 
> This email is basically a request for help to knowledgeable people.  To
> start, here are my first doubts/questions:
> 1) Just to be certain, I guess it is not normal that blk-mq hangs if
> async requests and sync write requests can be at most one, right?
> 2) Do you have any hint to where I could look for, to chase this bug?
> Of course, the bug may be in bfq, i.e, it may be a somehow unrelated
> bfq bug that causes this hang in blk-mq, indirectly.  But it is hard
> for me to understand how.

CC Omar, since he implemented the shallow part. But we'll need some
traces to show where we are hung, probably also the value of the
/sys/kernel/debug/block/<dev>/ directory. For the crash mentioned, a
trace as well. Otherwise we'll be wasting a lot of time on this.

Is there a reproducer?

-- 
Jens Axboe



bug in tag handling in blk-mq?

2018-05-07 Thread Paolo Valente
Hi Jens, Christoph, all,
Mike Galbraith has been experiencing hangs, on blk_mq_get_tag, only
with bfq [1].  Symptoms seem to clearly point to a problem in I/O-tag
handling, triggered by bfq because it limits the number of tags for
async and sync write requests (in bfq_limit_depth).

Fortunately, I just happened to find a way to apparently confirm it.
With the following one-liner for block/bfq-iosched.c:

@@ -554,8 +554,7 @@ static void bfq_limit_depth(unsigned int op, struct blk_mq_alloc_data *data)
if (unlikely(bfqd->sb_shift != bt->sb.shift))
bfq_update_depths(bfqd, bt);
 
-   data->shallow_depth =
-   bfqd->word_depths[!!bfqd->wr_busy_queues][op_is_sync(op)];
+   data->shallow_depth = 1;
 
bfq_log(bfqd, "[%s] wr_busy %d sync %d depth %u",
__func__, bfqd->wr_busy_queues, op_is_sync(op),

Mike's machine now crashes soon and systematically, while nothing bad
happens on my machines, even with heavy workloads (apart from an
expected throughput drop).

This change simply reduces to 1 the maximum possible value for the sum
of the number of async requests and of sync write requests.
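
To make those numbers concrete, here is a small userspace sketch that
evaluates the same word_depths formulas bfq_update_depths() fills in (quoted
elsewhere in this thread); the shift value is an arbitrary example, and the
indexing mirrors the data->shallow_depth lookup in bfq_limit_depth().

#include <stdio.h>

static unsigned int umax(unsigned int a, unsigned int b)
{
        return a > b ? a : b;
}

int main(void)
{
        unsigned int sb_shift = 6;      /* example only: 64 bits per word */
        unsigned int word_depths[2][2];

        /* No bfq_queue weight-raised: 50% async, 75% sync writes. */
        word_depths[0][0] = umax((1U << sb_shift) >> 1, 1U);
        word_depths[0][1] = umax(((1U << sb_shift) * 3) >> 2, 1U);
        /* Some bfq_queue weight-raised: ~18% async, ~37% sync writes. */
        word_depths[1][0] = umax(((1U << sb_shift) * 3) >> 4, 1U);
        word_depths[1][1] = umax(((1U << sb_shift) * 6) >> 4, 1U);

        /* shallow_depth = word_depths[!!wr_busy_queues][op_is_sync(op)] */
        for (int wr = 0; wr < 2; wr++)
                for (int sync = 0; sync < 2; sync++)
                        printf("wr_busy=%d sync=%d -> shallow_depth=%u\n",
                               wr, sync, word_depths[wr][sync]);
        return 0;
}

The one-liner above simply forces all four of these values to 1.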

This email is basically a request for help to knowledgeable people.  To
start, here are my first doubts/questions:
1) Just to be certain, I guess it is not normal that blk-mq hangs if
async requests and sync write requests can be at most one, right?
2) Do you have any hint to where I could look for, to chase this bug?
Of course, the bug may be in bfq, i.e, it may be a somehow unrelated
bfq bug that causes this hang in blk-mq, indirectly.  But it is hard
for me to understand how.

Looking forward to some help.

Thanks,
Paolo

[1] https://www.spinics.net/lists/stable/msg215036.html