Re: bug in tag handling in blk-mq?

2018-05-09 Thread Mike Galbraith
On Wed, 2018-05-09 at 13:50 -0600, Jens Axboe wrote:
> On 5/9/18 12:31 PM, Mike Galbraith wrote:
> > On Wed, 2018-05-09 at 11:01 -0600, Jens Axboe wrote:
> >> On 5/9/18 10:57 AM, Mike Galbraith wrote:
> >>
> >>>>> Confirmed.  Impressive high speed bug stomping.
> >>>>
> >>>> Well, that's good news. Can I get you to try this patch?
> >>>
> >>> Sure thing.  The original hang (minus provocation patch) being
> >>> annoyingly non-deterministic, this will (hopefully) take a while.
> >>
> >> You can verify with the provocation patch as well first, if you wish.
> > 
> > Done, box still seems fine.
> 
> Omar had some (valid) complaints, can you try this one as well? You
> can also find it as a series here:
> 
> http://git.kernel.dk/cgit/linux-block/log/?h=bfq-cleanups
> 
> I'll repost the series shortly, need to check if it actually builds and
> boots.

I applied the series (+ provocation), all is well.

-Mike


Re: bug in tag handling in blk-mq?

2018-05-09 Thread Mike Galbraith
On Wed, 2018-05-09 at 11:01 -0600, Jens Axboe wrote:
> On 5/9/18 10:57 AM, Mike Galbraith wrote:
> 
> >>> Confirmed.  Impressive high speed bug stomping.
> >>
> >> Well, that's good news. Can I get you to try this patch?
> > 
> > Sure thing.  The original hang (minus provocation patch) being
> > annoyingly non-deterministic, this will (hopefully) take a while.
> 
> You can verify with the provocation patch as well first, if you wish.

Done, box still seems fine.

-Mike


Re: bug in tag handling in blk-mq?

2018-05-09 Thread Mike Galbraith
On Wed, 2018-05-09 at 09:18 -0600, Jens Axboe wrote:
> On 5/8/18 10:11 PM, Mike Galbraith wrote:
> > On Tue, 2018-05-08 at 19:09 -0600, Jens Axboe wrote:
> >>
> >> Alright, I managed to reproduce it. What I think is happening is that
> >> BFQ is limiting the inflight case to something less than the wake
> >> batch for sbitmap, which can lead to stalls. I don't have time to test
> >> this tonight, but perhaps you can give it a go when you are back at it.
> >> If not, I'll try tomorrow morning.
> >>
> >> If this is the issue, I can turn it into a real patch. This is just to
> >> confirm that the issue goes away with the below.
> > 
> > Confirmed.  Impressive high speed bug stomping.
> 
> Well, that's good news. Can I get you to try this patch?

Sure thing.  The original hang (minus provocation patch) being
annoyingly non-deterministic, this will (hopefully) take a while.

-Mike


Re: bug in tag handling in blk-mq?

2018-05-08 Thread Mike Galbraith
On Tue, 2018-05-08 at 14:37 -0600, Jens Axboe wrote:
> 
> - sdd has nothing pending, yet has 6 active waitqueues.

sdd is where ccache storage lives, which should have been the only
activity on that drive, as I built source on sdb and was doing nothing
else that utilizes sdd.

-Mike


Re: bug in tag handling in blk-mq?

2018-05-08 Thread Mike Galbraith
On Tue, 2018-05-08 at 19:09 -0600, Jens Axboe wrote:
> 
> Alright, I managed to reproduce it. What I think is happening is that
> BFQ is limiting the inflight case to something less than the wake
> batch for sbitmap, which can lead to stalls. I don't have time to test
> this tonight, but perhaps you can give it a go when you are back at it.
> If not, I'll try tomorrow morning.
> 
> If this is the issue, I can turn it into a real patch. This is just to
> confirm that the issue goes away with the below.

Confirmed.  Impressive high speed bug stomping.

> diff --git a/lib/sbitmap.c b/lib/sbitmap.c
> index e6a9c06ec70c..94ced15b6428 100644
> --- a/lib/sbitmap.c
> +++ b/lib/sbitmap.c
> @@ -272,6 +272,7 @@ EXPORT_SYMBOL_GPL(sbitmap_bitmap_show);
>  
>  static unsigned int sbq_calc_wake_batch(unsigned int depth)
>  {
> +#if 0
>   unsigned int wake_batch;
>  
>   /*
> @@ -284,6 +285,9 @@ static unsigned int sbq_calc_wake_batch(unsigned int depth)
>   wake_batch = max(1U, depth / SBQ_WAIT_QUEUES);
>  
>   return wake_batch;
> +#else
> + return 1;
> +#endif
>  }
>  
>  int sbitmap_queue_init_node(struct sbitmap_queue *sbq, unsigned int depth,
> 


Re: bug in tag handling in blk-mq?

2018-05-08 Thread Mike Galbraith
On Tue, 2018-05-08 at 08:55 -0600, Jens Axboe wrote:
> 
> All the block debug files are empty...

Sigh.  Take 2: this time I cat'd the debug files, having turned block
tracing off before doing anything else (so trace bits in dmesg.txt
should end AT the stall).

-Mike

dmesg.xz
Description: application/xz


dmesg.txt.xz
Description: application/xz


block_debug.xz
Description: application/xz


Re: bug in tag handling in blk-mq?

2018-05-08 Thread Mike Galbraith
On Tue, 2018-05-08 at 06:51 +0200, Mike Galbraith wrote:
> 
> I'm deadlined ATM, but will get to it.

(Bah, even a zombie can type ccache -C; make -j8 and stare...)

kbuild again hung on the first go (yay), and post hang data written to
sdd1 survived (kernel source lives in sdb3).  Full ftrace buffer (echo
1 > events/block/enable) available off list if desired.  dmesg.txt.xz
is dmesg from post hang crashdump, attached because it contains the
tail of trace buffer, so _might_ be useful.

homer:~ # df|grep sd
/dev/sdb3  959074776 785342824 172741072  82% /
/dev/sdc3  959074776 455464912 502618984  48% /backup
/dev/sdb1 159564  7980    151584   6% /boot/efi
/dev/sdd1  961301832 393334868 519112540  44% /abuild

Kernel is virgin modulo these...

patches/remove_irritating_plus.diff
patches/add-scm-version-to-EXTRAVERSION.patch
patches/block-bfq:-postpone-rq-preparation-to-insert-or-merge.patch
patches/block-bfq:-test.patch  (hang provocation hack from Paolo)

-Mike

block_debug.tar.xz
Description: application/xz-compressed-tar


dmesg.xz
Description: application/xz


dmesg.txt.xz
Description: application/xz


Re: bug in tag handling in blk-mq?

2018-05-07 Thread Mike Galbraith
On Mon, 2018-05-07 at 20:02 +0200, Paolo Valente wrote:
> 
> 
> > Is there a reproducer?

Just building fat config kernels works for me.  It was highly non-
deterministic, but reproduced quickly twice in a row with Paolo's hack.
  
> Ok Mike, I guess it's your turn now, for at least a stack trace.

Sure.  I'm deadlined ATM, but will get to it.

-Mike


Re: [PATCH BUGFIX] block, bfq: postpone rq preparation to insert or merge

2018-05-07 Thread Mike Galbraith
On Mon, 2018-05-07 at 11:27 +0200, Paolo Valente wrote:
> 
> 
> Where is the bug?

Hm, seems potent pain-killers and C don't mix all that well.



Re: [PATCH BUGFIX] block, bfq: postpone rq preparation to insert or merge

2018-05-06 Thread Mike Galbraith
On Sun, 2018-05-06 at 09:42 +0200, Paolo Valente wrote:
> 
> diff --git a/block/bfq-mq-iosched.c b/block/bfq-mq-iosched.c
> index 118f319af7c0..6662efe29b69 100644
> --- a/block/bfq-mq-iosched.c
> +++ b/block/bfq-mq-iosched.c
> @@ -525,8 +525,13 @@ static void bfq_limit_depth(unsigned int op, struct blk_mq_alloc_data *data)
> if (unlikely(bfqd->sb_shift != bt->sb.shift))
> bfq_update_depths(bfqd, bt);
>  
> +#if 0
> data->shallow_depth =
> bfqd->word_depths[!!bfqd->wr_busy_queues][op_is_sync(op)];
^

Q: why doesn't the top of this function look like so?

---
 block/bfq-iosched.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -539,7 +539,7 @@ static void bfq_limit_depth(unsigned int
struct bfq_data *bfqd = data->q->elevator->elevator_data;
struct sbitmap_queue *bt;
 
-   if (op_is_sync(op) && !op_is_write(op))
+   if (!op_is_write(op))
return;
 
if (data->flags & BLK_MQ_REQ_RESERVED) {

It looks a bit odd that these elements exist...

+   /*
+    * no more than 75% of tags for sync writes (25% extra tags
+    * w.r.t. async I/O, to prevent async I/O from starving sync
+    * writes)
+    */
+   bfqd->word_depths[0][1] = max(((1U << bfqd->sb_shift) * 3) >> 2, 1U);
+
+   /* no more than ~37% of tags for sync writes (~20% extra tags) */
+   bfqd->word_depths[1][1] = max(((1U << bfqd->sb_shift) * 6) >> 4, 1U);

...yet we index via and log a guaranteed zero.

-Mike




Re: [PATCH BUGFIX] block, bfq: postpone rq preparation to insert or merge

2018-05-06 Thread Mike Galbraith
On Mon, 2018-05-07 at 04:43 +0200, Mike Galbraith wrote:
> On Sun, 2018-05-06 at 09:42 +0200, Paolo Valente wrote:
> > 
> > I've attached a compressed patch (to avoid possible corruption from my
> > mailer).  I'm little confident, but no pain, no gain, right?
> > 
> > If possible, apply this patch on top of the fix I proposed in this
> > thread, just to eliminate possible further noise. Finally, the
> > patch content follows.
> > 
> > Hoping for a stroke of luck,
> 
> FWIW, box didn't survive the first full build of the morning.

Nor the second.

-Mike


Re: [PATCH BUGFIX] block, bfq: postpone rq preparation to insert or merge

2018-05-06 Thread Mike Galbraith
On Sun, 2018-05-06 at 09:42 +0200, Paolo Valente wrote:
> 
> I've attached a compressed patch (to avoid possible corruption from my
> mailer).  I'm little confident, but no pain, no gain, right?
> 
> If possible, apply this patch on top of the fix I proposed in this
> thread, just to eliminate possible further noise. Finally, the
> patch content follows.
> 
> Hoping for a stroke of luck,

FWIW, box didn't survive the first full build of the morning.

> Paolo
> 
> diff --git a/block/bfq-mq-iosched.c b/block/bfq-mq-iosched.c
> index 118f319af7c0..6662efe29b69 100644
> --- a/block/bfq-mq-iosched.c
> +++ b/block/bfq-mq-iosched.c

That doesn't exist in master, so I applied it like so.

---
 block/bfq-iosched.c |    4 ++++
 1 file changed, 4 insertions(+)

--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -554,8 +554,12 @@ static void bfq_limit_depth(unsigned int
if (unlikely(bfqd->sb_shift != bt->sb.shift))
bfq_update_depths(bfqd, bt);
 
+#if 0
data->shallow_depth =
bfqd->word_depths[!!bfqd->wr_busy_queues][op_is_sync(op)];
+#else
+   data->shallow_depth = 1;
+#endif
 
bfq_log(bfqd, "[%s] wr_busy %d sync %d depth %u",
__func__, bfqd->wr_busy_queues, op_is_sync(op),


Re: [PATCH BUGFIX] block, bfq: postpone rq preparation to insert or merge

2018-05-05 Thread Mike Galbraith
On Sat, 2018-05-05 at 12:39 +0200, Paolo Valente wrote:
> 
> BTW, if you didn't run out of patience with this permanent issue yet,
> I was thinking of two o three changes to retry to trigger your failure
> reliably.

Sure, fire away, I'll happily give the annoying little bugger
opportunities to show its tender belly.

-Mike



Re: [PATCH BUGFIX] block, bfq: postpone rq preparation to insert or merge

2018-05-05 Thread Mike Galbraith
On Fri, 2018-05-04 at 21:46 +0200, Mike Galbraith wrote:
> Tentatively, I suspect you've just fixed the nasty stalls I reported a
> while back.

Oh well, so much for optimism.  It took a lot, but just hung.


Re: [PATCH BUGFIX] block, bfq: postpone rq preparation to insert or merge

2018-05-04 Thread Mike Galbraith
Tentatively, I suspect you've just fixed the nasty stalls I reported a
while back.  Not a hint of stall as yet (should have shown itself by
now), spinning rust buckets are being all they can be, box feels good.

Later mq-deadline (I hope to eventually forget the module dependency
eternities we've spent together;), welcome back bfq (maybe.. I hope).

-Mike


Re: [PATCH BUGFIX V3] block, bfq: add requeue-request hook

2018-02-09 Thread Mike Galbraith
On Fri, 2018-02-09 at 14:21 +0100, Oleksandr Natalenko wrote:
> 
> In addition to this I think it should be worth considering CC'ing Greg 
> to pull this fix into 4.15 stable tree.

This isn't one he can cherry-pick; some munging is required, in which
case he usually wants a properly tested backport.

-Mike


Re: [PATCH BUGFIX 1/1] block, bfq: add requeue-request hook

2018-02-07 Thread Mike Galbraith
On Wed, 2018-02-07 at 12:12 +0100, Paolo Valente wrote:

> Just to be certain, before submitting a new patch: you changed *only*
> the BUG_ON at line 4742, on top of my instrumentation patch.

Nah, I completely rewrote it with only a little help from an ouija
board to compensate for missing (all) knowledge wrt BFQ internals.

(iow yeah, I did exactly and only what I was asked to do:)


Re: [PATCH BUGFIX 1/1] block, bfq: add requeue-request hook

2018-02-07 Thread Mike Galbraith
On Wed, 2018-02-07 at 11:27 +0100, Paolo Valente wrote:
> 
> 2. Could you please turn that BUG_ON into:
> if (!(rq->rq_flags & RQF_ELVPRIV))
>   return;
> and see what happens?

That seems to make it forget how to make boom.

-Mike


Re: [PATCH BUGFIX 1/1] block, bfq: add requeue-request hook

2018-02-07 Thread Mike Galbraith
On Wed, 2018-02-07 at 11:27 +0100, Paolo Valente wrote:
> 
> 1. Could you paste a stack trace for this OOPS, just to understand how we
> get there?

[  442.421058] kernel BUG at block/bfq-iosched.c:4742!
[  442.421762] invalid opcode:  [#1] SMP PTI
[  442.422436] Dumping ftrace buffer:
[  442.423116](ftrace buffer empty)
[  442.423785] Modules linked in: fuse(E) ebtable_filter(E) ebtables(E) 
af_packet(E) bridge(E) stp(E) llc(E) iscsi_ibft(E) iscsi_boot_sysfs(E) 
nf_conntrack_ipv6(E) nf_defrag_ipv6(E) ipt_REJECT(E) xt_tcpudp(E) 
iptable_filter(E) ip6table_>
[  442.426685]  pcbc(E) iTCO_wdt(E) aesni_intel(E) aes_x86_64(E) 
iTCO_vendor_support(E) crypto_simd(E) glue_helper(E) mei_me(E) i2c_i801(E) 
r8169(E) cryptd(E) lpc_ich(E) mfd_core(E) mii(E) mei(E) soundcore(E) pcspkr(E) 
shpchp(E) thermal>
[  442.429634]  dm_multipath(E) dm_mod(E) scsi_dh_rdac(E) scsi_dh_emc(E) 
scsi_dh_alua(E) scsi_mod(E) efivarfs(E) autofs4(E)
[  442.430166] CPU: 2 PID: 489 Comm: kworker/2:1H Tainted: GE
4.15.0.ga2e5790-master #612
[  442.430675] Hardware name: MEDION MS-7848/MS-7848, BIOS M7848W08.20C 
09/23/2013
[  442.431195] Workqueue: kblockd blk_mq_requeue_work
[  442.431706] RIP: 0010:bfq_only_requeue_request+0xd1/0xe0
[  442.432221] RSP: 0018:8803f7b8fcc0 EFLAGS: 00010246
[  442.432733] RAX: 2090 RBX: 8803dbda2580 RCX: 
[  442.433250] RDX: 8803fa3c8000 RSI: 0004 RDI: 8803dbda2580
[  442.433778] RBP: 8803f9ec85d0 R08: 8803fa3c81c0 R09: 
[  442.434296] R10: 8803fa3c81d0 R11: 1000 R12: 8803dbda2600
[  442.434815] R13: 000d R14:  R15: 8803f7b8fd88
[  442.435334] FS:  () GS:88041ec8() 
knlGS:
[  442.435863] CS:  0010 DS:  ES:  CR0: 80050033
[  442.436381] CR2: 7f00e001e198 CR3: 01e0a002 CR4: 001606e0
[  442.436909] Call Trace:
[  442.437445]  __blk_mq_requeue_request+0x8f/0x120
[  442.437980]  blk_mq_dispatch_rq_list+0x342/0x550
[  442.438506]  ? kyber_dispatch_request+0xd0/0xd0
[  442.439041]  blk_mq_sched_dispatch_requests+0xf7/0x180
[  442.439568]  __blk_mq_run_hw_queue+0x58/0xd0
[  442.440103]  __blk_mq_delay_run_hw_queue+0x99/0xa0
[  442.440628]  blk_mq_run_hw_queue+0x54/0xf0
[  442.441157]  blk_mq_run_hw_queues+0x4b/0x60
[  442.441822]  blk_mq_requeue_work+0x13a/0x150
[  442.442518]  process_one_work+0x147/0x350
[  442.443205]  worker_thread+0x47/0x3e0
[  442.443887]  kthread+0xf8/0x130
[  442.444579]  ? rescuer_thread+0x360/0x360
[  442.445264]  ? kthread_stop+0x120/0x120
[  442.445965]  ? do_syscall_64+0x75/0x1a0
[  442.446651]  ? SyS_exit_group+0x10/0x10
[  442.447340]  ret_from_fork+0x35/0x40
[  442.448023] Code: ff 4c 89 f6 4c 89 ef e8 be ec 2e 00 48 c7 83 80 00 00 00 
00 00 00 00 48 c7 83 88 00 00 00 00 00 00 00 5b 5d 41 5c 41 5d 41 5e c3 <0f> 0b 
0f 0b 0f 0b 0f 0b 0f 1f 80 00 00 00 00 0f 1f 44 00 00 41
[  442.448668] RIP: bfq_only_requeue_request+0xd1/0xe0 RSP: 8803f7b8fcc0



Re: [PATCH BUGFIX 1/1] block, bfq: add requeue-request hook

2018-02-07 Thread Mike Galbraith
On Wed, 2018-02-07 at 10:45 +0100, Paolo Valente wrote:
> 
> > Il giorno 07 feb 2018, alle ore 10:23, Mike Galbraith <efa...@gmx.de> ha 
> > scritto:
> > 
> > On Wed, 2018-02-07 at 10:08 +0100, Paolo Valente wrote:
> >> 
> >> The first piece of information I need is whether this failure happens
> >> even without "BFQ hierarchical scheduling support".
> > 
> > I presume you mean BFQ_GROUP_IOSCHED, which I do not have enabled.
> > 
> 
> Great (so to speak), this saves us one step.
> 
> So, here's my next request for help: please apply the attached patch
> (compressed to preserve it from my email client) and retry. It adds
> several anomaly checks. I hope I have not added any false-positive
> check.

kernel BUG at block/bfq-iosched.c:4742!

4742 BUG_ON(!(rq->rq_flags & RQF_ELVPRIV));




Re: [PATCH BUGFIX 1/1] block, bfq: add requeue-request hook

2018-02-07 Thread Mike Galbraith
On Wed, 2018-02-07 at 10:08 +0100, Paolo Valente wrote:
> 
> The first piece of information I need is whether this failure happens
> even without "BFQ hierarchical scheduling support".

I presume you mean BFQ_GROUP_IOSCHED, which I do not have enabled.

-Mike 


Re: [PATCH BUGFIX 1/1] block, bfq: add requeue-request hook

2018-02-06 Thread Mike Galbraith
On Tue, 2018-02-06 at 13:43 +0100, Holger Hoffstätte wrote:
> 
> A much more interesting question to me is why there is kyber in the middle. :)

Yeah, which is odd given that per sysfs I have zero devices using kyber.

-Mike


Re: [PATCH BUGFIX 1/1] block, bfq: add requeue-request hook

2018-02-06 Thread Mike Galbraith
On Tue, 2018-02-06 at 13:26 +0100, Paolo Valente wrote:
> 
> ok, right in the middle of bfq this time ... Was this the first OOPS in your 
> kernel log?

Yeah.


Re: [PATCH BUGFIX 1/1] block, bfq: add requeue-request hook

2018-02-06 Thread Mike Galbraith
On Tue, 2018-02-06 at 13:16 +0100, Oleksandr Natalenko wrote:
> Hi.
> 
> 06.02.2018 12:57, Mike Galbraith wrote:
> > Not me.  Box seems to be fairly sure that it is bfq.  Twice again box
> > went belly up on me in fairly short order with bfq, but seemed fine
> > with deadline.  I'm currently running deadline again, and box again
> > seems solid, though I won't say it _is_ solid until it's been happily
> > trundling along with deadline for quite a bit longer.
> 
> Sorry for the noise, but just to make it clear, are we talking about 
> "deadline" or "mq-deadline" now?

mq-deadline.


Re: [PATCH BUGFIX 1/1] block, bfq: add requeue-request hook

2018-02-06 Thread Mike Galbraith
On Tue, 2018-02-06 at 10:38 +0100, Paolo Valente wrote:
> 
> Hi Mike,
> as you can imagine, I didn't get any failure in my pre-submission
> tests on this patch.  In addition, it is not that easy to link this
> patch, which just adds some internal bfq housekeeping in case of a
> requeue, with a corruption of external lists for general I/O
> management.
> 
> In this respect, as Oleksandr comments point out, by switching from
> cfq to bfq, you switch between much more than two schedulers.  Anyway,
> who knows ...

Not me.  Box seems to be fairly sure that it is bfq.  Twice again box
went belly up on me in fairly short order with bfq, but seemed fine
with deadline.  I'm currently running deadline again, and box again
seems solid, though I won't say it _is_ solid until it's been happily
trundling along with deadline for quite a bit longer.

I was ssh'd in during the last episode, got this out.  I should be
getting crash dumps, but seems kdump is only working intermittently
atm.  I did get one earlier, but 3 of 4 times not.  Hohum.

[  484.179292] BUG: unable to handle kernel paging request at a0817000
[  484.179436] IP: __trace_note_message+0x1f/0xd0
[  484.179576] PGD 1e0c067 P4D 1e0c067 PUD 1e0d063 PMD 3faff2067 PTE 0
[  484.179719] Oops:  [#1] SMP PTI
[  484.179861] Dumping ftrace buffer:
[  484.180011](ftrace buffer empty)
[  484.180138] Modules linked in: fuse(E) ebtable_filter(E) ebtables(E) 
af_packet(E) bridge(E) stp(E) llc(E) iscsi_ibft(E) iscsi_boot_sysfs(E) 
nf_conntrack_ipv6(E) nf_defrag_ipv6(E) ipt_REJECT(E) xt_tcpudp(E) 
iptable_filter(E) ip6table_mangle(E) nf_conntrack_netbios_ns(E) 
nf_conntrack_broadcast(E) nf_conntrack_ipv4(E) nf_defrag_ipv4(E) ip_tables(E) 
xt_conntrack(E) nf_conntrack(E) ip6table_filter(E) ip6_tables(E) x_tables(E) 
nls_iso8859_1(E) nls_cp437(E) intel_rapl(E) x86_pkg_temp_thermal(E) 
intel_powerclamp(E) snd_hda_codec_hdmi(E) coretemp(E) kvm_intel(E) 
snd_hda_codec_realtek(E) kvm(E) snd_hda_codec_generic(E) snd_hda_intel(E) 
snd_hda_codec(E) sr_mod(E) snd_hwdep(E) cdrom(E) joydev(E) snd_hda_core(E) 
snd_pcm(E) snd_timer(E) irqbypass(E) snd(E) crct10dif_pclmul(E) crc32_pclmul(E) 
crc32c_intel(E) r8169(E)
[  484.180740]  iTCO_wdt(E) ghash_clmulni_intel(E) mii(E) 
iTCO_vendor_support(E) pcbc(E) aesni_intel(E) soundcore(E) aes_x86_64(E) 
shpchp(E) crypto_simd(E) lpc_ich(E) glue_helper(E) i2c_i801(E) mei_me(E) 
mfd_core(E) mei(E) cryptd(E) intel_smartconnect(E) pcspkr(E) fan(E) thermal(E) 
nfsd(E) auth_rpcgss(E) nfs_acl(E) lockd(E) grace(E) sunrpc(E) 
hid_logitech_hidpp(E) hid_logitech_dj(E) uas(E) usb_storage(E) hid_generic(E) 
usbhid(E) nouveau(E) wmi(E) i2c_algo_bit(E) drm_kms_helper(E) syscopyarea(E) 
sysfillrect(E) sysimgblt(E) fb_sys_fops(E) ahci(E) xhci_pci(E) ehci_pci(E) 
libahci(E) ttm(E) ehci_hcd(E) xhci_hcd(E) libata(E) drm(E) usbcore(E) video(E) 
button(E) sd_mod(E) vfat(E) fat(E) virtio_blk(E) virtio_mmio(E) virtio_pci(E) 
virtio_ring(E) virtio(E) ext4(E) crc16(E) mbcache(E) jbd2(E) loop(E) sg(E) 
dm_multipath(E)
[  484.181421]  dm_mod(E) scsi_dh_rdac(E) scsi_dh_emc(E) scsi_dh_alua(E) 
scsi_mod(E) efivarfs(E) autofs4(E)
[  484.181583] CPU: 3 PID: 500 Comm: kworker/3:1H Tainted: GE
4.15.0.ge237f98-master #609
[  484.181746] Hardware name: MEDION MS-7848/MS-7848, BIOS M7848W08.20C 
09/23/2013
[  484.181910] Workqueue: kblockd blk_mq_requeue_work
[  484.182076] RIP: 0010:__trace_note_message+0x1f/0xd0
[  484.182250] RSP: 0018:8803f45bfc20 EFLAGS: 00010282
[  484.182436] RAX:  RBX: a0817000 RCX: 8803
[  484.182622] RDX: 81bf514d RSI:  RDI: a0817000
[  484.182810] RBP: 8803f45bfc80 R08: 0041 R09: 8803f69cc5d0
[  484.182998] R10: 8803f80b47d0 R11: 1000 R12: 8803f45e8000
[  484.183185] R13: 000d R14:  R15: 8803fba112c0
[  484.183372] FS:  () GS:88041ecc() 
knlGS:
[  484.183561] CS:  0010 DS:  ES:  CR0: 80050033
[  484.183747] CR2: a0817000 CR3: 01e0a006 CR4: 001606e0
[  484.183934] Call Trace:
[  484.184122]  bfq_put_queue+0xd3/0xe0
[  484.184305]  bfq_finish_requeue_request+0x72/0x350
[  484.184493]  __blk_mq_requeue_request+0x8f/0x120
[  484.184678]  blk_mq_dispatch_rq_list+0x342/0x550
[  484.184866]  ? kyber_dispatch_request+0xd0/0xd0
[  484.185053]  blk_mq_sched_dispatch_requests+0xf7/0x180
[  484.185238]  __blk_mq_run_hw_queue+0x58/0xd0
[  484.185429]  __blk_mq_delay_run_hw_queue+0x99/0xa0
[  484.185614]  blk_mq_run_hw_queue+0x54/0xf0
[  484.185805]  blk_mq_run_hw_queues+0x4b/0x60
[  484.185994]  blk_mq_requeue_work+0x13a/0x150
[  484.186192]  process_one_work+0x147/0x350
[  484.186383]  worker_thread+0x47/0x3e0
[  484.186572]  kthread+0xf8/0x130
[  484.186760]  ? rescuer_thread+0x360/0x360
[  484.186948]  ? kthread_stop+0x120/0x120
[  484.187137]  ret_from_fork+0x35/0x40
[  484.187321] Code: ff 48 89 44 24 10 

Re: [PATCH BUGFIX 1/1] block, bfq: add requeue-request hook

2018-02-06 Thread Mike Galbraith
On Tue, 2018-02-06 at 09:37 +0100, Oleksandr Natalenko wrote:
> Hi.
> 
> 06.02.2018 08:56, Mike Galbraith wrote:
> > I was doing kbuilds, and it blew up on me twice.  Switching back to cfq
> > seemed to confirm it was indeed the patch causing trouble, but that's
> > by no means a certainty.
> 
> Just to note, I was using v4.15.1, not the latest git HEAD. Are you able 
> to reproduce it on the stable kernel?

I didn't even try to wedge it into 4.15.1, tested it as posted.

>  Also, assuming this issue might be 
> unrelated to the BFQ itself, did you manage to reproduce the trouble 
> with another blk-mq scheduler (not CFQ, but mq-deadline/Kyber)?

I can give another scheduler a go this afternoon.

-Mike


Re: [PATCH BUGFIX 1/1] block, bfq: add requeue-request hook

2018-02-05 Thread Mike Galbraith
On Tue, 2018-02-06 at 08:44 +0100, Oleksandr Natalenko wrote:
> Hi, Paolo.
> 
> I can confirm that this patch fixes cfdisk hang for me. I've also tried 
> to trigger the issue Mike has encountered, but with no luck (maybe, I 
> wasn't insistent enough, just was doing dd on usb-storage device in the 
> VM).

I was doing kbuilds, and it blew up on me twice.  Switching back to cfq
seemed to confirm it was indeed the patch causing trouble, but that's
by no means a certainty.

-Mike


Re: [PATCH BUGFIX 1/1] block, bfq: add requeue-request hook

2018-02-05 Thread Mike Galbraith
Hi Paolo,

I applied this to master.today, flipped udev back to bfq and took it
for a spin.  Unfortunately, box fairly quickly went boom under load.

[  454.739975] [ cut here ]
[  454.739979] list_add corruption. prev->next should be next 
(5f99a42a), but was   (null). (prev=fc569ec9).
[  454.739989] WARNING: CPU: 3 PID: 0 at lib/list_debug.c:28 
__list_add_valid+0x6a/0x70
[  454.739990] Modules linked in: fuse(E) ebtable_filter(E) ebtables(E) 
af_packet(E) bridge(E) stp(E) llc(E) iscsi_ibft(E) iscsi_boot_sysfs(E) 
nf_conntrack_ipv6(E) nf_defrag_ipv6(E) ipt_REJECT(E) xt_tcpudp(E) 
iptable_filter(E) ip6table_mangle(E) nf_conntrack_netbios_ns(E) 
nf_conntrack_broadcast(E) nf_conntrack_ipv4(E) nf_defrag_ipv4(E) ip_tables(E) 
xt_conntrack(E) nf_conntrack(E) ip6table_filter(E) ip6_tables(E) x_tables(E) 
nls_iso8859_1(E) nls_cp437(E) intel_rapl(E) x86_pkg_temp_thermal(E) 
intel_powerclamp(E) snd_hda_codec_hdmi(E) coretemp(E) snd_hda_codec_realtek(E) 
snd_hda_codec_generic(E) kvm_intel(E) kvm(E) snd_hda_intel(E) snd_hda_codec(E) 
snd_hwdep(E) snd_hda_core(E) snd_pcm(E) irqbypass(E) snd_timer(E) joydev(E) 
crct10dif_pclmul(E) snd(E) r8169(E) crc32_pclmul(E) mii(E) mei_me(E) 
soundcore(E)
[  454.740011]  crc32c_intel(E) iTCO_wdt(E) ghash_clmulni_intel(E) 
iTCO_vendor_support(E) pcbc(E) mei(E) lpc_ich(E) aesni_intel(E) i2c_i801(E) 
mfd_core(E) aes_x86_64(E) shpchp(E) intel_smartconnect(E) crypto_simd(E) 
glue_helper(E) cryptd(E) pcspkr(E) fan(E) thermal(E) nfsd(E) auth_rpcgss(E) 
nfs_acl(E) lockd(E) grace(E) sunrpc(E) sr_mod(E) cdrom(E) hid_logitech_hidpp(E) 
hid_logitech_dj(E) uas(E) usb_storage(E) hid_generic(E) usbhid(E) nouveau(E) 
wmi(E) i2c_algo_bit(E) drm_kms_helper(E) syscopyarea(E) sysfillrect(E) 
sysimgblt(E) fb_sys_fops(E) ahci(E) xhci_pci(E) ttm(E) libahci(E) ehci_pci(E) 
xhci_hcd(E) ehci_hcd(E) libata(E) drm(E) usbcore(E) video(E) button(E) 
sd_mod(E) vfat(E) fat(E) virtio_blk(E) virtio_mmio(E) virtio_pci(E) 
virtio_ring(E) virtio(E) ext4(E) crc16(E) mbcache(E) jbd2(E) loop(E) sg(E)
[  454.740038]  dm_multipath(E) dm_mod(E) scsi_dh_rdac(E) scsi_dh_emc(E) 
scsi_dh_alua(E) scsi_mod(E) efivarfs(E) autofs4(E)
[  454.740043] CPU: 3 PID: 0 Comm: swapper/3 Tainted: GE
4.15.0.ge237f98-master #605
[  454.740044] Hardware name: MEDION MS-7848/MS-7848, BIOS M7848W08.20C 
09/23/2013
[  454.740046] RIP: 0010:__list_add_valid+0x6a/0x70
[  454.740047] RSP: 0018:88041ecc3ca8 EFLAGS: 00010096
[  454.740048] RAX: 0075 RBX: 8803f33fa8c0 RCX: 0006
[  454.740049] RDX:  RSI: 0082 RDI: 88041ecd5570
[  454.740050] RBP: 8803f596d7e0 R08:  R09: 0368
[  454.740051] R10:  R11: 88041ecc3a30 R12: 8803eb1c8828
[  454.740052] R13: 8803f33fa940 R14: 8803f5852600 R15: 8803f596d810
[  454.740053] FS:  () GS:88041ecc() 
knlGS:
[  454.740054] CS:  0010 DS:  ES:  CR0: 80050033
[  454.740055] CR2: 014d9788 CR3: 01e0a006 CR4: 001606e0
[  454.740056] Call Trace:
[  454.740058]  
[  454.740062]  blk_flush_complete_seq+0x2b1/0x370
[  454.740065]  flush_end_io+0x18c/0x280
[  454.740074]  scsi_end_request+0x95/0x1e0 [scsi_mod]
[  454.740079]  scsi_io_completion+0xbb/0x5d0 [scsi_mod]
[  454.740082]  __blk_mq_complete_request+0xb7/0x180
[  454.740084]  blk_mq_complete_request+0x50/0x90
[  454.740087]  ? scsi_vpd_tpg_id+0x90/0x90 [scsi_mod]
[  454.740095]  ata_scsi_qc_complete+0x1d8/0x470 [libata]
[  454.740100]  ata_qc_complete_multiple+0x87/0xd0 [libata]
[  454.740103]  ahci_handle_port_interrupt+0xd4/0x4e0 [libahci]
[  454.740105]  ahci_handle_port_intr+0x6f/0xb0 [libahci]
[  454.740107]  ahci_single_level_irq_intr+0x3b/0x60 [libahci]
[  454.740110]  __handle_irq_event_percpu+0x40/0x1a0
[  454.740112]  handle_irq_event_percpu+0x20/0x50
[  454.740114]  handle_irq_event+0x36/0x60
[  454.740116]  handle_edge_irq+0x90/0x190
[  454.740118]  handle_irq+0x1c/0x30
[  454.740120]  do_IRQ+0x43/0xd0
[  454.740122]  common_interrupt+0xa2/0xa2
[  454.740123]  
[  454.740125] RIP: 0010:cpuidle_enter_state+0xec/0x250
[  454.740126] RSP: 0018:880187febec0 EFLAGS: 0246 ORIG_RAX: 
ffdd
[  454.740127] RAX: 88041ece0040 RBX: 88041ece77e8 RCX: 001f
[  454.740128] RDX:  RSI: fffd9cb7bc38 RDI: 
[  454.740129] RBP: 0005 R08: 0006 R09: 024f
[  454.740130] R10: 0205 R11: 0018 R12: 0003
[  454.740131] R13: 0069e09a3c87 R14: 0003 R15: 0069e09d450c
[  454.740134]  do_idle+0x16a/0x1d0
[  454.740136]  cpu_startup_entry+0x19/0x20
[  454.740138]  start_secondary+0x14e/0x190
[  454.740140]  secondary_startup_64+0xa5/0xb0
[  454.740141] Code: fe 31 c0 48 c7 c7 a0 61 bf 81 e8 12 f7 d8 ff 0f ff 31 c0 
c3 48 89 d1 48 c7 c7 50 61 bf 81 48 

Re: [PATCH 2/6] blk-mq: replace timeout synchronization with a RCU and generation based scheme

2017-12-14 Thread Mike Galbraith
On Thu, 2017-12-14 at 22:54 +0100, Peter Zijlstra wrote:
> On Thu, Dec 14, 2017 at 09:42:48PM +, Bart Van Assche wrote:
> 
> > Some time ago the block layer was changed to handle timeouts in thread 
> > context
> > instead of interrupt context. See also commit 287922eb0b18 ("block: defer
> > timeouts to a workqueue").
> 
> That only makes it a little better:
> 
>   Task-A  Worker
> 
>   write_seqcount_begin()
>   blk_mq_rw_update_state(rq, IN_FLIGHT)
>   blk_add_timer(rq)
>   
>   schedule_work()
>   
>   
>   read_seqcount_begin()
>   while(seq & 1)
>   cpu_relax();
> 
> 
> Now normally this isn't fatal because Worker will simply spin its entire
> time slice away and we'll eventually schedule our Task-A back in, which
> will complete the seqcount and things will work.
> 
> But if, for some reason, our Worker was to have RT priority higher than
> our Task-A we'd be up some creek without no paddles.

Most kthreads, including kworkers, are very frequently SCHED_FIFO here.

-Mike


Re: possible deadlock in blk_trace_remove

2017-12-03 Thread Mike Galbraith
On Sun, 2017-12-03 at 17:47 -0700, Jens Axboe wrote:
> On 12/03/2017 05:44 PM, Eric Biggers wrote:
> > 
> >>> #syz fix: blktrace: fix trace mutex deadlock
> >>
> >> This is fixed in current -git.
> >>
> > 
> > I know, but syzbot needed to be told what commit fixes the bug.
> > See https://github.com/google/syzkaller/blob/master/docs/syzbot.md
> 
> Ah gotcha.

"@syzbot fix: bla" syntax would have been intuitive.

-Mike


Re: [PATCH BUGFIX/IMPROVEMENT V2 0/3] three bfq fixes restoring service guarantees with random sync writes in bg

2017-08-31 Thread Mike Galbraith
On Thu, 2017-08-31 at 15:42 +0100, Mel Gorman wrote:
> On Thu, Aug 31, 2017 at 08:46:28AM +0200, Paolo Valente wrote:
> > [SECOND TAKE, with just the name of one of the tester fixed]
> > 
> > Hi,
> > while testing the read-write unfairness issues reported by Mel, I
> > found BFQ failing to guarantee good responsiveness against heavy
> > random sync writes in the background, i.e., multiple writers doing
> > random writes and systematic fdatasync [1]. The failure was caused by
> > three related bugs, because of which BFQ failed to guarantee to
> > high-weight processes the expected fraction of the throughput.
> > 
> 
> Queued on top of Ming's most recent series even though that's still a work
> in progress. I should know in a few days how things stand.

It seems to have cured an interactivity issue I regularly meet during
kbuild final link/depmod phase of fat kernel kbuild, especially bad
with evolution mail usage during that on spinning rust.  Can't really
say for sure given this is not based on measurement.

-Mike 


Re: blk-mq breaks suspend even with runtime PM patch

2017-08-08 Thread Mike Galbraith
On Tue, 2017-08-08 at 09:44 -0700, Greg KH wrote:
> 
> Should these go back farther than 4.12?  Looks like they apply cleanly
> to 4.9, didn't look older than that...

I met prerequisites at 4.11, but I wasn't patching anything remotely
resembling virgin source.

-Mike


Re: blk-mq breaks suspend even with runtime PM patch

2017-07-29 Thread Mike Galbraith
On Sat, 2017-07-29 at 17:27 +0200, Oleksandr Natalenko wrote:
> Hello Jens, Christoph.
> 
> Unfortunately, even with "block: disable runtime-pm for blk-mq" patch applied 
> blk-mq breaks suspend to RAM for me. It is reproducible on my laptop as well 
> as in a VM.
> 
> I use complex disk layout involving MD, LUKS and LVM, and managed to get 
> these 
> warnings from VM via serial console when suspend fails:
> 
> ===
> [  245.516573] INFO: task kworker/0:1:49 blocked for more than 120 seconds.
> [  245.520025]   Not tainted 4.12.0-pf4 #1

FWIW, first thing I'd do is update that 4.12.0 to 4.12.4, and see if
stable fixed it.  If not, I'd find these two commits irresistible.

5f042e7cbd9eb blk-mq: Include all present CPUs in the default queue mapping
4b855ad37194f blk-mq: Create hctx for each present CPU

'course applying random upstream bits does come with some risk, trying
a kernel already containing them has less "entertainment" potential. 

-Mike


Re: [PATCH] Fix loop device flush before configure v2

2017-06-07 Thread Mike Galbraith
On Thu, 2017-06-08 at 10:17 +0800, James Wang wrote:
> This condition check was exist at before commit b5dd2f6047ca ("block: loop:
> improve performance via blk-mq") When add MQ support to loop device, it be
> removed because the member of '->lo_thread' be removed. And then upstream
> add '->worker_task', I think they forget add it to here.
> 
> When I install SLES-12 product is base on 4.4 kernel I found installer will
> hang +60 second at scan disks. and I found LVM tools would take this action.
> finally I found this problem is more obvious on AMD platform. This problem
> will impact all scenarios that scan loop devcie.
> 
> When the loop device didn't configure backing file or Request Queue, we
> shouldn't to cost a lot of time to flush it.

The changelog sounds odd to me, perhaps reword/condense a bit?...

While installing SLES-12 (based on v4.4), I found that the installer
will stall for 60+ seconds during LVM disk scan.  The root cause was
determined to be the removal of a bound device check in loop_flush()
by commit b5dd2f6047ca ("block: loop: improve performance via blk-mq").

Restoring this check, examining ->lo_state as set by loop_set_fd()
eliminates the bad behavior.

Test method:
modprobe loop max_loop=64
dd if=/dev/zero of=disk bs=512 count=200K
for((i=0;i<4;i++))do losetup -f disk; done
mkfs.ext4 -F /dev/loop0
for((i=0;i<4;i++))do mkdir t$i; mount /dev/loop$i t$i;done
for f in `ls /dev/loop[0-9]*|sort`; do \
echo $f; dd if=$f of=/dev/null  bs=512 count=1; \
done

Test output:   stock        patched
/dev/loop0     8.1217e-05   8.3842e-05
/dev/loop1     6.1114e-05   0.000147979
/dev/loop10    0.414701     0.000116564
/dev/loop11    0.7474       6.7942e-05
/dev/loop12    0.747986     8.9082e-05
/dev/loop13    0.746532     7.4799e-05
/dev/loop14    0.480041     9.3926e-05
/dev/loop15    1.26453      7.2522e-05

Note that from loop10 onward, the device is not mounted, yet the
stock kernel consumes several orders of magnitude more wall time
than it does for a mounted device.

Reviewed-by: Hannes Reinecke 
Signed-off-by: James Wang 
Fixes: b5dd2f6047ca ("block: loop: improve performance via blk-mq")
---
>  drivers/block/loop.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/block/loop.c b/drivers/block/loop.c
> index 48f6fa6f810e..2e5b8538760c 100644
> --- a/drivers/block/loop.c
> +++ b/drivers/block/loop.c
> @@ -625,6 +625,9 @@ static int loop_switch(struct loop_device *lo, struct file *file)
>   */
>  static int loop_flush(struct loop_device *lo)
>  {
> + /* loop not yet configured, no running thread, nothing to flush */
> + if (lo->lo_state != Lo_bound)
> + return 0;
>   return loop_switch(lo, NULL);
>  }
>