Re: bug in tag handling in blk-mq?
On Wed, 2018-05-09 at 13:50 -0600, Jens Axboe wrote:
> On 5/9/18 12:31 PM, Mike Galbraith wrote:
> > On Wed, 2018-05-09 at 11:01 -0600, Jens Axboe wrote:
> >> On 5/9/18 10:57 AM, Mike Galbraith wrote:
> >>
> >>>>> Confirmed. Impressive high speed bug stomping.
> >>>>
> >>>> Well, that's good news. Can I get you to try this patch?
> >>>
> >>> Sure thing. The original hang (minus provocation patch) being
> >>> annoyingly non-deterministic, this will (hopefully) take a while.
> >>
> >> You can verify with the provocation patch as well first, if you wish.
> >
> > Done, box still seems fine.
>
> Omar had some (valid) complaints, can you try this one as well? You
> can also find it as a series here:
>
> http://git.kernel.dk/cgit/linux-block/log/?h=bfq-cleanups
>
> I'll repost the series shortly, need to check if it actually builds and
> boots.

I applied the series (+ provocation), all is well.

	-Mike
Re: bug in tag handling in blk-mq?
On Wed, 2018-05-09 at 11:01 -0600, Jens Axboe wrote:
> On 5/9/18 10:57 AM, Mike Galbraith wrote:
>
> >>> Confirmed. Impressive high speed bug stomping.
> >>
> >> Well, that's good news. Can I get you to try this patch?
> >
> > Sure thing. The original hang (minus provocation patch) being
> > annoyingly non-deterministic, this will (hopefully) take a while.
>
> You can verify with the provocation patch as well first, if you wish.

Done, box still seems fine.

	-Mike
Re: bug in tag handling in blk-mq?
On Wed, 2018-05-09 at 09:18 -0600, Jens Axboe wrote:
> On 5/8/18 10:11 PM, Mike Galbraith wrote:
> > On Tue, 2018-05-08 at 19:09 -0600, Jens Axboe wrote:
> >>
> >> Alright, I managed to reproduce it. What I think is happening is that
> >> BFQ is limiting the inflight case to something less than the wake
> >> batch for sbitmap, which can lead to stalls. I don't have time to test
> >> this tonight, but perhaps you can give it a go when you are back at it.
> >> If not, I'll try tomorrow morning.
> >>
> >> If this is the issue, I can turn it into a real patch. This is just to
> >> confirm that the issue goes away with the below.
> >
> > Confirmed. Impressive high speed bug stomping.
>
> Well, that's good news. Can I get you to try this patch?

Sure thing. The original hang (minus provocation patch) being
annoyingly non-deterministic, this will (hopefully) take a while.

	-Mike
Re: bug in tag handling in blk-mq?
On Tue, 2018-05-08 at 14:37 -0600, Jens Axboe wrote:
>
> - sdd has nothing pending, yet has 6 active waitqueues.

sdd is where ccache storage lives, and that should have been the only
activity on that drive, as I built source on sdb, and was doing nothing
else that utilizes sdd.

	-Mike
Re: bug in tag handling in blk-mq?
On Tue, 2018-05-08 at 19:09 -0600, Jens Axboe wrote:
>
> Alright, I managed to reproduce it. What I think is happening is that
> BFQ is limiting the inflight case to something less than the wake
> batch for sbitmap, which can lead to stalls. I don't have time to test
> this tonight, but perhaps you can give it a go when you are back at it.
> If not, I'll try tomorrow morning.
>
> If this is the issue, I can turn it into a real patch. This is just to
> confirm that the issue goes away with the below.

Confirmed. Impressive high speed bug stomping.

> diff --git a/lib/sbitmap.c b/lib/sbitmap.c
> index e6a9c06ec70c..94ced15b6428 100644
> --- a/lib/sbitmap.c
> +++ b/lib/sbitmap.c
> @@ -272,6 +272,7 @@ EXPORT_SYMBOL_GPL(sbitmap_bitmap_show);
>
>  static unsigned int sbq_calc_wake_batch(unsigned int depth)
>  {
> +#if 0
>  	unsigned int wake_batch;
>
>  	/*
> @@ -284,6 +285,9 @@ static unsigned int sbq_calc_wake_batch(unsigned int depth)
>  	wake_batch = max(1U, depth / SBQ_WAIT_QUEUES);
>
>  	return wake_batch;
> +#else
> +	return 1;
> +#endif
>  }
>
>  int sbitmap_queue_init_node(struct sbitmap_queue *sbq, unsigned int depth,
>
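Jens's diagnosis above can be illustrated with a toy model (plain Python, not the kernel code; the batch formula mirrors sbq_calc_wake_batch() with SBQ_WAIT_QUEUES == 8, everything else is a deliberate simplification):

```python
# Toy model of sbitmap's batched wakeups: sleeping tag waiters are only
# woken after `wake_batch` tags have been freed. If a scheduler caps the
# number of in-flight tags (the shallow depth) below the batch, all
# in-flight requests can complete without ever accumulating enough frees
# to trigger a wakeup -- every waiter then sleeps forever.

def stalls(depth, shallow_depth, waiters):
    wake_batch = max(1, depth // 8)   # mirrors max(1U, depth / SBQ_WAIT_QUEUES)
    in_flight = min(shallow_depth, waiters)
    waiters -= in_flight
    wait_cnt = wake_batch
    while waiters or in_flight:
        if in_flight == 0:
            return True               # no frees left, nobody woken: stall
        in_flight -= 1                # a request completes, freeing its tag
        wait_cnt -= 1
        if wait_cnt == 0:             # batch reached: wake a batch of waiters
            wake = min(wake_batch, waiters)
            waiters -= wake
            in_flight += wake
            wait_cnt = wake_batch
    return False

print(stalls(depth=64, shallow_depth=1, waiters=16))   # True: 1 < wake batch of 8
print(stalls(depth=64, shallow_depth=8, waiters=16))   # False: batch is reachable
```

This is why the `return 1;` hack above unbreaks things: with a wake batch of one, every freed tag wakes a waiter, no matter how small the shallow depth.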
Re: bug in tag handling in blk-mq?
On Tue, 2018-05-08 at 08:55 -0600, Jens Axboe wrote:
>
> All the block debug files are empty...

Sigh. Take 2, this time cat debug files, having turned block tracing
off before doing anything else (so trace bits in dmesg.txt should end
AT the stall).

	-Mike

Attachments:
  dmesg.xz (application/xz)
  dmesg.txt.xz (application/xz)
  block_debug.xz (application/xz)
Re: bug in tag handling in blk-mq?
On Tue, 2018-05-08 at 06:51 +0200, Mike Galbraith wrote:
>
> I'm deadlined ATM, but will get to it.

(Bah, even a zombie can type ccache -C; make -j8 and stare...)

kbuild again hung on the first go (yay), and post hang data written to
sdd1 survived (kernel source lives in sdb3). Full ftrace buffer (echo 1
> events/block/enable) available off list if desired. dmesg.txt.xz is
dmesg from the post hang crashdump, attached because it contains the
tail of the trace buffer, so _might_ be useful.

homer:~ # df|grep sd
/dev/sdb3      959074776 785342824 172741072  82% /
/dev/sdc3      959074776 455464912 502618984  48% /backup
/dev/sdb1         159564      7980    151584   6% /boot/efi
/dev/sdd1      961301832 393334868 519112540  44% /abuild

Kernel is virgin modulo these...

patches/remove_irritating_plus.diff
patches/add-scm-version-to-EXTRAVERSION.patch
patches/block-bfq:-postpone-rq-preparation-to-insert-or-merge.patch
patches/block-bfq:-test.patch (hang provocation hack from Paolo)

	-Mike

Attachments:
  block_debug.tar.xz (application/x-xz-compressed-tar)
  dmesg.xz (application/xz)
  dmesg.txt.xz (application/xz)
Re: bug in tag handling in blk-mq?
On Mon, 2018-05-07 at 20:02 +0200, Paolo Valente wrote:
>
> > > Is there a reproducer?

Just building fat config kernels works for me. It was highly
non-deterministic, but reproduced quickly twice in a row with Paolo's
hack.

> Ok Mike, I guess it's your turn now, for at least a stack trace.

Sure. I'm deadlined ATM, but will get to it.

	-Mike
Re: [PATCH BUGFIX] block, bfq: postpone rq preparation to insert or merge
On Mon, 2018-05-07 at 11:27 +0200, Paolo Valente wrote:
>
> > Where is the bug?

Hm, seems potent pain-killers and C don't mix all that well.
Re: [PATCH BUGFIX] block, bfq: postpone rq preparation to insert or merge
On Sun, 2018-05-06 at 09:42 +0200, Paolo Valente wrote:
>
> diff --git a/block/bfq-mq-iosched.c b/block/bfq-mq-iosched.c
> index 118f319af7c0..6662efe29b69 100644
> --- a/block/bfq-mq-iosched.c
> +++ b/block/bfq-mq-iosched.c
> @@ -525,8 +525,13 @@ static void bfq_limit_depth(unsigned int op, struct blk_mq_alloc_data *data)
>  	if (unlikely(bfqd->sb_shift != bt->sb.shift))
>  		bfq_update_depths(bfqd, bt);
>
> +#if 0
>  	data->shallow_depth =
>  		bfqd->word_depths[!!bfqd->wr_busy_queues][op_is_sync(op)];
                                                          ^
Q: why doesn't the top of this function look like so?

---
 block/bfq-iosched.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -539,7 +539,7 @@ static void bfq_limit_depth(unsigned int
 	struct bfq_data *bfqd = data->q->elevator->elevator_data;
 	struct sbitmap_queue *bt;
 
-	if (op_is_sync(op) && !op_is_write(op))
+	if (!op_is_write(op))
 		return;
 
 	if (data->flags & BLK_MQ_REQ_RESERVED) {

It looks a bit odd that these elements exist...

+	/*
+	 * no more than 75% of tags for sync writes (25% extra tags
+	 * w.r.t. async I/O, to prevent async I/O from starving sync
+	 * writes)
+	 */
+	bfqd->word_depths[0][1] = max(((1U << bt->sb.shift) * 3) >> 2, 1U);
+	/* no more than ~37% of tags for sync writes (~20% extra tags) */
+	bfqd->word_depths[1][1] = max(((1U << bt->sb.shift) * 6) >> 4, 1U);

...yet we index via and log a guaranteed zero.

	-Mike
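For reference, the limits being discussed work out as follows (a userspace sketch of the bfq_update_depths() arithmetic; only the two sync-write rows are quoted above, the two async rows are filled in as an assumption from the upstream bfq source):

```python
# Toy recomputation of BFQ's word_depths[][] tag limits. sb_shift is
# sbitmap's per-word shift, so a word holds 1 << sb_shift tags; the first
# index is !!wr_busy_queues (any weight-raised queue busy), the second
# is op_is_sync(op).

def word_depths(sb_shift):
    depth = 1 << sb_shift
    return [
        # [0][*]: no bfq_queue is weight-raised
        [max(depth >> 1, 1),         # [0][0] async: at most 50% of tags (assumed row)
         max((depth * 3) >> 2, 1)],  # [0][1] sync writes: at most 75%
        # [1][*]: some bfq_queue is weight-raised
        [max((depth * 3) >> 4, 1),   # [1][0] async: ~18% (assumed row)
         max((depth * 6) >> 4, 1)],  # [1][1] sync writes: ~37%
    ]

print(word_depths(6))  # 64-tag words -> [[32, 48], [12, 24]]
```

Mike's point stands out in the numbers: the `[*][1]` columns only matter if a sync write can actually reach the indexing with `op_is_sync(op)` evaluating to 1.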
Re: [PATCH BUGFIX] block, bfq: postpone rq preparation to insert or merge
On Mon, 2018-05-07 at 04:43 +0200, Mike Galbraith wrote:
> On Sun, 2018-05-06 at 09:42 +0200, Paolo Valente wrote:
> >
> > I've attached a compressed patch (to avoid possible corruption from my
> > mailer). I'm little confident, but no pain, no gain, right?
> >
> > If possible, apply this patch on top of the fix I proposed in this
> > thread, just to eliminate possible further noise. Finally, the
> > patch content follows.
> >
> > Hoping for a stroke of luck,
>
> FWIW, box didn't survive the first full build of the morning.

Nor the second.

	-Mike
Re: [PATCH BUGFIX] block, bfq: postpone rq preparation to insert or merge
On Sun, 2018-05-06 at 09:42 +0200, Paolo Valente wrote:
>
> I've attached a compressed patch (to avoid possible corruption from my
> mailer). I'm little confident, but no pain, no gain, right?
>
> If possible, apply this patch on top of the fix I proposed in this
> thread, just to eliminate possible further noise. Finally, the
> patch content follows.
>
> Hoping for a stroke of luck,

FWIW, box didn't survive the first full build of the morning.

> Paolo
>
> diff --git a/block/bfq-mq-iosched.c b/block/bfq-mq-iosched.c
> index 118f319af7c0..6662efe29b69 100644
> --- a/block/bfq-mq-iosched.c
> +++ b/block/bfq-mq-iosched.c

That doesn't exist in master, so I applied it like so.

---
 block/bfq-iosched.c |    4 ++++
 1 file changed, 4 insertions(+)

--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -554,8 +554,12 @@ static void bfq_limit_depth(unsigned int
 	if (unlikely(bfqd->sb_shift != bt->sb.shift))
 		bfq_update_depths(bfqd, bt);
 
+#if 0
 	data->shallow_depth =
 		bfqd->word_depths[!!bfqd->wr_busy_queues][op_is_sync(op)];
+#else
+	data->shallow_depth = 1;
+#endif
 
 	bfq_log(bfqd, "[%s] wr_busy %d sync %d depth %u", __func__,
 			bfqd->wr_busy_queues, op_is_sync(op),
Re: [PATCH BUGFIX] block, bfq: postpone rq preparation to insert or merge
On Sat, 2018-05-05 at 12:39 +0200, Paolo Valente wrote:
>
> BTW, if you didn't run out of patience with this permanent issue yet,
> I was thinking of two or three changes to retry to trigger your failure
> reliably.

Sure, fire away, I'll happily give the annoying little bugger
opportunities to show its tender belly.

	-Mike
Re: [PATCH BUGFIX] block, bfq: postpone rq preparation to insert or merge
On Fri, 2018-05-04 at 21:46 +0200, Mike Galbraith wrote:
>
> Tentatively, I suspect you've just fixed the nasty stalls I reported a
> while back.

Oh well, so much for optimism. It took a lot, but just hung.
Re: [PATCH BUGFIX] block, bfq: postpone rq preparation to insert or merge
Tentatively, I suspect you've just fixed the nasty stalls I reported a while back. Not a hint of stall as yet (should have shown itself by now), spinning rust buckets are being all they can be, box feels good. Later mq-deadline (I hope to eventually forget the module dependency eternities we've spent together;), welcome back bfq (maybe.. I hope). -Mike
Re: [PATCH BUGFIX V3] block, bfq: add requeue-request hook
On Fri, 2018-02-09 at 14:21 +0100, Oleksandr Natalenko wrote:
>
> In addition to this I think it should be worth considering CC'ing Greg
> to pull this fix into 4.15 stable tree.

This isn't one he can cherry-pick; some munging is required, in which
case he usually wants a properly tested backport.

	-Mike
Re: [PATCH BUGFIX 1/1] block, bfq: add requeue-request hook
On Wed, 2018-02-07 at 12:12 +0100, Paolo Valente wrote:
>
> Just to be certain, before submitting a new patch: you changed *only*
> the BUG_ON at line 4742, on top of my instrumentation patch.

Nah, I completely rewrote it with only a little help from an ouija
board to compensate for missing (all) knowledge wrt BFQ internals.

(iow yeah, I did exactly and only what I was asked to do:)
Re: [PATCH BUGFIX 1/1] block, bfq: add requeue-request hook
On Wed, 2018-02-07 at 11:27 +0100, Paolo Valente wrote:
>
> 2. Could you please turn that BUG_ON into:
>    if (!(rq->rq_flags & RQF_ELVPRIV))
>        return;
>    and see what happens?

That seems to make it forget how to make boom.

	-Mike
Re: [PATCH BUGFIX 1/1] block, bfq: add requeue-request hook
On Wed, 2018-02-07 at 11:27 +0100, Paolo Valente wrote:
>
> 1. Could you paste a stack trace for this OOPS, just to understand how we
> get there?

[  442.421058] kernel BUG at block/bfq-iosched.c:4742!
[  442.421762] invalid opcode: [#1] SMP PTI
[  442.422436] Dumping ftrace buffer:
[  442.423116]    (ftrace buffer empty)
[  442.423785] Modules linked in: fuse(E) ebtable_filter(E) ebtables(E) af_packet(E) bridge(E) stp(E) llc(E) iscsi_ibft(E) iscsi_boot_sysfs(E) nf_conntrack_ipv6(E) nf_defrag_ipv6(E) ipt_REJECT(E) xt_tcpudp(E) iptable_filter(E) ip6table_>
[  442.426685] pcbc(E) iTCO_wdt(E) aesni_intel(E) aes_x86_64(E) iTCO_vendor_support(E) crypto_simd(E) glue_helper(E) mei_me(E) i2c_i801(E) r8169(E) cryptd(E) lpc_ich(E) mfd_core(E) mii(E) mei(E) soundcore(E) pcspkr(E) shpchp(E) thermal>
[  442.429634] dm_multipath(E) dm_mod(E) scsi_dh_rdac(E) scsi_dh_emc(E) scsi_dh_alua(E) scsi_mod(E) efivarfs(E) autofs4(E)
[  442.430166] CPU: 2 PID: 489 Comm: kworker/2:1H Tainted: G E 4.15.0.ga2e5790-master #612
[  442.430675] Hardware name: MEDION MS-7848/MS-7848, BIOS M7848W08.20C 09/23/2013
[  442.431195] Workqueue: kblockd blk_mq_requeue_work
[  442.431706] RIP: 0010:bfq_only_requeue_request+0xd1/0xe0
[  442.432221] RSP: 0018:8803f7b8fcc0 EFLAGS: 00010246
[  442.432733] RAX: 2090 RBX: 8803dbda2580 RCX:
[  442.433250] RDX: 8803fa3c8000 RSI: 0004 RDI: 8803dbda2580
[  442.433778] RBP: 8803f9ec85d0 R08: 8803fa3c81c0 R09:
[  442.434296] R10: 8803fa3c81d0 R11: 1000 R12: 8803dbda2600
[  442.434815] R13: 000d R14: R15: 8803f7b8fd88
[  442.435334] FS: () GS:88041ec8() knlGS:
[  442.435863] CS: 0010 DS: ES: CR0: 80050033
[  442.436381] CR2: 7f00e001e198 CR3: 01e0a002 CR4: 001606e0
[  442.436909] Call Trace:
[  442.437445]  __blk_mq_requeue_request+0x8f/0x120
[  442.437980]  blk_mq_dispatch_rq_list+0x342/0x550
[  442.438506]  ? kyber_dispatch_request+0xd0/0xd0
[  442.439041]  blk_mq_sched_dispatch_requests+0xf7/0x180
[  442.439568]  __blk_mq_run_hw_queue+0x58/0xd0
[  442.440103]  __blk_mq_delay_run_hw_queue+0x99/0xa0
[  442.440628]  blk_mq_run_hw_queue+0x54/0xf0
[  442.441157]  blk_mq_run_hw_queues+0x4b/0x60
[  442.441822]  blk_mq_requeue_work+0x13a/0x150
[  442.442518]  process_one_work+0x147/0x350
[  442.443205]  worker_thread+0x47/0x3e0
[  442.443887]  kthread+0xf8/0x130
[  442.444579]  ? rescuer_thread+0x360/0x360
[  442.445264]  ? kthread_stop+0x120/0x120
[  442.445965]  ? do_syscall_64+0x75/0x1a0
[  442.446651]  ? SyS_exit_group+0x10/0x10
[  442.447340]  ret_from_fork+0x35/0x40
[  442.448023] Code: ff 4c 89 f6 4c 89 ef e8 be ec 2e 00 48 c7 83 80 00 00 00 00 00 00 00 48 c7 83 88 00 00 00 00 00 00 00 5b 5d 41 5c 41 5d 41 5e c3 <0f> 0b 0f 0b 0f 0b 0f 0b 0f 1f 80 00 00 00 00 0f 1f 44 00 00 41
[  442.448668] RIP: bfq_only_requeue_request+0xd1/0xe0 RSP: 8803f7b8fcc0
Re: [PATCH BUGFIX 1/1] block, bfq: add requeue-request hook
On Wed, 2018-02-07 at 10:45 +0100, Paolo Valente wrote:
>
> > Il giorno 07 feb 2018, alle ore 10:23, Mike Galbraith <efa...@gmx.de> ha scritto:
> >
> > On Wed, 2018-02-07 at 10:08 +0100, Paolo Valente wrote:
> >>
> >> The first piece of information I need is whether this failure happens
> >> even without "BFQ hierarchical scheduling support".
> >
> > I presume you mean BFQ_GROUP_IOSCHED, which I do not have enabled.
> >
>
> Great (so to speak), this saves us one step.
>
> So, here's my next request for help: please apply the attached patch
> (compressed to preserve it from my email client) and retry. It adds
> several anomaly checks. I hope I have not added any false-positive
> check.

kernel BUG at block/bfq-iosched.c:4742!

4742	BUG_ON(!(rq->rq_flags & RQF_ELVPRIV));
Re: [PATCH BUGFIX 1/1] block, bfq: add requeue-request hook
On Wed, 2018-02-07 at 10:08 +0100, Paolo Valente wrote:
>
> The first piece of information I need is whether this failure happens
> even without "BFQ hierarchical scheduling support".

I presume you mean BFQ_GROUP_IOSCHED, which I do not have enabled.

	-Mike
Re: [PATCH BUGFIX 1/1] block, bfq: add requeue-request hook
On Tue, 2018-02-06 at 13:43 +0100, Holger Hoffstätte wrote:
>
> A much more interesting question to me is why there is kyber in the middle. :)

Yeah, given that per sysfs I have zero devices using kyber.

	-Mike
Re: [PATCH BUGFIX 1/1] block, bfq: add requeue-request hook
On Tue, 2018-02-06 at 13:26 +0100, Paolo Valente wrote:
>
> ok, right in the middle of bfq this time ... Was this the first OOPS in your
> kernel log?

Yeah.
Re: [PATCH BUGFIX 1/1] block, bfq: add requeue-request hook
On Tue, 2018-02-06 at 13:16 +0100, Oleksandr Natalenko wrote:
> Hi.
>
> 06.02.2018 12:57, Mike Galbraith wrote:
> > Not me. Box seems to be fairly sure that it is bfq. Twice again box
> > went belly up on me in fairly short order with bfq, but seemed fine
> > with deadline. I'm currently running deadline again, and box again
> > seems solid, though I won't say _is_ solid until it's been happily
> > trundling along with deadline for quite a bit longer.
>
> Sorry for the noise, but just to make it clear, are we talking about
> "deadline" or "mq-deadline" now?

mq-deadline.
Re: [PATCH BUGFIX 1/1] block, bfq: add requeue-request hook
On Tue, 2018-02-06 at 10:38 +0100, Paolo Valente wrote:
>
> Hi Mike,
> as you can imagine, I didn't get any failure in my pre-submission
> tests on this patch. In addition, it is not that easy to link this
> patch, which just adds some internal bfq housekeeping in case of a
> requeue, with a corruption of external lists for general I/O
> management.
>
> In this respect, as Oleksandr's comments point out, by switching from
> cfq to bfq, you switch between much more than two schedulers. Anyway,
> who knows ...

Not me. Box seems to be fairly sure that it is bfq. Twice again box
went belly up on me in fairly short order with bfq, but seemed fine
with deadline. I'm currently running deadline again, and box again
seems solid, though I won't say _is_ solid until it's been happily
trundling along with deadline for quite a bit longer.

I was ssh'd in during the last episode, got this out. I should be
getting crash dumps, but it seems kdump is only working intermittently
atm. I did get one earlier, but 3 of 4 times not. Hohum.

[  484.179292] BUG: unable to handle kernel paging request at a0817000
[  484.179436] IP: __trace_note_message+0x1f/0xd0
[  484.179576] PGD 1e0c067 P4D 1e0c067 PUD 1e0d063 PMD 3faff2067 PTE 0
[  484.179719] Oops: [#1] SMP PTI
[  484.179861] Dumping ftrace buffer:
[  484.180011]    (ftrace buffer empty)
[  484.180138] Modules linked in: fuse(E) ebtable_filter(E) ebtables(E) af_packet(E) bridge(E) stp(E) llc(E) iscsi_ibft(E) iscsi_boot_sysfs(E) nf_conntrack_ipv6(E) nf_defrag_ipv6(E) ipt_REJECT(E) xt_tcpudp(E) iptable_filter(E) ip6table_mangle(E) nf_conntrack_netbios_ns(E) nf_conntrack_broadcast(E) nf_conntrack_ipv4(E) nf_defrag_ipv4(E) ip_tables(E) xt_conntrack(E) nf_conntrack(E) ip6table_filter(E) ip6_tables(E) x_tables(E) nls_iso8859_1(E) nls_cp437(E) intel_rapl(E) x86_pkg_temp_thermal(E) intel_powerclamp(E) snd_hda_codec_hdmi(E) coretemp(E) kvm_intel(E) snd_hda_codec_realtek(E) kvm(E) snd_hda_codec_generic(E) snd_hda_intel(E) snd_hda_codec(E) sr_mod(E) snd_hwdep(E) cdrom(E) joydev(E) snd_hda_core(E) snd_pcm(E) snd_timer(E) irqbypass(E) snd(E) crct10dif_pclmul(E) crc32_pclmul(E) crc32c_intel(E) r8169(E)
[  484.180740] iTCO_wdt(E) ghash_clmulni_intel(E) mii(E) iTCO_vendor_support(E) pcbc(E) aesni_intel(E) soundcore(E) aes_x86_64(E) shpchp(E) crypto_simd(E) lpc_ich(E) glue_helper(E) i2c_i801(E) mei_me(E) mfd_core(E) mei(E) cryptd(E) intel_smartconnect(E) pcspkr(E) fan(E) thermal(E) nfsd(E) auth_rpcgss(E) nfs_acl(E) lockd(E) grace(E) sunrpc(E) hid_logitech_hidpp(E) hid_logitech_dj(E) uas(E) usb_storage(E) hid_generic(E) usbhid(E) nouveau(E) wmi(E) i2c_algo_bit(E) drm_kms_helper(E) syscopyarea(E) sysfillrect(E) sysimgblt(E) fb_sys_fops(E) ahci(E) xhci_pci(E) ehci_pci(E) libahci(E) ttm(E) ehci_hcd(E) xhci_hcd(E) libata(E) drm(E) usbcore(E) video(E) button(E) sd_mod(E) vfat(E) fat(E) virtio_blk(E) virtio_mmio(E) virtio_pci(E) virtio_ring(E) virtio(E) ext4(E) crc16(E) mbcache(E) jbd2(E) loop(E) sg(E) dm_multipath(E)
[  484.181421] dm_mod(E) scsi_dh_rdac(E) scsi_dh_emc(E) scsi_dh_alua(E) scsi_mod(E) efivarfs(E) autofs4(E)
[  484.181583] CPU: 3 PID: 500 Comm: kworker/3:1H Tainted: G E 4.15.0.ge237f98-master #609
[  484.181746] Hardware name: MEDION MS-7848/MS-7848, BIOS M7848W08.20C 09/23/2013
[  484.181910] Workqueue: kblockd blk_mq_requeue_work
[  484.182076] RIP: 0010:__trace_note_message+0x1f/0xd0
[  484.182250] RSP: 0018:8803f45bfc20 EFLAGS: 00010282
[  484.182436] RAX: RBX: a0817000 RCX: 8803
[  484.182622] RDX: 81bf514d RSI: RDI: a0817000
[  484.182810] RBP: 8803f45bfc80 R08: 0041 R09: 8803f69cc5d0
[  484.182998] R10: 8803f80b47d0 R11: 1000 R12: 8803f45e8000
[  484.183185] R13: 000d R14: R15: 8803fba112c0
[  484.183372] FS: () GS:88041ecc() knlGS:
[  484.183561] CS: 0010 DS: ES: CR0: 80050033
[  484.183747] CR2: a0817000 CR3: 01e0a006 CR4: 001606e0
[  484.183934] Call Trace:
[  484.184122]  bfq_put_queue+0xd3/0xe0
[  484.184305]  bfq_finish_requeue_request+0x72/0x350
[  484.184493]  __blk_mq_requeue_request+0x8f/0x120
[  484.184678]  blk_mq_dispatch_rq_list+0x342/0x550
[  484.184866]  ? kyber_dispatch_request+0xd0/0xd0
[  484.185053]  blk_mq_sched_dispatch_requests+0xf7/0x180
[  484.185238]  __blk_mq_run_hw_queue+0x58/0xd0
[  484.185429]  __blk_mq_delay_run_hw_queue+0x99/0xa0
[  484.185614]  blk_mq_run_hw_queue+0x54/0xf0
[  484.185805]  blk_mq_run_hw_queues+0x4b/0x60
[  484.185994]  blk_mq_requeue_work+0x13a/0x150
[  484.186192]  process_one_work+0x147/0x350
[  484.186383]  worker_thread+0x47/0x3e0
[  484.186572]  kthread+0xf8/0x130
[  484.186760]  ? rescuer_thread+0x360/0x360
[  484.186948]  ? kthread_stop+0x120/0x120
[  484.187137]  ret_from_fork+0x35/0x40
[  484.187321] Code: ff 48 89 44 24 10
Re: [PATCH BUGFIX 1/1] block, bfq: add requeue-request hook
On Tue, 2018-02-06 at 09:37 +0100, Oleksandr Natalenko wrote:
> Hi.
>
> 06.02.2018 08:56, Mike Galbraith wrote:
> > I was doing kbuilds, and it blew up on me twice. Switching back to cfq
> > seemed to confirm it was indeed the patch causing trouble, but that's
> > by no means a certainty.
>
> Just to note, I was using v4.15.1, not the latest git HEAD. Are you able
> to reproduce it on the stable kernel?

I didn't even try to wedge it into 4.15.1, tested it as posted.

> Also, assuming this issue might be
> unrelated to the BFQ itself, did you manage to reproduce the trouble
> with another blk-mq scheduler (not CFQ, but mq-deadline/Kyber)?

I can give another scheduler a go this afternoon.

	-Mike
Re: [PATCH BUGFIX 1/1] block, bfq: add requeue-request hook
On Tue, 2018-02-06 at 08:44 +0100, Oleksandr Natalenko wrote:
> Hi, Paolo.
>
> I can confirm that this patch fixes cfdisk hang for me. I've also tried
> to trigger the issue Mike has encountered, but with no luck (maybe, I
> wasn't insistent enough, just was doing dd on usb-storage device in the
> VM).

I was doing kbuilds, and it blew up on me twice. Switching back to cfq
seemed to confirm it was indeed the patch causing trouble, but that's
by no means a certainty.

	-Mike
Re: [PATCH BUGFIX 1/1] block, bfq: add requeue-request hook
Hi Paolo,

I applied this to master.today, flipped udev back to bfq and took it
for a spin. Unfortunately, box fairly quickly went boom under load.

[  454.739975] [ cut here ]
[  454.739979] list_add corruption. prev->next should be next (5f99a42a), but was (null). (prev=fc569ec9).
[  454.739989] WARNING: CPU: 3 PID: 0 at lib/list_debug.c:28 __list_add_valid+0x6a/0x70
[  454.739990] Modules linked in: fuse(E) ebtable_filter(E) ebtables(E) af_packet(E) bridge(E) stp(E) llc(E) iscsi_ibft(E) iscsi_boot_sysfs(E) nf_conntrack_ipv6(E) nf_defrag_ipv6(E) ipt_REJECT(E) xt_tcpudp(E) iptable_filter(E) ip6table_mangle(E) nf_conntrack_netbios_ns(E) nf_conntrack_broadcast(E) nf_conntrack_ipv4(E) nf_defrag_ipv4(E) ip_tables(E) xt_conntrack(E) nf_conntrack(E) ip6table_filter(E) ip6_tables(E) x_tables(E) nls_iso8859_1(E) nls_cp437(E) intel_rapl(E) x86_pkg_temp_thermal(E) intel_powerclamp(E) snd_hda_codec_hdmi(E) coretemp(E) snd_hda_codec_realtek(E) snd_hda_codec_generic(E) kvm_intel(E) kvm(E) snd_hda_intel(E) snd_hda_codec(E) snd_hwdep(E) snd_hda_core(E) snd_pcm(E) irqbypass(E) snd_timer(E) joydev(E) crct10dif_pclmul(E) snd(E) r8169(E) crc32_pclmul(E) mii(E) mei_me(E) soundcore(E)
[  454.740011] crc32c_intel(E) iTCO_wdt(E) ghash_clmulni_intel(E) iTCO_vendor_support(E) pcbc(E) mei(E) lpc_ich(E) aesni_intel(E) i2c_i801(E) mfd_core(E) aes_x86_64(E) shpchp(E) intel_smartconnect(E) crypto_simd(E) glue_helper(E) cryptd(E) pcspkr(E) fan(E) thermal(E) nfsd(E) auth_rpcgss(E) nfs_acl(E) lockd(E) grace(E) sunrpc(E) sr_mod(E) cdrom(E) hid_logitech_hidpp(E) hid_logitech_dj(E) uas(E) usb_storage(E) hid_generic(E) usbhid(E) nouveau(E) wmi(E) i2c_algo_bit(E) drm_kms_helper(E) syscopyarea(E) sysfillrect(E) sysimgblt(E) fb_sys_fops(E) ahci(E) xhci_pci(E) ttm(E) libahci(E) ehci_pci(E) xhci_hcd(E) ehci_hcd(E) libata(E) drm(E) usbcore(E) video(E) button(E) sd_mod(E) vfat(E) fat(E) virtio_blk(E) virtio_mmio(E) virtio_pci(E) virtio_ring(E) virtio(E) ext4(E) crc16(E) mbcache(E) jbd2(E) loop(E) sg(E)
[  454.740038] dm_multipath(E) dm_mod(E) scsi_dh_rdac(E) scsi_dh_emc(E) scsi_dh_alua(E) scsi_mod(E) efivarfs(E) autofs4(E)
[  454.740043] CPU: 3 PID: 0 Comm: swapper/3 Tainted: G E 4.15.0.ge237f98-master #605
[  454.740044] Hardware name: MEDION MS-7848/MS-7848, BIOS M7848W08.20C 09/23/2013
[  454.740046] RIP: 0010:__list_add_valid+0x6a/0x70
[  454.740047] RSP: 0018:88041ecc3ca8 EFLAGS: 00010096
[  454.740048] RAX: 0075 RBX: 8803f33fa8c0 RCX: 0006
[  454.740049] RDX: RSI: 0082 RDI: 88041ecd5570
[  454.740050] RBP: 8803f596d7e0 R08: R09: 0368
[  454.740051] R10: R11: 88041ecc3a30 R12: 8803eb1c8828
[  454.740052] R13: 8803f33fa940 R14: 8803f5852600 R15: 8803f596d810
[  454.740053] FS: () GS:88041ecc() knlGS:
[  454.740054] CS: 0010 DS: ES: CR0: 80050033
[  454.740055] CR2: 014d9788 CR3: 01e0a006 CR4: 001606e0
[  454.740056] Call Trace:
[  454.740058]
[  454.740062]  blk_flush_complete_seq+0x2b1/0x370
[  454.740065]  flush_end_io+0x18c/0x280
[  454.740074]  scsi_end_request+0x95/0x1e0 [scsi_mod]
[  454.740079]  scsi_io_completion+0xbb/0x5d0 [scsi_mod]
[  454.740082]  __blk_mq_complete_request+0xb7/0x180
[  454.740084]  blk_mq_complete_request+0x50/0x90
[  454.740087]  ? scsi_vpd_tpg_id+0x90/0x90 [scsi_mod]
[  454.740095]  ata_scsi_qc_complete+0x1d8/0x470 [libata]
[  454.740100]  ata_qc_complete_multiple+0x87/0xd0 [libata]
[  454.740103]  ahci_handle_port_interrupt+0xd4/0x4e0 [libahci]
[  454.740105]  ahci_handle_port_intr+0x6f/0xb0 [libahci]
[  454.740107]  ahci_single_level_irq_intr+0x3b/0x60 [libahci]
[  454.740110]  __handle_irq_event_percpu+0x40/0x1a0
[  454.740112]  handle_irq_event_percpu+0x20/0x50
[  454.740114]  handle_irq_event+0x36/0x60
[  454.740116]  handle_edge_irq+0x90/0x190
[  454.740118]  handle_irq+0x1c/0x30
[  454.740120]  do_IRQ+0x43/0xd0
[  454.740122]  common_interrupt+0xa2/0xa2
[  454.740123]
[  454.740125] RIP: 0010:cpuidle_enter_state+0xec/0x250
[  454.740126] RSP: 0018:880187febec0 EFLAGS: 0246 ORIG_RAX: ffdd
[  454.740127] RAX: 88041ece0040 RBX: 88041ece77e8 RCX: 001f
[  454.740128] RDX: RSI: fffd9cb7bc38 RDI:
[  454.740129] RBP: 0005 R08: 0006 R09: 024f
[  454.740130] R10: 0205 R11: 0018 R12: 0003
[  454.740131] R13: 0069e09a3c87 R14: 0003 R15: 0069e09d450c
[  454.740134]  do_idle+0x16a/0x1d0
[  454.740136]  cpu_startup_entry+0x19/0x20
[  454.740138]  start_secondary+0x14e/0x190
[  454.740140]  secondary_startup_64+0xa5/0xb0
[  454.740141] Code: fe 31 c0 48 c7 c7 a0 61 bf 81 e8 12 f7 d8 ff 0f ff 31 c0 c3 48 89 d1 48 c7 c7 50 61 bf 81 48
Re: [PATCH 2/6] blk-mq: replace timeout synchronization with a RCU and generation based scheme
On Thu, 2017-12-14 at 22:54 +0100, Peter Zijlstra wrote:
> On Thu, Dec 14, 2017 at 09:42:48PM +0000, Bart Van Assche wrote:
>
> > Some time ago the block layer was changed to handle timeouts in thread context
> > instead of interrupt context. See also commit 287922eb0b18 ("block: defer
> > timeouts to a workqueue").
>
> That only makes it a little better:
>
> Task-A					Worker
>
> write_seqcount_begin()
> blk_mq_rw_update_state(rq, IN_FLIGHT)
> blk_add_timer(rq)
>
> 	schedule_work()
>
> 					read_seqcount_begin()
> 					while(seq & 1)
> 						cpu_relax();
>
> Now normally this isn't fatal because Worker will simply spin its entire
> time slice away and we'll eventually schedule our Task-A back in, which
> will complete the seqcount and things will work.
>
> But if, for some reason, our Worker was to have RT priority higher than
> our Task-A we'd be up some creek without no paddles.

Most kthreads, including kworkers, are very frequently SCHED_FIFO here.

	-Mike
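Peter's scenario can be sketched as a toy model (a hypothetical simplification, not the kernel's seqcount API): a reader spinning at higher priority than the preempted writer never lets the sequence count go even again.

```python
# Toy seqcount: the writer bumps the counter to odd before an update and
# back to even after. Readers retry while the counter is odd. If the
# reader outranks the writer (e.g. an RT kworker vs. a normal task) and
# never yields, the odd counter never clears -- modeled here with a spin
# budget standing in for "forever".

class SeqCount:
    def __init__(self):
        self.seq = 0
    def write_begin(self):
        self.seq += 1   # now odd: update in progress
    def write_end(self):
        self.seq += 1   # even again: update complete

def read_with_budget(sc, writer_can_run, budget=1000):
    spins = 0
    while sc.seq & 1:           # mimics "while (seq & 1) cpu_relax();"
        if writer_can_run:
            sc.write_end()      # writer gets CPU time, finishes its update
        spins += 1
        if spins > budget:
            return "stuck"      # RT reader starved the writer: livelock
    return "ok"

sc = SeqCount()
sc.write_begin()                              # writer preempted mid-update
print(read_with_budget(sc, writer_can_run=True))    # ok: writer eventually runs
sc.write_begin()
print(read_with_budget(sc, writer_can_run=False))   # stuck: FIFO reader never yields
```

Which is exactly why Mike's observation bites: on his boxes the kworker doing the read side frequently *is* SCHED_FIFO, so "writer_can_run" is effectively false.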
Re: possible deadlock in blk_trace_remove
On Sun, 2017-12-03 at 17:47 -0700, Jens Axboe wrote:
> On 12/03/2017 05:44 PM, Eric Biggers wrote:
> >
> >>> #syz fix: blktrace: fix trace mutex deadlock
> >>
> >> This is fixed in current -git.
> >
> > I know, but syzbot needed to be told what commit fixes the bug.
> > See https://github.com/google/syzkaller/blob/master/docs/syzbot.md
>
> Ah gotcha.

"@syzbot fix: bla" syntax would have been intuitive.

	-Mike
Re: [PATCH BUGFIX/IMPROVEMENT V2 0/3] three bfq fixes restoring service guarantees with random sync writes in bg
On Thu, 2017-08-31 at 15:42 +0100, Mel Gorman wrote:
> On Thu, Aug 31, 2017 at 08:46:28AM +0200, Paolo Valente wrote:
> > [SECOND TAKE, with just the name of one of the testers fixed]
> >
> > Hi,
> > while testing the read-write unfairness issues reported by Mel, I
> > found BFQ failing to guarantee good responsiveness against heavy
> > random sync writes in the background, i.e., multiple writers doing
> > random writes and systematic fdatasync [1]. The failure was caused by
> > three related bugs, because of which BFQ failed to guarantee to
> > high-weight processes the expected fraction of the throughput.
>
> Queued on top of Ming's most recent series even though that's still a work
> in progress. I should know in a few days how things stand.

It seems to have cured an interactivity issue I regularly meet during
the final link/depmod phase of a fat kernel build, especially bad with
evolution mail usage during that on spinning rust. Can't really say
for sure given this is not based on measurement.

	-Mike
Re: blk-mq breaks suspend even with runtime PM patch
On Tue, 2017-08-08 at 09:44 -0700, Greg KH wrote:
>
> Should these go back farther than 4.12? Looks like they apply cleanly
> to 4.9, didn't look older than that...

I met prerequisites at 4.11, but I wasn't patching anything remotely
resembling virgin source.

	-Mike
Re: blk-mq breaks suspend even with runtime PM patch
On Sat, 2017-07-29 at 17:27 +0200, Oleksandr Natalenko wrote:
> Hello Jens, Christoph.
>
> Unfortunately, even with the "block: disable runtime-pm for blk-mq" patch
> applied, blk-mq breaks suspend to RAM for me. It is reproducible on my
> laptop as well as in a VM.
>
> I use a complex disk layout involving MD, LUKS and LVM, and managed to get
> these warnings from the VM via serial console when suspend fails:
>
> ===
> [  245.516573] INFO: task kworker/0:1:49 blocked for more than 120 seconds.
> [  245.520025]       Not tainted 4.12.0-pf4 #1

FWIW, first thing I'd do is update that 4.12.0 to 4.12.4, and see if
stable fixed it. If not, I'd find these two commits irresistible.

5f042e7cbd9eb blk-mq: Include all present CPUs in the default queue mapping
4b855ad37194f blk-mq: Create hctx for each present CPU

'course applying random upstream bits does come with some risk; trying
a kernel already containing them has less "entertainment" potential.

	-Mike
Re: [PATCH] Fix loop device flush before configure v2
On Thu, 2017-06-08 at 10:17 +0800, James Wang wrote:
> This condition check was exist at before commit b5dd2f6047ca ("block: loop:
> improve performance via blk-mq") When add MQ support to loop device, it be
> removed because the member of '->lo_thread' be removed. And then upstream
> add '->worker_task', I think they forget add it to here.
>
> When I install SLES-12 product is base on 4.4 kernel I found installer will
> hang +60 second at scan disks. and I found LVM tools would take this action.
> finally I found this problem is more obvious on AMD platform. This problem
> will impact all scenarios that scan loop devcie.
>
> When the loop device didn't configure backing file or Request Queue, we
> shouldn't to cost a lot of time to flush it.

The changelog sounds odd to me, perhaps reword/condense a bit?...

	While installing SLES-12 (based on v4.4), I found that the installer
	will stall for 60+ seconds during LVM disk scan. The root cause was
	determined to be the removal of a bound device check in loop_flush()
	by commit b5dd2f6047ca ("block: loop: improve performance via blk-mq").

	Restoring this check, examining ->lo_state as set by loop_set_fd()
	eliminates the bad behavior.

	Test method:
	modprobe loop max_loop=64
	dd if=/dev/zero of=disk bs=512 count=200K
	for((i=0;i<4;i++))do losetup -f disk; done
	mkfs.ext4 -F /dev/loop0
	for((i=0;i<4;i++))do mkdir t$i; mount /dev/loop$i t$i;done
	for f in `ls /dev/loop[0-9]*|sort`; do \
		echo $f; dd if=$f of=/dev/null bs=512 count=1; \
	done

	Test output:
	             stock        patched
	/dev/loop0   18.1217e-05  8.3842e-05
	/dev/loop1   6.1114e-05   0.000147979
	/dev/loop10  0.414701     0.000116564
	/dev/loop11  0.7474       6.7942e-05
	/dev/loop12  0.747986     8.9082e-05
	/dev/loop13  0.746532     7.4799e-05
	/dev/loop14  0.480041     9.3926e-05
	/dev/loop15  1.26453      7.2522e-05

	Note that from loop10 onward, the device is not mounted, yet the
	stock kernel consumes several orders of magnitude more wall time
	than it does for a mounted device.

Reviewed-by: Hannes Reinecke
Signed-off-by: James Wang
Fixes: b5dd2f6047ca ("block: loop: improve performance via blk-mq")
---
>  drivers/block/loop.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/drivers/block/loop.c b/drivers/block/loop.c
> index 48f6fa6f810e..2e5b8538760c 100644
> --- a/drivers/block/loop.c
> +++ b/drivers/block/loop.c
> @@ -625,6 +625,9 @@ static int loop_switch(struct loop_device *lo, struct file *file)
>   */
>  static int loop_flush(struct loop_device *lo)
>  {
> +	/* loop not yet configured, no running thread, nothing to flush */
> +	if (lo->lo_state != Lo_bound)
> +		return 0;
>  	return loop_switch(lo, NULL);
>  }
>