Re: [PATCH V4] scsi_debugfs: fix crash in scsi_show_rq()
On Fri, Nov 10, 2017 at 08:51:58AM -0800, James Bottomley wrote: > On Fri, 2017-11-10 at 17:01 +0800, Ming Lei wrote: > > cmd->cmnd can be allocated/freed dynamically in case of > > T10_PI_TYPE2_PROTECTION, > > so we should check it in scsi_show_rq() because this request may have > > been freed already here, and cmd->cmnd has been set as null. > > > > We choose to accept read-after-free and dump request data as far as > > possible. > > > > This patch fixs the following kernel crash when dumping request via > > block's > > debugfs interface: > > > > [ 252.962045] BUG: unable to handle kernel NULL pointer dereference > > at (null) > > [ 252.963007] IP: scsi_format_opcode_name+0x1a/0x1c0 > > [ 252.963007] PGD 25e75a067 P4D 25e75a067 PUD 25e75b067 PMD 0 > > [ 252.963007] Oops: [#1] PREEMPT SMP > > [ 252.963007] Dumping ftrace buffer: > > [ 252.963007](ftrace buffer empty) > > [ 252.963007] Modules linked in: scsi_debug ebtable_filter ebtables > > ip6table_filter ip6_tables xt_CHECKSUM iptable_mangle ipt_MASQUERADE > > nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 > > nf_nat_ipv4 nf_nat nf_conntrack libcrc32c bridge stp llc > > iptable_filter fuse ip_tables sd_mod sg mptsas mptscsih mptbase > > crc32c_intel ahci libahci nvme serio_raw scsi_transport_sas libata > > lpc_ich nvme_core virtio_scsi binfmt_misc dm_mod iscsi_tcp > > libiscsi_tcp libiscsi scsi_transport_iscsi null_blk configs > > [ 252.963007] CPU: 1 PID: 1881 Comm: cat Not tainted 4.14.0- > > rc2.blk_mq_io_hang+ #516 > > [ 252.963007] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), > > BIOS 1.9.3-1.fc25 04/01/2014 > > [ 252.963007] task: 88025e6f6000 task.stack: c90001bd > > [ 252.963007] RIP: 0010:scsi_format_opcode_name+0x1a/0x1c0 > > [ 252.963007] RSP: 0018:c90001bd3c50 EFLAGS: 00010286 > > [ 252.963007] RAX: 4843 RBX: 0050 RCX: > > > > [ 252.963007] RDX: RSI: 0050 RDI: > > c90001bd3cd8 > > [ 252.963007] RBP: c90001bd3c88 R08: 1000 R09: > > > > [ 252.963007] R10: 880275134000 R11: 
88027513406c R12: > > 0050 > > [ 252.963007] R13: c90001bd3cd8 R14: R15: > > > > [ 252.963007] FS: 7f4d11762700() GS:88027fc4() > > knlGS: > > [ 252.963007] CS: 0010 DS: ES: CR0: 80050033 > > [ 252.963007] CR2: CR3: 00025e789003 CR4: > > 003606e0 > > [ 252.963007] DR0: DR1: DR2: > > > > [ 252.963007] DR3: DR6: fffe0ff0 DR7: > > 0400 > > [ 252.963007] Call Trace: > > [ 252.963007] __scsi_format_command+0x27/0xc0 > > [ 252.963007] scsi_show_rq+0x5c/0xc0 > > [ 252.963007] ? seq_printf+0x4e/0x70 > > [ 252.963007] ? blk_flags_show+0x5b/0xf0 > > [ 252.963007] __blk_mq_debugfs_rq_show+0x116/0x130 > > [ 252.963007] blk_mq_debugfs_rq_show+0xe/0x10 > > [ 252.963007] seq_read+0xfe/0x3b0 > > [ 252.963007] ? __handle_mm_fault+0x631/0x1150 > > [ 252.963007] full_proxy_read+0x54/0x90 > > [ 252.963007] __vfs_read+0x37/0x160 > > [ 252.963007] ? security_file_permission+0x9b/0xc0 > > [ 252.963007] vfs_read+0x96/0x130 > > [ 252.963007] SyS_read+0x55/0xc0 > > [ 252.963007] entry_SYSCALL_64_fastpath+0x1a/0xa5 > > [ 252.963007] RIP: 0033:0x7f4d1127e9b0 > > [ 252.963007] RSP: 002b:7ffd27082568 EFLAGS: 0246 ORIG_RAX: > > > > [ 252.963007] RAX: ffda RBX: 7f4d1154bb20 RCX: > > 7f4d1127e9b0 > > [ 252.963007] RDX: 0002 RSI: 7f4d115a7000 RDI: > > 0003 > > [ 252.963007] RBP: 00021010 R08: R09: > > > > [ 252.963007] R10: 037b R11: 0246 R12: > > 00022000 > > [ 252.963007] R13: 7f4d1154bb78 R14: 1000 R15: > > 0002 > > [ 252.963007] Code: c6 e8 1b ca 24 00 eb 8c e8 74 2c ae ff 0f 1f 40 > > 00 0f 1f 44 00 00 55 48 89 e5 41 56 41 55 41 54 53 49 89 fd 49 89 f4 > > 48 83 ec 18 <44> 0f b6 32 48 c7 45 c8 00 00 00 00 65 48 8b 04 25 28 > > 00 00 00 > > [ 252.963007] RIP: scsi_format_opcode_name+0x1a/0x1c0 RSP: > > c90001bd3c50 > > [ 252.963007] CR2: > > [ 252.963007] ---[ end trace 83c5bddfbaa6573c ]--- > > [ 252.963007] Kernel panic - not syncing: Fatal exception > > [ 252.963007] Dumping ftrace buffer: > > [ 252.963007](ftrace buffer empty) > > [ 252.963007] Kernel Offset: disabled > > [ 252.963007] ---[ 
end Kernel panic - not syncing: Fatal exception
> >
> > Fixes: 0eebd005dd07 ("scsi: Implement blk_mq_ops.show_rq()")
> > Cc: Bart Van Assche
> > Cc: Omar Sandoval
> > Cc: Martin K. Petersen
> > Cc: James Bottomley
> > Cc: Hannes Reinecke
Re: [PATCH] blk-mq: respect queue dead via blk_mq_quiesce_queue
On Fri, Nov 10, 2017 at 04:30:35PM +0000, Bart Van Assche wrote:
> On Sun, 2017-11-05 at 15:38 +0000, Bart Van Assche wrote:
> > On Sun, 2017-11-05 at 20:10 +0800, Ming Lei wrote:
> > > diff --git a/block/blk-core.c b/block/blk-core.c
> > > index 048be4aa6024..0b121f29e3b1 100644
> > > --- a/block/blk-core.c
> > > +++ b/block/blk-core.c
> > > @@ -658,6 +658,10 @@ void blk_cleanup_queue(struct request_queue *q)
> > >  	queue_flag_set(QUEUE_FLAG_DEAD, q);
> > >  	spin_unlock_irq(lock);
> > >
> > > +	/* respect queue DEAD via quiesce for blk-mq */
> > > +	if (q->mq_ops)
> > > +		blk_mq_quiesce_queue(q);
> > > +
> > >  	/* for synchronous bio-based driver finish in-flight integrity i/o */
> > >  	blk_flush_integrity();
> >
> > Have you considered to change the blk_freeze_queue_start() call in
> > blk_set_queue_dying() into a blk_freeze_queue() call? That approach has the
> > advantage that no new if (q->mq_ops) test has to be introduced.
>
> There is an additional reason why I think it would be better to make
> blk_set_queue_dying() wait until requests that are in progress have finished.
> Some block drivers, e.g. the mtip driver, start cleaning up resources
> immediately after blk_set_queue_dying() has returned. So making
> blk_set_queue_dying() wait would fix a race condition in at least the mtip
> driver.

It looks like you haven't explained how your approach fixes this issue.
IMO, your approach may not fix it:

1) The core idea: when blk_freeze_queue() returns, there can still be an
in-progress dispatch (equivalently, an in-progress I/O submission). So once
blk_freeze_queue() returns, blk_cleanup_queue() goes ahead, and the
in-progress dispatch may trigger a use-after-free, or even corrupt memory.
For the detailed reasoning, please see:

https://marc.info/?l=linux-block=150993988115872=2

2) Suppose you replace blk_freeze_queue_start() with blk_freeze_queue() in
blk_set_queue_dying(); that changes nothing about this issue.
You may think the blk_freeze_queue() called in blk_cleanup_queue() implies
synchronize_rcu(), which may drain the in-progress dispatch. But the
in-progress dispatch (I/O submission) may not have entered the RCU read-side
critical section yet at that point. For example, in one I/O submission path
the request has just been added to the scheduler queue, and that request is
then dispatched to the LLD via another queue run, and completed, before the
actual queue run from the current I/O submission happens. So the implied
synchronize_rcu() won't help this case at all.

Or do I miss your idea?

-- 
Ming
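The interleaving described above can be replayed in a toy, single-threaded userspace model (all names here are illustrative, not the real blk-mq API): the freeze check sees the usage count hit zero because another context dispatched and completed the request, while the original submitter's queue run is still outstanding.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the race; field names only loosely mirror blk-mq. */
struct toy_queue {
    int usage;   /* models q_usage_counter: requests holding a reference */
    int queued;  /* requests sitting in the scheduler queue */
    bool freed;  /* set once blk_cleanup_queue() has torn things down */
};

/* Submitter enters the queue and parks a request in the scheduler queue. */
static void submit(struct toy_queue *q)
{
    q->usage++;   /* percpu-ref style enter */
    q->queued++;  /* request added to scheduler queue */
}

/* Any context may run the hw queue and dispatch whatever is queued. */
static void run_hw_queue(struct toy_queue *q, bool *uaf)
{
    if (q->freed)
        *uaf = true;  /* touching the queue after cleanup: use-after-free */
    while (q->queued) {  /* dispatch + complete */
        q->queued--;
        q->usage--;
    }
}

/* blk_freeze_queue(): returns once no request holds a usage reference. */
static bool freeze_would_return(const struct toy_queue *q)
{
    return q->usage == 0;
}

/* Replay the interleaving from the thread; true if a UAF would occur. */
static bool replay_race(void)
{
    struct toy_queue q = { 0, 0, false };
    bool uaf = false;

    submit(&q);                       /* submitter A inserts a request...  */
    run_hw_queue(&q, &uaf);           /* ...context B dispatches+completes */
    assert(freeze_would_return(&q));  /* freeze sees usage==0, returns     */
    q.freed = true;                   /* blk_cleanup_queue() proceeds      */
    run_hw_queue(&q, &uaf);           /* A's own (now late) queue run      */
    return uaf;
}
```

Quiescing before teardown, as the patch proposes, would make cleanup wait out A's pending queue run instead of only waiting for the usage count.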
Re: [PATCH] blk-mq: only run the hardware queue if IO is pending
On 11/10/2017 03:28 PM, Omar Sandoval wrote:
> On Fri, Nov 10, 2017 at 09:12:18AM -0700, Jens Axboe wrote:
>> Currently we are inconsistent in when we decide to run the queue. Using
>> blk_mq_run_hw_queues() we check if the hctx has pending IO before
>> running it, but we don't do that from the individual queue run function,
>> blk_mq_run_hw_queue().
>>
>> [...]
>>
>> @@ -1265,8 +1270,7 @@ void blk_mq_run_hw_queues(struct request_queue *q, bool async)
>>  	int i;
>>
>>  	queue_for_each_hw_ctx(q, hctx, i) {
>> -		if (!blk_mq_hctx_has_pending(hctx) ||
>> -		    blk_mq_hctx_stopped(hctx))
>> +		if (blk_mq_hctx_stopped(hctx))
>>  			continue;
>>
>>  		blk_mq_run_hw_queue(hctx, async);
>
> Minor cosmetic point, I find this double-negative thing confusing,
> how about:
>
>	queue_for_each_hw_ctx(q, hctx, i) {
>		if (!blk_mq_hctx_stopped(hctx))
>			blk_mq_run_hw_queue(hctx, async);

Really? It's an easier read for me - if stopped, continue. Then run
after that. !stopped seems like more of a double negative :-)

> Other than that,
>
> Reviewed-by: Omar Sandoval

Thanks for the review!

-- 
Jens Axboe
Re: [PATCH] blk-mq: only run the hardware queue if IO is pending
On Fri, Nov 10, 2017 at 09:12:18AM -0700, Jens Axboe wrote:
> Currently we are inconsistent in when we decide to run the queue. Using
> blk_mq_run_hw_queues() we check if the hctx has pending IO before
> running it, but we don't do that from the individual queue run function,
> blk_mq_run_hw_queue(). This results in a lot of extra and pointless
> queue runs, potentially, on flush requests and (much worse) on tag
> starvation situations. This is observable just looking at the top
> output, with lots of kworkers active. For the !async runs, it just adds
> to the CPU overhead of blk-mq.
>
> Move the has-pending check into the run function instead of having
> callers do it.
>
> Signed-off-by: Jens Axboe
>
> [...]
>
> @@ -1265,8 +1270,7 @@ void blk_mq_run_hw_queues(struct request_queue *q, bool async)
>  	int i;
>
>  	queue_for_each_hw_ctx(q, hctx, i) {
> -		if (!blk_mq_hctx_has_pending(hctx) ||
> -		    blk_mq_hctx_stopped(hctx))
> +		if (blk_mq_hctx_stopped(hctx))
>  			continue;
>
>  		blk_mq_run_hw_queue(hctx, async);

Minor cosmetic point, I find this double-negative thing confusing,
how about:

	queue_for_each_hw_ctx(q, hctx, i) {
		if (!blk_mq_hctx_stopped(hctx))
			blk_mq_run_hw_queue(hctx, async);

Other than that,

Reviewed-by: Omar Sandoval
Re: [PATCH] blk-mq: improve tag waiting setup for non-shared tags
On Thu, Nov 09, 2017 at 04:00:09PM -0700, Jens Axboe wrote:
> If we run out of driver tags, we currently treat shared and non-shared
> tags the same - both cases hook into the tag waitqueue. This is a bit
> more costly than it needs to be on unshared tags, since we have to both
> grab the hctx lock, and the waitqueue lock (and disable interrupts).
> For the non-shared case, we can simply mark the queue as needing a
> restart.
>
> Split blk_mq_dispatch_wait_add() to account for both cases, and
> rename it to blk_mq_mark_tag_wait() to better reflect what it
> does now.
>
> Without this patch, shared and non-shared performance is about the same
> with 4 fio threads hammering on a single null_blk device (~410K, at 75%
> sys). With the patch, the shared case is the same, but the non-shared
> tags case runs at 431K at 71% sys.
>
> Signed-off-by: Jens Axboe

The best of both worlds.

Reviewed-by: Omar Sandoval
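The shape of the split described above can be sketched in userspace C. Everything here is a toy (hypothetical types; lock traffic is modeled as a counter rather than real spinlocks), but it shows why the unshared path is cheaper: one flag set instead of two lock acquisitions plus a waitqueue hookup.

```c
#include <stdbool.h>

#define TOY_SCHED_RESTART 0x1u

/* Toy stand-in for blk_mq_hw_ctx; only the fields the sketch needs. */
struct toy_hctx {
    unsigned int state;     /* holds a SCHED_RESTART-style flag bit      */
    int lock_acquisitions;  /* counts hctx-lock + waitqueue-lock takes   */
    bool on_waitqueue;      /* hooked into the tag waitqueue?            */
    bool tags_shared;       /* tags shared with other queues?            */
};

/* Called when a driver tag can't be allocated. For unshared tags we just
 * flag the queue for a restart; only shared tags pay for the two locks
 * and the waitqueue hookup. */
static void mark_tag_wait(struct toy_hctx *hctx)
{
    if (!hctx->tags_shared) {
        hctx->state |= TOY_SCHED_RESTART;  /* cheap: a single bit set */
        return;
    }
    hctx->lock_acquisitions++;  /* grab the hctx lock                  */
    hctx->lock_acquisitions++;  /* grab the waitqueue lock (irqs off)  */
    hctx->on_waitqueue = true;  /* wait for a tag to be freed          */
}
```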
Re: [PATCH v3 2/8] crypto: scompress - use sgl_alloc() and sgl_free()
On 10 November 2017 at 16:43, Bart Van Assche wrote:
> On Fri, 2017-11-03 at 15:23 -0700, Bart Van Assche wrote:
>> Use the sgl_alloc() and sgl_free() functions instead of open coding
>> these functions.
>>
>> Signed-off-by: Bart Van Assche
>> Cc: Ard Biesheuvel
>> Cc: Herbert Xu
>
> Hello Ard and Herbert,
>
> Would it be possible for one of you to share your opinion about this patch?

Acked-by: Ard Biesheuvel
Re: [PATCH] brd: remove unused brd_mutex
On 11/10/2017 10:29 AM, Mikulas Patocka wrote:
> Remove unused mutex brd_mutex. It is unused since the commit ff26956875c2
> ("brd: remove support for BLKFLSBUF").

Thanks, applied.

-- 
Jens Axboe
Re: [GIT PULL] nvme updates for Linux 4.15
On 11/10/2017 10:38 AM, Jens Axboe wrote:
> On 11/10/2017 10:33 AM, Christoph Hellwig wrote:
>> On Fri, Nov 10, 2017 at 10:27:24AM -0700, Jens Axboe wrote:
>>> That makes for a bit of an awkward merge, why wasn't this fixed up
>>> in your tree?
>>
>> Because you asked me to always base on for-4.15/block last time?
>
> That's not what I meant. It's conflicting because of a patch, that's
> fine. But the code in your tree is:
>
> 	if (a == &dev_attr_uuid.attr) {
> 		if (uuid_is_null(&ids->uuid) ||
> 		    !memchr_inv(ids->nguid, 0, sizeof(ids->nguid)))
> 			return 0;
> 	}
>
> and you're saying the right resolution is:
>
> 	if (a == &dev_attr_uuid.attr) {
> 		if (uuid_is_null(&ids->uuid) ||
> 		    !memchr_inv(ids->nguid, 0, sizeof(ids->nguid)))
> 			return 0;
> 	}

This one should have had the && instead of course...

-- 
Jens Axboe
Re: [GIT PULL] nvme updates for Linux 4.15
On 11/10/2017 10:33 AM, Christoph Hellwig wrote:
> On Fri, Nov 10, 2017 at 10:27:24AM -0700, Jens Axboe wrote:
>> That makes for a bit of an awkward merge, why wasn't this fixed up
>> in your tree?
>
> Because you asked me to always base on for-4.15/block last time?

That's not what I meant. It's conflicting because of a patch, that's
fine. But the code in your tree is:

	if (a == &dev_attr_uuid.attr) {
		if (uuid_is_null(&ids->uuid) ||
		    !memchr_inv(ids->nguid, 0, sizeof(ids->nguid)))
			return 0;
	}

and you're saying the right resolution is:

	if (a == &dev_attr_uuid.attr) {
		if (uuid_is_null(&ids->uuid) ||
		    !memchr_inv(ids->nguid, 0, sizeof(ids->nguid)))
			return 0;
	}

My point is that this should have been fixed up in your tree. Yes, it'd
still cause a conflict, that's fine, but at least the merge would not end
up also containing a fix for the above code. That part is not logical.

-- 
Jens Axboe
Re: [PATCH] blk-mq: only run the hardware queue if IO is pending
On 11/10/2017 10:34 AM, Christoph Hellwig wrote:
> On Fri, Nov 10, 2017 at 10:31:29AM -0700, Jens Axboe wrote:
>> On 11/10/2017 10:29 AM, Christoph Hellwig wrote:
>>> Do we even still need the blk_mq_hctx_has_pending helper at all?
>>
>> We do, otherwise we end up running the queue for cases where we don't
>> need to. [...]
>
> I didn't mean the logic - just wondering if we need the separate
> helper given it has a single caller left.
>
> But it's really just a cosmetic point.

Ah, that makes more sense. Since the function isn't necessarily that
straight forward if open coded (to read, I mean), I'd prefer to keep it
separate. But yeah, cosmetic, can always be revisited.

-- 
Jens Axboe
Re: [PATCH] blk-mq: only run the hardware queue if IO is pending
On Fri, Nov 10, 2017 at 10:31:29AM -0700, Jens Axboe wrote:
> On 11/10/2017 10:29 AM, Christoph Hellwig wrote:
>> On Fri, Nov 10, 2017 at 09:12:18AM -0700, Jens Axboe wrote:
>>> Currently we are inconsistent in when we decide to run the queue.
>>> [...]
>>
>> Do we even still need the blk_mq_hctx_has_pending helper at all?
>
> We do, otherwise we end up running the queue for cases where we don't
> need to. The caller doesn't always know if it's necessary. And even
> if the caller can figure it out, it'd add complicated logic for some
> of those cases (like flush). It's easier/cleaner and more efficient
> to keep it.

I didn't mean the logic - just wondering if we need the separate
helper given it has a single caller left.

But it's really just a cosmetic point.
Re: [GIT PULL] nvme updates for Linux 4.15
On Fri, Nov 10, 2017 at 10:27:24AM -0700, Jens Axboe wrote: > That makes for a bit of an awkward merge, why wasn't this fixed up > in your tree? Because you asked me to always base on for-4.15/block last time?
Re: [PATCH] blk-mq: only run the hardware queue if IO is pending
On 11/10/2017 10:29 AM, Christoph Hellwig wrote:
> On Fri, Nov 10, 2017 at 09:12:18AM -0700, Jens Axboe wrote:
>> Currently we are inconsistent in when we decide to run the queue. Using
>> blk_mq_run_hw_queues() we check if the hctx has pending IO before
>> running it, but we don't do that from the individual queue run function,
>> blk_mq_run_hw_queue().
>> [...]
>
> Do we even still need the blk_mq_hctx_has_pending helper at all?

We do, otherwise we end up running the queue for cases where we don't
need to. The caller doesn't always know if it's necessary. And even
if the caller can figure it out, it'd add complicated logic for some
of those cases (like flush). It's easier/cleaner and more efficient
to keep it.

-- 
Jens Axboe
Re: [PATCH] blk-mq: only run the hardware queue if IO is pending
On Fri, Nov 10, 2017 at 09:12:18AM -0700, Jens Axboe wrote:
> Currently we are inconsistent in when we decide to run the queue. Using
> blk_mq_run_hw_queues() we check if the hctx has pending IO before
> running it, but we don't do that from the individual queue run function,
> blk_mq_run_hw_queue().
>
> [...]
>
> Move the has-pending check into the run function instead of having
> callers do it.
>
> Signed-off-by: Jens Axboe

Do we even still need the blk_mq_hctx_has_pending helper at all?

Except for that this looks fine to me:

Reviewed-by: Christoph Hellwig
[PATCH] brd: remove unused brd_mutex
Remove unused mutex brd_mutex. It is unused since the commit ff26956875c2
("brd: remove support for BLKFLSBUF").

Signed-off-by: Mikulas Patocka

---
 drivers/block/brd.c | 1 -
 1 file changed, 1 deletion(-)

Index: linux-2.6/drivers/block/brd.c
===================================================================
--- linux-2.6.orig/drivers/block/brd.c
+++ linux-2.6/drivers/block/brd.c
@@ -60,7 +60,6 @@ struct brd_device {
 /*
  * Look up and return a brd's page for a given sector.
  */
-static DEFINE_MUTEX(brd_mutex);
 static struct page *brd_lookup_page(struct brd_device *brd, sector_t sector)
 {
 	pgoff_t idx;
Re: [GIT PULL] nvme updates for Linux 4.15
On 11/10/2017 10:22 AM, Christoph Hellwig wrote:
> On Fri, Nov 10, 2017 at 08:22:32AM -0700, Jens Axboe wrote:
>> Actually, double checking, one of them is a little suspicious:
>>
>> 	if (a == &dev_attr_uuid.attr) {
>> <<<<<<< HEAD
>> 		if (uuid_is_null(&ns->uuid) &&
>> 		    !memchr_inv(ns->nguid, 0, sizeof(ns->nguid)))
>> =======
>> 		if (uuid_is_null(&ids->uuid) ||
>> 		    !memchr_inv(ids->nguid, 0, sizeof(ids->nguid)))
>> >>>>>>> 34a9690712aa386fb5fa9e3b8fb44a22f5f2aec6
>> 			return 0;
>> 	}
>>
>> with the previous copy using is_null && memchr_inv, the change to ids
>> makes it an OR instead. I'm going with the latter, but thought I'd bring
>> it up.
>
> The && from master is the right one, it comes from:
>
> commit 007a61ae2f35c7fcf767313285c4924e81f11983
> Author: Martin Wilck
> Date:   Thu Sep 28 21:33:23 2017 +0200
>
>     nvme: fix visibility of "uuid" ns attribute
>
> and needs to be logically applied to the new code as well.

That makes for a bit of an awkward merge, why wasn't this fixed up
in your tree? In any case, I just fixed it up in my for-next.

-- 
Jens Axboe
Re: [GIT PULL] nvme updates for Linux 4.15
On Fri, Nov 10, 2017 at 08:22:32AM -0700, Jens Axboe wrote:
> Actually, double checking, one of them is a little suspicious:
>
> 	if (a == &dev_attr_uuid.attr) {
> <<<<<<< HEAD
> 		if (uuid_is_null(&ns->uuid) &&
> 		    !memchr_inv(ns->nguid, 0, sizeof(ns->nguid)))
> =======
> 		if (uuid_is_null(&ids->uuid) ||
> 		    !memchr_inv(ids->nguid, 0, sizeof(ids->nguid)))
> >>>>>>> 34a9690712aa386fb5fa9e3b8fb44a22f5f2aec6
> 			return 0;
> 	}
>
> with the previous copy using is_null && memchr_inv, the change to ids
> makes it an OR instead. I'm going with the latter, but thought I'd bring
> it up.

The && from master is the right one, it comes from:

commit 007a61ae2f35c7fcf767313285c4924e81f11983
Author: Martin Wilck
Date:   Thu Sep 28 21:33:23 2017 +0200

    nvme: fix visibility of "uuid" ns attribute

and needs to be logically applied to the new code as well.
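Why the && matters: the sketch below re-implements the two helpers in userspace (toy versions only; the kernel's uuid_is_null() and memchr_inv() differ in detail) and shows that the && form hides the "uuid" sysfs attribute only when both the uuid and the nguid are all-zero, whereas || would hide it whenever the uuid alone is null, even though a valid nguid is present.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy memchr_inv(): first byte in p[0..size) that differs from c,
 * or NULL if every byte equals c. */
static const void *toy_memchr_inv(const void *p, int c, size_t size)
{
    const unsigned char *s = p;

    for (size_t i = 0; i < size; i++)
        if (s[i] != (unsigned char)c)
            return s + i;
    return NULL;
}

static bool toy_uuid_is_null(const unsigned char uuid[16])
{
    return toy_memchr_inv(uuid, 0, 16) == NULL;
}

/* The '&&' resolution: hide the "uuid" attribute only when BOTH the
 * uuid and the nguid are all-zero, i.e. neither identifier exists. */
static bool hide_uuid_attr(const unsigned char uuid[16],
                           const unsigned char nguid[16])
{
    return toy_uuid_is_null(uuid) && !toy_memchr_inv(nguid, 0, 16);
}
```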
Re: [PATCH V4] scsi_debugfs: fix crash in scsi_show_rq()
On Fri, 2017-11-10 at 17:01 +0800, Ming Lei wrote: > cmd->cmnd can be allocated/freed dynamically in case of > T10_PI_TYPE2_PROTECTION, > so we should check it in scsi_show_rq() because this request may have > been freed already here, and cmd->cmnd has been set as null. > > We choose to accept read-after-free and dump request data as far as > possible. > > This patch fixs the following kernel crash when dumping request via > block's > debugfs interface: > > [ 252.962045] BUG: unable to handle kernel NULL pointer dereference > at (null) > [ 252.963007] IP: scsi_format_opcode_name+0x1a/0x1c0 > [ 252.963007] PGD 25e75a067 P4D 25e75a067 PUD 25e75b067 PMD 0 > [ 252.963007] Oops: [#1] PREEMPT SMP > [ 252.963007] Dumping ftrace buffer: > [ 252.963007](ftrace buffer empty) > [ 252.963007] Modules linked in: scsi_debug ebtable_filter ebtables > ip6table_filter ip6_tables xt_CHECKSUM iptable_mangle ipt_MASQUERADE > nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 > nf_nat_ipv4 nf_nat nf_conntrack libcrc32c bridge stp llc > iptable_filter fuse ip_tables sd_mod sg mptsas mptscsih mptbase > crc32c_intel ahci libahci nvme serio_raw scsi_transport_sas libata > lpc_ich nvme_core virtio_scsi binfmt_misc dm_mod iscsi_tcp > libiscsi_tcp libiscsi scsi_transport_iscsi null_blk configs > [ 252.963007] CPU: 1 PID: 1881 Comm: cat Not tainted 4.14.0- > rc2.blk_mq_io_hang+ #516 > [ 252.963007] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), > BIOS 1.9.3-1.fc25 04/01/2014 > [ 252.963007] task: 88025e6f6000 task.stack: c90001bd > [ 252.963007] RIP: 0010:scsi_format_opcode_name+0x1a/0x1c0 > [ 252.963007] RSP: 0018:c90001bd3c50 EFLAGS: 00010286 > [ 252.963007] RAX: 4843 RBX: 0050 RCX: > > [ 252.963007] RDX: RSI: 0050 RDI: > c90001bd3cd8 > [ 252.963007] RBP: c90001bd3c88 R08: 1000 R09: > > [ 252.963007] R10: 880275134000 R11: 88027513406c R12: > 0050 > [ 252.963007] R13: c90001bd3cd8 R14: R15: > > [ 252.963007] FS: 7f4d11762700() GS:88027fc4() > knlGS: > [ 252.963007] CS: 
0010 DS: ES: CR0: 80050033 > [ 252.963007] CR2: CR3: 00025e789003 CR4: > 003606e0 > [ 252.963007] DR0: DR1: DR2: > > [ 252.963007] DR3: DR6: fffe0ff0 DR7: > 0400 > [ 252.963007] Call Trace: > [ 252.963007] __scsi_format_command+0x27/0xc0 > [ 252.963007] scsi_show_rq+0x5c/0xc0 > [ 252.963007] ? seq_printf+0x4e/0x70 > [ 252.963007] ? blk_flags_show+0x5b/0xf0 > [ 252.963007] __blk_mq_debugfs_rq_show+0x116/0x130 > [ 252.963007] blk_mq_debugfs_rq_show+0xe/0x10 > [ 252.963007] seq_read+0xfe/0x3b0 > [ 252.963007] ? __handle_mm_fault+0x631/0x1150 > [ 252.963007] full_proxy_read+0x54/0x90 > [ 252.963007] __vfs_read+0x37/0x160 > [ 252.963007] ? security_file_permission+0x9b/0xc0 > [ 252.963007] vfs_read+0x96/0x130 > [ 252.963007] SyS_read+0x55/0xc0 > [ 252.963007] entry_SYSCALL_64_fastpath+0x1a/0xa5 > [ 252.963007] RIP: 0033:0x7f4d1127e9b0 > [ 252.963007] RSP: 002b:7ffd27082568 EFLAGS: 0246 ORIG_RAX: > > [ 252.963007] RAX: ffda RBX: 7f4d1154bb20 RCX: > 7f4d1127e9b0 > [ 252.963007] RDX: 0002 RSI: 7f4d115a7000 RDI: > 0003 > [ 252.963007] RBP: 00021010 R08: R09: > > [ 252.963007] R10: 037b R11: 0246 R12: > 00022000 > [ 252.963007] R13: 7f4d1154bb78 R14: 1000 R15: > 0002 > [ 252.963007] Code: c6 e8 1b ca 24 00 eb 8c e8 74 2c ae ff 0f 1f 40 > 00 0f 1f 44 00 00 55 48 89 e5 41 56 41 55 41 54 53 49 89 fd 49 89 f4 > 48 83 ec 18 <44> 0f b6 32 48 c7 45 c8 00 00 00 00 65 48 8b 04 25 28 > 00 00 00 > [ 252.963007] RIP: scsi_format_opcode_name+0x1a/0x1c0 RSP: > c90001bd3c50 > [ 252.963007] CR2: > [ 252.963007] ---[ end trace 83c5bddfbaa6573c ]--- > [ 252.963007] Kernel panic - not syncing: Fatal exception > [ 252.963007] Dumping ftrace buffer: > [ 252.963007](ftrace buffer empty) > [ 252.963007] Kernel Offset: disabled > [ 252.963007] ---[ end Kernel panic - not syncing: Fatal exception > > Fixes: 0eebd005dd07(scsi: Implement blk_mq_ops.show_rq()) > Cc: Bart Van Assche> Cc: Omar Sandoval > Cc: Martin K. 
Petersen
> Cc: James Bottomley
> Cc: Hannes Reinecke
> Signed-off-by: Ming Lei
> ---
> V2:
> - fix typo
> V3:
> - prefer to dump data and accept read-after-free
> - add some comment
> V4:
> - read the two variable into local variable first, for avoiding free between
Re: [PATCH v3 2/8] crypto: scompress - use sgl_alloc() and sgl_free()
On Fri, 2017-11-03 at 15:23 -0700, Bart Van Assche wrote:
> Use the sgl_alloc() and sgl_free() functions instead of open coding
> these functions.
>
> Signed-off-by: Bart Van Assche
> Cc: Ard Biesheuvel
> Cc: Herbert Xu

Hello Ard and Herbert,

Would it be possible for one of you to share your opinion about this patch?

Thanks,

Bart.
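For context on what the patch factors out: an sgl_alloc()-style helper builds a scatter list covering a requested length out of fixed-size pages, with a matching free. The sketch below is a userspace approximation of that shape only (hypothetical toy types; the real kernel scatterlist API is array-based and different in detail).

```c
#include <stdlib.h>
#include <stddef.h>

#define TOY_PAGE_SIZE 4096u

/* Userspace stand-in for a scatterlist entry. */
struct toy_sgl {
    void *buf;
    size_t length;
    struct toy_sgl *next;
};

/* Allocate a chain of page-sized buffers covering `length` bytes,
 * unwinding on allocation failure. */
static struct toy_sgl *toy_sgl_alloc(size_t length)
{
    struct toy_sgl *head = NULL, **tail = &head;

    while (length) {
        size_t chunk = length < TOY_PAGE_SIZE ? length : TOY_PAGE_SIZE;
        struct toy_sgl *e = calloc(1, sizeof(*e));

        if (!e)
            goto fail;
        e->buf = malloc(chunk);
        if (!e->buf) {
            free(e);
            goto fail;
        }
        e->length = chunk;
        *tail = e;
        tail = &e->next;
        length -= chunk;
    }
    return head;
fail:
    while (head) {  /* free what was built so far */
        struct toy_sgl *n = head->next;
        free(head->buf);
        free(head);
        head = n;
    }
    return NULL;
}

static void toy_sgl_free(struct toy_sgl *sgl)
{
    while (sgl) {
        struct toy_sgl *next = sgl->next;
        free(sgl->buf);
        free(sgl);
        sgl = next;
    }
}
```

Centralizing this alloc/free pair is exactly the kind of duplication the patch removes from crypto/scompress.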
Re: [PATCH] blk-mq: respect queue dead via blk_mq_quiesce_queue
On Sun, 2017-11-05 at 15:38 +0000, Bart Van Assche wrote:
> On Sun, 2017-11-05 at 20:10 +0800, Ming Lei wrote:
>> diff --git a/block/blk-core.c b/block/blk-core.c
>> index 048be4aa6024..0b121f29e3b1 100644
>> --- a/block/blk-core.c
>> +++ b/block/blk-core.c
>> @@ -658,6 +658,10 @@ void blk_cleanup_queue(struct request_queue *q)
>>  	queue_flag_set(QUEUE_FLAG_DEAD, q);
>>  	spin_unlock_irq(lock);
>>
>> +	/* respect queue DEAD via quiesce for blk-mq */
>> +	if (q->mq_ops)
>> +		blk_mq_quiesce_queue(q);
>> +
>>  	/* for synchronous bio-based driver finish in-flight integrity i/o */
>>  	blk_flush_integrity();
>
> Have you considered to change the blk_freeze_queue_start() call in
> blk_set_queue_dying() into a blk_freeze_queue() call? That approach has the
> advantage that no new if (q->mq_ops) test has to be introduced.

There is an additional reason why I think it would be better to make
blk_set_queue_dying() wait until requests that are in progress have finished.
Some block drivers, e.g. the mtip driver, start cleaning up resources
immediately after blk_set_queue_dying() has returned. So making
blk_set_queue_dying() wait would fix a race condition in at least the mtip
driver.

Bart.
[PATCH] blk-mq: only run the hardware queue if IO is pending
Currently we are inconsistent in when we decide to run the queue. Using blk_mq_run_hw_queues() we check if the hctx has pending IO before running it, but we don't do that from the individual queue run function, blk_mq_run_hw_queue(). This results in a lot of extra and pointless queue runs, potentially on flush requests and (much worse) in tag starvation situations. This is observable just by looking at top output, with lots of kworkers active. For the !async runs, it just adds to the CPU overhead of blk-mq.

Move the has-pending check into the run function instead of having callers do it.

Signed-off-by: Jens Axboe
---
diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index 6f4bdb8209f7..c117bd8fd1f6 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -81,12 +81,7 @@ static bool blk_mq_sched_restart_hctx(struct blk_mq_hw_ctx *hctx)
 	} else
 		clear_bit(BLK_MQ_S_SCHED_RESTART, &hctx->state);
 
-	if (blk_mq_hctx_has_pending(hctx)) {
-		blk_mq_run_hw_queue(hctx, true);
-		return true;
-	}
-
-	return false;
+	return blk_mq_run_hw_queue(hctx, true);
 }
 
 /*
diff --git a/block/blk-mq.c b/block/blk-mq.c
index bfe24a5b62a3..a2a4271f5ab8 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -61,10 +61,10 @@ static int blk_mq_poll_stats_bkt(const struct request *rq)
 /*
  * Check if any of the ctx's have pending work in this hardware queue
  */
-bool blk_mq_hctx_has_pending(struct blk_mq_hw_ctx *hctx)
+static bool blk_mq_hctx_has_pending(struct blk_mq_hw_ctx *hctx)
 {
-	return sbitmap_any_bit_set(&hctx->ctx_map) ||
-		!list_empty_careful(&hctx->dispatch) ||
+	return !list_empty_careful(&hctx->dispatch) ||
+		sbitmap_any_bit_set(&hctx->ctx_map) ||
 			blk_mq_sched_has_work(hctx);
 }
 
@@ -1253,9 +1253,14 @@ void blk_mq_delay_run_hw_queue(struct blk_mq_hw_ctx *hctx, unsigned long msecs)
 }
 EXPORT_SYMBOL(blk_mq_delay_run_hw_queue);
 
-void blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx, bool async)
+bool blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx, bool async)
 {
-	__blk_mq_delay_run_hw_queue(hctx, async, 0);
+	if (blk_mq_hctx_has_pending(hctx)) {
+		__blk_mq_delay_run_hw_queue(hctx, async, 0);
+		return true;
+	}
+
+	return false;
 }
 EXPORT_SYMBOL(blk_mq_run_hw_queue);
 
@@ -1265,8 +1270,7 @@ void blk_mq_run_hw_queues(struct request_queue *q, bool async)
 	int i;
 
 	queue_for_each_hw_ctx(q, hctx, i) {
-		if (!blk_mq_hctx_has_pending(hctx) ||
-		    blk_mq_hctx_stopped(hctx))
+		if (blk_mq_hctx_stopped(hctx))
 			continue;
 
 		blk_mq_run_hw_queue(hctx, async);
diff --git a/block/blk-mq.h b/block/blk-mq.h
index 99a19c5523e2..dcf379a892dd 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -26,14 +26,12 @@ struct blk_mq_ctx {
 	struct kobject		kobj;
 } ____cacheline_aligned_in_smp;
 
-void blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx, bool async);
 void blk_mq_freeze_queue(struct request_queue *q);
 void blk_mq_free_queue(struct request_queue *q);
 int blk_mq_update_nr_requests(struct request_queue *q, unsigned int nr);
 void blk_mq_wake_waiters(struct request_queue *q);
 bool blk_mq_dispatch_rq_list(struct request_queue *, struct list_head *, bool);
 void blk_mq_flush_busy_ctxs(struct blk_mq_hw_ctx *hctx, struct list_head *list);
-bool blk_mq_hctx_has_pending(struct blk_mq_hw_ctx *hctx);
 bool blk_mq_get_driver_tag(struct request *rq, struct blk_mq_hw_ctx **hctx,
 				bool wait);
 struct request *blk_mq_dequeue_from_ctx(struct blk_mq_hw_ctx *hctx,
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index b326208277ee..eb1e2cdffb31 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -266,7 +266,7 @@ void blk_mq_start_stopped_hw_queues(struct request_queue *q, bool async);
 void blk_mq_quiesce_queue(struct request_queue *q);
 void blk_mq_unquiesce_queue(struct request_queue *q);
 void blk_mq_delay_run_hw_queue(struct blk_mq_hw_ctx *hctx, unsigned long msecs);
-void blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx, bool async);
+bool blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx, bool async);
 void blk_mq_run_hw_queues(struct request_queue *q, bool async);
 void blk_mq_delay_queue(struct blk_mq_hw_ctx *hctx, unsigned long msecs);
 void blk_mq_tagset_busy_iter(struct blk_mq_tag_set *tagset,

-- 
Jens Axboe
Re: [PATCH] ide: ide-atapi: fix compile error with defining macro DEBUG
On 11/10/2017 12:59 AM, Hongxu Jia wrote:
> Compiling ide-atapi fails when the macro "DEBUG" is defined:
> ...
> |drivers/ide/ide-atapi.c:285:52: error: 'struct request' has
> |no member named 'cmd'; did you mean 'csd'?
> | debug_log("%s: rq->cmd[0]: 0x%x\n", __func__, rq->cmd[0]);
> ...
>
> Since we split the scsi_request out of struct request, we missed
> doing the same conversion in debug_log.

Thanks, applied.

-- 
Jens Axboe
Re: [PATCH 00/12 v5] Multiqueue for MMC/SD
On 10 November 2017 at 11:01, Linus Walleij wrote:
> This is the fifth iteration of this patch set.
>
> I *HOPE* that we can scrap this patch set and merge Adrian's
> patches instead, because they also bring CQE support which is
> nice. I had some review comments on his series, mainly that
> it needs to kill off the legacy block layer code path that
> no one likes anyway.
>
> So this is mainly an academic and inspirational exercise.
> Whatever remains of this refactoring, if anything, I can
> certainly do on top of Adrian's patches as well.
>
> What changed since v4 is the error path, since Adrian pointed
> out that the error handling seems to be fragile. It was indeed
> fragile... To make sure things work properly I have run long
> test rounds with fault injection, essentially:

Please correct me if I am wrong: the issue was observed already in patch 11, before the actual switch to mq was done, right?

> Enable FAULT_INJECTION, FAULT_INJECTION_DEBUG_FS, FAIL_MMC_REQUEST
> cd /debug/mmc3/fail_mmc_request/
> echo 1 > probability
> echo -1 > times
>
> Then running a dd to the card, also increased the error rate
> to 10% and completed tests successfully, but at this error
> rate the MMC stack sometimes exceeds the retry limit and the
> dd command fails (as is appropriate).

That's great. I really appreciate that you have run these tests; that gives me good confidence from an overall point of view.

> Removing a card during I/O does not work well however :/
> So I guess I would need to work on that if this series should
> continue. (Hopefully unlikely.)

Yeah, this has actually been rather cumbersome to deal with also in the legacy request path. Let's dive into this in more detail as soon as possible.
> > > Linus Walleij (12): > mmc: core: move the asynchronous post-processing > mmc: core: add a workqueue for completing requests > mmc: core: replace waitqueue with worker > mmc: core: do away with is_done_rcv > mmc: core: do away with is_new_req > mmc: core: kill off the context info > mmc: queue: simplify queue logic > mmc: block: shuffle retry and error handling > mmc: queue: stop flushing the pipeline with NULL > mmc: queue/block: pass around struct mmc_queue_req*s > mmc: block: issue requests in massive parallel > mmc: switch MMC/SD to use blk-mq multiqueueing v5 > > drivers/mmc/core/block.c| 557 > +++- > drivers/mmc/core/block.h| 5 +- > drivers/mmc/core/bus.c | 1 - > drivers/mmc/core/core.c | 217 ++--- > drivers/mmc/core/core.h | 11 +- > drivers/mmc/core/host.c | 1 - > drivers/mmc/core/mmc_test.c | 31 +-- > drivers/mmc/core/queue.c| 252 > drivers/mmc/core/queue.h| 16 +- > include/linux/mmc/core.h| 3 +- > include/linux/mmc/host.h| 31 +-- > 11 files changed, 557 insertions(+), 568 deletions(-) First, I haven't yet commented on the latest version of the mq patch (patch12 in this series) and neither on Adrians (patch 3 in his series), but before doing that, let me share my overall view of how we could move forward, as to see if all of us can agree on that path. So, what I really like in $subject series is the step by step method, moving slowly forward enables an easy review, then the actual switch to mq gets a diff of "3 files changed, 156 insertions(+), 181 deletions(-)". This shows to me, that it can be done! Great work! Of course, I do realize that you may not have considered all preparations needed for CQE, which Adrian may have thought of in his mq patch from his series (patch 3), but still. Moreover, for reasons brought up while reviewing Adrian's series, regarding if mq is "ready", and because I see that the diff for patch 12 is small, I suggest that we just skip the step adding a Kconfig option to allow an opt-in of the mq path. 
In other words, *the* patch that makes the switch to mq should also remove the entire leftover of rubbish code from the legacy request path. That's also what you do in patch 12, nice!

Finally, I understand that you would be happy to scrap this series, but instead let Adrian's series, when re-posted, go first. Could you perhaps re-consider that? I wonder if it may not be smoother and less risky to actually apply everything up to patch 11 in this series. I noticed that you reported issues with card removal during I/O (for both your and Adrian's mq patch), but do those problems exist at patch 11, or are they introduced by the mq patch (patch 12)?

Of course, I realize that if we apply everything up to patch 11, that would require a massive rebase of Adrian's mq/CQE series, but on the other hand, then no matter which mq patch we decide to go with, it should be a rather small diff, thus easy to review and less risky.

Adrian, Linus - what do you think?

Kind regards
Uffe
Re: [GIT PULL] nvme updates for Linux 4.15
On 11/10/2017 08:18 AM, Jens Axboe wrote:
> On 11/10/2017 07:34 AM, Christoph Hellwig wrote:
>> Hi Jens,
>>
>> another round of NVMe updates for 4.15.
>>
>> The biggest items are:
>>
>> - native multipath support (me + Hannes)
>> - userspace notifications for AENs (Keith)
>> - obey the command effects log page (Keith)
>>
>> in addition to that lots of fixes from a wide variety of people.
>
> Pulled. It had two conflicts with master, but both pretty trivial.

Actually, double checking, one of them is a little suspicious:

	if (a == &dev_attr_uuid.attr) {
<<<<<<< HEAD
		if (uuid_is_null(&ns->uuid) &&
		    !memchr_inv(ns->nguid, 0, sizeof(ns->nguid)))
=======
		if (uuid_is_null(&ids->uuid) ||
		    !memchr_inv(ids->nguid, 0, sizeof(ids->nguid)))
>>>>>>> 34a9690712aa386fb5fa9e3b8fb44a22f5f2aec6
			return 0;
	}

with the previous copy using is_null && memchr_inv, the change to ids makes it an OR instead. I'm going with the latter, but thought I'd bring it up.

-- 
Jens Axboe
Re: [GIT PULL] nvme updates for Linux 4.15
On 11/10/2017 07:34 AM, Christoph Hellwig wrote: > Hi Jens, > > another round of NVMe updates for 4.15. > > The biggest items are: > > - native multipath support (me + Hannes) > - userspace notifications for AENs (Keith) > - obey the command effects log page (Keith) > > in addition to that lots of fixes from a wide variety of people. Pulled. It had two conflicts with master, but both pretty trivial. -- Jens Axboe
[GIT PULL] nvme updates for Linux 4.15
Hi Jens, another round of NVMe updates for 4.15. The biggest items are: - native multipath support (me + Hannes) - userspace notifications for AENs (Keith) - obey the command effects log page (Keith) in addition to that lots of fixes from a wide variety of people. The following changes since commit 7d1dff993252366cc5e21e07b80072d323628363: blktrace: fix unlocked registration of tracepoints (2017-11-07 09:42:10 -0700) are available in the git repository at: git://git.infradead.org/nvme.git nvme-4.15 for you to fetch changes up to b119d90862842a055ff78ca54953d430da314808: nvme: expose subsys attribute to sysfs (2017-11-10 12:16:19 +0100) Christoph Hellwig (14): nvme: move the dying queue check from cancel to completion nvme: always unregister the integrity profile in __nvme_revalidate_disk nvme: don't pass struct nvme_ns to nvme_init_integrity nvme: don't pass struct nvme_ns to nvme_config_discard nvme: set the chunk size before freezing the queue nvme: split __nvme_revalidate_disk nvme: fix and clarify the check for missing metadata nvmet: better data length validation nvmet: kill nvmet_inline_bio_init nvme: track subsystems nvme: introduce a nvme_ns_ids structure nvme: track shared namespaces nvme: implement multipath access to nvme subsystems nvme: also expose the namespace identification sysfs files for mpath nodes Hannes Reinecke (3): block: create 'slaves' and 'holders' entries for hidden gendisks nvme: create 'slaves' and 'holders' entries for hidden controllers nvme: expose subsys attribute to sysfs Israel Rukshin (1): nvmet-rdma: update queue list during ib_device removal James Smart (3): nvme-fc: fix localport resume using stale values nvme-fc: decouple ns references from lldd references lpfc: tie in to new dev_loss_tmo interface in nvme transport Javier González (2): nvme: compare NQN string with right size nvme: fix eui_show() print format Keith Busch (7): nvme: factor get log into a helper nvme: check admin passthru command effects nvme: centralize AEN 
defines nvme-fc: remove unused "queue_size" field nvme: remove handling of multiple AEN requests nvme: unexport starting async event work nvme: send uevent for some asynchronous events Max Gurtovoy (1): nvme-rdma: fix nvme_rdma_create_queue_ib error flow Ming Lei (1): nvme-pci: avoid dereference of symbol from unloaded module Minwoo Im (1): nvmet: fix comment typos in admin-cmd.c Sagi Grimberg (2): nvmet: remove redundant memset if failed to get_smart_log failed nvmet: remove redundant local variable block/genhd.c | 14 +- drivers/nvme/host/Kconfig |9 + drivers/nvme/host/Makefile |1 + drivers/nvme/host/core.c| 1050 ++- drivers/nvme/host/fabrics.c |4 +- drivers/nvme/host/fc.c | 143 -- drivers/nvme/host/lightnvm.c| 14 +- drivers/nvme/host/multipath.c | 291 +++ drivers/nvme/host/nvme.h| 138 - drivers/nvme/host/pci.c | 21 +- drivers/nvme/host/rdma.c| 21 +- drivers/nvme/target/admin-cmd.c | 21 +- drivers/nvme/target/core.c | 10 + drivers/nvme/target/fc.c| 32 +- drivers/nvme/target/io-cmd.c| 18 +- drivers/nvme/target/loop.c | 19 +- drivers/nvme/target/nvmet.h |4 + drivers/nvme/target/rdma.c | 16 +- drivers/scsi/lpfc/lpfc_attr.c |5 + include/linux/nvme.h| 30 ++ 20 files changed, 1456 insertions(+), 405 deletions(-) create mode 100644 drivers/nvme/host/multipath.c
Re: [PATCH 00/12 v5] Multiqueue for MMC/SD
On Fri, Nov 10, 2017 at 11:01 AM, Linus Walleij wrote:
> Removing a card during I/O does not work well however :/
> So I guess I would need to work on that if this series should
> continue. (Hopefully unlikely.)

I tested a bit more, and it turns out this doesn't work on any of the MQ patch sets, which matches Christoph's statement that this is not really working. I haven't really analyzed why, though: I can see that the kernel crashes firmly into a brick wall, but on mainline it does not. I think it's just something we have to smoke out in the next release cycle as we switch to MQ.

Yours,
Linus Walleij
Re: [PATCH V13 03/10] mmc: block: Add blk-mq support
Following up on this:

On Thu, Nov 9, 2017 at 4:52 PM, Linus Walleij wrote:
>> No one has ever suggested that the legacy API will remain. Once blk-mq is
>> ready the old code gets deleted.
>
> The block layer maintainers most definitely think MQ is ready,
> but you seem to disagree. Why?
>
> In the recent LWN article from Kernel Recipes:
> https://lwn.net/Articles/735275/
> the only obstacle I can see is a mention that SCSI was not
> switched over by default because of some problems with
> slow rotating media. "This change had to be backed out
> because of some performance and scalability issues that
> arose for rotating storage."
>
> Christoph mentions that he's gonna try again for v4.14.
> But it's true, I do not see that happening in linux-next yet.

Neil Brown's article on LWN points to Ming Lei's patch set:
https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1513023.html
as addressing the issue that held back blk-mq from being the default for SCSI, and it is reportedly queued for v4.15.

Since this MMC/SD rework is also targeted for v4.15, I think we can assume it is pretty much ready for everything, and delete the non-MQ block path. I just haven't seen any of this problem in my tests with MMC/SD, so I do not think it would be affected, but it anyway seems to be fixed.

OK, maybe I am especially optimistic. But it's not just based on that.

Yours,
Linus Walleij
[PATCH 11/12 v5] mmc: block: issue requests in massive parallel
This makes a crucial change to the issuing mechanism for the MMC requests: before commit "mmc: core: move the asynchronous post-processing", some parallelism on the read/write requests was achieved by speculatively postprocessing a request, and re-preprocessing and re-issuing the request if something went wrong, which we discover later when checking for an error. This is kind of ugly.

Instead we need a mechanism like here: we issue requests, and when they come back from the hardware, we know if they finished successfully or not. If the request was successful, we complete the asynchronous request and let a new request immediately start on the hardware. If, and only if, it returned an error from the hardware, do we go down the error path.

This is achieved by splitting the work path from the hardware in two: a successful path ending up calling down to mmc_blk_rw_done() and completing quickly, and an error path calling down to mmc_blk_rw_done_error().

This has a profound effect: we reintroduce the parallelism on the successful path, as mmc_post_req() can now be called while the next request is in transit (just like prior to commit "mmc: core: move the asynchronous post-processing") and blk_end_request() is called while the next request is already on the hardware. The latter has the profound effect of issuing a new request again, so that we actually may have three requests in transit at the same time: one on the hardware, one being prepared (such as DMA flushing) and one being prepared for issuing next by the block layer. This shows up when we transition to multiqueue, where this can be exploited.

Signed-off-by: Linus Walleij
---
ChangeLog v4->v5:
- Fixes on the error path: when a request reports an error back, keep the areq on the host as it is not yet finished (no assigning NULL to host->areq), and do not postprocess the request or complete it. This will happen eventually when the request succeeds.
- Augment the .report_done_status() callback to return a bool indicating whether the areq is now finished or not, to handle the error case where we eventually give up on the request and have returned an error to the block layer.
- Make sure to post-process the request on the error path and pre-process it again when resending an asynchronous request. This satisfies the host's semantic expectation that every request will be in pre->req->post sequence even if there are errors.
- To assure the ordering of pre/post-processing, we need to post-process any prepared request with -EINVAL if there is an error, then re-preprocess it again after error recovery. To this end a helper pointer in host->pending_areq is added so the error path can act on this.
- Rebasing on the "next" branch in the MMC tree.
---
 drivers/mmc/core/block.c | 98 
 drivers/mmc/core/core.c | 58 +++-
 include/linux/mmc/host.h | 4 +-
 3 files changed, 117 insertions(+), 43 deletions(-)

diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c index 2cd9fe5a8c9b..e3ae7241b2eb 100644 --- a/drivers/mmc/core/block.c +++ b/drivers/mmc/core/block.c @@ -1827,7 +1828,8 @@ static void mmc_blk_rw_try_restart(struct mmc_queue_req *mq_rq) mmc_restart_areq(mq->card->host, areq); } -static void mmc_blk_rw_done(struct mmc_async_req *areq, enum mmc_blk_status status) +static bool mmc_blk_rw_done_error(struct mmc_async_req *areq, + enum mmc_blk_status status) { struct mmc_queue *mq; struct mmc_blk_data *md; @@ -1835,7 +1836,7 @@ static void mmc_blk_rw_done(struct mmc_async_req *areq, enum mmc_blk_status stat struct mmc_host *host; struct mmc_queue_req *mq_rq; struct mmc_blk_request *brq; - struct request *old_req; + struct request *req; bool req_pending = true; int type, retune_retry_done = 0; @@ -1849,42 +1850,27 @@ static void mmc_blk_rw_done(struct mmc_async_req *areq, enum mmc_blk_status stat card = mq->card; host = card->host; brq = &mq_rq->brq; - old_req = mmc_queue_req_to_req(mq_rq); - type = rq_data_dir(old_req) ==
READ ? MMC_BLK_READ : MMC_BLK_WRITE; + req = mmc_queue_req_to_req(mq_rq); + type = rq_data_dir(req) == READ ? MMC_BLK_READ : MMC_BLK_WRITE; switch (status) { - case MMC_BLK_SUCCESS: case MMC_BLK_PARTIAL: - /* -* A block was successfully transferred. -*/ + /* This should trigger a retransmit */ mmc_blk_reset_success(md, type); - req_pending = blk_end_request(old_req, BLK_STS_OK, + req_pending = blk_end_request(req, BLK_STS_OK, brq->data.bytes_xfered); - /* -* If the blk_end_request function
[PATCH 09/12 v5] mmc: queue: stop flushing the pipeline with NULL
Remove all the pipeline flushing: i.e. repeatedly sending NULL down to the core layer to flush out asynchronous requests, and also sending NULL after "special" commands to achieve the same flush.

Instead: let the "special" commands wait for any ongoing asynchronous transfers using the completion, and apart from that expect the core.c and block.c layers to deal with the ongoing requests autonomously, without any "push" from the queue.

Add a function in the core to wait for an asynchronous request to complete.

Update the tests to use the new function prototypes.

This kills off some FIXMEs, such as getting rid of the mq->qcnt queue depth variable that was introduced a while back.

It is a vital step toward multiqueue enablement that we stop pulling NULL off the end of the request queue to flush the asynchronous issuing mechanism.

Signed-off-by: Linus Walleij
---
ChangeLog v1->v5:
- Rebasing on the "next" branch in the MMC tree.
---
 drivers/mmc/core/block.c | 173 
 drivers/mmc/core/core.c | 50 +++--
 drivers/mmc/core/core.h | 6 +-
 drivers/mmc/core/mmc_test.c | 31 ++--
 drivers/mmc/core/queue.c | 11 ++-
 drivers/mmc/core/queue.h | 7 --
 6 files changed, 108 insertions(+), 170 deletions(-)

diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c index 2cda2f52058e..c7a57006e27f 100644 --- a/drivers/mmc/core/block.c +++ b/drivers/mmc/core/block.c @@ -1805,7 +1805,6 @@ static void mmc_blk_rw_cmd_abort(struct mmc_queue *mq, struct mmc_card *card, if (mmc_card_removed(card)) req->rq_flags |= RQF_QUIET; while (blk_end_request(req, BLK_STS_IOERR, blk_rq_cur_bytes(req))); - mq->qcnt--; } /** @@ -1877,13 +1876,10 @@ static void mmc_blk_rw_done(struct mmc_async_req *areq, enum mmc_blk_status stat if (mmc_blk_reset(md, card->host, type)) { if (req_pending) mmc_blk_rw_cmd_abort(mq, card, old_req, mq_rq); - else - mq->qcnt--; mmc_blk_rw_try_restart(mq, mq_rq); return; } if (!req_pending) { - mq->qcnt--; mmc_blk_rw_try_restart(mq, mq_rq); return; } @@ -1927,7 +1923,6 @@ static void
mmc_blk_rw_done(struct mmc_async_req *areq, enum mmc_blk_status stat req_pending = blk_end_request(old_req, BLK_STS_IOERR, brq->data.blksz); if (!req_pending) { - mq->qcnt--; mmc_blk_rw_try_restart(mq, mq_rq); return; } @@ -1951,26 +1946,16 @@ static void mmc_blk_rw_done(struct mmc_async_req *areq, enum mmc_blk_status stat */ mmc_blk_rw_rq_prep(mq_rq, card, areq->disable_multi, mq); - mmc_start_areq(card->host, areq, NULL); + mmc_start_areq(card->host, areq); mq_rq->brq.retune_retry_done = retune_retry_done; - } else { - /* Else, this request is done */ - mq->qcnt--; } } static void mmc_blk_issue_rw_rq(struct mmc_queue *mq, struct request *new_req) { - enum mmc_blk_status status; - struct mmc_async_req *new_areq; - struct mmc_async_req *old_areq; struct mmc_card *card = mq->card; - - if (new_req) - mq->qcnt++; - - if (!mq->qcnt) - return; + struct mmc_queue_req *mqrq_cur = req_to_mmc_queue_req(new_req); + struct mmc_async_req *areq = &mqrq_cur->areq; /* * If the card was removed, just cancel everything and return. @@ -1978,46 +1963,26 @@ static void mmc_blk_issue_rw_rq(struct mmc_queue *mq, struct request *new_req) if (mmc_card_removed(card)) { new_req->rq_flags |= RQF_QUIET; blk_end_request_all(new_req, BLK_STS_IOERR); - mq->qcnt--; /* FIXME: just set to 0? */ return; } - if (new_req) { - struct mmc_queue_req *mqrq_cur = req_to_mmc_queue_req(new_req); - /* -* When 4KB native sector is enabled, only 8 blocks -* multiple read or write is allowed -*/ - if (mmc_large_sector(card) && - !IS_ALIGNED(blk_rq_sectors(new_req), 8)) { - pr_err("%s: Transfer size is not 4KB sector size aligned\n", - new_req->rq_disk->disk_name); - mmc_blk_rw_cmd_abort(mq, card, new_req, mqrq_cur); - return; - } - - mmc_blk_rw_rq_prep(mqrq_cur, card, 0, mq); - new_areq = &mqrq_cur->areq; -
[PATCH 12/12 v5] mmc: switch MMC/SD to use blk-mq multiqueueing v5
This switches the MMC/SD stack to use the multiqueue block layer interface.

We kill off the kthread that was just calling blk_fetch_request() and let blk-mq drive all traffic, nice, that is how it should work.

Due to having switched the submission mechanics around so that the completion of requests is now triggered from the host callbacks, we manage to keep the same performance for linear reads/writes as we have for the old block layer.

The open questions from earlier patch series have been addressed:

- mmc_[get|put]_card() is now issued across requests from .queue_rq() to .complete() using Adrian's nifty context lock. This means that the block layer does not compete with itself on getting access to the host, and we can let other users of the host come in. (For SDIO and mixed-mode cards.)

- Partial reads are handled by open coding calls to blk_update_request() as advised by Christoph.

Signed-off-by: Linus Walleij
---
ChangeLog v4->v5:
- Rebase on the other changes including improved error handling.
- Use quiesce and unquiesce on the queue in the suspend/resume cycle.
---
 drivers/mmc/core/block.c | 92 ++
 drivers/mmc/core/queue.c | 237 ---
 drivers/mmc/core/queue.h | 8 +-
 3 files changed, 156 insertions(+), 181 deletions(-)

diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c index e3ae7241b2eb..9fa3bfa3b4f8 100644 --- a/drivers/mmc/core/block.c +++ b/drivers/mmc/core/block.c @@ -28,6 +28,7 @@ #include #include #include +#include #include #include #include @@ -93,7 +94,6 @@ static DEFINE_IDA(mmc_rpmb_ida); * There is one mmc_blk_data per slot. */ struct mmc_blk_data { - spinlock_t lock; struct device *parent; struct gendisk *disk; struct mmc_queue queue; @@ -1204,6 +1204,23 @@ static inline void mmc_blk_reset_success(struct mmc_blk_data *md, int type) } /* + * This reports status back to the block layer for a finished request.
+ */ +static void mmc_blk_complete(struct mmc_queue_req *mq_rq, +blk_status_t status) +{ + struct request *req = mmc_queue_req_to_req(mq_rq); + + /* +* We are done with I/O, so this call will invoke .complete() and +* release the host lock. +*/ + blk_mq_complete_request(req); + /* Then we report the request back to the block layer */ + blk_mq_end_request(req, status); +} + +/* * The non-block commands come back from the block layer after it queued it and * processed it with all other requests and then they get issued in this * function. @@ -1262,9 +1279,9 @@ static void mmc_blk_issue_drv_op(struct mmc_queue_req *mq_rq) ret = -EINVAL; break; } + mq_rq->drv_op_result = ret; - blk_end_request_all(mmc_queue_req_to_req(mq_rq), - ret ? BLK_STS_IOERR : BLK_STS_OK); + mmc_blk_complete(mq_rq, ret ? BLK_STS_IOERR : BLK_STS_OK); } static void mmc_blk_issue_discard_rq(struct mmc_queue_req *mq_rq) @@ -1308,7 +1325,7 @@ static void mmc_blk_issue_discard_rq(struct mmc_queue_req *mq_rq) else mmc_blk_reset_success(md, type); fail: - blk_end_request(req, status, blk_rq_bytes(req)); + mmc_blk_complete(mq_rq, status); } static void mmc_blk_issue_secdiscard_rq(struct mmc_queue_req *mq_rq) @@ -1378,7 +1395,7 @@ static void mmc_blk_issue_secdiscard_rq(struct mmc_queue_req *mq_rq) if (!err) mmc_blk_reset_success(md, type); out: - blk_end_request(req, status, blk_rq_bytes(req)); + mmc_blk_complete(mq_rq, status); } static void mmc_blk_issue_flush(struct mmc_queue_req *mq_rq) @@ -1388,8 +1405,13 @@ static void mmc_blk_issue_flush(struct mmc_queue_req *mq_rq) int ret = 0; ret = mmc_flush_cache(card); - blk_end_request_all(mmc_queue_req_to_req(mq_rq), - ret ? BLK_STS_IOERR : BLK_STS_OK); + /* +* NOTE: this used to call blk_end_request_all() for both +* cases in the old block layer to flush all queued +* transactions. I am not sure it was even correct to +* do that for the success case. +*/ + mmc_blk_complete(mq_rq, ret ? 
BLK_STS_IOERR : BLK_STS_OK); } /* @@ -1768,7 +1790,6 @@ static void mmc_blk_rw_rq_prep(struct mmc_queue_req *mq_rq, mq_rq->areq.err_check = mmc_blk_err_check; mq_rq->areq.host = card->host; - INIT_WORK(&mq_rq->areq.finalization_work, mmc_finalize_areq); } static bool mmc_blk_rw_cmd_err(struct mmc_blk_data *md, struct mmc_card *card, @@ -1792,10 +1813,13 @@ static bool mmc_blk_rw_cmd_err(struct mmc_blk_data *md, struct mmc_card *card, err = mmc_sd_num_wr_blocks(card, &blocks); if (err) req_pending = old_req_pending; - else - req_pending =
[PATCH 03/12 v5] mmc: core: replace waitqueue with worker
The waitqueue in the host context is there to signal back from mmc_request_done(), through mmc_wait_data_done(), that the hardware is done with a command; when the wait is over, the core will typically submit the next asynchronous request that is pending, just waiting for the hardware to be available. This is in the way of letting mmc_request_done() trigger the report up to the block layer that a block request is finished.

Re-jig this as a first step: remove the waitqueue and introduce a work that will run after a completed asynchronous request, finalizing that request, including retransmissions, and eventually reporting back with a completion and a status code to the asynchronous issue method.

This has the upside that we can remove the MMC_BLK_NEW_REQUEST status code and the "new_request" state in the request queue that is only there to make the state machine spin out the first time we send a request.

Use the workqueue we introduced in the host for handling just this, and then add a work and completion in the asynchronous request to deal with this mechanism.

We introduce a pointer from mmc_request back to the asynchronous request so these can be referenced from each other, and augment mmc_wait_data_done() to use this pointer to get at the areq and kick the worker, since that function is only used by asynchronous requests anyway.

This is a central change that lets us do many other changes, since we have broken the submit and complete code paths in two, and we can potentially remove the NULL flushing of the asynchronous pipeline and report block requests as finished directly from the worker.

Signed-off-by: Linus Walleij
---
ChangeLog v1->v5:
- Rebasing on the "next" branch in the MMC tree.
---
 drivers/mmc/core/block.c | 3 ++
 drivers/mmc/core/core.c | 93 
 drivers/mmc/core/core.h | 2 ++
 drivers/mmc/core/queue.c | 1 -
 include/linux/mmc/core.h | 3 +-
 include/linux/mmc/host.h | 7 ++--
 6 files changed, 59 insertions(+), 50 deletions(-)

diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c index ea80ff4cd7f9..5c84175e49be 100644 --- a/drivers/mmc/core/block.c +++ b/drivers/mmc/core/block.c @@ -1712,6 +1712,7 @@ static void mmc_blk_rw_rq_prep(struct mmc_queue_req *mqrq, mmc_blk_data_prep(mq, mqrq, disable_multi, &do_rel_wr, &do_data_tag); brq->mrq.cmd = &brq->cmd; + brq->mrq.areq = NULL; brq->cmd.arg = blk_rq_pos(req); if (!mmc_card_blockaddr(card)) @@ -1764,6 +1765,8 @@ static void mmc_blk_rw_rq_prep(struct mmc_queue_req *mqrq, } mqrq->areq.err_check = mmc_blk_err_check; + mqrq->areq.host = card->host; + INIT_WORK(&mqrq->areq.finalization_work, mmc_finalize_areq); } static bool mmc_blk_rw_cmd_err(struct mmc_blk_data *md, struct mmc_card *card, diff --git a/drivers/mmc/core/core.c b/drivers/mmc/core/core.c index 73ebee12e67b..7440daa2f559 100644 --- a/drivers/mmc/core/core.c +++ b/drivers/mmc/core/core.c @@ -369,10 +369,15 @@ EXPORT_SYMBOL(mmc_start_request); */ static void mmc_wait_data_done(struct mmc_request *mrq) { - struct mmc_context_info *context_info = &mrq->host->context_info; + struct mmc_host *host = mrq->host; + struct mmc_context_info *context_info = &host->context_info; + struct mmc_async_req *areq = mrq->areq; context_info->is_done_rcv = true; - wake_up_interruptible(&context_info->wait); + /* Schedule a work to deal with finalizing this request */ + if (!areq) + pr_err("areq of the data mmc_request was NULL!\n"); + queue_work(host->req_done_wq, &areq->finalization_work); } static void mmc_wait_done(struct mmc_request *mrq) @@ -695,43 +700,34 @@ static void mmc_post_req(struct mmc_host *host, struct mmc_request *mrq, * Returns the status of the ongoing asynchronous request, but * MMC_BLK_SUCCESS if no request was going on.
*/ -static enum mmc_blk_status mmc_finalize_areq(struct mmc_host *host) +void mmc_finalize_areq(struct work_struct *work) { + struct mmc_async_req *areq = + container_of(work, struct mmc_async_req, finalization_work); + struct mmc_host *host = areq->host; struct mmc_context_info *context_info = &host->context_info; - enum mmc_blk_status status; - - if (!host->areq) - return MMC_BLK_SUCCESS; - - while (1) { - wait_event_interruptible(context_info->wait, - (context_info->is_done_rcv || -context_info->is_new_req)); + enum mmc_blk_status status = MMC_BLK_SUCCESS; - if (context_info->is_done_rcv) { - struct mmc_command *cmd; + if (context_info->is_done_rcv) { + struct mmc_command *cmd; - context_info->is_done_rcv = false; - cmd = host->areq->mrq->cmd; + context_info->is_done_rcv = false; +
[PATCH 00/12 v5] Multiqueue for MMC/SD
This is the fifth iteration of this patch set.

I *HOPE* that we can scrap this patch set and merge Adrian's patches instead, because they also bring CQE support which is nice. I had some review comments on his series, mainly that it needs to kill off the legacy block layer code path that no one likes anyway.

So this is mainly an academic and inspirational exercise. Whatever remains of this refactoring, if anything, I can certainly do on top of Adrian's patches as well.

What changed since v4 is the error path, since Adrian pointed out that the error handling seems to be fragile. It was indeed fragile... To make sure things work properly I have run long test rounds with fault injection, essentially:

Enable FAULT_INJECTION, FAULT_INJECTION_DEBUG_FS, FAIL_MMC_REQUEST
cd /debug/mmc3/fail_mmc_request/
echo 1 > probability
echo -1 > times

Then running a dd to the card, also increased the error rate to 10% and completed tests successfully, but at this error rate the MMC stack sometimes exceeds the retry limit and the dd command fails (as is appropriate).

Removing a card during I/O does not work well however :/ So I guess I would need to work on that if this series should continue. (Hopefully unlikely.)
Linus Walleij (12):
  mmc: core: move the asynchronous post-processing
  mmc: core: add a workqueue for completing requests
  mmc: core: replace waitqueue with worker
  mmc: core: do away with is_done_rcv
  mmc: core: do away with is_new_req
  mmc: core: kill off the context info
  mmc: queue: simplify queue logic
  mmc: block: shuffle retry and error handling
  mmc: queue: stop flushing the pipeline with NULL
  mmc: queue/block: pass around struct mmc_queue_req*s
  mmc: block: issue requests in massive parallel
  mmc: switch MMC/SD to use blk-mq multiqueueing

 drivers/mmc/core/block.c    | 557 +++-
 drivers/mmc/core/block.h    | 5 +-
 drivers/mmc/core/bus.c      | 1 -
 drivers/mmc/core/core.c     | 217 ++---
 drivers/mmc/core/core.h     | 11 +-
 drivers/mmc/core/host.c     | 1 -
 drivers/mmc/core/mmc_test.c | 31 +--
 drivers/mmc/core/queue.c    | 252
 drivers/mmc/core/queue.h    | 16 +-
 include/linux/mmc/core.h    | 3 +-
 include/linux/mmc/host.h    | 31 +--
 11 files changed, 557 insertions(+), 568 deletions(-)
--
2.13.6
[PATCH 06/12 v5] mmc: core: kill off the context info
The last member of the context info: is_waiting_last_req is just assigned values, never checked. Delete that and the whole context info as a result. Signed-off-by: Linus Walleij--- ChangeLog v1->v5: - Rebasing on the "next" branch in the MMC tree. --- drivers/mmc/core/block.c | 2 -- drivers/mmc/core/bus.c | 1 - drivers/mmc/core/core.c | 13 - drivers/mmc/core/core.h | 2 -- drivers/mmc/core/queue.c | 9 + include/linux/mmc/host.h | 9 - 6 files changed, 1 insertion(+), 35 deletions(-) diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c index 5c84175e49be..86ec87c17e71 100644 --- a/drivers/mmc/core/block.c +++ b/drivers/mmc/core/block.c @@ -2065,13 +2065,11 @@ void mmc_blk_issue_rq(struct mmc_queue *mq, struct request *req) default: /* Normal request, just issue it */ mmc_blk_issue_rw_rq(mq, req); - card->host->context_info.is_waiting_last_req = false; break; } } else { /* No request, flushing the pipeline with NULL */ mmc_blk_issue_rw_rq(mq, NULL); - card->host->context_info.is_waiting_last_req = false; } out: diff --git a/drivers/mmc/core/bus.c b/drivers/mmc/core/bus.c index a4b49e25fe96..45904a7e87be 100644 --- a/drivers/mmc/core/bus.c +++ b/drivers/mmc/core/bus.c @@ -348,7 +348,6 @@ int mmc_add_card(struct mmc_card *card) #ifdef CONFIG_DEBUG_FS mmc_add_card_debugfs(card); #endif - mmc_init_context_info(card->host); card->dev.of_node = mmc_of_find_child_device(card->host, 0); diff --git a/drivers/mmc/core/core.c b/drivers/mmc/core/core.c index b1a5059f6cd1..fa86f9a15d29 100644 --- a/drivers/mmc/core/core.c +++ b/drivers/mmc/core/core.c @@ -2997,19 +2997,6 @@ void mmc_unregister_pm_notifier(struct mmc_host *host) } #endif -/** - * mmc_init_context_info() - init synchronization context - * @host: mmc host - * - * Init struct context_info needed to implement asynchronous - * request mechanism, used by mmc core, host driver and mmc requests - * supplier. 
- */ -void mmc_init_context_info(struct mmc_host *host) -{ - host->context_info.is_waiting_last_req = false; -} - static int __init mmc_init(void) { int ret; diff --git a/drivers/mmc/core/core.h b/drivers/mmc/core/core.h index e493d9d73fe2..88b852ac8f74 100644 --- a/drivers/mmc/core/core.h +++ b/drivers/mmc/core/core.h @@ -92,8 +92,6 @@ void mmc_remove_host_debugfs(struct mmc_host *host); void mmc_add_card_debugfs(struct mmc_card *card); void mmc_remove_card_debugfs(struct mmc_card *card); -void mmc_init_context_info(struct mmc_host *host); - int mmc_execute_tuning(struct mmc_card *card); int mmc_hs200_to_hs400(struct mmc_card *card); int mmc_hs400_to_hs200(struct mmc_card *card); diff --git a/drivers/mmc/core/queue.c b/drivers/mmc/core/queue.c index 4a0752ef6154..2c232ba4e594 100644 --- a/drivers/mmc/core/queue.c +++ b/drivers/mmc/core/queue.c @@ -42,7 +42,6 @@ static int mmc_queue_thread(void *d) { struct mmc_queue *mq = d; struct request_queue *q = mq->queue; - struct mmc_context_info *cntx = &mq->card->host->context_info; current->flags |= PF_MEMALLOC; @@ -54,15 +53,12 @@ static int mmc_queue_thread(void *d) set_current_state(TASK_INTERRUPTIBLE); req = blk_fetch_request(q); mq->asleep = false; - cntx->is_waiting_last_req = false; if (!req) { /* * Dispatch queue is empty so set flags for * mmc_request_fn() to wake us up.
*/ - if (mq->qcnt) - cntx->is_waiting_last_req = true; - else + if (!mq->qcnt) mq->asleep = true; } spin_unlock_irq(q->queue_lock); @@ -96,7 +92,6 @@ static void mmc_request_fn(struct request_queue *q) { struct mmc_queue *mq = q->queuedata; struct request *req; - struct mmc_context_info *cntx; if (!mq) { while ((req = blk_fetch_request(q)) != NULL) { @@ -106,8 +101,6 @@ static void mmc_request_fn(struct request_queue *q) return; } - cntx = &mq->card->host->context_info; - if (mq->asleep) wake_up_process(mq->thread); } diff --git a/include/linux/mmc/host.h b/include/linux/mmc/host.h index 36af19990683..4b210e9283f6 100644 --- a/include/linux/mmc/host.h +++ b/include/linux/mmc/host.h @@ -239,14 +239,6 @@ struct mmc_slot { void *handler_priv; }; -/** - * mmc_context_info - synchronization details for mmc context - * @is_waiting_last_req mmc context waiting for single running request - */ -struct mmc_context_info { - bool
[PATCH 10/12 v5] mmc: queue/block: pass around struct mmc_queue_req*s
Instead of passing several pointers around (to the mmc_queue_req, the request and the mmc_queue) and reassigning them left and right, pass the mmc_queue_req and dereference the queue and request from it where needed. The struct mmc_queue_req is the thing that has a lifecycle, after all: this is what we keep in our queue, and what the block layer helps us manage. Augment a bunch of functions to take a single argument so we can see the trees and not just a big jungle of arguments. Signed-off-by: Linus Walleij --- ChangeLog v1->v5: - Rebasing on the "next" branch in the MMC tree. --- drivers/mmc/core/block.c | 128 --- drivers/mmc/core/block.h | 5 +- drivers/mmc/core/queue.c | 2 +- 3 files changed, 69 insertions(+), 66 deletions(-) diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c index c7a57006e27f..2cd9fe5a8c9b 100644 --- a/drivers/mmc/core/block.c +++ b/drivers/mmc/core/block.c @@ -1208,9 +1208,9 @@ static inline void mmc_blk_reset_success(struct mmc_blk_data *md, int type) * processed it with all other requests and then they get issued in this * function. */ -static void mmc_blk_issue_drv_op(struct mmc_queue *mq, struct request *req) +static void mmc_blk_issue_drv_op(struct mmc_queue_req *mq_rq) { - struct mmc_queue_req *mq_rq; + struct mmc_queue *mq = mq_rq->mq; struct mmc_card *card = mq->card; struct mmc_blk_data *md = mq->blkdata; struct mmc_blk_ioc_data **idata; @@ -1220,7 +1220,6 @@ static void mmc_blk_issue_drv_op(struct mmc_queue *mq, struct request *req) int ret; int i; - mq_rq = req_to_mmc_queue_req(req); rpmb_ioctl = (mq_rq->drv_op == MMC_DRV_OP_IOCTL_RPMB); switch (mq_rq->drv_op) { @@ -1264,12 +1263,14 @@ static void mmc_blk_issue_drv_op(struct mmc_queue *mq, struct request *req) break; } mq_rq->drv_op_result = ret; - blk_end_request_all(req, ret ? BLK_STS_IOERR : BLK_STS_OK); + blk_end_request_all(mmc_queue_req_to_req(mq_rq), + ret ?
BLK_STS_IOERR : BLK_STS_OK); } -static void mmc_blk_issue_discard_rq(struct mmc_queue *mq, struct request *req) +static void mmc_blk_issue_discard_rq(struct mmc_queue_req *mq_rq) { - struct mmc_blk_data *md = mq->blkdata; + struct request *req = mmc_queue_req_to_req(mq_rq); + struct mmc_blk_data *md = mq_rq->mq->blkdata; struct mmc_card *card = md->queue.card; unsigned int from, nr, arg; int err = 0, type = MMC_BLK_DISCARD; @@ -1310,10 +1311,10 @@ static void mmc_blk_issue_discard_rq(struct mmc_queue *mq, struct request *req) blk_end_request(req, status, blk_rq_bytes(req)); } -static void mmc_blk_issue_secdiscard_rq(struct mmc_queue *mq, - struct request *req) +static void mmc_blk_issue_secdiscard_rq(struct mmc_queue_req *mq_rq) { - struct mmc_blk_data *md = mq->blkdata; + struct request *req = mmc_queue_req_to_req(mq_rq); + struct mmc_blk_data *md = mq_rq->mq->blkdata; struct mmc_card *card = md->queue.card; unsigned int from, nr, arg; int err = 0, type = MMC_BLK_SECDISCARD; @@ -1380,14 +1381,15 @@ static void mmc_blk_issue_secdiscard_rq(struct mmc_queue *mq, blk_end_request(req, status, blk_rq_bytes(req)); } -static void mmc_blk_issue_flush(struct mmc_queue *mq, struct request *req) +static void mmc_blk_issue_flush(struct mmc_queue_req *mq_rq) { - struct mmc_blk_data *md = mq->blkdata; + struct mmc_blk_data *md = mq_rq->mq->blkdata; struct mmc_card *card = md->queue.card; int ret = 0; ret = mmc_flush_cache(card); - blk_end_request_all(req, ret ? BLK_STS_IOERR : BLK_STS_OK); + blk_end_request_all(mmc_queue_req_to_req(mq_rq), + ret ? 
BLK_STS_IOERR : BLK_STS_OK); } /* @@ -1698,18 +1700,18 @@ static void mmc_blk_data_prep(struct mmc_queue *mq, struct mmc_queue_req *mqrq, *do_data_tag_p = do_data_tag; } -static void mmc_blk_rw_rq_prep(struct mmc_queue_req *mqrq, - struct mmc_card *card, - bool disable_multi, - struct mmc_queue *mq) +static void mmc_blk_rw_rq_prep(struct mmc_queue_req *mq_rq, + bool disable_multi) { u32 readcmd, writecmd; - struct mmc_blk_request *brq = &mqrq->brq; - struct request *req = mmc_queue_req_to_req(mqrq); + struct mmc_queue *mq = mq_rq->mq; + struct mmc_card *card = mq->card; + struct mmc_blk_request *brq = &mq_rq->brq; + struct request *req = mmc_queue_req_to_req(mq_rq); struct mmc_blk_data *md = mq->blkdata; bool do_rel_wr, do_data_tag; - mmc_blk_data_prep(mq, mqrq, disable_multi, &do_rel_wr, &do_data_tag); + mmc_blk_data_prep(mq, mq_rq,
[PATCH 02/12 v5] mmc: core: add a workqueue for completing requests
As we want to complete requests autonomously from feeding the host with new requests, we create a workqueue to deal with this specifically in response to the callback from a host driver. This is necessary to exploit parallelism properly. This patch just adds the workqueue; later patches will make use of it. Signed-off-by: Linus Walleij --- ChangeLog v1->v5: - Rebasing on the "next" branch in the MMC tree. --- drivers/mmc/core/core.c | 9 + drivers/mmc/core/host.c | 1 - include/linux/mmc/host.h | 4 3 files changed, 13 insertions(+), 1 deletion(-) diff --git a/drivers/mmc/core/core.c b/drivers/mmc/core/core.c index e2366a82eebe..73ebee12e67b 100644 --- a/drivers/mmc/core/core.c +++ b/drivers/mmc/core/core.c @@ -2838,6 +2838,14 @@ void mmc_start_host(struct mmc_host *host) host->f_init = max(freqs[0], host->f_min); host->rescan_disable = 0; host->ios.power_mode = MMC_POWER_UNDEFINED; + /* Workqueue for completing requests */ + host->req_done_wq = alloc_workqueue("mmc%d-reqdone", + WQ_FREEZABLE | WQ_HIGHPRI | WQ_MEM_RECLAIM, + 0, host->index); + if (!host->req_done_wq) { + dev_err(mmc_dev(host), "could not allocate workqueue\n"); + return; + } if (!(host->caps2 & MMC_CAP2_NO_PRESCAN_POWERUP)) { mmc_claim_host(host); @@ -2859,6 +2867,7 @@ void mmc_stop_host(struct mmc_host *host) host->rescan_disable = 1; cancel_delayed_work_sync(&host->detect); + destroy_workqueue(host->req_done_wq); /* clear pm flags now and let card drivers set them as needed */ host->pm_flags = 0; diff --git a/drivers/mmc/core/host.c b/drivers/mmc/core/host.c index 35a9e4fd1a9f..88033294832f 100644 --- a/drivers/mmc/core/host.c +++ b/drivers/mmc/core/host.c @@ -390,7 +390,6 @@ struct mmc_host *mmc_alloc_host(int extra, struct device *dev) INIT_DELAYED_WORK(&host->detect, mmc_rescan); INIT_DELAYED_WORK(&host->sdio_irq_work, sdio_irq_work); setup_timer(&host->retune_timer, mmc_retune_timer, (unsigned long)host); - /* * By default, hosts do not support SGIO or large requests.
* They have to set these according to their abilities. diff --git a/include/linux/mmc/host.h b/include/linux/mmc/host.h index e7743eca1021..e4fa7058c288 100644 --- a/include/linux/mmc/host.h +++ b/include/linux/mmc/host.h @@ -13,6 +13,7 @@ #include #include #include +#include #include #include @@ -425,6 +426,9 @@ struct mmc_host { struct mmc_async_req*areq; /* active async req */ struct mmc_context_info context_info; /* async synchronization info */ + /* finalization workqueue, handles finalizing requests */ + struct workqueue_struct *req_done_wq; + /* Ongoing data transfer that allows commands during transfer */ struct mmc_request *ongoing_mrq; -- 2.13.6
[PATCH 01/12 v5] mmc: core: move the asynchronous post-processing
This moves the asynchronous post-processing of a request over to the finalization function. The patch has a slight semantic change: both places will be in the code path for if (host->areq) and in the same sequence, but before this patch, the next request was started before performing post-processing. The effect is that whereas before, the post- and preprocessing happened after starting the next request, now the preprocessing will happen after the request is done and before the next has started, which would cut half of the pre/post optimizations out. In the later patch named "mmc: core: replace waitqueue with worker" we move the finalization to a worker started by mmc_request_done(), and in the patch named "mmc: block: issue requests in massive parallel" we introduce a forked success/failure path that can quickly complete requests when they come back from the hardware. These two later patches together restore the same optimization, but in a more elegant manner that avoids the need to flush the two-stage pipeline with NULL, something we remove between these two patches in the commit named "mmc: queue: stop flushing the pipeline with NULL". Signed-off-by: Linus Walleij --- ChangeLog v1->v5: - Rebasing on the "next" branch in the MMC tree.
--- drivers/mmc/core/core.c | 7 +++ 1 file changed, 3 insertions(+), 4 deletions(-) diff --git a/drivers/mmc/core/core.c b/drivers/mmc/core/core.c index 1f0f44f4dd5f..e2366a82eebe 100644 --- a/drivers/mmc/core/core.c +++ b/drivers/mmc/core/core.c @@ -746,6 +746,9 @@ static enum mmc_blk_status mmc_finalize_areq(struct mmc_host *host) mmc_start_bkops(host->card, true); } + /* Successfully postprocess the old request at this point */ + mmc_post_req(host, host->areq->mrq, 0); + return status; } @@ -790,10 +793,6 @@ struct mmc_async_req *mmc_start_areq(struct mmc_host *host, if (status == MMC_BLK_SUCCESS && areq) start_err = __mmc_start_data_req(host, areq->mrq); - /* Postprocess the old request at this point */ - if (host->areq) - mmc_post_req(host, host->areq->mrq, 0); - /* Cancel a prepared request if it was not started. */ if ((status != MMC_BLK_SUCCESS || start_err) && areq) mmc_post_req(host, areq->mrq, -EINVAL); -- 2.13.6
[PATCH 07/12 v5] mmc: queue: simplify queue logic
The if() statement checking if there is no current or previous request is now just looking ahead at something that will be concluded a few lines below. Simplify the logic by moving the assignment of .asleep. Signed-off-by: Linus Walleij --- ChangeLog v1->v5: - Rebasing on the "next" branch in the MMC tree. --- drivers/mmc/core/queue.c | 9 + 1 file changed, 1 insertion(+), 8 deletions(-) diff --git a/drivers/mmc/core/queue.c b/drivers/mmc/core/queue.c index 2c232ba4e594..023bbddc1a0b 100644 --- a/drivers/mmc/core/queue.c +++ b/drivers/mmc/core/queue.c @@ -53,14 +53,6 @@ static int mmc_queue_thread(void *d) set_current_state(TASK_INTERRUPTIBLE); req = blk_fetch_request(q); mq->asleep = false; - if (!req) { - /* -* Dispatch queue is empty so set flags for -* mmc_request_fn() to wake us up. -*/ - if (!mq->qcnt) - mq->asleep = true; - } spin_unlock_irq(q->queue_lock); if (req || mq->qcnt) { @@ -68,6 +60,7 @@ mmc_blk_issue_rq(mq, req); cond_resched(); } else { + mq->asleep = true; if (kthread_should_stop()) { set_current_state(TASK_RUNNING); break; -- 2.13.6
[PATCH 05/12] mmc: core: do away with is_new_req
The host context member "is_new_req" is only assigned values, never checked. Delete it. Signed-off-by: Linus Walleij --- ChangeLog v1->v5: - Rebasing on the "next" branch in the MMC tree. --- drivers/mmc/core/core.c | 1 - drivers/mmc/core/queue.c | 5 - include/linux/mmc/host.h | 2 -- 3 files changed, 8 deletions(-) diff --git a/drivers/mmc/core/core.c b/drivers/mmc/core/core.c index 15a664d3c199..b1a5059f6cd1 100644 --- a/drivers/mmc/core/core.c +++ b/drivers/mmc/core/core.c @@ -3007,7 +3007,6 @@ void mmc_unregister_pm_notifier(struct mmc_host *host) */ void mmc_init_context_info(struct mmc_host *host) { - host->context_info.is_new_req = false; host->context_info.is_waiting_last_req = false; } diff --git a/drivers/mmc/core/queue.c b/drivers/mmc/core/queue.c index c46be4402803..4a0752ef6154 100644 --- a/drivers/mmc/core/queue.c +++ b/drivers/mmc/core/queue.c @@ -55,7 +55,6 @@ static int mmc_queue_thread(void *d) req = blk_fetch_request(q); mq->asleep = false; cntx->is_waiting_last_req = false; - cntx->is_new_req = false; if (!req) { /* * Dispatch queue is empty so set flags for @@ -109,10 +108,6 @@ static void mmc_request_fn(struct request_queue *q) cntx = &mq->card->host->context_info; - if (cntx->is_waiting_last_req) { - cntx->is_new_req = true; - } - if (mq->asleep) wake_up_process(mq->thread); } diff --git a/include/linux/mmc/host.h b/include/linux/mmc/host.h index d43d26562fae..36af19990683 100644 --- a/include/linux/mmc/host.h +++ b/include/linux/mmc/host.h @@ -241,11 +241,9 @@ struct mmc_slot { /** * mmc_context_info - synchronization details for mmc context - * @is_new_req wake up reason was new request * @is_waiting_last_req mmc context waiting for single running request */ struct mmc_context_info { - bool is_new_req; bool is_waiting_last_req; }; -- 2.13.6
[PATCH 04/12] mmc: core: do away with is_done_rcv
The "is_done_rcv" in the context info for the host is no longer needed: it is clear from context (ha!) that as long as we are waiting for the asynchronous request to come to completion, we are not done receiving data, and when the finalization work has run and completed the completion, we are indeed done. Signed-off-by: Linus Walleij --- ChangeLog v1->v5: - Rebasing on the "next" branch in the MMC tree. --- drivers/mmc/core/core.c | 40 include/linux/mmc/host.h | 2 -- 2 files changed, 16 insertions(+), 26 deletions(-) diff --git a/drivers/mmc/core/core.c b/drivers/mmc/core/core.c index 7440daa2f559..15a664d3c199 100644 --- a/drivers/mmc/core/core.c +++ b/drivers/mmc/core/core.c @@ -370,10 +370,8 @@ EXPORT_SYMBOL(mmc_start_request); static void mmc_wait_data_done(struct mmc_request *mrq) { struct mmc_host *host = mrq->host; - struct mmc_context_info *context_info = &host->context_info; struct mmc_async_req *areq = mrq->areq; - context_info->is_done_rcv = true; /* Schedule a work to deal with finalizing this request */ if (!areq) pr_err("areq of the data mmc_request was NULL!\n"); @@ -656,7 +654,7 @@ EXPORT_SYMBOL(mmc_cqe_recovery); bool mmc_is_req_done(struct mmc_host *host, struct mmc_request *mrq) { if (host->areq) - return host->context_info.is_done_rcv; + return completion_done(&host->areq->complete); else return completion_done(&mrq->completion); } @@ -705,29 +703,24 @@ void mmc_finalize_areq(struct work_struct *work) { struct mmc_async_req *areq = container_of(work, struct mmc_async_req, finalization_work); struct mmc_host *host = areq->host; - struct mmc_context_info *context_info = &host->context_info; enum mmc_blk_status status = MMC_BLK_SUCCESS; + struct mmc_command *cmd; - if (context_info->is_done_rcv) { - struct mmc_command *cmd; - - context_info->is_done_rcv = false; - cmd = areq->mrq->cmd; + cmd = areq->mrq->cmd; - if (!cmd->error || !cmd->retries || - mmc_card_removed(host->card)) { - status = areq->err_check(host->card, -areq); - } else { - mmc_retune_recheck(host); -
pr_info("%s: req failed (CMD%u): %d, retrying...\n", - mmc_hostname(host), - cmd->opcode, cmd->error); - cmd->retries--; - cmd->error = 0; - __mmc_start_request(host, areq->mrq); - return; /* wait for done/new event again */ - } + if (!cmd->error || !cmd->retries || + mmc_card_removed(host->card)) { + status = areq->err_check(host->card, +areq); + } else { + mmc_retune_recheck(host); + pr_info("%s: req failed (CMD%u): %d, retrying...\n", + mmc_hostname(host), + cmd->opcode, cmd->error); + cmd->retries--; + cmd->error = 0; + __mmc_start_request(host, areq->mrq); + return; /* wait for done/new event again */ } mmc_retune_release(host); @@ -3015,7 +3008,6 @@ void mmc_unregister_pm_notifier(struct mmc_host *host) void mmc_init_context_info(struct mmc_host *host) { host->context_info.is_new_req = false; - host->context_info.is_done_rcv = false; host->context_info.is_waiting_last_req = false; } diff --git a/include/linux/mmc/host.h b/include/linux/mmc/host.h index d2ff79a16839..d43d26562fae 100644 --- a/include/linux/mmc/host.h +++ b/include/linux/mmc/host.h @@ -241,12 +241,10 @@ struct mmc_slot { /** * mmc_context_info - synchronization details for mmc context - * @is_done_rcv wake up reason was done request * @is_new_req wake up reason was new request * @is_waiting_last_req mmc context waiting for single running request */ struct mmc_context_info { - bool is_done_rcv; bool is_new_req; bool is_waiting_last_req; }; -- 2.13.6
[PATCH 08/12 v5] mmc: block: shuffle retry and error handling
Instead of doing retries at the same time as trying to submit new requests, do the retries when the request is reported as completed by the driver, in the finalization worker. This is achieved by letting the core worker call back into the block layer through a callback in the asynchronous request, ->report_done_status(), which passes the status back to the block core so it can repeatedly try to hammer the request using a single request, retries etc., by calling back into the core layer using mmc_restart_areq(), which will just kick the same asynchronous request without waiting for a previous ongoing request. The beauty of it is that the completion will not complete until the block layer has had the opportunity to hammer a bit at the card using a bunch of different approaches that used to be in the while() loop in mmc_blk_rw_done(). The algorithm for recapture, retry and error handling is identical to the one we used to have in mmc_blk_issue_rw_rq(), only augmented to get called in another path from the core. We have to add and initialize a pointer back to the struct mmc_queue from the struct mmc_queue_req, to find the queue from the asynchronous request when reporting the status back to the core. Other users of the asynchronous request that do not need to retry or use the misc error handling fallbacks will work fine, since a NULL ->report_done_status() is just fine. This is currently only done by the test module. Signed-off-by: Linus Walleij --- ChangeLog v4->v5: - The "disable_multi" and "retry" variables used to be inside the do {} loop in the error handler; now that we restart the areq when there are problems, we need to make these part of the struct mmc_async_req and reinitialize them to false/zero when restarting an asynchronous request.
- Assign mrq->areq also when restarting asynchronous requests: the mrq is a quick-turnaround produce and consume object and only lives for one request to the host, so this needs to be assigned every time we made a new mrq and want to send it off to the host. - Switch "disable_multi" to be a bool as is appropriate. - Be more careful to assign NULL to host->areq when it is not in use, and make sure this only happens at one spot. - Rebasing on the "next" branch in the MMC tree. --- drivers/mmc/core/block.c | 347 --- drivers/mmc/core/core.c | 46 --- drivers/mmc/core/core.h | 1 + drivers/mmc/core/queue.c | 2 + drivers/mmc/core/queue.h | 1 + include/linux/mmc/host.h | 7 +- 6 files changed, 221 insertions(+), 183 deletions(-) diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c index 86ec87c17e71..2cda2f52058e 100644 --- a/drivers/mmc/core/block.c +++ b/drivers/mmc/core/block.c @@ -1575,7 +1575,7 @@ static enum mmc_blk_status mmc_blk_err_check(struct mmc_card *card, } static void mmc_blk_data_prep(struct mmc_queue *mq, struct mmc_queue_req *mqrq, - int disable_multi, bool *do_rel_wr_p, + bool disable_multi, bool *do_rel_wr_p, bool *do_data_tag_p) { struct mmc_blk_data *md = mq->blkdata; @@ -1700,7 +1700,7 @@ static void mmc_blk_data_prep(struct mmc_queue *mq, struct mmc_queue_req *mqrq, static void mmc_blk_rw_rq_prep(struct mmc_queue_req *mqrq, struct mmc_card *card, - int disable_multi, + bool disable_multi, struct mmc_queue *mq) { u32 readcmd, writecmd; @@ -1811,198 +1811,213 @@ static void mmc_blk_rw_cmd_abort(struct mmc_queue *mq, struct mmc_card *card, /** * mmc_blk_rw_try_restart() - tries to restart the current async request * @mq: the queue with the card and host to restart - * @req: a new request that want to be started after the current one + * @mqrq: the mmc_queue_request containing the areq to be restarted */ -static void mmc_blk_rw_try_restart(struct mmc_queue *mq, struct request *req, +static void mmc_blk_rw_try_restart(struct mmc_queue *mq, 
struct mmc_queue_req *mqrq) { - if (!req) - return; + struct mmc_async_req *areq = &mqrq->areq; + + /* Proceed and try to restart the current async request */ + mmc_blk_rw_rq_prep(mqrq, mq->card, 0, mq); + areq->disable_multi = false; + areq->retry = 0; + mmc_restart_areq(mq->card->host, areq); +} + +static void mmc_blk_rw_done(struct mmc_async_req *areq, enum mmc_blk_status status) +{ + struct mmc_queue *mq; + struct mmc_blk_data *md; + struct mmc_card *card; + struct mmc_host *host; + struct mmc_queue_req *mq_rq; + struct mmc_blk_request *brq; + struct request *old_req; + bool req_pending = true; + int type, retune_retry_done = 0; /* -* If the card was removed, just cancel everything and return. +* An
[PATCH V4] scsi_debugfs: fix crash in scsi_show_rq()
cmd->cmnd can be allocated and freed dynamically in the T10_PI_TYPE2_PROTECTION case, so we should check it in scsi_show_rq(): the request may already have been freed here, with cmd->cmnd set to NULL. We choose to accept the read-after-free and dump the request data as far as possible. This patch fixes the following kernel crash when dumping a request via block's debugfs interface:

[ 252.962045] BUG: unable to handle kernel NULL pointer dereference at (null)
[ 252.963007] IP: scsi_format_opcode_name+0x1a/0x1c0
[ 252.963007] PGD 25e75a067 P4D 25e75a067 PUD 25e75b067 PMD 0
[ 252.963007] Oops: [#1] PREEMPT SMP
[ 252.963007] Dumping ftrace buffer:
[ 252.963007](ftrace buffer empty)
[ 252.963007] Modules linked in: scsi_debug ebtable_filter ebtables ip6table_filter ip6_tables xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack libcrc32c bridge stp llc iptable_filter fuse ip_tables sd_mod sg mptsas mptscsih mptbase crc32c_intel ahci libahci nvme serio_raw scsi_transport_sas libata lpc_ich nvme_core virtio_scsi binfmt_misc dm_mod iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi null_blk configs
[ 252.963007] CPU: 1 PID: 1881 Comm: cat Not tainted 4.14.0-rc2.blk_mq_io_hang+ #516
[ 252.963007] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.9.3-1.fc25 04/01/2014
[ 252.963007] task: 88025e6f6000 task.stack: c90001bd
[ 252.963007] RIP: 0010:scsi_format_opcode_name+0x1a/0x1c0
[ 252.963007] RSP: 0018:c90001bd3c50 EFLAGS: 00010286
[ 252.963007] RAX: 4843 RBX: 0050 RCX:
[ 252.963007] RDX: RSI: 0050 RDI: c90001bd3cd8
[ 252.963007] RBP: c90001bd3c88 R08: 1000 R09:
[ 252.963007] R10: 880275134000 R11: 88027513406c R12: 0050
[ 252.963007] R13: c90001bd3cd8 R14: R15:
[ 252.963007] FS: 7f4d11762700() GS:88027fc4() knlGS:
[ 252.963007] CS: 0010 DS: ES: CR0: 80050033
[ 252.963007] CR2: CR3: 00025e789003 CR4: 003606e0
[ 252.963007] DR0: DR1: DR2:
[ 252.963007] DR3: DR6: fffe0ff0 DR7: 0400
[ 252.963007] Call Trace:
[ 252.963007] __scsi_format_command+0x27/0xc0
[ 252.963007] scsi_show_rq+0x5c/0xc0
[ 252.963007] ? seq_printf+0x4e/0x70
[ 252.963007] ? blk_flags_show+0x5b/0xf0
[ 252.963007] __blk_mq_debugfs_rq_show+0x116/0x130
[ 252.963007] blk_mq_debugfs_rq_show+0xe/0x10
[ 252.963007] seq_read+0xfe/0x3b0
[ 252.963007] ? __handle_mm_fault+0x631/0x1150
[ 252.963007] full_proxy_read+0x54/0x90
[ 252.963007] __vfs_read+0x37/0x160
[ 252.963007] ? security_file_permission+0x9b/0xc0
[ 252.963007] vfs_read+0x96/0x130
[ 252.963007] SyS_read+0x55/0xc0
[ 252.963007] entry_SYSCALL_64_fastpath+0x1a/0xa5
[ 252.963007] RIP: 0033:0x7f4d1127e9b0
[ 252.963007] RSP: 002b:7ffd27082568 EFLAGS: 0246 ORIG_RAX:
[ 252.963007] RAX: ffda RBX: 7f4d1154bb20 RCX: 7f4d1127e9b0
[ 252.963007] RDX: 0002 RSI: 7f4d115a7000 RDI: 0003
[ 252.963007] RBP: 00021010 R08: R09:
[ 252.963007] R10: 037b R11: 0246 R12: 00022000
[ 252.963007] R13: 7f4d1154bb78 R14: 1000 R15: 0002
[ 252.963007] Code: c6 e8 1b ca 24 00 eb 8c e8 74 2c ae ff 0f 1f 40 00 0f 1f 44 00 00 55 48 89 e5 41 56 41 55 41 54 53 49 89 fd 49 89 f4 48 83 ec 18 <44> 0f b6 32 48 c7 45 c8 00 00 00 00 65 48 8b 04 25 28 00 00 00
[ 252.963007] RIP: scsi_format_opcode_name+0x1a/0x1c0 RSP: c90001bd3c50
[ 252.963007] CR2:
[ 252.963007] ---[ end trace 83c5bddfbaa6573c ]---
[ 252.963007] Kernel panic - not syncing: Fatal exception
[ 252.963007] Dumping ftrace buffer:
[ 252.963007](ftrace buffer empty)
[ 252.963007] Kernel Offset: disabled
[ 252.963007] ---[ end Kernel panic - not syncing: Fatal exception

Fixes: 0eebd005dd07 (scsi: Implement blk_mq_ops.show_rq())
Cc: Bart Van Assche
Cc: Omar Sandoval
Cc: Martin K. Petersen
Cc: James Bottomley
Cc: Hannes Reinecke
Signed-off-by: Ming Lei
---
V2: - fix typo
V3: - prefer to dump data and accept read-after-free - add some comment
V4: - read the two variables into local variables first, to avoid a free between the check and passing them to __scsi_format_command().
drivers/scsi/scsi_debugfs.c | 14 +- 1 file changed, 13 insertions(+), 1 deletion(-) diff --git a/drivers/scsi/scsi_debugfs.c b/drivers/scsi/scsi_debugfs.c index 5e9755008aed..f8437c456458
Re: nvme multipath support V7
I've updated the git tree with the Kconfig type fix, the instance fixup from Keith (plus extensive comments explaining the issue) and the various reviews. I'll wait for the buildbot and do a little more manual testing, and will send it to Jens in the afternoon.
Re: [PATCH V13 05/10] mmc: cqhci: support for command queue enabled host
On Thu, Nov 9, 2017 at 1:55 PM, Adrian Hunter wrote: > On 09/11/17 14:26, Linus Walleij wrote: I think the above approach to put any CQE-specific callbacks directly into the struct mmc_host_ops is way more viable. >>> >>> Nothing to do with CQE. This is CQHCI. Please try to get the difference. >> >> I am trying, please try to think about your language. > > I strongly disapprove of being rude but sadly it seems to get results. Conflict is a natural part of human existence. Using it deliberately to get one's way is manipulation. Admitting to being manipulative is losing one's face in public. Yours, Linus Walleij
Re: [PATCH V13 08/10] mmc: block: blk-mq: Separate card polling from recovery
On Thu, Nov 9, 2017 at 2:02 PM, Adrian Hunter wrote: > On 09/11/17 14:52, Linus Walleij wrote: >> On Thu, Nov 9, 2017 at 8:56 AM, Adrian Hunter >> wrote: >>> On 08/11/17 11:30, Linus Walleij wrote: On Fri, Nov 3, 2017 at 2:20 PM, Adrian Hunter wrote: > Recovery is simpler to understand if it is only used for errors. Create a > separate function for card polling. > > Signed-off-by: Adrian Hunter This looks good but I can't see why it's not folded into patch 3 already. This error handling is introduced there. >>> >>> What are you on about? >> >> You are attacking your most valuable resource, a reviewer. >> >> And I even said the patch looks good. >> >> The only thing you attain with this kind of language is to alienate >> me and discourage others from reviewing your patch set. You also >> give your employer a bad name, since you are representing >> them. > > 6 months of being messed around will do that. Blessed are the meek, for they will inherit the earth. Nobody is after you, this is just hard stuff and too few people care to help out with review etc. With this and with the block layer as well. That makes things slow. It's no one's agenda to slow things down. Yours, Linus Walleij
Re: [PATCH] block: avoid null pointer dereference on null disk
Looks good, Reviewed-by: Christoph Hellwig
Re: [PATCH 5/7] nvme: also expose the namespace identification sysfs files for mpath nodes
On 11/09/2017 06:44 PM, Christoph Hellwig wrote: > We do this by adding a helper that returns the ns_head for a device that > can belong to either the per-controller or per-subsystem block device > nodes, and otherwise reuse all the existing code. > > Signed-off-by: Christoph Hellwig > Reviewed-by: Keith Busch > Reviewed-by: Sagi Grimberg > Reviewed-by: Johannes Thumshirn > --- > drivers/nvme/host/core.c | 57 > +++ > drivers/nvme/host/multipath.c | 6 + > drivers/nvme/host/nvme.h | 1 + > 3 files changed, 38 insertions(+), 26 deletions(-) > In principle, yes, but I would like to have some more info (like subsysnqn, and maybe 'model' etc.) in the subsys sysfs entry, too. But this can be handled in a later patch, so: Reviewed-by: Hannes Reinecke Cheers, Hannes -- Dr. Hannes Reinecke, Teamlead Storage & Networking h...@suse.de +49 911 74053 688 SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton HRB 21284 (AG Nürnberg)
Re: [PATCH V13 06/10] mmc: sdhci-pci: Add CQHCI support for Intel GLK
On Thu, Nov 9, 2017 at 8:12 AM, Adrian Hunter wrote: > On 08/11/17 11:24, Linus Walleij wrote: >> On Fri, Nov 3, 2017 at 2:20 PM, Adrian Hunter >> wrote: >> >>> Add CQHCI initialization and implement CQHCI operations for Intel GLK. >>> >>> Signed-off-by: Adrian Hunter >> >> This patch seems OK in context, but it merely illustrates the >> weirdness of .[runtime]_suspend/resume calling into CQE-specific >> APIs rather than using generic host callbacks. > > Your comment makes no sense at all. The host driver has > [runtime]_suspend/resume callbacks and it is up to the host driver to decide > what to do. CQHCI provides helpers since that is the whole point of having > a CQHCI library. Yeah, I didn't see it in context; the CQHCI library for these hosts makes a lot of sense. Yours, Linus Walleij