Re: [PATCH V4] scsi_debugfs: fix crash in scsi_show_rq()

2017-11-10 Thread Ming Lei
On Fri, Nov 10, 2017 at 08:51:58AM -0800, James Bottomley wrote:
> On Fri, 2017-11-10 at 17:01 +0800, Ming Lei wrote:
> > cmd->cmnd can be allocated/freed dynamically in case of
> > T10_PI_TYPE2_PROTECTION,
> > so we should check it in scsi_show_rq(): the request may already have
> > been freed at this point, with cmd->cmnd set to NULL.
> > 
> > We choose to accept read-after-free and dump request data as far as
> > possible.
> > 
> > This patch fixes the following kernel crash when dumping a request via
> > block's debugfs interface:
> > 
> > [  252.962045] BUG: unable to handle kernel NULL pointer dereference
> > at   (null)
> > [  252.963007] IP: scsi_format_opcode_name+0x1a/0x1c0
> > [  252.963007] PGD 25e75a067 P4D 25e75a067 PUD 25e75b067 PMD 0
> > [  252.963007] Oops:  [#1] PREEMPT SMP
> > [  252.963007] Dumping ftrace buffer:
> > [  252.963007](ftrace buffer empty)
> > [  252.963007] Modules linked in: scsi_debug ebtable_filter ebtables
> > ip6table_filter ip6_tables xt_CHECKSUM iptable_mangle ipt_MASQUERADE
> > nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4
> > nf_nat_ipv4 nf_nat nf_conntrack libcrc32c bridge stp llc
> > iptable_filter fuse ip_tables sd_mod sg mptsas mptscsih mptbase
> > crc32c_intel ahci libahci nvme serio_raw scsi_transport_sas libata
> > lpc_ich nvme_core virtio_scsi binfmt_misc dm_mod iscsi_tcp
> > libiscsi_tcp libiscsi scsi_transport_iscsi null_blk configs
> > [  252.963007] CPU: 1 PID: 1881 Comm: cat Not tainted 4.14.0-
> > rc2.blk_mq_io_hang+ #516
> > [  252.963007] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009),
> > BIOS 1.9.3-1.fc25 04/01/2014
> > [  252.963007] task: 88025e6f6000 task.stack: c90001bd
> > [  252.963007] RIP: 0010:scsi_format_opcode_name+0x1a/0x1c0
> > [  252.963007] RSP: 0018:c90001bd3c50 EFLAGS: 00010286
> > [  252.963007] RAX: 4843 RBX: 0050 RCX:
> > 
> > [  252.963007] RDX:  RSI: 0050 RDI:
> > c90001bd3cd8
> > [  252.963007] RBP: c90001bd3c88 R08: 1000 R09:
> > 
> > [  252.963007] R10: 880275134000 R11: 88027513406c R12:
> > 0050
> > [  252.963007] R13: c90001bd3cd8 R14:  R15:
> > 
> > [  252.963007] FS:  7f4d11762700() GS:88027fc4()
> > knlGS:
> > [  252.963007] CS:  0010 DS:  ES:  CR0: 80050033
> > [  252.963007] CR2:  CR3: 00025e789003 CR4:
> > 003606e0
> > [  252.963007] DR0:  DR1:  DR2:
> > 
> > [  252.963007] DR3:  DR6: fffe0ff0 DR7:
> > 0400
> > [  252.963007] Call Trace:
> > [  252.963007]  __scsi_format_command+0x27/0xc0
> > [  252.963007]  scsi_show_rq+0x5c/0xc0
> > [  252.963007]  ? seq_printf+0x4e/0x70
> > [  252.963007]  ? blk_flags_show+0x5b/0xf0
> > [  252.963007]  __blk_mq_debugfs_rq_show+0x116/0x130
> > [  252.963007]  blk_mq_debugfs_rq_show+0xe/0x10
> > [  252.963007]  seq_read+0xfe/0x3b0
> > [  252.963007]  ? __handle_mm_fault+0x631/0x1150
> > [  252.963007]  full_proxy_read+0x54/0x90
> > [  252.963007]  __vfs_read+0x37/0x160
> > [  252.963007]  ? security_file_permission+0x9b/0xc0
> > [  252.963007]  vfs_read+0x96/0x130
> > [  252.963007]  SyS_read+0x55/0xc0
> > [  252.963007]  entry_SYSCALL_64_fastpath+0x1a/0xa5
> > [  252.963007] RIP: 0033:0x7f4d1127e9b0
> > [  252.963007] RSP: 002b:7ffd27082568 EFLAGS: 0246 ORIG_RAX:
> > 
> > [  252.963007] RAX: ffda RBX: 7f4d1154bb20 RCX:
> > 7f4d1127e9b0
> > [  252.963007] RDX: 0002 RSI: 7f4d115a7000 RDI:
> > 0003
> > [  252.963007] RBP: 00021010 R08:  R09:
> > 
> > [  252.963007] R10: 037b R11: 0246 R12:
> > 00022000
> > [  252.963007] R13: 7f4d1154bb78 R14: 1000 R15:
> > 0002
> > [  252.963007] Code: c6 e8 1b ca 24 00 eb 8c e8 74 2c ae ff 0f 1f 40
> > 00 0f 1f 44 00 00 55 48 89 e5 41 56 41 55 41 54 53 49 89 fd 49 89 f4
> > 48 83 ec 18 <44> 0f b6 32 48 c7 45 c8 00 00 00 00 65 48 8b 04 25 28
> > 00 00 00
> > [  252.963007] RIP: scsi_format_opcode_name+0x1a/0x1c0 RSP:
> > c90001bd3c50
> > [  252.963007] CR2: 
> > [  252.963007] ---[ end trace 83c5bddfbaa6573c ]---
> > [  252.963007] Kernel panic - not syncing: Fatal exception
> > [  252.963007] Dumping ftrace buffer:
> > [  252.963007](ftrace buffer empty)
> > [  252.963007] Kernel Offset: disabled
> > [  252.963007] ---[ end Kernel panic - not syncing: Fatal exception
> > 
> > Fixes: 0eebd005dd07 ("scsi: Implement blk_mq_ops.show_rq()")
> > Cc: Bart Van Assche 
> > Cc: Omar Sandoval 
> > Cc: Martin K. Petersen 
> > Cc: James Bottomley 
> > Cc: Hannes Reinecke 

Re: [PATCH] blk-mq: respect queue dead via blk_mq_quiesce_queue

2017-11-10 Thread Ming Lei
On Fri, Nov 10, 2017 at 04:30:35PM +, Bart Van Assche wrote:
> On Sun, 2017-11-05 at 15:38 +, Bart Van Assche wrote:
> > On Sun, 2017-11-05 at 20:10 +0800, Ming Lei wrote:
> > > diff --git a/block/blk-core.c b/block/blk-core.c
> > > index 048be4aa6024..0b121f29e3b1 100644
> > > --- a/block/blk-core.c
> > > +++ b/block/blk-core.c
> > > @@ -658,6 +658,10 @@ void blk_cleanup_queue(struct request_queue *q)
> > >   queue_flag_set(QUEUE_FLAG_DEAD, q);
> > >   spin_unlock_irq(lock);
> > >  
> > > + /* respect queue DEAD via quiesce for blk-mq */
> > > + if (q->mq_ops)
> > > + blk_mq_quiesce_queue(q);
> > > +
> > >   /* for synchronous bio-based driver finish in-flight integrity i/o */
> > >   blk_flush_integrity();
> > 
> > Have you considered to change the blk_freeze_queue_start() call in
> > blk_set_queue_dying() into a blk_freeze_queue() call? That approach has the
> > advantage that no new if (q->mq_ops) test has to be introduced.
> 
> There is an additional reason why I think it would be better to make
> blk_set_queue_dying() wait until requests that are in progress have finished.
> Some block drivers, e.g. the mtip driver, start cleaning up resources
> immediately after blk_set_queue_dying() has returned. So making
> blk_set_queue_dying() wait would fix a race condition in at least the mtip
> driver.

It looks like you haven't explained how your approach fixes this issue.

IMO, your approach may not be able to fix this issue.

1) the core idea:
when blk_freeze_queue() returns, there can still be an in-progress
dispatch (in-progress I/O submission). So once blk_freeze_queue() returns,
blk_cleanup_queue() goes ahead, and the in-progress dispatch may trigger a
use-after-free, or even corrupt memory.

For detailed reason, please see:

https://marc.info/?l=linux-block&m=150993988115872&w=2

2) suppose you replace blk_freeze_queue_start() with blk_freeze_queue()
in blk_set_queue_dying(); that changes nothing about this issue.

You may think the blk_freeze_queue() called in blk_cleanup_queue() implies
a synchronize_rcu(), which may drain the in-progress dispatch. But the
in-progress dispatch (I/O submission) may not have entered the RCU read-side
critical section yet at that time. For example, in one I/O submission path
the request has just been added to the scheduler queue, and that request is
dispatched to the LLD via another run queue and completed before the actual
run queue from the current I/O submission. So the implied synchronize_rcu()
won't help this case at all.
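
To make the window concrete, the interleaving described above looks roughly
like this (an illustrative timeline only, not kernel code; the exact call
sites are assumptions):

    CPU0 (I/O submission)               CPU1 (blk_cleanup_queue())
    --------------------------------    --------------------------------
    request added to scheduler queue
    (the request is dispatched to the
     LLD via another run queue and
     completes there)
                                        blk_freeze_queue() returns
                                        (no request appears pending)
                                        teardown frees queue resources
    rcu_read_lock()
    actual run queue touches freed      <- use-after-free: the implied
    data                                   synchronize_rcu() finished
    rcu_read_unlock()                      before this read-side critical
                                           section began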

Or do I miss your idea?

-- 
Ming


Re: [PATCH] blk-mq: only run the hardware queue if IO is pending

2017-11-10 Thread Jens Axboe
On 11/10/2017 03:28 PM, Omar Sandoval wrote:
> On Fri, Nov 10, 2017 at 09:12:18AM -0700, Jens Axboe wrote:
>> Currently we are inconsistent in when we decide to run the queue. Using
>> blk_mq_run_hw_queues() we check if the hctx has pending IO before
>> running it, but we don't do that from the individual queue run function,
>> blk_mq_run_hw_queue(). This results in a lot of extra and pointless
>> queue runs, potentially, on flush requests and (much worse) on tag
>> starvation situations. This is observable just looking at the top
>> output, with lots of kworkers active. For the !async runs, it just adds
>> to the CPU overhead of blk-mq.
>>
>> Move the has-pending check into the run function instead of having
>> callers do it.
>>
>> Signed-off-by: Jens Axboe 
>>
>> ---
>>
>> diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
>> index 6f4bdb8209f7..c117bd8fd1f6 100644
>> --- a/block/blk-mq-sched.c
>> +++ b/block/blk-mq-sched.c
>> @@ -81,12 +81,7 @@ static bool blk_mq_sched_restart_hctx(struct blk_mq_hw_ctx *hctx)
>>  } else
>>  clear_bit(BLK_MQ_S_SCHED_RESTART, &hctx->state);
>>  
>> -if (blk_mq_hctx_has_pending(hctx)) {
>> -blk_mq_run_hw_queue(hctx, true);
>> -return true;
>> -}
>> -
>> -return false;
>> +return blk_mq_run_hw_queue(hctx, true);
>>  }
>>  
>>  /*
>> diff --git a/block/blk-mq.c b/block/blk-mq.c
>> index bfe24a5b62a3..a2a4271f5ab8 100644
>> --- a/block/blk-mq.c
>> +++ b/block/blk-mq.c
>> @@ -61,10 +61,10 @@ static int blk_mq_poll_stats_bkt(const struct request *rq)
>>  /*
>>   * Check if any of the ctx's have pending work in this hardware queue
>>   */
>> -bool blk_mq_hctx_has_pending(struct blk_mq_hw_ctx *hctx)
>> +static bool blk_mq_hctx_has_pending(struct blk_mq_hw_ctx *hctx)
>>  {
>> -return sbitmap_any_bit_set(&hctx->ctx_map) ||
>> -!list_empty_careful(&hctx->dispatch) ||
>> +return !list_empty_careful(&hctx->dispatch) ||
>> +sbitmap_any_bit_set(&hctx->ctx_map) ||
>>  blk_mq_sched_has_work(hctx);
>>  }
>>  
>> @@ -1253,9 +1253,14 @@ void blk_mq_delay_run_hw_queue(struct blk_mq_hw_ctx *hctx, unsigned long msecs)
>>  }
>>  EXPORT_SYMBOL(blk_mq_delay_run_hw_queue);
>>  
>> -void blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx, bool async)
>> +bool blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx, bool async)
>>  {
>> -__blk_mq_delay_run_hw_queue(hctx, async, 0);
>> +if (blk_mq_hctx_has_pending(hctx)) {
>> +__blk_mq_delay_run_hw_queue(hctx, async, 0);
>> +return true;
>> +}
>> +
>> +return false;
>>  }
>>  EXPORT_SYMBOL(blk_mq_run_hw_queue);
>>  
>> @@ -1265,8 +1270,7 @@ void blk_mq_run_hw_queues(struct request_queue *q, bool async)
>>  int i;
>>  
>>  queue_for_each_hw_ctx(q, hctx, i) {
>> -if (!blk_mq_hctx_has_pending(hctx) ||
>> -blk_mq_hctx_stopped(hctx))
>> +if (blk_mq_hctx_stopped(hctx))
>>  continue;
>>  
>>  blk_mq_run_hw_queue(hctx, async);
> 
> Minor cosmetic point, I find this double-negative thing confusing,
> how about:
> 
>   queue_for_each_hw_ctx(q, hctx, i) {
>   if (!blk_mq_hctx_stopped(hctx))
>   blk_mq_run_hw_queue(hctx, async);

Really? It's an easier read for me - if stopped, continue. Then run after that.
!stopped seems like more of a double negative :-)

> Other than that,
> 
> Reviewed-by: Omar Sandoval 

Thanks for the review!

-- 
Jens Axboe



Re: [PATCH] blk-mq: only run the hardware queue if IO is pending

2017-11-10 Thread Omar Sandoval
On Fri, Nov 10, 2017 at 09:12:18AM -0700, Jens Axboe wrote:
> Currently we are inconsistent in when we decide to run the queue. Using
> blk_mq_run_hw_queues() we check if the hctx has pending IO before
> running it, but we don't do that from the individual queue run function,
> blk_mq_run_hw_queue(). This results in a lot of extra and pointless
> queue runs, potentially, on flush requests and (much worse) on tag
> starvation situations. This is observable just looking at the top
> output, with lots of kworkers active. For the !async runs, it just adds
> to the CPU overhead of blk-mq.
> 
> Move the has-pending check into the run function instead of having
> callers do it.
> 
> Signed-off-by: Jens Axboe 
> 
> ---
> 
> diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
> index 6f4bdb8209f7..c117bd8fd1f6 100644
> --- a/block/blk-mq-sched.c
> +++ b/block/blk-mq-sched.c
> @@ -81,12 +81,7 @@ static bool blk_mq_sched_restart_hctx(struct blk_mq_hw_ctx *hctx)
>   } else
>   clear_bit(BLK_MQ_S_SCHED_RESTART, &hctx->state);
>  
> - if (blk_mq_hctx_has_pending(hctx)) {
> - blk_mq_run_hw_queue(hctx, true);
> - return true;
> - }
> -
> - return false;
> + return blk_mq_run_hw_queue(hctx, true);
>  }
>  
>  /*
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index bfe24a5b62a3..a2a4271f5ab8 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -61,10 +61,10 @@ static int blk_mq_poll_stats_bkt(const struct request *rq)
>  /*
>   * Check if any of the ctx's have pending work in this hardware queue
>   */
> -bool blk_mq_hctx_has_pending(struct blk_mq_hw_ctx *hctx)
> +static bool blk_mq_hctx_has_pending(struct blk_mq_hw_ctx *hctx)
>  {
> - return sbitmap_any_bit_set(&hctx->ctx_map) ||
> - !list_empty_careful(&hctx->dispatch) ||
> + return !list_empty_careful(&hctx->dispatch) ||
> + sbitmap_any_bit_set(&hctx->ctx_map) ||
>   blk_mq_sched_has_work(hctx);
>  }
>  
> @@ -1253,9 +1253,14 @@ void blk_mq_delay_run_hw_queue(struct blk_mq_hw_ctx *hctx, unsigned long msecs)
>  }
>  EXPORT_SYMBOL(blk_mq_delay_run_hw_queue);
>  
> -void blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx, bool async)
> +bool blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx, bool async)
>  {
> - __blk_mq_delay_run_hw_queue(hctx, async, 0);
> + if (blk_mq_hctx_has_pending(hctx)) {
> + __blk_mq_delay_run_hw_queue(hctx, async, 0);
> + return true;
> + }
> +
> + return false;
>  }
>  EXPORT_SYMBOL(blk_mq_run_hw_queue);
>  
> @@ -1265,8 +1270,7 @@ void blk_mq_run_hw_queues(struct request_queue *q, bool async)
>   int i;
>  
>   queue_for_each_hw_ctx(q, hctx, i) {
> - if (!blk_mq_hctx_has_pending(hctx) ||
> - blk_mq_hctx_stopped(hctx))
> + if (blk_mq_hctx_stopped(hctx))
>   continue;
>  
>   blk_mq_run_hw_queue(hctx, async);

Minor cosmetic point, I find this double-negative thing confusing,
how about:

queue_for_each_hw_ctx(q, hctx, i) {
if (!blk_mq_hctx_stopped(hctx))
blk_mq_run_hw_queue(hctx, async);

Other than that,

Reviewed-by: Omar Sandoval 


Re: [PATCH] blk-mq: improve tag waiting setup for non-shared tags

2017-11-10 Thread Omar Sandoval
On Thu, Nov 09, 2017 at 04:00:09PM -0700, Jens Axboe wrote:
> If we run out of driver tags, we currently treat shared and non-shared
> tags the same - both cases hook into the tag waitqueue. This is a bit
> more costly than it needs to be on unshared tags, since we have to both
> grab the hctx lock, and the waitqueue lock (and disable interrupts).
> For the non-shared case, we can simply mark the queue as needing a
> restart.
> 
> Split blk_mq_dispatch_wait_add() to account for both cases, and
> rename it to blk_mq_mark_tag_wait() to better reflect what it
> does now.
> 
> Without this patch, shared and non-shared performance is about the same
> with 4 fio thread hammering on a single null_blk device (~410K, at 75%
> sys). With the patch, the shared case is the same, but the non-shared
> tags case runs at 431K at 71% sys.
> 
> Signed-off-by: Jens Axboe 

The best of both worlds.

Reviewed-by: Omar Sandoval 
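
Since the patch body is not quoted here, the split described above roughly
takes the following shape (a sketch of the idea only, not the verbatim
patch; the shared-tag waitqueue branch is elided):

	static bool blk_mq_mark_tag_wait(struct blk_mq_hw_ctx **hctx,
					 struct request *rq)
	{
		struct blk_mq_hw_ctx *this_hctx = *hctx;

		if (!(this_hctx->flags & BLK_MQ_F_TAG_SHARED)) {
			/*
			 * Non-shared tags: marking the queue as needing a
			 * restart is enough - no hctx lock, no waitqueue
			 * lock, no interrupt disabling.
			 */
			set_bit(BLK_MQ_S_SCHED_RESTART, &this_hctx->state);
			return false;
		}

		/*
		 * Shared tags: hook into the tag waitqueue as before,
		 * taking both the hctx lock and the waitqueue lock
		 * (elided here).
		 */
		return false;
	}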


Re: [PATCH v3 2/8] crypto: scompress - use sgl_alloc() and sgl_free()

2017-11-10 Thread Ard Biesheuvel
On 10 November 2017 at 16:43, Bart Van Assche  wrote:
> On Fri, 2017-11-03 at 15:23 -0700, Bart Van Assche wrote:
>> Use the sgl_alloc() and sgl_free() functions instead of open coding
>> these functions.
>>
>> Signed-off-by: Bart Van Assche 
>> Cc: Ard Biesheuvel 
>> Cc: Herbert Xu 
>
> Hello Ard and Herbert,
>
> Would it be possible for one of you to share your opinion about this patch?
>

Acked-by: Ard Biesheuvel 


Re: [PATCH] brd: remove unused brd_mutex

2017-11-10 Thread Jens Axboe
On 11/10/2017 10:29 AM, Mikulas Patocka wrote:
> Remove unused mutex brd_mutex. It is unused since the commit ff26956875c2
> ("brd: remove support for BLKFLSBUF").

Thanks, applied.

-- 
Jens Axboe



Re: [GIT PULL] nvme updates for Linux 4.15

2017-11-10 Thread Jens Axboe
On 11/10/2017 10:38 AM, Jens Axboe wrote:
> On 11/10/2017 10:33 AM, Christoph Hellwig wrote:
>> On Fri, Nov 10, 2017 at 10:27:24AM -0700, Jens Axboe wrote:
>>> That makes for a bit of an awkward merge, why wasn't this fixed up
>>> in your tree?
>>
>> Because you asked me to always base on for-4.15/block last time?
> 
> That's not what I meant. It's conflicting because of a patch, that's
> fine. But the code in your tree is:
> 
> if (a == &dev_attr_uuid.attr) {
> if (uuid_is_null(&ids->uuid) ||
> !memchr_inv(ids->nguid, 0, sizeof(ids->nguid)))
> return 0;
> }
> 
> and you're saying the right resolution is:
> 
> if (a == &dev_attr_uuid.attr) {
> if (uuid_is_null(&ids->uuid) ||
> !memchr_inv(ids->nguid, 0, sizeof(ids->nguid)))
> return 0;
> }

This one should have had the && instead of course...
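
i.e. presumably:

	if (a == &dev_attr_uuid.attr) {
		if (uuid_is_null(&ids->uuid) &&
		    !memchr_inv(ids->nguid, 0, sizeof(ids->nguid)))
			return 0;
	}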

-- 
Jens Axboe



Re: [GIT PULL] nvme updates for Linux 4.15

2017-11-10 Thread Jens Axboe
On 11/10/2017 10:33 AM, Christoph Hellwig wrote:
> On Fri, Nov 10, 2017 at 10:27:24AM -0700, Jens Axboe wrote:
>> That makes for a bit of an awkward merge, why wasn't this fixed up
>> in your tree?
> 
> Because you asked me to always base on for-4.15/block last time?

That's not what I meant. It's conflicting because of a patch, that's
fine. But the code in your tree is:

if (a == &dev_attr_uuid.attr) {
if (uuid_is_null(&ids->uuid) ||
!memchr_inv(ids->nguid, 0, sizeof(ids->nguid)))
return 0;
}

and you're saying the right resolution is:

if (a == &dev_attr_uuid.attr) {
if (uuid_is_null(&ids->uuid) ||
!memchr_inv(ids->nguid, 0, sizeof(ids->nguid)))
return 0;
}

My point is that this should have been fixed up in your tree. Yes, it'd
still cause a conflict, that's fine, but at least the merge would not
end up also containing a fix for the above code. That part is not
logical.

-- 
Jens Axboe



Re: [PATCH] blk-mq: only run the hardware queue if IO is pending

2017-11-10 Thread Jens Axboe
On 11/10/2017 10:34 AM, Christoph Hellwig wrote:
> On Fri, Nov 10, 2017 at 10:31:29AM -0700, Jens Axboe wrote:
>> On 11/10/2017 10:29 AM, Christoph Hellwig wrote:
>>> On Fri, Nov 10, 2017 at 09:12:18AM -0700, Jens Axboe wrote:
 Currently we are inconsistent in when we decide to run the queue. Using
 blk_mq_run_hw_queues() we check if the hctx has pending IO before
 running it, but we don't do that from the individual queue run function,
 blk_mq_run_hw_queue(). This results in a lot of extra and pointless
 queue runs, potentially, on flush requests and (much worse) on tag
 starvation situations. This is observable just looking at the top
 output, with lots of kworkers active. For the !async runs, it just adds
 to the CPU overhead of blk-mq.

 Move the has-pending check into the run function instead of having
 callers do it.

 Signed-off-by: Jens Axboe 
>>>
>>> Do we even still need the blk_mq_hctx_has_pending helper at all?
>>
>> We do, otherwise we end up running the queue for cases where we don't
>> need to. The caller doesn't always know if it's necessary. And even
>> if the caller can figure it out, it'd add complicated logic for some
>> of those cases (like flush). It's easier/cleaner and more efficient
>> to keep it.
> 
> I didn't mean the logic - just wondering if we need the separate
> helper given it has a single caller left.
> 
> But it's really just a cosmetic point.

Ah, that makes more sense. Since the function isn't necessarily that
straightforward if open coded (to read, I mean), I'd prefer to keep
it separate. But yeah, cosmetic, can always be revisited.

-- 
Jens Axboe



Re: [PATCH] blk-mq: only run the hardware queue if IO is pending

2017-11-10 Thread Christoph Hellwig
On Fri, Nov 10, 2017 at 10:31:29AM -0700, Jens Axboe wrote:
> On 11/10/2017 10:29 AM, Christoph Hellwig wrote:
> > On Fri, Nov 10, 2017 at 09:12:18AM -0700, Jens Axboe wrote:
> >> Currently we are inconsistent in when we decide to run the queue. Using
> >> blk_mq_run_hw_queues() we check if the hctx has pending IO before
> >> running it, but we don't do that from the individual queue run function,
> >> blk_mq_run_hw_queue(). This results in a lot of extra and pointless
> >> queue runs, potentially, on flush requests and (much worse) on tag
> >> starvation situations. This is observable just looking at the top
> >> output, with lots of kworkers active. For the !async runs, it just adds
> >> to the CPU overhead of blk-mq.
> >>
> >> Move the has-pending check into the run function instead of having
> >> callers do it.
> >>
> >> Signed-off-by: Jens Axboe 
> > 
> > Do we even still need the blk_mq_hctx_has_pending helper at all?
> 
> We do, otherwise we end up running the queue for cases where we don't
> need to. The caller doesn't always know if it's necessary. And even
> if the caller can figure it out, it'd add complicated logic for some
> of those cases (like flush). It's easier/cleaner and more efficient
> to keep it.

I didn't mean the logic - just wondering if we need the separate
helper given it has a single caller left.

But it's really just a cosmetic point.


Re: [GIT PULL] nvme updates for Linux 4.15

2017-11-10 Thread Christoph Hellwig
On Fri, Nov 10, 2017 at 10:27:24AM -0700, Jens Axboe wrote:
> That makes for a bit of an awkward merge, why wasn't this fixed up
> in your tree?

Because you asked me to always base on for-4.15/block last time?


Re: [PATCH] blk-mq: only run the hardware queue if IO is pending

2017-11-10 Thread Jens Axboe
On 11/10/2017 10:29 AM, Christoph Hellwig wrote:
> On Fri, Nov 10, 2017 at 09:12:18AM -0700, Jens Axboe wrote:
>> Currently we are inconsistent in when we decide to run the queue. Using
>> blk_mq_run_hw_queues() we check if the hctx has pending IO before
>> running it, but we don't do that from the individual queue run function,
>> blk_mq_run_hw_queue(). This results in a lot of extra and pointless
>> queue runs, potentially, on flush requests and (much worse) on tag
>> starvation situations. This is observable just looking at the top
>> output, with lots of kworkers active. For the !async runs, it just adds
>> to the CPU overhead of blk-mq.
>>
>> Move the has-pending check into the run function instead of having
>> callers do it.
>>
>> Signed-off-by: Jens Axboe 
> 
> Do we even still need the blk_mq_hctx_has_pending helper at all?

We do, otherwise we end up running the queue for cases where we don't
need to. The caller doesn't always know if it's necessary. And even
if the caller can figure it out, it'd add complicated logic for some
of those cases (like flush). It's easier/cleaner and more efficient
to keep it.

-- 
Jens Axboe



Re: [PATCH] blk-mq: only run the hardware queue if IO is pending

2017-11-10 Thread Christoph Hellwig
On Fri, Nov 10, 2017 at 09:12:18AM -0700, Jens Axboe wrote:
> Currently we are inconsistent in when we decide to run the queue. Using
> blk_mq_run_hw_queues() we check if the hctx has pending IO before
> running it, but we don't do that from the individual queue run function,
> blk_mq_run_hw_queue(). This results in a lot of extra and pointless
> queue runs, potentially, on flush requests and (much worse) on tag
> starvation situations. This is observable just looking at the top
> output, with lots of kworkers active. For the !async runs, it just adds
> to the CPU overhead of blk-mq.
> 
> Move the has-pending check into the run function instead of having
> callers do it.
> 
> Signed-off-by: Jens Axboe 

Do we even still need the blk_mq_hctx_has_pending helper at all?

Except for that this looks fine to me:

Reviewed-by: Christoph Hellwig 



[PATCH] brd: remove unused brd_mutex

2017-11-10 Thread Mikulas Patocka
Remove unused mutex brd_mutex. It is unused since the commit ff26956875c2
("brd: remove support for BLKFLSBUF").

Signed-off-by: Mikulas Patocka 

---
 drivers/block/brd.c |1 -
 1 file changed, 1 deletion(-)

Index: linux-2.6/drivers/block/brd.c
===================================================================
--- linux-2.6.orig/drivers/block/brd.c
+++ linux-2.6/drivers/block/brd.c
@@ -60,7 +60,6 @@ struct brd_device {
 /*
  * Look up and return a brd's page for a given sector.
  */
-static DEFINE_MUTEX(brd_mutex);
 static struct page *brd_lookup_page(struct brd_device *brd, sector_t sector)
 {
pgoff_t idx;


Re: [GIT PULL] nvme updates for Linux 4.15

2017-11-10 Thread Jens Axboe
On 11/10/2017 10:22 AM, Christoph Hellwig wrote:
> On Fri, Nov 10, 2017 at 08:22:32AM -0700, Jens Axboe wrote:
>> Actually, double checking, one of them is a little suspicious:
>>
>> if (a == &dev_attr_uuid.attr) {
>> <<<<<<< HEAD
>> if (uuid_is_null(&ns->uuid) &&
>> !memchr_inv(ns->nguid, 0, sizeof(ns->nguid)))
>> =======
>> if (uuid_is_null(&ids->uuid) ||
>> !memchr_inv(ids->nguid, 0, sizeof(ids->nguid)))
>> >>>>>>> 34a9690712aa386fb5fa9e3b8fb44a22f5f2aec6
>> return 0;
>> }
>>
>> with the previous copy using is_null && memchr_inv, the change to ids
>> makes it an OR instead. I'm going with the latter, but thought I'd bring
>> it up.
> 
> The && from master is the right one, it comes from:
> 
> commit 007a61ae2f35c7fcf767313285c4924e81f11983
> Author: Martin Wilck 
> Date:   Thu Sep 28 21:33:23 2017 +0200
> 
> nvme: fix visibility of "uuid" ns attribute
> 
> and needs to be logically applied to the new code as well.

That makes for a bit of an awkward merge, why wasn't this fixed up
in your tree?

In any case, I just fixed it up in my for-next.

-- 
Jens Axboe



Re: [GIT PULL] nvme updates for Linux 4.15

2017-11-10 Thread Christoph Hellwig
On Fri, Nov 10, 2017 at 08:22:32AM -0700, Jens Axboe wrote:
> Actually, double checking, one of them is a little suspicious:
> 
> if (a == &dev_attr_uuid.attr) {
> <<<<<<< HEAD
> if (uuid_is_null(&ns->uuid) &&
> !memchr_inv(ns->nguid, 0, sizeof(ns->nguid)))
> =======
> if (uuid_is_null(&ids->uuid) ||
> !memchr_inv(ids->nguid, 0, sizeof(ids->nguid)))
> >>>>>>> 34a9690712aa386fb5fa9e3b8fb44a22f5f2aec6
> return 0;
> }
> 
> with the previous copy using is_null && memchr_inv, the change to ids
> makes it an OR instead. I'm going with the latter, but thought I'd bring
> it up.

The && from master is the right one, it comes from:

commit 007a61ae2f35c7fcf767313285c4924e81f11983
Author: Martin Wilck 
Date:   Thu Sep 28 21:33:23 2017 +0200

nvme: fix visibility of "uuid" ns attribute

and needs to be logically applied to the new code as well.


Re: [PATCH V4] scsi_debugfs: fix crash in scsi_show_rq()

2017-11-10 Thread James Bottomley
On Fri, 2017-11-10 at 17:01 +0800, Ming Lei wrote:
> cmd->cmnd can be allocated/freed dynamically in case of
> T10_PI_TYPE2_PROTECTION,
> so we should check it in scsi_show_rq(): the request may already have
> been freed at this point, with cmd->cmnd set to NULL.
> 
> We choose to accept read-after-free and dump request data as far as
> possible.
> 
> This patch fixes the following kernel crash when dumping a request via
> block's debugfs interface:
> 
> [  252.962045] BUG: unable to handle kernel NULL pointer dereference
> at   (null)
> [  252.963007] IP: scsi_format_opcode_name+0x1a/0x1c0
> [  252.963007] PGD 25e75a067 P4D 25e75a067 PUD 25e75b067 PMD 0
> [  252.963007] Oops:  [#1] PREEMPT SMP
> [  252.963007] Dumping ftrace buffer:
> [  252.963007](ftrace buffer empty)
> [  252.963007] Modules linked in: scsi_debug ebtable_filter ebtables
> ip6table_filter ip6_tables xt_CHECKSUM iptable_mangle ipt_MASQUERADE
> nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4
> nf_nat_ipv4 nf_nat nf_conntrack libcrc32c bridge stp llc
> iptable_filter fuse ip_tables sd_mod sg mptsas mptscsih mptbase
> crc32c_intel ahci libahci nvme serio_raw scsi_transport_sas libata
> lpc_ich nvme_core virtio_scsi binfmt_misc dm_mod iscsi_tcp
> libiscsi_tcp libiscsi scsi_transport_iscsi null_blk configs
> [  252.963007] CPU: 1 PID: 1881 Comm: cat Not tainted 4.14.0-
> rc2.blk_mq_io_hang+ #516
> [  252.963007] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009),
> BIOS 1.9.3-1.fc25 04/01/2014
> [  252.963007] task: 88025e6f6000 task.stack: c90001bd
> [  252.963007] RIP: 0010:scsi_format_opcode_name+0x1a/0x1c0
> [  252.963007] RSP: 0018:c90001bd3c50 EFLAGS: 00010286
> [  252.963007] RAX: 4843 RBX: 0050 RCX:
> 
> [  252.963007] RDX:  RSI: 0050 RDI:
> c90001bd3cd8
> [  252.963007] RBP: c90001bd3c88 R08: 1000 R09:
> 
> [  252.963007] R10: 880275134000 R11: 88027513406c R12:
> 0050
> [  252.963007] R13: c90001bd3cd8 R14:  R15:
> 
> [  252.963007] FS:  7f4d11762700() GS:88027fc4()
> knlGS:
> [  252.963007] CS:  0010 DS:  ES:  CR0: 80050033
> [  252.963007] CR2:  CR3: 00025e789003 CR4:
> 003606e0
> [  252.963007] DR0:  DR1:  DR2:
> 
> [  252.963007] DR3:  DR6: fffe0ff0 DR7:
> 0400
> [  252.963007] Call Trace:
> [  252.963007]  __scsi_format_command+0x27/0xc0
> [  252.963007]  scsi_show_rq+0x5c/0xc0
> [  252.963007]  ? seq_printf+0x4e/0x70
> [  252.963007]  ? blk_flags_show+0x5b/0xf0
> [  252.963007]  __blk_mq_debugfs_rq_show+0x116/0x130
> [  252.963007]  blk_mq_debugfs_rq_show+0xe/0x10
> [  252.963007]  seq_read+0xfe/0x3b0
> [  252.963007]  ? __handle_mm_fault+0x631/0x1150
> [  252.963007]  full_proxy_read+0x54/0x90
> [  252.963007]  __vfs_read+0x37/0x160
> [  252.963007]  ? security_file_permission+0x9b/0xc0
> [  252.963007]  vfs_read+0x96/0x130
> [  252.963007]  SyS_read+0x55/0xc0
> [  252.963007]  entry_SYSCALL_64_fastpath+0x1a/0xa5
> [  252.963007] RIP: 0033:0x7f4d1127e9b0
> [  252.963007] RSP: 002b:7ffd27082568 EFLAGS: 0246 ORIG_RAX:
> 
> [  252.963007] RAX: ffda RBX: 7f4d1154bb20 RCX:
> 7f4d1127e9b0
> [  252.963007] RDX: 0002 RSI: 7f4d115a7000 RDI:
> 0003
> [  252.963007] RBP: 00021010 R08:  R09:
> 
> [  252.963007] R10: 037b R11: 0246 R12:
> 00022000
> [  252.963007] R13: 7f4d1154bb78 R14: 1000 R15:
> 0002
> [  252.963007] Code: c6 e8 1b ca 24 00 eb 8c e8 74 2c ae ff 0f 1f 40
> 00 0f 1f 44 00 00 55 48 89 e5 41 56 41 55 41 54 53 49 89 fd 49 89 f4
> 48 83 ec 18 <44> 0f b6 32 48 c7 45 c8 00 00 00 00 65 48 8b 04 25 28
> 00 00 00
> [  252.963007] RIP: scsi_format_opcode_name+0x1a/0x1c0 RSP:
> c90001bd3c50
> [  252.963007] CR2: 
> [  252.963007] ---[ end trace 83c5bddfbaa6573c ]---
> [  252.963007] Kernel panic - not syncing: Fatal exception
> [  252.963007] Dumping ftrace buffer:
> [  252.963007](ftrace buffer empty)
> [  252.963007] Kernel Offset: disabled
> [  252.963007] ---[ end Kernel panic - not syncing: Fatal exception
> 
> Fixes: 0eebd005dd07 ("scsi: Implement blk_mq_ops.show_rq()")
> Cc: Bart Van Assche 
> Cc: Omar Sandoval 
> Cc: Martin K. Petersen 
> Cc: James Bottomley 
> Cc: Hannes Reinecke 
> Signed-off-by: Ming Lei 
> ---
> V2:
>    - fix typo
> V3:
>    - prefer to dump data and accept read-after-free
>    - add some comment
> V4:
>    - read the two variables into local variables first, to avoid a
>   free between 

Re: [PATCH v3 2/8] crypto: scompress - use sgl_alloc() and sgl_free()

2017-11-10 Thread Bart Van Assche
On Fri, 2017-11-03 at 15:23 -0700, Bart Van Assche wrote:
> Use the sgl_alloc() and sgl_free() functions instead of open coding
> these functions.
> 
> Signed-off-by: Bart Van Assche 
> Cc: Ard Biesheuvel 
> Cc: Herbert Xu 

Hello Ard and Herbert,

Would it be possible for one of you to share your opinion about this patch?

Thanks,

Bart.

Re: [PATCH] blk-mq: respect queue dead via blk_mq_quiesce_queue

2017-11-10 Thread Bart Van Assche
On Sun, 2017-11-05 at 15:38 +, Bart Van Assche wrote:
> On Sun, 2017-11-05 at 20:10 +0800, Ming Lei wrote:
> > diff --git a/block/blk-core.c b/block/blk-core.c
> > index 048be4aa6024..0b121f29e3b1 100644
> > --- a/block/blk-core.c
> > +++ b/block/blk-core.c
> > @@ -658,6 +658,10 @@ void blk_cleanup_queue(struct request_queue *q)
> > queue_flag_set(QUEUE_FLAG_DEAD, q);
> > spin_unlock_irq(lock);
> >  
> > +   /* respect queue DEAD via quiesce for blk-mq */
> > +   if (q->mq_ops)
> > +   blk_mq_quiesce_queue(q);
> > +
> > /* for synchronous bio-based driver finish in-flight integrity i/o */
> > blk_flush_integrity();
> 
> Have you considered to change the blk_freeze_queue_start() call in
> blk_set_queue_dying() into a blk_freeze_queue() call? That approach has the
> advantage that no new if (q->mq_ops) test has to be introduced.

There is an additional reason why I think it would be better to make
blk_set_queue_dying() wait until requests that are in progress have finished.
Some block drivers, e.g. the mtip driver, start cleaning up resources
immediately after blk_set_queue_dying() has returned. So making
blk_set_queue_dying() wait would fix a race condition in at least the mtip
driver.
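
For reference, the behavioral difference between the two calls is roughly
this (a sketch of the semantics, see block/blk-mq.c):

	blk_freeze_queue_start(q); /* kills q_usage_counter and returns
				      without waiting */
	blk_freeze_queue(q);	   /* blk_freeze_queue_start() followed by
				      blk_mq_freeze_queue_wait(), i.e. waits
				      until all in-flight requests finish */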

Bart.

[PATCH] blk-mq: only run the hardware queue if IO is pending

2017-11-10 Thread Jens Axboe
Currently we are inconsistent in when we decide to run the queue. Using
blk_mq_run_hw_queues() we check if the hctx has pending IO before
running it, but we don't do that from the individual queue run function,
blk_mq_run_hw_queue(). This results in a lot of extra and pointless
queue runs, potentially, on flush requests and (much worse) on tag
starvation situations. This is observable just looking at the top
output, with lots of kworkers active. For the !async runs, it just adds
to the CPU overhead of blk-mq.

Move the has-pending check into the run function instead of having
callers do it.

Signed-off-by: Jens Axboe 

---

diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index 6f4bdb8209f7..c117bd8fd1f6 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -81,12 +81,7 @@ static bool blk_mq_sched_restart_hctx(struct blk_mq_hw_ctx *hctx)
} else
clear_bit(BLK_MQ_S_SCHED_RESTART, &hctx->state);
 
-   if (blk_mq_hctx_has_pending(hctx)) {
-   blk_mq_run_hw_queue(hctx, true);
-   return true;
-   }
-
-   return false;
+   return blk_mq_run_hw_queue(hctx, true);
 }
 
 /*
diff --git a/block/blk-mq.c b/block/blk-mq.c
index bfe24a5b62a3..a2a4271f5ab8 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -61,10 +61,10 @@ static int blk_mq_poll_stats_bkt(const struct request *rq)
 /*
  * Check if any of the ctx's have pending work in this hardware queue
  */
-bool blk_mq_hctx_has_pending(struct blk_mq_hw_ctx *hctx)
+static bool blk_mq_hctx_has_pending(struct blk_mq_hw_ctx *hctx)
 {
-   return sbitmap_any_bit_set(&hctx->ctx_map) ||
-   !list_empty_careful(&hctx->dispatch) ||
+   return !list_empty_careful(&hctx->dispatch) ||
+   sbitmap_any_bit_set(&hctx->ctx_map) ||
blk_mq_sched_has_work(hctx);
 }
 
@@ -1253,9 +1253,14 @@ void blk_mq_delay_run_hw_queue(struct blk_mq_hw_ctx *hctx, unsigned long msecs)
 }
 EXPORT_SYMBOL(blk_mq_delay_run_hw_queue);
 
-void blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx, bool async)
+bool blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx, bool async)
 {
-   __blk_mq_delay_run_hw_queue(hctx, async, 0);
+   if (blk_mq_hctx_has_pending(hctx)) {
+   __blk_mq_delay_run_hw_queue(hctx, async, 0);
+   return true;
+   }
+
+   return false;
 }
 EXPORT_SYMBOL(blk_mq_run_hw_queue);
 
@@ -1265,8 +1270,7 @@ void blk_mq_run_hw_queues(struct request_queue *q, bool async)
int i;
 
queue_for_each_hw_ctx(q, hctx, i) {
-   if (!blk_mq_hctx_has_pending(hctx) ||
-   blk_mq_hctx_stopped(hctx))
+   if (blk_mq_hctx_stopped(hctx))
continue;
 
blk_mq_run_hw_queue(hctx, async);
diff --git a/block/blk-mq.h b/block/blk-mq.h
index 99a19c5523e2..dcf379a892dd 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -26,14 +26,12 @@ struct blk_mq_ctx {
struct kobject  kobj;
 } cacheline_aligned_in_smp;
 
-void blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx, bool async);
 void blk_mq_freeze_queue(struct request_queue *q);
 void blk_mq_free_queue(struct request_queue *q);
 int blk_mq_update_nr_requests(struct request_queue *q, unsigned int nr);
 void blk_mq_wake_waiters(struct request_queue *q);
 bool blk_mq_dispatch_rq_list(struct request_queue *, struct list_head *, bool);
void blk_mq_flush_busy_ctxs(struct blk_mq_hw_ctx *hctx, struct list_head *list);
-bool blk_mq_hctx_has_pending(struct blk_mq_hw_ctx *hctx);
 bool blk_mq_get_driver_tag(struct request *rq, struct blk_mq_hw_ctx **hctx,
bool wait);
 struct request *blk_mq_dequeue_from_ctx(struct blk_mq_hw_ctx *hctx,
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index b326208277ee..eb1e2cdffb31 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -266,7 +266,7 @@ void blk_mq_start_stopped_hw_queues(struct request_queue *q, bool async);
 void blk_mq_quiesce_queue(struct request_queue *q);
 void blk_mq_unquiesce_queue(struct request_queue *q);
void blk_mq_delay_run_hw_queue(struct blk_mq_hw_ctx *hctx, unsigned long msecs);
-void blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx, bool async);
+bool blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx, bool async);
 void blk_mq_run_hw_queues(struct request_queue *q, bool async);
 void blk_mq_delay_queue(struct blk_mq_hw_ctx *hctx, unsigned long msecs);
 void blk_mq_tagset_busy_iter(struct blk_mq_tag_set *tagset,

-- 
Jens Axboe



Re: [PATCH] ide: ide-atapi: fix compile error with defining macro DEBUG

2017-11-10 Thread Jens Axboe
On 11/10/2017 12:59 AM, Hongxu Jia wrote:
> Compile ide-atapi failed with defining macro "DEBUG"
> ...
> |drivers/ide/ide-atapi.c:285:52: error: 'struct request' has
> no member named 'cmd'; did you mean 'csd'?
> |  debug_log("%s: rq->cmd[0]: 0x%x\n", __func__, rq->cmd[0]);
> ...
> 
> Since we split the scsi_request out of struct request, we missed
> doing the same thing in debug_log.
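
The scsi_request split moved the command buffer out of struct request, so
the fix is presumably along these lines (a sketch, not the applied patch):

	-  debug_log("%s: rq->cmd[0]: 0x%x\n", __func__, rq->cmd[0]);
	+  debug_log("%s: rq->cmd[0]: 0x%x\n", __func__, scsi_req(rq)->cmd[0]);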

Thanks, applied.

-- 
Jens Axboe



Re: [PATCH 00/12 v5] Multiqueue for MMC/SD

2017-11-10 Thread Ulf Hansson
On 10 November 2017 at 11:01, Linus Walleij  wrote:
> This is the fifth iteration of this patch set.
>
> I *HOPE* that we can scrap this patch set and merge Adrian's
> patches instead, because they also bring CQE support which is
> nice. I had some review comments on his series, mainly that
> it needs to kill off the legacy block layer code path that
> noone likes anyway.
>
> So this is mainly an academic and inspirational exercise.
> Whatever remains of this refactoring, if anything, I can
> certainly do on top of Adrian's patches as well.
>
> What changed since v4 is the error path, since Adrian pointed
> out that the error handling seems to be fragile. It was indeed
> fragile... To make sure things work properly I have run long
> test rounds with fault injection, essentially:

Please correct me if I am wrong: the issues were observed already at
patch 11, before the actual switch to mq was done, right?

>
> Enable FAULT_INJECTION, FAULT_INJECTION_DEBUG_FS,
>FAIL_MMC_REQUEST
> cd /debug/mmc3/fail_mmc_request/
> echo 1 > probability
> echo -1 > times
>
> Then running a dd to the card, also increased the error rate
> to 10% and completed tests successfully, but at this error
> rate the MMC stack sometimes exceeds the retry limit and the
> dd command fails (as is appropriate).

That's great. I really appreciate that you have run these tests; that
gives me good confidence from an overall point of view.

>
> Removing a card during I/O does not work well however :/
> So I guess I would need to work on that if this series should
> continue. (Hopefully unlikely.)

Yeah, this has actually been rather cumbersome to deal with also in
the legacy request path. Let's dive into this in more detail as soon
as possible.

>
>
> Linus Walleij (12):
>   mmc: core: move the asynchronous post-processing
>   mmc: core: add a workqueue for completing requests
>   mmc: core: replace waitqueue with worker
>   mmc: core: do away with is_done_rcv
>   mmc: core: do away with is_new_req
>   mmc: core: kill off the context info
>   mmc: queue: simplify queue logic
>   mmc: block: shuffle retry and error handling
>   mmc: queue: stop flushing the pipeline with NULL
>   mmc: queue/block: pass around struct mmc_queue_req*s
>   mmc: block: issue requests in massive parallel
>   mmc: switch MMC/SD to use blk-mq multiqueueing v5
>
>  drivers/mmc/core/block.c| 557 +++-
>  drivers/mmc/core/block.h|   5 +-
>  drivers/mmc/core/bus.c  |   1 -
>  drivers/mmc/core/core.c | 217 ++---
>  drivers/mmc/core/core.h |  11 +-
>  drivers/mmc/core/host.c |   1 -
>  drivers/mmc/core/mmc_test.c |  31 +--
>  drivers/mmc/core/queue.c| 252 
>  drivers/mmc/core/queue.h|  16 +-
>  include/linux/mmc/core.h|   3 +-
>  include/linux/mmc/host.h|  31 +--
>  11 files changed, 557 insertions(+), 568 deletions(-)

First, I haven't yet commented on the latest version of the mq patch
(patch12 in this series) and neither on Adrians (patch 3 in his
series), but before doing that, let me share my overall view of how we
could move forward, as to see if all of us can agree on that path.

So, what I really like in the $subject series is the step-by-step
method: moving slowly forward enables an easy review, and then the
actual switch to mq gets a diff of "3 files changed, 156 insertions(+),
181 deletions(-)". This shows me that it can be done! Great work!
Of course, I do realize that you may not have considered all
preparations needed for CQE, which Adrian may have thought of in his
mq patch from his series (patch 3), but still.

Moreover, for reasons brought up while reviewing Adrian's series,
regarding whether mq is "ready", and because I see that the diff for patch
12 is small, I suggest that we just skip the step adding a Kconfig
option to allow an opt-in of the mq path. In other words, *the* patch
that makes the switch to mq, should also remove the entire left over
of rubbish code, from the legacy request path. That's also what you do
in patch12, nice!

Finally, I understand that you would be happy to scrap this series
and instead let Adrian's series, when re-posted, go first. Could you
perhaps re-consider that? I wonder if it may not be smoother and less
risky to actually apply everything up to patch 11 in this series.

I noticed that you reported issues with card removal during I/O (for
both yours and Adrian's mq patch), but do those problems exist at
patch 11 - or are they explicitly introduced with the mq patch (patch
12)?

Of course, I realize that if we apply everything up to patch 11, that
would require a massive re-base of Adrian's mq/CQE series, but on the
other hand, no matter which mq patch we decide to go with, it should
be a rather small diff, thus easy to review and less risky.

Adrian, Linus - what do you think?

Kind regards
Uffe


Re: [GIT PULL] nvme updates for Linux 4.15

2017-11-10 Thread Jens Axboe
On 11/10/2017 08:18 AM, Jens Axboe wrote:
> On 11/10/2017 07:34 AM, Christoph Hellwig wrote:
>> Hi Jens,
>>
>> another round of NVMe updates for 4.15.
>>
>> The biggest items are:
>>
>>  - native multipath support (me + Hannes)
>>  - userspace notifications for AENs (Keith)
>>  - obey the command effects log page (Keith)
>>
>> in addition to that lots of fixes from a wide variety of people.
> 
> Pulled. It had two conflicts with master, but both pretty trivial.

Actually, double checking, one of them is a little suspicious:

if (a == &dev_attr_uuid.attr) {
<<<<<<< HEAD
if (uuid_is_null(&ns->uuid) &&
!memchr_inv(ns->nguid, 0, sizeof(ns->nguid)))
=======
if (uuid_is_null(&ids->uuid) ||
!memchr_inv(ids->nguid, 0, sizeof(ids->nguid)))
>>>>>>> 34a9690712aa386fb5fa9e3b8fb44a22f5f2aec6
return 0;
}

with the previous copy using is_null && memchr_inv, the change to ids
makes it an OR instead. I'm going with the latter, but thought I'd bring
it up.

-- 
Jens Axboe



Re: [GIT PULL] nvme updates for Linux 4.15

2017-11-10 Thread Jens Axboe
On 11/10/2017 07:34 AM, Christoph Hellwig wrote:
> Hi Jens,
> 
> another round of NVMe updates for 4.15.
> 
> The biggest items are:
> 
>  - native multipath support (me + Hannes)
>  - userspace notifications for AENs (Keith)
>  - obey the command effects log page (Keith)
> 
> in addition to that lots of fixes from a wide variety of people.

Pulled. It had two conflicts with master, but both pretty trivial.

-- 
Jens Axboe



[GIT PULL] nvme updates for Linux 4.15

2017-11-10 Thread Christoph Hellwig
Hi Jens,

another round of NVMe updates for 4.15.

The biggest items are:

 - native multipath support (me + Hannes)
 - userspace notifications for AENs (Keith)
 - obey the command effects log page (Keith)

in addition to that lots of fixes from a wide variety of people.

The following changes since commit 7d1dff993252366cc5e21e07b80072d323628363:

  blktrace: fix unlocked registration of tracepoints (2017-11-07 09:42:10 -0700)

are available in the git repository at:

  git://git.infradead.org/nvme.git nvme-4.15

for you to fetch changes up to b119d90862842a055ff78ca54953d430da314808:

  nvme: expose subsys attribute to sysfs (2017-11-10 12:16:19 +0100)


Christoph Hellwig (14):
  nvme: move the dying queue check from cancel to completion
  nvme: always unregister the integrity profile in __nvme_revalidate_disk
  nvme: don't pass struct nvme_ns to nvme_init_integrity
  nvme: don't pass struct nvme_ns to nvme_config_discard
  nvme: set the chunk size before freezing the queue
  nvme: split __nvme_revalidate_disk
  nvme: fix and clarify the check for missing metadata
  nvmet: better data length validation
  nvmet: kill nvmet_inline_bio_init
  nvme: track subsystems
  nvme: introduce a nvme_ns_ids structure
  nvme: track shared namespaces
  nvme: implement multipath access to nvme subsystems
  nvme: also expose the namespace identification sysfs files for mpath nodes

Hannes Reinecke (3):
  block: create 'slaves' and 'holders' entries for hidden gendisks
  nvme: create 'slaves' and 'holders' entries for hidden controllers
  nvme: expose subsys attribute to sysfs

Israel Rukshin (1):
  nvmet-rdma: update queue list during ib_device removal

James Smart (3):
  nvme-fc: fix localport resume using stale values
  nvme-fc: decouple ns references from lldd references
  lpfc: tie in to new dev_loss_tmo interface in nvme transport

Javier González (2):
  nvme: compare NQN string with right size
  nvme: fix eui_show() print format

Keith Busch (7):
  nvme: factor get log into a helper
  nvme: check admin passthru command effects
  nvme: centralize AEN defines
  nvme-fc: remove unused "queue_size" field
  nvme: remove handling of multiple AEN requests
  nvme: unexport starting async event work
  nvme: send uevent for some asynchronous events

Max Gurtovoy (1):
  nvme-rdma: fix nvme_rdma_create_queue_ib error flow

Ming Lei (1):
  nvme-pci: avoid dereference of symbol from unloaded module

Minwoo Im (1):
  nvmet: fix comment typos in admin-cmd.c

Sagi Grimberg (2):
  nvmet: remove redundant memset if failed to get_smart_log failed
  nvmet: remove redundant local variable

 block/genhd.c   |   14 +-
 drivers/nvme/host/Kconfig   |9 +
 drivers/nvme/host/Makefile  |1 +
 drivers/nvme/host/core.c| 1050 ++-
 drivers/nvme/host/fabrics.c |4 +-
 drivers/nvme/host/fc.c  |  143 --
 drivers/nvme/host/lightnvm.c|   14 +-
 drivers/nvme/host/multipath.c   |  291 +++
 drivers/nvme/host/nvme.h|  138 -
 drivers/nvme/host/pci.c |   21 +-
 drivers/nvme/host/rdma.c|   21 +-
 drivers/nvme/target/admin-cmd.c |   21 +-
 drivers/nvme/target/core.c  |   10 +
 drivers/nvme/target/fc.c|   32 +-
 drivers/nvme/target/io-cmd.c|   18 +-
 drivers/nvme/target/loop.c  |   19 +-
 drivers/nvme/target/nvmet.h |4 +
 drivers/nvme/target/rdma.c  |   16 +-
 drivers/scsi/lpfc/lpfc_attr.c   |5 +
 include/linux/nvme.h|   30 ++
 20 files changed, 1456 insertions(+), 405 deletions(-)
 create mode 100644 drivers/nvme/host/multipath.c


Re: [PATCH 00/12 v5] Multiqueue for MMC/SD

2017-11-10 Thread Linus Walleij
On Fri, Nov 10, 2017 at 11:01 AM, Linus Walleij
 wrote:

> Removing a card during I/O does not work well however :/
> So I guess I would need to work on that if this series should
> continue. (Hopefully unlikely.)

I tested a bit more and it turns out this doesn't work on any of
the MQ patch sets.

Which matches Christoph's statement that this is not really
working. I haven't really analyzed why, though; I can see that
the kernel crashes firmly into a brick wall, but on mainline
it does not.

I think it's just something we have to smoke out in the next
release cycle as we switch to MQ.

Yours,
Linus Walleij


Re: [PATCH V13 03/10] mmc: block: Add blk-mq support

2017-11-10 Thread Linus Walleij
Following up on this:

On Thu, Nov 9, 2017 at 4:52 PM, Linus Walleij  wrote:

>> No one has ever suggested that the legacy API will remain.  Once blk-mq is
>> ready the old code gets deleted.
>
> The block layer maintainers most definitely think MQ is ready
> but you seem to disagree. Why?
>
> In the recent LWN article from kernel recepies:
> https://lwn.net/Articles/735275/
> the only obstacle I can see is a mention that SCSI was not
> switched over by default because of some problems with
> slow rotating media. "This change had to be backed out
> because of some performance and scalability issues that
> arose for rotating storage."
>
> Christoph mentions that he's gonna try again for v4.14.
> But it's true, I do not see that happening in linux-next yet.

Neil Brown's article on LWN points to Ming Lei's patch
set:
https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1513023.html

as addressing the issue that held back blk-MQ from
being the default for SCSI, and it is reportedly
queued for v4.15. Since this MMC/SD rework is
also targeted for v4.15 I think we can assume it is
pretty much ready for everything, and delete the
non-MQ block path.

I just haven't seen any of this problem in my tests
with MMC/SD so I do not think it would be affected,
but it anyways seem to be fixed.

OK maybe I am especially optimistic. But it's not just
based on that.

Yours,
Linus Walleij


[PATCH 11/12 v5] mmc: block: issue requests in massive parallel

2017-11-10 Thread Linus Walleij
This makes a crucial change to the issuing mechanism for the
MMC requests:

Before commit "mmc: core: move the asynchronous post-processing"
some parallelism on the read/write requests was achieved by
speculatively postprocessing a request and re-preprocess and
re-issue the request if something went wrong, which we discover
later when checking for an error.

This is kind of ugly. Instead we need a mechanism like here:

We issue requests, and when they come back from the hardware,
we know if they finished successfully or not. If the request
was successful, we complete the asynchronous request and let a
new request immediately start on the hardware. If, and only if,
it returned an error from the hardware we go down the error
path.

This is achieved by splitting the work path from the hardware
in two: a successful path ending up calling down to
mmc_blk_rw_done() and completing quickly, and an errorpath
calling down to mmc_blk_rw_done_error().

This has a profound effect: we reintroduce the parallelism on
the successful path as mmc_post_req() can now be called in
while the next request is in transit (just like prior to
commit "mmc: core: move the asynchronous post-processing")
and blk_end_request() is called while the next request is
already on the hardware.

The latter has the profound effect of issuing a new request
again so that we actually may have three requests
in transit at the same time: one on the hardware, one being
prepared (such as DMA flushing) and one being prepared for
issuing next by the block layer. This shows up when we
transit to multiqueue, where this can be exploited.
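
Conceptually, the resulting pipeline looks like this (an illustration, not
text from the patch):

	request N   : on the hardware, data transfer in progress
	request N+1 : being prepared (e.g. DMA mapping/cache flushing)
	request N+2 : being issued next by the block layer, kicked off by
		      the blk_end_request() call for an earlier request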

Signed-off-by: Linus Walleij 
---
ChangeLog v4->v5:
- Fixes on the errorpath: when a request reports error back,
  keep the areq on the host as it is not yet finished
  (no assigning NULL to host->areq), do not postprocess
  the request or complete it. This will happen eventually
  when the request succeeds.
- When restarting the command, use mmc_restart_areq() as
  could be expected.
- Augmend the .report_done_status() callback to return a
  bool indicating whether the areq is now finished or not,
  to handle the error case where we eventually give up on
  the request and have returned an error to the block
  layer.
- Make sure to post-process the request on the error path
  and pre-process it again when resending an asynchronous
  request. This satisfies the host's semantic expectation
  that every request will be in pre->req->post sequence
  even if there are errors.
- To assure the ordering of pre/post-processing, we need
  to post-process any prepared request with -EINVAL if
  there is an error, then re-preprocess it again after
  error recovery. To this end a helper pointer in
  host->pending_areq is added so the error path can
  act on this.
- Rebasing on the "next" branch in the MMC tree.
---
 drivers/mmc/core/block.c | 98 
 drivers/mmc/core/core.c  | 58 +++-
 include/linux/mmc/host.h |  4 +-
 3 files changed, 117 insertions(+), 43 deletions(-)

diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c
index 2cd9fe5a8c9b..e3ae7241b2eb 100644
--- a/drivers/mmc/core/block.c
+++ b/drivers/mmc/core/block.c
@@ -1827,7 +1827,8 @@ static void mmc_blk_rw_try_restart(struct mmc_queue_req *mq_rq)
mmc_restart_areq(mq->card->host, areq);
 }
 
-static void mmc_blk_rw_done(struct mmc_async_req *areq, enum mmc_blk_status status)
+static bool mmc_blk_rw_done_error(struct mmc_async_req *areq,
+ enum mmc_blk_status status)
 {
struct mmc_queue *mq;
struct mmc_blk_data *md;
@@ -1835,7 +1836,7 @@ static void mmc_blk_rw_done(struct mmc_async_req *areq, enum mmc_blk_status stat
struct mmc_host *host;
struct mmc_queue_req *mq_rq;
struct mmc_blk_request *brq;
-   struct request *old_req;
+   struct request *req;
bool req_pending = true;
int type, retune_retry_done = 0;
 
@@ -1849,42 +1850,27 @@ static void mmc_blk_rw_done(struct mmc_async_req *areq, enum mmc_blk_status stat
card = mq->card;
host = card->host;
brq = &mq_rq->brq;
-   old_req = mmc_queue_req_to_req(mq_rq);
-   type = rq_data_dir(old_req) == READ ? MMC_BLK_READ : MMC_BLK_WRITE;
+   req = mmc_queue_req_to_req(mq_rq);
+   type = rq_data_dir(req) == READ ? MMC_BLK_READ : MMC_BLK_WRITE;
 
switch (status) {
-   case MMC_BLK_SUCCESS:
case MMC_BLK_PARTIAL:
-   /*
-* A block was successfully transferred.
-*/
+   /* This should trigger a retransmit */
mmc_blk_reset_success(md, type);
-   req_pending = blk_end_request(old_req, BLK_STS_OK,
+   req_pending = blk_end_request(req, BLK_STS_OK,
  brq->data.bytes_xfered);
-   /*
-* If the blk_end_request function 

[PATCH 09/12 v5] mmc: queue: stop flushing the pipeline with NULL

2017-11-10 Thread Linus Walleij
Remove all the pipeline flush: i.e. repeatedly sending NULL
down to the core layer to flush out asynchronous requests,
and also sending NULL after "special" commands to achieve the
same flush.

Instead: let the "special" commands wait for any ongoing
asynchronous transfers using the completion, and apart from
that expect the core.c and block.c layers to deal with the
ongoing requests autonomously without any "push" from the
queue.

Add a function in the core to wait for an asynchronous request
to complete.

Update the tests to use the new function prototypes.

This kills off some FIXMEs, such as getting rid of the mq->qcnt
queue depth variable that was introduced a while back.

It is a vital step toward multiqueue enablement that we stop
pulling NULL off the end of the request queue to flush the
asynchronous issuing mechanism.

Signed-off-by: Linus Walleij 
---
ChangeLog v1->v5:
- Rebasing on the "next" branch in the MMC tree.
---
 drivers/mmc/core/block.c| 173 
 drivers/mmc/core/core.c |  50 +++--
 drivers/mmc/core/core.h |   6 +-
 drivers/mmc/core/mmc_test.c |  31 ++--
 drivers/mmc/core/queue.c|  11 ++-
 drivers/mmc/core/queue.h|   7 --
 6 files changed, 108 insertions(+), 170 deletions(-)

diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c
index 2cda2f52058e..c7a57006e27f 100644
--- a/drivers/mmc/core/block.c
+++ b/drivers/mmc/core/block.c
@@ -1805,7 +1805,6 @@ static void mmc_blk_rw_cmd_abort(struct mmc_queue *mq, struct mmc_card *card,
if (mmc_card_removed(card))
req->rq_flags |= RQF_QUIET;
while (blk_end_request(req, BLK_STS_IOERR, blk_rq_cur_bytes(req)));
-   mq->qcnt--;
 }
 
 /**
@@ -1877,13 +1876,10 @@ static void mmc_blk_rw_done(struct mmc_async_req *areq, enum mmc_blk_status stat
if (mmc_blk_reset(md, card->host, type)) {
if (req_pending)
mmc_blk_rw_cmd_abort(mq, card, old_req, mq_rq);
-   else
-   mq->qcnt--;
mmc_blk_rw_try_restart(mq, mq_rq);
return;
}
if (!req_pending) {
-   mq->qcnt--;
mmc_blk_rw_try_restart(mq, mq_rq);
return;
}
@@ -1927,7 +1923,6 @@ static void mmc_blk_rw_done(struct mmc_async_req *areq, enum mmc_blk_status stat
req_pending = blk_end_request(old_req, BLK_STS_IOERR,
  brq->data.blksz);
if (!req_pending) {
-   mq->qcnt--;
mmc_blk_rw_try_restart(mq, mq_rq);
return;
}
@@ -1951,26 +1946,16 @@ static void mmc_blk_rw_done(struct mmc_async_req *areq, enum mmc_blk_status stat
 */
mmc_blk_rw_rq_prep(mq_rq, card,
areq->disable_multi, mq);
-   mmc_start_areq(card->host, areq, NULL);
+   mmc_start_areq(card->host, areq);
mq_rq->brq.retune_retry_done = retune_retry_done;
-   } else {
-   /* Else, this request is done */
-   mq->qcnt--;
}
 }
 
 static void mmc_blk_issue_rw_rq(struct mmc_queue *mq, struct request *new_req)
 {
-   enum mmc_blk_status status;
-   struct mmc_async_req *new_areq;
-   struct mmc_async_req *old_areq;
struct mmc_card *card = mq->card;
-
-   if (new_req)
-   mq->qcnt++;
-
-   if (!mq->qcnt)
-   return;
+   struct mmc_queue_req *mqrq_cur = req_to_mmc_queue_req(new_req);
+   struct mmc_async_req *areq = &mqrq_cur->areq;
 
/*
 * If the card was removed, just cancel everything and return.
@@ -1978,46 +1963,26 @@ static void mmc_blk_issue_rw_rq(struct mmc_queue *mq, struct request *new_req)
if (mmc_card_removed(card)) {
new_req->rq_flags |= RQF_QUIET;
blk_end_request_all(new_req, BLK_STS_IOERR);
-   mq->qcnt--; /* FIXME: just set to 0? */
return;
}
 
-   if (new_req) {
-   struct mmc_queue_req *mqrq_cur = req_to_mmc_queue_req(new_req);
-   /*
-* When 4KB native sector is enabled, only 8 blocks
-* multiple read or write is allowed
-*/
-   if (mmc_large_sector(card) &&
-   !IS_ALIGNED(blk_rq_sectors(new_req), 8)) {
-   pr_err("%s: Transfer size is not 4KB sector size aligned\n",
-  new_req->rq_disk->disk_name);
-   mmc_blk_rw_cmd_abort(mq, card, new_req, mqrq_cur);
-   return;
-   }
-
-   mmc_blk_rw_rq_prep(mqrq_cur, card, 0, mq);
-   new_areq = &mqrq_cur->areq;
-   

[PATCH 12/12 v5] mmc: switch MMC/SD to use blk-mq multiqueueing v5

2017-11-10 Thread Linus Walleij
This switches the MMC/SD stack to use the multiqueue block
layer interface.

We kill off the kthread that was just calling blk_fetch_request()
and let blk-mq drive all traffic; nice, that is how it should work.

Due to having switched the submission mechanics around so that
the completion of requests is now triggered from the host
callbacks, we manage to keep the same performance for linear
reads/writes as we have for the old block layer.

The open questions from earlier patch series have been
addressed:

- mmc_[get|put]_card() is now issued across requests from
  .queue_rq() to .complete() using Adrian's nifty context lock.
  This means that the block layer does not compete with itself
  on getting access to the host, and we can let other users of
  the host come in. (For SDIO and mixed-mode cards; a rough
  sketch of this pairing follows after the list.)

- Partial reads are handled by open coding calls to
  blk_update_request() as advised by Christoph.
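
As a rough sketch (not the actual patch: the callback names here are
illustrative, and in this tree mmc_get_card()/mmc_put_card() take just
the card), the pairing across .queue_rq() and .complete() could look
like this, with the actual release driven by the mmc_blk_complete()
helper shown in the diff below:

static blk_status_t mmc_queue_rq(struct blk_mq_hw_ctx *hctx,
				 const struct blk_mq_queue_data *bd)
{
	struct mmc_queue *mq = hctx->queue->queuedata;
	struct request *req = bd->rq;

	mmc_get_card(mq->card);		/* hold the host for this request */
	blk_mq_start_request(req);
	mmc_blk_issue_rq(mq, req);	/* hand off to the existing issue path */
	return BLK_STS_OK;
}

static void mmc_complete_rq(struct request *req)
{
	struct mmc_queue *mq = req->q->queuedata;

	/* I/O is done: release the host so other users can claim it */
	mmc_put_card(mq->card);
}

static const struct blk_mq_ops mmc_mq_ops = {
	.queue_rq	= mmc_queue_rq,
	.complete	= mmc_complete_rq,
};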

Signed-off-by: Linus Walleij 
---
ChangeLog v4->v5:
- Rebase on the other changes including improved error
  handling.
- Use quiesce and unquiesce on the queue in the
  suspend/resume cycle.
---
 drivers/mmc/core/block.c |  92 ++
 drivers/mmc/core/queue.c | 237 ---
 drivers/mmc/core/queue.h |   8 +-
 3 files changed, 156 insertions(+), 181 deletions(-)

diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c
index e3ae7241b2eb..9fa3bfa3b4f8 100644
--- a/drivers/mmc/core/block.c
+++ b/drivers/mmc/core/block.c
@@ -28,6 +28,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -93,7 +94,6 @@ static DEFINE_IDA(mmc_rpmb_ida);
  * There is one mmc_blk_data per slot.
  */
 struct mmc_blk_data {
-   spinlock_t  lock;
struct device   *parent;
struct gendisk  *disk;
struct mmc_queue queue;
@@ -1204,6 +1204,23 @@ static inline void mmc_blk_reset_success(struct 
mmc_blk_data *md, int type)
 }
 
 /*
+ * This reports status back to the block layer for a finished request.
+ */
+static void mmc_blk_complete(struct mmc_queue_req *mq_rq,
+blk_status_t status)
+{
+   struct request *req = mmc_queue_req_to_req(mq_rq);
+
+   /*
+* We are done with I/O, so this call will invoke .complete() and
+* release the host lock.
+*/
+   blk_mq_complete_request(req);
+   /* Then we report the request back to the block layer */
+   blk_mq_end_request(req, status);
+}
+
+/*
  * The non-block commands come back from the block layer after it queued it and
  * processed it with all other requests and then they get issued in this
  * function.
@@ -1262,9 +1279,9 @@ static void mmc_blk_issue_drv_op(struct mmc_queue_req 
*mq_rq)
ret = -EINVAL;
break;
}
+
mq_rq->drv_op_result = ret;
-   blk_end_request_all(mmc_queue_req_to_req(mq_rq),
-   ret ? BLK_STS_IOERR : BLK_STS_OK);
+   mmc_blk_complete(mq_rq, ret ? BLK_STS_IOERR : BLK_STS_OK);
 }
 
 static void mmc_blk_issue_discard_rq(struct mmc_queue_req *mq_rq)
@@ -1308,7 +1325,7 @@ static void mmc_blk_issue_discard_rq(struct mmc_queue_req 
*mq_rq)
else
mmc_blk_reset_success(md, type);
 fail:
-   blk_end_request(req, status, blk_rq_bytes(req));
+   mmc_blk_complete(mq_rq, status);
 }
 
 static void mmc_blk_issue_secdiscard_rq(struct mmc_queue_req *mq_rq)
@@ -1378,7 +1395,7 @@ static void mmc_blk_issue_secdiscard_rq(struct 
mmc_queue_req *mq_rq)
if (!err)
mmc_blk_reset_success(md, type);
 out:
-   blk_end_request(req, status, blk_rq_bytes(req));
+   mmc_blk_complete(mq_rq, status);
 }
 
 static void mmc_blk_issue_flush(struct mmc_queue_req *mq_rq)
@@ -1388,8 +1405,13 @@ static void mmc_blk_issue_flush(struct mmc_queue_req 
*mq_rq)
int ret = 0;
 
ret = mmc_flush_cache(card);
-   blk_end_request_all(mmc_queue_req_to_req(mq_rq),
-   ret ? BLK_STS_IOERR : BLK_STS_OK);
+   /*
+* NOTE: this used to call blk_end_request_all() for both
+* cases in the old block layer to flush all queued
+* transactions. I am not sure it was even correct to
+* do that for the success case.
+*/
+   mmc_blk_complete(mq_rq, ret ? BLK_STS_IOERR : BLK_STS_OK);
 }
 
 /*
@@ -1768,7 +1790,6 @@ static void mmc_blk_rw_rq_prep(struct mmc_queue_req 
*mq_rq,
 
mq_rq->areq.err_check = mmc_blk_err_check;
mq_rq->areq.host = card->host;
-   INIT_WORK(_rq->areq.finalization_work, mmc_finalize_areq);
 }
 
 static bool mmc_blk_rw_cmd_err(struct mmc_blk_data *md, struct mmc_card *card,
@@ -1792,10 +1813,13 @@ static bool mmc_blk_rw_cmd_err(struct mmc_blk_data *md, 
struct mmc_card *card,
err = mmc_sd_num_wr_blocks(card, &blocks);
if (err)
req_pending = old_req_pending;
-   else
-   req_pending = 

[PATCH 03/12 v5] mmc: core: replace waitqueue with worker

2017-11-10 Thread Linus Walleij
The waitqueue in the host context is there to signal back from
mmc_request_done() through mmc_wait_data_done() that the hardware
is done with a command, and when the wait is over, the core
will typically submit the next asynchronous request that is pending
just waiting for the hardware to be available.

This is in the way for letting the mmc_request_done() trigger the
report up to the block layer that a block request is finished.

Re-jig this as a first step, removing the waitqueue and introducing
a work that will run after a completed asynchronous request,
finalizing that request, including retransmissions, and eventually
reporting back with a completion and a status code to the
asynchronous issue method.

This has the upside that we can remove the MMC_BLK_NEW_REQUEST
status code and the "new_request" state in the request queue
that is only there to make the state machine spin out
the first time we send a request.

Use the workqueue we introduced in the host for handling just
this, and then add a work and completion in the asynchronous
request to deal with this mechanism.

We introduce a pointer from mmc_request back to the asynchronous
request so these can be referenced from each other, and
augment mmc_wait_data_done() to use this pointer to get at the
areq and kick the worker since that function is only used by
asynchronous requests anyway.

This is a central change that let us do many other changes since
we have broken the submit and complete code paths in two, and we
can potentially remove the NULL flushing of the asynchronous
pipeline and report block requests as finished directly from
the worker.
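
The resulting flow is that mmc_request_done() calls mmc_wait_data_done(),
which queues finalization_work on host->req_done_wq; mmc_finalize_areq()
then does the error checking and retries and finally completes the areq.
As a rough illustration (the helper name and the status member are
assumptions at this point in the series), an issuer can then synchronize
like this:

/* Sketch only: wait for an areq to be finalized by the worker */
static enum mmc_blk_status mmc_wait_for_areq_done(struct mmc_async_req *areq)
{
	wait_for_completion(&areq->complete);
	return areq->status;	/* status member: illustrative */
}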

Signed-off-by: Linus Walleij 
---
ChangeLog v1->v5:
- Rebasing on the "next" branch in the MMC tree.
---
 drivers/mmc/core/block.c |  3 ++
 drivers/mmc/core/core.c  | 93 
 drivers/mmc/core/core.h  |  2 ++
 drivers/mmc/core/queue.c |  1 -
 include/linux/mmc/core.h |  3 +-
 include/linux/mmc/host.h |  7 ++--
 6 files changed, 59 insertions(+), 50 deletions(-)

diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c
index ea80ff4cd7f9..5c84175e49be 100644
--- a/drivers/mmc/core/block.c
+++ b/drivers/mmc/core/block.c
@@ -1712,6 +1712,7 @@ static void mmc_blk_rw_rq_prep(struct mmc_queue_req *mqrq,
mmc_blk_data_prep(mq, mqrq, disable_multi, _rel_wr, _data_tag);
 
brq->mrq.cmd = &brq->cmd;
+   brq->mrq.areq = NULL;
 
brq->cmd.arg = blk_rq_pos(req);
if (!mmc_card_blockaddr(card))
@@ -1764,6 +1765,8 @@ static void mmc_blk_rw_rq_prep(struct mmc_queue_req *mqrq,
}
 
mqrq->areq.err_check = mmc_blk_err_check;
+   mqrq->areq.host = card->host;
+   INIT_WORK(&mqrq->areq.finalization_work, mmc_finalize_areq);
 }
 
 static bool mmc_blk_rw_cmd_err(struct mmc_blk_data *md, struct mmc_card *card,
diff --git a/drivers/mmc/core/core.c b/drivers/mmc/core/core.c
index 73ebee12e67b..7440daa2f559 100644
--- a/drivers/mmc/core/core.c
+++ b/drivers/mmc/core/core.c
@@ -369,10 +369,15 @@ EXPORT_SYMBOL(mmc_start_request);
  */
 static void mmc_wait_data_done(struct mmc_request *mrq)
 {
-   struct mmc_context_info *context_info = &mrq->host->context_info;
+   struct mmc_host *host = mrq->host;
+   struct mmc_context_info *context_info = &host->context_info;
+   struct mmc_async_req *areq = mrq->areq;
 
context_info->is_done_rcv = true;
-   wake_up_interruptible(&context_info->wait);
+   /* Schedule a work to deal with finalizing this request */
+   if (!areq)
+   pr_err("areq of the data mmc_request was NULL!\n");
+   queue_work(host->req_done_wq, &areq->finalization_work);
 }
 
 static void mmc_wait_done(struct mmc_request *mrq)
@@ -695,43 +700,34 @@ static void mmc_post_req(struct mmc_host *host, struct 
mmc_request *mrq,
  * Returns the status of the ongoing asynchronous request, but
  * MMC_BLK_SUCCESS if no request was going on.
  */
-static enum mmc_blk_status mmc_finalize_areq(struct mmc_host *host)
+void mmc_finalize_areq(struct work_struct *work)
 {
+   struct mmc_async_req *areq =
+   container_of(work, struct mmc_async_req, finalization_work);
+   struct mmc_host *host = areq->host;
struct mmc_context_info *context_info = >context_info;
-   enum mmc_blk_status status;
-
-   if (!host->areq)
-   return MMC_BLK_SUCCESS;
-
-   while (1) {
-   wait_event_interruptible(context_info->wait,
-   (context_info->is_done_rcv ||
-context_info->is_new_req));
+   enum mmc_blk_status status = MMC_BLK_SUCCESS;
 
-   if (context_info->is_done_rcv) {
-   struct mmc_command *cmd;
+   if (context_info->is_done_rcv) {
+   struct mmc_command *cmd;
 
-   context_info->is_done_rcv = false;
-   cmd = host->areq->mrq->cmd;
+   context_info->is_done_rcv = false;
+ 

[PATCH 00/12 v5] Multiqueue for MMC/SD

2017-11-10 Thread Linus Walleij
This is the fifth iteration of this patch set.

I *HOPE* that we can scrap this patch set and merge Adrian's
patches instead, because they also bring CQE support which is
nice. I had some review comments on his series, mainly that
it needs to kill off the legacy block layer code path that
no one likes anyway.

So this is mainly an academic and inspirational exercise.
Whatever remains of this refactoring, if anything, I can
certainly do on top of Adrian's patches as well.

What changed since v4 is the error path, since Adrian pointed
out that the error handling seems to be fragile. It was indeed
fragile... To make sure things work properly I have run long
test rounds with fault injection, essentially:

Enable FAULT_INJECTION, FAULT_INJECTION_DEBUG_FS and FAIL_MMC_REQUEST,
then:
cd /debug/mmc3/fail_mmc_request/
echo 1 > probability
echo -1 > times

Then I ran dd to the card. I also increased the error rate to 10%
and the tests still completed successfully, but at this error rate
the MMC stack sometimes exceeds the retry limit and the dd command
fails (as is appropriate).

Removing a card during I/O does not work well however :/
So I guess I would need to work on that if this series should
continue. (Hopefully unlikely.)


Linus Walleij (12):
  mmc: core: move the asynchronous post-processing
  mmc: core: add a workqueue for completing requests
  mmc: core: replace waitqueue with worker
  mmc: core: do away with is_done_rcv
  mmc: core: do away with is_new_req
  mmc: core: kill off the context info
  mmc: queue: simplify queue logic
  mmc: block: shuffle retry and error handling
  mmc: queue: stop flushing the pipeline with NULL
  mmc: queue/block: pass around struct mmc_queue_req*s
  mmc: block: issue requests in massive parallel
  mmc: switch MMC/SD to use blk-mq multiqueueing v5

 drivers/mmc/core/block.c| 557 +++-
 drivers/mmc/core/block.h|   5 +-
 drivers/mmc/core/bus.c  |   1 -
 drivers/mmc/core/core.c | 217 ++---
 drivers/mmc/core/core.h |  11 +-
 drivers/mmc/core/host.c |   1 -
 drivers/mmc/core/mmc_test.c |  31 +--
 drivers/mmc/core/queue.c| 252 
 drivers/mmc/core/queue.h|  16 +-
 include/linux/mmc/core.h|   3 +-
 include/linux/mmc/host.h|  31 +--
 11 files changed, 557 insertions(+), 568 deletions(-)

-- 
2.13.6



[PATCH 06/12 v5] mmc: core: kill off the context info

2017-11-10 Thread Linus Walleij
The last member of the context info, is_waiting_last_req, is
just assigned values and never checked. Delete that, and the whole
context info as a result.

Signed-off-by: Linus Walleij 
---
ChangeLog v1->v5:
- Rebasing on the "next" branch in the MMC tree.
---
 drivers/mmc/core/block.c |  2 --
 drivers/mmc/core/bus.c   |  1 -
 drivers/mmc/core/core.c  | 13 -
 drivers/mmc/core/core.h  |  2 --
 drivers/mmc/core/queue.c |  9 +
 include/linux/mmc/host.h |  9 -
 6 files changed, 1 insertion(+), 35 deletions(-)

diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c
index 5c84175e49be..86ec87c17e71 100644
--- a/drivers/mmc/core/block.c
+++ b/drivers/mmc/core/block.c
@@ -2065,13 +2065,11 @@ void mmc_blk_issue_rq(struct mmc_queue *mq, struct 
request *req)
default:
/* Normal request, just issue it */
mmc_blk_issue_rw_rq(mq, req);
-   card->host->context_info.is_waiting_last_req = false;
break;
}
} else {
/* No request, flushing the pipeline with NULL */
mmc_blk_issue_rw_rq(mq, NULL);
-   card->host->context_info.is_waiting_last_req = false;
}
 
 out:
diff --git a/drivers/mmc/core/bus.c b/drivers/mmc/core/bus.c
index a4b49e25fe96..45904a7e87be 100644
--- a/drivers/mmc/core/bus.c
+++ b/drivers/mmc/core/bus.c
@@ -348,7 +348,6 @@ int mmc_add_card(struct mmc_card *card)
 #ifdef CONFIG_DEBUG_FS
mmc_add_card_debugfs(card);
 #endif
-   mmc_init_context_info(card->host);
 
card->dev.of_node = mmc_of_find_child_device(card->host, 0);
 
diff --git a/drivers/mmc/core/core.c b/drivers/mmc/core/core.c
index b1a5059f6cd1..fa86f9a15d29 100644
--- a/drivers/mmc/core/core.c
+++ b/drivers/mmc/core/core.c
@@ -2997,19 +2997,6 @@ void mmc_unregister_pm_notifier(struct mmc_host *host)
 }
 #endif
 
-/**
- * mmc_init_context_info() - init synchronization context
- * @host: mmc host
- *
- * Init struct context_info needed to implement asynchronous
- * request mechanism, used by mmc core, host driver and mmc requests
- * supplier.
- */
-void mmc_init_context_info(struct mmc_host *host)
-{
-   host->context_info.is_waiting_last_req = false;
-}
-
 static int __init mmc_init(void)
 {
int ret;
diff --git a/drivers/mmc/core/core.h b/drivers/mmc/core/core.h
index e493d9d73fe2..88b852ac8f74 100644
--- a/drivers/mmc/core/core.h
+++ b/drivers/mmc/core/core.h
@@ -92,8 +92,6 @@ void mmc_remove_host_debugfs(struct mmc_host *host);
 void mmc_add_card_debugfs(struct mmc_card *card);
 void mmc_remove_card_debugfs(struct mmc_card *card);
 
-void mmc_init_context_info(struct mmc_host *host);
-
 int mmc_execute_tuning(struct mmc_card *card);
 int mmc_hs200_to_hs400(struct mmc_card *card);
 int mmc_hs400_to_hs200(struct mmc_card *card);
diff --git a/drivers/mmc/core/queue.c b/drivers/mmc/core/queue.c
index 4a0752ef6154..2c232ba4e594 100644
--- a/drivers/mmc/core/queue.c
+++ b/drivers/mmc/core/queue.c
@@ -42,7 +42,6 @@ static int mmc_queue_thread(void *d)
 {
struct mmc_queue *mq = d;
struct request_queue *q = mq->queue;
-   struct mmc_context_info *cntx = &mq->card->host->context_info;
 
current->flags |= PF_MEMALLOC;
 
@@ -54,15 +53,12 @@ static int mmc_queue_thread(void *d)
set_current_state(TASK_INTERRUPTIBLE);
req = blk_fetch_request(q);
mq->asleep = false;
-   cntx->is_waiting_last_req = false;
if (!req) {
/*
 * Dispatch queue is empty so set flags for
 * mmc_request_fn() to wake us up.
 */
-   if (mq->qcnt)
-   cntx->is_waiting_last_req = true;
-   else
+   if (!mq->qcnt)
mq->asleep = true;
}
spin_unlock_irq(q->queue_lock);
@@ -96,7 +92,6 @@ static void mmc_request_fn(struct request_queue *q)
 {
struct mmc_queue *mq = q->queuedata;
struct request *req;
-   struct mmc_context_info *cntx;
 
if (!mq) {
while ((req = blk_fetch_request(q)) != NULL) {
@@ -106,8 +101,6 @@ static void mmc_request_fn(struct request_queue *q)
return;
}
 
-   cntx = &mq->card->host->context_info;
-
if (mq->asleep)
wake_up_process(mq->thread);
 }
diff --git a/include/linux/mmc/host.h b/include/linux/mmc/host.h
index 36af19990683..4b210e9283f6 100644
--- a/include/linux/mmc/host.h
+++ b/include/linux/mmc/host.h
@@ -239,14 +239,6 @@ struct mmc_slot {
void *handler_priv;
 };
 
-/**
- * mmc_context_info - synchronization details for mmc context
- * @is_waiting_last_req  mmc context waiting for single running request
- */
-struct mmc_context_info {
-   bool 

[PATCH 10/12 v5] mmc: queue/block: pass around struct mmc_queue_req*s

2017-11-10 Thread Linus Walleij
Instead of passing around several pointers to mmc_queue_req,
request and mmc_queue, and reassigning to the left and right,
pass a single pointer to mmc_queue_req and dereference the queue
and the request from the mmc_queue_req where needed.

The struct mmc_queue_req is the thing that has a lifecycle after
all: this is what we are keeping in our queue, and what the block
layer helps us manage. Augment a bunch of functions to take a
single argument so we can see the trees and not just a big
jungle of arguments.

Signed-off-by: Linus Walleij 
---
ChangeLog v1->v5:
- Rebasing on the "next" branch in the MMC tree.
---
 drivers/mmc/core/block.c | 128 ---
 drivers/mmc/core/block.h |   5 +-
 drivers/mmc/core/queue.c |   2 +-
 3 files changed, 69 insertions(+), 66 deletions(-)

diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c
index c7a57006e27f..2cd9fe5a8c9b 100644
--- a/drivers/mmc/core/block.c
+++ b/drivers/mmc/core/block.c
@@ -1208,9 +1208,9 @@ static inline void mmc_blk_reset_success(struct 
mmc_blk_data *md, int type)
  * processed it with all other requests and then they get issued in this
  * function.
  */
-static void mmc_blk_issue_drv_op(struct mmc_queue *mq, struct request *req)
+static void mmc_blk_issue_drv_op(struct mmc_queue_req *mq_rq)
 {
-   struct mmc_queue_req *mq_rq;
+   struct mmc_queue *mq = mq_rq->mq;
struct mmc_card *card = mq->card;
struct mmc_blk_data *md = mq->blkdata;
struct mmc_blk_ioc_data **idata;
@@ -1220,7 +1220,6 @@ static void mmc_blk_issue_drv_op(struct mmc_queue *mq, 
struct request *req)
int ret;
int i;
 
-   mq_rq = req_to_mmc_queue_req(req);
rpmb_ioctl = (mq_rq->drv_op == MMC_DRV_OP_IOCTL_RPMB);
 
switch (mq_rq->drv_op) {
@@ -1264,12 +1263,14 @@ static void mmc_blk_issue_drv_op(struct mmc_queue *mq, 
struct request *req)
break;
}
mq_rq->drv_op_result = ret;
-   blk_end_request_all(req, ret ? BLK_STS_IOERR : BLK_STS_OK);
+   blk_end_request_all(mmc_queue_req_to_req(mq_rq),
+   ret ? BLK_STS_IOERR : BLK_STS_OK);
 }
 
-static void mmc_blk_issue_discard_rq(struct mmc_queue *mq, struct request *req)
+static void mmc_blk_issue_discard_rq(struct mmc_queue_req *mq_rq)
 {
-   struct mmc_blk_data *md = mq->blkdata;
+   struct request *req = mmc_queue_req_to_req(mq_rq);
+   struct mmc_blk_data *md = mq_rq->mq->blkdata;
struct mmc_card *card = md->queue.card;
unsigned int from, nr, arg;
int err = 0, type = MMC_BLK_DISCARD;
@@ -1310,10 +1311,10 @@ static void mmc_blk_issue_discard_rq(struct mmc_queue 
*mq, struct request *req)
blk_end_request(req, status, blk_rq_bytes(req));
 }
 
-static void mmc_blk_issue_secdiscard_rq(struct mmc_queue *mq,
-  struct request *req)
+static void mmc_blk_issue_secdiscard_rq(struct mmc_queue_req *mq_rq)
 {
-   struct mmc_blk_data *md = mq->blkdata;
+   struct request *req = mmc_queue_req_to_req(mq_rq);
+   struct mmc_blk_data *md = mq_rq->mq->blkdata;
struct mmc_card *card = md->queue.card;
unsigned int from, nr, arg;
int err = 0, type = MMC_BLK_SECDISCARD;
@@ -1380,14 +1381,15 @@ static void mmc_blk_issue_secdiscard_rq(struct 
mmc_queue *mq,
blk_end_request(req, status, blk_rq_bytes(req));
 }
 
-static void mmc_blk_issue_flush(struct mmc_queue *mq, struct request *req)
+static void mmc_blk_issue_flush(struct mmc_queue_req *mq_rq)
 {
-   struct mmc_blk_data *md = mq->blkdata;
+   struct mmc_blk_data *md = mq_rq->mq->blkdata;
struct mmc_card *card = md->queue.card;
int ret = 0;
 
ret = mmc_flush_cache(card);
-   blk_end_request_all(req, ret ? BLK_STS_IOERR : BLK_STS_OK);
+   blk_end_request_all(mmc_queue_req_to_req(mq_rq),
+   ret ? BLK_STS_IOERR : BLK_STS_OK);
 }
 
 /*
@@ -1698,18 +1700,18 @@ static void mmc_blk_data_prep(struct mmc_queue *mq, 
struct mmc_queue_req *mqrq,
*do_data_tag_p = do_data_tag;
 }
 
-static void mmc_blk_rw_rq_prep(struct mmc_queue_req *mqrq,
-  struct mmc_card *card,
-  bool disable_multi,
-  struct mmc_queue *mq)
+static void mmc_blk_rw_rq_prep(struct mmc_queue_req *mq_rq,
+  bool disable_multi)
 {
u32 readcmd, writecmd;
-   struct mmc_blk_request *brq = &mqrq->brq;
-   struct request *req = mmc_queue_req_to_req(mqrq);
+   struct mmc_queue *mq = mq_rq->mq;
+   struct mmc_card *card = mq->card;
+   struct mmc_blk_request *brq = &mq_rq->brq;
+   struct request *req = mmc_queue_req_to_req(mq_rq);
struct mmc_blk_data *md = mq->blkdata;
bool do_rel_wr, do_data_tag;
 
-   mmc_blk_data_prep(mq, mqrq, disable_multi, _rel_wr, _data_tag);
+   mmc_blk_data_prep(mq, mq_rq, 

[PATCH 02/12 v5] mmc: core: add a workqueue for completing requests

2017-11-10 Thread Linus Walleij
As we want to complete requests autonomously from feeding the
host with new requests, we create a workqueue to deal with
this specifically in response to the callback from a host driver.
This is necessary to exploit parallelism properly.

This patch just adds the workqueue; later patches will make use
of it.

Signed-off-by: Linus Walleij 
---
ChangeLog v1->v5:
- Rebasing on the "next" branch in the MMC tree.
---
 drivers/mmc/core/core.c  | 9 +
 drivers/mmc/core/host.c  | 1 -
 include/linux/mmc/host.h | 4 
 3 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/drivers/mmc/core/core.c b/drivers/mmc/core/core.c
index e2366a82eebe..73ebee12e67b 100644
--- a/drivers/mmc/core/core.c
+++ b/drivers/mmc/core/core.c
@@ -2838,6 +2838,14 @@ void mmc_start_host(struct mmc_host *host)
host->f_init = max(freqs[0], host->f_min);
host->rescan_disable = 0;
host->ios.power_mode = MMC_POWER_UNDEFINED;
+   /* Workqueue for completing requests */
+   host->req_done_wq = alloc_workqueue("mmc%d-reqdone",
+   WQ_FREEZABLE | WQ_HIGHPRI | WQ_MEM_RECLAIM,
+   0, host->index);
+   if (!host->req_done_wq) {
+   dev_err(mmc_dev(host), "could not allocate workqueue\n");
+   return;
+   }
 
if (!(host->caps2 & MMC_CAP2_NO_PRESCAN_POWERUP)) {
mmc_claim_host(host);
@@ -2859,6 +2867,7 @@ void mmc_stop_host(struct mmc_host *host)
 
host->rescan_disable = 1;
cancel_delayed_work_sync(&host->detect);
+   destroy_workqueue(host->req_done_wq);
 
/* clear pm flags now and let card drivers set them as needed */
host->pm_flags = 0;
diff --git a/drivers/mmc/core/host.c b/drivers/mmc/core/host.c
index 35a9e4fd1a9f..88033294832f 100644
--- a/drivers/mmc/core/host.c
+++ b/drivers/mmc/core/host.c
@@ -390,7 +390,6 @@ struct mmc_host *mmc_alloc_host(int extra, struct device 
*dev)
INIT_DELAYED_WORK(&host->detect, mmc_rescan);
INIT_DELAYED_WORK(&host->sdio_irq_work, sdio_irq_work);
setup_timer(&host->retune_timer, mmc_retune_timer, (unsigned long)host);
-
/*
 * By default, hosts do not support SGIO or large requests.
 * They have to set these according to their abilities.
diff --git a/include/linux/mmc/host.h b/include/linux/mmc/host.h
index e7743eca1021..e4fa7058c288 100644
--- a/include/linux/mmc/host.h
+++ b/include/linux/mmc/host.h
@@ -13,6 +13,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -425,6 +426,9 @@ struct mmc_host {
struct mmc_async_req*areq;  /* active async req */
struct mmc_context_info context_info;   /* async synchronization info */
 
+   /* finalization workqueue, handles finalizing requests */
+   struct workqueue_struct *req_done_wq;
+
/* Ongoing data transfer that allows commands during transfer */
struct mmc_request  *ongoing_mrq;
 
-- 
2.13.6



[PATCH 01/12 v5] mmc: core: move the asynchronous post-processing

2017-11-10 Thread Linus Walleij
This moves the asynchronous post-processing of a request over
to the finalization function.

The patch has a slight semantic change:

Both places will be in the code path for if (host->areq) and
in the same sequence, but before this patch, the next request
was started before performing post-processing.

The effect is that whereas before, the post- and preprocessing
happened after starting the next request, now the postprocessing
will happen after the request is done and before the next has
started, which cuts half of the pre/post optimizations out.

In the later patch named "mmc: core: replace waitqueue with
worker" we move the finalization to a worker started by
mmc_request_done() and in the patch named
"mmc: block: issue requests in massive parallel" we introduce
a forked success/failure path that can quickly complete
requests when they come back from the hardware.

These two later patches together restore the same optimization
but in a more elegant manner that avoids the need to flush the
two-stage pipeline with NULL, something we remove between these
two patches in the commit named
"mmc: queue: stop flushing the pipeline with NULL".

Signed-off-by: Linus Walleij 
---
ChangeLog v1->v5:
- Rebasing on the "next" branch in the MMC tree.
---
 drivers/mmc/core/core.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/drivers/mmc/core/core.c b/drivers/mmc/core/core.c
index 1f0f44f4dd5f..e2366a82eebe 100644
--- a/drivers/mmc/core/core.c
+++ b/drivers/mmc/core/core.c
@@ -746,6 +746,9 @@ static enum mmc_blk_status mmc_finalize_areq(struct 
mmc_host *host)
mmc_start_bkops(host->card, true);
}
 
+   /* Successfully postprocess the old request at this point */
+   mmc_post_req(host, host->areq->mrq, 0);
+
return status;
 }
 
@@ -790,10 +793,6 @@ struct mmc_async_req *mmc_start_areq(struct mmc_host *host,
if (status == MMC_BLK_SUCCESS && areq)
start_err = __mmc_start_data_req(host, areq->mrq);
 
-   /* Postprocess the old request at this point */
-   if (host->areq)
-   mmc_post_req(host, host->areq->mrq, 0);
-
/* Cancel a prepared request if it was not started. */
if ((status != MMC_BLK_SUCCESS || start_err) && areq)
mmc_post_req(host, areq->mrq, -EINVAL);
-- 
2.13.6



[PATCH 07/12 v5] mmc: queue: simplify queue logic

2017-11-10 Thread Linus Walleij
The if() statement checking if there is no current or previous
request is now just looking ahead at something that will be
concluded a few lines below. Simplify the logic by moving the
assignment of .asleep.

Signed-off-by: Linus Walleij 
---
ChangeLog v1->v5:
- Rebasing on the "next" branch in the MMC tree.
---
 drivers/mmc/core/queue.c | 9 +
 1 file changed, 1 insertion(+), 8 deletions(-)

diff --git a/drivers/mmc/core/queue.c b/drivers/mmc/core/queue.c
index 2c232ba4e594..023bbddc1a0b 100644
--- a/drivers/mmc/core/queue.c
+++ b/drivers/mmc/core/queue.c
@@ -53,14 +53,6 @@ static int mmc_queue_thread(void *d)
set_current_state(TASK_INTERRUPTIBLE);
req = blk_fetch_request(q);
mq->asleep = false;
-   if (!req) {
-   /*
-* Dispatch queue is empty so set flags for
-* mmc_request_fn() to wake us up.
-*/
-   if (!mq->qcnt)
-   mq->asleep = true;
-   }
spin_unlock_irq(q->queue_lock);
 
if (req || mq->qcnt) {
@@ -68,6 +60,7 @@ static int mmc_queue_thread(void *d)
mmc_blk_issue_rq(mq, req);
cond_resched();
} else {
+   mq->asleep = true;
if (kthread_should_stop()) {
set_current_state(TASK_RUNNING);
break;
-- 
2.13.6



[PATCH 05/12] mmc: core: do away with is_new_req

2017-11-10 Thread Linus Walleij
The host context member "is_new_req" is only assigned values,
never checked. Delete it.

Signed-off-by: Linus Walleij 
---
ChangeLog v1->v5:
- Rebasing on the "next" branch in the MMC tree.
---
 drivers/mmc/core/core.c  | 1 -
 drivers/mmc/core/queue.c | 5 -
 include/linux/mmc/host.h | 2 --
 3 files changed, 8 deletions(-)

diff --git a/drivers/mmc/core/core.c b/drivers/mmc/core/core.c
index 15a664d3c199..b1a5059f6cd1 100644
--- a/drivers/mmc/core/core.c
+++ b/drivers/mmc/core/core.c
@@ -3007,7 +3007,6 @@ void mmc_unregister_pm_notifier(struct mmc_host *host)
  */
 void mmc_init_context_info(struct mmc_host *host)
 {
-   host->context_info.is_new_req = false;
host->context_info.is_waiting_last_req = false;
 }
 
diff --git a/drivers/mmc/core/queue.c b/drivers/mmc/core/queue.c
index c46be4402803..4a0752ef6154 100644
--- a/drivers/mmc/core/queue.c
+++ b/drivers/mmc/core/queue.c
@@ -55,7 +55,6 @@ static int mmc_queue_thread(void *d)
req = blk_fetch_request(q);
mq->asleep = false;
cntx->is_waiting_last_req = false;
-   cntx->is_new_req = false;
if (!req) {
/*
 * Dispatch queue is empty so set flags for
@@ -109,10 +108,6 @@ static void mmc_request_fn(struct request_queue *q)
 
cntx = >card->host->context_info;
 
-   if (cntx->is_waiting_last_req) {
-   cntx->is_new_req = true;
-   }
-
if (mq->asleep)
wake_up_process(mq->thread);
 }
diff --git a/include/linux/mmc/host.h b/include/linux/mmc/host.h
index d43d26562fae..36af19990683 100644
--- a/include/linux/mmc/host.h
+++ b/include/linux/mmc/host.h
@@ -241,11 +241,9 @@ struct mmc_slot {
 
 /**
  * mmc_context_info - synchronization details for mmc context
- * @is_new_req wake up reason was new request
- * @is_waiting_last_req  mmc context waiting for single running request
  */
 struct mmc_context_info {
-   bool is_new_req;
bool is_waiting_last_req;
 };
 
-- 
2.13.6



[PATCH 04/12] mmc: core: do away with is_done_rcv

2017-11-10 Thread Linus Walleij
The "is_done_rcv" in the context info for the host is no longer
needed: it is clear from context (ha!) that as long as we are
waiting for the asynchronous request to come to completion,
we are not done receiving data, and when the finalization work
has run and completed the completion, we are indeed done.

Signed-off-by: Linus Walleij 
---
ChangeLog v1->v5:
- Rebasing on the "next" branch in the MMC tree.
---
 drivers/mmc/core/core.c  | 40 
 include/linux/mmc/host.h |  2 --
 2 files changed, 16 insertions(+), 26 deletions(-)

diff --git a/drivers/mmc/core/core.c b/drivers/mmc/core/core.c
index 7440daa2f559..15a664d3c199 100644
--- a/drivers/mmc/core/core.c
+++ b/drivers/mmc/core/core.c
@@ -370,10 +370,8 @@ EXPORT_SYMBOL(mmc_start_request);
 static void mmc_wait_data_done(struct mmc_request *mrq)
 {
struct mmc_host *host = mrq->host;
-   struct mmc_context_info *context_info = >context_info;
struct mmc_async_req *areq = mrq->areq;
 
-   context_info->is_done_rcv = true;
/* Schedule a work to deal with finalizing this request */
if (!areq)
pr_err("areq of the data mmc_request was NULL!\n");
@@ -656,7 +654,7 @@ EXPORT_SYMBOL(mmc_cqe_recovery);
 bool mmc_is_req_done(struct mmc_host *host, struct mmc_request *mrq)
 {
if (host->areq)
-   return host->context_info.is_done_rcv;
+   return completion_done(&host->areq->complete);
else
return completion_done(&mrq->completion);
 }
@@ -705,29 +703,24 @@ void mmc_finalize_areq(struct work_struct *work)
struct mmc_async_req *areq =
container_of(work, struct mmc_async_req, finalization_work);
struct mmc_host *host = areq->host;
-   struct mmc_context_info *context_info = >context_info;
enum mmc_blk_status status = MMC_BLK_SUCCESS;
+   struct mmc_command *cmd;
 
-   if (context_info->is_done_rcv) {
-   struct mmc_command *cmd;
-
-   context_info->is_done_rcv = false;
-   cmd = areq->mrq->cmd;
+   cmd = areq->mrq->cmd;
 
-   if (!cmd->error || !cmd->retries ||
-   mmc_card_removed(host->card)) {
-   status = areq->err_check(host->card,
-areq);
-   } else {
-   mmc_retune_recheck(host);
-   pr_info("%s: req failed (CMD%u): %d, retrying...\n",
-   mmc_hostname(host),
-   cmd->opcode, cmd->error);
-   cmd->retries--;
-   cmd->error = 0;
-   __mmc_start_request(host, areq->mrq);
-   return; /* wait for done/new event again */
-   }
+   if (!cmd->error || !cmd->retries ||
+   mmc_card_removed(host->card)) {
+   status = areq->err_check(host->card,
+areq);
+   } else {
+   mmc_retune_recheck(host);
+   pr_info("%s: req failed (CMD%u): %d, retrying...\n",
+   mmc_hostname(host),
+   cmd->opcode, cmd->error);
+   cmd->retries--;
+   cmd->error = 0;
+   __mmc_start_request(host, areq->mrq);
+   return; /* wait for done/new event again */
}
 
mmc_retune_release(host);
@@ -3015,7 +3008,6 @@ void mmc_unregister_pm_notifier(struct mmc_host *host)
 void mmc_init_context_info(struct mmc_host *host)
 {
host->context_info.is_new_req = false;
-   host->context_info.is_done_rcv = false;
host->context_info.is_waiting_last_req = false;
 }
 
diff --git a/include/linux/mmc/host.h b/include/linux/mmc/host.h
index d2ff79a16839..d43d26562fae 100644
--- a/include/linux/mmc/host.h
+++ b/include/linux/mmc/host.h
@@ -241,12 +241,10 @@ struct mmc_slot {
 
 /**
  * mmc_context_info - synchronization details for mmc context
- * @is_done_rcv  wake up reason was done request
  * @is_new_req wake up reason was new request
* @is_waiting_last_req  mmc context waiting for single running request
  */
 struct mmc_context_info {
-   bool is_done_rcv;
bool is_new_req;
bool is_waiting_last_req;
 };
-- 
2.13.6



[PATCH 08/12 v5] mmc: block: shuffle retry and error handling

2017-11-10 Thread Linus Walleij
Instead of doing retries at the same time as trying to submit new
requests, do the retries when the request is reported as completed
by the driver, in the finalization worker.

This is achieved by letting the core worker call back into the block
layer using a callback in the asynchronous request, ->report_done_status(),
which passes the status back to the block core so it can repeatedly
try to hammer the request using single-request mode, retries etc. by calling
back into the core layer using mmc_restart_areq(), which will just kick the
same asynchronous request without waiting for a previous ongoing request.

The beauty of it is that the completion will not complete until the
block layer has had the opportunity to hammer a bit at the card using
a bunch of different approaches that used to be in the while() loop in
mmc_blk_rw_done().

The algorithm for recapture, retry and error handling is identical to the
one we used to have in mmc_blk_issue_rw_rq(), only augmented to get
called in another path from the core.

We have to add and initialize a pointer back to the struct mmc_queue
from the struct mmc_queue_req to find the queue from the asynchronous
request when reporting the status back to the core.

Other users of the asynchronous request that do not need to retry and
use misc error handling fallbacks will work fine since a NULL
->report_done_status() is just fine. This is currently only done by
the test module.
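
As a rough sketch of the core side (the wrapper name is made up; in the
patch this logic sits at the end of the finalization worker):

static void mmc_report_done_status(struct mmc_async_req *areq,
				   enum mmc_blk_status status)
{
	/*
	 * The block core installs mmc_blk_rw_done() as the callback;
	 * users that do not need retries (e.g. mmc_test) leave it
	 * NULL and just wait for the completion.
	 */
	if (areq->report_done_status)
		areq->report_done_status(areq, status);
	complete(&areq->complete);
}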

Signed-off-by: Linus Walleij 
---
ChangeLog v4->v5:
- The "disable_multi" and "retry" variables used to be inside
  the do {} loop in the error handler, so now that we restart
  the areq when there are problems, we need to make these
  part of the struct mmc_async_req and reinitialize them to
  false/zero when restarting an asynchronous request.
- Assign mrq->areq also when restarting asynchronous requests:
  the mrq is a quick-turnaround produce and consume object and
  only lives for one request to the host, so this needs to be
  assigned every time we made a new mrq and want to send it
  off to the host.
- Switch "disable_multi" to be a bool as is appropriate.
- Be more careful to assign NULL to host->areq when it is not
  in use, and make sure this only happens at one spot.
- Rebasing on the "next" branch in the MMC tree.
---
 drivers/mmc/core/block.c | 347 ---
 drivers/mmc/core/core.c  |  46 ---
 drivers/mmc/core/core.h  |   1 +
 drivers/mmc/core/queue.c |   2 +
 drivers/mmc/core/queue.h |   1 +
 include/linux/mmc/host.h |   7 +-
 6 files changed, 221 insertions(+), 183 deletions(-)

diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c
index 86ec87c17e71..2cda2f52058e 100644
--- a/drivers/mmc/core/block.c
+++ b/drivers/mmc/core/block.c
@@ -1575,7 +1575,7 @@ static enum mmc_blk_status mmc_blk_err_check(struct 
mmc_card *card,
 }
 
 static void mmc_blk_data_prep(struct mmc_queue *mq, struct mmc_queue_req *mqrq,
- int disable_multi, bool *do_rel_wr_p,
+ bool disable_multi, bool *do_rel_wr_p,
  bool *do_data_tag_p)
 {
struct mmc_blk_data *md = mq->blkdata;
@@ -1700,7 +1700,7 @@ static void mmc_blk_data_prep(struct mmc_queue *mq, 
struct mmc_queue_req *mqrq,
 
 static void mmc_blk_rw_rq_prep(struct mmc_queue_req *mqrq,
   struct mmc_card *card,
-  int disable_multi,
+  bool disable_multi,
   struct mmc_queue *mq)
 {
u32 readcmd, writecmd;
@@ -1811,198 +1811,213 @@ static void mmc_blk_rw_cmd_abort(struct mmc_queue 
*mq, struct mmc_card *card,
 /**
  * mmc_blk_rw_try_restart() - tries to restart the current async request
  * @mq: the queue with the card and host to restart
- * @req: a new request that want to be started after the current one
+ * @mqrq: the mmc_queue_request containing the areq to be restarted
  */
-static void mmc_blk_rw_try_restart(struct mmc_queue *mq, struct request *req,
+static void mmc_blk_rw_try_restart(struct mmc_queue *mq,
   struct mmc_queue_req *mqrq)
 {
-   if (!req)
-   return;
+   struct mmc_async_req *areq = &mqrq->areq;
+
+   /* Proceed and try to restart the current async request */
+   mmc_blk_rw_rq_prep(mqrq, mq->card, 0, mq);
+   areq->disable_multi = false;
+   areq->retry = 0;
+   mmc_restart_areq(mq->card->host, areq);
+}
+
+static void mmc_blk_rw_done(struct mmc_async_req *areq, enum mmc_blk_status 
status)
+{
+   struct mmc_queue *mq;
+   struct mmc_blk_data *md;
+   struct mmc_card *card;
+   struct mmc_host *host;
+   struct mmc_queue_req *mq_rq;
+   struct mmc_blk_request *brq;
+   struct request *old_req;
+   bool req_pending = true;
+   int type, retune_retry_done = 0;
 
/*
-* If the card was removed, just cancel everything and return.
+* An 

[PATCH V4] scsi_debugfs: fix crash in scsi_show_rq()

2017-11-10 Thread Ming Lei
cmd->cmnd can be allocated/freed dynamically in case of T10_PI_TYPE2_PROTECTION,
so we should check it in scsi_show_rq() because this request may have
been freed already here, and cmd->cmnd has been set as null.

We choose to accept read-after-free and dump request data as far as possible.

This patch fixes the following kernel crash when dumping a request via block's
debugfs interface:

[  252.962045] BUG: unable to handle kernel NULL pointer dereference at 
  (null)
[  252.963007] IP: scsi_format_opcode_name+0x1a/0x1c0
[  252.963007] PGD 25e75a067 P4D 25e75a067 PUD 25e75b067 PMD 0
[  252.963007] Oops:  [#1] PREEMPT SMP
[  252.963007] Dumping ftrace buffer:
[  252.963007](ftrace buffer empty)
[  252.963007] Modules linked in: scsi_debug ebtable_filter ebtables 
ip6table_filter ip6_tables xt_CHECKSUM iptable_mangle ipt_MASQUERADE 
nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 
nf_nat nf_conntrack libcrc32c bridge stp llc iptable_filter fuse ip_tables 
sd_mod sg mptsas mptscsih mptbase crc32c_intel ahci libahci nvme serio_raw 
scsi_transport_sas libata lpc_ich nvme_core virtio_scsi binfmt_misc dm_mod 
iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi null_blk configs
[  252.963007] CPU: 1 PID: 1881 Comm: cat Not tainted 
4.14.0-rc2.blk_mq_io_hang+ #516
[  252.963007] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 
1.9.3-1.fc25 04/01/2014
[  252.963007] task: 88025e6f6000 task.stack: c90001bd
[  252.963007] RIP: 0010:scsi_format_opcode_name+0x1a/0x1c0
[  252.963007] RSP: 0018:c90001bd3c50 EFLAGS: 00010286
[  252.963007] RAX: 4843 RBX: 0050 RCX: 
[  252.963007] RDX:  RSI: 0050 RDI: c90001bd3cd8
[  252.963007] RBP: c90001bd3c88 R08: 1000 R09: 
[  252.963007] R10: 880275134000 R11: 88027513406c R12: 0050
[  252.963007] R13: c90001bd3cd8 R14:  R15: 
[  252.963007] FS:  7f4d11762700() GS:88027fc4() 
knlGS:
[  252.963007] CS:  0010 DS:  ES:  CR0: 80050033
[  252.963007] CR2:  CR3: 00025e789003 CR4: 003606e0
[  252.963007] DR0:  DR1:  DR2: 
[  252.963007] DR3:  DR6: fffe0ff0 DR7: 0400
[  252.963007] Call Trace:
[  252.963007]  __scsi_format_command+0x27/0xc0
[  252.963007]  scsi_show_rq+0x5c/0xc0
[  252.963007]  ? seq_printf+0x4e/0x70
[  252.963007]  ? blk_flags_show+0x5b/0xf0
[  252.963007]  __blk_mq_debugfs_rq_show+0x116/0x130
[  252.963007]  blk_mq_debugfs_rq_show+0xe/0x10
[  252.963007]  seq_read+0xfe/0x3b0
[  252.963007]  ? __handle_mm_fault+0x631/0x1150
[  252.963007]  full_proxy_read+0x54/0x90
[  252.963007]  __vfs_read+0x37/0x160
[  252.963007]  ? security_file_permission+0x9b/0xc0
[  252.963007]  vfs_read+0x96/0x130
[  252.963007]  SyS_read+0x55/0xc0
[  252.963007]  entry_SYSCALL_64_fastpath+0x1a/0xa5
[  252.963007] RIP: 0033:0x7f4d1127e9b0
[  252.963007] RSP: 002b:7ffd27082568 EFLAGS: 0246 ORIG_RAX: 

[  252.963007] RAX: ffda RBX: 7f4d1154bb20 RCX: 7f4d1127e9b0
[  252.963007] RDX: 0002 RSI: 7f4d115a7000 RDI: 0003
[  252.963007] RBP: 00021010 R08:  R09: 
[  252.963007] R10: 037b R11: 0246 R12: 00022000
[  252.963007] R13: 7f4d1154bb78 R14: 1000 R15: 0002
[  252.963007] Code: c6 e8 1b ca 24 00 eb 8c e8 74 2c ae ff 0f 1f 40 00 0f 1f 
44 00 00 55 48 89 e5 41 56 41 55 41 54 53 49 89 fd 49 89 f4 48 83 ec 18 <44> 0f 
b6 32 48 c7 45 c8 00 00 00 00 65 48 8b 04 25 28 00 00 00
[  252.963007] RIP: scsi_format_opcode_name+0x1a/0x1c0 RSP: c90001bd3c50
[  252.963007] CR2: 
[  252.963007] ---[ end trace 83c5bddfbaa6573c ]---
[  252.963007] Kernel panic - not syncing: Fatal exception
[  252.963007] Dumping ftrace buffer:
[  252.963007](ftrace buffer empty)
[  252.963007] Kernel Offset: disabled
[  252.963007] ---[ end Kernel panic - not syncing: Fatal exception

Fixes: 0eebd005dd07 ("scsi: Implement blk_mq_ops.show_rq()")
Cc: Bart Van Assche 
Cc: Omar Sandoval 
Cc: Martin K. Petersen 
Cc: James Bottomley 
Cc: Hannes Reinecke 
Signed-off-by: Ming Lei 
---
V2:
   - fix typo
V3:
   - prefer to dump data and accept read-after-free
   - add some comment
V4:
   - read the two variables into local variables first, to avoid a
 free between the check and passing them to __scsi_format_command().
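
The diff below is truncated in this archive; as a rough sketch of the
V4 approach described above (snapshot the fields into locals before
use; the actual patch may differ in detail):

static void scsi_show_rq(struct seq_file *m, struct request *rq)
{
	struct scsi_cmnd *cmd = container_of(scsi_req(rq), typeof(*cmd), req);
	const u8 *cdb = READ_ONCE(cmd->cmnd);	/* may be freed under us */
	size_t cdb_len = cmd->cmd_len;
	char buf[80] = "(?)";

	/* Accept a read-after-free, but never dereference a NULL cmnd */
	if (cdb)
		__scsi_format_command(buf, sizeof(buf), cdb, cdb_len);
	seq_printf(m, ", .cmd=%s", buf);
}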


 drivers/scsi/scsi_debugfs.c | 14 +-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/scsi_debugfs.c b/drivers/scsi/scsi_debugfs.c
index 5e9755008aed..f8437c456458 

Re: nvme multipath support V7

2017-11-10 Thread Christoph Hellwig
I've updated the git tree with the Kconfig type fix, the instance
fixup from Keith (plus extensive comments explaining the issue) and
the various reviews.  I'll wait for the buildbot and do a little
more manual testing and will send it to Jens in the afternoon.


Re: [PATCH V13 05/10] mmc: cqhci: support for command queue enabled host

2017-11-10 Thread Linus Walleij
On Thu, Nov 9, 2017 at 1:55 PM, Adrian Hunter  wrote:
> On 09/11/17 14:26, Linus Walleij wrote:

 I think the above approach to put any CQE-specific callbacks
 directly into the struct mmc_host_ops is way more viable.
>>>
>>> Nothing to do with CQE.  This is CQHCI.  Please try to get the difference.
>>
>> I am trying, please try to think about your language.
>
> I strongly disapprove of being rude but sadly it seems to get results.

Conflict is a natural part of human existence. Using it deliberately
to get one's way is manipulation. Admitting to being manipulative
is losing one's face in public.

Yours,
Linus Walleij


Re: [PATCH V13 08/10] mmc: block: blk-mq: Separate card polling from recovery

2017-11-10 Thread Linus Walleij
On Thu, Nov 9, 2017 at 2:02 PM, Adrian Hunter  wrote:
> On 09/11/17 14:52, Linus Walleij wrote:
>> On Thu, Nov 9, 2017 at 8:56 AM, Adrian Hunter  
>> wrote:
>>> On 08/11/17 11:30, Linus Walleij wrote:
 On Fri, Nov 3, 2017 at 2:20 PM, Adrian Hunter  
 wrote:

> Recovery is simpler to understand if it is only used for errors. Create a
> separate function for card polling.
>
> Signed-off-by: Adrian Hunter 

 This looks good but I can't see why it's not folded into
 patch 3 already. This error handling is introduced there.
>>>
>>> What are you on about?
>>
>> You are attacking your most valuable resource, a reviewer.
>>
>> And I even said the patch looks good.
>>
>> The only thing you attain with this kind of language is alienating
>> me and discouraging others from reviewing your patch set. You also
>> give your employer a bad name, since you are representing
>> them.
>
> 6 months of being messed around will do that.

Blessed are the meek, for they will inherit the earth.

Nobody is after you, this is just hard stuff and too few people care to
help out with review etc. With this and with the block layer as well.
That makes things slow. It's no one's agenda to slow things down.

Yours,
Linus Walleij


Re: [PATCH] block: avoid null pointer dereference on null disk

2017-11-10 Thread Christoph Hellwig
Looks good,

Reviewed-by: Christoph Hellwig 


Re: [PATCH 5/7] nvme: also expose the namespace identification sysfs files for mpath nodes

2017-11-10 Thread Hannes Reinecke
On 11/09/2017 06:44 PM, Christoph Hellwig wrote:
> We do this by adding a helper that returns the ns_head for a device that
> can belong to either the per-controller or per-subsystem block device
> nodes, and otherwise reuse all the existing code.
> 
> Signed-off-by: Christoph Hellwig 
> Reviewed-by: Keith Busch 
> Reviewed-by: Sagi Grimberg 
> Reviewed-by: Johannes Thumshirn 
> ---
>  drivers/nvme/host/core.c  | 57 
> +++
>  drivers/nvme/host/multipath.c |  6 +
>  drivers/nvme/host/nvme.h  |  1 +
>  3 files changed, 38 insertions(+), 26 deletions(-)
> 
In principle, yes, but I would like to have some more info (like
subsysnqn, and maybe 'model' etc) in the subsys sysfs entry, too.
But this can be handled in a later patch, so:

Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
-- 
Dr. Hannes ReineckeTeamlead Storage & Networking
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)


Re: [PATCH V13 06/10] mmc: sdhci-pci: Add CQHCI support for Intel GLK

2017-11-10 Thread Linus Walleij
On Thu, Nov 9, 2017 at 8:12 AM, Adrian Hunter  wrote:
> On 08/11/17 11:24, Linus Walleij wrote:
>> On Fri, Nov 3, 2017 at 2:20 PM, Adrian Hunter  
>> wrote:
>>
>>> Add CQHCI initialization and implement CQHCI operations for Intel GLK.
>>>
>>> Signed-off-by: Adrian Hunter 
>>
>> This patch seems OK in context, but it merely illustrates the
>> weirdness of .[runtime]_suspend/resume calling into CQE-specific
>> APIs rather than using generic host callbacks.
>
> Your comment makes no sense at all.  The host driver has
> [runtime]_suspend/resume callbacks and it is up to the host driver to decide
> what to do.  CQHCI provides helpers since that is the whole point of having
> a CQHCI library.

Yeah I didn't see it in context, the CQHCI library for these hosts
makes a lot of sense.

Yours,
Linus Walleij