Re: [PATCH 2/2] blk-mq: add flag for drivers wanting blocking ->queue_rq()

2016-09-22 Thread Josef Bacik

> On Sep 22, 2016, at 9:44 PM, Ming Lei  wrote:
> 
>> On Thu, Sep 22, 2016 at 11:17 PM, Jens Axboe  wrote:
>>> On 09/22/2016 09:12 AM, Christoph Hellwig wrote:
>>> 
 On Thu, Sep 22, 2016 at 09:03:56AM -0600, Jens Axboe wrote:
 
 Having to grab a mutex, for instance. We invoke ->queue_rq() with
 preemption disabled, so I'd hope that would not be the case... What
 drivers block in ->queue_rq?
>>> 
>>> 
>>> I thought I had converted a lot of them to GFP_NOIO instead of GFP_ATOMIC
>>> allocations, but I can't find any evidence of that.  Maybe it was just
>>> my imagination, or an unsubmitted patch series.  Sorry for the
>>> confusion.
>> 
>> 
>> OK, that makes more sense. Pretty sure we would have had complaints!
>> 
 Loop was another case that was on my radar to get rid of the queue_work
 it currently has to do. Josef is currently testing the nbd driver using
 this approach, so we should get some numbers there too. If we have to,
 we can always bump up the concurrency to mimic more of the behavior of
 having multiple workers running on the hardware queue. I'd prefer to
 still do that in blk-mq, instead of having drivers reinventing their own
 work offload functionality.
>>> 
>>> 
>>> There should be a lot of numbers in the list archives from the time
>>> that Ming did the loop conversion, as I've been trying to steer him
>>> that way, and he actually implemented and benchmarked it.
>>> 
>>> We can't just increase the concurrency as a single work_struct item
>> can't be queued multiple times even on a high concurrency workqueue.
>> 
>> 
>> But we could have more work items, if we had to. Even if loop isn't a
>> drop-in replacement for this simpler approach, I think it'll work well
>> enough for nbd. The 5% number from Josef is comparing to not having any
>> offload at all, I suspect the number from just converting to queue_work
>> in the driver would be similar.
> 
> 5% might just be noise.
> 
> From nbd's .queue_rq(), kthread_work should be a good match, because
> a per-nbd mutex is required and all cmds sent to the nbd are serialized,
> so using common work items may hurt performance through
> unnecessary context switches.

This is with my multi-connection patch with 4 connections per device (so 4 hw 
queues).  The 5% is real, but I don't think it'll get better by punting to a 
worker per command, so this approach is fine for NBD.  Thanks,

Josef
--
To unsubscribe from this list: send the line "unsubscribe linux-block" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] blk-mq: add flag for drivers wanting blocking ->queue_rq()

2016-09-22 Thread Ming Lei
On Thu, Sep 22, 2016 at 11:17 PM, Jens Axboe  wrote:
> On 09/22/2016 09:12 AM, Christoph Hellwig wrote:
>>
>> On Thu, Sep 22, 2016 at 09:03:56AM -0600, Jens Axboe wrote:
>>>
>>> Having to grab a mutex, for instance. We invoke ->queue_rq() with
>>> preemption disabled, so I'd hope that would not be the case... What
>>> drivers block in ->queue_rq?
>>
>>
>> I thought I had converted a lot of them to GFP_NOIO instead of GFP_ATOMIC
>> allocations, but I can't find any evidence of that.  Maybe it was just
>> my imagination, or an unsubmitted patch series.  Sorry for the
>> confusion.
>
>
> OK, that makes more sense. Pretty sure we would have had complaints!
>
>>> Loop was another case that was on my radar to get rid of the queue_work
>>> it currently has to do. Josef is currently testing the nbd driver using
>>> this approach, so we should get some numbers there too. If we have to,
>>> we can always bump up the concurrency to mimic more of the behavior of
>>> having multiple workers running on the hardware queue. I'd prefer to
>>> still do that in blk-mq, instead of having drivers reinventing their own
>>> work offload functionality.
>>
>>
>> There should be a lot of numbers in the list archives from the time
>> that Ming did the loop conversion, as I've been trying to steer him
>> that way, and he actually implemented and benchmarked it.
>>
>> We can't just increase the concurrency as a single work_struct item
>> can't be queued multiple times even on a high concurrency workqueue.
>
>
> But we could have more work items, if we had to. Even if loop isn't a
> drop-in replacement for this simpler approach, I think it'll work well
> enough for nbd. The 5% number from Josef is comparing to not having any
> offload at all, I suspect the number from just converting to queue_work
> in the driver would be similar.

5% might just be noise.

From nbd's .queue_rq(), kthread_work should be a good match, because
a per-nbd mutex is required and all cmds sent to the nbd are serialized,
so using common work items may hurt performance through
unnecessary context switches.

thanks,


Re: [PATCH 2/2] blk-mq: add flag for drivers wanting blocking ->queue_rq()

2016-09-22 Thread Ming Lei
On Thu, Sep 22, 2016 at 11:12 PM, Christoph Hellwig  wrote:
> On Thu, Sep 22, 2016 at 09:03:56AM -0600, Jens Axboe wrote:
>> Having to grab a mutex, for instance. We invoke ->queue_rq() with
>> preemption disabled, so I'd hope that would not be the case... What
>> drivers block in ->queue_rq?
>
> I thought I had converted a lot of them to GFP_NOIO instead of GFP_ATOMIC
> allocations, but I can't find any evidence of that.  Maybe it was just
> my imagination, or an unsubmitted patch series.  Sorry for the
> confusion.
>
>> Loop was another case that was on my radar to get rid of the queue_work
>> it currently has to do. Josef is currently testing the nbd driver using
>> this approach, so we should get some numbers there too. If we have to,
>> we can always bump up the concurrency to mimic more of the behavior of
>> having multiple workers running on the hardware queue. I'd prefer to
>> still do that in blk-mq, instead of having drivers reinventing their own
>> work offload functionality.
>
> There should be a lot of numbers in the list archives from the time
> that Ming did the loop conversion, as I've been trying to steer him
> that way, and he actually implemented and benchmarked it.
>
> We can't just increase the concurrency as a single work_struct item
> can't be queued multiple times even on a high concurrency workqueue.

Yes, that is it, and that is why we couldn't convert to this approach last time.

Thanks,


Re: [PATCH 2/2] blk-mq: add flag for drivers wanting blocking ->queue_rq()

2016-09-22 Thread Josef Bacik

On 09/22/2016 10:59 AM, Christoph Hellwig wrote:

On Thu, Sep 22, 2016 at 08:53:00AM -0600, Jens Axboe wrote:

If a driver sets BLK_MQ_F_BLOCKING, it is allowed to block in its
->queue_rq() handler. For that case, blk-mq ensures that it is always
called from a safe context.


First, can you provide a more useful definition of blocking?  Lots of current
drivers will already block in ->queue_rq, mostly to allocate memory.



NBD needs to take a mutex before it sends.  This patch was born out of a kbuild 
error I got because of a schedule-while-atomic bug; this was the backtrace:


[   17.698207] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:620

[   17.700903] in_atomic(): 1, irqs_disabled(): 0, pid: 1050, name: mount
[   17.702350] 1 lock held by mount/1050:
[   17.703395]  #0:  (>s_umount_key#20/1){+.+.+.}, at: [] sget_userns+0x2f7/0x4b0

[   17.706064] CPU: 0 PID: 1050 Comm: mount Not tainted 4.8.0-rc4-7-gfd8383fd #1
[   17.708149] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Debian-1.8.2-1 04/01/2014

[   17.710523]   880034dd39f0 8178e23d 8800350e4d00
[   17.712773]  041a 880034dd3a18 81108c39 839f1476
[   17.714978]  026c  880034dd3a40 81108cb5
[   17.717224] Call Trace:
[   17.718130]  [] dump_stack+0x82/0xb8
[   17.719474]  [] ___might_sleep+0x1bd/0x1c4
[   17.720813]  [] __might_sleep+0x75/0x7c
[   17.722100]  [] mutex_lock_nested+0x3e/0x396
[   17.723449]  [] ? mod_timer+0x10/0x12
[   17.724705]  [] ? blk_add_timer+0xe0/0xe5
[   17.726088]  [] nbd_queue_rq+0x7b/0x14b
[   17.727367]  [] ? nbd_queue_rq+0x7b/0x14b
[   17.728653]  [] __blk_mq_run_hw_queue+0x1c7/0x2c8
[   17.730118]  [] blk_mq_run_hw_queue+0x5e/0xbb
[   17.731454]  [] blk_sq_make_request+0x1a1/0x1ba
[   17.732806]  [] generic_make_request+0xbd/0x160
[   17.734153]  [] submit_bio+0xf6/0xff
[   17.735365]  [] submit_bh_wbc+0x136/0x143
[   17.736719]  [] submit_bh+0x10/0x12
[   17.737888]  [] __bread_gfp+0x50/0x6f
[   17.739212]  [] ext4_fill_super+0x1f4/0x27ec
[   17.740492]  [] ? vsnprintf+0x22d/0x3b7
[   17.741685]  [] mount_bdev+0x144/0x197
[   17.742928]  [] ? ext4_calculate_overhead+0x2bd/0x2bd
[   17.744275]  [] ext4_mount+0x15/0x17
[   17.745406]  [] mount_fs+0x67/0x131
[   17.746518]  [] vfs_kern_mount+0x6b/0xdb
[   17.747675]  [] do_mount+0x708/0x97d
[   17.748833]  [] ? __might_fault+0x7e/0x84
[   17.750074]  [] ? strndup_user+0x3f/0x53
[   17.751198]  [] compat_SyS_mount+0x185/0x1ae
[   17.752433]  [] do_int80_syscall_32+0x5c/0x6b
[   17.753592]  [] entry_INT80_compat+0x38/0x50
[   17.754952] block nbd11: Attempted send on closed socket


Second, we looked at something similar a few times before, most notably
when converting loop and rbd, and came to the conclusion that
performance sucks when we only have that single per-hctx work struct
for a busy device.  I think Ming has done a lot of the benchmarking,
so I've added him to Cc.



Yeah, this looks to be about a 5% hit in my microbenchmarks for NBD, but nobody 
accuses NBD of being particularly fast either, so I wasn't too worried about it. 
If we want to say that we can block in ->queue_rq, that would make me happy too. 
Thanks,


Josef


Re: [PATCH 2/2] blk-mq: add flag for drivers wanting blocking ->queue_rq()

2016-09-22 Thread Jens Axboe

On 09/22/2016 08:59 AM, Christoph Hellwig wrote:

On Thu, Sep 22, 2016 at 08:53:00AM -0600, Jens Axboe wrote:

If a driver sets BLK_MQ_F_BLOCKING, it is allowed to block in its
->queue_rq() handler. For that case, blk-mq ensures that it is always
called from a safe context.


First, can you provide a more useful definition of blocking?  Lots of current
drivers will already block in ->queue_rq, mostly to allocate memory.


Having to grab a mutex, for instance. We invoke ->queue_rq() with
preemption disabled, so I'd hope that would not be the case... What
drivers block in ->queue_rq?


Second, we looked at something similar a few times before, most notably
when converting loop and rbd, and came to the conclusion that
performance sucks when we only have that single per-hctx work struct
for a busy device.  I think Ming has done a lot of the benchmarking,
so I've added him to Cc.


Loop was another case that was on my radar to get rid of the queue_work
it currently has to do. Josef is currently testing the nbd driver using
this approach, so we should get some numbers there too. If we have to,
we can always bump up the concurrency to mimic more of the behavior of
having multiple workers running on the hardware queue. I'd prefer to
still do that in blk-mq, instead of having drivers reinventing their own
work offload functionality.

--
Jens Axboe



Re: [PATCH 2/2] blk-mq: add flag for drivers wanting blocking ->queue_rq()

2016-09-22 Thread Christoph Hellwig
On Thu, Sep 22, 2016 at 08:53:00AM -0600, Jens Axboe wrote:
> If a driver sets BLK_MQ_F_BLOCKING, it is allowed to block in its
> ->queue_rq() handler. For that case, blk-mq ensures that it is always
> called from a safe context.

First, can you provide a more useful definition of blocking?  Lots of current
drivers will already block in ->queue_rq, mostly to allocate memory.

Second, we looked at something similar a few times before, most notably
when converting loop and rbd, and came to the conclusion that
performance sucks when we only have that single per-hctx work struct
for a busy device.  I think Ming has done a lot of the benchmarking,
so I've added him to Cc.