[PATCH V4 14/14] blk-mq: improve bio merge from blk-mq sw queue

2017-09-02 Thread Ming Lei
This patch uses a hash table to do bio merge from the sw queue, so that we align with the way blk-mq scheduler/block legacy do bio merge. It turns out that bio merge via hash table is more efficient than the simple merge on the last 8 requests in the sw queue. On SCSI SRP, a ~10% IOPS increase is observed in
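As a rough illustration of why a hash lookup can beat scanning the last few requests, here is a minimal userspace sketch of a back-merge lookup keyed by each request's end sector. It is not the code from this patch: every `toy_*` name, the bucket count, and the hash constant are assumptions made up for the example, and only back merges are handled.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_HASH_BUCKETS 64u   /* assumption: arbitrary power-of-two size */

/* Toy request covering the sector range [start, start + nr_sectors). */
struct toy_rq {
    uint64_t start;
    uint32_t nr_sectors;
    struct toy_rq *hash_next;  /* singly linked chain within one bucket */
};

struct toy_rqhash {
    struct toy_rq *buckets[TOY_HASH_BUCKETS];
};

/* Hash key: the first sector after the request. */
static uint64_t toy_rq_end(const struct toy_rq *rq)
{
    return rq->start + rq->nr_sectors;
}

/* Multiplicative hash; the top 6 bits select one of 64 buckets. */
static unsigned int toy_bucket(uint64_t sector)
{
    return (unsigned int)((sector * UINT64_C(0x9E3779B97F4A7C15)) >> 58);
}

static void toy_rqhash_add(struct toy_rqhash *h, struct toy_rq *rq)
{
    unsigned int b = toy_bucket(toy_rq_end(rq));

    rq->hash_next = h->buckets[b];
    h->buckets[b] = rq;
}

static void toy_rqhash_del(struct toy_rqhash *h, struct toy_rq *rq)
{
    struct toy_rq **pp = &h->buckets[toy_bucket(toy_rq_end(rq))];

    while (*pp && *pp != rq)
        pp = &(*pp)->hash_next;
    if (*pp)
        *pp = rq->hash_next;
}

/* Find a request that ends exactly where a bio starting at bio_start begins. */
static struct toy_rq *toy_rqhash_find(struct toy_rqhash *h, uint64_t bio_start)
{
    struct toy_rq *rq;

    for (rq = h->buckets[toy_bucket(bio_start)]; rq; rq = rq->hash_next)
        if (toy_rq_end(rq) == bio_start)
            return rq;
    return NULL;
}

/* Try a back merge; returns 1 if the bio was appended to a request. */
static int toy_try_back_merge(struct toy_rqhash *h, uint64_t bio_start,
                              uint32_t bio_sectors)
{
    struct toy_rq *rq = toy_rqhash_find(h, bio_start);

    if (!rq)
        return 0;
    toy_rqhash_del(h, rq);         /* the end sector changes, so rehash */
    rq->nr_sectors += bio_sectors;
    toy_rqhash_add(h, rq);
    return 1;
}
```

With this layout, finding a merge candidate costs one bucket walk regardless of how many requests are queued, whereas a bounded scan of the most recent requests misses any candidate that is older than the scan window.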

[PATCH V4 12/14] block: introduce .last_merge and .hash to blk_mq_ctx

2017-09-02 Thread Ming Lei
Prepare for supporting bio merge to sw queue if no blk-mq io scheduler is taken. Signed-off-by: Ming Lei --- block/blk-mq.h | 4 block/blk.h | 3 +++ block/elevator.c | 22 +++--- 3 files changed, 26 insertions(+), 3 deletions(-) diff --git

[PATCH V4 13/14] blk-mq-sched: refactor blk_mq_sched_try_merge()

2017-09-02 Thread Ming Lei
This patch introduces one function __blk_mq_try_merge() which will be reused for bio merge to sw queue in the following patch. No functional change. Reviewed-by: Bart Van Assche Signed-off-by: Ming Lei --- block/blk-mq-sched.c | 18

[PATCH V4 10/14] block: move actual bio merge code into __elv_merge

2017-09-02 Thread Ming Lei
So that we can reuse __elv_merge() to merge bio into requests from sw queue in the following patches. Signed-off-by: Ming Lei --- block/elevator.c | 19 +-- 1 file changed, 13 insertions(+), 6 deletions(-) diff --git a/block/elevator.c b/block/elevator.c

[PATCH V4 11/14] block: add check on elevator for supporting bio merge via hashtable from blk-mq sw queue

2017-09-02 Thread Ming Lei
blk_mq_sched_try_merge() will be reused in following patches to support bio merge to blk-mq sw queue, so add checks to related functions which are called from blk_mq_sched_try_merge(). Signed-off-by: Ming Lei --- block/elevator.c | 16 1 file changed, 16

[PATCH V4 09/14] block: introduce rqhash helpers

2017-09-02 Thread Ming Lei
We need these helpers to support using a hashtable to improve bio merge from sw queue in the following patches. No functional change. Signed-off-by: Ming Lei --- block/blk.h | 52 block/elevator.c | 36

[PATCH V4 08/14] blk-mq-sched: use q->queue_depth as hint for q->nr_requests

2017-09-02 Thread Ming Lei
SCSI sets q->queue_depth from shost->cmd_per_lun, and q->queue_depth is per request_queue and more related to scheduler queue compared with hw queue depth, which can be shared by queues, such as TAG_SHARED. This patch tries to use q->queue_depth as hint for computing q->nr_requests, which should

[PATCH V4 03/14] blk-mq: introduce blk_mq_dispatch_rq_from_ctx()

2017-09-02 Thread Ming Lei
This function is introduced for dequeuing request from sw queue so that we can dispatch it in scheduler's way. More importantly, some SCSI devices may set q->queue_depth, which is a per-request_queue limit, and applied on pending I/O from all hctxs. This function is introduced for avoiding to

[PATCH V4 06/14] blk-mq-sched: don't dequeue request until all in ->dispatch are flushed

2017-09-02 Thread Ming Lei
During dispatching, we moved all requests from hctx->dispatch to one temporary list, then dispatch them one by one from this list. Unfortunately during this period, run queue from other contexts may think the queue is idle, then start to dequeue from sw/scheduler queue and still try to dispatch

[PATCH V4 07/14] blk-mq-sched: introduce blk_mq_sched_queue_depth()

2017-09-02 Thread Ming Lei
The following patch will use one hint to figure out default queue depth for scheduler queue, so introduce the helper of blk_mq_sched_queue_depth() for this purpose. Reviewed-by: Christoph Hellwig Reviewed-by: Bart Van Assche Signed-off-by: Ming Lei

[PATCH V4 04/14] blk-mq-sched: move actual dispatching into one helper

2017-09-02 Thread Ming Lei
So that it becomes easy to support to dispatch from sw queue in the following patch. No functional change. Reviewed-by: Bart Van Assche Signed-off-by: Ming Lei --- block/blk-mq-sched.c | 28 ++-- 1 file changed, 18

[PATCH V4 05/14] blk-mq-sched: improve dispatching from sw queue

2017-09-02 Thread Ming Lei
SCSI devices use host-wide tagset, and the shared driver tag space is often quite big. Meantime there is also queue depth for each lun(.cmd_per_lun), which is often small. So lots of requests may stay in sw queue, and we always flush all belonging to same hw queue and dispatch them all to driver,

[PATCH V4 02/14] sbitmap: introduce __sbitmap_for_each_set()

2017-09-02 Thread Ming Lei
We need to iterate ctx starting from any ctx in round robin way, so introduce this helper. Cc: Omar Sandoval Signed-off-by: Ming Lei --- include/linux/sbitmap.h | 54 - 1 file changed, 40 insertions(+), 14
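To make the round-robin idea concrete, here is a minimal sketch that visits the set bits of a flat bitmap starting from an arbitrary bit and wrapping around to bit 0. It is a toy stand-in, not the kernel's `struct sbitmap` or the helper added by this patch: the function name, the callback type, and the `visit_log` structure are all hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback type: return false to stop the walk early. */
typedef bool (*bit_visit_fn)(unsigned int bit, void *data);

/*
 * Visit every set bit among the first nr_bits bits of 'words', beginning
 * at bit 'start' and wrapping around, so each bit is examined exactly once.
 */
static void for_each_set_bit_from(const uint64_t *words, unsigned int nr_bits,
                                  unsigned int start, bit_visit_fn fn,
                                  void *data)
{
    unsigned int i;

    if (nr_bits == 0)
        return;
    start %= nr_bits;
    for (i = 0; i < nr_bits; i++) {
        unsigned int bit = (start + i) % nr_bits;

        if ((words[bit / 64] & (UINT64_C(1) << (bit % 64))) && !fn(bit, data))
            return;
    }
}

/* Example visitor state: records the visited bits in order. */
struct visit_log {
    unsigned int bits[64];
    size_t count;
};

static bool record_bit(unsigned int bit, void *data)
{
    struct visit_log *log = data;

    if (log->count >= 64)
        return false;              /* log full: stop iterating */
    log->bits[log->count++] = bit;
    return true;
}
```

Starting each walk at a different offset keeps one software queue from always being served first, which is the fairness property a round-robin dispatch over ctxs would rely on. This sketch tests bits one at a time for clarity; a real implementation would skip whole empty words.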

[PATCH V4 01/14] blk-mq-sched: fix scheduler bad performance

2017-09-02 Thread Ming Lei
When hw queue is busy, we shouldn't take requests from scheduler queue any more, otherwise it is difficult to do IO merge. This patch fixes the awful IO performance on some SCSI devices(lpfc, qla2xxx, ...) when mq-deadline/kyber is used by not taking requests if hw queue is busy. Reviewed-by:

Re: [PATCH V3 0/8] block/scsi: safe SCSI quiescing

2017-09-02 Thread Oleksandr Natalenko
Again, Tested-by: Oleksandr Natalenko On Saturday, 2 September 2017 15:08:32 CEST Ming Lei wrote: > Hi, > > The current SCSI quiesce isn't safe and easy to trigger I/O deadlock. > > Once SCSI device is put into QUIESCE, no new request except for RQF_PREEMPT > can be

Re: [PATCH V3 7/8] block: allow to allocate req with REQF_PREEMPT when queue is preempt frozen

2017-09-02 Thread Ming Lei
On Sat, Sep 02, 2017 at 09:08:39PM +0800, Ming Lei wrote: > REQF_PREEMPT is a bit special because the request is required > to be dispatched to lld even when SCSI device is quiesced. > > So this patch introduces __blk_get_request() to allow block > layer to allocate request when queue is preempt

[PATCH V3 6/8] block: introduce preempt version of blk_[freeze|unfreeze]_queue

2017-09-02 Thread Ming Lei
The two APIs are required to allow request allocation of RQF_PREEMPT when queue is preempt frozen. The following two points have to be guaranteed for one queue: 1) preempt freezing can be started only after all in-progress normal & preempt freezings are completed 2) normal freezing can be

[PATCH V3 8/8] SCSI: preempt freeze block queue when SCSI device is put into quiesce

2017-09-02 Thread Ming Lei
Simply quiescing SCSI device and waiting for completion of IO dispatched to SCSI queue isn't safe, it is easy to use up requests because all these allocated requests can't be dispatched when device is put in QUIESCE. Then no request can be allocated for RQF_PREEMPT, and system may hang somewhere,

[PATCH V3 5/8] block: tracking request allocation with q_usage_counter

2017-09-02 Thread Ming Lei
This usage is basically same with blk-mq, so that we can support to freeze queue easily. Signed-off-by: Ming Lei --- block/blk-core.c | 8 1 file changed, 8 insertions(+) diff --git a/block/blk-core.c b/block/blk-core.c index ce2d3b6f6c62..85b15833a7a5 100644 ---

[PATCH V3 4/8] blk-mq: rename blk_mq_freeze_queue_wait as blk_freeze_queue_wait

2017-09-02 Thread Ming Lei
The only change on legacy is that blk_drain_queue() is run from blk_freeze_queue(), which is called in blk_cleanup_queue(). So this patch removes the explicit __blk_drain_queue() in blk_cleanup_queue(). Signed-off-by: Ming Lei --- block/blk-core.c | 17

[PATCH V3 7/8] block: allow to allocate req with REQF_PREEMPT when queue is preempt frozen

2017-09-02 Thread Ming Lei
REQF_PREEMPT is a bit special because the request is required to be dispatched to lld even when SCSI device is quiesced. So this patch introduces __blk_get_request() to allow block layer to allocate request when queue is preempt frozen, since we will preempt freeze queue before quiescing SCSI

[PATCH V3 3/8] blk-mq: only run hw queues for blk-mq

2017-09-02 Thread Ming Lei
This patch just makes it explicit. Reviewed-by: Johannes Thumshirn Signed-off-by: Ming Lei --- block/blk-mq.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/block/blk-mq.c b/block/blk-mq.c index 8cf1f7cbef2b..4c532d8612e1

[PATCH V3 0/8] block/scsi: safe SCSI quiescing

2017-09-02 Thread Ming Lei
Hi, The current SCSI quiesce isn't safe and easy to trigger I/O deadlock. Once SCSI device is put into QUIESCE, no new request except for RQF_PREEMPT can be dispatched to SCSI successfully, and scsi_device_quiesce() just simply waits for completion of I/Os dispatched to SCSI stack. It isn't

[PATCH V3 1/8] blk-mq: rename blk_mq_unfreeze_queue as blk_unfreeze_queue

2017-09-02 Thread Ming Lei
We will support to freeze queue on block legacy path too. Signed-off-by: Ming Lei --- block/blk-cgroup.c | 4 ++-- block/blk-mq.c | 10 +- block/elevator.c | 2 +- drivers/block/loop.c | 8 drivers/nvme/host/core.c | 4 ++--

[PATCH V3 2/8] blk-mq: rename blk_mq_freeze_queue as blk_freeze_queue

2017-09-02 Thread Ming Lei
These APIs will be used by legacy path too. Signed-off-by: Ming Lei --- block/bfq-iosched.c | 2 +- block/blk-cgroup.c | 4 ++-- block/blk-mq.c | 17 - block/blk-mq.h | 1 - block/elevator.c | 2 +-

Re: [PATCH V2 0/8] block/scsi: safe SCSI quiescing

2017-09-02 Thread Oleksandr Natalenko
With regard to suspend/resume cycle: Tested-by: Oleksandr Natalenko On Friday, 1 September 2017 20:49:49 CEST Ming Lei wrote: > Hi, > > The current SCSI quiesce isn't safe and easy to trigger I/O deadlock. > > Once SCSI device is put into QUIESCE, no new request except for