[PATCH v2] lightnvm: pblk: ignore the smeta oob area scan

2018-10-25 Thread Zhoujie Wu
The smeta area has no L2P mapping, and the recovery procedure only
needs to restore the data sectors' L2P mappings, so skip the smeta
sectors during the OOB scan.

Signed-off-by: Zhoujie Wu 
---
v2: Modified based on a suggestion from Hans: smeta may not start at
paddr 0 if the first block is bad, so use pblk_line_smeta_start() to
calculate the smeta start address.

 drivers/lightnvm/pblk-recovery.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/lightnvm/pblk-recovery.c b/drivers/lightnvm/pblk-recovery.c
index 5740b75..0fbd30e 100644
--- a/drivers/lightnvm/pblk-recovery.c
+++ b/drivers/lightnvm/pblk-recovery.c
@@ -334,6 +334,7 @@ static int pblk_recov_scan_oob(struct pblk *pblk, struct 
pblk_line *line,
   struct pblk_recov_alloc p)
 {
struct nvm_tgt_dev *dev = pblk->dev;
+   struct pblk_line_meta *lm = &pblk->lm;
struct nvm_geo *geo = &dev->geo;
struct ppa_addr *ppa_list;
struct pblk_sec_meta *meta_list;
@@ -342,12 +343,12 @@ static int pblk_recov_scan_oob(struct pblk *pblk, struct 
pblk_line *line,
void *data;
dma_addr_t dma_ppa_list, dma_meta_list;
__le64 *lba_list;
-   u64 paddr = 0;
+   u64 paddr = pblk_line_smeta_start(pblk, line) + lm->smeta_sec;
bool padded = false;
int rq_ppas, rq_len;
int i, j;
int ret;
-   u64 left_ppas = pblk_sec_in_open_line(pblk, line);
+   u64 left_ppas = pblk_sec_in_open_line(pblk, line) - lm->smeta_sec;
 
if (pblk_line_wp_is_unbalanced(pblk, line))
pblk_warn(pblk, "recovering unbalanced line (%d)\n", line->id);
-- 
1.9.1



Re: [EXT] Re: [PATCH] lightnvm: pblk: ignore the smeta oob area scan

2018-10-25 Thread Zhoujie Wu




On 10/25/2018 04:16 AM, Hans Holmberg wrote:

On Thu, Oct 25, 2018 at 2:44 AM Zhoujie Wu  wrote:

The smeta area l2p mapping is empty, and actually the
recovery procedure only need to restore data sector's l2p
mapping. So ignore the smeta oob scan.

Signed-off-by: Zhoujie Wu 
---
  drivers/lightnvm/pblk-recovery.c | 5 +++--
  1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/lightnvm/pblk-recovery.c b/drivers/lightnvm/pblk-recovery.c
index 5740b75..30f2616 100644
--- a/drivers/lightnvm/pblk-recovery.c
+++ b/drivers/lightnvm/pblk-recovery.c
@@ -334,6 +334,7 @@ static int pblk_recov_scan_oob(struct pblk *pblk, struct 
pblk_line *line,
struct pblk_recov_alloc p)
  {
 struct nvm_tgt_dev *dev = pblk->dev;
+   struct pblk_line_meta *lm = &pblk->lm;
 struct nvm_geo *geo = &dev->geo;
 struct ppa_addr *ppa_list;
 struct pblk_sec_meta *meta_list;
@@ -342,12 +343,12 @@ static int pblk_recov_scan_oob(struct pblk *pblk, struct 
pblk_line *line,
 void *data;
 dma_addr_t dma_ppa_list, dma_meta_list;
 __le64 *lba_list;
-   u64 paddr = 0;
+   u64 paddr = lm->smeta_sec;

Smeta is not guaranteed to start at paddr 0 - it will be placed in the
first non-bad chunk (in stripe order).
If the first chunk in the line is bad, smeta will be read and
lm->smeta_sec sectors will be lost.

You can use pblk_line_smeta_start to calculate the start address of smeta.

/ Hans
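
For illustration, a minimal standalone sketch of the idea behind
pblk_line_smeta_start(): smeta lives in the first non-bad chunk of the
line (in stripe order), so its start paddr is that chunk's index times
the sectors per chunk. This is a hypothetical sketch, not the driver's
actual helper.

#include <stdbool.h>
#include <stdint.h>

/* Return the line-relative paddr of the first good chunk, i.e. where
 * smeta is written. chunk_is_bad[] marks the line's bad blocks.
 */
static uint64_t smeta_start_sketch(const bool *chunk_is_bad, int nr_chunks,
				   uint64_t secs_per_chunk)
{
	for (int i = 0; i < nr_chunks; i++)
		if (!chunk_is_bad[i])
			return (uint64_t)i * secs_per_chunk;

	return 0;	/* no usable chunk in this line */
}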
Good point, I will submit a v2 patch based on your suggestion. This
reminds me of a similar issue in pblk_line_wp_is_unbalanced(): in the
current 4.20 branch that function checks whether any other block's
write pointer is larger than blk0's, and if so it considers the line
unbalanced. If blk0 is a bad block, its write pointer can be 0, so such
a line will always be treated as unbalanced and a warning reported.
Looks like this also has to be fixed?
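
To make the concern concrete, a simplified, hypothetical sketch of the
check being described (not the actual 4.20 code):

#include <stdbool.h>
#include <stdint.h>

struct chunk_sketch {
	uint64_t wp;	/* chunk write pointer; 0 for an unwritten or bad chunk */
};

/* Every chunk's write pointer is compared against chunk 0's. If chunk 0
 * is a bad block its wp stays 0, so any written chunk makes the line
 * look "unbalanced" even though it is fine.
 */
static bool line_wp_unbalanced_sketch(const struct chunk_sketch *chunks,
				      int nr_chunks)
{
	uint64_t line_wp = chunks[0].wp;

	for (int i = 1; i < nr_chunks; i++)
		if (chunks[i].wp > line_wp)
			return true;

	return false;
}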



 bool padded = false;
 int rq_ppas, rq_len;
 int i, j;
 int ret;
-   u64 left_ppas = pblk_sec_in_open_line(pblk, line);
+   u64 left_ppas = pblk_sec_in_open_line(pblk, line) - lm->smeta_sec;

 if (pblk_line_wp_is_unbalanced(pblk, line))
 pblk_warn(pblk, "recovering unbalanced line (%d)\n", line->id);
--
1.9.1





Re: [PATCH 0/2] blktests: New loop tests

2018-10-25 Thread Omar Sandoval
On Thu, Oct 18, 2018 at 12:31:45PM +0200, Jan Kara wrote:
> 
> Hello,
> 
> these two patches create two new tests for blktests as regression tests
> for my recently posted loopback device fixes. More details in individual
> patches.

Thanks, Jan. I applied 007, renamed to 006.


[PATCH 10/14] blk-mq: initial support for multiple queue maps

2018-10-25 Thread Jens Axboe
Add a queue offset to the queue map. This enables users to map
iteratively, for each queue map type they support.

Bump the maximum number of supported maps to 2; we're now fully
able to support more than one map.
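
As a rough illustration of how a driver's ->map_queues() might use the
offset once it has sized each map (a hedged sketch against the
interfaces introduced in this series, not code from the patch):

/* Split the hardware queues into a read map and a write map; the second
 * map's hctx numbers start where the first map ends. Each map's
 * ->nr_queues is assumed to have been set by the driver already.
 */
static int example_map_queues(struct blk_mq_tag_set *set)
{
	struct blk_mq_queue_map *rmap = &set->map[0];
	struct blk_mq_queue_map *wmap = &set->map[1];

	rmap->queue_offset = 0;
	blk_mq_map_queues(rmap);

	wmap->queue_offset = rmap->nr_queues;	/* write hctxs follow reads */
	blk_mq_map_queues(wmap);

	return 0;
}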

Signed-off-by: Jens Axboe 
---
 block/blk-mq-cpumap.c  | 9 +
 block/blk-mq-pci.c | 2 +-
 block/blk-mq-virtio.c  | 2 +-
 include/linux/blk-mq.h | 3 ++-
 4 files changed, 9 insertions(+), 7 deletions(-)

diff --git a/block/blk-mq-cpumap.c b/block/blk-mq-cpumap.c
index 6e6686c55984..03a534820271 100644
--- a/block/blk-mq-cpumap.c
+++ b/block/blk-mq-cpumap.c
@@ -14,9 +14,10 @@
 #include "blk.h"
 #include "blk-mq.h"
 
-static int cpu_to_queue_index(unsigned int nr_queues, const int cpu)
+static int cpu_to_queue_index(struct blk_mq_queue_map *qmap,
+ unsigned int nr_queues, const int cpu)
 {
-   return cpu % nr_queues;
+   return qmap->queue_offset + (cpu % nr_queues);
 }
 
 static int get_first_sibling(unsigned int cpu)
@@ -44,11 +45,11 @@ int blk_mq_map_queues(struct blk_mq_queue_map *qmap)
 * performace optimizations.
 */
if (cpu < nr_queues) {
-   map[cpu] = cpu_to_queue_index(nr_queues, cpu);
+   map[cpu] = cpu_to_queue_index(qmap, nr_queues, cpu);
} else {
first_sibling = get_first_sibling(cpu);
if (first_sibling == cpu)
-   map[cpu] = cpu_to_queue_index(nr_queues, cpu);
+   map[cpu] = cpu_to_queue_index(qmap, nr_queues, 
cpu);
else
map[cpu] = map[first_sibling];
}
diff --git a/block/blk-mq-pci.c b/block/blk-mq-pci.c
index 40333d60a850..1dce18553984 100644
--- a/block/blk-mq-pci.c
+++ b/block/blk-mq-pci.c
@@ -43,7 +43,7 @@ int blk_mq_pci_map_queues(struct blk_mq_queue_map *qmap, 
struct pci_dev *pdev,
goto fallback;
 
for_each_cpu(cpu, mask)
-   qmap->mq_map[cpu] = queue;
+   qmap->mq_map[cpu] = qmap->queue_offset + queue;
}
 
return 0;
diff --git a/block/blk-mq-virtio.c b/block/blk-mq-virtio.c
index 661fbfef480f..370827163835 100644
--- a/block/blk-mq-virtio.c
+++ b/block/blk-mq-virtio.c
@@ -44,7 +44,7 @@ int blk_mq_virtio_map_queues(struct blk_mq_queue_map *qmap,
goto fallback;
 
for_each_cpu(cpu, mask)
-   qmap->mq_map[cpu] = queue;
+   qmap->mq_map[cpu] = qmap->queue_offset + queue;
}
 
return 0;
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 7e792ffb09bb..250b9ed86cd4 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -78,10 +78,11 @@ struct blk_mq_hw_ctx {
 struct blk_mq_queue_map {
unsigned int *mq_map;
unsigned int nr_queues;
+   unsigned int queue_offset;
 };
 
 enum {
-   HCTX_MAX_TYPES = 1,
+   HCTX_MAX_TYPES = 2,
 };
 
 struct blk_mq_tag_set {
-- 
2.17.1



[PATCH 09/14] blk-mq: ensure that plug lists don't straddle hardware queues

2018-10-25 Thread Jens Axboe
Since we insert per hardware queue, we have to ensure that every
request on the plug list being inserted belongs to the same
hardware queue.

Signed-off-by: Jens Axboe 
---
 block/blk-mq.c | 27 +--
 1 file changed, 25 insertions(+), 2 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 60a951c4934c..52b07188b39a 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1621,6 +1621,27 @@ static int plug_ctx_cmp(void *priv, struct list_head *a, 
struct list_head *b)
  blk_rq_pos(rqa) < blk_rq_pos(rqb)));
 }
 
+/*
+ * Need to ensure that the hardware queue matches, so we don't submit
+ * a list of requests that end up on different hardware queues.
+ */
+static bool ctx_match(struct request *req, struct blk_mq_ctx *ctx,
+ unsigned int flags)
+{
+   if (req->mq_ctx != ctx)
+   return false;
+
+   /*
+* If we just have one map, then we know the hctx will match
+* if the ctx matches
+*/
+   if (req->q->tag_set->nr_maps == 1)
+   return true;
+
+   return blk_mq_map_queue(req->q, req->cmd_flags, ctx->cpu) ==
+   blk_mq_map_queue(req->q, flags, ctx->cpu);
+}
+
 void blk_mq_flush_plug_list(struct blk_plug *plug, bool from_schedule)
 {
struct blk_mq_ctx *this_ctx;
@@ -1628,7 +1649,7 @@ void blk_mq_flush_plug_list(struct blk_plug *plug, bool 
from_schedule)
struct request *rq;
LIST_HEAD(list);
LIST_HEAD(ctx_list);
-   unsigned int depth;
+   unsigned int depth, this_flags;
 
list_splice_init(&plug->mq_list, &list);
 
@@ -1636,13 +1657,14 @@ void blk_mq_flush_plug_list(struct blk_plug *plug, bool 
from_schedule)
 
this_q = NULL;
this_ctx = NULL;
+   this_flags = 0;
depth = 0;
 
while (!list_empty(&list)) {
rq = list_entry_rq(list.next);
list_del_init(&rq->queuelist);
BUG_ON(!rq->q);
-   if (rq->mq_ctx != this_ctx) {
+   if (!ctx_match(rq, this_ctx, this_flags)) {
if (this_ctx) {
trace_block_unplug(this_q, depth, 
!from_schedule);
blk_mq_sched_insert_requests(this_q, this_ctx,
@@ -1650,6 +1672,7 @@ void blk_mq_flush_plug_list(struct blk_plug *plug, bool 
from_schedule)
from_schedule);
}
 
+   this_flags = rq->cmd_flags;
this_ctx = rq->mq_ctx;
this_q = rq->q;
depth = 0;
-- 
2.17.1



[PATCH 08/14] blk-mq: separate number of hardware queues from nr_cpu_ids

2018-10-25 Thread Jens Axboe
With multiple maps, nr_cpu_ids is no longer the maximum number of
hardware queues we support on a given device. The initializer of
the tag_set can have set ->nr_hw_queues larger than the available
number of CPUs, since we can exceed that with multiple queue maps.

Signed-off-by: Jens Axboe 
---
 block/blk-mq.c | 28 +---
 1 file changed, 21 insertions(+), 7 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 0fab36372ace..60a951c4934c 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2663,6 +2663,19 @@ static void blk_mq_realloc_hw_ctxs(struct blk_mq_tag_set 
*set,
mutex_unlock(&q->sysfs_lock);
 }
 
+/*
+ * Maximum number of queues we support. For single sets, we'll never have
+ * more than the CPUs (software queues). For multiple sets, the tag_set
+ * user may have set ->nr_hw_queues larger.
+ */
+static unsigned int nr_hw_queues(struct blk_mq_tag_set *set)
+{
+   if (set->nr_maps == 1)
+   return nr_cpu_ids;
+
+   return max(set->nr_hw_queues, nr_cpu_ids);
+}
+
 struct request_queue *blk_mq_init_allocated_queue(struct blk_mq_tag_set *set,
  struct request_queue *q)
 {
@@ -2682,7 +2695,8 @@ struct request_queue *blk_mq_init_allocated_queue(struct 
blk_mq_tag_set *set,
/* init q->mq_kobj and sw queues' kobjects */
blk_mq_sysfs_init(q);
 
-   q->queue_hw_ctx = kcalloc_node(nr_cpu_ids, sizeof(*(q->queue_hw_ctx)),
+   q->nr_queues = nr_hw_queues(set);
+   q->queue_hw_ctx = kcalloc_node(q->nr_queues, sizeof(*(q->queue_hw_ctx)),
GFP_KERNEL, set->numa_node);
if (!q->queue_hw_ctx)
goto err_percpu;
@@ -2694,7 +2708,6 @@ struct request_queue *blk_mq_init_allocated_queue(struct 
blk_mq_tag_set *set,
INIT_WORK(&q->timeout_work, blk_mq_timeout_work);
blk_queue_rq_timeout(q, set->timeout ? set->timeout : 30 * HZ);
 
-   q->nr_queues = nr_cpu_ids;
q->tag_set = set;
 
q->queue_flags |= QUEUE_FLAG_MQ_DEFAULT;
@@ -2884,12 +2897,13 @@ int blk_mq_alloc_tag_set(struct blk_mq_tag_set *set)
set->queue_depth = min(64U, set->queue_depth);
}
/*
-* There is no use for more h/w queues than cpus.
+* There is no use for more h/w queues than cpus if we just have
+* a single map
 */
-   if (set->nr_hw_queues > nr_cpu_ids)
+   if (set->nr_maps == 1 && set->nr_hw_queues > nr_cpu_ids)
set->nr_hw_queues = nr_cpu_ids;
 
-   set->tags = kcalloc_node(nr_cpu_ids, sizeof(struct blk_mq_tags *),
+   set->tags = kcalloc_node(nr_hw_queues(set), sizeof(struct blk_mq_tags 
*),
 GFP_KERNEL, set->numa_node);
if (!set->tags)
return -ENOMEM;
@@ -2932,7 +2946,7 @@ void blk_mq_free_tag_set(struct blk_mq_tag_set *set)
 {
int i, j;
 
-   for (i = 0; i < nr_cpu_ids; i++)
+   for (i = 0; i < nr_hw_queues(set); i++)
blk_mq_free_map_and_requests(set, i);
 
for (j = 0; j < set->nr_maps; j++) {
@@ -3064,7 +3078,7 @@ static void __blk_mq_update_nr_hw_queues(struct 
blk_mq_tag_set *set,
 
lockdep_assert_held(&set->tag_list_lock);
 
-   if (nr_hw_queues > nr_cpu_ids)
+   if (set->nr_maps == 1 && nr_hw_queues > nr_cpu_ids)
nr_hw_queues = nr_cpu_ids;
if (nr_hw_queues < 1 || nr_hw_queues == set->nr_hw_queues)
return;
-- 
2.17.1



[PATCH 06/14] blk-mq: add 'type' attribute to the sysfs hctx directory

2018-10-25 Thread Jens Axboe
It can be useful for a user to verify what type a given hardware
queue is, so expose this information in sysfs.

Signed-off-by: Jens Axboe 
---
 block/blk-mq-sysfs.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/block/blk-mq-sysfs.c b/block/blk-mq-sysfs.c
index aafb44224c89..2d737f9e7ba7 100644
--- a/block/blk-mq-sysfs.c
+++ b/block/blk-mq-sysfs.c
@@ -161,6 +161,11 @@ static ssize_t blk_mq_hw_sysfs_cpus_show(struct 
blk_mq_hw_ctx *hctx, char *page)
return ret;
 }
 
+static ssize_t blk_mq_hw_sysfs_type_show(struct blk_mq_hw_ctx *hctx, char 
*page)
+{
+   return sprintf(page, "%u\n", hctx->type);
+}
+
 static struct attribute *default_ctx_attrs[] = {
NULL,
 };
@@ -177,11 +182,16 @@ static struct blk_mq_hw_ctx_sysfs_entry 
blk_mq_hw_sysfs_cpus = {
.attr = {.name = "cpu_list", .mode = 0444 },
.show = blk_mq_hw_sysfs_cpus_show,
 };
+static struct blk_mq_hw_ctx_sysfs_entry blk_mq_hw_sysfs_type = {
+   .attr = {.name = "type", .mode = 0444 },
+   .show = blk_mq_hw_sysfs_type_show,
+};
 
 static struct attribute *default_hw_ctx_attrs[] = {
&blk_mq_hw_sysfs_nr_tags.attr,
&blk_mq_hw_sysfs_nr_reserved_tags.attr,
&blk_mq_hw_sysfs_cpus.attr,
+   &blk_mq_hw_sysfs_type.attr,
NULL,
 };
 
-- 
2.17.1



[PATCH 13/14] block: add REQ_HIPRI and inherit it from IOCB_HIPRI

2018-10-25 Thread Jens Axboe
We use IOCB_HIPRI to poll for IO in the caller instead of scheduling.
This information is not available for (or after) IO submission. The
driver may make different queue choices based on the type of IO, so
make the fact that we will poll for this IO known to the lower layers
as well.
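
For context, the polling intent starts out in userspace as RWF_HIPRI,
which becomes IOCB_HIPRI on the kiocb and, with this patch, REQ_HIPRI
on the bio. A minimal, hypothetical userspace example issuing a polled
read:

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/uio.h>
#include <unistd.h>

/* Issue one 4KiB polled read with RWF_HIPRI on an O_DIRECT fd. The
 * 4096-byte alignment is assumed to match the device's block size.
 */
int main(int argc, char **argv)
{
	struct iovec iov;
	void *buf;
	int fd;

	if (argc < 2)
		return 1;

	fd = open(argv[1], O_RDONLY | O_DIRECT);
	if (fd < 0 || posix_memalign(&buf, 4096, 4096))
		return 1;

	iov.iov_base = buf;
	iov.iov_len = 4096;

	if (preadv2(fd, &iov, 1, 0, RWF_HIPRI) < 0)
		perror("preadv2");

	free(buf);
	close(fd);
	return 0;
}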

Signed-off-by: Jens Axboe 
---
 fs/block_dev.c| 2 ++
 fs/direct-io.c| 2 ++
 fs/iomap.c| 9 -
 include/linux/blk_types.h | 4 +++-
 4 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/fs/block_dev.c b/fs/block_dev.c
index 38b8ce05cbc7..8bb8090c57a7 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -232,6 +232,8 @@ __blkdev_direct_IO_simple(struct kiocb *iocb, struct 
iov_iter *iter,
bio.bi_opf = dio_bio_write_op(iocb);
task_io_account_write(ret);
}
+   if (iocb->ki_flags & IOCB_HIPRI)
+   bio.bi_opf |= REQ_HIPRI;
 
qc = submit_bio(&bio);
for (;;) {
diff --git a/fs/direct-io.c b/fs/direct-io.c
index 093fb54cd316..ffb46b7aa5f7 100644
--- a/fs/direct-io.c
+++ b/fs/direct-io.c
@@ -1265,6 +1265,8 @@ do_blockdev_direct_IO(struct kiocb *iocb, struct inode 
*inode,
} else {
dio->op = REQ_OP_READ;
}
+   if (iocb->ki_flags & IOCB_HIPRI)
+   dio->op_flags |= REQ_HIPRI;
 
/*
 * For AIO O_(D)SYNC writes we need to defer completions to a workqueue
diff --git a/fs/iomap.c b/fs/iomap.c
index ec15cf2ec696..50ad8c8d1dcb 100644
--- a/fs/iomap.c
+++ b/fs/iomap.c
@@ -1554,6 +1554,7 @@ iomap_dio_zero(struct iomap_dio *dio, struct iomap 
*iomap, loff_t pos,
unsigned len)
 {
struct page *page = ZERO_PAGE(0);
+   int flags = REQ_SYNC | REQ_IDLE;
struct bio *bio;
 
bio = bio_alloc(GFP_KERNEL, 1);
@@ -1562,9 +1563,12 @@ iomap_dio_zero(struct iomap_dio *dio, struct iomap 
*iomap, loff_t pos,
bio->bi_private = dio;
bio->bi_end_io = iomap_dio_bio_end_io;
 
+   if (dio->iocb->ki_flags & IOCB_HIPRI)
+   flags |= REQ_HIPRI;
+
get_page(page);
__bio_add_page(bio, page, len, 0);
-   bio_set_op_attrs(bio, REQ_OP_WRITE, REQ_SYNC | REQ_IDLE);
+   bio_set_op_attrs(bio, REQ_OP_WRITE, flags);
 
atomic_inc(&dio->ref);
return submit_bio(bio);
@@ -1663,6 +1667,9 @@ iomap_dio_bio_actor(struct inode *inode, loff_t pos, 
loff_t length,
bio_set_pages_dirty(bio);
}
 
+   if (dio->iocb->ki_flags & IOCB_HIPRI)
+   bio->bi_opf |= REQ_HIPRI;
+
iov_iter_advance(dio->submit.iter, n);
 
dio->size += n;
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index 093a818c5b68..d6c2558d6b73 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -322,6 +322,8 @@ enum req_flag_bits {
/* command specific flags for REQ_OP_WRITE_ZEROES: */
__REQ_NOUNMAP,  /* do not free blocks when zeroing */
 
+   __REQ_HIPRI,
+
/* for driver use */
__REQ_DRV,
__REQ_SWAP, /* swapping request. */
@@ -342,8 +344,8 @@ enum req_flag_bits {
 #define REQ_RAHEAD (1ULL << __REQ_RAHEAD)
 #define REQ_BACKGROUND (1ULL << __REQ_BACKGROUND)
 #define REQ_NOWAIT (1ULL << __REQ_NOWAIT)
-
 #define REQ_NOUNMAP(1ULL << __REQ_NOUNMAP)
+#define REQ_HIPRI  (1ULL << __REQ_HIPRI)
 
 #define REQ_DRV(1ULL << __REQ_DRV)
 #define REQ_SWAP   (1ULL << __REQ_SWAP)
-- 
2.17.1



[PATCH 07/14] blk-mq: support multiple hctx maps

2018-10-25 Thread Jens Axboe
Add support for the tag set carrying multiple queue maps, and
for the driver to inform blk-mq how many it wishes to support
through setting set->nr_maps.

This adds an mq_ops helper for drivers that support more than one
map, mq_ops->flags_to_type(). The function takes request/bio flags
and CPU, and returns a queue map index for that. We then use the
type information in blk_mq_map_queue() to index the map set.
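
A hypothetical ->flags_to_type() for a driver with one read map and one
write map might look like the sketch below (illustrative only, not code
from this patch):

/* Return the queue map index blk-mq should use for this request:
 * map[0] for reads, map[1] for everything else.
 */
static int example_flags_to_type(struct request_queue *q, unsigned int flags)
{
	if ((flags & REQ_OP_MASK) == REQ_OP_READ)
		return 0;	/* read queue map */

	return 1;		/* write queue map */
}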

Signed-off-by: Jens Axboe 
---
 block/blk-mq.c | 85 --
 block/blk-mq.h | 19 ++
 include/linux/blk-mq.h |  7 
 3 files changed, 76 insertions(+), 35 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index fab84c6bda18..0fab36372ace 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2257,7 +2257,8 @@ static int blk_mq_init_hctx(struct request_queue *q,
 static void blk_mq_init_cpu_queues(struct request_queue *q,
   unsigned int nr_hw_queues)
 {
-   unsigned int i;
+   struct blk_mq_tag_set *set = q->tag_set;
+   unsigned int i, j;
 
for_each_possible_cpu(i) {
struct blk_mq_ctx *__ctx = per_cpu_ptr(q->queue_ctx, i);
@@ -2272,9 +2273,11 @@ static void blk_mq_init_cpu_queues(struct request_queue 
*q,
 * Set local node, IFF we have more than one hw queue. If
 * not, we remain on the home node of the device
 */
-   hctx = blk_mq_map_queue_type(q, 0, i);
-   if (nr_hw_queues > 1 && hctx->numa_node == NUMA_NO_NODE)
-   hctx->numa_node = local_memory_node(cpu_to_node(i));
+   for (j = 0; j < set->nr_maps; j++) {
+   hctx = blk_mq_map_queue_type(q, j, i);
+   if (nr_hw_queues > 1 && hctx->numa_node == NUMA_NO_NODE)
+   hctx->numa_node = 
local_memory_node(cpu_to_node(i));
+   }
}
 }
 
@@ -2309,7 +2312,7 @@ static void blk_mq_free_map_and_requests(struct 
blk_mq_tag_set *set,
 
 static void blk_mq_map_swqueue(struct request_queue *q)
 {
-   unsigned int i, hctx_idx;
+   unsigned int i, j, hctx_idx;
struct blk_mq_hw_ctx *hctx;
struct blk_mq_ctx *ctx;
struct blk_mq_tag_set *set = q->tag_set;
@@ -2345,13 +2348,23 @@ static void blk_mq_map_swqueue(struct request_queue *q)
}
 
ctx = per_cpu_ptr(q->queue_ctx, i);
-   hctx = blk_mq_map_queue_type(q, 0, i);
-   hctx->type = 0;
-   cpumask_set_cpu(i, hctx->cpumask);
-   ctx->index_hw[hctx->type] = hctx->nr_ctx;
-   hctx->ctxs[hctx->nr_ctx++] = ctx;
-   /* wrap */
-   BUG_ON(!hctx->nr_ctx);
+   for (j = 0; j < set->nr_maps; j++) {
+   hctx = blk_mq_map_queue_type(q, j, i);
+   hctx->type = j;
+
+   /*
+* If the CPU is already set in the mask, then we've
+* mapped this one already. This can happen if
+* devices share queues across queue maps.
+*/
+   if (cpumask_test_cpu(i, hctx->cpumask))
+   continue;
+   cpumask_set_cpu(i, hctx->cpumask);
+   ctx->index_hw[hctx->type] = hctx->nr_ctx;
+   hctx->ctxs[hctx->nr_ctx++] = ctx;
+   /* wrap */
+   BUG_ON(!hctx->nr_ctx);
+   }
}
 
mutex_unlock(&q->sysfs_lock);
@@ -2519,6 +2532,7 @@ struct request_queue *blk_mq_init_sq_queue(struct 
blk_mq_tag_set *set,
memset(set, 0, sizeof(*set));
set->ops = ops;
set->nr_hw_queues = 1;
+   set->nr_maps = 1;
set->queue_depth = queue_depth;
set->numa_node = NUMA_NO_NODE;
set->flags = set_flags;
@@ -2798,6 +2812,8 @@ static int blk_mq_alloc_rq_maps(struct blk_mq_tag_set 
*set)
 static int blk_mq_update_queue_map(struct blk_mq_tag_set *set)
 {
if (set->ops->map_queues) {
+   int i;
+
/*
 * transport .map_queues is usually done in the following
 * way:
@@ -2805,18 +2821,21 @@ static int blk_mq_update_queue_map(struct 
blk_mq_tag_set *set)
 * for (queue = 0; queue < set->nr_hw_queues; queue++) {
 *  mask = get_cpu_mask(queue)
 *  for_each_cpu(cpu, mask)
-*  set->map.mq_map[cpu] = queue;
+*  set->map[x].mq_map[cpu] = queue;
 * }
 *
 * When we need to remap, the table has to be cleared for
 * killing stale mapping since one CPU may not be mapped
 * to any hw queue.
 */
-   blk_mq_clear_mq_map(&set->map[0]);
+   for (i = 0; i < set->nr_maps; i++)
+

[PATCH 11/14] irq: add support for allocating (and affinitizing) sets of IRQs

2018-10-25 Thread Jens Axboe
A driver may have a need to allocate multiple sets of MSI/MSI-X
interrupts, and have them appropriately affinitized. Add support for
defining a number of sets in the irq_affinity structure, of varying
sizes, and get each set affinitized correctly across the machine.
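
A hedged example of how a driver could use this; the vector counts and
names are illustrative, and pdev is assumed to be the driver's PCI
device:

#include <linux/interrupt.h>
#include <linux/pci.h>

/* Reserve one pre-vector for admin/config, then ask for two affinitized
 * sets, e.g. 6 vectors for read queues and 2 for write queues.
 */
static int example_alloc_vecs(struct pci_dev *pdev)
{
	static int irq_sets[2] = { 6, 2 };
	struct irq_affinity affd = {
		.pre_vectors	= 1,
		.nr_sets	= 2,
		.sets		= irq_sets,
	};

	return pci_alloc_irq_vectors_affinity(pdev, 2, 9,
					      PCI_IRQ_MSIX | PCI_IRQ_AFFINITY,
					      &affd);
}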

Cc: Thomas Gleixner 
Cc: linux-ker...@vger.kernel.org
Signed-off-by: Jens Axboe 
---
 include/linux/interrupt.h |  4 
 kernel/irq/affinity.c | 31 +--
 2 files changed, 29 insertions(+), 6 deletions(-)

diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h
index eeceac3376fc..9fce2131902c 100644
--- a/include/linux/interrupt.h
+++ b/include/linux/interrupt.h
@@ -247,10 +247,14 @@ struct irq_affinity_notify {
  * the MSI(-X) vector space
  * @post_vectors:  Don't apply affinity to @post_vectors at end of
  * the MSI(-X) vector space
+ * @nr_sets:   Length of passed in *sets array
+ * @sets:  Number of affinitized sets
  */
 struct irq_affinity {
int pre_vectors;
int post_vectors;
+   int nr_sets;
+   int *sets;
 };
 
 #if defined(CONFIG_SMP)
diff --git a/kernel/irq/affinity.c b/kernel/irq/affinity.c
index f4f29b9d90ee..0055e252e438 100644
--- a/kernel/irq/affinity.c
+++ b/kernel/irq/affinity.c
@@ -180,6 +180,7 @@ irq_create_affinity_masks(int nvecs, const struct 
irq_affinity *affd)
int curvec, usedvecs;
cpumask_var_t nmsk, npresmsk, *node_to_cpumask;
struct cpumask *masks = NULL;
+   int i, nr_sets;
 
/*
 * If there aren't any vectors left after applying the pre/post
@@ -210,10 +211,23 @@ irq_create_affinity_masks(int nvecs, const struct 
irq_affinity *affd)
get_online_cpus();
build_node_to_cpumask(node_to_cpumask);
 
-   /* Spread on present CPUs starting from affd->pre_vectors */
-   usedvecs = irq_build_affinity_masks(affd, curvec, affvecs,
-   node_to_cpumask, cpu_present_mask,
-   nmsk, masks);
+   /*
+* Spread on present CPUs starting from affd->pre_vectors. If we
+* have multiple sets, build each sets affinity mask separately.
+*/
+   nr_sets = affd->nr_sets;
+   if (!nr_sets)
+   nr_sets = 1;
+
+   for (i = 0, usedvecs = 0; i < nr_sets; i++) {
+   int this_vecs = affd->sets ? affd->sets[i] : affvecs;
+   int nr;
+
+   nr = irq_build_affinity_masks(affd, curvec, this_vecs,
+ node_to_cpumask, cpu_present_mask,
+ nmsk, masks + usedvecs);
+   usedvecs += nr;
+   }
 
/*
 * Spread on non present CPUs starting from the next vector to be
@@ -258,13 +272,18 @@ int irq_calc_affinity_vectors(int minvec, int maxvec, 
const struct irq_affinity
 {
int resv = affd->pre_vectors + affd->post_vectors;
int vecs = maxvec - resv;
+   int i, set_vecs;
int ret;
 
if (resv > minvec)
return 0;
 
get_online_cpus();
-   ret = min_t(int, cpumask_weight(cpu_possible_mask), vecs) + resv;
+   ret = min_t(int, cpumask_weight(cpu_possible_mask), vecs);
put_online_cpus();
-   return ret;
+
+   for (i = 0, set_vecs = 0;  i < affd->nr_sets; i++)
+   set_vecs += affd->sets[i];
+
+   return resv + max(ret, set_vecs);
 }
-- 
2.17.1



[PATCH 14/14] nvme: add separate poll queue map

2018-10-25 Thread Jens Axboe
Adds support for defining a variable number of poll queues, currently
configurable with the 'poll_queues' module parameter. Defaults to
a single poll queue.

And now we finally have poll support without triggering interrupts!

Signed-off-by: Jens Axboe 
---
 drivers/nvme/host/pci.c | 103 +---
 include/linux/blk-mq.h  |   2 +-
 2 files changed, 88 insertions(+), 17 deletions(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 658c9a2f4114..cce5d06f11c5 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -86,6 +86,10 @@ MODULE_PARM_DESC(write_queues,
"Number of queues to use for writes. If not set, reads and writes "
"will share a queue set.");
 
+static int poll_queues = 1;
+module_param_cb(poll_queues, &queue_count_ops, &poll_queues, 0644);
+MODULE_PARM_DESC(poll_queues, "Number of queues to use for polled IO.");
+
 struct nvme_dev;
 struct nvme_queue;
 
@@ -94,6 +98,7 @@ static void nvme_dev_disable(struct nvme_dev *dev, bool 
shutdown);
 enum {
NVMEQ_TYPE_READ,
NVMEQ_TYPE_WRITE,
+   NVMEQ_TYPE_POLL,
NVMEQ_TYPE_NR,
 };
 
@@ -202,6 +207,7 @@ struct nvme_queue {
u16 last_cq_head;
u16 qid;
u8 cq_phase;
+   u8 polled;
u32 *dbbuf_sq_db;
u32 *dbbuf_cq_db;
u32 *dbbuf_sq_ei;
@@ -250,7 +256,7 @@ static inline void _nvme_check_size(void)
 
 static unsigned int max_io_queues(void)
 {
-   return num_possible_cpus() + write_queues;
+   return num_possible_cpus() + write_queues + poll_queues;
 }
 
 static unsigned int max_queue_count(void)
@@ -500,8 +506,15 @@ static int nvme_pci_map_queues(struct blk_mq_tag_set *set)
offset = queue_irq_offset(dev);
}
 
+   /*
+* The poll queue(s) doesn't have an IRQ (and hence IRQ
+* affinity), so use the regular blk-mq cpu mapping
+*/
map->queue_offset = qoff;
-   blk_mq_pci_map_queues(map, to_pci_dev(dev->dev), offset);
+   if (i != NVMEQ_TYPE_POLL)
+   blk_mq_pci_map_queues(map, to_pci_dev(dev->dev), 
offset);
+   else
+   blk_mq_map_queues(map);
qoff += map->nr_queues;
offset += map->nr_queues;
}
@@ -892,7 +905,7 @@ static blk_status_t nvme_queue_rq(struct blk_mq_hw_ctx 
*hctx,
 * We should not need to do this, but we're still using this to
 * ensure we can drain requests on a dying queue.
 */
-   if (unlikely(nvmeq->cq_vector < 0))
+   if (unlikely(nvmeq->cq_vector < 0 && !nvmeq->polled))
return BLK_STS_IOERR;
 
ret = nvme_setup_cmd(ns, req, &cmnd);
@@ -921,6 +934,8 @@ static blk_status_t nvme_queue_rq(struct blk_mq_hw_ctx 
*hctx,
 
 static int nvme_flags_to_type(struct request_queue *q, unsigned int flags)
 {
+   if (flags & REQ_HIPRI)
+   return NVMEQ_TYPE_POLL;
if ((flags & REQ_OP_MASK) == REQ_OP_READ)
return NVMEQ_TYPE_READ;
 
@@ -1094,7 +1109,10 @@ static int adapter_alloc_cq(struct nvme_dev *dev, u16 
qid,
struct nvme_queue *nvmeq, s16 vector)
 {
struct nvme_command c;
-   int flags = NVME_QUEUE_PHYS_CONTIG | NVME_CQ_IRQ_ENABLED;
+   int flags = NVME_QUEUE_PHYS_CONTIG;
+
+   if (vector != -1)
+   flags |= NVME_CQ_IRQ_ENABLED;
 
/*
 * Note: we (ab)use the fact that the prp fields survive if no data
@@ -1106,7 +1124,10 @@ static int adapter_alloc_cq(struct nvme_dev *dev, u16 
qid,
c.create_cq.cqid = cpu_to_le16(qid);
c.create_cq.qsize = cpu_to_le16(nvmeq->q_depth - 1);
c.create_cq.cq_flags = cpu_to_le16(flags);
-   c.create_cq.irq_vector = cpu_to_le16(vector);
+   if (vector != -1)
+   c.create_cq.irq_vector = cpu_to_le16(vector);
+   else
+   c.create_cq.irq_vector = 0;
 
return nvme_submit_sync_cmd(dev->ctrl.admin_q, &c, NULL, 0);
 }
@@ -1348,13 +1369,14 @@ static int nvme_suspend_queue(struct nvme_queue *nvmeq)
int vector;
 
spin_lock_irq(&nvmeq->cq_lock);
-   if (nvmeq->cq_vector == -1) {
+   if (nvmeq->cq_vector == -1 && !nvmeq->polled) {
spin_unlock_irq(&nvmeq->cq_lock);
return 1;
}
vector = nvmeq->cq_vector;
nvmeq->dev->online_queues--;
nvmeq->cq_vector = -1;
+   nvmeq->polled = false;
spin_unlock_irq(&nvmeq->cq_lock);
 
/*
@@ -1366,7 +1388,8 @@ static int nvme_suspend_queue(struct nvme_queue *nvmeq)
if (!nvmeq->qid && nvmeq->dev->ctrl.admin_q)
blk_mq_quiesce_queue(nvmeq->dev->ctrl.admin_q);
 
-   pci_free_irq(to_pci_dev(nvmeq->dev->dev), vector, nvmeq);
+   if (vector != -1)
+   pci_free_irq(to_pci_dev(nvmeq->dev->dev), vector, nvmeq);
 
return 0;
 }
@@ -1500,7 +1523,7 @@ static void nvme_init_queue(struct nvme_queue 

[PATCH 12/14] nvme: utilize two queue maps, one for reads and one for writes

2018-10-25 Thread Jens Axboe
NVMe does round-robin between queues by default, which means that
sharing a queue map for both reads and writes can be problematic
in terms of read servicing. It's much easier to flood the queue
with writes and reduce the read servicing.

Implement two queue maps, one for reads and one for writes. The
write queue count is configurable through the 'write_queues'
parameter.

By default, we retain the previous behavior of having a single
queue set, shared between reads and writes. Setting 'write_queues'
to a non-zero value will create two queue sets, one for reads and
one for writes, the latter using the configurable number of
queues (hardware queue counts permitting).

Signed-off-by: Jens Axboe 
---
 drivers/nvme/host/pci.c | 139 +---
 1 file changed, 131 insertions(+), 8 deletions(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index e5d783cb6937..658c9a2f4114 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -74,11 +74,29 @@ static int io_queue_depth = 1024;
 module_param_cb(io_queue_depth, &io_queue_depth_ops, &io_queue_depth, 0644);
 MODULE_PARM_DESC(io_queue_depth, "set io queue depth, should >= 2");
 
+static int queue_count_set(const char *val, const struct kernel_param *kp);
+static const struct kernel_param_ops queue_count_ops = {
+   .set = queue_count_set,
+   .get = param_get_int,
+};
+
+static int write_queues;
+module_param_cb(write_queues, &queue_count_ops, &write_queues, 0644);
+MODULE_PARM_DESC(write_queues,
+   "Number of queues to use for writes. If not set, reads and writes "
+   "will share a queue set.");
+
 struct nvme_dev;
 struct nvme_queue;
 
 static void nvme_dev_disable(struct nvme_dev *dev, bool shutdown);
 
+enum {
+   NVMEQ_TYPE_READ,
+   NVMEQ_TYPE_WRITE,
+   NVMEQ_TYPE_NR,
+};
+
 /*
  * Represents an NVM Express device.  Each nvme_dev is a PCI function.
  */
@@ -92,6 +110,7 @@ struct nvme_dev {
struct dma_pool *prp_small_pool;
unsigned online_queues;
unsigned max_qid;
+   unsigned io_queues[NVMEQ_TYPE_NR];
unsigned int num_vecs;
int q_depth;
u32 db_stride;
@@ -134,6 +153,17 @@ static int io_queue_depth_set(const char *val, const 
struct kernel_param *kp)
return param_set_int(val, kp);
 }
 
+static int queue_count_set(const char *val, const struct kernel_param *kp)
+{
+   int n = 0, ret;
+
ret = kstrtoint(val, 10, &n);
+   if (n > num_possible_cpus())
+   n = num_possible_cpus();
+
+   return param_set_int(val, kp);
+}
+
 static inline unsigned int sq_idx(unsigned int qid, u32 stride)
 {
return qid * 2 * stride;
@@ -218,9 +248,20 @@ static inline void _nvme_check_size(void)
BUILD_BUG_ON(sizeof(struct nvme_dbbuf) != 64);
 }
 
+static unsigned int max_io_queues(void)
+{
+   return num_possible_cpus() + write_queues;
+}
+
+static unsigned int max_queue_count(void)
+{
+   /* IO queues + admin queue */
+   return 1 + max_io_queues();
+}
+
 static inline unsigned int nvme_dbbuf_size(u32 stride)
 {
-   return ((num_possible_cpus() + 1) * 8 * stride);
+   return (max_queue_count() * 8 * stride);
 }
 
 static int nvme_dbbuf_dma_alloc(struct nvme_dev *dev)
@@ -431,12 +472,41 @@ static int nvme_init_request(struct blk_mq_tag_set *set, 
struct request *req,
return 0;
 }
 
+static int queue_irq_offset(struct nvme_dev *dev)
+{
+   /* if we have more than 1 vec, admin queue offsets us 1 */
+   if (dev->num_vecs > 1)
+   return 1;
+
+   return 0;
+}
+
 static int nvme_pci_map_queues(struct blk_mq_tag_set *set)
 {
struct nvme_dev *dev = set->driver_data;
+   int i, qoff, offset;
+
+   offset = queue_irq_offset(dev);
+   for (i = 0, qoff = 0; i < set->nr_maps; i++) {
+   struct blk_mq_queue_map *map = &set->map[i];
+
+   map->nr_queues = dev->io_queues[i];
+   if (!map->nr_queues) {
+   BUG_ON(i == NVMEQ_TYPE_READ);
 
-   return blk_mq_pci_map_queues(&set->map[0], to_pci_dev(dev->dev),
-   dev->num_vecs > 1 ? 1 /* admin queue */ : 0);
+   /* shared set, reuse read set parameters */
+   map->nr_queues = dev->io_queues[NVMEQ_TYPE_READ];
+   qoff = 0;
+   offset = queue_irq_offset(dev);
+   }
+
+   map->queue_offset = qoff;
+   blk_mq_pci_map_queues(map, to_pci_dev(dev->dev), offset);
+   qoff += map->nr_queues;
+   offset += map->nr_queues;
+   }
+
+   return 0;
 }
 
 /**
@@ -849,6 +919,14 @@ static blk_status_t nvme_queue_rq(struct blk_mq_hw_ctx 
*hctx,
return ret;
 }
 
+static int nvme_flags_to_type(struct request_queue *q, unsigned int flags)
+{
+   if ((flags & REQ_OP_MASK) == REQ_OP_READ)
+   return NVMEQ_TYPE_READ;
+
+   return NVMEQ_TYPE_WRITE;
+}
+
 static void 

[PATCH 03/14] blk-mq: provide dummy blk_mq_map_queue_type() helper

2018-10-25 Thread Jens Axboe
Doesn't do anything right now, but it's needed as a prep patch
to get the interfaces right.

Signed-off-by: Jens Axboe 
---
 block/blk-mq.h | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/block/blk-mq.h b/block/blk-mq.h
index 889f0069dd80..79c300faa7ce 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -80,6 +80,12 @@ static inline struct blk_mq_hw_ctx *blk_mq_map_queue(struct 
request_queue *q,
return q->queue_hw_ctx[set->map[0].mq_map[cpu]];
 }
 
+static inline struct blk_mq_hw_ctx *blk_mq_map_queue_type(struct request_queue 
*q,
+ int type, int cpu)
+{
+   return blk_mq_map_queue(q, cpu);
+}
+
 /*
  * sysfs helpers
  */
-- 
2.17.1



[PATCH 02/14] blk-mq: abstract out queue map

2018-10-25 Thread Jens Axboe
This is in preparation for allowing multiple sets of maps per
queue, if so desired.

Signed-off-by: Jens Axboe 
---
 block/blk-mq-cpumap.c | 10 
 block/blk-mq-pci.c| 10 
 block/blk-mq-rdma.c   |  4 ++--
 block/blk-mq-virtio.c |  8 +++
 block/blk-mq.c| 34 ++-
 block/blk-mq.h|  8 +++
 drivers/block/virtio_blk.c|  2 +-
 drivers/nvme/host/pci.c   |  2 +-
 drivers/scsi/qla2xxx/qla_os.c |  5 ++--
 drivers/scsi/scsi_lib.c   |  2 +-
 drivers/scsi/smartpqi/smartpqi_init.c |  3 ++-
 drivers/scsi/virtio_scsi.c|  3 ++-
 include/linux/blk-mq-pci.h|  4 ++--
 include/linux/blk-mq-virtio.h |  4 ++--
 include/linux/blk-mq.h| 13 --
 15 files changed, 63 insertions(+), 49 deletions(-)

diff --git a/block/blk-mq-cpumap.c b/block/blk-mq-cpumap.c
index 3eb169f15842..6e6686c55984 100644
--- a/block/blk-mq-cpumap.c
+++ b/block/blk-mq-cpumap.c
@@ -30,10 +30,10 @@ static int get_first_sibling(unsigned int cpu)
return cpu;
 }
 
-int blk_mq_map_queues(struct blk_mq_tag_set *set)
+int blk_mq_map_queues(struct blk_mq_queue_map *qmap)
 {
-   unsigned int *map = set->mq_map;
-   unsigned int nr_queues = set->nr_hw_queues;
+   unsigned int *map = qmap->mq_map;
+   unsigned int nr_queues = qmap->nr_queues;
unsigned int cpu, first_sibling;
 
for_each_possible_cpu(cpu) {
@@ -62,12 +62,12 @@ EXPORT_SYMBOL_GPL(blk_mq_map_queues);
  * We have no quick way of doing reverse lookups. This is only used at
  * queue init time, so runtime isn't important.
  */
-int blk_mq_hw_queue_to_node(unsigned int *mq_map, unsigned int index)
+int blk_mq_hw_queue_to_node(struct blk_mq_queue_map *qmap, unsigned int index)
 {
int i;
 
for_each_possible_cpu(i) {
-   if (index == mq_map[i])
+   if (index == qmap->mq_map[i])
return local_memory_node(cpu_to_node(i));
}
 
diff --git a/block/blk-mq-pci.c b/block/blk-mq-pci.c
index db644ec624f5..40333d60a850 100644
--- a/block/blk-mq-pci.c
+++ b/block/blk-mq-pci.c
@@ -31,26 +31,26 @@
  * that maps a queue to the CPUs that have irq affinity for the corresponding
  * vector.
  */
-int blk_mq_pci_map_queues(struct blk_mq_tag_set *set, struct pci_dev *pdev,
+int blk_mq_pci_map_queues(struct blk_mq_queue_map *qmap, struct pci_dev *pdev,
int offset)
 {
const struct cpumask *mask;
unsigned int queue, cpu;
 
-   for (queue = 0; queue < set->nr_hw_queues; queue++) {
+   for (queue = 0; queue < qmap->nr_queues; queue++) {
mask = pci_irq_get_affinity(pdev, queue + offset);
if (!mask)
goto fallback;
 
for_each_cpu(cpu, mask)
-   set->mq_map[cpu] = queue;
+   qmap->mq_map[cpu] = queue;
}
 
return 0;
 
 fallback:
-   WARN_ON_ONCE(set->nr_hw_queues > 1);
-   blk_mq_clear_mq_map(set);
+   WARN_ON_ONCE(qmap->nr_queues > 1);
+   blk_mq_clear_mq_map(qmap);
return 0;
 }
 EXPORT_SYMBOL_GPL(blk_mq_pci_map_queues);
diff --git a/block/blk-mq-rdma.c b/block/blk-mq-rdma.c
index 996167f1de18..a71576aff3a5 100644
--- a/block/blk-mq-rdma.c
+++ b/block/blk-mq-rdma.c
@@ -41,12 +41,12 @@ int blk_mq_rdma_map_queues(struct blk_mq_tag_set *set,
goto fallback;
 
for_each_cpu(cpu, mask)
-   set->mq_map[cpu] = queue;
+   set->map[0].mq_map[cpu] = queue;
}
 
return 0;
 
 fallback:
-   return blk_mq_map_queues(set);
return blk_mq_map_queues(&set->map[0]);
 }
 EXPORT_SYMBOL_GPL(blk_mq_rdma_map_queues);
diff --git a/block/blk-mq-virtio.c b/block/blk-mq-virtio.c
index c3afbca11299..661fbfef480f 100644
--- a/block/blk-mq-virtio.c
+++ b/block/blk-mq-virtio.c
@@ -29,7 +29,7 @@
  * that maps a queue to the CPUs that have irq affinity for the corresponding
  * vector.
  */
-int blk_mq_virtio_map_queues(struct blk_mq_tag_set *set,
+int blk_mq_virtio_map_queues(struct blk_mq_queue_map *qmap,
struct virtio_device *vdev, int first_vec)
 {
const struct cpumask *mask;
@@ -38,17 +38,17 @@ int blk_mq_virtio_map_queues(struct blk_mq_tag_set *set,
if (!vdev->config->get_vq_affinity)
goto fallback;
 
-   for (queue = 0; queue < set->nr_hw_queues; queue++) {
+   for (queue = 0; queue < qmap->nr_queues; queue++) {
mask = vdev->config->get_vq_affinity(vdev, first_vec + queue);
if (!mask)
goto fallback;
 
for_each_cpu(cpu, mask)
-   set->mq_map[cpu] = queue;
+   qmap->mq_map[cpu] = queue;
}
 
return 0;
 fallback:
-   return 

[PATCH 05/14] blk-mq: allow software queue to map to multiple hardware queues

2018-10-25 Thread Jens Axboe
The mapping used to be dependent on just the CPU location, but
now it's a tuple of { type, cpu} instead. This is a prep patch
for allowing a single software queue to map to multiple hardware
queues. No functional changes in this patch.

Signed-off-by: Jens Axboe 
---
 block/blk-mq-sched.c   |  2 +-
 block/blk-mq.c | 18 --
 block/blk-mq.h |  2 +-
 block/kyber-iosched.c  |  6 +++---
 include/linux/blk-mq.h |  3 ++-
 5 files changed, 19 insertions(+), 12 deletions(-)

diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index 8125e9393ec2..d232ecf3290c 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -110,7 +110,7 @@ static void blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx 
*hctx)
 static struct blk_mq_ctx *blk_mq_next_ctx(struct blk_mq_hw_ctx *hctx,
  struct blk_mq_ctx *ctx)
 {
-   unsigned idx = ctx->index_hw;
+   unsigned short idx = ctx->index_hw[hctx->type];
 
if (++idx == hctx->nr_ctx)
idx = 0;
diff --git a/block/blk-mq.c b/block/blk-mq.c
index e6ea7da99125..fab84c6bda18 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -75,14 +75,18 @@ static bool blk_mq_hctx_has_pending(struct blk_mq_hw_ctx 
*hctx)
 static void blk_mq_hctx_mark_pending(struct blk_mq_hw_ctx *hctx,
 struct blk_mq_ctx *ctx)
 {
-   if (!sbitmap_test_bit(&hctx->ctx_map, ctx->index_hw))
-   sbitmap_set_bit(&hctx->ctx_map, ctx->index_hw);
+   const int bit = ctx->index_hw[hctx->type];
+
+   if (!sbitmap_test_bit(&hctx->ctx_map, bit))
+   sbitmap_set_bit(&hctx->ctx_map, bit);
 }
 
 static void blk_mq_hctx_clear_pending(struct blk_mq_hw_ctx *hctx,
  struct blk_mq_ctx *ctx)
 {
-   sbitmap_clear_bit(&hctx->ctx_map, ctx->index_hw);
+   const int bit = ctx->index_hw[hctx->type];
+
+   sbitmap_clear_bit(&hctx->ctx_map, bit);
 }
 
 struct mq_inflight {
@@ -954,7 +958,7 @@ static bool dispatch_rq_from_ctx(struct sbitmap *sb, 
unsigned int bitnr,
 struct request *blk_mq_dequeue_from_ctx(struct blk_mq_hw_ctx *hctx,
struct blk_mq_ctx *start)
 {
-   unsigned off = start ? start->index_hw : 0;
+   unsigned off = start ? start->index_hw[hctx->type] : 0;
struct dispatch_rq_data data = {
.hctx = hctx,
.rq   = NULL,
@@ -2342,10 +2346,12 @@ static void blk_mq_map_swqueue(struct request_queue *q)
 
ctx = per_cpu_ptr(q->queue_ctx, i);
hctx = blk_mq_map_queue_type(q, 0, i);
-
+   hctx->type = 0;
cpumask_set_cpu(i, hctx->cpumask);
-   ctx->index_hw = hctx->nr_ctx;
+   ctx->index_hw[hctx->type] = hctx->nr_ctx;
hctx->ctxs[hctx->nr_ctx++] = ctx;
+   /* wrap */
+   BUG_ON(!hctx->nr_ctx);
}
 
mutex_unlock(>sysfs_lock);
diff --git a/block/blk-mq.h b/block/blk-mq.h
index 55428b92c019..7b5a790acdbf 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -17,7 +17,7 @@ struct blk_mq_ctx {
}  cacheline_aligned_in_smp;
 
unsigned intcpu;
-   unsigned intindex_hw;
+   unsigned short  index_hw[HCTX_MAX_TYPES];
 
/* incremented at dispatch time */
unsigned long   rq_dispatched[2];
diff --git a/block/kyber-iosched.c b/block/kyber-iosched.c
index 728757a34fa0..b824a639d5d4 100644
--- a/block/kyber-iosched.c
+++ b/block/kyber-iosched.c
@@ -576,7 +576,7 @@ static bool kyber_bio_merge(struct blk_mq_hw_ctx *hctx, 
struct bio *bio)
 {
struct kyber_hctx_data *khd = hctx->sched_data;
struct blk_mq_ctx *ctx = blk_mq_get_ctx(hctx->queue);
-   struct kyber_ctx_queue *kcq = &khd->kcqs[ctx->index_hw];
+   struct kyber_ctx_queue *kcq = &khd->kcqs[ctx->index_hw[hctx->type]];
unsigned int sched_domain = kyber_sched_domain(bio->bi_opf);
struct list_head *rq_list = &kcq->rq_list[sched_domain];
bool merged;
@@ -602,7 +602,7 @@ static void kyber_insert_requests(struct blk_mq_hw_ctx 
*hctx,
 
list_for_each_entry_safe(rq, next, rq_list, queuelist) {
unsigned int sched_domain = kyber_sched_domain(rq->cmd_flags);
-   struct kyber_ctx_queue *kcq = &khd->kcqs[rq->mq_ctx->index_hw];
+   struct kyber_ctx_queue *kcq = &khd->kcqs[rq->mq_ctx->index_hw[hctx->type]];
struct list_head *head = &kcq->rq_list[sched_domain];
 
spin_lock(&kcq->lock);
@@ -611,7 +611,7 @@ static void kyber_insert_requests(struct blk_mq_hw_ctx 
*hctx,
else
list_move_tail(>queuelist, head);
sbitmap_set_bit(&khd->kcq_map[sched_domain],
-   rq->mq_ctx->index_hw);
+   rq->mq_ctx->index_hw[hctx->type]);
blk_mq_sched_request_inserted(rq);
spin_unlock(&kcq->lock);
}
diff --git a/include/linux/blk-mq.h 

[PATCH 04/14] blk-mq: pass in request/bio flags to queue mapping

2018-10-25 Thread Jens Axboe
Prep patch for being able to place requests based not just on
CPU location, but also on the type of request.

Signed-off-by: Jens Axboe 
---
 block/blk-flush.c  |  7 +++---
 block/blk-mq-debugfs.c |  4 +++-
 block/blk-mq-sched.c   | 16 ++
 block/blk-mq-tag.c |  5 +++--
 block/blk-mq.c | 50 +++---
 block/blk-mq.h |  8 ---
 block/blk.h|  6 ++---
 7 files changed, 58 insertions(+), 38 deletions(-)

diff --git a/block/blk-flush.c b/block/blk-flush.c
index 9baa9a119447..7922dba81497 100644
--- a/block/blk-flush.c
+++ b/block/blk-flush.c
@@ -219,7 +219,7 @@ static void flush_end_io(struct request *flush_rq, 
blk_status_t error)
 
/* release the tag's ownership to the req cloned from */
spin_lock_irqsave(&fq->mq_flush_lock, flags);
-   hctx = blk_mq_map_queue(q, flush_rq->mq_ctx->cpu);
+   hctx = blk_mq_map_queue(q, flush_rq->cmd_flags, flush_rq->mq_ctx->cpu);
if (!q->elevator) {
blk_mq_tag_set_rq(hctx, flush_rq->tag, fq->orig_rq);
flush_rq->tag = -1;
@@ -307,7 +307,8 @@ static bool blk_kick_flush(struct request_queue *q, struct 
blk_flush_queue *fq,
if (!q->elevator) {
fq->orig_rq = first_rq;
flush_rq->tag = first_rq->tag;
-   hctx = blk_mq_map_queue(q, first_rq->mq_ctx->cpu);
+   hctx = blk_mq_map_queue(q, first_rq->cmd_flags,
+   first_rq->mq_ctx->cpu);
blk_mq_tag_set_rq(hctx, first_rq->tag, flush_rq);
} else {
flush_rq->internal_tag = first_rq->internal_tag;
@@ -330,7 +331,7 @@ static void mq_flush_data_end_io(struct request *rq, 
blk_status_t error)
unsigned long flags;
struct blk_flush_queue *fq = blk_get_flush_queue(q, ctx);
 
-   hctx = blk_mq_map_queue(q, ctx->cpu);
+   hctx = blk_mq_map_queue(q, rq->cmd_flags, ctx->cpu);
 
if (q->elevator) {
WARN_ON(rq->tag < 0);
diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
index 9ed43a7c70b5..fac70c81b7de 100644
--- a/block/blk-mq-debugfs.c
+++ b/block/blk-mq-debugfs.c
@@ -427,8 +427,10 @@ struct show_busy_params {
 static void hctx_show_busy_rq(struct request *rq, void *data, bool reserved)
 {
const struct show_busy_params *params = data;
+   struct blk_mq_hw_ctx *hctx;
 
-   if (blk_mq_map_queue(rq->q, rq->mq_ctx->cpu) == params->hctx)
+   hctx = blk_mq_map_queue(rq->q, rq->cmd_flags, rq->mq_ctx->cpu);
+   if (hctx == params->hctx)
__blk_mq_debugfs_rq_show(params->m,
 list_entry_rq(&rq->queuelist));
 }
diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index 29bfe8017a2d..8125e9393ec2 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -311,7 +311,7 @@ bool __blk_mq_sched_bio_merge(struct request_queue *q, 
struct bio *bio)
 {
struct elevator_queue *e = q->elevator;
struct blk_mq_ctx *ctx = blk_mq_get_ctx(q);
-   struct blk_mq_hw_ctx *hctx = blk_mq_map_queue(q, ctx->cpu);
+   struct blk_mq_hw_ctx *hctx = blk_mq_map_queue(q, bio->bi_opf, ctx->cpu);
bool ret = false;
 
if (e && e->type->ops.mq.bio_merge) {
@@ -367,7 +367,9 @@ void blk_mq_sched_insert_request(struct request *rq, bool 
at_head,
struct request_queue *q = rq->q;
struct elevator_queue *e = q->elevator;
struct blk_mq_ctx *ctx = rq->mq_ctx;
-   struct blk_mq_hw_ctx *hctx = blk_mq_map_queue(q, ctx->cpu);
+   struct blk_mq_hw_ctx *hctx;
+
+   hctx = blk_mq_map_queue(q, rq->cmd_flags, ctx->cpu);
 
/* flush rq in flush machinery need to be dispatched directly */
if (!(rq->rq_flags & RQF_FLUSH_SEQ) && op_is_flush(rq->cmd_flags)) {
@@ -400,9 +402,15 @@ void blk_mq_sched_insert_requests(struct request_queue *q,
  struct blk_mq_ctx *ctx,
  struct list_head *list, bool run_queue_async)
 {
-   struct blk_mq_hw_ctx *hctx = blk_mq_map_queue(q, ctx->cpu);
-   struct elevator_queue *e = hctx->queue->elevator;
+   struct blk_mq_hw_ctx *hctx;
+   struct elevator_queue *e;
+   struct request *rq;
+
+   /* For list inserts, requests better be on the same hw queue */
+   rq = list_first_entry(list, struct request, queuelist);
+   hctx = blk_mq_map_queue(q, rq->cmd_flags, ctx->cpu);
 
+   e = hctx->queue->elevator;
if (e && e->type->ops.mq.insert_requests)
e->type->ops.mq.insert_requests(hctx, list, false);
else {
diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
index 4254e74c1446..478a959357f5 100644
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -168,7 +168,8 @@ unsigned int blk_mq_get_tag(struct blk_mq_alloc_data *data)
io_schedule();
 
data->ctx = blk_mq_get_ctx(data->q);
-   data->hctx = blk_mq_map_queue(data->q, 

[PATCH 01/14] blk-mq: kill q->mq_map

2018-10-25 Thread Jens Axboe
It's just a pointer to set->mq_map, use that instead.

Signed-off-by: Jens Axboe 
---
 block/blk-mq.c | 13 -
 block/blk-mq.h |  4 +++-
 include/linux/blkdev.h |  2 --
 3 files changed, 7 insertions(+), 12 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 21e4147c4810..22d5beaab5a0 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2321,7 +2321,7 @@ static void blk_mq_map_swqueue(struct request_queue *q)
 * If the cpu isn't present, the cpu is mapped to first hctx.
 */
for_each_possible_cpu(i) {
-   hctx_idx = q->mq_map[i];
+   hctx_idx = set->mq_map[i];
/* unmapped hw queue can be remapped after CPU topo changed */
if (!set->tags[hctx_idx] &&
!__blk_mq_alloc_rq_map(set, hctx_idx)) {
@@ -2331,7 +2331,7 @@ static void blk_mq_map_swqueue(struct request_queue *q)
 * case, remap the current ctx to hctx[0] which
 * is guaranteed to always have tags allocated
 */
-   q->mq_map[i] = 0;
+   set->mq_map[i] = 0;
}
 
ctx = per_cpu_ptr(q->queue_ctx, i);
@@ -2429,8 +2429,6 @@ static void blk_mq_del_queue_tag_set(struct request_queue 
*q)
 static void blk_mq_add_queue_tag_set(struct blk_mq_tag_set *set,
 struct request_queue *q)
 {
-   q->tag_set = set;
-
mutex_lock(&set->tag_list_lock);
 
/*
@@ -2467,8 +2465,6 @@ void blk_mq_release(struct request_queue *q)
kobject_put(&hctx->kobj);
}
 
-   q->mq_map = NULL;
-
kfree(q->queue_hw_ctx);
 
/*
@@ -2588,7 +2584,7 @@ static void blk_mq_realloc_hw_ctxs(struct blk_mq_tag_set 
*set,
int node;
struct blk_mq_hw_ctx *hctx;
 
-   node = blk_mq_hw_queue_to_node(q->mq_map, i);
+   node = blk_mq_hw_queue_to_node(set->mq_map, i);
/*
 * If the hw queue has been mapped to another numa node,
 * we need to realloc the hctx. If allocation fails, fallback
@@ -2665,8 +2661,6 @@ struct request_queue *blk_mq_init_allocated_queue(struct 
blk_mq_tag_set *set,
if (!q->queue_hw_ctx)
goto err_percpu;
 
-   q->mq_map = set->mq_map;
-
blk_mq_realloc_hw_ctxs(set, q);
if (!q->nr_hw_queues)
goto err_hctxs;
@@ -2675,6 +2669,7 @@ struct request_queue *blk_mq_init_allocated_queue(struct 
blk_mq_tag_set *set,
blk_queue_rq_timeout(q, set->timeout ? set->timeout : 30 * HZ);
 
q->nr_queues = nr_cpu_ids;
+   q->tag_set = set;
 
q->queue_flags |= QUEUE_FLAG_MQ_DEFAULT;
 
diff --git a/block/blk-mq.h b/block/blk-mq.h
index 9497b47e2526..9536be06d022 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -75,7 +75,9 @@ extern int blk_mq_hw_queue_to_node(unsigned int *map, 
unsigned int);
 static inline struct blk_mq_hw_ctx *blk_mq_map_queue(struct request_queue *q,
int cpu)
 {
-   return q->queue_hw_ctx[q->mq_map[cpu]];
+   struct blk_mq_tag_set *set = q->tag_set;
+
+   return q->queue_hw_ctx[set->mq_map[cpu]];
 }
 
 /*
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 82b6cf45c6e0..6e506044a309 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -415,8 +415,6 @@ struct request_queue {
 
const struct blk_mq_ops *mq_ops;
 
-   unsigned int*mq_map;
-
/* sw queues */
struct blk_mq_ctx __percpu  *queue_ctx;
unsigned intnr_queues;
-- 
2.17.1



[PATCHSET 0/14] Add support for multiple queue maps

2018-10-25 Thread Jens Axboe
This series adds support for multiple queue maps for blk-mq.
Since blk-mq was introduced, it has only supported a single queue
map. This means you can have one set of queues, and the mapping
purely depends on what CPU an IO originated from. With this
patch set, drivers can implement mappings that depend on both
CPU and request type - and they can have multiple sets of mappings.

NVMe is used as a proof of concept. It adds support for a separate
write queue set. One way to use this would be to limit the number
of write queues to favor reads, since NVMe does round-robin service
of queues. An easy extension of this would be to add multiple
sets of queues, for prioritized IO.

NVMe also uses this feature to finally make the polling work
efficiently, without triggering interrupts. This both increases
performance (and decreases latency), at a lower system load. At
the same time it's more flexible, as you don't have to worry about
IRQ coalescing and redirection to avoid interrupts disturbing the
workload. This is how polling should have worked from day 1.

This is on top of the mq-conversions branch and series just
posted. It can also be found in my mq-maps branch.
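
As a rough sketch of what opting in looks like for a driver once this
series is applied (names are illustrative, not from the patches;
example_mq_ops is assumed to provide ->flags_to_type()):

/* Two queue maps: map[0] for reads, map[1] for writes. */
static int example_init_tag_set(struct blk_mq_tag_set *set,
				unsigned int nr_read, unsigned int nr_write)
{
	memset(set, 0, sizeof(*set));
	set->ops = &example_mq_ops;
	set->nr_maps = 2;
	set->nr_hw_queues = nr_read + nr_write;
	set->queue_depth = 128;
	set->numa_node = NUMA_NO_NODE;

	return blk_mq_alloc_tag_set(set);
}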

 block/blk-flush.c |   7 +-
 block/blk-mq-cpumap.c |  19 +--
 block/blk-mq-debugfs.c|   4 +-
 block/blk-mq-pci.c|  10 +-
 block/blk-mq-rdma.c   |   4 +-
 block/blk-mq-sched.c  |  18 ++-
 block/blk-mq-sysfs.c  |  10 ++
 block/blk-mq-tag.c|   5 +-
 block/blk-mq-virtio.c |   8 +-
 block/blk-mq.c| 213 --
 block/blk-mq.h|  29 -
 block/blk.h   |   6 +-
 block/kyber-iosched.c |   6 +-
 drivers/block/virtio_blk.c|   2 +-
 drivers/nvme/host/pci.c   | 238 ++
 drivers/scsi/qla2xxx/qla_os.c |   5 +-
 drivers/scsi/scsi_lib.c   |   2 +-
 drivers/scsi/smartpqi/smartpqi_init.c |   3 +-
 drivers/scsi/virtio_scsi.c|   3 +-
 fs/block_dev.c|   2 +
 fs/direct-io.c|   2 +
 fs/iomap.c|   9 +-
 include/linux/blk-mq-pci.h|   4 +-
 include/linux/blk-mq-virtio.h |   4 +-
 include/linux/blk-mq.h|  24 +++-
 include/linux/blk_types.h |   4 +-
 include/linux/blkdev.h|   2 -
 include/linux/interrupt.h |   4 +
 kernel/irq/affinity.c |  31 -
 29 files changed, 520 insertions(+), 158 deletions(-)

-- 
Jens Axboe




[PATCH 27/28] blk-merge: kill dead queue lock held check

2018-10-25 Thread Jens Axboe
This is dead code, any queue reaching this part has mq_ops
attached.

Signed-off-by: Jens Axboe 
---
 block/blk-merge.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/block/blk-merge.c b/block/blk-merge.c
index 3561dcce2260..0128284bded4 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -704,9 +704,6 @@ static void blk_account_io_merge(struct request *req)
 static struct request *attempt_merge(struct request_queue *q,
 struct request *req, struct request *next)
 {
-   if (!q->mq_ops)
-   lockdep_assert_held(q->queue_lock);
-
if (!rq_mergeable(req) || !rq_mergeable(next))
return NULL;
 
-- 
2.17.1



[PATCH 28/28] block: get rid of blk_queued_rq()

2018-10-25 Thread Jens Axboe
No point in hiding what this does, just open code it in the
one spot where we are still using it.

Signed-off-by: Jens Axboe 
---
 block/blk-mq.c | 2 +-
 include/linux/blkdev.h | 2 --
 2 files changed, 1 insertion(+), 3 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index d43c9232c77c..21e4147c4810 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -692,7 +692,7 @@ void blk_mq_requeue_request(struct request *rq, bool 
kick_requeue_list)
/* this request will be re-inserted to io scheduler queue */
blk_mq_sched_requeue_request(rq);
 
-   BUG_ON(blk_queued_rq(rq));
BUG_ON(!list_empty(&rq->queuelist));
blk_mq_add_to_requeue_list(rq, true, kick_requeue_list);
 }
 EXPORT_SYMBOL(blk_mq_requeue_request);
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 95e119409490..82b6cf45c6e0 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -678,8 +678,6 @@ static inline bool blk_account_rq(struct request *rq)
 
 #define blk_rq_cpu_valid(rq)   ((rq)->cpu != -1)
 #define blk_bidi_rq(rq)((rq)->next_rq != NULL)
-/* rq->queuelist of dequeued request must be list_empty() */
-#define blk_queued_rq(rq)  (!list_empty(&(rq)->queuelist))
 
 #define list_entry_rq(ptr) list_entry((ptr), struct request, queuelist)
 
-- 
2.17.1



[PATCH 11/28] bsg: pass in desired timeout handler

2018-10-25 Thread Jens Axboe
This will ease in the conversion to blk-mq, where we can't set
a timeout handler after queue init.

Cc: Johannes Thumshirn 
Cc: Benjamin Block 
Cc: linux-s...@vger.kernel.org
Signed-off-by: Jens Axboe 
---
 block/bsg-lib.c | 3 ++-
 drivers/scsi/scsi_transport_fc.c| 7 +++
 drivers/scsi/scsi_transport_iscsi.c | 2 +-
 drivers/scsi/scsi_transport_sas.c   | 4 ++--
 include/linux/bsg-lib.h | 2 +-
 5 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/block/bsg-lib.c b/block/bsg-lib.c
index f3501cdaf1a6..1da011ec04e6 100644
--- a/block/bsg-lib.c
+++ b/block/bsg-lib.c
@@ -304,7 +304,7 @@ static void bsg_exit_rq(struct request_queue *q, struct 
request *req)
  * @dd_job_size: size of LLD data needed for each job
  */
 struct request_queue *bsg_setup_queue(struct device *dev, const char *name,
-   bsg_job_fn *job_fn, int dd_job_size)
+   bsg_job_fn *job_fn, rq_timed_out_fn *timeout, int dd_job_size)
 {
struct request_queue *q;
int ret;
@@ -327,6 +327,7 @@ struct request_queue *bsg_setup_queue(struct device *dev, 
const char *name,
blk_queue_flag_set(QUEUE_FLAG_BIDI, q);
blk_queue_softirq_done(q, bsg_softirq_done);
blk_queue_rq_timeout(q, BLK_DEFAULT_SG_TIMEOUT);
+   blk_queue_rq_timed_out(q, timeout);
 
ret = bsg_register_queue(q, dev, name, &bsg_transport_ops);
if (ret) {
diff --git a/drivers/scsi/scsi_transport_fc.c b/drivers/scsi/scsi_transport_fc.c
index 381668fa135d..98aaffb4c715 100644
--- a/drivers/scsi/scsi_transport_fc.c
+++ b/drivers/scsi/scsi_transport_fc.c
@@ -3780,7 +3780,8 @@ fc_bsg_hostadd(struct Scsi_Host *shost, struct 
fc_host_attrs *fc_host)
snprintf(bsg_name, sizeof(bsg_name),
 "fc_host%d", shost->host_no);
 
-   q = bsg_setup_queue(dev, bsg_name, fc_bsg_dispatch, i->f->dd_bsg_size);
+   q = bsg_setup_queue(dev, bsg_name, fc_bsg_dispatch, fc_bsg_job_timeout,
+   i->f->dd_bsg_size);
if (IS_ERR(q)) {
dev_err(dev,
"fc_host%d: bsg interface failed to initialize - setup 
queue\n",
@@ -3788,7 +3789,6 @@ fc_bsg_hostadd(struct Scsi_Host *shost, struct 
fc_host_attrs *fc_host)
return PTR_ERR(q);
}
__scsi_init_queue(shost, q);
-   blk_queue_rq_timed_out(q, fc_bsg_job_timeout);
blk_queue_rq_timeout(q, FC_DEFAULT_BSG_TIMEOUT);
fc_host->rqst_q = q;
return 0;
@@ -3826,14 +3826,13 @@ fc_bsg_rportadd(struct Scsi_Host *shost, struct 
fc_rport *rport)
return -ENOTSUPP;
 
q = bsg_setup_queue(dev, dev_name(dev), fc_bsg_dispatch,
-   i->f->dd_bsg_size);
+   fc_bsg_job_timeout, i->f->dd_bsg_size);
if (IS_ERR(q)) {
dev_err(dev, "failed to setup bsg queue\n");
return PTR_ERR(q);
}
__scsi_init_queue(shost, q);
blk_queue_prep_rq(q, fc_bsg_rport_prep);
-   blk_queue_rq_timed_out(q, fc_bsg_job_timeout);
blk_queue_rq_timeout(q, BLK_DEFAULT_SG_TIMEOUT);
rport->rqst_q = q;
return 0;
diff --git a/drivers/scsi/scsi_transport_iscsi.c 
b/drivers/scsi/scsi_transport_iscsi.c
index 6fd2fe210fc3..26b11a775be9 100644
--- a/drivers/scsi/scsi_transport_iscsi.c
+++ b/drivers/scsi/scsi_transport_iscsi.c
@@ -1542,7 +1542,7 @@ iscsi_bsg_host_add(struct Scsi_Host *shost, struct 
iscsi_cls_host *ihost)
return -ENOTSUPP;
 
snprintf(bsg_name, sizeof(bsg_name), "iscsi_host%d", shost->host_no);
-   q = bsg_setup_queue(dev, bsg_name, iscsi_bsg_host_dispatch, 0);
+   q = bsg_setup_queue(dev, bsg_name, iscsi_bsg_host_dispatch, NULL, 0);
if (IS_ERR(q)) {
shost_printk(KERN_ERR, shost, "bsg interface failed to "
 "initialize - no request queue\n");
diff --git a/drivers/scsi/scsi_transport_sas.c 
b/drivers/scsi/scsi_transport_sas.c
index 0a165b2b3e81..cf6d47891d77 100644
--- a/drivers/scsi/scsi_transport_sas.c
+++ b/drivers/scsi/scsi_transport_sas.c
@@ -198,7 +198,7 @@ static int sas_bsg_initialize(struct Scsi_Host *shost, 
struct sas_rphy *rphy)
 
if (rphy) {
q = bsg_setup_queue(&rphy->dev, dev_name(&rphy->dev),
-   sas_smp_dispatch, 0);
+   sas_smp_dispatch, NULL, 0);
if (IS_ERR(q))
return PTR_ERR(q);
rphy->q = q;
@@ -207,7 +207,7 @@ static int sas_bsg_initialize(struct Scsi_Host *shost, 
struct sas_rphy *rphy)
 
snprintf(name, sizeof(name), "sas_host%d", shost->host_no);
q = bsg_setup_queue(&shost->shost_gendev, name,
-   sas_smp_dispatch, 0);
+   sas_smp_dispatch, NULL, 0);
if (IS_ERR(q))
return PTR_ERR(q);
to_sas_host_attrs(shost)->q = q;
diff --git 

[PATCH 12/28] bsg: provide bsg_remove_queue() helper

2018-10-25 Thread Jens Axboe
All drivers do unregister + cleanup; provide a helper for that.
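
The open-coded sequence in the transport classes,

	bsg_unregister_queue(q);
	blk_cleanup_queue(q);

becomes a single call (sketch mirroring the hunks below):

	if (q)
		bsg_remove_queue(q);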

Cc: Johannes Thumshirn 
Cc: Benjamin Block 
Cc: linux-s...@vger.kernel.org
Signed-off-by: Jens Axboe 
---
 block/bsg-lib.c | 7 +++
 drivers/scsi/scsi_transport_fc.c| 6 ++
 drivers/scsi/scsi_transport_iscsi.c | 7 +++
 drivers/scsi/scsi_transport_sas.c   | 6 ++
 include/linux/bsg-lib.h | 1 +
 5 files changed, 15 insertions(+), 12 deletions(-)

diff --git a/block/bsg-lib.c b/block/bsg-lib.c
index 1da011ec04e6..267f965af77a 100644
--- a/block/bsg-lib.c
+++ b/block/bsg-lib.c
@@ -296,6 +296,13 @@ static void bsg_exit_rq(struct request_queue *q, struct 
request *req)
kfree(job->reply);
 }
 
+void bsg_remove_queue(struct request_queue *q)
+{
+   bsg_unregister_queue(q);
+   blk_cleanup_queue(q);
+}
+EXPORT_SYMBOL_GPL(bsg_remove_queue);
+
 /**
  * bsg_setup_queue - Create and add the bsg hooks so we can receive requests
  * @dev: device to attach bsg device to
diff --git a/drivers/scsi/scsi_transport_fc.c b/drivers/scsi/scsi_transport_fc.c
index 98aaffb4c715..4d64956bb5d3 100644
--- a/drivers/scsi/scsi_transport_fc.c
+++ b/drivers/scsi/scsi_transport_fc.c
@@ -3851,10 +3851,8 @@ fc_bsg_rportadd(struct Scsi_Host *shost, struct fc_rport 
*rport)
 static void
 fc_bsg_remove(struct request_queue *q)
 {
-   if (q) {
-   bsg_unregister_queue(q);
-   blk_cleanup_queue(q);
-   }
+   if (q)
+   bsg_remove_queue(q);
 }
 
 
diff --git a/drivers/scsi/scsi_transport_iscsi.c 
b/drivers/scsi/scsi_transport_iscsi.c
index 26b11a775be9..3ead0dba5d8d 100644
--- a/drivers/scsi/scsi_transport_iscsi.c
+++ b/drivers/scsi/scsi_transport_iscsi.c
@@ -1576,10 +1576,9 @@ static int iscsi_remove_host(struct transport_container 
*tc,
struct Scsi_Host *shost = dev_to_shost(dev);
struct iscsi_cls_host *ihost = shost->shost_data;
 
-   if (ihost->bsg_q) {
-   bsg_unregister_queue(ihost->bsg_q);
-   blk_cleanup_queue(ihost->bsg_q);
-   }
+   if (ihost->bsg_q)
+   bsg_remove_queue(ihost->bsg_q);
+
return 0;
 }
 
diff --git a/drivers/scsi/scsi_transport_sas.c 
b/drivers/scsi/scsi_transport_sas.c
index cf6d47891d77..c46d642dc133 100644
--- a/drivers/scsi/scsi_transport_sas.c
+++ b/drivers/scsi/scsi_transport_sas.c
@@ -246,10 +246,8 @@ static int sas_host_remove(struct transport_container *tc, 
struct device *dev,
struct Scsi_Host *shost = dev_to_shost(dev);
struct request_queue *q = to_sas_host_attrs(shost)->q;
 
-   if (q) {
-   bsg_unregister_queue(q);
-   blk_cleanup_queue(q);
-   }
+   if (q)
+   bsg_remove_queue(q);
 
return 0;
 }
diff --git a/include/linux/bsg-lib.h b/include/linux/bsg-lib.h
index b13ae143e7ef..9c9b134b1fa5 100644
--- a/include/linux/bsg-lib.h
+++ b/include/linux/bsg-lib.h
@@ -73,6 +73,7 @@ void bsg_job_done(struct bsg_job *job, int result,
  unsigned int reply_payload_rcv_len);
 struct request_queue *bsg_setup_queue(struct device *dev, const char *name,
bsg_job_fn *job_fn, rq_timed_out_fn *timeout, int dd_job_size);
+void bsg_remove_queue(struct request_queue *q);
 void bsg_job_put(struct bsg_job *job);
 int __must_check bsg_job_get(struct bsg_job *job);
 
-- 
2.17.1



[PATCH 17/28] block: remove legacy rq tagging

2018-10-25 Thread Jens Axboe
It's now unused, kill it.

Signed-off-by: Jens Axboe 
---
 Documentation/block/biodoc.txt |  88 
 block/Makefile |   2 +-
 block/blk-core.c   |   6 -
 block/blk-mq-debugfs.c |   2 -
 block/blk-mq-tag.c |   6 +-
 block/blk-sysfs.c  |   3 -
 block/blk-tag.c| 378 -
 include/linux/blkdev.h |  35 ---
 8 files changed, 3 insertions(+), 517 deletions(-)
 delete mode 100644 block/blk-tag.c

diff --git a/Documentation/block/biodoc.txt b/Documentation/block/biodoc.txt
index 207eca58efaa..ac18b488cb5e 100644
--- a/Documentation/block/biodoc.txt
+++ b/Documentation/block/biodoc.txt
@@ -65,7 +65,6 @@ Description of Contents:
 3.2.3 I/O completion
 3.2.4 Implications for drivers that do not interpret bios (don't handle
  multiple segments)
-3.2.5 Request command tagging
   3.3 I/O submission
 4. The I/O scheduler
 5. Scalability related changes
@@ -708,93 +707,6 @@ is crossed on completion of a transfer. (The end*request* 
functions should
 be used if only if the request has come down from block/bio path, not for
 direct access requests which only specify rq->buffer without a valid rq->bio)
 
-3.2.5 Generic request command tagging
-
-3.2.5.1 Tag helpers
-
-Block now offers some simple generic functionality to help support command
-queueing (typically known as tagged command queueing), ie manage more than
-one outstanding command on a queue at any given time.
-
-   blk_queue_init_tags(struct request_queue *q, int depth)
-
-   Initialize internal command tagging structures for a maximum
-   depth of 'depth'.
-
-   blk_queue_free_tags((struct request_queue *q)
-
-   Teardown tag info associated with the queue. This will be done
-   automatically by block if blk_queue_cleanup() is called on a queue
-   that is using tagging.
-
-The above are initialization and exit management, the main helpers during
-normal operations are:
-
-   blk_queue_start_tag(struct request_queue *q, struct request *rq)
-
-   Start tagged operation for this request. A free tag number between
-   0 and 'depth' is assigned to the request (rq->tag holds this number),
-   and 'rq' is added to the internal tag management. If the maximum depth
-   for this queue is already achieved (or if the tag wasn't started for
-   some other reason), 1 is returned. Otherwise 0 is returned.
-
-   blk_queue_end_tag(struct request_queue *q, struct request *rq)
-
-   End tagged operation on this request. 'rq' is removed from the internal
-   book keeping structures.
-
-To minimize struct request and queue overhead, the tag helpers utilize some
-of the same request members that are used for normal request queue management.
-This means that a request cannot both be an active tag and be on the queue
-list at the same time. blk_queue_start_tag() will remove the request, but
-the driver must remember to call blk_queue_end_tag() before signalling
-completion of the request to the block layer. This means ending tag
-operations before calling end_that_request_last()! For an example of a user
-of these helpers, see the IDE tagged command queueing support.
-
-3.2.5.2 Tag info
-
-Some block functions exist to query current tag status or to go from a
-tag number to the associated request. These are, in no particular order:
-
-   blk_queue_tagged(q)
-
-   Returns 1 if the queue 'q' is using tagging, 0 if not.
-
-   blk_queue_tag_request(q, tag)
-
-   Returns a pointer to the request associated with tag 'tag'.
-
-   blk_queue_tag_depth(q)
-   
-   Return current queue depth.
-
-   blk_queue_tag_queue(q)
-
-   Returns 1 if the queue can accept a new queued command, 0 if we are
-   at the maximum depth already.
-
-   blk_queue_rq_tagged(rq)
-
-   Returns 1 if the request 'rq' is tagged.
-
-3.2.5.2 Internal structure
-
-Internally, block manages tags in the blk_queue_tag structure:
-
-   struct blk_queue_tag {
-   struct request **tag_index; /* array or pointers to rq */
-   unsigned long *tag_map; /* bitmap of free tags */
-   struct list_head busy_list; /* fifo list of busy tags */
-   int busy;   /* queue depth */
-   int max_depth;  /* max queue depth */
-   };
-
-Most of the above is simple and straight forward, however busy_list may need
-a bit of explaining. Normally we don't care too much about request ordering,
-but in the event of any barrier requests in the tag queue we need to ensure
-that requests are restarted in the order they were queue.
-
 3.3 I/O Submission
 
 The routine submit_bio() is used to submit a single io. Higher level i/o
diff --git a/block/Makefile b/block/Makefile
index 27eac600474f..213674c8faaa 100644
--- a/block/Makefile
+++ b/block/Makefile
@@ -3,7 +3,7 @@
 # Makefile for the kernel 

[PATCH 14/28] block: remove blk_complete_request()

2018-10-25 Thread Jens Axboe
It's now unused.

Signed-off-by: Jens Axboe 
---
 block/blk-softirq.c| 20 
 include/linux/blkdev.h |  1 -
 2 files changed, 21 deletions(-)

diff --git a/block/blk-softirq.c b/block/blk-softirq.c
index e47a2f751884..8ca0f6caf174 100644
--- a/block/blk-softirq.c
+++ b/block/blk-softirq.c
@@ -145,26 +145,6 @@ void __blk_complete_request(struct request *req)
 }
 EXPORT_SYMBOL(__blk_complete_request);
 
-/**
- * blk_complete_request - end I/O on a request
- * @req:  the request being processed
- *
- * Description:
- * Ends all I/O on a request. It does not handle partial completions,
- * unless the driver actually implements this in its completion callback
- * through requeueing. The actual completion happens out-of-order,
- * through a softirq handler. The user must have registered a completion
- * callback through blk_queue_softirq_done().
- **/
-void blk_complete_request(struct request *req)
-{
-   if (unlikely(blk_should_fake_timeout(req->q)))
-   return;
-   if (!blk_mark_rq_complete(req))
-   __blk_complete_request(req);
-}
-EXPORT_SYMBOL(blk_complete_request);
-
 static __init int blk_softirq_init(void)
 {
int i;
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 4293dc1cd160..9ff9ab6fc1fe 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1205,7 +1205,6 @@ extern bool __blk_end_request(struct request *rq, 
blk_status_t error,
 extern void __blk_end_request_all(struct request *rq, blk_status_t error);
 extern bool __blk_end_request_cur(struct request *rq, blk_status_t error);
 
-extern void blk_complete_request(struct request *);
 extern void __blk_complete_request(struct request *);
 extern void blk_abort_request(struct request *);
 extern void blk_unprep_request(struct request *);
-- 
2.17.1



[PATCH 18/28] block: remove non mq parts from the flush code

2018-10-25 Thread Jens Axboe
Signed-off-by: Jens Axboe 
---
 block/blk-flush.c | 154 +-
 block/blk.h   |   4 +-
 2 files changed, 31 insertions(+), 127 deletions(-)

diff --git a/block/blk-flush.c b/block/blk-flush.c
index 8b44b86779da..9baa9a119447 100644
--- a/block/blk-flush.c
+++ b/block/blk-flush.c
@@ -134,16 +134,8 @@ static void blk_flush_restore_request(struct request *rq)
 
 static bool blk_flush_queue_rq(struct request *rq, bool add_front)
 {
-   if (rq->q->mq_ops) {
-   blk_mq_add_to_requeue_list(rq, add_front, true);
-   return false;
-   } else {
-   if (add_front)
-   list_add(&rq->queuelist, &rq->q->queue_head);
-   else
-   list_add_tail(&rq->queuelist, &rq->q->queue_head);
-   return true;
-   }
+   blk_mq_add_to_requeue_list(rq, add_front, true);
+   return false;
 }
 
 /**
@@ -204,10 +196,7 @@ static bool blk_flush_complete_seq(struct request *rq,
BUG_ON(!list_empty(&rq->queuelist));
list_del_init(&rq->flush.list);
blk_flush_restore_request(rq);
-   if (q->mq_ops)
-   blk_mq_end_request(rq, error);
-   else
-   __blk_end_request_all(rq, error);
+   blk_mq_end_request(rq, error);
break;
 
default:
@@ -226,20 +215,17 @@ static void flush_end_io(struct request *flush_rq, 
blk_status_t error)
struct request *rq, *n;
unsigned long flags = 0;
struct blk_flush_queue *fq = blk_get_flush_queue(q, flush_rq->mq_ctx);
+   struct blk_mq_hw_ctx *hctx;
 
-   if (q->mq_ops) {
-   struct blk_mq_hw_ctx *hctx;
-
-   /* release the tag's ownership to the req cloned from */
-   spin_lock_irqsave(&fq->mq_flush_lock, flags);
-   hctx = blk_mq_map_queue(q, flush_rq->mq_ctx->cpu);
-   if (!q->elevator) {
-   blk_mq_tag_set_rq(hctx, flush_rq->tag, fq->orig_rq);
-   flush_rq->tag = -1;
-   } else {
-   blk_mq_put_driver_tag_hctx(hctx, flush_rq);
-   flush_rq->internal_tag = -1;
-   }
+   /* release the tag's ownership to the req cloned from */
+   spin_lock_irqsave(&fq->mq_flush_lock, flags);
+   hctx = blk_mq_map_queue(q, flush_rq->mq_ctx->cpu);
+   if (!q->elevator) {
+   blk_mq_tag_set_rq(hctx, flush_rq->tag, fq->orig_rq);
+   flush_rq->tag = -1;
+   } else {
+   blk_mq_put_driver_tag_hctx(hctx, flush_rq);
+   flush_rq->internal_tag = -1;
}
 
running = &fq->flush_queue[fq->flush_running_idx];
@@ -248,9 +234,6 @@ static void flush_end_io(struct request *flush_rq, 
blk_status_t error)
/* account completion of the flush request */
fq->flush_running_idx ^= 1;
 
-   if (!q->mq_ops)
-   elv_completed_request(q, flush_rq);
-
/* and push the waiting requests to the next stage */
list_for_each_entry_safe(rq, n, running, flush.list) {
unsigned int seq = blk_flush_cur_seq(rq);
@@ -259,24 +242,8 @@ static void flush_end_io(struct request *flush_rq, 
blk_status_t error)
queued |= blk_flush_complete_seq(rq, fq, seq, error);
}
 
-   /*
-* Kick the queue to avoid stall for two cases:
-* 1. Moving a request silently to empty queue_head may stall the
-* queue.
-* 2. When flush request is running in non-queueable queue, the
-* queue is hold. Restart the queue after flush request is finished
-* to avoid stall.
-* This function is called from request completion path and calling
-* directly into request_fn may confuse the driver.  Always use
-* kblockd.
-*/
-   if (queued || fq->flush_queue_delayed) {
-   WARN_ON(q->mq_ops);
-   blk_run_queue_async(q);
-   }
fq->flush_queue_delayed = 0;
-   if (q->mq_ops)
-   spin_unlock_irqrestore(&fq->mq_flush_lock, flags);
+   spin_unlock_irqrestore(&fq->mq_flush_lock, flags);
 }
 
 /**
@@ -301,6 +268,7 @@ static bool blk_kick_flush(struct request_queue *q, struct 
blk_flush_queue *fq,
struct request *first_rq =
list_first_entry(pending, struct request, flush.list);
struct request *flush_rq = fq->flush_rq;
+   struct blk_mq_hw_ctx *hctx;
 
/* C1 described at the top of this file */
if (fq->flush_pending_idx != fq->flush_running_idx || 
list_empty(pending))
@@ -334,19 +302,15 @@ static bool blk_kick_flush(struct request_queue *q, 
struct blk_flush_queue *fq,
 * In case of IO scheduler, flush rq need to borrow scheduler tag
 * just for cheating put/get driver tag.
 */
-   if (q->mq_ops) {
-   struct blk_mq_hw_ctx *hctx;
-
-   flush_rq->mq_ctx = 

[PATCH 09/28] dm: remove legacy IO path

2018-10-25 Thread Jens Axboe
dm supports both the legacy and blk-mq paths, and since we're killing
off the legacy path in general, get rid of it in dm as well.

Signed-off-by: Jens Axboe 
---
 drivers/md/Kconfig|  11 --
 drivers/md/dm-core.h  |  10 --
 drivers/md/dm-mpath.c |  14 +-
 drivers/md/dm-rq.c| 293 --
 drivers/md/dm-rq.h|   4 -
 drivers/md/dm-sysfs.c |   3 +-
 drivers/md/dm-table.c |  36 +-
 drivers/md/dm.c   |  21 +--
 drivers/md/dm.h   |   1 -
 9 files changed, 35 insertions(+), 358 deletions(-)

diff --git a/drivers/md/Kconfig b/drivers/md/Kconfig
index 8b8c123cae66..3db222509e44 100644
--- a/drivers/md/Kconfig
+++ b/drivers/md/Kconfig
@@ -215,17 +215,6 @@ config BLK_DEV_DM
 
  If unsure, say N.
 
-config DM_MQ_DEFAULT
-   bool "request-based DM: use blk-mq I/O path by default"
-   depends on BLK_DEV_DM
-   ---help---
- This option enables the blk-mq based I/O path for request-based
- DM devices by default.  With the option the dm_mod.use_blk_mq
- module/boot option defaults to Y, without it to N, but it can
- still be overriden either way.
-
- If unsure say N.
-
 config DM_DEBUG
bool "Device mapper debugging support"
depends on BLK_DEV_DM
diff --git a/drivers/md/dm-core.h b/drivers/md/dm-core.h
index 7d480c930eaf..224d44503a06 100644
--- a/drivers/md/dm-core.h
+++ b/drivers/md/dm-core.h
@@ -112,18 +112,8 @@ struct mapped_device {
 
struct dm_stats stats;
 
-   struct kthread_worker kworker;
-   struct task_struct *kworker_task;
-
-   /* for request-based merge heuristic in dm_request_fn() */
-   unsigned seq_rq_merge_deadline_usecs;
-   int last_rq_rw;
-   sector_t last_rq_pos;
-   ktime_t last_rq_start_time;
-
/* for blk-mq request-based DM support */
struct blk_mq_tag_set *tag_set;
-   bool use_blk_mq:1;
bool init_tio_pdu:1;
 
struct srcu_struct io_barrier;
diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c
index 419362c2d8ac..a24ed3973e7c 100644
--- a/drivers/md/dm-mpath.c
+++ b/drivers/md/dm-mpath.c
@@ -203,14 +203,7 @@ static struct multipath *alloc_multipath(struct dm_target 
*ti)
 static int alloc_multipath_stage2(struct dm_target *ti, struct multipath *m)
 {
if (m->queue_mode == DM_TYPE_NONE) {
-   /*
-* Default to request-based.
-*/
-   if (dm_use_blk_mq(dm_table_get_md(ti->table)))
-   m->queue_mode = DM_TYPE_MQ_REQUEST_BASED;
-   else
-   m->queue_mode = DM_TYPE_REQUEST_BASED;
-
+   m->queue_mode = DM_TYPE_MQ_REQUEST_BASED;
} else if (m->queue_mode == DM_TYPE_BIO_BASED) {
INIT_WORK(&m->process_queued_bios, process_queued_bios);
/*
@@ -537,10 +530,7 @@ static int multipath_clone_and_map(struct dm_target *ti, 
struct request *rq,
 * get the queue busy feedback (via BLK_STS_RESOURCE),
 * otherwise I/O merging can suffer.
 */
-   if (q->mq_ops)
-   return DM_MAPIO_REQUEUE;
-   else
-   return DM_MAPIO_DELAY_REQUEUE;
+   return DM_MAPIO_REQUEUE;
}
clone->bio = clone->biotail = NULL;
clone->rq_disk = bdev->bd_disk;
diff --git a/drivers/md/dm-rq.c b/drivers/md/dm-rq.c
index 6e547b8dd298..37192b396473 100644
--- a/drivers/md/dm-rq.c
+++ b/drivers/md/dm-rq.c
@@ -23,19 +23,6 @@ static unsigned dm_mq_queue_depth = DM_MQ_QUEUE_DEPTH;
 #define RESERVED_REQUEST_BASED_IOS 256
 static unsigned reserved_rq_based_ios = RESERVED_REQUEST_BASED_IOS;
 
-static bool use_blk_mq = IS_ENABLED(CONFIG_DM_MQ_DEFAULT);
-
-bool dm_use_blk_mq_default(void)
-{
-   return use_blk_mq;
-}
-
-bool dm_use_blk_mq(struct mapped_device *md)
-{
-   return md->use_blk_mq;
-}
-EXPORT_SYMBOL_GPL(dm_use_blk_mq);
-
 unsigned dm_get_reserved_rq_based_ios(void)
 {
return __dm_get_module_param(&reserved_rq_based_ios,
@@ -59,16 +46,6 @@ int dm_request_based(struct mapped_device *md)
return queue_is_rq_based(md->queue);
 }
 
-static void dm_old_start_queue(struct request_queue *q)
-{
-   unsigned long flags;
-
-   spin_lock_irqsave(q->queue_lock, flags);
-   if (blk_queue_stopped(q))
-   blk_start_queue(q);
-   spin_unlock_irqrestore(q->queue_lock, flags);
-}
-
 static void dm_mq_start_queue(struct request_queue *q)
 {
blk_mq_unquiesce_queue(q);
@@ -77,20 +54,7 @@ static void dm_mq_start_queue(struct request_queue *q)
 
 void dm_start_queue(struct request_queue *q)
 {
-   if (!q->mq_ops)
-   dm_old_start_queue(q);
-   else
-   dm_mq_start_queue(q);
-}
-
-static void dm_old_stop_queue(struct request_queue *q)
-{
-   unsigned long flags;
-
-   spin_lock_irqsave(q->queue_lock, flags);
-   if (!blk_queue_stopped(q))
-   blk_stop_queue(q);
-   

[PATCH 03/28] mspro_block: convert to blk-mq

2018-10-25 Thread Jens Axboe
Straightforward conversion; there's room for improvement.
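
For reference, the general shape of such a conversion is allocating a tag
set and deriving the queue from it. The sketch below is illustrative only:
the mspro_queue_rq handler, queue depth and error label are assumptions,
not the exact code in this patch.

	static blk_status_t mspro_queue_rq(struct blk_mq_hw_ctx *hctx,
					   const struct blk_mq_queue_data *bd);

	static const struct blk_mq_ops mspro_mq_ops = {
		.queue_rq	= mspro_queue_rq,	/* hypothetical handler */
	};

	msb->tag_set.ops = &mspro_mq_ops;
	msb->tag_set.nr_hw_queues = 1;
	msb->tag_set.queue_depth = 2;			/* illustrative depth */
	msb->tag_set.numa_node = NUMA_NO_NODE;
	msb->tag_set.flags = BLK_MQ_F_SHOULD_MERGE;
	if (blk_mq_alloc_tag_set(&msb->tag_set))
		goto out_free;				/* hypothetical label */
	msb->queue = blk_mq_init_queue(&msb->tag_set);
	if (IS_ERR(msb->queue)) {
		blk_mq_free_tag_set(&msb->tag_set);
		goto out_free;
	}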

Signed-off-by: Jens Axboe 
---
 drivers/memstick/core/mspro_block.c | 121 +++-
 1 file changed, 66 insertions(+), 55 deletions(-)

diff --git a/drivers/memstick/core/mspro_block.c 
b/drivers/memstick/core/mspro_block.c
index 0cd30dcb6801..aba50ec98b4d 100644
--- a/drivers/memstick/core/mspro_block.c
+++ b/drivers/memstick/core/mspro_block.c
@@ -12,7 +12,7 @@
  *
  */
 
-#include 
+#include 
 #include 
 #include 
 #include 
@@ -142,6 +142,7 @@ struct mspro_block_data {
struct gendisk*disk;
struct request_queue  *queue;
struct request*block_req;
+   struct blk_mq_tag_set tag_set;
spinlock_tq_lock;
 
unsigned shortpage_size;
@@ -152,7 +153,6 @@ struct mspro_block_data {
unsigned char system;
unsigned char read_only:1,
  eject:1,
- has_request:1,
  data_dir:1,
  active:1;
unsigned char transfer_cmd;
@@ -694,13 +694,12 @@ static void h_mspro_block_setup_cmd(struct memstick_dev 
*card, u64 offset,
 
 /*** Data transfer ***/
 
-static int mspro_block_issue_req(struct memstick_dev *card, int chunk)
+static int mspro_block_issue_req(struct memstick_dev *card, bool chunk)
 {
struct mspro_block_data *msb = memstick_get_drvdata(card);
u64 t_off;
unsigned int count;
 
-try_again:
while (chunk) {
msb->current_page = 0;
msb->current_seg = 0;
@@ -709,9 +708,17 @@ static int mspro_block_issue_req(struct memstick_dev 
*card, int chunk)
   msb->req_sg);
 
if (!msb->seg_count) {
-   chunk = __blk_end_request_cur(msb->block_req,
-   BLK_STS_RESOURCE);
-   continue;
+   unsigned int bytes = blk_rq_cur_bytes(msb->block_req);
+
+   chunk = blk_update_request(msb->block_req,
+   BLK_STS_RESOURCE,
+   bytes);
+   if (chunk)
+   continue;
+   __blk_mq_end_request(msb->block_req,
+   BLK_STS_RESOURCE);
+   msb->block_req = NULL;
+   break;
}
 
t_off = blk_rq_pos(msb->block_req);
@@ -729,30 +736,22 @@ static int mspro_block_issue_req(struct memstick_dev 
*card, int chunk)
return 0;
}
 
-   dev_dbg(&card->dev, "blk_fetch\n");
-   msb->block_req = blk_fetch_request(msb->queue);
-   if (!msb->block_req) {
-   dev_dbg(&card->dev, "issue end\n");
-   return -EAGAIN;
-   }
-
-   dev_dbg(&card->dev, "trying again\n");
-   chunk = 1;
-   goto try_again;
+   return 1;
 }
 
 static int mspro_block_complete_req(struct memstick_dev *card, int error)
 {
struct mspro_block_data *msb = memstick_get_drvdata(card);
-   int chunk, cnt;
+   int cnt;
+   bool chunk;
unsigned int t_len = 0;
unsigned long flags;
 
spin_lock_irqsave(&msb->q_lock, flags);
-   dev_dbg(&card->dev, "complete %d, %d\n", msb->has_request ? 1 : 0,
+   dev_dbg(&card->dev, "complete %d, %d\n", msb->block_req ? 1 : 0,
error);
 
-   if (msb->has_request) {
+   if (msb->block_req) {
/* Nothing to do - not really an error */
if (error == -EAGAIN)
error = 0;
@@ -777,15 +776,17 @@ static int mspro_block_complete_req(struct memstick_dev 
*card, int error)
if (error && !t_len)
t_len = blk_rq_cur_bytes(msb->block_req);
 
-   chunk = __blk_end_request(msb->block_req,
+   chunk = blk_update_request(msb->block_req,
errno_to_blk_status(error), t_len);
-
-   error = mspro_block_issue_req(card, chunk);
-
-   if (!error)
-   goto out;
-   else
-   msb->has_request = 0;
+   if (chunk) {
+   error = mspro_block_issue_req(card, chunk);
+   if (!error)
+   goto out;
+   } else {
+   __blk_mq_end_request(msb->block_req,
+   errno_to_blk_status(error));
+   msb->block_req = NULL;
+   }
} else {
if (!error)
error = -EAGAIN;
@@ -806,8 +807,8 @@ static void mspro_block_stop(struct memstick_dev *card)
 
while (1) {
spin_lock_irqsave(&msb->q_lock, flags);
-   if 

[PATCH 13/28] bsg: convert to use blk-mq

2018-10-25 Thread Jens Axboe
Requires a few changes to the FC transport class as well.
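
The core of it is replacing the request_fn loop with blk_mq_ops handlers;
a minimal sketch of the ops wiring, using only the handlers visible in the
hunks below (the actual table in the patch may carry additional callbacks):

	static const struct blk_mq_ops bsg_mq_ops = {
		.queue_rq	= bsg_queue_rq,
		.init_request	= bsg_init_rq,
		.exit_request	= bsg_exit_rq,
		.complete	= bsg_complete,
		.timeout	= bsg_timeout,
	};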

Cc: Johannes Thumshirn 
Cc: Benjamin Block 
Cc: linux-s...@vger.kernel.org
Signed-off-by: Jens Axboe 
---
 block/bsg-lib.c  | 123 +++
 drivers/scsi/scsi_transport_fc.c |  59 +--
 2 files changed, 110 insertions(+), 72 deletions(-)

diff --git a/block/bsg-lib.c b/block/bsg-lib.c
index 267f965af77a..ef176b472914 100644
--- a/block/bsg-lib.c
+++ b/block/bsg-lib.c
@@ -21,7 +21,7 @@
  *
  */
 #include 
-#include 
+#include 
 #include 
 #include 
 #include 
@@ -129,7 +129,7 @@ static void bsg_teardown_job(struct kref *kref)
kfree(job->request_payload.sg_list);
kfree(job->reply_payload.sg_list);
 
-   blk_end_request_all(rq, BLK_STS_OK);
+   blk_mq_end_request(rq, BLK_STS_OK);
 }
 
 void bsg_job_put(struct bsg_job *job)
@@ -157,15 +157,15 @@ void bsg_job_done(struct bsg_job *job, int result,
 {
job->result = result;
job->reply_payload_rcv_len = reply_payload_rcv_len;
-   blk_complete_request(blk_mq_rq_from_pdu(job));
+   blk_mq_complete_request(blk_mq_rq_from_pdu(job));
 }
 EXPORT_SYMBOL_GPL(bsg_job_done);
 
 /**
- * bsg_softirq_done - softirq done routine for destroying the bsg requests
+ * bsg_complete - softirq done routine for destroying the bsg requests
  * @rq: BSG request that holds the job to be destroyed
  */
-static void bsg_softirq_done(struct request *rq)
+static void bsg_complete(struct request *rq)
 {
struct bsg_job *job = blk_mq_rq_to_pdu(rq);
 
@@ -224,54 +224,46 @@ static bool bsg_prepare_job(struct device *dev, struct 
request *req)
 }
 
 /**
- * bsg_request_fn - generic handler for bsg requests
- * @q: request queue to manage
+ * bsg_queue_rq - generic handler for bsg requests
+ * @hctx: hardware queue
+ * @bd: queue data
  *
  * On error the create_bsg_job function should return a -Exyz error value
  * that will be set to ->result.
  *
  * Drivers/subsys should pass this to the queue init function.
  */
-static void bsg_request_fn(struct request_queue *q)
-   __releases(q->queue_lock)
-   __acquires(q->queue_lock)
+static blk_status_t bsg_queue_rq(struct blk_mq_hw_ctx *hctx,
+const struct blk_mq_queue_data *bd)
 {
+   struct request_queue *q = hctx->queue;
struct device *dev = q->queuedata;
-   struct request *req;
+   struct request *req = bd->rq;
int ret;
 
+   blk_mq_start_request(req);
+
if (!get_device(dev))
-   return;
-
-   while (1) {
-   req = blk_fetch_request(q);
-   if (!req)
-   break;
-   spin_unlock_irq(q->queue_lock);
-
-   if (!bsg_prepare_job(dev, req)) {
-   blk_end_request_all(req, BLK_STS_OK);
-   spin_lock_irq(q->queue_lock);
-   continue;
-   }
-
-   ret = q->bsg_job_fn(blk_mq_rq_to_pdu(req));
-   spin_lock_irq(q->queue_lock);
-   if (ret)
-   break;
-   }
+   return BLK_STS_IOERR;
+
+   if (!bsg_prepare_job(dev, req))
+   return BLK_STS_IOERR;
+
+   ret = q->bsg_job_fn(blk_mq_rq_to_pdu(req));
+   if (ret)
+   return BLK_STS_IOERR;
 
-   spin_unlock_irq(q->queue_lock);
put_device(dev);
-   spin_lock_irq(q->queue_lock);
+   return BLK_STS_OK;
 }
 
 /* called right after the request is allocated for the request_queue */
-static int bsg_init_rq(struct request_queue *q, struct request *req, gfp_t gfp)
+static int bsg_init_rq(struct blk_mq_tag_set *set, struct request *req,
+  unsigned int hctx_idx, unsigned int numa_node)
 {
struct bsg_job *job = blk_mq_rq_to_pdu(req);
 
-   job->reply = kzalloc(SCSI_SENSE_BUFFERSIZE, gfp);
+   job->reply = kzalloc(SCSI_SENSE_BUFFERSIZE, GFP_KERNEL);
if (!job->reply)
return -ENOMEM;
return 0;
@@ -289,7 +281,8 @@ static void bsg_initialize_rq(struct request *req)
job->dd_data = job + 1;
 }
 
-static void bsg_exit_rq(struct request_queue *q, struct request *req)
+static void bsg_exit_rq(struct blk_mq_tag_set *set, struct request *req,
+  unsigned int hctx_idx)
 {
struct bsg_job *job = blk_mq_rq_to_pdu(req);
 
@@ -298,11 +291,35 @@ static void bsg_exit_rq(struct request_queue *q, struct 
request *req)
 
 void bsg_remove_queue(struct request_queue *q)
 {
+   struct blk_mq_tag_set *set = q->tag_set;
+
bsg_unregister_queue(q);
blk_cleanup_queue(q);
+   blk_mq_free_tag_set(set);
+   kfree(set);
 }
 EXPORT_SYMBOL_GPL(bsg_remove_queue);
 
+static enum blk_eh_timer_return bsg_timeout(struct request *rq, bool reserved)
+{
+   enum blk_eh_timer_return ret = BLK_EH_DONE;
+   struct request_queue *q = rq->q;
+
+   if (q->rq_timed_out_fn)
+   ret = q->rq_timed_out_fn(rq);
+
+   

[PATCH 16/28] blk-cgroup: remove legacy queue bypassing

2018-10-25 Thread Jens Axboe
We only support mq devices now.

Signed-off-by: Jens Axboe 
---
 block/blk-cgroup.c | 8 
 1 file changed, 8 deletions(-)

diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index 992da5592c6e..5f10d755ec52 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -1446,8 +1446,6 @@ int blkcg_activate_policy(struct request_queue *q,
 
if (q->mq_ops)
blk_mq_freeze_queue(q);
-   else
-   blk_queue_bypass_start(q);
 pd_prealloc:
if (!pd_prealloc) {
pd_prealloc = pol->pd_alloc_fn(GFP_KERNEL, q->node);
@@ -1487,8 +1485,6 @@ int blkcg_activate_policy(struct request_queue *q,
 out_bypass_end:
if (q->mq_ops)
blk_mq_unfreeze_queue(q);
-   else
-   blk_queue_bypass_end(q);
if (pd_prealloc)
pol->pd_free_fn(pd_prealloc);
return ret;
@@ -1513,8 +1509,6 @@ void blkcg_deactivate_policy(struct request_queue *q,
 
if (q->mq_ops)
blk_mq_freeze_queue(q);
-   else
-   blk_queue_bypass_start(q);
 
spin_lock_irq(q->queue_lock);
 
@@ -1533,8 +1527,6 @@ void blkcg_deactivate_policy(struct request_queue *q,
 
if (q->mq_ops)
blk_mq_unfreeze_queue(q);
-   else
-   blk_queue_bypass_end(q);
 }
 EXPORT_SYMBOL_GPL(blkcg_deactivate_policy);
 
-- 
2.17.1



[PATCH 15/28] blk-wbt: kill check for legacy queue type

2018-10-25 Thread Jens Axboe
Everything is blk-mq at this point, so it doesn't make any sense
to have this option available, as it does nothing.

Signed-off-by: Jens Axboe 
---
 block/Kconfig   | 6 --
 block/blk-wbt.c | 3 +--
 2 files changed, 1 insertion(+), 8 deletions(-)

diff --git a/block/Kconfig b/block/Kconfig
index f7045aa47edb..8044452a4fd3 100644
--- a/block/Kconfig
+++ b/block/Kconfig
@@ -155,12 +155,6 @@ config BLK_CGROUP_IOLATENCY
 
Note, this is an experimental interface and could be changed someday.
 
-config BLK_WBT_SQ
-   bool "Single queue writeback throttling"
-   depends on BLK_WBT
-   ---help---
-   Enable writeback throttling by default on legacy single queue devices
-
 config BLK_WBT_MQ
bool "Multiqueue writeback throttling"
default y
diff --git a/block/blk-wbt.c b/block/blk-wbt.c
index 8ac93fcbaa2e..49fac89a981c 100644
--- a/block/blk-wbt.c
+++ b/block/blk-wbt.c
@@ -709,8 +709,7 @@ void wbt_enable_default(struct request_queue *q)
if (!test_bit(QUEUE_FLAG_REGISTERED, &q->queue_flags))
return;
 
-   if ((q->mq_ops && IS_ENABLED(CONFIG_BLK_WBT_MQ)) ||
-   (q->request_fn && IS_ENABLED(CONFIG_BLK_WBT_SQ)))
+   if (IS_ENABLED(CONFIG_BLK_WBT_MQ))
wbt_init(q);
 }
 EXPORT_SYMBOL_GPL(wbt_enable_default);
-- 
2.17.1



Re: [PATCH v2] blktest: remove instances of null_blk queue_mode=1

2018-10-25 Thread Jens Axboe
On 10/25/18 3:08 PM, Omar Sandoval wrote:
> On Thu, Oct 25, 2018 at 03:03:30PM -0600, Jens Axboe wrote:
>> This is no longer supported in recent kernels, so get rid of
>> any testing of queue_mode=1. queue_mode=1 tested the legacy
>> IO path, which is going away completely. As such, there's
>> no point in doing any more testing with it.
>>
>> Signed-off-by: Jens Axboe 
> 
> Thanks, applied this and squashed it with the one removing block/022.

Thanks Omar!

-- 
Jens Axboe



[PATCHSET 0/28] blk-mq driver conversions and legacy path removal

2018-10-25 Thread Jens Axboe
The first round of this went into 4.20-rc, but we still have some of
them pending. This patch series converts the remaining drivers to
blk-mq. The ones that support dual paths (like SCSI and DM) have
the non-mq path removed. At the end, legacy IO code and schedulers
are killed off.

This patch series is on top of my for-linus branch. It can also
be found in my mq-conversions branch.

 Documentation/block/biodoc.txt |   88 -
 Documentation/block/cfq-iosched.txt|  291 --
 Documentation/scsi/scsi-parameters.txt |5 -
 block/Kconfig  |6 -
 block/Kconfig.iosched  |   61 -
 block/Makefile |5 +-
 block/bfq-iosched.c|1 -
 block/blk-cgroup.c |   55 -
 block/blk-core.c   | 1860 +---
 block/blk-exec.c   |   20 +-
 block/blk-flush.c  |  154 +-
 block/blk-ioc.c|   46 +-
 block/blk-merge.c  |   35 +-
 block/blk-mq-debugfs.c |2 -
 block/blk-mq-tag.c |6 +-
 block/blk-mq.c |   13 +-
 block/blk-settings.c   |   49 -
 block/blk-softirq.c|   20 -
 block/blk-sysfs.c  |   39 +-
 block/blk-tag.c|  378 ---
 block/blk-timeout.c|   99 +-
 block/blk-wbt.c|3 +-
 block/blk.h|   60 +-
 block/bsg-lib.c|  131 +-
 block/cfq-iosched.c| 4916 
 block/deadline-iosched.c   |  560 
 block/elevator.c   |  447 +--
 block/kyber-iosched.c  |1 -
 block/mq-deadline.c|1 -
 block/noop-iosched.c   |  124 -
 drivers/block/sunvdc.c |  149 +-
 drivers/ide/ide-atapi.c|   25 +-
 drivers/ide/ide-cd.c   |  175 +-
 drivers/ide/ide-disk.c |5 +-
 drivers/ide/ide-io.c   |  101 +-
 drivers/ide/ide-park.c |4 +-
 drivers/ide/ide-pm.c   |   28 +-
 drivers/ide/ide-probe.c|   68 +-
 drivers/infiniband/ulp/srp/ib_srp.c|7 -
 drivers/md/Kconfig |   11 -
 drivers/md/dm-core.h   |   10 -
 drivers/md/dm-mpath.c  |   18 +-
 drivers/md/dm-rq.c |  293 +-
 drivers/md/dm-rq.h |4 -
 drivers/md/dm-sysfs.c  |3 +-
 drivers/md/dm-table.c  |   36 +-
 drivers/md/dm.c|   21 +-
 drivers/md/dm.h|1 -
 drivers/memstick/core/ms_block.c   |  110 +-
 drivers/memstick/core/ms_block.h   |1 +
 drivers/memstick/core/mspro_block.c|  121 +-
 drivers/s390/block/dasd_ioctl.c|   22 +-
 drivers/scsi/Kconfig   |   12 -
 drivers/scsi/cxlflash/main.c   |6 -
 drivers/scsi/hosts.c   |   29 +-
 drivers/scsi/lpfc/lpfc_scsi.c  |2 +-
 drivers/scsi/osd/osd_initiator.c   |4 +-
 drivers/scsi/osst.c|2 +-
 drivers/scsi/qedi/qedi_main.c  |3 +-
 drivers/scsi/qla2xxx/qla_os.c  |   30 +-
 drivers/scsi/scsi.c|5 +-
 drivers/scsi/scsi_debug.c  |3 +-
 drivers/scsi/scsi_error.c  |4 +-
 drivers/scsi/scsi_lib.c|  624 +---
 drivers/scsi/scsi_priv.h   |1 -
 drivers/scsi/scsi_scan.c   |   10 +-
 drivers/scsi/scsi_sysfs.c  |8 +-
 drivers/scsi/scsi_transport_fc.c   |   72 +-
 drivers/scsi/scsi_transport_iscsi.c|9 +-
 drivers/scsi/scsi_transport_sas.c  |   10 +-
 drivers/scsi/sg.c  |2 +-
 drivers/scsi/st.c  |2 +-
 drivers/scsi/ufs/ufshcd.c  |6 -
 drivers/target/target_core_pscsi.c |2 +-
 include/linux/blk-cgroup.h |  108 -
 include/linux/blkdev.h |  174 +-
 include/linux/bsg-lib.h|3 +-
 include/linux/elevator.h   |   90 +-
 include/linux/ide.h|   13 +-
 include/linux/init.h   |1 -
 include/scsi/scsi_host.h   |   18 +-
 include/scsi/scsi_tcq.h|   14 +-
 init/do_mounts_initrd.c|3 -
 init/initramfs.c   |6 -
 init/main.c|   12 -
 85 files changed, 833 insertions(+), 11144 deletions(-)

-- 
Jens Axboe




Re: [PATCH v2] blktest: remove instances of null_blk queue_mode=1

2018-10-25 Thread Omar Sandoval
On Thu, Oct 25, 2018 at 03:03:30PM -0600, Jens Axboe wrote:
> This is no longer supported in recent kernels, so get rid of
> any testing of queue_mode=1. queue_mode=1 tested the legacy
> IO path, which is going away completely. As such, there's
> no point in doing any more testing with it.
> 
> Signed-off-by: Jens Axboe 

Thanks, applied this and squashed it with the one removing block/022.


[PATCH v2] blktest: remove instances of null_blk queue_mode=1

2018-10-25 Thread Jens Axboe
This is no longer supported in recent kernels, so get rid of
any testing of queue_mode=1. queue_mode=1 tested the legacy
IO path, which is going away completely. As such, there's
no point in doing any more testing with it.

Signed-off-by: Jens Axboe 

---

Replaces the two previous patches - covers 024 as well, and folds the
other two.

diff --git a/tests/block/017 b/tests/block/017
index 715c4e59c514..cea29beaf062 100755
--- a/tests/block/017
+++ b/tests/block/017
@@ -26,27 +26,23 @@ show_inflight() {
 test() {
echo "Running ${TEST_NAME}"
 
-   for ((queue_mode = 1; queue_mode <= 2; queue_mode++)) do
-   echo "queue mode $queue_mode"
+   if ! _init_null_blk queue_mode=2 irqmode=2 \
+completion_nsec=5; then
+   return 1
+   fi
 
-   if ! _init_null_blk queue_mode="$queue_mode" irqmode=2 \
-completion_nsec=5; then
-   continue
-   fi
+   dd if=/dev/nullb0 of=/dev/null bs=4096 iflag=direct count=1 status=none 
&
+   sleep 0.1
+   show_inflight
 
-   dd if=/dev/nullb0 of=/dev/null bs=4096 iflag=direct count=1 
status=none &
-   sleep 0.1
-   show_inflight
+   dd if=/dev/zero of=/dev/nullb0 bs=4096 oflag=direct count=1 status=none 
&
+   sleep 0.1
+   show_inflight
 
-   dd if=/dev/zero of=/dev/nullb0 bs=4096 oflag=direct count=1 
status=none &
-   sleep 0.1
-   show_inflight
+   wait
+   show_inflight
 
-   wait
-   show_inflight
-
-   _exit_null_blk
-   done
+   _exit_null_blk
 
echo "Test complete"
 }
diff --git a/tests/block/017.out b/tests/block/017.out
index 93d67c98e159..e2ecf978af5f 100644
--- a/tests/block/017.out
+++ b/tests/block/017.out
@@ -1,18 +1,4 @@
 Running block/017
-queue mode 1
-sysfs inflight reads 1
-sysfs inflight writes 0
-sysfs stat 1
-diskstats 1
-sysfs inflight reads 1
-sysfs inflight writes 1
-sysfs stat 2
-diskstats 2
-sysfs inflight reads 0
-sysfs inflight writes 0
-sysfs stat 0
-diskstats 0
-queue mode 2
 sysfs inflight reads 1
 sysfs inflight writes 0
 sysfs stat 1
diff --git a/tests/block/018 b/tests/block/018
index 279dc7a31958..731272399a82 100755
--- a/tests/block/018
+++ b/tests/block/018
@@ -29,37 +29,33 @@ show_times() {
 }
 
 test() {
-   echo "Running ${TEST_NAME}"
-
-   for ((queue_mode = 1; queue_mode <= 2; queue_mode++)) do
-   local init_read_ms init_write_ms read_ms write_ms
+   local init_read_ms init_write_ms read_ms write_ms
 
-   echo "queue mode $queue_mode"
+   echo "Running ${TEST_NAME}"
 
-   if ! _init_null_blk queue_mode="$queue_mode" irqmode=2 \
-completion_nsec=10; then
-   continue
-   fi
+   if ! _init_null_blk queue_mode=2 irqmode=2 \
+completion_nsec=10; then
+   return 1
+   fi
 
-   init_times
-   show_times
+   init_times
+   show_times
 
-   dd if=/dev/nullb0 of=/dev/null bs=4096 iflag=direct count=1 
status=none
-   show_times
+   dd if=/dev/nullb0 of=/dev/null bs=4096 iflag=direct count=1 status=none
+   show_times
 
-   dd if=/dev/zero of=/dev/nullb0 bs=4096 oflag=direct count=1 
status=none
-   show_times
+   dd if=/dev/zero of=/dev/nullb0 bs=4096 oflag=direct count=1 status=none
+   show_times
 
-   dd if=/dev/nullb0 of=/dev/null bs=4096 iflag=direct count=1 
status=none &
-   dd if=/dev/zero of=/dev/nullb0 bs=4096 oflag=direct count=1 
status=none &
-   dd if=/dev/zero of=/dev/nullb0 bs=4096 oflag=direct count=1 
status=none &
-   wait
-   show_times
+   dd if=/dev/nullb0 of=/dev/null bs=4096 iflag=direct count=1 status=none 
&
+   dd if=/dev/zero of=/dev/nullb0 bs=4096 oflag=direct count=1 status=none 
&
+   dd if=/dev/zero of=/dev/nullb0 bs=4096 oflag=direct count=1 status=none 
&
+   wait
+   show_times
 
-   _exit_null_blk
+   _exit_null_blk
 
-   unset init_read_ms init_write_ms read_ms write_ms
-   done
+   unset init_read_ms init_write_ms read_ms write_ms
 
echo "Test complete"
 }
diff --git a/tests/block/018.out b/tests/block/018.out
index facb7c2be260..27c74bec2337 100644
--- a/tests/block/018.out
+++ b/tests/block/018.out
@@ -1,14 +1,4 @@
 Running block/018
-queue mode 1
-read 0 s
-write 0 s
-read 1 s
-write 0 s
-read 1 s
-write 1 s
-read 2 s
-write 3 s
-queue mode 2
 read 0 s
 write 0 s
 read 1 s
diff --git a/tests/block/023 b/tests/block/023
index b053af45adb0..b0739f72e46d 100755
--- a/tests/block/023
+++ b/tests/block/023
@@ -19,7 +19,7 @@ test() {
echo "Running ${TEST_NAME}"
 
local queue_mode
-   for ((queue_mode = 0; queue_mode <= 2; queue_mode++)); do
+  

[PATCH] blktest: remove instances of null_blk queue_mode=1

2018-10-25 Thread Jens Axboe
This is no longer supported in recent kernels, so get rid of
any testing of queue_mode=1.

Signed-off-by: Jens Axboe 

diff --git a/tests/block/017 b/tests/block/017
index 715c4e59c514..cea29beaf062 100755
--- a/tests/block/017
+++ b/tests/block/017
@@ -26,27 +26,23 @@ show_inflight() {
 test() {
echo "Running ${TEST_NAME}"
 
-   for ((queue_mode = 1; queue_mode <= 2; queue_mode++)) do
-   echo "queue mode $queue_mode"
+   if ! _init_null_blk queue_mode=2 irqmode=2 \
+completion_nsec=5; then
+   return 1
+   fi
 
-   if ! _init_null_blk queue_mode="$queue_mode" irqmode=2 \
-completion_nsec=5; then
-   continue
-   fi
+   dd if=/dev/nullb0 of=/dev/null bs=4096 iflag=direct count=1 status=none 
&
+   sleep 0.1
+   show_inflight
 
-   dd if=/dev/nullb0 of=/dev/null bs=4096 iflag=direct count=1 
status=none &
-   sleep 0.1
-   show_inflight
+   dd if=/dev/zero of=/dev/nullb0 bs=4096 oflag=direct count=1 status=none 
&
+   sleep 0.1
+   show_inflight
 
-   dd if=/dev/zero of=/dev/nullb0 bs=4096 oflag=direct count=1 
status=none &
-   sleep 0.1
-   show_inflight
+   wait
+   show_inflight
 
-   wait
-   show_inflight
-
-   _exit_null_blk
-   done
+   _exit_null_blk
 
echo "Test complete"
 }
diff --git a/tests/block/017.out b/tests/block/017.out
index 93d67c98e159..e2ecf978af5f 100644
--- a/tests/block/017.out
+++ b/tests/block/017.out
@@ -1,18 +1,4 @@
 Running block/017
-queue mode 1
-sysfs inflight reads 1
-sysfs inflight writes 0
-sysfs stat 1
-diskstats 1
-sysfs inflight reads 1
-sysfs inflight writes 1
-sysfs stat 2
-diskstats 2
-sysfs inflight reads 0
-sysfs inflight writes 0
-sysfs stat 0
-diskstats 0
-queue mode 2
 sysfs inflight reads 1
 sysfs inflight writes 0
 sysfs stat 1
diff --git a/tests/block/018 b/tests/block/018
index 279dc7a31958..731272399a82 100755
--- a/tests/block/018
+++ b/tests/block/018
@@ -29,37 +29,33 @@ show_times() {
 }
 
 test() {
-   echo "Running ${TEST_NAME}"
-
-   for ((queue_mode = 1; queue_mode <= 2; queue_mode++)) do
-   local init_read_ms init_write_ms read_ms write_ms
+   local init_read_ms init_write_ms read_ms write_ms
 
-   echo "queue mode $queue_mode"
+   echo "Running ${TEST_NAME}"
 
-   if ! _init_null_blk queue_mode="$queue_mode" irqmode=2 \
-completion_nsec=10; then
-   continue
-   fi
+   if ! _init_null_blk queue_mode=2 irqmode=2 \
+completion_nsec=10; then
+   return 1
+   fi
 
-   init_times
-   show_times
+   init_times
+   show_times
 
-   dd if=/dev/nullb0 of=/dev/null bs=4096 iflag=direct count=1 
status=none
-   show_times
+   dd if=/dev/nullb0 of=/dev/null bs=4096 iflag=direct count=1 status=none
+   show_times
 
-   dd if=/dev/zero of=/dev/nullb0 bs=4096 oflag=direct count=1 
status=none
-   show_times
+   dd if=/dev/zero of=/dev/nullb0 bs=4096 oflag=direct count=1 status=none
+   show_times
 
-   dd if=/dev/nullb0 of=/dev/null bs=4096 iflag=direct count=1 
status=none &
-   dd if=/dev/zero of=/dev/nullb0 bs=4096 oflag=direct count=1 
status=none &
-   dd if=/dev/zero of=/dev/nullb0 bs=4096 oflag=direct count=1 
status=none &
-   wait
-   show_times
+   dd if=/dev/nullb0 of=/dev/null bs=4096 iflag=direct count=1 status=none 
&
+   dd if=/dev/zero of=/dev/nullb0 bs=4096 oflag=direct count=1 status=none 
&
+   dd if=/dev/zero of=/dev/nullb0 bs=4096 oflag=direct count=1 status=none 
&
+   wait
+   show_times
 
-   _exit_null_blk
+   _exit_null_blk
 
-   unset init_read_ms init_write_ms read_ms write_ms
-   done
+   unset init_read_ms init_write_ms read_ms write_ms
 
echo "Test complete"
 }
diff --git a/tests/block/018.out b/tests/block/018.out
index facb7c2be260..27c74bec2337 100644
--- a/tests/block/018.out
+++ b/tests/block/018.out
@@ -1,14 +1,4 @@
 Running block/018
-queue mode 1
-read 0 s
-write 0 s
-read 1 s
-write 0 s
-read 1 s
-write 1 s
-read 2 s
-write 3 s
-queue mode 2
 read 0 s
 write 0 s
 read 1 s
diff --git a/tests/block/023 b/tests/block/023
index b053af45adb0..b0739f72e46d 100755
--- a/tests/block/023
+++ b/tests/block/023
@@ -19,7 +19,7 @@ test() {
echo "Running ${TEST_NAME}"
 
local queue_mode
-   for ((queue_mode = 0; queue_mode <= 2; queue_mode++)); do
+   for queue_mode in 0 2; do
if _init_null_blk gb=1 queue_mode="$queue_mode"; then
echo "Queue mode $queue_mode"
dd if=/dev/nullb0 of=/dev/null 

[PATCH] blktests: remove legacy IO path timeout test

2018-10-25 Thread Jens Axboe
This feature is gone from null_blk in the current Linux kernels.
It doesn't make sense to keep testing this on older kernels either,
as the legacy IO path is going away.

Signed-off-by: Jens Axboe 

diff --git a/tests/block/022 b/tests/block/022
deleted file mode 100755
index 91946cfab6bf..
--- a/tests/block/022
+++ /dev/null
@@ -1,45 +0,0 @@
-#!/bin/bash
-# SPDX-License-Identifier: GPL-3.0+
-# Copyright (C) 2018 Jens Axboe
-#
-# Smoke test !mq timeout handling with null-blk.
-
-. tests/block/rc
-. common/null_blk
-
-DESCRIPTION="run null-blk with legacy blk path and timeout injection 
configured"
-
-requires() {
-   _have_null_blk && _have_module_param null_blk timeout
-}
-
-test() {
-   echo "Running ${TEST_NAME}"
-
-   # The format is "<interval>,<probability>,<space>,<times>". Here, we
-   # fail 50% of I/Os.
-   if ! _init_null_blk queue_mode=1 timeout='1,50,0,-1'; then
-   return 1
-   fi
-
-   local scheds
-   # shellcheck disable=SC2207
-   scheds=($(sed 's/[][]//g' /sys/block/nullb0/queue/scheduler))
-
-   for sched in "${scheds[@]}"; do
-   echo "Testing $sched" >> "$FULL"
-   echo "$sched" > /sys/block/nullb0/queue/scheduler
-   # Do a bunch of I/Os which will timeout and then complete. The
-   # only thing we're really testing here is that this doesn't
-   # crash or hang.
-   for ((i = 0; i < 100; i++)); do
-   dd if=/dev/nullb0 of=/dev/null bs=4K count=4 \
-   iflag=direct status=none &
-   done
-   wait
-   done
-
-   _exit_null_blk
-
-   echo "Test complete"
-}
diff --git a/tests/block/022.out b/tests/block/022.out
deleted file mode 100644
index 14d43cb1c828..
--- a/tests/block/022.out
+++ /dev/null
@@ -1,2 +0,0 @@
-Running block/022
-Test complete

-- 
Jens Axboe



Re: [PATCH] lightnvm: pblk: ignore the smeta oob area scan

2018-10-25 Thread Hans Holmberg
On Thu, Oct 25, 2018 at 2:44 AM Zhoujie Wu  wrote:
>
> The smeta area l2p mapping is empty, and actually the
> recovery procedure only need to restore data sector's l2p
> mapping. So ignore the smeta oob scan.
>
> Signed-off-by: Zhoujie Wu 
> ---
>  drivers/lightnvm/pblk-recovery.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/lightnvm/pblk-recovery.c 
> b/drivers/lightnvm/pblk-recovery.c
> index 5740b75..30f2616 100644
> --- a/drivers/lightnvm/pblk-recovery.c
> +++ b/drivers/lightnvm/pblk-recovery.c
> @@ -334,6 +334,7 @@ static int pblk_recov_scan_oob(struct pblk *pblk, struct 
> pblk_line *line,
>struct pblk_recov_alloc p)
>  {
> struct nvm_tgt_dev *dev = pblk->dev;
> +   struct pblk_line_meta *lm = &pblk->lm;
> struct nvm_geo *geo = >geo;
> struct ppa_addr *ppa_list;
> struct pblk_sec_meta *meta_list;
> @@ -342,12 +343,12 @@ static int pblk_recov_scan_oob(struct pblk *pblk, 
> struct pblk_line *line,
> void *data;
> dma_addr_t dma_ppa_list, dma_meta_list;
> __le64 *lba_list;
> -   u64 paddr = 0;
> +   u64 paddr = lm->smeta_sec;

Smeta is not guaranteed to start at paddr 0 - it will be placed in the
first non-bad chunk (in stripe order).
If the first chunk in the line is bad, smeta will be read and
lm->smeta_sec sectors will be lost.

You can use pblk_line_smeta_start to calculate the start address of smeta.

/ Hans

> bool padded = false;
> int rq_ppas, rq_len;
> int i, j;
> int ret;
> -   u64 left_ppas = pblk_sec_in_open_line(pblk, line);
> +   u64 left_ppas = pblk_sec_in_open_line(pblk, line) - lm->smeta_sec;
>
> if (pblk_line_wp_is_unbalanced(pblk, line))
> pblk_warn(pblk, "recovering unbalanced line (%d)\n", 
> line->id);
> --
> 1.9.1
>


Submit Proposals to the 2019 Linux Storage and Filesystems Conference!

2018-10-25 Thread Christoph Hellwig
After a one-year hiatus, the Linux Storage and Filesystems Conference (Vault) 
returns in 2019, under the sponsorship and organization of the USENIX 
Association. Vault brings together practitioners, implementers, users, and 
researchers working on storage in open source and related projects.

We welcome creators and users of open source storage, file systems, and related 
technologies to submit their work and to join us for Vault '19, which will take 
place on February 25 - 26, 2019, in Boston, MA, USA, and will be co-located 
with the 17th USENIX Conference on File and Storage Technologies (FAST '19).

Learn More about Vault '19:
https://www.usenix.org/conference/vault19

Learn More about FAST '19:
https://www.usenix.org/conference/fast19

We are looking for proposals on a diverse range of topics related to storage, 
Linux, and open source. The best talks will share your or your team's 
experience with a new technology, a new idea, a new approach, or inspire the 
audience to think beyond the ways they have always done things. We are also 
accepting proposals for a limited number of workshop sessions, where content 
can be more like a tutorial in nature or include hands-on participation by 
attendees. We encourage new speakers to submit talks as some of the most 
insightful talks often come from people with new experiences to share.

Previous Vault events have drawn multiple hundreds of attendees from a range of 
companies, with backgrounds ranging from individual open source contributors, 
to new startups, through teams within the technology and storage giants, or 
storage end users.

Talk and workshop proposals are due on Thursday, November 15, 2018. Please read 
through the Call for Participation for additional details, including topics of 
interest, and submission instructions.

View the Vault '19 Call for Participation:
https://www.usenix.org/conference/vault19/call-for-participation

We look forward to receiving your proposals!

Christoph Hellwig
Erik Riedel
Ric Wheeler, Red Hat
vault19cha...@usenix.org