Re: [PATCH V3 0/7] blk-mq: don't allocate driver tag beforehand for flush rq

2017-11-03 Thread Ming Lei
On Thu, Nov 02, 2017 at 11:24:31PM +0800, Ming Lei wrote:
> Hi Jens,
> 
> This patchset avoids to allocate driver tag beforehand for flush rq
> in case of I/O scheduler, then flush rq isn't treated specially
> wrt. get/put driver tag, code gets cleanup much, such as,
> reorder_tags_to_front() is removed, and we needn't to worry
> about request order in dispatch list for avoiding I/O deadlock.
> 
> 'dbench -t 30 -s 64' has been run on different devices(shared tag,
> multi-queue, singele queue, ...), and no issues are observed,
> even very low queue depth test are run, debench still works well.
> 
> Please consider it for V4.15, thanks!

Hi Jens,

As we discussed before, this patch is a good cleanup on handling flush
request, could you share your opinion on V3?

thanks,
Ming


Re: [PATCH V2 0/2] block: remove unnecessary RESTART

2017-11-03 Thread Ming Lei
On Sat, Nov 04, 2017 at 12:47:31AM +0800, Ming Lei wrote:
> On Fri, Nov 03, 2017 at 12:13:09PM -0400, Laurence Oberman wrote:
> > Hi
> > I had it working some time back. I am off today to take my son to the
> > doctor.
> > I will get Bart's test working again this weekend.
> 
> Hello Laurence and Bart,
> 
> Just found srp-test starts to work now with v4.14-rc4 kernel, and
> the IO hang issue has been reproduced one time. BTW, the connection
> failure happened on for-next of block tree.
> 
> Will investigate the issue this weekend.

Hi Bart,

Please test the following patch to see if your issue can be fixed:

https://marc.info/?l=linux-scsi=150976056219162=2

I run three times of your test 01, not see the hang issue any more
once this patch is applied.

Thanks,
Ming

> 
> > 
> > On Nov 3, 2017 11:51 AM, "Bart Van Assche"  wrote:
> > 
> > > On Fri, 2017-11-03 at 23:47 +0800, Ming Lei wrote:
> > > > Forget to mention, there is failure when running 'make' under srp-test
> > > > because shellcheck package is missed in RHEL7. Can that be the issue
> > > > of test failure? If yes, could you provide a special version of srp-test
> > > > which doesn't depend on shellcheck?
> > >
> > > The srp-test software does not depend on shellcheck. The only purpose of
> > > the "shellcheck" target in the srp-test Makefile is to verify the syntax
> > > of the shell scripts. The tests run fine even if shellcheck is missing.
> > >
> > > Bart.
> 
> -- 
> Ming

-- 
Ming


[PATCH] SCSI: don't get target/host busy_count in scsi_mq_get_budget()

2017-11-03 Thread Ming Lei
It is very expensive to atomic_inc/atomic_dec the host wide counter of
host->busy_count, and it should have been avoided via blk-mq's mechanism
of getting driver tag, which uses the more efficient way of sbitmap queue.

Also we don't check atomic_read(>device_busy) in scsi_mq_get_budget()
and don't run queue if the counter becomes zero, so IO hang may be caused
if all requests are completed just before the current SCSI device
is added to shost->starved_list.

Fixes: 0df21c86bdbf(scsi: implement .get_budget and .put_budget for blk-mq)
Reported-by: Bart Van Assche 
Signed-off-by: Ming Lei 
---

 drivers/scsi/scsi_lib.c | 29 +
 1 file changed, 13 insertions(+), 16 deletions(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index a098af3070a1..7f218ef61900 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1944,11 +1944,7 @@ static void scsi_mq_put_budget(struct blk_mq_hw_ctx 
*hctx)
 {
struct request_queue *q = hctx->queue;
struct scsi_device *sdev = q->queuedata;
-   struct Scsi_Host *shost = sdev->host;
 
-   atomic_dec(>host_busy);
-   if (scsi_target(sdev)->can_queue > 0)
-   atomic_dec(_target(sdev)->target_busy);
atomic_dec(>device_busy);
put_device(>sdev_gendev);
 }
@@ -1957,7 +1953,6 @@ static blk_status_t scsi_mq_get_budget(struct 
blk_mq_hw_ctx *hctx)
 {
struct request_queue *q = hctx->queue;
struct scsi_device *sdev = q->queuedata;
-   struct Scsi_Host *shost = sdev->host;
blk_status_t ret;
 
ret = prep_to_mq(scsi_prep_state_check(sdev, NULL));
@@ -1968,18 +1963,9 @@ static blk_status_t scsi_mq_get_budget(struct 
blk_mq_hw_ctx *hctx)
goto out;
if (!scsi_dev_queue_ready(q, sdev))
goto out_put_device;
-   if (!scsi_target_queue_ready(shost, sdev))
-   goto out_dec_device_busy;
-   if (!scsi_host_queue_ready(q, shost, sdev))
-   goto out_dec_target_busy;
 
return BLK_STS_OK;
 
-out_dec_target_busy:
-   if (scsi_target(sdev)->can_queue > 0)
-   atomic_dec(_target(sdev)->target_busy);
-out_dec_device_busy:
-   atomic_dec(>device_busy);
 out_put_device:
put_device(>sdev_gendev);
 out:
@@ -1992,6 +1978,7 @@ static blk_status_t scsi_queue_rq(struct blk_mq_hw_ctx 
*hctx,
struct request *req = bd->rq;
struct request_queue *q = req->q;
struct scsi_device *sdev = q->queuedata;
+   struct Scsi_Host *shost = sdev->host;
struct scsi_cmnd *cmd = blk_mq_rq_to_pdu(req);
blk_status_t ret;
int reason;
@@ -2001,10 +1988,15 @@ static blk_status_t scsi_queue_rq(struct blk_mq_hw_ctx 
*hctx,
goto out_put_budget;
 
ret = BLK_STS_RESOURCE;
+   if (!scsi_target_queue_ready(shost, sdev))
+   goto out_put_budget;
+   if (!scsi_host_queue_ready(q, shost, sdev))
+   goto out_dec_target_busy;
+
if (!(req->rq_flags & RQF_DONTPREP)) {
ret = prep_to_mq(scsi_mq_prep_fn(req));
if (ret != BLK_STS_OK)
-   goto out_put_budget;
+   goto out_dec_host_busy;
req->rq_flags |= RQF_DONTPREP;
} else {
blk_mq_start_request(req);
@@ -2022,11 +2014,16 @@ static blk_status_t scsi_queue_rq(struct blk_mq_hw_ctx 
*hctx,
if (reason) {
scsi_set_blocked(cmd, reason);
ret = BLK_STS_RESOURCE;
-   goto out_put_budget;
+   goto out_dec_host_busy;
}
 
return BLK_STS_OK;
 
+out_dec_host_busy:
+   atomic_dec(>host_busy);
+out_dec_target_busy:
+   if (scsi_target(sdev)->can_queue > 0)
+   atomic_dec(_target(sdev)->target_busy);
 out_put_budget:
scsi_mq_put_budget(hctx);
switch (ret) {
-- 
2.9.5



[PATCH v3 3/8] nvmet/fc: Use sgl_alloc() and sgl_free()

2017-11-03 Thread Bart Van Assche
Use the sgl_alloc() and sgl_free() functions instead of open coding
these functions.

Signed-off-by: Bart Van Assche 
Reviewed-by: Johannes Thumshirn 
Reviewed-by: Hannes Reinecke 
Cc: Keith Busch 
Cc: Christoph Hellwig 
Cc: James Smart 
Cc: Sagi Grimberg 
---
 drivers/nvme/target/Kconfig |  1 +
 drivers/nvme/target/fc.c| 36 ++--
 2 files changed, 3 insertions(+), 34 deletions(-)

diff --git a/drivers/nvme/target/Kconfig b/drivers/nvme/target/Kconfig
index 03e4ab65fe77..4d9715630e21 100644
--- a/drivers/nvme/target/Kconfig
+++ b/drivers/nvme/target/Kconfig
@@ -39,6 +39,7 @@ config NVME_TARGET_FC
tristate "NVMe over Fabrics FC target driver"
depends on NVME_TARGET
depends on HAS_DMA
+   select SGL_ALLOC
help
  This enables the NVMe FC target support, which allows exporting NVMe
  devices over FC.
diff --git a/drivers/nvme/target/fc.c b/drivers/nvme/target/fc.c
index 5f2570f5dfa9..1b4030d74ceb 100644
--- a/drivers/nvme/target/fc.c
+++ b/drivers/nvme/target/fc.c
@@ -1695,31 +1695,12 @@ static int
 nvmet_fc_alloc_tgt_pgs(struct nvmet_fc_fcp_iod *fod)
 {
struct scatterlist *sg;
-   struct page *page;
unsigned int nent;
-   u32 page_len, length;
-   int i = 0;
 
-   length = fod->total_length;
-   nent = DIV_ROUND_UP(length, PAGE_SIZE);
-   sg = kmalloc_array(nent, sizeof(struct scatterlist), GFP_KERNEL);
+   sg = sgl_alloc(fod->total_length, GFP_KERNEL, );
if (!sg)
goto out;
 
-   sg_init_table(sg, nent);
-
-   while (length) {
-   page_len = min_t(u32, length, PAGE_SIZE);
-
-   page = alloc_page(GFP_KERNEL);
-   if (!page)
-   goto out_free_pages;
-
-   sg_set_page([i], page, page_len, 0);
-   length -= page_len;
-   i++;
-   }
-
fod->data_sg = sg;
fod->data_sg_cnt = nent;
fod->data_sg_cnt = fc_dma_map_sg(fod->tgtport->dev, sg, nent,
@@ -1729,14 +1710,6 @@ nvmet_fc_alloc_tgt_pgs(struct nvmet_fc_fcp_iod *fod)
 
return 0;
 
-out_free_pages:
-   while (i > 0) {
-   i--;
-   __free_page(sg_page([i]));
-   }
-   kfree(sg);
-   fod->data_sg = NULL;
-   fod->data_sg_cnt = 0;
 out:
return NVME_SC_INTERNAL;
 }
@@ -1744,18 +1717,13 @@ nvmet_fc_alloc_tgt_pgs(struct nvmet_fc_fcp_iod *fod)
 static void
 nvmet_fc_free_tgt_pgs(struct nvmet_fc_fcp_iod *fod)
 {
-   struct scatterlist *sg;
-   int count;
-
if (!fod->data_sg || !fod->data_sg_cnt)
return;
 
fc_dma_unmap_sg(fod->tgtport->dev, fod->data_sg, fod->data_sg_cnt,
((fod->io_dir == NVMET_FCP_WRITE) ?
DMA_FROM_DEVICE : DMA_TO_DEVICE));
-   for_each_sg(fod->data_sg, sg, fod->data_sg_cnt, count)
-   __free_page(sg_page(sg));
-   kfree(fod->data_sg);
+   sgl_free(fod->data_sg);
fod->data_sg = NULL;
fod->data_sg_cnt = 0;
 }
-- 
2.14.3



[PATCH v3 0/8] Introduce sgl_alloc() and sgl_free()

2017-11-03 Thread Bart Van Assche
Hello Jens,

As you know there are multiple drivers that both allocate a scatter/gather
list and populate that list with pages. This patch series moves the code for
allocating and freeing such scatterlists from these drivers into
lib/scatterlist.c. Please consider this patch series for kernel v4.15.

Notes:
- Only the ib_srpt and ipr changes have been tested.
- The next step is to introduce a caching mechanism for these scatterlists
  and make the nvmet/rdma and SCSI target drivers use that caching mechanism
  since for these drivers sgl allocation occurs in the hot path.

Changes between v2 and v3:
- Added Reviewed-by tags that had been posted as replies on v2.
- Cc'ed the crypto maintainers for the entire patch series.

Changes between v1 and v2:
- Moved the sgl_alloc*() and sgl_free() functions from a new source file into
  lib/scatterlist.c.
- Changed the argument order for the sgl_alloc*() functions such that the
  (pointer to) the output argument comes last.

Thanks,

Bart.

Bart Van Assche (8):
  lib/scatterlist: Introduce sgl_alloc() and sgl_free()
  crypto: scompress - use sgl_alloc() and sgl_free()
  nvmet/fc: Use sgl_alloc() and sgl_free()
  nvmet/rdma: Use sgl_alloc() and sgl_free()
  target: Use sgl_alloc_order() and sgl_free()
  scsi/ipr: Use sgl_alloc_order() and sgl_free_order()
  scsi/pmcraid: Remove an unused structure member
  scsi/pmcraid: Use sgl_alloc_order() and sgl_free_order()

 crypto/Kconfig |   1 +
 crypto/scompress.c |  51 +---
 drivers/nvme/target/Kconfig|   2 +
 drivers/nvme/target/fc.c   |  36 +--
 drivers/nvme/target/rdma.c |  63 ++--
 drivers/scsi/Kconfig   |   2 +
 drivers/scsi/ipr.c |  49 +++
 drivers/scsi/ipr.h |   2 +-
 drivers/scsi/pmcraid.c |  43 ++
 drivers/scsi/pmcraid.h |   3 +-
 drivers/target/Kconfig |   1 +
 drivers/target/target_core_transport.c |  46 ++-
 include/linux/scatterlist.h|  10 
 lib/Kconfig|   4 ++
 lib/scatterlist.c  | 105 +
 15 files changed, 151 insertions(+), 267 deletions(-)

-- 
2.14.3



[PATCH v3 1/8] lib/scatterlist: Introduce sgl_alloc() and sgl_free()

2017-11-03 Thread Bart Van Assche
Many kernel drivers contain code that allocates and frees both a
scatterlist and the pages that populate that scatterlist.
Introduce functions in lib/scatterlist.c that perform these tasks
instead of duplicating this functionality in multiple drivers.
Only include these functions in the build if CONFIG_SGL_ALLOC=y
to avoid that the kernel size increases if this functionality is
not used.

Signed-off-by: Bart Van Assche 
Reviewed-by: Hannes Reinecke 
Reviewed-by: Johannes Thumshirn 
---
 include/linux/scatterlist.h |  10 +
 lib/Kconfig |   4 ++
 lib/scatterlist.c   | 105 
 3 files changed, 119 insertions(+)

diff --git a/include/linux/scatterlist.h b/include/linux/scatterlist.h
index 4b3286ac60c8..3642511918d5 100644
--- a/include/linux/scatterlist.h
+++ b/include/linux/scatterlist.h
@@ -266,6 +266,16 @@ int sg_alloc_table_from_pages(struct sg_table *sgt,
unsigned long offset, unsigned long size,
gfp_t gfp_mask);
 
+#ifdef CONFIG_SGL_ALLOC
+struct scatterlist *sgl_alloc_order(unsigned long long length,
+   unsigned int order, bool chainable,
+   gfp_t gfp, unsigned int *nent_p);
+struct scatterlist *sgl_alloc(unsigned long long length, gfp_t gfp,
+ unsigned int *nent_p);
+void sgl_free_order(struct scatterlist *sgl, int order);
+void sgl_free(struct scatterlist *sgl);
+#endif /* CONFIG_SGL_ALLOC */
+
 size_t sg_copy_buffer(struct scatterlist *sgl, unsigned int nents, void *buf,
  size_t buflen, off_t skip, bool to_buffer);
 
diff --git a/lib/Kconfig b/lib/Kconfig
index b1445b22a6de..8396c4cfa1ab 100644
--- a/lib/Kconfig
+++ b/lib/Kconfig
@@ -413,6 +413,10 @@ config HAS_DMA
depends on !NO_DMA
default y
 
+config SGL_ALLOC
+   bool
+   default n
+
 config DMA_NOOP_OPS
bool
depends on HAS_DMA && (!64BIT || ARCH_DMA_ADDR_T_64BIT)
diff --git a/lib/scatterlist.c b/lib/scatterlist.c
index be7b4dd6b68d..e2b5a78ceb44 100644
--- a/lib/scatterlist.c
+++ b/lib/scatterlist.c
@@ -433,6 +433,111 @@ int sg_alloc_table_from_pages(struct sg_table *sgt,
 }
 EXPORT_SYMBOL(sg_alloc_table_from_pages);
 
+#ifdef CONFIG_SGL_ALLOC
+
+/**
+ * sgl_alloc_order - allocate a scatterlist and its pages
+ * @length: Length in bytes of the scatterlist. Must be at least one
+ * @order: Second argument for alloc_pages()
+ * @chainable: Whether or not to allocate an extra element in the scatterlist
+ * for scatterlist chaining purposes
+ * @gfp: Memory allocation flags
+ * @nent_p: [out] Number of entries in the scatterlist that have pages
+ *
+ * Returns: A pointer to an initialized scatterlist or %NULL upon failure.
+ */
+struct scatterlist *sgl_alloc_order(unsigned long long length,
+   unsigned int order, bool chainable,
+   gfp_t gfp, unsigned int *nent_p)
+{
+   struct scatterlist *sgl, *sg;
+   struct page *page;
+   unsigned int nent, nalloc;
+   u32 elem_len;
+
+   nent = round_up(length, PAGE_SIZE << order) >> (PAGE_SHIFT + order);
+   /* Check for integer overflow */
+   if (length > (nent << (PAGE_SHIFT + order)))
+   return NULL;
+   nalloc = nent;
+   if (chainable) {
+   /* Check for integer overflow */
+   if (nalloc + 1 < nalloc)
+   return NULL;
+   nalloc++;
+   }
+   sgl = kmalloc_array(nalloc, sizeof(struct scatterlist),
+   (gfp & ~GFP_DMA) | __GFP_ZERO);
+   if (!sgl)
+   return NULL;
+
+   sg_init_table(sgl, nent);
+   sg = sgl;
+   while (length) {
+   elem_len = min_t(u64, length, PAGE_SIZE << order);
+   page = alloc_pages(gfp, order);
+   if (!page) {
+   sgl_free(sgl);
+   return NULL;
+   }
+
+   sg_set_page(sg, page, elem_len, 0);
+   length -= elem_len;
+   sg = sg_next(sg);
+   }
+   WARN_ON_ONCE(sg);
+   if (nent_p)
+   *nent_p = nent;
+   return sgl;
+}
+EXPORT_SYMBOL(sgl_alloc_order);
+
+/**
+ * sgl_alloc - allocate a scatterlist and its pages
+ * @length: Length in bytes of the scatterlist
+ * @gfp: Memory allocation flags
+ * @nent_p: [out] Number of entries in the scatterlist
+ *
+ * Returns: A pointer to an initialized scatterlist or %NULL upon failure.
+ */
+struct scatterlist *sgl_alloc(unsigned long long length, gfp_t gfp,
+ unsigned int *nent_p)
+{
+   return sgl_alloc_order(length, 0, false, gfp, nent_p);
+}
+EXPORT_SYMBOL(sgl_alloc);
+
+/**
+ * sgl_free_order - free a scatterlist and its pages
+ * @sgl: Scatterlist with one or more elements
+ * @order: Second argument for __free_pages()
+ */

[PATCH v3 8/8] scsi/pmcraid: Use sgl_alloc_order() and sgl_free_order()

2017-11-03 Thread Bart Van Assche
Use the sgl_alloc_order() and sgl_free_order() functions instead
of open coding these functions.

Signed-off-by: Bart Van Assche 
Reviewed-by: Johannes Thumshirn 
Cc: linux-s...@vger.kernel.org
Cc: Martin K. Petersen 
Cc: Anil Ravindranath 
---
 drivers/scsi/Kconfig   |  1 +
 drivers/scsi/pmcraid.c | 43 ---
 drivers/scsi/pmcraid.h |  2 +-
 3 files changed, 6 insertions(+), 40 deletions(-)

diff --git a/drivers/scsi/Kconfig b/drivers/scsi/Kconfig
index d918115afdca..04400bed39a1 100644
--- a/drivers/scsi/Kconfig
+++ b/drivers/scsi/Kconfig
@@ -1577,6 +1577,7 @@ config ZFCP
 config SCSI_PMCRAID
tristate "PMC SIERRA Linux MaxRAID adapter support"
depends on PCI && SCSI && NET
+   select SGL_ALLOC
---help---
  This driver supports the PMC SIERRA MaxRAID adapters.
 
diff --git a/drivers/scsi/pmcraid.c b/drivers/scsi/pmcraid.c
index b4d6cd8cd1ad..95fc0352f9bb 100644
--- a/drivers/scsi/pmcraid.c
+++ b/drivers/scsi/pmcraid.c
@@ -3232,12 +3232,7 @@ static int pmcraid_build_ioadl(
  */
 static void pmcraid_free_sglist(struct pmcraid_sglist *sglist)
 {
-   int i;
-
-   for (i = 0; i < sglist->num_sg; i++)
-   __free_pages(sg_page(&(sglist->scatterlist[i])),
-sglist->order);
-
+   sgl_free_order(sglist->scatterlist, sglist->order);
kfree(sglist);
 }
 
@@ -3254,50 +3249,20 @@ static void pmcraid_free_sglist(struct pmcraid_sglist 
*sglist)
 static struct pmcraid_sglist *pmcraid_alloc_sglist(int buflen)
 {
struct pmcraid_sglist *sglist;
-   struct scatterlist *scatterlist;
-   struct page *page;
-   int num_elem, i, j;
int sg_size;
int order;
-   int bsize_elem;
 
sg_size = buflen / (PMCRAID_MAX_IOADLS - 1);
order = (sg_size > 0) ? get_order(sg_size) : 0;
-   bsize_elem = PAGE_SIZE * (1 << order);
-
-   /* Determine the actual number of sg entries needed */
-   if (buflen % bsize_elem)
-   num_elem = (buflen / bsize_elem) + 1;
-   else
-   num_elem = buflen / bsize_elem;
 
/* Allocate a scatter/gather list for the DMA */
-   sglist = kzalloc(sizeof(struct pmcraid_sglist) +
-(sizeof(struct scatterlist) * (num_elem - 1)),
-GFP_KERNEL);
-
+   sglist = kzalloc(sizeof(struct pmcraid_sglist), GFP_KERNEL);
if (sglist == NULL)
return NULL;
 
-   scatterlist = sglist->scatterlist;
-   sg_init_table(scatterlist, num_elem);
sglist->order = order;
-   sglist->num_sg = num_elem;
-   sg_size = buflen;
-
-   for (i = 0; i < num_elem; i++) {
-   page = alloc_pages(GFP_KERNEL|GFP_DMA|__GFP_ZERO, order);
-   if (!page) {
-   for (j = i - 1; j >= 0; j--)
-   __free_pages(sg_page([j]), order);
-   kfree(sglist);
-   return NULL;
-   }
-
-   sg_set_page([i], page,
-   sg_size < bsize_elem ? sg_size : bsize_elem, 0);
-   sg_size -= bsize_elem;
-   }
+   sgl_alloc_order(buflen, order, false,
+   GFP_KERNEL | GFP_DMA | __GFP_ZERO, >num_sg);
 
return sglist;
 }
diff --git a/drivers/scsi/pmcraid.h b/drivers/scsi/pmcraid.h
index 44da91712115..754ef30927e2 100644
--- a/drivers/scsi/pmcraid.h
+++ b/drivers/scsi/pmcraid.h
@@ -542,7 +542,7 @@ struct pmcraid_sglist {
u32 order;
u32 num_sg;
u32 num_dma_sg;
-   struct scatterlist scatterlist[1];
+   struct scatterlist *scatterlist;
 };
 
 /* page D0 inquiry data of focal point resource */
-- 
2.14.3



[PATCH v3 7/8] scsi/pmcraid: Remove an unused structure member

2017-11-03 Thread Bart Van Assche
Signed-off-by: Bart Van Assche 
Reviewed-by: Johannes Thumshirn 
Cc: linux-s...@vger.kernel.org
Cc: Martin K. Petersen 
Cc: Anil Ravindranath 
---
 drivers/scsi/pmcraid.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/scsi/pmcraid.h b/drivers/scsi/pmcraid.h
index 8bfac72a242b..44da91712115 100644
--- a/drivers/scsi/pmcraid.h
+++ b/drivers/scsi/pmcraid.h
@@ -542,7 +542,6 @@ struct pmcraid_sglist {
u32 order;
u32 num_sg;
u32 num_dma_sg;
-   u32 buffer_len;
struct scatterlist scatterlist[1];
 };
 
-- 
2.14.3



[PATCH v3 6/8] scsi/ipr: Use sgl_alloc_order() and sgl_free_order()

2017-11-03 Thread Bart Van Assche
Use the sgl_alloc_order() and sgl_free_order() functions instead
of open coding these functions.

Signed-off-by: Bart Van Assche 
Acked-by: Brian King 
Reviewed-by: Johannes Thumshirn 
Reviewed-by: Hannes Reinecke 
Cc: Martin K. Petersen 
Cc: linux-s...@vger.kernel.org
---
 drivers/scsi/Kconfig |  1 +
 drivers/scsi/ipr.c   | 49 -
 drivers/scsi/ipr.h   |  2 +-
 3 files changed, 10 insertions(+), 42 deletions(-)

diff --git a/drivers/scsi/Kconfig b/drivers/scsi/Kconfig
index 766955318005..d918115afdca 100644
--- a/drivers/scsi/Kconfig
+++ b/drivers/scsi/Kconfig
@@ -1059,6 +1059,7 @@ config SCSI_IPR
depends on PCI && SCSI && ATA
select FW_LOADER
select IRQ_POLL
+   select SGL_ALLOC
---help---
  This driver supports the IBM Power Linux family RAID adapters.
  This includes IBM pSeries 5712, 5703, 5709, and 570A, as well
diff --git a/drivers/scsi/ipr.c b/drivers/scsi/ipr.c
index f838bd73befa..6fed5db6255e 100644
--- a/drivers/scsi/ipr.c
+++ b/drivers/scsi/ipr.c
@@ -3815,10 +3815,8 @@ static struct device_attribute ipr_iopoll_weight_attr = {
  **/
 static struct ipr_sglist *ipr_alloc_ucode_buffer(int buf_len)
 {
-   int sg_size, order, bsize_elem, num_elem, i, j;
+   int sg_size, order;
struct ipr_sglist *sglist;
-   struct scatterlist *scatterlist;
-   struct page *page;
 
/* Get the minimum size per scatter/gather element */
sg_size = buf_len / (IPR_MAX_SGLIST - 1);
@@ -3826,45 +3824,18 @@ static struct ipr_sglist *ipr_alloc_ucode_buffer(int 
buf_len)
/* Get the actual size per element */
order = get_order(sg_size);
 
-   /* Determine the actual number of bytes per element */
-   bsize_elem = PAGE_SIZE * (1 << order);
-
-   /* Determine the actual number of sg entries needed */
-   if (buf_len % bsize_elem)
-   num_elem = (buf_len / bsize_elem) + 1;
-   else
-   num_elem = buf_len / bsize_elem;
-
/* Allocate a scatter/gather list for the DMA */
-   sglist = kzalloc(sizeof(struct ipr_sglist) +
-(sizeof(struct scatterlist) * (num_elem - 1)),
-GFP_KERNEL);
-
+   sglist = kzalloc(sizeof(struct ipr_sglist), GFP_KERNEL);
if (sglist == NULL) {
ipr_trace;
return NULL;
}
-
-   scatterlist = sglist->scatterlist;
-   sg_init_table(scatterlist, num_elem);
-
sglist->order = order;
-   sglist->num_sg = num_elem;
-
-   /* Allocate a bunch of sg elements */
-   for (i = 0; i < num_elem; i++) {
-   page = alloc_pages(GFP_KERNEL, order);
-   if (!page) {
-   ipr_trace;
-
-   /* Free up what we already allocated */
-   for (j = i - 1; j >= 0; j--)
-   __free_pages(sg_page([j]), order);
-   kfree(sglist);
-   return NULL;
-   }
-
-   sg_set_page([i], page, 0, 0);
+   sglist->scatterlist = sgl_alloc_order(buf_len, order, false, GFP_KERNEL,
+ >num_sg);
+   if (!sglist->scatterlist) {
+   kfree(sglist);
+   return NULL;
}
 
return sglist;
@@ -3882,11 +3853,7 @@ static struct ipr_sglist *ipr_alloc_ucode_buffer(int 
buf_len)
  **/
 static void ipr_free_ucode_buffer(struct ipr_sglist *sglist)
 {
-   int i;
-
-   for (i = 0; i < sglist->num_sg; i++)
-   __free_pages(sg_page(>scatterlist[i]), sglist->order);
-
+   sgl_free_order(sglist->scatterlist, sglist->order);
kfree(sglist);
 }
 
diff --git a/drivers/scsi/ipr.h b/drivers/scsi/ipr.h
index c7f0e9e3cd7d..93570734cbfb 100644
--- a/drivers/scsi/ipr.h
+++ b/drivers/scsi/ipr.h
@@ -1454,7 +1454,7 @@ struct ipr_sglist {
u32 num_sg;
u32 num_dma_sg;
u32 buffer_len;
-   struct scatterlist scatterlist[1];
+   struct scatterlist *scatterlist;
 };
 
 enum ipr_sdt_state {
-- 
2.14.3



[PATCH v3 4/8] nvmet/rdma: Use sgl_alloc() and sgl_free()

2017-11-03 Thread Bart Van Assche
Use the sgl_alloc() and sgl_free() functions instead of open coding
these functions.

Signed-off-by: Bart Van Assche 
Reviewed-by: Johannes Thumshirn 
Reviewed-by: Hannes Reinecke 
Cc: Keith Busch 
Cc: Christoph Hellwig 
Cc: Sagi Grimberg 
---
 drivers/nvme/target/Kconfig |  1 +
 drivers/nvme/target/rdma.c  | 63 +++--
 2 files changed, 5 insertions(+), 59 deletions(-)

diff --git a/drivers/nvme/target/Kconfig b/drivers/nvme/target/Kconfig
index 4d9715630e21..5f4f8b16685f 100644
--- a/drivers/nvme/target/Kconfig
+++ b/drivers/nvme/target/Kconfig
@@ -29,6 +29,7 @@ config NVME_TARGET_RDMA
tristate "NVMe over Fabrics RDMA target support"
depends on INFINIBAND
depends on NVME_TARGET
+   select SGL_ALLOC
help
  This enables the NVMe RDMA target support, which allows exporting NVMe
  devices over RDMA.
diff --git a/drivers/nvme/target/rdma.c b/drivers/nvme/target/rdma.c
index 76d2bb793afe..363d44c10b68 100644
--- a/drivers/nvme/target/rdma.c
+++ b/drivers/nvme/target/rdma.c
@@ -185,59 +185,6 @@ nvmet_rdma_put_rsp(struct nvmet_rdma_rsp *rsp)
spin_unlock_irqrestore(>queue->rsps_lock, flags);
 }
 
-static void nvmet_rdma_free_sgl(struct scatterlist *sgl, unsigned int nents)
-{
-   struct scatterlist *sg;
-   int count;
-
-   if (!sgl || !nents)
-   return;
-
-   for_each_sg(sgl, sg, nents, count)
-   __free_page(sg_page(sg));
-   kfree(sgl);
-}
-
-static int nvmet_rdma_alloc_sgl(struct scatterlist **sgl, unsigned int *nents,
-   u32 length)
-{
-   struct scatterlist *sg;
-   struct page *page;
-   unsigned int nent;
-   int i = 0;
-
-   nent = DIV_ROUND_UP(length, PAGE_SIZE);
-   sg = kmalloc_array(nent, sizeof(struct scatterlist), GFP_KERNEL);
-   if (!sg)
-   goto out;
-
-   sg_init_table(sg, nent);
-
-   while (length) {
-   u32 page_len = min_t(u32, length, PAGE_SIZE);
-
-   page = alloc_page(GFP_KERNEL);
-   if (!page)
-   goto out_free_pages;
-
-   sg_set_page([i], page, page_len, 0);
-   length -= page_len;
-   i++;
-   }
-   *sgl = sg;
-   *nents = nent;
-   return 0;
-
-out_free_pages:
-   while (i > 0) {
-   i--;
-   __free_page(sg_page([i]));
-   }
-   kfree(sg);
-out:
-   return NVME_SC_INTERNAL;
-}
-
 static int nvmet_rdma_alloc_cmd(struct nvmet_rdma_device *ndev,
struct nvmet_rdma_cmd *c, bool admin)
 {
@@ -484,7 +431,7 @@ static void nvmet_rdma_release_rsp(struct nvmet_rdma_rsp 
*rsp)
}
 
if (rsp->req.sg != >cmd->inline_sg)
-   nvmet_rdma_free_sgl(rsp->req.sg, rsp->req.sg_cnt);
+   sgl_free(rsp->req.sg);
 
if (unlikely(!list_empty_careful(>rsp_wr_wait_list)))
nvmet_rdma_process_wr_wait_list(queue);
@@ -620,16 +567,14 @@ static u16 nvmet_rdma_map_sgl_keyed(struct nvmet_rdma_rsp 
*rsp,
u32 len = get_unaligned_le24(sgl->length);
u32 key = get_unaligned_le32(sgl->key);
int ret;
-   u16 status;
 
/* no data command? */
if (!len)
return 0;
 
-   status = nvmet_rdma_alloc_sgl(>req.sg, >req.sg_cnt,
-   len);
-   if (status)
-   return status;
+   rsp->req.sg = sgl_alloc(len, GFP_KERNEL, >req.sg_cnt);
+   if (!rsp->req.sg)
+   return NVME_SC_INTERNAL;
 
ret = rdma_rw_ctx_init(>rw, cm_id->qp, cm_id->port_num,
rsp->req.sg, rsp->req.sg_cnt, 0, addr, key,
-- 
2.14.3



[PATCH v3 2/8] crypto: scompress - use sgl_alloc() and sgl_free()

2017-11-03 Thread Bart Van Assche
Use the sgl_alloc() and sgl_free() functions instead of open coding
these functions.

Signed-off-by: Bart Van Assche 
Cc: Ard Biesheuvel 
Cc: Herbert Xu 
---
 crypto/Kconfig |  1 +
 crypto/scompress.c | 51 ++-
 2 files changed, 3 insertions(+), 49 deletions(-)

diff --git a/crypto/Kconfig b/crypto/Kconfig
index 0a121f9ddf8e..a0667dd284ff 100644
--- a/crypto/Kconfig
+++ b/crypto/Kconfig
@@ -105,6 +105,7 @@ config CRYPTO_KPP
 config CRYPTO_ACOMP2
tristate
select CRYPTO_ALGAPI2
+   select SGL_ALLOC
 
 config CRYPTO_ACOMP
tristate
diff --git a/crypto/scompress.c b/crypto/scompress.c
index 2075e2c4e7df..968bbcf65c94 100644
--- a/crypto/scompress.c
+++ b/crypto/scompress.c
@@ -140,53 +140,6 @@ static int crypto_scomp_init_tfm(struct crypto_tfm *tfm)
return ret;
 }
 
-static void crypto_scomp_sg_free(struct scatterlist *sgl)
-{
-   int i, n;
-   struct page *page;
-
-   if (!sgl)
-   return;
-
-   n = sg_nents(sgl);
-   for_each_sg(sgl, sgl, n, i) {
-   page = sg_page(sgl);
-   if (page)
-   __free_page(page);
-   }
-
-   kfree(sgl);
-}
-
-static struct scatterlist *crypto_scomp_sg_alloc(size_t size, gfp_t gfp)
-{
-   struct scatterlist *sgl;
-   struct page *page;
-   int i, n;
-
-   n = ((size - 1) >> PAGE_SHIFT) + 1;
-
-   sgl = kmalloc_array(n, sizeof(struct scatterlist), gfp);
-   if (!sgl)
-   return NULL;
-
-   sg_init_table(sgl, n);
-
-   for (i = 0; i < n; i++) {
-   page = alloc_page(gfp);
-   if (!page)
-   goto err;
-   sg_set_page(sgl + i, page, PAGE_SIZE, 0);
-   }
-
-   return sgl;
-
-err:
-   sg_mark_end(sgl + i);
-   crypto_scomp_sg_free(sgl);
-   return NULL;
-}
-
 static int scomp_acomp_comp_decomp(struct acomp_req *req, int dir)
 {
struct crypto_acomp *tfm = crypto_acomp_reqtfm(req);
@@ -220,7 +173,7 @@ static int scomp_acomp_comp_decomp(struct acomp_req *req, 
int dir)
  scratch_dst, >dlen, *ctx);
if (!ret) {
if (!req->dst) {
-   req->dst = crypto_scomp_sg_alloc(req->dlen, GFP_ATOMIC);
+   req->dst = sgl_alloc(req->dlen, GFP_ATOMIC, NULL);
if (!req->dst)
goto out;
}
@@ -274,7 +227,7 @@ int crypto_init_scomp_ops_async(struct crypto_tfm *tfm)
 
crt->compress = scomp_acomp_compress;
crt->decompress = scomp_acomp_decompress;
-   crt->dst_free = crypto_scomp_sg_free;
+   crt->dst_free = sgl_free;
crt->reqsize = sizeof(void *);
 
return 0;
-- 
2.14.3



[PATCH v3 5/8] target: Use sgl_alloc_order() and sgl_free()

2017-11-03 Thread Bart Van Assche
Use the sgl_alloc_order() and sgl_free() functions instead of open
coding these functions.

Signed-off-by: Bart Van Assche 
Cc: Nicholas A. Bellinger 
Cc: Christoph Hellwig 
Cc: Hannes Reinecke 
Cc: Sagi Grimberg 
---
 drivers/target/Kconfig |  1 +
 drivers/target/target_core_transport.c | 46 +++---
 2 files changed, 5 insertions(+), 42 deletions(-)

diff --git a/drivers/target/Kconfig b/drivers/target/Kconfig
index e2bc99980f75..4c44d7bed01a 100644
--- a/drivers/target/Kconfig
+++ b/drivers/target/Kconfig
@@ -5,6 +5,7 @@ menuconfig TARGET_CORE
select CONFIGFS_FS
select CRC_T10DIF
select BLK_SCSI_REQUEST # only for scsi_command_size_tbl..
+   select SGL_ALLOC
default n
help
Say Y or M here to enable the TCM Storage Engine and ConfigFS enabled
diff --git a/drivers/target/target_core_transport.c 
b/drivers/target/target_core_transport.c
index 836d552b0385..9bbd08be9d60 100644
--- a/drivers/target/target_core_transport.c
+++ b/drivers/target/target_core_transport.c
@@ -2293,13 +2293,7 @@ static void target_complete_ok_work(struct work_struct 
*work)
 
 void target_free_sgl(struct scatterlist *sgl, int nents)
 {
-   struct scatterlist *sg;
-   int count;
-
-   for_each_sg(sgl, sg, nents, count)
-   __free_page(sg_page(sg));
-
-   kfree(sgl);
+   sgl_free(sgl);
 }
 EXPORT_SYMBOL(target_free_sgl);
 
@@ -2423,42 +2417,10 @@ int
 target_alloc_sgl(struct scatterlist **sgl, unsigned int *nents, u32 length,
 bool zero_page, bool chainable)
 {
-   struct scatterlist *sg;
-   struct page *page;
-   gfp_t zero_flag = (zero_page) ? __GFP_ZERO : 0;
-   unsigned int nalloc, nent;
-   int i = 0;
-
-   nalloc = nent = DIV_ROUND_UP(length, PAGE_SIZE);
-   if (chainable)
-   nalloc++;
-   sg = kmalloc_array(nalloc, sizeof(struct scatterlist), GFP_KERNEL);
-   if (!sg)
-   return -ENOMEM;
+   gfp_t gfp = GFP_KERNEL | (zero_page ? __GFP_ZERO : 0);
 
-   sg_init_table(sg, nalloc);
-
-   while (length) {
-   u32 page_len = min_t(u32, length, PAGE_SIZE);
-   page = alloc_page(GFP_KERNEL | zero_flag);
-   if (!page)
-   goto out;
-
-   sg_set_page([i], page, page_len, 0);
-   length -= page_len;
-   i++;
-   }
-   *sgl = sg;
-   *nents = nent;
-   return 0;
-
-out:
-   while (i > 0) {
-   i--;
-   __free_page(sg_page([i]));
-   }
-   kfree(sg);
-   return -ENOMEM;
+   *sgl = sgl_alloc_order(length, 0, chainable, gfp, nents);
+   return *sgl ? 0 : -ENOMEM;
 }
 EXPORT_SYMBOL(target_alloc_sgl);
 
-- 
2.14.3



Re: [PATCH v2] blk-mq: Make blk_mq_get_request() error path less confusing

2017-11-03 Thread Jens Axboe
On 11/03/2017 12:31 PM, Bart Van Assche wrote:
> On Mon, 2017-10-16 at 16:32 -0700, Bart Van Assche wrote:
>> blk_mq_get_tag() can modify data->ctx. This means that in the
>> error path of blk_mq_get_request() data->ctx should be passed to
>> blk_mq_put_ctx() instead of local_ctx. Note: since blk_mq_put_ctx()
>> ignores its argument, this patch does not change any functionality.
> 
> Hello Jens,
> 
> Would it be possible to share your opinion about this patch?

Yeah looks fine, I queued it up.

-- 
Jens Axboe



Re: [PATCH v2] blk-mq: Make blk_mq_get_request() error path less confusing

2017-11-03 Thread Bart Van Assche
On Mon, 2017-10-16 at 16:32 -0700, Bart Van Assche wrote:
> blk_mq_get_tag() can modify data->ctx. This means that in the
> error path of blk_mq_get_request() data->ctx should be passed to
> blk_mq_put_ctx() instead of local_ctx. Note: since blk_mq_put_ctx()
> ignores its argument, this patch does not change any functionality.

Hello Jens,

Would it be possible to share your opinion about this patch?

Thanks,

Bart.

[PATCH RESEND] badblocks: fix wrong return value in badblocks_set if badblocks are disabled

2017-11-03 Thread Liu Bo
MD's rdev_set_badblocks() expects that badblocks_set() returns 1 if
badblocks are disabled, otherwise, rdev_set_badblocks() will record
superblock changes and return success in that case and md will fail to
report an IO error which it should.

This bug has existed since badblocks were introduced in commit
9e0e252a048b ("badblocks: Add core badblock management code").

Signed-off-by: Liu Bo 
Acked-by: Guoqing Jiang 
---
 block/badblocks.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/badblocks.c b/block/badblocks.c
index 43c7116..91f7bcf 100644
--- a/block/badblocks.c
+++ b/block/badblocks.c
@@ -178,7 +178,7 @@ int badblocks_set(struct badblocks *bb, sector_t s, int 
sectors,
 
if (bb->shift < 0)
/* badblocks are disabled */
-   return 0;
+   return 1;
 
if (bb->shift) {
/* round the start down, and the end up */
-- 
2.9.4



Re: [PATCH v4] blk-mq: fix nr_requests wrong value when modify it from sysfs

2017-11-03 Thread Jens Axboe
On 09/22/2017 09:36 AM, weiping zhang wrote:
> if blk-mq use "none" io scheduler, nr_request get a wrong value when
> input a number > tag_set->queue_depth. blk_mq_tag_update_depth will get
> the smaller one min(nr, set->queue_depth), and then q->nr_request get a
> wrong value.
> 
> Reproduce:
> 
> echo none > /sys/block/nvme0n1/queue/scheduler
> echo 100 > /sys/block/nvme0n1/queue/nr_requests
> cat /sys/block/nvme0n1/queue/nr_requests
> 100

Sorry for the delay - looks good to me, and also tested that it does
fix the issue. Thanks, applied for 4.15.

-- 
Jens Axboe



Re: [PATCH v4] blk-mq: fix nr_requests wrong value when modify it from sysfs

2017-11-03 Thread weiping zhang
On Fri, Sep 22, 2017 at 11:36:28PM +0800, weiping zhang wrote:
> if blk-mq use "none" io scheduler, nr_request get a wrong value when
> input a number > tag_set->queue_depth. blk_mq_tag_update_depth will get
> the smaller one min(nr, set->queue_depth), and then q->nr_request get a
> wrong value.
> 
> Reproduce:
> 
> echo none > /sys/block/nvme0n1/queue/scheduler
> echo 100 > /sys/block/nvme0n1/queue/nr_requests
> cat /sys/block/nvme0n1/queue/nr_requests
> 100
> 
> Signed-off-by: weiping zhang 
> ---
> 
> Changes since v4:
>  * fix typo in commit message(queue/ioscheduler => queue/scheduler)
> 
> Changes since v3:
>  * remove compare nr with tags->qdepth, pass nr to blk_mq_tag_update_depth
> directly
> 
>  * remove return EINVAL when user modify nr_request less than BLKDEV_MIN_RQ
> 
> Changes since v2:
>  * add return EINVAL when user modify nr_request less than BLKDEV_MIN_RQ
>  * remove pr_warn, and return EINVAL, if input number is too large
> 
>  block/blk-mq.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 98a1860..491e336 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -2642,8 +2642,7 @@ int blk_mq_update_nr_requests(struct request_queue *q, 
> unsigned int nr)
>* queue depth. This is similar to what the old code would do.
>*/
>   if (!hctx->sched_tags) {
> - ret = blk_mq_tag_update_depth(hctx, >tags,
> - min(nr, 
> set->queue_depth),
> + ret = blk_mq_tag_update_depth(hctx, >tags, nr,
>   false);
>   } else {
>   ret = blk_mq_tag_update_depth(hctx, >sched_tags,
> -- 
> 2.9.4
> 
Hello Jens,

ping


Re: [PATCH] badblocks: fix wrong return value in badblocks_set if badblocks are disabled

2017-11-03 Thread Shaohua Li
On Fri, Nov 03, 2017 at 10:13:38AM -0600, Liu Bo wrote:
> Hi Shaohua,
> 
> Given it's related to md, can you please take this thru your tree?

Yes, the patch makes sense. Can you resend the patch to me? I can't find it in 
my inbox

Thanks,
Shaohua
 
> Thanks,
> 
> -liubo
> 
> On Wed, Sep 27, 2017 at 04:13:17PM -0600, Liu Bo wrote:
> > MD's rdev_set_badblocks() expects that badblocks_set() returns 1 if
> > badblocks are disabled, otherwise, rdev_set_badblocks() will record
> > superblock changes and return success in that case and md will fail to
> > report an IO error which it should.
> > 
> > This bug has existed since badblocks were introduced in commit
> > 9e0e252a048b ("badblocks: Add core badblock management code").
> > 
> > Signed-off-by: Liu Bo 
> > ---
> >  block/badblocks.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/block/badblocks.c b/block/badblocks.c
> > index 43c7116..91f7bcf 100644
> > --- a/block/badblocks.c
> > +++ b/block/badblocks.c
> > @@ -178,7 +178,7 @@ int badblocks_set(struct badblocks *bb, sector_t s, int 
> > sectors,
> >  
> > if (bb->shift < 0)
> > /* badblocks are disabled */
> > -   return 0;
> > +   return 1;
> >  
> > if (bb->shift) {
> > /* round the start down, and the end up */
> > -- 
> > 2.9.4
> > 


Re: [PATCH] badblocks: fix wrong return value in badblocks_set if badblocks are disabled

2017-11-03 Thread Liu Bo
Hi Shaohua,

Given it's related to md, can you please take this thru your tree?

Thanks,

-liubo

On Wed, Sep 27, 2017 at 04:13:17PM -0600, Liu Bo wrote:
> MD's rdev_set_badblocks() expects that badblocks_set() returns 1 if
> badblocks are disabled, otherwise, rdev_set_badblocks() will record
> superblock changes and return success in that case and md will fail to
> report an IO error which it should.
> 
> This bug has existed since badblocks were introduced in commit
> 9e0e252a048b ("badblocks: Add core badblock management code").
> 
> Signed-off-by: Liu Bo 
> ---
>  block/badblocks.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/block/badblocks.c b/block/badblocks.c
> index 43c7116..91f7bcf 100644
> --- a/block/badblocks.c
> +++ b/block/badblocks.c
> @@ -178,7 +178,7 @@ int badblocks_set(struct badblocks *bb, sector_t s, int 
> sectors,
>  
>   if (bb->shift < 0)
>   /* badblocks are disabled */
> - return 0;
> + return 1;
>  
>   if (bb->shift) {
>   /* round the start down, and the end up */
> -- 
> 2.9.4
> 


Re: [PATCH V2 0/2] block: remove unnecessary RESTART

2017-11-03 Thread Ming Lei
On Fri, Nov 03, 2017 at 12:13:09PM -0400, Laurence Oberman wrote:
> Hi
> I had it working some time back. I am off today to take my son to the
> doctor.
> I will get Bart's test working again this weekend.

Hello Laurence and Bart,

Just found srp-test starts to work now with v4.14-rc4 kernel, and
the IO hang issue has been reproduced one time. BTW, the connection
failure happened on for-next of block tree.

Will investigate the issue this weekend.

Thanks,
Ming

> 
> On Nov 3, 2017 11:51 AM, "Bart Van Assche"  wrote:
> 
> > On Fri, 2017-11-03 at 23:47 +0800, Ming Lei wrote:
> > > Forget to mention, there is failure when running 'make' under srp-test
> > > because shellcheck package is missed in RHEL7. Can that be the issue
> > > of test failure? If yes, could you provide a special version of srp-test
> > > which doesn't depend on shellcheck?
> >
> > The srp-test software does not depend on shellcheck. The only purpose of
> > the "shellcheck" target in the srp-test Makefile is to verify the syntax
> > of the shell scripts. The tests run fine even if shellcheck is missing.
> >
> > Bart.

-- 
Ming


Re: block layer patches for nvme multipath support

2017-11-03 Thread Jens Axboe
On 11/02/2017 12:29 PM, Christoph Hellwig wrote:
> Hi Jens,
> 
> these patches add the block layer helpers / tweaks for NVMe multipath
> support.  Can you review them for inclusion?
> 
> There have been no functional changes to the versions posted with
> previous nvme multipath patchset.

I've queued this up. I really hate the direct_make_request(), but
I also don't have a good suggestion on how to do this any better
right now.

-- 
Jens Axboe



Re: [GIT PULL] nvme updates for Linux 4.15

2017-11-03 Thread Jens Axboe
On 11/03/2017 06:17 AM, Christoph Hellwig wrote:
> Hi Jens,
> 
> below are the currently queue nvme updates for Linux 4.15.  There are
> a few more things that could make it for this merge window, but I'd
> like to get things into linux-next, especially for the unlikely case
> that Linus decided to cut -rc8.
> 
> Highlights:
>  - support for SGLs in the PCIe driver (Chaitanya Kulkarni)
>  - disable I/O schedulers for the admin queue (Israel Rukshin)
>  - various Fibre Channel fixes and enhancements (James Smart)
>  - various refactoring for better code sharing between transports
>(Sagi Grimberg and me)
> 
> as well as lots of little bits from various contributors.

Pulled, thanks.

-- 
Jens Axboe



Re: [PATCH V2 0/2] block: remove unnecessary RESTART

2017-11-03 Thread Bart Van Assche
On Fri, 2017-11-03 at 23:47 +0800, Ming Lei wrote:
> Forget to mention, there is failure when running 'make' under srp-test
> because shellcheck package is missed in RHEL7. Can that be the issue
> of test failure? If yes, could you provide a special version of srp-test
> which doesn't depend on shellcheck?

The srp-test software does not depend on shellcheck. The only purpose of
the "shellcheck" target in the srp-test Makefile is to verify the syntax
of the shell scripts. The tests run fine even if shellcheck is missing.

Bart.

Re: [PATCH V2 0/2] block: remove unnecessary RESTART

2017-11-03 Thread Bart Van Assche
On Fri, 2017-11-03 at 23:18 +0800, Ming Lei wrote:
> BTW, Laurence found there is kernel crash in his IB/SRP test when running
> for-next branch of block tree, so we just test v4.14-rc4 w/wo my blk-mq 
> patches.

One fix for a *sporadic* initiator crash has been queued for the v4.15 merge
window. If you hit another crash, please let me know. See also
https://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma.git/commit/?h=k.o/for-next=8a0d18c62121d3c554a83eb96e2752861d84d937.

Bart.

Re: [PATCH V2 0/2] block: remove unnecessary RESTART

2017-11-03 Thread Ming Lei
On Fri, Nov 03, 2017 at 03:23:14PM +, Bart Van Assche wrote:
> On Fri, 2017-11-03 at 11:50 +0800, Ming Lei wrote:
> > On Fri, Nov 03, 2017 at 02:42:50AM +, Bart Van Assche wrote:
> > > On Fri, 2017-11-03 at 10:12 +0800, Ming Lei wrote:
> > > > [root@ibclient srp-test]# ./run_tests
> > > > modprobe: FATAL: Module target_core_mod is in use.
> > > 
> > > LIO must be unloaded before srp-test software is started.
> > 
> > Yeah, I can make this test kick off after running
> > 'modprobe -fr ib_isert' first, but looks all tests failed without
> > any hang, could you check if that is the expected result?
> > 
> > https://pastebin.com/VXe66Jpg
> 
> I see plenty of "... >/sys/.../add_target failed: Connection timed out"
> messages. The srp-test scripts use the loopback functionality of IB ports
> so one or more IB ports must be active. Have you already checked the IB
> ports state, e.g. by running ibv_devinfo?

[root@ibclient ~]# ibv_devinfo
hca_id: mlx5_1
transport:  InfiniBand (0)
fw_ver: 12.20.1010
node_guid:  7cfe:9003:0072:6ed3
sys_image_guid: 7cfe:9003:0072:6ed2
vendor_id:  0x02c9
vendor_part_id: 4115
hw_ver: 0x0
board_id:   MT_2190110032
phys_port_cnt:  1
port:   1
state:  PORT_ACTIVE (4)
max_mtu:4096 (5)
active_mtu: 4096 (5)
sm_lid: 1
port_lid:   1
port_lmc:   0x00
link_layer: InfiniBand

hca_id: mlx5_0
transport:  InfiniBand (0)
fw_ver: 12.20.1010
node_guid:  7cfe:9003:0072:6ed2
sys_image_guid: 7cfe:9003:0072:6ed2
vendor_id:  0x02c9
vendor_part_id: 4115
hw_ver: 0x0
board_id:   MT_2190110032
phys_port_cnt:  1
port:   1
state:  PORT_ACTIVE (4)
max_mtu:4096 (5)
active_mtu: 4096 (5)
sm_lid: 4
port_lid:   4
port_lmc:   0x00
link_layer: InfiniBand

Forget to mention, there is failure when running 'make' under srp-test
because shellcheck package is missed in RHEL7. Can that be the issue
of test failure? If yes, could you provide a special version of srp-test
which doesn't depend on shellcheck?

> 
> > Also could you provide some output from debugfs when this hang happens?
> > such as:
> > 
> > dispatch/busy/.. under hctxN
> 
> This is the output that appears on my test setup if the two patches from
> the patch series "block: remove unnecessary RESTART" are applied:
> 
> # ~bart/software/infiniband/srp-test/run_tests -f xfs -d -e deadline -r 60
>  
> Unloaded the ib_srpt kernel module
> Zero-initializing /dev/ram0 ... done
> Zero-initializing /dev/ram1 ... done
> Zero-initializing /dev/disk/by-id/scsi-04650 ... done
> brw-rw 1 root disk 1, 0 Nov  3 08:10 /dev/ram0
> /dev/ram0
> brw-rw 1 root disk 1, 1 Nov  3 08:10 /dev/ram1
> /dev/ram1
> lrwxrwxrwx 1 root root 9 Nov  3 08:11 /dev/disk/by-id/scsi-04650 
> -> ../../sdi
> /dev/sdi
> Configured SRP target driver
> Running test /home/bart/software/infiniband/srp-test/tests/01 ...
> id_ext=0x0002c90300a02d50,ioc_guid=0x0002c90300a02d50,dgid=fe82c90300a02d51,pkey=7fff,service_id=0x0002c90300a02d50,
>  >/sys/class/infiniband_srp/
> srp-mlx4_0-2/add_target failed: Invalid argument
> id_ext=0x0002c90300a02d50,ioc_guid=0x0002c90300a02d50,dgid=fe82c90300a02d52,pkey=7fff,service_id=0x0002c90300a02d50,
>  >/sys/class/infiniband_srp/
> srp-mlx4_0-2/add_target failed: Invalid argument
> count_luns(): 3 <> 3
> Error: unloading kernel module ib_srp failed
> Test /home/bart/software/infiniband/srp-test/tests/01 failed
> # cat /sys/module/scsi_mod/parameters/use_blk_mq  
> Y
> # find /sys/kernel/debug/block -name busy | xargs cat

Can you check 'dispatch' file, hctx state, tags and sched_tags?
And see where is the IO not dispatched?

> # dmesg -c >/dev/null; echo w > /proc/sysrq-trigger; dmesg
> [  709.321750] sysrq: SysRq : Show Blocked State
> [  709.327906]   taskPC stack   pid father
...
> [  709.456602] kworker/u66:6   D0   341  2 0x8000
> [  709.463371] Workqueue: events_unbound async_run_entry_fn
> [  

Re: [PATCH V2 0/2] block: remove unnecessary RESTART

2017-11-03 Thread Bart Van Assche
On Fri, 2017-11-03 at 23:18 +0800, Ming Lei wrote:
> Subject: [PATCH] SCSI_MQ: fix IO hang in case of queue busy
> 
> We have to insert the rq back before checking .device_busy,
> otherwise When IO completes just after the check and before
> this req is added to hctx->dispatch, this queue may never get
> chance to be run, then this IO may hang forever.
> 
> This patch introduces BLK_STS_RESOURCE_OK for handling this
> issue.
> 
> Signed-off-by: Ming Lei 
> ---
>  block/blk-mq.c| 17 +
>  drivers/scsi/scsi_lib.c   |  8 
>  include/linux/blk-mq.h|  1 +
>  include/linux/blk_types.h |  1 +
>  4 files changed, 27 insertions(+)
> 
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index e4d2490f4e7e..e1e03576edca 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -660,6 +660,16 @@ static void __blk_mq_requeue_request(struct request *rq)
>   }
>  }
>  
> +void blk_mq_reinsert_request_hctx(struct blk_mq_hw_ctx *hctx, struct request 
> *rq)
> +{
> + __blk_mq_requeue_request(rq);
> +
> + spin_lock(>lock);
> + list_add_tail(>queuelist, >dispatch);
> + spin_unlock(>lock);
> +}
> +EXPORT_SYMBOL(blk_mq_reinsert_request_hctx);
> +
>  void blk_mq_requeue_request(struct request *rq, bool kick_requeue_list)
>  {
>   __blk_mq_requeue_request(rq);
> @@ -1165,6 +1175,12 @@ bool blk_mq_dispatch_rq_list(struct request_queue *q, 
> struct list_head *list,
>   list_add(>queuelist, list);
>   __blk_mq_requeue_request(rq);
>   break;
> + } else if (ret == BLK_STS_RESOURCE_OK) {
> + /*
> +  * BLK_STS_RESOURCE_OK means driver handled this
> +  * STS_RESOURCE already, we just need to stop dispatch.
> +  */
> + break;
>   }
>  
>   fail_rq:
> @@ -1656,6 +1672,7 @@ static void __blk_mq_try_issue_directly(struct 
> blk_mq_hw_ctx *hctx,
>   ret = q->mq_ops->queue_rq(hctx, );
>   switch (ret) {
>   case BLK_STS_OK:
> + case BLK_STS_RESOURCE_OK:
>   *cookie = new_cookie;
>   return;
>   case BLK_STS_RESOURCE:
> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> index 7f218ef61900..0165c1caed82 100644
> --- a/drivers/scsi/scsi_lib.c
> +++ b/drivers/scsi/scsi_lib.c
> @@ -2030,9 +2030,17 @@ static blk_status_t scsi_queue_rq(struct blk_mq_hw_ctx 
> *hctx,
>   case BLK_STS_OK:
>   break;
>   case BLK_STS_RESOURCE:
> + /*
> +  * We have to insert the rq back before checking .device_busy,
> +  * otherwise when IO completes just after the check and before
> +  * this req is added to hctx->dispatch, this queue may never get
> +  * chance to be run, then this IO may hang forever.
> +  */
> + blk_mq_reinsert_request_hctx(hctx, req);
>   if (atomic_read(>device_busy) == 0 &&
>   !scsi_device_blocked(sdev))
>   blk_mq_delay_run_hw_queue(hctx, SCSI_QUEUE_DELAY);
> + ret = BLK_STS_RESOURCE_OK;
>   break;
>   default:
>   /*
> diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
> index e5e6becd57d3..4740f643d8c5 100644
> --- a/include/linux/blk-mq.h
> +++ b/include/linux/blk-mq.h
> @@ -244,6 +244,7 @@ void blk_mq_start_request(struct request *rq);
>  void blk_mq_end_request(struct request *rq, blk_status_t error);
>  void __blk_mq_end_request(struct request *rq, blk_status_t error);
>  
> +void blk_mq_reinsert_request_hctx(struct blk_mq_hw_ctx *hctx, struct request 
> *rq);
>  void blk_mq_requeue_request(struct request *rq, bool kick_requeue_list);
>  void blk_mq_add_to_requeue_list(struct request *rq, bool at_head,
>   bool kick_requeue_list);
> diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
> index 3385c89f402e..b630cc026a93 100644
> --- a/include/linux/blk_types.h
> +++ b/include/linux/blk_types.h
> @@ -32,6 +32,7 @@ typedef u8 __bitwise blk_status_t;
>  #define BLK_STS_PROTECTION   ((__force blk_status_t)8)
>  #define BLK_STS_RESOURCE ((__force blk_status_t)9)
>  #define BLK_STS_IOERR((__force blk_status_t)10)
> +#define BLK_STS_RESOURCE_OK  ((__force blk_status_t)11)
>  
>  /* hack for device mapper, don't use elsewhere: */
>  #define BLK_STS_DM_REQUEUE((__force blk_status_t)11)

Running test one with this patch applied on top of Jens' for-next branch yields
the same result as without this patch:
[ ... ]
Error: unloading kernel module ib_srp failed
Test /home/bart/software/infiniband/srp-test/tests/01 failed
# dmesg -c >/dev/null; echo w > /proc/sysrq-trigger; dmesg
[  325.167295] sysrq: SysRq : Show Blocked State
[  325.173612]   taskPC stack   pid father
[  325.181903] kworker/10:1D0   165  2 0x8000
[  325.189396] 

Re: [PATCH V2 0/2] block: remove unnecessary RESTART

2017-11-03 Thread Bart Van Assche
On Fri, 2017-11-03 at 11:50 +0800, Ming Lei wrote:
> On Fri, Nov 03, 2017 at 02:42:50AM +, Bart Van Assche wrote:
> > On Fri, 2017-11-03 at 10:12 +0800, Ming Lei wrote:
> > > [root@ibclient srp-test]# ./run_tests
> > > modprobe: FATAL: Module target_core_mod is in use.
> > 
> > LIO must be unloaded before srp-test software is started.
> 
> Yeah, I can make this test kick off after running
> 'modprobe -fr ib_isert' first, but looks all tests failed without
> any hang, could you check if that is the expected result?
> 
>   https://pastebin.com/VXe66Jpg

I see plenty of "... >/sys/.../add_target failed: Connection timed out"
messages. The srp-test scripts use the loopback functionality of IB ports
so one or more IB ports must be active. Have you already checked the IB
ports state, e.g. by running ibv_devinfo?

> Also could you provide some output from debugfs when this hang happens?
> such as:
> 
>   dispatch/busy/.. under hctxN

This is the output that appears on my test setup if the two patches from
the patch series "block: remove unnecessary RESTART" are applied:

# ~bart/software/infiniband/srp-test/run_tests -f xfs -d -e deadline -r 60  
   
Unloaded the ib_srpt kernel module
Zero-initializing /dev/ram0 ... done
Zero-initializing /dev/ram1 ... done
Zero-initializing /dev/disk/by-id/scsi-04650 ... done
brw-rw 1 root disk 1, 0 Nov  3 08:10 /dev/ram0
/dev/ram0
brw-rw 1 root disk 1, 1 Nov  3 08:10 /dev/ram1
/dev/ram1
lrwxrwxrwx 1 root root 9 Nov  3 08:11 /dev/disk/by-id/scsi-04650 -> 
../../sdi
/dev/sdi
Configured SRP target driver
Running test /home/bart/software/infiniband/srp-test/tests/01 ...
id_ext=0x0002c90300a02d50,ioc_guid=0x0002c90300a02d50,dgid=fe82c90300a02d51,pkey=7fff,service_id=0x0002c90300a02d50,
 >/sys/class/infiniband_srp/
srp-mlx4_0-2/add_target failed: Invalid argument
id_ext=0x0002c90300a02d50,ioc_guid=0x0002c90300a02d50,dgid=fe82c90300a02d52,pkey=7fff,service_id=0x0002c90300a02d50,
 >/sys/class/infiniband_srp/
srp-mlx4_0-2/add_target failed: Invalid argument
count_luns(): 3 <> 3
Error: unloading kernel module ib_srp failed
Test /home/bart/software/infiniband/srp-test/tests/01 failed
# cat /sys/module/scsi_mod/parameters/use_blk_mq  
Y
# find /sys/kernel/debug/block -name busy | xargs cat
# dmesg -c >/dev/null; echo w > /proc/sysrq-trigger; dmesg
[  709.321750] sysrq: SysRq : Show Blocked State
[  709.327906]   taskPC stack   pid father
[  709.335910] kworker/4:1 D054  2 0x8000
[  709.342887] Workqueue: srp_remove srp_remove_work [ib_srp]
[  709.349806] Call Trace:
[  709.353365]  __schedule+0x2fa/0xbb0
[  709.357865]  schedule+0x36/0x90
[  709.361985]  async_synchronize_cookie_domain+0x88/0x130
[  709.368422]  ? finish_wait+0x90/0x90
[  709.373033]  async_synchronize_full_domain+0x18/0x20
[  709.379184]  sd_remove+0x4d/0xc0 [sd_mod]
[  709.384282]  device_release_driver_internal+0x160/0x210
[  709.390752]  device_release_driver+0x12/0x20
[  709.396130]  bus_remove_device+0x100/0x180
[  709.401332]  device_del+0x1d8/0x340
[  709.405823]  __scsi_remove_device+0xfc/0x130
[  709.411221]  scsi_forget_host+0x25/0x70
[  709.416108]  scsi_remove_host+0x79/0x120
[  709.421114]  srp_remove_work+0x90/0x1d0 [ib_srp]
[  709.426867]  process_one_work+0x20a/0x660
[  709.431971]  worker_thread+0x3d/0x3b0
[  709.436655]  kthread+0x13a/0x150
[  709.440872]  ? process_one_work+0x660/0x660
[  709.446133]  ? kthread_create_on_node+0x40/0x40
[  709.451825]  ret_from_fork+0x27/0x40
[  709.456602] kworker/u66:6   D0   341  2 0x8000
[  709.463371] Workqueue: events_unbound async_run_entry_fn
[  709.469919] Call Trace:
[  709.473325]  __schedule+0x2fa/0xbb0
[  709.477850]  ? trace_hardirqs_on_caller+0xfb/0x190
[  709.483847]  schedule+0x36/0x90
[  709.487982]  schedule_timeout+0x22c/0x570
[  709.493115]  ? call_timer_fn+0x330/0x330
[  709.498133]  ? wait_for_completion_io_timeout+0xf7/0x180
[  709.504732]  io_schedule_timeout+0x1e/0x50
[  709.509938]  ? io_schedule_timeout+0x1e/0x50
[  709.515413]  wait_for_completion_io_timeout+0x11f/0x180
[  709.521920]  ? wake_up_q+0x80/0x80
[  709.526376]  blk_execute_rq+0x86/0xc0
[  709.531135]  scsi_execute+0xdb/0x1f0
[  709.536336]  sd_revalidate_disk+0xed/0x1c70 [sd_mod]
[  709.543281]  sd_probe_async+0xc3/0x1d0 [sd_mod]
[  709.549524]  async_run_entry_fn+0x38/0x160
[  709.03]  process_one_work+0x20a/0x660
[  709.561299]  worker_thread+0x3d/0x3b0
[  709.566688]  kthread+0x13a/0x150
[  709.571613]  ? process_one_work+0x660/0x660
[  709.577561]  ? kthread_create_on_node+0x40/0x40
[  709.583910]  ret_from_fork+0x27/0x40
[  709.589174] kworker/5:2 D0   386  2 0x8000
[  709.596616] Workqueue: kaluad alua_rtpg_work [scsi_dh_alua]
[  709.604130] Call Trace:
[  709.608096]  __schedule+0x2fa/0xbb0
[  709.613265]  ? trace_hardirqs_on_caller+0xfb/0x190
[  709.619879]  schedule+0x36/0x90
[  709.624684]  

Re: [PATCH V2 0/2] block: remove unnecessary RESTART

2017-11-03 Thread Ming Lei
On Fri, Nov 03, 2017 at 02:42:50AM +, Bart Van Assche wrote:
> On Fri, 2017-11-03 at 10:12 +0800, Ming Lei wrote:
> > [root@ibclient srp-test]# ./run_tests
> > modprobe: FATAL: Module target_core_mod is in use.
> 
> LIO must be unloaded before srp-test software is started.

Hi Bart,

Even with help of Laurence, we still can't setup your srp-test
in our test environment today.

But we have run Laurence's usual 3 tests on IB/SRP with/without all
my following patches against V4.14-rc4, looks everything is fine, and
no I/O hang is observed.

0001-blk-mq-sched-dispatch-from-scheduler-IFF-progress-is.patch
0002-blk-mq-sched-move-actual-dispatching-into-one-helper.patch
0003-sbitmap-introduce-__sbitmap_for_each_set.patch
0004-block-kyber-check-if-there-are-requests-in-ctx-in-ky.patch
0005-blk-mq-introduce-.get_budget-and-.put_budget-in-blk_.patch
0006-blk-mq-sched-improve-dispatching-from-sw-queue.patch
0007-scsi-allow-passing-in-null-rq-to-scsi_prep_state_che.patch
0008-scsi-implement-.get_budget-and-.put_budget-for-blk-m.patch
0009-blk-mq-don-t-handle-TAG_SHARED-in-restart.patch
0010-blk-mq-don-t-restart-queue-when-.get_budget-returns-.patch

BTW, Laurence found there is kernel crash in his IB/SRP test when running
for-next branch of block tree, so we just test v4.14-rc4 w/wo my blk-mq patches.

And I looked at the SCSI's queue_rq code for a while, and only found
one issue which may cause IO hang, and the following patch may address
this issue, but not sure if it is same with your issue. Could you apply
this patch and see if your issue can be fixed?

BTW, it should be helpful to check the blk-mq debugfs related files
when your I/O hang happens, could you provide that info?

--

>From edcb243d9a6f3446bd9a9f95c00bed7616dd7368 Mon Sep 17 00:00:00 2001
From: Ming Lei 
Date: Fri, 3 Nov 2017 12:11:59 +0800
Subject: [PATCH] SCSI_MQ: fix IO hang in case of queue busy

We have to insert the rq back before checking .device_busy,
otherwise When IO completes just after the check and before
this req is added to hctx->dispatch, this queue may never get
chance to be run, then this IO may hang forever.

This patch introduces BLK_STS_RESOURCE_OK for handling this
issue.

Signed-off-by: Ming Lei 
---
 block/blk-mq.c| 17 +
 drivers/scsi/scsi_lib.c   |  8 
 include/linux/blk-mq.h|  1 +
 include/linux/blk_types.h |  1 +
 4 files changed, 27 insertions(+)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index e4d2490f4e7e..e1e03576edca 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -660,6 +660,16 @@ static void __blk_mq_requeue_request(struct request *rq)
}
 }
 
+void blk_mq_reinsert_request_hctx(struct blk_mq_hw_ctx *hctx, struct request 
*rq)
+{
+   __blk_mq_requeue_request(rq);
+
+   spin_lock(>lock);
+   list_add_tail(>queuelist, >dispatch);
+   spin_unlock(>lock);
+}
+EXPORT_SYMBOL(blk_mq_reinsert_request_hctx);
+
 void blk_mq_requeue_request(struct request *rq, bool kick_requeue_list)
 {
__blk_mq_requeue_request(rq);
@@ -1165,6 +1175,12 @@ bool blk_mq_dispatch_rq_list(struct request_queue *q, 
struct list_head *list,
list_add(>queuelist, list);
__blk_mq_requeue_request(rq);
break;
+   } else if (ret == BLK_STS_RESOURCE_OK) {
+   /*
+* BLK_STS_RESOURCE_OK means driver handled this
+* STS_RESOURCE already, we just need to stop dispatch.
+*/
+   break;
}
 
  fail_rq:
@@ -1656,6 +1672,7 @@ static void __blk_mq_try_issue_directly(struct 
blk_mq_hw_ctx *hctx,
ret = q->mq_ops->queue_rq(hctx, );
switch (ret) {
case BLK_STS_OK:
+   case BLK_STS_RESOURCE_OK:
*cookie = new_cookie;
return;
case BLK_STS_RESOURCE:
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 7f218ef61900..0165c1caed82 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -2030,9 +2030,17 @@ static blk_status_t scsi_queue_rq(struct blk_mq_hw_ctx 
*hctx,
case BLK_STS_OK:
break;
case BLK_STS_RESOURCE:
+   /*
+* We have to insert the rq back before checking .device_busy,
+* otherwise when IO completes just after the check and before
+* this req is added to hctx->dispatch, this queue may never get
+* chance to be run, then this IO may hang forever.
+*/
+   blk_mq_reinsert_request_hctx(hctx, req);
if (atomic_read(>device_busy) == 0 &&
!scsi_device_blocked(sdev))
blk_mq_delay_run_hw_queue(hctx, SCSI_QUEUE_DELAY);
+   ret = BLK_STS_RESOURCE_OK;
break;

Re: [PATCH 3/3] nvme: fix eui_show() print format

2017-11-03 Thread Keith Busch
On Fri, Nov 03, 2017 at 01:55:16PM +0100, Christoph Hellwig wrote:
> On Fri, Nov 03, 2017 at 11:02:50AM +0100, Javier González wrote:
> > Signed-off-by: Javier González 
> > ---
> >  drivers/nvme/host/core.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> > index ae8ab0a1ef0d..f05c81774abf 100644
> > --- a/drivers/nvme/host/core.c
> > +++ b/drivers/nvme/host/core.c
> > @@ -2108,7 +2108,7 @@ static ssize_t eui_show(struct device *dev, struct 
> > device_attribute *attr,
> > char *buf)
> >  {
> > struct nvme_ns *ns = nvme_get_ns_from_dev(dev);
> > -   return sprintf(buf, "%8phd\n", ns->eui);
> > +   return sprintf(buf, "%8phD\n", ns->eui);
> >  }
> >  static DEVICE_ATTR(eui, S_IRUGO, eui_show, NULL);
> 
> This looks correct.  I wonder what the old code printed - does someone
> have a device with an EUI-64 at hand to quickly cross check what we
> did before?

It just prints the same as the 'ph' format, which would look like this:

  01 02 03 04 05 06 07 08

The change will make it look like this:

  01-02-03-04-05-06-07-08

I think that was the original intention.

Reviewed-by: Keith Busch 


Re: [PATCH 1/3] nvme: do not check for ns on rw path

2017-11-03 Thread Keith Busch
On Fri, Nov 03, 2017 at 01:53:40PM +0100, Christoph Hellwig wrote:
> > -   if (ns && ns->ms &&
> > +   if (ns->ms &&
> > (!ns->pi_type || ns->ms != sizeof(struct t10_pi_tuple)) &&
> > !blk_integrity_rq(req) && !blk_rq_is_passthrough(req))
> > return BLK_STS_NOTSUPP;
> 
> blk_rq_is_passthrough also can't be true here.
> 
> How about:
> 
>   if (ns->ms && !blk_integrity_rq(req) &&
>   (!ns->pi_type || ns->ms != sizeof(struct t10_pi_tuple)))
>   return BLK_STS_NOTSUPP;
> 
> Although I have to admit I don't really understand what this check
> is even trying to do.  It basically checks for a namespace that has
> a format with metadata that is not T10 protection information and
> then rejects all I/O to it.  Why are we even creating a block device
> node for such a thing?

If the namespace has metadata, but the request doesn't have a metadata
payload attached to it for whatever reason, we can't construct the command
for that format. We also can't have the controller strip/generate the
payload with PRACT bit set if it's not a T10 format, so we just fail
the command.


[PATCH V13 02/10] mmc: block: Add error-handling comments

2017-11-03 Thread Adrian Hunter
Add error-handling comments to explain what would also be done for blk-mq
if it used the legacy error-handling.

Signed-off-by: Adrian Hunter 
---
 drivers/mmc/core/block.c | 36 +++-
 1 file changed, 35 insertions(+), 1 deletion(-)

diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c
index ea80ff4cd7f9..ad72fa19f082 100644
--- a/drivers/mmc/core/block.c
+++ b/drivers/mmc/core/block.c
@@ -1894,7 +1894,11 @@ static void mmc_blk_issue_rw_rq(struct mmc_queue *mq, 
struct request *new_req)
case MMC_BLK_SUCCESS:
case MMC_BLK_PARTIAL:
/*
-* A block was successfully transferred.
+* Reset success, and accept bytes_xfered. For
+* MMC_BLK_PARTIAL re-submit the remaining request. For
+* MMC_BLK_SUCCESS error out the remaining request (it
+* could not be re-submitted anyway if a next request
+* had already begun).
 */
mmc_blk_reset_success(md, type);
 
@@ -1914,6 +1918,14 @@ static void mmc_blk_issue_rw_rq(struct mmc_queue *mq, 
struct request *new_req)
}
break;
case MMC_BLK_CMD_ERR:
+   /*
+* For SD cards, get bytes written, but do not accept
+* bytes_xfered if that fails. For MMC cards accept
+* bytes_xfered. Then try to reset. If reset fails then
+* error out the remaining request, otherwise retry
+* once (N.B mmc_blk_reset() will not succeed twice in a
+* row).
+*/
req_pending = mmc_blk_rw_cmd_err(md, card, brq, 
old_req, req_pending);
if (mmc_blk_reset(md, card->host, type)) {
if (req_pending)
@@ -1930,11 +1942,20 @@ static void mmc_blk_issue_rw_rq(struct mmc_queue *mq, 
struct request *new_req)
}
break;
case MMC_BLK_RETRY:
+   /*
+* Do not accept bytes_xfered, but retry up to 5 times,
+* otherwise same as abort.
+*/
retune_retry_done = brq->retune_retry_done;
if (retry++ < 5)
break;
/* Fall through */
case MMC_BLK_ABORT:
+   /*
+* Do not accept bytes_xfered, but try to reset. If
+* reset succeeds, try once more, otherwise error out
+* the request.
+*/
if (!mmc_blk_reset(md, card->host, type))
break;
mmc_blk_rw_cmd_abort(mq, card, old_req, mq_rq);
@@ -1943,6 +1964,13 @@ static void mmc_blk_issue_rw_rq(struct mmc_queue *mq, 
struct request *new_req)
case MMC_BLK_DATA_ERR: {
int err;
 
+   /*
+* Do not accept bytes_xfered, but try to reset. If
+* reset succeeds, try once more. If reset fails with
+* ENODEV which means the partition is wrong, then error
+* out the request. Otherwise attempt to read one sector
+* at a time.
+*/
err = mmc_blk_reset(md, card->host, type);
if (!err)
break;
@@ -1954,6 +1982,10 @@ static void mmc_blk_issue_rw_rq(struct mmc_queue *mq, 
struct request *new_req)
/* Fall through */
}
case MMC_BLK_ECC_ERR:
+   /*
+* Do not accept bytes_xfered. If reading more than one
+* sector, try reading one sector at a time.
+*/
if (brq->data.blocks > 1) {
/* Redo read one sector at a time */
pr_warn("%s: retrying using single block 
read\n",
@@ -1975,10 +2007,12 @@ static void mmc_blk_issue_rw_rq(struct mmc_queue *mq, 
struct request *new_req)
}
break;
case MMC_BLK_NOMEDIUM:
+   /* Do not accept bytes_xfered. Error out the request */
mmc_blk_rw_cmd_abort(mq, card, old_req, mq_rq);
mmc_blk_rw_try_restart(mq, new_req, mqrq_cur);
return;
default:
+   /* Do not accept bytes_xfered. Error out the request */
 

[PATCH V13 01/10] mmc: core: Add parameter use_blk_mq

2017-11-03 Thread Adrian Hunter
Until mmc has blk-mq support fully implemented and tested, add a
parameter use_blk_mq, default to false unless config option MMC_MQ_DEFAULT
is selected.

Signed-off-by: Adrian Hunter 
---
 drivers/mmc/Kconfig  | 11 +++
 drivers/mmc/core/core.c  |  7 +++
 drivers/mmc/core/core.h  |  2 ++
 drivers/mmc/core/host.c  |  2 ++
 drivers/mmc/core/host.h  |  4 
 include/linux/mmc/host.h |  1 +
 6 files changed, 27 insertions(+)

diff --git a/drivers/mmc/Kconfig b/drivers/mmc/Kconfig
index ec21388311db..98202934bd29 100644
--- a/drivers/mmc/Kconfig
+++ b/drivers/mmc/Kconfig
@@ -12,6 +12,17 @@ menuconfig MMC
  If you want MMC/SD/SDIO support, you should say Y here and
  also to your specific host controller driver.
 
+config MMC_MQ_DEFAULT
+   bool "MMC: use blk-mq I/O path by default"
+   depends on MMC && BLOCK
+   ---help---
+ This option enables the new blk-mq based I/O path for MMC block
+ devices by default.  With the option the mmc_core.use_blk_mq
+ module/boot option defaults to Y, without it to N, but it can
+ still be overridden either way.
+
+ If unsure say N.
+
 if MMC
 
 source "drivers/mmc/core/Kconfig"
diff --git a/drivers/mmc/core/core.c b/drivers/mmc/core/core.c
index 1f0f44f4dd5f..5df03cb73be7 100644
--- a/drivers/mmc/core/core.c
+++ b/drivers/mmc/core/core.c
@@ -66,6 +66,13 @@
 bool use_spi_crc = 1;
 module_param(use_spi_crc, bool, 0);
 
+#ifdef CONFIG_MMC_MQ_DEFAULT
+bool mmc_use_blk_mq = true;
+#else
+bool mmc_use_blk_mq = false;
+#endif
+module_param_named(use_blk_mq, mmc_use_blk_mq, bool, S_IWUSR | S_IRUGO);
+
 static int mmc_schedule_delayed_work(struct delayed_work *work,
 unsigned long delay)
 {
diff --git a/drivers/mmc/core/core.h b/drivers/mmc/core/core.h
index 71e6c6d7ceb7..8c5dd8d31400 100644
--- a/drivers/mmc/core/core.h
+++ b/drivers/mmc/core/core.h
@@ -35,6 +35,8 @@ struct mmc_bus_ops {
int (*reset)(struct mmc_host *);
 };
 
+extern bool mmc_use_blk_mq;
+
 void mmc_attach_bus(struct mmc_host *host, const struct mmc_bus_ops *ops);
 void mmc_detach_bus(struct mmc_host *host);
 
diff --git a/drivers/mmc/core/host.c b/drivers/mmc/core/host.c
index 35a9e4fd1a9f..62ef6cb0ece4 100644
--- a/drivers/mmc/core/host.c
+++ b/drivers/mmc/core/host.c
@@ -404,6 +404,8 @@ struct mmc_host *mmc_alloc_host(int extra, struct device 
*dev)
 
host->fixed_drv_type = -EINVAL;
 
+   host->use_blk_mq = mmc_use_blk_mq;
+
return host;
 }
 
diff --git a/drivers/mmc/core/host.h b/drivers/mmc/core/host.h
index fb689a1065ed..6eaf558e62d6 100644
--- a/drivers/mmc/core/host.h
+++ b/drivers/mmc/core/host.h
@@ -74,6 +74,10 @@ static inline bool mmc_card_hs400es(struct mmc_card *card)
return card->host->ios.enhanced_strobe;
 }
 
+static inline bool mmc_host_use_blk_mq(struct mmc_host *host)
+{
+   return host->use_blk_mq;
+}
 
 #endif
 
diff --git a/include/linux/mmc/host.h b/include/linux/mmc/host.h
index e7743eca1021..ce2075d6f429 100644
--- a/include/linux/mmc/host.h
+++ b/include/linux/mmc/host.h
@@ -380,6 +380,7 @@ struct mmc_host {
unsigned intdoing_retune:1; /* re-tuning in progress */
unsigned intretune_now:1;   /* do re-tuning at next req */
unsigned intretune_paused:1; /* re-tuning is temporarily 
disabled */
+   unsigned intuse_blk_mq:1;   /* use blk-mq */
 
int rescan_disable; /* disable card detection */
int rescan_entered; /* used with nonremovable 
devices */
-- 
1.9.1



[PATCH V13 05/10] mmc: cqhci: support for command queue enabled host

2017-11-03 Thread Adrian Hunter
From: Venkat Gopalakrishnan 

This patch adds CMDQ support for command-queue compatible
hosts.

Command queue is added in eMMC-5.1 specification. This
enables the controller to process upto 32 requests at
a time.

Adrian Hunter contributed renaming to cqhci, recovery, suspend
and resume, cqhci_off, cqhci_wait_for_idle, and external timeout
handling.

Signed-off-by: Asutosh Das 
Signed-off-by: Sujit Reddy Thumma 
Signed-off-by: Konstantin Dorfman 
Signed-off-by: Venkat Gopalakrishnan 
Signed-off-by: Subhash Jadavani 
Signed-off-by: Ritesh Harjani 
Signed-off-by: Adrian Hunter 
---
 drivers/mmc/host/Kconfig  |   13 +
 drivers/mmc/host/Makefile |1 +
 drivers/mmc/host/cqhci.c  | 1150 +
 drivers/mmc/host/cqhci.h  |  240 ++
 4 files changed, 1404 insertions(+)
 create mode 100644 drivers/mmc/host/cqhci.c
 create mode 100644 drivers/mmc/host/cqhci.h

diff --git a/drivers/mmc/host/Kconfig b/drivers/mmc/host/Kconfig
index 567028c9219a..3092b7085cb5 100644
--- a/drivers/mmc/host/Kconfig
+++ b/drivers/mmc/host/Kconfig
@@ -857,6 +857,19 @@ config MMC_SUNXI
  This selects support for the SD/MMC Host Controller on
  Allwinner sunxi SoCs.
 
+config MMC_CQHCI
+   tristate "Command Queue Host Controller Interface support"
+   depends on HAS_DMA
+   help
+ This selects the Command Queue Host Controller Interface (CQHCI)
+ support present in host controllers of Qualcomm Technologies, Inc
+ amongst others.
+ This controller supports eMMC devices with command queue support.
+
+ If you have a controller with this interface, say Y or M here.
+
+ If unsure, say N.
+
 config MMC_TOSHIBA_PCI
tristate "Toshiba Type A SD/MMC Card Interface Driver"
depends on PCI
diff --git a/drivers/mmc/host/Makefile b/drivers/mmc/host/Makefile
index ab61a3e39c0b..de140e3ef402 100644
--- a/drivers/mmc/host/Makefile
+++ b/drivers/mmc/host/Makefile
@@ -91,6 +91,7 @@ obj-$(CONFIG_MMC_SDHCI_ST)+= sdhci-st.o
 obj-$(CONFIG_MMC_SDHCI_MICROCHIP_PIC32)+= sdhci-pic32.o
 obj-$(CONFIG_MMC_SDHCI_BRCMSTB)+= sdhci-brcmstb.o
 obj-$(CONFIG_MMC_SDHCI_OMAP)   += sdhci-omap.o
+obj-$(CONFIG_MMC_CQHCI)+= cqhci.o
 
 ifeq ($(CONFIG_CB710_DEBUG),y)
CFLAGS-cb710-mmc+= -DDEBUG
diff --git a/drivers/mmc/host/cqhci.c b/drivers/mmc/host/cqhci.c
new file mode 100644
index ..159270e947cf
--- /dev/null
+++ b/drivers/mmc/host/cqhci.c
@@ -0,0 +1,1150 @@
+/* Copyright (c) 2015, The Linux Foundation. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 and
+ * only version 2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+
+#include "cqhci.h"
+
+#define DCMD_SLOT 31
+#define NUM_SLOTS 32
+
+struct cqhci_slot {
+   struct mmc_request *mrq;
+   unsigned int flags;
+#define CQHCI_EXTERNAL_TIMEOUT BIT(0)
+#define CQHCI_COMPLETEDBIT(1)
+#define CQHCI_HOST_CRC BIT(2)
+#define CQHCI_HOST_TIMEOUT BIT(3)
+#define CQHCI_HOST_OTHER   BIT(4)
+};
+
+static inline u8 *get_desc(struct cqhci_host *cq_host, u8 tag)
+{
+   return cq_host->desc_base + (tag * cq_host->slot_sz);
+}
+
+static inline u8 *get_link_desc(struct cqhci_host *cq_host, u8 tag)
+{
+   u8 *desc = get_desc(cq_host, tag);
+
+   return desc + cq_host->task_desc_len;
+}
+
+static inline dma_addr_t get_trans_desc_dma(struct cqhci_host *cq_host, u8 tag)
+{
+   return cq_host->trans_desc_dma_base +
+   (cq_host->mmc->max_segs * tag *
+cq_host->trans_desc_len);
+}
+
+static inline u8 *get_trans_desc(struct cqhci_host *cq_host, u8 tag)
+{
+   return cq_host->trans_desc_base +
+   (cq_host->trans_desc_len * cq_host->mmc->max_segs * tag);
+}
+
+static void setup_trans_desc(struct cqhci_host *cq_host, u8 tag)
+{
+   u8 *link_temp;
+   dma_addr_t trans_temp;
+
+   link_temp = get_link_desc(cq_host, tag);
+   trans_temp = get_trans_desc_dma(cq_host, tag);
+
+   memset(link_temp, 0, cq_host->link_desc_len);
+   if (cq_host->link_desc_len > 8)
+   *(link_temp + 8) = 0;
+
+   if (tag == DCMD_SLOT && (cq_host->mmc->caps2 & MMC_CAP2_CQE_DCMD)) {
+   *link_temp = CQHCI_VALID(0) 

[PATCH V13 07/10] mmc: block: blk-mq: Add support for direct completion

2017-11-03 Thread Adrian Hunter
For blk-mq, add support for completing requests directly in the ->done
callback. That means that error handling and urgent background operations
must be handled by recovery_work in that case.

Signed-off-by: Adrian Hunter 
---
 drivers/mmc/core/block.c | 100 +--
 drivers/mmc/core/block.h |   1 +
 drivers/mmc/core/queue.c |   5 ++-
 drivers/mmc/core/queue.h |   6 +++
 include/linux/mmc/host.h |   1 +
 5 files changed, 101 insertions(+), 12 deletions(-)

diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c
index e8be17152884..cbb4b35a592d 100644
--- a/drivers/mmc/core/block.c
+++ b/drivers/mmc/core/block.c
@@ -2086,6 +2086,22 @@ static void mmc_blk_rw_recovery(struct mmc_queue *mq, 
struct request *req)
}
 }
 
+static inline bool mmc_blk_rq_error(struct mmc_blk_request *brq)
+{
+   mmc_blk_eval_resp_error(brq);
+
+   return brq->sbc.error || brq->cmd.error || brq->stop.error ||
+  brq->data.error || brq->cmd.resp[0] & CMD_ERRORS;
+}
+
+static inline void mmc_blk_rw_reset_success(struct mmc_queue *mq,
+   struct request *req)
+{
+   int type = rq_data_dir(req) == READ ? MMC_BLK_READ : MMC_BLK_WRITE;
+
+   mmc_blk_reset_success(mq->blkdata, type);
+}
+
 static void mmc_blk_mq_complete_rq(struct mmc_queue *mq, struct request *req)
 {
struct mmc_queue_req *mqrq = req_to_mmc_queue_req(req);
@@ -2167,14 +2183,43 @@ static void mmc_blk_mq_post_req(struct mmc_queue *mq, 
struct request *req)
if (host->ops->post_req)
host->ops->post_req(host, mrq, 0);
 
-   blk_mq_complete_request(req);
+   /*
+* Block layer timeouts race with completions which means the normal
+* completion path cannot be used during recovery.
+*/
+   if (mq->in_recovery)
+   mmc_blk_mq_complete_rq(mq, req);
+   else
+   blk_mq_complete_request(req);
 
mmc_blk_mq_acct_req_done(mq, req);
 }
 
+void mmc_blk_mq_recovery(struct mmc_queue *mq)
+{
+   struct request *req = mq->recovery_req;
+   struct mmc_host *host = mq->card->host;
+   struct mmc_queue_req *mqrq = req_to_mmc_queue_req(req);
+
+   mq->recovery_req = NULL;
+   mq->rw_wait = false;
+
+   if (mmc_blk_rq_error(>brq)) {
+   mmc_retune_hold_now(host);
+   mmc_blk_rw_recovery(mq, req);
+   }
+
+   mmc_blk_urgent_bkops(mq, mqrq);
+
+   mmc_blk_mq_post_req(mq, req);
+}
+
 static void mmc_blk_mq_complete_prev_req(struct mmc_queue *mq,
 struct request **prev_req)
 {
+   if (mmc_queue_direct_complete(mq->card->host))
+   return;
+
mutex_lock(>complete_lock);
 
if (!mq->complete_req)
@@ -2208,19 +2253,43 @@ static void mmc_blk_mq_req_done(struct mmc_request *mrq)
struct request *req = mmc_queue_req_to_req(mqrq);
struct request_queue *q = req->q;
struct mmc_queue *mq = q->queuedata;
+   struct mmc_host *host = mq->card->host;
unsigned long flags;
-   bool waiting;
 
-   spin_lock_irqsave(q->queue_lock, flags);
-   mq->complete_req = req;
-   mq->rw_wait = false;
-   waiting = mq->waiting;
-   spin_unlock_irqrestore(q->queue_lock, flags);
+   if (!mmc_queue_direct_complete(host)) {
+   bool waiting;
+
+   spin_lock_irqsave(q->queue_lock, flags);
+   mq->complete_req = req;
+   mq->rw_wait = false;
+   waiting = mq->waiting;
+   spin_unlock_irqrestore(q->queue_lock, flags);
+
+   if (waiting)
+   wake_up(>wait);
+   else
+   kblockd_schedule_work(>complete_work);
 
-   if (waiting)
+   return;
+   }
+
+   if (mmc_blk_rq_error(>brq) ||
+   mmc_blk_urgent_bkops_needed(mq, mqrq)) {
+   spin_lock_irqsave(q->queue_lock, flags);
+   mq->recovery_needed = true;
+   mq->recovery_req = req;
+   spin_unlock_irqrestore(q->queue_lock, flags);
wake_up(>wait);
-   else
-   kblockd_schedule_work(>complete_work);
+   schedule_work(>recovery_work);
+   return;
+   }
+
+   mmc_blk_rw_reset_success(mq, req);
+
+   mq->rw_wait = false;
+   wake_up(>wait);
+
+   mmc_blk_mq_post_req(mq, req);
 }
 
 static bool mmc_blk_rw_wait_cond(struct mmc_queue *mq, int *err)
@@ -2230,7 +2299,12 @@ static bool mmc_blk_rw_wait_cond(struct mmc_queue *mq, 
int *err)
bool done;
 
spin_lock_irqsave(q->queue_lock, flags);
-   done = !mq->rw_wait;
+   if (mq->recovery_needed) {
+   *err = -EBUSY;
+   done = true;
+   } else {
+   done = !mq->rw_wait;
+   }
mq->waiting = !done;
spin_unlock_irqrestore(q->queue_lock, flags);
 
@@ 

[PATCH V13 06/10] mmc: sdhci-pci: Add CQHCI support for Intel GLK

2017-11-03 Thread Adrian Hunter
Add CQHCI initialization and implement CQHCI operations for Intel GLK.

Signed-off-by: Adrian Hunter 
---
 drivers/mmc/host/Kconfig  |   1 +
 drivers/mmc/host/sdhci-pci-core.c | 155 +-
 2 files changed, 155 insertions(+), 1 deletion(-)

diff --git a/drivers/mmc/host/Kconfig b/drivers/mmc/host/Kconfig
index 3092b7085cb5..2b02a9788bb6 100644
--- a/drivers/mmc/host/Kconfig
+++ b/drivers/mmc/host/Kconfig
@@ -81,6 +81,7 @@ config MMC_SDHCI_BIG_ENDIAN_32BIT_BYTE_SWAPPER
 config MMC_SDHCI_PCI
tristate "SDHCI support on PCI bus"
depends on MMC_SDHCI && PCI
+   select MMC_CQHCI
help
  This selects the PCI Secure Digital Host Controller Interface.
  Most controllers found today are PCI devices.
diff --git a/drivers/mmc/host/sdhci-pci-core.c 
b/drivers/mmc/host/sdhci-pci-core.c
index 3e4f04fd5175..110c634cfb43 100644
--- a/drivers/mmc/host/sdhci-pci-core.c
+++ b/drivers/mmc/host/sdhci-pci-core.c
@@ -30,6 +30,8 @@
 #include 
 #include 
 
+#include "cqhci.h"
+
 #include "sdhci.h"
 #include "sdhci-pci.h"
 
@@ -116,6 +118,28 @@ int sdhci_pci_resume_host(struct sdhci_pci_chip *chip)
 
return 0;
 }
+
+static int sdhci_cqhci_suspend(struct sdhci_pci_chip *chip)
+{
+   int ret;
+
+   ret = cqhci_suspend(chip->slots[0]->host->mmc);
+   if (ret)
+   return ret;
+
+   return sdhci_pci_suspend_host(chip);
+}
+
+static int sdhci_cqhci_resume(struct sdhci_pci_chip *chip)
+{
+   int ret;
+
+   ret = sdhci_pci_resume_host(chip);
+   if (ret)
+   return ret;
+
+   return cqhci_resume(chip->slots[0]->host->mmc);
+}
 #endif
 
 #ifdef CONFIG_PM
@@ -166,8 +190,48 @@ static int sdhci_pci_runtime_resume_host(struct 
sdhci_pci_chip *chip)
 
return 0;
 }
+
+static int sdhci_cqhci_runtime_suspend(struct sdhci_pci_chip *chip)
+{
+   int ret;
+
+   ret = cqhci_suspend(chip->slots[0]->host->mmc);
+   if (ret)
+   return ret;
+
+   return sdhci_pci_runtime_suspend_host(chip);
+}
+
+static int sdhci_cqhci_runtime_resume(struct sdhci_pci_chip *chip)
+{
+   int ret;
+
+   ret = sdhci_pci_runtime_resume_host(chip);
+   if (ret)
+   return ret;
+
+   return cqhci_resume(chip->slots[0]->host->mmc);
+}
 #endif
 
+static u32 sdhci_cqhci_irq(struct sdhci_host *host, u32 intmask)
+{
+   int cmd_error = 0;
+   int data_error = 0;
+
+   if (!sdhci_cqe_irq(host, intmask, _error, _error))
+   return intmask;
+
+   cqhci_irq(host->mmc, intmask, cmd_error, data_error);
+
+   return 0;
+}
+
+static void sdhci_pci_dumpregs(struct mmc_host *mmc)
+{
+   sdhci_dumpregs(mmc_priv(mmc));
+}
+
 /*\
  *   *
  * Hardware specific quirk handling  *
@@ -583,6 +647,18 @@ static void sdhci_intel_voltage_switch(struct sdhci_host 
*host)
.voltage_switch = sdhci_intel_voltage_switch,
 };
 
+static const struct sdhci_ops sdhci_intel_glk_ops = {
+   .set_clock  = sdhci_set_clock,
+   .set_power  = sdhci_intel_set_power,
+   .enable_dma = sdhci_pci_enable_dma,
+   .set_bus_width  = sdhci_set_bus_width,
+   .reset  = sdhci_reset,
+   .set_uhs_signaling  = sdhci_set_uhs_signaling,
+   .hw_reset   = sdhci_pci_hw_reset,
+   .voltage_switch = sdhci_intel_voltage_switch,
+   .irq= sdhci_cqhci_irq,
+};
+
 static void byt_read_dsm(struct sdhci_pci_slot *slot)
 {
struct intel_host *intel_host = sdhci_pci_priv(slot);
@@ -612,12 +688,80 @@ static int glk_emmc_probe_slot(struct sdhci_pci_slot 
*slot)
 {
int ret = byt_emmc_probe_slot(slot);
 
+   slot->host->mmc->caps2 |= MMC_CAP2_CQE;
+
if (slot->chip->pdev->device != PCI_DEVICE_ID_INTEL_GLK_EMMC) {
slot->host->mmc->caps2 |= MMC_CAP2_HS400_ES,
slot->host->mmc_host_ops.hs400_enhanced_strobe =
intel_hs400_enhanced_strobe;
+   slot->host->mmc->caps2 |= MMC_CAP2_CQE_DCMD;
+   }
+
+   return ret;
+}
+
+static void glk_cqe_enable(struct mmc_host *mmc)
+{
+   struct sdhci_host *host = mmc_priv(mmc);
+   u32 reg;
+
+   /*
+* CQE gets stuck if it sees Buffer Read Enable bit set, which can be
+* the case after tuning, so ensure the buffer is drained.
+*/
+   reg = sdhci_readl(host, SDHCI_PRESENT_STATE);
+   while (reg & SDHCI_DATA_AVAILABLE) {
+   sdhci_readl(host, SDHCI_BUFFER);
+   reg = sdhci_readl(host, SDHCI_PRESENT_STATE);
+   }
+
+   sdhci_cqe_enable(mmc);
+}
+
+static const struct cqhci_host_ops glk_cqhci_ops = {
+   .enable 

[PATCH V13 09/10] mmc: block: blk-mq: Stop using card_busy_detect()

2017-11-03 Thread Adrian Hunter
card_busy_detect() doesn't set a correct timeout, and it doesn't take care
of error status bits. Stop using it for blk-mq.

Signed-off-by: Adrian Hunter 
---
 drivers/mmc/core/block.c | 117 +++
 1 file changed, 109 insertions(+), 8 deletions(-)

diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c
index 0c29b1d8d545..5c5ff3c34313 100644
--- a/drivers/mmc/core/block.c
+++ b/drivers/mmc/core/block.c
@@ -1426,15 +1426,18 @@ static inline void mmc_apply_rel_rw(struct 
mmc_blk_request *brq,
}
 }
 
-#define CMD_ERRORS \
-   (R1_OUT_OF_RANGE |  /* Command argument out of range */ \
-R1_ADDRESS_ERROR | /* Misaligned address */\
+#define CMD_ERRORS_EXCL_OOR\
+   (R1_ADDRESS_ERROR | /* Misaligned address */\
 R1_BLOCK_LEN_ERROR |   /* Transferred block length incorrect */\
 R1_WP_VIOLATION |  /* Tried to write to protected block */ \
 R1_CARD_ECC_FAILED |   /* Card ECC failed */   \
 R1_CC_ERROR |  /* Card controller error */ \
 R1_ERROR)  /* General/unknown error */
 
+#define CMD_ERRORS \
+   (CMD_ERRORS_EXCL_OOR |  \
+R1_OUT_OF_RANGE)   /* Command argument out of range */ \
+
 static void mmc_blk_eval_resp_error(struct mmc_blk_request *brq)
 {
u32 val;
@@ -1951,6 +1954,95 @@ static void mmc_blk_ss_read(struct mmc_queue *mq, struct 
request *req)
mqrq->retries = MMC_NO_RETRIES;
 }
 
+static inline bool mmc_blk_oor_valid(struct mmc_blk_request *brq)
+{
+   return !!brq->mrq.sbc;
+}
+
+static inline u32 mmc_blk_stop_err_bits(struct mmc_blk_request *brq)
+{
+   return mmc_blk_oor_valid(brq) ? CMD_ERRORS : CMD_ERRORS_EXCL_OOR;
+}
+
+static inline bool mmc_blk_in_tran_state(u32 status)
+{
+   /*
+* Some cards mishandle the status bits, so make sure to check both the
+* busy indication and the card state.
+*/
+   return status & R1_READY_FOR_DATA &&
+  (R1_CURRENT_STATE(status) == R1_STATE_TRAN);
+}
+
+static unsigned int mmc_blk_clock_khz(struct mmc_host *host)
+{
+   if (host->actual_clock)
+   return host->actual_clock / 1000;
+
+   /* Clock may be subject to a divisor, fudge it by a factor of 2. */
+   if (host->ios.clock)
+   return host->ios.clock / 2000;
+
+   /* How can there be no clock */
+   WARN_ON_ONCE(1);
+   return 100; /* 100 kHz is minimum possible value */
+}
+
+static unsigned long mmc_blk_data_timeout_jiffies(struct mmc_host *host,
+ struct mmc_data *data)
+{
+   unsigned int ms = DIV_ROUND_UP(data->timeout_ns, 100);
+   unsigned int khz;
+
+   if (data->timeout_clks) {
+   khz = mmc_blk_clock_khz(host);
+   ms += DIV_ROUND_UP(data->timeout_clks, khz);
+   }
+
+   return msecs_to_jiffies(ms);
+}
+
+static int mmc_blk_card_stuck(struct mmc_card *card, struct request *req,
+ u32 *resp_errs)
+{
+   struct mmc_queue_req *mqrq = req_to_mmc_queue_req(req);
+   struct mmc_data *data = >brq.data;
+   unsigned long timeout;
+   u32 status;
+   int err;
+
+   timeout = jiffies + mmc_blk_data_timeout_jiffies(card->host, data);
+
+   while (1) {
+   bool done = time_after(jiffies, timeout);
+
+   err = __mmc_send_status(card, , 5);
+   if (err) {
+   pr_err("%s: error %d requesting status\n",
+  req->rq_disk->disk_name, err);
+   break;
+   }
+
+   /* Accumulate any response error bits seen */
+   if (resp_errs)
+   *resp_errs |= status;
+
+   if (mmc_blk_in_tran_state(status))
+   break;
+
+   /* Timeout if the device never becomes ready */
+   if (done) {
+   pr_err("%s: Card stuck in wrong state! %s %s\n",
+   mmc_hostname(card->host),
+   req->rq_disk->disk_name, __func__);
+   err = -ETIMEDOUT;
+   break;
+   }
+   }
+
+   return err;
+}
+
 static void mmc_blk_rw_recovery(struct mmc_queue *mq, struct request *req)
 {
int type = rq_data_dir(req) == READ ? MMC_BLK_READ : MMC_BLK_WRITE;
@@ -2097,17 +2189,26 @@ static inline bool mmc_blk_rq_error(struct 
mmc_blk_request *brq)
 static int mmc_blk_card_busy(struct mmc_card *card, struct request *req)
 {
struct mmc_queue_req *mqrq = req_to_mmc_queue_req(req);
-   bool gen_err = false;
+  

[PATCH V13 08/10] mmc: block: blk-mq: Separate card polling from recovery

2017-11-03 Thread Adrian Hunter
Recovery is simpler to understand if it is only used for errors. Create a
separate function for card polling.

Signed-off-by: Adrian Hunter 
---
 drivers/mmc/core/block.c | 27 ++-
 1 file changed, 26 insertions(+), 1 deletion(-)

diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c
index cbb4b35a592d..0c29b1d8d545 100644
--- a/drivers/mmc/core/block.c
+++ b/drivers/mmc/core/block.c
@@ -2094,6 +2094,24 @@ static inline bool mmc_blk_rq_error(struct 
mmc_blk_request *brq)
   brq->data.error || brq->cmd.resp[0] & CMD_ERRORS;
 }
 
+static int mmc_blk_card_busy(struct mmc_card *card, struct request *req)
+{
+   struct mmc_queue_req *mqrq = req_to_mmc_queue_req(req);
+   bool gen_err = false;
+   int err;
+
+   if (mmc_host_is_spi(card->host) || rq_data_dir(req) == READ)
+   return 0;
+
+   err = card_busy_detect(card, MMC_BLK_TIMEOUT_MS, false, req, _err);
+
+   /* Copy the general error bit so it will be seen later on */
+   if (gen_err)
+   mqrq->brq.stop.resp[0] |= R1_ERROR;
+
+   return err;
+}
+
 static inline void mmc_blk_rw_reset_success(struct mmc_queue *mq,
struct request *req)
 {
@@ -2150,8 +2168,15 @@ static void mmc_blk_mq_poll_completion(struct mmc_queue 
*mq,
   struct request *req)
 {
struct mmc_queue_req *mqrq = req_to_mmc_queue_req(req);
+   struct mmc_host *host = mq->card->host;
 
-   mmc_blk_rw_recovery(mq, req);
+   if (mmc_blk_rq_error(>brq) ||
+   mmc_blk_card_busy(mq->card, req)) {
+   mmc_blk_rw_recovery(mq, req);
+   } else {
+   mmc_blk_rw_reset_success(mq, req);
+   mmc_retune_release(host);
+   }
 
mmc_blk_urgent_bkops(mq, mqrq);
 }
-- 
1.9.1



[PATCH V13 10/10] mmc: block: blk-mq: Stop using legacy recovery

2017-11-03 Thread Adrian Hunter
There are only a few things the recovery needs to do. Primarily, it just
needs to:
Determine the number of bytes transferred
Get the card back to transfer state
Determine whether to retry

There are also a couple of additional features:
Reset the card before the last retry
Read one sector at a time

The legacy code spent much effort analyzing command errors, but commands
fail fast, so it is simpler just to give all command errors the same number
of retries.

Signed-off-by: Adrian Hunter 
---
 drivers/mmc/core/block.c | 261 ---
 1 file changed, 135 insertions(+), 126 deletions(-)

diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c
index 5c5ff3c34313..623fa2be7077 100644
--- a/drivers/mmc/core/block.c
+++ b/drivers/mmc/core/block.c
@@ -1480,9 +1480,11 @@ static void mmc_blk_eval_resp_error(struct 
mmc_blk_request *brq)
}
 }
 
-static enum mmc_blk_status __mmc_blk_err_check(struct mmc_card *card,
-  struct mmc_queue_req *mq_mrq)
+static enum mmc_blk_status mmc_blk_err_check(struct mmc_card *card,
+struct mmc_async_req *areq)
 {
+   struct mmc_queue_req *mq_mrq = container_of(areq, struct mmc_queue_req,
+   areq);
struct mmc_blk_request *brq = _mrq->brq;
struct request *req = mmc_queue_req_to_req(mq_mrq);
int need_retune = card->host->need_retune;
@@ -1588,15 +1590,6 @@ static enum mmc_blk_status __mmc_blk_err_check(struct 
mmc_card *card,
return MMC_BLK_SUCCESS;
 }
 
-static enum mmc_blk_status mmc_blk_err_check(struct mmc_card *card,
-struct mmc_async_req *areq)
-{
-   struct mmc_queue_req *mq_mrq = container_of(areq, struct mmc_queue_req,
-   areq);
-
-   return __mmc_blk_err_check(card, mq_mrq);
-}
-
 static void mmc_blk_data_prep(struct mmc_queue *mq, struct mmc_queue_req *mqrq,
  int disable_multi, bool *do_rel_wr_p,
  bool *do_data_tag_p)
@@ -1922,6 +1915,7 @@ static void mmc_blk_rw_rq_prep(struct mmc_queue_req *mqrq,
 }
 
 #define MMC_MAX_RETRIES5
+#define MMC_DATA_RETRIES   2
 #define MMC_NO_RETRIES (MMC_MAX_RETRIES + 1)
 
 /* Single sector read during recovery */
@@ -1974,6 +1968,28 @@ static inline bool mmc_blk_in_tran_state(u32 status)
   (R1_CURRENT_STATE(status) == R1_STATE_TRAN);
 }
 
+/*
+ * Check for errors the host controller driver might not have seen such as
+ * response mode errors or invalid card state.
+ */
+static bool mmc_blk_status_error(struct request *req, u32 status)
+{
+   struct mmc_queue_req *mqrq = req_to_mmc_queue_req(req);
+   struct mmc_blk_request *brq = >brq;
+   u32 stop_err_bits = mmc_blk_stop_err_bits(brq);
+
+   return brq->cmd.resp[0]  & CMD_ERRORS||
+  brq->stop.resp[0] & stop_err_bits ||
+  status& stop_err_bits ||
+  (rq_data_dir(req) == WRITE && !mmc_blk_in_tran_state(status));
+}
+
+static inline bool mmc_blk_cmd_started(struct mmc_blk_request *brq)
+{
+   return !brq->sbc.error && !brq->cmd.error &&
+  !(brq->cmd.resp[0] & CMD_ERRORS);
+}
+
 static unsigned int mmc_blk_clock_khz(struct mmc_host *host)
 {
if (host->actual_clock)
@@ -2043,6 +2059,47 @@ static int mmc_blk_card_stuck(struct mmc_card *card, 
struct request *req,
return err;
 }
 
+static int mmc_blk_send_stop(struct mmc_card *card)
+{
+   struct mmc_command cmd = {
+   .opcode = MMC_STOP_TRANSMISSION,
+   .flags = MMC_RSP_SPI_R1 | MMC_RSP_R1 | MMC_CMD_AC,
+   };
+
+   return mmc_wait_for_cmd(card->host, , 5);
+}
+
+static int mmc_blk_fix_state(struct mmc_card *card, struct request *req)
+{
+   int err;
+
+   mmc_retune_hold_now(card->host);
+
+   mmc_blk_send_stop(card);
+
+   err = mmc_blk_card_stuck(card, req, NULL);
+
+   mmc_retune_release(card->host);
+
+   return err;
+}
+
+/*
+ * Requests are completed by mmc_blk_mq_complete_rq() which sets simple
+ * policy:
+ * 1. A request that has transferred at least some data is considered
+ * successful and will be requeued if there is remaining data to
+ * transfer.
+ * 2. Otherwise the number of retries is incremented and the request
+ * will be requeued if there are remaining retries.
+ * 3. Otherwise the request will be errored out.
+ * That means mmc_blk_mq_complete_rq() is controlled by bytes_xfered and
+ * mqrq->retries. So there are only 4 possible actions here:
+ * 1. do not accept the bytes_xfered value i.e. set it to zero
+ * 2. change mqrq->retries to determine the number of retries
+ * 3. try to reset the card
+ * 4. read one sector at a time
+ */
 static void 

[PATCH V13 04/10] mmc: block: Add CQE support

2017-11-03 Thread Adrian Hunter
Add CQE support to the block driver, including:
- optionally using DCMD for flush requests
- "manually" issuing discard requests
- issuing read / write requests to the CQE
- supporting block-layer timeouts
- handling recovery
- supporting re-tuning

CQE offers 25% - 50% better random multi-threaded I/O.  There is a slight
(e.g. 2%) drop in sequential read speed but no observable change to sequential
write.

CQE automatically sends the commands to complete requests.  However it only
supports reads / writes and so-called "direct commands" (DCMD).  Furthermore
DCMD is limited to one command at a time, but discards require 3 commands.
That makes issuing discards through CQE very awkward, but some CQE's don't
support DCMD anyway.  So for discards, the existing non-CQE approach is
taken, where the mmc core code issues the 3 commands one at a time i.e.
mmc_erase(). Where DCMD is used, is for issuing flushes.

Signed-off-by: Adrian Hunter 
---
 drivers/mmc/core/block.c | 150 +++-
 drivers/mmc/core/block.h |   2 +
 drivers/mmc/core/queue.c | 158 +--
 drivers/mmc/core/queue.h |  18 ++
 4 files changed, 322 insertions(+), 6 deletions(-)

diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c
index e2838ff4738e..e8be17152884 100644
--- a/drivers/mmc/core/block.c
+++ b/drivers/mmc/core/block.c
@@ -112,6 +112,7 @@ struct mmc_blk_data {
 #define MMC_BLK_WRITE  BIT(1)
 #define MMC_BLK_DISCARDBIT(2)
 #define MMC_BLK_SECDISCARD BIT(3)
+#define MMC_BLK_CQE_RECOVERY   BIT(4)
 
/*
 * Only set in main mmc_blk_data associated
@@ -1717,6 +1718,138 @@ static void mmc_blk_data_prep(struct mmc_queue *mq, 
struct mmc_queue_req *mqrq,
*do_data_tag_p = do_data_tag;
 }
 
+#define MMC_CQE_RETRIES 2
+
+static void mmc_blk_cqe_complete_rq(struct mmc_queue *mq, struct request *req)
+{
+   struct mmc_queue_req *mqrq = req_to_mmc_queue_req(req);
+   struct mmc_request *mrq = >brq.mrq;
+   struct request_queue *q = req->q;
+   struct mmc_host *host = mq->card->host;
+   unsigned long flags;
+   bool put_card;
+   int err;
+
+   mmc_cqe_post_req(host, mrq);
+
+   if (mrq->cmd && mrq->cmd->error)
+   err = mrq->cmd->error;
+   else if (mrq->data && mrq->data->error)
+   err = mrq->data->error;
+   else
+   err = 0;
+
+   if (err) {
+   if (mqrq->retries++ < MMC_CQE_RETRIES)
+   blk_mq_requeue_request(req, true);
+   else
+   blk_mq_end_request(req, BLK_STS_IOERR);
+   } else if (mrq->data) {
+   if (blk_update_request(req, BLK_STS_OK, 
mrq->data->bytes_xfered))
+   blk_mq_requeue_request(req, true);
+   else
+   __blk_mq_end_request(req, BLK_STS_OK);
+   } else {
+   blk_mq_end_request(req, BLK_STS_OK);
+   }
+
+   spin_lock_irqsave(q->queue_lock, flags);
+
+   mq->in_flight[mmc_issue_type(mq, req)] -= 1;
+
+   put_card = mmc_tot_in_flight(mq) == 0;
+
+   mmc_cqe_check_busy(mq);
+
+   spin_unlock_irqrestore(q->queue_lock, flags);
+
+   if (!mq->cqe_busy)
+   blk_mq_run_hw_queues(q, true);
+
+   if (put_card)
+   mmc_put_card(mq->card, >ctx);
+}
+
+void mmc_blk_cqe_recovery(struct mmc_queue *mq)
+{
+   struct mmc_card *card = mq->card;
+   struct mmc_host *host = card->host;
+   int err;
+
+   pr_debug("%s: CQE recovery start\n", mmc_hostname(host));
+
+   err = mmc_cqe_recovery(host);
+   if (err)
+   mmc_blk_reset(mq->blkdata, host, MMC_BLK_CQE_RECOVERY);
+   else
+   mmc_blk_reset_success(mq->blkdata, MMC_BLK_CQE_RECOVERY);
+
+   pr_debug("%s: CQE recovery done\n", mmc_hostname(host));
+}
+
+static void mmc_blk_cqe_req_done(struct mmc_request *mrq)
+{
+   struct mmc_queue_req *mqrq = container_of(mrq, struct mmc_queue_req,
+ brq.mrq);
+   struct request *req = mmc_queue_req_to_req(mqrq);
+   struct request_queue *q = req->q;
+   struct mmc_queue *mq = q->queuedata;
+
+   /*
+* Block layer timeouts race with completions which means the normal
+* completion path cannot be used during recovery.
+*/
+   if (mq->in_recovery)
+   mmc_blk_cqe_complete_rq(mq, req);
+   else
+   blk_mq_complete_request(req);
+}
+
+static int mmc_blk_cqe_start_req(struct mmc_host *host, struct mmc_request 
*mrq)
+{
+   mrq->done   = mmc_blk_cqe_req_done;
+   mrq->recovery_notifier  = mmc_cqe_recovery_notifier;
+
+   return mmc_cqe_start_req(host, mrq);
+}
+
+static struct mmc_request *mmc_blk_cqe_prep_dcmd(struct mmc_queue_req *mqrq,
+

[PATCH V13 03/10] mmc: block: Add blk-mq support

2017-11-03 Thread Adrian Hunter
Define and use a blk-mq queue. Discards and flushes are processed
synchronously, but reads and writes asynchronously. In order to support
slow DMA unmapping, DMA unmapping is not done until after the next request
is started. That means the request is not completed until then. If there is
no next request then the completion is done by queued work.

Signed-off-by: Adrian Hunter 
---
 drivers/mmc/core/block.c | 457 ++-
 drivers/mmc/core/block.h |   9 +
 drivers/mmc/core/queue.c | 273 +---
 drivers/mmc/core/queue.h |  32 
 4 files changed, 740 insertions(+), 31 deletions(-)

diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c
index ad72fa19f082..e2838ff4738e 100644
--- a/drivers/mmc/core/block.c
+++ b/drivers/mmc/core/block.c
@@ -1264,7 +1264,10 @@ static void mmc_blk_issue_drv_op(struct mmc_queue *mq, 
struct request *req)
break;
}
mq_rq->drv_op_result = ret;
-   blk_end_request_all(req, ret ? BLK_STS_IOERR : BLK_STS_OK);
+   if (req->mq_ctx)
+   blk_mq_end_request(req, ret ? BLK_STS_IOERR : BLK_STS_OK);
+   else
+   blk_end_request_all(req, ret ? BLK_STS_IOERR : BLK_STS_OK);
 }
 
 static void mmc_blk_issue_discard_rq(struct mmc_queue *mq, struct request *req)
@@ -1307,7 +1310,10 @@ static void mmc_blk_issue_discard_rq(struct mmc_queue 
*mq, struct request *req)
else
mmc_blk_reset_success(md, type);
 fail:
-   blk_end_request(req, status, blk_rq_bytes(req));
+   if (req->mq_ctx)
+   blk_mq_end_request(req, status);
+   else
+   blk_end_request(req, status, blk_rq_bytes(req));
 }
 
 static void mmc_blk_issue_secdiscard_rq(struct mmc_queue *mq,
@@ -1377,7 +1383,10 @@ static void mmc_blk_issue_secdiscard_rq(struct mmc_queue 
*mq,
if (!err)
mmc_blk_reset_success(md, type);
 out:
-   blk_end_request(req, status, blk_rq_bytes(req));
+   if (req->mq_ctx)
+   blk_mq_end_request(req, status);
+   else
+   blk_end_request(req, status, blk_rq_bytes(req));
 }
 
 static void mmc_blk_issue_flush(struct mmc_queue *mq, struct request *req)
@@ -1387,7 +1396,10 @@ static void mmc_blk_issue_flush(struct mmc_queue *mq, 
struct request *req)
int ret = 0;
 
ret = mmc_flush_cache(card);
-   blk_end_request_all(req, ret ? BLK_STS_IOERR : BLK_STS_OK);
+   if (req->mq_ctx)
+   blk_mq_end_request(req, ret ? BLK_STS_IOERR : BLK_STS_OK);
+   else
+   blk_end_request_all(req, ret ? BLK_STS_IOERR : BLK_STS_OK);
 }
 
 /*
@@ -1464,11 +1476,9 @@ static void mmc_blk_eval_resp_error(struct 
mmc_blk_request *brq)
}
 }
 
-static enum mmc_blk_status mmc_blk_err_check(struct mmc_card *card,
-struct mmc_async_req *areq)
+static enum mmc_blk_status __mmc_blk_err_check(struct mmc_card *card,
+  struct mmc_queue_req *mq_mrq)
 {
-   struct mmc_queue_req *mq_mrq = container_of(areq, struct mmc_queue_req,
-   areq);
struct mmc_blk_request *brq = _mrq->brq;
struct request *req = mmc_queue_req_to_req(mq_mrq);
int need_retune = card->host->need_retune;
@@ -1574,6 +1584,15 @@ static enum mmc_blk_status mmc_blk_err_check(struct 
mmc_card *card,
return MMC_BLK_SUCCESS;
 }
 
+static enum mmc_blk_status mmc_blk_err_check(struct mmc_card *card,
+struct mmc_async_req *areq)
+{
+   struct mmc_queue_req *mq_mrq = container_of(areq, struct mmc_queue_req,
+   areq);
+
+   return __mmc_blk_err_check(card, mq_mrq);
+}
+
 static void mmc_blk_data_prep(struct mmc_queue *mq, struct mmc_queue_req *mqrq,
  int disable_multi, bool *do_rel_wr_p,
  bool *do_data_tag_p)
@@ -1766,6 +1785,428 @@ static void mmc_blk_rw_rq_prep(struct mmc_queue_req 
*mqrq,
mqrq->areq.err_check = mmc_blk_err_check;
 }
 
+#define MMC_MAX_RETRIES5
+#define MMC_NO_RETRIES (MMC_MAX_RETRIES + 1)
+
+/* Single sector read during recovery */
+static void mmc_blk_ss_read(struct mmc_queue *mq, struct request *req)
+{
+   struct mmc_queue_req *mqrq = req_to_mmc_queue_req(req);
+   blk_status_t status;
+
+   while (1) {
+   mmc_blk_rw_rq_prep(mqrq, mq->card, 1, mq);
+
+   mmc_wait_for_req(mq->card->host, >brq.mrq);
+
+   /*
+* Not expecting command errors, so just give up in that case.
+* If there are retries remaining, the request will get
+* requeued.
+*/
+   if (mqrq->brq.cmd.error)
+   return;
+
+   if (blk_rq_bytes(req) <= 512)
+  

[PATCH V13 00/10] mmc: Add Command Queue support

2017-11-03 Thread Adrian Hunter
Hi

Here is V13 of the hardware command queue patches without the software
command queue patches, now using blk-mq and now with blk-mq support for
non-CQE I/O.

HW CMDQ offers 25% - 50% better random multi-threaded I/O.  I see a slight
2% drop in sequential read speed but no change to sequential write.

Non-CQE blk-mq showed a 3% decrease in sequential read performance.  This
seemed to be coming from the inferior latency of running work items compared
with a dedicated thread.  Hacking blk-mq workqueue to be unbound reduced the
performance degradation from 3% to 1%.

While we should look at changing blk-mq to give better workqueue performance,
a bigger gain is likely to be made by adding a new host API to enable the
next already-prepared request to be issued directly from within ->done()
callback of the current request.


Changes since V12:
  mmc: block: Add error-handling comments
New patch.
  mmc: block: Add blk-mq support
Use legacy error handling
  mmc: block: Add CQE support
Re-base
  mmc: block: blk-mq: Add support for direct completion
New patch.
  mmc: block: blk-mq: Separate card polling from recovery
New patch.
  mmc: block: blk-mq: Stop using card_busy_detect()
New patch.
  mmc: block: blk-mq: Stop using legacy recovery
New patch.

Changes since V11:
  Split "mmc: block: Add CQE and blk-mq support" into 2 patches

Changes since V10:
  mmc: core: Remove unnecessary host claim
  mmc: core: Introduce host claiming by context
  mmc: core: Add support for handling CQE requests
  mmc: mmc: Enable Command Queuing
  mmc: mmc: Enable CQE's
  mmc: block: Use local variables in mmc_blk_data_prep()
  mmc: block: Prepare CQE data
  mmc: block: Factor out mmc_setup_queue()
  mmc: core: Add parameter use_blk_mq
  mmc: core: Export mmc_start_bkops()
  mmc: core: Export mmc_start_request()
  mmc: core: Export mmc_retune_hold_now() and mmc_retune_release()
Dropped because they have been applied
  mmc: block: Add CQE and blk-mq support
Extend blk-mq support for asynchronous read / writes to all host
controllers including those that require polling. The direct
completion path is still available but depends on a new capability
flag.
Drop blk-mq support for synchronous read / writes.

Venkat Gopalakrishnan (1):
  mmc: cqhci: support for command queue enabled host

Changes since V9:
  mmc: block: Add CQE and blk-mq support
- reinstate mq support for REQ_OP_DRV_IN/OUT that was removed because
it was incorrectly assumed to be handled by the rpmb character device
- don't check for rpmb block device anymore
  mmc: cqhci: support for command queue enabled host
Fix cqhci_set_irqs() as per Haibo Chen

Changes since V8:
Re-based
  mmc: core: Introduce host claiming by context
Slightly simplified as per Ulf
  mmc: core: Export mmc_retune_hold_now() and mmc_retune_release()
New patch.
  mmc: block: Add CQE and blk-mq support
Fix missing ->post_req() on the error path

Changes since V7:
Re-based
  mmc: core: Introduce host claiming by context
Slightly simplified
  mmc: core: Add parameter use_blk_mq
New patch.
  mmc: core: Remove unnecessary host claim
New patch.
  mmc: core: Export mmc_start_bkops()
New patch.
  mmc: core: Export mmc_start_request()
New patch.
  mmc: block: Add CQE and blk-mq support
Add blk-mq support for non_CQE requests

Changes since V6:
  mmc: core: Introduce host claiming by context
New patch.
  mmc: core: Move mmc_start_areq() declaration
Dropped because it has been applied
  mmc: block: Fix block status codes
Dropped because it has been applied
  mmc: host: Add CQE interface
Dropped because it has been applied
  mmc: core: Turn off CQE before sending commands
Dropped because it has been applied
  mmc: block: Factor out mmc_setup_queue()
New patch.
  mmc: block: Add CQE support
Drop legacy support and add blk-mq support

Changes since V5:
Re-based
  mmc: core: Add mmc_retune_hold_now()
Dropped because it has been applied
  mmc: core: Add members to mmc_request and mmc_data for CQE's
Dropped because it has been applied
  mmc: core: Move mmc_start_areq() declaration
New patch at Ulf's request
  mmc: block: Fix block status codes
Another un-related patch
  mmc: host: Add CQE interface
Move recovery_notifier() callback to struct mmc_request
  mmc: core: Add support for handling CQE requests
Roll __mmc_cqe_request_done() into mmc_cqe_request_done()
Move function declarations requested by Ulf
  mmc: core: Remove unused MMC_CAP2_PACKED_CMD
Dropped because it has been applied
  

Re: [PATCH 1/3] nvme: do not check for ns on rw path

2017-11-03 Thread Javier González
> On 3 Nov 2017, at 13.53, Christoph Hellwig  wrote:
> 
>> -if (ns && ns->ms &&
>> +if (ns->ms &&
>>  (!ns->pi_type || ns->ms != sizeof(struct t10_pi_tuple)) &&
>>  !blk_integrity_rq(req) && !blk_rq_is_passthrough(req))
>>  return BLK_STS_NOTSUPP;
> 
> blk_rq_is_passthrough also can't be true here.
> 
> How about:
> 
>   if (ns->ms && !blk_integrity_rq(req) &&
>   (!ns->pi_type || ns->ms != sizeof(struct t10_pi_tuple)))
>   return BLK_STS_NOTSUPP;
> 

Sure.

> Although I have to admit I don't really understand what this check
> is even trying to do.  It basically checks for a namespace that has
> a format with metadata that is not T10 protection information and
> then rejects all I/O to it.  Why are we even creating a block device
> node for such a thing?

Looking at the history (i) the check has changed location and (ii) some
checks have been added through time. So it looks like leftovers from
here and there.

If we end up not needing these checks at all here, you can just fix it
all in the same commit. Just wanted to get rid of sparse/smatch
complains...



signature.asc
Description: Message signed with OpenPGP


Re: [PATCH 2/3] nvme: compare NQN string with right size

2017-11-03 Thread Javier González
> On 3 Nov 2017, at 13.54, Christoph Hellwig  wrote:
> 
> On Fri, Nov 03, 2017 at 11:02:49AM +0100, Javier González wrote:
>> Compare subnqns using NVMF_NQN_SIZE as it is < 256
>> 
>> Signed-off-by: Javier González 
>> ---
>> drivers/nvme/host/core.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>> 
>> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
>> index bd1d5ff911c9..ae8ab0a1ef0d 100644
>> --- a/drivers/nvme/host/core.c
>> +++ b/drivers/nvme/host/core.c
>> @@ -1743,7 +1743,7 @@ static void nvme_init_subnqn(struct nvme_ctrl *ctrl, 
>> struct nvme_id_ctrl *id)
>> 
>>  nqnlen = strnlen(id->subnqn, NVMF_NQN_SIZE);
>>  if (nqnlen > 0 && nqnlen < NVMF_NQN_SIZE) {
>> -strcpy(ctrl->subnqn, id->subnqn);
>> +strncpy(ctrl->subnqn, id->subnqn, NVMF_NQN_SIZE);
>>  return;
>>  }
> 
> This isn't a compare, but a copy.  Except for that it looks ok to me.

True. Can you change the message when picking it up?


signature.asc
Description: Message signed with OpenPGP


Re: [PATCH 3/3] nvme: fix eui_show() print format

2017-11-03 Thread Christoph Hellwig
On Fri, Nov 03, 2017 at 11:02:50AM +0100, Javier González wrote:
> Signed-off-by: Javier González 
> ---
>  drivers/nvme/host/core.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index ae8ab0a1ef0d..f05c81774abf 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -2108,7 +2108,7 @@ static ssize_t eui_show(struct device *dev, struct 
> device_attribute *attr,
>   char *buf)
>  {
>   struct nvme_ns *ns = nvme_get_ns_from_dev(dev);
> - return sprintf(buf, "%8phd\n", ns->eui);
> + return sprintf(buf, "%8phD\n", ns->eui);
>  }
>  static DEVICE_ATTR(eui, S_IRUGO, eui_show, NULL);

This looks correct.  I wonder what the old code printed - does someone
have a device with an EUI-64 at hand to quickly cross check what we
did before?


Re: [PATCH 2/3] nvme: compare NQN string with right size

2017-11-03 Thread Christoph Hellwig
On Fri, Nov 03, 2017 at 11:02:49AM +0100, Javier González wrote:
> Compare subnqns using NVMF_NQN_SIZE as it is < 256
> 
> Signed-off-by: Javier González 
> ---
>  drivers/nvme/host/core.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index bd1d5ff911c9..ae8ab0a1ef0d 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -1743,7 +1743,7 @@ static void nvme_init_subnqn(struct nvme_ctrl *ctrl, 
> struct nvme_id_ctrl *id)
>  
>   nqnlen = strnlen(id->subnqn, NVMF_NQN_SIZE);
>   if (nqnlen > 0 && nqnlen < NVMF_NQN_SIZE) {
> - strcpy(ctrl->subnqn, id->subnqn);
> + strncpy(ctrl->subnqn, id->subnqn, NVMF_NQN_SIZE);
>   return;
>   }

This isn't a compare, but a copy.  Except for that it looks ok to me.


[GIT PULL] nvme updates for Linux 4.15

2017-11-03 Thread Christoph Hellwig
Hi Jens,

below are the currently queue nvme updates for Linux 4.15.  There are
a few more things that could make it for this merge window, but I'd
like to get things into linux-next, especially for the unlikely case
that Linus decided to cut -rc8.

Highlights:
 - support for SGLs in the PCIe driver (Chaitanya Kulkarni)
 - disable I/O schedulers for the admin queue (Israel Rukshin)
 - various Fibre Channel fixes and enhancements (James Smart)
 - various refactoring for better code sharing between transports
   (Sagi Grimberg and me)

as well as lots of little bits from various contributors.

The following changes since commit 9c9883744dda1cc38339a448dd8435140537027e:

  block: move __elv_next_request to blk-core.c (2017-10-03 08:43:04 -0600)

are available in the git repository at:

  git://git.infradead.org/nvme.git nvme-4.15

for you to fetch changes up to a806c6c81e6c0d07c8a8b2643bad4a37a196d681:

  nvme: comment typo fixed in clearing AER (2017-11-03 10:02:20 +0300)


Chaitanya Kulkarni (1):
  nvme-pci: add SGL support

Christoph Hellwig (11):
  nvme: simplify compat_ioctl handling
  nvme: use ida_simple_{get,remove} for the controller instance
  nvme: use kref_get_unless_zero in nvme_find_get_ns
  nvme: simplify nvme_open
  nvme: switch controller refcounting to use struct device
  nvme: get rid of nvme_ctrl_list
  nvme: check for a live controller in nvme_dev_open
  nvme-fc: merge __nvme_fc_schedule_delete_work into __nvme_fc_del_ctrl
  nvme: move controller deletion to common code
  nvme-rdma: remove nvme_rdma_remove_ctrl
  nvme: consolidate common code from ->reset_work

Israel Rukshin (3):
  nvme-rdma: Add BLK_MQ_F_NO_SCHED flag to admin tag set
  nvme-fc: Add BLK_MQ_F_NO_SCHED flag to admin tag set
  nvme-loop: Add BLK_MQ_F_NO_SCHED flag to admin tag set

James Smart (18):
  nvmet: bump NVMET_NR_QUEUES to 128
  nvme-fc: add uevent for auto-connect
  nvme-fc: create fc class and transport device
  nvme-fc: move remote port get/put/free location
  nvme-fc: correct io termination handling
  nvme-fc: correct io timeout behavior
  nvme: add duplicate_connect option
  nvme: add helper to compare options to controller
  nvme-rdma: add support for duplicate_connect option
  nvme-fc: add support for duplicate_connect option
  nvme-fc: remove NVME_FC_MAX_SEGMENTS
  nvme-fc: avoid workqueue flush stalls
  nvme-fc: change ctlr state assignments during reset/reconnect
  nvme-fc: add a dev_loss_tmo field to the remoteport
  nvme-fc: check connectivity before initiating reconnects
  nvme: allow controller RESETTING to RECONNECTING transition
  nvme-fc: add dev_loss_tmo timeout and remoteport resume support
  nvmet: fix fatal_err_work deadlock

Keith Busch (1):
  nvme: Remove unused headers

Marc Olson (1):
  nvme: update timeout module parameter type

Max Gurtovoy (1):
  nvme-rdma: align nvme_rdma_device structure

Minwoo Im (2):
  nvme-pci: fix typos in comments
  nvme: comment typo fixed in clearing AER

Nitzan Carmi (1):
  nvme-rdma: Add debug message when reaches timeout

Randy Dunlap (1):
  nvme: use menu Kconfig interface

Roy Shterman (1):
  nvmet: Change max_nsid in subsystem due to ns_disable if needed

Sagi Grimberg (14):
  nvme-fabrics: request transport module
  block: introduce blk_mq_tagset_iter
  nvme: introduce nvme_reinit_tagset
  block: remove blk_mq_reinit_tagset
  nvme-rdma: pass tagset to directly nvme_rdma_free_tagset
  nvme-rdma: fix wrong logging message
  nvme-rdma: move assignment to declaration
  nvme-rdma: Check that reinit_request got a proper mr
  nvme-rdma: teardown admin/io queues once on error recovery
  nvme-rdma: Don't local invalidate if the queue is not live
  nvme-rdma: change queue flag semantics DELETING -> ALLOCATED
  nvme-rdma: stop controller reset if the controller is deleting
  nvme-rdma: reuse nvme_delete_ctrl when reconnect attempts expire
  nvme: flush reset_work before safely continuing with delete operation

 block/blk-mq-tag.c |  11 +-
 drivers/nvme/Kconfig   |   4 +
 drivers/nvme/host/core.c   | 260 +++-
 drivers/nvme/host/fabrics.c|  12 +-
 drivers/nvme/host/fabrics.h|  14 +
 drivers/nvme/host/fc.c | 656 ++---
 drivers/nvme/host/nvme.h   |  26 +-
 drivers/nvme/host/pci.c| 224 +++---
 drivers/nvme/host/rdma.c   | 225 --
 drivers/nvme/target/core.c |  13 +
 drivers/nvme/target/fc.c   |  16 +-
 drivers/nvme/target/loop.c |  47 +--
 drivers/nvme/target/nvmet.h|   2 +-
 include/linux/blk-mq.h |   4 +-
 include/linux/nvme-fc-driver.h |  15 +-
 15 files changed, 1026 insertions(+), 503 deletions(-)


[PATCH 1/3] nvme: do not check for ns on rw path

2017-11-03 Thread Javier González
On the rw path, the ns is assumed to be set. However, a check is still
done, inherited from the time the code resided at nvme_queue_rq().

Eliminate this check, which also eliminates a smatch complain for not
doing proper NULL checks on ns.

Signed-off-by: Javier González 
---
 drivers/nvme/host/core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 5a14cc7f28ee..bd1d5ff911c9 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -472,7 +472,7 @@ static inline blk_status_t nvme_setup_rw(struct nvme_ns *ns,
 * unless this namespace is formated such that the metadata can be
 * stripped/generated by the controller with PRACT=1.
 */
-   if (ns && ns->ms &&
+   if (ns->ms &&
(!ns->pi_type || ns->ms != sizeof(struct t10_pi_tuple)) &&
!blk_integrity_rq(req) && !blk_rq_is_passthrough(req))
return BLK_STS_NOTSUPP;
-- 
2.7.4



[PATCH 2/3] nvme: compare NQN string with right size

2017-11-03 Thread Javier González
Compare subnqns using NVMF_NQN_SIZE as it is < 256

Signed-off-by: Javier González 
---
 drivers/nvme/host/core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index bd1d5ff911c9..ae8ab0a1ef0d 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -1743,7 +1743,7 @@ static void nvme_init_subnqn(struct nvme_ctrl *ctrl, 
struct nvme_id_ctrl *id)
 
nqnlen = strnlen(id->subnqn, NVMF_NQN_SIZE);
if (nqnlen > 0 && nqnlen < NVMF_NQN_SIZE) {
-   strcpy(ctrl->subnqn, id->subnqn);
+   strncpy(ctrl->subnqn, id->subnqn, NVMF_NQN_SIZE);
return;
}
 
-- 
2.7.4



[PATCH 3/3] nvme: fix eui_show() print format

2017-11-03 Thread Javier González
Signed-off-by: Javier González 
---
 drivers/nvme/host/core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index ae8ab0a1ef0d..f05c81774abf 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -2108,7 +2108,7 @@ static ssize_t eui_show(struct device *dev, struct 
device_attribute *attr,
char *buf)
 {
struct nvme_ns *ns = nvme_get_ns_from_dev(dev);
-   return sprintf(buf, "%8phd\n", ns->eui);
+   return sprintf(buf, "%8phD\n", ns->eui);
 }
 static DEVICE_ATTR(eui, S_IRUGO, eui_show, NULL);
 
-- 
2.7.4



[PATCH 0/3] nvme: small fixes reported by smatch

2017-11-03 Thread Javier González
Fix a number of small things reported by smatch on the nvme driver

Javier González (3):
  nvme: do not check for ns on rw path
  nvme: compare NQN string with right size
  nvme: fix eui_show() print format

 drivers/nvme/host/core.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

-- 
2.7.4