Re: [PATCH 1/5] blk-throttle: Move three assignments for the variable "ret" in tg_set_max()

2017-01-23 Thread Johannes Thumshirn
On Sun, Jan 22, 2017 at 09:31:29AM +0100, SF Markus Elfring wrote:
> From: Markus Elfring 
> Date: Sat, 21 Jan 2017 21:23:06 +0100
> 
> A local variable was set to an error code before a concrete error situation
> was detected. Thus move the corresponding assignments into if branches
> to indicate a software failure there.
> 
> Signed-off-by: Markus Elfring 
> ---
>  block/blk-throttle.c | 23 +--
>  1 file changed, 13 insertions(+), 10 deletions(-)
> 
> diff --git a/block/blk-throttle.c b/block/blk-throttle.c
> index 6f4c96e5f86b..51d112deb02e 100644
> --- a/block/blk-throttle.c
> +++ b/block/blk-throttle.c
> @@ -1327,27 +1327,30 @@ static ssize_t tg_set_max(struct kernfs_open_file *of,
>   break;
>   ctx.body += len;
>  
> - ret = -EINVAL;
>   p = tok;
> 	strsep(&p, "=");
> -	if (!p || (sscanf(p, "%llu", &v) != 1 && strcmp(p, "max")))
> +	if (!p || (sscanf(p, "%llu", &v) != 1 && strcmp(p, "max"))) {
> +		ret = -EINVAL;
> 		goto out_finish;
> +	}

Sorry, I don't like this patch. We know the next error, if we encounter one,
will be EINVAL until we change it. Your patch doesn't introduce a functional
change and doesn't improve readability, so I don't really see a point for it.

Bye,
Johannes

-- 
Johannes Thumshirn  Storage
jthumsh...@suse.de+49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850
--
To unsubscribe from this list: send the line "unsubscribe linux-block" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/5] blk-throttle: Adjust two function calls together with a variable assignment

2017-01-23 Thread Johannes Thumshirn
On Sun, Jan 22, 2017 at 09:33:08AM +0100, SF Markus Elfring wrote:
> From: Markus Elfring 
> Date: Sat, 21 Jan 2017 22:15:33 +0100
> 
> The script "checkpatch.pl" pointed information out like the following.
> 
> ERROR: do not use assignment in if condition
> 
> Thus fix the affected source code places.
> 
> Signed-off-by: Markus Elfring 
> ---

Reviewed-by: Johannes Thumshirn 
if Jens doesn't mind.

-- 
Johannes Thumshirn  Storage
jthumsh...@suse.de+49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850


[PATCH 6/8] lightnvm: cleanup nvm transformation functions

2017-01-23 Thread Matias Bjørling
Going from target-specific ppa addresses to device addresses was done by
first converting target to generic ppa addresses, and then generic to
device addresses. The conversion was either open-coded or used the
built-in nvm_trans_* and nvm_map_* functions. Simplify the interface and
clean up the calls by providing functions that take either a list of
ppas or an nvm_rq, exposed through:

 void nvm_ppa_* - target to/from device with a list of PPAs,
 void nvm_rq_* - target to/from device with an nvm_rq.

Signed-off-by: Matias Bjørling 
---
 drivers/lightnvm/core.c  | 115 ++-
 include/linux/lightnvm.h |  14 +++---
 2 files changed, 40 insertions(+), 89 deletions(-)

diff --git a/drivers/lightnvm/core.c b/drivers/lightnvm/core.c
index a1a7a5a..0842c85 100644
--- a/drivers/lightnvm/core.c
+++ b/drivers/lightnvm/core.c
@@ -50,11 +50,6 @@ struct nvm_area {
sector_t end;   /* end is excluded */
 };
 
-enum {
-   TRANS_TGT_TO_DEV =  0x0,
-   TRANS_DEV_TO_TGT =  0x1,
-};
-
 static struct nvm_target *nvm_find_target(struct nvm_dev *dev, const char 
*name)
 {
struct nvm_target *tgt;
@@ -428,38 +423,46 @@ static void nvm_map_to_tgt(struct nvm_tgt_dev *tgt_dev, 
struct ppa_addr *p)
p->g.lun -= lun_roff;
 }
 
-static void nvm_trans_rq(struct nvm_tgt_dev *tgt_dev, struct nvm_rq *rqd,
-   int flag)
+static void nvm_ppa_tgt_to_dev(struct nvm_tgt_dev *tgt_dev,
+   struct ppa_addr *ppa_list, int nr_ppas)
 {
int i;
 
-   if (rqd->nr_ppas == 1) {
-   if (flag == TRANS_TGT_TO_DEV)
-   nvm_map_to_dev(tgt_dev, &rqd->ppa_addr);
-   else
-   nvm_map_to_tgt(tgt_dev, &rqd->ppa_addr);
-   return;
+   for (i = 0; i < nr_ppas; i++) {
+   nvm_map_to_dev(tgt_dev, &ppa_list[i]);
+   ppa_list[i] = generic_to_dev_addr(tgt_dev, ppa_list[i]);
}
+}
+
+static void nvm_ppa_dev_to_tgt(struct nvm_tgt_dev *tgt_dev,
+   struct ppa_addr *ppa_list, int nr_ppas)
+{
+   int i;
 
-   for (i = 0; i < rqd->nr_ppas; i++) {
-   if (flag == TRANS_TGT_TO_DEV)
-   nvm_map_to_dev(tgt_dev, &rqd->ppa_list[i]);
-   else
-   nvm_map_to_tgt(tgt_dev, &rqd->ppa_list[i]);
+   for (i = 0; i < nr_ppas; i++) {
+   ppa_list[i] = dev_to_generic_addr(tgt_dev, ppa_list[i]);
+   nvm_map_to_tgt(tgt_dev, &ppa_list[i]);
}
 }
 
-static struct ppa_addr nvm_trans_ppa(struct nvm_tgt_dev *tgt_dev,
-struct ppa_addr p, int dir)
+static void nvm_rq_tgt_to_dev(struct nvm_tgt_dev *tgt_dev, struct nvm_rq *rqd)
 {
-   struct ppa_addr ppa = p;
+   if (rqd->nr_ppas == 1) {
+   nvm_ppa_tgt_to_dev(tgt_dev, &rqd->ppa_addr, 1);
+   return;
+   }
+
+   nvm_ppa_tgt_to_dev(tgt_dev, rqd->ppa_list, rqd->nr_ppas);
+}
 
-   if (dir == TRANS_TGT_TO_DEV)
-   nvm_map_to_dev(tgt_dev, &ppa);
-   else
-   nvm_map_to_tgt(tgt_dev, &ppa);
+static void nvm_rq_dev_to_tgt(struct nvm_tgt_dev *tgt_dev, struct nvm_rq *rqd)
+{
+   if (rqd->nr_ppas == 1) {
+   nvm_ppa_dev_to_tgt(tgt_dev, &rqd->ppa_addr, 1);
+   return;
+   }
 
-   return ppa;
+   nvm_ppa_dev_to_tgt(tgt_dev, rqd->ppa_list, rqd->nr_ppas);
 }
 
 void nvm_part_to_tgt(struct nvm_dev *dev, sector_t *entries,
@@ -564,26 +567,6 @@ static struct nvm_dev *nvm_find_nvm_dev(const char *name)
return NULL;
 }
 
-static void nvm_tgt_generic_to_addr_mode(struct nvm_tgt_dev *tgt_dev,
-struct nvm_rq *rqd)
-{
-   struct nvm_dev *dev = tgt_dev->parent;
-   int i;
-
-   if (rqd->nr_ppas > 1) {
-   for (i = 0; i < rqd->nr_ppas; i++) {
-   rqd->ppa_list[i] = nvm_trans_ppa(tgt_dev,
-   rqd->ppa_list[i], TRANS_TGT_TO_DEV);
-   rqd->ppa_list[i] = generic_to_dev_addr(dev,
-   rqd->ppa_list[i]);
-   }
-   } else {
-   rqd->ppa_addr = nvm_trans_ppa(tgt_dev, rqd->ppa_addr,
-   TRANS_TGT_TO_DEV);
-   rqd->ppa_addr = generic_to_dev_addr(dev, rqd->ppa_addr);
-   }
-}
-
 int nvm_set_tgt_bb_tbl(struct nvm_tgt_dev *tgt_dev, struct ppa_addr *ppas,
   int nr_ppas, int type)
 {
@@ -599,7 +582,7 @@ int nvm_set_tgt_bb_tbl(struct nvm_tgt_dev *tgt_dev, struct 
ppa_addr *ppas,
memset(&rqd, 0, sizeof(struct nvm_rq));
 
nvm_set_rqd_ppalist(dev, &rqd, ppas, nr_ppas, 1);
-   nvm_tgt_generic_to_addr_mode(tgt_dev, &rqd);
+   nvm_rq_tgt_to_dev(tgt_dev, &rqd);

ret = dev->ops->set_bb_tbl(dev, &rqd.ppa_addr, rqd.nr_ppas, type);
nvm_free_rqd_ppalist(dev, &rqd);
@@ -627,8 

[PATCH 5/8] lightnvm: make nvm_map_* return void

2017-01-23 Thread Matias Bjørling
The only check that was done was a debugging check. Remove it and
change the return type to void to reduce error checking.

Signed-off-by: Matias Bjørling 
---
 drivers/lightnvm/core.c | 41 +
 1 file changed, 9 insertions(+), 32 deletions(-)

diff --git a/drivers/lightnvm/core.c b/drivers/lightnvm/core.c
index 18d4873..a1a7a5a 100644
--- a/drivers/lightnvm/core.c
+++ b/drivers/lightnvm/core.c
@@ -407,31 +407,17 @@ static int nvm_register_map(struct nvm_dev *dev)
return -ENOMEM;
 }
 
-static int nvm_map_to_dev(struct nvm_tgt_dev *tgt_dev, struct ppa_addr *p)
+static void nvm_map_to_dev(struct nvm_tgt_dev *tgt_dev, struct ppa_addr *p)
 {
struct nvm_dev_map *dev_map = tgt_dev->map;
struct nvm_ch_map *ch_map = &dev_map->chnls[p->g.ch];
int lun_off = ch_map->lun_offs[p->g.lun];
-   struct nvm_dev *dev = tgt_dev->parent;
-   struct nvm_dev_map *dev_rmap = dev->rmap;
-   struct nvm_ch_map *ch_rmap;
-   int lun_roff;
 
p->g.ch += ch_map->ch_off;
p->g.lun += lun_off;
-
-   ch_rmap = &dev_rmap->chnls[p->g.ch];
-   lun_roff = ch_rmap->lun_offs[p->g.lun];
-
-   if (unlikely(ch_rmap->ch_off < 0 || lun_roff < 0)) {
-   pr_err("nvm: corrupted device partition table\n");
-   return -EINVAL;
-   }
-
-   return 0;
 }
 
-static int nvm_map_to_tgt(struct nvm_tgt_dev *tgt_dev, struct ppa_addr *p)
+static void nvm_map_to_tgt(struct nvm_tgt_dev *tgt_dev, struct ppa_addr *p)
 {
struct nvm_dev *dev = tgt_dev->parent;
struct nvm_dev_map *dev_rmap = dev->rmap;
@@ -440,34 +426,27 @@ static int nvm_map_to_tgt(struct nvm_tgt_dev *tgt_dev, 
struct ppa_addr *p)
 
p->g.ch -= ch_rmap->ch_off;
p->g.lun -= lun_roff;
-
-   return 0;
 }
 
-static int nvm_trans_rq(struct nvm_tgt_dev *tgt_dev, struct nvm_rq *rqd,
+static void nvm_trans_rq(struct nvm_tgt_dev *tgt_dev, struct nvm_rq *rqd,
int flag)
 {
int i;
-   int ret;
 
if (rqd->nr_ppas == 1) {
if (flag == TRANS_TGT_TO_DEV)
-   return nvm_map_to_dev(tgt_dev, &rqd->ppa_addr);
+   nvm_map_to_dev(tgt_dev, &rqd->ppa_addr);
else
-   return nvm_map_to_tgt(tgt_dev, &rqd->ppa_addr);
+   nvm_map_to_tgt(tgt_dev, &rqd->ppa_addr);
+   return;
}
 
for (i = 0; i < rqd->nr_ppas; i++) {
if (flag == TRANS_TGT_TO_DEV)
-   ret = nvm_map_to_dev(tgt_dev, &rqd->ppa_list[i]);
+   nvm_map_to_dev(tgt_dev, &rqd->ppa_list[i]);
else
-   ret = nvm_map_to_tgt(tgt_dev, &rqd->ppa_list[i]);
-
-   if (ret)
-   break;
+   nvm_map_to_tgt(tgt_dev, &rqd->ppa_list[i]);
}
-
-   return ret;
 }
 
 static struct ppa_addr nvm_trans_ppa(struct nvm_tgt_dev *tgt_dev,
@@ -665,9 +644,7 @@ int nvm_erase_blk(struct nvm_tgt_dev *tgt_dev, struct 
ppa_addr *ppas, int flags)
if (!dev->ops->erase_block)
return 0;
 
-   ret = nvm_map_to_dev(tgt_dev, ppas);
-   if (ret)
-   return ret;
+   nvm_map_to_dev(tgt_dev, ppas);
 
memset(&rqd, 0, sizeof(struct nvm_rq));
 
-- 
2.7.4



[PATCH 0/8] lightnvm: cleanups and support for vector ioctls

2017-01-23 Thread Matias Bjørling
A series of patches. The first seven are cleanup patches that clean up
the codebase for when pblk is added to the kernel, and prepare the code
so that a device can be used as a normal block device
(as long as the access rules are respected).

The last patch adds ioctls for vectored I/Os, which enables liblightnvm
to integrate without any kernel modifications.

-Matias

Matias Bjørling (8):
  lightnvm: merge gennvm with core
  lightnvm: collapse nvm_erase_ppa and nvm_erase_blk
  lightnvm: remove nvm_submit_ppa* functions
  lightnvm: remove nvm_get_bb_tbl and nvm_set_bb_tbl
  lightnvm: make nvm_map_* return void
  lightnvm: cleanup nvm transformation functions
  lightnvm: reduce number of nvm_id groups to one
  lightnvm: add ioctls for vector I/Os

 drivers/block/null_blk.c  |3 +-
 drivers/lightnvm/Kconfig  |9 -
 drivers/lightnvm/Makefile |3 +-
 drivers/lightnvm/core.c   | 1016 -
 drivers/lightnvm/gennvm.c |  657 --
 drivers/lightnvm/gennvm.h |   62 ---
 drivers/lightnvm/sysblk.c |  733 -
 drivers/nvme/host/core.c  |4 +
 drivers/nvme/host/lightnvm.c  |  314 +++--
 drivers/nvme/host/nvme.h  |6 +
 include/linux/lightnvm.h  |  121 +
 include/uapi/linux/lightnvm.h |   50 ++
 12 files changed, 940 insertions(+), 2038 deletions(-)
 delete mode 100644 drivers/lightnvm/gennvm.c
 delete mode 100644 drivers/lightnvm/gennvm.h
 delete mode 100644 drivers/lightnvm/sysblk.c

-- 
2.7.4



[PATCH 3/8] lightnvm: remove nvm_submit_ppa* functions

2017-01-23 Thread Matias Bjørling
The nvm_submit_ppa* functions are no longer needed after gennvm and core
have been merged.

Signed-off-by: Matias Bjørling 
---
 drivers/lightnvm/core.c  | 109 ---
 include/linux/lightnvm.h |   4 --
 2 files changed, 113 deletions(-)

diff --git a/drivers/lightnvm/core.c b/drivers/lightnvm/core.c
index a4e2e3b..ffdf048 100644
--- a/drivers/lightnvm/core.c
+++ b/drivers/lightnvm/core.c
@@ -883,115 +883,6 @@ void nvm_end_io(struct nvm_rq *rqd, int error)
 }
 EXPORT_SYMBOL(nvm_end_io);
 
-static void nvm_end_io_sync(struct nvm_rq *rqd)
-{
-   struct completion *waiting = rqd->wait;
-
-   rqd->wait = NULL;
-
-   complete(waiting);
-}
-
-static int __nvm_submit_ppa(struct nvm_dev *dev, struct nvm_rq *rqd, int 
opcode,
-   int flags, void *buf, int len)
-{
-   DECLARE_COMPLETION_ONSTACK(wait);
-   struct bio *bio;
-   int ret;
-   unsigned long hang_check;
-
-   bio = bio_map_kern(dev->q, buf, len, GFP_KERNEL);
-   if (IS_ERR_OR_NULL(bio))
-   return -ENOMEM;
-
-   nvm_generic_to_addr_mode(dev, rqd);
-
-   rqd->dev = NULL;
-   rqd->opcode = opcode;
-   rqd->flags = flags;
-   rqd->bio = bio;
-   rqd->wait = &wait;
-   rqd->end_io = nvm_end_io_sync;
-
-   ret = dev->ops->submit_io(dev, rqd);
-   if (ret) {
-   bio_put(bio);
-   return ret;
-   }
-
-   /* Prevent hang_check timer from firing at us during very long I/O */
-   hang_check = sysctl_hung_task_timeout_secs;
-   if (hang_check)
-   while (!wait_for_completion_io_timeout(&wait,
-   hang_check * (HZ/2)))
-   ;
-   else
-   wait_for_completion_io(&wait);
-
-   return rqd->error;
-}
-
-/**
- * nvm_submit_ppa_list - submit user-defined ppa list to device. The user must
- *  take to free ppa list if necessary.
- * @dev:   device
- * @ppa_list:  user created ppa_list
- * @nr_ppas:   length of ppa_list
- * @opcode:device opcode
- * @flags: device flags
- * @buf:   data buffer
- * @len:   data buffer length
- */
-int nvm_submit_ppa_list(struct nvm_dev *dev, struct ppa_addr *ppa_list,
-   int nr_ppas, int opcode, int flags, void *buf, int len)
-{
-   struct nvm_rq rqd;
-
-   if (dev->ops->max_phys_sect < nr_ppas)
-   return -EINVAL;
-
-   memset(&rqd, 0, sizeof(struct nvm_rq));
-
-   rqd.nr_ppas = nr_ppas;
-   if (nr_ppas > 1)
-   rqd.ppa_list = ppa_list;
-   else
-   rqd.ppa_addr = ppa_list[0];
-
-   return __nvm_submit_ppa(dev, &rqd, opcode, flags, buf, len);
-}
-EXPORT_SYMBOL(nvm_submit_ppa_list);
-
-/**
- * nvm_submit_ppa - submit PPAs to device. PPAs will automatically be unfolded
- * as single, dual, quad plane PPAs depending on device type.
- * @dev:   device
- * @ppa:   user created ppa_list
- * @nr_ppas:   length of ppa_list
- * @opcode:device opcode
- * @flags: device flags
- * @buf:   data buffer
- * @len:   data buffer length
- */
-int nvm_submit_ppa(struct nvm_dev *dev, struct ppa_addr *ppa, int nr_ppas,
-   int opcode, int flags, void *buf, int len)
-{
-   struct nvm_rq rqd;
-   int ret;
-
-   memset(&rqd, 0, sizeof(struct nvm_rq));
-   ret = nvm_set_rqd_ppalist(dev, &rqd, ppa, nr_ppas, 1);
-   if (ret)
-   return ret;
-
-   ret = __nvm_submit_ppa(dev, &rqd, opcode, flags, buf, len);
-
-   nvm_free_rqd_ppalist(dev, &rqd);
-
-   return ret;
-}
-EXPORT_SYMBOL(nvm_submit_ppa);
-
 /*
  * folds a bad block list from its plane representation to its virtual
  * block representation. The fold is done in place and reduced size is
diff --git a/include/linux/lightnvm.h b/include/linux/lightnvm.h
index f2007b2..abb3d55 100644
--- a/include/linux/lightnvm.h
+++ b/include/linux/lightnvm.h
@@ -489,10 +489,6 @@ extern int nvm_get_l2p_tbl(struct nvm_tgt_dev *, u64, u32, 
nvm_l2p_update_fn *,
 extern int nvm_get_area(struct nvm_tgt_dev *, sector_t *, sector_t);
 extern void nvm_put_area(struct nvm_tgt_dev *, sector_t);
 extern void nvm_end_io(struct nvm_rq *, int);
-extern int nvm_submit_ppa(struct nvm_dev *, struct ppa_addr *, int, int, int,
-   void *, int);
-extern int nvm_submit_ppa_list(struct nvm_dev *, struct ppa_addr *, int, int,
-   int, void *, int);
 extern int nvm_bb_tbl_fold(struct nvm_dev *, u8 *, int);
 extern int nvm_get_bb_tbl(struct nvm_dev *, struct ppa_addr, u8 *);
 extern int nvm_get_tgt_bb_tbl(struct nvm_tgt_dev *, struct ppa_addr, u8 *);
-- 
2.7.4


[PATCH 4/8] lightnvm: remove nvm_get_bb_tbl and nvm_set_bb_tbl

2017-01-23 Thread Matias Bjørling
Since the merge of gennvm and core, there is no longer a need for the
device specific bad block functions.

Signed-off-by: Matias Bjørling 
---
 drivers/lightnvm/core.c  | 40 
 include/linux/lightnvm.h |  2 --
 2 files changed, 4 insertions(+), 38 deletions(-)

diff --git a/drivers/lightnvm/core.c b/drivers/lightnvm/core.c
index ffdf048..18d4873 100644
--- a/drivers/lightnvm/core.c
+++ b/drivers/lightnvm/core.c
@@ -605,33 +605,6 @@ static void nvm_tgt_generic_to_addr_mode(struct 
nvm_tgt_dev *tgt_dev,
}
 }
 
-int nvm_set_bb_tbl(struct nvm_dev *dev, struct ppa_addr *ppas, int nr_ppas,
-   int type)
-{
-   struct nvm_rq rqd;
-   int ret;
-
-   if (nr_ppas > dev->ops->max_phys_sect) {
-   pr_err("nvm: unable to update all sysblocks atomically\n");
-   return -EINVAL;
-   }
-
-   memset(&rqd, 0, sizeof(struct nvm_rq));
-
-   nvm_set_rqd_ppalist(dev, &rqd, ppas, nr_ppas, 1);
-   nvm_generic_to_addr_mode(dev, &rqd);
-
-   ret = dev->ops->set_bb_tbl(dev, &rqd.ppa_addr, rqd.nr_ppas, type);
-   nvm_free_rqd_ppalist(dev, &rqd);
-   if (ret) {
-   pr_err("nvm: sysblk failed bb mark\n");
-   return -EINVAL;
-   }
-
-   return 0;
-}
-EXPORT_SYMBOL(nvm_set_bb_tbl);
-
 int nvm_set_tgt_bb_tbl(struct nvm_tgt_dev *tgt_dev, struct ppa_addr *ppas,
   int nr_ppas, int type)
 {
@@ -919,20 +892,15 @@ int nvm_bb_tbl_fold(struct nvm_dev *dev, u8 *blks, int 
nr_blks)
 }
 EXPORT_SYMBOL(nvm_bb_tbl_fold);
 
-int nvm_get_bb_tbl(struct nvm_dev *dev, struct ppa_addr ppa, u8 *blks)
-{
-   ppa = generic_to_dev_addr(dev, ppa);
-
-   return dev->ops->get_bb_tbl(dev, ppa, blks);
-}
-EXPORT_SYMBOL(nvm_get_bb_tbl);
-
 int nvm_get_tgt_bb_tbl(struct nvm_tgt_dev *tgt_dev, struct ppa_addr ppa,
   u8 *blks)
 {
+   struct nvm_dev *dev = tgt_dev->parent;
+
ppa = nvm_trans_ppa(tgt_dev, ppa, TRANS_TGT_TO_DEV);
+   ppa = generic_to_dev_addr(dev, ppa);
 
-   return nvm_get_bb_tbl(tgt_dev->parent, ppa, blks);
+   return dev->ops->get_bb_tbl(dev, ppa, blks);
 }
 EXPORT_SYMBOL(nvm_get_tgt_bb_tbl);
 
diff --git a/include/linux/lightnvm.h b/include/linux/lightnvm.h
index abb3d55..cad1e1c 100644
--- a/include/linux/lightnvm.h
+++ b/include/linux/lightnvm.h
@@ -473,7 +473,6 @@ extern struct nvm_dev *nvm_alloc_dev(int);
 extern int nvm_register(struct nvm_dev *);
 extern void nvm_unregister(struct nvm_dev *);
 
-extern int nvm_set_bb_tbl(struct nvm_dev *, struct ppa_addr *, int, int);
 extern int nvm_set_tgt_bb_tbl(struct nvm_tgt_dev *, struct ppa_addr *,
  int, int);
 extern int nvm_max_phys_sects(struct nvm_tgt_dev *);
@@ -490,7 +489,6 @@ extern int nvm_get_area(struct nvm_tgt_dev *, sector_t *, 
sector_t);
 extern void nvm_put_area(struct nvm_tgt_dev *, sector_t);
 extern void nvm_end_io(struct nvm_rq *, int);
 extern int nvm_bb_tbl_fold(struct nvm_dev *, u8 *, int);
-extern int nvm_get_bb_tbl(struct nvm_dev *, struct ppa_addr, u8 *);
 extern int nvm_get_tgt_bb_tbl(struct nvm_tgt_dev *, struct ppa_addr, u8 *);
 
 extern int nvm_dev_factory(struct nvm_dev *, int flags);
-- 
2.7.4



[PATCH 8/8] lightnvm: add ioctls for vector I/Os

2017-01-23 Thread Matias Bjørling
Enable user-space to issue vector I/O commands through ioctls. To issue
a vector I/O, the ppa list with addresses is also required and must be
mapped for the controller to access.

For each ioctl, the result and status bits are returned as well, such
that user-space can retrieve the open-channel SSD completion bits.

The implementation covers the traditional use-cases of bad block
management, and vectored read/write/erase.

Signed-off-by: Matias Bjørling 
Metadata implementation, test, and fixes.
Signed-off-by: Simon A.F. Lund 
Signed-off-by: Matias Bjørling 

Signed-off-by: Matias Bjørling 
---
 drivers/nvme/host/core.c  |   4 +
 drivers/nvme/host/lightnvm.c  | 220 ++
 drivers/nvme/host/nvme.h  |   6 ++
 include/uapi/linux/lightnvm.h |  50 ++
 4 files changed, 280 insertions(+)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 2fc86dc..037ee99 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -784,6 +784,10 @@ static int nvme_ioctl(struct block_device *bdev, fmode_t 
mode,
return nvme_sg_io(ns, (void __user *)arg);
 #endif
default:
+#ifdef CONFIG_NVM
+   if (ns->ndev)
+   return nvme_nvm_ioctl(ns, cmd, arg);
+#endif
return -ENOTTY;
}
 }
diff --git a/drivers/nvme/host/lightnvm.c b/drivers/nvme/host/lightnvm.c
index 733992a..dafdee5 100644
--- a/drivers/nvme/host/lightnvm.c
+++ b/drivers/nvme/host/lightnvm.c
@@ -26,6 +26,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 
 enum nvme_nvm_admin_opcode {
nvme_nvm_admin_identity = 0xe2,
@@ -583,6 +585,224 @@ static struct nvm_dev_ops nvme_nvm_dev_ops = {
.max_phys_sect  = 64,
 };
 
+static void nvme_nvm_end_user_vio(struct request *rq, int error)
+{
+   struct completion *waiting = rq->end_io_data;
+
+   complete(waiting);
+}
+
+static int nvme_nvm_submit_user_cmd(struct request_queue *q,
+   struct nvme_ns *ns,
+   struct nvme_nvm_command *vcmd,
+   void __user *ubuf, unsigned int bufflen,
+   void __user *meta_buf, unsigned int meta_len,
+   void __user *ppa_buf, unsigned int ppa_len,
+   u32 *result, u64 *status, unsigned int timeout)
+{
+   bool write = nvme_is_write((struct nvme_command *)vcmd);
+   struct nvm_dev *dev = ns->ndev;
+   struct gendisk *disk = ns->disk;
+   struct request *rq;
+   struct bio *bio = NULL;
+   __le64 *ppa_list = NULL;
+   dma_addr_t ppa_dma;
+   __le64 *metadata = NULL;
+   dma_addr_t metadata_dma;
+   DECLARE_COMPLETION_ONSTACK(wait);
+   int ret;
+
+   rq = nvme_alloc_request(q, (struct nvme_command *)vcmd, 0,
+   NVME_QID_ANY);
+   if (IS_ERR(rq)) {
+   ret = -ENOMEM;
+   goto err_cmd;
+   }
+
+   rq->timeout = timeout ? timeout : ADMIN_TIMEOUT;
+
+   rq->cmd_flags &= ~REQ_FAILFAST_DRIVER;
+   rq->end_io_data = &wait;
+
+   if (ppa_buf && ppa_len) {
+   ppa_list = dma_pool_alloc(dev->dma_pool, GFP_KERNEL, &ppa_dma);
+   if (!ppa_list) {
+   ret = -ENOMEM;
+   goto err_rq;
+   }
+   if (copy_from_user(ppa_list, (void __user *)ppa_buf,
+   sizeof(u64) * (ppa_len + 1))) {
+   ret = -EFAULT;
+   goto err_ppa;
+   }
+   vcmd->ph_rw.spba = cpu_to_le64(ppa_dma);
+   } else {
+   vcmd->ph_rw.spba = cpu_to_le64((uintptr_t)ppa_buf);
+   }
+
+   if (ubuf && bufflen) {
+   ret = blk_rq_map_user(q, rq, NULL, ubuf, bufflen, GFP_KERNEL);
+   if (ret)
+   goto err_ppa;
+   bio = rq->bio;
+
+   if (meta_buf && meta_len) {
+   metadata = dma_pool_alloc(dev->dma_pool, GFP_KERNEL,
+   &metadata_dma);
+   if (!metadata) {
+   ret = -ENOMEM;
+   goto err_map;
+   }
+
+   if (write) {
+   if (copy_from_user(metadata,
+   (void __user *)meta_buf,
+   meta_len)) {
+   ret = -EFAULT;
+   goto err_meta;
+   }
+   }
+   vcmd->ph_rw.metadata = cpu_to_le64(metadata_dma);
+   }
+
+   if (!disk)
+   goto submit;
+
+ 

Re: [PATCH 1/5] blk-throttle: Move three assignments for the variable "ret" in tg_set_max()

2017-01-23 Thread Johannes Thumshirn
On Mon, Jan 23, 2017 at 11:00:15AM +0100, SF Markus Elfring wrote:
> >> @@ -1327,27 +1327,30 @@ static ssize_t tg_set_max(struct kernfs_open_file 
> >> *of,
> >>break;
> >>ctx.body += len;
> >>  
> >> -  ret = -EINVAL;
> >>p = tok;
> >>	strsep(&p, "=");
> >> -	if (!p || (sscanf(p, "%llu", &v) != 1 && strcmp(p, "max")))
> >> +	if (!p || (sscanf(p, "%llu", &v) != 1 && strcmp(p, "max"))) {
> >> +  ret = -EINVAL;
> >>goto out_finish;
> >> +  }
> > 
> > Sorry, I don't like this patch. We know the next error if we encounter one
> > will be EINVAL until we change it.
> 
> Thanks for your constructive feedback.
> 
> 
> > Your patch doesn't introduce a functual change and doesn't improve 
> > readability,
> > so I don't really see a point for it.
> 
> We have got different preferences for the placement of error code settings.
Yes we do, so what's the point? Both are OK. Please don't go down that road; it
opens up so much potential for needless bikeshedding and wastes all of our
(including your) time.

Thanks,
Johannes

-- 
Johannes Thumshirn  Storage
jthumsh...@suse.de+49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850


Urgent Please,,

2017-01-23 Thread Joyes Dadi
Good Day Dear,

My name is Ms. Joyes Dadi, I am glad you are reading this letter and I hope
we will start our communication and I know that this message will look strange,
surprising and probably unbelievable to you, but it is the reality. I want to
make a donation of money to you.

I contact you by the will of God. I am a firm German woman specialized in
mining gold and diamonds in Africa. But now, I'm very sick of a cancer. My
husband died in an accident two years ago with our two children and now I have
cancer of the esophagus that damaged almost all the cells in my system/agencies
and I'll die soon according to my doctor.

My main concern now is that we grew up in an orphanage and were married in
the orphanage. If I die, this deposited fund will be left alone in the hands of
the bank, and I do not want it that way. Please, if you can be reliable and
sincere enough to accept my humble proposal: I have (10.5 million Euro) in a fixed
deposit account; I will order the bank to transfer the money into your account
in your country immediately, and then you will take the fund to your country
and invest it in the orphanage homes. Please, answer as quickly as possible.

God bless you.
Ms. Joyes Dadi
Email: joyesdadi...@citromail.hu


Re: [PATCH] block: Initialize cfqq->ioprio_class in cfq_get_queue()

2017-01-23 Thread Alexander Potapenko
On Mon, Jan 23, 2017 at 4:30 PM, Jens Axboe  wrote:
> On 01/23/2017 07:06 AM, Alexander Potapenko wrote:
>> KMSAN (KernelMemorySanitizer, a new error detection tool) reports use of
>> uninitialized memory in cfq_init_cfqq():
>>
>> ==
>> BUG: KMSAN: use of unitialized memory
>> ...
>> Call Trace:
>>  [< inline >] __dump_stack lib/dump_stack.c:15
>>  [] dump_stack+0x157/0x1d0 lib/dump_stack.c:51
>>  [] kmsan_report+0x205/0x360 ??:?
>>  [] __msan_warning+0x5b/0xb0 ??:?
>>  [< inline >] cfq_init_cfqq block/cfq-iosched.c:3754
>>  [] cfq_get_queue+0xc80/0x14d0 block/cfq-iosched.c:3857
>> ...
>> origin:
>>  [] save_stack_trace+0x27/0x50 
>> arch/x86/kernel/stacktrace.c:67
>>  [] kmsan_internal_poison_shadow+0xab/0x150 ??:?
>>  [] kmsan_poison_slab+0xbb/0x120 ??:?
>>  [< inline >] allocate_slab mm/slub.c:1627
>>  [] new_slab+0x3af/0x4b0 mm/slub.c:1641
>>  [< inline >] new_slab_objects mm/slub.c:2407
>>  [] ___slab_alloc+0x323/0x4a0 mm/slub.c:2564
>>  [< inline >] __slab_alloc mm/slub.c:2606
>>  [< inline >] slab_alloc_node mm/slub.c:2669
>>  [] kmem_cache_alloc_node+0x1d2/0x1f0 mm/slub.c:2746
>>  [] cfq_get_queue+0x47d/0x14d0 block/cfq-iosched.c:3850
>> ...
>> ==
>> (the line numbers are relative to 4.8-rc6, but the bug persists
>> upstream)
>>
>> The uninitialized struct cfq_queue is created by kmem_cache_alloc_node()
>> and then passed to cfq_init_cfqq(), which accesses cfqq->ioprio_class
>> before it's initialized.
>
> Patch looks fine to me, thanks. Is this a new warning? We don't seem
> to have changed this path in a while, yet I wonder why this is only
> surfacing now.

This is because KMSAN is a new tool; it's not in the trunk yet, so nobody
could see this warning before.
I don't know whether kmemcheck detects this bug (or whether anyone is actively
using it for testing, either).
> --
> Jens Axboe
>



-- 
Alexander Potapenko
Software Engineer

Google Germany GmbH
Erika-Mann-Straße, 33
80636 München

Geschäftsführer: Matthew Scott Sucherman, Paul Terence Manicle
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg


BUG: KASAN: Use-after-free

2017-01-23 Thread Matias Bjørling
Hi,

I could use some help verifying a use-after-free bug that I am seeing
after the new direct I/O work went in.

When issuing a direct write io using libaio, a bio is referenced in the
blkdev_direct_IO path, and then put again in the blkdev_bio_end_io path.
However, there is a case where the bio is put twice, which leads to
modules that rely on the bio after blkdev_bio_end_io() accessing it
prematurely.

The KASAN error report:

[   14.645916]
==
[   14.648027] BUG: KASAN: use-after-free in
blkdev_direct_IO+0x50c/0x600 at addr 8801ef30ea14
[   14.648558] Read of size 1 by task fio/375
[   14.648779] CPU: 0 PID: 375 Comm: fio Not tainted 4.10.0-rc5 #4845
[   14.649107] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
[   14.649792] Call Trace:
[   14.649930]  dump_stack+0x63/0x8b
[   14.650112]  kasan_object_err+0x21/0x80
[   14.650323]  kasan_report_error+0x1fa/0x520
[   14.650551]  kasan_report+0x58/0x60
[   14.650741]  ? blkdev_direct_IO+0x50c/0x600
[   14.650969]  __asan_load1+0x47/0x50
[   14.651159]  blkdev_direct_IO+0x50c/0x600
[   14.651377]  ? filemap_check_errors+0x38/0x70
[   14.651613]  generic_file_direct_write+0x11b/0x220
[   14.651871]  __generic_file_write_iter+0x12b/0x2d0
[   14.652129]  blkdev_write_iter+0xd8/0x180
[   14.652348]  ? security_file_permission+0x81/0x100
[   14.652607]  aio_write+0x15f/0x1c0
[   14.652795]  ? kasan_unpoison_shadow+0x14/0x40
[   14.653036]  ? kasan_kmalloc+0x93/0xc0
[   14.653239]  ? __fget+0xae/0x110
[   14.653417]  do_io_submit+0x672/0x830
[   14.653617]  SyS_io_submit+0x10/0x20
[   14.653813]  entry_SYSCALL_64_fastpath+0x1e/0xad
[   14.654061] RIP: 0033:0x7f40d6d14667
[   14.654255] RSP: 002b:7fff5a8cc858 EFLAGS: 0202 ORIG_RAX: 00d1
[   14.654661] RAX: ffda RBX: 00df RCX: 7f40d6d14667
[   14.655041] RDX: 017a7f00 RSI: 0002 RDI: 7f40d71a6000
[   14.655421] RBP: 7f40cffea000 R08: fd33e1f0 R09: 01f0
[   14.655801] R10: 0446e000 R11: 0202 R12: 0001
[   14.656180] R13: 0001 R14: 7f40cffea008 R15: 0001
[   14.656556] Object at 8801ef30ea00, in cache bio-1 size: 224
[   14.656871] Allocated:
[   14.657075] PID = 375
[   14.657205] [] save_stack_trace+0x1b/0x20
[   14.657593] [] save_stack+0x46/0xd0
[   14.657955] [] kasan_kmalloc+0x93/0xc0
[   14.658353] [] kasan_slab_alloc+0x12/0x20
[   14.658741] [] kmem_cache_alloc+0xb9/0x1c0
[   14.659137] [] mempool_alloc_slab+0x15/0x20
[   14.659538] [] mempool_alloc+0x96/0x1a0
[   14.659921] [] bio_alloc_bioset+0xf9/0x330
[   14.660317] [] blkdev_direct_IO+0x187/0x600
[   14.660719] [] generic_file_direct_write+0x11b/0x220
[   14.661154] [] __generic_file_write_iter+0x12b/0x2d0
[   14.661590] [] blkdev_write_iter+0xd8/0x180
[   14.661987] [] aio_write+0x15f/0x1c0
[   14.662357] [] do_io_submit+0x672/0x830
[   14.662738] [] SyS_io_submit+0x10/0x20
[   14.663115] [] entry_SYSCALL_64_fastpath+0x1e/0xad
[   14.663545] Freed:
[   14.663658] PID = 375
[   14.663786] [] save_stack_trace+0x1b/0x20
[   14.664177] [] save_stack+0x46/0xd0
[   14.664542] [] kasan_slab_free+0x7d/0xc0
[   14.664927] [] kmem_cache_free+0x73/0x200
[   14.665319] [] mempool_free_slab+0x17/0x20
[   14.665715] [] mempool_free+0x5f/0xd0
[   14.666094] [] bio_free+0x94/0xb0
[   14.666449] [] bio_put+0x3d/0x50
[   14.666796] [] rrpc_end_io+0x235/0x430
[   14.667170] [] gen_end_io+0x5c/0x70
[   14.667531] [] nvm_end_io+0x2e/0x40
[   14.667893] [] nvme_nvm_end_io+0x5f/0x90
[   14.668279] [] blk_mq_end_request+0x55/0x90
[   14.668678] [] nvme_complete_rq+0x12c/0x3d0
[   14.669075] [] __blk_mq_complete_request+0x15f/0x240
[   14.669514] [] blk_mq_complete_request+0x35/0x40
[   14.669934] [] __nvme_process_cq+0x12b/0x330
[   14.670338] [] nvme_queue_rq+0x31c/0xe70
[   14.670722] [] blk_mq_dispatch_rq_list+0x191/0x360
[   14.671151] [] blk_mq_process_rq_list+0x312/0x350
[   14.671572] [] __blk_mq_run_hw_queue+0xa3/0xb0
[   14.671978] [] blk_mq_run_hw_queue+0xd7/0xe0
[   14.672375] [] blk_mq_insert_request+0xd9/0xf0
[   14.672782] [] blk_execute_rq_nowait+0xae/0x1c0
[   14.673192] [] nvme_nvm_submit_io+0x28f/0x310
[   14.673650] [] 

Re: [PATCH] block: Initialize cfqq->ioprio_class in cfq_get_queue()

2017-01-23 Thread Alexander Potapenko
On Mon, Jan 23, 2017 at 5:03 PM, Andrey Ryabinin  wrote:
> 2017-01-23 17:06 GMT+03:00 Alexander Potapenko :
>> KMSAN (KernelMemorySanitizer, a new error detection tool) reports use of
>> uninitialized memory in cfq_init_cfqq():
>>
>> ==
>> BUG: KMSAN: use of uninitialized memory
>> ...
>> Call Trace:
>>  [< inline >] __dump_stack lib/dump_stack.c:15
>>  [] dump_stack+0x157/0x1d0 lib/dump_stack.c:51
>>  [] kmsan_report+0x205/0x360 ??:?
>>  [] __msan_warning+0x5b/0xb0 ??:?
>>  [< inline >] cfq_init_cfqq block/cfq-iosched.c:3754
>>  [] cfq_get_queue+0xc80/0x14d0 block/cfq-iosched.c:3857
>> ...
>> origin:
>>  [] save_stack_trace+0x27/0x50 
>> arch/x86/kernel/stacktrace.c:67
>>  [] kmsan_internal_poison_shadow+0xab/0x150 ??:?
>>  [] kmsan_poison_slab+0xbb/0x120 ??:?
>>  [< inline >] allocate_slab mm/slub.c:1627
>>  [] new_slab+0x3af/0x4b0 mm/slub.c:1641
>>  [< inline >] new_slab_objects mm/slub.c:2407
>>  [] ___slab_alloc+0x323/0x4a0 mm/slub.c:2564
>>  [< inline >] __slab_alloc mm/slub.c:2606
>>  [< inline >] slab_alloc_node mm/slub.c:2669
>>  [] kmem_cache_alloc_node+0x1d2/0x1f0 mm/slub.c:2746
>>  [] cfq_get_queue+0x47d/0x14d0 block/cfq-iosched.c:3850
>> ...
>> ==
>> (the line numbers are relative to 4.8-rc6, but the bug persists
>> upstream)
>>
>> The uninitialized struct cfq_queue is created by kmem_cache_alloc_node()
>> and then passed to cfq_init_cfqq(), which accesses cfqq->ioprio_class
>> before it's initialized.
>>
>
> struct cfq_queue is zero initialized (__GFP_ZERO).
> The warning is false-positive.
You are totally right. I've handled __GFP_ZERO in (hopefully) every
case except for this one, and overlooked the presence of that flag in
the kmem_cache_alloc_node().
Thanks for double-checking!
Jens, sorry for the false alarm.


-- 
Alexander Potapenko
Software Engineer

Google Germany GmbH
Erika-Mann-Straße, 33
80636 München

Geschäftsführer: Matthew Scott Sucherman, Paul Terence Manicle
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg
--
To unsubscribe from this list: send the line "unsubscribe linux-block" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: split scsi passthrough fields out of struct request

2017-01-23 Thread Christoph Hellwig
On Mon, Jan 23, 2017 at 08:39:44AM -0700, Jens Axboe wrote:
> I'd like to get this in sooner rather than later, so I'll spend some
> time reviewing and testing it start this week. I'm assuming you are
> targeting 4.11 with this change, right?

Yes.


Re: [PATCH] block: Initialize cfqq->ioprio_class in cfq_get_queue()

2017-01-23 Thread Jens Axboe
On 01/23/2017 08:49 AM, Alexander Potapenko wrote:
> On Mon, Jan 23, 2017 at 4:30 PM, Jens Axboe  wrote:
>> On 01/23/2017 07:06 AM, Alexander Potapenko wrote:
>>> KMSAN (KernelMemorySanitizer, a new error detection tool) reports use of
>>> uninitialized memory in cfq_init_cfqq():
>>>
>>> ==
>>> BUG: KMSAN: use of uninitialized memory
>>> ...
>>> Call Trace:
>>>  [< inline >] __dump_stack lib/dump_stack.c:15
>>>  [] dump_stack+0x157/0x1d0 lib/dump_stack.c:51
>>>  [] kmsan_report+0x205/0x360 ??:?
>>>  [] __msan_warning+0x5b/0xb0 ??:?
>>>  [< inline >] cfq_init_cfqq block/cfq-iosched.c:3754
>>>  [] cfq_get_queue+0xc80/0x14d0 block/cfq-iosched.c:3857
>>> ...
>>> origin:
>>>  [] save_stack_trace+0x27/0x50 
>>> arch/x86/kernel/stacktrace.c:67
>>>  [] kmsan_internal_poison_shadow+0xab/0x150 ??:?
>>>  [] kmsan_poison_slab+0xbb/0x120 ??:?
>>>  [< inline >] allocate_slab mm/slub.c:1627
>>>  [] new_slab+0x3af/0x4b0 mm/slub.c:1641
>>>  [< inline >] new_slab_objects mm/slub.c:2407
>>>  [] ___slab_alloc+0x323/0x4a0 mm/slub.c:2564
>>>  [< inline >] __slab_alloc mm/slub.c:2606
>>>  [< inline >] slab_alloc_node mm/slub.c:2669
>>>  [] kmem_cache_alloc_node+0x1d2/0x1f0 mm/slub.c:2746
>>>  [] cfq_get_queue+0x47d/0x14d0 block/cfq-iosched.c:3850
>>> ...
>>> ==
>>> (the line numbers are relative to 4.8-rc6, but the bug persists
>>> upstream)
>>>
>>> The uninitialized struct cfq_queue is created by kmem_cache_alloc_node()
>>> and then passed to cfq_init_cfqq(), which accesses cfqq->ioprio_class
>>> before it's initialized.
>>
>> Patch looks fine to me, thanks. Is this a new warning? We don't seem
>> to have changed this path in a while, yet I wonder why this is only
>> surfacing now.
> 
> This is because KMSAN is a new tool, it's not in the trunk yet. Nobody
> could see this warning before.
> I don't know if kmemcheck detects this bug (and if anyone is actively
> using it for testing either).

Ah gotcha, I read that as KASAN. Then it makes more sense. Thanks for
clarifying.

-- 
Jens Axboe



Re: [PATCH 4/4] nbd: add a nbd-control interface

2017-01-23 Thread Josef Bacik
On Mon, Jan 23, 2017 at 10:03 AM, Greg KH  wrote:
> On Mon, Jan 23, 2017 at 09:57:52AM -0500, Josef Bacik wrote:
> > On Mon, Jan 23, 2017 at 9:52 AM, Greg KH  wrote:
> > > On Mon, Jan 23, 2017 at 09:42:08AM -0500, Josef Bacik wrote:
> > > > On Sat, Jan 21, 2017 at 4:05 AM, Greg KH  wrote:
> > > > > On Fri, Jan 20, 2017 at 04:56:52PM -0500, Josef Bacik wrote:
> > > > > >  This patch mirrors the loop back device behavior with a few
> > > > > >  changes.  First there is no DEL operation as NBD doesn't get
> > > > > >  as much churn as loop devices do.  Secondly the GET_NEXT
> > > > > >  operation can optionally create a new NBD device or not.
> > > > > >  Our infrastructure people want to not allow NBD to create
> > > > > >  new devices as it causes problems for them in containers.
> > > > > >  However allow this to be optional as things like the OSS NBD
> > > > > >  client probably doesn't care and would like to just be given
> > > > > >  a device to use.
> > > > > >
> > > > > >  Signed-off-by: Josef Bacik 
> > > > >
> > > > > A random char device with odd ioctls?  Why?  There's no other
> > > > > configuration choice you could possibly use?  Where is the
> > > > > userspace tool that uses this new kernel api?
> > > > >
> > > > > You aren't passing in structures to the ioctl, so why does
> > > > > this HAVE to be an ioctl?
> > > >
> > > > Again, this is how loop does it so I assumed a known, regularly
> > > > used API was the best bet.  I can do literally anything, but
> > > > these interfaces have to be used by other people, including
> > > > internal people.  The /dev/whatever-control is a well
> > > > established way for interacting with dynamic device drivers
> > > > (loop, DM, btrfs), so that's what I went with.  Thanks,
> > >
> > > Again, please don't duplicate what loop did, we must _learn_ from
> > > history, not repeat it :(
> >
> > Sure but what am I supposed to do?  Have some random sysfs knobs?
> > Thanks,
>
> It all depends on what you are trying to do.  I have yet to figure
> that out at all here :(

I explained it in the changelog and my response to Wouter.  NBD
preallocates all of its /dev/nbd# devices at modprobe time, so there's
no way to add new devices as we need them.  Loop accomplishes this with
the /dev/loop-control and an ioctl.  Then we also need a way to figure
out what is the first /dev/nbd# device that isn't currently in use in
order to pick the next one to configure.  Keep in mind that all of the
configuration for /dev/nbd# devices is done through ioctls to those
devices, so having an ioctl interface for the control device is
consistent with the rest of how NBD works.  Thanks,

Josef



Re: [PATCH] block: Initialize cfqq->ioprio_class in cfq_get_queue()

2017-01-23 Thread Andrey Ryabinin
2017-01-23 17:06 GMT+03:00 Alexander Potapenko :
> KMSAN (KernelMemorySanitizer, a new error detection tool) reports use of
> uninitialized memory in cfq_init_cfqq():
>
> ==
> BUG: KMSAN: use of uninitialized memory
> ...
> Call Trace:
>  [< inline >] __dump_stack lib/dump_stack.c:15
>  [] dump_stack+0x157/0x1d0 lib/dump_stack.c:51
>  [] kmsan_report+0x205/0x360 ??:?
>  [] __msan_warning+0x5b/0xb0 ??:?
>  [< inline >] cfq_init_cfqq block/cfq-iosched.c:3754
>  [] cfq_get_queue+0xc80/0x14d0 block/cfq-iosched.c:3857
> ...
> origin:
>  [] save_stack_trace+0x27/0x50 
> arch/x86/kernel/stacktrace.c:67
>  [] kmsan_internal_poison_shadow+0xab/0x150 ??:?
>  [] kmsan_poison_slab+0xbb/0x120 ??:?
>  [< inline >] allocate_slab mm/slub.c:1627
>  [] new_slab+0x3af/0x4b0 mm/slub.c:1641
>  [< inline >] new_slab_objects mm/slub.c:2407
>  [] ___slab_alloc+0x323/0x4a0 mm/slub.c:2564
>  [< inline >] __slab_alloc mm/slub.c:2606
>  [< inline >] slab_alloc_node mm/slub.c:2669
>  [] kmem_cache_alloc_node+0x1d2/0x1f0 mm/slub.c:2746
>  [] cfq_get_queue+0x47d/0x14d0 block/cfq-iosched.c:3850
> ...
> ==
> (the line numbers are relative to 4.8-rc6, but the bug persists
> upstream)
>
> The uninitialized struct cfq_queue is created by kmem_cache_alloc_node()
> and then passed to cfq_init_cfqq(), which accesses cfqq->ioprio_class
> before it's initialized.
>

struct cfq_queue is zero initialized (__GFP_ZERO).
The warning is false-positive.


Re: [PATCH 4/4] nbd: add a nbd-control interface

2017-01-23 Thread Greg KH
On Mon, Jan 23, 2017 at 09:42:08AM -0500, Josef Bacik wrote:
> 
> 
> On Sat, Jan 21, 2017 at 4:05 AM, Greg KH  wrote:
> > On Fri, Jan 20, 2017 at 04:56:52PM -0500, Josef Bacik wrote:
> > >  This patch mirrors the loop back device behavior with a few
> > > changes.  First
> > >  there is no DEL operation as NBD doesn't get as much churn as loop
> > > devices do.
> > >  Secondly the GET_NEXT operation can optionally create a new NBD
> > > device or not.
> > >  Our infrastructure people want to not allow NBD to create new
> > > devices as it
> > >  causes problems for them in containers.  However allow this to be
> > > optional as
> > >  things like the OSS NBD client probably doesn't care and would like
> > > to just be
> > >  given a device to use.
> > > 
> > >  Signed-off-by: Josef Bacik 
> > 
> > A random char device with odd ioctls?  Why?  There's no other
> > configuration choice you could possibly use?  Where is the userspace
> > tool that uses this new kernel api?
> > 
> > You aren't passing in structures to the ioctl, so why does this HAVE to
> > be an ioctl?
> 
> Again, this is how loop does it so I assumed a known, regularly used API was
> the best bet.  I can do literally anything, but these interfaces have to be
> used by other people, including internal people.  The /dev/whatever-control
> is a well established way for interacting with dynamic device drivers (loop,
> DM, btrfs), so that's what I went with.  Thanks,

Again, please don't duplicate what loop did, we must _learn_ from
history, not repeat it :(


[PATCH 08/16] scsi_dh_hp_sw: switch to scsi_execute_req_flags()

2017-01-23 Thread Christoph Hellwig
From: Hannes Reinecke 

Switch to scsi_execute_req_flags() instead of using the block interface
directly.  This will set REQ_QUIET and REQ_PREEMPT, but this is okay as
we're evaluating the errors anyway and should be able to send the command
even if the device is quiesced.

Signed-off-by: Hannes Reinecke 
Signed-off-by: Christoph Hellwig 
---
 drivers/scsi/device_handler/scsi_dh_hp_sw.c | 222 
 1 file changed, 65 insertions(+), 157 deletions(-)

diff --git a/drivers/scsi/device_handler/scsi_dh_hp_sw.c b/drivers/scsi/device_handler/scsi_dh_hp_sw.c
index 308e871..be43c94 100644
--- a/drivers/scsi/device_handler/scsi_dh_hp_sw.c
+++ b/drivers/scsi/device_handler/scsi_dh_hp_sw.c
@@ -38,13 +38,10 @@
 #define HP_SW_PATH_PASSIVE 1
 
 struct hp_sw_dh_data {
-   unsigned char sense[SCSI_SENSE_BUFFERSIZE];
int path_state;
int retries;
int retry_cnt;
struct scsi_device *sdev;
-   activate_complete   callback_fn;
-   void*callback_data;
 };
 
 static int hp_sw_start_stop(struct hp_sw_dh_data *);
@@ -56,43 +53,34 @@ static int hp_sw_start_stop(struct hp_sw_dh_data *);
  *
  * Returns SCSI_DH_DEV_OFFLINED if the sdev is on the passive path
  */
-static int tur_done(struct scsi_device *sdev, unsigned char *sense)
+static int tur_done(struct scsi_device *sdev, struct hp_sw_dh_data *h,
+   struct scsi_sense_hdr *sshdr)
 {
-   struct scsi_sense_hdr sshdr;
-   int ret;
+   int ret = SCSI_DH_IO;
 
-   ret = scsi_normalize_sense(sense, SCSI_SENSE_BUFFERSIZE, &sshdr);
-   if (!ret) {
-   sdev_printk(KERN_WARNING, sdev,
-   "%s: sending tur failed, no sense available\n",
-   HP_SW_NAME);
-   ret = SCSI_DH_IO;
-   goto done;
-   }
-   switch (sshdr.sense_key) {
+   switch (sshdr->sense_key) {
case UNIT_ATTENTION:
ret = SCSI_DH_IMM_RETRY;
break;
case NOT_READY:
-   if ((sshdr.asc == 0x04) && (sshdr.ascq == 2)) {
+   if (sshdr->asc == 0x04 && sshdr->ascq == 2) {
/*
 * LUN not ready - Initialization command required
 *
 * This is the passive path
 */
-   ret = SCSI_DH_DEV_OFFLINED;
+   h->path_state = HP_SW_PATH_PASSIVE;
+   ret = SCSI_DH_OK;
break;
}
/* Fallthrough */
default:
sdev_printk(KERN_WARNING, sdev,
   "%s: sending tur failed, sense %x/%x/%x\n",
-  HP_SW_NAME, sshdr.sense_key, sshdr.asc,
-  sshdr.ascq);
+  HP_SW_NAME, sshdr->sense_key, sshdr->asc,
+  sshdr->ascq);
break;
}
-
-done:
return ret;
 }
 
@@ -105,131 +93,36 @@ static int tur_done(struct scsi_device *sdev, unsigned char *sense)
  */
 static int hp_sw_tur(struct scsi_device *sdev, struct hp_sw_dh_data *h)
 {
-   struct request *req;
-   int ret;
+   unsigned char cmd[6] = { TEST_UNIT_READY };
+   struct scsi_sense_hdr sshdr;
+   int ret = SCSI_DH_OK, res;
+   u64 req_flags = REQ_FAILFAST_DEV | REQ_FAILFAST_TRANSPORT |
+   REQ_FAILFAST_DRIVER;
 
 retry:
-   req = blk_get_request(sdev->request_queue, WRITE, GFP_NOIO);
-   if (IS_ERR(req))
-   return SCSI_DH_RES_TEMP_UNAVAIL;
-
-   blk_rq_set_block_pc(req);
-   req->cmd_flags |= REQ_FAILFAST_DEV | REQ_FAILFAST_TRANSPORT |
- REQ_FAILFAST_DRIVER;
-   req->cmd_len = COMMAND_SIZE(TEST_UNIT_READY);
-   req->cmd[0] = TEST_UNIT_READY;
-   req->timeout = HP_SW_TIMEOUT;
-   req->sense = h->sense;
-   memset(req->sense, 0, SCSI_SENSE_BUFFERSIZE);
-   req->sense_len = 0;
-
-   ret = blk_execute_rq(req->q, NULL, req, 1);
-   if (ret == -EIO) {
-   if (req->sense_len > 0) {
-   ret = tur_done(sdev, h->sense);
-   } else {
+   res = scsi_execute_req_flags(sdev, cmd, DMA_NONE, NULL, 0, &sshdr,
+HP_SW_TIMEOUT, HP_SW_RETRIES,
+NULL, req_flags, 0);
+   if (res) {
+   if (scsi_sense_valid(&sshdr))
+   ret = tur_done(sdev, h, &sshdr);
+   else {
sdev_printk(KERN_WARNING, sdev,
"%s: sending tur failed with %x\n",
-   HP_SW_NAME, req->errors);
+   HP_SW_NAME, res);
ret = SCSI_DH_IO;
}
} else {
h->path_state = HP_SW_PATH_ACTIVE;

[PATCH 16/16] block: don't assign cmd_flags in __blk_rq_prep_clone

2017-01-23 Thread Christoph Hellwig
These days we have the proper flags set since request allocation time.

Signed-off-by: Christoph Hellwig 
---
 block/blk-core.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index fe5cc98..93a9e0b 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -2983,7 +2983,6 @@ EXPORT_SYMBOL_GPL(blk_rq_unprep_clone);
 static void __blk_rq_prep_clone(struct request *dst, struct request *src)
 {
dst->cpu = src->cpu;
-   dst->cmd_flags = src->cmd_flags | REQ_NOMERGE;
dst->cmd_type = src->cmd_type;
dst->__sector = blk_rq_pos(src);
dst->__data_len = blk_rq_bytes(src);
-- 
2.1.4



[PATCH 12/16] scsi: remove __scsi_alloc_queue

2017-01-23 Thread Christoph Hellwig
Instead do an internal export of __scsi_init_queue for the transport
classes that export BSG nodes.

Signed-off-by: Christoph Hellwig 
---
 drivers/scsi/scsi_lib.c | 19 ---
 drivers/scsi/scsi_transport_fc.c|  6 --
 drivers/scsi/scsi_transport_iscsi.c |  3 ++-
 include/scsi/scsi_host.h|  2 --
 include/scsi/scsi_transport.h   |  2 ++
 5 files changed, 12 insertions(+), 20 deletions(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 3d6b364..7950516 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -2082,7 +2082,7 @@ static u64 scsi_calculate_bounce_limit(struct Scsi_Host *shost)
return bounce_limit;
 }
 
-static void __scsi_init_queue(struct Scsi_Host *shost, struct request_queue *q)
+void __scsi_init_queue(struct Scsi_Host *shost, struct request_queue *q)
 {
struct device *dev = shost->dma_dev;
 
@@ -2117,28 +2117,17 @@ static void __scsi_init_queue(struct Scsi_Host *shost, struct request_queue *q)
 */
blk_queue_dma_alignment(q, 0x03);
 }
-
-struct request_queue *__scsi_alloc_queue(struct Scsi_Host *shost,
-request_fn_proc *request_fn)
-{
-   struct request_queue *q;
-
-   q = blk_init_queue(request_fn, NULL);
-   if (!q)
-   return NULL;
-   __scsi_init_queue(shost, q);
-   return q;
-}
-EXPORT_SYMBOL(__scsi_alloc_queue);
+EXPORT_SYMBOL_GPL(__scsi_init_queue);
 
 struct request_queue *scsi_alloc_queue(struct scsi_device *sdev)
 {
struct request_queue *q;
 
-   q = __scsi_alloc_queue(sdev->host, scsi_request_fn);
+   q = blk_init_queue(scsi_request_fn, NULL);
if (!q)
return NULL;
 
+   __scsi_init_queue(sdev->host, q);
blk_queue_prep_rq(q, scsi_prep_fn);
blk_queue_unprep_rq(q, scsi_unprep_fn);
blk_queue_softirq_done(q, scsi_softirq_done);
diff --git a/drivers/scsi/scsi_transport_fc.c b/drivers/scsi/scsi_transport_fc.c
index 03577bd..afcedec 100644
--- a/drivers/scsi/scsi_transport_fc.c
+++ b/drivers/scsi/scsi_transport_fc.c
@@ -3776,7 +3776,7 @@ fc_bsg_hostadd(struct Scsi_Host *shost, struct fc_host_attrs *fc_host)
snprintf(bsg_name, sizeof(bsg_name),
 "fc_host%d", shost->host_no);
 
-   q = __scsi_alloc_queue(shost, bsg_request_fn);
+   q = blk_init_queue(bsg_request_fn, NULL);
if (!q) {
dev_err(dev,
"fc_host%d: bsg interface failed to initialize - no request queue\n",
@@ -3784,6 +3784,7 @@ fc_bsg_hostadd(struct Scsi_Host *shost, struct fc_host_attrs *fc_host)
return -ENOMEM;
}
 
+   __scsi_init_queue(shost, q);
err = bsg_setup_queue(dev, q, bsg_name, fc_bsg_dispatch,
 i->f->dd_bsg_size);
if (err) {
@@ -3831,12 +3832,13 @@ fc_bsg_rportadd(struct Scsi_Host *shost, struct fc_rport *rport)
if (!i->f->bsg_request)
return -ENOTSUPP;
 
-   q = __scsi_alloc_queue(shost, bsg_request_fn);
+   q = blk_init_queue(bsg_request_fn, NULL);
if (!q) {
dev_err(dev, "bsg interface failed to initialize - no request queue\n");
return -ENOMEM;
}
 
+   __scsi_init_queue(shost, q);
err = bsg_setup_queue(dev, q, NULL, fc_bsg_dispatch, i->f->dd_bsg_size);
if (err) {
dev_err(dev, "failed to setup bsg queue\n");
diff --git a/drivers/scsi/scsi_transport_iscsi.c b/drivers/scsi/scsi_transport_iscsi.c
index 42bca61..04ebe6e 100644
--- a/drivers/scsi/scsi_transport_iscsi.c
+++ b/drivers/scsi/scsi_transport_iscsi.c
@@ -1544,10 +1544,11 @@ iscsi_bsg_host_add(struct Scsi_Host *shost, struct iscsi_cls_host *ihost)
 
snprintf(bsg_name, sizeof(bsg_name), "iscsi_host%d", shost->host_no);
 
-   q = __scsi_alloc_queue(shost, bsg_request_fn);
+   q = blk_init_queue(bsg_request_fn, NULL);
if (!q)
return -ENOMEM;
 
+   __scsi_init_queue(shost, q);
ret = bsg_setup_queue(dev, q, bsg_name, iscsi_bsg_host_dispatch, 0);
if (ret) {
shost_printk(KERN_ERR, shost, "bsg interface failed to "
diff --git a/include/scsi/scsi_host.h b/include/scsi/scsi_host.h
index 36680f1..f4964d7 100644
--- a/include/scsi/scsi_host.h
+++ b/include/scsi/scsi_host.h
@@ -826,8 +826,6 @@ extern void scsi_block_requests(struct Scsi_Host *);
 
 struct class_container;
 
-extern struct request_queue *__scsi_alloc_queue(struct Scsi_Host *shost,
-   void (*) (struct request_queue *));
 /*
  * These two functions are used to allocate and free a pseudo device
  * which will connect to the host adapter itself rather than any
diff --git a/include/scsi/scsi_transport.h b/include/scsi/scsi_transport.h
index 8129239..b6e07b5 100644
--- a/include/scsi/scsi_transport.h
+++ b/include/scsi/scsi_transport.h
@@ -119,4 +119,6 @@ 

Re: split scsi passthrough fields out of struct request

2017-01-23 Thread Jens Axboe
On 01/23/2017 08:29 AM, Christoph Hellwig wrote:
> Hi all,
> 
> this series splits the support for SCSI passthrough commands from the
> main struct request used all over the block layer into a separate
> scsi_request structure that drivers that want to support SCSI passthough
> need to embedded as the first thing into their request-private data,
> similar to how we handle NVMe passthrough commands.
> 
> To support this I've added support for that the private data after
> request structure to the legacy request path instead, so that it can
> be treated the same way as the blk-mq path.  Compare to the current
> scsi_cmnd allocator that actually is a major simplification.
> 
> Compared to the previous RFC version the major change is that dm-mpath
> works with this version.  To make it work I've switched the legacy
> request dm-rq to use the same clone and map method as the blk-mq version.

I'd like to get this in sooner rather than later, so I'll spend some
time reviewing and testing it start this week. I'm assuming you are
targeting 4.11 with this change, right?

-- 
Jens Axboe



Re: [PATCH 4/4] nbd: add a nbd-control interface

2017-01-23 Thread Josef Bacik
On Mon, Jan 23, 2017 at 9:52 AM, Greg KH  wrote:
> On Mon, Jan 23, 2017 at 09:42:08AM -0500, Josef Bacik wrote:
> > On Sat, Jan 21, 2017 at 4:05 AM, Greg KH  wrote:
> > > On Fri, Jan 20, 2017 at 04:56:52PM -0500, Josef Bacik wrote:
> > > >  This patch mirrors the loop back device behavior with a few
> > > >  changes.  First there is no DEL operation as NBD doesn't get as
> > > >  much churn as loop devices do.  Secondly the GET_NEXT operation
> > > >  can optionally create a new NBD device or not.  Our
> > > >  infrastructure people want to not allow NBD to create new
> > > >  devices as it causes problems for them in containers.  However
> > > >  allow this to be optional as things like the OSS NBD client
> > > >  probably doesn't care and would like to just be given a device
> > > >  to use.
> > > >
> > > >  Signed-off-by: Josef Bacik 
> > >
> > > A random char device with odd ioctls?  Why?  There's no other
> > > configuration choice you could possibly use?  Where is the
> > > userspace tool that uses this new kernel api?
> > >
> > > You aren't passing in structures to the ioctl, so why does this
> > > HAVE to be an ioctl?
> >
> > Again, this is how loop does it so I assumed a known, regularly used
> > API was the best bet.  I can do literally anything, but these
> > interfaces have to be used by other people, including internal
> > people.  The /dev/whatever-control is a well established way for
> > interacting with dynamic device drivers (loop, DM, btrfs), so that's
> > what I went with.  Thanks,
>
> Again, please don't duplicate what loop did, we must _learn_ from
> history, not repeat it :(

Sure but what am I supposed to do?  Have some random sysfs knobs?
Thanks,

Josef



Re: blk-throttle: Move three assignments for the variable "ret" in tg_set_max()

2017-01-23 Thread Jens Axboe
On 01/23/2017 05:06 AM, SF Markus Elfring wrote:
>>> We have got different preferences for the placement of error code settings.
>> Yes we do, so what's the point? Both are OK.
> 
> Can a function implementation be executed a bit faster in the case
> that error codes will usually only matter if they would be set after
> a concrete software failure

Don't turn this into a troll fest. Maintainability trumps performance.
Every time. See previous email for reasoning for that.

>> Please don't go down that road it opens so much potential for needless 
>> bikeshedding
>> and waste all of our (including your) time.
> 
> I would appreciate to clarify involved run time consequences a bit more.

How about you go and benchmark the before and after, and present some
compelling evidence based on those tests for why the change should be
made? The onus is on the submitter here, not the reviewer.

As I said in the previous email, don't bother sending these types of
patches for the block layer again. They are just going to be ignored.

-- 
Jens Axboe



[PATCH 07/16] scsi_dh_emc: switch to scsi_execute_req_flags()

2017-01-23 Thread Christoph Hellwig
From: Hannes Reinecke 

Switch to scsi_execute_req_flags() and scsi_get_vpd_page() instead of
open-coding it.  Using scsi_execute_req_flags() will set REQ_QUIET and
REQ_PREEMPT, but this is okay as we're evaluating the errors anyway and
should be able to send the command even if the device is quiesced.

Signed-off-by: Hannes Reinecke 
Signed-off-by: Christoph Hellwig 
---
 drivers/scsi/device_handler/scsi_dh_emc.c | 247 +++---
 1 file changed, 56 insertions(+), 191 deletions(-)

diff --git a/drivers/scsi/device_handler/scsi_dh_emc.c b/drivers/scsi/device_handler/scsi_dh_emc.c
index 5b80746..4a7679f 100644
--- a/drivers/scsi/device_handler/scsi_dh_emc.c
+++ b/drivers/scsi/device_handler/scsi_dh_emc.c
@@ -88,12 +88,6 @@ struct clariion_dh_data {
 */
unsigned char buffer[CLARIION_BUFFER_SIZE];
/*
-* SCSI sense buffer for commands -- assumes serial issuance
-* and completion sequence of all commands for same multipath.
-*/
-   unsigned char sense[SCSI_SENSE_BUFFERSIZE];
-   unsigned int senselen;
-   /*
 * LUN state
 */
int lun_state;
@@ -116,44 +110,38 @@ struct clariion_dh_data {
 /*
  * Parse MODE_SELECT cmd reply.
  */
-static int trespass_endio(struct scsi_device *sdev, char *sense)
+static int trespass_endio(struct scsi_device *sdev,
+ struct scsi_sense_hdr *sshdr)
 {
int err = SCSI_DH_IO;
-   struct scsi_sense_hdr sshdr;
-
-   if (!scsi_normalize_sense(sense, SCSI_SENSE_BUFFERSIZE, &sshdr)) {
-   sdev_printk(KERN_ERR, sdev, "%s: Found valid sense data 0x%2x, "
-   "0x%2x, 0x%2x while sending CLARiiON trespass "
-   "command.\n", CLARIION_NAME, sshdr.sense_key,
-   sshdr.asc, sshdr.ascq);
 
-   if ((sshdr.sense_key == 0x05) && (sshdr.asc == 0x04) &&
-(sshdr.ascq == 0x00)) {
-   /*
-* Array based copy in progress -- do not send
-* mode_select or copy will be aborted mid-stream.
-*/
-   sdev_printk(KERN_INFO, sdev, "%s: Array Based Copy in "
-   "progress while sending CLARiiON trespass "
-   "command.\n", CLARIION_NAME);
-   err = SCSI_DH_DEV_TEMP_BUSY;
-   } else if ((sshdr.sense_key == 0x02) && (sshdr.asc == 0x04) &&
-   (sshdr.ascq == 0x03)) {
-   /*
-* LUN Not Ready - Manual Intervention Required
-* indicates in-progress ucode upgrade (NDU).
-*/
-   sdev_printk(KERN_INFO, sdev, "%s: Detected in-progress "
-   "ucode upgrade NDU operation while sending "
-   "CLARiiON trespass command.\n", CLARIION_NAME);
-   err = SCSI_DH_DEV_TEMP_BUSY;
-   } else
-   err = SCSI_DH_DEV_FAILED;
-   } else {
-   sdev_printk(KERN_INFO, sdev,
-   "%s: failed to send MODE SELECT, no sense available\n",
-   CLARIION_NAME);
-   }
+   sdev_printk(KERN_ERR, sdev, "%s: Found valid sense data 0x%2x, "
+   "0x%2x, 0x%2x while sending CLARiiON trespass "
+   "command.\n", CLARIION_NAME, sshdr->sense_key,
+   sshdr->asc, sshdr->ascq);
+
+   if (sshdr->sense_key == 0x05 && sshdr->asc == 0x04 &&
+   sshdr->ascq == 0x00) {
+   /*
+* Array based copy in progress -- do not send
+* mode_select or copy will be aborted mid-stream.
+*/
+   sdev_printk(KERN_INFO, sdev, "%s: Array Based Copy in "
+   "progress while sending CLARiiON trespass "
+   "command.\n", CLARIION_NAME);
+   err = SCSI_DH_DEV_TEMP_BUSY;
+   } else if (sshdr->sense_key == 0x02 && sshdr->asc == 0x04 &&
+  sshdr->ascq == 0x03) {
+   /*
+* LUN Not Ready - Manual Intervention Required
+* indicates in-progress ucode upgrade (NDU).
+*/
+   sdev_printk(KERN_INFO, sdev, "%s: Detected in-progress "
+   "ucode upgrade NDU operation while sending "
+   "CLARiiON trespass command.\n", CLARIION_NAME);
+   err = SCSI_DH_DEV_TEMP_BUSY;
+   } else
+   err = SCSI_DH_DEV_FAILED;
return err;
 }
 
@@ -257,103 +245,15 @@ static char * parse_sp_model(struct scsi_device *sdev, unsigned char *buffer)
return sp_model;
 }
 
-/*
- * Get block request for REQ_BLOCK_PC command issued to path.  Currently
- * 

[PATCH 05/16] dm: always defer request allocation to the owner of the request_queue

2017-01-23 Thread Christoph Hellwig
DM already calls blk_mq_alloc_request on the request_queue of the
underlying device if it is a blk-mq device.  But now that we allow drivers
to allocate additional data and initialize it ahead of time we need to do
the same for all drivers.   Doing so and using the new cmd_size
infrastructure in the block layer greatly simplifies the dm-rq and mpath
code, and should also make arbitrary combinations of SQ and MQ devices
with SQ or MQ device mapper tables easily possible as a further step.

Signed-off-by: Christoph Hellwig 
---
 drivers/md/dm-core.h  |   1 -
 drivers/md/dm-mpath.c | 132 --
 drivers/md/dm-rq.c| 251 ++
 drivers/md/dm-rq.h|   2 +-
 drivers/md/dm-target.c|   7 --
 drivers/md/dm.c   |  18 +--
 drivers/md/dm.h   |   3 +-
 include/linux/device-mapper.h |   3 -
 8 files changed, 81 insertions(+), 336 deletions(-)

diff --git a/drivers/md/dm-core.h b/drivers/md/dm-core.h
index 40ceba1..136fda3 100644
--- a/drivers/md/dm-core.h
+++ b/drivers/md/dm-core.h
@@ -92,7 +92,6 @@ struct mapped_device {
 * io objects are allocated from here.
 */
mempool_t *io_pool;
-   mempool_t *rq_pool;
 
struct bio_set *bs;
 
diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c
index 6400cff..784f237 100644
--- a/drivers/md/dm-mpath.c
+++ b/drivers/md/dm-mpath.c
@@ -92,12 +92,6 @@ struct multipath {
 
unsigned queue_mode;
 
-   /*
-* We must use a mempool of dm_mpath_io structs so that we
-* can resubmit bios on error.
-*/
-   mempool_t *mpio_pool;
-
struct mutex work_mutex;
struct work_struct trigger_event;
 
@@ -115,8 +109,6 @@ struct dm_mpath_io {
 
 typedef int (*action_fn) (struct pgpath *pgpath);
 
-static struct kmem_cache *_mpio_cache;
-
 static struct workqueue_struct *kmultipathd, *kmpath_handlerd;
 static void trigger_event(struct work_struct *work);
 static void activate_path(struct work_struct *work);
@@ -209,7 +201,6 @@ static struct multipath *alloc_multipath(struct dm_target 
*ti)
	init_waitqueue_head(&m->pg_init_wait);
	mutex_init(&m->work_mutex);
 
-   m->mpio_pool = NULL;
m->queue_mode = DM_TYPE_NONE;
 
m->ti = ti;
@@ -229,16 +220,7 @@ static int alloc_multipath_stage2(struct dm_target *ti, 
struct multipath *m)
m->queue_mode = DM_TYPE_MQ_REQUEST_BASED;
else
m->queue_mode = DM_TYPE_REQUEST_BASED;
-   }
-
-   if (m->queue_mode == DM_TYPE_REQUEST_BASED) {
-   unsigned min_ios = dm_get_reserved_rq_based_ios();
-
-   m->mpio_pool = mempool_create_slab_pool(min_ios, _mpio_cache);
-   if (!m->mpio_pool)
-   return -ENOMEM;
-   }
-   else if (m->queue_mode == DM_TYPE_BIO_BASED) {
+   } else if (m->queue_mode == DM_TYPE_BIO_BASED) {
	INIT_WORK(&m->process_queued_bios, process_queued_bios);
/*
 * bio-based doesn't support any direct scsi_dh management;
@@ -263,7 +245,6 @@ static void free_multipath(struct multipath *m)
 
kfree(m->hw_handler_name);
kfree(m->hw_handler_params);
-   mempool_destroy(m->mpio_pool);
kfree(m);
 }
 
@@ -272,38 +253,6 @@ static struct dm_mpath_io *get_mpio(union map_info *info)
return info->ptr;
 }
 
-static struct dm_mpath_io *set_mpio(struct multipath *m, union map_info *info)
-{
-   struct dm_mpath_io *mpio;
-
-   if (!m->mpio_pool) {
-   /* Use blk-mq pdu memory requested via per_io_data_size */
-   mpio = get_mpio(info);
-   memset(mpio, 0, sizeof(*mpio));
-   return mpio;
-   }
-
-   mpio = mempool_alloc(m->mpio_pool, GFP_ATOMIC);
-   if (!mpio)
-   return NULL;
-
-   memset(mpio, 0, sizeof(*mpio));
-   info->ptr = mpio;
-
-   return mpio;
-}
-
-static void clear_request_fn_mpio(struct multipath *m, union map_info *info)
-{
-   /* Only needed for non blk-mq (.request_fn) multipath */
-   if (m->mpio_pool) {
-   struct dm_mpath_io *mpio = info->ptr;
-
-   info->ptr = NULL;
-   mempool_free(mpio, m->mpio_pool);
-   }
-}
-
 static size_t multipath_per_bio_data_size(void)
 {
return sizeof(struct dm_mpath_io) + sizeof(struct dm_bio_details);
@@ -530,16 +479,17 @@ static bool must_push_back_bio(struct multipath *m)
 /*
  * Map cloned requests (request-based multipath)
  */
-static int __multipath_map(struct dm_target *ti, struct request *clone,
-  union map_info *map_context,
-  struct request *rq, struct request **__clone)
+static int multipath_clone_and_map(struct dm_target *ti, struct request *rq,
+  union map_info *map_context,
+

[PATCH 09/16] scsi: remove gfp_flags member in scsi_host_cmd_pool

2017-01-23 Thread Christoph Hellwig
When using the slab allocator we already decide at cache creation time if
an allocation comes from a GFP_DMA pool using the SLAB_CACHE_DMA flag,
and there is no point passing the kmalloc-family only GFP_DMA flag to
kmem_cache_alloc.  Drop all the infrastructure for doing so.
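The idea, that the DMA property is fixed when the cache is created, so a per-allocation flag is redundant, can be modeled with a toy user-space pool (illustrative names only, not the real slab allocator):

```c
#include <assert.h>
#include <stdlib.h>

#define POOL_DMA 0x1u	/* analogous to SLAB_CACHE_DMA: a creation-time property */

struct toy_pool {
	unsigned int flags;	/* fixed once, when the pool is created */
	size_t obj_size;
};

static struct toy_pool *toy_pool_create(size_t obj_size, unsigned int flags)
{
	struct toy_pool *p = malloc(sizeof(*p));

	if (!p)
		return NULL;
	p->flags = flags;
	p->obj_size = obj_size;
	return p;
}

/*
 * Allocation-time flags cannot change where objects live; the pool's
 * creation-time flags already decided that (as SLAB_CACHE_DMA does for
 * a kmem_cache), which is why passing __GFP_DMA again is pointless.
 */
static void *toy_pool_alloc(struct toy_pool *p)
{
	return calloc(1, p->obj_size);
}
```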

Signed-off-by: Christoph Hellwig 
---
 drivers/scsi/scsi.c | 14 --
 1 file changed, 4 insertions(+), 10 deletions(-)

diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
index 75455d4..0f93892 100644
--- a/drivers/scsi/scsi.c
+++ b/drivers/scsi/scsi.c
@@ -105,7 +105,6 @@ struct scsi_host_cmd_pool {
char*cmd_name;
char*sense_name;
unsigned intslab_flags;
-   gfp_t   gfp_mask;
 };
 
 static struct scsi_host_cmd_pool scsi_cmd_pool = {
@@ -118,7 +117,6 @@ static struct scsi_host_cmd_pool scsi_cmd_dma_pool = {
.cmd_name   = "scsi_cmd_cache(DMA)",
.sense_name = "scsi_sense_cache(DMA)",
.slab_flags = SLAB_HWCACHE_ALIGN|SLAB_CACHE_DMA,
-   .gfp_mask   = __GFP_DMA,
 };
 
 static DEFINE_MUTEX(host_cmd_pool_mutex);
@@ -156,12 +154,11 @@ scsi_host_alloc_command(struct Scsi_Host *shost, gfp_t 
gfp_mask)
struct scsi_host_cmd_pool *pool = shost->cmd_pool;
struct scsi_cmnd *cmd;
 
-   cmd = kmem_cache_zalloc(pool->cmd_slab, gfp_mask | pool->gfp_mask);
+   cmd = kmem_cache_zalloc(pool->cmd_slab, gfp_mask);
if (!cmd)
goto fail;
 
-   cmd->sense_buffer = kmem_cache_alloc(pool->sense_slab,
-gfp_mask | pool->gfp_mask);
+   cmd->sense_buffer = kmem_cache_alloc(pool->sense_slab, gfp_mask);
if (!cmd->sense_buffer)
goto fail_free_cmd;
 
@@ -327,10 +324,8 @@ scsi_alloc_host_cmd_pool(struct Scsi_Host *shost)
}
 
pool->slab_flags = SLAB_HWCACHE_ALIGN;
-   if (shost->unchecked_isa_dma) {
+   if (shost->unchecked_isa_dma)
pool->slab_flags |= SLAB_CACHE_DMA;
-   pool->gfp_mask = __GFP_DMA;
-   }
 
if (hostt->cmd_size)
hostt->cmd_pool = pool;
@@ -424,7 +419,6 @@ static void scsi_put_host_cmd_pool(struct Scsi_Host *shost)
  */
 int scsi_setup_command_freelist(struct Scsi_Host *shost)
 {
-   const gfp_t gfp_mask = shost->unchecked_isa_dma ? GFP_DMA : GFP_KERNEL;
struct scsi_cmnd *cmd;
 
	spin_lock_init(&shost->free_list_lock);
@@ -437,7 +431,7 @@ int scsi_setup_command_freelist(struct Scsi_Host *shost)
/*
 * Get one backup command for this host.
 */
-   cmd = scsi_host_alloc_command(shost, gfp_mask);
+   cmd = scsi_host_alloc_command(shost, GFP_KERNEL);
if (!cmd) {
scsi_put_host_cmd_pool(shost);
shost->cmd_pool = NULL;
-- 
2.1.4

--
To unsubscribe from this list: send the line "unsubscribe linux-block" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 04/16] dm: remove incomplete BLOCK_PC support

2017-01-23 Thread Christoph Hellwig
DM tries to copy a few fields around for BLOCK_PC requests, but given
that no dm-target ever wires up scsi_cmd_ioctl, BLOCK_PC requests can't
actually be sent to dm.

Signed-off-by: Christoph Hellwig 
---
 drivers/md/dm-rq.c | 16 
 1 file changed, 16 deletions(-)

diff --git a/drivers/md/dm-rq.c b/drivers/md/dm-rq.c
index 93f6e9f..3f12916 100644
--- a/drivers/md/dm-rq.c
+++ b/drivers/md/dm-rq.c
@@ -270,19 +270,6 @@ static void dm_end_request(struct request *clone, int 
error)
struct mapped_device *md = tio->md;
struct request *rq = tio->orig;
 
-   if (rq->cmd_type == REQ_TYPE_BLOCK_PC) {
-   rq->errors = clone->errors;
-   rq->resid_len = clone->resid_len;
-
-   if (rq->sense)
-   /*
-* We are using the sense buffer of the original
-* request.
-* So setting the length of the sense data is enough.
-*/
-   rq->sense_len = clone->sense_len;
-   }
-
free_rq_clone(clone);
rq_end_stats(md, rq);
if (!rq->q->mq_ops)
@@ -511,9 +498,6 @@ static int setup_clone(struct request *clone, struct 
request *rq,
if (r)
return r;
 
-   clone->cmd = rq->cmd;
-   clone->cmd_len = rq->cmd_len;
-   clone->sense = rq->sense;
clone->end_io = end_clone_request;
clone->end_io_data = tio;
 
-- 
2.1.4

--
To unsubscribe from this list: send the line "unsubscribe linux-block" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 06/16] scsi_dh_rdac: switch to scsi_execute_req_flags()

2017-01-23 Thread Christoph Hellwig
From: Hannes Reinecke 

Switch to scsi_execute_req_flags() and scsi_get_vpd_page() instead of
open-coding it.  Using scsi_execute_req_flags() will set REQ_QUIET and
REQ_PREEMPT, but this is okay as we're evaluating the errors anyway and
should be able to send the command even if the device is quiesced.

Signed-off-by: Hannes Reinecke 
Signed-off-by: Christoph Hellwig 
---
 drivers/scsi/device_handler/scsi_dh_rdac.c | 174 +
 1 file changed, 51 insertions(+), 123 deletions(-)

diff --git a/drivers/scsi/device_handler/scsi_dh_rdac.c 
b/drivers/scsi/device_handler/scsi_dh_rdac.c
index 00d9c32..b64eaae 100644
--- a/drivers/scsi/device_handler/scsi_dh_rdac.c
+++ b/drivers/scsi/device_handler/scsi_dh_rdac.c
@@ -205,7 +205,6 @@ struct rdac_dh_data {
 #define RDAC_NON_PREFERRED 1
charpreferred;
 
-   unsigned char   sense[SCSI_SENSE_BUFFERSIZE];
union   {
struct c2_inquiry c2;
struct c4_inquiry c4;
@@ -262,40 +261,12 @@ do { \
sdev_printk(KERN_INFO, sdev, RDAC_NAME ": " f "\n", ## arg); \
 } while (0);
 
-static struct request *get_rdac_req(struct scsi_device *sdev,
-   void *buffer, unsigned buflen, int rw)
+static unsigned int rdac_failover_get(struct rdac_controller *ctlr,
+ struct list_head *list,
+ unsigned char *cdb)
 {
-   struct request *rq;
-   struct request_queue *q = sdev->request_queue;
-
-   rq = blk_get_request(q, rw, GFP_NOIO);
-
-   if (IS_ERR(rq)) {
-   sdev_printk(KERN_INFO, sdev,
-   "get_rdac_req: blk_get_request failed.\n");
-   return NULL;
-   }
-   blk_rq_set_block_pc(rq);
-
-   if (buflen && blk_rq_map_kern(q, rq, buffer, buflen, GFP_NOIO)) {
-   blk_put_request(rq);
-   sdev_printk(KERN_INFO, sdev,
-   "get_rdac_req: blk_rq_map_kern failed.\n");
-   return NULL;
-   }
-
-   rq->cmd_flags |= REQ_FAILFAST_DEV | REQ_FAILFAST_TRANSPORT |
-REQ_FAILFAST_DRIVER;
-   rq->retries = RDAC_RETRIES;
-   rq->timeout = RDAC_TIMEOUT;
-
-   return rq;
-}
-
-static struct request *rdac_failover_get(struct scsi_device *sdev,
-   struct rdac_dh_data *h, struct list_head *list)
-{
-   struct request *rq;
+   struct scsi_device *sdev = ctlr->ms_sdev;
+   struct rdac_dh_data *h = sdev->handler_data;
struct rdac_mode_common *common;
unsigned data_size;
struct rdac_queue_data *qdata;
@@ -332,27 +303,17 @@ static struct request *rdac_failover_get(struct 
scsi_device *sdev,
lun_table[qdata->h->lun] = 0x81;
}
 
-   /* get request for block layer packet command */
-   rq = get_rdac_req(sdev, &h->ctlr->mode_select, data_size, WRITE);
-   if (!rq)
-   return NULL;
-
/* Prepare the command. */
if (h->ctlr->use_ms10) {
-   rq->cmd[0] = MODE_SELECT_10;
-   rq->cmd[7] = data_size >> 8;
-   rq->cmd[8] = data_size & 0xff;
+   cdb[0] = MODE_SELECT_10;
+   cdb[7] = data_size >> 8;
+   cdb[8] = data_size & 0xff;
} else {
-   rq->cmd[0] = MODE_SELECT;
-   rq->cmd[4] = data_size;
+   cdb[0] = MODE_SELECT;
+   cdb[4] = data_size;
}
-   rq->cmd_len = COMMAND_SIZE(rq->cmd[0]);
-
-   rq->sense = h->sense;
-   memset(rq->sense, 0, SCSI_SENSE_BUFFERSIZE);
-   rq->sense_len = 0;
 
-   return rq;
+   return data_size;
 }
 
 static void release_controller(struct kref *kref)
@@ -400,46 +361,14 @@ static struct rdac_controller *get_controller(int index, 
char *array_name,
return ctlr;
 }
 
-static int submit_inquiry(struct scsi_device *sdev, int page_code,
- unsigned int len, struct rdac_dh_data *h)
-{
-   struct request *rq;
-   struct request_queue *q = sdev->request_queue;
-   int err = SCSI_DH_RES_TEMP_UNAVAIL;
-
-   rq = get_rdac_req(sdev, &h->inq, len, READ);
-   if (!rq)
-   goto done;
-
-   /* Prepare the command. */
-   rq->cmd[0] = INQUIRY;
-   rq->cmd[1] = 1;
-   rq->cmd[2] = page_code;
-   rq->cmd[4] = len;
-   rq->cmd_len = COMMAND_SIZE(INQUIRY);
-
-   rq->sense = h->sense;
-   memset(rq->sense, 0, SCSI_SENSE_BUFFERSIZE);
-   rq->sense_len = 0;
-
-   err = blk_execute_rq(q, NULL, rq, 1);
-   if (err == -EIO)
-   err = SCSI_DH_IO;
-
-   blk_put_request(rq);
-done:
-   return err;
-}
-
 static int get_lun_info(struct scsi_device *sdev, struct rdac_dh_data *h,
char *array_name, u8 *array_id)
 {
-   int err, i;
-   struct c8_inquiry *inqp;
+   

split scsi passthrough fields out of struct request

2017-01-23 Thread Christoph Hellwig
Hi all,

this series splits the support for SCSI passthrough commands from the
main struct request used all over the block layer into a separate
scsi_request structure that drivers wanting to support SCSI passthrough
need to embed as the first thing in their request-private data, similar
to how we handle NVMe passthrough commands.

To support this I've added support for private data after the request
structure to the legacy request path as well, so that it can be treated
the same way as the blk-mq path.  Compared to the current scsi_cmnd
allocator this is actually a major simplification.

Compared to the previous RFC version the major change is that dm-mpath
works with this version.  To make it work I've switched the legacy
request dm-rq to use the same clone and map method as the blk-mq version.
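The embedding rule described above, the scsi_request coming first in the driver's request-private data, can be sketched in plain user-space C (all names here are illustrative stand-ins, not the actual kernel types):

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Illustrative stand-in for the new passthrough structure. */
struct scsi_request {
	unsigned char cmd[16];
	unsigned int cmd_len;
	int result;
};

/*
 * Driver-private data: the scsi_request must be the FIRST member, so
 * generic passthrough code can treat the per-request payload as a
 * scsi_request without knowing the driver's full layout.
 */
struct driver_pdu {
	struct scsi_request sreq;	/* first member, by convention */
	int driver_state;
};

/* Valid only because sreq sits at offset 0 of the pdu. */
static struct scsi_request *pdu_to_scsi_request(void *pdu)
{
	return (struct scsi_request *)pdu;
}
```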


[PATCH 02/16] block: simplify blk_init_allocated_queue

2017-01-23 Thread Christoph Hellwig
Return an errno value instead of the passed in queue so that the callers
don't have to keep track of two queues, and move the assignment of the
request_fn and lock to the caller as passing them as argument doesn't
simplify anything.  While we're at it also remove two pointless NULL
assignments, given that the request structure is zeroed on allocation.
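In miniature, the new calling convention looks like this (toy types, not the real block-layer API): returning an errno lets the caller keep a single queue pointer instead of juggling two.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

struct toy_queue {
	int initialized;
};

/* New-style init: operate on the caller's queue, report an errno. */
static int toy_init_allocated_queue(struct toy_queue *q)
{
	if (!q)
		return -EINVAL;
	q->initialized = 1;
	return 0;
}

/* Caller pattern after the change: one pointer, one error check. */
static struct toy_queue *toy_init_queue(void)
{
	struct toy_queue *q = calloc(1, sizeof(*q));

	if (!q)
		return NULL;
	if (toy_init_allocated_queue(q) < 0) {
		free(q);
		return NULL;
	}
	return q;
}
```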

Signed-off-by: Christoph Hellwig 
---
 block/blk-core.c   | 38 +++---
 drivers/md/dm-rq.c |  3 ++-
 include/linux/blkdev.h |  3 +--
 3 files changed, 18 insertions(+), 26 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 732b719..d2c2fc9 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -823,15 +823,19 @@ EXPORT_SYMBOL(blk_init_queue);
 struct request_queue *
 blk_init_queue_node(request_fn_proc *rfn, spinlock_t *lock, int node_id)
 {
-   struct request_queue *uninit_q, *q;
+   struct request_queue *q;
 
-   uninit_q = blk_alloc_queue_node(GFP_KERNEL, node_id);
-   if (!uninit_q)
+   q = blk_alloc_queue_node(GFP_KERNEL, node_id);
+   if (!q)
return NULL;
 
-   q = blk_init_allocated_queue(uninit_q, rfn, lock);
-   if (!q)
-   blk_cleanup_queue(uninit_q);
+   q->request_fn = rfn;
+   if (lock)
+   q->queue_lock = lock;
+   if (blk_init_allocated_queue(q) < 0) {
+   blk_cleanup_queue(q);
+   return NULL;
+   }
 
return q;
 }
@@ -839,30 +843,19 @@ EXPORT_SYMBOL(blk_init_queue_node);
 
 static blk_qc_t blk_queue_bio(struct request_queue *q, struct bio *bio);
 
-struct request_queue *
-blk_init_allocated_queue(struct request_queue *q, request_fn_proc *rfn,
-spinlock_t *lock)
-{
-   if (!q)
-   return NULL;
 
+int blk_init_allocated_queue(struct request_queue *q)
+{
q->fq = blk_alloc_flush_queue(q, NUMA_NO_NODE, 0);
if (!q->fq)
-   return NULL;
+   return -ENOMEM;
 
	if (blk_init_rl(&q->root_rl, q, GFP_KERNEL))
goto fail;
 
	INIT_WORK(&q->timeout_work, blk_timeout_work);
-   q->request_fn   = rfn;
-   q->prep_rq_fn   = NULL;
-   q->unprep_rq_fn = NULL;
q->queue_flags  |= QUEUE_FLAG_DEFAULT;
 
-   /* Override internal queue lock with supplied lock pointer */
-   if (lock)
-   q->queue_lock   = lock;
-
/*
 * This also sets hw/phys segments, boundary and size
 */
@@ -880,13 +873,12 @@ blk_init_allocated_queue(struct request_queue *q, 
request_fn_proc *rfn,
}
 
	mutex_unlock(&q->sysfs_lock);
-
-   return q;
+   return 0;
 
 fail:
blk_free_flush_queue(q->fq);
wbt_exit(q);
-   return NULL;
+   return -ENOMEM;
 }
 EXPORT_SYMBOL(blk_init_allocated_queue);
 
diff --git a/drivers/md/dm-rq.c b/drivers/md/dm-rq.c
index 9d7275f..93f6e9f 100644
--- a/drivers/md/dm-rq.c
+++ b/drivers/md/dm-rq.c
@@ -823,7 +823,8 @@ static void dm_old_request_fn(struct request_queue *q)
 int dm_old_init_request_queue(struct mapped_device *md)
 {
/* Fully initialize the queue */
-   if (!blk_init_allocated_queue(md->queue, dm_old_request_fn, NULL))
+   md->queue->request_fn = dm_old_request_fn;
+   if (blk_init_allocated_queue(md->queue) < 0)
return -EINVAL;
 
/* disable dm_old_request_fn's merge heuristic by default */
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 8e0b57e..a036c4a 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1131,8 +1131,7 @@ extern void blk_unprep_request(struct request *);
 extern struct request_queue *blk_init_queue_node(request_fn_proc *rfn,
spinlock_t *lock, int node_id);
 extern struct request_queue *blk_init_queue(request_fn_proc *, spinlock_t *);
-extern struct request_queue *blk_init_allocated_queue(struct request_queue *,
- request_fn_proc *, 
spinlock_t *);
+extern int blk_init_allocated_queue(struct request_queue *);
 extern void blk_cleanup_queue(struct request_queue *);
 extern void blk_queue_make_request(struct request_queue *, make_request_fn *);
 extern void blk_queue_bounce_limit(struct request_queue *, u64);
-- 
2.1.4



[PATCH 01/16] block: fix elevator init check

2017-01-23 Thread Christoph Hellwig
We can't initialize the elevator fields for flushes as flushes share
space in struct request with the elevator data.  But currently we can't
communicate that a request is a flush through blk_get_request as we can
only pass READ or WRITE, and the low-level code looks at the possibly
NULL bio to check for a flush.

Fix this by allowing any block op and flags to be passed, and by
checking for the flush flags in __get_request.
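The replacement check is a plain flag test on the op word; a self-contained sketch with made-up flag values (the kernel's actual REQ_* encoding differs):

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative flag values, not the kernel's REQ_PREFLUSH/REQ_FUA bits. */
#define TOY_REQ_PREFLUSH (1u << 0)
#define TOY_REQ_FUA      (1u << 1)

/*
 * Flush requests skip elevator initialization so they can share that
 * space in struct request -- mirroring the check moved into
 * __get_request() by this patch, which replaces the NULL-bio heuristic.
 */
static bool toy_should_init_elevator(unsigned int op)
{
	return !(op & (TOY_REQ_PREFLUSH | TOY_REQ_FUA));
}
```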

Signed-off-by: Christoph Hellwig 
---
 block/blk-core.c | 26 --
 1 file changed, 4 insertions(+), 22 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index a61f140..732b719 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1022,25 +1022,6 @@ int blk_update_nr_requests(struct request_queue *q, 
unsigned int nr)
return 0;
 }
 
-/*
- * Determine if elevator data should be initialized when allocating the
- * request associated with @bio.
- */
-static bool blk_rq_should_init_elevator(struct bio *bio)
-{
-   if (!bio)
-   return true;
-
-   /*
-* Flush requests do not use the elevator so skip initialization.
-* This allows a request to share the flush and elevator data.
-*/
-   if (bio->bi_opf & (REQ_PREFLUSH | REQ_FUA))
-   return false;
-
-   return true;
-}
-
 /**
  * __get_request - get a free request
  * @rl: request list to allocate from
@@ -1119,10 +1100,13 @@ static struct request *__get_request(struct 
request_list *rl, unsigned int op,
 * request is freed.  This guarantees icq's won't be destroyed and
 * makes creating new ones safe.
 *
+* Flush requests do not use the elevator so skip initialization.
+* This allows a request to share the flush and elevator data.
+*
 * Also, lookup icq while holding queue_lock.  If it doesn't exist,
 * it will be created after releasing queue_lock.
 */
-   if (blk_rq_should_init_elevator(bio) && !blk_queue_bypass(q)) {
+   if (!(op & (REQ_PREFLUSH | REQ_FUA)) && !blk_queue_bypass(q)) {
rq_flags |= RQF_ELVPRIV;
q->nr_rqs_elvpriv++;
if (et->icq_cache && ioc)
@@ -1276,8 +1260,6 @@ static struct request *blk_old_get_request(struct 
request_queue *q, int rw,
 {
struct request *rq;
 
-   BUG_ON(rw != READ && rw != WRITE);
-
/* create ioc upfront */
create_io_context(gfp_mask, q->node);
 
-- 
2.1.4



Re: __blk_mq_tag_busy() weirdness

2017-01-23 Thread Jens Axboe
On 01/23/2017 07:07 AM, Hannes Reinecke wrote:
> Hi Jens,
> 
> while hunting yet another queue stall I've been looking at
> __blk_mq_tag_busy():
> 
> bool __blk_mq_tag_busy(struct blk_mq_hw_ctx *hctx)
> {
>   if (!test_bit(BLK_MQ_S_TAG_ACTIVE, &hctx->state) &&
>   !test_and_set_bit(BLK_MQ_S_TAG_ACTIVE, &hctx->state))
>   atomic_inc(&hctx->tags->active_queues);
> 
>   return true;
> }
> 
> Is this some subtle usage of bitops (and I just didn't grasp it) or is
> the first test_bit() pointless?

Is it subtle? I've used that in a few places in blk-mq, and in mm/filemap.c
as well:

commit 7fcbbaf18392f0b17c95e2f033c8ccf87eecde1d
Author: Jens Axboe 
Date:   Thu May 22 11:54:16 2014 -0700

mm/filemap.c: avoid always dirtying mapping->flags on O_DIRECT

the test_and_* functions always need to be locked, and there's no point
in doing that if we only rarely need to change the value.

> If the former I _really_ would like to have some comments on why this is
> required.

Sure, but I really didn't think it was black magic.
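For reference, the idiom under discussion, an unlocked read that skips the locked read-modify-write in the common case, can be reproduced with C11 atomics (a user-space analog of the kernel bitops, not the actual blk-mq code):

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define TOY_TAG_ACTIVE 1u

/*
 * Returns true when this caller performed the 0 -> 1 transition.
 * The plain load up front skips the expensive atomic read-modify-write
 * in the common case where the flag is already set, so the cache line
 * is not dirtied needlessly -- the rationale in the commit quoted
 * above.
 */
static bool set_active_once(atomic_uint *flags)
{
	if (atomic_load_explicit(flags, memory_order_relaxed) & TOY_TAG_ACTIVE)
		return false;	/* already set: cheap early out */
	/* Rarely reached locked RMW, analogous to test_and_set_bit(). */
	return !(atomic_fetch_or(flags, TOY_TAG_ACTIVE) & TOY_TAG_ACTIVE);
}
```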

-- 
Jens Axboe



Re: [PATCH 4/4] nbd: add a nbd-control interface

2017-01-23 Thread Greg KH
On Mon, Jan 23, 2017 at 09:57:52AM -0500, Josef Bacik wrote:
> On Mon, Jan 23, 2017 at 9:52 AM, Greg KH  wrote:
> > On Mon, Jan 23, 2017 at 09:42:08AM -0500, Josef Bacik wrote:
> > > 
> > > 
> > >  On Sat, Jan 21, 2017 at 4:05 AM, Greg KH
> > >  wrote:
> > >  > On Fri, Jan 20, 2017 at 04:56:52PM -0500, Josef Bacik wrote:
> > >  > >  This patch mirrors the loop back device behavior with a few
> > >  > > changes.  First
> > >  > >  there is no DEL operation as NBD doesn't get as much churn as
> > > loop
> > >  > > devices do.
> > >  > >  Secondly the GET_NEXT operation can optionally create a new NBD
> > >  > > device or not.
> > >  > >  Our infrastructure people want to not allow NBD to create new
> > >  > > devices as it
> > >  > >  causes problems for them in containers.  However allow this to
> > > be
> > >  > > optional as
> > >  > >  things like the OSS NBD client probably doesn't care and would
> > > like
> > >  > > to just be
> > >  > >  given a device to use.
> > >  > >
> > >  > >  Signed-off-by: Josef Bacik 
> > >  >
> > >  > A random char device with odd ioctls?  Why?  There's no other
> > >  > configuration choice you could possibly use?  Where is the
> > > userspace
> > >  > tool that uses this new kernel api?
> > >  >
> > >  > You aren't passing in structures to the ioctl, so why does this
> > > HAVE to
> > >  > be an ioctl?
> > > 
> > >  Again, this is how loop does it so I assumed a known, regularly
> > > used API was
> > >  the best bet.  I can do literally anything, but these interfaces
> > > have to be
> > >  used by other people, including internal people.  The
> > > /dev/whatever-control
> > >  is a well established way for interacting with dynamic device
> > > drivers (loop,
> > >  DM, btrfs), so that's what I went with.  Thanks,
> > 
> > Again, please don't duplicate what loop did, we must _learn_ from
> > history, not repeat it :(
> 
> Sure but what am I supposed to do?  Have some random sysfs knobs?  Thanks,

It all depends on what you are trying to do.  I have yet to figure that
out at all here :(


Re: [PATCH 3/5] blk-throttle: Adjust two function calls together with a variable assignment

2017-01-23 Thread Jens Axboe
On 01/23/2017 02:20 AM, Johannes Thumshirn wrote:
> On Sun, Jan 22, 2017 at 09:33:08AM +0100, SF Markus Elfring wrote:
>> From: Markus Elfring 
>> Date: Sat, 21 Jan 2017 22:15:33 +0100
>>
>> The script "checkpatch.pl" pointed information out like the following.
>>
>> ERROR: do not use assignment in if condition
>>
>> Thus fix the affected source code places.
>>
>> Signed-off-by: Markus Elfring 
>> ---
> 
> Reviewed-by: Johannes Thumshirn 
> if Jens doesn't mind.

I don't mind these, but I completely agree on the patches for moving the
'error' assignment. What ends up happening for those cases is that
someone adds a new section and forgets to set 'error', and then all hell
breaks loose.

So Markus, don't bother sending those patches again for the block layer
or drivers, I'm not going to take them.

-- 
Jens Axboe



[PATCH 13/16] scsi: allocate scsi_cmnd structures as part of struct request

2017-01-23 Thread Christoph Hellwig
Rely on the new block layer functionality to allocate additional driver
specific data behind struct request instead of implementing it in SCSI
itself.
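The mechanism this relies on is the blk-mq convention that cmd_size bytes of driver data sit immediately behind struct request (cf. blk_mq_rq_to_pdu); a minimal user-space model of that layout:

```c
#include <assert.h>
#include <stdlib.h>

struct toy_request {
	int tag;
	/* cmd_size bytes of driver-private data follow immediately */
};

/*
 * Analog of blk_mq_rq_to_pdu(): the per-driver payload (here, what
 * would hold the scsi_cmnd) lives right after the request structure,
 * all in one allocation -- no separate command slab needed.
 */
static void *toy_rq_to_pdu(struct toy_request *rq)
{
	return rq + 1;
}

static struct toy_request *toy_alloc_request(size_t cmd_size)
{
	return calloc(1, sizeof(struct toy_request) + cmd_size);
}
```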

Signed-off-by: Christoph Hellwig 
---
 drivers/scsi/hosts.c  |  20 +--
 drivers/scsi/scsi.c   | 319 --
 drivers/scsi/scsi_error.c |  17 ++-
 drivers/scsi/scsi_lib.c   | 122 --
 drivers/scsi/scsi_priv.h  |   8 +-
 include/scsi/scsi_host.h  |   3 -
 6 files changed, 95 insertions(+), 394 deletions(-)

diff --git a/drivers/scsi/hosts.c b/drivers/scsi/hosts.c
index 6d29c4a..831a1c8 100644
--- a/drivers/scsi/hosts.c
+++ b/drivers/scsi/hosts.c
@@ -230,19 +230,6 @@ int scsi_add_host_with_dma(struct Scsi_Host *shost, struct 
device *dev,
}
}
 
-   /*
-* Note that we allocate the freelist even for the MQ case for now,
-* as we need a command set aside for scsi_reset_provider.  Having
-* the full host freelist and one command available for that is a
-* little heavy-handed, but avoids introducing a special allocator
-* just for this.  Eventually the structure of scsi_reset_provider
-* will need a major overhaul.
-*/
-   error = scsi_setup_command_freelist(shost);
-   if (error)
-   goto out_destroy_tags;
-
-
if (!shost->shost_gendev.parent)
	shost->shost_gendev.parent = dev ? dev : &platform_bus;
if (!dma_dev)
@@ -262,7 +249,7 @@ int scsi_add_host_with_dma(struct Scsi_Host *shost, struct 
device *dev,
 
	error = device_add(&shost->shost_gendev);
if (error)
-   goto out_destroy_freelist;
+   goto out_disable_runtime_pm;
 
scsi_host_set_state(shost, SHOST_RUNNING);
get_device(shost->shost_gendev.parent);
@@ -312,13 +299,11 @@ int scsi_add_host_with_dma(struct Scsi_Host *shost, 
struct device *dev,
	device_del(&shost->shost_dev);
  out_del_gendev:
	device_del(&shost->shost_gendev);
- out_destroy_freelist:
+ out_disable_runtime_pm:
	device_disable_async_suspend(&shost->shost_gendev);
	pm_runtime_disable(&shost->shost_gendev);
	pm_runtime_set_suspended(&shost->shost_gendev);
	pm_runtime_put_noidle(&shost->shost_gendev);
-   scsi_destroy_command_freelist(shost);
- out_destroy_tags:
if (shost_use_blk_mq(shost))
scsi_mq_destroy_tags(shost);
  fail:
@@ -359,7 +344,6 @@ static void scsi_host_dev_release(struct device *dev)
	kfree(dev_name(&shost->shost_dev));
}
 
-   scsi_destroy_command_freelist(shost);
if (shost_use_blk_mq(shost)) {
if (shost->tag_set.tags)
scsi_mq_destroy_tags(shost);
diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
index 2e24f31..3d8d215 100644
--- a/drivers/scsi/scsi.c
+++ b/drivers/scsi/scsi.c
@@ -98,163 +98,6 @@ EXPORT_SYMBOL(scsi_sd_probe_domain);
 ASYNC_DOMAIN_EXCLUSIVE(scsi_sd_pm_domain);
 EXPORT_SYMBOL(scsi_sd_pm_domain);
 
-struct scsi_host_cmd_pool {
-   struct kmem_cache   *cmd_slab;
-   unsigned intusers;
-   char*cmd_name;
-};
-
-static struct scsi_host_cmd_pool scsi_cmd_pool = {
-   .cmd_name   = "scsi_cmd_cache",
-};
-
-static DEFINE_MUTEX(host_cmd_pool_mutex);
-
-/**
- * scsi_host_free_command - internal function to release a command
- * @shost: host to free the command for
- * @cmd:   command to release
- *
- * the command must previously have been allocated by
- * scsi_host_alloc_command.
- */
-static void
-scsi_host_free_command(struct Scsi_Host *shost, struct scsi_cmnd *cmd)
-{
-   struct scsi_host_cmd_pool *pool = shost->cmd_pool;
-
-   if (cmd->prot_sdb)
-   kmem_cache_free(scsi_sdb_cache, cmd->prot_sdb);
-   scsi_free_sense_buffer(shost, cmd->sense_buffer);
-   kmem_cache_free(pool->cmd_slab, cmd);
-}
-
-/**
- * scsi_host_alloc_command - internal function to allocate command
- * @shost: SCSI host whose pool to allocate from
- * @gfp_mask:  mask for the allocation
- *
- * Returns a fully allocated command with sense buffer and protection
- * data buffer (where applicable) or NULL on failure
- */
-static struct scsi_cmnd *
-scsi_host_alloc_command(struct Scsi_Host *shost, gfp_t gfp_mask)
-{
-   struct scsi_host_cmd_pool *pool = shost->cmd_pool;
-   struct scsi_cmnd *cmd;
-
-   cmd = kmem_cache_zalloc(pool->cmd_slab, gfp_mask);
-   if (!cmd)
-   goto fail;
-
-   cmd->sense_buffer = scsi_alloc_sense_buffer(shost, gfp_mask,
-   NUMA_NO_NODE);
-   if (!cmd->sense_buffer)
-   goto fail_free_cmd;
-
-   if (scsi_host_get_prot(shost) >= SHOST_DIX_TYPE0_PROTECTION) {
-   cmd->prot_sdb = kmem_cache_zalloc(scsi_sdb_cache, gfp_mask);
-   if (!cmd->prot_sdb)
-   goto fail_free_sense;
-   }
-
-   return cmd;
-
-fail_free_sense:
-   scsi_free_sense_buffer(shost, cmd->sense_buffer);

[PATCH 11/16] scsi: remove scsi_cmd_dma_pool

2017-01-23 Thread Christoph Hellwig
There is no need for GFP_DMA allocations of the scsi_cmnd structures
themselves, all that might be DMAed to or from is the actual payload,
or the sense buffers.

Signed-off-by: Christoph Hellwig 
---
 drivers/scsi/scsi.c | 15 +--
 1 file changed, 1 insertion(+), 14 deletions(-)

diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
index 469aa0f..2e24f31 100644
--- a/drivers/scsi/scsi.c
+++ b/drivers/scsi/scsi.c
@@ -102,17 +102,10 @@ struct scsi_host_cmd_pool {
struct kmem_cache   *cmd_slab;
unsigned intusers;
char*cmd_name;
-   unsigned intslab_flags;
 };
 
 static struct scsi_host_cmd_pool scsi_cmd_pool = {
.cmd_name   = "scsi_cmd_cache",
-   .slab_flags = SLAB_HWCACHE_ALIGN,
-};
-
-static struct scsi_host_cmd_pool scsi_cmd_dma_pool = {
-   .cmd_name   = "scsi_cmd_cache(DMA)",
-   .slab_flags = SLAB_HWCACHE_ALIGN|SLAB_CACHE_DMA,
 };
 
 static DEFINE_MUTEX(host_cmd_pool_mutex);
@@ -290,8 +283,6 @@ scsi_find_host_cmd_pool(struct Scsi_Host *shost)
 {
if (shost->hostt->cmd_size)
return shost->hostt->cmd_pool;
-   if (shost->unchecked_isa_dma)
-   return _cmd_dma_pool;
return _cmd_pool;
 }
 
@@ -318,10 +309,6 @@ scsi_alloc_host_cmd_pool(struct Scsi_Host *shost)
return NULL;
}
 
-   pool->slab_flags = SLAB_HWCACHE_ALIGN;
-   if (shost->unchecked_isa_dma)
-   pool->slab_flags |= SLAB_CACHE_DMA;
-
if (hostt->cmd_size)
hostt->cmd_pool = pool;
 
@@ -349,7 +336,7 @@ scsi_get_host_cmd_pool(struct Scsi_Host *shost)
 
if (!pool->users) {
pool->cmd_slab = kmem_cache_create(pool->cmd_name, cmd_size, 0,
-  pool->slab_flags, NULL);
+  SLAB_HWCACHE_ALIGN, NULL);
if (!pool->cmd_slab)
goto out_free_pool;
}
-- 
2.1.4



[PATCH 14/16] block/bsg: move queue creation into bsg_setup_queue

2017-01-23 Thread Christoph Hellwig
Simplify the boilerplate code needed for bsg nodes a bit.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Johannes Thumshirn 
---
 block/bsg-lib.c | 21 +++--
 drivers/scsi/scsi_transport_fc.c| 36 
 drivers/scsi/scsi_transport_iscsi.c | 15 ---
 include/linux/bsg-lib.h |  5 ++---
 4 files changed, 25 insertions(+), 52 deletions(-)

diff --git a/block/bsg-lib.c b/block/bsg-lib.c
index 9d652a9..c74acf4 100644
--- a/block/bsg-lib.c
+++ b/block/bsg-lib.c
@@ -177,7 +177,7 @@ static int bsg_create_job(struct device *dev, struct 
request *req)
  *
  * Drivers/subsys should pass this to the queue init function.
  */
-void bsg_request_fn(struct request_queue *q)
+static void bsg_request_fn(struct request_queue *q)
__releases(q->queue_lock)
__acquires(q->queue_lock)
 {
@@ -214,24 +214,24 @@ void bsg_request_fn(struct request_queue *q)
put_device(dev);
spin_lock_irq(q->queue_lock);
 }
-EXPORT_SYMBOL_GPL(bsg_request_fn);
 
 /**
  * bsg_setup_queue - Create and add the bsg hooks so we can receive requests
  * @dev: device to attach bsg device to
- * @q: request queue setup by caller
  * @name: device to give bsg device
  * @job_fn: bsg job handler
  * @dd_job_size: size of LLD data needed for each job
- *
- * The caller should have setup the reuqest queue with bsg_request_fn
- * as the request_fn.
  */
-int bsg_setup_queue(struct device *dev, struct request_queue *q,
-   char *name, bsg_job_fn *job_fn, int dd_job_size)
+struct request_queue *bsg_setup_queue(struct device *dev, char *name,
+   bsg_job_fn *job_fn, int dd_job_size)
 {
+   struct request_queue *q;
int ret;
 
+   q = blk_init_queue(bsg_request_fn, NULL);
+   if (!q)
+   return ERR_PTR(-ENOMEM);
+
q->queuedata = dev;
q->bsg_job_size = dd_job_size;
q->bsg_job_fn = job_fn;
@@ -243,9 +243,10 @@ int bsg_setup_queue(struct device *dev, struct request_queue *q,
if (ret) {
printk(KERN_ERR "%s: bsg interface failed to "
   "initialize - register queue\n", dev->kobj.name);
-   return ret;
+   blk_cleanup_queue(q);
+   return ERR_PTR(ret);
}
 
-   return 0;
+   return q;
 }
 EXPORT_SYMBOL_GPL(bsg_setup_queue);
diff --git a/drivers/scsi/scsi_transport_fc.c b/drivers/scsi/scsi_transport_fc.c
index afcedec..13dcb9b 100644
--- a/drivers/scsi/scsi_transport_fc.c
+++ b/drivers/scsi/scsi_transport_fc.c
@@ -3765,7 +3765,6 @@ fc_bsg_hostadd(struct Scsi_Host *shost, struct fc_host_attrs *fc_host)
struct device *dev = &shost->shost_gendev;
struct fc_internal *i = to_fc_internal(shost->transportt);
struct request_queue *q;
-   int err;
char bsg_name[20];
 
fc_host->rqst_q = NULL;
@@ -3776,24 +3775,14 @@ fc_bsg_hostadd(struct Scsi_Host *shost, struct fc_host_attrs *fc_host)
snprintf(bsg_name, sizeof(bsg_name),
 "fc_host%d", shost->host_no);
 
-   q = blk_init_queue(bsg_request_fn, NULL);
-   if (!q) {
-   dev_err(dev,
-   "fc_host%d: bsg interface failed to initialize - no request queue\n",
-   shost->host_no);
-   return -ENOMEM;
-   }
-
-   __scsi_init_queue(shost, q);
-   err = bsg_setup_queue(dev, q, bsg_name, fc_bsg_dispatch,
-i->f->dd_bsg_size);
-   if (err) {
+   q = bsg_setup_queue(dev, bsg_name, fc_bsg_dispatch, i->f->dd_bsg_size);
+   if (IS_ERR(q)) {
dev_err(dev,
"fc_host%d: bsg interface failed to initialize - setup queue\n",
shost->host_no);
-   blk_cleanup_queue(q);
-   return err;
+   return PTR_ERR(q);
}
+   __scsi_init_queue(shost, q);
blk_queue_rq_timed_out(q, fc_bsg_job_timeout);
blk_queue_rq_timeout(q, FC_DEFAULT_BSG_TIMEOUT);
fc_host->rqst_q = q;
@@ -3825,27 +3814,18 @@ fc_bsg_rportadd(struct Scsi_Host *shost, struct fc_rport *rport)
struct device *dev = &rport->dev;
struct fc_internal *i = to_fc_internal(shost->transportt);
struct request_queue *q;
-   int err;
 
rport->rqst_q = NULL;
 
if (!i->f->bsg_request)
return -ENOTSUPP;
 
-   q = blk_init_queue(bsg_request_fn, NULL);
-   if (!q) {
-   dev_err(dev, "bsg interface failed to initialize - no request queue\n");
-   return -ENOMEM;
-   }
-
-   __scsi_init_queue(shost, q);
-   err = bsg_setup_queue(dev, q, NULL, fc_bsg_dispatch, i->f->dd_bsg_size);
-   if (err) {
+   q = bsg_setup_queue(dev, NULL, fc_bsg_dispatch, i->f->dd_bsg_size);
+   if (IS_ERR(q)) {
dev_err(dev, "failed to setup bsg queue\n");
-   

[PATCH 15/16] block: split scsi_request out of struct request

2017-01-23 Thread Christoph Hellwig
And require all drivers that want to support BLOCK_PC to allocate it
as the first thing of their private data.  To support this the legacy
IDE and BSG code is switched to set cmd_size on their queues to let
the block layer allocate the additional space.

Signed-off-by: Christoph Hellwig 
---
 block/blk-core.c | 31 ---
 block/blk-exec.c | 12 -
 block/blk-mq.c   | 10 
 block/bsg-lib.c  | 34 
 block/bsg.c  | 47 -
 block/scsi_ioctl.c   | 87 ++
 drivers/ata/libata-scsi.c|  2 +-
 drivers/block/cciss.c| 28 +-
 drivers/block/pktcdvd.c  |  6 +--
 drivers/block/virtio_blk.c   | 11 ++--
 drivers/cdrom/cdrom.c| 32 +--
 drivers/ide/ide-atapi.c  | 43 ---
 drivers/ide/ide-cd.c | 91 +++-
 drivers/ide/ide-cd_ioctl.c   |  1 +
 drivers/ide/ide-cd_verbose.c |  6 +--
 drivers/ide/ide-devsets.c|  9 ++--
 drivers/ide/ide-disk.c   |  1 +
 drivers/ide/ide-eh.c |  2 +-
 drivers/ide/ide-floppy.c |  4 +-
 drivers/ide/ide-io.c |  3 +-
 drivers/ide/ide-ioctls.c |  6 ++-
 drivers/ide/ide-park.c   | 12 +++--
 drivers/ide/ide-pm.c |  2 +
 drivers/ide/ide-probe.c  | 36 +++--
 drivers/ide/ide-tape.c   | 32 +--
 drivers/ide/ide-taskfile.c   |  1 +
 drivers/ide/sis5513.c|  2 +-
 drivers/message/fusion/mptsas.c  |  8 +--
 drivers/scsi/libfc/fc_lport.c|  2 +-
 drivers/scsi/libsas/sas_expander.c   |  8 +--
 drivers/scsi/libsas/sas_host_smp.c   | 38 ++---
 drivers/scsi/mpt3sas/mpt3sas_transport.c |  8 +--
 drivers/scsi/osd/osd_initiator.c | 19 +++
 drivers/scsi/osst.c  | 15 +++---
 drivers/scsi/qla2xxx/qla_bsg.c   |  2 +-
 drivers/scsi/qla2xxx/qla_isr.c   |  6 ++-
 drivers/scsi/qla2xxx/qla_mr.c|  2 +-
 drivers/scsi/scsi_error.c| 22 
 drivers/scsi/scsi_lib.c  | 48 +
 drivers/scsi/scsi_transport_sas.c|  5 ++
 drivers/scsi/sd.c|  4 +-
 drivers/scsi/sg.c| 30 ++-
 drivers/scsi/st.c| 22 
 drivers/target/target_core_pscsi.c   | 11 ++--
 fs/nfsd/blocklayout.c| 17 +++---
 include/linux/blkdev.h   | 11 
 include/linux/blktrace_api.h |  3 +-
 include/linux/ide.h  |  8 ++-
 include/scsi/scsi_cmnd.h |  2 +
 include/scsi/scsi_request.h  | 28 ++
 kernel/trace/blktrace.c  | 32 +--
 51 files changed, 483 insertions(+), 419 deletions(-)
 create mode 100644 include/scsi/scsi_request.h

diff --git a/block/blk-core.c b/block/blk-core.c
index 793ef96..fe5cc98 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -132,8 +132,6 @@ void blk_rq_init(struct request_queue *q, struct request *rq)
rq->__sector = (sector_t) -1;
INIT_HLIST_NODE(&rq->hash);
RB_CLEAR_NODE(&rq->rb_node);
-   rq->cmd = rq->__cmd;
-   rq->cmd_len = BLK_MAX_CDB;
rq->tag = -1;
rq->internal_tag = -1;
rq->start_time = jiffies;
@@ -160,8 +158,6 @@ static void req_bio_endio(struct request *rq, struct bio *bio,
 
 void blk_dump_rq_flags(struct request *rq, char *msg)
 {
-   int bit;
-
printk(KERN_INFO "%s: dev %s: type=%x, flags=%llx\n", msg,
rq->rq_disk ? rq->rq_disk->disk_name : "?", rq->cmd_type,
(unsigned long long) rq->cmd_flags);
@@ -171,13 +167,6 @@ void blk_dump_rq_flags(struct request *rq, char *msg)
   blk_rq_sectors(rq), blk_rq_cur_sectors(rq));
printk(KERN_INFO "  bio %p, biotail %p, len %u\n",
   rq->bio, rq->biotail, blk_rq_bytes(rq));
-
-   if (rq->cmd_type == REQ_TYPE_BLOCK_PC) {
-   printk(KERN_INFO "  cdb: ");
-   for (bit = 0; bit < BLK_MAX_CDB; bit++)
-   printk("%02x ", rq->cmd[bit]);
-   printk("\n");
-   }
 }
 EXPORT_SYMBOL(blk_dump_rq_flags);
 
@@ -1316,18 +1305,6 @@ struct request *blk_get_request(struct request_queue *q, int rw, gfp_t gfp_mask)
 EXPORT_SYMBOL(blk_get_request);
 
 /**
- * blk_rq_set_block_pc - initialize a request to type BLOCK_PC
- * @rq:request to be initialized
- *
- */
-void blk_rq_set_block_pc(struct request *rq)
-{
-   rq->cmd_type = REQ_TYPE_BLOCK_PC;
-   memset(rq->__cmd, 0, sizeof(rq->__cmd));
-}

Re: [PATCH] block: Initialize cfqq->ioprio_class in cfq_get_queue()

2017-01-23 Thread Jens Axboe
On 01/23/2017 07:06 AM, Alexander Potapenko wrote:
> KMSAN (KernelMemorySanitizer, a new error detection tool) reports use of
> uninitialized memory in cfq_init_cfqq():
> 
> ==
> BUG: KMSAN: use of unitialized memory
> ...
> Call Trace:
>  [< inline >] __dump_stack lib/dump_stack.c:15
>  [] dump_stack+0x157/0x1d0 lib/dump_stack.c:51
>  [] kmsan_report+0x205/0x360 ??:?
>  [] __msan_warning+0x5b/0xb0 ??:?
>  [< inline >] cfq_init_cfqq block/cfq-iosched.c:3754
>  [] cfq_get_queue+0xc80/0x14d0 block/cfq-iosched.c:3857
> ...
> origin:
>  [] save_stack_trace+0x27/0x50 arch/x86/kernel/stacktrace.c:67
>  [] kmsan_internal_poison_shadow+0xab/0x150 ??:?
>  [] kmsan_poison_slab+0xbb/0x120 ??:?
>  [< inline >] allocate_slab mm/slub.c:1627
>  [] new_slab+0x3af/0x4b0 mm/slub.c:1641
>  [< inline >] new_slab_objects mm/slub.c:2407
>  [] ___slab_alloc+0x323/0x4a0 mm/slub.c:2564
>  [< inline >] __slab_alloc mm/slub.c:2606
>  [< inline >] slab_alloc_node mm/slub.c:2669
>  [] kmem_cache_alloc_node+0x1d2/0x1f0 mm/slub.c:2746
>  [] cfq_get_queue+0x47d/0x14d0 block/cfq-iosched.c:3850
> ...
> ==
> (the line numbers are relative to 4.8-rc6, but the bug persists
> upstream)
> 
> The uninitialized struct cfq_queue is created by kmem_cache_alloc_node()
> and then passed to cfq_init_cfqq(), which accesses cfqq->ioprio_class
> before it's initialized.

Patch looks fine to me, thanks. Is this a new warning? We don't seem
to have changed this path in a while, yet I wonder why this is only
surfacing now.

-- 
Jens Axboe



[PATCH 01/10] blk-mq: create debugfs directory tree

2017-01-23 Thread Omar Sandoval
From: Omar Sandoval 

In preparation for putting blk-mq debugging information in debugfs,
create a directory tree mirroring the one in sysfs:

# tree -d /sys/kernel/debug/block
/sys/kernel/debug/block
|-- nvme0n1
|   `-- mq
|   |-- 0
|   |   `-- cpu0
|   |-- 1
|   |   `-- cpu1
|   |-- 2
|   |   `-- cpu2
|   `-- 3
|   `-- cpu3
`-- vda
`-- mq
`-- 0
|-- cpu0
|-- cpu1
|-- cpu2
`-- cpu3

Also add the scaffolding for the actual files that will go in here,
either under the hardware queue or software queue directories.

Signed-off-by: Omar Sandoval 
---
 block/Makefile |   1 +
 block/blk-mq-debugfs.c | 152 +
 block/blk-mq-sysfs.c   |   8 +++
 block/blk-mq.c |   2 +
 block/blk-mq.h |  33 +++
 include/linux/blkdev.h |   5 ++
 6 files changed, 201 insertions(+)
 create mode 100644 block/blk-mq-debugfs.c

diff --git a/block/Makefile b/block/Makefile
index 3ee0abd7205a..6cabe6bd2882 100644
--- a/block/Makefile
+++ b/block/Makefile
@@ -26,3 +26,4 @@ obj-$(CONFIG_BLK_DEV_INTEGRITY) += bio-integrity.o blk-integrity.o t10-pi.o
 obj-$(CONFIG_BLK_MQ_PCI)   += blk-mq-pci.o
 obj-$(CONFIG_BLK_DEV_ZONED)+= blk-zoned.o
 obj-$(CONFIG_BLK_WBT)  += blk-wbt.o
+obj-$(CONFIG_DEBUG_FS) += blk-mq-debugfs.o
diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
new file mode 100644
index ..01711bbf5ade
--- /dev/null
+++ b/block/blk-mq-debugfs.c
@@ -0,0 +1,152 @@
+/*
+ * Copyright (C) 2017 Facebook
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public
+ * License v2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <linux/kernel.h>
+#include <linux/blkdev.h>
+#include <linux/debugfs.h>
+
+#include <linux/blk-mq.h>
+#include "blk-mq.h"
+
+struct blk_mq_debugfs_attr {
+   const char *name;
+   umode_t mode;
+   const struct file_operations *fops;
+};
+
+static struct dentry *block_debugfs_root;
+
+static const struct blk_mq_debugfs_attr blk_mq_debugfs_hctx_attrs[] = {
+};
+
+static const struct blk_mq_debugfs_attr blk_mq_debugfs_ctx_attrs[] = {
+};
+
+int blk_mq_debugfs_register(struct request_queue *q, const char *name)
+{
+   if (!block_debugfs_root)
+   return -ENOENT;
+
+   q->debugfs_dir = debugfs_create_dir(name, block_debugfs_root);
+   if (!q->debugfs_dir)
+   goto err;
+
+   if (blk_mq_debugfs_register_hctxs(q))
+   goto err;
+
+   return 0;
+
+err:
+   blk_mq_debugfs_unregister(q);
+   return -ENOMEM;
+}
+
+void blk_mq_debugfs_unregister(struct request_queue *q)
+{
+   debugfs_remove_recursive(q->debugfs_dir);
+   q->mq_debugfs_dir = NULL;
+   q->debugfs_dir = NULL;
+}
+
+static int blk_mq_debugfs_register_ctx(struct request_queue *q,
+  struct blk_mq_ctx *ctx,
+  struct dentry *hctx_dir)
+{
+   struct dentry *ctx_dir;
+   char name[20];
+   int i;
+
+   snprintf(name, sizeof(name), "cpu%u", ctx->cpu);
+   ctx_dir = debugfs_create_dir(name, hctx_dir);
+   if (!ctx_dir)
+   return -ENOMEM;
+
+   for (i = 0; i < ARRAY_SIZE(blk_mq_debugfs_ctx_attrs); i++) {
+   const struct blk_mq_debugfs_attr *attr;
+
attr = &blk_mq_debugfs_ctx_attrs[i];
+   if (!debugfs_create_file(attr->name, attr->mode, ctx_dir, ctx,
+attr->fops))
+   return -ENOMEM;
+   }
+
+   return 0;
+}
+
+static int blk_mq_debugfs_register_hctx(struct request_queue *q,
+   struct blk_mq_hw_ctx *hctx)
+{
+   struct blk_mq_ctx *ctx;
+   struct dentry *hctx_dir;
+   char name[20];
+   int i;
+
+   snprintf(name, sizeof(name), "%u", hctx->queue_num);
+   hctx_dir = debugfs_create_dir(name, q->mq_debugfs_dir);
+   if (!hctx_dir)
+   return -ENOMEM;
+
+   for (i = 0; i < ARRAY_SIZE(blk_mq_debugfs_hctx_attrs); i++) {
+   const struct blk_mq_debugfs_attr *attr;
+
attr = &blk_mq_debugfs_hctx_attrs[i];
+   if (!debugfs_create_file(attr->name, attr->mode, hctx_dir, hctx,
+attr->fops))
+   return -ENOMEM;
+   }
+
+   hctx_for_each_ctx(hctx, ctx, i) {
+   if 

[PATCH 00/10] blk-mq: move debugging information from sysfs to debugfs

2017-01-23 Thread Omar Sandoval
From: Omar Sandoval 

This series ends our abuse of sysfs and puts all of the debugging information
in debugfs instead. This has a few benefits:

1. Removes the possibility of userspace being stupid and relying on something
   in sysfs that we only exposed for debugging.
2. Lifts the limitations of sysfs, including the one-value-per-file convention
   and maximum of one page of output.
3. Allows us to add more debugging information that we often want but don't
   have when debugging a live system.

Thanks!
Omar

Omar Sandoval (10):
  blk-mq: create debugfs directory tree
  blk-mq: add hctx->{state,flags} to debugfs
  blk-mq: move hctx->dispatch and ctx->rq_list from sysfs to debugfs
  blk-mq: add extra request information to debugfs
  sbitmap: add helpers for dumping to a seq_file
  blk-mq: export software queue pending map to debugfs
  blk-mq: move tags and sched_tags info from sysfs to debugfs
  blk-mq: add tags and sched_tags bitmaps to debugfs
  blk-mq: move hctx io_poll, stats, and dispatched from sysfs to debugfs
  blk-mq: move hctx and ctx counters from sysfs to debugfs

 block/Makefile  |   1 +
 block/blk-mq-debugfs.c  | 750 
 block/blk-mq-sysfs.c| 248 ++--
 block/blk-mq-tag.c  |  27 --
 block/blk-mq-tag.h  |   1 -
 block/blk-mq.c  |   2 +
 block/blk-mq.h  |  33 +++
 include/linux/blkdev.h  |   5 +
 include/linux/sbitmap.h |  34 +++
 lib/sbitmap.c   |  83 ++
 10 files changed, 929 insertions(+), 255 deletions(-)
 create mode 100644 block/blk-mq-debugfs.c

-- 
2.11.0

--
To unsubscribe from this list: send the line "unsubscribe linux-block" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 03/10] blk-mq: move hctx->dispatch and ctx->rq_list from sysfs to debugfs

2017-01-23 Thread Omar Sandoval
From: Omar Sandoval 

These lists are only useful for debugging; they definitely don't belong
in sysfs. Putting them in debugfs also removes the limitation of a
single page of output.

Signed-off-by: Omar Sandoval 
---
 block/blk-mq-debugfs.c | 106 +
 block/blk-mq-sysfs.c   |  57 --
 2 files changed, 106 insertions(+), 57 deletions(-)

diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
index 0c14511fa9e0..1967c06d80e0 100644
--- a/block/blk-mq-debugfs.c
+++ b/block/blk-mq-debugfs.c
@@ -29,6 +29,20 @@ struct blk_mq_debugfs_attr {
 
 static struct dentry *block_debugfs_root;
 
+static int blk_mq_debugfs_seq_open(struct inode *inode, struct file *file,
+  const struct seq_operations *ops)
+{
+   struct seq_file *m;
+   int ret;
+
+   ret = seq_open(file, ops);
+   if (!ret) {
+   m = file->private_data;
+   m->private = inode->i_private;
+   }
+   return ret;
+}
+
 static int hctx_state_show(struct seq_file *m, void *v)
 {
struct blk_mq_hw_ctx *hctx = m->private;
@@ -69,12 +83,104 @@ static const struct file_operations hctx_flags_fops = {
.release= single_release,
 };
 
+static int blk_mq_debugfs_rq_show(struct seq_file *m, void *v)
+{
+   struct request *rq = list_entry_rq(v);
+
+   seq_printf(m, "%p\n", rq);
+   return 0;
+}
+
+static void *hctx_dispatch_start(struct seq_file *m, loff_t *pos)
+{
+   struct blk_mq_hw_ctx *hctx = m->private;
+
+   spin_lock(&hctx->lock);
+   return seq_list_start(&hctx->dispatch, *pos);
+}
+
+static void *hctx_dispatch_next(struct seq_file *m, void *v, loff_t *pos)
+{
+   struct blk_mq_hw_ctx *hctx = m->private;
+
+   return seq_list_next(v, &hctx->dispatch, pos);
+}
+
+static void hctx_dispatch_stop(struct seq_file *m, void *v)
+{
+   struct blk_mq_hw_ctx *hctx = m->private;
+
+   spin_unlock(&hctx->lock);
+}
+
+static const struct seq_operations hctx_dispatch_seq_ops = {
+   .start  = hctx_dispatch_start,
+   .next   = hctx_dispatch_next,
+   .stop   = hctx_dispatch_stop,
+   .show   = blk_mq_debugfs_rq_show,
+};
+
+static int hctx_dispatch_open(struct inode *inode, struct file *file)
+{
+   return blk_mq_debugfs_seq_open(inode, file, &hctx_dispatch_seq_ops);
+}
+
+static const struct file_operations hctx_dispatch_fops = {
+   .open   = hctx_dispatch_open,
+   .read   = seq_read,
+   .llseek = seq_lseek,
+   .release= seq_release,
+};
+
+static void *ctx_rq_list_start(struct seq_file *m, loff_t *pos)
+{
+   struct blk_mq_ctx *ctx = m->private;
+
+   spin_lock(&ctx->lock);
+   return seq_list_start(&ctx->rq_list, *pos);
+}
+
+static void *ctx_rq_list_next(struct seq_file *m, void *v, loff_t *pos)
+{
+   struct blk_mq_ctx *ctx = m->private;
+
+   return seq_list_next(v, &ctx->rq_list, pos);
+}
+
+static void ctx_rq_list_stop(struct seq_file *m, void *v)
+{
+   struct blk_mq_ctx *ctx = m->private;
+
+   spin_unlock(&ctx->lock);
+}
+
+static const struct seq_operations ctx_rq_list_seq_ops = {
+   .start  = ctx_rq_list_start,
+   .next   = ctx_rq_list_next,
+   .stop   = ctx_rq_list_stop,
+   .show   = blk_mq_debugfs_rq_show,
+};
+
+static int ctx_rq_list_open(struct inode *inode, struct file *file)
+{
+   return blk_mq_debugfs_seq_open(inode, file, &ctx_rq_list_seq_ops);
+}
+
+static const struct file_operations ctx_rq_list_fops = {
+   .open   = ctx_rq_list_open,
+   .read   = seq_read,
+   .llseek = seq_lseek,
+   .release= seq_release,
+};
+
 static const struct blk_mq_debugfs_attr blk_mq_debugfs_hctx_attrs[] = {
{"state", 0400, &hctx_state_fops},
{"flags", 0400, &hctx_flags_fops},
+   {"dispatch", 0400, &hctx_dispatch_fops},
 };
 
 static const struct blk_mq_debugfs_attr blk_mq_debugfs_ctx_attrs[] = {
+   {"rq_list", 0400, &ctx_rq_list_fops},
 };
 
 int blk_mq_debugfs_register(struct request_queue *q, const char *name)
diff --git a/block/blk-mq-sysfs.c b/block/blk-mq-sysfs.c
index 91b93ca1c1e0..ee3694d8c4ee 100644
--- a/block/blk-mq-sysfs.c
+++ b/block/blk-mq-sysfs.c
@@ -139,41 +139,6 @@ static ssize_t blk_mq_sysfs_completed_show(struct blk_mq_ctx *ctx, char *page)
ctx->rq_completed[0]);
 }
 
-static ssize_t sysfs_list_show(char *page, struct list_head *list, char *msg)
-{
-   struct request *rq;
-   int len = snprintf(page, PAGE_SIZE - 1, "%s:\n", msg);
-
-   list_for_each_entry(rq, list, queuelist) {
-   const int rq_len = 2 * sizeof(rq) + 2;
-
-   /* if the output will be truncated */
-   if (PAGE_SIZE - 1 < len + rq_len) {
-   /* backspacing if it can't hold '\t...\n' */
-   if (PAGE_SIZE - 1 < len + 5)
-   len -= rq_len;
-   len += 

[PATCH 05/10] sbitmap: add helpers for dumping to a seq_file

2017-01-23 Thread Omar Sandoval
From: Omar Sandoval 

This is useful debugging information that will be used in the blk-mq
debugfs directory.

Signed-off-by: Omar Sandoval 
---
 include/linux/sbitmap.h | 34 
 lib/sbitmap.c   | 83 +
 2 files changed, 117 insertions(+)

diff --git a/include/linux/sbitmap.h b/include/linux/sbitmap.h
index f017fd6e69c4..97758d1cf79a 100644
--- a/include/linux/sbitmap.h
+++ b/include/linux/sbitmap.h
@@ -259,6 +259,29 @@ static inline int sbitmap_test_bit(struct sbitmap *sb, unsigned int bitnr)
 unsigned int sbitmap_weight(const struct sbitmap *sb);
 
 /**
+ * sbitmap_show() - Dump bitmap information to a struct seq_file.
+ * @m: struct seq_file to write to.
+ * @v: Bitmap to show.
+ *
+ * Return: Zero.
+ *
+ * This is intended for debugging. The format may change at any time.
+ */
+int sbitmap_show(struct seq_file *m, void *v);
+
+/**
+ * sbitmap_bitmap_show() - Dump the raw bitmap to a struct seq_file.
+ * @m: struct seq_file to write to.
+ * @v: Bitmap to show.
+ *
+ * Return: Zero.
+ *
+ * This is intended for debugging. The output isn't guaranteed to be internally
+ * consistent.
+ */
+int sbitmap_bitmap_show(struct seq_file *m, void *v);
+
+/**
 * sbitmap_queue_init_node() - Initialize a &struct sbitmap_queue on a specific
  * memory node.
  * @sbq: Bitmap queue to initialize.
@@ -370,4 +393,15 @@ static inline struct sbq_wait_state *sbq_wait_ptr(struct sbitmap_queue *sbq,
  */
 void sbitmap_queue_wake_all(struct sbitmap_queue *sbq);
 
+/**
+ * sbitmap_queue_show() - Dump bitmap queue information to a struct seq_file.
+ * @m: struct seq_file to write to.
+ * @v: Bitmap queue to show.
+ *
+ * Return: Zero.
+ *
+ * This is intended for debugging. The format may change at any time.
+ */
+int sbitmap_queue_show(struct seq_file *m, void *v);
+
 #endif /* __LINUX_SCALE_BITMAP_H */
diff --git a/lib/sbitmap.c b/lib/sbitmap.c
index 8f5c3b268c77..f2b1ca9dfc36 100644
--- a/lib/sbitmap.c
+++ b/lib/sbitmap.c
@@ -17,6 +17,7 @@
 
#include <linux/random.h>
#include <linux/sbitmap.h>
+#include <linux/seq_file.h>
 
 int sbitmap_init_node(struct sbitmap *sb, unsigned int depth, int shift,
  gfp_t flags, int node)
@@ -180,6 +181,51 @@ unsigned int sbitmap_weight(const struct sbitmap *sb)
 }
 EXPORT_SYMBOL_GPL(sbitmap_weight);
 
+int sbitmap_show(struct seq_file *m, void *v)
+{
+   struct sbitmap *sb = v;
+
+   seq_printf(m, "depth=%u\n", sb->depth);
+   seq_printf(m, "weight=%u\n", sbitmap_weight(sb));
+   seq_printf(m, "bits_per_word=%u\n", 1U << sb->shift);
+   seq_printf(m, "map_nr=%u\n", sb->map_nr);
+   return 0;
+}
+EXPORT_SYMBOL_GPL(sbitmap_show);
+
+int sbitmap_bitmap_show(struct seq_file *m, void *v)
+{
+   struct sbitmap *sb = v;
+   u8 byte = 0;
+   unsigned int byte_bits = 0;
+   int i;
+
+   for (i = 0; i < sb->map_nr; i++) {
+   unsigned long word = READ_ONCE(sb->map[i].word);
+   unsigned int word_bits = READ_ONCE(sb->map[i].depth);
+
+   while (word_bits > 0) {
+   unsigned int bits = min(8 - byte_bits, word_bits);
+
+   byte |= (word & (BIT(bits) - 1)) << byte_bits;
+   byte_bits += bits;
+   if (byte_bits == 8) {
+   seq_write(m, &byte, sizeof(byte));
+   byte = 0;
+   byte_bits = 0;
+   }
+   word >>= bits;
+   word_bits -= bits;
+   }
+   }
+
+   if (byte_bits)
+   seq_write(m, &byte, sizeof(byte));
+
+   return 0;
+}
+EXPORT_SYMBOL_GPL(sbitmap_bitmap_show);
+
 static unsigned int sbq_calc_wake_batch(unsigned int depth)
 {
unsigned int wake_batch;
@@ -377,3 +423,40 @@ void sbitmap_queue_wake_all(struct sbitmap_queue *sbq)
}
 }
 EXPORT_SYMBOL_GPL(sbitmap_queue_wake_all);
+
+int sbitmap_queue_show(struct seq_file *m, void *v)
+{
+   struct sbitmap_queue *sbq = v;
+   bool first;
+   int i;
+
+   sbitmap_show(m, &sbq->sb);
+
+   seq_puts(m, "alloc_hint={");
+   first = true;
+   for_each_possible_cpu(i) {
+   if (!first)
+   seq_puts(m, ", ");
+   first = false;
+   seq_printf(m, "%u", *per_cpu_ptr(sbq->alloc_hint, i));
+   }
+   seq_puts(m, "}\n");
+
+   seq_printf(m, "wake_batch=%u\n", sbq->wake_batch);
+   seq_printf(m, "wake_index=%d\n", atomic_read(&sbq->wake_index));
+
+   seq_puts(m, "ws={\n");
+   for (i = 0; i < SBQ_WAIT_QUEUES; i++) {
+   struct sbq_wait_state *ws = &sbq->ws[i];
+
+   seq_printf(m, "\t{.wait_cnt=%d, .wait=%s},\n",
+  atomic_read(&ws->wait_cnt),
+  waitqueue_active(&ws->wait) ? "active" : "inactive");
+   }
+   seq_puts(m, "}\n");
+
+   seq_printf(m, "round_robin=%d\n", sbq->round_robin);
+
+   

[PATCH 08/10] blk-mq: add tags and sched_tags bitmaps to debugfs

2017-01-23 Thread Omar Sandoval
From: Omar Sandoval 

These can be used to debug issues like tag leaks and stuck requests.

Signed-off-by: Omar Sandoval 
---
 block/blk-mq-debugfs.c | 50 ++
 1 file changed, 50 insertions(+)

diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
index 48b3a11402fb..9f43b285fe3a 100644
--- a/block/blk-mq-debugfs.c
+++ b/block/blk-mq-debugfs.c
@@ -191,6 +191,30 @@ static const struct file_operations hctx_tags_fops = {
.release= single_release,
 };
 
+static int hctx_tags_bitmap_show(struct seq_file *m, void *v)
+{
+   struct blk_mq_hw_ctx *hctx = m->private;
+   struct request_queue *q = hctx->queue;
+
+   mutex_lock(&q->sysfs_lock);
+   if (hctx->tags)
+   sbitmap_bitmap_show(m, &hctx->tags->bitmap_tags.sb);
+   mutex_unlock(&q->sysfs_lock);
+   return 0;
+}
+
+static int hctx_tags_bitmap_open(struct inode *inode, struct file *file)
+{
+   return single_open(file, hctx_tags_bitmap_show, inode->i_private);
+}
+
+static const struct file_operations hctx_tags_bitmap_fops = {
+   .open   = hctx_tags_bitmap_open,
+   .read   = seq_read,
+   .llseek = seq_lseek,
+   .release= single_release,
+};
+
 static int hctx_sched_tags_show(struct seq_file *m, void *v)
 {
struct blk_mq_hw_ctx *hctx = m->private;
@@ -216,6 +240,30 @@ static const struct file_operations hctx_sched_tags_fops = {
.release= single_release,
 };
 
+static int hctx_sched_tags_bitmap_show(struct seq_file *m, void *v)
+{
+   struct blk_mq_hw_ctx *hctx = m->private;
+   struct request_queue *q = hctx->queue;
+
+   mutex_lock(&q->sysfs_lock);
+   if (hctx->sched_tags)
+   sbitmap_bitmap_show(m, &hctx->sched_tags->bitmap_tags.sb);
+   mutex_unlock(&q->sysfs_lock);
+   return 0;
+}
+
+static int hctx_sched_tags_bitmap_open(struct inode *inode, struct file *file)
+{
+   return single_open(file, hctx_sched_tags_bitmap_show, inode->i_private);
+}
+
+static const struct file_operations hctx_sched_tags_bitmap_fops = {
+   .open   = hctx_sched_tags_bitmap_open,
+   .read   = seq_read,
+   .llseek = seq_lseek,
+   .release= single_release,
+};
+
 static void *ctx_rq_list_start(struct seq_file *m, loff_t *pos)
 {
struct blk_mq_ctx *ctx = m->private;
@@ -263,7 +311,9 @@ static const struct blk_mq_debugfs_attr blk_mq_debugfs_hctx_attrs[] = {
{"dispatch", 0400, &hctx_dispatch_fops},
{"ctx_map", 0400, &hctx_ctx_map_fops},
{"tags", 0400, &hctx_tags_fops},
+   {"tags_bitmap", 0400, &hctx_tags_bitmap_fops},
{"sched_tags", 0400, &hctx_sched_tags_fops},
+   {"sched_tags_bitmap", 0400, &hctx_sched_tags_bitmap_fops},
 };
 
 static const struct blk_mq_debugfs_attr blk_mq_debugfs_ctx_attrs[] = {
-- 
2.11.0



[PATCH 06/10] blk-mq: export software queue pending map to debugfs

2017-01-23 Thread Omar Sandoval
From: Omar Sandoval 

This is useful for debugging problems where we've gotten stuck with
requests in the software queues.

Signed-off-by: Omar Sandoval 
---
 block/blk-mq-debugfs.c | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
index ee947f2f605b..4ee6ab28f56f 100644
--- a/block/blk-mq-debugfs.c
+++ b/block/blk-mq-debugfs.c
@@ -134,6 +134,20 @@ static const struct file_operations hctx_dispatch_fops = {
.release= seq_release,
 };
 
+static int hctx_ctx_map_open(struct inode *inode, struct file *file)
+{
+   struct blk_mq_hw_ctx *hctx = inode->i_private;
+
+   return single_open(file, sbitmap_bitmap_show, &hctx->ctx_map);
+}
+
+static const struct file_operations hctx_ctx_map_fops = {
+   .open   = hctx_ctx_map_open,
+   .read   = seq_read,
+   .llseek = seq_lseek,
+   .release= single_release,
+};
+
 static void *ctx_rq_list_start(struct seq_file *m, loff_t *pos)
 {
struct blk_mq_ctx *ctx = m->private;
@@ -179,6 +193,7 @@ static const struct blk_mq_debugfs_attr blk_mq_debugfs_hctx_attrs[] = {
{"state", 0400, &hctx_state_fops},
{"flags", 0400, &hctx_flags_fops},
{"dispatch", 0400, &hctx_dispatch_fops},
+   {"ctx_map", 0400, &hctx_ctx_map_fops},
 };
 
 static const struct blk_mq_debugfs_attr blk_mq_debugfs_ctx_attrs[] = {
-- 
2.11.0



[PATCH 07/10] blk-mq: move tags and sched_tags info from sysfs to debugfs

2017-01-23 Thread Omar Sandoval
From: Omar Sandoval 

These are very tied to the blk-mq tag implementation, so exposing them
to sysfs isn't a great idea. Move the debugging information to debugfs
and add basic entries for the number of tags and the number of reserved
tags to sysfs.

Signed-off-by: Omar Sandoval 
---
 block/blk-mq-debugfs.c | 70 ++
 block/blk-mq-sysfs.c   | 33 
 block/blk-mq-tag.c | 27 ---
 block/blk-mq-tag.h |  1 -
 4 files changed, 86 insertions(+), 45 deletions(-)

diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
index 4ee6ab28f56f..48b3a11402fb 100644
--- a/block/blk-mq-debugfs.c
+++ b/block/blk-mq-debugfs.c
@@ -20,6 +20,7 @@
 
#include <linux/blk-mq.h>
 #include "blk-mq.h"
+#include "blk-mq-tag.h"
 
 struct blk_mq_debugfs_attr {
const char *name;
@@ -148,6 +149,73 @@ static const struct file_operations hctx_ctx_map_fops = {
.release= single_release,
 };
 
+static void blk_mq_debugfs_tags_show(struct seq_file *m,
+struct blk_mq_tags *tags)
+{
+   seq_printf(m, "nr_tags=%u\n", tags->nr_tags);
+   seq_printf(m, "nr_reserved_tags=%u\n", tags->nr_reserved_tags);
+   seq_printf(m, "active_queues=%d\n",
+  atomic_read(&tags->active_queues));
+
+   seq_puts(m, "\nbitmap_tags:\n");
+   sbitmap_queue_show(m, &tags->bitmap_tags);
+
+   if (tags->nr_reserved_tags) {
+   seq_puts(m, "\nbreserved_tags:\n");
+   sbitmap_queue_show(m, &tags->breserved_tags);
+   }
+}
+
+static int hctx_tags_show(struct seq_file *m, void *v)
+{
+   struct blk_mq_hw_ctx *hctx = m->private;
+   struct request_queue *q = hctx->queue;
+
+   mutex_lock(&q->sysfs_lock);
+   if (hctx->tags)
+   blk_mq_debugfs_tags_show(m, hctx->tags);
+   mutex_unlock(&q->sysfs_lock);
+
+   return 0;
+}
+
+static int hctx_tags_open(struct inode *inode, struct file *file)
+{
+   return single_open(file, hctx_tags_show, inode->i_private);
+}
+
+static const struct file_operations hctx_tags_fops = {
+   .open   = hctx_tags_open,
+   .read   = seq_read,
+   .llseek = seq_lseek,
+   .release= single_release,
+};
+
+static int hctx_sched_tags_show(struct seq_file *m, void *v)
+{
+   struct blk_mq_hw_ctx *hctx = m->private;
+   struct request_queue *q = hctx->queue;
+
+   mutex_lock(&q->sysfs_lock);
+   if (hctx->sched_tags)
+   blk_mq_debugfs_tags_show(m, hctx->sched_tags);
+   mutex_unlock(&q->sysfs_lock);
+
+   return 0;
+}
+
+static int hctx_sched_tags_open(struct inode *inode, struct file *file)
+{
+   return single_open(file, hctx_sched_tags_show, inode->i_private);
+}
+
+static const struct file_operations hctx_sched_tags_fops = {
+   .open   = hctx_sched_tags_open,
+   .read   = seq_read,
+   .llseek = seq_lseek,
+   .release= single_release,
+};
+
 static void *ctx_rq_list_start(struct seq_file *m, loff_t *pos)
 {
struct blk_mq_ctx *ctx = m->private;
@@ -194,6 +262,8 @@ static const struct blk_mq_debugfs_attr blk_mq_debugfs_hctx_attrs[] = {
{"flags", 0400, &hctx_flags_fops},
{"dispatch", 0400, &hctx_dispatch_fops},
{"ctx_map", 0400, &hctx_ctx_map_fops},
+   {"tags", 0400, &hctx_tags_fops},
+   {"sched_tags", 0400, &hctx_sched_tags_fops},
 };
 
 static const struct blk_mq_debugfs_attr blk_mq_debugfs_ctx_attrs[] = {
diff --git a/block/blk-mq-sysfs.c b/block/blk-mq-sysfs.c
index ee3694d8c4ee..a0ae1f536ed0 100644
--- a/block/blk-mq-sysfs.c
+++ b/block/blk-mq-sysfs.c
@@ -184,17 +184,16 @@ static ssize_t blk_mq_hw_sysfs_dispatched_show(struct blk_mq_hw_ctx *hctx, char *page)
return page - start_page;
 }
 
-static ssize_t blk_mq_hw_sysfs_sched_tags_show(struct blk_mq_hw_ctx *hctx, char *page)
+static ssize_t blk_mq_hw_sysfs_nr_tags_show(struct blk_mq_hw_ctx *hctx,
+   char *page)
 {
-   if (hctx->sched_tags)
-   return blk_mq_tag_sysfs_show(hctx->sched_tags, page);
-
-   return 0;
+   return sprintf(page, "%u\n", hctx->tags->nr_tags);
 }
 
-static ssize_t blk_mq_hw_sysfs_tags_show(struct blk_mq_hw_ctx *hctx, char *page)
+static ssize_t blk_mq_hw_sysfs_nr_reserved_tags_show(struct blk_mq_hw_ctx *hctx,
+char *page)
 {
-   return blk_mq_tag_sysfs_show(hctx->tags, page);
+   return sprintf(page, "%u\n", hctx->tags->nr_reserved_tags);
 }
 
static ssize_t blk_mq_hw_sysfs_active_show(struct blk_mq_hw_ctx *hctx, char *page)
@@ -293,18 +292,18 @@ static struct blk_mq_hw_ctx_sysfs_entry blk_mq_hw_sysfs_dispatched = {
.attr = {.name = "dispatched", .mode = S_IRUGO },
.show = blk_mq_hw_sysfs_dispatched_show,
 };
+static struct blk_mq_hw_ctx_sysfs_entry blk_mq_hw_sysfs_nr_tags = {
+   .attr = {.name = "nr_tags", .mode = S_IRUGO },
+   

[PATCH 09/10] blk-mq: move hctx io_poll, stats, and dispatched from sysfs to debugfs

2017-01-23 Thread Omar Sandoval
From: Omar Sandoval 

These statistics _might_ be useful to userspace, but it's better not to
commit to an ABI for these yet. Also, the dispatched file in sysfs
couldn't be cleared, so make it clearable like the others in debugfs.

Signed-off-by: Omar Sandoval 
---
 block/blk-mq-debugfs.c | 132 +
 block/blk-mq-sysfs.c   |  92 --
 2 files changed, 132 insertions(+), 92 deletions(-)

diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
index 9f43b285fe3a..37780d10bc9e 100644
--- a/block/blk-mq-debugfs.c
+++ b/block/blk-mq-debugfs.c
@@ -264,6 +264,135 @@ static const struct file_operations hctx_sched_tags_bitmap_fops = {
.release= single_release,
 };
 
+static int hctx_io_poll_show(struct seq_file *m, void *v)
+{
+   struct blk_mq_hw_ctx *hctx = m->private;
+
+   seq_printf(m, "considered=%lu\n", hctx->poll_considered);
+   seq_printf(m, "invoked=%lu\n", hctx->poll_invoked);
+   seq_printf(m, "success=%lu\n", hctx->poll_success);
+   return 0;
+}
+
+static int hctx_io_poll_open(struct inode *inode, struct file *file)
+{
+   return single_open(file, hctx_io_poll_show, inode->i_private);
+}
+
+static ssize_t hctx_io_poll_write(struct file *file, const char __user *buf,
+ size_t count, loff_t *ppos)
+{
+   struct seq_file *m = file->private_data;
+   struct blk_mq_hw_ctx *hctx = m->private;
+
+   hctx->poll_considered = hctx->poll_invoked = hctx->poll_success = 0;
+   return count;
+}
+
+static const struct file_operations hctx_io_poll_fops = {
+   .open   = hctx_io_poll_open,
+   .read   = seq_read,
+   .write  = hctx_io_poll_write,
+   .llseek = seq_lseek,
+   .release= single_release,
+};
+
+static void print_stat(struct seq_file *m, struct blk_rq_stat *stat)
+{
+   seq_printf(m, "samples=%d, mean=%lld, min=%llu, max=%llu",
+  stat->nr_samples, stat->mean, stat->min, stat->max);
+}
+
+static int hctx_stats_show(struct seq_file *m, void *v)
+{
+   struct blk_mq_hw_ctx *hctx = m->private;
+   struct blk_rq_stat stat[2];
+
+   blk_stat_init([BLK_STAT_READ]);
+   blk_stat_init([BLK_STAT_WRITE]);
+
+   blk_hctx_stat_get(hctx, stat);
+
+   seq_puts(m, "read: ");
+   print_stat(m, [BLK_STAT_READ]);
+   seq_puts(m, "\n");
+
+   seq_puts(m, "write: ");
+   print_stat(m, [BLK_STAT_WRITE]);
+   seq_puts(m, "\n");
+   return 0;
+}
+
+static int hctx_stats_open(struct inode *inode, struct file *file)
+{
+   return single_open(file, hctx_stats_show, inode->i_private);
+}
+
+static ssize_t hctx_stats_write(struct file *file, const char __user *buf,
+   size_t count, loff_t *ppos)
+{
+   struct seq_file *m = file->private_data;
+   struct blk_mq_hw_ctx *hctx = m->private;
+   struct blk_mq_ctx *ctx;
+   int i;
+
+   hctx_for_each_ctx(hctx, ctx, i) {
+   blk_stat_init(>stat[BLK_STAT_READ]);
+   blk_stat_init(>stat[BLK_STAT_WRITE]);
+   }
+   return count;
+}
+
+static const struct file_operations hctx_stats_fops = {
+   .open   = hctx_stats_open,
+   .read   = seq_read,
+   .write  = hctx_stats_write,
+   .llseek = seq_lseek,
+   .release= single_release,
+};
+
+static int hctx_dispatched_show(struct seq_file *m, void *v)
+{
+   struct blk_mq_hw_ctx *hctx = m->private;
+   int i;
+
+   seq_printf(m, "%8u\t%lu\n", 0U, hctx->dispatched[0]);
+
+   for (i = 1; i < BLK_MQ_MAX_DISPATCH_ORDER - 1; i++) {
+   unsigned int d = 1U << (i - 1);
+
+   seq_printf(m, "%8u\t%lu\n", d, hctx->dispatched[i]);
+   }
+
+   seq_printf(m, "%8u+\t%lu\n", 1U << (i - 1), hctx->dispatched[i]);
+   return 0;
+}
+
+static int hctx_dispatched_open(struct inode *inode, struct file *file)
+{
+   return single_open(file, hctx_dispatched_show, inode->i_private);
+}
+
+static ssize_t hctx_dispatched_write(struct file *file, const char __user *buf,
+size_t count, loff_t *ppos)
+{
+   struct seq_file *m = file->private_data;
+   struct blk_mq_hw_ctx *hctx = m->private;
+   int i;
+
+   for (i = 0; i < BLK_MQ_MAX_DISPATCH_ORDER; i++)
+   hctx->dispatched[i] = 0;
+   return count;
+}
+
+static const struct file_operations hctx_dispatched_fops = {
+   .open   = hctx_dispatched_open,
+   .read   = seq_read,
+   .write  = hctx_dispatched_write,
+   .llseek = seq_lseek,
+   .release= single_release,
+};
+
 static void *ctx_rq_list_start(struct seq_file *m, loff_t *pos)
 {
struct blk_mq_ctx *ctx = m->private;
@@ -314,6 +443,9 @@ static const struct blk_mq_debugfs_attr blk_mq_debugfs_hctx_attrs[] = {

[PATCH 02/10] blk-mq: add hctx->{state,flags} to debugfs

2017-01-23 Thread Omar Sandoval
From: Omar Sandoval 

hctx->state could come in handy for bugs where the hardware queue gets
stuck in the stopped state, and hctx->flags is just useful to know.

Signed-off-by: Omar Sandoval 
---
 block/blk-mq-debugfs.c | 42 ++
 1 file changed, 42 insertions(+)

diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
index 01711bbf5ade..0c14511fa9e0 100644
--- a/block/blk-mq-debugfs.c
+++ b/block/blk-mq-debugfs.c
@@ -29,7 +29,49 @@ struct blk_mq_debugfs_attr {
 
 static struct dentry *block_debugfs_root;
 
+static int hctx_state_show(struct seq_file *m, void *v)
+{
+   struct blk_mq_hw_ctx *hctx = m->private;
+
+   seq_printf(m, "0x%lx\n", hctx->state);
+   return 0;
+}
+
+static int hctx_state_open(struct inode *inode, struct file *file)
+{
+   return single_open(file, hctx_state_show, inode->i_private);
+}
+
+static const struct file_operations hctx_state_fops = {
+   .open   = hctx_state_open,
+   .read   = seq_read,
+   .llseek = seq_lseek,
+   .release= single_release,
+};
+
+static int hctx_flags_show(struct seq_file *m, void *v)
+{
+   struct blk_mq_hw_ctx *hctx = m->private;
+
+   seq_printf(m, "0x%lx\n", hctx->flags);
+   return 0;
+}
+
+static int hctx_flags_open(struct inode *inode, struct file *file)
+{
+   return single_open(file, hctx_flags_show, inode->i_private);
+}
+
+static const struct file_operations hctx_flags_fops = {
+   .open   = hctx_flags_open,
+   .read   = seq_read,
+   .llseek = seq_lseek,
+   .release= single_release,
+};
+
 static const struct blk_mq_debugfs_attr blk_mq_debugfs_hctx_attrs[] = {
+   {"state", 0400, _state_fops},
+   {"flags", 0400, _flags_fops},
 };
 
 static const struct blk_mq_debugfs_attr blk_mq_debugfs_ctx_attrs[] = {
-- 
2.11.0

--
To unsubscribe from this list: send the line "unsubscribe linux-block" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 04/10] blk-mq: add extra request information to debugfs

2017-01-23 Thread Omar Sandoval
From: Omar Sandoval 

The request pointers by themselves aren't super useful.

Signed-off-by: Omar Sandoval 
---
 block/blk-mq-debugfs.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
index 1967c06d80e0..ee947f2f605b 100644
--- a/block/blk-mq-debugfs.c
+++ b/block/blk-mq-debugfs.c
@@ -87,7 +87,9 @@ static int blk_mq_debugfs_rq_show(struct seq_file *m, void *v)
 {
struct request *rq = list_entry_rq(v);
 
-   seq_printf(m, "%p\n", rq);
+   seq_printf(m, "%p {.cmd_type=%u, .cmd_flags=0x%x, .rq_flags=0x%x, .tag=%d, .internal_tag=%d}\n",
+  rq, rq->cmd_type, rq->cmd_flags, (unsigned int)rq->rq_flags,
+  rq->tag, rq->internal_tag);
return 0;
 }
 
-- 
2.11.0



[PATCH 10/10] blk-mq: move hctx and ctx counters from sysfs to debugfs

2017-01-23 Thread Omar Sandoval
From: Omar Sandoval 

These counters aren't as out-of-place in sysfs as the other stuff, but
debugfs is a slightly better home for them.

Signed-off-by: Omar Sandoval 
---
 block/blk-mq-debugfs.c | 181 +
 block/blk-mq-sysfs.c   |  64 -
 2 files changed, 181 insertions(+), 64 deletions(-)

diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
index 37780d10bc9e..4a28c285c685 100644
--- a/block/blk-mq-debugfs.c
+++ b/block/blk-mq-debugfs.c
@@ -393,6 +393,88 @@ static const struct file_operations hctx_dispatched_fops = {
.release= single_release,
 };
 
+static int hctx_queued_show(struct seq_file *m, void *v)
+{
+   struct blk_mq_hw_ctx *hctx = m->private;
+
+   seq_printf(m, "%lu\n", hctx->queued);
+   return 0;
+}
+
+static int hctx_queued_open(struct inode *inode, struct file *file)
+{
+   return single_open(file, hctx_queued_show, inode->i_private);
+}
+
+static ssize_t hctx_queued_write(struct file *file, const char __user *buf,
+size_t count, loff_t *ppos)
+{
+   struct seq_file *m = file->private_data;
+   struct blk_mq_hw_ctx *hctx = m->private;
+
+   hctx->queued = 0;
+   return count;
+}
+
+static const struct file_operations hctx_queued_fops = {
+   .open   = hctx_queued_open,
+   .read   = seq_read,
+   .write  = hctx_queued_write,
+   .llseek = seq_lseek,
+   .release= single_release,
+};
+
+static int hctx_run_show(struct seq_file *m, void *v)
+{
+   struct blk_mq_hw_ctx *hctx = m->private;
+
+   seq_printf(m, "%lu\n", hctx->run);
+   return 0;
+}
+
+static int hctx_run_open(struct inode *inode, struct file *file)
+{
+   return single_open(file, hctx_run_show, inode->i_private);
+}
+
+static ssize_t hctx_run_write(struct file *file, const char __user *buf,
+size_t count, loff_t *ppos)
+{
+   struct seq_file *m = file->private_data;
+   struct blk_mq_hw_ctx *hctx = m->private;
+
+   hctx->run = 0;
+   return count;
+}
+
+static const struct file_operations hctx_run_fops = {
+   .open   = hctx_run_open,
+   .read   = seq_read,
+   .write  = hctx_run_write,
+   .llseek = seq_lseek,
+   .release= single_release,
+};
+
+static int hctx_active_show(struct seq_file *m, void *v)
+{
+   struct blk_mq_hw_ctx *hctx = m->private;
+
+   seq_printf(m, "%d\n", atomic_read(>nr_active));
+   return 0;
+}
+
+static int hctx_active_open(struct inode *inode, struct file *file)
+{
+   return single_open(file, hctx_active_show, inode->i_private);
+}
+
+static const struct file_operations hctx_active_fops = {
+   .open   = hctx_active_open,
+   .read   = seq_read,
+   .llseek = seq_lseek,
+   .release= single_release,
+};
+
 static void *ctx_rq_list_start(struct seq_file *m, loff_t *pos)
 {
struct blk_mq_ctx *ctx = m->private;
@@ -434,6 +516,99 @@ static const struct file_operations ctx_rq_list_fops = {
.release= seq_release,
 };
 
+static int ctx_dispatched_show(struct seq_file *m, void *v)
+{
+   struct blk_mq_ctx *ctx = m->private;
+
+   seq_printf(m, "%lu %lu\n", ctx->rq_dispatched[1], ctx->rq_dispatched[0]);
+   return 0;
+}
+
+static int ctx_dispatched_open(struct inode *inode, struct file *file)
+{
+   return single_open(file, ctx_dispatched_show, inode->i_private);
+}
+
+static ssize_t ctx_dispatched_write(struct file *file, const char __user *buf,
+   size_t count, loff_t *ppos)
+{
+   struct seq_file *m = file->private_data;
+   struct blk_mq_ctx *ctx = m->private;
+
+   ctx->rq_dispatched[0] = ctx->rq_dispatched[1] = 0;
+   return count;
+}
+
+static const struct file_operations ctx_dispatched_fops = {
+   .open   = ctx_dispatched_open,
+   .read   = seq_read,
+   .write  = ctx_dispatched_write,
+   .llseek = seq_lseek,
+   .release= single_release,
+};
+
+static int ctx_merged_show(struct seq_file *m, void *v)
+{
+   struct blk_mq_ctx *ctx = m->private;
+
+   seq_printf(m, "%lu\n", ctx->rq_merged);
+   return 0;
+}
+
+static int ctx_merged_open(struct inode *inode, struct file *file)
+{
+   return single_open(file, ctx_merged_show, inode->i_private);
+}
+
+static ssize_t ctx_merged_write(struct file *file, const char __user *buf,
+   size_t count, loff_t *ppos)
+{
+   struct seq_file *m = file->private_data;
+   struct blk_mq_ctx *ctx = m->private;
+
+   ctx->rq_merged = 0;
+   return count;
+}
+
+static const struct file_operations ctx_merged_fops = {
+   .open   = ctx_merged_open,
+   .read   = seq_read,
+   .write  = ctx_merged_write,

Re: [RFC PATCH v2 0/2] block: fix backing_dev_info lifetime

2017-01-23 Thread Thiago Jung Bauermann
Hello Dan,

Am Freitag, 6. Januar 2017, 17:02:51 BRST schrieb Dan Williams:
> v1 of these changes [1] was a one line change to bdev_get_queue() to
> prevent a shutdown crash when del_gendisk() races the final
> __blkdev_put().
> 
> While it is known at del_gendisk() time that the queue is still alive,
> Jan Kara points to other paths [2] that are racing __blkdev_put() where
> the assumption that ->bd_queue, or inode->i_wb is valid does not hold.
> 
> Fix that broken assumption, make it the case that if you have a live
> block_device, or block_device-inode that the corresponding queue and
> inode-write-back data is still valid.
> 
> These changes survive a run of the libnvdimm unit test suite which puts
> some stress on the block_device shutdown path.

I realize that the kernel test robot found problems with this series, but FWIW 
it fixes the bug mentioned in [2].

> [1]: http://marc.info/?l=linux-block=148366637105761=2
> [2]: http://www.spinics.net/lists/linux-fsdevel/msg105153.html

-- 
Thiago Jung Bauermann
IBM Linux Technology Center



[PATCH] block: correct documentation for blkdev_issue_discard() flags

2017-01-23 Thread Eric Biggers
From: Eric Biggers 

BLKDEV_IFL_* flags no longer exist; blkdev_issue_discard() now actually
takes BLKDEV_DISCARD_* flags.

Signed-off-by: Eric Biggers 
---
 block/blk-lib.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/blk-lib.c b/block/blk-lib.c
index ed89c8f4b2a0..463b76dd566f 100644
--- a/block/blk-lib.c
+++ b/block/blk-lib.c
@@ -109,7 +109,7 @@ EXPORT_SYMBOL(__blkdev_issue_discard);
  * @sector:start sector
  * @nr_sects:  number of sectors to discard
  * @gfp_mask:  memory allocation flags (for bio_alloc)
- * @flags: BLKDEV_IFL_* flags to control behaviour
+ * @flags: BLKDEV_DISCARD_* flags to control behaviour
  *
  * Description:
  *Issue a discard request for the sectors in question.
-- 
2.11.0.483.g087da7b7c-goog



[PATCH] block: remove outdated part of blkdev_issue_flush() comment

2017-01-23 Thread Eric Biggers
From: Eric Biggers 

blkdev_issue_flush() is now always synchronous, and it no longer has a
flags argument.  So remove the part of the comment about the WAIT flag.

Signed-off-by: Eric Biggers 
---
 block/blk-flush.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/block/blk-flush.c b/block/blk-flush.c
index 20b7c7a02f1c..3c0ab7361e46 100644
--- a/block/blk-flush.c
+++ b/block/blk-flush.c
@@ -490,8 +490,7 @@ void blk_insert_flush(struct request *rq)
  * Description:
  *Issue a flush for the block device in question. Caller can supply
  *room for storing the error offset in case of a flush error, if they
- *wish to. If WAIT flag is not passed then caller may check only what
- *request was pushed in some internal queue for later handling.
+ *wish to.
  */
 int blkdev_issue_flush(struct block_device *bdev, gfp_t gfp_mask,
sector_t *error_sector)
-- 
2.11.0.483.g087da7b7c-goog



[PATCH] blk-mq-sched: fix possible crash if changing scheduler fails

2017-01-23 Thread Omar Sandoval
From: Omar Sandoval 

In elevator_switch(), we free the old scheduler's tags and then
initialize the new scheduler. If initializing the new scheduler fails,
we fall back to the old scheduler, but our tags have already been freed.
There's no reason to free the sched_tags only to reallocate them, so
let's just reuse them, which makes it possible to safely fall back to
the old scheduler.

Signed-off-by: Omar Sandoval 
---
Applies to for-4.11/block. Reproduced the issue and verified the fix by
sticking an err = -ENOMEM in the right place. As far as I can tell, it's
safe to reuse the previously allocated sched_tags for the new scheduler.

 block/elevator.c | 20 +++-
 1 file changed, 15 insertions(+), 5 deletions(-)

diff --git a/block/elevator.c b/block/elevator.c
index ef7f59469acc..28283aaab04c 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -963,9 +963,6 @@ static int elevator_switch(struct request_queue *q, struct elevator_type *new_e)
if (old) {
old_registered = old->registered;
 
-   if (old->uses_mq)
-   blk_mq_sched_teardown(q);
-
if (!q->mq_ops)
blk_queue_bypass_start(q);
 
@@ -981,7 +978,14 @@ static int elevator_switch(struct request_queue *q, struct elevator_type *new_e)
/* allocate, init and register new elevator */
if (new_e) {
if (new_e->uses_mq) {
-   err = blk_mq_sched_setup(q);
+   /*
+* If we were previously using an I/O scheduler, the
+* sched_tags are already allocated.
+*/
+   if (old)
+   err = 0;
+   else
+   err = blk_mq_sched_setup(q);
if (!err)
err = new_e->ops.mq.init_sched(q, new_e);
} else
@@ -997,6 +1001,12 @@ static int elevator_switch(struct request_queue *q, struct elevator_type *new_e)
 
/* done, kill the old one and finish */
if (old) {
+   /*
+* If we switched to another I/O scheduler, then we shouldn't
+* free sched_tags.
+*/
+   if (old->uses_mq && !new_e)
+   blk_mq_sched_teardown(q);
elevator_exit(old);
if (!q->mq_ops)
blk_queue_bypass_end(q);
@@ -1015,7 +1025,7 @@ static int elevator_switch(struct request_queue *q, struct elevator_type *new_e)
return 0;
 
 fail_register:
-   if (q->mq_ops)
+   if (q->mq_ops && !old)
blk_mq_sched_teardown(q);
elevator_exit(q->elevator);
 fail_init:
-- 
2.11.0



Re: [PATCH 01/16] block: fix elevator init check

2017-01-23 Thread Hannes Reinecke
On 01/23/2017 04:29 PM, Christoph Hellwig wrote:
> We can't initialize the elevator fields for flushes as flushes share space
> in struct request with the elevator data.  But currently we can't
> communicate that a request is a flush through blk_get_request as we
> can only pass READ or WRITE, and the low-level code looks at the
> possible NULL bio to check for a flush.
> 
> Fix this by allowing to pass any block op and flags, and by checking for
> the flush flags in __get_request.
> 
> Signed-off-by: Christoph Hellwig 
> ---
>  block/blk-core.c | 26 --
>  1 file changed, 4 insertions(+), 22 deletions(-)
> 
Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
-- 
Dr. Hannes ReineckeTeamlead Storage & Networking
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)


Re: [PATCH 02/16] block: simplify blk_init_allocated_queue

2017-01-23 Thread Hannes Reinecke
On 01/23/2017 04:29 PM, Christoph Hellwig wrote:
> Return an errno value instead of the passed in queue so that the callers
> don't have to keep track of two queues, and move the assignment of the
> request_fn and lock to the caller as passing them as argument doesn't
> simplify anything.  While we're at it also remove two pointless NULL
> assignments, given that the request structure is zeroed on allocation.
> 
> Signed-off-by: Christoph Hellwig 
> ---
>  block/blk-core.c   | 38 +++---
>  drivers/md/dm-rq.c |  3 ++-
>  include/linux/blkdev.h |  3 +--
>  3 files changed, 18 insertions(+), 26 deletions(-)
> 
Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
-- 
Dr. Hannes ReineckeTeamlead Storage & Networking
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)


Re: [Lsf-pc] [LSF/MM TOPIC] Badblocks checking/representation in filesystems

2017-01-23 Thread Jan Kara
On Fri 20-01-17 07:42:09, Dan Williams wrote:
> On Fri, Jan 20, 2017 at 1:47 AM, Jan Kara  wrote:
> > On Thu 19-01-17 14:17:19, Vishal Verma wrote:
> >> On 01/18, Jan Kara wrote:
> >> > On Tue 17-01-17 15:37:05, Vishal Verma wrote:
> >> > 2) PMEM is exposed for DAX aware filesystem. This seems to be what you 
> >> > are
> >> > mostly interested in. We could possibly do something more efficient than
> >> > what NVDIMM driver does however the complexity would be relatively high 
> >> > and
> >> > frankly I'm far from convinced this is really worth it. If there are so
> >> > many badblocks this would matter, the HW has IMHO bigger problems than
> >> > performance.
> >>
> >> Correct, and Dave was of the opinion that once at least XFS has reverse
> >> mapping support (which it does now), adding badblocks information to
> >> that should not be a hard lift, and should be a better solution. I
> >> suppose should try to benchmark how much of a penalty the current badblock
> >> checking in the NVVDIMM driver imposes. The penalty is not because there
> >> may be a large number of badblocks, but just due to the fact that we
> >> have to do this check for every IO, in fact, every 'bvec' in a bio.
> >
> > Well, letting filesystem know is certainly good from error reporting quality
> > POV. I guess I'll leave it upto XFS guys to tell whether they can be more
> > efficient in checking whether current IO overlaps with any of given bad
> > blocks.
> >
> >> > Now my question: Why do we bother with badblocks at all? In cases 1) and 
> >> > 2)
> >> > if the platform can recover from MCE, we can just always access 
> >> > persistent
> >> > memory using memcpy_mcsafe(), if that fails, return -EIO. Actually that
> >> > seems to already happen so we just need to make sure all places handle
> >> > returned errors properly (e.g. fs/dax.c does not seem to) and we are 
> >> > done.
> >> > No need for bad blocks list at all, no slow down unless we hit a bad cell
> >> > and in that case who cares about performance when the data is gone...
> >>
> >> Even when we have MCE recovery, we cannot do away with the badblocks
> >> list:
> >> 1. My understanding is that the hardware's ability to do MCE recovery is
> >> limited/best-effort, and is not guaranteed. There can be circumstances
> >> that cause a "Processor Context Corrupt" state, which is unrecoverable.
> >
> > Well, then they have to work on improving the hardware. Because having HW
> > that just sometimes gets stuck instead of reporting bad storage is simply
> > not acceptable. And no matter how hard you try you cannot avoid MCEs from
> > OS when accessing persistent memory so OS just has no way to avoid that
> > risk.
> >
> >> 2. We still need to maintain a badblocks list so that we know what
> >> blocks need to be cleared (via the ACPI method) on writes.
> >
> > Well, why cannot we just do the write, see whether we got CMCI and if yes,
> > clear the error via the ACPI method?
> 
> I would need to check if you get the address reported in the CMCI, but
> it would only fire if the write triggered a read-modify-write cycle. I
> suspect most copies to pmem, through something like
> arch_memcpy_to_pmem(), are not triggering any reads. It also triggers
> asynchronously, so what data do you write after clearing the error?
> There may have been more writes while the CMCI was being delivered.

OK, I see. And if we just write new data but don't clear error on write
through the ACPI method, will we still get MCE on following read of that
data? But regardless whether we get MCE or not, I suppose that the memory
location will be still marked as bad in some ACPI table, won't it?

Honza
-- 
Jan Kara 
SUSE Labs, CR
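[Editorial aside: earlier in this thread Vishal notes that the NVDIMM driver has to consult the badblocks list for every bvec of every bio. That check boils down to a range-overlap test. The sketch below is illustrative only; the kernel's actual implementation is the badblocks API in block/badblocks.c, which keeps a sorted table, and `struct bad_range` / `overlaps_badblocks()` are hypothetical names.]

```c
#include <assert.h>

/* Hypothetical flat badblocks list: each entry covers
 * [start, start + len) in sectors. */
struct bad_range {
	unsigned long long start, len;
};

/* Does the I/O range [sector, sector + nr) overlap any bad range?
 * This is the per-bvec test the driver pays on every I/O. */
int overlaps_badblocks(const struct bad_range *bb, int nbb,
		       unsigned long long sector, unsigned long long nr)
{
	int i;

	for (i = 0; i < nbb; i++)
		/* standard half-open interval overlap test */
		if (sector < bb[i].start + bb[i].len &&
		    bb[i].start < sector + nr)
			return 1;
	return 0;
}
```

A real implementation would binary-search a sorted table instead of scanning linearly, which is why the per-I/O cost stays small when the list is short.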


Re: [PATCHSET v4] blk-mq-scheduling framework

2017-01-23 Thread Paolo Valente

> Il giorno 18 gen 2017, alle ore 17:21, Jens Axboe  ha scritto:
> 
> On 01/18/2017 08:14 AM, Paolo Valente wrote:
>> according to the function blk_mq_sched_put_request, the
>> mq.completed_request hook seems to always be invoked (if set) for a
>> request for which the mq.put_rq_priv is invoked (if set).
> 
> Correct, any request that came out of blk_mq_sched_get_request()
> will always have completed called on it, regardless of whether it
> had IO started on it or not.
> 

It seems that some request, after being dispatched, happens to have no
mq.put_rq_priv invoked on it now or then.  Is it expected?  If it is,
could you point me to the path through which the end of the life of
such a request is handled?

Thanks,
Paolo

> -- 
> Jens Axboe
> 



Re: BUG: KASAN: Use-after-free

2017-01-23 Thread Christoph Hellwig
On Mon, Jan 23, 2017 at 05:07:52PM +0100, Matias Bjørling wrote:
> Hi,
> 
> I could use some help verifying an use-after-free bug that I am seeing
> after the new direct I/O work went in.
> 
> When issuing a direct write io using libaio, a bio is referenced in the
> blkdev_direct_IO path, and then put again in the blkdev_bio_end_io path.
> However, there is a case where the bio is put twice, which leads to
> modules that rely on the bio after biodev_bio_end_io() to access it
> prematurely.

Can you reproduce anything like this with a normal block device?

> 
> The KASAN error report:
> 
> [   14.645916]
> ==
> [   14.648027] BUG: KASAN: use-after-free in blkdev_direct_IO+0x50c/0x600 at addr 8801ef30ea14

Can you resolve that address for me, please?

> Which boils down to the bio already being freed, when we hit the
> module's endio function.
> 
> The double bio_put() occurs when dio->should_dirty = 0, and dio->is_sync
> = 0. The flow follows:
> 
> Issuing:
>   blkdev_direct_IO
>  get_bio(bio)

bio refcounting in __blkdev_direct_IO is the following:

The first bio is allocated using bio_alloc_bioset to embed the dio
structure into it.  We then grab a second reference to that bio and only
that one.  Every other new bio just gets its single reference from
bio_alloc.

blkdev_bio_end_io then checks if we hit the last reference on the dio, in
which case it then drops the additional reference on the first bio after
calling the aio completion handler.  Once that is done it always drops the
reference for the current bio - either explicitly or through
bio_check_pages_dirty in the should_dirty case.

>  submit_io()
>rrpc make_request(bio)
>  get_bio(bio)
> Completion:
>   blkdev_bio_end_io
> bio_put(bio)
> bio_put(bio) - bio is freed

Yes, and that's how it's intended.

>   rrpc_end_io
> bio_put(bio) (use-after-free access)

Can you check this is the same bio that got submitted and it didn't
get bounced somewhere?
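[Editorial aside: the two references held on the first bio, as described above, can be modeled in a few lines. This is a toy sketch under stated assumptions, not the real struct bio or the actual __blkdev_direct_IO code; `toy_bio`, `toy_get()`, and `toy_put()` are hypothetical names.]

```c
/* Toy refcount model: refs hits zero exactly once, at which point the
 * bio is freed.  Any bio_put() after that is a use-after-free. */
struct toy_bio {
	int refs;
	int freed;
};

void toy_get(struct toy_bio *b)
{
	b->refs++;
}

void toy_put(struct toy_bio *b)
{
	if (--b->refs == 0)
		b->freed = 1;	/* models bio_free() */
}

/* Models the lifetime of the first bio: one reference from
 * bio_alloc_bioset(), one extra for the dio; blkdev_bio_end_io drops
 * both at completion.  Returns the refcount left afterwards. */
int refs_after_completion(void)
{
	struct toy_bio first = { .refs = 1, .freed = 0 }; /* bio_alloc_bioset */

	toy_get(&first);	/* the extra dio reference */
	/* ... submit, I/O completes ... */
	toy_put(&first);	/* end_io drops the dio reference */
	toy_put(&first);	/* end_io drops the bio's own reference */
	return first.refs;	/* 0: freed exactly once by this path */
}
```

In this model, a driver that takes its own reference must drop it before completion runs; an unbalanced put after end_io would drive the count negative on an already-freed bio, which matches the reported symptom.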


Re: [PATCHSET v4] blk-mq-scheduling framework

2017-01-23 Thread Jens Axboe
On 01/23/2017 10:04 AM, Paolo Valente wrote:
> 
>> Il giorno 18 gen 2017, alle ore 17:21, Jens Axboe  ha scritto:
>>
>> On 01/18/2017 08:14 AM, Paolo Valente wrote:
>>> according to the function blk_mq_sched_put_request, the
>>> mq.completed_request hook seems to always be invoked (if set) for a
>>> request for which the mq.put_rq_priv is invoked (if set).
>>
>> Correct, any request that came out of blk_mq_sched_get_request()
>> will always have completed called on it, regardless of whether it
>> had IO started on it or not.
>>
> 
> It seems that some request, after being dispatched, happens to have no
> mq.put_rq_priv invoked on it now or then.  Is it expected?  If it is,
> could you point me to the path through which the end of the life of
> such a request is handled?

I'm guessing that's a flush request. I added RQF_QUEUED to check for
that, if RQF_QUEUED is set, you know it has come from your get_request
handler.

I'm assuming that is it, let me know.

-- 
Jens Axboe



Re: [PATCH 4/6] blk-mq: Update queue map when changing queue count

2017-01-23 Thread Christoph Hellwig
Looks fine except for the nitpick pointed out by Sagi:

Reviewed-by: Christoph Hellwig 


Re: [PATCH] block: Initialize cfqq->ioprio_class in cfq_get_queue()

2017-01-23 Thread Jens Axboe
On 01/23/2017 09:17 AM, Alexander Potapenko wrote:
> On Mon, Jan 23, 2017 at 5:03 PM, Andrey Ryabinin  wrote:
>> 2017-01-23 17:06 GMT+03:00 Alexander Potapenko :
>>> KMSAN (KernelMemorySanitizer, a new error detection tool) reports use of
>>> uninitialized memory in cfq_init_cfqq():
>>>
>>> ==
>>> BUG: KMSAN: use of unitialized memory
>>> ...
>>> Call Trace:
>>>  [< inline >] __dump_stack lib/dump_stack.c:15
>>>  [] dump_stack+0x157/0x1d0 lib/dump_stack.c:51
>>>  [] kmsan_report+0x205/0x360 ??:?
>>>  [] __msan_warning+0x5b/0xb0 ??:?
>>>  [< inline >] cfq_init_cfqq block/cfq-iosched.c:3754
>>>  [] cfq_get_queue+0xc80/0x14d0 block/cfq-iosched.c:3857
>>> ...
>>> origin:
>>>  [] save_stack_trace+0x27/0x50 arch/x86/kernel/stacktrace.c:67
>>>  [] kmsan_internal_poison_shadow+0xab/0x150 ??:?
>>>  [] kmsan_poison_slab+0xbb/0x120 ??:?
>>>  [< inline >] allocate_slab mm/slub.c:1627
>>>  [] new_slab+0x3af/0x4b0 mm/slub.c:1641
>>>  [< inline >] new_slab_objects mm/slub.c:2407
>>>  [] ___slab_alloc+0x323/0x4a0 mm/slub.c:2564
>>>  [< inline >] __slab_alloc mm/slub.c:2606
>>>  [< inline >] slab_alloc_node mm/slub.c:2669
>>>  [] kmem_cache_alloc_node+0x1d2/0x1f0 mm/slub.c:2746
>>>  [] cfq_get_queue+0x47d/0x14d0 block/cfq-iosched.c:3850
>>> ...
>>> ==
>>> (the line numbers are relative to 4.8-rc6, but the bug persists
>>> upstream)
>>>
>>> The uninitialized struct cfq_queue is created by kmem_cache_alloc_node()
>>> and then passed to cfq_init_cfqq(), which accesses cfqq->ioprio_class
>>> before it's initialized.
>>>
>>
>> struct cfq_queue is zero initialized (__GFP_ZERO).
>> The warning is false-positive.
> You are totally right. I've handled __GFP_ZERO in (hopefully) every
> case except for this one, and overlooked the presence of that flag in
> the kmem_cache_alloc_node().
> Thanks for double-checking!
> Jens, sorry for the false alarm.

No worries, I did queue up the patch, since even if it is a false
positive, it's cleaner to set this explicitly to NONE rather than
silently rely on the fact that NONE is 0.

-- 
Jens Axboe



Re: [PATCH 1/6] irq/affinity: Assign all online CPUs to vectors

2017-01-23 Thread Christoph Hellwig
On Wed, Jan 04, 2017 at 05:41:06PM -0500, Keith Busch wrote:
> This patch makes sure all online CPUs are assigned to vectors in
> cases where the nodes don't have the same number of online CPUs.
> The calculation for how many vectors needs to be assigned should account
> for the number of CPUs for a particular node on each round of assignment
> in order to ensure all online CPUs are assigned a vector when they don't
> evenly divide, and calculate extras accordingly.
> 
> Since we attempt to divide evenly among the nodes, this may still result
> in unused vectors if some nodes have fewer CPUs than nodes previously
> set up, but at least every online CPU will be assigned to something.

This looks fine:

I think we still should switch to something like all present or possible
CPUs for MSI-X vector and blk-mq queue assignment, though, which would
reduce the need for this:

Reviewed-by: Christoph Hellwig 
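[Editorial aside: the round-by-round accounting the patch describes can be sketched as below. `spread_vecs()` and `vecs_for_node()` are hypothetical helpers for illustration only, not the kernel's actual irq affinity code; the sketch just shows why a per-round ceiling share, capped by each node's CPU count, covers every online CPU even when counts don't divide evenly.]

```c
/* Give each remaining node a ceil() share of the remaining vectors,
 * capped at its online CPU count.  Later nodes see the leftover total,
 * so no online CPU is left without a vector; vectors beyond a node's
 * CPU count simply go unused, as the commit message notes. */
void spread_vecs(const int *cpus_per_node, int nnodes, int nvecs, int *out)
{
	int n, vecs_left = nvecs, nodes_left = nnodes;

	for (n = 0; n < nnodes; n++, nodes_left--) {
		/* ceiling division of what's left among remaining nodes */
		int v = (vecs_left + nodes_left - 1) / nodes_left;

		if (v > cpus_per_node[n])	/* never exceed a node's CPUs */
			v = cpus_per_node[n];
		out[n] = v;
		vecs_left -= v;
	}
}

/* Convenience wrapper so a single node's share is easy to inspect. */
int vecs_for_node(const int *cpus_per_node, int nnodes, int nvecs, int node)
{
	int out[16] = { 0 };

	spread_vecs(cpus_per_node, nnodes, nvecs, out);
	return out[node];
}
```

For example, nodes with {4, 2, 2} online CPUs and 8 vectors get {3, 2, 2}: one vector stays unused because the smaller nodes cap out, but every online CPU maps to some vector.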


Re: [PATCH 3/6] nvme/pci: Start queues after tagset is updated

2017-01-23 Thread Christoph Hellwig
Looks fine,

Reviewed-by: Christoph Hellwig 