Re: [PATCH 05/16] dm: always defer request allocation to the owner of the request_queue

2017-01-25 Thread Junichi Nomura
On 01/25/17 01:39, Mike Snitzer wrote:
> On Tue, Jan 24 2017 at  9:20am -0500, Christoph Hellwig  wrote:
>> On Tue, Jan 24, 2017 at 05:05:39AM -0500, Mike Snitzer wrote:
>>> possible and is a welcome cleanup.  The only concern I have is that
>>> using get_request() for the old request_fn request_queue eliminates
>>> the guaranteed availability of requests to allow for forward progress
>>> (on path failure or for the purposes of swap over mpath, etc).  This
>>> isn't a concern for blk-mq because, as you know, we have a fixed set
>>> of tags (and associated preallocated requests).
>>>
>>> So I'm left unconvinced that old request_fn request-based DM multipath
>>> isn't regressing in its ability to guarantee that a request will be
>>> available when needed on retry in the face of path failure.
>>
>> A mempool only needs to be large enough to guarantee forward progress,
>> so for get_request-based drivers under dm the theoretical minimum size
>> would be one, as we never rely on a second request to finish the first
>> one, and each request_queue has its own mempool(s) to start with.
> 
> Fair enough.  Cc'ing Junichi just in case he sees anything we're
> missing.

DM multipath could not use blk_get_request() because the function
was not callable from an interrupt-disabled context such as request_fn.

However, since the current code no longer calls blk_get_request()
from such a context, the change should be ok.
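
To illustrate the constraint (a hypothetical sketch, not code from the
patch): with GFP_KERNEL, blk_get_request() may sleep waiting for a free
request, so an interrupt-disabled caller can only make a non-blocking
attempt and must handle failure by requeueing:

	struct request *clone;

	/* GFP_ATOMIC: may fail, but never sleeps */
	clone = blk_get_request(q, rq_data_dir(rq), GFP_ATOMIC);
	if (IS_ERR(clone))
		return NULL;	/* caller requeues and retries later */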

-- 
Jun'ichi Nomura, NEC Corporation / NEC Solution Innovators, Ltd.


Re: [PATCH 05/16] dm: always defer request allocation to the owner of the request_queue

2017-01-24 Thread Christoph Hellwig
On Tue, Jan 24, 2017 at 11:39:51AM -0500, Mike Snitzer wrote:
> Fair enough.  Cc'ing Junichi just in case he sees anything we're
> missing.

Thanks, I'll wait for his response before reposting with the few accumulated
changes (including the dm cleanup).


Re: [PATCH 05/16] dm: always defer request allocation to the owner of the request_queue

2017-01-24 Thread Mike Snitzer
On Tue, Jan 24 2017 at  9:20am -0500,
Christoph Hellwig  wrote:

> On Tue, Jan 24, 2017 at 05:05:39AM -0500, Mike Snitzer wrote:
> > possible and is a welcome cleanup.  The only concern I have is that
> > using get_request() for the old request_fn request_queue eliminates
> > the guaranteed availability of requests to allow for forward progress
> > (on path failure or for the purposes of swap over mpath, etc).  This
> > isn't a concern for blk-mq because, as you know, we have a fixed set
> > of tags (and associated preallocated requests).
> > 
> > So I'm left unconvinced that old request_fn request-based DM multipath
> > isn't regressing in its ability to guarantee that a request will be
> > available when needed on retry in the face of path failure.
> 
> A mempool only needs to be large enough to guarantee forward progress,
> so for get_request-based drivers under dm the theoretical minimum size
> would be one, as we never rely on a second request to finish the first
> one, and each request_queue has its own mempool(s) to start with.

Fair enough.  Cc'ing Junichi just in case he sees anything we're
missing.
 
> > dm_mod's 'reserved_rq_based_ios' module_param governs the minimum
> > number of requests in the md->rq_pool (and defaults to 256 requests
> > per request-based DM request_queue), whereas blk_init_rl()'s
> > mempool_create_node() uses BLKDEV_MIN_RQ (4) even though
> > q->nr_requests = BLKDEV_MAX_RQ (128).  Also, this patch eliminates
> > the utility of the 'reserved_rq_based_ios' module_param without
> > actually removing it.
> 
> It's still used for the bio pool, so I couldn't simply remove it.

Ah, yeah, you're talking about:
pools->bs = bioset_create_nobvec(pool_size, front_pad);

Makes sense.

> > Anyway, should blk-core evolve to allow drivers to specify a custom
> > min_nr of requests in the old request_fn request_queue's mempool?  Or is
> > my concern overblown?
> 
> Thanks to mempool_resize we could add that functionality.  I just don't
> see an actual need for it.
> 
> > p.s. dm.c:dm_alloc_md_mempools() could be cleaned up a bit more since
> > only bio-based DM will have a pools->io_pool moving forward; but I can
> > circle back to that cleanup after.
> 
> If you're fine with doing more than the necessary changes in a patch
> that needs to go into the block tree I'd be happy to apply my usual
> cleanup magic to it.

I think this one example illustrates cleanup that makes sense even if it
goes in through the block tree.  While in there, it is best not to leave
extra branching that no longer makes sense.  So if you do another
version of this patchset, feel free to move this directly into
dm_alloc_md_mempools()'s BIO_BASED case:
	pools->io_pool = mempool_create_slab_pool(pool_size, _io_cache);
and eliminate the cachep pointer.
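
For concreteness, a sketch of the shape that cleanup could take (case
labels assumed from dm.c; not part of this patch):

	switch (type) {
	case DM_TYPE_BIO_BASED:
	case DM_TYPE_DAX_BIO_BASED:
		/* only bio-based DM keeps an io_pool */
		pools->io_pool = mempool_create_slab_pool(pool_size,
							  _io_cache);
		if (!pools->io_pool)
			goto out;
		break;
	case DM_TYPE_REQUEST_BASED:
	case DM_TYPE_MQ_REQUEST_BASED:
		/* requests and their payload now come from the
		 * underlying devices' queues */
		break;
	}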

But if not that is fine too.

Also:

Reviewed-by: Mike Snitzer 


Re: [PATCH 05/16] dm: always defer request allocation to the owner of the request_queue

2017-01-24 Thread Christoph Hellwig
On Tue, Jan 24, 2017 at 05:05:39AM -0500, Mike Snitzer wrote:
> possible and is a welcome cleanup.  The only concern I have is that
> using get_request() for the old request_fn request_queue eliminates the
> guaranteed availability of requests to allow for forward progress (on
> path failure or for the purposes of swap over mpath, etc).  This isn't
> a concern for blk-mq because, as you know, we have a fixed set of tags
> (and associated preallocated requests).
> 
> So I'm left unconvinced that old request_fn request-based DM multipath
> isn't regressing in its ability to guarantee that a request will be
> available when needed on retry in the face of path failure.

A mempool only needs to be large enough to guarantee forward progress,
so for get_request-based drivers under dm the theoretical minimum size
would be one, as we never rely on a second request to finish the first
one, and each request_queue has its own mempool(s) to start with.
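
To make the guarantee concrete, a minimal sketch (the cache name is
hypothetical):

	/* a single reserved element suffices for forward progress... */
	mempool_t *pool = mempool_create_slab_pool(1, clone_cache);

	/* ...because a GFP_KERNEL mempool_alloc() cannot fail: it
	 * waits until an element is returned via mempool_free() */
	struct request *rq = mempool_alloc(pool, GFP_KERNEL);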

> dm_mod's 'reserved_rq_based_ios' module_param governs the minimum
> number of requests in the md->rq_pool (and defaults to 256 requests per
> request-based DM request_queue), whereas blk_init_rl()'s
> mempool_create_node() uses BLKDEV_MIN_RQ (4) even though q->nr_requests
> = BLKDEV_MAX_RQ (128).  Also, this patch eliminates the utility of the
> 'reserved_rq_based_ios' module_param without actually removing it.

It's still used for the bio pool, so I couldn't simply remove it.

> Anyway, should blk-core evolve to allow drivers to specify a custom
> min_nr of requests in the old request_fn request_queue's mempool?  Or is
> my concern overblown?

Thanks to mempool_resize we could add that functionality.  I just don't
see an actual need for it.
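
As a sketch of what that could look like (the helper name is
hypothetical, and it assumes the legacy queue's root request_list):

	/* let a stacking driver raise the reserve of the legacy
	 * queue's request mempool after blk_init_queue() */
	static int blk_update_min_rq(struct request_queue *q, int min_nr)
	{
		return mempool_resize(q->root_rl.rq_pool, min_nr);
	}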

> p.s. dm.c:dm_alloc_md_mempools() could be cleaned up a bit more since
> only bio-based DM will have a pools->io_pool moving forward; but I can
> circle back to that cleanup after.

If you're fine with doing more than the necessary changes in a patch
that needs to go into the block tree I'd be happy to apply my usual
cleanup magic to it.


Re: [PATCH 05/16] dm: always defer request allocation to the owner of the request_queue

2017-01-24 Thread Mike Snitzer
On Mon, Jan 23 2017 at 10:29am -0500,
Christoph Hellwig  wrote:

> DM already calls blk_mq_alloc_request on the request_queue of the
> underlying device if it is a blk-mq device.  But now that we allow drivers
> to allocate additional data and initialize it ahead of time we need to do
> the same for all drivers.   Doing so and using the new cmd_size
> infrastructure in the block layer greatly simplifies the dm-rq and mpath
> code, and should also make arbitrary combinations of SQ and MQ devices
> with SQ or MQ device mapper tables easily possible as a further step.

Thanks for working (and suffering) through all of this request-based DM
code.  Nice to have someone else be painfully aware of the complexity in
request-based DM's old request_fn support.

The queue->cmd_size (per-request data) definitely makes this more
possible and is a welcome cleanup.  The only concern I have is that
using get_request() for the old request_fn request_queue eliminates the
guaranteed availability of requests to allow for forward progress (on
path failure or for the purposes of swap over mpath, etc).  This isn't
a concern for blk-mq because, as you know, we have a fixed set of tags
(and associated preallocated requests).
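
For background, the cmd_size scheme has the block layer allocate the
driver's per-request data immediately behind each struct request; a
sketch of the blk-mq side as dm-rq uses it (details abridged):

	/* at init time: reserve payload space behind every request */
	md->tag_set->cmd_size = sizeof(struct dm_rq_target_io) +
				ti->per_io_data_size;

	/* at I/O time: the payload sits right after the request */
	struct dm_rq_target_io *tio = blk_mq_rq_to_pdu(rq);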

So I'm left unconvinced that old request_fn request-based DM multipath
isn't regressing in its ability to guarantee that a request will be
available when needed on retry in the face of path failure.

dm_mod's 'reserved_rq_based_ios' module_param governs the minimum number
of requests in the md->rq_pool (and defaults to 256 requests per
request-based DM request_queue), whereas blk_init_rl()'s
mempool_create_node() uses BLKDEV_MIN_RQ (4) even though q->nr_requests
= BLKDEV_MAX_RQ (128).  Also, this patch eliminates the utility of the
'reserved_rq_based_ios' module_param without actually removing it.
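
For reference, the allocation being compared against looks
approximately like this in blk_init_rl():

	/* reserve is BLKDEV_MIN_RQ (4) elements, even though
	 * q->nr_requests defaults to BLKDEV_MAX_RQ (128) */
	rl->rq_pool = mempool_create_node(BLKDEV_MIN_RQ, mempool_alloc_slab,
					  mempool_free_slab, request_cachep,
					  gfp_mask, q->node);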

Anyway, should blk-core evolve to allow drivers to specify a custom
min_nr of requests in the old request_fn request_queue's mempool?  Or is
my concern overblown?

Seems we're very close to making this request-based DM cleanup doable.
Just would like some extra eyes and care/thought/guidance from yourself
and likely Jens.

Thanks,
Mike

p.s. dm.c:dm_alloc_md_mempools() could be cleaned up a bit more since
only bio-based DM will have a pools->io_pool moving forward; but I can
circle back to that cleanup after.


Re: [PATCH 05/16] dm: always defer request allocation to the owner of the request_queue

2017-01-24 Thread Hannes Reinecke
On 01/23/2017 04:29 PM, Christoph Hellwig wrote:
> DM already calls blk_mq_alloc_request on the request_queue of the
> underlying device if it is a blk-mq device.  But now that we allow drivers
> to allocate additional data and initialize it ahead of time we need to do
> the same for all drivers.   Doing so and using the new cmd_size
> infrastructure in the block layer greatly simplifies the dm-rq and mpath
> code, and should also make arbitrary combinations of SQ and MQ devices
> with SQ or MQ device mapper tables easily possible as a further step.
> 
> Signed-off-by: Christoph Hellwig 
> ---
>  drivers/md/dm-core.h          |   1 -
>  drivers/md/dm-mpath.c         | 132 --
>  drivers/md/dm-rq.c            | 251 ++
>  drivers/md/dm-rq.h            |   2 +-
>  drivers/md/dm-target.c        |   7 --
>  drivers/md/dm.c               |  18 +--
>  drivers/md/dm.h               |   3 +-
>  include/linux/device-mapper.h |   3 -
>  8 files changed, 81 insertions(+), 336 deletions(-)
> 
Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)


[PATCH 05/16] dm: always defer request allocation to the owner of the request_queue

2017-01-23 Thread Christoph Hellwig
DM already calls blk_mq_alloc_request on the request_queue of the
underlying device if it is a blk-mq device.  But now that we allow drivers
to allocate additional data and initialize it ahead of time we need to do
the same for all drivers.   Doing so and using the new cmd_size
infrastructure in the block layer greatly simplifies the dm-rq and mpath
code, and should also make arbitrary combinations of SQ and MQ devices
with SQ or MQ device mapper tables easily possible as a further step.

Signed-off-by: Christoph Hellwig 
---
 drivers/md/dm-core.h          |   1 -
 drivers/md/dm-mpath.c         | 132 --
 drivers/md/dm-rq.c            | 251 ++
 drivers/md/dm-rq.h            |   2 +-
 drivers/md/dm-target.c        |   7 --
 drivers/md/dm.c               |  18 +--
 drivers/md/dm.h               |   3 +-
 include/linux/device-mapper.h |   3 -
 8 files changed, 81 insertions(+), 336 deletions(-)

diff --git a/drivers/md/dm-core.h b/drivers/md/dm-core.h
index 40ceba1..136fda3 100644
--- a/drivers/md/dm-core.h
+++ b/drivers/md/dm-core.h
@@ -92,7 +92,6 @@ struct mapped_device {
 * io objects are allocated from here.
 */
mempool_t *io_pool;
-   mempool_t *rq_pool;
 
struct bio_set *bs;
 
diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c
index 6400cff..784f237 100644
--- a/drivers/md/dm-mpath.c
+++ b/drivers/md/dm-mpath.c
@@ -92,12 +92,6 @@ struct multipath {
 
unsigned queue_mode;
 
-   /*
-* We must use a mempool of dm_mpath_io structs so that we
-* can resubmit bios on error.
-*/
-   mempool_t *mpio_pool;
-
struct mutex work_mutex;
struct work_struct trigger_event;
 
@@ -115,8 +109,6 @@ struct dm_mpath_io {
 
 typedef int (*action_fn) (struct pgpath *pgpath);
 
-static struct kmem_cache *_mpio_cache;
-
 static struct workqueue_struct *kmultipathd, *kmpath_handlerd;
 static void trigger_event(struct work_struct *work);
 static void activate_path(struct work_struct *work);
@@ -209,7 +201,6 @@ static struct multipath *alloc_multipath(struct dm_target *ti)
	init_waitqueue_head(&m->pg_init_wait);
	mutex_init(&m->work_mutex);
 
-   m->mpio_pool = NULL;
m->queue_mode = DM_TYPE_NONE;
 
m->ti = ti;
@@ -229,16 +220,7 @@ static int alloc_multipath_stage2(struct dm_target *ti, struct multipath *m)
m->queue_mode = DM_TYPE_MQ_REQUEST_BASED;
else
m->queue_mode = DM_TYPE_REQUEST_BASED;
-   }
-
-   if (m->queue_mode == DM_TYPE_REQUEST_BASED) {
-   unsigned min_ios = dm_get_reserved_rq_based_ios();
-
-   m->mpio_pool = mempool_create_slab_pool(min_ios, _mpio_cache);
-   if (!m->mpio_pool)
-   return -ENOMEM;
-   }
-   else if (m->queue_mode == DM_TYPE_BIO_BASED) {
+   } else if (m->queue_mode == DM_TYPE_BIO_BASED) {
		INIT_WORK(&m->process_queued_bios, process_queued_bios);
/*
 * bio-based doesn't support any direct scsi_dh management;
@@ -263,7 +245,6 @@ static void free_multipath(struct multipath *m)
 
kfree(m->hw_handler_name);
kfree(m->hw_handler_params);
-   mempool_destroy(m->mpio_pool);
kfree(m);
 }
 
@@ -272,38 +253,6 @@ static struct dm_mpath_io *get_mpio(union map_info *info)
return info->ptr;
 }
 
-static struct dm_mpath_io *set_mpio(struct multipath *m, union map_info *info)
-{
-   struct dm_mpath_io *mpio;
-
-   if (!m->mpio_pool) {
-   /* Use blk-mq pdu memory requested via per_io_data_size */
-   mpio = get_mpio(info);
-   memset(mpio, 0, sizeof(*mpio));
-   return mpio;
-   }
-
-   mpio = mempool_alloc(m->mpio_pool, GFP_ATOMIC);
-   if (!mpio)
-   return NULL;
-
-   memset(mpio, 0, sizeof(*mpio));
-   info->ptr = mpio;
-
-   return mpio;
-}
-
-static void clear_request_fn_mpio(struct multipath *m, union map_info *info)
-{
-   /* Only needed for non blk-mq (.request_fn) multipath */
-   if (m->mpio_pool) {
-   struct dm_mpath_io *mpio = info->ptr;
-
-   info->ptr = NULL;
-   mempool_free(mpio, m->mpio_pool);
-   }
-}
-
 static size_t multipath_per_bio_data_size(void)
 {
return sizeof(struct dm_mpath_io) + sizeof(struct dm_bio_details);
@@ -530,16 +479,17 @@ static bool must_push_back_bio(struct multipath *m)
 /*
  * Map cloned requests (request-based multipath)
  */
-static int __multipath_map(struct dm_target *ti, struct request *clone,
-  union map_info *map_context,
-  struct request *rq, struct request **__clone)
+static int multipath_clone_and_map(struct dm_target *ti, struct request *rq,
+				   union map_info *map_context,
+				   struct request **__clone)