Re: [dm-devel] [PATCH 05/16] dm: always defer request allocation to the owner of the request_queue

2017-01-25 Thread Junichi Nomura
On 01/25/17 01:39, Mike Snitzer wrote:
> On Tue, Jan 24 2017 at  9:20am -0500, Christoph Hellwig  wrote:
>> On Tue, Jan 24, 2017 at 05:05:39AM -0500, Mike Snitzer wrote:
>>> possible and is welcomed cleanup.  The only concern I have is that using
>>> get_request() for the old request_fn request_queue eliminates the
>>> guaranteed availability of requests to allow for forward progress (on
>>> path failure or for the purposes of swap over mpath, etc).  This isn't a
>>> concern for blk-mq because as you know we have a fixed set of tags (and
>>> associated preallocated requests).
>>>
>>> So I'm left unconvinced old request_fn request-based DM multipath isn't
>>> regressing in how request resubmission can be assured a request will be
>>> available when needed on retry in the face of path failure.
>>
>> A mempool only needs to be sized large enough to guarantee forward
>> progress, so for get_request based drivers under dm the theoretical
>> minimum size would be one: we never rely on a second request to
>> finish the first one, and each request_queue has its own mempool(s)
>> to start with.
> 
> Fair enough.  Cc'ing Junichi just in case he sees anything we're
> missing.

DM multipath could not use blk_get_request() because the function
was not callable from interrupt-disabled context. E.g. request_fn.

However, since the current code no longer calls blk_get_request()
from such a context, the change should be ok.
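
To make the forward-progress guarantee concrete, a minimal sketch
(illustrative only, not from any patch in this series; the helper names
are borrowed from the later "block: allow specifying size for extra
command data" patch):

    /* Each request_queue sizes its own request mempool. */
    rl->rq_pool = mempool_create_node(1, /* min_nr: one reserved request */
                                      alloc_request_simple,
                                      free_request_simple,
                                      q, GFP_KERNEL, q->node);

    /*
     * A GFP_NOIO mempool_alloc() may sleep but cannot deadlock as long
     * as no request needs a second one in order to complete: freeing
     * any in-flight request replenishes the reserve and wakes the
     * waiter.
     */
    rq = mempool_alloc(rl->rq_pool, GFP_NOIO);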

-- 
Jun'ichi Nomura, NEC Corporation / NEC Solution Innovators, Ltd.

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel


Re: [dm-devel] [PATCH 18/18] block: don't assign cmd_flags in __blk_rq_prep_clone

2017-01-25 Thread Martin K. Petersen
> "Christoph" == Christoph Hellwig  writes:

Christoph> These days we have the proper flags set since request
Christoph> allocation time.

Acked-by: Martin K. Petersen 

-- 
Martin K. Petersen  Oracle Linux Engineering

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel


Re: [dm-devel] [PATCH 15/18] scsi: allocate scsi_cmnd structures as part of struct request

2017-01-25 Thread Martin K. Petersen
> "Christoph" == Christoph Hellwig  writes:

Christoph> Rely on the new block layer functionality to allocate
Christoph> additional driver specific data behind struct request instead
Christoph> of implementing it in SCSI itself.

Acked-by: Martin K. Petersen 

-- 
Martin K. Petersen  Oracle Linux Engineering

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel

Re: [dm-devel] [PATCH 14/18] scsi: remove __scsi_alloc_queue

2017-01-25 Thread Martin K. Petersen
> "Christoph" == Christoph Hellwig  writes:

Christoph> Instead do an internal export of __scsi_init_queue for the
Christoph> transport classes that export BSG nodes.

Acked-by: Martin K. Petersen 

-- 
Martin K. Petersen  Oracle Linux Engineering

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel


Re: [dm-devel] [PATCH 13/18] scsi: remove scsi_cmd_dma_pool

2017-01-25 Thread Martin K. Petersen
> "Christoph" == Christoph Hellwig  writes:

Christoph> There is no need for GFP_DMA allocations of the scsi_cmnd
Christoph> structures themselves, all that might be DMAed to or from is
Christoph> the actual payload, or the sense buffers.

Acked-by: Martin K. Petersen 

-- 
Martin K. Petersen  Oracle Linux Engineering

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel


Re: [dm-devel] [PATCH 12/18] scsi: respect unchecked_isa_dma for blk-mq

2017-01-25 Thread Martin K. Petersen
> "Christoph" == Christoph Hellwig  writes:

Christoph> Currently blk-mq always allocates the sense buffer using
Christoph> normal GFP_KERNEL allocation.  Refactor the cmd pool code to
Christoph> split the cmd and sense allocation and share the code to
Christoph> allocate the sense buffers as well as the sense buffer slab
Christoph> caches between the legacy and blk-mq path.

Acked-by: Martin K. Petersen 

-- 
Martin K. Petersen  Oracle Linux Engineering

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel


Re: [dm-devel] [PATCH 11/18] scsi: remove gfp_flags member in scsi_host_cmd_pool

2017-01-25 Thread Martin K. Petersen
> "Christoph" == Christoph Hellwig  writes:

Christoph> When using the slab allocator we already decide at cache
Christoph> creation time if an allocation comes from a GFP_DMA pool
Christoph> using the SLAB_CACHE_DMA flag, and there is no point passing
Christoph> the kmalloc-family only GFP_DMA flag to kmem_cache_alloc.
Christoph> Drop all the infrastructure for doing so.

Acked-by: Martin K. Petersen 

-- 
Martin K. Petersen  Oracle Linux Engineering

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel


Re: [dm-devel] [PATCH 08/18] scsi_dh_rdac: switch to scsi_execute_req_flags()

2017-01-25 Thread Martin K. Petersen
> "Christoph" == Christoph Hellwig  writes:

Christoph> From: Hannes Reinecke  Switch to
Christoph> scsi_execute_req_flags() and scsi_get_vpd_page() instead of
Christoph> open-coding it.  Using scsi_execute_req_flags() will set
Christoph> REQ_QUIET and REQ_PREEMPT, but this is okay as we're
Christoph> evaluating the errors anyway and should be able to send the
Christoph> command even if the device is quiesced.

Acked-by: Martin K. Petersen 

-- 
Martin K. Petersen  Oracle Linux Engineering

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel


Re: [dm-devel] [PATCH 05/18] block: allow specifying size for extra command data

2017-01-25 Thread Martin K. Petersen
> "Christoph" == Christoph Hellwig  writes:

Christoph,

Christoph> This mirrors the blk-mq capabilities to allocate extra
Christoph> drivers-specific data behind struct request by setting a
Christoph> cmd_size field, as well as having a constructor / destructor
Christoph> for it.

Nice!

A couple of minor nits:

+static void *alloc_request_size(gfp_t gfp_mask, void *data)

I like alloc_request_simple() but alloc_request_size() seems a bit
contrived. _reserve? _extra? _special? Don't have any good suggestions,
I'm afraid.

Also a bit heavy on the else brackets in a couple of places. But no biggie.

Reviewed-by: Martin K. Petersen 

-- 
Martin K. Petersen  Oracle Linux Engineering

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel


Re: [dm-devel] [PATCH 04/18] block: simplify blk_init_allocated_queue

2017-01-25 Thread Martin K. Petersen
> "Christoph" == Christoph Hellwig  writes:

Christoph> Return an errno value instead of the passed in queue so that
Christoph> the callers don't have to keep track of two queues, and move
Christoph> the assignment of the request_fn and lock to the caller as
Christoph> passing them as argument doesn't simplify anything.  While
Christoph> we're at it also remove two pointless NULL assignments, given
Christoph> that the request structure is zeroed on allocation.

Reviewed-by: Martin K. Petersen 

-- 
Martin K. Petersen  Oracle Linux Engineering

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel


Re: [dm-devel] [PATCH 03/18] block: fix elevator init check

2017-01-25 Thread Martin K. Petersen
> "Christoph" == Christoph Hellwig  writes:

Christoph> We can't initalize the elevator fields for flushes as flush
Christoph> share space in struct request with the elevator data.  But
Christoph> currently we can't commnicate that a request is a flush

communicate

Christoph> through blk_get_request as we can only pass READ or WRITE,
Christoph> and the low-level code looks at the possible NULL bio to
Christoph> check for a flush.

Christoph> Fix this by allowing to pass any block op and flags, and by
Christoph> checking for the flush flags in __get_request.

Reviewed-by: Martin K. Petersen 

-- 
Martin K. Petersen  Oracle Linux Engineering

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel


[dm-devel] [PATCH 01/18] block: add a op_is_flush helper

2017-01-25 Thread Christoph Hellwig
This centralizes the checks for bios that need to go into the flush
state machine.

Signed-off-by: Christoph Hellwig 
---
 block/blk-core.c             |  8 ++++----
 block/blk-mq-sched.c         |  5 ++---
 block/blk-mq.c               |  4 ++--
 drivers/md/bcache/request.c  |  2 +-
 drivers/md/dm-cache-target.c | 13 +++--
 drivers/md/dm-thin.c         | 13 +
 include/linux/blk_types.h    |  9 +
 7 files changed, 26 insertions(+), 28 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index a61f140..b830e14 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1035,7 +1035,7 @@ static bool blk_rq_should_init_elevator(struct bio *bio)
 * Flush requests do not use the elevator so skip initialization.
 * This allows a request to share the flush and elevator data.
 */
-   if (bio->bi_opf & (REQ_PREFLUSH | REQ_FUA))
+   if (op_is_flush(bio->bi_opf))
return false;
 
return true;
@@ -1641,7 +1641,7 @@ static blk_qc_t blk_queue_bio(struct request_queue *q, struct bio *bio)
return BLK_QC_T_NONE;
}
 
-   if (bio->bi_opf & (REQ_PREFLUSH | REQ_FUA)) {
+   if (op_is_flush(bio->bi_opf)) {
spin_lock_irq(q->queue_lock);
where = ELEVATOR_INSERT_FLUSH;
goto get_rq;
@@ -2145,7 +2145,7 @@ int blk_insert_cloned_request(struct request_queue *q, struct request *rq)
 */
BUG_ON(blk_queued_rq(rq));
 
-   if (rq->cmd_flags & (REQ_PREFLUSH | REQ_FUA))
+   if (op_is_flush(rq->cmd_flags))
where = ELEVATOR_INSERT_FLUSH;
 
add_acct_request(q, rq, where);
@@ -3256,7 +3256,7 @@ void blk_flush_plug_list(struct blk_plug *plug, bool from_schedule)
/*
 * rq is already accounted, so use raw insert
 */
-   if (rq->cmd_flags & (REQ_PREFLUSH | REQ_FUA))
+   if (op_is_flush(rq->cmd_flags))
__elv_add_request(q, rq, ELEVATOR_INSERT_FLUSH);
else
__elv_add_request(q, rq, ELEVATOR_INSERT_SORT_MERGE);
diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index d05061f..3bd66e5 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -111,7 +111,6 @@ struct request *blk_mq_sched_get_request(struct request_queue *q,
struct blk_mq_hw_ctx *hctx;
struct blk_mq_ctx *ctx;
struct request *rq;
-   const bool is_flush = op & (REQ_PREFLUSH | REQ_FUA);
 
blk_queue_enter_live(q);
ctx = blk_mq_get_ctx(q);
@@ -126,7 +125,7 @@ struct request *blk_mq_sched_get_request(struct request_queue *q,
 * Flush requests are special and go directly to the
 * dispatch list.
 */
-   if (!is_flush && e->type->ops.mq.get_request) {
+   if (!op_is_flush(op) && e->type->ops.mq.get_request) {
rq = e->type->ops.mq.get_request(q, op, data);
if (rq)
rq->rq_flags |= RQF_QUEUED;
@@ -138,7 +137,7 @@ struct request *blk_mq_sched_get_request(struct request_queue *q,
}
 
if (rq) {
-   if (!is_flush) {
+   if (!op_is_flush(op)) {
rq->elv.icq = NULL;
if (e && e->type->icq_cache)
blk_mq_sched_assign_ioc(q, rq, bio);
diff --git a/block/blk-mq.c b/block/blk-mq.c
index ee69e5e..e229f8a 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1378,7 +1378,7 @@ static void blk_mq_try_issue_directly(struct request *rq, blk_qc_t *cookie)
 static blk_qc_t blk_mq_make_request(struct request_queue *q, struct bio *bio)
 {
const int is_sync = op_is_sync(bio->bi_opf);
-   const int is_flush_fua = bio->bi_opf & (REQ_PREFLUSH | REQ_FUA);
+   const int is_flush_fua = op_is_flush(bio->bi_opf);
struct blk_mq_alloc_data data;
struct request *rq;
unsigned int request_count = 0, srcu_idx;
@@ -1498,7 +1498,7 @@ static blk_qc_t blk_mq_make_request(struct request_queue *q, struct bio *bio)
 static blk_qc_t blk_sq_make_request(struct request_queue *q, struct bio *bio)
 {
const int is_sync = op_is_sync(bio->bi_opf);
-   const int is_flush_fua = bio->bi_opf & (REQ_PREFLUSH | REQ_FUA);
+   const int is_flush_fua = op_is_flush(bio->bi_opf);
struct blk_plug *plug;
unsigned int request_count = 0;
struct blk_mq_alloc_data data;
diff --git a/drivers/md/bcache/request.c b/drivers/md/bcache/request.c
index 76d2087..01035e7 100644
--- a/drivers/md/bcache/request.c
+++ b/drivers/md/bcache/request.c
@@ -666,7 +666,7 @@ static inline struct search *search_alloc(struct bio *bio,
s->iop.write_prio   = 0;
s->iop.error= 0;
s->iop.flags= 0;
-   s->iop.flush_journal= (bio->bi_opf & (REQ_PREFLUSH|REQ_FUA)) != 0;
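
The op_is_flush() helper itself (added to include/linux/blk_types.h) is
cut off in the archived diff above; judging from the converted call
sites it is simply:

    static inline bool op_is_flush(unsigned int op)
    {
            return op & (REQ_PREFLUSH | REQ_FUA);
    }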

[dm-devel] [PATCH 04/18] block: simplify blk_init_allocated_queue

2017-01-25 Thread Christoph Hellwig
Return an errno value instead of the passed in queue so that the callers
don't have to keep track of two queues, and move the assignment of the
request_fn and lock to the caller as passing them as argument doesn't
simplify anything.  While we're at it also remove two pointless NULL
assignments, given that the request structure is zeroed on allocation.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Hannes Reinecke 
---
 block/blk-core.c   | 38 +++---
 drivers/md/dm-rq.c |  3 ++-
 include/linux/blkdev.h |  3 +--
 3 files changed, 18 insertions(+), 26 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index a84c1b9..54b5512 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -823,15 +823,19 @@ EXPORT_SYMBOL(blk_init_queue);
 struct request_queue *
 blk_init_queue_node(request_fn_proc *rfn, spinlock_t *lock, int node_id)
 {
-   struct request_queue *uninit_q, *q;
+   struct request_queue *q;
 
-   uninit_q = blk_alloc_queue_node(GFP_KERNEL, node_id);
-   if (!uninit_q)
+   q = blk_alloc_queue_node(GFP_KERNEL, node_id);
+   if (!q)
return NULL;
 
-   q = blk_init_allocated_queue(uninit_q, rfn, lock);
-   if (!q)
-   blk_cleanup_queue(uninit_q);
+   q->request_fn = rfn;
+   if (lock)
+   q->queue_lock = lock;
+   if (blk_init_allocated_queue(q) < 0) {
+   blk_cleanup_queue(q);
+   return NULL;
+   }
 
return q;
 }
@@ -839,30 +843,19 @@ EXPORT_SYMBOL(blk_init_queue_node);
 
 static blk_qc_t blk_queue_bio(struct request_queue *q, struct bio *bio);
 
-struct request_queue *
-blk_init_allocated_queue(struct request_queue *q, request_fn_proc *rfn,
-spinlock_t *lock)
-{
-   if (!q)
-   return NULL;
 
+int blk_init_allocated_queue(struct request_queue *q)
+{
q->fq = blk_alloc_flush_queue(q, NUMA_NO_NODE, 0);
if (!q->fq)
-   return NULL;
+   return -ENOMEM;
 
if (blk_init_rl(&q->root_rl, q, GFP_KERNEL))
goto fail;
 
INIT_WORK(&q->timeout_work, blk_timeout_work);
-   q->request_fn   = rfn;
-   q->prep_rq_fn   = NULL;
-   q->unprep_rq_fn = NULL;
q->queue_flags  |= QUEUE_FLAG_DEFAULT;
 
-   /* Override internal queue lock with supplied lock pointer */
-   if (lock)
-   q->queue_lock   = lock;
-
/*
 * This also sets hw/phys segments, boundary and size
 */
@@ -880,13 +873,12 @@ blk_init_allocated_queue(struct request_queue *q, request_fn_proc *rfn,
}
 
mutex_unlock(&q->sysfs_lock);
-
-   return q;
+   return 0;
 
 fail:
blk_free_flush_queue(q->fq);
wbt_exit(q);
-   return NULL;
+   return -ENOMEM;
 }
 EXPORT_SYMBOL(blk_init_allocated_queue);
 
diff --git a/drivers/md/dm-rq.c b/drivers/md/dm-rq.c
index 9d7275f..93f6e9f 100644
--- a/drivers/md/dm-rq.c
+++ b/drivers/md/dm-rq.c
@@ -823,7 +823,8 @@ static void dm_old_request_fn(struct request_queue *q)
 int dm_old_init_request_queue(struct mapped_device *md)
 {
/* Fully initialize the queue */
-   if (!blk_init_allocated_queue(md->queue, dm_old_request_fn, NULL))
+   md->queue->request_fn = dm_old_request_fn;
+   if (blk_init_allocated_queue(md->queue) < 0)
return -EINVAL;
 
/* disable dm_old_request_fn's merge heuristic by default */
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 8e0b57e..a036c4a 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1131,8 +1131,7 @@ extern void blk_unprep_request(struct request *);
 extern struct request_queue *blk_init_queue_node(request_fn_proc *rfn,
spinlock_t *lock, int node_id);
 extern struct request_queue *blk_init_queue(request_fn_proc *, spinlock_t *);
-extern struct request_queue *blk_init_allocated_queue(struct request_queue *,
- request_fn_proc *, spinlock_t *);
+extern int blk_init_allocated_queue(struct request_queue *);
 extern void blk_cleanup_queue(struct request_queue *);
 extern void blk_queue_make_request(struct request_queue *, make_request_fn *);
 extern void blk_queue_bounce_limit(struct request_queue *, u64);
-- 
2.1.4

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel


[dm-devel] [PATCH 18/18] block: don't assign cmd_flags in __blk_rq_prep_clone

2017-01-25 Thread Christoph Hellwig
These days we have the proper flags set since request allocation time.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Hannes Reinecke 
---
 block/blk-core.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 33c5d05e..6bf5ba0 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -2983,7 +2983,6 @@ EXPORT_SYMBOL_GPL(blk_rq_unprep_clone);
 static void __blk_rq_prep_clone(struct request *dst, struct request *src)
 {
dst->cpu = src->cpu;
-   dst->cmd_flags = src->cmd_flags | REQ_NOMERGE;
dst->cmd_type = src->cmd_type;
dst->__sector = blk_rq_pos(src);
dst->__data_len = blk_rq_bytes(src);
-- 
2.1.4

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel


[dm-devel] split scsi passthrough fields out of struct request V2

2017-01-25 Thread Christoph Hellwig
Hi all,

this series splits the support for SCSI passthrough commands out of the
main struct request used all over the block layer and into a separate
scsi_request structure.  Drivers that want to support SCSI passthrough
need to embed it as the first thing in their request-private data,
similar to how we handle NVMe passthrough commands.

To support this I've added support for private data after the request
structure to the legacy request path as well, so that it can be treated
the same way as the blk-mq path.  Compared to the current scsi_cmnd
allocator that is actually a major simplification.
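
The embedding pattern described above looks roughly like this (a
sketch; the exact field list follows from the individual patches in
the series):

    struct scsi_request {
            unsigned char   __cmd[BLK_MAX_CDB];
            unsigned char   *cmd;
            unsigned short  cmd_len;
            unsigned int    sense_len;
            unsigned int    resid_len;      /* residual count */
            void            *sense;
    };

    struct scsi_cmnd {
            struct scsi_request req;        /* must be first */
            /* ... existing scsi_cmnd fields ... */
    };

    /* the passthrough fields live in the pdu behind struct request */
    static inline struct scsi_request *scsi_req(struct request *rq)
    {
            return blk_mq_rq_to_pdu(rq);
    }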

Changes since V1:
 - fix handling of a NULL sense pointer in __scsi_execute
 - clean up handling of the flush flags in the block layer and MD
 - additional small cleanup in dm-rq

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel


[dm-devel] [PATCH 03/18] block: fix elevator init check

2017-01-25 Thread Christoph Hellwig
We can't initialize the elevator fields for flushes as flushes share
space in struct request with the elevator data.  But currently we can't
communicate that a request is a flush through blk_get_request as we
can only pass READ or WRITE, and the low-level code looks at the
possible NULL bio to check for a flush.

Fix this by allowing any block op and flags to be passed, and by
checking for the flush flags in __get_request.
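
With this change a caller can, for example, allocate a flush request
directly (illustrative, not part of the patch):

    struct request *rq;

    rq = blk_get_request(q, REQ_OP_WRITE | REQ_PREFLUSH, GFP_NOIO);
    if (IS_ERR(rq))
            return PTR_ERR(rq);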

Signed-off-by: Christoph Hellwig 
Reviewed-by: Hannes Reinecke 
---
 block/blk-core.c | 26 --
 1 file changed, 4 insertions(+), 22 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index b830e14..a84c1b9 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1022,25 +1022,6 @@ int blk_update_nr_requests(struct request_queue *q, unsigned int nr)
return 0;
 }
 
-/*
- * Determine if elevator data should be initialized when allocating the
- * request associated with @bio.
- */
-static bool blk_rq_should_init_elevator(struct bio *bio)
-{
-   if (!bio)
-   return true;
-
-   /*
-* Flush requests do not use the elevator so skip initialization.
-* This allows a request to share the flush and elevator data.
-*/
-   if (op_is_flush(bio->bi_opf))
-   return false;
-
-   return true;
-}
-
 /**
  * __get_request - get a free request
  * @rl: request list to allocate from
@@ -1119,10 +1100,13 @@ static struct request *__get_request(struct request_list *rl, unsigned int op,
 * request is freed.  This guarantees icq's won't be destroyed and
 * makes creating new ones safe.
 *
+* Flush requests do not use the elevator so skip initialization.
+* This allows a request to share the flush and elevator data.
+*
 * Also, lookup icq while holding queue_lock.  If it doesn't exist,
 * it will be created after releasing queue_lock.
 */
-   if (blk_rq_should_init_elevator(bio) && !blk_queue_bypass(q)) {
+   if (!op_is_flush(op) && !blk_queue_bypass(q)) {
rq_flags |= RQF_ELVPRIV;
q->nr_rqs_elvpriv++;
if (et->icq_cache && ioc)
@@ -1276,8 +1260,6 @@ static struct request *blk_old_get_request(struct request_queue *q, int rw,
 {
struct request *rq;
 
-   BUG_ON(rw != READ && rw != WRITE);
-
/* create ioc upfront */
create_io_context(gfp_mask, q->node);
 
-- 
2.1.4

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel


[dm-devel] [PATCH 15/18] scsi: allocate scsi_cmnd structures as part of struct request

2017-01-25 Thread Christoph Hellwig
Rely on the new block layer functionality to allocate additional driver
specific data behind struct request instead of implementing it in SCSI
itself.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Hannes Reinecke 
---
 drivers/scsi/hosts.c  |  20 +--
 drivers/scsi/scsi.c   | 319 --
 drivers/scsi/scsi_error.c |  17 ++-
 drivers/scsi/scsi_lib.c   | 122 --
 drivers/scsi/scsi_priv.h  |   8 +-
 include/scsi/scsi_host.h  |   3 -
 6 files changed, 95 insertions(+), 394 deletions(-)

diff --git a/drivers/scsi/hosts.c b/drivers/scsi/hosts.c
index 6d29c4a..831a1c8 100644
--- a/drivers/scsi/hosts.c
+++ b/drivers/scsi/hosts.c
@@ -230,19 +230,6 @@ int scsi_add_host_with_dma(struct Scsi_Host *shost, struct device *dev,
}
}
 
-   /*
-* Note that we allocate the freelist even for the MQ case for now,
-* as we need a command set aside for scsi_reset_provider.  Having
-* the full host freelist and one command available for that is a
-* little heavy-handed, but avoids introducing a special allocator
-* just for this.  Eventually the structure of scsi_reset_provider
-* will need a major overhaul.
-*/
-   error = scsi_setup_command_freelist(shost);
-   if (error)
-   goto out_destroy_tags;
-
-
if (!shost->shost_gendev.parent)
shost->shost_gendev.parent = dev ? dev : &platform_bus;
if (!dma_dev)
@@ -262,7 +249,7 @@ int scsi_add_host_with_dma(struct Scsi_Host *shost, struct device *dev,
 
error = device_add(&shost->shost_gendev);
if (error)
-   goto out_destroy_freelist;
+   goto out_disable_runtime_pm;
 
scsi_host_set_state(shost, SHOST_RUNNING);
get_device(shost->shost_gendev.parent);
@@ -312,13 +299,11 @@ int scsi_add_host_with_dma(struct Scsi_Host *shost, struct device *dev,
device_del(&shost->shost_dev);
  out_del_gendev:
device_del(&shost->shost_gendev);
- out_destroy_freelist:
+ out_disable_runtime_pm:
device_disable_async_suspend(&shost->shost_gendev);
pm_runtime_disable(&shost->shost_gendev);
pm_runtime_set_suspended(&shost->shost_gendev);
pm_runtime_put_noidle(&shost->shost_gendev);
-   scsi_destroy_command_freelist(shost);
- out_destroy_tags:
if (shost_use_blk_mq(shost))
scsi_mq_destroy_tags(shost);
  fail:
@@ -359,7 +344,6 @@ static void scsi_host_dev_release(struct device *dev)
kfree(dev_name(&shost->shost_dev));
}
 
-   scsi_destroy_command_freelist(shost);
if (shost_use_blk_mq(shost)) {
if (shost->tag_set.tags)
scsi_mq_destroy_tags(shost);
diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
index 2e24f31..3d8d215 100644
--- a/drivers/scsi/scsi.c
+++ b/drivers/scsi/scsi.c
@@ -98,163 +98,6 @@ EXPORT_SYMBOL(scsi_sd_probe_domain);
 ASYNC_DOMAIN_EXCLUSIVE(scsi_sd_pm_domain);
 EXPORT_SYMBOL(scsi_sd_pm_domain);
 
-struct scsi_host_cmd_pool {
-   struct kmem_cache   *cmd_slab;
-   unsigned intusers;
-   char*cmd_name;
-};
-
-static struct scsi_host_cmd_pool scsi_cmd_pool = {
-   .cmd_name   = "scsi_cmd_cache",
-};
-
-static DEFINE_MUTEX(host_cmd_pool_mutex);
-
-/**
- * scsi_host_free_command - internal function to release a command
- * @shost: host to free the command for
- * @cmd:   command to release
- *
- * the command must previously have been allocated by
- * scsi_host_alloc_command.
- */
-static void
-scsi_host_free_command(struct Scsi_Host *shost, struct scsi_cmnd *cmd)
-{
-   struct scsi_host_cmd_pool *pool = shost->cmd_pool;
-
-   if (cmd->prot_sdb)
-   kmem_cache_free(scsi_sdb_cache, cmd->prot_sdb);
-   scsi_free_sense_buffer(shost, cmd->sense_buffer);
-   kmem_cache_free(pool->cmd_slab, cmd);
-}
-
-/**
- * scsi_host_alloc_command - internal function to allocate command
- * @shost: SCSI host whose pool to allocate from
- * @gfp_mask:  mask for the allocation
- *
- * Returns a fully allocated command with sense buffer and protection
- * data buffer (where applicable) or NULL on failure
- */
-static struct scsi_cmnd *
-scsi_host_alloc_command(struct Scsi_Host *shost, gfp_t gfp_mask)
-{
-   struct scsi_host_cmd_pool *pool = shost->cmd_pool;
-   struct scsi_cmnd *cmd;
-
-   cmd = kmem_cache_zalloc(pool->cmd_slab, gfp_mask);
-   if (!cmd)
-   goto fail;
-
-   cmd->sense_buffer = scsi_alloc_sense_buffer(shost, gfp_mask,
-   NUMA_NO_NODE);
-   if (!cmd->sense_buffer)
-   goto fail_free_cmd;
-
-   if (scsi_host_get_prot(shost) >= SHOST_DIX_TYPE0_PROTECTION) {
-   cmd->prot_sdb = kmem_cache_zalloc(scsi_sdb_cache, gfp_mask);
-   if (!cmd->prot_sdb)
-   goto fail_free_sense;
-   }
-
-   return cmd;
-
-fail_free_sense:
-   

[dm-devel] [PATCH 09/18] scsi_dh_emc: switch to scsi_execute_req_flags()

2017-01-25 Thread Christoph Hellwig
From: Hannes Reinecke 

Switch to scsi_execute_req_flags() and scsi_get_vpd_page() instead of
open-coding it.  Using scsi_execute_req_flags() will set REQ_QUIET and
REQ_PREEMPT, but this is okay as we're evaluating the errors anyway and
should be able to send the command even if the device is quiesced.

Signed-off-by: Hannes Reinecke 
Signed-off-by: Christoph Hellwig 
---
 drivers/scsi/device_handler/scsi_dh_emc.c | 247 +++---
 1 file changed, 56 insertions(+), 191 deletions(-)

diff --git a/drivers/scsi/device_handler/scsi_dh_emc.c b/drivers/scsi/device_handler/scsi_dh_emc.c
index 5b80746..4a7679f 100644
--- a/drivers/scsi/device_handler/scsi_dh_emc.c
+++ b/drivers/scsi/device_handler/scsi_dh_emc.c
@@ -88,12 +88,6 @@ struct clariion_dh_data {
 */
unsigned char buffer[CLARIION_BUFFER_SIZE];
/*
-* SCSI sense buffer for commands -- assumes serial issuance
-* and completion sequence of all commands for same multipath.
-*/
-   unsigned char sense[SCSI_SENSE_BUFFERSIZE];
-   unsigned int senselen;
-   /*
 * LUN state
 */
int lun_state;
@@ -116,44 +110,38 @@ struct clariion_dh_data {
 /*
  * Parse MODE_SELECT cmd reply.
  */
-static int trespass_endio(struct scsi_device *sdev, char *sense)
+static int trespass_endio(struct scsi_device *sdev,
+ struct scsi_sense_hdr *sshdr)
 {
int err = SCSI_DH_IO;
-   struct scsi_sense_hdr sshdr;
-
-   if (!scsi_normalize_sense(sense, SCSI_SENSE_BUFFERSIZE, &sshdr)) {
-   sdev_printk(KERN_ERR, sdev, "%s: Found valid sense data 0x%2x, "
-   "0x%2x, 0x%2x while sending CLARiiON trespass "
-   "command.\n", CLARIION_NAME, sshdr.sense_key,
-   sshdr.asc, sshdr.ascq);
 
-   if ((sshdr.sense_key == 0x05) && (sshdr.asc == 0x04) &&
-(sshdr.ascq == 0x00)) {
-   /*
-* Array based copy in progress -- do not send
-* mode_select or copy will be aborted mid-stream.
-*/
-   sdev_printk(KERN_INFO, sdev, "%s: Array Based Copy in "
-   "progress while sending CLARiiON trespass "
-   "command.\n", CLARIION_NAME);
-   err = SCSI_DH_DEV_TEMP_BUSY;
-   } else if ((sshdr.sense_key == 0x02) && (sshdr.asc == 0x04) &&
-   (sshdr.ascq == 0x03)) {
-   /*
-* LUN Not Ready - Manual Intervention Required
-* indicates in-progress ucode upgrade (NDU).
-*/
-   sdev_printk(KERN_INFO, sdev, "%s: Detected in-progress "
-   "ucode upgrade NDU operation while sending "
-   "CLARiiON trespass command.\n", 
CLARIION_NAME);
-   err = SCSI_DH_DEV_TEMP_BUSY;
-   } else
-   err = SCSI_DH_DEV_FAILED;
-   } else {
-   sdev_printk(KERN_INFO, sdev,
-   "%s: failed to send MODE SELECT, no sense 
available\n",
-   CLARIION_NAME);
-   }
+   sdev_printk(KERN_ERR, sdev, "%s: Found valid sense data 0x%2x, "
+   "0x%2x, 0x%2x while sending CLARiiON trespass "
+   "command.\n", CLARIION_NAME, sshdr->sense_key,
+   sshdr->asc, sshdr->ascq);
+
+   if (sshdr->sense_key == 0x05 && sshdr->asc == 0x04 &&
+   sshdr->ascq == 0x00) {
+   /*
+* Array based copy in progress -- do not send
+* mode_select or copy will be aborted mid-stream.
+*/
+   sdev_printk(KERN_INFO, sdev, "%s: Array Based Copy in "
+   "progress while sending CLARiiON trespass "
+   "command.\n", CLARIION_NAME);
+   err = SCSI_DH_DEV_TEMP_BUSY;
+   } else if (sshdr->sense_key == 0x02 && sshdr->asc == 0x04 &&
+  sshdr->ascq == 0x03) {
+   /*
+* LUN Not Ready - Manual Intervention Required
+* indicates in-progress ucode upgrade (NDU).
+*/
+   sdev_printk(KERN_INFO, sdev, "%s: Detected in-progress "
+   "ucode upgrade NDU operation while sending "
+   "CLARiiON trespass command.\n", CLARIION_NAME);
+   err = SCSI_DH_DEV_TEMP_BUSY;
+   } else
+   err = SCSI_DH_DEV_FAILED;
return err;
 }
 
@@ -257,103 +245,15 @@ static char * parse_sp_model(struct scsi_device *sdev, unsigned char *buffer)
return sp_model;
 }
 
-/*
- * Get block request for REQ_BLOCK_PC command issued to path.  Currently
- * 

[dm-devel] [PATCH 02/18] md: cleanup bio op / flags handling in raid1_write_request

2017-01-25 Thread Christoph Hellwig
No need for the local variables, the bio is still live and we can just
assign the bits we want directly.  Makes me wonder why we can't assign
all the bio flags to start with.

Signed-off-by: Christoph Hellwig 
---
 drivers/md/raid1.c | 7 ++-
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 7b0f647..67b0365 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -1170,10 +1170,6 @@ static void raid1_write_request(struct mddev *mddev, struct bio *bio,
int i, disks;
struct bitmap *bitmap = mddev->bitmap;
unsigned long flags;
-   const int op = bio_op(bio);
-   const unsigned long do_sync = (bio->bi_opf & REQ_SYNC);
-   const unsigned long do_flush_fua = (bio->bi_opf &
-   (REQ_PREFLUSH | REQ_FUA));
struct md_rdev *blocked_rdev;
struct blk_plug_cb *cb;
struct raid1_plug_cb *plug = NULL;
@@ -1389,7 +1385,8 @@ static void raid1_write_request(struct mddev *mddev, struct bio *bio,
   conf->mirrors[i].rdev->data_offset);
mbio->bi_bdev = conf->mirrors[i].rdev->bdev;
mbio->bi_end_io = raid1_end_write_request;
-   bio_set_op_attrs(mbio, op, do_flush_fua | do_sync);
+   mbio->bi_opf = bio_op(bio) |
+   (bio->bi_opf & (REQ_SYNC | REQ_PREFLUSH | REQ_FUA));
if (test_bit(FailFast, &conf->mirrors[i].rdev->flags) &&
!test_bit(WriteMostly, &conf->mirrors[i].rdev->flags) &&
conf->raid_disks - mddev->degraded > 1)
-- 
2.1.4

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel


[dm-devel] [PATCH 08/18] scsi_dh_rdac: switch to scsi_execute_req_flags()

2017-01-25 Thread Christoph Hellwig
From: Hannes Reinecke 

Switch to scsi_execute_req_flags() and scsi_get_vpd_page() instead of
open-coding it.  Using scsi_execute_req_flags() will set REQ_QUIET and
REQ_PREEMPT, but this is okay as we're evaluating the errors anyway and
should be able to send the command even if the device is quiesced.

Signed-off-by: Hannes Reinecke 
Signed-off-by: Christoph Hellwig 
---
 drivers/scsi/device_handler/scsi_dh_rdac.c | 174 +
 1 file changed, 51 insertions(+), 123 deletions(-)

diff --git a/drivers/scsi/device_handler/scsi_dh_rdac.c b/drivers/scsi/device_handler/scsi_dh_rdac.c
index 00d9c32..b64eaae 100644
--- a/drivers/scsi/device_handler/scsi_dh_rdac.c
+++ b/drivers/scsi/device_handler/scsi_dh_rdac.c
@@ -205,7 +205,6 @@ struct rdac_dh_data {
 #define RDAC_NON_PREFERRED 1
charpreferred;
 
-   unsigned char   sense[SCSI_SENSE_BUFFERSIZE];
union   {
struct c2_inquiry c2;
struct c4_inquiry c4;
@@ -262,40 +261,12 @@ do { \
sdev_printk(KERN_INFO, sdev, RDAC_NAME ": " f "\n", ## arg); \
 } while (0);
 
-static struct request *get_rdac_req(struct scsi_device *sdev,
-   void *buffer, unsigned buflen, int rw)
+static unsigned int rdac_failover_get(struct rdac_controller *ctlr,
+ struct list_head *list,
+ unsigned char *cdb)
 {
-   struct request *rq;
-   struct request_queue *q = sdev->request_queue;
-
-   rq = blk_get_request(q, rw, GFP_NOIO);
-
-   if (IS_ERR(rq)) {
-   sdev_printk(KERN_INFO, sdev,
-   "get_rdac_req: blk_get_request failed.\n");
-   return NULL;
-   }
-   blk_rq_set_block_pc(rq);
-
-   if (buflen && blk_rq_map_kern(q, rq, buffer, buflen, GFP_NOIO)) {
-   blk_put_request(rq);
-   sdev_printk(KERN_INFO, sdev,
-   "get_rdac_req: blk_rq_map_kern failed.\n");
-   return NULL;
-   }
-
-   rq->cmd_flags |= REQ_FAILFAST_DEV | REQ_FAILFAST_TRANSPORT |
-REQ_FAILFAST_DRIVER;
-   rq->retries = RDAC_RETRIES;
-   rq->timeout = RDAC_TIMEOUT;
-
-   return rq;
-}
-
-static struct request *rdac_failover_get(struct scsi_device *sdev,
-   struct rdac_dh_data *h, struct list_head *list)
-{
-   struct request *rq;
+   struct scsi_device *sdev = ctlr->ms_sdev;
+   struct rdac_dh_data *h = sdev->handler_data;
struct rdac_mode_common *common;
unsigned data_size;
struct rdac_queue_data *qdata;
@@ -332,27 +303,17 @@ static struct request *rdac_failover_get(struct scsi_device *sdev,
lun_table[qdata->h->lun] = 0x81;
}
 
-   /* get request for block layer packet command */
-   rq = get_rdac_req(sdev, &h->ctlr->mode_select, data_size, WRITE);
-   if (!rq)
-   return NULL;
-
/* Prepare the command. */
if (h->ctlr->use_ms10) {
-   rq->cmd[0] = MODE_SELECT_10;
-   rq->cmd[7] = data_size >> 8;
-   rq->cmd[8] = data_size & 0xff;
+   cdb[0] = MODE_SELECT_10;
+   cdb[7] = data_size >> 8;
+   cdb[8] = data_size & 0xff;
} else {
-   rq->cmd[0] = MODE_SELECT;
-   rq->cmd[4] = data_size;
+   cdb[0] = MODE_SELECT;
+   cdb[4] = data_size;
}
-   rq->cmd_len = COMMAND_SIZE(rq->cmd[0]);
-
-   rq->sense = h->sense;
-   memset(rq->sense, 0, SCSI_SENSE_BUFFERSIZE);
-   rq->sense_len = 0;
 
-   return rq;
+   return data_size;
 }
 
 static void release_controller(struct kref *kref)
@@ -400,46 +361,14 @@ static struct rdac_controller *get_controller(int index, char *array_name,
return ctlr;
 }
 
-static int submit_inquiry(struct scsi_device *sdev, int page_code,
- unsigned int len, struct rdac_dh_data *h)
-{
-   struct request *rq;
-   struct request_queue *q = sdev->request_queue;
-   int err = SCSI_DH_RES_TEMP_UNAVAIL;
-
-   rq = get_rdac_req(sdev, &h->inq, len, READ);
-   if (!rq)
-   goto done;
-
-   /* Prepare the command. */
-   rq->cmd[0] = INQUIRY;
-   rq->cmd[1] = 1;
-   rq->cmd[2] = page_code;
-   rq->cmd[4] = len;
-   rq->cmd_len = COMMAND_SIZE(INQUIRY);
-
-   rq->sense = h->sense;
-   memset(rq->sense, 0, SCSI_SENSE_BUFFERSIZE);
-   rq->sense_len = 0;
-
-   err = blk_execute_rq(q, NULL, rq, 1);
-   if (err == -EIO)
-   err = SCSI_DH_IO;
-
-   blk_put_request(rq);
-done:
-   return err;
-}
-
 static int get_lun_info(struct scsi_device *sdev, struct rdac_dh_data *h,
char *array_name, u8 *array_id)
 {
-   int err, i;
-   struct c8_inquiry *inqp;
+   

[dm-devel] [PATCH 07/18] dm: always defer request allocation to the owner of the request_queue

2017-01-25 Thread Christoph Hellwig
DM already calls blk_mq_alloc_request on the request_queue of the
underlying device if it is a blk-mq device.  But now that we allow drivers
to allocate additional data and initialize it ahead of time we need to do
the same for all drivers.   Doing so and using the new cmd_size
infrastructure in the block layer greatly simplifies the dm-rq and mpath
code, and should also make arbitrary combinations of SQ and MQ devices
with SQ or MQ device mapper tables easily possible as a further step.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Hannes Reinecke 
Reviewed-by: Mike Snitzer 
---
 drivers/md/dm-core.h  |   1 -
 drivers/md/dm-mpath.c | 132 --
 drivers/md/dm-rq.c| 251 ++
 drivers/md/dm-rq.h|   2 +-
 drivers/md/dm-target.c|   7 --
 drivers/md/dm.c   |  30 ++---
 drivers/md/dm.h   |   3 +-
 include/linux/device-mapper.h |   3 -
 8 files changed, 85 insertions(+), 344 deletions(-)

diff --git a/drivers/md/dm-core.h b/drivers/md/dm-core.h
index 40ceba1..136fda3 100644
--- a/drivers/md/dm-core.h
+++ b/drivers/md/dm-core.h
@@ -92,7 +92,6 @@ struct mapped_device {
 * io objects are allocated from here.
 */
mempool_t *io_pool;
-   mempool_t *rq_pool;
 
struct bio_set *bs;
 
diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c
index 6400cff..784f237 100644
--- a/drivers/md/dm-mpath.c
+++ b/drivers/md/dm-mpath.c
@@ -92,12 +92,6 @@ struct multipath {
 
unsigned queue_mode;
 
-   /*
-* We must use a mempool of dm_mpath_io structs so that we
-* can resubmit bios on error.
-*/
-   mempool_t *mpio_pool;
-
struct mutex work_mutex;
struct work_struct trigger_event;
 
@@ -115,8 +109,6 @@ struct dm_mpath_io {
 
 typedef int (*action_fn) (struct pgpath *pgpath);
 
-static struct kmem_cache *_mpio_cache;
-
 static struct workqueue_struct *kmultipathd, *kmpath_handlerd;
 static void trigger_event(struct work_struct *work);
 static void activate_path(struct work_struct *work);
@@ -209,7 +201,6 @@ static struct multipath *alloc_multipath(struct dm_target *ti)
init_waitqueue_head(&m->pg_init_wait);
mutex_init(&m->work_mutex);
 
-   m->mpio_pool = NULL;
m->queue_mode = DM_TYPE_NONE;
 
m->ti = ti;
@@ -229,16 +220,7 @@ static int alloc_multipath_stage2(struct dm_target *ti, struct multipath *m)
m->queue_mode = DM_TYPE_MQ_REQUEST_BASED;
else
m->queue_mode = DM_TYPE_REQUEST_BASED;
-   }
-
-   if (m->queue_mode == DM_TYPE_REQUEST_BASED) {
-   unsigned min_ios = dm_get_reserved_rq_based_ios();
-
-   m->mpio_pool = mempool_create_slab_pool(min_ios, _mpio_cache);
-   if (!m->mpio_pool)
-   return -ENOMEM;
-   }
-   else if (m->queue_mode == DM_TYPE_BIO_BASED) {
+   } else if (m->queue_mode == DM_TYPE_BIO_BASED) {
INIT_WORK(&m->process_queued_bios, process_queued_bios);
/*
 * bio-based doesn't support any direct scsi_dh management;
@@ -263,7 +245,6 @@ static void free_multipath(struct multipath *m)
 
kfree(m->hw_handler_name);
kfree(m->hw_handler_params);
-   mempool_destroy(m->mpio_pool);
kfree(m);
 }
 
@@ -272,38 +253,6 @@ static struct dm_mpath_io *get_mpio(union map_info *info)
return info->ptr;
 }
 
-static struct dm_mpath_io *set_mpio(struct multipath *m, union map_info *info)
-{
-   struct dm_mpath_io *mpio;
-
-   if (!m->mpio_pool) {
-   /* Use blk-mq pdu memory requested via per_io_data_size */
-   mpio = get_mpio(info);
-   memset(mpio, 0, sizeof(*mpio));
-   return mpio;
-   }
-
-   mpio = mempool_alloc(m->mpio_pool, GFP_ATOMIC);
-   if (!mpio)
-   return NULL;
-
-   memset(mpio, 0, sizeof(*mpio));
-   info->ptr = mpio;
-
-   return mpio;
-}
-
-static void clear_request_fn_mpio(struct multipath *m, union map_info *info)
-{
-   /* Only needed for non blk-mq (.request_fn) multipath */
-   if (m->mpio_pool) {
-   struct dm_mpath_io *mpio = info->ptr;
-
-   info->ptr = NULL;
-   mempool_free(mpio, m->mpio_pool);
-   }
-}
-
 static size_t multipath_per_bio_data_size(void)
 {
return sizeof(struct dm_mpath_io) + sizeof(struct dm_bio_details);
@@ -530,16 +479,17 @@ static bool must_push_back_bio(struct multipath *m)
 /*
  * Map cloned requests (request-based multipath)
  */
-static int __multipath_map(struct dm_target *ti, struct request *clone,
-  union map_info *map_context,
-  struct request *rq, struct request **__clone)
+static int multipath_clone_and_map(struct dm_target *ti, 

[dm-devel] [PATCH 13/18] scsi: remove scsi_cmd_dma_pool

2017-01-25 Thread Christoph Hellwig
There is no need for GFP_DMA allocations of the scsi_cmnd structures
themselves, all that might be DMAed to or from is the actual payload,
or the sense buffers.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Hannes Reinecke 
---
 drivers/scsi/scsi.c | 15 +--
 1 file changed, 1 insertion(+), 14 deletions(-)

diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
index 469aa0f..2e24f31 100644
--- a/drivers/scsi/scsi.c
+++ b/drivers/scsi/scsi.c
@@ -102,17 +102,10 @@ struct scsi_host_cmd_pool {
struct kmem_cache   *cmd_slab;
unsigned intusers;
char*cmd_name;
-   unsigned intslab_flags;
 };
 
 static struct scsi_host_cmd_pool scsi_cmd_pool = {
.cmd_name   = "scsi_cmd_cache",
-   .slab_flags = SLAB_HWCACHE_ALIGN,
-};
-
-static struct scsi_host_cmd_pool scsi_cmd_dma_pool = {
-   .cmd_name   = "scsi_cmd_cache(DMA)",
-   .slab_flags = SLAB_HWCACHE_ALIGN|SLAB_CACHE_DMA,
 };
 
 static DEFINE_MUTEX(host_cmd_pool_mutex);
@@ -290,8 +283,6 @@ scsi_find_host_cmd_pool(struct Scsi_Host *shost)
 {
if (shost->hostt->cmd_size)
return shost->hostt->cmd_pool;
-   if (shost->unchecked_isa_dma)
-   return &scsi_cmd_dma_pool;
return &scsi_cmd_pool;
 }
 
@@ -318,10 +309,6 @@ scsi_alloc_host_cmd_pool(struct Scsi_Host *shost)
return NULL;
}
 
-   pool->slab_flags = SLAB_HWCACHE_ALIGN;
-   if (shost->unchecked_isa_dma)
-   pool->slab_flags |= SLAB_CACHE_DMA;
-
if (hostt->cmd_size)
hostt->cmd_pool = pool;
 
@@ -349,7 +336,7 @@ scsi_get_host_cmd_pool(struct Scsi_Host *shost)
 
if (!pool->users) {
pool->cmd_slab = kmem_cache_create(pool->cmd_name, cmd_size, 0,
-  pool->slab_flags, NULL);
+  SLAB_HWCACHE_ALIGN, NULL);
if (!pool->cmd_slab)
goto out_free_pool;
}
-- 
2.1.4

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel


[dm-devel] [PATCH 05/18] block: allow specifying size for extra command data

2017-01-25 Thread Christoph Hellwig
This mirrors the blk-mq capabilities to allocate extra drivers-specific
data behind struct request by setting a cmd_size field, as well as having
a constructor / destructor for it.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Hannes Reinecke 
---
 block/blk-core.c   | 59 --
 block/blk-flush.c  |  5 ++---
 block/blk-sysfs.c  |  7 --
 include/linux/blkdev.h |  7 ++
 4 files changed, 61 insertions(+), 17 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 54b5512..7de7164 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -606,17 +606,41 @@ void blk_cleanup_queue(struct request_queue *q)
 EXPORT_SYMBOL(blk_cleanup_queue);
 
 /* Allocate memory local to the request queue */
-static void *alloc_request_struct(gfp_t gfp_mask, void *data)
+static void *alloc_request_simple(gfp_t gfp_mask, void *data)
 {
-   int nid = (int)(long)data;
-   return kmem_cache_alloc_node(request_cachep, gfp_mask, nid);
+   struct request_queue *q = data;
+
+   return kmem_cache_alloc_node(request_cachep, gfp_mask, q->node);
 }
 
-static void free_request_struct(void *element, void *unused)
+static void free_request_simple(void *element, void *data)
 {
kmem_cache_free(request_cachep, element);
 }
 
+static void *alloc_request_size(gfp_t gfp_mask, void *data)
+{
+   struct request_queue *q = data;
+   struct request *rq;
+
+   rq = kmalloc_node(sizeof(struct request) + q->cmd_size, gfp_mask,
+   q->node);
+   if (rq && q->init_rq_fn && q->init_rq_fn(q, rq, gfp_mask) < 0) {
+   kfree(rq);
+   rq = NULL;
+   }
+   return rq;
+}
+
+static void free_request_size(void *element, void *data)
+{
+   struct request_queue *q = data;
+
+   if (q->exit_rq_fn)
+   q->exit_rq_fn(q, element);
+   kfree(element);
+}
+
 int blk_init_rl(struct request_list *rl, struct request_queue *q,
gfp_t gfp_mask)
 {
@@ -629,10 +653,15 @@ int blk_init_rl(struct request_list *rl, struct request_queue *q,
init_waitqueue_head(&rl->wait[BLK_RW_SYNC]);
init_waitqueue_head(&rl->wait[BLK_RW_ASYNC]);
 
-   rl->rq_pool = mempool_create_node(BLKDEV_MIN_RQ, alloc_request_struct,
- free_request_struct,
- (void *)(long)q->node, gfp_mask,
- q->node);
+   if (q->cmd_size) {
+   rl->rq_pool = mempool_create_node(BLKDEV_MIN_RQ,
+   alloc_request_size, free_request_size,
+   q, gfp_mask, q->node);
+   } else {
+   rl->rq_pool = mempool_create_node(BLKDEV_MIN_RQ,
+   alloc_request_simple, free_request_simple,
+   q, gfp_mask, q->node);
+   }
if (!rl->rq_pool)
return -ENOMEM;
 
@@ -846,12 +875,15 @@ static blk_qc_t blk_queue_bio(struct request_queue *q, struct bio *bio);
 
 int blk_init_allocated_queue(struct request_queue *q)
 {
-   q->fq = blk_alloc_flush_queue(q, NUMA_NO_NODE, 0);
+   q->fq = blk_alloc_flush_queue(q, NUMA_NO_NODE, q->cmd_size);
if (!q->fq)
return -ENOMEM;
 
+   if (q->init_rq_fn && q->init_rq_fn(q, q->fq->flush_rq, GFP_KERNEL))
+   goto out_free_flush_queue;
+
if (blk_init_rl(&q->root_rl, q, GFP_KERNEL))
-   goto fail;
+   goto out_exit_flush_rq;
 
INIT_WORK(&q->timeout_work, blk_timeout_work);
q->queue_flags  |= QUEUE_FLAG_DEFAULT;
@@ -869,13 +901,16 @@ int blk_init_allocated_queue(struct request_queue *q)
/* init elevator */
if (elevator_init(q, NULL)) {
mutex_unlock(&q->sysfs_lock);
-   goto fail;
+   goto out_exit_flush_rq;
}
 
mutex_unlock(&q->sysfs_lock);
return 0;
 
-fail:
+out_exit_flush_rq:
+   if (q->exit_rq_fn)
+   q->exit_rq_fn(q, q->fq->flush_rq);
+out_free_flush_queue:
blk_free_flush_queue(q->fq);
wbt_exit(q);
return -ENOMEM;
diff --git a/block/blk-flush.c b/block/blk-flush.c
index d7de34e..bf3ba3c 100644
--- a/block/blk-flush.c
+++ b/block/blk-flush.c
@@ -547,11 +547,10 @@ struct blk_flush_queue *blk_alloc_flush_queue(struct request_queue *q,
if (!fq)
goto fail;
 
-   if (q->mq_ops) {
+   if (q->mq_ops)
spin_lock_init(&fq->mq_flush_lock);
-   rq_sz = round_up(rq_sz + cmd_size, cache_line_size());
-   }
 
+   rq_sz = round_up(rq_sz + cmd_size, cache_line_size());
fq->flush_rq = kzalloc_node(rq_sz, GFP_KERNEL, node);
if (!fq->flush_rq)
goto fail_rq;
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index 1dbce05..894f773 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -814,10 +814,13 @@ 
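
For a legacy (request_fn) driver the new hooks would be used roughly
like this (a sketch; my_pdu, my_init_rq and my_exit_rq are made-up
names, and the fields are assumed to be set before calling
blk_init_allocated_queue(), as dm-rq does in patch 07/18):

    q->cmd_size   = sizeof(struct my_pdu);
    q->init_rq_fn = my_init_rq;     /* optional constructor */
    q->exit_rq_fn = my_exit_rq;     /* optional destructor */
    if (blk_init_allocated_queue(q) < 0)
            return -EINVAL;

    /* the per-request data sits directly behind struct request */
    struct my_pdu *pdu = (struct my_pdu *)(rq + 1);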

[dm-devel] [PATCH 12/18] scsi: respect unchecked_isa_dma for blk-mq

2017-01-25 Thread Christoph Hellwig
Currently blk-mq always allocates the sense buffer using normal GFP_KERNEL
allocation.  Refactor the cmd pool code to split the cmd and sense allocation
and share the code to allocate the sense buffers as well as the sense buffer
slab caches between the legacy and blk-mq path.

Note that this switches to lazy allocation of the sense slab caches - the
slab caches (not the actual allocations) won't be destroyed until the scsi
module is unloaded, instead of keeping track of the hosts using them.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Hannes Reinecke 
---
 drivers/scsi/hosts.c |  4 
 drivers/scsi/scsi.c  | 24 ---
 drivers/scsi/scsi_lib.c  | 62 +---
 drivers/scsi/scsi_priv.h |  5 
 4 files changed, 73 insertions(+), 22 deletions(-)

diff --git a/drivers/scsi/hosts.c b/drivers/scsi/hosts.c
index 258a3f9..6d29c4a 100644
--- a/drivers/scsi/hosts.c
+++ b/drivers/scsi/hosts.c
@@ -213,6 +213,10 @@ int scsi_add_host_with_dma(struct Scsi_Host *shost, struct device *dev,
goto fail;
}
 
+   error = scsi_init_sense_cache(shost);
+   if (error)
+   goto fail;
+
if (shost_use_blk_mq(shost)) {
error = scsi_mq_setup_tags(shost);
if (error)
diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
index 0f93892..469aa0f 100644
--- a/drivers/scsi/scsi.c
+++ b/drivers/scsi/scsi.c
@@ -100,22 +100,18 @@ EXPORT_SYMBOL(scsi_sd_pm_domain);
 
 struct scsi_host_cmd_pool {
struct kmem_cache   *cmd_slab;
-   struct kmem_cache   *sense_slab;
unsigned intusers;
char*cmd_name;
-   char*sense_name;
unsigned intslab_flags;
 };
 
 static struct scsi_host_cmd_pool scsi_cmd_pool = {
.cmd_name   = "scsi_cmd_cache",
-   .sense_name = "scsi_sense_cache",
.slab_flags = SLAB_HWCACHE_ALIGN,
 };
 
 static struct scsi_host_cmd_pool scsi_cmd_dma_pool = {
.cmd_name   = "scsi_cmd_cache(DMA)",
-   .sense_name = "scsi_sense_cache(DMA)",
.slab_flags = SLAB_HWCACHE_ALIGN|SLAB_CACHE_DMA,
 };
 
@@ -136,7 +132,7 @@ scsi_host_free_command(struct Scsi_Host *shost, struct scsi_cmnd *cmd)
 
if (cmd->prot_sdb)
kmem_cache_free(scsi_sdb_cache, cmd->prot_sdb);
-   kmem_cache_free(pool->sense_slab, cmd->sense_buffer);
+   scsi_free_sense_buffer(shost, cmd->sense_buffer);
kmem_cache_free(pool->cmd_slab, cmd);
 }
 
@@ -158,7 +154,8 @@ scsi_host_alloc_command(struct Scsi_Host *shost, gfp_t gfp_mask)
if (!cmd)
goto fail;
 
-   cmd->sense_buffer = kmem_cache_alloc(pool->sense_slab, gfp_mask);
+   cmd->sense_buffer = scsi_alloc_sense_buffer(shost, gfp_mask,
+   NUMA_NO_NODE);
if (!cmd->sense_buffer)
goto fail_free_cmd;
 
@@ -171,7 +168,7 @@ scsi_host_alloc_command(struct Scsi_Host *shost, gfp_t gfp_mask)
return cmd;
 
 fail_free_sense:
-   kmem_cache_free(pool->sense_slab, cmd->sense_buffer);
+   scsi_free_sense_buffer(shost, cmd->sense_buffer);
 fail_free_cmd:
kmem_cache_free(pool->cmd_slab, cmd);
 fail:
@@ -301,7 +298,6 @@ scsi_find_host_cmd_pool(struct Scsi_Host *shost)
 static void
 scsi_free_host_cmd_pool(struct scsi_host_cmd_pool *pool)
 {
-   kfree(pool->sense_name);
kfree(pool->cmd_name);
kfree(pool);
 }
@@ -317,8 +313,7 @@ scsi_alloc_host_cmd_pool(struct Scsi_Host *shost)
return NULL;
 
pool->cmd_name = kasprintf(GFP_KERNEL, "%s_cmd", hostt->proc_name);
-   pool->sense_name = kasprintf(GFP_KERNEL, "%s_sense", hostt->proc_name);
-   if (!pool->cmd_name || !pool->sense_name) {
+   if (!pool->cmd_name) {
scsi_free_host_cmd_pool(pool);
return NULL;
}
@@ -357,12 +352,6 @@ scsi_get_host_cmd_pool(struct Scsi_Host *shost)
   pool->slab_flags, NULL);
if (!pool->cmd_slab)
goto out_free_pool;
-
-   pool->sense_slab = kmem_cache_create(pool->sense_name,
-SCSI_SENSE_BUFFERSIZE, 0,
-pool->slab_flags, NULL);
-   if (!pool->sense_slab)
-   goto out_free_slab;
}
 
pool->users++;
@@ -371,8 +360,6 @@ scsi_get_host_cmd_pool(struct Scsi_Host *shost)
mutex_unlock(&host_cmd_pool_mutex);
return retval;
 
-out_free_slab:
-   kmem_cache_destroy(pool->cmd_slab);
 out_free_pool:
if (hostt->cmd_size) {
scsi_free_host_cmd_pool(pool);
@@ -398,7 +385,6 @@ static void scsi_put_host_cmd_pool(struct Scsi_Host *shost)
 
if (!--pool->users) {
kmem_cache_destroy(pool->cmd_slab);
-   
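
The shared sense-buffer helpers are cut off in the archived diff; the
idea is roughly the following sketch (assuming selection of the slab
cache by shost->unchecked_isa_dma, per the commit description):

    static struct kmem_cache *scsi_sense_cache;
    static struct kmem_cache *scsi_sense_isadma_cache;

    static struct kmem_cache *
    scsi_select_sense_cache(struct Scsi_Host *shost)
    {
            return shost->unchecked_isa_dma ?
                    scsi_sense_isadma_cache : scsi_sense_cache;
    }

    unsigned char *scsi_alloc_sense_buffer(struct Scsi_Host *shost,
                                           gfp_t gfp_mask, int numa_node)
    {
            return kmem_cache_alloc_node(scsi_select_sense_cache(shost),
                                         gfp_mask, numa_node);
    }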

[dm-devel] [PATCH 11/18] scsi: remove gfp_flags member in scsi_host_cmd_pool

2017-01-25 Thread Christoph Hellwig
When using the slab allocator we already decide at cache creation time if
an allocation comes from a GFP_DMA pool using the SLAB_CACHE_DMA flag,
and there is no point passing the kmalloc-family only GFP_DMA flag to
kmem_cache_alloc.  Drop all the infrastructure for doing so.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Johannes Thumshirn 
---
 drivers/scsi/scsi.c | 14 --
 1 file changed, 4 insertions(+), 10 deletions(-)

diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
index 75455d4..0f93892 100644
--- a/drivers/scsi/scsi.c
+++ b/drivers/scsi/scsi.c
@@ -105,7 +105,6 @@ struct scsi_host_cmd_pool {
char*cmd_name;
char*sense_name;
unsigned intslab_flags;
-   gfp_t   gfp_mask;
 };
 
 static struct scsi_host_cmd_pool scsi_cmd_pool = {
@@ -118,7 +117,6 @@ static struct scsi_host_cmd_pool scsi_cmd_dma_pool = {
.cmd_name   = "scsi_cmd_cache(DMA)",
.sense_name = "scsi_sense_cache(DMA)",
.slab_flags = SLAB_HWCACHE_ALIGN|SLAB_CACHE_DMA,
-   .gfp_mask   = __GFP_DMA,
 };
 
 static DEFINE_MUTEX(host_cmd_pool_mutex);
@@ -156,12 +154,11 @@ scsi_host_alloc_command(struct Scsi_Host *shost, gfp_t gfp_mask)
struct scsi_host_cmd_pool *pool = shost->cmd_pool;
struct scsi_cmnd *cmd;
 
-   cmd = kmem_cache_zalloc(pool->cmd_slab, gfp_mask | pool->gfp_mask);
+   cmd = kmem_cache_zalloc(pool->cmd_slab, gfp_mask);
if (!cmd)
goto fail;
 
-   cmd->sense_buffer = kmem_cache_alloc(pool->sense_slab,
-gfp_mask | pool->gfp_mask);
+   cmd->sense_buffer = kmem_cache_alloc(pool->sense_slab, gfp_mask);
if (!cmd->sense_buffer)
goto fail_free_cmd;
 
@@ -327,10 +324,8 @@ scsi_alloc_host_cmd_pool(struct Scsi_Host *shost)
}
 
pool->slab_flags = SLAB_HWCACHE_ALIGN;
-   if (shost->unchecked_isa_dma) {
+   if (shost->unchecked_isa_dma)
pool->slab_flags |= SLAB_CACHE_DMA;
-   pool->gfp_mask = __GFP_DMA;
-   }
 
if (hostt->cmd_size)
hostt->cmd_pool = pool;
@@ -424,7 +419,6 @@ static void scsi_put_host_cmd_pool(struct Scsi_Host *shost)
  */
 int scsi_setup_command_freelist(struct Scsi_Host *shost)
 {
-   const gfp_t gfp_mask = shost->unchecked_isa_dma ? GFP_DMA : GFP_KERNEL;
struct scsi_cmnd *cmd;
 
spin_lock_init(&shost->free_list_lock);
@@ -437,7 +431,7 @@ int scsi_setup_command_freelist(struct Scsi_Host *shost)
/*
 * Get one backup command for this host.
 */
-   cmd = scsi_host_alloc_command(shost, gfp_mask);
+   cmd = scsi_host_alloc_command(shost, GFP_KERNEL);
if (!cmd) {
scsi_put_host_cmd_pool(shost);
shost->cmd_pool = NULL;
-- 
2.1.4

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel


[dm-devel] [PATCH 10/18] scsi_dh_hp_sw: switch to scsi_execute_req_flags()

2017-01-25 Thread Christoph Hellwig
From: Hannes Reinecke 

Switch to scsi_execute_req_flags() instead of using the block interface
directly.  This will set REQ_QUIET and REQ_PREEMPT, but this is okay as
we're evaluating the errors anyway and should be able to send the command
even if the device is quiesced.

Signed-off-by: Hannes Reinecke 
Signed-off-by: Christoph Hellwig 
---
 drivers/scsi/device_handler/scsi_dh_hp_sw.c | 222 
 1 file changed, 65 insertions(+), 157 deletions(-)

diff --git a/drivers/scsi/device_handler/scsi_dh_hp_sw.c b/drivers/scsi/device_handler/scsi_dh_hp_sw.c
index 308e871..be43c94 100644
--- a/drivers/scsi/device_handler/scsi_dh_hp_sw.c
+++ b/drivers/scsi/device_handler/scsi_dh_hp_sw.c
@@ -38,13 +38,10 @@
 #define HP_SW_PATH_PASSIVE 1
 
 struct hp_sw_dh_data {
-   unsigned char sense[SCSI_SENSE_BUFFERSIZE];
int path_state;
int retries;
int retry_cnt;
struct scsi_device *sdev;
-   activate_complete   callback_fn;
-   void*callback_data;
 };
 
 static int hp_sw_start_stop(struct hp_sw_dh_data *);
@@ -56,43 +53,34 @@ static int hp_sw_start_stop(struct hp_sw_dh_data *);
  *
  * Returns SCSI_DH_DEV_OFFLINED if the sdev is on the passive path
  */
-static int tur_done(struct scsi_device *sdev, unsigned char *sense)
+static int tur_done(struct scsi_device *sdev, struct hp_sw_dh_data *h,
+   struct scsi_sense_hdr *sshdr)
 {
-   struct scsi_sense_hdr sshdr;
-   int ret;
+   int ret = SCSI_DH_IO;
 
-   ret = scsi_normalize_sense(sense, SCSI_SENSE_BUFFERSIZE, &sshdr);
-   if (!ret) {
-   sdev_printk(KERN_WARNING, sdev,
-   "%s: sending tur failed, no sense available\n",
-   HP_SW_NAME);
-   ret = SCSI_DH_IO;
-   goto done;
-   }
-   switch (sshdr.sense_key) {
+   switch (sshdr->sense_key) {
case UNIT_ATTENTION:
ret = SCSI_DH_IMM_RETRY;
break;
case NOT_READY:
-   if ((sshdr.asc == 0x04) && (sshdr.ascq == 2)) {
+   if (sshdr->asc == 0x04 && sshdr->ascq == 2) {
/*
 * LUN not ready - Initialization command required
 *
 * This is the passive path
 */
-   ret = SCSI_DH_DEV_OFFLINED;
+   h->path_state = HP_SW_PATH_PASSIVE;
+   ret = SCSI_DH_OK;
break;
}
/* Fallthrough */
default:
sdev_printk(KERN_WARNING, sdev,
   "%s: sending tur failed, sense %x/%x/%x\n",
-  HP_SW_NAME, sshdr.sense_key, sshdr.asc,
-  sshdr.ascq);
+  HP_SW_NAME, sshdr->sense_key, sshdr->asc,
+  sshdr->ascq);
break;
}
-
-done:
return ret;
 }
 
@@ -105,131 +93,36 @@ static int tur_done(struct scsi_device *sdev, unsigned char *sense)
  */
 static int hp_sw_tur(struct scsi_device *sdev, struct hp_sw_dh_data *h)
 {
-   struct request *req;
-   int ret;
+   unsigned char cmd[6] = { TEST_UNIT_READY };
+   struct scsi_sense_hdr sshdr;
+   int ret = SCSI_DH_OK, res;
+   u64 req_flags = REQ_FAILFAST_DEV | REQ_FAILFAST_TRANSPORT |
+   REQ_FAILFAST_DRIVER;
 
 retry:
-   req = blk_get_request(sdev->request_queue, WRITE, GFP_NOIO);
-   if (IS_ERR(req))
-   return SCSI_DH_RES_TEMP_UNAVAIL;
-
-   blk_rq_set_block_pc(req);
-   req->cmd_flags |= REQ_FAILFAST_DEV | REQ_FAILFAST_TRANSPORT |
- REQ_FAILFAST_DRIVER;
-   req->cmd_len = COMMAND_SIZE(TEST_UNIT_READY);
-   req->cmd[0] = TEST_UNIT_READY;
-   req->timeout = HP_SW_TIMEOUT;
-   req->sense = h->sense;
-   memset(req->sense, 0, SCSI_SENSE_BUFFERSIZE);
-   req->sense_len = 0;
-
-   ret = blk_execute_rq(req->q, NULL, req, 1);
-   if (ret == -EIO) {
-   if (req->sense_len > 0) {
-   ret = tur_done(sdev, h->sense);
-   } else {
+   res = scsi_execute_req_flags(sdev, cmd, DMA_NONE, NULL, 0, &sshdr,
+HP_SW_TIMEOUT, HP_SW_RETRIES,
+NULL, req_flags, 0);
+   if (res) {
+   if (scsi_sense_valid(&sshdr))
+   ret = tur_done(sdev, h, &sshdr);
+   else {
sdev_printk(KERN_WARNING, sdev,
"%s: sending tur failed with %x\n",
-   HP_SW_NAME, req->errors);
+   HP_SW_NAME, res);
ret = SCSI_DH_IO;
}
} else {
h->path_state = HP_SW_PATH_ACTIVE;

Re: [dm-devel] deterministic io throughput in multipath

2017-01-25 Thread Benjamin Marzinski
On Wed, Jan 25, 2017 at 11:48:33AM +, Muneendra Kumar M wrote:
> Hi Ben,
> Thanks for the review .
> I will consider the below points and will do the necessary changes.
> 
> I have two general questions which may not be related to this.
> 1) Are there any standard tests that we need to do to check the functionality
> of the multipath daemon?

No. multipath doesn't have a standard set of regression tests.  You need
to do your own testing.

> 2) I am new to git; are there any standard steps which we generally follow to
> push the changes?

You don't need to use git to push a patch, but it is easier to process
if your patch is inline in the email instead of as an attachment
(assuming your mail client doesn't mangle the patch).

If you want to use git, you just need to commit your patches to a branch
off the head of master. Then you can build patches with

# git format-patch --cover-letter -s -n -o <outdir> origin

and send them with

# git send-email --to "device-mapper development <dm-devel@redhat.com>" \
  --cc "Christophe Varoqui" --no-chain-reply-to --suppress-from \
  <outdir>/*.patch

You may first need to set up your git name and email.
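
For example (one-time setup; the name and address below are placeholders):

# git config --global user.name "Your Name"
# git config --global user.email "you@example.com"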

-Ben


Re: [dm-devel] deterministic io throughput in multipath

2017-01-25 Thread Muneendra Kumar M
Hi Ben,
Thanks for the review.
I will consider the points below and make the necessary changes.

I have two general questions which may not be related to this.
1) Are there any standard tests we need to run to check the functionality of
the multipath daemon?
2) I am new to git; are there any standard steps we generally follow to push
the changes?

Regards,
Muneendra.




Re: [dm-devel] deterministic io throughput in multipath

2017-01-25 Thread Benjamin Marzinski
This looks fine to me.  If this is what you want to push, I'm o.k. with it.
But I'd like to make some suggestions that you are free to ignore.

Right now you have to check in two places to see if the path failed (in
update_multipath and check_path). If you look at the delayed_*_checks
code, it flags the path failures when you reinstate the path in
check_path, since this will only happen there.

Next, right now you use the disable_reinstate code to deal with the
devices when they shouldn't be reinstated. The issue with this is that
the path appears to be up when people look at its state, but still isn't
being used. If you do the check early and set the path state to
PATH_DELAYED, like delayed_*_checks does, then the path is clearly
marked when users look to see why it isn't being used. Also, if you exit
check_path early, then you won't be running the prioritizer on these
likely-unstable paths.

Finally, the way you use dis_reinstate_time, a flaky device can get
reinstated as soon as it comes back up, as long as it was down for long
enough, simply because pp->dis_reinstate_time reached
mpp->san_path_err_recovery_time while the device was failed.
delayed_*_checks depends on a number of successful path checks, so
you know that the device has at least been nominally functional for
san_path_err_recovery_time.
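
To make these suggestions concrete, here is a rough, untested sketch of an
early check in check_path(). The san_path_err_* names follow this thread;
san_path_err_good_checks is a hypothetical counter of consecutive successful
path checks, not an existing multipath-tools field:

    /* Sketch only: delay flaky paths instead of silently reinstating them. */
    static int san_path_err_delay(struct path *pp, struct multipath *mpp)
    {
            if (pp->path_failures < mpp->san_path_err_threshold)
                    return 0;   /* not enough failures to treat as flaky */
            /* Judge recovery by successful checks rather than wall-clock
             * time, so time spent down never counts as time spent healthy
             * (checkint is the seconds between path checks). */
            if (pp->san_path_err_good_checks * pp->checkint >=
                mpp->san_path_err_recovery_time)
                    return 0;   /* nominally functional long enough */
            pp->state = PATH_DELAYED;   /* clearly marked for users */
            return 1;   /* caller exits check_path() early, skipping
                         * the prioritizer on this unstable path */
    }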

Like I said, you don't have to change any of this to make me happy with
your patch. But if you did change all of these, then the current
delay_*_checks code would just end up being a special case of your code.
I'd really like to pull out the delayed_*_checks code and just keep your
version, since it seems more useful. It would be nice to keep the same
functionality. But even if you don't make these changes, I still think
we should pull out the delayed_*_checks code, since they both do the
same general thing, and your code does it better.

-Ben

On Mon, Jan 23, 2017 at 11:02:42AM +, Muneendra Kumar M wrote:
>Hi Ben,
>I have made the changes as per the review comments below.
> 
>Could you please review the attached patch and provide us your valuable
>comments.
>Below are the files that have been changed.
> 
>    libmultipath/config.c      |  3 +++
>    libmultipath/config.h      |  9 +
>    libmultipath/configure.c   |  3 +++
>    libmultipath/defaults.h    |  3 ++-
>    libmultipath/dict.c        | 84
>    libmultipath/dict.h        |  3 +--
>    libmultipath/propsel.c     | 48 ++--
>    libmultipath/propsel.h     |  3 +++
>    libmultipath/structs.h     | 14 ++
>    libmultipath/structs_vec.c |  6 ++
>    multipath/multipath.conf.5 | 57 +
>    multipathd/main.c          | 70 +++---
> 
> 
>Regards,
>Muneendra.
> 
>_
>From: Muneendra Kumar M
>Sent: Tuesday, January 17, 2017 4:13 PM
>To: 'Benjamin Marzinski' 
>Cc: dm-devel@redhat.com
>Subject: RE: [dm-devel] deterministic io throughput in multipath
> 
> 
>Hi Ben,
>Thanks for the review.
>In dict.c I will factor out generic functions which will be used by both
>delay_checks and err_checks.
> 
>We want to increment the path failures every time the path goes down
>regardless of whether multipathd or the kernel noticed the failure of
>paths.
>Thanks for pointing this out.
> 
>I completely agree with the idea you mentioned below of replacing
>san_path_err_threshold_window with san_path_err_forget_rate. This will
>avoid counting time when the path was down as time when the path wasn't
>having problems.
> 
>I will incorporate all the changes mentioned below and will resend the
>patch once the testing is done.
> 
>Regards,
>Muneendra.
> 
> 
> 
>-----Original Message-----
>From: Benjamin Marzinski [mailto:bmarz...@redhat.com]
>Sent: Tuesday, January 17, 2017 6:35 AM
>To: Muneendra Kumar M <mmand...@brocade.com>
>Cc: dm-devel@redhat.com
>Subject: Re: [dm-devel] deterministic io throughput in multipath
> 
>On Mon, Jan 16, 2017 at 11:19:19AM +, Muneendra Kumar M wrote:
>>    Hi Ben,
>>    After the below discussion we came up with the approach which will
>>    meet our requirement.
>>    I have attached the patch, which is working well in our field tests.
>>    Could you please review the attached patch and provide us your
>>    valuable comments.
> 
>I can see a number of issues with this patch.
> 
>First, some nit-picks:
>- I assume "dis_reinstante_time" should be "dis_reinstate_time"
> 
>- The indenting in