[dm-devel] [PATCH] multipath-tools: improve processing efficiency for addition and deletion of multipath devices

2017-01-17 Thread tang . junhui
From: "tang.junhui" 

Change-Id: I3f81a55fff389f991f915927000b281d7e263cc5
Signed-off-by: tang.junhui 

This patch improves processing efficiency for the addition and deletion
of multipath devices.

This patch has passed testing on the ZTE multipath automatic testing system.
The modification noticeably reduces system consumption (such as CPU usage)
and shortens processing time when massive numbers of multipath devices are
added or deleted.

The main processing flow of the code is:
1) add a uid_attrs configuration option in the defaults section:
   It configures the udev attribute that provides a unique path identifier
   for the corresponding type of path devices. If this field is configured
   and matches the device type, it overrides any other method of providing
   a unique device identifier in the config file, and it activates merging
   of uevents by that identifier to improve efficiency in processing
   uevents. It has no default value, so by default only uevent filtering
   works and uevent merging does not. Users who want to identify paths by
   a udev attribute and activate uevent merging for SCSI and DASD devices
   can set its value to:
   "sd:ID_SERIAL dasd:ID_UID"
2) uevent accumulation in uevent-burst scenarios:
   wait one second for more uevents in uevent_listen() during uevent
   bursts.
3) uevent preparing, filtering, and merging:
   discard unusable uevents and fetch the path identifier from each uevent;
   filter uevents;
   merge uevents.
4) uevent processing:
   process the merged uevents on the uev->merge_node list without calling
   domap();
   process the last uevent, uev, with a call to domap().

Any comments are welcome, and it would be appreciated if these
patches were considered for inclusion in the upstream
multipath-tools.

Thank you all,
Tang Junhui
---
 libmultipath/config.c      |   3 +
 libmultipath/config.h      |   1 +
 libmultipath/dict.c        |   3 +
 libmultipath/discovery.c   |   5 +-
 libmultipath/discovery.h   |   2 +-
 libmultipath/list.h        |  41 ++
 libmultipath/propsel.c     |   7 +
 libmultipath/uevent.c      | 319 +++--
 libmultipath/uevent.h      |   2 +
 libmultipath/util.c        |  40 ++
 libmultipath/util.h        |   1 +
 multipath/multipath.conf.5 |  18 +++
 multipathd/cli_handlers.c  |   4 +-
 multipathd/main.c          |  93 +
 multipathd/main.h          |   4 +-
 15 files changed, 468 insertions(+), 75 deletions(-)

diff --git a/libmultipath/config.c b/libmultipath/config.c
index 15ddbd8..765e91d 100644
--- a/libmultipath/config.c
+++ b/libmultipath/config.c
@@ -488,6 +488,9 @@ free_config (struct config * conf)
if (conf->uid_attribute)
FREE(conf->uid_attribute);
 
+   if (conf->uid_attrs)
+   FREE(conf->uid_attrs);
+
if (conf->getuid)
FREE(conf->getuid);
 
diff --git a/libmultipath/config.h b/libmultipath/config.h
index 9670020..ab85930 100644
--- a/libmultipath/config.h
+++ b/libmultipath/config.h
@@ -153,6 +153,7 @@ struct config {
 
char * multipath_dir;
char * selector;
+   char * uid_attrs;
char * uid_attribute;
char * getuid;
char * features;
diff --git a/libmultipath/dict.c b/libmultipath/dict.c
index dc21846..0a531d1 100644
--- a/libmultipath/dict.c
+++ b/libmultipath/dict.c
@@ -249,6 +249,8 @@ declare_ovr_snprint(selector, print_str)
 declare_mp_handler(selector, set_str)
 declare_mp_snprint(selector, print_str)
 
+declare_def_handler(uid_attrs, set_str)
+declare_def_snprint(uid_attrs, print_str)
 declare_def_handler(uid_attribute, set_str)
 declare_def_snprint_defstr(uid_attribute, print_str, DEFAULT_UID_ATTRIBUTE)
 declare_ovr_handler(uid_attribute, set_str)
@@ -1367,6 +1369,7 @@ init_keywords(vector keywords)
install_keyword("multipath_dir", &def_multipath_dir_handler, 
&snprint_def_multipath_dir);
install_keyword("path_selector", &def_selector_handler, 
&snprint_def_selector);
install_keyword("path_grouping_policy", &def_pgpolicy_handler, 
&snprint_def_pgpolicy);
+   install_keyword("uid_attrs", &def_uid_attrs_handler, 
&snprint_def_uid_attrs);
install_keyword("uid_attribute", &def_uid_attribute_handler, 
&snprint_def_uid_attribute);
install_keyword("getuid_callout", &def_getuid_handler, 
&snprint_def_getuid);
install_keyword("prio", &def_prio_name_handler, &snprint_def_prio_name);
diff --git a/libmultipath/discovery.c b/libmultipath/discovery.c
index d1aec31..14904f2 100644
--- a/libmultipath/discovery.c
+++ b/libmultipath/discovery.c
@@ -33,7 +33,7 @@
 
 int
 alloc_path_with_pathinfo (struct config *conf, struct udev_device *udevice,
- int flag, struct path **pp_ptr)
+ char *wwid, int flag, struct path **pp_ptr)
 {
int err = PATHINFO_FAILED;
struct path * pp;
@@ -51,6 +51,9 @@ alloc_path_with_pathinfo (struct config *conf, struct 
udev_device *ude

Re: [dm-devel] [RFC PATCH 0/6] Add bulk skcipher requests to crypto API and dm-crypt

2017-01-17 Thread Herbert Xu
On Tue, Jan 17, 2017 at 12:20:02PM +0100, Ondrej Mosnáček wrote:
> 2017-01-13 15:29 GMT+01:00 Herbert Xu :
> > What if the driver had hardware support for generating these IVs?
> > With your scheme this cannot be supported at all.
> 
> That's true... I'm starting to think that this isn't really a good
> idea. I was mainly trying to keep the door open for the random IV
> support and also to keep the multi-key stuff (which was really only
> intended for loop-AES partition support) out of the crypto API, but
> both of these can be probably solved in a better way...

As you said that the multi-key stuff is legacy-only I too would like
to see a way to keep that complexity out of the common path.

> > With such a definition you could either generate the IVs in dm-crypt
> > or have them generated in the IV generator.
> 
> That seems kind of hacky to me... but if that's what you prefer, then so be 
> it.

I'm open to other proposals.  The basic requirement is to be able to
process multiple blocks as one entity at the driver level, potentially
generating the IVs there too.

It's essentially the equivalent to full IPsec offload.

Thanks,
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel

Re: [dm-devel] [patch] block: add blktrace C events for bio-based drivers

2017-01-17 Thread Jeff Moyer
Jens Axboe  writes:

> On 01/17/2017 01:57 PM, Jeff Moyer wrote:
>> Only a few bio-based drivers actually generate blktrace completion
>> (C) events.  Instead of changing all bio-based drivers to call
>> trace_block_bio_complete, move the tracing to bio_complete, and remove
>> the explicit tracing from the few drivers that actually do it.  After
>> this patch, there is exactly one caller of trace_block_bio_complete
>> and one caller of trace_block_rq_complete.  More importantly, all
>> bio-based drivers now generate C events, which is useful for
>> performance analysis.
>
> I like the change, hate the naming. I'd prefer one of two things:
>
> - Add bio_endio_complete() instead. That name sucks too, the
>   important part is flipping the __name() to have a trace
>   version instead.

I had also considered bio_endio_notrace().

> - Mark the bio as trace completed, and keep the naming. Since
>   it's only off the completion path, that can be just marking
>   the bi_flags non-atomically.
>
> I probably prefer the latter.

Hmm, okay.  I'll take a crack at that.

Thanks!
Jeff



Re: [dm-devel] [patch] block: add blktrace C events for bio-based drivers

2017-01-17 Thread Jens Axboe
On 01/17/2017 01:57 PM, Jeff Moyer wrote:
> Only a few bio-based drivers actually generate blktrace completion
> (C) events.  Instead of changing all bio-based drivers to call
> trace_block_bio_complete, move the tracing to bio_complete, and remove
> the explicit tracing from the few drivers that actually do it.  After
> this patch, there is exactly one caller of trace_block_bio_complete
> and one caller of trace_block_rq_complete.  More importantly, all
> bio-based drivers now generate C events, which is useful for
> performance analysis.

I like the change, hate the naming. I'd prefer one of two things:

- Add bio_endio_complete() instead. That name sucks too, the
  important part is flipping the __name() to have a trace
  version instead.

- Mark the bio as trace completed, and keep the naming. Since
  it's only off the completion path, that can be just marking
  the bi_flags non-atomically.

I probably prefer the latter.

-- 
Jens Axboe



[dm-devel] [patch] block: add blktrace C events for bio-based drivers

2017-01-17 Thread Jeff Moyer
Only a few bio-based drivers actually generate blktrace completion
(C) events.  Instead of changing all bio-based drivers to call
trace_block_bio_complete, move the tracing to bio_complete, and remove
the explicit tracing from the few drivers that actually do it.  After
this patch, there is exactly one caller of trace_block_bio_complete
and one caller of trace_block_rq_complete.  More importantly, all
bio-based drivers now generate C events, which is useful for
performance analysis.

Suggested-by: Christoph Hellwig 
Signed-off-by: Jeff Moyer 
---

Testing: I made sure that request-based drivers don't see duplicate
completions, and that bio-based drivers show both Q and C events.  I
haven't tested all affected drivers or combinations, though.

diff --git a/block/bio.c b/block/bio.c
index 2b37502..ba5daad 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -1785,16 +1785,7 @@ static inline bool bio_remaining_done(struct bio *bio)
return false;
 }
 
-/**
- * bio_endio - end I/O on a bio
- * @bio:   bio
- *
- * Description:
- *   bio_endio() will end I/O on the whole bio. bio_endio() is the preferred
- *   way to end I/O on a bio. No one should call bi_end_io() directly on a
- *   bio unless they own it and thus know that it has an end_io function.
- **/
-void bio_endio(struct bio *bio)
+void __bio_endio(struct bio *bio)
 {
 again:
if (!bio_remaining_done(bio))
@@ -1816,6 +1807,22 @@ void bio_endio(struct bio *bio)
if (bio->bi_end_io)
bio->bi_end_io(bio);
 }
+
+/**
+ * bio_endio - end I/O on a bio
+ * @bio:   bio
+ *
+ * Description:
+ *   bio_endio() will end I/O on the whole bio. bio_endio() is the preferred
+ *   way to end I/O on a bio. No one should call bi_end_io() directly on a
+ *   bio unless they own it and thus know that it has an end_io function.
+ **/
+void bio_endio(struct bio *bio)
+{
+   trace_block_bio_complete(bdev_get_queue(bio->bi_bdev),
+bio, bio->bi_error);
+   __bio_endio(bio);
+}
 EXPORT_SYMBOL(bio_endio);
 
 /**
diff --git a/block/blk-core.c b/block/blk-core.c
index 61ba08c..f77f2d9 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -153,7 +153,7 @@ static void req_bio_endio(struct request *rq, struct bio 
*bio,
 
/* don't actually finish bio if it's part of flush sequence */
if (bio->bi_iter.bi_size == 0 && !(rq->rq_flags & RQF_FLUSH_SEQ))
-   bio_endio(bio);
+   __bio_endio(bio);
 }
 
 void blk_dump_rq_flags(struct request *rq, char *msg)
@@ -1947,7 +1947,7 @@ generic_make_request_checks(struct bio *bio)
err = -EOPNOTSUPP;
 end_io:
bio->bi_error = err;
-   bio_endio(bio);
+   __bio_endio(bio);
return false;
 }
 
diff --git a/block/blk.h b/block/blk.h
index 041185e..1c9b50a 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -57,6 +57,7 @@ int blk_init_rl(struct request_list *rl, struct request_queue 
*q,
gfp_t gfp_mask);
 void blk_exit_rl(struct request_list *rl);
 void init_request_from_bio(struct request *req, struct bio *bio);
+void __bio_endio(struct bio *bio);
 void blk_rq_bio_prep(struct request_queue *q, struct request *rq,
struct bio *bio);
 void blk_queue_bypass_start(struct request_queue *q);
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 3086da5..e151aef 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -807,7 +807,6 @@ static void dec_pending(struct dm_io *io, int error)
queue_io(md, bio);
} else {
/* done with normal IO or empty flush */
-   trace_block_bio_complete(md->queue, bio, io_error);
bio->bi_error = io_error;
bio_endio(bio);
}
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 36c13e4..17b4e06 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -159,8 +159,6 @@ static void return_io(struct bio_list *return_bi)
struct bio *bi;
while ((bi = bio_list_pop(return_bi)) != NULL) {
bi->bi_iter.bi_size = 0;
-   trace_block_bio_complete(bdev_get_queue(bi->bi_bdev),
-bi, 0);
bio_endio(bi);
}
 }
@@ -4902,8 +4900,6 @@ static void raid5_align_endio(struct bio *bi)
rdev_dec_pending(rdev, conf->mddev);
 
if (!error) {
-   trace_block_bio_complete(bdev_get_queue(raid_bi->bi_bdev),
-raid_bi, 0);
bio_endio(raid_bi);
if (atomic_dec_and_test(&conf->active_aligned_reads))
wake_up(&conf->wait_for_quiescent);
@@ -5470,8 +5466,6 @@ static void raid5_make_request(struct mddev *mddev, 
struct bio * bi)
if ( rw == WRITE )
md_write_end(mddev);
 
-   trace_block_bio_complete(bdev_get_queue(bi->bi_bdev),
-bi, 0);
 

Re: [dm-devel] [PATCH 04/11] multipathd: add need_do_map to indicate whether need calling domap() in ev_add_path()

2017-01-17 Thread Benjamin Marzinski
On Tue, Jan 17, 2017 at 03:28:06PM +0800, tang.jun...@zte.com.cn wrote:
>Hello Ben
> 
>Thank you for your patience again.
> 
>I'll modify the code according to your suggestion as follows:
>1) add a uid_attrs configuration option in the defaults section:
>   uid_attrs "sd:ID_SERIAL dasd:ID_UID"
>   it would override any other configuration if this
>   field is configured and matched;
> 
>2) in the uevent processing thread, we will assign merge_id
>   according to the label in uevents from this configuration;
> 
>3) this patch will bring back:
>   [PATCH 12/12] libmultipath: use existing wwid when
>   wwid has already been existed in uevent
> 
>4) if this field is not configured, only filtering is done and
>   no merging occurs.
> 
>Please confirm whether this modification is feasible.

Yes. This is a perfectly reasonable solution.  Thanks for doing all the
work on this.

-Ben

> 
>Regards,
>Tang Junhui
> 
>From:    "Benjamin Marzinski" 
>To:      tang.jun...@zte.com.cn,
>Cc:      christophe.varo...@opensvc.com, h...@suse.de,
>mwi...@suse.com, bart.vanass...@sandisk.com, dm-devel@redhat.com,
>zhang.ka...@zte.com.cn, tang.wenj...@zte.com.cn
>Date:    2017/01/17 05:38
>Subject: Re: [PATCH 04/11] multipathd: add need_do_map to indicate
>whether need calling domap() in ev_add_path()
> 
>--
> 
>On Thu, Jan 12, 2017 at 01:52:20PM +0800, tang.jun...@zte.com.cn wrote:
>> From: tang.junhui 
>>
>> Usually calling domap() in ev_add_path() is needed, but only last path
>> need to call domap() in processing for merged uevents to reduce the
>> count of calling domap() and promote efficiency. So add input parameter
>> need_do_map to indicate whether need calling domap() in ev_add_path().
> 
>With the addition of checking if the merge_id equals the wwid, you are
>protected against accidentially merging paths that shouldn't be merged,
>which is great.  But setting need_do_map for these paths doesn't
>completely make sure that if the wwid and merge_id differ, you will
>always call domap.
> 
>A contrived situation where this fails would look like:
> 
>add path1, path2, path3
> 
>where merge_id equals the wwid for path1 and path2, but there is a
>different wwid for path3.  In this case, you would call domap on just
>the multipath device for path3, but since path1 and path2 matched the
>merge_id, they wouldn't force a call to domap.
> 
>A less contrived example would be
> 
>add path1, path2, path3, path4
> 
>Where these were devices that were actually pointing at two different
>LUNs, but all set ID_SERIAL the same. path1 and path2 pointed to one
>LUN, while path3 and path4 pointed to another LUN.  In this case the
>wwid of path1 and path2 matched the merge_id, while the wwid of path3
>and path4 was different. In this case you would call domap twice, on
>both path3 and path4, but nothing would call domap to create a multipath
>device for path1 and path2.
> 
>In general, the code to set need_do_map if the wwid and merge_id don't
>match only works if either none of the device in a merged set have wwids
>that match the merge_id, or if the last device has a wwid that matches
>the merge_id. If there are devices with wwids that match the merge_id,
>but the last device in the set isn't one of them, then some devices will
>not get a multipath device created for them.
> 
>Now, I don't know of any device that works like my above example, so
>your code will likely work fine for all real-world devices.  Also,
>fixing this is a pain, as you don't find out until processing the last
>path in a set that things went wrong, and then you would have to go back
>and run the skipped functions on one of the paths you have already
>processed.
> 
>The easy way to fix it is to use the other possibility that I mentioned
>before, which is to not have the merge_id, and just use the udev
>provided wwid, instead of fetching it from pathinfo.  Like I mentioned,
>if you do this, you want to make sure that you only try to grab the wwid
>from the udev events for devices with the correct kernel name: ID_SERIAL
>only for "sd.*" devices, and ID_UID only for "dasd.*" devices. I also
>think that this should be configurable.
> 
>Otherwise, you can either go through the messy work of calling domap
>correctly when the last device of a set has a wwid that doesn't match
>the merge_id, or we can decide that this won't actually cause problems
>with any known device, and punt fixing it for now. And if it causes
>problems with some future devices, we can deal with it then.
> 
>-Ben
> 
>>
>> Change-Id: I7f33de49a1d89d91180dcb95cd94e4944ae7ff36
>> Signed-off-by: tang.wenjun 
>> ---
>>  multipathd/cli_

Re: [dm-devel] kernel oops with blk-mq-sched latest

2017-01-17 Thread Jens Axboe
On 01/17/2017 04:47 AM, Jens Axboe wrote:
> On 01/17/2017 12:57 AM, Hannes Reinecke wrote:
>> Hi Jens,
>>
>> I gave your latest patchset from
>>
>> git.kernel.dk/linux-block blk-mq-sched
>>
>> I see a kernel oops when shutting down:
>>
>> [ 2132.708929] systemd-shutdown[1]: Detaching DM devices.
>> [ 2132.965107] BUG: unable to handle kernel NULL pointer dereference at
>> 
>> 0001
>> [ 2133.037182] IP: dd_merged_requests+0x6/0x60
>> [ 2133.077816] PGD 0
>> [ 2133.077818]
>> [ 2133.113087] Oops:  [#1] SMP
>> [ list of modules removed ]
>> [ 2133.925265] CPU: 20 PID: 1 Comm: systemd-shutdow Not tainted
>> 4.10.0-rc4+ #543
>> [ 2133.990034] Hardware name: HP ProLiant DL380p Gen8, BIOS P70 09/18/2013
>> [ 2134.050522] task: 88042d614040 task.stack: c9000315
>> [ 2134.106915] RIP: 0010:dd_merged_requests+0x6/0x60
>> [ 2134.150593] RSP: 0018:c90003153b18 EFLAGS: 00010002
>> [ 2134.198740] RAX: 81cc6de0 RBX: 8804296d5040 RCX:
>> 0001
>> [ 2134.262708] RDX: 0001 RSI: 0001 RDI:
>> 8804296d5040
>> [ 2134.326987] RBP: c90003153b30 R08:  R09:
>> 
>> [ 2134.391054] R10:  R11: 0001f8180001f815 R12:
>> 
>> [ 2134.456095] R13: 8804296d5040 R14: 8804099801f0 R15:
>> 0004
>> [ 2134.521196] FS:  7fd64d3bf840() GS:88042f90()
>> knlGS:
>> [ 2134.595178] CS:  0010 DS:  ES:  CR0: 80050033
>> [ 2134.648637] CR2: 0001 CR3: 00081b892000 CR4:
>> 000406e0
>> [ 2134.713349] Call Trace:
>> [ 2134.737168]  ? elv_drain_elevator+0x29/0xa0
>> [ 2134.775821]  __blk_drain_queue+0x52/0x1a0
>> [ 2134.812473]  blk_queue_bypass_start+0x6e/0xa0
>> [ 2134.854009]  blkcg_deactivate_policy+0x30/0xf0
>> [ 2134.894993]  blk_throtl_exit+0x34/0x50
>> [ 2134.929450]  blkcg_exit_queue+0x35/0x40
>> [ 2134.965089]  blk_release_queue+0x33/0xe0
>> [ 2135.001364]  kobject_cleanup+0x63/0x170
>> [ 2135.037412]  kobject_put+0x25/0x50
>> [ 2135.068622]  blk_cleanup_queue+0x198/0x260
>> [ 2135.107780]  cleanup_mapped_device+0xb5/0xf0 [dm_mod]
>> [ 2135.154741]  __dm_destroy+0x1a5/0x290 [dm_mod]
>> [ 2135.195009]  dm_destroy+0x13/0x20 [dm_mod]
>> [ 2135.232149]  dev_remove+0xde/0x120 [dm_mod]
>> [ 2135.270184]  ? dev_suspend+0x210/0x210 [dm_mod]
>> [ 2135.311478]  ctl_ioctl+0x20b/0x510 [dm_mod]
>> [ 2135.349680]  ? terminate_walk+0xc3/0x140
>> [ 2135.385442]  ? kmem_cache_free+0x10/0x260
>> [ 2135.422315]  dm_ctl_ioctl+0x13/0x20 [dm_mod]
>>
>> Known issue?
> 
> Try with this, looks like we're calling the old bypass path for the new
> path, which is now triggering because q->elevator is true.

Should have grepped, there's one more path that uses the old bypass
path for q->mq_ops, potentially. This one handles that one, too.


diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index 8ba0af780e88..2630f64bed19 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -1223,7 +1223,11 @@ int blkcg_activate_policy(struct request_queue *q,
if (blkcg_policy_enabled(q, pol))
return 0;
 
-   blk_queue_bypass_start(q);
+   if (q->mq_ops) {
+   blk_mq_freeze_queue(q);
+   blk_mq_quiesce_queue(q);
+   } else
+   blk_queue_bypass_start(q);
 pd_prealloc:
if (!pd_prealloc) {
pd_prealloc = pol->pd_alloc_fn(GFP_KERNEL, q->node);
@@ -1261,7 +1265,10 @@ int blkcg_activate_policy(struct request_queue *q,
 
spin_unlock_irq(q->queue_lock);
 out_bypass_end:
-   blk_queue_bypass_end(q);
+   if (q->mq_ops)
+   blk_mq_unfreeze_queue(q);
+   else
+   blk_queue_bypass_end(q);
if (pd_prealloc)
pol->pd_free_fn(pd_prealloc);
return ret;
@@ -1284,7 +1291,12 @@ void blkcg_deactivate_policy(struct request_queue *q,
if (!blkcg_policy_enabled(q, pol))
return;
 
-   blk_queue_bypass_start(q);
+   if (q->mq_ops) {
+   blk_mq_freeze_queue(q);
+   blk_mq_quiesce_queue(q);
+   } else
+   blk_queue_bypass_start(q);
+
spin_lock_irq(q->queue_lock);
 
__clear_bit(pol->plid, q->blkcg_pols);
@@ -1304,7 +1316,11 @@ void blkcg_deactivate_policy(struct request_queue *q,
}
 
spin_unlock_irq(q->queue_lock);
-   blk_queue_bypass_end(q);
+
+   if (q->mq_ops)
+   blk_mq_unfreeze_queue(q);
+   else
+   blk_queue_bypass_end(q);
 }
 EXPORT_SYMBOL_GPL(blkcg_deactivate_policy);
 
diff --git a/block/elevator.c b/block/elevator.c
index d14cb87e6564..464372840774 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -613,6 +613,9 @@ void elv_drain_elevator(struct request_queue *q)
 {
static int printed;
 
+   if (WARN_ON_ONCE(q->mq_ops))
+   return;
+
lockdep_assert_held(q->queue_lock);
 
while (q->elevator->type->ops.sq.elevator_dispatch_fn(q, 1))

Re: [dm-devel] kernel oops with blk-mq-sched latest

2017-01-17 Thread Jens Axboe
On 01/17/2017 12:57 AM, Hannes Reinecke wrote:
> Hi Jens,
> 
> I gave your latest patchset from
> 
> git.kernel.dk/linux-block blk-mq-sched
> 
> I see a kernel oops when shutting down:
> 
> [ 2132.708929] systemd-shutdown[1]: Detaching DM devices.
> [ 2132.965107] BUG: unable to handle kernel NULL pointer dereference at
> 
> 0001
> [ 2133.037182] IP: dd_merged_requests+0x6/0x60
> [ 2133.077816] PGD 0
> [ 2133.077818]
> [ 2133.113087] Oops:  [#1] SMP
> [ list of modules removed ]
> [ 2133.925265] CPU: 20 PID: 1 Comm: systemd-shutdow Not tainted
> 4.10.0-rc4+ #543
> [ 2133.990034] Hardware name: HP ProLiant DL380p Gen8, BIOS P70 09/18/2013
> [ 2134.050522] task: 88042d614040 task.stack: c9000315
> [ 2134.106915] RIP: 0010:dd_merged_requests+0x6/0x60
> [ 2134.150593] RSP: 0018:c90003153b18 EFLAGS: 00010002
> [ 2134.198740] RAX: 81cc6de0 RBX: 8804296d5040 RCX:
> 0001
> [ 2134.262708] RDX: 0001 RSI: 0001 RDI:
> 8804296d5040
> [ 2134.326987] RBP: c90003153b30 R08:  R09:
> 
> [ 2134.391054] R10:  R11: 0001f8180001f815 R12:
> 
> [ 2134.456095] R13: 8804296d5040 R14: 8804099801f0 R15:
> 0004
> [ 2134.521196] FS:  7fd64d3bf840() GS:88042f90()
> knlGS:
> [ 2134.595178] CS:  0010 DS:  ES:  CR0: 80050033
> [ 2134.648637] CR2: 0001 CR3: 00081b892000 CR4:
> 000406e0
> [ 2134.713349] Call Trace:
> [ 2134.737168]  ? elv_drain_elevator+0x29/0xa0
> [ 2134.775821]  __blk_drain_queue+0x52/0x1a0
> [ 2134.812473]  blk_queue_bypass_start+0x6e/0xa0
> [ 2134.854009]  blkcg_deactivate_policy+0x30/0xf0
> [ 2134.894993]  blk_throtl_exit+0x34/0x50
> [ 2134.929450]  blkcg_exit_queue+0x35/0x40
> [ 2134.965089]  blk_release_queue+0x33/0xe0
> [ 2135.001364]  kobject_cleanup+0x63/0x170
> [ 2135.037412]  kobject_put+0x25/0x50
> [ 2135.068622]  blk_cleanup_queue+0x198/0x260
> [ 2135.107780]  cleanup_mapped_device+0xb5/0xf0 [dm_mod]
> [ 2135.154741]  __dm_destroy+0x1a5/0x290 [dm_mod]
> [ 2135.195009]  dm_destroy+0x13/0x20 [dm_mod]
> [ 2135.232149]  dev_remove+0xde/0x120 [dm_mod]
> [ 2135.270184]  ? dev_suspend+0x210/0x210 [dm_mod]
> [ 2135.311478]  ctl_ioctl+0x20b/0x510 [dm_mod]
> [ 2135.349680]  ? terminate_walk+0xc3/0x140
> [ 2135.385442]  ? kmem_cache_free+0x10/0x260
> [ 2135.422315]  dm_ctl_ioctl+0x13/0x20 [dm_mod]
> 
> Known issue?

Try with this, looks like we're calling the old bypass path for the new
path, which is now triggering because q->elevator is true.


diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index 8ba0af780e88..1dd14c5b637a 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -1284,7 +1284,12 @@ void blkcg_deactivate_policy(struct request_queue *q,
if (!blkcg_policy_enabled(q, pol))
return;
 
-   blk_queue_bypass_start(q);
+   if (q->mq_ops) {
+   blk_mq_freeze_queue(q);
+   blk_mq_quiesce_queue(q);
+   } else
+   blk_queue_bypass_start(q);
+
spin_lock_irq(q->queue_lock);
 
__clear_bit(pol->plid, q->blkcg_pols);
@@ -1304,7 +1309,11 @@ void blkcg_deactivate_policy(struct request_queue *q,
}
 
spin_unlock_irq(q->queue_lock);
-   blk_queue_bypass_end(q);
+
+   if (q->mq_ops)
+   blk_mq_unfreeze_queue(q);
+   else
+   blk_queue_bypass_end(q);
 }
 EXPORT_SYMBOL_GPL(blkcg_deactivate_policy);
 
diff --git a/block/elevator.c b/block/elevator.c
index d14cb87e6564..464372840774 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -613,6 +613,9 @@ void elv_drain_elevator(struct request_queue *q)
 {
static int printed;
 
+   if (WARN_ON_ONCE(q->mq_ops))
+   return;
+
lockdep_assert_held(q->queue_lock);
 
while (q->elevator->type->ops.sq.elevator_dispatch_fn(q, 1))


-- 
Jens Axboe



Re: [dm-devel] [RFC PATCH 6/6] dm-crypt: Add bulk crypto processing support

2017-01-17 Thread Ondrej Mosnáček
Hi Binoy,

2017-01-16 9:37 GMT+01:00 Binoy Jayan :
> The initial goal of our proposal was to process the encryption requests with 
> the
> maximum possible block sizes with a hardware which has automated iv generation
> capabilities. But when it is done in software, and if the bulk
> requests are processed
> sequentially, one block at a time, the memory foot print could be
> reduced even if
> the bulk request exceeds a page. While your patch looks good, there
> are couple of
> drawbacks one of which is the maximum size of a bulk request is a page. This
> could limit the capability of the crypto hardware. If the whole bio is
> processed at
> once, which is what qualcomm's version of dm-req-crypt does, it achieves an 
> even
> better performance.

I see... well, I added the limit only so that the async fallback
implementation can allocate multiple requests, so they can be
processed in parallel, as they would be in the current dm-crypt code.
I'm not really sure if that brings any benefit, but I guess if some HW
accelerator has multiple engines, then this allows distributing the
work among them. (I wonder how switching to the crypto API's IV
generation will affect the situation for drivers that can process
requests in parallel, but do not support the IV generators...)

I could remove the limit and switch the fallback to sequential
processing (or maybe even allocate the requests from a mempool, the
way dm-crypt does it now...), but after Herbert's feedback I'm
probably going to scrap this patchset anyway...

>> Note that if the 'keycount' parameter of the cipher specification is set to a
>> value other than 1, dm-crypt still sends only one sector in each request, 
>> since
>> in such case the neighboring sectors are encrypted with different keys.
>
> This could be avoided if the key management is done at the crypto layer.

Yes, but remember that the only reasonable use-case for using keycount
!= 1 is mounting loop-AES partitions (which is kind of a legacy
format, so there is not much point in making HW drivers for it). It is
an unfortunate consequence of Milan's decision to make keycount an
independent part of the cipher specification (instead of making it
specific for the LMK mode), that all the other IV modes are now
'polluted' with the requirement to support it.

I discussed with Milan the possibility of deprecating the keycount
parameter (i.e. allowing only value of 64 for LMK and 1 for all the
other IV modes) and then converting the IV modes to skciphers (or IV
generators, or some combination of both). This would significantly
simplify the key management and allow for better optimization
strategies. However, I don't know if such change would be accepted by
device-mapper maintainers, since it may break someone's unusual
dm-crypt configuration...

Cheers,
Ondrej



Re: [dm-devel] [RFC PATCH 0/6] Add bulk skcipher requests to crypto API and dm-crypt

2017-01-17 Thread Ondrej Mosnáček
2017-01-13 15:29 GMT+01:00 Herbert Xu :
> What if the driver had hardware support for generating these IVs?
> With your scheme this cannot be supported at all.

That's true... I'm starting to think that this isn't really a good
idea. I was mainly trying to keep the door open for the random IV
support and also to keep the multi-key stuff (which was really only
intended for loop-AES partition support) out of the crypto API, but
both of these can be probably solved in a better way...

> Getting the IVs back is not actually that hard.  We could simply
> change the algorithm definition for the IV generator so that
> the IVs are embedded in the plaintext and ciphertext.  For
> example, you could declare it so that the for n sectors the
> first n*ivsize bytes would be the IV, and the actual plaintext
> or ciphertext would follow.
>
> With such a definition you could either generate the IVs in dm-crypt
> or have them generated in the IV generator.

That seems kind of hacky to me... but if that's what you prefer, then so be it.

Cheers,
Ondrej

>
> Cheers,
> --
> Email: Herbert Xu 
> Home Page: http://gondor.apana.org.au/~herbert/
> PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt



Re: [dm-devel] deterministic io throughput in multipath

2017-01-17 Thread Muneendra Kumar M
Hi Ben,
Thanks for the review.
In dict.c I will create generic functions that can be used by both
delay_checks and err_checks.

We want to increment the path failures every time the path goes down,
regardless of whether multipathd or the kernel noticed the failure.
Thanks for pointing this out.

I completely agree with the idea you mentioned below of replacing
san_path_err_threshold_window with san_path_err_forget_rate. This avoids
counting time when the path was down as time when the path wasn't
having problems.

I will incorporate all the changes mentioned below and resend the patch
once testing is done.

Regards,
Muneendra.



-Original Message-
From: Benjamin Marzinski [mailto:bmarz...@redhat.com]
Sent: Tuesday, January 17, 2017 6:35 AM
To: Muneendra Kumar M 
Cc: dm-devel@redhat.com
Subject: Re: [dm-devel] deterministic io throughput in multipath

On Mon, Jan 16, 2017 at 11:19:19AM +, Muneendra Kumar M wrote:
>Hi Ben,
>After the discussion below, we came up with an approach that meets our
>requirement.
>I have attached the patch, which is working well in our field tests.
>Could you please review the attached patch and provide your valuable
>comments.

I can see a number of issues with this patch.

First, some nit-picks:
- I assume "dis_reinstante_time" should be "dis_reinstate_time"

- The indenting in check_path_validity_err is wrong, which made it
  confusing until I noticed that

if (clock_gettime(CLOCK_MONOTONIC, &start_time) != 0)

  doesn't have an open brace, and shouldn't indent the rest of the
  function.

- You call clock_gettime in check_path, but never use the result.

- In dict.c, instead of writing your own functions that are the same as
  the *_delay_checks functions, you could make those functions generic
  and use them for both.  To match the other generic function names
  they would probably be something like

set_off_int_undef

print_off_int_undef

  You would also need to change DELAY_CHECKS_* and ERR_CHECKS_* to
  point to some common enum that you created, the way
  user_friendly_names_states (to name one of many) does. The generic
  enum used by *_off_int_undef would be something like:

enum no_undef {
	NU_NO = -1,
	NU_UNDEF = 0,
};

  The idea is to try to cut down on the number of functions that are
  simply copy-pasting other functions in dict.c.


Those are all minor cleanup issues, but there are some bigger problems.

Instead of checking if san_path_err_threshold, san_path_err_threshold_window, 
and san_path_err_recovery_time are greater than zero separately, you should 
probably check them all at the start of check_path_validity_err, and return 0 
unless they all are set.
Right now, if a user sets san_path_err_threshold and 
san_path_err_threshold_window but not san_path_err_recovery_time, their path 
will never recover after it hits the error threshold.  I'm pretty sure that you 
don't mean to permanently disable the paths.


time_t is a signed type, which means that if you get the clock time in 
update_multipath and then fail to get the clock time in check_path_validity_err, 
this check:

(start_time.tv_sec - pp->failure_start_time) < 
pp->mpp->san_path_err_threshold_window

will always be true.  I realize that clock_gettime is very unlikely to fail.  
But if it does, probably the safest thing to do is to just immediately return 0 
in check_path_validity_err.


The way you set path_failures in update_multipath may not get you what you 
want.  It will only count path failures found by the kernel, and not the path 
checker.  If check_path finds the error, pp->state will be set to PATH_DOWN 
before pp->dmstate is set to PSTATE_FAILED. That means you will not increment 
path_failures. Perhaps this is what you want, but I would assume that you would 
want to count every time the path goes down regardless of whether multipathd or 
the kernel noticed it.


I'm not super enthusiastic about how the san_path_err_threshold_window works.  
First, it starts counting from when the path goes down, so if the path takes 
long enough to get restored, and then fails immediately, it can just keep 
failing and it will never hit the san_path_err_threshold_window, since it 
spends so much of that time with the path failed.  Also, the window gets set on 
the first error, and never reset until the number of errors is over the 
threshold.  This means that if you get one early error and then a bunch of 
errors much later, you will go for (2 x san_path_err_threshold) - 1 errors 
until you stop reinstating the path, because of the window reset in the middle 
of the string of errors.  It seems like a better idea would be to have 
check_path_validity_err reset path_failures as soon as it notices that you are 
past san_path_err_threshold_window, instead of waiting till the number of 
errors hits san_path_err_threshold.


If I was going to design this, I think I would have san_path_err_th

[dm-devel] kernel oops with blk-mq-sched latest

2017-01-17 Thread Hannes Reinecke
Hi Jens,

I gave your latest patchset from

git.kernel.dk/linux-block blk-mq-sched

a spin, and I see a kernel oops when shutting down:

[ 2132.708929] systemd-shutdown[1]: Detaching DM devices.
[ 2132.965107] BUG: unable to handle kernel NULL pointer dereference at 0001
[ 2133.037182] IP: dd_merged_requests+0x6/0x60
[ 2133.077816] PGD 0
[ 2133.077818]
[ 2133.113087] Oops:  [#1] SMP
[ list of modules removed ]
[ 2133.925265] CPU: 20 PID: 1 Comm: systemd-shutdow Not tainted
4.10.0-rc4+ #543
[ 2133.990034] Hardware name: HP ProLiant DL380p Gen8, BIOS P70 09/18/2013
[ 2134.050522] task: 88042d614040 task.stack: c9000315
[ 2134.106915] RIP: 0010:dd_merged_requests+0x6/0x60
[ 2134.150593] RSP: 0018:c90003153b18 EFLAGS: 00010002
[ 2134.198740] RAX: 81cc6de0 RBX: 8804296d5040 RCX:
0001
[ 2134.262708] RDX: 0001 RSI: 0001 RDI:
8804296d5040
[ 2134.326987] RBP: c90003153b30 R08:  R09:

[ 2134.391054] R10:  R11: 0001f8180001f815 R12:

[ 2134.456095] R13: 8804296d5040 R14: 8804099801f0 R15:
0004
[ 2134.521196] FS:  7fd64d3bf840() GS:88042f90()
knlGS:
[ 2134.595178] CS:  0010 DS:  ES:  CR0: 80050033
[ 2134.648637] CR2: 0001 CR3: 00081b892000 CR4:
000406e0
[ 2134.713349] Call Trace:
[ 2134.737168]  ? elv_drain_elevator+0x29/0xa0
[ 2134.775821]  __blk_drain_queue+0x52/0x1a0
[ 2134.812473]  blk_queue_bypass_start+0x6e/0xa0
[ 2134.854009]  blkcg_deactivate_policy+0x30/0xf0
[ 2134.894993]  blk_throtl_exit+0x34/0x50
[ 2134.929450]  blkcg_exit_queue+0x35/0x40
[ 2134.965089]  blk_release_queue+0x33/0xe0
[ 2135.001364]  kobject_cleanup+0x63/0x170
[ 2135.037412]  kobject_put+0x25/0x50
[ 2135.068622]  blk_cleanup_queue+0x198/0x260
[ 2135.107780]  cleanup_mapped_device+0xb5/0xf0 [dm_mod]
[ 2135.154741]  __dm_destroy+0x1a5/0x290 [dm_mod]
[ 2135.195009]  dm_destroy+0x13/0x20 [dm_mod]
[ 2135.232149]  dev_remove+0xde/0x120 [dm_mod]
[ 2135.270184]  ? dev_suspend+0x210/0x210 [dm_mod]
[ 2135.311478]  ctl_ioctl+0x20b/0x510 [dm_mod]
[ 2135.349680]  ? terminate_walk+0xc3/0x140
[ 2135.385442]  ? kmem_cache_free+0x10/0x260
[ 2135.422315]  dm_ctl_ioctl+0x13/0x20 [dm_mod]

Known issue?

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)
