[dm-devel] [PATCH] multipath-tools: improve processing efficiency for addition and deletion of multipath devices
From: "tang.junhui" Change-Id: I3f81a55fff389f991f915927000b281d7e263cc5 Signed-off-by: tang.junhui This patch used to improve processing efficiency for addition and deletion of multipath devices. This patch is tested pass by ZTE multipath automatic testing system. The modification reduces the system consumption(such as CPU) and shortens the processing time obviously in scene of massive multipath devices addition or deletion. The main processing flow of code is: 1) add uid_attrs configuration in the defaults section: It is configured udev attribute which providing a unique path identifier for corresponding type of path devices. If this field is configured and matched with type of device, it would override any other methods providing for device unique identifier in config file, and it would activate merging uevents according to the identifier to promote effiecncy in processing uevents. Tt has no default value, so defaultly only uevents filtering works, and uevents merging does not works, if users want to identify path by udev attribute and to activate merging uevents for SCSI and DAS device, they can set it's value as: "sd:ID_SERIAL dasd:ID_UID" 2) uevents accumulation in uevents burst scene: wait one seconds for more uevents in uevent_listen() in uevents burst situations 3) uevents preparing, filtering and merging: discard unuse uevents and fetch path idendifier from uevents; filter uevents; merge uevents. 4) uevents proccessing: proccess the merged uevents in uev->merge_node list without calling domap(); proccess the last uevents uev with calling domap(). Any comment will be welcome, and it would be appreciated if these patches would be considered for inclusion in the upstream multipath-tools. 
Thank you all,
Tang Junhui
---
 libmultipath/config.c      |   3 +
 libmultipath/config.h      |   1 +
 libmultipath/dict.c        |   3 +
 libmultipath/discovery.c   |   5 +-
 libmultipath/discovery.h   |   2 +-
 libmultipath/list.h        |  41 ++
 libmultipath/propsel.c     |   7 +
 libmultipath/uevent.c      | 319 +++--
 libmultipath/uevent.h      |   2 +
 libmultipath/util.c        |  40 ++
 libmultipath/util.h        |   1 +
 multipath/multipath.conf.5 |  18 +++
 multipathd/cli_handlers.c  |   4 +-
 multipathd/main.c          |  93 +
 multipathd/main.h          |   4 +-
 15 files changed, 468 insertions(+), 75 deletions(-)

diff --git a/libmultipath/config.c b/libmultipath/config.c
index 15ddbd8..765e91d 100644
--- a/libmultipath/config.c
+++ b/libmultipath/config.c
@@ -488,6 +488,9 @@ free_config (struct config * conf)
 	if (conf->uid_attribute)
 		FREE(conf->uid_attribute);
 
+	if (conf->uid_attrs)
+		FREE(conf->uid_attrs);
+
 	if (conf->getuid)
 		FREE(conf->getuid);
 
diff --git a/libmultipath/config.h b/libmultipath/config.h
index 9670020..ab85930 100644
--- a/libmultipath/config.h
+++ b/libmultipath/config.h
@@ -153,6 +153,7 @@ struct config {
 	char * multipath_dir;
 	char * selector;
+	char * uid_attrs;
 	char * uid_attribute;
 	char * getuid;
 	char * features;
diff --git a/libmultipath/dict.c b/libmultipath/dict.c
index dc21846..0a531d1 100644
--- a/libmultipath/dict.c
+++ b/libmultipath/dict.c
@@ -249,6 +249,8 @@ declare_ovr_snprint(selector, print_str)
 declare_mp_handler(selector, set_str)
 declare_mp_snprint(selector, print_str)
 
+declare_def_handler(uid_attrs, set_str)
+declare_def_snprint(uid_attrs, print_str)
 declare_def_handler(uid_attribute, set_str)
 declare_def_snprint_defstr(uid_attribute, print_str, DEFAULT_UID_ATTRIBUTE)
 declare_ovr_handler(uid_attribute, set_str)
@@ -1367,6 +1369,7 @@ init_keywords(vector keywords)
 	install_keyword("multipath_dir", &def_multipath_dir_handler, &snprint_def_multipath_dir);
 	install_keyword("path_selector", &def_selector_handler, &snprint_def_selector);
 	install_keyword("path_grouping_policy", &def_pgpolicy_handler, &snprint_def_pgpolicy);
+	install_keyword("uid_attrs", &def_uid_attrs_handler, &snprint_def_uid_attrs);
 	install_keyword("uid_attribute", &def_uid_attribute_handler, &snprint_def_uid_attribute);
 	install_keyword("getuid_callout", &def_getuid_handler, &snprint_def_getuid);
 	install_keyword("prio", &def_prio_name_handler, &snprint_def_prio_name);
diff --git a/libmultipath/discovery.c b/libmultipath/discovery.c
index d1aec31..14904f2 100644
--- a/libmultipath/discovery.c
+++ b/libmultipath/discovery.c
@@ -33,7 +33,7 @@
 int
 alloc_path_with_pathinfo (struct config *conf, struct udev_device *udevice,
-			  int flag, struct path **pp_ptr)
+			  char *wwid, int flag, struct path **pp_ptr)
 {
 	int err = PATHINFO_FAILED;
 	struct path * pp;
@@ -51,6 +51,9 @@ alloc_path_with_pathinfo (struct config *conf, struct udev_device *ude
Re: [dm-devel] [RFC PATCH 0/6] Add bulk skcipher requests to crypto API and dm-crypt
On Tue, Jan 17, 2017 at 12:20:02PM +0100, Ondrej Mosnáček wrote:
> 2017-01-13 15:29 GMT+01:00 Herbert Xu :
> > What if the driver had hardware support for generating these IVs?
> > With your scheme this cannot be supported at all.
>
> That's true... I'm starting to think that this isn't really a good
> idea. I was mainly trying to keep the door open for the random IV
> support and also to keep the multi-key stuff (which was really only
> intended for loop-AES partition support) out of the crypto API, but
> both of these can probably be solved in a better way...

As you said that the multi-key stuff is legacy-only, I too would like
to see a way to keep that complexity out of the common path.

> > With such a definition you could either generate the IVs in dm-crypt
> > or have them generated in the IV generator.
>
> That seems kind of hacky to me... but if that's what you prefer, then
> so be it.

I'm open to other proposals. The basic requirement is to be able to
process multiple blocks as one entity at the driver level, potentially
generating the IVs there too. It's essentially the equivalent of full
IPsec offload.

Thanks,
-- 
Email: Herbert Xu
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
Re: [dm-devel] [patch] block: add blktrace C events for bio-based drivers
Jens Axboe writes:
> On 01/17/2017 01:57 PM, Jeff Moyer wrote:
>> Only a few bio-based drivers actually generate blktrace completion
>> (C) events. Instead of changing all bio-based drivers to call
>> trace_block_bio_complete, move the tracing to bio_endio, and remove
>> the explicit tracing from the few drivers that actually do it. After
>> this patch, there is exactly one caller of trace_block_bio_complete
>> and one caller of trace_block_rq_complete. More importantly, all
>> bio-based drivers now generate C events, which is useful for
>> performance analysis.
>
> I like the change, hate the naming. I'd prefer one of two things:
>
> - Add bio_endio_complete() instead. That name sucks too, the
>   important part is flipping the __name() to have a trace
>   version instead.

I had also considered bio_endio_notrace().

> - Mark the bio as trace completed, and keep the naming. Since
>   it's only off the completion path, that can be just marking
>   the bi_flags non-atomically.
>
> I probably prefer the latter.

Hmm, okay. I'll take a crack at that. Thanks!

Jeff
Re: [dm-devel] [patch] block: add blktrace C events for bio-based drivers
On 01/17/2017 01:57 PM, Jeff Moyer wrote:
> Only a few bio-based drivers actually generate blktrace completion
> (C) events. Instead of changing all bio-based drivers to call
> trace_block_bio_complete, move the tracing to bio_endio, and remove
> the explicit tracing from the few drivers that actually do it. After
> this patch, there is exactly one caller of trace_block_bio_complete
> and one caller of trace_block_rq_complete. More importantly, all
> bio-based drivers now generate C events, which is useful for
> performance analysis.

I like the change, hate the naming. I'd prefer one of two things:

- Add bio_endio_complete() instead. That name sucks too, the
  important part is flipping the __name() to have a trace
  version instead.
- Mark the bio as trace completed, and keep the naming. Since
  it's only off the completion path, that can be just marking
  the bi_flags non-atomically.

I probably prefer the latter.

-- 
Jens Axboe
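Jens's second option above — marking the bio so the completion trace fires
exactly once — can be illustrated outside the kernel. The following is an
editorial Python sketch of the idea only, not kernel code; the flag name
BIO_TRACE_COMPLETED and the helper names are invented here:

```python
trace_log = []

class Bio:
    """Minimal stand-in for struct bio: just a name and a flag set."""
    def __init__(self, name):
        self.name = name
        self.flags = set()  # stands in for bi_flags

def trace_block_bio_complete(bio):
    # stands in for the blktrace C (complete) event
    trace_log.append(bio.name)

def bio_endio(bio):
    # emit the completion event at most once per bio, then mark it;
    # per Jens, off the completion path a non-atomic flag is enough
    if "BIO_TRACE_COMPLETED" not in bio.flags:
        bio.flags.add("BIO_TRACE_COMPLETED")
        trace_block_bio_complete(bio)
    # ... chain to bi_end_io as before ...

def req_bio_endio(bio):
    # request-based drivers already log a request-level C event, so
    # pre-mark the flag to suppress a duplicate bio-level event
    bio.flags.add("BIO_TRACE_COMPLETED")
    bio_endio(bio)

bio = Bio("bio0")
bio_endio(bio)
bio_endio(bio)  # completion path re-entered: traced only once
```

The point of the flag is that bio_endio() can keep its name and callers,
while double-tracing (and request-path tracing) is avoided.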
[dm-devel] [patch] block: add blktrace C events for bio-based drivers
Only a few bio-based drivers actually generate blktrace completion (C)
events. Instead of changing all bio-based drivers to call
trace_block_bio_complete, move the tracing to bio_endio, and remove the
explicit tracing from the few drivers that actually do it. After this
patch, there is exactly one caller of trace_block_bio_complete and one
caller of trace_block_rq_complete. More importantly, all bio-based
drivers now generate C events, which is useful for performance analysis.

Suggested-by: Christoph Hellwig
Signed-off-by: Jeff Moyer
---
Testing: I made sure that request-based drivers don't see duplicate
completions, and that bio-based drivers show both Q and C events. I
haven't tested all affected drivers or combinations, though.

diff --git a/block/bio.c b/block/bio.c
index 2b37502..ba5daad 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -1785,16 +1785,7 @@ static inline bool bio_remaining_done(struct bio *bio)
 	return false;
 }
 
-/**
- * bio_endio - end I/O on a bio
- * @bio:	bio
- *
- * Description:
- *   bio_endio() will end I/O on the whole bio. bio_endio() is the preferred
- *   way to end I/O on a bio. No one should call bi_end_io() directly on a
- *   bio unless they own it and thus know that it has an end_io function.
- **/
-void bio_endio(struct bio *bio)
+void __bio_endio(struct bio *bio)
 {
 again:
 	if (!bio_remaining_done(bio))
@@ -1816,6 +1807,22 @@ void bio_endio(struct bio *bio)
 	if (bio->bi_end_io)
 		bio->bi_end_io(bio);
 }
+
+/**
+ * bio_endio - end I/O on a bio
+ * @bio:	bio
+ *
+ * Description:
+ *   bio_endio() will end I/O on the whole bio. bio_endio() is the preferred
+ *   way to end I/O on a bio. No one should call bi_end_io() directly on a
+ *   bio unless they own it and thus know that it has an end_io function.
+ **/
+void bio_endio(struct bio *bio)
+{
+	trace_block_bio_complete(bdev_get_queue(bio->bi_bdev),
+				 bio, bio->bi_error);
+	__bio_endio(bio);
+}
 EXPORT_SYMBOL(bio_endio);
 
 /**
diff --git a/block/blk-core.c b/block/blk-core.c
index 61ba08c..f77f2d9 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -153,7 +153,7 @@ static void req_bio_endio(struct request *rq, struct bio *bio,
 
 	/* don't actually finish bio if it's part of flush sequence */
 	if (bio->bi_iter.bi_size == 0 && !(rq->rq_flags & RQF_FLUSH_SEQ))
-		bio_endio(bio);
+		__bio_endio(bio);
 }
 
 void blk_dump_rq_flags(struct request *rq, char *msg)
@@ -1947,7 +1947,7 @@ generic_make_request_checks(struct bio *bio)
 
 	err = -EOPNOTSUPP;
 end_io:
 	bio->bi_error = err;
-	bio_endio(bio);
+	__bio_endio(bio);
 	return false;
 }
 
diff --git a/block/blk.h b/block/blk.h
index 041185e..1c9b50a 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -57,6 +57,7 @@ int blk_init_rl(struct request_list *rl, struct request_queue *q,
 		gfp_t gfp_mask);
 void blk_exit_rl(struct request_list *rl);
 void init_request_from_bio(struct request *req, struct bio *bio);
+void __bio_endio(struct bio *bio);
 void blk_rq_bio_prep(struct request_queue *q, struct request *rq,
 		     struct bio *bio);
 void blk_queue_bypass_start(struct request_queue *q);
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 3086da5..e151aef 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -807,7 +807,6 @@ static void dec_pending(struct dm_io *io, int error)
 			queue_io(md, bio);
 		} else {
 			/* done with normal IO or empty flush */
-			trace_block_bio_complete(md->queue, bio, io_error);
 			bio->bi_error = io_error;
 			bio_endio(bio);
 		}
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 36c13e4..17b4e06 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -159,8 +159,6 @@ static void return_io(struct bio_list *return_bi)
 	struct bio *bi;
 
 	while ((bi = bio_list_pop(return_bi)) != NULL) {
 		bi->bi_iter.bi_size = 0;
-		trace_block_bio_complete(bdev_get_queue(bi->bi_bdev),
-					 bi, 0);
 		bio_endio(bi);
 	}
 }
@@ -4902,8 +4900,6 @@ static void raid5_align_endio(struct bio *bi)
 	rdev_dec_pending(rdev, conf->mddev);
 
 	if (!error) {
-		trace_block_bio_complete(bdev_get_queue(raid_bi->bi_bdev),
-					 raid_bi, 0);
 		bio_endio(raid_bi);
 		if (atomic_dec_and_test(&conf->active_aligned_reads))
 			wake_up(&conf->wait_for_quiescent);
@@ -5470,8 +5466,6 @@ static void raid5_make_request(struct mddev *mddev, struct bio * bi)
 		if ( rw == WRITE )
 			md_write_end(mddev);
-		trace_block_bio_complete(bdev_get_queue(bi->bi_bdev),
-					 bi, 0);
Re: [dm-devel] [PATCH 04/11] multipathd: add need_do_map to indicate whether need calling domap() in ev_add_path()
On Tue, Jan 17, 2017 at 03:28:06PM +0800, tang.jun...@zte.com.cn wrote:
> Hello Ben,
>
> Thank you for your patience again.
>
> I'll modify the code according to your suggestion as follows:
>
> 1) add a configuration option in the defaults section:
>        uid_attrs "sd:ID_SERIAL dasd:ID_UID"
>    it would override any other configuration if this
>    field is configured and matched;
>
> 2) in the uevent processing thread, we will assign merge_id
>    according to the label in uevents set by this configuration;
>
> 3) this patch will be brought back:
>    [PATCH 12/12] libmultipath: use existing wwid when
>    wwid already exists in uevent
>
> 4) if this field is not configured, only filtering is done and
>    no merging happens.
>
> Please confirm whether this modification is feasible.

Yes. This is a perfectly reasonable solution. Thanks for doing all the
work on this.

-Ben

> Regards,
> Tang Junhui
>
> From: "Benjamin Marzinski"
> To: tang.jun...@zte.com.cn
> Cc: christophe.varo...@opensvc.com, h...@suse.de, mwi...@suse.com,
>     bart.vanass...@sandisk.com, dm-devel@redhat.com,
>     zhang.ka...@zte.com.cn, tang.wenj...@zte.com.cn
> Date: 2017/01/17 05:38
> Subject: Re: [PATCH 04/11] multipathd: add need_do_map to indicate
>     whether need calling domap() in ev_add_path()
>
> On Thu, Jan 12, 2017 at 01:52:20PM +0800, tang.jun...@zte.com.cn wrote:
>> From: tang.junhui
>>
>> Usually calling domap() in ev_add_path() is needed, but only the last
>> path needs to call domap() when processing merged uevents, to reduce
>> the number of domap() calls and improve efficiency. So add an input
>> parameter need_do_map to indicate whether domap() needs to be called
>> in ev_add_path().
>
> With the addition of checking if the merge_id equals the wwid, you are
> protected against accidentally merging paths that shouldn't be merged,
> which is great. But setting need_do_map for these paths doesn't
> completely make sure that if the wwid and merge_id differ, you will
> always call domap.
>
> A contrived situation where this fails would look like:
>
> add path1, path2, path3
>
> where merge_id equals the wwid for path1 and path2, but there is a
> different wwid for path3. In this case, you would call domap on just
> the multipath device for path3, but since path1 and path2 matched the
> merge_id, they wouldn't force a call to domap.
>
> A less contrived example would be
>
> add path1, path2, path3, path4
>
> where these were devices that were actually pointing at two different
> LUNs, but all set ID_SERIAL the same. path1 and path2 pointed to one
> LUN, while path3 and path4 pointed to another LUN. In this case the
> wwid of path1 and path2 matched the merge_id, while the wwid of path3
> and path4 was different. In this case you would call domap twice, on
> both path3 and path4, but nothing would call domap to create a
> multipath device for path1 and path2.
>
> In general, the code to set need_do_map if the wwid and merge_id don't
> match only works if either none of the devices in a merged set have
> wwids that match the merge_id, or if the last device has a wwid that
> matches the merge_id. If there are devices with wwids that match the
> merge_id, but the last device in the set isn't one of them, then some
> devices will not get a multipath device created for them.
>
> Now, I don't know of any device that works like my above example, so
> your code will likely work fine for all real-world devices. Also,
> fixing this is a pain, as you don't find out until processing the last
> path in a set that things went wrong, and then you would have to go
> back and run the skipped functions on one of the paths you have
> already processed.
>
> The easy way to fix it is to use the other possibility that I
> mentioned before, which is to not have the merge_id, and just use the
> udev-provided wwid, instead of fetching it from pathinfo. Like I
> mentioned, if you do this, you want to make sure that you only try to
> grab the wwid from the udev events for devices with the correct kernel
> name: ID_SERIAL only for "sd.*" devices, and ID_UID only for "dasd.*"
> devices. I also think that this should be configurable.
>
> Otherwise, you can either go through the messy work of calling domap
> correctly when the last device of a set has a wwid that doesn't match
> the merge_id, or we can decide that this won't actually cause problems
> with any known device, and punt fixing it for now. And if it causes
> problems with some future devices, we can deal with it then.
>
> -Ben
>
>> Change-Id: I7f33de49a1d89d91180dcb95cd94e4944ae7ff36
>> Signed-off-by: tang.wenjun
>> ---
>> multipathd/cli_
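The failure mode Ben describes can be made concrete with a small simulation.
This is an editorial Python sketch (the function name domap_targets is
invented; this is not multipathd code) of the rule "set need_do_map on the
last path of a merged set, and on any path whose wwid differs from the
merge_id":

```python
def domap_targets(paths, merge_id):
    """paths: list of (name, wwid) in uevent order for one merged set.
    Returns the set of wwids for which domap() would be called."""
    called = set()
    last = len(paths) - 1
    for i, (name, wwid) in enumerate(paths):
        need_do_map = (i == last) or (wwid != merge_id)
        if need_do_map:
            called.add(wwid)
    return called

# Ben's contrived case: path1/path2 carry the merge_id's wwid "A", but
# the *last* path in the set has a different wwid "B"...
bad = domap_targets([("path1", "A"), ("path2", "A"), ("path3", "B")], "A")
# ...so domap() runs only for "B", and no multipath device is ever
# created for wwid "A".
```

When the last path's wwid does match the merge_id, the rule works: the
last-path call to domap() covers the matching paths, and any mismatched
path forces its own call.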
Re: [dm-devel] kernel oops with blk-mq-sched latest
On 01/17/2017 04:47 AM, Jens Axboe wrote:
> On 01/17/2017 12:57 AM, Hannes Reinecke wrote:
>> Hi Jens,
>>
>> I gave your latest patchset from
>>
>>   git.kernel.dk/linux-block blk-mq-sched
>>
>> a try. I see a kernel oops when shutting down:
>>
>> [ 2132.708929] systemd-shutdown[1]: Detaching DM devices.
>> [ 2132.965107] BUG: unable to handle kernel NULL pointer dereference
>> [ 2133.037182] IP: dd_merged_requests+0x6/0x60
>> [oops trace snipped; see the original report]
>>
>> Known issue?
>
> Try with this, looks like we're calling the old bypass path for the
> new path, which is now triggering because q->elevator is true.

Should have grepped, there's one more path that uses the old bypass path
for q->mq_ops, potentially. This one handles that one, too.

diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index 8ba0af780e88..2630f64bed19 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -1223,7 +1223,11 @@ int blkcg_activate_policy(struct request_queue *q,
 	if (blkcg_policy_enabled(q, pol))
 		return 0;
 
-	blk_queue_bypass_start(q);
+	if (q->mq_ops) {
+		blk_mq_freeze_queue(q);
+		blk_mq_quiesce_queue(q);
+	} else
+		blk_queue_bypass_start(q);
 pd_prealloc:
 	if (!pd_prealloc) {
 		pd_prealloc = pol->pd_alloc_fn(GFP_KERNEL, q->node);
@@ -1261,7 +1265,10 @@ int blkcg_activate_policy(struct request_queue *q,
 	spin_unlock_irq(q->queue_lock);
 out_bypass_end:
-	blk_queue_bypass_end(q);
+	if (q->mq_ops)
+		blk_mq_unfreeze_queue(q);
+	else
+		blk_queue_bypass_end(q);
 	if (pd_prealloc)
 		pol->pd_free_fn(pd_prealloc);
 	return ret;
@@ -1284,7 +1291,12 @@ void blkcg_deactivate_policy(struct request_queue *q,
 	if (!blkcg_policy_enabled(q, pol))
 		return;
 
-	blk_queue_bypass_start(q);
+	if (q->mq_ops) {
+		blk_mq_freeze_queue(q);
+		blk_mq_quiesce_queue(q);
+	} else
+		blk_queue_bypass_start(q);
+
 	spin_lock_irq(q->queue_lock);
 
 	__clear_bit(pol->plid, q->blkcg_pols);
@@ -1304,7 +1316,11 @@ void blkcg_deactivate_policy(struct request_queue *q,
 	}
 
 	spin_unlock_irq(q->queue_lock);
-	blk_queue_bypass_end(q);
+
+	if (q->mq_ops)
+		blk_mq_unfreeze_queue(q);
+	else
+		blk_queue_bypass_end(q);
 }
 EXPORT_SYMBOL_GPL(blkcg_deactivate_policy);
diff --git a/block/elevator.c b/block/elevator.c
index d14cb87e6564..464372840774 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -613,6 +613,9 @@ void elv_drain_elevator(struct request_queue *q)
 {
 	static int printed;
 
+	if (WARN_ON_ONCE(q->mq_ops))
+		return;
+
 	lockdep_assert_held(q->queue_lock);
 
 	while (q->elevator->type->ops.sq.elevator_dis
Re: [dm-devel] kernel oops with blk-mq-sched latest
On 01/17/2017 12:57 AM, Hannes Reinecke wrote:
> Hi Jens,
>
> I gave your latest patchset from
>
>   git.kernel.dk/linux-block blk-mq-sched
>
> a try. I see a kernel oops when shutting down:
>
> [ 2132.708929] systemd-shutdown[1]: Detaching DM devices.
> [ 2132.965107] BUG: unable to handle kernel NULL pointer dereference
> [ 2133.037182] IP: dd_merged_requests+0x6/0x60
> [oops trace snipped; see the original report]
>
> Known issue?

Try with this, looks like we're calling the old bypass path for the new
path, which is now triggering because q->elevator is true.

diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index 8ba0af780e88..1dd14c5b637a 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -1284,7 +1284,12 @@ void blkcg_deactivate_policy(struct request_queue *q,
 	if (!blkcg_policy_enabled(q, pol))
 		return;
 
-	blk_queue_bypass_start(q);
+	if (q->mq_ops) {
+		blk_mq_freeze_queue(q);
+		blk_mq_quiesce_queue(q);
+	} else
+		blk_queue_bypass_start(q);
+
 	spin_lock_irq(q->queue_lock);
 
 	__clear_bit(pol->plid, q->blkcg_pols);
@@ -1304,7 +1309,11 @@ void blkcg_deactivate_policy(struct request_queue *q,
 	}
 
 	spin_unlock_irq(q->queue_lock);
-	blk_queue_bypass_end(q);
+
+	if (q->mq_ops)
+		blk_mq_unfreeze_queue(q);
+	else
+		blk_queue_bypass_end(q);
 }
 EXPORT_SYMBOL_GPL(blkcg_deactivate_policy);
diff --git a/block/elevator.c b/block/elevator.c
index d14cb87e6564..464372840774 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -613,6 +613,9 @@ void elv_drain_elevator(struct request_queue *q)
 {
 	static int printed;
 
+	if (WARN_ON_ONCE(q->mq_ops))
+		return;
+
 	lockdep_assert_held(q->queue_lock);
 
 	while (q->elevator->type->ops.sq.elevator_dispatch_fn(q, 1))

-- 
Jens Axboe
Re: [dm-devel] [RFC PATCH 6/6] dm-crypt: Add bulk crypto processing support
Hi Binoy,

2017-01-16 9:37 GMT+01:00 Binoy Jayan :
> The initial goal of our proposal was to process the encryption
> requests with the maximum possible block sizes with hardware which has
> automated IV generation capabilities. But when it is done in software,
> and if the bulk requests are processed sequentially, one block at a
> time, the memory footprint could be reduced even if the bulk request
> exceeds a page. While your patch looks good, there are a couple of
> drawbacks, one of which is that the maximum size of a bulk request is
> a page. This could limit the capability of the crypto hardware. If the
> whole bio is processed at once, which is what Qualcomm's version of
> dm-req-crypt does, it achieves an even better performance.

I see... well, I added the limit only so that the async fallback
implementation can allocate multiple requests, so they can be processed
in parallel, as they would be in the current dm-crypt code. I'm not
really sure if that brings any benefit, but I guess if some HW
accelerator has multiple engines, then this allows distributing the work
among them. (I wonder how switching to the crypto API's IV generation
will affect the situation for drivers that can process requests in
parallel, but do not support the IV generators...)

I could remove the limit and switch the fallback to sequential
processing (or maybe even allocate the requests from a mempool, the way
dm-crypt does it now...), but after Herbert's feedback I'm probably
going to scrap this patchset anyway...

>> Note that if the 'keycount' parameter of the cipher specification is
>> set to a value other than 1, dm-crypt still sends only one sector in
>> each request, since in such case the neighboring sectors are
>> encrypted with different keys.
>
> This could be avoided if the key management is done at the crypto
> layer.

Yes, but remember that the only reasonable use-case for using keycount
!= 1 is mounting loop-AES partitions (which is kind of a legacy format,
so there is not much point in making HW drivers for it). It is an
unfortunate consequence of Milan's decision to make keycount an
independent part of the cipher specification (instead of making it
specific to the LMK mode) that all the other IV modes are now 'polluted'
with the requirement to support it.

I discussed with Milan the possibility of deprecating the keycount
parameter (i.e. allowing only a value of 64 for LMK and 1 for all the
other IV modes) and then converting the IV modes to skciphers (or IV
generators, or some combination of both). This would significantly
simplify the key management and allow for better optimization
strategies. However, I don't know if such a change would be accepted by
the device-mapper maintainers, since it may break someone's unusual
dm-crypt configuration...

Cheers,
Ondrej
Re: [dm-devel] [RFC PATCH 0/6] Add bulk skcipher requests to crypto API and dm-crypt
2017-01-13 15:29 GMT+01:00 Herbert Xu :
> What if the driver had hardware support for generating these IVs?
> With your scheme this cannot be supported at all.

That's true... I'm starting to think that this isn't really a good
idea. I was mainly trying to keep the door open for the random IV
support and also to keep the multi-key stuff (which was really only
intended for loop-AES partition support) out of the crypto API, but
both of these can probably be solved in a better way...

> Getting the IVs back is not actually that hard. We could simply
> change the algorithm definition for the IV generator so that the IVs
> are embedded in the plaintext and ciphertext. For example, you could
> declare it so that for n sectors the first n*ivsize bytes would be
> the IV, and the actual plaintext or ciphertext would follow.
>
> With such a definition you could either generate the IVs in dm-crypt
> or have them generated in the IV generator.

That seems kind of hacky to me... but if that's what you prefer, then
so be it.

Cheers,
Ondrej

> Cheers,
> -- 
> Email: Herbert Xu
> Home Page: http://gondor.apana.org.au/~herbert/
> PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
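The buffer layout Herbert describes — for n sectors, the first n*ivsize
bytes carry the IVs and the payload follows — can be sketched as a simple
pack/unpack pair. This is an editorial Python model of the format only, not
the kernel implementation; the function names are invented here:

```python
def pack_bulk_request(ivs, data):
    """Prepend the per-sector IVs to the payload: for n sectors the
    first n*ivsize bytes are the IVs, plaintext/ciphertext follows."""
    return b"".join(ivs) + data

def split_bulk_request(buf, nsectors, ivsize):
    """Recover the per-sector IVs and the payload from a packed buffer."""
    ivs = [buf[i * ivsize:(i + 1) * ivsize] for i in range(nsectors)]
    return ivs, buf[nsectors * ivsize:]

# Two 512-byte sectors with 16-byte sector-number IVs (plain64-style).
ivs = [(0).to_bytes(16, "little"), (1).to_bytes(16, "little")]
payload = bytes(1024)
buf = pack_bulk_request(ivs, payload)
```

With this convention, either side of the interface can fill in the IV
region: dm-crypt before submission, or the IV generator (possibly in
hardware) during processing.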
Re: [dm-devel] deterministic io throughput in multipath
Hi Ben, Thanks for the review. In dict.c I will make sure I will make generic functions which will be used by both delay_checks and err_checks. We want to increment the path failures every time the path goes down regardless of whether multipathd or the kernel noticed the failure of paths. Thanks for pointing this. I will completely agree with the idea which you mentioned below by reconsidering the san_path_err_threshold_window with san_path_err_forget_rate. This will avoid counting time when the path was down as time where the path wasn't having problems. I will incorporate all the changes mentioned below and will resend the patch once the testing is done. Regards, Muneendra. -Original Message- From: Benjamin Marzinski [mailto:bmarz...@redhat.com] Sent: Tuesday, January 17, 2017 6:35 AM To: Muneendra Kumar M Cc: dm-devel@redhat.com Subject: Re: [dm-devel] deterministic io throughput in multipath On Mon, Jan 16, 2017 at 11:19:19AM +, Muneendra Kumar M wrote: >Hi Ben, >After the below discussion we came with the approach which will meet our >requirement. >I have attached the patch which is working good in our field tests. >Could you please review the attached patch and provide us your valuable >comments . I can see a number of issues with this patch. First, some nit-picks: - I assume "dis_reinstante_time" should be "dis_reinstate_time" - The indenting in check_path_validity_err is wrong, which made it confusing until I noticed that if (clock_gettime(CLOCK_MONOTONIC, &start_time) != 0) doesn't have an open brace, and shouldn't indent the rest of the function. - You call clock_gettime in check_path, but never use the result. - In dict.c, instead of writing your own functions that are the same as the *_delay_checks functions, you could make those functions generic and use them for both. 
To go match the other generic function names they would probably be something like set_off_int_undef print_off_int_undef You would also need to change DELAY_CHECKS_* and ERR_CHECKS_* to point to some common enum that you created, the way user_friendly_names_states (to name one of many) does. The generic enum used by *_off_int_undef would be something like. enum no_undef { NU_NO = -1, NU_UNDEF = 0, } The idea is to try to cut down on the number of functions that are simply copy-pasting other functions in dict.c. Those are all minor cleanup issues, but there are some bigger problems. Instead of checking if san_path_err_threshold, san_path_err_threshold_window, and san_path_err_recovery_time are greater than zero seperately, you should probably check them all at the start of check_path_validity_err, and return 0 unless they all are set. Right now, if a user sets san_path_err_threshold and san_path_err_threshold_window but not san_path_err_recovery_time, their path will never recover after it hits the error threshold. I pretty sure that you don't mean to permanently disable the paths. time_t is a signed type, which means that if you get the clock time in update_multpath and then fail to get the clock time in check_path_validity_err, this check: start_time.tv_sec - pp->failure_start_time) < pp->mpp->san_path_err_threshold_window will always be true. I realize that clock_gettime is very unlikely to fail. But if it does, probably the safest thing to so is to just immediately return 0 in check_path_validity_err. The way you set path_failures in update_multipath may not get you what you want. It will only count path failures found by the kernel, and not the path checker. If the check_path finds the error, pp->state will be set to PATH_DOWN before pp->dmstate is set to PSTATE_FAILED. That means you will not increment path_failures. 
Perhaps this is what you want, but I would assume that you would want to count
every time the path goes down, regardless of whether multipathd or the kernel
noticed it.

I'm not super enthusiastic about how san_path_err_threshold_window works.
First, it starts counting from when the path goes down, so if the path takes
long enough to get restored and then fails immediately, it can just keep
failing and never hit san_path_err_threshold within the window, since it
spends so much of that time with the path failed. Also, the window gets set on
the first error, and is never reset until the number of errors is over the
threshold. This means that if you get one early error and then a bunch of
errors much later, you will go for (2 x san_path_err_threshold) - 1 errors
until you stop reinstating the path, because of the window reset in the middle
of the string of errors.

It seems like a better idea would be to have check_path_validity_err reset
path_failures as soon as it notices that you are past
san_path_err_threshold_window, instead of waiting until the number of errors
hits san_path_err_threshold.

If I were going to design this, I think I would have san_path_err_th
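The two fixes suggested in the review above — bail out of
check_path_validity_err unless all three tunables are configured, and reset
the failure count as soon as the window has passed — could be sketched like
this (the struct and function names are illustrative, not the actual
multipathd code; only the san_path_err_* option names come from the patch
under review):

```c
#include <time.h>

/* Hypothetical grouping of the three tunables from the patch. */
struct path_err_cfg {
	int san_path_err_threshold;
	int san_path_err_threshold_window;
	int san_path_err_recovery_time;
};

/* Only run the error-tracking logic when *all three* options are set,
 * so a partially configured path can never be disabled without a
 * recovery path back. check_path_validity_err would return 0 (path
 * valid) when this returns 0 -- and likewise when clock_gettime fails,
 * since signed time_t math against a never-set timestamp would
 * otherwise always look "inside the window". */
static int err_checking_enabled(const struct path_err_cfg *cfg)
{
	return cfg->san_path_err_threshold > 0 &&
	       cfg->san_path_err_threshold_window > 0 &&
	       cfg->san_path_err_recovery_time > 0;
}

/* Reset path_failures as soon as the window has elapsed, rather than
 * waiting for the count to reach the threshold. */
static int window_expired(time_t now, time_t start, int window)
{
	return (now - start) >= window;
}
```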
[dm-devel] kernel oops with blk-mq-sched latest
Hi Jens,

I gave your latest patchset from git.kernel.dk/linux-block blk-mq-sched a try,
and I see a kernel oops when shutting down:

[ 2132.708929] systemd-shutdown[1]: Detaching DM devices.
[ 2132.965107] BUG: unable to handle kernel NULL pointer dereference at 0001
[ 2133.037182] IP: dd_merged_requests+0x6/0x60
[ 2133.077816] PGD 0
[ 2133.077818]
[ 2133.113087] Oops: [#1] SMP
[ list of modules removed ]
[ 2133.925265] CPU: 20 PID: 1 Comm: systemd-shutdow Not tainted 4.10.0-rc4+ #543
[ 2133.990034] Hardware name: HP ProLiant DL380p Gen8, BIOS P70 09/18/2013
[ 2134.106915] task: 88042d614040 task.stack: c9000315
[ 2134.106915] RIP: 0010:dd_merged_requests+0x6/0x60
[ 2134.150593] RSP: 0018:c90003153b18 EFLAGS: 00010002
[ 2134.198740] RAX: 81cc6de0 RBX: 8804296d5040 RCX: 0001
[ 2134.262708] RDX: 0001 RSI: 0001 RDI: 8804296d5040
[ 2134.326987] RBP: c90003153b30 R08: R09:
[ 2134.391054] R10: R11: 0001f8180001f815 R12:
[ 2134.456095] R13: 8804296d5040 R14: 8804099801f0 R15: 0004
[ 2134.521196] FS: 7fd64d3bf840() GS:88042f90() knlGS:
[ 2134.595178] CS: 0010 DS: ES: CR0: 80050033
[ 2134.648637] CR2: 0001 CR3: 00081b892000 CR4: 000406e0
[ 2134.713349] Call Trace:
[ 2134.737168]  ? elv_drain_elevator+0x29/0xa0
[ 2134.775821]  __blk_drain_queue+0x52/0x1a0
[ 2134.812473]  blk_queue_bypass_start+0x6e/0xa0
[ 2134.854009]  blkcg_deactivate_policy+0x30/0xf0
[ 2134.894993]  blk_throtl_exit+0x34/0x50
[ 2134.929450]  blkcg_exit_queue+0x35/0x40
[ 2134.965089]  blk_release_queue+0x33/0xe0
[ 2135.001364]  kobject_cleanup+0x63/0x170
[ 2135.037412]  kobject_put+0x25/0x50
[ 2135.068622]  blk_cleanup_queue+0x198/0x260
[ 2135.107780]  cleanup_mapped_device+0xb5/0xf0 [dm_mod]
[ 2135.154741]  __dm_destroy+0x1a5/0x290 [dm_mod]
[ 2135.195009]  dm_destroy+0x13/0x20 [dm_mod]
[ 2135.232149]  dev_remove+0xde/0x120 [dm_mod]
[ 2135.270184]  ? dev_suspend+0x210/0x210 [dm_mod]
[ 2135.311478]  ctl_ioctl+0x20b/0x510 [dm_mod]
[ 2135.349680]  ? terminate_walk+0xc3/0x140
[ 2135.385442]  ? kmem_cache_free+0x10/0x260
[ 2135.422315]  dm_ctl_ioctl+0x13/0x20 [dm_mod]

Known issue?

Cheers,

Hannes
--
Dr. Hannes Reinecke		   Teamlead Storage & Networking
h...@suse.de			   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel