Re: [dm-devel] [PATCH 7/7] dm-mpath: Fix a race condition in __multipath_map()
On Tue, Nov 22 2016 at 7:48pm -0500, Mike Snitzer wrote:

> But regardless, what certainly needs fixing is the inconsistency of
> inheriting DM_TYPE_MQ_REQUEST_BASED but not setting all_blk_mq to true
> (all_blk_mq == true is implied by DM_TYPE_MQ_REQUEST_BASED).
>
> I'm now questioning why we even need the 'all_blk_mq' state within the
> table.  I'll revisit the "why?" on all that in my previous commits.
> I'll likely get back with you on this point tomorrow.  And will
> hopefully have a fix for you.

We need 'all_blk_mq' because DM_TYPE_* is only used to establish the DM
device's type of request_queue.  It doesn't say anything about the DM
device's underlying devices.

Anyway, this _untested_ patch should hopefully resolve the 'all_blk_mq'
inconsistency you saw:

diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
index 8b013ea..8ce81d0 100644
--- a/drivers/md/dm-table.c
+++ b/drivers/md/dm-table.c
@@ -924,12 +924,6 @@ static int dm_table_determine_type(struct dm_table *t)
 	BUG_ON(!request_based); /* No targets in this table */
 
-	if (list_empty(devices) && __table_type_request_based(live_md_type)) {
-		/* inherit live MD type */
-		t->type = live_md_type;
-		return 0;
-	}
-
 	/*
 	 * The only way to establish DM_TYPE_MQ_REQUEST_BASED is by
 	 * having a compatible target use dm_table_set_type.
@@ -948,6 +942,19 @@ static int dm_table_determine_type(struct dm_table *t)
 		return -EINVAL;
 	}
 
+	if (list_empty(devices)) {
+		int srcu_idx;
+		struct dm_table *live_table = dm_get_live_table(t->md, &srcu_idx);
+
+		/* inherit live table's type and all_blk_mq */
+		if (live_table) {
+			t->type = live_table->type;
+			t->all_blk_mq = live_table->all_blk_mq;
+		}
+		dm_put_live_table(t->md, srcu_idx);
+		return 0;
+	}
+
 	/* Non-request-stackable devices can't be used for request-based dm */
 	list_for_each_entry(dd, devices, list) {
 		struct request_queue *q = bdev_get_queue(dd->dm_dev->bdev);

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
Re: [dm-devel] Improve processing efficiency for addition and deletion of multipath devices
So, now we have at least 4 ways to improve multipath efficiency:

1) Filtering uevents;
2) Merging uevents;
3) Using separate locks for mpvec and pathvec;
4) Getting rid of the gazillion waiter threads.

This is exciting, but how do we achieve this blueprint? Can we set up some
working groups and develop the improvements in parallel, since most of
them are independent?

From: "Benjamin Marzinski"
To: Martin Wilck
Cc: dm-devel@redhat.com, tang.jun...@zte.com.cn
Date: 2016/11/19 06:33
Subject: Re: [dm-devel] Improve processing efficiency for addition and deletion of multipath devices
Sent by: dm-devel-boun...@redhat.com

My $.02

I'm not sure that we want to wait for a predetermined time to grab the
uevents. If they are coming in slowly, multipathd is more responsive by
processing them immediately, and if they are coming in a storm, they will
naturally pile up faster than we deal with them. I also don't really like
the idea of waiting when we get an event to see if another one comes
along, for the same reasons. That being said, I wouldn't NACK a config
option (turned off by default) that set a minimum wait time between uevent
dispatches, because I might be wrong here.

However, it seems like what uevent_dispatch() already does is the right
thing, which is to pull all outstanding uevents off the uevent queue into
a new list. The issue is that we simply need to work on processing the
whole list at a time.

Martin's filtering definitely has a place here. He is correct that if we
get a delete uevent for a device, we don't need to process any of the
earlier ones. We do need to look at both the add and change uevents
individually. For instance, one of the change uevents out of a group could
be for changing the read-only status of a device. If we just took the most
recent, we would miss that information. The good news is that we don't
need to actually call uev_add_path and uev_update_path once for each of
these events.
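The filtering rule Ben describes (a delete uevent makes every earlier queued event for the same device moot, while add/change events still need individual inspection) can be sketched in plain C. Everything below — the `struct uevent` layout, field names, and `filter_uevents()` — is an illustrative stand-in, not multipathd's actual data structures:

```c
#include <string.h>

/* Toy uevent record; real multipathd carries full udev environments. */
struct uevent {
	char action[8];   /* "add", "change", "delete" */
	char kname[16];   /* kernel device name, e.g. "sdb" */
	int discarded;
};

/*
 * Walk the queued batch from newest to oldest: once a "delete" is seen
 * for a device, every earlier event for that same device is discarded.
 * Surviving add/change events are left alone so their individual
 * payloads (e.g. a read-only status change) can still be examined.
 */
static void filter_uevents(struct uevent *evs, int n)
{
	for (int i = n - 1; i >= 0; i--) {
		if (evs[i].discarded || strcmp(evs[i].action, "delete"))
			continue;
		for (int j = i - 1; j >= 0; j--)
			if (!strcmp(evs[j].kname, evs[i].kname))
				evs[j].discarded = 1;
	}
}
```

Under this scheme a queue of "add sdb, change sdb, delete sdb, add sdc" collapses to just the delete for sdb and the add for sdc.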
All we need to do is read through the uevent environment variables for all
of them to find the information we need (like a change in ro status), and
then call the function once with that information.

Hannes pointed out that for adding paths we don't need to do any locking
until we want to add the path to the pathvec. We could grab all the
outstanding add events from the list that gets passed to service_uevq,
collect the pathinfo for all the paths, add them all into the pathvec, and
then create or update the multipath devices. delete uevents could work
similarly, where we find all the paths for a multipath device in our list,
remove them all, and reload the device once.

I'm not sure how much a separate thread for doing the multipath work would
buy us, however. It's true that we could have a separate lock for the
mpvec, so we didn't need to hold the pathvec lock while updating the
mpvec, but actually updating the mpvec only takes a moment. The
time-consuming part is building the multipath device and passing it to the
kernel. For this, we do need the paths locked. Otherwise, what would keep
a path from getting removed while the multipath device was using it to get
set up? If working with multipath devices and processing path uevents were
happening at the same time, this is a very real possibility. But even if
we keep one thread processing the uevents, simply avoiding calling
uev_add_path and uev_update_path for repeated add and change events, and
batching multiple add and delete events together, could provide a real
speedup, IMHO.

Now, the holy grail of multipathd efficiency would be to totally rework
multipathd's locking into something sensible.
We could have locks for the vectors that only got held when you were
actually traversing or manipulating the vectors, coupled with reference
counts on the individual paths and multipaths so that they didn't get
removed while they were in use, and probably locks on the individual paths
and multipaths for just the sections that really need to be protected. The
problem is that this would take a huge amount of code auditing to get
mostly right, and then a while to flush out all the missed corner cases.
At least I think it would. But I dismissed decoupling the config structure
from the vecs lock as too painful, and clearly that was possible. At any
rate, I'd rather get rid of the gazillion waiter threads first.

-Ben

On Thu, Nov 17, 2016 at 11:48:32AM +0100, Martin Wilck wrote:
> Hi Tang,
>
> > As to processing several uevents for the same physical device, I
> > think the difference of opinion between us is "FILTER" vs. "MERGE".
> > Personally, I think merging is more accurate. For example, say we
> > receive 4 path-addition uevent messages from the same physical
> > device:
> > 1) uevent add sdb
> > 2) uevent add sdc
> > 3) uevent add sdd
> > 4) uevent add sde
> >
> > We cannot just filter the 1) 2) 3) uevent messages and only process
> > the 4) uevent message,
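The refcounting half of the locking rework Ben sketches above — holders take a reference so a path can't be freed out from under them — is a standard pattern; a minimal toy version (not multipathd code; `struct refpath` and the `freed` flag are purely illustrative, the flag standing in for an actual free()) looks like this:

```c
/* Toy refcounted object: the last path_put() "frees" it. */
struct refpath {
	int refcount;
	int freed;   /* demonstration stand-in for actually freeing */
};

static struct refpath *path_get(struct refpath *pp)
{
	pp->refcount++;
	return pp;
}

static void path_put(struct refpath *pp)
{
	if (--pp->refcount == 0)
		pp->freed = 1;   /* real code would free(pp) here */
}
```

A thread traversing the pathvec would path_get() each entry it hands out, and the remove path would drop only its own reference, so teardown waits until the last user calls path_put(). The auditing cost Ben mentions is exactly in finding every place such a get/put pair must go.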
Re: [dm-devel] [PATCH 7/7] dm-mpath: Fix a race condition in __multipath_map()
On Tue, Nov 22 2016 at 6:47pm -0500, Bart Van Assche wrote:

> On 11/21/2016 04:34 PM, Mike Snitzer wrote:
> > But your WARN_ON_ONCE(clone && q->mq_ops) will trigger with sq-on-mq.
>
> Hello Mike,
>
> This behavior is indeed triggered by the sq-on-mq test. After having
> added the following code in __bind():
>
> 	if (old_map &&
> 	    dm_table_all_blk_mq_devices(old_map) !=
> 	    dm_table_all_blk_mq_devices(t))
> 		pr_debug("%s: old all_blk_mq %d <> new all_blk_mq %d\n",
> 			 dm_device_name(md),
> 			 dm_table_all_blk_mq_devices(old_map),
> 			 dm_table_all_blk_mq_devices(t));
>
> I see the following output appear frequently in the kernel log:
>
> dm_mod:__bind: 254:0: old all_blk_mq 1 <> new all_blk_mq 0
>
> Could these all_blk_mq state changes explain that WARN_ON_ONCE(clone
> && q->mq_ops) is triggered in __multipath_map()? Does this mean that
> the comment in patch http://marc.info/?l=dm-devel&m=147925314306752
> is correct?

Yes, looks like you're on to something.

dm_old_prep_tio() has:

	/*
	 * Must clone a request if this .request_fn DM device
	 * is stacked on .request_fn device(s).
	 */
	if (!dm_table_all_blk_mq_devices(table)) {
	...

So if your table->all_blk_mq is false (likely due to a temporary
no-paths-in-multipath-table scenario) a clone will be passed to
__multipath_map().

But what isn't clear is how __multipath_map() would then go on to have
any underlying paths available to issue IO to (given the table would
have been empty -- or so I would think). Would be great if you could
verify that the table is in fact empty.

It should be noted that dm_table_determine_type() has:

	if (list_empty(devices) && __table_type_request_based(live_md_type)) {
		/* inherit live MD type */
		t->type = live_md_type;
		return 0;
	}

So this explains how/why an empty table will inherit
DM_TYPE_MQ_REQUEST_BASED, and it also explains why the new (empty) table
would not have ->all_blk_mq set to true. But again, no IO is able to be
issued when there are no underlying paths.
And looking closer, __multipath_map() _should_ return early with either
DM_MAPIO_DELAY_REQUEUE or DM_MAPIO_REQUEUE when no paths are available.
I'm not seeing how you could have this scenario where you prepared a
clone (as if the clone request were to be issued to a .request_fn, aka
"sq", device) and then, by the time you get into __multipath_map(), there
is an underlying path available with q->mq_ops.

But regardless, what certainly needs fixing is the inconsistency of
inheriting DM_TYPE_MQ_REQUEST_BASED but not setting all_blk_mq to true
(all_blk_mq == true is implied by DM_TYPE_MQ_REQUEST_BASED).

I'm now questioning why we even need the 'all_blk_mq' state within the
table. I'll revisit the "why?" on all that in my previous commits. I'll
likely get back with you on this point tomorrow. And will hopefully have
a fix for you.

FYI: given all this, I do think something like your 7/7 patch (which you
referenced with the above url) would be a possible added safety net to
guard against future inconsistencies/bugs/regressions.
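The early-return behavior referred to above can be shown with a small userspace sketch. This is not the kernel function: the DM_MAPIO_* values, `struct pgpath_sketch`, and the `queue_if_no_path` flag here are simplified stand-ins (the real constants live in the kernel's device-mapper headers and the real __multipath_map() consults the path selector and map context):

```c
/* Illustrative return codes; actual values are defined by the kernel. */
#define DM_MAPIO_REMAPPED      1
#define DM_MAPIO_REQUEUE       2
#define DM_MAPIO_DELAY_REQUEUE 3

struct pgpath_sketch { int usable; };

/*
 * Sketch of the no-path early return: with no usable path the request
 * is requeued (delayed if queue_if_no_path is set) instead of being
 * dispatched, so a prepared clone should never reach an mq path.
 */
static int multipath_map_sketch(struct pgpath_sketch *path,
				int queue_if_no_path)
{
	if (!path || !path->usable)
		return queue_if_no_path ? DM_MAPIO_DELAY_REQUEUE
					: DM_MAPIO_REQUEUE;
	return DM_MAPIO_REMAPPED;   /* would dispatch to the chosen path */
}
```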
[dm-devel] [PATCH] multipath-tools: identify DataCore SANsymphony in hwtable
Cc: Christophe Varoqui
Cc: device-mapper development
Signed-off-by: Xose Vazquez Perez
---
 libmultipath/hwtable.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/libmultipath/hwtable.c b/libmultipath/hwtable.c
index eb7a5a0..c0bfa75 100644
--- a/libmultipath/hwtable.c
+++ b/libmultipath/hwtable.c
@@ -884,6 +884,7 @@ static struct hwentry default_hw[] = {
 		.prio_name = PRIO_ALUA,
 	},
 	{
+		/* SANsymphony */
 		.vendor    = "DataCore",
 		.product   = "Virtual Disk",
 		.pgpolicy  = GROUP_BY_PRIO,
--
2.10.2
Re: [dm-devel] [PATCH 00/12] block: cleanup direct access to bvec table
On 11/11/2016 05:05 AM, Ming Lei wrote:
> Hi,
>
> This patchset cleans up direct access to the bvec table.
>
> The 1st patch passes the bvec table to bio_init(), so that direct
> access to the bvec table during bio initialization is avoided.
>
> Of the other patches, most use bio_add_page() to replace hardcoded
> adding of pages to the bvec, and the rest avoid accessing
> bio->bi_vcnt. The most special one uses the bvec iterator helpers to
> implement .get_page/.next_page for dm-io.c.
>
> One big motivation is to prepare for supporting multipage bvecs, but
> this patchset is a good cleanup too, even apart from that purpose.

I've added the series, except 6-8, the dm parts. I updated some of the
descriptions/subjects to read a little better.

--
Jens Axboe
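The cleanup pattern the cover letter describes — callers going through a helper such as bio_add_page() rather than writing to the bvec table and bi_vcnt fields directly — can be illustrated with a toy userspace analogue. The types and sizes below are simplified stand-ins, not the kernel's structures:

```c
/* Toy bio: a fixed-size segment table plus a count. */
struct bvec_sketch { void *page; unsigned len; };
struct bio_sketch {
	struct bvec_sketch table[8];
	int vcnt;
};

/*
 * Helper-based insertion: callers never touch table[] or vcnt
 * themselves, which is what lets the internal representation change
 * (e.g. to multipage bvecs) without breaking every caller.
 * Mirrors bio_add_page()'s convention of returning the number of
 * bytes added, or 0 on failure.
 */
static unsigned bio_add_page_sketch(struct bio_sketch *bio,
				    void *page, unsigned len)
{
	if (bio->vcnt >= 8)
		return 0;
	bio->table[bio->vcnt].page = page;
	bio->table[bio->vcnt].len = len;
	bio->vcnt++;
	return len;
}
```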
Re: [dm-devel] [PATCH] New device mapper target ZDM
Well, to start with some device-mapper basics, look at the newer files
(e.g. thin-provisioning.txt, cache.txt) in the Documentation directory to
see what format to use and how to document the interface: constructor,
status, examples using dmsetup only, etc. (BTW, we don't use equals signs
in our interfaces.)

But even before any of that, I'm puzzled by the basics of how this can
have been fitted into a device-mapper interface when the "sine qua non"
resume function is still TODO, 'dmsetup table' appears unimplemented,
'status' doesn't appear to provide the internal status, and 'load' has
some keywords that also don't appear to fit.

Look at how the existing dm targets divide up the work between each of
the functions and model your interface on those.

Alasdair
Re: [dm-devel] [PATCH] New device mapper target ZDM
On 11/21/16 11:23, Shaun Tancheff wrote:
> +/* ------------------------------------------------------------------ */
> +/*                     ProcFS Support Routines                         */
> +/* ------------------------------------------------------------------ */
> +
> +#if defined(CONFIG_PROC_FS)
> +
> +/**
> + * struct zone_info_entry - Proc zone entry.
> + * @zone: Zone Index
> + * @info: Info (WP/Used).
> + */
> +struct zone_info_entry {
> +	u32 zone;
> +	u32 info;
> +};
> +
> +/**
> + * Startup writing to our proc entry
> + */
> +static void *proc_wp_start(struct seq_file *seqf, loff_t *pos)
> +{
> +	struct zdm *znd = seqf->private;
> +
> +	if (*pos == 0)
> +		znd->wp_proc_at = *pos;
> +	return &znd->wp_proc_at;
> +}

Hello Shaun,

Does this mean that you are not aware that it is considered unacceptable
for new drivers to create procfs entries?

Bart.
Re: [dm-devel] [PATCH 07/12] dm: use bvec iterator helpers to implement .get_page and .next_page
On Mon, Nov 21, 2016 at 10:49 PM, Mike Snitzer wrote:
> On Fri, Nov 11 2016 at 7:05am -0500, Ming Lei wrote:
>
>> Firstly, we have mature bvec/bio iterator helpers for iterating over
>> each page in one bio; it isn't necessary to reinvent a wheel to do
>> that.
>>
>> Secondly, the coming multipage bvecs require this patch.
>>
>> Also add comments about the direct access to the bvec table.
>>
>> Signed-off-by: Ming Lei
>
> I've staged this for 4.10

Thanks Mike!