Re: [dm-devel] [PATCH 7/7] dm-mpath: Fix a race condition in __multipath_map()

2016-11-22 Thread Mike Snitzer
On Tue, Nov 22 2016 at  7:48pm -0500,
Mike Snitzer  wrote:
 
> But regardless, what certainly needs fixing is the inconsistency of
> inheriting DM_TYPE_MQ_REQUEST_BASED but not setting all_blk_mq to true
> (all_blk_mq == true is implied by DM_TYPE_MQ_REQUEST_BASED).
> 
> I'm now questioning why we even need the 'all_blk_mq' state within the
> table.  I'll revisit the "why?" on all that in my previous commits.
> I'll likely get back with you on this point tomorrow.  And will
> hopefully have a fix for you.

We need 'all_blk_mq' because DM_TYPE_* is only used to establish the type
of the DM device's request_queue.  It doesn't say anything about the DM
device's underlying devices.

Anyway, this _untested_ patch should hopefully resolve the 'all_blk_mq'
inconsistency you saw:

diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
index 8b013ea..8ce81d0 100644
--- a/drivers/md/dm-table.c
+++ b/drivers/md/dm-table.c
@@ -924,12 +924,6 @@ static int dm_table_determine_type(struct dm_table *t)
 
BUG_ON(!request_based); /* No targets in this table */
 
-   if (list_empty(devices) && __table_type_request_based(live_md_type)) {
-   /* inherit live MD type */
-   t->type = live_md_type;
-   return 0;
-   }
-
/*
 * The only way to establish DM_TYPE_MQ_REQUEST_BASED is by
 * having a compatible target use dm_table_set_type.
@@ -948,6 +942,19 @@ static int dm_table_determine_type(struct dm_table *t)
return -EINVAL;
}
 
+   if (list_empty(devices)) {
+   int srcu_idx;
+   struct dm_table *live_table = dm_get_live_table(t->md, &srcu_idx);
+
+   /* inherit live table's type and all_blk_mq */
+   if (live_table) {
+   t->type = live_table->type;
+   t->all_blk_mq = live_table->all_blk_mq;
+   }
+   dm_put_live_table(t->md, srcu_idx);
+   return 0;
+   }
+
/* Non-request-stackable devices can't be used for request-based dm */
list_for_each_entry(dd, devices, list) {
struct request_queue *q = bdev_get_queue(dd->dm_dev->bdev);

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel


Re: [dm-devel] Improve processing efficiency for addition and deletion of multipath devices

2016-11-22 Thread tang . junhui
So, now we have at least 4 ways to improve multipath efficiency:
1)  Filtering uevents;
2)  Merging uevents;
3)  Using separate locks for mpvec and pathvec;
4)  Getting rid of the gazillion waiter threads.

This is exciting, but how do we achieve this blueprint?
Since most of these improvements are independent, can we set up some
working groups and develop them in parallel?



From:    "Benjamin Marzinski" 
To:      Martin Wilck , 
Cc:      dm-devel@redhat.com, tang.jun...@zte.com.cn
Date:    2016/11/19 06:33
Subject: Re: [dm-devel] Improve processing efficiency for addition and 
deletion of multipath devices
Sender:  dm-devel-boun...@redhat.com



My $.02

I'm not sure that we want to wait for a predetermined time to grab the
uevents.  If they are coming in slowly, multipathd is more responsive by
processing them immediately, and if they are coming in a storm, they
will naturally pile up faster than we deal with them. I also don't
really like the idea of waiting when we get an event to see if another
one comes along, for the same reasons. That being said, I wouldn't NACK
a config option (turned off by default) that set a minimum wait time
between uevent dispatches, because I might be wrong here.  However, it
seems like what uevent_dispatch() already does is the right thing, which
is to pull all outstanding uevents off the uevent queue into a new list.
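
As a rough sketch of that pattern -- the names uevq, uevq_lock and
service_one_uevent() are made up for illustration, and the struct uevent
and list helpers are assumed to behave like the kernel-style ones in
libmultipath's list.h:

	LIST_HEAD(work);
	struct uevent *uev, *tmp;

	/* drain everything queued so far in one locked operation */
	pthread_mutex_lock(&uevq_lock);
	list_splice_init(&uevq, &work);
	pthread_mutex_unlock(&uevq_lock);

	/* process the private 'work' list without holding the queue lock */
	list_for_each_entry_safe(uev, tmp, &work, node)
		service_one_uevent(uev);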

The issue is that we simply need to work on processing the whole list at
once.  Martin's filtering definitely has a place here. He is correct
that if we get a delete uevent for a device, we don't need to process
any of the earlier ones.  We do need to look at both the add and change
uevents individually. For instance, one of the change uevents out of a
group could be for changing the read-only status of a device. If we just
took the most recent, we would miss that information.  The good news is
that we don't need to actually call uev_add_path and uev_update_path
once for each of these events.  All we need to do is read through the
uev environment variables for all of them to find the information we
need (like change in ro status), and then call the function once with
that information.
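
Something like this, as a sketch only -- uev_get_env_var() and the
merged_change structure are hypothetical, not existing multipathd APIs:

	struct merged_change {
		int ro;		/* -1 = unchanged, otherwise last 0/1 seen */
	};

	static void merge_change_uevents(struct list_head *group,
					 struct merged_change *m)
	{
		struct uevent *uev;

		m->ro = -1;
		list_for_each_entry(uev, group, node) {
			const char *ro = uev_get_env_var(uev, "DISK_RO");

			if (ro)
				m->ro = atoi(ro);
		}
		/* then make one uev_update_path()-style call, passing m->ro */
	}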

Hannes pointed out that for adding paths we don't need to do any locking
until we want to add the path to the pathvec. We could grab all the
outstanding add events from the list that gets passed to service_uevq,
collect the pathinfo for all of those paths, add them all into the
pathvec, and then create or update the multipath devices. Delete uevents
could work similarly: we find all the paths for a multipath device
in our list, remove them all, and reload the device once.
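
In rough pseudo-C -- alloc_path_for(), gather_pathinfo() and the
batch_node member are stand-ins for the real helpers, and the locking
names only loosely follow multipathd:

	/* phase 1: no vecs lock held */
	list_for_each_entry(uev, &add_list, node) {
		struct path *pp = alloc_path_for(uev);		/* hypothetical */

		if (pp && gather_pathinfo(pp) == 0)		/* hypothetical */
			list_add_tail(&pp->batch_node, &new_paths);
	}

	/* phase 2: take the lock once for the whole batch */
	pthread_mutex_lock(&vecs->lock);
	list_for_each_entry_safe(pp, tmp, &new_paths, batch_node)
		store_path(vecs->pathvec, pp);
	/* create or reload each affected multipath device once */
	pthread_mutex_unlock(&vecs->lock);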

I'm not sure how much a separate thread for doing the multipath work
would buy us, however. It's true that we could have a separate lock for
the mpvec, so we didn't need to hold the pathvec lock while updating the
mpvec, but actually updating the mpvec only takes a moment. The
time-consuming part is building the multipath device and passing it to
the kernel. For this, we do need the paths locked. Otherwise, what would
keep a path from getting removed while the multipath device was getting
set up using it?  If working with multipath devices and processing path
uevents are happening at the same time, this is a very real
possibility.

But even if we keep one thread processing the uevents, simply avoiding
calling uev_add_path and uev_update_path for repeated add and change
events, and batching multiple add and delete events together could
provide a real speedup, IMHO.

Now, the holy grail of multipathd efficiency would be to totally rework
multipathd's locking into something sensible.  We could have locks for
the vectors that only got held when you were actually traversing or
manipulating the vectors, coupled with reference counts on the
individual paths and multipaths, so that they didn't get removed while
they were in use, and probably locks on the individual paths and
multipaths for just the sections that really needed to be protected.
The problem is that this would take a huge amount of code auditing to
get mostly right, and then a while to flush out all the missed corner
cases.  At least I think it would. But I dismissed decoupling the config
structure from the vecs lock as too painful, and clearly that turned out
to be possible.
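
The shape of what I mean, with completely made-up names (nothing here is
existing multipathd code):

	struct path_ref {
		pthread_mutex_t lock;	/* protects refcount only */
		int refcount;
		/* ... per-path state ... */
	};

	static struct path_ref *get_path(struct path_ref *pp)
	{
		pthread_mutex_lock(&pp->lock);
		pp->refcount++;
		pthread_mutex_unlock(&pp->lock);
		return pp;
	}

	static void put_path(struct path_ref *pp)
	{
		int last;

		pthread_mutex_lock(&pp->lock);
		last = (--pp->refcount == 0);
		pthread_mutex_unlock(&pp->lock);
		if (last)
			free_path_ref(pp);	/* hypothetical destructor */
	}

The vector lock would then be held only while walking pathvec/mpvec, and
a looked-up entry would be pinned with get_path() before that lock is
dropped.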

At any rate, I'd rather get rid of the gazillion waiter threads first.

-Ben

On Thu, Nov 17, 2016 at 11:48:32AM +0100, Martin Wilck wrote:
> Hi Tang,
> 
> > As to processing several uevents for the same physical device, I think
> > the difference of opinion between us is "FILTER" versus "MERGE".
> > Personally, I think merging is more accurate. For example, say we
> > receive 4 path-addition uevent messages from the same physical device:
> > 1) uevent add sdb 
> > 2) uevent add sdc 
> > 3) uevent add sdd 
> > 4) uevent add sde 
> > 
> > We cannot just filter out the 1) 2) 3) uevent messages and only process
> > the 4) uevent message, 

Re: [dm-devel] [PATCH 7/7] dm-mpath: Fix a race condition in __multipath_map()

2016-11-22 Thread Mike Snitzer
On Tue, Nov 22 2016 at  6:47pm -0500,
Bart Van Assche  wrote:

> On 11/21/2016 04:34 PM, Mike Snitzer wrote:
> >But your WARN_ON_ONCE(clone && q->mq_ops) will trigger with sq-on-mq.
> 
> Hello Mike,
> 
> This behavior is indeed triggered by the sq-on-mq test. After having
> added the following code in __bind():
> 
>   if (old_map &&
>   dm_table_all_blk_mq_devices(old_map) !=
>   dm_table_all_blk_mq_devices(t))
>   pr_debug("%s: old all_blk_mq %d <> new all_blk_mq %d\n",
>dm_device_name(md),
>dm_table_all_blk_mq_devices(old_map),
>dm_table_all_blk_mq_devices(t));
> 
> I see the following output appear frequently in the kernel log:
> 
> dm_mod:__bind: 254:0: old all_blk_mq 1 <> new all_blk_mq 0
> 
> Could these all_blk_mq state changes explain that WARN_ON_ONCE(clone
> && q->mq_ops) is triggered in __multipath_map()? Does this mean that
> the comment in patch http://marc.info/?l=dm-devel&m=147925314306752
> is correct?

Yes, looks like you're on to something.  dm_old_prep_tio() has:

/*
 * Must clone a request if this .request_fn DM device
 * is stacked on .request_fn device(s).
 */
if (!dm_table_all_blk_mq_devices(table)) { ...

So if your table->all_blk_mq is false (likely due to a temporary
no-paths-in-the-multipath-table scenario), a clone will be passed to
__multipath_map().  But what isn't clear is how __multipath_map() would
then go on to have any underlying paths available to issue IO to (given
the table would have been empty -- or so I would think).

Would be great if you could verify that the table is in fact empty.

It should be noted that dm_table_determine_type() has:

if (list_empty(devices) && __table_type_request_based(live_md_type)) {
/* inherit live MD type */
t->type = live_md_type;
return 0;
}

So this explains how/why an empty table will inherit the
DM_TYPE_MQ_REQUEST_BASED, and it also explains why the new (empty) table
would not have ->all_blk_mq set to true.  But again, no IO is able to be
issued when there are no underlying paths.

And looking closer, __multipath_map() _should_ return early with either
DM_MAPIO_DELAY_REQUEUE or DM_MAPIO_REQUEUE when no paths are available.
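
Roughly, as a simplified sketch of that logic (not the verbatim
drivers/md/dm-mpath.c code -- the helper names and conditions are
abbreviated here):

	pgpath = choose_pgpath(m, nr_bytes);
	if (!pgpath) {
		if (must_push_back(m))
			return DM_MAPIO_DELAY_REQUEUE;	/* retry later */
		return -EIO;				/* fail the request */
	} else if (queue_io_or_pg_init_required(m)) {	/* simplified */
		return DM_MAPIO_REQUEUE;
	}
	/* only past this point is a clone actually mapped to a path */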

I'm not seeing how you could get into this scenario where you prepared a
clone (as if the clone request were to be issued to a .request_fn, aka
"sq", device) and then, by the time you get into __multipath_map(), there
is an underlying path available with q->mq_ops.

But regardless, what certainly needs fixing is the inconsistency of
inheriting DM_TYPE_MQ_REQUEST_BASED but not setting all_blk_mq to true
(all_blk_mq == true is implied by DM_TYPE_MQ_REQUEST_BASED).

I'm now questioning why we even need the 'all_blk_mq' state within the
table.  I'll revisit the "why?" on all that in my previous commits.
I'll likely get back with you on this point tomorrow.  And will
hopefully have a fix for you.

FYI: given all this, I do think something like your 7/7 patch (which you
referenced with the above url) would be a useful added safety net to
guard against future inconsistencies/bugs/regressions.

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel


[dm-devel] [PATCH] multipath-tools: identify DataCore SANsymphony in hwtable

2016-11-22 Thread Xose Vazquez Perez
Cc: Christophe Varoqui 
Cc: device-mapper development 
Signed-off-by: Xose Vazquez Perez 
---
 libmultipath/hwtable.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/libmultipath/hwtable.c b/libmultipath/hwtable.c
index eb7a5a0..c0bfa75 100644
--- a/libmultipath/hwtable.c
+++ b/libmultipath/hwtable.c
@@ -884,6 +884,7 @@ static struct hwentry default_hw[] = {
.prio_name = PRIO_ALUA,
},
{
+   /* SANsymphony */
.vendor= "DataCore",
.product   = "Virtual Disk",
.pgpolicy  = GROUP_BY_PRIO,
-- 
2.10.2

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel


Re: [dm-devel] [PATCH 00/12] block: cleanup direct access to bvec table

2016-11-22 Thread Jens Axboe

On 11/11/2016 05:05 AM, Ming Lei wrote:

Hi,

This patchset cleans up direct access to the bvec table.

The 1st patch passes the bvec table to bio_init(), so that
direct access to the bvec table during bio initialization is avoided.

Of the other patches, most use bio_add_page()
to replace hardcoded additions of pages to the bvec,
and the rest avoid accessing bio->bi_vcnt.

The most notable one uses bvec iterator helpers
to implement .get_page/.next_page for dm-io.c.

One big motivation is to prepare for supporting multipage
bvecs, but this patchset is a good cleanup too, even apart
from that purpose.


I've added the series, except 6-8, the dm parts. I updated some of the
descriptions/subjects to read a little better.

--
Jens Axboe

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel


Re: [dm-devel] [PATCH] New device mapper target ZDM

2016-11-22 Thread Alasdair G Kergon
Well, to start with some device-mapper basics, look at the newer files
(e.g. thin-provisioning.txt, cache.txt) in the Documentation directory
to see what format to use and how to document the interface:
constructor, status, examples using dmsetup only, etc.

(BTW, we don't use equals signs in our interfaces.)

But even before any of that, I'm puzzled by the basics of how this can
have been fitted into a device-mapper interface where the "sine qua non"
resume function is still TODO, 'dmsetup table' appears unimplemented,
'status' doesn't appear to provide the internal status, and 'load' has
some keywords that also don't appear to fit.

Look at how the existing dm targets divide up the work between each of
the functions and model your interface on those.

Alasdair

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel


Re: [dm-devel] [PATCH] New device mapper target ZDM

2016-11-22 Thread Bart Van Assche
On 11/21/16 11:23, Shaun Tancheff wrote:
> +/* -------------------------------------------------------------------- */
> +/* ----------------------- ProcFS Support Routines --------------------- */
> +/* -------------------------------------------------------------------- */
> +
> +#if defined(CONFIG_PROC_FS)
> +
> +/**
> + * struct zone_info_entry - Proc zone entry.
> + * @zone: Zone Index
> + * @info: Info (WP/Used).
> + */
> +struct zone_info_entry {
> + u32 zone;
> + u32 info;
> +};
> +
> +/**
> + * Startup writing to our proc entry
> + */
> +static void *proc_wp_start(struct seq_file *seqf, loff_t *pos)
> +{
> + struct zdm *znd = seqf->private;
> +
> + if (*pos == 0)
> + znd->wp_proc_at = *pos;
> + return &znd->wp_proc_at;
> +}

Hello Shaun,

Does this mean that you are not aware that it is considered unacceptable 
for new drivers to create procfs entries?

Bart.

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel


Re: [dm-devel] [PATCH 07/12] dm: use bvec iterator helpers to implement .get_page and .next_page

2016-11-22 Thread Ming Lei
On Mon, Nov 21, 2016 at 10:49 PM, Mike Snitzer  wrote:
> On Fri, Nov 11 2016 at  7:05am -0500,
> Ming Lei  wrote:
>
>> Firstly, we have mature bvec/bio iterator helpers for iterating over
>> each page in a bio, so it is not necessary to reinvent the wheel to do that.
>>
>> Secondly, the coming multipage bvec support requires this patch.
>>
>> Also add comments about the direct access to the bvec table.
>>
>> Signed-off-by: Ming Lei 
>
> I've staged this for 4.10

Thanks Mike!
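
For reference, the iterator-helper pattern that description refers to
looks roughly like this (a sketch, not the actual dm-io.c change):

	struct bio_vec bvec;
	struct bvec_iter iter;

	bio_for_each_segment(bvec, bio, iter) {
		struct page *page = bvec.bv_page;
		unsigned int offset = bvec.bv_offset;
		unsigned int len = bvec.bv_len;

		/* hand page/offset/len to the consumer instead of
		   indexing bio->bi_io_vec directly */
	}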

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel