Re: [dm-devel] hch's native NVMe multipathing [was: Re: [PATCH 1/2] Don't blacklist nvme]

2017-02-17 Thread Mike Snitzer
On Fri, Feb 17 2017 at  4:04am -0500,
h...@infradead.org  wrote:

> On Thu, Feb 16, 2017 at 01:21:29PM -0500, Mike Snitzer wrote:
> > multipath-tools has tables that specify all the defaults for a given
> > target backend.  NVMe will just be yet another.
> 
> No, if we get things right it won't.  ALUA already got rid of most
> of the parameters people would have to set under normal conditions,
> and I plan to make sure the NVMe equivalent will do it for all
> parameters.  I am active in the NVMe working group and will do my
> best to get there.  There are a few other folks here that are more or
> less active there as well (Keith, Martin, Jens for example), so I
> think we have a chance.
> 
> That being said, Keith is right that we'll always have odd setups
> where we need to override things, and we will have to provide tunables
> for that.  It's no different from any other kernel subsystem in that regard.

Before ALUA fixed all that vendor-specific fragmentation there was the
even worse fragmentation where different vendors pushed multipathing
into their FC drivers.  James correctly pushed them toward a generic
solution (and DM multipath was born).  If every transport implements its
own multipathing we'll end up with a more generic, yet very similar,
fragmentation problem.

But if your native NVMe multipathing really is factored such that the
actual IO fast path is implemented in block core, and transport-specific
hooks are called out to as needed, then you've simply reimplemented DM
multipath in block core.

Pretty weird place to invest _so_ much energy before you've fully
explored how unworkable DM multipath support for NVMe is.  But I
digress.



Re: [dm-devel] hch's native NVMe multipathing [was: Re: [PATCH 1/2] Don't blacklist nvme]

2017-02-17 Thread Mike Snitzer
On Fri, Feb 17 2017 at  4:05am -0500,
Christoph Hellwig  wrote:

> On Thu, Feb 16, 2017 at 08:05:36PM +0200, Sagi Grimberg wrote:
> > I guess one config option that we'd need is multibus vs. failover,
> > which is chosen per use-case.
> 
> Which fundamentally is a property of the target first, and it should
> tell us that.  There might be the occasional need for an override,
> but certainly not locally maintained tables of targets and their
> preferences.

I think you're being idealistic to believe that once NVMe is on the
roadmap, and an engineering priority, of every player in the industry,
your more-or-less pristine implementation isn't going to get stretched to
its limits and ultimately need a fair amount of vendor-specific tweaks.

But hopefully I'm wrong.  I just know what has happened with SCSI/FC/IB
multipathing on vendor hardware: the jockeying that all these vendors do
ultimately results in the need for quirks, because they race to market
and miss something, or invent their own ways forward to allow NVMe to
dovetail into their preexisting legacy "enterprise" offerings.



Re: [dm-devel] hch's native NVMe multipathing [was: Re: [PATCH 1/2] Don't blacklist nvme]

2017-02-17 Thread Mike Snitzer
On Fri, Feb 17 2017 at  4:33am -0500,
Christoph Hellwig  wrote:

> On Thu, Feb 16, 2017 at 10:13:37AM -0500, Mike Snitzer wrote:
> > Not following what you're saying Keith did.  Please feel free to
> > clarify.
> 
> Keith demonstrated what it takes to support NVMe with dm.  He also
> gave a couple of presentations on it in addition to various patches on
> the list.

Well I know very well that Keith was an early consumer of DM multipath
for NVMe (back when Intel was looking at doing dual port NVMe drives).
Keith's interest in NVMe support in multipath is what motivated all the
blk-mq request-based DM multipath advances.

But I'll follow up with Keith to see if he can sweep through whatever
patches he has and make sure I'm aware of them.

> > The middle man is useful if it can support all transports.  If it only
> > supports some then yeah the utility is certainly reduced.
> 
> Again let's look at what multipathing involves:
> 
>  - discovery of multiple paths for a device, and path preferences:
>Storage protocol specific
> 
>  - handling of path state and grouping changes:
>Storage protocol specific
> 
>  - handling of path up / down events:
>Storage protocol / transport specific if provided
> 
>  - keep alive / path checking:
>Storage protocol specific with possible generic fallback
> 
>  - path selection:
>Generic, although building heavily on protocol / transport specific
>information
> 
> So most of the hard work is transport specific anyway.  And I fully
> agree that generic code should be, well, generic.  And by generic
> I mean right in the block layer, instead of involving a layered block
> driver that relies on lots of low-level driver information and setup
> from user space.

"Lots of low-level driver information?"  You mean like SCSI's pg_init
and such?

> > I'm going to look at removing any scsi_dh code from DM multipath
> > (someone already proposed removing the 'retain_attached_hw_handler'
> > feature).  Not much point having anything in DM multipath now that scsi
> > discovery has the ability to auto-attach the right scsi_dh via scsi_dh's
> > .match hook.
> 
> Great.
> 
> > As a side-effect it will fix Keith's scsi_dh crash (when
> > operating on NVMe request_queue).
> 
> I think we'll need to have a quick fix for that ASAP, though.

Hannes' latest patch looked good to me.  Should take care of it.  I
assume mkp will pick it up for the 4.11 merge window and mark it for
stable?



Re: [dm-devel] hch's native NVMe multipathing [was: Re: [PATCH 1/2] Don't blacklist nvme]

2017-02-17 Thread Christoph Hellwig
On Thu, Feb 16, 2017 at 10:13:37AM -0500, Mike Snitzer wrote:
> Not following what you're saying Keith did.  Please feel free to
> clarify.

Keith demonstrated what it takes to support NVMe with dm.  He also
gave a couple of presentations on it in addition to various patches on
the list.

> The middle man is useful if it can support all transports.  If it only
> supports some then yeah the utility is certainly reduced.

Again let's look at what multipathing involves:

 - discovery of multiple paths for a device, and path preferences:
   Storage protocol specific

 - handling of path state and grouping changes:
   Storage protocol specific

 - handling of path up / down events:
   Storage protocol / transport specific if provided

 - keep alive / path checking:
   Storage protocol specific with possible generic fallback

 - path selection:
   Generic, although building heavily on protocol / transport specific
   information

So most of the hard work is transport specific anyway.  And I fully
agree that generic code should be, well, generic.  And by generic
I mean right in the block layer, instead of involving a layered block
driver that relies on lots of low-level driver information and setup
from user space.
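
To make that split concrete, here is a rough C sketch of the shape such a
design might take.  Every type and callback name below is hypothetical
(this is not an existing block-layer interface); it only illustrates which
responsibilities stay protocol specific and which become generic:

/*
 * Hypothetical sketch only: none of these types exist in the kernel.
 * Protocol-specific work (discovery, events, path checking) is supplied
 * by the transport; generic work (path selection) lives in block core.
 */
struct request_queue;                   /* provided by the block layer */

struct mp_path {
        struct request_queue *queue;    /* the underlying single-path device */
        int state;                      /* up / down / transitioning */
        unsigned int group;             /* ALUA-like path group */
};

struct mp_transport_ops {               /* storage protocol specific */
        int  (*discover_paths)(void *transport_ctx);
        void (*path_event)(struct mp_path *path, int new_state);
        int  (*check_path)(struct mp_path *path);
};

struct mp_selector_ops {                /* generic, in block core */
        struct mp_path *(*select_path)(struct mp_path **paths,
                                       unsigned int nr_paths,
                                       unsigned int nr_bytes);
};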

> I'm going to look at removing any scsi_dh code from DM multipath
> (someone already proposed removing the 'retain_attached_hw_handler'
> feature).  Not much point having anything in DM multipath now that scsi
> discovery has the ability to auto-attach the right scsi_dh via scsi_dh's
> .match hook.

Great.

> As a side-effect it will fix Keith's scsi_dh crash (when
> operating on NVMe request_queue).

I think we'll need to have a quick fix for that ASAP, though.

> My hope is that your NVMe equivalent for scsi_dh will "just work" (TM)
> like scsi_dh auto-attach does.  There isn't a finished ALUA equivalent
> standard for NVMe so I'd imagine at this point you have a single device
> handler for NVMe to do error translation?

Yes, error translation for the block layer, but most importantly
discovery of multiple paths to the same namespace.
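
As a rough illustration of the error-translation half of that: the
NVME_SC_* constants below are from include/linux/nvme.h, but the helper
and its mapping are an illustrative assumption, not the driver's actual
translation table.

#include <linux/types.h>
#include <linux/errno.h>
#include <linux/nvme.h>

/* Sketch only: translate an NVMe completion status into an errno that
 * the block layer (and anything stacked on it) can act on.  Transient
 * errors are worth retrying, possibly on another path; hard errors are
 * not.
 */
static int nvme_status_to_errno_sketch(u16 status)
{
        switch (status & 0x7ff) {       /* assumed mask: drop DNR/More bits */
        case NVME_SC_SUCCESS:
                return 0;
        case NVME_SC_NS_NOT_READY:
                return -EAGAIN;         /* transient */
        case NVME_SC_LBA_RANGE:
        case NVME_SC_CAP_EXCEEDED:
                return -ENOSPC;         /* hard error, failover won't help */
        case NVME_SC_RESERVATION_CONFLICT:
                return -EBADE;
        default:
                return -EIO;
        }
}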



Re: [dm-devel] hch's native NVMe multipathing [was: Re: [PATCH 1/2] Don't blacklist nvme]

2017-02-17 Thread h...@infradead.org
On Thu, Feb 16, 2017 at 01:21:29PM -0500, Mike Snitzer wrote:
> multipath-tools has tables that specify all the defaults for a given
> target backend.  NVMe will just be yet another.

No, if we get things right it won't.  ALUA already got rid of most
of the parameters people would have to set under normal conditions,
and I plan to make sure the NVMe equivalent will do it for all
parameters.  I am active in the NVMe working group and will do my
best to get there.  There are a few other folks here that are more or
less active there as well (Keith, Martin, Jens for example), so I
think we have a chance.

That being said, Keith is right that we'll always have odd setups
where we need to override things, and we will have to provide tunables
for that.  It's no different from any other kernel subsystem in that regard.



Re: [dm-devel] hch's native NVMe multipathing [was: Re: [PATCH 1/2] Don't blacklist nvme]

2017-02-17 Thread Christoph Hellwig
On Thu, Feb 16, 2017 at 08:05:36PM +0200, Sagi Grimberg wrote:
> I guess one config option that we'd need is multibus vs. failover,
> which is chosen per use-case.

Which fundamentally is a property of the target first, and it should
tell us that.  There might be the occasional need for an override,
but certainly not locally maintained tables of targets and their
preferences.



Re: [dm-devel] hch's native NVMe multipathing [was: Re: [PATCH 1/2] Don't blacklist nvme]

2017-02-16 Thread Bart Van Assche
On Thu, 2017-02-16 at 15:23 -0500, Mike Snitzer wrote:
> Or you can wait to rebase on v4.11-rc1 in ~2 weeks.

Hello Mike,

I will wait until v4.11-rc1 has been released.

Thanks,

Bart.



Re: [dm-devel] hch's native NVMe multipathing [was: Re: [PATCH 1/2] Don't blacklist nvme]

2017-02-16 Thread Keith Busch
On Thu, Feb 16, 2017 at 01:21:29PM -0500, Mike Snitzer wrote:
> Then undeprecate them.  Decisions like marking a path checker deprecated
> were _not_ made with NVMe in mind.  They must predate NVMe.
> 
> multipath-tools has tables that specify all the defaults for a given
> target backend.  NVMe will just be yet another.  Yes some user _could_
> shoot themselves in the foot by overriding the proper configuration but
> since when are we motivated by _not_ giving users the power to hang
> themselves?
> 
> As for configurability (choosing between N valid configs/settings): At
> some point the user will want one behaviour vs another.  Thinking
> otherwise is just naive.  Think error timeouts, etc.  Any multipath
> kernel implementation (which dm-multipath is BTW) will eventually find
> itself at a crossroads where the underlying fabric could be tweaked in
> different ways.  Thinking you can just hardcode these attributes and
> settings is foolish.

Roger that, and I absolutely want to see this work with the existing
framework.

I just think it'd be easier for everyone if multipath were more like
the generic block layer, in that devices are surfaced with configurable
policies without userspace telling it which to use. The kernel knowing
safe defaults for a particular device is probably the more common case,
and userspace can still tune them as needed. Of course, I accept you're
in a better position to know if this is folly.



Re: [dm-devel] hch's native NVMe multipathing [was: Re: [PATCH 1/2] Don't blacklist nvme]

2017-02-16 Thread Mike Snitzer
On Thu, Feb 16 2017 at  2:46pm -0500,
Bart Van Assche  wrote:

> On Thu, 2017-02-16 at 07:37 -0500, Mike Snitzer wrote:
> > Weird.  I did push back on those changes initially (just felt like
> > churn) but I ultimately did take them:
> > 
> > $ git log --oneline --author=bart drivers/md/dm-mpath.c
> > 6599c84 dm mpath: do not modify *__clone if blk_mq_alloc_request() fails
> > 4813577 dm mpath: change return type of pg_init_all_paths() from int to void
> > 9f4c3f8 dm: convert wait loops to use autoremove_wake_function()
> > 
> > Did I miss any?
> 
> Hello Mike,
> 
> Thank you for having accepted these patches. However, I was not referring
> to these patches but to the eight patches available at
> https://github.com/bvanassche/linux/tree/dm-mpath

Some of these look familiar (the dm-mpath micro-optimize one made me
cringe... potential for regression, etc.  But careful review should ease
those concerns).  But yeah, these certainly slipped through the cracks.
I'll review these for 4.12 (too late for 4.11, sorry).

Please rebase these on top of linux-block.git's for-4.11/rq-refactor
because Christoph significantly reworked the request-based DM code (all
request allocation is done in DM multipath now), see:
http://git.kernel.dk/cgit/linux-block/commit/?h=for-4.11/rq-refactor&id=eb8db831be80692bf4bda3dfc55001daf64ec299

Or you can wait to rebase on v4.11-rc1 in ~2 weeks.

Thanks



Re: [dm-devel] hch's native NVMe multipathing [was: Re: [PATCH 1/2] Don't blacklist nvme]

2017-02-16 Thread Bart Van Assche
On Thu, 2017-02-16 at 07:37 -0500, Mike Snitzer wrote:
> Weird.  I did push back on those changes initially (just felt like
> churn) but I ultimately did take them:
> 
> $ git log --oneline --author=bart drivers/md/dm-mpath.c
> 6599c84 dm mpath: do not modify *__clone if blk_mq_alloc_request() fails
> 4813577 dm mpath: change return type of pg_init_all_paths() from int to void
> 9f4c3f8 dm: convert wait loops to use autoremove_wake_function()
> 
> Did I miss any?

Hello Mike,

Thank you for having accepted these patches. However, I was not referring
to these patches but to the eight patches available at
https://github.com/bvanassche/linux/tree/dm-mpath

Bart.



Re: [dm-devel] hch's native NVMe multipathing [was: Re: [PATCH 1/2] Don't blacklist nvme]

2017-02-16 Thread Mike Snitzer
On Thu, Feb 16 2017 at  1:07pm -0500,
Keith Busch  wrote:

> On Thu, Feb 16, 2017 at 05:37:41PM +, Bart Van Assche wrote:
> > On Thu, 2017-02-16 at 12:38 -0500, Keith Busch wrote:
> > > Maybe I'm not seeing the bigger picture. Is there some part to multipath
> > > that the kernel is not in a better position to handle?
> > 
> > Does this mean that the code to parse /etc/multipath.conf will be moved into
> > the kernel? Or will we lose the ability to configure the policies that
> > /etc/multipath.conf allows to configure?
> 
> No, I'm just considering the settings for a device that won't work
> at all if multipath.conf is wrong. For example, the uuid attributes,
> path priority, or path checker. These can't be considered configurable
> policies if all but one of them are invalid for a specific device type.
> 
> It shouldn't even be an option to let a user select TUR path checker
> for NVMe, and the only checkers multipath-tools provide that even make
> sense for NVMe are deprecated.

Then undeprecate them.  Decisions like marking a path checker deprecated
were _not_ made with NVMe in mind.  They must predate NVMe.

multipath-tools has tables that specify all the defaults for a given
target backend.  NVMe will just be yet another.  Yes some user _could_
shoot themselves in the foot by overriding the proper configuration but
since when are we motivated by _not_ giving users the power to hang
themselves?

As for configurability (choosing between N valid configs/settings): At
some point the user will want one behaviour vs another.  Thinking
otherwise is just naive.  Think error timeouts, etc.  Any multipath
kernel implementation (which dm-multipath is BTW) will eventually find
itself at a crossroads where the underlying fabric could be tweaked in
different ways.  Thinking you can just hardcode these attributes and
settings is foolish.
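
For illustration, this is the kind of per-backend table entry being
referred to, expressed as a multipath.conf "devices" stanza.  The
attribute values are assumptions sketched for NVMe, not a validated or
recommended configuration:

devices {
        device {
                vendor                  "NVME"
                product                 ".*"
                path_grouping_policy    multibus
                path_checker            directio
                prio                    const
                no_path_retry           12
        }
}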



Re: [dm-devel] hch's native NVMe multipathing [was: Re: [PATCH 1/2] Don't blacklist nvme]

2017-02-16 Thread Sagi Grimberg



> I'm fine with the path selectors getting moved out; maybe it'll
> encourage new path selectors to be developed.
>
> But there will need to be some userspace interface stood up to support
> your native NVMe multipathing (you may not think it's needed but I think
> in time there will be a need to configure _something_).  That is the
> fragmentation I'm referring to.


I guess one config option that we'd need is multibus vs. failover,
which is chosen per use-case.

I'd have to say that having something much simpler than multipath-tools
does sound appealing.



Re: [dm-devel] hch's native NVMe multipathing [was: Re: [PATCH 1/2] Don't blacklist nvme]

2017-02-16 Thread Keith Busch
On Thu, Feb 16, 2017 at 05:37:41PM +, Bart Van Assche wrote:
> On Thu, 2017-02-16 at 12:38 -0500, Keith Busch wrote:
> > Maybe I'm not seeing the bigger picture. Is there some part to multipath
> > that the kernel is not in a better position to handle?
> 
> Does this mean that the code to parse /etc/multipath.conf will be moved into
> the kernel? Or will we lose the ability to configure the policies that
> /etc/multipath.conf allows to configure?

No, I'm just considering the settings for a device that won't work
at all if multipath.conf is wrong. For example, the uuid attributes,
path priority, or path checker. These can't be considered configurable
policies if all but one of them are invalid for a specific device type.

It shouldn't even be an option to let a user select TUR path checker
for NVMe, and the only checkers multipath-tools provide that even make
sense for NVMe are deprecated.



Re: [dm-devel] hch's native NVMe multipathing [was: Re: [PATCH 1/2] Don't blacklist nvme]

2017-02-16 Thread Bart Van Assche
On Thu, 2017-02-16 at 12:38 -0500, Keith Busch wrote:
> Maybe I'm not seeing the bigger picture. Is there some part to multipath
> that the kernel is not in a better position to handle?

Does this mean that the code to parse /etc/multipath.conf will be moved into
the kernel? Or will we lose the ability to configure the policies that
/etc/multipath.conf allows to configure?

Bart.



Re: [dm-devel] hch's native NVMe multipathing [was: Re: [PATCH 1/2] Don't blacklist nvme]

2017-02-16 Thread Keith Busch
On Thu, Feb 16, 2017 at 10:13:37AM -0500, Mike Snitzer wrote:
> On Thu, Feb 16 2017 at  9:26am -0500,
> Christoph Hellwig  wrote:
>
> > just a little new code in the block layer, and a move of the path
> > selectors from dm to the block layer.  I would not call this
> > fragmentation.
> 
> I'm fine with the path selectors getting moved out; maybe it'll
> encourage new path selectors to be developed.
> 
> But there will need to be some userspace interface stood up to support
> your native NVMe multipathing (you may not think it's needed but I think
> in time there will be a need to configure _something_).  That is the
> fragmentation I'm referring to.

I'm not sure what Christoph's proposal looks like, but I have to agree
that multipath support directly in the kernel, without requiring user
space to set up the mpath block device, is easier for everyone. The only
NVMe-specific part, though, just needs to be how it reports unique
identifiers to the multipath layer.
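
To make the "unique identifiers" point concrete, here is a hypothetical
helper (only the struct nvme_id_ns fields are real, from
include/linux/nvme.h; the function itself is an illustrative sketch, not
driver code) showing that grouping paths reduces to comparing namespace
identifiers:

#include <linux/types.h>
#include <linux/string.h>
#include <linux/nvme.h>

/* Sketch: two block devices are paths to the same namespace if their
 * Identify Namespace data reports the same globally unique identifier
 * (NGUID, or EUI-64 as a fallback).
 */
static bool nvme_same_namespace_sketch(const struct nvme_id_ns *a,
                                       const struct nvme_id_ns *b)
{
        /* Prefer the NGUID when the controller fills it in. */
        if (memchr_inv(a->nguid, 0, sizeof(a->nguid)))
                return !memcmp(a->nguid, b->nguid, sizeof(a->nguid));
        /* Otherwise fall back to the EUI-64, if present. */
        if (memchr_inv(a->eui64, 0, sizeof(a->eui64)))
                return !memcmp(a->eui64, b->eui64, sizeof(a->eui64));
        return false;
}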

Maybe I'm not seeing the bigger picture. Is there some part to multipath
that the kernel is not in a better position to handle?



Re: [dm-devel] hch's native NVMe multipathing [was: Re: [PATCH 1/2] Don't blacklist nvme]

2017-02-16 Thread Mike Snitzer
On Thu, Feb 16 2017 at 12:00am -0500,
Bart Van Assche  wrote:

> On 02/15/17 18:53, Mike Snitzer wrote:
> >Nobody has interest in Linux multipathing becoming fragmented.
> >
> >If every transport implemented its own multipathing the end-user would
> >be amazingly screwed trying to keep track of all the
> >quirks/configuration/management of each.
> >
> >Not saying multipath-tools is great, nor that DM multipath is god's
> >gift.  But substantiating _why_ you need this "native NVMe
> >multipathing" would go a really long way to justifying your effort.
> >
> >For starters, how about you show just how much better than DM multipath
> >this native NVMe multipathing performs?  NOTE: it'd imply you put effort
> >into making DM multipath work with NVMe... if you've sat on that code too,
> >that'd be amazingly unfortunate/frustrating.
> 
> Another question is what your attitude is towards dm-mpath changes?
> Last time I posted a series of patches that significantly clean up
> and improve readability of the dm-mpath code you refused to take these 
> upstream.

Weird.  I did push back on those changes initially (just felt like
churn) but I ultimately did take them:

$ git log --oneline --author=bart drivers/md/dm-mpath.c
6599c84 dm mpath: do not modify *__clone if blk_mq_alloc_request() fails
4813577 dm mpath: change return type of pg_init_all_paths() from int to void
9f4c3f8 dm: convert wait loops to use autoremove_wake_function()

Did I miss any?

But to be 100% clear, I'm very appreciative of any DM mpath (and
request-based DM core) changes.  I'll review them with a critical eye
but if they hold up they get included.

Mike



Re: [dm-devel] hch's native NVMe multipathing [was: Re: [PATCH 1/2] Don't blacklist nvme]

2017-02-15 Thread Bart Van Assche

On 02/15/17 18:53, Mike Snitzer wrote:
> Nobody has interest in Linux multipathing becoming fragmented.
>
> If every transport implemented its own multipathing the end-user would
> be amazingly screwed trying to keep track of all the
> quirks/configuration/management of each.
>
> Not saying multipath-tools is great, nor that DM multipath is god's
> gift.  But substantiating _why_ you need this "native NVMe
> multipathing" would go a really long way to justifying your effort.
>
> For starters, how about you show just how much better than DM multipath
> this native NVMe multipathing performs?  NOTE: it'd imply you put effort
> into making DM multipath work with NVMe... if you've sat on that code too,
> that'd be amazingly unfortunate/frustrating.

Another question is what your attitude is towards dm-mpath changes? Last
time I posted a series of patches that significantly clean up and
improve readability of the dm-mpath code you refused to take these upstream.

Bart.
