Re: [PATCH 8/9] vfio/pci: export nvlink2 support into vendor vfio_pci drivers

2021-04-01 Thread Alex Williamson
On Thu, 1 Apr 2021 10:12:27 -0300
Jason Gunthorpe  wrote:

> On Mon, Mar 29, 2021 at 05:10:53PM -0600, Alex Williamson wrote:
> > On Tue, 23 Mar 2021 16:32:13 -0300
> > Jason Gunthorpe  wrote:
> >   
> > > On Mon, Mar 22, 2021 at 10:40:16AM -0600, Alex Williamson wrote:
> > >   
> > > > Of course if you start looking at features like migration support,
> > > > that's more than likely not simply an additional region with optional
> > > > information, it would need to interact with the actual state of the
> > > > device.  For those, I would very much support use of a specific
> > > > id_table.  That's not these.
> > > 
> > > What I don't understand is why do we need two different ways of
> > > inserting vendor code?  
> > 
> > Because a PCI id table only identifies the device, these drivers are
> > looking for a device in the context of firmware dependencies.  
> 
> The firmware dependencies only exist for a defined list of IDs, so I
> don't entirely agree with this statement. I agree with the below though,
> so let's leave it be.
> 
> > > I understood he meant that NVIDIA GPUs *without* NVLINK can exist, but
> > > the ID table we have here is supposed to be the NVLINK compatible
> > > IDs.
> > 
> > Those IDs are just for the SXM2 variants of the device that can
> > exist on a variety of platforms, only one of which includes the
> > firmware tables to activate the vfio support.  
> 
> AFAIK, SXM2 is a special physical form factor that has the nvlink
> physical connection - it is only for this specific generation of POWER
> servers that can accept the specific nvlink those cards have.

SXM2 is not unique to Power, there are various x86 systems that support
the interface, everything from NVIDIA's own line of DGX systems,
various vendor systems, all the way to VARs like Super Micro and
Gigabyte.

> > I think you're looking for a significant inflection in vendors' stated
> > support for vfio use cases, beyond the "best-effort, give it a try",
> > that we currently have.  
> 
> I see, so they don't want to. Let's leave it then.
> 
> Though if Xe breaks everything they need to add/maintain a proper ID
> table, not more hackery.

e4eccb853664 ("vfio/pci: Bypass IGD init in case of -ENODEV") is
supposed to enable Xe, where the IGD code is expected to return -ENODEV
and we go on with the base vfio-pci support.
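
For reference, the shape of that change in vfio_pci_enable() is roughly the
following (a paraphrased sketch, not a verbatim quote of the commit): IGD
setup is only attempted for Intel VGA-class devices, and -ENODEV falls
through to plain vfio-pci.

	if (vfio_pci_is_vga(pdev) &&
	    pdev->vendor == PCI_VENDOR_ID_INTEL &&
	    IS_ENABLED(CONFIG_VFIO_PCI_IGD)) {
		ret = vfio_pci_igd_init(vdev);
		/* -ENODEV (e.g. no OpRegion, as on Xe) just means "no IGD
		 * extensions", not a fatal error */
		if (ret && ret != -ENODEV) {
			pci_warn(pdev, "Failed to setup Intel IGD regions\n");
			goto disable_exit;
		}
	}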
 
> > > And again, I feel this is all a big tangent, especially now that
> > > HCH wants to delete the nvlink stuff we should just leave igd
> > > alone.  
> > 
> > Determining which things stay in vfio-pci-core and which things are
> > split to variant drivers and how those variant drivers can match the
> > devices they intend to support seems very much in line with this series.
> >   
> 
> IMHO, the main litmus test for core is if variant drivers will need it
> or not.
> 
> No variant driver should be stacked on an igd device, or if it someday
> is, it should implement the special igd hackery internally (and have a
> proper ID table). So when we split it up igd goes into vfio_pci.ko as
> some special behavior vfio_pci.ko's universal driver provides for IGD.
> 
> Every variant driver will still need the zdev data to be exposed to
> userspace, and every PCI device on s390 has that extra information. So
> zdev goes to vfio_pci_core.ko
> 
> Future things going into vfio_pci.ko need a really good reason why
> they can't be variant drivers instead.

That sounds fair.  Thanks,

Alex



Re: [PATCH 8/9] vfio/pci: export nvlink2 support into vendor vfio_pci drivers

2021-04-01 Thread Cornelia Huck
On Mon, 29 Mar 2021 17:10:53 -0600
Alex Williamson  wrote:

> On Tue, 23 Mar 2021 16:32:13 -0300
> Jason Gunthorpe  wrote:
> 
> > On Mon, Mar 22, 2021 at 10:40:16AM -0600, Alex Williamson wrote:

> > > So unless you want to do some bitkeeper archaeology, we've always
> > > allowed driver probes to fail and fall through to the next one, not
> > > even complaining with -ENODEV.  In practice it hasn't been an issue
> > > because how many drivers do you expect to have that would even try to
> > > claim a device.  
> > 
> > Do you know of anything using this ability? It might be helpful  
> 
> I don't.

I've been trying to remember why I added that patch to ignore all
errors (rather than only -ENODEV), but I suspect it might have been
related to the concurrent probing stuff I tried to implement back then.
The one instance of drivers matching to the same id I recall (s390
ctc/lcs) is actually not handled on the individual device level, but in
the meta ccwgroup driver; I don't remember anything else in the s390
case.

> 
> > > Ordering is only important when there's a catch-all so we need to
> > > figure out how to make that last among a class of drivers that will
> > > attempt to claim a device.  The softdep is a bit of a hack to do
> > > that, I'll admit, but I don't see how the alternate driver flavor
> > > universe solves having a catch-all either.
> > 
> > Haven't entirely got there yet, but I think the catch all probably has
> > to be handled by userspace udev/kmod in some way, as it is the only
> > thing that knows if there is a more specific module to load. This is
> > the biggest problem..
> > 
> > And again, I feel this is all a big tangent, especially now that HCH
> > wants to delete the nvlink stuff we should just leave igd alone.  
> 
> Determining which things stay in vfio-pci-core and which things are
> split to variant drivers and how those variant drivers can match the
> devices they intend to support seems very much in line with this series.  If
> igd stays as part of vfio-pci-core then I think we're drawing a
> parallel to z-pci support, where a significant part of that support is
> a set of extra data structures exposed through capabilities to support
> userspace use of the device.  Therefore extra regions or data
> structures through capabilities, where we're not changing device
> access, except as required for the platform (not the device), seem to be
> things that fit within the core, right?  Thanks,
> 
> Alex

As we are only talking about extra data governed by a capability, I
don't really see a problem with keeping it in the vfio core.

For those devices that need more specialized treatment, maybe we need
some kind of priority-based matching? I.e., if we match a device with
drivers, start with the one with highest priority (the specialized
one), and have the generic driver at the lowest priority. A
higher-priority driver added later on should not affect already bound
devices (and would need manual intervention again).

[I think this has come up in other places in the past as well, but I
don't have any concrete pointers handy.]
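
Purely as a hypothetical illustration of what such a priority hint might
look like (no match_priority field exists in the driver core today; all
names below are made up):

/* Hypothetical sketch only: the driver core has no match_priority today. */
static struct pci_driver specialized_vfio_pci_driver = {
	.name     = "specialized-vfio-pci",
	.id_table = specialized_vfio_pci_table,	/* narrow, device-specific */
	.probe    = specialized_vfio_pci_probe,
	.driver   = { .match_priority = 10 },	/* preferred when several drivers match */
};

static struct pci_driver vfio_pci_driver = {
	.name     = "vfio-pci",
	.id_table = vfio_pci_match_all_table,	/* catch-all */
	.probe    = vfio_pci_probe,
	.driver   = { .match_priority = 0 },	/* generic fallback, tried last */
};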



Re: [PATCH 8/9] vfio/pci: export nvlink2 support into vendor vfio_pci drivers

2021-04-01 Thread Jason Gunthorpe
On Mon, Mar 29, 2021 at 05:10:53PM -0600, Alex Williamson wrote:
> On Tue, 23 Mar 2021 16:32:13 -0300
> Jason Gunthorpe  wrote:
> 
> > On Mon, Mar 22, 2021 at 10:40:16AM -0600, Alex Williamson wrote:
> > 
> > > Of course if you start looking at features like migration support,
> > > that's more than likely not simply an additional region with optional
> > > information, it would need to interact with the actual state of the
> > > device.  For those, I would very much support use of a specific
> > > id_table.  That's not these.  
> > 
> > What I don't understand is why do we need two different ways of
> > inserting vendor code?
> 
> Because a PCI id table only identifies the device, these drivers are
> looking for a device in the context of firmware dependencies.

The firmware dependencies only exist for a defined list of IDs, so I
don't entirely agree with this statement. I agree with the below though,
so let's leave it be.

> > I understood he meant that NVIDIA GPUs *without* NVLINK can exist, but
> > the ID table we have here is supposed to be the NVLINK compatible
> > IDs.
> 
> Those IDs are just for the SXM2 variants of the device that can
> exist on a variety of platforms, only one of which includes the
> firmware tables to activate the vfio support.

AFAIK, SXM2 is a special physical form factor that has the nvlink
physical connection - it is only for this specific generation of POWER
servers that can accept the specific nvlink those cards have.

> I think you're looking for a significant inflection in vendors' stated
> support for vfio use cases, beyond the "best-effort, give it a try",
> that we currently have.

I see, so they don't want to. Let's leave it then.

Though if Xe breaks everything they need to add/maintain a proper ID
table, not more hackery.

> > And again, I feel this is all a big tangent, especially now that HCH
> > wants to delete the nvlink stuff we should just leave igd alone.
> 
> Determining which things stay in vfio-pci-core and which things are
> split to variant drivers and how those variant drivers can match the
> devices they intend to support seems very much in line with this series.

IMHO, the main litmus test for core is if variant drivers will need it
or not.

No variant driver should be stacked on an igd device, or if it someday
is, it should implement the special igd hackery internally (and have a
proper ID table). So when we split it up igd goes into vfio_pci.ko as
some special behavior vfio_pci.ko's universal driver provides for IGD.

Every variant driver will still need the zdev data to be exposed to
userspace, and every PCI device on s390 has that extra information. So
zdev goes to vfio_pci_core.ko

Future things going into vfio_pci.ko need a really good reason why
they can't be variant drivers instead.

Jason


Re: [PATCH 8/9] vfio/pci: export nvlink2 support into vendor vfio_pci drivers

2021-03-29 Thread Alex Williamson
On Tue, 23 Mar 2021 16:32:13 -0300
Jason Gunthorpe  wrote:

> On Mon, Mar 22, 2021 at 10:40:16AM -0600, Alex Williamson wrote:
> 
> > Of course if you start looking at features like migration support,
> > that's more than likely not simply an additional region with optional
> > information, it would need to interact with the actual state of the
> > device.  For those, I would very much support use of a specific
> > id_table.  That's not these.  
> 
> What I don't understand is why do we need two different ways of
> inserting vendor code?

Because a PCI id table only identifies the device, these drivers are
looking for a device in the context of firmware dependencies.

> > > new_id and driver_override should probably be in that disable list
> > > too..  
> > 
> > We don't have this other world yet, nor is it clear that we will have
> > it.  
> 
> We do today, it is obscure, but there is a whole set of config options
> designed to disable the unsafe kernel features. Kernels booted with
> secure boot and signed modules tend to enable a lot of them, for
> instance. The people working on the IMA stuff tend to enable a lot
> more as you can defeat the purpose of IMA if you can hijack the
> kernel.
> 
> > What sort of id_table is the base vfio-pci driver expected to use?  
> 
> If it has a match table it would be all match, this is why I called it
> a "universal driver"
> 
> If we have a flavour then the flavour controls the activation of
> VFIO, not new_id or driver_override, and in vfio flavour mode we can
> have an all match table, if we can resolve how to choose between two
> drivers with overlapping matches.
> 
> > > > > This is why I want to try for fine grained autoloading first. It
> > > > > really is the elegant solution if we can work it out.
> > > > 
> > > > I just don't see how we create a manageable change to userspace.
> > > 
> > > I'm not sure I understand. Even if we add a new sysfs to set some
> > > flavour then that is a pretty trivial change for userspace to move
> > > from driver_override?  
> > 
> > Perhaps for some definition of trivial that I'm not familiar with.
> > We're talking about changing libvirt and driverctl and every distro and
> > user that's created a custom script outside of those.  Even changing
> > from "vfio-pci" to "vfio-pci*" is a hurdle.  
> 
> Sure, but it isn't like a major architectural shift, nor is it
> mandatory unless you start using this new hardware class.
> 
> Userspace changes when we add kernel functionality.. The kernel just
> has to keep working the way it used to for old functionality.

Seems like we're bound to keep igd in the core as you propose below.

> > > Well, I read through the Intel GPU driver and this is how I felt it
> > > works. It doesn't even check the firmware bit unless certain PCI IDs
> > > are matched first.  
> > 
> > The IDs being only the PCI vendor ID and class code.
> 
> I don't mean how vfio works, I mean how the Intel GPU driver works.
> 
> eg:
> 
> psb_pci_probe()
>  psb_driver_load()
>   psb_intel_opregion_setup()
>    if (memcmp(base, OPREGION_SIGNATURE, 16)) {
> 
> i915_pci_probe()
>  i915_driver_probe()
>   i915_driver_hw_probe()
>    intel_opregion_setup()
>     if (memcmp(buf, OPREGION_SIGNATURE, 16)) {
> 
> All of these memcmp's are protected by exact id_tables hung off the
> pci_driver's id_table.
> 
> VFIO is the different case. In this case the ID match confirms that
> the config space has the ASLS dword at the fixed offset. If the ID
> doesn't match nothing should read the ASLS offset.
> 
> > > For NVIDIA GPUs, Max checked internally and we saw it looks very much
> > > like how the Intel GPU works. Only some PCI IDs trigger checking on the
> > > feature the firmware thing is linked to.
> > 
> > And as Alexey noted, the table came up incomplete.  But also those same
> > devices exist on platforms where this extension is completely
> > irrelevant.  
> 
> I understood he meant that NVIDIA GPUs *without* NVLINK can exist, but
> the ID table we have here is supposed to be the NVLINK compatible
> IDs.

Those IDs are just for the SXM2 variants of the device that can
exist on a variety of platforms, only one of which includes the
firmware tables to activate the vfio support.

> > So because we don't check for an Intel-specific graphics firmware table
> > when binding to a Realtek NIC, we can leap to the conclusion that there
> > must be a concise id_table we can create for IGD support?
> 
> Concise? No, but we can see *today* what the ID table is supposed to
> be by just looking at the three probe functions that touch
> OPREGION_SIGNATURE.
> 
> > There's a giant assumption above that I'm missing.  Are you expecting
> > that vendors are actually going to keep up with submitting device IDs
> > that they claim to have tested and support with vfio-pci and all other
> > devices won't be allowed to bind?  That would single handedly destroy
> > any non-enterprise use cases of vfio-pci.  
> 
> Why not? They do it for 

Re: [PATCH 8/9] vfio/pci: export nvlink2 support into vendor vfio_pci drivers

2021-03-23 Thread Alexey Kardashevskiy




On 24/03/2021 06:32, Jason Gunthorpe wrote:

>>> For NVIDIA GPUs, Max checked internally and we saw it looks very much
>>> like how the Intel GPU works. Only some PCI IDs trigger checking on the
>>> feature the firmware thing is linked to.
>>
>> And as Alexey noted, the table came up incomplete.  But also those same
>> devices exist on platforms where this extension is completely
>> irrelevant.
>
> I understood he meant that NVIDIA GPUs *without* NVLINK can exist, but
> the ID table we have here is supposed to be the NVLINK compatible
> IDs.

I also meant there are more (than in the proposed list) GPUs with
NVLink which will work on P9.



--
Alexey


Re: [PATCH 8/9] vfio/pci: export nvlink2 support into vendor vfio_pci drivers

2021-03-23 Thread Jason Gunthorpe
On Mon, Mar 22, 2021 at 10:40:16AM -0600, Alex Williamson wrote:

> Of course if you start looking at features like migration support,
> that's more than likely not simply an additional region with optional
> information, it would need to interact with the actual state of the
> device.  For those, I would very much support use of a specific
> id_table.  That's not these.

What I don't understand is why do we need two different ways of
inserting vendor code?

> > new_id and driver_override should probably be in that disable list
> > too..
> 
> We don't have this other world yet, nor is it clear that we will have
> it.

We do today, it is obscure, but there is a whole set of config options
designed to disable the unsafe kernel features. Kernels booted with
secure boot and signed modules tend to enable a lot of them, for
instance. The people working on the IMA stuff tend to enable a lot
more as you can defeat the purpose of IMA if you can hijack the
kernel.

> What sort of id_table is the base vfio-pci driver expected to use?

If it has a match table it would be all match, this is why I called it
a "universal driver"

If we have a flavour then the flavour controls the activation of
VFIO, not new_id or driver_override, and in vfio flavour mode we can
have an all match table, if we can resolve how to choose between two
drivers with overlapping matches.

> > > > This is why I want to try for fine grained autoloading first. It
> > > > really is the elegant solution if we can work it out.  
> > > 
> > > I just don't see how we create a manageable change to userspace.  
> > 
> > I'm not sure I understand. Even if we add a new sysfs to set some
> > flavour then that is a pretty trivial change for userspace to move
> > from driver_override?
> 
> Perhaps for some definition of trivial that I'm not familiar with.
> We're talking about changing libvirt and driverctl and every distro and
> user that's created a custom script outside of those.  Even changing
> from "vfio-pci" to "vfio-pci*" is a hurdle.

Sure, but it isn't like a major architectural shift, nor is it
mandatory unless you start using this new hardware class.

Userspace changes when we add kernel functionality.. The kernel just
has to keep working the way it used to for old functionality.

> > Well, I read through the Intel GPU driver and this is how I felt it
> > works. It doesn't even check the firmware bit unless certain PCI IDs
> > are matched first.
> 
> The IDs being only the PCI vendor ID and class code.  

I don't mean how vfio works, I mean how the Intel GPU driver works.

eg:

psb_pci_probe()
 psb_driver_load()
  psb_intel_opregion_setup()
   if (memcmp(base, OPREGION_SIGNATURE, 16)) {

i915_pci_probe()
 i915_driver_probe()
  i915_driver_hw_probe()
   intel_opregion_setup()
    if (memcmp(buf, OPREGION_SIGNATURE, 16)) {

All of these memcmp's are protected by exact id_tables hung off the
pci_driver's id_table.

VFIO is the different case. In this case the ID match confirms that
the config space has the ASLS dword at the fixed offset. If the ID
doesn't match nothing should read the ASLS offset.
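
To make that concrete, a variant driver could gate the lookup behind an
exact match the same way. This is only a sketch with assumed names;
OPREGION_PCI_ADDR/OPREGION_SIZE/OPREGION_SIGNATURE are as defined in
today's vfio_pci_igd.c, and the ID below is just an example entry:

static const struct pci_device_id igd_ids[] = {
	{ PCI_DEVICE(PCI_VENDOR_ID_INTEL, 0x1912) },	/* example entry only */
	{ }
};

static int igd_check_opregion(struct pci_dev *pdev)
{
	void *base;
	u32 addr;
	int ret = -ENODEV;

	/* never read the ASLS dword on IDs the table doesn't list */
	if (!pci_match_id(igd_ids, pdev))
		return -ENODEV;

	pci_read_config_dword(pdev, OPREGION_PCI_ADDR, &addr);
	if (!addr || !(~addr))
		return -ENODEV;

	base = memremap(addr, OPREGION_SIZE, MEMREMAP_WB);
	if (!base)
		return -ENOMEM;

	if (!memcmp(base, OPREGION_SIGNATURE, 16))
		ret = 0;	/* OpRegion present, IGD extensions make sense */

	memunmap(base);
	return ret;
}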

> > For NVIDIA GPUs, Max checked internally and we saw it looks very much
> > like how the Intel GPU works. Only some PCI IDs trigger checking on the
> > feature the firmware thing is linked to.
> 
> And as Alexey noted, the table came up incomplete.  But also those same
> devices exist on platforms where this extension is completely
> irrelevant.

I understood he meant that NVIDIA GPUs *without* NVLINK can exist, but
the ID table we have here is supposed to be the NVLINK compatible
IDs.

> So because we don't check for an Intel-specific graphics firmware table
> when binding to a Realtek NIC, we can leap to the conclusion that there
> must be a concise id_table we can create for IGD support?

Concise? No, but we can see *today* what the ID table is supposed to
be by just looking at the three probe functions that touch
OPREGION_SIGNATURE.

> There's a giant assumption above that I'm missing.  Are you expecting
> that vendors are actually going to keep up with submitting device IDs
> that they claim to have tested and support with vfio-pci and all other
> devices won't be allowed to bind?  That would single handedly destroy
> any non-enterprise use cases of vfio-pci.

Why not? They do it for the in-tree GPU drivers today! The ID table
for Intel GPU is even in a *header file* and we can just #include it
into vfio igd as well.

> So unless you want to do some bitkeeper archaeology, we've always
> allowed driver probes to fail and fall through to the next one, not
> even complaining with -ENODEV.  In practice it hasn't been an issue
> because how many drivers do you expect to have that would even try to
> claim a device.  

Do you know of anything using this ability? It might be helpful

> Ordering is only important when there's a catch-all so we need to
> figure out how to make that last among a class of drivers that will
> attempt to claim a device.  The 

Re: [PATCH 8/9] vfio/pci: export nvlink2 support into vendor vfio_pci drivers

2021-03-23 Thread Jason Gunthorpe
On Tue, Mar 23, 2021 at 02:17:09PM +0100, Christoph Hellwig wrote:
> On Mon, Mar 22, 2021 at 01:44:11PM -0300, Jason Gunthorpe wrote:
> > This isn't quite the scenario that needs solving. Let's go back to
> > Max's V1 posting:
> > 
> > The mlx5_vfio_pci.c pci_driver matches this:
> > 
> > +   { PCI_DEVICE_SUB(PCI_VENDOR_ID_REDHAT_QUMRANET, 0x1042,
> > +                    PCI_VENDOR_ID_MELLANOX, PCI_ANY_ID) }, /* Virtio SNAP controllers */
> > 
> > This overlaps with the match table in
> > drivers/virtio/virtio_pci_common.c:
> > 
> > { PCI_DEVICE(PCI_VENDOR_ID_REDHAT_QUMRANET, PCI_ANY_ID) },
> > 
> > So, if we do as you propose we have to add something mellanox specific
> > to virtio_pci_common which seems to me to just repeating this whole
> > problem except in more drivers.
> 
> Oh, yikes.  

This is why I keep saying it is a VFIO driver - it has no relation to
the normal kernel drivers on the hypervisor. Even loading a normal
kernel driver and switching to a VFIO mode would be unacceptably
slow/disruptive.

The goal is to go directly to a VFIO mode driver with PCI driver auto
probing disabled to avoid attaching a regular driver. Big servers will
have 1000's of these things.

> > The general thing that is happening is people are adding VM
> > migration capability to existing standard PCI interfaces like virtio,
> > NVMe, etc.
> 
> Well, if a migration capability is added to virtio (or NVMe) it should
> be standardized and not vendor specific.

It would be nice, but it would be a challenging standard to write.

I think the industry is still in the pre-standards mode of trying to
even figure out how this stuff should work.

IMHO the PCI-SIG needs to tackle a big part of this, as we can't embed any
migration controls in the VF itself; it has to be secure for
hypervisor-only use.

What we've got now is a Linux standard in VFIO where the uAPI to
manage migration is multi-vendor and we want to plug drivers into
that.

If in a few years the industry also develops HW standards then I
imagine using the same mechanism to plug in these standards-based
implementations.

Jason


Re: [PATCH 8/9] vfio/pci: export nvlink2 support into vendor vfio_pci drivers

2021-03-23 Thread Christoph Hellwig
On Mon, Mar 22, 2021 at 01:44:11PM -0300, Jason Gunthorpe wrote:
> This isn't quite the scenario that needs solving. Let's go back to
> Max's V1 posting:
> 
> The mlx5_vfio_pci.c pci_driver matches this:
> 
> + { PCI_DEVICE_SUB(PCI_VENDOR_ID_REDHAT_QUMRANET, 0x1042,
> +                  PCI_VENDOR_ID_MELLANOX, PCI_ANY_ID) }, /* Virtio SNAP controllers */
> 
> This overlaps with the match table in
> drivers/virtio/virtio_pci_common.c:
> 
> { PCI_DEVICE(PCI_VENDOR_ID_REDHAT_QUMRANET, PCI_ANY_ID) },
> 
> So, if we do as you propose we have to add something mellanox specific
> to virtio_pci_common which seems to me to just repeating this whole
> problem except in more drivers.

Oh, yikes.  

> The general thing that is happening is people are adding VM
> migration capability to existing standard PCI interfaces like virtio,
> NVMe, etc.

Well, if a migration capability is added to virtio (or NVMe) it should
be standardized and not vendor specific.


Re: [PATCH 8/9] vfio/pci: export nvlink2 support into vendor vfio_pci drivers

2021-03-22 Thread Jason Gunthorpe
On Mon, Mar 22, 2021 at 04:11:25PM +0100, Christoph Hellwig wrote:
> On Fri, Mar 19, 2021 at 05:07:49PM -0300, Jason Gunthorpe wrote:
> > The way the driver core works is to first match against the already
> > loaded driver list, then trigger an event for module loading and when
> > new drivers are registered they bind to unbound devices.
> > 
> > So, the trouble is the event through userspace because the kernel
> > can't just go on to use vfio_pci until it knows userspace has failed
> > to satisfy the load request.
> > 
> > One answer is to have userspace udev have the "hook" here and when a
> > vfio flavour mod alias is requested on a PCI device it swaps in
> > vfio_pci if it can't find an alternative.
> > 
> > The dream would be a system with no vfio modules loaded could do some
> > 
> >  echo "vfio" > /sys/bus/pci/xxx/driver_flavour
> > 
> > And a module would be loaded and a struct vfio_device is created for
> > that device. Very easy for the user.
> 
> Maybe I did not communicate my suggestion last week very well.  My
> idea is that there are no different pci_drivers vs vfio or not,
> but different personalities of the same driver.

This isn't quite the scenario that needs solving. Let's go back to
Max's V1 posting:

The mlx5_vfio_pci.c pci_driver matches this:

+   { PCI_DEVICE_SUB(PCI_VENDOR_ID_REDHAT_QUMRANET, 0x1042,
+                    PCI_VENDOR_ID_MELLANOX, PCI_ANY_ID) }, /* Virtio SNAP controllers */

This overlaps with the match table in
drivers/virtio/virtio_pci_common.c:

{ PCI_DEVICE(PCI_VENDOR_ID_REDHAT_QUMRANET, PCI_ANY_ID) },

So, if we do as you propose we have to add something mellanox specific
to virtio_pci_common which seems to me to just repeating this whole
problem except in more drivers.

The general thing that is happening is people are adding VM
migration capability to existing standard PCI interfaces like virtio,
NVMe, etc.

At least in this mlx5 situation the PF driver provides the HW access
to do the migration and the vfio mlx5 driver provides all the protocol
and machinery specific to the PCI standard being migrated. They are
all a little different.

But you could imagine some other implementation where the VF might
have an extra non-standard BAR that is the migration control.

This is why I like having a full stand alone pci_driver as everyone
implementing this can provide the vfio_device that is appropriate for
the HW.
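
As a sketch of that shape (the registration/remove helpers are hypothetical
stand-ins for whatever the final vfio core API ends up being), the vendor
module owns its own match table and probe:

static const struct pci_device_id mlx5_vfio_pci_table[] = {
	{ PCI_DEVICE_SUB(PCI_VENDOR_ID_REDHAT_QUMRANET, 0x1042,
			 PCI_VENDOR_ID_MELLANOX, PCI_ANY_ID) }, /* Virtio SNAP controllers */
	{ }
};
MODULE_DEVICE_TABLE(pci, mlx5_vfio_pci_table);

static int mlx5_vfio_pci_probe(struct pci_dev *pdev,
			       const struct pci_device_id *id)
{
	/* allocate the device-specific vfio_device, wire up the migration
	 * machinery against the PF, then register with the vfio core
	 * (helper name hypothetical) */
	return mlx5_vfio_pci_register(pdev);
}

static struct pci_driver mlx5_vfio_pci_driver = {
	.name     = "mlx5-vfio-pci",
	.id_table = mlx5_vfio_pci_table,
	.probe    = mlx5_vfio_pci_probe,
	.remove   = mlx5_vfio_pci_remove,	/* hypothetical, mirrors _register */
};
module_pci_driver(mlx5_vfio_pci_driver);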

Jason


Re: [PATCH 8/9] vfio/pci: export nvlink2 support into vendor vfio_pci drivers

2021-03-22 Thread Alex Williamson
On Sun, 21 Mar 2021 09:58:18 -0300
Jason Gunthorpe  wrote:

> On Fri, Mar 19, 2021 at 10:40:28PM -0600, Alex Williamson wrote:
> 
> > > Well, today we don't, but Max here adds id_table's to the special
> > > devices and a MODULE_DEVICE_TABLE would come too if we do the flavours
> > > thing below.  
> > 
> > I think the id_tables are the wrong approach for IGD and NVLink
> > variants.  
> 
> I really disagree with this. Checking for some random bits in firmware
> and assuming that every device made forever into the future works with
> this check is not a good way to do compatibility. Christoph made the
> same point.
> 
> We have good processes to maintain id tables, I don't see this as a
> problem.

The base driver we're discussing here is a meta-driver that binds to
any PCI endpoint as directed by the user.  There is no id_table.  There
can't be any id_table unless you're expecting every device vendor to
submit the exact subset of devices they have tested and condone usage
with this interface.  The IGD extensions here only extend that
interface by providing userspace read-only access to a few additional
pieces of information that we've found to be necessary for certain
userspace drivers.  The actual device interface is unchanged.  In the
case of the NVLink extensions, AIUI these are mostly extensions of a
firmware defined interface for managing aspects of the interconnect to
the device.  It is actually the "random bits in firmware" that we want
to expose, the ID of the device is somewhat tangential, we just only
look for those firmware extensions in association to certain vendor
devices.

Of course if you start looking at features like migration support,
that's more than likely not simply an additional region with optional
information, it would need to interact with the actual state of the
device.  For those, I would very much support use of a specific
id_table.  That's not these.

> > > As-is driver_override seems dangerous as overriding the matching table
> > > could surely allow root userspace to crash the machine. In situations
> > > with trusted boot/signed modules this shouldn't be.  
> > 
> > When we're dealing with meta-drivers that can bind to anything, we
> > shouldn't rely on the match, but should instead verify the driver is
> > appropriate in the probe callback.  Even without driver_override,
> > there's the new_id mechanism.  Either method allows the root user to
> > break driver binding.  Greg has previously stated something to the
> > effect that users get to keep all the pieces when they break something
> > by manipulating driver binding.  
> 
> > Yes, but that is a view where root is allowed to break the kernel; we
> > now have this optional other world where that is not allowed and root
> > access to lots of dangerous things is now disabled.
> 
> new_id and driver_override should probably be in that disable list
> too..

We don't have this other world yet, nor is it clear that we will have
it.  What sort of id_table is the base vfio-pci driver expected to use?
There's always a risk that hardware doesn't adhere to the spec or that
platform firmware might escalate an error that we'd otherwise consider
mundane from a userspace driver.

> > > While that might not seem too bad with these simple drivers, at least
> > > the mlx5 migration driver will have a large dependency tree and pull
> > > in lots of other modules. Even Max's sample from v1 pulls in mlx5_core.ko
> > > and a bunch of other stuff in its orbit.  
> > 
> > Luckily the mlx5 driver doesn't need to be covered by compatibility
> > support, so we don't need to set a softdep for it and the module could
> > be named such that a wildcard driver_override of vfio_pci* shouldn't
> > logically include that driver.  Users can manually create their own
> > modprobe.d softdep entry if they'd like to include it.  Otherwise
> > userspace would need to know to bind to it specifically.  
> 
> But now you are giving up on the whole point, which was to
> automatically load the correct specific module without special admin
> involvement!

This series only exposed a temporary compatibility interface to provide
that anyway.  As I understood it, the long term solution was that
userspace would somehow learn which driver to use for which device.
That "somehow" isn't clear to me.

> > > This is why I want to try for fine grained autoloading first. It
> > > really is the elegant solution if we can work it out.  
> > 
> > I just don't see how we create a manageable change to userspace.  
> 
> I'm not sure I understand. Even if we add a new sysfs to set some
> flavour then that is a pretty trivial change for userspace to move
> from driver_override?

Perhaps for some definition of trivial that I'm not familiar with.
We're talking about changing libvirt and driverctl and every distro and
user that's created a custom script outside of those.  Even changing
from "vfio-pci" to "vfio-pci*" is a hurdle.

> > > I don't think we should over-focus on these two firmware 

Re: [PATCH 8/9] vfio/pci: export nvlink2 support into vendor vfio_pci drivers

2021-03-22 Thread Christoph Hellwig
On Fri, Mar 19, 2021 at 05:07:49PM -0300, Jason Gunthorpe wrote:
> The way the driver core works is to first match against the already
> loaded driver list, then trigger an event for module loading and when
> new drivers are registered they bind to unbound devices.
> 
> So, the trouble is the event through userspace because the kernel
> can't just go on to use vfio_pci until it knows userspace has failed
> to satisfy the load request.
> 
> One answer is to have userspace udev have the "hook" here and when a
> vfio flavour mod alias is requested on a PCI device it swaps in
> vfio_pci if it can't find an alternative.
> 
> The dream would be a system with no vfio modules loaded could do some
> 
>  echo "vfio" > /sys/bus/pci/xxx/driver_flavour
> 
> And a module would be loaded and a struct vfio_device is created for
> that device. Very easy for the user.

Maybe I did not communicate my suggestion last week very well.  My
idea is that there are no separate pci_drivers for vfio vs. non-vfio use,
but different personalities of the same driver.

So the interface would still look somewhat like your suggestion above,
although I'd prefer something like:

   echo 1 > /sys/bus/pci/xxx/use_vfio

How would the flow look like for the various cases?

 a) if a driver is bound, and it supports the enable_vfio method that
is called, and everything is controlled by the driver, which uses
symbols exported from vfio/vfio_pci to implement the functionality
 b) if a driver is bound, but does not support the enable_vfio method
it is unbound and vfio_pci is bound instead, continue at c)
 c) use the normal current vfio flow

do the reverse on a

echo 0 > /sys/bus/pci/xxx/use_vfio
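
Purely as an illustration of that flow (neither use_vfio nor an
enable_vfio driver method exists today; every name below is hypothetical):

static ssize_t use_vfio_store(struct device *dev, struct device_attribute *attr,
			      const char *buf, size_t count)
{
	struct pci_dev *pdev = to_pci_dev(dev);
	struct pci_driver *drv = pdev->driver;
	bool enable;
	int ret;

	ret = kstrtobool(buf, &enable);
	if (ret)
		return ret;

	if (enable) {
		if (drv && drv->enable_vfio) {		/* case a) */
			ret = drv->enable_vfio(pdev);
		} else {				/* case b), then c) */
			if (drv)
				device_release_driver(dev);
			ret = vfio_pci_bind(pdev);	/* hypothetical helper */
		}
	} else {
		ret = vfio_pci_unbind(pdev);		/* the reverse path */
	}

	return ret ? ret : count;
}
static DEVICE_ATTR_WO(use_vfio);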


Re: [PATCH 8/9] vfio/pci: export nvlink2 support into vendor vfio_pci drivers

2021-03-21 Thread Jason Gunthorpe
On Fri, Mar 19, 2021 at 10:40:28PM -0600, Alex Williamson wrote:

> > Well, today we don't, but Max here adds id_table's to the special
> > devices and a MODULE_DEVICE_TABLE would come too if we do the flavours
> > thing below.
> 
> I think the id_tables are the wrong approach for IGD and NVLink
> variants.

I really disagree with this. Checking for some random bits in firmware
and assuming that every device made forever into the future works with
this check is not a good way to do compatibility. Christoph made the
same point.

We have good processes to maintain id tables, I don't see this as a
problem.

> > As-is driver_override seems dangerous as overriding the matching table
> > could surely allow root userspace to crash the machine. In situations
> > with trusted boot/signed modules this shouldn't be.
> 
> When we're dealing with meta-drivers that can bind to anything, we
> shouldn't rely on the match, but should instead verify the driver is
> appropriate in the probe callback.  Even without driver_override,
> there's the new_id mechanism.  Either method allows the root user to
> break driver binding.  Greg has previously stated something to the
> effect that users get to keep all the pieces when they break something
> by manipulating driver binding.

Yes, but that is a view where root is allowed to break the kernel; we
now have this optional other world where that is not allowed and root
access to lots of dangerous things is now disabled.

new_id and driver_override should probably be in that disable list
too..

> > While that might not seem too bad with these simple drivers, at least
> > the mlx5 migration driver will have a large dependency tree and pull
> > in lots of other modules. Even Max's sample from v1 pulls in mlx5_core.ko
> > and a bunch of other stuff in its orbit.
> 
> Luckily the mlx5 driver doesn't need to be covered by compatibility
> support, so we don't need to set a softdep for it and the module could
> be named such that a wildcard driver_override of vfio_pci* shouldn't
> logically include that driver.  Users can manually create their own
> modprobe.d softdep entry if they'd like to include it.  Otherwise
> userspace would need to know to bind to it specifically.

But now you are giving up on the whole point, which was to
automatically load the correct specific module without special admin
involvement!

> > This is why I want to try for fine grained autoloading first. It
> > really is the elegant solution if we can work it out.
> 
> I just don't see how we create a manageable change to userspace.

I'm not sure I understand. Even if we add a new sysfs to set some
flavour then that is a pretty trivial change for userspace to move
from driver_override?

> > I don't think we should over-focus on these two firmware triggered
> > examples. I looked at the Intel GPU driver and it already only reads
> > the firmware thing for certain PCI IDs, we can absolutely generate a
> > narrow match table for it. Same is true for the NVIDIA GPU.
> 
> I'm not sure we can make this assertion, both only care about the type
> of device and existence of associated firmware tables.  

Well, I read through the Intel GPU driver and this is how I felt it
works. It doesn't even check the firmware bit unless certain PCI IDs
are matched first.

For NVIDIA GPUs, Max checked internally and we saw it looks very much
like how the Intel GPU works. Only some PCI IDs trigger checking on the
feature the firmware thing is linked to.

My point is: the actual *drivers* consuming these firmware features do
*not* blindly match every PCI device and check for the firmware
bit. They all have narrow matches and further only try to use the
firmware thing for some subset of PCI IDs that the entire driver
supports.

Given that the actual drivers work this way there is no technical
reason vfio-pci can't do this as well.

We don't have to change them of course, they can stay as is if people
feel really strongly.

> > Even so, I'm not *so* worried about "over matching" - if IGD or the
> > nvidia stuff load on a wide set of devices then they can just not
> > enable their extended stuff. It wastes some kernel memory, but it is
> > OK.
> 
> I'd rather they bind to the base vfio-pci driver if their extended
> features are not available.

Sure it would be nice, but functionally it is no different.

> > And if some driver *really* gets stuck here the true answer is to
> > improve the driver core match capability.
> > 
> > > devices in the deny-list and non-endpoint devices.  Many drivers
> > > clearly place implicit trust in their id_table, others don't.  In the
> > > case of meta drivers, I think it's fair to make use of the latter
> > > approach.  
> > 
> > Well, AFAIK, the driver core doesn't have a 'try probe, if it fails
> > then try another driver' approach. One device, one driver. Am I
> > missing something?
> 
> If the driver probe callback fails, really_probe() returns 0 with the
> comment:
> 
> /*
>  * Ignore errors 

Re: [PATCH 8/9] vfio/pci: export nvlink2 support into vendor vfio_pci drivers

2021-03-19 Thread Alex Williamson
On Fri, 19 Mar 2021 19:59:43 -0300
Jason Gunthorpe  wrote:

> On Fri, Mar 19, 2021 at 03:08:09PM -0600, Alex Williamson wrote:
> > On Fri, 19 Mar 2021 17:07:49 -0300
> > Jason Gunthorpe  wrote:
> >   
> > > On Fri, Mar 19, 2021 at 11:36:42AM -0600, Alex Williamson wrote:  
> > > > On Fri, 19 Mar 2021 17:34:49 +0100
> > > > Christoph Hellwig  wrote:
> > > > 
> > > > > On Fri, Mar 19, 2021 at 01:28:48PM -0300, Jason Gunthorpe wrote:
> > > > > > The wrinkle I don't yet have an easy answer to is how to load 
> > > > > > vfio_pci
> > > > > > as a universal "default" within the driver core lazy bind scheme and
> > > > > > still have working module autoloading... I'm hoping to get some
> > > > > > research into this..  
> > > > 
> > > > What about using MODULE_SOFTDEP("pre: ...") in the vfio-pci base
> > > > driver, which would load all the known variants in order to influence
> > > > the match, and therefore probe ordering?
> > > 
> > > The way the driver core works is to first match against the already
> > > loaded driver list, then trigger an event for module loading and when
> > > new drivers are registered they bind to unbound devices.  
> > 
> > The former is based on id_tables, the latter on MODULE_DEVICE_TABLE, we
> > don't have either of those.  
> 
> Well, today we don't, but Max here adds id_table's to the special
> devices and a MODULE_DEVICE_TABLE would come too if we do the flavours
> thing below.

I think the id_tables are the wrong approach for IGD and NVLink
variants.
 
> My starting thinking is that everything should have these tables and
> they should work properly..

id_tables require ongoing maintenance whereas the existing variants
require only vendor + device class and some platform feature, like a
firmware or fdt table.  They're meant to only add extra regions to
vfio-pci base support, not extensively modify the device interface.
 
> > As noted to Christoph, the cases where we want a vfio driver to
> > bind to anything automatically is the exception.  
> 
> I agree vfio should not automatically claim devices, but once vfio is
> told to claim a device everything from there after should be
> automatic.
> 
> > > One answer is to have userspace udev have the "hook" here and when a
> > > vfio flavour mod alias is requested on a PCI device it swaps in
> > > vfio_pci if it can't find an alternative.
> > > 
> > > The dream would be a system with no vfio modules loaded could do some
> > > 
> > >  echo "vfio" > /sys/bus/pci/xxx/driver_flavour
> > > 
> > > And a module would be loaded and a struct vfio_device is created for
> > > that device. Very easy for the user.  
> > 
> > This is like switching a device to a parallel universe where we do
> > want vfio drivers to bind automatically to devices.  
> 
> Yes.
> 
> If we do this I'd probably suggest that driver_override be bumped down
> to some user compat and 'vfio > driver_override' would just set the
> flavour.
> 
> As-is driver_override seems dangerous as overriding the matching table
> could surely allow root userspace to crash the machine. In situations
> with trusted boot/signed modules this shouldn't be.

When we're dealing with meta-drivers that can bind to anything, we
shouldn't rely on the match, but should instead verify the driver is
appropriate in the probe callback.  Even without driver_override,
there's the new_id mechanism.  Either method allows the root user to
break driver binding.  Greg has previously stated something to the
effect that users get to keep all the pieces when they break something
by manipulating driver binding.

> > > > If we coupled that with wildcard support in driver_override, ex.
> > > > "vfio_pci*", and used consistent module naming, I think we'd only need
> > > > to teach userspace about this wildcard and binding to a specific module
> > > > would come for free.
> > > 
> > > What would the wildcard do?  
> > 
> > It allows a driver_override to match more than one driver, not too
> > dissimilar to your driver_flavor above.  In this case it would match
> > all driver names starting with "vfio_pci".  For example if we had:
> > 
> > softdep vfio-pci pre: vfio-pci-foo vfio-pci-bar
> >
> > Then we'd pre-seed the condition that drivers foo and bar precede the
> > base vfio-pci driver, each will match the device to the driver and have
> > an opportunity in their probe function to either claim or skip the
> > device.  Userspace could also set an exact driver_override, for
> > example if they want to force using the base vfio-pci driver or go
> > directly to a specific variant.  
> 
> Okay, I see. The problem is that this makes 'vfio-pci' monolithic, in
> normal situations it will load *everything*.
> 
> While that might not seem too bad with these simple drivers, at least
> the mlx5 migration driver will have a large dependency tree and pull
> in lots of other modules. Even Max's sample from v1 pulls in mlx5_core.ko
> and a bunch of other stuff in its orbit.

Luckily the mlx5 driver doesn't need to 

Re: [PATCH 8/9] vfio/pci: export nvlink2 support into vendor vfio_pci drivers

2021-03-19 Thread Jason Gunthorpe
On Fri, Mar 19, 2021 at 03:08:09PM -0600, Alex Williamson wrote:
> On Fri, 19 Mar 2021 17:07:49 -0300
> Jason Gunthorpe  wrote:
> 
> > On Fri, Mar 19, 2021 at 11:36:42AM -0600, Alex Williamson wrote:
> > > On Fri, 19 Mar 2021 17:34:49 +0100
> > > Christoph Hellwig  wrote:
> > >   
> > > > On Fri, Mar 19, 2021 at 01:28:48PM -0300, Jason Gunthorpe wrote:  
> > > > > The wrinkle I don't yet have an easy answer to is how to load vfio_pci
> > > > > as a universal "default" within the driver core lazy bind scheme and
> > > > > still have working module autoloading... I'm hoping to get some
> > > > > research into this..
> > > 
> > > What about using MODULE_SOFTDEP("pre: ...") in the vfio-pci base
> > > driver, which would load all the known variants in order to influence
> > > the match, and therefore probe ordering?  
> > 
> > The way the driver core works is to first match against the already
> > loaded driver list, then trigger an event for module loading and when
> > new drivers are registered they bind to unbound devices.
> 
> The former is based on id_tables, the latter on MODULE_DEVICE_TABLE, we
> don't have either of those.

Well, today we don't, but Max here adds id_table's to the special
devices and a MODULE_DEVICE_TABLE would come too if we do the flavours
thing below.

My starting thinking is that everything should have these tables and
they should work properly..

> As noted to Christoph, the cases where we want a vfio driver to
> bind to anything automatically is the exception.

I agree vfio should not automatically claim devices, but once vfio is
told to claim a device everything from there after should be
automatic.

> > One answer is to have userspace udev have the "hook" here and when a
> > vfio flavour mod alias is requested on a PCI device it swaps in
> > vfio_pci if it can't find an alternative.
> > 
> > The dream would be a system with no vfio modules loaded could do some
> > 
> >  echo "vfio" > /sys/bus/pci/xxx/driver_flavour
> > 
> > And a module would be loaded and a struct vfio_device is created for
> > that device. Very easy for the user.
> 
> This is like switching a device to a parallel universe where we do
> want vfio drivers to bind automatically to devices.

Yes.

If we do this I'd probably suggest that driver_override be bumped down
to some user compat and 'vfio > driver_override' would just set the
flavour.

As-is driver_override seems dangerous as overriding the matching table
could surely allow root userspace to crash the machine. In situations
with trusted boot/signed modules this shouldn't be.

> > > If we coupled that with wildcard support in driver_override, ex.
> > > "vfio_pci*", and used consistent module naming, I think we'd only need
> > > to teach userspace about this wildcard and binding to a specific module
> > > would come for free.  
> > 
> > What would the wildcard do?
> 
> It allows a driver_override to match more than one driver, not too
> dissimilar to your driver_flavor above.  In this case it would match
> all driver names starting with "vfio_pci".  For example if we had:
> 
> softdep vfio-pci pre: vfio-pci-foo vfio-pci-bar
>
> Then we'd pre-seed the condition that drivers foo and bar precede the
> base vfio-pci driver, each will match the device to the driver and have
> an opportunity in their probe function to either claim or skip the
> device.  Userspace could also set an exact driver_override, for
> example if they want to force using the base vfio-pci driver or go
> directly to a specific variant.

Okay, I see. The problem is that this makes 'vfio-pci' monolithic, in
normal situations it will load *everything*.

While that might not seem too bad with these simple drivers, at least
the mlx5 migration driver will have a large dependency tree and pull
in lots of other modules. Even Max's sample from v1 pulls in mlx5_core.ko
and a bunch of other stuff in its orbit.

This is why I want to try for fine grained autoloading first. It
really is the elegant solution if we can work it out.

> > Open coding a match table in probe() and returning failure feels hacky
> > to me.
> 
> How's it any different than Max's get_foo_vfio_pci_driver() that calls
> pci_match_id() with an internal match table?  

Well, I think that is hacky too - but it is hacky only to service
userspace compatibility, so let's put that aside.

> It seems a better fit for the existing use cases, for example the
> IGD variant can use a single line table to exclude all except Intel
> VGA class devices in its probe callback, then test availability of
> the extra regions we'd expose, otherwise return -ENODEV.

I don't think we should over-focus on these two firmware triggered
examples. I looked at the Intel GPU driver and it already only reads
the firmware thing for certain PCI IDs, we can absolutely generate a
narrow match table for it. Same is true for the NVIDIA GPU.

The fact this is hard or whatever is beside the point - future drivers
in this scheme should have exact match tables. 

The 

Re: [PATCH 8/9] vfio/pci: export nvlink2 support into vendor vfio_pci drivers

2021-03-19 Thread Alex Williamson
On Fri, 19 Mar 2021 17:07:49 -0300
Jason Gunthorpe  wrote:

> On Fri, Mar 19, 2021 at 11:36:42AM -0600, Alex Williamson wrote:
> > On Fri, 19 Mar 2021 17:34:49 +0100
> > Christoph Hellwig  wrote:
> >   
> > > On Fri, Mar 19, 2021 at 01:28:48PM -0300, Jason Gunthorpe wrote:  
> > > > The wrinkle I don't yet have an easy answer to is how to load vfio_pci
> > > > as a universal "default" within the driver core lazy bind scheme and
> > > > still have working module autoloading... I'm hoping to get some
> > > > research into this..
> > 
> > What about using MODULE_SOFTDEP("pre: ...") in the vfio-pci base
> > driver, which would load all the known variants in order to influence
> > the match, and therefore probe ordering?  
> 
> The way the driver core works is to first match against the already
> loaded driver list, then trigger an event for module loading and when
> new drivers are registered they bind to unbound devices.

The former is based on id_tables, the latter on MODULE_DEVICE_TABLE, we
don't have either of those.  As noted to Christoph, the cases where we
want a vfio driver to bind to anything automatically is the exception.
 
> So, the trouble is the event through userspace because the kernel
> can't just go on to use vfio_pci until it knows userspace has failed
> to satisfy the load request.

Given that we don't use MODULE_DEVICE_TABLE, vfio-pci doesn't autoload.
AFAIK, all tools like libvirt and driverctl that typically bind devices
to vfio-pci will manually load vfio-pci.  I think we can take advantage
of that.

> One answer is to have userspace udev have the "hook" here and when a
> vfio flavour mod alias is requested on a PCI device it swaps in
> vfio_pci if it can't find an alternative.
> 
> The dream would be a system with no vfio modules loaded could do some
> 
>  echo "vfio" > /sys/bus/pci/xxx/driver_flavour
> 
> And a module would be loaded and a struct vfio_device is created for
> that device. Very easy for the user.

This is like switching a device to a parallel universe where we do
want vfio drivers to bind automatically to devices.
 
> > If we coupled that with wildcard support in driver_override, ex.
> > "vfio_pci*", and used consistent module naming, I think we'd only need
> > to teach userspace about this wildcard and binding to a specific module
> > would come for free.  
> 
> What would the wildcard do?

It allows a driver_override to match more than one driver, not too
dissimilar to your driver_flavor above.  In this case it would match
all driver names starting with "vfio_pci".  For example if we had:

softdep vfio-pci pre: vfio-pci-foo vfio-pci-bar

Then we'd pre-seed the condition that drivers foo and bar precede the
base vfio-pci driver, each will match the device to the driver and have
an opportunity in their probe function to either claim or skip the
device.  Userspace could also set an exact driver_override, for
example if they want to force using the base vfio-pci driver or go
directly to a specific variant.
 
> > This assumes we drop the per-variant id_table and use the probe
> > function to skip devices without the necessary requirements, either
> > wrong device or missing the tables we expect to expose.  
> 
> Without a module table how do we know which driver is which? 
> 
> Open coding a match table in probe() and returning failure feels hacky
> to me.

How's it any different than Max's get_foo_vfio_pci_driver() that calls
pci_match_id() with an internal match table?  It seems a better fit for
the existing use cases, for example the IGD variant can use a single
line table to exclude all except Intel VGA class devices in its probe
callback, then test availability of the extra regions we'd expose,
otherwise return -ENODEV.  The NVLink variant can use pci_match_id() in
the probe callback to filter out anything other than NVIDIA VGA or 3D
accelerator class devices, then check for associated FDT table, or
return -ENODEV.  We already use the vfio_pci probe function to exclude
devices in the deny-list and non-endpoint devices.  Many drivers
clearly place implicit trust in their id_table, others don't.  In the
case of meta drivers, I think it's fair to make use of the latter
approach.
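
A minimal sketch of that probe-side filtering for the IGD case (helper
names are assumed, not existing API):

static const struct pci_device_id vfio_pci_igd_filter[] = {
	{ .vendor = PCI_VENDOR_ID_INTEL, .device = PCI_ANY_ID,
	  .subvendor = PCI_ANY_ID, .subdevice = PCI_ANY_ID,
	  .class = PCI_CLASS_DISPLAY_VGA << 8, .class_mask = 0xffff00 },
	{ }
};

static int vfio_pci_igd_probe(struct pci_dev *pdev,
			      const struct pci_device_id *id)
{
	/* wrong vendor/class: bow out so another driver can claim it */
	if (!pci_match_id(vfio_pci_igd_filter, pdev))
		return -ENODEV;

	/* no OpRegion/firmware tables: nothing extra to expose, fall back */
	if (!vfio_pci_igd_has_opregion(pdev))	/* hypothetical helper */
		return -ENODEV;

	return vfio_pci_igd_register(pdev);	/* hypothetical: base vfio-pci + IGD regions */
}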

> > > Should we even load it by default?  One answer would be that the sysfs
> > > file to switch to vfio mode goes into the core PCI layer, and that core
> > > PCI code would contain a hack^H^H^H^Hhook to first load and bind vfio_pci
> > > for that device.  
> > 
> > Generally we don't want to be the default driver for anything (I think
> > mdev devices are the exception).  Assignment to userspace or VM is a
> > niche use case.  Thanks,  
> 
> By "default" I mean if the user says device A is in "vfio" mode then
> the kernel should
>  - Search for a specific driver for this device and autoload it
>  - If no specific driver is found then attach a default "universal"
>driver for it. vfio_pci is a universal driver.
> 
> vfio_platform is also a "universal" driver when in ACPI mode, in some

Re: [PATCH 8/9] vfio/pci: export nvlink2 support into vendor vfio_pci drivers

2021-03-19 Thread Jason Gunthorpe
On Fri, Mar 19, 2021 at 11:36:42AM -0600, Alex Williamson wrote:
> On Fri, 19 Mar 2021 17:34:49 +0100
> Christoph Hellwig  wrote:
> 
> > On Fri, Mar 19, 2021 at 01:28:48PM -0300, Jason Gunthorpe wrote:
> > > The wrinkle I don't yet have an easy answer to is how to load vfio_pci
> > > as a universal "default" within the driver core lazy bind scheme and
> > > still have working module autoloading... I'm hoping to get some
> > > research into this..  
> 
> What about using MODULE_SOFTDEP("pre: ...") in the vfio-pci base
> driver, which would load all the known variants in order to influence
> the match, and therefore probe ordering?

The way the driver core works is to first match against the already
loaded driver list, then trigger an event for module loading and when
new drivers are registered they bind to unbound devices.

So, the trouble is the event through userspace because the kernel
can't just go on to use vfio_pci until it knows userspace has failed
to satisfy the load request.

One answer is to have userspace udev have the "hook" here and when a
vfio flavour mod alias is requested on a PCI device it swaps in
vfio_pci if it can't find an alternative.

The dream would be a system with no vfio modules loaded could do some

 echo "vfio" > /sys/bus/pci/xxx/driver_flavour

And a module would be loaded and a struct vfio_device is created for
that device. Very easy for the user.

> If we coupled that with wildcard support in driver_override, ex.
> "vfio_pci*", and used consistent module naming, I think we'd only need
> to teach userspace about this wildcard and binding to a specific module
> would come for free.

What would the wildcard do?

> This assumes we drop the per-variant id_table and use the probe
> function to skip devices without the necessary requirements, either
> wrong device or missing the tables we expect to expose.

Without a module table how do we know which driver is which? 

Open coding a match table in probe() and returning failure feels hacky
to me.

> > Should we even load it by default?  One answer would be that the sysfs
> > file to switch to vfio mode goes into the core PCI layer, and that core
> > PCI code would contain a hack^H^H^H^Hhook to first load and bind vfio_pci
> > for that device.
> 
> Generally we don't want to be the default driver for anything (I think
> mdev devices are the exception).  Assignment to userspace or VM is a
> niche use case.  Thanks,

By "default" I mean if the user says device A is in "vfio" mode then
the kernel should
 - Search for a specific driver for this device and autoload it
 - If no specific driver is found then attach a default "universal"
   driver for it. vfio_pci is a universal driver.

vfio_platform is also a "universal" driver when in ACPI mode, in some
cases.

For the OF cases vfio_platform builds its own little subsystem complete with
autoloading:

request_module("vfio-reset:%s", vdev->compat);
vdev->of_reset = vfio_platform_lookup_reset(vdev->compat,
>reset_module);

And it is a good example of why I don't like this subsystem design:
vfio_platform doesn't do the driver loading for OF entirely right,
because vdev->compat is a single string derived from the compatible
property:

ret = device_property_read_string(dev, "compatible",
                                  &vdev->compat);
if (ret)
dev_err(dev, "Cannot retrieve compat for %s\n", vdev->name);

Unfortunately OF requires that compatible is a *list* of strings and a
correct driver is supposed to evaluate all of them. The driver core
does this all correctly, and this was lost when it was open coded
here.
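
For comparison, walking the whole list would look something like the
sketch below; the helper name is made up, and the
vfio_platform_lookup_reset() signature is assumed from the snippet above:

static int vfio_platform_lookup_all_compats(struct vfio_platform_device *vdev,
                                            struct device *dev)
{
        struct property *prop;
        const char *compat;

        of_property_for_each_string(dev->of_node, "compatible", prop, compat) {
                request_module("vfio-reset:%s", compat);
                vdev->of_reset = vfio_platform_lookup_reset(compat,
                                                            &vdev->reset_module);
                if (vdev->of_reset)
                        return 0;       /* first (most specific) entry wins */
        }
        return -ENOENT;
}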

We should NOT be avoiding the standard infrastructure for matching
drivers to devices by re-implementing it poorly.

Jason


Re: [PATCH 8/9] vfio/pci: export nvlink2 support into vendor vfio_pci drivers

2021-03-19 Thread Alex Williamson
On Fri, 19 Mar 2021 17:34:49 +0100
Christoph Hellwig  wrote:

> On Fri, Mar 19, 2021 at 01:28:48PM -0300, Jason Gunthorpe wrote:
> > The wrinkle I don't yet have an easy answer to is how to load vfio_pci
> > as a universal "default" within the driver core lazy bind scheme and
> > still have working module autoloading... I'm hoping to get some
> > research into this..  

What about using MODULE_SOFTDEP("pre: ...") in the vfio-pci base
driver, which would load all the known variants in order to influence
the match, and therefore probe ordering?

If we coupled that with wildcard support in driver_override, ex.
"vfio_pci*", and used consistent module naming, I think we'd only need
to teach userspace about this wildcard and binding to a specific module
would come for free.  This assumes we drop the per-variant id_table and
use the probe function to skip devices without the necessary
requirements, either wrong device or missing the tables we expect to
expose.
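
The soft dependency half of that would be a one-liner in the base driver,
e.g. (the module names here are placeholders, not from any posted patch):

/* in vfio_pci.c; placeholder module names */
MODULE_SOFTDEP("pre: nvlink2gpu-vfio-pci npu2-vfio-pci igd-vfio-pci");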

> Should we even load it by default?  One answer would be that the sysfs
> file to switch to vfio mode goes into the core PCI layer, and that core
> PCI code would contain a hack^H^H^H^Hhook to first load and bind vfio_pci
> for that device.

Generally we don't want to be the default driver for anything (I think
mdev devices are the exception).  Assignment to userspace or VM is a
niche use case.  Thanks,

Alex



Re: [PATCH 8/9] vfio/pci: export nvlink2 support into vendor vfio_pci drivers

2021-03-19 Thread Christoph Hellwig
On Fri, Mar 19, 2021 at 01:28:48PM -0300, Jason Gunthorpe wrote:
> The wrinkle I don't yet have an easy answer to is how to load vfio_pci
> as a universal "default" within the driver core lazy bind scheme and
> still have working module autoloading... I'm hoping to get some
> research into this..

Should we even load it by default?  One answer would be that the sysfs
file to switch to vfio mode goes into the core PCI layer, and that core
PCI code would contain a hack^H^H^H^Hhook to first load and bind vfio_pci
for that device.


Re: [PATCH 8/9] vfio/pci: export nvlink2 support into vendor vfio_pci drivers

2021-03-19 Thread Jason Gunthorpe
On Fri, Mar 19, 2021 at 05:20:33PM +0100, Christoph Hellwig wrote:
> On Fri, Mar 19, 2021 at 01:17:22PM -0300, Jason Gunthorpe wrote:
> > I think we talked about this.. We still need a better way to control
> > binding of VFIO modules - now that we have device-specific modules we
> > must have these match tables to control what devices they connect
> > to.
> > 
> > Previously things used the binding of vfio_pci as the "switch" and
> > hardcoded all the matches inside it.
> > 
> > I'm still keen to try the "driver flavour" idea I outlined earlier,
> > but it is hard to say what will resonate with Greg.
> 
> IMHO the only model that really works and makes sense is to turn the
> whole model around and make vfio a library called by the actual driver
> for the device.  That is, any device that needs device specific
> functionality simply needs a proper in-kernel driver, which then can be
> switched to a vfio mode where all the normal subsystems are unbound
> from the device, and VFIO functionality is bound to it, all while _the_
> driver that controls the PCI ID is still in charge of it.

Yes, this is what I want to strive for with Greg.

It would also resolve a lot of the uncomfortable code I see in VFIO
using the driver core. For instance, when a device is moved to 'vfio
mode' it can go through and *lock* the entire group of devices to
'vfio mode' or completely fail.

This would replace all the protective code that is all about ensuring
the admin doesn't improperly mix & match in-kernel and vfio drivers
within a security domain.

The wrinkle I don't yet have an easy answer to is how to load vfio_pci
as a universal "default" within the driver core lazy bind scheme and
still have working module autoloading... I'm hoping to get some
research into this..

Jason


Re: [PATCH 8/9] vfio/pci: export nvlink2 support into vendor vfio_pci drivers

2021-03-19 Thread Christoph Hellwig
On Fri, Mar 19, 2021 at 01:17:22PM -0300, Jason Gunthorpe wrote:
> I think we talked about this.. We still need a better way to control
> binding of VFIO modules - now that we have device-specific modules we
> must have these match tables to control what devices they connect
> to.
> 
> Previously things used the binding of vfio_pci as the "switch" and
> hardcoded all the matches inside it.
> 
> I'm still keen to try the "driver flavour" idea I outlined earlier,
> but it is hard to say what will resonate with Greg.

IMHO the only model that really works and makes sense is to turn the
whole model around and make vfio a library called by the actual driver
for the device.  That is, any device that needs device specific
functionality simply needs a proper in-kernel driver, which then can be
switched to a vfio mode where all the normal subsystems are unbound
from the device, and VFIO functionality is bound to it, all while _the_
driver that controls the PCI ID is still in charge of it.

vfio_pci remains a separate driver not binding to any ID by default
and not having any device specific functionality.


Re: [PATCH 8/9] vfio/pci: export nvlink2 support into vendor vfio_pci drivers

2021-03-19 Thread Jason Gunthorpe
On Fri, Mar 19, 2021 at 09:23:41AM -0600, Alex Williamson wrote:
> On Wed, 10 Mar 2021 14:57:57 +0200
> Max Gurtovoy  wrote:
> > On 3/10/2021 8:39 AM, Alexey Kardashevskiy wrote:
> > > On 09/03/2021 19:33, Max Gurtovoy wrote:  
> > >> +static const struct pci_device_id nvlink2gpu_vfio_pci_table[] = {
> > >> +    { PCI_VDEVICE(NVIDIA, 0x1DB1) }, /* GV100GL-A NVIDIA Tesla V100-SXM2-16GB */
> > >> +    { PCI_VDEVICE(NVIDIA, 0x1DB5) }, /* GV100GL-A NVIDIA Tesla V100-SXM2-32GB */
> > >> +    { PCI_VDEVICE(NVIDIA, 0x1DB8) }, /* GV100GL-A NVIDIA Tesla V100-SXM3-32GB */
> > >> +    { PCI_VDEVICE(NVIDIA, 0x1DF5) }, /* GV100GL-B NVIDIA Tesla V100-SXM2-16GB */
> > >
> > >
> > > Where is this list from?
> > >
> > > Also, how is this supposed to work at the boot time? Will the kernel 
> > > try binding let's say this one and nouveau? Which one is going to win?  
> > 
> > At boot time nouveau driver will win since the vfio drivers don't 
> > declare MODULE_DEVICE_TABLE
> 
> This still seems troublesome, AIUI the MODULE_DEVICE_TABLE is
> responsible for creating aliases so that kmod can figure out which
> modules to load, but what happens if all these vfio-pci modules are
> built into the kernel or the modules are already loaded?

I think we talked about this.. We still need a better way to control
binding of VFIO modules - now that we have device-specific modules we
must have these match tables to control what devices they connect
to.

Previously things used the binding of vfio_pci as the "switch" and
hardcoded all the matches inside it.

I'm still keen to try the "driver flavour" idea I outlined earlier,
but it is hard to say what will resonate with Greg.

> In the former case, I think it boils down to link order while the
> latter is generally considered even less deterministic since it depends
> on module load order.  So if one of these vfio modules should get
> loaded before the native driver, I think devices could bind here first.

At this point the answer is "don't link these statically"; we could
have a kconfig to prevent it.

> Are there tricks/extensions we could use in driver overrides, for
> example maybe a compatibility alias such that one of these vfio-pci
> variants could match "vfio-pci"?

driver override is not really useful as soon as you have a match table
as its operation is to defeat the match table entirely. :(

Again, this is still more of an outline of how things will look, as we
must get through this before we can attempt to do something in the
driver core with Greg.

We could revise this series to not register drivers at all and keep
the uAPI view exactly as is today. This would allow enough code to
show Greg how some driver flavour thing would work.

If something can't be done in the driver core, I'd propose to keep the
same basic outline Max has here, but make registering the "compat"
dynamic - it is basically a sub-driver design at that point and we
give up achieving module autoloading.

Jason


Re: [PATCH 8/9] vfio/pci: export nvlink2 support into vendor vfio_pci drivers

2021-03-19 Thread Alex Williamson
On Wed, 10 Mar 2021 14:57:57 +0200
Max Gurtovoy  wrote:
> On 3/10/2021 8:39 AM, Alexey Kardashevskiy wrote:
> > On 09/03/2021 19:33, Max Gurtovoy wrote:  
> >> +static const struct pci_device_id nvlink2gpu_vfio_pci_table[] = {
> >> +    { PCI_VDEVICE(NVIDIA, 0x1DB1) }, /* GV100GL-A NVIDIA Tesla V100-SXM2-16GB */
> >> +    { PCI_VDEVICE(NVIDIA, 0x1DB5) }, /* GV100GL-A NVIDIA Tesla V100-SXM2-32GB */
> >> +    { PCI_VDEVICE(NVIDIA, 0x1DB8) }, /* GV100GL-A NVIDIA Tesla V100-SXM3-32GB */
> >> +    { PCI_VDEVICE(NVIDIA, 0x1DF5) }, /* GV100GL-B NVIDIA Tesla V100-SXM2-16GB */
> >
> >
> > Where is this list from?
> >
> > Also, how is this supposed to work at the boot time? Will the kernel 
> > try binding let's say this one and nouveau? Which one is going to win?  
> 
> At boot time nouveau driver will win since the vfio drivers don't 
> declare MODULE_DEVICE_TABLE

This still seems troublesome, AIUI the MODULE_DEVICE_TABLE is
responsible for creating aliases so that kmod can figure out which
modules to load, but what happens if all these vfio-pci modules are
built into the kernel or the modules are already loaded?

In the former case, I think it boils down to link order while the
latter is generally considered even less deterministic since it depends
on module load order.  So if one of these vfio modules should get
loaded before the native driver, I think devices could bind here first.

Are there tricks/extensions we could use in driver overrides, for
example maybe a compatibility alias such that one of these vfio-pci
variants could match "vfio-pci"?  Perhaps that, along with some sort of
priority scheme to probe variants ahead of the base driver, though I'm
not sure how we'd get these variants loaded without something like
module aliases.  I know we're trying to avoid creating another level of
driver matching, but that's essentially what we have in the compat
option enabled here, and I'm not sure I see how userspace makes the
leap to understand what driver to use for a given device.  Thanks,

Alex



Re: [PATCH 8/9] vfio/pci: export nvlink2 support into vendor vfio_pci drivers

2021-03-11 Thread Jason Gunthorpe
On Thu, Mar 11, 2021 at 06:54:09PM +1100, Alexey Kardashevskiy wrote:

> Is there an idea how it is going to work? For example, the Intel IGD driver
> and vfio-pci-igd - how should the system pick one? If there is no
> MODULE_DEVICE_TABLE in vfio-pci-xxx, is the user supposed to try binding all
> vfio-pci-xxx drivers until some binds?

We must expose some MODULE_DEVICE_TABLE like thing to userspace.

Compiling everything into one driver and using if statements was only
manageable with these tiny drivers - the stuff that is coming is big
and infeasible to link directly into vfio_pci.ko.
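
"Exposing a MODULE_DEVICE_TABLE like thing" in its simplest form just
means letting each variant driver publish its IDs as modaliases, e.g. for
the nvlink2 table quoted elsewhere in the thread (sketch only, not a hunk
from the series):

static const struct pci_device_id nvlink2gpu_vfio_pci_table[] = {
        { PCI_VDEVICE(NVIDIA, 0x1DB1) }, /* Tesla V100-SXM2-16GB */
        { PCI_VDEVICE(NVIDIA, 0x1DB5) }, /* Tesla V100-SXM2-32GB */
        { PCI_VDEVICE(NVIDIA, 0x1DB8) }, /* Tesla V100-SXM3-32GB */
        { PCI_VDEVICE(NVIDIA, 0x1DF5) }, /* Tesla V100-SXM2-16GB */
        { 0, }
};
/* generates modules.alias entries so kmod can autoload the module */
MODULE_DEVICE_TABLE(pci, nvlink2gpu_vfio_pci_table);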

I'm feeling some general consensus around this approach (vs trying to
make a subdriver) so we will start looking at exactly what form that
could take soon.

The general idea would be to have a selection of extended VFIO drivers
for PCI devices that can be loaded as an alternative to vfio-pci and
they provide additional uapi and behaviors that only work on specific
hardware. nvlink is a good example because it does provide new API and
additional HW specific behavior.

We need a way for userspace to learn about the drivers automatically
and sort out how to load and bind them.

I was thinking about your earlier question about FDT - do you think we
could switch this to a platform_device and provide an of_match_table
that would select correctly? Did IBM enforce a useful compatible
string in the DT for these things?
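
For illustration, that would look roughly like the sketch below; the
compatible string is a placeholder, since whether IBM's DT provides a
usable one is exactly the open question:

static const struct of_device_id npu2_vfio_of_table[] = {
        { .compatible = "ibm,npu" },    /* placeholder string */
        { }
};
MODULE_DEVICE_TABLE(of, npu2_vfio_of_table);

static struct platform_driver npu2_vfio_platform_driver = {
        .driver = {
                .name           = "npu2-vfio",
                .of_match_table = npu2_vfio_of_table,
        },
        /* .probe would create the vfio_device for the matched node */
};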

Jason


Re: [PATCH 8/9] vfio/pci: export nvlink2 support into vendor vfio_pci drivers

2021-03-11 Thread Jason Gunthorpe
On Thu, Mar 11, 2021 at 11:44:38AM +0200, Max Gurtovoy wrote:
> 
> On 3/11/2021 9:54 AM, Alexey Kardashevskiy wrote:
> > 
> > 
> > On 11/03/2021 13:00, Jason Gunthorpe wrote:
> > > On Thu, Mar 11, 2021 at 12:42:56PM +1100, Alexey Kardashevskiy wrote:
> > > > > > btw can the id list have only vendor ids and not have device ids?
> > > > > 
> > > > > The PCI matcher is quite flexable, see the other patch from Max for
> > > > > the igd
> > > >   ah cool, do this for NVIDIA GPUs then please, I just
> > > > discovered another P9
> > > > system sold with NVIDIA T4s which is not in your list.
> > > 
> > > I think it will make things easier down the road if you maintain an
> > > exact list 
> > 
> > 
> > Then why do you not do the exact list for Intel IGD? The commit log does
> > not explain this detail.
> 
> I expect Intel team to review this series and give a more precise list.
> 
> I did the best I could in finding a proper configuration for igd.

Right. Doing this retroactively is really hard.

Jason


Re: [PATCH 8/9] vfio/pci: export nvlink2 support into vendor vfio_pci drivers

2021-03-11 Thread Max Gurtovoy



On 3/11/2021 9:54 AM, Alexey Kardashevskiy wrote:



On 11/03/2021 13:00, Jason Gunthorpe wrote:

On Thu, Mar 11, 2021 at 12:42:56PM +1100, Alexey Kardashevskiy wrote:

btw can the id list have only vendor ids and not have device ids?


The PCI matcher is quite flexible, see the other patch from Max for
the igd
  ah cool, do this for NVIDIA GPUs then please, I just discovered another P9
system sold with NVIDIA T4s which is not in your list.


I think it will make things easier down the road if you maintain an
exact list 



Then why do you not do the exact list for Intel IGD? The commit log 
does not explain this detail.


I expect Intel team to review this series and give a more precise list.

I did the best I could in finding a proper configuration for igd.






But best practice is to be as narrow as possible as I hope this will
eventually impact module autoloading and other details.


The amount of device specific knowledge is too little to tie it up to
device ids, it is a generic PCI driver with quirks. We do not have
separate drivers for the hardware which requires quirks.


It provides its own capability structure exposed to userspace, that is
absolutely not a "quirk"


And how do you hope this should impact autoloading?


I would like to autoload the most specific vfio driver for the target
hardware.



Is there an idea how it is going to work? For example, the Intel IGD 
driver and vfio-pci-igd - how should the system pick one? If there is 
no MODULE_DEVICE_TABLE in vfio-pci-xxx, is the user supposed to try 
binding all vfio-pci-xxx drivers until some binds?


For example, in my local setup I did a POC patch that converts some 
drivers to be "manual binding only drivers".


So the IGD driver will have the priority; the user will unbind the device 
from it, load igd-vfio-pci, bind the device to it, and it ends with probing.


For now we separated the driver core stuff until we all agree that this 
series is the right way to go + we also make sure it's backward compatible.






If you someday need to support new GPU HW that needs a different VFIO
driver then you are really stuck because things become indeterminate
if there are two drivers claiming the ID. We don't have the concept of
"best match", driver core works on exact match.






Re: [PATCH 8/9] vfio/pci: export nvlink2 support into vendor vfio_pci drivers

2021-03-10 Thread Alexey Kardashevskiy




On 11/03/2021 13:00, Jason Gunthorpe wrote:

On Thu, Mar 11, 2021 at 12:42:56PM +1100, Alexey Kardashevskiy wrote:

btw can the id list have only vendor ids and not have device ids?


The PCI matcher is quite flexible, see the other patch from Max for
the igd
  
ah cool, do this for NVIDIA GPUs then please, I just discovered another P9

system sold with NVIDIA T4s which is not in your list.


I think it will make things easier down the road if you maintain an
exact list 



Then why do you not do the exact list for Intel IGD? The commit log does 
not explain this detail.




But best practice is to be as narrow as possible as I hope this will
eventually impact module autoloading and other details.


The amount of device specific knowledge is too little to tie it up to device
ids, it is a generic PCI driver with quirks. We do not have separate
drivers for the hardware which requires quirks.


It provides its own capability structure exposed to userspace, that is
absolutely not a "quirk"


And how do you hope this should impact autoloading?


I would like to autoload the most specific vfio driver for the target
hardware.



Is there an idea how it is going to work? For example, the Intel IGD 
driver and vfio-pci-igd - how should the system pick one? If there is no 
MODULE_DEVICE_TABLE in vfio-pci-xxx, is the user supposed to try binding 
all vfio-pci-xxx drivers until some binds?




If you someday need to support new GPU HW that needs a different VFIO
driver then you are really stuck because things become indeterminate
if there are two drivers claiming the ID. We don't have the concept of
"best match", driver core works on exact match.




--
Alexey


Re: [PATCH 8/9] vfio/pci: export nvlink2 support into vendor vfio_pci drivers

2021-03-10 Thread Jason Gunthorpe
On Thu, Mar 11, 2021 at 12:42:56PM +1100, Alexey Kardashevskiy wrote:
> > > btw can the id list have only vendor ids and not have device ids?
> > 
> > The PCI matcher is quite flexible, see the other patch from Max for
> > the igd
>  
> ah cool, do this for NVIDIA GPUs then please, I just discovered another P9
> system sold with NVIDIA T4s which is not in your list.

I think it will make things easier down the road if you maintain an
exact list 

> > But best practice is to be as narrow as possible as I hope this will
> > eventually impact module autoloading and other details.
> 
> The amount of device specific knowledge is too little to tie it up to device
> ids, it is a generic PCI driver with quirks. We do not have separate
> drivers for the hardware which requires quirks.

It provides its own capability structure exposed to userspace, that is
absolutely not a "quirk"

> And how do you hope this should impact autoloading?

I would like to autoload the most specific vfio driver for the target
hardware.

If you someday need to support new GPU HW that needs a different VFIO
driver then you are really stuck because things become indeterminate
if there are two drivers claiming the ID. We don't have the concept of
"best match", driver core works on exact match.

Jason


Re: [PATCH 8/9] vfio/pci: export nvlink2 support into vendor vfio_pci drivers

2021-03-10 Thread Alexey Kardashevskiy




On 11/03/2021 12:34, Jason Gunthorpe wrote:

On Thu, Mar 11, 2021 at 12:20:33PM +1100, Alexey Kardashevskiy wrote:


It is supposed to match exactly the same match table as the pci_driver
above. We *don't* want different behavior from what the standard PCI
driver matcher will do.


This is not a standard PCI driver though


It is now, that is what this patch makes it into. This is why it now
has a struct pci_driver.


and the main vfio-pci won't have a
list to match ever.


?? vfio-pci uses driver_override or new_id to manage its match list



Exactly, no list to update.



IBM NPU PCI id is unlikely to change ever but NVIDIA keeps making
new devices which work in those P9 boxes, are you going to keep
adding those ids to nvlink2gpu_vfio_pci_table?


Certainly, as needed. PCI ID list updates are normal for the kernel.


btw can the id list have only vendor ids and not have device ids?


The PCI matcher is quite flexible, see the other patch from Max for
the igd



ah cool, do this for NVIDIA GPUs then please, I just discovered another 
P9 system sold with NVIDIA T4s which is not in your list.




But best practice is to be as narrow as possible as I hope this will
eventually impact module autoloading and other details.


The amount of device specific knowledge is too little to tie it up to 
device ids, it is a generic PCI driver with quirks. We do not have
separate drivers for the hardware which requires quirks.


And how do you hope this should impact autoloading?



--
Alexey


Re: [PATCH 8/9] vfio/pci: export nvlink2 support into vendor vfio_pci drivers

2021-03-10 Thread Jason Gunthorpe
On Thu, Mar 11, 2021 at 12:20:33PM +1100, Alexey Kardashevskiy wrote:

> > It is supposed to match exactly the same match table as the pci_driver
> > above. We *don't* want different behavior from what the standard PCI
> > driver matcher will do.
> 
> This is not a standard PCI driver though 

It is now, that is what this patch makes it into. This is why it now
has a struct pci_driver.

> and the main vfio-pci won't have a
> list to match ever.

?? vfio-pci uses driver_override or new_id to manage its match list

> IBM NPU PCI id is unlikely to change ever but NVIDIA keeps making
> new devices which work in those P9 boxes, are you going to keep
> adding those ids to nvlink2gpu_vfio_pci_table?

Certainly, as needed. PCI ID list updates are normal for the kernel.

> btw can the id list have only vendor ids and not have device ids?

The PCI matcher is quite flexible, see the other patch from Max for
the igd
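
(For instance, a vendor-only entry is just a wildcard on everything else:

{ PCI_VDEVICE(NVIDIA, PCI_ANY_ID) },    /* matches every NVIDIA device */
)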

But best practice is to be as narrow as possible as I hope this will
eventually impact module autoloading and other details.

Jason


Re: [PATCH 8/9] vfio/pci: export nvlink2 support into vendor vfio_pci drivers

2021-03-10 Thread Alexey Kardashevskiy




On 11/03/2021 06:40, Jason Gunthorpe wrote:

On Thu, Mar 11, 2021 at 01:24:47AM +1100, Alexey Kardashevskiy wrote:



On 11/03/2021 00:02, Jason Gunthorpe wrote:

On Wed, Mar 10, 2021 at 02:57:57PM +0200, Max Gurtovoy wrote:


+    .err_handler    = &vfio_pci_core_err_handlers,
+};
+
+#ifdef CONFIG_VFIO_PCI_DRIVER_COMPAT
+struct pci_driver *get_nvlink2gpu_vfio_pci_driver(struct pci_dev *pdev)
+{
+    if (pci_match_id(nvlink2gpu_vfio_pci_driver.id_table, pdev))
+    return &nvlink2gpu_vfio_pci_driver;



Why do we need matching PCI ids here instead of looking at the FDT which
will work better?


what is FDT? and is it better to use it instead of match_id?


This is emulating the device_driver match for the pci_driver.


No it is not, it is device tree info which lets us skip the Linux PCI
discovery part (the firmware does it anyway) but it tells nothing about
which drivers to bind.


I mean get_nvlink2gpu_vfio_pci_driver() is emulating the PCI match.

Max added a pci driver for NPU here:

+static struct pci_driver npu2_vfio_pci_driver = {
+   .name   = "npu2-vfio-pci",
+   .id_table   = npu2_vfio_pci_table,
+   .probe  = npu2_vfio_pci_probe,


new userspace should use driver_override with "npu-vfio-pci" as the
string not "vfio-pci"

The point of the get_npu2_vfio_pci_driver() is only optional
compatibility to redirect old userspace using "vfio-pci" in the
driver_override to the now split driver code so userspace doesn't see
any change in behavior.

If we don't do this then the vfio-pci driver override will disable the
npu2 special stuff, since Max took it all out of vfio-pci's
pci_driver.

It is supposed to match exactly the same match table as the pci_driver
above. We *don't* want different behavior from what the standard PCI
driver matcher will do.



This is not a standard PCI driver though and the main vfio-pci won't 
have a list to match ever. IBM NPU PCI id is unlikely to change ever but 
NVIDIA keeps making new devices which work in those P9 boxes, are you 
going to keep adding those ids to nvlink2gpu_vfio_pci_table? btw can the 
id list have only vendor ids and not have device ids?




Since we don't have any way to mix in FDT discovery to the standard
PCI driver match it will still attach the npu driver but not enable
any special support. This seems OK.




--
Alexey


Re: [PATCH 8/9] vfio/pci: export nvlink2 support into vendor vfio_pci drivers

2021-03-10 Thread Max Gurtovoy



On 3/10/2021 4:19 PM, Alexey Kardashevskiy wrote:



On 10/03/2021 23:57, Max Gurtovoy wrote:


On 3/10/2021 8:39 AM, Alexey Kardashevskiy wrote:



On 09/03/2021 19:33, Max Gurtovoy wrote:

The new drivers introduced are nvlink2gpu_vfio_pci.ko and
npu2_vfio_pci.ko.
The first will be responsible for providing special extensions for
NVIDIA GPUs with NVLINK2 support for P9 platform (and others in the
future). The last will be responsible for POWER9 NPU2 unit (NVLink2 
host

bus adapter).

Also, preserve backward compatibility for users that were binding
NVLINK2 devices to vfio_pci.ko. Hopefully this compatibility layer 
will

be dropped in the future

Signed-off-by: Max Gurtovoy 
---
  drivers/vfio/pci/Kconfig  |  28 +++-
  drivers/vfio/pci/Makefile |   7 +-
  .../pci/{vfio_pci_npu2.c => npu2_vfio_pci.c}  | 144 
-

  drivers/vfio/pci/npu2_vfio_pci.h  |  24 +++
  ...pci_nvlink2gpu.c => nvlink2gpu_vfio_pci.c} | 149 
+-

  drivers/vfio/pci/nvlink2gpu_vfio_pci.h    |  24 +++
  drivers/vfio/pci/vfio_pci.c   |  61 ++-
  drivers/vfio/pci/vfio_pci_core.c  |  18 ---
  drivers/vfio/pci/vfio_pci_core.h  |  14 --
  9 files changed, 422 insertions(+), 47 deletions(-)
  rename drivers/vfio/pci/{vfio_pci_npu2.c => npu2_vfio_pci.c} (64%)
  create mode 100644 drivers/vfio/pci/npu2_vfio_pci.h
  rename drivers/vfio/pci/{vfio_pci_nvlink2gpu.c => 
nvlink2gpu_vfio_pci.c} (67%)

  create mode 100644 drivers/vfio/pci/nvlink2gpu_vfio_pci.h

diff --git a/drivers/vfio/pci/Kconfig b/drivers/vfio/pci/Kconfig
index 829e90a2e5a3..88c89863a205 100644
--- a/drivers/vfio/pci/Kconfig
+++ b/drivers/vfio/pci/Kconfig
@@ -48,8 +48,30 @@ config VFIO_PCI_IGD
      To enable Intel IGD assignment through vfio-pci, say Y.
  -config VFIO_PCI_NVLINK2
-    def_bool y
+config VFIO_PCI_NVLINK2GPU
+    tristate "VFIO support for NVIDIA NVLINK2 GPUs"
  depends on VFIO_PCI_CORE && PPC_POWERNV
  help
-  VFIO PCI support for P9 Witherspoon machine with NVIDIA V100 
GPUs
+  VFIO PCI driver for NVIDIA NVLINK2 GPUs with specific 
extensions

+  for P9 Witherspoon machine.
+
+config VFIO_PCI_NPU2
+    tristate "VFIO support for IBM NPU host bus adapter on P9"
+    depends on VFIO_PCI_NVLINK2GPU && PPC_POWERNV
+    help
+  VFIO PCI specific extensions for IBM NVLink2 host bus 
adapter on P9

+  Witherspoon machine.
+
+config VFIO_PCI_DRIVER_COMPAT
+    bool "VFIO PCI backward compatibility for vendor specific 
extensions"

+    default y
+    depends on VFIO_PCI
+    help
+  Say Y here if you want to preserve VFIO PCI backward
+  compatibility. vfio_pci.ko will continue to automatically use
+  the NVLINK2, NPU2 and IGD VFIO drivers when it is attached to
+  a compatible device.
+
+  When N is selected the user must bind explicity to the module
+  they want to handle the device and vfio_pci.ko will have no
+  device specific special behaviors.
diff --git a/drivers/vfio/pci/Makefile b/drivers/vfio/pci/Makefile
index f539f32c9296..86fb62e271fc 100644
--- a/drivers/vfio/pci/Makefile
+++ b/drivers/vfio/pci/Makefile
@@ -2,10 +2,15 @@
    obj-$(CONFIG_VFIO_PCI_CORE) += vfio-pci-core.o
  obj-$(CONFIG_VFIO_PCI) += vfio-pci.o
+obj-$(CONFIG_VFIO_PCI_NPU2) += npu2-vfio-pci.o
+obj-$(CONFIG_VFIO_PCI_NVLINK2GPU) += nvlink2gpu-vfio-pci.o
    vfio-pci-core-y := vfio_pci_core.o vfio_pci_intrs.o 
vfio_pci_rdwr.o vfio_pci_config.o

  vfio-pci-core-$(CONFIG_VFIO_PCI_IGD) += vfio_pci_igd.o
-vfio-pci-core-$(CONFIG_VFIO_PCI_NVLINK2) += vfio_pci_nvlink2gpu.o 
vfio_pci_npu2.o

  vfio-pci-core-$(CONFIG_S390) += vfio_pci_zdev.o
    vfio-pci-y := vfio_pci.o
+
+npu2-vfio-pci-y := npu2_vfio_pci.o
+
+nvlink2gpu-vfio-pci-y := nvlink2gpu_vfio_pci.o
diff --git a/drivers/vfio/pci/vfio_pci_npu2.c 
b/drivers/vfio/pci/npu2_vfio_pci.c

similarity index 64%
rename from drivers/vfio/pci/vfio_pci_npu2.c
rename to drivers/vfio/pci/npu2_vfio_pci.c
index 717745256ab3..7071bda0f2b6 100644
--- a/drivers/vfio/pci/vfio_pci_npu2.c
+++ b/drivers/vfio/pci/npu2_vfio_pci.c
@@ -14,19 +14,28 @@
   *    Author: Alex Williamson 
   */
  +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include 
  #include 
  #include 
  #include 
  #include 
+#include 
  #include 
  #include 
  #include 
    #include "vfio_pci_core.h"
+#include "npu2_vfio_pci.h"
    #define CREATE_TRACE_POINTS
  #include "npu2_trace.h"
  +#define DRIVER_VERSION  "0.1"
+#define DRIVER_AUTHOR   "Alexey Kardashevskiy "
+#define DRIVER_DESC "NPU2 VFIO PCI - User Level meta-driver 
for POWER9 NPU NVLink2 HBA"

+
  EXPORT_TRACEPOINT_SYMBOL_GPL(vfio_pci_npu2_mmap);
    struct vfio_pci_npu2_data {
@@ -36,6 +45,10 @@ struct vfio_pci_npu2_data {
  unsigned int link_speed; /* The link speed from DT's 
ibm,nvlink-speed */

  };
  +struct npu2_vfio_pci_device {
+    struct vfio_pci_core_device    vdev;
+};
+
  static size_t vfio_pci_npu2_rw(struct vfio_pci_core_device 

Re: [PATCH 8/9] vfio/pci: export nvlink2 support into vendor vfio_pci drivers

2021-03-10 Thread Jason Gunthorpe
On Thu, Mar 11, 2021 at 01:24:47AM +1100, Alexey Kardashevskiy wrote:
> 
> 
> On 11/03/2021 00:02, Jason Gunthorpe wrote:
> > On Wed, Mar 10, 2021 at 02:57:57PM +0200, Max Gurtovoy wrote:
> > 
> > > > > +    .err_handler    = &vfio_pci_core_err_handlers,
> > > > > +};
> > > > > +
> > > > > +#ifdef CONFIG_VFIO_PCI_DRIVER_COMPAT
> > > > > +struct pci_driver *get_nvlink2gpu_vfio_pci_driver(struct pci_dev 
> > > > > *pdev)
> > > > > +{
> > > > > +    if (pci_match_id(nvlink2gpu_vfio_pci_driver.id_table, pdev))
> > > > > +    return &nvlink2gpu_vfio_pci_driver;
> > > > 
> > > > 
> > > > Why do we need matching PCI ids here instead of looking at the FDT which
> > > > will work better?
> > > 
> > > what is FDT? and is it better to use it instead of match_id?
> > 
> > This is emulating the device_driver match for the pci_driver.
> 
> No it is not, it is device tree info which lets us skip the Linux PCI
> discovery part (the firmware does it anyway) but it tells nothing about
> which drivers to bind.

I mean get_nvlink2gpu_vfio_pci_driver() is emulating the PCI match.

Max added a pci driver for NPU here:

+static struct pci_driver npu2_vfio_pci_driver = {
+   .name   = "npu2-vfio-pci",
+   .id_table   = npu2_vfio_pci_table,
+   .probe  = npu2_vfio_pci_probe,


new userspace should use driver_override with "npu-vfio-pci" as the
string not "vfio-pci"

The point of the get_npu2_vfio_pci_driver() is only optional
compatibility to redirect old userspace using "vfio-pci" in the
driver_override to the now split driver code so userspace doesn't see
any change in behavior.

If we don't do this then the vfio-pci driver override will disable the
npu2 special stuff, since Max took it all out of vfio-pci's
pci_driver.

It is supposed to match exactly the same match table as the pci_driver
above. We *don't* want different behavior from what the standard PCI
driver matcher will do.

Since we don't have any way to mix in FDT discovery to the standard
PCI driver match it will still attach the npu driver but not enable
any special support. This seems OK.
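
A sketch of how vfio_pci.ko's compat path could consume these helpers;
this is an assumption for illustration only, not the actual hunk from the
series, and the wrapper name is made up:

#ifdef CONFIG_VFIO_PCI_DRIVER_COMPAT
static struct pci_driver *vfio_pci_get_variant_driver(struct pci_dev *pdev)
{
        struct pci_driver *drv;

        drv = get_nvlink2gpu_vfio_pci_driver(pdev);
        if (!drv)
                drv = get_npu2_vfio_pci_driver(pdev);
        return drv;     /* NULL: no special behavior, stay with plain vfio-pci */
}
#endif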

Jason


Re: [PATCH 8/9] vfio/pci: export nvlink2 support into vendor vfio_pci drivers

2021-03-10 Thread Alexey Kardashevskiy




On 11/03/2021 00:02, Jason Gunthorpe wrote:

On Wed, Mar 10, 2021 at 02:57:57PM +0200, Max Gurtovoy wrote:


+    .err_handler    = &vfio_pci_core_err_handlers,
+};
+
+#ifdef CONFIG_VFIO_PCI_DRIVER_COMPAT
+struct pci_driver *get_nvlink2gpu_vfio_pci_driver(struct pci_dev *pdev)
+{
+    if (pci_match_id(nvlink2gpu_vfio_pci_driver.id_table, pdev))
+    return &nvlink2gpu_vfio_pci_driver;



Why do we need matching PCI ids here instead of looking at the FDT which
will work better?


what is FDT? and is it better to use it instead of match_id?


This is emulating the device_driver match for the pci_driver.



No it is not, it is device tree info which lets us skip the Linux PCI
discovery part (the firmware does it anyway) but it tells nothing about 
which drivers to bind.




I don't think we can combine FDT matching with pci_driver, can we?


It is a C function calling another C function, all within vfio-pci; this
is not called by the generic PCI code.




--
Alexey


Re: [PATCH 8/9] vfio/pci: export nvlink2 support into vendor vfio_pci drivers

2021-03-10 Thread Alexey Kardashevskiy




On 10/03/2021 23:57, Max Gurtovoy wrote:


On 3/10/2021 8:39 AM, Alexey Kardashevskiy wrote:



On 09/03/2021 19:33, Max Gurtovoy wrote:

The new drivers introduced are nvlink2gpu_vfio_pci.ko and
npu2_vfio_pci.ko.
The first will be responsible for providing special extensions for
NVIDIA GPUs with NVLINK2 support for P9 platform (and others in the
future). The last will be responsible for POWER9 NPU2 unit (NVLink2 host
bus adapter).

Also, preserve backward compatibility for users that were binding
NVLINK2 devices to vfio_pci.ko. Hopefully this compatibility layer will
be dropped in the future

Signed-off-by: Max Gurtovoy 
---
  drivers/vfio/pci/Kconfig  |  28 +++-
  drivers/vfio/pci/Makefile |   7 +-
  .../pci/{vfio_pci_npu2.c => npu2_vfio_pci.c}  | 144 -
  drivers/vfio/pci/npu2_vfio_pci.h  |  24 +++
  ...pci_nvlink2gpu.c => nvlink2gpu_vfio_pci.c} | 149 +-
  drivers/vfio/pci/nvlink2gpu_vfio_pci.h    |  24 +++
  drivers/vfio/pci/vfio_pci.c   |  61 ++-
  drivers/vfio/pci/vfio_pci_core.c  |  18 ---
  drivers/vfio/pci/vfio_pci_core.h  |  14 --
  9 files changed, 422 insertions(+), 47 deletions(-)
  rename drivers/vfio/pci/{vfio_pci_npu2.c => npu2_vfio_pci.c} (64%)
  create mode 100644 drivers/vfio/pci/npu2_vfio_pci.h
  rename drivers/vfio/pci/{vfio_pci_nvlink2gpu.c => 
nvlink2gpu_vfio_pci.c} (67%)

  create mode 100644 drivers/vfio/pci/nvlink2gpu_vfio_pci.h

diff --git a/drivers/vfio/pci/Kconfig b/drivers/vfio/pci/Kconfig
index 829e90a2e5a3..88c89863a205 100644
--- a/drivers/vfio/pci/Kconfig
+++ b/drivers/vfio/pci/Kconfig
@@ -48,8 +48,30 @@ config VFIO_PCI_IGD
      To enable Intel IGD assignment through vfio-pci, say Y.
  -config VFIO_PCI_NVLINK2
-    def_bool y
+config VFIO_PCI_NVLINK2GPU
+    tristate "VFIO support for NVIDIA NVLINK2 GPUs"
  depends on VFIO_PCI_CORE && PPC_POWERNV
  help
-  VFIO PCI support for P9 Witherspoon machine with NVIDIA V100 GPUs
+  VFIO PCI driver for NVIDIA NVLINK2 GPUs with specific extensions
+  for P9 Witherspoon machine.
+
+config VFIO_PCI_NPU2
+    tristate "VFIO support for IBM NPU host bus adapter on P9"
+    depends on VFIO_PCI_NVLINK2GPU && PPC_POWERNV
+    help
+  VFIO PCI specific extensions for IBM NVLink2 host bus adapter 
on P9

+  Witherspoon machine.
+
+config VFIO_PCI_DRIVER_COMPAT
+    bool "VFIO PCI backward compatibility for vendor specific 
extensions"

+    default y
+    depends on VFIO_PCI
+    help
+  Say Y here if you want to preserve VFIO PCI backward
+  compatibility. vfio_pci.ko will continue to automatically use
+  the NVLINK2, NPU2 and IGD VFIO drivers when it is attached to
+  a compatible device.
+
+  When N is selected the user must bind explicity to the module
+  they want to handle the device and vfio_pci.ko will have no
+  device specific special behaviors.
diff --git a/drivers/vfio/pci/Makefile b/drivers/vfio/pci/Makefile
index f539f32c9296..86fb62e271fc 100644
--- a/drivers/vfio/pci/Makefile
+++ b/drivers/vfio/pci/Makefile
@@ -2,10 +2,15 @@
    obj-$(CONFIG_VFIO_PCI_CORE) += vfio-pci-core.o
  obj-$(CONFIG_VFIO_PCI) += vfio-pci.o
+obj-$(CONFIG_VFIO_PCI_NPU2) += npu2-vfio-pci.o
+obj-$(CONFIG_VFIO_PCI_NVLINK2GPU) += nvlink2gpu-vfio-pci.o
    vfio-pci-core-y := vfio_pci_core.o vfio_pci_intrs.o 
vfio_pci_rdwr.o vfio_pci_config.o

  vfio-pci-core-$(CONFIG_VFIO_PCI_IGD) += vfio_pci_igd.o
-vfio-pci-core-$(CONFIG_VFIO_PCI_NVLINK2) += vfio_pci_nvlink2gpu.o 
vfio_pci_npu2.o

  vfio-pci-core-$(CONFIG_S390) += vfio_pci_zdev.o
    vfio-pci-y := vfio_pci.o
+
+npu2-vfio-pci-y := npu2_vfio_pci.o
+
+nvlink2gpu-vfio-pci-y := nvlink2gpu_vfio_pci.o
diff --git a/drivers/vfio/pci/vfio_pci_npu2.c 
b/drivers/vfio/pci/npu2_vfio_pci.c

similarity index 64%
rename from drivers/vfio/pci/vfio_pci_npu2.c
rename to drivers/vfio/pci/npu2_vfio_pci.c
index 717745256ab3..7071bda0f2b6 100644
--- a/drivers/vfio/pci/vfio_pci_npu2.c
+++ b/drivers/vfio/pci/npu2_vfio_pci.c
@@ -14,19 +14,28 @@
   *    Author: Alex Williamson 
   */
  +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include 
  #include 
  #include 
  #include 
  #include 
+#include 
  #include 
  #include 
  #include 
    #include "vfio_pci_core.h"
+#include "npu2_vfio_pci.h"
    #define CREATE_TRACE_POINTS
  #include "npu2_trace.h"
  +#define DRIVER_VERSION  "0.1"
+#define DRIVER_AUTHOR   "Alexey Kardashevskiy "
+#define DRIVER_DESC "NPU2 VFIO PCI - User Level meta-driver for 
POWER9 NPU NVLink2 HBA"

+
  EXPORT_TRACEPOINT_SYMBOL_GPL(vfio_pci_npu2_mmap);
    struct vfio_pci_npu2_data {
@@ -36,6 +45,10 @@ struct vfio_pci_npu2_data {
  unsigned int link_speed; /* The link speed from DT's 
ibm,nvlink-speed */

  };
  +struct npu2_vfio_pci_device {
+    struct vfio_pci_core_device    vdev;
+};
+
  static size_t vfio_pci_npu2_rw(struct vfio_pci_core_device *vdev,
  char __user *buf, size_t count, loff_t *ppos, 

Re: [PATCH 8/9] vfio/pci: export nvlink2 support into vendor vfio_pci drivers

2021-03-10 Thread Jason Gunthorpe
On Wed, Mar 10, 2021 at 02:57:57PM +0200, Max Gurtovoy wrote:

> > > +    .err_handler    = &vfio_pci_core_err_handlers,
> > > +};
> > > +
> > > +#ifdef CONFIG_VFIO_PCI_DRIVER_COMPAT
> > > +struct pci_driver *get_nvlink2gpu_vfio_pci_driver(struct pci_dev *pdev)
> > > +{
> > > +    if (pci_match_id(nvlink2gpu_vfio_pci_driver.id_table, pdev))
> > > +    return &nvlink2gpu_vfio_pci_driver;
> > 
> > 
> > Why do we need matching PCI ids here instead of looking at the FDT which
> > will work better?
> 
> what is FDT? and is it better to use it instead of match_id?

This is emulating the device_driver match for the pci_driver.

I don't think we can combine FDT matching with pci_driver, can we?

Jason


Re: [PATCH 8/9] vfio/pci: export nvlink2 support into vendor vfio_pci drivers

2021-03-10 Thread Max Gurtovoy



On 3/10/2021 8:39 AM, Alexey Kardashevskiy wrote:



On 09/03/2021 19:33, Max Gurtovoy wrote:

The new drivers introduced are nvlink2gpu_vfio_pci.ko and
npu2_vfio_pci.ko.
The first will be responsible for providing special extensions for
NVIDIA GPUs with NVLINK2 support for P9 platform (and others in the
future). The last will be responsible for POWER9 NPU2 unit (NVLink2 host
bus adapter).

Also, preserve backward compatibility for users that were binding
NVLINK2 devices to vfio_pci.ko. Hopefully this compatibility layer will
be dropped in the future

Signed-off-by: Max Gurtovoy 
---
  drivers/vfio/pci/Kconfig  |  28 +++-
  drivers/vfio/pci/Makefile |   7 +-
  .../pci/{vfio_pci_npu2.c => npu2_vfio_pci.c}  | 144 -
  drivers/vfio/pci/npu2_vfio_pci.h  |  24 +++
  ...pci_nvlink2gpu.c => nvlink2gpu_vfio_pci.c} | 149 +-
  drivers/vfio/pci/nvlink2gpu_vfio_pci.h    |  24 +++
  drivers/vfio/pci/vfio_pci.c   |  61 ++-
  drivers/vfio/pci/vfio_pci_core.c  |  18 ---
  drivers/vfio/pci/vfio_pci_core.h  |  14 --
  9 files changed, 422 insertions(+), 47 deletions(-)
  rename drivers/vfio/pci/{vfio_pci_npu2.c => npu2_vfio_pci.c} (64%)
  create mode 100644 drivers/vfio/pci/npu2_vfio_pci.h
  rename drivers/vfio/pci/{vfio_pci_nvlink2gpu.c => 
nvlink2gpu_vfio_pci.c} (67%)

  create mode 100644 drivers/vfio/pci/nvlink2gpu_vfio_pci.h

diff --git a/drivers/vfio/pci/Kconfig b/drivers/vfio/pci/Kconfig
index 829e90a2e5a3..88c89863a205 100644
--- a/drivers/vfio/pci/Kconfig
+++ b/drivers/vfio/pci/Kconfig
@@ -48,8 +48,30 @@ config VFIO_PCI_IGD
      To enable Intel IGD assignment through vfio-pci, say Y.
  -config VFIO_PCI_NVLINK2
-    def_bool y
+config VFIO_PCI_NVLINK2GPU
+    tristate "VFIO support for NVIDIA NVLINK2 GPUs"
  depends on VFIO_PCI_CORE && PPC_POWERNV
  help
-  VFIO PCI support for P9 Witherspoon machine with NVIDIA V100 GPUs
+  VFIO PCI driver for NVIDIA NVLINK2 GPUs with specific extensions
+  for P9 Witherspoon machine.
+
+config VFIO_PCI_NPU2
+    tristate "VFIO support for IBM NPU host bus adapter on P9"
+    depends on VFIO_PCI_NVLINK2GPU && PPC_POWERNV
+    help
+  VFIO PCI specific extensions for IBM NVLink2 host bus adapter 
on P9

+  Witherspoon machine.
+
+config VFIO_PCI_DRIVER_COMPAT
+    bool "VFIO PCI backward compatibility for vendor specific 
extensions"

+    default y
+    depends on VFIO_PCI
+    help
+  Say Y here if you want to preserve VFIO PCI backward
+  compatibility. vfio_pci.ko will continue to automatically use
+  the NVLINK2, NPU2 and IGD VFIO drivers when it is attached to
+  a compatible device.
+
+  When N is selected the user must bind explicity to the module
+  they want to handle the device and vfio_pci.ko will have no
+  device specific special behaviors.
diff --git a/drivers/vfio/pci/Makefile b/drivers/vfio/pci/Makefile
index f539f32c9296..86fb62e271fc 100644
--- a/drivers/vfio/pci/Makefile
+++ b/drivers/vfio/pci/Makefile
@@ -2,10 +2,15 @@
    obj-$(CONFIG_VFIO_PCI_CORE) += vfio-pci-core.o
  obj-$(CONFIG_VFIO_PCI) += vfio-pci.o
+obj-$(CONFIG_VFIO_PCI_NPU2) += npu2-vfio-pci.o
+obj-$(CONFIG_VFIO_PCI_NVLINK2GPU) += nvlink2gpu-vfio-pci.o
    vfio-pci-core-y := vfio_pci_core.o vfio_pci_intrs.o 
vfio_pci_rdwr.o vfio_pci_config.o

  vfio-pci-core-$(CONFIG_VFIO_PCI_IGD) += vfio_pci_igd.o
-vfio-pci-core-$(CONFIG_VFIO_PCI_NVLINK2) += vfio_pci_nvlink2gpu.o 
vfio_pci_npu2.o

  vfio-pci-core-$(CONFIG_S390) += vfio_pci_zdev.o
    vfio-pci-y := vfio_pci.o
+
+npu2-vfio-pci-y := npu2_vfio_pci.o
+
+nvlink2gpu-vfio-pci-y := nvlink2gpu_vfio_pci.o
diff --git a/drivers/vfio/pci/vfio_pci_npu2.c 
b/drivers/vfio/pci/npu2_vfio_pci.c

similarity index 64%
rename from drivers/vfio/pci/vfio_pci_npu2.c
rename to drivers/vfio/pci/npu2_vfio_pci.c
index 717745256ab3..7071bda0f2b6 100644
--- a/drivers/vfio/pci/vfio_pci_npu2.c
+++ b/drivers/vfio/pci/npu2_vfio_pci.c
@@ -14,19 +14,28 @@
   *    Author: Alex Williamson 
   */
  +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include 
  #include 
  #include 
  #include 
  #include 
+#include 
  #include 
  #include 
  #include 
    #include "vfio_pci_core.h"
+#include "npu2_vfio_pci.h"
    #define CREATE_TRACE_POINTS
  #include "npu2_trace.h"
  +#define DRIVER_VERSION  "0.1"
+#define DRIVER_AUTHOR   "Alexey Kardashevskiy "
+#define DRIVER_DESC "NPU2 VFIO PCI - User Level meta-driver for 
POWER9 NPU NVLink2 HBA"

+
  EXPORT_TRACEPOINT_SYMBOL_GPL(vfio_pci_npu2_mmap);
    struct vfio_pci_npu2_data {
@@ -36,6 +45,10 @@ struct vfio_pci_npu2_data {
  unsigned int link_speed; /* The link speed from DT's 
ibm,nvlink-speed */

  };
  +struct npu2_vfio_pci_device {
+    struct vfio_pci_core_device    vdev;
+};
+
  static size_t vfio_pci_npu2_rw(struct vfio_pci_core_device *vdev,
  char __user *buf, size_t count, loff_t *ppos, bool iswrite)
  {
@@ -120,7 +133,7 @@ static 

Re: [PATCH 8/9] vfio/pci: export nvlink2 support into vendor vfio_pci drivers

2021-03-09 Thread Alexey Kardashevskiy




On 09/03/2021 19:33, Max Gurtovoy wrote:

The new drivers introduced are nvlink2gpu_vfio_pci.ko and
npu2_vfio_pci.ko.
The first will be responsible for providing special extensions for
NVIDIA GPUs with NVLINK2 support for P9 platform (and others in the
future). The last will be responsible for POWER9 NPU2 unit (NVLink2 host
bus adapter).

Also, preserve backward compatibility for users that were binding
NVLINK2 devices to vfio_pci.ko. Hopefully this compatibility layer will
be dropped in the future

Signed-off-by: Max Gurtovoy 
---
  drivers/vfio/pci/Kconfig  |  28 +++-
  drivers/vfio/pci/Makefile |   7 +-
  .../pci/{vfio_pci_npu2.c => npu2_vfio_pci.c}  | 144 -
  drivers/vfio/pci/npu2_vfio_pci.h  |  24 +++
  ...pci_nvlink2gpu.c => nvlink2gpu_vfio_pci.c} | 149 +-
  drivers/vfio/pci/nvlink2gpu_vfio_pci.h|  24 +++
  drivers/vfio/pci/vfio_pci.c   |  61 ++-
  drivers/vfio/pci/vfio_pci_core.c  |  18 ---
  drivers/vfio/pci/vfio_pci_core.h  |  14 --
  9 files changed, 422 insertions(+), 47 deletions(-)
  rename drivers/vfio/pci/{vfio_pci_npu2.c => npu2_vfio_pci.c} (64%)
  create mode 100644 drivers/vfio/pci/npu2_vfio_pci.h
  rename drivers/vfio/pci/{vfio_pci_nvlink2gpu.c => nvlink2gpu_vfio_pci.c} (67%)
  create mode 100644 drivers/vfio/pci/nvlink2gpu_vfio_pci.h

diff --git a/drivers/vfio/pci/Kconfig b/drivers/vfio/pci/Kconfig
index 829e90a2e5a3..88c89863a205 100644
--- a/drivers/vfio/pci/Kconfig
+++ b/drivers/vfio/pci/Kconfig
@@ -48,8 +48,30 @@ config VFIO_PCI_IGD
  
  	  To enable Intel IGD assignment through vfio-pci, say Y.
  
-config VFIO_PCI_NVLINK2

-   def_bool y
+config VFIO_PCI_NVLINK2GPU
+   tristate "VFIO support for NVIDIA NVLINK2 GPUs"
depends on VFIO_PCI_CORE && PPC_POWERNV
help
- VFIO PCI support for P9 Witherspoon machine with NVIDIA V100 GPUs
+ VFIO PCI driver for NVIDIA NVLINK2 GPUs with specific extensions
+ for P9 Witherspoon machine.
+
+config VFIO_PCI_NPU2
+   tristate "VFIO support for IBM NPU host bus adapter on P9"
+   depends on VFIO_PCI_NVLINK2GPU && PPC_POWERNV
+   help
+ VFIO PCI specific extensions for IBM NVLink2 host bus adapter on P9
+ Witherspoon machine.
+
+config VFIO_PCI_DRIVER_COMPAT
+   bool "VFIO PCI backward compatibility for vendor specific extensions"
+   default y
+   depends on VFIO_PCI
+   help
+ Say Y here if you want to preserve VFIO PCI backward
+ compatibility. vfio_pci.ko will continue to automatically use
+ the NVLINK2, NPU2 and IGD VFIO drivers when it is attached to
+ a compatible device.
+
+ When N is selected the user must bind explicity to the module
+ they want to handle the device and vfio_pci.ko will have no
+ device specific special behaviors.
diff --git a/drivers/vfio/pci/Makefile b/drivers/vfio/pci/Makefile
index f539f32c9296..86fb62e271fc 100644
--- a/drivers/vfio/pci/Makefile
+++ b/drivers/vfio/pci/Makefile
@@ -2,10 +2,15 @@
  
  obj-$(CONFIG_VFIO_PCI_CORE) += vfio-pci-core.o

  obj-$(CONFIG_VFIO_PCI) += vfio-pci.o
+obj-$(CONFIG_VFIO_PCI_NPU2) += npu2-vfio-pci.o
+obj-$(CONFIG_VFIO_PCI_NVLINK2GPU) += nvlink2gpu-vfio-pci.o
  
  vfio-pci-core-y := vfio_pci_core.o vfio_pci_intrs.o vfio_pci_rdwr.o vfio_pci_config.o

  vfio-pci-core-$(CONFIG_VFIO_PCI_IGD) += vfio_pci_igd.o
-vfio-pci-core-$(CONFIG_VFIO_PCI_NVLINK2) += vfio_pci_nvlink2gpu.o 
vfio_pci_npu2.o
  vfio-pci-core-$(CONFIG_S390) += vfio_pci_zdev.o
  
  vfio-pci-y := vfio_pci.o

+
+npu2-vfio-pci-y := npu2_vfio_pci.o
+
+nvlink2gpu-vfio-pci-y := nvlink2gpu_vfio_pci.o
diff --git a/drivers/vfio/pci/vfio_pci_npu2.c b/drivers/vfio/pci/npu2_vfio_pci.c
similarity index 64%
rename from drivers/vfio/pci/vfio_pci_npu2.c
rename to drivers/vfio/pci/npu2_vfio_pci.c
index 717745256ab3..7071bda0f2b6 100644
--- a/drivers/vfio/pci/vfio_pci_npu2.c
+++ b/drivers/vfio/pci/npu2_vfio_pci.c
@@ -14,19 +14,28 @@
   *Author: Alex Williamson 
   */
  
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt

+
+#include 
  #include 
  #include 
  #include 
  #include 
+#include 
  #include 
  #include 
  #include 
  
  #include "vfio_pci_core.h"

+#include "npu2_vfio_pci.h"
  
  #define CREATE_TRACE_POINTS

  #include "npu2_trace.h"
  
+#define DRIVER_VERSION  "0.1"

+#define DRIVER_AUTHOR   "Alexey Kardashevskiy "
+#define DRIVER_DESC "NPU2 VFIO PCI - User Level meta-driver for POWER9 NPU 
NVLink2 HBA"
+
  EXPORT_TRACEPOINT_SYMBOL_GPL(vfio_pci_npu2_mmap);
  
  struct vfio_pci_npu2_data {

@@ -36,6 +45,10 @@ struct vfio_pci_npu2_data {
unsigned int link_speed; /* The link speed from DT's ibm,nvlink-speed */
  };
  
+struct npu2_vfio_pci_device {

+   struct vfio_pci_core_device vdev;
+};
+
  static size_t vfio_pci_npu2_rw(struct vfio_pci_core_device *vdev,
char __user *buf, size_t count, loff_t *ppos, bool iswrite)
  

[PATCH 8/9] vfio/pci: export nvlink2 support into vendor vfio_pci drivers

2021-03-09 Thread Max Gurtovoy
The new drivers introduced are nvlink2gpu_vfio_pci.ko and
npu2_vfio_pci.ko.
The first will be responsible for providing special extensions for
NVIDIA GPUs with NVLINK2 support for P9 platform (and others in the
future). The last will be responsible for POWER9 NPU2 unit (NVLink2 host
bus adapter).

Also, preserve backward compatibility for users that were binding
NVLINK2 devices to vfio_pci.ko. Hopefully this compatibility layer will
be dropped in the future

Signed-off-by: Max Gurtovoy 
---
 drivers/vfio/pci/Kconfig  |  28 +++-
 drivers/vfio/pci/Makefile |   7 +-
 .../pci/{vfio_pci_npu2.c => npu2_vfio_pci.c}  | 144 -
 drivers/vfio/pci/npu2_vfio_pci.h  |  24 +++
 ...pci_nvlink2gpu.c => nvlink2gpu_vfio_pci.c} | 149 +-
 drivers/vfio/pci/nvlink2gpu_vfio_pci.h|  24 +++
 drivers/vfio/pci/vfio_pci.c   |  61 ++-
 drivers/vfio/pci/vfio_pci_core.c  |  18 ---
 drivers/vfio/pci/vfio_pci_core.h  |  14 --
 9 files changed, 422 insertions(+), 47 deletions(-)
 rename drivers/vfio/pci/{vfio_pci_npu2.c => npu2_vfio_pci.c} (64%)
 create mode 100644 drivers/vfio/pci/npu2_vfio_pci.h
 rename drivers/vfio/pci/{vfio_pci_nvlink2gpu.c => nvlink2gpu_vfio_pci.c} (67%)
 create mode 100644 drivers/vfio/pci/nvlink2gpu_vfio_pci.h

diff --git a/drivers/vfio/pci/Kconfig b/drivers/vfio/pci/Kconfig
index 829e90a2e5a3..88c89863a205 100644
--- a/drivers/vfio/pci/Kconfig
+++ b/drivers/vfio/pci/Kconfig
@@ -48,8 +48,30 @@ config VFIO_PCI_IGD
 
  To enable Intel IGD assignment through vfio-pci, say Y.
 
-config VFIO_PCI_NVLINK2
-   def_bool y
+config VFIO_PCI_NVLINK2GPU
+   tristate "VFIO support for NVIDIA NVLINK2 GPUs"
depends on VFIO_PCI_CORE && PPC_POWERNV
help
- VFIO PCI support for P9 Witherspoon machine with NVIDIA V100 GPUs
+ VFIO PCI driver for NVIDIA NVLINK2 GPUs with specific extensions
+ for P9 Witherspoon machine.
+
+config VFIO_PCI_NPU2
+   tristate "VFIO support for IBM NPU host bus adapter on P9"
+   depends on VFIO_PCI_NVLINK2GPU && PPC_POWERNV
+   help
+ VFIO PCI specific extensions for IBM NVLink2 host bus adapter on P9
+ Witherspoon machine.
+
+config VFIO_PCI_DRIVER_COMPAT
+   bool "VFIO PCI backward compatibility for vendor specific extensions"
+   default y
+   depends on VFIO_PCI
+   help
+ Say Y here if you want to preserve VFIO PCI backward
+ compatibility. vfio_pci.ko will continue to automatically use
+ the NVLINK2, NPU2 and IGD VFIO drivers when it is attached to
+ a compatible device.
+
+ When N is selected the user must bind explicity to the module
+ they want to handle the device and vfio_pci.ko will have no
+ device specific special behaviors.
diff --git a/drivers/vfio/pci/Makefile b/drivers/vfio/pci/Makefile
index f539f32c9296..86fb62e271fc 100644
--- a/drivers/vfio/pci/Makefile
+++ b/drivers/vfio/pci/Makefile
@@ -2,10 +2,15 @@
 
 obj-$(CONFIG_VFIO_PCI_CORE) += vfio-pci-core.o
 obj-$(CONFIG_VFIO_PCI) += vfio-pci.o
+obj-$(CONFIG_VFIO_PCI_NPU2) += npu2-vfio-pci.o
+obj-$(CONFIG_VFIO_PCI_NVLINK2GPU) += nvlink2gpu-vfio-pci.o
 
 vfio-pci-core-y := vfio_pci_core.o vfio_pci_intrs.o vfio_pci_rdwr.o 
vfio_pci_config.o
 vfio-pci-core-$(CONFIG_VFIO_PCI_IGD) += vfio_pci_igd.o
-vfio-pci-core-$(CONFIG_VFIO_PCI_NVLINK2) += vfio_pci_nvlink2gpu.o 
vfio_pci_npu2.o
 vfio-pci-core-$(CONFIG_S390) += vfio_pci_zdev.o
 
 vfio-pci-y := vfio_pci.o
+
+npu2-vfio-pci-y := npu2_vfio_pci.o
+
+nvlink2gpu-vfio-pci-y := nvlink2gpu_vfio_pci.o
diff --git a/drivers/vfio/pci/vfio_pci_npu2.c b/drivers/vfio/pci/npu2_vfio_pci.c
similarity index 64%
rename from drivers/vfio/pci/vfio_pci_npu2.c
rename to drivers/vfio/pci/npu2_vfio_pci.c
index 717745256ab3..7071bda0f2b6 100644
--- a/drivers/vfio/pci/vfio_pci_npu2.c
+++ b/drivers/vfio/pci/npu2_vfio_pci.c
@@ -14,19 +14,28 @@
  * Author: Alex Williamson 
  */
 
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include 
 #include 
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
 
 #include "vfio_pci_core.h"
+#include "npu2_vfio_pci.h"
 
 #define CREATE_TRACE_POINTS
 #include "npu2_trace.h"
 
+#define DRIVER_VERSION  "0.1"
+#define DRIVER_AUTHOR   "Alexey Kardashevskiy "
+#define DRIVER_DESC "NPU2 VFIO PCI - User Level meta-driver for POWER9 NPU 
NVLink2 HBA"
+
 EXPORT_TRACEPOINT_SYMBOL_GPL(vfio_pci_npu2_mmap);
 
 struct vfio_pci_npu2_data {
@@ -36,6 +45,10 @@ struct vfio_pci_npu2_data {
unsigned int link_speed; /* The link speed from DT's ibm,nvlink-speed */
 };
 
+struct npu2_vfio_pci_device {
+   struct vfio_pci_core_device vdev;
+};
+
 static size_t vfio_pci_npu2_rw(struct vfio_pci_core_device *vdev,
char __user *buf, size_t count, loff_t *ppos, bool iswrite)
 {
@@ -120,7 +133,7 @@ static const struct vfio_pci_regops vfio_pci_npu2_regops = {