Re: [PATCH V1 vfio 9/9] vfio/virtio: Introduce a vfio driver over virtio devices

2023-10-26 Thread Alex Williamson
On Thu, 26 Oct 2023 15:08:12 +0300
Yishai Hadas  wrote:

> On 25/10/2023 22:13, Alex Williamson wrote:
> > On Wed, 25 Oct 2023 17:35:51 +0300
> > Yishai Hadas  wrote:
> >  
> >> On 24/10/2023 22:57, Alex Williamson wrote:  
> >>> On Tue, 17 Oct 2023 16:42:17 +0300
> >>> Yishai Hadas  wrote:
   
> >>>> +if (copy_to_user(buf + copy_offset, &val16, copy_count))
> >>>> +return -EFAULT;
> >>>> +}
> >>>> +
> >>>> +if (range_intersect_range(pos, count, PCI_SUBSYSTEM_ID, sizeof(val16),
> >>>> +  &copy_offset, &copy_count, NULL)) {
> >>>> +/*
> >>>> + * Transitional devices use the PCI subsystem device id as
> >>>> + * virtio device id, same as legacy driver always did.
> >>> Where did we require the subsystem vendor ID to be 0x1af4?  This
> >>> subsystem device ID really only makes sense given that subsystem
> >>> vendor ID, right?  Otherwise I don't see that non-transitional devices,
> >>> such as the VF, have a hard requirement per the spec for the subsystem
> >>> vendor ID.
> >>>
> >>> Do we want to make this only probe the correct subsystem vendor ID or do
> >>> we want to emulate the subsystem vendor ID as well?  I don't see this is
> >>> correct without one of those options.  
> >> Looking in the 1.x spec we can see the below.
> >>
> >> Legacy Interfaces: A Note on PCI Device Discovery
> >>
> >> "Transitional devices MUST have the PCI Subsystem
> >> Device ID matching the Virtio Device ID, as indicated in section 5 ...
> >> This is to match legacy drivers."
> >>
> >> However, there is no need to enforce Subsystem Vendor ID.
> >>
> >> This is what we followed here.
> >>
> >> Makes sense ?  
> > So do I understand correctly that virtio dictates the subsystem device
> > ID for all subsystem vendor IDs that implement a legacy virtio
> > interface?  Ok, but this device didn't actually implement a legacy
> > virtio interface.  The device itself is not transitional, we're imposing
> > an emulated transitional interface onto it.  So did the subsystem vendor
> > agree to have their subsystem device ID managed by the virtio committee
> > or might we create conflicts?  I imagine we know we don't have a
> > conflict if we also virtualize the subsystem vendor ID.
> >  
> The non transitional net device in the virtio spec defined as the below 
> tuple.
> T_A: VID=0x1AF4, DID=0x1040, Subsys_VID=FOO, Subsys_DID=0x40.
> 
> And transitional net device in the virtio spec for a vendor FOO is 
> defined as:
> T_B: VID=0x1AF4, DID=0x1000, Subsys_VID=FOO, Subsys_DID=0x1
> 
> This driver is converting T_A to T_B, which both are defined by the 
> virtio spec.
> Hence, it does not conflict for the subsystem vendor, it is fine.

Surprising to me that the virtio spec dictates subsystem device ID in
all cases.  The further discussion in this thread seems to indicate we
need to virtualize subsystem vendor ID for broader driver compatibility
anyway.
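
To make the "virtualize both subsystem IDs" option concrete, here is a
minimal userspace sketch of overlaying emulated subsystem vendor and
device IDs onto a config-space read that may cover only part of either
register.  It is not the driver code: window_intersect(), overlay_le16()
and emulate_subsystem_ids() are hypothetical names, and only the
intersect-then-copy shape is taken from the quoted patch.  The offsets
0x2c and 0x2e are the standard PCI type 0 header locations.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Standard PCI type 0 header offsets. */
#define CFG_SUBSYSTEM_VENDOR_ID 0x2c
#define CFG_SUBSYSTEM_ID        0x2e

/*
 * Hypothetical helper: intersect the read window [pos, pos + count) with
 * the register [reg, reg + reg_len).  On overlap, *buf_off is the offset
 * into the caller's buffer, *reg_off the offset into the register and
 * *len the number of overlapping bytes.
 */
static bool window_intersect(size_t pos, size_t count, size_t reg,
                             size_t reg_len, size_t *buf_off,
                             size_t *reg_off, size_t *len)
{
    size_t start = pos > reg ? pos : reg;
    size_t end_w = pos + count;
    size_t end_r = reg + reg_len;
    size_t end = end_w < end_r ? end_w : end_r;

    if (start >= end)
        return false;

    *buf_off = start - pos;
    *reg_off = start - reg;
    *len = end - start;
    return true;
}

/* Overlay one little-endian 16-bit emulated register onto a read buffer. */
static void overlay_le16(uint8_t *buf, size_t pos, size_t count,
                         size_t reg, uint16_t val)
{
    uint8_t le[2] = { (uint8_t)(val & 0xff), (uint8_t)(val >> 8) };
    size_t buf_off, reg_off, len;

    if (window_intersect(pos, count, reg, sizeof(le),
                         &buf_off, &reg_off, &len))
        memcpy(buf + buf_off, le + reg_off, len);
}

/*
 * Emulate both subsystem IDs, so the subsystem device ID is always
 * reported together with the subsystem vendor ID that gives it meaning.
 */
static void emulate_subsystem_ids(uint8_t *buf, size_t pos, size_t count,
                                  uint16_t subsys_vendor, uint16_t subsys_dev)
{
    overlay_le16(buf, pos, count, CFG_SUBSYSTEM_VENDOR_ID, subsys_vendor);
    overlay_le16(buf, pos, count, CFG_SUBSYSTEM_ID, subsys_dev);
}
```

A 4-byte read at offset 0x2c would pick up both values, while a 1-byte
read at 0x2f would pick up only the high byte of the subsystem device
ID, which is the byte-wise access case raised earlier in review.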

> > BTW, it would be a lot easier for all of the config space emulation here
> > if we could make use of the existing field virtualization in
> > vfio-pci-core.  In fact you'll see in vfio_config_init() that
> > PCI_DEVICE_ID is already virtualized for VFs, so it would be enough to
> > simply do the following to report the desired device ID:
> >
> > *(__le16 *)&vconfig[PCI_DEVICE_ID] = cpu_to_le16(0x1000);
> 
> I would prefer keeping things simple and have one place/flow that 
> handles all the fields as we have now as part of the driver.

That's the same argument I'd make for re-using the core code, we don't
need multiple implementations handling merging physical and virtual
bits within config space.
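
For readers less familiar with the idea, the merge of physical and
virtual bits can be sketched in userspace as below.  This is an
illustration under assumptions, not the vfio-pci-core implementation:
the function names are made up, and the three byte arrays simply stand
in for the physical config space, the virtual copy (vconfig) and a
per-byte "virt" mask.

```c
#include <stddef.h>
#include <stdint.h>

/* Standard PCI type 0 header offset of the device ID. */
#define CFG_DEVICE_ID 0x02

/* Hypothetical: store a little-endian 16-bit value in the virtual copy. */
static void vconfig_set_le16(uint8_t *vconfig, size_t off, uint16_t val)
{
    vconfig[off] = (uint8_t)(val & 0xff);
    vconfig[off + 1] = (uint8_t)(val >> 8);
}

/*
 * Hypothetical: one merged read.  Bits set in virt[] come from the
 * virtual copy, all other bits come from the physical device.
 */
static uint8_t config_read_byte(const uint8_t *phys, const uint8_t *vconfig,
                                const uint8_t *virt, size_t off)
{
    return (uint8_t)((phys[off] & ~virt[off]) | (vconfig[off] & virt[off]));
}

/*
 * Hypothetical: one merged write into the virtual copy.  Only bits set
 * in write_mask are taken from the new value.
 */
static uint8_t config_write_byte(uint8_t old, uint8_t val, uint8_t write_mask)
{
    return (uint8_t)((old & ~write_mask) | (val & write_mask));
}
```

With the device ID bytes fully virtualized, a single vconfig_set_le16()
at CFG_DEVICE_ID is enough for every later read of that field to report
the emulated value, with no field-specific code on the read path.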

> In any case, I'll further look at that option for managing the DEVICE_ID 
> towards V2.
> 
> > It appears everything in this function could be handled similarly by
> > vfio-pci-core if the right fields in the perm_bits.virt and .write
> > bits could be manipulated and vconfig modified appropriately.  I'd look
> > for a way that a variant driver could provide an alternate set of
> > permissions structures for various capabilities.  Thanks,  
> 
> OK
> 
> However, let's not block V2 and the series acceptance as of that.
> 
> It can always be some future refactoring as part of other series that 
> will bring the infra-structure that is needed for that.

We're already on the verge of the v6.7 merge window, so this looks like
v6.8 material anyway.  We have time.  Thanks,

Alex

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH V1 vfio 9/9] vfio/virtio: Introduce a vfio driver over virtio devices

2023-10-25 Thread Alex Williamson
On Wed, 25 Oct 2023 17:35:51 +0300
Yishai Hadas  wrote:

> On 24/10/2023 22:57, Alex Williamson wrote:
> > On Tue, 17 Oct 2023 16:42:17 +0300
> > Yishai Hadas  wrote:
> >  
> >> Introduce a vfio driver over virtio devices to support the legacy
> >> interface functionality for VFs.
> >>
> >> Background, from the virtio spec [1].
> >> 
> >> In some systems, there is a need to support a virtio legacy driver with
> >> a device that does not directly support the legacy interface. In such
> >> scenarios, a group owner device can provide the legacy interface
> >> functionality for the group member devices. The driver of the owner
> >> device can then access the legacy interface of a member device on behalf
> >> of the legacy member device driver.
> >>
> >> For example, with the SR-IOV group type, group members (VFs) can not
> >> present the legacy interface in an I/O BAR in BAR0 as expected by the
> >> legacy pci driver. If the legacy driver is running inside a virtual
> >> machine, the hypervisor executing the virtual machine can present a
> >> virtual device with an I/O BAR in BAR0. The hypervisor intercepts the
> >> legacy driver accesses to this I/O BAR and forwards them to the group
> >> owner device (PF) using group administration commands.
> >> 
> >>
> >> Specifically, this driver adds support for a virtio-net VF to be exposed
> >> as a transitional device to a guest driver and allows the legacy IO BAR
> >> functionality on top.
> >>
> >> This allows a VM which uses a legacy virtio-net driver in the guest to
> >> work transparently over a VF which its driver in the host is that new
> >> driver.
> >>
> >> The driver can be extended easily to support some other types of virtio
> >> devices (e.g virtio-blk), by adding in a few places the specific type
> >> properties as was done for virtio-net.
> >>
> >> For now, only the virtio-net use case was tested and as such we introduce
> >> the support only for such a device.
> >>
> >> Practically,
> >> Upon probing a VF for a virtio-net device, in case its PF supports
> >> legacy access over the virtio admin commands and the VF doesn't have BAR
> >> 0, we set some specific 'vfio_device_ops' to be able to simulate in SW a
> >> transitional device with I/O BAR in BAR 0.
> >>
> >> The existence of the simulated I/O bar is reported later on by
> >> overwriting the VFIO_DEVICE_GET_REGION_INFO command and the device
> >> exposes itself as a transitional device by overwriting some properties
> >> upon reading its config space.
> >>
> >> Once we report the existence of I/O BAR as BAR 0 a legacy driver in the
> >> guest may use it via read/write calls according to the virtio
> >> specification.
> >>
> >> Any read/write towards the control parts of the BAR will be captured by
> >> the new driver and will be translated into admin commands towards the
> >> device.
> >>
> >> Any data path read/write access (i.e. virtio driver notifications) will
> >> be forwarded to the physical BAR which its properties were supplied by
> >> the admin command VIRTIO_ADMIN_CMD_LEGACY_NOTIFY_INFO upon the
> >> probing/init flow.
> >>
> >> With that code in place a legacy driver in the guest has the look and
> >> feel as if having a transitional device with legacy support for both its
> >> control and data path flows.
> >>
> >> [1]
> >> https://github.com/oasis-tcs/virtio-spec/commit/03c2d32e5093ca9f2a17797242fbef88efe94b8c
> >>
> >> Signed-off-by: Yishai Hadas 
> >> ---
> >>   MAINTAINERS  |   7 +
> >>   drivers/vfio/pci/Kconfig |   2 +
> >>   drivers/vfio/pci/Makefile|   2 +
> >>   drivers/vfio/pci/virtio/Kconfig  |  15 +
> >>   drivers/vfio/pci/virtio/Makefile |   4 +
> >>   drivers/vfio/pci/virtio/main.c   | 577 +++
> >>   6 files changed, 607 insertions(+)
> >>   create mode 100644 drivers/vfio/pci/virtio/Kconfig
> >>   create mode 100644 drivers/vfio/pci/virtio/Makefile
> >>   create mode 100644 drivers/vfio/pci/virtio/main.c
> >>
> >> diff --git a/MAINTAINERS b/MAINTAINERS
> >> index 7a7bd8bd80e9..680a70063775 100644
> >> --- a/MAINTAIN

Re: [PATCH V1 vfio 9/9] vfio/virtio: Introduce a vfio driver over virtio devices

2023-10-24 Thread Alex Williamson
On Tue, 17 Oct 2023 16:42:17 +0300
Yishai Hadas  wrote:

> Introduce a vfio driver over virtio devices to support the legacy
> interface functionality for VFs.
> 
> Background, from the virtio spec [1].
> 
> In some systems, there is a need to support a virtio legacy driver with
> a device that does not directly support the legacy interface. In such
> scenarios, a group owner device can provide the legacy interface
> functionality for the group member devices. The driver of the owner
> device can then access the legacy interface of a member device on behalf
> of the legacy member device driver.
> 
> For example, with the SR-IOV group type, group members (VFs) can not
> present the legacy interface in an I/O BAR in BAR0 as expected by the
> legacy pci driver. If the legacy driver is running inside a virtual
> machine, the hypervisor executing the virtual machine can present a
> virtual device with an I/O BAR in BAR0. The hypervisor intercepts the
> legacy driver accesses to this I/O BAR and forwards them to the group
> owner device (PF) using group administration commands.
> 
> 
> Specifically, this driver adds support for a virtio-net VF to be exposed
> as a transitional device to a guest driver and allows the legacy IO BAR
> functionality on top.
> 
> This allows a VM which uses a legacy virtio-net driver in the guest to
> work transparently over a VF which its driver in the host is that new
> driver.
> 
> The driver can be extended easily to support some other types of virtio
> devices (e.g virtio-blk), by adding in a few places the specific type
> properties as was done for virtio-net.
> 
> For now, only the virtio-net use case was tested and as such we introduce
> the support only for such a device.
> 
> Practically,
> Upon probing a VF for a virtio-net device, in case its PF supports
> legacy access over the virtio admin commands and the VF doesn't have BAR
> 0, we set some specific 'vfio_device_ops' to be able to simulate in SW a
> transitional device with I/O BAR in BAR 0.
> 
> The existence of the simulated I/O bar is reported later on by
> overwriting the VFIO_DEVICE_GET_REGION_INFO command and the device
> exposes itself as a transitional device by overwriting some properties
> upon reading its config space.
> 
> Once we report the existence of I/O BAR as BAR 0 a legacy driver in the
> guest may use it via read/write calls according to the virtio
> specification.
> 
> Any read/write towards the control parts of the BAR will be captured by
> the new driver and will be translated into admin commands towards the
> device.
> 
> Any data path read/write access (i.e. virtio driver notifications) will
> be forwarded to the physical BAR which its properties were supplied by
> the admin command VIRTIO_ADMIN_CMD_LEGACY_NOTIFY_INFO upon the
> probing/init flow.
> 
> With that code in place a legacy driver in the guest has the look and
> feel as if having a transitional device with legacy support for both its
> control and data path flows.
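
As a rough illustration of the control/data path split described above,
the following userspace sketch classifies an access to the emulated
BAR 0 by offset.  The enum and function names are hypothetical.  The
offsets follow the legacy virtio-pci register layout: queue notify at
offset 16, device-specific config starting at 20, or at 24 when MSI-X is
enabled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Legacy virtio-pci layout: 16-bit queue notify register at offset 16. */
#define LEGACY_QUEUE_NOTIFY 16
/* Device-specific config follows the common header (24 with MSI-X). */
#define LEGACY_CONFIG_OFF(msix_enabled) ((msix_enabled) ? 24u : 20u)

enum legacy_access {
    ACCESS_COMMON_CFG, /* control path: admin command, common config */
    ACCESS_NOTIFY,     /* data path: forwarded to the physical notify area */
    ACCESS_DEVICE_CFG, /* control path: admin command, device config */
};

/* Hypothetical classifier for the start offset of a BAR 0 access. */
static enum legacy_access classify_legacy_access(uint32_t offset,
                                                 bool msix_enabled)
{
    if (offset >= LEGACY_CONFIG_OFF(msix_enabled))
        return ACCESS_DEVICE_CFG;
    if (offset == LEGACY_QUEUE_NOTIFY)
        return ACCESS_NOTIFY;
    return ACCESS_COMMON_CFG;
}
```

Note that the same offset can land in different regions depending on
whether MSI-X is enabled, which is why the MSI-X state has to be
considered when choosing the admin command opcode.
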
> 
> [1]
> https://github.com/oasis-tcs/virtio-spec/commit/03c2d32e5093ca9f2a17797242fbef88efe94b8c
> 
> Signed-off-by: Yishai Hadas 
> ---
>  MAINTAINERS  |   7 +
>  drivers/vfio/pci/Kconfig |   2 +
>  drivers/vfio/pci/Makefile|   2 +
>  drivers/vfio/pci/virtio/Kconfig  |  15 +
>  drivers/vfio/pci/virtio/Makefile |   4 +
>  drivers/vfio/pci/virtio/main.c   | 577 +++
>  6 files changed, 607 insertions(+)
>  create mode 100644 drivers/vfio/pci/virtio/Kconfig
>  create mode 100644 drivers/vfio/pci/virtio/Makefile
>  create mode 100644 drivers/vfio/pci/virtio/main.c
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 7a7bd8bd80e9..680a70063775 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -22620,6 +22620,13 @@ L:   k...@vger.kernel.org
>  S:   Maintained
>  F:   drivers/vfio/pci/mlx5/
>  
> +VFIO VIRTIO PCI DRIVER
> +M:   Yishai Hadas 
> +L:   k...@vger.kernel.org
> +L:   virtualization@lists.linux-foundation.org
> +S:   Maintained
> +F:   drivers/vfio/pci/virtio
> +
>  VFIO PCI DEVICE SPECIFIC DRIVERS
>  R:   Jason Gunthorpe 
>  R:   Yishai Hadas 
> diff --git a/drivers/vfio/pci/Kconfig b/drivers/vfio/pci/Kconfig
> index 8125e5f37832..18c397df566d 100644
> --- a/drivers/vfio/pci/Kconfig
> +++ b/drivers/vfio/pci/Kconfig
> @@ -65,4 +65,6 @@ source "drivers/vfio/pci/hisilicon/Kconfig"
>  
>  source "drivers/vfio/pci/pds/Kconfig"
>  
> +source "drivers/vfio/pci/virtio/Kconfig"
> +
>  endmenu
> diff --git a/drivers/vfio/pci/Makefile b/drivers/vfio/pci/Makefile
> index 45167be462d8..046139a4eca5 100644
> --- a/drivers/vfio/pci/Makefile
> +++ b/drivers/vfio/pci/Makefile
> @@ -13,3 +13,5 @@ obj-$(CONFIG_MLX5_VFIO_PCI)   += mlx5/
>  obj-$(CONFIG_HISI_ACC_VFIO_PCI) += hisilicon/
>  
>  obj-$(CONFIG_PDS_VFIO_PCI) += pds/
> +
> +obj-$(CONFIG_VIRTIO_VFIO_PCI) += virtio/
> diff 

Re: [PATCH V1 vfio 0/9] Introduce a vfio driver over virtio devices

2023-10-23 Thread Alex Williamson
On Mon, 23 Oct 2023 13:20:43 -0300
Jason Gunthorpe  wrote:

> On Mon, Oct 23, 2023 at 10:09:13AM -0600, Alex Williamson wrote:
> > On Mon, 23 Oct 2023 12:42:57 -0300
> > Jason Gunthorpe  wrote:
> >   
> > > On Mon, Oct 23, 2023 at 09:33:23AM -0600, Alex Williamson wrote:
> > >   
> > > > > Alex,
> > > > > > Are you fine to leave the provisioning of the VF including the
> > > > > > control of its transitional capability in the device hands as
> > > > > > was suggested by Jason ?
> > > > 
> > > > If this is the standard we're going to follow, ie. profiling of a
> > > > device is expected to occur prior to the probe of the vfio-pci variant
> > > > driver, then we should get the out-of-tree NVIDIA vGPU driver on board
> > > > with this too.
> > > 
> > > Those GPU drivers are using mdev not vfio-pci..  
> > 
> > The SR-IOV mdev vGPUs rely on the IOMMU backing device support which
> > was removed from upstream.
> 
> It wasn't, but it changed forms.
> 
> mdev is a sysfs framework for managing lifecycle with GUIDs only.
> 
> The thing using mdev can call vfio_register_emulated_iommu_dev() or
> vfio_register_group_dev(). 
> 
> It doesn't matter to the mdev stuff.
> 
> The thing using mdev is responsible to get the struct device to pass
> to vfio_register_group_dev()

Are we describing what can be done (possibly limited to out-of-tree
drivers) or what should be done and would be accepted upstream?

I'm under the impression that mdev has been redefined to be more
narrowly focused for emulated IOMMU devices and that devices based
around a PCI VF should be making use of a vfio-pci variant driver.

Are you suggesting it's the vendor's choice based on whether they want
the mdev lifecycle support?

We've defined certain aspects of the vfio-mdev interface as only
available for emulated IOMMU devices, ex. page pinning.  Thanks,

Alex



Re: [PATCH V1 vfio 0/9] Introduce a vfio driver over virtio devices

2023-10-23 Thread Alex Williamson
On Mon, 23 Oct 2023 12:42:57 -0300
Jason Gunthorpe  wrote:

> On Mon, Oct 23, 2023 at 09:33:23AM -0600, Alex Williamson wrote:
> 
> > > Alex,
> > > Are you fine to leave the provisioning of the VF including the control 
> > > of its transitional capability in the device hands as was suggested by 
> > > Jason ?  
> > 
> > If this is the standard we're going to follow, ie. profiling of a
> > device is expected to occur prior to the probe of the vfio-pci variant
> > driver, then we should get the out-of-tree NVIDIA vGPU driver on board
> > with this too.  
> 
> Those GPU drivers are using mdev not vfio-pci..

The SR-IOV mdev vGPUs rely on the IOMMU backing device support which
was removed from upstream.  They only exist in the mdev form on
downstreams which have retained this interface for compatibility and
continuity.  I'm not aware of any other means by which the SR-IOV RID
can be used in the mdev model, therefore only the pre-SR-IOV GPUs
should continue to use the mdev interface.

> mdev doesn't have a way in its uapi to configure the mdev before it is
> created.

Of course.  Thanks,

Alex



Re: [PATCH V1 vfio 0/9] Introduce a vfio driver over virtio devices

2023-10-23 Thread Alex Williamson
On Sun, 22 Oct 2023 11:20:31 +0300
Yishai Hadas  wrote:

> On 17/10/2023 16:42, Yishai Hadas wrote:
> > This series introduces a vfio driver over virtio devices to support the
> > legacy interface functionality for VFs.
> >
> > Background, from the virtio spec [1].
> > 
> > In some systems, there is a need to support a virtio legacy driver with
> > a device that does not directly support the legacy interface. In such
> > scenarios, a group owner device can provide the legacy interface
> > functionality for the group member devices. The driver of the owner
> > device can then access the legacy interface of a member device on behalf
> > of the legacy member device driver.
> >
> > For example, with the SR-IOV group type, group members (VFs) can not
> > present the legacy interface in an I/O BAR in BAR0 as expected by the
> > legacy pci driver. If the legacy driver is running inside a virtual
> > machine, the hypervisor executing the virtual machine can present a
> > virtual device with an I/O BAR in BAR0. The hypervisor intercepts the
> > legacy driver accesses to this I/O BAR and forwards them to the group
> > owner device (PF) using group administration commands.
> > 
> >
> > The first 6 patches are in the virtio area and handle the below:
> > - Fix common config map for modern device as was reported by Michael 
> > Tsirkin.
> > - Introduce the admin virtqueue infrastructure.
> > - Expose the layout of the commands that should be used for
> >supporting the legacy access.
> > - Expose APIs to enable upper layers as of vfio, net, etc
> >to execute admin commands.
> >
> > The above follows the virtio spec that was lastly accepted in that area
> > [1].
> >
> > The last 3 patches are in the vfio area and handle the below:
> > - Expose some APIs from vfio/pci to be used by the vfio/virtio driver.
> > - Introduce a vfio driver over virtio devices to support the legacy
> >interface functionality for VFs.
> >
> > The series was tested successfully over virtio-net VFs in the host,
> > while running in the guest both modern and legacy drivers.
> >
> > [1]
> > https://github.com/oasis-tcs/virtio-spec/commit/03c2d32e5093ca9f2a17797242fbef88efe94b8c
> >
> > Changes from V0: 
> > https://www.spinics.net/lists/linux-virtualization/msg63802.html
> >
> > Virtio:
> > - Fix the common config map size issue that was reported by Michael
> >Tsirkin.
> > - Do not use vp_dev->vqs[] array upon vp_del_vqs() as was asked by
> >Michael, instead skip the AQ specifically.
> > - Move admin vq implementation into virtio_pci_modern.c as was asked by
> >Michael.
> > - Rename structure virtio_avq to virtio_pci_admin_vq and some extra
> >corresponding renames.
> > - Remove exported symbols virtio_pci_vf_get_pf_dev(),
> >virtio_admin_cmd_exec() as now callers are local to the module.
> > - Handle inflight commands as part of the device reset flow.
> > - Introduce APIs per admin command in virtio-pci as was asked by Michael.
> >
> > Vfio:
> > - Change to use EXPORT_SYMBOL_GPL instead of EXPORT_SYMBOL for
> >vfio_pci_core_setup_barmap() and vfio_pci_iowrite#xxx() as pointed by
> >Alex.
> > - Drop the intermediate patch which prepares the commands and calls the
> >generic virtio admin command API (i.e. virtio_admin_cmd_exec()).
> > - Instead, call directly to the new APIs per admin command that are
> >exported from Virtio - based on Michael's request.
> > - Enable only virtio-net as part of the pci_device_id table to enforce
> >upon binding only what is supported as suggested by Alex.
> > - Add support for byte-wise access (read/write) over the device config
> >region as was asked by Alex.
> > - Consider whether MSIX is practically enabled/disabled to choose the
> >right opcode upon issuing read/write admin command, as mentioned
> >by Michael.
> > - Move to use VIRTIO_PCI_CONFIG_OFF instead of adding some new defines
> >as was suggested by Michael.
> > - Set the '.close_device' op to vfio_pci_core_close_device() as was
> >pointed by Alex.
> > - Adapt to Vfio multi-line comment style in a few places.
> > - Add virtualization@lists.linux-foundation.org in the MAINTAINERS file
> >to be CCed for the new driver as was suggested by Jason.
> >
> > Yishai
> >
> > Feng Liu (5):
> >virtio-pci: Fix common config map for modern device
> >virtio: Define feature bit for administration virtqueue
> >virtio-pci: Introduce admin virtqueue
> >virtio-pci: Introduce admin command sending function
> >virtio-pci: Introduce admin commands
> >
> > Yishai Hadas (4):
> >virtio-pci: Introduce APIs to execute legacy IO admin commands
> >vfio/pci: Expose vfio_pci_core_setup_barmap()
> >vfio/pci: Expose vfio_pci_iowrite/read##size()
> >vfio/virtio: Introduce a vfio driver over virtio devices
> >
> >   MAINTAINERS 

Re: [PATCH V1 vfio 9/9] vfio/virtio: Introduce a vfio driver over virtio devices

2023-10-18 Thread Alex Williamson
On Wed, 18 Oct 2023 13:33:33 -0300
Jason Gunthorpe  wrote:

> On Tue, Oct 17, 2023 at 02:24:48PM -0600, Alex Williamson wrote:
> > On Tue, 17 Oct 2023 16:42:17 +0300
> > Yishai Hadas  wrote:  
> > > +static int virtiovf_pci_probe(struct pci_dev *pdev,
> > > +   const struct pci_device_id *id)
> > > +{
> > > + const struct vfio_device_ops *ops = &virtiovf_acc_vfio_pci_ops;
> > > + struct virtiovf_pci_core_device *virtvdev;
> > > + int ret;
> > > +
> > > + if (pdev->is_virtfn && virtiovf_support_legacy_access(pdev) &&
> > > + !virtiovf_bar0_exists(pdev) && pdev->msix_cap)
> > > + ops = &virtiovf_acc_vfio_pci_tran_ops;
> > 
> > This is still an issue for me, it's a very narrow use case where we
> > have a modern device and want to enable legacy support.  Implementing an
> > IO BAR and mangling the device ID seems like it should be an opt-in,
> > not standard behavior for any compatible device.  Users should
> > generally expect that the device they see in the host is the device
> > they see in the guest.  They might even rely on that principle.  
> 
> I think this should be configured when the VF is provisioned. If the
> user does not want legacy IO bar support then the VFIO VF function
> should not advertise the capability, and they won't get driver
> support.
> 
> I think that is a very reasonable way to approach this - it is how we
> approached similar problems for mlx5. The provisioning interface is
> what "profiles" the VF, regardless of if VFIO is driving it or not.

It seems like a huge assumption that every device is going to allow
this degree of specification in provisioning VFs.  mlx5 is a vendor
specific driver, it can make such assumptions in design philosophy.

> > We can't use the argument that users wanting the default device should
> > use vfio-pci rather than virtio-vfio-pci because we've already defined
> > the algorithm by which libvirt should choose a variant driver for a
> > device.  libvirt will choose this driver for all virtio-net devices.  
> 
> Well, we can if the use case is niche. I think profiling a virtio VF
> to support legacy IO bar emulation and then not wanting to use it is
> a niche case.
> 
> The same argument is going to come with live migration. This same driver
> will still bind and enable live migration if the virtio function is
> profiled to support it. If you don't want that in your system then
> don't profile the VF for migration support.

What in the virtio or SR-IOV spec requires a vendor to make this
configurable?

> > This driver effectively has the option to expose two different profiles
> > for the device, native or transitional.  We've discussed profile
> > support for variant drivers previously as an equivalent functionality
> > to mdev types, but the only use case for this currently is out-of-tree.
> > I think this might be the opportunity to define how device profiles are
> > exposed and selected in a variant driver.  
> 
> Honestly, I've been trying to keep this out of VFIO...
> 
> The function is profiled when it is created, by whatever created
> it. As in the other thread we have a vast amount of variation in what
> is required to provision the function in the first place. "Legacy IO
> BAR emulation support" is just one thing. virtio-net needs to be
> hooked up to real network and get a MAC, virtio-blk needs to be hooked
> up to real storage and get a media. At a minimum. This is big and
> complicated.
> 
> It may not even be the x86 running VFIO that is doing this
> provisioning, the PCI function may come pre-provisioned from a DPU.
> 
> It feels better to keep that all in one place, in whatever external
> thing is preparing the function before giving it to VFIO. VFIO is
> concerned with operating a prepared function.
> 
> When we get to SIOV it should not be VFIO that is
> provisioning/creating functions. The owning driver should be doing
> this and routing the function to VFIO (eg with an aux device or
> otherwise)
> 
> This gets back to the qemu thread on the grace patch where we need to
> ask how does the libvirt world see this, given there is no good way to
> generically handle all scenarios without a userspace driver to operate
> elements.

So nothing here is really "all in one place", it may be in the
provisioning of the VF, outside of the scope of the host OS, it might
be a collection of scripts or operators with device or interface
specific tooling to configure the device.  Sometimes this configuration
will be before the device is probed by the vfio-pci variant driver,
sometimes in between probing and opening the device.

I don't se

Re: [PATCH V1 vfio 9/9] vfio/virtio: Introduce a vfio driver over virtio devices

2023-10-18 Thread Alex Williamson
On Wed, 18 Oct 2023 12:01:57 +0300
Yishai Hadas  wrote:

> On 17/10/2023 23:24, Alex Williamson wrote:
> > On Tue, 17 Oct 2023 16:42:17 +0300
> > Yishai Hadas  wrote:  
> >> +static int virtiovf_pci_probe(struct pci_dev *pdev,
> >> +const struct pci_device_id *id)
> >> +{
> >> +  const struct vfio_device_ops *ops = &virtiovf_acc_vfio_pci_ops;
> >> +  struct virtiovf_pci_core_device *virtvdev;
> >> +  int ret;
> >> +
> >> +  if (pdev->is_virtfn && virtiovf_support_legacy_access(pdev) &&
> >> +  !virtiovf_bar0_exists(pdev) && pdev->msix_cap)
> >> +  ops = &virtiovf_acc_vfio_pci_tran_ops;
> >
> > This is still an issue for me, it's a very narrow use case where we
> > have a modern device and want to enable legacy support.  Implementing an
> > IO BAR and mangling the device ID seems like it should be an opt-in,
> > not standard behavior for any compatible device.  Users should
> > generally expect that the device they see in the host is the device
> > they see in the guest.  They might even rely on that principle.  
> 
> Users here mainly refer to cloud operators.
> 
> We may assume, I believe, that they will be fine with seeing a 
> transitional device in the guest as they would like to get the legacy IO 
> support for their system.
> 
> However, we can still consider supplying a configuration knob in the 
> device layer (e.g. in the DPU side) to let a cloud operator turn off
> the legacy capability.

This is a driver that implements to the virtio standard, so I don't see
how we can assume that the current use case is the only use case we'll
ever see.  Therefore we cannot assume this will only be consumed by a
specific cloud operator making use of NVIDIA hardware.  Other vendors
may implement this spec for other environments.  We might even see an
implementation of a virtual virtio-net device with SR-IOV.

> In that case upon probe() of the vfio-virtio driver, we'll just pick-up 
> the default vfio-pci 'ops' and in the guest we may have the same device 
> ID as of in the host.
> 
> With that approach we may not require a HOST side control (i.e. sysfs, 
> etc.), but stay with a device control based on its user manual.
> 
> At the end, we don't expect any functional issue nor any compatibility
> problem with the new driver, both modern and legacy drivers can work in 
> the guest.
> 
> Can that work for you ?

This is not being proposed as an NVIDIA specific driver, we can't make
such claims relative to all foreseeable implementations of virtio-net.

> > We can't use the argument that users wanting the default device should
> > use vfio-pci rather than virtio-vfio-pci because we've already defined
> > the algorithm by which libvirt should choose a variant driver for a
> > device.  libvirt will choose this driver for all virtio-net devices.
> >
> > This driver effectively has the option to expose two different profiles
> > for the device, native or transitional.  We've discussed profile
> > support for variant drivers previously as an equivalent functionality
> > to mdev types, but the only use case for this currently is out-of-tree.
> > I think this might be the opportunity to define how device profiles are
> > exposed and selected in a variant driver.
> >
> > Jason had previously suggested a devlink interface for this, but I
> > understand that path had been shot down by devlink developers.  Another
> > obvious option is sysfs, where we might imagine an optional "profiles"
> > directory, perhaps under vfio-dev.  Attributes of "available" and
> > "current" could allow discovery and selection of a profile similar to
> > mdev types.  
> 
> Referring to the sysfs option,
> 
> Do you expect the sysfs data to affect the libvirt decision ? may that
> require changes in libvirt ?

We don't have such changes in libvirt for mdev, other than the ability
of the nodedev information to return available type information.
Generally the mdev type is configured outside of libvirt, which falls
into the same sort of configuration as necessary to enable migration on
mlx5-vfio-pci.

It's possible we could allow a default profile which would be used if
the open_device callback is used without setting a profile, but we need
to be careful of vGPU use cases where profiles consume resources and a
default selection may affect other devices.

> In addition,
> May that be too late as the sysfs entry will be created upon driver 
> binding by libvirt or that we have in mind some other option to control 
> with that ?

No different than mlx5-vfio-pci, there's a necessary point between
binding the driver and usi

Re: [PATCH V1 vfio 9/9] vfio/virtio: Introduce a vfio driver over virtio devices

2023-10-17 Thread Alex Williamson
On Tue, 17 Oct 2023 16:42:17 +0300
Yishai Hadas  wrote:
> +static int virtiovf_pci_probe(struct pci_dev *pdev,
> +   const struct pci_device_id *id)
> +{
> + const struct vfio_device_ops *ops = &virtiovf_acc_vfio_pci_ops;
> + struct virtiovf_pci_core_device *virtvdev;
> + int ret;
> +
> + if (pdev->is_virtfn && virtiovf_support_legacy_access(pdev) &&
> + !virtiovf_bar0_exists(pdev) && pdev->msix_cap)
> + ops = &virtiovf_acc_vfio_pci_tran_ops;


This is still an issue for me, it's a very narrow use case where we
have a modern device and want to enable legacy support.  Implementing an
IO BAR and mangling the device ID seems like it should be an opt-in,
not standard behavior for any compatible device.  Users should
generally expect that the device they see in the host is the device
they see in the guest.  They might even rely on that principle.

We can't use the argument that users wanting the default device should
use vfio-pci rather than virtio-vfio-pci because we've already defined
the algorithm by which libvirt should choose a variant driver for a
device.  libvirt will choose this driver for all virtio-net devices.

This driver effectively has the option to expose two different profiles
for the device, native or transitional.  We've discussed profile
support for variant drivers previously as an equivalent functionality
to mdev types, but the only use case for this currently is out-of-tree.
I think this might be the opportunity to define how device profiles are
exposed and selected in a variant driver.

Jason had previously suggested a devlink interface for this, but I
understand that path had been shot down by devlink developers.  Another
obvious option is sysfs, where we might imagine an optional "profiles"
directory, perhaps under vfio-dev.  Attributes of "available" and
"current" could allow discovery and selection of a profile similar to
mdev types.
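
As a purely illustrative sketch of the "available"/"current" idea, the
selection logic might be modeled in userspace as below.  Everything here is
an assumption made for the example: the structure, the function name, the
profile strings "native" and "transitional", and the rule that a profile can
only be changed while the device is not open.  None of it is an existing
vfio interface.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical model of a per-device "profiles" directory. */
struct profile_dir {
	const char *const *available;	/* NULL-terminated list, cf. "available" */
	const char *current;		/* selected profile, cf. "current" */
	int opened;			/* non-zero once the device is in use */
};

/*
 * Model of a write to "current": accept only a listed profile, and only
 * while the device is not open.  Returns 0 on success, -1 otherwise.
 */
int profile_select(struct profile_dir *dir, const char *name)
{
	const char *const *p;

	if (dir->opened)
		return -1;

	for (p = dir->available; *p; p++) {
		if (strcmp(*p, name) == 0) {
			dir->current = *p;
			return 0;
		}
	}
	return -1;
}
```

In this model a management tool would read the list, select a profile
between binding the driver and opening the device, and an unlisted name is
rejected rather than silently ignored.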

Is this where we should head with this or are there other options to
confine this transitional behavior?

BTW, what is "acc" in virtiovf_acc_vfio_pci_ops?

> +
> + virtvdev = vfio_alloc_device(virtiovf_pci_core_device, core_device.vdev,
> +  &pdev->dev, ops);
> + if (IS_ERR(virtvdev))
> + return PTR_ERR(virtvdev);
> +
> + dev_set_drvdata(&pdev->dev, &virtvdev->core_device);
> + ret = vfio_pci_core_register_device(&virtvdev->core_device);
> + if (ret)
> + goto out;
> + return 0;
> +out:
> + vfio_put_device(&virtvdev->core_device.vdev);
> + return ret;
> +}
> +
> +static void virtiovf_pci_remove(struct pci_dev *pdev)
> +{
> + struct virtiovf_pci_core_device *virtvdev = dev_get_drvdata(&pdev->dev);
> +
> + vfio_pci_core_unregister_device(&virtvdev->core_device);
> + vfio_put_device(&virtvdev->core_device.vdev);
> +}
> +
> +static const struct pci_device_id virtiovf_pci_table[] = {
> + /* Only virtio-net is supported/tested so far */
> + { PCI_DRIVER_OVERRIDE_DEVICE_VFIO(PCI_VENDOR_ID_REDHAT_QUMRANET, 
> 0x1041) },
> + {}
> +};
> +
> +MODULE_DEVICE_TABLE(pci, virtiovf_pci_table);
> +
> +static struct pci_driver virtiovf_pci_driver = {
> + .name = KBUILD_MODNAME,
> + .id_table = virtiovf_pci_table,
> + .probe = virtiovf_pci_probe,
> + .remove = virtiovf_pci_remove,
> + .err_handler = &vfio_pci_core_err_handlers,
> + .driver_managed_dma = true,
> +};
> +
> +module_pci_driver(virtiovf_pci_driver);
> +
> +MODULE_LICENSE("GPL");
> +MODULE_AUTHOR("Yishai Hadas ");
> +MODULE_DESCRIPTION(
> + "VIRTIO VFIO PCI - User Level meta-driver for VIRTIO device family");

Not yet "family" per the device table.  Thanks,

Alex

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH vfio 11/11] vfio/virtio: Introduce a vfio driver over virtio devices

2023-09-21 Thread Alex Williamson
On Thu, 21 Sep 2023 16:20:59 -0400
"Michael S. Tsirkin"  wrote:

> On Thu, Sep 21, 2023 at 05:01:21PM -0300, Jason Gunthorpe wrote:
> > On Thu, Sep 21, 2023 at 01:58:32PM -0600, Alex Williamson wrote:
> >   
> > > > +static const struct pci_device_id virtiovf_pci_table[] = {
> > > > +   { 
> > > > PCI_DRIVER_OVERRIDE_DEVICE_VFIO(PCI_VENDOR_ID_REDHAT_QUMRANET, 
> > > > PCI_ANY_ID) },  
> > > 
> > > libvirt will blindly use this driver for all devices matching this as
> > > we've discussed how it should make use of modules.alias.  I don't think
> > > this driver should be squatting on devices where it doesn't add value
> > > and it's not clear whether this is adding or subtracting value in all
> > > cases for the one NIC that it modifies.  How should libvirt choose when
> > > and where to use this driver?  What regressions are we going to see
> > > with VMs that previously saw "modern" virtio-net devices and now see a
> > > legacy compatible device?  Thanks,  
> > 
> > Maybe this approach needs to use a subsystem ID match?
> > 
> > Jason  
> 
> Maybe make users load it manually?
> 
> Please don't bind to virtio by default, you will break
> all guests.

This would never bind by default, it's only bound as a vfio override
driver, but if libvirt were trying to determine the correct driver to
use with vfio for a 0x1af4 device, it'd land on this one.  Thanks,

Alex



Re: [PATCH vfio 11/11] vfio/virtio: Introduce a vfio driver over virtio devices

2023-09-21 Thread Alex Williamson
On Thu, 21 Sep 2023 15:40:40 +0300
Yishai Hadas  wrote:

> Introduce a vfio driver over virtio devices to support the legacy
> interface functionality for VFs.
> 
> Background, from the virtio spec [1].
> 
> In some systems, there is a need to support a virtio legacy driver with
> a device that does not directly support the legacy interface. In such
> scenarios, a group owner device can provide the legacy interface
> functionality for the group member devices. The driver of the owner
> device can then access the legacy interface of a member device on behalf
> of the legacy member device driver.
> 
> For example, with the SR-IOV group type, group members (VFs) can not
> present the legacy interface in an I/O BAR in BAR0 as expected by the
> legacy pci driver. If the legacy driver is running inside a virtual
> machine, the hypervisor executing the virtual machine can present a
> virtual device with an I/O BAR in BAR0. The hypervisor intercepts the
> legacy driver accesses to this I/O BAR and forwards them to the group
> owner device (PF) using group administration commands.
> 
> 
> Specifically, this driver adds support for a virtio-net VF to be exposed
> as a transitional device to a guest driver and allows the legacy IO BAR
> functionality on top.
> 
> This allows a VM which uses a legacy virtio-net driver in the guest to
> work transparently over a VF whose driver in the host is this new
> driver.
> 
> The driver can be extended easily to support some other types of virtio
> devices (e.g. virtio-blk), by adding the type-specific properties in a
> few places, as was done for virtio-net.
> 
> For now, only the virtio-net use case was tested and as such we introduce
> the support only for such a device.
> 
> Practically,
> Upon probing a VF for a virtio-net device, in case its PF supports
> legacy access over the virtio admin commands and the VF doesn't have BAR
> 0, we set some specific 'vfio_device_ops' to be able to simulate in SW a
> transitional device with I/O BAR in BAR 0.
> 
> The existence of the simulated I/O bar is reported later on by
> overwriting the VFIO_DEVICE_GET_REGION_INFO command and the device
> exposes itself as a transitional device by overwriting some properties
> upon reading its config space.
> 
> Once we report the existence of I/O BAR as BAR 0 a legacy driver in the
> guest may use it via read/write calls according to the virtio
> specification.
> 
> Any read/write towards the control parts of the BAR will be captured by
> the new driver and will be translated into admin commands towards the
> device.
> 
> Any data path read/write access (i.e. virtio driver notifications) will
> be forwarded to the physical BAR whose properties were supplied by
> the command VIRTIO_PCI_QUEUE_NOTIFY upon the probing/init flow.
> 
> With that code in place a legacy driver in the guest has the look and
> feel as if having a transitional device with legacy support for both its
> control and data path flows.

Why do we need to enable a "legacy" driver in the guest?  The very name
suggests there's an alternative driver that perhaps doesn't require
this I/O BAR.  Why don't we just require the non-legacy driver in the
guest rather than increase our maintenance burden?  Thanks,

Alex

> 
> [1]
> https://github.com/oasis-tcs/virtio-spec/commit/03c2d32e5093ca9f2a17797242fbef88efe94b8c
> 
> Signed-off-by: Yishai Hadas 
> ---
>  MAINTAINERS  |   6 +
>  drivers/vfio/pci/Kconfig |   2 +
>  drivers/vfio/pci/Makefile|   2 +
>  drivers/vfio/pci/virtio/Kconfig  |  15 +
>  drivers/vfio/pci/virtio/Makefile |   4 +
>  drivers/vfio/pci/virtio/cmd.c|   4 +-
>  drivers/vfio/pci/virtio/cmd.h|   8 +
>  drivers/vfio/pci/virtio/main.c   | 546 +++
>  8 files changed, 585 insertions(+), 2 deletions(-)
>  create mode 100644 drivers/vfio/pci/virtio/Kconfig
>  create mode 100644 drivers/vfio/pci/virtio/Makefile
>  create mode 100644 drivers/vfio/pci/virtio/main.c
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index bf0f54c24f81..5098418c8389 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -22624,6 +22624,12 @@ L:   k...@vger.kernel.org
>  S:   Maintained
>  F:   drivers/vfio/pci/mlx5/
>  
> +VFIO VIRTIO PCI DRIVER
> +M:   Yishai Hadas 
> +L:   k...@vger.kernel.org
> +S:   Maintained
> +F:   drivers/vfio/pci/virtio
> +
>  VFIO PCI DEVICE SPECIFIC DRIVERS
>  R:   Jason Gunthorpe 
>  R:   Yishai Hadas 
> diff --git a/drivers/vfio/pci/Kconfig b/drivers/vfio/pci/Kconfig
> index 8125e5f37832..18c397df566d 100644
> --- a/drivers/vfio/pci/Kconfig
> +++ b/drivers/vfio/pci/Kconfig
> @@ -65,4 +65,6 @@ source "drivers/vfio/pci/hisilicon/Kconfig"
>  
>  source "drivers/vfio/pci/pds/Kconfig"
>  
> +source "drivers/vfio/pci/virtio/Kconfig"
> +
>  endmenu
> diff --git a/drivers/vfio/pci/Makefile b/drivers/vfio/pci/Makefile
> index 

Re: [PATCH vfio 08/11] vfio/pci: Expose vfio_pci_core_setup_barmap()

2023-09-21 Thread Alex Williamson
On Thu, 21 Sep 2023 15:40:37 +0300
Yishai Hadas  wrote:

> Expose vfio_pci_core_setup_barmap() to be used by drivers.
> 
> This will let drivers mmap a BAR and re-use it from both vfio and the
> driver when applicable.
> 
> This API will be used in the next patches by the vfio/virtio coming
> driver.
> 
> Signed-off-by: Yishai Hadas 
> ---
>  drivers/vfio/pci/vfio_pci_core.c | 25 +
>  drivers/vfio/pci/vfio_pci_rdwr.c | 28 ++--
>  include/linux/vfio_pci_core.h|  1 +
>  3 files changed, 28 insertions(+), 26 deletions(-)
> 
> diff --git a/drivers/vfio/pci/vfio_pci_core.c 
> b/drivers/vfio/pci/vfio_pci_core.c
> index 1929103ee59a..b56111ed8a8c 100644
> --- a/drivers/vfio/pci/vfio_pci_core.c
> +++ b/drivers/vfio/pci/vfio_pci_core.c
> @@ -684,6 +684,31 @@ void vfio_pci_core_disable(struct vfio_pci_core_device 
> *vdev)
>  }
>  EXPORT_SYMBOL_GPL(vfio_pci_core_disable);
>  
> +int vfio_pci_core_setup_barmap(struct vfio_pci_core_device *vdev, int bar)
> +{
> + struct pci_dev *pdev = vdev->pdev;
> + void __iomem *io;
> + int ret;
> +
> + if (vdev->barmap[bar])
> + return 0;
> +
> + ret = pci_request_selected_regions(pdev, 1 << bar, "vfio");
> + if (ret)
> + return ret;
> +
> + io = pci_iomap(pdev, bar, 0);
> + if (!io) {
> + pci_release_selected_regions(pdev, 1 << bar);
> + return -ENOMEM;
> + }
> +
> + vdev->barmap[bar] = io;
> +
> + return 0;
> +}
> +EXPORT_SYMBOL(vfio_pci_core_setup_barmap);

Not to endorse the rest of this yet, but minimally _GPL, same for the
following patch.  Thanks,

Alex

> +
>  void vfio_pci_core_close_device(struct vfio_device *core_vdev)
>  {
>   struct vfio_pci_core_device *vdev =
> diff --git a/drivers/vfio/pci/vfio_pci_rdwr.c 
> b/drivers/vfio/pci/vfio_pci_rdwr.c
> index e27de61ac9fe..6f08b3ecbb89 100644
> --- a/drivers/vfio/pci/vfio_pci_rdwr.c
> +++ b/drivers/vfio/pci/vfio_pci_rdwr.c
> @@ -200,30 +200,6 @@ static ssize_t do_io_rw(struct vfio_pci_core_device 
> *vdev, bool test_mem,
>   return done;
>  }
>  
> -static int vfio_pci_setup_barmap(struct vfio_pci_core_device *vdev, int bar)
> -{
> - struct pci_dev *pdev = vdev->pdev;
> - int ret;
> - void __iomem *io;
> -
> - if (vdev->barmap[bar])
> - return 0;
> -
> - ret = pci_request_selected_regions(pdev, 1 << bar, "vfio");
> - if (ret)
> - return ret;
> -
> - io = pci_iomap(pdev, bar, 0);
> - if (!io) {
> - pci_release_selected_regions(pdev, 1 << bar);
> - return -ENOMEM;
> - }
> -
> - vdev->barmap[bar] = io;
> -
> - return 0;
> -}
> -
>  ssize_t vfio_pci_bar_rw(struct vfio_pci_core_device *vdev, char __user *buf,
>   size_t count, loff_t *ppos, bool iswrite)
>  {
> @@ -262,7 +238,7 @@ ssize_t vfio_pci_bar_rw(struct vfio_pci_core_device 
> *vdev, char __user *buf,
>   }
>   x_end = end;
>   } else {
> - int ret = vfio_pci_setup_barmap(vdev, bar);
> + int ret = vfio_pci_core_setup_barmap(vdev, bar);
>   if (ret) {
>   done = ret;
>   goto out;
> @@ -438,7 +414,7 @@ int vfio_pci_ioeventfd(struct vfio_pci_core_device *vdev, 
> loff_t offset,
>   return -EINVAL;
>  #endif
>  
> - ret = vfio_pci_setup_barmap(vdev, bar);
> + ret = vfio_pci_core_setup_barmap(vdev, bar);
>   if (ret)
>   return ret;
>  
> diff --git a/include/linux/vfio_pci_core.h b/include/linux/vfio_pci_core.h
> index 562e8754869d..67ac58e20e1d 100644
> --- a/include/linux/vfio_pci_core.h
> +++ b/include/linux/vfio_pci_core.h
> @@ -127,6 +127,7 @@ int vfio_pci_core_match(struct vfio_device *core_vdev, 
> char *buf);
>  int vfio_pci_core_enable(struct vfio_pci_core_device *vdev);
>  void vfio_pci_core_disable(struct vfio_pci_core_device *vdev);
>  void vfio_pci_core_finish_enable(struct vfio_pci_core_device *vdev);
> +int vfio_pci_core_setup_barmap(struct vfio_pci_core_device *vdev, int bar);
>  pci_ers_result_t vfio_pci_core_aer_err_detected(struct pci_dev *pdev,
>   pci_channel_state_t state);
>  



Re: [PATCH 0/2] eventfd: simplify signal helpers

2023-07-17 Thread Alex Williamson
On Mon, 17 Jul 2023 19:12:16 -0300
Jason Gunthorpe  wrote:

> On Mon, Jul 17, 2023 at 01:08:31PM -0600, Alex Williamson wrote:
> 
> > What would that mechanism be?  We've been iterating on getting the
> > serialization and buffering correct, but I don't know of another means
> > that combines the notification with a value, so we'd likely end up with
> > an eventfd only for notification and a separate ring buffer for
> > notification values.  
> 
> All FDs do this. You just have to make a FD with custom
> file_operations that does what this wants. The uAPI shouldn't be able
> to tell if the FD is backing it with an eventfd or otherwise. Have the
> kernel return the FD instead of accepting it. Follow the basic design
> of eg mlx5vf_save_fops

Sure, userspace could poll on any fd and read a value from it, but at
that point we're essentially duplicating a lot of what eventfd provides
for a minor(?) semantic difference over how the counter value is
interpreted.  Using an actual eventfd allows the ACPI notification to
work as just another interrupt index within the existing vfio IRQ uAPI.
Thanks,

Alex



Re: [PATCH 0/2] eventfd: simplify signal helpers

2023-07-17 Thread Alex Williamson
On Mon, 17 Jul 2023 10:29:34 +0200
Grzegorz Jaszczyk  wrote:

> On Fri, 14 Jul 2023 at 09:05, Christian Brauner wrote:
> >
> > On Thu, Jul 13, 2023 at 11:10:54AM -0600, Alex Williamson wrote:  
> > > On Thu, 13 Jul 2023 12:05:36 +0200
> > > Christian Brauner  wrote:
> > >  
> > > > Hey everyone,
> > > >
> > > > This simplifies the eventfd_signal() and eventfd_signal_mask() helpers
> > > > by removing the count argument which is effectively unused.  
> > >
> > > We have a patch under review which does in fact make use of the
> > > signaling value:
> > >
> > > https://lore.kernel.org/all/20230630155936.3015595-1-...@semihalf.com/  
> >
> > Huh, thanks for the link.
> >
> > Quoting from
> > https://patchwork.kernel.org/project/kvm/patch/20230307220553.631069-1-...@semihalf.com/#25266856
> >  
> > > Reading an eventfd returns an 8-byte value, we generally only use it
> > > as a counter, but it's been discussed previously and IIRC, it's possible
> > > to use that value as a notification value.  
> >
> > So the goal is to pipe a specific value through eventfd? But it is
> > explicitly a counter. The whole thing is written around a counter and
> > each write and signal adds to the counter.
> >
> > The consequences are pretty well described in the cover letter of
> > v6 https://lore.kernel.org/all/20230630155936.3015595-1-...@semihalf.com/
> >  
> > > Since the eventfd counter is used as ACPI notification value
> > > placeholder, the eventfd signaling needs to be serialized in order to
> > > not end up with notification values being coalesced. Therefore ACPI
> > > notification values are buffered and signalized one by one, when the
> > > previous notification value has been consumed.  
> >
> > But isn't this a good indication that you really don't want an eventfd
> > but something that's explicitly designed to associate specific data with
> > a notification? Using eventfd in that manner requires serialization,
> > buffering, and enforces ordering.

What would that mechanism be?  We've been iterating on getting the
serialization and buffering correct, but I don't know of another means
that combines the notification with a value, so we'd likely end up with
an eventfd only for notification and a separate ring buffer for
notification values.

As this series demonstrates, the current in-kernel users only increment
the counter and most userspace likely discards the counter value, which
makes the counter largely a waste.  While perhaps unconventional,
there's no requirement that the counter may only be incremented by one,
nor any restriction that I see in how userspace must interpret the
counter value.
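
To illustrate the counter semantics under discussion, here is a small
userspace sketch using eventfd(2).  The helper names are made up for this
example.  A write adds its value to the counter, and a read (in the default,
non-semaphore mode) returns the accumulated sum and resets it to zero.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <sys/types.h>
#include <unistd.h>

/* Add 'val' to the eventfd counter; returns 0 on success, -1 on error. */
int efd_signal(int fd, uint64_t val)
{
	return write(fd, &val, sizeof(val)) == (ssize_t)sizeof(val) ? 0 : -1;
}

/* Read and reset the counter; returns 0 on success, -1 on error. */
int efd_consume(int fd, uint64_t *val)
{
	return read(fd, val, sizeof(*val)) == (ssize_t)sizeof(*val) ? 0 : -1;
}
```

With a descriptor from eventfd(0, EFD_NONBLOCK), signaling 0x80 and reading
yields 0x80, but signaling 0x80 and then 0x81 without an intervening read
yields a single read of 0x101.  That coalescing is why a user treating the
counter as a notification value has to serialize signals against reads.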

As I understand the ACPI notification proposal that Grzegorz links
below, a notification with an interpreted value allows for a more
direct userspace implementation when dealing with a series of discrete
notification-with-value events.  Thanks,

Alex

> > I have no skin in the game aside from having to drop this conversion
> > which I'm fine to do if there are actually users for this, but really,
> > that looks a lot like abusing an API that really wasn't designed for
> > this.  
> 
> https://patchwork.kernel.org/project/kvm/patch/20230307220553.631069-1-...@semihalf.com/
> was posted at the beginning of March and one of the main things we've
> discussed was the mechanism for propagating the ACPI notification value.
> We ended up with eventfd as the best mechanism and have actually been
> using it since v2. I really do not want to waste this effort; I think
> we are quite advanced with v6 now. Additionally, we didn't actually
> modify any part of the eventfd support that was in place, we only used it
> in a specific (and previously discussed) way.


Re: [PATCH 0/2] eventfd: simplify signal helpers

2023-07-13 Thread Alex Williamson
On Thu, 13 Jul 2023 12:05:36 +0200
Christian Brauner  wrote:

> Hey everyone,
> 
> This simplifies the eventfd_signal() and eventfd_signal_mask() helpers
> by removing the count argument which is effectively unused.

We have a patch under review which does in fact make use of the
signaling value:

https://lore.kernel.org/all/20230630155936.3015595-1-...@semihalf.com/

Thanks,
Alex



Re: vIOMMU gIOVA to HPA mapping

2022-11-14 Thread Alex Williamson
On Mon, 14 Nov 2022 20:31:49 +0800
"leo...@tom.com"  wrote:

> Hi,
> Here is my application scenario:
> 1. The NIC (Network Interface Card) is passed through to the VM (Virtual
> Machine);
> 2. The VM uses the user mode driver DPDK;
> 
> Question:
> 1. vIOMMU maintains the mapping gIOVA->gPA.  When do you use this gPA?

QEMU in the host derives the hVA from the gPA.  The vIOMMU driver in
QEMU is triggering the gIOVA to hVA mapping through vfio in the host.

> 2. Does the physical IOMMU maintain the gIOVA->hPA mapping?  If so, by what
> means is the gIOVA->hPA mapping established?

As above, the vIOMMU in the guest provides gIOVA -> gPA, in QEMU we do
the gPA -> hVA, then vfio in the host kernel performs hVA -> hPA via
page pinning.
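
The chain above can be sketched as three table lookups.  This is only a toy
model: the single-range tables, the addresses and the function names are
invented for illustration, and the real translations are per-page and are
handled by the vIOMMU emulation, QEMU's memory layout, and vfio page pinning
respectively.

```c
#include <assert.h>
#include <stdint.h>

struct range_map {
	uint64_t from;	/* start of the input range */
	uint64_t to;	/* start of the output range */
	uint64_t len;	/* length of the range in bytes */
};

/* Translate 'addr' through one mapping; returns 0 and sets *out on a hit. */
int translate(const struct range_map *m, uint64_t addr, uint64_t *out)
{
	if (addr < m->from || addr - m->from >= m->len)
		return -1;
	*out = m->to + (addr - m->from);
	return 0;
}

/* gIOVA -> gPA (guest vIOMMU), gPA -> hVA (QEMU), hVA -> hPA (vfio pinning). */
int giova_to_hpa(const struct range_map *viommu, const struct range_map *qemu,
		 const struct range_map *vfio, uint64_t giova, uint64_t *hpa)
{
	uint64_t gpa, hva;

	if (translate(viommu, giova, &gpa) ||
	    translate(qemu, gpa, &hva) ||
	    translate(vfio, hva, hpa))
		return -1;
	return 0;
}
```

The composition of the three steps is what ends up programmed into the
physical IOMMU as a gIOVA -> hPA entry; the intermediate gPA and hVA are
only used on the way there.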

> 3. What does QEMU do in NIC pass-through address translation ?

The guest visible vIOMMU triggers MemoryListener notifications in QEMU
for the device address space, which insert and remove mappings to the
vfio layer below it.  Thanks,

Alex


Re: [PATCH v5 0/5] cover-letter: Simplify vfio_iommu_type1 attach/detach routine

2022-07-06 Thread Alex Williamson
On Wed, 6 Jul 2022 10:53:52 -0700
Nicolin Chen  wrote:

> On Wed, Jul 06, 2022 at 11:42:17AM -0600, Alex Williamson wrote:
> 
> > On Fri, 1 Jul 2022 14:44:50 -0700
> > Nicolin Chen  wrote:
> >   
> > > This is a preparatory series for IOMMUFD v2 patches. It enforces error
> > > code -EMEDIUMTYPE in iommu_attach_device() and iommu_attach_group() when
> > > an IOMMU domain and a device/group are incompatible. It also drops the
> > > useless domain->ops check since it won't fail in current environment.
> > >
> > > These allow VFIO iommu code to simplify its group attachment routine, by
> > > avoiding the extra IOMMU domain allocations and attach/detach sequences
> > > of the old code.
> > >
> > > Worth mentioning: the exact match for enforce_cache_coherency is removed
> > > with this series, since there's very little value in doing that as KVM will
> > > not be able to take advantage of it -- this just wastes domain memory.
> > > Instead, we rely on Intel IOMMU driver taking care of that internally.
> > >
> > > This is on github:
> > > https://github.com/nicolinc/iommufd/commits/vfio_iommu_attach  
> > 
> > How do you foresee this going in?  I'm imagining Joerg would merge the
> > first patch via the IOMMU tree and provide a topic branch that I'd
> > merge into the vfio tree along with the remaining patches.  Sound
> > right?  Thanks,  
> 
> We don't have any build dependency between the IOMMU change and
> VFIO changes; yet, without the IOMMU one, any iommu_attach_group()
> failure now would be a hard failure without a chance of falling back
> to a new_domain, which is slightly different from the current flow.
> 
> For a potential existing use case that relies on reusing an existing
> domain, I think it'd be safer to have Joerg ack the first change
> so you merge them all? Thanks!

Works for me, I'll look for buy-in + ack from Joerg.  Thanks,

Alex
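
For readers following along, the fallback behaviour described in the quoted
text can be sketched with a toy model.  The types and function names below
are hypothetical stand-ins, not the vfio_iommu_type1 code: toy_attach()
plays the role of iommu_attach_group(), and a negative domain type simply
simulates an unrelated failure.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct toy_domain {
	int type;	/* stands in for whatever makes a domain compatible */
};

/* Stand-in for iommu_attach_group(): -EMEDIUMTYPE means "incompatible". */
int toy_attach(const struct toy_domain *d, int group_type)
{
	if (d->type < 0)
		return -ENOMEM;		/* simulate an unrelated failure */
	return d->type == group_type ? 0 : -EMEDIUMTYPE;
}

/*
 * Returns the index of the existing domain the group was attached to, 'n'
 * if a new domain would have to be allocated, or a negative errno on a
 * hard failure.
 */
int toy_attach_group(const struct toy_domain *domains, size_t n, int group_type)
{
	size_t i;

	for (i = 0; i < n; i++) {
		int ret = toy_attach(&domains[i], group_type);

		if (ret == 0)
			return (int)i;		/* reuse an existing domain */
		if (ret != -EMEDIUMTYPE)
			return ret;		/* any other error is fatal */
	}
	return (int)n;				/* fall back to a new domain */
}
```

The point of the dedicated error code in this model is that only
-EMEDIUMTYPE lets the loop move on to the next candidate; without it, the
caller cannot tell an incompatible domain from a real failure.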



Re: [PATCH v5 0/5] cover-letter: Simplify vfio_iommu_type1 attach/detach routine

2022-07-06 Thread Alex Williamson
On Fri, 1 Jul 2022 14:44:50 -0700
Nicolin Chen  wrote:

> This is a preparatory series for IOMMUFD v2 patches. It enforces error
> code -EMEDIUMTYPE in iommu_attach_device() and iommu_attach_group() when
> an IOMMU domain and a device/group are incompatible. It also drops the
> useless domain->ops check since it won't fail in current environment.
> 
> These allow VFIO iommu code to simplify its group attachment routine, by
> avoiding the extra IOMMU domain allocations and attach/detach sequences
> of the old code.
> 
> Worth mentioning: the exact match for enforce_cache_coherency is removed
> with this series, since there's very little value in doing that as KVM will
> not be able to take advantage of it -- this just wastes domain memory.
> Instead, we rely on Intel IOMMU driver taking care of that internally.
> 
> This is on github:
> https://github.com/nicolinc/iommufd/commits/vfio_iommu_attach

How do you foresee this going in?  I'm imagining Joerg would merge the
first patch via the IOMMU tree and provide a topic branch that I'd
merge into the vfio tree along with the remaining patches.  Sound
right?  Thanks,

Alex

 
> Changelog
> v5:
>  * Rebased on top of Robin's "Simplify bus_type determination".
>  * Fixed a wrong change returning -EMEDIUMTYPE in arm-smmu driver.
>  * Added Baolu's "Reviewed-by".
> v4:
>  * Dropped -EMEDIUMTYPE change in mtk_v1 driver per Robin's input
>  * Added Baolu's and Kevin's Reviewed-by lines
> v3: https://lore.kernel.org/kvm/20220623200029.26007-1-nicol...@nvidia.com/
>  * Dropped all dev_err since -EMEDIUMTYPE clearly indicates what error.
>  * Updated commit message of enforce_cache_coherency removing patch.
>  * Updated commit message of domain->ops removing patch.
>  * Replaced "goto out_unlock" with simply mutex_unlock() and return.
>  * Added a line of comments for -EMEDIUMTYPE return check.
>  * Moved iommu_get_msi_cookie() into alloc_attach_domain() as a cookie
>    should be logically tied to the lifetime of a domain itself.
>  * Added Kevin's "Reviewed-by".
> v2: https://lore.kernel.org/kvm/20220616000304.23890-1-nicol...@nvidia.com/
>  * Added -EMEDIUMTYPE to more IOMMU drivers that fit the category.
>  * Changed dev_err to dev_dbg for -EMEDIUMTYPE to avoid kernel log spam.
>  * Dropped iommu_ops patch, and removed domain->ops in VFIO directly,
>    since there's no mixed-driver use case that would fail the sanity.
>  * Updated commit log of the patch removing enforce_cache_coherency.
>  * Fixed a misplaced "num_non_pinned_groups--" in detach_group patch.
>  * Moved "num_non_pinned_groups++" in PATCH-5 to the common path between
>    domain-reusing and new-domain pathways, like the code previously did.
>  * Fixed a typo in EMEDIUMTYPE patch.
> v1: https://lore.kernel.org/kvm/20220606061927.26049-1-nicol...@nvidia.com/
> 
> Jason Gunthorpe (1):
>   vfio/iommu_type1: Prefer to reuse domains vs match enforced cache
> coherency
> 
> Nicolin Chen (4):
>   iommu: Return -EMEDIUMTYPE for incompatible domain and device/group
>   vfio/iommu_type1: Remove the domain->ops comparison
>   vfio/iommu_type1: Clean up update_dirty_scope in detach_group()
>   vfio/iommu_type1: Simplify group attachment
> 
>  drivers/iommu/amd/iommu.c   |   2 +-
>  drivers/iommu/apple-dart.c  |   4 +-
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c |  15 +-
>  drivers/iommu/arm/arm-smmu/arm-smmu.c   |   5 +-
>  drivers/iommu/arm/arm-smmu/qcom_iommu.c |   9 +-
>  drivers/iommu/intel/iommu.c |  10 +-
>  drivers/iommu/iommu.c   |  28 ++
>  drivers/iommu/ipmmu-vmsa.c  |   4 +-
>  drivers/iommu/omap-iommu.c  |   3 +-
>  drivers/iommu/s390-iommu.c  |   2 +-
>  drivers/iommu/sprd-iommu.c  |   6 +-
>  drivers/iommu/tegra-gart.c  |   2 +-
>  drivers/iommu/virtio-iommu.c|   3 +-
>  drivers/vfio/vfio_iommu_type1.c | 352 ++--
>  14 files changed, 229 insertions(+), 216 deletions(-)
> 



Re: [PATCH v2 2/5] vfio/iommu_type1: Prefer to reuse domains vs match enforced cache coherency

2022-06-21 Thread Alex Williamson
On Wed, 15 Jun 2022 17:03:01 -0700
Nicolin Chen  wrote:

> From: Jason Gunthorpe 
> 
> The KVM mechanism for controlling wbinvd is based on OR of the coherency
> property of all devices attached to a guest, no matter whether those devices are
> attached to a single domain or multiple domains.
> 
> So, there is no value in trying to push a device that could do enforced
> cache coherency to a dedicated domain vs re-using an existing domain
> which is non-coherent since KVM won't be able to take advantage of it.
> This just wastes domain memory.
> 
> Simplify this code and eliminate the test. This removes the only logic
> that needed to have a dummy domain attached prior to searching for a
> matching domain and simplifies the next patches.
> 
> It's unclear whether we want to further optimize the Intel driver to
> update the domain coherency after a device is detached from it, at
> least not before KVM can be verified to handle such dynamics in related
> emulation paths (wbinvd, vcpu load, write_cr0, ept, etc.). In reality
> we don't see a usage requiring such optimization, as the only device
> which imposes such non-coherency is the Intel GPU, which doesn't even
> support hotplug/hot remove.

The 2nd paragraph above is quite misleading in this respect.  I think
it would be more accurate to explain that the benefit to using separate
domains was that devices attached to domains supporting enforced cache
coherency always mapped with the attributes necessary to provide that
feature, therefore if a non-enforced domain was dropped, the associated
group removal would re-trigger an evaluation by KVM.  We can then go on
to discuss that in practice the only known cases of such mixed domains
included an Intel IGD device behind an IOMMU lacking snoop control,
where such devices do not support hotplug, therefore this scenario lacks
testing and is not considered sufficiently relevant to support.  Thanks,

Alex
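
As an illustration of the evaluation discussed above, here is a minimal userspace sketch in C. It is not kernel code: `struct model_domain` and `guest_needs_wbinvd_handling()` are hypothetical names made up for this example. It only assumes what the commit message states, namely that the KVM decision is an OR over the coherency property of everything attached to the guest, regardless of how devices are spread across domains.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a VFIO IOMMU domain; not the kernel struct. */
struct model_domain {
	bool enforce_cache_coherency;
};

/*
 * Model of the OR described above: the guest needs wbinvd handling as
 * soon as any attached domain cannot enforce cache coherency.
 */
static bool guest_needs_wbinvd_handling(const struct model_domain *d, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (!d[i].enforce_cache_coherency)
			return true;
	return false;
}
```

In this model a single non-enforcing domain decides the result for the whole guest, and dropping that domain changes the result only if something re-runs the evaluation, which is the re-trigger on group removal mentioned in the reply.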

> 
> Signed-off-by: Jason Gunthorpe 
> Signed-off-by: Nicolin Chen 
> ---
>  drivers/vfio/vfio_iommu_type1.c | 4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)
> 
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index c13b9290e357..f4e3b423a453 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -2285,9 +2285,7 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
>* testing if they're on the same bus_type.
>*/
>   list_for_each_entry(d, &iommu->domain_list, next) {
> - if (d->domain->ops == domain->domain->ops &&
> - d->enforce_cache_coherency ==
> - domain->enforce_cache_coherency) {
> + if (d->domain->ops == domain->domain->ops) {
>   iommu_detach_group(domain->domain, group->iommu_group);
>   if (!iommu_attach_group(d->domain,
>   group->iommu_group)) {



Re: [PATCH 1/5] iommu: Replace uses of IOMMU_CAP_CACHE_COHERENCY with dev_is_dma_coherent()

2022-04-07 Thread Alex Williamson
On Thu, 7 Apr 2022 12:23:31 -0300
Jason Gunthorpe  wrote:

> On Thu, Apr 07, 2022 at 04:17:11PM +0100, Robin Murphy wrote:
> 
> > For the specific case of overriding PCIe No Snoop (which is more problematic
> > from an Arm SMMU PoV) when assigning to a VM, would that not be easier
> > solved by just having vfio-pci clear the "Enable No Snoop" control bit in
> > the endpoint's PCIe capability?  
> 
> Ideally.
> 
> That was rediscussed recently, apparently there are non-compliant
> devices and drivers that just ignore the bit. 
> 
> Presumably this is why x86 had to move to an IOMMU enforced feature..

I considered this option when implementing the current solution, but
ultimately I didn't have confidence in being able to prevent drivers
from using device specific means to effect the change anyway.  GPUs
especially have various back channels to config space.  Thanks,

Alex



Re: [PATCH 3/5] iommu: Introduce the domain op enforce_cache_coherency()

2022-04-05 Thread Alex Williamson
On Tue,  5 Apr 2022 13:16:02 -0300
Jason Gunthorpe  wrote:

> This new mechanism will replace using IOMMU_CAP_CACHE_COHERENCY and
> IOMMU_CACHE to control the no-snoop blocking behavior of the IOMMU.
> 
> Currently only Intel and AMD IOMMUs are known to support this
> feature. They both implement it as an IOPTE bit, that when set, will cause
> PCIe TLPs to that IOVA with the no-snoop bit set to be treated as though
> the no-snoop bit was clear.
> 
> The new API is triggered by calling enforce_cache_coherency() before
> mapping any IOVA to the domain which globally switches on no-snoop
> blocking. This allows other implementations that might block no-snoop
> globally and outside the IOPTE - AMD also documents such an HW capability.
> 
> Leave AMD out of sync with Intel and have it block no-snoop even for
> in-kernel users. This can be trivially resolved in a follow up patch.
> 
> Only VFIO will call this new API.
> 
> Signed-off-by: Jason Gunthorpe 
> ---
>  drivers/iommu/amd/iommu.c   |  7 +++
>  drivers/iommu/intel/iommu.c | 14 +-
>  include/linux/intel-iommu.h |  1 +
>  include/linux/iommu.h   |  4 
>  4 files changed, 25 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
> index a1ada7bff44e61..e500b487eb3429 100644
> --- a/drivers/iommu/amd/iommu.c
> +++ b/drivers/iommu/amd/iommu.c
> @@ -2271,6 +2271,12 @@ static int amd_iommu_def_domain_type(struct device *dev)
>   return 0;
>  }
>  
> +static bool amd_iommu_enforce_cache_coherency(struct iommu_domain *domain)
> +{
> + /* IOMMU_PTE_FC is always set */
> + return true;
> +}
> +
>  const struct iommu_ops amd_iommu_ops = {
>   .capable = amd_iommu_capable,
>   .domain_alloc = amd_iommu_domain_alloc,
> @@ -2293,6 +2299,7 @@ const struct iommu_ops amd_iommu_ops = {
>   .flush_iotlb_all = amd_iommu_flush_iotlb_all,
>   .iotlb_sync = amd_iommu_iotlb_sync,
>   .free   = amd_iommu_domain_free,
> + .enforce_cache_coherency = amd_iommu_enforce_cache_coherency,
>   }
>  };
>  
> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> index df5c62ecf942b8..f08611a6cc4799 100644
> --- a/drivers/iommu/intel/iommu.c
> +++ b/drivers/iommu/intel/iommu.c
> @@ -4422,7 +4422,8 @@ static int intel_iommu_map(struct iommu_domain *domain,
>   prot |= DMA_PTE_READ;
>   if (iommu_prot & IOMMU_WRITE)
>   prot |= DMA_PTE_WRITE;
> - if ((iommu_prot & IOMMU_CACHE) && dmar_domain->iommu_snooping)
> + if (((iommu_prot & IOMMU_CACHE) && dmar_domain->iommu_snooping) ||
> + dmar_domain->enforce_no_snoop)
>   prot |= DMA_PTE_SNP;
>  
>   max_addr = iova + size;
> @@ -4545,6 +4546,16 @@ static phys_addr_t intel_iommu_iova_to_phys(struct iommu_domain *domain,
>   return phys;
>  }
>  
> +static bool intel_iommu_enforce_cache_coherency(struct iommu_domain *domain)
> +{
> + struct dmar_domain *dmar_domain = to_dmar_domain(domain);
> +
> + if (!dmar_domain->iommu_snooping)
> + return false;
> + dmar_domain->enforce_no_snoop = true;
> + return true;
> +}

Don't we have issues if we try to set DMA_PTE_SNP on DMARs that don't
support it, ie. reserved register bit set in pte faults?  It seems
really inconsistent here that I could make a domain that supports
iommu_snooping, set enforce_no_snoop = true, then add another DMAR to
the domain that may not support iommu_snooping, I'd get false on the
subsequent enforcement test, but the dmar_domain is still trying to use
DMA_PTE_SNP.

There's also a disconnect, maybe just in the naming or documentation,
but if I call enforce_cache_coherency for a domain, that seems like the
domain should retain those semantics regardless of how it's modified,
ie. "enforced".  For example, if I tried to perform the above operation,
I should get a failure attaching the device that brings in the less
capable DMAR because the domain has been set to enforce this feature.

If the API is that I need to re-enforce_cache_coherency on every
modification of the domain, shouldn't dmar_domain->enforce_no_snoop
also return to a default value on domain changes?

Maybe this should be something like set_no_snoop_squashing with the
above semantics, it needs to be re-applied whenever the domain:device
composition changes?  Thanks,

Alex
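
The "enforced" semantics asked for above can be sketched as a small userspace model in C. Everything here is hypothetical and for illustration only: `struct model_dmar_domain`, `model_enforce_cache_coherency()` and `model_attach()` are invented names, not the Intel driver's API. The sketch assumes that enforcement, once requested, is sticky and that attaching a device behind an IOMMU unit without snoop control is refused instead of silently weakening the domain.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of a domain; field names are illustrative only. */
struct model_dmar_domain {
	bool all_units_snoop;	/* every attached IOMMU unit has snoop control */
	bool enforce_no_snoop;	/* set once enforcement has been requested */
};

/* Request enforcement; fails if the current composition cannot provide it. */
static bool model_enforce_cache_coherency(struct model_dmar_domain *d)
{
	if (!d->all_units_snoop)
		return false;
	d->enforce_no_snoop = true;
	return true;
}

/*
 * Attach a device behind an IOMMU unit. With sticky semantics, a unit
 * lacking snoop control is refused once enforcement is on, so the domain
 * never ends up setting a PTE bit that the unit treats as reserved.
 */
static int model_attach(struct model_dmar_domain *d, bool unit_supports_snoop)
{
	if (d->enforce_no_snoop && !unit_supports_snoop)
		return -EINVAL;
	if (!unit_supports_snoop)
		d->all_units_snoop = false;
	return 0;
}
```

The alternative raised in the reply, re-applying the setting on every change of domain composition, would instead clear `enforce_no_snoop` inside `model_attach()` and leave it to the caller to ask again.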

> +
>  static bool intel_iommu_capable(enum iommu_cap cap)
>  {
>   if (cap == IOMMU_CAP_CACHE_COHERENCY)
> @@ -4898,6 +4909,7 @@ const struct iommu_ops intel_iommu_ops = {
>   .iotlb_sync = intel_iommu_tlb_sync,
>   .iova_to_phys   = intel_iommu_iova_to_phys,
>   .free   = intel_iommu_domain_free,
> + .enforce_cache_coherency = intel_iommu_enforce_cache_coherency,
>   }
>  };
>  
> diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
> index 2f9891cb3d0014..1f930c0c225d94 

Re: [PATCH 2/5] vfio: Require that devices support DMA cache coherence

2022-04-05 Thread Alex Williamson
On Tue,  5 Apr 2022 13:16:01 -0300
Jason Gunthorpe  wrote:

> dev_is_dma_coherent() is the control to determine if IOMMU_CACHE can be
> supported.
> 
> IOMMU_CACHE means that normal DMAs do not require any additional coherency
> mechanism and is the basic uAPI that VFIO exposes to userspace. For
> instance VFIO applications like DPDK will not work if additional coherency
> operations are required.
> 
> Therefore check dev_is_dma_coherent() before allowing a device to join a
> domain. This will block device/platform/iommu combinations from using VFIO
> that do not support cache coherent DMA.
> 
> Signed-off-by: Jason Gunthorpe 
> ---
>  drivers/vfio/vfio.c | 6 ++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
> index a4555014bd1e72..2a3aa3e742d943 100644
> --- a/drivers/vfio/vfio.c
> +++ b/drivers/vfio/vfio.c
> @@ -32,6 +32,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include "vfio.h"
>  
>  #define DRIVER_VERSION   "0.3"
> @@ -1348,6 +1349,11 @@ static int vfio_group_get_device_fd(struct vfio_group *group, char *buf)
>   if (IS_ERR(device))
>   return PTR_ERR(device);
>  
> + if (group->type == VFIO_IOMMU && !dev_is_dma_coherent(device->dev)) {
> + ret = -ENODEV;
> + goto err_device_put;
> + }
> +

Failing at the point where the user is trying to gain access to the
device seems a little late in the process and opaque, wouldn't we
rather have vfio bus drivers fail to probe such devices?  I'd expect
this to occur in the vfio_register_group_dev() path.  Thanks,

Alex

>   if (!try_module_get(device->dev->driver->owner)) {
>   ret = -ENODEV;
>   goto err_device_put;



Re: [RFC 3/3] vfio: Share the KVM instance with Vdmabuf

2021-01-19 Thread Alex Williamson
On Wed, 20 Jan 2021 03:05:49 +
"Tian, Kevin"  wrote:

> > From: Alex Williamson
> > Sent: Wednesday, January 20, 2021 8:51 AM
> > 
> > On Wed, 20 Jan 2021 00:14:49 +
> > "Kasireddy, Vivek"  wrote:
> >   
> > > Hi Alex,
> > >  
> > > > -Original Message-
> > > > From: Alex Williamson 
> > > > Sent: Tuesday, January 19, 2021 7:40 AM
> > > > To: Kasireddy, Vivek 
> > > > Cc: virtualization@lists.linux-foundation.org; Kim, Dongwon  
> >   
> > > > Subject: Re: [RFC 3/3] vfio: Share the KVM instance with Vdmabuf
> > > >
> > > > On Tue, 19 Jan 2021 00:28:12 -0800
> > > > Vivek Kasireddy  wrote:
> > > >  
> > > > > Getting a copy of the KVM instance is necessary for mapping Guest
> > > > > pages in the Host.
> > > > >
> > > > > TODO: Instead of invoking the symbol directly, there needs to be a
> > > > > better way of getting a copy of the KVM instance probably by using
> > > > > other notifiers. However, currently, KVM shares its instance only
> > > > > with VFIO and therefore we are compelled to bind the passthrough'd
> > > > > device to vfio-pci.  
> > > >
> > > > Yeah, this is a bad solution, sorry, vfio is not going to gratuitously
> > > > call out to vhost to share a kvm pointer.  I'd prefer to get rid of
> > > > vfio having any knowledge or visibility of the kvm pointer.  Thanks,  
> > >
> > > > [Kasireddy, Vivek] I agree that this is definitely not ideal as I
> > > > recognize it in the TODO. However, it looks like VFIO also gets a copy
> > > > of the KVM pointer in a similar manner:
> > >
> > > virt/kvm/vfio.c
> > >
> > > > static void kvm_vfio_group_set_kvm(struct vfio_group *group, struct kvm *kvm)
> > > {
> > > void (*fn)(struct vfio_group *, struct kvm *);
> > >
> > > fn = symbol_get(vfio_group_set_kvm);
> > > if (!fn)
> > > return;
> > >
> > > fn(group, kvm);
> > >
> > > symbol_put(vfio_group_set_kvm);
> > > }  
> > 
> > You're equating the mechanism with the architecture.  We use symbols
> > here to avoid module dependencies between kvm and vfio, but this is
> > just propagating data that userspace is specifically registering
> > between kvm and vfio.  vhost doesn't get to piggyback on that channel.
> >   
> > > > With this patch, I am not suggesting that this is a precedent that
> > > > should be followed but it appears there doesn't seem to be an
> > > > alternative way of getting a copy of the KVM pointer that is clean and
> > > > elegant -- unless I have not looked hard enough. I guess we could
> > > > create a notifier chain with callbacks for VFIO and Vhost that KVM
> > > > would call but this would mean modifying KVM.
> > > >
> > > > Also, if I understand correctly, if VFIO does not want to share the KVM
> > > > pointer with VFIO groups, then I think it would break stuff like mdev
> > > > which counts on it.
> > 
> > Only kvmgt requires the kvm pointer and the use case there is pretty
> > questionable, I wonder if it actually still exists now that we have the
> > DMA r/w interface through vfio.  Thanks,
> >   
> 
> IIRC, kvmgt still needs the kvm pointer to use kvm page tracking interface 
> for write-protecting guest pgtable.

Thanks, Kevin.  Either way, a vhost device has no stake in the game wrt
the kvm pointer lifecycle here and no business adding a callout.  I'm
reluctant to add any further use cases even for mdevs as ideally mdevs
should have no dependency on kvm.  Thanks,

Alex



Re: [RFC 3/3] vfio: Share the KVM instance with Vdmabuf

2021-01-19 Thread Alex Williamson
On Wed, 20 Jan 2021 00:14:49 +
"Kasireddy, Vivek"  wrote:

> Hi Alex,
> 
> > -Original Message-
> > From: Alex Williamson 
> > Sent: Tuesday, January 19, 2021 7:40 AM
> > To: Kasireddy, Vivek 
> > Cc: virtualization@lists.linux-foundation.org; Kim, Dongwon 
> > 
> > Subject: Re: [RFC 3/3] vfio: Share the KVM instance with Vdmabuf
> > 
> > On Tue, 19 Jan 2021 00:28:12 -0800
> > Vivek Kasireddy  wrote:
> >   
> > > Getting a copy of the KVM instance is necessary for mapping Guest
> > > pages in the Host.
> > >
> > > TODO: Instead of invoking the symbol directly, there needs to be a
> > > better way of getting a copy of the KVM instance probably by using
> > > other notifiers. However, currently, KVM shares its instance only
> > > with VFIO and therefore we are compelled to bind the passthrough'd
> > > device to vfio-pci.  
> > 
> > Yeah, this is a bad solution, sorry, vfio is not going to gratuitously
> > call out to vhost to share a kvm pointer.  I'd prefer to get rid of
> > vfio having any knowledge or visibility of the kvm pointer.  Thanks,  
> 
> [Kasireddy, Vivek] I agree that this is definitely not ideal as I recognize
> it in the TODO. However, it looks like VFIO also gets a copy of the KVM
> pointer in a similar manner:
> 
> virt/kvm/vfio.c
> 
> static void kvm_vfio_group_set_kvm(struct vfio_group *group, struct kvm *kvm)
> {
> void (*fn)(struct vfio_group *, struct kvm *);
> 
> fn = symbol_get(vfio_group_set_kvm);
> if (!fn)
> return;
> 
> fn(group, kvm);
> 
> symbol_put(vfio_group_set_kvm);
> }

You're equating the mechanism with the architecture.  We use symbols
here to avoid module dependencies between kvm and vfio, but this is
just propagating data that userspace is specifically registering
between kvm and vfio.  vhost doesn't get to piggyback on that channel.

> With this patch, I am not suggesting that this is a precedent that should be
> followed but it appears there doesn't seem to be an alternative way of
> getting a copy of the KVM pointer that is clean and elegant -- unless I have
> not looked hard enough. I guess we could create a notifier chain with
> callbacks for VFIO and Vhost that KVM would call but this would mean
> modifying KVM.
>
> Also, if I understand correctly, if VFIO does not want to share the KVM
> pointer with VFIO groups, then I think it would break stuff like mdev which
> counts on it.

Only kvmgt requires the kvm pointer and the use case there is pretty
questionable, I wonder if it actually still exists now that we have the
DMA r/w interface through vfio.  Thanks,

Alex



Re: [RFC 3/3] vfio: Share the KVM instance with Vdmabuf

2021-01-19 Thread Alex Williamson
On Tue, 19 Jan 2021 00:28:12 -0800
Vivek Kasireddy  wrote:

> Getting a copy of the KVM instance is necessary for mapping Guest
> pages in the Host.
> 
> TODO: Instead of invoking the symbol directly, there needs to be a
> better way of getting a copy of the KVM instance probably by using
> other notifiers. However, currently, KVM shares its instance only
> with VFIO and therefore we are compelled to bind the passthrough'd
> device to vfio-pci.

Yeah, this is a bad solution, sorry, vfio is not going to gratuitously
call out to vhost to share a kvm pointer.  I'd prefer to get rid of
vfio having any knowledge or visibility of the kvm pointer.  Thanks,

Alex
 
> Signed-off-by: Vivek Kasireddy 
> ---
>  drivers/vfio/vfio.c | 9 +
>  1 file changed, 9 insertions(+)
> 
> diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
> index 4ad8a35667a7..9fb11b1ad3cd 100644
> --- a/drivers/vfio/vfio.c
> +++ b/drivers/vfio/vfio.c
> @@ -2213,11 +2213,20 @@ static int vfio_unregister_iommu_notifier(struct vfio_group *group,
>   return ret;
>  }
>  
> +extern void vhost_vdmabuf_get_kvm(unsigned long action, void *data);
>  void vfio_group_set_kvm(struct vfio_group *group, struct kvm *kvm)
>  {
> + void (*fn)(unsigned long, void *);
> +
>   group->kvm = kvm;
>   blocking_notifier_call_chain(&group->notifier,
>   VFIO_GROUP_NOTIFY_SET_KVM, kvm);
> +
> + fn = symbol_get(vhost_vdmabuf_get_kvm);
> + if (fn) {
> + fn(VFIO_GROUP_NOTIFY_SET_KVM, kvm);
> + symbol_put(vhost_vdmabuf_get_kvm);
> + }
>  }
>  EXPORT_SYMBOL_GPL(vfio_group_set_kvm);
>  



Re: [PATCH V2 2/6] kvm: detect assigned device via irqbypass manager

2020-07-17 Thread Alex Williamson
On Thu, 16 Jul 2020 19:23:45 +0800
Zhu Lingshan  wrote:

> vDPA devices have dedicated backing hardware, like
> passthrough devices. It is therefore possible to set up irq
> offloading to vCPUs for vDPA devices. Thus this patch tries to
> manipulate the assigned device counters via the irqbypass manager.
> 
> We will increase/decrease the assigned device counter in kvm/x86.
> Both vDPA and VFIO would go through this code path.
> 
> This code path only affects x86 for now.
> 
> Signed-off-by: Zhu Lingshan 
> Suggested-by: Jason Wang 
> ---
>  arch/x86/kvm/x86.c | 10 --
>  1 file changed, 8 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 00c88c2..20c07d3 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -10624,11 +10624,17 @@ int kvm_arch_irq_bypass_add_producer(struct irq_bypass_consumer *cons,
>  {
>   struct kvm_kernel_irqfd *irqfd =
>   container_of(cons, struct kvm_kernel_irqfd, consumer);
> + int ret;
>  
>   irqfd->producer = prod;
> + kvm_arch_start_assignment(irqfd->kvm);
> + ret = kvm_x86_ops.update_pi_irte(irqfd->kvm,
> +  prod->irq, irqfd->gsi, 1);
> +
> + if (ret)
> + kvm_arch_end_assignment(irqfd->kvm);
>  
> - return kvm_x86_ops.update_pi_irte(irqfd->kvm,
> -prod->irq, irqfd->gsi, 1);
> + return ret;
>  }
>  
>  void kvm_arch_irq_bypass_del_producer(struct irq_bypass_consumer *cons,


Why isn't there a matching end-assignment in the del_producer path?  It
seems this only goes one way; what happens when a device is
hot-unplugged from the VM or the device interrupt configuration changes?
This will still break vfio if it's not guaranteed to be symmetric.
Thanks,

Alex
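
The symmetry concern can be made concrete with a small userspace model in C. This is a sketch under assumptions, not the KVM code: `struct model_kvm` and the `model_*` functions are hypothetical names, and the `update_ret` parameter simply stands in for the return value of the posted-interrupt update in the quoted hunk. The del path shown here is the part the patch lacks.

```c
/* Hypothetical model of the per-VM assigned device counter. */
struct model_kvm {
	int assigned_device_count;
};

static void model_start_assignment(struct model_kvm *kvm)
{
	kvm->assigned_device_count++;
}

static void model_end_assignment(struct model_kvm *kvm)
{
	kvm->assigned_device_count--;
}

/* Mirrors the quoted add path: count first, undo if the update fails. */
static int model_add_producer(struct model_kvm *kvm, int update_ret)
{
	model_start_assignment(kvm);
	if (update_ret)
		model_end_assignment(kvm);
	return update_ret;
}

/* The matching decrement that the review asks about. */
static void model_del_producer(struct model_kvm *kvm)
{
	model_end_assignment(kvm);
}
```

Without the decrement in the del path, every hot-unplug or interrupt reconfiguration would leave the counter one higher than the number of devices actually assigned.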



Re: [PATCH 2/7] kvm/vfio: detect assigned device via irqbypass manager

2020-07-12 Thread Alex Williamson
On Sun, 12 Jul 2020 22:49:21 +0800
Zhu Lingshan  wrote:

> We used to detect assigned devices via VFIO-manipulated device
> counters. This is less flexible considering that VFIO is not the only
> interface for assigned devices. vDPA devices have dedicated
> backing hardware as well. So this patch tries to detect
> assigned devices via the irqbypass manager.
> 
> We will increase/decrease the assigned device counter in kvm/x86.
> Both vDPA and VFIO would go through this code path.
> 
> This code path only affects x86 for now.

No it doesn't, it only adds VFIO support to x86, but it removes it from
architecture neutral code.  Also a VFIO device does not necessarily
make use of the irqbypass manager, this depends on platform support and
enablement of this feature.   Therefore, NAK.  Thanks,

Alex
 
> Signed-off-by: Zhu Lingshan 
> ---
>  arch/x86/kvm/x86.c | 10 --
>  virt/kvm/vfio.c|  2 --
>  2 files changed, 8 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 00c88c2..20c07d3 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -10624,11 +10624,17 @@ int kvm_arch_irq_bypass_add_producer(struct irq_bypass_consumer *cons,
>  {
>   struct kvm_kernel_irqfd *irqfd =
>   container_of(cons, struct kvm_kernel_irqfd, consumer);
> + int ret;
>  
>   irqfd->producer = prod;
> + kvm_arch_start_assignment(irqfd->kvm);
> + ret = kvm_x86_ops.update_pi_irte(irqfd->kvm,
> +  prod->irq, irqfd->gsi, 1);
> +
> + if (ret)
> + kvm_arch_end_assignment(irqfd->kvm);
>  
> - return kvm_x86_ops.update_pi_irte(irqfd->kvm,
> -prod->irq, irqfd->gsi, 1);
> + return ret;
>  }
>  
>  void kvm_arch_irq_bypass_del_producer(struct irq_bypass_consumer *cons,
> diff --git a/virt/kvm/vfio.c b/virt/kvm/vfio.c
> index 8fcbc50..111da52 100644
> --- a/virt/kvm/vfio.c
> +++ b/virt/kvm/vfio.c
> @@ -226,7 +226,6 @@ static int kvm_vfio_set_group(struct kvm_device *dev, long attr, u64 arg)
>   list_add_tail(&kvg->node, &kv->group_list);
>   kvg->vfio_group = vfio_group;
>  
> - kvm_arch_start_assignment(dev->kvm);
>  
>   mutex_unlock(&kv->lock);
>  
> @@ -254,7 +253,6 @@ static int kvm_vfio_set_group(struct kvm_device *dev, long attr, u64 arg)
>   continue;
>  
>   list_del(&kvg->node);
> - kvm_arch_end_assignment(dev->kvm);
>  #ifdef CONFIG_SPAPR_TCE_IOMMU
>   kvm_spapr_tce_release_vfio_group(dev->kvm,
>kvg->vfio_group);



Re: [PATCH for QEMU v2] hw/vfio: Add VMD Passthrough Quirk

2020-05-13 Thread Alex Williamson
On Wed, 13 May 2020 19:26:34 +
"Derrick, Jonathan"  wrote:

> On Wed, 2020-05-13 at 11:55 -0600, Alex Williamson wrote:
> > On Wed, 13 May 2020 00:35:47 +
> > "Derrick, Jonathan"  wrote:
> >   
> > > Hi Alex,
> > > 
> > >   
> [snip]
> 
> > 
> > Thanks for the confirmation.  The approach seems ok, but a subtle
> > side-effect of MemoryRegion quirks is that we're going to trap the
> > entire PAGE_SIZE range overlapping the quirk out to QEMU on guest
> > access.  The two registers at 0x2000 might be reserved for this
> > purpose, but is there anything else that could be performance sensitive
> > anywhere in that page?  If so, it might be less transparent, but more
> > efficient to invent a new quirk making use of config space (ex. adding
> > an emulated vendor capability somewhere known to be unused on the
> > device).  Thanks,
> > 
> > Alex  
> 
> Seems like that could be a problem if we're running with huge pages and
> overlapping msix tables.

FWIW, MSI-X tables are already getting trapped into QEMU for emulation.
We have a mechanism via x-msix-relocation= in QEMU to
deal with that when it becomes a problem by emulating the MSI-X
structures in MMIO space that doesn't overlap with the actual device
(ie. virtually resizing or adding BARs).  The issue is what else can be
in that 4K page, or whether this device will be supported on archs where
the system page size is >4K, more so than huge pages (as in hugetlbfs or
transparent huge pages).  Thanks,

Alex
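
A short worked example of the page-granularity point, as a self-contained C sketch. The helper `trapped_range()` and `struct trap_range` are hypothetical names for illustration, not QEMU API; the 16-byte quirk size assumes two 64-bit registers at 0x2000, and the page size is assumed to be a power of two.

```c
#include <stdint.h>

/* Half-open range [start, end) of BAR offsets that get trapped. */
struct trap_range {
	uint64_t start;
	uint64_t end;
};

/*
 * Round a quirk at [off, off + len) out to whole pages, since anything
 * sharing a page with the quirk can no longer be directly mapped.
 * page_size must be a power of two.
 */
static struct trap_range trapped_range(uint64_t off, uint64_t len,
				       uint64_t page_size)
{
	struct trap_range r;

	r.start = off & ~(page_size - 1);
	r.end = (off + len + page_size - 1) & ~(page_size - 1);
	return r;
}
```

With 4K pages, `trapped_range(0x2000, 16, 0x1000)` covers 0x2000 to 0x3000, so only the page holding the quirk is affected. With 64K pages, `trapped_range(0x2000, 16, 0x10000)` covers 0x0 to 0x10000, which pulls the first 64K of the BAR, including anything performance sensitive in it, onto the trapped path.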



Re: [PATCH for QEMU v2] hw/vfio: Add VMD Passthrough Quirk

2020-05-13 Thread Alex Williamson
On Wed, 13 May 2020 00:35:47 +
"Derrick, Jonathan"  wrote:

> Hi Alex,
> 
> I'm probably not getting the translation technical details correct.
> 
> On Mon, 2020-05-11 at 16:59 -0600, Alex Williamson wrote:
> > On Mon, 11 May 2020 15:01:27 -0400
> > Jon Derrick  wrote:
> >   
> > > The VMD endpoint provides a real PCIe domain to the guest, including  
> > 
> > Please define VMD.  I'm sure this is obvious to many, but I've had to
> > do some research.  The best TL;DR summary I've found is Keith's
> > original commit 185a383ada2e adding the controller to Linux.  If there's
> > something better, please let me know.  
> That's the correct commit, but I'll try to summarize the important bits
> for v3.
> 
> >   
> > > bridges and endpoints. Because the VMD domain is enumerated by the guest
> > > kernel, the guest kernel will assign Guest Physical Addresses to the
> > > downstream endpoint BARs and bridge windows.
> > > 
> > > When the guest kernel performs MMIO to VMD sub-devices, IOMMU will
> > > translate from the guest address space to the physical address space.
> > > Because the bridges have been programmed with guest addresses, the
> > > bridges will reject the transaction containing physical addresses.  
> > 
> > I'm lost, what IOMMU is involved in CPU access to MMIO space?  My guess
> > is that since all MMIO of this domain is mapped behind the host
> > endpoint BARs 2 & 4 that QEMU simply accesses it via mapping of those
> > BARs into the VM, so it's the MMU, not the IOMMU performing those GPA
> > to HPA translations.  But then presumably the bridges within the domain
> > are scrambled because their apertures are programmed with ranges that
> > don't map into the VMD endpoint BARs.  Is that remotely correct?  Some
> > /proc/iomem output and/or lspci listing from the host to see how this
> > works would be useful.  
> Correct. So MMU not IOMMU.
> 
> In the guest kernel, the bridges and devices in the VMD domain are
> programmed with the addresses provided in the VMD endpoint's BAR2&4
> (MEMBAR1&2). Because these BARs are populated with guest addresses, MMU
> translates to host physical and the bridge window rejects MMIO not in
> its [GPA] range.
> 
> As an example:
> Host:
>   9400-97ff : :17:05.5
> 9400-97ff : VMD MEMBAR1
>   9400-943f : PCI Bus 1:01
> 9400-9400 : 1:01:00.0
> 9401-94013fff : 1:01:00.0
>   9401-94013fff : nvme
>   9440-947f : PCI Bus 1:01
>   9480-94bf : PCI Bus 1:02
> 9480-9480 : 1:02:00.0
> 9481-94813fff : 1:02:00.0
>   9481-94813fff : nvme
>   94c0-94ff : PCI Bus 1:02
> 
> 
> MEMBAR 2 is similarly assigned
> 
> >   
> > > VMD device 28C0 natively assists passthrough by providing the Host
> > > Physical Address in shadow registers accessible to the guest for bridge
> > > window assignment. The shadow registers are valid if bit 1 is set in VMD
> > > VMLOCK config register 0x70. Future VMDs will also support this feature.
> > > Existing VMDs have config register 0x70 reserved, and will return 0 on
> > > reads.  
> > 
> > So these shadow registers are simply exposing the host BAR2 & BAR4
> > addresses into the guest, so the quirk is dependent on reading those
> > values from the device before anyone has written to them and the BAR
> > emulation in the kernel kicks in (not a problem, just an observation).  
> It's not expected that there will be anything writing that resource and
> those registers are read-only.
> The first 0x2000 of MEMBAR2 (BAR4) contains MSI-X tables, and mappings to
> subordinate buses are 1MB aligned.
> 
> 
> > Does the VMD controller code then use these base addresses to program
> > the bridges/endpoint within the domain?  What does the same /proc/iomem
> > or lspci look like inside the guest then?  It seems like we'd see the
> > VMD endpoint with GPA BARs, but the devices within the domain using
> > HPAs.  If that's remotely true, and we're not forcing an identity
> > mapping of this HPA range into the GPA, does the vmd controller driver
> > impose a TRA function on these MMIO addresses in the guest?  
> 
> This is the guest with the guest addresses:
>   f800-fbff : :00:07.0
> f800-fbff : VMD MEMBAR1
>   f800-f83f : PCI Bus 1:01
> f800-f800 : 1:01:00.0
> f801-f8013fff : 1:01:00.0
>   f801000

Re: [PATCH for QEMU v2] hw/vfio: Add VMD Passthrough Quirk

2020-05-11 Thread Alex Williamson
On Mon, 11 May 2020 15:01:27 -0400
Jon Derrick  wrote:

> The VMD endpoint provides a real PCIe domain to the guest, including

Please define VMD.  I'm sure this is obvious to many, but I've had to
do some research.  The best TL;DR summary I've found is Keith's
original commit 185a383ada2e adding the controller to Linux.  If there's
something better, please let me know.

> bridges and endpoints. Because the VMD domain is enumerated by the guest
> kernel, the guest kernel will assign Guest Physical Addresses to the
> downstream endpoint BARs and bridge windows.
>
> When the guest kernel performs MMIO to VMD sub-devices, IOMMU will
> translate from the guest address space to the physical address space.
> Because the bridges have been programmed with guest addresses, the
> bridges will reject the transaction containing physical addresses.

I'm lost, what IOMMU is involved in CPU access to MMIO space?  My guess
is that since all MMIO of this domain is mapped behind the host
endpoint BARs 2 & 4 that QEMU simply accesses it via mapping of those
BARs into the VM, so it's the MMU, not the IOMMU performing those GPA
to HPA translations.  But then presumably the bridges within the domain
are scrambled because their apertures are programmed with ranges that
don't map into the VMD endpoint BARs.  Is that remotely correct?  Some
/proc/iomem output and/or lspci listing from the host to see how this
works would be useful.

> VMD device 28C0 natively assists passthrough by providing the Host
> Physical Address in shadow registers accessible to the guest for bridge
> window assignment. The shadow registers are valid if bit 1 is set in VMD
> VMLOCK config register 0x70. Future VMDs will also support this feature.
> Existing VMDs have config register 0x70 reserved, and will return 0 on
> reads.

So these shadow registers are simply exposing the host BAR2 & BAR4
addresses into the guest, so the quirk is dependent on reading those
values from the device before anyone has written to them and the BAR
emulation in the kernel kicks in (not a problem, just an observation).

Does the VMD controller code then use these base addresses to program
the bridges/endpoint within the domain?  What does the same /proc/iomem
or lspci look like inside the guest then?  It seems like we'd see the
VMD endpoint with GPA BARs, but the devices within the domain using
HPAs.  If that's remotely true, and we're not forcing an identity
mapping of this HPA range into the GPA, does the vmd controller driver
impose a TRA function on these MMIO addresses in the guest?

Sorry if I'm way off, I'm piecing things together from scant
information here.  Please Cc me on future vfio related patches.  Thanks,

Alex
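
As a rough illustration of the shadow-register scheme described in the quoted text, the following userspace sketch models the VMLOCK check and a bounds-checked read of the two emulated 64-bit host BAR addresses. It is not the QEMU quirk itself: the names vmd_shadow_valid() and vmd_shadow_read() are hypothetical, "bit 1" is assumed to use zero-based numbering (mask 0x2), and 'addr' is an offset relative to the start of the shadow area rather than to MEMBAR2:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* "bit 1" of VMLOCK, assuming zero-based bit numbering. */
#define VMD_VMLOCK_SHADOW_VALID (1u << 1)

/* True if the (native or emulated) VMLOCK value advertises valid shadows. */
static bool vmd_shadow_valid(uint32_t vmlock)
{
    return (vmlock & VMD_VMLOCK_SHADOW_VALID) != 0;
}

/*
 * Bounds-checked read from the emulated shadow area holding two 64-bit
 * host BAR addresses.  Out-of-range or oversized reads return 0.
 * Partial reads return bytes in host byte order.
 */
static uint64_t vmd_shadow_read(const uint64_t membar_phys[2],
                                uint64_t addr, unsigned size)
{
    const size_t total = 2 * sizeof(uint64_t);
    uint64_t val = 0;

    assert(membar_phys != NULL);

    if (size == 0 || size > sizeof(val) || addr >= total ||
        size > total - addr)
        return 0;

    memcpy(&val, (const uint8_t *)membar_phys + addr, size);
    return val;
}
```

A device whose register 0x70 is reserved reads back 0, so vmd_shadow_valid() reports false unless the quirk emulates the bit.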

 
> In order to support existing VMDs, this quirk emulates the VMLOCK and
> HPA shadow registers for all VMD device ids which don't natively assist
> with passthrough. The Linux VMD driver is updated to allow existing VMD
> devices to query VMLOCK for passthrough support.
> 
> Signed-off-by: Jon Derrick 
> ---
>  hw/vfio/pci-quirks.c | 103 +++
>  hw/vfio/pci.c|   7 +++
>  hw/vfio/pci.h|   2 +
>  hw/vfio/trace-events |   3 ++
>  4 files changed, 115 insertions(+)
> 
> diff --git a/hw/vfio/pci-quirks.c b/hw/vfio/pci-quirks.c
> index 2d348f8237..4060a6a95d 100644
> --- a/hw/vfio/pci-quirks.c
> +++ b/hw/vfio/pci-quirks.c
> @@ -1709,3 +1709,106 @@ free_exit:
>  
>  return ret;
>  }
> +
> +/*
> + * The VMD endpoint provides a real PCIe domain to the guest and the guest
> + * kernel performs enumeration of the VMD sub-device domain. Guest transactions
> + * to VMD sub-devices go through IOMMU translation from guest addresses to
> + * physical addresses. When MMIO goes to an endpoint after being translated to
> + * physical addresses, the bridge rejects the transaction because the window
> + * has been programmed with guest addresses.
> + *
> + * VMD can use the Host Physical Address in order to correctly program the
> + * bridge windows in its PCIe domain. VMD device 28C0 has HPA shadow registers
> + * located at offset 0x2000 in MEMBAR2 (BAR 4). The shadow registers are valid
> + * if bit 1 is set in the VMD VMLOCK config register 0x70. VMD devices without
> + * this native assistance can have these registers safely emulated as these
> + * registers are reserved.
> + */
> +typedef struct VFIOVMDQuirk {
> +VFIOPCIDevice *vdev;
> +uint64_t membar_phys[2];
> +} VFIOVMDQuirk;
> +
> +static uint64_t vfio_vmd_quirk_read(void *opaque, hwaddr addr, unsigned size)
> +{
> +VFIOVMDQuirk *data = opaque;
> +uint64_t val = 0;
> +
> +memcpy(&val, (void *)data->membar_phys + addr, size);
> +return val;
> +}
> +
> +static const MemoryRegionOps vfio_vmd_quirk = {
> +.read = vfio_vmd_quirk_read,
> +.endianness = DEVICE_LITTLE_ENDIAN,
> +};
> +
> +#define VMD_VMLOCK  0x70
> +#define VMD_SHADOW  0x2000
> +#define VMD_MEMBAR2 4
> +
> +static int 

Re: [PATCH] iommu/virtio: Reject IOMMU page granule larger than PAGE_SIZE

2020-03-19 Thread Alex Williamson
On Wed, 18 Mar 2020 17:14:05 +0100
Auger Eric  wrote:

> Hi,
> 
> On 3/18/20 1:00 PM, Robin Murphy wrote:
> > On 2020-03-18 11:40 am, Jean-Philippe Brucker wrote:  
> >> We don't currently support IOMMUs with a page granule larger than the
> >> system page size. The IOVA allocator has a BUG_ON() in this case, and
> >> VFIO has a WARN_ON().  
> 
> Adding Alex in CC in case he has time to jump in. At the moment I don't
> get why this WARN_ON() is here.
> 
> This was introduced in
> c8dbca165bb090f926996a572ea2b5b577b34b70 vfio/iommu_type1: Avoid overflow

I don't recall if I had something specific in mind when adding this
warning, but if PAGE_SIZE is smaller than the minimum IOMMU page size,
then we need multiple PAGE_SIZE pages per IOMMU TLB entry.  Therefore
those pages must be contiguous.  Aside from requiring only hugepage
backing, how could a user make sure that their virtual address buffer
is physically contiguous?  I don't think we have any sanity checking
code that does this, thus the warning.  Thanks,

Alex
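
The arithmetic behind this can be sketched in a few lines of userspace C. This is only an illustration under stated assumptions: min_granule(), pages_per_granule() and overexposed_bytes() are hypothetical helper names, and the page-size bitmap is assumed to carry one bit per supported page size, as pgsize_bitmap does in the quoted patch:

```c
#include <assert.h>
#include <stdint.h>

/* Smallest page size in a bitmap with one bit per supported size. */
static uint64_t min_granule(uint64_t pgsize_bitmap)
{
    /* Isolate the lowest set bit; yields 0 for an empty bitmap. */
    return pgsize_bitmap & (~pgsize_bitmap + 1);
}

/* System pages that must be physically contiguous to back one granule. */
static uint64_t pages_per_granule(uint64_t granule, uint64_t page_size)
{
    assert(page_size != 0);
    return granule > page_size ? granule / page_size : 1;
}

/* Bytes beyond one system page that a single-granule mapping exposes. */
static uint64_t overexposed_bytes(uint64_t granule, uint64_t page_size)
{
    return granule > page_size ? granule - page_size : 0;
}
```

With a 64kB granule and 4kB system pages, one IOMMU entry covers 16 system pages, and mapping a single 4kB page exposes the neighbouring 60kB, which matches the case described in the patch commit message.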

> >> It might be possible to remove these obstacles if necessary. If the host
> >> uses 64kB pages and the guest uses 4kB, then a device driver calling
> >> alloc_page() followed by dma_map_page() will create a 64kB mapping for a
> >> 4kB physical page, allowing the endpoint to access the neighbouring 60kB
> >> of memory. This problem could be worked around with bounce buffers.  
> > 
> > FWIW the fundamental issue is that callers of iommu_map() may expect to
> > be able to map two or more page-aligned regions directly adjacent to
> > each other for scatter-gather purposes (or ring buffer tricks), and
> > that's just not possible if the IOMMU granule is too big. Bounce
> > buffering would be a viable workaround for the streaming DMA API and
> > certain similar use-cases, but not in general (e.g. coherent DMA, VFIO,
> > GPUs, etc.)
> > 
> > Robin.
> >   
> >> For the moment, rather than triggering the IOVA BUG_ON() on mismatched
> >> page sizes, abort the virtio-iommu probe with an error message.  
> 
> I understand this is introduced as a temporary solution, but this
> sounds like an important limitation to me. For instance, this will prevent
> running a Fedora guest exposed with a virtio-iommu on a RHEL host.
> 
> Thanks
> 
> Eric
> >>
> >> Reported-by: Bharat Bhushan 
> >> Signed-off-by: Jean-Philippe Brucker 
> >> ---
> >>   drivers/iommu/virtio-iommu.c | 9 +
> >>   1 file changed, 9 insertions(+)
> >>
> >> diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
> >> index 6d4e3c2a2ddb..80d5d8f621ab 100644
> >> --- a/drivers/iommu/virtio-iommu.c
> >> +++ b/drivers/iommu/virtio-iommu.c
> >> @@ -998,6 +998,7 @@ static int viommu_probe(struct virtio_device *vdev)
> >>   struct device *parent_dev = vdev->dev.parent;
> >>   struct viommu_dev *viommu = NULL;
> >>   struct device *dev = &vdev->dev;
> >> +    unsigned long viommu_page_size;
> >>   u64 input_start = 0;
> >>   u64 input_end = -1UL;
> >>   int ret;
> >> @@ -1028,6 +1029,14 @@ static int viommu_probe(struct virtio_device
> >> *vdev)
> >>   goto err_free_vqs;
> >>   }
> >>   +    viommu_page_size = 1UL << __ffs(viommu->pgsize_bitmap);
> >> +    if (viommu_page_size > PAGE_SIZE) {
> >> +    dev_err(dev, "granule 0x%lx larger than system page size
> >> 0x%lx\n",
> >> +    viommu_page_size, PAGE_SIZE);
> >> +    ret = -EINVAL;
> >> +    goto err_free_vqs;
> >> +    }
> >> +
> >>   viommu->map_flags = VIRTIO_IOMMU_MAP_F_READ |
> >> VIRTIO_IOMMU_MAP_F_WRITE;
> >>   viommu->last_domain = ~0U;
> >>    
> >   

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH V9 6/6] docs: sample driver to demonstrate how to implement virtio-mdev framework

2019-11-06 Thread Alex Williamson
On Wed, 6 Nov 2019 14:50:30 -0800
Randy Dunlap  wrote:

> On 11/5/19 11:05 PM, Jason Wang wrote:
> > diff --git a/samples/Kconfig b/samples/Kconfig
> > index c8dacb4dda80..13a2443e18e0 100644
> > --- a/samples/Kconfig
> > +++ b/samples/Kconfig
> > @@ -131,6 +131,16 @@ config SAMPLE_VFIO_MDEV_MDPY
> >   mediated device.  It is a simple framebuffer and supports
> >   the region display interface (VFIO_GFX_PLANE_TYPE_REGION).
> >  
> > +config SAMPLE_VIRTIO_MDEV_NET
> > +   tristate "Build VIRTIO net example mediated device sample code -- loadable modules only"
> > +   depends on VIRTIO_MDEV && VHOST_RING && m
> > +   help
> > + Build a networking sample device for use as a virtio
> > + mediated device. The device coopreates with virtio-mdev bus  
> 
> typo here:
> cooperates
> 

I can fix this on commit relative to V10 if there are no other issues
raised:

diff --git a/samples/Kconfig b/samples/Kconfig
index 13a2443e18e0..b7116d97cbbe 100644
--- a/samples/Kconfig
+++ b/samples/Kconfig
@@ -136,7 +136,7 @@ config SAMPLE_VIRTIO_MDEV_NET
depends on VIRTIO_MDEV && VHOST_RING && m
help
  Build a networking sample device for use as a virtio
- mediated device. The device coopreates with virtio-mdev bus
+ mediated device. The device cooperates with virtio-mdev bus
  driver to present an virtio ethernet driver for
  kernel. It simply loopbacks all packets from its TX
  virtqueue to its RX virtqueue.

Thanks,
Alex



Re: [PATCH V8 0/6] mdev based hardware virtio offloading support

2019-11-06 Thread Alex Williamson
On Wed, 6 Nov 2019 14:25:23 -0500
"Michael S. Tsirkin"  wrote:

> On Wed, Nov 06, 2019 at 12:03:12PM -0700, Alex Williamson wrote:
> > On Wed, 6 Nov 2019 11:56:46 +0800
> > Jason Wang  wrote:
> >   
> > > On 2019/11/6 上午1:58, Alex Williamson wrote:  
> > > > On Tue,  5 Nov 2019 17:32:34 +0800
> > > > Jason Wang  wrote:
> > > >
> > > >> Hi all:
> > > >>
> > > >> There are hardwares that can do virtio datapath offloading while
> > > >> having its own control path. This path tries to implement a mdev based
> > > >> unified API to support using kernel virtio driver to drive those
> > > >> devices. This is done by introducing a new mdev transport for virtio
> > > >> (virtio_mdev) and register itself as a new kind of mdev driver. Then
> > > >> it provides a unified way for kernel virtio driver to talk with mdev
> > > >> device implementation.
> > > >>
> > > >> Though the series only contains kernel driver support, the goal is to
> > > >> make the transport generic enough to support userspace drivers. This
> > > >> means vhost-mdev[1] could be built on top as well by reusing the
> > > >> transport.
> > > >>
> > > >> A sample driver is also implemented which simulates a virtio-net
> > > >> loopback ethernet device on top of vringh + workqueue. This could be
> > > >> used as a reference implementation for real hardware driver.
> > > >>
> > > >> Also a real ICF VF driver was also posted here[2] which is a good
> > > >> reference for vendors who is interested in their own virtio datapath
> > > >> offloading product.
> > > >>
> > > >> Consider mdev framework only support VFIO device and driver right now,
> > > >> this series also extend it to support other types. This is done
> > > >> through introducing class id to the device and pairing it with
> > > >> id_table claimed by the driver. On top, this series also decouples
> > > >> device specific parents ops out of the common ones.
> > > >>
> > > >> Pktgen test was done with virtio-net + mvnet loop back device.
> > > >>
> > > >> Please review.
> > > >>
> > > >> [1] https://lkml.org/lkml/2019/10/31/440
> > > >> [2] https://lkml.org/lkml/2019/10/15/1226
> > > >>
> > > >> Changes from V7:
> > > >> - drop {set|get}_mdev_features for virtio
> > > >> - typo and comment style fixes
> > > >
> > > > Seems we're nearly there, all the remaining comments are relatively
> > > > superficial, though I would appreciate a v9 addressing them as well as
> > > > the checkpatch warnings:
> > > >
> > > > https://patchwork.freedesktop.org/series/68977/
> > > 
> > > 
> > > Will do.
> > > 
> > > Btw, do you plan to merge vhost-mdev patch on top? Or you prefer it to 
> > > go through Michael's vhost tree?  
> > 
> > I can include it if you wish.  The mdev changes are isolated enough in
> > that patch that I wouldn't presume it, but clearly it would require
> > less merge coordination to drop it in my tree.  Let me know.  Thanks,
> > 
> > Alex  
> 
> I'm fine with merging through your tree. If you do, feel free to
> include
> 
> Acked-by: Michael S. Tsirkin 

AFAICT, it looks like we're expecting at least one more version of
Tiwei's patch after V5, so it'd probably be best to provide the ack and
go-ahead on that next version so there's no confusion.  Thanks,

Alex


Re: [PATCH V8 0/6] mdev based hardware virtio offloading support

2019-11-06 Thread Alex Williamson
On Wed, 6 Nov 2019 11:56:46 +0800
Jason Wang  wrote:

> On 2019/11/6 上午1:58, Alex Williamson wrote:
> > On Tue,  5 Nov 2019 17:32:34 +0800
> > Jason Wang  wrote:
> >  
> >> Hi all:
> >>
> >> There are hardwares that can do virtio datapath offloading while
> >> having its own control path. This path tries to implement a mdev based
> >> unified API to support using kernel virtio driver to drive those
> >> devices. This is done by introducing a new mdev transport for virtio
> >> (virtio_mdev) and register itself as a new kind of mdev driver. Then
> >> it provides a unified way for kernel virtio driver to talk with mdev
> >> device implementation.
> >>
> >> Though the series only contains kernel driver support, the goal is to
> >> make the transport generic enough to support userspace drivers. This
> >> means vhost-mdev[1] could be built on top as well by reusing the
> >> transport.
> >>
> >> A sample driver is also implemented which simulates a virtio-net
> >> loopback ethernet device on top of vringh + workqueue. This could be
> >> used as a reference implementation for real hardware driver.
> >>
> >> Also a real ICF VF driver was also posted here[2] which is a good
> >> reference for vendors who is interested in their own virtio datapath
> >> offloading product.
> >>
> >> Consider mdev framework only support VFIO device and driver right now,
> >> this series also extend it to support other types. This is done
> >> through introducing class id to the device and pairing it with
> >> id_table claimed by the driver. On top, this series also decouples
> >> device specific parents ops out of the common ones.
> >>
> >> Pktgen test was done with virtio-net + mvnet loop back device.
> >>
> >> Please review.
> >>
> >> [1] https://lkml.org/lkml/2019/10/31/440
> >> [2] https://lkml.org/lkml/2019/10/15/1226
> >>
> >> Changes from V7:
> >> - drop {set|get}_mdev_features for virtio
> >> - typo and comment style fixes  
> >
> > Seems we're nearly there, all the remaining comments are relatively
> > superficial, though I would appreciate a v9 addressing them as well as
> > the checkpatch warnings:
> >
> > https://patchwork.freedesktop.org/series/68977/  
> 
> 
> Will do.
> 
> Btw, do you plan to merge vhost-mdev patch on top? Or you prefer it to 
> go through Michael's vhost tree?

I can include it if you wish.  The mdev changes are isolated enough in
that patch that I wouldn't presume it, but clearly it would require
less merge coordination to drop it in my tree.  Let me know.  Thanks,

Alex


Re: [PATCH V8 0/6] mdev based hardware virtio offloading support

2019-11-05 Thread Alex Williamson
On Tue,  5 Nov 2019 17:32:34 +0800
Jason Wang  wrote:

> Hi all:
> 
> There are hardwares that can do virtio datapath offloading while
> having its own control path. This path tries to implement a mdev based
> unified API to support using kernel virtio driver to drive those
> devices. This is done by introducing a new mdev transport for virtio
> (virtio_mdev) and register itself as a new kind of mdev driver. Then
> it provides a unified way for kernel virtio driver to talk with mdev
> device implementation.
> 
> Though the series only contains kernel driver support, the goal is to
> make the transport generic enough to support userspace drivers. This
> means vhost-mdev[1] could be built on top as well by reusing the
> transport.
> 
> A sample driver is also implemented which simulates a virtio-net
> loopback ethernet device on top of vringh + workqueue. This could be
> used as a reference implementation for real hardware driver.
> 
> Also a real ICF VF driver was also posted here[2] which is a good
> reference for vendors who is interested in their own virtio datapath
> offloading product.
> 
> Consider mdev framework only support VFIO device and driver right now,
> this series also extend it to support other types. This is done
> through introducing class id to the device and pairing it with
> id_table claimed by the driver. On top, this series also decouples
> device specific parents ops out of the common ones.
> 
> Pktgen test was done with virtio-net + mvnet loop back device.
> 
> Please review.
> 
> [1] https://lkml.org/lkml/2019/10/31/440
> [2] https://lkml.org/lkml/2019/10/15/1226
> 
> Changes from V7:
> - drop {set|get}_mdev_features for virtio
> - typo and comment style fixes


Seems we're nearly there, all the remaining comments are relatively
superficial, though I would appreciate a v9 addressing them as well as
the checkpatch warnings:

https://patchwork.freedesktop.org/series/68977/

Consider this a last call for reviews or acks (or naks) from affected
mdev vendor drivers, mdev-core sub-maintainers (Hi Kirti), virtio
stakeholders, etc.  Thanks,

Alex



Re: [PATCH V8 4/6] mdev: introduce virtio device and its device ops

2019-11-05 Thread Alex Williamson
On Tue,  5 Nov 2019 17:32:38 +0800
Jason Wang  wrote:

> This patch implements basic support for mdev driver that supports
> virtio transport for kernel virtio driver.
> 
> Signed-off-by: Jason Wang 
> ---
>  drivers/vfio/mdev/mdev_core.c|  21 +
>  drivers/vfio/mdev/mdev_private.h |   2 +
>  include/linux/mdev.h |   6 ++
>  include/linux/mdev_virtio_ops.h  | 149 +++
>  4 files changed, 178 insertions(+)
>  create mode 100644 include/linux/mdev_virtio_ops.h
> 
> diff --git a/drivers/vfio/mdev/mdev_core.c b/drivers/vfio/mdev/mdev_core.c
> index 4e70f19ac145..c58253404ed5 100644
> --- a/drivers/vfio/mdev/mdev_core.c
> +++ b/drivers/vfio/mdev/mdev_core.c
> @@ -78,6 +78,27 @@ const struct mdev_vfio_device_ops 
> *mdev_get_vfio_ops(struct mdev_device *mdev)
>  }
>  EXPORT_SYMBOL(mdev_get_vfio_ops);
>  
> +/*
> + * Specify the virtio device ops for the mdev device, this
> + * must be called during create() callback for virtio mdev device.
> + */
> +void mdev_set_virtio_ops(struct mdev_device *mdev,
> +  const struct mdev_virtio_device_ops *virtio_ops)
> +{
> + mdev_set_class(mdev, MDEV_CLASS_ID_VIRTIO);
> + mdev->virtio_ops = virtio_ops;
> +}
> +EXPORT_SYMBOL(mdev_set_virtio_ops);
> +
> +/* Get the virtio device ops for the mdev device. */
> +const struct mdev_virtio_device_ops *
> +mdev_get_virtio_ops(struct mdev_device *mdev)
> +{
> + WARN_ON(mdev->class_id != MDEV_CLASS_ID_VIRTIO);
> + return mdev->virtio_ops;
> +}
> +EXPORT_SYMBOL(mdev_get_virtio_ops);
> +
>  struct device *mdev_dev(struct mdev_device *mdev)
>  {
>   return &mdev->dev;
> diff --git a/drivers/vfio/mdev/mdev_private.h 
> b/drivers/vfio/mdev/mdev_private.h
> index 411227373625..2c74dd032409 100644
> --- a/drivers/vfio/mdev/mdev_private.h
> +++ b/drivers/vfio/mdev/mdev_private.h
> @@ -11,6 +11,7 @@
>  #define MDEV_PRIVATE_H
>  
>  #include 
> +#include 
>  
>  int  mdev_bus_register(void);
>  void mdev_bus_unregister(void);
> @@ -38,6 +39,7 @@ struct mdev_device {
>   u16 class_id;
>   union {
>   const struct mdev_vfio_device_ops *vfio_ops;
> + const struct mdev_virtio_device_ops *virtio_ops;
>   };
>  };
>  
> diff --git a/include/linux/mdev.h b/include/linux/mdev.h
> index 9e37506d1987..f3d75a60c2b5 100644
> --- a/include/linux/mdev.h
> +++ b/include/linux/mdev.h
> @@ -17,6 +17,7 @@
>  
>  struct mdev_device;
>  struct mdev_vfio_device_ops;
> +struct mdev_virtio_device_ops;
>  
>  /*
>   * Called by the parent device driver to set the device which represents
> @@ -112,6 +113,10 @@ void mdev_set_class(struct mdev_device *mdev, u16 id);
>  void mdev_set_vfio_ops(struct mdev_device *mdev,
>  const struct mdev_vfio_device_ops *vfio_ops);
>  const struct mdev_vfio_device_ops *mdev_get_vfio_ops(struct mdev_device 
> *mdev);
> +void mdev_set_virtio_ops(struct mdev_device *mdev,
> +  const struct mdev_virtio_device_ops *virtio_ops);
> +const struct mdev_virtio_device_ops *
> +mdev_get_virtio_ops(struct mdev_device *mdev);
>  
>  extern struct bus_type mdev_bus_type;
>  
> @@ -127,6 +132,7 @@ struct mdev_device *mdev_from_dev(struct device *dev);
>  
>  enum {
>   MDEV_CLASS_ID_VFIO = 1,
> + MDEV_CLASS_ID_VIRTIO = 2,
>   /* New entries must be added here */
>  };
>  
> diff --git a/include/linux/mdev_virtio_ops.h b/include/linux/mdev_virtio_ops.h
> new file mode 100644
> index ..379bfa5d6a30
> --- /dev/null
> +++ b/include/linux/mdev_virtio_ops.h
> @@ -0,0 +1,149 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + * Virtio mediated device driver
> + *
> + * Copyright 2019, Red Hat Corp.
> + * Author: Jason Wang 
> + */
> +#ifndef MDEV_VIRTIO_OPS_H
> +#define MDEV_VIRTIO_OPS_H
> +
> +#include 
> +#include 
> +#include 
> +
> +#define VIRTIO_MDEV_DEVICE_API_STRING"virtio-mdev"
> +#define VIRTIO_MDEV_F_VERSION_1 0x1

This entire concept of VIRTIO_MDEV_F_VERSION_1 is gone now, right?
Let's remove it here and below.  Thanks,

Alex

> +
> +struct virtio_mdev_callback {
> + irqreturn_t (*callback)(void *data);
> + void *private;
> +};
> +
> +/**
> + * struct mdev_virtio_device_ops - Structure to be registered for each
> + * mdev device to register the device for virtio/vhost drivers.
> + *
> + * The device ops that is supported by VIRTIO_MDEV_F_VERSION_1, the
> + * callbacks are mandatory unless explicitly mentioned.
> + *
> + * @set_vq_address:  Set the address of virtqueue
> + *   @mdev: mediated device
> + *   @idx: virtqueue index
> + *   @desc_area: address of desc area
> + *   @driver_area: address of driver area
> + *   @device_area: address of device area
> + *   Returns integer: success (0) or error (< 0)
> + * @set_vq_num:  Set the size of virtqueue
> + * 

Re: [PATCH V8 3/6] mdev: introduce device specific ops

2019-11-05 Thread Alex Williamson
On Tue, 5 Nov 2019 17:50:25 +0100
Cornelia Huck  wrote:

> On Tue,  5 Nov 2019 17:32:37 +0800
> Jason Wang  wrote:
> 
> > Currently, except for the create and remove, the rest of
> > mdev_parent_ops is designed for vfio-mdev driver only and may not help
> > for kernel mdev driver. With the help of class id, this patch
> > introduces device specific callbacks inside mdev_device
> > structure. This allows different set of callback to be used by
> > vfio-mdev and virtio-mdev.
> > 
> > Reviewed-by: Parav Pandit 
> > Signed-off-by: Jason Wang 
> > ---
> >  .../driver-api/vfio-mediated-device.rst   | 35 +
> >  MAINTAINERS   |  1 +
> >  drivers/gpu/drm/i915/gvt/kvmgt.c  | 18 ---
> >  drivers/s390/cio/vfio_ccw_ops.c   | 18 ---
> >  drivers/s390/crypto/vfio_ap_ops.c | 14 +++--
> >  drivers/vfio/mdev/mdev_core.c | 24 -
> >  drivers/vfio/mdev/mdev_private.h  |  5 ++
> >  drivers/vfio/mdev/vfio_mdev.c | 37 ++---
> >  include/linux/mdev.h  | 43 ---
> >  include/linux/mdev_vfio_ops.h | 52 +++
> >  samples/vfio-mdev/mbochs.c| 20 ---
> >  samples/vfio-mdev/mdpy.c  | 20 ---
> >  samples/vfio-mdev/mtty.c  | 18 ---
> >  13 files changed, 206 insertions(+), 99 deletions(-)
> >  create mode 100644 include/linux/mdev_vfio_ops.h
> >   
> 
> (...)
> 
> > @@ -172,10 +163,34 @@ that a driver should use to unregister itself with 
> > the mdev core driver::
> >  
> > extern void mdev_unregister_device(struct device *dev);
> >  
> > -It is also required to specify the class_id in create() callback through::
> > +As multiple types of mediated devices may be supported, class id needs
> > +to be specified in the create callback(). This could be done  
> 
> The brackets should probably go behind 'create'?
> 
> > +explicitly for the device that does not use on mdev bus for its  
> 
> "for devices that do not use the mdev bus" ?
> 
> But why wouldn't they? I feel like I've missed some discussion here :/

The device ops provide a route through mdev-core for known callbacks,
which is primarily useful when we have 1:N relation between mdev bus
driver and vendor drivers.  The obvious example here is vfio-mdev,
where we have GVT-g, vfio-ap, vfio-ccw, NVIDIA GRID, and various sample
drivers all advertising vfio-mdev support via their class id.  However,
if we have a tightly coupled vendor driver and mdev bus driver, as the
mlx5 support that Parav is developing, the claim is that they prefer
not to expose any device ops and intend to interact directly with the
mdev device.  At least that's my understanding.  Thanks,

Alex
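
The two cases can be modelled in plain userspace C. This is a simplified sketch, not the mdev-core code: struct model_mdev, the CLASS_ID_* values and the model_* helpers are hypothetical stand-ins, and unlike the quoted patch (which only WARNs on a class mismatch) the getters here return NULL when the class id does not match:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { CLASS_ID_NONE = 0, CLASS_ID_VFIO = 1, CLASS_ID_VIRTIO = 2 };

/* Hypothetical, trimmed-down per-class callback tables. */
struct vfio_ops   { int (*open)(void *dev); };
struct virtio_ops { int (*set_vq_num)(void *dev, unsigned idx, unsigned num); };

struct model_mdev {
    uint16_t class_id;
    union {                       /* only one table is meaningful per class */
        const struct vfio_ops *vfio_ops;
        const struct virtio_ops *virtio_ops;
    };
};

/* Tightly coupled case: set only the class id, expose no ops. */
static void model_set_class(struct model_mdev *dev, uint16_t id)
{
    assert(dev != NULL);
    dev->class_id = id;
    dev->vfio_ops = NULL;
}

/* 1:N case: the class helper sets both the class id and its ops. */
static void model_set_vfio_ops(struct model_mdev *dev,
                               const struct vfio_ops *ops)
{
    model_set_class(dev, CLASS_ID_VFIO);
    dev->vfio_ops = ops;
}

static void model_set_virtio_ops(struct model_mdev *dev,
                                 const struct virtio_ops *ops)
{
    model_set_class(dev, CLASS_ID_VIRTIO);
    dev->virtio_ops = ops;
}

/* Getters hand back the table only to a bus driver of the matching class. */
static const struct vfio_ops *model_get_vfio_ops(const struct model_mdev *dev)
{
    return dev->class_id == CLASS_ID_VFIO ? dev->vfio_ops : NULL;
}

static const struct virtio_ops *model_get_virtio_ops(const struct model_mdev *dev)
{
    return dev->class_id == CLASS_ID_VIRTIO ? dev->virtio_ops : NULL;
}
```

A vendor driver paired with exactly one bus driver would call only the class setter and reach its device directly, while vendor drivers sharing a common bus driver publish their callbacks through the ops table.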

> > +operation through:
> >  
> > int mdev_set_class(struct mdev_device *mdev, u16 id);
> >  
> > +For the device that uses on the mdev bus for its operation, the class  
> 
> "For devices that use the mdev bus..."
> 
> But same comment as above.
> 
> > +should provide helper function to set class id and device specific
> > +ops. E.g for vfio-mdev devices, the function to be called is::
> > +
> > +   int mdev_set_vfio_ops(struct mdev_device *mdev,
> > +  const struct mdev_vfio_device_ops *vfio_ops);
> > +
> > +The class id (set by this function to MDEV_CLASS_ID_VFIO) is used to
> > +match a device with an mdev driver via its id table. The device
> > +specific callbacks (specified in *vfio_ops) are obtainable via
> > +mdev_get_vfio_ops() (for use by the mdev bus driver). A vfio-mdev
> > +device (class id MDEV_CLASS_ID_VFIO) uses the following
> > +device-specific ops:
> > +
> > +* open: open callback of vfio mediated device
> > +* close: close callback of vfio mediated device
> > +* ioctl: ioctl callback of vfio mediated device
> > +* read : read emulation callback
> > +* write: write emulation callback
> > +* mmap: mmap emulation callback
> > +
> >  Mediated Device Management Interface Through sysfs
> >  ==================================================
> 
> Otherwise, looks good.



Re: [PATCH V7 4/6] mdev: introduce virtio device and its device ops

2019-11-04 Thread Alex Williamson
On Tue, 5 Nov 2019 11:52:41 +0800
Jason Wang  wrote:

> On 2019/11/5 上午5:50, Alex Williamson wrote:
> > On Mon,  4 Nov 2019 20:39:50 +0800
> > Jason Wang  wrote:
> >  
> >> This patch implements basic support for mdev driver that supports
> >> virtio transport for kernel virtio driver.
> >>
> >> Signed-off-by: Jason Wang 
> >> ---
> >>   drivers/vfio/mdev/mdev_core.c|  20 
> >>   drivers/vfio/mdev/mdev_private.h |   2 +
> >>   include/linux/mdev.h |   6 ++
> >>   include/linux/mdev_virtio_ops.h  | 166 +++
> >>   4 files changed, 194 insertions(+)
> >>   create mode 100644 include/linux/mdev_virtio_ops.h
> >>
> >> diff --git a/drivers/vfio/mdev/mdev_core.c b/drivers/vfio/mdev/mdev_core.c
> >> index 8d579d7ed82f..95ee4126ff9c 100644
> >> --- a/drivers/vfio/mdev/mdev_core.c
> >> +++ b/drivers/vfio/mdev/mdev_core.c
> >> @@ -76,6 +76,26 @@ const struct mdev_vfio_device_ops 
> >> *mdev_get_vfio_ops(struct mdev_device *mdev)
> >>   }
> >>   EXPORT_SYMBOL(mdev_get_vfio_ops);
> >>   
> >> +/* Specify the virtio device ops for the mdev device, this
> >> + * must be called during create() callback for virtio mdev device.
> >> + */  
> > Comment style.  
> 
> 
> Will fix.
> 
> 
> >  
> >> +void mdev_set_virtio_ops(struct mdev_device *mdev,
> >> +   const struct mdev_virtio_device_ops *virtio_ops)
> >> +{
> >> +  mdev_set_class(mdev, MDEV_CLASS_ID_VIRTIO);
> >> +  mdev->virtio_ops = virtio_ops;
> >> +}
> >> +EXPORT_SYMBOL(mdev_set_virtio_ops);
> >> +
> >> +/* Get the virtio device ops for the mdev device. */
> >> +const struct mdev_virtio_device_ops *
> >> +mdev_get_virtio_ops(struct mdev_device *mdev)
> >> +{
> >> +  WARN_ON(mdev->class_id != MDEV_CLASS_ID_VIRTIO);
> >> +  return mdev->virtio_ops;
> >> +}
> >> +EXPORT_SYMBOL(mdev_get_virtio_ops);
> >> +
> >>   struct device *mdev_dev(struct mdev_device *mdev)
> >>   {
> >>return &mdev->dev;
> >> diff --git a/drivers/vfio/mdev/mdev_private.h 
> >> b/drivers/vfio/mdev/mdev_private.h
> >> index 411227373625..2c74dd032409 100644
> >> --- a/drivers/vfio/mdev/mdev_private.h
> >> +++ b/drivers/vfio/mdev/mdev_private.h
> >> @@ -11,6 +11,7 @@
> >>   #define MDEV_PRIVATE_H
> >>   
> >>   #include 
> >> +#include 
> >>   
> >>   int  mdev_bus_register(void);
> >>   void mdev_bus_unregister(void);
> >> @@ -38,6 +39,7 @@ struct mdev_device {
> >>u16 class_id;
> >>union {
> >>const struct mdev_vfio_device_ops *vfio_ops;
> >> +  const struct mdev_virtio_device_ops *virtio_ops;
> >>};
> >>   };
> >>   
> >> diff --git a/include/linux/mdev.h b/include/linux/mdev.h
> >> index 9e37506d1987..f3d75a60c2b5 100644
> >> --- a/include/linux/mdev.h
> >> +++ b/include/linux/mdev.h
> >> @@ -17,6 +17,7 @@
> >>   
> >>   struct mdev_device;
> >>   struct mdev_vfio_device_ops;
> >> +struct mdev_virtio_device_ops;
> >>   
> >>   /*
> >>* Called by the parent device driver to set the device which represents
> >> @@ -112,6 +113,10 @@ void mdev_set_class(struct mdev_device *mdev, u16 id);
> >>   void mdev_set_vfio_ops(struct mdev_device *mdev,
> >>   const struct mdev_vfio_device_ops *vfio_ops);
> >>   const struct mdev_vfio_device_ops *mdev_get_vfio_ops(struct mdev_device 
> >> *mdev);
> >> +void mdev_set_virtio_ops(struct mdev_device *mdev,
> >> +   const struct mdev_virtio_device_ops *virtio_ops);
> >> +const struct mdev_virtio_device_ops *
> >> +mdev_get_virtio_ops(struct mdev_device *mdev);
> >>   
> >>   extern struct bus_type mdev_bus_type;
> >>   
> >> @@ -127,6 +132,7 @@ struct mdev_device *mdev_from_dev(struct device *dev);
> >>   
> >>   enum {
> >>MDEV_CLASS_ID_VFIO = 1,
> >> +  MDEV_CLASS_ID_VIRTIO = 2,
> >>/* New entries must be added here */
> >>   };
> >>   
> >> diff --git a/include/linux/mdev_virtio_ops.h 
> >> b/include/linux/mdev_virtio_ops.h
> >> new file mode 100644
> >> index ..0dcae7fa31e5
> >> --- /dev/null
> >> +++ b/includ

Re: [PATCH V7 3/6] mdev: introduce device specific ops

2019-11-04 Thread Alex Williamson
On Mon,  4 Nov 2019 20:39:49 +0800
Jason Wang  wrote:

> Currently, except for the create and remove, the rest of
> mdev_parent_ops is designed for vfio-mdev driver only and may not help
> for kernel mdev driver. With the help of class id, this patch
> introduces device specific callbacks inside mdev_device
> structure. This allows different set of callback to be used by
> vfio-mdev and virtio-mdev.
> 
> Reviewed-by: Parav Pandit 
> Signed-off-by: Jason Wang 
> ---
>  .../driver-api/vfio-mediated-device.rst   | 35 +
>  MAINTAINERS   |  1 +
>  drivers/gpu/drm/i915/gvt/kvmgt.c  | 18 ---
>  drivers/s390/cio/vfio_ccw_ops.c   | 18 ---
>  drivers/s390/crypto/vfio_ap_ops.c | 14 +++--
>  drivers/vfio/mdev/mdev_core.c | 25 -
>  drivers/vfio/mdev/mdev_private.h  |  5 ++
>  drivers/vfio/mdev/vfio_mdev.c | 37 ++---
>  include/linux/mdev.h  | 43 ---
>  include/linux/mdev_vfio_ops.h | 52 +++
>  samples/vfio-mdev/mbochs.c| 20 ---
>  samples/vfio-mdev/mdpy.c  | 20 ---
>  samples/vfio-mdev/mtty.c  | 18 ---
>  13 files changed, 206 insertions(+), 100 deletions(-)
>  create mode 100644 include/linux/mdev_vfio_ops.h
> 
> diff --git a/Documentation/driver-api/vfio-mediated-device.rst 
> b/Documentation/driver-api/vfio-mediated-device.rst
> index 6709413bee29..e35f1f8f946e 100644
> --- a/Documentation/driver-api/vfio-mediated-device.rst
> +++ b/Documentation/driver-api/vfio-mediated-device.rst
> @@ -152,15 +152,6 @@ callbacks per mdev parent device, per mdev type, or any 
> other categorization.
>  Vendor drivers are expected to be fully asynchronous in this respect or
>  provide their own internal resource protection.)
>  
> -The callbacks in the mdev_parent_ops structure are as follows:
> -
> -* open: open callback of mediated device
> -* close: close callback of mediated device
> -* ioctl: ioctl callback of mediated device
> -* read : read emulation callback
> -* write: write emulation callback
> -* mmap: mmap emulation callback
> -
>  A driver should use the mdev_parent_ops structure in the function call to
>  register itself with the mdev core driver::
>  
> @@ -172,10 +163,34 @@ that a driver should use to unregister itself with the 
> mdev core driver::
>  
>   extern void mdev_unregister_device(struct device *dev);
>  
> -It is also required to specify the class_id in create() callback through::
> +As multiple types of mediated devices may be supported, class id needs
> +to be specified in the create callback(). This could be done
> +explicitly for the device that does not use on mdev bus for its
> +operation through:
>  
>   int mdev_set_class(struct mdev_device *mdev, u16 id);
>  
> +For the device that uses on the mdev bus for its operation, the class
> +should provide helper function to set class id and device specific
> +ops. E.g for vfio-mdev devices, the function to be called is::
> +
> + int mdev_set_vfio_ops(struct mdev_device *mdev,
> +  const struct mdev_vfio_device_ops *vfio_ops);
> +
> +The class id (set by this function to MDEV_CLASS_ID_VFIO) is used to
> +match a device with an mdev driver via its id table. The device
> +specific callbacks (specified in *vfio_ops) are obtainable via
> +mdev_get_vfio_ops() (for use by the mdev bus driver). A vfio-mdev
> +device (class id MDEV_CLASS_ID_VFIO) uses the following
> +device-specific ops:
> +
> +* open: open callback of vfio mediated device
> +* close: close callback of vfio mediated device
> +* ioctl: ioctl callback of vfio mediated device
> +* read : read emulation callback
> +* write: write emulation callback
> +* mmap: mmap emulation callback
> +
>  Mediated Device Management Interface Through sysfs
>  ==================================================
>  
> diff --git a/MAINTAINERS b/MAINTAINERS
> index cba1095547fd..f661d13344d6 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -17121,6 +17121,7 @@ S:Maintained
>  F:   Documentation/driver-api/vfio-mediated-device.rst
>  F:   drivers/vfio/mdev/
>  F:   include/linux/mdev.h
> +F:   include/linux/mdev_vfio_ops.h
>  F:   samples/vfio-mdev/
>  
>  VFIO PLATFORM DRIVER
> diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c 
> b/drivers/gpu/drm/i915/gvt/kvmgt.c
> index 6420f0dbd31b..662f3a672372 100644
> --- a/drivers/gpu/drm/i915/gvt/kvmgt.c
> +++ b/drivers/gpu/drm/i915/gvt/kvmgt.c
> @@ -42,6 +42,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  
>  #include 
> @@ -643,6 +644,8 @@ static void kvmgt_put_vfio_device(void *vgpu)
>   vfio_device_put(((struct intel_vgpu *)vgpu)->vdev.vfio_device);
>  }
>  
> +static const struct mdev_vfio_device_ops intel_vfio_vgpu_dev_ops;
> +
>  static int intel_vgpu_create(struct kobject *kobj, struct mdev_device *mdev)
>  {
>  

Re: [PATCH V7 4/6] mdev: introduce virtio device and its device ops

2019-11-04 Thread Alex Williamson
On Mon,  4 Nov 2019 20:39:50 +0800
Jason Wang  wrote:

> This patch implements basic support for mdev driver that supports
> virtio transport for kernel virtio driver.
> 
> Signed-off-by: Jason Wang 
> ---
>  drivers/vfio/mdev/mdev_core.c|  20 
>  drivers/vfio/mdev/mdev_private.h |   2 +
>  include/linux/mdev.h |   6 ++
>  include/linux/mdev_virtio_ops.h  | 166 +++
>  4 files changed, 194 insertions(+)
>  create mode 100644 include/linux/mdev_virtio_ops.h
> 
> diff --git a/drivers/vfio/mdev/mdev_core.c b/drivers/vfio/mdev/mdev_core.c
> index 8d579d7ed82f..95ee4126ff9c 100644
> --- a/drivers/vfio/mdev/mdev_core.c
> +++ b/drivers/vfio/mdev/mdev_core.c
> @@ -76,6 +76,26 @@ const struct mdev_vfio_device_ops 
> *mdev_get_vfio_ops(struct mdev_device *mdev)
>  }
>  EXPORT_SYMBOL(mdev_get_vfio_ops);
>  
> +/* Specify the virtio device ops for the mdev device, this
> + * must be called during create() callback for virtio mdev device.
> + */

Comment style.

> +void mdev_set_virtio_ops(struct mdev_device *mdev,
> +  const struct mdev_virtio_device_ops *virtio_ops)
> +{
> + mdev_set_class(mdev, MDEV_CLASS_ID_VIRTIO);
> + mdev->virtio_ops = virtio_ops;
> +}
> +EXPORT_SYMBOL(mdev_set_virtio_ops);
> +
> +/* Get the virtio device ops for the mdev device. */
> +const struct mdev_virtio_device_ops *
> +mdev_get_virtio_ops(struct mdev_device *mdev)
> +{
> + WARN_ON(mdev->class_id != MDEV_CLASS_ID_VIRTIO);
> + return mdev->virtio_ops;
> +}
> +EXPORT_SYMBOL(mdev_get_virtio_ops);
> +
>  struct device *mdev_dev(struct mdev_device *mdev)
>  {
>   return &mdev->dev;
> diff --git a/drivers/vfio/mdev/mdev_private.h 
> b/drivers/vfio/mdev/mdev_private.h
> index 411227373625..2c74dd032409 100644
> --- a/drivers/vfio/mdev/mdev_private.h
> +++ b/drivers/vfio/mdev/mdev_private.h
> @@ -11,6 +11,7 @@
>  #define MDEV_PRIVATE_H
>  
>  #include 
> +#include 
>  
>  int  mdev_bus_register(void);
>  void mdev_bus_unregister(void);
> @@ -38,6 +39,7 @@ struct mdev_device {
>   u16 class_id;
>   union {
>   const struct mdev_vfio_device_ops *vfio_ops;
> + const struct mdev_virtio_device_ops *virtio_ops;
>   };
>  };
>  
> diff --git a/include/linux/mdev.h b/include/linux/mdev.h
> index 9e37506d1987..f3d75a60c2b5 100644
> --- a/include/linux/mdev.h
> +++ b/include/linux/mdev.h
> @@ -17,6 +17,7 @@
>  
>  struct mdev_device;
>  struct mdev_vfio_device_ops;
> +struct mdev_virtio_device_ops;
>  
>  /*
>   * Called by the parent device driver to set the device which represents
> @@ -112,6 +113,10 @@ void mdev_set_class(struct mdev_device *mdev, u16 id);
>  void mdev_set_vfio_ops(struct mdev_device *mdev,
>  const struct mdev_vfio_device_ops *vfio_ops);
>  const struct mdev_vfio_device_ops *mdev_get_vfio_ops(struct mdev_device 
> *mdev);
> +void mdev_set_virtio_ops(struct mdev_device *mdev,
> +  const struct mdev_virtio_device_ops *virtio_ops);
> +const struct mdev_virtio_device_ops *
> +mdev_get_virtio_ops(struct mdev_device *mdev);
>  
>  extern struct bus_type mdev_bus_type;
>  
> @@ -127,6 +132,7 @@ struct mdev_device *mdev_from_dev(struct device *dev);
>  
>  enum {
>   MDEV_CLASS_ID_VFIO = 1,
> + MDEV_CLASS_ID_VIRTIO = 2,
>   /* New entries must be added here */
>  };
>  
> diff --git a/include/linux/mdev_virtio_ops.h b/include/linux/mdev_virtio_ops.h
> new file mode 100644
> index 000000000000..0dcae7fa31e5
> --- /dev/null
> +++ b/include/linux/mdev_virtio_ops.h
> @@ -0,0 +1,166 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + * Virtio mediated device driver
> + *
> + * Copyright 2019, Red Hat Corp.
> + * Author: Jason Wang 
> + */
> +#ifndef MDEV_VIRTIO_OPS_H
> +#define MDEV_VIRTIO_OPS_H
> +
> +#include 
> +#include 
> +#include 
> +
> +#define VIRTIO_MDEV_DEVICE_API_STRING"virtio-mdev"
> +#define VIRTIO_MDEV_F_VERSION_1 0x1
> +
> +struct virtio_mdev_callback {
> + irqreturn_t (*callback)(void *data);
> + void *private;
> +};
> +
> +/**
> + * struct mdev_virtio_device_ops - Structure to be registered for each
> + * mdev device to register the device for virtio/vhost drivers.
> + *
> + * These are the device ops supported by VIRTIO_MDEV_F_VERSION_1; the
> + * callbacks are mandatory unless explicitly mentioned otherwise.
> + *
> + * @get_mdev_features:   Get a set of bits that demonstrate
> + *   the capability of the mdev device. New
> + *   feature bits must be added when
> + *   introducing new device ops. This
> + *   allows the device ops to be extended
> + *   (one feature could have N new ops).
> + *   @mdev: mediated device
> + *   Returns the mdev features (API) supported by
> + *   the device.

I still don't see the point of 

Re: [PATCH V7 1/6] mdev: class id support

2019-11-04 Thread Alex Williamson
On Mon,  4 Nov 2019 20:39:47 +0800
Jason Wang  wrote:

> Mdev bus only supports vfio driver right now, so it doesn't implement
> match method. But in the future, we may add drivers other than vfio,
> the first driver could be virtio-mdev. This means we need to add
> device class id support in bus match method to pair the mdev device
> and mdev driver correctly.
> 
> So this patch adds id_table to mdev_driver and class_id for mdev
> device with the match method for mdev bus.
> 
> Reviewed-by: Parav Pandit 
> Signed-off-by: Jason Wang 
> ---
>  .../driver-api/vfio-mediated-device.rst   |  5 
>  drivers/gpu/drm/i915/gvt/kvmgt.c  |  1 +
>  drivers/s390/cio/vfio_ccw_ops.c   |  1 +
>  drivers/s390/crypto/vfio_ap_ops.c |  1 +
>  drivers/vfio/mdev/mdev_core.c | 16 
>  drivers/vfio/mdev/mdev_driver.c   | 25 +++
>  drivers/vfio/mdev/mdev_private.h  |  1 +
>  drivers/vfio/mdev/vfio_mdev.c |  6 +
>  include/linux/mdev.h  |  8 ++
>  include/linux/mod_devicetable.h   |  8 ++
>  samples/vfio-mdev/mbochs.c|  1 +
>  samples/vfio-mdev/mdpy.c  |  1 +
>  samples/vfio-mdev/mtty.c  |  1 +
>  13 files changed, 75 insertions(+)
> 
> diff --git a/Documentation/driver-api/vfio-mediated-device.rst 
> b/Documentation/driver-api/vfio-mediated-device.rst
> index 25eb7d5b834b..6709413bee29 100644
> --- a/Documentation/driver-api/vfio-mediated-device.rst
> +++ b/Documentation/driver-api/vfio-mediated-device.rst
> @@ -102,12 +102,14 @@ structure to represent a mediated device's driver::
>* @probe: called when new device created
>* @remove: called when device removed
>* @driver: device driver structure
> +  * @id_table: the ids serviced by this driver
>*/
>   struct mdev_driver {
>const char *name;
>int  (*probe)  (struct device *dev);
>void (*remove) (struct device *dev);
>struct device_driverdriver;
> +  const struct mdev_class_id *id_table;
>   };
>  
>  A mediated bus driver for mdev should use this structure in the function 
> calls
> @@ -170,6 +172,9 @@ that a driver should use to unregister itself with the 
> mdev core driver::
>  
>   extern void mdev_unregister_device(struct device *dev);
>  
> +It is also required to specify the class_id in create() callback through::
> +
> + int mdev_set_class(struct mdev_device *mdev, u16 id);
>  
>  Mediated Device Management Interface Through sysfs
>  ==================================================
> diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c 
> b/drivers/gpu/drm/i915/gvt/kvmgt.c
> index 343d79c1cb7e..6420f0dbd31b 100644
> --- a/drivers/gpu/drm/i915/gvt/kvmgt.c
> +++ b/drivers/gpu/drm/i915/gvt/kvmgt.c
> @@ -678,6 +678,7 @@ static int intel_vgpu_create(struct kobject *kobj, struct 
> mdev_device *mdev)
>dev_name(mdev_dev(mdev)));
>   ret = 0;
>  
> + mdev_set_class(mdev, MDEV_CLASS_ID_VFIO);
>  out:
>   return ret;
>  }
> diff --git a/drivers/s390/cio/vfio_ccw_ops.c b/drivers/s390/cio/vfio_ccw_ops.c
> index f0d71ab77c50..cf2c013ae32f 100644
> --- a/drivers/s390/cio/vfio_ccw_ops.c
> +++ b/drivers/s390/cio/vfio_ccw_ops.c
> @@ -129,6 +129,7 @@ static int vfio_ccw_mdev_create(struct kobject *kobj, 
> struct mdev_device *mdev)
>  private->sch->schid.ssid,
>  private->sch->schid.sch_no);
>  
> + mdev_set_class(mdev, MDEV_CLASS_ID_VFIO);
>   return 0;
>  }
>  
> diff --git a/drivers/s390/crypto/vfio_ap_ops.c 
> b/drivers/s390/crypto/vfio_ap_ops.c
> index 5c0f53c6dde7..07c31070afeb 100644
> --- a/drivers/s390/crypto/vfio_ap_ops.c
> +++ b/drivers/s390/crypto/vfio_ap_ops.c
> @@ -343,6 +343,7 @@ static int vfio_ap_mdev_create(struct kobject *kobj, 
> struct mdev_device *mdev)
>   list_add(&matrix_mdev->node, &matrix_dev->mdev_list);
>   mutex_unlock(&matrix_dev->lock);
>  
> + mdev_set_class(mdev, MDEV_CLASS_ID_VFIO);
>   return 0;
>  }
>  
> diff --git a/drivers/vfio/mdev/mdev_core.c b/drivers/vfio/mdev/mdev_core.c
> index b558d4cfd082..d23ca39e3be6 100644
> --- a/drivers/vfio/mdev/mdev_core.c
> +++ b/drivers/vfio/mdev/mdev_core.c
> @@ -45,6 +45,16 @@ void mdev_set_drvdata(struct mdev_device *mdev, void *data)
>  }
>  EXPORT_SYMBOL(mdev_set_drvdata);
>  
> +/* Specify the class for the mdev device, this must be called during
> + * create() callback.
> + */

Standard non-networking multi-line comment style please, i.e.

/*
 * Multi-
 * line
 * comment
 */

Thanks,
Alex

> +void mdev_set_class(struct mdev_device *mdev, u16 id)
> +{
> + WARN_ON(mdev->class_id);
> + mdev->class_id = id;
> +}
> +EXPORT_SYMBOL(mdev_set_class);
> +
>  struct device *mdev_dev(struct mdev_device *mdev)
>  {
>   return &mdev->dev;
> @@ -324,6 +334,12 @@ int mdev_device_create(struct kobject 

Re: [PATCH V5 4/6] mdev: introduce virtio device and its device ops

2019-10-24 Thread Alex Williamson
On Thu, 24 Oct 2019 11:51:35 +0800
Jason Wang  wrote:

> On 2019/10/24 上午5:57, Alex Williamson wrote:
> > On Wed, 23 Oct 2019 21:07:50 +0800
> > Jason Wang  wrote:
> >  
> >> This patch implements basic support for mdev driver that supports
> >> virtio transport for kernel virtio driver.
> >>
> >> Signed-off-by: Jason Wang 
> >> ---
> >>   drivers/vfio/mdev/mdev_core.c|  20 
> >>   drivers/vfio/mdev/mdev_private.h |   2 +
> >>   include/linux/mdev.h |   6 ++
> >>   include/linux/virtio_mdev_ops.h  | 159 +++
> >>   4 files changed, 187 insertions(+)
> >>   create mode 100644 include/linux/virtio_mdev_ops.h
> >>
> >> diff --git a/drivers/vfio/mdev/mdev_core.c b/drivers/vfio/mdev/mdev_core.c
> >> index 555bd61d8c38..9b00c3513120 100644
> >> --- a/drivers/vfio/mdev/mdev_core.c
> >> +++ b/drivers/vfio/mdev/mdev_core.c
> >> @@ -76,6 +76,26 @@ const struct vfio_mdev_device_ops 
> >> *mdev_get_vfio_ops(struct mdev_device *mdev)
> >>   }
> >>   EXPORT_SYMBOL(mdev_get_vfio_ops);
> >>   
> >> +/* Specify the virtio device ops for the mdev device, this
> >> + * must be called during create() callback for virtio mdev device.
> >> + */
> >> +void mdev_set_virtio_ops(struct mdev_device *mdev,
> >> +   const struct virtio_mdev_device_ops *virtio_ops)
> >> +{
> >> +  mdev_set_class(mdev, MDEV_CLASS_ID_VIRTIO);
> >> +  mdev->virtio_ops = virtio_ops;
> >> +}
> >> +EXPORT_SYMBOL(mdev_set_virtio_ops);
> >> +
> >> +/* Get the virtio device ops for the mdev device. */
> >> +const struct virtio_mdev_device_ops *
> >> +mdev_get_virtio_ops(struct mdev_device *mdev)
> >> +{
> >> +  WARN_ON(mdev->class_id != MDEV_CLASS_ID_VIRTIO);
> >> +  return mdev->virtio_ops;
> >> +}
> >> +EXPORT_SYMBOL(mdev_get_virtio_ops);
> >> +
> >>   struct device *mdev_dev(struct mdev_device *mdev)
> >>   {
> > >>    return &mdev->dev;
> >> diff --git a/drivers/vfio/mdev/mdev_private.h 
> >> b/drivers/vfio/mdev/mdev_private.h
> >> index 0770410ded2a..7b47890c34e7 100644
> >> --- a/drivers/vfio/mdev/mdev_private.h
> >> +++ b/drivers/vfio/mdev/mdev_private.h
> >> @@ -11,6 +11,7 @@
> >>   #define MDEV_PRIVATE_H
> >>   
> >>   #include 
> >> +#include 
> >>   
> >>   int  mdev_bus_register(void);
> >>   void mdev_bus_unregister(void);
> >> @@ -38,6 +39,7 @@ struct mdev_device {
> >>u16 class_id;
> >>union {
> >>const struct vfio_mdev_device_ops *vfio_ops;
> >> +  const struct virtio_mdev_device_ops *virtio_ops;
> >>};
> >>   };
> >>   
> >> diff --git a/include/linux/mdev.h b/include/linux/mdev.h
> >> index 4625f1a11014..9b69b0bbebfd 100644
> >> --- a/include/linux/mdev.h
> >> +++ b/include/linux/mdev.h
> >> @@ -17,6 +17,7 @@
> >>   
> >>   struct mdev_device;
> >>   struct vfio_mdev_device_ops;
> >> +struct virtio_mdev_device_ops;
> >>   
> >>   /*
> >>* Called by the parent device driver to set the device which represents
> >> @@ -112,6 +113,10 @@ void mdev_set_class(struct mdev_device *mdev, u16 id);
> >>   void mdev_set_vfio_ops(struct mdev_device *mdev,
> >>   const struct vfio_mdev_device_ops *vfio_ops);
> >>   const struct vfio_mdev_device_ops *mdev_get_vfio_ops(struct mdev_device 
> >> *mdev);
> >> +void mdev_set_virtio_ops(struct mdev_device *mdev,
> >> +   const struct virtio_mdev_device_ops *virtio_ops);
> >> +const struct virtio_mdev_device_ops *
> >> +mdev_get_virtio_ops(struct mdev_device *mdev);
> >>   
> >>   extern struct bus_type mdev_bus_type;
> >>   
> >> @@ -127,6 +132,7 @@ struct mdev_device *mdev_from_dev(struct device *dev);
> >>   
> >>   enum {
> >>MDEV_CLASS_ID_VFIO = 1,
> >> +  MDEV_CLASS_ID_VIRTIO = 2,
> >>/* New entries must be added here */
> >>   };
> >>   
> >> diff --git a/include/linux/virtio_mdev_ops.h 
> >> b/include/linux/virtio_mdev_ops.h
> >> new file mode 100644
> > >> index 000000000000..d417b41f2845
> >> --- /dev/null
> >> +++ b/include/linux/virtio_mdev_ops.h
> >> @@ -0,0 +1,159 @@
> >> +/* SPDX-License-Identifier: GPL-2.0-only *

Re: [PATCH V5 1/6] mdev: class id support

2019-10-24 Thread Alex Williamson
On Thu, 24 Oct 2019 13:46:36 -0600
Alex Williamson  wrote:

> On Thu, 24 Oct 2019 11:27:36 +0800
> Jason Wang  wrote:
> 
> > On 2019/10/24 上午5:42, Alex Williamson wrote:  
> > > On Wed, 23 Oct 2019 21:07:47 +0800
> > > Jason Wang  wrote:
> > >
> > >> Mdev bus only supports vfio driver right now, so it doesn't implement
> > >> match method. But in the future, we may add drivers other than vfio,
> > >> the first driver could be virtio-mdev. This means we need to add
> > >> device class id support in bus match method to pair the mdev device
> > >> and mdev driver correctly.
> > >>
> > >> So this patch adds id_table to mdev_driver and class_id for mdev
> > >> device with the match method for mdev bus.
> > >>
> > >> Signed-off-by: Jason Wang 
> > >> ---
> > >>   .../driver-api/vfio-mediated-device.rst   |  5 +
> > >>   drivers/gpu/drm/i915/gvt/kvmgt.c  |  1 +
> > >>   drivers/s390/cio/vfio_ccw_ops.c   |  1 +
> > >>   drivers/s390/crypto/vfio_ap_ops.c |  1 +
> > >>   drivers/vfio/mdev/mdev_core.c | 18 +++
> > >>   drivers/vfio/mdev/mdev_driver.c   | 22 +++
> > >>   drivers/vfio/mdev/mdev_private.h  |  1 +
> > >>   drivers/vfio/mdev/vfio_mdev.c |  6 +
> > >>   include/linux/mdev.h  |  8 +++
> > >>   include/linux/mod_devicetable.h   |  8 +++
> > >>   samples/vfio-mdev/mbochs.c|  1 +
> > >>   samples/vfio-mdev/mdpy.c  |  1 +
> > >>   samples/vfio-mdev/mtty.c  |  1 +
> > >>   13 files changed, 74 insertions(+)
> > >>
> > >> diff --git a/Documentation/driver-api/vfio-mediated-device.rst 
> > >> b/Documentation/driver-api/vfio-mediated-device.rst
> > >> index 25eb7d5b834b..6709413bee29 100644
> > >> --- a/Documentation/driver-api/vfio-mediated-device.rst
> > >> +++ b/Documentation/driver-api/vfio-mediated-device.rst
> > >> @@ -102,12 +102,14 @@ structure to represent a mediated device's driver::
> > >> * @probe: called when new device created
> > >> * @remove: called when device removed
> > >> * @driver: device driver structure
> > >> +  * @id_table: the ids serviced by this driver
> > >> */
> > >>struct mdev_driver {
> > >>   const char *name;
> > >>   int  (*probe)  (struct device *dev);
> > >>   void (*remove) (struct device *dev);
> > >>   struct device_driverdriver;
> > >> + const struct mdev_class_id *id_table;
> > >>};
> > >>   
> > >>   A mediated bus driver for mdev should use this structure in the 
> > >> function calls
> > >> @@ -170,6 +172,9 @@ that a driver should use to unregister itself with 
> > >> the mdev core driver::
> > >>   
> > >>  extern void mdev_unregister_device(struct device *dev);
> > >>   
> > >> +It is also required to specify the class_id in create() callback 
> > >> through::
> > >> +
> > >> +int mdev_set_class(struct mdev_device *mdev, u16 id);
> > >>   
> > >>   Mediated Device Management Interface Through sysfs
> > >>   ==
> > >> diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c 
> > >> b/drivers/gpu/drm/i915/gvt/kvmgt.c
> > >> index 343d79c1cb7e..6420f0dbd31b 100644
> > >> --- a/drivers/gpu/drm/i915/gvt/kvmgt.c
> > >> +++ b/drivers/gpu/drm/i915/gvt/kvmgt.c
> > >> @@ -678,6 +678,7 @@ static int intel_vgpu_create(struct kobject *kobj, 
> > >> struct mdev_device *mdev)
> > >>   dev_name(mdev_dev(mdev)));
> > >>  ret = 0;
> > >>   
> > >> +mdev_set_class(mdev, MDEV_CLASS_ID_VFIO);
> > >>   out:
> > >>  return ret;
> > >>   }
> > >> diff --git a/drivers/s390/cio/vfio_ccw_ops.c 
> > >> b/drivers/s390/cio/vfio_ccw_ops.c
> > >> index f0d71ab77c50..cf2c013ae32f 100644
> > >> --- a/drivers/s390/cio/vfio_ccw_ops.c
> > >> +++ b/drivers/s390/cio/vfio_ccw_ops.c
> > >> @

Re: [PATCH V5 2/6] modpost: add support for mdev class id

2019-10-24 Thread Alex Williamson
On Thu, 24 Oct 2019 11:31:04 +0800
Jason Wang  wrote:

> On 2019/10/24 上午5:42, Alex Williamson wrote:
> > On Wed, 23 Oct 2019 21:07:48 +0800
> > Jason Wang  wrote:
> >  
> >> Add support to parse mdev class id table.
> >>
> >> Reviewed-by: Parav Pandit 
> >> Signed-off-by: Jason Wang 
> >> ---
> >>   drivers/vfio/mdev/vfio_mdev.c |  2 ++
> >>   scripts/mod/devicetable-offsets.c |  3 +++
> >>   scripts/mod/file2alias.c  | 10 ++
> >>   3 files changed, 15 insertions(+)
> >>
> >> diff --git a/drivers/vfio/mdev/vfio_mdev.c b/drivers/vfio/mdev/vfio_mdev.c
> >> index 7b24ee9cb8dd..cb701cd646f0 100644
> >> --- a/drivers/vfio/mdev/vfio_mdev.c
> >> +++ b/drivers/vfio/mdev/vfio_mdev.c
> >> @@ -125,6 +125,8 @@ static const struct mdev_class_id id_table[] = {
> >>{ 0 },
> >>   };
> >>   
> >> +MODULE_DEVICE_TABLE(mdev, id_table);
> >> +  
> > Two questions, first we have:
> >
> > #define MODULE_DEVICE_TABLE(type, name) \
> > extern typeof(name) __mod_##type##__##name##_device_table   \
> >   __attribute__ ((unused, alias(__stringify(name))))
> >
> > Therefore we're defining __mod_mdev__id_table_device_table with alias
> > id_table.  When the virtio mdev bus driver is added in 5/6 it uses the
> > same name value.  I see virtio types all register this way (virtio,
> > id_table), so I assume there's no conflict, but pci types mostly (not
> > entirely) seem to use unique names.  Is there a preference for one way
> > or the other, or does it simply not matter?  
> 
> 
> It looks to me that those symbols are local, so it doesn't matter. But 
> if you wish I can switch to a unique name.

I don't have a strong opinion, I'm just trying to make sure we're not
doing something obviously broken.

> >>   static struct mdev_driver vfio_mdev_driver = {
> >>.name   = "vfio_mdev",
> >>.probe  = vfio_mdev_probe,
> >> diff --git a/scripts/mod/devicetable-offsets.c 
> >> b/scripts/mod/devicetable-offsets.c
> >> index 054405b90ba4..6cbb1062488a 100644
> >> --- a/scripts/mod/devicetable-offsets.c
> >> +++ b/scripts/mod/devicetable-offsets.c
> >> @@ -231,5 +231,8 @@ int main(void)
> >>DEVID(wmi_device_id);
> >>DEVID_FIELD(wmi_device_id, guid_string);
> >>   
> >> +  DEVID(mdev_class_id);
> >> +  DEVID_FIELD(mdev_class_id, id);
> >> +
> >>return 0;
> >>   }
> >> diff --git a/scripts/mod/file2alias.c b/scripts/mod/file2alias.c
> >> index c91eba751804..d365dfe7c718 100644
> >> --- a/scripts/mod/file2alias.c
> >> +++ b/scripts/mod/file2alias.c
> >> @@ -1335,6 +1335,15 @@ static int do_wmi_entry(const char *filename, void 
> >> *symval, char *alias)
> >>return 1;
> >>   }
> >>   
> >> +/* looks like: "mdev:cN" */
> >> +static int do_mdev_entry(const char *filename, void *symval, char *alias)
> >> +{
> >> +  DEF_FIELD(symval, mdev_class_id, id);
> >> +
> >> +  sprintf(alias, "mdev:c%02X", id);  
> > A lot of entries call add_wildcard() here, should we?  Sorry for the
> > basic questions, I haven't played in this code.  Thanks,  
> 
> 
> It's a really good question. My understanding is we won't have a module 
> that can deal with all kinds of classes like CLASS_ID_ANY. So there's 
> probably no need for the wildcard.

The comment for add_wildcard() indicates future extension, so it's hard
to know what we might need in the future until we do need it.  The
majority of modules.alias entries on my laptop (even if I exclude pci
aliases) end with a wildcard.  Thanks,

Alex
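
To make the wildcard question concrete, here is a minimal userspace sketch of the alias string with and without a trailing wildcard. It is an illustration only: add_wildcard() is modelled on the helper of the same name in scripts/mod/file2alias.c, mdev_alias() and alias_matches() are hypothetical helpers, fnmatch() is used as a stand-in for the glob-style matching done by module loading tools, and the "v02" suffix is an invented example of a field that might be appended to the modalias in the future.

```c
#include <fnmatch.h>
#include <stdio.h>
#include <string.h>

/* Modelled on add_wildcard() in scripts/mod/file2alias.c. */
static void add_wildcard(char *str)
{
	size_t len = strlen(str);

	/* Append '*' unless the alias already ends with one. */
	if (len == 0 || str[len - 1] != '*')
		strcat(str, "*");
}

/*
 * Hypothetical variant of do_mdev_entry() that takes the class id
 * directly. The caller provides a buffer with room for the alias.
 */
static void mdev_alias(char *alias, unsigned int id, int wildcard)
{
	sprintf(alias, "mdev:c%02X", id);
	if (wildcard)
		add_wildcard(alias);
}

/* Stand-in for alias lookup: glob-match a module alias against a modalias. */
static int alias_matches(const char *alias, const char *modalias)
{
	return fnmatch(alias, modalias, 0) == 0;
}
```

With class id 1, the exact alias "mdev:c01" and the wildcard alias "mdev:c01*" both match a device reporting "mdev:c01". Only the wildcard form would keep matching if the modalias later grew an extra field such as the invented "mdev:c01v02".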

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH V5 1/6] mdev: class id support

2019-10-24 Thread Alex Williamson
On Thu, 24 Oct 2019 11:27:36 +0800
Jason Wang  wrote:

> On 2019/10/24 上午5:42, Alex Williamson wrote:
> > On Wed, 23 Oct 2019 21:07:47 +0800
> > Jason Wang  wrote:
> >  
> >> Mdev bus only supports vfio driver right now, so it doesn't implement
> >> match method. But in the future, we may add drivers other than vfio,
> >> the first driver could be virtio-mdev. This means we need to add
> >> device class id support in bus match method to pair the mdev device
> >> and mdev driver correctly.
> >>
> >> So this patch adds id_table to mdev_driver and class_id for mdev
> >> device with the match method for mdev bus.
> >>
> >> Signed-off-by: Jason Wang 
> >> ---
> >>   .../driver-api/vfio-mediated-device.rst   |  5 +
> >>   drivers/gpu/drm/i915/gvt/kvmgt.c  |  1 +
> >>   drivers/s390/cio/vfio_ccw_ops.c   |  1 +
> >>   drivers/s390/crypto/vfio_ap_ops.c |  1 +
> >>   drivers/vfio/mdev/mdev_core.c | 18 +++
> >>   drivers/vfio/mdev/mdev_driver.c   | 22 +++
> >>   drivers/vfio/mdev/mdev_private.h  |  1 +
> >>   drivers/vfio/mdev/vfio_mdev.c |  6 +
> >>   include/linux/mdev.h  |  8 +++
> >>   include/linux/mod_devicetable.h   |  8 +++
> >>   samples/vfio-mdev/mbochs.c|  1 +
> >>   samples/vfio-mdev/mdpy.c  |  1 +
> >>   samples/vfio-mdev/mtty.c  |  1 +
> >>   13 files changed, 74 insertions(+)
> >>
> >> diff --git a/Documentation/driver-api/vfio-mediated-device.rst 
> >> b/Documentation/driver-api/vfio-mediated-device.rst
> >> index 25eb7d5b834b..6709413bee29 100644
> >> --- a/Documentation/driver-api/vfio-mediated-device.rst
> >> +++ b/Documentation/driver-api/vfio-mediated-device.rst
> >> @@ -102,12 +102,14 @@ structure to represent a mediated device's driver::
> >> * @probe: called when new device created
> >> * @remove: called when device removed
> >> * @driver: device driver structure
> >> +  * @id_table: the ids serviced by this driver
> >> */
> >>struct mdev_driver {
> >> const char *name;
> >> int  (*probe)  (struct device *dev);
> >> void (*remove) (struct device *dev);
> >> struct device_driverdriver;
> >> +   const struct mdev_class_id *id_table;
> >>};
> >>   
> >>   A mediated bus driver for mdev should use this structure in the function 
> >> calls
> >> @@ -170,6 +172,9 @@ that a driver should use to unregister itself with the 
> >> mdev core driver::
> >>   
> >>extern void mdev_unregister_device(struct device *dev);
> >>   
> >> +It is also required to specify the class_id in create() callback through::
> >> +
> >> +  int mdev_set_class(struct mdev_device *mdev, u16 id);
> >>   
> >>   Mediated Device Management Interface Through sysfs
> >>   ==
> >> diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c 
> >> b/drivers/gpu/drm/i915/gvt/kvmgt.c
> >> index 343d79c1cb7e..6420f0dbd31b 100644
> >> --- a/drivers/gpu/drm/i915/gvt/kvmgt.c
> >> +++ b/drivers/gpu/drm/i915/gvt/kvmgt.c
> >> @@ -678,6 +678,7 @@ static int intel_vgpu_create(struct kobject *kobj, 
> >> struct mdev_device *mdev)
> >> dev_name(mdev_dev(mdev)));
> >>ret = 0;
> >>   
> >> +  mdev_set_class(mdev, MDEV_CLASS_ID_VFIO);
> >>   out:
> >>return ret;
> >>   }
> >> diff --git a/drivers/s390/cio/vfio_ccw_ops.c 
> >> b/drivers/s390/cio/vfio_ccw_ops.c
> >> index f0d71ab77c50..cf2c013ae32f 100644
> >> --- a/drivers/s390/cio/vfio_ccw_ops.c
> >> +++ b/drivers/s390/cio/vfio_ccw_ops.c
> >> @@ -129,6 +129,7 @@ static int vfio_ccw_mdev_create(struct kobject *kobj, 
> >> struct mdev_device *mdev)
> >>   private->sch->schid.ssid,
> >>   private->sch->schid.sch_no);
> >>   
> >> +  mdev_set_class(mdev, MDEV_CLASS_ID_VFIO);
> >>return 0;
> >>   }
> >>   
> >> diff --git a/drivers/s390/crypto/vfio_ap_ops.c 
> >> b/drivers/s390/crypto/vfio_ap_ops.c
> >> index 5c0f53c6dde7..07c31070afeb 100644
> >> --- 

Re: [PATCH V5 4/6] mdev: introduce virtio device and its device ops

2019-10-23 Thread Alex Williamson
On Wed, 23 Oct 2019 21:07:50 +0800
Jason Wang  wrote:

> This patch implements basic support for mdev driver that supports
> virtio transport for kernel virtio driver.
> 
> Signed-off-by: Jason Wang 
> ---
>  drivers/vfio/mdev/mdev_core.c|  20 
>  drivers/vfio/mdev/mdev_private.h |   2 +
>  include/linux/mdev.h |   6 ++
>  include/linux/virtio_mdev_ops.h  | 159 +++
>  4 files changed, 187 insertions(+)
>  create mode 100644 include/linux/virtio_mdev_ops.h
> 
> diff --git a/drivers/vfio/mdev/mdev_core.c b/drivers/vfio/mdev/mdev_core.c
> index 555bd61d8c38..9b00c3513120 100644
> --- a/drivers/vfio/mdev/mdev_core.c
> +++ b/drivers/vfio/mdev/mdev_core.c
> @@ -76,6 +76,26 @@ const struct vfio_mdev_device_ops 
> *mdev_get_vfio_ops(struct mdev_device *mdev)
>  }
>  EXPORT_SYMBOL(mdev_get_vfio_ops);
>  
> +/* Specify the virtio device ops for the mdev device, this
> + * must be called during create() callback for virtio mdev device.
> + */
> +void mdev_set_virtio_ops(struct mdev_device *mdev,
> +  const struct virtio_mdev_device_ops *virtio_ops)
> +{
> + mdev_set_class(mdev, MDEV_CLASS_ID_VIRTIO);
> + mdev->virtio_ops = virtio_ops;
> +}
> +EXPORT_SYMBOL(mdev_set_virtio_ops);
> +
> +/* Get the virtio device ops for the mdev device. */
> +const struct virtio_mdev_device_ops *
> +mdev_get_virtio_ops(struct mdev_device *mdev)
> +{
> + WARN_ON(mdev->class_id != MDEV_CLASS_ID_VIRTIO);
> + return mdev->virtio_ops;
> +}
> +EXPORT_SYMBOL(mdev_get_virtio_ops);
> +
>  struct device *mdev_dev(struct mdev_device *mdev)
>  {
>   return &mdev->dev;
> diff --git a/drivers/vfio/mdev/mdev_private.h 
> b/drivers/vfio/mdev/mdev_private.h
> index 0770410ded2a..7b47890c34e7 100644
> --- a/drivers/vfio/mdev/mdev_private.h
> +++ b/drivers/vfio/mdev/mdev_private.h
> @@ -11,6 +11,7 @@
>  #define MDEV_PRIVATE_H
>  
>  #include 
> +#include 
>  
>  int  mdev_bus_register(void);
>  void mdev_bus_unregister(void);
> @@ -38,6 +39,7 @@ struct mdev_device {
>   u16 class_id;
>   union {
>   const struct vfio_mdev_device_ops *vfio_ops;
> + const struct virtio_mdev_device_ops *virtio_ops;
>   };
>  };
>  
> diff --git a/include/linux/mdev.h b/include/linux/mdev.h
> index 4625f1a11014..9b69b0bbebfd 100644
> --- a/include/linux/mdev.h
> +++ b/include/linux/mdev.h
> @@ -17,6 +17,7 @@
>  
>  struct mdev_device;
>  struct vfio_mdev_device_ops;
> +struct virtio_mdev_device_ops;
>  
>  /*
>   * Called by the parent device driver to set the device which represents
> @@ -112,6 +113,10 @@ void mdev_set_class(struct mdev_device *mdev, u16 id);
>  void mdev_set_vfio_ops(struct mdev_device *mdev,
>  const struct vfio_mdev_device_ops *vfio_ops);
>  const struct vfio_mdev_device_ops *mdev_get_vfio_ops(struct mdev_device 
> *mdev);
> +void mdev_set_virtio_ops(struct mdev_device *mdev,
> +  const struct virtio_mdev_device_ops *virtio_ops);
> +const struct virtio_mdev_device_ops *
> +mdev_get_virtio_ops(struct mdev_device *mdev);
>  
>  extern struct bus_type mdev_bus_type;
>  
> @@ -127,6 +132,7 @@ struct mdev_device *mdev_from_dev(struct device *dev);
>  
>  enum {
>   MDEV_CLASS_ID_VFIO = 1,
> + MDEV_CLASS_ID_VIRTIO = 2,
>   /* New entries must be added here */
>  };
>  
> diff --git a/include/linux/virtio_mdev_ops.h b/include/linux/virtio_mdev_ops.h
> new file mode 100644
> index 000000000000..d417b41f2845
> --- /dev/null
> +++ b/include/linux/virtio_mdev_ops.h
> @@ -0,0 +1,159 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + * Virtio mediated device driver
> + *
> + * Copyright 2019, Red Hat Corp.
> + * Author: Jason Wang 
> + */
> +#ifndef _LINUX_VIRTIO_MDEV_H
> +#define _LINUX_VIRTIO_MDEV_H
> +
> +#include 
> +#include 
> +#include 
> +
> +#define VIRTIO_MDEV_DEVICE_API_STRING"virtio-mdev"
> +#define VIRTIO_MDEV_F_VERSION_1 0x1
> +
> +struct virtio_mdev_callback {
> + irqreturn_t (*callback)(void *data);
> + void *private;
> +};
> +
> +/**
> + * struct virtio_mdev_device_ops - Structure to be registered for each
> + * mdev device to register the device for virtio/vhost drivers.
> + *
> + * These are the device ops supported by VIRTIO_MDEV_F_VERSION_1; the
> + * callbacks are mandatory unless explicitly mentioned otherwise.

If the version of the callbacks is returned by a callback within the
structure defined by the version... isn't that a bit circular?  This
seems redundant to me versus the class id.  The fact that the parent
driver defines the device as MDEV_CLASS_ID_VIRTIO should tell us this
already.  If it was incremented, we'd need an MDEV_CLASS_ID_VIRTIOv2,
which the virtio-mdev bus driver could add to its id table and handle
differently.
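
As a rough sketch of that alternative (userspace C, not kernel code): the bus driver lists every class id it understands in its id table and derives the ops flavour from the matched id, so no version callback is needed. MDEV_CLASS_ID_VIRTIO_V2, mdev_match_id() and virtio_mdev_ops_version() are hypothetical names used only for illustration, and struct mdev_class_id is reduced to the single id field.

```c
#include <stddef.h>
#include <stdint.h>

enum {
	MDEV_CLASS_ID_VFIO = 1,
	MDEV_CLASS_ID_VIRTIO = 2,
	MDEV_CLASS_ID_VIRTIO_V2 = 3,	/* hypothetical, not part of the patch */
};

/* Reduced stand-in for the id table entry added by patch 1/6. */
struct mdev_class_id {
	uint16_t id;
};

/* The virtio-mdev bus driver would list every class id it can handle. */
static const struct mdev_class_id virtio_id_table[] = {
	{ MDEV_CLASS_ID_VIRTIO },
	{ MDEV_CLASS_ID_VIRTIO_V2 },
	{ 0 },	/* terminator */
};

/* Hypothetical match helper: return the matching entry, or NULL. */
static const struct mdev_class_id *
mdev_match_id(const struct mdev_class_id *table, uint16_t class_id)
{
	for (; table->id; table++)
		if (table->id == class_id)
			return table;
	return NULL;
}

/* Derive the ops flavour from the class id rather than from a callback. */
static int virtio_mdev_ops_version(uint16_t class_id)
{
	switch (class_id) {
	case MDEV_CLASS_ID_VIRTIO:
		return 1;
	case MDEV_CLASS_ID_VIRTIO_V2:
		return 2;
	default:
		return -1;	/* not a virtio mdev class */
	}
}
```

The point of the sketch is that the class id already carries the information a version feature bit would provide: a vfio device (class id 1) never matches this table, and a changed ops layout is expressed as a new class id.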

> + *
> + * @set_vq_address:  Set the address of virtqueue
> + *   @mdev: mediated device
> + *   @idx: virtqueue index
> + *   @desc_area: 

Re: [PATCH V5 2/6] modpost: add support for mdev class id

2019-10-23 Thread Alex Williamson
On Wed, 23 Oct 2019 21:07:48 +0800
Jason Wang  wrote:

> Add support to parse mdev class id table.
> 
> Reviewed-by: Parav Pandit 
> Signed-off-by: Jason Wang 
> ---
>  drivers/vfio/mdev/vfio_mdev.c |  2 ++
>  scripts/mod/devicetable-offsets.c |  3 +++
>  scripts/mod/file2alias.c  | 10 ++
>  3 files changed, 15 insertions(+)
> 
> diff --git a/drivers/vfio/mdev/vfio_mdev.c b/drivers/vfio/mdev/vfio_mdev.c
> index 7b24ee9cb8dd..cb701cd646f0 100644
> --- a/drivers/vfio/mdev/vfio_mdev.c
> +++ b/drivers/vfio/mdev/vfio_mdev.c
> @@ -125,6 +125,8 @@ static const struct mdev_class_id id_table[] = {
>   { 0 },
>  };
>  
> +MODULE_DEVICE_TABLE(mdev, id_table);
> +

Two questions, first we have:

#define MODULE_DEVICE_TABLE(type, name) \
extern typeof(name) __mod_##type##__##name##_device_table   \
  __attribute__ ((unused, alias(__stringify(name))))

Therefore we're defining __mod_mdev__id_table_device_table with alias
id_table.  When the virtio mdev bus driver is added in 5/6 it uses the
same name value.  I see virtio types all register this way (virtio,
id_table), so I assume there's no conflict, but pci types mostly (not
entirely) seem to use unique names.  Is there a preference for one way
or the other, or does it simply not matter?
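
For reference, the token pasting can be checked in isolation. This is only a sketch: the helper macros and the pci table name vfio_pci_ids are made up for illustration, and it stringifies the generated symbol name rather than declaring the alias.

```c
/* Two-level stringification so the argument is expanded before '#' applies. */
#define STRINGIFY_1(x) #x
#define STRINGIFY(x) STRINGIFY_1(x)

/* Same token pasting as the MODULE_DEVICE_TABLE() definition quoted above. */
#define DEVICE_TABLE_SYMBOL(type, name) __mod_##type##__##name##_device_table

/* Text of the symbol that an invocation would declare (illustration only). */
#define DEVICE_TABLE_SYMBOL_NAME(type, name) \
	STRINGIFY(DEVICE_TABLE_SYMBOL(type, name))

/* MODULE_DEVICE_TABLE(mdev, id_table) as used by vfio_mdev. */
const char *mdev_table_symbol(void)
{
	return DEVICE_TABLE_SYMBOL_NAME(mdev, id_table);
}

/* A pci-style invocation with a unique (hypothetical) table name. */
const char *pci_table_symbol(void)
{
	return DEVICE_TABLE_SYMBOL_NAME(pci, vfio_pci_ids);
}
```

Two drivers that both pass (mdev, id_table) therefore produce the same symbol text. As far as I can tell that only matters if both end up in the same module, but that is my reading, not something confirmed in this thread.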

>  static struct mdev_driver vfio_mdev_driver = {
>   .name   = "vfio_mdev",
>   .probe  = vfio_mdev_probe,
> diff --git a/scripts/mod/devicetable-offsets.c 
> b/scripts/mod/devicetable-offsets.c
> index 054405b90ba4..6cbb1062488a 100644
> --- a/scripts/mod/devicetable-offsets.c
> +++ b/scripts/mod/devicetable-offsets.c
> @@ -231,5 +231,8 @@ int main(void)
>   DEVID(wmi_device_id);
>   DEVID_FIELD(wmi_device_id, guid_string);
>  
> + DEVID(mdev_class_id);
> + DEVID_FIELD(mdev_class_id, id);
> +
>   return 0;
>  }
> diff --git a/scripts/mod/file2alias.c b/scripts/mod/file2alias.c
> index c91eba751804..d365dfe7c718 100644
> --- a/scripts/mod/file2alias.c
> +++ b/scripts/mod/file2alias.c
> @@ -1335,6 +1335,15 @@ static int do_wmi_entry(const char *filename, void 
> *symval, char *alias)
>   return 1;
>  }
>  
> +/* looks like: "mdev:cN" */
> +static int do_mdev_entry(const char *filename, void *symval, char *alias)
> +{
> + DEF_FIELD(symval, mdev_class_id, id);
> +
> + sprintf(alias, "mdev:c%02X", id);

A lot of entries call add_wildcard() here, should we?  Sorry for the
basic questions, I haven't played in this code.  Thanks,

Alex

> + return 1;
> +}
> +
>  /* Does namelen bytes of name exactly match the symbol? */
>  static bool sym_is(const char *name, unsigned namelen, const char *symbol)
>  {
> @@ -1407,6 +1416,7 @@ static const struct devtable devtable[] = {
>   {"typec", SIZE_typec_device_id, do_typec_entry},
>   {"tee", SIZE_tee_client_device_id, do_tee_entry},
>   {"wmi", SIZE_wmi_device_id, do_wmi_entry},
> + {"mdev", SIZE_mdev_class_id, do_mdev_entry},
>  };
>  
>  /* Create MODULE_ALIAS() statements.
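
On the add_wildcard() question, a small userspace sketch of the two alias shapes may help. The helper below is my own stand-in written from my reading of scripts/mod/file2alias.c (append '*' unless the string already ends in one); treat its behaviour, and the function names, as assumptions.

```c
#include <stdio.h>
#include <string.h>

/* Append '*' unless the string already ends in one (assumed behaviour). */
void append_wildcard(char *str)
{
	size_t len = strlen(str);

	if (len == 0 || str[len - 1] != '*')
		strcat(str, "*");
}

/* Build "mdev:cNN" for a class id, optionally followed by a wildcard. */
void format_mdev_alias(char *alias, size_t size, unsigned int id, int wildcard)
{
	snprintf(alias, size, "mdev:c%02X", id);

	/* Only append if there is room for '*' and the terminating NUL. */
	if (wildcard && strlen(alias) + 1 < size)
		append_wildcard(alias);
}
```

Without the wildcard the alias only matches a modalias that ends right after the class field; with it, the alias would keep matching if further fields were ever appended to the modalias string.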

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH V5 1/6] mdev: class id support

2019-10-23 Thread Alex Williamson
On Wed, 23 Oct 2019 21:07:47 +0800
Jason Wang  wrote:

> Mdev bus only supports vfio driver right now, so it doesn't implement
> match method. But in the future, we may add drivers other than vfio,
> the first driver could be virtio-mdev. This means we need to add
> device class id support in bus match method to pair the mdev device
> and mdev driver correctly.
> 
> So this patch adds id_table to mdev_driver and class_id for mdev
> device with the match method for mdev bus.
> 
> Signed-off-by: Jason Wang 
> ---
>  .../driver-api/vfio-mediated-device.rst   |  5 +
>  drivers/gpu/drm/i915/gvt/kvmgt.c  |  1 +
>  drivers/s390/cio/vfio_ccw_ops.c   |  1 +
>  drivers/s390/crypto/vfio_ap_ops.c |  1 +
>  drivers/vfio/mdev/mdev_core.c | 18 +++
>  drivers/vfio/mdev/mdev_driver.c   | 22 +++
>  drivers/vfio/mdev/mdev_private.h  |  1 +
>  drivers/vfio/mdev/vfio_mdev.c |  6 +
>  include/linux/mdev.h  |  8 +++
>  include/linux/mod_devicetable.h   |  8 +++
>  samples/vfio-mdev/mbochs.c|  1 +
>  samples/vfio-mdev/mdpy.c  |  1 +
>  samples/vfio-mdev/mtty.c  |  1 +
>  13 files changed, 74 insertions(+)
> 
> diff --git a/Documentation/driver-api/vfio-mediated-device.rst 
> b/Documentation/driver-api/vfio-mediated-device.rst
> index 25eb7d5b834b..6709413bee29 100644
> --- a/Documentation/driver-api/vfio-mediated-device.rst
> +++ b/Documentation/driver-api/vfio-mediated-device.rst
> @@ -102,12 +102,14 @@ structure to represent a mediated device's driver::
>* @probe: called when new device created
>* @remove: called when device removed
>* @driver: device driver structure
> +  * @id_table: the ids serviced by this driver
>*/
>   struct mdev_driver {
>const char *name;
>int  (*probe)  (struct device *dev);
>void (*remove) (struct device *dev);
>struct device_driverdriver;
> +  const struct mdev_class_id *id_table;
>   };
>  
>  A mediated bus driver for mdev should use this structure in the function 
> calls
> @@ -170,6 +172,9 @@ that a driver should use to unregister itself with the 
> mdev core driver::
>  
>   extern void mdev_unregister_device(struct device *dev);
>  
> +It is also required to specify the class_id in create() callback through::
> +
> + int mdev_set_class(struct mdev_device *mdev, u16 id);
>  
>  Mediated Device Management Interface Through sysfs
>  ==
> diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c 
> b/drivers/gpu/drm/i915/gvt/kvmgt.c
> index 343d79c1cb7e..6420f0dbd31b 100644
> --- a/drivers/gpu/drm/i915/gvt/kvmgt.c
> +++ b/drivers/gpu/drm/i915/gvt/kvmgt.c
> @@ -678,6 +678,7 @@ static int intel_vgpu_create(struct kobject *kobj, struct 
> mdev_device *mdev)
>dev_name(mdev_dev(mdev)));
>   ret = 0;
>  
> + mdev_set_class(mdev, MDEV_CLASS_ID_VFIO);
>  out:
>   return ret;
>  }
> diff --git a/drivers/s390/cio/vfio_ccw_ops.c b/drivers/s390/cio/vfio_ccw_ops.c
> index f0d71ab77c50..cf2c013ae32f 100644
> --- a/drivers/s390/cio/vfio_ccw_ops.c
> +++ b/drivers/s390/cio/vfio_ccw_ops.c
> @@ -129,6 +129,7 @@ static int vfio_ccw_mdev_create(struct kobject *kobj, 
> struct mdev_device *mdev)
>  private->sch->schid.ssid,
>  private->sch->schid.sch_no);
>  
> + mdev_set_class(mdev, MDEV_CLASS_ID_VFIO);
>   return 0;
>  }
>  
> diff --git a/drivers/s390/crypto/vfio_ap_ops.c 
> b/drivers/s390/crypto/vfio_ap_ops.c
> index 5c0f53c6dde7..07c31070afeb 100644
> --- a/drivers/s390/crypto/vfio_ap_ops.c
> +++ b/drivers/s390/crypto/vfio_ap_ops.c
> @@ -343,6 +343,7 @@ static int vfio_ap_mdev_create(struct kobject *kobj, 
> struct mdev_device *mdev)
>   list_add(&matrix_mdev->node, &matrix_dev->mdev_list);
>   mutex_unlock(&matrix_dev->lock);
>  
> + mdev_set_class(mdev, MDEV_CLASS_ID_VFIO);
>   return 0;
>  }
>  
> diff --git a/drivers/vfio/mdev/mdev_core.c b/drivers/vfio/mdev/mdev_core.c
> index b558d4cfd082..3a9c52d71b4e 100644
> --- a/drivers/vfio/mdev/mdev_core.c
> +++ b/drivers/vfio/mdev/mdev_core.c
> @@ -45,6 +45,16 @@ void mdev_set_drvdata(struct mdev_device *mdev, void *data)
>  }
>  EXPORT_SYMBOL(mdev_set_drvdata);
>  
> +/* Specify the class for the mdev device, this must be called during
> + * create() callback.
> + */
> +void mdev_set_class(struct mdev_device *mdev, u16 id)
> +{
> + WARN_ON(mdev->class_id);
> + mdev->class_id = id;
> +}
> +EXPORT_SYMBOL(mdev_set_class);
> +
>  struct device *mdev_dev(struct mdev_device *mdev)
>  {
>   return &mdev->dev;
> @@ -135,6 +145,7 @@ static int mdev_device_remove_cb(struct device *dev, void 
> *data)
>   * mdev_register_device : Register a device
>   * @dev: device structure representing parent device.
>   * 

Re: [PATCH V4 4/6] mdev: introduce virtio device and its device ops

2019-10-17 Thread Alex Williamson
On Thu, 17 Oct 2019 18:48:34 +0800
Jason Wang  wrote:

> This patch implements basic support for mdev driver that supports
> virtio transport for kernel virtio driver.
> 
> Signed-off-by: Jason Wang 
> ---
>  drivers/vfio/mdev/mdev_core.c |  12 +++
>  include/linux/mdev.h  |   4 +
>  include/linux/virtio_mdev.h   | 151 ++
>  3 files changed, 167 insertions(+)
>  create mode 100644 include/linux/virtio_mdev.h
> 
> diff --git a/drivers/vfio/mdev/mdev_core.c b/drivers/vfio/mdev/mdev_core.c
> index d0f3113c8071..5834f6b7c7a5 100644
> --- a/drivers/vfio/mdev/mdev_core.c
> +++ b/drivers/vfio/mdev/mdev_core.c
> @@ -57,6 +57,18 @@ void mdev_set_vfio_ops(struct mdev_device *mdev,
>  }
>  EXPORT_SYMBOL(mdev_set_vfio_ops);
>  
> +/* Specify the virtio device ops for the mdev device, this
> + * must be called during create() callback for virtio mdev device.
> + */
> +void mdev_set_virtio_ops(struct mdev_device *mdev,
> +  const struct virtio_mdev_device_ops *virtio_ops)
> +{
> + BUG_ON(mdev->class_id);

Nit, this one is a BUG_ON, but the vfio one is a WARN_ON.  Thanks,

Alex

> + mdev->class_id = MDEV_CLASS_ID_VIRTIO;
> + mdev->device_ops = virtio_ops;
> +}
> +EXPORT_SYMBOL(mdev_set_virtio_ops);
> +
>  const void *mdev_get_dev_ops(struct mdev_device *mdev)
>  {
>   return mdev->device_ops;
> diff --git a/include/linux/mdev.h b/include/linux/mdev.h
> index 3d29e09e20c9..13e045e09d3b 100644
> --- a/include/linux/mdev.h
> +++ b/include/linux/mdev.h
> @@ -17,6 +17,7 @@
>  
>  struct mdev_device;
>  struct vfio_mdev_device_ops;
> +struct virtio_mdev_device_ops;
>  
>  /*
>   * Called by the parent device driver to set the device which represents
> @@ -111,6 +112,8 @@ void mdev_set_drvdata(struct mdev_device *mdev, void 
> *data);
>  const guid_t *mdev_uuid(struct mdev_device *mdev);
>  void mdev_set_vfio_ops(struct mdev_device *mdev,
>  const struct vfio_mdev_device_ops *vfio_ops);
> +void mdev_set_virtio_ops(struct mdev_device *mdev,
> + const struct virtio_mdev_device_ops *virtio_ops);
>  const void *mdev_get_dev_ops(struct mdev_device *mdev);
>  
>  extern struct bus_type mdev_bus_type;
> @@ -127,6 +130,7 @@ struct mdev_device *mdev_from_dev(struct device *dev);
>  
>  enum {
>   MDEV_CLASS_ID_VFIO = 1,
> + MDEV_CLASS_ID_VIRTIO = 2,
>   /* New entries must be added here */
>  };
>  
> diff --git a/include/linux/virtio_mdev.h b/include/linux/virtio_mdev.h
> new file mode 100644
> index ..b965b50f9b24
> --- /dev/null
> +++ b/include/linux/virtio_mdev.h
> @@ -0,0 +1,151 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + * Virtio mediated device driver
> + *
> + * Copyright 2019, Red Hat Corp.
> + * Author: Jason Wang 
> + */
> +#ifndef _LINUX_VIRTIO_MDEV_H
> +#define _LINUX_VIRTIO_MDEV_H
> +
> +#include 
> +#include 
> +#include 
> +
> +#define VIRTIO_MDEV_DEVICE_API_STRING"virtio-mdev"
> +#define VIRTIO_MDEV_F_VERSION_1 0x1
> +
> +struct virtio_mdev_callback {
> + irqreturn_t (*callback)(void *data);
> + void *private;
> +};
> +
> +/**
> + * struct vfio_mdev_device_ops - Structure to be registered for each
> + * mdev device to register the device to virtio-mdev module.
> + *
> + * @set_vq_address:  Set the address of virtqueue
> + *   @mdev: mediated device
> + *   @idx: virtqueue index
> + *   @desc_area: address of desc area
> + *   @driver_area: address of driver area
> + *   @device_area: address of device area
> + *   Returns integer: success (0) or error (< 0)
> + * @set_vq_num:  Set the size of virtqueue
> + *   @mdev: mediated device
> + *   @idx: virtqueue index
> + *   @num: the size of virtqueue
> + * @kick_vq: Kick the virtqueue
> + *   @mdev: mediated device
> + *   @idx: virtqueue index
> + * @set_vq_cb:   Set the interrupt callback function for
> + *   a virtqueue
> + *   @mdev: mediated device
> + *   @idx: virtqueue index
> + *   @cb: virtio-mdev interrupt callback structure
> + * @set_vq_ready:Set ready status for a virtqueue
> + *   @mdev: mediated device
> + *   @idx: virtqueue index
> + *   @ready: ready (true) not ready(false)
> + * @get_vq_ready:Get ready status for a virtqueue
> + *   @mdev: mediated device
> + *   @idx: virtqueue index
> + *   Returns boolean: ready (true) or not (false)
> + * @set_vq_state:Set the state for a 

Re: [PATCH V4 3/6] mdev: introduce device specific ops

2019-10-17 Thread Alex Williamson
On Thu, 17 Oct 2019 17:07:55 +0200
Cornelia Huck  wrote:

> On Thu, 17 Oct 2019 18:48:33 +0800
> Jason Wang  wrote:
> 
> > Currently, except for the create and remove, the rest of
> > mdev_parent_ops is designed for vfio-mdev driver only and may not help
> > for kernel mdev driver. With the help of class id, this patch
> > introduces device specific callbacks inside mdev_device
> > structure. This allows different set of callback to be used by
> > vfio-mdev and virtio-mdev.
> > 
> > Signed-off-by: Jason Wang 
> > ---
> >  .../driver-api/vfio-mediated-device.rst   | 25 +
> >  MAINTAINERS   |  1 +
> >  drivers/gpu/drm/i915/gvt/kvmgt.c  | 18 ---
> >  drivers/s390/cio/vfio_ccw_ops.c   | 18 ---
> >  drivers/s390/crypto/vfio_ap_ops.c | 14 +++--
> >  drivers/vfio/mdev/mdev_core.c | 18 +--
> >  drivers/vfio/mdev/mdev_private.h  |  1 +
> >  drivers/vfio/mdev/vfio_mdev.c | 37 ++---
> >  include/linux/mdev.h  | 45 
> >  include/linux/vfio_mdev.h | 52 +++
> >  samples/vfio-mdev/mbochs.c| 20 ---
> >  samples/vfio-mdev/mdpy.c  | 20 ---
> >  samples/vfio-mdev/mtty.c  | 18 ---
> >  13 files changed, 184 insertions(+), 103 deletions(-)
> >  create mode 100644 include/linux/vfio_mdev.h
> > 
> > diff --git a/Documentation/driver-api/vfio-mediated-device.rst 
> > b/Documentation/driver-api/vfio-mediated-device.rst
> > index f9a78d75a67a..0cca84d19603 100644
> > --- a/Documentation/driver-api/vfio-mediated-device.rst
> > +++ b/Documentation/driver-api/vfio-mediated-device.rst
> > @@ -152,11 +152,22 @@ callbacks per mdev parent device, per mdev type, or 
> > any other categorization.
> >  Vendor drivers are expected to be fully asynchronous in this respect or
> >  provide their own internal resource protection.)
> >  
> > -The callbacks in the mdev_parent_ops structure are as follows:
> > -
> > -* open: open callback of mediated device
> > -* close: close callback of mediated device
> > -* ioctl: ioctl callback of mediated device
> > +As multiple types of mediated devices may be supported, the device
> > +must set up the class id and the device specific callbacks in create()  
> 
> s/in create()/in the create()/
> 
> > +callback. E.g for vfio-mdev device it needs to be done through:  
> 
> "Each class provides a helper function to do so; e.g. for vfio-mdev
> devices, the function to be called is:"
> 
> ?
> 
> > +
> > +int mdev_set_vfio_ops(struct mdev_device *mdev,
> > +  const struct vfio_mdev_ops *vfio_ops);
> > +
> > +The class id (set to MDEV_CLASS_ID_VFIO) is used to match a device  
> 
> "(set by this helper function to MDEV_CLASS_ID_VFIO)" ?
> 
> > +with an mdev driver via its id table. The device specific callbacks
> > +(specified in *ops) are obtainable via mdev_get_dev_ops() (for use by  
> 
> "(specified in *vfio_ops by the caller)" ?
> 
> > +the mdev bus driver). A vfio-mdev device (class id MDEV_CLASS_ID_VFIO)
> > +uses the following device-specific ops:
> > +
> > +* open: open callback of vfio mediated device
> > +* close: close callback of vfio mediated device
> > +* ioctl: ioctl callback of vfio mediated device
> >  * read : read emulation callback
> >  * write: write emulation callback
> >  * mmap: mmap emulation callback
> > @@ -167,10 +178,6 @@ register itself with the mdev core driver::
> > extern int  mdev_register_device(struct device *dev,
> >  const struct mdev_parent_ops *ops);
> >  
> > -It is also required to specify the class_id in create() callback through::
> > -
> > -   int mdev_set_class(struct mdev_device *mdev, u16 id);
> > -  
> 
> I'm wondering if this patch set should start out with introducing
> helper functions already (i.e. don't introduce mdev_set_class(), but
> start out with mdev_set_class_vfio() which will gain the *vfio_ops
> argument in this patch.)

Yes, it would be cleaner, but is it really worth the churn?  Correct me
if I'm wrong, but I think we get to the same point after this patch and
aside from the function name itself, the difference is really just that
the class_id is briefly exposed to the parent driver, right?  Thanks,

Alex
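
For comparison, a rough userspace model of where both approaches end up: a class-specific helper that sets the class id and the ops together, so the parent driver never passes a raw class id. The structures are simplified stand-ins, and the int return value is my simplification rather than the signature in the patch.

```c
#include <stddef.h>
#include <stdint.h>

enum { MDEV_CLASS_ID_VFIO = 1 };

/* Simplified stand-ins for the kernel structures (illustration only). */
struct vfio_mdev_device_ops {
	int (*open)(void *mdev);
};

struct mdev_device {
	uint16_t class_id;
	const void *device_ops;
};

/* Class-specific helper: the class id is implied by the helper's name. */
int mdev_set_vfio_ops(struct mdev_device *mdev,
		      const struct vfio_mdev_device_ops *vfio_ops)
{
	if (mdev->class_id)	/* already set; kernel code would WARN_ON here */
		return -1;

	mdev->class_id = MDEV_CLASS_ID_VFIO;
	mdev->device_ops = vfio_ops;
	return 0;
}

/* Used by the mdev bus driver that matched on the class id. */
const void *mdev_get_dev_ops(const struct mdev_device *mdev)
{
	return mdev->device_ops;
}
```

Starting the series with mdev_set_class() and converting later reaches this same state; the only visible difference is that the intermediate patch exposes the id to parent drivers.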
 
> >  However, the mdev_parent_ops structure is not required in the function call
> >  that a driver should use to unregister itself with the mdev core driver::
> >
> 
> (...)
> 
> > diff --git a/drivers/vfio/mdev/mdev_core.c b/drivers/vfio/mdev/mdev_core.c
> > index 3a9c52d71b4e..d0f3113c8071 100644
> > --- a/drivers/vfio/mdev/mdev_core.c
> > +++ b/drivers/vfio/mdev/mdev_core.c
> > @@ -45,15 +45,23 @@ void mdev_set_drvdata(struct mdev_device *mdev, void 
> > *data)
> >  }
> >  EXPORT_SYMBOL(mdev_set_drvdata);
> >  
> > -/* Specify the class for the mdev device, this must be called during
> 

Re: [PATCH V3 4/7] mdev: introduce device specific ops

2019-10-16 Thread Alex Williamson
On Wed, 16 Oct 2019 20:48:06 +
Parav Pandit  wrote:

> > From: Alex Williamson 
> > On Wed, 16 Oct 2019 15:31:25 +
> > Parav Pandit  wrote:
> > > > From: Cornelia Huck 
> > > > Parav Pandit  wrote:
> > > > > > From: Alex Williamson 
> > > > > > On Tue, 15 Oct 2019 20:17:01 +0800 Jason Wang
> > > > > >  wrote:
> > > > > >  
> > > > > > > On 2019/10/15 6:41 PM, Cornelia Huck wrote:
> > > > > > > > Apologies if that has already been discussed, but do we want
> > > > > > > > a
> > > > > > > > 1:1 relationship between id and ops, or can different
> > > > > > > > devices with the same id register different ops?  
> > > > > > >
> > > > > > >
> > > > > > > I think we have a N:1 mapping between id and ops, e.g we want
> > > > > > > both virtio-mdev and vhost-mdev use a single set of device ops.  
> > > > > >
> > > > > > The contents of the ops structure is essentially defined by the
> > > > > > id, which is why I was leaning towards them being defined together.
> > > > > > They are effectively interlocked, the id defines which mdev 
> > > > > > "endpoint"
> > > > > > driver is loaded and that driver requires mdev_get_dev_ops() to
> > > > > > return the structure required by the driver.  I wish there was a
> > > > > > way we could incorporate type checking here.  We toyed with the
> > > > > > idea of having the class in the same structure as the ops, but I
> > > > > > > think this approach was chosen for simplicity.  We could still do
> > > > > > > something like:
> > > > > >
> > > > > > int mdev_set_class_struct(struct device *dev, const struct
> > > > > > mdev_class_struct *class);
> > > > > >
> > > > > > struct mdev_class_struct {
> > > > > > u16 id;
> > > > > > union {
> > > > > > struct vfio_mdev_ops vfio_ops;
> > > > > > struct virtio_mdev_ops virtio_ops;
> > > > > > };
> > > > > > };
> > > > > >
> > > > > > Maybe even:
> > > > > >
> > > > > > > struct vfio_mdev_ops *mdev_get_vfio_ops(struct mdev_device *mdev)
> > > > > > > {
> > > > > > > BUG_ON(mdev->class.id != MDEV_ID_VFIO);
> > > > > > > return &mdev->class.vfio_ops;
> > > > > > }
> > > > > >
> > > > > > The match callback would of course just use the mdev->class.id 
> > > > > > value.
> > > > > > Functionally equivalent, but maybe better type characteristics.
> > > > > > Thanks,
> > > > > >
> > > > > > Alex  
> > > > >
> > > > > We have 3 use cases of mdev.
> > > > > 1. current mdev binding to vfio_mdev 2. mdev binding to virtio 3.
> > > > > mdev binding to mlx5_core without dev_ops
> > > > >
> > > > > Also
> > > > > (a) a given parent may serve multiple types of classes in future.
> > > > > (b) number of classes may not likely explode, they will be handful
> > > > > of them. (vfio_mdev, virtio)
> > > > >
> > > > > So, instead of making copies of this dev_ops pointer in each mdev,
> > > > > it is better to keep const multiple ops in their parent device.
> > > > > Something like below,
> > > > >
> > > > > struct mdev_parent {
> > > > >   [..]
> > > > >   struct mdev_parent_ops *parent_ops; /* create, remove */
> > > > >   struct vfio_mdev_ops *vfio_ops; /* read,write, ioctl etc */
> > > > >   struct virtio_mdev_ops *virtio_ops; /* virtio ops */ };  
> > > >
> > > > That feels a bit odd. Why should the parent carry pointers to every
> > > > possible version of ops?
> > > >  
> > > How many are we expecting? I envisioned handful of them.
> > > It carries because parent is few, mdevs are several hundreds.
> > > It makes sense to keep few copies, instead of several hundred copies
> > > and it doesn't need to setup on every mdev creation.  
> > 
> &g

Re: [PATCH V3 4/7] mdev: introduce device specific ops

2019-10-16 Thread Alex Williamson
On Wed, 16 Oct 2019 15:31:25 +
Parav Pandit  wrote:

> > -Original Message-
> > From: Cornelia Huck 
> > Sent: Wednesday, October 16, 2019 3:53 AM
> > To: Parav Pandit 
> > Cc: Alex Williamson ; Jason Wang
> > ; k...@vger.kernel.org; linux-s...@vger.kernel.org;
> > linux-ker...@vger.kernel.org; dri-de...@lists.freedesktop.org; intel-
> > g...@lists.freedesktop.org; intel-gvt-...@lists.freedesktop.org;
> > kwankh...@nvidia.com; m...@redhat.com; tiwei@intel.com;
> > virtualization@lists.linux-foundation.org; net...@vger.kernel.org;
> > maxime.coque...@redhat.com; cunming.li...@intel.com;
> > zhihong.w...@intel.com; rob.mil...@broadcom.com; xiao.w.w...@intel.com;
> > haotian.w...@sifive.com; zhen...@linux.intel.com; zhi.a.w...@intel.com;
> > jani.nik...@linux.intel.com; joonas.lahti...@linux.intel.com;
> > rodrigo.v...@intel.com; airl...@linux.ie; dan...@ffwll.ch;
> > far...@linux.ibm.com; pa...@linux.ibm.com; seb...@linux.ibm.com;
> > ober...@linux.ibm.com; heiko.carst...@de.ibm.com; g...@linux.ibm.com;
> > borntrae...@de.ibm.com; akrow...@linux.ibm.com; fre...@linux.ibm.com;
> > lingshan@intel.com; Ido Shamay ;
> > epere...@redhat.com; l...@redhat.com; christophe.de.dinec...@gmail.com;
> > kevin.t...@intel.com
> > Subject: Re: [PATCH V3 4/7] mdev: introduce device specific ops
> > 
> > On Wed, 16 Oct 2019 05:50:08 +
> > Parav Pandit  wrote:
> >   
> > > Hi Alex,
> > >  
> > > > -Original Message-
> > > > From: Alex Williamson 
> > > > Sent: Tuesday, October 15, 2019 12:27 PM
> > > > To: Jason Wang 
> > > > Cc: Cornelia Huck ; k...@vger.kernel.org; linux-
> > > > s...@vger.kernel.org; linux-ker...@vger.kernel.org; dri-
> > > > de...@lists.freedesktop.org; intel-...@lists.freedesktop.org;
> > > > intel-gvt- d...@lists.freedesktop.org; kwankh...@nvidia.com;
> > > > m...@redhat.com; tiwei@intel.com;
> > > > virtualization@lists.linux-foundation.org;
> > > > net...@vger.kernel.org; maxime.coque...@redhat.com;
> > > > cunming.li...@intel.com; zhihong.w...@intel.com;
> > > > rob.mil...@broadcom.com; xiao.w.w...@intel.com;
> > > > haotian.w...@sifive.com; zhen...@linux.intel.com;
> > > > zhi.a.w...@intel.com; jani.nik...@linux.intel.com;
> > > > joonas.lahti...@linux.intel.com; rodrigo.v...@intel.com;
> > > > airl...@linux.ie; dan...@ffwll.ch; far...@linux.ibm.com;
> > > > pa...@linux.ibm.com; seb...@linux.ibm.com; ober...@linux.ibm.com;
> > > > heiko.carst...@de.ibm.com; g...@linux.ibm.com;
> > > > borntrae...@de.ibm.com; akrow...@linux.ibm.com;
> > > > fre...@linux.ibm.com; lingshan@intel.com; Ido Shamay
> > > > ; epere...@redhat.com; l...@redhat.com; Parav
> > > > Pandit ; christophe.de.dinec...@gmail.com;
> > > > kevin.t...@intel.com
> > > > Subject: Re: [PATCH V3 4/7] mdev: introduce device specific ops
> > > >
> > > > On Tue, 15 Oct 2019 20:17:01 +0800
> > > > Jason Wang  wrote:
> > > >  
> > > > > On 2019/10/15 6:41 PM, Cornelia Huck wrote:
> > > > > > On Fri, 11 Oct 2019 16:15:54 +0800 Jason Wang
> > > > > >  wrote:  
> >   
> > > > > >> @@ -167,9 +176,10 @@ register itself with the mdev core driver::
> > > > > >>extern int  mdev_register_device(struct device *dev,
> > > > > >> const struct
> > > > > >> mdev_parent_ops *ops);
> > > > > >>
> > > > > >> -It is also required to specify the class_id through::
> > > > > > >> +It is also required to specify the class_id and device
> > > > > > >> +specific ops through::
> > > > > >>
> > > > > >> -  extern int mdev_set_class(struct device *dev, u16 id);
> > > > > >> +  extern int mdev_set_class(struct device *dev, u16 id,
> > > > > >> +const void *ops);  
> > > > > > Apologies if that has already been discussed, but do we want a
> > > > > > 1:1 relationship between id and ops, or can different devices
> > > > > > with the same id register different ops?  
> > > > >
> > > > >
> > > > > I think we have a N:1 mapping between id and ops, e.g we want both
> > > > > virtio-mdev and vhost-mdev use a sin

Re: [PATCH V3 4/7] mdev: introduce device specific ops

2019-10-15 Thread Alex Williamson
On Tue, 15 Oct 2019 20:17:01 +0800
Jason Wang  wrote:

> On 2019/10/15 6:41 PM, Cornelia Huck wrote:
> > On Fri, 11 Oct 2019 16:15:54 +0800
> > Jason Wang  wrote:
> >  
> >> Currently, except for the create and remove, the rest of
> >> mdev_parent_ops is designed for vfio-mdev driver only and may not help
> >> for kernel mdev driver. With the help of class id, this patch
> >> introduces device specific callbacks inside mdev_device
> >> structure. This allows different set of callback to be used by
> >> vfio-mdev and virtio-mdev.
> >>
> >> Signed-off-by: Jason Wang 
> >> ---
> >>   .../driver-api/vfio-mediated-device.rst   | 22 +---
> >>   MAINTAINERS   |  1 +
> >>   drivers/gpu/drm/i915/gvt/kvmgt.c  | 18 ---
> >>   drivers/s390/cio/vfio_ccw_ops.c   | 18 ---
> >>   drivers/s390/crypto/vfio_ap_ops.c | 14 +++--
> >>   drivers/vfio/mdev/mdev_core.c |  9 +++-
> >>   drivers/vfio/mdev/mdev_private.h  |  1 +
> >>   drivers/vfio/mdev/vfio_mdev.c | 37 ++---
> >>   include/linux/mdev.h  | 42 +++
> >>   include/linux/vfio_mdev.h | 52 +++
> >>   samples/vfio-mdev/mbochs.c| 20 ---
> >>   samples/vfio-mdev/mdpy.c  | 21 +---
> >>   samples/vfio-mdev/mtty.c  | 18 ---
> >>   13 files changed, 177 insertions(+), 96 deletions(-)
> >>   create mode 100644 include/linux/vfio_mdev.h
> >>
> >> diff --git a/Documentation/driver-api/vfio-mediated-device.rst 
> >> b/Documentation/driver-api/vfio-mediated-device.rst
> >> index 2035e48da7b2..553574ebba73 100644
> >> --- a/Documentation/driver-api/vfio-mediated-device.rst
> >> +++ b/Documentation/driver-api/vfio-mediated-device.rst
> >> @@ -152,11 +152,20 @@ callbacks per mdev parent device, per mdev type, or 
> >> any other categorization.
> >>   Vendor drivers are expected to be fully asynchronous in this respect or
> >>   provide their own internal resource protection.)
> >>   
> >> -The callbacks in the mdev_parent_ops structure are as follows:
> >> +In order to support multiple types of device/driver, device needs to
> >> +provide both class_id and device_ops through:  
> > "As multiple types of mediated devices may be supported, the device
> > needs to set up the class id and the device specific callbacks via:"
> >
> > ?
> >  
> >>   
> >> -* open: open callback of mediated device
> >> -* close: close callback of mediated device
> >> -* ioctl: ioctl callback of mediated device
> >> +void mdev_set_class(struct mdev_device *mdev, u16 id, const void 
> >> *ops);
> >> +
> >> +The class_id is used to be paired with ids in id_table in mdev_driver
> >> +structure for probing the correct driver.  
> > "The class id  (specified in id) is used to match a device with an mdev
> > driver via its id table."
> >
> > ?
> >  
> >> The device_ops is device
> >> +specific callbacks which can be get through mdev_get_dev_ops()
> >> +function by mdev bus driver.  
> > "The device specific callbacks (specified in *ops) are obtainable via
> > mdev_get_dev_ops() (for use by the mdev bus driver.)"
> >
> > ?
> >  
> >> For vfio-mdev device, its device specific
> >> +ops are as follows:  
> > "A vfio-mdev device (class id MDEV_ID_VFIO) uses the following
> > device-specific ops:"
> >
> > ?  
> 
> 
> All you propose is better than what I wrote, will change the docs.
> 
> 
> >  
> >> +
> >> +* open: open callback of vfio mediated device
> >> +* close: close callback of vfio mediated device
> >> +* ioctl: ioctl callback of vfio mediated device
> >>   * read : read emulation callback
> >>   * write: write emulation callback
> >>   * mmap: mmap emulation callback
> >> @@ -167,9 +176,10 @@ register itself with the mdev core driver::
> >>extern int  mdev_register_device(struct device *dev,
> >> const struct mdev_parent_ops *ops);
> >>   
> >> -It is also required to specify the class_id through::
> >> +It is also required to specify the class_id and device specific ops 
> >> through::
> >>   
> >> -  extern int mdev_set_class(struct device *dev, u16 id);
> >> +  extern int mdev_set_class(struct device *dev, u16 id,
> >> +const void *ops);  
> > Apologies if that has already been discussed, but do we want a 1:1
> > relationship between id and ops, or can different devices with the same
> > id register different ops?  
> 
> 
> I think we have a N:1 mapping between id and ops, e.g we want both 
> virtio-mdev and vhost-mdev use a single set of device ops.

The contents of the ops structure are essentially defined by the id,
which is why I was leaning towards them being defined together.  They
are effectively interlocked, the id defines which mdev "endpoint"
driver is loaded and that driver requires mdev_get_dev_ops() to return
the structure required by the driver.  I wish 

Re: [PATCH V3 1/7] mdev: class id support

2019-10-15 Thread Alex Williamson
On Fri, 11 Oct 2019 16:15:51 +0800
Jason Wang  wrote:
  
> diff --git a/drivers/vfio/mdev/mdev_core.c b/drivers/vfio/mdev/mdev_core.c
> index b558d4cfd082..724e9b9841d8 100644
> --- a/drivers/vfio/mdev/mdev_core.c
> +++ b/drivers/vfio/mdev/mdev_core.c
> @@ -45,6 +45,12 @@ void mdev_set_drvdata(struct mdev_device *mdev, void *data)
>  }
>  EXPORT_SYMBOL(mdev_set_drvdata);
>  
> +void mdev_set_class(struct mdev_device *mdev, u16 id)
> +{
> + mdev->class_id = id;
> +}
> +EXPORT_SYMBOL(mdev_set_class);
> +
>  struct device *mdev_dev(struct mdev_device *mdev)
>  {
>   return &mdev->dev;
> @@ -135,6 +141,7 @@ static int mdev_device_remove_cb(struct device *dev, void 
> *data)
>   * mdev_register_device : Register a device
>   * @dev: device structure representing parent device.
>   * @ops: Parent device operation structure to be registered.
> + * @id: class id.
>   *
>   * Add device to list of registered parent devices.
>   * Returns a negative value on error, otherwise 0.
> @@ -324,6 +331,9 @@ int mdev_device_create(struct kobject *kobj,
>   if (ret)
>   goto ops_create_fail;
>  
> + if (!mdev->class_id)

This is a sanity test failure of the parent driver on a privileged
path, I think it's fair to print a warning when this occurs rather than
only return an errno to the user.  In fact, ret is not set to an error
value here, so it looks like this fails to create the device but
returns success.  Thanks,

Alex

> + goto class_id_fail;
> +
>   ret = device_add(&mdev->dev);
>   if (ret)
>   goto add_fail;
> @@ -340,6 +350,7 @@ int mdev_device_create(struct kobject *kobj,
>  
>  sysfs_fail:
>   device_del(&mdev->dev);
> +class_id_fail:
>  add_fail:
>   parent->ops->remove(mdev);
>  ops_create_fail:
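
A minimal userspace model of the error path I have in mind, with the warning and an explicit error value before the goto. The -EINVAL choice, the message text, and the simplified structure are assumptions, not what the patch has to use.

```c
#include <errno.h>
#include <stdint.h>
#include <stdio.h>

/* Simplified stand-in for the kernel structure (illustration only). */
struct mdev_device {
	uint16_t class_id;
	int removed;	/* set when the model "calls" parent->ops->remove() */
};

/*
 * Model of the tail of mdev_device_create(). class_id_from_parent stands in
 * for whatever the parent's create() callback set via mdev_set_class().
 */
int mdev_device_create_sketch(struct mdev_device *mdev,
			      uint16_t class_id_from_parent)
{
	int ret;

	mdev->class_id = class_id_from_parent;
	mdev->removed = 0;

	if (!mdev->class_id) {
		/* Parent driver bug on a privileged path: say so loudly. */
		fprintf(stderr, "mdev: parent driver did not set a class id\n");
		ret = -EINVAL;	/* otherwise ret would still hold create()'s 0 */
		goto class_id_fail;
	}

	return 0;

class_id_fail:
	mdev->removed = 1;	/* models parent->ops->remove(mdev) */
	return ret;
}
```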


Re: [PATCH V2 6/8] mdev: introduce virtio device and its device ops

2019-09-30 Thread Alex Williamson
On Fri, 27 Sep 2019 16:25:13 +
Parav Pandit  wrote:

> Hi Alex,
> 
> 
> > -Original Message-
> > From: Alex Williamson 
> > Sent: Tuesday, September 24, 2019 6:07 PM
> > To: Jason Wang 
> > Cc: k...@vger.kernel.org; linux-s...@vger.kernel.org; linux-
> > ker...@vger.kernel.org; dri-de...@lists.freedesktop.org; intel-
> > g...@lists.freedesktop.org; intel-gvt-...@lists.freedesktop.org;
> > kwankh...@nvidia.com; m...@redhat.com; tiwei@intel.com;
> > virtualization@lists.linux-foundation.org; net...@vger.kernel.org;
> > coh...@redhat.com; maxime.coque...@redhat.com;
> > cunming.li...@intel.com; zhihong.w...@intel.com;
> > rob.mil...@broadcom.com; xiao.w.w...@intel.com;
> > haotian.w...@sifive.com; zhen...@linux.intel.com; zhi.a.w...@intel.com;
> > jani.nik...@linux.intel.com; joonas.lahti...@linux.intel.com;
> > rodrigo.v...@intel.com; airl...@linux.ie; dan...@ffwll.ch;
> > far...@linux.ibm.com; pa...@linux.ibm.com; seb...@linux.ibm.com;
> > ober...@linux.ibm.com; heiko.carst...@de.ibm.com; g...@linux.ibm.com;
> > borntrae...@de.ibm.com; akrow...@linux.ibm.com; fre...@linux.ibm.com;
> > lingshan@intel.com; Ido Shamay ;
> > epere...@redhat.com; l...@redhat.com; Parav Pandit
> > ; christophe.de.dinec...@gmail.com;
> > kevin.t...@intel.com
> > Subject: Re: [PATCH V2 6/8] mdev: introduce virtio device and its device ops
> > 
> > On Tue, 24 Sep 2019 21:53:30 +0800
> > Jason Wang  wrote:
> >   
> > > This patch implements basic support for mdev driver that supports
> > > virtio transport for kernel virtio driver.
> > >
> > > Signed-off-by: Jason Wang 
> > > ---
> > >  include/linux/mdev.h|   2 +
> > >  include/linux/virtio_mdev.h | 145
> > > 
> > >  2 files changed, 147 insertions(+)
> > >  create mode 100644 include/linux/virtio_mdev.h
> > >
> > > diff --git a/include/linux/mdev.h b/include/linux/mdev.h index
> > > 3414307311f1..73ac27b3b868 100644
> > > --- a/include/linux/mdev.h
> > > +++ b/include/linux/mdev.h
> > > @@ -126,6 +126,8 @@ struct mdev_device *mdev_from_dev(struct device
> > > *dev);
> > >
> > >  enum {
> > >   MDEV_ID_VFIO = 1,
> > > + MDEV_ID_VIRTIO = 2,
> > > + MDEV_ID_VHOST = 3,  
> > 
> > MDEV_ID_VHOST isn't used yet here.  Also, given the strong interdependence
> > between the class_id and the ops structure, we might want to define them in
> > the same place.  Thanks,
> >   
> 
> When mlx5_core creates mdevs (via parent->ops->create()) and wants them
> to bind to the mlx5 mdev driver (which calls mdev_register_driver()), the
> mlx5 core driver will publish MDEV_ID_MLX5_NET, defined in a central place
> such as include/linux/mdev.h, without any ops structure, because such ops
> are not relevant. It uses the usual, standard probe() and remove() ops on
> the mdev (similar to a regular PCI device). So for the VHOST case the ops
> may be closely related to the ID, but not for other types of ID.
> 
> Just want to make sure that the scope of the ID covers this case.

AIUI, these device-ops are primarily meant to have 1:N multiplexing of
the mdev bus driver.  One mdev bus driver supports N vendor drivers via
a common "protocol" defined by this structure.  vfio-mdev supports
GVT-g, GRID, and several sample drivers.  I think Jason and Tiwei are
attempting something similar if we have multiple vendors that may
provide virtio/vhost parent drivers.  If you have a 1:1 model with
mlx5 where you're not trying to abstract a common channel between the
mdev bus driver and the mdev vendor driver, then I suppose you might
not use the device-ops capabilities of the mdev-core.  Did I interpret
the question correctly?  I think that's probably fine, mdev-core
shouldn't have any dependencies on the device-ops and we shouldn't
really be dictating the bus/vendor link through mdev.  Thanks,

Alex
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH V2 5/8] mdev: introduce device specific ops

2019-09-26 Thread Alex Williamson
On Thu, 26 Sep 2019 11:46:55 -0400
"Michael S. Tsirkin"  wrote:

> On Wed, Sep 25, 2019 at 10:30:28AM -0600, Alex Williamson wrote:
> > On Wed, 25 Sep 2019 10:11:00 -0400
> > Rob Miller  wrote:  
> > > > > On Tue, 24 Sep 2019 21:53:29 +0800
> > > > > Jason Wang  wrote:  
> > > > > > diff --git a/drivers/vfio/mdev/vfio_mdev.c
> > > > > b/drivers/vfio/mdev/vfio_mdev.c
> > > > > > index 891cf83a2d9a..95efa054442f 100644
> > > > > > --- a/drivers/vfio/mdev/vfio_mdev.c
> > > > > > +++ b/drivers/vfio/mdev/vfio_mdev.c
> > > > > > @@ -14,6 +14,7 @@
> > > > > >  #include 
> > > > > >  #include 
> > > > > >  #include 
> > > > > > +#include 
> > > > > >
> > > > > >  #include "mdev_private.h"
> > > > > >
> > > > > > @@ -24,16 +25,16 @@
> > > > > >  static int vfio_mdev_open(void *device_data)
> > > > > >  {
> > > > > > struct mdev_device *mdev = device_data;
> > > > > > -   struct mdev_parent *parent = mdev->parent;
> > > > > > +   const struct vfio_mdev_device_ops *ops =
> > > > > mdev_get_dev_ops(mdev);
> > > > > > int ret;
> > > > > >
> > > > > > -   if (unlikely(!parent->ops->open))
> > > > > > +   if (unlikely(!ops->open))
> > > > > > return -EINVAL;
> > > > > >
> > > > > > if (!try_module_get(THIS_MODULE))
> > > > > > return -ENODEV;
> > > >
> > >   
> > > RJM>] My understanding lately is that this call to
> > > > try_module_get(THIS_MODULE) is no longer needed, as it is considered a
> > > latent bug.
> > > Quote from
> > > https://stackoverflow.com/questions/1741415/linux-kernel-modules-when-to-use-try-module-get-module-put
> > >  :
> > > There are a number of uses of try_module_get(THIS_MODULE) in the kernel
> > > source but most if not all of them are latent bugs that should be cleaned
> > > up.  
> > 
> > This use seems to fall exactly into the case where it is necessary, the
> > open here is not a direct VFS call, it's an internal interface between
> > modules.  The user is interacting with filesystem objects from the vfio
> > module and the module reference we're trying to acquire here is to the
> > vfio-mdev module.  Thanks,
> > 
> > Alex  
> 
> 
> I think the latent bug refers not to module get per se,
> but to the module_put tied to it. E.g.:
> 
>  static void vfio_mdev_release(void *device_data)
>  {
> struct mdev_device *mdev = device_data;
> struct mdev_parent *parent = mdev->parent;
> 
> if (likely(parent->ops->release))
> parent->ops->release(mdev);
> 
> module_put(THIS_MODULE);
> 
> Does anything prevent the module from unloading at this point?
> if not then ...
> 
> 
>  }
> 
> it looks like the implicit return (with instructions for argument pop
> and function return) here can get overwritten on module
> unload, causing a crash when executed.
> 
> IOW there's generally no way for module to keep a reference
> to itself: it can take a reference but it needs someone else
> to keep it and put.

I'd always assumed this would exit cleanly, but perhaps there is a
latent race there.  In any case, taking a module reference within the
module in this case is better than not doing so, as the latter would
potentially allow the module to be removed at any point in time, while
the former only seems to expose acquire and release gaps.  Add it to
the todo list.  Thanks,

Alex
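
For readers following the reference-counting argument, here is a toy
userspace model, not kernel code.  The names toy_module, toy_module_get()
and friends are invented for illustration; the only behavior borrowed from
the kernel is that try_module_get() fails once unload has begun and that
unload is refused while references are held.  It shows why taking the
reference in open narrows the problem to the short window after the final
put:

```c
#include <errno.h>
#include <stdbool.h>

/* Toy stand-in for a module; NOT the kernel's struct module. */
struct toy_module {
	int refcnt;   /* references currently held */
	bool going;   /* set once unload has started */
};

/* Mirrors the try_module_get() contract: fails once unload has begun. */
static bool toy_module_get(struct toy_module *mod)
{
	if (mod->going)
		return false;
	mod->refcnt++;
	return true;
}

static void toy_module_put(struct toy_module *mod)
{
	mod->refcnt--;
}

/* Unload is refused while anybody still holds a reference. */
static bool toy_module_unload(struct toy_module *mod)
{
	if (mod->refcnt > 0)
		return false;
	mod->going = true;
	return true;
}

/* Open path: pin the module for as long as the device stays open. */
static int toy_open(struct toy_module *self)
{
	if (!toy_module_get(self))
		return -ENODEV;
	return 0;
}

/* Release path: drop the pin taken in toy_open(). */
static void toy_release(struct toy_module *self)
{
	toy_module_put(self);
	/*
	 * From here until the function returns, nothing pins the module.
	 * In a real module this is where the release gap would sit.
	 */
}
```

With this model, an unload attempted between toy_open() and toy_release()
is refused, and an open attempted after unload has started returns -ENODEV.
The single-threaded sketch cannot reproduce the race itself; the comment in
toy_release() only marks where it would sit.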


Re: [PATCH V2 5/8] mdev: introduce device specific ops

2019-09-25 Thread Alex Williamson
On Wed, 25 Sep 2019 10:11:00 -0400
Rob Miller  wrote:
> > > On Tue, 24 Sep 2019 21:53:29 +0800
> > > Jason Wang  wrote:
> > > > diff --git a/drivers/vfio/mdev/vfio_mdev.c  
> > > b/drivers/vfio/mdev/vfio_mdev.c  
> > > > index 891cf83a2d9a..95efa054442f 100644
> > > > --- a/drivers/vfio/mdev/vfio_mdev.c
> > > > +++ b/drivers/vfio/mdev/vfio_mdev.c
> > > > @@ -14,6 +14,7 @@
> > > >  #include 
> > > >  #include 
> > > >  #include 
> > > > +#include 
> > > >
> > > >  #include "mdev_private.h"
> > > >
> > > > @@ -24,16 +25,16 @@
> > > >  static int vfio_mdev_open(void *device_data)
> > > >  {
> > > > struct mdev_device *mdev = device_data;
> > > > -   struct mdev_parent *parent = mdev->parent;
> > > > +   const struct vfio_mdev_device_ops *ops =  
> > > mdev_get_dev_ops(mdev);  
> > > > int ret;
> > > >
> > > > -   if (unlikely(!parent->ops->open))
> > > > +   if (unlikely(!ops->open))
> > > > return -EINVAL;
> > > >
> > > > if (!try_module_get(THIS_MODULE))
> > > > return -ENODEV;  
> >  
> 
> RJM>] My understanding lately is that this call to  
> try_module_get(THIS_MODULE) is no longer needed, as it is considered a
> latent bug.
> Quote from
> https://stackoverflow.com/questions/1741415/linux-kernel-modules-when-to-use-try-module-get-module-put
>  :
> There are a number of uses of try_module_get(THIS_MODULE) in the kernel
> source but most if not all of them are latent bugs that should be cleaned
> up.

This use seems to fall exactly into the case where it is necessary, the
open here is not a direct VFS call, it's an internal interface between
modules.  The user is interacting with filesystem objects from the vfio
module and the module reference we're trying to acquire here is to the
vfio-mdev module.  Thanks,

Alex


Re: [PATCH V2 6/8] mdev: introduce virtio device and its device ops

2019-09-24 Thread Alex Williamson
On Tue, 24 Sep 2019 21:53:30 +0800
Jason Wang  wrote:

> This patch implements basic support for mdev driver that supports
> virtio transport for kernel virtio driver.
> 
> Signed-off-by: Jason Wang 
> ---
>  include/linux/mdev.h|   2 +
>  include/linux/virtio_mdev.h | 145 
>  2 files changed, 147 insertions(+)
>  create mode 100644 include/linux/virtio_mdev.h
> 
> diff --git a/include/linux/mdev.h b/include/linux/mdev.h
> index 3414307311f1..73ac27b3b868 100644
> --- a/include/linux/mdev.h
> +++ b/include/linux/mdev.h
> @@ -126,6 +126,8 @@ struct mdev_device *mdev_from_dev(struct device *dev);
>  
>  enum {
>   MDEV_ID_VFIO = 1,
> + MDEV_ID_VIRTIO = 2,
> + MDEV_ID_VHOST = 3,

MDEV_ID_VHOST isn't used yet here.  Also, given the strong
interdependence between the class_id and the ops structure, we might
want to define them in the same place.  Thanks,

Alex

>   /* New entries must be added here */
>  };
>  
> diff --git a/include/linux/virtio_mdev.h b/include/linux/virtio_mdev.h
> new file mode 100644
> index ..d1a40a739266
> --- /dev/null
> +++ b/include/linux/virtio_mdev.h
> @@ -0,0 +1,145 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + * Virtio mediated device driver
> + *
> + * Copyright 2019, Red Hat Corp.
> + * Author: Jason Wang 
> + */
> +#ifndef _LINUX_VIRTIO_MDEV_H
> +#define _LINUX_VIRTIO_MDEV_H
> +
> +#include 
> +#include 
> +#include 
> +
> +#define VIRTIO_MDEV_DEVICE_API_STRING"virtio-mdev"
> +#define VIRTIO_MDEV_VERSION 0x1
> +
> +struct virtio_mdev_callback {
> + irqreturn_t (*callback)(void *data);
> + void *private;
> +};
> +
> +/**
> + * struct vfio_mdev_device_ops - Structure to be registered for each
> + * mdev device to register the device to virtio-mdev module.
> + *
> + * @set_vq_address:  Set the address of virtqueue
> + *   @mdev: mediated device
> + *   @idx: virtqueue index
> + *   @desc_area: address of desc area
> + *   @driver_area: address of driver area
> + *   @device_area: address of device area
> + *   Returns integer: success (0) or error (< 0)
> + * @set_vq_num:  Set the size of virtqueue
> + *   @mdev: mediated device
> + *   @idx: virtqueue index
> + *   @num: the size of virtqueue
> + * @kick_vq: Kick the virtqueue
> + *   @mdev: mediated device
> + *   @idx: virtqueue index
> + * @set_vq_cb:   Set the interrupt callback function for
> + *   a virtqueue
> + *   @mdev: mediated device
> + *   @idx: virtqueue index
> + *   @cb: virtio-mdev interrupt callback structure
> + * @set_vq_ready:Set ready status for a virtqueue
> + *   @mdev: mediated device
> + *   @idx: virtqueue index
> + *   @ready: ready (true) not ready(false)
> + * @get_vq_ready:Get ready status for a virtqueue
> + *   @mdev: mediated device
> + *   @idx: virtqueue index
> + *   Returns boolean: ready (true) or not (false)
> + * @set_vq_state:Set the state for a virtqueue
> + *   @mdev: mediated device
> + *   @idx: virtqueue index
> + *   @state: virtqueue state (last_avail_idx)
> + *   Returns integer: success (0) or error (< 0)
> + * @get_vq_state:Get the state for a virtqueue
> + *   @mdev: mediated device
> + *   @idx: virtqueue index
> + *   Returns virtqueue state (last_avail_idx)
> + * @get_vq_align:Get the virtqueue align requirement
> + *   for the device
> + *   @mdev: mediated device
> + *   Returns virtqueue align requirement
> + * @get_features:Get virtio features supported by the device
> + *   @mdev: mediated device
> + *   Returns the features supported by the
> + *   device
> + * @set_features:Set virtio features supported by the driver
> + *   @mdev: mediated device
> + *   @features: features supported by the driver
> + *   Returns integer: success (0) or error (< 0)
> + * @set_config_cb:   Set the config interrupt callback
> + *   @mdev: mediated device
> + *   @cb: virtio-mdev interrupt callback structure
> + * 

Re: [PATCH V2 2/8] mdev: class id support

2019-09-24 Thread Alex Williamson
On Tue, 24 Sep 2019 21:53:26 +0800
Jason Wang  wrote:

> Mdev bus only supports vfio driver right now, so it doesn't implement
> match method. But in the future, we may add drivers other than vfio,
> the first driver could be virtio-mdev. This means we need to add
> device class id support in bus match method to pair the mdev device
> and mdev driver correctly.
> 
> So this patch adds id_table to mdev_driver and class_id for mdev
> parent with the match method for mdev bus.

Description needs to be revised from v1, class_id is no longer on the
parent.

> Signed-off-by: Jason Wang 
> ---
>  Documentation/driver-api/vfio-mediated-device.rst |  3 +++
>  drivers/gpu/drm/i915/gvt/kvmgt.c  |  1 +
>  drivers/s390/cio/vfio_ccw_ops.c   |  1 +
>  drivers/s390/crypto/vfio_ap_ops.c |  1 +
>  drivers/vfio/mdev/mdev_core.c |  7 +++
>  drivers/vfio/mdev/mdev_driver.c   | 14 ++
>  drivers/vfio/mdev/mdev_private.h  |  1 +
>  drivers/vfio/mdev/vfio_mdev.c |  6 ++
>  include/linux/mdev.h  |  8 
>  include/linux/mod_devicetable.h   |  8 
>  samples/vfio-mdev/mbochs.c|  1 +
>  samples/vfio-mdev/mdpy.c  |  1 +
>  samples/vfio-mdev/mtty.c  |  1 +
>  13 files changed, 53 insertions(+)
> 
> diff --git a/Documentation/driver-api/vfio-mediated-device.rst 
> b/Documentation/driver-api/vfio-mediated-device.rst
> index 25eb7d5b834b..a5bdc60d62a1 100644
> --- a/Documentation/driver-api/vfio-mediated-device.rst
> +++ b/Documentation/driver-api/vfio-mediated-device.rst
> @@ -102,12 +102,14 @@ structure to represent a mediated device's driver::
>* @probe: called when new device created
>* @remove: called when device removed
>* @driver: device driver structure
> +  * @id_table: the ids serviced by this driver
>*/
>   struct mdev_driver {
>const char *name;
>int  (*probe)  (struct device *dev);
>void (*remove) (struct device *dev);
>struct device_driverdriver;
> +  const struct mdev_class_id *id_table;
>   };
>  
>  A mediated bus driver for mdev should use this structure in the function 
> calls
> @@ -165,6 +167,7 @@ register itself with the mdev core driver::
>   extern int  mdev_register_device(struct device *dev,
>const struct mdev_parent_ops *ops);
>  
> +
>  However, the mdev_parent_ops structure is not required in the function call
>  that a driver should use to unregister itself with the mdev core driver::

Unintended extra line?  Doesn't seem to match surrounding formatting.

Calling mdev_set_class_id() as part of create seems relatively
fundamental to the vendor driver with this change, it should be added
to the documentation.
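
To make the create-time requirement concrete, here is a small userspace
sketch of how class-id matching might work.  Only mdev_set_class_id() and
the MDEV_ID_* values appear in the patches as quoted; the zero-terminated
id_table walk in mdev_match(), the trimmed-down structures, and
sample_create() are assumptions for illustration and may differ from what
the series actually does:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum {
	MDEV_ID_VFIO = 1,
	MDEV_ID_VIRTIO = 2,
};

/* Trimmed-down stand-ins for the kernel structures. */
struct mdev_class_id {
	uint16_t id;
};

struct mdev_device {
	uint16_t class_id;
};

struct mdev_driver {
	const char *name;
	const struct mdev_class_id *id_table; /* assumed: ends with id == 0 */
};

/* Same shape as the helper added in mdev_core.c by the quoted patch. */
static void mdev_set_class_id(struct mdev_device *mdev, uint16_t id)
{
	mdev->class_id = id;
}

/* Assumed bus match: walk the driver's table looking for the device's id. */
static bool mdev_match(const struct mdev_device *mdev,
		       const struct mdev_driver *drv)
{
	const struct mdev_class_id *ids;

	if (!drv->id_table)
		return false;

	for (ids = drv->id_table; ids->id; ids++)
		if (ids->id == mdev->class_id)
			return true;

	return false;
}

/* Hypothetical vendor create callback: it must pick a class id. */
static int sample_create(struct mdev_device *mdev)
{
	mdev_set_class_id(mdev, MDEV_ID_VFIO);
	return 0;
}

static const struct mdev_class_id vfio_ids[] = {
	{ MDEV_ID_VFIO },
	{ 0 },
};

static const struct mdev_class_id virtio_ids[] = {
	{ MDEV_ID_VIRTIO },
	{ 0 },
};

static const struct mdev_driver vfio_mdev_driver = {
	.name = "vfio_mdev",
	.id_table = vfio_ids,
};

static const struct mdev_driver virtio_mdev_driver = {
	.name = "virtio_mdev",
	.id_table = virtio_ids,
};
```

In this sketch a device whose create callback never sets a class id keeps
the value 0 and matches no driver at all, which is why the call deserves a
mention in the documentation.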

> diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c 
> b/drivers/gpu/drm/i915/gvt/kvmgt.c
> index 23aa3e50cbf8..f793252a3d2a 100644
> --- a/drivers/gpu/drm/i915/gvt/kvmgt.c
> +++ b/drivers/gpu/drm/i915/gvt/kvmgt.c
> @@ -678,6 +678,7 @@ static int intel_vgpu_create(struct kobject *kobj, struct 
> mdev_device *mdev)
>dev_name(mdev_dev(mdev)));
>   ret = 0;
>  
> + mdev_set_class_id(mdev, MDEV_ID_VFIO);
>  out:
>   return ret;
>  }
> diff --git a/drivers/s390/cio/vfio_ccw_ops.c b/drivers/s390/cio/vfio_ccw_ops.c
> index f0d71ab77c50..d258ef1fedb9 100644
> --- a/drivers/s390/cio/vfio_ccw_ops.c
> +++ b/drivers/s390/cio/vfio_ccw_ops.c
> @@ -129,6 +129,7 @@ static int vfio_ccw_mdev_create(struct kobject *kobj, 
> struct mdev_device *mdev)
>  private->sch->schid.ssid,
>  private->sch->schid.sch_no);
>  
> + mdev_set_class_id(mdev, MDEV_ID_VFIO);
>   return 0;
>  }
>  
> diff --git a/drivers/s390/crypto/vfio_ap_ops.c 
> b/drivers/s390/crypto/vfio_ap_ops.c
> index 5c0f53c6dde7..2cfd96112aa0 100644
> --- a/drivers/s390/crypto/vfio_ap_ops.c
> +++ b/drivers/s390/crypto/vfio_ap_ops.c
> @@ -343,6 +343,7 @@ static int vfio_ap_mdev_create(struct kobject *kobj, 
> struct mdev_device *mdev)
>   list_add(&matrix_mdev->node, &matrix_dev->mdev_list);
>   mutex_unlock(&matrix_dev->lock);
>  
> + mdev_set_class_id(mdev, MDEV_ID_VFIO);
>   return 0;
>  }
>  
> diff --git a/drivers/vfio/mdev/mdev_core.c b/drivers/vfio/mdev/mdev_core.c
> index b558d4cfd082..8764cf4a276d 100644
> --- a/drivers/vfio/mdev/mdev_core.c
> +++ b/drivers/vfio/mdev/mdev_core.c
> @@ -45,6 +45,12 @@ void mdev_set_drvdata(struct mdev_device *mdev, void *data)
>  }
>  EXPORT_SYMBOL(mdev_set_drvdata);
>  
> +void mdev_set_class_id(struct mdev_device *mdev, u16 id)
> +{
> + mdev->class_id = id;
> +}
> +EXPORT_SYMBOL(mdev_set_class_id);
> +
>  struct device *mdev_dev(struct mdev_device *mdev)
>  {
>   return &mdev->dev;
> @@ -135,6 +141,7 @@ static int 

Re: [PATCH V2 5/8] mdev: introduce device specific ops

2019-09-24 Thread Alex Williamson
On Tue, 24 Sep 2019 21:53:29 +0800
Jason Wang  wrote:

> Currently, except for create and remove, the rest of
> mdev_parent_ops is designed for the vfio-mdev driver only and may not help
> a kernel mdev driver. With the help of class id, this patch
> introduces device specific callbacks inside the mdev_device
> structure. This allows different sets of callbacks to be used by
> vfio-mdev and virtio-mdev.
> 
> Signed-off-by: Jason Wang 
> ---
>  .../driver-api/vfio-mediated-device.rst   |  4 +-
>  MAINTAINERS   |  1 +
>  drivers/gpu/drm/i915/gvt/kvmgt.c  | 17 +++---
>  drivers/s390/cio/vfio_ccw_ops.c   | 17 --
>  drivers/s390/crypto/vfio_ap_ops.c | 13 +++--
>  drivers/vfio/mdev/mdev_core.c | 12 +
>  drivers/vfio/mdev/mdev_private.h  |  1 +
>  drivers/vfio/mdev/vfio_mdev.c | 37 ++---
>  include/linux/mdev.h  | 42 ---
>  include/linux/vfio_mdev.h | 52 +++
>  samples/vfio-mdev/mbochs.c| 19 ---
>  samples/vfio-mdev/mdpy.c  | 19 ---
>  samples/vfio-mdev/mtty.c  | 17 --
>  13 files changed, 168 insertions(+), 83 deletions(-)
>  create mode 100644 include/linux/vfio_mdev.h
> 
> diff --git a/Documentation/driver-api/vfio-mediated-device.rst 
> b/Documentation/driver-api/vfio-mediated-device.rst
> index a5bdc60d62a1..d50425b368bb 100644
> --- a/Documentation/driver-api/vfio-mediated-device.rst
> +++ b/Documentation/driver-api/vfio-mediated-device.rst
> @@ -152,7 +152,9 @@ callbacks per mdev parent device, per mdev type, or any 
> other categorization.
>  Vendor drivers are expected to be fully asynchronous in this respect or
>  provide their own internal resource protection.)
>  
> -The callbacks in the mdev_parent_ops structure are as follows:
> +The device specific callbacks are referred through device_ops pointer
> +in mdev_parent_ops. For vfio-mdev device, its callbacks in device_ops
> +are as follows:

This is not accurate.  device_ops is now on the mdev_device and is an
mdev bus driver specific structure of callbacks that must be registered
for each mdev device by the parent driver during the create callback.
There's a one to one mapping of class_id to mdev_device_ops callbacks.

That also suggests to me that we could be more clever in registering
both of these with mdev-core.  Can we embed the class_id in the ops
structure in a common way so that the core can extract it and the bus
drivers can access their specific callbacks?

>  * open: open callback of mediated device
>  * close: close callback of mediated device
> diff --git a/MAINTAINERS b/MAINTAINERS
> index b2326dece28e..89832b316500 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -17075,6 +17075,7 @@ S:Maintained
>  F:   Documentation/driver-api/vfio-mediated-device.rst
>  F:   drivers/vfio/mdev/
>  F:   include/linux/mdev.h
> +F:   include/linux/vfio_mdev.h
>  F:   samples/vfio-mdev/
>  
>  VFIO PLATFORM DRIVER
> diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c 
> b/drivers/gpu/drm/i915/gvt/kvmgt.c
> index f793252a3d2a..b274f5ee481f 100644
> --- a/drivers/gpu/drm/i915/gvt/kvmgt.c
> +++ b/drivers/gpu/drm/i915/gvt/kvmgt.c
> @@ -42,6 +42,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  
>  #include 
> @@ -643,6 +644,8 @@ static void kvmgt_put_vfio_device(void *vgpu)
>   vfio_device_put(((struct intel_vgpu *)vgpu)->vdev.vfio_device);
>  }
>  
> +static struct vfio_mdev_device_ops intel_vfio_vgpu_dev_ops;
> +
>  static int intel_vgpu_create(struct kobject *kobj, struct mdev_device *mdev)
>  {
>   struct intel_vgpu *vgpu = NULL;
> @@ -679,6 +682,7 @@ static int intel_vgpu_create(struct kobject *kobj, struct 
> mdev_device *mdev)
>   ret = 0;
>  
>   mdev_set_class_id(mdev, MDEV_ID_VFIO);
> + mdev_set_dev_ops(mdev, &intel_vfio_vgpu_dev_ops);

This seems rather unrefined.  We're registering interdependent data in
separate calls.  All drivers need to make both of these calls.  I'm not
sure if this is a good idea, but what if we had:

static const struct vfio_mdev_device_ops intel_vfio_vgpu_dev_ops = {
.id = MDEV_ID_VFIO,
.open   = intel_vgpu_open,
.release= intel_vgpu_release,
...

And the set function passed &intel_vfio_vgpu_dev_ops.id and the mdev
bus drivers used container_of to get to their callbacks?
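
A rough userspace sketch of that idea, with container_of() reimplemented
locally and every structure reduced to the minimum.  The mdev_device
layout, the mdev_set_dev_ops() signature taking a pointer to the id, and
vfio_mdev_get_ops() are all hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local, const-only stand-in for the kernel's container_of(). */
#define container_of(ptr, type, member) \
	((const type *)((const char *)(ptr) - offsetof(type, member)))

enum {
	MDEV_ID_VFIO = 1,
	MDEV_ID_VIRTIO = 2,
};

/* Hypothetical: bus-specific ops with the class id embedded as a member. */
struct vfio_mdev_device_ops {
	uint16_t id;
	int (*open)(void *mdev);
	void (*release)(void *mdev);
};

/* Hypothetical core-side device: the core only ever sees the id pointer. */
struct mdev_device {
	const uint16_t *class_id;
};

/* Core: one registration call records the id and, implicitly, the ops. */
static void mdev_set_dev_ops(struct mdev_device *mdev, const uint16_t *id)
{
	mdev->class_id = id;
}

/* Bus driver: recover its own ops structure from the id pointer. */
static const struct vfio_mdev_device_ops *
vfio_mdev_get_ops(const struct mdev_device *mdev)
{
	/* Only valid if the device really was registered with vfio ops. */
	assert(*mdev->class_id == MDEV_ID_VFIO);
	return container_of(mdev->class_id, struct vfio_mdev_device_ops, id);
}

static int demo_open(void *mdev)
{
	(void)mdev;
	return 0;
}

static void demo_release(void *mdev)
{
	(void)mdev;
}

static const struct vfio_mdev_device_ops demo_vfio_ops = {
	.id = MDEV_ID_VFIO,
	.open = demo_open,
	.release = demo_release,
};
```

The core can read the class id through the pointer for bus matching, while
only the bus driver that owns the matching id knows the enclosing type, so
the two pieces of data can no longer be registered inconsistently.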

>  out:
>   return ret;
>  }
> @@ -1601,20 +1605,21 @@ static const struct attribute_group 
> *intel_vgpu_groups[] = {
>   NULL,
>  };
>  
> -static struct mdev_parent_ops intel_vgpu_ops = {
> - .mdev_attr_groups   = intel_vgpu_groups,
> - .create = intel_vgpu_create,
> - .remove = intel_vgpu_remove,
> -
> +static struct vfio_mdev_device_ops intel_vfio_vgpu_dev_ops = {
>   .open   = 

Re: [PATCH 5/6] vringh: fix copy direction of vringh_iov_push_kern()

2019-09-24 Thread Alex Williamson
On Mon, 23 Sep 2019 12:00:41 -0400
"Michael S. Tsirkin"  wrote:

> On Mon, Sep 23, 2019 at 09:45:59AM -0600, Alex Williamson wrote:
> > On Mon, 23 Sep 2019 21:03:30 +0800
> > Jason Wang  wrote:
> >   
> > > We want to copy from iov to buf, so the direction was wrong.
> > > 
> > > Signed-off-by: Jason Wang 
> > > ---
> > >  drivers/vhost/vringh.c | 8 +++-
> > >  1 file changed, 7 insertions(+), 1 deletion(-)  
> > 
> > 
> > Why is this included in the series?  Seems like an unrelated fix being
> > held up within a proposal for a new feature.  Thanks,
> > 
> > Alex  
> 
> It's better to have it as patch 1/6, but it's a dependency of the
> example driver in the series. I can reorder when I apply.

It's a fix, please submit it separately through virtio/vhost channels,
then it will already be in the base kernel we use for the rest of the
series.  The remainder of the series certainly suggests a workflow
through the vfio tree rather than virtio/vhost.  Thanks,

Alex

> > > diff --git a/drivers/vhost/vringh.c b/drivers/vhost/vringh.c
> > > index 08ad0d1f0476..a0a2d74967ef 100644
> > > --- a/drivers/vhost/vringh.c
> > > +++ b/drivers/vhost/vringh.c
> > > @@ -852,6 +852,12 @@ static inline int xfer_kern(void *src, void *dst, 
> > > size_t len)
> > >   return 0;
> > >  }
> > >  
> > > +static inline int kern_xfer(void *dst, void *src, size_t len)
> > > +{
> > > + memcpy(dst, src, len);
> > > + return 0;
> > > +}
> > > +
> > >  /**
> > >   * vringh_init_kern - initialize a vringh for a kernelspace vring.
> > >   * @vrh: the vringh to initialize.
> > > @@ -958,7 +964,7 @@ EXPORT_SYMBOL(vringh_iov_pull_kern);
> > >  ssize_t vringh_iov_push_kern(struct vringh_kiov *wiov,
> > >const void *src, size_t len)
> > >  {
> > > - return vringh_iov_xfer(wiov, (void *)src, len, xfer_kern);
> > > + return vringh_iov_xfer(wiov, (void *)src, len, kern_xfer);
> > >  }
> > >  EXPORT_SYMBOL(vringh_iov_push_kern);
> > >



Re: [PATCH 1/6] mdev: class id support

2019-09-23 Thread Alex Williamson
On Mon, 23 Sep 2019 21:03:26 +0800
Jason Wang  wrote:

> Mdev bus only supports vfio driver right now, so it doesn't implement
> match method. But in the future, we may add drivers other than vfio,
> one example is virtio-mdev[1] driver. This means we need to add device
> class id support in bus match method to pair the mdev device and mdev
> driver correctly.
> 
> So this patch adds id_table to mdev_driver and class_id for mdev
> parent with the match method for mdev bus.
> 
> Signed-off-by: Jason Wang 
> ---
>  Documentation/driver-api/vfio-mediated-device.rst |  7 +--
>  drivers/gpu/drm/i915/gvt/kvmgt.c  |  2 +-
>  drivers/s390/cio/vfio_ccw_ops.c   |  2 +-
>  drivers/s390/crypto/vfio_ap_ops.c |  3 ++-
>  drivers/vfio/mdev/mdev_core.c | 14 --
>  drivers/vfio/mdev/mdev_driver.c   | 14 ++
>  drivers/vfio/mdev/mdev_private.h  |  1 +
>  drivers/vfio/mdev/vfio_mdev.c |  6 ++
>  include/linux/mdev.h  |  7 ++-
>  include/linux/mod_devicetable.h   |  8 
>  samples/vfio-mdev/mbochs.c|  2 +-
>  samples/vfio-mdev/mdpy.c  |  2 +-
>  samples/vfio-mdev/mtty.c  |  2 +-
>  13 files changed, 59 insertions(+), 11 deletions(-)
> 
> diff --git a/Documentation/driver-api/vfio-mediated-device.rst 
> b/Documentation/driver-api/vfio-mediated-device.rst
> index 25eb7d5b834b..0e052072e1d8 100644
> --- a/Documentation/driver-api/vfio-mediated-device.rst
> +++ b/Documentation/driver-api/vfio-mediated-device.rst
> @@ -102,12 +102,14 @@ structure to represent a mediated device's driver::
>* @probe: called when new device created
>* @remove: called when device removed
>* @driver: device driver structure
> +  * @id_table: the ids serviced by this driver.
>*/
>   struct mdev_driver {
>const char *name;
>int  (*probe)  (struct device *dev);
>void (*remove) (struct device *dev);
>struct device_driverdriver;
> +  const struct mdev_class_id *id_table;
>   };
>  
>  A mediated bus driver for mdev should use this structure in the function 
> calls
> @@ -116,7 +118,7 @@ to register and unregister itself with the core driver:
>  * Register::
>  
>  extern int  mdev_register_driver(struct mdev_driver *drv,
> -struct module *owner);
> + struct module *owner);
>  
>  * Unregister::
>  
> @@ -163,7 +165,8 @@ A driver should use the mdev_parent_ops structure in the 
> function call to
>  register itself with the mdev core driver::
>  
>   extern int  mdev_register_device(struct device *dev,
> -  const struct mdev_parent_ops *ops);
> +  const struct mdev_parent_ops *ops,
> +  u8 class_id);
>  
>  However, the mdev_parent_ops structure is not required in the function call
>  that a driver should use to unregister itself with the mdev core driver::
> diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c 
> b/drivers/gpu/drm/i915/gvt/kvmgt.c
> index 23aa3e50cbf8..19d51a35f019 100644
> --- a/drivers/gpu/drm/i915/gvt/kvmgt.c
> +++ b/drivers/gpu/drm/i915/gvt/kvmgt.c
> @@ -1625,7 +1625,7 @@ static int kvmgt_host_init(struct device *dev, void 
> *gvt, const void *ops)
>   return -EFAULT;
>   intel_vgpu_ops.supported_type_groups = kvm_vgpu_type_groups;
>  
> - return mdev_register_device(dev, &intel_vgpu_ops);
> + return mdev_register_vfio_device(dev, &intel_vgpu_ops);
>  }
>  
>  static void kvmgt_host_exit(struct device *dev)
> diff --git a/drivers/s390/cio/vfio_ccw_ops.c b/drivers/s390/cio/vfio_ccw_ops.c
> index f0d71ab77c50..246ff0f80944 100644
> --- a/drivers/s390/cio/vfio_ccw_ops.c
> +++ b/drivers/s390/cio/vfio_ccw_ops.c
> @@ -588,7 +588,7 @@ static const struct mdev_parent_ops vfio_ccw_mdev_ops = {
>  
>  int vfio_ccw_mdev_reg(struct subchannel *sch)
>  {
> - return mdev_register_device(&sch->dev, &vfio_ccw_mdev_ops);
> + return mdev_register_vfio_device(&sch->dev, &vfio_ccw_mdev_ops);
>  }
>  
>  void vfio_ccw_mdev_unreg(struct subchannel *sch)
> diff --git a/drivers/s390/crypto/vfio_ap_ops.c 
> b/drivers/s390/crypto/vfio_ap_ops.c
> index 5c0f53c6dde7..7487fc39d2c5 100644
> --- a/drivers/s390/crypto/vfio_ap_ops.c
> +++ b/drivers/s390/crypto/vfio_ap_ops.c
> @@ -1295,7 +1295,8 @@ int vfio_ap_mdev_register(void)
>  {
>   atomic_set(&matrix_dev->available_instances, MAX_ZDEV_ENTRIES_EXT);
>  
> - return mdev_register_device(&matrix_dev->device, &vfio_ap_matrix_ops);
> + return mdev_register_vfio_device(&matrix_dev->device,
> +  &vfio_ap_matrix_ops);
>  }
>  
>  void vfio_ap_mdev_unregister(void)
> diff --git a/drivers/vfio/mdev/mdev_core.c b/drivers/vfio/mdev/mdev_core.c
> index 

Re: [PATCH 5/6] vringh: fix copy direction of vringh_iov_push_kern()

2019-09-23 Thread Alex Williamson
On Mon, 23 Sep 2019 21:03:30 +0800
Jason Wang  wrote:

> We want to copy from iov to buf, so the direction was wrong.
> 
> Signed-off-by: Jason Wang 
> ---
>  drivers/vhost/vringh.c | 8 +++-
>  1 file changed, 7 insertions(+), 1 deletion(-)


Why is this included in the series?  Seems like an unrelated fix being
held up within a proposal for a new feature.  Thanks,

Alex
 
> diff --git a/drivers/vhost/vringh.c b/drivers/vhost/vringh.c
> index 08ad0d1f0476..a0a2d74967ef 100644
> --- a/drivers/vhost/vringh.c
> +++ b/drivers/vhost/vringh.c
> @@ -852,6 +852,12 @@ static inline int xfer_kern(void *src, void *dst, size_t 
> len)
>   return 0;
>  }
>  
> +static inline int kern_xfer(void *dst, void *src, size_t len)
> +{
> + memcpy(dst, src, len);
> + return 0;
> +}
> +
>  /**
>   * vringh_init_kern - initialize a vringh for a kernelspace vring.
>   * @vrh: the vringh to initialize.
> @@ -958,7 +964,7 @@ EXPORT_SYMBOL(vringh_iov_pull_kern);
>  ssize_t vringh_iov_push_kern(struct vringh_kiov *wiov,
>const void *src, size_t len)
>  {
> - return vringh_iov_xfer(wiov, (void *)src, len, xfer_kern);
> + return vringh_iov_xfer(wiov, (void *)src, len, kern_xfer);
>  }
>  EXPORT_SYMBOL(vringh_iov_push_kern);
>  
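
The direction issue is easier to see in isolation.  The sketch below is a
userspace toy that assumes, as the fix implies, that the generic transfer
helper always invokes its callback as xfer(ring memory, caller buffer,
len); toy_iov_xfer(), toy_pull() and toy_push() are made-up names, and
only xfer_kern() and kern_xfer() mirror the quoted patch:

```c
#include <stddef.h>
#include <string.h>

typedef int (*xfer_fn)(void *addr, void *ptr, size_t len);

/* Pull direction: the first argument (ring memory) is the source. */
static int xfer_kern(void *src, void *dst, size_t len)
{
	memcpy(dst, src, len);
	return 0;
}

/* Push direction: the first argument (ring memory) is the destination. */
static int kern_xfer(void *dst, void *src, size_t len)
{
	memcpy(dst, src, len);
	return 0;
}

/*
 * Toy single-segment transfer.  Assumption: the callback is always
 * called with the ring memory first and the caller's buffer second.
 */
static int toy_iov_xfer(void *iov_base, void *ptr, size_t len, xfer_fn xfer)
{
	return xfer(iov_base, ptr, len);
}

/* Read from the ring into dst. */
static int toy_pull(void *iov_base, void *dst, size_t len)
{
	return toy_iov_xfer(iov_base, dst, len, xfer_kern);
}

/* Write src into the ring; this is the path that needed kern_xfer(). */
static int toy_push(void *iov_base, const void *src, size_t len)
{
	return toy_iov_xfer(iov_base, (void *)src, len, kern_xfer);
}
```

If toy_push() were given xfer_kern() instead, the ring contents would
overwrite the caller's source buffer and the ring itself would be left
unchanged, which is the behavior the patch corrects.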



Re: [RFC PATCH 0/2] Mdev: support mutiple kinds of devices

2019-09-17 Thread Alex Williamson
On Wed, 18 Sep 2019 01:54:43 +
"Tian, Kevin"  wrote:

> > From: Alex Williamson
> > Sent: Wednesday, September 18, 2019 1:31 AM
> > 
> > [cc +Parav]
> > 
> > On Thu, 12 Sep 2019 17:40:10 +0800
> > Jason Wang  wrote:
> >   
> > > Hi all:
> > >
> > > > During the development of virtio-mdev[1], I found that mdev needs to be
> > > > extended to support devices other than the vfio mdev device. So this
> > > > series tries to extend mdev to be able to distinguish between different
> > > > devices by:
> > > >
> > > > - device id and matching for mdev bus
> > > > - device specific callbacks and move vfio callbacks there
> > > >
> > > > Sent for early review, compile test only!
> > >
> > > Thanks
> > >
> > > [1] https://lkml.org/lkml/2019/9/10/135  
> > 
> > I expect Parav must have something similar in the works for their
> > in-kernel networking mdev support.  Link to discussion so far:
> > 
> > https://lore.kernel.org/kvm/20190912094012.29653-1-jasow...@redhat.com/T/#t
> >   
> 
> It links to the current thread. Is it intended or do you want
> people to look at another thread driven by Parav? :-)

Sorry, the link was provided for Parav since they weren't cc'd.  Thanks,

Alex


Re: [RFC PATCH 2/4] mdev: introduce helper to set per device dma ops

2019-09-17 Thread Alex Williamson
On Tue, 10 Sep 2019 16:19:33 +0800
Jason Wang  wrote:

> This patch introduces mdev_set_dma_ops(), which allows the parent to set
> per device DMA ops. This helps the kernel driver set up correct
> DMA mappings.
> 
> Signed-off-by: Jason Wang 
> ---
>  drivers/vfio/mdev/mdev_core.c | 7 +++
>  include/linux/mdev.h  | 2 ++
>  2 files changed, 9 insertions(+)
> 
> diff --git a/drivers/vfio/mdev/mdev_core.c b/drivers/vfio/mdev/mdev_core.c
> index b558d4cfd082..eb28552082d7 100644
> --- a/drivers/vfio/mdev/mdev_core.c
> +++ b/drivers/vfio/mdev/mdev_core.c
> @@ -13,6 +13,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include "mdev_private.h"
>  
> @@ -27,6 +28,12 @@ static struct class_compat *mdev_bus_compat_class;
>  static LIST_HEAD(mdev_list);
>  static DEFINE_MUTEX(mdev_list_lock);
>  
> +void mdev_set_dma_ops(struct mdev_device *mdev, struct dma_map_ops *ops)
> +{
> + set_dma_ops(&mdev->dev, ops);
> +}
> +EXPORT_SYMBOL(mdev_set_dma_ops);
> +

Why does mdev need to be involved here?  Your sample driver in 4/4 calls
this from its create callback, where it could just as easily call:

  set_dma_ops(mdev_dev(mdev), ops);

Thanks,
Alex
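
Spelled out with mock types so it builds outside the kernel, the direct
call might look like the following.  The structures are stand-ins for the
real kernel definitions, and sample_create() and sample_dma_ops are
hypothetical names:

```c
#include <stddef.h>

/* Mock types; the real ones live in kernel headers. */
struct dma_map_ops {
	int unused;
};

struct device {
	const struct dma_map_ops *dma_ops;
};

struct mdev_device {
	struct device dev;
};

/* Mock of the generic helper: store the ops on the struct device. */
static void set_dma_ops(struct device *dev, const struct dma_map_ops *ops)
{
	dev->dma_ops = ops;
}

/* Accessor that mdev already exports, mocked here. */
static struct device *mdev_dev(struct mdev_device *mdev)
{
	return &mdev->dev;
}

static const struct dma_map_ops sample_dma_ops = { 0 };

/* Hypothetical parent create callback: no mdev-specific wrapper needed. */
static int sample_create(struct mdev_device *mdev)
{
	set_dma_ops(mdev_dev(mdev), &sample_dma_ops);
	return 0;
}
```

The parent driver reaches the same struct device through the existing
accessor, so the extra exported helper adds no capability in this sketch.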

>  struct device *mdev_parent_dev(struct mdev_device *mdev)
>  {
>   return mdev->parent->dev;
> diff --git a/include/linux/mdev.h b/include/linux/mdev.h
> index 0ce30ca78db0..7195f40bf8bf 100644
> --- a/include/linux/mdev.h
> +++ b/include/linux/mdev.h
> @@ -145,4 +145,6 @@ struct device *mdev_parent_dev(struct mdev_device *mdev);
>  struct device *mdev_dev(struct mdev_device *mdev);
>  struct mdev_device *mdev_from_dev(struct device *dev);
>  
> +void mdev_set_dma_ops(struct mdev_device *mdev, struct dma_map_ops *ops);
> +
>  #endif /* MDEV_H */



Re: [RFC PATCH 0/2] Mdev: support mutiple kinds of devices

2019-09-17 Thread Alex Williamson
[cc +Parav]

On Thu, 12 Sep 2019 17:40:10 +0800
Jason Wang  wrote:

> Hi all:
> 
> During the development of virtio-mdev[1], I found that mdev needs to be
> extended to support devices other than the vfio mdev device. So this
> series tries to extend mdev to be able to distinguish between different
> devices by:
> 
> - device id and matching for mdev bus
> - device specific callbacks and move vfio callbacks there
> 
> Sent for early review, compile test only!
> 
> Thanks
> 
> [1] https://lkml.org/lkml/2019/9/10/135

I expect Parav must have something similar in the works for their
in-kernel networking mdev support.  Link to discussion so far:

https://lore.kernel.org/kvm/20190912094012.29653-1-jasow...@redhat.com/T/#t

Thanks,
Alex


> Jason Wang (2):
>   mdev: device id support
>   mdev: introduce device specific ops
> 
>  drivers/gpu/drm/i915/gvt/kvmgt.c  | 16 ---
>  drivers/s390/cio/vfio_ccw_ops.c   | 16 ---
>  drivers/s390/crypto/vfio_ap_ops.c | 13 --
>  drivers/vfio/mdev/mdev_core.c | 14 +-
>  drivers/vfio/mdev/mdev_driver.c   | 14 ++
>  drivers/vfio/mdev/mdev_private.h  |  1 +
>  drivers/vfio/mdev/vfio_mdev.c | 36 ++-
>  include/linux/mdev.h  | 76 +++
>  include/linux/mod_devicetable.h   |  6 +++
>  samples/vfio-mdev/mbochs.c| 18 +---
>  samples/vfio-mdev/mdpy.c  | 18 +---
>  samples/vfio-mdev/mtty.c  | 16 ---
>  12 files changed, 163 insertions(+), 81 deletions(-)
> 



Re: [RFC v2] vhost: introduce mdev based hardware vhost backend

2019-07-05 Thread Alex Williamson
On Thu, 4 Jul 2019 14:21:34 +0800
Tiwei Bie  wrote:

> On Thu, Jul 04, 2019 at 12:31:48PM +0800, Jason Wang wrote:
> > On 2019/7/3 下午9:08, Tiwei Bie wrote:  
> > > On Wed, Jul 03, 2019 at 08:16:23PM +0800, Jason Wang wrote:  
> > > > On 2019/7/3 下午7:52, Tiwei Bie wrote:  
> > > > > On Wed, Jul 03, 2019 at 06:09:51PM +0800, Jason Wang wrote:  
> > > > > > On 2019/7/3 下午5:13, Tiwei Bie wrote:  
> > > > > > > Details about this can be found here:
> > > > > > > 
> > > > > > > https://lwn.net/Articles/750770/
> > > > > > > 
> > > > > > > What's new in this version
> > > > > > > ==
> > > > > > > 
> > > > > > > A new VFIO device type is introduced - vfio-vhost. This addressed
> > > > > > > some comments from here: https://patchwork.ozlabs.org/cover/984763/
> > > > > > > 
> > > > > > > Below is the updated device interface:
> > > > > > > 
> > > > > > > Currently, there are two regions of this device: 1) CONFIG_REGION
> > > > > > > (VFIO_VHOST_CONFIG_REGION_INDEX), which can be used to setup the
> > > > > > > device; 2) NOTIFY_REGION (VFIO_VHOST_NOTIFY_REGION_INDEX), which
> > > > > > > can be used to notify the device.
> > > > > > > 
> > > > > > > 1. CONFIG_REGION
> > > > > > > 
> > > > > > > The region described by CONFIG_REGION is the main control 
> > > > > > > interface.
> > > > > > > Messages will be written to or read from this region.
> > > > > > > 
> > > > > > > The message type is determined by the `request` field in message
> > > > > > > header. The message size is encoded in the message header too.
> > > > > > > The message format looks like this:
> > > > > > > 
> > > > > > > struct vhost_vfio_op {
> > > > > > >   __u64 request;
> > > > > > >   __u32 flags;
> > > > > > >   /* Flag values: */
> > > > > > > #define VHOST_VFIO_NEED_REPLY 0x1 /* Whether need reply */
> > > > > > >   __u32 size;
> > > > > > >   union {
> > > > > > >   __u64 u64;
> > > > > > >   struct vhost_vring_state state;
> > > > > > >   struct vhost_vring_addr addr;
> > > > > > >   } payload;
> > > > > > > };
> > > > > > > 
> > > > > > > The existing vhost-kernel ioctl cmds are reused as the message
> > > > > > > requests in above structure.  
> > > > > > Still a comments like V1. What's the advantage of inventing a new 
> > > > > > protocol?  
> > > > > I'm trying to make it work in VFIO's way..
> > > > >   
> > > > > > I believe either of the following should be better:
> > > > > > 
> > > > > > - using vhost ioctl, we can start from SET_VRING_KICK/SET_VRING_CALL
> > > > > > and extend it with e.g. a notify region. The advantage is that all
> > > > > > existing userspace programs could be reused without modification (or
> > > > > > with minimal modification). And the vhost API hides lots of details
> > > > > > that the application does not need to understand (e.g. in the case
> > > > > > of a container).  
> > > > > Do you mean reusing vhost's ioctl on VFIO device fd directly,
> > > > > or introducing another mdev driver (i.e. vhost_mdev instead of
> > > > > using the existing vfio_mdev) for mdev device?  
> > > > Can we simply add them into ioctl of mdev_parent_ops?  
> > > Right, either way, these ioctls have to be and just need to be
> > > added in the ioctl of the mdev_parent_ops. But another thing we
> > > also need to consider is that which file descriptor the userspace
> > > will do the ioctl() on. So I'm wondering do you mean let the
> > > userspace do the ioctl() on the VFIO device fd of the mdev
> > > device?
> > >   
> > 
> > Yes.  
> 
> Got it! I'm not sure what Alex's opinion on this is. If we all
> agree with this, I can do it in this way.
> 
> > Is there any other way btw?  
> 
> Just a quick thought.. Maybe totally a bad idea. I was thinking
> whether it would be odd to do non-VFIO's ioctls on VFIO's device
> fd. So I was wondering whether it's possible to allow binding
> another mdev driver (e.g. vhost_mdev) to the supported mdev
> devices. The new mdev driver, vhost_mdev, can provide similar
> ways to let userspace open the mdev device and do the vhost ioctls
> on it. To distinguish with the vfio_mdev compatible mdev devices,
> the device API of the new vhost_mdev compatible mdev devices
> might be e.g. "vhost-net" for net?
> 
> So in VFIO case, the device will be for passthru directly. And
> in VHOST case, the device can be used to accelerate the existing
> virtualized devices.
> 
> What do you think?

VFIO really can't prevent vendor specific ioctls on the device file
descriptor for mdevs, but a) we'd want to be sure the ioctl address
space can't collide with ioctls we'd use for vfio defined purposes and
b) maybe the VFIO user API isn't what you want in the first place if
you intend to mostly/entirely ignore the defined ioctl set and replace
them with your own.  In the case of the latter, you're also not getting
the advantages of the existing VFIO userspace code, so why expose a
VFIO device at all?
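
As a rough illustration of point a), the sketch below checks whether a vendor ioctl number lands in the range VFIO uses. It assumes the generic Linux ioctl layout (command number in bits 0-7, type byte in bits 8-15) and the values from the UAPI headers (VFIO_TYPE ';', VFIO_BASE 100, VHOST_VIRTIO 0xAF). The helper names make_io() and may_collide_with_vfio() are made up for this example, and make_io() only models a no-argument _IO() on architectures that use the generic encoding.

```c
#include <assert.h>
#include <stdint.h>

/* Generic ioctl number layout: bits 0-7 = nr, bits 8-15 = type (magic). */
#define IOC_NR(cmd)   ((uint32_t)(cmd) & 0xffu)
#define IOC_TYPE(cmd) (((uint32_t)(cmd) >> 8) & 0xffu)

/* Values as found in the VFIO and vhost UAPI headers. */
#define VFIO_TYPE_MAGIC ';'
#define VFIO_BASE_NR    100
#define VHOST_MAGIC     0xAF

/* Illustrative stand-in for _IO(type, nr) with the generic encoding. */
static uint32_t make_io(uint8_t type, uint8_t nr)
{
    return ((uint32_t)type << 8) | nr;
}

/*
 * Hypothetical policy check: a vendor ioctl that uses the VFIO magic
 * and a number at or above VFIO_BASE could be mistaken for, or later
 * collide with, a vfio-defined ioctl.
 */
static int may_collide_with_vfio(uint32_t cmd)
{
    assert(cmd <= 0xffffu); /* this sketch only models no-argument ioctls */
    return IOC_TYPE(cmd) == VFIO_TYPE_MAGIC && IOC_NR(cmd) >= VFIO_BASE_NR;
}
```

Under these assumptions a vhost-style number built with the 0xAF magic stays out of the VFIO range, which covers concern a) but says nothing about concern b).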

The mdev interface does provide a general 

Re: [RFC v2] vhost: introduce mdev based hardware vhost backend

2019-07-03 Thread Alex Williamson
On Wed,  3 Jul 2019 17:13:39 +0800
Tiwei Bie  wrote:
> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> index 8f10748dac79..6c5718ab7eeb 100644
> --- a/include/uapi/linux/vfio.h
> +++ b/include/uapi/linux/vfio.h
> @@ -201,6 +201,7 @@ struct vfio_device_info {
>  #define VFIO_DEVICE_FLAGS_AMBA  (1 << 3) /* vfio-amba device */
>  #define VFIO_DEVICE_FLAGS_CCW   (1 << 4) /* vfio-ccw device */
>  #define VFIO_DEVICE_FLAGS_AP    (1 << 5) /* vfio-ap device */
> +#define VFIO_DEVICE_FLAGS_VHOST (1 << 6) /* vfio-vhost device */
>   __u32   num_regions; /* Max region index + 1 */
>   __u32   num_irqs;   /* Max IRQ index + 1 */
>  };
> @@ -217,6 +218,7 @@ struct vfio_device_info {
>  #define VFIO_DEVICE_API_AMBA_STRING  "vfio-amba"
>  #define VFIO_DEVICE_API_CCW_STRING   "vfio-ccw"
>  #define VFIO_DEVICE_API_AP_STRING    "vfio-ap"
> +#define VFIO_DEVICE_API_VHOST_STRING "vfio-vhost"
>  
>  /**
>   * VFIO_DEVICE_GET_REGION_INFO - _IOWR(VFIO_TYPE, VFIO_BASE + 8,
> @@ -573,6 +575,23 @@ enum {
>   VFIO_CCW_NUM_IRQS
>  };
>  
> +/*
> + * The vfio-vhost bus driver makes use of the following fixed region and
> + * IRQ index mapping. Unimplemented regions return a size of zero.
> + * Unimplemented IRQ types return a count of zero.
> + */
> +
> +enum {
> + VFIO_VHOST_CONFIG_REGION_INDEX,
> + VFIO_VHOST_NOTIFY_REGION_INDEX,
> + VFIO_VHOST_NUM_REGIONS
> +};
> +
> +enum {
> + VFIO_VHOST_VQ_IRQ_INDEX,
> + VFIO_VHOST_NUM_IRQS
> +};
> +

Note that the vfio API has evolved a bit since vfio-pci started this
way, with fixed indexes for pre-defined region types.  We now support
device specific regions which can be identified by a capability within
the REGION_INFO ioctl return data.  This allows a bit more flexibility,
at the cost of complexity, but the infrastructure already exists in
kernel and QEMU to make it relatively easy.  I think we'll have the
same support for interrupts soon too.  If you continue to pursue the
vfio-vhost direction you might want to consider these before committing
to fixed indexes.  Thanks,

Alex
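
For readers unfamiliar with the capability mechanism mentioned above, here is a minimal userspace sketch of walking a capability chain in a REGION_INFO-style buffer. The struct is a local model of the vfio capability header (id, version, next offset measured from the start of the buffer, 0 terminating the chain); find_cap() and its bounds checks are assumptions made for this example, not code from the kernel or QEMU.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local model of a capability header placed inside the info buffer. */
struct cap_header {
    uint16_t id;
    uint16_t version;
    uint32_t next; /* offset of next capability from buffer start, 0 = end */
};

/*
 * Walk the chain starting at first_off. Returns the offset of the first
 * capability with the wanted id, or 0 if it is absent or the chain is
 * malformed (out of bounds or looping).
 */
static uint32_t find_cap(const uint8_t *buf, size_t len,
                         uint32_t first_off, uint16_t id)
{
    uint32_t off = first_off;
    size_t hops = 0;

    assert(buf != NULL);

    while (off != 0 && hops++ < len) {
        struct cap_header hdr;

        if (off > len || len - off < sizeof(hdr))
            return 0; /* header would run past the buffer */

        memcpy(&hdr, buf + off, sizeof(hdr)); /* avoid unaligned access */
        if (hdr.id == id)
            return off;
        off = hdr.next;
    }
    return 0;
}
```

A user of device specific regions would look up the type capability this way and then match on the type and subtype it carries, instead of relying on a fixed region index.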


Re: [PATCH 4/4] vfio: Allow type-1 IOMMU instantiation with a virtio-iommu

2018-02-14 Thread Alex Williamson
On Wed, 14 Feb 2018 14:53:40 +
Jean-Philippe Brucker  wrote:

> When enabling both VFIO and VIRTIO_IOMMU modules, automatically select
> VFIO_IOMMU_TYPE1 as well.
> 
> Signed-off-by: Jean-Philippe Brucker 
> ---
>  drivers/vfio/Kconfig | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig
> index c84333eb5eb5..65a1e691110c 100644
> --- a/drivers/vfio/Kconfig
> +++ b/drivers/vfio/Kconfig
> @@ -21,7 +21,7 @@ config VFIO_VIRQFD
>  menuconfig VFIO
>   tristate "VFIO Non-Privileged userspace driver framework"
>   depends on IOMMU_API
> - select VFIO_IOMMU_TYPE1 if (X86 || S390 || ARM_SMMU || ARM_SMMU_V3)
> + select VFIO_IOMMU_TYPE1 if (X86 || S390 || ARM_SMMU || ARM_SMMU_V3 || VIRTIO_IOMMU)
>   select ANON_INODES
>   help
> VFIO provides a framework for secure userspace device drivers.

Why are we basing this on specific IOMMU drivers in the first place?
Only ARM is doing that.  Shouldn't IOMMU_API only be enabled for ARM
targets that support it and therefore we can forget about the specific
IOMMU drivers?  Thanks,

Alex


Re: [PATCH 1/3] Fix ERROR: trailing statements should be on next line

2017-05-14 Thread Alex Williamson
On Mon, 15 May 2017 05:58:05 +0300
"Michael S. Tsirkin"  wrote:

> On Sun, May 14, 2017 at 07:51:28PM +0200, Maciek Fijalkowski wrote:
> > From: Maciej Fijalkowski 
> > 
> > Signed-off-by: Maciej Fijalkowski   
> 
> I prefer the original form - ; isn't a full statement.
> 
> > ---
> >  drivers/net/virtio_net.c | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > index 9320d96..f20dfb8 100644
> > --- a/drivers/net/virtio_net.c
> > +++ b/drivers/net/virtio_net.c
> > @@ -217,7 +217,8 @@ static void give_pages(struct receive_queue *rq, struct page *page)
> > struct page *end;
> >  
> > /* Find end of list, sew whole thing into vi->rq.pages. */
> > -   for (end = page; end->private; end = (struct page *)end->private);
> > +   for (end = page; end->private; end = (struct page *)end->private)
> > +   ;

FWIW, I generally like to put a comment on the next line to make it
abundantly clear that there's nothing in the body of the loop.  It's
also more aesthetically pleasing than a semicolon on a line by
itself, e.g. /* Nothing */;  It's just too easy to misinterpret the
loop otherwise, especially without gratuitous white space.  Thanks,

Alex
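
A small standalone sketch of the style suggested above, using a made-up struct node and list_tail() helper rather than the virtio_net code: the loop does all of its work in the for header, and the comment makes the empty body explicit.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical singly linked list node, for illustration only. */
struct node {
    struct node *next;
};

/* Return the last node of a non-empty list. */
static struct node *list_tail(struct node *head)
{
    struct node *end;

    assert(head != NULL);

    /* Find end of list; the loop header does all the work. */
    for (end = head; end->next; end = end->next)
        /* Nothing */;

    return end;
}
```

With the comment in place, a reader cannot mistake the statement that follows the loop for its body.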


> > end->private = (unsigned long)rq->pages;
> > rq->pages = page;
> >  }
> > -- 
> > 2.4.11  


Re: Need information on type 2 IOMMU

2017-04-09 Thread Alex Williamson
On Mon, 10 Apr 2017 08:00:45 +0530
valmiki  wrote:

> Hi All,
> 
> We have drivers/vfio/vfio_iommu_type1.c. What is a type1 IOMMU? Is the
> name specific to the vfio layer?
> 
> Is there a type 2 IOMMU w.r.t. vfio? If so, what is it?

type1 is the 1st type.  It's an arbitrary name.  There is no type2, yet.


Re: [PATCH v2 2/2] vfio: add virtio pci quirk

2016-08-30 Thread Alex Williamson
On Tue, 30 Aug 2016 08:20:38 +0300
"Michael S. Tsirkin" <m...@redhat.com> wrote:

> On Mon, Aug 29, 2016 at 10:53:04PM -0600, Alex Williamson wrote:
> > On Mon, 29 Aug 2016 21:52:20 -0600
> > Alex Williamson <alex.william...@redhat.com> wrote:
> >   
> > > On Mon, 29 Aug 2016 21:23:25 -0600
> > > Alex Williamson <alex.william...@redhat.com> wrote:
> > >   
> > > > On Tue, 30 Aug 2016 05:27:17 +0300
> > > > "Michael S. Tsirkin" <m...@redhat.com> wrote:
> > > > 
> > > > > Modern virtio pci devices can set VIRTIO_F_IOMMU_PLATFORM
> > > > > to signal they are safe to use with an IOMMU.
> > > > > 
> > > > > Without this bit, exposing the device to userspace is unsafe, so probe
> > > > > and fail VFIO initialization unless noiommu is enabled.
> > > > > 
> > > > > Signed-off-by: Michael S. Tsirkin <m...@redhat.com>
> > > > > ---
> > > > >  drivers/vfio/pci/vfio_pci_private.h |   1 +
> > > > >  drivers/vfio/pci/vfio_pci.c |  14 
> > > > >  drivers/vfio/pci/vfio_pci_virtio.c  | 140 
> > > > > 
> > > > >  drivers/vfio/pci/Makefile   |   1 +
> > > > >  4 files changed, 156 insertions(+)
> > > > >  create mode 100644 drivers/vfio/pci/vfio_pci_virtio.c
> > > > > 
> > > > > diff --git a/drivers/vfio/pci/vfio_pci_private.h 
> > > > > b/drivers/vfio/pci/vfio_pci_private.h
> > > > > index 2128de8..2bd5616 100644
> > > > > --- a/drivers/vfio/pci/vfio_pci_private.h
> > > > > +++ b/drivers/vfio/pci/vfio_pci_private.h
> > > > > @@ -139,4 +139,5 @@ static inline int vfio_pci_igd_init(struct 
> > > > > vfio_pci_device *vdev)
> > > > >   return -ENODEV;
> > > > >  }
> > > > >  #endif
> > > > > +extern int vfio_pci_virtio_quirk(struct vfio_pci_device *vdev, bool 
> > > > > noiommu);
> > > > >  #endif /* VFIO_PCI_PRIVATE_H */
> > > > > diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
> > > > > index d624a52..e93bf0c 100644
> > > > > --- a/drivers/vfio/pci/vfio_pci.c
> > > > > +++ b/drivers/vfio/pci/vfio_pci.c
> > > > > @@ -1236,6 +1236,20 @@ static int vfio_pci_probe(struct pci_dev 
> > > > > *pdev, const struct pci_device_id *id)
> > > > >   return ret;
> > > > >   }
> > > > >  
> > > > > + if (pdev->vendor == PCI_VENDOR_ID_REDHAT_QUMRANET) {  
> > > > 
> > > > Perhaps a vfio_pci_is_virtio() like vga below?  Let's test the device
> > > > ID range initially as well, this test raised a big red flag for me
> > > > whether all devices within this vendor ID were virtio.
> > > > 
> > > > > + bool noiommu = vfio_is_noiommu_group_dev(>dev);   
> > > > >
> > > > 
> > > > I think you can use iommu_present() for this and avoid patch 1of2.
> > > > noiommu is mutually exclusive to an iommu being present.  Seems like
> > > > all of this logic should be in the quirk itself, I'm not sure what it
> > > > buys to get the value here but wait until later to use it.  Using
> > > > iommu_present() could also move this test much earlier in
> > > > vfio_pci_probe() making the exit path easier.
> > > 
> > > Except then I'm reintroducing the bug fixed by 16ab8a5cbea4 since
> > > iommu_present() assumes an IOMMU API based device.  I'll try to think if
> > > there's another way to avoid adding the is_noiommu function.  Thanks,  
> > 
> > I think something like this would do it.
> > 
> > --- a/drivers/vfio/pci/vfio_pci.c
> > +++ b/drivers/vfio/pci/vfio_pci.c
> > @@ -1214,6 +1214,22 @@ static int vfio_pci_probe(struct pci_dev *pdev, 
> > const str
> > if (pdev->hdr_type != PCI_HEADER_TYPE_NORMAL)
> > return -EINVAL;
> >  
> > +   /*
> > +* Filter out virtio devices that do not honor the iommu,
> > +* but only for real iommu groups.
> > +*/
> > +   if (vfio_pci_is_virtio(pdev)) {
> > +   struct iommu_group *tmp = iommu_group_get(>dev);
> > +
> > +   if (tmp) {
> > +   iommu_group_put(tmp);

Re: [PATCH v2 2/2] vfio: add virtio pci quirk

2016-08-29 Thread Alex Williamson
On Mon, 29 Aug 2016 21:52:20 -0600
Alex Williamson <alex.william...@redhat.com> wrote:

> On Mon, 29 Aug 2016 21:23:25 -0600
> Alex Williamson <alex.william...@redhat.com> wrote:
> 
> > On Tue, 30 Aug 2016 05:27:17 +0300
> > "Michael S. Tsirkin" <m...@redhat.com> wrote:
> >   
> > > Modern virtio pci devices can set VIRTIO_F_IOMMU_PLATFORM
> > > to signal they are safe to use with an IOMMU.
> > > 
> > > Without this bit, exposing the device to userspace is unsafe, so probe
> > > and fail VFIO initialization unless noiommu is enabled.
> > > 
> > > Signed-off-by: Michael S. Tsirkin <m...@redhat.com>
> > > ---
> > >  drivers/vfio/pci/vfio_pci_private.h |   1 +
> > >  drivers/vfio/pci/vfio_pci.c |  14 
> > >  drivers/vfio/pci/vfio_pci_virtio.c  | 140 
> > > 
> > >  drivers/vfio/pci/Makefile   |   1 +
> > >  4 files changed, 156 insertions(+)
> > >  create mode 100644 drivers/vfio/pci/vfio_pci_virtio.c
> > > 
> > > diff --git a/drivers/vfio/pci/vfio_pci_private.h 
> > > b/drivers/vfio/pci/vfio_pci_private.h
> > > index 2128de8..2bd5616 100644
> > > --- a/drivers/vfio/pci/vfio_pci_private.h
> > > +++ b/drivers/vfio/pci/vfio_pci_private.h
> > > @@ -139,4 +139,5 @@ static inline int vfio_pci_igd_init(struct 
> > > vfio_pci_device *vdev)
> > >   return -ENODEV;
> > >  }
> > >  #endif
> > > +extern int vfio_pci_virtio_quirk(struct vfio_pci_device *vdev, bool 
> > > noiommu);
> > >  #endif /* VFIO_PCI_PRIVATE_H */
> > > diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
> > > index d624a52..e93bf0c 100644
> > > --- a/drivers/vfio/pci/vfio_pci.c
> > > +++ b/drivers/vfio/pci/vfio_pci.c
> > > @@ -1236,6 +1236,20 @@ static int vfio_pci_probe(struct pci_dev *pdev, 
> > > const struct pci_device_id *id)
> > >   return ret;
> > >   }
> > >  
> > > + if (pdev->vendor == PCI_VENDOR_ID_REDHAT_QUMRANET) {
> > 
> > Perhaps a vfio_pci_is_virtio() like vga below?  Let's test the device
> > ID range initially as well, this test raised a big red flag for me
> > whether all devices within this vendor ID were virtio.
> >   
> > > + bool noiommu = vfio_is_noiommu_group_dev(>dev);
> > 
> > I think you can use iommu_present() for this and avoid patch 1of2.
> > noiommu is mutually exclusive to an iommu being present.  Seems like
> > all of this logic should be in the quirk itself, I'm not sure what it
> > buys to get the value here but wait until later to use it.  Using
> > iommu_present() could also move this test much earlier in
> > vfio_pci_probe() making the exit path easier.  
> 
> Except then I'm reintroducing the bug fixed by 16ab8a5cbea4 since
> iommu_present() assumes an IOMMU API based device.  I'll try to think if
> there's another way to avoid adding the is_noiommu function.  Thanks,

I think something like this would do it.

--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -1214,6 +1214,22 @@ static int vfio_pci_probe(struct pci_dev *pdev, const str
 	if (pdev->hdr_type != PCI_HEADER_TYPE_NORMAL)
 		return -EINVAL;
 
+	/*
+	 * Filter out virtio devices that do not honor the iommu,
+	 * but only for real iommu groups.
+	 */
+	if (vfio_pci_is_virtio(pdev)) {
+		struct iommu_group *tmp = iommu_group_get(&pdev->dev);
+
+		if (tmp) {
+			iommu_group_put(tmp);
+
+			ret = vfio_pci_virtio_quirk(pdev);
+			if (ret)
+				return ret;
+		}
+	}
+
 	group = vfio_iommu_group_get(&pdev->dev);
 	if (!group)
 		return -EINVAL;

Thanks,
Alex
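
The control flow of the proposal above can be modeled in userspace as follows. This is only a sketch: struct pci_dev_model, group_get(), group_put() and virtio_quirk() are stand-ins invented here, and the model assumes, as the patch does, that a device without a real IOMMU group yields NULL from the group lookup at this point in probe.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Stand-ins for the kernel objects; only what the flow needs. */
struct iommu_group { int refcount; };
struct pci_dev_model {
    struct iommu_group *group; /* NULL models "no real iommu group" */
    int is_virtio;
    int honors_iommu;          /* models VIRTIO_F_IOMMU_PLATFORM being set */
};

static struct iommu_group *group_get(struct pci_dev_model *pdev)
{
    if (pdev->group)
        pdev->group->refcount++;
    return pdev->group;
}

static void group_put(struct iommu_group *group)
{
    assert(group->refcount > 0);
    group->refcount--;
}

/* Stand-in for the quirk: reject devices that bypass the IOMMU. */
static int virtio_quirk(const struct pci_dev_model *pdev)
{
    return pdev->honors_iommu ? 0 : -EINVAL;
}

/* Mirrors the hunk: only filter virtio devices in real iommu groups. */
static int probe_filter(struct pci_dev_model *pdev)
{
    if (pdev->is_virtio) {
        struct iommu_group *tmp = group_get(pdev);

        if (tmp) {
            group_put(tmp); /* only needed to know a group exists */
            return virtio_quirk(pdev);
        }
    }
    return 0;
}
```

The temporary reference is dropped immediately because the lookup is used purely as an existence test, which is what lets the check avoid a separate is_noiommu helper.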

> > > +
> > > + ret = vfio_pci_virtio_quirk(vdev, noiommu);
> > > + if (ret) {
> > > + dev_warn(>pdev->dev,
> > > +  "Failed to setup Virtio for VFIO\n");
> > > + vfio_del_group_dev(>dev);
> > > + vfio_iommu_group_put(group, >dev);
> > > + kfree(vdev);
> > > + return ret;
> > > + }
> > > + }
> > > +
> > >   if (vfio_pci_is_vga(pdev)) {
> > >   vga_client_register(pdev, vdev, NULL, vfio_pci_set_vga_decode);
> > >   vga_set_legacy_decoding(pdev,
> > > diff --git a/drivers/vfio/pci/

Re: [PATCH v2 2/2] vfio: add virtio pci quirk

2016-08-29 Thread Alex Williamson
On Mon, 29 Aug 2016 21:23:25 -0600
Alex Williamson <alex.william...@redhat.com> wrote:

> On Tue, 30 Aug 2016 05:27:17 +0300
> "Michael S. Tsirkin" <m...@redhat.com> wrote:
> 
> > Modern virtio pci devices can set VIRTIO_F_IOMMU_PLATFORM
> > to signal they are safe to use with an IOMMU.
> > 
> > Without this bit, exposing the device to userspace is unsafe, so probe
> > and fail VFIO initialization unless noiommu is enabled.
> > 
> > Signed-off-by: Michael S. Tsirkin <m...@redhat.com>
> > ---
> >  drivers/vfio/pci/vfio_pci_private.h |   1 +
> >  drivers/vfio/pci/vfio_pci.c |  14 
> >  drivers/vfio/pci/vfio_pci_virtio.c  | 140 
> > 
> >  drivers/vfio/pci/Makefile   |   1 +
> >  4 files changed, 156 insertions(+)
> >  create mode 100644 drivers/vfio/pci/vfio_pci_virtio.c
> > 
> > diff --git a/drivers/vfio/pci/vfio_pci_private.h 
> > b/drivers/vfio/pci/vfio_pci_private.h
> > index 2128de8..2bd5616 100644
> > --- a/drivers/vfio/pci/vfio_pci_private.h
> > +++ b/drivers/vfio/pci/vfio_pci_private.h
> > @@ -139,4 +139,5 @@ static inline int vfio_pci_igd_init(struct 
> > vfio_pci_device *vdev)
> > return -ENODEV;
> >  }
> >  #endif
> > +extern int vfio_pci_virtio_quirk(struct vfio_pci_device *vdev, bool 
> > noiommu);
> >  #endif /* VFIO_PCI_PRIVATE_H */
> > diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
> > index d624a52..e93bf0c 100644
> > --- a/drivers/vfio/pci/vfio_pci.c
> > +++ b/drivers/vfio/pci/vfio_pci.c
> > @@ -1236,6 +1236,20 @@ static int vfio_pci_probe(struct pci_dev *pdev, 
> > const struct pci_device_id *id)
> > return ret;
> > }
> >  
> > +   if (pdev->vendor == PCI_VENDOR_ID_REDHAT_QUMRANET) {  
> 
> Perhaps a vfio_pci_is_virtio() like vga below?  Let's test the device
> ID range initially as well, this test raised a big red flag for me
> whether all devices within this vendor ID were virtio.
> 
> > +   bool noiommu = vfio_is_noiommu_group_dev(>dev);  
> 
> I think you can use iommu_present() for this and avoid patch 1of2.
> noiommu is mutually exclusive to an iommu being present.  Seems like
> all of this logic should be in the quirk itself, I'm not sure what it
> buys to get the value here but wait until later to use it.  Using
> iommu_present() could also move this test much earlier in
> vfio_pci_probe() making the exit path easier.

Except then I'm reintroducing the bug fixed by 16ab8a5cbea4 since
iommu_present() assumes an IOMMU API based device.  I'll try to think if
there's another way to avoid adding the is_noiommu function.  Thanks,

Alex

> 
> > +
> > +   ret = vfio_pci_virtio_quirk(vdev, noiommu);
> > +   if (ret) {
> > +   dev_warn(>pdev->dev,
> > +"Failed to setup Virtio for VFIO\n");
> > +   vfio_del_group_dev(>dev);
> > +   vfio_iommu_group_put(group, >dev);
> > +   kfree(vdev);
> > +   return ret;
> > +   }
> > +   }
> > +
> > if (vfio_pci_is_vga(pdev)) {
> > vga_client_register(pdev, vdev, NULL, vfio_pci_set_vga_decode);
> > vga_set_legacy_decoding(pdev,
> > diff --git a/drivers/vfio/pci/vfio_pci_virtio.c 
> > b/drivers/vfio/pci/vfio_pci_virtio.c
> > new file mode 100644
> > index 000..e1ecffd
> > --- /dev/null
> > +++ b/drivers/vfio/pci/vfio_pci_virtio.c
> > @@ -0,0 +1,140 @@
> > +/*
> > + * VFIO PCI Intel Graphics support  
>   ^^^
> > + *
> > + * Copyright (C) 2016 Red Hat, Inc.  All rights reserved.
> > + * Author: Alex Williamson <alex.william...@redhat.com>  
> 
> Update.
> 
> > + *
> > + * This program is free software; you can redistribute it and/or modify
> > + * it under the terms of the GNU General Public License version 2 as
> > + * published by the Free Software Foundation.
> > + *
> > + * Register a device specific region through which to provide read-only
> > + * access to the Intel IGD opregion.  The register defining the opregion
> > + * address is also virtualized to prevent user modification.
> > + */
> > +
> > +#include 
> > +#include 
> > +#include   
> 
> Are io.h and uaccess.h needed?
> 
> > +#include 
> > +#include 
> > +#include 
> > +
> > +#include "vfio_pci_private.h"
> > +
>

Re: [PATCH v2 2/2] vfio: add virtio pci quirk

2016-08-29 Thread Alex Williamson
On Tue, 30 Aug 2016 05:27:17 +0300
"Michael S. Tsirkin" <m...@redhat.com> wrote:

> Modern virtio pci devices can set VIRTIO_F_IOMMU_PLATFORM
> to signal they are safe to use with an IOMMU.
> 
> Without this bit, exposing the device to userspace is unsafe, so probe
> and fail VFIO initialization unless noiommu is enabled.
> 
> Signed-off-by: Michael S. Tsirkin <m...@redhat.com>
> ---
>  drivers/vfio/pci/vfio_pci_private.h |   1 +
>  drivers/vfio/pci/vfio_pci.c |  14 
>  drivers/vfio/pci/vfio_pci_virtio.c  | 140 
> 
>  drivers/vfio/pci/Makefile   |   1 +
>  4 files changed, 156 insertions(+)
>  create mode 100644 drivers/vfio/pci/vfio_pci_virtio.c
> 
> diff --git a/drivers/vfio/pci/vfio_pci_private.h 
> b/drivers/vfio/pci/vfio_pci_private.h
> index 2128de8..2bd5616 100644
> --- a/drivers/vfio/pci/vfio_pci_private.h
> +++ b/drivers/vfio/pci/vfio_pci_private.h
> @@ -139,4 +139,5 @@ static inline int vfio_pci_igd_init(struct 
> vfio_pci_device *vdev)
>   return -ENODEV;
>  }
>  #endif
> +extern int vfio_pci_virtio_quirk(struct vfio_pci_device *vdev, bool noiommu);
>  #endif /* VFIO_PCI_PRIVATE_H */
> diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
> index d624a52..e93bf0c 100644
> --- a/drivers/vfio/pci/vfio_pci.c
> +++ b/drivers/vfio/pci/vfio_pci.c
> @@ -1236,6 +1236,20 @@ static int vfio_pci_probe(struct pci_dev *pdev, const 
> struct pci_device_id *id)
>   return ret;
>   }
>  
> + if (pdev->vendor == PCI_VENDOR_ID_REDHAT_QUMRANET) {

Perhaps a vfio_pci_is_virtio() like vga below?  Let's test the device
ID range initially as well; this test raised a big red flag for me as
to whether all devices within this vendor ID are virtio.
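
A minimal sketch of such a check, assuming the ID range given in the virtio specification (vendor 0x1af4, device IDs 0x1000 through 0x107f inclusive). The function name and the plain integer arguments are chosen for this standalone example; a kernel helper would take a struct pci_dev instead.

```c
#include <assert.h>
#include <stdint.h>

#define VIRTIO_PCI_VENDOR_ID  0x1af4 /* PCI_VENDOR_ID_REDHAT_QUMRANET */
#define VIRTIO_PCI_DEVICE_MIN 0x1000
#define VIRTIO_PCI_DEVICE_MAX 0x107f

static_assert(VIRTIO_PCI_DEVICE_MIN <= VIRTIO_PCI_DEVICE_MAX,
              "device ID range must not be empty");

/* True only for IDs the virtio spec reserves for virtio PCI devices. */
static int is_virtio_pci_id(uint16_t vendor, uint16_t device)
{
    return vendor == VIRTIO_PCI_VENDOR_ID &&
           device >= VIRTIO_PCI_DEVICE_MIN &&
           device <= VIRTIO_PCI_DEVICE_MAX;
}
```

Matching on the vendor ID alone would also catch non-virtio devices that share it, for example ivshmem at 1af4:1110, which is the red flag raised above.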

> + bool noiommu = vfio_is_noiommu_group_dev(>dev);

I think you can use iommu_present() for this and avoid patch 1 of 2.
noiommu is mutually exclusive to an iommu being present.  Seems like
all of this logic should be in the quirk itself; I'm not sure what it
buys us to get the value here but wait until later to use it.  Using
iommu_present() could also move this test much earlier in
vfio_pci_probe(), making the exit path easier.

> +
> + ret = vfio_pci_virtio_quirk(vdev, noiommu);
> + if (ret) {
> + dev_warn(>pdev->dev,
> +  "Failed to setup Virtio for VFIO\n");
> + vfio_del_group_dev(>dev);
> + vfio_iommu_group_put(group, >dev);
> + kfree(vdev);
> + return ret;
> + }
> + }
> +
>   if (vfio_pci_is_vga(pdev)) {
>   vga_client_register(pdev, vdev, NULL, vfio_pci_set_vga_decode);
>   vga_set_legacy_decoding(pdev,
> diff --git a/drivers/vfio/pci/vfio_pci_virtio.c 
> b/drivers/vfio/pci/vfio_pci_virtio.c
> new file mode 100644
> index 000..e1ecffd
> --- /dev/null
> +++ b/drivers/vfio/pci/vfio_pci_virtio.c
> @@ -0,0 +1,140 @@
> +/*
> + * VFIO PCI Intel Graphics support
  ^^^
> + *
> + * Copyright (C) 2016 Red Hat, Inc.  All rights reserved.
> + *   Author: Alex Williamson <alex.william...@redhat.com>

Update.

> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + *
> + * Register a device specific region through which to provide read-only
> + * access to the Intel IGD opregion.  The register defining the opregion
> + * address is also virtualized to prevent user modification.
> + */
> +
> +#include 
> +#include 
> +#include 

Are io.h and uaccess.h needed?

> +#include 
> +#include 
> +#include 
> +
> +#include "vfio_pci_private.h"
> +
> +/**
> + * virtio_pci_find_capability - walk capabilities to find device info.
> + * @dev: the pci device
> + * @cfg_type: the VIRTIO_PCI_CAP_* value we seek
> + *
> + * Returns offset of the capability, or 0.
> + */
> +static inline int virtio_pci_find_capability(struct pci_dev *dev, u8 
> cfg_type)

Does inlining this really make sense?

> +{
> + int pos;
> +
> + for (pos = pci_find_capability(dev, PCI_CAP_ID_VNDR);
> +  pos > 0;
> +  pos = pci_find_next_capability(dev, pos, PCI_CAP_ID_VNDR)) {
> + u8 type;
> + pci_read_config_byte(dev, pos + offsetof(struct virtio_pci_cap,
> +  cfg_type),
> +  );
> +
> + if (type != cfg_ty

Re: [PATCH RFC] fixup! virtio: convert to use DMA api

2016-04-19 Thread Alex Williamson
On Tue, 19 Apr 2016 12:13:29 +0300
"Michael S. Tsirkin"  wrote:

> On Mon, Apr 18, 2016 at 02:29:33PM -0400, David Woodhouse wrote:
> > On Mon, 2016-04-18 at 19:27 +0300, Michael S. Tsirkin wrote:  
> > > I balk at adding more hacks to a broken system. My goals are
> > > merely to
> > > - make things work correctly with an IOMMU and new guests,
> > >   so people can use userspace drivers with virtio devices
> > > - prevent security risks when guest kernel mistakenly thinks
> > >   it's protected by an IOMMU, but in fact isn't
> > > - avoid breaking any working configurations  
> > 
> > AFAICT the VIRTIO_F_IOMMU_PASSTHROUGH thing seems orthogonal to this.
> > That's just an optimisation, for telling an OS "you don't really need
> > to bother with the IOMMU, even though it works".
> > 
> > There are two main reasons why an operating system might want to use
> > the IOMMU via the DMA API for native drivers: 
> >  - To protect against driver bugs triggering rogue DMA.
> >  - To protect against hardware (or firmware) bugs.
> > 
> > With virtio, the first reason still exists. But the second is moot
> > because the device is part of the hypervisor and if the hypervisor is
> > untrustworthy then you're screwed anyway... but then again, in SoC
> > devices you could replace 'hypervisor' with 'chip' and the same is
> > true, isn't it? Is there *really* anything virtio-specific here?
> >
> > Sure, I want my *external* network device on a PCIe card with software-
> > loadable firmware to be behind an IOMMU because I don't trust it as far
> > as I can throw it. But for on-SoC devices surely the situation is
> > *just* the same as devices provided by a hypervisor?  
> 
> Depends on how SoC is designed I guess.  At the moment specifically QEMU
> runs everything in a single memory space so an IOMMU table lookup does
> not offer any extra protection. That's not a must, one could come
> up with modular hypervisor designs - it's just what we have ATM.
> 
> 
> > And some people want that external network device to use passthrough
> > anyway, for performance reasons.  
> 
> That's a policy decision though.
> 
> > On the whole, there are *plenty* of reasons why we might want to have a
> > passthrough mapping on a per-device basis,  
> 
> That's true. And driver security also might differ, for example maybe I
> trust a distro-supplied driver more than an out of tree one.  Or maybe I
> trust a distro-supplied userspace driver more than a closed-source one.
> And maybe I trust devices from same vendor as my chip more than a 3rd
> party one.  So one can generalize this even further, think about device
> and driver security/trust level as an integer and platform protection as an
> integer.
> 
> If platform IOMMU offers you extra protection over trusting the device
> (trust < protection) it improves your security to use the platform to
> limit the device. If trust >= protection it just adds overhead without
> increasing the security.
> 
> > and I really struggle to
> > find justification for having this 'hint' in a virtio-specific way.  
> 
> It's a way. No system seems to expose this information in a more generic
> way at the moment, and it's portable. Would you like to push for some
> kind of standardization of such a hint? I would be interested
> to hear about that.
> 
> 
> > And it's complicating the discussion of the *actual* fix we're looking
> > at.  
> 
> I guess you are right in that we should split this part out.
> What I wanted is really the combination
> PASSTHROUGH && !PLATFORM so that we can say "ok we don't
> need to guess, this device actually bypasses the IOMMU".
> 
> And I thought it's a nice idea to use PASSTHROUGH && PLATFORM
> as a hint since it seemed to be unused.
> But maybe the best thing to do for now is to say
> - hosts should not set PASSTHROUGH && PLATFORM
> - guests should ignore PASSTHROUGH if PLATFORM is set
> 
> and then we can come back to this optimization idea later
> if it's appropriate.
> 
> So yes I think we need the two bits but no we don't need to
> mix the hint discussion in here.
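
For reference, the two rules proposed above (hosts should not set both bits, guests ignore PASSTHROUGH when PLATFORM is set) could be sketched as below. This is an illustration only: VIRTIO_F_IOMMU_PLATFORM is bit 33 in Linux, but the PASSTHROUGH bit number, the enum, and both helper names are assumptions made up for this sketch.

```c
#include <stdbool.h>
#include <stdint.h>

#define VIRTIO_F_IOMMU_PLATFORM    33 /* feature bit number used by Linux */
#define VIRTIO_F_IOMMU_PASSTHROUGH 38 /* hypothetical: bit number assumed for the proposed flag */

enum dma_policy {
    DMA_POLICY_LEGACY,   /* neither bit: guest has to guess, as before */
    DMA_POLICY_BYPASS,   /* PASSTHROUGH && !PLATFORM: device bypasses the IOMMU */
    DMA_POLICY_PLATFORM, /* PLATFORM set: follow what the platform describes */
};

static bool has_feature(uint64_t features, unsigned int bit)
{
    return (features >> bit) & 1;
}

/* Guest side: PASSTHROUGH is ignored whenever PLATFORM is set. */
static enum dma_policy guest_dma_policy(uint64_t features)
{
    if (has_feature(features, VIRTIO_F_IOMMU_PLATFORM))
        return DMA_POLICY_PLATFORM;
    if (has_feature(features, VIRTIO_F_IOMMU_PASSTHROUGH))
        return DMA_POLICY_BYPASS;
    return DMA_POLICY_LEGACY;
}

/* Host side: offering both bits at once is not allowed for now. */
static bool host_features_valid(uint64_t features)
{
    return !(has_feature(features, VIRTIO_F_IOMMU_PLATFORM) &&
             has_feature(features, VIRTIO_F_IOMMU_PASSTHROUGH));
}
```

Keeping the combination invalid on the host side leaves it free to be given a meaning later, if the hint idea is picked up again.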
> 
> > > Looking at guest code, it looks like virtio was always
> > > bypassing the IOMMU even if configured, but no other
> > > guest driver did.
> > > 
> > > This makes me think the problem where guest drivers
> > > ignore the IOMMU is virtio specific
> > > and so a virtio specific solution seems cleaner.
> > > 
> > > The problem for assigned devices is IMHO different: they bypass
> > > the guest IOMMU too but no guest driver knows about this,
> > > so guests do not work. Seems cleaner to fix QEMU to make
> > > existing guests work.  
> > 
> > I certainly agree that it's better to fix QEMU. Whether devices are
> > behind an IOMMU or not, the DMAR tables we expose to a guest should
> > tell the truth.
> > 
> > Part of the issue here is virtio-specific; part isn't.
> > 
> > Basically, we have a conjunction of two separate bugs which happened to
> > work (for virtio) — the IOMMU support in QEMU wasn't 

Re: [PATCH RFC 3/3] vfio: add virtio pci quirk

2016-04-18 Thread Alex Williamson
On Mon, 18 Apr 2016 12:58:28 +0300
"Michael S. Tsirkin" <m...@redhat.com> wrote:

> Modern virtio pci devices can set VIRTIO_F_IOMMU_PLATFORM
> to signal they are safe to use with an IOMMU.
> 
> Without this bit, exposing the device to userspace is unsafe, so probe
> and fail VFIO initialization unless noiommu is enabled.
> 
> Signed-off-by: Michael S. Tsirkin <m...@redhat.com>
> ---
>  drivers/vfio/pci/vfio_pci_private.h |   1 +
>  drivers/vfio/pci/vfio_pci.c |  11 +++
>  drivers/vfio/pci/vfio_pci_virtio.c  | 135 
> 
>  drivers/vfio/pci/Makefile   |   1 +
>  4 files changed, 148 insertions(+)
>  create mode 100644 drivers/vfio/pci/vfio_pci_virtio.c
> 
> diff --git a/drivers/vfio/pci/vfio_pci_private.h 
> b/drivers/vfio/pci/vfio_pci_private.h
> index 8a7d546..604d445 100644
> --- a/drivers/vfio/pci/vfio_pci_private.h
> +++ b/drivers/vfio/pci/vfio_pci_private.h
> @@ -130,4 +130,5 @@ static inline int vfio_pci_igd_init(struct 
> vfio_pci_device *vdev)
>   return -ENODEV;
>  }
>  #endif
> +extern int vfio_pci_virtio_quirk(struct vfio_pci_device *vdev, int noiommu);
>  #endif /* VFIO_PCI_PRIVATE_H */
> diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
> index d622a41..2bb8c76 100644
> --- a/drivers/vfio/pci/vfio_pci.c
> +++ b/drivers/vfio/pci/vfio_pci.c
> @@ -1125,6 +1125,17 @@ static int vfio_pci_probe(struct pci_dev *pdev, const 
> struct pci_device_id *id)
>   return ret;
>   }
>  
> + if (pdev->vendor == PCI_VENDOR_ID_REDHAT_QUMRANET &&

Virtio really owns this entire vendor ID block?  Apparently nobody told
ivshmem: http://pci-ids.ucw.cz/read/PC/1af4/1110  Even the comment by
virtio_pci_id_table[] suggests virtio is only a subset even if the code
doesn't appear to honor that comment.  I don't know the history there,
but that seems like really inefficient use of an entire, coveted vendor
block.

> + ((ret = vfio_pci_virtio_quirk(vdev, ret)))) {

Please don't set variables like this unless necessary.

if (vendor...) {
   ret = vfio_pci_virtio_quir...
   if (ret) {
   ...

> + dev_warn(&vdev->pdev->dev,
> +  "Failed to setup Virtio for VFIO\n");
> + vfio_del_group_dev(&pdev->dev);
> + vfio_iommu_group_put(group, &pdev->dev);
> + kfree(vdev);
> + return ret;
> + }
> +
> +
>   if (vfio_pci_is_vga(pdev)) {
>   vga_client_register(pdev, vdev, NULL, vfio_pci_set_vga_decode);
>   vga_set_legacy_decoding(pdev,
> diff --git a/drivers/vfio/pci/vfio_pci_virtio.c 
> b/drivers/vfio/pci/vfio_pci_virtio.c
> new file mode 100644
> index 000..1a32064
> --- /dev/null
> +++ b/drivers/vfio/pci/vfio_pci_virtio.c
> @@ -0,0 +1,135 @@
> +/*
> + * VFIO PCI Intel Graphics support
> + *
> + * Copyright (C) 2016 Red Hat, Inc.  All rights reserved.
> + *   Author: Alex Williamson <alex.william...@redhat.com>
> + *

Update

> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + *
> + * Register a device specific region through which to provide read-only
> + * access to the Intel IGD opregion.  The register defining the opregion
> + * address is also virtualized to prevent user modification.

Update

> + */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 

I don't see where io or uaccess are needed here.

> +
> +#include "vfio_pci_private.h"
> +
> +/**
> + * virtio_pci_find_capability - walk capabilities to find device info.
> + * @dev: the pci device
> + * @cfg_type: the VIRTIO_PCI_CAP_* value we seek
> + *
> + * Returns offset of the capability, or 0.
> + */
> +static inline int virtio_pci_find_capability(struct pci_dev *dev, u8 
> cfg_type)

This is called from probe code, why inline?  There's already a function
with this exact same name in virtio code, can we come up with something
unique to avoid confusion?

> +{
> + int pos;
> +
> + for (pos = pci_find_capability(dev, PCI_CAP_ID_VNDR);
> +  pos > 0;
> +  pos = pci_find_next_capability(dev, pos, PCI_CAP_ID_VNDR)) {
> + u8 type;
> + pci_read_config_byte(dev, pos + offsetof(struct virtio_pci_cap,
> +  cfg_type),
> +  &type);
> +
> + if (type != cfg_type)
> + continue;
> +
> + /* Ignore structures with reserved BAR values */
> +

Re: [PATCH RFC 2/3] vfio: report group noiommu status

2016-04-18 Thread Alex Williamson
On Mon, 18 Apr 2016 12:58:20 +0300
"Michael S. Tsirkin"  wrote:

> When using vfio, callers might want to know whether device is added to a
> regular group or a non-iommu group.
> 
> Report this status from vfio_add_group_dev.
> 
> Signed-off-by: Michael S. Tsirkin 
> ---

What about making an interface to query this rather than playing games
with magic return values?

bool vfio_iommu_group_is_noiommu(struct iommu_group *group)
{
return iommu_group_get_iommudata(group) == &noiommu;
}
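
To make the contrast concrete, here is a self-contained sketch of the query-style interface, assuming no-iommu groups are tagged by storing the address of a sentinel as the group's iommudata. The struct is a stand-in for the real iommu_group, and the sentinel and function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for struct iommu_group: only the field this sketch needs. */
struct group_model {
    void *iommudata;
};

/* Sentinel whose address marks a group as no-iommu (name is hypothetical). */
static bool noiommu_marker;

/* Callers ask the question directly instead of decoding a return value. */
static bool group_is_noiommu(const struct group_model *group)
{
    return group->iommudata == &noiommu_marker;
}
```

With something like this, vfio_add_group_dev() can keep returning 0 on success and a negative errno on failure, and existing `if (ret)` callers need no change.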

>  drivers/vfio/pci/vfio_pci.c  | 2 +-
>  drivers/vfio/platform/vfio_platform_common.c | 2 +-
>  drivers/vfio/vfio.c  | 5 -
>  Documentation/vfio.txt   | 4 +++-
>  4 files changed, 9 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
> index 712a849..d622a41 100644
> --- a/drivers/vfio/pci/vfio_pci.c
> +++ b/drivers/vfio/pci/vfio_pci.c
> @@ -1119,7 +1119,7 @@ static int vfio_pci_probe(struct pci_dev *pdev, const 
> struct pci_device_id *id)
>   spin_lock_init(&vdev->irqlock);
>  
>   ret = vfio_add_group_dev(&pdev->dev, &vfio_pci_ops, vdev);
> - if (ret) {
> + if (ret < 0) {
>   vfio_iommu_group_put(group, &pdev->dev);
>   kfree(vdev);
>   return ret;
> diff --git a/drivers/vfio/platform/vfio_platform_common.c 
> b/drivers/vfio/platform/vfio_platform_common.c
> index e65b142..bf74e21 100644
> --- a/drivers/vfio/platform/vfio_platform_common.c
> +++ b/drivers/vfio/platform/vfio_platform_common.c
> @@ -568,7 +568,7 @@ int vfio_platform_probe_common(struct 
> vfio_platform_device *vdev,
>   }
>  
>   ret = vfio_add_group_dev(dev, &vfio_platform_ops, vdev);
> - if (ret) {
> + if (ret < 0) {
>   iommu_group_put(group);
>   return ret;
>   }
> diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
> index 6fd6fa5..67db231 100644
> --- a/drivers/vfio/vfio.c
> +++ b/drivers/vfio/vfio.c
> @@ -756,6 +756,7 @@ int vfio_add_group_dev(struct device *dev,
>   struct iommu_group *iommu_group;
>   struct vfio_group *group;
>   struct vfio_device *device;
> + int noiommu;
>  
>   iommu_group = iommu_group_get(dev);
>   if (!iommu_group)
> @@ -791,6 +792,8 @@ int vfio_add_group_dev(struct device *dev,
>   return PTR_ERR(device);
>   }
>  
> + noiommu = group->noiommu;
> +
>   /*
>* Drop all but the vfio_device reference.  The vfio_device holds
>* a reference to the vfio_group, which holds a reference to the
> @@ -798,7 +801,7 @@ int vfio_add_group_dev(struct device *dev,
>*/
>   vfio_group_put(group);
>  
> - return 0;
> + return noiommu;
>  }
>  EXPORT_SYMBOL_GPL(vfio_add_group_dev);
>  
> diff --git a/Documentation/vfio.txt b/Documentation/vfio.txt
> index 1dd3fdd..d76be0f 100644
> --- a/Documentation/vfio.txt
> +++ b/Documentation/vfio.txt
> @@ -259,7 +259,9 @@ extern void *vfio_del_group_dev(struct device *dev);
>  
>  vfio_add_group_dev() indicates to the core to begin tracking the
>  specified iommu_group and register the specified dev as owned by
> -a VFIO bus driver.  The driver provides an ops structure for callbacks
> +a VFIO bus driver.  A negative return value indicates failure.
> +A positive return value indicates that an unsafe noiommu mode
> +is in use.  The driver provides an ops structure for callbacks
>  similar to a file operations structure:
>  
>  struct vfio_device_ops {

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH] vhost/net: length miscalculation

2015-01-07 Thread Alex Williamson
On Wed, 2015-01-07 at 10:55 +0200, Michael S. Tsirkin wrote:
 commit 8b38694a2dc8b18374310df50174f1e4376d6824
 vhost/net: virtio 1.0 byte swap
 had this chunk:
 -   heads[headcount - 1].len += datalen;
 +   heads[headcount - 1].len = cpu_to_vhost32(vq, len - datalen);
 
 This adds datalen with the wrong sign, causing guest panics.
 
 Fixes: 8b38694a2dc8b18374310df50174f1e4376d6824
 Reported-by: Alex Williamson alex.william...@redhat.com
 Suggested-by: Greg Kurz gk...@linux.vnet.ibm.com
 Signed-off-by: Michael S. Tsirkin m...@redhat.com
 ---
 
 Alex, could you please confirm this fixes the crash for you?

Confirmed, this works.  Thanks,

Alex

  drivers/vhost/net.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)
 
 diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
 index 14419a8..d415d69 100644
 --- a/drivers/vhost/net.c
 +++ b/drivers/vhost/net.c
 @@ -538,7 +538,7 @@ static int get_rx_bufs(struct vhost_virtqueue *vq,
   ++headcount;
   seg += in;
   }
 - heads[headcount - 1].len = cpu_to_vhost32(vq, len - datalen);
 + heads[headcount - 1].len = cpu_to_vhost32(vq, len + datalen);
   *iovcount = seg;
   if (unlikely(log))
   *log_num = nlogs;
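
A simplified model of why the sign matters, written as a sketch rather than the actual get_rx_bufs(): it assumes the loop subtracts each buffer's length from datalen until datalen is no longer positive, so the leftover datalen is zero or negative and has to be added to trim the last head. The function and parameter names are made up for this illustration.

```c
#include <stddef.h>

/*
 * Spread a packet of datalen bytes over receive buffers of the given sizes.
 * used[i] receives the number of bytes consumed from buffer i.
 * Returns the number of buffers used, or 0 if the buffers are too small.
 */
static size_t fill_rx_heads(int datalen, const int *buf_len, size_t nbufs,
                            int *used)
{
    size_t n = 0;
    int len = 0;

    while (datalen > 0 && n < nbufs) {
        len = buf_len[n];
        used[n] = len;   /* assume the whole buffer is used for now */
        datalen -= len;  /* goes to zero or negative on the last buffer */
        n++;
    }
    if (datalen > 0)
        return 0;        /* ran out of buffers */
    if (n > 0)
        used[n - 1] = len + datalen; /* datalen <= 0 here: trim the overshoot */
    return n;
}
```

For a 1500 byte packet and two 1024 byte buffers, datalen ends at -548, so the last head is 1024 + (-548) = 476 bytes; subtracting instead would claim 1572 bytes in a 1024 byte buffer.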




Re: [PATCH v8 34/50] vhost/net: virtio 1.0 byte swap

2015-01-06 Thread Alex Williamson
On Mon, 2014-12-01 at 18:05 +0200, Michael S. Tsirkin wrote:
 I had to add an explicit tag to suppress compiler warning:
 gcc isn't smart enough to notice that
 len is always initialized since function is called with size > 0.

I'm getting a panic inside a guest when this change is applied on the
host.  I identified this patch via bisect and confirmed by reverting it
from v3.19-rc2.  Guest is centos6.  Thanks,

Alex

commit 8b38694a2dc8b18374310df50174f1e4376d6824
Author: Michael S. Tsirkin m...@redhat.com
Date:   Fri Oct 24 14:19:48 2014 +0300

vhost/net: virtio 1.0 byte swap

I had to add an explicit tag to suppress compiler warning:
gcc isn't smart enough to notice that
len is always initialized since function is called with size > 0.

Signed-off-by: Michael S. Tsirkin m...@redhat.com
Reviewed-by: Cornelia Huck cornelia.h...@de.ibm.com

XML chunk:

<interface type='direct'>
  <mac address='52:54:00:64:f3:34'/>
  <source dev='iscsinet0' mode='bridge'/>
  <model type='virtio'/>
  <address type='pci' domain='0x' bus='0x00' slot='0x03' function='0x0'/>
</interface>

Panic log:

1BUG: unable to handle kernel NULL pointer dereference at 0010
1IP: [a0079469] virtnet_poll+0x4f9/0x910 [virtio_net]
4PGD 1aa2f4067 PUD 1aa2f5067 PMD 0 
4Oops:  [#1] SMP 
4last sysfs file: 
/sys/devices/pci:00/:00:03.0/virtio0/net/eth9/ifindex
4CPU 0 
4Modules linked in: 8021q garp stp llc ipt_REJECT nf_conntrack_ipv4 
nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 
nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 uinput 
microcode snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep snd_seq 
snd_seq_device snd_pcm snd_timer snd soundcore snd_page_alloc igbvf 
nvidia(P)(U) i2c_core tg3 ptp pps_core virtio_balloon virtio_net virtio_console 
ext4 jbd2 mbcache virtio_blk virtio_pci virtio_ring virtio pata_acpi 
ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod [last unloaded: 
speedstep_lib]
4
4Pid: 1374, comm: NetworkManager Tainted: P   ---
2.6.32-431.23.3.el6.centos.plus.x86_64 #1 QEMU Standard PC (i440FX + PIIX, 1996)
4RIP: 0010:[a0079469]  [a0079469] 
virtnet_poll+0x4f9/0x910 [virtio_net]
4RSP: 0018:880028203e48  EFLAGS: 00010246
4RAX: 8801a3383d00 RBX: 8801a6aaf480 RCX: 8801aa20b6e0
4RDX: 00c0 RSI: 8801a3383c00 RDI: 8801a3383cc0
4RBP: 880028203ed8 R08: 009e R09: 8801aa1d800c
4R10: 0218 R11:  R12: 8801aa20b6e0
4R13:  R14:  R15: 
4FS:  7febf114d800() GS:88002820() knlGS:
4CS:  0010 DS:  ES:  CR0: 80050033
4CR2: 0010 CR3: 0001aa793000 CR4: 06f0
4DR0:  DR1:  DR2: 
4DR3:  DR6: 0ff0 DR7: 0400
4Process NetworkManager (pid: 1374, threadinfo 8801a74ba000, task 
8801a8d56040)
4Stack:
4 8801aa1d8000 009e 8801aa20b6e0 8801aa20b718
4d 8801aa20b780 8801aa1d800c 8801a6aaf4b8 8801aa20b020
4d 0080 8801aa20b708 0001 1f5981a830c8
4Call Trace:
4 IRQ 
4 [8146ae33] net_rx_action+0x103/0x2f0
4 [8107a5f1] __do_softirq+0xc1/0x1e0
4 [8100c30c] ? call_softirq+0x1c/0x30
4 [8100c30c] call_softirq+0x1c/0x30
4 EOI 
4 [8100fa75] ? do_softirq+0x65/0xa0
4 [8107b2ea] local_bh_enable+0x9a/0xb0
4 [a007813a] virtnet_napi_enable+0x4a/0x60 [virtio_net]
4 [a0078ebf] virtnet_open+0x4f/0x60 [virtio_net]
4 [81467691] dev_open+0xa1/0x100
4 [81466751] dev_change_flags+0xa1/0x1d0
4 [81474a59] do_setlink+0x169/0x8b0
4 [814770b6] ? rtnl_fill_ifinfo+0x946/0xcb0
4 [812a3d24] ? nla_parse+0x34/0x110
4 [8147659e] rtnl_setlink+0xee/0x130
4 [81475b67] rtnetlink_rcv_msg+0x2d7/0x340
4 [81231e14] ? socket_has_perm+0x74/0x90
4 [81475890] ? rtnetlink_rcv_msg+0x0/0x340
4 [814910a9] netlink_rcv_skb+0xa9/0xd0
4 [81475875] rtnetlink_rcv+0x25/0x40
4 [81490cdb] netlink_unicast+0x2db/0x320
4 [81491750] netlink_sendmsg+0x2c0/0x3d0
4 [814520c3] sock_sendmsg+0x123/0x150
4 [81453d73] ? sock_recvmsg+0x133/0x160
4 [8109afa0] ? autoremove_wake_function+0x0/0x40
4 [81136941] ? lru_cache_add_lru+0x21/0x40
4 [8115522d] ? page_add_new_anon_rmap+0x9d/0xf0
4 [8114aeef] ? handle_pte_fault+0x4af/0xb00
4 [81451f14] ? move_addr_to_kernel+0x64/0x70
4 [814538b6] __sys_sendmsg+0x406/0x420
4 [8104a98c] ? __do_page_fault+0x1ec/0x480
4 [814523d9] ? sys_sendto+0x139/0x190
4 [8103ea6c] ? kvm_clock_read+0x1c/0x20
4 [81453ad9] sys_sendmsg+0x49/0x90
4 [8100b072] system_call_fastpath+0x16/0x1b
4Code: 83 e0 

Re: Linux Plumbers ACPI/PM, PCI Microconference

2013-07-30 Thread Alex Williamson
On Wed, 2013-07-31 at 00:02 +, Shuah Khan wrote:
 On 07/30/2013 05:38 PM, Rafael J. Wysocki wrote:
  On Wednesday, July 17, 2013 08:31:55 AM Shuah Khan wrote:
  Myron,
 
  On Tue, Jul 16, 2013 at 8:21 PM, Myron Stowe myron.st...@gmail.com wrote:
 
 
  Shuah - You brought up the idea about Converting drivers from Legacy
  PM ops to dev_pm_ops; would you like to present what you have
  done/encountered so far?
 
 
  Awesome. Yes, I would like to present what I have done so far and I do
  have a couple of things that could benefit from a face to face
  discussion which would help me make progress on the rest of the work
  that needs to get done.
 
  Care to submit a formal proposal through the LPC web page?
 
  Rafael
 
 
 
 Rafael,
 
 I did submit a formal talk proposal to LinuxCon/LPC and it was rejected. 
 Submission is closed now as far as I know.

Microconference topics should be submitted here:

http://www.linuxplumbersconf.org/2013/ocw/events/LPC2013/proposals/new


Re: Call for Proposals: 2013 Linux Plumbers Virtualization Microconference

2013-07-24 Thread Alex Williamson

Reminder, there's one week left to submit proposals for the
virtualization micro-conference at LPC.  Please see below for details
and note the update to submit proposals through the Linux Plumbers
website:

http://www.linuxplumbersconf.org/2013/ocw/events/LPC2013/proposals/new

Thanks,
Alex

On Sun, 2013-07-14 at 15:59 -0600, Alex Williamson wrote:
 On Fri, 2013-07-12 at 14:38 -0600, Alex Williamson wrote:
  The Call for Proposals for the 2013 Linux Plumbers Virtualization
  Microconference is now open.  This uconf is being held as part of Linux
  Plumbers Conference in New Orleans, Louisiana, USA September 18-20th and
  is co-located with LinuxCon North America.  For more information see:
  
  http://www.linuxplumbersconf.org/2013/
  
  The tentative deadline for proposals is August 1st.  To submit a topic
  please email a brief abstract to lpc2013-virt...@codemonkey.ws  If you
  require travel assistance (extremely limited) in order to attend, please
  note that in your submission.  Also, please keep an eye on:
  
  http://www.linuxplumbersconf.org/2013/submitting-topic/
  http://www.linuxplumbersconf.org/2013/participate/
  
  We've setup the above email submission as an interim approach until the
  LPC program committee brings the official submission tool online.  I'll
  send a follow-up message when that occurs, but please send your
  proposals as soon as possible.  Thanks,
 
 And the official tool is now online.  Please see:
 
 http://www.linuxplumbersconf.org/2013/microconference-discussion-topic-bof-submissions-now-open/
 
 for instructions to propose a discussion topic for the virtualization
 microconference.  Thanks,
 
 Alex




Re: Call for Proposals: 2013 Linux Plumbers Virtualization Microconference

2013-07-14 Thread Alex Williamson
On Fri, 2013-07-12 at 14:38 -0600, Alex Williamson wrote:
 The Call for Proposals for the 2013 Linux Plumbers Virtualization
 Microconference is now open.  This uconf is being held as part of Linux
 Plumbers Conference in New Orleans, Louisiana, USA September 18-20th and
 is co-located with LinuxCon North America.  For more information see:
 
 http://www.linuxplumbersconf.org/2013/
 
 The tentative deadline for proposals is August 1st.  To submit a topic
 please email a brief abstract to lpc2013-virt...@codemonkey.ws  If you
 require travel assistance (extremely limited) in order to attend, please
 note that in your submission.  Also, please keep an eye on:
 
 http://www.linuxplumbersconf.org/2013/submitting-topic/
 http://www.linuxplumbersconf.org/2013/participate/
 
 We've setup the above email submission as an interim approach until the
 LPC program committee brings the official submission tool online.  I'll
 send a follow-up message when that occurs, but please send your
 proposals as soon as possible.  Thanks,

And the official tool is now online.  Please see:

http://www.linuxplumbersconf.org/2013/microconference-discussion-topic-bof-submissions-now-open/

for instructions to propose a discussion topic for the virtualization
microconference.  Thanks,

Alex


Call for Proposals: 2013 Linux Plumbers Virtualization Microconference

2013-07-12 Thread Alex Williamson

The Call for Proposals for the 2013 Linux Plumbers Virtualization
Microconference is now open.  This uconf is being held as part of Linux
Plumbers Conference in New Orleans, Louisiana, USA September 18-20th and
is co-located with LinuxCon North America.  For more information see:

http://www.linuxplumbersconf.org/2013/

The tentative deadline for proposals is August 1st.  To submit a topic
please email a brief abstract to lpc2013-virt...@codemonkey.ws  If you
require travel assistance (extremely limited) in order to attend, please
note that in your submission.  Also, please keep an eye on:

http://www.linuxplumbersconf.org/2013/submitting-topic/
http://www.linuxplumbersconf.org/2013/participate/

We've setup the above email submission as an interim approach until the
LPC program committee brings the official submission tool online.  I'll
send a follow-up message when that occurs, but please send your
proposals as soon as possible.  Thanks,

Alex


Re: RFC: vfio interface for platform devices (v2)

2013-07-03 Thread Alex Williamson
On Wed, 2013-07-03 at 21:40 +, Yoder Stuart-B08248 wrote:
 Version 2
   -VFIO_GROUP_GET_DEVICE_FD-- specified that the path is a sysfs path
   -VFIO_DEVICE_GET_INFO-- defined 2 flags instead of 1
   -deleted VFIO_DEVICE_GET_DEVTREE_INFO ioctl
   -VFIO_DEVICE_GET_REGION_INFO-- updated as per AlexW's suggestion,
defined 5 new flags and associated structs
   -VFIO_DEVICE_GET_IRQ_INFO-- updated as per AlexW's suggestion,
defined 1 new flag and associated struct
   -removed redundant example
 
 --
 VFIO for Platform Devices
 
 The existing kernel interface for vfio-pci is pretty close to what is needed
 for platform devices:
-mechanism to create a container
-add groups/devices to a container
-set the IOMMU model
-map DMA regions
-get an fd for a specific device, which allows user space to determine
 info about device regions (e.g. registers) and interrupt info
-support for mmapping device regions
-mechanism to set how interrupts are signaled
 
 Many platform devices are simple and consist of a single register
 region and a single interrupt.  For these types of devices the
 existing vfio interfaces should be sufficient.
 
 However, platform devices can get complicated-- logically represented
 as a device tree hierarchy of nodes.  For devices with multiple regions
 and interrupts, new mechanisms are needed in vfio to correlate the
 regions/interrupts with the device tree structure that drivers use
 to determine the meaning of device resources.
 
 In some cases there are relationships between devices, and devices
 reference other devices using phandle links.  The kernel won't expose
 relationships between devices, but just exposes mappable register
 regions and interrupts.
 
 The changes needed for vfio are around some of the device tree
 related info that needs to be available with the device fd.
 
 1.  VFIO_GROUP_GET_DEVICE_FD
 
   User space knows by out-of-band means which device it is accessing
   and will call VFIO_GROUP_GET_DEVICE_FD passing a specific sysfs path
   to get the device information:
 
   fd = ioctl(group, VFIO_GROUP_GET_DEVICE_FD,
  /sys/bus/platform/devices/ffe21.usb));

FWIW, I'm in favor of whichever way works out cleaner in the code for
pre-pending /sys/bus or not.  It sort of seems like it's unnecessary.
It's also a little inconsistent that the returned path doesn't
pre-pend /sys in the examples below.

 2.  VFIO_DEVICE_GET_INFO
 
The number of regions corresponds to the regions defined
in reg and ranges in the device tree.  
 
Two new flags are added to struct vfio_device_info:
 
#define VFIO_DEVICE_FLAGS_PLATFORM (1 << ?) /* A platform bus device */
#define VFIO_DEVICE_FLAGS_DEVTREE  (1 << ?) /* device tree info available */
 
It is possible that there could be platform bus devices 
that are not in the device tree, so we use 2 flags to
allow for that.
 
If just VFIO_DEVICE_FLAGS_PLATFORM is set, it means
that there are regions and IRQs but no device tree info
available.
 
If just VFIO_DEVICE_FLAGS_DEVTREE is set, it means
there is device tree info available.

But it would be invalid to only have DEVTREE w/o PLATFORM for now,
right?
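
In other words, something like the check below, as a sketch only. The proposal leaves the bit positions as "?", so the values here are placeholders chosen for illustration, and the helper name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder bit positions: the proposal does not assign them yet. */
#define VFIO_DEVICE_FLAGS_PLATFORM (1u << 2) /* assumed value */
#define VFIO_DEVICE_FLAGS_DEVTREE  (1u << 3) /* assumed value */

/* DEVTREE only makes sense on a platform bus device, at least for now. */
static bool platform_flags_valid(uint32_t flags)
{
    if ((flags & VFIO_DEVICE_FLAGS_DEVTREE) &&
        !(flags & VFIO_DEVICE_FLAGS_PLATFORM))
        return false;
    return true;
}
```

PLATFORM alone stays valid, which covers platform bus devices that are not described in the device tree.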

 3. VFIO_DEVICE_GET_REGION_INFO
 
For platform devices with multiple regions, information
is needed to correlate the regions with the device 
tree structure that drivers use to determine the meaning
of device resources.

The VFIO_DEVICE_GET_REGION_INFO is extended to provide
device tree information.
 
The following information is needed:
   -the device tree path to the node corresponding to the
region
   -whether it corresponds to a reg or ranges property
   -there could be multiple sub-regions per reg or ranges and
the sub-index within the reg/ranges is needed
 
There are 5 new flags added to vfio_region_info :
 
struct vfio_region_info {
 __u32   argsz;
 __u32   flags;
#define VFIO_REGION_INFO_FLAG_CACHEABLE (1 << ?)
#define VFIO_DEVTREE_REGION_INFO_FLAG_REG (1 << ?)
#define VFIO_DEVTREE_REGION_INFO_FLAG_RANGE (1 << ?)
#define VFIO_DEVTREE_REGION_INFO_FLAG_INDEX (1 << ?)
#define VFIO_DEVTREE_REGION_INFO_FLAG_PATH (1 << ?)
 __u32   index;  /* Region index */
 __u32   resv;   /* Reserved for alignment */
 __u64   size;   /* Region size (bytes) */
 __u64   offset; /* Region offset from start of device fd */
};
  
VFIO_REGION_INFO_FLAG_CACHEABLE
-if set indicates that the region must be mapped as cacheable
 
VFIO_DEVTREE_REGION_INFO_FLAG_REG
-if set indicates that the region corresponds to a reg property
 in the device tree representation of the device
 
VFIO_DEVTREE_REGION_INFO_FLAG_RANGE
-if set indicates that the region corresponds to a ranges property
 in the 

Re: binding/unbinding devices to vfio-pci

2013-07-02 Thread Alex Williamson
On Tue, 2013-07-02 at 15:13 +, Yoder Stuart-B08248 wrote:
 
  -Original Message-
  From: Alex Williamson [mailto:alex.william...@redhat.com]
  Sent: Tuesday, July 02, 2013 9:46 AM
  To: Yoder Stuart-B08248
  Cc: k...@vger.kernel.org list; Alexander Graf; Bhushan Bharat-R65777; 
  a.mota...@virtualopensystems.com;
  virtualization@lists.linux-foundation.org
  Subject: Re: binding/unbinding devices to vfio-pci
  
  On Tue, 2013-07-02 at 14:15 +, Yoder Stuart-B08248 wrote:
   Alex,
  
   I'm trying to think through how binding/unbinding of devices will
   work with VFIO for platform devices and have a couple of questions
   about how vfio-pci works.
  
   When you bind a device to vfio-pci, e.g.:
   # echo 1102 0002 > /sys/bus/pci/drivers/vfio-pci/new_id
  
   ...I understand that the echo into 'new_id' tells the
   vfio pci driver that it now handles the specified PCI ID.
  
   But now there are 2 drivers that handle that PCI ID,
   the original host driver and vfio-pci.   Say that
   you hotplug a PCI device that matches that ID.   Which of
   the 2 drivers are going to get bound to the device?
  
   Also, if you unbind a device from vfio-pci and want to
   bind it again to the normal host driver you would just
   echo the full device info into the 'bind' sysfs file
   for the host driver, right?
  
   echo :06:0d.0 > /sys/bus/pci/drivers/...
  
  Hi Stuart,
  
  The driver binding interface is far from perfect.  In your scenario
  where you've added the ID for one device, then hotplug another device
  with the same ID, the results are indeterminate.  Both vfio-pci and the
  host driver, assuming it's still loaded, can claim the device, it's just
  a matter of which gets probed first.
  
  Generally that window should be very short though.  To bind a device,
  the user should do:
  
  1) echo :bb:dd.f > /sys/bus/pci/devices/:bb:dd.f/driver/unbind
  2) echo   > /sys/bus/pci/drivers/vfio-pci/new_id
  3) echo :bb:dd.f > /sys/bus/pci/drivers/vfio-pci/bind
  4) echo   > /sys/bus/pci/drivers/vfio-pci/remove_id
  
  There are actually a number of ways you can do this and the default
  autoprobe behavior really makes step 3) unnecessary as the driver core
  will probe any unbound devices as soon as a new_id is added to vfio-pci.
  That can be changed by:
  
  # echo 0 > /sys/bus/pci/drivers_autoprobe
  
  But then we have to worry about races from any devices that might have
  been hotplugged in the interim.
 
 But, even apart from hot-plugged devices, what about the device
 we just unbound?  There are now 2 host drivers that can handle the
 device when the autoprobe happens.  Is it just luck that vfio-pci
 is the one that gets the device?

If we have an unbound device and echo ID > new_id, then just that
driver with the new_id is autoprobed, not the original host driver.
Thanks,

Alex



Re: RFC: vfio interface for platform devices

2013-07-02 Thread Alex Williamson
On Tue, 2013-07-02 at 23:25 +, Yoder Stuart-B08248 wrote:
 The write-up below is the first draft of a proposal for how the kernel can 
 expose
 platform devices to user space using vfio.
 
 In short, I'm proposing a new ioctl VFIO_DEVICE_GET_DEVTREE_INFO which
 allows user space to correlate regions and interrupts to the corresponding
 device tree node structure that is defined for most platform devices.
 
 Regards,
 Stuart Yoder
 
 --
 VFIO for Platform Devices
 
 The existing infrastructure for vfio-pci is pretty close to what we need:
-mechanism to create a container
-add groups/devices to a container
-set the IOMMU model
-map DMA regions
-get an fd for a specific device, which allows user space to determine
 info about device regions (e.g. registers) and interrupt info
-support for mmapping device regions
-mechanism to set how interrupts are signaled
 
 Platform devices can get complicated-- potentially with a tree hierarchy
 of nodes, and links/phandles pointing to other platform 
 devices.   The kernel doesn't expose relationships between
 devices.  The kernel just exposes mappable register regions and interrupts.
 It's up to user space to work out relationships between devices
 if it needs to-- this can be determined in the device tree exposed in
 /proc/device-tree.
 
 I think the changes needed for vfio are around some of the device tree
 related info that needs to be available with the device fd.
 
 1.  VFIO_GROUP_GET_DEVICE_FD
 
   User space has to know which device it is accessing and will call
   VFIO_GROUP_GET_DEVICE_FD passing a specific platform device path to
   get the device information:
 
   fd = ioctl(group, VFIO_GROUP_GET_DEVICE_FD, /soc@ffe00/usb@21);
 
   (whether the path is a device tree path or a sysfs path is up for
   discussion, e.g. /sys/bus/platform/devices/ffe21.usb)
 
 2.  VFIO_DEVICE_GET_INFO
 
Don't think any changes are needed to VFIO_DEVICE_GET_INFO other
than adding a new flag identifying a device as a 'platform'
device.
 
This ioctl simply returns the number of regions and number of irqs.
 
The number of regions corresponds to the number of regions
that can be mapped for the device-- corresponds to the regions defined
in reg and ranges in the device tree.  
 
 3.  VFIO_DEVICE_GET_REGION_INFO
 
No changes needed, except perhaps adding a new flag.  Freescale has some
devices with regions that must be mapped cacheable.
 
 3.  VFIO_DEVICE_GET_IRQ_INFO
 
No changes needed.
 
 4. VFIO_DEVICE_GET_DEVTREE_INFO
 
The VFIO_DEVICE_GET_REGION_INFO and VFIO_DEVICE_GET_IRQ_INFO APIs
expose device regions and interrupts, but it's not enough to know
that there are X regions and Y interrupts.  User space needs to
know what the resources are for-- to correlate those regions/interrupts
to the device tree structure that drivers use.  The device tree
structure could consist of multiple nodes and it is necessary to
identify the node corresponding to the region/interrupt exposed
by VFIO.
 
The following information is needed:
   -the device tree path to the node corresponding to the
region or interrupt
   -for a region, whether it corresponds to a reg or ranges
property
   -there could be multiple sub-regions per reg or ranges and
the sub-index within the reg/ranges is needed
 
The VFIO_DEVICE_GET_DEVTREE_INFO operates on a device fd.
 
ioctl: VFIO_DEVICE_GET_DEVTREE_INFO

struct vfio_path_info {
 __u32   argsz;
 __u32   flags;
#define VFIO_DEVTREE_INFO_RANGES  (1 << 3) /* the region is a ranges property */

(1 << 0)?

Having flags = 0x0 for regs and 0x1 for ranges is a bit awkward.  I'd
suggest a bit for each.  Otherwise, what does it mean when this returns
flags = 0x0 for an irq?

 __u32   index;  /* input: index of region or irq for which we 
 are getting info */
 __u32   type;   /* input: 0 - get devtree info for a region
   1 - get devtree info for an irq
  */
 __u32   start;  /* output: identifies the index within the 
 reg/ranges */
 __u8    path[]; /* output: Full path to associated device
 tree node */
};
 
User space allocates enough space for the device tree path, sets
the type field identifying whether this is a region, or irq,
and sets argsz appropriately.
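
As a rough sketch of what "allocates enough space ... and sets argsz
appropriately" could look like from user space: the struct below is a local
mirror of the proposed layout (uint32_t standing in for __u32), and the names
devtree_info_req, devtree_info_alloc and the two flag bits are hypothetical.
The flag values follow the "a bit for each" suggestion and are not taken from
any kernel header. No ioctl is issued here, since the call is only a proposal.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical flag layout: one bit per property kind, not from a real header. */
#define DEVTREE_INFO_REG    (1u << 0)
#define DEVTREE_INFO_RANGES (1u << 1)

/* Local mirror of the proposed struct; uint32_t stands in for __u32. */
struct devtree_info_req {
	uint32_t argsz;   /* input: total buffer size, header plus path space */
	uint32_t flags;   /* output: REG or RANGES for a region */
	uint32_t index;   /* input: index of the region or irq */
	uint32_t type;    /* input: 0 = region, 1 = irq */
	uint32_t start;   /* output: sub-index within reg/ranges */
	uint8_t  path[];  /* output: device tree path of the node */
};

/* Allocate a zeroed request with room for path_max bytes of path. */
static struct devtree_info_req *devtree_info_alloc(uint32_t type,
						   uint32_t index,
						   size_t path_max)
{
	size_t size = sizeof(struct devtree_info_req) + path_max;
	struct devtree_info_req *req = calloc(1, size);

	if (!req)
		return NULL;
	req->argsz = (uint32_t)size;  /* tells the kernel how much it may write */
	req->type = type;
	req->index = index;
	return req;
}
```

A caller would pass the returned buffer to the ioctl on the device fd and
free() it afterwards; argsz is what lets the kernel refuse a path that does
not fit.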
 
 5.  EXAMPLE 1
 
 Example, Freescale SATA controller:
 
  sata@22 {
  compatible = "fsl,p2041-sata", "fsl,pq-sata-v2";
  reg = <0x22 0x1000>;
  interrupts = <0x44 0x2 0x0 0x0>;
  };
 
 request to get device FD would look like:
   fd = ioctl(group, VFIO_GROUP_GET_DEVICE_FD,
 "/soc@ffe00/sata@22");
 
 The VFIO_DEVICE_GET_INFO ioctl would return:
 

2013 Linux Plumbers Virtualization Microconference proposal call for participation

2013-05-16 Thread Alex Williamson
Hey folks,

We'd like to hold another virtualization microconference as part of this
year's Linux Plumbers Conference.  To do so, we need to show that
there's enough interest, materials, and people willing to attend. 

Anthony and Amit have already started a wiki page for the
microconference:

http://wiki.linuxplumbersconf.org/2013:virtualization

Please help to fill it out and add topics you're interested in
discussing or presenting, or relevant topics to current development that
you'd like others to present.  We don't need full abstracts or proposals
at this time, just enough to show that we have things to discuss and
enough people interested in attending to allocate space in the program.

If approved by the program committee we'll give precedence to proposals
that address current issues needing attention, help, or resolution.
This is also an excellent opportunity for cross-project discussions.

We're not sure when the LPC program committee will make a decision, so
please don't hesitate to add things to the wiki as soon as possible.
There's no commitment at this point though we obviously hope that many
of you will follow through with participation if this microconference is
approved.  Please feel free to forward to anyone I've missed.  Thanks,

Alex

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: PCI device not properly reset after VFIO

2012-10-18 Thread Alex Williamson
Hi Hannes,

Thanks for testing vfio

On Thu, 2012-10-18 at 08:47 +0200, Hannes Reinecke wrote:
 Hi Alex,
 
 I've been playing around with VFIO and megasas (of course).
 What I did now was switching between VFIO and 'normal' operation, ie 
 emulated access.
 
 megasas is happily running under VFIO, but when I do an emergency 
 stop like killing the Qemu session the PCI device is not properly reset.
 IE when I load 'megaraid_sas' after unbinding the vfio_pci module
 the driver cannot initialize the card and waits forever for the 
 firmware state to change.
 
 I need to do a proper pci reset via
 echo 1 > /sys/bus/pci/device//reset
 to get it into a working state again.
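
For reference, the same manual reset can be done from a small C helper. This
is only a sketch: it assumes the standard sysfs location
/sys/bus/pci/devices/<address>/reset, the function names are made up for
illustration, and the device address is supplied by the caller. Writing the
attribute needs root and a device that supports a function-level reset.

```c
#include <stdio.h>
#include <string.h>

/* Build "/sys/bus/pci/devices/<bdf>/reset"; 0 on success, -1 if buf is too small. */
static int pci_reset_path(char *buf, size_t len, const char *bdf)
{
	int n = snprintf(buf, len, "/sys/bus/pci/devices/%s/reset", bdf);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Write "1" to the reset attribute of the given device; 0 on success. */
static int pci_sysfs_reset(const char *bdf)
{
	char path[128];
	FILE *f;
	int ret = 0;

	if (pci_reset_path(path, sizeof(path), bdf))
		return -1;
	f = fopen(path, "w");
	if (!f)
		return -1;          /* no such device, no reset support, or not root */
	if (fputs("1", f) == EOF)
		ret = -1;
	if (fclose(f) == EOF)       /* the write may only fail at flush time */
		ret = -1;
	return ret;
}
```

Calling pci_sysfs_reset() with an address in the usual domain:bus:dev.fn form
is equivalent to the echo above.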
 
 Looking at vfio_pci_disable() pci reset is called before the config 
 state and BARs are restored.
 Seeing that vfio_pci_enable() calls pci reset right at the start, 
 too, before modifying anything I do wonder whether the pci reset is 
 at the correct location for disable.
 
 I would have expected to call pci reset in vfio_pci_disable() 
 _after_ we have restored the configuration, to ensure a sane state 
 after reset.
 And, as experience shows, we do need to call it there.
 
 So what is the rationale for the pci reset?
 Can we move it to the end of vfio_pci_disable() or do we need to 
 call pci reset twice?

I believe the rationale was that by resetting the device before we
restore the state we stop anything that the device was doing.  Restoring
the saved state on a running device seems like it could cause problems,
so you may be right and we actually need to do reset, load, restore,
reset.  Does adding another call to pci_reset_function after
pci_restore_state (as below) solve the problem?  Traditional KVM device
assignment has a nearly identical path; does it have this same bug?
Thanks,

Alex

diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index 6c11994..d07a45c 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -107,9 +107,10 @@ static void vfio_pci_disable(struct vfio_pci_device *vdev)
 	pci_reset_function(vdev->pdev);
 
 	if (pci_load_and_free_saved_state(vdev->pdev,
-					  &vdev->pci_saved_state) == 0)
+					  &vdev->pci_saved_state) == 0) {
 		pci_restore_state(vdev->pdev);
-	else
+		pci_reset_function(vdev->pdev);
+	} else
 		pr_info("%s: Couldn't reload %s saved state\n",
 			__func__, dev_name(&vdev->pdev->dev));
 




Re: PCI device not properly reset after VFIO

2012-10-18 Thread Alex Williamson
On Thu, 2012-10-18 at 17:06 +0200, Hannes Reinecke wrote:
 On 10/18/2012 04:40 PM, Alex Williamson wrote:
  Hi Hannes,
 
  Thanks for testing vfio
 
  On Thu, 2012-10-18 at 08:47 +0200, Hannes Reinecke wrote:
  Hi Alex,
 
  I've been playing around with VFIO and megasas (of course).
  What I did now was switching between VFIO and 'normal' operation, ie
  emulated access.
 
  megasas is happily running under VFIO, but when I do an emergency
  stop like killing the Qemu session the PCI device is not properly reset.
  IE when I load 'megaraid_sas' after unbinding the vfio_pci module
  the driver cannot initialize the card and waits forever for the
  firmware state to change.
 
  I need to do a proper pci reset via
  echo 1 > /sys/bus/pci/device//reset
  to get it into a working state again.
 
  Looking at vfio_pci_disable() pci reset is called before the config
  state and BARs are restored.
  Seeing that vfio_pci_enable() calls pci reset right at the start,
  too, before modifying anything I do wonder whether the pci reset is
  at the correct location for disable.
 
  I would have expected to call pci reset in vfio_pci_disable()
  _after_ we have restored the configuration, to ensure a sane state
  after reset.
  And, as experience shows, we do need to call it there.
 
  So what is the rationale for the pci reset?
  Can we move it to the end of vfio_pci_disable() or do we need to
  call pci reset twice?
 
  I believe the rationale was that by resetting the device before we
  restore the state we stop anything that the device was doing.  Restoring
  the saved state on a running device seems like it could cause problems,
  so you may be right and we actually need to do reset, load, restore,
  reset.  Does adding another call to pci_reset_function in the
  pci_restore_state (as below) solve the problem?  Traditional KVM device
  assignment has a nearly identical path, does it have this same bug?
 
 It's actually the first time I've been able to test this (the 
 hardware is a bit tricky to setup ...), so I cannot tell (yet)
 if KVM exhibited the same thing.
 
  Thanks,
 
  Alex
 
  diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
  index 6c11994..d07a45c 100644
  --- a/drivers/vfio/pci/vfio_pci.c
  +++ b/drivers/vfio/pci/vfio_pci.c
  @@ -107,9 +107,10 @@ static void vfio_pci_disable(struct vfio_pci_device 
  *vdev)
   	pci_reset_function(vdev->pdev);
  
   	if (pci_load_and_free_saved_state(vdev->pdev,
  -					  &vdev->pci_saved_state) == 0)
  +					  &vdev->pci_saved_state) == 0) {
   		pci_restore_state(vdev->pdev);
  -	else
  +		pci_reset_function(vdev->pdev);
  +	} else
   		pr_info("%s: Couldn't reload %s saved state\n",
   			__func__, dev_name(&vdev->pdev->dev));
 
 
 
 I would have called reset after unmapping the BARs; the HBA I'm 
 working with does need to access the BARs, so the content of them 
 might be relevant, too.

I think I copied the ordering from kvm since it seems to work there,
though it may not work for your device either, since the kvm path is
still untested there.
Logically it does seem like we'd want to unmap and release before the
final reset, but I can't find that those paths actually cause
interactions where it would matter.
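
As a toy model of the ordering being weighed in this thread (quiesce the
device, unmap and release the BARs, restore the saved config, then a final
reset), here is a user-space sketch. Nothing in it is kernel code: the enum,
the array and the helper names are purely illustrative, and it only checks
the relative order of the steps.

```c
#include <stddef.h>

/* Steps named in the discussion; the enum itself is illustrative only. */
enum teardown_step {
	STEP_DISABLE_DEVICE,
	STEP_FREE_IRQ_CONFIG,
	STEP_UNMAP_AND_RELEASE_BARS,
	STEP_RESTORE_SAVED_STATE,
	STEP_RESET_FUNCTION,
};

/* Candidate ordering: quiesce first, reset last. */
static const enum teardown_step candidate_order[] = {
	STEP_DISABLE_DEVICE,
	STEP_FREE_IRQ_CONFIG,
	STEP_UNMAP_AND_RELEASE_BARS,
	STEP_RESTORE_SAVED_STATE,
	STEP_RESET_FUNCTION,
};

#define N_STEPS (sizeof(candidate_order) / sizeof(candidate_order[0]))

/* Return the position of step s in order[0..n), or -1 if it is absent. */
static int step_position(const enum teardown_step *order, size_t n,
			 enum teardown_step s)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (order[i] == s)
			return (int)i;
	return -1;
}

/* Non-zero when the function reset is the final step of the ordering. */
static int reset_is_last(const enum teardown_step *order, size_t n)
{
	return n > 0 &&
	       step_position(order, n, STEP_RESET_FUNCTION) == (int)(n - 1);
}
```

The point of the model is simply that the current code fails reset_is_last(),
because restore and unmap both run after the reset.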

Since we first disable the device and free the interrupt config it seems
like we should be relatively at ease restoring the saved config prior to
reset and, as you suggest, re-ordering the unmap and region release.
That leaves us with something like below.  Does that work better for
your device?  Anyone on linux-pci have advice on ordering of
pci_reset_function?  Thanks,

Alex

Signed-off-by: Alex Williamson alex.william...@redhat.com

diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index 6c11994..af0b165 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -104,7 +104,13 @@ static void vfio_pci_disable(struct vfio_pci_device *vdev)
 
 	vfio_config_free(vdev);
 
-	pci_reset_function(vdev->pdev);
+	for (bar = PCI_STD_RESOURCES; bar <= PCI_STD_RESOURCE_END; bar++) {
+		if (!vdev->barmap[bar])
+			continue;
+		pci_iounmap(vdev->pdev, vdev->barmap[bar]);
+		pci_release_selected_regions(vdev->pdev, 1 << bar);
+		vdev->barmap[bar] = NULL;
+	}
 
 	if (pci_load_and_free_saved_state(vdev->pdev,
 					  &vdev->pci_saved_state) == 0)
@@ -113,13 +119,7 @@ static void vfio_pci_disable(struct vfio_pci_device *vdev)
 		pr_info("%s: Couldn't reload %s saved state\n",
 			__func__, dev_name(&vdev->pdev->dev));
 
-	for (bar = PCI_STD_RESOURCES; bar <= PCI_STD_RESOURCE_END; bar++) {
-		if (!vdev->barmap[bar])
-			continue;
-		pci_iounmap(vdev->pdev, vdev->barmap[bar

Re: [PATCHv3 0/4] qemu-kvm: vhost net support

2009-08-24 Thread Alex Williamson
On Sun, Aug 23, 2009 at 1:22 PM, Michael S. Tsirkin <m...@redhat.com> wrote:

 Just had a different, but slightly similar problem when the host running
 qemu had forwarding enabled. Is it possible your host is forwarding the
 packets somewhere else, and that's why we get the dupes?
 sysctl -w net.ipv4.conf.all.forwarding=0
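
To check the setting from a program rather than the sysctl tool, something
like the following sketch could be used. It relies on the usual mapping of a
sysctl key to a file under /proc/sys (dots become slashes); the helper names
are invented for illustration, and the naive dot replacement would mangle an
interface name that itself contains a dot.

```c
#include <stdio.h>
#include <string.h>

/* Map a key such as "net.ipv4.conf.all.forwarding" to its /proc/sys path. */
static int sysctl_key_to_path(const char *key, char *buf, size_t len)
{
	int n = snprintf(buf, len, "/proc/sys/%s", key);
	char *p;

	if (n < 0 || (size_t)n >= len)
		return -1;
	/* Skip the "/proc/sys/" prefix, then turn each '.' into '/'. */
	for (p = buf + strlen("/proc/sys/"); *p; p++)
		if (*p == '.')
			*p = '/';
	return 0;
}

/* Read an integer sysctl into *value; 0 on success, -1 on any error. */
static int sysctl_read_int(const char *key, int *value)
{
	char path[256];
	FILE *f;
	int ok;

	if (sysctl_key_to_path(key, path, sizeof(path)))
		return -1;
	f = fopen(path, "r");
	if (!f)
		return -1;
	ok = fscanf(f, "%d", value) == 1;
	fclose(f);
	return ok ? 0 : -1;
}
```

A value of 1 read back for the "all" key, or for the key of the interface
handed to vhost, would point at the host forwarding guest traffic.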

Yes!  This seems to be the problem.  As expected, I can just disable
forwarding on eth10 and the duplicates disappear.  Thanks,

Alex


Re: [PATCHv3 0/4] qemu-kvm: vhost net support

2009-08-20 Thread Alex Williamson
On Thu, Aug 20, 2009 at 1:03 AM, Michael S. Tsirkin <m...@redhat.com> wrote:

 I think the duplicates are our best hint that something's wrong at this
 point. Let's try to see where do they come from.

 What is it exactly that you see?

# ping 10.100.100.74
PING 10.100.100.74 (10.100.100.74) 56(84) bytes of data.
64 bytes from 10.100.100.74: icmp_seq=1 ttl=64 time=0.263 ms
64 bytes from 10.100.100.74: icmp_seq=1 ttl=64 time=0.358 ms (DUP!)
64 bytes from 10.100.100.74: icmp_seq=1 ttl=64 time=0.394 ms (DUP!)
64 bytes from 10.100.100.74: icmp_seq=1 ttl=64 time=0.478 ms (DUP!)
64 bytes from 10.100.100.74: icmp_seq=1 ttl=64 time=0.626 ms (DUP!)
64 bytes from 10.100.100.74: icmp_seq=1 ttl=64 time=0.727 ms (DUP!)
64 bytes from 10.100.100.74: icmp_seq=1 ttl=64 time=0.834 ms (DUP!)
64 bytes from 10.100.100.74: icmp_seq=1 ttl=64 time=0.909 ms (DUP!)
64 bytes from 10.100.100.74: icmp_seq=1 ttl=64 time=0.997 ms (DUP!)
64 bytes from 10.100.100.74: icmp_seq=1 ttl=64 time=1.08 ms (DUP!)
64 bytes from 10.100.100.74: icmp_seq=1 ttl=64 time=1.12 ms (DUP!)
64 bytes from 10.100.100.74: icmp_seq=1 ttl=64 time=1.18 ms (DUP!)
64 bytes from 10.100.100.74: icmp_seq=1 ttl=64 time=1.29 ms (DUP!)
64 bytes from 10.100.100.74: icmp_seq=1 ttl=64 time=1.38 ms (DUP!)
64 bytes from 10.100.100.74: icmp_seq=1 ttl=64 time=1.46 ms (DUP!)
64 bytes from 10.100.100.74: icmp_seq=1 ttl=64 time=1.55 ms (DUP!)
64 bytes from 10.100.100.74: icmp_seq=1 ttl=64 time=1.56 ms (DUP!)
64 bytes from 10.100.100.74: icmp_seq=1 ttl=64 time=1.65 ms (DUP!)
64 bytes from 10.100.100.74: icmp_seq=1 ttl=64 time=1.68 ms (DUP!)
64 bytes from 10.100.100.74: icmp_seq=1 ttl=64 time=1.70 ms (DUP!)
64 bytes from 10.100.100.74: icmp_seq=1 ttl=64 time=1.83 ms (DUP!)
64 bytes from 10.100.100.74: icmp_seq=1 ttl=64 time=1.89 ms (DUP!)
64 bytes from 10.100.100.74: icmp_seq=1 ttl=64 time=1.95 ms (DUP!)
64 bytes from 10.100.100.74: icmp_seq=1 ttl=64 time=1.98 ms (DUP!)
64 bytes from 10.100.100.74: icmp_seq=1 ttl=64 time=2.08 ms (DUP!)
64 bytes from 10.100.100.74: icmp_seq=1 ttl=64 time=2.22 ms (DUP!)
64 bytes from 10.100.100.74: icmp_seq=1 ttl=64 time=2.27 ms (DUP!)
64 bytes from 10.100.100.74: icmp_seq=1 ttl=64 time=2.31 ms (DUP!)
64 bytes from 10.100.100.74: icmp_seq=1 ttl=64 time=2.44 ms (DUP!)
64 bytes from 10.100.100.74: icmp_seq=1 ttl=64 time=2.50 ms (DUP!)
64 bytes from 10.100.100.74: icmp_seq=1 ttl=64 time=2.51 ms (DUP!)
64 bytes from 10.100.100.74: icmp_seq=1 ttl=64 time=2.57 ms (DUP!)
64 bytes from 10.100.100.74: icmp_seq=1 ttl=64 time=2.63 ms (DUP!)
64 bytes from 10.100.100.74: icmp_seq=1 ttl=64 time=2.76 ms (DUP!)
64 bytes from 10.100.100.74: icmp_seq=1 ttl=64 time=2.81 ms (DUP!)
64 bytes from 10.100.100.74: icmp_seq=1 ttl=64 time=2.84 ms (DUP!)
64 bytes from 10.100.100.74: icmp_seq=1 ttl=64 time=2.90 ms (DUP!)
64 bytes from 10.100.100.74: icmp_seq=1 ttl=64 time=2.94 ms (DUP!)
64 bytes from 10.100.100.74: icmp_seq=1 ttl=64 time=3.02 ms (DUP!)
64 bytes from 10.100.100.74: icmp_seq=1 ttl=64 time=3.06 ms (DUP!)
64 bytes from 10.100.100.74: icmp_seq=1 ttl=64 time=3.09 ms (DUP!)
64 bytes from 10.100.100.74: icmp_seq=1 ttl=64 time=3.15 ms (DUP!)
64 bytes from 10.100.100.74: icmp_seq=1 ttl=64 time=3.17 ms (DUP!)
64 bytes from 10.100.100.74: icmp_seq=1 ttl=64 time=3.24 ms (DUP!)
64 bytes from 10.100.100.74: icmp_seq=1 ttl=64 time=3.28 ms (DUP!)
64 bytes from 10.100.100.74: icmp_seq=1 ttl=64 time=3.35 ms (DUP!)
64 bytes from 10.100.100.74: icmp_seq=1 ttl=64 time=3.37 ms (DUP!)
64 bytes from 10.100.100.74: icmp_seq=1 ttl=64 time=3.44 ms (DUP!)
64 bytes from 10.100.100.74: icmp_seq=1 ttl=64 time=3.47 ms (DUP!)
64 bytes from 10.100.100.74: icmp_seq=1 ttl=64 time=3.54 ms (DUP!)
64 bytes from 10.100.100.74: icmp_seq=1 ttl=64 time=3.57 ms (DUP!)
64 bytes from 10.100.100.74: icmp_seq=1 ttl=64 time=3.64 ms (DUP!)
64 bytes from 10.100.100.74: icmp_seq=1 ttl=64 time=3.67 ms (DUP!)
64 bytes from 10.100.100.74: icmp_seq=1 ttl=64 time=3.75 ms (DUP!)
64 bytes from 10.100.100.74: icmp_seq=1 ttl=64 time=3.77 ms (DUP!)
64 bytes from 10.100.100.74: icmp_seq=1 ttl=64 time=3.84 ms (DUP!)
64 bytes from 10.100.100.74: icmp_seq=1 ttl=64 time=3.87 ms (DUP!)
64 bytes from 10.100.100.74: icmp_seq=1 ttl=64 time=3.94 ms (DUP!)
64 bytes from 10.100.100.74: icmp_seq=1 ttl=64 time=3.98 ms (DUP!)
64 bytes from 10.100.100.74: icmp_seq=1 ttl=64 time=4.04 ms (DUP!)
64 bytes from 10.100.100.74: icmp_seq=1 ttl=64 time=4.07 ms (DUP!)
64 bytes from 10.100.100.74: icmp_seq=1 ttl=64 time=4.15 ms (DUP!)
64 bytes from 10.100.100.74: icmp_seq=1 ttl=64 time=4.17 ms (DUP!)
From 10.91.73.253 icmp_seq=1 Time to live exceeded
64 bytes from 10.100.100.74: icmp_seq=1 ttl=64 time=4.26 ms (DUP!)
^C
--- 10.100.100.74 ping statistics ---
1 packets transmitted, 1 received, +63 duplicates, +1 errors, 0%
packet loss, time 0ms
rtt min/avg/max/mdev = 0.263/2.481/4.266/1.147 ms

 This ping is external box to guest,
 correct?

Either direction, external box->guest or guest->external box

 Is it the external box that gets duplicates or the guest?

Re: [PATCHv3 0/4] qemu-kvm: vhost net support

2009-08-18 Thread Alex Williamson
On Mon, Aug 17, 2009 at 6:37 AM, Michael S. Tsirkin <m...@redhat.com> wrote:
 This adds support for vhost-net virtio kernel backend.

 This is RFC, but works without issues for me.

I got this to build by syncing up some headers in kvm/include/linux,
but it doesn't seem to be working quite right.  I have an unused
e1000e nic in my system (eth10,00:17:a4:77:a4:08) that I ifconfig up,
then launch a VM with the option:

-net nic,model=virtio,macaddr=00:17:a4:77:a4:08,vhost=eth10

The virtio nic is functional in the guest, but it gets packet dupes on
ping, which is likely contributing to the poor performance, roughly
1/3rd of virtio-net userspace for tcp_stream and tcp_rr.  I think I
have all the offload options disabled on the nic in the host.  Any idea
what I might be doing wrong?  This seems quite a bit off of the udp_rr
results you posted.  Thanks,

Alex


Re: [PATCHv3 0/4] qemu-kvm: vhost net support

2009-08-18 Thread Alex Williamson
On Tue, Aug 18, 2009 at 3:04 PM, Michael S. Tsirkin <m...@redhat.com> wrote:
 Did you assign ip address in host by any chance? You don't want that.

Nope, just up on the host, no IP:

eth10 Link encap:Ethernet  HWaddr 00:17:a4:77:a4:08
  inet6 addr: fe80::217:a4ff:fe77:a408/64 Scope:Link
  UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
  RX packets:22446487 errors:0 dropped:0 overruns:0 frame:0
  TX packets:4529008 errors:0 dropped:0 overruns:0 carrier:0
  collisions:0 txqueuelen:1000
  RX bytes:1492187453 (1.3 GiB)  TX bytes:2972806236 (2.7 GiB)
  Memory:fbae-fbb0


Re: [PATCHv3 0/4] qemu-kvm: vhost net support

2009-08-18 Thread Alex Williamson
On Tue, Aug 18, 2009 at 3:11 PM, Alex Williamson <alex.william...@hp.com> wrote:
 On Tue, Aug 18, 2009 at 3:04 PM, Michael S. Tsirkin <m...@redhat.com> wrote:
 Did you assign ip address in host by any chance? You don't want that.

 Nope, just up on the host, no IP:

 eth10     Link encap:Ethernet  HWaddr 00:17:a4:77:a4:08
          inet6 addr: fe80::217:a4ff:fe77:a408/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:22446487 errors:0 dropped:0 overruns:0 frame:0
          TX packets:4529008 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:1492187453 (1.3 GiB)  TX bytes:2972806236 (2.7 GiB)
          Memory:fbae-fbb0


Hmm, I lose the dupes and get pretty respectable tcp_rr rates if I use
the host as my target IP.  Is the issue maybe isolated to off-box
communication?  tcp_stream is still worse than userspace though.

Alex