RE: [RFC 03/20] vfio: Add vfio_[un]register_device()

2021-09-29 Thread Cornelia Huck
On Wed, Sep 29 2021, "Tian, Kevin"  wrote:

>> From: David Gibson 
>> Sent: Wednesday, September 29, 2021 10:44 AM
>> 
>> > One alternative option is to arrange device nodes in sub-directories based
>> > on the device type. But doing so also adds one trouble to userspace. The
>> > current vfio uAPI is designed to have the user query device type via
>> > VFIO_DEVICE_GET_INFO after opening the device. With this option the user
>> > instead needs to figure out the device type before opening the device, to
>> > identify the sub-directory.
>> 
>> Wouldn't this be up to the operator / configuration, rather than the
>> actual software though?  I would assume that typically the VFIO
>> program would be pointed at a specific vfio device node file to use,
>> e.g.
>>  my-vfio-prog -d /dev/vfio/pci/:0a:03.1
>> 
>> Or more generally, if you're expecting userspace to know a name in a
>> uniqu pattern, they can equally well know a "type/name" pair.
>> 
>
> You are correct. Currently:
>
> -device, vfio-pci,host=:BB:DD.F
> -device, vfio-pci,sysfdev=/sys/bus/pci/devices/ :BB:DD.F
> -device, vfio-platform,sysdev=/sys/bus/platform/devices/PNP0103:00
>
> above is definitely type/name information to find the related node. 
>
> Actually even for Jason's proposal we still need such information to
> identify the sysfs path.
>
> Then I feel type-based sub-directory does work. Adding another link
> to sysfs sounds unnecessary now. But I'm not sure whether we still
> want to create /dev/vfio/devices/vfio0 thing and related udev rule
> thing that you pointed out in another mail.

Still reading through this whole thread, but type-based subdirectories
also make the most sense to me. I don't really see userspace wanting to
grab just any device and then figure out whether it is the device it was
looking for, but rather immediately go to a specific device or at least
a device of a specific type.

Sequentially-numbered devices tend to become really unwieldy in my
experience if you are working on a system with loads of devices.

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 1/2] mm: move force_dma_unencrypted() to mem_encrypt.h

2020-02-25 Thread Cornelia Huck
On Mon, 24 Feb 2020 19:49:53 +0100
Halil Pasic  wrote:

> On Mon, 24 Feb 2020 14:33:14 +1100
> David Gibson  wrote:
> 
> > On Fri, Feb 21, 2020 at 07:07:02PM +0100, Halil Pasic wrote:  
> > > On Fri, 21 Feb 2020 10:48:15 -0500
> > > "Michael S. Tsirkin"  wrote:
> > >   
> > > > On Fri, Feb 21, 2020 at 02:06:39PM +0100, Halil Pasic wrote:  
> > > > > On Fri, 21 Feb 2020 14:27:27 +1100
> > > > > David Gibson  wrote:
> > > > >   
> > > > > > On Thu, Feb 20, 2020 at 05:31:35PM +0100, Christoph Hellwig wrote:  
> > > > > > > On Thu, Feb 20, 2020 at 05:23:20PM +0100, Christian Borntraeger 
> > > > > > > wrote:  
> > > > > > > > >From a users perspective it makes absolutely perfect sense to 
> > > > > > > > >use the  
> > > > > > > > bounce buffers when they are NEEDED. 
> > > > > > > > Forcing the user to specify iommu_platform just because you 
> > > > > > > > need bounce buffers
> > > > > > > > really feels wrong. And obviously we have a severe performance 
> > > > > > > > issue
> > > > > > > > because of the indirections.  
> > > > > > > 
> > > > > > > The point is that the user should not have to specify 
> > > > > > > iommu_platform.
> > > > > > > We need to make sure any new hypervisor (especially one that 
> > > > > > > might require
> > > > > > > bounce buffering) always sets it,  
> > > > > > 
> > > > > > So, I have draft qemu patches which enable iommu_platform by 
> > > > > > default.
> > > > > > But that's really because of other problems with !iommu_platform, 
> > > > > > not
> > > > > > anything to do with bounce buffering or secure VMs.
> > > > > > 
> > > > > > The thing is that the hypervisor *doesn't* require bounce buffering.
> > > > > > In the POWER (and maybe s390 as well) models for Secure VMs, it's 
> > > > > > the
> > > > > > *guest*'s choice to enter secure mode, so the hypervisor has no 
> > > > > > reason
> > > > > > to know whether the guest needs bounce buffering.  As far as the
> > > > > > hypervisor and qemu are concerned that's a guest internal detail, it
> > > > > > just expects to get addresses it can access whether those are GPAs
> > > > > > (iommu_platform=off) or IOVAs (iommu_platform=on).  
> > > > > 
> > > > > I very much agree!
> > > > >   
> > > > > >   
> > > > > > > as was a rather bogus legacy hack  
> > > > > > 
> > > > > > It was certainly a bad idea, but it was a bad idea that went into a
> > > > > > public spec and has been widely deployed for many years.  We can't
> > > > > > just pretend it didn't happen and move on.
> > > > > > 
> > > > > > Turning iommu_platform=on by default breaks old guests, some of 
> > > > > > which
> > > > > > we still care about.  We can't (automatically) do it only for guests
> > > > > > that need bounce buffering, because the hypervisor doesn't know that
> > > > > > ahead of time.  

We could default to iommu_platform=on on s390 when the host has active
support for protected virtualization... but that's just another kind of
horrible, so let's just pretend I didn't suggest it.

> > > > > 
> > > > > Turning iommu_platform=on for virtio-ccw makes no sense whatsover,
> > > > > because for CCW I/O there is no such thing as IOMMU and the addresses
> > > > > are always physical addresses.  
> > > > 
> > > > Fix the name then. The spec calls is ACCESS_PLATFORM now, which
> > > > makes much more sense.  
> > > 
> > > I don't quite get it. Sorry. Maybe I will revisit this later.  
> > 
> > Halil, I think I can clarify this.
> > 
> > The "iommu_platform" flag doesn't necessarily have anything to do with
> > an iommu, although it often will.  Basically it means "access guest
> > memory via the bus's normal DMA mechanism" rather than "access guest
> > memory using GPA, because you're the hypervisor and you can do that".
> >   
> 
> Unfortunately, I don't think this is what is conveyed to the end users.
> Let's see what do we have documented:
> 
> Neither Qemu user documentation
> (https://www.qemu.org/docs/master/qemu-doc.html) nor online help:
> qemu-system-s390x -device virtio-net-ccw,?|grep iommu
>   iommu_platform=  - on/off (default: false)
> has any documentation on it.

Now, that's 'helpful' -- this certainly calls out for a bit of doc...

> 
> But libvirt does have have documenttion on the knob that contros
> iommu_platform for QEMU (when  QEMU is managed by libvirt):
> """
> Virtio-related options 
> 
> QEMU's virtio devices have some attributes related to the virtio
> transport under the driver element: The iommu attribute enables the use
> of emulated IOMMU by the device. The attribute ats controls the Address
> Translation Service support for PCIe devices. This is needed to make use
> of IOTLB support (see IOMMU device). Possible values are on or off.
> Since 3.5.0 
> """
> (https://libvirt.org/formatdomain.html#elementsVirtio)
> 
> Thus it seems the only available documentation says that it "enables the use
> of emulated IOMMU by the device".
> 
> And for vhost-user we have
> """
> When the ``VIRTIO_F_IOMMU_PLATFORM`` feature has not 

Re: [PATCH v4 3/4] vfio: zpci: defining the VFIO headers

2019-10-08 Thread Cornelia Huck
On Fri, 20 Sep 2019 11:14:28 -0400
Matthew Rosato  wrote:

> On 9/20/19 10:02 AM, Cornelia Huck wrote:
> > On Thu, 19 Sep 2019 16:55:57 -0400
> > Matthew Rosato  wrote:
> >   
> >> On 9/19/19 11:20 AM, Cornelia Huck wrote:  
> >>> On Fri,  6 Sep 2019 20:13:50 -0400
> >>> Matthew Rosato  wrote:
> >>> 
> >>>> From: Pierre Morel 
> >>>>
> >>>> We define a new device region in vfio.h to be able to
> >>>> get the ZPCI CLP information by reading this region from
> >>>> userland.
> >>>>
> >>>> We create a new file, vfio_zdev.h to define the structure
> >>>> of the new region we defined in vfio.h
> >>>>
> >>>> Signed-off-by: Pierre Morel 
> >>>> Signed-off-by: Matthew Rosato 
> >>>> ---
> >>>>  include/uapi/linux/vfio.h  |  1 +
> >>>>  include/uapi/linux/vfio_zdev.h | 35 +++
> >>>>  2 files changed, 36 insertions(+)
> >>>>  create mode 100644 include/uapi/linux/vfio_zdev.h  
> >   
> >>>> diff --git a/include/uapi/linux/vfio_zdev.h 
> >>>> b/include/uapi/linux/vfio_zdev.h
> >>>> new file mode 100644
> >>>> index 000..55e0d6d
> >>>> --- /dev/null
> >>>> +++ b/include/uapi/linux/vfio_zdev.h
> >>>> @@ -0,0 +1,35 @@
> >>>> +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
> >>>> +/*
> >>>> + * Region definition for ZPCI devices
> >>>> + *
> >>>> + * Copyright IBM Corp. 2019
> >>>> + *
> >>>> + * Author(s): Pierre Morel 
> >>>> + */
> >>>> +
> >>>> +#ifndef _VFIO_ZDEV_H_
> >>>> +#define _VFIO_ZDEV_H_
> >>>> +
> >>>> +#include 
> >>>> +
> >>>> +/**
> >>>> + * struct vfio_region_zpci_info - ZPCI information.
> >>>
> >>> Hm... probably should also get some more explanation. E.g. is that
> >>> derived from a hardware structure?
> >>> 
> >>
> >> The structure itself is not mapped 1:1 to a hardware structure, but it
> >> does serve as a collection of information that was derived from other
> >> hardware structures.
> >>
> >> "Used for passing hardware feature information about a zpci device
> >> between the host and guest" ?  
> > 
> > "zPCI specific hardware feature information for a device"?
> > 
> > Are we reasonably sure that this is complete for now? I'm not sure if
> > expanding this structure would work; adding another should always be
> > possible, though (if a bit annoying).
> >   
> 
> I think trying to make the structure expandable would be best...  If we
> allow arbitrary-sized reads of the info, and only add new fields onto
> the end it should be OK, no? (older qemu doesn't get the info it doesn't
> ask for / understand)  But I guess that's not compatible with having
> util_str[] size being defined dynamically.  Another caveat would be if
> CLP_UTIL_STR_LEN were to grow in size in the future, and assuming
> util_str[] was no longer at the end of the structure, I guess the
> additional data would have to end up in a
> util_str2[CLP_UTIL_STR_LEN_NEW-CLP_UTIL_STR_LEN_OLD]...  To explain what
> I mean, something like:
> 
> struct vfio_region_zpci_info {
>   <..>
>   __u8 util_str[CLP_UTIL_STR_LEN_OLD];
>   /* END OF V1 */
>   __u8 foo;
>   /* END OF V2 */
>   __u8 util_str2[CLP_UTIL_STR_LEN_NEW-CLP_UTIL_STR_LEN_OLD];
>   /* END OF V3 */
> } __packed;

[Sorry about the late response -- was on PTO]

That sounds a bit too complicated to me, and I'd prefer the "add
another region if we missed something" approach. If we put anything
looking potentially useful in here now, that "add another region" event
is hopefully far in the future.

> 
> 
> >>  
> >>>> + *
> >>>> + */
> >>>> +struct vfio_region_zpci_info {
> >>>> +__u64 dasm;
> >>>> +__u64 start_dma;
> >>>> +__u64 end_dma;
> >>>> +__u64 msi_addr;
> >>>> +__u64 flags;
> >>>> +__u16 pchid;
> >>>> +__u16 mui;
> >>>> +__u16 noi;
> >>>> +__u16 maxstbl;
> >>>> +__u8 version;
> >>>> +__u8 gid;
> >>>> +#define VFIO_PCI_ZDEV_FLAGS_REFRESH 1
> >>>> +__u8 util_str[];
> >>>> +} __packed;
> >>>> +
> >>>> +#endif
> >>>
> >>> 
> >>  
> >   
> 

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v4 4/4] vfio: pci: Using a device region to retrieve zPCI information

2019-09-20 Thread Cornelia Huck
On Thu, 19 Sep 2019 16:57:10 -0400
Matthew Rosato  wrote:

> On 9/19/19 11:25 AM, Cornelia Huck wrote:
> > On Fri,  6 Sep 2019 20:13:51 -0400
> > Matthew Rosato  wrote:
> >   
> >> From: Pierre Morel 
> >>
> >> We define a new configuration entry for VFIO/PCI, VFIO_PCI_ZDEV
> >>
> >> When the VFIO_PCI_ZDEV feature is configured we initialize
> >> a new device region, VFIO_REGION_SUBTYPE_ZDEV_CLP, to hold
> >> the information from the ZPCI device the use
> >>
> >> Signed-off-by: Pierre Morel 
> >> Signed-off-by: Matthew Rosato 
> >> ---
> >>  drivers/vfio/pci/Kconfig|  7 +++
> >>  drivers/vfio/pci/Makefile   |  1 +
> >>  drivers/vfio/pci/vfio_pci.c |  9 
> >>  drivers/vfio/pci/vfio_pci_private.h | 10 +
> >>  drivers/vfio/pci/vfio_pci_zdev.c| 85 
> >> +
> >>  5 files changed, 112 insertions(+)
> >>  create mode 100644 drivers/vfio/pci/vfio_pci_zdev.c
> >>
> >> diff --git a/drivers/vfio/pci/Kconfig b/drivers/vfio/pci/Kconfig
> >> index ac3c1dd..d4562a8 100644
> >> --- a/drivers/vfio/pci/Kconfig
> >> +++ b/drivers/vfio/pci/Kconfig
> >> @@ -45,3 +45,10 @@ config VFIO_PCI_NVLINK2
> >>depends on VFIO_PCI && PPC_POWERNV
> >>help
> >>  VFIO PCI support for P9 Witherspoon machine with NVIDIA V100 GPUs
> >> +
> >> +config VFIO_PCI_ZDEV
> >> +  bool "VFIO PCI Generic for ZPCI devices"
> >> +  depends on VFIO_PCI && S390
> >> +  default y
> >> +  help
> >> +VFIO PCI support for S390 Z-PCI devices  
> >   
> >>From that description, I'd have no idea whether I'd want that or not.  
> > Is there any downside to enabling it?
> >   
> 
> :) Not really, you're just getting information from the hardware vs
> using hard-coded defaults.  The only reason I could think of to turn it
> off would be if you wanted/needed to restore this hard-coded behavior.

I'm not really sure whether that's worth adding a Kconfig switch for.
Won't older versions simply ignore the new region anyway?

Also, I don't think we have any migration compatibility issues, as
vfio-pci devices are not (yet) migrateable anyway.

> 
> bool "VFIO PCI support for generic ZPCI devices" ?

"Support zPCI-specific configuration for VFIO PCI" ?

> 
> "Support for sharing ZPCI hardware device information between the host
> and guests." ?

"Enabling this options exposes a region containing hardware
configuration for zPCI devices. This enables userspace (e.g. QEMU) to
supply proper configuration values instead of hard-coded defaults for
zPCI devices passed through via VFIO on s390.

Say Y here."

?
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v4 3/4] vfio: zpci: defining the VFIO headers

2019-09-20 Thread Cornelia Huck
On Thu, 19 Sep 2019 16:55:57 -0400
Matthew Rosato  wrote:

> On 9/19/19 11:20 AM, Cornelia Huck wrote:
> > On Fri,  6 Sep 2019 20:13:50 -0400
> > Matthew Rosato  wrote:
> >   
> >> From: Pierre Morel 
> >>
> >> We define a new device region in vfio.h to be able to
> >> get the ZPCI CLP information by reading this region from
> >> userland.
> >>
> >> We create a new file, vfio_zdev.h to define the structure
> >> of the new region we defined in vfio.h
> >>
> >> Signed-off-by: Pierre Morel 
> >> Signed-off-by: Matthew Rosato 
> >> ---
> >>  include/uapi/linux/vfio.h  |  1 +
> >>  include/uapi/linux/vfio_zdev.h | 35 +++
> >>  2 files changed, 36 insertions(+)
> >>  create mode 100644 include/uapi/linux/vfio_zdev.h

> >> diff --git a/include/uapi/linux/vfio_zdev.h 
> >> b/include/uapi/linux/vfio_zdev.h
> >> new file mode 100644
> >> index 000..55e0d6d
> >> --- /dev/null
> >> +++ b/include/uapi/linux/vfio_zdev.h
> >> @@ -0,0 +1,35 @@
> >> +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
> >> +/*
> >> + * Region definition for ZPCI devices
> >> + *
> >> + * Copyright IBM Corp. 2019
> >> + *
> >> + * Author(s): Pierre Morel 
> >> + */
> >> +
> >> +#ifndef _VFIO_ZDEV_H_
> >> +#define _VFIO_ZDEV_H_
> >> +
> >> +#include 
> >> +
> >> +/**
> >> + * struct vfio_region_zpci_info - ZPCI information.  
> > 
> > Hm... probably should also get some more explanation. E.g. is that
> > derived from a hardware structure?
> >   
> 
> The structure itself is not mapped 1:1 to a hardware structure, but it
> does serve as a collection of information that was derived from other
> hardware structures.
> 
> "Used for passing hardware feature information about a zpci device
> between the host and guest" ?

"zPCI specific hardware feature information for a device"?

Are we reasonably sure that this is complete for now? I'm not sure if
expanding this structure would work; adding another should always be
possible, though (if a bit annoying).

> 
> >> + *
> >> + */
> >> +struct vfio_region_zpci_info {
> >> +  __u64 dasm;
> >> +  __u64 start_dma;
> >> +  __u64 end_dma;
> >> +  __u64 msi_addr;
> >> +  __u64 flags;
> >> +  __u16 pchid;
> >> +  __u16 mui;
> >> +  __u16 noi;
> >> +  __u16 maxstbl;
> >> +  __u8 version;
> >> +  __u8 gid;
> >> +#define VFIO_PCI_ZDEV_FLAGS_REFRESH 1
> >> +  __u8 util_str[];
> >> +} __packed;
> >> +
> >> +#endif  
> > 
> >   
> 



Re: [PATCH v4 4/4] vfio: pci: Using a device region to retrieve zPCI information

2019-09-19 Thread Cornelia Huck
On Fri,  6 Sep 2019 20:13:51 -0400
Matthew Rosato  wrote:

> From: Pierre Morel 
> 
> We define a new configuration entry for VFIO/PCI, VFIO_PCI_ZDEV
> 
> When the VFIO_PCI_ZDEV feature is configured we initialize
> a new device region, VFIO_REGION_SUBTYPE_ZDEV_CLP, to hold
> the information from the ZPCI device the use
> 
> Signed-off-by: Pierre Morel 
> Signed-off-by: Matthew Rosato 
> ---
>  drivers/vfio/pci/Kconfig|  7 +++
>  drivers/vfio/pci/Makefile   |  1 +
>  drivers/vfio/pci/vfio_pci.c |  9 
>  drivers/vfio/pci/vfio_pci_private.h | 10 +
>  drivers/vfio/pci/vfio_pci_zdev.c| 85 
> +
>  5 files changed, 112 insertions(+)
>  create mode 100644 drivers/vfio/pci/vfio_pci_zdev.c
> 
> diff --git a/drivers/vfio/pci/Kconfig b/drivers/vfio/pci/Kconfig
> index ac3c1dd..d4562a8 100644
> --- a/drivers/vfio/pci/Kconfig
> +++ b/drivers/vfio/pci/Kconfig
> @@ -45,3 +45,10 @@ config VFIO_PCI_NVLINK2
>   depends on VFIO_PCI && PPC_POWERNV
>   help
> VFIO PCI support for P9 Witherspoon machine with NVIDIA V100 GPUs
> +
> +config VFIO_PCI_ZDEV
> + bool "VFIO PCI Generic for ZPCI devices"
> + depends on VFIO_PCI && S390
> + default y
> + help
> +   VFIO PCI support for S390 Z-PCI devices

>From that description, I'd have no idea whether I'd want that or not.
Is there any downside to enabling it?

> diff --git a/drivers/vfio/pci/Makefile b/drivers/vfio/pci/Makefile
> index f027f8a..781e080 100644
> --- a/drivers/vfio/pci/Makefile
> +++ b/drivers/vfio/pci/Makefile
> @@ -3,5 +3,6 @@
>  vfio-pci-y := vfio_pci.o vfio_pci_intrs.o vfio_pci_rdwr.o vfio_pci_config.o
>  vfio-pci-$(CONFIG_VFIO_PCI_IGD) += vfio_pci_igd.o
>  vfio-pci-$(CONFIG_VFIO_PCI_NVLINK2) += vfio_pci_nvlink2.o
> +vfio-pci-$(CONFIG_VFIO_PCI_ZDEV) += vfio_pci_zdev.o
>  
>  obj-$(CONFIG_VFIO_PCI) += vfio-pci.o
> diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
> index 703948c..b40544a 100644
> --- a/drivers/vfio/pci/vfio_pci.c
> +++ b/drivers/vfio/pci/vfio_pci.c
> @@ -356,6 +356,15 @@ static int vfio_pci_enable(struct vfio_pci_device *vdev)
>   }
>   }
>  
> + if (IS_ENABLED(CONFIG_VFIO_PCI_ZDEV)) {
> + ret = vfio_pci_zdev_init(vdev);
> + if (ret) {
> + dev_warn(>pdev->dev,
> +  "Failed to setup ZDEV regions\n");
> + goto disable_exit;
> + }
> + }
> +
>   vfio_pci_probe_mmaps(vdev);
>  
>   return 0;
> diff --git a/drivers/vfio/pci/vfio_pci_private.h 
> b/drivers/vfio/pci/vfio_pci_private.h
> index ee6ee91..08e02f5 100644
> --- a/drivers/vfio/pci/vfio_pci_private.h
> +++ b/drivers/vfio/pci/vfio_pci_private.h
> @@ -186,4 +186,14 @@ static inline int vfio_pci_ibm_npu2_init(struct 
> vfio_pci_device *vdev)
>   return -ENODEV;
>  }
>  #endif
> +
> +#ifdef CONFIG_VFIO_PCI_ZDEV
> +extern int vfio_pci_zdev_init(struct vfio_pci_device *vdev);
> +#else
> +static inline int vfio_pci_zdev_init(struct vfio_pci_device *vdev)
> +{
> + return -ENODEV;

If you really want to have this configurable, why not just return 0
here and skip the IS_ENABLED check above?

> +}
> +#endif
> +
>  #endif /* VFIO_PCI_PRIVATE_H */

(...)


Re: [PATCH v4 3/4] vfio: zpci: defining the VFIO headers

2019-09-19 Thread Cornelia Huck
On Fri,  6 Sep 2019 20:13:50 -0400
Matthew Rosato  wrote:

> From: Pierre Morel 
> 
> We define a new device region in vfio.h to be able to
> get the ZPCI CLP information by reading this region from
> userland.
> 
> We create a new file, vfio_zdev.h to define the structure
> of the new region we defined in vfio.h
> 
> Signed-off-by: Pierre Morel 
> Signed-off-by: Matthew Rosato 
> ---
>  include/uapi/linux/vfio.h  |  1 +
>  include/uapi/linux/vfio_zdev.h | 35 +++
>  2 files changed, 36 insertions(+)
>  create mode 100644 include/uapi/linux/vfio_zdev.h
> 
> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> index 8f10748..8328c87 100644
> --- a/include/uapi/linux/vfio.h
> +++ b/include/uapi/linux/vfio.h
> @@ -371,6 +371,7 @@ struct vfio_region_gfx_edid {
>   * to do TLB invalidation on a GPU.
>   */
>  #define VFIO_REGION_SUBTYPE_IBM_NVLINK2_ATSD (1)
> +#define VFIO_REGION_SUBTYPE_ZDEV_CLP (2)

Using a subtype is fine, but maybe add a comment what this is for?

>  
>  /*
>   * The MSIX mappable capability informs that MSIX data of a BAR can be 
> mmapped
> diff --git a/include/uapi/linux/vfio_zdev.h b/include/uapi/linux/vfio_zdev.h
> new file mode 100644
> index 000..55e0d6d
> --- /dev/null
> +++ b/include/uapi/linux/vfio_zdev.h
> @@ -0,0 +1,35 @@
> +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
> +/*
> + * Region definition for ZPCI devices
> + *
> + * Copyright IBM Corp. 2019
> + *
> + * Author(s): Pierre Morel 
> + */
> +
> +#ifndef _VFIO_ZDEV_H_
> +#define _VFIO_ZDEV_H_
> +
> +#include 
> +
> +/**
> + * struct vfio_region_zpci_info - ZPCI information.

Hm... probably should also get some more explanation. E.g. is that
derived from a hardware structure?

> + *
> + */
> +struct vfio_region_zpci_info {
> + __u64 dasm;
> + __u64 start_dma;
> + __u64 end_dma;
> + __u64 msi_addr;
> + __u64 flags;
> + __u16 pchid;
> + __u16 mui;
> + __u16 noi;
> + __u16 maxstbl;
> + __u8 version;
> + __u8 gid;
> +#define VFIO_PCI_ZDEV_FLAGS_REFRESH 1
> + __u8 util_str[];
> +} __packed;
> +
> +#endif



Re: [PATCH v3 2/3] vfio: zpci: defining the VFIO headers

2019-05-23 Thread Cornelia Huck
On Thu, 23 May 2019 14:25:25 +0200
Pierre Morel  wrote:

> We define a new device region in vfio.h to be able to
> get the ZPCI CLP information by reading this region from
> userland.
> 
> We create a new file, vfio_zdev.h to define the structure
> of the new region we defined in vfio.h
> 
> Signed-off-by: Pierre Morel 
> ---
>  include/uapi/linux/vfio.h  |  4 
>  include/uapi/linux/vfio_zdev.h | 34 ++
>  2 files changed, 38 insertions(+)
>  create mode 100644 include/uapi/linux/vfio_zdev.h
> 
> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> index 8f10748..56595b8 100644
> --- a/include/uapi/linux/vfio.h
> +++ b/include/uapi/linux/vfio.h
> @@ -306,6 +306,10 @@ struct vfio_region_info_cap_type {
>  #define VFIO_REGION_TYPE_GFX(1)
>  #define VFIO_REGION_SUBTYPE_GFX_EDID(1)
>  
> +/* IBM Subtypes */
> +#define VFIO_REGION_TYPE_IBM_ZDEV(1)
> +#define VFIO_REGION_SUBTYPE_ZDEV_CLP (1)

I'm afraid that confuses me a bit. You want to add the region to every
vfio-pci device when we're running under s390, right? So this does not
depend on the device type of the actual device (which may or may not be
from IBM), but only on the architecture?

(Generally speaking, I think using regions for this makes sense,
though.)

> +
>  /**
>   * struct vfio_region_gfx_edid - EDID region layout.
>   *
> diff --git a/include/uapi/linux/vfio_zdev.h b/include/uapi/linux/vfio_zdev.h
> new file mode 100644
> index 000..84b1a82
> --- /dev/null
> +++ b/include/uapi/linux/vfio_zdev.h
> @@ -0,0 +1,34 @@
> +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
> +/*
> + * Region definition for ZPCI devices
> + *
> + * Copyright IBM Corp. 2019
> + *
> + * Author(s): Pierre Morel 
> + */
> +
> +#ifndef _VFIO_ZDEV_H_
> +#define _VFIO_ZDEV_H_
> +
> +#include 
> +
> +/**
> + * struct vfio_region_zpci_info - ZPCI information.
> + *
> + */
> +struct vfio_region_zpci_info {
> + __u64 dasm;
> + __u64 start_dma;
> + __u64 end_dma;
> + __u64 msi_addr;
> + __u64 flags;
> + __u16 pchid;
> + __u16 mui;
> + __u16 noi;
> + __u8 gid;
> + __u8 version;
> +#define VFIO_PCI_ZDEV_FLAGS_REFRESH 1
> + __u8 util_str[CLP_UTIL_STR_LEN];
> +} __packed;
> +
> +#endif

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v2 4/4] vfio: vfio_iommu_type1: implement VFIO_IOMMU_INFO_CAPABILITIES

2019-05-21 Thread Cornelia Huck
On Tue, 21 May 2019 11:14:38 +0200
Pierre Morel  wrote:

> 1) A short description, of zPCI functions and groups
> 
> IN Z, PCI cards, leave behind an adapter between subchannels and PCI.
> We access PCI cards through 2 ways:
> - dedicated PCI instructions (pci_load/pci_store/pci/store_block)
> - DMA

Quick question: What about the new pci instructions? Anything that
needs to be considered there?

> We receive events through
> - Adapter interrupts

Note for the non-s390 folks: These are (I/O) interrupts that are not
tied to a specific device. MSI-X is mapped to this.

> - CHSC events

Another note for the non-s390 folks: This is a notification mechanism
that is using machine check interrupts; more information is retrieved
via a special instruction (chsc).

> 
> The adapter propose an IOMMU to protect the DMA
> and the interrupt handling goes through a MSIX like interface handled by 
> the adapter.
> 
> The architecture specific PCI do the interface between the standard PCI 
> level and the zPCI function (PCI + DMA/IOMMU/Interrupt)
> 
> To handle the communication through the "zPCI way" the CLP interface 
> provides instructions to retrieve informations from the adapters.
> 
> There are different group of functions having same functionalities.
> 
> clp_list give us a list from zPCI functions
> clp_query_pci_function returns informations specific to a function
> clp_query_group returns information on a function group
> 
> 
> 2) Why do we need it in the guest
> 
> We need to provide the guest with information on the adapters and zPCI 
> functions returned by the clp_query instruction so that the guest's 
> driver gets the right information on how the way to the zPCI function 
> has been built in the host.
> 
> 
> When a guest issues the CLP instructions we intercept the clp command in 
> QEMU and we need to feed the response with the right values for the guest.
> The "right" values are not the raw CLP response values:
> 
> - some identifier must be virtualized, like UID and FID,
> 
> - some values must match what the host received from the CLP response, 
> like the size of the transmited blocks, the DMA Address Space Mask, 
> number of interrupt, MSIA
> 
> - some other must match what the host handled with the adapter and 
> function, the start and end of DMA,
> 
> - some what the host IOMMU driver supports (frame size),
> 
> 
> 
> 3) We have three different way to get This information:
> 
> The PCI Linux interface is a standard PCI interface and some Z specific 
> information is available in sysfs.
> Not all the information needed to be returned inside the CLP response is 
> available.
> So we can not use the sysfs interface to get all the information.
> 
> There is a CLP ioctl interface but this interface is not secure in that 
> it returns the information for all adapters in the system.
> 
> The VFIO interface offers the advantage to point to a single PCI 
> function, so more secure than the clp ioctl interface.
> Coupled with the s390_iommu we get access to the zPCI CLP instruction 
> and to the values handled by the zPCI driver.
> 
> 
> 4) Until now we used to fill the CLP response to the guest inside QEMU 
> with fixed values corresponding to the only PCI card we supported.
> To support new cards we need to get the right values from the kernel out.

IIRC, the current code fills in values that make sense for one specific
type of card only, right? We also use the same values for emulated
cards (virtio); I assume that they are not completely weird for that
case...
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v2 4/4] vfio: vfio_iommu_type1: implement VFIO_IOMMU_INFO_CAPABILITIES

2019-05-20 Thread Cornelia Huck
On Mon, 20 May 2019 13:19:23 +0200
Pierre Morel  wrote:

> On 17/05/2019 20:04, Pierre Morel wrote:
> > On 17/05/2019 18:41, Alex Williamson wrote:  
> >> On Fri, 17 May 2019 18:16:50 +0200
> >> Pierre Morel  wrote:
> >>  
> >>> We implement the capability interface for VFIO_IOMMU_GET_INFO.
> >>>
> >>> When calling the ioctl, the user must specify
> >>> VFIO_IOMMU_INFO_CAPABILITIES to retrieve the capabilities and
> >>> must check in the answer if capabilities are supported.
> >>>
> >>> The iommu get_attr callback will be used to retrieve the specific
> >>> attributes and fill the capabilities.
> >>>
> >>> Currently two Z-PCI specific capabilities will be queried and
> >>> filled by the underlying Z specific s390_iommu:
> >>> VFIO_IOMMU_INFO_CAP_QFN for the PCI query function attributes
> >>> and
> >>> VFIO_IOMMU_INFO_CAP_QGRP for the PCI query function group.
> >>>
> >>> Other architectures may add new capabilities in the same way
> >>> after enhancing the architecture specific IOMMU driver.
> >>>
> >>> Signed-off-by: Pierre Morel 
> >>> ---
> >>>   drivers/vfio/vfio_iommu_type1.c | 122 
> >>> +++-
> >>>   1 file changed, 121 insertions(+), 1 deletion(-)
> >>>
> >>> diff --git a/drivers/vfio/vfio_iommu_type1.c 
> >>> b/drivers/vfio/vfio_iommu_type1.c
> >>> index d0f731c..9435647 100644
> >>> --- a/drivers/vfio/vfio_iommu_type1.c
> >>> +++ b/drivers/vfio/vfio_iommu_type1.c
> >>> @@ -1658,6 +1658,97 @@ static int 
> >>> vfio_domains_have_iommu_cache(struct vfio_iommu *iommu)
> >>>   return ret;
> >>>   }
> >>> +static int vfio_iommu_type1_zpci_fn(struct iommu_domain *domain,
> >>> +    struct vfio_info_cap *caps, size_t size)
> >>> +{
> >>> +    struct vfio_iommu_type1_info_pcifn *info_fn;
> >>> +    int ret;
> >>> +
> >>> +    info_fn = kzalloc(size, GFP_KERNEL);
> >>> +    if (!info_fn)
> >>> +    return -ENOMEM;
> >>> +
> >>> +    ret = iommu_domain_get_attr(domain, DOMAIN_ATTR_ZPCI_FN,
> >>> +    _fn->response);  
> >>
> >> What ensures that the 'struct clp_rsp_query_pci' returned from this
> >> get_attr remains consistent with a 'struct vfio_iommu_pci_function'?
> >> Why does the latter contains so many reserved fields (beyond simply
> >> alignment) for a user API?  What fields of these structures are
> >> actually useful to userspace?  Should any fields not be exposed to the
> >> user?  Aren't BAR sizes redundant to what's available through the vfio
> >> PCI API?  I'm afraid that simply redefining an internal structure as
> >> the API leaves a lot to be desired too.  Thanks,
> >>
> >> Alex
> >>  
> > Hi Alex,
> > 
> > I simply used the structure returned by the firmware to be sure to be 
> > consistent with future evolutions and facilitate the copy from CLP and 
> > to userland.
> > 
> > If you prefer, and I understand that this is the case, I can define a 
> > specific VFIO_IOMMU structure with only the fields relevant to the user, 
> > leaving future enhancement of the user's interface being implemented in 
> > another kernel patch when the time has come.
> > 
> > In fact, the struct will have all defined fields I used but not the BAR 
> > size and address (at least for now because there are special cases we do 
> > not support yet with bars).
> > All the reserved fields can go away.
> > 
> > Is it more conform to your idea?
> > 
> > Also I have 2 interfaces:
> > 
> > s390_iommu.get_attr <-I1-> VFIO_IOMMU <-I2-> userland
> > 
> > Do you prefer:
> > - 2 different structures, no CLP raw structure
> > - the CLP raw structure for I1 and a VFIO specific structure for I2  



IIUC, get_attr extracts various data points via clp, and we then make
it available to userspace. The clp interface needs to be abstracted
away at some point... one question from me: Is there a chance that
someone else may want to make use of the userspace interface (extra
information about a function)? If yes, I'd expect the get_attr to
obtain some kind of portable information already (basically your third
option, below).

> 
> Hi Alex,
> 
> I am back again on this.
> This solution here above seems to me the best one but in this way I must 
> include S390 specific include inside the iommu_type1, which is AFAIU not 
> a good thing.
> It seems that the powerpc architecture use a solution with a dedicated 
> VFIO_IOMMU, the vfio_iommu_spar_tce.
> 
> Wouldn't it be a solution for s390 too, to use the vfio_iommu_type1 as a 
> basis to have a s390 dedicated solution.
> Then it becomes easier to have on one side the s390_iommu interface, 
> S390 specific, and on the other side a VFIO interface without a blind 
> copy of the firmware values.

If nobody else would want this exact interface, it might be a solution.
It would still be better not to encode clp data explicitly in the
userspace interface.

> 
> Do you think it is a viable solution?
> 
> Thanks,
> Pierre
> 
> 
> 
> > - the same VFIO structure for both I1 and I2

Re: [RFC PATCH 3/7] vfio: add spimdev support

2018-08-02 Thread Cornelia Huck
On Thu, 2 Aug 2018 15:34:40 +0800
Kenneth Lee  wrote:

> On Thu, Aug 02, 2018 at 04:24:22AM +, Tian, Kevin wrote:

> > > From: Kenneth Lee [mailto:liguo...@hisilicon.com]
> > > Sent: Thursday, August 2, 2018 11:47 AM
> > >   
> > > >  
> > > > > From: Kenneth Lee
> > > > > Sent: Wednesday, August 1, 2018 6:22 PM
> > > > >
> > > > > From: Kenneth Lee 
> > > > >
> > > > > SPIMDEV is "Share Parent IOMMU Mdev". It is a vfio-mdev. But differ  
> > > from  
> > > > > the general vfio-mdev:
> > > > >
> > > > > 1. It shares its parent's IOMMU.
> > > > > 2. There is no hardware resource attached to the mdev is created. The
> > > > > hardware resource (A `queue') is allocated only when the mdev is
> > > > > opened.  
> > > >
> > > > Alex has concern on doing so, as pointed out in:
> > > >
> > > > https://www.spinics.net/lists/kvm/msg172652.html
> > > >
> > > > resource allocation should be reserved at creation time.  
> > > 
> > > Yes. That is why I keep telling that SPIMDEV is not for "VM", it is for 
> > > "many
> > > processes", it is just an access point to the process. Not a device to 
> > > VM. I
> > > hope
> > > Alex can accept it:)
> > >   
> > 
> > VFIO is just about assigning device resource to user space. It doesn't care
> > whether it's native processes or VM using the device so far. Along the 
> > direction
> > which you described, looks VFIO needs to support the configuration that
> > some mdevs are used for native process only, while others can be used
> > for both native and VM. I'm not sure whether there is a clean way to
> > enforce it...  
> 
> I had the same idea at the beginning. But finally I found that the life cycle
> of the virtual device for VM and process were different. Consider you create
> some mdevs for VM use, you will give all those mdevs to lib-virt, which
> distribute those mdev to VMs or containers. If the VM or container exits, the
> mdev is returned to the lib-virt and used for next allocation. It is the
> administrator who controlled every mdev's allocation.
> 
> But for process, it is different. There is no lib-virt in control. The
> administrator's intension is to grant some type of application to access the
> hardware. The application can get a handle of the hardware, send request and 
> get
> the result. That's all. He/She dose not care which mdev is allocated to that
> application. If it crashes, it should be the kernel's responsibility to 
> withdraw
> the resource, the system administrator does not want to do it by hand.

I don't think that you should distinguish the cases by the presence of
a management application. How can the mdev driver know what the
intention behind using the device is?

Would it make more sense to use a different mechanism to enforce that
applications only use those handles they are supposed to use? Maybe
cgroups? I don't think it's a good idea to push usage policy into the
kernel.

[I have not read the documentation, so I may be totally off on the
intended use of this.]
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu