Re: [RFC 08/20] vfio/pci: Add VFIO_DEVICE_BIND_IOMMUFD

2021-09-29 Thread David Gibson
On Wed, Sep 29, 2021 at 06:41:00AM +, Tian, Kevin wrote:
> > From: David Gibson 
> > Sent: Wednesday, September 29, 2021 2:01 PM
> > 
> > On Sun, Sep 19, 2021 at 02:38:36PM +0800, Liu Yi L wrote:
> > > This patch adds VFIO_DEVICE_BIND_IOMMUFD for userspace to bind the vfio
> > > device to an iommufd. No VFIO_DEVICE_UNBIND_IOMMUFD interface is provided
> > > because it's implicitly done when the device fd is closed.
> > >
> > > In concept a vfio device can be bound to multiple iommufds, each hosting
> > > a subset of I/O address spaces attached by this device.
> > 
> > I really feel like this many<->many mapping between devices and
> > iommufds is going to be super-confusing, and therefore make it really
> > hard to be confident we have all the rules right for proper isolation.
> 
> Based on new discussion on the group ownership part (patch06), I feel
> this many<->many relationship will disappear. The context fd (either
> container or iommufd) will uniquely mark the ownership of a physical
> device and its group. With this design it's impractical to have one
> device bound to multiple iommufds. Actually I don't think this is a
> compelling use case in reality. The previous rationale was that there
> was no need to impose such a restriction without a special reason...
> and now we have a reason.
> 
> Jason, are you OK with this simplification?
> 
> > 
> > That's why I was suggesting a concept like endpoints, to break this
> > into two many<->one relationships.  I'm ok if that isn't visible in
> > the user API, but I think this is going to be really hard to keep
> > track of if it isn't explicit somewhere in the internals.
> > 
> 
> I think this endpoint concept is represented by ioas_device_info in
> patch14:
> 
> +/*
> + * An ioas_device_info object is created for each successful attach
> + * request. A list of objects is maintained per ioas when the address
> + * space is shared by multiple devices.
> + */
> +struct ioas_device_info {
> + struct iommufd_device *idev;
> + struct list_head next;
>  };
> 
> currently it's a 1:1 mapping between this object and iommufd_device,
> because there is no pasid support yet.

Ok, I haven't read that far in the series yet.

> We can rename it to struct ioas_endpoint if it makes you feel
> better.

Meh.  The concept is much more important than the name.

-- 
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson



RE: [RFC 08/20] vfio/pci: Add VFIO_DEVICE_BIND_IOMMUFD

2021-09-29 Thread Tian, Kevin
> From: Jason Gunthorpe 
> Sent: Wednesday, September 29, 2021 8:28 PM
> 
> On Wed, Sep 29, 2021 at 06:41:00AM +, Tian, Kevin wrote:
> > > From: David Gibson 
> > > Sent: Wednesday, September 29, 2021 2:01 PM
> > >
> > > On Sun, Sep 19, 2021 at 02:38:36PM +0800, Liu Yi L wrote:
> > > > This patch adds VFIO_DEVICE_BIND_IOMMUFD for userspace to bind the vfio
> > > > device to an iommufd. No VFIO_DEVICE_UNBIND_IOMMUFD interface is provided
> > > > because it's implicitly done when the device fd is closed.
> > > >
> > > > In concept a vfio device can be bound to multiple iommufds, each hosting
> > > > a subset of I/O address spaces attached by this device.
> > >
> > > I really feel like this many<->many mapping between devices and
> > > iommufds is going to be super-confusing, and therefore make it really
> > > hard to be confident we have all the rules right for proper isolation.
> >
> > Based on new discussion on the group ownership part (patch06), I feel
> > this many<->many relationship will disappear. The context fd (either
> > container or iommufd) will uniquely mark the ownership of a physical
> > device and its group. With this design it's impractical to have one
> > device bound to multiple iommufds.
> 
> That should be a requirement! We have no way to prove that two
> iommufds are the same security domain, so devices/groups cannot be
> shared.
> 
> That is why the API I suggested takes in a struct file to ID the user
> security context. A group is accessible only from that single struct
> file and no more.
> 
> If the first series goes the way I outlined then I think David's
> concern about security is strongly solved, as the IOMMU layer is
> directly managing it with very clear responsibility and semantics.
> 

Yes, this is also my understanding now.


Re: [RFC 08/20] vfio/pci: Add VFIO_DEVICE_BIND_IOMMUFD

2021-09-29 Thread Jason Gunthorpe via iommu
On Wed, Sep 29, 2021 at 06:41:00AM +, Tian, Kevin wrote:
> > From: David Gibson 
> > Sent: Wednesday, September 29, 2021 2:01 PM
> > 
> > On Sun, Sep 19, 2021 at 02:38:36PM +0800, Liu Yi L wrote:
> > > This patch adds VFIO_DEVICE_BIND_IOMMUFD for userspace to bind the vfio
> > > device to an iommufd. No VFIO_DEVICE_UNBIND_IOMMUFD interface is provided
> > > because it's implicitly done when the device fd is closed.
> > >
> > > In concept a vfio device can be bound to multiple iommufds, each hosting
> > > a subset of I/O address spaces attached by this device.
> > 
> > I really feel like this many<->many mapping between devices and
> > iommufds is going to be super-confusing, and therefore make it really
> > hard to be confident we have all the rules right for proper isolation.
> 
> Based on new discussion on the group ownership part (patch06), I feel
> this many<->many relationship will disappear. The context fd (either
> container or iommufd) will uniquely mark the ownership of a physical
> device and its group. With this design it's impractical to have one
> device bound to multiple iommufds.

That should be a requirement! We have no way to prove that two
iommufds are the same security domain, so devices/groups cannot be
shared.

That is why the API I suggested takes in a struct file to ID the user
security context. A group is accessible only from that single struct
file and no more.
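
For illustration, such a claim API might look roughly like this (the
function name and the owner_file field are hypothetical, not something
in the posted series):

int iommu_group_claim_user(struct iommu_group *group, struct file *user)
{
	int ret = 0;

	mutex_lock(&group->mutex);
	if (group->owner_file && group->owner_file != user)
		ret = -EBUSY;	/* already owned by another user context */
	else
		group->owner_file = user;
	mutex_unlock(&group->mutex);
	return ret;
}

Every path that hands the group or one of its devices to userspace
would then check against the same struct file.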

If the first series goes the way I outlined then I think David's
concern about security is strongly solved, as the IOMMU layer is
directly managing it with very clear responsibility and semantics.

Jason


RE: [RFC 08/20] vfio/pci: Add VFIO_DEVICE_BIND_IOMMUFD

2021-09-29 Thread Tian, Kevin
> From: David Gibson 
> Sent: Wednesday, September 29, 2021 2:01 PM
> 
> On Sun, Sep 19, 2021 at 02:38:36PM +0800, Liu Yi L wrote:
> > This patch adds VFIO_DEVICE_BIND_IOMMUFD for userspace to bind the vfio
> > device to an iommufd. No VFIO_DEVICE_UNBIND_IOMMUFD interface is provided
> > because it's implicitly done when the device fd is closed.
> >
> > In concept a vfio device can be bound to multiple iommufds, each hosting
> > a subset of I/O address spaces attached by this device.
> 
> I really feel like this many<->many mapping between devices and
> iommufds is going to be super-confusing, and therefore make it really
> hard to be confident we have all the rules right for proper isolation.

Based on new discussion on the group ownership part (patch06), I feel
this many<->many relationship will disappear. The context fd (either
container or iommufd) will uniquely mark the ownership of a physical
device and its group. With this design it's impractical to have one
device bound to multiple iommufds. Actually I don't think this is a
compelling use case in reality. The previous rationale was that there
was no need to impose such a restriction without a special reason...
and now we have a reason.

Jason, are you OK with this simplification?

> 
> That's why I was suggesting a concept like endpoints, to break this
> into two many<->one relationships.  I'm ok if that isn't visible in
> the user API, but I think this is going to be really hard to keep
> track of if it isn't explicit somewhere in the internals.
> 

I think this endpoint concept is represented by ioas_device_info in
patch14:

+/*
+ * An ioas_device_info object is created for each successful attach
+ * request. A list of objects is maintained per ioas when the address
+ * space is shared by multiple devices.
+ */
+struct ioas_device_info {
+   struct iommufd_device *idev;
+   struct list_head next;
 };

currently it's a 1:1 mapping between this object and iommufd_device,
because there is no pasid support yet.
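
For illustration, each attach would create one such entry and link it
into a per-ioas list, roughly as below (ioas_attach_device and the
device_list field are made-up names, not from the series). This gives
the two many<->one relationships David is asking for: many endpoints
per ioas, and, once pasid lands, many ioas entries per device.

static int ioas_attach_device(struct iommufd_ioas *ioas,
			      struct iommufd_device *idev)
{
	struct ioas_device_info *info;

	info = kzalloc(sizeof(*info), GFP_KERNEL);
	if (!info)
		return -ENOMEM;
	info->idev = idev;
	list_add(&info->next, &ioas->device_list);
	return 0;
}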

We can rename it to struct ioas_endpoint if it makes you feel better.

Thanks
Kevin

Re: [RFC 08/20] vfio/pci: Add VFIO_DEVICE_BIND_IOMMUFD

2021-09-29 Thread David Gibson
On Sun, Sep 19, 2021 at 02:38:36PM +0800, Liu Yi L wrote:
> This patch adds VFIO_DEVICE_BIND_IOMMUFD for userspace to bind the vfio
> device to an iommufd. No VFIO_DEVICE_UNBIND_IOMMUFD interface is provided
> because it's implicitly done when the device fd is closed.
> 
> In concept a vfio device can be bound to multiple iommufds, each hosting
> a subset of I/O address spaces attached by this device.

I really feel like this many<->many mapping between devices and
iommufds is going to be super-confusing, and therefore make it really
hard to be confident we have all the rules right for proper isolation.

That's why I was suggesting a concept like endpoints, to break this
into two many<->one relationships.  I'm ok if that isn't visible in
the user API, but I think this is going to be really hard to keep
track of if it isn't explicit somewhere in the internals.

> However as a
> starting point (matching current vfio), only one I/O address space is
> supported per vfio device. It implies one device can only be attached
> to one iommufd at this point.
> 
> Signed-off-by: Liu Yi L 
> ---
>  drivers/vfio/pci/Kconfig|  1 +
>  drivers/vfio/pci/vfio_pci.c | 72 -
>  drivers/vfio/pci/vfio_pci_private.h |  8 
>  include/uapi/linux/vfio.h   | 30 
>  4 files changed, 110 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/vfio/pci/Kconfig b/drivers/vfio/pci/Kconfig
> index 5e2e1b9a9fd3..3abfb098b4dc 100644
> --- a/drivers/vfio/pci/Kconfig
> +++ b/drivers/vfio/pci/Kconfig
> @@ -5,6 +5,7 @@ config VFIO_PCI
>   depends on MMU
>   select VFIO_VIRQFD
>   select IRQ_BYPASS_MANAGER
> + select IOMMUFD
>   help
> Support for the PCI VFIO bus driver.  This is required to make
> use of PCI drivers using the VFIO framework.
> diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
> index 145addde983b..20006bb66430 100644
> --- a/drivers/vfio/pci/vfio_pci.c
> +++ b/drivers/vfio/pci/vfio_pci.c
> @@ -552,6 +552,16 @@ static void vfio_pci_release(struct vfio_device *core_vdev)
>   vdev->req_trigger = NULL;
>   }
>   mutex_unlock(&vdev->igate);
> +
> + mutex_lock(&vdev->videv_lock);
> + if (vdev->videv) {
> + struct vfio_iommufd_device *videv = vdev->videv;
> +
> + vdev->videv = NULL;
> + iommufd_unbind_device(videv->idev);
> + kfree(videv);
> + }
> + mutex_unlock(&vdev->videv_lock);
>   }
>  
>   mutex_unlock(&vdev->reflck->lock);
> @@ -780,7 +790,66 @@ static long vfio_pci_ioctl(struct vfio_device *core_vdev,
>   struct vfio_pci_device *vdev =
>   container_of(core_vdev, struct vfio_pci_device, vdev);
>   unsigned long minsz;
>  
> - if (cmd == VFIO_DEVICE_GET_INFO) {
> + if (cmd == VFIO_DEVICE_BIND_IOMMUFD) {
> + struct vfio_device_iommu_bind_data bind_data;
> + unsigned long minsz;
> + struct iommufd_device *idev;
> + struct vfio_iommufd_device *videv;
> +
> + /*
> +  * Reject the request if the device is already opened and
> +  * attached to a container.
> +  */
> + if (vfio_device_in_container(core_vdev))

Usually one would do argument sanity checks before checks that
actually depend on machine state.

> + return -ENOTTY;

This doesn't seem like the right error code.  It's a perfectly valid
operation for this device - just not available right now.
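
Something like the ordering below, say, with -EBUSY for the
already-claimed case (just a sketch rearranging the same checks from
the patch):

	minsz = offsetofend(struct vfio_device_iommu_bind_data, dev_cookie);

	if (copy_from_user(&bind_data, (void __user *)arg, minsz))
		return -EFAULT;

	if (bind_data.argsz < minsz ||
	    bind_data.flags || bind_data.iommu_fd < 0)
		return -EINVAL;

	/* A valid request, but the device is already claimed elsewhere. */
	if (vfio_device_in_container(core_vdev))
		return -EBUSY;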

> +
> + minsz = offsetofend(struct vfio_device_iommu_bind_data, dev_cookie);
> +
> + if (copy_from_user(&bind_data, (void __user *)arg, minsz))
> + return -EFAULT;
> +
> + if (bind_data.argsz < minsz ||
> + bind_data.flags || bind_data.iommu_fd < 0)
> + return -EINVAL;
> +
> + mutex_lock(&vdev->videv_lock);
> + /*
> +  * Allow only one iommufd per device until multiple
> +  * address spaces (e.g. vSVA) support is introduced
> +  * in the future.
> +  */
> + if (vdev->videv) {
> + mutex_unlock(&vdev->videv_lock);
> + return -EBUSY;
> + }
> +
> + idev = iommufd_bind_device(bind_data.iommu_fd,
> +    &vdev->pdev->dev,
> +    bind_data.dev_cookie);
> + if (IS_ERR(idev)) {
> + mutex_unlock(&vdev->videv_lock);
> + return PTR_ERR(idev);
> + }
> +
> + videv = kzalloc(sizeof(*videv), GFP_KERNEL);
> + if (!videv) {
> + iommufd_unbind_device(idev);
> + mutex_unlock(&vdev->videv_lock);
> + return -ENOMEM;
> + }
> + videv->idev = idev;
> + videv->iommu_fd = 

Re: [RFC 08/20] vfio/pci: Add VFIO_DEVICE_BIND_IOMMUFD

2021-09-22 Thread Jason Gunthorpe via iommu
On Wed, Sep 22, 2021 at 03:01:01PM -0600, Alex Williamson wrote:
> On Tue, 21 Sep 2021 14:29:39 -0300
> Jason Gunthorpe  wrote:
> 
> > On Sun, Sep 19, 2021 at 02:38:36PM +0800, Liu Yi L wrote:
> > > +struct vfio_device_iommu_bind_data {
> > > + __u32   argsz;
> > > + __u32   flags;
> > > + __s32   iommu_fd;
> > > + __u64   dev_cookie;  
> > 
> > Missing explicit padding
> > 
> > Always use __aligned_u64 in uapi headers, fix all the patches.
> 
> We don't need padding or explicit alignment if we just swap the order
> of iommu_fd and dev_cookie.  Thanks,

Yes, the padding should all be checked and minimized
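
With the swap Alex suggests, the layout would be (sketch):

struct vfio_device_iommu_bind_data {
	__u32	argsz;
	__u32	flags;
	__u64	dev_cookie;	/* offset 8, naturally aligned */
	__s32	iommu_fd;
};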

But it is good practice to always use __aligned_u64 in the uapi
headers just in case someone messes it up someday - it prevents small
mistakes from becoming an ABI mess.
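
For this struct that would look something like the below (the
__reserved field name is illustrative):

struct vfio_device_iommu_bind_data {
	__u32		argsz;
	__u32		flags;
	__s32		iommu_fd;
	__u32		__reserved;	/* explicit padding, must be zero */
	__aligned_u64	dev_cookie;	/* 8-byte aligned on every ABI */
};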

Jason


Re: [RFC 08/20] vfio/pci: Add VFIO_DEVICE_BIND_IOMMUFD

2021-09-22 Thread Alex Williamson
On Tue, 21 Sep 2021 14:29:39 -0300
Jason Gunthorpe  wrote:

> On Sun, Sep 19, 2021 at 02:38:36PM +0800, Liu Yi L wrote:
> > +struct vfio_device_iommu_bind_data {
> > +   __u32   argsz;
> > +   __u32   flags;
> > +   __s32   iommu_fd;
> > +   __u64   dev_cookie;  
> 
> Missing explicit padding
> 
> Always use __aligned_u64 in uapi headers, fix all the patches.

We don't need padding or explicit alignment if we just swap the order
of iommu_fd and dev_cookie.  Thanks,

Alex



Re: [RFC 08/20] vfio/pci: Add VFIO_DEVICE_BIND_IOMMUFD

2021-09-21 Thread Jason Gunthorpe via iommu
On Sun, Sep 19, 2021 at 02:38:36PM +0800, Liu Yi L wrote:
> This patch adds VFIO_DEVICE_BIND_IOMMUFD for userspace to bind the vfio
> device to an iommufd. No VFIO_DEVICE_UNBIND_IOMMUFD interface is provided
> because it's implicitly done when the device fd is closed.
> 
> In concept a vfio device can be bound to multiple iommufds, each hosting
> a subset of I/O address spaces attached by this device. However as a
> starting point (matching current vfio), only one I/O address space is
> supported per vfio device. It implies one device can only be attached
> to one iommufd at this point.
> 
> Signed-off-by: Liu Yi L 
>  drivers/vfio/pci/Kconfig|  1 +
>  drivers/vfio/pci/vfio_pci.c | 72 -
>  drivers/vfio/pci/vfio_pci_private.h |  8 
>  include/uapi/linux/vfio.h   | 30 
>  4 files changed, 110 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/vfio/pci/Kconfig b/drivers/vfio/pci/Kconfig
> index 5e2e1b9a9fd3..3abfb098b4dc 100644
> +++ b/drivers/vfio/pci/Kconfig
> @@ -5,6 +5,7 @@ config VFIO_PCI
>   depends on MMU
>   select VFIO_VIRQFD
>   select IRQ_BYPASS_MANAGER
> + select IOMMUFD
>   help
> Support for the PCI VFIO bus driver.  This is required to make
> use of PCI drivers using the VFIO framework.
> diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
> index 145addde983b..20006bb66430 100644
> +++ b/drivers/vfio/pci/vfio_pci.c
> @@ -552,6 +552,16 @@ static void vfio_pci_release(struct vfio_device *core_vdev)
>   vdev->req_trigger = NULL;
>   }
>   mutex_unlock(&vdev->igate);
> +
> + mutex_lock(&vdev->videv_lock);
> + if (vdev->videv) {
> + struct vfio_iommufd_device *videv = vdev->videv;
> +
> + vdev->videv = NULL;
> + iommufd_unbind_device(videv->idev);
> + kfree(videv);
> + }
> + mutex_unlock(&vdev->videv_lock);
>   }
>  
>   mutex_unlock(&vdev->reflck->lock);
> @@ -780,7 +790,66 @@ static long vfio_pci_ioctl(struct vfio_device *core_vdev,
>   struct vfio_pci_device *vdev =
>   container_of(core_vdev, struct vfio_pci_device, vdev);
>   unsigned long minsz;
>  
> - if (cmd == VFIO_DEVICE_GET_INFO) {
> + if (cmd == VFIO_DEVICE_BIND_IOMMUFD) {

Choosing to implement this through the ioctl multiplexor is what is
causing so much ugly gyration in the previous patches.

This should be a straightforward new function and ops:

struct iommufd_device *vfio_pci_bind_iommufd(struct vfio_device *)
{
iommu_dev = iommufd_bind_device(bind_data.iommu_fd,
                                &vdev->pdev->dev,
                                bind_data.dev_cookie);
if (!iommu_dev) return ERR
vdev->iommu_dev = iommu_dev;
}
static const struct vfio_device_ops vfio_pci_ops = {
   .bind_iommufd = &vfio_pci_bind_iommufd

If you do the other stuff I said then you'll notice that
iommufd_bind_device() will provide automatic exclusivity.

The thread that sees ops->bind_device succeed will know it is the only
thread that can see that (by definition, the iommu enable user stuff
has to be exclusive and race free), so it can go ahead and store the
iommu pointer.

The other half of the problem, 'vdev->block_access', is solved by
manipulating the filp->f_ops. Start with a fops that can ONLY call the
above op. When the above op succeeds switch the fops to the normal
full ops.
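
A rough sketch of that switch (all of these names are made up, not
something in the posted series):

static const struct file_operations vfio_device_bound_fops;	/* full set */

static long vfio_device_limited_ioctl(struct file *filp, unsigned int cmd,
				      unsigned long arg)
{
	if (cmd != VFIO_DEVICE_BIND_IOMMUFD)
		return -EINVAL;
	/* ... do the bind; on success expose the full ioctl set: */
	smp_store_release(&filp->f_op, &vfio_device_bound_fops);
	return 0;
}

static const struct file_operations vfio_device_limited_fops = {
	.owner		= THIS_MODULE,
	.unlocked_ioctl	= vfio_device_limited_ioctl,
};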

The same flow happens when the group fd spawns the device fd, just
parts of iommufd_bind_device are open coded into the vfio code, but the
whole flow and sequence should be the same.

> + /*
> +  * Reject the request if the device is already opened and
> +  * attached to a container.
> +  */
> + if (vfio_device_in_container(core_vdev))
> + return -ENOTTY;

This is wrongly locked

> +
> + minsz = offsetofend(struct vfio_device_iommu_bind_data, dev_cookie);
> +
> + if (copy_from_user(&bind_data, (void __user *)arg, minsz))
> + return -EFAULT;
> +
> + if (bind_data.argsz < minsz ||
> + bind_data.flags || bind_data.iommu_fd < 0)
> + return -EINVAL;
> +
> + mutex_lock(&vdev->videv_lock);
> + /*
> +  * Allow only one iommufd per device until multiple
> +  * address spaces (e.g. vSVA) support is introduced
> +  * in the future.
> +  */
> + if (vdev->videv) {
> + mutex_unlock(&vdev->videv_lock);
> + return -EBUSY;
> + }
> +
> + idev = iommufd_bind_device(bind_data.iommu_fd,
> +    &vdev->pdev->dev,
> +    bind_data.dev_cookie);
> + if (IS_ERR(idev)) {
> +