Re: [RFC 01/20] iommu/iommufd: Add /dev/iommu core

2021-10-19 Thread Jason Gunthorpe via iommu
On Tue, Oct 19, 2021 at 10:11:34AM -0700, Jacob Pan wrote:
> Hi Jason,
> 
> On Tue, 19 Oct 2021 13:57:47 -0300, Jason Gunthorpe  wrote:
> 
> > On Tue, Oct 19, 2021 at 09:57:34AM -0700, Jacob Pan wrote:
> > > Hi Jason,
> > > 
> > > On Fri, 15 Oct 2021 08:18:07 -0300, Jason Gunthorpe 
> > > wrote: 
> > > > On Fri, Oct 15, 2021 at 09:18:06AM +, Liu, Yi L wrote:
> > > >   
> > > > > >   Acquire from the xarray is
> > > > > >rcu_lock()
> > > > > >ioas = xa_load()
> > > > > >if (ioas)
> > > > > >   if (down_read_trylock(&ioas->destroying_lock))
> > > > > 
> > > > > all good suggestions, will refine accordingly. Here destroying_lock
> > > > > is a rw_semaphore. right? Since down_read_trylock() accepts a
> > > > > rwsem.
> > > > 
> > > > Yes, you probably need a sleeping lock
> > > >   
> > > I am not following why we want a sleeping lock inside RCU protected
> > > section?  
> > 
> > trylock is not sleeping
> Of course, thanks for clarifying.
> 
> > > For ioas, do we really care about the stale data to choose rw_lock vs
> > > RCU? Destroying can be kfree_rcu?  
> > 
> > It needs a hard fence so things don't continue to use the IOS once it
> > is destroyed.
> I guess RCU can do that also perhaps can scale better?

RCU is not a fence, it is an eventual consistency mechanism and has
very bad scaling if you don't use things like kfree_rcu

Jason
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [RFC 01/20] iommu/iommufd: Add /dev/iommu core

2021-10-19 Thread Jacob Pan
Hi Jason,

On Tue, 19 Oct 2021 13:57:47 -0300, Jason Gunthorpe  wrote:

> On Tue, Oct 19, 2021 at 09:57:34AM -0700, Jacob Pan wrote:
> > Hi Jason,
> > 
> > On Fri, 15 Oct 2021 08:18:07 -0300, Jason Gunthorpe 
> > wrote: 
> > > On Fri, Oct 15, 2021 at 09:18:06AM +, Liu, Yi L wrote:
> > >   
> > > > >   Acquire from the xarray is
> > > > >rcu_lock()
> > > > >ioas = xa_load()
> > > > >if (ioas)
> > > > >   if (down_read_trylock(&ioas->destroying_lock))
> > > > 
> > > > all good suggestions, will refine accordingly. Here destroying_lock
> > > > is a rw_semaphore. right? Since down_read_trylock() accepts a
> > > > rwsem.
> > > 
> > > Yes, you probably need a sleeping lock
> > >   
> > I am not following why we want a sleeping lock inside RCU protected
> > section?  
> 
> trylock is not sleeping
Of course, thanks for clarifying.

> > For ioas, do we really care about the stale data to choose rw_lock vs
> > RCU? Destroying can be kfree_rcu?  
> 
> It needs a hard fence so things don't continue to use the IOS once it
> is destroyed.
I guess RCU can do that also perhaps can scale better?

> Jason


Thanks,

Jacob
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [RFC 01/20] iommu/iommufd: Add /dev/iommu core

2021-10-19 Thread Jason Gunthorpe via iommu
On Tue, Oct 19, 2021 at 09:57:34AM -0700, Jacob Pan wrote:
> Hi Jason,
> 
> On Fri, 15 Oct 2021 08:18:07 -0300, Jason Gunthorpe  wrote:
> 
> > On Fri, Oct 15, 2021 at 09:18:06AM +, Liu, Yi L wrote:
> > 
> > > >   Acquire from the xarray is
> > > >rcu_lock()
> > > >ioas = xa_load()
> > > >if (ioas)
> > > >   if (down_read_trylock(&ioas->destroying_lock))  
> > > 
> > > all good suggestions, will refine accordingly. Here destroying_lock is a
> > > rw_semaphore. right? Since down_read_trylock() accepts a rwsem.  
> > 
> > Yes, you probably need a sleeping lock
> > 
> I am not following why we want a sleeping lock inside RCU protected
> section?

trylock is not sleeping
 
> For ioas, do we really care about the stale data to choose rw_lock vs RCU?
> Destroying can be kfree_rcu?

It needs a hard fence so things don't continue to use the IOS once it
is destroyed.

Jason
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [RFC 01/20] iommu/iommufd: Add /dev/iommu core

2021-10-19 Thread Jacob Pan
Hi Jason,

On Fri, 15 Oct 2021 08:18:07 -0300, Jason Gunthorpe  wrote:

> On Fri, Oct 15, 2021 at 09:18:06AM +, Liu, Yi L wrote:
> 
> > >   Acquire from the xarray is
> > >rcu_lock()
> > >ioas = xa_load()
> > >if (ioas)
> > >   if (down_read_trylock(&ioas->destroying_lock))  
> > 
> > all good suggestions, will refine accordingly. Here destroying_lock is a
> > rw_semaphore. right? Since down_read_trylock() accepts a rwsem.  
> 
> Yes, you probably need a sleeping lock
> 
I am not following why we want a sleeping lock inside RCU protected
section?

For ioas, do we really care about the stale data to choose rw_lock vs RCU?
Destroying can be kfree_rcu?
> Jason


Thanks,

Jacob
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [RFC 01/20] iommu/iommufd: Add /dev/iommu core

2021-10-15 Thread Liu, Yi L
> From: Jason Gunthorpe 
> Sent: Friday, October 15, 2021 7:18 PM
> 
> On Fri, Oct 15, 2021 at 09:18:06AM +, Liu, Yi L wrote:
> 
> > >   Acquire from the xarray is
> > >rcu_lock()
> > >ioas = xa_load()
> > >if (ioas)
> > >   if (down_read_trylock(&ioas->destroying_lock))
> >
> > all good suggestions, will refine accordingly. Here destroying_lock is a
> > rw_semaphore. right? Since down_read_trylock() accepts a rwsem.
> 
> Yes, you probably need a sleeping lock

got it. thanks,

Regards,
Yi Liu
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [RFC 01/20] iommu/iommufd: Add /dev/iommu core

2021-10-15 Thread Jason Gunthorpe via iommu
On Fri, Oct 15, 2021 at 09:18:06AM +, Liu, Yi L wrote:

> >   Acquire from the xarray is
> >rcu_lock()
> >ioas = xa_load()
> >if (ioas)
> >   if (down_read_trylock(&ioas->destroying_lock))
> 
> all good suggestions, will refine accordingly. Here destroying_lock is a
> rw_semaphore. right? Since down_read_trylock() accepts a rwsem.

Yes, you probably need a sleeping lock

Jason
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [RFC 01/20] iommu/iommufd: Add /dev/iommu core

2021-10-15 Thread Liu, Yi L
> From: Jason Gunthorpe 
> Sent: Tuesday, September 21, 2021 11:42 PM
> 
> On Sun, Sep 19, 2021 at 02:38:29PM +0800, Liu Yi L wrote:
> > /dev/iommu aims to provide a unified interface for managing I/O address
> > spaces for devices assigned to userspace. This patch adds the initial
> > framework to create a /dev/iommu node. Each open of this node returns an
> > iommufd. And this fd is the handle for userspace to initiate its I/O
> > address space management.
> >
> > One open:
> > - We call this feature as IOMMUFD in Kconfig in this RFC. However this
> >   name is not clear enough to indicate its purpose to user. Back to 2010
> >   vfio even introduced a /dev/uiommu [1] as the predecessor of its
> >   container concept. Is that a better name? Appreciate opinions here.
> >
> > [1]
> https://lore.kernel.org/kvm/4c0eb470.1hmjondo00nivfm6%25p...@cisco.co
> m/
> >
> > Signed-off-by: Liu Yi L 
> >  drivers/iommu/Kconfig   |   1 +
> >  drivers/iommu/Makefile  |   1 +
> >  drivers/iommu/iommufd/Kconfig   |  11 
> >  drivers/iommu/iommufd/Makefile  |   2 +
> >  drivers/iommu/iommufd/iommufd.c | 112
> 
> >  5 files changed, 127 insertions(+)
> >  create mode 100644 drivers/iommu/iommufd/Kconfig
> >  create mode 100644 drivers/iommu/iommufd/Makefile
> >  create mode 100644 drivers/iommu/iommufd/iommufd.c
> >
> > diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
> > index 07b7c25cbed8..a83ce0acd09d 100644
> > +++ b/drivers/iommu/Kconfig
> > @@ -136,6 +136,7 @@ config MSM_IOMMU
> >
> >  source "drivers/iommu/amd/Kconfig"
> >  source "drivers/iommu/intel/Kconfig"
> > +source "drivers/iommu/iommufd/Kconfig"
> >
> >  config IRQ_REMAP
> > bool "Support for Interrupt Remapping"
> > diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
> > index c0fb0ba88143..719c799f23ad 100644
> > +++ b/drivers/iommu/Makefile
> > @@ -29,3 +29,4 @@ obj-$(CONFIG_HYPERV_IOMMU) += hyperv-iommu.o
> >  obj-$(CONFIG_VIRTIO_IOMMU) += virtio-iommu.o
> >  obj-$(CONFIG_IOMMU_SVA_LIB) += iommu-sva-lib.o io-pgfault.o
> >  obj-$(CONFIG_SPRD_IOMMU) += sprd-iommu.o
> > +obj-$(CONFIG_IOMMUFD) += iommufd/
> > diff --git a/drivers/iommu/iommufd/Kconfig
> b/drivers/iommu/iommufd/Kconfig
> > new file mode 100644
> > index ..9fb7769a815d
> > +++ b/drivers/iommu/iommufd/Kconfig
> > @@ -0,0 +1,11 @@
> > +# SPDX-License-Identifier: GPL-2.0-only
> > +config IOMMUFD
> > +   tristate "I/O Address Space management framework for passthrough
> devices"
> > +   select IOMMU_API
> > +   default n
> > +   help
> > + provides unified I/O address space management framework for
> > + isolating untrusted DMAs via devices which are passed through
> > + to userspace drivers.
> > +
> > + If you don't know what to do here, say N.
> > diff --git a/drivers/iommu/iommufd/Makefile
> b/drivers/iommu/iommufd/Makefile
> > new file mode 100644
> > index ..54381a01d003
> > +++ b/drivers/iommu/iommufd/Makefile
> > @@ -0,0 +1,2 @@
> > +# SPDX-License-Identifier: GPL-2.0-only
> > +obj-$(CONFIG_IOMMUFD) += iommufd.o
> > diff --git a/drivers/iommu/iommufd/iommufd.c
> b/drivers/iommu/iommufd/iommufd.c
> > new file mode 100644
> > index ..710b7e62988b
> > +++ b/drivers/iommu/iommufd/iommufd.c
> > @@ -0,0 +1,112 @@
> > +// SPDX-License-Identifier: GPL-2.0-only
> > +/*
> > + * I/O Address Space Management for passthrough devices
> > + *
> > + * Copyright (C) 2021 Intel Corporation
> > + *
> > + * Author: Liu Yi L 
> > + */
> > +
> > +#define pr_fmt(fmt)"iommufd: " fmt
> > +
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +
> > +/* Per iommufd */
> > +struct iommufd_ctx {
> > +   refcount_t refs;
> > +};
> 
> A private_data of a struct file should avoid having a refcount (and
> this should have been a kref anyhow)
> 
> Use the refcount on the struct file instead.
> 
> In general the lifetime models look overly convoluted to me with
> refcounts being used as locks and going in all manner of directions.
> 
> - No refcount on iommufd_ctx, this should use the fget on the fd.
>   The driver facing version of the API has the driver holds a fget
>   inside the iommufd_device.
> 
> - Put a rwlock inside the iommufd_ioas that is a
>   'destroying_lock'. The rwlock starts out unlocked.
> 
>   Acquire from the xarray is
>rcu_lock()
>ioas = xa_load()
>if (ioas)
>   if (down_read_trylock(&ioas->destroying_lock))

all good suggestions, will refine accordingly. Here destroying_lock is a
rw_semaphore. right? Since down_read_trylock() accepts a rwsem.

Thanks,
Yi Liu

>// success
>   Unacquire is just up_read()
> 
>   Do down_write when the ioas is to be destroyed, do not return ebusy.
> 
>  - Delete the iommufd_ctx->lock. Use RCU to protect load, erase/alloc does
>not need locking (order it properly too, it is in the wrong order), and
>don't check for duplicate devices or dev_cookie duplication,

Re: [RFC 01/20] iommu/iommufd: Add /dev/iommu core

2021-09-22 Thread Jason Gunthorpe via iommu
On Wed, Sep 22, 2021 at 01:59:39PM +, Tian, Kevin wrote:
> > From: Jason Gunthorpe 
> > Sent: Wednesday, September 22, 2021 8:41 PM
> > 
> > On Wed, Sep 22, 2021 at 01:51:03AM +, Tian, Kevin wrote:
> > > > From: Jason Gunthorpe 
> > > > Sent: Tuesday, September 21, 2021 11:42 PM
> > > >
> > > >  - Delete the iommufd_ctx->lock. Use RCU to protect load, erase/alloc
> > does
> > > >not need locking (order it properly too, it is in the wrong order), 
> > > > and
> > > >don't check for duplicate devices or dev_cookie duplication, that
> > > >is user error and is harmless to the kernel.
> > > >
> > >
> > > I'm confused here. yes it's user error, but we check so many user errors
> > > and then return -EINVAL, -EBUSY, etc. Why is this one special?
> > 
> > Because it is expensive to calculate and forces a complicated locking
> > scheme into the kernel. Without this check you don't need the locking
> > that spans so much code, and simple RCU becomes acceptable.
> 
> In case of duplication the kernel just uses the first entry which matches
> the device when sending an event to userspace?

Sure

Jason
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [RFC 01/20] iommu/iommufd: Add /dev/iommu core

2021-09-22 Thread Tian, Kevin
> From: Jason Gunthorpe 
> Sent: Wednesday, September 22, 2021 8:41 PM
> 
> On Wed, Sep 22, 2021 at 01:51:03AM +, Tian, Kevin wrote:
> > > From: Jason Gunthorpe 
> > > Sent: Tuesday, September 21, 2021 11:42 PM
> > >
> > >  - Delete the iommufd_ctx->lock. Use RCU to protect load, erase/alloc
> does
> > >not need locking (order it properly too, it is in the wrong order), and
> > >don't check for duplicate devices or dev_cookie duplication, that
> > >is user error and is harmless to the kernel.
> > >
> >
> > I'm confused here. yes it's user error, but we check so many user errors
> > and then return -EINVAL, -EBUSY, etc. Why is this one special?
> 
> Because it is expensive to calculate and forces a complicated locking
> scheme into the kernel. Without this check you don't need the locking
> that spans so much code, and simple RCU becomes acceptable.
> 

In case of duplication the kernel just uses the first entry which matches
the device when sending an event to userspace?
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [RFC 01/20] iommu/iommufd: Add /dev/iommu core

2021-09-22 Thread Jason Gunthorpe via iommu
On Wed, Sep 22, 2021 at 01:51:03AM +, Tian, Kevin wrote:
> > From: Jason Gunthorpe 
> > Sent: Tuesday, September 21, 2021 11:42 PM
> > 
> >  - Delete the iommufd_ctx->lock. Use RCU to protect load, erase/alloc does
> >not need locking (order it properly too, it is in the wrong order), and
> >don't check for duplicate devices or dev_cookie duplication, that
> >is user error and is harmless to the kernel.
> > 
> 
> I'm confused here. yes it's user error, but we check so many user errors
> and then return -EINVAL, -EBUSY, etc. Why is this one special?

Because it is expensive to calculate and forces a complicated locking
scheme into the kernel. Without this check you don't need the locking
that spans so much code, and simple RCU becomes acceptable.

Jason
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [RFC 01/20] iommu/iommufd: Add /dev/iommu core

2021-09-21 Thread Tian, Kevin
> From: Jason Gunthorpe 
> Sent: Tuesday, September 21, 2021 11:42 PM
> 
>  - Delete the iommufd_ctx->lock. Use RCU to protect load, erase/alloc does
>not need locking (order it properly too, it is in the wrong order), and
>don't check for duplicate devices or dev_cookie duplication, that
>is user error and is harmless to the kernel.
> 

I'm confused here. yes it's user error, but we check so many user errors
and then return -EINVAL, -EBUSY, etc. Why is this one special?
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [RFC 01/20] iommu/iommufd: Add /dev/iommu core

2021-09-21 Thread Jason Gunthorpe via iommu
On Sun, Sep 19, 2021 at 02:38:29PM +0800, Liu Yi L wrote:
> /dev/iommu aims to provide a unified interface for managing I/O address
> spaces for devices assigned to userspace. This patch adds the initial
> framework to create a /dev/iommu node. Each open of this node returns an
> iommufd. And this fd is the handle for userspace to initiate its I/O
> address space management.
> 
> One open:
> - We call this feature as IOMMUFD in Kconfig in this RFC. However this
>   name is not clear enough to indicate its purpose to user. Back to 2010
>   vfio even introduced a /dev/uiommu [1] as the predecessor of its
>   container concept. Is that a better name? Appreciate opinions here.
> 
> [1] https://lore.kernel.org/kvm/4c0eb470.1hmjondo00nivfm6%25p...@cisco.com/
> 
> Signed-off-by: Liu Yi L 
>  drivers/iommu/Kconfig   |   1 +
>  drivers/iommu/Makefile  |   1 +
>  drivers/iommu/iommufd/Kconfig   |  11 
>  drivers/iommu/iommufd/Makefile  |   2 +
>  drivers/iommu/iommufd/iommufd.c | 112 
>  5 files changed, 127 insertions(+)
>  create mode 100644 drivers/iommu/iommufd/Kconfig
>  create mode 100644 drivers/iommu/iommufd/Makefile
>  create mode 100644 drivers/iommu/iommufd/iommufd.c
> 
> diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
> index 07b7c25cbed8..a83ce0acd09d 100644
> +++ b/drivers/iommu/Kconfig
> @@ -136,6 +136,7 @@ config MSM_IOMMU
>  
>  source "drivers/iommu/amd/Kconfig"
>  source "drivers/iommu/intel/Kconfig"
> +source "drivers/iommu/iommufd/Kconfig"
>  
>  config IRQ_REMAP
>   bool "Support for Interrupt Remapping"
> diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
> index c0fb0ba88143..719c799f23ad 100644
> +++ b/drivers/iommu/Makefile
> @@ -29,3 +29,4 @@ obj-$(CONFIG_HYPERV_IOMMU) += hyperv-iommu.o
>  obj-$(CONFIG_VIRTIO_IOMMU) += virtio-iommu.o
>  obj-$(CONFIG_IOMMU_SVA_LIB) += iommu-sva-lib.o io-pgfault.o
>  obj-$(CONFIG_SPRD_IOMMU) += sprd-iommu.o
> +obj-$(CONFIG_IOMMUFD) += iommufd/
> diff --git a/drivers/iommu/iommufd/Kconfig b/drivers/iommu/iommufd/Kconfig
> new file mode 100644
> index ..9fb7769a815d
> +++ b/drivers/iommu/iommufd/Kconfig
> @@ -0,0 +1,11 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +config IOMMUFD
> + tristate "I/O Address Space management framework for passthrough 
> devices"
> + select IOMMU_API
> + default n
> + help
> +   provides unified I/O address space management framework for
> +   isolating untrusted DMAs via devices which are passed through
> +   to userspace drivers.
> +
> +   If you don't know what to do here, say N.
> diff --git a/drivers/iommu/iommufd/Makefile b/drivers/iommu/iommufd/Makefile
> new file mode 100644
> index ..54381a01d003
> +++ b/drivers/iommu/iommufd/Makefile
> @@ -0,0 +1,2 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +obj-$(CONFIG_IOMMUFD) += iommufd.o
> diff --git a/drivers/iommu/iommufd/iommufd.c b/drivers/iommu/iommufd/iommufd.c
> new file mode 100644
> index ..710b7e62988b
> +++ b/drivers/iommu/iommufd/iommufd.c
> @@ -0,0 +1,112 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * I/O Address Space Management for passthrough devices
> + *
> + * Copyright (C) 2021 Intel Corporation
> + *
> + * Author: Liu Yi L 
> + */
> +
> +#define pr_fmt(fmt)"iommufd: " fmt
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +/* Per iommufd */
> +struct iommufd_ctx {
> + refcount_t refs;
> +};

A private_data of a struct file should avoid having a refcount (and
this should have been a kref anyhow)

Use the refcount on the struct file instead.

In general the lifetime models look overly convoluted to me with
refcounts being used as locks and going in all manner of directions.

- No refcount on iommufd_ctx, this should use the fget on the fd.
  The driver facing version of the API has the driver holds a fget
  inside the iommufd_device.

- Put a rwlock inside the iommufd_ioas that is a
  'destroying_lock'. The rwlock starts out unlocked.
  
  Acquire from the xarray is
   rcu_lock()
   ioas = xa_load()
   if (ioas)
  if (down_read_trylock(&ioas->destroying_lock))
   // success
  Unacquire is just up_read()

  Do down_write when the ioas is to be destroyed, do not return ebusy.

 - Delete the iommufd_ctx->lock. Use RCU to protect load, erase/alloc does
   not need locking (order it properly too, it is in the wrong order), and
   don't check for duplicate devices or dev_cookie duplication, that
   is user error and is harmless to the kernel.
  
> +static int iommufd_fops_release(struct inode *inode, struct file *filep)
> +{
> + struct iommufd_ctx *ictx = filep->private_data;
> +
> + filep->private_data = NULL;

unnecessary

> + iommufd_ctx_put(ictx);
> +
> + return 0;
> +}
> +
> +static long iommufd_fops_unl_ioctl(struct file *filep,
> +unsigned int cmd, unsigned long arg)
> +{
> + struct