Re: [RFC 01/20] iommu/iommufd: Add /dev/iommu core
On Tue, Oct 19, 2021 at 10:11:34AM -0700, Jacob Pan wrote: > Hi Jason, > > On Tue, 19 Oct 2021 13:57:47 -0300, Jason Gunthorpe wrote: > > > On Tue, Oct 19, 2021 at 09:57:34AM -0700, Jacob Pan wrote: > > > Hi Jason, > > > > > > On Fri, 15 Oct 2021 08:18:07 -0300, Jason Gunthorpe > > > wrote: > > > > On Fri, Oct 15, 2021 at 09:18:06AM +, Liu, Yi L wrote: > > > > > > > > > > Acquire from the xarray is > > > > > >rcu_lock() > > > > > >ioas = xa_load() > > > > > >if (ioas) > > > > > > if (down_read_trylock(&ioas->destroying_lock)) > > > > > > > > > > all good suggestions, will refine accordingly. Here destroying_lock > > > > > is a rw_semaphore. right? Since down_read_trylock() accepts a > > > > > rwsem. > > > > > > > > Yes, you probably need a sleeping lock > > > > > > > I am not following why we want a sleeping lock inside RCU protected > > > section? > > > > trylock is not sleeping > Of course, thanks for clarifying. > > > > For ioas, do we really care about the stale data to choose rw_lock vs > > > RCU? Destroying can be kfree_rcu? > > > > It needs a hard fence so things don't continue to use the IOS once it > > is destroyed. > I guess RCU can do that also perhaps can scale better? RCU is not a fence, it is an eventual consistency mechanism and has very bad scaling if you don't use things like kfree_rcu Jason ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [RFC 01/20] iommu/iommufd: Add /dev/iommu core
Hi Jason, On Tue, 19 Oct 2021 13:57:47 -0300, Jason Gunthorpe wrote: > On Tue, Oct 19, 2021 at 09:57:34AM -0700, Jacob Pan wrote: > > Hi Jason, > > > > On Fri, 15 Oct 2021 08:18:07 -0300, Jason Gunthorpe > > wrote: > > > On Fri, Oct 15, 2021 at 09:18:06AM +, Liu, Yi L wrote: > > > > > > > > Acquire from the xarray is > > > > >rcu_lock() > > > > >ioas = xa_load() > > > > >if (ioas) > > > > > if (down_read_trylock(&ioas->destroying_lock)) > > > > > > > > all good suggestions, will refine accordingly. Here destroying_lock > > > > is a rw_semaphore. right? Since down_read_trylock() accepts a > > > > rwsem. > > > > > > Yes, you probably need a sleeping lock > > > > > I am not following why we want a sleeping lock inside RCU protected > > section? > > trylock is not sleeping Of course, thanks for clarifying. > > For ioas, do we really care about the stale data to choose rw_lock vs > > RCU? Destroying can be kfree_rcu? > > It needs a hard fence so things don't continue to use the IOS once it > is destroyed. I guess RCU can do that also perhaps can scale better? > Jason Thanks, Jacob ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [RFC 01/20] iommu/iommufd: Add /dev/iommu core
On Tue, Oct 19, 2021 at 09:57:34AM -0700, Jacob Pan wrote: > Hi Jason, > > On Fri, 15 Oct 2021 08:18:07 -0300, Jason Gunthorpe wrote: > > > On Fri, Oct 15, 2021 at 09:18:06AM +, Liu, Yi L wrote: > > > > > > Acquire from the xarray is > > > >rcu_lock() > > > >ioas = xa_load() > > > >if (ioas) > > > > if (down_read_trylock(&ioas->destroying_lock)) > > > > > > all good suggestions, will refine accordingly. Here destroying_lock is a > > > rw_semaphore. right? Since down_read_trylock() accepts a rwsem. > > > > Yes, you probably need a sleeping lock > > > I am not following why we want a sleeping lock inside RCU protected > section? trylock is not sleeping > For ioas, do we really care about the stale data to choose rw_lock vs RCU? > Destroying can be kfree_rcu? It needs a hard fence so things don't continue to use the IOS once it is destroyed. Jason ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [RFC 01/20] iommu/iommufd: Add /dev/iommu core
Hi Jason, On Fri, 15 Oct 2021 08:18:07 -0300, Jason Gunthorpe wrote: > On Fri, Oct 15, 2021 at 09:18:06AM +, Liu, Yi L wrote: > > > > Acquire from the xarray is > > >rcu_lock() > > >ioas = xa_load() > > >if (ioas) > > > if (down_read_trylock(&ioas->destroying_lock)) > > > > all good suggestions, will refine accordingly. Here destroying_lock is a > > rw_semaphore. right? Since down_read_trylock() accepts a rwsem. > > Yes, you probably need a sleeping lock > I am not following why we want a sleeping lock inside RCU protected section? For ioas, do we really care about the stale data to choose rw_lock vs RCU? Destroying can be kfree_rcu? > Jason Thanks, Jacob ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
RE: [RFC 01/20] iommu/iommufd: Add /dev/iommu core
> From: Jason Gunthorpe > Sent: Friday, October 15, 2021 7:18 PM > > On Fri, Oct 15, 2021 at 09:18:06AM +, Liu, Yi L wrote: > > > > Acquire from the xarray is > > >rcu_lock() > > >ioas = xa_load() > > >if (ioas) > > > if (down_read_trylock(&ioas->destroying_lock)) > > > > all good suggestions, will refine accordingly. Here destroying_lock is a > > rw_semaphore. right? Since down_read_trylock() accepts a rwsem. > > Yes, you probably need a sleeping lock got it. thanks, Regards, Yi Liu ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [RFC 01/20] iommu/iommufd: Add /dev/iommu core
On Fri, Oct 15, 2021 at 09:18:06AM +, Liu, Yi L wrote: > > Acquire from the xarray is > >rcu_lock() > >ioas = xa_load() > >if (ioas) > > if (down_read_trylock(&ioas->destroying_lock)) > > all good suggestions, will refine accordingly. Here destroying_lock is a > rw_semaphore. right? Since down_read_trylock() accepts a rwsem. Yes, you probably need a sleeping lock Jason ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
RE: [RFC 01/20] iommu/iommufd: Add /dev/iommu core
> From: Jason Gunthorpe > Sent: Tuesday, September 21, 2021 11:42 PM > > On Sun, Sep 19, 2021 at 02:38:29PM +0800, Liu Yi L wrote: > > /dev/iommu aims to provide a unified interface for managing I/O address > > spaces for devices assigned to userspace. This patch adds the initial > > framework to create a /dev/iommu node. Each open of this node returns an > > iommufd. And this fd is the handle for userspace to initiate its I/O > > address space management. > > > > One open: > > - We call this feature as IOMMUFD in Kconfig in this RFC. However this > > name is not clear enough to indicate its purpose to user. Back to 2010 > > vfio even introduced a /dev/uiommu [1] as the predecessor of its > > container concept. Is that a better name? Appreciate opinions here. > > > > [1] > https://lore.kernel.org/kvm/4c0eb470.1hmjondo00nivfm6%25p...@cisco.co > m/ > > > > Signed-off-by: Liu Yi L > > drivers/iommu/Kconfig | 1 + > > drivers/iommu/Makefile | 1 + > > drivers/iommu/iommufd/Kconfig | 11 > > drivers/iommu/iommufd/Makefile | 2 + > > drivers/iommu/iommufd/iommufd.c | 112 > > > 5 files changed, 127 insertions(+) > > create mode 100644 drivers/iommu/iommufd/Kconfig > > create mode 100644 drivers/iommu/iommufd/Makefile > > create mode 100644 drivers/iommu/iommufd/iommufd.c > > > > diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig > > index 07b7c25cbed8..a83ce0acd09d 100644 > > +++ b/drivers/iommu/Kconfig > > @@ -136,6 +136,7 @@ config MSM_IOMMU > > > > source "drivers/iommu/amd/Kconfig" > > source "drivers/iommu/intel/Kconfig" > > +source "drivers/iommu/iommufd/Kconfig" > > > > config IRQ_REMAP > > bool "Support for Interrupt Remapping" > > diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile > > index c0fb0ba88143..719c799f23ad 100644 > > +++ b/drivers/iommu/Makefile > > @@ -29,3 +29,4 @@ obj-$(CONFIG_HYPERV_IOMMU) += hyperv-iommu.o > > obj-$(CONFIG_VIRTIO_IOMMU) += virtio-iommu.o > > obj-$(CONFIG_IOMMU_SVA_LIB) += iommu-sva-lib.o io-pgfault.o > > obj-$(CONFIG_SPRD_IOMMU) += sprd-iommu.o > > +obj-$(CONFIG_IOMMUFD) += iommufd/ > > diff --git a/drivers/iommu/iommufd/Kconfig > b/drivers/iommu/iommufd/Kconfig > > new file mode 100644 > > index ..9fb7769a815d > > +++ b/drivers/iommu/iommufd/Kconfig > > @@ -0,0 +1,11 @@ > > +# SPDX-License-Identifier: GPL-2.0-only > > +config IOMMUFD > > + tristate "I/O Address Space management framework for passthrough > devices" > > + select IOMMU_API > > + default n > > + help > > + provides unified I/O address space management framework for > > + isolating untrusted DMAs via devices which are passed through > > + to userspace drivers. > > + > > + If you don't know what to do here, say N. > > diff --git a/drivers/iommu/iommufd/Makefile > b/drivers/iommu/iommufd/Makefile > > new file mode 100644 > > index ..54381a01d003 > > +++ b/drivers/iommu/iommufd/Makefile > > @@ -0,0 +1,2 @@ > > +# SPDX-License-Identifier: GPL-2.0-only > > +obj-$(CONFIG_IOMMUFD) += iommufd.o > > diff --git a/drivers/iommu/iommufd/iommufd.c > b/drivers/iommu/iommufd/iommufd.c > > new file mode 100644 > > index ..710b7e62988b > > +++ b/drivers/iommu/iommufd/iommufd.c > > @@ -0,0 +1,112 @@ > > +// SPDX-License-Identifier: GPL-2.0-only > > +/* > > + * I/O Address Space Management for passthrough devices > > + * > > + * Copyright (C) 2021 Intel Corporation > > + * > > + * Author: Liu Yi L > > + */ > > + > > +#define pr_fmt(fmt)"iommufd: " fmt > > + > > +#include > > +#include > > +#include > > +#include > > +#include > > +#include > > +#include > > + > > +/* Per iommufd */ > > +struct iommufd_ctx { > > + refcount_t refs; > > +}; > > A private_data of a struct file should avoid having a refcount (and > this should have been a kref anyhow) > > Use the refcount on the struct file instead. > > In general the lifetime models look overly convoluted to me with > refcounts being used as locks and going in all manner of directions. > > - No refcount on iommufd_ctx, this should use the fget on the fd. > The driver facing version of the API has the driver holds a fget > inside the iommufd_device. > > - Put a rwlock inside the iommufd_ioas that is a > 'destroying_lock'. The rwlock starts out unlocked. > > Acquire from the xarray is >rcu_lock() >ioas = xa_load() >if (ioas) > if (down_read_trylock(&ioas->destroying_lock)) all good suggestions, will refine accordingly. Here destroying_lock is a rw_semaphore. right? Since down_read_trylock() accepts a rwsem. Thanks, Yi Liu >// success > Unacquire is just up_read() > > Do down_write when the ioas is to be destroyed, do not return ebusy. > > - Delete the iommufd_ctx->lock. Use RCU to protect load, erase/alloc does >not need locking (order it properly too, it is in the wrong order), and >don't check for duplicate devices or dev_cookie duplication,
Re: [RFC 01/20] iommu/iommufd: Add /dev/iommu core
On Wed, Sep 22, 2021 at 01:59:39PM +, Tian, Kevin wrote: > > From: Jason Gunthorpe > > Sent: Wednesday, September 22, 2021 8:41 PM > > > > On Wed, Sep 22, 2021 at 01:51:03AM +, Tian, Kevin wrote: > > > > From: Jason Gunthorpe > > > > Sent: Tuesday, September 21, 2021 11:42 PM > > > > > > > > - Delete the iommufd_ctx->lock. Use RCU to protect load, erase/alloc > > does > > > >not need locking (order it properly too, it is in the wrong order), > > > > and > > > >don't check for duplicate devices or dev_cookie duplication, that > > > >is user error and is harmless to the kernel. > > > > > > > > > > I'm confused here. yes it's user error, but we check so many user errors > > > and then return -EINVAL, -EBUSY, etc. Why is this one special? > > > > Because it is expensive to calculate and forces a complicated locking > > scheme into the kernel. Without this check you don't need the locking > > that spans so much code, and simple RCU becomes acceptable. > > In case of duplication the kernel just uses the first entry which matches > the device when sending an event to userspace? Sure Jason ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
RE: [RFC 01/20] iommu/iommufd: Add /dev/iommu core
> From: Jason Gunthorpe > Sent: Wednesday, September 22, 2021 8:41 PM > > On Wed, Sep 22, 2021 at 01:51:03AM +, Tian, Kevin wrote: > > > From: Jason Gunthorpe > > > Sent: Tuesday, September 21, 2021 11:42 PM > > > > > > - Delete the iommufd_ctx->lock. Use RCU to protect load, erase/alloc > does > > >not need locking (order it properly too, it is in the wrong order), and > > >don't check for duplicate devices or dev_cookie duplication, that > > >is user error and is harmless to the kernel. > > > > > > > I'm confused here. yes it's user error, but we check so many user errors > > and then return -EINVAL, -EBUSY, etc. Why is this one special? > > Because it is expensive to calculate and forces a complicated locking > scheme into the kernel. Without this check you don't need the locking > that spans so much code, and simple RCU becomes acceptable. > In case of duplication the kernel just uses the first entry which matches the device when sending an event to userspace? ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [RFC 01/20] iommu/iommufd: Add /dev/iommu core
On Wed, Sep 22, 2021 at 01:51:03AM +, Tian, Kevin wrote: > > From: Jason Gunthorpe > > Sent: Tuesday, September 21, 2021 11:42 PM > > > > - Delete the iommufd_ctx->lock. Use RCU to protect load, erase/alloc does > >not need locking (order it properly too, it is in the wrong order), and > >don't check for duplicate devices or dev_cookie duplication, that > >is user error and is harmless to the kernel. > > > > I'm confused here. yes it's user error, but we check so many user errors > and then return -EINVAL, -EBUSY, etc. Why is this one special? Because it is expensive to calculate and forces a complicated locking scheme into the kernel. Without this check you don't need the locking that spans so much code, and simple RCU becomes acceptable. Jason ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
RE: [RFC 01/20] iommu/iommufd: Add /dev/iommu core
> From: Jason Gunthorpe > Sent: Tuesday, September 21, 2021 11:42 PM > > - Delete the iommufd_ctx->lock. Use RCU to protect load, erase/alloc does >not need locking (order it properly too, it is in the wrong order), and >don't check for duplicate devices or dev_cookie duplication, that >is user error and is harmless to the kernel. > I'm confused here. yes it's user error, but we check so many user errors and then return -EINVAL, -EBUSY, etc. Why is this one special? ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [RFC 01/20] iommu/iommufd: Add /dev/iommu core
On Sun, Sep 19, 2021 at 02:38:29PM +0800, Liu Yi L wrote: > /dev/iommu aims to provide a unified interface for managing I/O address > spaces for devices assigned to userspace. This patch adds the initial > framework to create a /dev/iommu node. Each open of this node returns an > iommufd. And this fd is the handle for userspace to initiate its I/O > address space management. > > One open: > - We call this feature as IOMMUFD in Kconfig in this RFC. However this > name is not clear enough to indicate its purpose to user. Back to 2010 > vfio even introduced a /dev/uiommu [1] as the predecessor of its > container concept. Is that a better name? Appreciate opinions here. > > [1] https://lore.kernel.org/kvm/4c0eb470.1hmjondo00nivfm6%25p...@cisco.com/ > > Signed-off-by: Liu Yi L > drivers/iommu/Kconfig | 1 + > drivers/iommu/Makefile | 1 + > drivers/iommu/iommufd/Kconfig | 11 > drivers/iommu/iommufd/Makefile | 2 + > drivers/iommu/iommufd/iommufd.c | 112 > 5 files changed, 127 insertions(+) > create mode 100644 drivers/iommu/iommufd/Kconfig > create mode 100644 drivers/iommu/iommufd/Makefile > create mode 100644 drivers/iommu/iommufd/iommufd.c > > diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig > index 07b7c25cbed8..a83ce0acd09d 100644 > +++ b/drivers/iommu/Kconfig > @@ -136,6 +136,7 @@ config MSM_IOMMU > > source "drivers/iommu/amd/Kconfig" > source "drivers/iommu/intel/Kconfig" > +source "drivers/iommu/iommufd/Kconfig" > > config IRQ_REMAP > bool "Support for Interrupt Remapping" > diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile > index c0fb0ba88143..719c799f23ad 100644 > +++ b/drivers/iommu/Makefile > @@ -29,3 +29,4 @@ obj-$(CONFIG_HYPERV_IOMMU) += hyperv-iommu.o > obj-$(CONFIG_VIRTIO_IOMMU) += virtio-iommu.o > obj-$(CONFIG_IOMMU_SVA_LIB) += iommu-sva-lib.o io-pgfault.o > obj-$(CONFIG_SPRD_IOMMU) += sprd-iommu.o > +obj-$(CONFIG_IOMMUFD) += iommufd/ > diff --git a/drivers/iommu/iommufd/Kconfig b/drivers/iommu/iommufd/Kconfig > new file mode 100644 > index ..9fb7769a815d > +++ b/drivers/iommu/iommufd/Kconfig > @@ -0,0 +1,11 @@ > +# SPDX-License-Identifier: GPL-2.0-only > +config IOMMUFD > + tristate "I/O Address Space management framework for passthrough > devices" > + select IOMMU_API > + default n > + help > + provides unified I/O address space management framework for > + isolating untrusted DMAs via devices which are passed through > + to userspace drivers. > + > + If you don't know what to do here, say N. > diff --git a/drivers/iommu/iommufd/Makefile b/drivers/iommu/iommufd/Makefile > new file mode 100644 > index ..54381a01d003 > +++ b/drivers/iommu/iommufd/Makefile > @@ -0,0 +1,2 @@ > +# SPDX-License-Identifier: GPL-2.0-only > +obj-$(CONFIG_IOMMUFD) += iommufd.o > diff --git a/drivers/iommu/iommufd/iommufd.c b/drivers/iommu/iommufd/iommufd.c > new file mode 100644 > index ..710b7e62988b > +++ b/drivers/iommu/iommufd/iommufd.c > @@ -0,0 +1,112 @@ > +// SPDX-License-Identifier: GPL-2.0-only > +/* > + * I/O Address Space Management for passthrough devices > + * > + * Copyright (C) 2021 Intel Corporation > + * > + * Author: Liu Yi L > + */ > + > +#define pr_fmt(fmt)"iommufd: " fmt > + > +#include > +#include > +#include > +#include > +#include > +#include > +#include > + > +/* Per iommufd */ > +struct iommufd_ctx { > + refcount_t refs; > +}; A private_data of a struct file should avoid having a refcount (and this should have been a kref anyhow) Use the refcount on the struct file instead. In general the lifetime models look overly convoluted to me with refcounts being used as locks and going in all manner of directions. - No refcount on iommufd_ctx, this should use the fget on the fd. The driver facing version of the API has the driver holds a fget inside the iommufd_device. - Put a rwlock inside the iommufd_ioas that is a 'destroying_lock'. The rwlock starts out unlocked. Acquire from the xarray is rcu_lock() ioas = xa_load() if (ioas) if (down_read_trylock(&ioas->destroying_lock)) // success Unacquire is just up_read() Do down_write when the ioas is to be destroyed, do not return ebusy. - Delete the iommufd_ctx->lock. Use RCU to protect load, erase/alloc does not need locking (order it properly too, it is in the wrong order), and don't check for duplicate devices or dev_cookie duplication, that is user error and is harmless to the kernel. > +static int iommufd_fops_release(struct inode *inode, struct file *filep) > +{ > + struct iommufd_ctx *ictx = filep->private_data; > + > + filep->private_data = NULL; unnecessary > + iommufd_ctx_put(ictx); > + > + return 0; > +} > + > +static long iommufd_fops_unl_ioctl(struct file *filep, > +unsigned int cmd, unsigned long arg) > +{ > + struct