Re: Host-PCI-Device mapping
Never mind, found the answers in kvm_set_user_memory :) On Fri, Oct 15, 2021 at 9:36 PM Ajay Garg wrote: > > Hello everyone. > > I have a x86_64 L1 guest, running on a x86_64 host, with a > host-pci-device attached to the guest. > The host runs with IOMMU enabled, and passthrough enabled. > > Following are the addresses of the bar0-region of the pci-device, as > per the output of lspci -v : > > * On host (hpa) => e2c2 > * On guest (gpa) => fc078000 > > > Now, if /proc//maps is dumped on the host, following line of > interest is seen : > > # > 7f0b5c5f4000-7f0b5c5f5000 rw-s e2c2 00:0e 13653 > anon_inode:[vfio-device] > # > > Above indicates that the hva for the pci-device starts from 0x7f0b5c5f4000. > > > Also, upon attaching gdb to the qemu process, and using a slightly > modified qemugdb/mtree.py (that prints only the information for > :0a:00.0 name) to dump the memory-regions, following is obtained : > > # > (gdb) source qemu-gdb.py > (gdb) qemu mtree > fc078000-fc07c095 :0a:00.0 base BAR 0 (I/O) (@ > 0x56540d8c8da0) > fc078000-fc07c095 :0a:00.0 BAR 0 (I/O) (@ > 0x56540d8c76b0) > fc078000-fc07c095 :0a:00.0 BAR 0 mmaps[0] > (I/O) (@ 0x56540d8c7c30) > (gdb) > # > > Above indicates that the hva for the pci-device starts from 0x56540d8c7c30. > > As seen, there is a discrepancy in the two results. > > > What am I missing? > Looking for pointers, will be grateful. > > > Thanks and Regards, > Ajay ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v8 04/12] iommu/mediatek: Add device_link between the consumer and the larb devices
On Mon, 2021-10-11 at 14:36 +0200, Dafna Hirschfeld wrote: > > On 29.09.21 03:37, Yong Wu wrote: > > MediaTek IOMMU-SMI diagram is like below. all the consumer connect > > with > > smi-larb, then connect with smi-common. > > > > M4U > > | > > smi-common > > | > >- > >| |... > >| | > > larb1 larb2 > >| | > > vdec venc > > > > When the consumer works, it should enable the smi-larb's power > > which > > also need enable the smi-common's power firstly. > > > > Thus, First of all, use the device link connect the consumer and > > the > > smi-larbs. then add device link between the smi-larb and smi- > > common. > > > > This patch adds device_link between the consumer and the larbs. > > > > When device_link_add, I add the flag DL_FLAG_STATELESS to avoid > > calling > > pm_runtime_xx to keep the original status of clocks. It can avoid > > two > > issues: > > 1) Display HW show fastlogo abnormally reported in [1]. At the > > beggining, > > all the clocks are enabled before entering kernel, but the clocks > > for > > display HW(always in larb0) will be gated after clk_enable and > > clk_disable > > called from device_link_add(->pm_runtime_resume) and rpm_idle. The > > clock > > operation happened before display driver probe. At that time, the > > display > > HW will be abnormal. > > > > 2) A deadlock issue reported in [2]. Use DL_FLAG_STATELESS to skip > > pm_runtime_xx to avoid the deadlock. > > > > Corresponding, DL_FLAG_AUTOREMOVE_CONSUMER can't be added, then > > device_link_removed should be added explicitly. > > > > [1] > > https://lore.kernel.org/linux-mediatek/1564213888.22908.4.camel@mhfsdcap03/ > > [2] https://lore.kernel.org/patchwork/patch/1086569/ > > > > Suggested-by: Tomasz Figa > > Signed-off-by: Yong Wu > > Tested-by: Frank Wunderlich # BPI- > > R2/MT7623 > > --- > > drivers/iommu/mtk_iommu.c| 22 ++ > > drivers/iommu/mtk_iommu_v1.c | 20 +++- > > 2 files changed, 41 insertions(+), 1 deletion(-) > > > > diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c > > index d5848f78a677..a2fa55899434 100644 > > --- a/drivers/iommu/mtk_iommu.c > > +++ b/drivers/iommu/mtk_iommu.c > > @@ -560,22 +560,44 @@ static struct iommu_device > > *mtk_iommu_probe_device(struct device *dev) > > { > > struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev); > > struct mtk_iommu_data *data; > > + struct device_link *link; > > + struct device *larbdev; > > + unsigned int larbid; > > > > if (!fwspec || fwspec->ops != _iommu_ops) > > return ERR_PTR(-ENODEV); /* Not a iommu client device > > */ > > > > data = dev_iommu_priv_get(dev); > > > > + /* > > +* Link the consumer device with the smi-larb device(supplier) > > +* The device in each a larb is a independent HW. thus only > > link > > +* one larb here. > > +*/ > > + larbid = MTK_M4U_TO_LARB(fwspec->ids[0]); > > so larbid is always the same for all the ids of a device? Yes. For me, each a dtsi node should represent a HW unit which can only connect one larb. > If so maybe it worth testing it and return error if this is not the > case. Thanks for the suggestion. This is very helpful. I did see someone put the different larbs in one node. I will check this, and add return EINVAL for this case. > > Thanks, > Dafna > > ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Host-PCI-Device mapping
Hello everyone. I have a x86_64 L1 guest, running on a x86_64 host, with a host-pci-device attached to the guest. The host runs with IOMMU enabled, and passthrough enabled. Following are the addresses of the bar0-region of the pci-device, as per the output of lspci -v : * On host (hpa) => e2c2 * On guest (gpa) => fc078000 Now, if /proc//maps is dumped on the host, following line of interest is seen : # 7f0b5c5f4000-7f0b5c5f5000 rw-s e2c2 00:0e 13653 anon_inode:[vfio-device] # Above indicates that the hva for the pci-device starts from 0x7f0b5c5f4000. Also, upon attaching gdb to the qemu process, and using a slightly modified qemugdb/mtree.py (that prints only the information for :0a:00.0 name) to dump the memory-regions, following is obtained : # (gdb) source qemu-gdb.py (gdb) qemu mtree fc078000-fc07c095 :0a:00.0 base BAR 0 (I/O) (@ 0x56540d8c8da0) fc078000-fc07c095 :0a:00.0 BAR 0 (I/O) (@ 0x56540d8c76b0) fc078000-fc07c095 :0a:00.0 BAR 0 mmaps[0] (I/O) (@ 0x56540d8c7c30) (gdb) # Above indicates that the hva for the pci-device starts from 0x56540d8c7c30. As seen, there is a discrepancy in the two results. What am I missing? Looking for pointers, will be grateful. Thanks and Regards, Ajay ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v4 03/13] memory: mtk-smi: Use clk_bulk clock ops
Il 15/10/21 15:43, Krzysztof Kozlowski ha scritto: On 15/10/2021 15:38, AngeloGioacchino Del Regno wrote: Use clk_bulk interface instead of the orginal one to simplify the code. For SMI larbs: Require apb/smi clocks while gals is optional. For SMI common: Require apb/smi/gals0/gal1 in has_gals case. Otherwise, also only require apb/smi, No optional clk here. About the "has_gals" flag, for smi larbs, the gals clock also may be optional even this platform support it. thus it always use *_bulk_get_optional, then the flag has_gals is unnecessary. Remove it. The smi_common's has_gals still keep it. Also remove clk fail logs since bulk interface already output fail log. Signed-off-by: Yong Wu Hello Yong, thanks for the patch! However, I have an improvement to point out: --- drivers/memory/mtk-smi.c | 143 +++ 1 file changed, 55 insertions(+), 88 deletions(-) diff --git a/drivers/memory/mtk-smi.c b/drivers/memory/mtk-smi.c index c5fb51f73b34..f91eaf5c3ab0 100644 --- a/drivers/memory/mtk-smi.c +++ b/drivers/memory/mtk-smi.c @@ -60,6 +60,20 @@ enum mtk_smi_gen { MTK_SMI_GEN2 }; +#define MTK_SMI_CLK_NR_MAX 4 This refers to mtk_smi_common_clks[] and should be probably moved after that. In any case, I don't think that there's any need to manually define this as 4, as you can simply use the macro ARRAY_SIZE(mtk_smi_common_clks). Using that will make you able to not update this definition everytime an update occurs to the mtk_smi_common_clks array. + +/* larbs: Require apb/smi clocks while gals is optional. */ +static const char * const mtk_smi_larb_clks[] = {"apb", "smi", "gals"}; +#define MTK_SMI_LARB_REQ_CLK_NR2 +#define MTK_SMI_LARB_OPT_CLK_NR1 + +/* + * common: Require these four clocks in has_gals case. Otherwise, only apb/smi are required. + */ +static const char * const mtk_smi_common_clks[] = {"apb", "smi", "gals0", "gals1"}; +#define MTK_SMI_COM_REQ_CLK_NR 2 +#define MTK_SMI_COM_GALS_REQ_CLK_NRMTK_SMI_CLK_NR_MAX + Apart from that, Acked-By: AngeloGioacchino Del Regno The patchset was merged around a month ago: https://lore.kernel.org/lkml/163229303729.7874.4095337797772755570.b4...@canonical.com/ Best regards, Krzysztof Whoops. Sorry for that. I'll send a patch to address what I pointed out, unless the original author of this series wants to. Thanks, - Angelo ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v4 03/13] memory: mtk-smi: Use clk_bulk clock ops
Use clk_bulk interface instead of the orginal one to simplify the code. For SMI larbs: Require apb/smi clocks while gals is optional. For SMI common: Require apb/smi/gals0/gal1 in has_gals case. Otherwise, also only require apb/smi, No optional clk here. About the "has_gals" flag, for smi larbs, the gals clock also may be optional even this platform support it. thus it always use *_bulk_get_optional, then the flag has_gals is unnecessary. Remove it. The smi_common's has_gals still keep it. Also remove clk fail logs since bulk interface already output fail log. Signed-off-by: Yong Wu Hello Yong, thanks for the patch! However, I have an improvement to point out: --- drivers/memory/mtk-smi.c | 143 +++ 1 file changed, 55 insertions(+), 88 deletions(-) diff --git a/drivers/memory/mtk-smi.c b/drivers/memory/mtk-smi.c index c5fb51f73b34..f91eaf5c3ab0 100644 --- a/drivers/memory/mtk-smi.c +++ b/drivers/memory/mtk-smi.c @@ -60,6 +60,20 @@ enum mtk_smi_gen { MTK_SMI_GEN2 }; +#define MTK_SMI_CLK_NR_MAX 4 This refers to mtk_smi_common_clks[] and should be probably moved after that. In any case, I don't think that there's any need to manually define this as 4, as you can simply use the macro ARRAY_SIZE(mtk_smi_common_clks). Using that will make you able to not update this definition everytime an update occurs to the mtk_smi_common_clks array. + +/* larbs: Require apb/smi clocks while gals is optional. */ +static const char * const mtk_smi_larb_clks[] = {"apb", "smi", "gals"}; +#define MTK_SMI_LARB_REQ_CLK_NR2 +#define MTK_SMI_LARB_OPT_CLK_NR1 + +/* + * common: Require these four clocks in has_gals case. Otherwise, only apb/smi are required. + */ +static const char * const mtk_smi_common_clks[] = {"apb", "smi", "gals0", "gals1"}; +#define MTK_SMI_COM_REQ_CLK_NR 2 +#define MTK_SMI_COM_GALS_REQ_CLK_NRMTK_SMI_CLK_NR_MAX + Apart from that, Acked-By: AngeloGioacchino Del Regno Regards, - Angelo ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v4 03/13] memory: mtk-smi: Use clk_bulk clock ops
On 15/10/2021 15:38, AngeloGioacchino Del Regno wrote: >> Use clk_bulk interface instead of the orginal one to simplify the code. >> >> For SMI larbs: Require apb/smi clocks while gals is optional. >> For SMI common: Require apb/smi/gals0/gal1 in has_gals case. Otherwise, >> also only require apb/smi, No optional clk here. >> >> About the "has_gals" flag, for smi larbs, the gals clock also may be >> optional even this platform support it. thus it always use >> *_bulk_get_optional, then the flag has_gals is unnecessary. Remove it. >> The smi_common's has_gals still keep it. >> >> Also remove clk fail logs since bulk interface already output fail log. >> >> Signed-off-by: Yong Wu > > Hello Yong, > thanks for the patch! However, I have an improvement to point out: > >> --- >> drivers/memory/mtk-smi.c | 143 +++ >> 1 file changed, 55 insertions(+), 88 deletions(-) >> >> diff --git a/drivers/memory/mtk-smi.c b/drivers/memory/mtk-smi.c >> index c5fb51f73b34..f91eaf5c3ab0 100644 >> --- a/drivers/memory/mtk-smi.c >> +++ b/drivers/memory/mtk-smi.c >> @@ -60,6 +60,20 @@ enum mtk_smi_gen { >> MTK_SMI_GEN2 >> }; >> >> +#define MTK_SMI_CLK_NR_MAX 4 > > This refers to mtk_smi_common_clks[] and should be probably moved after that. > In any case, I don't think that there's any need to manually define this as 4, > as you can simply use the macro ARRAY_SIZE(mtk_smi_common_clks). > Using that will make you able to not update this definition everytime an > update > occurs to the mtk_smi_common_clks array. > >> + >> +/* larbs: Require apb/smi clocks while gals is optional. */ >> +static const char * const mtk_smi_larb_clks[] = {"apb", "smi", "gals"}; >> +#define MTK_SMI_LARB_REQ_CLK_NR 2 >> +#define MTK_SMI_LARB_OPT_CLK_NR 1 >> + >> +/* >> + * common: Require these four clocks in has_gals case. Otherwise, only >> apb/smi are required. >> + */ >> +static const char * const mtk_smi_common_clks[] = {"apb", "smi", "gals0", >> "gals1"}; >> +#define MTK_SMI_COM_REQ_CLK_NR 2 >> +#define MTK_SMI_COM_GALS_REQ_CLK_NR MTK_SMI_CLK_NR_MAX >> + > > Apart from that, > Acked-By: AngeloGioacchino Del Regno The patchset was merged around a month ago: https://lore.kernel.org/lkml/163229303729.7874.4095337797772755570.b4...@canonical.com/ Best regards, Krzysztof ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
RE: [RFC 01/20] iommu/iommufd: Add /dev/iommu core
> From: Jason Gunthorpe > Sent: Friday, October 15, 2021 7:18 PM > > On Fri, Oct 15, 2021 at 09:18:06AM +, Liu, Yi L wrote: > > > > Acquire from the xarray is > > >rcu_lock() > > >ioas = xa_load() > > >if (ioas) > > > if (down_read_trylock(>destroying_lock)) > > > > all good suggestions, will refine accordingly. Here destroying_lock is a > > rw_semaphore. right? Since down_read_trylock() accepts a rwsem. > > Yes, you probably need a sleeping lock got it. thanks, Regards, Yi Liu ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [RFC 01/20] iommu/iommufd: Add /dev/iommu core
On Fri, Oct 15, 2021 at 09:18:06AM +, Liu, Yi L wrote: > > Acquire from the xarray is > >rcu_lock() > >ioas = xa_load() > >if (ioas) > > if (down_read_trylock(>destroying_lock)) > > all good suggestions, will refine accordingly. Here destroying_lock is a > rw_semaphore. right? Since down_read_trylock() accepts a rwsem. Yes, you probably need a sleeping lock Jason ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [RFC 06/20] iommu: Add iommu_device_init[exit]_user_dma interfaces
On Fri, Oct 15, 2021 at 01:29:16AM +, Tian, Kevin wrote: > Hi, Jason, > > > From: Jason Gunthorpe > > Sent: Wednesday, September 29, 2021 8:59 PM > > > > On Wed, Sep 29, 2021 at 12:38:35AM +, Tian, Kevin wrote: > > > > > /* If set the driver must call iommu_XX as the first action in probe() or > > > * before it attempts to do DMA > > > */ > > > bool suppress_dma_owner:1; > > > > It is not "attempts to do DMA" but more "operates the physical device > > in any away" > > > > Not having ownership means another entity could be using user space > > DMA to manipulate the device state and attack the integrity of the > > kernel's programming of the device. > > > > Does suppress_kernel_dma sounds better than suppress_dma_owner? > We found the latter causing some confusion when doing internal > code review. Somehow this flag represents "don't claim the kernel dma > ownership during driver binding". suppress_dma_owner sounds the > entire ownership is disabled... If in doubt make it suppress_iommu_whatever_the_api_is_that_isn't_called > Another thing is about DMA_OWNER_SHARED, which is set to indicate > no dma at all. Thinking more we feel that this flag is meaningless. Its > sole purpose is to show compatibility to any USER/KERNEL ownership, > and essentially the same semantics as a device which is not bound to > any driver. So we plan to remove it then pci-stub just needs one line > change to set the suppress flag. But want to check with you first in case > any oversight. It sounds reasonable, but also makes it much harder to find the few places that have this special relationship - ie we can't grep for DMA_OWNER_SHARED anymore. Jason ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
RE: [RFC 07/20] iommu/iommufd: Add iommufd_[un]bind_device()
> From: Jason Gunthorpe > Sent: Wednesday, September 22, 2021 1:15 AM > > On Sun, Sep 19, 2021 at 02:38:35PM +0800, Liu Yi L wrote: > > > +/* > > + * A iommufd_device object represents the binding relationship > > + * between iommufd and device. It is created per a successful > > + * binding request from device driver. The bound device must be > > + * a physical device so far. Subdevice will be supported later > > + * (with additional PASID information). An user-assigned cookie > > + * is also recorded to mark the device in the /dev/iommu uAPI. > > + */ > > +struct iommufd_device { > > + unsigned int id; > > + struct iommufd_ctx *ictx; > > + struct device *dev; /* always be the physical device */ > > + u64 dev_cookie; > > }; > > > > static int iommufd_fops_open(struct inode *inode, struct file *filep) > > @@ -32,15 +52,58 @@ static int iommufd_fops_open(struct inode *inode, > struct file *filep) > > return -ENOMEM; > > > > refcount_set(>refs, 1); > > + mutex_init(>lock); > > + xa_init_flags(>device_xa, XA_FLAGS_ALLOC); > > filep->private_data = ictx; > > > > return ret; > > } > > > > +static void iommufd_ctx_get(struct iommufd_ctx *ictx) > > +{ > > + refcount_inc(>refs); > > +} > > See my earlier remarks about how to structure the lifetime logic, this > ref isn't necessary. > > > +static const struct file_operations iommufd_fops; > > + > > +/** > > + * iommufd_ctx_fdget - Acquires a reference to the internal iommufd > context. > > + * @fd: [in] iommufd file descriptor. > > + * > > + * Returns a pointer to the iommufd context, otherwise NULL; > > + * > > + */ > > +static struct iommufd_ctx *iommufd_ctx_fdget(int fd) > > +{ > > + struct fd f = fdget(fd); > > + struct file *file = f.file; > > + struct iommufd_ctx *ictx; > > + > > + if (!file) > > + return NULL; > > + > > + if (file->f_op != _fops) > > + return NULL; > > Leaks the fdget > > > + > > + ictx = file->private_data; > > + if (ictx) > > + iommufd_ctx_get(ictx); > > Use success oriented flow > > > + fdput(f); > > + return ictx; > > +} > > > + */ > > +struct iommufd_device *iommufd_bind_device(int fd, struct device *dev, > > + u64 dev_cookie) > > +{ > > + struct iommufd_ctx *ictx; > > + struct iommufd_device *idev; > > + unsigned long index; > > + unsigned int id; > > + int ret; > > + > > + ictx = iommufd_ctx_fdget(fd); > > + if (!ictx) > > + return ERR_PTR(-EINVAL); > > + > > + mutex_lock(>lock); > > + > > + /* check duplicate registration */ > > + xa_for_each(>device_xa, index, idev) { > > + if (idev->dev == dev || idev->dev_cookie == dev_cookie) { > > + idev = ERR_PTR(-EBUSY); > > + goto out_unlock; > > + } > > I can't think of a reason why this expensive check is needed. > > > + } > > + > > + idev = kzalloc(sizeof(*idev), GFP_KERNEL); > > + if (!idev) { > > + ret = -ENOMEM; > > + goto out_unlock; > > + } > > + > > + /* Establish the security context */ > > + ret = iommu_device_init_user_dma(dev, (unsigned long)ictx); > > + if (ret) > > + goto out_free; > > + > > + ret = xa_alloc(>device_xa, , idev, > > + XA_LIMIT(IOMMUFD_DEVID_MIN, > IOMMUFD_DEVID_MAX), > > + GFP_KERNEL); > > idev should be fully initialized before being placed in the xarray, so > this should be the last thing done. all good suggestions above. thanks for catching them. > Why not just use the standard xa_limit_32b instead of special single > use constants? yeah. should use xa_limit_32b. Regards, Yi Liu > Jason ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
RE: [RFC 01/20] iommu/iommufd: Add /dev/iommu core
> From: Jason Gunthorpe > Sent: Tuesday, September 21, 2021 11:42 PM > > On Sun, Sep 19, 2021 at 02:38:29PM +0800, Liu Yi L wrote: > > /dev/iommu aims to provide a unified interface for managing I/O address > > spaces for devices assigned to userspace. This patch adds the initial > > framework to create a /dev/iommu node. Each open of this node returns an > > iommufd. And this fd is the handle for userspace to initiate its I/O > > address space management. > > > > One open: > > - We call this feature as IOMMUFD in Kconfig in this RFC. However this > > name is not clear enough to indicate its purpose to user. Back to 2010 > > vfio even introduced a /dev/uiommu [1] as the predecessor of its > > container concept. Is that a better name? Appreciate opinions here. > > > > [1] > https://lore.kernel.org/kvm/4c0eb470.1hmjondo00nivfm6%25p...@cisco.co > m/ > > > > Signed-off-by: Liu Yi L > > drivers/iommu/Kconfig | 1 + > > drivers/iommu/Makefile | 1 + > > drivers/iommu/iommufd/Kconfig | 11 > > drivers/iommu/iommufd/Makefile | 2 + > > drivers/iommu/iommufd/iommufd.c | 112 > > > 5 files changed, 127 insertions(+) > > create mode 100644 drivers/iommu/iommufd/Kconfig > > create mode 100644 drivers/iommu/iommufd/Makefile > > create mode 100644 drivers/iommu/iommufd/iommufd.c > > > > diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig > > index 07b7c25cbed8..a83ce0acd09d 100644 > > +++ b/drivers/iommu/Kconfig > > @@ -136,6 +136,7 @@ config MSM_IOMMU > > > > source "drivers/iommu/amd/Kconfig" > > source "drivers/iommu/intel/Kconfig" > > +source "drivers/iommu/iommufd/Kconfig" > > > > config IRQ_REMAP > > bool "Support for Interrupt Remapping" > > diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile > > index c0fb0ba88143..719c799f23ad 100644 > > +++ b/drivers/iommu/Makefile > > @@ -29,3 +29,4 @@ obj-$(CONFIG_HYPERV_IOMMU) += hyperv-iommu.o > > obj-$(CONFIG_VIRTIO_IOMMU) += virtio-iommu.o > > obj-$(CONFIG_IOMMU_SVA_LIB) += iommu-sva-lib.o io-pgfault.o > > obj-$(CONFIG_SPRD_IOMMU) += sprd-iommu.o > > +obj-$(CONFIG_IOMMUFD) += iommufd/ > > diff --git a/drivers/iommu/iommufd/Kconfig > b/drivers/iommu/iommufd/Kconfig > > new file mode 100644 > > index ..9fb7769a815d > > +++ b/drivers/iommu/iommufd/Kconfig > > @@ -0,0 +1,11 @@ > > +# SPDX-License-Identifier: GPL-2.0-only > > +config IOMMUFD > > + tristate "I/O Address Space management framework for passthrough > devices" > > + select IOMMU_API > > + default n > > + help > > + provides unified I/O address space management framework for > > + isolating untrusted DMAs via devices which are passed through > > + to userspace drivers. > > + > > + If you don't know what to do here, say N. > > diff --git a/drivers/iommu/iommufd/Makefile > b/drivers/iommu/iommufd/Makefile > > new file mode 100644 > > index ..54381a01d003 > > +++ b/drivers/iommu/iommufd/Makefile > > @@ -0,0 +1,2 @@ > > +# SPDX-License-Identifier: GPL-2.0-only > > +obj-$(CONFIG_IOMMUFD) += iommufd.o > > diff --git a/drivers/iommu/iommufd/iommufd.c > b/drivers/iommu/iommufd/iommufd.c > > new file mode 100644 > > index ..710b7e62988b > > +++ b/drivers/iommu/iommufd/iommufd.c > > @@ -0,0 +1,112 @@ > > +// SPDX-License-Identifier: GPL-2.0-only > > +/* > > + * I/O Address Space Management for passthrough devices > > + * > > + * Copyright (C) 2021 Intel Corporation > > + * > > + * Author: Liu Yi L > > + */ > > + > > +#define pr_fmt(fmt)"iommufd: " fmt > > + > > +#include > > +#include > > +#include > > +#include > > +#include > > +#include > > +#include > > + > > +/* Per iommufd */ > > +struct iommufd_ctx { > > + refcount_t refs; > > +}; > > A private_data of a struct file should avoid having a refcount (and > this should have been a kref anyhow) > > Use the refcount on the struct file instead. > > In general the lifetime models look overly convoluted to me with > refcounts being used as locks and going in all manner of directions. > > - No refcount on iommufd_ctx, this should use the fget on the fd. > The driver facing version of the API has the driver holds a fget > inside the iommufd_device. > > - Put a rwlock inside the iommufd_ioas that is a > 'destroying_lock'. The rwlock starts out unlocked. > > Acquire from the xarray is >rcu_lock() >ioas = xa_load() >if (ioas) > if (down_read_trylock(>destroying_lock)) all good suggestions, will refine accordingly. Here destroying_lock is a rw_semaphore. right? Since down_read_trylock() accepts a rwsem. Thanks, Yi Liu >// success > Unacquire is just up_read() > > Do down_write when the ioas is to be destroyed, do not return ebusy. > > - Delete the iommufd_ctx->lock. Use RCU to protect load, erase/alloc does >not need locking (order it properly too, it is in the wrong order), and >don't check for duplicate devices or dev_cookie duplication, that