Re: Host-PCI-Device mapping

2021-10-15 Thread Ajay Garg
Never mind, found the answers in kvm_set_user_memory :)

On Fri, Oct 15, 2021 at 9:36 PM Ajay Garg  wrote:
>
> Hello everyone.
>
> I have a x86_64 L1 guest, running on a x86_64 host, with a
> host-pci-device attached to the guest.
> The host runs with IOMMU enabled, and passthrough enabled.
>
> Following are the addresses of the bar0-region of the pci-device, as
> per the output of lspci -v :
>
> * On host (hpa) => e2c2
> * On guest (gpa) => fc078000
>
>
> Now, if /proc//maps is dumped on the host, following line of
> interest is seen :
>
> #
> 7f0b5c5f4000-7f0b5c5f5000 rw-s e2c2 00:0e 13653
>   anon_inode:[vfio-device]
> #
>
> Above indicates that the hva for the pci-device starts from 0x7f0b5c5f4000.
>
>
> Also, upon attaching gdb to the qemu process, and using a slightly
> modified qemugdb/mtree.py (that prints only the information for
> :0a:00.0 name) to dump the memory-regions, following is obtained :
>
> #
> (gdb) source qemu-gdb.py
> (gdb) qemu mtree
> fc078000-fc07c095 :0a:00.0 base BAR 0 (I/O) (@
> 0x56540d8c8da0)
>   fc078000-fc07c095 :0a:00.0 BAR 0 (I/O) (@
> 0x56540d8c76b0)
> fc078000-fc07c095 :0a:00.0 BAR 0 mmaps[0]
> (I/O) (@ 0x56540d8c7c30)
> (gdb)
> #
>
> Above indicates that the hva for the pci-device starts from 0x56540d8c7c30.
>
> As seen, there is a discrepancy in the two results.
>
>
> What am I missing?
> Looking for pointers, will be grateful.
>
>
> Thanks and Regards,
> Ajay
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v8 04/12] iommu/mediatek: Add device_link between the consumer and the larb devices

2021-10-15 Thread Yong Wu
On Mon, 2021-10-11 at 14:36 +0200, Dafna Hirschfeld wrote:
> 
> On 29.09.21 03:37, Yong Wu wrote:
> > MediaTek IOMMU-SMI diagram is like below. all the consumer connect
> > with
> > smi-larb, then connect with smi-common.
> > 
> >  M4U
> >   |
> >  smi-common
> >   |
> >-
> >| |...
> >| |
> > larb1 larb2
> >| |
> > vdec   venc
> > 
> > When the consumer works, it should enable the smi-larb's power
> > which
> > also need enable the smi-common's power firstly.
> > 
> > Thus, First of all, use the device link connect the consumer and
> > the
> > smi-larbs. then add device link between the smi-larb and smi-
> > common.
> > 
> > This patch adds device_link between the consumer and the larbs.
> > 
> > When device_link_add, I add the flag DL_FLAG_STATELESS to avoid
> > calling
> > pm_runtime_xx to keep the original status of clocks. It can avoid
> > two
> > issues:
> > 1) Display HW show fastlogo abnormally reported in [1]. At the
> > beggining,
> > all the clocks are enabled before entering kernel, but the clocks
> > for
> > display HW(always in larb0) will be gated after clk_enable and
> > clk_disable
> > called from device_link_add(->pm_runtime_resume) and rpm_idle. The
> > clock
> > operation happened before display driver probe. At that time, the
> > display
> > HW will be abnormal.
> > 
> > 2) A deadlock issue reported in [2]. Use DL_FLAG_STATELESS to skip
> > pm_runtime_xx to avoid the deadlock.
> > 
> > Corresponding, DL_FLAG_AUTOREMOVE_CONSUMER can't be added, then
> > device_link_removed should be added explicitly.
> > 
> > [1] 
> > https://lore.kernel.org/linux-mediatek/1564213888.22908.4.camel@mhfsdcap03/
> > [2] https://lore.kernel.org/patchwork/patch/1086569/
> > 
> > Suggested-by: Tomasz Figa 
> > Signed-off-by: Yong Wu 
> > Tested-by: Frank Wunderlich  # BPI-
> > R2/MT7623
> > ---
> >   drivers/iommu/mtk_iommu.c| 22 ++
> >   drivers/iommu/mtk_iommu_v1.c | 20 +++-
> >   2 files changed, 41 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
> > index d5848f78a677..a2fa55899434 100644
> > --- a/drivers/iommu/mtk_iommu.c
> > +++ b/drivers/iommu/mtk_iommu.c
> > @@ -560,22 +560,44 @@ static struct iommu_device
> > *mtk_iommu_probe_device(struct device *dev)
> >   {
> > struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
> > struct mtk_iommu_data *data;
> > +   struct device_link *link;
> > +   struct device *larbdev;
> > +   unsigned int larbid;
> >   
> > if (!fwspec || fwspec->ops != _iommu_ops)
> > return ERR_PTR(-ENODEV); /* Not a iommu client device
> > */
> >   
> > data = dev_iommu_priv_get(dev);
> >   
> > +   /*
> > +* Link the consumer device with the smi-larb device(supplier)
> > +* The device in each a larb is a independent HW. thus only
> > link
> > +* one larb here.
> > +*/
> > +   larbid = MTK_M4U_TO_LARB(fwspec->ids[0]);
> 
> so larbid is always the same for all the ids of a device? 

Yes. For me, each a dtsi node should represent a HW unit which can only
connect one larb.

> If so maybe it worth testing it and return error if this is not the
> case.

Thanks for the suggestion. This is very helpful. I did see someone put
the different larbs in one node. I will check this, and add return
EINVAL for this case.

> 
> Thanks,
> Dafna
 
> > 
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Host-PCI-Device mapping

2021-10-15 Thread Ajay Garg
Hello everyone.

I have a x86_64 L1 guest, running on a x86_64 host, with a
host-pci-device attached to the guest.
The host runs with IOMMU enabled, and passthrough enabled.

Following are the addresses of the bar0-region of the pci-device, as
per the output of lspci -v :

* On host (hpa) => e2c2
* On guest (gpa) => fc078000


Now, if /proc//maps is dumped on the host, following line of
interest is seen :

#
7f0b5c5f4000-7f0b5c5f5000 rw-s e2c2 00:0e 13653
  anon_inode:[vfio-device]
#

Above indicates that the hva for the pci-device starts from 0x7f0b5c5f4000.


Also, upon attaching gdb to the qemu process, and using a slightly
modified qemugdb/mtree.py (that prints only the information for
:0a:00.0 name) to dump the memory-regions, following is obtained :

#
(gdb) source qemu-gdb.py
(gdb) qemu mtree
fc078000-fc07c095 :0a:00.0 base BAR 0 (I/O) (@
0x56540d8c8da0)
  fc078000-fc07c095 :0a:00.0 BAR 0 (I/O) (@
0x56540d8c76b0)
fc078000-fc07c095 :0a:00.0 BAR 0 mmaps[0]
(I/O) (@ 0x56540d8c7c30)
(gdb)
#

Above indicates that the hva for the pci-device starts from 0x56540d8c7c30.

As seen, there is a discrepancy in the two results.


What am I missing?
Looking for pointers, will be grateful.


Thanks and Regards,
Ajay
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v4 03/13] memory: mtk-smi: Use clk_bulk clock ops

2021-10-15 Thread AngeloGioacchino Del Regno

Il 15/10/21 15:43, Krzysztof Kozlowski ha scritto:

On 15/10/2021 15:38, AngeloGioacchino Del Regno wrote:

Use clk_bulk interface instead of the orginal one to simplify the code.

For SMI larbs: Require apb/smi clocks while gals is optional.
For SMI common: Require apb/smi/gals0/gal1 in has_gals case. Otherwise,
  also only require apb/smi, No optional clk here.

About the "has_gals" flag, for smi larbs, the gals clock also may be
optional even this platform support it. thus it always use
*_bulk_get_optional, then the flag has_gals is unnecessary. Remove it.
The smi_common's has_gals still keep it.

Also remove clk fail logs since bulk interface already output fail log.

Signed-off-by: Yong Wu 


Hello Yong,
thanks for the patch! However, I have an improvement to point out:


---
   drivers/memory/mtk-smi.c | 143 +++
   1 file changed, 55 insertions(+), 88 deletions(-)

diff --git a/drivers/memory/mtk-smi.c b/drivers/memory/mtk-smi.c
index c5fb51f73b34..f91eaf5c3ab0 100644
--- a/drivers/memory/mtk-smi.c
+++ b/drivers/memory/mtk-smi.c
@@ -60,6 +60,20 @@ enum mtk_smi_gen {
MTK_SMI_GEN2
   };
   
+#define MTK_SMI_CLK_NR_MAX			4


This refers to mtk_smi_common_clks[] and should be probably moved after that.
In any case, I don't think that there's any need to manually define this as 4,
as you can simply use the macro ARRAY_SIZE(mtk_smi_common_clks).
Using that will make you able to not update this definition everytime an update
occurs to the mtk_smi_common_clks array.


+
+/* larbs: Require apb/smi clocks while gals is optional. */
+static const char * const mtk_smi_larb_clks[] = {"apb", "smi", "gals"};
+#define MTK_SMI_LARB_REQ_CLK_NR2
+#define MTK_SMI_LARB_OPT_CLK_NR1
+
+/*
+ * common: Require these four clocks in has_gals case. Otherwise, only apb/smi 
are required.
+ */
+static const char * const mtk_smi_common_clks[] = {"apb", "smi", "gals0", 
"gals1"};
+#define MTK_SMI_COM_REQ_CLK_NR 2
+#define MTK_SMI_COM_GALS_REQ_CLK_NRMTK_SMI_CLK_NR_MAX
+


Apart from that,
Acked-By: AngeloGioacchino Del Regno 


The patchset was merged around a month ago:
https://lore.kernel.org/lkml/163229303729.7874.4095337797772755570.b4...@canonical.com/


Best regards,
Krzysztof



Whoops. Sorry for that.
I'll send a patch to address what I pointed out, unless the original author of
this series wants to.

Thanks,
- Angelo
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v4 03/13] memory: mtk-smi: Use clk_bulk clock ops

2021-10-15 Thread AngeloGioacchino Del Regno

Use clk_bulk interface instead of the orginal one to simplify the code.

For SMI larbs: Require apb/smi clocks while gals is optional.
For SMI common: Require apb/smi/gals0/gal1 in has_gals case. Otherwise,
 also only require apb/smi, No optional clk here.

About the "has_gals" flag, for smi larbs, the gals clock also may be
optional even this platform support it. thus it always use
*_bulk_get_optional, then the flag has_gals is unnecessary. Remove it.
The smi_common's has_gals still keep it.

Also remove clk fail logs since bulk interface already output fail log.

Signed-off-by: Yong Wu 


Hello Yong,
thanks for the patch! However, I have an improvement to point out:


---
  drivers/memory/mtk-smi.c | 143 +++
  1 file changed, 55 insertions(+), 88 deletions(-)

diff --git a/drivers/memory/mtk-smi.c b/drivers/memory/mtk-smi.c
index c5fb51f73b34..f91eaf5c3ab0 100644
--- a/drivers/memory/mtk-smi.c
+++ b/drivers/memory/mtk-smi.c
@@ -60,6 +60,20 @@ enum mtk_smi_gen {
MTK_SMI_GEN2
  };
  
+#define MTK_SMI_CLK_NR_MAX			4


This refers to mtk_smi_common_clks[] and should be probably moved after that.
In any case, I don't think that there's any need to manually define this as 4,
as you can simply use the macro ARRAY_SIZE(mtk_smi_common_clks).
Using that will make you able to not update this definition everytime an update
occurs to the mtk_smi_common_clks array.


+
+/* larbs: Require apb/smi clocks while gals is optional. */
+static const char * const mtk_smi_larb_clks[] = {"apb", "smi", "gals"};
+#define MTK_SMI_LARB_REQ_CLK_NR2
+#define MTK_SMI_LARB_OPT_CLK_NR1
+
+/*
+ * common: Require these four clocks in has_gals case. Otherwise, only apb/smi 
are required.
+ */
+static const char * const mtk_smi_common_clks[] = {"apb", "smi", "gals0", 
"gals1"};
+#define MTK_SMI_COM_REQ_CLK_NR 2
+#define MTK_SMI_COM_GALS_REQ_CLK_NRMTK_SMI_CLK_NR_MAX
+


Apart from that,
Acked-By: AngeloGioacchino Del Regno 

Regards,
- Angelo

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v4 03/13] memory: mtk-smi: Use clk_bulk clock ops

2021-10-15 Thread Krzysztof Kozlowski
On 15/10/2021 15:38, AngeloGioacchino Del Regno wrote:
>> Use clk_bulk interface instead of the orginal one to simplify the code.
>>
>> For SMI larbs: Require apb/smi clocks while gals is optional.
>> For SMI common: Require apb/smi/gals0/gal1 in has_gals case. Otherwise,
>>  also only require apb/smi, No optional clk here.
>>
>> About the "has_gals" flag, for smi larbs, the gals clock also may be
>> optional even this platform support it. thus it always use
>> *_bulk_get_optional, then the flag has_gals is unnecessary. Remove it.
>> The smi_common's has_gals still keep it.
>>
>> Also remove clk fail logs since bulk interface already output fail log.
>>
>> Signed-off-by: Yong Wu 
> 
> Hello Yong,
> thanks for the patch! However, I have an improvement to point out:
> 
>> ---
>>   drivers/memory/mtk-smi.c | 143 +++
>>   1 file changed, 55 insertions(+), 88 deletions(-)
>>
>> diff --git a/drivers/memory/mtk-smi.c b/drivers/memory/mtk-smi.c
>> index c5fb51f73b34..f91eaf5c3ab0 100644
>> --- a/drivers/memory/mtk-smi.c
>> +++ b/drivers/memory/mtk-smi.c
>> @@ -60,6 +60,20 @@ enum mtk_smi_gen {
>>  MTK_SMI_GEN2
>>   };
>>   
>> +#define MTK_SMI_CLK_NR_MAX  4
> 
> This refers to mtk_smi_common_clks[] and should be probably moved after that.
> In any case, I don't think that there's any need to manually define this as 4,
> as you can simply use the macro ARRAY_SIZE(mtk_smi_common_clks).
> Using that will make you able to not update this definition everytime an 
> update
> occurs to the mtk_smi_common_clks array.
> 
>> +
>> +/* larbs: Require apb/smi clocks while gals is optional. */
>> +static const char * const mtk_smi_larb_clks[] = {"apb", "smi", "gals"};
>> +#define MTK_SMI_LARB_REQ_CLK_NR 2
>> +#define MTK_SMI_LARB_OPT_CLK_NR 1
>> +
>> +/*
>> + * common: Require these four clocks in has_gals case. Otherwise, only 
>> apb/smi are required.
>> + */
>> +static const char * const mtk_smi_common_clks[] = {"apb", "smi", "gals0", 
>> "gals1"};
>> +#define MTK_SMI_COM_REQ_CLK_NR  2
>> +#define MTK_SMI_COM_GALS_REQ_CLK_NR MTK_SMI_CLK_NR_MAX
>> +
> 
> Apart from that,
> Acked-By: AngeloGioacchino Del Regno 

The patchset was merged around a month ago:
https://lore.kernel.org/lkml/163229303729.7874.4095337797772755570.b4...@canonical.com/


Best regards,
Krzysztof
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [RFC 01/20] iommu/iommufd: Add /dev/iommu core

2021-10-15 Thread Liu, Yi L
> From: Jason Gunthorpe 
> Sent: Friday, October 15, 2021 7:18 PM
> 
> On Fri, Oct 15, 2021 at 09:18:06AM +, Liu, Yi L wrote:
> 
> > >   Acquire from the xarray is
> > >rcu_lock()
> > >ioas = xa_load()
> > >if (ioas)
> > >   if (down_read_trylock(>destroying_lock))
> >
> > all good suggestions, will refine accordingly. Here destroying_lock is a
> > rw_semaphore. right? Since down_read_trylock() accepts a rwsem.
> 
> Yes, you probably need a sleeping lock

got it. thanks,

Regards,
Yi Liu
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [RFC 01/20] iommu/iommufd: Add /dev/iommu core

2021-10-15 Thread Jason Gunthorpe via iommu
On Fri, Oct 15, 2021 at 09:18:06AM +, Liu, Yi L wrote:

> >   Acquire from the xarray is
> >rcu_lock()
> >ioas = xa_load()
> >if (ioas)
> >   if (down_read_trylock(>destroying_lock))
> 
> all good suggestions, will refine accordingly. Here destroying_lock is a
> rw_semaphore. right? Since down_read_trylock() accepts a rwsem.

Yes, you probably need a sleeping lock

Jason
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [RFC 06/20] iommu: Add iommu_device_init[exit]_user_dma interfaces

2021-10-15 Thread Jason Gunthorpe via iommu
On Fri, Oct 15, 2021 at 01:29:16AM +, Tian, Kevin wrote:
> Hi, Jason,
> 
> > From: Jason Gunthorpe 
> > Sent: Wednesday, September 29, 2021 8:59 PM
> > 
> > On Wed, Sep 29, 2021 at 12:38:35AM +, Tian, Kevin wrote:
> > 
> > > /* If set the driver must call iommu_XX as the first action in probe() or
> > >   * before it attempts to do DMA
> > >   */
> > >  bool suppress_dma_owner:1;
> > 
> > It is not "attempts to do DMA" but more "operates the physical device
> > in any away"
> > 
> > Not having ownership means another entity could be using user space
> > DMA to manipulate the device state and attack the integrity of the
> > kernel's programming of the device.
> > 
> 
> Does suppress_kernel_dma sounds better than suppress_dma_owner?
> We found the latter causing some confusion when doing internal
> code review. Somehow this flag represents "don't claim the kernel dma
> ownership during driver binding". suppress_dma_owner sounds the
> entire ownership is disabled...

If in doubt make it

suppress_iommu_whatever_the_api_is_that_isn't_called

> Another thing is about DMA_OWNER_SHARED, which is set to indicate 
> no dma at all. Thinking more we feel that this flag is meaningless. Its
> sole purpose is to show compatibility to any USER/KERNEL ownership, 
> and essentially the same semantics as a device which is not bound to
> any driver. So we plan to remove it then pci-stub just needs one line 
> change to set the suppress flag. But want to check with you first in case
> any oversight.

It sounds reasonable, but also makes it much harder to find the few
places that have this special relationship - ie we can't grep for
DMA_OWNER_SHARED anymore.

Jason
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [RFC 07/20] iommu/iommufd: Add iommufd_[un]bind_device()

2021-10-15 Thread Liu, Yi L
> From: Jason Gunthorpe 
> Sent: Wednesday, September 22, 2021 1:15 AM
> 
> On Sun, Sep 19, 2021 at 02:38:35PM +0800, Liu Yi L wrote:
> 
> > +/*
> > + * A iommufd_device object represents the binding relationship
> > + * between iommufd and device. It is created per a successful
> > + * binding request from device driver. The bound device must be
> > + * a physical device so far. Subdevice will be supported later
> > + * (with additional PASID information). An user-assigned cookie
> > + * is also recorded to mark the device in the /dev/iommu uAPI.
> > + */
> > +struct iommufd_device {
> > +   unsigned int id;
> > +   struct iommufd_ctx *ictx;
> > +   struct device *dev; /* always be the physical device */
> > +   u64 dev_cookie;
> >  };
> >
> >  static int iommufd_fops_open(struct inode *inode, struct file *filep)
> > @@ -32,15 +52,58 @@ static int iommufd_fops_open(struct inode *inode,
> struct file *filep)
> > return -ENOMEM;
> >
> > refcount_set(>refs, 1);
> > +   mutex_init(>lock);
> > +   xa_init_flags(>device_xa, XA_FLAGS_ALLOC);
> > filep->private_data = ictx;
> >
> > return ret;
> >  }
> >
> > +static void iommufd_ctx_get(struct iommufd_ctx *ictx)
> > +{
> > +   refcount_inc(>refs);
> > +}
> 
> See my earlier remarks about how to structure the lifetime logic, this
> ref isn't necessary.
> 
> > +static const struct file_operations iommufd_fops;
> > +
> > +/**
> > + * iommufd_ctx_fdget - Acquires a reference to the internal iommufd
> context.
> > + * @fd: [in] iommufd file descriptor.
> > + *
> > + * Returns a pointer to the iommufd context, otherwise NULL;
> > + *
> > + */
> > +static struct iommufd_ctx *iommufd_ctx_fdget(int fd)
> > +{
> > +   struct fd f = fdget(fd);
> > +   struct file *file = f.file;
> > +   struct iommufd_ctx *ictx;
> > +
> > +   if (!file)
> > +   return NULL;
> > +
> > +   if (file->f_op != _fops)
> > +   return NULL;
> 
> Leaks the fdget
> 
> > +
> > +   ictx = file->private_data;
> > +   if (ictx)
> > +   iommufd_ctx_get(ictx);
> 
> Use success oriented flow
> 
> > +   fdput(f);
> > +   return ictx;
> > +}
> 
> > + */
> > +struct iommufd_device *iommufd_bind_device(int fd, struct device *dev,
> > +  u64 dev_cookie)
> > +{
> > +   struct iommufd_ctx *ictx;
> > +   struct iommufd_device *idev;
> > +   unsigned long index;
> > +   unsigned int id;
> > +   int ret;
> > +
> > +   ictx = iommufd_ctx_fdget(fd);
> > +   if (!ictx)
> > +   return ERR_PTR(-EINVAL);
> > +
> > +   mutex_lock(>lock);
> > +
> > +   /* check duplicate registration */
> > +   xa_for_each(>device_xa, index, idev) {
> > +   if (idev->dev == dev || idev->dev_cookie == dev_cookie) {
> > +   idev = ERR_PTR(-EBUSY);
> > +   goto out_unlock;
> > +   }
> 
> I can't think of a reason why this expensive check is needed.
> 
> > +   }
> > +
> > +   idev = kzalloc(sizeof(*idev), GFP_KERNEL);
> > +   if (!idev) {
> > +   ret = -ENOMEM;
> > +   goto out_unlock;
> > +   }
> > +
> > +   /* Establish the security context */
> > +   ret = iommu_device_init_user_dma(dev, (unsigned long)ictx);
> > +   if (ret)
> > +   goto out_free;
> > +
> > +   ret = xa_alloc(>device_xa, , idev,
> > +  XA_LIMIT(IOMMUFD_DEVID_MIN,
> IOMMUFD_DEVID_MAX),
> > +  GFP_KERNEL);
> 
> idev should be fully initialized before being placed in the xarray, so
> this should be the last thing done.

all good suggestions above. thanks for catching them.

> Why not just use the standard xa_limit_32b instead of special single
> use constants?

yeah. should use xa_limit_32b.

Regards,
Yi Liu

> Jason
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [RFC 01/20] iommu/iommufd: Add /dev/iommu core

2021-10-15 Thread Liu, Yi L
> From: Jason Gunthorpe 
> Sent: Tuesday, September 21, 2021 11:42 PM
> 
> On Sun, Sep 19, 2021 at 02:38:29PM +0800, Liu Yi L wrote:
> > /dev/iommu aims to provide a unified interface for managing I/O address
> > spaces for devices assigned to userspace. This patch adds the initial
> > framework to create a /dev/iommu node. Each open of this node returns an
> > iommufd. And this fd is the handle for userspace to initiate its I/O
> > address space management.
> >
> > One open:
> > - We call this feature as IOMMUFD in Kconfig in this RFC. However this
> >   name is not clear enough to indicate its purpose to user. Back to 2010
> >   vfio even introduced a /dev/uiommu [1] as the predecessor of its
> >   container concept. Is that a better name? Appreciate opinions here.
> >
> > [1]
> https://lore.kernel.org/kvm/4c0eb470.1hmjondo00nivfm6%25p...@cisco.co
> m/
> >
> > Signed-off-by: Liu Yi L 
> >  drivers/iommu/Kconfig   |   1 +
> >  drivers/iommu/Makefile  |   1 +
> >  drivers/iommu/iommufd/Kconfig   |  11 
> >  drivers/iommu/iommufd/Makefile  |   2 +
> >  drivers/iommu/iommufd/iommufd.c | 112
> 
> >  5 files changed, 127 insertions(+)
> >  create mode 100644 drivers/iommu/iommufd/Kconfig
> >  create mode 100644 drivers/iommu/iommufd/Makefile
> >  create mode 100644 drivers/iommu/iommufd/iommufd.c
> >
> > diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
> > index 07b7c25cbed8..a83ce0acd09d 100644
> > +++ b/drivers/iommu/Kconfig
> > @@ -136,6 +136,7 @@ config MSM_IOMMU
> >
> >  source "drivers/iommu/amd/Kconfig"
> >  source "drivers/iommu/intel/Kconfig"
> > +source "drivers/iommu/iommufd/Kconfig"
> >
> >  config IRQ_REMAP
> > bool "Support for Interrupt Remapping"
> > diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
> > index c0fb0ba88143..719c799f23ad 100644
> > +++ b/drivers/iommu/Makefile
> > @@ -29,3 +29,4 @@ obj-$(CONFIG_HYPERV_IOMMU) += hyperv-iommu.o
> >  obj-$(CONFIG_VIRTIO_IOMMU) += virtio-iommu.o
> >  obj-$(CONFIG_IOMMU_SVA_LIB) += iommu-sva-lib.o io-pgfault.o
> >  obj-$(CONFIG_SPRD_IOMMU) += sprd-iommu.o
> > +obj-$(CONFIG_IOMMUFD) += iommufd/
> > diff --git a/drivers/iommu/iommufd/Kconfig
> b/drivers/iommu/iommufd/Kconfig
> > new file mode 100644
> > index ..9fb7769a815d
> > +++ b/drivers/iommu/iommufd/Kconfig
> > @@ -0,0 +1,11 @@
> > +# SPDX-License-Identifier: GPL-2.0-only
> > +config IOMMUFD
> > +   tristate "I/O Address Space management framework for passthrough
> devices"
> > +   select IOMMU_API
> > +   default n
> > +   help
> > + provides unified I/O address space management framework for
> > + isolating untrusted DMAs via devices which are passed through
> > + to userspace drivers.
> > +
> > + If you don't know what to do here, say N.
> > diff --git a/drivers/iommu/iommufd/Makefile
> b/drivers/iommu/iommufd/Makefile
> > new file mode 100644
> > index ..54381a01d003
> > +++ b/drivers/iommu/iommufd/Makefile
> > @@ -0,0 +1,2 @@
> > +# SPDX-License-Identifier: GPL-2.0-only
> > +obj-$(CONFIG_IOMMUFD) += iommufd.o
> > diff --git a/drivers/iommu/iommufd/iommufd.c
> b/drivers/iommu/iommufd/iommufd.c
> > new file mode 100644
> > index ..710b7e62988b
> > +++ b/drivers/iommu/iommufd/iommufd.c
> > @@ -0,0 +1,112 @@
> > +// SPDX-License-Identifier: GPL-2.0-only
> > +/*
> > + * I/O Address Space Management for passthrough devices
> > + *
> > + * Copyright (C) 2021 Intel Corporation
> > + *
> > + * Author: Liu Yi L 
> > + */
> > +
> > +#define pr_fmt(fmt)"iommufd: " fmt
> > +
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +
> > +/* Per iommufd */
> > +struct iommufd_ctx {
> > +   refcount_t refs;
> > +};
> 
> A private_data of a struct file should avoid having a refcount (and
> this should have been a kref anyhow)
> 
> Use the refcount on the struct file instead.
> 
> In general the lifetime models look overly convoluted to me with
> refcounts being used as locks and going in all manner of directions.
> 
> - No refcount on iommufd_ctx, this should use the fget on the fd.
>   The driver facing version of the API has the driver holds a fget
>   inside the iommufd_device.
> 
> - Put a rwlock inside the iommufd_ioas that is a
>   'destroying_lock'. The rwlock starts out unlocked.
> 
>   Acquire from the xarray is
>rcu_lock()
>ioas = xa_load()
>if (ioas)
>   if (down_read_trylock(>destroying_lock))

all good suggestions, will refine accordingly. Here destroying_lock is a
rw_semaphore. right? Since down_read_trylock() accepts a rwsem.

Thanks,
Yi Liu

>// success
>   Unacquire is just up_read()
> 
>   Do down_write when the ioas is to be destroyed, do not return ebusy.
> 
>  - Delete the iommufd_ctx->lock. Use RCU to protect load, erase/alloc does
>not need locking (order it properly too, it is in the wrong order), and
>don't check for duplicate devices or dev_cookie duplication, that