Re: [PATCH 01/21] dt-binding: memory: mediatek: Add a common larb-port header file

2020-07-12 Thread Pi-Hsun Shih
On Mon, Jul 13, 2020 at 2:06 AM Matthias Brugger  wrote:
>
>
>
> On 11/07/2020 08:48, Yong Wu wrote:
> > Put all the macros about smi larb/port together; this is a preparatory
> > patch for extending LARB_NR and adding new dom-id support.
> >
> > Signed-off-by: Yong Wu 
> > ---
> >   include/dt-bindings/memory/mt2712-larb-port.h  |  2 +-
> >   include/dt-bindings/memory/mt6779-larb-port.h  |  2 +-
> >   include/dt-bindings/memory/mt8173-larb-port.h  |  2 +-
> >   include/dt-bindings/memory/mt8183-larb-port.h  |  2 +-
> >   include/dt-bindings/memory/mtk-smi-larb-port.h | 15 +++
> >   5 files changed, 19 insertions(+), 4 deletions(-)
> >   create mode 100644 include/dt-bindings/memory/mtk-smi-larb-port.h
> >
> > ...
> > diff --git a/include/dt-bindings/memory/mtk-smi-larb-port.h 
> > b/include/dt-bindings/memory/mtk-smi-larb-port.h
> > new file mode 100644
> > index ..2ec7fe5ce4e9
> > --- /dev/null
> > +++ b/include/dt-bindings/memory/mtk-smi-larb-port.h
> > @@ -0,0 +1,15 @@
> > +/* SPDX-License-Identifier: GPL-2.0-only */
> > +/*
> > + * Copyright (c) 2020 MediaTek Inc.
> > + * Author: Yong Wu 
> > + */
> > +#ifndef __DTS_MTK_IOMMU_PORT_H_
> > +#define __DTS_MTK_IOMMU_PORT_H_
> > +
> > +#define MTK_LARB_NR_MAX  16
>
> include/soc/mediatek/smi.h has the very same define.
> Should smi.h include this file?
>
> Regards,
> Matthias
>

Looks like this is being addressed in patch 5 in this series ([05/21]
iommu/mediatek: Use the common mtk-smi-larb-port.h)
That said, should that patch be merged into this one?



> > +
> > +#define MTK_M4U_ID(larb, port)   (((larb) << 5) | (port))
> > +#define MTK_M4U_TO_LARB(id)  (((id) >> 5) & 0xf)
> > +#define MTK_M4U_TO_PORT(id)  ((id) & 0x1f)
> > +
> > +#endif
> >
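
For reference, the larb/port encoding these macros implement can be checked
standalone; a small illustrative snippet (not part of the patch):

#include <assert.h>

#define MTK_M4U_ID(larb, port)	(((larb) << 5) | (port))
#define MTK_M4U_TO_LARB(id)	(((id) >> 5) & 0xf)
#define MTK_M4U_TO_PORT(id)	((id) & 0x1f)

int main(void)
{
	int id = MTK_M4U_ID(2, 3);		/* (2 << 5) | 3 == 67 */

	assert(MTK_M4U_TO_LARB(id) == 2);	/* larb in bits [8:5] */
	assert(MTK_M4U_TO_PORT(id) == 3);	/* port in bits [4:0] */
	return 0;
}
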
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 04/21] dt-binding: mediatek: Add binding for mt8192 IOMMU and SMI

2020-07-12 Thread Pi-Hsun Shih
On Sat, Jul 11, 2020 at 2:50 PM Yong Wu  wrote:
>
> This patch adds descriptions for mt8192 IOMMU and SMI.
>
> mt8192 is also MTK IOMMU gen2, which uses the ARM Short-Descriptor
> translation table format. The M4U-SMI HW diagram is as below:
>
>               EMI
>                |
>               M4U
>                |
>           ------------
>            SMI Common
>           ------------
>                |
>    +-------+------+------+------+-------+
>    |       |      |      |  ... |       |
>    |       |      |      |      |       |
>  larb0   larb1  larb2  larb4 .. larb19  larb20
>  disp0   disp1   mdp   vdec     IPE     IPE
>
> All the connections are fixed in HW; SW cannot adjust them.
>
> mt8192 M4U supports a 0~16GB iova range. We preassign different engines
> into different iova ranges:
>
> domain-id  module   iova-range                 larbs
>     0      disp     0 ~ 4G                     larb0/1
>     1      vcodec   4G ~ 8G                    larb4/5/7
>     2      cam/mdp  8G ~ 12G                   larb2/9/11/13/14/16/17/18/19/20
>     3      CCU0     0x4000_0000 ~ 0x43ff_ffff  larb13: port 9/10
>     4      CCU1     0x4400_0000 ~ 0x47ff_ffff  larb14: port 4/5
>
> The iova range for CCU0/1 (camera control unit) is a HW requirement.
>
> Signed-off-by: Yong Wu 
> ---
>  .../bindings/iommu/mediatek,iommu.txt |   8 +-
>  .../mediatek,smi-common.txt   |   5 +-
>  .../memory-controllers/mediatek,smi-larb.txt  |   3 +-
>  include/dt-bindings/memory/mt8192-larb-port.h | 237 ++
>  4 files changed, 247 insertions(+), 6 deletions(-)
>  create mode 100644 include/dt-bindings/memory/mt8192-larb-port.h
> ...
> diff --git a/include/dt-bindings/memory/mt8192-larb-port.h 
> b/include/dt-bindings/memory/mt8192-larb-port.h
> new file mode 100644
> index ..fbe0d5d50f1c
> --- /dev/null
> +++ b/include/dt-bindings/memory/mt8192-larb-port.h
> ...
> +/* larb7 */
> +#define M4U_PORT_L7_VENC_RCPU          MTK_M4U_DOM_ID(1, 7, 0)
> +#define M4U_PORT_L7_VENC_REC           MTK_M4U_DOM_ID(1, 7, 1)
> +#define M4U_PORT_L7_VENC_BSDMA         MTK_M4U_DOM_ID(1, 7, 2)
> +#define M4U_PORT_L7_VENC_SV_COMV       MTK_M4U_DOM_ID(1, 7, 3)
> +#define M4U_PORT_L7_VENC_RD_COMV       MTK_M4U_DOM_ID(1, 7, 4)
> +#define M4U_PORT_L7_VENC_CUR_LUMA      MTK_M4U_DOM_ID(1, 7, 5)
> +#define M4U_PORT_L7_VENC_CUR_CHROMA    MTK_M4U_DOM_ID(1, 7, 6)
> +#define M4U_PORT_L7_VENC_REF_LUMA      MTK_M4U_DOM_ID(1, 7, 7)
> +#define M4U_PORT_L7_VENC_REF_CHROMA    MTK_M4U_DOM_ID(1, 7, 8)
> +#define M4U_PORT_L7_JPGENC_Y_RDMA      MTK_M4U_DOM_ID(1, 7, 9)
> +#define M4U_PORT_L7_JPGENC_Q_RDMA      MTK_M4U_DOM_ID(1, 7, 10)
> +#define M4U_PORT_L7_JPGENC_C_TABLE     MTK_M4U_DOM_ID(1, 7, 11)
> +#define M4U_PORT_L7_JPGENC_BSDMA       MTK_M4U_DOM_ID(1, 7, 12)
> +#define M4U_PORT_L7_VENC_SUB_R_LUMA    MTK_M4U_DOM_ID(1, 7, 13)
> +#define M4U_PORT_L7_VENC_SUB_W_LUMA    MTK_M4U_DOM_ID(1, 7, 14)
> +

Small nit, /* larb8: null */ is missing here.

> +/* larb9 */
> +#define M4U_PORT_L9_IMG_IMGI_D1        MTK_M4U_DOM_ID(2, 9, 0)
> +#define M4U_PORT_L9_IMG_IMGBI_D1       MTK_M4U_DOM_ID(2, 9, 1)
> +#define M4U_PORT_L9_IMG_DMGI_D1        MTK_M4U_DOM_ID(2, 9, 2)
> +#define M4U_PORT_L9_IMG_DEPI_D1        MTK_M4U_DOM_ID(2, 9, 3)
> +#define M4U_PORT_L9_IMG_ICE_D1         MTK_M4U_DOM_ID(2, 9, 4)
> +#define M4U_PORT_L9_IMG_SMTI_D1        MTK_M4U_DOM_ID(2, 9, 5)
> +#define M4U_PORT_L9_IMG_SMTO_D2        MTK_M4U_DOM_ID(2, 9, 6)
> +#define M4U_PORT_L9_IMG_SMTO_D1        MTK_M4U_DOM_ID(2, 9, 7)
> +#define M4U_PORT_L9_IMG_CRZO_D1        MTK_M4U_DOM_ID(2, 9, 8)
> +#define M4U_PORT_L9_IMG_IMG3O_D1       MTK_M4U_DOM_ID(2, 9, 9)
> +#define M4U_PORT_L9_IMG_VIPI_D1        MTK_M4U_DOM_ID(2, 9, 10)
> +#define M4U_PORT_L9_IMG_SMTI_D5        MTK_M4U_DOM_ID(2, 9, 11)
> +#define M4U_PORT_L9_IMG_TIMGO_D1       MTK_M4U_DOM_ID(2, 9, 12)
> +#define M4U_PORT_L9_IMG_UFBC_W0        MTK_M4U_DOM_ID(2, 9, 13)
> +#define M4U_PORT_L9_IMG_UFBC_R0        MTK_M4U_DOM_ID(2, 9, 14)
> +
> ...
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 4/5] dma-mapping: add a dma_ops_bypass flag to struct device

2020-07-12 Thread Alexey Kardashevskiy



On 09/07/2020 01:24, Christoph Hellwig wrote:
> Several IOMMU drivers have a bypass mode where they can use a direct
> mapping if the device's DMA mask is large enough.  Add generic support
> to the core dma-mapping code to do that to switch those drivers to
> a common solution.
> 
> Signed-off-by: Christoph Hellwig 
> ---
>  include/linux/device.h |  8 +
>  kernel/dma/Kconfig |  8 +
>  kernel/dma/mapping.c   | 74 +-
>  3 files changed, 68 insertions(+), 22 deletions(-)
> 
> diff --git a/include/linux/device.h b/include/linux/device.h
> index 4c4af98321ebd6..1f71acf37f78d7 100644
> --- a/include/linux/device.h
> +++ b/include/linux/device.h
> @@ -523,6 +523,11 @@ struct dev_links_info {
>   * sync_state() callback.
>   * @dma_coherent: this particular device is dma coherent, even if the
>   *   architecture supports non-coherent devices.
> + * @dma_ops_bypass: If set to %true then the dma_ops are bypassed for the
> + *   streaming DMA operations (->map_* / ->unmap_* / ->sync_*),
> + *   and optionall (if the coherent mask is large enough) also


s/optionall/optionally/g

Otherwise the series looks good and works well on powernv and pseries.
Thanks,



-- 
Alexey
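
For archive readers: reduced to its essence, the fast-path decision the new
flag enables looks like the sketch below. The helper name and the exact mask
comparison are simplifications of what the patch adds in kernel/dma/mapping.c
(the real code also honors bus_dma_limit):

#include <linux/device.h>
#include <linux/dma-direct.h>

/* simplified sketch of the streaming-DMA bypass decision */
static bool dma_goes_direct(struct device *dev, u64 mask)
{
	if (likely(!dev->dma_ops))
		return true;	/* no dma_ops installed: always direct */

	/* new in this patch: bypass installed ops if the mask is large enough */
	return dev->dma_ops_bypass &&
	       mask >= dma_direct_get_required_mask(dev);
}
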
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH v3 0/2] make dma_alloc_coherent NUMA-aware by per-NUMA CMA

2020-07-12 Thread Song Bao Hua (Barry Song)



> -Original Message-
> From: Song Bao Hua (Barry Song)
> Sent: Sunday, June 28, 2020 11:13 PM
> To: h...@lst.de; m.szyprow...@samsung.com; robin.mur...@arm.com;
> w...@kernel.org; ganapatrao.kulka...@cavium.com;
> catalin.mari...@arm.com
> Cc: iommu@lists.linux-foundation.org; Linuxarm ;
> linux-arm-ker...@lists.infradead.org; linux-ker...@vger.kernel.org; Song Bao
> Hua (Barry Song) 
> Subject: [PATCH v3 0/2] make dma_alloc_coherent NUMA-aware by
> per-NUMA CMA
> 
> Ganapatrao Kulkarni has put some effort into making arm-smmu-v3 use local
> memory to save command queues[1]. I also did a similar job in patch
> "iommu/arm-smmu-v3: allocate the memory of queues in local numa node"
> [2] without realizing Ganapatrao had done that before.
> 
> But it seems much better to make dma_alloc_coherent() inherently
> NUMA-aware on NUMA-capable systems.
> 
> Right now, smmu is using dma_alloc_coherent() to get memory to save queues
> and tables. Typically, on ARM64 server, there is a default CMA located at
> node0, which could be far away from node2, node3 etc.
> Saving queues and tables remotely will increase the latency of ARM SMMU
> significantly. For example, when SMMU is at node2 and the default global
> CMA is at node0, after sending a CMD_SYNC in an empty command queue, we
> have to wait more than 550ns for the completion of the command
> CMD_SYNC.
> However, if we save them locally, we only need to wait for 240ns.
> 
> With per-NUMA CMA, SMMU will get memory from the local NUMA node to save
> command queues and page tables. That means dma_unmap latency will be
> reduced significantly.
> 
> Meanwhile, when iommu.passthrough is on, device drivers which call dma_
> alloc_coherent() will also get local memory and avoid the traffic between
> NUMA nodes.
> 
> [1]
> https://lists.linuxfoundation.org/pipermail/iommu/2017-October/024455.htm
> l
> [2] https://www.spinics.net/lists/iommu/msg44767.html
> 
> -v3:
>   * move to use page_to_nid() while freeing cma with respect to Robin's
>   comment, but this will only work after applying my below patch:
>   "mm/cma.c: use exact_nid true to fix possible per-numa cma leak"
>   https://marc.info/?l=linux-mm=159333034726647=2
> 
>   * handle the case count <= 1 more properly according to Robin's
>   comment;
> 
>   * add pernuma_cma parameter to support dynamic setting of per-numa
>   cma size;
>   ideally we can leverage the CMA_SIZE_MBYTES, CMA_SIZE_PERCENTAGE and
>   "cma=" kernel parameter and avoid a new paramter separately for per-
>   numa cma. Practically, it is really too complicated considering the
>   below problems:
>   (1) if we leverage the size of default numa for per-numa, we have to
>   avoid creating two cma with same size in node0 since default cma is
>   probably on node0.
>   (2) default cma can consider the address limitation for old devices
>   while per-numa cma doesn't support GFP_DMA and GFP_DMA32. all
>   allocations with limitation flags will fallback to default one.
>   (3) hard to apply CMA_SIZE_PERCENTAGE to per-numa. it is hard to
>   decide if the percentage should apply to the whole memory size
>   or only apply to the memory size of a specific numa node.
>   (4) default cma size has CMA_SIZE_SEL_MIN and CMA_SIZE_SEL_MAX, it
>   makes things even more complicated to per-numa cma.
> 
>   I haven't figured out a good way to leverage the size of the default cma
>   for per-numa cma. It seems a separate parameter for per-numa could
>   make life easier.
> 
>   * move dma_pernuma_cma_reserve() after hugetlb_cma_reserve() to
>   reuse the comment before hugetlb_cma_reserve() with respect to
>   Robin's comment
> 
> -v2:
>   * fix some issues reported by kernel test robot
  *) fallback to default cma while allocation fails in per-numa cma and
     free memory properly
> 
> Barry Song (2):
>   dma-direct: provide the ability to reserve per-numa CMA
>   arm64: mm: reserve per-numa CMA to localize coherent dma buffers
> 
>  .../admin-guide/kernel-parameters.txt |  9 ++
>  arch/arm64/mm/init.c  |  2 +
>  include/linux/dma-contiguous.h|  4 +
>  kernel/dma/Kconfig| 10 ++
>  kernel/dma/contiguous.c   | 98
> +--
>  5 files changed, 114 insertions(+), 9 deletions(-)

Gentle ping :-)

Thanks
Barry
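
As a usage note: with this series applied and the new boot parameter set
(e.g. pernuma_cma=16M; spelling per this cover letter, the final upstream
name may differ), consumers need no changes. A sketch of what a driver does,
unchanged:

#include <linux/device.h>
#include <linux/dma-mapping.h>

/*
 * With per-NUMA CMA reserved, this allocation is served from the CMA
 * area of dev_to_node(dev) when one exists, falling back to the default
 * global CMA otherwise.
 */
static void *alloc_smmu_queue(struct device *dev, size_t size,
			      dma_addr_t *dma_handle)
{
	return dma_alloc_coherent(dev, size, dma_handle, GFP_KERNEL);
}
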

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH] xen: introduce xen_vring_use_dma

2020-07-12 Thread Peng Fan
> Subject: Re: [PATCH] xen: introduce xen_vring_use_dma
> 
> Sorry for the late reply -- a couple of conferences kept me busy.
> 
> 
> On Wed, 1 Jul 2020, Michael S. Tsirkin wrote:
> > On Wed, Jul 01, 2020 at 10:34:53AM -0700, Stefano Stabellini wrote:
> > > Would you be in favor of a more flexible check along the lines of
> > > the one proposed in the patch that started this thread:
> > >
> > > if (xen_vring_use_dma())
> > > return true;
> > >
> > >
> > > xen_vring_use_dma would be implemented so that it returns true when
> > > xen_swiotlb is required and false otherwise.
> >
> > Just to stress - with a patch like this virtio can *still* use DMA API
> > if PLATFORM_ACCESS is set. So if DMA API is broken on some platforms
> > as you seem to be saying, you guys should fix it before doing
> > something like this..
> 
> Yes, DMA API is broken with some interfaces (specifically: rpmsg and trusty),
> but for them PLATFORM_ACCESS is never set. That is why the errors weren't
> reported before. Xen special case aside, there is no problem under normal
> circumstances.
> 
> 
> If you are OK with this patch (after a little bit of clean-up), Peng, are
> you OK with sending an update or do you want me to?

If you could help, that would be great. You have more expertise in knowing
the whole picture.

Thanks,
Peng.
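
For context, the patch under discussion amounts to the following shape in
drivers/virtio/virtio_ring.c (a sketch: xen_vring_use_dma() is the proposed
helper, replacing today's blanket xen_domain() check; it would return true
only when swiotlb-xen is actually required):

#include <linux/virtio_config.h>

static bool vring_use_dma_api(struct virtio_device *vdev)
{
	/* VIRTIO_F_IOMMU_PLATFORM negotiated: always use the DMA API */
	if (!virtio_has_iommu_quirk(vdev))
		return true;

	/* proposed: only when Xen really needs swiotlb-xen, rather than
	 * for every Xen domain */
	if (xen_vring_use_dma())
		return true;

	return false;
}
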

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 06/21] iommu/io-pgtable-arm-v7s: Use ias to check the valid iova in unmap

2020-07-12 Thread Nicolas Boichat
On Sat, Jul 11, 2020 at 2:50 PM Yong Wu  wrote:
>
> As title.
>
> Signed-off-by: Yong Wu 
> ---
>  drivers/iommu/io-pgtable-arm-v7s.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/iommu/io-pgtable-arm-v7s.c 
> b/drivers/iommu/io-pgtable-arm-v7s.c
> index 4272fe4e17f4..01f2a8876808 100644
> --- a/drivers/iommu/io-pgtable-arm-v7s.c
> +++ b/drivers/iommu/io-pgtable-arm-v7s.c
> @@ -717,7 +717,7 @@ static size_t arm_v7s_unmap(struct io_pgtable_ops *ops, 
> unsigned long iova,
>  {
> struct arm_v7s_io_pgtable *data = io_pgtable_ops_to_data(ops);
>
> -   if (WARN_ON(upper_32_bits(iova)))
> +   if (WARN_ON(iova >= (1ULL << data->iop.cfg.ias)))

This is a little odd as iova is unsigned long and 1ULL is unsigned long long.

Would it be better to keep the spirit of the previous test and do
something like:
 if (WARN_ON(iova >> data->iop.cfg.ias)) ?

> return 0;
>
> return __arm_v7s_unmap(data, gather, iova, size, 1, data->pgd);
> --
> 2.18.0
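
For the record, the two guards agree whenever ias is below the bit width of
unsigned long; a small standalone sketch of both forms:

#include <assert.h>

/* form used in the patch: 1ULL forces a 64-bit shift, so it stays
 * defined even when ias equals the width of unsigned long */
static int oob_shift(unsigned long iova, unsigned int ias)
{
	return iova >= (1ULL << ias);
}

/* form suggested above: shorter, but (iova >> ias) is undefined in C
 * when ias equals the full width of unsigned long */
static int oob_rshift(unsigned long iova, unsigned int ias)
{
	return (iova >> ias) != 0;
}

int main(void)
{
	assert(oob_shift(0x1000, 12) && oob_rshift(0x1000, 12));
	assert(!oob_shift(0xfff, 12) && !oob_rshift(0xfff, 12));
	return 0;
}
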
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 01/21] dt-binding: memory: mediatek: Add a common larb-port header file

2020-07-12 Thread Matthias Brugger




On 11/07/2020 08:48, Yong Wu wrote:

Put all the macros about smi larb/port together; this is a preparatory
patch for extending LARB_NR and adding new dom-id support.

Signed-off-by: Yong Wu 
---
  include/dt-bindings/memory/mt2712-larb-port.h  |  2 +-
  include/dt-bindings/memory/mt6779-larb-port.h  |  2 +-
  include/dt-bindings/memory/mt8173-larb-port.h  |  2 +-
  include/dt-bindings/memory/mt8183-larb-port.h  |  2 +-
  include/dt-bindings/memory/mtk-smi-larb-port.h | 15 +++
  5 files changed, 19 insertions(+), 4 deletions(-)
  create mode 100644 include/dt-bindings/memory/mtk-smi-larb-port.h

diff --git a/include/dt-bindings/memory/mt2712-larb-port.h 
b/include/dt-bindings/memory/mt2712-larb-port.h
index 6f9aa7349cef..b6b2c6bf4459 100644
--- a/include/dt-bindings/memory/mt2712-larb-port.h
+++ b/include/dt-bindings/memory/mt2712-larb-port.h
@@ -6,7 +6,7 @@
  #ifndef __DTS_IOMMU_PORT_MT2712_H
  #define __DTS_IOMMU_PORT_MT2712_H
  
-#define MTK_M4U_ID(larb, port)		(((larb) << 5) | (port))
+#include <dt-bindings/memory/mtk-smi-larb-port.h>
  
  #define M4U_LARB0_ID			0
  #define M4U_LARB1_ID			1
diff --git a/include/dt-bindings/memory/mt6779-larb-port.h 
b/include/dt-bindings/memory/mt6779-larb-port.h
index 2ad0899fbf2f..60f57f54393e 100644
--- a/include/dt-bindings/memory/mt6779-larb-port.h
+++ b/include/dt-bindings/memory/mt6779-larb-port.h
@@ -7,7 +7,7 @@
  #ifndef _DTS_IOMMU_PORT_MT6779_H_
  #define _DTS_IOMMU_PORT_MT6779_H_
  
-#define MTK_M4U_ID(larb, port)		 (((larb) << 5) | (port))
+#include <dt-bindings/memory/mtk-smi-larb-port.h>
  
  #define M4U_LARB0_ID			 0
  #define M4U_LARB1_ID			 1
diff --git a/include/dt-bindings/memory/mt8173-larb-port.h 
b/include/dt-bindings/memory/mt8173-larb-port.h
index 9f31ccfeca21..d8c99c946053 100644
--- a/include/dt-bindings/memory/mt8173-larb-port.h
+++ b/include/dt-bindings/memory/mt8173-larb-port.h
@@ -6,7 +6,7 @@
  #ifndef __DTS_IOMMU_PORT_MT8173_H
  #define __DTS_IOMMU_PORT_MT8173_H
  
-#define MTK_M4U_ID(larb, port)		(((larb) << 5) | (port))
+#include <dt-bindings/memory/mtk-smi-larb-port.h>
  
  #define M4U_LARB0_ID			0
  #define M4U_LARB1_ID			1
diff --git a/include/dt-bindings/memory/mt8183-larb-port.h 
b/include/dt-bindings/memory/mt8183-larb-port.h
index 2c579f305162..275c095a6fd6 100644
--- a/include/dt-bindings/memory/mt8183-larb-port.h
+++ b/include/dt-bindings/memory/mt8183-larb-port.h
@@ -6,7 +6,7 @@
  #ifndef __DTS_IOMMU_PORT_MT8183_H
  #define __DTS_IOMMU_PORT_MT8183_H
  
-#define MTK_M4U_ID(larb, port)		(((larb) << 5) | (port))
+#include <dt-bindings/memory/mtk-smi-larb-port.h>
  
  #define M4U_LARB0_ID			0
  #define M4U_LARB1_ID			1
diff --git a/include/dt-bindings/memory/mtk-smi-larb-port.h 
b/include/dt-bindings/memory/mtk-smi-larb-port.h
new file mode 100644
index ..2ec7fe5ce4e9
--- /dev/null
+++ b/include/dt-bindings/memory/mtk-smi-larb-port.h
@@ -0,0 +1,15 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (c) 2020 MediaTek Inc.
+ * Author: Yong Wu 
+ */
+#ifndef __DTS_MTK_IOMMU_PORT_H_
+#define __DTS_MTK_IOMMU_PORT_H_
+
+#define MTK_LARB_NR_MAX		16


include/soc/mediatek/smi.h has the very same define.
Should smi.h include this file?

Regards,
Matthias


+
+#define MTK_M4U_ID(larb, port) (((larb) << 5) | (port))
+#define MTK_M4U_TO_LARB(id)	(((id) >> 5) & 0xf)
+#define MTK_M4U_TO_PORT(id)	((id) & 0x1f)
+
+#endif


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v5 07/15] vfio/type1: Add VFIO_IOMMU_PASID_REQUEST (alloc/free)

2020-07-12 Thread Liu Yi L
This patch allows user space to request PASID allocation/free, e.g. when
serving requests from the guest.

PASIDs that are not freed by userspace are automatically freed when the
IOASID set is destroyed on process exit.

Cc: Kevin Tian 
CC: Jacob Pan 
Cc: Alex Williamson 
Cc: Eric Auger 
Cc: Jean-Philippe Brucker 
Cc: Joerg Roedel 
Cc: Lu Baolu 
Signed-off-by: Liu Yi L 
Signed-off-by: Yi Sun 
Signed-off-by: Jacob Pan 
---
v4 -> v5:
*) address comments from Eric Auger.
*) the comments for the PASID_FREE request are addressed in patch 5/15 of
   this series.

v3 -> v4:
*) address comments from v3, except the below comment against the range
   of PASID_FREE request. needs more help on it.
"> +if (req.range.min > req.range.max)

Is it exploitable that a user can spin the kernel for a long time in
the case of a free by calling this with [0, MAX_UINT] regardless of
their actual allocations?"
https://lore.kernel.org/linux-iommu/20200702151832.048b4...@x1.home/

v1 -> v2:
*) move the vfio_mm related code to be a separate module
*) use a single structure for alloc/free, could support a range of PASIDs
*) fetch vfio_mm at group_attach time instead of at iommu driver open time
---
 drivers/vfio/Kconfig|  1 +
 drivers/vfio/vfio_iommu_type1.c | 85 +
 drivers/vfio/vfio_pasid.c   | 10 +
 include/linux/vfio.h|  6 +++
 include/uapi/linux/vfio.h   | 37 ++
 5 files changed, 139 insertions(+)

diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig
index 3d8a108..95d90c6 100644
--- a/drivers/vfio/Kconfig
+++ b/drivers/vfio/Kconfig
@@ -2,6 +2,7 @@
 config VFIO_IOMMU_TYPE1
tristate
depends on VFIO
+   select VFIO_PASID if (X86)
default n
 
 config VFIO_IOMMU_SPAPR_TCE
diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index ed80104..55b4065 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -76,6 +76,7 @@ struct vfio_iommu {
	bool				dirty_page_tracking;
	bool				pinned_page_dirty_scope;
	struct iommu_nesting_info	*nesting_info;
+	struct vfio_mm			*vmm;
 };
 
 struct vfio_domain {
@@ -1937,6 +1938,11 @@ static void vfio_iommu_iova_insert_copy(struct 
vfio_iommu *iommu,
 
 static void vfio_iommu_release_nesting_info(struct vfio_iommu *iommu)
 {
+   if (iommu->vmm) {
+   vfio_mm_put(iommu->vmm);
+   iommu->vmm = NULL;
+   }
+
kfree(iommu->nesting_info);
iommu->nesting_info = NULL;
 }
@@ -2071,6 +2077,26 @@ static int vfio_iommu_type1_attach_group(void 
*iommu_data,
iommu->nesting_info);
if (ret)
goto out_detach;
+
+   if (iommu->nesting_info->features &
+   IOMMU_NESTING_FEAT_SYSWIDE_PASID) {
+   struct vfio_mm *vmm;
+   int sid;
+
+   vmm = vfio_mm_get_from_task(current);
+   if (IS_ERR(vmm)) {
+   ret = PTR_ERR(vmm);
+   goto out_detach;
+   }
+   iommu->vmm = vmm;
+
+   sid = vfio_mm_ioasid_sid(vmm);
+   ret = iommu_domain_set_attr(domain->domain,
+   DOMAIN_ATTR_IOASID_SID,
+   &sid);
+   if (ret)
+   goto out_detach;
+   }
}
 
/* Get aperture info */
@@ -2855,6 +2881,63 @@ static int vfio_iommu_type1_dirty_pages(struct 
vfio_iommu *iommu,
return -EINVAL;
 }
 
+static int vfio_iommu_type1_pasid_alloc(struct vfio_iommu *iommu,
+   unsigned int min,
+   unsigned int max)
+{
+   int ret = -EOPNOTSUPP;
+
+   mutex_lock(&iommu->lock);
+   if (iommu->vmm)
+   ret = vfio_pasid_alloc(iommu->vmm, min, max);
+   mutex_unlock(&iommu->lock);
+   return ret;
+}
+
+static int vfio_iommu_type1_pasid_free(struct vfio_iommu *iommu,
+  unsigned int min,
+  unsigned int max)
+{
+   int ret = -EOPNOTSUPP;
+
+   mutex_lock(&iommu->lock);
+   if (iommu->vmm) {
+   vfio_pasid_free_range(iommu->vmm, min, max);
+   ret = 0;
+   }
+   mutex_unlock(&iommu->lock);
+   return ret;
+}
+
+static int vfio_iommu_type1_pasid_request(struct vfio_iommu *iommu,
+ unsigned long arg)
+{
+   struct vfio_iommu_type1_pasid_request req;
+   unsigned long minsz;
+
+   minsz = offsetofend(struct vfio_iommu_type1_pasid_request, range);
+
+   if (copy_from_user(&req, 

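
(The diff is truncated above.) For orientation, a userspace caller of the
new ioctl would look roughly as follows. The flags and range fields appear
in fragments visible elsewhere in this series; the argsz handling is assumed
from the usual VFIO convention:

#include <string.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

static int vfio_pasid_alloc_example(int container_fd,
				    unsigned int min, unsigned int max)
{
	struct vfio_iommu_type1_pasid_request req;

	memset(&req, 0, sizeof(req));
	req.argsz = sizeof(req);		/* assumed VFIO convention */
	req.flags = VFIO_IOMMU_ALLOC_PASID;	/* free uses the FREE flag */
	req.range.min = min;
	req.range.max = max;

	/* returns the allocated PASID (>= 0) or -errno */
	return ioctl(container_fd, VFIO_IOMMU_PASID_REQUEST, &req);
}
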
[PATCH v5 04/15] vfio/type1: Report iommu nesting info to userspace

2020-07-12 Thread Liu Yi L
This patch exports iommu nesting capability info to user space through
VFIO. User space is expected to check this info for supported uAPIs (e.g.
PASID alloc/free, bind page table, and cache invalidation) and the vendor
specific format information for first level/stage page table that will be
bound to.

The nesting info is available only after the nesting iommu type is set
for a container. Current implementation imposes one limitation - one
nesting container should include at most one group. The philosophy of
vfio container is having all groups/devices within the container share
the same IOMMU context. When vSVA is enabled, one IOMMU context could
include one 2nd-level address space and multiple 1st-level address spaces.
While the 2nd-level address space is reasonably sharable by multiple
groups, blindly sharing 1st-level address spaces across all groups within
the container might instead break the guest expectation.
super container concept might be introduced to allow partial address space
sharing within an IOMMU context. But for now let's go with this restriction
by requiring singleton container for using nesting iommu features. Below
link has the related discussion about this decision.

https://lkml.org/lkml/2020/5/15/1028

Cc: Kevin Tian 
CC: Jacob Pan 
Cc: Alex Williamson 
Cc: Eric Auger 
Cc: Jean-Philippe Brucker 
Cc: Joerg Roedel 
Cc: Lu Baolu 
Signed-off-by: Liu Yi L 
---
v4 -> v5:
*) address comments from Eric Auger.
*) return struct iommu_nesting_info for VFIO_IOMMU_TYPE1_INFO_CAP_NESTING as
   the cap is quite cheap; if it needs extension in future, just define
   another cap.
   https://lore.kernel.org/kvm/20200708132947.5b7ee...@x1.home/

v3 -> v4:
*) address comments against v3.

v1 -> v2:
*) added in v2
---
 drivers/vfio/vfio_iommu_type1.c | 102 +++-
 include/uapi/linux/vfio.h   |  19 
 2 files changed, 109 insertions(+), 12 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 3bd70ff..ed80104 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -62,18 +62,20 @@ MODULE_PARM_DESC(dma_entry_limit,
 "Maximum number of user DMA mappings per container (65535).");
 
 struct vfio_iommu {
-	struct list_head	domain_list;
-	struct list_head	iova_list;
-	struct vfio_domain	*external_domain; /* domain for external user */
-	struct mutex		lock;
-	struct rb_root		dma_list;
-	struct blocking_notifier_head notifier;
-	unsigned int		dma_avail;
-	uint64_t		pgsize_bitmap;
-	bool			v2;
-	bool			nesting;
-	bool			dirty_page_tracking;
-	bool			pinned_page_dirty_scope;
+	struct list_head		domain_list;
+	struct list_head		iova_list;
+	/* domain for external user */
+	struct vfio_domain		*external_domain;
+	struct mutex			lock;
+	struct rb_root			dma_list;
+	struct blocking_notifier_head	notifier;
+	unsigned int			dma_avail;
+	uint64_t			pgsize_bitmap;
+	bool				v2;
+	bool				nesting;
+	bool				dirty_page_tracking;
+	bool				pinned_page_dirty_scope;
+	struct iommu_nesting_info	*nesting_info;
 };
 
 struct vfio_domain {
@@ -130,6 +132,9 @@ struct vfio_regions {
 #define IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu)\
(!list_empty(>domain_list))
 
+#define CONTAINER_HAS_DOMAIN(iommu)	(((iommu)->external_domain) || \
+(!list_empty(&(iommu)->domain_list)))
+
 #define DIRTY_BITMAP_BYTES(n)  (ALIGN(n, BITS_PER_TYPE(u64)) / BITS_PER_BYTE)
 
 /*
@@ -1929,6 +1934,13 @@ static void vfio_iommu_iova_insert_copy(struct 
vfio_iommu *iommu,
 
list_splice_tail(iova_copy, iova);
 }
+
+static void vfio_iommu_release_nesting_info(struct vfio_iommu *iommu)
+{
+   kfree(iommu->nesting_info);
+   iommu->nesting_info = NULL;
+}
+
 static int vfio_iommu_type1_attach_group(void *iommu_data,
 struct iommu_group *iommu_group)
 {
@@ -1959,6 +1971,12 @@ static int vfio_iommu_type1_attach_group(void 
*iommu_data,
}
}
 
+   /* Nesting type container can include only one group */
+   if (iommu->nesting && CONTAINER_HAS_DOMAIN(iommu)) {
+   mutex_unlock(&iommu->lock);
+   return -EINVAL;
+   }
+
group = kzalloc(sizeof(*group), GFP_KERNEL);
domain = kzalloc(sizeof(*domain), GFP_KERNEL);
if (!group || !domain) {
@@ -2029,6 +2047,32 @@ static int vfio_iommu_type1_attach_group(void 
*iommu_data,

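
(The diff is truncated above.) On the userspace side, the new cap is found
with the standard VFIO capability-chain walk; a sketch, where
VFIO_IOMMU_TYPE1_INFO_CAP_NESTING is the cap id this patch defines and the
info buffer has already been re-read at full info->argsz:

#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

static struct vfio_info_cap_header *
vfio_get_cap(struct vfio_iommu_type1_info *info, uint16_t cap_id)
{
	struct vfio_info_cap_header *hdr;
	uint32_t off;

	if (!(info->flags & VFIO_IOMMU_INFO_CAPS))
		return NULL;

	for (off = info->cap_offset; off; off = hdr->next) {
		hdr = (struct vfio_info_cap_header *)((char *)info + off);
		if (hdr->id == cap_id)
			return hdr;
	}
	return NULL;
}
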
[PATCH v5 00/15] vfio: expose virtual Shared Virtual Addressing to VMs

2020-07-12 Thread Liu Yi L
Shared Virtual Addressing (SVA), a.k.a. Shared Virtual Memory (SVM), on
Intel platforms allows address space sharing between device DMA and
applications. SVA can reduce programming complexity and enhance security.

This VFIO series is intended to expose SVA usage to VMs, i.e. sharing
guest application address spaces with passthrough devices. This is called
vSVA in this series. The whole vSVA enabling requires QEMU/VFIO/IOMMU
changes. For IOMMU and QEMU changes, they are in separate series (listed
in the "Related series").

The high-level architecture for SVA virtualization is as below, the key
design of vSVA support is to utilize the dual-stage IOMMU translation (
also known as IOMMU nesting translation) capability in host IOMMU.


    .-------------.  .---------------------------.
    |   vIOMMU    |  | Guest process CR3, FL only|
    |             |  '---------------------------'
    .----------------/
    | PASID Entry |--- PASID cache flush -
    '-------------'   |
    |             |   V
    |             |  CR3 in GPA
    '-------------'
Guest
------| Shadow |--------------------------|--------
      v        v                          v
Host
    .-------------.  .----------------------.
    |   pIOMMU    |  | Bind FL for GVA-GPA  |
    |             |  '----------------------'
    .----------------/  |
    | PASID Entry |     V (Nested xlate)
    '----------------\.------------------------------.
    |             |   |SL for GPA-HPA, default domain|
    |             |   '------------------------------'
    '-------------'
Where:
 - FL = First level/stage one page tables
 - SL = Second level/stage two page tables

Patch Overview:
 1. a refactor to vfio_iommu_type1 ioctl (patch 0001)
 2. reports IOMMU nesting info to userspace (patch 0002, 0003, 0004 and 0015)
 3. vfio support for PASID allocation and free for VMs (patch 0005, 0006, 0007)
 4. vfio support for binding guest page table to host (patch 0008, 0009, 0010)
 5. vfio support for IOMMU cache invalidation from VMs (patch 0011)
 6. vfio support for vSVA usage on IOMMU-backed mdevs (patch 0012)
 7. expose PASID capability to VM (patch 0013)
 8. add doc for VFIO dual stage control (patch 0014)

The complete vSVA kernel upstream patches are divided into three phases:
1. Common APIs and PCI device direct assignment
2. IOMMU-backed Mediated Device assignment
3. Page Request Services (PRS) support

This patchset aims at phase 1 and phase 2, and is based on Jacob's
below series.
*) [PATCH v4 0/5] IOMMU user API enhancement - wip
   
https://lore.kernel.org/linux-iommu/1594165429-20075-1-git-send-email-jacob.jun@linux.intel.com/

*) [PATCH 00/10] IOASID extensions for guest SVA - wip
   https://lkml.org/lkml/2020/3/25/874

The latest IOASID code added the below new interface for iterating over all
PASIDs of an ioasid_set. The implementation is not sent out yet as Jacob
needs some cleanup; it can be found in branch vsva-linux-5.8-rc3-v5 on
github (mentioned below):
 int ioasid_set_for_each_ioasid(int sid, void (*fn)(ioasid_t id, void *data),
				void *data);

Complete set for current vSVA can be found in below branch.
https://github.com/luxis1999/linux-vsva.git: vsva-linux-5.8-rc3-v5

The corresponding QEMU patch series is included in below branch:
https://github.com/luxis1999/qemu.git: vsva_5.8_rc3_qemu_rfcv8


Regards,
Yi Liu

Changelog:
- Patch v4 -> Patch v5:
  a) Address comments against v4
  Patch v4: 
https://lore.kernel.org/kvm/1593861989-35920-1-git-send-email-yi.l@intel.com/

- Patch v3 -> Patch v4:
  a) Address comments against v3
  b) Add rb from Stefan on patch 14/15
  Patch v3: 
https://lore.kernel.org/linux-iommu/1592988927-48009-1-git-send-email-yi.l@intel.com/

- Patch v2 -> Patch v3:
  a) Rebase on top of Jacob's v3 iommu uapi patchset
  b) Address comments from Kevin and Stefan Hajnoczi
  c) Reuse DOMAIN_ATTR_NESTING to get iommu nesting info
  d) Drop [PATCH v2 07/15] iommu/uapi: Add iommu_gpasid_unbind_data
  Patch v2: 
https://lore.kernel.org/linux-iommu/1591877734-66527-1-git-send-email-yi.l@intel.com/#r

- Patch v1 -> Patch v2:
  a) Refactor vfio_iommu_type1_ioctl() per suggestion from Christoph
 Hellwig.
  b) Re-sequence the patch series for better bisect support.
  c) Report IOMMU nesting cap info in detail instead of a format in
 v1.
  d) Enforce one group per nesting type container for vfio iommu type1
 driver.
  e) Build the vfio_mm related code from vfio.c to be a separate
 vfio_pasid.ko.
  f) Add PASID ownership check in IOMMU driver.
  g) Adopted to latest IOMMU UAPI design. Removed IOMMU UAPI version
 check. Added iommu_gpasid_unbind_data for unbind requests from
 userspace.
  h) Define a single 

[PATCH v5 02/15] iommu: Report domain nesting info

2020-07-12 Thread Liu Yi L
IOMMUs that support nesting translation need to report the capability info
to userspace, e.g. the format of the first level/stage paging structures.

This patch reports nesting info by DOMAIN_ATTR_NESTING. Callers can get the
nesting info after setting DOMAIN_ATTR_NESTING.

Cc: Kevin Tian 
CC: Jacob Pan 
Cc: Alex Williamson 
Cc: Eric Auger 
Cc: Jean-Philippe Brucker 
Cc: Joerg Roedel 
Cc: Lu Baolu 
Signed-off-by: Liu Yi L 
Signed-off-by: Jacob Pan 
---
v4 -> v5:
*) address comments from Eric Auger.

v3 -> v4:
*) split the SMMU driver changes to be a separate patch
*) move the @addr_width and @pasid_bits from vendor specific
   part to generic part.
*) tweak the description for the @features field of struct
   iommu_nesting_info.
*) add description on the @data[] field of struct iommu_nesting_info

v2 -> v3:
*) remove cap/ecap_mask in iommu_nesting_info.
*) reuse DOMAIN_ATTR_NESTING to get nesting info.
*) return an empty iommu_nesting_info for SMMU drivers per Jean's
   suggestion.
---
 include/uapi/linux/iommu.h | 77 ++
 1 file changed, 77 insertions(+)

diff --git a/include/uapi/linux/iommu.h b/include/uapi/linux/iommu.h
index 1afc661..d2a47c4 100644
--- a/include/uapi/linux/iommu.h
+++ b/include/uapi/linux/iommu.h
@@ -332,4 +332,81 @@ struct iommu_gpasid_bind_data {
} vendor;
 };
 
+/*
+ * struct iommu_nesting_info - Information for nesting-capable IOMMU.
+ *user space should check it before using
+ *nesting capability.
+ *
+ * @size:  size of the whole structure
+ * @format:PASID table entry format, the same definition as struct
+ * iommu_gpasid_bind_data @format.
+ * @features:  supported nesting features.
+ * @flags: currently reserved for future extension.
+ * @addr_width:The output addr width of first level/stage translation
+ * @pasid_bits:Maximum supported PASID bits, 0 represents no PASID
+ * support.
+ * @data:  vendor specific cap info. data[] structure type can be deduced
+ * from @format field.
+ *
+ * +===+==+
+ * | feature   |  Notes   |
+ * +===+==+
+ * | SYSWIDE_PASID |  PASIDs are managed in system-wide, instead of per   |
+ * |   |  device. When a device is assigned to userspace or   |
+ * |   |  VM, proper uAPI (userspace driver framework uAPI,   |
+ * |   |  e.g. VFIO) must be used to allocate/free PASIDs for |
+ * |   |  the assigned device.|
+ * +---+--+
+ * | BIND_PGTBL|  The owner of the first level/stage page table must  |
+ * |   |  explicitly bind the page table to associated PASID  |
+ * |   |  (either the one specified in bind request or the|
+ * |   |  default PASID of iommu domain), through userspace   |
+ * |   |  driver framework uAPI (e.g. VFIO_IOMMU_NESTING_OP). |
+ * +---+--+
+ * | CACHE_INVLD   |  The owner of the first level/stage page table must  |
+ * |   |  explicitly invalidate the IOMMU cache through uAPI  |
+ * |   |  provided by userspace driver framework (e.g. VFIO)  |
+ * |   |  according to vendor-specific requirement when   |
+ * |   |  changing the page table.|
+ * +---+--+
+ *
+ * @data[] types defined for @format:
+ * ++=+
+ * | @format| @data[] |
+ * ++=+
+ * | IOMMU_PASID_FORMAT_INTEL_VTD   | struct iommu_nesting_info_vtd   |
+ * ++-+
+ *
+ */
+struct iommu_nesting_info {
+   __u32   size;
+   __u32   format;
+#define IOMMU_NESTING_FEAT_SYSWIDE_PASID   (1 << 0)
+#define IOMMU_NESTING_FEAT_BIND_PGTBL  (1 << 1)
+#define IOMMU_NESTING_FEAT_CACHE_INVLD (1 << 2)
+   __u32   features;
+   __u32   flags;
+   __u16   addr_width;
+   __u16   pasid_bits;
+   __u32   padding;
+   __u8data[];
+};
+
+/*
+ * struct iommu_nesting_info_vtd - Intel VT-d specific nesting info
+ *
+ * @flags: VT-d specific flags. Currently reserved for future
+ * extension.
+ * @cap_reg:   Describe basic capabilities as defined in VT-d capability
+ * register.
+ * @ecap_reg:  Describe the extended capabilities as defined in VT-d
+ * extended capability register.
+ */
+struct iommu_nesting_info_vtd {
+	__u64	flags;
+	__u64	cap_reg;
+	__u64	ecap_reg;
+};

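
Given these structs, a consumer checks the feature bits before relying on a
dependent uAPI; a minimal sketch of the userspace side:

#include <stdbool.h>
#include <linux/iommu.h>

/* true when PASIDs must be allocated through the host (system-wide
 * management), per the SYSWIDE_PASID row in the table above */
static bool nesting_needs_host_pasid(const struct iommu_nesting_info *info)
{
	return info->features & IOMMU_NESTING_FEAT_SYSWIDE_PASID;
}
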
[PATCH v5 14/15] vfio: Document dual stage control

2020-07-12 Thread Liu Yi L
From: Eric Auger 

The VFIO API was enhanced to support nested stage control: a bunch of
new ioctls and a usage guideline.

Let's document the process to follow to set up nested mode.

Cc: Kevin Tian 
CC: Jacob Pan 
Cc: Alex Williamson 
Cc: Eric Auger 
Cc: Jean-Philippe Brucker 
Cc: Joerg Roedel 
Cc: Lu Baolu 
Reviewed-by: Stefan Hajnoczi 
Signed-off-by: Eric Auger 
Signed-off-by: Liu Yi L 
---
v3 -> v4:
*) add review-by from Stefan Hajnoczi

v2 -> v3:
*) address comments from Stefan Hajnoczi

v1 -> v2:
*) new in v2; compared with Eric's original version, pasid table bind
   and fault reporting are removed as this series doesn't cover them.
   Original version from Eric.
   https://lkml.org/lkml/2020/3/20/700
---
 Documentation/driver-api/vfio.rst | 67 +++
 1 file changed, 67 insertions(+)

diff --git a/Documentation/driver-api/vfio.rst 
b/Documentation/driver-api/vfio.rst
index f1a4d3c..0672c45 100644
--- a/Documentation/driver-api/vfio.rst
+++ b/Documentation/driver-api/vfio.rst
@@ -239,6 +239,73 @@ group and can access them as follows::
/* Gratuitous device reset and go... */
ioctl(device, VFIO_DEVICE_RESET);
 
+IOMMU Dual Stage Control
+------------------------
+
+Some IOMMUs support 2 stages/levels of translation. Stage corresponds to
+the ARM terminology while level corresponds to Intel's VT-d terminology.
+In the following text we use either without distinction.
+
+This is useful when the guest is exposed to a virtual IOMMU and some
+devices are assigned to the guest through VFIO. Then the guest OS can use
+stage 1 (GIOVA -> GPA or GVA->GPA), while the hypervisor uses stage 2 for
+VM isolation (GPA -> HPA).
+
+Under dual stage translation, the guest gets ownership of the stage 1 page
+tables and also owns stage 1 configuration structures. The hypervisor owns
+the root configuration structure (for security reasons), including stage 2
+configuration. This works as long as configuration structures and page table
+formats are compatible between the virtual IOMMU and the physical IOMMU.
+
+Assuming the HW supports it, this nested mode is selected by choosing the
+VFIO_TYPE1_NESTING_IOMMU type through:
+
+ioctl(container, VFIO_SET_IOMMU, VFIO_TYPE1_NESTING_IOMMU);
+
+This forces the hypervisor to use the stage 2, leaving stage 1 available
+for guest usage. The guest stage 1 format depends on the IOMMU vendor, and
+so does the nesting configuration method. User space should
+check the format and configuration method after setting nesting type by
+using:
+
+ioctl(container->fd, VFIO_IOMMU_GET_INFO, &nesting_info);
+
+Details can be found in Documentation/userspace-api/iommu.rst. For Intel
+VT-d, each stage 1 page table is bound to host by:
+
+nesting_op->flags = VFIO_IOMMU_NESTING_OP_BIND_PGTBL;
+memcpy(&nesting_op->data, &bind_data, sizeof(bind_data));
+ioctl(container->fd, VFIO_IOMMU_NESTING_OP, nesting_op);
+
+As mentioned above, guest OS may use stage 1 for GIOVA->GPA or GVA->GPA.
+GVA->GPA page tables are available when PASID (Process Address Space ID)
+is exposed to the guest, e.g. a guest with PASID-capable devices assigned.
+For such page table binding, the bind_data should include PASID info, which
+is allocated by the guest itself or by the host. This depends on the
+hardware vendor, e.g. Intel VT-d requires PASIDs to be allocated from the
+host. This requirement is defined by the Virtual Command Support in the
+VT-d 3.0 spec: guest software running on VT-d should allocate a PASID
+from host, user space should check the IOMMU_NESTING_FEAT_SYSWIDE_PASID
+bit of the nesting info reported from host kernel. VFIO reports the nesting
+info by VFIO_IOMMU_GET_INFO. User space could allocate PASID from host by:
+
+req.flags = VFIO_IOMMU_ALLOC_PASID;
+ioctl(container, VFIO_IOMMU_PASID_REQUEST, &req);
+
+With the first stage/level page table bound to the host, the guest stage 1
+translation can be combined with the hypervisor stage 2 translation to
+get the final address.
+
+When the guest invalidates stage 1 related caches, invalidations must be
+forwarded to the host through
+
+nesting_op->flags = VFIO_IOMMU_NESTING_OP_CACHE_INVLD;
+memcpy(&nesting_op->data, &inv_data, sizeof(inv_data));
+ioctl(container->fd, VFIO_IOMMU_NESTING_OP, nesting_op);
+
+Those invalidations can happen at various granularity levels (page,
+context, ...).
+
 VFIO User API
-------------
 
-- 
2.7.4

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v5 08/15] iommu: Pass domain to sva_unbind_gpasid()

2020-07-12 Thread Liu Yi L
From: Yi Sun 

The current interface is good enough for SVA virtualization on an assigned
physical PCI device, but when it comes to mediated devices, a physical
device may be attached to multiple aux-domains. Also, for guest unbind,
the PASID to be unbound should have been allocated to the VM. This check
requires knowing the ioasid_set which is associated with the domain.

So this interface needs to pass in domain info. Then the iommu driver is
able to know which domain will be used for the 2nd stage translation of
the nesting mode and is also able to do the PASID ownership check. This
patch passes @domain for the above reason.

Cc: Kevin Tian 
CC: Jacob Pan 
Cc: Alex Williamson 
Cc: Eric Auger 
Cc: Jean-Philippe Brucker 
Cc: Joerg Roedel 
Cc: Lu Baolu 
Signed-off-by: Yi Sun 
Signed-off-by: Liu Yi L 
---
v2 -> v3:
*) pass in domain info only
*) use ioasid_t for pasid instead of int type

v1 -> v2:
*) added in v2.
---
 drivers/iommu/intel/svm.c   | 3 ++-
 drivers/iommu/iommu.c   | 2 +-
 include/linux/intel-iommu.h | 3 ++-
 include/linux/iommu.h   | 3 ++-
 4 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
index b9a9c55..d2c0e1a 100644
--- a/drivers/iommu/intel/svm.c
+++ b/drivers/iommu/intel/svm.c
@@ -432,7 +432,8 @@ int intel_svm_bind_gpasid(struct iommu_domain *domain, 
struct device *dev,
return ret;
 }
 
-int intel_svm_unbind_gpasid(struct device *dev, int pasid)
+int intel_svm_unbind_gpasid(struct iommu_domain *domain,
+   struct device *dev, ioasid_t pasid)
 {
struct intel_iommu *iommu = intel_svm_device_to_iommu(dev);
struct intel_svm_dev *sdev;
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 7910249..d3e554c 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -2151,7 +2151,7 @@ int __iommu_sva_unbind_gpasid(struct iommu_domain 
*domain, struct device *dev,
if (unlikely(!domain->ops->sva_unbind_gpasid))
return -ENODEV;
 
-   return domain->ops->sva_unbind_gpasid(dev, data->hpasid);
+   return domain->ops->sva_unbind_gpasid(domain, dev, data->hpasid);
 }
 EXPORT_SYMBOL_GPL(__iommu_sva_unbind_gpasid);
 
diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index 0d0ab32..18f292e 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -738,7 +738,8 @@ extern int intel_svm_enable_prq(struct intel_iommu *iommu);
 extern int intel_svm_finish_prq(struct intel_iommu *iommu);
 int intel_svm_bind_gpasid(struct iommu_domain *domain, struct device *dev,
  struct iommu_gpasid_bind_data *data);
-int intel_svm_unbind_gpasid(struct device *dev, int pasid);
+int intel_svm_unbind_gpasid(struct iommu_domain *domain,
+   struct device *dev, ioasid_t pasid);
 struct iommu_sva *intel_svm_bind(struct device *dev, struct mm_struct *mm,
 void *drvdata);
 void intel_svm_unbind(struct iommu_sva *handle);
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index e84a1d5..ca5edd8 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -303,7 +303,8 @@ struct iommu_ops {
int (*sva_bind_gpasid)(struct iommu_domain *domain,
struct device *dev, struct iommu_gpasid_bind_data 
*data);
 
-   int (*sva_unbind_gpasid)(struct device *dev, int pasid);
+   int (*sva_unbind_gpasid)(struct iommu_domain *domain,
+struct device *dev, ioasid_t pasid);
 
int (*def_domain_type)(struct device *dev);
 
-- 
2.7.4

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v5 01/15] vfio/type1: Refactor vfio_iommu_type1_ioctl()

2020-07-12 Thread Liu Yi L
This patch refactors vfio_iommu_type1_ioctl() to use a switch instead of
if-else, and gives each command a helper function.

Cc: Kevin Tian 
CC: Jacob Pan 
Cc: Alex Williamson 
Cc: Eric Auger 
Cc: Jean-Philippe Brucker 
Cc: Joerg Roedel 
Cc: Lu Baolu 
Reviewed-by: Eric Auger 
Suggested-by: Christoph Hellwig 
Signed-off-by: Liu Yi L 
---
v4 -> v5:
*) address comments from Eric Auger, add r-b from Eric.
---
 drivers/vfio/vfio_iommu_type1.c | 394 ++--
 1 file changed, 213 insertions(+), 181 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 5e556ac..3bd70ff 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -2453,6 +2453,23 @@ static int vfio_domains_have_iommu_cache(struct 
vfio_iommu *iommu)
return ret;
 }
 
+static int vfio_iommu_type1_check_extension(struct vfio_iommu *iommu,
+   unsigned long arg)
+{
+   switch (arg) {
+   case VFIO_TYPE1_IOMMU:
+   case VFIO_TYPE1v2_IOMMU:
+   case VFIO_TYPE1_NESTING_IOMMU:
+   return 1;
+   case VFIO_DMA_CC_IOMMU:
+   if (!iommu)
+   return 0;
+   return vfio_domains_have_iommu_cache(iommu);
+   default:
+   return 0;
+   }
+}
+
 static int vfio_iommu_iova_add_cap(struct vfio_info_cap *caps,
 struct vfio_iommu_type1_info_cap_iova_range *cap_iovas,
 size_t size)
@@ -2529,241 +2546,256 @@ static int vfio_iommu_migration_build_caps(struct 
vfio_iommu *iommu,
	return vfio_info_add_capability(caps, &cap_mig.header, sizeof(cap_mig));
 }
 
-static long vfio_iommu_type1_ioctl(void *iommu_data,
-  unsigned int cmd, unsigned long arg)
+static int vfio_iommu_type1_get_info(struct vfio_iommu *iommu,
+unsigned long arg)
 {
-   struct vfio_iommu *iommu = iommu_data;
+   struct vfio_iommu_type1_info info;
unsigned long minsz;
+   struct vfio_info_cap caps = { .buf = NULL, .size = 0 };
+   unsigned long capsz;
+   int ret;
 
-   if (cmd == VFIO_CHECK_EXTENSION) {
-   switch (arg) {
-   case VFIO_TYPE1_IOMMU:
-   case VFIO_TYPE1v2_IOMMU:
-   case VFIO_TYPE1_NESTING_IOMMU:
-   return 1;
-   case VFIO_DMA_CC_IOMMU:
-   if (!iommu)
-   return 0;
-   return vfio_domains_have_iommu_cache(iommu);
-   default:
-   return 0;
-   }
-   } else if (cmd == VFIO_IOMMU_GET_INFO) {
-   struct vfio_iommu_type1_info info;
-   struct vfio_info_cap caps = { .buf = NULL, .size = 0 };
-   unsigned long capsz;
-   int ret;
-
-   minsz = offsetofend(struct vfio_iommu_type1_info, iova_pgsizes);
+   minsz = offsetofend(struct vfio_iommu_type1_info, iova_pgsizes);
 
-   /* For backward compatibility, cannot require this */
-   capsz = offsetofend(struct vfio_iommu_type1_info, cap_offset);
+   /* For backward compatibility, cannot require this */
+   capsz = offsetofend(struct vfio_iommu_type1_info, cap_offset);
 
-   if (copy_from_user(&info, (void __user *)arg, minsz))
-   return -EFAULT;
+   if (copy_from_user(&info, (void __user *)arg, minsz))
+   return -EFAULT;
 
-   if (info.argsz < minsz)
-   return -EINVAL;
+   if (info.argsz < minsz)
+   return -EINVAL;
 
-   if (info.argsz >= capsz) {
-   minsz = capsz;
-   info.cap_offset = 0; /* output, no-recopy necessary */
-   }
+   if (info.argsz >= capsz) {
+   minsz = capsz;
+   info.cap_offset = 0; /* output, no-recopy necessary */
+   }
 
-   mutex_lock(&iommu->lock);
-   info.flags = VFIO_IOMMU_INFO_PGSIZES;
+   mutex_lock(&iommu->lock);
+   info.flags = VFIO_IOMMU_INFO_PGSIZES;
 
-   info.iova_pgsizes = iommu->pgsize_bitmap;
+   info.iova_pgsizes = iommu->pgsize_bitmap;
 
-   ret = vfio_iommu_migration_build_caps(iommu, &caps);
+   ret = vfio_iommu_migration_build_caps(iommu, &caps);
 
-   if (!ret)
-   ret = vfio_iommu_iova_build_caps(iommu, &caps);
+   if (!ret)
+   ret = vfio_iommu_iova_build_caps(iommu, &caps);
 
-   mutex_unlock(&iommu->lock);
+   mutex_unlock(&iommu->lock);
 
-   if (ret)
-   return ret;
+   if (ret)
+   return ret;
 
-   if (caps.size) {
-   info.flags |= VFIO_IOMMU_INFO_CAPS;
+   if (caps.size) {
+   info.flags |= VFIO_IOMMU_INFO_CAPS;
 
-   if (info.argsz < sizeof(info) + caps.size) 

[PATCH v5 11/15] vfio/type1: Allow invalidating first-level/stage IOMMU cache

2020-07-12 Thread Liu Yi L
This patch provides an interface allowing the userspace to invalidate
IOMMU cache for first-level page table. It is required when the first
level IOMMU page table is not managed by the host kernel in the nested
translation setup.

Cc: Kevin Tian 
CC: Jacob Pan 
Cc: Alex Williamson 
Cc: Eric Auger 
Cc: Jean-Philippe Brucker 
Cc: Joerg Roedel 
Cc: Lu Baolu 
Signed-off-by: Liu Yi L 
Signed-off-by: Eric Auger 
Signed-off-by: Jacob Pan 
---
v1 -> v2:
*) rename from "vfio/type1: Flush stage-1 IOMMU cache for nesting type"
*) rename vfio_cache_inv_fn() to vfio_dev_cache_invalidate_fn()
*) vfio_dev_cache_inv_fn() always successful
*) remove VFIO_IOMMU_CACHE_INVALIDATE, and reuse VFIO_IOMMU_NESTING_OP
---
 drivers/vfio/vfio_iommu_type1.c | 50 +
 include/uapi/linux/vfio.h   |  3 +++
 2 files changed, 53 insertions(+)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index f0f21ff..960cc59 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -3073,6 +3073,53 @@ static long vfio_iommu_handle_pgtbl_op(struct vfio_iommu 
*iommu,
return ret;
 }
 
+static int vfio_dev_cache_invalidate_fn(struct device *dev, void *data)
+{
+   struct domain_capsule *dc = (struct domain_capsule *)data;
+   unsigned long arg = *(unsigned long *)dc->data;
+
+   iommu_cache_invalidate(dc->domain, dev, (void __user *)arg);
+   return 0;
+}
+
+static long vfio_iommu_invalidate_cache(struct vfio_iommu *iommu,
+   unsigned long arg)
+{
+   struct domain_capsule dc = { .data = &arg };
+   struct vfio_group *group;
+   struct vfio_domain *domain;
+   int ret = 0;
+   struct iommu_nesting_info *info;
+
+   mutex_lock(&iommu->lock);
+   /*
+* Cache invalidation is required for any nesting IOMMU,
+* so no need to check system-wide PASID support.
+*/
+   info = iommu->nesting_info;
+   if (!info || !(info->features & IOMMU_NESTING_FEAT_CACHE_INVLD)) {
+   ret = -EOPNOTSUPP;
+   goto out_unlock;
+   }
+
+   group = vfio_find_nesting_group(iommu);
+   if (!group) {
+   ret = -EINVAL;
+   goto out_unlock;
+   }
+
+   domain = list_first_entry(&iommu->domain_list,
+ struct vfio_domain, next);
+   dc.group = group;
+   dc.domain = domain->domain;
+   iommu_group_for_each_dev(group->iommu_group, &dc,
+vfio_dev_cache_invalidate_fn);
+
+out_unlock:
+   mutex_unlock(&iommu->lock);
+   return ret;
+}
+
 static long vfio_iommu_type1_nesting_op(struct vfio_iommu *iommu,
unsigned long arg)
 {
@@ -3095,6 +3142,9 @@ static long vfio_iommu_type1_nesting_op(struct vfio_iommu 
*iommu,
case VFIO_IOMMU_NESTING_OP_UNBIND_PGTBL:
ret = vfio_iommu_handle_pgtbl_op(iommu, false, arg + minsz);
break;
+   case VFIO_IOMMU_NESTING_OP_CACHE_INVLD:
+   ret = vfio_iommu_invalidate_cache(iommu, arg + minsz);
+   break;
default:
ret = -EINVAL;
}
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index a8ad786..845a5800 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -1225,6 +1225,8 @@ struct vfio_iommu_type1_pasid_request {
  * +-+---+
  * | UNBIND_PGTBL|  struct iommu_gpasid_bind_data|
  * +-+---+
+ * | CACHE_INVLD |  struct iommu_cache_invalidate_info   |
+ * +-+---+
  *
  * returns: 0 on success, -errno on failure.
  */
@@ -1237,6 +1239,7 @@ struct vfio_iommu_type1_nesting_op {
 
 #define VFIO_IOMMU_NESTING_OP_BIND_PGTBL   (0)
 #define VFIO_IOMMU_NESTING_OP_UNBIND_PGTBL (1)
+#define VFIO_IOMMU_NESTING_OP_CACHE_INVLD  (2)
 
 #define VFIO_IOMMU_NESTING_OP  _IO(VFIO_TYPE, VFIO_BASE + 19)
 
-- 
2.7.4


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v5 05/15] vfio: Add PASID allocation/free support

2020-07-12 Thread Liu Yi L
Shared Virtual Addressing (a.k.a Shared Virtual Memory) allows sharing
multiple process virtual address spaces with the device for simplified
programming model. PASID is used to tag a virtual address space in DMA
requests and to identify the related translation structure in IOMMU. When
a PASID-capable device is assigned to a VM, we want the same capability
of using PASID to tag guest process virtual address spaces to achieve
virtual SVA (vSVA).

PASID management for guest is vendor specific. Some vendors (e.g. Intel
VT-d) require system-wide managed PASIDs across all devices, regardless
of whether a device is used by host or assigned to guest. Other vendors
(e.g. ARM SMMU) may allow PASIDs managed per-device thus could be fully
delegated to the guest for assigned devices.

For system-wide managed PASIDs, this patch introduces a vfio module to
handle explicit PASID alloc/free requests from guest. Allocated PASIDs
are associated to a process (or, mm_struct) in IOASID core. A vfio_mm
object is introduced to track mm_struct. Multiple VFIO containers within
a process share the same vfio_mm object.

A quota mechanism is provided to prevent a malicious user from exhausting
available PASIDs. Currently the quota is a global parameter applied to
all VFIO devices. In the future per-device quota might be supported too.

Cc: Kevin Tian 
CC: Jacob Pan 
Cc: Eric Auger 
Cc: Jean-Philippe Brucker 
Cc: Joerg Roedel 
Cc: Lu Baolu 
Suggested-by: Alex Williamson 
Signed-off-by: Liu Yi L 
---
v4 -> v5:
*) address comments from Eric Auger.
*) address the comments from Alex on the pasid free range support. Added
   per vfio_mm pasid r-b tree.
   https://lore.kernel.org/kvm/20200709082751.32074...@x1.home/

v3 -> v4:
*) fix lock leak in vfio_mm_get_from_task()
*) drop pasid_quota field in struct vfio_mm
*) vfio_mm_get_from_task() returns ERR_PTR(-ENOTTY) when !CONFIG_VFIO_PASID

v1 -> v2:
*) added in v2, split from the pasid alloc/free support of v1
---
 drivers/vfio/Kconfig  |   5 +
 drivers/vfio/Makefile |   1 +
 drivers/vfio/vfio_pasid.c | 235 ++
 include/linux/vfio.h  |  28 ++
 4 files changed, 269 insertions(+)
 create mode 100644 drivers/vfio/vfio_pasid.c

diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig
index fd17db9..3d8a108 100644
--- a/drivers/vfio/Kconfig
+++ b/drivers/vfio/Kconfig
@@ -19,6 +19,11 @@ config VFIO_VIRQFD
depends on VFIO && EVENTFD
default n
 
+config VFIO_PASID
+   tristate
+   depends on IOASID && VFIO
+   default n
+
 menuconfig VFIO
tristate "VFIO Non-Privileged userspace driver framework"
depends on IOMMU_API
diff --git a/drivers/vfio/Makefile b/drivers/vfio/Makefile
index de67c47..bb836a3 100644
--- a/drivers/vfio/Makefile
+++ b/drivers/vfio/Makefile
@@ -3,6 +3,7 @@ vfio_virqfd-y := virqfd.o
 
 obj-$(CONFIG_VFIO) += vfio.o
 obj-$(CONFIG_VFIO_VIRQFD) += vfio_virqfd.o
+obj-$(CONFIG_VFIO_PASID) += vfio_pasid.o
 obj-$(CONFIG_VFIO_IOMMU_TYPE1) += vfio_iommu_type1.o
 obj-$(CONFIG_VFIO_IOMMU_SPAPR_TCE) += vfio_iommu_spapr_tce.o
 obj-$(CONFIG_VFIO_SPAPR_EEH) += vfio_spapr_eeh.o
diff --git a/drivers/vfio/vfio_pasid.c b/drivers/vfio/vfio_pasid.c
new file mode 100644
index 000..66e6054e
--- /dev/null
+++ b/drivers/vfio/vfio_pasid.c
@@ -0,0 +1,235 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2020 Intel Corporation.
+ * Author: Liu Yi L 
+ *
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#define DRIVER_VERSION  "0.1"
+#define DRIVER_AUTHOR   "Liu Yi L "
+#define DRIVER_DESC "PASID management for VFIO bus drivers"
+
+#define VFIO_DEFAULT_PASID_QUOTA   1000
+static int pasid_quota = VFIO_DEFAULT_PASID_QUOTA;
+module_param_named(pasid_quota, pasid_quota, uint, 0444);
+MODULE_PARM_DESC(pasid_quota,
+" Set the quota for max number of PASIDs that an application 
is allowed to request (default 1000)");
+
+struct vfio_mm_token {
+   unsigned long long val;
+};
+
+struct vfio_mm {
+	struct kref		kref;
+	int			ioasid_sid;
+	struct mutex		pasid_lock;
+	struct rb_root		pasid_list;
+	struct list_head	next;
+	struct vfio_mm_token	token;
+};
+
+static struct mutex		vfio_mm_lock;
+static struct list_head		vfio_mm_list;
+
+struct vfio_pasid {
+	struct rb_node		node;
+	ioasid_t		pasid;
+};
+
+static void vfio_remove_all_pasids(struct vfio_mm *vmm);
+
+/* called with vfio.vfio_mm_lock held */
+static void vfio_mm_release(struct kref *kref)
+{
+   struct vfio_mm *vmm = container_of(kref, struct vfio_mm, kref);
+
+   list_del(&vmm->next);
+   mutex_unlock(&vfio_mm_lock);
+   vfio_remove_all_pasids(vmm);
+   ioasid_free_set(vmm->ioasid_sid, true);
+   kfree(vmm);
+}
+
+void vfio_mm_put(struct vfio_mm *vmm)
+{
+   kref_put_mutex(&vmm->kref, vfio_mm_release, &vfio_mm_lock);
+}
+

[PATCH v5 10/15] vfio/type1: Support binding guest page tables to PASID

2020-07-12 Thread Liu Yi L
Nesting translation allows two levels/stages of page tables, with 1st level
for guest translations (e.g. GVA->GPA), 2nd level for host translations
(e.g. GPA->HPA). This patch adds interface for binding guest page tables
to a PASID. This PASID must have been allocated to user space before the
binding request.

Cc: Kevin Tian 
CC: Jacob Pan 
Cc: Alex Williamson 
Cc: Eric Auger 
Cc: Jean-Philippe Brucker 
Cc: Joerg Roedel 
Cc: Lu Baolu 
Signed-off-by: Jean-Philippe Brucker 
Signed-off-by: Liu Yi L 
Signed-off-by: Jacob Pan 
---
v3 -> v4:
*) address comments from Alex on v3

v2 -> v3:
*) use __iommu_sva_unbind_gpasid() for unbind call issued by VFIO
   
https://lore.kernel.org/linux-iommu/1592931837-58223-6-git-send-email-jacob.jun@linux.intel.com/

v1 -> v2:
*) rename subject from "vfio/type1: Bind guest page tables to host"
*) remove VFIO_IOMMU_BIND, introduce VFIO_IOMMU_NESTING_OP to support bind/
   unbind guest page table
*) replaced vfio_iommu_for_each_dev() with a group level loop since this
   series enforces one group per container w/ nesting type as start.
*) rename vfio_bind/unbind_gpasid_fn() to vfio_dev_bind/unbind_gpasid_fn()
*) vfio_dev_unbind_gpasid() always successful
*) use vfio_mm->pasid_lock to avoid race between PASID free and page table
   bind/unbind
---
 drivers/vfio/vfio_iommu_type1.c | 166 
 drivers/vfio/vfio_pasid.c   |  26 +++
 include/linux/vfio.h|  20 +
 include/uapi/linux/vfio.h   |  31 
 4 files changed, 243 insertions(+)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 55b4065..f0f21ff 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -149,6 +149,30 @@ struct vfio_regions {
 #define DIRTY_BITMAP_PAGES_MAX  ((u64)INT_MAX)
 #define DIRTY_BITMAP_SIZE_MAX   DIRTY_BITMAP_BYTES(DIRTY_BITMAP_PAGES_MAX)
 
+struct domain_capsule {
+   struct vfio_group *group;
+   struct iommu_domain *domain;
+   void *data;
+};
+
+/* iommu->lock must be held */
+static struct vfio_group *vfio_find_nesting_group(struct vfio_iommu *iommu)
+{
+   struct vfio_domain *d;
+   struct vfio_group *group = NULL;
+
+   if (!iommu->nesting_info)
+   return NULL;
+
+   /* only support singleton container with nesting type */
+   list_for_each_entry(d, &iommu->domain_list, next) {
+   list_for_each_entry(group, &d->group_list, next) {
+   break;
+   }
+   }
+   return group;
+}
+
 static int put_pfn(unsigned long pfn, int prot);
 
 static struct vfio_group *vfio_iommu_find_iommu_group(struct vfio_iommu *iommu,
@@ -2349,6 +2373,48 @@ static int vfio_iommu_resv_refresh(struct vfio_iommu *iommu,
return ret;
 }
 
+static int vfio_dev_bind_gpasid_fn(struct device *dev, void *data)
+{
+   struct domain_capsule *dc = (struct domain_capsule *)data;
+   unsigned long arg = *(unsigned long *)dc->data;
+
+   return iommu_sva_bind_gpasid(dc->domain, dev, (void __user *)arg);
+}
+
+static int vfio_dev_unbind_gpasid_fn(struct device *dev, void *data)
+{
+   struct domain_capsule *dc = (struct domain_capsule *)data;
+   unsigned long arg = *(unsigned long *)dc->data;
+
+   iommu_sva_unbind_gpasid(dc->domain, dev, (void __user *)arg);
+   return 0;
+}
+
+static int __vfio_dev_unbind_gpasid_fn(struct device *dev, void *data)
+{
+   struct domain_capsule *dc = (struct domain_capsule *)data;
+   struct iommu_gpasid_bind_data *unbind_data =
+   (struct iommu_gpasid_bind_data *)dc->data;
+
+   __iommu_sva_unbind_gpasid(dc->domain, dev, unbind_data);
+   return 0;
+}
+
+static void vfio_group_unbind_gpasid_fn(ioasid_t pasid, void *data)
+{
+   struct domain_capsule *dc = (struct domain_capsule *)data;
+   struct iommu_gpasid_bind_data unbind_data;
+
+   unbind_data.argsz = offsetof(struct iommu_gpasid_bind_data, vendor);
+   unbind_data.flags = 0;
+   unbind_data.hpasid = pasid;
+
+   dc->data = &unbind_data;
+
+   iommu_group_for_each_dev(dc->group->iommu_group,
+                            dc, __vfio_dev_unbind_gpasid_fn);
+}
+
 static void vfio_iommu_type1_detach_group(void *iommu_data,
  struct iommu_group *iommu_group)
 {
@@ -2392,6 +2458,21 @@ static void vfio_iommu_type1_detach_group(void *iommu_data,
if (!group)
continue;
 
+   if (iommu->nesting_info && iommu->vmm &&
+   (iommu->nesting_info->features &
+   IOMMU_NESTING_FEAT_BIND_PGTBL)) {
+   struct domain_capsule dc = { .group = group,
+                                .domain = domain->domain,
+                                .data = NULL };
+
+   /*
+* Unbind page tables bound with system wide 

[PATCH v5 06/15] iommu/vt-d: Support setting ioasid set to domain

2020-07-12 Thread Liu Yi L
From the IOMMU's point of view, PASIDs allocated and managed by external
components (e.g. VFIO) will be passed in for gpasid_bind/unbind operations.
The IOMMU needs some knowledge to check PASID ownership, hence add an
interface for those components to tell the IOMMU the PASID owner.

In the latest kernel design, PASID ownership is managed by the IOASID set
from which the PASID is allocated. This patch adds support for setting an
ioasid set ID on the domains used for nesting/vSVA. Subsequent SVA
operations on the PASID will be checked against its IOASID set for proper
ownership.

Cc: Kevin Tian 
CC: Jacob Pan 
Cc: Alex Williamson 
Cc: Eric Auger 
Cc: Jean-Philippe Brucker 
Cc: Joerg Roedel 
Cc: Lu Baolu 
Signed-off-by: Liu Yi L 
Signed-off-by: Jacob Pan 
---
v4 -> v5:
*) address comments from Eric Auger.
---
 drivers/iommu/intel/iommu.c | 22 ++
 include/linux/intel-iommu.h |  4 
 include/linux/iommu.h   |  1 +
 3 files changed, 27 insertions(+)
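
[Editor's note: a hedged sketch of the expected caller side, e.g. VFIO, setting the new attribute on a nesting domain. 'vmm->ioasid_sid' is the vfio_mm field from earlier in this series; error handling is trimmed:]

    int sid = vmm->ioasid_sid;  /* ioasid set owning the user's PASIDs */

    /* tell the IOMMU driver which set to check PASID ownership against */
    ret = iommu_domain_set_attr(domain, DOMAIN_ATTR_IOASID_SID, &sid);
    if (ret)
            pr_warn("failed to set ioasid set on nesting domain\n");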

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 72ae6a2..4d54198 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -1793,6 +1793,7 @@ static struct dmar_domain *alloc_domain(int flags)
if (first_level_by_default())
domain->flags |= DOMAIN_FLAG_USE_FIRST_LEVEL;
domain->has_iotlb_device = false;
+   domain->ioasid_sid = INVALID_IOASID_SET;
INIT_LIST_HEAD(>devices);
 
return domain;
@@ -6039,6 +6040,27 @@ intel_iommu_domain_set_attr(struct iommu_domain *domain,
}
spin_unlock_irqrestore(&device_domain_lock, flags);
break;
+   case DOMAIN_ATTR_IOASID_SID:
+   {
+   int sid = *(int *)data;
+
+   if (!(dmar_domain->flags & DOMAIN_FLAG_NESTING_MODE)) {
+   ret = -ENODEV;
+   break;
+   }
+   spin_lock_irqsave(&device_domain_lock, flags);
+   if (dmar_domain->ioasid_sid != INVALID_IOASID_SET &&
+   dmar_domain->ioasid_sid != sid) {
+   pr_warn_ratelimited("multi ioasid_set (%d:%d) setting",
+   dmar_domain->ioasid_sid, sid);
+   ret = -EBUSY;
+   spin_unlock_irqrestore(&device_domain_lock, flags);
+   break;
+   }
+   dmar_domain->ioasid_sid = sid;
+   spin_unlock_irqrestore(&device_domain_lock, flags);
+   break;
+   }
default:
ret = -EINVAL;
break;
diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index 3f23c26..0d0ab32 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -549,6 +549,10 @@ struct dmar_domain {
   2 == 1GiB, 3 == 512GiB, 4 == 1TiB */
u64 max_addr;   /* maximum mapped address */
 
+   int ioasid_sid; /*
+* the ioasid set which tracks all
+* PASIDs used by the domain.
+*/
int default_pasid;  /*
 * The default pasid used for non-SVM
 * traffic on mediated devices.
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 7ca9d48..e84a1d5 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -124,6 +124,7 @@ enum iommu_attr {
DOMAIN_ATTR_FSL_PAMUV1,
DOMAIN_ATTR_NESTING,/* two stages of translation */
DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE,
+   DOMAIN_ATTR_IOASID_SID,
DOMAIN_ATTR_MAX,
 };
 
-- 
2.7.4

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v5 12/15] vfio/type1: Add vSVA support for IOMMU-backed mdevs

2020-07-12 Thread Liu Yi L
In recent years, the mediated device pass-through framework (e.g.
vfio-mdev) has been used to achieve flexible device sharing across
domains (e.g. VMs). There are also hardware-assisted mediated
pass-through solutions from platform vendors, e.g. Intel VT-d scalable
mode, which supports the Intel Scalable I/O Virtualization technology.
Such mdevs are called IOMMU-backed mdevs, as there is IOMMU-enforced
DMA isolation for them. In the kernel, IOMMU-backed mdevs are exposed
to the IOMMU layer via the aux-domain concept, which means mdevs are
protected by an IOMMU domain that is auxiliary to the domain the kernel
driver primarily uses for the DMA API. Details can be found in the KVM
presentation below:

https://events19.linuxfoundation.org/wp-content/uploads/2017/12/\
Hardware-Assisted-Mediated-Pass-Through-with-VFIO-Kevin-Tian-Intel.pdf

This patch extends NESTING_IOMMU ops to IOMMU-backed mdev devices. The
main requirement is to use the auxiliary domain associated with mdev.

Cc: Kevin Tian 
CC: Jacob Pan 
CC: Jun Tian 
Cc: Alex Williamson 
Cc: Eric Auger 
Cc: Jean-Philippe Brucker 
Cc: Joerg Roedel 
Cc: Lu Baolu 
Signed-off-by: Liu Yi L 
---
v1 -> v2:
*) check the iommu_device to ensure the handling mdev is IOMMU-backed
---
 drivers/vfio/vfio_iommu_type1.c | 39 +++
 1 file changed, 35 insertions(+), 4 deletions(-)
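
[Editor's note: background on the plumbing assumed here. For an IOMMU-backed mdev, the vendor (parent) driver registers the backing physical device on the mdev, and vfio_mdev_get_iommu_device() in the hunks below retrieves it for IOMMU API calls. A sketch using the mdev core API of this era; 'parent_pdev' is a placeholder for the vendor driver's physical device:]

    /* in the vendor driver's mdev create path: register the physical
     * device that actually sits behind the IOMMU */
    ret = mdev_set_iommu_device(mdev_dev(mdev), &parent_pdev->dev);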

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 960cc59..f1f1ae2 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -2373,20 +2373,41 @@ static int vfio_iommu_resv_refresh(struct vfio_iommu *iommu,
return ret;
 }
 
+static struct device *vfio_get_iommu_device(struct vfio_group *group,
+   struct device *dev)
+{
+   if (group->mdev_group)
+   return vfio_mdev_get_iommu_device(dev);
+   else
+   return dev;
+}
+
 static int vfio_dev_bind_gpasid_fn(struct device *dev, void *data)
 {
struct domain_capsule *dc = (struct domain_capsule *)data;
unsigned long arg = *(unsigned long *)dc->data;
+   struct device *iommu_device;
+
+   iommu_device = vfio_get_iommu_device(dc->group, dev);
+   if (!iommu_device)
+   return -EINVAL;
 
-   return iommu_sva_bind_gpasid(dc->domain, dev, (void __user *)arg);
+   return iommu_sva_bind_gpasid(dc->domain, iommu_device,
+                                (void __user *)arg);
 }
 
 static int vfio_dev_unbind_gpasid_fn(struct device *dev, void *data)
 {
struct domain_capsule *dc = (struct domain_capsule *)data;
unsigned long arg = *(unsigned long *)dc->data;
+   struct device *iommu_device;
 
-   iommu_sva_unbind_gpasid(dc->domain, dev, (void __user *)arg);
+   iommu_device = vfio_get_iommu_device(dc->group, dev);
+   if (!iommu_device)
+   return -EINVAL;
+
+   iommu_sva_unbind_gpasid(dc->domain, iommu_device,
+   (void __user *)arg);
return 0;
 }
 
@@ -2395,8 +2416,13 @@ static int __vfio_dev_unbind_gpasid_fn(struct device *dev, void *data)
struct domain_capsule *dc = (struct domain_capsule *)data;
struct iommu_gpasid_bind_data *unbind_data =
(struct iommu_gpasid_bind_data *)dc->data;
+   struct device *iommu_device;
+
+   iommu_device = vfio_get_iommu_device(dc->group, dev);
+   if (!iommu_device)
+   return -EINVAL;
 
-   __iommu_sva_unbind_gpasid(dc->domain, dev, unbind_data);
+   __iommu_sva_unbind_gpasid(dc->domain, iommu_device, unbind_data);
return 0;
 }
 
@@ -3077,8 +3103,13 @@ static int vfio_dev_cache_invalidate_fn(struct device *dev, void *data)
 {
struct domain_capsule *dc = (struct domain_capsule *)data;
unsigned long arg = *(unsigned long *)dc->data;
+   struct device *iommu_device;
+
+   iommu_device = vfio_get_iommu_device(dc->group, dev);
+   if (!iommu_device)
+   return -EINVAL;
 
-   iommu_cache_invalidate(dc->domain, dev, (void __user *)arg);
+   iommu_cache_invalidate(dc->domain, iommu_device, (void __user *)arg);
return 0;
 }
 
-- 
2.7.4

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v5 09/15] iommu/vt-d: Check ownership for PASIDs from user-space

2020-07-12 Thread Liu Yi L
When an IOMMU domain with the nesting attribute is used for guest SVA, a
system-wide PASID is allocated for binding with the device and the domain.
For security reasons, we need to check the PASID passed from user-space,
e.g. on page table bind/unbind and PASID-related cache invalidation.

Cc: Kevin Tian 
CC: Jacob Pan 
Cc: Alex Williamson 
Cc: Eric Auger 
Cc: Jean-Philippe Brucker 
Cc: Joerg Roedel 
Cc: Lu Baolu 
Signed-off-by: Liu Yi L 
Signed-off-by: Jacob Pan 
---
 drivers/iommu/intel/iommu.c | 10 ++
 drivers/iommu/intel/svm.c   |  7 +--
 2 files changed, 15 insertions(+), 2 deletions(-)
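
[Editor's note: the ownership check added below leans on ioasid_find()'s return convention: it looks the PASID up within the given ioasid set only, returning NULL when no matching entry is visible in that set and an ERR_PTR on a lookup error. A condensed sketch of the pattern, mirroring the hunk below:]

    void *pdata = ioasid_find(dmar_domain->ioasid_sid, pasid, NULL);

    if (!pdata)             /* PASID not visible in this ioasid set */
            return -EINVAL;
    if (IS_ERR(pdata))      /* lookup reported an error */
            return PTR_ERR(pdata);
    /* PASID is owned by this set; safe to act on it */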

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 4d54198..a9504cb 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -5436,6 +5436,7 @@ intel_iommu_sva_invalidate(struct iommu_domain *domain, struct device *dev,
int granu = 0;
u64 pasid = 0;
u64 addr = 0;
+   void *pdata;
 
granu = to_vtd_granularity(cache_type, inv_info->granularity);
if (granu == -EINVAL) {
@@ -5456,6 +5457,15 @@ intel_iommu_sva_invalidate(struct iommu_domain *domain, struct device *dev,
 (inv_info->granu.addr_info.flags & IOMMU_INV_ADDR_FLAGS_PASID))
pasid = inv_info->granu.addr_info.pasid;
 
+   pdata = ioasid_find(dmar_domain->ioasid_sid, pasid, NULL);
+   if (!pdata) {
+   ret = -EINVAL;
+   goto out_unlock;
+   } else if (IS_ERR(pdata)) {
+   ret = PTR_ERR(pdata);
+   goto out_unlock;
+   }
+
switch (BIT(cache_type)) {
case IOMMU_CACHE_INV_TYPE_IOTLB:
/* HW will ignore LSB bits based on address mask */
diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
index d2c0e1a..212dee0 100644
--- a/drivers/iommu/intel/svm.c
+++ b/drivers/iommu/intel/svm.c
@@ -319,7 +319,7 @@ int intel_svm_bind_gpasid(struct iommu_domain *domain, struct device *dev,
dmar_domain = to_dmar_domain(domain);
 
mutex_lock(&pasid_mutex);
-   svm = ioasid_find(INVALID_IOASID_SET, data->hpasid, NULL);
+   svm = ioasid_find(dmar_domain->ioasid_sid, data->hpasid, NULL);
if (IS_ERR(svm)) {
ret = PTR_ERR(svm);
goto out;
@@ -436,6 +436,7 @@ int intel_svm_unbind_gpasid(struct iommu_domain *domain,
struct device *dev, ioasid_t pasid)
 {
struct intel_iommu *iommu = intel_svm_device_to_iommu(dev);
+   struct dmar_domain *dmar_domain;
struct intel_svm_dev *sdev;
struct intel_svm *svm;
int ret = -EINVAL;
@@ -443,8 +444,10 @@ int intel_svm_unbind_gpasid(struct iommu_domain *domain,
if (WARN_ON(!iommu))
return -EINVAL;
 
+   dmar_domain = to_dmar_domain(domain);
+
mutex_lock(&pasid_mutex);
-   svm = ioasid_find(INVALID_IOASID_SET, pasid, NULL);
+   svm = ioasid_find(dmar_domain->ioasid_sid, pasid, NULL);
if (!svm) {
ret = -EINVAL;
goto out;
-- 
2.7.4

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v5 13/15] vfio/pci: Expose PCIe PASID capability to guest

2020-07-12 Thread Liu Yi L
This patch exposes the PCIe PASID capability to the guest for assigned
devices. The existing vfio_pci driver hides it from the guest by setting
the capability length to 0 in pci_ext_cap_length[].

This patch only exposes the PASID capability for devices which have the
PCIe PASID extended structure in their configuration space. VFs will
therefore not see the PASID capability, since a VF does not implement
the PASID extended structure in its configuration space. Supporting VFs
is a TODO for the future. Related discussion can be found in the link
below:

https://lkml.org/lkml/2020/4/7/693

Cc: Kevin Tian 
CC: Jacob Pan 
Cc: Alex Williamson 
Cc: Eric Auger 
Cc: Jean-Philippe Brucker 
Cc: Joerg Roedel 
Cc: Lu Baolu 
Signed-off-by: Liu Yi L 
---
v1 -> v2:
*) added in v2, but it was sent in a separate patchseries before
---
 drivers/vfio/pci/vfio_pci_config.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
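
[Editor's note: once the capability length is exposed, a guest driver can discover and enable PASID through the standard PCI helpers; a minimal sketch using the stock kernel API, not part of this patch:]

    int pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_PASID);

    if (pos) {
            /* 0 = request no execute/privileged-mode features */
            if (!pci_enable_pasid(pdev, 0))
                    dev_info(&pdev->dev, "PASID enabled\n");
    }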

diff --git a/drivers/vfio/pci/vfio_pci_config.c b/drivers/vfio/pci/vfio_pci_config.c
index d98843f..07ff2e6 100644
--- a/drivers/vfio/pci/vfio_pci_config.c
+++ b/drivers/vfio/pci/vfio_pci_config.c
@@ -95,7 +95,7 @@ static const u16 pci_ext_cap_length[PCI_EXT_CAP_ID_MAX + 1] = {
[PCI_EXT_CAP_ID_LTR]=   PCI_EXT_CAP_LTR_SIZEOF,
[PCI_EXT_CAP_ID_SECPCI] =   0,  /* not yet */
[PCI_EXT_CAP_ID_PMUX]   =   0,  /* not yet */
-   [PCI_EXT_CAP_ID_PASID]  =   0,  /* not yet */
+   [PCI_EXT_CAP_ID_PASID]  =   PCI_EXT_CAP_PASID_SIZEOF,
 };
 
 /*
-- 
2.7.4

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v5 03/15] iommu/smmu: Report empty domain nesting info

2020-07-12 Thread Liu Yi L
This patch is added because, instead of returning a boolean for
DOMAIN_ATTR_NESTING, iommu_domain_get_attr() should return an
iommu_nesting_info handle.

Cc: Will Deacon 
Cc: Robin Murphy 
Cc: Eric Auger 
Cc: Jean-Philippe Brucker 
Suggested-by: Jean-Philippe Brucker 
Signed-off-by: Liu Yi L 
Signed-off-by: Jacob Pan 
---
v4 -> v5:
*) address comments from Eric Auger.
---
 drivers/iommu/arm-smmu-v3.c | 29 +++--
 drivers/iommu/arm-smmu.c| 29 +++--
 2 files changed, 54 insertions(+), 4 deletions(-)
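
[Editor's note: the helpers below implement a two-call size-negotiation protocol. A caller-side sketch of how it is meant to be driven, illustrative only; error handling trimmed:]

    struct iommu_nesting_info probe = { .size = sizeof(probe) };
    struct iommu_nesting_info *info;

    if (iommu_domain_get_attr(domain, DOMAIN_ATTR_NESTING, &probe))
            return;                 /* nesting not supported */

    /* probe.size now holds the size the driver wants to report */
    info = kzalloc(probe.size, GFP_KERNEL);
    if (!info)
            return;
    info->size = probe.size;
    iommu_domain_get_attr(domain, DOMAIN_ATTR_NESTING, info);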

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index f578677..ec815d7 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -3019,6 +3019,32 @@ static struct iommu_group *arm_smmu_device_group(struct device *dev)
return group;
 }
 
+static int arm_smmu_domain_nesting_info(struct arm_smmu_domain *smmu_domain,
+   void *data)
+{
+   struct iommu_nesting_info *info = (struct iommu_nesting_info *)data;
+   unsigned int size;
+
+   if (!info || smmu_domain->stage != ARM_SMMU_DOMAIN_NESTED)
+   return -ENODEV;
+
+   size = sizeof(struct iommu_nesting_info);
+
+   /*
+    * If the provided buffer size is smaller than expected, return
+    * 0 and report the expected buffer size to the caller.
+    */
+   if (info->size < size) {
+   info->size = size;
+   return 0;
+   }
+
+   /* report an empty iommu_nesting_info for now */
+   memset(info, 0x0, size);
+   info->size = size;
+   return 0;
+}
+
 static int arm_smmu_domain_get_attr(struct iommu_domain *domain,
enum iommu_attr attr, void *data)
 {
@@ -3028,8 +3054,7 @@ static int arm_smmu_domain_get_attr(struct iommu_domain *domain,
case IOMMU_DOMAIN_UNMANAGED:
switch (attr) {
case DOMAIN_ATTR_NESTING:
-   *(int *)data = (smmu_domain->stage == ARM_SMMU_DOMAIN_NESTED);
-   return 0;
+   return arm_smmu_domain_nesting_info(smmu_domain, data);
default:
return -ENODEV;
}
diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 243bc4c..09e2f1b 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -1506,6 +1506,32 @@ static struct iommu_group *arm_smmu_device_group(struct device *dev)
return group;
 }
 
+static int arm_smmu_domain_nesting_info(struct arm_smmu_domain *smmu_domain,
+   void *data)
+{
+   struct iommu_nesting_info *info = (struct iommu_nesting_info *)data;
+   unsigned int size;
+
+   if (!info || smmu_domain->stage != ARM_SMMU_DOMAIN_NESTED)
+   return -ENODEV;
+
+   size = sizeof(struct iommu_nesting_info);
+
+   /*
+    * If the provided buffer size is smaller than expected, return
+    * 0 and report the expected buffer size to the caller.
+    */
+   if (info->size < size) {
+   info->size = size;
+   return 0;
+   }
+
+   /* report an empty iommu_nesting_info for now */
+   memset(info, 0x0, size);
+   info->size = size;
+   return 0;
+}
+
 static int arm_smmu_domain_get_attr(struct iommu_domain *domain,
enum iommu_attr attr, void *data)
 {
@@ -1515,8 +1541,7 @@ static int arm_smmu_domain_get_attr(struct iommu_domain *domain,
case IOMMU_DOMAIN_UNMANAGED:
switch (attr) {
case DOMAIN_ATTR_NESTING:
-   *(int *)data = (smmu_domain->stage == ARM_SMMU_DOMAIN_NESTED);
-   return 0;
+   return arm_smmu_domain_nesting_info(smmu_domain, data);
default:
return -ENODEV;
}
-- 
2.7.4

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v5 15/15] iommu/vt-d: Support reporting nesting capability info

2020-07-12 Thread Liu Yi L
Cc: Kevin Tian 
CC: Jacob Pan 
Cc: Alex Williamson 
Cc: Eric Auger 
Cc: Jean-Philippe Brucker 
Cc: Joerg Roedel 
Cc: Lu Baolu 
Signed-off-by: Liu Yi L 
Signed-off-by: Jacob Pan 
---
v2 -> v3:
*) remove cap/ecap_mask in iommu_nesting_info.
---
 drivers/iommu/intel/iommu.c | 81 +++--
 include/linux/intel-iommu.h | 16 +
 2 files changed, 95 insertions(+), 2 deletions(-)
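
[Editor's note: the nested_mode_support() change below boils down to a bitwise consistency check across IOMMU units: every unit must agree with the first one on all nesting-relevant capability bits. In isolation:]

    /* a bit set in the XOR marks a capability bit on which the two
     * units disagree; the mask keeps only the nesting-relevant bits */
    u64 cap_diff  = (iommu->cap  ^ prev->cap)  & VTD_CAP_MASK;
    u64 ecap_diff = (iommu->ecap ^ prev->ecap) & VTD_ECAP_MASK;

    if (cap_diff || ecap_diff)
            ret = false;    /* nesting cannot be offered system-wide */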

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index a9504cb..9f7ad1a 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -5659,12 +5659,16 @@ static inline bool iommu_pasid_support(void)
 static inline bool nested_mode_support(void)
 {
struct dmar_drhd_unit *drhd;
-   struct intel_iommu *iommu;
+   struct intel_iommu *iommu, *prev = NULL;
bool ret = true;
 
rcu_read_lock();
for_each_active_iommu(iommu, drhd) {
-   if (!sm_supported(iommu) || !ecap_nest(iommu->ecap)) {
+   if (!prev)
+   prev = iommu;
+   if (!sm_supported(iommu) || !ecap_nest(iommu->ecap) ||
+   (VTD_CAP_MASK & (iommu->cap ^ prev->cap)) ||
+   (VTD_ECAP_MASK & (iommu->ecap ^ prev->ecap))) {
ret = false;
break;
}
@@ -6079,6 +6083,78 @@ intel_iommu_domain_set_attr(struct iommu_domain *domain,
return ret;
 }
 
+static int intel_iommu_get_nesting_info(struct iommu_domain *domain,
+   struct iommu_nesting_info *info)
+{
+   struct dmar_domain *dmar_domain = to_dmar_domain(domain);
+   u64 cap = VTD_CAP_MASK, ecap = VTD_ECAP_MASK;
+   struct device_domain_info *domain_info;
+   struct iommu_nesting_info_vtd vtd;
+   unsigned long flags;
+   unsigned int size;
+
+   if (domain->type != IOMMU_DOMAIN_UNMANAGED ||
+   !(dmar_domain->flags & DOMAIN_FLAG_NESTING_MODE))
+   return -ENODEV;
+
+   if (!info)
+   return -EINVAL;
+
+   size = sizeof(struct iommu_nesting_info) +
+   sizeof(struct iommu_nesting_info_vtd);
+   /*
+    * If the provided buffer size is smaller than expected, return
+    * 0 and report the expected buffer size to the caller.
+    */
+   if (info->size < size) {
+   info->size = size;
+   return 0;
+   }
+
+   spin_lock_irqsave(&device_domain_lock, flags);
+   /*
+    * Arbitrarily select the first domain_info, as all nesting
+    * related capabilities should be consistent across iommu
+    * units.
+    */
+   domain_info = list_first_entry(&dmar_domain->devices,
+                                  struct device_domain_info, link);
+   cap &= domain_info->iommu->cap;
+   ecap &= domain_info->iommu->ecap;
+   spin_unlock_irqrestore(&device_domain_lock, flags);
+
+   info->format = IOMMU_PASID_FORMAT_INTEL_VTD;
+   info->features = IOMMU_NESTING_FEAT_SYSWIDE_PASID |
+                    IOMMU_NESTING_FEAT_BIND_PGTBL |
+                    IOMMU_NESTING_FEAT_CACHE_INVLD;
+   info->addr_width = dmar_domain->gaw;
+   info->pasid_bits = ilog2(intel_pasid_max_id);
+   info->padding = 0;
+   vtd.flags = 0;
+   vtd.padding = 0;
+   vtd.cap_reg = cap;
+   vtd.ecap_reg = ecap;
+
+   memcpy(info->data, &vtd, sizeof(vtd));
+   return 0;
+}
+
+static int intel_iommu_domain_get_attr(struct iommu_domain *domain,
+  enum iommu_attr attr, void *data)
+{
+   switch (attr) {
+   case DOMAIN_ATTR_NESTING:
+   {
+   struct iommu_nesting_info *info =
+   (struct iommu_nesting_info *)data;
+
+   return intel_iommu_get_nesting_info(domain, info);
+   }
+   default:
+   return -ENODEV;
+   }
+}
+
 /*
  * Check that the device does not live on an external facing PCI port that is
  * marked as untrusted. Such devices should not be able to apply quirks and
@@ -6101,6 +6177,7 @@ const struct iommu_ops intel_iommu_ops = {
.domain_alloc   = intel_iommu_domain_alloc,
.domain_free= intel_iommu_domain_free,
.domain_set_attr= intel_iommu_domain_set_attr,
+   .domain_get_attr= intel_iommu_domain_get_attr,
.attach_dev = intel_iommu_attach_device,
.detach_dev = intel_iommu_detach_device,
.aux_attach_dev = intel_iommu_aux_attach_device,
diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index 18f292e..c4ed0d4 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -197,6 +197,22 @@
 #define ecap_max_handle_mask(e) ((e >> 20) & 0xf)
 #define ecap_sc_support(e) ((e >> 7) & 0x1) /* Snooping Control */
 
+/* Nesting Support Capability Alignment */
+#define VTD_CAP_FL1GP  BIT_ULL(56)
+#define 

Re: [PATCH v2 3/5] irqchip: Allow QCOM_PDC to be loadable as a permanent module

2020-07-12 Thread Marc Zyngier
On Sat, 11 Jul 2020 00:27:45 +0100,
Stephen Boyd  wrote:
> 
> Quoting John Stultz (2020-07-10 15:44:18)
> > On Thu, Jul 9, 2020 at 11:02 PM Stephen Boyd  wrote:
> > >
> > > Does it work? I haven't looked in detail but I worry that the child
> > > irqdomain (i.e. pinctrl-msm) would need to delay probing until this
> > > parent irqdomain is registered. Or has the hierarchical irqdomain code
> > > been updated to handle the parent child relationship and wait for things
> > > to probe or be loaded?
> > 
> > So I can't say I know the underlying hardware particularly well, but
> > I've been using this successfully on the Dragonboard 845c with both
> > static builds as well as module enabled builds.
> > And the same patch has been in the android-mainline and android-5.4
> > kernels for a while without objections from QCOM.
> > 
> > As to the probe ordering question, Saravana can maybe speak in more
> > detail if it's involved in this case but the fw_devlink code has
> > addressed many of these sorts of ordering issues.
> > However, I'm not sure if I'm lucking into the right probe order, as we
> > have been able to boot android-mainline w/ both fw_devlink=on and
> > fw_devlink=off (though in the =off case, we need
> > deferred_probe_timeout=30 to give us a bit more time for modules to
> > load after init starts).
> > 
> 
> Ok I looked at the code (sorry for not checking earlier) and I see this in
> msm_gpio_init()
> 
> np = of_parse_phandle(pctrl->dev->of_node, "wakeup-parent", 0);
> if (np) {
> chip->irq.parent_domain = irq_find_matching_host(np,
>  DOMAIN_BUS_WAKEUP);
> of_node_put(np);
> if (!chip->irq.parent_domain)
> return -EPROBE_DEFER;
> 
> so it looks like we'll probe defer the pinctrl driver until the pdc module
> loads. Meaning it should work to have pinctrl builtin and pdc as a module.

What I hope is that eventually fw_devlink will become the norm (on by
default), and that probe deferral will become a thing of the past.

M.

-- 
Without deviation from the norm, progress is not possible.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu