RE: [PATCH 2/4] iommu/vt-d: Enable write protect propagation from guest

2021-02-18 Thread Tian, Kevin
> From: Jacob Pan 
> Sent: Friday, February 19, 2021 5:31 AM
> 
> Write protect bit, when set, inhibits supervisor writes to read-only
> pages. In guest supervisor shared virtual addressing (SVA), write protect
> should be honored when the guest binds a supervisor PASID.
> 
> This patch extends the VT-d portion of the IOMMU UAPI to include the WP bit.
> The WPE bit of the supervisor PASID entry will be set to match the CPU CR0.WP
> bit.
> 
> Signed-off-by: Sanjay Kumar 
> Signed-off-by: Jacob Pan 
> ---
>  drivers/iommu/intel/pasid.c | 5 +++++
>  include/uapi/linux/iommu.h  | 3 ++-
>  2 files changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/iommu/intel/pasid.c b/drivers/iommu/intel/pasid.c
> index 0b7e0e726ade..c7a2ec930af4 100644
> --- a/drivers/iommu/intel/pasid.c
> +++ b/drivers/iommu/intel/pasid.c
> @@ -763,6 +763,11 @@ intel_pasid_setup_bind_data(struct intel_iommu *iommu, struct pasid_entry *pte,
>   return -EINVAL;
>   }
>   pasid_set_sre(pte);
> + /* Enable write protect WP if guest requested */
> + if (pasid_data->flags & IOMMU_SVA_VTD_GPASID_WPE) {
> + if (pasid_enable_wpe(pte))
> + return -EINVAL;

We should call pasid_set_wpe() directly, as this binding is about the guest
page table and we assume the guest has done whatever checks are required
(e.g. guest CR0.WP) before setting this bit. pasid_enable_wpe() has an
additional check on the host CR0.WP and is thus logically incorrect here.
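For illustration, the suggested change would look roughly like this
(a sketch only, reusing the names from the patch above):

	pasid_set_sre(pte);
	/* guest is assumed to have checked its CR0.WP before setting WPE */
	if (pasid_data->flags & IOMMU_SVA_VTD_GPASID_WPE)
		pasid_set_wpe(pte);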

Thanks
Kevin

> + }
>   }
> 
>   if (pasid_data->flags & IOMMU_SVA_VTD_GPASID_EAFE) {
> diff --git a/include/uapi/linux/iommu.h b/include/uapi/linux/iommu.h
> index 68cb558fe8db..33f3dc7a91de 100644
> --- a/include/uapi/linux/iommu.h
> +++ b/include/uapi/linux/iommu.h
> @@ -288,7 +288,8 @@ struct iommu_gpasid_bind_data_vtd {
>  #define IOMMU_SVA_VTD_GPASID_PWT (1 << 3) /* page-level write through */
>  #define IOMMU_SVA_VTD_GPASID_EMTE(1 << 4) /* extended mem type enable */
>  #define IOMMU_SVA_VTD_GPASID_CD  (1 << 5) /* PASID-level cache disable */
> -#define IOMMU_SVA_VTD_GPASID_LAST (1 << 6)
> +#define IOMMU_SVA_VTD_GPASID_WPE (1 << 6) /* Write protect enable */
> +#define IOMMU_SVA_VTD_GPASID_LAST (1 << 7)
>   __u64 flags;
>   __u32 pat;
>   __u32 emt;
> --
> 2.25.1



[PATCH 1/4] iommu/vt-d: Enable write protect for supervisor SVM

2021-02-18 Thread Jacob Pan
Write protect bit, when set, inhibits supervisor writes to read-only
pages. In supervisor shared virtual addressing (SVA), where page tables
are shared between CPU and DMA, the IOMMU PASID entry WPE bit should
match the CR0.WP bit in the CPU.
This patch sets the WPE bit for supervisor PASIDs if CR0.WP is set.

Signed-off-by: Sanjay Kumar 
Signed-off-by: Jacob Pan 
---
 drivers/iommu/intel/pasid.c | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

diff --git a/drivers/iommu/intel/pasid.c b/drivers/iommu/intel/pasid.c
index 0cceaabc3ce6..0b7e0e726ade 100644
--- a/drivers/iommu/intel/pasid.c
+++ b/drivers/iommu/intel/pasid.c
@@ -410,6 +410,15 @@ static inline void pasid_set_sre(struct pasid_entry *pe)
pasid_set_bits(&pe->val[2], 1 << 0, 1);
 }
 
+/*
+ * Setup the WPE (Write Protect Enable) field (Bit 132) of a
+ * scalable mode PASID entry.
+ */
+static inline void pasid_set_wpe(struct pasid_entry *pe)
+{
+   pasid_set_bits(&pe->val[2], 1 << 4, 1 << 4);
+}
+
 /*
  * Setup the P(Present) field (Bit 0) of a scalable mode PASID
  * entry.
@@ -553,6 +562,20 @@ static void pasid_flush_caches(struct intel_iommu *iommu,
}
 }
 
+static inline int pasid_enable_wpe(struct pasid_entry *pte)
+{
+   unsigned long cr0 = read_cr0();
+
+   /* CR0.WP is normally set but just to be sure */
+   if (unlikely(!(cr0 & X86_CR0_WP))) {
+   pr_err_ratelimited("No CPU write protect!\n");
+   return -EINVAL;
+   }
+   pasid_set_wpe(pte);
+
+   return 0;
+};
+
 /*
  * Set up the scalable mode pasid table entry for first only
  * translation type.
@@ -584,6 +607,9 @@ int intel_pasid_setup_first_level(struct intel_iommu *iommu,
return -EINVAL;
}
pasid_set_sre(pte);
+   if (pasid_enable_wpe(pte))
+   return -EINVAL;
+
}
 
if (flags & PASID_FLAG_FL5LP) {
-- 
2.25.1
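A side note on the pasid_set_wpe() helper above: the scalable-mode PASID
table entry is 512 bits wide, stored as eight 64-bit words (val[0..7]),
so bit 132 lands in val[2] at bit offset 4. A minimal sketch of that
index arithmetic:

	/* Sketch: locating PASID-entry bit 132 in the val[] array */
	unsigned int bit  = 132;	/* WPE bit in the 512-bit entry */
	unsigned int word = bit / 64;	/* = 2, hence &pe->val[2] */
	unsigned int off  = bit % 64;	/* = 4, hence the 1 << 4 mask */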



[PATCH 0/4] Misc vSVA fixes for VT-d

2021-02-18 Thread Jacob Pan
Hi Baolu et al,

This is a collection of SVA-related fixes.

Thanks,
Jacob


Jacob Pan (4):
  iommu/vt-d: Enable write protect for supervisor SVM
  iommu/vt-d: Enable write protect propagation from guest
  iommu/vt-d: Reject unsupported page request modes
  iommu/vt-d: Calculate and set flags for handle_mm_fault

 drivers/iommu/intel/pasid.c | 31 +++++++++++++++++++++++++++++++
 drivers/iommu/intel/svm.c   | 21 +++++++++++++++++----
 include/uapi/linux/iommu.h  |  3 ++-
 3 files changed, 50 insertions(+), 5 deletions(-)

-- 
2.25.1



[PATCH 2/4] iommu/vt-d: Enable write protect propagation from guest

2021-02-18 Thread Jacob Pan
Write protect bit, when set, inhibits supervisor writes to read-only
pages. In guest supervisor shared virtual addressing (SVA), write protect
should be honored when the guest binds a supervisor PASID.

This patch extends the VT-d portion of the IOMMU UAPI to include the WP bit.
The WPE bit of the supervisor PASID entry will be set to match the CPU CR0.WP
bit.

Signed-off-by: Sanjay Kumar 
Signed-off-by: Jacob Pan 
---
 drivers/iommu/intel/pasid.c | 5 +++++
 include/uapi/linux/iommu.h  | 3 ++-
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/intel/pasid.c b/drivers/iommu/intel/pasid.c
index 0b7e0e726ade..c7a2ec930af4 100644
--- a/drivers/iommu/intel/pasid.c
+++ b/drivers/iommu/intel/pasid.c
@@ -763,6 +763,11 @@ intel_pasid_setup_bind_data(struct intel_iommu *iommu, struct pasid_entry *pte,
return -EINVAL;
}
pasid_set_sre(pte);
+   /* Enable write protect WP if guest requested */
+   if (pasid_data->flags & IOMMU_SVA_VTD_GPASID_WPE) {
+   if (pasid_enable_wpe(pte))
+   return -EINVAL;
+   }
}
 
if (pasid_data->flags & IOMMU_SVA_VTD_GPASID_EAFE) {
diff --git a/include/uapi/linux/iommu.h b/include/uapi/linux/iommu.h
index 68cb558fe8db..33f3dc7a91de 100644
--- a/include/uapi/linux/iommu.h
+++ b/include/uapi/linux/iommu.h
@@ -288,7 +288,8 @@ struct iommu_gpasid_bind_data_vtd {
 #define IOMMU_SVA_VTD_GPASID_PWT   (1 << 3) /* page-level write through */
 #define IOMMU_SVA_VTD_GPASID_EMTE  (1 << 4) /* extended mem type enable */
 #define IOMMU_SVA_VTD_GPASID_CD    (1 << 5) /* PASID-level cache disable */
-#define IOMMU_SVA_VTD_GPASID_LAST  (1 << 6)
+#define IOMMU_SVA_VTD_GPASID_WPE   (1 << 6) /* Write protect enable */
+#define IOMMU_SVA_VTD_GPASID_LAST  (1 << 7)
__u64 flags;
__u32 pat;
__u32 emt;
-- 
2.25.1



[PATCH 4/4] iommu/vt-d: Calculate and set flags for handle_mm_fault

2021-02-18 Thread Jacob Pan
Page requests originate from user page faults. Therefore, we should
set FAULT_FLAG_USER.

FAULT_FLAG_REMOTE indicates that we are walking an mm which is not
guaranteed to be the same as current->mm and should not be subject
to protection key enforcement. Therefore, we should set FAULT_FLAG_REMOTE
to avoid faults when both SVM and PKEY are used.
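For reference, a sketch of the check these flags feed into, as added by
the commit referenced below (quoted approximately from mm/memory.c):

	/* pkey enforcement is skipped when FAULT_FLAG_REMOTE is set */
	if (!arch_vma_access_permitted(vma, flags & FAULT_FLAG_WRITE,
				       flags & FAULT_FLAG_INSTRUCTION,
				       flags & FAULT_FLAG_REMOTE))
		return VM_FAULT_SIGSEGV;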

References: commit 1b2ee1266ea6 ("mm/core: Do not enforce PKEY permissions on remote mm access")
Reviewed-by: Raj Ashok 
Signed-off-by: Jacob Pan 
---
 drivers/iommu/intel/svm.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
index ff7ae7cc17d5..7bfd20a24a60 100644
--- a/drivers/iommu/intel/svm.c
+++ b/drivers/iommu/intel/svm.c
@@ -1086,6 +1086,7 @@ static irqreturn_t prq_event_thread(int irq, void *d)
struct intel_iommu *iommu = d;
struct intel_svm *svm = NULL;
int head, tail, handled = 0;
+   unsigned int flags = 0;
 
/* Clear PPR bit before reading head/tail registers, to
 * ensure that we get a new interrupt if needed. */
@@ -1186,9 +1187,11 @@ static irqreturn_t prq_event_thread(int irq, void *d)
if (access_error(vma, req))
goto invalid;
 
-   ret = handle_mm_fault(vma, address,
- req->wr_req ? FAULT_FLAG_WRITE : 0,
- NULL);
+   flags = FAULT_FLAG_USER | FAULT_FLAG_REMOTE;
+   if (req->wr_req)
+   flags |= FAULT_FLAG_WRITE;
+
+   ret = handle_mm_fault(vma, address, flags, NULL);
if (ret & VM_FAULT_ERROR)
goto invalid;
 
-- 
2.25.1


[PATCH 3/4] iommu/vt-d: Reject unsupported page request modes

2021-02-18 Thread Jacob Pan
When supervisor/privilege mode SVM is used, we bind init_mm.pgd with
a supervisor PASID. There should not be any page fault for init_mm.
An execution request with DMA read is also not supported.

This patch checks the PRQ descriptor for both unsupported configurations
and rejects them with invalid responses.

Signed-off-by: Jacob Pan 
---
 drivers/iommu/intel/svm.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
index 23a1e4f58c54..ff7ae7cc17d5 100644
--- a/drivers/iommu/intel/svm.c
+++ b/drivers/iommu/intel/svm.c
@@ -1113,7 +1113,17 @@ static irqreturn_t prq_event_thread(int irq, void *d)
   ((unsigned long long *)req)[1]);
goto no_pasid;
}
-
+   /* We shall not receive page request for supervisor SVM */
+   if (req->pm_req && (req->rd_req | req->wr_req)) {
+   pr_err("Unexpected page request in Privilege Mode");
+   /* No need to find the matching sdev as for bad_req */
+   goto no_pasid;
+   }
+   /* DMA read with exec request is not supported. */
+   if (req->exe_req && req->rd_req) {
+   pr_err("Execution request not supported\n");
+   goto no_pasid;
+   }
if (!svm || svm->pasid != req->pasid) {
rcu_read_lock();
svm = ioasid_find(NULL, req->pasid, NULL);
-- 
2.25.1



Re: [EXTERNAL] Re: Question regarding VIOT proposal

2021-02-18 Thread Al Stone
On 17 Feb 2021 10:37, Jean-Philippe Brucker wrote:
> On Tue, Feb 16, 2021 at 02:31:03PM -0700, Al Stone wrote:
> > Would you believe last week's meeting was canceled, too?  Not sure
> > why the chair/co-chair are doing this but I'm finding it a little
> > frustrating.
> > 
> > We'll try again this week ... again, apologies for the delays.  I'd
> > recommend going with the last version posted just so progress can be
> > made.  The spec can always be fixed later.
> 
> Thanks, I'll send the acpica changes for review and follow with QEMU and
> Linux patches. These things also take time so I might as well start in
> parallel.
> 
> Thanks,
> Jean

As of today, the proposal has been approved for inclusion in the next
release of the ACPI spec (whatever version gets released post the 6.4
version that just came out).

Congratulations ?!? :)

And thanks to all for their patience during this process.  You now
have the dubious distinction of being the very first table added
to the spec that _started_ as open source.

-- 
ciao,
al
---
Al Stone
Software Engineer
Red Hat, Inc.
a...@redhat.com
---



[PATCH] iommu/tegra-smmu: Fix mc errors on tegra124-nyan

2021-02-18 Thread Nicolin Chen
Commit 25938c73cd79 ("iommu/tegra-smmu: Rework tegra_smmu_probe_device()")
removed a certain hack in tegra_smmu_probe() by relying on the IOMMU core
to of_xlate the SMMU's SID per device, so as to get rid of tegra_smmu_find()
and tegra_smmu_configure(), whose work is typically done in the IOMMU core
as well.

This approach works, upon testing on Tegra210 and Tegra30, for both existing
devices that have DT nodes and other devices (like PCI devices) that don't
exist in DT. However, Page Fault errors are reported on tegra124-nyan:

  tegra-mc 70019000.memory-controller: display0a: read @0xfe056b40:
 EMEM address decode error (SMMU translation error [--S])
  tegra-mc 70019000.memory-controller: display0a: read @0xfe056b40:
 Page fault (SMMU translation error [--S])

After debugging, I found that the mentioned commit changed some function
callback sequence of tegra-smmu's, resulting in the SMMU being enabled for
the display client before the display driver gets initialized. I couldn't
reproduce the exact same issue on Tegra210, as Tegra124 (arm-32) differs
at the arch-level code.

Actually this Page Fault is a known issue, as on most Tegra platforms
display gets enabled by the bootloader for the splash screen feature, so
it keeps filling the framebuffer memory. A proper fix to this issue is to
1:1 linear map the framebuffer memory to the IOVA space so the SMMU will
have the same address as the physical address in its page table. Yet,
Thierry has been working on that solution for a year, and it hasn't been
merged.

The reason why we do a partial revert here is that we can still set priv
in the ->of_xlate() callback for PCI devices. Meanwhile, devices existing
in DT, like display, will go through tegra_smmu_configure() at the stage
of bus_set_iommu() when the SMMU gets probed, as was done before the
mentioned commit was merged.

Once we have the linear map solution for framebuffer memory, this change
can be cleaned away.

[Big thanks to Guillaume, who reported the issue and helped with debugging/verification]

Fixes: 25938c73cd79 ("iommu/tegra-smmu: Rework tegra_smmu_probe_device()")
Reported-by: Guillaume Tucker 
Signed-off-by: Nicolin Chen 
---

Guillaume, would you please give a "Tested-by" to this change? Thanks!

 drivers/iommu/tegra-smmu.c | 72 +-
 1 file changed, 71 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/tegra-smmu.c b/drivers/iommu/tegra-smmu.c
index 4a3f095a1c26..97eb62f667d2 100644
--- a/drivers/iommu/tegra-smmu.c
+++ b/drivers/iommu/tegra-smmu.c
@@ -798,10 +798,70 @@ static phys_addr_t tegra_smmu_iova_to_phys(struct iommu_domain *domain,
return SMMU_PFN_PHYS(pfn) + SMMU_OFFSET_IN_PAGE(iova);
 }
 
+static struct tegra_smmu *tegra_smmu_find(struct device_node *np)
+{
+   struct platform_device *pdev;
+   struct tegra_mc *mc;
+
+   pdev = of_find_device_by_node(np);
+   if (!pdev)
+   return NULL;
+
+   mc = platform_get_drvdata(pdev);
+   if (!mc)
+   return NULL;
+
+   return mc->smmu;
+}
+
+static int tegra_smmu_configure(struct tegra_smmu *smmu, struct device *dev,
+   struct of_phandle_args *args)
+{
+   const struct iommu_ops *ops = smmu->iommu.ops;
+   int err;
+
+   err = iommu_fwspec_init(dev, &dev->of_node->fwnode, ops);
+   if (err < 0) {
+   dev_err(dev, "failed to initialize fwspec: %d\n", err);
+   return err;
+   }
+
+   err = ops->of_xlate(dev, args);
+   if (err < 0) {
+   dev_err(dev, "failed to parse SW group ID: %d\n", err);
+   iommu_fwspec_free(dev);
+   return err;
+   }
+
+   return 0;
+}
+
 static struct iommu_device *tegra_smmu_probe_device(struct device *dev)
 {
-   struct tegra_smmu *smmu = dev_iommu_priv_get(dev);
+   struct device_node *np = dev->of_node;
+   struct tegra_smmu *smmu = NULL;
+   struct of_phandle_args args;
+   unsigned int index = 0;
+   int err;
+
+   while (of_parse_phandle_with_args(np, "iommus", "#iommu-cells", index,
+ &args) == 0) {
+   smmu = tegra_smmu_find(args.np);
+   if (smmu) {
+   err = tegra_smmu_configure(smmu, dev, &args);
+   of_node_put(args.np);
 
+   if (err < 0)
+   return ERR_PTR(err);
+
+   break;
+   }
+
+   of_node_put(args.np);
+   index++;
+   }
+
+   smmu = dev_iommu_priv_get(dev);
if (!smmu)
return ERR_PTR(-ENODEV);
 
@@ -1028,6 +1088,16 @@ struct tegra_smmu *tegra_smmu_probe(struct device *dev,
if (!smmu)
return ERR_PTR(-ENOMEM);
 
+   /*
+* This is a bit of a hack. Ideally we'd want to simply return this
+* value. However the IOMMU registration process will attempt to add
+   

Re: [PATCH RESEND v2 4/5] iommu/tegra-smmu: Rework tegra_smmu_probe_device()

2021-02-18 Thread Guillaume Tucker
On 18/02/2021 10:35, Nicolin Chen wrote:
> Hi Guillaume,
> 
> Thank you for the test results! And sorry for my belated reply.

No worries :)

> On Thu, Feb 11, 2021 at 03:50:05PM +, Guillaume Tucker wrote:
>>> On Sat, Feb 06, 2021 at 01:40:13PM +, Guillaume Tucker wrote:
>>>>> It'd be nicer if I can get both logs of the vanilla kernel (failing)
>>>>> and the commit-reverted version (passing), each applying this patch.
>>>>
>>>> Sure, I've run 3 jobs:
>>>>
>>>> * v5.11-rc6 as a reference, to see the original issue:
>>>>   https://lava.collabora.co.uk/scheduler/job/3187848
>>>>
>>>> * + your debug patch:
>>>>   https://lava.collabora.co.uk/scheduler/job/3187849
>>>>
>>>> * + the "breaking" commit reverted, passing the tests:
>>>>   https://lava.collabora.co.uk/scheduler/job/3187851
>>>
>>> Thanks for the help!
>>>
>>> I am able to figure out what's probably wrong, yet not so sure
>>> about the best solution at this point.
>>>
>>> Would it be possible for you to run one more time with another
>>> debugging patch? I'd like to see the same logs as previous:
>>> 1. Vanilla kernel + debug patch
>>> 2. Vanilla kernel + Reverted + debug patch
>>
>> As it turns out, next-20210210 is passing all the tests again so
>> it looks like this got fixed in the meantime:
>>
>>   https://lava.collabora.co.uk/scheduler/job/3210192
> 
> I checked this passing log; however, I found that the regression is
> still there even though the test passed, as the prints below aren't normal:
>   tegra-mc 70019000.memory-controller: display0a: read @0xfe056b40:
>EMEM address decode error (SMMU translation error [--S])
>   tegra-mc 70019000.memory-controller: display0a: read @0xfe056b40:
>Page fault (SMMU translation error [--S])

Ah yes sorry, there are other KernelCI checks for kernel errors
but that wasn't enabled in the bisection so I didn't notice them.

> I was trying to think of a simpler solution than a revert. However,
> given the fact that the callback sequence could change -- likely due to
> a recent change in the iommu core -- I feel it safer to revert my
> previous change, though not necessarily with a complete revert.
> 
> I attached my partial reverting change in this email. Would it be
> possible for you to run one more test for me to confirm it? It'd
> keep the tests passing while eliminating all error prints above.
> 
> If the fix works, I'll re-send it to mail list by adding a commit
> message.

Sure, here's next-20210218 as a reference:

  https://lava.collabora.co.uk/scheduler/job/3241236

and here with your patch applied on top of it:

  https://lava.collabora.co.uk/scheduler/job/3241246

The git branch I've used where your patch is applied:

  
https://gitlab.collabora.com/gtucker/linux/-/commits/next-20210218-nyan-big-drm-read/

The errors seem to have disappeared but I'll let you double check
that things are all back to a working state.

BTW: This thread is a good example of how having an "on-demand"
KernelCI service to let developers re-run tests with extra
patches would allow them to fix issues independently.  We'll keep
that in mind for the future.

Best wishes,
Guillaume


Re: [PATCH v11 12/13] vfio/pci: Register a DMA fault response region

2021-02-18 Thread Auger Eric
Hi Shameer,

On 2/18/21 11:36 AM, Shameerali Kolothum Thodi wrote:
> Hi Eric,
> 
>>> -Original Message-
>>> From: Eric Auger [mailto:eric.au...@redhat.com]
>>> Sent: 16 November 2020 11:00
>>> To: eric.auger@gmail.com; eric.au...@redhat.com;
>>> iommu@lists.linux-foundation.org; linux-ker...@vger.kernel.org;
>>> k...@vger.kernel.org; kvm...@lists.cs.columbia.edu; w...@kernel.org;
>>> j...@8bytes.org; m...@kernel.org; robin.mur...@arm.com;
>>> alex.william...@redhat.com
>>> Cc: jean-phili...@linaro.org; zhangfei@linaro.org;
>>> zhangfei@gmail.com; vivek.gau...@arm.com; Shameerali Kolothum
>>> Thodi ;
>>> jacob.jun@linux.intel.com; yi.l@intel.com; t...@semihalf.com;
>>> nicoleots...@gmail.com; yuzenghui 
>>> Subject: [PATCH v11 12/13] vfio/pci: Register a DMA fault response
>>> region
>>>
>>> In preparation for vSVA, let's register a DMA fault response region,
>>> where the userspace will push the page responses and increment the
>>> head of the buffer. The kernel will pop those responses and inject
>>> them on iommu side.
>>>
>>> Signed-off-by: Eric Auger 
>>> ---
>>>  drivers/vfio/pci/vfio_pci.c | 114 +---
>>>  drivers/vfio/pci/vfio_pci_private.h |   5 ++
>>>  drivers/vfio/pci/vfio_pci_rdwr.c|  39 ++
>>>  include/uapi/linux/vfio.h   |  32 
>>>  4 files changed, 181 insertions(+), 9 deletions(-)
>>>
>>> diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
>>> index 65a83fd0e8c0..e9a904ce3f0d 100644
>>> --- a/drivers/vfio/pci/vfio_pci.c
>>> +++ b/drivers/vfio/pci/vfio_pci.c
>>> @@ -318,9 +318,20 @@ static void vfio_pci_dma_fault_release(struct
>>> vfio_pci_device *vdev,
>>> kfree(vdev->fault_pages);
>>>  }
>>>
>>> -static int vfio_pci_dma_fault_mmap(struct vfio_pci_device *vdev,
>>> -  struct vfio_pci_region *region,
>>> -  struct vm_area_struct *vma)
>>> +static void
>>> +vfio_pci_dma_fault_response_release(struct vfio_pci_device *vdev,
>>> +   struct vfio_pci_region *region) {
>>> +   if (vdev->dma_fault_response_wq)
>>> +   destroy_workqueue(vdev->dma_fault_response_wq);
>>> +   kfree(vdev->fault_response_pages);
>>> +   vdev->fault_response_pages = NULL;
>>> +}
>>> +
>>> +static int __vfio_pci_dma_fault_mmap(struct vfio_pci_device *vdev,
>>> +struct vfio_pci_region *region,
>>> +struct vm_area_struct *vma,
>>> +u8 *pages)
>>>  {
>>> u64 phys_len, req_len, pgoff, req_start;
>>> unsigned long long addr;
>>> @@ -333,14 +344,14 @@ static int vfio_pci_dma_fault_mmap(struct
>>> vfio_pci_device *vdev,
>>> ((1U << (VFIO_PCI_OFFSET_SHIFT - PAGE_SHIFT)) - 1);
>>> req_start = pgoff << PAGE_SHIFT;
>>>
>>> -   /* only the second page of the producer fault region is mmappable */
>>> +   /* only the second page of the fault region is mmappable */
>>> if (req_start < PAGE_SIZE)
>>> return -EINVAL;
>>>
>>> if (req_start + req_len > phys_len)
>>> return -EINVAL;
>>>
>>> -   addr = virt_to_phys(vdev->fault_pages);
>>> +   addr = virt_to_phys(pages);
>>> vma->vm_private_data = vdev;
>>> vma->vm_pgoff = (addr >> PAGE_SHIFT) + pgoff;
>>>
>>> @@ -349,13 +360,29 @@ static int vfio_pci_dma_fault_mmap(struct
>>> vfio_pci_device *vdev,
>>> return ret;
>>>  }
>>>
>>> -static int vfio_pci_dma_fault_add_capability(struct vfio_pci_device *vdev,
>>> -struct vfio_pci_region *region,
>>> -struct vfio_info_cap *caps)
>>> +static int vfio_pci_dma_fault_mmap(struct vfio_pci_device *vdev,
>>> +  struct vfio_pci_region *region,
>>> +  struct vm_area_struct *vma)
>>> +{
>>> +   return __vfio_pci_dma_fault_mmap(vdev, region, vma,
>>> vdev->fault_pages);
>>> +}
>>> +
>>> +static int
>>> +vfio_pci_dma_fault_response_mmap(struct vfio_pci_device *vdev,
>>> +   struct vfio_pci_region *region,
>>> +   struct vm_area_struct *vma)
>>> +{
>>> +   return __vfio_pci_dma_fault_mmap(vdev, region, vma,
>>> vdev->fault_response_pages);
>>> +}
>>> +
>>> +static int __vfio_pci_dma_fault_add_capability(struct vfio_pci_device 
>>> *vdev,
>>> +  struct vfio_pci_region *region,
>>> +  struct vfio_info_cap *caps,
>>> +  u32 cap_id)
>>>  {
>>> struct vfio_region_info_cap_sparse_mmap *sparse = NULL;
>>> struct vfio_region_info_cap_fault cap = {
>>> -   .header.id = VFIO_REGION_INFO_CAP_DMA_FAULT,
>>> +   .header.id = cap_id,
>>> .header.version = 1,
>>> .version = 1,
>>> };
>>> @@ -383,6 +410,14 @@ static int
>>> vfio_pci_dma_fault_add_capability(struct
>>> 

RE: [PATCH v11 12/13] vfio/pci: Register a DMA fault response region

2021-02-18 Thread Shameerali Kolothum Thodi
Hi Eric,

> > -Original Message-
> > From: Eric Auger [mailto:eric.au...@redhat.com]
> > Sent: 16 November 2020 11:00
> > To: eric.auger@gmail.com; eric.au...@redhat.com;
> > iommu@lists.linux-foundation.org; linux-ker...@vger.kernel.org;
> > k...@vger.kernel.org; kvm...@lists.cs.columbia.edu; w...@kernel.org;
> > j...@8bytes.org; m...@kernel.org; robin.mur...@arm.com;
> > alex.william...@redhat.com
> > Cc: jean-phili...@linaro.org; zhangfei@linaro.org;
> > zhangfei@gmail.com; vivek.gau...@arm.com; Shameerali Kolothum
> > Thodi ;
> > jacob.jun@linux.intel.com; yi.l@intel.com; t...@semihalf.com;
> > nicoleots...@gmail.com; yuzenghui 
> > Subject: [PATCH v11 12/13] vfio/pci: Register a DMA fault response
> > region
> >
> > In preparation for vSVA, let's register a DMA fault response region,
> > where the userspace will push the page responses and increment the
> > head of the buffer. The kernel will pop those responses and inject
> > them on iommu side.
> >
> > Signed-off-by: Eric Auger 
> > ---
> >  drivers/vfio/pci/vfio_pci.c | 114 +---
> >  drivers/vfio/pci/vfio_pci_private.h |   5 ++
> >  drivers/vfio/pci/vfio_pci_rdwr.c|  39 ++
> >  include/uapi/linux/vfio.h   |  32 
> >  4 files changed, 181 insertions(+), 9 deletions(-)
> >
> > diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
> > index 65a83fd0e8c0..e9a904ce3f0d 100644
> > --- a/drivers/vfio/pci/vfio_pci.c
> > +++ b/drivers/vfio/pci/vfio_pci.c
> > @@ -318,9 +318,20 @@ static void vfio_pci_dma_fault_release(struct
> > vfio_pci_device *vdev,
> > kfree(vdev->fault_pages);
> >  }
> >
> > -static int vfio_pci_dma_fault_mmap(struct vfio_pci_device *vdev,
> > -  struct vfio_pci_region *region,
> > -  struct vm_area_struct *vma)
> > +static void
> > +vfio_pci_dma_fault_response_release(struct vfio_pci_device *vdev,
> > +   struct vfio_pci_region *region) {
> > +   if (vdev->dma_fault_response_wq)
> > +   destroy_workqueue(vdev->dma_fault_response_wq);
> > +   kfree(vdev->fault_response_pages);
> > +   vdev->fault_response_pages = NULL;
> > +}
> > +
> > +static int __vfio_pci_dma_fault_mmap(struct vfio_pci_device *vdev,
> > +struct vfio_pci_region *region,
> > +struct vm_area_struct *vma,
> > +u8 *pages)
> >  {
> > u64 phys_len, req_len, pgoff, req_start;
> > unsigned long long addr;
> > @@ -333,14 +344,14 @@ static int vfio_pci_dma_fault_mmap(struct
> > vfio_pci_device *vdev,
> > ((1U << (VFIO_PCI_OFFSET_SHIFT - PAGE_SHIFT)) - 1);
> > req_start = pgoff << PAGE_SHIFT;
> >
> > -   /* only the second page of the producer fault region is mmappable */
> > +   /* only the second page of the fault region is mmappable */
> > if (req_start < PAGE_SIZE)
> > return -EINVAL;
> >
> > if (req_start + req_len > phys_len)
> > return -EINVAL;
> >
> > -   addr = virt_to_phys(vdev->fault_pages);
> > +   addr = virt_to_phys(pages);
> > vma->vm_private_data = vdev;
> > vma->vm_pgoff = (addr >> PAGE_SHIFT) + pgoff;
> >
> > @@ -349,13 +360,29 @@ static int vfio_pci_dma_fault_mmap(struct
> > vfio_pci_device *vdev,
> > return ret;
> >  }
> >
> > -static int vfio_pci_dma_fault_add_capability(struct vfio_pci_device *vdev,
> > -struct vfio_pci_region *region,
> > -struct vfio_info_cap *caps)
> > +static int vfio_pci_dma_fault_mmap(struct vfio_pci_device *vdev,
> > +  struct vfio_pci_region *region,
> > +  struct vm_area_struct *vma)
> > +{
> > +   return __vfio_pci_dma_fault_mmap(vdev, region, vma,
> > vdev->fault_pages);
> > +}
> > +
> > +static int
> > +vfio_pci_dma_fault_response_mmap(struct vfio_pci_device *vdev,
> > +   struct vfio_pci_region *region,
> > +   struct vm_area_struct *vma)
> > +{
> > +   return __vfio_pci_dma_fault_mmap(vdev, region, vma,
> > vdev->fault_response_pages);
> > +}
> > +
> > +static int __vfio_pci_dma_fault_add_capability(struct vfio_pci_device 
> > *vdev,
> > +  struct vfio_pci_region *region,
> > +  struct vfio_info_cap *caps,
> > +  u32 cap_id)
> >  {
> > struct vfio_region_info_cap_sparse_mmap *sparse = NULL;
> > struct vfio_region_info_cap_fault cap = {
> > -   .header.id = VFIO_REGION_INFO_CAP_DMA_FAULT,
> > +   .header.id = cap_id,
> > .header.version = 1,
> > .version = 1,
> > };
> > @@ -383,6 +410,14 @@ static int
> > vfio_pci_dma_fault_add_capability(struct
> > vfio_pci_device *vdev,
> > return ret;
> >  }
> >
> > +static int 

Re: [PATCH v13 02/15] iommu: Introduce bind/unbind_guest_msi

2021-02-18 Thread Auger Eric
Hi Keqian,

On 2/18/21 9:43 AM, Keqian Zhu wrote:
> Hi Eric,
> 
> On 2021/2/12 16:55, Auger Eric wrote:
>> Hi Keqian,
>>
>> On 2/1/21 12:52 PM, Keqian Zhu wrote:
>>> Hi Eric,
>>>
>>> On 2020/11/18 19:21, Eric Auger wrote:
>>>> On ARM, MSI are translated by the SMMU. An IOVA is allocated
>>>> for each MSI doorbell. If both the host and the guest are exposed
>>>> with SMMUs, we end up with 2 different IOVAs allocated by each.
>>>> guest allocates an IOVA (gIOVA) to map onto the guest MSI
>>>> doorbell (gDB). The Host allocates another IOVA (hIOVA) to map
>>>> onto the physical doorbell (hDB).
>>>>
>>>> So we end up with 2 untied mappings:
>>>>          S1             S2
>>>> gIOVA    ->    gDB
>>>>               hIOVA    ->    hDB
>>>>
>>>> Currently the PCI device is programmed by the host with hIOVA
>>>> as MSI doorbell. So this does not work.
>>>>
>>>> This patch introduces an API to pass gIOVA/gDB to the host so
>>>> that gIOVA can be reused by the host instead of re-allocating
>>>> a new IOVA. So the goal is to create the following nested mapping:
>>> Can the gDB be reused under non-nested mode?
>>
>> Under non-nested mode, the hIOVA is allocated within the MSI reserved
>> region exposed by the SMMU driver, [0x8000000, 0x80fffff]; see
>> iommu_dma_prepare_msi/iommu_dma_get_msi_page in dma-iommu.c. This hIOVA
>> is programmed in the physical device so that the physical SMMU
>> translates it into the physical doorbell (hDB = host physical ITS
> So, AFAIU, under non-nested mode, on the SMMU side we reuse the workflow
> of the non-virtualization scenario.
Without virtualization, the host kernel also transparently allocates an
iova to map the doorbell. With standard passthrough without vIOMMU, the
iova window is different (MSI RESV region).
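For illustration, a minimal sketch of the host-side registration in the
nested case, using the API from this patch (giova/gdb_gpa are
illustrative values describing the guest's gIOVA->gDB mapping):

	dma_addr_t giova = 0x8000000;	/* guest-chosen MSI doorbell IOVA */

	/* bind: let the host reuse gIOVA when mapping the physical doorbell */
	ret = iommu_bind_guest_msi(domain, giova, gdb_gpa, SZ_4K);
	if (ret)
		return ret;

	/* on teardown */
	iommu_unbind_guest_msi(domain, giova);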

Thanks

Eric
> 
>> doorbell). The gDB is not used at pIOMMU programming level. It is only
>> used when setting up the KVM irq route.
>>
>> Hope this answers your question.
> Thanks for your explanation!
>>
> 
> Thanks,
> Keqian
> 
>>>

>>>>          S1             S2
>>>> gIOVA    ->    gDB    ->    hDB
>>>>
>>>> and program the PCI device with gIOVA MSI doorbell.
>>>>
>>>> In case we have several devices attached to this nested domain
>>>> (devices belonging to the same group), they cannot be isolated
>>>> on guest side either. So they should also end up in the same domain
>>>> on guest side. We will enforce that all the devices attached to
>>>> the host iommu domain use the same physical doorbell and similarly
>>>> a single virtual doorbell mapping gets registered (1 single
>>>> virtual doorbell is used on guest as well).

>>> [...]
>>>
>>>> + *
>>>> + * The associated IOVA can be reused by the host to create a nested
>>>> + * stage2 binding mapping translating into the physical doorbell used
>>>> + * by the devices attached to the domain.
>>>> + *
>>>> + * All devices within the domain must share the same physical doorbell.
>>>> + * A single MSI GIOVA/GPA mapping can be attached to an iommu_domain.
>>>> + */
>>>> +
>>>> +int iommu_bind_guest_msi(struct iommu_domain *domain,
>>>> +   dma_addr_t giova, phys_addr_t gpa, size_t size)
>>>> +{
>>>> +  if (unlikely(!domain->ops->bind_guest_msi))
>>>> +  return -ENODEV;
>>>> +
>>>> +  return domain->ops->bind_guest_msi(domain, giova, gpa, size);
>>>> +}
>>>> +EXPORT_SYMBOL_GPL(iommu_bind_guest_msi);
>>>> +
>>>> +void iommu_unbind_guest_msi(struct iommu_domain *domain,
>>>> +  dma_addr_t iova)
>>> nit: s/iova/giova
>> sure
>>>
>>>> +{
>>>> +  if (unlikely(!domain->ops->unbind_guest_msi))
>>>> +  return;
>>>> +
>>>> +  domain->ops->unbind_guest_msi(domain, iova);
>>>> +}
>>>> +EXPORT_SYMBOL_GPL(iommu_unbind_guest_msi);
>>>> +
>>> [...]
>>>
>>> Thanks,
>>> Keqian
>>>
>>
>> Thanks
>>
>> Eric
>>
>> .
>>
> 



Re: [PATCH RESEND v2 4/5] iommu/tegra-smmu: Rework tegra_smmu_probe_device()

2021-02-18 Thread Nicolin Chen
Hi Guillaume,

Thank you for the test results! And sorry for my belated reply.

On Thu, Feb 11, 2021 at 03:50:05PM +, Guillaume Tucker wrote:
> > On Sat, Feb 06, 2021 at 01:40:13PM +, Guillaume Tucker wrote:
> >>> It'd be nicer if I can get both logs of the vanilla kernel (failing)
> >>> and the commit-reverted version (passing), each applying this patch.
> >>
> >> Sure, I've run 3 jobs:
> >>
> >> * v5.11-rc6 as a reference, to see the original issue:
> >>   https://lava.collabora.co.uk/scheduler/job/3187848
> >>
> >> * + your debug patch:
> >>   https://lava.collabora.co.uk/scheduler/job/3187849
> >>
> >> * + the "breaking" commit reverted, passing the tests:
> >>   https://lava.collabora.co.uk/scheduler/job/3187851
> > 
> > Thanks for the help!
> > 
> > I am able to figure out what's probably wrong, yet not so sure
> > about the best solution at this point.
> > 
> > Would it be possible for you to run one more time with another
> > debugging patch? I'd like to see the same logs as previous:
> > 1. Vanilla kernel + debug patch
> > 2. Vanilla kernel + Reverted + debug patch
> 
> As it turns out, next-20210210 is passing all the tests again so
> it looks like this got fixed in the meantime:
> 
>   https://lava.collabora.co.uk/scheduler/job/3210192

I checked this passing log; however, I found that the regression is
still there even though the test passed, as the prints below aren't normal:
  tegra-mc 70019000.memory-controller: display0a: read @0xfe056b40:
 EMEM address decode error (SMMU translation error [--S])
  tegra-mc 70019000.memory-controller: display0a: read @0xfe056b40:
 Page fault (SMMU translation error [--S])

I was trying to think of a simpler solution than a revert. However, given
the fact that the callback sequence could change -- likely due to a recent
change in the iommu core -- I feel it safer to revert my previous change,
though not necessarily with a complete revert.

I attached my partial reverting change in this email. Would it be
possible for you to run one more test for me to confirm it? It'd
keep the tests passing while eliminating all error prints above.

If the fix works, I'll re-send it to mail list by adding a commit
message.

Thanks!
Nicolin
From a9950b6e76e279f19d2c9d06aef1e222b020a9e2 Mon Sep 17 00:00:00 2001
From: Nicolin Chen 
Date: Thu, 18 Feb 2021 01:11:59 -0800
Subject: [PATCH] iommu/tegra-smmu: Fix mc errors on tegra124-nyan

  tegra-mc 70019000.memory-controller: display0a: read @0xfe056b40:
	 EMEM address decode error (SMMU translation error [--S])
  tegra-mc 70019000.memory-controller: display0a: read @0xfe056b40:
	 Page fault (SMMU translation error [--S])

Signed-off-by: Nicolin Chen 
---
 drivers/iommu/tegra-smmu.c | 72 +-
 1 file changed, 71 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/tegra-smmu.c b/drivers/iommu/tegra-smmu.c
index 4a3f095a1c26..97eb62f667d2 100644
--- a/drivers/iommu/tegra-smmu.c
+++ b/drivers/iommu/tegra-smmu.c
@@ -798,10 +798,70 @@ static phys_addr_t tegra_smmu_iova_to_phys(struct iommu_domain *domain,
 	return SMMU_PFN_PHYS(pfn) + SMMU_OFFSET_IN_PAGE(iova);
 }
 
+static struct tegra_smmu *tegra_smmu_find(struct device_node *np)
+{
+	struct platform_device *pdev;
+	struct tegra_mc *mc;
+
+	pdev = of_find_device_by_node(np);
+	if (!pdev)
+		return NULL;
+
+	mc = platform_get_drvdata(pdev);
+	if (!mc)
+		return NULL;
+
+	return mc->smmu;
+}
+
+static int tegra_smmu_configure(struct tegra_smmu *smmu, struct device *dev,
+				struct of_phandle_args *args)
+{
+	const struct iommu_ops *ops = smmu->iommu.ops;
+	int err;
+
+	err = iommu_fwspec_init(dev, &dev->of_node->fwnode, ops);
+	if (err < 0) {
+		dev_err(dev, "failed to initialize fwspec: %d\n", err);
+		return err;
+	}
+
+	err = ops->of_xlate(dev, args);
+	if (err < 0) {
+		dev_err(dev, "failed to parse SW group ID: %d\n", err);
+		iommu_fwspec_free(dev);
+		return err;
+	}
+
+	return 0;
+}
+
 static struct iommu_device *tegra_smmu_probe_device(struct device *dev)
 {
-	struct tegra_smmu *smmu = dev_iommu_priv_get(dev);
+	struct device_node *np = dev->of_node;
+	struct tegra_smmu *smmu = NULL;
+	struct of_phandle_args args;
+	unsigned int index = 0;
+	int err;
+
+	while (of_parse_phandle_with_args(np, "iommus", "#iommu-cells", index,
+	  &args) == 0) {
+		smmu = tegra_smmu_find(args.np);
+		if (smmu) {
+			err = tegra_smmu_configure(smmu, dev, &args);
+			of_node_put(args.np);
 
+			if (err < 0)
+return ERR_PTR(err);
+
+			break;
+		}
+
+		of_node_put(args.np);
+		index++;
+	}
+
+	smmu = dev_iommu_priv_get(dev);
 	if (!smmu)
 		return ERR_PTR(-ENODEV);
 
@@ -1028,6 +1088,16 @@ struct tegra_smmu *tegra_smmu_probe(struct device *dev,
 	if (!smmu)
 		return ERR_PTR(-ENOMEM);
 
+	/*
+	 * This is a bit of a hack. Ideally we'd want to simply return this
+	 * value. However the IOMMU registration process will attempt to add
+	 * all devices to the IOMMU when bus_set_iommu() is called. In order
+	 * not to rely on global variables to track the IOMMU 

Re: [PATCH 5/6] MIPS: remove CONFIG_DMA_MAYBE_COHERENT

2021-02-18 Thread Huacai Chen
Reviewed-by: Huacai Chen 

On Wed, Feb 10, 2021 at 6:04 PM Christoph Hellwig  wrote:
>
> CONFIG_DMA_MAYBE_COHERENT just guards two early init options now.  Just
> enable them unconditionally for CONFIG_DMA_NONCOHERENT.
>
> Signed-off-by: Christoph Hellwig 
> ---
>  arch/mips/Kconfig| 8 ++--
>  arch/mips/kernel/setup.c | 2 +-
>  2 files changed, 3 insertions(+), 7 deletions(-)
>
> diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig
> index 0e86162df65541..1f1603a08a6d2d 100644
> --- a/arch/mips/Kconfig
> +++ b/arch/mips/Kconfig
> @@ -181,7 +181,7 @@ config MIPS_ALCHEMY
> select CEVT_R4K
> select CSRC_R4K
> select IRQ_MIPS_CPU
> -   select DMA_MAYBE_COHERENT   # Au1000,1500,1100 aren't, rest is
> +   select DMA_NONCOHERENT  # Au1000,1500,1100 aren't, rest is
> select MIPS_FIXUP_BIGPHYS_ADDR if PCI
> select SYS_HAS_CPU_MIPS32_R1
> select SYS_SUPPORTS_32BIT_KERNEL
> @@ -546,7 +546,7 @@ config MIPS_MALTA
> select CLKSRC_MIPS_GIC
> select COMMON_CLK
> select CSRC_R4K
> -   select DMA_MAYBE_COHERENT
> +   select DMA_NONCOHERENT
> select GENERIC_ISA_DMA
> select HAVE_PCSPKR_PLATFORM
> select HAVE_PCI
> @@ -1127,10 +1127,6 @@ config FW_CFE
>  config ARCH_SUPPORTS_UPROBES
> bool
>
> -config DMA_MAYBE_COHERENT
> -   select DMA_NONCOHERENT
> -   bool
> -
>  config DMA_PERDEV_COHERENT
> bool
> select ARCH_HAS_SETUP_DMA_OPS
> diff --git a/arch/mips/kernel/setup.c b/arch/mips/kernel/setup.c
> index d6b2ba527f5b81..b25d07675b1ee1 100644
> --- a/arch/mips/kernel/setup.c
> +++ b/arch/mips/kernel/setup.c
> @@ -805,7 +805,7 @@ static int __init debugfs_mips(void)
>  arch_initcall(debugfs_mips);
>  #endif
>
> -#ifdef CONFIG_DMA_MAYBE_COHERENT
> +#ifdef CONFIG_DMA_NONCOHERENT
>  static int __init setcoherentio(char *str)
>  {
> dma_default_coherent = true;
> --
> 2.29.2
>
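For context, the two early init options guarded here are the
"coherentio"/"nocoherentio" early parameters in arch/mips/kernel/setup.c;
a sketch (the second hook is assumed, as it isn't in the quoted hunk):

	early_param("coherentio", setcoherentio);	/* force coherent DMA */
	early_param("nocoherentio", setnocoherentio);	/* force noncoherent DMA */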


Re: [PATCH v13 02/15] iommu: Introduce bind/unbind_guest_msi

2021-02-18 Thread Keqian Zhu
Hi Eric,

On 2021/2/12 16:55, Auger Eric wrote:
> Hi Keqian,
> 
> On 2/1/21 12:52 PM, Keqian Zhu wrote:
>> Hi Eric,
>>
>> On 2020/11/18 19:21, Eric Auger wrote:
>>> On ARM, MSI are translated by the SMMU. An IOVA is allocated
>>> for each MSI doorbell. If both the host and the guest are exposed
>>> with SMMUs, we end up with 2 different IOVAs allocated by each.
>>> guest allocates an IOVA (gIOVA) to map onto the guest MSI
>>> doorbell (gDB). The Host allocates another IOVA (hIOVA) to map
>>> onto the physical doorbell (hDB).
>>>
>>> So we end up with 2 untied mappings:
>>>          S1             S2
>>> gIOVA    ->    gDB
>>>               hIOVA    ->    hDB
>>>
>>> Currently the PCI device is programmed by the host with hIOVA
>>> as MSI doorbell. So this does not work.
>>>
>>> This patch introduces an API to pass gIOVA/gDB to the host so
>>> that gIOVA can be reused by the host instead of re-allocating
>>> a new IOVA. So the goal is to create the following nested mapping:
>> Can the gDB be reused under non-nested mode?
> 
> Under non-nested mode, the hIOVA is allocated within the MSI reserved
> region exposed by the SMMU driver, [0x8000000, 0x80fffff]; see
> iommu_dma_prepare_msi/iommu_dma_get_msi_page in dma-iommu.c. This hIOVA
> is programmed in the physical device so that the physical SMMU
> translates it into the physical doorbell (hDB = host physical ITS
So, AFAIU, under non-nested mode, on the SMMU side we reuse the workflow
of the non-virtualization scenario.

> doorbell). The gDB is not used at pIOMMU programming level. It is only
> used when setting up the KVM irq route.
> 
> Hope this answers your question.
Thanks for your explanation!
> 

Thanks,
Keqian

>>
>>>
>>>          S1             S2
>>> gIOVA    ->    gDB    ->    hDB
>>>
>>> and program the PCI device with gIOVA MSI doorbell.
>>>
>>> In case we have several devices attached to this nested domain
>>> (devices belonging to the same group), they cannot be isolated
>>> on guest side either. So they should also end up in the same domain
>>> on guest side. We will enforce that all the devices attached to
>>> the host iommu domain use the same physical doorbell and similarly
>>> a single virtual doorbell mapping gets registered (1 single
>>> virtual doorbell is used on guest as well).
>>>
>> [...]
>>
>>> + *
>>> + * The associated IOVA can be reused by the host to create a nested
>>> + * stage2 binding mapping translating into the physical doorbell used
>>> + * by the devices attached to the domain.
>>> + *
>>> + * All devices within the domain must share the same physical doorbell.
>>> + * A single MSI GIOVA/GPA mapping can be attached to an iommu_domain.
>>> + */
>>> +
>>> +int iommu_bind_guest_msi(struct iommu_domain *domain,
>>> +dma_addr_t giova, phys_addr_t gpa, size_t size)
>>> +{
>>> +   if (unlikely(!domain->ops->bind_guest_msi))
>>> +   return -ENODEV;
>>> +
>>> +   return domain->ops->bind_guest_msi(domain, giova, gpa, size);
>>> +}
>>> +EXPORT_SYMBOL_GPL(iommu_bind_guest_msi);
>>> +
>>> +void iommu_unbind_guest_msi(struct iommu_domain *domain,
>>> +   dma_addr_t iova)
>> nit: s/iova/giova
> sure
>>
>>> +{
>>> +   if (unlikely(!domain->ops->unbind_guest_msi))
>>> +   return;
>>> +
>>> +   domain->ops->unbind_guest_msi(domain, iova);
>>> +}
>>> +EXPORT_SYMBOL_GPL(iommu_unbind_guest_msi);
>>> +
>> [...]
>>
>> Thanks,
>> Keqian
>>
> 
> Thanks
> 
> Eric
> 
> .
> 