Re: [PATCH v2 5/7] driver core: Add device location to "struct device" and expose it in sysfs

2020-07-01 Thread Oliver O'Halloran
On Thu, Jul 2, 2020 at 4:07 AM Rajat Jain  wrote:
>
> *snip*
>
> > > I guess it would make sense to have an attribute for user space to
> > > write to in order to make the kernel reject device plug-in events
> > > coming from a given port or connector, but the kernel has no reliable
> > > means to determine *which* ports or connectors are "safe", and even if
> > > there was a way for it to do that, it still may not agree with user
> > > space on which ports or connectors should be regarded as "safe".
> >
> > Again, we have been doing this for USB devices for a very long time, PCI
> > shouldn't be any different.  Why people keep ignoring working solutions
> > is beyond me, there's nothing "special" about PCI devices here for this
> > type of "worry" or reasoning to try to create new solutions.
> >
> > So, again, I ask, go do what USB does, and to do that, take the logic
> > out of the USB core, make it bus-agnositic, and _THEN_ add it to the PCI
> > code. Why the original submitter keeps ignoring my request to do this
> > is beyond me, I guess they like making patches that will get rejected :(
>
> IMHO I'm actually trying to precisely do what I think was the
> conclusion of our discussion, and then some changes because of the
> further feedback I received on those patches. Let's take a step back
> and please allow me to explain how I got here (my apologies but this
> spans a couple of threads, and I"m trying to tie them all together
> here):

The previous thread had some suggestions, but no real conclusions.
That's probably why we're still arguing about it...

> GOAL: To allow user space to control what (PCI) drivers he wants to
> allow on external (thunderbolt) ports. There was a lot of debate about
> the need for such a policy at
> https://lore.kernel.org/linux-pci/CACK8Z6GR7-wseug=ttvyrarvzx_ao2geoldnbwjtb+5y7vw...@mail.gmail.com/
> with the final conclusion that it should be OK to implement such a
> policy in userspace, as long as the policy is not implemented in the
> kernel. The kernel only needs to expose bits & info that is needed by
> the userspace to implement such a policy, and it can be used in
> conjunction with "drivers_autoprobe" to implement this policy:
> 
> 
> That's an odd thing, but sure, if you want to write up such a policy for
> your systems, great.  But that policy does not belong in the kernel, it
> belongs in userspace.
> 
> 
> 1) The post 
> https://lore.kernel.org/linux-pci/20200609210400.GA1461839@bjorn-Precision-5520/
> lists out the approach that was agreed on. Replicating it here:
> ---
>   - Expose the PCI pdev->untrusted bit in sysfs.  We don't expose this
> today, but doing so would be trivial.  I think I would prefer a
> sysfs name like "external" so it's more descriptive and less of a
> judgment.
>
> This comes from either the DT "external-facing" property or the
> ACPI "ExternalFacingPort" property.
>
>   - All devices present at boot are enumerated.  Any statically built
> drivers will bind to them before any userspace code runs.
>
> If you want to keep statically built drivers from binding, you'd
> need to invent some mechanism so pci_driver_init() could clear
> drivers_autoprobe after registering pci_bus_type.
>
>   - Early userspace code prevents modular drivers from automatically
> binding to PCI devices:
>
>   echo 0 > /sys/bus/pci/drivers_autoprobe
>
> This prevents modular drivers from binding to all devices, whether
> present at boot or hot-added.
>
>   - Userspace code uses the sysfs "bind" file to control which drivers
> are loaded and can bind to each device, e.g.,
>
>   echo :02:00.0 > /sys/bus/pci/drivers/nvme/bind

I think this is a reasonable suggestion. However, as Greg pointed out
it's gratuitously different to what USB does for no real reason.

> ---
> 2) As part of implementing the above agreed approach, when I exposed
> PCI "untrusted" attribute to userspace, it ran into discussion that
> concluded that instead of this, the device core should be enhanced
> with a location attribute.
> https://lore.kernel.org/linux-pci/20200618184621.ga446...@kroah.com/
> ---
> ...
> The attribute should be called something like "location" or something
> like that (naming is hard), as you don't always know if something is
> external or not (it could be internal, it could be unknown, it could be
> internal to an external device that you trust (think PCI drawers for
> "super" computers that are hot pluggable but yet really part of the
> internal bus).
> 
> "trust" has no direct relation to the location, except in a policy of
> what you wish to do with that 

Re: [PATCH 1/2] iommu: Add iommu_group_get/set_domain()

2020-07-01 Thread Lu Baolu

Hi Robin,

On 7/1/20 8:18 PM, Robin Murphy wrote:

On 2020-07-01 08:32, Lu Baolu wrote:

Hi Robin,

On 2020/7/1 0:51, Robin Murphy wrote:

On 2020-06-30 02:03, Lu Baolu wrote:

Hi Robin,

On 6/29/20 7:56 PM, Robin Murphy wrote:

On 2020-06-27 04:15, Lu Baolu wrote:

The hardware assistant vfio mediated device is a use case of iommu
aux-domain. The interactions between vfio/mdev and iommu during mdev
creation and passthr are:

- Create a group for mdev with iommu_group_alloc();
- Add the device to the group with
 group = iommu_group_alloc();
 if (IS_ERR(group))
 return PTR_ERR(group);

 ret = iommu_group_add_device(group, >dev);
 if (!ret)
 dev_info(>dev, "MDEV: group_id = %d\n",
  iommu_group_id(group));
- Allocate an aux-domain
iommu_domain_alloc()
- Attach the aux-domain to the physical device from which the mdev is
   created.
iommu_aux_attach_device()

In the whole process, an iommu group was allocated for the mdev 
and an

iommu domain was attached to the group, but the group->domain leaves
NULL. As the result, iommu_get_domain_for_dev() doesn't work anymore.

This adds iommu_group_get/set_domain() so that group->domain could be
managed whenever a domain is attached or detached through the 
aux-domain

api's.


Letting external callers poke around directly in the internals of 
iommu_group doesn't look right to me.


Unfortunately, it seems that the vifo iommu abstraction is deeply bound
to the IOMMU subsystem. We can easily find other examples:

iommu_group_get/set_iommudata()
iommu_group_get/set_name()
...


Sure, but those are ways for users of a group to attach useful 
information of their own to it, that doesn't matter to the IOMMU 
subsystem itself. The interface you've proposed gives callers rich 
new opportunities to fundamentally break correct operation of the API:


 dom = iommu_domain_alloc();
 iommu_attach_group(dom, grp);
 ...
 iommu_group_set_domain(grp, NULL);
 // oops, leaked and can't ever detach properly now

or perhaps:

 grp = iommu_group_alloc();
 iommu_group_add_device(grp, dev);
 iommu_group_set_domain(grp, dom);
 ...
 iommu_detach_group(dom, grp);
 // oops, IOMMU driver might not handle this

If a regular device is attached to one or more aux domains for 
PASID use, iommu_get_domain_for_dev() is still going to return the 
primary domain, so why should it be expected to behave differently 
for mediated


Unlike the normal device attach, we will encounter two devices when it
comes to aux-domain.

- Parent physical device - this might be, for example, a PCIe device
with PASID feature support, hence it is able to tag an unique PASID
for DMA transfers originated from its subset. The device driver hence
is able to wrapper this subset into an isolated:

- Mediated device - a fake device created by the device driver 
mentioned

above.

Yes. All you mentioned are right for the parent device. But for 
mediated

device, iommu_get_domain_for_dev() doesn't work even it has an valid
iommu_group and iommu_domain.

iommu_get_domain_for_dev() is a necessary interface for device drivers
which want to support aux-domain. For example,


Only if they want to follow this very specific notion of using 
made-up devices and groups to represent aux attachments. Even if a 
driver managing its own aux domains entirely privately does create 
child devices for them, it's not like it can't keep its domain 
pointers in drvdata if it wants to ;)


Let's not conflate the current implementation of vfio_mdev with the 
general concepts involved here.



   struct iommu_domain *domain;
   struct device *dev = mdev_dev(mdev);
   unsigned long pasid;

   domain = iommu_get_domain_for_dev(dev);
   if (!domain)
   return -ENODEV;

   pasid = iommu_aux_get_pasid(domain, dev->parent);
   if (pasid == IOASID_INVALID)
   return -EINVAL;

   /* Program the device context with the PASID value */
   

Without this fix, iommu_get_domain_for_dev() always returns NULL and 
the

device driver has no means to support aux-domain.


So either the IOMMU API itself is missing the ability to do the right 
thing internally, or the mdev layer isn't using it appropriately. 
Either way, simply punching holes in the API for mdev to hack around 
its own mess doesn't seem like the best thing to do.


The initial impression I got was that it's implicitly assumed here 
that the mdev itself is attached to exactly one aux domain and 
nothing else, at which point I would wonder why it's using aux at 
all, but are you saying that in fact no attach happens with the mdev 
group either way, only to the parent device?


I'll admit I'm not hugely familiar with any of this, but it seems to 
me that the logical flow should be:


 - allocate domain
 - attach as aux to parent
 - retrieve aux domain PASID
 - create mdev 

Re: [PATCH 1/2] iommu: Add iommu_group_get/set_domain()

2020-07-01 Thread Lu Baolu

Hello,

On 7/1/20 8:18 PM, Robin Murphy wrote:

On 2020-07-01 08:32, Lu Baolu wrote:

Hi Robin,

On 2020/7/1 0:51, Robin Murphy wrote:

On 2020-06-30 02:03, Lu Baolu wrote:

Hi Robin,

On 6/29/20 7:56 PM, Robin Murphy wrote:

On 2020-06-27 04:15, Lu Baolu wrote:

The hardware assistant vfio mediated device is a use case of iommu
aux-domain. The interactions between vfio/mdev and iommu during mdev
creation and passthr are:

- Create a group for mdev with iommu_group_alloc();
- Add the device to the group with
 group = iommu_group_alloc();
 if (IS_ERR(group))
 return PTR_ERR(group);

 ret = iommu_group_add_device(group, >dev);
 if (!ret)
 dev_info(>dev, "MDEV: group_id = %d\n",
  iommu_group_id(group));
- Allocate an aux-domain
iommu_domain_alloc()
- Attach the aux-domain to the physical device from which the mdev is
   created.
iommu_aux_attach_device()

In the whole process, an iommu group was allocated for the mdev 
and an

iommu domain was attached to the group, but the group->domain leaves
NULL. As the result, iommu_get_domain_for_dev() doesn't work anymore.

This adds iommu_group_get/set_domain() so that group->domain could be
managed whenever a domain is attached or detached through the 
aux-domain

api's.


Letting external callers poke around directly in the internals of 
iommu_group doesn't look right to me.


Unfortunately, it seems that the vifo iommu abstraction is deeply bound
to the IOMMU subsystem. We can easily find other examples:

iommu_group_get/set_iommudata()
iommu_group_get/set_name()
...


Sure, but those are ways for users of a group to attach useful 
information of their own to it, that doesn't matter to the IOMMU 
subsystem itself. The interface you've proposed gives callers rich 
new opportunities to fundamentally break correct operation of the API:


 dom = iommu_domain_alloc();
 iommu_attach_group(dom, grp);
 ...
 iommu_group_set_domain(grp, NULL);
 // oops, leaked and can't ever detach properly now

or perhaps:

 grp = iommu_group_alloc();
 iommu_group_add_device(grp, dev);
 iommu_group_set_domain(grp, dom);
 ...
 iommu_detach_group(dom, grp);
 // oops, IOMMU driver might not handle this

If a regular device is attached to one or more aux domains for 
PASID use, iommu_get_domain_for_dev() is still going to return the 
primary domain, so why should it be expected to behave differently 
for mediated


Unlike the normal device attach, we will encounter two devices when it
comes to aux-domain.

- Parent physical device - this might be, for example, a PCIe device
with PASID feature support, hence it is able to tag an unique PASID
for DMA transfers originated from its subset. The device driver hence
is able to wrapper this subset into an isolated:

- Mediated device - a fake device created by the device driver 
mentioned

above.

Yes. All you mentioned are right for the parent device. But for 
mediated

device, iommu_get_domain_for_dev() doesn't work even it has an valid
iommu_group and iommu_domain.

iommu_get_domain_for_dev() is a necessary interface for device drivers
which want to support aux-domain. For example,


Only if they want to follow this very specific notion of using 
made-up devices and groups to represent aux attachments. Even if a 
driver managing its own aux domains entirely privately does create 
child devices for them, it's not like it can't keep its domain 
pointers in drvdata if it wants to ;)


Let's not conflate the current implementation of vfio_mdev with the 
general concepts involved here.



   struct iommu_domain *domain;
   struct device *dev = mdev_dev(mdev);
   unsigned long pasid;

   domain = iommu_get_domain_for_dev(dev);
   if (!domain)
   return -ENODEV;

   pasid = iommu_aux_get_pasid(domain, dev->parent);
   if (pasid == IOASID_INVALID)
   return -EINVAL;

   /* Program the device context with the PASID value */
   

Without this fix, iommu_get_domain_for_dev() always returns NULL and 
the

device driver has no means to support aux-domain.


So either the IOMMU API itself is missing the ability to do the right 
thing internally, or the mdev layer isn't using it appropriately. 
Either way, simply punching holes in the API for mdev to hack around 
its own mess doesn't seem like the best thing to do.


The initial impression I got was that it's implicitly assumed here 
that the mdev itself is attached to exactly one aux domain and 
nothing else, at which point I would wonder why it's using aux at 
all, but are you saying that in fact no attach happens with the mdev 
group either way, only to the parent device?


I'll admit I'm not hugely familiar with any of this, but it seems to 
me that the logical flow should be:


 - allocate domain
 - attach as aux to parent
 - retrieve aux domain PASID
 - create mdev 

[PATCH AUTOSEL 5.4 25/40] iommu/vt-d: Don't apply gfx quirks to untrusted devices

2020-07-01 Thread Sasha Levin
From: Rajat Jain 

[ Upstream commit 67e8a5b18d41af9298db5c17193f671f235cce01 ]

Currently, an external malicious PCI device can masquerade the VID:PID
of faulty gfx devices, and thus apply iommu quirks to effectively
disable the IOMMU restrictions for itself.

Thus we need to ensure that the device we are applying quirks to, is
indeed an internal trusted device.

Signed-off-by: Rajat Jain 
Reviewed-by: Ashok Raj 
Reviewed-by: Mika Westerberg 
Acked-by: Lu Baolu 
Link: https://lore.kernel.org/r/20200622231345.29722-4-baolu...@linux.intel.com
Signed-off-by: Joerg Roedel 
Signed-off-by: Sasha Levin 
---
 drivers/iommu/intel-iommu.c | 37 +
 1 file changed, 37 insertions(+)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 773ac2b0d6068..4e91036cf5636 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -5955,6 +5955,23 @@ static bool intel_iommu_is_attach_deferred(struct 
iommu_domain *domain,
return dev->archdata.iommu == DEFER_DEVICE_DOMAIN_INFO;
 }
 
+/*
+ * Check that the device does not live on an external facing PCI port that is
+ * marked as untrusted. Such devices should not be able to apply quirks and
+ * thus not be able to bypass the IOMMU restrictions.
+ */
+static bool risky_device(struct pci_dev *pdev)
+{
+   if (pdev->untrusted) {
+   pci_info(pdev,
+"Skipping IOMMU quirk for dev [%04X:%04X] on untrusted 
PCI link\n",
+pdev->vendor, pdev->device);
+   pci_info(pdev, "Please check with your BIOS/Platform vendor 
about this\n");
+   return true;
+   }
+   return false;
+}
+
 const struct iommu_ops intel_iommu_ops = {
.capable= intel_iommu_capable,
.domain_alloc   = intel_iommu_domain_alloc,
@@ -5983,6 +6000,9 @@ const struct iommu_ops intel_iommu_ops = {
 
 static void quirk_iommu_igfx(struct pci_dev *dev)
 {
+   if (risky_device(dev))
+   return;
+
pci_info(dev, "Disabling IOMMU for graphics on this chipset\n");
dmar_map_gfx = 0;
 }
@@ -6024,6 +6044,9 @@ DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x163D, 
quirk_iommu_igfx);
 
 static void quirk_iommu_rwbf(struct pci_dev *dev)
 {
+   if (risky_device(dev))
+   return;
+
/*
 * Mobile 4 Series Chipset neglects to set RWBF capability,
 * but needs it. Same seems to hold for the desktop versions.
@@ -6054,6 +6077,9 @@ static void quirk_calpella_no_shadow_gtt(struct pci_dev 
*dev)
 {
unsigned short ggc;
 
+   if (risky_device(dev))
+   return;
+
if (pci_read_config_word(dev, GGC, ))
return;
 
@@ -6087,6 +6113,12 @@ static void __init check_tylersburg_isoch(void)
pdev = pci_get_device(PCI_VENDOR_ID_INTEL, 0x3a3e, NULL);
if (!pdev)
return;
+
+   if (risky_device(pdev)) {
+   pci_dev_put(pdev);
+   return;
+   }
+
pci_dev_put(pdev);
 
/* System Management Registers. Might be hidden, in which case
@@ -6096,6 +6128,11 @@ static void __init check_tylersburg_isoch(void)
if (!pdev)
return;
 
+   if (risky_device(pdev)) {
+   pci_dev_put(pdev);
+   return;
+   }
+
if (pci_read_config_dword(pdev, 0x188, )) {
pci_dev_put(pdev);
return;
-- 
2.25.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH AUTOSEL 5.7 32/53] iommu/vt-d: Don't apply gfx quirks to untrusted devices

2020-07-01 Thread Sasha Levin
From: Rajat Jain 

[ Upstream commit 67e8a5b18d41af9298db5c17193f671f235cce01 ]

Currently, an external malicious PCI device can masquerade the VID:PID
of faulty gfx devices, and thus apply iommu quirks to effectively
disable the IOMMU restrictions for itself.

Thus we need to ensure that the device we are applying quirks to, is
indeed an internal trusted device.

Signed-off-by: Rajat Jain 
Reviewed-by: Ashok Raj 
Reviewed-by: Mika Westerberg 
Acked-by: Lu Baolu 
Link: https://lore.kernel.org/r/20200622231345.29722-4-baolu...@linux.intel.com
Signed-off-by: Joerg Roedel 
Signed-off-by: Sasha Levin 
---
 drivers/iommu/intel-iommu.c | 37 +
 1 file changed, 37 insertions(+)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index fde7aba49b746..a29ed14839c41 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -6200,6 +6200,23 @@ intel_iommu_domain_set_attr(struct iommu_domain *domain,
return ret;
 }
 
+/*
+ * Check that the device does not live on an external facing PCI port that is
+ * marked as untrusted. Such devices should not be able to apply quirks and
+ * thus not be able to bypass the IOMMU restrictions.
+ */
+static bool risky_device(struct pci_dev *pdev)
+{
+   if (pdev->untrusted) {
+   pci_info(pdev,
+"Skipping IOMMU quirk for dev [%04X:%04X] on untrusted 
PCI link\n",
+pdev->vendor, pdev->device);
+   pci_info(pdev, "Please check with your BIOS/Platform vendor 
about this\n");
+   return true;
+   }
+   return false;
+}
+
 const struct iommu_ops intel_iommu_ops = {
.capable= intel_iommu_capable,
.domain_alloc   = intel_iommu_domain_alloc,
@@ -6229,6 +6246,9 @@ const struct iommu_ops intel_iommu_ops = {
 
 static void quirk_iommu_igfx(struct pci_dev *dev)
 {
+   if (risky_device(dev))
+   return;
+
pci_info(dev, "Disabling IOMMU for graphics on this chipset\n");
dmar_map_gfx = 0;
 }
@@ -6270,6 +6290,9 @@ DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x163D, 
quirk_iommu_igfx);
 
 static void quirk_iommu_rwbf(struct pci_dev *dev)
 {
+   if (risky_device(dev))
+   return;
+
/*
 * Mobile 4 Series Chipset neglects to set RWBF capability,
 * but needs it. Same seems to hold for the desktop versions.
@@ -6300,6 +6323,9 @@ static void quirk_calpella_no_shadow_gtt(struct pci_dev 
*dev)
 {
unsigned short ggc;
 
+   if (risky_device(dev))
+   return;
+
if (pci_read_config_word(dev, GGC, ))
return;
 
@@ -6333,6 +6359,12 @@ static void __init check_tylersburg_isoch(void)
pdev = pci_get_device(PCI_VENDOR_ID_INTEL, 0x3a3e, NULL);
if (!pdev)
return;
+
+   if (risky_device(pdev)) {
+   pci_dev_put(pdev);
+   return;
+   }
+
pci_dev_put(pdev);
 
/* System Management Registers. Might be hidden, in which case
@@ -6342,6 +6374,11 @@ static void __init check_tylersburg_isoch(void)
if (!pdev)
return;
 
+   if (risky_device(pdev)) {
+   pci_dev_put(pdev);
+   return;
+   }
+
if (pci_read_config_dword(pdev, 0x188, )) {
pci_dev_put(pdev);
return;
-- 
2.25.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v3 6/7] iommu/vt-d: Warn on out-of-range invalidation address

2020-07-01 Thread Lu Baolu

On 7/1/20 11:33 PM, Jacob Pan wrote:

For guest requested IOTLB invalidation, address and mask are provided as
part of the invalidation data. VT-d HW silently ignores any address bits
below the mask. SW shall also allow such case but give warning if
address does not align with the mask. This patch relax the fault
handling from error to warning and proceed with invalidation request
with the given mask.

Signed-off-by: Jacob Pan 


Fixes: 6ee1b77ba3ac0 ("iommu/vt-d: Add svm/sva invalidate function")
Acked-by: Lu Baolu 

Best regards,
baolu


---
  drivers/iommu/intel/iommu.c | 7 +++
  1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 6a0c62c7395c..2e1b53ade784 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -5439,13 +5439,12 @@ intel_iommu_sva_invalidate(struct iommu_domain *domain, 
struct device *dev,
  
  		switch (BIT(cache_type)) {

case IOMMU_CACHE_INV_TYPE_IOTLB:
+   /* HW will ignore LSB bits based on address mask */
if (inv_info->granularity == IOMMU_INV_GRANU_ADDR &&
size &&
(inv_info->addr_info.addr & ((BIT(VTD_PAGE_SHIFT + 
size)) - 1))) {
-   pr_err_ratelimited("Address out of range, 0x%llx, 
size order %llu\n",
-  inv_info->addr_info.addr, 
size);
-   ret = -ERANGE;
-   goto out_unlock;
+   pr_err_ratelimited("User address not aligned, 
0x%llx, size order %llu\n",
+ inv_info->addr_info.addr, size);
}
  
  			/*



___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v3 3/7] iommu/vt-d: Fix PASID devTLB invalidation

2020-07-01 Thread Lu Baolu

On 7/1/20 11:33 PM, Jacob Pan wrote:

DevTLB flush can be used for both DMA request with and without PASIDs.
The former uses PASID#0 (RID2PASID), latter uses non-zero PASID for SVA
usage.

This patch adds a check for PASID value such that devTLB flush with
PASID is used for SVA case. This is more efficient in that multiple
PASIDs can be used by a single device, when tearing down a PASID entry
we shall flush only the devTLB specific to a PASID.

Fixes: 6f7db75e1c46 ("iommu/vt-d: Add second level page table")
Signed-off-by: Jacob Pan 


Acked-by: Lu Baolu 

Best regards,
baolu


---
  drivers/iommu/intel/pasid.c | 11 ++-
  1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/intel/pasid.c b/drivers/iommu/intel/pasid.c
index c81f0f17c6ba..fa0154cce537 100644
--- a/drivers/iommu/intel/pasid.c
+++ b/drivers/iommu/intel/pasid.c
@@ -486,7 +486,16 @@ devtlb_invalidation_with_pasid(struct intel_iommu *iommu,
qdep = info->ats_qdep;
pfsid = info->pfsid;
  
-	qi_flush_dev_iotlb(iommu, sid, pfsid, qdep, 0, 64 - VTD_PAGE_SHIFT);

+   /*
+* When PASID 0 is used, it indicates RID2PASID(DMA request w/o PASID),
+* devTLB flush w/o PASID should be used. For non-zero PASID under
+* SVA usage, device could do DMA with multiple PASIDs. It is more
+* efficient to flush devTLB specific to the PASID.
+*/
+   if (pasid == PASID_RID2PASID)
+   qi_flush_dev_iotlb(iommu, sid, pfsid, qdep, 0, 64 - 
VTD_PAGE_SHIFT);
+   else
+   qi_flush_dev_iotlb_pasid(iommu, sid, pfsid, pasid, qdep, 0, 64 
- VTD_PAGE_SHIFT);
  }
  
  void intel_pasid_tear_down_entry(struct intel_iommu *iommu, struct device *dev,



___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH] xen: introduce xen_vring_use_dma

2020-07-01 Thread Michael S. Tsirkin
On Wed, Jul 01, 2020 at 10:34:53AM -0700, Stefano Stabellini wrote:
> Would you be in favor of a more flexible check along the lines of the
> one proposed in the patch that started this thread:
> 
> if (xen_vring_use_dma())
> return true;
> 
> 
> xen_vring_use_dma would be implemented so that it returns true when
> xen_swiotlb is required and false otherwise.

Just to stress - with a patch like this virtio can *still* use DMA API
if PLATFORM_ACCESS is set. So if DMA API is broken on some platforms
as you seem to be saying, you guys should fix it before doing something
like this..

-- 
MST

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v6 08/12] device core: Introduce DMA range map, supplanting dma_pfn_offset

2020-07-01 Thread Jim Quinlan via iommu
The new field 'dma_range_map' in struct device is used to facilitate the
use of single or multiple offsets between mapping regions of cpu addrs and
dma addrs.  It subsumes the role of "dev->dma_pfn_offset" which was only
capable of holding a single uniform offset and had no region bounds
checking.

The function of_dma_get_range() has been modified so that it takes a single
argument -- the device node -- and returns a map, NULL, or an error code.
The map is an array that holds the information regarding the DMA regions.
Each range entry contains the address offset, the cpu_start address, the
dma_start address, and the size of the region.

of_dma_configure() is the typical manner to set range offsets but there are
a number of ad hoc assignments to "dev->dma_pfn_offset" in the kernel
driver code.  These cases now invoke the function
dma_attach_offset_range(dev, cpu_addr, dma_addr, size).

Signed-off-by: Jim Quinlan 
---
 arch/arm/include/asm/dma-mapping.h|  9 +-
 arch/arm/mach-keystone/keystone.c | 17 ++--
 arch/sh/drivers/pci/pcie-sh7786.c |  9 +-
 arch/sh/kernel/dma-coherent.c | 14 +--
 arch/x86/pci/sta2x11-fixup.c  |  7 +-
 drivers/acpi/arm64/iort.c |  5 +-
 drivers/gpu/drm/sun4i/sun4i_backend.c |  7 +-
 drivers/iommu/io-pgtable-arm.c|  2 +-
 .../platform/sunxi/sun4i-csi/sun4i_csi.c  |  6 +-
 .../platform/sunxi/sun6i-csi/sun6i_csi.c  |  5 +-
 drivers/of/address.c  | 95 ++-
 drivers/of/device.c   | 50 ++
 drivers/of/of_private.h   |  9 +-
 drivers/of/unittest.c | 35 +--
 drivers/remoteproc/remoteproc_core.c  |  2 +-
 .../staging/media/sunxi/cedrus/cedrus_hw.c|  8 +-
 drivers/usb/core/message.c|  4 +-
 drivers/usb/core/usb.c|  2 +-
 include/linux/device.h|  4 +-
 include/linux/dma-direct.h| 10 +-
 include/linux/dma-mapping.h   | 37 
 kernel/dma/coherent.c | 11 ++-
 kernel/dma/mapping.c  | 53 +++
 23 files changed, 279 insertions(+), 122 deletions(-)

diff --git a/arch/arm/include/asm/dma-mapping.h 
b/arch/arm/include/asm/dma-mapping.h
index bdd80ddbca34..b6796383f205 100644
--- a/arch/arm/include/asm/dma-mapping.h
+++ b/arch/arm/include/asm/dma-mapping.h
@@ -35,8 +35,9 @@ static inline const struct dma_map_ops 
*get_arch_dma_ops(struct bus_type *bus)
 #ifndef __arch_pfn_to_dma
 static inline dma_addr_t pfn_to_dma(struct device *dev, unsigned long pfn)
 {
-   if (dev)
-   pfn -= dev->dma_pfn_offset;
+   if (dev && dev->dma_range_map)
+   pfn -= (unsigned long)PFN_DOWN(dma_offset_from_phys_addr(dev, 
PFN_PHYS(pfn)));
+
return (dma_addr_t)__pfn_to_bus(pfn);
 }
 
@@ -44,8 +45,8 @@ static inline unsigned long dma_to_pfn(struct device *dev, 
dma_addr_t addr)
 {
unsigned long pfn = __bus_to_pfn(addr);
 
-   if (dev)
-   pfn += dev->dma_pfn_offset;
+   if (dev && dev->dma_range_map)
+   pfn += (unsigned long)PFN_DOWN(dma_offset_from_dma_addr(dev, 
addr));
 
return pfn;
 }
diff --git a/arch/arm/mach-keystone/keystone.c 
b/arch/arm/mach-keystone/keystone.c
index 638808c4e122..a499ece6f054 100644
--- a/arch/arm/mach-keystone/keystone.c
+++ b/arch/arm/mach-keystone/keystone.c
@@ -8,6 +8,7 @@
  */
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -24,8 +25,6 @@
 
 #include "keystone.h"
 
-static unsigned long keystone_dma_pfn_offset __read_mostly;
-
 static int keystone_platform_notifier(struct notifier_block *nb,
  unsigned long event, void *data)
 {
@@ -38,9 +37,12 @@ static int keystone_platform_notifier(struct notifier_block 
*nb,
return NOTIFY_BAD;
 
if (!dev->of_node) {
-   dev->dma_pfn_offset = keystone_dma_pfn_offset;
-   dev_err(dev, "set dma_pfn_offset%08lx\n",
-   dev->dma_pfn_offset);
+   int ret = dma_attach_offset_range(dev, KEYSTONE_HIGH_PHYS_START,
+ KEYSTONE_LOW_PHYS_START,
+ KEYSTONE_HIGH_PHYS_SIZE);
+   dev_err(dev, "set dma_offset%08llx%s\n", 
KEYSTONE_HIGH_PHYS_START
+   - KEYSTONE_LOW_PHYS_START, ret ? " failed" : "");
+
}
return NOTIFY_OK;
 }
@@ -51,11 +53,8 @@ static struct notifier_block platform_nb = {
 
 static void __init keystone_init(void)
 {
-   if (PHYS_OFFSET >= KEYSTONE_HIGH_PHYS_START) {
-   keystone_dma_pfn_offset = PFN_DOWN(KEYSTONE_HIGH_PHYS_START -
-  KEYSTONE_LOW_PHYS_START);
+   if (PHYS_OFFSET >= KEYSTONE_HIGH_PHYS_START)

[PATCH v6 00/12] PCI: brcmstb: enable PCIe for STB chips

2020-07-01 Thread Jim Quinlan via iommu
Patchset Summary:
  Enhance a PCIe host controller driver.  Because of its unusual design
  we are foced to change dev->dma_pfn_offset into a more general role
  allowing multiple offsets.  See the 'v1' notes below for more info.

v6:
  Commit "device core: Introduce DMA range map":
  -- of_dma_get_range() now takes a single argument and returns either
 NULL, a valid map, or an ERR_PTR. (Robin)
  -- offsets are no longer a PFN value but an actual address. (Robin)
  -- the bus_dma_region struct stores the range size instead of
 the cpu_end and pci_end values. (Robin)
  -- devices that were setting a single offset with no boundaries
 have been modified to have boundaries; in a few places
 where this informatino was unavilable a /* FIXME: ... */
 comment was added. (Robin)
  -- dma_attach_offset_range() can be called when an offset
 map already exists; if it's range is already present
 nothing is done and success is returned. (Robin)
  All commits:
  -- Man name/style/corrections/etc changed (Bjorn)
  -- rebase to Torvalds master

v5:
  Commit "device core: Introduce multiple dma pfn offsets"
  -- in of/address.c: "map_size = 0" => "*map_size = 0"
  -- use kcalloc instead of kzalloc (AndyS)
  -- use PHYS_ADDR_MAX instead of "~(phys_addr_t)0"
  Commit "PCI: brcmstb: Set internal memory viewport sizes"
  -- now gives error on missing dma-ranges property.
  Commit "dt-bindings: PCI: Add bindings for more Brcmstb chips"
  -- removed "Allof:" from brcm,scb-sizes definition (RobH)
  All Commits:
  -- indentation style, use max chars 100 (AndyS)
  -- rebased to torvalds master

v4:
  Commit "device core: Introduce multiple dma pfn offsets"
  -- of_dma_get_range() does not take a dev param but instead
 takes two "out" params: map and map_size.  We do this so
 that the code that parses dma-ranges is separate from
 the code that modifies 'dev'.   (Nicolas)
  -- the separate case of having a single pfn offset has
 been removed and is now processed by going through the
 map array. (Nicolas)
  -- move attach_uniform_dma_pfn_offset() from of/address.c to
 dma/mapping.c so that it does not depend on CONFIG_OF. (Nicolas)
  -- devm_kcalloc => devm_kzalloc (DanC)
  -- add/fix assignment to dev->dma_pfn_offset_map for func
 attach_uniform_dma_pfn_offset() (DanC, Nicolas)
  -- s/struct dma_pfn_offset_region/struct bus_dma_region/ (Nicolas)
  -- s/attach_uniform_dma_pfn_offset/dma_attach_uniform_pfn_offset/
  -- s/attach_dma_pfn_offset_map/dma_attach_pfn_offset_map/
  -- More use of PFN_{PHYS,DOWN,UP}. (AndyS)
  Commit "of: Include a dev param in of_dma_get_range()"
  -- this commit was sqaushed with "device core: Introduce ..."

v3:
  Commit "device core: Introduce multiple dma pfn offsets"
  Commit "arm: dma-mapping: Invoke dma offset func if needed"
  -- The above two commits have been squashed.  More importantly,
 the code has been modified so that the functionality for
 multiple pfn offsets subsumes the use of dev->dma_pfn_offset.
 In fact, dma_pfn_offset is removed and supplanted by
 dma_pfn_offset_map, which is a pointer to an array.  The
 more common case of a uniform offset is now handled as
 a map with a single entry, while cases requiring multiple
 pfn offsets use a map with multiple entries.  Code paths
 that used to do this:

 dev->dma_pfn_offset = mydrivers_pfn_offset;

 have been changed to do this:

 attach_uniform_dma_pfn_offset(dev, pfn_offset);

  Commit "dt-bindings: PCI: Add bindings for more Brcmstb chips"
  -- Add if/then clause for required props: resets, reset-names (RobH)
  -- Change compatible list from const to enum (RobH)
  -- Change list of u32-tuples to u64 (RobH)

  Commit "of: Include a dev param in of_dma_get_range()"
  -- modify of/unittests.c to add NULL param in of_dma_get_range() call.

  Commit "device core: Add ability to handle multiple dma offsets"
  -- align comment in device.h (AndyS).
  -- s/cpu_beg/cpu_start/ and s/dma_beg/dma_start/ in struct
 dma_pfn_offset_region (AndyS).

v2:
Commit: "device core: Add ability to handle multiple dma offsets"
  o Added helper func attach_dma_pfn_offset_map() in address.c (Chistoph)
  o Helpers funcs added to __phys_to_dma() & __dma_to_phys() (Christoph)
  o Added warning when multiple offsets are needed and !DMA_PFN_OFFSET_MAP
  o dev->dma_pfn_map => dev->dma_pfn_offset_map
  o s/frm/from/ for dma_pfn_offset_frm_{phys,dma}_addr() (Christoph)
  o In device.h: s/const void */const struct dma_pfn_offset_region */
  o removed 'unlikely' from unlikely(dev->dma_pfn_offset_map) since
guarded by CONFIG_DMA_PFN_OFFSET_MAP (Christoph)
  o Since dev->dma_pfn_offset is copied in usb/core/{usb,message}.c, now
dev->dma_pfn_offset_map is copied as well.
  o Merged two of the DMA commits into one (Christoph).

Commit "arm: dma-mapping: Invoke dma offset func if needed":
  o Use helper functions instead of #if CONFIG_DMA_PFN_OFFSET

Other commits' 

Re: [PATCH] xen: introduce xen_vring_use_dma

2020-07-01 Thread Michael S. Tsirkin
On Wed, Jul 01, 2020 at 10:34:53AM -0700, Stefano Stabellini wrote:
> On Wed, 1 Jul 2020, Christoph Hellwig wrote:
> > On Mon, Jun 29, 2020 at 04:46:09PM -0700, Stefano Stabellini wrote:
> > > > I could imagine some future Xen hosts setting a flag somewhere in the
> > > > platform capability saying "no xen specific flag, rely on
> > > > "VIRTIO_F_ACCESS_PLATFORM". Then you set that accordingly in QEMU.
> > > > How about that?
> > > 
> > > Yes, that would be fine and there is no problem implementing something
> > > like that when we get virtio support in Xen. Today there are still no
> > > virtio interfaces provided by Xen to ARM guests (no virtio-block/net,
> > > etc.)
> > > 
> > > In fact, in both cases we are discussing virtio is *not* provided by
> > > Xen; it is a firmware interface to something entirely different:
> > > 
> > > 1) virtio is used to talk to a remote AMP processor (RPMesg)
> > > 2) virtio is used to talk to a secure-world firmware/OS (Trusty)
> > >
> > > VIRTIO_F_ACCESS_PLATFORM is not set by Xen in these cases but by RPMesg
> > > and by Trusty respectively. I don't know if Trusty should or should not
> > > set VIRTIO_F_ACCESS_PLATFORM, but I think Linux should still work
> > > without issues.
> > > 
> > 
> > Any virtio implementation that is not in control of the memory map
> > (aka not the hypervisor) absolutely must set VIRTIO_F_ACCESS_PLATFORM,
> > else it is completely broken.
> 
> Lots of broken virtio implementations out there it would seem :-(

Not really, most of virtio implementations are in full control of
memory, being part of the hypervisor.

> 
> > > The xen_domain() check in Linux makes it so that vring_use_dma_api
> > > returns the opposite value on native Linux compared to Linux as Xen/ARM
> > > DomU by "accident". By "accident" because there is no architectural
> > > reason why Linux Xen/ARM DomU should behave differently compared to
> > > native Linux in this regard.
> > > 
> > > I hope that now it is clearer why I think the if (xen_domain()) check
> > > needs to be improved anyway, even if we fix generic dma_ops with virtio
> > > interfaces missing VIRTIO_F_ACCESS_PLATFORM.
> > 
> > IMHO that Xen quirk should never have been added in this form..
> 
> Would you be in favor of a more flexible check along the lines of the
> one proposed in the patch that started this thread:
> 
> if (xen_vring_use_dma())
> return true;
> 
> 
> xen_vring_use_dma would be implemented so that it returns true when
> xen_swiotlb is required and false otherwise.

I'll need to think about it. Sounds reasonable on the surface ...

-- 
MST

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH v8 2/3] dt-bindings: arm-smmu: Add binding for Tegra194 SMMU

2020-07-01 Thread Krishna Reddy
On 01/07/2020 20:00, Krishna Reddy wrote:
>> +items:
>> +  - enum:
>> +  - nvdia,tegra194-smmu
>> +  - const: arm,mmu-500

> Is the fallback compatible appropriate here? If software treats this as a 
> standard MMU-500 it will only program the first instance (because the 
> second isn't presented as a separate MMU-500) - is there any way that 
> isn't going to blow up?

 When compatible is set to both nvidia,tegra194-smmu and arm,mmu-500, 
 implementation override ensure that both instances are programmed. Isn't 
 it? I am not sure I follow your comment fully.
>> 
>>> The problem is, if for some reason someone had a Tegra194, but only set the 
>>> compatible string to 'arm,mmu-500' it would assume that it was a normal 
>>> arm,mmu-500 and only one instance would be programmed. We always want at 
>>> least 2 of the 3 instances >>programmed and so we should only match 
>>> 'nvidia,tegra194-smmu'. In fact, I think that we also need to update the 
>>> arm_smmu_of_match table to add 'nvidia,tegra194-smmu' with the data set to 
>>> _mmu500.
>> 
>> In that case, new binding "nvidia,smmu-v2" can be added with data set to 
>> _mmu500 and enumeration would have nvidia,tegra194-smmu and another 
>> variant for next generation SoC in future. 

>I think you would be better off with nvidia,smmu-500 as smmu-v2 appears to be 
>something different. I see others have a smmu-v2 but I am not sure if that is 
>legacy. We have an smmu-500 and so that would seem more appropriate.

I tried to use the binding synonymous to other vendors. 
V2 is the architecture version.  MMU-500 is the actual implementation from ARM 
based on V2 arch.  As we just use the MMU-500 IP as it is, It can be named as 
nvidia,smmu-500 or similar as well.
Others probably having their own implementation based on V2 arch.

KR
--
nvpublic
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v8 2/3] dt-bindings: arm-smmu: Add binding for Tegra194 SMMU

2020-07-01 Thread Jon Hunter


On 01/07/2020 20:00, Krishna Reddy wrote:
> +items:
> +  - enum:
> +  - nvdia,tegra194-smmu
> +  - const: arm,mmu-500
>>>
 Is the fallback compatible appropriate here? If software treats this as a 
 standard MMU-500 it will only program the first instance (because the 
 second isn't presented as a separate MMU-500) - is there any way that 
 isn't going to blow up?
>>>
>>> When compatible is set to both nvidia,tegra194-smmu and arm,mmu-500, 
>>> implementation override ensure that both instances are programmed. Isn't 
>>> it? I am not sure I follow your comment fully.
> 
>> The problem is, if for some reason someone had a Tegra194, but only set the 
>> compatible string to 'arm,mmu-500' it would assume that it was a normal 
>> arm,mmu-500 and only one instance would be programmed. We always want at 
>> least 2 of the 3 instances >programmed and so we should only match 
>> 'nvidia,tegra194-smmu'. In fact, I think that we also need to update the 
>> arm_smmu_of_match table to add 'nvidia,tegra194-smmu' with the data set to 
>> _mmu500.
> 
> In that case, new binding "nvidia,smmu-v2" can be added with data set to 
> _mmu500 and enumeration would have nvidia,tegra194-smmu and another 
> variant for next generation SoC in future. 

I think you would be better off with nvidia,smmu-500 as smmu-v2 appears
to be something different. I see others have a smmu-v2 but I am not sure
if that is legacy. We have an smmu-500 and so that would seem more
appropriate.

Jon

-- 
nvpublic
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH v8 3/3] iommu/arm-smmu: Add global/context fault implementation hooks

2020-07-01 Thread Krishna Reddy
>> With shared irq line, the context fault identification is not optimal 
>> already.  Reading all the context banks all the time can be additional mmio 
>> read overhead. But, it may not hurt the real use cases as these happen only 
>> when there are bugs.

>Right, I did ponder the idea of a whole programmatic "request_context_irq" 
>hook that would allow registering the handler for both interrupts with the 
>appropriate context bank and instance data, but since all interrupts are 
>currently unexpected it seems somewhat hard to justify the extra complexity. 
>Obviously we can revisit this in future if you want to start actually doing 
>something with faults like the qcom GPU folks do.

Thanks, I would just avoid making changes to interrupt handlers till it is 
really necessary in future. The current code would just be simple and 
functional with more interrupts when there are multiple faults.

-KR
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v8 3/3] iommu/arm-smmu: Add global/context fault implementation hooks

2020-07-01 Thread Robin Murphy

On 2020-07-01 19:48, Krishna Reddy wrote:

+for (inst = 0; inst < nvidia_smmu->num_inst; inst++) {
+irq_ret = nvidia_smmu_global_fault_inst(irq, smmu, inst);
+if (irq_ret == IRQ_HANDLED)
+return irq_ret;


Any chance there could be more than one SMMU faulting by the time we
service the interrupt?



It certainly seems plausible if the interconnect is automatically 
load-balancing requests across the SMMU instances - say a driver bug caused a 
buffer to be unmapped too early, there could be many in-flight accesses to 
parts of that buffer that aren't all taking the same path and thus could now 
fault in parallel.
[ And anyone inclined to nitpick global vs. context faults, s/unmap a 
buffer/tear down a domain/ ;) ]
Either way I think it would be easier to reason about if we just handled these 
like a typical shared interrupt and always checked all the instances.


It would be optimal to check at the same time across all instances.


+for (idx = 0; idx < smmu->num_context_banks; idx++) {
+irq_ret = nvidia_smmu_context_fault_bank(irq, smmu,
+ idx,
+ inst);
+
+if (irq_ret == IRQ_HANDLED)
+return irq_ret;


Any reason why we don't check all banks?



As above, we certainly shouldn't bail out without checking the bank for the 
offending domain across all of its instances, and I guess the way this works 
means that we would have to iterate all the banks to achieve that.


With shared irq line, the context fault identification is not optimal already.  
Reading all the context banks all the time can be additional mmio read 
overhead. But, it may not hurt the real use cases as these happen only when 
there are bugs.


Right, I did ponder the idea of a whole programmatic 
"request_context_irq" hook that would allow registering the handler for 
both interrupts with the appropriate context bank and instance data, but 
since all interrupts are currently unexpected it seems somewhat hard to 
justify the extra complexity. Obviously we can revisit this in future if 
you want to start actually doing something with faults like the qcom GPU 
folks do.


Robin.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH v8 1/3] iommu/arm-smmu: add NVIDIA implementation for dual ARM MMU-500 usage

2020-07-01 Thread Krishna Reddy
>Yeah, I realised later last night that this probably originated from forking 
>the whole driver downstream. But even then you could have treated the other 
>one as a separate nsmmu with a single instance ;)

True, But the initial nvidia implementation had limitation that it can only 
handle one instance of usage. With your implementation hooks design, it should 
be able to handle multiple instances of usage now.

>Since it does add a bit of confusion to the code and comments, let's just keep 
>things simple. I do like Jon's suggestion of actually enforcing that the 
>number of "reg" regions exactly matches the number expected for the given 
>compatible - I guess for now that means just hard-coding 2 and hoping the 
>hardware folks don't cook up any more of these...

For T194, reg can just be forced to 2. No future plan to use more than two 
MMU-500s together as of now. 

-KR 
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v8 2/3] dt-bindings: arm-smmu: Add binding for Tegra194 SMMU

2020-07-01 Thread Robin Murphy

On 2020-07-01 19:47, Jon Hunter wrote:


On 01/07/2020 19:28, Krishna Reddy wrote:

+  - description: NVIDIA SoCs that use more than one "arm,mmu-500"

Hmm, there must be a better way to word that to express that it only applies to 
the sets of SMMUs that must be programmed identically, and not any other 
independent MMU-500s that might also happen to be in the same SoC.


Let me reword it to "NVIDIA SoCs that must program multiple MMU-500s 
identically".


+items:
+  - enum:
+  - nvdia,tegra194-smmu
+  - const: arm,mmu-500



Is the fallback compatible appropriate here? If software treats this as a 
standard MMU-500 it will only program the first instance (because the second 
isn't presented as a separate MMU-500) - is there any way that isn't going to 
blow up?


When compatible is set to both nvidia,tegra194-smmu and arm,mmu-500, 
implementation override ensure that both instances are programmed. Isn't it? I 
am not sure I follow your comment fully.


The problem is, if for some reason someone had a Tegra194, but only set
the compatible string to 'arm,mmu-500' it would assume that it was a
normal arm,mmu-500 and only one instance would be programmed. We always
want at least 2 of the 3 instances programmed and so we should only
match 'nvidia,tegra194-smmu'. In fact, I think that we also need to
update the arm_smmu_of_match table to add 'nvidia,tegra194-smmu' with
the data set to _mmu500.


Right, the main concern is if the shiny new DT gets passed to an older 
kernel (or other OS entirely) which doesn't know the 
"nvdia,tegra194-smmu" compatible but would match on "arm,mmu-500" and 
bind a standard driver thinking it's going to work OK. The user would 
probably prefer that no driver matched and both instances were left 
turned off, than face the fallout of only one of them being set up.


Robin.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH v8 2/3] dt-bindings: arm-smmu: Add binding for Tegra194 SMMU

2020-07-01 Thread Krishna Reddy
 +items:
 +  - enum:
 +  - nvdia,tegra194-smmu
 +  - const: arm,mmu-500
> >
>>> Is the fallback compatible appropriate here? If software treats this as a 
>>> standard MMU-500 it will only program the first instance (because the 
>>> second isn't presented as a separate MMU-500) - is there any way that isn't 
>>> going to blow up?
> >
>> When compatible is set to both nvidia,tegra194-smmu and arm,mmu-500, 
>> implementation override ensure that both instances are programmed. Isn't it? 
>> I am not sure I follow your comment fully.

>The problem is, if for some reason someone had a Tegra194, but only set the 
>compatible string to 'arm,mmu-500' it would assume that it was a normal 
>arm,mmu-500 and only one instance would be programmed. We always want at least 
>2 of the 3 instances >programmed and so we should only match 
>'nvidia,tegra194-smmu'. In fact, I think that we also need to update the 
>arm_smmu_of_match table to add 'nvidia,tegra194-smmu' with the data set to 
>_mmu500.

In that case, new binding "nvidia,smmu-v2" can be added with data set to 
_mmu500 and enumeration would have nvidia,tegra194-smmu and another variant 
for next generation SoC in future. 

-KR
--
nvpublic
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v8 1/3] iommu/arm-smmu: add NVIDIA implementation for dual ARM MMU-500 usage

2020-07-01 Thread Robin Murphy

On 2020-07-01 19:18, Krishna Reddy wrote:

+ * When Linux kernel supports multiple SMMU devices, the SMMU
device +used for + * isochornous HW devices should be added as a
separate ARM MMU-500 +device + * in DT and be programmed
independently for efficient TLB invalidates.



I don't understand the "When" there - the driver has always
supported multiple independent SMMUs, and it's not something that
could be configured out or otherwise disabled. Plus I really don't
see why you would ever want to force unrelated SMMUs to be
>programmed together - beyond the TLB thing mentioned it would also
waste precious context bank resources and might lead to weird
device grouping via false stream ID aliasing, with no obvious
upside at all.


Sorry, I missed this comment. During the initial patches, when the
iommu_ops were different between, support multiple SMMU drivers at
the same is not possible as one of them(that gets probed last)
overwrites the platform bus ops. On revisiting the original issue,
This problem is no longer relevant. At this point, It makes more
sense to just get rid of 3rd instance programming in
arm-smmu-nvidia.c and just limit it to the SMMU instances that need
identical programming.


Yeah, I realised later last night that this probably originated from 
forking the whole driver downstream. But even then you could have 
treated the other one as a separate nsmmu with a single instance ;)


Since it does add a bit of confusion to the code and comments, let's 
just keep things simple. I do like Jon's suggestion of actually 
enforcing that the number of "reg" regions exactly matches the number 
expected for the given compatible - I guess for now that means just 
hard-coding 2 and hoping the hardware folks don't cook up any more of 
these...


Robin.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH v8 3/3] iommu/arm-smmu: Add global/context fault implementation hooks

2020-07-01 Thread Krishna Reddy
>>> +for (inst = 0; inst < nvidia_smmu->num_inst; inst++) {
>>> +irq_ret = nvidia_smmu_global_fault_inst(irq, smmu, inst);
>>> +if (irq_ret == IRQ_HANDLED)
>>> +return irq_ret;
>>
>> Any chance there could be more than one SMMU faulting by the time we 
>> service the interrupt?

>It certainly seems plausible if the interconnect is automatically 
>load-balancing requests across the SMMU instances - say a driver bug caused a 
>buffer to be unmapped too early, there could be many in-flight accesses to 
>parts of that buffer that aren't all taking the same path and thus could now 
>fault in parallel.
>[ And anyone inclined to nitpick global vs. context faults, s/unmap a 
>buffer/tear down a domain/ ;) ]
>Either way I think it would be easier to reason about if we just handled these 
>like a typical shared interrupt and always checked all the instances.

It would be optimal to check at the same time across all instances. 

>>> +for (idx = 0; idx < smmu->num_context_banks; idx++) {
>>> +irq_ret = nvidia_smmu_context_fault_bank(irq, smmu,
>>> + idx, 
>>> + inst);
>>> +
>>> +if (irq_ret == IRQ_HANDLED)
>>> +return irq_ret;
>>
>> Any reason why we don't check all banks?

>As above, we certainly shouldn't bail out without checking the bank for the 
>offending domain across all of its instances, and I guess the way this works 
>means that we would have to iterate all the banks to achieve that.

With shared irq line, the context fault identification is not optimal already.  
Reading all the context banks all the time can be additional mmio read 
overhead. But, it may not hurt the real use cases as these happen only when 
there are bugs. 

-KR
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v8 2/3] dt-bindings: arm-smmu: Add binding for Tegra194 SMMU

2020-07-01 Thread Jon Hunter


On 01/07/2020 19:28, Krishna Reddy wrote:
>>> +  - description: NVIDIA SoCs that use more than one "arm,mmu-500"
>> Hmm, there must be a better way to word that to express that it only applies 
>> to the sets of SMMUs that must be programmed identically, and not any other 
>> independent MMU-500s that might also happen to be in the same SoC.
> 
> Let me reword it to "NVIDIA SoCs that must program multiple MMU-500s 
> identically".
> 
>>> +items:
>>> +  - enum:
>>> +  - nvdia,tegra194-smmu
>>> +  - const: arm,mmu-500
> 
>> Is the fallback compatible appropriate here? If software treats this as a 
>> standard MMU-500 it will only program the first instance (because the second 
>> isn't presented as a separate MMU-500) - is there any way that isn't going 
>> to blow up?
> 
> When compatible is set to both nvidia,tegra194-smmu and arm,mmu-500, 
> implementation override ensure that both instances are programmed. Isn't it? 
> I am not sure I follow your comment fully.

The problem is, if for some reason someone had a Tegra194, but only set
the compatible string to 'arm,mmu-500' it would assume that it was a
normal arm,mmu-500 and only one instance would be programmed. We always
want at least 2 of the 3 instances programmed and so we should only
match 'nvidia,tegra194-smmu'. In fact, I think that we also need to
update the arm_smmu_of_match table to add 'nvidia,tegra194-smmu' with
the data set to _mmu500.

Jon

-- 
nvpublic
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH v8 2/3] dt-bindings: arm-smmu: Add binding for Tegra194 SMMU

2020-07-01 Thread Krishna Reddy
>> +  - description: NVIDIA SoCs that use more than one "arm,mmu-500"
> Hmm, there must be a better way to word that to express that it only applies 
> to the sets of SMMUs that must be programmed identically, and not any other 
> independent MMU-500s that might also happen to be in the same SoC.

Let me reword it to "NVIDIA SoCs that must program multiple MMU-500s 
identically".

>> +items:
>> +  - enum:
>> +  - nvdia,tegra194-smmu
>> +  - const: arm,mmu-500

>Is the fallback compatible appropriate here? If software treats this as a 
>standard MMU-500 it will only program the first instance (because the second 
>isn't presented as a separate MMU-500) - is there any way that isn't going to 
>blow up?

When compatible is set to both nvidia,tegra194-smmu and arm,mmu-500, 
implementation override ensure that both instances are programmed. Isn't it? I 
am not sure I follow your comment fully.

-KR

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH v8 1/3] iommu/arm-smmu: add NVIDIA implementation for dual ARM MMU-500 usage

2020-07-01 Thread Krishna Reddy
>> + * When Linux kernel supports multiple SMMU devices, the SMMU device 
>> +used for
>> + * isochornous HW devices should be added as a separate ARM MMU-500 
>> +device
>> + * in DT and be programmed independently for efficient TLB invalidates.

>I don't understand the "When" there - the driver has always supported multiple 
>independent SMMUs, and it's not something that could be configured out or 
>otherwise disabled. Plus I really don't see why you would ever want to force 
>unrelated SMMUs to be >programmed together - beyond the TLB thing mentioned it 
>would also waste precious context bank resources and might lead to weird 
>device grouping via false stream ID aliasing, with no obvious upside at all.

Sorry, I missed this comment.
During the initial patches, when the iommu_ops were different between, support 
multiple SMMU drivers at the same is not possible as one of them(that gets 
probed last) overwrites the platform bus ops. 
On revisiting the original issue, This problem is no longer relevant. At this 
point, It makes more sense to just get rid of 3rd instance programming in 
arm-smmu-nvidia.c and just limit it to the SMMU instances that need identical 
programming.

-KR



___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v2 5/7] driver core: Add device location to "struct device" and expose it in sysfs

2020-07-01 Thread Rajat Jain via iommu
Hello,

On Tue, Jun 30, 2020 at 10:00 AM Greg Kroah-Hartman
 wrote:
>
> On Tue, Jun 30, 2020 at 06:08:31PM +0200, Rafael J. Wysocki wrote:
> > On Tue, Jun 30, 2020 at 5:38 PM Greg Kroah-Hartman
> >  wrote:
> > >
> > > On Tue, Jun 30, 2020 at 03:00:34PM +0200, Rafael J. Wysocki wrote:
> > > > On Tue, Jun 30, 2020 at 2:52 PM Greg Kroah-Hartman
> > > >  wrote:
> > > > >
> > > > > On Tue, Jun 30, 2020 at 01:49:48PM +0300, Heikki Krogerus wrote:
> > > > > > On Mon, Jun 29, 2020 at 09:49:41PM -0700, Rajat Jain wrote:
> > > > > > > Add a new (optional) field to denote the physical location of a 
> > > > > > > device
> > > > > > > in the system, and expose it in sysfs. This was discussed here:
> > > > > > > https://lore.kernel.org/linux-acpi/20200618184621.ga446...@kroah.com/
> > > > > > >
> > > > > > > (The primary choice for attribute name i.e. "location" is already
> > > > > > > exposed as an ABI elsewhere, so settled for "site"). Individual 
> > > > > > > buses
> > > > > > > that want to support this new attribute can opt-in by setting a 
> > > > > > > flag in
> > > > > > > bus_type, and then populating the location of device while 
> > > > > > > enumerating
> > > > > > > it.
> > > > > >
> > > > > > So why not just call it "physical_location"?
> > > > >
> > > > > That's better, and will allow us to put "3rd blue plug from the left,
> > > > > 4th row down" in there someday :)
> > > > >
> > > > > All of this is "relative" to the CPU, right?  But what CPU?  Again, 
> > > > > how
> > > > > are the systems with drawers of PCI and CPUs and memory that can be
> > > > > added/removed at any point in time being handled here?  What is
> > > > > "internal" and "external" for them?
> > > > >
> > > > > What exactly is the physical boundry here that is attempting to be
> > > > > described?
> > > >
> > > > Also, where is the "physical location" information going to come from?
> > >
> > > Who knows?  :)
> > >
> > > Some BIOS seem to provide this, but do you trust that?
> > >
> > > > If that is the platform firmware (which I suspect is the anticipated
> > > > case), there may be problems with reliability related to that.
> > >
> > > s/may/will/
> > >
> > > which means making the kernel inact a policy like this patch series
> > > tries to add, will result in a lot of broken systems, which is why I
> > > keep saying that it needs to be done in userspace.
> > >
> > > It's as if some of us haven't been down this road before and just keep
> > > being ignored...
> > >
> > > {sigh}
> >
> > Well, to be honest, if you are a "vertical" vendor and you control the
> > entire stack, *including* the platform firmware, it would be kind of
> > OK for you to do that in a product kernel.
> >
> > However, this is not a practical thing to do in the mainline kernel
> > which must work for everybody, including people who happen to use
> > systems with broken or even actively unfriendly firmware on them.
> >
> > So I'm inclined to say that IMO this series "as is" would not be an
> > improvement from the mainline perspective.
>
> It can be, we have been using this for USB devices for many many years
> now, quite successfully.  The key is not to trust that the platform
> firmware got it right :)
>
> > I guess it would make sense to have an attribute for user space to
> > write to in order to make the kernel reject device plug-in events
> > coming from a given port or connector, but the kernel has no reliable
> > means to determine *which* ports or connectors are "safe", and even if
> > there was a way for it to do that, it still may not agree with user
> > space on which ports or connectors should be regarded as "safe".
>
> Again, we have been doing this for USB devices for a very long time, PCI
> shouldn't be any different.  Why people keep ignoring working solutions
> is beyond me, there's nothing "special" about PCI devices here for this
> type of "worry" or reasoning to try to create new solutions.
>
> So, again, I ask, go do what USB does, and to do that, take the logic
> out of the USB core, make it bus-agnositic, and _THEN_ add it to the PCI
> code. Why the original submitter keeps ignoring my request to do this
> is beyond me, I guess they like making patches that will get rejected :(

IMHO I'm actually trying to precisely do what I think was the
conclusion of our discussion, and then some changes because of the
further feedback I received on those patches. Let's take a step back
and please allow me to explain how I got here (my apologies but this
spans a couple of threads, and I"m trying to tie them all together
here):

GOAL: To allow user space to control what (PCI) drivers he wants to
allow on external (thunderbolt) ports. There was a lot of debate about
the need for such a policy at
https://lore.kernel.org/linux-pci/CACK8Z6GR7-wseug=ttvyrarvzx_ao2geoldnbwjtb+5y7vw...@mail.gmail.com/
with the final conclusion that it should be OK to implement such a
policy in userspace, as long as the policy is not implemented in the
kernel. The kernel 

Re: [PATCH] xen: introduce xen_vring_use_dma

2020-07-01 Thread Stefano Stabellini
On Wed, 1 Jul 2020, Christoph Hellwig wrote:
> On Mon, Jun 29, 2020 at 04:46:09PM -0700, Stefano Stabellini wrote:
> > > I could imagine some future Xen hosts setting a flag somewhere in the
> > > platform capability saying "no xen specific flag, rely on
> > > "VIRTIO_F_ACCESS_PLATFORM". Then you set that accordingly in QEMU.
> > > How about that?
> > 
> > Yes, that would be fine and there is no problem implementing something
> > like that when we get virtio support in Xen. Today there are still no
> > virtio interfaces provided by Xen to ARM guests (no virtio-block/net,
> > etc.)
> > 
> > In fact, in both cases we are discussing virtio is *not* provided by
> > Xen; it is a firmware interface to something entirely different:
> > 
> > 1) virtio is used to talk to a remote AMP processor (RPMesg)
> > 2) virtio is used to talk to a secure-world firmware/OS (Trusty)
> >
> > VIRTIO_F_ACCESS_PLATFORM is not set by Xen in these cases but by RPMesg
> > and by Trusty respectively. I don't know if Trusty should or should not
> > set VIRTIO_F_ACCESS_PLATFORM, but I think Linux should still work
> > without issues.
> > 
> 
> Any virtio implementation that is not in control of the memory map
> (aka not the hypervisor) absolutely must set VIRTIO_F_ACCESS_PLATFORM,
> else it is completely broken.

Lots of broken virtio implementations out there it would seem :-(


> > The xen_domain() check in Linux makes it so that vring_use_dma_api
> > returns the opposite value on native Linux compared to Linux as Xen/ARM
> > DomU by "accident". By "accident" because there is no architectural
> > reason why Linux Xen/ARM DomU should behave differently compared to
> > native Linux in this regard.
> > 
> > I hope that now it is clearer why I think the if (xen_domain()) check
> > needs to be improved anyway, even if we fix generic dma_ops with virtio
> > interfaces missing VIRTIO_F_ACCESS_PLATFORM.
> 
> IMHO that Xen quirk should never have been added in this form..

Would you be in favor of a more flexible check along the lines of the
one proposed in the patch that started this thread:

if (xen_vring_use_dma())
return true;


xen_vring_use_dma would be implemented so that it returns true when
xen_swiotlb is required and false otherwise.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v2 12/12] bus: fsl-mc: Add ACPI support for fsl-mc

2020-07-01 Thread Laurentiu Tudor



On 6/19/2020 11:20 AM, Lorenzo Pieralisi wrote:
> From: Makarand Pawagi 
> 
> Add ACPI support in the fsl-mc driver. Driver parses MC DSDT table to
> extract memory and other resources.
> 
> Interrupt (GIC ITS) information is extracted from the MADT table
> by drivers/irqchip/irq-gic-v3-its-fsl-mc-msi.c.
> 
> IORT table is parsed to configure DMA.
> 
> Signed-off-by: Makarand Pawagi 
> Signed-off-by: Diana Craciun 
> Signed-off-by: Laurentiu Tudor 
> ---
>  drivers/bus/fsl-mc/fsl-mc-bus.c | 73 
>  drivers/bus/fsl-mc/fsl-mc-msi.c | 37 +
>  drivers/irqchip/irq-gic-v3-its-fsl-mc-msi.c | 92 -
>  3 files changed, 150 insertions(+), 52 deletions(-)
> 
> diff --git a/drivers/bus/fsl-mc/fsl-mc-bus.c b/drivers/bus/fsl-mc/fsl-mc-bus.c
> index 824ff77bbe86..324d49d6df89 100644
> --- a/drivers/bus/fsl-mc/fsl-mc-bus.c
> +++ b/drivers/bus/fsl-mc/fsl-mc-bus.c
> @@ -18,6 +18,8 @@
>  #include 
>  #include 
>  #include 
> +#include 
> +#include 
>  
>  #include "fsl-mc-private.h"
>  
> @@ -38,6 +40,7 @@ struct fsl_mc {
>   struct fsl_mc_device *root_mc_bus_dev;
>   u8 num_translation_ranges;
>   struct fsl_mc_addr_translation_range *translation_ranges;
> + void *fsl_mc_regs;
>  };
>  
>  /**
> @@ -56,6 +59,10 @@ struct fsl_mc_addr_translation_range {
>   phys_addr_t start_phys_addr;
>  };
>  
> +#define FSL_MC_FAPR  0x28
> +#define MC_FAPR_PL   BIT(18)
> +#define MC_FAPR_BMT  BIT(17)
> +
>  /**
>   * fsl_mc_bus_match - device to driver matching callback
>   * @dev: the fsl-mc device to match against
> @@ -124,7 +131,10 @@ static int fsl_mc_dma_configure(struct device *dev)
>   while (dev_is_fsl_mc(dma_dev))
>   dma_dev = dma_dev->parent;
>  
> - return of_dma_configure_id(dev, dma_dev->of_node, 0, _id);
> + if (dev_of_node(dma_dev))
> + return of_dma_configure_id(dev, dma_dev->of_node, 0, _id);
> +
> + return acpi_dma_configure_id(dev, DEV_DMA_COHERENT, _id);
>  }
>  
>  static ssize_t modalias_show(struct device *dev, struct device_attribute 
> *attr,
> @@ -865,8 +875,11 @@ static int fsl_mc_bus_probe(struct platform_device *pdev)
>   struct fsl_mc_io *mc_io = NULL;
>   int container_id;
>   phys_addr_t mc_portal_phys_addr;
> - u32 mc_portal_size;
> - struct resource res;
> + u32 mc_portal_size, mc_stream_id;
> + struct resource *plat_res;
> +
> + if (!iommu_present(_mc_bus_type))
> + return -EPROBE_DEFER;
>  
>   mc = devm_kzalloc(>dev, sizeof(*mc), GFP_KERNEL);
>   if (!mc)
> @@ -874,19 +887,33 @@ static int fsl_mc_bus_probe(struct platform_device 
> *pdev)
>  
>   platform_set_drvdata(pdev, mc);
>  
> + plat_res = platform_get_resource(pdev, IORESOURCE_MEM, 1);
> + mc->fsl_mc_regs = devm_ioremap_resource(>dev, plat_res);
> + if (IS_ERR(mc->fsl_mc_regs))
> + return PTR_ERR(mc->fsl_mc_regs);
> +
> + if (IS_ENABLED(CONFIG_ACPI) && !dev_of_node(>dev)) {
> + mc_stream_id = readl(mc->fsl_mc_regs + FSL_MC_FAPR);
> + /*
> +  * HW ORs the PL and BMT bit, places the result in bit 15 of
> +  * the StreamID and ORs in the ICID. Calculate it accordingly.
> +  */
> + mc_stream_id = (mc_stream_id & 0x) |
> + ((mc_stream_id & (MC_FAPR_PL | MC_FAPR_BMT)) ?
> + 0x4000 : 0);
> + error = acpi_dma_configure_id(>dev, DEV_DMA_COHERENT,
> +   _stream_id);
> + if (error)
> + dev_warn(>dev, "failed to configure dma: %d.\n",
> +  error);
> + }
> +
>   /*
>* Get physical address of MC portal for the root DPRC:
>*/
> - error = of_address_to_resource(pdev->dev.of_node, 0, );
> - if (error < 0) {
> - dev_err(>dev,
> - "of_address_to_resource() failed for %pOF\n",
> - pdev->dev.of_node);
> - return error;
> - }
> -
> - mc_portal_phys_addr = res.start;
> - mc_portal_size = resource_size();
> + plat_res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
> + mc_portal_phys_addr = plat_res->start;
> + mc_portal_size = resource_size(plat_res);
>   error = fsl_create_mc_io(>dev, mc_portal_phys_addr,
>mc_portal_size, NULL,
>FSL_MC_IO_ATOMIC_CONTEXT_PORTAL, _io);
> @@ -903,11 +930,13 @@ static int fsl_mc_bus_probe(struct platform_device 
> *pdev)
>   dev_info(>dev, "MC firmware version: %u.%u.%u\n",
>mc_version.major, mc_version.minor, mc_version.revision);
>  
> - error = get_mc_addr_translation_ranges(>dev,
> ->translation_ranges,
> ->num_translation_ranges);
> - if (error < 0)
> - goto 

Re: [PATCH v2 01/12] ACPI/IORT: Make iort_match_node_callback walk the ACPI namespace for NC

2020-07-01 Thread Robin Murphy

On 2020-06-30 14:04, Hanjun Guo wrote:

On 2020/6/30 18:24, Lorenzo Pieralisi wrote:

On Tue, Jun 30, 2020 at 11:06:41AM +0800, Hanjun Guo wrote:

[...]


For devices that aren't described in the DSDT - IORT translations
are determined by their ACPI parent device. Do you see/Have you
found any issue with this approach ?


The spec says "Describes the IO relationships between devices
represented in the ACPI namespace.", and in section 3.1.1.3 Named
component node, it says:


PCI devices aren't necessarily described in the ACPI namespace and we
still use IORT to describe them - through the RC node.


"Named component nodes are used to describe devices that are also
included in the Differentiated System Description Table (DSDT). See
[ACPI]."

So from my understanding, the IORT spec for now, can only do ID
translations for devices in the DSDT.


I think you can read this multiple ways but this patch does not
change this concept. What changes, is applying parent's node IORT
mapping to child nodes with no associated DSDT nodes, it is the
same thing we do with PCI and the _DMA method - we could update
the wording in the specs if that clarifies but I don't think this
deliberately disregards the specifications.


I agree, but it's better to update the wording of the spec.




For a platform device, if I use its parent's full path name for
its named component entry, then it will match, but this will violate
the IORT spec.


Can you elaborate on this please I don't get the point you
are making.


For example, device A is not described in DSDT so can't represent
as a NC node in IORT. Device B can be described in DSDT and it
is the parent of device A, so device B can be represented in IORT
with memory access properties and node flags with Substream width
and Stall supported info.

When we trying to translate device A's ID, we reuse all the memory
access properties and node flags from its parent (device B), but
will it the same?


I assume so why wouldn't it be ? Why would be describe them in
a parent-child relationship if that's not how the system looks like
in HW ?


The point I'm making is that I'm not sure all the memory access and
stall properties are the same for the parent and the device itself.


Is that even a valid case though? The principal thing we want to 
accommodate here is when device B *is* the one accessing memory, either 
because it is a bridge with device A sat behind it, or because device A 
is actually just some logical function or subset of physical device B.


If the topology is such that device A is a completely independent device 
with its own path to memory such that it could have different 
properties, I would expect that it *should* be described in DSDT, and I 
can't easily think of a good reason why it wouldn't be. I'm also 
struggling to imagine how it might even have an ID that had to be 
interpreted in the context of device B if it wasn't one of the cases 
above :/


I don't doubt that people could - or maybe even have - come up with crap 
DSDT bindings that don't represent the hardware sufficiently accurately, 
but I'm not sure that should be IORT's problem...


Robin.


Do you have a specific example in mind that we should be aware of ?


So the IORT spec don't support this, at least it's pretty vague
I think.


I think that's a matter of wording, it can be updated if it needs be,
reach out if you see any issue with the current approach please.


If the all the properties for parent and device itself are the same,
I have no strong opinion for this patch, but it's better to update
the wording of the spec as well.

Thanks
Hanjun


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v3 4/7] iommu/vt-d: Handle non-page aligned address

2020-07-01 Thread Jacob Pan
From: Liu Yi L 

Address information for device TLB invalidation comes from userspace
when device is directly assigned to a guest with vIOMMU support.
VT-d requires page aligned address. This patch checks and enforce
address to be page aligned, otherwise reserved bits can be set in the
invalidation descriptor. Unrecoverable fault will be reported due to
non-zero value in the reserved bits.

Fixes: 61a06a16e36d8 ("iommu/vt-d: Support flushing more translation
cache types")
Acked-by: Lu Baolu 
Signed-off-by: Liu Yi L 
Signed-off-by: Jacob Pan 

Signed-off-by: Jacob Pan 
---
 drivers/iommu/intel/dmar.c | 20 ++--
 1 file changed, 18 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/intel/dmar.c b/drivers/iommu/intel/dmar.c
index d9f973fa1190..3899f3161071 100644
--- a/drivers/iommu/intel/dmar.c
+++ b/drivers/iommu/intel/dmar.c
@@ -1455,9 +1455,25 @@ void qi_flush_dev_iotlb_pasid(struct intel_iommu *iommu, 
u16 sid, u16 pfsid,
 * Max Invs Pending (MIP) is set to 0 for now until we have DIT in
 * ECAP.
 */
-   desc.qw1 |= addr & ~mask;
-   if (size_order)
+   if (addr & ~VTD_PAGE_MASK)
+   pr_warn_ratelimited("Invalidate non-page aligned address 
%llx\n", addr);
+
+   /* Take page address */
+   desc.qw1 |= QI_DEV_EIOTLB_ADDR(addr);
+
+   if (size_order) {
+   /*
+* Existing 0s in address below size_order may be the least
+* significant bit, we must set them to 1s to avoid having
+* smaller size than desired.
+*/
+   desc.qw1 |= GENMASK_ULL(size_order + VTD_PAGE_SHIFT,
+   VTD_PAGE_SHIFT);
+   /* Clear size_order bit to indicate size */
+   desc.qw1 &= ~mask;
+   /* Set the S bit to indicate flushing more than 1 page */
desc.qw1 |= QI_DEV_EIOTLB_SIZE;
+   }
 
qi_submit_sync(iommu, , 1, 0);
 }
-- 
2.7.4

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v3 6/7] iommu/vt-d: Warn on out-of-range invalidation address

2020-07-01 Thread Jacob Pan
For guest requested IOTLB invalidation, address and mask are provided as
part of the invalidation data. VT-d HW silently ignores any address bits
below the mask. SW shall also allow such case but give warning if
address does not align with the mask. This patch relax the fault
handling from error to warning and proceed with invalidation request
with the given mask.

Signed-off-by: Jacob Pan 
---
 drivers/iommu/intel/iommu.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 6a0c62c7395c..2e1b53ade784 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -5439,13 +5439,12 @@ intel_iommu_sva_invalidate(struct iommu_domain *domain, 
struct device *dev,
 
switch (BIT(cache_type)) {
case IOMMU_CACHE_INV_TYPE_IOTLB:
+   /* HW will ignore LSB bits based on address mask */
if (inv_info->granularity == IOMMU_INV_GRANU_ADDR &&
size &&
(inv_info->addr_info.addr & ((BIT(VTD_PAGE_SHIFT + 
size)) - 1))) {
-   pr_err_ratelimited("Address out of range, 
0x%llx, size order %llu\n",
-  inv_info->addr_info.addr, 
size);
-   ret = -ERANGE;
-   goto out_unlock;
+   pr_err_ratelimited("User address not aligned, 
0x%llx, size order %llu\n",
+ inv_info->addr_info.addr, size);
}
 
/*
-- 
2.7.4

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v3 3/7] iommu/vt-d: Fix PASID devTLB invalidation

2020-07-01 Thread Jacob Pan
DevTLB flush can be used for both DMA request with and without PASIDs.
The former uses PASID#0 (RID2PASID), latter uses non-zero PASID for SVA
usage.

This patch adds a check for PASID value such that devTLB flush with
PASID is used for SVA case. This is more efficient in that multiple
PASIDs can be used by a single device, when tearing down a PASID entry
we shall flush only the devTLB specific to a PASID.

Fixes: 6f7db75e1c46 ("iommu/vt-d: Add second level page table")
Signed-off-by: Jacob Pan 
---
 drivers/iommu/intel/pasid.c | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/intel/pasid.c b/drivers/iommu/intel/pasid.c
index c81f0f17c6ba..fa0154cce537 100644
--- a/drivers/iommu/intel/pasid.c
+++ b/drivers/iommu/intel/pasid.c
@@ -486,7 +486,16 @@ devtlb_invalidation_with_pasid(struct intel_iommu *iommu,
qdep = info->ats_qdep;
pfsid = info->pfsid;
 
-   qi_flush_dev_iotlb(iommu, sid, pfsid, qdep, 0, 64 - VTD_PAGE_SHIFT);
+   /*
+* When PASID 0 is used, it indicates RID2PASID(DMA request w/o PASID),
+* devTLB flush w/o PASID should be used. For non-zero PASID under
+* SVA usage, device could do DMA with multiple PASIDs. It is more
+* efficient to flush devTLB specific to the PASID.
+*/
+   if (pasid == PASID_RID2PASID)
+   qi_flush_dev_iotlb(iommu, sid, pfsid, qdep, 0, 64 - 
VTD_PAGE_SHIFT);
+   else
+   qi_flush_dev_iotlb_pasid(iommu, sid, pfsid, pasid, qdep, 0, 64 
- VTD_PAGE_SHIFT);
 }
 
 void intel_pasid_tear_down_entry(struct intel_iommu *iommu, struct device *dev,
-- 
2.7.4

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v3 0/7] iommu/vt-d: Misc tweaks and fixes for vSVA

2020-07-01 Thread Jacob Pan
Hi Baolu and all,

This is a series to address some of the issues we found in vSVA support.
Most of the patches deal with exception handling, we also removed some bits
that are not currently supported.

Many thanks to Kevin Tian's review.

Jacob & Yi


Changelog:
v3
- Use pr_err instead of WARN() for invalid user address range (6/7)
- Fix logic in PASID selective devTLB flush (3/7)

v2 Address reviews from Baolu
- Fixed addr field in devTLB flush (5/7)
- Assign address for single page devTLB invalidation (4/7)
- Coding style tweaks


*** SUBJECT HERE ***

*** BLURB HERE ***

Jacob Pan (4):
  iommu/vt-d: Remove global page support in devTLB flush
  iommu/vt-d: Fix PASID devTLB invalidation
  iommu/vt-d: Warn on out-of-range invalidation address
  iommu/vt-d: Disable multiple GPASID-dev bind

Liu Yi L (3):
  iommu/vt-d: Enforce PASID devTLB field mask
  iommu/vt-d: Handle non-page aligned address
  iommu/vt-d: Fix devTLB flush for vSVA

 drivers/iommu/intel/dmar.c  | 24 +++-
 drivers/iommu/intel/iommu.c | 37 ++---
 drivers/iommu/intel/pasid.c | 11 ++-
 drivers/iommu/intel/svm.c   | 22 +-
 include/linux/intel-iommu.h |  5 ++---
 5 files changed, 62 insertions(+), 37 deletions(-)

-- 
2.7.4

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v3 1/7] iommu/vt-d: Enforce PASID devTLB field mask

2020-07-01 Thread Jacob Pan
From: Liu Yi L 

Set proper masks to avoid invalid input spillover to reserved bits.

Acked-by: Lu Baolu 
Signed-off-by: Liu Yi L 
Signed-off-by: Jacob Pan 
---
 include/linux/intel-iommu.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index 4100bd224f5c..729386ca8122 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -380,8 +380,8 @@ enum {
 
 #define QI_DEV_EIOTLB_ADDR(a)  ((u64)(a) & VTD_PAGE_MASK)
 #define QI_DEV_EIOTLB_SIZE (((u64)1) << 11)
-#define QI_DEV_EIOTLB_GLOB(g)  ((u64)g)
-#define QI_DEV_EIOTLB_PASID(p) (((u64)p) << 32)
+#define QI_DEV_EIOTLB_GLOB(g)  ((u64)(g) & 0x1)
+#define QI_DEV_EIOTLB_PASID(p) ((u64)((p) & 0xf) << 32)
 #define QI_DEV_EIOTLB_SID(sid) ((u64)((sid) & 0x) << 16)
 #define QI_DEV_EIOTLB_QDEP(qd) ((u64)((qd) & 0x1f) << 4)
 #define QI_DEV_EIOTLB_PFSID(pfsid) (((u64)(pfsid & 0xf) << 12) | \
-- 
2.7.4

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v3 5/7] iommu/vt-d: Fix devTLB flush for vSVA

2020-07-01 Thread Jacob Pan
From: Liu Yi L 

For guest SVA usage, in order to optimize for less VMEXIT, guest request
of IOTLB flush also includes device TLB.

On the host side, IOMMU driver performs IOTLB and implicit devTLB
invalidation. When PASID-selective granularity is requested by the guest
we need to derive the equivalent address range for devTLB instead of
using the address information in the UAPI data. The reason for that is, unlike
IOTLB flush, devTLB flush does not support PASID-selective granularity.
This is to say, we need to set the following in the PASID based devTLB
invalidation descriptor:
- entire 64 bit range in address ~(0x1 << 63)
- S bit = 1 (VT-d CH 6.5.2.6).

Without this fix, device TLB flush range is not set properly for PASID
selective granularity. This patch also merged devTLB flush code for both
implicit and explicit cases.

Fixes: 6ee1b77ba3ac ("iommu/vt-d: Add svm/sva invalidate function")
Acked-by: Lu Baolu 
Signed-off-by: Liu Yi L 
Signed-off-by: Jacob Pan 
---
 drivers/iommu/intel/iommu.c | 28 ++--
 1 file changed, 18 insertions(+), 10 deletions(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 96340da57075..6a0c62c7395c 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -5408,7 +5408,7 @@ intel_iommu_sva_invalidate(struct iommu_domain *domain, 
struct device *dev,
sid = PCI_DEVID(bus, devfn);
 
/* Size is only valid in address selective invalidation */
-   if (inv_info->granularity != IOMMU_INV_GRANU_PASID)
+   if (inv_info->granularity == IOMMU_INV_GRANU_ADDR)
size = to_vtd_size(inv_info->addr_info.granule_size,
   inv_info->addr_info.nb_granules);
 
@@ -5417,6 +5417,7 @@ intel_iommu_sva_invalidate(struct iommu_domain *domain, 
struct device *dev,
 IOMMU_CACHE_INV_TYPE_NR) {
int granu = 0;
u64 pasid = 0;
+   u64 addr = 0;
 
granu = to_vtd_granularity(cache_type, inv_info->granularity);
if (granu == -EINVAL) {
@@ -5456,24 +5457,31 @@ intel_iommu_sva_invalidate(struct iommu_domain *domain, 
struct device *dev,
(granu == QI_GRAN_NONG_PASID) ? -1 : 1 
<< size,
inv_info->addr_info.flags & 
IOMMU_INV_ADDR_FLAGS_LEAF);
 
+   if (!info->ats_enabled)
+   break;
/*
 * Always flush device IOTLB if ATS is enabled. vIOMMU
 * in the guest may assume IOTLB flush is inclusive,
 * which is more efficient.
 */
-   if (info->ats_enabled)
-   qi_flush_dev_iotlb_pasid(iommu, sid,
-   info->pfsid, pasid,
-   info->ats_qdep,
-   inv_info->addr_info.addr,
-   size);
-   break;
+   fallthrough;
case IOMMU_CACHE_INV_TYPE_DEV_IOTLB:
+   /*
+* There is no PASID selective flush for device TLB, so
+* the equivalent of that is we set the size to be the
+* entire range of 64 bit. User only provides PASID info
+* without address info. So we set addr to 0.
+*/
+   if (inv_info->granularity == IOMMU_INV_GRANU_PASID) {
+   size = 64 - VTD_PAGE_SHIFT;
+   addr = 0;
+   } else if (inv_info->granularity == 
IOMMU_INV_GRANU_ADDR)
+   addr = inv_info->addr_info.addr;
+
if (info->ats_enabled)
qi_flush_dev_iotlb_pasid(iommu, sid,
info->pfsid, pasid,
-   info->ats_qdep,
-   inv_info->addr_info.addr,
+   info->ats_qdep, addr,
size);
else
pr_warn_ratelimited("Passdown device IOTLB 
flush w/o ATS!\n");
-- 
2.7.4

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v3 2/7] iommu/vt-d: Remove global page support in devTLB flush

2020-07-01 Thread Jacob Pan
Global pages support is removed from VT-d spec 3.0 for dev TLB
invalidation. This patch is to remove the bits for vSVA. Similar change
already made for the native SVA. See the link below.

Link: https://lkml.org/lkml/2019/8/26/651
Acked-by: Lu Baolu 
Signed-off-by: Jacob Pan 
---
 drivers/iommu/intel/dmar.c  | 4 +---
 drivers/iommu/intel/iommu.c | 4 ++--
 include/linux/intel-iommu.h | 3 +--
 3 files changed, 4 insertions(+), 7 deletions(-)

diff --git a/drivers/iommu/intel/dmar.c b/drivers/iommu/intel/dmar.c
index cc46dff98fa0..d9f973fa1190 100644
--- a/drivers/iommu/intel/dmar.c
+++ b/drivers/iommu/intel/dmar.c
@@ -1437,8 +1437,7 @@ void qi_flush_piotlb(struct intel_iommu *iommu, u16 did, 
u32 pasid, u64 addr,
 
 /* PASID-based device IOTLB Invalidate */
 void qi_flush_dev_iotlb_pasid(struct intel_iommu *iommu, u16 sid, u16 pfsid,
- u32 pasid,  u16 qdep, u64 addr,
- unsigned int size_order, u64 granu)
+ u32 pasid,  u16 qdep, u64 addr, unsigned int 
size_order)
 {
unsigned long mask = 1UL << (VTD_PAGE_SHIFT + size_order - 1);
struct qi_desc desc = {.qw1 = 0, .qw2 = 0, .qw3 = 0};
@@ -1446,7 +1445,6 @@ void qi_flush_dev_iotlb_pasid(struct intel_iommu *iommu, 
u16 sid, u16 pfsid,
desc.qw0 = QI_DEV_EIOTLB_PASID(pasid) | QI_DEV_EIOTLB_SID(sid) |
QI_DEV_EIOTLB_QDEP(qdep) | QI_DEIOTLB_TYPE |
QI_DEV_IOTLB_PFSID(pfsid);
-   desc.qw1 = QI_DEV_EIOTLB_GLOB(granu);
 
/*
 * If S bit is 0, we only flush a single page. If S bit is set,
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 9129663a7406..96340da57075 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -5466,7 +5466,7 @@ intel_iommu_sva_invalidate(struct iommu_domain *domain, 
struct device *dev,
info->pfsid, pasid,
info->ats_qdep,
inv_info->addr_info.addr,
-   size, granu);
+   size);
break;
case IOMMU_CACHE_INV_TYPE_DEV_IOTLB:
if (info->ats_enabled)
@@ -5474,7 +5474,7 @@ intel_iommu_sva_invalidate(struct iommu_domain *domain, 
struct device *dev,
info->pfsid, pasid,
info->ats_qdep,
inv_info->addr_info.addr,
-   size, granu);
+   size);
else
pr_warn_ratelimited("Passdown device IOTLB 
flush w/o ATS!\n");
break;
diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index 729386ca8122..9a6614880773 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -380,7 +380,6 @@ enum {
 
 #define QI_DEV_EIOTLB_ADDR(a)  ((u64)(a) & VTD_PAGE_MASK)
 #define QI_DEV_EIOTLB_SIZE (((u64)1) << 11)
-#define QI_DEV_EIOTLB_GLOB(g)  ((u64)(g) & 0x1)
 #define QI_DEV_EIOTLB_PASID(p) ((u64)((p) & 0xf) << 32)
 #define QI_DEV_EIOTLB_SID(sid) ((u64)((sid) & 0x) << 16)
 #define QI_DEV_EIOTLB_QDEP(qd) ((u64)((qd) & 0x1f) << 4)
@@ -704,7 +703,7 @@ void qi_flush_piotlb(struct intel_iommu *iommu, u16 did, 
u32 pasid, u64 addr,
 
 void qi_flush_dev_iotlb_pasid(struct intel_iommu *iommu, u16 sid, u16 pfsid,
  u32 pasid, u16 qdep, u64 addr,
- unsigned int size_order, u64 granu);
+ unsigned int size_order);
 void qi_flush_pasid_cache(struct intel_iommu *iommu, u16 did, u64 granu,
  int pasid);
 
-- 
2.7.4

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v3 7/7] iommu/vt-d: Disable multiple GPASID-dev bind

2020-07-01 Thread Jacob Pan
For the unlikely use case where multiple aux domains from the same pdev
are attached to a single guest and then bound to a single process
(thus same PASID) within that guest, we cannot easily support this case
by refcounting the number of users. As there is only one SL page table
per PASID while we have multiple aux domains thus multiple SL page tables
for the same PASID.

Extra unbinding guest PASID can happen due to race between normal and
exception cases. Termination of one aux domain may affect others unless
we actively track and switch aux domains to ensure the validity of SL
page tables and TLB states in the shared PASID entry.

Support for sharing second level PGDs across domains can reduce the
complexity but this is not available due to the limitations on VFIO
container architecture. We can revisit this decision once sharing PGDs
are available.

Overall, the complexity and potential glitch do not warrant this unlikely
use case thereby removed by this patch.

Fixes: 56722a4398a30 ("iommu/vt-d: Add bind guest PASID support")
Acked-by: Lu Baolu 
Cc: Kevin Tian 
Cc: Lu Baolu 
Signed-off-by: Liu Yi L 
Signed-off-by: Jacob Pan 
---
 drivers/iommu/intel/svm.c | 22 +-
 1 file changed, 9 insertions(+), 13 deletions(-)

diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
index 6c87c807a0ab..d386853121a2 100644
--- a/drivers/iommu/intel/svm.c
+++ b/drivers/iommu/intel/svm.c
@@ -277,20 +277,16 @@ int intel_svm_bind_gpasid(struct iommu_domain *domain, 
struct device *dev,
goto out;
}
 
+   /*
+* Do not allow multiple bindings of the same device-PASID since
+* there is only one SL page tables per PASID. We may revisit
+* once sharing PGD across domains are supported.
+*/
for_each_svm_dev(sdev, svm, dev) {
-   /*
-* For devices with aux domains, we should allow
-* multiple bind calls with the same PASID and pdev.
-*/
-   if (iommu_dev_feature_enabled(dev,
- IOMMU_DEV_FEAT_AUX)) {
-   sdev->users++;
-   } else {
-   dev_warn_ratelimited(dev,
-"Already bound with PASID 
%u\n",
-svm->pasid);
-   ret = -EBUSY;
-   }
+   dev_warn_ratelimited(dev,
+"Already bound with PASID %u\n",
+svm->pasid);
+   ret = -EBUSY;
goto out;
}
} else {
-- 
2.7.4

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v5 07/10] iommu/mediatek: Add REG_MMU_WR_LEN register definition

2020-07-01 Thread Matthias Brugger



On 30/06/2020 12:59, chao hao wrote:
> On Mon, 2020-06-29 at 12:16 +0200, Matthias Brugger wrote:
>>
>> On 29/06/2020 09:13, Chao Hao wrote:
>>> Some platforms(ex: mt6779) need to improve performance by setting
>>> REG_MMU_WR_LEN register. And we can use WR_THROT_EN macro to control
>>> whether we need to set the register. If the register uses default value,
>>> iommu will send command to EMI without restriction, when the number of
>>> commands become more and more, it will drop the EMI performance. So when
>>> more than ten_commands(default value) don't be handled for EMI, iommu will
>>> stop send command to EMI for keeping EMI's performace by enabling write
>>> throttling mechanism(bit[5][21]=0) in MMU_WR_LEN_CTRL register.
>>>
>>> Cc: Matthias Brugger 
>>> Signed-off-by: Chao Hao 
>>> ---
>>>  drivers/iommu/mtk_iommu.c | 10 ++
>>>  drivers/iommu/mtk_iommu.h |  2 ++
>>>  2 files changed, 12 insertions(+)
>>>
>>> diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
>>> index ec1f86913739..92316c4175a9 100644
>>> --- a/drivers/iommu/mtk_iommu.c
>>> +++ b/drivers/iommu/mtk_iommu.c
>>> @@ -46,6 +46,8 @@
>>>  #define F_MMU_STANDARD_AXI_MODE_BIT(BIT(3) | BIT(19))
>>>  
>>>  #define REG_MMU_DCM_DIS0x050
>>> +#define REG_MMU_WR_LEN 0x054
>>
>> The register name is confusing. For me it seems to describe the length of a
>> write but it is used for controlling the write throttling. Is this the name
>> that's used in the datasheet?
>>
> 
> Thanks for your review carefully, we can name it to REG_MMU_WR_LEN_CTRL

Yes, that's mbetter, thanks.

> 
> 
>>> +#define F_MMU_WR_THROT_DIS_BIT (BIT(5) |  BIT(21))
>>
>> There are two spaces between '|' and 'BIT(21)', should be one.
>>
>> Regarding the name of the define, what does the 'F_' statnds for? 
> 
> F_ is used to described some bits in register and doesn't have other
> meanings. The format is refer to other bits definition
> 
>> Also I think
>> it should be called '_MASK' instead of '_BIT' as it defines a mask of bits.
>>
> 
> Thanks for your advice.
> For F_MMU_WR_THROT_DIS_BIT:
> 1'b0: Enable write throttling mechanism
> 1'b1: Disable write throttling mechanism
> So I think we can name "F_MMU_WR_THROT_DIS  BIT(5) | BIT(21)" directly,
> it maybe more clearer.
> 

Is this what the datasheet names it? If not then I'd prefer 
F_MMU_WR_THROT_DIS_MASK.

Regards,
Matthias

>> Regards,
>> Matthias
>>
>>>  
>>>  #define REG_MMU_CTRL_REG   0x110
>>>  #define F_MMU_TF_PROT_TO_PROGRAM_ADDR  (2 << 4)
>>> @@ -582,6 +584,12 @@ static int mtk_iommu_hw_init(const struct 
>>> mtk_iommu_data *data)
>>> writel_relaxed(regval, data->base + REG_MMU_VLD_PA_RNG);
>>> }
>>> writel_relaxed(0, data->base + REG_MMU_DCM_DIS);
>>> +   if (MTK_IOMMU_HAS_FLAG(data->plat_data, WR_THROT_EN)) {
>>> +   /* write command throttling mode */
>>> +   regval = readl_relaxed(data->base + REG_MMU_WR_LEN);
>>> +   regval &= ~F_MMU_WR_THROT_DIS_BIT;
>>> +   writel_relaxed(regval, data->base + REG_MMU_WR_LEN);
>>> +   }
>>>  
>>> regval = readl_relaxed(data->base + REG_MMU_MISC_CTRL);
>>> if (MTK_IOMMU_HAS_FLAG(data->plat_data, RESET_AXI)) {
>>> @@ -737,6 +745,7 @@ static int __maybe_unused mtk_iommu_suspend(struct 
>>> device *dev)
>>> struct mtk_iommu_suspend_reg *reg = >reg;
>>> void __iomem *base = data->base;
>>>  
>>> +   reg->wr_len = readl_relaxed(base + REG_MMU_WR_LEN);
>>> reg->misc_ctrl = readl_relaxed(base + REG_MMU_MISC_CTRL);
>>> reg->dcm_dis = readl_relaxed(base + REG_MMU_DCM_DIS);
>>> reg->ctrl_reg = readl_relaxed(base + REG_MMU_CTRL_REG);
>>> @@ -761,6 +770,7 @@ static int __maybe_unused mtk_iommu_resume(struct 
>>> device *dev)
>>> dev_err(data->dev, "Failed to enable clk(%d) in resume\n", ret);
>>> return ret;
>>> }
>>> +   writel_relaxed(reg->wr_len, base + REG_MMU_WR_LEN);
>>> writel_relaxed(reg->misc_ctrl, base + REG_MMU_MISC_CTRL);
>>> writel_relaxed(reg->dcm_dis, base + REG_MMU_DCM_DIS);
>>> writel_relaxed(reg->ctrl_reg, base + REG_MMU_CTRL_REG);
>>> diff --git a/drivers/iommu/mtk_iommu.h b/drivers/iommu/mtk_iommu.h
>>> index be6d32ee5bda..ce4f4e8f03aa 100644
>>> --- a/drivers/iommu/mtk_iommu.h
>>> +++ b/drivers/iommu/mtk_iommu.h
>>> @@ -24,6 +24,7 @@
>>>  #define RESET_AXI  BIT(3)
>>>  #define OUT_ORDER_EN   BIT(4)
>>>  #define HAS_SUB_COMM   BIT(5)
>>> +#define WR_THROT_ENBIT(6)
>>>  
>>>  #define MTK_IOMMU_HAS_FLAG(pdata, _x) \
>>> pdata)->flags) & (_x)) == (_x))
>>> @@ -36,6 +37,7 @@ struct mtk_iommu_suspend_reg {
>>> u32 int_main_control;
>>> u32 ivrp_paddr;
>>> u32 vld_pa_rng;
>>> +   u32 wr_len;
>>>  };

Re: [PATCH v5 04/10] iommu/mediatek: Setting MISC_CTRL register

2020-07-01 Thread Matthias Brugger



On 30/06/2020 12:53, chao hao wrote:
> On Mon, 2020-06-29 at 11:28 +0200, Matthias Brugger wrote:
>>
>> On 29/06/2020 09:13, Chao Hao wrote:
>>> Add F_MMU_IN_ORDER_WR_EN and F_MMU_STANDARD_AXI_MODE_BIT definition
>>> in MISC_CTRL register.
>>> F_MMU_STANDARD_AXI_MODE_BIT:
>>>   If we set F_MMU_STANDARD_AXI_MODE_BIT(bit[3][19] = 0, not follow
>>> standard AXI protocol), iommu will send urgent read command firstly
>>> compare with normal read command to improve performance.
>>
>> Can you please help me to understand the phrase. Sorry I'm not a AXI 
>> specialist.
>> Does this mean that you will send a 'urgent read command' which is not 
>> described
>> in the specifications instead of a normal read command?
> 
> ok.
> iommu sends read command to next bus_node normally(we can name it to
> cmd1), when cmd1 isn't handled by next bus_node, iommu has a urgent read
> command is needed to be sent(we can name it to cmd2), iommu will send
> cmd2 and replace cmd1. So cmd2 is handled by next bus_node firstly and
> cmd2 will be handled secondly.
> But for standard AXI protocol, it will ignore the priority of read
> command and only be handled in order. So cmd2 is handled by next
> bus_node after cmd1 is done.
> 

Thanks. So I propose change this part of the commit message to something like:
F_MMU_STANDARD_AXI_MODE_BIT:
If we set F_MMU_STANDARD_AXI_MODE_EN_MASK (bit[3][19] = 0, not follow standard
AXI protocol), the iommu will priorize sending of urgent read command over a
normal read command. This improves the performance.

>>
>>> F_MMU_IN_ORDER_WR_EN:
>>>   If we set F_MMU_IN_ORDER_WR_EN(bit[1][17] = 0, out-of-order write), iommu
>>> will re-order write command and send more higher priority write command
>>> instead of sending write command in order. The feature be controlled
>>> by OUT_ORDER_EN macro definition.

F_MMU_IN_ORDER_WR_EN:
If we set F_MMU_IN_ORDER_WR_EN_MASK (bit[1][17] = 0, out-of-order write), the
iommu will re-order write commands and send the write command with higher
priority. Otherwise the sending of write commands will be done in order. The
feature is controlled by OUT_ORDER_WR_EN platform data flag.


>>>
>>> Cc: Matthias Brugger 
>>> Suggested-by: Yong Wu 
>>> Signed-off-by: Chao Hao 
>>> ---
>>>  drivers/iommu/mtk_iommu.c | 12 +++-
>>>  drivers/iommu/mtk_iommu.h |  1 +
>>>  2 files changed, 12 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
>>> index 8f81df6cbe51..67b46b5d83d9 100644
>>> --- a/drivers/iommu/mtk_iommu.c
>>> +++ b/drivers/iommu/mtk_iommu.c
>>> @@ -42,6 +42,9 @@
>>>  #define F_INVLD_EN1BIT(1)
>>>  
>>>  #define REG_MMU_MISC_CTRL  0x048
>>> +#define F_MMU_IN_ORDER_WR_EN   (BIT(1) | BIT(17))
>>> +#define F_MMU_STANDARD_AXI_MODE_BIT(BIT(3) | BIT(19))
>>
>> Wouldn't it make more sense to name it F_MMU_STANDARD_AXI_MODE_EN?
> ok, you are right.
> 1'b1: follow standard axi protocol
> 

What about
F_MMU_IN_ORDER_WR_EN_MASK
F_MMU_STANDARD_AXI_MODE_EN_MASK

Background is that we have to set/unset two bits to enable or disable the
feature, so it's a mask we have to apply to the register.

Regards,
Matthias

>>
>>> +
>>>  #define REG_MMU_DCM_DIS0x050
>>>  
>>>  #define REG_MMU_CTRL_REG   0x110
>>> @@ -574,10 +577,17 @@ static int mtk_iommu_hw_init(const struct 
>>> mtk_iommu_data *data)
>>> }
>>> writel_relaxed(0, data->base + REG_MMU_DCM_DIS);
>>>  
>>> +   regval = readl_relaxed(data->base + REG_MMU_MISC_CTRL);
>>
>> We only need to read regval in the else branch.
> 
> ok, I got it. thanks
> 
>>
>>> if (MTK_IOMMU_HAS_FLAG(data->plat_data, RESET_AXI)) {
>>> /* The register is called STANDARD_AXI_MODE in this case */
>>> -   writel_relaxed(0, data->base + REG_MMU_MISC_CTRL);
>>> +   regval = 0;
>>> +   } else {
>>> +   /* For mm_iommu, it can improve performance by the setting */
>>> +   regval &= ~F_MMU_STANDARD_AXI_MODE_BIT;
>>> +   if (MTK_IOMMU_HAS_FLAG(data->plat_data, OUT_ORDER_EN))
>>> +   regval &= ~F_MMU_IN_ORDER_WR_EN;
>>> }
>>> +   writel_relaxed(regval, data->base + REG_MMU_MISC_CTRL);
>>>  
>>> if (devm_request_irq(data->dev, data->irq, mtk_iommu_isr, 0,
>>>  dev_name(data->dev), (void *)data)) {
>>> diff --git a/drivers/iommu/mtk_iommu.h b/drivers/iommu/mtk_iommu.h
>>> index 7cc39f729263..4b780b651ef4 100644
>>> --- a/drivers/iommu/mtk_iommu.h
>>> +++ b/drivers/iommu/mtk_iommu.h
>>> @@ -22,6 +22,7 @@
>>>  #define HAS_BCLK   BIT(1)
>>>  #define HAS_VLD_PA_RNG BIT(2)
>>>  #define RESET_AXI  BIT(3)
>>> +#define OUT_ORDER_EN   BIT(4)
>>
>> Maybe something like OUT_ORDER_WR_EN, to make clear that it's about the the
>> write path.
>>
> ok, thanks for your advice.
> 
>>>  
>>>  #define MTK_IOMMU_HAS_FLAG(pdata, 

Re: [PATCH v2 3/7] iommu/vt-d: Fix PASID devTLB invalidation

2020-07-01 Thread Jacob Pan
On Wed, 1 Jul 2020 08:49:54 +0800
Lu Baolu  wrote:

> Hi Jacob,
> 
> On 7/1/20 5:07 AM, Jacob Pan wrote:
> > DevTLB flush can be used for both DMA request with and without
> > PASIDs. The former uses PASID#0 (RID2PASID), latter uses non-zero
> > PASID for SVA usage.
> > 
> > This patch adds a check for PASID value such that devTLB flush with
> > PASID is used for SVA case. This is more efficient in that multiple
> > PASIDs can be used by a single device, when tearing down a PASID
> > entry we shall flush only the devTLB specific to a PASID.
> > 
> > Fixes: 6f7db75e1c46 ("iommu/vt-d: Add second level page table")
> > Signed-off-by: Jacob Pan 
> > ---
> >   drivers/iommu/intel/pasid.c | 11 ++-
> >   1 file changed, 10 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/iommu/intel/pasid.c
> > b/drivers/iommu/intel/pasid.c index c81f0f17c6ba..70d21209dd04
> > 100644 --- a/drivers/iommu/intel/pasid.c
> > +++ b/drivers/iommu/intel/pasid.c
> > @@ -486,7 +486,16 @@ devtlb_invalidation_with_pasid(struct
> > intel_iommu *iommu, qdep = info->ats_qdep;
> > pfsid = info->pfsid;
> >   
> > -   qi_flush_dev_iotlb(iommu, sid, pfsid, qdep, 0, 64 -
> > VTD_PAGE_SHIFT);
> > +   /*
> > +* When PASID 0 is used, it indicates RID2PASID(DMA
> > request w/o PASID),
> > +* devTLB flush w/o PASID should be used. For non-zero
> > PASID under
> > +* SVA usage, device could do DMA with multiple PASIDs. It
> > is more
> > +* efficient to flush devTLB specific to the PASID.
> > +*/
> > +   if (pasid == PASID_RID2PASID)
> > +   qi_flush_dev_iotlb_pasid(iommu, sid, pfsid, pasid,
> > qdep, 0, 64 - VTD_PAGE_SHIFT);
> > +   else
> > +   qi_flush_dev_iotlb(iommu, sid, pfsid, qdep, 0, 64
> > - VTD_PAGE_SHIFT);  
> 
> The if/else logic is reversed.
> 
>   if (pasid == PASID_RID2PASID)
>   qi_flush_dev_iotlb(iommu, sid, pfsid, qdep, 0, 64 -
> VTD_PAGE_SHIFT); else
>   qi_flush_dev_iotlb_pasid(iommu, sid, pfsid, pasid,
> qdep, 0, 64 - VTD_PAGE_SHIFT);
> 
indeed, will fix. thanks

> Best regards,
> baolu
> 
> >   }
> >   
> >   void intel_pasid_tear_down_entry(struct intel_iommu *iommu,
> > struct device *dev, 

[Jacob Pan]
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 6/7] iommu/vt-d: Warn on out-of-range invalidation address

2020-07-01 Thread Jacob Pan
On Wed, 1 Jul 2020 09:45:40 +0800
Lu Baolu  wrote:

> Hi Jacob,
> 
> On 7/1/20 1:34 AM, Jacob Pan wrote:
> > On Thu, 25 Jun 2020 18:10:43 +0800
> > Lu Baolu  wrote:
> >   
> >> Hi,
> >>
> >> On 2020/6/23 23:43, Jacob Pan wrote:  
> >>> For guest requested IOTLB invalidation, address and mask are
> >>> provided as part of the invalidation data. VT-d HW silently
> >>> ignores any address bits below the mask. SW shall also allow such
> >>> case but give warning if address does not align with the mask.
> >>> This patch relax the fault handling from error to warning and
> >>> proceed with invalidation request with the given mask.
> >>>
> >>> Signed-off-by: Jacob Pan
> >>> ---
> >>>drivers/iommu/intel/iommu.c | 7 +++
> >>>1 file changed, 3 insertions(+), 4 deletions(-)
> >>>
> >>> diff --git a/drivers/iommu/intel/iommu.c
> >>> b/drivers/iommu/intel/iommu.c index 5ea5732d5ec4..50fc62413a35
> >>> 100644 --- a/drivers/iommu/intel/iommu.c
> >>> +++ b/drivers/iommu/intel/iommu.c
> >>> @@ -5439,13 +5439,12 @@ intel_iommu_sva_invalidate(struct
> >>> iommu_domain *domain, struct device *dev,
> >>>   switch (BIT(cache_type)) {
> >>>   case IOMMU_CACHE_INV_TYPE_IOTLB:
> >>> + /* HW will ignore LSB bits based on
> >>> address mask */ if (inv_info->granularity == IOMMU_INV_GRANU_ADDR
> >>> && size &&
> >>>   (inv_info->addr_info.addr &
> >>> ((BIT(VTD_PAGE_SHIFT + size)) - 1))) {
> >>> - pr_err_ratelimited("Address out
> >>> of range, 0x%llx, size order %llu\n",
> >>> -
> >>> inv_info->addr_info.addr, size);
> >>> - ret = -ERANGE;
> >>> - goto out_unlock;
> >>> + WARN_ONCE(1, "Address out of
> >>> range, 0x%llx, size order %llu\n",
> >>> +
> >>> inv_info->addr_info.addr, size);  
> >> I don't think WARN_ONCE() is suitable here. It makes users think
> >> it's a kernel bug. How about pr_warn_ratelimited()?
> >>  
> > I think pr_warn_ratelimited might still be too chatty. There is no
> > functional issues, we just don't to silently ignore it. Perhaps just
> > say:
> > WARN_ONCE(1, "User provided address not page aligned, alignment
> > forced") ?
> >   
> 
> WARN() is normally used for reporting a kernel bug. It dumps kernel
> trace. And the users will report bug through bugzilla.kernel.org.
> 
> In this case, it's actually an unexpected user input, we shouldn't
> treat it as a kernel bug and pr_err_ratelimited() is enough?
> 
Sounds good. I will leave it.

Thanks,

Jacob
> Best regards,
> baolu

[Jacob Pan]
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH] xen: introduce xen_vring_use_dma

2020-07-01 Thread Christoph Hellwig
On Mon, Jun 29, 2020 at 04:46:09PM -0700, Stefano Stabellini wrote:
> > I could imagine some future Xen hosts setting a flag somewhere in the
> > platform capability saying "no xen specific flag, rely on
> > "VIRTIO_F_ACCESS_PLATFORM". Then you set that accordingly in QEMU.
> > How about that?
> 
> Yes, that would be fine and there is no problem implementing something
> like that when we get virtio support in Xen. Today there are still no
> virtio interfaces provided by Xen to ARM guests (no virtio-block/net,
> etc.)
> 
> In fact, in both cases we are discussing virtio is *not* provided by
> Xen; it is a firmware interface to something entirely different:
> 
> 1) virtio is used to talk to a remote AMP processor (RPMesg)
> 2) virtio is used to talk to a secure-world firmware/OS (Trusty)
>
> VIRTIO_F_ACCESS_PLATFORM is not set by Xen in these cases but by RPMesg
> and by Trusty respectively. I don't know if Trusty should or should not
> set VIRTIO_F_ACCESS_PLATFORM, but I think Linux should still work
> without issues.
> 

Any virtio implementation that is not in control of the memory map
(aka not the hypervisor) absolutely must set VIRTIO_F_ACCESS_PLATFORM,
else it is completely broken.

> The xen_domain() check in Linux makes it so that vring_use_dma_api
> returns the opposite value on native Linux compared to Linux as Xen/ARM
> DomU by "accident". By "accident" because there is no architectural
> reason why Linux Xen/ARM DomU should behave differently compared to
> native Linux in this regard.
> 
> I hope that now it is clearer why I think the if (xen_domain()) check
> needs to be improved anyway, even if we fix generic dma_ops with virtio
> interfaces missing VIRTIO_F_ACCESS_PLATFORM.

IMHO that Xen quirk should never have been added in this form..
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 1/2] iommu: Add iommu_group_get/set_domain()

2020-07-01 Thread Robin Murphy

On 2020-07-01 08:32, Lu Baolu wrote:

Hi Robin,

On 2020/7/1 0:51, Robin Murphy wrote:

On 2020-06-30 02:03, Lu Baolu wrote:

Hi Robin,

On 6/29/20 7:56 PM, Robin Murphy wrote:

On 2020-06-27 04:15, Lu Baolu wrote:

The hardware assistant vfio mediated device is a use case of iommu
aux-domain. The interactions between vfio/mdev and iommu during mdev
creation and passthr are:

- Create a group for mdev with iommu_group_alloc();
- Add the device to the group with
 group = iommu_group_alloc();
 if (IS_ERR(group))
 return PTR_ERR(group);

 ret = iommu_group_add_device(group, >dev);
 if (!ret)
 dev_info(>dev, "MDEV: group_id = %d\n",
  iommu_group_id(group));
- Allocate an aux-domain
iommu_domain_alloc()
- Attach the aux-domain to the physical device from which the mdev is
   created.
iommu_aux_attach_device()

In the whole process, an iommu group was allocated for the mdev and an
iommu domain was attached to the group, but the group->domain leaves
NULL. As the result, iommu_get_domain_for_dev() doesn't work anymore.

This adds iommu_group_get/set_domain() so that group->domain could be
managed whenever a domain is attached or detached through the 
aux-domain

api's.


Letting external callers poke around directly in the internals of 
iommu_group doesn't look right to me.


Unfortunately, it seems that the vifo iommu abstraction is deeply bound
to the IOMMU subsystem. We can easily find other examples:

iommu_group_get/set_iommudata()
iommu_group_get/set_name()
...


Sure, but those are ways for users of a group to attach useful 
information of their own to it, that doesn't matter to the IOMMU 
subsystem itself. The interface you've proposed gives callers rich new 
opportunities to fundamentally break correct operation of the API:


 dom = iommu_domain_alloc();
 iommu_attach_group(dom, grp);
 ...
 iommu_group_set_domain(grp, NULL);
 // oops, leaked and can't ever detach properly now

or perhaps:

 grp = iommu_group_alloc();
 iommu_group_add_device(grp, dev);
 iommu_group_set_domain(grp, dom);
 ...
 iommu_detach_group(dom, grp);
 // oops, IOMMU driver might not handle this

If a regular device is attached to one or more aux domains for PASID 
use, iommu_get_domain_for_dev() is still going to return the primary 
domain, so why should it be expected to behave differently for mediated


Unlike the normal device attach, we will encounter two devices when it
comes to aux-domain.

- Parent physical device - this might be, for example, a PCIe device
with PASID feature support, hence it is able to tag an unique PASID
for DMA transfers originated from its subset. The device driver hence
is able to wrapper this subset into an isolated:

- Mediated device - a fake device created by the device driver mentioned
above.

Yes. All you mentioned are right for the parent device. But for mediated
device, iommu_get_domain_for_dev() doesn't work even it has an valid
iommu_group and iommu_domain.

iommu_get_domain_for_dev() is a necessary interface for device drivers
which want to support aux-domain. For example,


Only if they want to follow this very specific notion of using made-up 
devices and groups to represent aux attachments. Even if a driver 
managing its own aux domains entirely privately does create child 
devices for them, it's not like it can't keep its domain pointers in 
drvdata if it wants to ;)


Let's not conflate the current implementation of vfio_mdev with the 
general concepts involved here.



   struct iommu_domain *domain;
   struct device *dev = mdev_dev(mdev);
   unsigned long pasid;

   domain = iommu_get_domain_for_dev(dev);
   if (!domain)
   return -ENODEV;

   pasid = iommu_aux_get_pasid(domain, dev->parent);
   if (pasid == IOASID_INVALID)
   return -EINVAL;

   /* Program the device context with the PASID value */
   

Without this fix, iommu_get_domain_for_dev() always returns NULL and the
device driver has no means to support aux-domain.


So either the IOMMU API itself is missing the ability to do the right 
thing internally, or the mdev layer isn't using it appropriately. 
Either way, simply punching holes in the API for mdev to hack around 
its own mess doesn't seem like the best thing to do.


The initial impression I got was that it's implicitly assumed here 
that the mdev itself is attached to exactly one aux domain and nothing 
else, at which point I would wonder why it's using aux at all, but are 
you saying that in fact no attach happens with the mdev group either 
way, only to the parent device?


I'll admit I'm not hugely familiar with any of this, but it seems to 
me that the logical flow should be:


 - allocate domain
 - attach as aux to parent
 - retrieve aux domain PASID
 - create mdev child based on PASID
 - attach mdev to domain (normally)


Re: [RFC 0/2] iommu: arm-smmu: Add support for early direct mappings

2020-07-01 Thread Will Deacon
On Wed, Jul 01, 2020 at 12:40:50AM -0700, Bjorn Andersson wrote:
> On Wed 03 Jun 04:00 PDT 2020, Robin Murphy wrote:
> > at that point I'm inclined to suggest we give up and stop trying to
> > drive these things with arm-smmu. The XZR thing was bad enough, but if
> > they're not even going to pretend to implement the architecture correctly
> > then I'm not massively keen to continue tying the architectural driver in
> > further knots if innocent things like CONFIG_IOMMU_DEFAULT_PASSTHROUGH are
> > going to unexpectedly and catastrophically fail. We have qcom-iommu for
> > hypervisor-mediated SMMUs, and this new hypervisor behaviour sounds to me
> > more like "qcom-iommu++" with reassignable stream-to-context mappings,
> > rather than a proper Arm SMMU emulation.
> > 
> 
> I've been going through over and over, hoping to perhaps be able to
> evolve qcom_iommu into a qcom-iommu++, but afaict the new hypervisor is
> different enough that this isn't feasible. In particular, the platforms
> using qcom_iommu relies entirely on the hypervisor to configure stream
> mapping etc - and we can't even read most of the registers.
> 
> On the other hand I agree with you that we're messing around quite a bit
> with the arm-smmu driver, and I'm uncertain where we are on supporting
> the various GPU features, so I'm adding Jordan to the thread.
> 
> So, afaict we have the options of either shoehorning this too into the
> arm-smmu driver or we essentially fork arm-smmu.c to create a
> qcom-smmu.c.
> 
> While I don't fancy the code duplication, it would allow us to revert
> the Qualcomm quirks from arm-smmu and would unblock a number of
> activities that we have depending on getting the SMMU enabled on various
> platforms.

We added the impl hooks to cater for implementation differences, so I'd
still prefer to see this done as part of arm-smmu than introduce another
almost-the-same-but-not-quite IOMMU driver that has the lifetime of
a single SoC.

Will
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v3 00/34] iommu: Move iommu_group setup to IOMMU core code

2020-07-01 Thread Robin Murphy

On 2020-07-01 01:40, Qian Cai wrote:

Looks like this patchset introduced an use-after-free on arm-smmu-v3.

Reproduced using mlx5,

# echo 1 > /sys/class/net/enp11s0f1np1/device/sriov_numvfs
# echo 0 > /sys/class/net/enp11s0f1np1/device/sriov_numvfs

The .config,
https://github.com/cailca/linux-mm/blob/master/arm64.config

Looking at the free stack,

iommu_release_device->iommu_group_remove_device

was introduced in 07/34 ("iommu: Add probe_device() and release_device()
call-backs").


Right, iommu_group_remove_device can tear down the group and call 
->domain_free before the driver has any knowledge of the last device 
going away via the ->release_device call.


I guess the question is do we simply flip the call order in 
iommu_release_device() so drivers can easily clean up their internal 
per-device state first, or do we now want them to be robust against 
freeing domains with devices still nominally attached?


Robin.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH net] xsk: remove cheap_dma optimization

2020-07-01 Thread Björn Töpel

On 2020-06-29 17:41, Robin Murphy wrote:

On 2020-06-28 18:16, Björn Töpel wrote:

[...]>

Somewhat related to the DMA API; It would have performance benefits for
AF_XDP if the DMA range of the mapped memory was linear, i.e. by IOMMU
utilization. I've started hacking a thing a little bit, but it would be
nice if such API was part of the mapping core.

Input: array of pages Output: array of dma addrs (and obviously dev,
flags and such)

For non-IOMMU len(array of pages) == len(array of dma addrs)
For best-case IOMMU len(array of dma addrs) == 1 (large linear space)

But that's for later. :-)


FWIW you will typically get that behaviour from IOMMU-based 
implementations of dma_map_sg() right now, although it's not strictly 
guaranteed. If you can weather some additional setup cost of calling 
sg_alloc_table_from_pages() plus walking the list after mapping to test 
whether you did get a contiguous result, you could start taking 
advantage of it as some of the dma-buf code in DRM and v4l2 does already 
(although those cases actually treat it as a strict dependency rather 
than an optimisation).


I'm inclined to agree that if we're going to see more of these cases, a 
new API call that did formally guarantee a DMA-contiguous mapping 
(either via IOMMU or bounce buffering) or failure might indeed be handy.




I forgot to reply to this one! My current hack is using the iommu code 
directly, similar to what vfio-pci does (hopefully not gutting the API 
this time ;-)).


Your approach sound much nicer, and easier. I'll try that out! Thanks a 
lot for the pointers, and I might be back with more questions.



Cheers,
Björn


Robin.

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH v9 0/7] iommu/arm-smmu: Enable split pagetable support

2020-07-01 Thread Sai Prakash Ranjan

Hi Will, Robin,

On 2020-06-27 01:30, Jordan Crouse wrote:
Another iteration of the split-pagetable support for arm-smmu and the 
Adreno GPU
SMMU. After email discussions [1] we opted to make a arm-smmu 
implementation for
specifically for the Adreno GPU and use that to enable split pagetable 
support

and later other implementation specific bits that we need.

On the hardware side this is very close to the same code from before 
[2] only
the TTBR1 quirk is turned on by the implementation and not a domain 
attribute.
In drm/msm we use the returned size of the aperture as a clue to let us 
know

which virtual address space we should use for global memory objects.

There are two open items that you should be aware of. First, in the
implementation specific code we have to check the compatible string of 
the
device so that we only enable TTBR1 for the GPU (SID 0) and not the GMU 
(SID 4).
I went back and forth trying to decide if I wanted to use the 
compatible string
or the SID as the filter and settled on the compatible string but I 
could be

talked out of it.

The other open item is that in drm/msm the hardware only uses 49 bits 
of the
address space but arm-smmu expects the address to be sign extended all 
the way
to 64 bits. This isn't a problem normally unless you look at the 
hardware
registers that contain a IOVA and then the upper bits will be zero. I 
opted to
restrict the internal drm/msm IOVA range to only 49 bits and then sign 
extend
right before calling iommu_map / iommu_unmap. This is a bit wonky but I 
thought
that matching the hardware would be less confusing when debugging a 
hang.


v9: Fix bot-detected merge conflict
v7: Add attached device to smmu_domain to pass to implementation 
specific

functions

[1] 
https://lists.linuxfoundation.org/pipermail/iommu/2020-May/044537.html

[2] https://patchwork.kernel.org/patch/11482591/


Jordan Crouse (7):
  iommu/arm-smmu: Pass io-pgtable config to implementation specific
function
  iommu/arm-smmu: Add support for split pagetables
  dt-bindings: arm-smmu: Add compatible string for Adreno GPU SMMU
  iommu/arm-smmu: Add a pointer to the attached device to smmu_domain
  iommu/arm-smmu: Add implementation for the adreno GPU SMMU
  drm/msm: Set the global virtual address range from the IOMMU domain
  arm: dts: qcom: sm845: Set the compatible string for the GPU SMMU

 .../devicetree/bindings/iommu/arm,smmu.yaml   |  4 ++
 arch/arm64/boot/dts/qcom/sdm845.dtsi  |  2 +-
 drivers/gpu/drm/msm/adreno/adreno_gpu.c   | 13 +-
 drivers/gpu/drm/msm/msm_iommu.c   |  7 +++
 drivers/iommu/arm-smmu-impl.c |  6 ++-
 drivers/iommu/arm-smmu-qcom.c | 45 ++-
 drivers/iommu/arm-smmu.c  | 38 +++-
 drivers/iommu/arm-smmu.h  | 30 ++---
 8 files changed, 120 insertions(+), 25 deletions(-)


Any chance reviewing this?

Thanks,
Sai

--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a 
member

of Code Aurora Forum, hosted by The Linux Foundation
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: the XSK buffer pool needs be to reverted

2020-07-01 Thread Robin Murphy

On 2020-06-30 20:08, Jonathan Lemon wrote:

On Mon, Jun 29, 2020 at 02:15:16PM +0100, Robin Murphy wrote:

On 2020-06-27 08:02, Christoph Hellwig wrote:

On Fri, Jun 26, 2020 at 01:54:12PM -0700, Jonathan Lemon wrote:

On Fri, Jun 26, 2020 at 09:47:25AM +0200, Christoph Hellwig wrote:


Note that this is somewhat urgent, as various of the APIs that the code
is abusing are slated to go away for Linux 5.9, so this addition comes
at a really bad time.


Could you elaborate on what is upcoming here?


Moving all these calls out of line, and adding a bypass flag to avoid
the indirect function call for IOMMUs in direct mapped mode.


Also, on a semi-related note, are there limitations on how many pages
can be left mapped by the iommu?  Some of the page pool work involves
leaving the pages mapped instead of constantly mapping/unmapping them.


There are, but I think for all modern IOMMUs they are so big that they
don't matter.  Maintaines of the individual IOMMU drivers might know
more.


Right - I don't know too much about older and more esoteric stuff like POWER
TCE, but for modern pagetable-based stuff like Intel VT-d, AMD-Vi, and Arm
SMMU, the only "limits" are such that legitimate DMA API use should never
get anywhere near them (you'd run out of RAM for actual buffers long
beforehand). The most vaguely-realistic concern might be a pathological
system topology where some old 32-bit PCI device doesn't have ACS isolation
from your high-performance NIC such that they have to share an address
space, where the NIC might happen to steal all the low addresses and prevent
the soundcard or whatever from being able to map a usable buffer.

With an IOMMU, you typically really *want* to keep a full working set's
worth of pages mapped, since dma_map/unmap are expensive while dma_sync is
somewhere between relatively cheap and free. With no IOMMU it makes no real
difference from the DMA API perspective since map/unmap are effectively no
more than the equivalent sync operations anyway (I'm assuming we're not
talking about the kind of constrained hardware that might need SWIOTLB).


On a heavily loaded box with iommu enabled, it seems that quite often
there is contention on the iova_lock.  Are there known issues in this
area?


I'll have to defer to the IOMMU maintainers, and for that you'll need
to say what code you are using.  Current mainlaine doesn't even have
an iova_lock anywhere.


Again I can't speak for non-mainstream stuff outside drivers/iommu, but it's
been over 4 years now since merging the initial scalability work for the
generic IOVA allocator there that focused on minimising lock contention, and
it's had considerable evaluation and tweaking since. But if we can achieve
the goal of efficiently recycling mapped buffers then we shouldn't need to
go anywhere near IOVA allocation either way except when expanding the pool.



I'm running a set of patches which uses the page pool to try and keep
all the RX buffers mapped as the skb goes up the stack, returning the
pages to the pool when the skb is freed.

On a dual-socket 12-core Intel machine (48 processors), and 256G of
memory, when iommu is enabled, I see the following from 'perf top -U',
as the hottest function being run:

-   43.42%  worker  [k] queued_spin_lock_slowpath
- 43.42% queued_spin_lock_slowpath
   - 41.69% _raw_spin_lock_irqsave
  + 41.39% alloc_iova
  + 0.28% iova_magazine_free_pfns
   + 1.07% lock_sock_nested

Which likely is heavy contention on the iovad->iova_rbtree_lock.
(This is on a 5.6 based system, BTW).  More scripts and data are below.
Is there a way to reduce the contention here?


Hmm, how big are your DMA mappings? If you're still hitting the rbtree a 
lot, that most likely implies that either you're making giant IOVA 
allocations that are too big to be cached, or you're allocating/freeing 
IOVAs in a pathological pattern that defeats the whole magazine cache 
mechanism (It's optimised for relatively-balanced allocation and freeing 
of sizes up order 6). On a further hunch, does the 
"intel_iommu=forcedac" option make any difference at all?


Either way if this persists after some initial warm-up period, it 
further implies that the page pool is not doing its job properly (or at 
least in the way I would have expected). The alloc_iova() call is part 
of the dma_map_*() overhead, and if the aim is to keep pages mapped then 
that should only be called relatively infrequently. The optimal 
behaviour would be to dma_map() new clean pages as they are added to the 
pool, use dma_sync() when they are claimed and returned by the driver, 
and only dma_unmap() if they're actually freed back to the page 
allocator. And if you're still seeing a lot of dma_map/unmap time after 
that, then the pool itself is churning pages and clearly needs its 
size/thresholds tuning.


Robin.





The following quick and dirty [and possibly wrong] .bpf script was used
to try and find the time spent in 

Re: [PATCH v9 3/4] dt-bindings: arm-smmu: add binding for Tegra194 SMMU

2020-07-01 Thread Jon Hunter


On 01/07/2020 00:57, Krishna Reddy wrote:
> Add binding for NVIDIA's Tegra194 SoC SMMU topology that is based
> on ARM MMU-500.
> 
> Signed-off-by: Krishna Reddy 
> ---
>  .../devicetree/bindings/iommu/arm,smmu.yaml| 18 ++
>  1 file changed, 18 insertions(+)
> 
> diff --git a/Documentation/devicetree/bindings/iommu/arm,smmu.yaml 
> b/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
> index d7ceb4c34423b..662c46e16f07d 100644
> --- a/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
> +++ b/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
> @@ -38,6 +38,11 @@ properties:
>- qcom,sc7180-smmu-500
>- qcom,sdm845-smmu-500
>- const: arm,mmu-500
> +  - description: NVIDIA SoCs that use more than one "arm,mmu-500"
> +items:
> +  - enum:
> +  - nvidia,tegra194-smmu
> +  - const: arm,mmu-500

We should also address Robin's comments here. I don't think that we ever
want to fail back to just the regular 'arm,mmu-500' and should be should
probably just drop this part.

Jon

-- 
nvpublic
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v9 2/4] iommu/arm-smmu: add NVIDIA implementation for ARM MMU-500 usage

2020-07-01 Thread Jon Hunter


On 01/07/2020 00:57, Krishna Reddy wrote:
> NVIDIA's Tegra194 SoC has three ARM MMU-500 instances.
> It uses two of ARM MMU-500s together to interleave IOVA accesses
> across them and must be programmed identically.
> The third SMMU instance is used as a regular ARM MMU-500 and it
> can either be programmed independently or identical to other
> two ARM MMU-500s.
> 
> This implementation supports programming two or three ARM MMU-500s
> identically as per DT config.
> 
> Signed-off-by: Krishna Reddy 
> ---
>  MAINTAINERS |   2 +
>  drivers/iommu/Makefile  |   2 +-
>  drivers/iommu/arm-smmu-impl.c   |   3 +
>  drivers/iommu/arm-smmu-nvidia.c | 206 
>  drivers/iommu/arm-smmu.h|   1 +
>  5 files changed, 213 insertions(+), 1 deletion(-)
>  create mode 100644 drivers/iommu/arm-smmu-nvidia.c
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 7b5ffd646c6b9..64c37dbdd4426 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -16808,8 +16808,10 @@ F:   drivers/i2c/busses/i2c-tegra.c
>  
>  TEGRA IOMMU DRIVERS
>  M:   Thierry Reding 
> +R:   Krishna Reddy 
>  L:   linux-te...@vger.kernel.org
>  S:   Supported
> +F:   drivers/iommu/arm-smmu-nvidia.c
>  F:   drivers/iommu/tegra*
>  
>  TEGRA KBC DRIVER
> diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
> index 342190196dfb0..2b8203db73ec3 100644
> --- a/drivers/iommu/Makefile
> +++ b/drivers/iommu/Makefile
> @@ -15,7 +15,7 @@ obj-$(CONFIG_AMD_IOMMU) += amd/iommu.o amd/init.o 
> amd/quirks.o
>  obj-$(CONFIG_AMD_IOMMU_DEBUGFS) += amd/debugfs.o
>  obj-$(CONFIG_AMD_IOMMU_V2) += amd/iommu_v2.o
>  obj-$(CONFIG_ARM_SMMU) += arm_smmu.o
> -arm_smmu-objs += arm-smmu.o arm-smmu-impl.o arm-smmu-qcom.o
> +arm_smmu-objs += arm-smmu.o arm-smmu-impl.o arm-smmu-nvidia.o arm-smmu-qcom.o
>  obj-$(CONFIG_ARM_SMMU_V3) += arm-smmu-v3.o
>  obj-$(CONFIG_DMAR_TABLE) += intel/dmar.o
>  obj-$(CONFIG_INTEL_IOMMU) += intel/iommu.o intel/pasid.o
> diff --git a/drivers/iommu/arm-smmu-impl.c b/drivers/iommu/arm-smmu-impl.c
> index c75b9d957b702..f15571d05474e 100644
> --- a/drivers/iommu/arm-smmu-impl.c
> +++ b/drivers/iommu/arm-smmu-impl.c
> @@ -171,6 +171,9 @@ struct arm_smmu_device *arm_smmu_impl_init(struct 
> arm_smmu_device *smmu)
>   if (of_property_read_bool(np, "calxeda,smmu-secure-config-access"))
>   smmu->impl = _impl;
>  
> + if (of_device_is_compatible(np, "nvidia,tegra194-smmu"))
> + return nvidia_smmu_impl_init(smmu);
> +
>   if (of_device_is_compatible(np, "qcom,sdm845-smmu-500") ||
>   of_device_is_compatible(np, "qcom,sc7180-smmu-500"))
>   return qcom_smmu_impl_init(smmu);
> diff --git a/drivers/iommu/arm-smmu-nvidia.c b/drivers/iommu/arm-smmu-nvidia.c
> new file mode 100644
> index 0..5c874912e1c1a
> --- /dev/null
> +++ b/drivers/iommu/arm-smmu-nvidia.c
> @@ -0,0 +1,206 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +// NVIDIA ARM SMMU v2 implementation quirks
> +// Copyright (C) 2019-2020 NVIDIA CORPORATION.  All rights reserved.
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include "arm-smmu.h"
> +
> +/*
> + * Tegra194 has three ARM MMU-500 Instances.
> + * Two of them are used together for interleaved IOVA accesses and
> + * used by non-isochronous HW devices for SMMU translations.
> + * Third one is used for SMMU translations from isochronous HW devices.
> + * It is possible to use this implementation to program either
> + * all three or two of the instances identically as desired through
> + * DT node.
> + *
> + * Programming all the three instances identically comes with redundant TLB
> + * invalidations as all three never need to be TLB invalidated for a HW 
> device.
> + *
> + * When Linux kernel supports multiple SMMU devices, the SMMU device used for
> + * isochornous HW devices should be added as a separate ARM MMU-500 device
> + * in DT and be programmed independently for efficient TLB invalidates.
> + */

We should address Robin's comment about the 'When' above.

> +#define MAX_SMMU_INSTANCES 3
> +
> +struct nvidia_smmu {
> + struct arm_smmu_device  smmu;
> + unsigned intnum_inst;
> + void __iomem*bases[MAX_SMMU_INSTANCES];
> +};
> +
> +static inline struct nvidia_smmu *to_nvidia_smmu(struct arm_smmu_device 
> *smmu)
> +{
> + return container_of(smmu, struct nvidia_smmu, smmu);
> +}
> +
> +static inline void __iomem *nvidia_smmu_page(struct arm_smmu_device *smmu,
> +  unsigned int inst, int page)
> +{
> + struct nvidia_smmu *nvidia_smmu = to_nvidia_smmu(smmu);
> +
> + if (!nvidia_smmu->bases[0])
> + nvidia_smmu->bases[0] = smmu->base;


Robin said that he would accept a patch to move the
devm_ioremap_resource() call in arm_smmu_device_probe() so that we do
not need to do this. See the V7 series. I think that this would be a
good improvement so that we could avoid doing the above.

Cheers

Re: [RFC 0/2] iommu: arm-smmu: Add support for early direct mappings

2020-07-01 Thread Bjorn Andersson
On Wed 03 Jun 04:00 PDT 2020, Robin Murphy wrote:

> On 2020-06-02 07:32, Bjorn Andersson wrote:
> > On Wed 27 May 04:03 PDT 2020, Will Deacon wrote:
> > 
> > > Hi John, Bjorn,
> > > 
> > > On Tue, May 26, 2020 at 01:34:45PM -0700, John Stultz wrote:
> > > > On Thu, May 14, 2020 at 12:34 PM  wrote:
> > > > > 
> > > > > On Thu 27 Feb 18:57 PST 2020, Bjorn Andersson wrote:
> > > > > 
> > > > > Rob, Will, we're reaching the point where upstream has enough
> > > > > functionality that this is becoming a critical issue for us.
> > > > > 
> > > > > E.g. Lenovo Yoga C630 is lacking this and a single dts patch to boot
> > > > > mainline with display, GPU, WiFi and audio working and the story is
> > > > > similar on several devboards.
> > > > > 
> > > > > As previously described, the only thing I want is the stream mapping
> > > > > related to the display controller in place, either with the CB with
> > > > > translation disabled or possibly with a way to specify the framebuffer
> > > > > region (although this turns out to mess things up in the display
> > > > > driver...)
> > > > > 
> > > > > I did pick this up again recently and concluded that by omitting the
> > > > > streams for the USB controllers causes an instability issue seen on 
> > > > > one
> > > > > of the controller to disappear. So I would prefer if we somehow could
> > > > > have a mechanism to only pick the display streams and the context
> > > > > allocation for this.
> > > > > 
> > > > > 
> > > > > Can you please share some pointers/insights/wishes for how we can
> > > > > conclude on this subject?
> > > > 
> > > > Ping? I just wanted to follow up on this discussion as this small
> > > > series is crucial for booting mainline on the Dragonboard 845c
> > > > devboard. It would be really valuable to be able to get some solution
> > > > upstream so we can test mainline w/o adding additional patches.
> > > 
> > > Sorry, it's been insanely busy recently and I haven't had a chance to 
> > > think
> > > about this on top of everything else. We're also carrying a hack in 
> > > Android
> > > for you :)
> > > 
> > 
> > Thanks for taking the time to get back to us on this!
> > 
> > > > The rest of the db845c series has been moving forward smoothly, but
> > > > this set seems to be very stuck with no visible progress since Dec.
> > > > 
> > > > Are there any pointers for what folks would prefer to see?
> > > 
> > > I've had a chat with Robin about this. Originally, I was hoping that
> > > people would all work together towards an idyllic future where firmware
> > > would be able to describe arbitrary pre-existing mappings for devices,
> > > irrespective of the IOMMU through which they master and Linux could
> > > inherit this configuration. However, that hasn't materialised (there was
> > > supposed to be an IORT update, but I don't know what happened to that)
> > > and, in actual fact, the problem that you have on db845 is /far/ more
> > > restricted than the general problem.
> > > 
> > > Could you please try hacking something along the following lines and see
> > > how you get on? You may need my for-joerg/arm-smmu/updates branch for
> > > all the pieces:
> > > 
> > >1. Use the ->cfg_probe() callback to reserve the SMR/S2CRs you need
> > >   "pinning" and configure for bypass.
> > > 
> > >2. Use the ->def_domain_type() callback to return IOMMU_DOMAIN_IDENTITY
> > >   for the display controller
> > > 
> > > I /think/ that's sufficient, but note that it differs from the current
> > > approach because we don't end up reserving a CB -- bypass is configured
> > > in the S2CR instead. Some invalidation might therefore be needed in
> > > ->cfg_probe() after unhooking the CB.
> > > 
> > > Thanks, and please yell if you run into problems with this approach.
> > > 
> > 
> > This sounded straight forward and cleaner, so I implemented it...
> > 
> > Unfortunately the hypervisor is playing tricks on me when writing to
> > S2CR registers:
> > - TRANS writes lands as requested
> > - BYPASS writes ends up in the register as requested, with type FAULT
> > - FAULT writes are ignored
> > 
> > In other words, the Qualcomm firmware prevents us from relying on
> > marking the relevant streams as BYPASS type.
> 
> Sigh...

I agree.

> at that point I'm inclined to suggest we give up and stop trying to
> drive these things with arm-smmu. The XZR thing was bad enough, but if
> they're not even going to pretend to implement the architecture correctly
> then I'm not massively keen to continue tying the architectural driver in
> further knots if innocent things like CONFIG_IOMMU_DEFAULT_PASSTHROUGH are
> going to unexpectedly and catastrophically fail. We have qcom-iommu for
> hypervisor-mediated SMMUs, and this new hypervisor behaviour sounds to me
> more like "qcom-iommu++" with reassignable stream-to-context mappings,
> rather than a proper Arm SMMU emulation.
> 

I've been going through over and over, hoping to perhaps be able to
evolve qcom_iommu into a 

Re: [PATCH 1/2] iommu: Add iommu_group_get/set_domain()

2020-07-01 Thread Lu Baolu

Hi Robin,

On 2020/7/1 0:51, Robin Murphy wrote:

On 2020-06-30 02:03, Lu Baolu wrote:

Hi Robin,

On 6/29/20 7:56 PM, Robin Murphy wrote:

On 2020-06-27 04:15, Lu Baolu wrote:

The hardware assistant vfio mediated device is a use case of iommu
aux-domain. The interactions between vfio/mdev and iommu during mdev
creation and passthr are:

- Create a group for mdev with iommu_group_alloc();
- Add the device to the group with
 group = iommu_group_alloc();
 if (IS_ERR(group))
 return PTR_ERR(group);

 ret = iommu_group_add_device(group, >dev);
 if (!ret)
 dev_info(>dev, "MDEV: group_id = %d\n",
  iommu_group_id(group));
- Allocate an aux-domain
iommu_domain_alloc()
- Attach the aux-domain to the physical device from which the mdev is
   created.
iommu_aux_attach_device()

In the whole process, an iommu group was allocated for the mdev and an
iommu domain was attached to the group, but the group->domain leaves
NULL. As the result, iommu_get_domain_for_dev() doesn't work anymore.

This adds iommu_group_get/set_domain() so that group->domain could be
managed whenever a domain is attached or detached through the 
aux-domain

api's.


Letting external callers poke around directly in the internals of 
iommu_group doesn't look right to me.


Unfortunately, it seems that the vifo iommu abstraction is deeply bound
to the IOMMU subsystem. We can easily find other examples:

iommu_group_get/set_iommudata()
iommu_group_get/set_name()
...


Sure, but those are ways for users of a group to attach useful 
information of their own to it, that doesn't matter to the IOMMU 
subsystem itself. The interface you've proposed gives callers rich new 
opportunities to fundamentally break correct operation of the API:


 dom = iommu_domain_alloc();
 iommu_attach_group(dom, grp);
 ...
 iommu_group_set_domain(grp, NULL);
 // oops, leaked and can't ever detach properly now

or perhaps:

 grp = iommu_group_alloc();
 iommu_group_add_device(grp, dev);
 iommu_group_set_domain(grp, dom);
 ...
 iommu_detach_group(dom, grp);
 // oops, IOMMU driver might not handle this

If a regular device is attached to one or more aux domains for PASID 
use, iommu_get_domain_for_dev() is still going to return the primary 
domain, so why should it be expected to behave differently for mediated


Unlike the normal device attach, we will encounter two devices when it
comes to aux-domain.

- Parent physical device - this might be, for example, a PCIe device
with PASID feature support, hence it is able to tag an unique PASID
for DMA transfers originated from its subset. The device driver hence
is able to wrapper this subset into an isolated:

- Mediated device - a fake device created by the device driver mentioned
above.

Yes. All you mentioned are right for the parent device. But for mediated
device, iommu_get_domain_for_dev() doesn't work even it has an valid
iommu_group and iommu_domain.

iommu_get_domain_for_dev() is a necessary interface for device drivers
which want to support aux-domain. For example,


Only if they want to follow this very specific notion of using made-up 
devices and groups to represent aux attachments. Even if a driver 
managing its own aux domains entirely privately does create child 
devices for them, it's not like it can't keep its domain pointers in 
drvdata if it wants to ;)


Let's not conflate the current implementation of vfio_mdev with the 
general concepts involved here.



   struct iommu_domain *domain;
   struct device *dev = mdev_dev(mdev);
   unsigned long pasid;

   domain = iommu_get_domain_for_dev(dev);
   if (!domain)
   return -ENODEV;

   pasid = iommu_aux_get_pasid(domain, dev->parent);
   if (pasid == IOASID_INVALID)
   return -EINVAL;

   /* Program the device context with the PASID value */
   

Without this fix, iommu_get_domain_for_dev() always returns NULL and the
device driver has no means to support aux-domain.


So either the IOMMU API itself is missing the ability to do the right 
thing internally, or the mdev layer isn't using it appropriately. Either 
way, simply punching holes in the API for mdev to hack around its own 
mess doesn't seem like the best thing to do.


The initial impression I got was that it's implicitly assumed here that 
the mdev itself is attached to exactly one aux domain and nothing else, 
at which point I would wonder why it's using aux at all, but are you 
saying that in fact no attach happens with the mdev group either way, 
only to the parent device?


I'll admit I'm not hugely familiar with any of this, but it seems to me 
that the logical flow should be:


 - allocate domain
 - attach as aux to parent
 - retrieve aux domain PASID
 - create mdev child based on PASID
 - attach mdev to domain (normally)

Of course that might require giving