Re: [RFC 0/2] iommu: arm-smmu: Add support for early direct mappings

2020-06-01 Thread Bjorn Andersson
On Wed 27 May 04:03 PDT 2020, Will Deacon wrote:

> Hi John, Bjorn,
> 
> On Tue, May 26, 2020 at 01:34:45PM -0700, John Stultz wrote:
> > On Thu, May 14, 2020 at 12:34 PM  wrote:
> > >
> > > On Thu 27 Feb 18:57 PST 2020, Bjorn Andersson wrote:
> > >
> > > Rob, Will, we're reaching the point where upstream has enough
> > > functionality that this is becoming a critical issue for us.
> > >
> > > E.g. the Lenovo Yoga C630 is lacking only this and a single dts patch
> > > to boot mainline with display, GPU, WiFi and audio working, and the
> > > story is similar on several devboards.
> > >
> > > As previously described, the only thing I want is the stream mapping
> > > related to the display controller in place, either with the CB with
> > > translation disabled or possibly with a way to specify the framebuffer
> > > region (although this turns out to mess things up in the display
> > > driver...)
> > >
> > > I did pick this up again recently and concluded that omitting the
> > > streams for the USB controllers causes an instability issue seen on one
> > > of the controllers to disappear. So I would prefer if we could somehow
> > > have a mechanism to pick only the display streams and the context
> > > allocation for this.
> > >
> > >
> > > Can you please share some pointers/insights/wishes for how we can
> > > conclude on this subject?
> > 
> > Ping? I just wanted to follow up on this discussion as this small
> > series is crucial for booting mainline on the Dragonboard 845c
> > devboard. It would be really valuable to be able to get some solution
> > upstream so we can test mainline w/o adding additional patches.
> 
> Sorry, it's been insanely busy recently and I haven't had a chance to think
> about this on top of everything else. We're also carrying a hack in Android
> for you :)
> 

Thanks for taking the time to get back to us on this!

> > The rest of the db845c series has been moving forward smoothly, but
> > this set seems to be very stuck with no visible progress since Dec.
> > 
> > Are there any pointers for what folks would prefer to see?
> 
> I've had a chat with Robin about this. Originally, I was hoping that
> people would all work together towards an idyllic future where firmware
> would be able to describe arbitrary pre-existing mappings for devices,
> irrespective of the IOMMU through which they master and Linux could
> inherit this configuration. However, that hasn't materialised (there was
> supposed to be an IORT update, but I don't know what happened to that)
> and, in actual fact, the problem that you have on db845 is /far/ more
> restricted than the general problem.
> 
> Could you please try hacking something along the following lines and see
> how you get on? You may need my for-joerg/arm-smmu/updates branch for
> all the pieces:
> 
>   1. Use the ->cfg_probe() callback to reserve the SMR/S2CRs you need
>  "pinning" and configure for bypass.
> 
>   2. Use the ->def_domain_type() callback to return IOMMU_DOMAIN_IDENTITY
>  for the display controller
> 
> I /think/ that's sufficient, but note that it differs from the current
> approach because we don't end up reserving a CB -- bypass is configured
> in the S2CR instead. Some invalidation might therefore be needed in
> ->cfg_probe() after unhooking the CB.
> 
> Thanks, and please yell if you run into problems with this approach.
> 

This sounded straightforward and cleaner, so I implemented it...

Unfortunately the hypervisor is playing tricks on me when writing to
S2CR registers:
- TRANS writes land as requested
- BYPASS writes end up in the register with type FAULT instead
- FAULT writes are ignored

In other words, the Qualcomm firmware prevents us from relying on
marking the relevant streams as BYPASS type.


Instead, Qualcomm seems to implement "bypass" by setting up a stream
mapping of TRANS type, pointing to a context bank that does not have
ARM_SMMU_SCTLR_M set (i.e. with translation disabled).
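A rough sketch of what that firmware-imposed scheme implies for the
suggested ->cfg_probe() approach: claim a context bank, leave SCTLR.M
clear so the CB performs no translation, and point the pinned stream at
it with a TRANS-type S2CR. This is untested pseudocode against the
5.7-era arm-smmu register helpers; `cbndx` and `idx` are placeholder
indexes, not values from any real patch:

```c
/* Untested sketch only: emulate bypass on this firmware by pairing a
 * TRANS-type S2CR with a context bank whose SCTLR.M bit stays clear.
 * cbndx/idx are placeholders for the reserved context bank and the
 * display's stream-mapping index.
 */
static int qcom_smmu_cfg_probe(struct arm_smmu_device *smmu)
{
	int cbndx = 0;	/* placeholder: reserved context bank */
	int idx = 0;	/* placeholder: pinned display stream mapping */
	u32 s2cr = FIELD_PREP(ARM_SMMU_S2CR_TYPE, S2CR_TYPE_TRANS) |
		   FIELD_PREP(ARM_SMMU_S2CR_CBNDX, cbndx);

	/* SCTLR written without ARM_SMMU_SCTLR_M: the CB passes traffic
	 * through untranslated, which is the "bypass" the firmware allows. */
	arm_smmu_cb_write(smmu, cbndx, ARM_SMMU_CB_SCTLR, 0);
	arm_smmu_gr0_write(smmu, ARM_SMMU_GR0_S2CR(idx), s2cr);

	return 0;
}
```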

Regards,
Bjorn
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH] driver core: platform: expose numa_node to users in sysfs

2020-06-01 Thread Song Bao Hua (Barry Song)



> -Original Message-
> From: Greg KH [mailto:gre...@linuxfoundation.org]
> Sent: Tuesday, June 2, 2020 6:11 PM
> To: Song Bao Hua (Barry Song) 
> Cc: raf...@kernel.org; iommu@lists.linux-foundation.org;
> linux-arm-ker...@lists.infradead.org; linux-ker...@vger.kernel.org; Linuxarm
> ; Zengtao (B) ; Robin
> Murphy 
> Subject: Re: [PATCH] driver core: platform: expose numa_node to users in sysfs
> 
> On Tue, Jun 02, 2020 at 05:09:57AM +, Song Bao Hua (Barry Song) wrote:
> > > >
> > > > Platform devices are NUMA?  That's crazy, and feels like a total
> > > > abuse of platform devices and drivers that really should belong on a
> "real"
> > > > bus.
> > >
> > > I am not sure if it is an abuse of platform device. But smmu is a
> > > platform device, drivers/iommu/arm-smmu-v3.c is a platform driver.
> > > In a typical ARM server, there are maybe multiple SMMU devices which
> > > can support IO virtual address and page tables for other devices on
> > > PCI-like busses.
> > > Each different SMMU device might be close to different NUMA node.
> > > There is really a hardware topology.
> > >
> > > If you have multiple CPU packages in a NUMA server, some platform
> > > devices might Belong to CPU0, some other might belong to CPU1.
> >
> > Those devices are populated by acpi_iort for an ARM server:
> >
> > drivers/acpi/arm64/iort.c:
> >
> > static const struct iort_dev_config iort_arm_smmu_v3_cfg __initconst = {
> > .name = "arm-smmu-v3",
> > .dev_dma_configure = arm_smmu_v3_dma_configure,
> > .dev_count_resources = arm_smmu_v3_count_resources,
> > .dev_init_resources = arm_smmu_v3_init_resources,
> > .dev_set_proximity = arm_smmu_v3_set_proximity, };
> >
> > void __init acpi_iort_init(void)
> > {
> > acpi_status status;
> >
> > status = acpi_get_table(ACPI_SIG_IORT, 0, &iort_table);
> > ...
> > iort_check_id_count_workaround(iort_table);
> > iort_init_platform_devices();
> > }
> >
> > static void __init iort_init_platform_devices(void) {
> > ...
> >
> > for (i = 0; i < iort->node_count; i++) {
> > if (iort_node >= iort_end) {
> > pr_err("iort node pointer overflows, bad table\n");
> > return;
> > }
> >
> > iort_enable_acs(iort_node);
> >
> > ops = iort_get_dev_cfg(iort_node);
> > if (ops) {
> > fwnode = acpi_alloc_fwnode_static();
> > if (!fwnode)
> > return;
> >
> > iort_set_fwnode(iort_node, fwnode);
> >
> > ret = iort_add_platform_device(iort_node, ops);
> > if (ret) {
> > iort_delete_fwnode(iort_node);
> > acpi_free_fwnode_static(fwnode);
> > return;
> > }
> > }
> >
> > ...
> > }
> > ...
> > }
> >
> > NUMA node is got from ACPI:
> >
> > static int  __init arm_smmu_v3_set_proximity(struct device *dev,
> >   struct acpi_iort_node *node) {
> > struct acpi_iort_smmu_v3 *smmu;
> >
> > smmu = (struct acpi_iort_smmu_v3 *)node->node_data;
> > if (smmu->flags & ACPI_IORT_SMMU_V3_PXM_VALID) {
> > int dev_node = acpi_map_pxm_to_node(smmu->pxm);
> >
> > ...
> >
> > set_dev_node(dev, dev_node);
> > ...
> > }
> > return 0;
> > }
> >
> > Barry
> 
> That's fine, but those are "real" devices, not platform devices, right?
> 

Most platform devices are "real" memory-mapped hardware devices. In an
embedded system, almost all "simple-bus" devices are populated from device
trees as platform devices. Only a small part of platform devices are not
"real" hardware.

The SMMU is a memory-mapped device, just like most other platform devices:
it is populated in a memory region mapped into the CPU's local address
space, and its driver uses ioremap() to map the registers and
readl()/writel() to access them.

> What platform device has this issue?  What one will show up this way with
> the new patch?

If platform devices couldn't be "real" hardware, then no platform device
would have a hardware topology. But platform devices are "real" hardware
most of the time. The SMMU is a "real" device, and it is a platform device
in Linux.

> 
> thanks,
> 
> greg k-h

-barry



Re: [PATCH] driver core: platform: expose numa_node to users in sysfs

2020-06-01 Thread Greg KH
On Tue, Jun 02, 2020 at 05:09:57AM +, Song Bao Hua (Barry Song) wrote:
> > >
> > > Platform devices are NUMA?  That's crazy, and feels like a total abuse
> > > of platform devices and drivers that really should belong on a "real"
> > > bus.
> > 
> > I am not sure if it is an abuse of platform device. But smmu is a platform
> > device,
> > drivers/iommu/arm-smmu-v3.c is a platform driver.
> > In a typical ARM server, there are maybe multiple SMMU devices which can
> > support
> > IO virtual address and page tables for other devices on PCI-like busses.
> > Each different SMMU device might be close to different NUMA node. There is
> > really a hardware topology.
> > 
> > If you have multiple CPU packages in a NUMA server, some platform devices
> > might
> > Belong to CPU0, some other might belong to CPU1.
> 
> Those devices are populated by acpi_iort for an ARM server:
> 
> drivers/acpi/arm64/iort.c:
> 
> static const struct iort_dev_config iort_arm_smmu_v3_cfg __initconst = {
> .name = "arm-smmu-v3",
> .dev_dma_configure = arm_smmu_v3_dma_configure,
> .dev_count_resources = arm_smmu_v3_count_resources,
> .dev_init_resources = arm_smmu_v3_init_resources,
> .dev_set_proximity = arm_smmu_v3_set_proximity,
> };
> 
> void __init acpi_iort_init(void)
> {
> acpi_status status;
> 
> status = acpi_get_table(ACPI_SIG_IORT, 0, &iort_table);
> ...
> iort_check_id_count_workaround(iort_table);
> iort_init_platform_devices();
> }
> 
> static void __init iort_init_platform_devices(void)
> {
> ...
> 
> for (i = 0; i < iort->node_count; i++) {
> if (iort_node >= iort_end) {
> pr_err("iort node pointer overflows, bad table\n");
> return;
> }
> 
> iort_enable_acs(iort_node);
> 
> ops = iort_get_dev_cfg(iort_node);
> if (ops) {
> fwnode = acpi_alloc_fwnode_static();
> if (!fwnode)
> return;
> 
> iort_set_fwnode(iort_node, fwnode);
> 
> ret = iort_add_platform_device(iort_node, ops);
> if (ret) {
> iort_delete_fwnode(iort_node);
> acpi_free_fwnode_static(fwnode);
> return;
> }
> }
> 
> ...
> }
> ...
> }
> 
> NUMA node is got from ACPI:
> 
> static int  __init arm_smmu_v3_set_proximity(struct device *dev,
>   struct acpi_iort_node *node)
> {
> struct acpi_iort_smmu_v3 *smmu;
> 
> smmu = (struct acpi_iort_smmu_v3 *)node->node_data;
> if (smmu->flags & ACPI_IORT_SMMU_V3_PXM_VALID) {
> int dev_node = acpi_map_pxm_to_node(smmu->pxm);
> 
> ...
> 
> set_dev_node(dev, dev_node);
> ...
> }
> return 0;
> }
> 
> Barry

That's fine, but those are "real" devices, not platform devices, right?

What platform device has this issue?  What one will show up this way
with the new patch?

thanks,

greg k-h


Re: iommu: Improve exception handling in iommu_group_alloc()

2020-06-01 Thread Baolin Wang
On Tue, Jun 02, 2020 at 07:01:02AM +0200, Markus Elfring wrote:
> >> * I suggest to avoid the specification of duplicate function calls.
> >>   Will it be helpful to add a few jump targets?
> >>   
> >> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/process/coding-style.rst?id=3d77e6a8804abcc0504c904bd6e5cdf3a5cf8162#n455
> >
> > I don't think it is helpful or readable to add some jump targets here,
> > since the exception handling is very simple here.
> 
> Do you disagree to the application of the Linux coding style then
> for the recommended exception handling?

No, that's not what I meant. My point is that the exception handling in
this patch is simple, and there is no need to add 'goto' statements, which
would not improve readability here. I do agree they are helpful in cases
where a function exits from multiple locations and the same cleanup work
needs to be done.



[PATCH v2] driver core: platform: expose numa_node to users in sysfs

2020-06-01 Thread Barry Song
For some platform devices such as IOMMUs, particularly the ARM SMMU, users
may care about NUMA locality. For example, if threads and drivers run near
the IOMMU, they may get much better performance on dma_unmap_sg(). For
other platform devices, users may still want to know the hardware
topology.

Cc: Prime Zeng 
Cc: Robin Murphy 
Signed-off-by: Barry Song 
---
 -v2: add the numa_node entry in Documentation/ABI/

 Documentation/ABI/testing/sysfs-bus-platform | 10 
 drivers/base/platform.c  | 26 +++-
 2 files changed, 35 insertions(+), 1 deletion(-)

diff --git a/Documentation/ABI/testing/sysfs-bus-platform b/Documentation/ABI/testing/sysfs-bus-platform
index 5172a6124b27..e8f5958c1d18 100644
--- a/Documentation/ABI/testing/sysfs-bus-platform
+++ b/Documentation/ABI/testing/sysfs-bus-platform
@@ -18,3 +18,13 @@ Description:
devices to opt-out of driver binding using a driver_override
name such as "none".  Only a single driver may be specified in
the override, there is no support for parsing delimiters.
+
+What:  /sys/bus/platform/devices/.../numa_node
+Date:  June 2020
+Contact:   Barry Song 
+Description:
+   This file contains the NUMA node to which the platform device
+   is attached. It won't be visible if the node is unknown. The
+   value comes from an ACPI _PXM method or a similar firmware
+   source. Initial users for this file would be devices like
+   arm smmu which are populated by arm64 acpi_iort.
diff --git a/drivers/base/platform.c b/drivers/base/platform.c
index b27d0f6c18c9..7794b9a38d82 100644
--- a/drivers/base/platform.c
+++ b/drivers/base/platform.c
@@ -1062,13 +1062,37 @@ static ssize_t driver_override_show(struct device *dev,
 }
 static DEVICE_ATTR_RW(driver_override);
 
+static ssize_t numa_node_show(struct device *dev,
+   struct device_attribute *attr, char *buf)
+{
+   return sprintf(buf, "%d\n", dev_to_node(dev));
+}
+static DEVICE_ATTR_RO(numa_node);
+
+static umode_t platform_dev_attrs_visible(struct kobject *kobj, struct attribute *a,
+   int n)
+{
+   struct device *dev = container_of(kobj, typeof(*dev), kobj);
+
+   if (a == &dev_attr_numa_node.attr &&
+   dev_to_node(dev) == NUMA_NO_NODE)
+   return 0;
+
+   return a->mode;
+}
 
 static struct attribute *platform_dev_attrs[] = {
&dev_attr_modalias.attr,
+   &dev_attr_numa_node.attr,
&dev_attr_driver_override.attr,
NULL,
 };
-ATTRIBUTE_GROUPS(platform_dev);
+
+static struct attribute_group platform_dev_group = {
+   .attrs = platform_dev_attrs,
+   .is_visible = platform_dev_attrs_visible,
+};
+__ATTRIBUTE_GROUPS(platform_dev);
 
 static int platform_uevent(struct device *dev, struct kobj_uevent_env *env)
 {
-- 
2.23.0




RE: [PATCH] driver core: platform: expose numa_node to users in sysfs

2020-06-01 Thread Song Bao Hua (Barry Song)
> >
> > Platform devices are NUMA?  That's crazy, and feels like a total abuse
> > of platform devices and drivers that really should belong on a "real"
> > bus.
> 
> I am not sure if it is an abuse of platform device. But smmu is a platform
> device,
> drivers/iommu/arm-smmu-v3.c is a platform driver.
> In a typical ARM server, there are maybe multiple SMMU devices which can
> support
> IO virtual address and page tables for other devices on PCI-like busses.
> Each different SMMU device might be close to different NUMA node. There is
> really a hardware topology.
> 
> If you have multiple CPU packages in a NUMA server, some platform devices
> might
> Belong to CPU0, some other might belong to CPU1.

Those devices are populated by acpi_iort for an ARM server:

drivers/acpi/arm64/iort.c:

static const struct iort_dev_config iort_arm_smmu_v3_cfg __initconst = {
.name = "arm-smmu-v3",
.dev_dma_configure = arm_smmu_v3_dma_configure,
.dev_count_resources = arm_smmu_v3_count_resources,
.dev_init_resources = arm_smmu_v3_init_resources,
.dev_set_proximity = arm_smmu_v3_set_proximity,
};

void __init acpi_iort_init(void)
{
acpi_status status;

status = acpi_get_table(ACPI_SIG_IORT, 0, &iort_table);
...
iort_check_id_count_workaround(iort_table);
iort_init_platform_devices();
}

static void __init iort_init_platform_devices(void)
{
...

for (i = 0; i < iort->node_count; i++) {
if (iort_node >= iort_end) {
pr_err("iort node pointer overflows, bad table\n");
return;
}

iort_enable_acs(iort_node);

ops = iort_get_dev_cfg(iort_node);
if (ops) {
fwnode = acpi_alloc_fwnode_static();
if (!fwnode)
return;

iort_set_fwnode(iort_node, fwnode);

ret = iort_add_platform_device(iort_node, ops);
if (ret) {
iort_delete_fwnode(iort_node);
acpi_free_fwnode_static(fwnode);
return;
}
}

...
}
...
}

The NUMA node is obtained from ACPI:

static int  __init arm_smmu_v3_set_proximity(struct device *dev,
  struct acpi_iort_node *node)
{
struct acpi_iort_smmu_v3 *smmu;

smmu = (struct acpi_iort_smmu_v3 *)node->node_data;
if (smmu->flags & ACPI_IORT_SMMU_V3_PXM_VALID) {
int dev_node = acpi_map_pxm_to_node(smmu->pxm);

...

set_dev_node(dev, dev_node);
...
}
return 0;
}

Barry




Re: iommu: Improve exception handling in iommu_group_alloc()

2020-06-01 Thread Markus Elfring
>> * I suggest to avoid the specification of duplicate function calls.
>>   Will it be helpful to add a few jump targets?
>>   
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/process/coding-style.rst?id=3d77e6a8804abcc0504c904bd6e5cdf3a5cf8162#n455
>
> I don't think it is helpful or readable to add some jump targets here,
> since the exception handling is very simple here.

Do you then disagree with applying the Linux coding style for the
recommended exception handling?

Regards,
Markus


RE: [PATCH] driver core: platform: expose numa_node to users in sysfs

2020-06-01 Thread Song Bao Hua (Barry Song)



> -Original Message-
> From: Greg KH [mailto:gre...@linuxfoundation.org]
> Sent: Tuesday, June 2, 2020 4:24 PM
> To: Song Bao Hua (Barry Song) 
> Cc: raf...@kernel.org; iommu@lists.linux-foundation.org;
> linux-arm-ker...@lists.infradead.org; linux-ker...@vger.kernel.org; Linuxarm
> ; Zengtao (B) ; Robin
> Murphy 
> Subject: Re: [PATCH] driver core: platform: expose numa_node to users in sysfs
> 
> On Tue, Jun 02, 2020 at 03:01:39PM +1200, Barry Song wrote:
> > For some platform devices like iommu, particually ARM smmu, users may
> > care about the numa locality. for example, if threads and drivers run
> > near iommu, they may get much better performance on dma_unmap_sg.
> > For other platform devices, users may still want to know the hardware
> > topology.
> >
> > Cc: Prime Zeng 
> > Cc: Robin Murphy 
> > Signed-off-by: Barry Song 
> > ---
> >  drivers/base/platform.c | 26 +-
> >  1 file changed, 25 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/base/platform.c b/drivers/base/platform.c
> > index b27d0f6c18c9..7794b9a38d82 100644
> > --- a/drivers/base/platform.c
> > +++ b/drivers/base/platform.c
> > @@ -1062,13 +1062,37 @@ static ssize_t driver_override_show(struct
> device *dev,
> >  }
> >  static DEVICE_ATTR_RW(driver_override);
> >
> > +static ssize_t numa_node_show(struct device *dev,
> > +   struct device_attribute *attr, char *buf)
> > +{
> > +   return sprintf(buf, "%d\n", dev_to_node(dev));
> > +}
> > +static DEVICE_ATTR_RO(numa_node);
> > +
> > +static umode_t platform_dev_attrs_visible(struct kobject *kobj, struct
> attribute *a,
> > +   int n)
> > +{
> > +   struct device *dev = container_of(kobj, typeof(*dev), kobj);
> > +
> > +   if (a == &dev_attr_numa_node.attr &&
> > +   dev_to_node(dev) == NUMA_NO_NODE)
> > +   return 0;
> > +
> > +   return a->mode;
> > +}
> >
> >  static struct attribute *platform_dev_attrs[] = {
> > &dev_attr_modalias.attr,
> > +   &dev_attr_numa_node.attr,
> > &dev_attr_driver_override.attr,
> > NULL,
> >  };
> > -ATTRIBUTE_GROUPS(platform_dev);
> > +
> > +static struct attribute_group platform_dev_group = {
> > +   .attrs = platform_dev_attrs,
> > +   .is_visible = platform_dev_attrs_visible,
> > +};
> > +__ATTRIBUTE_GROUPS(platform_dev);
> >
> >  static int platform_uevent(struct device *dev, struct kobj_uevent_env *env)
> >  {
> 
> Platform devices are NUMA?  That's crazy, and feels like a total abuse
> of platform devices and drivers that really should belong on a "real"
> bus.

I am not sure it is an abuse of platform devices. But the SMMU is a
platform device, and drivers/iommu/arm-smmu-v3.c is a platform driver.
In a typical ARM server, there may be multiple SMMU devices which provide
I/O virtual addresses and page tables for other devices on PCI-like
busses.
Each SMMU device might be close to a different NUMA node; there really is
a hardware topology.

If you have multiple CPU packages in a NUMA server, some platform devices
might belong to CPU0 while others belong to CPU1.

-barry

> 
> What drivers in the kernel today have this issue?
> 
> thanks,
> 
> greg k-h




Re: [PATCH] driver core: platform: expose numa_node to users in sysfs

2020-06-01 Thread Greg KH
On Tue, Jun 02, 2020 at 03:01:39PM +1200, Barry Song wrote:
> For some platform devices like iommu, particually ARM smmu, users may
> care about the numa locality. for example, if threads and drivers run
> near iommu, they may get much better performance on dma_unmap_sg.
> For other platform devices, users may still want to know the hardware
> topology.
> 
> Cc: Prime Zeng 
> Cc: Robin Murphy 
> Signed-off-by: Barry Song 
> ---
>  drivers/base/platform.c | 26 +-
>  1 file changed, 25 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/base/platform.c b/drivers/base/platform.c
> index b27d0f6c18c9..7794b9a38d82 100644
> --- a/drivers/base/platform.c
> +++ b/drivers/base/platform.c
> @@ -1062,13 +1062,37 @@ static ssize_t driver_override_show(struct device *dev,
>  }
>  static DEVICE_ATTR_RW(driver_override);
>  
> +static ssize_t numa_node_show(struct device *dev,
> + struct device_attribute *attr, char *buf)
> +{
> + return sprintf(buf, "%d\n", dev_to_node(dev));
> +}
> +static DEVICE_ATTR_RO(numa_node);
> +
> +static umode_t platform_dev_attrs_visible(struct kobject *kobj, struct attribute *a,
> + int n)
> +{
> + struct device *dev = container_of(kobj, typeof(*dev), kobj);
> +
> + if (a == &dev_attr_numa_node.attr &&
> + dev_to_node(dev) == NUMA_NO_NODE)
> + return 0;
> +
> + return a->mode;
> +}
>  
>  static struct attribute *platform_dev_attrs[] = {
>   &dev_attr_modalias.attr,
> + &dev_attr_numa_node.attr,
>   &dev_attr_driver_override.attr,
>   NULL,
>  };
> -ATTRIBUTE_GROUPS(platform_dev);
> +
> +static struct attribute_group platform_dev_group = {
> + .attrs = platform_dev_attrs,
> + .is_visible = platform_dev_attrs_visible,
> +};
> +__ATTRIBUTE_GROUPS(platform_dev);
>  
>  static int platform_uevent(struct device *dev, struct kobj_uevent_env *env)
>  {

Also you forgot a new entry in Documentation/ABI/ :(



Re: [PATCH] driver core: platform: expose numa_node to users in sysfs

2020-06-01 Thread Greg KH
On Tue, Jun 02, 2020 at 03:01:39PM +1200, Barry Song wrote:
> For some platform devices like iommu, particually ARM smmu, users may
> care about the numa locality. for example, if threads and drivers run
> near iommu, they may get much better performance on dma_unmap_sg.
> For other platform devices, users may still want to know the hardware
> topology.
> 
> Cc: Prime Zeng 
> Cc: Robin Murphy 
> Signed-off-by: Barry Song 
> ---
>  drivers/base/platform.c | 26 +-
>  1 file changed, 25 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/base/platform.c b/drivers/base/platform.c
> index b27d0f6c18c9..7794b9a38d82 100644
> --- a/drivers/base/platform.c
> +++ b/drivers/base/platform.c
> @@ -1062,13 +1062,37 @@ static ssize_t driver_override_show(struct device *dev,
>  }
>  static DEVICE_ATTR_RW(driver_override);
>  
> +static ssize_t numa_node_show(struct device *dev,
> + struct device_attribute *attr, char *buf)
> +{
> + return sprintf(buf, "%d\n", dev_to_node(dev));
> +}
> +static DEVICE_ATTR_RO(numa_node);
> +
> +static umode_t platform_dev_attrs_visible(struct kobject *kobj, struct attribute *a,
> + int n)
> +{
> + struct device *dev = container_of(kobj, typeof(*dev), kobj);
> +
> + if (a == &dev_attr_numa_node.attr &&
> + dev_to_node(dev) == NUMA_NO_NODE)
> + return 0;
> +
> + return a->mode;
> +}
>  
>  static struct attribute *platform_dev_attrs[] = {
>   &dev_attr_modalias.attr,
> + &dev_attr_numa_node.attr,
>   &dev_attr_driver_override.attr,
>   NULL,
>  };
> -ATTRIBUTE_GROUPS(platform_dev);
> +
> +static struct attribute_group platform_dev_group = {
> + .attrs = platform_dev_attrs,
> + .is_visible = platform_dev_attrs_visible,
> +};
> +__ATTRIBUTE_GROUPS(platform_dev);
>  
>  static int platform_uevent(struct device *dev, struct kobj_uevent_env *env)
>  {

Platform devices are NUMA?  That's crazy, and feels like a total abuse
of platform devices and drivers that really should belong on a "real"
bus.

What drivers in the kernel today have this issue?

thanks,

greg k-h


[PATCH] driver core: platform: expose numa_node to users in sysfs

2020-06-01 Thread Barry Song
For some platform devices such as IOMMUs, particularly the ARM SMMU, users
may care about NUMA locality. For example, if threads and drivers run near
the IOMMU, they may get much better performance on dma_unmap_sg(). For
other platform devices, users may still want to know the hardware
topology.

Cc: Prime Zeng 
Cc: Robin Murphy 
Signed-off-by: Barry Song 
---
 drivers/base/platform.c | 26 +-
 1 file changed, 25 insertions(+), 1 deletion(-)

diff --git a/drivers/base/platform.c b/drivers/base/platform.c
index b27d0f6c18c9..7794b9a38d82 100644
--- a/drivers/base/platform.c
+++ b/drivers/base/platform.c
@@ -1062,13 +1062,37 @@ static ssize_t driver_override_show(struct device *dev,
 }
 static DEVICE_ATTR_RW(driver_override);
 
+static ssize_t numa_node_show(struct device *dev,
+   struct device_attribute *attr, char *buf)
+{
+   return sprintf(buf, "%d\n", dev_to_node(dev));
+}
+static DEVICE_ATTR_RO(numa_node);
+
+static umode_t platform_dev_attrs_visible(struct kobject *kobj, struct attribute *a,
+   int n)
+{
+   struct device *dev = container_of(kobj, typeof(*dev), kobj);
+
+   if (a == &dev_attr_numa_node.attr &&
+   dev_to_node(dev) == NUMA_NO_NODE)
+   return 0;
+
+   return a->mode;
+}
 
 static struct attribute *platform_dev_attrs[] = {
&dev_attr_modalias.attr,
+   &dev_attr_numa_node.attr,
&dev_attr_driver_override.attr,
NULL,
 };
-ATTRIBUTE_GROUPS(platform_dev);
+
+static struct attribute_group platform_dev_group = {
+   .attrs = platform_dev_attrs,
+   .is_visible = platform_dev_attrs_visible,
+};
+__ATTRIBUTE_GROUPS(platform_dev);
 
 static int platform_uevent(struct device *dev, struct kobj_uevent_env *env)
 {
-- 
2.23.0




[PATCH v2] iommu: Improve exception handling in iommu_group_alloc()

2020-06-01 Thread Baolin Wang
Improve the exception handling to free the resources correctly when it
fails to allocate an iommu group.

Fixes: bc7d12b91bd3 ("iommu: Implement reserved_regions iommu-group sysfs file")
Signed-off-by: Baolin Wang 
---
Changes from v1:
 - Improve the commit message.
 - Add Fixes tag.
---
 drivers/iommu/iommu.c | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 03d6a26..ac91024 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -529,12 +529,18 @@ struct iommu_group *iommu_group_alloc(void)
 
ret = iommu_group_create_file(group,
  &iommu_group_attr_reserved_regions);
-   if (ret)
+   if (ret) {
+   kobject_put(group->devices_kobj);
return ERR_PTR(ret);
+   }
 
ret = iommu_group_create_file(group, &iommu_group_attr_type);
-   if (ret)
+   if (ret) {
+   iommu_group_remove_file(group,
+   &iommu_group_attr_reserved_regions);
+   kobject_put(group->devices_kobj);
return ERR_PTR(ret);
+   }
 
pr_debug("Allocated group %d\n", group->id);
 
-- 
1.8.3.1



Re: [PATCH] iommu: Improve exception handling in iommu_group_alloc()

2020-06-01 Thread Baolin Wang
On Mon, Jun 01, 2020 at 02:38:05PM +0200, Markus Elfring wrote:
> > Optimize the error handling to free the resources correctly when
> > failed to allocate an iommu group.
> 
> * I would not categorise the desired completion of exception handling
>   as a software optimisation.
> 
> * Would you like to add the tag “Fixes” to the commit message?

Sure.

> 
> * I suggest to avoid the specification of duplicate function calls.
>   Will it be helpful to add a few jump targets?
>   
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/process/coding-style.rst?id=3d77e6a8804abcc0504c904bd6e5cdf3a5cf8162#n455

I don't think adding jump targets would be helpful or more readable here,
since the exception handling is very simple.


Re: [PATCH v2 00/33] iommu: Move iommu_group setup to IOMMU core code

2020-06-01 Thread Jerry Snitselaar

On Tue Jun 02 20, Lu Baolu wrote:

Hi Jerry,

On 6/1/20 6:42 PM, Jerry Snitselaar wrote:


Hi Joerg,

With this patchset, I have an epyc system where if I boot with
iommu=nopt and force a dump I will see some io page faults for a nic
on the system. The vmcore is harvested and the system reboots. I
haven't reproduced it on other systems yet, but without the patchset I
don't see the io page faults during the kdump.

Regards,
Jerry


I just hit an issue on a separate Intel-based system (kdump iommu=nopt),
where it panics during intel_iommu_attach_device(), in is_aux_domain(),
because device_domain_info is DEFER_DEVICE_DOMAIN_INFO. It doesn't get set
to a valid address until the domain_add_dev_info() call.

Is it as simple as the following?


I guess you won't hit this issue if you use the iommu/next branch of
Joerg's tree. We've changed to use a generic helper to retrieve the valid
per-device iommu data, or NULL if there is none.

Best regards,
baolu



Yeah, that will solve the panic. 



diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 29d3940847d3..f1bbeed46a4c 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -5053,8 +5053,8 @@ is_aux_domain(struct device *dev, struct iommu_domain *domain)

 {
    struct device_domain_info *info = dev->archdata.iommu;

-   return info && info->auxd_enabled &&
-   domain->type == IOMMU_DOMAIN_UNMANAGED;
+   return info && info != DEFER_DEVICE_DOMAIN_INFO &&
+   info->auxd_enabled && domain->type == IOMMU_DOMAIN_UNMANAGED;

 }

 static void auxiliary_link_device(struct dmar_domain *domain,


Regards,
Jerry


Re: [PATCH v2 00/33] iommu: Move iommu_group setup to IOMMU core code

2020-06-01 Thread Lu Baolu

Hi Jerry,

On 6/1/20 9:17 PM, Jerry Snitselaar wrote:

On Mon Jun 01 20, Jerry Snitselaar wrote:

On Fri May 29 20, Jerry Snitselaar wrote:

On Tue Apr 14 20, Joerg Roedel wrote:

Hi,

here is the second version of this patch-set. The first version with
some more introductory text can be found here:

https://lore.kernel.org/lkml/20200407183742.4344-1-j...@8bytes.org/

Changes v1->v2:

* Rebased to v5.7-rc1

* Re-wrote the arm-smmu changes as suggested by Robin Murphy

* Re-worked the Exynos patches to hopefully not break the
  driver anymore

* Fixed a missing mutex_unlock() reported by Marek Szyprowski,
  thanks for that.

There is also a git-branch available with these patches applied:

https://git.kernel.org/pub/scm/linux/kernel/git/joro/linux.git/log/?h=iommu-probe-device-v2 



Please review.

Thanks,

Joerg

Joerg Roedel (32):
iommu: Move default domain allocation to separate function
iommu/amd: Implement iommu_ops->def_domain_type call-back
iommu/vt-d: Wire up iommu_ops->def_domain_type
iommu/amd: Remove dma_mask check from check_device()
iommu/amd: Return -ENODEV in add_device when device is not handled by
 IOMMU
iommu: Add probe_device() and remove_device() call-backs
iommu: Move default domain allocation to iommu_probe_device()
iommu: Keep a list of allocated groups in __iommu_probe_device()
iommu: Move new probe_device path to separate function
iommu: Split off default domain allocation from group assignment
iommu: Move iommu_group_create_direct_mappings() out of
 iommu_group_add_device()
iommu: Export bus_iommu_probe() and make it safe for re-probing
iommu/amd: Remove dev_data->passthrough
iommu/amd: Convert to probe/release_device() call-backs
iommu/vt-d: Convert to probe/release_device() call-backs
iommu/arm-smmu: Convert to probe/release_device() call-backs
iommu/pamu: Convert to probe/release_device() call-backs
iommu/s390: Convert to probe/release_device() call-backs
iommu/virtio: Convert to probe/release_device() call-backs
iommu/msm: Convert to probe/release_device() call-backs
iommu/mediatek: Convert to probe/release_device() call-backs
iommu/mediatek-v1: Convert to probe/release_device() call-backs
iommu/qcom: Convert to probe/release_device() call-backs
iommu/rockchip: Convert to probe/release_device() call-backs
iommu/tegra: Convert to probe/release_device() call-backs
iommu/renesas: Convert to probe/release_device() call-backs
iommu/omap: Remove orphan_dev tracking
iommu/omap: Convert to probe/release_device() call-backs
iommu/exynos: Use first SYSMMU in controllers list for IOMMU core
iommu/exynos: Convert to probe/release_device() call-backs
iommu: Remove add_device()/remove_device() code-paths
iommu: Unexport iommu_group_get_for_dev()

Sai Praneeth Prakhya (1):
iommu: Add def_domain_type() callback in iommu_ops

drivers/iommu/amd_iommu.c   |  97 
drivers/iommu/amd_iommu_types.h |   1 -
drivers/iommu/arm-smmu-v3.c |  38 +--
drivers/iommu/arm-smmu.c    |  39 ++--
drivers/iommu/exynos-iommu.c    |  24 +-
drivers/iommu/fsl_pamu_domain.c |  22 +-
drivers/iommu/intel-iommu.c |  68 +-
drivers/iommu/iommu.c   | 393 +---
drivers/iommu/ipmmu-vmsa.c  |  60 ++---
drivers/iommu/msm_iommu.c   |  34 +--
drivers/iommu/mtk_iommu.c   |  24 +-
drivers/iommu/mtk_iommu_v1.c    |  50 ++--
drivers/iommu/omap-iommu.c  |  99 ++--
drivers/iommu/qcom_iommu.c  |  24 +-
drivers/iommu/rockchip-iommu.c  |  26 +--
drivers/iommu/s390-iommu.c  |  22 +-
drivers/iommu/tegra-gart.c  |  24 +-
drivers/iommu/tegra-smmu.c  |  31 +--
drivers/iommu/virtio-iommu.c    |  41 +---
include/linux/iommu.h   |  21 +-
20 files changed, 533 insertions(+), 605 deletions(-)

--
2.17.1




Hi Joerg,

With this patchset, I have an epyc system where if I boot with
iommu=nopt and force a dump I will see some io page faults for a nic
on the system. The vmcore is harvested and the system reboots. I
haven't reproduced it on other systems yet, but without the patchset I
don't see the io page faults during the kdump.

Regards,
Jerry


I just hit an issue on a separate Intel-based system (kdump iommu=nopt),
where it panics during intel_iommu_attach_device(), in is_aux_domain(),
because device_domain_info is DEFER_DEVICE_DOMAIN_INFO. That doesn't
get set to a valid address until the domain_add_dev_info() call.

Is it as simple as the following?

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 29d3940847d3..f1bbeed46a4c 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -5053,8 +5053,8 @@ is_aux_domain(struct device *dev, struct 
iommu_domain *domain)

{
   struct device_domain_info *info = dev->archdata.iommu;
-   return info && info->auxd_enabled &&
-   domain->type == IOMMU_DOMAIN_UNMANAGED;
+   return info && info != DEFER_DEVICE_DOMAIN_INFO &&
+   info->auxd_enabled && domain->type == IOMMU_DOMAIN_UNMANAGED;

Re: [PATCH v2 00/33] iommu: Move iommu_group setup to IOMMU core code

2020-06-01 Thread Lu Baolu

Hi Jerry,

On 6/1/20 6:42 PM, Jerry Snitselaar wrote:


Hi Joerg,

With this patchset, I have an epyc system where if I boot with
iommu=nopt and force a dump I will see some io page faults for a nic
on the system. The vmcore is harvested and the system reboots. I
haven't reproduced it on other systems yet, but without the patchset I
don't see the io page faults during the kdump.

Regards,
Jerry


I just hit an issue on a separate Intel-based system (kdump iommu=nopt),
where it panics during intel_iommu_attach_device(), in is_aux_domain(),
because device_domain_info is DEFER_DEVICE_DOMAIN_INFO. That doesn't
get set to a valid address until the domain_add_dev_info() call.

Is it as simple as the following?


I guess you won't hit this issue if you use the iommu/next branch of Joerg's
tree. We've changed to use a generic helper that retrieves the valid
per-device iommu data, or NULL if there is none.

Best regards,
baolu



diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 29d3940847d3..f1bbeed46a4c 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -5053,8 +5053,8 @@ is_aux_domain(struct device *dev, struct iommu_domain *domain)
  {
     struct device_domain_info *info = dev->archdata.iommu;

-   return info && info->auxd_enabled &&
-   domain->type == IOMMU_DOMAIN_UNMANAGED;
+   return info && info != DEFER_DEVICE_DOMAIN_INFO &&
+   info->auxd_enabled && domain->type == IOMMU_DOMAIN_UNMANAGED;

  }

  static void auxiliary_link_device(struct dmar_domain *domain,


Regards,
Jerry


Re: [PATCH] PCI: Relax ACS requirement for Intel RCiEP devices.

2020-06-01 Thread Bjorn Helgaas
On Mon, Jun 01, 2020 at 03:56:55PM -0600, Alex Williamson wrote:
> On Mon, 1 Jun 2020 14:40:23 -0700
> "Raj, Ashok"  wrote:
> 
> > On Mon, Jun 01, 2020 at 04:25:19PM -0500, Bjorn Helgaas wrote:
> > > On Thu, May 28, 2020 at 01:57:42PM -0700, Ashok Raj wrote:  
> > > > All Intel platforms guarantee that all root complex implementations
> > > > must send transactions up to the IOMMU for address translation. Hence
> > > > RCiEP devices with Intel Vendor ID can claim an exception for the lack
> > > > of ACS support.
> > > > 
> > > > 
> > > > 3.16 Root-Complex Peer to Peer Considerations
> > > > When DMA remapping is enabled, peer-to-peer requests through the
> > > > Root-Complex must be handled
> > > > as follows:
> > > > • The input address in the request is translated (through first-level,
> > > >   second-level or nested translation) to a host physical address (HPA).
> > > >   The address decoding for peer addresses must be done only on the
> > > >   translated HPA. Hardware implementations are free to further limit
> > > >   peer-to-peer accesses to specific host physical address regions
> > > >   (or to completely disallow peer-forwarding of translated requests).
> > > > • Since address translation changes the contents (address field) of
> > > >   the PCI Express Transaction Layer Packet (TLP), for PCI Express
> > > >   peer-to-peer requests with ECRC, the Root-Complex hardware must use
> > > >   the new ECRC (re-computed with the translated address) if it
> > > >   decides to forward the TLP as a peer request.
> > > > • Root-ports, and multi-function root-complex integrated endpoints, may
> > > >   support additional peer-to-peer control features by supporting PCI Express
> > > >   Access Control Services (ACS) capability. Refer to ACS capability in
> > > >   PCI Express specifications for details.
> > > > 
> > > > Since Linux didn't give special treatment to allow this exception, 
> > > > certain
> > > > RCiEP MFD devices are getting grouped in a single iommu group. This
> > > > doesn't permit a single device to be assigned to a guest for instance.
> > > > 
> > > > In one vendor system: Device 14.x were grouped in a single IOMMU group.
> > > > 
> > > > /sys/kernel/iommu_groups/5/devices/:00:14.0
> > > > /sys/kernel/iommu_groups/5/devices/:00:14.2
> > > > /sys/kernel/iommu_groups/5/devices/:00:14.3
> > > > 
> > > > After the patch:
> > > > /sys/kernel/iommu_groups/5/devices/:00:14.0
> > > > /sys/kernel/iommu_groups/5/devices/:00:14.2
> > > > /sys/kernel/iommu_groups/6/devices/:00:14.3 <<< new group
> > > > 
> > > > 14.0 and 14.2 are integrated devices, but legacy end points.
> > > > Whereas 14.3 was a PCIe compliant RCiEP.
> > > > 
> > > > 00:14.3 Network controller: Intel Corporation Device 9df0 (rev 30)
> > > > Capabilities: [40] Express (v2) Root Complex Integrated Endpoint, MSI 00
> > > > 
> > > > This permits assigning this device to a guest VM.
> > > > 
> > > > Fixes: f096c061f552 ("iommu: Rework iommu_group_get_for_pci_dev()")
> > > > Signed-off-by: Ashok Raj 
> > > > To: Joerg Roedel 
> > > > To: Bjorn Helgaas 
> > > > Cc: linux-ker...@vger.kernel.org
> > > > Cc: iommu@lists.linux-foundation.org
> > > > Cc: Lu Baolu 
> > > > Cc: Alex Williamson 
> > > > Cc: Darrel Goeddel 
> > > > Cc: Mark Scott ,
> > > > Cc: Romil Sharma 
> > > > Cc: Ashok Raj   
> > > 
> > > Tentatively applied to pci/virtualization for v5.8, thanks!
> > > 
> > > The spec says this handling must apply "when DMA remapping is
> > > enabled".  The patch does not check whether DMA remapping is enabled.
> > > 
> > > Is there any case where DMA remapping is *not* enabled, and we rely on
> > > this patch to tell us whether the device is isolated?  It sounds like
> > > it may give the wrong answer in such a case?
> > > 
> > > Can you confirm that I don't need to worry about this?
> > 
> > I think all of this makes sense only when DMA remapping is enabled.
> > Otherwise there is no enforcement for isolation. 
> 
> Yep, without an IOMMU all devices operate in the same IOVA space and we
> have no isolation.  We only enable ACS when an IOMMU driver requests it
> and it's only used by IOMMU code to determine IOMMU grouping of
> devices.  Thanks,

Thanks, Ashok and Alex.  I wish it were more obvious from the code,
but I am reassured.

I also added a stable tag to help get this backported.

Re: [PATCH] PCI: Relax ACS requirement for Intel RCiEP devices.

2020-06-01 Thread Alex Williamson
On Mon, 1 Jun 2020 14:40:23 -0700
"Raj, Ashok"  wrote:

> On Mon, Jun 01, 2020 at 04:25:19PM -0500, Bjorn Helgaas wrote:
> > On Thu, May 28, 2020 at 01:57:42PM -0700, Ashok Raj wrote:  
> > > All Intel platforms guarantee that all root complex implementations
> > > must send transactions up to the IOMMU for address translation. Hence
> > > RCiEP devices with Intel Vendor ID can claim an exception for the lack
> > > of ACS support.
> > > 
> > > 
> > > 3.16 Root-Complex Peer to Peer Considerations
> > > When DMA remapping is enabled, peer-to-peer requests through the
> > > Root-Complex must be handled
> > > as follows:
> > > • The input address in the request is translated (through first-level,
> > >   second-level or nested translation) to a host physical address (HPA).
> > >   The address decoding for peer addresses must be done only on the
> > >   translated HPA. Hardware implementations are free to further limit
> > >   peer-to-peer accesses to specific host physical address regions
> > >   (or to completely disallow peer-forwarding of translated requests).
> > > • Since address translation changes the contents (address field) of
> > >   the PCI Express Transaction Layer Packet (TLP), for PCI Express
> > >   peer-to-peer requests with ECRC, the Root-Complex hardware must use
> > >   the new ECRC (re-computed with the translated address) if it
> > >   decides to forward the TLP as a peer request.
> > > • Root-ports, and multi-function root-complex integrated endpoints, may
> > >   support additional peer-to-peer control features by supporting PCI Express
> > >   Access Control Services (ACS) capability. Refer to ACS capability in
> > >   PCI Express specifications for details.
> > > 
> > > Since Linux didn't give special treatment to allow this exception, certain
> > > RCiEP MFD devices are getting grouped in a single iommu group. This
> > > doesn't permit a single device to be assigned to a guest for instance.
> > > 
> > > In one vendor system: Device 14.x were grouped in a single IOMMU group.
> > > 
> > > /sys/kernel/iommu_groups/5/devices/:00:14.0
> > > /sys/kernel/iommu_groups/5/devices/:00:14.2
> > > /sys/kernel/iommu_groups/5/devices/:00:14.3
> > > 
> > > After the patch:
> > > /sys/kernel/iommu_groups/5/devices/:00:14.0
> > > /sys/kernel/iommu_groups/5/devices/:00:14.2
> > > /sys/kernel/iommu_groups/6/devices/:00:14.3 <<< new group
> > > 
> > > 14.0 and 14.2 are integrated devices, but legacy end points.
> > > Whereas 14.3 was a PCIe compliant RCiEP.
> > > 
> > > 00:14.3 Network controller: Intel Corporation Device 9df0 (rev 30)
> > > Capabilities: [40] Express (v2) Root Complex Integrated Endpoint, MSI 00
> > > 
> > > This permits assigning this device to a guest VM.
> > > 
> > > Fixes: f096c061f552 ("iommu: Rework iommu_group_get_for_pci_dev()")
> > > Signed-off-by: Ashok Raj 
> > > To: Joerg Roedel 
> > > To: Bjorn Helgaas 
> > > Cc: linux-ker...@vger.kernel.org
> > > Cc: iommu@lists.linux-foundation.org
> > > Cc: Lu Baolu 
> > > Cc: Alex Williamson 
> > > Cc: Darrel Goeddel 
> > > Cc: Mark Scott ,
> > > Cc: Romil Sharma 
> > > Cc: Ashok Raj   
> > 
> > Tentatively applied to pci/virtualization for v5.8, thanks!
> > 
> > The spec says this handling must apply "when DMA remapping is
> > enabled".  The patch does not check whether DMA remapping is enabled.
> > 
> > Is there any case where DMA remapping is *not* enabled, and we rely on
> > this patch to tell us whether the device is isolated?  It sounds like
> > it may give the wrong answer in such a case?
> > 
> > Can you confirm that I don't need to worry about this?
> 
> I think all of this makes sense only when DMA remapping is enabled.
> Otherwise there is no enforcement for isolation. 

Yep, without an IOMMU all devices operate in the same IOVA space and we
have no isolation.  We only enable ACS when an IOMMU driver requests it
and it's only used by IOMMU code to determine IOMMU grouping of
devices.  Thanks,

Alex
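The ACS decision Alex describes ultimately reduces to a mask test: isolation is claimed only if every capability flag the IOMMU grouping code asks about is covered by what the device (or its quirk) reports as enabled, which is what pci_acs_ctrl_enabled() in the patch checks. A minimal userspace model of that check (flag values chosen to mirror the PCI ACS capability bits, but illustrative here):

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative flag values, mirroring the PCI ACS capability layout. */
#define ACS_SV 0x01 /* Source Validation */
#define ACS_RR 0x04 /* P2P Request Redirect */
#define ACS_CR 0x08 /* P2P Completion Redirect */
#define ACS_UF 0x10 /* Upstream Forwarding */

/* True only if every requested flag is in the enabled set. */
static bool acs_ctrl_enabled(uint16_t requested, uint16_t enabled)
{
    return (requested & ~enabled) == 0;
}
```

A quirk such as pci_quirk_rciep_acs() then amounts to reporting a fixed enabled set for matching devices; the grouping code supplies the requested flags and isolates the device only when the test passes.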


Re: [PATCH] PCI: Relax ACS requirement for Intel RCiEP devices.

2020-06-01 Thread Raj, Ashok
On Mon, Jun 01, 2020 at 04:25:19PM -0500, Bjorn Helgaas wrote:
> On Thu, May 28, 2020 at 01:57:42PM -0700, Ashok Raj wrote:
> > All Intel platforms guarantee that all root complex implementations
> > must send transactions up to the IOMMU for address translation. Hence
> > RCiEP devices with Intel Vendor ID can claim an exception for the lack
> > of ACS support.
> > 
> > 
> > 3.16 Root-Complex Peer to Peer Considerations
> > When DMA remapping is enabled, peer-to-peer requests through the
> > Root-Complex must be handled
> > as follows:
> > • The input address in the request is translated (through first-level,
> >   second-level or nested translation) to a host physical address (HPA).
> >   The address decoding for peer addresses must be done only on the
> >   translated HPA. Hardware implementations are free to further limit
> >   peer-to-peer accesses to specific host physical address regions
> >   (or to completely disallow peer-forwarding of translated requests).
> > • Since address translation changes the contents (address field) of
> >   the PCI Express Transaction Layer Packet (TLP), for PCI Express
> >   peer-to-peer requests with ECRC, the Root-Complex hardware must use
> >   the new ECRC (re-computed with the translated address) if it
> >   decides to forward the TLP as a peer request.
> > • Root-ports, and multi-function root-complex integrated endpoints, may
> >   support additional peer-to-peer control features by supporting PCI Express
> >   Access Control Services (ACS) capability. Refer to ACS capability in
> >   PCI Express specifications for details.
> > 
> > Since Linux didn't give special treatment to allow this exception, certain
> > RCiEP MFD devices are getting grouped in a single iommu group. This
> > doesn't permit a single device to be assigned to a guest for instance.
> > 
> > In one vendor system: Device 14.x were grouped in a single IOMMU group.
> > 
> > /sys/kernel/iommu_groups/5/devices/:00:14.0
> > /sys/kernel/iommu_groups/5/devices/:00:14.2
> > /sys/kernel/iommu_groups/5/devices/:00:14.3
> > 
> > After the patch:
> > /sys/kernel/iommu_groups/5/devices/:00:14.0
> > /sys/kernel/iommu_groups/5/devices/:00:14.2
> > /sys/kernel/iommu_groups/6/devices/:00:14.3 <<< new group
> > 
> > 14.0 and 14.2 are integrated devices, but legacy end points.
> > Whereas 14.3 was a PCIe compliant RCiEP.
> > 
> > 00:14.3 Network controller: Intel Corporation Device 9df0 (rev 30)
> > Capabilities: [40] Express (v2) Root Complex Integrated Endpoint, MSI 00
> > 
> > This permits assigning this device to a guest VM.
> > 
> > Fixes: f096c061f552 ("iommu: Rework iommu_group_get_for_pci_dev()")
> > Signed-off-by: Ashok Raj 
> > To: Joerg Roedel 
> > To: Bjorn Helgaas 
> > Cc: linux-ker...@vger.kernel.org
> > Cc: iommu@lists.linux-foundation.org
> > Cc: Lu Baolu 
> > Cc: Alex Williamson 
> > Cc: Darrel Goeddel 
> > Cc: Mark Scott ,
> > Cc: Romil Sharma 
> > Cc: Ashok Raj 
> 
> Tentatively applied to pci/virtualization for v5.8, thanks!
> 
> The spec says this handling must apply "when DMA remapping is
> enabled".  The patch does not check whether DMA remapping is enabled.
> 
> Is there any case where DMA remapping is *not* enabled, and we rely on
> this patch to tell us whether the device is isolated?  It sounds like
> it may give the wrong answer in such a case?
> 
> Can you confirm that I don't need to worry about this?  

I think all of this makes sense only when DMA remapping is enabled.
Otherwise there is no enforcement for isolation. 

Cheers,
Ashok

Re: [PATCH] PCI: Relax ACS requirement for Intel RCiEP devices.

2020-06-01 Thread Bjorn Helgaas
On Thu, May 28, 2020 at 01:57:42PM -0700, Ashok Raj wrote:
> All Intel platforms guarantee that all root complex implementations
> must send transactions up to the IOMMU for address translation. Hence
> RCiEP devices with Intel Vendor ID can claim an exception for the lack
> of ACS support.
> 
> 
> 3.16 Root-Complex Peer to Peer Considerations
> When DMA remapping is enabled, peer-to-peer requests through the
> Root-Complex must be handled
> as follows:
> • The input address in the request is translated (through first-level,
>   second-level or nested translation) to a host physical address (HPA).
>   The address decoding for peer addresses must be done only on the
>   translated HPA. Hardware implementations are free to further limit
>   peer-to-peer accesses to specific host physical address regions
>   (or to completely disallow peer-forwarding of translated requests).
> • Since address translation changes the contents (address field) of
>   the PCI Express Transaction Layer Packet (TLP), for PCI Express
>   peer-to-peer requests with ECRC, the Root-Complex hardware must use
>   the new ECRC (re-computed with the translated address) if it
>   decides to forward the TLP as a peer request.
> • Root-ports, and multi-function root-complex integrated endpoints, may
>   support additional peer-to-peer control features by supporting PCI Express
>   Access Control Services (ACS) capability. Refer to ACS capability in
>   PCI Express specifications for details.
> 
> Since Linux didn't give special treatment to allow this exception, certain
> RCiEP MFD devices are getting grouped in a single iommu group. This
> doesn't permit a single device to be assigned to a guest for instance.
> 
> In one vendor system: Device 14.x were grouped in a single IOMMU group.
> 
> /sys/kernel/iommu_groups/5/devices/:00:14.0
> /sys/kernel/iommu_groups/5/devices/:00:14.2
> /sys/kernel/iommu_groups/5/devices/:00:14.3
> 
> After the patch:
> /sys/kernel/iommu_groups/5/devices/:00:14.0
> /sys/kernel/iommu_groups/5/devices/:00:14.2
> /sys/kernel/iommu_groups/6/devices/:00:14.3 <<< new group
> 
> 14.0 and 14.2 are integrated devices, but legacy end points.
> Whereas 14.3 was a PCIe compliant RCiEP.
> 
> 00:14.3 Network controller: Intel Corporation Device 9df0 (rev 30)
> Capabilities: [40] Express (v2) Root Complex Integrated Endpoint, MSI 00
> 
> This permits assigning this device to a guest VM.
> 
> Fixes: f096c061f552 ("iommu: Rework iommu_group_get_for_pci_dev()")
> Signed-off-by: Ashok Raj 
> To: Joerg Roedel 
> To: Bjorn Helgaas 
> Cc: linux-ker...@vger.kernel.org
> Cc: iommu@lists.linux-foundation.org
> Cc: Lu Baolu 
> Cc: Alex Williamson 
> Cc: Darrel Goeddel 
> Cc: Mark Scott ,
> Cc: Romil Sharma 
> Cc: Ashok Raj 

Tentatively applied to pci/virtualization for v5.8, thanks!

The spec says this handling must apply "when DMA remapping is
enabled".  The patch does not check whether DMA remapping is enabled.

Is there any case where DMA remapping is *not* enabled, and we rely on
this patch to tell us whether the device is isolated?  It sounds like
it may give the wrong answer in such a case?

Can you confirm that I don't need to worry about this?  

> ---
> v2: Moved functionality from iommu to pci quirks - Alex Williamson
> 
>  drivers/pci/quirks.c | 15 +++
>  1 file changed, 15 insertions(+)
> 
> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> index 28c9a2409c50..63373ca0a3fe 100644
> --- a/drivers/pci/quirks.c
> +++ b/drivers/pci/quirks.c
> @@ -4682,6 +4682,20 @@ static int pci_quirk_mf_endpoint_acs(struct pci_dev *dev, u16 acs_flags)
>   PCI_ACS_CR | PCI_ACS_UF | PCI_ACS_DT);
>  }
>  
> +static int pci_quirk_rciep_acs(struct pci_dev *dev, u16 acs_flags)
> +{
> + /*
> +  * RCiEP's are required to allow p2p only on translated addresses.
> +  * Refer to Intel VT-d specification Section 3.16 Root-Complex Peer
> +  * to Peer Considerations
> +  */
> + if (pci_pcie_type(dev) != PCI_EXP_TYPE_RC_END)
> + return -ENOTTY;
> +
> + return pci_acs_ctrl_enabled(acs_flags,
> + PCI_ACS_SV | PCI_ACS_RR | PCI_ACS_CR | PCI_ACS_UF);
> +}
> +
>  static int pci_quirk_brcm_acs(struct pci_dev *dev, u16 acs_flags)
>  {
>   /*
> @@ -4764,6 +4778,7 @@ static const struct pci_dev_acs_enabled {
>   /* I219 */
>   { PCI_VENDOR_ID_INTEL, 0x15b7, pci_quirk_mf_endpoint_acs },
>   { PCI_VENDOR_ID_INTEL, 0x15b8, pci_quirk_mf_endpoint_acs },
> + { PCI_VENDOR_ID_INTEL, PCI_ANY_ID, pci_quirk_rciep_acs },
>   /* QCOM QDF2xxx root ports */
>   { PCI_VENDOR_ID_QCOM, 0x0400, pci_quirk_qcom_rp_acs },
>   { PCI_VENDOR_ID_QCOM, 0x0401, pci_quirk_qcom_rp_acs },
> -- 
> 2.7.4
> 

RE: [PATCH] iommu/arm-smmu-v3: expose numa_node attribute to users in sysfs

2020-06-01 Thread Song Bao Hua (Barry Song)



> -Original Message-
> From: Robin Murphy [mailto:robin.mur...@arm.com]
> Sent: Tuesday, June 2, 2020 1:14 AM
> To: Song Bao Hua (Barry Song) ; w...@kernel.org;
> h...@lst.de; m.szyprow...@samsung.com; iommu@lists.linux-foundation.org
> Cc: Linuxarm ; linux-arm-ker...@lists.infradead.org
> Subject: Re: [PATCH] iommu/arm-smmu-v3: expose numa_node attribute to
> users in sysfs
> 
> On 2020-05-30 10:15, Barry Song wrote:
> > As tests show, the latency of dma_unmap can increase dramatically when
> > it is called across NUMA nodes, especially across CPU packages, e.g.
> > 300ns vs 800ns while waiting for the completion of CMD_SYNC in an
> > empty command queue. The large latency caused by the remote node will
> > in turn make contention on the command queue more serious, and enlarge
> > the latency of DMA users within local NUMA nodes.
> >
> > Users might want to enforce NUMA locality based on the position of
> > the SMMU. The patch provides a minor benefit by presenting this
> > information to users directly, as they might want to know it without
> > checking the hardware spec at all.
> 
> Hmm, given that dev-to_node() is a standard driver model thing, is there
> not already some generic device property that can expose it - and if
> not, should there be? Presumably if userspace cares enough to want to
> know whereabouts in the system an IOMMU is, it probably also cares where
> the actual endpoint devices are too.
> 
> At the very least, it doesn't seem right for it to be specific to one
> single IOMMU driver.

Right now PCI devices generally get the numa_node attribute in sysfs via
drivers/pci/pci-sysfs.c:

static ssize_t numa_node_store(struct device *dev,
   struct device_attribute *attr, const char *buf,
   size_t count)
{
...

add_taint(TAINT_FIRMWARE_WORKAROUND, LOCKDEP_STILL_OK);
pci_alert(pdev, FW_BUG "Overriding NUMA node to %d.  Contact your 
vendor for updates.",
  node);

dev->numa_node = node;
return count;
}

static ssize_t numa_node_show(struct device *dev, struct device_attribute *attr,
  char *buf)
{
return sprintf(buf, "%d\n", dev->numa_node);
}
static DEVICE_ATTR_RW(numa_node);

For other devices that care about NUMA information, the specific drivers
do that themselves, for example:

drivers/dax/bus.c:  if (a == &dev_attr_numa_node.attr && 
!IS_ENABLED(CONFIG_NUMA))
drivers/dax/bus.c:  &dev_attr_numa_node.attr,
drivers/dma/idxd/sysfs.c:   &dev_attr_numa_node.attr,
drivers/hv/vmbus_drv.c: &dev_attr_numa_node.attr,
drivers/nvdimm/bus.c:   &dev_attr_numa_node.attr,
drivers/nvme/host/core.c:   &dev_attr_numa_node.attr,

The SMMU is usually a platform device; we could actually expose numa_node for
platform_device, or globally expose numa_node for the generic "device" if
people don't oppose.

Barry
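The heart of the patch quoted below is the is_visible callback: the attribute is hidden when NUMA support is compiled out or the device has no node assigned. That gating logic can be modeled in plain C (NUMA_NO_NODE and the mode constant are stand-ins for the kernel's definitions, and the sysfs plumbing is elided):

```c
#include <stdbool.h>

#define NUMA_NO_NODE (-1)
#define MODE_RO 0444 /* read-only attribute mode, as for DEVICE_ATTR_RO */

struct device {
    int numa_node;
};

static int dev_to_node(const struct device *dev)
{
    return dev->numa_node;
}

/* Return 0 to hide the attribute, or its file mode to expose it. */
static int numa_attr_visible(const struct device *dev, bool config_numa)
{
    if (!config_numa)
        return 0;
    if (dev_to_node(dev) == NUMA_NO_NODE)
        return 0;
    return MODE_RO;
}
```

A generic device-core version, as Robin suggests, would run exactly this test from a shared attribute group instead of one open-coded per IOMMU driver.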

> 
> Robin.
> 
> > Signed-off-by: Barry Song 
> > ---
> >   drivers/iommu/arm-smmu-v3.c | 40 +++++++++++++++++++++++++++++++++++++++-
> >   1 file changed, 39 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
> > index 82508730feb7..754c4d59498b 100644
> > --- a/drivers/iommu/arm-smmu-v3.c
> > +++ b/drivers/iommu/arm-smmu-v3.c
> > @@ -4021,6 +4021,44 @@ err_reset_pci_ops: __maybe_unused;
> > return err;
> >   }
> >
> > +static ssize_t numa_node_show(struct device *dev,
> > +   struct device_attribute *attr, char *buf)
> > +{
> > +   return sprintf(buf, "%d\n", dev_to_node(dev));
> > +}
> > +static DEVICE_ATTR_RO(numa_node);
> > +
> > +static umode_t arm_smmu_numa_attr_visible(struct kobject *kobj, struct attribute *a,
> > +   int n)
> > +{
> > +   struct device *dev = container_of(kobj, typeof(*dev), kobj);
> > +
> > +   if (!IS_ENABLED(CONFIG_NUMA))
> > +   return 0;
> > +
> > +   if (a == &dev_attr_numa_node.attr &&
> > +   dev_to_node(dev) == NUMA_NO_NODE)
> > +   return 0;
> > +
> > +   return a->mode;
> > +}
> > +
> > +static struct attribute *arm_smmu_dev_attrs[] = {
> > +   &dev_attr_numa_node.attr,
> > +   NULL
> > +};
> > +
> > +static struct attribute_group arm_smmu_dev_attrs_group = {
> > +   .attrs  = arm_smmu_dev_attrs,
> > +   .is_visible = arm_smmu_numa_attr_visible,
> > +};
> > +
> > +
> > +static const struct attribute_group *arm_smmu_dev_attrs_groups[] = {
> > +   &arm_smmu_dev_attrs_group,
> > +   NULL,
> > +};
> > +
> >   static int arm_smmu_device_probe(struct platform_device *pdev)
> >   {
> > int irq, ret;
> > @@ -4097,7 +4135,7 @@ static int arm_smmu_device_probe(struct platform_device *pdev)
> > return ret;
> >
> > /* And we're up. Go go go! */
> > -   ret = iommu_device_sysfs_add(&smmu->iommu, dev, NULL,
> > +   ret = iommu_device_sysfs_add(&smmu->iommu, dev, arm_smmu_dev_attrs_groups,
> >  "smmu3.%pa", &ioaddr);
> > if (ret)
> >   

Re: [PATCH 0/2] Introduce PCI_FIXUP_IOMMU

2020-06-01 Thread Bjorn Helgaas
On Thu, May 28, 2020 at 09:33:44AM +0200, Joerg Roedel wrote:
> On Wed, May 27, 2020 at 01:18:42PM -0500, Bjorn Helgaas wrote:
> > Is this slowdown significant?  We already iterate over every device
> > when applying PCI_FIXUP_FINAL quirks, so if we used the existing
> > PCI_FIXUP_FINAL, we wouldn't be adding a new loop.  We would only be
> > adding two more iterations to the loop in pci_do_fixups() that tries
> > to match quirks against the current device.  I doubt that would be a
> > measurable slowdown.
> 
> I don't know how significant it is, but I remember people complaining
> about adding new PCI quirks because it takes too long for them to run
> them all. That was in the discussion about the quirk disabling ATS on
> AMD Stoney systems.
> 
> So it probably depends on how many PCI devices are in the system whether
> it causes any measureable slowdown.

I found this [1] from Paul Menzel, which was a slowdown caused by
quirk_usb_early_handoff().  I think the real problem is individual
quirks that take a long time.

The PCI_FIXUP_IOMMU things we're talking about should be fast, and of
course, they're only run for matching devices anyway.  So I'd rather
keep them as PCI_FIXUP_FINAL than add a whole new phase.

Bjorn

[1] 
https://lore.kernel.org/linux-pci/b1533fd5-1fae-7256-9597-36d3d5de9...@molgen.mpg.de/
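Bjorn's cost argument is easier to see from the shape of the fixup pass: one linear walk per phase over a table of (vendor, device, hook) entries, with PCI_ANY_ID acting as a wildcard, so a new entry costs one comparison per device rather than a whole new pass. A toy model of that matching loop (names and structure illustrative, not the kernel's actual pci_do_fixups()):

```c
#include <stddef.h>

#define ANY_ID 0xffff /* wildcard, like PCI_ANY_ID */

struct pci_dev {
    unsigned short vendor, device;
    int quirk_calls; /* counts hook invocations, for the model */
};

struct fixup {
    unsigned short vendor, device;
    void (*hook)(struct pci_dev *);
};

static void count_call(struct pci_dev *dev)
{
    dev->quirk_calls++;
}

/* One matching pass over the table, as done once per fixup phase. */
static void do_fixups(struct pci_dev *dev, const struct fixup *table, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if ((table[i].vendor == dev->vendor || table[i].vendor == ANY_ID) &&
            (table[i].device == dev->device || table[i].device == ANY_ID))
            table[i].hook(dev);
    }
}
```

The slow cases on the list were individual hooks doing heavy work (like the USB handoff), not the table walk itself, which is why adding entries to the existing PCI_FIXUP_FINAL pass is cheap.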


Re: [PATCH] iommu/amd: Fix event counter availability check

2020-06-01 Thread Alexander Monakov
> Instead of blindly moving the code around to a spot that would just work,
> I am trying to understand what might be required here. In this case,
> the init_device_table_dma() should not be needed. I suspect it's the IOMMU
> invalidate-all command that's also needed here.
> 
> I'm also checking with the HW and BIOS team. Meanwhile, could you please give
> the following change a try:

Yes, this also fixes the problem for me.
Alexander

> 
> diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
> index 5b81fd16f5fa..b07458cc1b0b 100644
> --- a/drivers/iommu/amd_iommu_init.c
> +++ b/drivers/iommu/amd_iommu_init.c
> @@ -1875,6 +1875,8 @@ static int __init amd_iommu_init_pci(void)
> ret = iommu_init_pci(iommu);
> if (ret)
> break;
> +   iommu_flush_all_caches(iommu);
> +   init_iommu_perf_ctr(iommu);
> }
> 
> /*
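For background on what "event counter availability" means here: the driver decides whether to advertise the performance counters with a write/read-back probe, writing a known value to a counter register and checking that it sticks. A userspace model of that probe, with the MMIO bank simulated by a plain struct and "whether writes stick" as an explicit input (all names illustrative, not the AMD driver's):

```c
#include <stdbool.h>
#include <stdint.h>

struct perf_bank {
    uint64_t counter[8];
    bool writes_stick; /* models whether HW bring-up has enabled the bank */
};

static void bank_write(struct perf_bank *b, int idx, uint64_t val)
{
    if (b->writes_stick)
        b->counter[idx] = val;
}

static uint64_t bank_read(const struct perf_bank *b, int idx)
{
    return b->counter[idx];
}

/* Probe: save, write a test pattern, read back, compare, restore. */
static bool counters_writable(struct perf_bank *b)
{
    const uint64_t pattern = 0xABCDULL;
    uint64_t saved = bank_read(b, 0);
    bool ok;

    bank_write(b, 0, pattern);
    ok = (bank_read(b, 0) == pattern);
    bank_write(b, 0, saved);
    return ok;
}
```

Assuming the thread's diagnosis is right, the ordering matters precisely because of this probe shape: run it before the flush that brings the IOMMU fully up and writes don't stick yet, so the check wrongly reports the counters unavailable.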


Re: [PATCH] iommu/amd: Fix event counter availability check

2020-06-01 Thread Suravee Suthikulpanit

Alexander

On 6/1/20 4:01 PM, Alexander Monakov wrote:

On Mon, 1 Jun 2020, Suravee Suthikulpanit wrote:


Moving init_iommu_perf_ctr just after iommu_flush_all_caches resolves
the issue. This is the earliest point in amd_iommu_init_pci where the
call succeeds on my laptop.


According to your description, it should just need to be anywhere after
pci_enable_device() is called for the IOMMU device, shouldn't it? So, on your
system, what if we just move the init_iommu_perf_ctr() here:


No, this doesn't work, as I already said in the paragraph you are responding
to. See my last sentence in the quoted part.

So the implication is init_device_table_dma together with subsequent cache
flush is also setting up something that is necessary for counters to be
writable.

Alexander



Instead of blindly moving the code around to a spot that would just work,
I am trying to understand what might be required here. In this case,
the init_device_table_dma() should not be needed. I suspect it's the IOMMU
invalidate all command that's also needed here.

I'm also checking with the HW and BIOS team. Meanwhile, could you please give
the following change a try:

diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
index 5b81fd16f5fa..b07458cc1b0b 100644
--- a/drivers/iommu/amd_iommu_init.c
+++ b/drivers/iommu/amd_iommu_init.c
@@ -1875,6 +1875,8 @@ static int __init amd_iommu_init_pci(void)
ret = iommu_init_pci(iommu);
if (ret)
break;
+   iommu_flush_all_caches(iommu);
+   init_iommu_perf_ctr(iommu);
}

/*
--
2.17.1

Thanks,
Suravee


[PATCH] iommu: Optimize the error handling when allocating an iommu group

2020-06-01 Thread Baolin Wang
Optimize the error handling to free the resources correctly when we
fail to allocate an iommu group.

Signed-off-by: Baolin Wang 
---
 drivers/iommu/iommu.c | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 03d6a26..ac91024 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -529,12 +529,18 @@ struct iommu_group *iommu_group_alloc(void)
 
ret = iommu_group_create_file(group,
  &iommu_group_attr_reserved_regions);
-   if (ret)
+   if (ret) {
+   kobject_put(group->devices_kobj);
return ERR_PTR(ret);
+   }
 
ret = iommu_group_create_file(group, &iommu_group_attr_type);
-   if (ret)
+   if (ret) {
+   iommu_group_remove_file(group,
+   &iommu_group_attr_reserved_regions);
+   kobject_put(group->devices_kobj);
return ERR_PTR(ret);
+   }
 
pr_debug("Allocated group %d\n", group->id);
 
-- 
1.8.3.1



Re: [PATCH v2 00/33] iommu: Move iommu_group setup to IOMMU core code

2020-06-01 Thread Jerry Snitselaar

On Mon Jun 01 20, Jerry Snitselaar wrote:

On Fri May 29 20, Jerry Snitselaar wrote:

On Tue Apr 14 20, Joerg Roedel wrote:

Hi,

here is the second version of this patch-set. The first version with
some more introductory text can be found here:

https://lore.kernel.org/lkml/20200407183742.4344-1-j...@8bytes.org/

Changes v1->v2:

* Rebased to v5.7-rc1

* Re-wrote the arm-smmu changes as suggested by Robin Murphy

* Re-worked the Exynos patches to hopefully not break the
  driver anymore

* Fixed a missing mutex_unlock() reported by Marek Szyprowski,
  thanks for that.

There is also a git-branch available with these patches applied:


https://git.kernel.org/pub/scm/linux/kernel/git/joro/linux.git/log/?h=iommu-probe-device-v2

Please review.

Thanks,

Joerg

Joerg Roedel (32):
iommu: Move default domain allocation to separate function
iommu/amd: Implement iommu_ops->def_domain_type call-back
iommu/vt-d: Wire up iommu_ops->def_domain_type
iommu/amd: Remove dma_mask check from check_device()
iommu/amd: Return -ENODEV in add_device when device is not handled by
 IOMMU
iommu: Add probe_device() and remove_device() call-backs
iommu: Move default domain allocation to iommu_probe_device()
iommu: Keep a list of allocated groups in __iommu_probe_device()
iommu: Move new probe_device path to separate function
iommu: Split off default domain allocation from group assignment
iommu: Move iommu_group_create_direct_mappings() out of
 iommu_group_add_device()
iommu: Export bus_iommu_probe() and make is safe for re-probing
iommu/amd: Remove dev_data->passthrough
iommu/amd: Convert to probe/release_device() call-backs
iommu/vt-d: Convert to probe/release_device() call-backs
iommu/arm-smmu: Convert to probe/release_device() call-backs
iommu/pamu: Convert to probe/release_device() call-backs
iommu/s390: Convert to probe/release_device() call-backs
iommu/virtio: Convert to probe/release_device() call-backs
iommu/msm: Convert to probe/release_device() call-backs
iommu/mediatek: Convert to probe/release_device() call-backs
iommu/mediatek-v1 Convert to probe/release_device() call-backs
iommu/qcom: Convert to probe/release_device() call-backs
iommu/rockchip: Convert to probe/release_device() call-backs
iommu/tegra: Convert to probe/release_device() call-backs
iommu/renesas: Convert to probe/release_device() call-backs
iommu/omap: Remove orphan_dev tracking
iommu/omap: Convert to probe/release_device() call-backs
iommu/exynos: Use first SYSMMU in controllers list for IOMMU core
iommu/exynos: Convert to probe/release_device() call-backs
iommu: Remove add_device()/remove_device() code-paths
iommu: Unexport iommu_group_get_for_dev()

Sai Praneeth Prakhya (1):
iommu: Add def_domain_type() callback in iommu_ops

drivers/iommu/amd_iommu.c   |  97 
drivers/iommu/amd_iommu_types.h |   1 -
drivers/iommu/arm-smmu-v3.c |  38 +--
drivers/iommu/arm-smmu.c|  39 ++--
drivers/iommu/exynos-iommu.c|  24 +-
drivers/iommu/fsl_pamu_domain.c |  22 +-
drivers/iommu/intel-iommu.c |  68 +-
drivers/iommu/iommu.c   | 393 +---
drivers/iommu/ipmmu-vmsa.c  |  60 ++---
drivers/iommu/msm_iommu.c   |  34 +--
drivers/iommu/mtk_iommu.c   |  24 +-
drivers/iommu/mtk_iommu_v1.c|  50 ++--
drivers/iommu/omap-iommu.c  |  99 ++--
drivers/iommu/qcom_iommu.c  |  24 +-
drivers/iommu/rockchip-iommu.c  |  26 +--
drivers/iommu/s390-iommu.c  |  22 +-
drivers/iommu/tegra-gart.c  |  24 +-
drivers/iommu/tegra-smmu.c  |  31 +--
drivers/iommu/virtio-iommu.c|  41 +---
include/linux/iommu.h   |  21 +-
20 files changed, 533 insertions(+), 605 deletions(-)

--
2.17.1




Hi Joerg,

With this patchset, I have an epyc system where if I boot with
iommu=nopt and force a dump I will see some io page faults for a nic
on the system. The vmcore is harvested and the system reboots. I
haven't reproduced it on other systems yet, but without the patchset I
don't see the io page faults during the kdump.

Regards,
Jerry


I just hit an issue on a separate intel based system (kdump iommu=nopt),
where it panics during intel_iommu_attach_device(), in is_aux_domain(),
because device_domain_info is DEFER_DEVICE_DOMAIN_INFO. That doesn't
get set to a valid address until the domain_add_dev_info() call.

Is it as simple as the following?

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 29d3940847d3..f1bbeed46a4c 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -5053,8 +5053,8 @@ is_aux_domain(struct device *dev, struct iommu_domain 
*domain)
{
   struct device_domain_info *info = dev->archdata.iommu;
-   return info && info->auxd_enabled &&
-   domain->type == IOMMU_DOMAIN_UNMANA

Re: [PATCH] iommu/arm-smmu-v3: expose numa_node attribute to users in sysfs

2020-06-01 Thread Robin Murphy

On 2020-05-30 10:15, Barry Song wrote:

As tests show, the latency of dma_unmap can increase dramatically when
it is called across NUMA nodes, especially across CPU packages, e.g.
300ns vs 800ns while waiting for the completion of CMD_SYNC in an
empty command queue. The large latency caused by the remote node will
in turn make contention on the command queue more serious, and enlarge
the latency of DMA users within the local NUMA node.

Users might intend to enforce NUMA locality taking the position of the
SMMU into consideration. The patch provides a minor benefit by presenting
this information to users directly, as they might want to know it without
checking the hardware spec at all.


Hmm, given that dev_to_node() is a standard driver model thing, is there 
not already some generic device property that can expose it - and if 
not, should there be? Presumably if userspace cares enough to want to 
know whereabouts in the system an IOMMU is, it probably also cares where 
the actual endpoint devices are too.


At the very least, it doesn't seem right for it to be specific to one 
single IOMMU driver.


Robin.


Signed-off-by: Barry Song 
---
  drivers/iommu/arm-smmu-v3.c | 40 -
  1 file changed, 39 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index 82508730feb7..754c4d59498b 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -4021,6 +4021,44 @@ err_reset_pci_ops: __maybe_unused;
return err;
  }
  
+static ssize_t numa_node_show(struct device *dev,

+   struct device_attribute *attr, char *buf)
+{
+   return sprintf(buf, "%d\n", dev_to_node(dev));
+}
+static DEVICE_ATTR_RO(numa_node);
+
+static umode_t arm_smmu_numa_attr_visible(struct kobject *kobj, struct 
attribute *a,
+   int n)
+{
+   struct device *dev = container_of(kobj, typeof(*dev), kobj);
+
+   if (!IS_ENABLED(CONFIG_NUMA))
+   return 0;
+
+   if (a == &dev_attr_numa_node.attr &&
+   dev_to_node(dev) == NUMA_NO_NODE)
+   return 0;
+
+   return a->mode;
+}
+
+static struct attribute *arm_smmu_dev_attrs[] = {
+   &dev_attr_numa_node.attr,
+   NULL
+};
+
+static struct attribute_group arm_smmu_dev_attrs_group = {
+   .attrs  = arm_smmu_dev_attrs,
+   .is_visible = arm_smmu_numa_attr_visible,
+};
+
+
+static const struct attribute_group *arm_smmu_dev_attrs_groups[] = {
+   &arm_smmu_dev_attrs_group,
+   NULL,
+};
+
  static int arm_smmu_device_probe(struct platform_device *pdev)
  {
int irq, ret;
@@ -4097,7 +4135,7 @@ static int arm_smmu_device_probe(struct platform_device 
*pdev)
return ret;
  
  	/* And we're up. Go go go! */

-   ret = iommu_device_sysfs_add(&smmu->iommu, dev, NULL,
+   ret = iommu_device_sysfs_add(&smmu->iommu, dev, 
arm_smmu_dev_attrs_groups,
 "smmu3.%pa", &ioaddr);
if (ret)
return ret;




RE: [PATCH v7 21/24] iommu/arm-smmu-v3: Add stall support for platform devices

2020-06-01 Thread Shameerali Kolothum Thodi
Hi Jean,

> -Original Message-
> From: iommu [mailto:iommu-boun...@lists.linux-foundation.org] On Behalf Of
> Jean-Philippe Brucker
> Sent: 19 May 2020 18:55
> To: iommu@lists.linux-foundation.org; devicet...@vger.kernel.org;
> linux-arm-ker...@lists.infradead.org; linux-...@vger.kernel.org;
> linux...@kvack.org
> Cc: fenghua...@intel.com; kevin.t...@intel.com; j...@ziepe.ca;
> catalin.mari...@arm.com; robin.mur...@arm.com; h...@infradead.org;
> zhangfei@linaro.org; Jean-Philippe Brucker ;
> felix.kuehl...@amd.com; w...@kernel.org; christian.koe...@amd.com
> Subject: [PATCH v7 21/24] iommu/arm-smmu-v3: Add stall support for platform
> devices
> 
> The SMMU provides a Stall model for handling page faults in platform
> devices. It is similar to PCI PRI, but doesn't require devices to have
> their own translation cache. Instead, faulting transactions are parked
> and the OS is given a chance to fix the page tables and retry the
> transaction.
> 
> Enable stall for devices that support it (opt-in by firmware). When an
> event corresponds to a translation error, call the IOMMU fault handler.
> If the fault is recoverable, it will call us back to terminate or
> continue the stall.
> 
> Signed-off-by: Jean-Philippe Brucker 
> ---
>  drivers/iommu/Kconfig   |   1 +
>  include/linux/iommu.h   |   2 +
>  drivers/iommu/arm-smmu-v3.c | 284
> ++--
>  drivers/iommu/of_iommu.c|   5 +-
>  4 files changed, 281 insertions(+), 11 deletions(-)
> 

[...]
 
> +static int arm_smmu_page_response(struct device *dev,
> +   struct iommu_fault_event *unused,
> +   struct iommu_page_response *resp)
> +{
> + struct arm_smmu_cmdq_ent cmd = {0};
> + struct arm_smmu_master *master = dev_iommu_priv_get(dev);
> + int sid = master->streams[0].id;
> +
> + if (master->stall_enabled) {
> + cmd.opcode  = CMDQ_OP_RESUME;
> + cmd.resume.sid  = sid;
> + cmd.resume.stag = resp->grpid;
> + switch (resp->code) {
> + case IOMMU_PAGE_RESP_INVALID:
> + case IOMMU_PAGE_RESP_FAILURE:
> + cmd.resume.resp = CMDQ_RESUME_0_RESP_ABORT;
> + break;
> + case IOMMU_PAGE_RESP_SUCCESS:
> + cmd.resume.resp = CMDQ_RESUME_0_RESP_RETRY;
> + break;
> + default:
> + return -EINVAL;
> + }
> + } else {
> + /* TODO: insert PRI response here */
> + return -ENODEV;
> + }
> +
> + arm_smmu_cmdq_issue_cmd(master->smmu, &cmd);
> + /*
> +  * Don't send a SYNC, it doesn't do anything for RESUME or PRI_RESP.
> +  * RESUME consumption guarantees that the stalled transaction will be
> +  * terminated... at some point in the future. PRI_RESP is fire and
> +  * forget.
> +  */
> +
> + return 0;
> +}
> +
>  /* Context descriptor manipulation functions */
>  static void arm_smmu_tlb_inv_asid(struct arm_smmu_device *smmu, u16
> asid)
>  {
> @@ -1762,8 +1846,7 @@ static int arm_smmu_write_ctx_desc(struct
> arm_smmu_domain *smmu_domain,
>   FIELD_PREP(CTXDESC_CD_0_ASID, cd->asid) |
>   CTXDESC_CD_0_V;
> 
> - /* STALL_MODEL==0b10 && CD.S==0 is ILLEGAL */
> - if (smmu->features & ARM_SMMU_FEAT_STALL_FORCE)
> + if (smmu_domain->stall_enabled)
>   val |= CTXDESC_CD_0_S;
>   }
> 
> @@ -2171,7 +2254,7 @@ static void arm_smmu_write_strtab_ent(struct
> arm_smmu_master *master, u32 sid,
>FIELD_PREP(STRTAB_STE_1_STRW, strw));
> 
>   if (smmu->features & ARM_SMMU_FEAT_STALLS &&
> -!(smmu->features & ARM_SMMU_FEAT_STALL_FORCE))
> + !master->stall_enabled)
>   dst[1] |= cpu_to_le64(STRTAB_STE_1_S1STALLD);
> 
>   val |= (s1_cfg->cdcfg.cdtab_dma & STRTAB_STE_0_S1CTXPTR_MASK)
> |
> @@ -2248,7 +2331,6 @@ static int arm_smmu_init_l2_strtab(struct
> arm_smmu_device *smmu, u32 sid)
>   return 0;
>  }
> 
> -__maybe_unused
>  static struct arm_smmu_master *
>  arm_smmu_find_master(struct arm_smmu_device *smmu, u32 sid)
>  {
> @@ -2275,23 +2357,123 @@ arm_smmu_find_master(struct
> arm_smmu_device *smmu, u32 sid)
>  }
> 
>  /* IRQ and event handlers */
> +static int arm_smmu_handle_evt(struct arm_smmu_device *smmu, u64 *evt)
> +{
> + int ret;
> + u32 perm = 0;
> + struct arm_smmu_master *master;
> + bool ssid_valid = evt[0] & EVTQ_0_SSV;
> + u8 type = FIELD_GET(EVTQ_0_ID, evt[0]);
> + u32 sid = FIELD_GET(EVTQ_0_SID, evt[0]);
> + struct iommu_fault_event fault_evt = { };
> + struct iommu_fault *flt = &fault_evt.fault;
> +
> + /* Stage-2 is always pinned at the moment */
> + if (evt[1] & EVTQ_1_S2)
> + return -EFAULT;
> +
> + master = arm_smmu_

Re: [PATCH] iommu: Improve exception handling in iommu_group_alloc()

2020-06-01 Thread Markus Elfring
> Optimize the error handling to free the resources correctly when
> failed to allocate an iommu group.

* I would not categorise the desired completion of exception handling
  as a software optimisation.

* Would you like to add the tag “Fixes” to the commit message?

* I suggest to avoid the specification of duplicate function calls.
  Will it be helpful to add a few jump targets?
  
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/process/coding-style.rst?id=3d77e6a8804abcc0504c904bd6e5cdf3a5cf8162#n455

Regards,
Markus

[PATCH RFC 2/2] iommu/arm-smmu-v3: Remove cmpxchg() in arm_smmu_cmdq_issue_cmdlist()

2020-06-01 Thread John Garry
It has been shown that the cmpxchg() for finding space in the cmdq can
be a bottleneck:
- For more CPUs contending for the cmdq, cmpxchg() will fail more often.
- Since the software-maintained cons pointer is updated on the same 64b
  memory region, the chance of cmpxchg() failure increases again.

The cmpxchg() is removed as part of 2 related changes:
- If a CMD_SYNC is always issued for a batch, the cmdq can - in theory -
  never fill, since we always wait for a CMD_SYNC to be consumed. We must
  issue the CMD_SYNC so that a CPU will be always limited to issuing
  max_cmd_per_batch commands. Generally for DMA unmap ops, a CMD_SYNC
  is always issued anyway.
  As such, the cmdq locking is removed, and we no longer maintain cons
  in software (this needs to be revised for !MSI support).
- Update prod and cmdq owner in a single operation. For this, we count the
  prod and owner in separate regions in arm_smmu_ll_queue.prod.
  As with simple binary counting, once the prod+wrap fields overflow, they
  will zero. They will overflow into "owner" region, but this is ok as we
  have accounted for the extra value.
  As for the "owner", we now count this value, instead of setting a flag.
  Similar to before, once the owner has finished gathering, it will clear
  this mask. As such, a CPU declares itself as the "owner" when it reads
  zero for this field. This zeroing will also clear possible overflow in
  wrap+prod region.

Signed-off-by: John Garry 
---
 drivers/iommu/arm-smmu-v3.c | 58 +++--
 1 file changed, 14 insertions(+), 44 deletions(-)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index e607ab5a4cbd..d6c7d82f9cf8 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -1375,7 +1375,7 @@ static int arm_smmu_cmdq_issue_cmdlist(struct 
arm_smmu_device *smmu,
   u64 *cmds, int n, bool sync)
 {
u64 cmd_sync[CMDQ_ENT_DWORDS];
-   u32 prod;
+   u32 prod, prodx;
unsigned long flags;
bool owner;
struct arm_smmu_cmdq *cmdq = &smmu->cmdq;
@@ -1383,33 +1383,21 @@ static int arm_smmu_cmdq_issue_cmdlist(struct 
arm_smmu_device *smmu,
.max_n_shift = cmdq->q.llq.max_n_shift,
}, head = llq;
int ret = 0;
+   u32 owner_val = 1 << cmdq->q.llq.owner_count_shift;
+   u32 prod_mask = GENMASK(cmdq->q.llq.max_n_shift, 0);
+   u32 owner_mask = GENMASK(30, cmdq->q.llq.owner_count_shift);
+
+   /* always issue a CMD_SYNC TODO: fixup callers for this */
+   sync = true;
 
/* 1. Allocate some space in the queue */
local_irq_save(flags);
-   llq.val = READ_ONCE(cmdq->q.llq.val);
-   do {
-   u64 old;
-
-   while (!queue_has_space(&llq, n + sync)) {
-   local_irq_restore(flags);
-   if (arm_smmu_cmdq_poll_until_not_full(smmu, &llq))
-   dev_err_ratelimited(smmu->dev, "CMDQ 
timeout\n");
-   local_irq_save(flags);
-   }
 
-   head.cons = llq.cons;
-   head.prod = queue_inc_prod_n(&llq, n + sync) |
-CMDQ_PROD_OWNED_FLAG;
+   prodx = atomic_fetch_add(n + sync + owner_val, 
&cmdq->q.llq.atomic.prod);
 
-   old = cmpxchg_relaxed(&cmdq->q.llq.val, llq.val, head.val);
-   if (old == llq.val)
-   break;
-
-   llq.val = old;
-   } while (1);
-   owner = !(llq.prod & CMDQ_PROD_OWNED_FLAG);
-   head.prod &= ~CMDQ_PROD_OWNED_FLAG;
-   llq.prod &= ~CMDQ_PROD_OWNED_FLAG;
+   owner = !(prodx & owner_mask);
+   llq.prod = prod_mask & prodx;
+   head.prod = queue_inc_prod_n(&llq, n + sync);
 
/*
 * 2. Write our commands into the queue
@@ -1420,14 +1408,6 @@ static int arm_smmu_cmdq_issue_cmdlist(struct 
arm_smmu_device *smmu,
prod = queue_inc_prod_n(&llq, n);
arm_smmu_cmdq_build_sync_cmd(cmd_sync, smmu, prod);
queue_write(Q_ENT(&cmdq->q, prod), cmd_sync, CMDQ_ENT_DWORDS);
-
-   /*
-* In order to determine completion of our CMD_SYNC, we must
-* ensure that the queue can't wrap twice without us noticing.
-* We achieve that by taking the cmdq lock as shared before
-* marking our slot as valid.
-*/
-   arm_smmu_cmdq_shared_lock(cmdq);
}
 
/* 3. Mark our slots as valid, ensuring commands are visible first */
@@ -1439,11 +1419,10 @@ static int arm_smmu_cmdq_issue_cmdlist(struct 
arm_smmu_device *smmu,
/* a. Wait for previous owner to finish */
atomic_cond_rea

[PATCH RFC 0/2] iommu/arm-smmu-v3: Improve cmdq lock efficiency

2020-06-01 Thread John Garry
As mentioned in [0], the CPU may consume many cycles processing
arm_smmu_cmdq_issue_cmdlist(). One issue we find is the cmpxchg() loop to
get space on the queue takes approx 25% of the cycles for this function.

The cmpxchg() is removed as follows:
- We assume that the cmdq can never fill, with changes to limit the
  batch size (where necessary) and by always issuing a CMD_SYNC for a
  batch. We need to do this since we no longer maintain the cons value
  in software, and we cannot deal properly with running out of space.
- Replace cmpxchg() with atomic inc operation, to maintain the prod
  and owner values.

Early experiments have shown that we may see a 25% boost in throughput
IOPS for my NVMe test with these changes. And some CPUs, which were
loaded at ~55%, now see a ~45% load.

So, even though the changes are incomplete and other parts of the driver
will need fixing up (and it looks maybe broken for !MSI support), the
performance boost seen would seem to be worth the effort of exploring
this.

Comments requested please.

Thanks

[0] 
https://lore.kernel.org/linux-iommu/b926444035e5e2439431908e3842afd24b8...@dggemi525-mbs.china.huawei.com/T/#ma02e301c38c3e94b7725e685757c27e39c7cbde3

John Garry (2):
  iommu/arm-smmu-v3: Calculate bits for prod and owner
  iommu/arm-smmu-v3: Remove cmpxchg() in arm_smmu_cmdq_issue_cmdlist()

 drivers/iommu/arm-smmu-v3.c | 92 +++--
 1 file changed, 47 insertions(+), 45 deletions(-)

-- 
2.16.4



[PATCH RFC 1/2] iommu/arm-smmu-v3: Calculate bits for prod and owner

2020-06-01 Thread John Garry
Since arm_smmu_ll_queue.prod will be used for counting the "owner" value
and also as the HW prod pointer, calculate how many bits are available
for, and used by, each.

This is based on the number of possible CPUs in the system. And we require
that each CPU can issue a minimum of 2 commands per batch - 1 x CMD_SYNC
and at least 1 x other.

Ignoring limits of HW max_n_shift and HW cmdq memory allocation, approx 16K
is the max supported CPUs. For this, max_n_shift would be 15.

Signed-off-by: John Garry 
---
 drivers/iommu/arm-smmu-v3.c | 34 +-
 1 file changed, 33 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index 82508730feb7..e607ab5a4cbd 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -530,6 +530,8 @@ struct arm_smmu_ll_queue {
u8  __pad[SMP_CACHE_BYTES];
} cacheline_aligned_in_smp;
u32 max_n_shift;
+   u32 max_cmd_per_batch;
+   u32 owner_count_shift;
 };
 
 struct arm_smmu_queue {
@@ -1512,7 +1514,10 @@ static void arm_smmu_cmdq_batch_add(struct 
arm_smmu_device *smmu,
struct arm_smmu_cmdq_batch *cmds,
struct arm_smmu_cmdq_ent *cmd)
 {
-   if (cmds->num == CMDQ_BATCH_ENTRIES) {
+   struct arm_smmu_cmdq *q = &smmu->cmdq;
+   struct arm_smmu_ll_queue *llq = &q->q.llq;
+
+   if (cmds->num == llq->max_cmd_per_batch) {
arm_smmu_cmdq_issue_cmdlist(smmu, cmds->cmds, cmds->num, false);
cmds->num = 0;
}
@@ -3156,8 +3161,24 @@ static int arm_smmu_init_one_queue(struct 
arm_smmu_device *smmu,
   unsigned long cons_off,
   size_t dwords, const char *name)
 {
+   int cpus = num_possible_cpus();
size_t qsz;
 
+   /*
+* We can get the number of bits required for owner counting by
+* log2(nr possible cpus) + 1, but we have to take into account that the
+* wrap+prod could overflow before the owner zeroes, so add 1
+* more (to cpus) for bits_for_cmdq_owner calculation.
+*/
+   int bits_for_cmdq_owner = ilog2(cpus + 1) + 1;
+   /* 1-bit for overflow, 1-bit for wrap */
+   int bits_available_for_prod = 32 - 2 - bits_for_cmdq_owner;
+   int entries_for_prod;
+
+   if (bits_available_for_prod < 1) /* this would be insane - how many 
CPUs? */
+   return -ENOMEM;
+
+   q->llq.max_n_shift = min_t(int, q->llq.max_n_shift, 
bits_available_for_prod);
do {
qsz = ((1 << q->llq.max_n_shift) * dwords) << 3;
q->base = dmam_alloc_coherent(smmu->dev, qsz, &q->base_dma,
@@ -3167,6 +3188,17 @@ static int arm_smmu_init_one_queue(struct 
arm_smmu_device *smmu,
 
q->llq.max_n_shift--;
} while (1);
+   entries_for_prod = 1 << q->llq.max_n_shift;
+
+   /*
+* We need at least 2 commands in a batch (1 x CMD_SYNC and 1 x 
whatever else).
+* Assuming orig max_n_shift >= 17, this would mean ~16K CPUs max.
+*/
+   if (entries_for_prod < 2 * cpus)
+   return -ENOMEM;
+
+   q->llq.max_cmd_per_batch = min_t(u32, entries_for_prod / cpus, 
CMDQ_BATCH_ENTRIES);
+   q->llq.owner_count_shift = q->llq.max_n_shift + 1;
 
if (!q->base) {
dev_err(smmu->dev,
-- 
2.16.4



[PATCH] iommu/arm-smmu-v3: allocate the memory of queues in local numa node

2020-06-01 Thread Song Bao Hua (Barry Song)
> From: Song Bao Hua (Barry Song)
> Sent: Monday, June 1, 2020 11:32 PM
> To: h...@lst.de; m.szyprow...@samsung.com; robin.mur...@arm.com; w...@kernel.org
> Cc: linux-arm-ker...@lists.infradead.org; iommu@lists.linux-foundation.org; Linuxarm
> Subject: [PATCH] iommu/arm-smmu-v3: allocate the memory of queues in
> local numa node
>
> dmam_alloc_coherent() will usually allocate memory from the default CMA.
> [...]
> Tests show, after sending CMD_SYNC in an empty command queue, on Node2,
> without this patch, it takes 550ns to wait for the completion of
> CMD_SYNC; with this patch, it takes 250ns to wait for the completion
> of CMD_SYNC.

Sorry for missing the RFC in the subject.
For the tested platform, hardware will help the sync between cpu and smmu.
So there are no sync operations. So please consider this patch as an RFC.
This is only for the concept.

The other option to fix this is creating a per-numa CMA area for smmu and
assigning the per-numa cma_area to SMMU.
Maybe cma_declare_contiguous_nid() used by mm/hugetlb.c can be used by
AARCH64/SMMU.

Or we can completely change CMA to create default per-numa CMA like this:

 struct page *dma_alloc_contiguous(struct device *dev, size_t size, gfp_t gfp)
 {
        size_t count = size >> PAGE_SHIFT;
        struct page *page = NULL;
        struct cma *cma = NULL;
+       int nid = dev_to_node(dev);

        if (dev && dev->cma_area)
                cma = dev->cma_area;
        else if (count > 1)
-               cma = dma_contiguous_default_area;
+               cma = dma_contiguous_default_areas[nid];

        ...
        return page;
 }

Then there is no necessity to assign per-numa cma_area to smmu->dev.cma_area.

-barry

[PATCH] iommu/arm-smmu-v3: allocate the memory of queues in local numa node

2020-06-01 Thread Barry Song
dmam_alloc_coherent() will usually allocate memory from the default CMA. For
a common arm64 defconfig without reserved memory in device tree, there is only
one CMA close to address 0.
dma_alloc_contiguous() will allocate memory without any idea of NUMA, and
smmu has no customized per-numa cma_area.
struct page *dma_alloc_contiguous(struct device *dev, size_t size, gfp_t gfp)
{
size_t count = size >> PAGE_SHIFT;
struct page *page = NULL;
struct cma *cma = NULL;

if (dev && dev->cma_area)
cma = dev->cma_area;
else if (count > 1)
cma = dma_contiguous_default_area;

...
return page;
}

If there are N NUMA nodes, N-1 nodes will put command/evt queues etc. in
a remote node that the default CMA belongs to, probably node 0. Tests
show that, after sending CMD_SYNC in an empty command queue on Node2,
it takes 550ns to wait for the completion of CMD_SYNC without this
patch, and 250ns with this patch.

Signed-off-by: Barry Song 
---
 drivers/iommu/arm-smmu-v3.c | 63 -
 1 file changed, 48 insertions(+), 15 deletions(-)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index 82508730feb7..58295423e1d7 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -3157,21 +3157,23 @@ static int arm_smmu_init_one_queue(struct 
arm_smmu_device *smmu,
   size_t dwords, const char *name)
 {
size_t qsz;
+   struct page *page;
 
-   do {
-   qsz = ((1 << q->llq.max_n_shift) * dwords) << 3;
-   q->base = dmam_alloc_coherent(smmu->dev, qsz, &q->base_dma,
- GFP_KERNEL);
-   if (q->base || qsz < PAGE_SIZE)
-   break;
-
-   q->llq.max_n_shift--;
-   } while (1);
+   qsz = ((1 << q->llq.max_n_shift) * dwords) << 3;
+   page = alloc_pages_node(dev_to_node(smmu->dev), GFP_KERNEL,
+   get_order(qsz));
+   if (!page) {
+   dev_err(smmu->dev,
+   "failed to allocate queue (0x%zx bytes) for 
%s\n",
+   qsz, name);
+   return -ENOMEM;
+   }
 
-   if (!q->base) {
+   q->base = page_address(page);
+   q->base_dma = dma_map_single(smmu->dev, q->base, qsz, 
DMA_BIDIRECTIONAL);
+   if (dma_mapping_error(smmu->dev, q->base_dma)) {
dev_err(smmu->dev,
-   "failed to allocate queue (0x%zx bytes) for %s\n",
-   qsz, name);
+   "failed to dma map for %s\n", name);
return -ENOMEM;
}
 
@@ -3192,6 +3194,18 @@ static int arm_smmu_init_one_queue(struct 
arm_smmu_device *smmu,
return 0;
 }
 
+static int arm_smmu_deinit_one_queue(struct arm_smmu_device *smmu,
+  struct arm_smmu_queue *q,
+  size_t dwords)
+{
+   size_t qsz = ((1 << q->llq.max_n_shift) * dwords) << 3;
+
+   dma_unmap_single(smmu->dev, q->base_dma, qsz, DMA_BIDIRECTIONAL);
+   free_page((unsigned long)q->base);
+
+   return 0;
+}
+
 static void arm_smmu_cmdq_free_bitmap(void *data)
 {
unsigned long *bitmap = data;
@@ -3233,22 +3247,40 @@ static int arm_smmu_init_queues(struct arm_smmu_device *smmu)
 
ret = arm_smmu_cmdq_init(smmu);
if (ret)
-   return ret;
+   goto deinit_cmdq;
 
/* evtq */
ret = arm_smmu_init_one_queue(smmu, &smmu->evtq.q, ARM_SMMU_EVTQ_PROD,
  ARM_SMMU_EVTQ_CONS, EVTQ_ENT_DWORDS,
  "evtq");
if (ret)
-   return ret;
+   goto deinit_cmdq;
 
/* priq */
if (!(smmu->features & ARM_SMMU_FEAT_PRI))
return 0;
 
-   return arm_smmu_init_one_queue(smmu, &smmu->priq.q, ARM_SMMU_PRIQ_PROD,
+   ret = arm_smmu_init_one_queue(smmu, &smmu->priq.q, ARM_SMMU_PRIQ_PROD,
   ARM_SMMU_PRIQ_CONS, PRIQ_ENT_DWORDS,
   "priq");
+   if (ret)
+   goto deinit_evtq;
+
+   return 0;
+
+deinit_evtq:
+   arm_smmu_deinit_one_queue(smmu, &smmu->evtq.q, EVTQ_ENT_DWORDS);
+deinit_cmdq:
+   arm_smmu_deinit_one_queue(smmu, &smmu->cmdq.q, CMDQ_ENT_DWORDS);
+   return ret;
+}
+
+static void arm_smmu_deinit_queues(struct arm_smmu_device *smmu)
+{
+   arm_smmu_deinit_one_queue(smmu, &smmu->cmdq.q, CMDQ_ENT_DWORDS);
+   arm_smmu_deinit_one_queue(smmu, &smmu->evtq.q, EVTQ_ENT_DWORDS);
+   if (smmu->features & ARM_SMMU_FEAT_PRI)
+   arm_smmu_deinit_one_queue(smmu, &smmu->priq.q, PRIQ_ENT_DWORDS);
 }
 
 static int arm_smmu_init_l1_strtab(struct arm_smmu_device *smmu)
@@ -4121,6 +4153,7 @@ static int arm_sm

[Try 2] DMAR errors on Wildcat Point-LP xHCI (Lenovo T450s)

2020-06-01 Thread Vincent Pelletier
Well, try 1 was a word salad, thanks gmail I guess. Now I will try to redeem 
myself...

Hello,

While trying to use a built-in USB device I rarely need (a Sierra EM7345 LTE
modem), I ran into issues ranging from the modem flapping (getting removed
from the USB bus before LTE network registration becomes visible) to all
devices on the bus being "disconnected" (the xhci giving up, as I understand
it, with only the root hubs left listed by lsusb).
Checking the syslog, I find this:

May 30 17:35:46 localhost kernel: [  278.999480] DMAR: DRHD: handling fault status reg 3
May 30 17:35:46 localhost kernel: [  278.999485] DMAR: [DMA Read] Request device [00:16.7] PASID  fault addr 9cdff000 [fault reason 02] Present bit in context entry is clear
May 30 17:35:46 localhost kernel: [  278.999488] DMAR: DRHD: handling fault status reg 3
May 30 17:35:46 localhost kernel: [  278.999490] DMAR: [DMA Read] Request device [00:16.7] PASID  fault addr 9cdff000 [fault reason 02] Present bit in context entry is clear
May 30 17:35:46 localhost kernel: [  279.001076] DMAR: DRHD: handling fault status reg 2
May 30 17:35:46 localhost kernel: [  279.001078] DMAR: [DMA Write] Request device [00:16.7] PASID  fault addr 9cdff000 [fault reason 02] Present bit in context entry is clear
May 30 17:35:46 localhost kernel: [  279.001120] DMAR: DRHD: handling fault status reg 2
May 30 17:35:47 localhost kernel: [  280.738192] usb 2-4: USB disconnect, device number 10
May 30 17:35:47 localhost kernel: [  280.738224] cdc_mbim 2-4:1.0: Tx URB error: -19
May 30 17:35:47 localhost kernel: [  280.738303] cdc_mbim 2-4:1.0 wwan0: unregister 'cdc_mbim' usb-:00:14.0-4, CDC MBIM
May 30 17:35:47 localhost ModemManager[736]:   (net/wwan0): released by device '/sys/devices/pci:00/:00:14.0/usb2/2-4'
May 30 17:35:47 localhost ModemManager[736]: [/dev/cdc-wdm0] unexpected port hangup!
May 30 17:35:47 localhost ModemManager[736]: [/dev/cdc-wdm0] channel destroyed
May 30 17:35:47 localhost ModemManager[736]:   Connection to mbim-proxy for /dev/cdc-wdm0 lost, reprobing
May 30 17:35:47 localhost ModemManager[736]:   [device '/sys/devices/pci:00/:00:14.0/usb2/2-4] creating modem with plugin 'Sierra' and '2' ports
May 30 17:35:47 localhost ModemManager[736]:   Could not recreate modem for device '/sys/devices/pci:00/:00:14.0/usb2/2-4': Failed to find a net port in the MBIM modem
May 30 17:35:47 localhost ModemManager[736]:  (usbmisc/cdc-wdm0): released by device '/sys/devices/pci:00/:00:14.0/usb2/2-4'
May 30 17:35:47 localhost ModemManager[736]:   (tty/ttyACM0): released by device '/sys/devices/pci:00/:00:14.0/usb2/2-4'
May 30 17:35:48 localhost kernel: [  281.202173] usb 2-4: new high-speed USB device number 11 using xhci_hcd
May 30 17:35:48 localhost kernel: [  281.224037] usb 2-4: New USB device found, idVendor=8087, idProduct=0716, bcdDevice= 0.00
May 30 17:35:48 localhost kernel: [  281.224043] usb 2-4: New USB device strings: Mfr=0, Product=0, SerialNumber=0
May 30 17:35:48 localhost kernel: [  281.225646] usb_serial_simple 2-4:1.0: flashloader converter detected
May 30 17:35:48 localhost kernel: [  281.225891] usb 2-4: flashloader converter now attached to ttyUSB0

which makes me suspect a missing IOMMU mapping in ACPI for the xhci
controller. In this case, the xhci could recover and re-enumerate the
device fairly quickly.
Booting with "intel_iommu=off" makes the LTE modem work at least far
enough that it gets registered to the network (it can send/receive SMS). I
have not tried data communication (no data plan on the current SIM).
I have noticed for a while that this machine has a tendency to lose
all USB devices, more often when I enable the LTE modem, so it seems
the modem just makes this issue more likely and is not its direct
cause.

This is on 5.6.7 (Debian Sid 5.6.0-1-amd64, the version the pasted
logs are from), and reproduced with 5.6.14 (Debian Sid 5.6.0-2-amd64).
The USB issues have been happening for a long time, and I use this
modem rarely enough that I would not notice a new issue until several
kernel versions later.
The modem usually worked "well enough" but has always had a bit of
flapping, sometimes only working after one or two suspend/resume cycles,
and until now I did not feel the need to investigate further (I assumed a
less-than-optimal modem/modem driver).
This time it never ended up working, even after several suspend/resume
cycles and reboots. So I do not believe this is a localised regression,
but rather a bad situation getting nudged over the edge.
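The DMAR fault lines above follow a fixed shape. As a purely illustrative aid (not an existing tool), a few lines of Python can pull the requester device, fault address, and fault reason out of such lines, which makes long logs like the one above easier to skim:

```python
import re

# Matches DMAR fault lines of the shape seen in the log above, e.g.
# "DMAR: [DMA Read] Request device [00:16.7] ... fault addr 9cdff000
#  [fault reason 02] Present bit in context entry is clear"
FAULT_RE = re.compile(
    r"DMAR: \[(?P<direction>DMA (?:Read|Write))\] Request device "
    r"\[(?P<device>[0-9a-f]{2}:[0-9a-f]{2}\.[0-7])\].*?"
    r"fault addr (?P<addr>[0-9a-f]+) "
    r"\[fault reason (?P<reason>\d+)\]"
)

def parse_dmar_fault(line):
    """Return a dict describing the fault, or None if the line is not one."""
    m = FAULT_RE.search(line)
    return m.groupdict() if m else None

if __name__ == "__main__":
    sample = ("May 30 17:35:46 localhost kernel: [  278.999485] DMAR: "
              "[DMA Read] Request device [00:16.7] PASID  fault addr "
              "9cdff000 [fault reason 02] Present bit in context entry is clear")
    print(parse_dmar_fault(sample))
```

Running it on the faults above shows they are all the same fault (device 00:16.7, reason 02), which supports the "one missing mapping" reading rather than random corruption.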

$ lspci
00:00.0 Host bridge: Intel Corporation Broadwell-U Host Bridge -OPI (rev 09)
00:02.0 VGA compatible controller: Intel Corporation HD Graphics 5500 (rev 09)
00:03.0 Audio device: Intel Corporation Broadwell-U Audio Controller (rev 09)
00:14.0 USB controller: Intel Corporation Wildcat Point-LP USB xHCI Controller (rev 03)
00:16.0 Communication controller: Intel Corporation Wildcat Point-LP MEI Controller #1 (rev 03)
00:19.0 Ethernet contr

Re: [PATCH v2 00/33] iommu: Move iommu_group setup to IOMMU core code

2020-06-01 Thread Jerry Snitselaar

On Fri May 29 20, Jerry Snitselaar wrote:

On Tue Apr 14 20, Joerg Roedel wrote:

Hi,

here is the second version of this patch-set. The first version with
some more introductory text can be found here:

https://lore.kernel.org/lkml/20200407183742.4344-1-j...@8bytes.org/

Changes v1->v2:

* Rebased to v5.7-rc1

* Re-wrote the arm-smmu changes as suggested by Robin Murphy

* Re-worked the Exynos patches to hopefully not break the
  driver anymore

* Fixed a missing mutex_unlock() reported by Marek Szyprowski,
  thanks for that.

There is also a git-branch available with these patches applied:


https://git.kernel.org/pub/scm/linux/kernel/git/joro/linux.git/log/?h=iommu-probe-device-v2

Please review.

Thanks,

Joerg

Joerg Roedel (32):
iommu: Move default domain allocation to separate function
iommu/amd: Implement iommu_ops->def_domain_type call-back
iommu/vt-d: Wire up iommu_ops->def_domain_type
iommu/amd: Remove dma_mask check from check_device()
iommu/amd: Return -ENODEV in add_device when device is not handled by
  IOMMU
iommu: Add probe_device() and remove_device() call-backs
iommu: Move default domain allocation to iommu_probe_device()
iommu: Keep a list of allocated groups in __iommu_probe_device()
iommu: Move new probe_device path to separate function
iommu: Split off default domain allocation from group assignment
iommu: Move iommu_group_create_direct_mappings() out of
  iommu_group_add_device()
iommu: Export bus_iommu_probe() and make is safe for re-probing
iommu/amd: Remove dev_data->passthrough
iommu/amd: Convert to probe/release_device() call-backs
iommu/vt-d: Convert to probe/release_device() call-backs
iommu/arm-smmu: Convert to probe/release_device() call-backs
iommu/pamu: Convert to probe/release_device() call-backs
iommu/s390: Convert to probe/release_device() call-backs
iommu/virtio: Convert to probe/release_device() call-backs
iommu/msm: Convert to probe/release_device() call-backs
iommu/mediatek: Convert to probe/release_device() call-backs
iommu/mediatek-v1 Convert to probe/release_device() call-backs
iommu/qcom: Convert to probe/release_device() call-backs
iommu/rockchip: Convert to probe/release_device() call-backs
iommu/tegra: Convert to probe/release_device() call-backs
iommu/renesas: Convert to probe/release_device() call-backs
iommu/omap: Remove orphan_dev tracking
iommu/omap: Convert to probe/release_device() call-backs
iommu/exynos: Use first SYSMMU in controllers list for IOMMU core
iommu/exynos: Convert to probe/release_device() call-backs
iommu: Remove add_device()/remove_device() code-paths
iommu: Unexport iommu_group_get_for_dev()

Sai Praneeth Prakhya (1):
iommu: Add def_domain_type() callback in iommu_ops

drivers/iommu/amd_iommu.c   |  97 
drivers/iommu/amd_iommu_types.h |   1 -
drivers/iommu/arm-smmu-v3.c |  38 +--
drivers/iommu/arm-smmu.c|  39 ++--
drivers/iommu/exynos-iommu.c|  24 +-
drivers/iommu/fsl_pamu_domain.c |  22 +-
drivers/iommu/intel-iommu.c |  68 +-
drivers/iommu/iommu.c   | 393 +---
drivers/iommu/ipmmu-vmsa.c  |  60 ++---
drivers/iommu/msm_iommu.c   |  34 +--
drivers/iommu/mtk_iommu.c   |  24 +-
drivers/iommu/mtk_iommu_v1.c|  50 ++--
drivers/iommu/omap-iommu.c  |  99 ++--
drivers/iommu/qcom_iommu.c  |  24 +-
drivers/iommu/rockchip-iommu.c  |  26 +--
drivers/iommu/s390-iommu.c  |  22 +-
drivers/iommu/tegra-gart.c  |  24 +-
drivers/iommu/tegra-smmu.c  |  31 +--
drivers/iommu/virtio-iommu.c|  41 +---
include/linux/iommu.h   |  21 +-
20 files changed, 533 insertions(+), 605 deletions(-)

--
2.17.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu



Hi Joerg,

With this patchset, I have an EPYC system where, if I boot with
iommu=nopt and force a dump, I see some I/O page faults for a NIC
on the system. The vmcore is harvested and the system reboots. I
haven't reproduced it on other systems yet, but without the patchset I
don't see the I/O page faults during the kdump.

Regards,
Jerry


I just hit an issue on a separate Intel-based system (kdump, iommu=nopt),
where it panics during intel_iommu_attach_device, in is_aux_domain,
due to device_domain_info being DEFER_DEVICE_DOMAIN_INFO. That doesn't
get set to a valid address until the domain_add_dev_info call.

Is it as simple as the following?

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 29d3940847d3..f1bbeed46a4c 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -5053,8 +5053,8 @@ is_aux_domain(struct device *dev, struct iommu_domain *domain)
 {
struct device_domain_info *info = dev->archdata.iommu;
 
-   return info && info->auxd_enabled &&
-   domain->type == IOMMU_DOMAIN_UNMANAGED;
+   return info && info != 

Re: [PATCH] iommu/amd: Fix event counter availability check

2020-06-01 Thread Alexander Monakov
On Mon, 1 Jun 2020, Suravee Suthikulpanit wrote:

> > Moving init_iommu_perf_ctr just after iommu_flush_all_caches resolves
> > the issue. This is the earliest point in amd_iommu_init_pci where the
> > call succeeds on my laptop.
> 
> According to your description, it should just need to be anywhere after the
> pci_enable_device() is called for the IOMMU device, isn't it? So, on your
> system, what if we just move the init_iommu_perf_ctr() here:

No, this doesn't work, as I already said in the paragraph you are responding
to. See my last sentence in the quoted part.

So the implication is that init_device_table_dma, together with the
subsequent cache flush, also sets up something that is necessary for the
counters to be writable.

Alexander


Re: [PATCH] iommu/amd: Fix event counter availability check

2020-06-01 Thread Suravee Suthikulpanit

Hi Alexander,

On 5/30/20 3:07 AM, Alexander Monakov wrote:

The driver performs an extra check if the IOMMU's capabilities advertise
presence of performance counters: it verifies that counters are writable
by writing a hard-coded value to a counter and testing that reading that
counter gives back the same value.

Unfortunately it does so quite early, even before pci_enable_device is
called for the IOMMU, i.e. when accessing its MMIO space is not
guaranteed to work. On Ryzen 4500U CPU, this actually breaks the test:
the driver assumes the counters are not writable, and disables the
functionality.

Moving init_iommu_perf_ctr just after iommu_flush_all_caches resolves
the issue. This is the earliest point in amd_iommu_init_pci where the
call succeeds on my laptop.


According to your description, it should just need to be anywhere after the pci_enable_device() is called for the IOMMU 
device, isn't it? So, on your system, what if we just move the init_iommu_perf_ctr() here:


diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
index 5b81fd16f5fa..17b9ac9491e0 100644
--- a/drivers/iommu/amd_iommu_init.c
+++ b/drivers/iommu/amd_iommu_init.c
@@ -1875,6 +1875,7 @@ static int __init amd_iommu_init_pci(void)
ret = iommu_init_pci(iommu);
if (ret)
break;
+   init_iommu_perf_ctr(iommu);
}

/*
--
2.17.1


Does this work?

Thanks,
Suravee