Re: [RFC 0/2] iommu: arm-smmu: Add support for early direct mappings
On Wed 27 May 04:03 PDT 2020, Will Deacon wrote: > Hi John, Bjorn, > > On Tue, May 26, 2020 at 01:34:45PM -0700, John Stultz wrote: > > On Thu, May 14, 2020 at 12:34 PM wrote: > > > > > > On Thu 27 Feb 18:57 PST 2020, Bjorn Andersson wrote: > > > > > > Rob, Will, we're reaching the point where upstream has enough > > > functionality that this is becoming a critical issue for us. > > > > > > E.g. Lenovo Yoga C630 is lacking this and a single dts patch to boot > > > mainline with display, GPU, WiFi and audio working and the story is > > > similar on several devboards. > > > > > > As previously described, the only thing I want is the stream mapping > > > related to the display controller in place, either with the CB with > > > translation disabled or possibly with a way to specify the framebuffer > > > region (although this turns out to mess things up in the display > > > driver...) > > > > > > I did pick this up again recently and concluded that by omitting the > > > streams for the USB controllers causes an instability issue seen on one > > > of the controller to disappear. So I would prefer if we somehow could > > > have a mechanism to only pick the display streams and the context > > > allocation for this. > > > > > > > > > Can you please share some pointers/insights/wishes for how we can > > > conclude on this subject? > > > > Ping? I just wanted to follow up on this discussion as this small > > series is crucial for booting mainline on the Dragonboard 845c > > devboard. It would be really valuable to be able to get some solution > > upstream so we can test mainline w/o adding additional patches. > > Sorry, it's been insanely busy recently and I haven't had a chance to think > about this on top of everything else. We're also carrying a hack in Android > for you :) > Thanks for taking the time to get back to us on this! > > The rest of the db845c series has been moving forward smoothly, but > > this set seems to be very stuck with no visible progress since Dec. 
> > Are there any pointers for what folks would prefer to see?
>
> I've had a chat with Robin about this. Originally, I was hoping that
> people would all work together towards an idyllic future where firmware
> would be able to describe arbitrary pre-existing mappings for devices,
> irrespective of the IOMMU through which they master, and Linux could
> inherit this configuration. However, that hasn't materialised (there was
> supposed to be an IORT update, but I don't know what happened to that)
> and, in actual fact, the problem that you have on db845 is /far/ more
> restricted than the general problem.
>
> Could you please try hacking something along the following lines and see
> how you get on? You may need my for-joerg/arm-smmu/updates branch for
> all the pieces:
>
>   1. Use the ->cfg_probe() callback to reserve the SMR/S2CRs you need
>      "pinning" and configure them for bypass.
>
>   2. Use the ->def_domain_type() callback to return IOMMU_DOMAIN_IDENTITY
>      for the display controller.
>
> I /think/ that's sufficient, but note that it differs from the current
> approach because we don't end up reserving a CB -- bypass is configured
> in the S2CR instead. Some invalidation might therefore be needed in
> ->cfg_probe() after unhooking the CB.
>
> Thanks, and please yell if you run into problems with this approach.

This sounded straightforward and cleaner, so I implemented it... Unfortunately
the hypervisor is playing tricks on me when writing to the S2CR registers:

- TRANS writes land as requested
- BYPASS writes end up in the register with type FAULT
- FAULT writes are ignored

In other words, the Qualcomm firmware prevents us from relying on marking the
relevant streams as BYPASS type. Instead, Qualcomm seems to implement "bypass"
by setting up a stream mapping of TRANS type, pointing to a context bank
without ARM_SMMU_SCTLR_M set.

Regards,
Bjorn

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
RE: [PATCH] driver core: platform: expose numa_node to users in sysfs
> -Original Message- > From: Greg KH [mailto:gre...@linuxfoundation.org] > Sent: Tuesday, June 2, 2020 6:11 PM > To: Song Bao Hua (Barry Song) > Cc: raf...@kernel.org; iommu@lists.linux-foundation.org; > linux-arm-ker...@lists.infradead.org; linux-ker...@vger.kernel.org; Linuxarm > ; Zengtao (B) ; Robin > Murphy > Subject: Re: [PATCH] driver core: platform: expose numa_node to users in sysfs > > On Tue, Jun 02, 2020 at 05:09:57AM +, Song Bao Hua (Barry Song) wrote: > > > > > > > > Platform devices are NUMA? That's crazy, and feels like a total > > > > abuse of platform devices and drivers that really should belong on a > "real" > > > > bus. > > > > > > I am not sure if it is an abuse of platform device. But smmu is a > > > platform device, drivers/iommu/arm-smmu-v3.c is a platform driver. > > > In a typical ARM server, there are maybe multiple SMMU devices which > > > can support IO virtual address and page tables for other devices on > > > PCI-like busses. > > > Each different SMMU device might be close to different NUMA node. > > > There is really a hardware topology. > > > > > > If you have multiple CPU packages in a NUMA server, some platform > > > devices might Belong to CPU0, some other might belong to CPU1. > > > > Those devices are populated by acpi_iort for an ARM server: > > > > drivers/acpi/arm64/iort.c: > > > > static const struct iort_dev_config iort_arm_smmu_v3_cfg __initconst = { > > .name = "arm-smmu-v3", > > .dev_dma_configure = arm_smmu_v3_dma_configure, > > .dev_count_resources = arm_smmu_v3_count_resources, > > .dev_init_resources = arm_smmu_v3_init_resources, > > .dev_set_proximity = arm_smmu_v3_set_proximity, }; > > > > void __init acpi_iort_init(void) > > { > > acpi_status status; > > > > status = acpi_get_table(ACPI_SIG_IORT, 0, &iort_table); > > ... > > iort_check_id_count_workaround(iort_table); > > iort_init_platform_devices(); > > } > > > > static void __init iort_init_platform_devices(void) { > > ... 
> >
> > 	for (i = 0; i < iort->node_count; i++) {
> > 		if (iort_node >= iort_end) {
> > 			pr_err("iort node pointer overflows, bad table\n");
> > 			return;
> > 		}
> >
> > 		iort_enable_acs(iort_node);
> >
> > 		ops = iort_get_dev_cfg(iort_node);
> > 		if (ops) {
> > 			fwnode = acpi_alloc_fwnode_static();
> > 			if (!fwnode)
> > 				return;
> >
> > 			iort_set_fwnode(iort_node, fwnode);
> >
> > 			ret = iort_add_platform_device(iort_node, ops);
> > 			if (ret) {
> > 				iort_delete_fwnode(iort_node);
> > 				acpi_free_fwnode_static(fwnode);
> > 				return;
> > 			}
> > 		}
> >
> > 		...
> > 	}
> > 	...
> > }
> >
> > NUMA node is got from ACPI:
> >
> > static int __init arm_smmu_v3_set_proximity(struct device *dev,
> > 					    struct acpi_iort_node *node)
> > {
> > 	struct acpi_iort_smmu_v3 *smmu;
> >
> > 	smmu = (struct acpi_iort_smmu_v3 *)node->node_data;
> > 	if (smmu->flags & ACPI_IORT_SMMU_V3_PXM_VALID) {
> > 		int dev_node = acpi_map_pxm_to_node(smmu->pxm);
> >
> > 		...
> >
> > 		set_dev_node(dev, dev_node);
> > 		...
> > 	}
> > 	return 0;
> > }
> >
> > Barry
>
> That's fine, but those are "real" devices, not platform devices, right?

Most platform devices are "real" memory-mapped hardware devices. On an
embedded system, almost all "simple-bus" devices are populated from the
device tree as platform devices. Only a fraction of platform devices are
not "real" hardware.

The SMMU is a memory-mapped device. It is just like most other platform
devices whose registers live in a memory region mapped into the CPU's
address space: it uses ioremap to map its registers and readl/writel to
access them.

> What platform device has this issue? What one will show up this way with
> the new patch?

If a platform device could never be real hardware, then no platform device
would have a hardware topology. But platform devices are "real" hardware
most of the time. The SMMU is a "real" device, and it is a platform device
in Linux.

> thanks,
>
> greg k-h

-barry
Re: [PATCH] driver core: platform: expose numa_node to users in sysfs
On Tue, Jun 02, 2020 at 05:09:57AM +, Song Bao Hua (Barry Song) wrote: > > > > > > Platform devices are NUMA? That's crazy, and feels like a total abuse > > > of platform devices and drivers that really should belong on a "real" > > > bus. > > > > I am not sure if it is an abuse of platform device. But smmu is a platform > > device, > > drivers/iommu/arm-smmu-v3.c is a platform driver. > > In a typical ARM server, there are maybe multiple SMMU devices which can > > support > > IO virtual address and page tables for other devices on PCI-like busses. > > Each different SMMU device might be close to different NUMA node. There is > > really a hardware topology. > > > > If you have multiple CPU packages in a NUMA server, some platform devices > > might > > Belong to CPU0, some other might belong to CPU1. > > Those devices are populated by acpi_iort for an ARM server: > > drivers/acpi/arm64/iort.c: > > static const struct iort_dev_config iort_arm_smmu_v3_cfg __initconst = { > .name = "arm-smmu-v3", > .dev_dma_configure = arm_smmu_v3_dma_configure, > .dev_count_resources = arm_smmu_v3_count_resources, > .dev_init_resources = arm_smmu_v3_init_resources, > .dev_set_proximity = arm_smmu_v3_set_proximity, > }; > > void __init acpi_iort_init(void) > { > acpi_status status; > > status = acpi_get_table(ACPI_SIG_IORT, 0, &iort_table); > ... > iort_check_id_count_workaround(iort_table); > iort_init_platform_devices(); > } > > static void __init iort_init_platform_devices(void) > { > ... > > for (i = 0; i < iort->node_count; i++) { > if (iort_node >= iort_end) { > pr_err("iort node pointer overflows, bad table\n"); > return; > } > > iort_enable_acs(iort_node); > > ops = iort_get_dev_cfg(iort_node); > if (ops) { > fwnode = acpi_alloc_fwnode_static(); > if (!fwnode) > return; > > iort_set_fwnode(iort_node, fwnode); > > ret = iort_add_platform_device(iort_node, ops); > if (ret) { > iort_delete_fwnode(iort_node); > acpi_free_fwnode_static(fwnode); > return; > } > } > > ... > } > ... 
> } > > NUMA node is got from ACPI: > > static int __init arm_smmu_v3_set_proximity(struct device *dev, > struct acpi_iort_node *node) > { > struct acpi_iort_smmu_v3 *smmu; > > smmu = (struct acpi_iort_smmu_v3 *)node->node_data; > if (smmu->flags & ACPI_IORT_SMMU_V3_PXM_VALID) { > int dev_node = acpi_map_pxm_to_node(smmu->pxm); > > ... > > set_dev_node(dev, dev_node); > ... > } > return 0; > } > > Barry That's fine, but those are "real" devices, not platform devices, right? What platform device has this issue? What one will show up this way with the new patch? thanks, greg k-h ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: iommu: Improve exception handling in iommu_group_alloc()
On Tue, Jun 02, 2020 at 07:01:02AM +0200, Markus Elfring wrote:
> >> * I suggest to avoid the specification of duplicate function calls.
> >>   Will it be helpful to add a few jump targets?
> >>
> >>   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/process/coding-style.rst?id=3d77e6a8804abcc0504c904bd6e5cdf3a5cf8162#n455
> >
> > I don't think it is helpful or readable to add some jump targets here,
> > since the exception handling is very simple here.
>
> Do you disagree to the application of the Linux coding style then
> for the recommended exception handling?

No, that's not what I meant. My point is that the exception handling in
this patch is simple, so there is no need to add 'goto' statements, which
would not improve readability here. I do agree that they are helpful in
cases where a function exits from multiple locations and the same cleanup
work needs to be done on each path.
[PATCH v2] driver core: platform: expose numa_node to users in sysfs
For some platform devices like an IOMMU, particularly the ARM SMMU, users may
care about NUMA locality. For example, if threads and drivers run near the
IOMMU, they may get much better performance on dma_unmap_sg. For other
platform devices, users may still want to know the hardware topology.

Cc: Prime Zeng
Cc: Robin Murphy
Signed-off-by: Barry Song
---
-v2: add the numa_node entry in Documentation/ABI/

 Documentation/ABI/testing/sysfs-bus-platform | 10 ++++++++++
 drivers/base/platform.c                      | 26 +++++++++++++++++++-
 2 files changed, 35 insertions(+), 1 deletion(-)

diff --git a/Documentation/ABI/testing/sysfs-bus-platform b/Documentation/ABI/testing/sysfs-bus-platform
index 5172a6124b27..e8f5958c1d18 100644
--- a/Documentation/ABI/testing/sysfs-bus-platform
+++ b/Documentation/ABI/testing/sysfs-bus-platform
@@ -18,3 +18,13 @@ Description:
 		devices to opt-out of driver binding using a driver_override
 		name such as "none". Only a single driver may be specified in
 		the override, there is no support for parsing delimiters.
+
+What:		/sys/bus/platform/devices/.../numa_node
+Date:		June 2020
+Contact:	Barry Song
+Description:
+		This file contains the NUMA node to which the platform device
+		is attached. It won't be visible if the node is unknown. The
+		value comes from an ACPI _PXM method or a similar firmware
+		source. Initial users for this file would be devices like
+		arm smmu which are populated by arm64 acpi_iort.
diff --git a/drivers/base/platform.c b/drivers/base/platform.c
index b27d0f6c18c9..7794b9a38d82 100644
--- a/drivers/base/platform.c
+++ b/drivers/base/platform.c
@@ -1062,13 +1062,37 @@ static ssize_t driver_override_show(struct device *dev,
 }
 static DEVICE_ATTR_RW(driver_override);
 
+static ssize_t numa_node_show(struct device *dev,
+			      struct device_attribute *attr, char *buf)
+{
+	return sprintf(buf, "%d\n", dev_to_node(dev));
+}
+static DEVICE_ATTR_RO(numa_node);
+
+static umode_t platform_dev_attrs_visible(struct kobject *kobj, struct attribute *a,
+					  int n)
+{
+	struct device *dev = container_of(kobj, typeof(*dev), kobj);
+
+	if (a == &dev_attr_numa_node.attr &&
+	    dev_to_node(dev) == NUMA_NO_NODE)
+		return 0;
+
+	return a->mode;
+}
 
 static struct attribute *platform_dev_attrs[] = {
 	&dev_attr_modalias.attr,
+	&dev_attr_numa_node.attr,
 	&dev_attr_driver_override.attr,
 	NULL,
 };
-ATTRIBUTE_GROUPS(platform_dev);
+
+static struct attribute_group platform_dev_group = {
+	.attrs = platform_dev_attrs,
+	.is_visible = platform_dev_attrs_visible,
+};
+__ATTRIBUTE_GROUPS(platform_dev);
 
 static int platform_uevent(struct device *dev, struct kobj_uevent_env *env)
 {
-- 
2.23.0
RE: [PATCH] driver core: platform: expose numa_node to users in sysfs
> > > > Platform devices are NUMA? That's crazy, and feels like a total abuse > > of platform devices and drivers that really should belong on a "real" > > bus. > > I am not sure if it is an abuse of platform device. But smmu is a platform > device, > drivers/iommu/arm-smmu-v3.c is a platform driver. > In a typical ARM server, there are maybe multiple SMMU devices which can > support > IO virtual address and page tables for other devices on PCI-like busses. > Each different SMMU device might be close to different NUMA node. There is > really a hardware topology. > > If you have multiple CPU packages in a NUMA server, some platform devices > might > Belong to CPU0, some other might belong to CPU1. Those devices are populated by acpi_iort for an ARM server: drivers/acpi/arm64/iort.c: static const struct iort_dev_config iort_arm_smmu_v3_cfg __initconst = { .name = "arm-smmu-v3", .dev_dma_configure = arm_smmu_v3_dma_configure, .dev_count_resources = arm_smmu_v3_count_resources, .dev_init_resources = arm_smmu_v3_init_resources, .dev_set_proximity = arm_smmu_v3_set_proximity, }; void __init acpi_iort_init(void) { acpi_status status; status = acpi_get_table(ACPI_SIG_IORT, 0, &iort_table); ... iort_check_id_count_workaround(iort_table); iort_init_platform_devices(); } static void __init iort_init_platform_devices(void) { ... for (i = 0; i < iort->node_count; i++) { if (iort_node >= iort_end) { pr_err("iort node pointer overflows, bad table\n"); return; } iort_enable_acs(iort_node); ops = iort_get_dev_cfg(iort_node); if (ops) { fwnode = acpi_alloc_fwnode_static(); if (!fwnode) return; iort_set_fwnode(iort_node, fwnode); ret = iort_add_platform_device(iort_node, ops); if (ret) { iort_delete_fwnode(iort_node); acpi_free_fwnode_static(fwnode); return; } } ... } ... 
} NUMA node is got from ACPI: static int __init arm_smmu_v3_set_proximity(struct device *dev, struct acpi_iort_node *node) { struct acpi_iort_smmu_v3 *smmu; smmu = (struct acpi_iort_smmu_v3 *)node->node_data; if (smmu->flags & ACPI_IORT_SMMU_V3_PXM_VALID) { int dev_node = acpi_map_pxm_to_node(smmu->pxm); ... set_dev_node(dev, dev_node); ... } return 0; } Barry ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: iommu: Improve exception handling in iommu_group_alloc()
>> * I suggest to avoid the specification of duplicate function calls. >> Will it be helpful to add a few jump targets? >> >> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/process/coding-style.rst?id=3d77e6a8804abcc0504c904bd6e5cdf3a5cf8162#n455 > > I don't think it is helpful or readable to add some jump targets here, > since the exception handling is very simple here. Do you disagree to the application of the Linux coding style then for the recommended exception handling? Regards, Markus ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
RE: [PATCH] driver core: platform: expose numa_node to users in sysfs
> -Original Message- > From: Greg KH [mailto:gre...@linuxfoundation.org] > Sent: Tuesday, June 2, 2020 4:24 PM > To: Song Bao Hua (Barry Song) > Cc: raf...@kernel.org; iommu@lists.linux-foundation.org; > linux-arm-ker...@lists.infradead.org; linux-ker...@vger.kernel.org; Linuxarm > ; Zengtao (B) ; Robin > Murphy > Subject: Re: [PATCH] driver core: platform: expose numa_node to users in sysfs > > On Tue, Jun 02, 2020 at 03:01:39PM +1200, Barry Song wrote: > > For some platform devices like iommu, particually ARM smmu, users may > > care about the numa locality. for example, if threads and drivers run > > near iommu, they may get much better performance on dma_unmap_sg. > > For other platform devices, users may still want to know the hardware > > topology. > > > > Cc: Prime Zeng > > Cc: Robin Murphy > > Signed-off-by: Barry Song > > --- > > drivers/base/platform.c | 26 +- > > 1 file changed, 25 insertions(+), 1 deletion(-) > > > > diff --git a/drivers/base/platform.c b/drivers/base/platform.c > > index b27d0f6c18c9..7794b9a38d82 100644 > > --- a/drivers/base/platform.c > > +++ b/drivers/base/platform.c > > @@ -1062,13 +1062,37 @@ static ssize_t driver_override_show(struct > device *dev, > > } > > static DEVICE_ATTR_RW(driver_override); > > > > +static ssize_t numa_node_show(struct device *dev, > > + struct device_attribute *attr, char *buf) > > +{ > > + return sprintf(buf, "%d\n", dev_to_node(dev)); > > +} > > +static DEVICE_ATTR_RO(numa_node); > > + > > +static umode_t platform_dev_attrs_visible(struct kobject *kobj, struct > attribute *a, > > + int n) > > +{ > > + struct device *dev = container_of(kobj, typeof(*dev), kobj); > > + > > + if (a == &dev_attr_numa_node.attr && > > + dev_to_node(dev) == NUMA_NO_NODE) > > + return 0; > > + > > + return a->mode; > > +} > > > > static struct attribute *platform_dev_attrs[] = { > > &dev_attr_modalias.attr, > > + &dev_attr_numa_node.attr, > > &dev_attr_driver_override.attr, > > NULL, > > }; > > 
-ATTRIBUTE_GROUPS(platform_dev); > > + > > +static struct attribute_group platform_dev_group = { > > + .attrs = platform_dev_attrs, > > + .is_visible = platform_dev_attrs_visible, > > +}; > > +__ATTRIBUTE_GROUPS(platform_dev); > > > > static int platform_uevent(struct device *dev, struct kobj_uevent_env *env) > > { > > Platform devices are NUMA? That's crazy, and feels like a total abuse > of platform devices and drivers that really should belong on a "real" > bus. I am not sure if it is an abuse of platform device. But smmu is a platform device, drivers/iommu/arm-smmu-v3.c is a platform driver. In a typical ARM server, there are maybe multiple SMMU devices which can support IO virtual address and page tables for other devices on PCI-like busses. Each different SMMU device might be close to different NUMA node. There is really a hardware topology. If you have multiple CPU packages in a NUMA server, some platform devices might Belong to CPU0, some other might belong to CPU1. -barry > > What drivers in the kernel today have this issue? > > thanks, > > greg k-h ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH] driver core: platform: expose numa_node to users in sysfs
On Tue, Jun 02, 2020 at 03:01:39PM +1200, Barry Song wrote: > For some platform devices like iommu, particually ARM smmu, users may > care about the numa locality. for example, if threads and drivers run > near iommu, they may get much better performance on dma_unmap_sg. > For other platform devices, users may still want to know the hardware > topology. > > Cc: Prime Zeng > Cc: Robin Murphy > Signed-off-by: Barry Song > --- > drivers/base/platform.c | 26 +- > 1 file changed, 25 insertions(+), 1 deletion(-) > > diff --git a/drivers/base/platform.c b/drivers/base/platform.c > index b27d0f6c18c9..7794b9a38d82 100644 > --- a/drivers/base/platform.c > +++ b/drivers/base/platform.c > @@ -1062,13 +1062,37 @@ static ssize_t driver_override_show(struct device > *dev, > } > static DEVICE_ATTR_RW(driver_override); > > +static ssize_t numa_node_show(struct device *dev, > + struct device_attribute *attr, char *buf) > +{ > + return sprintf(buf, "%d\n", dev_to_node(dev)); > +} > +static DEVICE_ATTR_RO(numa_node); > + > +static umode_t platform_dev_attrs_visible(struct kobject *kobj, struct > attribute *a, > + int n) > +{ > + struct device *dev = container_of(kobj, typeof(*dev), kobj); > + > + if (a == &dev_attr_numa_node.attr && > + dev_to_node(dev) == NUMA_NO_NODE) > + return 0; > + > + return a->mode; > +} > > static struct attribute *platform_dev_attrs[] = { > &dev_attr_modalias.attr, > + &dev_attr_numa_node.attr, > &dev_attr_driver_override.attr, > NULL, > }; > -ATTRIBUTE_GROUPS(platform_dev); > + > +static struct attribute_group platform_dev_group = { > + .attrs = platform_dev_attrs, > + .is_visible = platform_dev_attrs_visible, > +}; > +__ATTRIBUTE_GROUPS(platform_dev); > > static int platform_uevent(struct device *dev, struct kobj_uevent_env *env) > { Also you forgot a new entry in Documentation/ABI/ :( ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH] driver core: platform: expose numa_node to users in sysfs
On Tue, Jun 02, 2020 at 03:01:39PM +1200, Barry Song wrote: > For some platform devices like iommu, particually ARM smmu, users may > care about the numa locality. for example, if threads and drivers run > near iommu, they may get much better performance on dma_unmap_sg. > For other platform devices, users may still want to know the hardware > topology. > > Cc: Prime Zeng > Cc: Robin Murphy > Signed-off-by: Barry Song > --- > drivers/base/platform.c | 26 +- > 1 file changed, 25 insertions(+), 1 deletion(-) > > diff --git a/drivers/base/platform.c b/drivers/base/platform.c > index b27d0f6c18c9..7794b9a38d82 100644 > --- a/drivers/base/platform.c > +++ b/drivers/base/platform.c > @@ -1062,13 +1062,37 @@ static ssize_t driver_override_show(struct device > *dev, > } > static DEVICE_ATTR_RW(driver_override); > > +static ssize_t numa_node_show(struct device *dev, > + struct device_attribute *attr, char *buf) > +{ > + return sprintf(buf, "%d\n", dev_to_node(dev)); > +} > +static DEVICE_ATTR_RO(numa_node); > + > +static umode_t platform_dev_attrs_visible(struct kobject *kobj, struct > attribute *a, > + int n) > +{ > + struct device *dev = container_of(kobj, typeof(*dev), kobj); > + > + if (a == &dev_attr_numa_node.attr && > + dev_to_node(dev) == NUMA_NO_NODE) > + return 0; > + > + return a->mode; > +} > > static struct attribute *platform_dev_attrs[] = { > &dev_attr_modalias.attr, > + &dev_attr_numa_node.attr, > &dev_attr_driver_override.attr, > NULL, > }; > -ATTRIBUTE_GROUPS(platform_dev); > + > +static struct attribute_group platform_dev_group = { > + .attrs = platform_dev_attrs, > + .is_visible = platform_dev_attrs_visible, > +}; > +__ATTRIBUTE_GROUPS(platform_dev); > > static int platform_uevent(struct device *dev, struct kobj_uevent_env *env) > { Platform devices are NUMA? That's crazy, and feels like a total abuse of platform devices and drivers that really should belong on a "real" bus. What drivers in the kernel today have this issue? 
thanks, greg k-h ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH] driver core: platform: expose numa_node to users in sysfs
For some platform devices like an IOMMU, particularly the ARM SMMU, users may
care about NUMA locality. For example, if threads and drivers run near the
IOMMU, they may get much better performance on dma_unmap_sg. For other
platform devices, users may still want to know the hardware topology.

Cc: Prime Zeng
Cc: Robin Murphy
Signed-off-by: Barry Song
---
 drivers/base/platform.c | 26 +++++++++++++++++++-
 1 file changed, 25 insertions(+), 1 deletion(-)

diff --git a/drivers/base/platform.c b/drivers/base/platform.c
index b27d0f6c18c9..7794b9a38d82 100644
--- a/drivers/base/platform.c
+++ b/drivers/base/platform.c
@@ -1062,13 +1062,37 @@ static ssize_t driver_override_show(struct device *dev,
 }
 static DEVICE_ATTR_RW(driver_override);
 
+static ssize_t numa_node_show(struct device *dev,
+			      struct device_attribute *attr, char *buf)
+{
+	return sprintf(buf, "%d\n", dev_to_node(dev));
+}
+static DEVICE_ATTR_RO(numa_node);
+
+static umode_t platform_dev_attrs_visible(struct kobject *kobj, struct attribute *a,
+					  int n)
+{
+	struct device *dev = container_of(kobj, typeof(*dev), kobj);
+
+	if (a == &dev_attr_numa_node.attr &&
+	    dev_to_node(dev) == NUMA_NO_NODE)
+		return 0;
+
+	return a->mode;
+}
 
 static struct attribute *platform_dev_attrs[] = {
 	&dev_attr_modalias.attr,
+	&dev_attr_numa_node.attr,
 	&dev_attr_driver_override.attr,
 	NULL,
 };
-ATTRIBUTE_GROUPS(platform_dev);
+
+static struct attribute_group platform_dev_group = {
+	.attrs = platform_dev_attrs,
+	.is_visible = platform_dev_attrs_visible,
+};
+__ATTRIBUTE_GROUPS(platform_dev);
 
 static int platform_uevent(struct device *dev, struct kobj_uevent_env *env)
 {
-- 
2.23.0
[PATCH v2] iommu: Improve exception handling in iommu_group_alloc()
Improve the exception handling to free the resources correctly when we fail
to allocate an iommu group.

Fixes: bc7d12b91bd3 ("iommu: Implement reserved_regions iommu-group sysfs file")
Signed-off-by: Baolin Wang
---
Changes from v1:
 - Improve the commit message.
 - Add Fixes tag.
---
 drivers/iommu/iommu.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 03d6a26..ac91024 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -529,12 +529,18 @@ struct iommu_group *iommu_group_alloc(void)
 
 	ret = iommu_group_create_file(group,
 				      &iommu_group_attr_reserved_regions);
-	if (ret)
+	if (ret) {
+		kobject_put(group->devices_kobj);
 		return ERR_PTR(ret);
+	}
 
 	ret = iommu_group_create_file(group, &iommu_group_attr_type);
-	if (ret)
+	if (ret) {
+		iommu_group_remove_file(group,
+					&iommu_group_attr_reserved_regions);
+		kobject_put(group->devices_kobj);
 		return ERR_PTR(ret);
+	}
 
 	pr_debug("Allocated group %d\n", group->id);
-- 
1.8.3.1
Re: [PATCH] iommu: Improve exception handling in iommu_group_alloc()
On Mon, Jun 01, 2020 at 02:38:05PM +0200, Markus Elfring wrote:
> > Optimize the error handling to free the resources correctly when
> > failed to allocate an iommu group.
>
> * I would not categorise the desired completion of exception handling
>   as a software optimisation.
>
> * Would you like to add the tag “Fixes” to the commit message?

Sure.

> * I suggest to avoid the specification of duplicate function calls.
>   Will it be helpful to add a few jump targets?
>
>   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/process/coding-style.rst?id=3d77e6a8804abcc0504c904bd6e5cdf3a5cf8162#n455

I don't think adding jump targets would be helpful or more readable here,
since the exception handling is very simple.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v2 00/33] iommu: Move iommu_group setup to IOMMU core code
On Tue Jun 02 20, Lu Baolu wrote: Hi Jerry, On 6/1/20 6:42 PM, Jerry Snitselaar wrote: Hi Joerg, With this patchset, I have an epyc system where if I boot with iommu=nopt and force a dump I will see some io page faults for a nic on the system. The vmcore is harvested and the system reboots. I haven't reproduced it on other systems yet, but without the patchset I don't see the io page faults during the kdump. Regards, Jerry I just hit an issue on a separate intel based system (kdump iommu=nopt), where it panics in during intel_iommu_attach_device, in is_aux_domain, due to device_domain_info being DEFER_DEVICE_DOMAIN_INFO. That doesn't get set to a valid address until the domain_add_dev_info call. Is it as simple as the following? I guess you won't hit this issue if you use iommu/next branch of Joerg's tree. We've changed to use a generic helper to retrieve the valid per device iommu data or NULL (if there's no). Best regards, baolu Yeah, that will solve the panic. diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c index 29d3940847d3..f1bbeed46a4c 100644 --- a/drivers/iommu/intel-iommu.c +++ b/drivers/iommu/intel-iommu.c @@ -5053,8 +5053,8 @@ is_aux_domain(struct device *dev, struct iommu_domain *domain) { struct device_domain_info *info = dev->archdata.iommu; - return info && info->auxd_enabled && - domain->type == IOMMU_DOMAIN_UNMANAGED; + return info && info != DEFER_DEVICE_DOMAIN_INFO && + info->auxd_enabled && domain->type == IOMMU_DOMAIN_UNMANAGED; } static void auxiliary_link_device(struct dmar_domain *domain, Regards, Jerry ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v2 00/33] iommu: Move iommu_group setup to IOMMU core code
Hi Jerry, On 6/1/20 9:17 PM, Jerry Snitselaar wrote: On Mon Jun 01 20, Jerry Snitselaar wrote: On Fri May 29 20, Jerry Snitselaar wrote: On Tue Apr 14 20, Joerg Roedel wrote: Hi, here is the second version of this patch-set. The first version with some more introductory text can be found here: https://lore.kernel.org/lkml/20200407183742.4344-1-j...@8bytes.org/ Changes v1->v2: * Rebased to v5.7-rc1 * Re-wrote the arm-smmu changes as suggested by Robin Murphy * Re-worked the Exynos patches to hopefully not break the driver anymore * Fixed a missing mutex_unlock() reported by Marek Szyprowski, thanks for that. There is also a git-branch available with these patches applied: https://git.kernel.org/pub/scm/linux/kernel/git/joro/linux.git/log/?h=iommu-probe-device-v2 Please review. Thanks, Joerg Joerg Roedel (32): iommu: Move default domain allocation to separate function iommu/amd: Implement iommu_ops->def_domain_type call-back iommu/vt-d: Wire up iommu_ops->def_domain_type iommu/amd: Remove dma_mask check from check_device() iommu/amd: Return -ENODEV in add_device when device is not handled by IOMMU iommu: Add probe_device() and remove_device() call-backs iommu: Move default domain allocation to iommu_probe_device() iommu: Keep a list of allocated groups in __iommu_probe_device() iommu: Move new probe_device path to separate function iommu: Split off default domain allocation from group assignment iommu: Move iommu_group_create_direct_mappings() out of iommu_group_add_device() iommu: Export bus_iommu_probe() and make is safe for re-probing iommu/amd: Remove dev_data->passthrough iommu/amd: Convert to probe/release_device() call-backs iommu/vt-d: Convert to probe/release_device() call-backs iommu/arm-smmu: Convert to probe/release_device() call-backs iommu/pamu: Convert to probe/release_device() call-backs iommu/s390: Convert to probe/release_device() call-backs iommu/virtio: Convert to probe/release_device() call-backs iommu/msm: Convert to probe/release_device() 
call-backs iommu/mediatek: Convert to probe/release_device() call-backs iommu/mediatek-v1 Convert to probe/release_device() call-backs iommu/qcom: Convert to probe/release_device() call-backs iommu/rockchip: Convert to probe/release_device() call-backs iommu/tegra: Convert to probe/release_device() call-backs iommu/renesas: Convert to probe/release_device() call-backs iommu/omap: Remove orphan_dev tracking iommu/omap: Convert to probe/release_device() call-backs iommu/exynos: Use first SYSMMU in controllers list for IOMMU core iommu/exynos: Convert to probe/release_device() call-backs iommu: Remove add_device()/remove_device() code-paths iommu: Unexport iommu_group_get_for_dev() Sai Praneeth Prakhya (1): iommu: Add def_domain_type() callback in iommu_ops drivers/iommu/amd_iommu.c | 97 drivers/iommu/amd_iommu_types.h | 1 - drivers/iommu/arm-smmu-v3.c | 38 +-- drivers/iommu/arm-smmu.c | 39 ++-- drivers/iommu/exynos-iommu.c | 24 +- drivers/iommu/fsl_pamu_domain.c | 22 +- drivers/iommu/intel-iommu.c | 68 +- drivers/iommu/iommu.c | 393 +--- drivers/iommu/ipmmu-vmsa.c | 60 ++--- drivers/iommu/msm_iommu.c | 34 +-- drivers/iommu/mtk_iommu.c | 24 +- drivers/iommu/mtk_iommu_v1.c | 50 ++-- drivers/iommu/omap-iommu.c | 99 ++-- drivers/iommu/qcom_iommu.c | 24 +- drivers/iommu/rockchip-iommu.c | 26 +-- drivers/iommu/s390-iommu.c | 22 +- drivers/iommu/tegra-gart.c | 24 +- drivers/iommu/tegra-smmu.c | 31 +-- drivers/iommu/virtio-iommu.c | 41 +--- include/linux/iommu.h | 21 +- 20 files changed, 533 insertions(+), 605 deletions(-) -- 2.17.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu Hi Joerg, With this patchset, I have an epyc system where if I boot with iommu=nopt and force a dump I will see some io page faults for a nic on the system. The vmcore is harvested and the system reboots. I haven't reproduced it on other systems yet, but without the patchset I don't see the io page faults during the kdump. 
Regards, Jerry

I just hit an issue on a separate intel based system (kdump iommu=nopt), where it panics during intel_iommu_attach_device, in is_aux_domain, due to device_domain_info being DEFER_DEVICE_DOMAIN_INFO. That doesn't get set to a valid address until the domain_add_dev_info call. Is it as simple as the following?

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 29d3940847d3..f1bbeed46a4c 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -5053,8 +5053,8 @@ is_aux_domain(struct device *dev, struct iommu_domain *domain)
 {
 	struct device_domain_info *info = dev->archdata.iommu;

-	return info && info->auxd_enabled &&
-	       domain->type == IOMMU_DOMAIN_UNMANAGED;
+	return info && info != DEFER_DEVICE_DOMAIN_INFO &&
+	       info->auxd_enabled && domain->type == IOMMU_DOMAIN_UNMANAGED;
Re: [PATCH v2 00/33] iommu: Move iommu_group setup to IOMMU core code
Hi Jerry,

On 6/1/20 6:42 PM, Jerry Snitselaar wrote:
> Hi Joerg,
> With this patchset, I have an epyc system where if I boot with iommu=nopt and force a dump I will see some io page faults for a nic on the system. The vmcore is harvested and the system reboots. I haven't reproduced it on other systems yet, but without the patchset I don't see the io page faults during the kdump.
> Regards, Jerry
>
> I just hit an issue on a separate intel based system (kdump iommu=nopt), where it panics during intel_iommu_attach_device, in is_aux_domain, due to device_domain_info being DEFER_DEVICE_DOMAIN_INFO. That doesn't get set to a valid address until the domain_add_dev_info call. Is it as simple as the following?

I guess you won't hit this issue if you use the iommu/next branch of Joerg's tree. We've changed to use a generic helper to retrieve the valid per-device iommu data or NULL (if there's none).

Best regards,
baolu

> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> index 29d3940847d3..f1bbeed46a4c 100644
> --- a/drivers/iommu/intel-iommu.c
> +++ b/drivers/iommu/intel-iommu.c
> @@ -5053,8 +5053,8 @@ is_aux_domain(struct device *dev, struct iommu_domain *domain)
>  {
>  	struct device_domain_info *info = dev->archdata.iommu;
>
> -	return info && info->auxd_enabled &&
> -	       domain->type == IOMMU_DOMAIN_UNMANAGED;
> +	return info && info != DEFER_DEVICE_DOMAIN_INFO &&
> +	       info->auxd_enabled && domain->type == IOMMU_DOMAIN_UNMANAGED;
>  }
>
>  static void auxiliary_link_device(struct dmar_domain *domain,
>
> Regards, Jerry
Re: [PATCH] PCI: Relax ACS requirement for Intel RCiEP devices.
On Mon, Jun 01, 2020 at 03:56:55PM -0600, Alex Williamson wrote: > On Mon, 1 Jun 2020 14:40:23 -0700 > "Raj, Ashok" wrote: > > > On Mon, Jun 01, 2020 at 04:25:19PM -0500, Bjorn Helgaas wrote: > > > On Thu, May 28, 2020 at 01:57:42PM -0700, Ashok Raj wrote: > > > > All Intel platforms guarantee that all root complex implementations > > > > must send transactions up to IOMMU for address translations. Hence for > > > > RCiEP devices that are Vendor ID Intel, can claim exception for lack of > > > > ACS support. > > > > > > > > > > > > 3.16 Root-Complex Peer to Peer Considerations > > > > When DMA remapping is enabled, peer-to-peer requests through the > > > > Root-Complex must be handled > > > > as follows: > > > > • The input address in the request is translated (through first-level, > > > > second-level or nested translation) to a host physical address (HPA). > > > > The address decoding for peer addresses must be done only on the > > > > translated HPA. Hardware implementations are free to further limit > > > > peer-to-peer accesses to specific host physical address regions > > > > (or to completely disallow peer-forwarding of translated requests). > > > > • Since address translation changes the contents (address field) of > > > > the PCI Express Transaction Layer Packet (TLP), for PCI Express > > > > peer-to-peer requests with ECRC, the Root-Complex hardware must use > > > > the new ECRC (re-computed with the translated address) if it > > > > decides to forward the TLP as a peer request. > > > > • Root-ports, and multi-function root-complex integrated endpoints, may > > > > support additional peerto-peer control features by supporting PCI > > > > Express > > > > Access Control Services (ACS) capability. Refer to ACS capability in > > > > PCI Express specifications for details. > > > > > > > > Since Linux didn't give special treatment to allow this exception, > > > > certain > > > > RCiEP MFD devices are getting grouped in a single iommu group. 
This > > > > doesn't permit a single device to be assigned to a guest for instance. > > > > > > > > In one vendor system: Device 14.x were grouped in a single IOMMU group. > > > > > > > > /sys/kernel/iommu_groups/5/devices/:00:14.0 > > > > /sys/kernel/iommu_groups/5/devices/:00:14.2 > > > > /sys/kernel/iommu_groups/5/devices/:00:14.3 > > > > > > > > After the patch: > > > > /sys/kernel/iommu_groups/5/devices/:00:14.0 > > > > /sys/kernel/iommu_groups/5/devices/:00:14.2 > > > > /sys/kernel/iommu_groups/6/devices/:00:14.3 <<< new group > > > > > > > > 14.0 and 14.2 are integrated devices, but legacy end points. > > > > Whereas 14.3 was a PCIe compliant RCiEP. > > > > > > > > 00:14.3 Network controller: Intel Corporation Device 9df0 (rev 30) > > > > Capabilities: [40] Express (v2) Root Complex Integrated Endpoint, MSI 00 > > > > > > > > This permits assigning this device to a guest VM. > > > > > > > > Fixes: f096c061f552 ("iommu: Rework iommu_group_get_for_pci_dev()") > > > > Signed-off-by: Ashok Raj > > > > To: Joerg Roedel > > > > To: Bjorn Helgaas > > > > Cc: linux-ker...@vger.kernel.org > > > > Cc: iommu@lists.linux-foundation.org > > > > Cc: Lu Baolu > > > > Cc: Alex Williamson > > > > Cc: Darrel Goeddel > > > > Cc: Mark Scott , > > > > Cc: Romil Sharma > > > > Cc: Ashok Raj > > > > > > Tentatively applied to pci/virtualization for v5.8, thanks! > > > > > > The spec says this handling must apply "when DMA remapping is > > > enabled". The patch does not check whether DMA remapping is enabled. > > > > > > Is there any case where DMA remapping is *not* enabled, and we rely on > > > this patch to tell us whether the device is isolated? It sounds like > > > it may give the wrong answer in such a case? > > > > > > Can you confirm that I don't need to worry about this? > > > > I think all of this makes sense only when DMA remapping is enabled. > > Otherwise there is no enforcement for isolation. 
> > Yep, without an IOMMU all devices operate in the same IOVA space and we > have no isolation. We only enable ACS when an IOMMU driver requests it > and it's only used by IOMMU code to determine IOMMU grouping of > devices. Thanks, Thanks, Ashok and Alex. I wish it were more obvious from the code, but I am reassured. I also added a stable tag to help get this backported.
Re: [PATCH] PCI: Relax ACS requirement for Intel RCiEP devices.
On Mon, 1 Jun 2020 14:40:23 -0700 "Raj, Ashok" wrote: > On Mon, Jun 01, 2020 at 04:25:19PM -0500, Bjorn Helgaas wrote: > > On Thu, May 28, 2020 at 01:57:42PM -0700, Ashok Raj wrote: > > > All Intel platforms guarantee that all root complex implementations > > > must send transactions up to IOMMU for address translations. Hence for > > > RCiEP devices that are Vendor ID Intel, can claim exception for lack of > > > ACS support. > > > > > > > > > 3.16 Root-Complex Peer to Peer Considerations > > > When DMA remapping is enabled, peer-to-peer requests through the > > > Root-Complex must be handled > > > as follows: > > > • The input address in the request is translated (through first-level, > > > second-level or nested translation) to a host physical address (HPA). > > > The address decoding for peer addresses must be done only on the > > > translated HPA. Hardware implementations are free to further limit > > > peer-to-peer accesses to specific host physical address regions > > > (or to completely disallow peer-forwarding of translated requests). > > > • Since address translation changes the contents (address field) of > > > the PCI Express Transaction Layer Packet (TLP), for PCI Express > > > peer-to-peer requests with ECRC, the Root-Complex hardware must use > > > the new ECRC (re-computed with the translated address) if it > > > decides to forward the TLP as a peer request. > > > • Root-ports, and multi-function root-complex integrated endpoints, may > > > support additional peerto-peer control features by supporting PCI > > > Express > > > Access Control Services (ACS) capability. Refer to ACS capability in > > > PCI Express specifications for details. > > > > > > Since Linux didn't give special treatment to allow this exception, certain > > > RCiEP MFD devices are getting grouped in a single iommu group. This > > > doesn't permit a single device to be assigned to a guest for instance. 
> > > > > > In one vendor system: Device 14.x were grouped in a single IOMMU group. > > > > > > /sys/kernel/iommu_groups/5/devices/:00:14.0 > > > /sys/kernel/iommu_groups/5/devices/:00:14.2 > > > /sys/kernel/iommu_groups/5/devices/:00:14.3 > > > > > > After the patch: > > > /sys/kernel/iommu_groups/5/devices/:00:14.0 > > > /sys/kernel/iommu_groups/5/devices/:00:14.2 > > > /sys/kernel/iommu_groups/6/devices/:00:14.3 <<< new group > > > > > > 14.0 and 14.2 are integrated devices, but legacy end points. > > > Whereas 14.3 was a PCIe compliant RCiEP. > > > > > > 00:14.3 Network controller: Intel Corporation Device 9df0 (rev 30) > > > Capabilities: [40] Express (v2) Root Complex Integrated Endpoint, MSI 00 > > > > > > This permits assigning this device to a guest VM. > > > > > > Fixes: f096c061f552 ("iommu: Rework iommu_group_get_for_pci_dev()") > > > Signed-off-by: Ashok Raj > > > To: Joerg Roedel > > > To: Bjorn Helgaas > > > Cc: linux-ker...@vger.kernel.org > > > Cc: iommu@lists.linux-foundation.org > > > Cc: Lu Baolu > > > Cc: Alex Williamson > > > Cc: Darrel Goeddel > > > Cc: Mark Scott , > > > Cc: Romil Sharma > > > Cc: Ashok Raj > > > > Tentatively applied to pci/virtualization for v5.8, thanks! > > > > The spec says this handling must apply "when DMA remapping is > > enabled". The patch does not check whether DMA remapping is enabled. > > > > Is there any case where DMA remapping is *not* enabled, and we rely on > > this patch to tell us whether the device is isolated? It sounds like > > it may give the wrong answer in such a case? > > > > Can you confirm that I don't need to worry about this? > > I think all of this makes sense only when DMA remapping is enabled. > Otherwise there is no enforcement for isolation. Yep, without an IOMMU all devices operate in the same IOVA space and we have no isolation. We only enable ACS when an IOMMU driver requests it and it's only used by IOMMU code to determine IOMMU grouping of devices. 
Thanks, Alex
Re: [PATCH] PCI: Relax ACS requirement for Intel RCiEP devices.
On Mon, Jun 01, 2020 at 04:25:19PM -0500, Bjorn Helgaas wrote: > On Thu, May 28, 2020 at 01:57:42PM -0700, Ashok Raj wrote: > > All Intel platforms guarantee that all root complex implementations > > must send transactions up to IOMMU for address translations. Hence for > > RCiEP devices that are Vendor ID Intel, can claim exception for lack of > > ACS support. > > > > > > 3.16 Root-Complex Peer to Peer Considerations > > When DMA remapping is enabled, peer-to-peer requests through the > > Root-Complex must be handled > > as follows: > > • The input address in the request is translated (through first-level, > > second-level or nested translation) to a host physical address (HPA). > > The address decoding for peer addresses must be done only on the > > translated HPA. Hardware implementations are free to further limit > > peer-to-peer accesses to specific host physical address regions > > (or to completely disallow peer-forwarding of translated requests). > > • Since address translation changes the contents (address field) of > > the PCI Express Transaction Layer Packet (TLP), for PCI Express > > peer-to-peer requests with ECRC, the Root-Complex hardware must use > > the new ECRC (re-computed with the translated address) if it > > decides to forward the TLP as a peer request. > > • Root-ports, and multi-function root-complex integrated endpoints, may > > support additional peerto-peer control features by supporting PCI Express > > Access Control Services (ACS) capability. Refer to ACS capability in > > PCI Express specifications for details. > > > > Since Linux didn't give special treatment to allow this exception, certain > > RCiEP MFD devices are getting grouped in a single iommu group. This > > doesn't permit a single device to be assigned to a guest for instance. > > > > In one vendor system: Device 14.x were grouped in a single IOMMU group. 
> > > > /sys/kernel/iommu_groups/5/devices/:00:14.0 > > /sys/kernel/iommu_groups/5/devices/:00:14.2 > > /sys/kernel/iommu_groups/5/devices/:00:14.3 > > > > After the patch: > > /sys/kernel/iommu_groups/5/devices/:00:14.0 > > /sys/kernel/iommu_groups/5/devices/:00:14.2 > > /sys/kernel/iommu_groups/6/devices/:00:14.3 <<< new group > > > > 14.0 and 14.2 are integrated devices, but legacy end points. > > Whereas 14.3 was a PCIe compliant RCiEP. > > > > 00:14.3 Network controller: Intel Corporation Device 9df0 (rev 30) > > Capabilities: [40] Express (v2) Root Complex Integrated Endpoint, MSI 00 > > > > This permits assigning this device to a guest VM. > > > > Fixes: f096c061f552 ("iommu: Rework iommu_group_get_for_pci_dev()") > > Signed-off-by: Ashok Raj > > To: Joerg Roedel > > To: Bjorn Helgaas > > Cc: linux-ker...@vger.kernel.org > > Cc: iommu@lists.linux-foundation.org > > Cc: Lu Baolu > > Cc: Alex Williamson > > Cc: Darrel Goeddel > > Cc: Mark Scott , > > Cc: Romil Sharma > > Cc: Ashok Raj > > Tentatively applied to pci/virtualization for v5.8, thanks! > > The spec says this handling must apply "when DMA remapping is > enabled". The patch does not check whether DMA remapping is enabled. > > Is there any case where DMA remapping is *not* enabled, and we rely on > this patch to tell us whether the device is isolated? It sounds like > it may give the wrong answer in such a case? > > Can you confirm that I don't need to worry about this? I think all of this makes sense only when DMA remapping is enabled. Otherwise there is no enforcement for isolation. Cheers, Ashok ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH] PCI: Relax ACS requirement for Intel RCiEP devices.
On Thu, May 28, 2020 at 01:57:42PM -0700, Ashok Raj wrote: > All Intel platforms guarantee that all root complex implementations > must send transactions up to IOMMU for address translations. Hence for > RCiEP devices that are Vendor ID Intel, can claim exception for lack of > ACS support. > > > 3.16 Root-Complex Peer to Peer Considerations > When DMA remapping is enabled, peer-to-peer requests through the > Root-Complex must be handled > as follows: > • The input address in the request is translated (through first-level, > second-level or nested translation) to a host physical address (HPA). > The address decoding for peer addresses must be done only on the > translated HPA. Hardware implementations are free to further limit > peer-to-peer accesses to specific host physical address regions > (or to completely disallow peer-forwarding of translated requests). > • Since address translation changes the contents (address field) of > the PCI Express Transaction Layer Packet (TLP), for PCI Express > peer-to-peer requests with ECRC, the Root-Complex hardware must use > the new ECRC (re-computed with the translated address) if it > decides to forward the TLP as a peer request. > • Root-ports, and multi-function root-complex integrated endpoints, may > support additional peerto-peer control features by supporting PCI Express > Access Control Services (ACS) capability. Refer to ACS capability in > PCI Express specifications for details. > > Since Linux didn't give special treatment to allow this exception, certain > RCiEP MFD devices are getting grouped in a single iommu group. This > doesn't permit a single device to be assigned to a guest for instance. > > In one vendor system: Device 14.x were grouped in a single IOMMU group. 
> > /sys/kernel/iommu_groups/5/devices/:00:14.0 > /sys/kernel/iommu_groups/5/devices/:00:14.2 > /sys/kernel/iommu_groups/5/devices/:00:14.3 > > After the patch: > /sys/kernel/iommu_groups/5/devices/:00:14.0 > /sys/kernel/iommu_groups/5/devices/:00:14.2 > /sys/kernel/iommu_groups/6/devices/:00:14.3 <<< new group > > 14.0 and 14.2 are integrated devices, but legacy end points. > Whereas 14.3 was a PCIe compliant RCiEP. > > 00:14.3 Network controller: Intel Corporation Device 9df0 (rev 30) > Capabilities: [40] Express (v2) Root Complex Integrated Endpoint, MSI 00 > > This permits assigning this device to a guest VM. > > Fixes: f096c061f552 ("iommu: Rework iommu_group_get_for_pci_dev()") > Signed-off-by: Ashok Raj > To: Joerg Roedel > To: Bjorn Helgaas > Cc: linux-ker...@vger.kernel.org > Cc: iommu@lists.linux-foundation.org > Cc: Lu Baolu > Cc: Alex Williamson > Cc: Darrel Goeddel > Cc: Mark Scott , > Cc: Romil Sharma > Cc: Ashok Raj Tentatively applied to pci/virtualization for v5.8, thanks! The spec says this handling must apply "when DMA remapping is enabled". The patch does not check whether DMA remapping is enabled. Is there any case where DMA remapping is *not* enabled, and we rely on this patch to tell us whether the device is isolated? It sounds like it may give the wrong answer in such a case? Can you confirm that I don't need to worry about this? > --- > v2: Moved functionality from iommu to pci quirks - Alex Williamson > > drivers/pci/quirks.c | 15 +++ > 1 file changed, 15 insertions(+) > > diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c > index 28c9a2409c50..63373ca0a3fe 100644 > --- a/drivers/pci/quirks.c > +++ b/drivers/pci/quirks.c > @@ -4682,6 +4682,20 @@ static int pci_quirk_mf_endpoint_acs(struct pci_dev > *dev, u16 acs_flags) > PCI_ACS_CR | PCI_ACS_UF | PCI_ACS_DT); > } > > +static int pci_quirk_rciep_acs(struct pci_dev *dev, u16 acs_flags) > +{ > + /* > + * RCiEP's are required to allow p2p only on translated addresses. 
> + * Refer to Intel VT-d specification Section 3.16 Root-Complex Peer
> + * to Peer Considerations
> + */
> +	if (pci_pcie_type(dev) != PCI_EXP_TYPE_RC_END)
> +		return -ENOTTY;
> +
> +	return pci_acs_ctrl_enabled(acs_flags,
> +		PCI_ACS_SV | PCI_ACS_RR | PCI_ACS_CR | PCI_ACS_UF);
> +}
> +
>  static int pci_quirk_brcm_acs(struct pci_dev *dev, u16 acs_flags)
>  {
>  	/*
> @@ -4764,6 +4778,7 @@ static const struct pci_dev_acs_enabled {
>  	/* I219 */
>  	{ PCI_VENDOR_ID_INTEL, 0x15b7, pci_quirk_mf_endpoint_acs },
>  	{ PCI_VENDOR_ID_INTEL, 0x15b8, pci_quirk_mf_endpoint_acs },
> +	{ PCI_VENDOR_ID_INTEL, PCI_ANY_ID, pci_quirk_rciep_acs },
>  	/* QCOM QDF2xxx root ports */
>  	{ PCI_VENDOR_ID_QCOM, 0x0400, pci_quirk_qcom_rp_acs },
>  	{ PCI_VENDOR_ID_QCOM, 0x0401, pci_quirk_qcom_rp_acs },
> --
> 2.7.4
RE: [PATCH] iommu/arm-smmu-v3: expose numa_node attribute to users in sysfs
> -Original Message- > From: Robin Murphy [mailto:robin.mur...@arm.com] > Sent: Tuesday, June 2, 2020 1:14 AM > To: Song Bao Hua (Barry Song) ; w...@kernel.org; > h...@lst.de; m.szyprow...@samsung.com; iommu@lists.linux-foundation.org > Cc: Linuxarm ; linux-arm-ker...@lists.infradead.org > Subject: Re: [PATCH] iommu/arm-smmu-v3: expose numa_node attribute to > users in sysfs > > On 2020-05-30 10:15, Barry Song wrote: > > As tests show the latency of dma_unmap can increase dramatically while > > calling them cross NUMA nodes, especially cross CPU packages, eg. > > 300ns vs 800ns while waiting for the completion of CMD_SYNC in an > > empty command queue. The large latency causing by remote node will > > in turn make contention of the command queue more serious, and enlarge > > the latency of DMA users within local NUMA nodes. > > > > Users might intend to enforce NUMA locality with the consideration of > > the position of SMMU. The patch provides minor benefit by presenting > > this information to users directly, as they might want to know it without > > checking hardware spec at all. > > Hmm, given that dev-to_node() is a standard driver model thing, is there > not already some generic device property that can expose it - and if > not, should there be? Presumably if userspace cares enough to want to > know whereabouts in the system an IOMMU is, it probably also cares where > the actual endpoint devices are too. > > At the very least, it doesn't seem right for it to be specific to one > single IOMMU driver. Right now pci devices have generally got the numa_node in sysfs by drivers/pci/pci-sysfs.c static ssize_t numa_node_store(struct device *dev, struct device_attribute *attr, const char *buf, size_t count) { ... add_taint(TAINT_FIRMWARE_WORKAROUND, LOCKDEP_STILL_OK); pci_alert(pdev, FW_BUG "Overriding NUMA node to %d. 
Contact your vendor for updates.", node);
	dev->numa_node = node;
	return count;
}

static ssize_t numa_node_show(struct device *dev, struct device_attribute *attr,
			      char *buf)
{
	return sprintf(buf, "%d\n", dev->numa_node);
}
static DEVICE_ATTR_RW(numa_node);

For other devices that care about NUMA information, the specific drivers are doing that, for example:

drivers/dax/bus.c:	if (a == &dev_attr_numa_node.attr && !IS_ENABLED(CONFIG_NUMA))
drivers/dax/bus.c:	&dev_attr_numa_node.attr,
drivers/dma/idxd/sysfs.c:	&dev_attr_numa_node.attr,
drivers/hv/vmbus_drv.c:	&dev_attr_numa_node.attr,
drivers/nvdimm/bus.c:	&dev_attr_numa_node.attr,
drivers/nvme/host/core.c:	&dev_attr_numa_node.attr,

The SMMU is usually a platform device, so we can actually expose numa_node for platform_device, or globally expose numa_node for a general "device" if people don't oppose.

Barry

>
> Robin.
>
> > Signed-off-by: Barry Song
> > ---
> >  drivers/iommu/arm-smmu-v3.c | 40 +++++++++++++++++++++++++++++++++++++-
> >  1 file changed, 39 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
> > index 82508730feb7..754c4d59498b 100644
> > --- a/drivers/iommu/arm-smmu-v3.c
> > +++ b/drivers/iommu/arm-smmu-v3.c
> > @@ -4021,6 +4021,44 @@ err_reset_pci_ops: __maybe_unused;
> >  	return err;
> >  }
> >
> > +static ssize_t numa_node_show(struct device *dev,
> > +			      struct device_attribute *attr, char *buf)
> > +{
> > +	return sprintf(buf, "%d\n", dev_to_node(dev));
> > +}
> > +static DEVICE_ATTR_RO(numa_node);
> > +
> > +static umode_t arm_smmu_numa_attr_visible(struct kobject *kobj, struct attribute *a,
> > +					  int n)
> > +{
> > +	struct device *dev = container_of(kobj, typeof(*dev), kobj);
> > +
> > +	if (!IS_ENABLED(CONFIG_NUMA))
> > +		return 0;
> > +
> > +	if (a == &dev_attr_numa_node.attr &&
> > +	    dev_to_node(dev) == NUMA_NO_NODE)
> > +		return 0;
> > +
> > +	return a->mode;
> > +}
> > +
> > +static struct attribute *arm_smmu_dev_attrs[] = {
> > +	&dev_attr_numa_node.attr,
> > +	NULL
> > +};
> > +
> > +static struct attribute_group arm_smmu_dev_attrs_group = {
> > +	.attrs = arm_smmu_dev_attrs,
> > +	.is_visible = arm_smmu_numa_attr_visible,
> > +};
> > +
> > +static const struct attribute_group *arm_smmu_dev_attrs_groups[] = {
> > +	&arm_smmu_dev_attrs_group,
> > +	NULL,
> > +};
> > +
> >  static int arm_smmu_device_probe(struct platform_device *pdev)
> >  {
> >  	int irq, ret;
> > @@ -4097,7 +4135,7 @@ static int arm_smmu_device_probe(struct platform_device *pdev)
> >  		return ret;
> >
> >  	/* And we're up. Go go go! */
> > -	ret = iommu_device_sysfs_add(&smmu->iommu, dev, NULL,
> > +	ret = iommu_device_sysfs_add(&smmu->iommu, dev, arm_smmu_dev_attrs_groups,
> >  				     "smmu3.%pa", &ioaddr);
> >  	if (ret)
Re: [PATCH 0/2] Introduce PCI_FIXUP_IOMMU
On Thu, May 28, 2020 at 09:33:44AM +0200, Joerg Roedel wrote: > On Wed, May 27, 2020 at 01:18:42PM -0500, Bjorn Helgaas wrote: > > Is this slowdown significant? We already iterate over every device > > when applying PCI_FIXUP_FINAL quirks, so if we used the existing > > PCI_FIXUP_FINAL, we wouldn't be adding a new loop. We would only be > > adding two more iterations to the loop in pci_do_fixups() that tries > > to match quirks against the current device. I doubt that would be a > > measurable slowdown. > > I don't know how significant it is, but I remember people complaining > about adding new PCI quirks because it takes too long for them to run > them all. That was in the discussion about the quirk disabling ATS on > AMD Stoney systems. > > So it probably depends on how many PCI devices are in the system whether > it causes any measureable slowdown. I found this [1] from Paul Menzel, which was a slowdown caused by quirk_usb_early_handoff(). I think the real problem is individual quirks that take a long time. The PCI_FIXUP_IOMMU things we're talking about should be fast, and of course, they're only run for matching devices anyway. So I'd rather keep them as PCI_FIXUP_FINAL than add a whole new phase. Bjorn [1] https://lore.kernel.org/linux-pci/b1533fd5-1fae-7256-9597-36d3d5de9...@molgen.mpg.de/
Re: [PATCH] iommu/amd: Fix event counter availability check
> Instead of blindly moving the code around to a spot that would just work,
> I am trying to understand what might be required here. In this case,
> the init_device_table_dma() should not be needed. I suspect it's the IOMMU
> invalidate all command that's also needed here.
>
> I'm also checking with the HW and BIOS team. Meanwhile, could you please give
> the following change a try:

Yes, this also fixes the problem for me.

Alexander

> diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
> index 5b81fd16f5fa..b07458cc1b0b 100644
> --- a/drivers/iommu/amd_iommu_init.c
> +++ b/drivers/iommu/amd_iommu_init.c
> @@ -1875,6 +1875,8 @@ static int __init amd_iommu_init_pci(void)
>  		ret = iommu_init_pci(iommu);
>  		if (ret)
>  			break;
> +		iommu_flush_all_caches(iommu);
> +		init_iommu_perf_ctr(iommu);
>  	}
>
>  	/*
Re: [PATCH] iommu/amd: Fix event counter availability check
Alexander,

On 6/1/20 4:01 PM, Alexander Monakov wrote:
> On Mon, 1 Jun 2020, Suravee Suthikulpanit wrote:
>>> Moving init_iommu_perf_ctr just after iommu_flush_all_caches resolves
>>> the issue. This is the earliest point in amd_iommu_init_pci where the
>>> call succeeds on my laptop.
>>
>> According to your description, it should just need to be anywhere after
>> the pci_enable_device() is called for the IOMMU device, isn't it?
>> So, on your system, what if we just move the init_iommu_perf_ctr() here:
>
> No, this doesn't work, as I already said in the paragraph you are
> responding to. See my last sentence in the quoted part.
>
> So the implication is init_device_table_dma together with subsequent
> cache flush is also setting up something that is necessary for counters
> to be writable.
>
> Alexander

Instead of blindly moving the code around to a spot that would just work, I am trying to understand what might be required here. In this case, the init_device_table_dma() should not be needed. I suspect it's the IOMMU invalidate all command that's also needed here.

I'm also checking with the HW and BIOS team. Meanwhile, could you please give the following change a try:

diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
index 5b81fd16f5fa..b07458cc1b0b 100644
--- a/drivers/iommu/amd_iommu_init.c
+++ b/drivers/iommu/amd_iommu_init.c
@@ -1875,6 +1875,8 @@ static int __init amd_iommu_init_pci(void)
 		ret = iommu_init_pci(iommu);
 		if (ret)
 			break;
+		iommu_flush_all_caches(iommu);
+		init_iommu_perf_ctr(iommu);
 	}

 	/*
--
2.17.1

Thanks,
Suravee
[PATCH] iommu: Optimize the error handling when allocating an iommu group
Optimize the error handling to free the resources correctly when failing to allocate an iommu group.

Signed-off-by: Baolin Wang
---
 drivers/iommu/iommu.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 03d6a26..ac91024 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -529,12 +529,18 @@ struct iommu_group *iommu_group_alloc(void)

 	ret = iommu_group_create_file(group,
 				      &iommu_group_attr_reserved_regions);
-	if (ret)
+	if (ret) {
+		kobject_put(group->devices_kobj);
 		return ERR_PTR(ret);
+	}

 	ret = iommu_group_create_file(group, &iommu_group_attr_type);
-	if (ret)
+	if (ret) {
+		iommu_group_remove_file(group,
+					&iommu_group_attr_reserved_regions);
+		kobject_put(group->devices_kobj);
 		return ERR_PTR(ret);
+	}

 	pr_debug("Allocated group %d\n", group->id);
--
1.8.3.1
Re: [PATCH v2 00/33] iommu: Move iommu_group setup to IOMMU core code
On Mon Jun 01 20, Jerry Snitselaar wrote:
On Fri May 29 20, Jerry Snitselaar wrote:
On Tue Apr 14 20, Joerg Roedel wrote:

Hi,

here is the second version of this patch-set. The first version with some more introductory text can be found here:

https://lore.kernel.org/lkml/20200407183742.4344-1-j...@8bytes.org/

Changes v1->v2:

* Rebased to v5.7-rc1
* Re-wrote the arm-smmu changes as suggested by Robin Murphy
* Re-worked the Exynos patches to hopefully not break the driver anymore
* Fixed a missing mutex_unlock() reported by Marek Szyprowski, thanks for that.

There is also a git-branch available with these patches applied:

https://git.kernel.org/pub/scm/linux/kernel/git/joro/linux.git/log/?h=iommu-probe-device-v2

Please review.

Thanks,

Joerg

Joerg Roedel (32):
  iommu: Move default domain allocation to separate function
  iommu/amd: Implement iommu_ops->def_domain_type call-back
  iommu/vt-d: Wire up iommu_ops->def_domain_type
  iommu/amd: Remove dma_mask check from check_device()
  iommu/amd: Return -ENODEV in add_device when device is not handled by IOMMU
  iommu: Add probe_device() and remove_device() call-backs
  iommu: Move default domain allocation to iommu_probe_device()
  iommu: Keep a list of allocated groups in __iommu_probe_device()
  iommu: Move new probe_device path to separate function
  iommu: Split off default domain allocation from group assignment
  iommu: Move iommu_group_create_direct_mappings() out of iommu_group_add_device()
  iommu: Export bus_iommu_probe() and make is safe for re-probing
  iommu/amd: Remove dev_data->passthrough
  iommu/amd: Convert to probe/release_device() call-backs
  iommu/vt-d: Convert to probe/release_device() call-backs
  iommu/arm-smmu: Convert to probe/release_device() call-backs
  iommu/pamu: Convert to probe/release_device() call-backs
  iommu/s390: Convert to probe/release_device() call-backs
  iommu/virtio: Convert to probe/release_device() call-backs
  iommu/msm: Convert to probe/release_device() call-backs
  iommu/mediatek: Convert to probe/release_device() call-backs
  iommu/mediatek-v1 Convert to probe/release_device() call-backs
  iommu/qcom: Convert to probe/release_device() call-backs
  iommu/rockchip: Convert to probe/release_device() call-backs
  iommu/tegra: Convert to probe/release_device() call-backs
  iommu/renesas: Convert to probe/release_device() call-backs
  iommu/omap: Remove orphan_dev tracking
  iommu/omap: Convert to probe/release_device() call-backs
  iommu/exynos: Use first SYSMMU in controllers list for IOMMU core
  iommu/exynos: Convert to probe/release_device() call-backs
  iommu: Remove add_device()/remove_device() code-paths
  iommu: Unexport iommu_group_get_for_dev()

Sai Praneeth Prakhya (1):
  iommu: Add def_domain_type() callback in iommu_ops

 drivers/iommu/amd_iommu.c       |  97
 drivers/iommu/amd_iommu_types.h |   1 -
 drivers/iommu/arm-smmu-v3.c     |  38 +--
 drivers/iommu/arm-smmu.c        |  39 ++--
 drivers/iommu/exynos-iommu.c    |  24 +-
 drivers/iommu/fsl_pamu_domain.c |  22 +-
 drivers/iommu/intel-iommu.c     |  68 +-
 drivers/iommu/iommu.c           | 393 +---
 drivers/iommu/ipmmu-vmsa.c      |  60 ++---
 drivers/iommu/msm_iommu.c       |  34 +--
 drivers/iommu/mtk_iommu.c       |  24 +-
 drivers/iommu/mtk_iommu_v1.c    |  50 ++--
 drivers/iommu/omap-iommu.c      |  99 ++--
 drivers/iommu/qcom_iommu.c      |  24 +-
 drivers/iommu/rockchip-iommu.c  |  26 +--
 drivers/iommu/s390-iommu.c      |  22 +-
 drivers/iommu/tegra-gart.c      |  24 +-
 drivers/iommu/tegra-smmu.c      |  31 +--
 drivers/iommu/virtio-iommu.c    |  41 +---
 include/linux/iommu.h           |  21 +-
 20 files changed, 533 insertions(+), 605 deletions(-)

-- 
2.17.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Hi Joerg,

With this patchset, I have an EPYC system where, if I boot with iommu=nopt and force a dump, I will see some I/O page faults for a NIC on the system. The vmcore is harvested and the system reboots. I haven't reproduced it on other systems yet, but without the patchset I don't see the I/O page faults during the kdump.

Regards,
Jerry

I just hit an issue on a separate Intel-based system (kdump, iommu=nopt), where it panics during intel_iommu_attach_device, in is_aux_domain, due to device_domain_info being DEFER_DEVICE_DOMAIN_INFO. That doesn't get set to a valid address until the domain_add_dev_info call. Is it as simple as the following?

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 29d3940847d3..f1bbeed46a4c 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -5053,8 +5053,8 @@ is_aux_domain(struct device *dev, struct iommu_domain *domain)
 {
 	struct device_domain_info *info = dev->archdata.iommu;
 
-	return info && info->auxd_enabled &&
-	       domain->type == IOMMU_DOMAIN_UNMANA
Re: [PATCH] iommu/arm-smmu-v3: expose numa_node attribute to users in sysfs
On 2020-05-30 10:15, Barry Song wrote:

> As tests show, the latency of dma_unmap can increase dramatically when calling it across NUMA nodes, especially across CPU packages, e.g. 300ns vs 800ns while waiting for the completion of CMD_SYNC in an empty command queue. The large latency caused by a remote node will in turn make contention on the command queue more serious, and enlarge the latency of DMA users within local NUMA nodes. Users might intend to enforce NUMA locality with consideration of the position of the SMMU. The patch provides a minor benefit by presenting this information to users directly, as they might want to know it without checking the hardware spec at all.

Hmm, given that dev_to_node() is a standard driver model thing, is there not already some generic device property that can expose it - and if not, should there be? Presumably if userspace cares enough to want to know whereabouts in the system an IOMMU is, it probably also cares where the actual endpoint devices are too. At the very least, it doesn't seem right for it to be specific to one single IOMMU driver.

Robin.
Signed-off-by: Barry Song
---
 drivers/iommu/arm-smmu-v3.c | 40 -
 1 file changed, 39 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index 82508730feb7..754c4d59498b 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -4021,6 +4021,44 @@
 err_reset_pci_ops: __maybe_unused;
 	return err;
 }
 
+static ssize_t numa_node_show(struct device *dev,
+			      struct device_attribute *attr, char *buf)
+{
+	return sprintf(buf, "%d\n", dev_to_node(dev));
+}
+static DEVICE_ATTR_RO(numa_node);
+
+static umode_t arm_smmu_numa_attr_visible(struct kobject *kobj, struct attribute *a,
+					  int n)
+{
+	struct device *dev = container_of(kobj, typeof(*dev), kobj);
+
+	if (!IS_ENABLED(CONFIG_NUMA))
+		return 0;
+
+	if (a == &dev_attr_numa_node.attr &&
+	    dev_to_node(dev) == NUMA_NO_NODE)
+		return 0;
+
+	return a->mode;
+}
+
+static struct attribute *arm_smmu_dev_attrs[] = {
+	&dev_attr_numa_node.attr,
+	NULL
+};
+
+static struct attribute_group arm_smmu_dev_attrs_group = {
+	.attrs = arm_smmu_dev_attrs,
+	.is_visible = arm_smmu_numa_attr_visible,
+};
+
+static const struct attribute_group *arm_smmu_dev_attrs_groups[] = {
+	&arm_smmu_dev_attrs_group,
+	NULL,
+};
+
 static int arm_smmu_device_probe(struct platform_device *pdev)
 {
 	int irq, ret;
@@ -4097,7 +4135,7 @@ static int arm_smmu_device_probe(struct platform_device *pdev)
 		return ret;
 
 	/* And we're up. Go go go! */
-	ret = iommu_device_sysfs_add(&smmu->iommu, dev, NULL,
+	ret = iommu_device_sysfs_add(&smmu->iommu, dev, arm_smmu_dev_attrs_groups,
 				     "smmu3.%pa", &ioaddr);
 	if (ret)
 		return ret;
RE: [PATCH v7 21/24] iommu/arm-smmu-v3: Add stall support for platform devices
Hi Jean,

> -Original Message-
> From: iommu [mailto:iommu-boun...@lists.linux-foundation.org] On Behalf Of Jean-Philippe Brucker
> Sent: 19 May 2020 18:55
> To: iommu@lists.linux-foundation.org; devicet...@vger.kernel.org; linux-arm-ker...@lists.infradead.org; linux-...@vger.kernel.org; linux...@kvack.org
> Cc: fenghua...@intel.com; kevin.t...@intel.com; j...@ziepe.ca; catalin.mari...@arm.com; robin.mur...@arm.com; h...@infradead.org; zhangfei@linaro.org; Jean-Philippe Brucker; felix.kuehl...@amd.com; w...@kernel.org; christian.koe...@amd.com
> Subject: [PATCH v7 21/24] iommu/arm-smmu-v3: Add stall support for platform devices
>
> The SMMU provides a Stall model for handling page faults in platform
> devices. It is similar to PCI PRI, but doesn't require devices to have
> their own translation cache. Instead, faulting transactions are parked
> and the OS is given a chance to fix the page tables and retry the
> transaction.
>
> Enable stall for devices that support it (opt-in by firmware). When an
> event corresponds to a translation error, call the IOMMU fault handler.
> If the fault is recoverable, it will call us back to terminate or
> continue the stall.
>
> Signed-off-by: Jean-Philippe Brucker
> ---
>  drivers/iommu/Kconfig       |   1 +
>  include/linux/iommu.h       |   2 +
>  drivers/iommu/arm-smmu-v3.c | 284 ++--
>  drivers/iommu/of_iommu.c    |   5 +-
>  4 files changed, 281 insertions(+), 11 deletions(-)
>
[...]
> +static int arm_smmu_page_response(struct device *dev,
> +				  struct iommu_fault_event *unused,
> +				  struct iommu_page_response *resp)
> +{
> +	struct arm_smmu_cmdq_ent cmd = {0};
> +	struct arm_smmu_master *master = dev_iommu_priv_get(dev);
> +	int sid = master->streams[0].id;
> +
> +	if (master->stall_enabled) {
> +		cmd.opcode = CMDQ_OP_RESUME;
> +		cmd.resume.sid = sid;
> +		cmd.resume.stag = resp->grpid;
> +		switch (resp->code) {
> +		case IOMMU_PAGE_RESP_INVALID:
> +		case IOMMU_PAGE_RESP_FAILURE:
> +			cmd.resume.resp = CMDQ_RESUME_0_RESP_ABORT;
> +			break;
> +		case IOMMU_PAGE_RESP_SUCCESS:
> +			cmd.resume.resp = CMDQ_RESUME_0_RESP_RETRY;
> +			break;
> +		default:
> +			return -EINVAL;
> +		}
> +	} else {
> +		/* TODO: insert PRI response here */
> +		return -ENODEV;
> +	}
> +
> +	arm_smmu_cmdq_issue_cmd(master->smmu, &cmd);
> +	/*
> +	 * Don't send a SYNC, it doesn't do anything for RESUME or PRI_RESP.
> +	 * RESUME consumption guarantees that the stalled transaction will be
> +	 * terminated... at some point in the future. PRI_RESP is fire and
> +	 * forget.
> +	 */
> +
> +	return 0;
> +}
> +
>  /* Context descriptor manipulation functions */
>  static void arm_smmu_tlb_inv_asid(struct arm_smmu_device *smmu, u16 asid)
>  {
> @@ -1762,8 +1846,7 @@ static int arm_smmu_write_ctx_desc(struct arm_smmu_domain *smmu_domain,
>  			FIELD_PREP(CTXDESC_CD_0_ASID, cd->asid) |
>  			CTXDESC_CD_0_V;
>  
> -		/* STALL_MODEL==0b10 && CD.S==0 is ILLEGAL */
> -		if (smmu->features & ARM_SMMU_FEAT_STALL_FORCE)
> +		if (smmu_domain->stall_enabled)
>  			val |= CTXDESC_CD_0_S;
>  	}
>  
> @@ -2171,7 +2254,7 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
>  			 FIELD_PREP(STRTAB_STE_1_STRW, strw));
>  
>  		if (smmu->features & ARM_SMMU_FEAT_STALLS &&
> -		    !(smmu->features & ARM_SMMU_FEAT_STALL_FORCE))
> +		    !master->stall_enabled)
>  			dst[1] |= cpu_to_le64(STRTAB_STE_1_S1STALLD);
>  
>  		val |= (s1_cfg->cdcfg.cdtab_dma & STRTAB_STE_0_S1CTXPTR_MASK) |
> @@ -2248,7 +2331,6 @@ static int arm_smmu_init_l2_strtab(struct arm_smmu_device *smmu, u32 sid)
>  	return 0;
>  }
>  
> -__maybe_unused
>  static struct arm_smmu_master *
>  arm_smmu_find_master(struct arm_smmu_device *smmu, u32 sid)
>  {
> @@ -2275,23 +2357,123 @@ arm_smmu_find_master(struct arm_smmu_device *smmu, u32 sid)
>  }
>  
>  /* IRQ and event handlers */
> +static int arm_smmu_handle_evt(struct arm_smmu_device *smmu, u64 *evt)
> +{
> +	int ret;
> +	u32 perm = 0;
> +	struct arm_smmu_master *master;
> +	bool ssid_valid = evt[0] & EVTQ_0_SSV;
> +	u8 type = FIELD_GET(EVTQ_0_ID, evt[0]);
> +	u32 sid = FIELD_GET(EVTQ_0_SID, evt[0]);
> +	struct iommu_fault_event fault_evt = { };
> +	struct iommu_fault *flt = &fault_evt.fault;
> +
> +	/* Stage-2 is always pinned at the moment */
> +	if (evt[1] & EVTQ_1_S2)
> +		return -EFAULT;
> +
> +	master = arm_smmu_
Re: [PATCH] iommu: Improve exception handling in iommu_group_alloc()
> Optimize the error handling to free the resources correctly when
> failed to allocate an iommu group.

* I would not categorise the desired completion of exception handling as a software optimisation.

* Would you like to add the tag "Fixes" to the commit message?

* I suggest avoiding duplicated function calls. Would it be helpful to add a few jump targets?
  https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/process/coding-style.rst?id=3d77e6a8804abcc0504c904bd6e5cdf3a5cf8162#n455

Regards,
Markus
[PATCH RFC 2/2] iommu/arm-smmu-v3: Remove cmpxchg() in arm_smmu_cmdq_issue_cmdlist()
It has been shown that the cmpxchg() for finding space in the cmdq can be a bottleneck:
- With more CPUs contending for the cmdq, cmpxchg() will fail more often.
- Since the software-maintained cons pointer is updated in the same 64-bit memory region, the chance of cmpxchg() failure increases again.

The cmpxchg() is removed as part of 2 related changes:

- If a CMD_SYNC is always issued for a batch, the cmdq can - in theory - never fill, since we always wait for a CMD_SYNC to be consumed. We must issue the CMD_SYNC so that a CPU will always be limited to issuing max_cmd_per_batch commands. Generally, for DMA unmap ops, a CMD_SYNC is always issued anyway.

  As such, the cmdq locking is removed, and we no longer maintain cons in software (this needs to be revised for !MSI support).

- Update prod and the cmdq owner in a single operation. For this, we count the prod and owner in separate regions of arm_smmu_ll_queue.prod.

  As with simple binary counting, once the prod+wrap fields overflow, they will zero. They will overflow into the "owner" region, but this is OK as we have accounted for the extra value.

  As for the "owner", we now count this value instead of setting a flag. Similar to before, once the owner has finished gathering, it will clear this mask. As such, a CPU declares itself the "owner" when it reads zero for this field. This zeroing will also clear a possible overflow in the wrap+prod region.
Signed-off-by: John Garry
---
 drivers/iommu/arm-smmu-v3.c | 58 +++--
 1 file changed, 14 insertions(+), 44 deletions(-)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index e607ab5a4cbd..d6c7d82f9cf8 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -1375,7 +1375,7 @@ static int arm_smmu_cmdq_issue_cmdlist(struct arm_smmu_device *smmu,
 					u64 *cmds, int n, bool sync)
 {
 	u64 cmd_sync[CMDQ_ENT_DWORDS];
-	u32 prod;
+	u32 prod, prodx;
 	unsigned long flags;
 	bool owner;
 	struct arm_smmu_cmdq *cmdq = &smmu->cmdq;
@@ -1383,33 +1383,21 @@ static int arm_smmu_cmdq_issue_cmdlist(struct arm_smmu_device *smmu,
 		.max_n_shift = cmdq->q.llq.max_n_shift,
 	}, head = llq;
 	int ret = 0;
+	u32 owner_val = 1 << cmdq->q.llq.owner_count_shift;
+	u32 prod_mask = GENMASK(cmdq->q.llq.max_n_shift, 0);
+	u32 owner_mask = GENMASK(30, cmdq->q.llq.owner_count_shift);
+
+	/* always issue a CMD_SYNC TODO: fixup callers for this */
+	sync = true;
 
 	/* 1. Allocate some space in the queue */
 	local_irq_save(flags);
-	llq.val = READ_ONCE(cmdq->q.llq.val);
-	do {
-		u64 old;
-
-		while (!queue_has_space(&llq, n + sync)) {
-			local_irq_restore(flags);
-			if (arm_smmu_cmdq_poll_until_not_full(smmu, &llq))
-				dev_err_ratelimited(smmu->dev, "CMDQ timeout\n");
-			local_irq_save(flags);
-		}
-		head.cons = llq.cons;
-		head.prod = queue_inc_prod_n(&llq, n + sync) |
-					     CMDQ_PROD_OWNED_FLAG;
+	prodx = atomic_fetch_add(n + sync + owner_val, &cmdq->q.llq.atomic.prod);
 
-		old = cmpxchg_relaxed(&cmdq->q.llq.val, llq.val, head.val);
-		if (old == llq.val)
-			break;
-
-		llq.val = old;
-	} while (1);
-	owner = !(llq.prod & CMDQ_PROD_OWNED_FLAG);
-	head.prod &= ~CMDQ_PROD_OWNED_FLAG;
-	llq.prod &= ~CMDQ_PROD_OWNED_FLAG;
+	owner = !(prodx & owner_mask);
+	llq.prod = prod_mask & prodx;
+	head.prod = queue_inc_prod_n(&llq, n + sync);
 
 	/*
 	 * 2. Write our commands into the queue
@@ -1420,14 +1408,6 @@ static int arm_smmu_cmdq_issue_cmdlist(struct arm_smmu_device *smmu,
 		prod = queue_inc_prod_n(&llq, n);
 		arm_smmu_cmdq_build_sync_cmd(cmd_sync, smmu, prod);
 		queue_write(Q_ENT(&cmdq->q, prod), cmd_sync, CMDQ_ENT_DWORDS);
-
-		/*
-		 * In order to determine completion of our CMD_SYNC, we must
-		 * ensure that the queue can't wrap twice without us noticing.
-		 * We achieve that by taking the cmdq lock as shared before
-		 * marking our slot as valid.
-		 */
-		arm_smmu_cmdq_shared_lock(cmdq);
 	}
 
 	/* 3. Mark our slots as valid, ensuring commands are visible first */
@@ -1439,11 +1419,10 @@ static int arm_smmu_cmdq_issue_cmdlist(struct arm_smmu_device *smmu,
 
 	/* a. Wait for previous owner to finish */
 	atomic_cond_rea
[PATCH RFC 0/2] iommu/arm-smmu-v3: Improve cmdq lock efficiency
As mentioned in [0], the CPU may consume many cycles processing arm_smmu_cmdq_issue_cmdlist(). One issue we found is that the cmpxchg() loop to get space on the queue takes approx. 25% of the cycles for this function.

The cmpxchg() is removed as follows:

- We assume that the cmdq can never fill, thanks to changes that limit the batch size (where necessary) and always issue a CMD_SYNC for a batch. We need to do this since we no longer maintain the cons value in software, and we cannot deal properly with there being no available space.

- Replace cmpxchg() with an atomic increment operation to maintain the prod and owner values.

Early experiments have shown that we may see a 25% boost in throughput IOPS for my NVMe test with these changes. And some CPUs, which were loaded at ~55%, now see a ~45% load. So, even though the changes are incomplete and other parts of the driver will need fixing up (and it looks maybe broken for !MSI support), the performance boost seen would seem to be worth the effort of exploring this.

Comments requested please. Thanks

[0] https://lore.kernel.org/linux-iommu/b926444035e5e2439431908e3842afd24b8...@dggemi525-mbs.china.huawei.com/T/#ma02e301c38c3e94b7725e685757c27e39c7cbde3

John Garry (2):
  iommu/arm-smmu-v3: Calculate bits for prod and owner
  iommu/arm-smmu-v3: Remove cmpxchg() in arm_smmu_cmdq_issue_cmdlist()

 drivers/iommu/arm-smmu-v3.c | 92 +++--
 1 file changed, 47 insertions(+), 45 deletions(-)

-- 
2.16.4
[PATCH RFC 1/2] iommu/arm-smmu-v3: Calculate bits for prod and owner
Since arm_smmu_ll_queue.prod will be used for counting the "owner" value as well as the HW prod pointer, calculate how many bits are available for, and used by, each. This is based on the number of possible CPUs in the system. And we require that each CPU can issue a minimum of 2 commands per batch - 1 x CMD_SYNC and at least 1 x other.

Ignoring the limits of HW max_n_shift and HW cmdq memory allocation, approx. 16K is the max number of supported CPUs. For this, max_n_shift would be 15.

Signed-off-by: John Garry
---
 drivers/iommu/arm-smmu-v3.c | 34 +-
 1 file changed, 33 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index 82508730feb7..e607ab5a4cbd 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -530,6 +530,8 @@ struct arm_smmu_ll_queue {
 		u8 __pad[SMP_CACHE_BYTES];
 	} ____cacheline_aligned_in_smp;
 	u32 max_n_shift;
+	u32 max_cmd_per_batch;
+	u32 owner_count_shift;
 };
 
 struct arm_smmu_queue {
@@ -1512,7 +1514,10 @@ static void arm_smmu_cmdq_batch_add(struct arm_smmu_device *smmu,
 				    struct arm_smmu_cmdq_batch *cmds,
 				    struct arm_smmu_cmdq_ent *cmd)
 {
-	if (cmds->num == CMDQ_BATCH_ENTRIES) {
+	struct arm_smmu_cmdq *q = &smmu->cmdq;
+	struct arm_smmu_ll_queue *llq = &q->q.llq;
+
+	if (cmds->num == llq->max_cmd_per_batch) {
 		arm_smmu_cmdq_issue_cmdlist(smmu, cmds->cmds, cmds->num, false);
 		cmds->num = 0;
 	}
@@ -3156,8 +3161,24 @@ static int arm_smmu_init_one_queue(struct arm_smmu_device *smmu,
 				   unsigned long cons_off,
 				   size_t dwords, const char *name)
 {
+	int cpus = num_possible_cpus();
 	size_t qsz;
 
+	/*
+	 * We can get the number of bits required for owner counting by
+	 * log2(nr possible cpus) + 1, but we have to take into account that
+	 * the wrap+prod could overflow before the owner zeroes, so add 1
+	 * more (to cpus) for the bits_for_cmdq_owner calculation.
+	 */
+	int bits_for_cmdq_owner = ilog2(cpus + 1) + 1;
+	/* 1-bit for overflow, 1-bit for wrap */
+	int bits_available_for_prod = 32 - 2 - bits_for_cmdq_owner;
+	int entries_for_prod;
+
+	if (bits_available_for_prod < 1) /* this would be insane - how many CPUs? */
+		return -ENOMEM;
+
+	q->llq.max_n_shift = min_t(int, q->llq.max_n_shift, bits_available_for_prod);
 	do {
 		qsz = ((1 << q->llq.max_n_shift) * dwords) << 3;
 		q->base = dmam_alloc_coherent(smmu->dev, qsz, &q->base_dma,
@@ -3167,6 +3188,17 @@ static int arm_smmu_init_one_queue(struct arm_smmu_device *smmu,
 		q->llq.max_n_shift--;
 	} while (1);
 
+	entries_for_prod = 1 << q->llq.max_n_shift;
+
+	/*
+	 * We need at least 2 commands in a batch (1 x CMD_SYNC and 1 x whatever else).
+	 * Assuming orig max_n_shift >= 17, this would mean ~16K CPUs max.
+	 */
+	if (entries_for_prod < 2 * cpus)
+		return -ENOMEM;
+
+	q->llq.max_cmd_per_batch = min_t(u32, entries_for_prod / cpus, CMDQ_BATCH_ENTRIES);
+	q->llq.owner_count_shift = q->llq.max_n_shift + 1;
 
 	if (!q->base) {
 		dev_err(smmu->dev,
-- 
2.16.4
[PATCH] iommu/arm-smmu-v3: allocate the memory of queues in local numa node
> From: Song Bao Hua (Barry Song)
> Sent: Monday, June 1, 2020 11:32 PM
> To: h...@lst.de; m.szyprow...@samsung.com; robin.mur...@arm.com; w...@kernel.org
> Cc: linux-arm-ker...@lists.infradead.org; iommu@lists.linux-foundation.org; Linuxarm; Song Bao Hua (Barry Song)
> Subject: [PATCH] iommu/arm-smmu-v3: allocate the memory of queues in local numa node
>
> dmam_alloc_coherent() will usually allocate memory from the default CMA. For
> a common arm64 defconfig without reserved memory in device tree, there
> is only one CMA close to address 0. [...]

Sorry for missing the RFC in the subject. For the tested platform, hardware will help the sync between CPU and SMMU, so there are no sync operations; please consider this patch an RFC. This is only for the concept.

The other option to fix this is creating a per-NUMA CMA area for the SMMU and assigning the per-NUMA cma_area to the SMMU. Maybe cma_declare_contiguous_nid(), used by mm/hugetlb.c, can be used by AARCH64/SMMU.

Or we can completely change CMA to create a default per-NUMA CMA like this:

struct page *dma_alloc_contiguous(struct device *dev, size_t size, gfp_t gfp)
{
	size_t count = size >> PAGE_SHIFT;
	struct page *page = NULL;
	struct cma *cma = NULL;
+	int nid = dev_to_node(dev);

	if (dev && dev->cma_area)
		cma = dev->cma_area;
	else if (count > 1)
-		cma = dma_contiguous_default_area;
+		cma = dma_contiguous_default_areas[nid];

	...
	return page;
}

Then there is no necessity to assign a per-NUMA cma_area to smmu->dev.cma_area.

-barry
[PATCH] iommu/arm-smmu-v3: allocate the memory of queues in local numa node
dmam_alloc_coherent() will usually allocate memory from the default CMA. For a common arm64 defconfig without reserved memory in the device tree, there is only one CMA, close to address 0. dma_alloc_contiguous() will allocate memory without any idea of NUMA, and the SMMU has no customized per-NUMA cma_area.

struct page *dma_alloc_contiguous(struct device *dev, size_t size, gfp_t gfp)
{
	size_t count = size >> PAGE_SHIFT;
	struct page *page = NULL;
	struct cma *cma = NULL;

	if (dev && dev->cma_area)
		cma = dev->cma_area;
	else if (count > 1)
		cma = dma_contiguous_default_area;

	...
	return page;
}

If there are N NUMA nodes, N-1 nodes will put command/evt queues etc. in a remote node that the default CMA belongs to, probably node 0. Tests show that, after sending CMD_SYNC in an empty command queue on node 2, without this patch it takes 550ns to wait for the completion of CMD_SYNC; with this patch, it takes 250ns.

Signed-off-by: Barry Song
---
 drivers/iommu/arm-smmu-v3.c | 63 -
 1 file changed, 48 insertions(+), 15 deletions(-)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index 82508730feb7..58295423e1d7 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -3157,21 +3157,23 @@ static int arm_smmu_init_one_queue(struct arm_smmu_device *smmu,
 				   size_t dwords, const char *name)
 {
 	size_t qsz;
+	struct page *page;
 
-	do {
-		qsz = ((1 << q->llq.max_n_shift) * dwords) << 3;
-		q->base = dmam_alloc_coherent(smmu->dev, qsz, &q->base_dma,
-					      GFP_KERNEL);
-		if (q->base || qsz < PAGE_SIZE)
-			break;
-
-		q->llq.max_n_shift--;
-	} while (1);
+	qsz = ((1 << q->llq.max_n_shift) * dwords) << 3;
+	page = alloc_pages_node(dev_to_node(smmu->dev), GFP_KERNEL,
+				get_order(qsz));
+	if (!page) {
+		dev_err(smmu->dev,
+			"failed to allocate queue (0x%zx bytes) for %s\n",
+			qsz, name);
+		return -ENOMEM;
+	}
 
-	if (!q->base) {
+	q->base = page_address(page);
+	q->base_dma = dma_map_single(smmu->dev, q->base, qsz, DMA_BIDIRECTIONAL);
+	if (dma_mapping_error(smmu->dev, q->base_dma)) {
 		dev_err(smmu->dev,
-			"failed to allocate queue (0x%zx bytes) for %s\n",
-			qsz, name);
+			"failed to dma map for %s\n", name);
 		return -ENOMEM;
 	}
 
@@ -3192,6 +3194,18 @@ static int arm_smmu_init_one_queue(struct arm_smmu_device *smmu,
 	return 0;
 }
 
+static int arm_smmu_deinit_one_queue(struct arm_smmu_device *smmu,
+				     struct arm_smmu_queue *q,
+				     size_t dwords)
+{
+	size_t qsz = ((1 << q->llq.max_n_shift) * dwords) << 3;
+
+	dma_unmap_single(smmu->dev, q->base_dma, qsz, DMA_BIDIRECTIONAL);
+	free_page((unsigned long)q->base);
+
+	return 0;
+}
+
 static void arm_smmu_cmdq_free_bitmap(void *data)
 {
 	unsigned long *bitmap = data;
@@ -3233,22 +3247,40 @@ static int arm_smmu_init_queues(struct arm_smmu_device *smmu)
 
 	ret = arm_smmu_cmdq_init(smmu);
 	if (ret)
-		return ret;
+		goto deinit_cmdq;
 
 	/* evtq */
 	ret = arm_smmu_init_one_queue(smmu, &smmu->evtq.q, ARM_SMMU_EVTQ_PROD,
 				      ARM_SMMU_EVTQ_CONS, EVTQ_ENT_DWORDS,
 				      "evtq");
 	if (ret)
-		return ret;
+		goto deinit_cmdq;
 
 	/* priq */
 	if (!(smmu->features & ARM_SMMU_FEAT_PRI))
 		return 0;
 
-	return arm_smmu_init_one_queue(smmu, &smmu->priq.q, ARM_SMMU_PRIQ_PROD,
+	ret = arm_smmu_init_one_queue(smmu, &smmu->priq.q, ARM_SMMU_PRIQ_PROD,
 				       ARM_SMMU_PRIQ_CONS, PRIQ_ENT_DWORDS,
 				       "priq");
+	if (ret)
+		goto deinit_evtq;
+
+	return 0;
+
+deinit_evtq:
+	arm_smmu_deinit_one_queue(smmu, &smmu->evtq.q, EVTQ_ENT_DWORDS);
+deinit_cmdq:
+	arm_smmu_deinit_one_queue(smmu, &smmu->cmdq.q, CMDQ_ENT_DWORDS);
+	return ret;
+}
+
+static void arm_smmu_deinit_queues(struct arm_smmu_device *smmu)
+{
+	arm_smmu_deinit_one_queue(smmu, &smmu->cmdq.q, CMDQ_ENT_DWORDS);
+	arm_smmu_deinit_one_queue(smmu, &smmu->evtq.q, EVTQ_ENT_DWORDS);
+	if (smmu->features & ARM_SMMU_FEAT_PRI)
+		arm_smmu_deinit_one_queue(smmu, &smmu->priq.q, PRIQ_ENT_DWORDS);
 }
 
 static int arm_smmu_init_l1_strtab(struct arm_smmu_device *smmu)
@@ -4121,6 +4153,7 @@ static int arm_sm
[Try 2] DMAR errors on Wildcat Point-LP xHCI (Lenovo T450s)
Well, try 1 was a word salad, thanks gmail I guess. Now I will try to redeem myself...

Hello,

Trying to use a built-in USB device I rarely use (a Sierra EM7345 LTE modem), I ran into issues ranging from the modem flapping (getting removed from the USB bus before LTE network registration is visible) up to all devices on the bus being "disconnected" (xhci giving up, in my understanding, with only the root hubs listed by lsusb). Checking the syslog, I find this:

May 30 17:35:46 localhost kernel: [ 278.999480] DMAR: DRHD: handling fault status reg 3
May 30 17:35:46 localhost kernel: [ 278.999485] DMAR: [DMA Read] Request device [00:16.7] PASID fault addr 9cdff000 [fault reason 02] Present bit in context entry is clear
May 30 17:35:46 localhost kernel: [ 278.999488] DMAR: DRHD: handling fault status reg 3
May 30 17:35:46 localhost kernel: [ 278.999490] DMAR: [DMA Read] Request device [00:16.7] PASID fault addr 9cdff000 [fault reason 02] Present bit in context entry is clear
May 30 17:35:46 localhost kernel: [ 279.001076] DMAR: DRHD: handling fault status reg 2
May 30 17:35:46 localhost kernel: [ 279.001078] DMAR: [DMA Write] Request device [00:16.7] PASID fault addr 9cdff000 [fault reason 02] Present bit in context entry is clear
May 30 17:35:46 localhost kernel: [ 279.001120] DMAR: DRHD: handling fault status reg 2
May 30 17:35:47 localhost kernel: [ 280.738192] usb 2-4: USB disconnect, device number 10
May 30 17:35:47 localhost kernel: [ 280.738224] cdc_mbim 2-4:1.0: Tx URB error: -19
May 30 17:35:47 localhost kernel: [ 280.738303] cdc_mbim 2-4:1.0 wwan0: unregister 'cdc_mbim' usb-0000:00:14.0-4, CDC MBIM
May 30 17:35:47 localhost ModemManager[736]: (net/wwan0): released by device '/sys/devices/pci0000:00/0000:00:14.0/usb2/2-4'
May 30 17:35:47 localhost ModemManager[736]: [/dev/cdc-wdm0] unexpected port hangup!
May 30 17:35:47 localhost ModemManager[736]: [/dev/cdc-wdm0] channel destroyed
May 30 17:35:47 localhost ModemManager[736]: Connection to mbim-proxy for /dev/cdc-wdm0 lost, reprobing
May 30 17:35:47 localhost ModemManager[736]: [device /sys/devices/pci0000:00/0000:00:14.0/usb2/2-4] creating modem with plugin 'Sierra' and '2' ports
May 30 17:35:47 localhost ModemManager[736]: Could not recreate modem for device '/sys/devices/pci0000:00/0000:00:14.0/usb2/2-4': Failed to find a net port in the MBIM modem
May 30 17:35:47 localhost ModemManager[736]: (usbmisc/cdc-wdm0): released by device '/sys/devices/pci0000:00/0000:00:14.0/usb2/2-4'
May 30 17:35:47 localhost ModemManager[736]: (tty/ttyACM0): released by device '/sys/devices/pci0000:00/0000:00:14.0/usb2/2-4'
May 30 17:35:48 localhost kernel: [ 281.202173] usb 2-4: new high-speed USB device number 11 using xhci_hcd
May 30 17:35:48 localhost kernel: [ 281.224037] usb 2-4: New USB device found, idVendor=8087, idProduct=0716, bcdDevice= 0.00
May 30 17:35:48 localhost kernel: [ 281.224043] usb 2-4: New USB device strings: Mfr=0, Product=0, SerialNumber=0
May 30 17:35:48 localhost kernel: [ 281.225646] usb_serial_simple 2-4:1.0: flashloader converter detected
May 30 17:35:48 localhost kernel: [ 281.225891] usb 2-4: flashloader converter now attached to ttyUSB0

which makes me suspect a missing IOMMU mapping in ACPI for the xhci controller. In this case, the xhci could recover and re-enumerate the device fairly quickly. Booting with "intel_iommu=off" makes the LTE modem work at least far enough that it gets registered to the network (can send/receive SMS). I have not tried data communication (no data plan on the current SIM). I have noticed for a while that this machine had a tendency to lose all USB devices more often than I enable the LTE modem, so it seems the modem just makes this issue more likely and is not its direct cause.
This is on 5.6.7 (Debian Sid 5.6.0-1-amd64, the version the pasted logs are from), and reproduced with 5.6.14 (Debian Sid 5.6.0-2-amd64). The USB issues have been happening for a long time, and I use this modem rarely enough that I would not notice a new issue until several kernel versions later. The modem usually worked "well enough" but has always flapped a bit, sometimes only working after one or two suspend/resume cycles, and until now I did not feel the need to investigate further (I assumed a less-than-optimal modem/modem driver). This time it never ended up working, even after several suspend/resume cycles and reboots. So I do not believe this is a localised regression, but rather a bad situation getting nudged over the edge.

$ lspci
00:00.0 Host bridge: Intel Corporation Broadwell-U Host Bridge -OPI (rev 09)
00:02.0 VGA compatible controller: Intel Corporation HD Graphics 5500 (rev 09)
00:03.0 Audio device: Intel Corporation Broadwell-U Audio Controller (rev 09)
00:14.0 USB controller: Intel Corporation Wildcat Point-LP USB xHCI Controller (rev 03)
00:16.0 Communication controller: Intel Corporation Wildcat Point-LP MEI Controller #1 (rev 03)
00:19.0 Ethernet contr
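For anyone hitting similar DMAR faults: the bracketed BDF in the fault line identifies the offending requester, which can then be matched against lspci output (here it points at 00:16.7, a function of the MEI controller, not the xhci itself). A minimal sketch of extracting it, using one of the log lines above as sample input rather than live dmesg output:

```shell
# Pull the faulting PCI device (bus:device.function) out of a DMAR fault line.
# The sample line stands in for `dmesg | grep DMAR` on a live system.
log='DMAR: [DMA Read] Request device [00:16.7] PASID fault addr 9cdff000 [fault reason 02] Present bit in context entry is clear'
bdf=$(printf '%s\n' "$log" | sed -n 's/.*Request device \[\([0-9a-f:.]*\)\].*/\1/p')
echo "$bdf"
# On a live system, identify the device with:  lspci -s "$bdf"
```

This prints `00:16.7`, which you can then look up in the lspci listing.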
Re: [PATCH v2 00/33] iommu: Move iommu_group setup to IOMMU core code
On Fri May 29 20, Jerry Snitselaar wrote:
On Tue Apr 14 20, Joerg Roedel wrote:

Hi,

here is the second version of this patch-set. The first version with some more introductory text can be found here:

https://lore.kernel.org/lkml/20200407183742.4344-1-j...@8bytes.org/

Changes v1->v2:

	* Rebased to v5.7-rc1
	* Re-wrote the arm-smmu changes as suggested by Robin Murphy
	* Re-worked the Exynos patches to hopefully not break the driver anymore
	* Fixed a missing mutex_unlock() reported by Marek Szyprowski, thanks for that.

There is also a git-branch available with these patches applied:

https://git.kernel.org/pub/scm/linux/kernel/git/joro/linux.git/log/?h=iommu-probe-device-v2

Please review.

Thanks,

	Joerg

Joerg Roedel (32):
  iommu: Move default domain allocation to separate function
  iommu/amd: Implement iommu_ops->def_domain_type call-back
  iommu/vt-d: Wire up iommu_ops->def_domain_type
  iommu/amd: Remove dma_mask check from check_device()
  iommu/amd: Return -ENODEV in add_device when device is not handled by IOMMU
  iommu: Add probe_device() and remove_device() call-backs
  iommu: Move default domain allocation to iommu_probe_device()
  iommu: Keep a list of allocated groups in __iommu_probe_device()
  iommu: Move new probe_device path to separate function
  iommu: Split off default domain allocation from group assignment
  iommu: Move iommu_group_create_direct_mappings() out of iommu_group_add_device()
  iommu: Export bus_iommu_probe() and make is safe for re-probing
  iommu/amd: Remove dev_data->passthrough
  iommu/amd: Convert to probe/release_device() call-backs
  iommu/vt-d: Convert to probe/release_device() call-backs
  iommu/arm-smmu: Convert to probe/release_device() call-backs
  iommu/pamu: Convert to probe/release_device() call-backs
  iommu/s390: Convert to probe/release_device() call-backs
  iommu/virtio: Convert to probe/release_device() call-backs
  iommu/msm: Convert to probe/release_device() call-backs
  iommu/mediatek: Convert to probe/release_device() call-backs
  iommu/mediatek-v1 Convert to probe/release_device() call-backs
  iommu/qcom: Convert to probe/release_device() call-backs
  iommu/rockchip: Convert to probe/release_device() call-backs
  iommu/tegra: Convert to probe/release_device() call-backs
  iommu/renesas: Convert to probe/release_device() call-backs
  iommu/omap: Remove orphan_dev tracking
  iommu/omap: Convert to probe/release_device() call-backs
  iommu/exynos: Use first SYSMMU in controllers list for IOMMU core
  iommu/exynos: Convert to probe/release_device() call-backs
  iommu: Remove add_device()/remove_device() code-paths
  iommu: Unexport iommu_group_get_for_dev()

Sai Praneeth Prakhya (1):
  iommu: Add def_domain_type() callback in iommu_ops

 drivers/iommu/amd_iommu.c       |  97
 drivers/iommu/amd_iommu_types.h |   1 -
 drivers/iommu/arm-smmu-v3.c     |  38 +--
 drivers/iommu/arm-smmu.c        |  39 ++--
 drivers/iommu/exynos-iommu.c    |  24 +-
 drivers/iommu/fsl_pamu_domain.c |  22 +-
 drivers/iommu/intel-iommu.c     |  68 +-
 drivers/iommu/iommu.c           | 393 +---
 drivers/iommu/ipmmu-vmsa.c      |  60 ++---
 drivers/iommu/msm_iommu.c       |  34 +--
 drivers/iommu/mtk_iommu.c       |  24 +-
 drivers/iommu/mtk_iommu_v1.c    |  50 ++--
 drivers/iommu/omap-iommu.c      |  99 ++--
 drivers/iommu/qcom_iommu.c      |  24 +-
 drivers/iommu/rockchip-iommu.c  |  26 +--
 drivers/iommu/s390-iommu.c      |  22 +-
 drivers/iommu/tegra-gart.c      |  24 +-
 drivers/iommu/tegra-smmu.c      |  31 +--
 drivers/iommu/virtio-iommu.c    |  41 +---
 include/linux/iommu.h           |  21 +-
 20 files changed, 533 insertions(+), 605 deletions(-)

-- 
2.17.1

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Hi Joerg,

With this patchset, I have an EPYC system where, if I boot with iommu=nopt and force a dump, I will see some IO page faults for a NIC on the system. The vmcore is harvested and the system reboots. I haven't reproduced it on other systems yet, but without the patchset I don't see the IO page faults during the kdump.
Regards,
Jerry

I just hit an issue on a separate Intel-based system (kdump, iommu=nopt), where it panics during intel_iommu_attach_device, in is_aux_domain, because device_domain_info is DEFER_DEVICE_DOMAIN_INFO. That doesn't get set to a valid address until the domain_add_dev_info call. Is it as simple as the following?

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 29d3940847d3..f1bbeed46a4c 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -5053,8 +5053,8 @@ is_aux_domain(struct device *dev, struct iommu_domain *domain)
 {
 	struct device_domain_info *info = dev->archdata.iommu;
 
-	return info && info->auxd_enabled &&
-	       domain->type == IOMMU_DOMAIN_UNMANAGED;
+	return info && info !=
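For readers following along, the bug class here is a "deferred" sentinel pointer being mistaken for a real object pointer. A minimal userspace sketch of the guard being proposed (the struct layout and sentinel value below are stand-ins for illustration, not the kernel's actual definitions):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct device_domain_info {
	bool auxd_enabled;
};

/* Stand-in sentinel: until attachment really happens, the per-device
 * pointer holds this marker rather than NULL or a valid allocation. */
#define DEFER_DEVICE_DOMAIN_INFO ((struct device_domain_info *)(-2L))

/* Mirrors the shape of the proposed fix: the sentinel must be rejected
 * alongside NULL before info is dereferenced; otherwise the deferred
 * marker would be dereferenced as if it were a real pointer. */
static bool is_aux_domain_sketch(struct device_domain_info *info)
{
	return info != NULL &&
	       info != DEFER_DEVICE_DOMAIN_INFO &&
	       info->auxd_enabled;
}
```

The key point is that any code reading the pointer before domain_add_dev_info runs must treat the sentinel exactly like "no info yet".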
Re: [PATCH] iommu/amd: Fix event counter availability check
On Mon, 1 Jun 2020, Suravee Suthikulpanit wrote:

> > Moving init_iommu_perf_ctr just after iommu_flush_all_caches resolves
> > the issue. This is the earliest point in amd_iommu_init_pci where the
> > call succeeds on my laptop.
>
> According to your description, it should just need to be anywhere after the
> pci_enable_device() is called for the IOMMU device, isn't it? So, on your
> system, what if we just move the init_iommu_perf_ctr() here:

No, this doesn't work, as I already said in the paragraph you are responding to. See my last sentence in the quoted part. So the implication is that init_device_table_dma, together with the subsequent cache flush, is also setting up something that is necessary for the counters to be writable.

Alexander
Re: [PATCH] iommu/amd: Fix event counter availability check
Hi Alexander,

On 5/30/20 3:07 AM, Alexander Monakov wrote:
> The driver performs an extra check if the IOMMU's capabilities advertise
> presence of performance counters: it verifies that counters are writable
> by writing a hard-coded value to a counter and testing that reading that
> counter gives back the same value.
>
> Unfortunately it does so quite early, even before pci_enable_device is
> called for the IOMMU, i.e. when accessing its MMIO space is not
> guaranteed to work. On Ryzen 4500U CPU, this actually breaks the test:
> the driver assumes the counters are not writable, and disables the
> functionality.
>
> Moving init_iommu_perf_ctr just after iommu_flush_all_caches resolves
> the issue. This is the earliest point in amd_iommu_init_pci where the
> call succeeds on my laptop.

According to your description, it should just need to be anywhere after the pci_enable_device() is called for the IOMMU device, isn't it? So, on your system, what if we just move the init_iommu_perf_ctr() here:

diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
index 5b81fd16f5fa..17b9ac9491e0 100644
--- a/drivers/iommu/amd_iommu_init.c
+++ b/drivers/iommu/amd_iommu_init.c
@@ -1875,6 +1875,7 @@ static int __init amd_iommu_init_pci(void)
 		ret = iommu_init_pci(iommu);
 		if (ret)
 			break;
+		init_iommu_perf_ctr(iommu);
 	}
 
 	/*
-- 
2.17.1

Does this work?

Thanks,
Suravee
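The writability check under discussion boils down to a write/read-back probe: write a known pattern to a counter register, read it back, and treat the counters as usable only if the value sticks. A minimal userspace sketch of that logic (plain memory stands in for the IOMMU's MMIO counter bank here; in the driver the access goes through the device's MMIO mapping, which is exactly why probing before the PCI device is fully set up can fail spuriously):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Write a test pattern to a counter "register", read it back, and report
 * whether the write took effect. `reg` is ordinary memory standing in for
 * an MMIO counter; the pattern value is arbitrary. */
static bool perf_ctr_writable(volatile uint64_t *reg)
{
	const uint64_t probe = 0xABCDULL;	/* arbitrary test pattern */
	uint64_t saved = *reg;

	*reg = probe;
	bool ok = (*reg == probe);		/* read-back must match */
	*reg = saved;				/* restore the original value */
	return ok;
}
```

On real hardware a failed read-back can mean either genuinely read-only counters or, as in this thread, a probe issued before the device's MMIO space is accessible, which is why the placement of the check matters.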