Re: [PATCH v2] Intel IOMMU patch to reprocess RMRR info
On Fri, 2012-09-28 at 20:15 +0100, David Woodhouse wrote: On Fri, 2012-09-28 at 12:36 -0400, Linda Knippers wrote: I can only speak to the HP servers. We have been shipping devices 'for a while' that provide sensor-type data to the platform. The device does DMA writes to a range of memory (the RMRR) and iLO does DMA reads of that data. This works in general but not when the 'iommu=pt' boot option is used. This patch associates the RMRR with the devices when they are moved out of the si domain. That much makes sense, I think, because they're moved out of the hardware SI domain *early*, when we realise they're actually only capable of 32-bit DMA and we have 4GiB of RAM. Right? Correct. By default all devices are added to SI domain assuming that these devices are 64-bit devices. When we detect the device is a 32-bit device based on the requested dma-mask, it gets removed from SI domain, hence looses its RMRR association. In the meantime dma continues causing DMA errors. This patch is re-processing RMRRs for the device in question and doing re-assignment. It sounds like this isn't a case of the device being used by a native driver or given to KVM, and subsequently released. This is just booting up and losing the RMRR regions on a device which the OS *hasn't* really touched. So that should be fixed. Based on Alex's comments about moving RMRR devices between domains, it sounds like we also have a problem without the 'iommu=pt' boot option if someone assigns one of those devices to a guest. Yeah... but why *would* they? What possible reason would we have to assign either the sensor device, or the iLO, to a KVM guest. Or to have a native driver that attempts to do DMA from them? Obviously, in an ideal world we'd have proper native drivers for the sensor device. But I'm guessing that's not the case here; it's used by the firmware and we're not supposed to be touching it? And yes, obviously a better hardware design (from the OS/IOMMU point of view) would be to have a path for the sensor data that *doesn't* go via host RAM and thus via the IOMMU twice. But while that's a lesson that's hopefully been learned and will be implemented in future, we have to deal with the existing hardware and its (ab)use of RMRRs. Right. We do have hardware that is relying on being able to do dma from devices to a system RAM via RMRR. -- Shuah ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 03/16] iommu/amd: Allocate data structures to keep track of irq remapping tables
On Fri, Sep 28, 2012 at 6:23 AM, Joerg Roedel joerg.roe...@amd.com wrote: To easily map device ids to interrupt remapping table entries a new lookup table is necessary. Signed-off-by: Joerg Roedel joerg.roe...@amd.com --- drivers/iommu/amd_iommu_init.c | 16 drivers/iommu/amd_iommu_types.h |9 + 2 files changed, 25 insertions(+) diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c index a046a0e..db100c5 100644 --- a/drivers/iommu/amd_iommu_init.c +++ b/drivers/iommu/amd_iommu_init.c @@ -181,6 +181,12 @@ u16 *amd_iommu_alias_table; struct amd_iommu **amd_iommu_rlookup_table; /* + * This table is used to find the irq remapping table for a given device id + * quickly. + */ +struct irq_remap_table **irq_lookup_table; + +/* * AMD IOMMU allows up to 2^16 differend protection different domains. This is a bitmap * to know which ones are already in use. */ @@ -1529,9 +1535,13 @@ static struct syscore_ops amd_iommu_syscore_ops = { static void __init free_on_init_error(void) { + free_pages((unsigned long)irq_lookup_table, + get_order(rlookup_table_size)); + if (amd_iommu_irq_cache) { kmem_cache_destroy(amd_iommu_irq_cache); amd_iommu_irq_cache = NULL; + } amd_iommu_uninit_devices(); @@ -1684,6 +1694,12 @@ static int __init early_amd_iommu_init(void) 0, NULL); if (!amd_iommu_irq_cache) goto out; + + irq_lookup_table = (void *)__get_free_pages( + GFP_KERNEL | __GFP_ZERO, + get_order(rlookup_table_size)); + if (!irq_lookup_table) + goto out; } ret = init_memory_definitions(ivrs_base); diff --git a/drivers/iommu/amd_iommu_types.h b/drivers/iommu/amd_iommu_types.h index 953cea8..1a7d480 100644 --- a/drivers/iommu/amd_iommu_types.h +++ b/drivers/iommu/amd_iommu_types.h @@ -175,6 +175,7 @@ #define DEV_ENTRY_EX0x67 #define DEV_ENTRY_SYSMGT1 0x68 #define DEV_ENTRY_SYSMGT2 0x69 +#define DEV_ENTRY_IRQ_TBL_EN 0x80 #define DEV_ENTRY_INIT_PASS 0xb8 #define DEV_ENTRY_EINT_PASS 0xb9 #define DEV_ENTRY_NMI_PASS 0xba @@ -337,6 +338,14 @@ extern bool amd_iommu_iotlb_sup; #define MAX_IRQS_PER_TABLE 256 #define IRQ_TABLE_ALIGNMENT128 +struct irq_remap_table { + spinlock_t lock; + unsigned min_index; + u32 *table; +}; + +extern struct irq_remap_table **irq_lookup_table; + /* Interrupt remapping feature used? */ extern bool amd_iommu_irq_remap; -- 1.7.9.5 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/ ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 14/16] iommu/irq: Use amd_iommu_irq_ops if supported
On Fri, Sep 28, 2012 at 6:24 AM, Joerg Roedel joerg.roe...@amd.com wrote: Finally enable interrupt remapping for AMD systems. Signed-off-by: Joerg Roedel joerg.roe...@amd.com --- drivers/iommu/irq_remapping.c |5 + 1 file changed, 5 insertions(+) diff --git a/drivers/iommu/irq_remapping.c b/drivers/iommu/irq_remapping.c index 151690d..faf85d6 100644 --- a/drivers/iommu/irq_remapping.c +++ b/drivers/iommu/irq_remapping.c @@ -51,6 +51,11 @@ early_param(intremap, setup_irqremap); void __init setup_irq_remapping_ops(void) { remap_ops = intel_irq_remap_ops; + +#ifdef CONFIG_AMD_IOMMU + if (amd_iommu_irq_ops.prepare() == 0) + remap_ops = amd_iommu_irq_ops; +#endif Should remap_ops be set to null when amd_iommu_irq_ops.prepare() fails?What happens if remap_ops left set to intel_irq_remap_ops? Should remap_ops = intel_irq_remap_ops; moved into #else case forifdef CONFIG_AMD_IOMMU? } int irq_remapping_supported(void) -- 1.7.9.5 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/ ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 12/16] iommu/amd: Add initialization routines for AMD interrupt remapping
On Mon, Oct 1, 2012 at 2:05 AM, Joerg Roedel joerg.roe...@amd.com wrote: On Fri, Sep 28, 2012 at 05:18:18PM -0600, Shuah Khan wrote: +void amd_iommu_disable(void) +{ + amd_iommu_suspend(); Is it safe to attempt to suspend when iommu is in suspend state? In other words what happens if amd_iommu_disable() gets called when iommu is already in suspend state? Yes, this is safe. It just trns a bit to 0 which is already 0. Good. +int amd_iommu_reenable(int mode) +{ + amd_iommu_resume(); Same question as above. Safe to do a resume, when in resumed state? Safe too, it just writes values to the hardware which are already there. This was also proven by testing. Thanks. Sounds good. -- Shuah ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 14/16] iommu/irq: Use amd_iommu_irq_ops if supported
On Mon, Oct 1, 2012 at 2:08 AM, Joerg Roedel joerg.roe...@amd.com wrote: On Fri, Sep 28, 2012 at 05:39:03PM -0600, Shuah Khan wrote: On Fri, Sep 28, 2012 at 6:24 AM, Joerg Roedel joerg.roe...@amd.com wrote: void __init setup_irq_remapping_ops(void) { remap_ops = intel_irq_remap_ops; + +#ifdef CONFIG_AMD_IOMMU + if (amd_iommu_irq_ops.prepare() == 0) + remap_ops = amd_iommu_irq_ops; +#endif Should remap_ops be set to null when amd_iommu_irq_ops.prepare() fails?What happens if remap_ops left set to intel_irq_remap_ops? Should remap_ops = intel_irq_remap_ops; moved into #else case forifdef CONFIG_AMD_IOMMU? Remap-Ops does not need to be set to NULL because irq_remapping_enabled will not get set to true then and remap_ops will not get called. The Intel path can't also be moved to #else because this would mean that kernels can only support eihter, Intel or AMD IOMMU. But Linux can support both in the same kernel. Thanks for the explanation. -- Shuah ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [RFC PATCH 1/5] amd_iommu: Split IOMMU group initialization
Alex, couple of comments in-lined: On Mon, Oct 8, 2012 at 10:49 PM, Alex Williamson alex.william...@redhat.com wrote: This needs to be broken apart, start with pulling all the IOMMU group init code into a new function. Signed-off-by: Alex Williamson alex.william...@redhat.com --- drivers/iommu/amd_iommu.c | 61 - 1 file changed, 43 insertions(+), 18 deletions(-) diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c index 55074cb..b65b377 100644 --- a/drivers/iommu/amd_iommu.c +++ b/drivers/iommu/amd_iommu.c @@ -276,39 +276,32 @@ static void swap_pci_ref(struct pci_dev **from, struct pci_dev *to) #define REQ_ACS_FLAGS (PCI_ACS_SV | PCI_ACS_RR | PCI_ACS_CR | PCI_ACS_UF) -static int iommu_init_device(struct device *dev) +static int init_iommu_group(struct device *dev) { - struct pci_dev *dma_pdev = NULL, *pdev = to_pci_dev(dev); struct iommu_dev_data *dev_data; struct iommu_group *group; - u16 alias; + struct pci_dev *dma_pdev = NULL; int ret; I don't see ret get initialized - if (dev-archdata.iommu) + group = iommu_group_get(dev); + if (group) { + iommu_group_put(group); return 0; + } dev_data = find_dev_data(get_device_id(dev)); if (!dev_data) return -ENOMEM; - alias = amd_iommu_alias_table[dev_data-devid]; - if (alias != dev_data-devid) { - struct iommu_dev_data *alias_data; - - alias_data = find_dev_data(alias); - if (alias_data == NULL) { - pr_err(AMD-Vi: Warning: Unhandled device %s\n, - dev_name(dev)); - free_dev_data(dev_data); - return -ENOTSUPP; - } - dev_data-alias_data = alias_data; + if (dev_data-alias_data) { + u16 alias; + alias = amd_iommu_alias_table[dev_data-devid]; dma_pdev = pci_get_bus_and_slot(alias 8, alias 0xff); } - if (dma_pdev == NULL) - dma_pdev = pci_dev_get(pdev); + if (!dma_pdev) + dma_pdev = pci_dev_get(to_pci_dev(dev)); /* Account for quirked devices */ swap_pci_ref(dma_pdev, pci_get_dma_source(dma_pdev)); @@ -358,6 +351,38 @@ root_bus: iommu_group_put(group); + return ret; Do you even need ret - can this be simply return 0; +} + +static int iommu_init_device(struct device *dev) +{ + struct pci_dev *pdev = to_pci_dev(dev); + struct iommu_dev_data *dev_data; + u16 alias; + int ret; + + if (dev-archdata.iommu) + return 0; + + dev_data = find_dev_data(get_device_id(dev)); + if (!dev_data) + return -ENOMEM; + + alias = amd_iommu_alias_table[dev_data-devid]; + if (alias != dev_data-devid) { + struct iommu_dev_data *alias_data; + + alias_data = find_dev_data(alias); + if (alias_data == NULL) { + pr_err(AMD-Vi: Warning: Unhandled device %s\n, + dev_name(dev)); + free_dev_data(dev_data); + return -ENOTSUPP; + } + dev_data-alias_data = alias_data; + } + + ret = init_iommu_group(dev); if (ret) return ret; -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/ ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH -stable] amd_iommu: attach device fails on the last pci device
amd_iommu_attach_device() checks if device id is within the limits of amd_iommu_last_bdf and instead checking if devid amd_iommu_last_bdf, it checks devid = amd_iommu_last_bdf. As a result the last device attach fails because amd_iommu_attach_device() returns an -EINVAL. This bug is in linux-2.6.32 and an equivalent fix in linux-2.6.33 and has been carried forward to later kernels and is in the upstream kernel. This equivalent fix includes restructuring and consolidating device checks into a routine check_device(). Instead of back-porting all of that work, spot-fixed the bug in amd_iommu_attach_device() for linux-2.6.32. Signed-off-by: Shuah Khan shuah.k...@hp.com CC: sta...@vger.kernel.org v2.6.32 --- arch/x86/kernel/amd_iommu.c |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/x86/kernel/amd_iommu.c b/arch/x86/kernel/amd_iommu.c index 3a44b75..67de7d7 100644 --- a/arch/x86/kernel/amd_iommu.c +++ b/arch/x86/kernel/amd_iommu.c @@ -2288,7 +2288,7 @@ static int amd_iommu_attach_device(struct iommu_domain *dom, devid = calc_devid(pdev-bus-number, pdev-devfn); - if (devid = amd_iommu_last_bdf || + if (devid amd_iommu_last_bdf || devid != amd_iommu_alias_table[devid]) return -EINVAL; -- 1.7.9.5 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v3] Prevent devices with RMRRs from being placed into SI Domain during startup
On Tue, 2012-10-16 at 16:50 +, Tom Mingarelli wrote: This patch is to prevent devices that have RMRRs associated with them from getting placed into the SI Domain during init. We don't put USB devices into this category, however. This fixes the issue where the RMRR info for devices being placed in and out of the SI Domain gets lost. Signed-off-by: Thomas Mingarelli thomas.mingare...@hp.com PATCH v1: https://lkml.org/lkml/2012/6/15/204 PATCH v2: https://lkml.org/lkml/2012/9/18/354 drivers/iommu/intel-iommu.c | 33 + 1 files changed, 33 insertions(+), 0 deletions(-) diff -up ./drivers/iommu/intel-iommu.c.ORIG ./drivers/iommu/intel-iommu.c --- ./drivers/iommu/intel-iommu.c.ORIG2012-10-16 09:34:23.148089944 -0500 +++ ./drivers/iommu/intel-iommu.c 2012-10-16 09:56:56.905932861 -0500 @@ -2320,8 +2320,41 @@ static int domain_add_dev_info(struct dm return 0; } +static bool device_has_rmrr(struct pci_dev *dev) +{ + struct dmar_rmrr_unit *rmrr; + int i; + + for_each_rmrr_units(rmrr) { + for (i = 0; i rmrr-devices_cnt; i++) { + /* + * Here we are just concerned with checking each device + * that has an RMRR associated with it and not allow it + * to be placed into the SI Domain during startup. + */ + if (rmrr-devices[i] == dev) + return true; + } + } + return false; +} + Will you use the same routine to deny device assignment request for devices with RMRR? Is that going to be another patch? static int iommu_should_identity_map(struct pci_dev *pdev, int startup) { + + if (startup) { + /* + * This is where we will refuse any device that has an + * RMRR associated with it and is not a USB device and + * NOT allow it to be placed into the SI Domain. We + * only do this on startup. We don't need a separate bit + * for this because it could be ANY device. + */ + if (device_has_rmrr(pdev) + (pdev-class 8) != PCI_CLASS_SERIAL_USB) + return 0; + } + Is there a reason to not group this with the other pci device checks below. Don't you need this done whenever iommu_should_identity_map() get called as opposed just at startup? if ((iommu_identity_mapping IDENTMAP_AZALIA) IS_AZALIA(pdev)) return 1; ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v3] Prevent devices with RMRRs from being placed into SI Domain during startup
On Tue, 2012-10-16 at 16:50 +, Tom Mingarelli wrote: This patch is to prevent devices that have RMRRs associated with them from getting placed into the SI Domain during init. We don't put USB devices into this category, however. This fixes the issue where the RMRR info for devices being placed in and out of the SI Domain gets lost. Signed-off-by: Thomas Mingarelli thomas.mingare...@hp.com PATCH v1: https://lkml.org/lkml/2012/6/15/204 PATCH v2: https://lkml.org/lkml/2012/9/18/354 drivers/iommu/intel-iommu.c | 33 + 1 files changed, 33 insertions(+), 0 deletions(-) diff -up ./drivers/iommu/intel-iommu.c.ORIG ./drivers/iommu/intel-iommu.c --- ./drivers/iommu/intel-iommu.c.ORIG2012-10-16 09:34:23.148089944 -0500 +++ ./drivers/iommu/intel-iommu.c 2012-10-16 09:56:56.905932861 -0500 @@ -2320,8 +2320,41 @@ static int domain_add_dev_info(struct dm return 0; } +static bool device_has_rmrr(struct pci_dev *dev) +{ + struct dmar_rmrr_unit *rmrr; + int i; + + for_each_rmrr_units(rmrr) { + for (i = 0; i rmrr-devices_cnt; i++) { + /* + * Here we are just concerned with checking each device + * that has an RMRR associated with it and not allow it + * to be placed into the SI Domain during startup. + */ + if (rmrr-devices[i] == dev) + return true; + } + } + return false; +} + static int iommu_should_identity_map(struct pci_dev *pdev, int startup) { + + if (startup) { + /* + * This is where we will refuse any device that has an + * RMRR associated with it and is not a USB device and + * NOT allow it to be placed into the SI Domain. We + * only do this on startup. We don't need a separate bit + * for this because it could be ANY device. + */ + if (device_has_rmrr(pdev) + (pdev-class 8) != PCI_CLASS_SERIAL_USB) Forgot to ask in my last response. Is it sufficient to check _USB. Are we missing any other devices that use RMRR that would qualify? + return 0; + } + if ((iommu_identity_mapping IDENTMAP_AZALIA) IS_AZALIA(pdev)) return 1; ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH -stable] amd_iommu: attach device fails on the last pci device
On Fri, 2012-10-12 at 12:35 -0700, Jonathan Nieder wrote: Shuah Khan wrote: On Fri, 2012-10-12 at 11:38 -0700, Jonathan Nieder wrote: To save Willy time: am I correct in guessing the upstream commit you are referring to is 98fc5a693bbdda498a556654c70d1e31a186c988 (x86/amd-iommu: Use get_device_id and check_device where appropriate, 2009-11-24)? Yes that is one. Great, thanks for the quick confirmation. Is this patch in the queue to be pulled into 2.6.32 tree? Thanks, -- Shuah ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH] iommu: moving initialization earlier
On Fri, Jan 11, 2013 at 7:50 AM, Joerg Roedel j...@8bytes.org wrote: On Thu, Jan 10, 2013 at 01:40:17PM -0700, Shuah Khan wrote: I am currently debugging IO_PAGE_FAULTS on 3.6.11 (happens on all pre-3.7 releases). I root-caused the reason 3.7 works is because in 3.7 amd iommu driver moving up the early iommu initialization from irq_remap_ops with the irq remapping feature. Have you investigated the reason for those IO_PAGE_FAULTS? I guess they come from the USB controlers and happen between the time the IOMMU is enabled and the USB controlers are taken over by the Linux kernel from the BIOS. But I don't see why this patch can have any impact on the IO_PAGE_FAULTS you are seeing. USB is one of them and SATA is another. I know with my back-port to split dma ops initialization and this patch Alexey sent, the problem goes away on 3.6.11 and I don't see the problem on 3.7. Same system, and two different boots. I will investigate do more investigation into the nature of these faults I am seeing and get back to you. Thanks, -- Shuah ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
IO_PAGE_FAULTs on unity mapped regions during amd_iommu_init() in Linux 3.4
Joerg, I am seeing IO_PAGE_FAULTs on AMD system running releases prior to 3.7. I focused my debug and testing on 3.4. I am hoping to find a solution for this problem in 3.4. I don't see any IO_PAGE_FAULTs with 3.7 and later releases on this system. On this system BIOS specifies Unity mapped (direct mapped) exclusion ranges in IVMDs for several devices. These regions are in use during BIOS hand-off to kernel and continue to be used during kernel boot and run-time. Access to these ranges continues to work with no errors until AMD IOMMU driver disables and re-enables IOMMU in enable_iommus(). These faults don't persist and appear between the enable_iommus() call and before amd_iommu_init() gets done printing AMD-Vi: Lazy IO/TLB flushing enabled message. Read requests from device 02:00.2 and write request from device 03:00.0 to these unity mapped regions fail. The reason appears to be because domain id is 0. Domain gets assigned in amd_iommu_init_dma_ops() and unity maps are handled. I don't see enable_iommus() doing anything to these unity mapped exclusion ranges. So I am assuming that is not the issue, however, could domain ids get flushed? More like, why do these faults show up in this window? These are direct mapped, so there is no need for any translations. Please see below for IVMD dump and IO_PAGE_FAULT analysis. Dump of these ranges from dmesg: [5.322280] AMD-Vi: IVMD_TYPE_ALL devid_start: 00:00.0 devid_end: 04:00.3 range_start: 000f range_end: 0010 flags: 7 [5.322367] AMD-Vi: IVMD_TYPE_ALL devid_start: 00:00.0 devid_end: 04:00.3 range_start: bff7 range_end: bfff flags: 7 [5.322454] AMD-Vi: IVMD_TYPE_ALL devid_start: 00:00.0 devid_end: 04:00.3 range_start: 000e8000 range_end: 000e9000 flags: 7 [5.322540] AMD-Vi: IVMD_TYPE_ALL devid_start: 00:00.0 devid_end: 04:00.3 range_start: bdffe000 range_end: be00 flags: 7 [5.322627] AMD-Vi: IVMD_TYPE_ALL devid_start: 00:00.0 devid_end: 04:00.3 range_start: bdff9000 range_end: bdffd000 flags: 7 [5.322714] AMD-Vi: IVMD_TYPE_ALL devid_start: 00:00.0 devid_end: 04:00.3 range_start: bdfe9000 range_end: bdff9000 flags: 7 Now to IO_PAGE_FAULT analysis: My observations in [ 15.281594] AMD-Vi: Event logged [IO_PAGE_FAULT device=02:00.2 domain=0x address=0xbdffe000 flags=0x0050] [ 15.281861] AMD-Vi: Event logged [IO_PAGE_FAULT device=02:00.2 domain=0x address=0xbdff9080 flags=0x0050] [ 15.281990] AMD-Vi: Event logged [IO_PAGE_FAULT device=02:00.2 domain=0x address=0xbdff9100 flags=0x0050] Domain ID is zero - PASID not valid flags=0x0050 - Bits PE and PR are set in the Event. TR: translation TR=0 TR is 0 that means it is a transaction request RZ: reserved bit RZ=0 Since PR is set RZ is meaningful, I/O page fault is due to an invalid level encoding PE: permission indicator PE=1 Device doesn't have permission for this transaction RW: read-write RW=0 RW is meaningful since PR=1, TR=0, and I=0. It is a Read transaction PR: Present PR=1 PR = 1 means transaction is to a page marked present I: interrupt I=0 transaction is a memory request US: user-supervisor US=0 Supervisor privileges were asserted. NX: no execute NX=0 0 upstream transaction lacks a PASID TLP prefix. Domain ID is zero. GN: guest/nested GN=0 Transaction used a nested address (GPA). [ 15.281733] AMD-Vi: Event logged [IO_PAGE_FAULT device=03:00.0 domain=0x address=0xbdff9160 flags=0x0070] Domain ID is zero - PASID is not valid flags=0x0070 - Bits PE, RW, and PR are set in the Event. TR: translation TR=0 TR is 0 that means it is a transaction request RZ: reserved bit RZ=0 Since PR is set RZ is meaningful, I/O page fault is due to an invalid level encoding PE: permission indicator PE=1 Device doesn't have permission for this transaction RW: read-write RW=1 RW is meaningful since PR=1, TR=0, and I=0. It is a Write transaction PR: Present PR=1 PR = 1 means transaction is to a page marked present I: interrupt I=0 transaction is a memory request US: user-supervisor US=0 Supervisor privileges were asserted. NX: no execute NX=0 0 upstream transaction lacks a PASID TLP prefix. Domain ID is zero. GN: guest/nested GN=0 Transaction used a nested address (GPA). Thanks, -- Shuah ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: IO_PAGE_FAULTs on unity mapped regions during amd_iommu_init() in Linux 3.4
On Tue, Feb 5, 2013 at 6:31 AM, Joerg Roedel j...@8bytes.org wrote: Hi Shuah, On Fri, Feb 01, 2013 at 11:31:59AM -0700, Shuah Khan wrote: Yes, 3.7 has the same window of opportunity for this race condition, however I couldn't figure out why it doesn't happen on 3.7. On 3.7 the window between amd_iommu_init_hardware() and amd_iommu_init_dma_ops() might actually be wider than the window in 3.4. I think this is highly timing related. IOMMU initialization may have been moved by a few milliseconds between the kernel versions which might cause the warnings to appear or disappear. I don't think it has much value to dive deeper into the differences between the initialization sequences. As somethimes with such issues there is a simple and a more complex fix for that. I'll try to come up with a simple fix for the next merge window and implement the clean and more complex one for the next one. Hi Joerg, Thanks much. I will hang on to this test system for testing your fix. -- Shuah ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: IO_PAGE_FAULTs on unity mapped regions during amd_iommu_init() in Linux 3.4
On Wed, 2013-02-06 at 13:12 +0100, Joerg Roedel wrote: On Tue, Feb 05, 2013 at 06:57:21AM -0700, Shuah Khan wrote: Thanks much. I will hang on to this test system for testing your fix. Okay, here is the simple fix for v3.8-rc6. I guess it is not straighforward to port it to v3.4, but it should be doable. From 2ecf57c85e67e0243b36b787d0490c0b47202ba8 Mon Sep 17 00:00:00 2001 From: Joerg Roedel j...@8bytes.org Date: Wed, 6 Feb 2013 12:55:23 +0100 Subject: [PATCH] iommu/amd: Initialize device table after dma_ops When dma_ops are initialized the unity mappings are created. The init_device_table_dma() function makes sure DMA from all devices is blocked by default. This opens a short window in time where DMA to unity mapped regions is blocked by the IOMMU. Make sure this does not happen by initializing the device table after dma_ops. Signed-off-by: Joerg Roedel j...@8bytes.org Joerg, I tested your patch on 3.8. I was able to reproduce the problem and then apply your patch to verify that the problem is fixed. This patch applies cleanly to 3.7.6, however I could not reproduce the problem on 3.7.6 without the patch. But the window exists on 3.7 as well. Your patch can be applied to 3.7.6 as is. I back-ported the patch to 3.4 and 3.0 and tested. I am sending those patches after this email. On 3.4.29 and 3.0.62 I was able to reproduce the problem and then applied the back-ported patch to verify that the problem is fixed. Thanks again for the fix. -- Shuah ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH] amd_iommu: IO_PAGE_FAULTs on unity mapped regions during amd_iommu_init()
When dma_ops are initialized the unity mappings are created. The init_device_table_dma() function makes sure DMA from all devices is blocked by default. This opens a short window in time where DMA to unity mapped regions is blocked by the IOMMU. Make sure this does not happen by initializing the device table after dma_ops. Signed-off-by: Shuah Khan shuah.k...@hp.com CC: sta...@vger.kernel.org 3.4 --- drivers/iommu/amd_iommu_init.c |9 +++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c index ef0ae93..b573f80 100644 --- a/drivers/iommu/amd_iommu_init.c +++ b/drivers/iommu/amd_iommu_init.c @@ -1572,8 +1572,6 @@ int __init amd_iommu_init_hardware(void) if (amd_iommu_pd_alloc_bitmap == NULL) goto free; - /* init the device table */ - init_device_table(); /* * let all alias entries point to itself @@ -1655,6 +1653,7 @@ out: */ static int __init amd_iommu_init(void) { + struct amd_iommu *iommu; int ret = 0; ret = amd_iommu_init_hardware(); @@ -1673,6 +1672,12 @@ static int __init amd_iommu_init(void) if (ret) goto free; + /* init the device table */ + init_device_table(); + + for_each_iommu(iommu) + iommu_flush_all_caches(iommu); + amd_iommu_init_api(); x86_platform.iommu_shutdown = disable_iommus; -- 1.7.9.5 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: IO_PAGE_FAULTs on unity mapped regions during amd_iommu_init() in Linux 3.4
On Mon, Feb 11, 2013 at 12:49 PM, Greg KH gre...@linuxfoundation.org wrote: On Wed, Feb 06, 2013 at 07:40:50PM -0700, Shuah Khan wrote: On Wed, 2013-02-06 at 13:12 +0100, Joerg Roedel wrote: On Tue, Feb 05, 2013 at 06:57:21AM -0700, Shuah Khan wrote: Thanks much. I will hang on to this test system for testing your fix. Okay, here is the simple fix for v3.8-rc6. I guess it is not straighforward to port it to v3.4, but it should be doable. From 2ecf57c85e67e0243b36b787d0490c0b47202ba8 Mon Sep 17 00:00:00 2001 From: Joerg Roedel j...@8bytes.org Date: Wed, 6 Feb 2013 12:55:23 +0100 Subject: [PATCH] iommu/amd: Initialize device table after dma_ops When dma_ops are initialized the unity mappings are created. The init_device_table_dma() function makes sure DMA from all devices is blocked by default. This opens a short window in time where DMA to unity mapped regions is blocked by the IOMMU. Make sure this does not happen by initializing the device table after dma_ops. Signed-off-by: Joerg Roedel j...@8bytes.org Joerg, I tested your patch on 3.8. I was able to reproduce the problem and then apply your patch to verify that the problem is fixed. This patch applies cleanly to 3.7.6, however I could not reproduce the problem on 3.7.6 without the patch. But the window exists on 3.7 as well. Your patch can be applied to 3.7.6 as is. I back-ported the patch to 3.4 and 3.0 and tested. I am sending those patches after this email. On 3.4.29 and 3.0.62 I was able to reproduce the problem and then applied the back-ported patch to verify that the problem is fixed. Thanks again for the fix. I'm lost here, why isn't this patch in Linus's tree already? You seem to be sending backports for something that isn't there to backport yet. Please resend these patches to sta...@vger.kernel.org, with the upstream git commit id, when they are in Linus's tree, there's nothing I can do with them for now, sorry. greg k-h Greg, I will resend the patch when I have the commit id from Linus's tree. Thanks, -- Shuah ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH 0/4] pci: Add PCI_BUS() and PCI_DEVID() interfaces to return bus number and device id
pci defines PCI_DEVFN(), PCI_SLOT(), and PCI_FUNC() interfaces, however, it doesn't have interfaces to return PCI bus and PCI device id. Drivers (AMD IOMMU, and AER) have module specific definitions for PCI_BUS() and AMD_IOMMU driver also has a module specific interface to calculate PCI device id from bus number and devfn. This patch set adds PCI_BUS(), and PCI_DEVID() to pci.h, changes are to use PCI_BUS() from pci. Changes AMD_IOMMU driver to use PCI_BUS() and PCI_DEVID() from pci. Files changed: include/uapi/linux/pci.h |4 1 file changed, 4 insertions(+) drivers/pci/pcie/aer/aerdrv_core.c |2 -- 1 file changed, 2 deletions(-) drivers/iommu/amd_iommu_types.h |4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) drivers/iommu/amd_iommu.c |2 +- drivers/iommu/amd_iommu_init.c |6 +++--- drivers/iommu/amd_iommu_types.h |7 --- 3 files changed, 4 insertions(+), 11 deletions(-) ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH 1/4] pci: Add PCI_BUS() and PCI_DEVID() interfaces to return bus number and device id
pci defines PCI_DEVFN(), PCI_SLOT(), and PCI_FUNC() interfaces, however, it doesn't have interfaces to return PCI bus and PCI device id. Drivers (AMD IOMMU, and AER) implement module specific definitions for PCI_BUS() and AMD_IOMMU driver also has a module specific interface to calculate PCI device id from bus number and devfn. Add PCI_BUS and PCI_DEVID interfaces to return PCI bus number and PCI device id respectively to avoid the need for duplicate definitions in other modules. AER driver code and AMD IOMMU driver define PCI_BUS. AMD IOMMU driver defines an interface to calculate device id from bus number, and devfn pair. Signed-off-by: Shuah Khan shuah.k...@hp.com --- include/uapi/linux/pci.h |4 1 file changed, 4 insertions(+) diff --git a/include/uapi/linux/pci.h b/include/uapi/linux/pci.h index 3c292bc0..6b2c8b3 100644 --- a/include/uapi/linux/pci.h +++ b/include/uapi/linux/pci.h @@ -30,6 +30,10 @@ #define PCI_DEVFN(slot, func) slot) 0x1f) 3) | ((func) 0x07)) #define PCI_SLOT(devfn)(((devfn) 3) 0x1f) #define PCI_FUNC(devfn)((devfn) 0x07) +#define PCI_DEVID(bus, devfn) u16)bus) 8) | devfn) + +/* return bus from PCI devid = ((u16)bus_number) 8) | devfn */ +#define PCI_BUS(x) (((x) 8) 0xff) /* Ioctls for /proc/bus/pci/X/Y nodes. */ #define PCIIOC_BASE('P' 24 | 'C' 16 | 'I' 8) -- 1.7.9.5 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH 3/4] amd_iommu: Remove local PCI_BUS() define and use PCI_BUS() from pci
Change to remove local PCI_BUS() define and use the new PCI_BUS interface from pci. Signed-off-by: Shuah Khan shuah.k...@hp.com --- drivers/iommu/amd_iommu_types.h |4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/drivers/iommu/amd_iommu_types.h b/drivers/iommu/amd_iommu_types.h index e38ab43..a07882f 100644 --- a/drivers/iommu/amd_iommu_types.h +++ b/drivers/iommu/amd_iommu_types.h @@ -24,6 +24,7 @@ #include linux/mutex.h #include linux/list.h #include linux/spinlock.h +#include linux/pci.h /* * Maximum number of IOMMUs supported @@ -315,9 +316,6 @@ #define MAX_DOMAIN_ID 65536 -/* FIXME: move this macro to linux/pci.h */ -#define PCI_BUS(x) (((x) 8) 0xff) - /* Protection domain flags */ #define PD_DMA_OPS_MASK(1UL 0) /* domain used for dma_ops */ #define PD_DEFAULT_MASK(1UL 1) /* domain is a default dma_ops -- 1.7.9.5 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH 4/4] amd_iommu: Remove calc_devid() and use PCI_DEVID() from pci
Change to remove calc_devid() and use PCI_DEVID() from pci instead. Signed-off-by: Shuah Khan shuah.k...@hp.com --- drivers/iommu/amd_iommu.c |2 +- drivers/iommu/amd_iommu_init.c |6 +++--- drivers/iommu/amd_iommu_types.h |7 --- 3 files changed, 4 insertions(+), 11 deletions(-) diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c index c1c74e0..38c1392 100644 --- a/drivers/iommu/amd_iommu.c +++ b/drivers/iommu/amd_iommu.c @@ -173,7 +173,7 @@ static inline u16 get_device_id(struct device *dev) { struct pci_dev *pdev = to_pci_dev(dev); - return calc_devid(pdev-bus-number, pdev-devfn); + return PCI_DEVID(pdev-bus-number, pdev-devfn); } static struct iommu_dev_data *get_dev_data(struct device *dev) diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c index faf10ba..582b5df 100644 --- a/drivers/iommu/amd_iommu_init.c +++ b/drivers/iommu/amd_iommu_init.c @@ -406,7 +406,7 @@ static int __init find_last_devid_on_pci(int bus, int dev, int fn, int cap_ptr) u32 cap; cap = read_pci_config(bus, dev, fn, cap_ptr+MMIO_RANGE_OFFSET); - update_last_devid(calc_devid(MMIO_GET_BUS(cap), MMIO_GET_LD(cap))); + update_last_devid(PCI_DEVID(MMIO_GET_BUS(cap), MMIO_GET_LD(cap))); return 0; } @@ -1128,9 +1128,9 @@ static int iommu_init_pci(struct amd_iommu *iommu) pci_read_config_dword(iommu-dev, cap_ptr + MMIO_MISC_OFFSET, misc); - iommu-first_device = calc_devid(MMIO_GET_BUS(range), + iommu-first_device = PCI_DEVID(MMIO_GET_BUS(range), MMIO_GET_FD(range)); - iommu-last_device = calc_devid(MMIO_GET_BUS(range), + iommu-last_device = PCI_DEVID(MMIO_GET_BUS(range), MMIO_GET_LD(range)); if (!(iommu-cap (1 IOMMU_CAP_IOTLB))) diff --git a/drivers/iommu/amd_iommu_types.h b/drivers/iommu/amd_iommu_types.h index a07882f..ec36cf6 100644 --- a/drivers/iommu/amd_iommu_types.h +++ b/drivers/iommu/amd_iommu_types.h @@ -701,13 +701,6 @@ extern int amd_iommu_max_glx_val; */ extern void iommu_flush_all_caches(struct amd_iommu *iommu); -/* takes bus and device/function and returns the device id - * FIXME: should that be in generic PCI code? */ -static inline u16 calc_devid(u8 bus, u8 devfn) -{ - return (((u16)bus) 8) | devfn; -} - static inline int get_ioapic_devid(int id) { struct devid_map *entry; -- 1.7.9.5 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 1/4] pci: Add PCI_BUS() and PCI_DEVID() interfaces to return bus number and device id
On Wed, 2013-02-20 at 18:19 -0700, Bjorn Helgaas wrote: On Mon, Feb 11, 2013 at 4:00 PM, Shuah Khan shuah.k...@hp.com wrote: pci defines PCI_DEVFN(), PCI_SLOT(), and PCI_FUNC() interfaces, however, it doesn't have interfaces to return PCI bus and PCI device id. Drivers (AMD IOMMU, and AER) implement module specific definitions for PCI_BUS() and AMD_IOMMU driver also has a module specific interface to calculate PCI device id from bus number and devfn. Add PCI_BUS and PCI_DEVID interfaces to return PCI bus number and PCI device id respectively to avoid the need for duplicate definitions in other modules. AER driver code and AMD IOMMU driver define PCI_BUS. AMD IOMMU driver defines an interface to calculate device id from bus number, and devfn pair. Signed-off-by: Shuah Khan shuah.k...@hp.com --- include/uapi/linux/pci.h |4 1 file changed, 4 insertions(+) diff --git a/include/uapi/linux/pci.h b/include/uapi/linux/pci.h index 3c292bc0..6b2c8b3 100644 --- a/include/uapi/linux/pci.h +++ b/include/uapi/linux/pci.h @@ -30,6 +30,10 @@ #define PCI_DEVFN(slot, func) slot) 0x1f) 3) | ((func) 0x07)) #define PCI_SLOT(devfn)(((devfn) 3) 0x1f) #define PCI_FUNC(devfn)((devfn) 0x07) +#define PCI_DEVID(bus, devfn) u16)bus) 8) | devfn) + +/* return bus from PCI devid = ((u16)bus_number) 8) | devfn */ +#define PCI_BUS(x) (((x) 8) 0xff) /* Ioctls for /proc/bus/pci/X/Y nodes. */ #define PCIIOC_BASE('P' 24 | 'C' 16 | 'I' 8) David, can you point me at a description of include/uapi ... what is there and why, and how we should decide what new things go in include/uapi/linux/pci.h as opposed to include/linux/pci.h? Maybe there should be something in Documentation/? I'm guessing it's something to do with being exported to userland, but I'm not sure the things in this patch (PCI_DEV_ID, PCI_BUS) are really exportable in the sense of being used for syscalls, etc. Bjorn,David, Looks like the following thread answers some of the questions about when this uapi export was done on the existing defines. https://lkml.org/lkml/2011/7/28/198 Sounds like the concern is that the older defines PCI_DEVFN, PCI_SLOT, PCI_FUNC, and PCI_DEVID could be exported, but not the new ones I added. I could find any discussion on whether these four older defines are exportable or the reasons for the export in the above thread. So the question is if uapi/linux.pci.h isn't the right place, do you have a recommendation on where they belong. The only alternative I can think of is include/linux/pci.h. It makes functional and logical sense to add the new defines to where the existing ones are defines. At least, not knowing the details of the change that moved PCI_DEVFN etc. to uapi/pci.h, that is my conclusion. -- Shuah ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 1/4] pci: Add PCI_BUS() and PCI_DEVID() interfaces to return bus number and device id
On Mon, 2013-02-25 at 14:23 -0700, Bjorn Helgaas wrote: On Mon, Feb 25, 2013 at 9:37 AM, Shuah Khan shuah.k...@hp.com wrote: On Wed, 2013-02-20 at 18:19 -0700, Bjorn Helgaas wrote: On Mon, Feb 11, 2013 at 4:00 PM, Shuah Khan shuah.k...@hp.com wrote: pci defines PCI_DEVFN(), PCI_SLOT(), and PCI_FUNC() interfaces, however, it doesn't have interfaces to return PCI bus and PCI device id. Drivers (AMD IOMMU, and AER) implement module specific definitions for PCI_BUS() and AMD_IOMMU driver also has a module specific interface to calculate PCI device id from bus number and devfn. Add PCI_BUS and PCI_DEVID interfaces to return PCI bus number and PCI device id respectively to avoid the need for duplicate definitions in other modules. AER driver code and AMD IOMMU driver define PCI_BUS. AMD IOMMU driver defines an interface to calculate device id from bus number, and devfn pair. Signed-off-by: Shuah Khan shuah.k...@hp.com --- include/uapi/linux/pci.h |4 1 file changed, 4 insertions(+) diff --git a/include/uapi/linux/pci.h b/include/uapi/linux/pci.h index 3c292bc0..6b2c8b3 100644 --- a/include/uapi/linux/pci.h +++ b/include/uapi/linux/pci.h @@ -30,6 +30,10 @@ #define PCI_DEVFN(slot, func) slot) 0x1f) 3) | ((func) 0x07)) #define PCI_SLOT(devfn)(((devfn) 3) 0x1f) #define PCI_FUNC(devfn)((devfn) 0x07) +#define PCI_DEVID(bus, devfn) u16)bus) 8) | devfn) + +/* return bus from PCI devid = ((u16)bus_number) 8) | devfn */ +#define PCI_BUS(x) (((x) 8) 0xff) /* Ioctls for /proc/bus/pci/X/Y nodes. */ #define PCIIOC_BASE('P' 24 | 'C' 16 | 'I' 8) David, can you point me at a description of include/uapi ... what is there and why, and how we should decide what new things go in include/uapi/linux/pci.h as opposed to include/linux/pci.h? Maybe there should be something in Documentation/? I'm guessing it's something to do with being exported to userland, but I'm not sure the things in this patch (PCI_DEV_ID, PCI_BUS) are really exportable in the sense of being used for syscalls, etc. Bjorn,David, Looks like the following thread answers some of the questions about when this uapi export was done on the existing defines. https://lkml.org/lkml/2011/7/28/198 Sounds like the concern is that the older defines PCI_DEVFN, PCI_SLOT, PCI_FUNC, and PCI_DEVID could be exported, but not the new ones I added. I could find any discussion on whether these four older defines are exportable or the reasons for the export in the above thread. I think David's disintegration script took include/linux/pci.h, left the #ifdef __KERNEL__ parts there, and moved everything else (which wasn't much) to include/uapi/linux/pci.h. It's obvious that the PCIIOC_ #defines need to be exported to user-space for ioctls. It's not obvious to me why PCI_DEVFN, PCI_SLOT, and PCI_FUNC need to be exported to user-space. But I can imagine user-space using functionality like that, even if it's not connected to a kernel interface. I assume the intent of the disintegration is that only include/uapi would be exposed to user-space, so keeping those definitions in include/linux/pci.h would break any user programs that used them. So the question is if uapi/linux.pci.h isn't the right place, do you have a recommendation on where they belong. The only alternative I can think of is include/linux/pci.h. It makes functional and logical sense to add the new defines to where the existing ones are defines. At least, not knowing the details of the change that moved PCI_DEVFN etc. to uapi/pci.h, that is my conclusion. Using the linux-fullhist tree, I found these: 059d367 Import 2.1.82 -- moved PCI_DEVFN outside #ifdef __KERNEL__ b039547 Import 2.1.76 -- PCI_DEVFN was inside #ifdef __KERNEL__ f6d9739 Import 2.1.68pre1 -- added #ifdef __KERNEL__ (enclosing PCI_DEVFN) 940649f Import 1.3.0 -- added PCI_DEVFN There's no indication of *why* PCI_DEVFN was exported, of course. Bottom line, I think it's reasonable to keep PCI_DEVFN, et al., in uapi/linux/pci.h to keep from breaking user-programs, even though if we were adding them today we would probably put them in the kernel-only linux/pci.h. For the new ones you're adding, I'd propose putting them in the kernel-only linux/pci.h because we know no user programs use them. Yeah. These have been in uapi long enough to continue to keep them there. :) It's not nice and consistent, but it does follow the simple rule of don't expose things to user-space unnecessarily. We might want to add a comment to keep somebody from cleaning it up later. ok. Will resend patches adding the new defines to linux/pci.h and renaming PCI_BUS() to PCI_BUS_NR() or PCI_BUS_NUM() like you suggested. Thanks, -- Shuah ___ iommu mailing
[PATCH 3.0] iommu/amd: Initialize device table after dma_ops
When dma_ops are initialized the unity mappings are created. The init_device_table_dma() function makes sure DMA from all devices is blocked by default. This opens a short window in time where DMA to unity mapped regions is blocked by the IOMMU. Make sure this does not happen by initializing the device table after dma_ops. Back-port upstream commit: f528d980c17b8714aedc918ba86e058af914d66b Signed-off-by: Joerg Roedel j...@8bytes.org Signed-off-by: Shuah Khan shuah.k...@hp.com CC: sta...@vger.kernel.org 3.0 --- arch/x86/kernel/amd_iommu_init.c | 10 +++--- 1 file changed, 7 insertions(+), 3 deletions(-) diff --git a/arch/x86/kernel/amd_iommu_init.c b/arch/x86/kernel/amd_iommu_init.c index 33df6e8..d86aa3f 100644 --- a/arch/x86/kernel/amd_iommu_init.c +++ b/arch/x86/kernel/amd_iommu_init.c @@ -1363,6 +1363,7 @@ static struct syscore_ops amd_iommu_syscore_ops = { */ static int __init amd_iommu_init(void) { + struct amd_iommu *iommu; int i, ret = 0; /* @@ -1411,9 +1412,6 @@ static int __init amd_iommu_init(void) if (amd_iommu_pd_alloc_bitmap == NULL) goto free; - /* init the device table */ - init_device_table(); - /* * let all alias entries point to itself */ @@ -1463,6 +1461,12 @@ static int __init amd_iommu_init(void) if (ret) goto free_disable; + /* init the device table */ + init_device_table(); + + for_each_iommu(iommu) + iommu_flush_all_caches(iommu); + amd_iommu_init_api(); amd_iommu_init_notifier(); -- 1.7.9.5 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v2 1/4] pci: Add PCI_BUS_NUM() and PCI_DEVID() interfaces to return bus number and device id
pci defines PCI_DEVFN(), PCI_SLOT(), and PCI_FUNC() interfaces, however, it doesn't have interfaces to return PCI bus and PCI device id. Drivers (AMD IOMMU, and AER) implement module specific definitions for PCI_BUS() and AMD_IOMMU driver also has a module specific interface to calculate PCI device id from bus number and devfn. Add PCI_BUS_NUM and PCI_DEVID interfaces to return PCI bus number and PCI device id respectively to avoid the need for duplicate definitions in other modules. AER driver code and AMD IOMMU driver define PCI_BUS. AMD IOMMU driver defines an interface to calculate device id from bus number, and devfn pair. PCI_DEVFN(), PCI_SLOT(), and PCI_FUNC() interfaces are exported to user-space via uapi/linux/pci.h. However, in the interest to keep the new interfaces as kernel only and not export them to user-space unnecessarily, added them to linux/pci.h instead. Signed-off-by: Shuah Khan shuah.k...@hp.com --- include/linux/pci.h | 15 +++ 1 file changed, 15 insertions(+) diff --git a/include/linux/pci.h b/include/linux/pci.h index 15472d6..95c105a 100644 --- a/include/linux/pci.h +++ b/include/linux/pci.h @@ -35,6 +35,21 @@ /* Include the ID list */ #include linux/pci_ids.h +/* + * The PCI interface treats multi-function devices as independent + * devices. The slot/function address of each device is encoded + * in a single byte as follows: + * + * 7:3 = slot + * 2:0 = function + * PCI_DEVFN(), PCI_SLOT(), and PCI_FUNC() are defined uapi/linux/pci.h + * In the interest of not exposing interfaces to user-space unnecessarily, + * the following kernel only defines are being added here. + */ +#define PCI_DEVID(bus, devfn) u16)bus) 8) | devfn) +/* return bus from PCI devid = ((u16)bus_number) 8) | devfn */ +#define PCI_BUS_NUM(x) (((x) 8) 0xff) + /* pci_slot represents a physical slot */ struct pci_slot { struct pci_bus *bus;/* The bus this slot is on */ -- 1.7.9.5 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v2 2/4] pci/aer: Remove local PCI_BUS() define and use PCI_BUS_NUM() from pci
Change to remove local PCI_BUS() define and use the new PCI_BUS_NUM() interface from pci. Signed-off-by: Shuah Khan shuah.k...@hp.com --- drivers/pci/pcie/aer/aerdrv_core.c |4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/drivers/pci/pcie/aer/aerdrv_core.c b/drivers/pci/pcie/aer/aerdrv_core.c index 564d97f..8ec8b4f 100644 --- a/drivers/pci/pcie/aer/aerdrv_core.c +++ b/drivers/pci/pcie/aer/aerdrv_core.c @@ -89,8 +89,6 @@ static int add_error_device(struct aer_err_info *e_info, struct pci_dev *dev) return -ENOSPC; } -#definePCI_BUS(x) (((x) 8) 0xff) - /** * is_error_source - check whether the device is source of reported error * @dev: pointer to pci_dev to be checked @@ -106,7 +104,7 @@ static bool is_error_source(struct pci_dev *dev, struct aer_err_info *e_info) * When bus id is equal to 0, it might be a bad id * reported by root port. */ - if (!nosourceid (PCI_BUS(e_info-id) != 0)) { + if (!nosourceid (PCI_BUS_NUM(e_info-id) != 0)) { /* Device ID match? */ if (e_info-id == ((dev-bus-number 8) | dev-devfn)) return true; -- 1.7.9.5 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v2 4/4] iommu/amd: Remove calc_devid() and use PCI_DEVID() from pci
Change to remove calc_devid() and use PCI_DEVID() from pci instead. Signed-off-by: Shuah Khan shuah.k...@hp.com --- drivers/iommu/amd_iommu.c |2 +- drivers/iommu/amd_iommu_init.c |6 +++--- drivers/iommu/amd_iommu_types.h |7 --- 3 files changed, 4 insertions(+), 11 deletions(-) diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c index c1c74e0..38c1392 100644 --- a/drivers/iommu/amd_iommu.c +++ b/drivers/iommu/amd_iommu.c @@ -173,7 +173,7 @@ static inline u16 get_device_id(struct device *dev) { struct pci_dev *pdev = to_pci_dev(dev); - return calc_devid(pdev-bus-number, pdev-devfn); + return PCI_DEVID(pdev-bus-number, pdev-devfn); } static struct iommu_dev_data *get_dev_data(struct device *dev) diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c index faf10ba..582b5df 100644 --- a/drivers/iommu/amd_iommu_init.c +++ b/drivers/iommu/amd_iommu_init.c @@ -406,7 +406,7 @@ static int __init find_last_devid_on_pci(int bus, int dev, int fn, int cap_ptr) u32 cap; cap = read_pci_config(bus, dev, fn, cap_ptr+MMIO_RANGE_OFFSET); - update_last_devid(calc_devid(MMIO_GET_BUS(cap), MMIO_GET_LD(cap))); + update_last_devid(PCI_DEVID(MMIO_GET_BUS(cap), MMIO_GET_LD(cap))); return 0; } @@ -1128,9 +1128,9 @@ static int iommu_init_pci(struct amd_iommu *iommu) pci_read_config_dword(iommu-dev, cap_ptr + MMIO_MISC_OFFSET, misc); - iommu-first_device = calc_devid(MMIO_GET_BUS(range), + iommu-first_device = PCI_DEVID(MMIO_GET_BUS(range), MMIO_GET_FD(range)); - iommu-last_device = calc_devid(MMIO_GET_BUS(range), + iommu-last_device = PCI_DEVID(MMIO_GET_BUS(range), MMIO_GET_LD(range)); if (!(iommu-cap (1 IOMMU_CAP_IOTLB))) diff --git a/drivers/iommu/amd_iommu_types.h b/drivers/iommu/amd_iommu_types.h index a07882f..ec36cf6 100644 --- a/drivers/iommu/amd_iommu_types.h +++ b/drivers/iommu/amd_iommu_types.h @@ -701,13 +701,6 @@ extern int amd_iommu_max_glx_val; */ extern void iommu_flush_all_caches(struct amd_iommu *iommu); -/* takes bus and device/function and returns the device id - * FIXME: should that be in generic PCI code? */ -static inline u16 calc_devid(u8 bus, u8 devfn) -{ - return (((u16)bus) 8) | devfn; -} - static inline int get_ioapic_devid(int id) { struct devid_map *entry; -- 1.7.9.5 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH 3.4] iommu/amd: Initialize device table after dma_ops
When dma_ops are initialized the unity mappings are created. The init_device_table_dma() function makes sure DMA from all devices is blocked by default. This opens a short window in time where DMA to unity mapped regions is blocked by the IOMMU. Make sure this does not happen by initializing the device table after dma_ops. Back-port upstream commit: f528d980c17b8714aedc918ba86e058af914d66b Signed-off-by: Joerg Roedel j...@8bytes.org Signed-off-by: Shuah Khan shuah.k...@hp.com CC: sta...@vger.kernel.org 3.4 --- drivers/iommu/amd_iommu_init.c |9 +++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c index ef0ae93..b573f80 100644 --- a/drivers/iommu/amd_iommu_init.c +++ b/drivers/iommu/amd_iommu_init.c @@ -1572,8 +1572,6 @@ int __init amd_iommu_init_hardware(void) if (amd_iommu_pd_alloc_bitmap == NULL) goto free; - /* init the device table */ - init_device_table(); /* * let all alias entries point to itself @@ -1655,6 +1653,7 @@ out: */ static int __init amd_iommu_init(void) { + struct amd_iommu *iommu; int ret = 0; ret = amd_iommu_init_hardware(); @@ -1673,6 +1672,12 @@ static int __init amd_iommu_init(void) if (ret) goto free; + /* init the device table */ + init_device_table(); + + for_each_iommu(iommu) + iommu_flush_all_caches(iommu); + amd_iommu_init_api(); x86_platform.iommu_shutdown = disable_iommus; -- 1.7.9.5 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 1/4] pci: Add PCI_BUS() and PCI_DEVID() interfaces to return bus number and device id
On Mon, 2013-02-25 at 14:53 -0700, Shuah Khan wrote: On Mon, 2013-02-25 at 14:23 -0700, Bjorn Helgaas wrote: On Mon, Feb 25, 2013 at 9:37 AM, Shuah Khan shuah.k...@hp.com wrote: On Wed, 2013-02-20 at 18:19 -0700, Bjorn Helgaas wrote: On Mon, Feb 11, 2013 at 4:00 PM, Shuah Khan shuah.k...@hp.com wrote: It's not nice and consistent, but it does follow the simple rule of don't expose things to user-space unnecessarily. We might want to add a comment to keep somebody from cleaning it up later. ok. Will resend patches adding the new defines to linux/pci.h and renaming PCI_BUS() to PCI_BUS_NR() or PCI_BUS_NUM() like you suggested. Thanks, -- Shuah Bjorn/Joerg, I added PCI_BUS_NUM() amd PCI_DEVID() to linux/pci.h. Please note that changing PCI_BUS() to PCI_BUS_NUM() required additional changes to AMD IOMMU source files and aer driver. Essentially in addition to removing local PCI_BUS() define, PCI_BUS() usages are changed to PCI_BUS_NUM(). I am resending the patches. Thanks, -- Shuah ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 3.4] iommu/amd: Initialize device table after dma_ops
On Tue, 2013-02-26 at 16:49 -0700, Shuah Khan wrote: When dma_ops are initialized the unity mappings are created. The init_device_table_dma() function makes sure DMA from all devices is blocked by default. This opens a short window in time where DMA to unity mapped regions is blocked by the IOMMU. Make sure this does not happen by initializing the device table after dma_ops. Back-port upstream commit: f528d980c17b8714aedc918ba86e058af914d66b Signed-off-by: Joerg Roedel j...@8bytes.org Signed-off-by: Shuah Khan shuah.k...@hp.com CC: sta...@vger.kernel.org 3.4 Please consider this for 3.5 as well. -- Shuah --- drivers/iommu/amd_iommu_init.c |9 +++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c index ef0ae93..b573f80 100644 --- a/drivers/iommu/amd_iommu_init.c +++ b/drivers/iommu/amd_iommu_init.c @@ -1572,8 +1572,6 @@ int __init amd_iommu_init_hardware(void) if (amd_iommu_pd_alloc_bitmap == NULL) goto free; - /* init the device table */ - init_device_table(); /* * let all alias entries point to itself @@ -1655,6 +1653,7 @@ out: */ static int __init amd_iommu_init(void) { + struct amd_iommu *iommu; int ret = 0; ret = amd_iommu_init_hardware(); @@ -1673,6 +1672,12 @@ static int __init amd_iommu_init(void) if (ret) goto free; + /* init the device table */ + init_device_table(); + + for_each_iommu(iommu) + iommu_flush_all_caches(iommu); + amd_iommu_init_api(); x86_platform.iommu_shutdown = disable_iommus; ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 3.4] iommu/amd: Initialize device table after dma_ops
On Thu, 2013-02-28 at 12:52 -0700, Shuah Khan wrote: On Tue, 2013-02-26 at 16:49 -0700, Shuah Khan wrote: When dma_ops are initialized the unity mappings are created. The init_device_table_dma() function makes sure DMA from all devices is blocked by default. This opens a short window in time where DMA to unity mapped regions is blocked by the IOMMU. Make sure this does not happen by initializing the device table after dma_ops. Back-port upstream commit: f528d980c17b8714aedc918ba86e058af914d66b Signed-off-by: Joerg Roedel j...@8bytes.org Signed-off-by: Shuah Khan shuah.k...@hp.com CC: sta...@vger.kernel.org 3.4 Please consider this for 3.5 as well. -- Shuah Please consider this for 3.5. Patch applies and fix works on 3.5.7. My previous reply to Herton Ronaldo Krzesinski bounced. Including Luis Henriques on this one. -- Shuah ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH 3.2] iommu/amd: Initialize device table after dma_ops
When dma_ops are initialized the unity mappings are created. The init_device_table_dma() function makes sure DMA from all devices is blocked by default. This opens a short window in time where DMA to unity mapped regions is blocked by the IOMMU. Make sure this does not happen by initializing the device table after dma_ops. Back-port upstream commit: f528d980c17b8714aedc918ba86e058af914d66b Tested on 3.2.38 Signed-off-by: Joerg Roedel j...@8bytes.org Signed-off-by: Shuah Khan shuah.k...@hp.com CC: sta...@vger.kernel.org 3.2 --- drivers/iommu/amd_iommu_init.c | 10 +++--- 1 file changed, 7 insertions(+), 3 deletions(-) diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c index 62a4d5c..b7d1cdd 100644 --- a/drivers/iommu/amd_iommu_init.c +++ b/drivers/iommu/amd_iommu_init.c @@ -1396,6 +1396,7 @@ static struct syscore_ops amd_iommu_syscore_ops = { */ static int __init amd_iommu_init(void) { + struct amd_iommu *iommu; int i, ret = 0; /* @@ -1444,9 +1445,6 @@ static int __init amd_iommu_init(void) if (amd_iommu_pd_alloc_bitmap == NULL) goto free; - /* init the device table */ - init_device_table(); - /* * let all alias entries point to itself */ @@ -1496,6 +1494,12 @@ static int __init amd_iommu_init(void) if (ret) goto free_disable; + /* init the device table */ + init_device_table(); + + for_each_iommu(iommu) + iommu_flush_all_caches(iommu); + amd_iommu_init_api(); amd_iommu_init_notifier(); -- 1.7.10.4 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 8/9] iommu/amd: Don't report firmware bugs with cmd-line ivrs overrides
On Tue, Apr 9, 2013 at 2:13 PM, Joerg Roedel j...@8bytes.org wrote: When the IVRS entries for IOAPIC and HPET are overridden on the kernel command line, a problem detected in the check function might not be a firmware bug anymore. So disable the firmware bug reporting if the user provided valid ivrs_ioapic or ivrs_hpet entries on the command line. Signed-off-by: Joerg Roedel j...@8bytes.org --- drivers/iommu/amd_iommu_init.c | 19 --- 1 file changed, 16 insertions(+), 3 deletions(-) diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c index 030d6ab..9767941 100644 --- a/drivers/iommu/amd_iommu_init.c +++ b/drivers/iommu/amd_iommu_init.c @@ -219,6 +219,7 @@ static struct devid_map __initdata early_ioapic_map[EARLY_MAP_SIZE]; static struct devid_map __initdata early_hpet_map[EARLY_MAP_SIZE]; static int __initdata early_ioapic_map_size; static int __initdata early_hpet_map_size; +static bool __initdata cmdline_maps; static enum iommu_init_state init_state = IOMMU_START_STATE; @@ -1686,18 +1687,28 @@ static void __init free_on_init_error(void) static bool __init check_ioapic_information(void) { + const char *fw_bug = FW_BUG; bool ret, has_sb_ioapic; int idx; has_sb_ioapic = false; ret = false; + /* +* If we have map overrides on the kernel command line the +* messages in this function might not describe firmware bugs +* anymore - so be careful +*/ + if (cmdline_maps) + fw_bug = ; + for (idx = 0; idx nr_ioapics; idx++) { int devid, id = mpc_ioapic_id(idx); devid = get_ioapic_devid(id); if (devid 0) { - pr_err(FW_BUG AMD-Vi: IOAPIC[%d] not in IVRS table\n, id); + pr_err(%sAMD-Vi: IOAPIC[%d] not in IVRS table\n, + fw_bug, id); ret = false; } else if (devid == IOAPIC_SB_DEVID) { has_sb_ioapic = true; @@ -1714,11 +1725,11 @@ static bool __init check_ioapic_information(void) * when the BIOS is buggy and provides us the wrong * device id for the IOAPIC in the system. */ - pr_err(FW_BUG AMD-Vi: No southbridge IOAPIC found in IVRS table\n); + pr_err(%sAMD-Vi: No southbridge IOAPIC found\n, fw_bug); } if (!ret) - pr_err(AMD-Vi: Disabling interrupt remapping due to BIOS Bug(s)\n); + pr_err(AMD-Vi: Disabling interrupt remapping\n); Can this made conditional on cmdline_maps and keep the original message in the case of a real BIOS bug? That way it is explicit and easier to debug. return ret; } @@ -2166,6 +2177,7 @@ static int __init parse_ivrs_ioapic(char *str) devid = ((bus 0xff) 8) | ((dev 0x1f) 3) | (fn 0x7); + cmdline_maps= true; i = early_ioapic_map_size++; early_ioapic_map[i].id = id; early_ioapic_map[i].devid = devid; @@ -2195,6 +2207,7 @@ static int __init parse_ivrs_hpet(char *str) devid = ((bus 0xff) 8) | ((dev 0x1f) 3) | (fn 0x7); + cmdline_maps= true; i = early_hpet_map_size++; early_hpet_map[i].id= id; early_hpet_map[i].devid = devid; -- 1.7.9.5 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/ ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 1/9] iommu/amd: Remove map_sg_no_iommu()
On Tue, Apr 9, 2013 at 2:12 PM, Joerg Roedel j...@8bytes.org wrote: This function was intended as a fall-back if the map_sg function is called for a device not mapped by the IOMMU. Since the AMD IOMMU driver uses per-device dma_ops this can never happen. So this function isn't needed anymore. Signed-off-by: Joerg Roedel j...@8bytes.org --- Reviewed-by: Shuah Khan shuahk...@gmail.com ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 2/9] iommu/amd: Use AMD specific data structure for irq remapping
On Tue, Apr 9, 2013 at 2:12 PM, Joerg Roedel j...@8bytes.org wrote: For compatibility reasons the irq remapping code for the AMD IOMMU used the same per-irq data structure as the Intel implementation. Now that support for the AMD specific data structure is upstream we can use this one instead. Signed-off-by: Joerg Roedel j...@8bytes.org --- Reviewed-by: Shuah Khan shuahk...@gmail.com ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 0/9] AMD IOMMU cleanups, fixes and IVRS bug workarounds
On Tue, Apr 9, 2013 at 2:12 PM, Joerg Roedel j...@8bytes.org wrote: Hi, this patch-set contains some cleanups and patches for the AMD IOMM driver. The most important part is a workaround that can be used to get interrupt remapping working even the the IVRS table provided by the BIOS is broken. Joerg Joerg Roedel (9): iommu/amd: Remove map_sg_no_iommu() iommu/amd: Use AMD specific data structure for irq remapping iommu/amd: Properly initialize irq-table lock iommu/amd: Move add_special_device() to __init iommu/amd: Extend IVRS special device data structure iommu/amd: Add early maps for ioapic and hpet iommu/amd: Add ioapic and hpet ivrs override iommu/amd: Don't report firmware bugs with cmd-line ivrs overrides iommu/amd: Document ivrs_ioapic and ivrs_hpet parameters Documentation/kernel-parameters.txt | 14 drivers/iommu/amd_iommu.c | 79 +++--- drivers/iommu/amd_iommu_init.c | 151 +++ drivers/iommu/amd_iommu_types.h |1 + 4 files changed, 182 insertions(+), 63 deletions(-) Joerg, Reviewed all the patches in this set. No longer have access to test machine. :( Reviewed-by: Shuah Khan shuahk...@gmail.com -- Shuah ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 2/2 V5] iommu/amd: Add logic to decode AMD IOMMU event flag
== 0) { + if (err_type ARRAY_SIZE(_invalid_trnslt_desc)) + pr_cont(%s\n, + _invalid_trnslt_desc[err_type]); + } else { + if (err_type ARRAY_SIZE(_invalid_trnsac_desc)) + pr_cont(%s\n, + _invalid_trnsac_desc[err_type]); + } + } +} + +static void dump_flags(int flags, int ev_type) +{ + struct _event_log_flags *p = (struct _event_log_flags *) flags; + u32 err_type = p-type; + + pr_cont( flags:%s %s %s %s %s %s %s %s %s, + (p-gn ? G : N), + (p-nx ? Nx : Ex), + (p-us ? Usr : Sup), + (p-i ? I : M), + (p-pr ? P : NP), + (p-rw ? W : R), + (p-pe ? N-Pm : Pm), + (p-rz ? Rsv : Ill), + (p-tr ? Tl : Ta)); + Good feature. Do you also plan to add decode logic for these flags. For example, RZ is only meaningful when PR=1, RW is only meaningful when PR=1, TR=0, and I=0, and so on? This additional logic will be useful. Reviewed-by: Shuah Khan shuahk...@gmail.com -- Shuah + /* Error type only needed for certain events */ + if (!amd_iommu_verbose) { + if ((ev_type == EVENT_TYPE_DEV_TAB_ERR) || + (ev_type == EVENT_TYPE_PAGE_TAB_ERR) || + (ev_type == EVENT_TYPE_CMD_HARD_ERR) || + (ev_type == EVENT_TYPE_INV_DEV_REQ)) + pr_cont( type(0x%x)]\n, err_type); + } else { + pr_cont(]\n); + dump_detail_error(p, ev_type); + } +} + static void dump_dte_entry(u16 devid) { int i; + pr_err(AMD-Vi: DTE[0..3]:); for (i = 0; i 4; ++i) - pr_err(AMD-Vi: DTE[%d]: %016llx\n, i, - amd_iommu_dev_table[devid].data[i]); + pr_cont( %016llx, amd_iommu_dev_table[devid].data[i]); + pr_cont(\n); } static void dump_command(unsigned long phys_addr) @@ -619,81 +724,96 @@ static void dump_command(unsigned long phys_addr) pr_err(AMD-Vi: CMD[%d]: %08x\n, i, cmd-data[i]); } -static void iommu_print_event(struct amd_iommu *iommu, void *__evt) +void amd_iommu_print_event(int type, int devid, int domid, + int flags, u64 address) { - int type, devid, domid, flags; - volatile u32 *event = __evt; - int count = 0; - u64 address; - -retry: - type= (event[1] EVENT_TYPE_SHIFT) EVENT_TYPE_MASK; - devid = (event[0] EVENT_DEVID_SHIFT) EVENT_DEVID_MASK; - domid = (event[1] EVENT_DOMID_SHIFT) EVENT_DOMID_MASK; - flags = (event[1] EVENT_FLAGS_SHIFT) EVENT_FLAGS_MASK; - address = (u64)(((u64)event[3]) 32) | event[2]; - - if (type == 0) { - /* Did we hit the erratum? */ - if (++count == LOOP_TIMEOUT) { - pr_err(AMD-Vi: No event written to event log\n); - return; - } - udelay(1); - goto retry; - } - - printk(KERN_ERR AMD-Vi: Event logged [); + pr_err(AMD-Vi: Event logged [); switch (type) { case EVENT_TYPE_ILL_DEV: - printk(ILLEGAL_DEV_TABLE_ENTRY device=%02x:%02x.%x - address=0x%016llx flags=0x%04x]\n, + pr_cont(ILLEGAL_DEV_TABLE_ENTRY device=%02x:%02x.%x + address=0x%016llx, PCI_BUS(devid), PCI_SLOT(devid), PCI_FUNC(devid), - address, flags); + address); + dump_flags(flags, type); dump_dte_entry(devid); break; case EVENT_TYPE_IO_FAULT: - printk(IO_PAGE_FAULT device=%02x:%02x.%x - domain=0x%04x address=0x%016llx flags=0x%04x]\n, + pr_cont(IO_PAGE_FAULT device=%02x:%02x.%x + domain=0x%04x address=0x%016llx, PCI_BUS(devid), PCI_SLOT(devid), PCI_FUNC(devid), - domid, address, flags); + domid, address); + dump_flags(flags, type); + dump_dte_entry(devid); break; case EVENT_TYPE_DEV_TAB_ERR: - printk(DEV_TAB_HARDWARE_ERROR device=%02x:%02x.%x - address=0x%016llx flags=0x%04x]\n, + pr_cont(DEV_TAB_HARDWARE_ERROR device=%02x:%02x.%x + address=0x%016llx, PCI_BUS(devid), PCI_SLOT(devid), PCI_FUNC(devid), - address, flags); + address); + dump_flags(flags, type); break; case
Re: [PATCH 2/2 V5] iommu/amd: Add logic to decode AMD IOMMU event flag
On Wed, Apr 10, 2013 at 10:27 AM, Suravee Suthikulanit suravee.suthikulpa...@amd.com wrote: On 4/10/2013 11:21 AM, Shuah Khan wrote: Good feature. Do you also plan to add decode logic for these flags. For example, RZ is only meaningful when PR=1, RW is only meaningful when PR=1, TR=0, and I=0, and so on? This additional logic will be useful. Reviewed-by: Shuah Khanshuahk...@gmail.com -- Shuah Additional filtering logic can also be added in the future. This will also be important if we are planning on handling IOMMU errors. Correct. Additional logic isn't necessary in this patch. -- Shuah ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 0/9] AMD IOMMU cleanups, fixes and IVRS bug workarounds
On Sat, Apr 13, 2013 at 9:26 AM, Joerg Roedel j...@8bytes.org wrote: On Sat, Apr 13, 2013 at 11:06:22PM +0800, Andrew Cooks wrote: On Fri, Apr 12, 2013 at 4:06 PM, Joerg Roedel j...@8bytes.org wrote: Oh, that's sad. You were the only one having a machine wich actually has unity-mapped ranges defined in the BIOS table. The code for those mappings was basically untested before you ran it on that machine. What is the machine in question? Maybe someone else has access to one, if it's not too exotic. Shuah had access to a HP server machine (don't know which one) that defined unity-map ranges in the BIOS table. Shuah certainly knows the details about that machine. Joerg/Andrew, It is a DL385 Gen8 server. Unfortunately I left HP as of yesterday and no longer have access to the system. Maybe there others that have access to one. -- Shuah ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 1/2] iommu: Add iommu_fault class event to iommu trace
On 09/24/2013 04:48 AM, Joerg Roedel wrote: On Fri, Aug 16, 2013 at 11:20:16AM -0600, Shuah Khan wrote: +DEFINE_EVENT(iommu_fault, report_fault, + + TP_PROTO(struct device *dev, unsigned long iova, int flags), + + TP_ARGS(dev, iova, flags) +); I am not entirely happy with the naming. It is better to name the class iommu_error and the page-fault event io_page_fault. This makes it more clear what kind of fault it is. Joerg Joerg, Yes, iommu_error and io_page_fault sounds better. I will rename and send the updated patch. -- Shuah -- Shuah Khan Senior Linux Kernel Developer - Open Source Group Samsung Research America(Silicon Valley) shuah...@samsung.com | (970) 672-0658 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v4 0/2] iommu: Add iommu_error class event to iommu trace
This patch set adds iommu_error class event to iommu trace. iommu_error class event can be enabled to trigger when an iommu error occurs. This trace event is intended to be called to report the error information. Trace information includes driver name, device name, iova, and flags. iommu_error:io_page_fault This patch set depends on the previous patch set that added iommu tracing feature. Reference: http://www.kernelhub.org/?msg=313155p=2 http://www.kernelhub.org/?msg=313156p=2 Shuah Khan (2): iommu: Add iommu_error class event to iommu trace iommu: Change iommu driver to call io_page_fault trace event drivers/iommu/iommu-traces.c | 3 +++ include/linux/iommu.h| 2 ++ include/trace/events/iommu.h | 33 + 3 files changed, 38 insertions(+) -- 1.8.1.2 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v4 2/2] iommu: Change iommu driver to call io_page_fault trace event
Change iommu driver call io_page_fault trace event. This iommu_error class event can be enabled to trigger when an iommu error occurs. Trace information includes driver name, device name, iova, and flags. Testing: Added trace calls to iommu_prepare_identity_map() for testing some of the conditions that are hard to trigger. Here is the trace from the testing: swapper/0-1 [003] 2.003774: io_page_fault: IOMMU:pci :00:02.0 iova=0xcb80 flags=0x0002 swapper/0-1 [003] 2.004098: io_page_fault: IOMMU:pci :00:1d.0 iova=0xcadc6000 flags=0x0002 swapper/0-1 [003] 2.004115: io_page_fault: IOMMU:pci :00:1a.0 iova=0xcadc6000 flags=0x0002 swapper/0-1 [003] 2.004129: io_page_fault: IOMMU:pci :00:1f.0 iova=0x flags=0x0002 Signed-off-by: Shuah Khan shuah...@samsung.com --- include/linux/iommu.h | 2 ++ 1 file changed, 2 insertions(+) diff --git a/include/linux/iommu.h b/include/linux/iommu.h index 7ea319e..a444c79 100644 --- a/include/linux/iommu.h +++ b/include/linux/iommu.h @@ -22,6 +22,7 @@ #include linux/errno.h #include linux/err.h #include linux/types.h +#include trace/events/iommu.h #define IOMMU_READ (1) #define IOMMU_WRITE(2) @@ -227,6 +228,7 @@ static inline int report_iommu_fault(struct iommu_domain *domain, ret = domain-handler(domain, dev, iova, flags, domain-handler_token); + trace_io_page_fault(dev, iova, flags); return ret; } -- 1.8.1.2 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v4 1/2] iommu: Add iommu_error class event to iommu trace
iommu_error class event can be enabled to trigger when an iommu error occurs. This trace event is intended to be called to report the error information. Trace information includes driver name, device name, iova, and flags. iommu_error:io_page_fault Signed-off-by: Shuah Khan shuah...@samsung.com --- drivers/iommu/iommu-traces.c | 3 +++ include/trace/events/iommu.h | 33 + 2 files changed, 36 insertions(+) diff --git a/drivers/iommu/iommu-traces.c b/drivers/iommu/iommu-traces.c index a2af60f..71ac5fa 100644 --- a/drivers/iommu/iommu-traces.c +++ b/drivers/iommu/iommu-traces.c @@ -22,3 +22,6 @@ EXPORT_TRACEPOINT_SYMBOL_GPL(detach_device_from_domain); /* iommu_map_unmap */ EXPORT_TRACEPOINT_SYMBOL_GPL(map); EXPORT_TRACEPOINT_SYMBOL_GPL(unmap); + +/* iommu_error */ +EXPORT_TRACEPOINT_SYMBOL_GPL(io_page_fault); diff --git a/include/trace/events/iommu.h b/include/trace/events/iommu.h index 86bcc5a..a8f5c32 100644 --- a/include/trace/events/iommu.h +++ b/include/trace/events/iommu.h @@ -123,6 +123,39 @@ DEFINE_EVENT_PRINT(iommu_map_unmap, unmap, __entry-iova, __entry-size ) ); + +DECLARE_EVENT_CLASS(iommu_error, + + TP_PROTO(struct device *dev, unsigned long iova, int flags), + + TP_ARGS(dev, iova, flags), + + TP_STRUCT__entry( + __string(device, dev_name(dev)) + __string(driver, dev_driver_string(dev)) + __field(u64, iova) + __field(int, flags) + ), + + TP_fast_assign( + __assign_str(device, dev_name(dev)); + __assign_str(driver, dev_driver_string(dev)); + __entry-iova = iova; + __entry-flags = flags; + ), + + TP_printk(IOMMU:%s %s iova=0x%016llx flags=0x%04x, + __get_str(driver), __get_str(device), + __entry-iova, __entry-flags + ) +); + +DEFINE_EVENT(iommu_error, io_page_fault, + + TP_PROTO(struct device *dev, unsigned long iova, int flags), + + TP_ARGS(dev, iova, flags) +); #endif /* _TRACE_IOMMU_H */ /* This part must be outside protection */ -- 1.8.1.2 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: hpsa driver bug crack kernel down!
On Thu, Apr 10, 2014 at 2:45 PM, scame...@beardog.cce.hp.com wrote: 3f583bc21977 BAD (Merge tag 'iommu-updates-v3.15') Yes, specifically (finally done bisecting): commit 2e45528930388658603ea24d49cf52867b928d3e Author: Jiang Liu jiang@linux.intel.com Date: Wed Feb 19 14:07:36 2014 +0800 iommu/vt-d: Unify the way to process DMAR device scope array Now we have a PCI bus notification based mechanism to update DMAR device scope array, we could extend the mechanism to support boot time initialization too, which will help to unify and simplify the implementation. Signed-off-by: Jiang Liu jiang@linux.intel.com Signed-off-by: Joerg Roedel j...@8bytes.org My git bisect appears to be converging on something else, something within the hpsa patches that I sent up recently, unfortunately for me. Will let you all know when it converges. This smells very much like the problem that was solved couple of years ago for SI domain. It is likely that path is broken with the DMAR device scope array change. Please take a look to see if the following no longer occurs. Looks like BIOS could be expecting this RMRR to be still mapped. /* * We want to prevent any device associated with an RMRR from * getting placed into the SI Domain. This is done because * problems exist when devices are moved in and out of domains * and their respective RMRR info is lost. We exempt USB devices * from this process due to their usage of RMRRs that are known * to not be needed after BIOS hand-off to OS. */ if (device_has_rmrr(dev) (pdev-class 8) != PCI_CLASS_SERIAL_USB) return 0; -- Shuah ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v2 1/4] dma-mapping: Add devm_ interface for dma_map_single()
On 06/01/2014 01:01 AM, Eli Billauer wrote: dmam_map_single() and dmam_unmap_single() are the managed counterparts for the respective dma_* functions. Note that dmam_map_single() returns a status value rather than the DMA handle. The DMA handle is passed to the caller through a pointer in the arguments. The reason for this API change is that dmam_map_single() allocates memory, which may fail even if the DMA mapping is successful. On the other hand, it seems like there's no platform-independent value for dma_handle that can be used to indicate an error. This leaves dmam_map_single() with no safe way to tell its caller that the memory allocation has failed, except for returning a status as an int. Trying to keep close to the non-devres API could also tempt programmers into using dma_mapping_error() on the dmam_* functions. This would work incorrectly on platforms, for which dma_mapping_error() ignores its argument, and always returns success. I see the value of this interface in unmap case, this type of wrapper can release dma buffers, drivers neglected to release leaving dangling buffers. However, driver writers should give it some thought before switching from conventional dma_map_*() interfaces to these proposed managed dma mapping interfaces. These new interfaces shouldn't be treated as drop in replacements to dma_map_*() interfaces. The reasons are: 1. This interface adds an overhead in allocation memory for devres to compared to other dma_map_* wrappers. Users need to be aware of that. This would be okay in the cases where a driver allocates several buffers at init time and uses them. However, several drivers allocate during run-time and release as soon as it is no longer needed. This overhead is going to be in the performance path. 2. It adds a failure case even when dma buffers are available. i.e if if devres alloc fails, it will return failure even if dma map could have succeeded. This is a new failure case for DMA-API. The reason dma_map_*() routines fail now is because there are no buffers available. Drivers handle this error as a retry case. Drivers using dmam_map_single() will have to handle the failure cases differently. Since the return values are different for dmam_map_*(), that is plus that these interfaces can't be drop in replacements to the dma_map_*() interfaces. 3. Similarly, it adds an overhead in releasing memory for devres to compared to other dma_unmap_* wrappers. Users need to be aware of that. This overhead is going to be in the performance path when drivers unmap buffers during run-time. Adding Joerg Roedel to the thread. -- Shuah -- Shuah Khan Senior Linux Kernel Developer - Open Source Group Samsung Research America(Silicon Valley) shuah...@samsung.com | (970) 672-0658 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v2 1/4] dma-mapping: Add devm_ interface for dma_map_single()
On 06/06/2014 10:21 AM, Eli Billauer wrote: On 06/06/14 19:01, Greg KH wrote: Please try to put yourself in my position: I have a driver that I care about, which works fine for a few years. It's based upon dma_map_single(), which seems to be the common way to get non-coherent memory, even for the driver's entire lifespan. I realize that dma_alloc_* was the intended way to do it, but fact is that dma_map_* has become the common choice. Is your driver in the kernel tree? If not, you really are on your own :( It's the Xillybus driver in the staging area. I don't know if this counts for being in the kernel tree... The suggested patchset would allow replacing my use of dma_map_single() with a managed version of that function. This will clean the driver's code considerably. But I think that the discussion here is if it's valid to use dma_map_single() for a device-permanent DMA mapping, or if dma_alloc_noncoherent() is the only way. If the answer is no, there's quite obviously no point in a devres version for that function. Eli, dma_map_single() and dma_unmap_single() are streaming DMA APIs. These are intended for one DMA transfer and unmapped right after it is done. dma buffers are limited shared resources for streaming that are shared by several drivers. Hence the need for use and release model. Please refer to the Using Streaming DMA mappings in DMA-API-HOWTO.txt - Streaming DMA mappings which are usually mapped for one DMA transfer, unmapped right after it (unless you use dma_sync_* below) and for which hardware can optimize for sequential accesses. This of streaming as asynchronous or outside the coherency domain. Good examples of what to use streaming mappings for are: - Networking buffers transmitted/received by a device. - Filesystem buffers written/read by a SCSI device. If I understand your intended usage correctly, you are looking to allocate and hold the buffers for the lifetime of the driver. For such cases, dma_alloc_*() interfaces are the ones to use. Please also refer to DMA-API.txt as well. Hope this helps. -- Shuah -- Shuah Khan Senior Linux Kernel Developer - Open Source Group Samsung Research America(Silicon Valley) shuah...@samsung.com | (970) 672-0658 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH] iommu: fix trace_map() to report original iova and original size
iommu_map() calls trace_map() with iova and size. trace_map() should report original iova and original size as opposed to iova and size after they get changed during mapping. size is always zero at the end of mapping which is useless to report and iova as it gets incremented, it is not as useful as the original iova. Change iommu_map() to call trace_map() to report original iova and original size. Signed-off-by: Shuah Khan shua...@osg.samsung.com Reported-by: Alex Williamson alex.william...@redhat.com --- drivers/iommu/iommu.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c index f7718d7..fbf8827 100644 --- a/drivers/iommu/iommu.c +++ b/drivers/iommu/iommu.c @@ -1084,7 +1084,7 @@ int iommu_map(struct iommu_domain *domain, unsigned long iova, if (ret) iommu_unmap(domain, orig_iova, orig_size - size); else - trace_map(iova, paddr, size); + trace_map(orig_iova, paddr, orig_size); return ret; } -- 1.9.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH] iommu: fix trace_unmap() to report original iova
iommu_unmap() calls trace_unmap() with changed iova and original size. trace_unmap() should report original iova instead. Change iommu_unmap() to call trace_unmap() with original iova. Signed-off-by: Shuah Khan shua...@osg.samsung.com Reported-by: Alex Williamson alex.william...@redhat.com --- drivers/iommu/iommu.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c index fbf8827..2eb7554 100644 --- a/drivers/iommu/iommu.c +++ b/drivers/iommu/iommu.c @@ -1094,6 +1094,7 @@ size_t iommu_unmap(struct iommu_domain *domain, unsigned long iova, size_t size) { size_t unmapped_page, unmapped = 0; unsigned int min_pagesz; + unsigned long orig_iova = iova; if (unlikely(domain-ops-unmap == NULL || domain-ops-pgsize_bitmap == 0UL)) @@ -1133,7 +1134,7 @@ size_t iommu_unmap(struct iommu_domain *domain, unsigned long iova, size_t size) unmapped += unmapped_page; } - trace_unmap(iova, 0, size); + trace_unmap(orig_iova, 0, size); return unmapped; } EXPORT_SYMBOL_GPL(iommu_unmap); -- 2.1.0 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH] iommu: change trace unmap api to report unmapped size
Currently map and unmap are implemented as events under a common trace class declaration. The common class forces trace_unmap() to require a bogus physical address argument that it doesn't use. Changing unmap to report unmapped size will provide useful information for debugging. Remove common map_unmap trace class and change map and unmap into separate events as opposed to events under the same class to allow for differences in the reporting information. In addition, map and unmap are changed to handle size value as size_t instead of int to match the passed size value and avoid overflow. Signed-off-by: Shuah Khan shua...@osg.samsung.com Suggested-by: Alex Williamson alex.william...@redhat.com --- drivers/iommu/iommu.c| 2 +- include/trace/events/iommu.h | 31 ++- 2 files changed, 19 insertions(+), 14 deletions(-) diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c index 2eb7554..9e0dcdb 100644 --- a/drivers/iommu/iommu.c +++ b/drivers/iommu/iommu.c @@ -1134,7 +1134,7 @@ size_t iommu_unmap(struct iommu_domain *domain, unsigned long iova, size_t size) unmapped += unmapped_page; } - trace_unmap(orig_iova, 0, size); + trace_unmap(orig_iova, size, unmapped); return unmapped; } EXPORT_SYMBOL_GPL(iommu_unmap); diff --git a/include/trace/events/iommu.h b/include/trace/events/iommu.h index a8f5c32..2c7befb 100644 --- a/include/trace/events/iommu.h +++ b/include/trace/events/iommu.h @@ -83,7 +83,7 @@ DEFINE_EVENT(iommu_device_event, detach_device_from_domain, TP_ARGS(dev) ); -DECLARE_EVENT_CLASS(iommu_map_unmap, +TRACE_EVENT(map, TP_PROTO(unsigned long iova, phys_addr_t paddr, size_t size), @@ -92,7 +92,7 @@ DECLARE_EVENT_CLASS(iommu_map_unmap, TP_STRUCT__entry( __field(u64, iova) __field(u64, paddr) - __field(int, size) + __field(size_t, size) ), TP_fast_assign( @@ -101,26 +101,31 @@ DECLARE_EVENT_CLASS(iommu_map_unmap, __entry-size = size; ), - TP_printk(IOMMU: iova=0x%016llx paddr=0x%016llx size=0x%x, + TP_printk(IOMMU: iova=0x%016llx paddr=0x%016llx size=%zu, __entry-iova, __entry-paddr, __entry-size ) ); -DEFINE_EVENT(iommu_map_unmap, map, +TRACE_EVENT(unmap, - TP_PROTO(unsigned long iova, phys_addr_t paddr, size_t size), - - TP_ARGS(iova, paddr, size) -); + TP_PROTO(unsigned long iova, size_t size, size_t unmapped_size), -DEFINE_EVENT_PRINT(iommu_map_unmap, unmap, + TP_ARGS(iova, size, unmapped_size), - TP_PROTO(unsigned long iova, phys_addr_t paddr, size_t size), + TP_STRUCT__entry( + __field(u64, iova) + __field(size_t, size) + __field(size_t, unmapped_size) + ), - TP_ARGS(iova, paddr, size), + TP_fast_assign( + __entry-iova = iova; + __entry-size = size; + __entry-unmapped_size = unmapped_size; + ), - TP_printk(IOMMU: iova=0x%016llx size=0x%x, - __entry-iova, __entry-size + TP_printk(IOMMU: iova=0x%016llx size=%zu unmapped_size=%zu, + __entry-iova, __entry-size, __entry-unmapped_size ) ); -- 2.1.0 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH] iommu: fix trace_map() to report original iova and original size
On 01/15/2015 07:29 PM, Shuah Khan wrote: iommu_map() calls trace_map() with iova and size. trace_map() should report original iova and original size as opposed to iova and size after they get changed during mapping. size is always zero at the end of mapping which is useless to report and iova as it gets incremented, it is not as useful as the original iova. Change iommu_map() to call trace_map() to report original iova and original size. Signed-off-by: Shuah Khan shua...@osg.samsung.com Reported-by: Alex Williamson alex.william...@redhat.com --- drivers/iommu/iommu.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c index f7718d7..fbf8827 100644 --- a/drivers/iommu/iommu.c +++ b/drivers/iommu/iommu.c @@ -1084,7 +1084,7 @@ int iommu_map(struct iommu_domain *domain, unsigned long iova, if (ret) iommu_unmap(domain, orig_iova, orig_size - size); else - trace_map(iova, paddr, size); + trace_map(orig_iova, paddr, orig_size); return ret; } Joerg, Just makeing sure you saw this as well. I saw your responses on the trace_unmap() patches that you pulled them in. thanks, -- Shuah -- Shuah Khan Sr. Linux Kernel Developer Open Source Innovation Group Samsung Research America (Silicon Valley) shua...@osg.samsung.com | (970) 217-8978 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH kernel] vfio/spapr: Add trace points for map/unmap
On 11/29/2017 09:32 AM, Alex Williamson wrote: > On Thu, 23 Nov 2017 15:13:37 +1100 > Alexey Kardashevskiywrote: > >> On 17/11/17 17:58, Alexey Kardashevskiy wrote: >>> On 17/11/17 11:13, Alex Williamson wrote: On Tue, 14 Nov 2017 10:47:12 +1100 Alexey Kardashevskiy wrote: > On 27/10/17 14:00, Alexey Kardashevskiy wrote: >> This adds trace_map/trace_unmap tracepoints to spapr driver. Type1 >> already >> uses these via the IOMMU API (iommu_map/__iommu_unmap). >> >> Signed-off-by: Alexey Kardashevskiy Is this really legitimate to include tracepoints from a different subsystem?> The vfio type1 backend gets these trace points by virtue of it actually using the IOMMU API, it doesn't call them itself. I'm kind of surprised these are actually available to be called from a module. >>> >>> They are explicitly exported: >>> >>> EXPORT_TRACEPOINT_SYMBOL_GPL(map); >>> EXPORT_TRACEPOINT_SYMBOL_GPL(unmap); Tracepoints need to be exported as per the tracepoint sub-system requirements. >>> >>> I would think this is for things like drivers/vfio/vfio_iommu_spapr_tce.c , >>> why else?... >>> >>> I suspect the way to do this is probably to define our own tracepoints in the vfio/spapr backend or insert tracepoints into the IOMMU layers that that code calls into rather than masquerading as tracepoints from a different subsystem. Right? >>> >>> This makes sense too. But it is going to be just cut-n-paste and some >>> confusion - >>> /sys/kernel/debug/tracing/events/iommu/map will still be present but >>> won't work and >>> /sys/kernel/debug/tracing/events/vfio/vfio_iommu_spapr_tce/map will. > > But iommu/map does work, it's just not called by the vfio spapr tce > backend, which is absolutely correct. In fact, that's part of my > reservation about this approach is that it could be interpreted as > implying a call path that doesn't exist on this arch. These tracepoints are intended to be called after a iommo/map/unmap calls from IOMMU API. It doesn't make any sense to call them from random places. It is totally useless to do so and makes it difficult to debug problems. As an author of these tracepoints - that is my opinion. thanks, -- Shuah ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: remove the ->mapping_error method from dma_map_ops V2
On Wed, Nov 28, 2018 at 12:48 PM Russell King - ARM Linux wrote: > > On Wed, Nov 28, 2018 at 11:27:17AM -0800, David Miller wrote: > > From: Linus Torvalds > > Date: Wed, 28 Nov 2018 10:00:06 -0800 > > > > > Not all memory is accessible even to the kernel. If you have memory > > > that shows up in the last page of phys_addr_t, you just mark it > > > reserved at boot-time. > > > > It's not the physical memory at the end that needs to be reserved. > > > > It's the IOMMU mapping arena. > > True, if and only if you have an IOMMU. > > Where there isn't an IOMMU, then we'd have to reserve every page that > that translates to a bus address in the top 4K of dma_addr_t on any > bus in the system - that means knowing early in the kernel > initialisation about all buses in the system so we can detect and > reserve these pages. > The arch and platform differences/confusion as to "what is DMA error value" is the reason why dma_mapping_error is part dma ops. I understand that iy would make sense to reduce the overhead of an additional call, however, I am not sure if would be trivial change to address this. I was down this path of trying to address the missing mapping error checks a few years ago and introduced dma_mapping_error checks in the DMA DEBUG API. As you might already know that there is no common definition for the mapping error. Quick look at the defines shows: #define CALGARY_MAPPING_ERROR 0 #define S390_MAPPING_ERROR (~(dma_addr_t) 0x0) #define SPARC_MAPPING_ERROR (~(dma_addr_t)0x0) This is the reason why there is a arch or in some cases bus specific mapping_error ops is needed. We could unify this and fix all these first. I haven't looked at the patch set closely, maybe you are already doing this. thanks, -- Shuah ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v2] iommu: amd: Fix IOMMU perf counter clobbering during init
On 1/23/20 11:43 PM, Suravee Suthikulpanit wrote: On 1/24/20 5:32 AM, Shuah Khan wrote: init_iommu_perf_ctr() clobbers the register when it checks write access to IOMMU perf counters and fails to restore when they are writable. Add save and restore to fix it. Signed-off-by: Shuah Khan --- Changes since v1: -- Fix bug in sucessful return path. Add a return instead of fall through to pc_false error case drivers/iommu/amd_iommu_init.c | 24 ++-- 1 file changed, 18 insertions(+), 6 deletions(-) diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c index 568c52317757..483f7bc379fa 100644 --- a/drivers/iommu/amd_iommu_init.c +++ b/drivers/iommu/amd_iommu_init.c @@ -1655,27 +1655,39 @@ static int iommu_pc_get_set_reg(struct amd_iommu *iommu, u8 bank, u8 cntr, static void init_iommu_perf_ctr(struct amd_iommu *iommu) { struct pci_dev *pdev = iommu->dev; - u64 val = 0xabcd, val2 = 0; + u64 val = 0xabcd, val2 = 0, save_reg = 0; if (!iommu_feature(iommu, FEATURE_PC)) return; amd_iommu_pc_present = true; + /* save the value to restore, if writable */ + if (iommu_pc_get_set_reg(iommu, 0, 0, 0, _reg, false)) + goto pc_false; + /* Check if the performance counters can be written to */ if ((iommu_pc_get_set_reg(iommu, 0, 0, 0, , true)) || (iommu_pc_get_set_reg(iommu, 0, 0, 0, , false)) || - (val != val2)) { - pci_err(pdev, "Unable to write to IOMMU perf counter.\n"); - amd_iommu_pc_present = false; - return; - } + (val != val2)) + goto pc_false; + + /* restore */ + if (iommu_pc_get_set_reg(iommu, 0, 0, 0, _reg, true)) + goto pc_false; pci_info(pdev, "IOMMU performance counters supported\n"); val = readl(iommu->mmio_base + MMIO_CNTR_CONF_OFFSET); iommu->max_banks = (u8) ((val >> 12) & 0x3f); iommu->max_counters = (u8) ((val >> 7) & 0xf); + + return; + Good catch. Sorry, I missed this part as well :( +pc_false: + pci_err(pdev, "Unable to read/write to IOMMU perf counter.\n"); + amd_iommu_pc_present = false; + return; } static ssize_t amd_iommu_show_cap(struct device *dev, As for your question in v1: > I see 2 banks and 4 counters on my system. Is it sufficient to check > the first bank and first counter? In other words, if the first one > isn't writable, are all counters non-writable? We currently assume all counters have the same write-ability. So, it should be sufficient to just check the first one. > Should we read the config first and then, try to see if any of the > counters are writable? I have a patch that does that, I can send it > out for review. Which config are you referring to? Not sure what you mean. I mean reading MMIO_CNTR_CONF_OFFSET to get max banks and counters. Also what is the reason to check writable? I tried a couplf og things on my AMD Ryzen 5 PRO 2400GE w/ Radeon Vega Graphics I changed the logic to read config to get max banks and counters before checking if counters are writable and tried writing to all. The result is the same and all of them aren't writable. However, when disable the writable check and assume they are, I can run perf stat -e 'amd_iommu_0 on all events and get data. perf stat -e 'amd_iommu_0/cmd_processed/' sleep 10 Performance counter stats for 'system wide': 56 amd_iommu_0/cmd_processed/ 10.001525171 seconds time elapsed perf stat -a -e amd_iommu/mem_trans_total/ sleep 10 Performance counter stats for 'system wide': 2,696 amd_iommu/mem_trans_total/ 10.001465115 seconds time elapsed I tried all possible events listed under amd_iommu_0 and I can get data on all of them. No problems in dmesg. Is this inline with what you expect? I am curious what this write tell us and can we enable read only mode on these counters? By the way, here is the output from booting the system with this patch (+ debug messages). [ 14.408834] pci :60:00.2: AMD-Vi: IOMMU performance counters supported [ 14.416526] DEBUG: init_iommu_perf_ctr: amd_iommu_pc_present=0x1 [ 14.429602] pci :40:00.2: AMD-Vi: IOMMU performance counters supported [ 14.437275] DEBUG: init_iommu_perf_ctr: amd_iommu_pc_present=0x1 [ 14.450320] pci :20:00.2: AMD-Vi: IOMMU performance counters supported [ 14.457991] DEBUG: init_iommu_perf_ctr: amd_iommu_pc_present=0x1 [ 14.471049] pci :00:00.2: AMD-Vi: IOMMU performance counters supported [ 14.478722] DEBUG: init_iommu_perf_ctr: amd_iommu_pc_present=0x1 Also, here is the perf amd_iommu testing. # perf stat -e 'amd_iommu_0/cmd_processed/,\ amd_iommu_1/cmd_processed/,\ amd_iommu_2/cmd_processed/,\ amd_iommu_3/cmd_processed/' Performance counter stats for 'system wide': 204 amd_iommu_0/cmd_pr
Re: [PATCH] iommu: amd: Fix IOMMU perf counter clobbering during init
On 1/21/20 8:32 AM, Shuah Khan wrote: On 1/20/20 7:10 PM, Suravee Suthikulpanit wrote: On 1/17/2020 5:08 PM, Joerg Roedel wrote: Adding Suravee, who wrote the IOMMU Perf Counter code. On Tue, Jan 14, 2020 at 08:12:20AM -0700, Shuah Khan wrote: init_iommu_perf_ctr() clobbers the register when it checks write access to IOMMU perf counters and fails to restore when they are writable. Add save and restore to fix it. Signed-off-by: Shuah Khan --- drivers/iommu/amd_iommu_init.c | 22 -- 1 file changed, 16 insertions(+), 6 deletions(-) Suravee, can you please review this patch? This looks ok. Does this fix certain issues? Or is this just for sanity. I didn't notice any problems. Counters aren't writable on my system. However, it certainly looks like a bog since registers aren't restored like in other places in this file where such checks are done on other registers. I see 2 banks and 4 counters on my system. Is it sufficient to check the first bank and first counter? In other words, if the first one isn't writable, are all counters non-writable? Should we read the config first and then, try to see if any of the counters are writable? I have a patch that does that, I can send it out for review. Reviewed-by: Suravee Suthikulpanit Joerg, Please don't pull this in. I introduced a bug in this patch. It always returns amd_iommu_pc_present false even when it can write to the counters. My bad. I will send v2. thanks, -- Shuah ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v2] iommu: amd: Fix IOMMU perf counter clobbering during init
init_iommu_perf_ctr() clobbers the register when it checks write access to IOMMU perf counters and fails to restore when they are writable. Add save and restore to fix it. Signed-off-by: Shuah Khan --- Changes since v1: -- Fix bug in sucessful return path. Add a return instead of fall through to pc_false error case drivers/iommu/amd_iommu_init.c | 24 ++-- 1 file changed, 18 insertions(+), 6 deletions(-) diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c index 568c52317757..483f7bc379fa 100644 --- a/drivers/iommu/amd_iommu_init.c +++ b/drivers/iommu/amd_iommu_init.c @@ -1655,27 +1655,39 @@ static int iommu_pc_get_set_reg(struct amd_iommu *iommu, u8 bank, u8 cntr, static void init_iommu_perf_ctr(struct amd_iommu *iommu) { struct pci_dev *pdev = iommu->dev; - u64 val = 0xabcd, val2 = 0; + u64 val = 0xabcd, val2 = 0, save_reg = 0; if (!iommu_feature(iommu, FEATURE_PC)) return; amd_iommu_pc_present = true; + /* save the value to restore, if writable */ + if (iommu_pc_get_set_reg(iommu, 0, 0, 0, _reg, false)) + goto pc_false; + /* Check if the performance counters can be written to */ if ((iommu_pc_get_set_reg(iommu, 0, 0, 0, , true)) || (iommu_pc_get_set_reg(iommu, 0, 0, 0, , false)) || - (val != val2)) { - pci_err(pdev, "Unable to write to IOMMU perf counter.\n"); - amd_iommu_pc_present = false; - return; - } + (val != val2)) + goto pc_false; + + /* restore */ + if (iommu_pc_get_set_reg(iommu, 0, 0, 0, _reg, true)) + goto pc_false; pci_info(pdev, "IOMMU performance counters supported\n"); val = readl(iommu->mmio_base + MMIO_CNTR_CONF_OFFSET); iommu->max_banks = (u8) ((val >> 12) & 0x3f); iommu->max_counters = (u8) ((val >> 7) & 0xf); + + return; + +pc_false: + pci_err(pdev, "Unable to read/write to IOMMU perf counter.\n"); + amd_iommu_pc_present = false; + return; } static ssize_t amd_iommu_show_cap(struct device *dev, -- 2.20.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH] iommu: amd: Fix IOMMU perf counter clobbering during init
init_iommu_perf_ctr() clobbers the register when it checks write access to IOMMU perf counters and fails to restore when they are writable. Add save and restore to fix it. Signed-off-by: Shuah Khan --- drivers/iommu/amd_iommu_init.c | 22 -- 1 file changed, 16 insertions(+), 6 deletions(-) diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c index 568c52317757..c0ad4f293522 100644 --- a/drivers/iommu/amd_iommu_init.c +++ b/drivers/iommu/amd_iommu_init.c @@ -1655,27 +1655,37 @@ static int iommu_pc_get_set_reg(struct amd_iommu *iommu, u8 bank, u8 cntr, static void init_iommu_perf_ctr(struct amd_iommu *iommu) { struct pci_dev *pdev = iommu->dev; - u64 val = 0xabcd, val2 = 0; + u64 val = 0xabcd, val2 = 0, save_reg = 0; if (!iommu_feature(iommu, FEATURE_PC)) return; amd_iommu_pc_present = true; + /* save the value to restore, if writable */ + if (iommu_pc_get_set_reg(iommu, 0, 0, 0, _reg, false)) + goto pc_false; + /* Check if the performance counters can be written to */ if ((iommu_pc_get_set_reg(iommu, 0, 0, 0, , true)) || (iommu_pc_get_set_reg(iommu, 0, 0, 0, , false)) || - (val != val2)) { - pci_err(pdev, "Unable to write to IOMMU perf counter.\n"); - amd_iommu_pc_present = false; - return; - } + (val != val2)) + goto pc_false; + + /* restore */ + if (iommu_pc_get_set_reg(iommu, 0, 0, 0, _reg, true)) + goto pc_false; pci_info(pdev, "IOMMU performance counters supported\n"); val = readl(iommu->mmio_base + MMIO_CNTR_CONF_OFFSET); iommu->max_banks = (u8) ((val >> 12) & 0x3f); iommu->max_counters = (u8) ((val >> 7) & 0xf); + +pc_false: + pci_err(pdev, "Unable to read/write to IOMMU perf counter.\n"); + amd_iommu_pc_present = false; + return; } static ssize_t amd_iommu_show_cap(struct device *dev, -- 2.20.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH] iommu: amd: Fix IOMMU perf counter clobbering during init
On 1/20/20 7:10 PM, Suravee Suthikulpanit wrote: On 1/17/2020 5:08 PM, Joerg Roedel wrote: Adding Suravee, who wrote the IOMMU Perf Counter code. On Tue, Jan 14, 2020 at 08:12:20AM -0700, Shuah Khan wrote: init_iommu_perf_ctr() clobbers the register when it checks write access to IOMMU perf counters and fails to restore when they are writable. Add save and restore to fix it. Signed-off-by: Shuah Khan --- drivers/iommu/amd_iommu_init.c | 22 -- 1 file changed, 16 insertions(+), 6 deletions(-) Suravee, can you please review this patch? This looks ok. Does this fix certain issues? Or is this just for sanity. I didn't notice any problems. Counters aren't writable on my system. However, it certainly looks like a bog since registers aren't restored like in other places in this file where such checks are done on other registers. I see 2 banks and 4 counters on my system. Is it sufficient to check the first bank and first counter? In other words, if the first one isn't writable, are all counters non-writable? Should we read the config first and then, try to see if any of the counters are writable? I have a patch that does that, I can send it out for review. Reviewed-by: Suravee Suthikulpanit Thanks for the review. thanks, -- Shuah ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH] iommu/amd: Fix event counter availability check
On 5/31/20 1:22 AM, Alexander Monakov wrote: Hi, Adding Shuah Khan to Cc: I've noticed you've seen this issue on Ryzen 2400GE; can you have a look at the patch? Would be nice to know if it fixes the problem for you too. I am not seeing any change in behavior on my system. I still see: I can't read perf counters. The question I asked in my previous thread on this: I see 2 banks and 4 counters on my system. Is it sufficient to check the first bank and first counter? In other words, if the first one isn't writable, are all counters non-writable? Should we read the config first and then, try to see if any of the counters are writable? I have a patch that does that, I can send it out for review. I changed the logic to read config to get max banks and counters before checking if counters are writable and tried writing to all. The result is the same and all of them aren't writable. However, when disable the writable check and assume they are, I can run perf stat -e 'amd_iommu_0 on all events and get data. perf stat -e 'amd_iommu_0/cmd_processed/' sleep 10 Performance counter stats for 'system wide': 56 amd_iommu_0/cmd_processed/ 10.001525171 seconds time elapsed perf stat -a -e amd_iommu/mem_trans_total/ sleep 10 Performance counter stats for 'system wide': 2,696 amd_iommu/mem_trans_total/ 10.001465115 seconds time elapsed I tried all possible events listed under amd_iommu_0 and I can get data on all of them. No problems in dmesg. This patch doesn't really address that question. thanks, -- Shuah ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH] iommu/amd: Fix event counter availability check
On 2/26/21 2:44 PM, Paul Menzel wrote: [cc: +suravee, +jörg] Dear Alex, dear Shuah, dear Suravee, dear Jörg, Am 03.06.20 um 08:54 schrieb Alexander Monakov: On Tue, 2 Jun 2020, Shuah Khan wrote: I changed the logic to read config to get max banks and counters before checking if counters are writable and tried writing to all. The result is the same and all of them aren't writable. However, when disable the writable check and assume they are, I can run [snip] This is similar to what I did. I also noticed that counters can be successfully used with perf if the initial check is ignored. I was considering sending a patch to remove the check and adjust the event counting logic to use counters as read-only, but after a bit more investigation I've noticed how late pci_enable_device is done, and came up with this patch. It's a path of less resistance: I'd expect maintainers to be more averse to removing the check rather than fixing it so it works as intended (even though I think the check should not be there in the first place). However: The ability to modify the counters is needed only for sampling the events (getting an interrupt when a counter overflows). There's no code to do that for these AMD IOMMU counters. A solution I would prefer is to not write to those counters at all. It would simplify or even remove a bunch of code. I can submit a corresponding patch if there's general agreement this path is ok. What do you think? I like this idea. Suravee, Jörg, what do you think? Commit 6778ff5b21b (iommu/amd: Fix performance counter initialization) delays the boot up to 100 ms, which is over 20 % on fast systems, and also just a workaround, and does not seem to work always. The delay is also not mentioned in the commit message. Sounds good to me. I can test this out on my system. thanks, -- Shuah ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 2/2] iommu/amd: Remove performance counter pre-initialization test
On 4/9/21 2:58 AM, Suravee Suthikulpanit wrote: In early AMD desktop/mobile platforms (during 2013), when the IOMMU Performance Counter (PMC) support was first introduced in commit 30861ddc9cca ("perf/x86/amd: Add IOMMU Performance Counter resource management"), there was a HW bug where the counters could not be accessed. The result was reading of the counter always return zero. At the time, the suggested workaround was to add a test logic prior to initializing the PMC feature to check if the counters can be programmed and read back the same value. This has been working fine until the more recent desktop/mobile platforms start enabling power gating for the PMC, which prevents access to the counters. This results in the PMC support being disabled unnecesarily. unnecessarily Unfortunatly, there is no documentation of since which generation Unfortunately, Rephrase suggestion: Unfortunately, it is unclear when the PMC HW bug fixed. of hardware the original PMC HW bug was fixed. Although, it was fixed soon after the first introduction of the PMC. Base on this, we assume Based that the buggy platforms are less likely to be in used, and it should in use be relatively safe to remove this legacy logic. Link: https://lore.kernel.org/linux-iommu/alpine.lnx.3.20.13.2006030935570.3...@monopod.intra.ispras.ru/ Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=201753 Cc: Tj (Elloe Linux) Cc: Shuah Khan Cc: Alexander Monakov Cc: David Coe Cc: Paul Menzel Signed-off-by: Suravee Suthikulpanit --- drivers/iommu/amd/init.c | 24 +--- 1 file changed, 1 insertion(+), 23 deletions(-) diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c index 648cdfd03074..247cdda5d683 100644 --- a/drivers/iommu/amd/init.c +++ b/drivers/iommu/amd/init.c @@ -1714,33 +1714,16 @@ static int __init init_iommu_all(struct acpi_table_header *table) return 0; } -static int iommu_pc_get_set_reg(struct amd_iommu *iommu, u8 bank, u8 cntr, - u8 fxn, u64 *value, bool is_write); - static void init_iommu_perf_ctr(struct amd_iommu *iommu) { + u64 val; struct pci_dev *pdev = iommu->dev; - u64 val = 0xabcd, val2 = 0, save_reg = 0; if (!iommu_feature(iommu, FEATURE_PC)) return; amd_iommu_pc_present = true; - /* save the value to restore, if writable */ - if (iommu_pc_get_set_reg(iommu, 0, 0, 0, _reg, false)) - goto pc_false; - - /* Check if the performance counters can be written to */ - if ((iommu_pc_get_set_reg(iommu, 0, 0, 0, , true)) || - (iommu_pc_get_set_reg(iommu, 0, 0, 0, , false)) || - (val != val2)) - goto pc_false; - - /* restore */ - if (iommu_pc_get_set_reg(iommu, 0, 0, 0, _reg, true)) - goto pc_false; - pci_info(pdev, "IOMMU performance counters supported\n"); val = readl(iommu->mmio_base + MMIO_CNTR_CONF_OFFSET); @@ -1748,11 +1731,6 @@ static void init_iommu_perf_ctr(struct amd_iommu *iommu) iommu->max_counters = (u8) ((val >> 7) & 0xf); return; - -pc_false: - pci_err(pdev, "Unable to read/write to IOMMU perf counter.\n"); - amd_iommu_pc_present = false; - return; } static ssize_t amd_iommu_show_cap(struct device *dev, I will test this patch and the revert on my two AMD systems and update you in a day or two. thanks, -- Shuah ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 2/2] iommu/amd: Remove performance counter pre-initialization test
On 4/9/21 10:37 AM, Shuah Khan wrote: On 4/9/21 2:58 AM, Suravee Suthikulpanit wrote: In early AMD desktop/mobile platforms (during 2013), when the IOMMU Performance Counter (PMC) support was first introduced in commit 30861ddc9cca ("perf/x86/amd: Add IOMMU Performance Counter resource management"), there was a HW bug where the counters could not be accessed. The result was reading of the counter always return zero. At the time, the suggested workaround was to add a test logic prior to initializing the PMC feature to check if the counters can be programmed and read back the same value. This has been working fine until the more recent desktop/mobile platforms start enabling power gating for the PMC, which prevents access to the counters. This results in the PMC support being disabled unnecesarily. unnecessarily Unfortunatly, there is no documentation of since which generation Unfortunately, Rephrase suggestion: Unfortunately, it is unclear when the PMC HW bug fixed. of hardware the original PMC HW bug was fixed. Although, it was fixed soon after the first introduction of the PMC. Base on this, we assume Based that the buggy platforms are less likely to be in used, and it should in use be relatively safe to remove this legacy logic. Link: https://lore.kernel.org/linux-iommu/alpine.lnx.3.20.13.2006030935570.3...@monopod.intra.ispras.ru/ Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=201753 Cc: Tj (Elloe Linux) Cc: Shuah Khan Cc: Alexander Monakov Cc: David Coe Cc: Paul Menzel Signed-off-by: Suravee Suthikulpanit --- drivers/iommu/amd/init.c | 24 +--- 1 file changed, 1 insertion(+), 23 deletions(-) diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c index 648cdfd03074..247cdda5d683 100644 --- a/drivers/iommu/amd/init.c +++ b/drivers/iommu/amd/init.c @@ -1714,33 +1714,16 @@ static int __init init_iommu_all(struct acpi_table_header *table) return 0; } -static int iommu_pc_get_set_reg(struct amd_iommu *iommu, u8 bank, u8 cntr, - u8 fxn, u64 *value, bool is_write); - static void init_iommu_perf_ctr(struct amd_iommu *iommu) { + u64 val; struct pci_dev *pdev = iommu->dev; - u64 val = 0xabcd, val2 = 0, save_reg = 0; Why not leave this u64 val here? Having the pdev assignment as the first line makes it easier to read/follow. if (!iommu_feature(iommu, FEATURE_PC)) return; amd_iommu_pc_present = true; - /* save the value to restore, if writable */ - if (iommu_pc_get_set_reg(iommu, 0, 0, 0, _reg, false)) - goto pc_false; - - /* Check if the performance counters can be written to */ - if ((iommu_pc_get_set_reg(iommu, 0, 0, 0, , true)) || - (iommu_pc_get_set_reg(iommu, 0, 0, 0, , false)) || - (val != val2)) Aha - this is going away anyway. Please ignore my comment on 1/2 about parenthesis around (val != val2) being unnecessary. - goto pc_false; - - /* restore */ - if (iommu_pc_get_set_reg(iommu, 0, 0, 0, _reg, true)) - goto pc_false; - pci_info(pdev, "IOMMU performance counters supported\n"); val = readl(iommu->mmio_base + MMIO_CNTR_CONF_OFFSET); @@ -1748,11 +1731,6 @@ static void init_iommu_perf_ctr(struct amd_iommu *iommu) iommu->max_counters = (u8) ((val >> 7) & 0xf); return; - -pc_false: - pci_err(pdev, "Unable to read/write to IOMMU perf counter.\n"); - amd_iommu_pc_present = false; - return; } static ssize_t amd_iommu_show_cap(struct device *dev, thanks, -- Shuah ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 1/2] Revert "iommu/amd: Fix performance counter initialization"
On 4/9/21 2:58 AM, Suravee Suthikulpanit wrote: From: Paul Menzel This reverts commit 6778ff5b21bd8e78c8bd547fd66437cf2657fd9b. The original commit tries to address an issue, where PMC power-gating causing the IOMMU PMC pre-init test to fail on certain desktop/mobile platforms where the power-gating is normally enabled. There have been several reports that the workaround still does not guarantee to work, and can add up to 100 ms (on the worst case) to the boot process on certain platforms such as the MSI B350M MORTAR with AMD Ryzen 3 2200G. Therefore, revert this commit as a prelude to removing the pre-init test. Link: https://lore.kernel.org/linux-iommu/alpine.lnx.3.20.13.2006030935570.3...@monopod.intra.ispras.ru/ Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=201753 Cc: Tj (Elloe Linux) Cc: Shuah Khan Cc: Alexander Monakov Cc: David Coe Signed-off-by: Paul Menzel Signed-off-by: Suravee Suthikulpanit --- Note: I have revised the commit message to add more detail and remove uncessary information. drivers/iommu/amd/init.c | 45 ++-- 1 file changed, 11 insertions(+), 34 deletions(-) diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c index 321f5906e6ed..648cdfd03074 100644 --- a/drivers/iommu/amd/init.c +++ b/drivers/iommu/amd/init.c @@ -12,7 +12,6 @@ #include #include #include -#include #include #include #include @@ -257,8 +256,6 @@ static enum iommu_init_state init_state = IOMMU_START_STATE; static int amd_iommu_enable_interrupts(void); static int __init iommu_go_to_state(enum iommu_init_state state); static void init_device_table_dma(void); -static int iommu_pc_get_set_reg(struct amd_iommu *iommu, u8 bank, u8 cntr, - u8 fxn, u64 *value, bool is_write); static bool amd_iommu_pre_enabled = true; @@ -1717,11 +1714,13 @@ static int __init init_iommu_all(struct acpi_table_header *table) return 0; } -static void __init init_iommu_perf_ctr(struct amd_iommu *iommu) +static int iommu_pc_get_set_reg(struct amd_iommu *iommu, u8 bank, u8 cntr, + u8 fxn, u64 *value, bool is_write); + +static void init_iommu_perf_ctr(struct amd_iommu *iommu) { - int retry; struct pci_dev *pdev = iommu->dev; - u64 val = 0xabcd, val2 = 0, save_reg, save_src; + u64 val = 0xabcd, val2 = 0, save_reg = 0; if (!iommu_feature(iommu, FEATURE_PC)) return; @@ -1729,39 +1728,17 @@ static void __init init_iommu_perf_ctr(struct amd_iommu *iommu) amd_iommu_pc_present = true; /* save the value to restore, if writable */ - if (iommu_pc_get_set_reg(iommu, 0, 0, 0, _reg, false) || - iommu_pc_get_set_reg(iommu, 0, 0, 8, _src, false)) - goto pc_false; - - /* -* Disable power gating by programing the performance counter -* source to 20 (i.e. counts the reads and writes from/to IOMMU -* Reserved Register [MMIO Offset 1FF8h] that are ignored.), -* which never get incremented during this init phase. -* (Note: The event is also deprecated.) -*/ - val = 20; - if (iommu_pc_get_set_reg(iommu, 0, 0, 8, , true)) + if (iommu_pc_get_set_reg(iommu, 0, 0, 0, _reg, false)) goto pc_false; /* Check if the performance counters can be written to */ - val = 0xabcd; - for (retry = 5; retry; retry--) { - if (iommu_pc_get_set_reg(iommu, 0, 0, 0, , true) || - iommu_pc_get_set_reg(iommu, 0, 0, 0, , false) || - val2) - break; - - /* Wait about 20 msec for power gating to disable and retry. */ - msleep(20); - } - - /* restore */ - if (iommu_pc_get_set_reg(iommu, 0, 0, 0, _reg, true) || - iommu_pc_get_set_reg(iommu, 0, 0, 8, _src, true)) + if ((iommu_pc_get_set_reg(iommu, 0, 0, 0, , true)) || + (iommu_pc_get_set_reg(iommu, 0, 0, 0, , false)) || + (val != val2)) Probably don't need parentheses around 'val != val2' goto pc_false; - if (val != val2) + /* restore */ + if (iommu_pc_get_set_reg(iommu, 0, 0, 0, _reg, true)) goto pc_false; pci_info(pdev, "IOMMU performance counters supported\n"); thanks, -- Shuah ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 2/2] iommu/amd: Remove performance counter pre-initialization test
On 4/9/21 2:58 AM, Suravee Suthikulpanit wrote: In early AMD desktop/mobile platforms (during 2013), when the IOMMU Performance Counter (PMC) support was first introduced in commit 30861ddc9cca ("perf/x86/amd: Add IOMMU Performance Counter resource management"), there was a HW bug where the counters could not be accessed. The result was reading of the counter always return zero. At the time, the suggested workaround was to add a test logic prior to initializing the PMC feature to check if the counters can be programmed and read back the same value. This has been working fine until the more recent desktop/mobile platforms start enabling power gating for the PMC, which prevents access to the counters. This results in the PMC support being disabled unnecesarily. Unfortunatly, there is no documentation of since which generation of hardware the original PMC HW bug was fixed. Although, it was fixed soon after the first introduction of the PMC. Base on this, we assume that the buggy platforms are less likely to be in used, and it should be relatively safe to remove this legacy logic. Link: https://lore.kernel.org/linux-iommu/alpine.lnx.3.20.13.2006030935570.3...@monopod.intra.ispras.ru/ Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=201753 Cc: Tj (Elloe Linux) Cc: Shuah Khan Cc: Alexander Monakov Cc: David Coe Cc: Paul Menzel Signed-off-by: Suravee Suthikulpanit --- Tested-by: Shuah Khan thanks, -- Shuah Results with this patch on AMD Ryzen 5 PRO 2400GE w/ Radeon Vega Graphics sudo ./perf stat -e 'amd_iommu_0/cmd_processed/, amd_iommu_0/cmd_processed_inv/, amd_iommu_0/ign_rd_wr_mmio_1ff8h/, amd_iommu_0/int_dte_hit/, amd_iommu_0/int_dte_mis/, amd_iommu_0/mem_dte_hit/, amd_iommu_0/mem_dte_mis/, amd_iommu_0/mem_iommu_tlb_pde_hit/, amd_iommu_0/mem_iommu_tlb_pde_mis/, amd_iommu_0/mem_iommu_tlb_pte_hit/, amd_iommu_0/mem_iommu_tlb_pte_mis/, amd_iommu_0/mem_pass_excl/, amd_iommu_0/mem_pass_pretrans/, amd_iommu_0/mem_pass_untrans/, amd_iommu_0/mem_target_abort/, amd_iommu_0/mem_trans_total/, amd_iommu_0/page_tbl_read_gst/, amd_iommu_0/page_tbl_read_nst/, amd_iommu_0/page_tbl_read_tot/, amd_iommu_0/smi_blk/, amd_iommu_0/smi_recv/, amd_iommu_0/tlb_inv/, amd_iommu_0/vapic_int_guest/, amd_iommu_0/vapic_int_non_guest/' sleep 10 Performance counter stats for 'system wide': 156 amd_iommu_0/cmd_processed/ (33.30%) 80 amd_iommu_0/cmd_processed_inv/ (33.38%) 0 amd_iommu_0/ign_rd_wr_mmio_1ff8h/ (33.40%) 0 amd_iommu_0/int_dte_hit/ (33.43%) 325 amd_iommu_0/int_dte_mis/ (33.44%) 1,951 amd_iommu_0/mem_dte_hit/ (33.45%) 7,589 amd_iommu_0/mem_dte_mis/ (33.49%) 325 amd_iommu_0/mem_iommu_tlb_pde_hit/ (33.45%) 2,460 amd_iommu_0/mem_iommu_tlb_pde_mis/ (33.41%) 2,510 amd_iommu_0/mem_iommu_tlb_pte_hit/ (33.38%) 5,526 amd_iommu_0/mem_iommu_tlb_pte_mis/ (33.33%) 0 amd_iommu_0/mem_pass_excl/ (33.29%) 0 amd_iommu_0/mem_pass_pretrans/ (33.28%) 1,556 amd_iommu_0/mem_pass_untrans/ (33.27%) 0 amd_iommu_0/mem_target_abort/ (33.26%) 3,112 amd_iommu_0/mem_trans_total/ (33.29%) 0 amd_iommu_0/page_tbl_read_gst/ (33.29%) 1,813 amd_iommu_0/page_tbl_read_nst/ (33.25%) 2,242 amd_iommu_0/page_tbl_read_tot/ (33.27%) 0 amd_iommu_0/smi_blk/ (33.29%) 0 amd_iommu_0/smi_recv/ (33.28%) 0 amd_iommu_0/tlb_inv/ (33.28%) 0 amd_iommu_0/vapic_int_guest/ (33.25%) 0 amd_iommu_0/vapic_int_non_guest/ (33.26%) 10.003200316 seconds time elapsed ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 2/2] iommu/amd: Remove performance counter pre-initialization test
On 4/9/21 2:00 PM, Shuah Khan wrote: On 4/9/21 2:58 AM, Suravee Suthikulpanit wrote: In early AMD desktop/mobile platforms (during 2013), when the IOMMU Performance Counter (PMC) support was first introduced in commit 30861ddc9cca ("perf/x86/amd: Add IOMMU Performance Counter resource management"), there was a HW bug where the counters could not be accessed. The result was reading of the counter always return zero. At the time, the suggested workaround was to add a test logic prior to initializing the PMC feature to check if the counters can be programmed and read back the same value. This has been working fine until the more recent desktop/mobile platforms start enabling power gating for the PMC, which prevents access to the counters. This results in the PMC support being disabled unnecesarily. Unfortunatly, there is no documentation of since which generation of hardware the original PMC HW bug was fixed. Although, it was fixed soon after the first introduction of the PMC. Base on this, we assume that the buggy platforms are less likely to be in used, and it should be relatively safe to remove this legacy logic. Link: https://lore.kernel.org/linux-iommu/alpine.lnx.3.20.13.2006030935570.3...@monopod.intra.ispras.ru/ Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=201753 Cc: Tj (Elloe Linux) Cc: Shuah Khan Cc: Alexander Monakov Cc: David Coe Cc: Paul Menzel Signed-off-by: Suravee Suthikulpanit --- Tested-by: Shuah Khan Revert + this patch - same as my test on Ryzen 5 On AMD Ryzen 7 4700G with Radeon Graphics These look real odd to me. Let me know if I should look further. sudo ./perf stat -e 'amd_iommu_0/cmd_processed/, amd_iommu_0/cmd_processed_inv/, amd_iommu_0/ign_rd_wr_mmio_1ff8h/, amd_iommu_0/int_dte_hit/, amd_iommu_0/int_dte_mis/, amd_iommu_0/mem_dte_hit/, amd_iommu_0/mem_dte_mis/, amd_iommu_0/mem_iommu_tlb_pde_hit/, amd_iommu_0/mem_iommu_tlb_pde_mis/, amd_iommu_0/mem_iommu_tlb_pte_hit/, amd_iommu_0/mem_iommu_tlb_pte_mis/, amd_iommu_0/mem_pass_excl/, amd_iommu_0/mem_pass_pretrans/, amd_iommu_0/mem_pass_untrans/, amd_iommu_0/mem_target_abort/, amd_iommu_0/mem_trans_total/, amd_iommu_0/page_tbl_read_gst/, amd_iommu_0/page_tbl_read_nst/, amd_iommu_0/page_tbl_read_tot/, amd_iommu_0/smi_blk/, amd_iommu_0/smi_recv/, amd_iommu_0/tlb_inv/, amd_iommu_0/vapic_int_guest/, amd_iommu_0/vapic_int_non_guest/' sleep 10 Performance counter stats for 'system wide': 17,761,952,514,865,374 amd_iommu_0/cmd_processed/ (33.28%) 18,582,155,570,607,472 amd_iommu_0/cmd_processed_inv/ (33.32%) 0 amd_iommu_0/ign_rd_wr_mmio_1ff8h/ (33.36%) 5,056,087,645,262,255 amd_iommu_0/int_dte_hit/ (33.40%) 32,831,106,446,308,888 amd_iommu_0/int_dte_mis/ (33.44%) 13,461,819,655,591,296 amd_iommu_0/mem_dte_hit/ (33.45%) 208,555,436,221,050,464 amd_iommu_0/mem_dte_mis/ (33.47%) 196,824,154,635,609,888 amd_iommu_0/mem_iommu_tlb_pde_hit/ (33.46%) 193,552,630,440,410,144 amd_iommu_0/mem_iommu_tlb_pde_mis/ (33.45%) 176,936,647,809,098,368 amd_iommu_0/mem_iommu_tlb_pte_hit/ (33.41%) 184,737,401,623,626,464 amd_iommu_0/mem_iommu_tlb_pte_mis/ (33.37%) 0 amd_iommu_0/mem_pass_excl/ (33.33%) 0 amd_iommu_0/mem_pass_pretrans/ (33.30%) 0 amd_iommu_0/mem_pass_untrans/ (33.28%) 0 amd_iommu_0/mem_target_abort/ (33.27%) 245,383,212,924,004,288 amd_iommu_0/mem_trans_total/ (33.27%) 0 amd_iommu_0/page_tbl_read_gst/ (33.28%) 262,267,045,917,967,264 amd_iommu_0/page_tbl_read_nst/ (33.27%) 256,308,216,913,137,600 amd_iommu_0/page_tbl_read_tot/ (33.28%) 0 amd_iommu_0/smi_blk/ (33.27%) 0 amd_iommu_0/smi_recv/ (33.27%) 0 amd_iommu_0/tlb_inv/ (33.27%) 0 amd_iommu_0/vapic_int_guest/ (33.26%) 38,913,544,420,579,888 amd_iommu_0/vapic_int_non_guest/ (33.27%) 10.003967760 seconds time elapsed thanks, -- Shuah ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH] dma-mapping: benchmark: Extract a common header file for map_benchmark definition
On 2/10/22 9:22 PM, Song Bao Hua (Barry Song) wrote: -Original Message- From: tiantao (H) Sent: Friday, February 11, 2022 4:15 PM To: Song Bao Hua (Barry Song) ; sh...@kernel.org; chenxiang (M) Cc: iommu@lists.linux-foundation.org; linux-kselft...@vger.kernel.org; linux...@openeuler.org Subject: [PATCH] dma-mapping: benchmark: Extract a common header file for map_benchmark definition kernel/dma/map_benchmark.c and selftests/dma/dma_map_benchmark.c have duplicate map_benchmark definitions, which tends to lead to inconsistent changes to map_benchmark on both sides, extract a common header file to avoid this problem. Signed-off-by: Tian Tao +To: Christoph Looks like a right cleanup. This will help decrease the maintain overhead in the future. Other similar selftests tools are also doing this. Acked-by: Barry Song +1 on this cleanup making this code maintainable. We are moving in the direction of cleaning up defines in selftests for the same reason. Let's just make sure this works on older kernels. We do support mainline kselftest on stable releases. With that: Reviewed-by: Shuah Khan thanks, -- Shuah ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu