Re: [PATCH] intel-iommu: Synchronize gcmd value with global command register
On Mon, Apr 01, 2013 at 02:45:18PM +0900, Takao Indoh wrote: Current flow on kdump boot enable_IR intel_enable_irq_remapping iommu_disable_irq_remapping == IRES/QIES/TES disabled here dmar_disable_qi == do nothing dmar_enable_qi == QIES enabled intel_setup_irq_remapping== IRES enabled But what we want to do here in the kdumo case is to disable translation too, right? Because the former kernel might have translation and irq-remapping enabled and the kdump kernel might be compiled without support for dma-remapping. So if we don't disable translation here too the kdump kernel is unable to do DMA. I agree that disabling translation should be a bit more explicit instead of the current code. Joerg ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 2/3] iommu/amd: Add IOMMU event log injection interface for testing event flag decoding logic
On Wed, Mar 27, 2013 at 06:51:34PM -0500, suravee.suthikulpa...@amd.com wrote: From: Suravee Suthikulpanit suravee.suthikulpa...@amd.com Add IOMMU event log injection interface for testing event flag decoding logic. This interface allows users to specify device id, event flag, and event types via debugfs. echo 0x300 /sys/kernel/debug/amd-iommu-evninj/devid // (e.g. Bus:Dev.fun 3:0.0) echo 0xfff /sys/kernel/debug/amd-iommu-evninj/flags // (e.g. Specify flag value) echo 2 /sys/kernel/debug/amd-iommu-evninj/type // (e.g. for IO_PAGE_FAULT event) Once the event is injected, IOMMU driver will parse and print out the event information in kernel log (dmesg) with the various information depending on the types of event and flag specified. Why is this needed? I think the risk of regressions in the event-handling code is to low to justify the need for an event-injection mechanism. Joerg ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 1/3] iommu/amd: Add logic to decode AMD IOMMU event flag
On 4/2/2013 9:33 AM, Joerg Roedel wrote: Hi Suravee, On Wed, Mar 27, 2013 at 06:51:23PM -0500, suravee.suthikulpa...@amd.com wrote: From: Suravee Suthikulpanit suravee.suthikulpa...@amd.com Add logic to decode AMD IOMMU event flag based on information from AMD IOMMU specification. This should simplify debugging IOMMU errors. Also, dump DTE information in additional cases. The patch in general makes sense to have, but I have a couple of comments. +static void dump_flags(int flags, int ev_type) +{ + struct _event_log_flags *p = (struct _event_log_flags *) flags; + u32 err_type = p-type; + + pr_err(AMD-Vi: Flags details:\n); + pr_err(AMD-Vi:Guest / Nested : %u\n, p-gn); + pr_err(AMD-Vi:No Execute : %u\n, p-nx); + pr_err(AMD-Vi:User-Supervisor : %u\n, p-us); + pr_err(AMD-Vi:Interrupt : %u\n, p-i); + pr_err(AMD-Vi:Present : %u\n, p-pr); + pr_err(AMD-Vi:Read / Write: %u\n, p-rw); + pr_err(AMD-Vi:Permission : %u\n, p-pe); + pr_err(AMD-Vi:Reserv bit not zero / Illegal level encoding : %u\n, + p-rz); + pr_err(AMD-Vi:Translation / Transaction : %u\n, + p-tr); + pr_err(AMD-Vi:Type of error : (0x%x) , err_type); Printing the flags multi-line is much too noisy for IOMMU events. Just print a char-shortcut for each flag set on the same line. I will make the changes and send in for new patch for review. + + if ((ev_type == EVENT_TYPE_DEV_TAB_ERR) || + (ev_type == EVENT_TYPE_PAGE_TAB_ERR) || + (ev_type == EVENT_TYPE_CMD_HARD_ERR)) { + /* Only supports up to 2 bits */ + err_type = 0x3; + switch (err_type) { + case 0: + pr_err(Reserved\n); + break; + case 1: + pr_err(Master Abort\n); + break; + case 2: + pr_err(Target Abort\n); + break; + case 3: + pr_err(Data Error\n); + break; + } Why do you create string-arrays for other type-encodings but not for this one? I can do the same way if that's preferred. Thanks, Suravee + } else if (ev_type == EVENT_TYPE_INV_DEV_REQ) { + if (p-tr == 0) { + if (err_type ARRAY_SIZE(_invalid_translation_desc)) + printk(%s\n, + _invalid_translation_desc[err_type]); + } else { + if (err_type ARRAY_SIZE(_invalid_transaction_desc)) + printk(%s\n, + _invalid_transaction_desc[err_type]); pr_cont instead of printk. Joerg ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 1/3] iommu/amd: Add logic to decode AMD IOMMU event flag
On Tue, Apr 02, 2013 at 04:33:36PM +0200, Joerg Roedel wrote: Hi Suravee, On Wed, Mar 27, 2013 at 06:51:23PM -0500, suravee.suthikulpa...@amd.com wrote: From: Suravee Suthikulpanit suravee.suthikulpa...@amd.com Add logic to decode AMD IOMMU event flag based on information from AMD IOMMU specification. This should simplify debugging IOMMU errors. Also, dump DTE information in additional cases. The patch in general makes sense to have, but I have a couple of comments. While you guys are at it, can someone fix this too pls (ASUS board with a PD on it). [0.220342] [Firmware Bug]: AMD-Vi: IOAPIC[9] not in IVRS table [0.220398] [Firmware Bug]: AMD-Vi: IOAPIC[10] not in IVRS table [0.220451] [Firmware Bug]: AMD-Vi: No southbridge IOAPIC found in IVRS table [0.220506] AMD-Vi: Disabling interrupt remapping due to BIOS Bug(s) Thanks. -- Regards/Gruss, Boris. Sent from a fat crate under my desk. Formatting is fine. -- ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v4] Quirk for buggy dma source tags with Intel IOMMU.
On 2 Apr 2013 15:37, Pat Erley pat-l...@erley.org wrote: On 03/07/2013 09:35 PM, Andrew Cooks wrote: --- a/drivers/pci/quirks.c +++ b/drivers/pci/quirks.c +/* Table of multiple (ghost) source functions. This is similar to the + * translated sources above, but with the following differences: + * 1. the device may use multiple functions as DMA sources, + * 2. these functions cannot be assumed to be actual devices, they're simply + * incorrect DMA tags. + * 3. the specific ghost function for a request can not always be predicted. + * For example, the actual device could be xx:yy.1 and it could use + * both 0 and 1 for different requests, with no obvious way to tell when + * DMA will be tagged as comming from xx.yy.0 and and when it will be tagged + * as comming from xx.yy.1. + * The bitmap contains all of the functions used in DMA tags, including the + * actual device. + * See https://bugzilla.redhat.com/show_bug.cgi?id=757166, + * https://bugzilla.kernel.org/show_bug.cgi?id=42679 + * https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1089768 + */ +static const struct pci_dev_dma_multi_func_sources { + u16 vendor; + u16 device; + u8 func_map;/* bit map. lsb is fn 0. */ +} pci_dev_dma_multi_func_sources[] = { + { PCI_VENDOR_ID_MARVELL_2, 0x9123, (10)|(11)}, + { PCI_VENDOR_ID_MARVELL_2, 0x9125, (10)|(11)}, + { PCI_VENDOR_ID_MARVELL_2, 0x9128, (10)|(11)}, + { PCI_VENDOR_ID_MARVELL_2, 0x9130, (10)|(11)}, + { PCI_VENDOR_ID_MARVELL_2, 0x9143, (10)|(11)}, + { PCI_VENDOR_ID_MARVELL_2, 0x9172, (10)|(11)}, + { 0 } +}; Adding another buggy device. I have a Ricoh multifunction device: 17:00.0 SD Host controller: Ricoh Co Ltd MMC/SD Host Controller (rev 01) 17:00.3 FireWire (IEEE 1394): Ricoh Co Ltd R5C832 PCIe IEEE 1394 Controller (rev 01) 17:00.0 0805: 1180:e822 (rev 01) 17:00.3 0c00: 1180:e832 (rev 01) The Ricoh device issue has been known for some time and a quirk has been available since commit 12ea6cad1c7d046 in June 2012. It's slightly different than the problem this patch tries to work around [1]. that adding entries for also fixed booting. I don't have any SD cards or firewire devices handy to test that they work, but the system now boots, which was not the case without your patch and IOMMU/DMAR enabled. That is really strange. Could you tell us what kernel version you tested and provide dmesg output? Here's a previous patch used for similar hardware that may also be fixed by this: http://lists.fedoraproject.org/pipermail/scm-commits/2010-October/510785.html and another thread/bug report this may solve: https://bugzilla.redhat.com/show_bug.cgi?id=605888 I believe this is referenced in drivers/pci/quirks.c for versions newer than 3.5. Feel free to include me in any future iterations of this patch you'd like tested. Tested-By: Pat Erley pat-l...@erley.org Thanks for testing! [1] In the Ricoh case, multiple functions are used for real devices and the bug is that these devices all use function 0 during DMA. In this particular case, I'd expect the FireWire device 17:00.3 to issue DMA from the SD Host Controller address 17:00.0. The quirk is not too much of a terrible hack - it's a fairly simple translation. In the Marvell case, the real device uses DMA source tags that don't actually belong to any visible devices. The quirk to make this work is more invasive, not nearly as elegant and has not attracted much enthusiasm from subsystem maintainers, though I'm still hopeful that a quirk will be merged in some form or another. -- a. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 1/3] iommu/amd: Add logic to decode AMD IOMMU event flag
On Tue, Apr 02, 2013 at 04:40:37PM +0200, Borislav Petkov wrote: While you guys are at it, can someone fix this too pls (ASUS board with a PD on it). [0.220342] [Firmware Bug]: AMD-Vi: IOAPIC[9] not in IVRS table [0.220398] [Firmware Bug]: AMD-Vi: IOAPIC[10] not in IVRS table [0.220451] [Firmware Bug]: AMD-Vi: No southbridge IOAPIC found in IVRS table [0.220506] AMD-Vi: Disabling interrupt remapping due to BIOS Bug(s) That is actually a BIOS problem. I wonder whether it would help to turn this into a WARN_ON to get the board vendors to release working BIOSes. Opinions? Joerg ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 2/5 v11] powerpc: Add iommu domain pointer to device archdata
On Fri, Mar 29, 2013 at 01:23:59AM +0530, Varun Sethi wrote: Add an iommu domain pointer to device (powerpc) archdata. Devices are attached to iommu domains and this pointer provides a mechanism to correlate between a device and the associated iommu domain. This field is set when a device is attached to a domain. Signed-off-by: Varun Sethi varun.se...@freescale.com This patch needs to be Acked by the PPC maintainers. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
RE: [PATCH 5/5 v11] iommu/fsl: Freescale PAMU driver and iommu implementation.
-Original Message- From: Sethi Varun-B16395 Sent: Thursday, March 28, 2013 2:54 PM To: j...@8bytes.org; Yoder Stuart-B08248; Wood Scott-B07421; iommu@lists.linux-foundation.org; linuxppc- d...@lists.ozlabs.org; linux-ker...@vger.kernel.org; ga...@kernel.crashing.org; b...@kernel.crashing.org Cc: Sethi Varun-B16395 Subject: [PATCH 5/5 v11] iommu/fsl: Freescale PAMU driver and iommu implementation. Following is a brief description of the PAMU hardware: PAMU determines what action to take and whether to authorize the action on the basis of the memory address, a Logical IO Device Number (LIODN), and PAACT table (logically) indexed by LIODN and address. Hardware devices which need to access memory must provide an LIODN in addition to the memory address. Peripheral Access Authorization and Control Tables (PAACTs) are the primary data structures used by PAMU. A PAACT is a table of peripheral access authorization and control entries (PAACE).Each PAACE defines the range of I/O bus address space that is accessible by the LIOD and the associated access capabilities. There are two types of PAACTs: primary PAACT (PPAACT) and secondary PAACT (SPAACT).A given physical I/O device may be able to act as one or more independent logical I/O devices (LIODs). Each such logical I/O device is assigned an identifier called logical I/O device number (LIODN). A LIODN is allocated a contiguous portion of the I/O bus address space called the DSA window for performing DSA operations. The DSA window may optionally be divided into multiple sub-windows, each of which may be used to map to a region in system storage space. The first sub-window is referred to as the primary sub-window and the remaining are called secondary sub-windows. This patch provides the PAMU driver (fsl_pamu.c) and the corresponding IOMMU API implementation (fsl_pamu_domain.c). The PAMU hardware driver (fsl_pamu.c) has been derived from the work done by Ashish Kalra and Timur Tabi. Signed-off-by: Timur Tabi ti...@tabi.org Signed-off-by: Varun Sethi varun.se...@freescale.com --- changes in v11: - changed iova to dma_addr_t in iova_to_phys API. changes in v10: - Support for new guts compatibe string for T4 B4 devices. - Modified comment about port ID and mentioned the errata number. - Fixed the issue where data pointer was not freed in case of a an error. - Pass data pointer while freeing irq. - Whle initializing the SPAACE entry clear the valid bit. changes in v9: - Merged and createad a single function to delete a device from domain list. - Refactored the add_device API code. - Renamed the paace and spaace init fucntions. - Renamed functions for mapping windows and subwindows. - Changed the MAX LIODN value to MAX value u-boot can program. - Hard coded maximum number of subwindows. changes in v8: - implemented the new API for window based IOMMUs. changes in v7: - Set max_subwidows in the geometry attribute. - Add checking for maximum supported LIODN value. - Use upper_32_bits and lower_32_bits macros while intializing PAMU data structures. changes in v6: - Simplified complex conditional statements. - Fixed indentation issues. - Added comments for IOMMU API implementation. changes in v5: - Addressed comments from Timur. changes in v4: - Addressed comments from Timur and Scott. changes in v3: - Addressed comments by Kumar Gala - dynamic fspi allocation - fixed alignment check in map and unmap arch/powerpc/sysdev/fsl_pci.h |5 + drivers/iommu/Kconfig |8 + drivers/iommu/Makefile |1 + drivers/iommu/fsl_pamu.c| 1269 +++ drivers/iommu/fsl_pamu.h| 405 + drivers/iommu/fsl_pamu_domain.c | 1137 +++ drivers/iommu/fsl_pamu_domain.h | 85 +++ 7 files changed, 2910 insertions(+), 0 deletions(-) create mode 100644 drivers/iommu/fsl_pamu.c create mode 100644 drivers/iommu/fsl_pamu.h create mode 100644 drivers/iommu/fsl_pamu_domain.c create mode 100644 drivers/iommu/fsl_pamu_domain.h Ack. Stuart ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 1/3] iommu/amd: Add logic to decode AMD IOMMU event flag
On Tue, Apr 02, 2013 at 05:03:04PM +0200, Joerg Roedel wrote: On Tue, Apr 02, 2013 at 04:40:37PM +0200, Borislav Petkov wrote: While you guys are at it, can someone fix this too pls (ASUS board with a PD on it). [0.220342] [Firmware Bug]: AMD-Vi: IOAPIC[9] not in IVRS table [0.220398] [Firmware Bug]: AMD-Vi: IOAPIC[10] not in IVRS table [0.220451] [Firmware Bug]: AMD-Vi: No southbridge IOAPIC found in IVRS table [0.220506] AMD-Vi: Disabling interrupt remapping due to BIOS Bug(s) That is actually a BIOS problem. I wonder whether it would help to turn this into a WARN_ON to get the board vendors to release working BIOSes. Opinions? Good luck trying to get ASUS to fix anything in their BIOS :(. Can't we detect the SB IOAPIC some other way in this case? -- Regards/Gruss, Boris. Sent from a fat crate under my desk. Formatting is fine. -- ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 1/3] iommu/amd: Add logic to decode AMD IOMMU event flag
On 4/2/2013 10:29 AM, Borislav Petkov wrote: On Tue, Apr 02, 2013 at 05:03:04PM +0200, Joerg Roedel wrote: On Tue, Apr 02, 2013 at 04:40:37PM +0200, Borislav Petkov wrote: While you guys are at it, can someone fix this too pls (ASUS board with a PD on it). [0.220342] [Firmware Bug]: AMD-Vi: IOAPIC[9] not in IVRS table [0.220398] [Firmware Bug]: AMD-Vi: IOAPIC[10] not in IVRS table [0.220451] [Firmware Bug]: AMD-Vi: No southbridge IOAPIC found in IVRS table [0.220506] AMD-Vi: Disabling interrupt remapping due to BIOS Bug(s) That is actually a BIOS problem. I wonder whether it would help to turn this into a WARN_ON to get the board vendors to release working BIOSes. Opinions? Good luck trying to get ASUS to fix anything in their BIOS :(. I have tried to contact Asus in the past to have them fix the issue, but I got no luck. Once it is out in the field, it's very difficult to get them to make changes. I am also addressing this issue with the BIOS team for the future hardware. Turning this into WARN_ON() at this point might break a lot of systems currently out in the field. However, users can always switching to use intremap=off but this might not be obvious. Suravee Can't we detect the SB IOAPIC some other way in this case? ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v4] Quirk for buggy dma source tags with Intel IOMMU.
On 03/07/2013 09:35 PM, Andrew Cooks wrote: --- a/drivers/pci/quirks.c +++ b/drivers/pci/quirks.c +/* Table of multiple (ghost) source functions. This is similar to the + * translated sources above, but with the following differences: + * 1. the device may use multiple functions as DMA sources, + * 2. these functions cannot be assumed to be actual devices, they're simply + * incorrect DMA tags. + * 3. the specific ghost function for a request can not always be predicted. + * For example, the actual device could be xx:yy.1 and it could use + * both 0 and 1 for different requests, with no obvious way to tell when + * DMA will be tagged as comming from xx.yy.0 and and when it will be tagged + * as comming from xx.yy.1. + * The bitmap contains all of the functions used in DMA tags, including the + * actual device. + * See https://bugzilla.redhat.com/show_bug.cgi?id=757166, + * https://bugzilla.kernel.org/show_bug.cgi?id=42679 + * https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1089768 + */ +static const struct pci_dev_dma_multi_func_sources { + u16 vendor; + u16 device; + u8 func_map;/* bit map. lsb is fn 0. */ +} pci_dev_dma_multi_func_sources[] = { + { PCI_VENDOR_ID_MARVELL_2, 0x9123, (10)|(11)}, + { PCI_VENDOR_ID_MARVELL_2, 0x9125, (10)|(11)}, + { PCI_VENDOR_ID_MARVELL_2, 0x9128, (10)|(11)}, + { PCI_VENDOR_ID_MARVELL_2, 0x9130, (10)|(11)}, + { PCI_VENDOR_ID_MARVELL_2, 0x9143, (10)|(11)}, + { PCI_VENDOR_ID_MARVELL_2, 0x9172, (10)|(11)}, + { 0 } +}; Adding another buggy device. I have a Ricoh multifunction device: 17:00.0 SD Host controller: Ricoh Co Ltd MMC/SD Host Controller (rev 01) 17:00.3 FireWire (IEEE 1394): Ricoh Co Ltd R5C832 PCIe IEEE 1394 Controller (rev 01) 17:00.0 0805: 1180:e822 (rev 01) 17:00.3 0c00: 1180:e832 (rev 01) that adding entries for also fixed booting. I don't have any SD cards or firewire devices handy to test that they work, but the system now boots, which was not the case without your patch and IOMMU/DMAR enabled. Here's a previous patch used for similar hardware that may also be fixed by this: http://lists.fedoraproject.org/pipermail/scm-commits/2010-October/510785.html and another thread/bug report this may solve: https://bugzilla.redhat.com/show_bug.cgi?id=605888 Feel free to include me in any future iterations of this patch you'd like tested. Tested-By: Pat Erley pat-l...@erley.org ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v4] Quirk for buggy dma source tags with Intel IOMMU.
On 04/02/2013 10:50 AM, Andrew Cooks wrote: On 2 Apr 2013 15:37, Pat Erley pat-l...@erley.org mailto:pat-l...@erley.org wrote: On 03/07/2013 09:35 PM, Andrew Cooks wrote: --- a/drivers/pci/quirks.c +++ b/drivers/pci/quirks.c +/* Table of multiple (ghost) source functions. This is similar to the + * translated sources above, but with the following differences: + * 1. the device may use multiple functions as DMA sources, + * 2. these functions cannot be assumed to be actual devices, they're simply + * incorrect DMA tags. + * 3. the specific ghost function for a request can not always be predicted. + * For example, the actual device could be xx:yy.1 and it could use + * both 0 and 1 for different requests, with no obvious way to tell when + * DMA will be tagged as comming from xx.yy.0 and and when it will be tagged + * as comming from xx.yy.1. + * The bitmap contains all of the functions used in DMA tags, including the + * actual device. + * See https://bugzilla.redhat.com/show_bug.cgi?id=757166, + * https://bugzilla.kernel.org/show_bug.cgi?id=42679 + * https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1089768 + */ +static const struct pci_dev_dma_multi_func_sources { + u16 vendor; + u16 device; + u8 func_map;/* bit map. lsb is fn 0. */ +} pci_dev_dma_multi_func_sources[] = { + { PCI_VENDOR_ID_MARVELL_2, 0x9123, (10)|(11)}, + { PCI_VENDOR_ID_MARVELL_2, 0x9125, (10)|(11)}, + { PCI_VENDOR_ID_MARVELL_2, 0x9128, (10)|(11)}, + { PCI_VENDOR_ID_MARVELL_2, 0x9130, (10)|(11)}, + { PCI_VENDOR_ID_MARVELL_2, 0x9143, (10)|(11)}, + { PCI_VENDOR_ID_MARVELL_2, 0x9172, (10)|(11)}, + { 0 } +}; Adding another buggy device. I have a Ricoh multifunction device: 17:00.0 SD Host controller: Ricoh Co Ltd MMC/SD Host Controller (rev 01) 17:00.3 FireWire (IEEE 1394): Ricoh Co Ltd R5C832 PCIe IEEE 1394 Controller (rev 01) 17:00.0 0805: 1180:e822 (rev 01) 17:00.3 0c00: 1180:e832 (rev 01) The Ricoh device issue has been known for some time and a quirk has been available since commit 12ea6cad1c7d046 in June 2012. It's slightly different than the problem this patch tries to work around [1]. Hmm, I've had this problem with many recent (vanilla) kernels, up to and including 3.9-rc5 that adding entries for also fixed booting. I don't have any SD cards or firewire devices handy to test that they work, but the system now boots, which was not the case without your patch and IOMMU/DMAR enabled. That is really strange. Could you tell us what kernel version you tested and provide dmesg output? I'll capture a vanilla 3.8.5 boot without any patches and iommu=off, then try to find another machine to catch what I can of a netconsole boot with iommu=on. What's the preferred way to send these? pastebin links? I'd been running the 'dirty' fix that's in the redhat bugzilla entry. I checked my .config and have CONFIG_PCI_QUIRKS=y, and verified my devices are in the quirks table for the pci_func_0_dma_source fixup. Here's a previous patch used for similar hardware that may also be fixed by this: http://lists.fedoraproject.org/pipermail/scm-commits/2010-October/510785.html and another thread/bug report this may solve: https://bugzilla.redhat.com/show_bug.cgi?id=605888 I believe this is referenced in drivers/pci/quirks.c for versions newer than 3.5. Feel free to include me in any future iterations of this patch you'd like tested. Tested-By: Pat Erley pat-l...@erley.org mailto:pat-l...@erley.org Thanks for testing! [1] In the Ricoh case, multiple functions are used for real devices and the bug is that these devices all use function 0 during DMA. In this particular case, I'd expect the FireWire device 17:00.3 to issue DMA from the SD Host Controller address 17:00.0. The quirk is not too much of a terrible hack - it's a fairly simple translation. In the Marvell case, the real device uses DMA source tags that don't actually belong to any visible devices. The quirk to make this work is more invasive, not nearly as elegant and has not attracted much enthusiasm from subsystem maintainers, though I'm still hopeful that a quirk will be merged in some form or another. Thanks for explaining the difference! Pat ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 1/3] iommu/amd: Add logic to decode AMD IOMMU event flag
On Tue, Apr 02, 2013 at 05:29:56PM +0200, Borislav Petkov wrote: On Tue, Apr 02, 2013 at 05:03:04PM +0200, Joerg Roedel wrote: Good luck trying to get ASUS to fix anything in their BIOS :(. Hmm... Can't we detect the SB IOAPIC some other way in this case? I can certainly write a patch that works around your particular BIOS bug. The problem is that such a fix will most certainly break other systems. Unfortunatly there is no reliable way to fixup the IO-APIC-ID-DEVID mapping at runtime when the BIOS messed it up. The only thing I can do is to check for potential problems and disable the intremap feature then, so that the system will at least boot. Joerg ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 1/3] iommu/amd: Add logic to decode AMD IOMMU event flag
On Tue, Apr 02, 2013 at 10:41:25AM -0500, Suthikulpanit, Suravee wrote: Turning this into WARN_ON() at this point might break a lot of systems currently out in the field. However, users can always switching to use intremap=off but this might not be obvious. A WARN_ON doesn't break systems, it just creates more noise. Probably enough noise to get board vendors to fix their stuff up. But there is more to consider before making more noise, of course :) Joerg ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 5/5 v11] iommu/fsl: Freescale PAMU driver and iommu implementation.
Cc'ing Alex Williamson Alex, can you please review the iommu-group part of this patch? My comments so far are below: On Fri, Mar 29, 2013 at 01:24:02AM +0530, Varun Sethi wrote: +config FSL_PAMU + bool Freescale IOMMU support + depends on PPC_E500MC + select IOMMU_API + select GENERIC_ALLOCATOR + help + Freescale PAMU support. A bit lame for a help text. Can you elaborate more what PAMU is and when it should be enabled? +int pamu_enable_liodn(int liodn) +{ + struct paace *ppaace; + + ppaace = pamu_get_ppaace(liodn); + if (!ppaace) { + pr_err(Invalid primary paace entry\n); + return -ENOENT; + } + + if (!get_bf(ppaace-addr_bitfields, PPAACE_AF_WSE)) { + pr_err(liodn %d not configured\n, liodn); + return -EINVAL; + } + + /* Ensure that all other stores to the ppaace complete first */ + mb(); + + ppaace-addr_bitfields |= PAACE_V_VALID; + mb(); Why is it sufficient to set the bit in a variable when enabling liodn but when disabling it set_bf needs to be called? This looks a bit assymetric. +/* Derive the window size encoding for a particular PAACE entry */ +static unsigned int map_addrspace_size_to_wse(phys_addr_t addrspace_size) +{ + /* Bug if not a power of 2 */ + BUG_ON((addrspace_size (addrspace_size - 1))); Please use is_power_of_2 here. + + /* window size is 2^(WSE+1) bytes */ + return __ffs(addrspace_size PAMU_PAGE_SHIFT) + PAMU_PAGE_SHIFT - 1; The PAMU_PAGE_SHIFT shifting and adding looks redundant. + if ((win_size (win_size - 1)) || win_size PAMU_PAGE_SIZE) { + pr_err(window size too small or not a power of two %llx\n, win_size); + return -EINVAL; + } + + if (win_addr (win_size - 1)) { + pr_err(window address is not aligned with window size\n); + return -EINVAL; + } Again, use is_power_of_2 instead of hand-coding. + if (~stashid != 0) + set_bf(paace-impl_attr, PAACE_IA_CID, stashid); + + smp_wmb(); + + if (enable) + paace-addr_bitfields |= PAACE_V_VALID; Havn't you written a helper funtion to set this bit? +irqreturn_t pamu_av_isr(int irq, void *arg) +{ + struct pamu_isr_data *data = arg; + phys_addr_t phys; + unsigned int i, j; + + pr_emerg(fsl-pamu: access violation interrupt\n); + + for (i = 0; i data-count; i++) { + void __iomem *p = data-pamu_reg_base + i * PAMU_OFFSET; + u32 pics = in_be32(p + PAMU_PICS); + + if (pics PAMU_ACCESS_VIOLATION_STAT) { + pr_emerg(POES1=%08x\n, in_be32(p + PAMU_POES1)); + pr_emerg(POES2=%08x\n, in_be32(p + PAMU_POES2)); + pr_emerg(AVS1=%08x\n, in_be32(p + PAMU_AVS1)); + pr_emerg(AVS2=%08x\n, in_be32(p + PAMU_AVS2)); + pr_emerg(AVA=%016llx\n, make64(in_be32(p + PAMU_AVAH), + in_be32(p + PAMU_AVAL))); + pr_emerg(UDAD=%08x\n, in_be32(p + PAMU_UDAD)); + pr_emerg(POEA=%016llx\n, make64(in_be32(p + PAMU_POEAH), + in_be32(p + PAMU_POEAL))); + + phys = make64(in_be32(p + PAMU_POEAH), + in_be32(p + PAMU_POEAL)); + + /* Assume that POEA points to a PAACE */ + if (phys) { + u32 *paace = phys_to_virt(phys); + + /* Only the first four words are relevant */ + for (j = 0; j 4; j++) + pr_emerg(PAACE[%u]=%08x\n, j, in_be32(paace + j)); + } + } + } + + panic(\n); A kernel panic seems like an over-reaction to an access violation. Besides the device that caused the violation the system should still work, no? +#define make64(high, low) (((u64)(high) 32) | (low)) You redefined this make64 here. +static int map_subwins(int liodn, struct fsl_dma_domain *dma_domain) +{ + struct dma_window *sub_win_ptr = + dma_domain-win_arr[0]; + int i, ret; + unsigned long rpn; + + for (i = 0; i dma_domain-win_cnt; i++) { + if (sub_win_ptr[i].valid) { + rpn = sub_win_ptr[i].paddr + PAMU_PAGE_SHIFT; + spin_lock(iommu_lock); IOMMU code might run in interrupt context, so please use spin_lock_irqsave for the iommu_lock. +static void detach_device(struct device *dev, struct fsl_dma_domain *dma_domain) +{ + struct device_domain_info *info; + struct list_head *entry, *tmp; + unsigned long flags; + + spin_lock_irqsave(dma_domain-domain_lock, flags); + /* Remove the device from the domain device list */ +
Re: [PATCH 0/5 v11] iommu/fsl: Freescale PAMU driver and IOMMU API implementation.
On Fri, Mar 29, 2013 at 01:23:57AM +0530, Varun Sethi wrote: This patchset provides the Freescale PAMU (Peripheral Access Management Unit) driver and the corresponding IOMMU API implementation. PAMU is the IOMMU present on Freescale QorIQ platforms. PAMU can authorize memory access, remap the memory address, and remap the I/O transaction type. This set consists of the following patches: 1. Make iova dma_addr_t in the iommu_iova_to_phys API. 2. Addition of new field in the device (powerpc) archdata structure for storing iommu domain information pointer. 3. Add window permission flags in the iommu_domain_window_enable API. 4. Add domain attributes for FSL PAMU driver. 5. PAMU driver and IOMMU API implementation. Okay, I am about to apply patches 1 and 3 to a new ppc/pamu branch in my tree. As a general question, how did you test the IOMMU driver and what will you do in the future to avoid regressions? Joerg ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v4] Quirk for buggy dma source tags with Intel IOMMU.
On 04/02/2013 11:47 AM, Pat Erley wrote: On 04/02/2013 10:50 AM, Andrew Cooks wrote: On 2 Apr 2013 15:37, Pat Erley pat-l...@erley.org mailto:pat-l...@erley.org wrote: On 03/07/2013 09:35 PM, Andrew Cooks wrote: --- a/drivers/pci/quirks.c +++ b/drivers/pci/quirks.c +/* Table of multiple (ghost) source functions. This is similar to the + * translated sources above, but with the following differences: + * 1. the device may use multiple functions as DMA sources, + * 2. these functions cannot be assumed to be actual devices, they're simply + * incorrect DMA tags. + * 3. the specific ghost function for a request can not always be predicted. + * For example, the actual device could be xx:yy.1 and it could use + * both 0 and 1 for different requests, with no obvious way to tell when + * DMA will be tagged as comming from xx.yy.0 and and when it will be tagged + * as comming from xx.yy.1. + * The bitmap contains all of the functions used in DMA tags, including the + * actual device. + * See https://bugzilla.redhat.com/show_bug.cgi?id=757166, + * https://bugzilla.kernel.org/show_bug.cgi?id=42679 + * https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1089768 + */ +static const struct pci_dev_dma_multi_func_sources { + u16 vendor; + u16 device; + u8 func_map;/* bit map. lsb is fn 0. */ +} pci_dev_dma_multi_func_sources[] = { + { PCI_VENDOR_ID_MARVELL_2, 0x9123, (10)|(11)}, + { PCI_VENDOR_ID_MARVELL_2, 0x9125, (10)|(11)}, + { PCI_VENDOR_ID_MARVELL_2, 0x9128, (10)|(11)}, + { PCI_VENDOR_ID_MARVELL_2, 0x9130, (10)|(11)}, + { PCI_VENDOR_ID_MARVELL_2, 0x9143, (10)|(11)}, + { PCI_VENDOR_ID_MARVELL_2, 0x9172, (10)|(11)}, + { 0 } +}; Adding another buggy device. I have a Ricoh multifunction device: 17:00.0 SD Host controller: Ricoh Co Ltd MMC/SD Host Controller (rev 01) 17:00.3 FireWire (IEEE 1394): Ricoh Co Ltd R5C832 PCIe IEEE 1394 Controller (rev 01) 17:00.0 0805: 1180:e822 (rev 01) 17:00.3 0c00: 1180:e832 (rev 01) The Ricoh device issue has been known for some time and a quirk has been available since commit 12ea6cad1c7d046 in June 2012. It's slightly different than the problem this patch tries to work around [1]. Hmm, I've had this problem with many recent (vanilla) kernels, up to and including 3.9-rc5 that adding entries for also fixed booting. I don't have any SD cards or firewire devices handy to test that they work, but the system now boots, which was not the case without your patch and IOMMU/DMAR enabled. That is really strange. Could you tell us what kernel version you tested and provide dmesg output? I'll capture a vanilla 3.8.5 boot without any patches and iommu=off, then try to find another machine to catch what I can of a netconsole boot with iommu=on. What's the preferred way to send these? pastebin links? I'd been running the 'dirty' fix that's in the redhat bugzilla entry. I checked my .config and have CONFIG_PCI_QUIRKS=y, and verified my devices are in the quirks table for the pci_func_0_dma_source fixup. Here's a previous patch used for similar hardware that may also be fixed by this: http://lists.fedoraproject.org/pipermail/scm-commits/2010-October/510785.html and another thread/bug report this may solve: https://bugzilla.redhat.com/show_bug.cgi?id=605888 I believe this is referenced in drivers/pci/quirks.c for versions newer than 3.5. Feel free to include me in any future iterations of this patch you'd like tested. Tested-By: Pat Erley pat-l...@erley.org mailto:pat-l...@erley.org Thanks for testing! [1] In the Ricoh case, multiple functions are used for real devices and the bug is that these devices all use function 0 during DMA. In this particular case, I'd expect the FireWire device 17:00.3 to issue DMA from the SD Host Controller address 17:00.0. The quirk is not too much of a terrible hack - it's a fairly simple translation. In the Marvell case, the real device uses DMA source tags that don't actually belong to any visible devices. The quirk to make this work is more invasive, not nearly as elegant and has not attracted much enthusiasm from subsystem maintainers, though I'm still hopeful that a quirk will be merged in some form or another. Thanks for explaining the difference! Pat -- Here are my relevant logs and configs from a vanilla 3.8.5 kernel: http://www.erley.org/oops/ * the -nots files have had timestamps stripped for ease of diffing. * no_iommu_no_fw.txt is a diff of the -nots logs. * loading_fw.txt is an excerpt of log once I load the firewire-ohci module (causing, for all practical purposes, a complete system lock.) * the .gz of the same name is the 55mb of logs it generated in 36 seconds. I was hesitant to send 100k of text to the ML, here is the only 'interesting' difference in the logs, from my inspection: -PCI-DMA: Using software bounce buffering for
RFC: vfio API changes needed for powerpc
Alex, We are in the process of implementing vfio-pci support for the Freescale IOMMU (PAMU). It is an aperture/window-based IOMMU and is quite different than x86, and will involve creating a 'type 2' vfio implementation. For each device's DMA mappings, PAMU has an overall aperture and a number of windows. All sizes and window counts must be power of 2. To illustrate, below is a mapping for a 256MB guest, including guest memory (backed by 64MB huge pages) and some windows for MSIs: Total aperture: 512MB # of windows: 8 win gphys/ # iovaphys size --- 0 0x 0xX_XX00 64MB 1 0x0400 0xX_XX00 64MB 2 0x0800 0xX_XX00 64MB 3 0x0C00 0xX_XX00 64MB 4 0x1000 0xf_fe044000 4KB// msi bank 1 5 0x1400 0xf_fe045000 4KB// msi bank 2 6 0x1800 0xf_fe046000 4KB// msi bank 3 7- - disabled There are a couple of updates needed to the vfio user-kernel interface that we would like your feedback on. 1. IOMMU geometry The kernel IOMMU driver now has an interface (see domain_set_attr, domain_get_attr) that lets us set the domain geometry using attributes. We want to expose that to user space, so envision needing a couple of new ioctls to do this: VFIO_IOMMU_SET_ATTR VFIO_IOMMU_GET_ATTR 2. MSI window mappings The more problematic question is how to deal with MSIs. We need to create mappings for up to 3 MSI banks that a device may need to target to generate interrupts. The Linux MSI driver can allocate MSIs from the 3 banks any way it wants, and currently user space has no way of knowing which bank may be used for a given device. There are 3 options we have discussed and would like your direction: A. Implicit mappings -- with this approach user space would not explicitly map MSIs. User space would be required to set the geometry so that there are 3 unused windows (the last 3 windows) for MSIs, and it would be up to the kernel to create the mappings. This approach requires some specific semantics (leaving 3 windows) and it potentially gets a little weird-- when should the kernel actually create the MSI mappings? When should they be unmapped? Some convention would need to be established. B. Explicit mapping using DMA map flags. The idea is that a new flag to DMA map (VFIO_DMA_MAP_FLAG_MSI) would mean that a mapping is to be created for the supplied iova. No vaddr is given though. So in the above example there would be a a dma map at 0x1000 for 24KB (and no vaddr). It's up to the kernel to determine which bank gets mapped where. So, this option puts user space in control of which windows are used for MSIs and when MSIs are mapped/unmapped. There would need to be some semantics as to how this is used-- it only makes sense C. Explicit mapping using normal DMA map. The last idea is that we would introduce a new ioctl to give user-space an fd to the MSI bank, which could be mmapped. The flow would be something like this: -for each group user space calls new ioctl VFIO_GROUP_GET_MSI_FD -user space mmaps the fd, getting a vaddr -user space does a normal DMA map for desired iova This approach makes everything explicit, but adds a new ioctl applicable most likely only to the PAMU (type2 iommu). Any feedback or direction? Thanks, Stuart ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
RE: [PATCH 0/5 v11] iommu/fsl: Freescale PAMU driver and IOMMU API implementation.
-Original Message- From: Joerg Roedel [mailto:j...@8bytes.org] Sent: Tuesday, April 02, 2013 9:53 PM To: Sethi Varun-B16395 Cc: Yoder Stuart-B08248; Wood Scott-B07421; iommu@lists.linux- foundation.org; linuxppc-...@lists.ozlabs.org; linux- ker...@vger.kernel.org; ga...@kernel.crashing.org; b...@kernel.crashing.org Subject: Re: [PATCH 0/5 v11] iommu/fsl: Freescale PAMU driver and IOMMU API implementation. On Fri, Mar 29, 2013 at 01:23:57AM +0530, Varun Sethi wrote: This patchset provides the Freescale PAMU (Peripheral Access Management Unit) driver and the corresponding IOMMU API implementation. PAMU is the IOMMU present on Freescale QorIQ platforms. PAMU can authorize memory access, remap the memory address, and remap the I/O transaction type. This set consists of the following patches: 1. Make iova dma_addr_t in the iommu_iova_to_phys API. 2. Addition of new field in the device (powerpc) archdata structure for storing iommu domain information pointer. 3. Add window permission flags in the iommu_domain_window_enable API. 4. Add domain attributes for FSL PAMU driver. 5. PAMU driver and IOMMU API implementation. Okay, I am about to apply patches 1 and 3 to a new ppc/pamu branch in my tree. As a general question, how did you test the IOMMU driver and what will you do in the future to avoid regressions? I use a kernel module for testing the iommu_api support. The module allows for dynamic creation and deletion of iommu domains for the devices in the device tree. Also, the vfio support (under development) for Freescale SOCs with APMU hardware would depend on the PAMU driver. -Varun ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH V2] iommu/amd: Add logic to decode AMD IOMMU event flag
From: Suravee Suthikulpanit suravee.suthikulpa...@amd.com Add logic to decode AMD IOMMU event flag based on information from AMD IOMMU specification. This should simplify debugging IOMMU errors. Also, dump DTE information in additional cases. Signed-off-by: Suravee Suthikulpanit suravee.suthikulpa...@amd.com --- Changelog: V2: * Fix printing format to reduce noise * Use string table instead of switch/case * Use pr_cont instead of printk drivers/iommu/amd_iommu.c | 170 - 1 file changed, 136 insertions(+), 34 deletions(-) diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c index b287ca3..3362678 100644 --- a/drivers/iommu/amd_iommu.c +++ b/drivers/iommu/amd_iommu.c @@ -601,6 +601,93 @@ static void amd_iommu_stats_init(void) * / +struct _event_log_flags { + u32 gn:1, /* 16 */ + nx:1, /* 17 */ + us:1, /* 18 */ +i:1, /* 19 */ + pr:1, /* 20 */ + rw:1, /* 21 */ + pe:1, /* 22 */ + rz:1, /* 23 */ + tr:1, /* 24 */ + type:3, /* [27:25] */ + _reserved_:20; /* Reserved */ +}; + +static const char * const _type_field_encodings[] = { + /* 00 */Reserved, + /* 01 */Master Abort, + /* 10 */Target Abort, + /* 11 */Data Error, +}; + +static const char * const _invalid_transaction_desc[] = { + /* 000 */Read request or non-posted write in the interrupt +addres range, + /* 001 */Pretranslated transaction received from an I/O device +that has I=0 or V=0 in DTE, + /* 010 */Port I/O space transaction received from an I/O device +that has IoCtl=00b in DTE, + /* 011 */Posted write to invalid address range, + /* 100 */Invalid read request or non-posted write, + /* 101 */Posted write to the interrupt/EOI range from an I/O +device that has IntCtl=00b in DTE, + /* 110 */Posted write to a reserved interrupt address range, + /* 111 */Invalid transaction to the system management address range, +}; + +static const char * const _invalid_translation_desc[] = { + /* 000 */Translation request received from an I/O device that has +I=0, or has V=0, or has V=1 and TV=0 in DTE, + /* 001 */Translation request in invalid address range, + /* 010 */Invalid translation request, + /* 011 */Reserved, + /* 100 */Reserved, + /* 101 */Reserved, + /* 110 */Reserved, + /* 111 */Reserved, +}; + +static void dump_flags(int flags, int ev_type) +{ + struct _event_log_flags *p = (struct _event_log_flags *) flags; + u32 err_type = p-type; + + pr_err(AMD-Vi: Flags details: %s, NX:%u, %s, %s, %s, %s, %s, %s, %s\n, + (p-gn ? Use guest address : Use nested address), + (p-nx), + (p-us ? User privileges : Supervisor privileges), + (p-i ? Interrupt request : memory request), + (p-pr ? Page present or Interrupt remapped : + Page not present or Interrupt blocked), + (p-rw ? Write transaction : Read transaction), + (p-pe ? Does not have permission : Has permission), + (p-rz ? Reserv bit not zero : Illegal level encoding), + (p-tr ? Translation request : Transaction request)); + + pr_err(AMD-Vi: Type of error: (0x%x) , err_type); + if ((ev_type == EVENT_TYPE_DEV_TAB_ERR) || + (ev_type == EVENT_TYPE_PAGE_TAB_ERR) || + (ev_type == EVENT_TYPE_CMD_HARD_ERR)) { + if (err_type ARRAY_SIZE(_type_field_encodings)) { + pr_cont(%s\n, + _type_field_encodings[err_type]); + } + } else if (ev_type == EVENT_TYPE_INV_DEV_REQ) { + if (p-tr == 0) { + if (err_type ARRAY_SIZE(_invalid_translation_desc)) + pr_cont(%s\n, + _invalid_translation_desc[err_type]); + } else { + if (err_type ARRAY_SIZE(_invalid_transaction_desc)) + pr_cont(%s\n, + _invalid_transaction_desc[err_type]); + } + } + pr_err(AMD-Vi: (Note: Please refer to AMD IOMMU specification for details.)\n); +} + static void dump_dte_entry(u16 devid) { int i; @@ -619,81 +706,95 @@ static void dump_command(unsigned long phys_addr) pr_err(AMD-Vi: CMD[%d]: %08x\n, i, cmd-data[i]); } -static void iommu_print_event(struct amd_iommu *iommu, void *__evt) +void amd_iommu_print_event(int type, int devid, int domid, + int flags, u64 address) { - int
Re: RFC: vfio API changes needed for powerpc
Hi Stuart, On Tue, 2013-04-02 at 17:32 +, Yoder Stuart-B08248 wrote: Alex, We are in the process of implementing vfio-pci support for the Freescale IOMMU (PAMU). It is an aperture/window-based IOMMU and is quite different than x86, and will involve creating a 'type 2' vfio implementation. For each device's DMA mappings, PAMU has an overall aperture and a number of windows. All sizes and window counts must be power of 2. To illustrate, below is a mapping for a 256MB guest, including guest memory (backed by 64MB huge pages) and some windows for MSIs: Total aperture: 512MB # of windows: 8 win gphys/ # iovaphys size --- 0 0x 0xX_XX00 64MB 1 0x0400 0xX_XX00 64MB 2 0x0800 0xX_XX00 64MB 3 0x0C00 0xX_XX00 64MB 4 0x1000 0xf_fe044000 4KB// msi bank 1 5 0x1400 0xf_fe045000 4KB// msi bank 2 6 0x1800 0xf_fe046000 4KB// msi bank 3 7- - disabled There are a couple of updates needed to the vfio user-kernel interface that we would like your feedback on. 1. IOMMU geometry The kernel IOMMU driver now has an interface (see domain_set_attr, domain_get_attr) that lets us set the domain geometry using attributes. We want to expose that to user space, so envision needing a couple of new ioctls to do this: VFIO_IOMMU_SET_ATTR VFIO_IOMMU_GET_ATTR Any ioctls to the vfiofd (/dev/vfio/vfio) not claimed by vfio-core are passed to the IOMMU driver. So you can effectively have your own type2 ioctl extensions. Alexey has already posted patches to do this for SPAPR that add VFIO_IOMMU_ENABLE/DISABLE to allow him access to VFIO_IOMMU_GET_INFO to examine locked page requirements. As Scott notes we need to come up with a clean userspace interface for these though. 2. MSI window mappings The more problematic question is how to deal with MSIs. We need to create mappings for up to 3 MSI banks that a device may need to target to generate interrupts. The Linux MSI driver can allocate MSIs from the 3 banks any way it wants, and currently user space has no way of knowing which bank may be used for a given device. There are 3 options we have discussed and would like your direction: A. Implicit mappings -- with this approach user space would not explicitly map MSIs. User space would be required to set the geometry so that there are 3 unused windows (the last 3 windows) for MSIs, and it would be up to the kernel to create the mappings. This approach requires some specific semantics (leaving 3 windows) and it potentially gets a little weird-- when should the kernel actually create the MSI mappings? When should they be unmapped? Some convention would need to be established. VFIO would have control of SET/GET_ATTR, right? So we could reduce the number exposed to userspace on GET and transparently add MSI entries on SET. On x86 the interrupt remapper handles this transparently when MSI is enabled and userspace never gets direct access to the device MSI address/data registers. What kind of restrictions do you have around adding and removing windows while the aperture is enabled? B. Explicit mapping using DMA map flags. The idea is that a new flag to DMA map (VFIO_DMA_MAP_FLAG_MSI) would mean that a mapping is to be created for the supplied iova. No vaddr is given though. So in the above example there would be a a dma map at 0x1000 for 24KB (and no vaddr). It's up to the kernel to determine which bank gets mapped where. So, this option puts user space in control of which windows are used for MSIs and when MSIs are mapped/unmapped. There would need to be some semantics as to how this is used-- it only makes sense This could also be done as another type2 ioctl extension. What's the value to userspace in determining which windows are used by which banks? It sounds like the case that there are X banks and if userspace wants to use MSI it needs to leave X windows available for that. Is this just buying userspace a few more windows to allow them the choice between MSI or RAM? C. Explicit mapping using normal DMA map. The last idea is that we would introduce a new ioctl to give user-space an fd to the MSI bank, which could be mmapped. The flow would be something like this: -for each group user space calls new ioctl VFIO_GROUP_GET_MSI_FD -user space mmaps the fd, getting a vaddr -user space does a normal DMA map for desired iova This approach makes everything explicit, but adds a new ioctl applicable most likely only to the PAMU (type2 iommu). And the DMA_MAP of that mmap then allows
Re: RFC: vfio API changes needed for powerpc
On Tue, Apr 2, 2013 at 2:39 PM, Scott Wood scottw...@freescale.com wrote: On 04/02/2013 12:32:00 PM, Yoder Stuart-B08248 wrote: Alex, We are in the process of implementing vfio-pci support for the Freescale IOMMU (PAMU). It is an aperture/window-based IOMMU and is quite different than x86, and will involve creating a 'type 2' vfio implementation. For each device's DMA mappings, PAMU has an overall aperture and a number of windows. All sizes and window counts must be power of 2. To illustrate, below is a mapping for a 256MB guest, including guest memory (backed by 64MB huge pages) and some windows for MSIs: Total aperture: 512MB # of windows: 8 win gphys/ # iovaphys size --- 0 0x 0xX_XX00 64MB 1 0x0400 0xX_XX00 64MB 2 0x0800 0xX_XX00 64MB 3 0x0C00 0xX_XX00 64MB 4 0x1000 0xf_fe044000 4KB// msi bank 1 5 0x1400 0xf_fe045000 4KB// msi bank 2 6 0x1800 0xf_fe046000 4KB// msi bank 3 7- - disabled There are a couple of updates needed to the vfio user-kernel interface that we would like your feedback on. 1. IOMMU geometry The kernel IOMMU driver now has an interface (see domain_set_attr, domain_get_attr) that lets us set the domain geometry using attributes. We want to expose that to user space, so envision needing a couple of new ioctls to do this: VFIO_IOMMU_SET_ATTR VFIO_IOMMU_GET_ATTR Note that this means attributes need to be updated for user-API appropriateness, such as using fixed-size types. 2. MSI window mappings The more problematic question is how to deal with MSIs. We need to create mappings for up to 3 MSI banks that a device may need to target to generate interrupts. The Linux MSI driver can allocate MSIs from the 3 banks any way it wants, and currently user space has no way of knowing which bank may be used for a given device. There are 3 options we have discussed and would like your direction: A. Implicit mappings -- with this approach user space would not explicitly map MSIs. User space would be required to set the geometry so that there are 3 unused windows (the last 3 windows) Where does userspace get the number 3 from? E.g. on newer chips there are 4 MSI banks. Maybe future chips have even more. Ok, then make the number 4. The chance of more MSI banks in future chips is nil, and if it ever happened user space could adjust. Also, practically speaking since memory is typically allocate in powers of 2 way you need to approximately double the window geometry anyway. B. Explicit mapping using DMA map flags. The idea is that a new flag to DMA map (VFIO_DMA_MAP_FLAG_MSI) would mean that a mapping is to be created for the supplied iova. No vaddr is given though. So in the above example there would be a a dma map at 0x1000 for 24KB (and no vaddr). A single 24 KiB mapping wouldn't work (and why 24KB? What if only one MSI group is involved in this VFIO group? What if four MSI groups are involved?). You'd need to either have a naturally aligned, power-of-two sized mapping that covers exactly the pages you want to map and no more, or you'd need to create a separate mapping for each MSI bank, and due to PAMU subwindow alignment restrictions these mappings could not be contiguous in iova-space. You're right, a single 24KB mapping wouldn't work-- in the case of 3 MSI banks perhaps we could just do one 64MB*3 mapping to identify which windows are used for MSIs. If only one MSI bank was involved the kernel could get clever and only enable the banks actually needed. Stuart ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: RFC: vfio API changes needed for powerpc
On Tue, Apr 2, 2013 at 3:32 PM, Alex Williamson alex.william...@redhat.com wrote: 2. MSI window mappings The more problematic question is how to deal with MSIs. We need to create mappings for up to 3 MSI banks that a device may need to target to generate interrupts. The Linux MSI driver can allocate MSIs from the 3 banks any way it wants, and currently user space has no way of knowing which bank may be used for a given device. There are 3 options we have discussed and would like your direction: A. Implicit mappings -- with this approach user space would not explicitly map MSIs. User space would be required to set the geometry so that there are 3 unused windows (the last 3 windows) for MSIs, and it would be up to the kernel to create the mappings. This approach requires some specific semantics (leaving 3 windows) and it potentially gets a little weird-- when should the kernel actually create the MSI mappings? When should they be unmapped? Some convention would need to be established. VFIO would have control of SET/GET_ATTR, right? So we could reduce the number exposed to userspace on GET and transparently add MSI entries on SET. The number of windows is always power of 2 (and max is 256). And to reduce PAMU cache pressure you want to use the fewest number of windows you can.So, I don't see practically how we could transparently steal entries to add the MSIs. Either user space knows to leave empty windows for MSIs and by convention the kernel knows which windows those are (as in option #A) or explicitly tell the kernel which windows (as in option #B). On x86 the interrupt remapper handles this transparently when MSI is enabled and userspace never gets direct access to the device MSI address/data registers. What kind of restrictions do you have around adding and removing windows while the aperture is enabled? The windows can be enabled/disabled event while the aperture is enabled (pretty sure)... B. Explicit mapping using DMA map flags. The idea is that a new flag to DMA map (VFIO_DMA_MAP_FLAG_MSI) would mean that a mapping is to be created for the supplied iova. No vaddr is given though. So in the above example there would be a a dma map at 0x1000 for 24KB (and no vaddr). It's up to the kernel to determine which bank gets mapped where. So, this option puts user space in control of which windows are used for MSIs and when MSIs are mapped/unmapped. There would need to be some semantics as to how this is used-- it only makes sense This could also be done as another type2 ioctl extension. What's the value to userspace in determining which windows are used by which banks? It sounds like the case that there are X banks and if userspace wants to use MSI it needs to leave X windows available for that. Is this just buying userspace a few more windows to allow them the choice between MSI or RAM? Yes, it would potentially give user space the flexibility some more windows. It also makes more explicit when the MSI mappings are created. In option #A the MSI mappings would probably get created at the time of the first normal DMA map. So, you're saying with this approach you'd rather see a new type 2 ioctl instead of adding new flags to DMA map, right? C. Explicit mapping using normal DMA map. The last idea is that we would introduce a new ioctl to give user-space an fd to the MSI bank, which could be mmapped. The flow would be something like this: -for each group user space calls new ioctl VFIO_GROUP_GET_MSI_FD -user space mmaps the fd, getting a vaddr -user space does a normal DMA map for desired iova This approach makes everything explicit, but adds a new ioctl applicable most likely only to the PAMU (type2 iommu). And the DMA_MAP of that mmap then allows userspace to select the window used? This one seems like a lot of overhead, adding a new ioctl, new fd, mmap, special mapping path, etc. It would be less overhead to just add an ioctl to enable MSI, maybe letting userspace pick which windows get used, but I'm still not sure what the value is to userspace in exposing it. Thanks, Thanks, Stuart ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: RFC: vfio API changes needed for powerpc
On 04/02/2013 03:32:17 PM, Alex Williamson wrote: On Tue, 2013-04-02 at 17:32 +, Yoder Stuart-B08248 wrote: 2. MSI window mappings The more problematic question is how to deal with MSIs. We need to create mappings for up to 3 MSI banks that a device may need to target to generate interrupts. The Linux MSI driver can allocate MSIs from the 3 banks any way it wants, and currently user space has no way of knowing which bank may be used for a given device. There are 3 options we have discussed and would like your direction: A. Implicit mappings -- with this approach user space would not explicitly map MSIs. User space would be required to set the geometry so that there are 3 unused windows (the last 3 windows) for MSIs, and it would be up to the kernel to create the mappings. This approach requires some specific semantics (leaving 3 windows) and it potentially gets a little weird-- when should the kernel actually create the MSI mappings? When should they be unmapped? Some convention would need to be established. VFIO would have control of SET/GET_ATTR, right? So we could reduce the number exposed to userspace on GET and transparently add MSI entries on SET. What do you mean by reduce the number exposed? Userspace decides how many entries there are, but it must be a power of two beteen 1 and 256. On x86 the interrupt remapper handles this transparently when MSI is enabled and userspace never gets direct access to the device MSI address/data registers. x86 has a totally different mechanism here, as far as I understand -- even before you get into restrictions on mappings. What kind of restrictions do you have around adding and removing windows while the aperture is enabled? Subwindows can be modified while the aperture is enabled, but the aperture size and number of subwindows cannot be changed. B. Explicit mapping using DMA map flags. The idea is that a new flag to DMA map (VFIO_DMA_MAP_FLAG_MSI) would mean that a mapping is to be created for the supplied iova. No vaddr is given though. So in the above example there would be a a dma map at 0x1000 for 24KB (and no vaddr). It's up to the kernel to determine which bank gets mapped where. So, this option puts user space in control of which windows are used for MSIs and when MSIs are mapped/unmapped. There would need to be some semantics as to how this is used-- it only makes sense This could also be done as another type2 ioctl extension. Again, what is type2, specifically? If someone else is adding their own IOMMU that is kind of, sort of like PAMU, how would they know if it's close enough? What assumptions can a user make when they see that they're dealing with type2? What's the value to userspace in determining which windows are used by which banks? That depends on who programs the MSI config space address. What is important is userspace controlling which iovas will be dedicated to this, in case it wants to put something else there. It sounds like the case that there are X banks and if userspace wants to use MSI it needs to leave X windows available for that. Is this just buying userspace a few more windows to allow them the choice between MSI or RAM? Well, there could be that. But also, userspace will generally have a much better idea of the type of mappings it's creating, so it's easier to keep everything explicit at the kernel/user interface than require more complicated code in the kernel to figure things out automatically (not just for MSIs but in general). If the kernel automatically creates the MSI mappings, when does it assume that userspace is done creating its own? What if userspace doesn't need any DMA other than the MSIs? What if userspace wants to continue dynamically modifying its other mappings? C. Explicit mapping using normal DMA map. The last idea is that we would introduce a new ioctl to give user-space an fd to the MSI bank, which could be mmapped. The flow would be something like this: -for each group user space calls new ioctl VFIO_GROUP_GET_MSI_FD -user space mmaps the fd, getting a vaddr -user space does a normal DMA map for desired iova This approach makes everything explicit, but adds a new ioctl applicable most likely only to the PAMU (type2 iommu). And the DMA_MAP of that mmap then allows userspace to select the window used? This one seems like a lot of overhead, adding a new ioctl, new fd, mmap, special mapping path, etc. There's going to be special stuff no matter what. This would keep it separated from the IOMMU map code. I'm not sure what you mean by overhead here... the runtime overhead of setting things up is not particularly relevant as long
Re: RFC: vfio API changes needed for powerpc
On Tue, Apr 2, 2013 at 3:47 PM, Scott Wood scottw...@freescale.com wrote: On 04/02/2013 03:38:42 PM, Stuart Yoder wrote: On Tue, Apr 2, 2013 at 2:39 PM, Scott Wood scottw...@freescale.com wrote: On 04/02/2013 12:32:00 PM, Yoder Stuart-B08248 wrote: Alex, We are in the process of implementing vfio-pci support for the Freescale IOMMU (PAMU). It is an aperture/window-based IOMMU and is quite different than x86, and will involve creating a 'type 2' vfio implementation. For each device's DMA mappings, PAMU has an overall aperture and a number of windows. All sizes and window counts must be power of 2. To illustrate, below is a mapping for a 256MB guest, including guest memory (backed by 64MB huge pages) and some windows for MSIs: Total aperture: 512MB # of windows: 8 win gphys/ # iovaphys size --- 0 0x 0xX_XX00 64MB 1 0x0400 0xX_XX00 64MB 2 0x0800 0xX_XX00 64MB 3 0x0C00 0xX_XX00 64MB 4 0x1000 0xf_fe044000 4KB// msi bank 1 5 0x1400 0xf_fe045000 4KB// msi bank 2 6 0x1800 0xf_fe046000 4KB// msi bank 3 7- - disabled There are a couple of updates needed to the vfio user-kernel interface that we would like your feedback on. 1. IOMMU geometry The kernel IOMMU driver now has an interface (see domain_set_attr, domain_get_attr) that lets us set the domain geometry using attributes. We want to expose that to user space, so envision needing a couple of new ioctls to do this: VFIO_IOMMU_SET_ATTR VFIO_IOMMU_GET_ATTR Note that this means attributes need to be updated for user-API appropriateness, such as using fixed-size types. 2. MSI window mappings The more problematic question is how to deal with MSIs. We need to create mappings for up to 3 MSI banks that a device may need to target to generate interrupts. The Linux MSI driver can allocate MSIs from the 3 banks any way it wants, and currently user space has no way of knowing which bank may be used for a given device. There are 3 options we have discussed and would like your direction: A. Implicit mappings -- with this approach user space would not explicitly map MSIs. User space would be required to set the geometry so that there are 3 unused windows (the last 3 windows) Where does userspace get the number 3 from? E.g. on newer chips there are 4 MSI banks. Maybe future chips have even more. Ok, then make the number 4. The chance of more MSI banks in future chips is nil, What makes you so sure? Especially since you seem to be presenting this as not specifically an MPIC API. and if it ever happened user space could adjust. What bit of API is going to tell it that it needs to adjust? Haven't thought through that completely, but I guess we could add an API to return the number of MSI banks for type 2 iommus. Also, practically speaking since memory is typically allocate in powers of 2 way you need to approximately double the window geometry anyway. Only if your existing mapping needs fit exactly in a power of two. B. Explicit mapping using DMA map flags. The idea is that a new flag to DMA map (VFIO_DMA_MAP_FLAG_MSI) would mean that a mapping is to be created for the supplied iova. No vaddr is given though. So in the above example there would be a a dma map at 0x1000 for 24KB (and no vaddr). A single 24 KiB mapping wouldn't work (and why 24KB? What if only one MSI group is involved in this VFIO group? What if four MSI groups are involved?). You'd need to either have a naturally aligned, power-of-two sized mapping that covers exactly the pages you want to map and no more, or you'd need to create a separate mapping for each MSI bank, and due to PAMU subwindow alignment restrictions these mappings could not be contiguous in iova-space. You're right, a single 24KB mapping wouldn't work-- in the case of 3 MSI banks perhaps we could just do one 64MB*3 mapping to identify which windows are used for MSIs. Where did the assumption of a 64MiB subwindow size come from? The example I was using. User space would need to create a mapping for window_size * msi_bank_count. Stuart ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 1/3] iommu/amd: Add logic to decode AMD IOMMU event flag
On Tue, Apr 02, 2013 at 09:32:40PM +0200, Borislav Petkov wrote: On Tue, Apr 02, 2013 at 06:33:18PM +0200, Joerg Roedel wrote: Okay, in theory I could implement a feedback loop between timer-setup and intremap code and try fixups until it works. But that seems not to be worth it to work around a buggy BIOS. Yeah, same here. It's not like we really need intremap to work - we're only trying to fix the annoying error message currently. :-) Right :) What I actually thought about was providing an IVRS-override on the kernel command line. So that you can specify the IOAPIC_ID-DEVID mapping there and make it work this way. What do you think? I guess that is workable. I can imagine people wanting this if they want to do the intremap thing on such b0rked BIOSen. So how do I specify this IOAPIC_ID-DEVID mapping on the cmdline exactly? Don't know yet, probably something like ivrs_ioapic[apicid]=0:14.2 and ivrs_hpet[hpet-id]=0:14.2. But not entierly sure yet, at least parsing shouldn't require lex and yacc ;) Yeah, that's my fear too. So we leave it better as it is... Hohumm. Thus shall it be! Joerg ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: RFC: vfio API changes needed for powerpc
On 04/02/2013 03:38:42 PM, Stuart Yoder wrote: On Tue, Apr 2, 2013 at 2:39 PM, Scott Wood scottw...@freescale.com wrote: On 04/02/2013 12:32:00 PM, Yoder Stuart-B08248 wrote: Alex, We are in the process of implementing vfio-pci support for the Freescale IOMMU (PAMU). It is an aperture/window-based IOMMU and is quite different than x86, and will involve creating a 'type 2' vfio implementation. For each device's DMA mappings, PAMU has an overall aperture and a number of windows. All sizes and window counts must be power of 2. To illustrate, below is a mapping for a 256MB guest, including guest memory (backed by 64MB huge pages) and some windows for MSIs: Total aperture: 512MB # of windows: 8 win gphys/ # iovaphys size --- 0 0x 0xX_XX00 64MB 1 0x0400 0xX_XX00 64MB 2 0x0800 0xX_XX00 64MB 3 0x0C00 0xX_XX00 64MB 4 0x1000 0xf_fe044000 4KB// msi bank 1 5 0x1400 0xf_fe045000 4KB// msi bank 2 6 0x1800 0xf_fe046000 4KB// msi bank 3 7- - disabled There are a couple of updates needed to the vfio user-kernel interface that we would like your feedback on. 1. IOMMU geometry The kernel IOMMU driver now has an interface (see domain_set_attr, domain_get_attr) that lets us set the domain geometry using attributes. We want to expose that to user space, so envision needing a couple of new ioctls to do this: VFIO_IOMMU_SET_ATTR VFIO_IOMMU_GET_ATTR Note that this means attributes need to be updated for user-API appropriateness, such as using fixed-size types. 2. MSI window mappings The more problematic question is how to deal with MSIs. We need to create mappings for up to 3 MSI banks that a device may need to target to generate interrupts. The Linux MSI driver can allocate MSIs from the 3 banks any way it wants, and currently user space has no way of knowing which bank may be used for a given device. There are 3 options we have discussed and would like your direction: A. Implicit mappings -- with this approach user space would not explicitly map MSIs. User space would be required to set the geometry so that there are 3 unused windows (the last 3 windows) Where does userspace get the number 3 from? E.g. on newer chips there are 4 MSI banks. Maybe future chips have even more. Ok, then make the number 4. The chance of more MSI banks in future chips is nil, What makes you so sure? Especially since you seem to be presenting this as not specifically an MPIC API. and if it ever happened user space could adjust. What bit of API is going to tell it that it needs to adjust? Also, practically speaking since memory is typically allocate in powers of 2 way you need to approximately double the window geometry anyway. Only if your existing mapping needs fit exactly in a power of two. B. Explicit mapping using DMA map flags. The idea is that a new flag to DMA map (VFIO_DMA_MAP_FLAG_MSI) would mean that a mapping is to be created for the supplied iova. No vaddr is given though. So in the above example there would be a a dma map at 0x1000 for 24KB (and no vaddr). A single 24 KiB mapping wouldn't work (and why 24KB? What if only one MSI group is involved in this VFIO group? What if four MSI groups are involved?). You'd need to either have a naturally aligned, power-of-two sized mapping that covers exactly the pages you want to map and no more, or you'd need to create a separate mapping for each MSI bank, and due to PAMU subwindow alignment restrictions these mappings could not be contiguous in iova-space. You're right, a single 24KB mapping wouldn't work-- in the case of 3 MSI banks perhaps we could just do one 64MB*3 mapping to identify which windows are used for MSIs. Where did the assumption of a 64MiB subwindow size come from? If only one MSI bank was involved the kernel could get clever and only enable the banks actually needed. I'd rather see cleverness kept in userspace. -Scott ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: RFC: vfio API changes needed for powerpc
On Tue, 2013-04-02 at 16:08 -0500, Stuart Yoder wrote: On Tue, Apr 2, 2013 at 3:57 PM, Scott Wood scottw...@freescale.com wrote: C. Explicit mapping using normal DMA map. The last idea is that we would introduce a new ioctl to give user-space an fd to the MSI bank, which could be mmapped. The flow would be something like this: -for each group user space calls new ioctl VFIO_GROUP_GET_MSI_FD -user space mmaps the fd, getting a vaddr -user space does a normal DMA map for desired iova This approach makes everything explicit, but adds a new ioctl applicable most likely only to the PAMU (type2 iommu). And the DMA_MAP of that mmap then allows userspace to select the window used? This one seems like a lot of overhead, adding a new ioctl, new fd, mmap, special mapping path, etc. There's going to be special stuff no matter what. This would keep it separated from the IOMMU map code. I'm not sure what you mean by overhead here... the runtime overhead of setting things up is not particularly relevant as long as it's reasonable. If you mean development and maintenance effort, keeping things well separated should help. We don't need to change DMA_MAP. If we can simply add a new type 2 ioctl that allows user space to set which windows are MSIs, it seems vastly less complex than an ioctl to supply a new fd, mmap of it, etc. So maybe 2 ioctls: VFIO_IOMMU_GET_MSI_COUNT VFIO_IOMMU_MAP_MSI(iova, size) How are MSIs related to devices on PAMU? On x86 MSI count is very device specific, which means it wold be a VFIO_DEVICE_* ioctl (actually VFIO_DEVICE_GET_IRQ_INFO does this for us on x86). The trouble with it being a device ioctl is that you need to get the device FD, but the IOMMU protection needs to be established before you can get that... so there's an ordering problem if you need it from the device before configuring the IOMMU. Thanks, Alex ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: RFC: vfio API changes needed for powerpc
On 04/02/2013 04:08:27 PM, Stuart Yoder wrote: On Tue, Apr 2, 2013 at 3:57 PM, Scott Wood scottw...@freescale.com wrote: This could also be done as another type2 ioctl extension. Again, what is type2, specifically? If someone else is adding their own IOMMU that is kind of, sort of like PAMU, how would they know if it's close enough? What assumptions can a user make when they see that they're dealing with type2? We will define that as part of the type2 implementation. Highly unlikely anything but a PAMU will comply. So then why not just call it pamu instead of being obfuscatory? There's going to be special stuff no matter what. This would keep it separated from the IOMMU map code. I'm not sure what you mean by overhead here... the runtime overhead of setting things up is not particularly relevant as long as it's reasonable. If you mean development and maintenance effort, keeping things well separated should help. We don't need to change DMA_MAP. If we can simply add a new type 2 ioctl that allows user space to set which windows are MSIs, And what specifically does that ioctl do? It causes new mappings to be created, right? So you're changing (or at least adding to) the DMA map mechanism. it seems vastly less complex than an ioctl to supply a new fd, mmap of it, etc. I don't see enough complexity in the mmap approach for anything to be vastly less complex in comparison. I think you're building the mmap approach up in your head to be a lot worse that it would actually be. -Scott ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH V2] iommu/amd: Add logic to decode AMD IOMMU event flag
On Tue, Apr 02, 2013 at 02:05:14PM -0500, Suthikulpanit, Suravee wrote: From: Suravee Suthikulpanit suravee.suthikulpa...@amd.com +static const char * const _type_field_encodings[] = { + /* 00 */Reserved, + /* 01 */Master Abort, + /* 10 */Target Abort, + /* 11 */Data Error, In these arrays, please put the comments at the end of the line and align them nicely like this: static const char * const _type_field_encodings[] = { Reserved, /* 00 */ Master Abort, /* 01 */ Target Abort, /* 10 */ Data Error, /* 11 */ }; This improves readability a lot. +static void dump_flags(int flags, int ev_type) +{ + struct _event_log_flags *p = (struct _event_log_flags *) flags; + u32 err_type = p-type; + + pr_err(AMD-Vi: Flags details: %s, NX:%u, %s, %s, %s, %s, %s, %s, %s\n, + (p-gn ? Use guest address : Use nested address), + (p-nx), + (p-us ? User privileges : Supervisor privileges), + (p-i ? Interrupt request : memory request), + (p-pr ? Page present or Interrupt remapped : + Page not present or Interrupt blocked), + (p-rw ? Write transaction : Read transaction), + (p-pe ? Does not have permission : Has permission), + (p-rz ? Reserv bit not zero : Illegal level encoding), + (p-tr ? Translation request : Transaction request)); These strings are too long, please just print useful shortcuts for them, like US : SV instead of User privileges : Supervisor privileges Also the commas between the strings are not needed then. As a general rule, in the default configuration an event-log entry should not take more than a single line in dmesg (which is also not too long). If you want to print more information in some cases you can enable that by a command line parameter like 'amd_iommu=verbose' or something like that. Joerg ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: RFC: vfio API changes needed for powerpc
On 04/02/2013 04:16:11 PM, Alex Williamson wrote: On Tue, 2013-04-02 at 15:54 -0500, Stuart Yoder wrote: The number of windows is always power of 2 (and max is 256). And to reduce PAMU cache pressure you want to use the fewest number of windows you can.So, I don't see practically how we could transparently steal entries to add the MSIs. Either user space knows to leave empty windows for MSIs and by convention the kernel knows which windows those are (as in option #A) or explicitly tell the kernel which windows (as in option #B). Ok, apparently I don't understand the API. Is it something like userspace calls GET_ATTR and finds out that there are 256 available windows, userspace determines that it needs 8 for RAM and then it has an MSI device, so it needs to call SET_ATTR and ask for 16? That seems prone to exploitation by the first userspace to allocate it's aperture, What exploitation? It's not as if there is a pool of 256 global windows that users allocate from. The subwindow count is just how finely divided the aperture is. The only way one user will affect another is through cache contention (which is why we want the minimum number of subwindows that we can get away with). but I'm also not sure why userspace could specify the (non-power of 2) number of windows it needs for RAM, then VFIO would see that the devices attached have MSI and add those windows and align to a power of 2. If you double the subwindow count without userspace knowing, you have to double the aperture as well (and you may need to grow up or down depending on alignment). This means you also need to halve the maximum aperture that userspace can request. And you need to expose a different number of maximum subwindows in the IOMMU API based on whether we might have MSIs of this type. It's ugly and awkward, and removes the possibility for userspace to place the MSIs in some unused slot in the middle, or not use MSIs at all. -Scott ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: RFC: vfio API changes needed for powerpc
On 04/02/2013 04:38:45 PM, Alex Williamson wrote: On Tue, 2013-04-02 at 16:08 -0500, Stuart Yoder wrote: On Tue, Apr 2, 2013 at 3:57 PM, Scott Wood scottw...@freescale.com wrote: C. Explicit mapping using normal DMA map. The last idea is that we would introduce a new ioctl to give user-space an fd to the MSI bank, which could be mmapped. The flow would be something like this: -for each group user space calls new ioctl VFIO_GROUP_GET_MSI_FD -user space mmaps the fd, getting a vaddr -user space does a normal DMA map for desired iova This approach makes everything explicit, but adds a new ioctl applicable most likely only to the PAMU (type2 iommu). And the DMA_MAP of that mmap then allows userspace to select the window used? This one seems like a lot of overhead, adding a new ioctl, new fd, mmap, special mapping path, etc. There's going to be special stuff no matter what. This would keep it separated from the IOMMU map code. I'm not sure what you mean by overhead here... the runtime overhead of setting things up is not particularly relevant as long as it's reasonable. If you mean development and maintenance effort, keeping things well separated should help. We don't need to change DMA_MAP. If we can simply add a new type 2 ioctl that allows user space to set which windows are MSIs, it seems vastly less complex than an ioctl to supply a new fd, mmap of it, etc. So maybe 2 ioctls: VFIO_IOMMU_GET_MSI_COUNT Do you mean a count of actual MSIs or a count of MSI banks used by the whole VFIO group? VFIO_IOMMU_MAP_MSI(iova, size) Not sure how you mean size to be used -- for MPIC it would be 4K per bank, and you can only map one bank at a time (which bank you're mapping should be a parameter, if only so that the kernel doesn't have to keep iteration state for you). How are MSIs related to devices on PAMU? PAMU doesn't care about MSIs. The relation of individual MSIs to a device is standard PCI stuff. Each MSI bank (which is part of the MPIC, not PAMU) can hold numerous MSIs. The VFIO user would want to map all MSI banks that are in use by any of the devices in the group. Ideally we'd let the VFIO grouping influence the allocation of MSIs. On x86 MSI count is very device specific, which means it wold be a VFIO_DEVICE_* ioctl (actually VFIO_DEVICE_GET_IRQ_INFO does this for us on x86). The trouble with it being a device ioctl is that you need to get the device FD, but the IOMMU protection needs to be established before you can get that... so there's an ordering problem if you need it from the device before configuring the IOMMU. Thanks, What do you mean by IOMMU protection needs to be established? Wouldn't we just start with no mappings in place? -Scott ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH V3] iommu/amd: Add logic to decode AMD IOMMU event flag
From: Suravee Suthikulpanit suravee.suthikulpa...@amd.com Add logic to decode AMD IOMMU event flag based on information from AMD IOMMU specification. This should simplify debugging IOMMU errors. Also, dump DTE information in additional cases. This is an example: AMD-Vi: Event logged [IO_PAGE_FAULT device=51:00.0 domain=0x address=0x flags=0x0fff] AMD-Vi: Flags details: Guest NX=1 User Intr Present Write No-Perm Rsrv-Bit Translation AMD-Vi: Type of error: (0x7) AMD-Vi: (Note: Please refer to AMD IOMMU specification for details.) AMD-Vi: DTE[0]: 603fa75e2403 AMD-Vi: DTE[1]: 0014 AMD-Vi: DTE[2]: 203fa5e09011 AMD-Vi: DTE[3]: Signed-off-by: Suravee Suthikulpanit suravee.suthikulpa...@amd.com --- Changelog: V3: * Move comments to end of line * Shorten the print out to be within one line V2: * Fix printing format to reduce noise * Use string table instead of switch/case * Use pr_cont instead of printk drivers/iommu/amd_iommu.c | 171 - 1 file changed, 137 insertions(+), 34 deletions(-) diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c index b287ca3..593a1a3 100644 --- a/drivers/iommu/amd_iommu.c +++ b/drivers/iommu/amd_iommu.c @@ -601,6 +601,94 @@ static void amd_iommu_stats_init(void) * / +struct _event_log_flags { + u32 gn:1, /* 16 */ + nx:1, /* 17 */ + us:1, /* 18 */ +i:1, /* 19 */ + pr:1, /* 20 */ + rw:1, /* 21 */ + pe:1, /* 22 */ + rz:1, /* 23 */ + tr:1, /* 24 */ + type:3, /* [27:25] */ + _reserved_:20; /* Reserved */ +}; + +static const char * const _type_field_encodings[] = { + Reserved, /* 00 */ + Master Abort, /* 01 */ + Target Abort, /* 10 */ + Data Error, /* 11 */ +}; + +static const char * const _invalid_transaction_desc[] = { + Read request or non-posted write in the interrupt +addres range,/* 000 */ + Pretranslated transaction received from an + I/O device that has I=0 or V=0 in DTE,/* 001 */ + Port I/O space transaction received from an + I/O device that has IoCtl=00b in DTE, /* 010 */ + Posted write to invalid address range,/* 011 */ + Invalid read request or non-posted write, /* 100 */ + Posted write to the interrupt/EOI range from an + I/O device that has IntCtl=00b in DTE,/* 101 */ + Posted write to a reserved interrupt address range, /* 110 */ + Invalid transaction to the system management + address range,/* 111 */ +}; + +static const char * const _invalid_translation_desc[] = { + Translation request received from an I/O device + that has I=0, or has V=0, or has V=1 and + TV=0 in DTE, /* 000 */ + Translation request in invalid address range, /* 001 */ + Invalid translation request, /* 010 */ + Reserved, /* 011 */ + Reserved, /* 100 */ + Reserved, /* 101 */ + Reserved, /* 110 */ + Reserved, /* 111 */ +}; + +static void dump_flags(int flags, int ev_type) +{ + struct _event_log_flags *p = (struct _event_log_flags *) flags; + u32 err_type = p-type; + + pr_err(AMD-Vi: Flags details: %s NX=%u %s %s %s %s %s %s %s\n, + (p-gn ? Guest : Nested), + (p-nx), + (p-us ? User : Super), + (p-i ? Intr : Mem), + (p-pr ? Present : Absent), + (p-rw ? Write : Read), + (p-pe ? No-Perm : Has-Perm), + (p-rz ? Rsrv-Bit : Ill-Level), + (p-tr ? Translation : Transaction)); + + pr_err(AMD-Vi: Type of error: (0x%x) , err_type); + if ((ev_type == EVENT_TYPE_DEV_TAB_ERR) || + (ev_type == EVENT_TYPE_PAGE_TAB_ERR) || + (ev_type == EVENT_TYPE_CMD_HARD_ERR)) { + if (err_type ARRAY_SIZE(_type_field_encodings)) { + pr_cont(%s\n, + _type_field_encodings[err_type]); + } + } else if (ev_type == EVENT_TYPE_INV_DEV_REQ) { + if (p-tr == 0) { + if (err_type
Re: [PATCH 5/5 v11] iommu/fsl: Freescale PAMU driver and iommu implementation.
On Tue, Apr 2, 2013 at 11:18 AM, Joerg Roedel j...@8bytes.org wrote: + panic(\n); A kernel panic seems like an over-reaction to an access violation. We have no way to determining what code caused the violation, so we can't just kill the process. I agree it seems like overkill, but what else should we do? Does the IOMMU layer have a way for the IOMMU driver to stop the device that caused the problem? Besides the device that caused the violation the system should still work, no? Not really. The PAMU was designed to add IOMMU support to legacy devices, which have no concept of an MMU. If the PAMU detects an access violation, there's no way for the device to recover, because it has no idea that a violation has occurred. It's going to keep on writing to bad data. Maybe we need a mechanism where a driver can register a callback function to handle IOMMU exceptions? + /* + * In case of devices with multiple LIODNs just store + * the info for the first LIODN as all + * LIODNs share the same domain + */ + if (!old_domain_info) + dev-archdata.iommu_domain = info; + spin_unlock(device_domain_lock); Don't you have to tell the hardware that a device was added to a domain? I don't see that, what I am missing? I'm not sure I understand. What hardware do you think needs to be notified? The PAMU reads everything it needs from the PAACT, which we update. The PAMU does not know anything about the devices that it monitors, and the devices don't know anything about the PAMU. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: RFC: vfio API changes needed for powerpc
On Tue, 2013-04-02 at 17:13 -0500, Scott Wood wrote: On 04/02/2013 04:16:11 PM, Alex Williamson wrote: On Tue, 2013-04-02 at 15:54 -0500, Stuart Yoder wrote: The number of windows is always power of 2 (and max is 256). And to reduce PAMU cache pressure you want to use the fewest number of windows you can.So, I don't see practically how we could transparently steal entries to add the MSIs. Either user space knows to leave empty windows for MSIs and by convention the kernel knows which windows those are (as in option #A) or explicitly tell the kernel which windows (as in option #B). Ok, apparently I don't understand the API. Is it something like userspace calls GET_ATTR and finds out that there are 256 available windows, userspace determines that it needs 8 for RAM and then it has an MSI device, so it needs to call SET_ATTR and ask for 16? That seems prone to exploitation by the first userspace to allocate it's aperture, What exploitation? It's not as if there is a pool of 256 global windows that users allocate from. The subwindow count is just how finely divided the aperture is. The only way one user will affect another is through cache contention (which is why we want the minimum number of subwindows that we can get away with). but I'm also not sure why userspace could specify the (non-power of 2) number of windows it needs for RAM, then VFIO would see that the devices attached have MSI and add those windows and align to a power of 2. If you double the subwindow count without userspace knowing, you have to double the aperture as well (and you may need to grow up or down depending on alignment). This means you also need to halve the maximum aperture that userspace can request. And you need to expose a different number of maximum subwindows in the IOMMU API based on whether we might have MSIs of this type. It's ugly and awkward, and removes the possibility for userspace to place the MSIs in some unused slot in the middle, or not use MSIs at all. Ok, I missed this in Stuart's example: Total aperture: 512MB # of windows: 8 win gphys/ # iovaphys size --- 0 0x 0xX_XX00 64MB 1 0x0400 0xX_XX00 64MB 2 0x0800 0xX_XX00 64MB 3 0x0C00 0xX_XX00 64MB 4 0x1000 0xf_fe044000 4KB// msi bank 1 ^^ 5 0x1400 0xf_fe045000 4KB// msi bank 2 ^^ 6 0x1800 0xf_fe046000 4KB// msi bank 3 ^^ 7- - disabled So even though the MSI banks are 4k in this example, they're still on 64MB boundaries. If userspace were to leave this as 256 windows, each would be 2MB and we'd use 128 of them to map the same memory as these 4x64MB windows and thrash the iotlb harder. The picture is becoming clearer. Thanks, Alex ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: RFC: vfio API changes needed for powerpc
On Tue, 2013-04-02 at 17:44 -0500, Scott Wood wrote: On 04/02/2013 04:32:04 PM, Alex Williamson wrote: On Tue, 2013-04-02 at 15:57 -0500, Scott Wood wrote: On 04/02/2013 03:32:17 PM, Alex Williamson wrote: On x86 the interrupt remapper handles this transparently when MSI is enabled and userspace never gets direct access to the device MSI address/data registers. x86 has a totally different mechanism here, as far as I understand -- even before you get into restrictions on mappings. So what control will userspace have over programming the actually MSI vectors on PAMU? Not sure what you mean -- PAMU doesn't get explicitly involved in MSIs. It's just another 4K page mapping (per relevant MSI bank). If you want isolation, you need to make sure that an MSI group is only used by one VFIO group, and that you're on a chip that has alias pages with just one MSI bank register each (newer chips do, but the first chip to have a PAMU didn't). How does a user figure this out? This could also be done as another type2 ioctl extension. Again, what is type2, specifically? If someone else is adding their own IOMMU that is kind of, sort of like PAMU, how would they know if it's close enough? What assumptions can a user make when they see that they're dealing with type2? Naming always has and always will be a problem. I assume this is named type2 rather than PAMU because it's trying to expose a generic windowed IOMMU fitting the IOMMU API. But how closely is the MSI situation related to a generic windowed IOMMU, then? We could just as well have a highly flexible IOMMU in terms of arbitrary 4K page mappings, but still handle MSIs as pages to be mapped rather than a translation table. Or we could have a windowed IOMMU that has an MSI translation table. Like type1, it doesn't really make sense to name it IOMMU API because that's a kernel internal interface and we're designing a userspace interface that just happens to use that. Tagging it to a piece of hardware makes it less reusable. Well, that's my point. Is it reusable at all, anyway? If not, then giving it a more obscure name won't change that. If it is reusable, then where is the line drawn between things that are PAMU-specific or MPIC-specific and things that are part of the generic windowed IOMMU abstraction? Type1 is arbitrary. It might as well be named brown and this one can be blue. The difference is that type1 seems to refer to hardware that can do arbitrary 4K page mappings, possibly constrained by an aperture but nothing else. More than one IOMMU can reasonably fit that. The odds that another IOMMU would have exactly the same restrictions as PAMU seem smaller in comparison. In any case, if you had to deal with some Intel-only quirk, would it make sense to call it a type1 attribute? I'm not advocating one way or the other on whether an abstraction is viable here (though Stuart seems to think it's highly unlikely anything but a PAMU will comply), just that if it is to be abstracted rather than a hardware-specific interface, we need to document what is and is not part of the abstraction. Otherwise a non-PAMU-specific user won't know what they can rely on, and someone adding support for a new windowed IOMMU won't know if theirs is close enough, or they need to introduce a type3. So Alexey named the SPAPR IOMMU something related to spapr... surprisingly enough. I'm fine with that. If you think it's unique enough, name it something appropriately. I haven't seen the code and don't know the architecture sufficiently to have an opinion. What's the value to userspace in determining which windows are used by which banks? That depends on who programs the MSI config space address. What is important is userspace controlling which iovas will be dedicated to this, in case it wants to put something else there. So userspace is programming the MSI vectors, targeting a user programmed iova? But an iova selects a window and I thought there were some number of MSI banks and we don't really know which ones we'll need... still confused. Userspace would also need a way to find out the page offset and data value. That may be an argument in favor of having the two ioctls Stuart later suggested (get MSI count, and map MSI). Connecting the user set iova and host kernel assigned irq number is where I'm still lost, but I'll follow-up with that question in the other thread. Would there be any complication in the VFIO code from tracking a mapping that doesn't have a userspace virtual address associated with it? Only the VFIO iommu driver tracks mappings, the QEMU userspace component doesn't (replies on the memory API for type1), nor does any of the kernel framework code. There's going to be special stuff no matter
Re: RFC: vfio API changes needed for powerpc
On Tue, 2013-04-02 at 17:50 -0500, Scott Wood wrote: On 04/02/2013 04:38:45 PM, Alex Williamson wrote: On Tue, 2013-04-02 at 16:08 -0500, Stuart Yoder wrote: On Tue, Apr 2, 2013 at 3:57 PM, Scott Wood scottw...@freescale.com wrote: C. Explicit mapping using normal DMA map. The last idea is that we would introduce a new ioctl to give user-space an fd to the MSI bank, which could be mmapped. The flow would be something like this: -for each group user space calls new ioctl VFIO_GROUP_GET_MSI_FD -user space mmaps the fd, getting a vaddr -user space does a normal DMA map for desired iova This approach makes everything explicit, but adds a new ioctl applicable most likely only to the PAMU (type2 iommu). And the DMA_MAP of that mmap then allows userspace to select the window used? This one seems like a lot of overhead, adding a new ioctl, new fd, mmap, special mapping path, etc. There's going to be special stuff no matter what. This would keep it separated from the IOMMU map code. I'm not sure what you mean by overhead here... the runtime overhead of setting things up is not particularly relevant as long as it's reasonable. If you mean development and maintenance effort, keeping things well separated should help. We don't need to change DMA_MAP. If we can simply add a new type 2 ioctl that allows user space to set which windows are MSIs, it seems vastly less complex than an ioctl to supply a new fd, mmap of it, etc. So maybe 2 ioctls: VFIO_IOMMU_GET_MSI_COUNT Do you mean a count of actual MSIs or a count of MSI banks used by the whole VFIO group? I hope the latter, which would clarify how this is distinct from DEVICE_GET_IRQ_INFO. Is hotplug even on the table? Presumably dynamically adding a device could bring along additional MSI banks? VFIO_IOMMU_MAP_MSI(iova, size) Not sure how you mean size to be used -- for MPIC it would be 4K per bank, and you can only map one bank at a time (which bank you're mapping should be a parameter, if only so that the kernel doesn't have to keep iteration state for you). How are MSIs related to devices on PAMU? PAMU doesn't care about MSIs. The relation of individual MSIs to a device is standard PCI stuff. Each MSI bank (which is part of the MPIC, not PAMU) can hold numerous MSIs. The VFIO user would want to map all MSI banks that are in use by any of the devices in the group. Ideally we'd let the VFIO grouping influence the allocation of MSIs. The current VFIO MSI support has the host handling everything about MSI. The user never programs an MSI vector to the physical device, they set up everything through ioctl. On interrupt, we simply trigger an eventfd and leave it to things like KVM irqfd or QEMU to do the right thing in a virtual machine. Here the MSI vector has to go through a PAMU window to hit the correct MSI bank. So that means it has some component of the iova involved, which we're proposing here is controlled by userspace (whether that vector uses an offset from 0x1000 or 0x depending on which window slot is used to make the MSI bank). I assume we're still working in a model where the physical interrupt fires into the host and a host-based interrupt handler triggers an eventfd, right? So that means the vector also has host components so we trigger the correct ISR. How is that coordinated? Would is be possible for userspace to simply leave room for MSI bank mapping (how much room could be determined by something like VFIO_IOMMU_GET_MSI_BANK_COUNT) then document the API that userspace can DMA_MAP starting at the 0x0 address of the aperture, growing up, and VFIO will map banks on demand at the top of the aperture, growing down? Wouldn't that avoid a lot of issues with userspace needing to know anything about MSI banks (other than count) and coordinating irq numbers and enabling handlers? On x86 MSI count is very device specific, which means it wold be a VFIO_DEVICE_* ioctl (actually VFIO_DEVICE_GET_IRQ_INFO does this for us on x86). The trouble with it being a device ioctl is that you need to get the device FD, but the IOMMU protection needs to be established before you can get that... so there's an ordering problem if you need it from the device before configuring the IOMMU. Thanks, What do you mean by IOMMU protection needs to be established? Wouldn't we just start with no mappings in place? If no mappings blocks all DMA, sure, that's fine. Once the VFIO device FD is accessible by userspace we have to protect the host against DMA. If any IOMMU_SET_ATTR calls temporarily disable DMA protection, that could be exploitable. Thanks, Alex ___ iommu mailing list
RE: [PATCH 5/5 v11] iommu/fsl: Freescale PAMU driver and iommu implementation.
-Original Message- From: Wood Scott-B07421 Sent: Wednesday, April 03, 2013 7:23 AM To: Timur Tabi Cc: Joerg Roedel; Sethi Varun-B16395; lkml; Kumar Gala; Yoder Stuart- B08248; iommu@lists.linux-foundation.org; Benjamin Herrenschmidt; linuxppc-...@lists.ozlabs.org Subject: Re: [PATCH 5/5 v11] iommu/fsl: Freescale PAMU driver and iommu implementation. On 04/02/2013 08:35:54 PM, Timur Tabi wrote: On Tue, Apr 2, 2013 at 11:18 AM, Joerg Roedel j...@8bytes.org wrote: + panic(\n); A kernel panic seems like an over-reaction to an access violation. We have no way to determining what code caused the violation, so we can't just kill the process. I agree it seems like overkill, but what else should we do? Does the IOMMU layer have a way for the IOMMU driver to stop the device that caused the problem? At a minimum, log a message and continue. Probably turn off the LIODN, at least if it continues to be noisy (otherwise we could get stuck in an interrupt storm as you note). Possibly let the user know somehow, especially if it's a VFIO domain. [Sethi Varun-B16395] Can definitely log the message and disable the LIODN (to avoid an interrupt storm), but we definitely need a mechanism to inform vfio subsystem about the error. Also, disabling LIODN may not be a viable option with the new LIODN allocation scheme (where LIODN would be associated with a domain). Don't take down the whole kernel. It's not just overkill; it undermines VFIO's efforts to make it safe for users to control devices. Besides the device that caused the violation the system should still work, no? Not really. The PAMU was designed to add IOMMU support to legacy devices, which have no concept of an MMU. If the PAMU detects an access violation, there's no way for the device to recover, because it has no idea that a violation has occurred. It's going to keep on writing to bad data. I think that's only the case for posted writes (or devices which fail to take a hint and stop even after they see an I/O error). [Sethi Varun-B16395] Even in the case where the guest driver detects a failure, it may not be able to fix the problem without intervention from the VFIO subsystem. -Varun ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
RE: [PATCH 2/5 v11] powerpc: Add iommu domain pointer to device archdata
Kumar/Ben, Any comments? (Had checked with Ben (on IRC) sometime back, he was fine with this patch) Regards Varun -Original Message- From: Joerg Roedel [mailto:j...@8bytes.org] Sent: Tuesday, April 02, 2013 8:39 PM To: Sethi Varun-B16395 Cc: Yoder Stuart-B08248; Wood Scott-B07421; iommu@lists.linux- foundation.org; linuxppc-...@lists.ozlabs.org; linux- ker...@vger.kernel.org; ga...@kernel.crashing.org; b...@kernel.crashing.org Subject: Re: [PATCH 2/5 v11] powerpc: Add iommu domain pointer to device archdata On Fri, Mar 29, 2013 at 01:23:59AM +0530, Varun Sethi wrote: Add an iommu domain pointer to device (powerpc) archdata. Devices are attached to iommu domains and this pointer provides a mechanism to correlate between a device and the associated iommu domain. This field is set when a device is attached to a domain. Signed-off-by: Varun Sethi varun.se...@freescale.com This patch needs to be Acked by the PPC maintainers. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
RE: [PATCH 4/5 v11] iommu/fsl: Add additional iommu attributes required by the PAMU driver.
-Original Message- From: Joerg Roedel [mailto:j...@8bytes.org] Sent: Tuesday, April 02, 2013 8:40 PM To: Sethi Varun-B16395 Cc: Yoder Stuart-B08248; Wood Scott-B07421; iommu@lists.linux- foundation.org; linuxppc-...@lists.ozlabs.org; linux- ker...@vger.kernel.org; ga...@kernel.crashing.org; b...@kernel.crashing.org Subject: Re: [PATCH 4/5 v11] iommu/fsl: Add additional iommu attributes required by the PAMU driver. On Fri, Mar 29, 2013 at 01:24:01AM +0530, Varun Sethi wrote: +/* cache stash targets */ +enum stash_target { + IOMMU_ATTR_CACHE_L1 = 1, + IOMMU_ATTR_CACHE_L2, + IOMMU_ATTR_CACHE_L3, +}; + +/* This attribute corresponds to IOMMUs capable of generating + * a stash transaction. A stash transaction is typically a + * hardware initiated prefetch of data from memory to cache. + * This attribute allows configuring stashig specific parameters + * in the IOMMU hardware. + */ + +struct iommu_stash_attribute { + u32 cpu;/* cpu number */ + u32 cache; /* cache to stash to: L1,L2,L3 */ +}; + I would prefer these PAMU specific enum and struct to be in a pamu- specific iommu-header. [Sethi Varun-B16395] But, these would be used by the IOMMU API users (e.g. VFIO), they shouldn't depend on PAMU specific headers. -Varun ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu