Re: [PATCH] intel-iommu: Synchronize gcmd value with global command register

2013-04-02 Thread Joerg Roedel
On Mon, Apr 01, 2013 at 02:45:18PM +0900, Takao Indoh wrote:
 Current flow on kdump boot
 enable_IR
   intel_enable_irq_remapping
 iommu_disable_irq_remapping  == IRES/QIES/TES disabled here
 dmar_disable_qi  == do nothing
 dmar_enable_qi   == QIES enabled
 intel_setup_irq_remapping== IRES enabled

But what we want to do here in the kdumo case is to disable translation
too, right? Because the former kernel might have translation and
irq-remapping enabled and the kdump kernel might be compiled without
support for dma-remapping. So if we don't disable translation here too
the kdump kernel is unable to do DMA.

I agree that disabling translation should be a bit more explicit instead
of the current code.


Joerg


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 2/3] iommu/amd: Add IOMMU event log injection interface for testing event flag decoding logic

2013-04-02 Thread Joerg Roedel
On Wed, Mar 27, 2013 at 06:51:34PM -0500, suravee.suthikulpa...@amd.com wrote:
 From: Suravee Suthikulpanit suravee.suthikulpa...@amd.com
 
 Add IOMMU event log injection interface for testing event flag decoding logic.
 This interface allows users to specify device id, event flag, and event types 
 via debugfs.
 
 echo 0x300  /sys/kernel/debug/amd-iommu-evninj/devid // (e.g. Bus:Dev.fun 
 3:0.0)
 echo 0xfff  /sys/kernel/debug/amd-iommu-evninj/flags // (e.g. Specify flag 
 value)
 echo 2  /sys/kernel/debug/amd-iommu-evninj/type  // (e.g. for 
 IO_PAGE_FAULT event)
 
 Once the event is injected, IOMMU driver will parse and print out the event 
 information
 in kernel log (dmesg) with the various information depending on the types of 
 event and flag
 specified.

Why is this needed? I think the risk of regressions in the
event-handling code is to low to justify the need for an event-injection
mechanism.

Joerg


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 1/3] iommu/amd: Add logic to decode AMD IOMMU event flag

2013-04-02 Thread Suravee Suthikulanit

On 4/2/2013 9:33 AM, Joerg Roedel wrote:

Hi Suravee,

On Wed, Mar 27, 2013 at 06:51:23PM -0500, suravee.suthikulpa...@amd.com wrote:

From: Suravee Suthikulpanit suravee.suthikulpa...@amd.com

Add logic to decode AMD IOMMU event flag based on information from AMD IOMMU 
specification.
This should simplify debugging IOMMU errors.  Also, dump DTE information in 
additional cases.

The patch in general makes sense to have, but I have a couple of
comments.


+static void dump_flags(int flags, int ev_type)
+{
+   struct _event_log_flags *p = (struct _event_log_flags *) flags;
+   u32 err_type = p-type;
+
+   pr_err(AMD-Vi: Flags details:\n);
+   pr_err(AMD-Vi:Guest / Nested  : %u\n, p-gn);
+   pr_err(AMD-Vi:No Execute  : %u\n, p-nx);
+   pr_err(AMD-Vi:User-Supervisor : %u\n, p-us);
+   pr_err(AMD-Vi:Interrupt   : %u\n, p-i);
+   pr_err(AMD-Vi:Present : %u\n, p-pr);
+   pr_err(AMD-Vi:Read / Write: %u\n, p-rw);
+   pr_err(AMD-Vi:Permission  : %u\n, p-pe);
+   pr_err(AMD-Vi:Reserv bit not zero / Illegal level encoding : %u\n,
+   p-rz);
+   pr_err(AMD-Vi:Translation / Transaction : %u\n,
+   p-tr);
+   pr_err(AMD-Vi:Type of error   : (0x%x) , err_type);

Printing the flags multi-line is much too noisy for IOMMU events. Just
print a char-shortcut for each flag set on the same line.

I will make the changes and send in for new patch for review.



+
+   if ((ev_type == EVENT_TYPE_DEV_TAB_ERR)  ||
+   (ev_type == EVENT_TYPE_PAGE_TAB_ERR) ||
+   (ev_type == EVENT_TYPE_CMD_HARD_ERR)) {
+   /* Only supports up to 2 bits */
+   err_type = 0x3;
+   switch (err_type) {
+   case 0:
+   pr_err(Reserved\n);
+   break;
+   case 1:
+   pr_err(Master Abort\n);
+   break;
+   case 2:
+   pr_err(Target Abort\n);
+   break;
+   case 3:
+   pr_err(Data Error\n);
+   break;
+   }

Why do you create string-arrays for other type-encodings but not for
this one?

I can do the same way if that's preferred.

Thanks,

Suravee



+   } else if (ev_type == EVENT_TYPE_INV_DEV_REQ) {
+   if (p-tr == 0) {
+   if (err_type  ARRAY_SIZE(_invalid_translation_desc))
+   printk(%s\n,
+   _invalid_translation_desc[err_type]);
+   } else {
+   if (err_type  ARRAY_SIZE(_invalid_transaction_desc))
+   printk(%s\n,
+   _invalid_transaction_desc[err_type]);

pr_cont instead of printk.


Joerg






___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 1/3] iommu/amd: Add logic to decode AMD IOMMU event flag

2013-04-02 Thread Borislav Petkov
On Tue, Apr 02, 2013 at 04:33:36PM +0200, Joerg Roedel wrote:
 Hi Suravee,
 
 On Wed, Mar 27, 2013 at 06:51:23PM -0500, suravee.suthikulpa...@amd.com wrote:
  From: Suravee Suthikulpanit suravee.suthikulpa...@amd.com
  
  Add logic to decode AMD IOMMU event flag based on information from AMD 
  IOMMU specification.
  This should simplify debugging IOMMU errors.  Also, dump DTE information in 
  additional cases.
 
 The patch in general makes sense to have, but I have a couple of
 comments.

While you guys are at it, can someone fix this too pls (ASUS board with
a PD on it).

[0.220342] [Firmware Bug]: AMD-Vi: IOAPIC[9] not in IVRS table
[0.220398] [Firmware Bug]: AMD-Vi: IOAPIC[10] not in IVRS table
[0.220451] [Firmware Bug]: AMD-Vi: No southbridge IOAPIC found in IVRS table
[0.220506] AMD-Vi: Disabling interrupt remapping due to BIOS Bug(s)

Thanks.

-- 
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v4] Quirk for buggy dma source tags with Intel IOMMU.

2013-04-02 Thread Andrew Cooks
On 2 Apr 2013 15:37, Pat Erley pat-l...@erley.org wrote:

 On 03/07/2013 09:35 PM, Andrew Cooks wrote:

 --- a/drivers/pci/quirks.c
 +++ b/drivers/pci/quirks.c

 +/* Table of multiple (ghost) source functions. This is similar to the
 + * translated sources above, but with the following differences:
 + * 1. the device may use multiple functions as DMA sources,
 + * 2. these functions cannot be assumed to be actual devices, they're
simply
 + * incorrect DMA tags.
 + * 3. the specific ghost function for a request can not always be
predicted.
 + * For example, the actual device could be xx:yy.1 and it could use
 + * both 0 and 1 for different requests, with no obvious way to tell when
 + * DMA will be tagged as comming from xx.yy.0 and and when it will be
tagged
 + * as comming from xx.yy.1.
 + * The bitmap contains all of the functions used in DMA tags, including
the
 + * actual device.
 + * See  https://bugzilla.redhat.com/show_bug.cgi?id=757166,
 + * https://bugzilla.kernel.org/show_bug.cgi?id=42679
 + * https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1089768
 + */
 +static const struct pci_dev_dma_multi_func_sources {
 +   u16 vendor;
 +   u16 device;
 +   u8 func_map;/* bit map. lsb is fn 0. */
 +} pci_dev_dma_multi_func_sources[] = {
 +   { PCI_VENDOR_ID_MARVELL_2, 0x9123, (10)|(11)},
 +   { PCI_VENDOR_ID_MARVELL_2, 0x9125, (10)|(11)},
 +   { PCI_VENDOR_ID_MARVELL_2, 0x9128, (10)|(11)},
 +   { PCI_VENDOR_ID_MARVELL_2, 0x9130, (10)|(11)},
 +   { PCI_VENDOR_ID_MARVELL_2, 0x9143, (10)|(11)},
 +   { PCI_VENDOR_ID_MARVELL_2, 0x9172, (10)|(11)},
 +   { 0 }
 +};


 Adding another buggy device.  I have a Ricoh multifunction device:

 17:00.0 SD Host controller: Ricoh Co Ltd MMC/SD Host Controller (rev 01)
 17:00.3 FireWire (IEEE 1394): Ricoh Co Ltd R5C832 PCIe IEEE 1394
 Controller (rev 01)

 17:00.0 0805: 1180:e822 (rev 01)
 17:00.3 0c00: 1180:e832 (rev 01)


The Ricoh device issue has been known for some time and a quirk has been
available since commit 12ea6cad1c7d046 in June 2012.  It's slightly
different than the problem this patch tries to work around [1].

 that adding entries for also fixed booting.  I don't have any SD cards or
firewire devices handy to test that they work, but the system now boots,
which was not the case without your patch and IOMMU/DMAR enabled.

That is really strange. Could you tell us what kernel version you tested
and provide dmesg output?


  Here's a previous patch used for similar hardware that may also be fixed
by this:


http://lists.fedoraproject.org/pipermail/scm-commits/2010-October/510785.html

 and another thread/bug report this may solve:

 https://bugzilla.redhat.com/show_bug.cgi?id=605888

I believe this is referenced in drivers/pci/quirks.c for versions newer
than 3.5.


 Feel free to include me in any future iterations of this patch you'd like
tested.

 Tested-By: Pat Erley pat-l...@erley.org


Thanks for testing!

[1] In the Ricoh case, multiple functions are used for real devices and the
bug is that these devices all use function 0 during DMA. In this particular
case, I'd expect the FireWire device 17:00.3 to issue DMA from the SD Host
Controller address 17:00.0. The quirk is not too much of a terrible hack -
it's a fairly simple translation.

In the Marvell case, the real device uses DMA source tags that don't
actually belong to any visible devices. The quirk to make this work is more
invasive, not nearly as elegant and has not attracted much enthusiasm from
subsystem maintainers, though I'm still hopeful that a quirk will be merged
in some form or another.

--

a.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH 1/3] iommu/amd: Add logic to decode AMD IOMMU event flag

2013-04-02 Thread Joerg Roedel
On Tue, Apr 02, 2013 at 04:40:37PM +0200, Borislav Petkov wrote:
 While you guys are at it, can someone fix this too pls (ASUS board with
 a PD on it).
 
 [0.220342] [Firmware Bug]: AMD-Vi: IOAPIC[9] not in IVRS table
 [0.220398] [Firmware Bug]: AMD-Vi: IOAPIC[10] not in IVRS table
 [0.220451] [Firmware Bug]: AMD-Vi: No southbridge IOAPIC found in IVRS 
 table
 [0.220506] AMD-Vi: Disabling interrupt remapping due to BIOS Bug(s)

That is actually a BIOS problem. I wonder whether it would help to turn this
into a WARN_ON to get the board vendors to release working BIOSes.
Opinions?


Joerg


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 2/5 v11] powerpc: Add iommu domain pointer to device archdata

2013-04-02 Thread Joerg Roedel
On Fri, Mar 29, 2013 at 01:23:59AM +0530, Varun Sethi wrote:
 Add an iommu domain pointer to device (powerpc) archdata.  Devices
 are attached to iommu domains and this pointer provides a mechanism
 to correlate between a device and the associated iommu domain.  This
 field is set when a device is attached to a domain.
 
 Signed-off-by: Varun Sethi varun.se...@freescale.com

This patch needs to be Acked by the PPC maintainers.


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH 5/5 v11] iommu/fsl: Freescale PAMU driver and iommu implementation.

2013-04-02 Thread Yoder Stuart-B08248

 -Original Message-
 From: Sethi Varun-B16395
 Sent: Thursday, March 28, 2013 2:54 PM
 To: j...@8bytes.org; Yoder Stuart-B08248; Wood Scott-B07421; 
 iommu@lists.linux-foundation.org; linuxppc-
 d...@lists.ozlabs.org; linux-ker...@vger.kernel.org; 
 ga...@kernel.crashing.org; b...@kernel.crashing.org
 Cc: Sethi Varun-B16395
 Subject: [PATCH 5/5 v11] iommu/fsl: Freescale PAMU driver and iommu 
 implementation.
 
 Following is a brief description of the PAMU hardware:
 PAMU determines what action to take and whether to authorize the action on
 the basis of the memory address, a Logical IO Device Number (LIODN), and
 PAACT table (logically) indexed by LIODN and address. Hardware devices which
 need to access memory must provide an LIODN in addition to the memory address.
 
 Peripheral Access Authorization and Control Tables (PAACTs) are the primary
 data structures used by PAMU. A PAACT is a table of peripheral access
 authorization and control entries (PAACE).Each PAACE defines the range of
 I/O bus address space that is accessible by the LIOD and the associated access
 capabilities.
 
 There are two types of PAACTs: primary PAACT (PPAACT) and secondary PAACT
 (SPAACT).A given physical I/O device may be able to act as one or more
 independent logical I/O devices (LIODs). Each such logical I/O device is
 assigned an identifier called logical I/O device number (LIODN). A LIODN is
 allocated a contiguous portion of the I/O bus address space called the DSA 
 window
 for performing DSA operations. The DSA window may optionally be divided into
 multiple sub-windows, each of which may be used to map to a region in system
 storage space. The first sub-window is referred to as the primary sub-window
 and the remaining are called secondary sub-windows.
 
 This patch provides the PAMU driver (fsl_pamu.c) and the corresponding IOMMU
 API implementation (fsl_pamu_domain.c). The PAMU hardware driver (fsl_pamu.c)
 has been derived from the work done by Ashish Kalra and Timur Tabi.
 
 Signed-off-by: Timur Tabi ti...@tabi.org
 Signed-off-by: Varun Sethi varun.se...@freescale.com
 ---
 changes in v11:
 - changed iova to dma_addr_t in iova_to_phys API.
 changes in v10:
 - Support for new guts compatibe string for T4  B4 devices.
 - Modified comment about port ID and mentioned the errata number.
 - Fixed the issue where data pointer was not freed in case of a an error.
 - Pass data pointer while freeing irq.
 - Whle initializing the SPAACE entry clear the valid bit.
 changes in v9:
 - Merged and createad a single function to delete
 a device from domain list.
 - Refactored the add_device API code.
 - Renamed the paace and spaace init fucntions.
 - Renamed functions for mapping windows and subwindows.
 - Changed the MAX LIODN value to MAX value u-boot can
 program.
 - Hard coded maximum number of subwindows.
 changes in v8:
 - implemented the new API for window based IOMMUs.
 changes in v7:
 - Set max_subwidows in the geometry attribute.
 - Add checking for maximum supported LIODN value.
 - Use upper_32_bits and lower_32_bits macros while
   intializing PAMU data structures.
 changes in v6:
 - Simplified complex conditional statements.
 - Fixed indentation issues.
 - Added comments for IOMMU API implementation.
 changes in v5:
 - Addressed comments from Timur.
 changes in v4:
 - Addressed comments from Timur and Scott.
 changes in v3:
 - Addressed comments by Kumar Gala
 - dynamic fspi allocation
 - fixed alignment check in map and unmap
  arch/powerpc/sysdev/fsl_pci.h   |5 +
  drivers/iommu/Kconfig   |8 +
  drivers/iommu/Makefile  |1 +
  drivers/iommu/fsl_pamu.c| 1269 
 +++
  drivers/iommu/fsl_pamu.h|  405 +
  drivers/iommu/fsl_pamu_domain.c | 1137 +++
  drivers/iommu/fsl_pamu_domain.h |   85 +++
  7 files changed, 2910 insertions(+), 0 deletions(-)
  create mode 100644 drivers/iommu/fsl_pamu.c
  create mode 100644 drivers/iommu/fsl_pamu.h
  create mode 100644 drivers/iommu/fsl_pamu_domain.c
  create mode 100644 drivers/iommu/fsl_pamu_domain.h

Ack.

Stuart

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 1/3] iommu/amd: Add logic to decode AMD IOMMU event flag

2013-04-02 Thread Borislav Petkov
On Tue, Apr 02, 2013 at 05:03:04PM +0200, Joerg Roedel wrote:
 On Tue, Apr 02, 2013 at 04:40:37PM +0200, Borislav Petkov wrote:
  While you guys are at it, can someone fix this too pls (ASUS board with
  a PD on it).
  
  [0.220342] [Firmware Bug]: AMD-Vi: IOAPIC[9] not in IVRS table
  [0.220398] [Firmware Bug]: AMD-Vi: IOAPIC[10] not in IVRS table
  [0.220451] [Firmware Bug]: AMD-Vi: No southbridge IOAPIC found in IVRS 
  table
  [0.220506] AMD-Vi: Disabling interrupt remapping due to BIOS Bug(s)
 
 That is actually a BIOS problem. I wonder whether it would help to turn this
 into a WARN_ON to get the board vendors to release working BIOSes.
 Opinions?

Good luck trying to get ASUS to fix anything in their BIOS :(.

Can't we detect the SB IOAPIC some other way in this case?

-- 
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 1/3] iommu/amd: Add logic to decode AMD IOMMU event flag

2013-04-02 Thread Suravee Suthikulpanit

On 4/2/2013 10:29 AM, Borislav Petkov wrote:

On Tue, Apr 02, 2013 at 05:03:04PM +0200, Joerg Roedel wrote:

On Tue, Apr 02, 2013 at 04:40:37PM +0200, Borislav Petkov wrote:

While you guys are at it, can someone fix this too pls (ASUS board with
a PD on it).

[0.220342] [Firmware Bug]: AMD-Vi: IOAPIC[9] not in IVRS table
[0.220398] [Firmware Bug]: AMD-Vi: IOAPIC[10] not in IVRS table
[0.220451] [Firmware Bug]: AMD-Vi: No southbridge IOAPIC found in IVRS table
[0.220506] AMD-Vi: Disabling interrupt remapping due to BIOS Bug(s)

That is actually a BIOS problem. I wonder whether it would help to turn this
into a WARN_ON to get the board vendors to release working BIOSes.
Opinions?

Good luck trying to get ASUS to fix anything in their BIOS :(.
I have tried to contact Asus in the past to have them fix the issue, but 
I got no luck.  Once it is out in the field, it's very difficult to get 
them to make changes.  I am also addressing this issue with the BIOS 
team for the future hardware.


Turning this into WARN_ON() at this point might break a lot of systems 
currently out in the field.  However, users can always switching to use 
intremap=off but this might not be obvious.


Suravee


Can't we detect the SB IOAPIC some other way in this case?




___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v4] Quirk for buggy dma source tags with Intel IOMMU.

2013-04-02 Thread Pat Erley

On 03/07/2013 09:35 PM, Andrew Cooks wrote:

--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
+/* Table of multiple (ghost) source functions. This is similar to the
+ * translated sources above, but with the following differences:
+ * 1. the device may use multiple functions as DMA sources,
+ * 2. these functions cannot be assumed to be actual devices, they're simply
+ * incorrect DMA tags.
+ * 3. the specific ghost function for a request can not always be predicted.
+ * For example, the actual device could be xx:yy.1 and it could use
+ * both 0 and 1 for different requests, with no obvious way to tell when
+ * DMA will be tagged as comming from xx.yy.0 and and when it will be tagged
+ * as comming from xx.yy.1.
+ * The bitmap contains all of the functions used in DMA tags, including the
+ * actual device.
+ * See  https://bugzilla.redhat.com/show_bug.cgi?id=757166,
+ * https://bugzilla.kernel.org/show_bug.cgi?id=42679
+ * https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1089768
+ */
+static const struct pci_dev_dma_multi_func_sources {
+   u16 vendor;
+   u16 device;
+   u8 func_map;/* bit map. lsb is fn 0. */
+} pci_dev_dma_multi_func_sources[] = {
+   { PCI_VENDOR_ID_MARVELL_2, 0x9123, (10)|(11)},
+   { PCI_VENDOR_ID_MARVELL_2, 0x9125, (10)|(11)},
+   { PCI_VENDOR_ID_MARVELL_2, 0x9128, (10)|(11)},
+   { PCI_VENDOR_ID_MARVELL_2, 0x9130, (10)|(11)},
+   { PCI_VENDOR_ID_MARVELL_2, 0x9143, (10)|(11)},
+   { PCI_VENDOR_ID_MARVELL_2, 0x9172, (10)|(11)},
+   { 0 }
+};


Adding another buggy device.  I have a Ricoh multifunction device:

17:00.0 SD Host controller: Ricoh Co Ltd MMC/SD Host Controller (rev 01)
17:00.3 FireWire (IEEE 1394): Ricoh Co Ltd R5C832 PCIe IEEE 1394
Controller (rev 01)

17:00.0 0805: 1180:e822 (rev 01)
17:00.3 0c00: 1180:e832 (rev 01)

that adding entries for also fixed booting.  I don't have any SD cards 
or firewire devices handy to test that they work, but the system now 
boots, which was not the case without your patch and IOMMU/DMAR enabled. 
 Here's a previous patch used for similar hardware that may also be 
fixed by this:


http://lists.fedoraproject.org/pipermail/scm-commits/2010-October/510785.html

and another thread/bug report this may solve:

https://bugzilla.redhat.com/show_bug.cgi?id=605888

Feel free to include me in any future iterations of this patch you'd 
like tested.


Tested-By: Pat Erley pat-l...@erley.org

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v4] Quirk for buggy dma source tags with Intel IOMMU.

2013-04-02 Thread Pat Erley

On 04/02/2013 10:50 AM, Andrew Cooks wrote:

On 2 Apr 2013 15:37, Pat Erley pat-l...@erley.org
mailto:pat-l...@erley.org wrote:
 
  On 03/07/2013 09:35 PM, Andrew Cooks wrote:
 
  --- a/drivers/pci/quirks.c
  +++ b/drivers/pci/quirks.c
 
  +/* Table of multiple (ghost) source functions. This is similar to the
  + * translated sources above, but with the following differences:
  + * 1. the device may use multiple functions as DMA sources,
  + * 2. these functions cannot be assumed to be actual devices,
they're simply
  + * incorrect DMA tags.
  + * 3. the specific ghost function for a request can not always be
predicted.
  + * For example, the actual device could be xx:yy.1 and it could use
  + * both 0 and 1 for different requests, with no obvious way to tell
when
  + * DMA will be tagged as comming from xx.yy.0 and and when it will
be tagged
  + * as comming from xx.yy.1.
  + * The bitmap contains all of the functions used in DMA tags,
including the
  + * actual device.
  + * See https://bugzilla.redhat.com/show_bug.cgi?id=757166,
  + * https://bugzilla.kernel.org/show_bug.cgi?id=42679
  + * https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1089768
  + */
  +static const struct pci_dev_dma_multi_func_sources {
  +   u16 vendor;
  +   u16 device;
  +   u8 func_map;/* bit map. lsb is fn 0. */
  +} pci_dev_dma_multi_func_sources[] = {
  +   { PCI_VENDOR_ID_MARVELL_2, 0x9123, (10)|(11)},
  +   { PCI_VENDOR_ID_MARVELL_2, 0x9125, (10)|(11)},
  +   { PCI_VENDOR_ID_MARVELL_2, 0x9128, (10)|(11)},
  +   { PCI_VENDOR_ID_MARVELL_2, 0x9130, (10)|(11)},
  +   { PCI_VENDOR_ID_MARVELL_2, 0x9143, (10)|(11)},
  +   { PCI_VENDOR_ID_MARVELL_2, 0x9172, (10)|(11)},
  +   { 0 }
  +};
 
 
  Adding another buggy device.  I have a Ricoh multifunction device:
 
  17:00.0 SD Host controller: Ricoh Co Ltd MMC/SD Host Controller (rev 01)
  17:00.3 FireWire (IEEE 1394): Ricoh Co Ltd R5C832 PCIe IEEE 1394
  Controller (rev 01)
 
  17:00.0 0805: 1180:e822 (rev 01)
  17:00.3 0c00: 1180:e832 (rev 01)
 

The Ricoh device issue has been known for some time and a quirk has been
available since commit 12ea6cad1c7d046 in June 2012.  It's slightly
different than the problem this patch tries to work around [1].


Hmm, I've had this problem with many recent (vanilla) kernels, up to and 
including 3.9-rc5



  that adding entries for also fixed booting.  I don't have any SD
cards or firewire devices handy to test that they work, but the system
now boots, which was not the case without your patch and IOMMU/DMAR enabled.

That is really strange. Could you tell us what kernel version you tested
and provide dmesg output?


I'll capture a vanilla 3.8.5 boot without any patches and iommu=off, 
then try to find another machine to catch what I can of a netconsole 
boot with iommu=on.  What's the preferred way to send these?  pastebin 
links?


I'd been running the 'dirty' fix that's in the redhat bugzilla entry.  I 
checked my .config and have CONFIG_PCI_QUIRKS=y, and verified my devices 
are in the quirks table for the pci_func_0_dma_source fixup.



   Here's a previous patch used for similar hardware that may also be
fixed by this:
 
 
http://lists.fedoraproject.org/pipermail/scm-commits/2010-October/510785.html
 
  and another thread/bug report this may solve:
 
  https://bugzilla.redhat.com/show_bug.cgi?id=605888

I believe this is referenced in drivers/pci/quirks.c for versions newer
than 3.5.


  Feel free to include me in any future iterations of this patch you'd
like tested.
 
  Tested-By: Pat Erley pat-l...@erley.org mailto:pat-l...@erley.org
 

Thanks for testing!

[1] In the Ricoh case, multiple functions are used for real devices and
the bug is that these devices all use function 0 during DMA. In this
particular case, I'd expect the FireWire device 17:00.3 to issue DMA
from the SD Host Controller address 17:00.0. The quirk is not too much
of a terrible hack - it's a fairly simple translation.

In the Marvell case, the real device uses DMA source tags that don't
actually belong to any visible devices. The quirk to make this work is
more invasive, not nearly as elegant and has not attracted much
enthusiasm from subsystem maintainers, though I'm still hopeful that a
quirk will be merged in some form or another.



Thanks for explaining the difference!

Pat
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 1/3] iommu/amd: Add logic to decode AMD IOMMU event flag

2013-04-02 Thread Joerg Roedel
On Tue, Apr 02, 2013 at 05:29:56PM +0200, Borislav Petkov wrote:
 On Tue, Apr 02, 2013 at 05:03:04PM +0200, Joerg Roedel wrote:

 Good luck trying to get ASUS to fix anything in their BIOS :(.

Hmm...

 Can't we detect the SB IOAPIC some other way in this case?

I can certainly write a patch that works around your particular BIOS
bug. The problem is that such a fix will most certainly break other
systems.

Unfortunatly there is no reliable way to fixup the IO-APIC-ID-DEVID
mapping at runtime when the BIOS messed it up. The only thing I can do
is to check for potential problems and disable the intremap feature
then, so that the system will at least boot.


Joerg


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 1/3] iommu/amd: Add logic to decode AMD IOMMU event flag

2013-04-02 Thread Joerg Roedel
On Tue, Apr 02, 2013 at 10:41:25AM -0500, Suthikulpanit, Suravee wrote:
 Turning this into WARN_ON() at this point might break a lot of
 systems currently out in the field.  However, users can always
 switching to use intremap=off but this might not be obvious.

A WARN_ON doesn't break systems, it just creates more noise. Probably
enough noise to get board vendors to fix their stuff up. But there is
more to consider before making more noise, of course :)


Joerg


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 5/5 v11] iommu/fsl: Freescale PAMU driver and iommu implementation.

2013-04-02 Thread Joerg Roedel
Cc'ing Alex Williamson

Alex, can you please review the iommu-group part of this patch?

My comments so far are below:

On Fri, Mar 29, 2013 at 01:24:02AM +0530, Varun Sethi wrote:
 +config FSL_PAMU
 + bool Freescale IOMMU support
 + depends on PPC_E500MC
 + select IOMMU_API
 + select GENERIC_ALLOCATOR
 + help
 +   Freescale PAMU support.

A bit lame for a help text. Can you elaborate more what PAMU is and when
it should be enabled?

 +int pamu_enable_liodn(int liodn)
 +{
 + struct paace *ppaace;
 +
 + ppaace = pamu_get_ppaace(liodn);
 + if (!ppaace) {
 + pr_err(Invalid primary paace entry\n);
 + return -ENOENT;
 + }
 +
 + if (!get_bf(ppaace-addr_bitfields, PPAACE_AF_WSE)) {
 + pr_err(liodn %d not configured\n, liodn);
 + return -EINVAL;
 + }
 +
 + /* Ensure that all other stores to the ppaace complete first */
 + mb();
 +
 + ppaace-addr_bitfields |= PAACE_V_VALID;
 + mb();

Why is it sufficient to set the bit in a variable when enabling liodn
but when disabling it set_bf needs to be called? This looks a bit
assymetric.

 +/* Derive the window size encoding for a particular PAACE entry */
 +static unsigned int map_addrspace_size_to_wse(phys_addr_t addrspace_size)
 +{
 + /* Bug if not a power of 2 */
 + BUG_ON((addrspace_size  (addrspace_size - 1)));

Please use is_power_of_2 here.

 +
 + /* window size is 2^(WSE+1) bytes */
 + return __ffs(addrspace_size  PAMU_PAGE_SHIFT) + PAMU_PAGE_SHIFT - 1;

The PAMU_PAGE_SHIFT shifting and adding looks redundant.

 + if ((win_size  (win_size - 1)) || win_size  PAMU_PAGE_SIZE) {
 + pr_err(window size too small or not a power of two %llx\n, 
 win_size);
 + return -EINVAL;
 + }
 +
 + if (win_addr  (win_size - 1)) {
 + pr_err(window address is not aligned with window size\n);
 + return -EINVAL;
 + }

Again, use is_power_of_2 instead of hand-coding.

 + if (~stashid != 0)
 + set_bf(paace-impl_attr, PAACE_IA_CID, stashid);
 +
 + smp_wmb();
 +
 + if (enable)
 + paace-addr_bitfields |= PAACE_V_VALID;

Havn't you written a helper funtion to set this bit?

 +irqreturn_t pamu_av_isr(int irq, void *arg)
 +{
 + struct pamu_isr_data *data = arg;
 + phys_addr_t phys;
 + unsigned int i, j;
 +
 + pr_emerg(fsl-pamu: access violation interrupt\n);
 +
 + for (i = 0; i  data-count; i++) {
 + void __iomem *p = data-pamu_reg_base + i * PAMU_OFFSET;
 + u32 pics = in_be32(p + PAMU_PICS);
 +
 + if (pics  PAMU_ACCESS_VIOLATION_STAT) {
 + pr_emerg(POES1=%08x\n, in_be32(p + PAMU_POES1));
 + pr_emerg(POES2=%08x\n, in_be32(p + PAMU_POES2));
 + pr_emerg(AVS1=%08x\n, in_be32(p + PAMU_AVS1));
 + pr_emerg(AVS2=%08x\n, in_be32(p + PAMU_AVS2));
 + pr_emerg(AVA=%016llx\n, make64(in_be32(p + PAMU_AVAH),
 + in_be32(p + PAMU_AVAL)));
 + pr_emerg(UDAD=%08x\n, in_be32(p + PAMU_UDAD));
 + pr_emerg(POEA=%016llx\n, make64(in_be32(p + 
 PAMU_POEAH),
 + in_be32(p + PAMU_POEAL)));
 +
 + phys = make64(in_be32(p + PAMU_POEAH),
 + in_be32(p + PAMU_POEAL));
 +
 + /* Assume that POEA points to a PAACE */
 + if (phys) {
 + u32 *paace = phys_to_virt(phys);
 +
 + /* Only the first four words are relevant */
 + for (j = 0; j  4; j++)
 + pr_emerg(PAACE[%u]=%08x\n, j, 
 in_be32(paace + j));
 + }
 + }
 + }
 +
 + panic(\n);

A kernel panic seems like an over-reaction to an access violation.
Besides the device that caused the violation the system should still
work, no?

 +#define make64(high, low) (((u64)(high)  32) | (low))

You redefined this make64 here.

 +static int map_subwins(int liodn, struct fsl_dma_domain *dma_domain)
 +{
 + struct dma_window *sub_win_ptr =
 + dma_domain-win_arr[0];
 + int i, ret;
 + unsigned long rpn;
 +
 + for (i = 0; i  dma_domain-win_cnt; i++) {
 + if (sub_win_ptr[i].valid) {
 + rpn = sub_win_ptr[i].paddr 
 +  PAMU_PAGE_SHIFT;
 + spin_lock(iommu_lock);

IOMMU code might run in interrupt context, so please use
spin_lock_irqsave for the iommu_lock.

 +static void detach_device(struct device *dev, struct fsl_dma_domain 
 *dma_domain)
 +{
 + struct device_domain_info *info;
 + struct list_head *entry, *tmp;
 + unsigned long flags;
 +
 + spin_lock_irqsave(dma_domain-domain_lock, flags);
 + /* Remove the device from the domain device list */
 + 

Re: [PATCH 0/5 v11] iommu/fsl: Freescale PAMU driver and IOMMU API implementation.

2013-04-02 Thread Joerg Roedel
On Fri, Mar 29, 2013 at 01:23:57AM +0530, Varun Sethi wrote:
 This patchset provides the Freescale PAMU (Peripheral Access Management Unit) 
 driver
 and the corresponding IOMMU API implementation. PAMU is the IOMMU present on 
 Freescale
 QorIQ platforms. PAMU can authorize memory access, remap the memory address, 
 and remap 
 the I/O transaction type.
 
 This set consists of the following patches:
 1.  Make iova dma_addr_t in the iommu_iova_to_phys API.
 2. Addition of new field in the device (powerpc) archdata structure for 
 storing iommu domain information
pointer.
 3. Add window permission flags in the iommu_domain_window_enable API.
 4. Add domain attributes for FSL PAMU driver.
 5. PAMU driver and IOMMU API implementation.

Okay, I am about to apply patches 1 and 3 to a new ppc/pamu branch in my
tree.

As a general question, how did you test the IOMMU driver and what will
you do in the future to avoid regressions?


Joerg


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v4] Quirk for buggy dma source tags with Intel IOMMU.

2013-04-02 Thread Pat Erley

On 04/02/2013 11:47 AM, Pat Erley wrote:

On 04/02/2013 10:50 AM, Andrew Cooks wrote:

On 2 Apr 2013 15:37, Pat Erley pat-l...@erley.org
mailto:pat-l...@erley.org wrote:
 
  On 03/07/2013 09:35 PM, Andrew Cooks wrote:
 
  --- a/drivers/pci/quirks.c
  +++ b/drivers/pci/quirks.c
 
  +/* Table of multiple (ghost) source functions. This is similar to
the
  + * translated sources above, but with the following differences:
  + * 1. the device may use multiple functions as DMA sources,
  + * 2. these functions cannot be assumed to be actual devices,
they're simply
  + * incorrect DMA tags.
  + * 3. the specific ghost function for a request can not always be
predicted.
  + * For example, the actual device could be xx:yy.1 and it could use
  + * both 0 and 1 for different requests, with no obvious way to tell
when
  + * DMA will be tagged as comming from xx.yy.0 and and when it will
be tagged
  + * as comming from xx.yy.1.
  + * The bitmap contains all of the functions used in DMA tags,
including the
  + * actual device.
  + * See https://bugzilla.redhat.com/show_bug.cgi?id=757166,
  + * https://bugzilla.kernel.org/show_bug.cgi?id=42679
  + * https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1089768
  + */
  +static const struct pci_dev_dma_multi_func_sources {
  +   u16 vendor;
  +   u16 device;
  +   u8 func_map;/* bit map. lsb is fn 0. */
  +} pci_dev_dma_multi_func_sources[] = {
  +   { PCI_VENDOR_ID_MARVELL_2, 0x9123, (10)|(11)},
  +   { PCI_VENDOR_ID_MARVELL_2, 0x9125, (10)|(11)},
  +   { PCI_VENDOR_ID_MARVELL_2, 0x9128, (10)|(11)},
  +   { PCI_VENDOR_ID_MARVELL_2, 0x9130, (10)|(11)},
  +   { PCI_VENDOR_ID_MARVELL_2, 0x9143, (10)|(11)},
  +   { PCI_VENDOR_ID_MARVELL_2, 0x9172, (10)|(11)},
  +   { 0 }
  +};
 
 
  Adding another buggy device.  I have a Ricoh multifunction device:
 
  17:00.0 SD Host controller: Ricoh Co Ltd MMC/SD Host Controller
(rev 01)
  17:00.3 FireWire (IEEE 1394): Ricoh Co Ltd R5C832 PCIe IEEE 1394
  Controller (rev 01)
 
  17:00.0 0805: 1180:e822 (rev 01)
  17:00.3 0c00: 1180:e832 (rev 01)
 

The Ricoh device issue has been known for some time and a quirk has been
available since commit 12ea6cad1c7d046 in June 2012.  It's slightly
different than the problem this patch tries to work around [1].


Hmm, I've had this problem with many recent (vanilla) kernels, up to and
including 3.9-rc5


  that adding entries for also fixed booting.  I don't have any SD
cards or firewire devices handy to test that they work, but the system
now boots, which was not the case without your patch and IOMMU/DMAR
enabled.

That is really strange. Could you tell us what kernel version you tested
and provide dmesg output?


I'll capture a vanilla 3.8.5 boot without any patches and iommu=off,
then try to find another machine to catch what I can of a netconsole
boot with iommu=on.  What's the preferred way to send these?  pastebin
links?

I'd been running the 'dirty' fix that's in the redhat bugzilla entry.  I
checked my .config and have CONFIG_PCI_QUIRKS=y, and verified my devices
are in the quirks table for the pci_func_0_dma_source fixup.


   Here's a previous patch used for similar hardware that may also be
fixed by this:
 
 
http://lists.fedoraproject.org/pipermail/scm-commits/2010-October/510785.html

 
  and another thread/bug report this may solve:
 
  https://bugzilla.redhat.com/show_bug.cgi?id=605888

I believe this is referenced in drivers/pci/quirks.c for versions newer
than 3.5.


  Feel free to include me in any future iterations of this patch you'd
like tested.
 
  Tested-By: Pat Erley pat-l...@erley.org mailto:pat-l...@erley.org
 

Thanks for testing!

[1] In the Ricoh case, multiple functions are used for real devices and
the bug is that these devices all use function 0 during DMA. In this
particular case, I'd expect the FireWire device 17:00.3 to issue DMA
from the SD Host Controller address 17:00.0. The quirk is not too much
of a terrible hack - it's a fairly simple translation.

In the Marvell case, the real device uses DMA source tags that don't
actually belong to any visible devices. The quirk to make this work is
more invasive, not nearly as elegant and has not attracted much
enthusiasm from subsystem maintainers, though I'm still hopeful that a
quirk will be merged in some form or another.



Thanks for explaining the difference!

Pat
--


Here are my relevant logs and configs from a vanilla 3.8.5 kernel:

  http://www.erley.org/oops/

* the -nots files have had timestamps stripped for ease of diffing.

* no_iommu_no_fw.txt is a diff of the -nots logs.

* loading_fw.txt is an excerpt of log once I load the firewire-ohci
  module (causing, for all practical purposes, a complete system lock.)

* the .gz of the same name is the 55mb of logs it generated in 36
  seconds.

I was hesitant to send 100k of text to the ML, here is the only 
'interesting' difference in the logs, from my inspection:


-PCI-DMA: Using software bounce buffering for 

RFC: vfio API changes needed for powerpc

2013-04-02 Thread Yoder Stuart-B08248
Alex,

We are in the process of implementing vfio-pci support for the Freescale
IOMMU (PAMU).  It is an aperture/window-based IOMMU and is quite different
than x86, and will involve creating a 'type 2' vfio implementation.

For each device's DMA mappings, PAMU has an overall aperture and a number
of windows.  All sizes and window counts must be power of 2.  To illustrate,
below is a mapping for a 256MB guest, including guest memory (backed by
64MB huge pages) and some windows for MSIs:

Total aperture: 512MB
# of windows: 8

win gphys/
#   iovaphys  size
---   
0   0x  0xX_XX00  64MB
1   0x0400  0xX_XX00  64MB
2   0x0800  0xX_XX00  64MB
3   0x0C00  0xX_XX00  64MB
4   0x1000  0xf_fe044000  4KB// msi bank 1
5   0x1400  0xf_fe045000  4KB// msi bank 2
6   0x1800  0xf_fe046000  4KB// msi bank 3
7- -  disabled

There are a couple of updates needed to the vfio user-kernel interface
that we would like your feedback on.

1.  IOMMU geometry

   The kernel IOMMU driver now has an interface (see domain_set_attr,
   domain_get_attr) that lets us set the domain geometry using
   attributes.

   We want to expose that to user space, so envision needing a couple
   of new ioctls to do this:
VFIO_IOMMU_SET_ATTR
VFIO_IOMMU_GET_ATTR 

2.   MSI window mappings

   The more problematic question is how to deal with MSIs.  We need to
   create mappings for up to 3 MSI banks that a device may need to target
   to generate interrupts.  The Linux MSI driver can allocate MSIs from
   the 3 banks any way it wants, and currently user space has no way of
   knowing which bank may be used for a given device.   

   There are 3 options we have discussed and would like your direction:

   A.  Implicit mappings -- with this approach user space would not
   explicitly map MSIs.  User space would be required to set the
   geometry so that there are 3 unused windows (the last 3 windows)
   for MSIs, and it would be up to the kernel to create the mappings.
   This approach requires some specific semantics (leaving 3 windows)
   and it potentially gets a little weird-- when should the kernel
   actually create the MSI mappings?  When should they be unmapped?
   Some convention would need to be established.

   B.  Explicit mapping using DMA map flags.  The idea is that a new
   flag to DMA map (VFIO_DMA_MAP_FLAG_MSI) would mean that
   a mapping is to be created for the supplied iova.  No vaddr
   is given though.  So in the above example there would be a
   a dma map at 0x1000 for 24KB (and no vaddr).   It's
   up to the kernel to determine which bank gets mapped where.
   So, this option puts user space in control of which windows
   are used for MSIs and when MSIs are mapped/unmapped.   There
   would need to be some semantics as to how this is used-- it
   only makes sense

   C.  Explicit mapping using normal DMA map.  The last idea is that
   we would introduce a new ioctl to give user-space an fd to 
   the MSI bank, which could be mmapped.  The flow would be
   something like this:
  -for each group user space calls new ioctl VFIO_GROUP_GET_MSI_FD
  -user space mmaps the fd, getting a vaddr
  -user space does a normal DMA map for desired iova
   This approach makes everything explicit, but adds a new ioctl
   applicable most likely only to the PAMU (type2 iommu).

Any feedback or direction?

Thanks,
Stuart 



___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH 0/5 v11] iommu/fsl: Freescale PAMU driver and IOMMU API implementation.

2013-04-02 Thread Sethi Varun-B16395


 -Original Message-
 From: Joerg Roedel [mailto:j...@8bytes.org]
 Sent: Tuesday, April 02, 2013 9:53 PM
 To: Sethi Varun-B16395
 Cc: Yoder Stuart-B08248; Wood Scott-B07421; iommu@lists.linux-
 foundation.org; linuxppc-...@lists.ozlabs.org; linux-
 ker...@vger.kernel.org; ga...@kernel.crashing.org;
 b...@kernel.crashing.org
 Subject: Re: [PATCH 0/5 v11] iommu/fsl: Freescale PAMU driver and IOMMU
 API implementation.
 
 On Fri, Mar 29, 2013 at 01:23:57AM +0530, Varun Sethi wrote:
  This patchset provides the Freescale PAMU (Peripheral Access
  Management Unit) driver and the corresponding IOMMU API
  implementation. PAMU is the IOMMU present on Freescale QorIQ
  platforms. PAMU can authorize memory access, remap the memory address,
 and remap the I/O transaction type.
 
  This set consists of the following patches:
  1.  Make iova dma_addr_t in the iommu_iova_to_phys API.
  2. Addition of new field in the device (powerpc) archdata structure for
 storing iommu domain information
 pointer.
  3. Add window permission flags in the iommu_domain_window_enable API.
  4. Add domain attributes for FSL PAMU driver.
  5. PAMU driver and IOMMU API implementation.
 
 Okay, I am about to apply patches 1 and 3 to a new ppc/pamu branch in my
 tree.
 
 As a general question, how did you test the IOMMU driver and what will
 you do in the future to avoid regressions?
 
I use a kernel module for testing the iommu_api support. The module allows for 
dynamic creation and deletion of iommu domains for the devices in the device 
tree. Also, the vfio support (under development) for Freescale SOCs with APMU 
hardware would depend on the PAMU driver.

-Varun


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH V2] iommu/amd: Add logic to decode AMD IOMMU event flag

2013-04-02 Thread suravee.suthikulpanit
From: Suravee Suthikulpanit suravee.suthikulpa...@amd.com

Add logic to decode AMD IOMMU event flag based on information from AMD IOMMU 
specification.
This should simplify debugging IOMMU errors.  Also, dump DTE information in 
additional cases.

Signed-off-by: Suravee Suthikulpanit suravee.suthikulpa...@amd.com
---
Changelog:
V2:
* Fix printing format to reduce noise
* Use string table instead of switch/case
* Use pr_cont instead of printk

 drivers/iommu/amd_iommu.c |  170 -
 1 file changed, 136 insertions(+), 34 deletions(-)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index b287ca3..3362678 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -601,6 +601,93 @@ static void amd_iommu_stats_init(void)
  *
  /
 
+struct _event_log_flags {
+   u32 gn:1, /* 16 */
+   nx:1, /* 17 */
+   us:1, /* 18 */
+i:1, /* 19 */
+   pr:1, /* 20 */
+   rw:1, /* 21 */
+   pe:1, /* 22 */
+   rz:1, /* 23 */
+   tr:1, /* 24 */
+   type:3, /* [27:25] */
+   _reserved_:20; /* Reserved */
+};
+
+static const char * const _type_field_encodings[] = {
+   /* 00 */Reserved,
+   /* 01 */Master Abort,
+   /* 10 */Target Abort,
+   /* 11 */Data Error,
+};
+
+static const char * const _invalid_transaction_desc[] = {
+   /* 000 */Read request or non-posted write in the interrupt 
+addres range,
+   /* 001 */Pretranslated transaction received from an I/O device 
+that has I=0 or V=0 in DTE,
+   /* 010 */Port I/O space transaction received from an I/O device 
+that has IoCtl=00b in DTE,
+   /* 011 */Posted write to invalid address range,
+   /* 100 */Invalid read request or non-posted write,
+   /* 101 */Posted write to the interrupt/EOI range from an I/O 
+device that has IntCtl=00b in DTE,
+   /* 110 */Posted write to a reserved interrupt address range,
+   /* 111 */Invalid transaction to the system management address range,
+};
+
+static const char * const _invalid_translation_desc[] = {
+   /* 000 */Translation request received from an I/O device that has 
+I=0, or has V=0, or has V=1 and TV=0 in DTE,
+   /* 001 */Translation request in invalid address range,
+   /* 010 */Invalid translation request,
+   /* 011 */Reserved,
+   /* 100 */Reserved,
+   /* 101 */Reserved,
+   /* 110 */Reserved,
+   /* 111 */Reserved,
+};
+
+static void dump_flags(int flags, int ev_type)
+{
+   struct _event_log_flags *p = (struct _event_log_flags *) flags;
+   u32 err_type = p-type;
+
+   pr_err(AMD-Vi: Flags details: %s, NX:%u, %s, %s, %s, %s, %s, %s, %s\n,
+   (p-gn ? Use guest address : Use nested address),
+   (p-nx),
+   (p-us ? User privileges : Supervisor privileges),
+   (p-i ? Interrupt request : memory request),
+   (p-pr ? Page present or Interrupt remapped :
+   Page not present or Interrupt blocked),
+   (p-rw ? Write transaction : Read transaction),
+   (p-pe ? Does not have permission : Has permission),
+   (p-rz ? Reserv bit not zero : Illegal level encoding),
+   (p-tr ? Translation request : Transaction request));
+
+   pr_err(AMD-Vi: Type of error: (0x%x) , err_type);
+   if ((ev_type == EVENT_TYPE_DEV_TAB_ERR)  ||
+   (ev_type == EVENT_TYPE_PAGE_TAB_ERR) ||
+   (ev_type == EVENT_TYPE_CMD_HARD_ERR)) {
+   if (err_type  ARRAY_SIZE(_type_field_encodings)) {
+   pr_cont(%s\n,
+   _type_field_encodings[err_type]);
+   }
+   } else if (ev_type == EVENT_TYPE_INV_DEV_REQ) {
+   if (p-tr == 0) {
+   if (err_type  ARRAY_SIZE(_invalid_translation_desc))
+   pr_cont(%s\n,
+   _invalid_translation_desc[err_type]);
+   } else {
+   if (err_type  ARRAY_SIZE(_invalid_transaction_desc))
+   pr_cont(%s\n,
+   _invalid_transaction_desc[err_type]);
+   }
+   }
+   pr_err(AMD-Vi: (Note: Please refer to AMD IOMMU specification for 
details.)\n);
+}
+
 static void dump_dte_entry(u16 devid)
 {
int i;
@@ -619,81 +706,95 @@ static void dump_command(unsigned long phys_addr)
pr_err(AMD-Vi: CMD[%d]: %08x\n, i, cmd-data[i]);
 }
 
-static void iommu_print_event(struct amd_iommu *iommu, void *__evt)
+void amd_iommu_print_event(int type, int devid, int domid,
+ int flags, u64 address)
 {
-   int 

Re: RFC: vfio API changes needed for powerpc

2013-04-02 Thread Alex Williamson
Hi Stuart,

On Tue, 2013-04-02 at 17:32 +, Yoder Stuart-B08248 wrote:
 Alex,
 
 We are in the process of implementing vfio-pci support for the Freescale
 IOMMU (PAMU).  It is an aperture/window-based IOMMU and is quite different
 than x86, and will involve creating a 'type 2' vfio implementation.
 
 For each device's DMA mappings, PAMU has an overall aperture and a number
 of windows.  All sizes and window counts must be power of 2.  To illustrate,
 below is a mapping for a 256MB guest, including guest memory (backed by
 64MB huge pages) and some windows for MSIs:
 
 Total aperture: 512MB
 # of windows: 8
 
 win gphys/
 #   iovaphys  size
 ---   
 0   0x  0xX_XX00  64MB
 1   0x0400  0xX_XX00  64MB
 2   0x0800  0xX_XX00  64MB
 3   0x0C00  0xX_XX00  64MB
 4   0x1000  0xf_fe044000  4KB// msi bank 1
 5   0x1400  0xf_fe045000  4KB// msi bank 2
 6   0x1800  0xf_fe046000  4KB// msi bank 3
 7- -  disabled
 
 There are a couple of updates needed to the vfio user-kernel interface
 that we would like your feedback on.
 
 1.  IOMMU geometry
 
The kernel IOMMU driver now has an interface (see domain_set_attr,
domain_get_attr) that lets us set the domain geometry using
attributes.
 
We want to expose that to user space, so envision needing a couple
of new ioctls to do this:
 VFIO_IOMMU_SET_ATTR
 VFIO_IOMMU_GET_ATTR 

Any ioctls to the vfiofd (/dev/vfio/vfio) not claimed by vfio-core are
passed to the IOMMU driver.  So you can effectively have your own type2
ioctl extensions.  Alexey has already posted patches to do this for
SPAPR that add VFIO_IOMMU_ENABLE/DISABLE to allow him access to
VFIO_IOMMU_GET_INFO to examine locked page requirements.  As Scott notes
we need to come up with a clean userspace interface for these though.

 2.   MSI window mappings
 
The more problematic question is how to deal with MSIs.  We need to
create mappings for up to 3 MSI banks that a device may need to target
to generate interrupts.  The Linux MSI driver can allocate MSIs from
the 3 banks any way it wants, and currently user space has no way of
knowing which bank may be used for a given device.   
 
There are 3 options we have discussed and would like your direction:
 
A.  Implicit mappings -- with this approach user space would not
explicitly map MSIs.  User space would be required to set the
geometry so that there are 3 unused windows (the last 3 windows)
for MSIs, and it would be up to the kernel to create the mappings.
This approach requires some specific semantics (leaving 3 windows)
and it potentially gets a little weird-- when should the kernel
actually create the MSI mappings?  When should they be unmapped?
Some convention would need to be established.

VFIO would have control of SET/GET_ATTR, right?  So we could reduce the
number exposed to userspace on GET and transparently add MSI entries on
SET.  On x86 the interrupt remapper handles this transparently when MSI
is enabled and userspace never gets direct access to the device MSI
address/data registers.  What kind of restrictions do you have around
adding and removing windows while the aperture is enabled?

B.  Explicit mapping using DMA map flags.  The idea is that a new
flag to DMA map (VFIO_DMA_MAP_FLAG_MSI) would mean that
a mapping is to be created for the supplied iova.  No vaddr
is given though.  So in the above example there would be a
a dma map at 0x1000 for 24KB (and no vaddr).   It's
up to the kernel to determine which bank gets mapped where.
So, this option puts user space in control of which windows
are used for MSIs and when MSIs are mapped/unmapped.   There
would need to be some semantics as to how this is used-- it
only makes sense

This could also be done as another type2 ioctl extension.  What's the
value to userspace in determining which windows are used by which banks?
It sounds like the case that there are X banks and if userspace wants to
use MSI it needs to leave X windows available for that.  Is this just
buying userspace a few more windows to allow them the choice between MSI
or RAM?

C.  Explicit mapping using normal DMA map.  The last idea is that
we would introduce a new ioctl to give user-space an fd to 
the MSI bank, which could be mmapped.  The flow would be
something like this:
   -for each group user space calls new ioctl VFIO_GROUP_GET_MSI_FD
   -user space mmaps the fd, getting a vaddr
   -user space does a normal DMA map for desired iova
This approach makes everything explicit, but adds a new ioctl
applicable most likely only to the PAMU (type2 iommu).

And the DMA_MAP of that mmap then allows 

Re: RFC: vfio API changes needed for powerpc

2013-04-02 Thread Stuart Yoder
On Tue, Apr 2, 2013 at 2:39 PM, Scott Wood scottw...@freescale.com wrote:
 On 04/02/2013 12:32:00 PM, Yoder Stuart-B08248 wrote:

 Alex,

 We are in the process of implementing vfio-pci support for the Freescale
 IOMMU (PAMU).  It is an aperture/window-based IOMMU and is quite different
 than x86, and will involve creating a 'type 2' vfio implementation.

 For each device's DMA mappings, PAMU has an overall aperture and a number
 of windows.  All sizes and window counts must be power of 2.  To
 illustrate,
 below is a mapping for a 256MB guest, including guest memory (backed by
 64MB huge pages) and some windows for MSIs:

 Total aperture: 512MB
 # of windows: 8

 win gphys/
 #   iovaphys  size
 ---   
 0   0x  0xX_XX00  64MB
 1   0x0400  0xX_XX00  64MB
 2   0x0800  0xX_XX00  64MB
 3   0x0C00  0xX_XX00  64MB
 4   0x1000  0xf_fe044000  4KB// msi bank 1
 5   0x1400  0xf_fe045000  4KB// msi bank 2
 6   0x1800  0xf_fe046000  4KB// msi bank 3
 7- -  disabled

 There are a couple of updates needed to the vfio user-kernel interface
 that we would like your feedback on.

 1.  IOMMU geometry

The kernel IOMMU driver now has an interface (see domain_set_attr,
domain_get_attr) that lets us set the domain geometry using
attributes.

We want to expose that to user space, so envision needing a couple
of new ioctls to do this:
 VFIO_IOMMU_SET_ATTR
 VFIO_IOMMU_GET_ATTR


 Note that this means attributes need to be updated for user-API
 appropriateness, such as using fixed-size types.


 2.   MSI window mappings

The more problematic question is how to deal with MSIs.  We need to
create mappings for up to 3 MSI banks that a device may need to target
to generate interrupts.  The Linux MSI driver can allocate MSIs from
the 3 banks any way it wants, and currently user space has no way of
knowing which bank may be used for a given device.

There are 3 options we have discussed and would like your direction:

A.  Implicit mappings -- with this approach user space would not
explicitly map MSIs.  User space would be required to set the
geometry so that there are 3 unused windows (the last 3 windows)


 Where does userspace get the number 3 from?  E.g. on newer chips there are
 4 MSI banks.  Maybe future chips have even more.

Ok, then make the number 4.   The chance of more MSI banks in future chips
is nil, and if it ever happened user space could adjust.  Also,
practically speaking since memory is typically allocate in powers of
2 way you need to approximately double the window geometry anyway.

B.  Explicit mapping using DMA map flags.  The idea is that a new
flag to DMA map (VFIO_DMA_MAP_FLAG_MSI) would mean that
a mapping is to be created for the supplied iova.  No vaddr
is given though.  So in the above example there would be a
a dma map at 0x1000 for 24KB (and no vaddr).


 A single 24 KiB mapping wouldn't work (and why 24KB? What if only one MSI
 group is involved in this VFIO group?  What if four MSI groups are
 involved?).  You'd need to either have a naturally aligned, power-of-two
 sized mapping that covers exactly the pages you want to map and no more, or
 you'd need to create a separate mapping for each MSI bank, and due to PAMU
 subwindow alignment restrictions these mappings could not be contiguous in
 iova-space.

You're right, a single 24KB mapping wouldn't work--  in the case of 3 MSI banks
perhaps we could just do one 64MB*3 mapping to identify which windows
are used for MSIs.

If only one MSI bank was involved the kernel could get clever and only enable
the banks actually needed.

Stuart
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: RFC: vfio API changes needed for powerpc

2013-04-02 Thread Stuart Yoder
On Tue, Apr 2, 2013 at 3:32 PM, Alex Williamson
alex.william...@redhat.com wrote:
 2.   MSI window mappings

The more problematic question is how to deal with MSIs.  We need to
create mappings for up to 3 MSI banks that a device may need to target
to generate interrupts.  The Linux MSI driver can allocate MSIs from
the 3 banks any way it wants, and currently user space has no way of
knowing which bank may be used for a given device.

There are 3 options we have discussed and would like your direction:

A.  Implicit mappings -- with this approach user space would not
explicitly map MSIs.  User space would be required to set the
geometry so that there are 3 unused windows (the last 3 windows)
for MSIs, and it would be up to the kernel to create the mappings.
This approach requires some specific semantics (leaving 3 windows)
and it potentially gets a little weird-- when should the kernel
actually create the MSI mappings?  When should they be unmapped?
Some convention would need to be established.

 VFIO would have control of SET/GET_ATTR, right?  So we could reduce the
 number exposed to userspace on GET and transparently add MSI entries on
 SET.

The number of windows is always power of 2 (and max is 256).  And to reduce
PAMU cache pressure you want to use the fewest number of windows
you can.So, I don't see practically how we could transparently
steal entries to
add the MSIs. Either user space knows to leave empty windows for
MSIs and by convention the kernel knows which windows those are (as
in option #A) or explicitly tell the kernel which windows (as in option #B).

 On x86 the interrupt remapper handles this transparently when MSI
 is enabled and userspace never gets direct access to the device MSI
 address/data registers.  What kind of restrictions do you have around
 adding and removing windows while the aperture is enabled?

The windows can be enabled/disabled event while the aperture is
enabled (pretty sure)...

B.  Explicit mapping using DMA map flags.  The idea is that a new
flag to DMA map (VFIO_DMA_MAP_FLAG_MSI) would mean that
a mapping is to be created for the supplied iova.  No vaddr
is given though.  So in the above example there would be a
a dma map at 0x1000 for 24KB (and no vaddr).   It's
up to the kernel to determine which bank gets mapped where.
So, this option puts user space in control of which windows
are used for MSIs and when MSIs are mapped/unmapped.   There
would need to be some semantics as to how this is used-- it
only makes sense

 This could also be done as another type2 ioctl extension.  What's the
 value to userspace in determining which windows are used by which banks?
 It sounds like the case that there are X banks and if userspace wants to
 use MSI it needs to leave X windows available for that.  Is this just
 buying userspace a few more windows to allow them the choice between MSI
 or RAM?

Yes, it would potentially give user space the flexibility some more windows.
It also makes more explicit when the MSI mappings are created.  In option
#A the MSI mappings would probably get created at the time of the first
normal DMA map.

So, you're saying with this approach you'd rather see a new type 2
ioctl instead of adding new flags to DMA map, right?

C.  Explicit mapping using normal DMA map.  The last idea is that
we would introduce a new ioctl to give user-space an fd to
the MSI bank, which could be mmapped.  The flow would be
something like this:
   -for each group user space calls new ioctl VFIO_GROUP_GET_MSI_FD
   -user space mmaps the fd, getting a vaddr
   -user space does a normal DMA map for desired iova
This approach makes everything explicit, but adds a new ioctl
applicable most likely only to the PAMU (type2 iommu).

 And the DMA_MAP of that mmap then allows userspace to select the window
 used?  This one seems like a lot of overhead, adding a new ioctl, new
 fd, mmap, special mapping path, etc.  It would be less overhead to just
 add an ioctl to enable MSI, maybe letting userspace pick which windows
 get used, but I'm still not sure what the value is to userspace in
 exposing it.  Thanks,

Thanks,
Stuart
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: RFC: vfio API changes needed for powerpc

2013-04-02 Thread Scott Wood

On 04/02/2013 03:32:17 PM, Alex Williamson wrote:

On Tue, 2013-04-02 at 17:32 +, Yoder Stuart-B08248 wrote:
 2.   MSI window mappings

The more problematic question is how to deal with MSIs.  We need  
to
create mappings for up to 3 MSI banks that a device may need to  
target
to generate interrupts.  The Linux MSI driver can allocate MSIs  
from
the 3 banks any way it wants, and currently user space has no  
way of

knowing which bank may be used for a given device.

There are 3 options we have discussed and would like your  
direction:


A.  Implicit mappings -- with this approach user space would not
explicitly map MSIs.  User space would be required to set the
geometry so that there are 3 unused windows (the last 3  
windows)
for MSIs, and it would be up to the kernel to create the  
mappings.
This approach requires some specific semantics (leaving 3  
windows)
and it potentially gets a little weird-- when should the  
kernel
actually create the MSI mappings?  When should they be  
unmapped?

Some convention would need to be established.

VFIO would have control of SET/GET_ATTR, right?  So we could reduce  
the
number exposed to userspace on GET and transparently add MSI entries  
on

SET.


What do you mean by reduce the number exposed?  Userspace decides how  
many entries there are, but it must be a power of two beteen 1 and 256.



On x86 the interrupt remapper handles this transparently when MSI
is enabled and userspace never gets direct access to the device MSI
address/data registers.


x86 has a totally different mechanism here, as far as I understand --  
even before you get into restrictions on mappings.



What kind of restrictions do you have around
adding and removing windows while the aperture is enabled?


Subwindows can be modified while the aperture is enabled, but the  
aperture size and number of subwindows cannot be changed.



B.  Explicit mapping using DMA map flags.  The idea is that a new
flag to DMA map (VFIO_DMA_MAP_FLAG_MSI) would mean that
a mapping is to be created for the supplied iova.  No vaddr
is given though.  So in the above example there would be a
a dma map at 0x1000 for 24KB (and no vaddr).   It's
up to the kernel to determine which bank gets mapped where.
So, this option puts user space in control of which windows
are used for MSIs and when MSIs are mapped/unmapped.   There
would need to be some semantics as to how this is used-- it
only makes sense

This could also be done as another type2 ioctl extension.


Again, what is type2, specifically?  If someone else is adding their  
own IOMMU that is kind of, sort of like PAMU, how would they know if  
it's close enough?  What assumptions can a user make when they see that  
they're dealing with type2?


What's the value to userspace in determining which windows are used  
by which banks?


That depends on who programs the MSI config space address.  What is  
important is userspace controlling which iovas will be dedicated to  
this, in case it wants to put something else there.


It sounds like the case that there are X banks and if userspace wants  
to

use MSI it needs to leave X windows available for that.  Is this just
buying userspace a few more windows to allow them the choice between  
MSI

or RAM?


Well, there could be that.  But also, userspace will generally have a  
much better idea of the type of mappings it's creating, so it's easier  
to keep everything explicit at the kernel/user interface than require  
more complicated code in the kernel to figure things out automatically  
(not just for MSIs but in general).


If the kernel automatically creates the MSI mappings, when does it  
assume that userspace is done creating its own?  What if userspace  
doesn't need any DMA other than the MSIs?  What if userspace wants to  
continue dynamically modifying its other mappings?



C.  Explicit mapping using normal DMA map.  The last idea is that
we would introduce a new ioctl to give user-space an fd to
the MSI bank, which could be mmapped.  The flow would be
something like this:
   -for each group user space calls new ioctl  
VFIO_GROUP_GET_MSI_FD

   -user space mmaps the fd, getting a vaddr
   -user space does a normal DMA map for desired iova
This approach makes everything explicit, but adds a new ioctl
applicable most likely only to the PAMU (type2 iommu).

And the DMA_MAP of that mmap then allows userspace to select the  
window

used?  This one seems like a lot of overhead, adding a new ioctl, new
fd, mmap, special mapping path, etc.


There's going to be special stuff no matter what.  This would keep it  
separated from the IOMMU map code.


I'm not sure what you mean by overhead here... the runtime overhead  
of setting things up is not particularly relevant as long 

Re: RFC: vfio API changes needed for powerpc

2013-04-02 Thread Stuart Yoder
On Tue, Apr 2, 2013 at 3:47 PM, Scott Wood scottw...@freescale.com wrote:
 On 04/02/2013 03:38:42 PM, Stuart Yoder wrote:

 On Tue, Apr 2, 2013 at 2:39 PM, Scott Wood scottw...@freescale.com
 wrote:
  On 04/02/2013 12:32:00 PM, Yoder Stuart-B08248 wrote:
 
  Alex,
 
  We are in the process of implementing vfio-pci support for the
  Freescale
  IOMMU (PAMU).  It is an aperture/window-based IOMMU and is quite
  different
  than x86, and will involve creating a 'type 2' vfio implementation.
 
  For each device's DMA mappings, PAMU has an overall aperture and a
  number
  of windows.  All sizes and window counts must be power of 2.  To
  illustrate,
  below is a mapping for a 256MB guest, including guest memory (backed by
  64MB huge pages) and some windows for MSIs:
 
  Total aperture: 512MB
  # of windows: 8
 
  win gphys/
  #   iovaphys  size
  ---   
  0   0x  0xX_XX00  64MB
  1   0x0400  0xX_XX00  64MB
  2   0x0800  0xX_XX00  64MB
  3   0x0C00  0xX_XX00  64MB
  4   0x1000  0xf_fe044000  4KB// msi bank 1
  5   0x1400  0xf_fe045000  4KB// msi bank 2
  6   0x1800  0xf_fe046000  4KB// msi bank 3
  7- -  disabled
 
  There are a couple of updates needed to the vfio user-kernel interface
  that we would like your feedback on.
 
  1.  IOMMU geometry
 
 The kernel IOMMU driver now has an interface (see domain_set_attr,
 domain_get_attr) that lets us set the domain geometry using
 attributes.
 
 We want to expose that to user space, so envision needing a couple
 of new ioctls to do this:
  VFIO_IOMMU_SET_ATTR
  VFIO_IOMMU_GET_ATTR
 
 
  Note that this means attributes need to be updated for user-API
  appropriateness, such as using fixed-size types.
 
 
  2.   MSI window mappings
 
 The more problematic question is how to deal with MSIs.  We need to
 create mappings for up to 3 MSI banks that a device may need to
  target
 to generate interrupts.  The Linux MSI driver can allocate MSIs from
 the 3 banks any way it wants, and currently user space has no way of
 knowing which bank may be used for a given device.
 
 There are 3 options we have discussed and would like your direction:
 
 A.  Implicit mappings -- with this approach user space would not
 explicitly map MSIs.  User space would be required to set the
 geometry so that there are 3 unused windows (the last 3 windows)
 
 
  Where does userspace get the number 3 from?  E.g. on newer chips there
  are
  4 MSI banks.  Maybe future chips have even more.

 Ok, then make the number 4.   The chance of more MSI banks in future chips
 is nil,


 What makes you so sure?  Especially since you seem to be presenting this as
 not specifically an MPIC API.


 and if it ever happened user space could adjust.


 What bit of API is going to tell it that it needs to adjust?

Haven't thought through that completely, but I guess we could add an API
to return the number of MSI banks for type 2 iommus.

 Also, practically speaking since memory is typically allocate in powers of
 2 way you need to approximately double the window geometry anyway.


 Only if your existing mapping needs fit exactly in a power of two.


 B.  Explicit mapping using DMA map flags.  The idea is that a new
 flag to DMA map (VFIO_DMA_MAP_FLAG_MSI) would mean that
 a mapping is to be created for the supplied iova.  No vaddr
 is given though.  So in the above example there would be a
 a dma map at 0x1000 for 24KB (and no vaddr).
 
 
  A single 24 KiB mapping wouldn't work (and why 24KB? What if only one
  MSI
  group is involved in this VFIO group?  What if four MSI groups are
  involved?).  You'd need to either have a naturally aligned, power-of-two
  sized mapping that covers exactly the pages you want to map and no more,
  or
  you'd need to create a separate mapping for each MSI bank, and due to
  PAMU
  subwindow alignment restrictions these mappings could not be contiguous
  in
  iova-space.

 You're right, a single 24KB mapping wouldn't work--  in the case of 3 MSI
 banks
 perhaps we could just do one 64MB*3 mapping to identify which windows
 are used for MSIs.


 Where did the assumption of a 64MiB subwindow size come from?

The example I was using.   User space would need to create a
mapping for window_size * msi_bank_count.

Stuart
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 1/3] iommu/amd: Add logic to decode AMD IOMMU event flag

2013-04-02 Thread Joerg Roedel
On Tue, Apr 02, 2013 at 09:32:40PM +0200, Borislav Petkov wrote:
 On Tue, Apr 02, 2013 at 06:33:18PM +0200, Joerg Roedel wrote:

  Okay, in theory I could implement a feedback loop between timer-setup
  and intremap code and try fixups until it works. But that seems not to
  be worth it to work around a buggy BIOS.
 
 Yeah, same here. It's not like we really need intremap to work - we're
 only trying to fix the annoying error message currently. :-)

Right :)

  What I actually thought about was providing an IVRS-override on the
  kernel command line. So that you can specify the IOAPIC_ID-DEVID
  mapping there and make it work this way. What do you think?
 
 I guess that is workable. I can imagine people wanting this if they want
 to do the intremap thing on such b0rked BIOSen. So how do I specify this
 IOAPIC_ID-DEVID mapping on the cmdline exactly?


Don't know yet, probably something like ivrs_ioapic[apicid]=0:14.2 and
ivrs_hpet[hpet-id]=0:14.2. But not entierly sure yet, at least parsing
shouldn't require lex and yacc ;)

  Yeah, that's my fear too. So we leave it better as it is...
 
 Hohumm.

Thus shall it be!


Joerg


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: RFC: vfio API changes needed for powerpc

2013-04-02 Thread Scott Wood

On 04/02/2013 03:38:42 PM, Stuart Yoder wrote:
On Tue, Apr 2, 2013 at 2:39 PM, Scott Wood scottw...@freescale.com  
wrote:

 On 04/02/2013 12:32:00 PM, Yoder Stuart-B08248 wrote:

 Alex,

 We are in the process of implementing vfio-pci support for the  
Freescale
 IOMMU (PAMU).  It is an aperture/window-based IOMMU and is quite  
different

 than x86, and will involve creating a 'type 2' vfio implementation.

 For each device's DMA mappings, PAMU has an overall aperture and a  
number

 of windows.  All sizes and window counts must be power of 2.  To
 illustrate,
 below is a mapping for a 256MB guest, including guest memory  
(backed by

 64MB huge pages) and some windows for MSIs:

 Total aperture: 512MB
 # of windows: 8

 win gphys/
 #   iovaphys  size
 ---   
 0   0x  0xX_XX00  64MB
 1   0x0400  0xX_XX00  64MB
 2   0x0800  0xX_XX00  64MB
 3   0x0C00  0xX_XX00  64MB
 4   0x1000  0xf_fe044000  4KB// msi bank 1
 5   0x1400  0xf_fe045000  4KB// msi bank 2
 6   0x1800  0xf_fe046000  4KB// msi bank 3
 7- -  disabled

 There are a couple of updates needed to the vfio user-kernel  
interface

 that we would like your feedback on.

 1.  IOMMU geometry

The kernel IOMMU driver now has an interface (see  
domain_set_attr,

domain_get_attr) that lets us set the domain geometry using
attributes.

We want to expose that to user space, so envision needing a  
couple

of new ioctls to do this:
 VFIO_IOMMU_SET_ATTR
 VFIO_IOMMU_GET_ATTR


 Note that this means attributes need to be updated for user-API
 appropriateness, such as using fixed-size types.


 2.   MSI window mappings

The more problematic question is how to deal with MSIs.  We  
need to
create mappings for up to 3 MSI banks that a device may need to  
target
to generate interrupts.  The Linux MSI driver can allocate MSIs  
from
the 3 banks any way it wants, and currently user space has no  
way of

knowing which bank may be used for a given device.

There are 3 options we have discussed and would like your  
direction:


A.  Implicit mappings -- with this approach user space would not
explicitly map MSIs.  User space would be required to set  
the
geometry so that there are 3 unused windows (the last 3  
windows)



 Where does userspace get the number 3 from?  E.g. on newer chips  
there are

 4 MSI banks.  Maybe future chips have even more.

Ok, then make the number 4.   The chance of more MSI banks in future  
chips

is nil,


What makes you so sure?  Especially since you seem to be presenting  
this as not specifically an MPIC API.



and if it ever happened user space could adjust.


What bit of API is going to tell it that it needs to adjust?

Also, practically speaking since memory is typically allocate in  
powers of

2 way you need to approximately double the window geometry anyway.


Only if your existing mapping needs fit exactly in a power of two.

B.  Explicit mapping using DMA map flags.  The idea is that a  
new

flag to DMA map (VFIO_DMA_MAP_FLAG_MSI) would mean that
a mapping is to be created for the supplied iova.  No vaddr
is given though.  So in the above example there would be a
a dma map at 0x1000 for 24KB (and no vaddr).


 A single 24 KiB mapping wouldn't work (and why 24KB? What if only  
one MSI

 group is involved in this VFIO group?  What if four MSI groups are
 involved?).  You'd need to either have a naturally aligned,  
power-of-two
 sized mapping that covers exactly the pages you want to map and no  
more, or
 you'd need to create a separate mapping for each MSI bank, and due  
to PAMU
 subwindow alignment restrictions these mappings could not be  
contiguous in

 iova-space.

You're right, a single 24KB mapping wouldn't work--  in the case of 3  
MSI banks

perhaps we could just do one 64MB*3 mapping to identify which windows
are used for MSIs.


Where did the assumption of a 64MiB subwindow size come from?

If only one MSI bank was involved the kernel could get clever and  
only enable

the banks actually needed.


I'd rather see cleverness kept in userspace.

-Scott
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: RFC: vfio API changes needed for powerpc

2013-04-02 Thread Alex Williamson
On Tue, 2013-04-02 at 16:08 -0500, Stuart Yoder wrote:
 On Tue, Apr 2, 2013 at 3:57 PM, Scott Wood scottw...@freescale.com wrote:
  C.  Explicit mapping using normal DMA map.  The last idea is that
  we would introduce a new ioctl to give user-space an fd to
  the MSI bank, which could be mmapped.  The flow would be
  something like this:
 -for each group user space calls new ioctl
   VFIO_GROUP_GET_MSI_FD
 -user space mmaps the fd, getting a vaddr
 -user space does a normal DMA map for desired iova
  This approach makes everything explicit, but adds a new ioctl
  applicable most likely only to the PAMU (type2 iommu).
 
  And the DMA_MAP of that mmap then allows userspace to select the window
  used?  This one seems like a lot of overhead, adding a new ioctl, new
  fd, mmap, special mapping path, etc.
 
 
  There's going to be special stuff no matter what.  This would keep it
  separated from the IOMMU map code.
 
  I'm not sure what you mean by overhead here... the runtime overhead of
  setting things up is not particularly relevant as long as it's reasonable.
  If you mean development and maintenance effort, keeping things well
  separated should help.
 
 We don't need to change DMA_MAP.  If we can simply add a new type 2
 ioctl that allows user space to set which windows are MSIs, it seems vastly
 less complex than an ioctl to supply a new fd, mmap of it, etc.
 
 So maybe 2 ioctls:
 VFIO_IOMMU_GET_MSI_COUNT
 VFIO_IOMMU_MAP_MSI(iova, size)
 

How are MSIs related to devices on PAMU?  On x86 MSI count is very
device specific, which means it wold be a VFIO_DEVICE_* ioctl (actually
VFIO_DEVICE_GET_IRQ_INFO does this for us on x86).  The trouble with it
being a device ioctl is that you need to get the device FD, but the
IOMMU protection needs to be established before you can get that... so
there's an ordering problem if you need it from the device before
configuring the IOMMU.  Thanks,

Alex

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: RFC: vfio API changes needed for powerpc

2013-04-02 Thread Scott Wood

On 04/02/2013 04:08:27 PM, Stuart Yoder wrote:
On Tue, Apr 2, 2013 at 3:57 PM, Scott Wood scottw...@freescale.com  
wrote:

 This could also be done as another type2 ioctl extension.


 Again, what is type2, specifically?  If someone else is adding  
their own
 IOMMU that is kind of, sort of like PAMU, how would they know if  
it's close
 enough?  What assumptions can a user make when they see that  
they're dealing

 with type2?

We will define that as part of the type2 implementation.   Highly  
unlikely

anything but a PAMU will comply.


So then why not just call it pamu instead of being obfuscatory?

 There's going to be special stuff no matter what.  This would keep  
it

 separated from the IOMMU map code.

 I'm not sure what you mean by overhead here... the runtime  
overhead of
 setting things up is not particularly relevant as long as it's  
reasonable.

 If you mean development and maintenance effort, keeping things well
 separated should help.

We don't need to change DMA_MAP.  If we can simply add a new type 2
ioctl that allows user space to set which windows are MSIs,


And what specifically does that ioctl do?  It causes new mappings to be  
created, right?  So you're changing (or at least adding to) the DMA map  
mechanism.


it seems vastly less complex than an ioctl to supply a new fd, mmap  
of it, etc.


I don't see enough complexity in the mmap approach for anything to be  
vastly less complex in comparison.  I think you're building the mmap  
approach up in your head to be a lot worse that it would actually be.


-Scott
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH V2] iommu/amd: Add logic to decode AMD IOMMU event flag

2013-04-02 Thread Joerg Roedel
On Tue, Apr 02, 2013 at 02:05:14PM -0500, Suthikulpanit, Suravee wrote:
 From: Suravee Suthikulpanit suravee.suthikulpa...@amd.com
 +static const char * const _type_field_encodings[] = {
 + /* 00 */Reserved,
 + /* 01 */Master Abort,
 + /* 10 */Target Abort,
 + /* 11 */Data Error,

In these arrays, please put the comments at the end of the line and
align them nicely like this:

static const char * const _type_field_encodings[] = {
Reserved, /* 00 */
Master Abort, /* 01 */
Target Abort, /* 10 */
Data Error,   /* 11 */
};

This improves readability a lot.

 +static void dump_flags(int flags, int ev_type)
 +{
 + struct _event_log_flags *p = (struct _event_log_flags *) flags;
 + u32 err_type = p-type;
 +
 + pr_err(AMD-Vi: Flags details: %s, NX:%u, %s, %s, %s, %s, %s, %s, %s\n,
 + (p-gn ? Use guest address : Use nested address),
 + (p-nx),
 + (p-us ? User privileges : Supervisor privileges),
 + (p-i ? Interrupt request : memory request),
 + (p-pr ? Page present or Interrupt remapped :
 + Page not present or Interrupt blocked),
 + (p-rw ? Write transaction : Read transaction),
 + (p-pe ? Does not have permission : Has permission),
 + (p-rz ? Reserv bit not zero : Illegal level encoding),
 + (p-tr ? Translation request : Transaction request));

These strings are too long, please just print useful shortcuts for them,
like
US : SV instead of
User privileges : Supervisor privileges

Also the commas between the strings are not needed then.

As a general rule, in the default configuration an event-log entry
should not take more than a single line in dmesg (which is also not too
long). If you want to print more information in some cases you can
enable that by a command line parameter like 'amd_iommu=verbose' or
something like that.

Joerg

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: RFC: vfio API changes needed for powerpc

2013-04-02 Thread Scott Wood

On 04/02/2013 04:16:11 PM, Alex Williamson wrote:

On Tue, 2013-04-02 at 15:54 -0500, Stuart Yoder wrote:
 The number of windows is always power of 2 (and max is 256).  And  
to reduce

 PAMU cache pressure you want to use the fewest number of windows
 you can.So, I don't see practically how we could transparently
 steal entries to
 add the MSIs. Either user space knows to leave empty windows for
 MSIs and by convention the kernel knows which windows those are (as
 in option #A) or explicitly tell the kernel which windows (as in  
option #B).


Ok, apparently I don't understand the API.  Is it something like
userspace calls GET_ATTR and finds out that there are 256 available
windows, userspace determines that it needs 8 for RAM and then it has  
an

MSI device, so it needs to call SET_ATTR and ask for 16?  That seems
prone to exploitation by the first userspace to allocate it's  
aperture,


What exploitation?

It's not as if there is a pool of 256 global windows that users  
allocate from.  The subwindow count is just how finely divided the  
aperture is.  The only way one user will affect another is through  
cache contention (which is why we want the minimum number of subwindows  
that we can get away with).



but I'm also not sure why userspace could specify the (non-power of 2)
number of windows it needs for RAM, then VFIO would see that the  
devices

attached have MSI and add those windows and align to a power of 2.


If you double the subwindow count without userspace knowing, you have  
to double the aperture as well (and you may need to grow up or down  
depending on alignment).  This means you also need to halve the maximum  
aperture that userspace can request.  And you need to expose a  
different number of maximum subwindows in the IOMMU API based on  
whether we might have MSIs of this type.  It's ugly and awkward, and  
removes the possibility for userspace to place the MSIs in some unused  
slot in the middle, or not use MSIs at all.


-Scott
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: RFC: vfio API changes needed for powerpc

2013-04-02 Thread Scott Wood

On 04/02/2013 04:38:45 PM, Alex Williamson wrote:

On Tue, 2013-04-02 at 16:08 -0500, Stuart Yoder wrote:
 On Tue, Apr 2, 2013 at 3:57 PM, Scott Wood  
scottw...@freescale.com wrote:
  C.  Explicit mapping using normal DMA map.  The last idea  
is that
  we would introduce a new ioctl to give user-space an fd  
to

  the MSI bank, which could be mmapped.  The flow would be
  something like this:
 -for each group user space calls new ioctl
   VFIO_GROUP_GET_MSI_FD
 -user space mmaps the fd, getting a vaddr
 -user space does a normal DMA map for desired iova
  This approach makes everything explicit, but adds a new  
ioctl

  applicable most likely only to the PAMU (type2 iommu).
 
  And the DMA_MAP of that mmap then allows userspace to select the  
window
  used?  This one seems like a lot of overhead, adding a new  
ioctl, new

  fd, mmap, special mapping path, etc.
 
 
  There's going to be special stuff no matter what.  This would  
keep it

  separated from the IOMMU map code.
 
  I'm not sure what you mean by overhead here... the runtime  
overhead of
  setting things up is not particularly relevant as long as it's  
reasonable.
  If you mean development and maintenance effort, keeping things  
well

  separated should help.

 We don't need to change DMA_MAP.  If we can simply add a new type  
2
 ioctl that allows user space to set which windows are MSIs, it  
seems vastly

 less complex than an ioctl to supply a new fd, mmap of it, etc.

 So maybe 2 ioctls:
 VFIO_IOMMU_GET_MSI_COUNT


Do you mean a count of actual MSIs or a count of MSI banks used by the  
whole VFIO group?



 VFIO_IOMMU_MAP_MSI(iova, size)


Not sure how you mean size to be used -- for MPIC it would be 4K per  
bank, and you can only map one bank at a time (which bank you're  
mapping should be a parameter, if only so that the kernel doesn't have  
to keep iteration state for you).



How are MSIs related to devices on PAMU?


PAMU doesn't care about MSIs.  The relation of individual MSIs to a  
device is standard PCI stuff.  Each MSI bank (which is part of the  
MPIC, not PAMU) can hold numerous MSIs.  The VFIO user would want to  
map all MSI banks that are in use by any of the devices in the group.   
Ideally we'd let the VFIO grouping influence the allocation of MSIs.



On x86 MSI count is very
device specific, which means it wold be a VFIO_DEVICE_* ioctl  
(actually
VFIO_DEVICE_GET_IRQ_INFO does this for us on x86).  The trouble with  
it

being a device ioctl is that you need to get the device FD, but the
IOMMU protection needs to be established before you can get that... so
there's an ordering problem if you need it from the device before
configuring the IOMMU.  Thanks,


What do you mean by IOMMU protection needs to be established?   
Wouldn't we just start with no mappings in place?


-Scott
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH V3] iommu/amd: Add logic to decode AMD IOMMU event flag

2013-04-02 Thread suravee.suthikulpanit
From: Suravee Suthikulpanit suravee.suthikulpa...@amd.com

Add logic to decode AMD IOMMU event flag based on information from AMD IOMMU 
specification.
This should simplify debugging IOMMU errors.  Also, dump DTE information in 
additional cases.

This is an example:
AMD-Vi: Event logged [IO_PAGE_FAULT device=51:00.0 domain=0x 
address=0x flags=0x0fff]
AMD-Vi: Flags details: Guest NX=1 User Intr Present Write No-Perm Rsrv-Bit 
Translation
AMD-Vi: Type of error: (0x7) 
AMD-Vi: (Note: Please refer to AMD IOMMU specification for details.)
AMD-Vi: DTE[0]: 603fa75e2403
AMD-Vi: DTE[1]: 0014
AMD-Vi: DTE[2]: 203fa5e09011
AMD-Vi: DTE[3]: 

Signed-off-by: Suravee Suthikulpanit suravee.suthikulpa...@amd.com
---
Changelog:
V3:
* Move comments to end of line
* Shorten the print out to be within one line
V2:
* Fix printing format to reduce noise
* Use string table instead of switch/case
* Use pr_cont instead of printk

 drivers/iommu/amd_iommu.c |  171 -
 1 file changed, 137 insertions(+), 34 deletions(-)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index b287ca3..593a1a3 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -601,6 +601,94 @@ static void amd_iommu_stats_init(void)
  *
  /
 
+struct _event_log_flags {
+   u32 gn:1,   /* 16 */
+   nx:1,   /* 17 */
+   us:1,   /* 18 */
+i:1,   /* 19 */
+   pr:1,   /* 20 */
+   rw:1,   /* 21 */
+   pe:1,   /* 22 */
+   rz:1,   /* 23 */
+   tr:1,   /* 24 */
+   type:3, /* [27:25] */
+   _reserved_:20;  /* Reserved */
+};
+
+static const char * const _type_field_encodings[] = {
+   Reserved, /* 00 */
+   Master Abort, /* 01 */
+   Target Abort, /* 10 */
+   Data Error,   /* 11 */
+};
+
+static const char * const _invalid_transaction_desc[] = {
+   Read request or non-posted write in the interrupt 
+addres range,/* 000 */
+   Pretranslated transaction received from an 
+   I/O device that has I=0 or V=0 in DTE,/* 001 */
+   Port I/O space transaction received from an 
+   I/O device that has IoCtl=00b in DTE, /* 010 */
+   Posted write to invalid address range,/* 011 */
+   Invalid read request or non-posted write, /* 100 */
+   Posted write to the interrupt/EOI range from an 
+   I/O device that has IntCtl=00b in DTE,/* 101 */
+   Posted write to a reserved interrupt address range,   /* 110 */
+   Invalid transaction to the system management 
+   address range,/* 111 */
+};
+
+static const char * const _invalid_translation_desc[] = {
+   Translation request received from an I/O device 
+   that has I=0, or has V=0, or has V=1 and 
+   TV=0 in DTE,  /* 000 */
+   Translation request in invalid address range, /* 001 */
+   Invalid translation request,  /* 010 */
+   Reserved, /* 011 */
+   Reserved, /* 100 */
+   Reserved, /* 101 */
+   Reserved, /* 110 */
+   Reserved, /* 111 */
+};
+
+static void dump_flags(int flags, int ev_type)
+{
+   struct _event_log_flags *p = (struct _event_log_flags *) flags;
+   u32 err_type = p-type;
+
+   pr_err(AMD-Vi: Flags details: %s NX=%u %s %s %s %s %s %s %s\n,
+   (p-gn ? Guest : Nested),
+   (p-nx),
+   (p-us ? User : Super),
+   (p-i ? Intr : Mem),
+   (p-pr ? Present : Absent),
+   (p-rw ? Write : Read),
+   (p-pe ? No-Perm : Has-Perm),
+   (p-rz ? Rsrv-Bit : Ill-Level),
+   (p-tr ? Translation : Transaction));
+
+   pr_err(AMD-Vi: Type of error: (0x%x) , err_type);
+   if ((ev_type == EVENT_TYPE_DEV_TAB_ERR)  ||
+   (ev_type == EVENT_TYPE_PAGE_TAB_ERR) ||
+   (ev_type == EVENT_TYPE_CMD_HARD_ERR)) {
+   if (err_type  ARRAY_SIZE(_type_field_encodings)) {
+   pr_cont(%s\n,
+   _type_field_encodings[err_type]);
+   }
+   } else if (ev_type == EVENT_TYPE_INV_DEV_REQ) {
+   if (p-tr == 0) {
+   if (err_type  

Re: [PATCH 5/5 v11] iommu/fsl: Freescale PAMU driver and iommu implementation.

2013-04-02 Thread Timur Tabi
On Tue, Apr 2, 2013 at 11:18 AM, Joerg Roedel j...@8bytes.org wrote:

  + panic(\n);

 A kernel panic seems like an over-reaction to an access violation.

We have no way to determining what code caused the violation, so we
can't just kill the process.  I agree it seems like overkill, but what
else should we do?  Does the IOMMU layer have a way for the IOMMU
driver to stop the device that caused the problem?

 Besides the device that caused the violation the system should still
 work, no?

Not really.  The PAMU was designed to add IOMMU support to legacy
devices, which have no concept of an MMU.  If the PAMU detects an
access violation, there's no way for the device to recover, because it
has no idea that a violation has occurred.  It's going to keep on
writing to bad data.

Maybe we need a mechanism where a driver can register a callback
function to handle IOMMU exceptions?

  + /*
  +  * In case of devices with multiple LIODNs just store
  +  * the info for the first LIODN as all
  +  * LIODNs share the same domain
  +  */
  + if (!old_domain_info)
  + dev-archdata.iommu_domain = info;
  + spin_unlock(device_domain_lock);

 Don't you have to tell the hardware that a device was added to a domain?
 I don't see that, what I am missing?

I'm not sure I understand.  What hardware do you think needs to be notified?

The PAMU reads everything it needs from the PAACT, which we update.
The PAMU does not know anything about the devices that it monitors,
and the devices don't know anything about the PAMU.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: RFC: vfio API changes needed for powerpc

2013-04-02 Thread Alex Williamson
On Tue, 2013-04-02 at 17:13 -0500, Scott Wood wrote:
 On 04/02/2013 04:16:11 PM, Alex Williamson wrote:
  On Tue, 2013-04-02 at 15:54 -0500, Stuart Yoder wrote:
   The number of windows is always power of 2 (and max is 256).  And  
  to reduce
   PAMU cache pressure you want to use the fewest number of windows
   you can.So, I don't see practically how we could transparently
   steal entries to
   add the MSIs. Either user space knows to leave empty windows for
   MSIs and by convention the kernel knows which windows those are (as
   in option #A) or explicitly tell the kernel which windows (as in  
  option #B).
  
  Ok, apparently I don't understand the API.  Is it something like
  userspace calls GET_ATTR and finds out that there are 256 available
  windows, userspace determines that it needs 8 for RAM and then it has  
  an
  MSI device, so it needs to call SET_ATTR and ask for 16?  That seems
  prone to exploitation by the first userspace to allocate it's  
  aperture,
 
 What exploitation?
 
 It's not as if there is a pool of 256 global windows that users  
 allocate from.  The subwindow count is just how finely divided the  
 aperture is.  The only way one user will affect another is through  
 cache contention (which is why we want the minimum number of subwindows  
 that we can get away with).
 
  but I'm also not sure why userspace could specify the (non-power of 2)
  number of windows it needs for RAM, then VFIO would see that the  
  devices
  attached have MSI and add those windows and align to a power of 2.
 
 If you double the subwindow count without userspace knowing, you have  
 to double the aperture as well (and you may need to grow up or down  
 depending on alignment).  This means you also need to halve the maximum  
 aperture that userspace can request.  And you need to expose a  
 different number of maximum subwindows in the IOMMU API based on  
 whether we might have MSIs of this type.  It's ugly and awkward, and  
 removes the possibility for userspace to place the MSIs in some unused  
 slot in the middle, or not use MSIs at all.

Ok, I missed this in Stuart's example:

Total aperture: 512MB
# of windows: 8

win gphys/
#   iovaphys  size
---   
0   0x  0xX_XX00  64MB
1   0x0400  0xX_XX00  64MB
2   0x0800  0xX_XX00  64MB
3   0x0C00  0xX_XX00  64MB
4   0x1000  0xf_fe044000  4KB// msi bank 1
  ^^
5   0x1400  0xf_fe045000  4KB// msi bank 2
  ^^
6   0x1800  0xf_fe046000  4KB// msi bank 3
  ^^
7- -  disabled

So even though the MSI banks are 4k in this example, they're still on
64MB boundaries.  If userspace were to leave this as 256 windows, each
would be 2MB and we'd use 128 of them to map the same memory as these
4x64MB windows and thrash the iotlb harder.  The picture is becoming
clearer.  Thanks,

Alex

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: RFC: vfio API changes needed for powerpc

2013-04-02 Thread Alex Williamson
On Tue, 2013-04-02 at 17:44 -0500, Scott Wood wrote:
 On 04/02/2013 04:32:04 PM, Alex Williamson wrote:
  On Tue, 2013-04-02 at 15:57 -0500, Scott Wood wrote:
   On 04/02/2013 03:32:17 PM, Alex Williamson wrote:
On x86 the interrupt remapper handles this transparently when MSI
is enabled and userspace never gets direct access to the device  
  MSI
address/data registers.
  
   x86 has a totally different mechanism here, as far as I understand  
  --
   even before you get into restrictions on mappings.
  
  So what control will userspace have over programming the actually MSI
  vectors on PAMU?
 
 Not sure what you mean -- PAMU doesn't get explicitly involved in  
 MSIs.  It's just another 4K page mapping (per relevant MSI bank).  If  
 you want isolation, you need to make sure that an MSI group is only  
 used by one VFIO group, and that you're on a chip that has alias pages  
 with just one MSI bank register each (newer chips do, but the first  
 chip to have a PAMU didn't).

How does a user figure this out?

This could also be done as another type2 ioctl extension.
  
   Again, what is type2, specifically?  If someone else is adding  
  their
   own IOMMU that is kind of, sort of like PAMU, how would they know if
   it's close enough?  What assumptions can a user make when they see  
  that
   they're dealing with type2?
  
  Naming always has and always will be a problem.  I assume this is  
  named
  type2 rather than PAMU because it's trying to expose a generic  
  windowed
  IOMMU fitting the IOMMU API.
 
 But how closely is the MSI situation related to a generic windowed  
 IOMMU, then?  We could just as well have a highly flexible IOMMU in  
 terms of arbitrary 4K page mappings, but still handle MSIs as pages to  
 be mapped rather than a translation table.  Or we could have a windowed  
 IOMMU that has an MSI translation table.
 
  Like type1, it doesn't really make sense
  to name it IOMMU API because that's a kernel internal interface and
  we're designing a userspace interface that just happens to use that.
  Tagging it to a piece of hardware makes it less reusable.
 
 Well, that's my point.  Is it reusable at all, anyway?  If not, then  
 giving it a more obscure name won't change that.  If it is reusable,  
 then where is the line drawn between things that are PAMU-specific or  
 MPIC-specific and things that are part of the generic windowed IOMMU  
 abstraction?
 
   Type1 is arbitrary.  It might as well be named brown and this one  
  can be
  blue.
 
 The difference is that type1 seems to refer to hardware that can do  
 arbitrary 4K page mappings, possibly constrained by an aperture but  
 nothing else.  More than one IOMMU can reasonably fit that.  The odds  
 that another IOMMU would have exactly the same restrictions as PAMU  
 seem smaller in comparison.
 
 In any case, if you had to deal with some Intel-only quirk, would it  
 make sense to call it a type1 attribute?  I'm not advocating one way  
 or the other on whether an abstraction is viable here (though Stuart  
 seems to think it's highly unlikely anything but a PAMU will comply),  
 just that if it is to be abstracted rather than a hardware-specific  
 interface, we need to document what is and is not part of the  
 abstraction.  Otherwise a non-PAMU-specific user won't know what they  
 can rely on, and someone adding support for a new windowed IOMMU won't  
 know if theirs is close enough, or they need to introduce a type3.

So Alexey named the SPAPR IOMMU something related to spapr...
surprisingly enough.  I'm fine with that.  If you think it's unique
enough, name it something appropriately.  I haven't seen the code and
don't know the architecture sufficiently to have an opinion.

What's the value to userspace in determining which windows are  
  used
by which banks?
  
   That depends on who programs the MSI config space address.  What is
   important is userspace controlling which iovas will be dedicated to
   this, in case it wants to put something else there.
  
  So userspace is programming the MSI vectors, targeting a user  
  programmed
  iova?  But an iova selects a window and I thought there were some  
  number
  of MSI banks and we don't really know which ones we'll need...  still
  confused.
 
 Userspace would also need a way to find out the page offset and data  
 value.  That may be an argument in favor of having the two ioctls  
 Stuart later suggested (get MSI count, and map MSI).

Connecting the user set iova and host kernel assigned irq number is
where I'm still lost, but I'll follow-up with that question in the other
thread.

 Would there be  
 any complication in the VFIO code from tracking a mapping that doesn't  
 have a userspace virtual address associated with it?

Only the VFIO iommu driver tracks mappings, the QEMU userspace component
doesn't (replies on the memory API for type1), nor does any of the
kernel framework code.

   There's going to be special stuff no matter 

Re: RFC: vfio API changes needed for powerpc

2013-04-02 Thread Alex Williamson
On Tue, 2013-04-02 at 17:50 -0500, Scott Wood wrote:
 On 04/02/2013 04:38:45 PM, Alex Williamson wrote:
  On Tue, 2013-04-02 at 16:08 -0500, Stuart Yoder wrote:
   On Tue, Apr 2, 2013 at 3:57 PM, Scott Wood  
  scottw...@freescale.com wrote:
C.  Explicit mapping using normal DMA map.  The last idea  
  is that
we would introduce a new ioctl to give user-space an fd  
  to
the MSI bank, which could be mmapped.  The flow would be
something like this:
   -for each group user space calls new ioctl
 VFIO_GROUP_GET_MSI_FD
   -user space mmaps the fd, getting a vaddr
   -user space does a normal DMA map for desired iova
This approach makes everything explicit, but adds a new  
  ioctl
applicable most likely only to the PAMU (type2 iommu).
   
And the DMA_MAP of that mmap then allows userspace to select the  
  window
used?  This one seems like a lot of overhead, adding a new  
  ioctl, new
fd, mmap, special mapping path, etc.
   
   
There's going to be special stuff no matter what.  This would  
  keep it
separated from the IOMMU map code.
   
I'm not sure what you mean by overhead here... the runtime  
  overhead of
setting things up is not particularly relevant as long as it's  
  reasonable.
If you mean development and maintenance effort, keeping things  
  well
separated should help.
  
   We don't need to change DMA_MAP.  If we can simply add a new type  
  2
   ioctl that allows user space to set which windows are MSIs, it  
  seems vastly
   less complex than an ioctl to supply a new fd, mmap of it, etc.
  
   So maybe 2 ioctls:
   VFIO_IOMMU_GET_MSI_COUNT
 
 Do you mean a count of actual MSIs or a count of MSI banks used by the  
 whole VFIO group?

I hope the latter, which would clarify how this is distinct from
DEVICE_GET_IRQ_INFO.  Is hotplug even on the table?  Presumably
dynamically adding a device could bring along additional MSI banks?

   VFIO_IOMMU_MAP_MSI(iova, size)
 
 Not sure how you mean size to be used -- for MPIC it would be 4K per  
 bank, and you can only map one bank at a time (which bank you're  
 mapping should be a parameter, if only so that the kernel doesn't have  
 to keep iteration state for you).
 
  How are MSIs related to devices on PAMU?
 
 PAMU doesn't care about MSIs.  The relation of individual MSIs to a  
 device is standard PCI stuff.  Each MSI bank (which is part of the  
 MPIC, not PAMU) can hold numerous MSIs.  The VFIO user would want to  
 map all MSI banks that are in use by any of the devices in the group.   
 Ideally we'd let the VFIO grouping influence the allocation of MSIs.

The current VFIO MSI support has the host handling everything about MSI.
The user never programs an MSI vector to the physical device, they set
up everything through ioctl.  On interrupt, we simply trigger an eventfd
and leave it to things like KVM irqfd or QEMU to do the right thing in a
virtual machine.

Here the MSI vector has to go through a PAMU window to hit the correct
MSI bank.  So that means it has some component of the iova involved,
which we're proposing here is controlled by userspace (whether that
vector uses an offset from 0x1000 or 0x depending on which
window slot is used to make the MSI bank).  I assume we're still working
in a model where the physical interrupt fires into the host and a
host-based interrupt handler triggers an eventfd, right?  So that means
the vector also has host components so we trigger the correct ISR.  How
is that coordinated?

Would is be possible for userspace to simply leave room for MSI bank
mapping (how much room could be determined by something like
VFIO_IOMMU_GET_MSI_BANK_COUNT) then document the API that userspace can
DMA_MAP starting at the 0x0 address of the aperture, growing up, and
VFIO will map banks on demand at the top of the aperture, growing down?
Wouldn't that avoid a lot of issues with userspace needing to know
anything about MSI banks (other than count) and coordinating irq numbers
and enabling handlers?

  On x86 MSI count is very
  device specific, which means it wold be a VFIO_DEVICE_* ioctl  
  (actually
  VFIO_DEVICE_GET_IRQ_INFO does this for us on x86).  The trouble with  
  it
  being a device ioctl is that you need to get the device FD, but the
  IOMMU protection needs to be established before you can get that... so
  there's an ordering problem if you need it from the device before
  configuring the IOMMU.  Thanks,
 
 What do you mean by IOMMU protection needs to be established?   
 Wouldn't we just start with no mappings in place?

If no mappings blocks all DMA, sure, that's fine.  Once the VFIO device
FD is accessible by userspace we have to protect the host against DMA.
If any IOMMU_SET_ATTR calls temporarily disable DMA protection, that
could be exploitable.  Thanks,

Alex

___
iommu mailing list

RE: [PATCH 5/5 v11] iommu/fsl: Freescale PAMU driver and iommu implementation.

2013-04-02 Thread Sethi Varun-B16395


 -Original Message-
 From: Wood Scott-B07421
 Sent: Wednesday, April 03, 2013 7:23 AM
 To: Timur Tabi
 Cc: Joerg Roedel; Sethi Varun-B16395; lkml; Kumar Gala; Yoder Stuart-
 B08248; iommu@lists.linux-foundation.org; Benjamin Herrenschmidt;
 linuxppc-...@lists.ozlabs.org
 Subject: Re: [PATCH 5/5 v11] iommu/fsl: Freescale PAMU driver and iommu
 implementation.
 
 On 04/02/2013 08:35:54 PM, Timur Tabi wrote:
  On Tue, Apr 2, 2013 at 11:18 AM, Joerg Roedel j...@8bytes.org wrote:
 
+ panic(\n);
  
   A kernel panic seems like an over-reaction to an access violation.
 
  We have no way to determining what code caused the violation, so we
  can't just kill the process.  I agree it seems like overkill, but what
  else should we do?  Does the IOMMU layer have a way for the IOMMU
  driver to stop the device that caused the problem?
 
 At a minimum, log a message and continue.  Probably turn off the LIODN,
 at least if it continues to be noisy (otherwise we could get stuck in an
 interrupt storm as you note).  Possibly let the user know somehow,
 especially if it's a VFIO domain.
[Sethi Varun-B16395] Can definitely log the message and disable the LIODN (to 
avoid an interrupt storm), but
we definitely need a mechanism to inform vfio subsystem about the error. Also, 
disabling LIODN may not be a viable
option with the new LIODN allocation scheme (where LIODN would be associated 
with a domain).

 
 Don't take down the whole kernel.  It's not just overkill; it undermines
 VFIO's efforts to make it safe for users to control devices.
 
   Besides the device that caused the violation the system should still
   work, no?
 
  Not really.  The PAMU was designed to add IOMMU support to legacy
  devices, which have no concept of an MMU.  If the PAMU detects an
  access violation, there's no way for the device to recover, because it
  has no idea that a violation has occurred.  It's going to keep on
  writing to bad data.
 
 I think that's only the case for posted writes (or devices which fail to
 take a hint and stop even after they see an I/O error).
 
[Sethi Varun-B16395] Even in the case where the guest driver detects a failure, 
it may not be able to fix the problem without intervention from the VFIO 
subsystem.

-Varun

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH 2/5 v11] powerpc: Add iommu domain pointer to device archdata

2013-04-02 Thread Sethi Varun-B16395
Kumar/Ben,
Any comments?

(Had checked with Ben (on IRC) sometime back, he was fine with this patch)

Regards
Varun


 -Original Message-
 From: Joerg Roedel [mailto:j...@8bytes.org]
 Sent: Tuesday, April 02, 2013 8:39 PM
 To: Sethi Varun-B16395
 Cc: Yoder Stuart-B08248; Wood Scott-B07421; iommu@lists.linux-
 foundation.org; linuxppc-...@lists.ozlabs.org; linux-
 ker...@vger.kernel.org; ga...@kernel.crashing.org;
 b...@kernel.crashing.org
 Subject: Re: [PATCH 2/5 v11] powerpc: Add iommu domain pointer to device
 archdata
 
 On Fri, Mar 29, 2013 at 01:23:59AM +0530, Varun Sethi wrote:
  Add an iommu domain pointer to device (powerpc) archdata.  Devices are
  attached to iommu domains and this pointer provides a mechanism to
  correlate between a device and the associated iommu domain.  This
  field is set when a device is attached to a domain.
 
  Signed-off-by: Varun Sethi varun.se...@freescale.com
 
 This patch needs to be Acked by the PPC maintainers.
 
 


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH 4/5 v11] iommu/fsl: Add additional iommu attributes required by the PAMU driver.

2013-04-02 Thread Sethi Varun-B16395


 -Original Message-
 From: Joerg Roedel [mailto:j...@8bytes.org]
 Sent: Tuesday, April 02, 2013 8:40 PM
 To: Sethi Varun-B16395
 Cc: Yoder Stuart-B08248; Wood Scott-B07421; iommu@lists.linux-
 foundation.org; linuxppc-...@lists.ozlabs.org; linux-
 ker...@vger.kernel.org; ga...@kernel.crashing.org;
 b...@kernel.crashing.org
 Subject: Re: [PATCH 4/5 v11] iommu/fsl: Add additional iommu attributes
 required by the PAMU driver.
 
 On Fri, Mar 29, 2013 at 01:24:01AM +0530, Varun Sethi wrote:
  +/* cache stash targets */
  +enum stash_target {
  +   IOMMU_ATTR_CACHE_L1 = 1,
  +   IOMMU_ATTR_CACHE_L2,
  +   IOMMU_ATTR_CACHE_L3,
  +};
  +
  +/* This attribute corresponds to IOMMUs capable of generating
  + * a stash transaction. A stash transaction is typically a
  + * hardware initiated prefetch of data from memory to cache.
  + * This attribute allows configuring stashig specific parameters
  + * in the IOMMU hardware.
  + */
  +
  +struct iommu_stash_attribute {
  +   u32 cpu;/* cpu number */
  +   u32 cache;  /* cache to stash to: L1,L2,L3 */
  +};
  +
 
 I would prefer these PAMU specific enum and struct to be in a pamu-
 specific iommu-header.
 

[Sethi Varun-B16395] But, these would be used by the IOMMU API users (e.g. 
VFIO), they shouldn't depend on PAMU specific headers.

-Varun

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu