Re: [PATCH v2] Intel IOMMU patch to reprocess RMRR info

2012-09-28 Thread Shuah Khan
On Fri, 2012-09-28 at 20:15 +0100, David Woodhouse wrote:
 On Fri, 2012-09-28 at 12:36 -0400, Linda Knippers wrote:
  I can only speak to the HP servers.  We have been shipping devices
  'for a while' that provide sensor-type data to the platform.  The
  device does DMA writes to a range of memory (the RMRR) and
  iLO does DMA reads of that data.
  
  This works in general but not when the 'iommu=pt' boot option is
  used.  This patch associates the RMRR with the devices when
  they are moved out of the si domain.
 
 That much makes sense, I think, because they're moved out of the
 hardware SI domain *early*, when we realise they're actually only
 capable of 32-bit DMA and we have 4GiB of RAM. Right?

Correct. By default all devices are added to SI domain assuming that
these devices are 64-bit devices. When we detect the device is a 32-bit
device based on the requested dma-mask, it gets removed from SI domain,
hence looses its RMRR association. In the meantime dma continues causing
DMA errors. This patch is re-processing RMRRs for the device in question
and doing re-assignment.

 
 It sounds like this isn't a case of the device being used by a native
 driver or given to KVM, and subsequently released. This is just booting
 up and losing the RMRR regions on a device which the OS *hasn't* really
 touched. So that should be fixed.
 
  Based on Alex's comments about moving RMRR devices between domains,
  it sounds like we also have a problem without the 'iommu=pt' boot
  option if someone assigns one of those devices to a guest.
 
 Yeah... but why *would* they? What possible reason would we have to
 assign either the sensor device, or the iLO, to a KVM guest. Or to have
 a native driver that attempts to do DMA from them?
 
 Obviously, in an ideal world we'd have proper native drivers for the
 sensor device. But I'm guessing that's not the case here; it's used by
 the firmware and we're not supposed to be touching it?
 
 And yes, obviously a better hardware design (from the OS/IOMMU point of
 view) would be to have a path for the sensor data that *doesn't* go via
 host RAM and thus via the IOMMU twice. But while that's a lesson that's
 hopefully been learned and will be implemented in future, we have to
 deal with the existing hardware and its (ab)use of RMRRs.
 

Right. We do have hardware that is relying on being able to do dma from
devices to a system RAM via RMRR.

-- Shuah


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 03/16] iommu/amd: Allocate data structures to keep track of irq remapping tables

2012-10-03 Thread Shuah Khan
On Fri, Sep 28, 2012 at 6:23 AM, Joerg Roedel joerg.roe...@amd.com wrote:
 To easily map device ids to interrupt remapping table
 entries a new lookup table is necessary.

 Signed-off-by: Joerg Roedel joerg.roe...@amd.com
 ---
  drivers/iommu/amd_iommu_init.c  |   16 
  drivers/iommu/amd_iommu_types.h |9 +
  2 files changed, 25 insertions(+)

 diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
 index a046a0e..db100c5 100644
 --- a/drivers/iommu/amd_iommu_init.c
 +++ b/drivers/iommu/amd_iommu_init.c
 @@ -181,6 +181,12 @@ u16 *amd_iommu_alias_table;
  struct amd_iommu **amd_iommu_rlookup_table;

  /*
 + * This table is used to find the irq remapping table for a given device id
 + * quickly.
 + */
 +struct irq_remap_table **irq_lookup_table;
 +
 +/*
   * AMD IOMMU allows up to 2^16 differend protection

different

 domains. This is a bitmap
   * to know which ones are already in use.
   */
 @@ -1529,9 +1535,13 @@ static struct syscore_ops amd_iommu_syscore_ops = {

  static void __init free_on_init_error(void)
  {
 +   free_pages((unsigned long)irq_lookup_table,
 +  get_order(rlookup_table_size));
 +
 if (amd_iommu_irq_cache) {
 kmem_cache_destroy(amd_iommu_irq_cache);
 amd_iommu_irq_cache = NULL;
 +
 }

 amd_iommu_uninit_devices();
 @@ -1684,6 +1694,12 @@ static int __init early_amd_iommu_init(void)
 0, NULL);
 if (!amd_iommu_irq_cache)
 goto out;
 +
 +   irq_lookup_table = (void *)__get_free_pages(
 +   GFP_KERNEL | __GFP_ZERO,
 +   get_order(rlookup_table_size));
 +   if (!irq_lookup_table)
 +   goto out;
 }

 ret = init_memory_definitions(ivrs_base);
 diff --git a/drivers/iommu/amd_iommu_types.h b/drivers/iommu/amd_iommu_types.h
 index 953cea8..1a7d480 100644
 --- a/drivers/iommu/amd_iommu_types.h
 +++ b/drivers/iommu/amd_iommu_types.h
 @@ -175,6 +175,7 @@
  #define DEV_ENTRY_EX0x67
  #define DEV_ENTRY_SYSMGT1   0x68
  #define DEV_ENTRY_SYSMGT2   0x69
 +#define DEV_ENTRY_IRQ_TBL_EN   0x80
  #define DEV_ENTRY_INIT_PASS 0xb8
  #define DEV_ENTRY_EINT_PASS 0xb9
  #define DEV_ENTRY_NMI_PASS  0xba
 @@ -337,6 +338,14 @@ extern bool amd_iommu_iotlb_sup;
  #define MAX_IRQS_PER_TABLE 256
  #define IRQ_TABLE_ALIGNMENT128

 +struct irq_remap_table {
 +   spinlock_t lock;
 +   unsigned min_index;
 +   u32 *table;
 +};
 +
 +extern struct irq_remap_table **irq_lookup_table;
 +
  /* Interrupt remapping feature used? */
  extern bool amd_iommu_irq_remap;

 --
 1.7.9.5


 --
 To unsubscribe from this list: send the line unsubscribe linux-kernel in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
 Please read the FAQ at  http://www.tux.org/lkml/
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 14/16] iommu/irq: Use amd_iommu_irq_ops if supported

2012-10-03 Thread Shuah Khan
On Fri, Sep 28, 2012 at 6:24 AM, Joerg Roedel joerg.roe...@amd.com wrote:
 Finally enable interrupt remapping for AMD systems.

 Signed-off-by: Joerg Roedel joerg.roe...@amd.com
 ---
  drivers/iommu/irq_remapping.c |5 +
  1 file changed, 5 insertions(+)

 diff --git a/drivers/iommu/irq_remapping.c b/drivers/iommu/irq_remapping.c
 index 151690d..faf85d6 100644
 --- a/drivers/iommu/irq_remapping.c
 +++ b/drivers/iommu/irq_remapping.c
 @@ -51,6 +51,11 @@ early_param(intremap, setup_irqremap);
  void __init setup_irq_remapping_ops(void)
  {
 remap_ops = intel_irq_remap_ops;
 +
 +#ifdef CONFIG_AMD_IOMMU
 +   if (amd_iommu_irq_ops.prepare() == 0)
 +   remap_ops = amd_iommu_irq_ops;
 +#endif

Should remap_ops be set to null when amd_iommu_irq_ops.prepare()
fails?What happens if remap_ops left set to intel_irq_remap_ops?
Should remap_ops = intel_irq_remap_ops; moved into #else case
forifdef CONFIG_AMD_IOMMU?
  }

  int irq_remapping_supported(void)
 --
 1.7.9.5


 --
 To unsubscribe from this list: send the line unsubscribe linux-kernel in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
 Please read the FAQ at  http://www.tux.org/lkml/
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 12/16] iommu/amd: Add initialization routines for AMD interrupt remapping

2012-10-03 Thread Shuah Khan
On Mon, Oct 1, 2012 at 2:05 AM, Joerg Roedel joerg.roe...@amd.com wrote:
 On Fri, Sep 28, 2012 at 05:18:18PM -0600, Shuah Khan wrote:
  +void amd_iommu_disable(void)
  +{
  +   amd_iommu_suspend();

 Is it safe to attempt to suspend when iommu is in suspend state? In
 other words what happens if amd_iommu_disable() gets called when iommu
 is already in suspend state?

 Yes, this is safe. It just trns a bit to 0 which is already 0.

Good.
  +int amd_iommu_reenable(int mode)
  +{
  +   amd_iommu_resume();

 Same question as above. Safe to do a resume, when in resumed state?

 Safe too, it just writes values to the hardware which are already there.
 This was also proven by testing.


Thanks. Sounds good.

-- Shuah
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 14/16] iommu/irq: Use amd_iommu_irq_ops if supported

2012-10-03 Thread Shuah Khan
On Mon, Oct 1, 2012 at 2:08 AM, Joerg Roedel joerg.roe...@amd.com wrote:
 On Fri, Sep 28, 2012 at 05:39:03PM -0600, Shuah Khan wrote:
 On Fri, Sep 28, 2012 at 6:24 AM, Joerg Roedel joerg.roe...@amd.com wrote:
   void __init setup_irq_remapping_ops(void)
   {
  remap_ops = intel_irq_remap_ops;
  +
  +#ifdef CONFIG_AMD_IOMMU
  +   if (amd_iommu_irq_ops.prepare() == 0)
  +   remap_ops = amd_iommu_irq_ops;
  +#endif

 Should remap_ops be set to null when amd_iommu_irq_ops.prepare()
 fails?What happens if remap_ops left set to intel_irq_remap_ops?
 Should remap_ops = intel_irq_remap_ops; moved into #else case
 forifdef CONFIG_AMD_IOMMU?

 Remap-Ops does not need to be set to NULL because irq_remapping_enabled
 will not get set to true then and remap_ops will not get called. The
 Intel path can't also be moved to #else because this would mean that
 kernels can only support eihter, Intel or AMD IOMMU. But Linux can
 support both in the same kernel.

Thanks for the explanation.

-- Shuah
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [RFC PATCH 1/5] amd_iommu: Split IOMMU group initialization

2012-10-09 Thread Shuah Khan
Alex,

couple of comments in-lined:

On Mon, Oct 8, 2012 at 10:49 PM, Alex Williamson
alex.william...@redhat.com wrote:
 This needs to be broken apart, start with pulling all the IOMMU
 group init code into a new function.

 Signed-off-by: Alex Williamson alex.william...@redhat.com
 ---

  drivers/iommu/amd_iommu.c |   61 
 -
  1 file changed, 43 insertions(+), 18 deletions(-)

 diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
 index 55074cb..b65b377 100644
 --- a/drivers/iommu/amd_iommu.c
 +++ b/drivers/iommu/amd_iommu.c
 @@ -276,39 +276,32 @@ static void swap_pci_ref(struct pci_dev **from, struct 
 pci_dev *to)

  #define REQ_ACS_FLAGS  (PCI_ACS_SV | PCI_ACS_RR | PCI_ACS_CR | PCI_ACS_UF)

 -static int iommu_init_device(struct device *dev)
 +static int init_iommu_group(struct device *dev)
  {
 -   struct pci_dev *dma_pdev = NULL, *pdev = to_pci_dev(dev);
 struct iommu_dev_data *dev_data;
 struct iommu_group *group;
 -   u16 alias;
 +   struct pci_dev *dma_pdev = NULL;
 int ret;

I don't see ret get initialized


 -   if (dev-archdata.iommu)
 +   group = iommu_group_get(dev);
 +   if (group) {
 +   iommu_group_put(group);
 return 0;
 +   }

 dev_data = find_dev_data(get_device_id(dev));
 if (!dev_data)
 return -ENOMEM;

 -   alias = amd_iommu_alias_table[dev_data-devid];
 -   if (alias != dev_data-devid) {
 -   struct iommu_dev_data *alias_data;
 -
 -   alias_data = find_dev_data(alias);
 -   if (alias_data == NULL) {
 -   pr_err(AMD-Vi: Warning: Unhandled device %s\n,
 -   dev_name(dev));
 -   free_dev_data(dev_data);
 -   return -ENOTSUPP;
 -   }
 -   dev_data-alias_data = alias_data;
 +   if (dev_data-alias_data) {
 +   u16 alias;

 +   alias = amd_iommu_alias_table[dev_data-devid];
 dma_pdev = pci_get_bus_and_slot(alias  8, alias  0xff);
 }

 -   if (dma_pdev == NULL)
 -   dma_pdev = pci_dev_get(pdev);
 +   if (!dma_pdev)
 +   dma_pdev = pci_dev_get(to_pci_dev(dev));

 /* Account for quirked devices */
 swap_pci_ref(dma_pdev, pci_get_dma_source(dma_pdev));
 @@ -358,6 +351,38 @@ root_bus:

 iommu_group_put(group);

 +   return ret;

Do you even need ret - can this be simply return 0;

 +}
 +
 +static int iommu_init_device(struct device *dev)
 +{
 +   struct pci_dev *pdev = to_pci_dev(dev);
 +   struct iommu_dev_data *dev_data;
 +   u16 alias;
 +   int ret;
 +
 +   if (dev-archdata.iommu)
 +   return 0;
 +
 +   dev_data = find_dev_data(get_device_id(dev));
 +   if (!dev_data)
 +   return -ENOMEM;
 +
 +   alias = amd_iommu_alias_table[dev_data-devid];
 +   if (alias != dev_data-devid) {
 +   struct iommu_dev_data *alias_data;
 +
 +   alias_data = find_dev_data(alias);
 +   if (alias_data == NULL) {
 +   pr_err(AMD-Vi: Warning: Unhandled device %s\n,
 +   dev_name(dev));
 +   free_dev_data(dev_data);
 +   return -ENOTSUPP;
 +   }
 +   dev_data-alias_data = alias_data;
 +   }
 +
 +   ret = init_iommu_group(dev);
 if (ret)
 return ret;


 --
 To unsubscribe from this list: send the line unsubscribe linux-kernel in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
 Please read the FAQ at  http://www.tux.org/lkml/
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH -stable] amd_iommu: attach device fails on the last pci device

2012-10-12 Thread Shuah Khan
amd_iommu_attach_device() checks if device id is within the limits
of amd_iommu_last_bdf and instead checking if devid  amd_iommu_last_bdf,
it checks devid = amd_iommu_last_bdf. As a result the last device attach
fails because amd_iommu_attach_device() returns an -EINVAL.

This bug is in linux-2.6.32 and an equivalent fix in linux-2.6.33 and has been
carried forward to later kernels and is in the upstream kernel. This equivalent
fix includes restructuring and consolidating device checks into a routine
check_device(). Instead of back-porting all of that work, spot-fixed the bug in
amd_iommu_attach_device() for linux-2.6.32.

Signed-off-by: Shuah Khan shuah.k...@hp.com
CC: sta...@vger.kernel.org v2.6.32
---
 arch/x86/kernel/amd_iommu.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kernel/amd_iommu.c b/arch/x86/kernel/amd_iommu.c
index 3a44b75..67de7d7 100644
--- a/arch/x86/kernel/amd_iommu.c
+++ b/arch/x86/kernel/amd_iommu.c
@@ -2288,7 +2288,7 @@ static int amd_iommu_attach_device(struct iommu_domain 
*dom,
 
devid = calc_devid(pdev-bus-number, pdev-devfn);
 
-   if (devid = amd_iommu_last_bdf ||
+   if (devid  amd_iommu_last_bdf ||
devid != amd_iommu_alias_table[devid])
return -EINVAL;
 
-- 
1.7.9.5



___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v3] Prevent devices with RMRRs from being placed into SI Domain during startup

2012-10-16 Thread Shuah Khan
On Tue, 2012-10-16 at 16:50 +, Tom Mingarelli wrote:
 This patch is to prevent devices that have RMRRs associated with them 
 from getting placed into the SI Domain during init. We don't put USB devices
 into this category, however. This fixes the issue where the RMRR info
 for devices being placed in and out of the SI Domain gets lost.
 
 Signed-off-by: Thomas Mingarelli thomas.mingare...@hp.com
 
 PATCH v1: https://lkml.org/lkml/2012/6/15/204
 PATCH v2: https://lkml.org/lkml/2012/9/18/354
 
 drivers/iommu/intel-iommu.c |   33 +
 1 files changed, 33 insertions(+), 0 deletions(-)
 
 diff -up ./drivers/iommu/intel-iommu.c.ORIG ./drivers/iommu/intel-iommu.c
 --- ./drivers/iommu/intel-iommu.c.ORIG2012-10-16 09:34:23.148089944 
 -0500
 +++ ./drivers/iommu/intel-iommu.c 2012-10-16 09:56:56.905932861 -0500
 @@ -2320,8 +2320,41 @@ static int domain_add_dev_info(struct dm
   return 0;
  }
  
 +static bool device_has_rmrr(struct pci_dev *dev)
 +{
 + struct dmar_rmrr_unit *rmrr;
 + int i;
 +
 + for_each_rmrr_units(rmrr) {
 + for (i = 0; i  rmrr-devices_cnt; i++) {
 + /*
 +  * Here we are just concerned with checking each device
 +  * that has an RMRR associated with it and not allow it
 +  * to be placed into the SI Domain during startup.
 + */
 + if (rmrr-devices[i] == dev)
 + return true;
 + }
 + }
 + return false;
 +}
 +

Will you use the same routine to deny device assignment request for
devices with RMRR? Is that going to be another patch?

  static int iommu_should_identity_map(struct pci_dev *pdev, int startup)
  {
 +
 + if (startup) {
 + /*
 +  * This is where we will refuse any device that has an
 +  * RMRR associated with it and is not a USB device and
 +  * NOT allow it to be placed into the SI Domain. We
 +  * only do this on startup. We don't need a separate bit
 +  * for this because it could be ANY device.
 + */
 + if (device_has_rmrr(pdev) 
 + (pdev-class  8) != PCI_CLASS_SERIAL_USB)
 + return 0;
 + }
 +

Is there a reason to not group this with the other pci device checks
below. Don't you need this done whenever iommu_should_identity_map() get
called as opposed just at startup?

   if ((iommu_identity_mapping  IDENTMAP_AZALIA)  IS_AZALIA(pdev))
   return 1;
  


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v3] Prevent devices with RMRRs from being placed into SI Domain during startup

2012-10-16 Thread Shuah Khan
On Tue, 2012-10-16 at 16:50 +, Tom Mingarelli wrote:
 This patch is to prevent devices that have RMRRs associated with them 
 from getting placed into the SI Domain during init. We don't put USB devices
 into this category, however. This fixes the issue where the RMRR info
 for devices being placed in and out of the SI Domain gets lost.
 
 Signed-off-by: Thomas Mingarelli thomas.mingare...@hp.com
 
 PATCH v1: https://lkml.org/lkml/2012/6/15/204
 PATCH v2: https://lkml.org/lkml/2012/9/18/354
 
 drivers/iommu/intel-iommu.c |   33 +
 1 files changed, 33 insertions(+), 0 deletions(-)
 
 diff -up ./drivers/iommu/intel-iommu.c.ORIG ./drivers/iommu/intel-iommu.c
 --- ./drivers/iommu/intel-iommu.c.ORIG2012-10-16 09:34:23.148089944 
 -0500
 +++ ./drivers/iommu/intel-iommu.c 2012-10-16 09:56:56.905932861 -0500
 @@ -2320,8 +2320,41 @@ static int domain_add_dev_info(struct dm
   return 0;
  }
  
 +static bool device_has_rmrr(struct pci_dev *dev)
 +{
 + struct dmar_rmrr_unit *rmrr;
 + int i;
 +
 + for_each_rmrr_units(rmrr) {
 + for (i = 0; i  rmrr-devices_cnt; i++) {
 + /*
 +  * Here we are just concerned with checking each device
 +  * that has an RMRR associated with it and not allow it
 +  * to be placed into the SI Domain during startup.
 + */
 + if (rmrr-devices[i] == dev)
 + return true;
 + }
 + }
 + return false;
 +}
 +
  static int iommu_should_identity_map(struct pci_dev *pdev, int startup)
  {
 +
 + if (startup) {
 + /*
 +  * This is where we will refuse any device that has an
 +  * RMRR associated with it and is not a USB device and
 +  * NOT allow it to be placed into the SI Domain. We
 +  * only do this on startup. We don't need a separate bit
 +  * for this because it could be ANY device.
 + */
 + if (device_has_rmrr(pdev) 
 + (pdev-class  8) != PCI_CLASS_SERIAL_USB)

Forgot to ask in my last response. Is it sufficient to check _USB. Are
we missing any other devices that use RMRR that would qualify?

 + return 0;
 + }
 +
   if ((iommu_identity_mapping  IDENTMAP_AZALIA)  IS_AZALIA(pdev))
   return 1;
  


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH -stable] amd_iommu: attach device fails on the last pci device

2012-10-23 Thread Shuah Khan
On Fri, 2012-10-12 at 12:35 -0700, Jonathan Nieder wrote:
 Shuah Khan wrote:
  On Fri, 2012-10-12 at 11:38 -0700, Jonathan Nieder wrote:
 
  To save Willy time: am I correct in guessing the upstream commit you
  are referring to is 98fc5a693bbdda498a556654c70d1e31a186c988
  (x86/amd-iommu: Use get_device_id and check_device where appropriate,
  2009-11-24)?
 
  Yes that is one.
 
 Great, thanks for the quick confirmation.

Is this patch in the queue to be pulled into 2.6.32 tree?

Thanks,
-- Shuah

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH] iommu: moving initialization earlier

2013-01-11 Thread Shuah Khan
On Fri, Jan 11, 2013 at 7:50 AM, Joerg Roedel j...@8bytes.org wrote:
 On Thu, Jan 10, 2013 at 01:40:17PM -0700, Shuah Khan wrote:
 I am currently debugging IO_PAGE_FAULTS on 3.6.11 (happens on all
 pre-3.7 releases). I root-caused the reason 3.7 works is because in
 3.7 amd iommu driver moving up the early iommu initialization from
 irq_remap_ops with the irq remapping feature.

 Have you investigated the reason for those IO_PAGE_FAULTS? I guess they
 come from the USB controlers and happen between the time the IOMMU is
 enabled and the USB controlers are taken over by the Linux kernel from
 the BIOS.

 But I don't see why this patch can have any impact on the IO_PAGE_FAULTS
 you are seeing.

USB is one of them and SATA is another. I know with my back-port to
split dma ops initialization and this patch Alexey sent, the problem
goes away on 3.6.11 and I don't see the problem on 3.7. Same system,
and two different boots. I will investigate do more investigation into
the nature of these faults I am seeing and get back to you.

Thanks,
-- Shuah
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


IO_PAGE_FAULTs on unity mapped regions during amd_iommu_init() in Linux 3.4

2013-01-31 Thread Shuah Khan
Joerg,

I am seeing IO_PAGE_FAULTs on AMD system running releases prior to 3.7.
I focused my debug and testing on 3.4. I am hoping to find a solution
for this problem in 3.4. I don't see any IO_PAGE_FAULTs with 3.7 and
later releases on this system.

On this system BIOS specifies Unity mapped (direct mapped) exclusion
ranges in IVMDs for several devices. These regions are in use during
BIOS hand-off to kernel and continue to be used during kernel boot and
run-time.

Access to these ranges continues to work with no errors until AMD IOMMU
driver disables and re-enables IOMMU in enable_iommus(). These faults
don't persist and appear between the enable_iommus() call and before
amd_iommu_init() gets done printing AMD-Vi: Lazy IO/TLB flushing
enabled message.

Read requests from device 02:00.2 and write request from device 03:00.0
to these unity mapped regions fail. The reason appears to be because
domain id is 0.

Domain gets assigned in amd_iommu_init_dma_ops() and unity maps are
handled. I don't see enable_iommus() doing anything to these unity
mapped exclusion ranges. So I am assuming that is not the issue,
however, could domain ids get flushed? More like, why do these faults
show up in this window? These are direct mapped, so there is no need for
any translations.

Please see below for IVMD dump and IO_PAGE_FAULT analysis.

Dump of these ranges from dmesg:

[5.322280] AMD-Vi: IVMD_TYPE_ALL devid_start: 00:00.0
devid_end: 04:00.3 range_start: 000f range_end:
0010 flags: 7
[5.322367] AMD-Vi: IVMD_TYPE_ALL devid_start: 00:00.0
devid_end: 04:00.3 range_start: bff7 range_end:
bfff flags: 7
[5.322454] AMD-Vi: IVMD_TYPE_ALL devid_start: 00:00.0
devid_end: 04:00.3 range_start: 000e8000 range_end:
000e9000 flags: 7
[5.322540] AMD-Vi: IVMD_TYPE_ALL devid_start: 00:00.0
devid_end: 04:00.3 range_start: bdffe000 range_end:
be00 flags: 7
[5.322627] AMD-Vi: IVMD_TYPE_ALL devid_start: 00:00.0
devid_end: 04:00.3 range_start: bdff9000 range_end:
bdffd000 flags: 7
[5.322714] AMD-Vi: IVMD_TYPE_ALL devid_start: 00:00.0
devid_end: 04:00.3 range_start: bdfe9000 range_end:
bdff9000 flags: 7


Now to IO_PAGE_FAULT analysis: My observations in 

[   15.281594] AMD-Vi: Event logged [IO_PAGE_FAULT device=02:00.2
domain=0x address=0xbdffe000 flags=0x0050]
[   15.281861] AMD-Vi: Event logged [IO_PAGE_FAULT device=02:00.2
domain=0x address=0xbdff9080 flags=0x0050]
[   15.281990] AMD-Vi: Event logged [IO_PAGE_FAULT device=02:00.2
domain=0x address=0xbdff9100 flags=0x0050]

Domain ID is zero - PASID not valid
flags=0x0050 - Bits PE and PR are set in the Event.
TR: translation TR=0
  TR is 0 that means it is a transaction request
RZ: reserved bit RZ=0
  Since PR is set RZ is meaningful, I/O page fault is due to an invalid
   level encoding
PE: permission indicator PE=1
  Device doesn't have permission for this transaction
RW: read-write RW=0
  RW is meaningful since PR=1, TR=0, and I=0. It is a Read transaction
PR: Present PR=1
  PR = 1 means transaction is to a page marked present
I: interrupt I=0
  transaction is a memory request
US: user-supervisor US=0
  Supervisor privileges were asserted.
NX: no execute NX=0
  0 upstream transaction lacks a PASID TLP prefix. Domain ID is zero.
GN: guest/nested GN=0
  Transaction used a nested address (GPA).

[   15.281733] AMD-Vi: Event logged [IO_PAGE_FAULT device=03:00.0
domain=0x address=0xbdff9160 flags=0x0070]

Domain ID is zero - PASID is not valid
flags=0x0070 - Bits PE, RW, and PR are set in the Event.
TR: translation TR=0
  TR is 0 that means it is a transaction request
RZ: reserved bit RZ=0
  Since PR is set RZ is meaningful, I/O page fault is due to an invalid
   level encoding
PE: permission indicator PE=1
  Device doesn't have permission for this transaction
RW: read-write RW=1
  RW is meaningful since PR=1, TR=0, and I=0. It is a Write
transaction
PR: Present PR=1
  PR = 1 means transaction is to a page marked present
I: interrupt I=0
  transaction is a memory request
US: user-supervisor US=0
  Supervisor privileges were asserted.
NX: no execute NX=0
  0 upstream transaction lacks a PASID TLP prefix. Domain ID is zero.
GN: guest/nested GN=0
  Transaction used a nested address (GPA).

Thanks,
-- Shuah

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: IO_PAGE_FAULTs on unity mapped regions during amd_iommu_init() in Linux 3.4

2013-02-05 Thread Shuah Khan
On Tue, Feb 5, 2013 at 6:31 AM, Joerg Roedel j...@8bytes.org wrote:
 Hi Shuah,

 On Fri, Feb 01, 2013 at 11:31:59AM -0700, Shuah Khan wrote:
 Yes, 3.7 has the same window of opportunity for this race condition,
 however I couldn't figure out why it doesn't happen on 3.7. On 3.7 the
 window between amd_iommu_init_hardware() and amd_iommu_init_dma_ops()
 might actually be wider than the window in 3.4.

 I think this is highly timing related. IOMMU initialization may have
 been moved by a few milliseconds between the kernel versions which might
 cause the warnings to appear or disappear. I don't think it has much
 value to dive deeper into the differences between the initialization
 sequences.

 As somethimes with such issues there is a simple and a more complex fix
 for that. I'll try to come up with a simple fix for the next merge
 window and implement the clean and more complex one for the next one.


Hi Joerg,

Thanks much. I will hang on to this test system for testing your fix.

-- Shuah
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: IO_PAGE_FAULTs on unity mapped regions during amd_iommu_init() in Linux 3.4

2013-02-06 Thread Shuah Khan
On Wed, 2013-02-06 at 13:12 +0100, Joerg Roedel wrote:
 On Tue, Feb 05, 2013 at 06:57:21AM -0700, Shuah Khan wrote:
  Thanks much. I will hang on to this test system for testing your fix.
 
 Okay, here is the simple fix for v3.8-rc6. I guess it is not
 straighforward to port it to v3.4, but it should be doable.
 
 From 2ecf57c85e67e0243b36b787d0490c0b47202ba8 Mon Sep 17 00:00:00 2001
 From: Joerg Roedel j...@8bytes.org
 Date: Wed, 6 Feb 2013 12:55:23 +0100
 Subject: [PATCH] iommu/amd: Initialize device table after dma_ops
 
 When dma_ops are initialized the unity mappings are
 created. The init_device_table_dma() function makes sure DMA
 from all devices is blocked by default. This opens a short
 window in time where DMA to unity mapped regions is blocked
 by the IOMMU. Make sure this does not happen by initializing
 the device table after dma_ops.
 
 Signed-off-by: Joerg Roedel j...@8bytes.org

Joerg,

I tested your patch on 3.8. I was able to reproduce the problem and then
apply your patch to verify that the problem is fixed. This patch applies
cleanly to 3.7.6, however I could not reproduce the problem on 3.7.6
without the patch. But the window exists on 3.7 as well. Your patch can
be applied to 3.7.6 as is.

I back-ported the patch to 3.4 and 3.0 and tested. I am sending those
patches after this email.

On 3.4.29 and 3.0.62 I was able to reproduce the problem and then
applied the back-ported patch to verify that the problem is fixed.

Thanks again for the fix.

-- Shuah

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH] amd_iommu: IO_PAGE_FAULTs on unity mapped regions during amd_iommu_init()

2013-02-06 Thread Shuah Khan
When dma_ops are initialized the unity mappings are created. The
init_device_table_dma() function makes sure DMA from all devices is
blocked by default. This opens a short window in time where DMA to
unity mapped regions is blocked by the IOMMU. Make sure this does not
happen by initializing the device table after dma_ops.

Signed-off-by: Shuah Khan shuah.k...@hp.com
CC: sta...@vger.kernel.org 3.4
---
 drivers/iommu/amd_iommu_init.c |9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
index ef0ae93..b573f80 100644
--- a/drivers/iommu/amd_iommu_init.c
+++ b/drivers/iommu/amd_iommu_init.c
@@ -1572,8 +1572,6 @@ int __init amd_iommu_init_hardware(void)
if (amd_iommu_pd_alloc_bitmap == NULL)
goto free;
 
-   /* init the device table */
-   init_device_table();
 
/*
 * let all alias entries point to itself
@@ -1655,6 +1653,7 @@ out:
  */
 static int __init amd_iommu_init(void)
 {
+   struct amd_iommu *iommu;
int ret = 0;
 
ret = amd_iommu_init_hardware();
@@ -1673,6 +1672,12 @@ static int __init amd_iommu_init(void)
if (ret)
goto free;
 
+   /* init the device table */
+   init_device_table();
+
+   for_each_iommu(iommu)
+   iommu_flush_all_caches(iommu);
+
amd_iommu_init_api();
 
x86_platform.iommu_shutdown = disable_iommus;
-- 
1.7.9.5



___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: IO_PAGE_FAULTs on unity mapped regions during amd_iommu_init() in Linux 3.4

2013-02-11 Thread Shuah Khan
On Mon, Feb 11, 2013 at 12:49 PM, Greg KH gre...@linuxfoundation.org wrote:
 On Wed, Feb 06, 2013 at 07:40:50PM -0700, Shuah Khan wrote:
 On Wed, 2013-02-06 at 13:12 +0100, Joerg Roedel wrote:
  On Tue, Feb 05, 2013 at 06:57:21AM -0700, Shuah Khan wrote:
   Thanks much. I will hang on to this test system for testing your fix.
 
  Okay, here is the simple fix for v3.8-rc6. I guess it is not
  straighforward to port it to v3.4, but it should be doable.
 
  From 2ecf57c85e67e0243b36b787d0490c0b47202ba8 Mon Sep 17 00:00:00 2001
  From: Joerg Roedel j...@8bytes.org
  Date: Wed, 6 Feb 2013 12:55:23 +0100
  Subject: [PATCH] iommu/amd: Initialize device table after dma_ops
 
  When dma_ops are initialized the unity mappings are
  created. The init_device_table_dma() function makes sure DMA
  from all devices is blocked by default. This opens a short
  window in time where DMA to unity mapped regions is blocked
  by the IOMMU. Make sure this does not happen by initializing
  the device table after dma_ops.
 
  Signed-off-by: Joerg Roedel j...@8bytes.org

 Joerg,

 I tested your patch on 3.8. I was able to reproduce the problem and then
 apply your patch to verify that the problem is fixed. This patch applies
 cleanly to 3.7.6, however I could not reproduce the problem on 3.7.6
 without the patch. But the window exists on 3.7 as well. Your patch can
 be applied to 3.7.6 as is.

 I back-ported the patch to 3.4 and 3.0 and tested. I am sending those
 patches after this email.

 On 3.4.29 and 3.0.62 I was able to reproduce the problem and then
 applied the back-ported patch to verify that the problem is fixed.

 Thanks again for the fix.

 I'm lost here, why isn't this patch in Linus's tree already?  You seem
 to be sending backports for something that isn't there to backport yet.

 Please resend these patches to sta...@vger.kernel.org, with the upstream
 git commit id, when they are in Linus's tree, there's nothing I can do
 with them for now, sorry.

 greg k-h

Greg,

I will resend the patch when I have the commit id from Linus's tree.

Thanks,
-- Shuah
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 0/4] pci: Add PCI_BUS() and PCI_DEVID() interfaces to return bus number and device id

2013-02-11 Thread Shuah Khan
pci defines PCI_DEVFN(), PCI_SLOT(), and PCI_FUNC() interfaces, however,
it doesn't have interfaces to return PCI bus and PCI device id. Drivers
(AMD IOMMU, and AER) have module specific definitions for PCI_BUS() and
AMD_IOMMU driver also has a module specific interface to calculate PCI
device id from bus number and devfn.

This patch set adds PCI_BUS(), and PCI_DEVID() to pci.h, changes are to
use PCI_BUS() from pci. Changes AMD_IOMMU driver to use PCI_BUS() and
PCI_DEVID() from pci.

Files changed:

 include/uapi/linux/pci.h |4 
 1 file changed, 4 insertions(+)

 drivers/pci/pcie/aer/aerdrv_core.c |2 --
 1 file changed, 2 deletions(-)

 drivers/iommu/amd_iommu_types.h |4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

 drivers/iommu/amd_iommu.c   |2 +-
 drivers/iommu/amd_iommu_init.c  |6 +++---
 drivers/iommu/amd_iommu_types.h |7 ---
 3 files changed, 4 insertions(+), 11 deletions(-)


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 1/4] pci: Add PCI_BUS() and PCI_DEVID() interfaces to return bus number and device id

2013-02-11 Thread Shuah Khan
pci defines PCI_DEVFN(), PCI_SLOT(), and PCI_FUNC() interfaces, however,
it doesn't have interfaces to return PCI bus and PCI device id. Drivers
(AMD IOMMU, and AER) implement module specific definitions for PCI_BUS()
and AMD_IOMMU driver also has a module specific interface to calculate PCI   
device id from bus number and devfn.

Add PCI_BUS and PCI_DEVID interfaces to return PCI bus number and PCI device
id respectively to avoid the need for duplicate definitions in other modules.
AER driver code and AMD IOMMU driver define PCI_BUS. AMD IOMMU driver defines
an interface to calculate device id from bus number, and devfn pair.

Signed-off-by: Shuah Khan shuah.k...@hp.com
---
 include/uapi/linux/pci.h |4 
 1 file changed, 4 insertions(+)

diff --git a/include/uapi/linux/pci.h b/include/uapi/linux/pci.h
index 3c292bc0..6b2c8b3 100644
--- a/include/uapi/linux/pci.h
+++ b/include/uapi/linux/pci.h
@@ -30,6 +30,10 @@
 #define PCI_DEVFN(slot, func)  slot)  0x1f)  3) | ((func)  0x07))
 #define PCI_SLOT(devfn)(((devfn)  3)  0x1f)
 #define PCI_FUNC(devfn)((devfn)  0x07)
+#define PCI_DEVID(bus, devfn)  u16)bus)  8) | devfn)
+
+/* return bus from PCI devid = ((u16)bus_number)  8) | devfn */
+#define PCI_BUS(x) (((x)  8)  0xff)
 
 /* Ioctls for /proc/bus/pci/X/Y nodes. */
 #define PCIIOC_BASE('P'  24 | 'C'  16 | 'I'  8)
-- 
1.7.9.5



___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 3/4] amd_iommu: Remove local PCI_BUS() define and use PCI_BUS() from pci

2013-02-11 Thread Shuah Khan
Change to remove local PCI_BUS() define and use the new PCI_BUS interface from
pci.

Signed-off-by: Shuah Khan shuah.k...@hp.com
---
 drivers/iommu/amd_iommu_types.h |4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/iommu/amd_iommu_types.h b/drivers/iommu/amd_iommu_types.h
index e38ab43..a07882f 100644
--- a/drivers/iommu/amd_iommu_types.h
+++ b/drivers/iommu/amd_iommu_types.h
@@ -24,6 +24,7 @@
 #include linux/mutex.h
 #include linux/list.h
 #include linux/spinlock.h
+#include linux/pci.h
 
 /*
  * Maximum number of IOMMUs supported
@@ -315,9 +316,6 @@
 
 #define MAX_DOMAIN_ID 65536
 
-/* FIXME: move this macro to linux/pci.h */
-#define PCI_BUS(x) (((x)  8)  0xff)
-
 /* Protection domain flags */
 #define PD_DMA_OPS_MASK(1UL  0) /* domain used for dma_ops */
 #define PD_DEFAULT_MASK(1UL  1) /* domain is a default 
dma_ops
-- 
1.7.9.5



___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 4/4] amd_iommu: Remove calc_devid() and use PCI_DEVID() from pci

2013-02-11 Thread Shuah Khan
Change to remove calc_devid() and use PCI_DEVID() from pci instead.

Signed-off-by: Shuah Khan shuah.k...@hp.com
---
 drivers/iommu/amd_iommu.c   |2 +-
 drivers/iommu/amd_iommu_init.c  |6 +++---
 drivers/iommu/amd_iommu_types.h |7 ---
 3 files changed, 4 insertions(+), 11 deletions(-)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index c1c74e0..38c1392 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -173,7 +173,7 @@ static inline u16 get_device_id(struct device *dev)
 {
struct pci_dev *pdev = to_pci_dev(dev);
 
-   return calc_devid(pdev-bus-number, pdev-devfn);
+   return PCI_DEVID(pdev-bus-number, pdev-devfn);
 }
 
 static struct iommu_dev_data *get_dev_data(struct device *dev)
diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
index faf10ba..582b5df 100644
--- a/drivers/iommu/amd_iommu_init.c
+++ b/drivers/iommu/amd_iommu_init.c
@@ -406,7 +406,7 @@ static int __init find_last_devid_on_pci(int bus, int dev, 
int fn, int cap_ptr)
u32 cap;
 
cap = read_pci_config(bus, dev, fn, cap_ptr+MMIO_RANGE_OFFSET);
-   update_last_devid(calc_devid(MMIO_GET_BUS(cap), MMIO_GET_LD(cap)));
+   update_last_devid(PCI_DEVID(MMIO_GET_BUS(cap), MMIO_GET_LD(cap)));
 
return 0;
 }
@@ -1128,9 +1128,9 @@ static int iommu_init_pci(struct amd_iommu *iommu)
pci_read_config_dword(iommu-dev, cap_ptr + MMIO_MISC_OFFSET,
  misc);
 
-   iommu-first_device = calc_devid(MMIO_GET_BUS(range),
+   iommu-first_device = PCI_DEVID(MMIO_GET_BUS(range),
 MMIO_GET_FD(range));
-   iommu-last_device = calc_devid(MMIO_GET_BUS(range),
+   iommu-last_device = PCI_DEVID(MMIO_GET_BUS(range),
MMIO_GET_LD(range));
 
if (!(iommu-cap  (1  IOMMU_CAP_IOTLB)))
diff --git a/drivers/iommu/amd_iommu_types.h b/drivers/iommu/amd_iommu_types.h
index a07882f..ec36cf6 100644
--- a/drivers/iommu/amd_iommu_types.h
+++ b/drivers/iommu/amd_iommu_types.h
@@ -701,13 +701,6 @@ extern int amd_iommu_max_glx_val;
  */
 extern void iommu_flush_all_caches(struct amd_iommu *iommu);
 
-/* takes bus and device/function and returns the device id
- * FIXME: should that be in generic PCI code? */
-static inline u16 calc_devid(u8 bus, u8 devfn)
-{
-   return (((u16)bus)  8) | devfn;
-}
-
 static inline int get_ioapic_devid(int id)
 {
struct devid_map *entry;
-- 
1.7.9.5



___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 1/4] pci: Add PCI_BUS() and PCI_DEVID() interfaces to return bus number and device id

2013-02-25 Thread Shuah Khan
On Wed, 2013-02-20 at 18:19 -0700, Bjorn Helgaas wrote:
 On Mon, Feb 11, 2013 at 4:00 PM, Shuah Khan shuah.k...@hp.com wrote:
  pci defines PCI_DEVFN(), PCI_SLOT(), and PCI_FUNC() interfaces, however,
  it doesn't have interfaces to return PCI bus and PCI device id. Drivers
  (AMD IOMMU, and AER) implement module specific definitions for PCI_BUS()
  and AMD_IOMMU driver also has a module specific interface to calculate PCI
  device id from bus number and devfn.
 
  Add PCI_BUS and PCI_DEVID interfaces to return PCI bus number and PCI device
  id respectively to avoid the need for duplicate definitions in other 
  modules.
  AER driver code and AMD IOMMU driver define PCI_BUS. AMD IOMMU driver 
  defines
  an interface to calculate device id from bus number, and devfn pair.
 
  Signed-off-by: Shuah Khan shuah.k...@hp.com
  ---
   include/uapi/linux/pci.h |4 
   1 file changed, 4 insertions(+)
 
  diff --git a/include/uapi/linux/pci.h b/include/uapi/linux/pci.h
  index 3c292bc0..6b2c8b3 100644
  --- a/include/uapi/linux/pci.h
  +++ b/include/uapi/linux/pci.h
  @@ -30,6 +30,10 @@
   #define PCI_DEVFN(slot, func)  slot)  0x1f)  3) | ((func)  0x07))
   #define PCI_SLOT(devfn)(((devfn)  3)  0x1f)
   #define PCI_FUNC(devfn)((devfn)  0x07)
  +#define PCI_DEVID(bus, devfn)  u16)bus)  8) | devfn)
  +
  +/* return bus from PCI devid = ((u16)bus_number)  8) | devfn */
  +#define PCI_BUS(x) (((x)  8)  0xff)
 
   /* Ioctls for /proc/bus/pci/X/Y nodes. */
   #define PCIIOC_BASE('P'  24 | 'C'  16 | 'I'  8)
 
 David, can you point me at a description of include/uapi ... what is
 there and why, and how we should decide what new things go in
 include/uapi/linux/pci.h as opposed to include/linux/pci.h?  Maybe
 there should be something in Documentation/?
 
 I'm guessing it's something to do with being exported to userland, but
 I'm not sure the things in this patch (PCI_DEV_ID, PCI_BUS) are really
 exportable in the sense of being used for syscalls, etc.
 

Bjorn,David,

Looks like the following thread answers some of the questions about when
this uapi export was done on the existing defines.

https://lkml.org/lkml/2011/7/28/198 

Sounds like the concern is that the older defines PCI_DEVFN, PCI_SLOT,
PCI_FUNC,  and PCI_DEVID could be exported, but not the new ones I
added. I could find any discussion on whether these four older defines
are exportable or the reasons for the export in the above thread.

So the question is if uapi/linux.pci.h isn't the right place, do you
have a recommendation on where they belong. The only alternative I can
think of is include/linux/pci.h. It makes functional and logical sense
to add the new defines to where the existing ones are defines. At least,
not knowing the details of the change that moved PCI_DEVFN etc. to
uapi/pci.h, that is my conclusion.

-- Shuah




___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 1/4] pci: Add PCI_BUS() and PCI_DEVID() interfaces to return bus number and device id

2013-02-25 Thread Shuah Khan
On Mon, 2013-02-25 at 14:23 -0700, Bjorn Helgaas wrote:
 On Mon, Feb 25, 2013 at 9:37 AM, Shuah Khan shuah.k...@hp.com wrote:
  On Wed, 2013-02-20 at 18:19 -0700, Bjorn Helgaas wrote:
  On Mon, Feb 11, 2013 at 4:00 PM, Shuah Khan shuah.k...@hp.com wrote:
   pci defines PCI_DEVFN(), PCI_SLOT(), and PCI_FUNC() interfaces, however,
   it doesn't have interfaces to return PCI bus and PCI device id. Drivers
   (AMD IOMMU, and AER) implement module specific definitions for PCI_BUS()
   and AMD_IOMMU driver also has a module specific interface to calculate 
   PCI
   device id from bus number and devfn.
  
   Add PCI_BUS and PCI_DEVID interfaces to return PCI bus number and PCI 
   device
   id respectively to avoid the need for duplicate definitions in other 
   modules.
   AER driver code and AMD IOMMU driver define PCI_BUS. AMD IOMMU driver 
   defines
   an interface to calculate device id from bus number, and devfn pair.
  
   Signed-off-by: Shuah Khan shuah.k...@hp.com
   ---
include/uapi/linux/pci.h |4 
1 file changed, 4 insertions(+)
  
   diff --git a/include/uapi/linux/pci.h b/include/uapi/linux/pci.h
   index 3c292bc0..6b2c8b3 100644
   --- a/include/uapi/linux/pci.h
   +++ b/include/uapi/linux/pci.h
   @@ -30,6 +30,10 @@
#define PCI_DEVFN(slot, func)  slot)  0x1f)  3) | ((func)  
   0x07))
#define PCI_SLOT(devfn)(((devfn)  3)  0x1f)
#define PCI_FUNC(devfn)((devfn)  0x07)
   +#define PCI_DEVID(bus, devfn)  u16)bus)  8) | devfn)
   +
   +/* return bus from PCI devid = ((u16)bus_number)  8) | devfn */
   +#define PCI_BUS(x) (((x)  8)  0xff)
  
/* Ioctls for /proc/bus/pci/X/Y nodes. */
#define PCIIOC_BASE('P'  24 | 'C'  16 | 'I'  8)
 
  David, can you point me at a description of include/uapi ... what is
  there and why, and how we should decide what new things go in
  include/uapi/linux/pci.h as opposed to include/linux/pci.h?  Maybe
  there should be something in Documentation/?
 
  I'm guessing it's something to do with being exported to userland, but
  I'm not sure the things in this patch (PCI_DEV_ID, PCI_BUS) are really
  exportable in the sense of being used for syscalls, etc.
 
 
  Bjorn,David,
 
  Looks like the following thread answers some of the questions about when
  this uapi export was done on the existing defines.
 
  https://lkml.org/lkml/2011/7/28/198
 
  Sounds like the concern is that the older defines PCI_DEVFN, PCI_SLOT,
  PCI_FUNC,  and PCI_DEVID could be exported, but not the new ones I
  added. I could find any discussion on whether these four older defines
  are exportable or the reasons for the export in the above thread.
 
 I think David's disintegration script took include/linux/pci.h, left
 the #ifdef __KERNEL__ parts there, and moved everything else (which
 wasn't much) to include/uapi/linux/pci.h.
 
 It's obvious that the PCIIOC_ #defines need to be exported to
 user-space for ioctls.  It's not obvious to me why PCI_DEVFN,
 PCI_SLOT, and PCI_FUNC need to be exported to user-space.  But I can
 imagine user-space using functionality like that, even if it's not
 connected to a kernel interface.  I assume the intent of the
 disintegration is that only include/uapi would be exposed to
 user-space, so keeping those definitions in include/linux/pci.h would
 break any user programs that used them.
 
  So the question is if uapi/linux.pci.h isn't the right place, do you
  have a recommendation on where they belong. The only alternative I can
  think of is include/linux/pci.h. It makes functional and logical sense
  to add the new defines to where the existing ones are defines. At least,
  not knowing the details of the change that moved PCI_DEVFN etc. to
  uapi/pci.h, that is my conclusion.
 
 Using the linux-fullhist tree, I found these:
 
 059d367 Import 2.1.82 -- moved PCI_DEVFN outside #ifdef __KERNEL__
 b039547 Import 2.1.76 -- PCI_DEVFN was inside #ifdef __KERNEL__
 f6d9739 Import 2.1.68pre1 -- added #ifdef __KERNEL__ (enclosing PCI_DEVFN)
 940649f Import 1.3.0 -- added PCI_DEVFN
 
 There's no indication of *why* PCI_DEVFN was exported, of course.
 
 Bottom line, I think it's reasonable to keep PCI_DEVFN, et al., in
 uapi/linux/pci.h to keep from breaking user-programs, even though if
 we were adding them today we would probably put them in the
 kernel-only linux/pci.h.  For the new ones you're adding, I'd propose
 putting them in the kernel-only linux/pci.h because we know no user
 programs use them.

Yeah. These have been in uapi long enough to continue to keep them
there. :)

 
 It's not nice and consistent, but it does follow the simple rule of
 don't expose things to user-space unnecessarily.  We might want to
 add a comment to keep somebody from cleaning it up later.

ok. Will resend patches adding the new defines to linux/pci.h and
renaming PCI_BUS() to PCI_BUS_NR() or PCI_BUS_NUM() like you suggested.

Thanks,
-- Shuah


___
iommu mailing

[PATCH 3.0] iommu/amd: Initialize device table after dma_ops

2013-02-26 Thread Shuah Khan
When dma_ops are initialized the unity mappings are created. The
init_device_table_dma() function makes sure DMA from all devices is
blocked by default. This opens a short window in time where DMA to
unity mapped regions is blocked by the IOMMU. Make sure this does not
happen by initializing the device table after dma_ops.

Back-port upstream commit: f528d980c17b8714aedc918ba86e058af914d66b

Signed-off-by: Joerg Roedel j...@8bytes.org
Signed-off-by: Shuah Khan shuah.k...@hp.com
CC: sta...@vger.kernel.org 3.0
---
 arch/x86/kernel/amd_iommu_init.c |   10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/amd_iommu_init.c b/arch/x86/kernel/amd_iommu_init.c
index 33df6e8..d86aa3f 100644
--- a/arch/x86/kernel/amd_iommu_init.c
+++ b/arch/x86/kernel/amd_iommu_init.c
@@ -1363,6 +1363,7 @@ static struct syscore_ops amd_iommu_syscore_ops = {
  */
 static int __init amd_iommu_init(void)
 {
+   struct amd_iommu *iommu;
int i, ret = 0;
 
/*
@@ -1411,9 +1412,6 @@ static int __init amd_iommu_init(void)
if (amd_iommu_pd_alloc_bitmap == NULL)
goto free;
 
-   /* init the device table */
-   init_device_table();
-
/*
 * let all alias entries point to itself
 */
@@ -1463,6 +1461,12 @@ static int __init amd_iommu_init(void)
if (ret)
goto free_disable;
 
+   /* init the device table */
+   init_device_table();
+
+   for_each_iommu(iommu)
+   iommu_flush_all_caches(iommu);
+
amd_iommu_init_api();
 
amd_iommu_init_notifier();
-- 
1.7.9.5



___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v2 1/4] pci: Add PCI_BUS_NUM() and PCI_DEVID() interfaces to return bus number and device id

2013-02-27 Thread Shuah Khan
pci defines PCI_DEVFN(), PCI_SLOT(), and PCI_FUNC() interfaces, however,
it doesn't have interfaces to return PCI bus and PCI device id. Drivers
(AMD IOMMU, and AER) implement module specific definitions for PCI_BUS()
and AMD_IOMMU driver also has a module specific interface to calculate PCI
device id from bus number and devfn.

Add PCI_BUS_NUM and PCI_DEVID interfaces to return PCI bus number and PCI
device id respectively to avoid the need for duplicate definitions in other
modules. AER driver code and AMD IOMMU driver define PCI_BUS. AMD IOMMU driver
defines an interface to calculate device id from bus number, and devfn pair.

PCI_DEVFN(), PCI_SLOT(), and PCI_FUNC() interfaces are exported to user-space
via uapi/linux/pci.h. However, in the interest to keep the new interfaces as
kernel only and not export them to user-space unnecessarily, added them to
linux/pci.h instead.

Signed-off-by: Shuah Khan shuah.k...@hp.com
---
 include/linux/pci.h |   15 +++
 1 file changed, 15 insertions(+)

diff --git a/include/linux/pci.h b/include/linux/pci.h
index 15472d6..95c105a 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -35,6 +35,21 @@
 /* Include the ID list */
 #include linux/pci_ids.h
 
+/*
+ * The PCI interface treats multi-function devices as independent
+ * devices.  The slot/function address of each device is encoded
+ * in a single byte as follows:
+ *
+ * 7:3 = slot
+ * 2:0 = function
+ * PCI_DEVFN(), PCI_SLOT(), and PCI_FUNC() are defined uapi/linux/pci.h
+ * In the interest of not exposing interfaces to user-space unnecessarily,
+ * the following kernel only defines are being added here.
+ */
+#define PCI_DEVID(bus, devfn)  u16)bus)  8) | devfn)
+/* return bus from PCI devid = ((u16)bus_number)  8) | devfn */
+#define PCI_BUS_NUM(x) (((x)  8)  0xff)
+
 /* pci_slot represents a physical slot */
 struct pci_slot {
struct pci_bus *bus;/* The bus this slot is on */
-- 
1.7.9.5



___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v2 2/4] pci/aer: Remove local PCI_BUS() define and use PCI_BUS_NUM() from pci

2013-02-27 Thread Shuah Khan
Change to remove local PCI_BUS() define and use the new PCI_BUS_NUM()
interface from pci.

Signed-off-by: Shuah Khan shuah.k...@hp.com
---
 drivers/pci/pcie/aer/aerdrv_core.c |4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/pci/pcie/aer/aerdrv_core.c 
b/drivers/pci/pcie/aer/aerdrv_core.c
index 564d97f..8ec8b4f 100644
--- a/drivers/pci/pcie/aer/aerdrv_core.c
+++ b/drivers/pci/pcie/aer/aerdrv_core.c
@@ -89,8 +89,6 @@ static int add_error_device(struct aer_err_info *e_info, 
struct pci_dev *dev)
return -ENOSPC;
 }
 
-#definePCI_BUS(x)  (((x)  8)  0xff)
-
 /**
  * is_error_source - check whether the device is source of reported error
  * @dev: pointer to pci_dev to be checked
@@ -106,7 +104,7 @@ static bool is_error_source(struct pci_dev *dev, struct 
aer_err_info *e_info)
 * When bus id is equal to 0, it might be a bad id
 * reported by root port.
 */
-   if (!nosourceid  (PCI_BUS(e_info-id) != 0)) {
+   if (!nosourceid  (PCI_BUS_NUM(e_info-id) != 0)) {
/* Device ID match? */
if (e_info-id == ((dev-bus-number  8) | dev-devfn))
return true;
-- 
1.7.9.5



___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v2 4/4] iommu/amd: Remove calc_devid() and use PCI_DEVID() from pci

2013-02-27 Thread Shuah Khan
Change to remove calc_devid() and use PCI_DEVID() from pci instead.

Signed-off-by: Shuah Khan shuah.k...@hp.com
---
 drivers/iommu/amd_iommu.c   |2 +-
 drivers/iommu/amd_iommu_init.c  |6 +++---
 drivers/iommu/amd_iommu_types.h |7 ---
 3 files changed, 4 insertions(+), 11 deletions(-)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index c1c74e0..38c1392 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -173,7 +173,7 @@ static inline u16 get_device_id(struct device *dev)
 {
struct pci_dev *pdev = to_pci_dev(dev);
 
-   return calc_devid(pdev-bus-number, pdev-devfn);
+   return PCI_DEVID(pdev-bus-number, pdev-devfn);
 }
 
 static struct iommu_dev_data *get_dev_data(struct device *dev)
diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
index faf10ba..582b5df 100644
--- a/drivers/iommu/amd_iommu_init.c
+++ b/drivers/iommu/amd_iommu_init.c
@@ -406,7 +406,7 @@ static int __init find_last_devid_on_pci(int bus, int dev, 
int fn, int cap_ptr)
u32 cap;
 
cap = read_pci_config(bus, dev, fn, cap_ptr+MMIO_RANGE_OFFSET);
-   update_last_devid(calc_devid(MMIO_GET_BUS(cap), MMIO_GET_LD(cap)));
+   update_last_devid(PCI_DEVID(MMIO_GET_BUS(cap), MMIO_GET_LD(cap)));
 
return 0;
 }
@@ -1128,9 +1128,9 @@ static int iommu_init_pci(struct amd_iommu *iommu)
pci_read_config_dword(iommu-dev, cap_ptr + MMIO_MISC_OFFSET,
  misc);
 
-   iommu-first_device = calc_devid(MMIO_GET_BUS(range),
+   iommu-first_device = PCI_DEVID(MMIO_GET_BUS(range),
 MMIO_GET_FD(range));
-   iommu-last_device = calc_devid(MMIO_GET_BUS(range),
+   iommu-last_device = PCI_DEVID(MMIO_GET_BUS(range),
MMIO_GET_LD(range));
 
if (!(iommu-cap  (1  IOMMU_CAP_IOTLB)))
diff --git a/drivers/iommu/amd_iommu_types.h b/drivers/iommu/amd_iommu_types.h
index a07882f..ec36cf6 100644
--- a/drivers/iommu/amd_iommu_types.h
+++ b/drivers/iommu/amd_iommu_types.h
@@ -701,13 +701,6 @@ extern int amd_iommu_max_glx_val;
  */
 extern void iommu_flush_all_caches(struct amd_iommu *iommu);
 
-/* takes bus and device/function and returns the device id
- * FIXME: should that be in generic PCI code? */
-static inline u16 calc_devid(u8 bus, u8 devfn)
-{
-   return (((u16)bus)  8) | devfn;
-}
-
 static inline int get_ioapic_devid(int id)
 {
struct devid_map *entry;
-- 
1.7.9.5



___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 3.4] iommu/amd: Initialize device table after dma_ops

2013-02-27 Thread Shuah Khan
When dma_ops are initialized the unity mappings are created. The
init_device_table_dma() function makes sure DMA from all devices is
blocked by default. This opens a short window in time where DMA to
unity mapped regions is blocked by the IOMMU. Make sure this does not
happen by initializing the device table after dma_ops.

Back-port upstream commit: f528d980c17b8714aedc918ba86e058af914d66b

Signed-off-by: Joerg Roedel j...@8bytes.org
Signed-off-by: Shuah Khan shuah.k...@hp.com
CC: sta...@vger.kernel.org 3.4
---
 drivers/iommu/amd_iommu_init.c |9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
index ef0ae93..b573f80 100644
--- a/drivers/iommu/amd_iommu_init.c
+++ b/drivers/iommu/amd_iommu_init.c
@@ -1572,8 +1572,6 @@ int __init amd_iommu_init_hardware(void)
if (amd_iommu_pd_alloc_bitmap == NULL)
goto free;
 
-   /* init the device table */
-   init_device_table();
 
/*
 * let all alias entries point to itself
@@ -1655,6 +1653,7 @@ out:
  */
 static int __init amd_iommu_init(void)
 {
+   struct amd_iommu *iommu;
int ret = 0;
 
ret = amd_iommu_init_hardware();
@@ -1673,6 +1672,12 @@ static int __init amd_iommu_init(void)
if (ret)
goto free;
 
+   /* init the device table */
+   init_device_table();
+
+   for_each_iommu(iommu)
+   iommu_flush_all_caches(iommu);
+
amd_iommu_init_api();
 
x86_platform.iommu_shutdown = disable_iommus;
-- 
1.7.9.5



___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 1/4] pci: Add PCI_BUS() and PCI_DEVID() interfaces to return bus number and device id

2013-02-27 Thread Shuah Khan
On Mon, 2013-02-25 at 14:53 -0700, Shuah Khan wrote:
 On Mon, 2013-02-25 at 14:23 -0700, Bjorn Helgaas wrote:
  On Mon, Feb 25, 2013 at 9:37 AM, Shuah Khan shuah.k...@hp.com wrote:
   On Wed, 2013-02-20 at 18:19 -0700, Bjorn Helgaas wrote:
   On Mon, Feb 11, 2013 at 4:00 PM, Shuah Khan shuah.k...@hp.com wrote:

  
  It's not nice and consistent, but it does follow the simple rule of
  don't expose things to user-space unnecessarily.  We might want to
  add a comment to keep somebody from cleaning it up later.
 
 ok. Will resend patches adding the new defines to linux/pci.h and
 renaming PCI_BUS() to PCI_BUS_NR() or PCI_BUS_NUM() like you suggested.
 
 Thanks,
 -- Shuah
 

Bjorn/Joerg,

I added PCI_BUS_NUM() amd PCI_DEVID() to linux/pci.h. Please note that
changing PCI_BUS() to PCI_BUS_NUM() required additional changes to AMD
IOMMU source files and aer driver. Essentially in addition to removing
local PCI_BUS() define, PCI_BUS() usages are changed to PCI_BUS_NUM(). I
am resending the patches.

Thanks,
-- Shuah


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 3.4] iommu/amd: Initialize device table after dma_ops

2013-02-28 Thread Shuah Khan
On Tue, 2013-02-26 at 16:49 -0700, Shuah Khan wrote:
 When dma_ops are initialized the unity mappings are created. The
 init_device_table_dma() function makes sure DMA from all devices is
 blocked by default. This opens a short window in time where DMA to
 unity mapped regions is blocked by the IOMMU. Make sure this does not
 happen by initializing the device table after dma_ops.
 
 Back-port upstream commit: f528d980c17b8714aedc918ba86e058af914d66b
 
 Signed-off-by: Joerg Roedel j...@8bytes.org
 Signed-off-by: Shuah Khan shuah.k...@hp.com
 CC: sta...@vger.kernel.org 3.4

Please consider this for 3.5 as well.

-- Shuah
 ---
  drivers/iommu/amd_iommu_init.c |9 +++--
  1 file changed, 7 insertions(+), 2 deletions(-)
 
 diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
 index ef0ae93..b573f80 100644
 --- a/drivers/iommu/amd_iommu_init.c
 +++ b/drivers/iommu/amd_iommu_init.c
 @@ -1572,8 +1572,6 @@ int __init amd_iommu_init_hardware(void)
   if (amd_iommu_pd_alloc_bitmap == NULL)
   goto free;
  
 - /* init the device table */
 - init_device_table();
  
   /*
* let all alias entries point to itself
 @@ -1655,6 +1653,7 @@ out:
   */
  static int __init amd_iommu_init(void)
  {
 + struct amd_iommu *iommu;
   int ret = 0;
  
   ret = amd_iommu_init_hardware();
 @@ -1673,6 +1672,12 @@ static int __init amd_iommu_init(void)
   if (ret)
   goto free;
  
 + /* init the device table */
 + init_device_table();
 +
 + for_each_iommu(iommu)
 + iommu_flush_all_caches(iommu);
 +
   amd_iommu_init_api();
  
   x86_platform.iommu_shutdown = disable_iommus;


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 3.4] iommu/amd: Initialize device table after dma_ops

2013-02-28 Thread Shuah Khan
On Thu, 2013-02-28 at 12:52 -0700, Shuah Khan wrote:
 On Tue, 2013-02-26 at 16:49 -0700, Shuah Khan wrote:
  When dma_ops are initialized the unity mappings are created. The
  init_device_table_dma() function makes sure DMA from all devices is
  blocked by default. This opens a short window in time where DMA to
  unity mapped regions is blocked by the IOMMU. Make sure this does not
  happen by initializing the device table after dma_ops.
  
  Back-port upstream commit: f528d980c17b8714aedc918ba86e058af914d66b
  
  Signed-off-by: Joerg Roedel j...@8bytes.org
  Signed-off-by: Shuah Khan shuah.k...@hp.com
  CC: sta...@vger.kernel.org 3.4
 
 Please consider this for 3.5 as well.
 
 -- Shuah

Please consider this for 3.5. Patch applies and fix works on 3.5.7. My
previous reply to Herton Ronaldo Krzesinski bounced. Including Luis
Henriques on this one.

-- Shuah

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 3.2] iommu/amd: Initialize device table after dma_ops

2013-02-28 Thread Shuah Khan
When dma_ops are initialized the unity mappings are created. The
init_device_table_dma() function makes sure DMA from all devices is
blocked by default. This opens a short window in time where DMA to
unity mapped regions is blocked by the IOMMU. Make sure this does not
happen by initializing the device table after dma_ops.

Back-port upstream commit: f528d980c17b8714aedc918ba86e058af914d66b
Tested on 3.2.38

Signed-off-by: Joerg Roedel j...@8bytes.org
Signed-off-by: Shuah Khan shuah.k...@hp.com
CC: sta...@vger.kernel.org 3.2
---
 drivers/iommu/amd_iommu_init.c |   10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
index 62a4d5c..b7d1cdd 100644
--- a/drivers/iommu/amd_iommu_init.c
+++ b/drivers/iommu/amd_iommu_init.c
@@ -1396,6 +1396,7 @@ static struct syscore_ops amd_iommu_syscore_ops = {
  */
 static int __init amd_iommu_init(void)
 {
+   struct amd_iommu *iommu;
int i, ret = 0;
 
/*
@@ -1444,9 +1445,6 @@ static int __init amd_iommu_init(void)
if (amd_iommu_pd_alloc_bitmap == NULL)
goto free;
 
-   /* init the device table */
-   init_device_table();
-
/*
 * let all alias entries point to itself
 */
@@ -1496,6 +1494,12 @@ static int __init amd_iommu_init(void)
if (ret)
goto free_disable;
 
+   /* init the device table */
+   init_device_table();
+
+   for_each_iommu(iommu)
+   iommu_flush_all_caches(iommu);
+
amd_iommu_init_api();
 
amd_iommu_init_notifier();
-- 
1.7.10.4



___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 8/9] iommu/amd: Don't report firmware bugs with cmd-line ivrs overrides

2013-04-10 Thread Shuah Khan
On Tue, Apr 9, 2013 at 2:13 PM, Joerg Roedel j...@8bytes.org wrote:
 When the IVRS entries for IOAPIC and HPET are overridden on
 the kernel command line, a problem detected in the check
 function might not be a firmware bug anymore. So disable
 the firmware bug reporting if the user provided valid
 ivrs_ioapic or ivrs_hpet entries on the command line.

 Signed-off-by: Joerg Roedel j...@8bytes.org
 ---
  drivers/iommu/amd_iommu_init.c |   19 ---
  1 file changed, 16 insertions(+), 3 deletions(-)

 diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
 index 030d6ab..9767941 100644
 --- a/drivers/iommu/amd_iommu_init.c
 +++ b/drivers/iommu/amd_iommu_init.c
 @@ -219,6 +219,7 @@ static struct devid_map __initdata 
 early_ioapic_map[EARLY_MAP_SIZE];
  static struct devid_map __initdata early_hpet_map[EARLY_MAP_SIZE];
  static int __initdata early_ioapic_map_size;
  static int __initdata early_hpet_map_size;
 +static bool __initdata cmdline_maps;

  static enum iommu_init_state init_state = IOMMU_START_STATE;

 @@ -1686,18 +1687,28 @@ static void __init free_on_init_error(void)

  static bool __init check_ioapic_information(void)
  {
 +   const char *fw_bug = FW_BUG;
 bool ret, has_sb_ioapic;
 int idx;

 has_sb_ioapic = false;
 ret   = false;

 +   /*
 +* If we have map overrides on the kernel command line the
 +* messages in this function might not describe firmware bugs
 +* anymore - so be careful
 +*/
 +   if (cmdline_maps)
 +   fw_bug = ;
 +
 for (idx = 0; idx  nr_ioapics; idx++) {
 int devid, id = mpc_ioapic_id(idx);

 devid = get_ioapic_devid(id);
 if (devid  0) {
 -   pr_err(FW_BUG AMD-Vi: IOAPIC[%d] not in IVRS 
 table\n, id);
 +   pr_err(%sAMD-Vi: IOAPIC[%d] not in IVRS table\n,
 +   fw_bug, id);
 ret = false;
 } else if (devid == IOAPIC_SB_DEVID) {
 has_sb_ioapic = true;
 @@ -1714,11 +1725,11 @@ static bool __init check_ioapic_information(void)
  * when the BIOS is buggy and provides us the wrong
  * device id for the IOAPIC in the system.
  */
 -   pr_err(FW_BUG AMD-Vi: No southbridge IOAPIC found in IVRS 
 table\n);
 +   pr_err(%sAMD-Vi: No southbridge IOAPIC found\n, fw_bug);
 }

 if (!ret)
 -   pr_err(AMD-Vi: Disabling interrupt remapping due to BIOS 
 Bug(s)\n);
 +   pr_err(AMD-Vi: Disabling interrupt remapping\n);

Can this made conditional on cmdline_maps and keep the original
message in the case of a real BIOS bug? That way it is explicit and
easier to debug.


 return ret;
  }
 @@ -2166,6 +2177,7 @@ static int __init parse_ivrs_ioapic(char *str)

 devid = ((bus  0xff)  8) | ((dev  0x1f)  3) | (fn  0x7);

 +   cmdline_maps= true;
 i   = early_ioapic_map_size++;
 early_ioapic_map[i].id  = id;
 early_ioapic_map[i].devid   = devid;
 @@ -2195,6 +2207,7 @@ static int __init parse_ivrs_hpet(char *str)

 devid = ((bus  0xff)  8) | ((dev  0x1f)  3) | (fn  0x7);

 +   cmdline_maps= true;
 i   = early_hpet_map_size++;
 early_hpet_map[i].id= id;
 early_hpet_map[i].devid = devid;
 --
 1.7.9.5


 --
 To unsubscribe from this list: send the line unsubscribe linux-kernel in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
 Please read the FAQ at  http://www.tux.org/lkml/
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 1/9] iommu/amd: Remove map_sg_no_iommu()

2013-04-10 Thread Shuah Khan
On Tue, Apr 9, 2013 at 2:12 PM, Joerg Roedel j...@8bytes.org wrote:
 This function was intended as a fall-back if the map_sg
 function is called for a device not mapped by the IOMMU.
 Since the AMD IOMMU driver uses per-device dma_ops this can
 never happen. So this function isn't needed anymore.

 Signed-off-by: Joerg Roedel j...@8bytes.org
 ---

Reviewed-by: Shuah Khan shuahk...@gmail.com
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 2/9] iommu/amd: Use AMD specific data structure for irq remapping

2013-04-10 Thread Shuah Khan
On Tue, Apr 9, 2013 at 2:12 PM, Joerg Roedel j...@8bytes.org wrote:
 For compatibility reasons the irq remapping code for the AMD
 IOMMU used the same per-irq data structure as the Intel
 implementation. Now that support for the AMD specific data
 structure is upstream we can use this one instead.

 Signed-off-by: Joerg Roedel j...@8bytes.org
 ---

Reviewed-by: Shuah Khan shuahk...@gmail.com
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 0/9] AMD IOMMU cleanups, fixes and IVRS bug workarounds

2013-04-10 Thread Shuah Khan
On Tue, Apr 9, 2013 at 2:12 PM, Joerg Roedel j...@8bytes.org wrote:
 Hi,

 this patch-set contains some cleanups and patches for the
 AMD IOMM driver. The most important part is a workaround
 that can be used to get interrupt remapping working even the
 the IVRS table provided by the BIOS is broken.


 Joerg

 Joerg Roedel (9):
   iommu/amd: Remove map_sg_no_iommu()
   iommu/amd: Use AMD specific data structure for irq remapping
   iommu/amd: Properly initialize irq-table lock
   iommu/amd: Move add_special_device() to __init
   iommu/amd: Extend IVRS special device data structure
   iommu/amd: Add early maps for ioapic and hpet
   iommu/amd: Add ioapic and hpet ivrs override
   iommu/amd: Don't report firmware bugs with cmd-line ivrs overrides
   iommu/amd: Document ivrs_ioapic and ivrs_hpet parameters

  Documentation/kernel-parameters.txt |   14 
  drivers/iommu/amd_iommu.c   |   79 +++---
  drivers/iommu/amd_iommu_init.c  |  151 
 +++
  drivers/iommu/amd_iommu_types.h |1 +
  4 files changed, 182 insertions(+), 63 deletions(-)


Joerg,

Reviewed all the patches in this set. No longer have access to test machine. :(

Reviewed-by: Shuah Khan shuahk...@gmail.com

-- Shuah
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 2/2 V5] iommu/amd: Add logic to decode AMD IOMMU event flag

2013-04-10 Thread Shuah Khan
 == 0) {
 +   if (err_type  ARRAY_SIZE(_invalid_trnslt_desc))
 +   pr_cont(%s\n,
 +   _invalid_trnslt_desc[err_type]);
 +   } else {
 +   if (err_type  ARRAY_SIZE(_invalid_trnsac_desc))
 +   pr_cont(%s\n,
 +   _invalid_trnsac_desc[err_type]);
 +   }
 +   }
 +}
 +
 +static void dump_flags(int flags, int ev_type)
 +{
 +   struct _event_log_flags *p = (struct _event_log_flags *) flags;
 +   u32 err_type = p-type;
 +
 +   pr_cont( flags:%s %s %s %s %s %s %s %s %s,
 +   (p-gn ? G : N),
 +   (p-nx ? Nx : Ex),
 +   (p-us ? Usr : Sup),
 +   (p-i  ? I : M),
 +   (p-pr ? P : NP),
 +   (p-rw ? W : R),
 +   (p-pe ? N-Pm : Pm),
 +   (p-rz ? Rsv : Ill),
 +   (p-tr ? Tl : Ta));
 +

Good feature. Do you also plan to add decode logic for these flags.
For example, RZ is only meaningful when PR=1, RW is only meaningful
when
PR=1, TR=0, and I=0, and so on? This additional logic will be useful.

Reviewed-by: Shuah Khan shuahk...@gmail.com

-- Shuah


 +   /* Error type only needed for certain events */
 +   if (!amd_iommu_verbose) {
 +   if ((ev_type == EVENT_TYPE_DEV_TAB_ERR)  ||
 +   (ev_type == EVENT_TYPE_PAGE_TAB_ERR) ||
 +   (ev_type == EVENT_TYPE_CMD_HARD_ERR) ||
 +   (ev_type == EVENT_TYPE_INV_DEV_REQ))
 +   pr_cont( type(0x%x)]\n, err_type);
 +   } else {
 +   pr_cont(]\n);
 +   dump_detail_error(p, ev_type);
 +   }
 +}
 +
  static void dump_dte_entry(u16 devid)
  {
 int i;

 +   pr_err(AMD-Vi: DTE[0..3]:);
 for (i = 0; i  4; ++i)
 -   pr_err(AMD-Vi: DTE[%d]: %016llx\n, i,
 -   amd_iommu_dev_table[devid].data[i]);
 +   pr_cont( %016llx, amd_iommu_dev_table[devid].data[i]);
 +   pr_cont(\n);
  }

  static void dump_command(unsigned long phys_addr)
 @@ -619,81 +724,96 @@ static void dump_command(unsigned long phys_addr)
 pr_err(AMD-Vi: CMD[%d]: %08x\n, i, cmd-data[i]);
  }

 -static void iommu_print_event(struct amd_iommu *iommu, void *__evt)
 +void amd_iommu_print_event(int type, int devid, int domid,
 + int flags, u64 address)
  {
 -   int type, devid, domid, flags;
 -   volatile u32 *event = __evt;
 -   int count = 0;
 -   u64 address;
 -
 -retry:
 -   type= (event[1]  EVENT_TYPE_SHIFT)   EVENT_TYPE_MASK;
 -   devid   = (event[0]  EVENT_DEVID_SHIFT)  EVENT_DEVID_MASK;
 -   domid   = (event[1]  EVENT_DOMID_SHIFT)  EVENT_DOMID_MASK;
 -   flags   = (event[1]  EVENT_FLAGS_SHIFT)  EVENT_FLAGS_MASK;
 -   address = (u64)(((u64)event[3])  32) | event[2];
 -
 -   if (type == 0) {
 -   /* Did we hit the erratum? */
 -   if (++count == LOOP_TIMEOUT) {
 -   pr_err(AMD-Vi: No event written to event log\n);
 -   return;
 -   }
 -   udelay(1);
 -   goto retry;
 -   }
 -
 -   printk(KERN_ERR AMD-Vi: Event logged [);
 +   pr_err(AMD-Vi: Event logged [);

 switch (type) {
 case EVENT_TYPE_ILL_DEV:
 -   printk(ILLEGAL_DEV_TABLE_ENTRY device=%02x:%02x.%x 
 -  address=0x%016llx flags=0x%04x]\n,
 +   pr_cont(ILLEGAL_DEV_TABLE_ENTRY device=%02x:%02x.%x 
 +  address=0x%016llx,
PCI_BUS(devid), PCI_SLOT(devid), PCI_FUNC(devid),
 -  address, flags);
 +  address);
 +   dump_flags(flags, type);
 dump_dte_entry(devid);
 break;
 case EVENT_TYPE_IO_FAULT:
 -   printk(IO_PAGE_FAULT device=%02x:%02x.%x 
 -  domain=0x%04x address=0x%016llx flags=0x%04x]\n,
 +   pr_cont(IO_PAGE_FAULT device=%02x:%02x.%x 
 +  domain=0x%04x address=0x%016llx,
PCI_BUS(devid), PCI_SLOT(devid), PCI_FUNC(devid),
 -  domid, address, flags);
 +  domid, address);
 +   dump_flags(flags, type);
 +   dump_dte_entry(devid);
 break;
 case EVENT_TYPE_DEV_TAB_ERR:
 -   printk(DEV_TAB_HARDWARE_ERROR device=%02x:%02x.%x 
 -  address=0x%016llx flags=0x%04x]\n,
 +   pr_cont(DEV_TAB_HARDWARE_ERROR device=%02x:%02x.%x 
 +  address=0x%016llx,
PCI_BUS(devid), PCI_SLOT(devid), PCI_FUNC(devid),
 -  address, flags);
 +  address);
 +   dump_flags(flags, type);
 break;
 case

Re: [PATCH 2/2 V5] iommu/amd: Add logic to decode AMD IOMMU event flag

2013-04-10 Thread Shuah Khan
On Wed, Apr 10, 2013 at 10:27 AM, Suravee Suthikulanit
suravee.suthikulpa...@amd.com wrote:
 On 4/10/2013 11:21 AM, Shuah Khan wrote:

 Good feature. Do you also plan to add decode logic for these flags.
 For example, RZ is only meaningful when PR=1, RW is only meaningful
 when
 PR=1, TR=0, and I=0, and so on? This additional logic will be useful.

 Reviewed-by: Shuah Khanshuahk...@gmail.com

 -- Shuah

 Additional filtering logic can also be added in the future.  This will also
 be important if we are planning on handling IOMMU errors.


Correct. Additional logic isn't necessary in this patch.

-- Shuah
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 0/9] AMD IOMMU cleanups, fixes and IVRS bug workarounds

2013-04-13 Thread Shuah Khan
On Sat, Apr 13, 2013 at 9:26 AM, Joerg Roedel j...@8bytes.org wrote:
 On Sat, Apr 13, 2013 at 11:06:22PM +0800, Andrew Cooks wrote:
 On Fri, Apr 12, 2013 at 4:06 PM, Joerg Roedel j...@8bytes.org wrote:

  Oh, that's sad. You were the only one having a machine wich actually has
  unity-mapped ranges defined in the BIOS table. The code for those
  mappings was basically untested before you ran it on that machine.
 
 What is the machine in question? Maybe someone else has access to one,
 if it's not too exotic.

 Shuah had access to a HP server machine (don't know which one) that
 defined unity-map ranges in the BIOS table. Shuah certainly knows the
 details about that machine.


Joerg/Andrew,

It is a DL385 Gen8 server. Unfortunately I left HP as of yesterday and no longer
have access to the system. Maybe there others that have access to one.

-- Shuah
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 1/2] iommu: Add iommu_fault class event to iommu trace

2013-09-24 Thread Shuah Khan

On 09/24/2013 04:48 AM, Joerg Roedel wrote:

On Fri, Aug 16, 2013 at 11:20:16AM -0600, Shuah Khan wrote:

+DEFINE_EVENT(iommu_fault, report_fault,
+
+   TP_PROTO(struct device *dev, unsigned long iova, int flags),
+
+   TP_ARGS(dev, iova, flags)
+);


I am not entirely happy with the naming. It is better to name the class
iommu_error and the page-fault event io_page_fault. This makes it more
clear what kind of fault it is.


Joerg




Joerg,

Yes, iommu_error and io_page_fault sounds better. I will rename and send 
the updated patch.


-- Shuah

--
Shuah Khan
Senior Linux Kernel Developer - Open Source Group
Samsung Research America(Silicon Valley)
shuah...@samsung.com | (970) 672-0658
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v4 0/2] iommu: Add iommu_error class event to iommu trace

2013-09-24 Thread Shuah Khan
This patch set adds iommu_error class event to iommu trace. iommu_error class
event can be enabled to trigger when an iommu error occurs. This trace event
is intended to be called to report the error information. Trace information
includes driver name, device name, iova, and flags.

iommu_error:io_page_fault

This patch set depends on the previous patch set that added iommu tracing
feature. Reference:

http://www.kernelhub.org/?msg=313155p=2
http://www.kernelhub.org/?msg=313156p=2

Shuah Khan (2):
  iommu: Add iommu_error class event to iommu trace
  iommu: Change iommu driver to call io_page_fault trace event

 drivers/iommu/iommu-traces.c |  3 +++
 include/linux/iommu.h|  2 ++
 include/trace/events/iommu.h | 33 +
 3 files changed, 38 insertions(+)

-- 
1.8.1.2

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v4 2/2] iommu: Change iommu driver to call io_page_fault trace event

2013-09-24 Thread Shuah Khan
Change iommu driver call io_page_fault trace event. This iommu_error class
event can be enabled to trigger when an iommu error occurs. Trace information
includes driver name, device name, iova, and flags.

Testing:
Added trace calls to iommu_prepare_identity_map() for testing some of the
conditions that are hard to trigger. Here is the trace from the testing:

   swapper/0-1 [003]  2.003774: io_page_fault: IOMMU:pci 
:00:02.0 iova=0xcb80 flags=0x0002
   swapper/0-1 [003]  2.004098: io_page_fault: IOMMU:pci 
:00:1d.0 iova=0xcadc6000 flags=0x0002
   swapper/0-1 [003]  2.004115: io_page_fault: IOMMU:pci 
:00:1a.0 iova=0xcadc6000 flags=0x0002
   swapper/0-1 [003]  2.004129: io_page_fault: IOMMU:pci 
:00:1f.0 iova=0x flags=0x0002

Signed-off-by: Shuah Khan shuah...@samsung.com
---
 include/linux/iommu.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 7ea319e..a444c79 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -22,6 +22,7 @@
 #include linux/errno.h
 #include linux/err.h
 #include linux/types.h
+#include trace/events/iommu.h
 
 #define IOMMU_READ (1)
 #define IOMMU_WRITE(2)
@@ -227,6 +228,7 @@ static inline int report_iommu_fault(struct iommu_domain 
*domain,
ret = domain-handler(domain, dev, iova, flags,
domain-handler_token);
 
+   trace_io_page_fault(dev, iova, flags);
return ret;
 }
 
-- 
1.8.1.2

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v4 1/2] iommu: Add iommu_error class event to iommu trace

2013-09-24 Thread Shuah Khan
iommu_error class event can be enabled to trigger when an iommu
error occurs. This trace event is intended to be called to report the
error information. Trace information includes driver name, device name,
iova, and flags.

iommu_error:io_page_fault

Signed-off-by: Shuah Khan shuah...@samsung.com
---
 drivers/iommu/iommu-traces.c |  3 +++
 include/trace/events/iommu.h | 33 +
 2 files changed, 36 insertions(+)

diff --git a/drivers/iommu/iommu-traces.c b/drivers/iommu/iommu-traces.c
index a2af60f..71ac5fa 100644
--- a/drivers/iommu/iommu-traces.c
+++ b/drivers/iommu/iommu-traces.c
@@ -22,3 +22,6 @@ EXPORT_TRACEPOINT_SYMBOL_GPL(detach_device_from_domain);
 /* iommu_map_unmap */
 EXPORT_TRACEPOINT_SYMBOL_GPL(map);
 EXPORT_TRACEPOINT_SYMBOL_GPL(unmap);
+
+/* iommu_error */
+EXPORT_TRACEPOINT_SYMBOL_GPL(io_page_fault);
diff --git a/include/trace/events/iommu.h b/include/trace/events/iommu.h
index 86bcc5a..a8f5c32 100644
--- a/include/trace/events/iommu.h
+++ b/include/trace/events/iommu.h
@@ -123,6 +123,39 @@ DEFINE_EVENT_PRINT(iommu_map_unmap, unmap,
__entry-iova, __entry-size
)
 );
+
+DECLARE_EVENT_CLASS(iommu_error,
+
+   TP_PROTO(struct device *dev, unsigned long iova, int flags),
+
+   TP_ARGS(dev, iova, flags),
+
+   TP_STRUCT__entry(
+   __string(device, dev_name(dev))
+   __string(driver, dev_driver_string(dev))
+   __field(u64, iova)
+   __field(int, flags)
+   ),
+
+   TP_fast_assign(
+   __assign_str(device, dev_name(dev));
+   __assign_str(driver, dev_driver_string(dev));
+   __entry-iova = iova;
+   __entry-flags = flags;
+   ),
+
+   TP_printk(IOMMU:%s %s iova=0x%016llx flags=0x%04x,
+   __get_str(driver), __get_str(device),
+   __entry-iova, __entry-flags
+   )
+);
+
+DEFINE_EVENT(iommu_error, io_page_fault,
+
+   TP_PROTO(struct device *dev, unsigned long iova, int flags),
+
+   TP_ARGS(dev, iova, flags)
+);
 #endif /* _TRACE_IOMMU_H */
 
 /* This part must be outside protection */
-- 
1.8.1.2

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: hpsa driver bug crack kernel down!

2014-04-10 Thread Shuah Khan
On Thu, Apr 10, 2014 at 2:45 PM,  scame...@beardog.cce.hp.com wrote:
  3f583bc21977 BAD (Merge tag 'iommu-updates-v3.15')

 Yes, specifically (finally done bisecting):

 commit 2e45528930388658603ea24d49cf52867b928d3e
 Author: Jiang Liu jiang@linux.intel.com
 Date:   Wed Feb 19 14:07:36 2014 +0800

 iommu/vt-d: Unify the way to process DMAR device scope array

 Now we have a PCI bus notification based mechanism to update DMAR
 device scope array, we could extend the mechanism to support boot
 time initialization too, which will help to unify and simplify
 the implementation.

 Signed-off-by: Jiang Liu jiang@linux.intel.com
 Signed-off-by: Joerg Roedel j...@8bytes.org

 My git bisect appears to be converging on something else, something
 within the hpsa patches that I sent up recently, unfortunately for
 me.  Will let you all know when it converges.


This smells very much like the problem that was solved couple of years
ago for SI domain. It is likely that path is broken with the DMAR
device scope array change. Please take a look to see if the following
no longer occurs. Looks like BIOS could be expecting this RMRR to be
still mapped.

   /*
 * We want to prevent any device associated with an RMRR from
 * getting placed into the SI Domain. This is done because
 * problems exist when devices are moved in and out of domains
 * and their respective RMRR info is lost. We exempt USB devices
 * from this process due to their usage of RMRRs that are known
 * to not be needed after BIOS hand-off to OS.
 */
if (device_has_rmrr(dev) 
(pdev-class  8) != PCI_CLASS_SERIAL_USB)
return 0;

-- Shuah
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v2 1/4] dma-mapping: Add devm_ interface for dma_map_single()

2014-06-03 Thread Shuah Khan

On 06/01/2014 01:01 AM, Eli Billauer wrote:

dmam_map_single() and dmam_unmap_single() are the managed counterparts
for the respective dma_* functions.

Note that dmam_map_single() returns a status value rather than the DMA handle.
The DMA handle is passed to the caller through a pointer in the arguments.

The reason for this API change is that dmam_map_single() allocates memory,
which may fail even if the DMA mapping is successful. On the other hand,
it seems like there's no platform-independent value for dma_handle that can
be used to indicate an error.

This leaves dmam_map_single() with no safe way to tell its caller that the
memory allocation has failed, except for returning a status as an int. Trying
to keep close to the non-devres API could also tempt programmers into using
dma_mapping_error() on the dmam_* functions. This would work incorrectly on
platforms, for which dma_mapping_error() ignores its argument, and always
returns success.



I see the value of this interface in unmap case, this type of wrapper
can release dma buffers, drivers neglected to release leaving dangling
buffers.

However, driver writers should give it some thought before switching
from conventional dma_map_*() interfaces to these proposed managed
dma mapping interfaces. These new interfaces shouldn't be treated as
drop in replacements to dma_map_*() interfaces.

The reasons are:

1. This interface adds an overhead in allocation memory for devres to
   compared to other dma_map_* wrappers. Users need to be aware of that.
   This would be okay in the cases where a driver allocates several
   buffers at init time and uses them. However, several drivers allocate
   during run-time and release as soon as it is no longer needed. This
   overhead is going to be in the performance path.

2. It adds a failure case even when dma buffers are available. i.e if
   if devres alloc fails, it will return failure even if dma map could
   have succeeded. This is a new failure case for DMA-API.

   The reason dma_map_*() routines fail now is because there are no
   buffers available. Drivers handle this error as a retry case.

   Drivers using dmam_map_single() will have to handle the failure
   cases differently.

   Since the return values are different for dmam_map_*(), that is
   plus that these interfaces can't be drop in replacements to the
   dma_map_*() interfaces.

3. Similarly, it adds an overhead in releasing memory for devres to
   compared to other dma_unmap_* wrappers. Users need to be aware of
   that. This overhead is going to be in the performance path when
   drivers unmap buffers during run-time.

Adding Joerg Roedel to the thread.

-- Shuah

--
Shuah Khan
Senior Linux Kernel Developer - Open Source Group
Samsung Research America(Silicon Valley)
shuah...@samsung.com | (970) 672-0658
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v2 1/4] dma-mapping: Add devm_ interface for dma_map_single()

2014-06-06 Thread Shuah Khan

On 06/06/2014 10:21 AM, Eli Billauer wrote:

On 06/06/14 19:01, Greg KH wrote:

Please try to put yourself in my position: I have a driver that I care
  about, which works fine for a few years. It's based upon
dma_map_single(),
  which seems to be the common way to get non-coherent memory, even
for the
  driver's entire lifespan. I realize that dma_alloc_* was the
intended way to
  do it, but fact is that dma_map_* has become the common choice.


Is your driver in the kernel tree?  If not, you really are on your own :(


It's the Xillybus driver in the staging area. I don't know if this
counts for being in the kernel tree...

The suggested patchset would allow replacing my use of dma_map_single()
with a managed version of that function. This will clean the driver's
code considerably.

But I think that the discussion here is if it's valid to use
dma_map_single() for a device-permanent DMA mapping, or if
dma_alloc_noncoherent() is the only way. If the answer is no, there's
quite obviously no point in a devres version for that function.



Eli,

dma_map_single() and dma_unmap_single() are streaming DMA APIs. These
are intended for one DMA transfer and unmapped right after it is done.

dma buffers are limited shared resources for streaming that are
shared by several drivers. Hence the need for use and release
model.

Please refer to the Using Streaming DMA mappings in DMA-API-HOWTO.txt

- Streaming DMA mappings which are usually mapped for one DMA
  transfer, unmapped right after it (unless you use dma_sync_* below)
  and for which hardware can optimize for sequential accesses.

  This of streaming as asynchronous or outside the coherency
  domain.

  Good examples of what to use streaming mappings for are:

- Networking buffers transmitted/received by a device.
- Filesystem buffers written/read by a SCSI device.


If I understand your intended usage correctly, you are looking to
allocate and hold the buffers for the lifetime of the driver. For
such cases, dma_alloc_*() interfaces are the ones to use.

Please also refer to DMA-API.txt as well. Hope this helps.

-- Shuah


--
Shuah Khan
Senior Linux Kernel Developer - Open Source Group
Samsung Research America(Silicon Valley)
shuah...@samsung.com | (970) 672-0658
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH] iommu: fix trace_map() to report original iova and original size

2015-01-16 Thread Shuah Khan
iommu_map() calls trace_map() with iova and size. trace_map()
should report original iova and original size as opposed to
iova and size after they get changed during mapping. size is
always zero at the end of mapping which is useless to report
and iova as it gets incremented, it is not as useful as the
original iova. Change iommu_map() to call trace_map() to
report original iova and original size.

Signed-off-by: Shuah Khan shua...@osg.samsung.com
Reported-by: Alex Williamson alex.william...@redhat.com
---
 drivers/iommu/iommu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index f7718d7..fbf8827 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1084,7 +1084,7 @@ int iommu_map(struct iommu_domain *domain, unsigned long 
iova,
if (ret)
iommu_unmap(domain, orig_iova, orig_size - size);
else
-   trace_map(iova, paddr, size);
+   trace_map(orig_iova, paddr, orig_size);
 
return ret;
 }
-- 
1.9.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH] iommu: fix trace_unmap() to report original iova

2015-01-16 Thread Shuah Khan
iommu_unmap() calls trace_unmap() with changed iova and original
size. trace_unmap() should report original iova instead. Change
iommu_unmap() to call trace_unmap() with original iova.

Signed-off-by: Shuah Khan shua...@osg.samsung.com
Reported-by: Alex Williamson alex.william...@redhat.com
---
 drivers/iommu/iommu.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index fbf8827..2eb7554 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1094,6 +1094,7 @@ size_t iommu_unmap(struct iommu_domain *domain, unsigned 
long iova, size_t size)
 {
size_t unmapped_page, unmapped = 0;
unsigned int min_pagesz;
+   unsigned long orig_iova = iova;
 
if (unlikely(domain-ops-unmap == NULL ||
 domain-ops-pgsize_bitmap == 0UL))
@@ -1133,7 +1134,7 @@ size_t iommu_unmap(struct iommu_domain *domain, unsigned 
long iova, size_t size)
unmapped += unmapped_page;
}
 
-   trace_unmap(iova, 0, size);
+   trace_unmap(orig_iova, 0, size);
return unmapped;
 }
 EXPORT_SYMBOL_GPL(iommu_unmap);
-- 
2.1.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH] iommu: change trace unmap api to report unmapped size

2015-01-16 Thread Shuah Khan
Currently map and unmap are implemented as events under a
common trace class declaration. The common class forces
trace_unmap() to require a bogus physical address argument
that it doesn't use. Changing unmap to report unmapped size
will provide useful information for debugging. Remove common
map_unmap trace class and change map and unmap into separate
events as opposed to events under the same class to allow for
differences in the reporting information. In addition, map and
unmap are changed to handle size value as size_t instead of int
to match the passed size value and avoid overflow.

Signed-off-by: Shuah Khan shua...@osg.samsung.com
Suggested-by: Alex Williamson alex.william...@redhat.com
---
 drivers/iommu/iommu.c|  2 +-
 include/trace/events/iommu.h | 31 ++-
 2 files changed, 19 insertions(+), 14 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 2eb7554..9e0dcdb 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1134,7 +1134,7 @@ size_t iommu_unmap(struct iommu_domain *domain, unsigned 
long iova, size_t size)
unmapped += unmapped_page;
}
 
-   trace_unmap(orig_iova, 0, size);
+   trace_unmap(orig_iova, size, unmapped);
return unmapped;
 }
 EXPORT_SYMBOL_GPL(iommu_unmap);
diff --git a/include/trace/events/iommu.h b/include/trace/events/iommu.h
index a8f5c32..2c7befb 100644
--- a/include/trace/events/iommu.h
+++ b/include/trace/events/iommu.h
@@ -83,7 +83,7 @@ DEFINE_EVENT(iommu_device_event, detach_device_from_domain,
TP_ARGS(dev)
 );
 
-DECLARE_EVENT_CLASS(iommu_map_unmap,
+TRACE_EVENT(map,
 
TP_PROTO(unsigned long iova, phys_addr_t paddr, size_t size),
 
@@ -92,7 +92,7 @@ DECLARE_EVENT_CLASS(iommu_map_unmap,
TP_STRUCT__entry(
__field(u64, iova)
__field(u64, paddr)
-   __field(int, size)
+   __field(size_t, size)
),
 
TP_fast_assign(
@@ -101,26 +101,31 @@ DECLARE_EVENT_CLASS(iommu_map_unmap,
__entry-size = size;
),
 
-   TP_printk(IOMMU: iova=0x%016llx paddr=0x%016llx size=0x%x,
+   TP_printk(IOMMU: iova=0x%016llx paddr=0x%016llx size=%zu,
__entry-iova, __entry-paddr, __entry-size
)
 );
 
-DEFINE_EVENT(iommu_map_unmap, map,
+TRACE_EVENT(unmap,
 
-   TP_PROTO(unsigned long iova, phys_addr_t paddr, size_t size),
-
-   TP_ARGS(iova, paddr, size)
-);
+   TP_PROTO(unsigned long iova, size_t size, size_t unmapped_size),
 
-DEFINE_EVENT_PRINT(iommu_map_unmap, unmap,
+   TP_ARGS(iova, size, unmapped_size),
 
-   TP_PROTO(unsigned long iova, phys_addr_t paddr, size_t size),
+   TP_STRUCT__entry(
+   __field(u64, iova)
+   __field(size_t, size)
+   __field(size_t, unmapped_size)
+   ),
 
-   TP_ARGS(iova, paddr, size),
+   TP_fast_assign(
+   __entry-iova = iova;
+   __entry-size = size;
+   __entry-unmapped_size = unmapped_size;
+   ),
 
-   TP_printk(IOMMU: iova=0x%016llx size=0x%x,
-   __entry-iova, __entry-size
+   TP_printk(IOMMU: iova=0x%016llx size=%zu unmapped_size=%zu,
+   __entry-iova, __entry-size, __entry-unmapped_size
)
 );
 
-- 
2.1.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH] iommu: fix trace_map() to report original iova and original size

2015-01-19 Thread Shuah Khan
On 01/15/2015 07:29 PM, Shuah Khan wrote:
 iommu_map() calls trace_map() with iova and size. trace_map()
 should report original iova and original size as opposed to
 iova and size after they get changed during mapping. size is
 always zero at the end of mapping which is useless to report
 and iova as it gets incremented, it is not as useful as the
 original iova. Change iommu_map() to call trace_map() to
 report original iova and original size.
 
 Signed-off-by: Shuah Khan shua...@osg.samsung.com
 Reported-by: Alex Williamson alex.william...@redhat.com
 ---
  drivers/iommu/iommu.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)
 
 diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
 index f7718d7..fbf8827 100644
 --- a/drivers/iommu/iommu.c
 +++ b/drivers/iommu/iommu.c
 @@ -1084,7 +1084,7 @@ int iommu_map(struct iommu_domain *domain, unsigned 
 long iova,
   if (ret)
   iommu_unmap(domain, orig_iova, orig_size - size);
   else
 - trace_map(iova, paddr, size);
 + trace_map(orig_iova, paddr, orig_size);
  
   return ret;
  }
 


Joerg,

Just makeing sure you saw this as well. I saw your responses on the
trace_unmap() patches that you pulled them in.

thanks,
-- Shuah

-- 
Shuah Khan
Sr. Linux Kernel Developer
Open Source Innovation Group
Samsung Research America (Silicon Valley)
shua...@osg.samsung.com | (970) 217-8978
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH kernel] vfio/spapr: Add trace points for map/unmap

2017-11-29 Thread Shuah Khan
On 11/29/2017 09:32 AM, Alex Williamson wrote:
> On Thu, 23 Nov 2017 15:13:37 +1100
> Alexey Kardashevskiy  wrote:
> 
>> On 17/11/17 17:58, Alexey Kardashevskiy wrote:
>>> On 17/11/17 11:13, Alex Williamson wrote:  
 On Tue, 14 Nov 2017 10:47:12 +1100
 Alexey Kardashevskiy  wrote:
  
> On 27/10/17 14:00, Alexey Kardashevskiy wrote:  
>> This adds trace_map/trace_unmap tracepoints to spapr driver. Type1 
>> already
>> uses these via the IOMMU API (iommu_map/__iommu_unmap).
>>
>> Signed-off-by: Alexey Kardashevskiy 

 Is this really legitimate to include tracepoints from a different  
 subsystem?>  The vfio type1 backend gets these trace points by virtue of  
 it actually using the IOMMU API, it doesn't call them itself.  I'm kind
 of surprised these are actually available to be called from a module.  
>>>
>>> They are explicitly exported:
>>>
>>> EXPORT_TRACEPOINT_SYMBOL_GPL(map);
>>> EXPORT_TRACEPOINT_SYMBOL_GPL(unmap);

Tracepoints need to be exported as per the tracepoint sub-system
requirements.

>>>
>>> I would think this is for things like drivers/vfio/vfio_iommu_spapr_tce.c ,
>>> why else?...
>>>
>>>   
 I suspect the way to do this is probably to define our own tracepoints
 in the vfio/spapr backend or insert tracepoints into the IOMMU layers
 that that code calls into rather than masquerading as tracepoints from
 a different subsystem.  Right?  
>>>
>>> This makes sense too. But it is going to be just cut-n-paste and some
>>> confusion -
>>> /sys/kernel/debug/tracing/events/iommu/map will still be present but
>>> won't work and
>>> /sys/kernel/debug/tracing/events/vfio/vfio_iommu_spapr_tce/map will.
> 
> But iommu/map does work, it's just not called by the vfio spapr tce
> backend, which is absolutely correct.  In fact, that's part of my
> reservation about this approach is that it could be interpreted as
> implying a call path that doesn't exist on this arch.

These tracepoints are intended to be called after a iommo/map/unmap
calls from IOMMU API. It doesn't make any sense to call them from
random places. It is totally useless to do so and makes it difficult
to debug problems.

As an author of these tracepoints - that is my opinion.

thanks,
-- Shuah
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: remove the ->mapping_error method from dma_map_ops V2

2018-11-28 Thread Shuah Khan
On Wed, Nov 28, 2018 at 12:48 PM Russell King - ARM Linux
 wrote:
>
> On Wed, Nov 28, 2018 at 11:27:17AM -0800, David Miller wrote:
> > From: Linus Torvalds 
> > Date: Wed, 28 Nov 2018 10:00:06 -0800
> >
> > > Not all memory is accessible even to the kernel. If you have memory
> > > that shows up in the last page of phys_addr_t, you just mark it
> > > reserved at boot-time.
> >
> > It's not the physical memory at the end that needs to be reserved.
> >
> > It's the IOMMU mapping arena.
>
> True, if and only if you have an IOMMU.
>
> Where there isn't an IOMMU, then we'd have to reserve every page that
> that translates to a bus address in the top 4K of dma_addr_t on any
> bus in the system - that means knowing early in the kernel
> initialisation about all buses in the system so we can detect and
> reserve these pages.
>

The arch and platform differences/confusion as to "what is DMA error
value" is the reason why dma_mapping_error is part dma ops. I
understand that iy would make sense to reduce the overhead of an
additional call, however, I am not sure if would be trivial change to
address this.

I was down this path of trying to address the missing mapping error
checks a few years ago and introduced dma_mapping_error checks in the
DMA DEBUG API. As you might already know that there is no common
definition for the mapping error.

Quick look at the defines shows:

#define CALGARY_MAPPING_ERROR 0
#define S390_MAPPING_ERROR  (~(dma_addr_t) 0x0)
#define SPARC_MAPPING_ERROR (~(dma_addr_t)0x0)

This is the reason why there is a arch or in some cases bus specific
mapping_error ops is needed.
We could unify this and fix all these first. I haven't looked at the
patch set closely, maybe you are already doing this.

thanks,
-- Shuah
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v2] iommu: amd: Fix IOMMU perf counter clobbering during init

2020-01-24 Thread Shuah Khan

On 1/23/20 11:43 PM, Suravee Suthikulpanit wrote:



On 1/24/20 5:32 AM, Shuah Khan wrote:

init_iommu_perf_ctr() clobbers the register when it checks write access
to IOMMU perf counters and fails to restore when they are writable.

Add save and restore to fix it.

Signed-off-by: Shuah Khan 
---
Changes since v1:
-- Fix bug in sucessful return path. Add a return instead of
    fall through to pc_false error case

  drivers/iommu/amd_iommu_init.c | 24 ++--
  1 file changed, 18 insertions(+), 6 deletions(-)

diff --git a/drivers/iommu/amd_iommu_init.c 
b/drivers/iommu/amd_iommu_init.c

index 568c52317757..483f7bc379fa 100644
--- a/drivers/iommu/amd_iommu_init.c
+++ b/drivers/iommu/amd_iommu_init.c
@@ -1655,27 +1655,39 @@ static int iommu_pc_get_set_reg(struct 
amd_iommu *iommu, u8 bank, u8 cntr,

  static void init_iommu_perf_ctr(struct amd_iommu *iommu)
  {
  struct pci_dev *pdev = iommu->dev;
-    u64 val = 0xabcd, val2 = 0;
+    u64 val = 0xabcd, val2 = 0, save_reg = 0;
  if (!iommu_feature(iommu, FEATURE_PC))
  return;
  amd_iommu_pc_present = true;
+    /* save the value to restore, if writable */
+    if (iommu_pc_get_set_reg(iommu, 0, 0, 0, _reg, false))
+    goto pc_false;
+
  /* Check if the performance counters can be written to */
  if ((iommu_pc_get_set_reg(iommu, 0, 0, 0, , true)) ||
  (iommu_pc_get_set_reg(iommu, 0, 0, 0, , false)) ||
-    (val != val2)) {
-    pci_err(pdev, "Unable to write to IOMMU perf counter.\n");
-    amd_iommu_pc_present = false;
-    return;
-    }
+    (val != val2))
+    goto pc_false;
+
+    /* restore */
+    if (iommu_pc_get_set_reg(iommu, 0, 0, 0, _reg, true))
+    goto pc_false;
  pci_info(pdev, "IOMMU performance counters supported\n");
  val = readl(iommu->mmio_base + MMIO_CNTR_CONF_OFFSET);
  iommu->max_banks = (u8) ((val >> 12) & 0x3f);
  iommu->max_counters = (u8) ((val >> 7) & 0xf);
+
+    return;
+


Good catch. Sorry, I missed this part as well :(


+pc_false:
+    pci_err(pdev, "Unable to read/write to IOMMU perf counter.\n");
+    amd_iommu_pc_present = false;
+    return;
  }
  static ssize_t amd_iommu_show_cap(struct device *dev,



As for your question in v1:

 > I see 2 banks and 4 counters on my system. Is it sufficient to check
 > the first bank and first counter? In other words, if the first one
 > isn't writable, are all counters non-writable?

We currently assume all counters have the same write-ability. So, it 
should be sufficient

to just check the first one.

 > Should we read the config first and then, try to see if any of the
 > counters are writable? I have a patch that does that, I can send it
 > out for review.

Which config are you referring to? Not sure what you mean.


I mean reading MMIO_CNTR_CONF_OFFSET to get max banks and counters.
Also what is the reason to check writable?

I tried a couplf og things on my

AMD Ryzen 5 PRO 2400GE w/ Radeon Vega Graphics

I changed the logic to read config to get max banks and counters
before checking if counters are writable and tried writing to all.
The result is the same and all of them aren't writable. However,
when disable the writable check and assume they are, I can run

perf stat -e 'amd_iommu_0 on all events and get data.

perf stat -e 'amd_iommu_0/cmd_processed/' sleep 10

 Performance counter stats for 'system wide':

56  amd_iommu_0/cmd_processed/ 



  10.001525171 seconds time elapsed


perf stat -a -e amd_iommu/mem_trans_total/ sleep 10

 Performance counter stats for 'system wide':

 2,696  amd_iommu/mem_trans_total/ 



  10.001465115 seconds time elapsed

I tried all possible events listed under amd_iommu_0 and I can get
data on all of them. No problems in dmesg.

Is this inline with what you expect? I am curious what this write
tell us and can we enable read only mode on these counters?



By the way, here is the output from booting the system with this patch 
(+ debug messages).


[   14.408834] pci :60:00.2: AMD-Vi: IOMMU performance counters 
supported

[   14.416526] DEBUG: init_iommu_perf_ctr: amd_iommu_pc_present=0x1
[   14.429602] pci :40:00.2: AMD-Vi: IOMMU performance counters 
supported

[   14.437275] DEBUG: init_iommu_perf_ctr: amd_iommu_pc_present=0x1
[   14.450320] pci :20:00.2: AMD-Vi: IOMMU performance counters 
supported

[   14.457991] DEBUG: init_iommu_perf_ctr: amd_iommu_pc_present=0x1
[   14.471049] pci :00:00.2: AMD-Vi: IOMMU performance counters 
supported

[   14.478722] DEBUG: init_iommu_perf_ctr: amd_iommu_pc_present=0x1

Also, here is the perf amd_iommu testing.

# perf stat -e 'amd_iommu_0/cmd_processed/,\
     amd_iommu_1/cmd_processed/,\
     amd_iommu_2/cmd_processed/,\
     amd_iommu_3/cmd_processed/'

  Performance counter stats for 'system wide':

    204  amd_iommu_0/cmd_pr

Re: [PATCH] iommu: amd: Fix IOMMU perf counter clobbering during init

2020-01-23 Thread Shuah Khan

On 1/21/20 8:32 AM, Shuah Khan wrote:

On 1/20/20 7:10 PM, Suravee Suthikulpanit wrote:

On 1/17/2020 5:08 PM, Joerg Roedel wrote:

Adding Suravee, who wrote the IOMMU Perf Counter code.

On Tue, Jan 14, 2020 at 08:12:20AM -0700, Shuah Khan wrote:

init_iommu_perf_ctr() clobbers the register when it checks write access
to IOMMU perf counters and fails to restore when they are writable.

Add save and restore to fix it.

Signed-off-by: Shuah Khan
---
  drivers/iommu/amd_iommu_init.c | 22 --
  1 file changed, 16 insertions(+), 6 deletions(-)

Suravee, can you please review this patch?



This looks ok. Does this fix certain issues? Or is this just for sanity.


I didn't notice any problems. Counters aren't writable on my system.
However, it certainly looks like a bog since registers aren't restored
like in other places in this file where such checks are done on other
registers.

I see 2 banks and 4 counters on my system. Is it sufficient to check
the first bank and first counter? In other words, if the first one
isn't writable, are all counters non-writable?

Should we read the config first and then, try to see if any of the
counters are writable? I have a patch that does that, I can send it
out for review.



Reviewed-by: Suravee Suthikulpanit 


Joerg,

Please don't pull this in. I introduced a bug in this patch. It always
returns amd_iommu_pc_present false even when it can write to the
counters. My bad.

I will send v2.

thanks,
-- Shuah
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

[PATCH v2] iommu: amd: Fix IOMMU perf counter clobbering during init

2020-01-23 Thread Shuah Khan
init_iommu_perf_ctr() clobbers the register when it checks write access
to IOMMU perf counters and fails to restore when they are writable.

Add save and restore to fix it.

Signed-off-by: Shuah Khan 
---
Changes since v1:
-- Fix bug in sucessful return path. Add a return instead of
   fall through to pc_false error case

 drivers/iommu/amd_iommu_init.c | 24 ++--
 1 file changed, 18 insertions(+), 6 deletions(-)

diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
index 568c52317757..483f7bc379fa 100644
--- a/drivers/iommu/amd_iommu_init.c
+++ b/drivers/iommu/amd_iommu_init.c
@@ -1655,27 +1655,39 @@ static int iommu_pc_get_set_reg(struct amd_iommu 
*iommu, u8 bank, u8 cntr,
 static void init_iommu_perf_ctr(struct amd_iommu *iommu)
 {
struct pci_dev *pdev = iommu->dev;
-   u64 val = 0xabcd, val2 = 0;
+   u64 val = 0xabcd, val2 = 0, save_reg = 0;
 
if (!iommu_feature(iommu, FEATURE_PC))
return;
 
amd_iommu_pc_present = true;
 
+   /* save the value to restore, if writable */
+   if (iommu_pc_get_set_reg(iommu, 0, 0, 0, _reg, false))
+   goto pc_false;
+
/* Check if the performance counters can be written to */
if ((iommu_pc_get_set_reg(iommu, 0, 0, 0, , true)) ||
(iommu_pc_get_set_reg(iommu, 0, 0, 0, , false)) ||
-   (val != val2)) {
-   pci_err(pdev, "Unable to write to IOMMU perf counter.\n");
-   amd_iommu_pc_present = false;
-   return;
-   }
+   (val != val2))
+   goto pc_false;
+
+   /* restore */
+   if (iommu_pc_get_set_reg(iommu, 0, 0, 0, _reg, true))
+   goto pc_false;
 
pci_info(pdev, "IOMMU performance counters supported\n");
 
val = readl(iommu->mmio_base + MMIO_CNTR_CONF_OFFSET);
iommu->max_banks = (u8) ((val >> 12) & 0x3f);
iommu->max_counters = (u8) ((val >> 7) & 0xf);
+
+   return;
+
+pc_false:
+   pci_err(pdev, "Unable to read/write to IOMMU perf counter.\n");
+   amd_iommu_pc_present = false;
+   return;
 }
 
 static ssize_t amd_iommu_show_cap(struct device *dev,
-- 
2.20.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH] iommu: amd: Fix IOMMU perf counter clobbering during init

2020-01-14 Thread Shuah Khan
init_iommu_perf_ctr() clobbers the register when it checks write access
to IOMMU perf counters and fails to restore when they are writable.

Add save and restore to fix it.

Signed-off-by: Shuah Khan 
---
 drivers/iommu/amd_iommu_init.c | 22 --
 1 file changed, 16 insertions(+), 6 deletions(-)

diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
index 568c52317757..c0ad4f293522 100644
--- a/drivers/iommu/amd_iommu_init.c
+++ b/drivers/iommu/amd_iommu_init.c
@@ -1655,27 +1655,37 @@ static int iommu_pc_get_set_reg(struct amd_iommu 
*iommu, u8 bank, u8 cntr,
 static void init_iommu_perf_ctr(struct amd_iommu *iommu)
 {
struct pci_dev *pdev = iommu->dev;
-   u64 val = 0xabcd, val2 = 0;
+   u64 val = 0xabcd, val2 = 0, save_reg = 0;
 
if (!iommu_feature(iommu, FEATURE_PC))
return;
 
amd_iommu_pc_present = true;
 
+   /* save the value to restore, if writable */
+   if (iommu_pc_get_set_reg(iommu, 0, 0, 0, _reg, false))
+   goto pc_false;
+
/* Check if the performance counters can be written to */
if ((iommu_pc_get_set_reg(iommu, 0, 0, 0, , true)) ||
(iommu_pc_get_set_reg(iommu, 0, 0, 0, , false)) ||
-   (val != val2)) {
-   pci_err(pdev, "Unable to write to IOMMU perf counter.\n");
-   amd_iommu_pc_present = false;
-   return;
-   }
+   (val != val2))
+   goto pc_false;
+
+   /* restore */
+   if (iommu_pc_get_set_reg(iommu, 0, 0, 0, _reg, true))
+   goto pc_false;
 
pci_info(pdev, "IOMMU performance counters supported\n");
 
val = readl(iommu->mmio_base + MMIO_CNTR_CONF_OFFSET);
iommu->max_banks = (u8) ((val >> 12) & 0x3f);
iommu->max_counters = (u8) ((val >> 7) & 0xf);
+
+pc_false:
+   pci_err(pdev, "Unable to read/write to IOMMU perf counter.\n");
+   amd_iommu_pc_present = false;
+   return;
 }
 
 static ssize_t amd_iommu_show_cap(struct device *dev,
-- 
2.20.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH] iommu: amd: Fix IOMMU perf counter clobbering during init

2020-01-21 Thread Shuah Khan

On 1/20/20 7:10 PM, Suravee Suthikulpanit wrote:

On 1/17/2020 5:08 PM, Joerg Roedel wrote:

Adding Suravee, who wrote the IOMMU Perf Counter code.

On Tue, Jan 14, 2020 at 08:12:20AM -0700, Shuah Khan wrote:

init_iommu_perf_ctr() clobbers the register when it checks write access
to IOMMU perf counters and fails to restore when they are writable.

Add save and restore to fix it.

Signed-off-by: Shuah Khan
---
  drivers/iommu/amd_iommu_init.c | 22 --
  1 file changed, 16 insertions(+), 6 deletions(-)

Suravee, can you please review this patch?



This looks ok. Does this fix certain issues? Or is this just for sanity.


I didn't notice any problems. Counters aren't writable on my system.
However, it certainly looks like a bog since registers aren't restored
like in other places in this file where such checks are done on other
registers.

I see 2 banks and 4 counters on my system. Is it sufficient to check
the first bank and first counter? In other words, if the first one
isn't writable, are all counters non-writable?

Should we read the config first and then, try to see if any of the
counters are writable? I have a patch that does that, I can send it
out for review.



Reviewed-by: Suravee Suthikulpanit 

Thanks for the review.

thanks,
-- Shuah
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH] iommu/amd: Fix event counter availability check

2020-06-02 Thread Shuah Khan

On 5/31/20 1:22 AM, Alexander Monakov wrote:

Hi,

Adding Shuah Khan to Cc: I've noticed you've seen this issue on Ryzen 2400GE;
can you have a look at the patch? Would be nice to know if it fixes the
problem for you too.



I am not seeing any change in behavior on my system. I still see:

I can't read perf counters.

The question I asked in my previous thread on this:


I see 2 banks and 4 counters on my system. Is it sufficient to check
the first bank and first counter? In other words, if the first one
isn't writable, are all counters non-writable?

Should we read the config first and then, try to see if any of the
counters are writable? I have a patch that does that, I can send it
out for review.

I changed the logic to read config to get max banks and counters
before checking if counters are writable and tried writing to all.
The result is the same and all of them aren't writable. However,
when disable the writable check and assume they are, I can run

perf stat -e 'amd_iommu_0 on all events and get data.

perf stat -e 'amd_iommu_0/cmd_processed/' sleep 10

 Performance counter stats for 'system wide':

56  amd_iommu_0/cmd_processed/

  10.001525171 seconds time elapsed


perf stat -a -e amd_iommu/mem_trans_total/ sleep 10

 Performance counter stats for 'system wide':

 2,696  amd_iommu/mem_trans_total/

  10.001465115 seconds time elapsed

I tried all possible events listed under amd_iommu_0 and I can get
data on all of them. No problems in dmesg.


This patch doesn't really address that question.

thanks,
-- Shuah
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH] iommu/amd: Fix event counter availability check

2021-02-26 Thread Shuah Khan

On 2/26/21 2:44 PM, Paul Menzel wrote:

[cc: +suravee, +jörg]

Dear Alex, dear Shuah, dear Suravee, dear Jörg,


Am 03.06.20 um 08:54 schrieb Alexander Monakov:

On Tue, 2 Jun 2020, Shuah Khan wrote:


I changed the logic to read config to get max banks and counters
before checking if counters are writable and tried writing to all.
The result is the same and all of them aren't writable. However,
when disable the writable check and assume they are, I can run

[snip]

This is similar to what I did. I also noticed that counters can
be successfully used with perf if the initial check is ignored.
I was considering sending a patch to remove the check and adjust
the event counting logic to use counters as read-only, but after
a bit more investigation I've noticed how late pci_enable_device
is done, and came up with this patch. It's a path of less resistance:
I'd expect maintainers to be more averse to removing the check
rather than fixing it so it works as intended (even though I think
the check should not be there in the first place).

However:

The ability to modify the counters is needed only for sampling the
events (getting an interrupt when a counter overflows). There's no
code to do that for these AMD IOMMU counters. A solution I would
prefer is to not write to those counters at all. It would simplify or
even remove a bunch of code. I can submit a corresponding patch if
there's general agreement this path is ok.

What do you think?


I like this idea. Suravee, Jörg, what do you think?

Commit 6778ff5b21b (iommu/amd: Fix performance counter initialization) 
delays the boot up to 100 ms, which is over 20 % on fast systems, and 
also just a workaround, and does not seem to work always. The delay is 
also not mentioned in the commit message.





Sounds good to me. I can test this out on my system.

thanks,
-- Shuah

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH 2/2] iommu/amd: Remove performance counter pre-initialization test

2021-04-09 Thread Shuah Khan

On 4/9/21 2:58 AM, Suravee Suthikulpanit wrote:

In early AMD desktop/mobile platforms (during 2013), when the IOMMU
Performance Counter (PMC) support was first introduced in
commit 30861ddc9cca ("perf/x86/amd: Add IOMMU Performance Counter
resource management"), there was a HW bug where the counters could not
be accessed. The result was reading of the counter always return zero.

At the time, the suggested workaround was to add a test logic prior
to initializing the PMC feature to check if the counters can be programmed
and read back the same value. This has been working fine until the more
recent desktop/mobile platforms start enabling power gating for the PMC,
which prevents access to the counters. This results in the PMC support
being disabled unnecesarily.


unnecessarily



Unfortunatly, there is no documentation of since which generation


Unfortunately,

Rephrase suggestion:
Unfortunately, it is unclear when the PMC HW bug fixed.


of hardware the original PMC HW bug was fixed. Although, it was fixed
soon after the first introduction of the PMC. Base on this, we assume


Based


that the buggy platforms are less likely to be in used, and it should


in use


be relatively safe to remove this legacy logic.




Link: 
https://lore.kernel.org/linux-iommu/alpine.lnx.3.20.13.2006030935570.3...@monopod.intra.ispras.ru/
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=201753
Cc: Tj (Elloe Linux) 
Cc: Shuah Khan 
Cc: Alexander Monakov 
Cc: David Coe 
Cc: Paul Menzel 
Signed-off-by: Suravee Suthikulpanit 
---
  drivers/iommu/amd/init.c | 24 +---
  1 file changed, 1 insertion(+), 23 deletions(-)

diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index 648cdfd03074..247cdda5d683 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -1714,33 +1714,16 @@ static int __init init_iommu_all(struct 
acpi_table_header *table)
return 0;
  }
  
-static int iommu_pc_get_set_reg(struct amd_iommu *iommu, u8 bank, u8 cntr,

-   u8 fxn, u64 *value, bool is_write);
-
  static void init_iommu_perf_ctr(struct amd_iommu *iommu)
  {
+   u64 val;
struct pci_dev *pdev = iommu->dev;
-   u64 val = 0xabcd, val2 = 0, save_reg = 0;
  
  	if (!iommu_feature(iommu, FEATURE_PC))

return;
  
  	amd_iommu_pc_present = true;
  
-	/* save the value to restore, if writable */

-   if (iommu_pc_get_set_reg(iommu, 0, 0, 0, _reg, false))
-   goto pc_false;
-
-   /* Check if the performance counters can be written to */
-   if ((iommu_pc_get_set_reg(iommu, 0, 0, 0, , true)) ||
-   (iommu_pc_get_set_reg(iommu, 0, 0, 0, , false)) ||
-   (val != val2))
-   goto pc_false;
-
-   /* restore */
-   if (iommu_pc_get_set_reg(iommu, 0, 0, 0, _reg, true))
-   goto pc_false;
-
pci_info(pdev, "IOMMU performance counters supported\n");
  
  	val = readl(iommu->mmio_base + MMIO_CNTR_CONF_OFFSET);

@@ -1748,11 +1731,6 @@ static void init_iommu_perf_ctr(struct amd_iommu *iommu)
iommu->max_counters = (u8) ((val >> 7) & 0xf);
  
  	return;

-
-pc_false:
-   pci_err(pdev, "Unable to read/write to IOMMU perf counter.\n");
-   amd_iommu_pc_present = false;
-   return;
  }
  
  static ssize_t amd_iommu_show_cap(struct device *dev,




I will test this patch and the revert on my two AMD systems and update
you in a day or two.

thanks,
-- Shuah
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 2/2] iommu/amd: Remove performance counter pre-initialization test

2021-04-09 Thread Shuah Khan

On 4/9/21 10:37 AM, Shuah Khan wrote:

On 4/9/21 2:58 AM, Suravee Suthikulpanit wrote:

In early AMD desktop/mobile platforms (during 2013), when the IOMMU
Performance Counter (PMC) support was first introduced in
commit 30861ddc9cca ("perf/x86/amd: Add IOMMU Performance Counter
resource management"), there was a HW bug where the counters could not
be accessed. The result was reading of the counter always return zero.

At the time, the suggested workaround was to add a test logic prior
to initializing the PMC feature to check if the counters can be 
programmed

and read back the same value. This has been working fine until the more
recent desktop/mobile platforms start enabling power gating for the PMC,
which prevents access to the counters. This results in the PMC support
being disabled unnecesarily.


unnecessarily



Unfortunatly, there is no documentation of since which generation


Unfortunately,

Rephrase suggestion:
Unfortunately, it is unclear when the PMC HW bug fixed.


of hardware the original PMC HW bug was fixed. Although, it was fixed
soon after the first introduction of the PMC. Base on this, we assume


Based


that the buggy platforms are less likely to be in used, and it should


in use


be relatively safe to remove this legacy logic.




Link: 
https://lore.kernel.org/linux-iommu/alpine.lnx.3.20.13.2006030935570.3...@monopod.intra.ispras.ru/ 


Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=201753
Cc: Tj (Elloe Linux) 
Cc: Shuah Khan 
Cc: Alexander Monakov 
Cc: David Coe 
Cc: Paul Menzel 
Signed-off-by: Suravee Suthikulpanit 
---
  drivers/iommu/amd/init.c | 24 +---
  1 file changed, 1 insertion(+), 23 deletions(-)

diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index 648cdfd03074..247cdda5d683 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -1714,33 +1714,16 @@ static int __init init_iommu_all(struct 
acpi_table_header *table)

  return 0;
  }
-static int iommu_pc_get_set_reg(struct amd_iommu *iommu, u8 bank, u8 
cntr,

-    u8 fxn, u64 *value, bool is_write);
-
  static void init_iommu_perf_ctr(struct amd_iommu *iommu)
  {
+    u64 val;
  struct pci_dev *pdev = iommu->dev;
-    u64 val = 0xabcd, val2 = 0, save_reg = 0;


Why not leave this u64 val here? Having the pdev assignment as the
first line makes it easier to read/follow.


  if (!iommu_feature(iommu, FEATURE_PC))
  return;
  amd_iommu_pc_present = true;
-    /* save the value to restore, if writable */
-    if (iommu_pc_get_set_reg(iommu, 0, 0, 0, _reg, false))
-    goto pc_false;
-
-    /* Check if the performance counters can be written to */
-    if ((iommu_pc_get_set_reg(iommu, 0, 0, 0, , true)) ||
-    (iommu_pc_get_set_reg(iommu, 0, 0, 0, , false)) ||
-    (val != val2))


Aha - this is going away anyway. Please ignore my comment on 1/2
about parenthesis around (val != val2) being unnecessary.


-    goto pc_false;
-
-    /* restore */
-    if (iommu_pc_get_set_reg(iommu, 0, 0, 0, _reg, true))
-    goto pc_false;
-
  pci_info(pdev, "IOMMU performance counters supported\n");
  val = readl(iommu->mmio_base + MMIO_CNTR_CONF_OFFSET);
@@ -1748,11 +1731,6 @@ static void init_iommu_perf_ctr(struct 
amd_iommu *iommu)

  iommu->max_counters = (u8) ((val >> 7) & 0xf);
  return;
-
-pc_false:
-    pci_err(pdev, "Unable to read/write to IOMMU perf counter.\n");
-    amd_iommu_pc_present = false;
-    return;
  }
  static ssize_t amd_iommu_show_cap(struct device *dev,





thanks,
-- Shuah
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH 1/2] Revert "iommu/amd: Fix performance counter initialization"

2021-04-09 Thread Shuah Khan

On 4/9/21 2:58 AM, Suravee Suthikulpanit wrote:

From: Paul Menzel 

This reverts commit 6778ff5b21bd8e78c8bd547fd66437cf2657fd9b.

The original commit tries to address an issue, where PMC power-gating
causing the IOMMU PMC pre-init test to fail on certain desktop/mobile
platforms where the power-gating is normally enabled.

There have been several reports that the workaround still does not
guarantee to work, and can add up to 100 ms (on the worst case)
to the boot process on certain platforms such as the MSI B350M MORTAR
with AMD Ryzen 3 2200G.

Therefore, revert this commit as a prelude to removing the pre-init
test.

Link: 
https://lore.kernel.org/linux-iommu/alpine.lnx.3.20.13.2006030935570.3...@monopod.intra.ispras.ru/
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=201753
Cc: Tj (Elloe Linux) 
Cc: Shuah Khan 
Cc: Alexander Monakov 
Cc: David Coe 
Signed-off-by: Paul Menzel 
Signed-off-by: Suravee Suthikulpanit 
---
Note: I have revised the commit message to add more detail
   and remove uncessary information.

  drivers/iommu/amd/init.c | 45 ++--
  1 file changed, 11 insertions(+), 34 deletions(-)

diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index 321f5906e6ed..648cdfd03074 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -12,7 +12,6 @@
  #include 
  #include 
  #include 
-#include 
  #include 
  #include 
  #include 
@@ -257,8 +256,6 @@ static enum iommu_init_state init_state = IOMMU_START_STATE;
  static int amd_iommu_enable_interrupts(void);
  static int __init iommu_go_to_state(enum iommu_init_state state);
  static void init_device_table_dma(void);
-static int iommu_pc_get_set_reg(struct amd_iommu *iommu, u8 bank, u8 cntr,
-   u8 fxn, u64 *value, bool is_write);
  
  static bool amd_iommu_pre_enabled = true;
  
@@ -1717,11 +1714,13 @@ static int __init init_iommu_all(struct acpi_table_header *table)

return 0;
  }
  
-static void __init init_iommu_perf_ctr(struct amd_iommu *iommu)

+static int iommu_pc_get_set_reg(struct amd_iommu *iommu, u8 bank, u8 cntr,
+   u8 fxn, u64 *value, bool is_write);
+
+static void init_iommu_perf_ctr(struct amd_iommu *iommu)
  {
-   int retry;
struct pci_dev *pdev = iommu->dev;
-   u64 val = 0xabcd, val2 = 0, save_reg, save_src;
+   u64 val = 0xabcd, val2 = 0, save_reg = 0;
  
  	if (!iommu_feature(iommu, FEATURE_PC))

return;
@@ -1729,39 +1728,17 @@ static void __init init_iommu_perf_ctr(struct amd_iommu 
*iommu)
amd_iommu_pc_present = true;
  
  	/* save the value to restore, if writable */

-   if (iommu_pc_get_set_reg(iommu, 0, 0, 0, _reg, false) ||
-   iommu_pc_get_set_reg(iommu, 0, 0, 8, _src, false))
-   goto pc_false;
-
-   /*
-* Disable power gating by programing the performance counter
-* source to 20 (i.e. counts the reads and writes from/to IOMMU
-* Reserved Register [MMIO Offset 1FF8h] that are ignored.),
-* which never get incremented during this init phase.
-* (Note: The event is also deprecated.)
-*/
-   val = 20;
-   if (iommu_pc_get_set_reg(iommu, 0, 0, 8, , true))
+   if (iommu_pc_get_set_reg(iommu, 0, 0, 0, _reg, false))
goto pc_false;
  
  	/* Check if the performance counters can be written to */

-   val = 0xabcd;
-   for (retry = 5; retry; retry--) {
-   if (iommu_pc_get_set_reg(iommu, 0, 0, 0, , true) ||
-   iommu_pc_get_set_reg(iommu, 0, 0, 0, , false) ||
-   val2)
-   break;
-
-   /* Wait about 20 msec for power gating to disable and retry. */
-   msleep(20);
-   }
-
-   /* restore */
-   if (iommu_pc_get_set_reg(iommu, 0, 0, 0, _reg, true) ||
-   iommu_pc_get_set_reg(iommu, 0, 0, 8, _src, true))
+   if ((iommu_pc_get_set_reg(iommu, 0, 0, 0, , true)) ||
+   (iommu_pc_get_set_reg(iommu, 0, 0, 0, , false)) ||
+   (val != val2))


Probably don't need parentheses around 'val != val2'


goto pc_false;
  
-	if (val != val2)

+   /* restore */
+   if (iommu_pc_get_set_reg(iommu, 0, 0, 0, _reg, true))
goto pc_false;
  
  	pci_info(pdev, "IOMMU performance counters supported\n");




thanks,
-- Shuah
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 2/2] iommu/amd: Remove performance counter pre-initialization test

2021-04-09 Thread Shuah Khan

On 4/9/21 2:58 AM, Suravee Suthikulpanit wrote:

In early AMD desktop/mobile platforms (during 2013), when the IOMMU
Performance Counter (PMC) support was first introduced in
commit 30861ddc9cca ("perf/x86/amd: Add IOMMU Performance Counter
resource management"), there was a HW bug where the counters could not
be accessed. The result was reading of the counter always return zero.

At the time, the suggested workaround was to add a test logic prior
to initializing the PMC feature to check if the counters can be programmed
and read back the same value. This has been working fine until the more
recent desktop/mobile platforms start enabling power gating for the PMC,
which prevents access to the counters. This results in the PMC support
being disabled unnecesarily.

Unfortunatly, there is no documentation of since which generation
of hardware the original PMC HW bug was fixed. Although, it was fixed
soon after the first introduction of the PMC. Base on this, we assume
that the buggy platforms are less likely to be in used, and it should
be relatively safe to remove this legacy logic.

Link: 
https://lore.kernel.org/linux-iommu/alpine.lnx.3.20.13.2006030935570.3...@monopod.intra.ispras.ru/
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=201753
Cc: Tj (Elloe Linux) 
Cc: Shuah Khan 
Cc: Alexander Monakov 
Cc: David Coe 
Cc: Paul Menzel 
Signed-off-by: Suravee Suthikulpanit 
---



Tested-by: Shuah Khan 

thanks,
-- Shuah

Results with this patch on AMD Ryzen 5 PRO 2400GE w/ Radeon Vega
Graphics

sudo ./perf stat -e 'amd_iommu_0/cmd_processed/, 
amd_iommu_0/cmd_processed_inv/, amd_iommu_0/ign_rd_wr_mmio_1ff8h/, 
amd_iommu_0/int_dte_hit/, amd_iommu_0/int_dte_mis/, 
amd_iommu_0/mem_dte_hit/, amd_iommu_0/mem_dte_mis/, 
amd_iommu_0/mem_iommu_tlb_pde_hit/, amd_iommu_0/mem_iommu_tlb_pde_mis/, 
amd_iommu_0/mem_iommu_tlb_pte_hit/, amd_iommu_0/mem_iommu_tlb_pte_mis/, 
amd_iommu_0/mem_pass_excl/, amd_iommu_0/mem_pass_pretrans/, 
amd_iommu_0/mem_pass_untrans/, amd_iommu_0/mem_target_abort/, 
amd_iommu_0/mem_trans_total/, amd_iommu_0/page_tbl_read_gst/, 
amd_iommu_0/page_tbl_read_nst/, amd_iommu_0/page_tbl_read_tot/, 
amd_iommu_0/smi_blk/, amd_iommu_0/smi_recv/, amd_iommu_0/tlb_inv/, 
amd_iommu_0/vapic_int_guest/, amd_iommu_0/vapic_int_non_guest/' sleep 10


 Performance counter stats for 'system wide':

   156  amd_iommu_0/cmd_processed/ 
(33.30%)
80   amd_iommu_0/cmd_processed_inv/ 
 (33.38%)
 0   amd_iommu_0/ign_rd_wr_mmio_1ff8h/ 
(33.40%)
 0   amd_iommu_0/int_dte_hit/ 
   (33.43%)
   325   amd_iommu_0/int_dte_mis/ 
   (33.44%)
 1,951   amd_iommu_0/mem_dte_hit/ 
   (33.45%)
 7,589   amd_iommu_0/mem_dte_mis/ 
   (33.49%)
   325   amd_iommu_0/mem_iommu_tlb_pde_hit/ 
 (33.45%)
 2,460   amd_iommu_0/mem_iommu_tlb_pde_mis/ 
 (33.41%)
 2,510   amd_iommu_0/mem_iommu_tlb_pte_hit/ 
 (33.38%)
 5,526   amd_iommu_0/mem_iommu_tlb_pte_mis/ 
 (33.33%)
 0   amd_iommu_0/mem_pass_excl/ 
 (33.29%)
 0   amd_iommu_0/mem_pass_pretrans/ 
 (33.28%)
 1,556   amd_iommu_0/mem_pass_untrans/ 
(33.27%)
 0   amd_iommu_0/mem_target_abort/ 
(33.26%)
 3,112   amd_iommu_0/mem_trans_total/ 
   (33.29%)
 0   amd_iommu_0/page_tbl_read_gst/ 
 (33.29%)
 1,813   amd_iommu_0/page_tbl_read_nst/ 
 (33.25%)
 2,242   amd_iommu_0/page_tbl_read_tot/ 
 (33.27%)
 0   amd_iommu_0/smi_blk/ 
   (33.29%)
 0   amd_iommu_0/smi_recv/ 
   (33.28%)
 0   amd_iommu_0/tlb_inv/ 
   (33.28%)
 0   amd_iommu_0/vapic_int_guest/ 
   (33.25%)
 0   amd_iommu_0/vapic_int_non_guest/ 
   (33.26%)


  10.003200316 seconds time elapsed

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 2/2] iommu/amd: Remove performance counter pre-initialization test

2021-04-09 Thread Shuah Khan

On 4/9/21 2:00 PM, Shuah Khan wrote:

On 4/9/21 2:58 AM, Suravee Suthikulpanit wrote:

In early AMD desktop/mobile platforms (during 2013), when the IOMMU
Performance Counter (PMC) support was first introduced in
commit 30861ddc9cca ("perf/x86/amd: Add IOMMU Performance Counter
resource management"), there was a HW bug where the counters could not
be accessed. The result was reading of the counter always return zero.

At the time, the suggested workaround was to add a test logic prior
to initializing the PMC feature to check if the counters can be 
programmed

and read back the same value. This has been working fine until the more
recent desktop/mobile platforms start enabling power gating for the PMC,
which prevents access to the counters. This results in the PMC support
being disabled unnecesarily.

Unfortunatly, there is no documentation of since which generation
of hardware the original PMC HW bug was fixed. Although, it was fixed
soon after the first introduction of the PMC. Base on this, we assume
that the buggy platforms are less likely to be in used, and it should
be relatively safe to remove this legacy logic.

Link: 
https://lore.kernel.org/linux-iommu/alpine.lnx.3.20.13.2006030935570.3...@monopod.intra.ispras.ru/ 


Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=201753
Cc: Tj (Elloe Linux) 
Cc: Shuah Khan 
Cc: Alexander Monakov 
Cc: David Coe 
Cc: Paul Menzel 
Signed-off-by: Suravee Suthikulpanit 
---



Tested-by: Shuah Khan 



Revert + this patch - same as my test on Ryzen 5

On AMD Ryzen 7 4700G with Radeon Graphics

These look real odd to me. Let me know if I should look further.

sudo ./perf stat -e 'amd_iommu_0/cmd_processed/, 
amd_iommu_0/cmd_processed_inv/, amd_iommu_0/ign_rd_wr_mmio_1ff8h/, 
amd_iommu_0/int_dte_hit/, amd_iommu_0/int_dte_mis/, 
amd_iommu_0/mem_dte_hit/, amd_iommu_0/mem_dte_mis/, 
amd_iommu_0/mem_iommu_tlb_pde_hit/, amd_iommu_0/mem_iommu_tlb_pde_mis/, 
amd_iommu_0/mem_iommu_tlb_pte_hit/, amd_iommu_0/mem_iommu_tlb_pte_mis/, 
amd_iommu_0/mem_pass_excl/, amd_iommu_0/mem_pass_pretrans/, 
amd_iommu_0/mem_pass_untrans/, amd_iommu_0/mem_target_abort/, 
amd_iommu_0/mem_trans_total/, amd_iommu_0/page_tbl_read_gst/, 
amd_iommu_0/page_tbl_read_nst/, amd_iommu_0/page_tbl_read_tot/, 
amd_iommu_0/smi_blk/, amd_iommu_0/smi_recv/, amd_iommu_0/tlb_inv/, 
amd_iommu_0/vapic_int_guest/, amd_iommu_0/vapic_int_non_guest/' sleep 10


 Performance counter stats for 'system wide':

17,761,952,514,865,374  amd_iommu_0/cmd_processed/ 
   (33.28%)
18,582,155,570,607,472   amd_iommu_0/cmd_processed_inv/ 
(33.32%)
 0   amd_iommu_0/ign_rd_wr_mmio_1ff8h/ 
(33.36%)
5,056,087,645,262,255   amd_iommu_0/int_dte_hit/ 
 (33.40%)
32,831,106,446,308,888   amd_iommu_0/int_dte_mis/ 
  (33.44%)
13,461,819,655,591,296   amd_iommu_0/mem_dte_hit/ 
  (33.45%)
208,555,436,221,050,464   amd_iommu_0/mem_dte_mis/ 
   (33.47%)
196,824,154,635,609,888   amd_iommu_0/mem_iommu_tlb_pde_hit/ 
 (33.46%)
193,552,630,440,410,144   amd_iommu_0/mem_iommu_tlb_pde_mis/ 
 (33.45%)
176,936,647,809,098,368   amd_iommu_0/mem_iommu_tlb_pte_hit/ 
 (33.41%)
184,737,401,623,626,464   amd_iommu_0/mem_iommu_tlb_pte_mis/ 
 (33.37%)
 0   amd_iommu_0/mem_pass_excl/ 
 (33.33%)
 0   amd_iommu_0/mem_pass_pretrans/ 
 (33.30%)
 0   amd_iommu_0/mem_pass_untrans/ 
(33.28%)
 0   amd_iommu_0/mem_target_abort/ 
(33.27%)
245,383,212,924,004,288   amd_iommu_0/mem_trans_total/ 
   (33.27%)
 0   amd_iommu_0/page_tbl_read_gst/ 
 (33.28%)
262,267,045,917,967,264   amd_iommu_0/page_tbl_read_nst/ 
 (33.27%)
256,308,216,913,137,600   amd_iommu_0/page_tbl_read_tot/ 
 (33.28%)
 0   amd_iommu_0/smi_blk/ 
   (33.27%)
 0   amd_iommu_0/smi_recv/ 
   (33.27%)
 0   amd_iommu_0/tlb_inv/ 
   (33.27%)
 0   amd_iommu_0/vapic_int_guest/ 
   (33.26%)
38,913,544,420,579,888   amd_iommu_0/vapic_int_non_guest/ 
  (33.27%)


  10.003967760 seconds time elapsed

thanks,
-- Shuah
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH] dma-mapping: benchmark: Extract a common header file for map_benchmark definition

2022-02-11 Thread Shuah Khan

On 2/10/22 9:22 PM, Song Bao Hua (Barry Song) wrote:




-Original Message-
From: tiantao (H)
Sent: Friday, February 11, 2022 4:15 PM
To: Song Bao Hua (Barry Song) ; sh...@kernel.org;
chenxiang (M) 
Cc: iommu@lists.linux-foundation.org; linux-kselft...@vger.kernel.org;
linux...@openeuler.org
Subject: [PATCH] dma-mapping: benchmark: Extract a common header file for
map_benchmark definition

kernel/dma/map_benchmark.c and selftests/dma/dma_map_benchmark.c
have duplicate map_benchmark definitions, which tends to lead to
inconsistent changes to map_benchmark on both sides, extract a
common header file to avoid this problem.

Signed-off-by: Tian Tao 


+To: Christoph

Looks like a right cleanup. This will help decrease the maintain
overhead in the future. Other similar selftests tools are also
doing this.

Acked-by: Barry Song 



+1 on this cleanup making this code maintainable. We are moving in
the direction of cleaning up defines in selftests for the same
reason.

Let's just make sure this works on older kernels. We do support
mainline kselftest on stable releases. With that:

Reviewed-by: Shuah Khan 

thanks,
-- Shuah



___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu