[PATCH] iommu/ipmmu-vmsa: Unify domain alloc/free implementations

2016-11-10 Thread Laurent Pinchart
The ARM and ARM64 implementations of the IOMMU domain alloc/free
operations can be unified with the correct combination of type checking,
as the domain type can never be IOMMU_DOMAIN_DMA on 32-bit ARM due to
the driver calling iommu_group_get_for_dev() on ARM64 only. Do so.

Signed-off-by: Laurent Pinchart 
---
 drivers/iommu/ipmmu-vmsa.c | 58 +++---
 1 file changed, 14 insertions(+), 44 deletions(-)

Hi Magnus,

This cleans up patch 5/7, making patch 4/7 unnecessary. I proposed squashing
4/7, 5/7 and this one in v7.

diff --git a/drivers/iommu/ipmmu-vmsa.c b/drivers/iommu/ipmmu-vmsa.c
index 11550ac90d65..827aa72aaf28 100644
--- a/drivers/iommu/ipmmu-vmsa.c
+++ b/drivers/iommu/ipmmu-vmsa.c
@@ -529,16 +529,22 @@ static irqreturn_t ipmmu_irq(int irq, void *dev)
  * IOMMU Operations
  */
 
-static struct iommu_domain *__ipmmu_domain_alloc(unsigned type)
+static struct iommu_domain *ipmmu_domain_alloc(unsigned int type)
 {
struct ipmmu_vmsa_domain *domain;
 
+   if (type != IOMMU_DOMAIN_UNMANAGED && type != IOMMU_DOMAIN_DMA)
+   return NULL;
+
domain = kzalloc(sizeof(*domain), GFP_KERNEL);
if (!domain)
return NULL;
 
	spin_lock_init(&domain->lock);
 
+   if (type == IOMMU_DOMAIN_DMA)
+   iommu_get_dma_cookie(&domain->io_domain);
+
	return &domain->io_domain;
 }
 
@@ -546,6 +552,9 @@ static void ipmmu_domain_free(struct iommu_domain *io_domain)
 {
struct ipmmu_vmsa_domain *domain = to_vmsa_domain(io_domain);
 
+   if (io_domain->type == IOMMU_DOMAIN_DMA)
+   iommu_put_dma_cookie(io_domain);
+
/*
 * Free the domain resources. We assume that all devices have already
 * been detached.
@@ -821,14 +830,6 @@ static void ipmmu_remove_device(struct device *dev)
set_archdata(dev, NULL);
 }
 
-static struct iommu_domain *ipmmu_domain_alloc(unsigned type)
-{
-   if (type != IOMMU_DOMAIN_UNMANAGED)
-   return NULL;
-
-   return __ipmmu_domain_alloc(type);
-}
-
 static const struct iommu_ops ipmmu_ops = {
.domain_alloc = ipmmu_domain_alloc,
.domain_free = ipmmu_domain_free,
@@ -847,42 +848,11 @@ static const struct iommu_ops ipmmu_ops = {
 
 #ifdef CONFIG_IOMMU_DMA
 
-static struct iommu_domain *ipmmu_domain_alloc_dma(unsigned type)
-{
-   struct iommu_domain *io_domain = NULL;
-
-   switch (type) {
-   case IOMMU_DOMAIN_UNMANAGED:
-   io_domain = __ipmmu_domain_alloc(type);
-   break;
-
-   case IOMMU_DOMAIN_DMA:
-   io_domain = __ipmmu_domain_alloc(type);
-   if (io_domain)
-   iommu_get_dma_cookie(io_domain);
-   break;
-   }
-
-   return io_domain;
-}
-
-static void ipmmu_domain_free_dma(struct iommu_domain *io_domain)
-{
-   switch (io_domain->type) {
-   case IOMMU_DOMAIN_DMA:
-   iommu_put_dma_cookie(io_domain);
-   /* fall-through */
-   default:
-   ipmmu_domain_free(io_domain);
-   break;
-   }
-}
-
 static int ipmmu_add_device_dma(struct device *dev)
 {
struct iommu_group *group;
 
-   /* only accept devices with iommus property */
+   /* Only accept devices with an iommus property. */
if (of_count_phandle_with_args(dev->of_node, "iommus",
   "#iommu-cells") < 0)
return -ENODEV;
@@ -920,13 +890,13 @@ static struct iommu_group *ipmmu_device_group_dma(struct device *dev)
 static int ipmmu_of_xlate_dma(struct device *dev,
  struct of_phandle_args *spec)
 {
-   /* dummy callback to satisfy of_iommu_configure() */
+   /* Dummy callback to satisfy of_iommu_configure(). */
return 0;
 }
 
 static const struct iommu_ops ipmmu_ops = {
-   .domain_alloc = ipmmu_domain_alloc_dma,
-   .domain_free = ipmmu_domain_free_dma,
+   .domain_alloc = ipmmu_domain_alloc,
+   .domain_free = ipmmu_domain_free,
.attach_dev = ipmmu_attach_device,
.detach_dev = ipmmu_detach_device,
.map = ipmmu_map,
-- 
Regards,

Laurent Pinchart

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [RFC PATCH v3 02/20] x86: Set the write-protect cache mode for full PAT support

2016-11-10 Thread Kani, Toshimitsu
On Thu, 2016-11-10 at 14:14 +0100, Borislav Petkov wrote:
> + Toshi.
> 
> On Wed, Nov 09, 2016 at 06:34:48PM -0600, Tom Lendacky wrote:
> > 
> > For processors that support PAT, set the write-protect cache mode
> > (_PAGE_CACHE_MODE_WP) entry to the actual write-protect value
> > (0x05).

Using slot 6 may be more cautious (for the same reason slot 7 was used
for WT), but I do not have a strong opinion about it.

set_page_memtype() cannot track the use of the WP type since there is no
extra bit available for WP, but WP is only supported by the
early_memremap_xx() interfaces in this series.  So, I think we should
just document that WP is only intended for temporary mappings at boot
time until this issue is resolved.  Also, we need to make sure that
early_memremap() for WP is only called after pat_init() is done.

A nit - please add WP to the function header comment below.
"This function initializes PAT MSR and PAT table with an OS-defined
value to enable additional cache attributes, WC and WT."

Thanks,
-Toshi

Re: [PATCH v6 06/07] iommu/ipmmu-vmsa: ARM and ARM64 archdata access

2016-11-10 Thread Laurent Pinchart
Hello,

On Friday 21 Oct 2016 18:32:54 Robin Murphy wrote:
> On 20/10/16 00:36, Magnus Damm wrote:
> > From: Magnus Damm 
> > 
> > Not all architectures have an iommu member in their archdata, so
> > use #ifdefs to support building with COMPILE_TEST on any architecture.
> 
> As an alternative to this we could now use iommu_fwspec in place of the
> custom ipmmu_vmsa_archdata - that's deliberately
> architecture-independent. I converted the Mediatek drivers[1] as an
> example to stand separately from the big SMMU rework, as those are the
> ones I'm most familiar with, but it looks like the IPMMU is a similarly
> perfect fit.
> 
> Of course, that could always come afterwards as a cleanup - I wouldn't
> consider it a blocker at this point - but it does look like it might
> simplify some of the code being moved around in this series if it were
> to be done beforehand.

I like the suggestion, it looks much cleaner. It would also allow implementing 
.of_xlate() in a cleaner fashion. I don't think it needs to block this series, 
but if Magnus ends up sending a v7 to address all the other small comments, it 
would be worth a shot.

> [1]: http://www.mail-archive.com/iommu@lists.linux-foundation.org/msg14577.html
> > Signed-off-by: Magnus Damm 
> > Reviewed-by: Joerg Roedel 
> > ---
> > 
> >  Changes since V5:
> >  - None
> >  
> >  Changes since V4:
> >  - None
> >  
> >  Changes since V3:
> >  - New patch
> >  
> >  drivers/iommu/ipmmu-vmsa.c |   37 +++--
> >  1 file changed, 27 insertions(+), 10 deletions(-)
> > 
> > --- 0012/drivers/iommu/ipmmu-vmsa.c
> > +++ work/drivers/iommu/ipmmu-vmsa.c 2016-09-20 21:59:21.690607110 +0900
> > @@ -70,6 +70,25 @@ static struct ipmmu_vmsa_domain *to_vmsa
> > 
> > return container_of(dom, struct ipmmu_vmsa_domain, io_domain);
> >  
> >  }
> > 
> > +#if defined(CONFIG_ARM) || defined(CONFIG_ARM64)
> > +static struct ipmmu_vmsa_archdata *to_archdata(struct device *dev)
> > +{
> > +   return dev->archdata.iommu;
> > +}
> > +static void set_archdata(struct device *dev, struct ipmmu_vmsa_archdata *p)
> > +{
> > +   dev->archdata.iommu = p;
> > +}
> > +#else
> > +static struct ipmmu_vmsa_archdata *to_archdata(struct device *dev)
> > +{
> > +   return NULL;
> > +}
> > +static void set_archdata(struct device *dev, struct ipmmu_vmsa_archdata *p)
> > +{
> > +}
> > +#endif
> > +
> > 
> >  #define TLB_LOOP_TIMEOUT   100 /* 100us */
> >  
> >  /*
> >  
> > @@ -539,7 +558,7 @@ static void ipmmu_domain_free(struct iom
> > 
> >  static int ipmmu_attach_device(struct iommu_domain *io_domain,
> >  
> >struct device *dev)
> >  
> >  {
> > 
> > -   struct ipmmu_vmsa_archdata *archdata = dev->archdata.iommu;
> > +   struct ipmmu_vmsa_archdata *archdata = to_archdata(dev);
> > 
> > struct ipmmu_vmsa_device *mmu = archdata->mmu;
> > struct ipmmu_vmsa_domain *domain = to_vmsa_domain(io_domain);
> > unsigned long flags;
> > 
> > @@ -581,7 +600,7 @@ static int ipmmu_attach_device(struct io
> > 
> >  static void ipmmu_detach_device(struct iommu_domain *io_domain,
> >  
> > struct device *dev)
> >  
> >  {
> > 
> > -   struct ipmmu_vmsa_archdata *archdata = dev->archdata.iommu;
> > +   struct ipmmu_vmsa_archdata *archdata = to_archdata(dev);
> > 
> > struct ipmmu_vmsa_domain *domain = to_vmsa_domain(io_domain);
> > unsigned int i;
> > 
> > @@ -701,7 +720,7 @@ static int ipmmu_init_platform_device(st
> > 
> > archdata->mmu = mmu;
> > archdata->utlbs = utlbs;
> > archdata->num_utlbs = num_utlbs;
> > 
> > -   dev->archdata.iommu = archdata;
> > +   set_archdata(dev, archdata);
> > 
> > return 0;
> >  
> >  error:
> > @@ -713,12 +732,11 @@ error:
> >  static int ipmmu_add_device(struct device *dev)
> >  {
> > 
> > -   struct ipmmu_vmsa_archdata *archdata;
> > 
> > struct ipmmu_vmsa_device *mmu = NULL;
> > struct iommu_group *group;
> > int ret;
> > 
> > -   if (dev->archdata.iommu) {
> > +   if (to_archdata(dev)) {
> > 
> > dev_warn(dev, "IOMMU driver already assigned to device %s\n",
> > 
> >  dev_name(dev));
> > 
> > return -EINVAL;
> > 
> > @@ -754,8 +772,7 @@ static int ipmmu_add_device(struct devic
> > 
> >  * - Make the mapping size configurable ? We currently use a 2GB mapping
> >  *   at a 1GB offset to ensure that NULL VAs will fault.
> >  */
> > 
> > -   archdata = dev->archdata.iommu;
> > -   mmu = archdata->mmu;
> > +   mmu = to_archdata(dev)->mmu;
> > 
> > if (!mmu->mapping) {
> > 
> > struct dma_iommu_mapping *mapping;
> > 
> > @@ -783,7 +800,7 @@ error:
> > if (mmu)
> > 
> > arm_iommu_release_mapping(mmu->mapping);
> > 
> > -   dev->archdata.iommu = NULL;
> > +   

Re: [PATCH v6 05/07] iommu/ipmmu-vmsa: Add new IOMMU_DOMAIN_DMA ops

2016-11-10 Thread Laurent Pinchart
Hello,

On Thursday 10 Nov 2016 12:42:06 Joerg Roedel wrote:
> On Fri, Oct 21, 2016 at 06:52:53PM +0100, Robin Murphy wrote:
> > > -static struct iommu_domain *ipmmu_domain_alloc(unsigned type)
> > > -{
> > > - if (type != IOMMU_DOMAIN_UNMANAGED)
> > > - return NULL;
> > 
> > I *think* that if we did the initial check thus:
> > if (type != IOMMU_DOMAIN_UNMANAGED ||
> > 
> > (IS_ENABLED(CONFIG_IOMMU_DMA) && type != IOMMU_DOMAIN_DMA))
> > 
> > return NULL;
> > 
> > it shouldn't be necessary to split the function at all - we then just
> > wrap the {get,put}_cookie() bits in "if (type ==  IOMMU_DOMAIN_DMA)" and
> > in the 32-bit ARM case they just don't run as that can never be true.
> 
> This would be a good improvement. Magnus, Robin, can either of you send
> a follow-on patch to implement this suggestion? I have applied these
> patches to my arm/renesas branch (not pushed yet). The patch can be
> based on it.

I like the suggestion too, a patch is on its way.

Joerg, as I've sent a few comments about the other patches (sorry for the late 
review, I got delayed by KS and LPC), the follow-up patch should probably be 
squashed into this one when Magnus addresses my comments. Could you please 
hold off pushing the arm/renesas branch until Magnus replies to this ?

-- 
Regards,

Laurent Pinchart



Re: [PATCH v6 04/07] iommu/ipmmu-vmsa: Break out domain allocation code

2016-11-10 Thread Laurent Pinchart
Hi Magnus,

Thank you for the patch.

On Thursday 20 Oct 2016 08:36:13 Magnus Damm wrote:
> From: Magnus Damm 
> 
> Break out the domain allocation code into a separate function.
> This is preparation for future code sharing.
> 
> Signed-off-by: Magnus Damm 
> Reviewed-by: Joerg Roedel 

This looks good to me,

Reviewed-by: Laurent Pinchart 

(assuming my review of the next patches in the series won't make me conclude 
that this patch isn't needed :-))

> ---
> 
>  Changes since V5:
>  - None
> 
>  Changes since V4:
>  - None
> 
>  Changes since V3:
>  - None
> 
>  Changes since V2:
>  - Included this new patch as-is from the following series:
>[PATCH 00/04] iommu/ipmmu-vmsa: IPMMU CONFIG_IOMMU_DMA update
> 
>  drivers/iommu/ipmmu-vmsa.c |   13 +
>  1 file changed, 9 insertions(+), 4 deletions(-)
> 
> --- 0008/drivers/iommu/ipmmu-vmsa.c
> +++ work/drivers/iommu/ipmmu-vmsa.c   2016-09-20 21:56:59.220607110 +0900
> @@ -507,13 +507,10 @@ static irqreturn_t ipmmu_irq(int irq, vo
>   * IOMMU Operations
>   */
> 
> -static struct iommu_domain *ipmmu_domain_alloc(unsigned type)
> +static struct iommu_domain *__ipmmu_domain_alloc(unsigned type)
>  {
>   struct ipmmu_vmsa_domain *domain;
> 
> - if (type != IOMMU_DOMAIN_UNMANAGED)
> - return NULL;
> -
>   domain = kzalloc(sizeof(*domain), GFP_KERNEL);
>   if (!domain)
>   return NULL;
> @@ -523,6 +520,14 @@ static struct iommu_domain *ipmmu_domain
>   return &domain->io_domain;
>  }
> 
> +static struct iommu_domain *ipmmu_domain_alloc(unsigned type)
> +{
> + if (type != IOMMU_DOMAIN_UNMANAGED)
> + return NULL;
> +
> + return __ipmmu_domain_alloc(type);
> +}
> +
>  static void ipmmu_domain_free(struct iommu_domain *io_domain)
>  {
>   struct ipmmu_vmsa_domain *domain = to_vmsa_domain(io_domain);

-- 
Regards,

Laurent Pinchart



Re: [PATCH v6 03/07] iommu/ipmmu-vmsa: Break out utlb parsing code

2016-11-10 Thread Laurent Pinchart
Hi Magnus,

Thank you for the patch.

On Thursday 20 Oct 2016 08:36:03 Magnus Damm wrote:
> From: Magnus Damm 
> 
> Break out the utlb parsing code and dev_data allocation into a
> separate function. This is preparation for future code sharing.
> 
> Signed-off-by: Magnus Damm 
> Reviewed-by: Joerg Roedel 
> ---
> 
>  Changes since V5:
>  - None
> 
>  Changes since V4:
>  - Dropped hunk with fix to apply on top of:
>b1e2afc iommu/ipmmu-vmsa: Fix wrong error handle of ipmmu_add_device
> 
>  Changes since V3:
>  - Initialize "mmu" to NULL, check before accessing
>  - Removed group parameter from ipmmu_init_platform_device()
> 
>  Changes since V2:
>  - Included this new patch from the following series:
>[PATCH 00/04] iommu/ipmmu-vmsa: IPMMU CONFIG_IOMMU_DMA update
>  - Reworked code to fit on top on previous two patches in current series.
> 
>  drivers/iommu/ipmmu-vmsa.c |   58 ++---
>  1 file changed, 37 insertions(+), 21 deletions(-)
> 
> --- 0006/drivers/iommu/ipmmu-vmsa.c
> +++ work/drivers/iommu/ipmmu-vmsa.c   2016-09-20 21:49:34.580607110 +0900
> @@ -647,22 +647,15 @@ static int ipmmu_find_utlbs(struct ipmmu
>   return 0;
>  }
> 
> -static int ipmmu_add_device(struct device *dev)
> +static int ipmmu_init_platform_device(struct device *dev)

The device doesn't have to be a platform device, how about calling this 
ipmmu_init_device_archdata() ?

>  {
>   struct ipmmu_vmsa_archdata *archdata;
>   struct ipmmu_vmsa_device *mmu;
> - struct iommu_group *group = NULL;
>   unsigned int *utlbs;
>   unsigned int i;
>   int num_utlbs;
>   int ret = -ENODEV;
> 
> - if (dev->archdata.iommu) {
> - dev_warn(dev, "IOMMU driver already assigned to device %s\n",
> -  dev_name(dev));
> - return -EINVAL;
> - }
> -
>   /* Find the master corresponding to the device. */
> 
>   num_utlbs = of_count_phandle_with_args(dev->of_node, "iommus",
> @@ -699,6 +692,36 @@ static int ipmmu_add_device(struct devic
>   }
>   }
> 
> + archdata = kzalloc(sizeof(*archdata), GFP_KERNEL);
> + if (!archdata) {
> + ret = -ENOMEM;
> + goto error;
> + }
> +
> + archdata->mmu = mmu;
> + archdata->utlbs = utlbs;
> + archdata->num_utlbs = num_utlbs;
> + dev->archdata.iommu = archdata;
> + return 0;
> +
> +error:
> + kfree(utlbs);
> + return ret;
> +}
> +
> +static int ipmmu_add_device(struct device *dev)
> +{
> + struct ipmmu_vmsa_archdata *archdata;
> + struct ipmmu_vmsa_device *mmu = NULL;
> + struct iommu_group *group;
> + int ret;
> +
> + if (dev->archdata.iommu) {
> + dev_warn(dev, "IOMMU driver already assigned to device %s\n",
> +  dev_name(dev));
> + return -EINVAL;
> + }
> +
>   /* Create a device group and add the device to it. */
>   group = iommu_group_alloc();
>   if (IS_ERR(group)) {
> @@ -716,16 +739,9 @@ static int ipmmu_add_device(struct devic
>   goto error;
>   }
> 
> - archdata = kzalloc(sizeof(*archdata), GFP_KERNEL);
> - if (!archdata) {
> - ret = -ENOMEM;
> + ret = ipmmu_init_platform_device(dev);
> + if (ret < 0)
>   goto error;
> - }
> -
> - archdata->mmu = mmu;
> - archdata->utlbs = utlbs;
> - archdata->num_utlbs = num_utlbs;
> - dev->archdata.iommu = archdata;
> 
>   /*
>* Create the ARM mapping, used by the ARM DMA mapping core to allocate
> @@ -736,6 +752,8 @@ static int ipmmu_add_device(struct devic
>* - Make the mapping size configurable ? We currently use a 2GB mapping
>*   at a 1GB offset to ensure that NULL VAs will fault.
>*/
> + archdata = dev->archdata.iommu;
> + mmu = archdata->mmu;
>   if (!mmu->mapping) {
>   struct dma_iommu_mapping *mapping;
> 
> @@ -760,10 +778,8 @@ static int ipmmu_add_device(struct devic
>   return 0;
> 
>  error:
> - arm_iommu_release_mapping(mmu->mapping);
> -
> - kfree(dev->archdata.iommu);
> - kfree(utlbs);
> + if (mmu)
> + arm_iommu_release_mapping(mmu->mapping);

Looking at the surrounding code, shouldn't the mapping only be destroyed here 
if it has been created by the same call to the function ? If there's an 
existing mapping for the IPMMU instance and the arm_iommu_attach_device() call 
fails, releasing the mapping seems to be a bug. As the bug isn't introduced by 
this patch, we can fix it separately, but if you end up resubmitting due to 
the first comment you could also add a fix.

> 
>   dev->archdata.iommu = NULL;

-- 
Regards,

Laurent Pinchart



Re: [PATCH v6 02/07] iommu/ipmmu-vmsa: Rework interrupt code and use bitmap for context

2016-11-10 Thread Laurent Pinchart
Hi Magnus,

Thank you for the patch.

On Thursday 20 Oct 2016 08:35:53 Magnus Damm wrote:
> From: Magnus Damm 
> 
> Introduce a bitmap for context handling and convert the
> interrupt routine to handle all registered contexts.
> 
> At this point the number of contexts is still limited.
> 
> Also remove the use of the ARM specific mapping variable
> from ipmmu_irq() to allow compile on ARM64.
> 
> Signed-off-by: Magnus Damm 
> Reviewed-by: Joerg Roedel 
> ---
> 
>  Changes since V5:
>  - None
> 
>  Changes since V4:
>  - None
> 
>  Changes since V3:
>  - None
> 
>  Changes since V2: (Thanks again to Laurent!)
>  - Introduce a spinlock together with the bitmap and domain array.
>  - Break out code into separate functions for alloc and free.
>  - Perform free after (instead of before) configuring hardware registers.
>  - Use the spinlock to protect the domain array in the interrupt handler.
> 
>  Changes since V1: (Thanks to Laurent for feedback!)
>  - Use simple find_first_zero()/set_bit()/clear_bit() for context
> management. - For allocation rely on spinlock held when calling
> ipmmu_domain_init_context() - For test/free use atomic bitops
>  - Return IRQ_HANDLED if any of the contexts generated interrupts
> 
>  drivers/iommu/ipmmu-vmsa.c |   76 +++--
>  1 file changed, 66 insertions(+), 10 deletions(-)
> 
> --- 0004/drivers/iommu/ipmmu-vmsa.c
> +++ work/drivers/iommu/ipmmu-vmsa.c   2016-09-20 21:48:23.770607110 +0900
> @@ -8,6 +8,7 @@
>   * the Free Software Foundation; version 2 of the License.
>   */
> 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -26,12 +27,17 @@
> 
>  #include "io-pgtable.h"
> 
> +#define IPMMU_CTX_MAX 1
> +
>  struct ipmmu_vmsa_device {
>   struct device *dev;
>   void __iomem *base;
>   struct list_head list;
> 
>   unsigned int num_utlbs;
> + spinlock_t lock;   /* Protects ctx and domains[] */
> + DECLARE_BITMAP(ctx, IPMMU_CTX_MAX);
> + struct ipmmu_vmsa_domain *domains[IPMMU_CTX_MAX];
> 
>   struct dma_iommu_mapping *mapping;
>  };
> @@ -293,9 +299,29 @@ static struct iommu_gather_ops ipmmu_gat
>   * Domain/Context Management
>   */
> 
> +static int ipmmu_domain_allocate_context(struct ipmmu_vmsa_device *mmu,
> +  struct ipmmu_vmsa_domain *domain)

Nitpicking, I'd name this ipmmu_domain_alloc_context() as the driver uses the 
alloc abbreviation already.

> +{
> + unsigned long flags;
> + int ret;
> +
> + spin_lock_irqsave(&mmu->lock, flags);
> +
> + ret = find_first_zero_bit(mmu->ctx, IPMMU_CTX_MAX);
> + if (ret != IPMMU_CTX_MAX) {
> + mmu->domains[ret] = domain;
> + set_bit(ret, mmu->ctx);
> + }

How about returning a negative error code on error instead of IPMMU_CTX_MAX ? 
I think it would make the API clearer, avoiding the need to think about 
special error handling for this function.

Having said that, I find that the init/alloc and destroy/free function names 
don't carry a very clear semantic. Given the size of the alloc and free 
functions, how about inlining them in their single caller ?

> +
> + spin_unlock_irqrestore(&mmu->lock, flags);
> +
> + return ret;
> +}
> +
>  static int ipmmu_domain_init_context(struct ipmmu_vmsa_domain *domain)
>  {
>   u64 ttbr;
> + int ret;
> 
>   /*
>* Allocate the page table operations.
> @@ -325,10 +351,15 @@ static int ipmmu_domain_init_context(str
>   return -EINVAL;
> 
>   /*
> -  * TODO: When adding support for multiple contexts, find an unused
> -  * context.
> +  * Find an unused context.
>*/

The comment now fits on one line.

> - domain->context_id = 0;
> + ret = ipmmu_domain_allocate_context(domain->mmu, domain);
> + if (ret == IPMMU_CTX_MAX) {
> + free_io_pgtable_ops(domain->iop);
> + return -EBUSY;
> + }
> +
> + domain->context_id = ret;
> 
>   /* TTBR0 */
>   ttbr = domain->cfg.arm_lpae_s1_cfg.ttbr[0];
> @@ -370,6 +401,19 @@ static int ipmmu_domain_init_context(str
>   return 0;
>  }
> 
> +static void ipmmu_domain_free_context(struct ipmmu_vmsa_device *mmu,
> +   unsigned int context_id)
> +{
> + unsigned long flags;
> +
> + spin_lock_irqsave(&mmu->lock, flags);
> +
> + clear_bit(context_id, mmu->ctx);
> + mmu->domains[context_id] = NULL;
> +
> + spin_unlock_irqrestore(&mmu->lock, flags);
> +}
> +
>  static void ipmmu_domain_destroy_context(struct ipmmu_vmsa_domain *domain)
>  {
>   /*
> @@ -380,6 +424,7 @@ static void ipmmu_domain_destroy_context
>*/
>   ipmmu_ctx_write(domain, IMCTR, IMCTR_FLUSH);
>   ipmmu_tlb_sync(domain);
> + ipmmu_domain_free_context(domain->mmu, domain->context_id);
>  }
> 
>  /*

Re: [PATCH v2 1/2] iommu/dma: Implement dma_{map,unmap}_resource()

2016-11-10 Thread Catalin Marinas
On Thu, Nov 10, 2016 at 02:30:04PM +, Robin Murphy wrote:
> On 10/11/16 12:24, Joerg Roedel wrote:
> > On Wed, Oct 26, 2016 at 06:43:56PM +0100, Robin Murphy wrote:
> >> With the new dma_{map,unmap}_resource() functions added to the DMA API
> >> for the benefit of cases like slave DMA, add suitable implementations to
> >> the arsenal of our generic layer. Since cache maintenance should not be
> >> a concern, these can both be standalone callback implementations without
> >> the need for arch code wrappers.
> >>
> >> CC: Joerg Roedel 
> >> Signed-off-by: Robin Murphy 
> >> ---
> >>
> >> v2: Use iommu_dma_unmap_page for symmetry, instead of being sneaky.
> >>
> >>  drivers/iommu/dma-iommu.c | 13 +
> >>  include/linux/dma-iommu.h |  4 
> >>  2 files changed, 17 insertions(+)
> >>
> >> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> >> index c5ab8667e6f2..a2fd90a6a782 100644
> >> --- a/drivers/iommu/dma-iommu.c
> >> +++ b/drivers/iommu/dma-iommu.c
> >> @@ -624,6 +624,19 @@ void iommu_dma_unmap_sg(struct device *dev, struct 
> >> scatterlist *sg, int nents,
> >>__iommu_dma_unmap(iommu_get_domain_for_dev(dev), sg_dma_address(sg));
> >>  }
> >>  
> >> +dma_addr_t iommu_dma_map_resource(struct device *dev, phys_addr_t phys,
> >> +  size_t size, enum dma_data_direction dir, unsigned long attrs)
> >> +{
> >> +  return iommu_dma_map_page(dev, phys_to_page(phys), offset_in_page(phys),
> >> +  size, dma_direction_to_prot(dir, false) | IOMMU_MMIO);
> >> +}
> > 
> > Isn't the whole point of map_resource that we can map regions which have
> > no 'struct page' associated? The use phys_to_page() will certainly not
> > do the right thing here.
> 
> Perhaps it's a bit cheeky, but the struct page pointer is never
> dereferenced - the only thing iommu_dma_map_page() does with it is
> immediately convert it back again. Is there ever a case where
> page_to_phys(phys_to_page(phys)) != phys ?

Without SPARSEMEM_VMEMMAP or FLATMEM, it wouldn't work. For example,
__page_to_pfn() in the SPARSEMEM case, used by page_to_phys(), even
accesses the page structure (through page_to_section()).

If the page struct is not relevant to the iommu code in question, you
could factor it out into an __iommu_dma_map_pfn(). Or maybe move the
whole logic to iommu_dma_map_resource() and call it from
iommu_dma_map_page() with the page_to_phys(page) argument.

-- 
Catalin


Re: Summary of LPC guest MSI discussion in Santa Fe

2016-11-10 Thread Alex Williamson
On Thu, 10 Nov 2016 12:14:40 +0100
Auger Eric  wrote:

> Hi Will, Alex,
> 
> On 10/11/2016 03:01, Will Deacon wrote:
> > On Wed, Nov 09, 2016 at 05:55:17PM -0700, Alex Williamson wrote:  
> >> On Thu, 10 Nov 2016 01:14:42 +0100
> >> Auger Eric  wrote:  
> >>> On 10/11/2016 00:59, Alex Williamson wrote:  
>  On Wed, 9 Nov 2016 23:38:50 +
>  Will Deacon  wrote:  
> > On Wed, Nov 09, 2016 at 04:24:58PM -0700, Alex Williamson wrote:
> >> The VFIO_IOMMU_MAP_DMA ioctl is a contract, the user ask to map a range
> >> of IOVAs to a range of virtual addresses for a given device.  If VFIO
> >> cannot reasonably fulfill that contract, it must fail.  It's up to QEMU
> >> how to manage the hotplug and what memory regions it asks VFIO to map
> >> for a device, but VFIO must reject mappings that it (or the SMMU by
> >> virtue of using the IOMMU API) know to overlap reserved ranges.  So I
> >> still disagree with the referenced statement.  Thanks,  
> >
> > I think that's a pity. Not only does it mean that both QEMU and the 
> > kernel
> > have more work to do (the former has to carve up its mapping requests,
> > whilst the latter has to check that it is indeed doing this), but it 
> > also
> > precludes the use of hugepage mappings on the IOMMU because of reserved
> > regions. For example, a 4k hole someplace may mean we can't put down 1GB
> > table entries for the guest memory in the SMMU.
> >
> > All this seems to do is add complexity and decrease performance. For 
> > what?
> > QEMU has to go read the reserved regions from someplace anyway. It's 
> > also
> > the way that VFIO works *today* on arm64 wrt reserved regions, it just 
> > has
> > no way to identify those holes at present.
> 
>  Sure, that sucks, but how is the alternative even an option?  The user
>  asked to map something, we can't, if we allow that to happen now it's a
>  bug.  Put the MSI doorbells somewhere that this won't be an issue.  If
>  the platform has it fixed somewhere that this is an issue, don't use
>  that platform.  The correctness of the interface is more important than
>  catering to a poorly designed system layout IMO.  Thanks,
> >>>
> >>> Besides above problematic, I started to prototype the sysfs API. A first
> >>> issue I face is the reserved regions become global to the iommu instead
> >>> of characterizing the iommu_domain, ie. the "reserved_regions" attribute
> >>> file sits below an iommu instance (~
> >>> /sys/class/iommu/dmar0/intel-iommu/reserved_regions ||
> >>> /sys/class/iommu/arm-smmu0/arm-smmu/reserved_regions).
> >>>
> >>> MSI reserved window can be considered global to the IOMMU. However PCIe
> >>> host bridge P2P regions rather are per iommu-domain.  
> > 
> > I don't think we can treat them as per-domain, given that we want to
> > enumerate this stuff before we've decided to do a hotplug (and therefore
> > don't have a domain).  
> That's the issue indeed. We need to wait for the PCIe device to be
> connected to the iommu. Only on the VFIO SET_IOMMU we get the
> comprehensive list of P2P regions that can impact IOVA mapping for this
> iommu. This removes any advantage of sysfs API over previous VFIO
> capability chain API for P2P IOVA range enumeration at early stage.

For use through vfio we know that an iommu_domain is minimally composed
of an iommu_group and we can find all the p2p resources of that group
referencing /proc/iomem, at least for PCI-based groups.  This is the
part that I don't think any sort of iommu sysfs attributes should be
duplicating.

> >>> Do you confirm the attribute file should contain both global reserved
> >>> regions and all per iommu_domain reserved regions?
> >>>
> >>> Thoughts?  
> >>
> >> I don't think we have any business describing IOVA addresses consumed
> >> by peer devices in an IOMMU sysfs file.  If it's a separate device it
> >> should be exposed by examining the rest of the topology.  Regions
> >> consumed by PCI endpoints and interconnects are already exposed in
> >> sysfs.  In fact, is this perhaps a more accurate model for these MSI
> >> controllers too?  Perhaps they should be exposed in the bus topology
> >> somewhere as consuming the IOVA range.  
> Currently on x86 the P2P regions are not checked when allowing
> passthrough. Aren't we more papist than the pope? As Don mentioned,
> shouldn't we simply consider that a platform that does not support
> proper ACS is not candidate for safe passthrough, like Juno?

There are two sides here, there's the kernel side vfio and there's how
QEMU makes use of vfio.  On the kernel side, we create iommu groups as
the set of devices we consider isolated, that doesn't necessarily mean
that there isn't p2p within the group, in fact that potential often
determines the composition of the group.  It's the user's problem 

Re: [PATCH 1/5] iommu: Allow taking a reference on a group directly

2016-11-10 Thread Joerg Roedel
On Wed, Nov 09, 2016 at 06:00:59PM +, Will Deacon wrote:
> I'd still rather the new function was renamed. We already have the group,
> so calling a weird underscore version of iommu_group_get is really
> counter-intuitive.
> 
> Joerg -- do you have a preference?

iommu_group_ref_get sounds best to me. Unfortunately C has no function
overloading ;)


Joerg



Re: Summary of LPC guest MSI discussion in Santa Fe

2016-11-10 Thread Alex Williamson
On Thu, 10 Nov 2016 15:40:07 +0100
Joerg Roedel  wrote:

> On Wed, Nov 09, 2016 at 01:01:14PM -0700, Alex Williamson wrote:
> > Well, it's not like QEMU or libvirt stumbling through sysfs to figure
> > out where holes could be in order to instantiate a VM with matching
> > holes, just in case someone might decide to hot-add a device into the
> > VM, at some point, and hopefully they don't migrate the VM to another
> > host with a different layout first, is all that much less disgusting or
> > foolproof. It's just that in order to dynamically remove a page as a
> > possible DMA target we require a paravirt channel, such as a balloon
> > driver that's able to pluck a specific page.  In some ways it's
> > actually less disgusting, but it puts some prerequisites on
> > enlightening the guest OS.  Thanks,  
> 
> I think it is much simpler if libvirt/qemu just go through all
> potentially assignable devices on a system and pre-exclude any addresses
> from guest RAM beforehand, rather than doing something like this with
> paravirt/ballooning when a device is hot-added. There is no guarantee
> that you can take a page away from a linux-guest.

Right, I'm not advocating a paravirt ballooning approach, just pointing
out that they're both terrible, just in different ways.  Thanks,

Alex


Re: [PATCH v6 3/7] iommu/exynos: Simplify internal enable/disable functions

2016-11-10 Thread Joerg Roedel
On Tue, Nov 08, 2016 at 02:29:20PM +0100, Marek Szyprowski wrote:
> Remove remaining leftovers of the ref-count related code in the
> __sysmmu_enable/disable functions and inline __sysmmu_enable/disable_nocount
> into them. Suspend/resume callbacks now check if a master device is set for
> a given SYSMMU controller instead of relying on the activation count.
> 
> Signed-off-by: Marek Szyprowski 
> ---
>  drivers/iommu/exynos-iommu.c | 104 
> ---
>  1 file changed, 29 insertions(+), 75 deletions(-)
> 
> diff --git a/drivers/iommu/exynos-iommu.c b/drivers/iommu/exynos-iommu.c
> index 4056228..f45b274 100644
> --- a/drivers/iommu/exynos-iommu.c
> +++ b/drivers/iommu/exynos-iommu.c
> @@ -237,8 +237,8 @@ struct sysmmu_drvdata {
>   struct clk *aclk;   /* SYSMMU's aclk clock */
>   struct clk *pclk;   /* SYSMMU's pclk clock */
>   struct clk *clk_master; /* master's device clock */
> - int activations;/* number of calls to sysmmu_enable */
>   spinlock_t lock;/* lock for modyfying state */
> + int active; /* current status */

Any reason for this being int and not bool? You only assign true/false
later.


Joerg



Re: [PATCH] iommu/dma-iommu: properly respect configured address space size

2016-11-10 Thread Robin Murphy
On 10/11/16 15:59, Joerg Roedel wrote:
> On Tue, Nov 08, 2016 at 11:37:23AM +, Robin Murphy wrote:
>> TBH I've been pondering ripping the size stuff out of dma-iommu, as it
>> all stems from me originally failing to understand what dma_32bit_pfn is
>> actually for.
> 
> The point of dma_32bit_pfn is to allocate dma-address below 4G by
> default. This is a performance optimization so that even devices capable
> of 64bit DMA are using SAC by default instead of DAC.
> 
> Since it is the goal to share a dma-iommu implementation between
> architectures, I would rather prefer not to rip this stuff out.

Oh, I didn't mean rip it out entirely, just get rid of the bogus
assumption that it's the "size" of the domain, especially when given a
>32-bit DMA mask, since that defeats the very optimisation I do now
understand (although it might still be OK for platform devices where
SAC/DAC doesn't apply, to avoid the rb_last() overhead every time).

From the patch I've started, "rip it out" turns out to actually be
mostly "rewrite the comments" anyway - I'll post something soon.

Robin.


Re: [RFC v2 8/8] iommu/arm-smmu: implement add_reserved_regions callback

2016-11-10 Thread Joerg Roedel
On Thu, Nov 10, 2016 at 04:07:08PM +, Robin Murphy wrote:
> On 10/11/16 15:46, Joerg Roedel wrote:
> > On Fri, Nov 04, 2016 at 11:24:06AM +, Eric Auger wrote:
> >> +  resource_list_for_each_entry(window, >windows) {
> >> +  if (resource_type(window->res) != IORESOURCE_MEM &&
> >> +  resource_type(window->res) != IORESOURCE_IO)
> >> +  continue;
> > 
> > Why do you care about IO resources?
> 
> [since this is essentially code I wrote]
> 
> Because they occupy some area of the PCI address space, therefore I
> assumed that, like memory windows, they would be treated as P2P. Is that
> not the case?

No, not at all. The IO-space is completely separate from the MEM-space.
They are two different address spaces, addressing different things. And
the IO-space is also not translated by any IOMMU I am aware of.


Joerg



Re: [RFC v2 8/8] iommu/arm-smmu: implement add_reserved_regions callback

2016-11-10 Thread Joerg Roedel
On Thu, Nov 10, 2016 at 04:57:51PM +0100, Auger Eric wrote:
> It does not only serve the purpose to register the MSI IOVA region. We
> also need to allocate an iova_domain where MSI IOVAs will be allocated
> upon the request of the relevant MSI controllers. Do you mean you don't
> like to use the iova allocator for this purpose?

Yes, it looks like the only purpose of iommu_get_dma_msi_region_cookie()
is to get the msi-region information into the reserved-list.

Why do you need to 'allocate' the MSI region after all? Except for
IOMMU_DOMAIN_DMA domains the iommu-core code does not care about address
allocation. This is up to the users of the domain, which means that
the user also has to take care of the MSI region.

Besides that, 'iommu_get_dma_msi_region_cookie' is a terrible function
name and does not describe at all what the function does or is supposed
to do.


Joerg



Re: [RFC v2 8/8] iommu/arm-smmu: implement add_reserved_regions callback

2016-11-10 Thread Robin Murphy
On 10/11/16 15:46, Joerg Roedel wrote:
> On Fri, Nov 04, 2016 at 11:24:06AM +, Eric Auger wrote:
>> The function populates the list of reserved regions with the
>> PCI host bridge windows and the MSI IOVA range.
>>
>> At the moment an arbitrary MSI IOVA window is set at 0x800
>> of size 1MB.
>>
>> Signed-off-by: Eric Auger 
>>
>> ---
>>
>> RFC v1 -> v2: use defines for MSI IOVA base and length
>> ---
>>  drivers/iommu/arm-smmu.c | 66 
>> 
>>  1 file changed, 66 insertions(+)
>>
>> diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
>> index c841eb7..c07ea41 100644
>> --- a/drivers/iommu/arm-smmu.c
>> +++ b/drivers/iommu/arm-smmu.c
>> @@ -278,6 +278,9 @@ enum arm_smmu_s2cr_privcfg {
>>  
>>  #define FSYNR0_WNR  (1 << 4)
>>  
>> +#define MSI_IOVA_BASE   0x800
>> +#define MSI_IOVA_LENGTH 0x10
>> +
>>  static int force_stage;
>>  module_param(force_stage, int, S_IRUGO);
>>  MODULE_PARM_DESC(force_stage,
>> @@ -1533,6 +1536,68 @@ static int arm_smmu_of_xlate(struct device *dev, 
>> struct of_phandle_args *args)
>>  return iommu_fwspec_add_ids(dev, , 1);
>>  }
>>  
>> +static int add_pci_window_reserved_regions(struct iommu_domain *domain,
>> +   struct pci_dev *dev)
>> +{
>> +struct pci_host_bridge *bridge = pci_find_host_bridge(dev->bus);
>> +struct iommu_reserved_region *region;
>> +struct resource_entry *window;
>> +phys_addr_t start;
>> +size_t length;
>> +
>> +resource_list_for_each_entry(window, >windows) {
>> +if (resource_type(window->res) != IORESOURCE_MEM &&
>> +resource_type(window->res) != IORESOURCE_IO)
>> +continue;
> 
> Why do you care about IO resources?

[since this is essentially code I wrote]

Because they occupy some area of the PCI address space, therefore I
assumed that, like memory windows, they would be treated as P2P. Is that
not the case?

>> +
>> +start = window->res->start - window->offset;
>> +length = window->res->end - window->res->start + 1;
>> +
>> +iommu_reserved_region_for_each(region, domain) {
>> +if (region->start == start && region->length == length)
>> +continue;
>> +}
>> +region = kzalloc(sizeof(*region), GFP_KERNEL);
>> +if (!region)
>> +return -ENOMEM;
>> +
>> +region->start = start;
>> +region->length = length;
>> +
>> +list_add_tail(>list, >reserved_regions);
>> +}
>> +return 0;
>> +}
>> +
>> +static int arm_smmu_add_reserved_regions(struct iommu_domain *domain,
>> + struct device *device)
>> +{
>> +struct iommu_reserved_region *region;
>> +int ret = 0;
>> +
>> +/* An arbitrary 1MB region starting at 0x800 is reserved for MSIs */
>> +if (!domain->iova_cookie) {
>> +
>> +region = kzalloc(sizeof(*region), GFP_KERNEL);
>> +if (!region)
>> +return -ENOMEM;
>> +
>> +region->start = MSI_IOVA_BASE;
>> +region->length = MSI_IOVA_LENGTH;
>> +list_add_tail(>list, >reserved_regions);
>> +
>> +ret = iommu_get_dma_msi_region_cookie(domain,
>> +region->start, region->length);
>> +if (ret)
>> +return ret;
> 
> Gah, I hate this. Is there no other and simpler way to get the MSI
> region than allocating an iova-domain? In that regard, I also _hate_ the
> patch before introducing this weird iommu_get_dma_msi_region_cookie()
> function.

Mea culpa ;)

> Allocating an iova-domain is pretty expensive, as it also includes
> per-cpu data-structures and all, so please don't do this just for the
> purpose of compiling a list of reserved regions.

Yeah, this was really more of a "get it working" thing - the automatic
MSI mapping stuff is already plumbed in for DMA domains, so faking up a
DMA domain is the easy way in. I'll have a look at factoring out the
relevant parts a bit so that MSI-only regions can have an alternate
cookie with a lightweight allocator.

Robin.


Re: [RFC v2 8/8] iommu/arm-smmu: implement add_reserved_regions callback

2016-11-10 Thread Auger Eric
Hi Joerg,

On 10/11/2016 16:46, Joerg Roedel wrote:
> On Fri, Nov 04, 2016 at 11:24:06AM +, Eric Auger wrote:
>> The function populates the list of reserved regions with the
>> PCI host bridge windows and the MSI IOVA range.
>>
>> At the moment an arbitrary MSI IOVA window is set at 0x800
>> of size 1MB.
>>
>> Signed-off-by: Eric Auger 
>>
>> ---
>>
>> RFC v1 -> v2: use defines for MSI IOVA base and length
>> ---
>>  drivers/iommu/arm-smmu.c | 66 
>> 
>>  1 file changed, 66 insertions(+)
>>
>> diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
>> index c841eb7..c07ea41 100644
>> --- a/drivers/iommu/arm-smmu.c
>> +++ b/drivers/iommu/arm-smmu.c
>> @@ -278,6 +278,9 @@ enum arm_smmu_s2cr_privcfg {
>>  
>>  #define FSYNR0_WNR  (1 << 4)
>>  
>> +#define MSI_IOVA_BASE   0x800
>> +#define MSI_IOVA_LENGTH 0x10
>> +
>>  static int force_stage;
>>  module_param(force_stage, int, S_IRUGO);
>>  MODULE_PARM_DESC(force_stage,
>> @@ -1533,6 +1536,68 @@ static int arm_smmu_of_xlate(struct device *dev, 
>> struct of_phandle_args *args)
>>  return iommu_fwspec_add_ids(dev, , 1);
>>  }
>>  
>> +static int add_pci_window_reserved_regions(struct iommu_domain *domain,
>> +   struct pci_dev *dev)
>> +{
>> +struct pci_host_bridge *bridge = pci_find_host_bridge(dev->bus);
>> +struct iommu_reserved_region *region;
>> +struct resource_entry *window;
>> +phys_addr_t start;
>> +size_t length;
>> +
>> +resource_list_for_each_entry(window, >windows) {
>> +if (resource_type(window->res) != IORESOURCE_MEM &&
>> +resource_type(window->res) != IORESOURCE_IO)
>> +continue;
> 
> Why do you care about IO resources?
Effectively that's a draft implementation inspired by "iommu/dma:
Avoid PCI host bridge windows". Also, not all PCI host bridge windows
induce issues; my understanding is that only those not supporting ACS
are a problem.
> 
>> +
>> +start = window->res->start - window->offset;
>> +length = window->res->end - window->res->start + 1;
>> +
>> +iommu_reserved_region_for_each(region, domain) {
>> +if (region->start == start && region->length == length)
>> +continue;
>> +}
>> +region = kzalloc(sizeof(*region), GFP_KERNEL);
>> +if (!region)
>> +return -ENOMEM;
>> +
>> +region->start = start;
>> +region->length = length;
>> +
>> +list_add_tail(>list, >reserved_regions);
>> +}
>> +return 0;
>> +}
>> +
>> +static int arm_smmu_add_reserved_regions(struct iommu_domain *domain,
>> + struct device *device)
>> +{
>> +struct iommu_reserved_region *region;
>> +int ret = 0;
>> +
>> +/* An arbitrary 1MB region starting at 0x800 is reserved for MSIs */
>> +if (!domain->iova_cookie) {
>> +
>> +region = kzalloc(sizeof(*region), GFP_KERNEL);
>> +if (!region)
>> +return -ENOMEM;
>> +
>> +region->start = MSI_IOVA_BASE;
>> +region->length = MSI_IOVA_LENGTH;
>> +list_add_tail(>list, >reserved_regions);
>> +
>> +ret = iommu_get_dma_msi_region_cookie(domain,
>> +region->start, region->length);
>> +if (ret)
>> +return ret;
> 
> Gah, I hate this. Is there no other and simpler way to get the MSI
> region than allocating an iova-domain? In that regard, I also _hate_ the
> patch before introducing this weird iommu_get_dma_msi_region_cookie()
> function.
> 
> Allocating an iova-domain is pretty expensive, as it also includes
> per-cpu data-structures and all, so please don't do this just for the
> purpose of compiling a list of reserved regions.
It does not only serve the purpose to register the MSI IOVA region. We
also need to allocate an iova_domain where MSI IOVAs will be allocated
upon the request of the relevant MSI controllers. Do you mean you don't
like to use the iova allocator for this purpose?

Thanks

Eric
> 
> 
> 
>   Joerg
> 


Re: [PATCH] iommu/dma-iommu: properly respect configured address space size

2016-11-10 Thread Joerg Roedel
On Tue, Nov 08, 2016 at 11:37:23AM +, Robin Murphy wrote:
> TBH I've been pondering ripping the size stuff out of dma-iommu, as it
> all stems from me originally failing to understand what dma_32bit_pfn is
> actually for.

The point of dma_32bit_pfn is to allocate dma-address below 4G by
default. This is a performance optimization so that even devices capable
of 64bit DMA are using SAC by default instead of DAC.

Since it is the goal to share a dma-iommu implementation between
architectures, I would rather prefer not to rip this stuff out.


Joerg


Re: [RFC v2 8/8] iommu/arm-smmu: implement add_reserved_regions callback

2016-11-10 Thread Joerg Roedel
On Fri, Nov 04, 2016 at 11:24:06AM +, Eric Auger wrote:
> The function populates the list of reserved regions with the
> PCI host bridge windows and the MSI IOVA range.
> 
> At the moment an arbitrary MSI IOVA window is set at 0x800
> of size 1MB.
> 
> Signed-off-by: Eric Auger 
> 
> ---
> 
> RFC v1 -> v2: use defines for MSI IOVA base and length
> ---
>  drivers/iommu/arm-smmu.c | 66 
> 
>  1 file changed, 66 insertions(+)
> 
> diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
> index c841eb7..c07ea41 100644
> --- a/drivers/iommu/arm-smmu.c
> +++ b/drivers/iommu/arm-smmu.c
> @@ -278,6 +278,9 @@ enum arm_smmu_s2cr_privcfg {
>  
>  #define FSYNR0_WNR   (1 << 4)
>  
> +#define MSI_IOVA_BASE0x800
> +#define MSI_IOVA_LENGTH  0x10
> +
>  static int force_stage;
>  module_param(force_stage, int, S_IRUGO);
>  MODULE_PARM_DESC(force_stage,
> @@ -1533,6 +1536,68 @@ static int arm_smmu_of_xlate(struct device *dev, 
> struct of_phandle_args *args)
>   return iommu_fwspec_add_ids(dev, , 1);
>  }
>  
> +static int add_pci_window_reserved_regions(struct iommu_domain *domain,
> +struct pci_dev *dev)
> +{
> + struct pci_host_bridge *bridge = pci_find_host_bridge(dev->bus);
> + struct iommu_reserved_region *region;
> + struct resource_entry *window;
> + phys_addr_t start;
> + size_t length;
> +
> + resource_list_for_each_entry(window, >windows) {
> + if (resource_type(window->res) != IORESOURCE_MEM &&
> + resource_type(window->res) != IORESOURCE_IO)
> + continue;

Why do you care about IO resources?

> +
> + start = window->res->start - window->offset;
> + length = window->res->end - window->res->start + 1;
> +
> + iommu_reserved_region_for_each(region, domain) {
> + if (region->start == start && region->length == length)
> + continue;
> + }
> + region = kzalloc(sizeof(*region), GFP_KERNEL);
> + if (!region)
> + return -ENOMEM;
> +
> + region->start = start;
> + region->length = length;
> +
> + list_add_tail(>list, >reserved_regions);
> + }
> + return 0;
> +}
> +
> +static int arm_smmu_add_reserved_regions(struct iommu_domain *domain,
> +  struct device *device)
> +{
> + struct iommu_reserved_region *region;
> + int ret = 0;
> +
> + /* An arbitrary 1MB region starting at 0x800 is reserved for MSIs */
> + if (!domain->iova_cookie) {
> +
> + region = kzalloc(sizeof(*region), GFP_KERNEL);
> + if (!region)
> + return -ENOMEM;
> +
> + region->start = MSI_IOVA_BASE;
> + region->length = MSI_IOVA_LENGTH;
> + list_add_tail(>list, >reserved_regions);
> +
> + ret = iommu_get_dma_msi_region_cookie(domain,
> + region->start, region->length);
> + if (ret)
> + return ret;

Gah, I hate this. Is there no other and simpler way to get the MSI
region than allocating an iova-domain? In that regard, I also _hate_ the
patch before introducing this weird iommu_get_dma_msi_region_cookie()
function.

Allocating an iova-domain is pretty expensive, as it also includes
per-cpu data-structures and all, so please don't do this just for the
purpose of compiling a list of reserved regions.



Joerg



Re: [RFC v2 2/8] iommu/iova: fix __alloc_and_insert_iova_range

2016-11-10 Thread Auger Eric
Hi Joerg,

On 10/11/2016 16:22, Joerg Roedel wrote:
> On Fri, Nov 04, 2016 at 11:24:00AM +, Eric Auger wrote:
>> Fix the size check within start_pfn and limit_pfn.
>>
>> Signed-off-by: Eric Auger 
>>
>> ---
>>
>> the issue was observed when playing with 1 page iova domain with
>> higher iova reserved.
>> ---
>>  drivers/iommu/iova.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
>> index e23001b..ee29dbf 100644
>> --- a/drivers/iommu/iova.c
>> +++ b/drivers/iommu/iova.c
>> @@ -147,7 +147,7 @@ static int __alloc_and_insert_iova_range(struct 
>> iova_domain *iovad,
>>  if (!curr) {
>>  if (size_aligned)
>>  pad_size = iova_get_pad_size(size, limit_pfn);
>> -if ((iovad->start_pfn + size + pad_size) > limit_pfn) {
>> +if ((iovad->start_pfn + size + pad_size - 1) > limit_pfn) {
> 
> A >= check is more readable here.

Sure

Thanks

Eric
> 
> 
>   Joerg
> 
> 
> 


Re: [RFC v2 4/8] iommu: Add a list of iommu_reserved_region in iommu_domain

2016-11-10 Thread Auger Eric
Hi Joerg, Robin,

On 10/11/2016 16:37, Joerg Roedel wrote:
> On Fri, Nov 04, 2016 at 11:24:02AM +, Eric Auger wrote:
>> Introduce a new iommu_reserved_region struct. This embodies
>> an IOVA reserved region that cannot be used along with the IOMMU
>> API. The list is protected by a dedicated mutex.
>>
>> An iommu domain now owns a list of those.
>>
>> Signed-off-by: Eric Auger 
>>
>> ---
>> ---
>>  drivers/iommu/iommu.c |  2 ++
>>  include/linux/iommu.h | 17 +
>>  2 files changed, 19 insertions(+)
>>
>> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
>> index 9a2f196..0af07492 100644
>> --- a/drivers/iommu/iommu.c
>> +++ b/drivers/iommu/iommu.c
>> @@ -1061,6 +1061,8 @@ static struct iommu_domain 
>> *__iommu_domain_alloc(struct bus_type *bus,
>>  
>>  domain->ops  = bus->iommu_ops;
>>  domain->type = type;
>> +INIT_LIST_HEAD(>reserved_regions);
>> +mutex_init(>resv_mutex);
> 
> These regions are a property of the iommu-group; they are specific to a
> device or a group of devices, not to a particular domain where devices
> (iommu-groups) can come and go.
> 
> Further I agree with Robin that this is similar to the
> get_dm_regions/set_dm_regions approach, which should be changed/extended
> for this instead of adding something new.
OK I am currently respinning, taking this into account. I will put the
reserved region file attribute in the iommu-group sysfs dir.

Thanks

Eric
> 
> 
>   Joerg
> 
> 


Re: [RFC v2 2/8] iommu/iova: fix __alloc_and_insert_iova_range

2016-11-10 Thread Joerg Roedel
On Fri, Nov 04, 2016 at 11:24:00AM +, Eric Auger wrote:
> Fix the size check within start_pfn and limit_pfn.
> 
> Signed-off-by: Eric Auger 
> 
> ---
> 
> the issue was observed when playing with 1 page iova domain with
> higher iova reserved.
> ---
>  drivers/iommu/iova.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
> index e23001b..ee29dbf 100644
> --- a/drivers/iommu/iova.c
> +++ b/drivers/iommu/iova.c
> @@ -147,7 +147,7 @@ static int __alloc_and_insert_iova_range(struct 
> iova_domain *iovad,
>   if (!curr) {
>   if (size_aligned)
>   pad_size = iova_get_pad_size(size, limit_pfn);
> - if ((iovad->start_pfn + size + pad_size) > limit_pfn) {
> + if ((iovad->start_pfn + size + pad_size - 1) > limit_pfn) {

A >= check is more readable here.


Joerg



Re: Summary of LPC guest MSI discussion in Santa Fe

2016-11-10 Thread Joerg Roedel
On Wed, Nov 09, 2016 at 08:11:16PM +, Robin Murphy wrote:
> When we *are* in full control of the IOVA space, we just carve out what
> we can find as best we can - see iova_reserve_pci_windows() in
> dma-iommu.c, which isn't really all that different to what x86 does
> (e.g. init_reserved_iova_ranges() in amd-iommu.c).

Yeah, that code was actually written with a look at what the Intel
driver does. I don't really like that it goes over all resources and
reserves them individually (not only because it is not hotplug-safe).

I have to check whether there is a nice and generic way to find out the
root-bridge windows and just reserve them in the iova-space. That would
be easier and more reliable.


Joerg



Re: [PATCH v2 1/2] iommu/dma: Implement dma_{map,unmap}_resource()

2016-11-10 Thread Robin Murphy
On 10/11/16 12:24, Joerg Roedel wrote:
> On Wed, Oct 26, 2016 at 06:43:56PM +0100, Robin Murphy wrote:
>> With the new dma_{map,unmap}_resource() functions added to the DMA API
>> for the benefit of cases like slave DMA, add suitable implementations to
>> the arsenal of our generic layer. Since cache maintenance should not be
>> a concern, these can both be standalone callback implementations without
>> the need for arch code wrappers.
>>
>> CC: Joerg Roedel 
>> Signed-off-by: Robin Murphy 
>> ---
>>
>> v2: Use iommu_dma_unmap_page for symmetry, instead of being sneaky.
>>
>>  drivers/iommu/dma-iommu.c | 13 +
>>  include/linux/dma-iommu.h |  4 
>>  2 files changed, 17 insertions(+)
>>
>> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
>> index c5ab8667e6f2..a2fd90a6a782 100644
>> --- a/drivers/iommu/dma-iommu.c
>> +++ b/drivers/iommu/dma-iommu.c
>> @@ -624,6 +624,19 @@ void iommu_dma_unmap_sg(struct device *dev, struct 
>> scatterlist *sg, int nents,
>>  __iommu_dma_unmap(iommu_get_domain_for_dev(dev), sg_dma_address(sg));
>>  }
>>  
>> +dma_addr_t iommu_dma_map_resource(struct device *dev, phys_addr_t phys,
>> +size_t size, enum dma_data_direction dir, unsigned long attrs)
>> +{
>> +return iommu_dma_map_page(dev, phys_to_page(phys), offset_in_page(phys),
>> +size, dma_direction_to_prot(dir, false) | IOMMU_MMIO);
>> +}
> 
> Isn't the whole point of map_resource that we can map regions which have
> no 'struct page' associated? The use phys_to_page() will certainly not
> do the right thing here.

Perhaps it's a bit cheeky, but the struct page pointer is never
dereferenced - the only thing iommu_dma_map_page() does with it is
immediately convert it back again. Is there ever a case where
page_to_phys(phys_to_page(phys)) != phys ?

Robin.


Re: [PATCH] iommu/dma-iommu: properly respect configured address space size

2016-11-10 Thread Robin Murphy
On 08/11/16 14:53, Marek Szyprowski wrote:
> Hi Robin,
> 
> 
> On 2016-11-08 15:44, Robin Murphy wrote:
>> On 08/11/16 13:41, Marek Szyprowski wrote:
>>> On 2016-11-08 12:37, Robin Murphy wrote:
 On 07/11/16 13:06, Marek Szyprowski wrote:
> When one called iommu_dma_init_domain() with a size smaller than the
> device's DMA mask, alloc_iova() will not respect it and always assume
> that all IOVA addresses will be allocated from the (base ...
> dev->dma_mask) range.
 Is that actually a problem for anything?
>>> Yes, I found this issue while working on next version of ARM & ARM64
>>> DMA-mapping/IOMMU integration patchset and adapting Exynos drivers
>>> for the
>>> new IOMMU/DMA-mapping glue.
>>>
>>> Some Exynos devices (codec and camera ISP) operate only on a limited
>>> (and really small: 256M for example) DMA window. They use a non-standard
>>> way of addressing memory: an offset from the firmware base. However, they
>>> still have a 32-bit DMA mask, as the firmware can be located basically
>>> anywhere in the real DMA address space, but then they can access only the
>>> next 256M from that firmware base.
>> OK, that's good to know, thanks. However, I think in this case it sounds
>> like it's really the DMA mask that's the underlying problem - if those
>> blocks themselves can only drive 28 address bits, then the struct
>> devices representing them should have 28-bit DMA masks, and the
>> "firmware base" of whoever's driving the upper bits modelled with a
>> dma_pfn_offset. Even if they do have full 32-bit interfaces themselves,
>> but are constrained to segment-offset addressing internally, I still
>> think it would be tidier to represent things that way.
>>
>> At some point in dma-iommu development I did have support for DMA
>> offsets upstream of the IOMMU, and am happy to reinstate it if there's a
>> real use case (assuming you can't simply always set the firmware base to
>> 0 when using the IOMMU).
> 
> That would indeed look a bit simpler, but I've already tried such an
> approach and the firmware crashes when its base in the real DMA address
> space is set to zero.

Ah, sadly I'm not too surprised... ;)

Having pondered it a little more, what if the firmware base is instead
set equal to the max offset, then you can have a power-of-two DMA mask
at twice that size (e.g. 512M) such that the top-down IOVA allocations
start in the right place. That would appear to achieve pretty much the
same result as this patch, but more robustly.

Incidentally, how do the init size and DMA mask end up different in the
first place? Is this another case of the "dma-ranges" info from
of_dma_configure() getting clobbered by a driver calling dma_set_mask()
later?

Robin.

> This patch fixes this issue by taking the configured address space size
> parameter into account (if it is smaller than the device's dma_mask).
 TBH I've been pondering ripping the size stuff out of dma-iommu, as it
 all stems from me originally failing to understand what
 dma_32bit_pfn is
 actually for. The truth is that iova_domains just don't have a size or
 upper limit; however if devices with both large and small DMA masks
 share a domain, then the top-down nature of the allocator means that
 allocating for the less-capable devices would involve walking through
 every out-of-range entry in the tree every time. Having cached32_node
 based on dma_32bit_pfn just provides an optimised starting point for
 searching within the smaller mask.
>>> Right, this dma_32bit_pfn was really misleading at the first glance,
>>> but then I found that this was something like end_pfn in case of
>>> dma-iommu
>>> code.
>> Yes, that was my incorrect assumption - at the time I interpreted it as
>> a de-facto upper limit which was still possible to allocate above in
>> special circumstances, which turns out to be almost entirely backwards.
>> I'd rather not bake that into the dma-iommu code any further if we can
>> avoid it (I'll try throwing an RFC together to clear up what's there
>> already).
> 
> Okay.
> 
> Best regards



Re: problems with iommu and userspace DMA to hugepages, how to add iommu mapings from userspace.

2016-11-10 Thread Alex Williamson
On Thu, 10 Nov 2016 12:16:19 +0100
Lars Segerlund  wrote:

>  Hi,
> 
>  I am getting these errors from a userspace 'device driver':
> 
> [599805.585424] DMAR: DRHD: handling fault status reg 202
> [599805.585431] DMAR: DMAR:[DMA Read] Request device [03:00.0] fault addr
> 4c0008000
> DMAR:[fault reason 06] PTE Read access is not set
> 
>  Basically I map a DMA engine in a PCIe card via UIO to a userspace library,
> and then allocate (contiguous) hugepages, do a virt-to-phys translation
> via /proc/self/, set up a DMA and turn it on.
> 
> The message tells me I lack iommu mappings for the memory regions I try to
> read/write; so far so good, nothing unexpected.
> 
> My problem is that I have to run with the intel_iommu=on iommu=pt flags to
> the kernel due to other hardware/software on the machine; I can't turn the
> iommu off completely.
> 
> So I have the question: is it possible to add iommu mappings from
> userspace, e.g. through iommu groups in /sys/..?
> 
> So far, all applications that do something similar use either vfio or
> some kernel driver callback, and I would prefer not to do this.
> (I had high hopes for the dpdk pmd driver but so far no luck.)

vfio is the right way to do this.


Re: [RFC PATCH v3 02/20] x86: Set the write-protect cache mode for full PAT support

2016-11-10 Thread Borislav Petkov
+ Toshi.

On Wed, Nov 09, 2016 at 06:34:48PM -0600, Tom Lendacky wrote:
> For processors that support PAT, set the write-protect cache mode
> (_PAGE_CACHE_MODE_WP) entry to the actual write-protect value (x05).
> 
> Acked-by: Borislav Petkov 
> Signed-off-by: Tom Lendacky 
> ---
>  arch/x86/mm/pat.c |4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/mm/pat.c b/arch/x86/mm/pat.c
> index 170cc4f..87e8952 100644
> --- a/arch/x86/mm/pat.c
> +++ b/arch/x86/mm/pat.c
> @@ -355,7 +355,7 @@ void pat_init(void)
>*  0102UC-: _PAGE_CACHE_MODE_UC_MINUS
>*  0113UC : _PAGE_CACHE_MODE_UC
>*  1004WB : Reserved
> -  *  1015WC : Reserved
> +  *  1015WP : _PAGE_CACHE_MODE_WP
>*  1106UC-: Reserved
>*  1117WT : _PAGE_CACHE_MODE_WT
>*
> @@ -363,7 +363,7 @@ void pat_init(void)
>* corresponding types in the presence of PAT errata.
>*/
>   pat = PAT(0, WB) | PAT(1, WC) | PAT(2, UC_MINUS) | PAT(3, UC) |
> -   PAT(4, WB) | PAT(5, WC) | PAT(6, UC_MINUS) | PAT(7, WT);
> +   PAT(4, WB) | PAT(5, WP) | PAT(6, UC_MINUS) | PAT(7, WT);
>   }
>  
>   if (!boot_cpu_done) {
> 
> 

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


Re: [RFC v2 4/8] iommu: Add a list of iommu_reserved_region in iommu_domain

2016-11-10 Thread Robin Murphy
On 10/11/16 12:14, Auger Eric wrote:
> Hi Robin,
> 
> On 10/11/2016 12:54, Robin Murphy wrote:
>> Hi Eric,
>>
>> On 10/11/16 11:22, Auger Eric wrote:
>>> Hi Robin,
>>>
>>> On 04/11/2016 15:00, Robin Murphy wrote:
 Hi Eric,

 Thanks for posting this new series - the bottom-up approach is a lot
 easier to reason about :)

 On 04/11/16 11:24, Eric Auger wrote:
> Introduce a new iommu_reserved_region struct. This embodies
> an IOVA reserved region that cannot be used along with the IOMMU
> API. The list is protected by a dedicated mutex.

 In the light of these patches, I think I'm settling into agreement that
 the iommu_domain is the sweet spot for accessing this information - the
 underlying magic address ranges might be properties of various bits of
 hardware many of which aren't the IOMMU itself, but they only start to
 matter at the point you start wanting to use an IOMMU domain at the
 higher level. Therefore, having a callback in the domain ops to pull
 everything together fits rather neatly.
>>> Using get_dm_regions could have made sense, but this approach is now
>>> ruled out by the sysfs API approach. If the attribute file is bound to
>>> be used before iommu domains are created, we cannot rely on any
>>> iommu_domain-based callback. Back to square 1?
>>
>> I think it's still OK. The thing about these reserved regions is that as
>> a property of the underlying hardware they must be common to any domain
>> for a given group, therefore without loss of generality we can simply
>> query group->domain->ops->get_dm_regions(), and can expect the reserved
>> ones will be the same regardless of what domain that points to
>> (identity-mapped IVMD/RMRR/etc.
> Are they really? P2P reserved regions depend on the iommu_domain, right?

Indeed. To use the SMMU example, reprogramming S2CRs to target a
different context bank (i.e. attaching to a different domain) won't
affect the fact that the transactions aren't even reaching the SMMU in
the first place. That's why we need the exact same information for DMA
domains, thus why getting it for free via the dm_regions mechanism would
be really neat.

The visibility of P2P regions, doorbells, etc. for a given device is
ultimately a property of the hardware topology, and topology happens to
be what the iommu_group already represents*. There looks to be a snag
when we try to consider the addresses of such regions, since addresses
are the business of iommu_domains, not groups, but as there is a 1:1
relationship between a group and its default domain, things end up tying
together quite neatly.

Robin.

*in fact, I've just had an idea that way we check ACS paths to determine
groups might similarly be a finer-grained way to detect what P2P
regions, if any, are actually relevant.

> Now I did not consider default_domain usability, I acknowledge. I will
> send a POC anyway.
> 
>  regions may not be, but we'd be
>> filtering those out anyway). The default DMA domains need this
>> information too, and since those are allocated at group creation,
>> group->domain should always be non-NULL and interrogable.
>>
>> Plus, the groups are already there in sysfs, and, being representative
>> of device topology, would seem to be an ideal place to expose the
>> addressing limitations relevant to the devices within them. This really
>> feels like it's all falling into place (on the kernel end, at least, I'm
>> sticking to the sidelines on the userspace discussion ;)).
> 
> Thanks
> 
> Eric
>>
>> Robin.
>>
>>>
>>> Thanks
>>>
>>> Eric

>
> An iommu domain now owns a list of those.
>
> Signed-off-by: Eric Auger 
>
> ---
> ---
>  drivers/iommu/iommu.c |  2 ++
>  include/linux/iommu.h | 17 +
>  2 files changed, 19 insertions(+)
>
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index 9a2f196..0af07492 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -1061,6 +1061,8 @@ static struct iommu_domain 
> *__iommu_domain_alloc(struct bus_type *bus,
>  
>   domain->ops  = bus->iommu_ops;
>   domain->type = type;
> + INIT_LIST_HEAD(&domain->reserved_regions);
> + mutex_init(&domain->resv_mutex);
>   /* Assume all sizes by default; the driver may override this later */
>   domain->pgsize_bitmap  = bus->iommu_ops->pgsize_bitmap;
>  
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index 436dc21..0f2eb64 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -84,6 +84,8 @@ struct iommu_domain {
>   void *handler_token;
>   struct iommu_domain_geometry geometry;
>   void *iova_cookie;
> + struct list_head reserved_regions;
> + struct mutex resv_mutex; /* protects the reserved region list */
>  };
>  
>  enum iommu_cap {
> @@ -131,6 +133,21 @@ struct iommu_dm_region {
>   int prot;
>  

Re: [PATCH 4/5] iommu: Move REQ_ACS_FLAGS out to header and rename

2016-11-10 Thread Joerg Roedel
On Wed, Oct 26, 2016 at 12:01:34PM -0600, Alex Williamson wrote:
> Allow other parts of the kernel to see which PCI ACS flags the IOMMU
> layer considers necessary for isolation.
> 
> Signed-off-by: Alex Williamson 
> Cc: Joerg Roedel 
> ---
>  drivers/iommu/iommu.c |   18 +-
>  include/linux/iommu.h |   11 +++
>  2 files changed, 16 insertions(+), 13 deletions(-)

Applied, thanks Alex.



Re: [PATCH v2 1/2] iommu/dma: Implement dma_{map,unmap}_resource()

2016-11-10 Thread Joerg Roedel
On Wed, Oct 26, 2016 at 06:43:56PM +0100, Robin Murphy wrote:
> With the new dma_{map,unmap}_resource() functions added to the DMA API
> for the benefit of cases like slave DMA, add suitable implementations to
> the arsenal of our generic layer. Since cache maintenance should not be
> a concern, these can both be standalone callback implementations without
> the need for arch code wrappers.
> 
> CC: Joerg Roedel 
> Signed-off-by: Robin Murphy 
> ---
> 
> v2: Use iommu_dma_unmap_page for symmetry, instead of being sneaky.
> 
>  drivers/iommu/dma-iommu.c | 13 +
>  include/linux/dma-iommu.h |  4 
>  2 files changed, 17 insertions(+)
> 
> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> index c5ab8667e6f2..a2fd90a6a782 100644
> --- a/drivers/iommu/dma-iommu.c
> +++ b/drivers/iommu/dma-iommu.c
> @@ -624,6 +624,19 @@ void iommu_dma_unmap_sg(struct device *dev, struct 
> scatterlist *sg, int nents,
>   __iommu_dma_unmap(iommu_get_domain_for_dev(dev), sg_dma_address(sg));
>  }
>  
> +dma_addr_t iommu_dma_map_resource(struct device *dev, phys_addr_t phys,
> + size_t size, enum dma_data_direction dir, unsigned long attrs)
> +{
> + return iommu_dma_map_page(dev, phys_to_page(phys), offset_in_page(phys),
> + size, dma_direction_to_prot(dir, false) | IOMMU_MMIO);
> +}

Isn't the whole point of map_resource that we can map regions which have
no 'struct page' associated? The use phys_to_page() will certainly not
do the right thing here.


Joerg



Re: [PATCH] iommu/amd: tell kmemleak about the irq_remap_table

2016-11-10 Thread Joerg Roedel
On Wed, Oct 26, 2016 at 01:09:53PM +0200, Lucas Stach wrote:
> This will get rid of a lot false positives caused by kmemleak being
> unaware of the irq_remap_table. Based on a suggestion from Catalin Marinas.
> 
> Signed-off-by: Lucas Stach 
> ---
>  drivers/iommu/amd_iommu_init.c | 4 
>  1 file changed, 4 insertions(+)

Applied, thanks.



Re: [RFC v2 4/8] iommu: Add a list of iommu_reserved_region in iommu_domain

2016-11-10 Thread Auger Eric
Hi Robin,

On 10/11/2016 12:54, Robin Murphy wrote:
> Hi Eric,
> 
> On 10/11/16 11:22, Auger Eric wrote:
>> Hi Robin,
>>
>> On 04/11/2016 15:00, Robin Murphy wrote:
>>> Hi Eric,
>>>
>>> Thanks for posting this new series - the bottom-up approach is a lot
>>> easier to reason about :)
>>>
>>> On 04/11/16 11:24, Eric Auger wrote:
 Introduce a new iommu_reserved_region struct. This embodies
 an IOVA reserved region that cannot be used along with the IOMMU
 API. The list is protected by a dedicated mutex.
>>>
>>> In the light of these patches, I think I'm settling into agreement that
>>> the iommu_domain is the sweet spot for accessing this information - the
>>> underlying magic address ranges might be properties of various bits of
>>> hardware many of which aren't the IOMMU itself, but they only start to
>>> matter at the point you start wanting to use an IOMMU domain at the
>>> higher level. Therefore, having a callback in the domain ops to pull
>>> everything together fits rather neatly.
>> Using get_dm_regions could have made sense, but this approach is now
>> ruled out by the sysfs API approach. If the attribute file is bound to
>> be used before iommu domains are created, we cannot rely on any
>> iommu_domain-based callback. Back to square 1?
> 
> I think it's still OK. The thing about these reserved regions is that as
> a property of the underlying hardware they must be common to any domain
> for a given group, therefore without loss of generality we can simply
> query group->domain->ops->get_dm_regions(), and can expect the reserved
> ones will be the same regardless of what domain that points to
> (identity-mapped IVMD/RMRR/etc.
Are they really? P2P reserved regions depend on the iommu_domain, right?

Now I did not consider default_domain usability, I acknowledge. I will
send a POC anyway.

 regions may not be, but we'd be
> filtering those out anyway). The default DMA domains need this
> information too, and since those are allocated at group creation,
> group->domain should always be non-NULL and interrogable.
> 
> Plus, the groups are already there in sysfs, and, being representative
> of device topology, would seem to be an ideal place to expose the
> addressing limitations relevant to the devices within them. This really
> feels like it's all falling into place (on the kernel end, at least, I'm
> sticking to the sidelines on the userspace discussion ;)).

Thanks

Eric
> 
> Robin.
> 
>>
>> Thanks
>>
>> Eric
>>>

 An iommu domain now owns a list of those.

 Signed-off-by: Eric Auger 

 ---
 ---
  drivers/iommu/iommu.c |  2 ++
  include/linux/iommu.h | 17 +
  2 files changed, 19 insertions(+)

 diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
 index 9a2f196..0af07492 100644
 --- a/drivers/iommu/iommu.c
 +++ b/drivers/iommu/iommu.c
 @@ -1061,6 +1061,8 @@ static struct iommu_domain 
 *__iommu_domain_alloc(struct bus_type *bus,
  
domain->ops  = bus->iommu_ops;
domain->type = type;
 +  INIT_LIST_HEAD(&domain->reserved_regions);
 +  mutex_init(&domain->resv_mutex);
/* Assume all sizes by default; the driver may override this later */
domain->pgsize_bitmap  = bus->iommu_ops->pgsize_bitmap;
  
 diff --git a/include/linux/iommu.h b/include/linux/iommu.h
 index 436dc21..0f2eb64 100644
 --- a/include/linux/iommu.h
 +++ b/include/linux/iommu.h
 @@ -84,6 +84,8 @@ struct iommu_domain {
void *handler_token;
struct iommu_domain_geometry geometry;
void *iova_cookie;
 +  struct list_head reserved_regions;
 +  struct mutex resv_mutex; /* protects the reserved region list */
  };
  
  enum iommu_cap {
 @@ -131,6 +133,21 @@ struct iommu_dm_region {
int prot;
  };
  
 +/**
 + * struct iommu_reserved_region - descriptor for a reserved iova region
 + * @list: Linked list pointers
 + * @start: IOVA base address of the region
 + * @length: Length of the region in bytes
 + */
 +struct iommu_reserved_region {
 +  struct list_headlist;
 +  dma_addr_t  start;
 +  size_t  length;
 +};
>>>
>>> Looking at this in context with the dm_region above, though, I come to
>>> the surprising realisation that these *are* dm_regions, even at the
>>> fundamental level - on the one hand you've got physical addresses which
>>> can't be remapped (because something is already using them), while on
>>> the other you've got physical addresses which can't be remapped (because
>>> the IOMMU is incapable). In fact for reserved regions *other* than our
>>> faked-up MSI region there's no harm if the IOMMU were to actually
>>> identity-map them.
>>>
>>> Let's just add this to the existing infrastructure, either with some
>>> kind of IOMMU_NOMAP flag or simply prot = 0. That way it automatically
>>> gets shared between the VFIO 

Re: [RFC v2 4/8] iommu: Add a list of iommu_reserved_region in iommu_domain

2016-11-10 Thread Robin Murphy
Hi Eric,

On 10/11/16 11:22, Auger Eric wrote:
> Hi Robin,
> 
> On 04/11/2016 15:00, Robin Murphy wrote:
>> Hi Eric,
>>
>> Thanks for posting this new series - the bottom-up approach is a lot
>> easier to reason about :)
>>
>> On 04/11/16 11:24, Eric Auger wrote:
>>> Introduce a new iommu_reserved_region struct. This embodies
>>> an IOVA reserved region that cannot be used along with the IOMMU
>>> API. The list is protected by a dedicated mutex.
>>
>> In the light of these patches, I think I'm settling into agreement that
>> the iommu_domain is the sweet spot for accessing this information - the
>> underlying magic address ranges might be properties of various bits of
>> hardware many of which aren't the IOMMU itself, but they only start to
>> matter at the point you start wanting to use an IOMMU domain at the
>> higher level. Therefore, having a callback in the domain ops to pull
>> everything together fits rather neatly.
> Using get_dm_regions could have made sense, but this approach is now
> ruled out by the sysfs API approach. If the attribute file is bound to
> be used before iommu domains are created, we cannot rely on any
> iommu_domain-based callback. Back to square 1?

I think it's still OK. The thing about these reserved regions is that as
a property of the underlying hardware they must be common to any domain
for a given group, therefore without loss of generality we can simply
query group->domain->ops->get_dm_regions(), and can expect the reserved
ones will be the same regardless of what domain that points to
(identity-mapped IVMD/RMRR/etc. regions may not be, but we'd be
filtering those out anyway). The default DMA domains need this
information too, and since those are allocated at group creation,
group->domain should always be non-NULL and interrogable.

Plus, the groups are already there in sysfs, and, being representative
of device topology, would seem to be an ideal place to expose the
addressing limitations relevant to the devices within them. This really
feels like it's all falling into place (on the kernel end, at least, I'm
sticking to the sidelines on the userspace discussion ;)).

Robin.

> 
> Thanks
> 
> Eric
>>
>>>
>>> An iommu domain now owns a list of those.
>>>
>>> Signed-off-by: Eric Auger 
>>>
>>> ---
>>> ---
>>>  drivers/iommu/iommu.c |  2 ++
>>>  include/linux/iommu.h | 17 +
>>>  2 files changed, 19 insertions(+)
>>>
>>> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
>>> index 9a2f196..0af07492 100644
>>> --- a/drivers/iommu/iommu.c
>>> +++ b/drivers/iommu/iommu.c
>>> @@ -1061,6 +1061,8 @@ static struct iommu_domain 
>>> *__iommu_domain_alloc(struct bus_type *bus,
>>>  
>>> domain->ops  = bus->iommu_ops;
>>> domain->type = type;
>>> +   INIT_LIST_HEAD(&domain->reserved_regions);
>>> +   mutex_init(&domain->resv_mutex);
>>> /* Assume all sizes by default; the driver may override this later */
>>> domain->pgsize_bitmap  = bus->iommu_ops->pgsize_bitmap;
>>>  
>>> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
>>> index 436dc21..0f2eb64 100644
>>> --- a/include/linux/iommu.h
>>> +++ b/include/linux/iommu.h
>>> @@ -84,6 +84,8 @@ struct iommu_domain {
>>> void *handler_token;
>>> struct iommu_domain_geometry geometry;
>>> void *iova_cookie;
>>> +   struct list_head reserved_regions;
>>> +   struct mutex resv_mutex; /* protects the reserved region list */
>>>  };
>>>  
>>>  enum iommu_cap {
>>> @@ -131,6 +133,21 @@ struct iommu_dm_region {
>>> int prot;
>>>  };
>>>  
>>> +/**
>>> + * struct iommu_reserved_region - descriptor for a reserved iova region
>>> + * @list: Linked list pointers
>>> + * @start: IOVA base address of the region
>>> + * @length: Length of the region in bytes
>>> + */
>>> +struct iommu_reserved_region {
>>> +   struct list_headlist;
>>> +   dma_addr_t  start;
>>> +   size_t  length;
>>> +};
>>
>> Looking at this in context with the dm_region above, though, I come to
>> the surprising realisation that these *are* dm_regions, even at the
>> fundamental level - on the one hand you've got physical addresses which
>> can't be remapped (because something is already using them), while on
>> the other you've got physical addresses which can't be remapped (because
>> the IOMMU is incapable). In fact for reserved regions *other* than our
>> faked-up MSI region there's no harm if the IOMMU were to actually
>> identity-map them.
>>
>> Let's just add this to the existing infrastructure, either with some
>> kind of IOMMU_NOMAP flag or simply prot = 0. That way it automatically
>> gets shared between the VFIO and DMA cases for free!
>>
>> Robin.
>>
>>> +
>>> +#define iommu_reserved_region_for_each(resv, d) \
>>> +   list_for_each_entry(resv, &(d)->reserved_regions, list)
>>> +
>>>  #ifdef CONFIG_IOMMU_API
>>>  
>>>  /**
>>>
>>
>>
>> ___
>> linux-arm-kernel mailing list
>> linux-arm-ker...@lists.infradead.org

Re: [PATCH v6 0/9] Fix kdump faults on system with amd iommu

2016-11-10 Thread Joerg Roedel
Hi Baoquan,

thanks for working on this, really appreciated!

On Thu, Oct 20, 2016 at 07:37:11PM +0800, Baoquan He wrote:
> This is the v6 post.
> 
> The principle of the fix is similar to intel iommu: defer the assignment
> of the device to a domain until device driver init. But there is a
> difference from intel iommu: AMD iommu creates the protection domain and
> assigns the device to it during iommu driver init. So in this patchset I
> allow the assignment of device to domain at the software level, but defer
> updating the domain info, especially the pte_root in the dev table entry,
> until device driver init.

I recently talked with the IOMMU guys from AMD about whether it is safe
to update the device-table pointer while the iommu is enabled. It turns
out that device-table pointer update is split up into two 32bit writes
in the IOMMU hardware. So updating it while the IOMMU is enabled could
have some nasty side effects.

The only way to work around this is to allocate the device-table
below 4GB, but that needs more low memory than is available in the kdump
kernel. So some adjustments are needed there too. Anyway, can you add
that to your patch set?


Joerg



Re: [PATCH v6 7/9] iommu/amd: Update domain into to dte entry during device driver init

2016-11-10 Thread Joerg Roedel
On Thu, Oct 20, 2016 at 07:37:18PM +0800, Baoquan He wrote:
>   prot = dir2prot(direction);
> + if (translation_pre_enabled(iommu) && !dev_data->domain_updated) {
> + dev_data->domain_updated = true;
> + set_dte_entry(dev_data->devid, domain, dev_data->ats.enabled);
> + if (alias != dev_data->devid)
> + set_dte_entry(alias, domain, dev_data->ats.enabled);
> + device_flush_dte(dev_data);
> + }

Hmm, I really don't like to have these checks in the fast-path. But to
get rid of it I think we need to re-work the way domains are assigned to
devices.

One way to do this would be to add a 'defered-domain-attach' feature to
the iommu-core code. What do you think about this?


Joerg



Re: [PATCH v6 00/07] iommu/ipmmu-vmsa: IPMMU multi-arch update V6

2016-11-10 Thread Joerg Roedel
On Thu, Oct 20, 2016 at 08:35:33AM +0900, Magnus Damm wrote:
> iommu/ipmmu-vmsa: IPMMU multi-arch update V6
> 
> [PATCH v6 01/07] iommu/ipmmu-vmsa: Remove platform data handling
> [PATCH v6 02/07] iommu/ipmmu-vmsa: Rework interrupt code and use bitmap for 
> context
> [PATCH v6 03/07] iommu/ipmmu-vmsa: Break out utlb parsing code
> [PATCH v6 04/07] iommu/ipmmu-vmsa: Break out domain allocation code
> [PATCH v6 05/07] iommu/ipmmu-vmsa: Add new IOMMU_DOMAIN_DMA ops
> [PATCH v6 06/07] iommu/ipmmu-vmsa: ARM and ARM64 archdata access
> [PATCH v6 07/07] iommu/ipmmu-vmsa: Drop LPAE Kconfig dependency

Applied to arm/renesas, thanks.



Re: [PATCH] iommu: convert DT component matching to component_match_add_release()

2016-11-10 Thread Joerg Roedel
On Wed, Oct 19, 2016 at 11:30:34AM +0100, Russell King wrote:
> Convert DT component matching to use component_match_add_release().
> 
> Signed-off-by: Russell King 
> ---
>  drivers/iommu/mtk_iommu.c| 8 +---
>  drivers/iommu/mtk_iommu.h| 5 +
>  drivers/iommu/mtk_iommu_v1.c | 8 +---
>  3 files changed, 15 insertions(+), 6 deletions(-)

Applied, thanks.



Re: [RFC v2 4/8] iommu: Add a list of iommu_reserved_region in iommu_domain

2016-11-10 Thread Auger Eric
Hi Robin,

On 04/11/2016 15:00, Robin Murphy wrote:
> Hi Eric,
> 
> Thanks for posting this new series - the bottom-up approach is a lot
> easier to reason about :)
> 
> On 04/11/16 11:24, Eric Auger wrote:
>> Introduce a new iommu_reserved_region struct. This embodies
>> an IOVA reserved region that cannot be used along with the IOMMU
>> API. The list is protected by a dedicated mutex.
> 
> In the light of these patches, I think I'm settling into agreement that
> the iommu_domain is the sweet spot for accessing this information - the
> underlying magic address ranges might be properties of various bits of
> hardware many of which aren't the IOMMU itself, but they only start to
> matter at the point you start wanting to use an IOMMU domain at the
> higher level. Therefore, having a callback in the domain ops to pull
> everything together fits rather neatly.
Using get_dm_regions could have made sense, but this approach is now
ruled out by the sysfs API approach. If the attribute file is bound to
be used before iommu domains are created, we cannot rely on any
iommu_domain-based callback. Back to square 1?

Thanks

Eric
> 
>>
>> An iommu domain now owns a list of those.
>>
>> Signed-off-by: Eric Auger 
>>
>> ---
>> ---
>>  drivers/iommu/iommu.c |  2 ++
>>  include/linux/iommu.h | 17 +
>>  2 files changed, 19 insertions(+)
>>
>> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
>> index 9a2f196..0af07492 100644
>> --- a/drivers/iommu/iommu.c
>> +++ b/drivers/iommu/iommu.c
>> @@ -1061,6 +1061,8 @@ static struct iommu_domain 
>> *__iommu_domain_alloc(struct bus_type *bus,
>>  
>>  domain->ops  = bus->iommu_ops;
>>  domain->type = type;
>> +INIT_LIST_HEAD(&domain->reserved_regions);
>> +mutex_init(&domain->resv_mutex);
>>  /* Assume all sizes by default; the driver may override this later */
>>  domain->pgsize_bitmap  = bus->iommu_ops->pgsize_bitmap;
>>  
>> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
>> index 436dc21..0f2eb64 100644
>> --- a/include/linux/iommu.h
>> +++ b/include/linux/iommu.h
>> @@ -84,6 +84,8 @@ struct iommu_domain {
>>  void *handler_token;
>>  struct iommu_domain_geometry geometry;
>>  void *iova_cookie;
>> +struct list_head reserved_regions;
>> +struct mutex resv_mutex; /* protects the reserved region list */
>>  };
>>  
>>  enum iommu_cap {
>> @@ -131,6 +133,21 @@ struct iommu_dm_region {
>>  int prot;
>>  };
>>  
>> +/**
>> + * struct iommu_reserved_region - descriptor for a reserved iova region
>> + * @list: Linked list pointers
>> + * @start: IOVA base address of the region
>> + * @length: Length of the region in bytes
>> + */
>> +struct iommu_reserved_region {
>> +struct list_headlist;
>> +dma_addr_t  start;
>> +size_t  length;
>> +};
> 
> Looking at this in context with the dm_region above, though, I come to
> the surprising realisation that these *are* dm_regions, even at the
> fundamental level - on the one hand you've got physical addresses which
> can't be remapped (because something is already using them), while on
> the other you've got physical addresses which can't be remapped (because
> the IOMMU is incapable). In fact for reserved regions *other* than our
> faked-up MSI region there's no harm if the IOMMU were to actually
> identity-map them.
> 
> Let's just add this to the existing infrastructure, either with some
> kind of IOMMU_NOMAP flag or simply prot = 0. That way it automatically
> gets shared between the VFIO and DMA cases for free!
> 
> Robin.
> 
>> +
>> +#define iommu_reserved_region_for_each(resv, d) \
>> +list_for_each_entry(resv, &(d)->reserved_regions, list)
>> +
>>  #ifdef CONFIG_IOMMU_API
>>  
>>  /**
>>
> 
> 
> 


problems with iommu and userspace DMA to hugepages, how to add iommu mapings from userspace.

2016-11-10 Thread Lars Segerlund
 Hi,

 I am getting these errors from a userspace 'device driver' ,

[599805.585424] DMAR: DRHD: handling fault status reg 202
[599805.585431] DMAR: DMAR:[DMA Read] Request device [03:00.0] fault addr
4c0008000
DMAR:[fault reason 06] PTE Read access is not set

 Basically I map a DMA engine in a PCIe card via UIO into a userspace
library, then allocate (contiguous) hugepages, do a virt-to-phys
translation via /proc/self/, set up a DMA transfer and turn it on.

The message tells me I lack iommu mappings for the memory regions I try to
read/write; so far so good, nothing unexpected.

My problem is that I have to run with intel_iommu=on iommu=pt flags to the
kernel due to other hard/software on the machine, I can't turn iommu off
completely.

So my question: is it possible to add iommu mappings from userspace,
e.g. through the iommu groups in /sys/..?

So far all applications that do something similar either use vfio or
some kernel driver callback, and I would prefer not to do that.
( I had high hopes for the dpdk pmd driver, but so far no luck. )

 All hints and help appreciated ! :-D

 / regards, Lars Segerlund.

Re: Summary of LPC guest MSI discussion in Santa Fe

2016-11-10 Thread Auger Eric
Hi Will, Alex,

On 10/11/2016 03:01, Will Deacon wrote:
> On Wed, Nov 09, 2016 at 05:55:17PM -0700, Alex Williamson wrote:
>> On Thu, 10 Nov 2016 01:14:42 +0100
>> Auger Eric  wrote:
>>> On 10/11/2016 00:59, Alex Williamson wrote:
 On Wed, 9 Nov 2016 23:38:50 +
 Will Deacon  wrote:
> On Wed, Nov 09, 2016 at 04:24:58PM -0700, Alex Williamson wrote:  
>> The VFIO_IOMMU_MAP_DMA ioctl is a contract, the user ask to map a range
>> of IOVAs to a range of virtual addresses for a given device.  If VFIO
>> cannot reasonably fulfill that contract, it must fail.  It's up to QEMU
>> how to manage the hotplug and what memory regions it asks VFIO to map
>> for a device, but VFIO must reject mappings that it (or the SMMU by
>> virtue of using the IOMMU API) know to overlap reserved ranges.  So I
>> still disagree with the referenced statement.  Thanks,
>
> I think that's a pity. Not only does it mean that both QEMU and the kernel
> have more work to do (the former has to carve up its mapping requests,
> whilst the latter has to check that it is indeed doing this), but it also
> precludes the use of hugepage mappings on the IOMMU because of reserved
> regions. For example, a 4k hole someplace may mean we can't put down 1GB
> table entries for the guest memory in the SMMU.
>
> All this seems to do is add complexity and decrease performance. For what?
> QEMU has to go read the reserved regions from someplace anyway. It's also
> the way that VFIO works *today* on arm64 wrt reserved regions, it just has
> no way to identify those holes at present.  

 Sure, that sucks, but how is the alternative even an option?  The user
 asked to map something, we can't, if we allow that to happen now it's a
 bug.  Put the MSI doorbells somewhere that this won't be an issue.  If
 the platform has it fixed somewhere that this is an issue, don't use
 that platform.  The correctness of the interface is more important than
 catering to a poorly designed system layout IMO.  Thanks,  
>>>
>>> Besides above problematic, I started to prototype the sysfs API. A first
>>> issue I face is the reserved regions become global to the iommu instead
>>> of characterizing the iommu_domain, ie. the "reserved_regions" attribute
>>> file sits below an iommu instance (~
>>> /sys/class/iommu/dmar0/intel-iommu/reserved_regions ||
>>> /sys/class/iommu/arm-smmu0/arm-smmu/reserved_regions).
>>>
>>> MSI reserved window can be considered global to the IOMMU. However PCIe
>>> host bridge P2P regions rather are per iommu-domain.
> 
> I don't think we can treat them as per-domain, given that we want to
> enumerate this stuff before we've decided to do a hotplug (and therefore
> don't have a domain).
That's the issue indeed. We need to wait for the PCIe device to be
connected to the iommu. Only on the VFIO SET_IOMMU we get the
comprehensive list of P2P regions that can impact IOVA mapping for this
iommu. This removes any advantage of sysfs API over previous VFIO
capability chain API for P2P IOVA range enumeration at early stage.

> 
>>>
>>> Do you confirm the attribute file should contain both global reserved
>>> regions and all per iommu_domain reserved regions?
>>>
>>> Thoughts?
>>
>> I don't think we have any business describing IOVA addresses consumed
>> by peer devices in an IOMMU sysfs file.  If it's a separate device it
>> should be exposed by examining the rest of the topology.  Regions
>> consumed by PCI endpoints and interconnects are already exposed in
>> sysfs.  In fact, is this perhaps a more accurate model for these MSI
>> controllers too?  Perhaps they should be exposed in the bus topology
>> somewhere as consuming the IOVA range.
Currently on x86 the P2P regions are not checked when allowing
passthrough. Aren't we being more papist than the pope? As Don mentioned,
shouldn't we simply consider that a platform that does not support
proper ACS is not a candidate for safe passthrough, like Juno?

At least we can state that the feature is also missing on x86, and it
would be nice to report the risk to userspace and urge the user to opt in.

To me, taking those P2P regions into account is still controversial and
induces the bulk of the complexity. Considering the migration use case
discussed at LPC while only handling the MSI problem looks much easier:
the host can choose an MSI base that is QEMU mach-virt friendly, i.e. a
non-RAM region. The problem is satisfying all potential uses, though. When
migrating, mach-virt is still being used, so there should not be any
collision. Am I missing some weird migration use cases here? Of course, if
we take new host PCIe P2P regions into consideration, this becomes
completely different.

We still have the good old v14 where the user space chose where MSI
IOVA's are put without any risk of collision ;-)

>>  If DMA to an IOVA is consumed
>> by an intermediate device before it hits the IOMMU vs not 

Re: [PATCH v7 04/16] drivers: iommu: make of_iommu_set/get_ops() DT agnostic

2016-11-10 Thread Lorenzo Pieralisi
On Wed, Nov 09, 2016 at 02:40:08PM +, Robin Murphy wrote:

[...]

> > +void fwnode_iommu_set_ops(struct fwnode_handle *fwnode,
> > + const struct iommu_ops *ops)
> > +{
> > +   struct iommu_fwentry *iommu = kzalloc(sizeof(*iommu), GFP_KERNEL);
> > +
> > +   if (WARN_ON(!iommu))
> > +   return;
> > +
> > +   if (is_of_node(fwnode))
> 
> Nit: this check is actually redundant, since to_of_node() already does
> the right thing and of_node_get() is NULL-safe - iommu_fwspec_init()
> already works that way. On the other hand, though, it is perhaps more
> intuitive to see it explicitly, and since to_of_node() is inline it
> ought to result in the same object code (I've not checked, though).

I can easily fold this change into the final code, and I think we
should keep this consistent, so I am happy to change my code and
make the is_of_node() check implicit.

> Either way:
> 
> Reviewed-by: Robin Murphy 

Thank you !
Lorenzo

> > +   of_node_get(to_of_node(fwnode));
> > +
> > +   INIT_LIST_HEAD(&iommu->list);
> > +   iommu->fwnode = fwnode;
> > +   iommu->ops = ops;
> > +   spin_lock(&iommu_fwentry_lock);
> > +   list_add_tail(&iommu->list, &iommu_fwentry_list);
> > +   spin_unlock(&iommu_fwentry_lock);
> > +}
> > +
> > +const struct iommu_ops *fwnode_iommu_get_ops(struct fwnode_handle *fwnode)
> > +{
> > +   struct iommu_fwentry *node;
> > +   const struct iommu_ops *ops = NULL;
> > +
> > +   spin_lock(&iommu_fwentry_lock);
> > +   list_for_each_entry(node, &iommu_fwentry_list, list)
> > +   if (node->fwnode == fwnode) {
> > +   ops = node->ops;
> > +   break;
> > +   }
> > +   spin_unlock(&iommu_fwentry_lock);
> > +   return ops;
> > +}
> > +
> >  int iommu_fwspec_init(struct device *dev, struct fwnode_handle 
> > *iommu_fwnode,
> >   const struct iommu_ops *ops)
> >  {
> > diff --git a/drivers/iommu/of_iommu.c b/drivers/iommu/of_iommu.c
> > index 5b82862..0f57ddc 100644
> > --- a/drivers/iommu/of_iommu.c
> > +++ b/drivers/iommu/of_iommu.c
> > @@ -96,45 +96,6 @@ int of_get_dma_window(struct device_node *dn, const char *prefix, int index,
> >  }
> >  EXPORT_SYMBOL_GPL(of_get_dma_window);
> >  
> > -struct of_iommu_node {
> > -   struct list_head list;
> > -   struct device_node *np;
> > -   const struct iommu_ops *ops;
> > -};
> > -static LIST_HEAD(of_iommu_list);
> > -static DEFINE_SPINLOCK(of_iommu_lock);
> > -
> > -void of_iommu_set_ops(struct device_node *np, const struct iommu_ops *ops)
> > -{
> > -   struct of_iommu_node *iommu = kzalloc(sizeof(*iommu), GFP_KERNEL);
> > -
> > -   if (WARN_ON(!iommu))
> > -   return;
> > -
> > -   of_node_get(np);
> > -   INIT_LIST_HEAD(&iommu->list);
> > -   iommu->np = np;
> > -   iommu->ops = ops;
> > -   spin_lock(&of_iommu_lock);
> > -   list_add_tail(&iommu->list, &of_iommu_list);
> > -   spin_unlock(&of_iommu_lock);
> > -}
> > -
> > -const struct iommu_ops *of_iommu_get_ops(struct device_node *np)
> > -{
> > -   struct of_iommu_node *node;
> > -   const struct iommu_ops *ops = NULL;
> > -
> > -   spin_lock(&of_iommu_lock);
> > -   list_for_each_entry(node, &of_iommu_list, list)
> > -   if (node->np == np) {
> > -   ops = node->ops;
> > -   break;
> > -   }
> > -   spin_unlock(&of_iommu_lock);
> > -   return ops;
> > -}
> > -
> >  static int __get_pci_rid(struct pci_dev *pdev, u16 alias, void *data)
> >  {
> > struct of_phandle_args *iommu_spec = data;
> > diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> > index 436dc21..15d5478 100644
> > --- a/include/linux/iommu.h
> > +++ b/include/linux/iommu.h
> > @@ -351,6 +351,9 @@ int iommu_fwspec_init(struct device *dev, struct fwnode_handle *iommu_fwnode,
> >   const struct iommu_ops *ops);
> >  void iommu_fwspec_free(struct device *dev);
> >  int iommu_fwspec_add_ids(struct device *dev, u32 *ids, int num_ids);
> > +void fwnode_iommu_set_ops(struct fwnode_handle *fwnode,
> > + const struct iommu_ops *ops);
> > +const struct iommu_ops *fwnode_iommu_get_ops(struct fwnode_handle *fwnode);
> >  
> >  #else /* CONFIG_IOMMU_API */
> >  
> > @@ -580,6 +583,17 @@ static inline int iommu_fwspec_add_ids(struct device *dev, u32 *ids,
> > return -ENODEV;
> >  }
> >  
> > +static inline void fwnode_iommu_set_ops(struct fwnode_handle *fwnode,
> > +   const struct iommu_ops *ops)
> > +{
> > +}
> > +
> > +static inline
> > +const struct iommu_ops *fwnode_iommu_get_ops(struct fwnode_handle *fwnode)
> > +{
> > +   return NULL;
> > +}
> > +
> >  #endif /* CONFIG_IOMMU_API */
> >  
> >  #endif /* __LINUX_IOMMU_H */
> > diff --git a/include/linux/of_iommu.h b/include/linux/of_iommu.h
> > index e80b9c7..7681007 100644
> > --- a/include/linux/of_iommu.h
> > +++ b/include/linux/of_iommu.h
> > @@ -31,8 +31,16 @@ static inline const struct iommu_ops *of_iommu_configure(struct device *dev,
> >  
> >  #endif /* CONFIG_OF_IOMMU */
> >  
> > -void 

Re: [PATCH] iommu/vt-d: Fix the size calculation of pasid table

2016-11-10 Thread David Woodhouse
On Thu, 2016-11-10 at 11:45 +0100, Joerg Roedel wrote:
> Hi David,
> 
> On Sun, Oct 30, 2016 at 06:18:22AM -0600, David Woodhouse wrote:
> > + /* Start at 2 because it's defined as 1^(1+PSS) */
> 
> This probably means 2^(1+PSS), right?

Or 1 << (1+PSS). Yeah.

> > + iommu->pasid_max = 2 << ecap_pss(iommu->ecap);
> 
> Otherwise the patch looks good. Do you want to send it upstream yourself
> or should I pick it up?

I'll let it sit in -next for a day or two more once I've fixed the
above, then ask Linus to pull it. Thanks.

-- 
dwmw2

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH 1/2] iommu/mediatek: Convert M4Uv2 to iommu_fwspec

2016-11-10 Thread Joerg Roedel
On Mon, Oct 17, 2016 at 12:49:20PM +0100, Robin Murphy wrote:
> Our per-device data consists of the M4U instance and firmware-provided
> list of LARB IDs, which is a perfect fit for the generic iommu_fwspec
> machinery. Use that directly as a simpler alternative to the custom
> archdata code.
> 
> CC: Yong Wu 
> Signed-off-by: Robin Murphy 
> ---
> 
> These are fairly mechanical cleanups, so I'm pretty confident, but it
> still bears mentioning that they're only compile-tested as I don't have
> the relevant hardware.
> 
> Robin.
> 
>  drivers/iommu/mtk_iommu.c | 75 ---
>  1 file changed, 18 insertions(+), 57 deletions(-)

Applied both, thanks Robin.



Re: [PATCH v7 00/16] ACPI IORT ARM SMMU support

2016-11-10 Thread Lorenzo Pieralisi
Hi Rafael,

On Thu, Nov 10, 2016 at 12:36:12AM +0100, Rafael J. Wysocki wrote:
> Hi Lorenzo,
> 
> On Wed, Nov 9, 2016 at 3:19 PM, Lorenzo Pieralisi wrote:
> > This patch series is v7 of a previous posting:
> >
> > https://lkml.org/lkml/2016/10/18/506
> 
> I don't see anything objectionable in this series.

Thank you very much!

> Please let me know which patches in particular to look at in detail.

Apart from patch 1, patch 7 (and patch 16, which builds on top of it)
requires an ack from an ACPI core perspective. acpi_dma_configure()
should not affect x86/ia64 systems at all, so those patches should
not really be controversial (patch 16 adds an IORT hook into
acpi_dma_configure() that is a nop by design on anything other than
ARM64); please let me know if they are ok.

Thank you again for your time,
Lorenzo


Re: [RFC PATCH v3 01/20] x86: Documentation for AMD Secure Memory Encryption (SME)

2016-11-10 Thread Borislav Petkov
On Wed, Nov 09, 2016 at 06:34:39PM -0600, Tom Lendacky wrote:
> This patch adds a Documentation entry to describe the AMD Secure Memory
> Encryption (SME) feature.
> 
> Signed-off-by: Tom Lendacky 
> ---
>  Documentation/kernel-parameters.txt |5 +++
>  Documentation/x86/amd-memory-encryption.txt |   40 +++
>  2 files changed, 45 insertions(+)
>  create mode 100644 Documentation/x86/amd-memory-encryption.txt
> 
> diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
> index 030e9e9..4c730b0 100644
> --- a/Documentation/kernel-parameters.txt
> +++ b/Documentation/kernel-parameters.txt
> @@ -2282,6 +2282,11 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
>   memory contents and reserves bad memory
>   regions that are detected.
>  
> + mem_encrypt=[X86-64] Enable AMD Secure Memory Encryption (SME)
> + Memory encryption is disabled by default, using this
> + switch, memory encryption can be enabled.

I'd say here:

"Force-enable memory encryption if it is disabled in the
BIOS."

> + on: enable memory encryption
> +
>   meye.*= [HW] Set MotionEye Camera parameters
>   See Documentation/video4linux/meye.txt.
>  
> diff --git a/Documentation/x86/amd-memory-encryption.txt b/Documentation/x86/amd-memory-encryption.txt
> new file mode 100644
> index 000..788d871
> --- /dev/null
> +++ b/Documentation/x86/amd-memory-encryption.txt
> @@ -0,0 +1,40 @@
> +Secure Memory Encryption (SME) is a feature found on AMD processors.
> +
> +SME provides the ability to mark individual pages of memory as encrypted using
> +the standard x86 page tables.  A page that is marked encrypted will be
> +automatically decrypted when read from DRAM and encrypted when written to
> +DRAM.  SME can therefore be used to protect the contents of DRAM from physical
> +attacks on the system.
> +
> +A page is encrypted when a page table entry has the encryption bit set (see
> +below how to determine the position of the bit).  The encryption bit can be
> +specified in the cr3 register, allowing the PGD table to be encrypted. Each
> +successive level of page tables can also be encrypted.
> +
> +Support for SME can be determined through the CPUID instruction. The CPUID
> +function 0x8000001f reports information related to SME:
> +
> + 0x8000001f[eax]:
> + Bit[0] indicates support for SME
> + 0x8000001f[ebx]:
> + Bit[5:0]  pagetable bit number used to enable memory encryption
> + Bit[11:6] reduction in physical address space, in bits, when
> +   memory encryption is enabled (this only affects system
> +   physical addresses, not guest physical addresses)
> +
> +If support for SME is present, MSR 0xc0010010 (SYS_CFG) can be used to
> +determine if SME is enabled and/or to enable memory encryption:
> +
> + 0xc0010010:
> + Bit[23]   0 = memory encryption features are disabled
> +   1 = memory encryption features are enabled
> +
> +Linux relies on BIOS to set this bit if BIOS has determined that the reduction
> +in the physical address space as a result of enabling memory encryption (see
> +CPUID information above) will not conflict with the address space resource
> +requirements for the system.  If this bit is not set upon Linux startup then
> +Linux itself will not set it and memory encryption will not be possible.
> +
> +SME support is configurable through the AMD_MEM_ENCRYPT config option.
> +Additionally, the mem_encrypt=on command line parameter is required to activate
> +memory encryption.

So how am I to understand this? We won't have TSME or we will but it
will be off by default and users will have to enable it in the BIOS or
will have to boot with mem_encrypt=on...?

Can you please expand on all the possible options there would be
available to users?

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


Re: [PATCH] iommu/vt-d: Fix the size calculation of pasid table

2016-11-10 Thread Joerg Roedel
Hi David,

On Sun, Oct 30, 2016 at 06:18:22AM -0600, David Woodhouse wrote:
> + /* Start at 2 because it's defined as 1^(1+PSS) */

This probably means 2^(1+PSS), right?

> + iommu->pasid_max = 2 << ecap_pss(iommu->ecap);

Otherwise the patch looks good. Do you want to send it upstream yourself
or should I pick it up?



Joerg



Re: [v16, 0/7] Fix eSDHC host version register bug

2016-11-10 Thread Geert Uytterhoeven
Hi Ulf,

On Wed, Nov 9, 2016 at 7:27 PM, Ulf Hansson  wrote:
> On 9 November 2016 at 04:14, Yangbo Lu  wrote:
>> This patchset fixes a host version register bug in the T4240-R1.0-R2.0
>> eSDHC controller. To match the SoC version and revision, 15 previous
>> patchset versions had tried many methods, but all of them were rejected
>> by reviewers, such as:
>> - dts compatible method
>> - syscon method
>> - ifdef PPC method
>> - GUTS driver getting SVR method
>> Arnd suggested a soc_device_match method in v10, and this is the only
>> available method left now. This v11 patchset introduces the
>> soc_device_match interface in the soc driver.
>>
>> The first four patches from Yangbo add the GUTS driver. This is used to
>> register an SoC device which contains SoC version and revision information.
>> The other three patches introduce the soc_device_match method in the soc
>> driver and apply it to the esdhc driver to fix this bug.
>>
>> ---
>> Changes for v15:
>> - Dropped patch 'dt: bindings: update Freescale DCFG compatible'
>>   since the work had been done by below patch on ShawnGuo's linux 
>> tree.
>>   'dt-bindings: fsl: add LS1043A/LS1046A/LS2080A compatible for SCFG
>>and DCFG'
>> - Fixed error code issue in guts driver
>> Changes for v16:
>> - Dropped patch 'powerpc/fsl: move mpc85xx.h to include/linux/fsl'
>> - Added a bug-fix patch from Geert
>> ---
>>
>> Arnd Bergmann (1):
>>   base: soc: introduce soc_device_match() interface
>>
>> Geert Uytterhoeven (1):
>>   base: soc: Check for NULL SoC device attributes

> Thanks, applied on my mmc tree for next!

Oops, the above two commits (plus two more enhancements) are also a dependency
for Samsung and Renesas. Hence the plan was to use an immutable branch for
that...

Ulf, would it still be possible to replace these two commits in mmc/next:

8b82c17a8ae533d6 base: soc: introduce soc_device_match() interface
6fa350172b098f0f base: soc: Check for NULL SoC device attributes

by a merge of the immutable branch I've just created?
Cfr, the other thread  "[PATCH v2 0/7] soc: renesas: Identify SoC and register
with the SoC bus".

Thanks!

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds