Re: [RFT v2] iommu/amd: use subsys_initcall() on amdv2 iommu

2016-04-08 Thread Luis R. Rodriguez
On Tue, Mar 29, 2016 at 10:41 AM, Luis R. Rodriguez  wrote:
> We need to ensure amd iommu v2 initializes before
> driver uses such as drivers/gpu/drm/amd/amdkfd/kfd_module.c,
> to do this make its init routine a subsys_initcall() which
> ensures its load init is called first than modules when
> built-in.
>
> This reverts the old work around implemented through commit
> 1bacc894c227fad8a7 ("drivers: Move iommu/ before gpu/ in Makefile"),
> instead of making the dependency implicit by linker order this
> makes the ordering requirement explicit through proper kernel
> APIs.
>
> Cc: Oded Gabbay 
> Cc: Christian König 
> Signed-off-by: Luis R. Rodriguez 

*poke*

 Luis
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH v4 6/6] PCI: Squash pci_dev_flags to remove holes

2016-04-08 Thread Alex Williamson
On Wed, 24 Feb 2016 13:44:31 -0600
Bjorn Helgaas  wrote:

> From: Jacek Lawrynowicz 
> 
> After removing PCI_DEV_FLAGS_DMA_ALIAS_DEVFN, the (1 << 4) value was
> unused.  Squash the other values so all the bits are adjacent.  No
> functional change intended.
> 
> (I'm not sure this is worth doing.  We have 16 flag bits and we're not
> even close to exhausting them.  But if we do squash them, it should be
> in a separate patch so we don't clutter up the main patches.)
> ---

Needs a Sign-off

Reviewed-by: Alex Williamson 


>  include/linux/pci.h |8 
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index d9e0c84..4e36024 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -173,13 +173,13 @@ enum pci_dev_flags {
>   /* Flag for quirk use to store if quirk-specific ACS is enabled */
>   PCI_DEV_FLAGS_ACS_ENABLED_QUIRK = (__force pci_dev_flags_t) (1 << 3),
>   /* Use a PCIe-to-PCI bridge alias even if !pci_is_pcie */
> - PCI_DEV_FLAG_PCIE_BRIDGE_ALIAS = (__force pci_dev_flags_t) (1 << 5),
> + PCI_DEV_FLAG_PCIE_BRIDGE_ALIAS = (__force pci_dev_flags_t) (1 << 4),
>   /* Do not use bus resets for device */
> - PCI_DEV_FLAGS_NO_BUS_RESET = (__force pci_dev_flags_t) (1 << 6),
> + PCI_DEV_FLAGS_NO_BUS_RESET = (__force pci_dev_flags_t) (1 << 5),
>   /* Do not use PM reset even if device advertises NoSoftRst- */
> - PCI_DEV_FLAGS_NO_PM_RESET = (__force pci_dev_flags_t) (1 << 7),
> + PCI_DEV_FLAGS_NO_PM_RESET = (__force pci_dev_flags_t) (1 << 6),
>   /* Get VPD from function 0 VPD */
> - PCI_DEV_FLAGS_VPD_REF_F0 = (__force pci_dev_flags_t) (1 << 8),
> + PCI_DEV_FLAGS_VPD_REF_F0 = (__force pci_dev_flags_t) (1 << 7),
>  };
>  
>  enum pci_irq_reroute_variant {
> 

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v4 2/6] PCI: Move informational printk to pci_add_dma_alias()

2016-04-08 Thread Alex Williamson
On Wed, 24 Feb 2016 13:43:54 -0600
Bjorn Helgaas  wrote:

> From: Jacek Lawrynowicz 
> 
> One of the quirks that adds DMA aliases logs an informational message in
> dmesg.  Move that to pci_add_dma_alias() so all users log the message
> consistently.  No functional change intended (except extra message).
> ---
>  drivers/pci/pci.c|2 ++
>  drivers/pci/quirks.c |6 +-
>  2 files changed, 3 insertions(+), 5 deletions(-)


Needs a Sign-off, otherwise:

Reviewed-by: Alex Williamson 

> 
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 7fccc8a..8b0a637 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -4580,6 +4580,8 @@ void pci_add_dma_alias(struct pci_dev *dev, u8 devfn)
>  {
>   dev->dma_alias_devfn = PCI_DEVFN(PCI_SLOT(dev->devfn), 0);
>   dev->dev_flags |= PCI_DEV_FLAGS_DMA_ALIAS_DEVFN;
> + dev_info(>dev, "Enabling fixed DMA alias to %02x.%d\n",
> +  PCI_SLOT(devfn), PCI_FUNC(devfn));
>  }
>  
>  bool pci_device_is_present(struct pci_dev *pdev)
> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> index df28dce..cf023ea 100644
> --- a/drivers/pci/quirks.c
> +++ b/drivers/pci/quirks.c
> @@ -3662,12 +3662,8 @@ static void quirk_fixed_dma_alias(struct pci_dev *dev)
>   const struct pci_device_id *id;
>  
>   id = pci_match_id(fixed_dma_alias_tbl, dev);
> - if (id) {
> + if (id)
>   pci_add_dma_alias(dev, id->driver_data);
> - dev_info(>dev, "Enabling fixed DMA alias to %02x.%d\n",
> -  PCI_SLOT(id->driver_data),
> -  PCI_FUNC(id->driver_data));
> - }
>  }
>  
>  DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ADAPTEC2, 0x0285, 
> quirk_fixed_dma_alias);
> 

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v5 5/6] PCI: Add DMA alias quirk for mic_x200_dma

2016-04-08 Thread Alex Williamson
On Thu,  3 Mar 2016 15:53:20 +0100
Jacek Lawrynowicz  wrote:

> MIC x200 NTB forwards PCIe traffic using multiple alien RID. They have to
> be added as aliases to the DMA device in order to allow buffer access
> when IOMMU is enabled.
> 
> Signed-off-by: Jacek Lawrynowicz 
> Acked-by: David Woodhouse 
> ---

Reviewed-by: Alex Williamson 


>  Updated quirk comment with requirement that aliases have to be matching
>  values programmed in the EEPROM. The LUT table on x200 is static so there
>  is no need to program it from the driver.
> 
>  drivers/pci/quirks.c | 14 ++
>  1 file changed, 14 insertions(+)
> 
> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> index 51927a5..e66f09d 100644
> --- a/drivers/pci/quirks.c
> +++ b/drivers/pci/quirks.c
> @@ -3696,6 +3696,20 @@ DECLARE_PCI_FIXUP_HEADER(0x1283, 0x8892, 
> quirk_use_pcie_bridge_dma_alias);
>  DECLARE_PCI_FIXUP_HEADER(0x8086, 0x244e, quirk_use_pcie_bridge_dma_alias);
>  
>  /*
> + * MIC x200 NTB forwards PCIe traffic using multiple alien RIDs. They have to
> + * be added as aliases to the DMA device in order to allow buffer access
> + * when IOMMU is enabled. Following devfns have to match RIT-LUT table
> + * programmed in the EEPROM.
> + */
> +static void quirk_mic_x200_dma_alias(struct pci_dev *pdev)
> +{
> + pci_add_dma_alias(pdev, PCI_DEVFN(0x10, 0x0));
> + pci_add_dma_alias(pdev, PCI_DEVFN(0x11, 0x0));
> + pci_add_dma_alias(pdev, PCI_DEVFN(0x12, 0x3));
> +}
> +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x2264, 
> quirk_mic_x200_dma_alias);
> +
> +/*
>   * Intersil/Techwell TW686[4589]-based video capture cards have an empty 
> (zero)
>   * class code.  Fix it.
>   */

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v4 4/6] PCI: Rename dma_alias_is_enabled() to pci_devs_are_dma_aliases()

2016-04-08 Thread Alex Williamson
On Wed, 24 Feb 2016 13:44:15 -0600
Bjorn Helgaas  wrote:

> This should be folded into the previous patch.  I left it separate to show
> the interface difference more clearly.
> 
> Also, pci_devs_are_dma_aliases() uses PCI internals (dma_alias_mask), so I
> think it should be in PCI code instead of in IOMMU code.  That would mean
> both it and pci_add_dma_alias() should probably be declared in
> include/linux/pci.h (but not exported to modules with EXPORT_SYMBOL()).
> ---
>  drivers/iommu/iommu.c |   11 ++-
>  1 file changed, 6 insertions(+), 5 deletions(-)

Needs a Sign-off.

Reviewed-by: Alex Williamson 

> 
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index a214e19..031d06b 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -659,10 +659,12 @@ static struct iommu_group 
> *get_pci_function_alias_group(struct pci_dev *pdev,
>   return NULL;
>  }
>  
> -static bool dma_alias_is_enabled(struct pci_dev *dev, u8 devfn)
> +static bool pci_devs_are_dma_aliases(struct pci_dev *pdev1, struct pci_dev 
> *pdev2)
>  {
> - return dev->dma_alias_mask &&
> -test_bit(devfn, dev->dma_alias_mask);
> + return (pdev1->dma_alias_mask &&
> + test_bit(pdev2->devfn, pdev1->dma_alias_mask)) ||
> +(pdev2->dma_alias_mask &&
> + test_bit(pdev1->devfn, pdev2->dma_alias_mask));
>  }
>  
>  /*
> @@ -692,8 +694,7 @@ static struct iommu_group *get_pci_alias_group(struct 
> pci_dev *pdev,
>   continue;
>  
>   /* We alias them or they alias us */
> - if (dma_alias_is_enabled(pdev, tmp->devfn) ||
> - dma_alias_is_enabled(tmp, pdev->devfn)) {
> + if (pci_devs_are_dma_aliases(pdev, tmp)) {
>   group = get_pci_alias_group(tmp, devfns);
>   if (group) {
>   pci_dev_put(tmp);
> 

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v5 3/6] PCI: Add support for multiple DMA aliases

2016-04-08 Thread Alex Williamson
On Thu,  3 Mar 2016 15:38:02 +0100
Jacek Lawrynowicz  wrote:

> This patch solves IOMMU support issues with PCIe non-transparent bridges
> that use Requester ID look-up tables (RID-LUT), e.g. PEX8733.
> 
> The NTB connects devices in two independent PCI domains. Devices
> separated by the NTB are not able to discover each other. A PCI packet
> being forwared from one domain to another has to have its RID modified
> so it appears on correct bus and completions are forwarded back to the
> original domain through the NTB. RID is translated using preprogrammed
> table (LUT) and the PCI packet propagates upstream away from the NTB.
> If the destination system has IOMMU enabled, the packet will be
> discarded because the new RID is unknown to the IOMMU. Adding a DMA
> alias for the new RID allows IOMMU to properly recognize the packet.
> 
> Each device behind the NTB has a unique RID assigned in the RID-LUT.
> Current DMA alias implementation supports only single alias, so it's not
> possible to support mutiple devices behind the NTB when IOMMU is enabled.
> 
> This implementation enables all possible aliases on a given bus (256)
> that are stored in a bitset. Alias devfn is directly translated to a bit
> number. The bitset is not allocated for devices that have no need for
> DMA aliases.
> 
> More details can be found in following article:
> http://www.plxtech.com/files/pdf/technical/expresslane/RTC_Enabling%20MulitHostSystemDesigns.pdf
> 
> Signed-off-by: Jacek Lawrynowicz 
> Acked-by: David Woodhouse 
> Acked-by: Joerg Roedel 
> ---
>  I updated the commit message based on discussion with Bjorn. It should be
>  now a little easier to understand. I'm not resubmitting the whole patch set
>  because it could make the thread harder to follow.
>  
>  This time resubmitting with correct subject.
> 
>  drivers/iommu/iommu.c | 17 ++---
>  drivers/pci/pci.c | 11 +--
>  drivers/pci/probe.c   |  1 +
>  drivers/pci/search.c  | 14 +-
>  include/linux/pci.h   |  4 +---
>  5 files changed, 30 insertions(+), 17 deletions(-)
> 
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index bfd4f7c..4c10da1 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -659,9 +659,15 @@ static struct iommu_group 
> *get_pci_function_alias_group(struct pci_dev *pdev,
>   return NULL;
>  }
>  
> +static bool dma_alias_is_enabled(struct pci_dev *dev, u8 devfn)
> +{
> + return dev->dma_alias_mask &&
> +test_bit(devfn, dev->dma_alias_mask);
> +}
> +
>  /*
> - * Look for aliases to or from the given device for exisiting groups.  The
> - * dma_alias_devfn only supports aliases on the same bus, therefore the 
> search
> + * Look for aliases to or from the given device for existing groups. DMA
> + * aliases are only supported on the same bus, therefore the search
>   * space is quite small (especially since we're really only looking at pcie
>   * device, and therefore only expect multiple slots on the root complex or
>   * downstream switch ports).  It's conceivable though that a pair of
> @@ -686,11 +692,8 @@ static struct iommu_group *get_pci_alias_group(struct 
> pci_dev *pdev,
>   continue;
>  
>   /* We alias them or they alias us */
> - if (((pdev->dev_flags & PCI_DEV_FLAGS_DMA_ALIAS_DEVFN) &&
> -  pdev->dma_alias_devfn == tmp->devfn) ||
> - ((tmp->dev_flags & PCI_DEV_FLAGS_DMA_ALIAS_DEVFN) &&
> -  tmp->dma_alias_devfn == pdev->devfn)) {
> -
> + if (dma_alias_is_enabled(pdev, tmp->devfn) ||
> + dma_alias_is_enabled(tmp, pdev->devfn)) {
>   group = get_pci_alias_group(tmp, devfns);
>   if (group) {
>   pci_dev_put(tmp);
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index e5e2c9d..33f3d24 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -4577,8 +4577,15 @@ int pci_set_vga_state(struct pci_dev *dev, bool decode,
>   */
>  void pci_add_dma_alias(struct pci_dev *dev, u8 devfn)
>  {
> - dev->dma_alias_devfn = PCI_DEVFN(PCI_SLOT(dev->devfn), 0);
> - dev->dev_flags |= PCI_DEV_FLAGS_DMA_ALIAS_DEVFN;
> + if (!dev->dma_alias_mask)
> + dev->dma_alias_mask = kcalloc(BITS_TO_LONGS(U8_MAX),
> +   sizeof(long), GFP_KERNEL);
> + if (!dev->dma_alias_mask) {
> + dev_warn(>dev, "Unable to allocate DMA alias mask\n");
> + return;
> + }
> +
> + set_bit(devfn, dev->dma_alias_mask);

This silently fixes the bug in 1/, it should be updated after 1/ is
fixed.  Otherwise,

Reviewed-by: Alex Williamson 


>   dev_info(>dev, "Enabling fixed DMA alias to %02x.%d\n",
>PCI_SLOT(devfn), PCI_FUNC(devfn));
>  }
> diff --git 

Re: AMD-Vi regression with PCIe/PCI bridges in 4.4

2016-04-08 Thread Tomasz Golinski
>> > Can you
>> > guys please test if the attached patch fixes the issues for you as it
>> > did for me?

Tested, works as expected. Thanks. Attaching dmesg from patched kernel
in case you need it.


dmesg-patch.log.gz
Description: application/gunzip
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: AMD-Vi regression with PCIe/PCI bridges in 4.4

2016-04-08 Thread Joerg Roedel
On Fri, Apr 08, 2016 at 05:39:41PM +0200, Tomasz Golinski wrote:
> > Okay, with this card I was able to reproduce the problem myself. Can you
> > guys please test if the attached patch fixes the issues for you as it
> > did for me?
> 
> Sorry for asking (maybe) stupid question. This patch doesn't apply
> against 4.5 kernel. Should I apply it against current git head or maybe
> IOMMU branch?

The patch is against v4.6-rc2.


Thanks,

Joerg
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v2 1/2] dma/iommu: Add pgsize_bitmap confirmation in __iommu_dma_alloc_pages

2016-04-08 Thread Doug Anderson
Hi,

On Fri, Apr 8, 2016 at 10:30 AM, Will Deacon  wrote:
>> > Am I barking up the wrong tree?
>>
>> I don't think min_order can be negative.  Certainly we could enter the
>> loop with order == 0 and min_order == 0, though.
>
> ... and in that case, PageCompound will be false, and we'll call split_page
> which won't do anything, so we break out.
>
>>
>> Some examples:
>>
>> order = 0, min_order = 0
>> -> Want alloc_pages _without_ __GFP_NORETRY.  OK
>> -> If alloc_pages fails, return NULL.  OK
>> -> If alloc pages succeeds, don't need splitting since single page.  OK
>
> [...]
>
>> I think those are all right.  Did I mess up?  You could certainly
>> structure the loop in a different way but you need to make sure you
>> handle all of those cases.  If you have an alternate structure that
>> handles all those, let's consider it.
>
> Right, I don't think the code is broken, I just think the !order check is
> confusing and not needed.

Ah ha!  Got it.  I didn't dig into split_page() to see that it was a
no-op when "order == 0".  I just know that the old code didn't call
split_page() with order == 0 so I assumed that was wise to keep.  If
we don't need to keep that then agreed that the "if" test can simply
be removed.  :)

-Doug
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v4 3/6] PCI: Add support for multiple DMA aliases

2016-04-08 Thread Alex Williamson
On Fri, 8 Apr 2016 11:06:32 -0500
Bjorn Helgaas  wrote:

> On Tue, Mar 15, 2016 at 07:48:17PM -0500, Bjorn Helgaas wrote:
> > On Mon, Mar 14, 2016 at 10:43:40PM +, David Woodhouse wrote:  
> > > On Thu, 2016-02-25 at 08:38 -0600, Bjorn Helgaas wrote:  
> > > >   
> > > > >  /*
> > > > > - * Look for aliases to or from the given device for exisiting 
> > > > > groups.  The
> > > > > - * dma_alias_devfn only supports aliases on the same bus, therefore 
> > > > > the search
> > > > > + * Look for aliases to or from the given device for existing groups. 
> > > > > DMA
> > > > > + * aliases are only supported on the same bus, therefore the search  
> > > > 
> > > > I'm trying to reconcile this statement that "DMA aliases are only
> > > > supported on the same bus" (which was there even before this patch)
> > > > with the fact that pci_for_each_dma_alias() does not have that
> > > > limitation.  
> > > 
> > > Doesn't it? You can still only set a DMA alias on the same bus with
> > > pci_add_dma_alias(), can't you?  
> > 
> > I guess it's true that PCI_DEV_FLAGS_DMA_ALIAS_DEVFN and the proposed
> > pci_add_dma_alias() only add aliases on the same bus.  I was thinking
> > about a scenario like this:
> > 
> >   00:00.0 PCIe-to-PCI bridge to [bus 01]
> >   01:01.0 conventional PCI device
> > 
> > where I think 01:00.0 is a DMA alias for 01:01.0 because the bridge
> > takes ownership of DMA transactions from 01:01.0 and assigns a
> > Requester ID of 01:00.0 (secondary bus number, device 0, function 0).

This is true, but this is a topology alias, not a quirk.
pci_for_each_dma_alias() already handles this case.  It would trigger
the callback function for any direct alias of the conventional device
as well as the bridge itself.  Obviously we don't end up with any
quirks for conventional devices because the aliases are masked behind
the bridge anyway.
 
> > > > >   * space is quite small (especially since we're really only looking 
> > > > >at pcie
> > > > >   * device, and therefore only expect multiple slots on the root 
> > > > >complex or
> > > > >   * downstream switch ports).  It's conceivable though that a pair of
> > > > > @@ -686,11 +692,8 @@ static struct iommu_group 
> > > > > *get_pci_alias_group(struct pci_dev *pdev,
> > > > >   continue;
> > > > >  
> > > > >   /* We alias them or they alias us */
> > > > > - if (((pdev->dev_flags & PCI_DEV_FLAGS_DMA_ALIAS_DEVFN) 
> > > > > &&
> > > > > -  pdev->dma_alias_devfn == tmp->devfn) ||
> > > > > - ((tmp->dev_flags & PCI_DEV_FLAGS_DMA_ALIAS_DEVFN) &&
> > > > > -  tmp->dma_alias_devfn == pdev->devfn)) {
> > > > > -
> > > > > + if (dma_alias_is_enabled(pdev, tmp->devfn) ||
> > > > > + dma_alias_is_enabled(tmp, pdev->devfn)) {
> > > > >   group = get_pci_alias_group(tmp, devfns);  
> > > > 
> > > > We basically have this:
> > > > 
> > > >   for_each_pci_dev(tmp) {
> > > >     if ()
> > > >   group = get_pci_alias_group();
> > > >   ...
> > > >   }  
> > > 
> > > Strictly, that's:
> > > 
> > >  for_each_pci_dev(tmp) {
> > >    if (pdev is an alias of tmp || tmp is an alias of pdev)
> > >      group = get_pci_alias_group();
> > >    ...
> > >  }  
> > 
> > OK.  
> >   
> > > > I'm trying to figure out why we don't do something like the following
> > > > instead:
> > > > 
> > > >   callback(struct pci_dev *pdev, u16 alias, void *opaque)
> > > >   {
> > > >     struct iommu_group *group;
> > > > 
> > > >     group = get_pci_alias_group();
> > > >     if (group)
> > > >   return group;
> > > > 
> > > >     return 0;
> > > >   }
> > > > 
> > > >   pci_for_each_dma_alias(pdev, callback, ...);  
> > > 
> > > And this would be equivalent to
> > > 
> > >  for_each_pci_dev(tmp) {
> > >    if (tmp is an alias of pdev)
> > >      group = get_pci_alias_group();
> > >    ...
> > >  }
> > > 
> > > The "is an alias of" property is not commutative. Perhaps it should be.
> > > But that's hard because in some cases the alias doesn't even *exist* as
> > > a real PCI device. It's just that you appear to get DMA transactions
> > > from a given source-id.  
> > 
> > Right.  In my example above, 01:00.0 is not a PCI device; it's only a
> > Requester ID that is fabricated by the bridge when it forwards DMA
> > transactions upstream.
> > 
> > I think I'm confused because I don't really understand IOMMU groups.
> > 
> > Let me explain what I think they are and you can correct me when I go
> > wrong.  The iommu_group_alloc() comment says "The IOMMU group
> > represents the minimum granularity of the IOMMU."  So I suppose the
> > IOMMU cannot distinguish between devices in a group.  All the devices
> > in the group use the same set of DMA mappings.  Granting device A DMA
> > access to a buffer grants the same access to all other members of A's
> > IOMMU group.
> > 
> > That would mean my question was fundamentally 

Re: [PATCH v2 1/2] dma/iommu: Add pgsize_bitmap confirmation in __iommu_dma_alloc_pages

2016-04-08 Thread Will Deacon
On Fri, Apr 08, 2016 at 09:50:43AM -0700, Doug Anderson wrote:
> On Fri, Apr 8, 2016 at 6:07 AM, Will Deacon  wrote:
> > On Tue, Apr 05, 2016 at 10:03:32AM -0700, Doug Anderson wrote:
> >> On Tue, Mar 29, 2016 at 10:02 AM, Will Deacon  wrote:
> >> > On Mon, Mar 28, 2016 at 02:32:11PM +0800, Yong Wu wrote:
> >> >> @@ -213,13 +215,16 @@ static struct page 
> >> >> **__iommu_dma_alloc_pages(unsigned int count, gfp_t gfp)
> >> >>   /*
> >> >>* Higher-order allocations are a convenience rather
> >> >>* than a necessity, hence using __GFP_NORETRY until
> >> >> -  * falling back to single-page allocations.
> >> >> +  * falling back to min size allocations.
> >> >>*/
> >> >> - for (order = min_t(unsigned int, order, __fls(count));
> >> >> -  order > 0; order--) {
> >> >> - page = alloc_pages(gfp | __GFP_NORETRY, order);
> >> >> + for (order = min_t(int, order, __fls(count));
> >> >> +  order >= min_order; order--) {
> >> >> + page = alloc_pages((order == min_order) ? gfp :
> >> >> +gfp | __GFP_NORETRY, order);
> >> >>   if (!page)
> >> >>   continue;
> >> >> + if (!order)
> >> >> + break;
> >> >
> >> > Isn't this handled by the loop condition?
> >>
> >> He changed the loop condition to be ">= min_order" instead of "> 0",
> >> so now we can get here with an order == 0.  This makes sense because
> >> when min_order is not 0 you still want to run the code to split the
> >> pages and it is sane not to duplicate that below.
> >>
> >> Maybe I'm misunderstanding, though.  Perhaps you can explain how you
> >> think this code should look?
> >
> > My reading of the code was that we require order >= min_order to enter
> > the loop. Given that order doesn't change between the loop header and the
> > if (!order) check, then that must mean we can enter the loop body with
> > order == 0 and order >= min_order, which means that min_order is allowed
> > to be negative. That feels weird.
> >
> > Am I barking up the wrong tree?
> 
> I don't think min_order can be negative.  Certainly we could enter the
> loop with order == 0 and min_order == 0, though.

... and in that case, PageCompound will be false, and we'll call split_page
which won't do anything, so we break out.

> 
> Some examples:
> 
> order = 0, min_order = 0
> -> Want alloc_pages _without_ __GFP_NORETRY.  OK
> -> If alloc_pages fails, return NULL.  OK
> -> If alloc pages succeeds, don't need splitting since single page.  OK

[...]

> I think those are all right.  Did I mess up?  You could certainly
> structure the loop in a different way but you need to make sure you
> handle all of those cases.  If you have an alternate structure that
> handles all those, let's consider it.

Right, I don't think the code is broken, I just think the !order check is
confusing and not needed.

Will
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v6 2/5] vfio: allow the user to register reserved iova range for MSI mapping

2016-04-08 Thread Eric Auger
Hi Alex,
On 04/08/2016 06:41 PM, Alex Williamson wrote:
> On Fri, 8 Apr 2016 17:48:01 +0200
> Eric Auger  wrote:
> 
> Hi Eric,
> 
>> Hi Alex,
>> On 04/07/2016 08:29 PM, Alex Williamson wrote:
>>> On Thu, 7 Apr 2016 15:43:29 +0200
>>> Eric Auger  wrote:
>>>   
 Hi Alex,
 On 04/07/2016 12:07 AM, Alex Williamson wrote:  
> On Mon,  4 Apr 2016 08:30:08 +
> Eric Auger  wrote:
> 
>> The user is allowed to [un]register a reserved IOVA range by using the
>> DMA MAP API and setting the new flag: 
>> VFIO_DMA_MAP_FLAG_MSI_RESERVED_IOVA.
>> It provides the base address and the size. This region is stored in the
>> vfio_dma rb tree. At that point the iova range is not mapped to any 
>> target
>> address yet. The host kernel will use those iova when needed, typically
>> when the VFIO-PCI device allocates its MSIs.
>>
>> This patch also handles the destruction of the reserved binding RB-tree 
>> and
>> domain's iova_domains.
>>
>> Signed-off-by: Eric Auger 
>> Signed-off-by: Bharat Bhushan 
>>
>> ---
>> v3 -> v4:
>> - use iommu_alloc/free_reserved_iova_domain exported by 
>> dma-reserved-iommu
>> - protect vfio_register_reserved_iova_range implementation with
>>   CONFIG_IOMMU_DMA_RESERVED
>> - handle unregistration by user-space and on vfio_iommu_type1 release
>>
>> v1 -> v2:
>> - set returned value according to alloc_reserved_iova_domain result
>> - free the iova domains in case any error occurs
>>
>> RFC v1 -> v1:
>> - takes into account Alex comments, based on
>>   [RFC PATCH 1/6] vfio: Add interface for add/del reserved iova region:
>> - use the existing dma map/unmap ioctl interface with a flag to register
>>   a reserved IOVA range. A single reserved iova region is allowed.
>>
>> Conflicts:
>>  drivers/vfio/vfio_iommu_type1.c
>> ---
>>  drivers/vfio/vfio_iommu_type1.c | 141 
>> +++-
>>  include/uapi/linux/vfio.h   |  12 +++-
>>  2 files changed, 150 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/vfio/vfio_iommu_type1.c 
>> b/drivers/vfio/vfio_iommu_type1.c
>> index c9ddbde..4497b20 100644
>> --- a/drivers/vfio/vfio_iommu_type1.c
>> +++ b/drivers/vfio/vfio_iommu_type1.c
>> @@ -36,6 +36,7 @@
>>  #include 
>>  #include 
>>  #include 
>> +#include 
>>  
>>  #define DRIVER_VERSION  "0.2"
>>  #define DRIVER_AUTHOR   "Alex Williamson "
>> @@ -403,10 +404,22 @@ static void vfio_unmap_unpin(struct vfio_iommu 
>> *iommu, struct vfio_dma *dma)
>>  vfio_lock_acct(-unlocked);
>>  }
>>  
>> +static void vfio_unmap_reserved(struct vfio_iommu *iommu)
>> +{
>> +#ifdef CONFIG_IOMMU_DMA_RESERVED
>> +struct vfio_domain *d;
>> +
>> +list_for_each_entry(d, >domain_list, next)
>> +iommu_unmap_reserved(d->domain);
>> +#endif
>> +}
>> +
>>  static void vfio_remove_dma(struct vfio_iommu *iommu, struct vfio_dma 
>> *dma)
>>  {
>>  if (likely(dma->type != VFIO_IOVA_RESERVED))
>>  vfio_unmap_unpin(iommu, dma);
>> +else
>> +vfio_unmap_reserved(iommu);
>>  vfio_unlink_dma(iommu, dma);
>>  kfree(dma);
>>  }
>
> This makes me nervous, apparently we can add reserved mappings
> individually, but we have absolutely no granularity on remove, so if we
> remove one, we've removed them all even though we still have them
> linked in our rb tree.  I see later that only one reserved region is
> allowed, but that seems very short sighted, especially to impose that
> on the user level API.
 On kernel-size the reserved region is currently backed by a unique
 iova_domain. Do you mean you would like me to handle a list of
 iova_domains instead of using a single "cookie"?  
>>>
>>> TBH, I'm not really sure how this works with a single iova domain.  If
>>> we have multiple irq chips and each gets mapped by a separate page in
>>> the iova space, then is it really sufficient to do a lookup from the
>>> irq_data to the msi_desc to the device to the domain in order to get
>>> reserved iova to map that msi doorbell?  Don't we need an iova from the
>>> pool mapping the specific irqchip associated with our device?  The IOMMU
>>> domain might span any number of irq chips, how can we assume there's
>>> one only reserved iova space?  Maybe I'm not understanding how the code
>>> works.  
>>
>> On vfio_iommu_type1 we currently compute the reserved iova needs for
>> each domain and we take the max. Each domain then is assigned a reserved
>> iova domain of this max size.
>>
>> So let's 

Re: [PATCH v2 1/2] dma/iommu: Add pgsize_bitmap confirmation in __iommu_dma_alloc_pages

2016-04-08 Thread Doug Anderson
Will,

On Fri, Apr 8, 2016 at 6:07 AM, Will Deacon  wrote:
> On Tue, Apr 05, 2016 at 10:03:32AM -0700, Doug Anderson wrote:
>> On Tue, Mar 29, 2016 at 10:02 AM, Will Deacon  wrote:
>> > On Mon, Mar 28, 2016 at 02:32:11PM +0800, Yong Wu wrote:
>> >> @@ -213,13 +215,16 @@ static struct page 
>> >> **__iommu_dma_alloc_pages(unsigned int count, gfp_t gfp)
>> >>   /*
>> >>* Higher-order allocations are a convenience rather
>> >>* than a necessity, hence using __GFP_NORETRY until
>> >> -  * falling back to single-page allocations.
>> >> +  * falling back to min size allocations.
>> >>*/
>> >> - for (order = min_t(unsigned int, order, __fls(count));
>> >> -  order > 0; order--) {
>> >> - page = alloc_pages(gfp | __GFP_NORETRY, order);
>> >> + for (order = min_t(int, order, __fls(count));
>> >> +  order >= min_order; order--) {
>> >> + page = alloc_pages((order == min_order) ? gfp :
>> >> +gfp | __GFP_NORETRY, order);
>> >>   if (!page)
>> >>   continue;
>> >> + if (!order)
>> >> + break;
>> >
>> > Isn't this handled by the loop condition?
>>
>> He changed the loop condition to be ">= min_order" instead of "> 0",
>> so now we can get here with an order == 0.  This makes sense because
>> when min_order is not 0 you still want to run the code to split the
>> pages and it is sane not to duplicate that below.
>>
>> Maybe I'm misunderstanding, though.  Perhaps you can explain how you
>> think this code should look?
>
> My reading of the code was that we require order >= min_order to enter
> the loop. Given that order doesn't change between the loop header and the
> if (!order) check, then that must mean we can enter the loop body with
> order == 0 and order >= min_order, which means that min_order is allowed
> to be negative. That feels weird.
>
> Am I barking up the wrong tree?

I don't think min_order can be negative.  Certainly we could enter the
loop with order == 0 and min_order == 0, though.


Some examples:

order = 0, min_order = 0
-> Want alloc_pages _without_ __GFP_NORETRY.  OK
-> If alloc_pages fails, return NULL.  OK
-> If alloc pages succeeds, don't need splitting since single page.  OK

order = 1, min_order = 1
-> Want alloc_pages _without_ __GFP_NORETRY.  OK
-> If alloc_pages fails, return NULL.  OK
-> If alloc pages succeeds, DO need splitting.  OK

order = 1, min_order = 0
-> Want alloc_pages with __GFP_NORETRY.  OK
-> If alloc_pages fails, try order = 0.  OK
-> If alloc pages succeeds, DO need splitting.  OK

order = 2, min_order = 1
-> Want alloc_pages with __GFP_NORETRY.  OK
-> If alloc_pages fails, try order = 1.  OK
-> If alloc pages succeeds, DO need splitting.  OK


I think those are all right.  Did I mess up?  You could certainly
structure the loop in a different way but you need to make sure you
handle all of those cases.  If you have an alternate structure that
handles all those, let's consider it.

-Doug
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v6 2/5] vfio: allow the user to register reserved iova range for MSI mapping

2016-04-08 Thread Alex Williamson
On Fri, 8 Apr 2016 17:48:01 +0200
Eric Auger  wrote:

Hi Eric,

> Hi Alex,
> On 04/07/2016 08:29 PM, Alex Williamson wrote:
> > On Thu, 7 Apr 2016 15:43:29 +0200
> > Eric Auger  wrote:
> >   
> >> Hi Alex,
> >> On 04/07/2016 12:07 AM, Alex Williamson wrote:  
> >>> On Mon,  4 Apr 2016 08:30:08 +
> >>> Eric Auger  wrote:
> >>> 
>  The user is allowed to [un]register a reserved IOVA range by using the
>  DMA MAP API and setting the new flag: 
>  VFIO_DMA_MAP_FLAG_MSI_RESERVED_IOVA.
>  It provides the base address and the size. This region is stored in the
>  vfio_dma rb tree. At that point the iova range is not mapped to any 
>  target
>  address yet. The host kernel will use those iova when needed, typically
>  when the VFIO-PCI device allocates its MSIs.
> 
>  This patch also handles the destruction of the reserved binding RB-tree 
>  and
>  domain's iova_domains.
> 
>  Signed-off-by: Eric Auger 
>  Signed-off-by: Bharat Bhushan 
> 
>  ---
>  v3 -> v4:
>  - use iommu_alloc/free_reserved_iova_domain exported by 
>  dma-reserved-iommu
>  - protect vfio_register_reserved_iova_range implementation with
>    CONFIG_IOMMU_DMA_RESERVED
>  - handle unregistration by user-space and on vfio_iommu_type1 release
> 
>  v1 -> v2:
>  - set returned value according to alloc_reserved_iova_domain result
>  - free the iova domains in case any error occurs
> 
>  RFC v1 -> v1:
>  - takes into account Alex comments, based on
>    [RFC PATCH 1/6] vfio: Add interface for add/del reserved iova region:
>  - use the existing dma map/unmap ioctl interface with a flag to register
>    a reserved IOVA range. A single reserved iova region is allowed.
> 
>  Conflicts:
>   drivers/vfio/vfio_iommu_type1.c
>  ---
>   drivers/vfio/vfio_iommu_type1.c | 141 
>  +++-
>   include/uapi/linux/vfio.h   |  12 +++-
>   2 files changed, 150 insertions(+), 3 deletions(-)
> 
>  diff --git a/drivers/vfio/vfio_iommu_type1.c 
>  b/drivers/vfio/vfio_iommu_type1.c
>  index c9ddbde..4497b20 100644
>  --- a/drivers/vfio/vfio_iommu_type1.c
>  +++ b/drivers/vfio/vfio_iommu_type1.c
>  @@ -36,6 +36,7 @@
>   #include 
>   #include 
>   #include 
>  +#include 
>   
>   #define DRIVER_VERSION  "0.2"
>   #define DRIVER_AUTHOR   "Alex Williamson "
>  @@ -403,10 +404,22 @@ static void vfio_unmap_unpin(struct vfio_iommu 
>  *iommu, struct vfio_dma *dma)
>   vfio_lock_acct(-unlocked);
>   }
>   
>  +static void vfio_unmap_reserved(struct vfio_iommu *iommu)
>  +{
>  +#ifdef CONFIG_IOMMU_DMA_RESERVED
>  +struct vfio_domain *d;
>  +
>  +list_for_each_entry(d, >domain_list, next)
>  +iommu_unmap_reserved(d->domain);
>  +#endif
>  +}
>  +
>   static void vfio_remove_dma(struct vfio_iommu *iommu, struct vfio_dma 
>  *dma)
>   {
>   if (likely(dma->type != VFIO_IOVA_RESERVED))
>   vfio_unmap_unpin(iommu, dma);
>  +else
>  +vfio_unmap_reserved(iommu);
>   vfio_unlink_dma(iommu, dma);
>   kfree(dma);
>   }
> >>>
> >>> This makes me nervous, apparently we can add reserved mappings
> >>> individually, but we have absolutely no granularity on remove, so if we
> >>> remove one, we've removed them all even though we still have them
> >>> linked in our rb tree.  I see later that only one reserved region is
> >>> allowed, but that seems very short sighted, especially to impose that
> >>> on the user level API.
> >> On kernel-size the reserved region is currently backed by a unique
> >> iova_domain. Do you mean you would like me to handle a list of
> >> iova_domains instead of using a single "cookie"?  
> > 
> > TBH, I'm not really sure how this works with a single iova domain.  If
> > we have multiple irq chips and each gets mapped by a separate page in
> > the iova space, then is it really sufficient to do a lookup from the
> > irq_data to the msi_desc to the device to the domain in order to get
> > reserved iova to map that msi doorbell?  Don't we need an iova from the
> > pool mapping the specific irqchip associated with our device?  The IOMMU
> > domain might span any number of irq chips, how can we assume there's
> > one only reserved iova space?  Maybe I'm not understanding how the code
> > works.  
> 
> On vfio_iommu_type1 we currently compute the reserved iova needs for
> each domain and we take the max. Each domain then is assigned a reserved
> iova domain of this max size.
> 
> So let's say domain1 has the largest needs (say 2 doorbells)
> domain 1: iova 

Re: [PATCH 4/5] iommu/dma: Finish optimising higher-order allocations

2016-04-08 Thread Robin Murphy

On 08/04/16 06:32, Yong Wu wrote:

On Thu, 2016-04-07 at 18:42 +0100, Robin Murphy wrote:

/*
@@ -215,8 +221,9 @@ static struct page **__iommu_dma_alloc_pages(unsigned int 
count, gfp_t gfp)
 * than a necessity, hence using __GFP_NORETRY until
 * falling back to single-page allocations.
 */
-   for (order = min_t(unsigned int, order, __fls(count));
-order > 0; order--) {
+   for (pgsize_orders &= (2U << __fls(count)) - 1;
+(order = __fls(pgsize_orders)) > min_order;
+pgsize_orders &= (1U << order) - 1) {
page = alloc_pages(gfp | __GFP_NORETRY, order);
if (!page)
continue;
@@ -230,7 +237,7 @@ static struct page **__iommu_dma_alloc_pages(unsigned int 
count, gfp_t gfp)
}
}
if (!page)
-   page = alloc_page(gfp);
+   page = alloc_pages(gfp, order);


A small question: Do we need split it too if order != 0 here?


Ah, good point, somehow I missed that. It didn't stop my framebuffer 
console working kernel-side, but indeed I can't mmap it due to the 
un-split pages. I'll take that as an excuse to have a go at refactoring 
the whole thing to maybe not reach 5 levels of indentation.


Thanks,
Robin.





if (!page) {
__iommu_dma_free_pages(pages, i);
return NULL;

[...]


___
linux-arm-kernel mailing list
linux-arm-ker...@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel



___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v4 3/6] PCI: Add support for multiple DMA aliases

2016-04-08 Thread David Woodhouse
On Fri, 2016-04-08 at 11:06 -0500, Bjorn Helgaas wrote:
> > I think I'm confused because I don't really understand IOMMU
> > groups.
,,,

> Ping?

I think this ended up being more of a Alex question...

-- 
dwmw2



smime.p7s
Description: S/MIME cryptographic signature
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH v4 3/6] PCI: Add support for multiple DMA aliases

2016-04-08 Thread Bjorn Helgaas
On Tue, Mar 15, 2016 at 07:48:17PM -0500, Bjorn Helgaas wrote:
> On Mon, Mar 14, 2016 at 10:43:40PM +, David Woodhouse wrote:
> > On Thu, 2016-02-25 at 08:38 -0600, Bjorn Helgaas wrote:
> > > 
> > > >  /*
> > > > - * Look for aliases to or from the given device for exisiting groups.  
> > > > The
> > > > - * dma_alias_devfn only supports aliases on the same bus, therefore 
> > > > the search
> > > > + * Look for aliases to or from the given device for existing groups. 
> > > > DMA
> > > > + * aliases are only supported on the same bus, therefore the search
> > > 
> > > I'm trying to reconcile this statement that "DMA aliases are only
> > > supported on the same bus" (which was there even before this patch)
> > > with the fact that pci_for_each_dma_alias() does not have that
> > > limitation.
> > 
> > Doesn't it? You can still only set a DMA alias on the same bus with
> > pci_add_dma_alias(), can't you?
> 
> I guess it's true that PCI_DEV_FLAGS_DMA_ALIAS_DEVFN and the proposed
> pci_add_dma_alias() only add aliases on the same bus.  I was thinking
> about a scenario like this:
> 
>   00:00.0 PCIe-to-PCI bridge to [bus 01]
>   01:01.0 conventional PCI device
> 
> where I think 01:00.0 is a DMA alias for 01:01.0 because the bridge
> takes ownership of DMA transactions from 01:01.0 and assigns a
> Requester ID of 01:00.0 (secondary bus number, device 0, function 0).
> 
> > > >   * space is quite small (especially since we're really only looking at 
> > > >pcie
> > > >   * device, and therefore only expect multiple slots on the root 
> > > >complex or
> > > >   * downstream switch ports).  It's conceivable though that a pair of
> > > > @@ -686,11 +692,8 @@ static struct iommu_group 
> > > > *get_pci_alias_group(struct pci_dev *pdev,
> > > >   continue;
> > > >  
> > > >   /* We alias them or they alias us */
> > > > - if (((pdev->dev_flags & PCI_DEV_FLAGS_DMA_ALIAS_DEVFN) &&
> > > > -  pdev->dma_alias_devfn == tmp->devfn) ||
> > > > - ((tmp->dev_flags & PCI_DEV_FLAGS_DMA_ALIAS_DEVFN) &&
> > > > -  tmp->dma_alias_devfn == pdev->devfn)) {
> > > > -
> > > > + if (dma_alias_is_enabled(pdev, tmp->devfn) ||
> > > > + dma_alias_is_enabled(tmp, pdev->devfn)) {
> > > >   group = get_pci_alias_group(tmp, devfns);
> > > 
> > > We basically have this:
> > > 
> > >   for_each_pci_dev(tmp) {
> > >     if ()
> > >   group = get_pci_alias_group();
> > >   ...
> > >   }
> > 
> > Strictly, that's:
> > 
> >  for_each_pci_dev(tmp) {
> >    if (pdev is an alias of tmp || tmp is an alias of pdev)
> >      group = get_pci_alias_group();
> >    ...
> >  }
> 
> OK.  
> 
> > > I'm trying to figure out why we don't do something like the following
> > > instead:
> > > 
> > >   callback(struct pci_dev *pdev, u16 alias, void *opaque)
> > >   {
> > >     struct iommu_group *group;
> > > 
> > >     group = get_pci_alias_group();
> > >     if (group)
> > >   return group;
> > > 
> > >     return 0;
> > >   }
> > > 
> > >   pci_for_each_dma_alias(pdev, callback, ...);
> > 
> > And this would be equivalent to
> > 
> >  for_each_pci_dev(tmp) {
> >    if (tmp is an alias of pdev)
> >      group = get_pci_alias_group();
> >    ...
> >  }
> > 
> > The "is an alias of" property is not commutative. Perhaps it should be.
> > But that's hard because in some cases the alias doesn't even *exist* as
> > a real PCI device. It's just that you appear to get DMA transactions
> > from a given source-id.
> 
> Right.  In my example above, 01:00.0 is not a PCI device; it's only a
> Requester ID that is fabricated by the bridge when it forwards DMA
> transactions upstream.
> 
> I think I'm confused because I don't really understand IOMMU groups.
> 
> Let me explain what I think they are and you can correct me when I go
> wrong.  The iommu_group_alloc() comment says "The IOMMU group
> represents the minimum granularity of the IOMMU."  So I suppose the
> IOMMU cannot distinguish between devices in a group.  All the devices
> in the group use the same set of DMA mappings.  Granting device A DMA
> access to a buffer grants the same access to all other members of A's
> IOMMU group.
> 
> That would mean my question was fundamentally backwards.  In
> get_pci_alias_group(A), we're not trying to figure out what all the
> aliases of A are, which is what pci_for_each_dma_alias() does.
> 
> Instead, we're trying to figure out which IOMMU group A belongs to.
> But I still don't quite understand how aliases fit into this.  Let's
> go back to my example and assume we've already put 00:00.0 and 01:01.0
> in IOMMU groups:
> 
>   00:00.0 PCIe-to-PCI bridge to [bus 01] # in IOMMU group G0
>   01:01.0 conventional PCI device# in IOMMU group G1
> 
> I assume these devices are in different IOMMU groups because if the
> bridge generated its own DMA, it would use Requester ID 00:00.0, which
> is distinct from the 

Re: AMD-Vi regression with PCIe/PCI bridges in 4.4

2016-04-08 Thread Tomasz Golinski
> Okay, with this card I was able to reproduce the problem myself. Can you
> guys please test if the attached patch fixes the issues for you as it
> did for me?

Sorry for asking (maybe) stupid question. This patch doesn't apply
against 4.5 kernel. Should I apply it against current git head or maybe
IOMMU branch?
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[git pull] IOMMU Fixes for Linux v4.6-rc2

2016-04-08 Thread Joerg Roedel
Hi Linus,

The following changes since commit 9735a22799b9214d17d3c231fe377fc852f042e9:

  Linux 4.6-rc2 (2016-04-03 09:09:40 -0500)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu.git 
tags/iommu-fixes-v4.6-rc2

for you to fetch changes up to 0b74ecdfbea893ae585bc694136a3f50f5856cfe:

  iommu/vt-d: Silence an uninitialized variable warning (2016-04-07 14:51:47 
+0200)


IOMMU Fixes for Linux v4.6-rc2 with:

 - Compile-time fixes (warnings and failures)

 - A bug in iommu core code which could cause the group->domain pointer
   to be falsly cleared

 - Fix in scatterlist handling of the ARM common DMA-API code

 - Stall detection fix for the Rockchip IOMMU driver


Arnd Bergmann (1):
  iommu: provide of_xlate pointer unconditionally

Dan Carpenter (1):
  iommu/vt-d: Silence an uninitialized variable warning

Joerg Roedel (1):
  iommu: Don't overwrite domain pointer when there is no default_domain

John Keeping (1):
  iommu/rockchip: Fix "is stall active" check

Robin Murphy (1):
  iommu/dma: Restore scatterlist offsets correctly

 drivers/iommu/dma-iommu.c  | 4 ++--
 drivers/iommu/intel-iommu.c| 2 +-
 drivers/iommu/iommu.c  | 3 ++-
 drivers/iommu/rockchip-iommu.c | 8 
 include/linux/iommu.h  | 2 --
 5 files changed, 9 insertions(+), 10 deletions(-)

Please pull.

Thanks,

Joerg


signature.asc
Description: Digital signature
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH v6 0/7] KVM PCIe/MSI passthrough on ARM/ARM64: kernel part 1/3: iommu changes

2016-04-08 Thread Eric Auger
Hi Alex,
On 04/07/2016 07:50 PM, Alex Williamson wrote:
> On Thu, 7 Apr 2016 14:28:59 +0200
> Eric Auger  wrote:
> 
>> Hi Alex,
>> On 04/07/2016 01:15 AM, Alex Williamson wrote:
>>> On Mon,  4 Apr 2016 08:06:55 +
>>> Eric Auger  wrote:
>>>   
 This series introduces the dma-reserved-iommu api used to:
 - create/destroy an iova domain dedicated to reserved iova bindings
 - map/unmap physical addresses onto reserved IOVAs.
 - unmap and destroy all IOVA reserved bindings  
>>>
>>> Why are we making the decision to have an unbalanced map vs unmap, we
>>> can create individual mappings, but only unmap the whole thing and
>>> start over?  That's a strange interface.  Thanks,  
>> The "individual" balanced unmap also exists (iommu_put_reserved_iova)
>> and this is the "normal" path. This happens on msi_domain_deactivate
>> (and possibly on msi_domain_set_affinity).
>>
>> I added iommu_unmap_reserved to handle the case where the userspace
>> registers a reserved iova domain and fails to unregister it. In that
>> case one need to handle the cleanup on kernel-side and I chose to
>> implement this on vfio_iommu_type1 release. All the reserved IOMMU
>> bindings get destroyed on that event.
>>
>> Any advice to handle this situation?
> 
> If we want to model it similar to regular iommu domains, then
> iommu_free_reserved_iova_domain() should release all the mappings and
> destroy the iova domain.
Yes this sounds obvious now.
  Additionally, since the reserved iova domain
> is just a construct on top of an iommu domain, it should be sufficient
> to call iommu_domain_free() to also remove the reserved iova domain if
> one exists.  Thanks,
Yes. For dma cookie (iommu_put_dma_cookie) I see this is done from the
iommu driver domain_free callback.

Thanks

Eric
> 
> Alex
> 

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: AMD-Vi regression with PCIe/PCI bridges in 4.4

2016-04-08 Thread Joerg Roedel
On Wed, Apr 06, 2016 at 05:52:57PM +0200, Tomasz Golinski wrote:
> I'm using Asus Xonar DX card: https://www.asus.com/Sound-Cards/Xonar_DX/

Okay, with this card I was able to reproduce the problem myself. Can you
guys please test if the attached patch fixes the issues for you as it
did for me?


Thanks,

Joerg

>From e76e8764c9265b5dec31e6a8e5be61f028866d93 Mon Sep 17 00:00:00 2001
From: Joerg Roedel 
Date: Fri, 8 Apr 2016 15:12:24 +0200
Subject: [PATCH] iommu/amd: Fix checking of pci dma aliases

Commit 61289cb ('iommu/amd: Remove old alias handling code')
removed the old alias handling code from the AMD IOMMU
driver because this is now handled by the IOMMU core code.

But this also removed the handling of PCI aliases, which is
not handled by the core code. This caused issues with PCI
devices that have hidden PCIe-to-PCI bridges that rewrite
the request-id.

Fix this bug by re-introducing some of the removed functions
from commit 61289cbaf6c8 and add a alias field
'struct iommu_dev_data'. This field carrys the return value
of the get_alias() function and uses that instead of the
amd_iommu_alias_table[] array in the code.

Fixes: 61289cbaf6c8 ('iommu/amd: Remove old alias handling code')
Cc: sta...@vger.kernel.org # v4.4+
Signed-off-by: Joerg Roedel 
---
 drivers/iommu/amd_iommu.c | 87 +--
 1 file changed, 76 insertions(+), 11 deletions(-)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index 374c129..5efadad 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -92,6 +92,7 @@ struct iommu_dev_data {
struct list_head dev_data_list;   /* For global dev_data_list */
struct protection_domain *domain; /* Domain the device is bound to */
u16 devid;/* PCI Device ID */
+   u16 alias;/* Alias Device ID */
bool iommu_v2;/* Device can make use of IOMMUv2 */
bool passthrough; /* Device is identity mapped */
struct {
@@ -166,6 +167,13 @@ static struct protection_domain *to_pdomain(struct 
iommu_domain *dom)
return container_of(dom, struct protection_domain, domain);
 }
 
+static inline u16 get_device_id(struct device *dev)
+{
+   struct pci_dev *pdev = to_pci_dev(dev);
+
+   return PCI_DEVID(pdev->bus->number, pdev->devfn);
+}
+
 static struct iommu_dev_data *alloc_dev_data(u16 devid)
 {
struct iommu_dev_data *dev_data;
@@ -203,6 +211,68 @@ out_unlock:
return dev_data;
 }
 
+static int __last_alias(struct pci_dev *pdev, u16 alias, void *data)
+{
+   *(u16 *)data = alias;
+   return 0;
+}
+
+static u16 get_alias(struct device *dev)
+{
+   struct pci_dev *pdev = to_pci_dev(dev);
+   u16 devid, ivrs_alias, pci_alias;
+
+   devid = get_device_id(dev);
+   ivrs_alias = amd_iommu_alias_table[devid];
+   pci_for_each_dma_alias(pdev, __last_alias, _alias);
+
+   if (ivrs_alias == pci_alias)
+   return ivrs_alias;
+
+   /*
+* DMA alias showdown
+*
+* The IVRS is fairly reliable in telling us about aliases, but it
+* can't know about every screwy device.  If we don't have an IVRS
+* reported alias, use the PCI reported alias.  In that case we may
+* still need to initialize the rlookup and dev_table entries if the
+* alias is to a non-existent device.
+*/
+   if (ivrs_alias == devid) {
+   if (!amd_iommu_rlookup_table[pci_alias]) {
+   amd_iommu_rlookup_table[pci_alias] =
+   amd_iommu_rlookup_table[devid];
+   memcpy(amd_iommu_dev_table[pci_alias].data,
+  amd_iommu_dev_table[devid].data,
+  sizeof(amd_iommu_dev_table[pci_alias].data));
+   }
+
+   return pci_alias;
+   }
+
+   pr_info("AMD-Vi: Using IVRS reported alias %02x:%02x.%d "
+   "for device %s[%04x:%04x], kernel reported alias "
+   "%02x:%02x.%d\n", PCI_BUS_NUM(ivrs_alias), PCI_SLOT(ivrs_alias),
+   PCI_FUNC(ivrs_alias), dev_name(dev), pdev->vendor, pdev->device,
+   PCI_BUS_NUM(pci_alias), PCI_SLOT(pci_alias),
+   PCI_FUNC(pci_alias));
+
+   /*
+* If we don't have a PCI DMA alias and the IVRS alias is on the same
+* bus, then the IVRS table may know about a quirk that we don't.
+*/
+   if (pci_alias == devid &&
+   PCI_BUS_NUM(ivrs_alias) == pdev->bus->number) {
+   pdev->dev_flags |= PCI_DEV_FLAGS_DMA_ALIAS_DEVFN;
+   pdev->dma_alias_devfn = ivrs_alias & 0xff;
+   pr_info("AMD-Vi: Added PCI DMA alias %02x.%d for %s\n",
+   PCI_SLOT(ivrs_alias), PCI_FUNC(ivrs_alias),
+   dev_name(dev));
+   }
+
+   return 

Re: [PATCH v2 1/2] dma/iommu: Add pgsize_bitmap confirmation in __iommu_dma_alloc_pages

2016-04-08 Thread Will Deacon
On Tue, Apr 05, 2016 at 10:03:32AM -0700, Doug Anderson wrote:
> On Tue, Mar 29, 2016 at 10:02 AM, Will Deacon  wrote:
> > On Mon, Mar 28, 2016 at 02:32:11PM +0800, Yong Wu wrote:
> >> @@ -213,13 +215,16 @@ static struct page 
> >> **__iommu_dma_alloc_pages(unsigned int count, gfp_t gfp)
> >>   /*
> >>* Higher-order allocations are a convenience rather
> >>* than a necessity, hence using __GFP_NORETRY until
> >> -  * falling back to single-page allocations.
> >> +  * falling back to min size allocations.
> >>*/
> >> - for (order = min_t(unsigned int, order, __fls(count));
> >> -  order > 0; order--) {
> >> - page = alloc_pages(gfp | __GFP_NORETRY, order);
> >> + for (order = min_t(int, order, __fls(count));
> >> +  order >= min_order; order--) {
> >> + page = alloc_pages((order == min_order) ? gfp :
> >> +gfp | __GFP_NORETRY, order);
> >>   if (!page)
> >>   continue;
> >> + if (!order)
> >> + break;
> >
> > Isn't this handled by the loop condition?
> 
> He changed the loop condition to be ">= min_order" instead of "> 0",
> so now we can get here with an order == 0.  This makes sense because
> when min_order is not 0 you still want to run the code to split the
> pages and it is sane not to duplicate that below.
> 
> Maybe I'm misunderstanding, though.  Perhaps you can explain how you
> think this code should look?

My reading of the code was that we require order >= min_order to enter
the loop. Given that order doesn't change between the loop header and the
if (!order) check, then that must mean we can enter the loop body with
order == 0 and order >= min_order, which means that min_order is allowed
to be negative. That feels weird.

Am I barking up the wrong tree?

Will
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu