Re: [PATCH 0/5] arm-smmu: performance optimization

2017-08-17 Thread Leizhen (ThunderTown)


On 2017/8/17 22:36, Will Deacon wrote:
> Thunder, Nate, Robin,
> 
> On Mon, Jun 26, 2017 at 09:38:45PM +0800, Zhen Lei wrote:
>> I described the optimization in more detail in patches 1 and 2; patches 3-5
>> are the implementation of patch 2 for arm-smmu/arm-smmu-v3.
>>
>> Patch 1 is v2. In v1, I directly replaced writel with writel_relaxed in
>> queue_inc_prod, but Robin pointed out that it may lead the SMMU to consume
>> stale memory contents. I thought about it for more than 3 whole days and
>> came up with this one.
>>
>> This patchset is based on Robin Murphy's [PATCH v2 0/8] io-pgtable lock 
>> removal.
> 
> For the time being, I think we should focus on the new TLB flushing
> interface posted by Joerg:
> 
> http://lkml.kernel.org/r/1502974596-23835-1-git-send-email-j...@8bytes.org
> 
> which looks like it can give us most of the benefits of this series. Once
> we've got that, we can see what's left in the way of performance and focus
> on the cmdq batching separately (because I'm still not convinced about it).
OK, that is good news.

But I have a review comment (sorry, I have not subscribed to the list yet, so
I cannot reply to it directly):
I don't think we should add a TLB sync for the map operation, because:
1. at init time, all TLBs are invalidated;
2. when we map a new range, there are no related PTEs buffered in the TLB,
because of 1 above and 3 below;
3. when we unmap that range, all related PTEs buffered in the TLB are
invalidated before the unmap finishes.

> 
> Thanks,
> 
> Will
> 
> 

-- 
Thanks!
Best Regards

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [RFC v2 2/4] iommu/arm-smmu-v3: Add tlbi_on_map option

2017-08-17 Thread Michael S. Tsirkin
On Thu, Aug 17, 2017 at 07:47:04PM +0200, Auger Eric wrote:
> I will see with Peter and other potential users in the community whether
> it is worth pursuing the effort of upstreaming the QEMU vSMMUv3 device,
> considering that the VFIO/VHOST integration is made impossible.

I posted more ideas, in a separate mail, on finding a way to support it after all.
Even without this:
1. VHOST does not need to be notified on map.
2. VFIO might be possible for hardware that supports nested page tables.

IMHO 2 is worth looking into.

-- 
MST


Re: [RFC v2 2/4] iommu/arm-smmu-v3: Add tlbi_on_map option

2017-08-17 Thread Michael S. Tsirkin
On Thu, Aug 17, 2017 at 05:34:25PM +0100, Will Deacon wrote:
> On Fri, Aug 11, 2017 at 03:45:28PM +0200, Eric Auger wrote:
> > When running a virtual SMMU on a guest we sometimes need to trap
> > all changes to the translation structures. This is especially useful
> > to integrate with VFIO. This patch adds a new option that forces
> > the IO_PGTABLE_QUIRK_TLBI_ON_MAP to be applied on LPAE page tables.
> > 
> > TLBI commands then can be trapped.
> > 
> > Signed-off-by: Eric Auger 
> > 
> > ---
> > v1 -> v2:
> > - rebase on v4.13-rc2
> > ---
> >  Documentation/devicetree/bindings/iommu/arm,smmu-v3.txt | 4 ++++
> >  drivers/iommu/arm-smmu-v3.c | 5 +++++
> >  2 files changed, 9 insertions(+)
> > 
> > diff --git a/Documentation/devicetree/bindings/iommu/arm,smmu-v3.txt 
> > b/Documentation/devicetree/bindings/iommu/arm,smmu-v3.txt
> > index c9abbf3..ebb85e9 100644
> > --- a/Documentation/devicetree/bindings/iommu/arm,smmu-v3.txt
> > +++ b/Documentation/devicetree/bindings/iommu/arm,smmu-v3.txt
> > @@ -52,6 +52,10 @@ the PCIe specification.
> >  devicetree/bindings/interrupt-controller/msi.txt
> >for a description of the msi-parent property.
> >  
> > +- tlbi-on-map   : invalidate caches whenever there is an update of
> > +  any remapping structure (updates to not-present or
> > +  present entries).
> > +
> 
> My position on this hasn't changed, so NAK for this patch. If you want to
> emulate something outside of the SMMUv3 architecture, please do so, but
> don't pretend that it's an SMMUv3.
> 
> Will

What if the emulated device does not list arm,smmu-v3 as compatible, listing
qemu,smmu-v3 instead? Would that address the concern?

The way I see it, down the road most people will want to use nested
SMMU support in hardware, so things will work fine without this hack.

For those that can't for some reason, reusing most of the code
in a real smmu driver allows better coverage than a completely
separate device.

Consider the hypervisor as hardware - yes, it's not exactly an smmu-v3, but
it's very similar, so IMHO it's better to reuse existing code than to
duplicate all of it.



-- 
MST


Re: [PATCH] iommu: Avoid NULL group dereference

2017-08-17 Thread Shawn Lin

Hi

On 2017/8/17 18:40, Robin Murphy wrote:

The recently-removed FIXME in iommu_get_domain_for_dev() turns out to
have been a little misleading, since that check is still worthwhile even
when groups *are* universal. We have a few IOMMU-aware drivers which
only care whether their device is already attached to an existing domain
or not, for which the previous behaviour of iommu_get_domain_for_dev()
was ideal, and who now crash if their device does not have an IOMMU.



It works, thanks!

Tested-by: Shawn Lin 



With IOMMU groups now serving as a reliable indicator of whether a
device has an IOMMU or not (barring false-positives from VFIO no-IOMMU
mode), drivers could arguably do this:

group = iommu_group_get(dev);
if (group) {
	domain = iommu_get_domain_for_dev(dev);
	iommu_group_put(group);
}

However, rather than duplicate that code across multiple callsites,
particularly when it's still only the domain they care about, let's skip
straight to the next step and factor out the check into the common place
it applies - in iommu_get_domain_for_dev() itself. Sure, it ends up
looking rather familiar, but now it's backed by the reasoning of having
a robust API able to do the expected thing for all devices regardless.

Fixes: 05f80300dc8b ("iommu: Finish making iommu_group support mandatory")
Reported-by: Shawn Lin 
Signed-off-by: Robin Murphy 
---

As well as dma-iommu, there are at least the Cavium ThunderX and
Freescale DPAA2 ethernet drivers expecting this to work too.

  drivers/iommu/iommu.c | 2 ++
  1 file changed, 2 insertions(+)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index af69bf7e035a..5499a0387349 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1352,6 +1352,8 @@ struct iommu_domain *iommu_get_domain_for_dev(struct device *dev)
 	struct iommu_group *group;
 
 	group = iommu_group_get(dev);
+	if (!group)
+		return NULL;
 
 	domain = group->domain;





Re: [PATCH 02/13] iommu: Introduce Interface for IOMMU TLB Flushing

2017-08-17 Thread Joerg Roedel
Hi Will,

On Thu, Aug 17, 2017 at 06:17:05PM +0100, Will Deacon wrote:
> On Thu, Aug 17, 2017 at 06:50:40PM +0200, Joerg Roedel wrote:
> > Problem currently is how to get this information from
> > 'struct iommu_device' to 'struct iommu_domain'. As a workaround I
> > consider a per-domain flag in the iommu drivers which checks whether any
> > unmap has happened and just do nothing on the flush-call-back if there
> > were none.
> 
> Given that this can all happen concurrently, I really don't like the idea of
> having to track things with a flag. We'd end up introducing atomics and/or
> over-invalidating the TLBs.

Okay, I'll look into a better solution for that.

> We don't actually tend to see issues adding the TLB invalidation commands
> under the lock -- the vast majority of the overhead comes from the SYNC.
> Besides, I don't see how adding the commands in the ->iotlb_range_add
> callback is any better: it still happens on unmap and it still needs to
> take the lock.

With the deferred flushing you don't flush anything in the unmap path in
most cases. All you do there is add the unmapped iova-range to a per-cpu
list (it's actually a ring-buffer). Only when that buffer is full do you do a
flush_tlb_all() on the domain and then free all the iova ranges.

With the flush-counters you can also see which entries in your buffer
have already been flushed from the IO/TLB by another CPU, so that you
can release them right away without any further flush. This way it's less
likely that the buffer fills up.

In my tests on x86 I got the flush-rate down to ~1800 flushes/sec at a
network packet rate of 1.45 million pps.

> There are a few reasons I'm not rushing to move to the deferred flushing
> code for ARM:
> 
>   1. The performance numbers we have suggest that we can achieve near-native
>  performance without needing to do that.

Hard to believe when all CPUs fight for the cmd-buffer lock, especially
when you have around 96 CPUs :) Can you share the performance numbers
you have and what you measured?

>   2. We can free page-table pages in unmap, but that's not safe if we defer
>  flushing

Right, VT-d has the same problem and solved it with a free-list of pages
that is passed to the deferred flushing code. When the IO/TLB is flushed
it calls back into the driver which then frees the pages.

>   3. Flushing the whole TLB is undesirable and not something we currently
>  need to do

Is the TLB-refill cost higher than the time needed to add a
flush-command for every unmapped range?

>   4. There are security implications of deferring the unmap and I'm aware
>  of a security research group that use this to gain root privileges.

Interesting, can you share more about that?

>   5. *If* performance figures end up showing that deferring the flush is
>  worthwhile, I would rather use an RCU-based approach for protecting
>  the page tables, like we do on the CPU.

Yeah, I don't like the entry_dtor_cb() I introduced for that very much, so if
there are better solutions I am all ears.


Regards,

Joerg



Re: [RFC PATCH v5 0/5] vfio-pci: Add support for mmapping MSI-X table

2017-08-17 Thread Alex Williamson
On Thu, 17 Aug 2017 10:56:35 +
David Laight  wrote:

> From: Alex Williamson
> > Sent: 16 August 2017 17:56  
> ...
> > Firmware pissing match...  Processors running with 8k or less page size
> > fall within the recommendations of the PCI spec for register alignment
> > of MMIO regions of the device and this whole problem becomes less of an
> > issue.  
> 
> Actually if qemu is causing the MSI-X table accesses to fault, why doesn't
> it just lie to the guest about the physical address of the MSI-X table?
> Then mmio access to anything in the same physical page will just work.

That's an interesting idea, but now you need to add a BAR for the
virtualized vector table, but you'll also need to support extending a
BAR because there won't necessarily be a BAR available to add.  Of
course PCI requires natural alignment of BARs, thus an extra few bytes
on the end doubles the BAR size.  So also hope that if we need to
extend a BAR that there's a relatively small one available.  In either
case you're changing the layout of the device from what the driver might
expect.  We try pretty hard with device assignment to leave things in
the same place as they appear on bare metal, perhaps removing things,
but not actually moving things.  It might work in the majority of
cases, but it seems a bit precarious overall.  Thanks,

Alex


[PATCH v2 4/4] iommu: Prevent VMD child devices from being remapping targets

2017-08-17 Thread Jon Derrick
VMD child devices must use the VMD endpoint's ID as the requester.
Because of this, there needs to be a way to link the parent VMD
endpoint's iommu group and associated mappings to the VMD child devices
such that attaching and detaching child devices modify the endpoint's
mappings, while preventing early detaching on a singular device removal
or unbinding.

The reassignment of individual VMD child devices to VMs is outside the
scope of VMD, but may be implemented in the future. For now it is best to
prevent any such attempts.

This patch prevents VMD child devices from returning an IOMMU, which
prevents them from exposing iommu_group sysfs directories and allowing
subsequent binding by userspace-access drivers such as VFIO.

Signed-off-by: Jon Derrick 
---
 drivers/iommu/intel-iommu.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 687f18f..94353a6e 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -901,6 +901,11 @@ static struct intel_iommu *device_to_iommu(struct device *dev, u8 *bus, u8 *devfn)
struct pci_dev *pf_pdev;
 
pdev = to_pci_dev(dev);
+
+   /* VMD child devices currently cannot be handled individually */
+   if (is_vmd(pdev->bus))
+   return NULL;
+
/* VFs aren't listed in scope tables; we need to look up
 * the PF instead to find the IOMMU. */
pf_pdev = pci_physfn(pdev);
-- 
2.9.4



[PATCH v2 0/4] VMD fixups

2017-08-17 Thread Jon Derrick
Mostly just cleanup in this revision, e.g. trying to limit the scope of the
VMD code to x86.

Previous:
https://patchwork.kernel.org/patch/9886095/
https://patchwork.kernel.org/patch/9886097/
https://patchwork.kernel.org/patch/9886101/


Jon Derrick (4):
  MAINTAINERS: Add Jonathan Derrick as VMD maintainer
  pci/x86: Move VMD quirks to x86 fixups
  x86/PCI: Use is_vmd rather than relying on the domain number
  iommu: Prevent VMD child devices from being remapping targets

 MAINTAINERS |  1 +
 arch/x86/pci/fixup.c| 18 ++
 drivers/iommu/intel-iommu.c |  5 +
 drivers/pci/quirks.c| 17 -
 4 files changed, 24 insertions(+), 17 deletions(-)

-- 
2.9.4



[PATCH v2 3/4] x86/PCI: Use is_vmd rather than relying on the domain number

2017-08-17 Thread Jon Derrick
Signed-off-by: Jon Derrick 
---
 arch/x86/pci/fixup.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/pci/fixup.c b/arch/x86/pci/fixup.c
index ca4b02e5..1ed2fbf 100644
--- a/arch/x86/pci/fixup.c
+++ b/arch/x86/pci/fixup.c
@@ -628,7 +628,7 @@ DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x8c10, quirk_apple_mbp_poweroff);
 static void quirk_no_aersid(struct pci_dev *pdev)
 {
/* VMD Domain */
-   if (pdev->bus->sysdata && pci_domain_nr(pdev->bus) >= 0x1)
+   if (is_vmd(pdev->bus))
pdev->bus->bus_flags |= PCI_BUS_FLAGS_NO_AERSID;
 }
 DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x2030, quirk_no_aersid);
-- 
2.9.4



[PATCH v2 2/4] pci/x86: Move VMD quirks to x86 fixups

2017-08-17 Thread Jon Derrick
VMD currently only exists for Intel x86 products

Signed-off-by: Jon Derrick 
---
 arch/x86/pci/fixup.c | 18 ++
 drivers/pci/quirks.c | 17 -
 2 files changed, 18 insertions(+), 17 deletions(-)

diff --git a/arch/x86/pci/fixup.c b/arch/x86/pci/fixup.c
index 11e4074..ca4b02e5 100644
--- a/arch/x86/pci/fixup.c
+++ b/arch/x86/pci/fixup.c
@@ -618,3 +618,21 @@ static void quirk_apple_mbp_poweroff(struct pci_dev *pdev)
dev_info(dev, "can't work around MacBook Pro poweroff issue\n");
 }
DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x8c10, quirk_apple_mbp_poweroff);
+
+/*
+ * VMD-enabled root ports will change the source ID for all messages
+ * to the VMD device. Rather than doing device matching with the source
+ * ID, the AER driver should traverse the child device tree, reading
+ * AER registers to find the faulting device.
+ */
+static void quirk_no_aersid(struct pci_dev *pdev)
+{
+   /* VMD Domain */
+   if (pdev->bus->sysdata && pci_domain_nr(pdev->bus) >= 0x1)
+   pdev->bus->bus_flags |= PCI_BUS_FLAGS_NO_AERSID;
+}
+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x2030, quirk_no_aersid);
+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x2031, quirk_no_aersid);
+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x2032, quirk_no_aersid);
+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x2033, quirk_no_aersid);
+
diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index f1c9852..073baba 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -4671,23 +4671,6 @@ static void quirk_intel_qat_vf_cap(struct pci_dev *pdev)
 }
 DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x443, quirk_intel_qat_vf_cap);
 
-/*
- * VMD-enabled root ports will change the source ID for all messages
- * to the VMD device. Rather than doing device matching with the source
- * ID, the AER driver should traverse the child device tree, reading
- * AER registers to find the faulting device.
- */
-static void quirk_no_aersid(struct pci_dev *pdev)
-{
-   /* VMD Domain */
-   if (pdev->bus->sysdata && pci_domain_nr(pdev->bus) >= 0x1)
-   pdev->bus->bus_flags |= PCI_BUS_FLAGS_NO_AERSID;
-}
-DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x2030, quirk_no_aersid);
-DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x2031, quirk_no_aersid);
-DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x2032, quirk_no_aersid);
-DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x2033, quirk_no_aersid);
-
 /* FLR may cause some 82579 devices to hang. */
 static void quirk_intel_no_flr(struct pci_dev *dev)
 {
-- 
2.9.4



[PATCH v2 1/4] MAINTAINERS: Add Jonathan Derrick as VMD maintainer

2017-08-17 Thread Jon Derrick
Add myself as VMD maintainer

Signed-off-by: Jon Derrick 
Acked-by: Keith Busch 
---
 MAINTAINERS | 1 +
 1 file changed, 1 insertion(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index f66488d..3ec39df 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -10090,6 +10090,7 @@ F:  drivers/pci/dwc/*imx6*
 
 PCI DRIVER FOR INTEL VOLUME MANAGEMENT DEVICE (VMD)
 M: Keith Busch 
+M: Jonathan Derrick 
 L: linux-...@vger.kernel.org
 S: Supported
 F: drivers/pci/host/vmd.c
-- 
2.9.4



Re: [RFC v2 2/4] iommu/arm-smmu-v3: Add tlbi_on_map option

2017-08-17 Thread Auger Eric
Hi Will,

On 17/08/2017 18:34, Will Deacon wrote:
> On Fri, Aug 11, 2017 at 03:45:28PM +0200, Eric Auger wrote:
>> When running a virtual SMMU on a guest we sometimes need to trap
>> all changes to the translation structures. This is especially useful
>> to integrate with VFIO. This patch adds a new option that forces
>> the IO_PGTABLE_QUIRK_TLBI_ON_MAP to be applied on LPAE page tables.
>>
>> TLBI commands then can be trapped.
>>
>> Signed-off-by: Eric Auger 
>>
>> ---
>> v1 -> v2:
>> - rebase on v4.13-rc2
>> ---
>>  Documentation/devicetree/bindings/iommu/arm,smmu-v3.txt | 4 ++++
>>  drivers/iommu/arm-smmu-v3.c | 5 +++++
>>  2 files changed, 9 insertions(+)
>>
>> diff --git a/Documentation/devicetree/bindings/iommu/arm,smmu-v3.txt 
>> b/Documentation/devicetree/bindings/iommu/arm,smmu-v3.txt
>> index c9abbf3..ebb85e9 100644
>> --- a/Documentation/devicetree/bindings/iommu/arm,smmu-v3.txt
>> +++ b/Documentation/devicetree/bindings/iommu/arm,smmu-v3.txt
>> @@ -52,6 +52,10 @@ the PCIe specification.
>>  devicetree/bindings/interrupt-controller/msi.txt
>>for a description of the msi-parent property.
>>  
>> +- tlbi-on-map   : invalidate caches whenever there is an update of
>> +  any remapping structure (updates to not-present or
>> +  present entries).
>> +
> 
> My position on this hasn't changed, so NAK for this patch. If you want to
> emulate something outside of the SMMUv3 architecture, please do so, but
> don't pretend that it's an SMMUv3.
OK, understood. I wanted to go down that road and enable the DPDK use case
using the same solution as Intel.

So to me the approach is not suited to SMMUv3, because:
- SMMUv3 does not expose anything similar to the Intel Caching Mode
(which, when set, indicates that the OS needs to invalidate the TLB on map)
- SMMUv3 does not expose any IOVA range TLB invalidation command.

Do you agree with this conclusion? ACPI enablement was not a showstopper,
I think.

I will see with Peter and other potential users in the community whether
it is worth pursuing the effort of upstreaming the QEMU vSMMUv3 device,
considering that the VFIO/VHOST integration is made impossible.

Thanks

Eric

> 
> Will
> 


Re: [PATCH 2/2] iommu/tegra-gart: Add support for struct iommu_device

2017-08-17 Thread Dmitry Osipenko
On 17.08.2017 16:52, Thierry Reding wrote:
> On Thu, Aug 17, 2017 at 01:21:52AM +0300, Dmitry Osipenko wrote:
>> Hello Joerg,
>>
>> On 10.08.2017 01:29, Joerg Roedel wrote:
>>> From: Joerg Roedel 
>>>
>>> Add a struct iommu_device to each tegra-gart and register it
>>> with the iommu-core. Also link devices added to the driver
>>> to their respective hardware iommus.
>>>
>>> Signed-off-by: Joerg Roedel 
>>> ---
>>>  drivers/iommu/tegra-gart.c | 26 ++
>>>  1 file changed, 26 insertions(+)
>>>
>>
>> Reviewed-by: Dmitry Osipenko 
>> Tested-by: Dmitry Osipenko 
>>
>>> diff --git a/drivers/iommu/tegra-gart.c b/drivers/iommu/tegra-gart.c
>>> index 29bafc6..b62f790 100644
>>> --- a/drivers/iommu/tegra-gart.c
>>> +++ b/drivers/iommu/tegra-gart.c
>>
>> [snip]
>>
>>> @@ -449,6 +472,9 @@ static int tegra_gart_remove(struct platform_device *pdev)
>>>  {
>>
BTW, the GART driver can't be built as a module, so this function is pretty
much dead code. Probably worth considering its removal.
> 
> Technically you can unbind the driver via sysfs, in which case this
> function would still be called. That said, all sorts of things will
> probably start to crash when you do that. I'd like to think that we
> will eventually be able to deal with this sanely (there's some work
> in progress to establish stronger links between devices, so that we
> can sanely deal with this kind of dependency), so I think it's okay
> to keep this around in case we ever get there.
> 

Good point! I tried to unbind, and with this patch the kernel crashes immediately:

[  193.968506] kernel BUG at mm/slab.c:2816!
[  193.968912] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP ARM
[  193.969466] Modules linked in:
[  193.969822] CPU: 1 PID: 1177 Comm: bash Not tainted
4.13.0-rc5-next-20170816-00067-gd320d19b76f8 #593
[  193.970653] Hardware name: NVIDIA Tegra SoC (Flattened Device Tree)
[  193.971313] task: ee31e380 task.stack: ee394000
[  193.971771] PC is at ___cache_free+0x374/0x47c
[  193.972261] LR is at 0x1
[  193.972539] pc : []lr : [<0001>]psr: 200b0093
[  193.973112] sp : ee395dd0  ip : 0009  fp : 
[  193.973603] r10: 0480  r9 : 2e844000  r8 : c1814d8c
[  193.974097] r7 : 005e4c40  r6 : c0f91fc0  r5 : ef262480  r4 : ef0003c0
[  193.974694] r3 : ef262400  r2 : ef262000  r1 : 0400  r0 : 0004

[snip]

[  193.991288] [] (___cache_free) from [] (kfree+0xbc/0x260)
[  193.991978] [] (kfree) from [] (device_release+0x2c/0x90)
[  193.992732] [] (device_release) from []
(kobject_put+0x8c/0xe8)
[  193.993474] [] (kobject_put) from []
(tegra_gart_remove+0x1c/0x58)
[  193.994320] [] (tegra_gart_remove) from []
(platform_drv_remove+0x24/0x3c)
[  193.995121] [] (platform_drv_remove) from []
(device_release_driver_internal+0x15c/0x204)
[  193.996033] [] (device_release_driver_internal) from []
(unbind_store+0x7c/0xfc)
[  193.996890] [] (unbind_store) from []
(kernfs_fop_write+0x104/0x208)
[  193.997661] [] (kernfs_fop_write) from []
(__vfs_write+0x1c/0x128)
[  193.998445] [] (__vfs_write) from [] 
(vfs_write+0xa4/0x168)
[  194.017668] [] (vfs_write) from [] (SyS_write+0x3c/0x90)
[  194.038493] [] (SyS_write) from []
(ret_fast_syscall+0x0/0x1c)
[  194.057182] Code: e3a03000 ebfff54e eae2 e7f001f2 (e7f001f2)

Without this patch it crashes too after unbinding, but not immediately.
Either way, unbinding isn't useful for this device.

> I don't have any objections to making the driver unloadable if that
> is what everyone else prefers, though. In that case, however, there
> are more steps involved than just removing the ->remove() callback.
> 
Indeed!

-- 
Dmitry


Re: [PATCH 02/13] iommu: Introduce Interface for IOMMU TLB Flushing

2017-08-17 Thread Will Deacon
Hi Joerg,

On Thu, Aug 17, 2017 at 06:50:40PM +0200, Joerg Roedel wrote:
> On Thu, Aug 17, 2017 at 05:32:35PM +0100, Will Deacon wrote:
> > I really like the idea of this, but I have a couple of questions and
> > comments below.
> 
> Great, together with the common iova-flush this should make it possible
> to solve the performance problems of the dma-iommu code.
> 
> > > +int iommu_map_sync(struct iommu_domain *domain, unsigned long iova,
> > > +phys_addr_t paddr, size_t size, int prot)
> > > +{
> > > + int ret = iommu_map(domain, iova, paddr, size, prot);
> > > +
> > > + iommu_tlb_range_add(domain, iova, size);
> > > + iommu_tlb_sync(domain);
> > 
> > Many IOMMUs don't need these callbacks on ->map operations, but they won't
> > be able to distinguish them easily with this API. Could you add a flags
> > parameter or something to the iommu_tlb_* functions, please?
> 
> Yeah, this is only needed for virtualized IOMMUs that have a non-present
> cache. My idea was to let the iommu drivers tell the common code whether
> the iommu needs it; the code above then just checks a flag and omits the
> calls to the flush functions.
> 
> The problem currently is how to get this information from
> 'struct iommu_device' to 'struct iommu_domain'. As a workaround I am
> considering a per-domain flag in the iommu drivers which tracks whether
> any unmap has happened, and doing nothing in the flush callback if there
> were none.

Given that this can all happen concurrently, I really don't like the idea of
having to track things with a flag. We'd end up introducing atomics and/or
over-invalidating the TLBs.

> > I think we will struggle to implement this efficiently on ARM SMMUv3. The
> > way invalidation works there is that there is a single in-memory command
> > queue into which we can put TLB invalidation commands (they are inserted
> > under a lock). These are then processed asynchronously by the hardware, and
> > you can complete them by inserting a SYNC command and waiting for that to
> > be consumed by the SMMU. Sounds like a perfect fit, right?
> 
> Yes, it's basically the same way as it works on AMD-Vi and Intel VT-d.
> 
> > The problem is that we want to add those invalidation commands as early
> > as possible, so that they can be processed by the hardware concurrently
> > with us unmapping other pages.
> 
> I think that's a bad idea, because then you re-introduce the performance
> problems again because everyone will spin on the cmd-queue lock in the
> unmap path of the dma-api.

We don't actually tend to see issues adding the TLB invalidation commands
under the lock -- the vast majority of the overhead comes from the SYNC.
Besides, I don't see how adding the commands in the ->iotlb_range_add
callback is any better: it still happens on unmap and it still needs to
take the lock.

If we had something like an ->unmap_all_sync callback, we could incorporate
the TLB invalidation into that.

> > That means adding the invalidation commands in the ->unmap callback
> > and not bothering to implement ->iotlb_range_add callback at all.
> > Then, we will do the sync in ->iotlb_sync. This falls apart if
> > somebody decides to use iommu_flush_tlb_all(), where we would prefer
> > not to insert all of the invalidation commands in unmap and instead
> > insert a single invalidate-all command, followed up with a SYNC.
> 
> This problem can be solved with the deferred iova flushing code I posted
> to the ML. When a queue fills up, iommu_flush_tlb_all() is called and
> every entry that was unmapped before can be released. This works well on
> x86; are there reasons it wouldn't on ARM?

There are a few reasons I'm not rushing to move to the deferred flushing
code for ARM:

  1. The performance numbers we have suggest that we can achieve near-native
 performance without needing to do that.

  2. We can free page-table pages in unmap, but that's not safe if we defer
 flushing

  3. Flushing the whole TLB is undesirable and not something we currently
 need to do

  4. There are security implications of deferring the unmap and I'm aware
 of a security research group that use this to gain root privileges.

  5. *If* performance figures end up showing that deferring the flush is
 worthwhile, I would rather use an RCU-based approach for protecting
 the page tables, like we do on the CPU.

Will


Re: [PATCH] iommu: Avoid NULL group dereference

2017-08-17 Thread Robin Murphy
On 17/08/17 16:41, Joerg Roedel wrote:
> On Thu, Aug 17, 2017 at 11:40:08AM +0100, Robin Murphy wrote:
>> The recently-removed FIXME in iommu_get_domain_for_dev() turns out to
>> have been a little misleading, since that check is still worthwhile even
>> when groups *are* universal. We have a few IOMMU-aware drivers which
>> only care whether their device is already attached to an existing domain
>> or not, for which the previous behaviour of iommu_get_domain_for_dev()
>> was ideal, and who now crash if their device does not have an IOMMU.
>>
>> With IOMMU groups now serving as a reliable indicator of whether a
>> device has an IOMMU or not (barring false-positives from VFIO no-IOMMU
>> mode), drivers could arguably do this:
>>
>>  group = iommu_group_get(dev);
>>  if (group) {
>>  domain = iommu_get_domain_for_dev(dev);
>>  iommu_group_put(group);
>>  }
> 
> Okay, so just to check I got it right: Drivers do the above to check
> whether a device is managed by an IOMMU, and that crashes now because
> the 'group == NULL' check was removed?

Indeed - the typical context is network descriptors that don't have
space to store the CPU virtual address of the buffer, so when a packet
arrives the driver has to work backwards from the DMA address, in this
sort of pattern:

addr = desc[idx]->addr;
domain = iommu_get_domain_for_dev(dev);
if (domain)
addr = iommu_iova_to_phys(domain, addr)
buf = phys_to_virt(addr)

(the GIC thing is similar but in reverse, with a physical address which
may or may not need replacing with an IOVA). Unless we were to change
the interface to be iommu_get_domain_for_group(), I think it makes sense
for it to remain valid to call for any device.

Robin.


Re: [PATCH 02/13] iommu: Introduce Interface for IOMMU TLB Flushing

2017-08-17 Thread Joerg Roedel
Hi Will,

On Thu, Aug 17, 2017 at 05:32:35PM +0100, Will Deacon wrote:
> I really like the idea of this, but I have a couple of questions and
> comments below.

Great, together with the common iova-flush this should make it possible
to solve the performance problems of the dma-iommu code.

> > +int iommu_map_sync(struct iommu_domain *domain, unsigned long iova,
> > +  phys_addr_t paddr, size_t size, int prot)
> > +{
> > +   int ret = iommu_map(domain, iova, paddr, size, prot);
> > +
> > +   iommu_tlb_range_add(domain, iova, size);
> > +   iommu_tlb_sync(domain);
> 
> Many IOMMUs don't need these callbacks on ->map operations, but they won't
> be able to distinguish them easily with this API. Could you add a flags
> parameter or something to the iommu_tlb_* functions, please?

Yeah, this is only needed for virtualized IOMMUs that have a non-present
cache. My idea was to let the iommu drivers tell the common code whether
the iommu needs it; the code above then just checks a flag and omits the
calls to the flush functions.

The problem currently is how to get this information from
'struct iommu_device' to 'struct iommu_domain'. As a workaround I am
considering a per-domain flag in the iommu drivers that tracks whether
any unmap has happened, so the flush call-back does nothing if there
was none.

> I think we will struggle to implement this efficiently on ARM SMMUv3. The
> way invalidation works there is that there is a single in-memory command
> queue into which we can put TLB invalidation commands (they are inserted
> under a lock). These are then processed asynchronously by the hardware, and
> you can complete them by inserting a SYNC command and waiting for that to
> be consumed by the SMMU. Sounds like a perfect fit, right?

Yes, it's basically the same way it works on AMD-Vi and Intel VT-d.

> The problem is that we want to add those invalidation commands as early
> as possible, so that they can be processed by the hardware concurrently
> with us unmapping other pages.

I think that's a bad idea, because then you re-introduce the performance
problems again because everyone will spin on the cmd-queue lock in the
unmap path of the dma-api.

> That means adding the invalidation commands in the ->unmap callback
> and not bothering to implement ->iotlb_range_add callback at all.
> Then, we will do the sync in ->iotlb_sync. This falls apart if
> somebody decides to use iommu_flush_tlb_all(), where we would prefer
> not to insert all of the invalidation commands in unmap and instead
> insert a single invalidate-all command, followed up with a SYNC.

This problem can be solved with the deferred iova flushing code I posted
to the ML. When a queue fills up, iommu_flush_tlb_all() is called and
every entry that was unmapped before can be released. This works well on
x86, are there reasons it wouldn't on ARM?

Regards,

Joerg



Re: [RFC v2 2/4] iommu/arm-smmu-v3: Add tlbi_on_map option

2017-08-17 Thread Will Deacon
On Fri, Aug 11, 2017 at 03:45:28PM +0200, Eric Auger wrote:
> When running a virtual SMMU on a guest we sometimes need to trap
> all changes to the translation structures. This is especially useful
> to integrate with VFIO. This patch adds a new option that forces
> the IO_PGTABLE_QUIRK_TLBI_ON_MAP to be applied on LPAE page tables.
> 
> TLBI commands then can be trapped.
> 
> Signed-off-by: Eric Auger 
> 
> ---
> v1 -> v2:
> - rebase on v4.13-rc2
> ---
>  Documentation/devicetree/bindings/iommu/arm,smmu-v3.txt | 4 
>  drivers/iommu/arm-smmu-v3.c | 5 +
>  2 files changed, 9 insertions(+)
> 
> diff --git a/Documentation/devicetree/bindings/iommu/arm,smmu-v3.txt 
> b/Documentation/devicetree/bindings/iommu/arm,smmu-v3.txt
> index c9abbf3..ebb85e9 100644
> --- a/Documentation/devicetree/bindings/iommu/arm,smmu-v3.txt
> +++ b/Documentation/devicetree/bindings/iommu/arm,smmu-v3.txt
> @@ -52,6 +52,10 @@ the PCIe specification.
>  devicetree/bindings/interrupt-controller/msi.txt
>for a description of the msi-parent property.
>  
> +- tlbi-on-map   : invalidate caches whenever there is an update of
> +  any remapping structure (updates to not-present or
> +  present entries).
> +

My position on this hasn't changed, so NAK for this patch. If you want to
emulate something outside of the SMMUv3 architecture, please do so, but
don't pretend that it's an SMMUv3.

Will


Re: [PATCH 02/13] iommu: Introduce Interface for IOMMU TLB Flushing

2017-08-17 Thread Will Deacon
Hi Joerg,

I really like the idea of this, but I have a couple of questions and
comments below.

On Thu, Aug 17, 2017 at 02:56:25PM +0200, Joerg Roedel wrote:
> From: Joerg Roedel 
> 
> With the current IOMMU-API the hardware TLBs have to be
> flushed in every iommu_map(), iommu_map_sg(), and
> iommu_unmap() call.
> 
> For unmapping large amounts of address space, like it
> happens when a KVM domain with assigned devices is
> destroyed, this causes thousands of unnecessary TLB flushes
> in the IOMMU hardware because the unmap call-back runs for
> every unmapped physical page.
> 
> With the TLB Flush Interface introduced here the need to
> clean the hardware TLBs is removed from the iommu_map/unmap
> functions. Users now have to explicitly call these functions
> to sync the page-table changes to the hardware.
> 
> Three functions are introduced:
> 
>   * iommu_flush_tlb_all() - Flushes all TLB entries
> associated with that
> domain. TLB entries are
> flushed when this function
> returns.
> 
>   * iommu_tlb_range_add() - This will add a given
> range to the flush queue
> for this domain.
> 
>   * iommu_tlb_sync() - Flushes all queued ranges from
>the hardware TLBs. Returns when
>the flush is finished.
> 
> The semantic of this interface is intentionally similar to
> the iommu_gather_ops from the io-pgtable code.
> 
> Additionally, this patch introduces synchronized versions of
> the iommu_map(), iommu_map_sg(), and iommu_unmap()
> functions. They will be used by current users of the
> IOMMU-API, before they are optimized to the unsynchronized
> versions.
> 
> Cc: Alex Williamson 
> Cc: Will Deacon 
> Cc: Robin Murphy 
> Signed-off-by: Joerg Roedel 
> ---
>  drivers/iommu/iommu.c | 26 +
>  include/linux/iommu.h | 80 
> ++-
>  2 files changed, 105 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index 3f6ea16..816e248 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -527,6 +527,8 @@ static int iommu_group_create_direct_mappings(struct iommu_group *group,
>  
>   }
>  
> + iommu_flush_tlb_all(domain);
> +
>  out:
>   iommu_put_resv_regions(dev, &mappings);
>  
> @@ -1556,6 +1558,18 @@ int iommu_map(struct iommu_domain *domain, unsigned long iova,
>  }
>  EXPORT_SYMBOL_GPL(iommu_map);
>  
> +int iommu_map_sync(struct iommu_domain *domain, unsigned long iova,
> +phys_addr_t paddr, size_t size, int prot)
> +{
> + int ret = iommu_map(domain, iova, paddr, size, prot);
> +
> + iommu_tlb_range_add(domain, iova, size);
> + iommu_tlb_sync(domain);

Many IOMMUs don't need these callbacks on ->map operations, but they won't
be able to distinguish them easily with this API. Could you add a flags
parameter or something to the iommu_tlb_* functions, please?

> +
> + return ret;
> +}
> +EXPORT_SYMBOL_GPL(iommu_map_sync);
> +
>  size_t iommu_unmap(struct iommu_domain *domain, unsigned long iova, size_t size)
>  {
>   size_t unmapped_page, unmapped = 0;
> @@ -1608,6 +1622,18 @@ size_t iommu_unmap(struct iommu_domain *domain, unsigned long iova, size_t size)
>  }
>  EXPORT_SYMBOL_GPL(iommu_unmap);
>  
> +size_t iommu_unmap_sync(struct iommu_domain *domain,
> + unsigned long iova, size_t size)
> +{
> + size_t ret = iommu_unmap(domain, iova, size);
> +
> + iommu_tlb_range_add(domain, iova, size);
> + iommu_tlb_sync(domain);

I think we will struggle to implement this efficiently on ARM SMMUv3. The
way invalidation works there is that there is a single in-memory command
queue into which we can put TLB invalidation commands (they are inserted
under a lock). These are then processed asynchronously by the hardware, and
you can complete them by inserting a SYNC command and waiting for that to
be consumed by the SMMU. Sounds like a perfect fit, right?

The problem is that we want to add those invalidation commands as early
as possible, so that they can be processed by the hardware concurrently
with us unmapping other pages. That means adding the invalidation commands
in the ->unmap callback and not bothering to implement ->iotlb_range_add
callback at all. Then, we will do the sync in ->iotlb_sync. This falls
apart if somebody decides to use iommu_flush_tlb_all(), where we would
prefer not to insert all of the invalidation commands in unmap and instead
insert a single invalidate-all command, followed up with a SYNC.

In other words, we really need the information about the invalidation as
part of the unmap call.

Any ideas?

Will

Re: [PATCH] iommu: Avoid NULL group dereference

2017-08-17 Thread Joerg Roedel
On Thu, Aug 17, 2017 at 11:40:08AM +0100, Robin Murphy wrote:
> The recently-removed FIXME in iommu_get_domain_for_dev() turns out to
> have been a little misleading, since that check is still worthwhile even
> when groups *are* universal. We have a few IOMMU-aware drivers which
> only care whether their device is already attached to an existing domain
> or not, for which the previous behaviour of iommu_get_domain_for_dev()
> was ideal, and who now crash if their device does not have an IOMMU.
> 
> With IOMMU groups now serving as a reliable indicator of whether a
> device has an IOMMU or not (barring false-positives from VFIO no-IOMMU
> mode), drivers could arguably do this:
> 
>   group = iommu_group_get(dev);
>   if (group) {
>   domain = iommu_get_domain_for_dev(dev);
>   iommu_group_put(group);
>   }

Okay, so just to check I got it right: Drivers do the above to check
whether a device is managed by an IOMMU, and that crashes now because
the 'group == NULL' check was removed?

Regards,

Joerg


Re: [PATCH 1/8] dt-bindings: mediatek: Add binding for mt2712 IOMMU and SMI

2017-08-17 Thread Rob Herring
On Fri, Aug 11, 2017 at 05:56:10PM +0800, Yong Wu wrote:
> This patch adds descriptions for mt2712 IOMMU and SMI.
> 
> In order to balance the bandwidth, mt2712 has two M4Us, two
> smi-commons, 8 smi-larbs. and mt2712 is also MTK IOMMU gen2 which
> use Short-Descriptor translation table format.

s/use/uses/

> 
> The mt2712 M4U-SMI HW diagram is as below:
> 
>                     EMI
>                      |
>            ---------------------
>            |                   |
>           M4U0                M4U1
>            |                   |
>       smi-common0         smi-common1
>            |                   |
>   -------------------      ------------
>   |   |   |   |   |        |    |     |
> larb0 larb1 larb2 larb3 larb6  larb4 larb5 larb7
> disp0 vdec  cam   venc  jpg  mdp1/disp1 mdp2/disp2 mdp3  ->larb names
> 
> All the connections are HW fixed, SW can NOT adjust it.
> 
> Signed-off-by: Yong Wu 
> ---
>  .../devicetree/bindings/iommu/mediatek,iommu.txt   |  6 +-
>  .../memory-controllers/mediatek,smi-common.txt |  6 +-
>  .../memory-controllers/mediatek,smi-larb.txt   |  5 +-
>  include/dt-bindings/memory/mt2712-larb-port.h  | 91 
> ++
>  4 files changed, 102 insertions(+), 6 deletions(-)
>  create mode 100644 include/dt-bindings/memory/mt2712-larb-port.h

Acked-by: Rob Herring 


Re: [PATCH 00/13] Introduce IOMMU-API TLB Flushing Interface

2017-08-17 Thread Joerg Roedel
On Thu, Aug 17, 2017 at 08:54:07AM -0600, Alex Williamson wrote:
> So _sync() does imply that they're slower, but iommu_map() does not
> imply that a _flush() is required.  One is a performance issue, the
> other is an API usability issue.  If the sync version is used
> sub-optimally, it's a performance issue, not a correctness issue.  If
> the async version is used without an explicit flush, it's a correctness
> issue.  Therefore, I would lean towards making the asynchronous mode
> explicit and providing good documentation and comments to steer
> developers to the async version.  I think it makes the API harder to
> use incorrectly.

I agree that it makes the API a bit more complicated to use. But that
can be solved by documenting it and by converting its main users (for me
that is VFIO and dma-iommu.c) to the unsynchronized interface,
because people will look at existing users and just do what they do.

Introducing _sync functions instead of _async ones also has the
side-effect that it puts pressure on the maintainers of the code to
convert it to the async interface, because they now see explicitly that
they use the slow version and start looking for ways to make things
faster.

What I absolutely don't want is that the whole explicit TLB flushing
of the IOMMU-API (as introduced in this patch-set) is considered some
optional part of the API, as it would be when I just introduce _async
versions of map/unmap/map_sg.


Regards,

Joerg



Re: [PATCH 00/13] Introduce IOMMU-API TLB Flushing Interface

2017-08-17 Thread Alex Williamson
On Thu, 17 Aug 2017 16:43:08 +0200
Joerg Roedel  wrote:

> Hi Alex,
> 
> On Thu, Aug 17, 2017 at 08:35:20AM -0600, Alex Williamson wrote:
> > Wouldn't it be much more friendly to downstreams and out-of-tree
> > drivers to introduce new functions for the async semantics? I.e.
> > iommu_map_async(), etc. The API also seems a little cleaner in that
> > iommu_map() stands alone, it's synchronous, iommu_map_async() is
> > explicitly asynchronous and a _flush() call is needed to finalize it.
> > What do you see as the advantage to the approach here?  Thanks,  
> 
> The reason I did it this way was that I want the iommu_map(),
> iommu_unmap(), and iommu_map_sg() functions to be considered the _default_
> to choose when using the IOMMU-API, because their use is faster than
> using the _sync() variants. Or in other words, I want the _sync function
> names to imply that they are slower versions of the default ones.

So _sync() does imply that they're slower, but iommu_map() does not
imply that a _flush() is required.  One is a performance issue, the
other is an API usability issue.  If the sync version is used
sub-optimally, it's a performance issue, not a correctness issue.  If
the async version is used without an explicit flush, it's a correctness
issue.  Therefore, I would lean towards making the asynchronous mode
explicit and providing good documentation and comments to steer
developers to the async version.  I think it makes the API harder to
use incorrectly.  Thanks,

Alex


Re: [PATCH 00/13] Introduce IOMMU-API TLB Flushing Interface

2017-08-17 Thread Joerg Roedel
Hi Alex,

On Thu, Aug 17, 2017 at 08:35:20AM -0600, Alex Williamson wrote:
> Wouldn't it be much more friendly to downstreams and out-of-tree
> drivers to introduce new functions for the async semantics? I.e.
> iommu_map_async(), etc. The API also seems a little cleaner in that
> iommu_map() stands alone, it's synchronous, iommu_map_async() is
> explicitly asynchronous and a _flush() call is needed to finalize it.
> What do you see as the advantage to the approach here?  Thanks,

The reason I did it this way was that I want the iommu_map(),
iommu_unmap(), and iommu_map_sg() functions to be considered the _default_
to choose when using the IOMMU-API, because their use is faster than
using the _sync() variants. Or in other words, I want the _sync function
names to imply that they are slower versions of the default ones.


Regards,

Joerg



Re: [PATCH 0/5] arm-smmu: performance optimization

2017-08-17 Thread Will Deacon
Thunder, Nate, Robin,

On Mon, Jun 26, 2017 at 09:38:45PM +0800, Zhen Lei wrote:
> I described the optimization in more detail in patches 1 and 2; patches 3-5 are
> the implementation of patch 2 on arm-smmu/arm-smmu-v3.
> 
> Patch 1 is v2. In v1, I directly replaced writel with writel_relaxed in
> queue_inc_prod, but Robin pointed out that it may lead the SMMU to consume
> stale memory contents. I thought about it for more than 3 whole days and got this one.
> 
> This patchset is based on Robin Murphy's [PATCH v2 0/8] io-pgtable lock 
> removal.

For the time being, I think we should focus on the new TLB flushing
interface posted by Joerg:

http://lkml.kernel.org/r/1502974596-23835-1-git-send-email-j...@8bytes.org

which looks like it can give us most of the benefits of this series. Once
we've got that, we can see what's left in the way of performance and focus
on the cmdq batching separately (because I'm still not convinced about it).

Thanks,

Will


Re: [PATCH 00/13] Introduce IOMMU-API TLB Flushing Interface

2017-08-17 Thread Alex Williamson
On Thu, 17 Aug 2017 14:56:23 +0200
Joerg Roedel  wrote:

> Hi,
> 
> here is a patch-set to introduce an explicit interface to
> the IOMMU-API to flush IOMMU and device IO/TLBs. Currently
> the iommu_map(), iommu_map_sg(), and iommu_unmap() functions
> have to make sure all IO/TLBs in the system are synchronized
> with the page-table updates they made.
> 
> This is very inefficient in some scenarios, for example when
> a large address space is unmapped and an IO/TLB flush has to
> be done in every call of iommu_unmap(). Or in a scenario
> where it makes sense to queue up some changes to the
> page-tables and flush them together.
> 
> To optimize these scenarios, the need to synchronize with
> the IOMMU and device TLBs has been removed from the
> map/unmap functions of the IOMMU-API and an interface to
> explicitly do the flushes has been introduced.
> 
> To make the conversion of existing users of the IOMMU-API
> easier, new functions - iommu_map_sync(), iommu_map_sg_sync(),
> and iommu_unmap_sync() - have been introduced. These
> functions guarantee that the IO/TLBs are synchronized with
> any page-table update when they return. The optimizations
> possible with the new interface are subject to separate
> patch-sets.

Hi Joerg,

Wouldn't it be much more friendly to downstreams and out-of-tree
drivers to introduce new functions for the async semantics? I.e.
iommu_map_async(), etc. The API also seems a little cleaner in that
iommu_map() stands alone, it's synchronous, iommu_map_async() is
explicitly asynchronous and a _flush() call is needed to finalize it.
What do you see as the advantage to the approach here?  Thanks,

Alex


> Patch 1 just renames a few functions in the AMD-Vi driver
> that would otherwise collide with the new TLB-flush
> functions from the IOMMU-API.
> 
> Patch 2 introduces the new IO/TLB Flush-Interface.
> 
> Patch 3-13 convert existing users of the IOMMU-API to use
> the *_sync functions for now.
> 
> Please review.
> 
> Thanks,
> 
>   Joerg
> 
> Joerg Roedel (13):
>   iommu/amd: Rename a few flush functions
>   iommu: Introduce Interface for IOMMU TLB Flushing
>   vfio/type1: Use sychronized interface of the IOMMU-API
>   iommu/dma: Use sychronized interface of the IOMMU-API
>   arm: dma-mapping: Use sychronized interface of the IOMMU-API
>   drm/etnaviv: Use sychronized interface of the IOMMU-API
>   drm/msm: Use sychronized interface of the IOMMU-API
>   drm/nouveau/imem/gk20a: Use sychronized interface of the IOMMU-API
>   drm/rockchip: Use sychronized interface of the IOMMU-API
>   drm/tegra: Use sychronized interface of the IOMMU-API
>   gpu: host1x: Use sychronized interface of the IOMMU-API
>   IB/usnic: Use sychronized interface of the IOMMU-API
>   remoteproc: Use sychronized interface of the IOMMU-API
> 
>  arch/arm/mm/dma-mapping.c  | 21 +++---
>  drivers/gpu/drm/etnaviv/etnaviv_mmu.c  | 10 +--
>  drivers/gpu/drm/msm/msm_iommu.c|  5 +-
>  .../gpu/drm/nouveau/nvkm/subdev/instmem/gk20a.c| 12 ++--
>  drivers/gpu/drm/rockchip/rockchip_drm_gem.c|  6 +-
>  drivers/gpu/drm/tegra/drm.c|  6 +-
>  drivers/gpu/drm/tegra/gem.c|  6 +-
>  drivers/gpu/host1x/cdma.c  |  6 +-
>  drivers/gpu/host1x/job.c   |  6 +-
>  drivers/infiniband/hw/usnic/usnic_uiom.c   | 10 +--
>  drivers/iommu/amd_iommu.c  | 16 ++---
>  drivers/iommu/dma-iommu.c  |  8 +--
>  drivers/iommu/iommu.c  | 26 +++
>  drivers/remoteproc/remoteproc_core.c   | 10 +--
>  drivers/vfio/vfio_iommu_type1.c| 38 +-
>  include/linux/iommu.h  | 80 
> +-
>  16 files changed, 189 insertions(+), 77 deletions(-)
> 



Re: [PATCH 06/13] drm/etnaviv: Use sychronized interface of the IOMMU-API

2017-08-17 Thread Joerg Roedel
On Thu, Aug 17, 2017 at 04:30:48PM +0200, Lucas Stach wrote:
> Yeah, the IOMMU API being used internally is a historical accident that
> we haven't gotten around to rectifying yet. It's on my things-we-need-to-do
> list to prune the usage of the IOMMU API in etnaviv.

Okay, so for the time being, I drop the etnaviv patch from this series.


Thanks,

Joerg


Re: [PATCH 06/13] drm/etnaviv: Use sychronized interface of the IOMMU-API

2017-08-17 Thread Lucas Stach
Am Donnerstag, den 17.08.2017, 16:18 +0200 schrieb Joerg Roedel:
> On Thu, Aug 17, 2017 at 04:03:54PM +0200, Lucas Stach wrote:
> > There is no IOMMU driver in use. Etnaviv just uses part of the IOMMU API
> > to manage the GPU internal MMU, see
> > drivers/gpu/drm/etnaviv/etnaviv_iommu.c
> 
> That looks like a very fragile construct, because it relies on internal
> behavior of the iommu code that can change in the future.
> 
> I strongly suggest to either make etnaviv_iommu.c a real iommu driver and
> move it to drivers/iommu, or (preferred by me) just call directly into
> the map/unmap functions of this driver from the rest of
> etnaviv_iommu.c. I don't really see a reason why the IOMMU-API needs to
> be used there as another layer of indirection.

Yeah, the IOMMU API being used internally is a historical accident that
we haven't gotten around to rectifying yet. It's on my things-we-need-to-do
list to prune the usage of the IOMMU API in etnaviv.

Regards,
Lucas




Re: [PATCH 06/13] drm/etnaviv: Use sychronized interface of the IOMMU-API

2017-08-17 Thread Joerg Roedel
On Thu, Aug 17, 2017 at 04:03:54PM +0200, Lucas Stach wrote:
> There is no IOMMU driver in use. Etnaviv just uses part of the IOMMU API
> to manage the GPU internal MMU, see
> drivers/gpu/drm/etnaviv/etnaviv_iommu.c

That looks like a very fragile construct, because it relies on internal
behavior of the iommu code that can change in the future.

I strongly suggest to either make etnaviv_iommu.c a real iommu driver and
move it to drivers/iommu, or (preferred by me) just call directly into
the map/unmap functions of this driver from the rest of
etnaviv_iommu.c. I don't really see a reason why the IOMMU-API needs to
be used there as another layer of indirection.

Regards,

Joerg


Re: [PATCH 06/13] drm/etnaviv: Use sychronized interface of the IOMMU-API

2017-08-17 Thread Lucas Stach
Am Donnerstag, den 17.08.2017, 15:45 +0200 schrieb Joerg Roedel:
> Hi Lucas,
> 
> On Thu, Aug 17, 2017 at 03:32:38PM +0200, Lucas Stach wrote:
> > I don't think this is necessary. Etnaviv has managed and batched up TLB
> > flushes from day 1, as they don't happen through the MMU MMIO interface,
> > but through the GPU command stream.
> > 
> > So if my understanding of this series is correct, Etnaviv is just fine
> > with the changed semantics of the unsynchronized map/unmap calls.
> 
> This is not about any TLB on the GPU that could be flushed through the
> GPU command stream, but about the TLB in the IOMMU device. Or is this
> actually the same on this hardware? Which IOMMU driver is in use there?

There is no IOMMU driver in use. Etnaviv just uses part of the IOMMU API
to manage the GPU internal MMU, see
drivers/gpu/drm/etnaviv/etnaviv_iommu.c

Regards,
Lucas



Re: [PATCH 0/2] iommu/tegra*: Add support for struct iommu_device

2017-08-17 Thread Thierry Reding
On Thu, Aug 10, 2017 at 12:29:10AM +0200, Joerg Roedel wrote:
> Hi,
> 
> here are two patches to add support for 'struct iommu_device'
> to the tegra iommu-drivers. This will make the iommu-core
> code aware of the hardware iommus that a driver manages.
> 
> It will also add the iommus to sysfs and link them to the
> devices managed by them.
> 
> The patches apply on top of Robin's iommu-group patches.
> 
> Please review.
> 
> Regards,
> 
>   Joerg
> 
> Joerg Roedel (2):
>   iommu/tegra: Add support for struct iommu_device
>   iommu/tegra-gart: Add support for struct iommu_device
> 
>  drivers/iommu/tegra-gart.c | 26 ++
>  drivers/iommu/tegra-smmu.c | 25 +
>  2 files changed, 51 insertions(+)

The series:

Acked-by: Thierry Reding 



Re: [PATCH 2/2] iommu/tegra-gart: Add support for struct iommu_device

2017-08-17 Thread Thierry Reding
On Thu, Aug 17, 2017 at 01:21:52AM +0300, Dmitry Osipenko wrote:
> Hello Joerg,
> 
> On 10.08.2017 01:29, Joerg Roedel wrote:
> > From: Joerg Roedel 
> > 
> > Add a struct iommu_device to each tegra-gart and register it
> > with the iommu-core. Also link devices added to the driver
> > to their respective hardware iommus.
> > 
> > Signed-off-by: Joerg Roedel 
> > ---
> >  drivers/iommu/tegra-gart.c | 26 ++
> >  1 file changed, 26 insertions(+)
> > 
> 
> Reviewed-by: Dmitry Osipenko 
> Tested-by: Dmitry Osipenko 
> 
> > diff --git a/drivers/iommu/tegra-gart.c b/drivers/iommu/tegra-gart.c
> > index 29bafc6..b62f790 100644
> > --- a/drivers/iommu/tegra-gart.c
> > +++ b/drivers/iommu/tegra-gart.c
> 
> [snip]
> 
> > @@ -449,6 +472,9 @@ static int tegra_gart_remove(struct platform_device 
> > *pdev)
> >  {
> 
> BTW, the GART driver can't be built as a module, so this function is
> pretty much dead code. Probably worth considering its removal.

Technically you can unbind the driver via sysfs, in which case this
function would still be called. That said, all sorts of things will
probably start to crash when you do that. I'd like to think that we
will eventually be able to deal with this sanely (there's some work
in progress to establish stronger links between devices, so that we
can sanely deal with this kind of dependency), so I think it's okay
to keep this around in case we ever get there.

I don't have any objections to making the driver unloadable if that
is what everyone else prefers, though. In that case, however, there
are more steps involved than just removing the ->remove() callback.

Thierry



Re: [PATCH 06/13] drm/etnaviv: Use sychronized interface of the IOMMU-API

2017-08-17 Thread Joerg Roedel
Hi Lucas,

On Thu, Aug 17, 2017 at 03:32:38PM +0200, Lucas Stach wrote:
> I don't think this is necessary. Etnaviv has managed and batched up TLB
> flushes from day 1, as they don't happen through the MMU MMIO interface,
> but through the GPU command stream.
> 
> So if my understanding of this series is correct, Etnaviv is just fine
> with the changed semantics of the unsynchronized map/unmap calls.

This is not about any TLB on the GPU that could be flushed through the
GPU command stream, but about the TLB in the IOMMU device. Or is this
actually the same on this hardware? Which IOMMU driver is in use there?


Regards,

Joerg


Re: [PATCH 06/13] drm/etnaviv: Use sychronized interface of the IOMMU-API

2017-08-17 Thread Lucas Stach
Hi Joerg,

Am Donnerstag, den 17.08.2017, 14:56 +0200 schrieb Joerg Roedel:
> From: Joerg Roedel 
> 
> The map and unmap functions of the IOMMU-API changed their
> semantics: They do no longer guarantee that the hardware
> TLBs are synchronized with the page-table updates they made.
> 
> To make conversion easier, new synchronized functions have
> been introduced which give these guarantees again until the
> code is converted to use the new TLB-flush interface of the
> IOMMU-API, which allows certain optimizations.
> 
> But for now, just convert this code to use the synchronized
> functions so that it will behave as before.

I don't think this is necessary. Etnaviv has managed and batched up TLB
flushes from day 1, as they don't happen through the MMU MMIO interface,
but through the GPU command stream.

So if my understanding of this series is correct, Etnaviv is just fine
with the changed semantics of the unsynchronized map/unmap calls.

Regards,
Lucas

> 
> Cc: Lucas Stach 
> Cc: Russell King 
> Cc: Christian Gmeiner 
> Cc: David Airlie 
> Cc: etna...@lists.freedesktop.org
> Cc: dri-de...@lists.freedesktop.org
> Signed-off-by: Joerg Roedel 
> ---
>  drivers/gpu/drm/etnaviv/etnaviv_mmu.c | 10 +-
>  1 file changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/gpu/drm/etnaviv/etnaviv_mmu.c b/drivers/gpu/drm/etnaviv/etnaviv_mmu.c
> index f103e78..ae0247c 100644
> --- a/drivers/gpu/drm/etnaviv/etnaviv_mmu.c
> +++ b/drivers/gpu/drm/etnaviv/etnaviv_mmu.c
> @@ -47,7 +47,7 @@ int etnaviv_iommu_map(struct etnaviv_iommu *iommu, u32 iova,
>  
>   VERB("map[%d]: %08x %08x(%zx)", i, iova, pa, bytes);
>  
> - ret = iommu_map(domain, da, pa, bytes, prot);
> + ret = iommu_map_sync(domain, da, pa, bytes, prot);
>   if (ret)
>   goto fail;
>  
> @@ -62,7 +62,7 @@ int etnaviv_iommu_map(struct etnaviv_iommu *iommu, u32 iova,
>   for_each_sg(sgt->sgl, sg, i, j) {
>   size_t bytes = sg_dma_len(sg) + sg->offset;
>  
> - iommu_unmap(domain, da, bytes);
> + iommu_unmap_sync(domain, da, bytes);
>   da += bytes;
>   }
>   return ret;
> @@ -80,7 +80,7 @@ int etnaviv_iommu_unmap(struct etnaviv_iommu *iommu, u32 iova,
>   size_t bytes = sg_dma_len(sg) + sg->offset;
>   size_t unmapped;
>  
> - unmapped = iommu_unmap(domain, da, bytes);
> + unmapped = iommu_unmap_sync(domain, da, bytes);
>   if (unmapped < bytes)
>   return unmapped;
>  
> @@ -338,7 +338,7 @@ int etnaviv_iommu_get_suballoc_va(struct etnaviv_gpu *gpu, dma_addr_t paddr,
>   mutex_unlock(&mmu->lock);
>   return ret;
>   }
> - ret = iommu_map(mmu->domain, vram_node->start, paddr, size,
> + ret = iommu_map_sync(mmu->domain, vram_node->start, paddr, size,
>   IOMMU_READ);
>   if (ret < 0) {
>   drm_mm_remove_node(vram_node);
> @@ -362,7 +362,7 @@ void etnaviv_iommu_put_suballoc_va(struct etnaviv_gpu *gpu,
>  
>   if (mmu->version == ETNAVIV_IOMMU_V2) {
>   mutex_lock(&mmu->lock);
> - iommu_unmap(mmu->domain,iova, size);
> + iommu_unmap_sync(mmu->domain,iova, size);
>   drm_mm_remove_node(vram_node);
>   mutex_unlock(&mmu->lock);
>   }




Re: [PATCH 10/13] drm/tegra: Use sychronized interface of the IOMMU-API

2017-08-17 Thread Thierry Reding
On Thu, Aug 17, 2017 at 02:56:33PM +0200, Joerg Roedel wrote:
> From: Joerg Roedel 
> 
> The map and unmap functions of the IOMMU-API changed their
> semantics: They do no longer guarantee that the hardware
> TLBs are synchronized with the page-table updates they made.
> 
> To make conversion easier, new synchronized functions have
> been introduced which give these guarantees again until the
> code is converted to use the new TLB-flush interface of the
> IOMMU-API, which allows certain optimizations.
> 
> But for now, just convert this code to use the synchronized
> functions so that it will behave as before.
> 
> Cc: Thierry Reding 
> Cc: David Airlie 
> Cc: Jonathan Hunter 
> Cc: dri-de...@lists.freedesktop.org
> Cc: linux-te...@vger.kernel.org
> Signed-off-by: Joerg Roedel 
> ---
>  drivers/gpu/drm/tegra/drm.c | 6 +++---
>  drivers/gpu/drm/tegra/gem.c | 6 +++---
>  2 files changed, 6 insertions(+), 6 deletions(-)

Acked-by: Thierry Reding 



[PATCH 12/13] IB/usnic: Use sychronized interface of the IOMMU-API

2017-08-17 Thread Joerg Roedel
From: Joerg Roedel 

The map and unmap functions of the IOMMU-API changed their
semantics: they no longer guarantee that the hardware
TLBs are synchronized with the page-table updates they made.

To make conversion easier, new synchronized functions have
been introduced which give these guarantees again until the
code is converted to use the new TLB-flush interface of the
IOMMU-API, which allows certain optimizations.

But for now, just convert this code to use the synchronized
functions so that it will behave as before.

Cc: Christian Benvenuti 
Cc: Dave Goodell 
Cc: Doug Ledford 
Cc: Sean Hefty 
Cc: Hal Rosenstock 
Cc: linux-r...@vger.kernel.org
Signed-off-by: Joerg Roedel 
---
 drivers/infiniband/hw/usnic/usnic_uiom.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/infiniband/hw/usnic/usnic_uiom.c 
b/drivers/infiniband/hw/usnic/usnic_uiom.c
index c49db7c..e53dc38 100644
--- a/drivers/infiniband/hw/usnic/usnic_uiom.c
+++ b/drivers/infiniband/hw/usnic/usnic_uiom.c
@@ -205,7 +205,7 @@ static void usnic_uiom_unmap_sorted_intervals(struct 
list_head *intervals,
while (size > 0) {
/* Workaround for RH 970401 */
usnic_dbg("va 0x%lx size 0x%lx", va, PAGE_SIZE);
-   iommu_unmap(pd->domain, va, PAGE_SIZE);
+   iommu_unmap_sync(pd->domain, va, PAGE_SIZE);
va += PAGE_SIZE;
size -= PAGE_SIZE;
}
@@ -282,8 +282,8 @@ static int usnic_uiom_map_sorted_intervals(struct list_head 
*intervals,
size = pa_end - pa_start + PAGE_SIZE;
usnic_dbg("va 0x%lx pa %pa size 0x%zx flags 0x%x",
va_start, &pa_start, size, flags);
-   err = iommu_map(pd->domain, va_start, pa_start,
-   size, flags);
+   err = iommu_map_sync(pd->domain, va_start,
+pa_start, size, flags);
if (err) {
usnic_err("Failed to map va 0x%lx pa %pa size 0x%zx with err %d\n",
va_start, &pa_start, size, err);
@@ -299,8 +299,8 @@ static int usnic_uiom_map_sorted_intervals(struct list_head 
*intervals,
size = pa - pa_start + PAGE_SIZE;
usnic_dbg("va 0x%lx pa %pa size 0x%zx flags 0x%x\n",
va_start, &pa_start, size, flags);
-   err = iommu_map(pd->domain, va_start, pa_start,
-   size, flags);
+   err = iommu_map_sync(pd->domain, va_start,
+pa_start, size, flags);
if (err) {
usnic_err("Failed to map va 0x%lx pa %pa size 0x%zx with err %d\n",
va_start, &pa_start, size, err);
-- 
2.7.4



[PATCH 05/13] arm: dma-mapping: Use synchronized interface of the IOMMU-API

2017-08-17 Thread Joerg Roedel
From: Joerg Roedel 

The map and unmap functions of the IOMMU-API changed their
semantics: they no longer guarantee that the hardware
TLBs are synchronized with the page-table updates they made.

To make conversion easier, new synchronized functions have
been introduced which give these guarantees again until the
code is converted to use the new TLB-flush interface of the
IOMMU-API, which allows certain optimizations.

But for now, just convert this code to use the synchronized
functions so that it will behave as before.

Cc: Russell King 
Cc: Laurent Pinchart 
Cc: Sricharan R 
Cc: linux-arm-ker...@lists.infradead.org
Signed-off-by: Joerg Roedel 
---
 arch/arm/mm/dma-mapping.c | 21 +++--
 1 file changed, 11 insertions(+), 10 deletions(-)

diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index fcf1473..fae2398 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -1444,7 +1444,7 @@ __iommu_create_mapping(struct device *dev, struct page 
**pages, size_t size,
break;
 
len = (j - i) << PAGE_SHIFT;
-   ret = iommu_map(mapping->domain, iova, phys, len,
+   ret = iommu_map_sync(mapping->domain, iova, phys, len,
__dma_info_to_prot(DMA_BIDIRECTIONAL, attrs));
if (ret < 0)
goto fail;
@@ -1453,7 +1453,7 @@ __iommu_create_mapping(struct device *dev, struct page 
**pages, size_t size,
}
return dma_addr;
 fail:
-   iommu_unmap(mapping->domain, dma_addr, iova-dma_addr);
+   iommu_unmap_sync(mapping->domain, dma_addr, iova-dma_addr);
__free_iova(mapping, dma_addr, size);
return ARM_MAPPING_ERROR;
 }
@@ -1469,7 +1469,7 @@ static int __iommu_remove_mapping(struct device *dev, 
dma_addr_t iova, size_t si
size = PAGE_ALIGN((iova & ~PAGE_MASK) + size);
iova &= PAGE_MASK;
 
-   iommu_unmap(mapping->domain, iova, size);
+   iommu_unmap_sync(mapping->domain, iova, size);
__free_iova(mapping, iova, size);
return 0;
 }
@@ -1730,7 +1730,7 @@ static int __map_sg_chunk(struct device *dev, struct 
scatterlist *sg,
 
prot = __dma_info_to_prot(dir, attrs);
 
-   ret = iommu_map(mapping->domain, iova, phys, len, prot);
+   ret = iommu_map_sync(mapping->domain, iova, phys, len, prot);
if (ret < 0)
goto fail;
count += len >> PAGE_SHIFT;
@@ -1740,7 +1740,7 @@ static int __map_sg_chunk(struct device *dev, struct 
scatterlist *sg,
 
return 0;
 fail:
-   iommu_unmap(mapping->domain, iova_base, count * PAGE_SIZE);
+   iommu_unmap_sync(mapping->domain, iova_base, count * PAGE_SIZE);
__free_iova(mapping, iova_base, size);
return ret;
 }
@@ -1938,7 +1938,8 @@ static dma_addr_t arm_coherent_iommu_map_page(struct 
device *dev, struct page *p
 
prot = __dma_info_to_prot(dir, attrs);
 
-   ret = iommu_map(mapping->domain, dma_addr, page_to_phys(page), len, prot);
+   ret = iommu_map_sync(mapping->domain, dma_addr, page_to_phys(page),
+len, prot);
if (ret < 0)
goto fail;
 
@@ -1988,7 +1989,7 @@ static void arm_coherent_iommu_unmap_page(struct device 
*dev, dma_addr_t handle,
if (!iova)
return;
 
-   iommu_unmap(mapping->domain, iova, len);
+   iommu_unmap_sync(mapping->domain, iova, len);
__free_iova(mapping, iova, len);
 }
 
@@ -2016,7 +2017,7 @@ static void arm_iommu_unmap_page(struct device *dev, 
dma_addr_t handle,
if ((attrs & DMA_ATTR_SKIP_CPU_SYNC) == 0)
__dma_page_dev_to_cpu(page, offset, size, dir);
 
-   iommu_unmap(mapping->domain, iova, len);
+   iommu_unmap_sync(mapping->domain, iova, len);
__free_iova(mapping, iova, len);
 }
 
@@ -2044,7 +2045,7 @@ static dma_addr_t arm_iommu_map_resource(struct device 
*dev,
 
prot = __dma_info_to_prot(dir, attrs) | IOMMU_MMIO;
 
-   ret = iommu_map(mapping->domain, dma_addr, addr, len, prot);
+   ret = iommu_map_sync(mapping->domain, dma_addr, addr, len, prot);
if (ret < 0)
goto fail;
 
@@ -2073,7 +2074,7 @@ static void arm_iommu_unmap_resource(struct device *dev, 
dma_addr_t dma_handle,
if (!iova)
return;
 
-   iommu_unmap(mapping->domain, iova, len);
+   iommu_unmap_sync(mapping->domain, iova, len);
__free_iova(mapping, iova, len);
 }
 
-- 
2.7.4



[PATCH 13/13] remoteproc: Use synchronized interface of the IOMMU-API

2017-08-17 Thread Joerg Roedel
From: Joerg Roedel 

The map and unmap functions of the IOMMU-API changed their
semantics: they no longer guarantee that the hardware
TLBs are synchronized with the page-table updates they made.

To make conversion easier, new synchronized functions have
been introduced which give these guarantees again until the
code is converted to use the new TLB-flush interface of the
IOMMU-API, which allows certain optimizations.

But for now, just convert this code to use the synchronized
functions so that it will behave as before.

Cc: Ohad Ben-Cohen 
Cc: Bjorn Andersson 
Cc: linux-remotep...@vger.kernel.org
Signed-off-by: Joerg Roedel 
---
 drivers/remoteproc/remoteproc_core.c | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/drivers/remoteproc/remoteproc_core.c 
b/drivers/remoteproc/remoteproc_core.c
index 564061d..16db19c 100644
--- a/drivers/remoteproc/remoteproc_core.c
+++ b/drivers/remoteproc/remoteproc_core.c
@@ -553,7 +553,8 @@ static int rproc_handle_devmem(struct rproc *rproc, struct 
fw_rsc_devmem *rsc,
if (!mapping)
return -ENOMEM;
 
-   ret = iommu_map(rproc->domain, rsc->da, rsc->pa, rsc->len, rsc->flags);
+   ret = iommu_map_sync(rproc->domain, rsc->da,
+rsc->pa, rsc->len, rsc->flags);
if (ret) {
dev_err(dev, "failed to map devmem: %d\n", ret);
goto out;
@@ -661,8 +662,8 @@ static int rproc_handle_carveout(struct rproc *rproc,
goto dma_free;
}
 
-   ret = iommu_map(rproc->domain, rsc->da, dma, rsc->len,
-   rsc->flags);
+   ret = iommu_map_sync(rproc->domain, rsc->da, dma, rsc->len,
+rsc->flags);
if (ret) {
dev_err(dev, "iommu_map failed: %d\n", ret);
goto free_mapping;
@@ -823,7 +824,8 @@ static void rproc_resource_cleanup(struct rproc *rproc)
list_for_each_entry_safe(entry, tmp, &rproc->mappings, node) {
size_t unmapped;
 
-   unmapped = iommu_unmap(rproc->domain, entry->da, entry->len);
+   unmapped = iommu_unmap_sync(rproc->domain, entry->da,
+   entry->len);
if (unmapped != entry->len) {
/* nothing much to do besides complaining */
dev_err(dev, "failed to unmap %u/%zu\n", entry->len,
-- 
2.7.4



[PATCH 11/13] gpu: host1x: Use synchronized interface of the IOMMU-API

2017-08-17 Thread Joerg Roedel
From: Joerg Roedel 

The map and unmap functions of the IOMMU-API changed their
semantics: they no longer guarantee that the hardware
TLBs are synchronized with the page-table updates they made.

To make conversion easier, new synchronized functions have
been introduced which give these guarantees again until the
code is converted to use the new TLB-flush interface of the
IOMMU-API, which allows certain optimizations.

But for now, just convert this code to use the synchronized
functions so that it will behave as before.

Cc: Thierry Reding 
Cc: dri-de...@lists.freedesktop.org
Cc: linux-te...@vger.kernel.org
Signed-off-by: Joerg Roedel 
---
 drivers/gpu/host1x/cdma.c | 6 +++---
 drivers/gpu/host1x/job.c  | 6 +++---
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/host1x/cdma.c b/drivers/gpu/host1x/cdma.c
index 28541b2..bdf557e 100644
--- a/drivers/gpu/host1x/cdma.c
+++ b/drivers/gpu/host1x/cdma.c
@@ -55,7 +55,7 @@ static void host1x_pushbuffer_destroy(struct push_buffer *pb)
return;
 
if (host1x->domain) {
-   iommu_unmap(host1x->domain, pb->dma, pb->alloc_size);
+   iommu_unmap_sync(host1x->domain, pb->dma, pb->alloc_size);
free_iova(&host1x->iova, iova_pfn(&host1x->iova, pb->dma));
}
 
@@ -105,8 +105,8 @@ static int host1x_pushbuffer_init(struct push_buffer *pb)
}
 
pb->dma = iova_dma_addr(&host1x->iova, alloc);
-   err = iommu_map(host1x->domain, pb->dma, pb->phys, size,
-   IOMMU_READ);
+   err = iommu_map_sync(host1x->domain, pb->dma, pb->phys, size,
+IOMMU_READ);
if (err)
goto iommu_free_iova;
} else {
diff --git a/drivers/gpu/host1x/job.c b/drivers/gpu/host1x/job.c
index bee5044..70a029c 100644
--- a/drivers/gpu/host1x/job.c
+++ b/drivers/gpu/host1x/job.c
@@ -243,7 +243,7 @@ static unsigned int pin_job(struct host1x *host, struct 
host1x_job *job)
goto unpin;
}
 
-   err = iommu_map_sg(host->domain,
+   err = iommu_map_sg_sync(host->domain,
iova_dma_addr(&host->iova, alloc),
sgt->sgl, sgt->nents, IOMMU_READ);
if (err == 0) {
@@ -695,8 +695,8 @@ void host1x_job_unpin(struct host1x_job *job)
struct host1x_job_unpin_data *unpin = >unpins[i];
 
if (!IS_ENABLED(CONFIG_TEGRA_HOST1X_FIREWALL) && host->domain) {
-   iommu_unmap(host->domain, job->addr_phys[i],
-   unpin->size);
+   iommu_unmap_sync(host->domain, job->addr_phys[i],
+unpin->size);
free_iova(&host->iova,
iova_pfn(&host->iova, job->addr_phys[i]));
}
-- 
2.7.4



[PATCH 10/13] drm/tegra: Use synchronized interface of the IOMMU-API

2017-08-17 Thread Joerg Roedel
From: Joerg Roedel 

The map and unmap functions of the IOMMU-API changed their
semantics: they no longer guarantee that the hardware
TLBs are synchronized with the page-table updates they made.

To make conversion easier, new synchronized functions have
been introduced which give these guarantees again until the
code is converted to use the new TLB-flush interface of the
IOMMU-API, which allows certain optimizations.

But for now, just convert this code to use the synchronized
functions so that it will behave as before.

Cc: Thierry Reding 
Cc: David Airlie 
Cc: Jonathan Hunter 
Cc: dri-de...@lists.freedesktop.org
Cc: linux-te...@vger.kernel.org
Signed-off-by: Joerg Roedel 
---
 drivers/gpu/drm/tegra/drm.c | 6 +++---
 drivers/gpu/drm/tegra/gem.c | 6 +++---
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/tegra/drm.c b/drivers/gpu/drm/tegra/drm.c
index 518f4b6..bc4528ee 100644
--- a/drivers/gpu/drm/tegra/drm.c
+++ b/drivers/gpu/drm/tegra/drm.c
@@ -1156,8 +1156,8 @@ void *tegra_drm_alloc(struct tegra_drm *tegra, size_t 
size,
}
 
*dma = iova_dma_addr(&tegra->carveout.domain, alloc);
-   err = iommu_map(tegra->domain, *dma, virt_to_phys(virt),
-   size, IOMMU_READ | IOMMU_WRITE);
+   err = iommu_map_sync(tegra->domain, *dma, virt_to_phys(virt),
+size, IOMMU_READ | IOMMU_WRITE);
if (err < 0)
goto free_iova;
 
@@ -1180,7 +1180,7 @@ void tegra_drm_free(struct tegra_drm *tegra, size_t size, 
void *virt,
size = PAGE_ALIGN(size);
 
if (tegra->domain) {
-   iommu_unmap(tegra->domain, dma, size);
+   iommu_unmap_sync(tegra->domain, dma, size);
free_iova(&tegra->carveout.domain,
  iova_pfn(&tegra->carveout.domain, dma));
}
diff --git a/drivers/gpu/drm/tegra/gem.c b/drivers/gpu/drm/tegra/gem.c
index 7a39a35..639bc75 100644
--- a/drivers/gpu/drm/tegra/gem.c
+++ b/drivers/gpu/drm/tegra/gem.c
@@ -135,8 +135,8 @@ static int tegra_bo_iommu_map(struct tegra_drm *tegra, 
struct tegra_bo *bo)
 
bo->paddr = bo->mm->start;
 
-   err = iommu_map_sg(tegra->domain, bo->paddr, bo->sgt->sgl,
-  bo->sgt->nents, prot);
+   err = iommu_map_sg_sync(tegra->domain, bo->paddr, bo->sgt->sgl,
+   bo->sgt->nents, prot);
if (err < 0) {
dev_err(tegra->drm->dev, "failed to map buffer: %zd\n", err);
goto remove;
@@ -162,7 +162,7 @@ static int tegra_bo_iommu_unmap(struct tegra_drm *tegra, 
struct tegra_bo *bo)
return 0;
 
mutex_lock(&tegra->mm_lock);
-   iommu_unmap(tegra->domain, bo->paddr, bo->size);
+   iommu_unmap_sync(tegra->domain, bo->paddr, bo->size);
drm_mm_remove_node(bo->mm);
mutex_unlock(&tegra->mm_lock);
 
-- 
2.7.4



[PATCH 06/13] drm/etnaviv: Use synchronized interface of the IOMMU-API

2017-08-17 Thread Joerg Roedel
From: Joerg Roedel 

The map and unmap functions of the IOMMU-API changed their
semantics: they no longer guarantee that the hardware
TLBs are synchronized with the page-table updates they made.

To make conversion easier, new synchronized functions have
been introduced which give these guarantees again until the
code is converted to use the new TLB-flush interface of the
IOMMU-API, which allows certain optimizations.

But for now, just convert this code to use the synchronized
functions so that it will behave as before.

Cc: Lucas Stach 
Cc: Russell King 
Cc: Christian Gmeiner 
Cc: David Airlie 
Cc: etna...@lists.freedesktop.org
Cc: dri-de...@lists.freedesktop.org
Signed-off-by: Joerg Roedel 
---
 drivers/gpu/drm/etnaviv/etnaviv_mmu.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/etnaviv/etnaviv_mmu.c 
b/drivers/gpu/drm/etnaviv/etnaviv_mmu.c
index f103e78..ae0247c 100644
--- a/drivers/gpu/drm/etnaviv/etnaviv_mmu.c
+++ b/drivers/gpu/drm/etnaviv/etnaviv_mmu.c
@@ -47,7 +47,7 @@ int etnaviv_iommu_map(struct etnaviv_iommu *iommu, u32 iova,
 
VERB("map[%d]: %08x %08x(%zx)", i, iova, pa, bytes);
 
-   ret = iommu_map(domain, da, pa, bytes, prot);
+   ret = iommu_map_sync(domain, da, pa, bytes, prot);
if (ret)
goto fail;
 
@@ -62,7 +62,7 @@ int etnaviv_iommu_map(struct etnaviv_iommu *iommu, u32 iova,
for_each_sg(sgt->sgl, sg, i, j) {
size_t bytes = sg_dma_len(sg) + sg->offset;
 
-   iommu_unmap(domain, da, bytes);
+   iommu_unmap_sync(domain, da, bytes);
da += bytes;
}
return ret;
@@ -80,7 +80,7 @@ int etnaviv_iommu_unmap(struct etnaviv_iommu *iommu, u32 iova,
size_t bytes = sg_dma_len(sg) + sg->offset;
size_t unmapped;
 
-   unmapped = iommu_unmap(domain, da, bytes);
+   unmapped = iommu_unmap_sync(domain, da, bytes);
if (unmapped < bytes)
return unmapped;
 
@@ -338,7 +338,7 @@ int etnaviv_iommu_get_suballoc_va(struct etnaviv_gpu *gpu, 
dma_addr_t paddr,
mutex_unlock(&mmu->lock);
return ret;
}
-   ret = iommu_map(mmu->domain, vram_node->start, paddr, size,
+   ret = iommu_map_sync(mmu->domain, vram_node->start, paddr, size,
IOMMU_READ);
if (ret < 0) {
drm_mm_remove_node(vram_node);
@@ -362,7 +362,7 @@ void etnaviv_iommu_put_suballoc_va(struct etnaviv_gpu *gpu,
 
if (mmu->version == ETNAVIV_IOMMU_V2) {
mutex_lock(&mmu->lock);
-   iommu_unmap(mmu->domain, iova, size);
+   iommu_unmap_sync(mmu->domain, iova, size);
drm_mm_remove_node(vram_node);
mutex_unlock(&mmu->lock);
}
-- 
2.7.4



[PATCH 08/13] drm/nouveau/imem/gk20a: Use synchronized interface of the IOMMU-API

2017-08-17 Thread Joerg Roedel
From: Joerg Roedel 

The map and unmap functions of the IOMMU-API changed their
semantics: they no longer guarantee that the hardware
TLBs are synchronized with the page-table updates they made.

To make conversion easier, new synchronized functions have
been introduced which give these guarantees again until the
code is converted to use the new TLB-flush interface of the
IOMMU-API, which allows certain optimizations.

But for now, just convert this code to use the synchronized
functions so that it will behave as before.

Cc: Ben Skeggs 
Cc: David Airlie 
Cc: dri-de...@lists.freedesktop.org
Cc: nouv...@lists.freedesktop.org
Signed-off-by: Joerg Roedel 
---
 drivers/gpu/drm/nouveau/nvkm/subdev/instmem/gk20a.c | 12 +++-
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/instmem/gk20a.c 
b/drivers/gpu/drm/nouveau/nvkm/subdev/instmem/gk20a.c
index cd5adbe..3f0de47 100644
--- a/drivers/gpu/drm/nouveau/nvkm/subdev/instmem/gk20a.c
+++ b/drivers/gpu/drm/nouveau/nvkm/subdev/instmem/gk20a.c
@@ -322,8 +322,9 @@ gk20a_instobj_dtor_iommu(struct nvkm_memory *memory)
 
/* Unmap pages from GPU address space and free them */
for (i = 0; i < node->base.mem.size; i++) {
-   iommu_unmap(imem->domain,
-   (r->offset + i) << imem->iommu_pgshift, PAGE_SIZE);
+   iommu_unmap_sync(imem->domain,
+(r->offset + i) << imem->iommu_pgshift,
+PAGE_SIZE);
dma_unmap_page(dev, node->dma_addrs[i], PAGE_SIZE,
   DMA_BIDIRECTIONAL);
__free_page(node->pages[i]);
@@ -458,14 +459,15 @@ gk20a_instobj_ctor_iommu(struct gk20a_instmem *imem, u32 
npages, u32 align,
for (i = 0; i < npages; i++) {
u32 offset = (r->offset + i) << imem->iommu_pgshift;
 
-   ret = iommu_map(imem->domain, offset, node->dma_addrs[i],
-   PAGE_SIZE, IOMMU_READ | IOMMU_WRITE);
+   ret = iommu_map_sync(imem->domain, offset, node->dma_addrs[i],
+PAGE_SIZE, IOMMU_READ | IOMMU_WRITE);
if (ret < 0) {
nvkm_error(subdev, "IOMMU mapping failure: %d\n", ret);
 
while (i-- > 0) {
offset -= PAGE_SIZE;
-   iommu_unmap(imem->domain, offset, PAGE_SIZE);
+   iommu_unmap_sync(imem->domain, offset,
+PAGE_SIZE);
}
goto release_area;
}
-- 
2.7.4



[PATCH 07/13] drm/msm: Use synchronized interface of the IOMMU-API

2017-08-17 Thread Joerg Roedel
From: Joerg Roedel 

The map and unmap functions of the IOMMU-API changed their
semantics: they no longer guarantee that the hardware
TLBs are synchronized with the page-table updates they made.

To make conversion easier, new synchronized functions have
been introduced which give these guarantees again until the
code is converted to use the new TLB-flush interface of the
IOMMU-API, which allows certain optimizations.

But for now, just convert this code to use the synchronized
functions so that it will behave as before.

Cc: Rob Clark 
Cc: David Airlie 
Cc: linux-arm-...@vger.kernel.org
Cc: dri-de...@lists.freedesktop.org
Cc: freedr...@lists.freedesktop.org
Signed-off-by: Joerg Roedel 
---
 drivers/gpu/drm/msm/msm_iommu.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/msm/msm_iommu.c b/drivers/gpu/drm/msm/msm_iommu.c
index b23d336..b3525b7 100644
--- a/drivers/gpu/drm/msm/msm_iommu.c
+++ b/drivers/gpu/drm/msm/msm_iommu.c
@@ -64,7 +64,8 @@ static int msm_iommu_map(struct msm_mmu *mmu, uint64_t iova,
size_t ret;
 
 // pm_runtime_get_sync(mmu->dev);
-   ret = iommu_map_sg(iommu->domain, iova, sgt->sgl, sgt->nents, prot);
+   ret = iommu_map_sg_sync(iommu->domain, iova, sgt->sgl,
+   sgt->nents, prot);
 // pm_runtime_put_sync(mmu->dev);
WARN_ON(ret < 0);
 
@@ -77,7 +78,7 @@ static int msm_iommu_unmap(struct msm_mmu *mmu, uint64_t iova,
struct msm_iommu *iommu = to_msm_iommu(mmu);
 
pm_runtime_get_sync(mmu->dev);
-   iommu_unmap(iommu->domain, iova, len);
+   iommu_unmap_sync(iommu->domain, iova, len);
pm_runtime_put_sync(mmu->dev);
 
return 0;
-- 
2.7.4



[PATCH 02/13] iommu: Introduce Interface for IOMMU TLB Flushing

2017-08-17 Thread Joerg Roedel
From: Joerg Roedel 

With the current IOMMU-API the hardware TLBs have to be
flushed in every iommu_map(), iommu_map_sg(), and
iommu_unmap() call.

For unmapping large amounts of address space, like it
happens when a KVM domain with assigned devices is
destroyed, this causes thousands of unnecessary TLB flushes
in the IOMMU hardware because the unmap call-back runs for
every unmapped physical page.

With the TLB Flush Interface introduced here the need to
clean the hardware TLBs is removed from the iommu_map/unmap
functions. Users now have to explicitly call these functions
to sync the page-table changes to the hardware.

Three functions are introduced:

* iommu_flush_tlb_all() - Flushes all TLB entries
  associated with that
  domain. TLB entries are
  flushed when this function
  returns.

* iommu_tlb_range_add() - This will add a given
  range to the flush queue
  for this domain.

* iommu_tlb_sync() - Flushes all queued ranges from
 the hardware TLBs. Returns when
 the flush is finished.

The semantics of this interface are intentionally similar to
the iommu_gather_ops from the io-pgtable code.

Additionally, this patch introduces synchronized versions of
the iommu_map(), iommu_map_sg(), and iommu_unmap()
functions. They will be used by current users of the
IOMMU-API until those users are converted to the unsynchronized
versions plus explicit TLB-flush calls.

Cc: Alex Williamson 
Cc: Will Deacon 
Cc: Robin Murphy 
Signed-off-by: Joerg Roedel 
---
 drivers/iommu/iommu.c | 26 +
 include/linux/iommu.h | 80 ++-
 2 files changed, 105 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 3f6ea16..816e248 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -527,6 +527,8 @@ static int iommu_group_create_direct_mappings(struct 
iommu_group *group,
 
}
 
+   iommu_flush_tlb_all(domain);
+
 out:
iommu_put_resv_regions(dev, &mappings);
 
@@ -1556,6 +1558,18 @@ int iommu_map(struct iommu_domain *domain, unsigned long 
iova,
 }
 EXPORT_SYMBOL_GPL(iommu_map);
 
+int iommu_map_sync(struct iommu_domain *domain, unsigned long iova,
+  phys_addr_t paddr, size_t size, int prot)
+{
+   int ret = iommu_map(domain, iova, paddr, size, prot);
+
+   iommu_tlb_range_add(domain, iova, size);
+   iommu_tlb_sync(domain);
+
+   return ret;
+}
+EXPORT_SYMBOL_GPL(iommu_map_sync);
+
 size_t iommu_unmap(struct iommu_domain *domain, unsigned long iova, size_t 
size)
 {
size_t unmapped_page, unmapped = 0;
@@ -1608,6 +1622,18 @@ size_t iommu_unmap(struct iommu_domain *domain, unsigned 
long iova, size_t size)
 }
 EXPORT_SYMBOL_GPL(iommu_unmap);
 
+size_t iommu_unmap_sync(struct iommu_domain *domain,
+   unsigned long iova, size_t size)
+{
+   size_t ret = iommu_unmap(domain, iova, size);
+
+   iommu_tlb_range_add(domain, iova, size);
+   iommu_tlb_sync(domain);
+
+   return ret;
+}
+EXPORT_SYMBOL_GPL(iommu_unmap_sync);
+
 size_t default_iommu_map_sg(struct iommu_domain *domain, unsigned long iova,
 struct scatterlist *sg, unsigned int nents, int prot)
 {
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 2cb54ad..7f9c114 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -167,6 +167,10 @@ struct iommu_resv_region {
  * @map: map a physically contiguous memory region to an iommu domain
  * @unmap: unmap a physically contiguous memory region from an iommu domain
 * @map_sg: map a scatter-gather list of physically contiguous memory chunks
 * to an iommu domain
+ * @flush_tlb_all: Synchronously flush all hardware TLBs for this domain
+ * @tlb_range_add: Add a given iova range to the flush queue for this domain
+ * @tlb_sync: Flush all queued ranges from the hardware TLBs and empty flush
+ *            queue
  * @iova_to_phys: translate iova to physical address
  * @add_device: add device to iommu grouping
@@ -199,6 +203,10 @@ struct iommu_ops {
 size_t size);
size_t (*map_sg)(struct iommu_domain *domain, unsigned long iova,
 struct scatterlist *sg, unsigned int nents, int prot);
+   void (*flush_iotlb_all)(struct iommu_domain *domain);
+   void (*iotlb_range_add)(struct iommu_domain *domain,
+   unsigned long iova, size_t size);
+   void (*iotlb_sync)(struct iommu_domain *domain);
phys_addr_t (*iova_to_phys)(struct iommu_domain *domain, dma_addr_t 
iova);
int (*add_device)(struct device *dev);
void (*remove_device)(struct device *dev);

[PATCH 03/13] vfio/type1: Use synchronized interface of the IOMMU-API

2017-08-17 Thread Joerg Roedel
From: Joerg Roedel 

The map and unmap functions of the IOMMU-API changed their
semantics: they no longer guarantee that the hardware
TLBs are synchronized with the page-table updates they made.

To make conversion easier, new synchronized functions have
been introduced which give these guarantees again until the
code is converted to use the new TLB-flush interface of the
IOMMU-API, which allows certain optimizations.

But for now, just convert this code to use the synchronized
functions so that it will behave as before.

Cc: Alex Williamson 
Cc: k...@vger.kernel.org
Signed-off-by: Joerg Roedel 
---
 drivers/vfio/vfio_iommu_type1.c | 38 --
 1 file changed, 20 insertions(+), 18 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 8549cb1..4ad83d4 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -672,7 +672,7 @@ static long vfio_unmap_unpin(struct vfio_iommu *iommu, 
struct vfio_dma *dma,
  struct vfio_domain, next);
 
list_for_each_entry_continue(d, &iommu->domain_list, next) {
-   iommu_unmap(d->domain, dma->iova, dma->size);
+   iommu_unmap_sync(d->domain, dma->iova, dma->size);
cond_resched();
}
 
@@ -687,9 +687,9 @@ static long vfio_unmap_unpin(struct vfio_iommu *iommu, 
struct vfio_dma *dma,
}
 
/*
-* To optimize for fewer iommu_unmap() calls, each of which
-* may require hardware cache flushing, try to find the
-* largest contiguous physical memory chunk to unmap.
+* To optimize for fewer iommu_unmap_sync() calls, each of which
+* may require hardware cache flushing, try to find the largest
+* contiguous physical memory chunk to unmap.
 */
for (len = PAGE_SIZE;
 !domain->fgsp && iova + len < end; len += PAGE_SIZE) {
@@ -698,7 +698,7 @@ static long vfio_unmap_unpin(struct vfio_iommu *iommu, 
struct vfio_dma *dma,
break;
}
 
-   unmapped = iommu_unmap(domain->domain, iova, len);
+   unmapped = iommu_unmap_sync(domain->domain, iova, len);
if (WARN_ON(!unmapped))
break;
 
@@ -877,15 +877,15 @@ static int map_try_harder(struct vfio_domain *domain, 
dma_addr_t iova,
int ret = 0;
 
for (i = 0; i < npage; i++, pfn++, iova += PAGE_SIZE) {
-   ret = iommu_map(domain->domain, iova,
-   (phys_addr_t)pfn << PAGE_SHIFT,
-   PAGE_SIZE, prot | domain->prot);
+   ret = iommu_map_sync(domain->domain, iova,
+(phys_addr_t)pfn << PAGE_SHIFT,
+PAGE_SIZE, prot | domain->prot);
if (ret)
break;
}
 
for (; i < npage && i > 0; i--, iova -= PAGE_SIZE)
-   iommu_unmap(domain->domain, iova, PAGE_SIZE);
+   iommu_unmap_sync(domain->domain, iova, PAGE_SIZE);
 
return ret;
 }
@@ -897,8 +897,9 @@ static int vfio_iommu_map(struct vfio_iommu *iommu, 
dma_addr_t iova,
int ret;
 
list_for_each_entry(d, &iommu->domain_list, next) {
-   ret = iommu_map(d->domain, iova, (phys_addr_t)pfn << PAGE_SHIFT,
-   npage << PAGE_SHIFT, prot | d->prot);
+   ret = iommu_map_sync(d->domain, iova,
+(phys_addr_t)pfn << PAGE_SHIFT,
+npage << PAGE_SHIFT, prot | d->prot);
if (ret) {
if (ret != -EBUSY ||
map_try_harder(d, iova, pfn, npage, prot))
@@ -912,7 +913,7 @@ static int vfio_iommu_map(struct vfio_iommu *iommu, 
dma_addr_t iova,
 
 unwind:
list_for_each_entry_continue_reverse(d, &iommu->domain_list, next)
-   iommu_unmap(d->domain, iova, npage << PAGE_SHIFT);
+   iommu_unmap_sync(d->domain, iova, npage << PAGE_SHIFT);
 
return ret;
 }
@@ -1102,8 +1103,8 @@ static int vfio_iommu_replay(struct vfio_iommu *iommu,
size = npage << PAGE_SHIFT;
}
 
-   ret = iommu_map(domain->domain, iova, phys,
-   size, dma->prot | domain->prot);
+   ret = iommu_map_sync(domain->domain, iova, phys,
+size, dma->prot | domain->prot);
if (ret)
return ret;
 
@@ -1133,13 +1134,14 @@ static void vfio_test_domain_fgsp(struct vfio_domain 
*domain)
if (!pages)
return;
 
-   ret = 

[PATCH 09/13] drm/rockchip: Use synchronized interface of the IOMMU-API

2017-08-17 Thread Joerg Roedel
From: Joerg Roedel 

The map and unmap functions of the IOMMU-API changed their
semantics: they no longer guarantee that the hardware
TLBs are synchronized with the page-table updates they made.

To make conversion easier, new synchronized functions have
been introduced which give these guarantees again until the
code is converted to use the new TLB-flush interface of the
IOMMU-API, which allows certain optimizations.

But for now, just convert this code to use the synchronized
functions so that it will behave as before.

Cc: Mark Yao 
Cc: David Airlie 
Cc: Heiko Stuebner 
Cc: dri-de...@lists.freedesktop.org
Cc: linux-arm-ker...@lists.infradead.org
Cc: linux-rockc...@lists.infradead.org
Signed-off-by: Joerg Roedel 
---
 drivers/gpu/drm/rockchip/rockchip_drm_gem.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/rockchip/rockchip_drm_gem.c 
b/drivers/gpu/drm/rockchip/rockchip_drm_gem.c
index b74ac71..6d28224 100644
--- a/drivers/gpu/drm/rockchip/rockchip_drm_gem.c
+++ b/drivers/gpu/drm/rockchip/rockchip_drm_gem.c
@@ -41,8 +41,8 @@ static int rockchip_gem_iommu_map(struct rockchip_gem_object 
*rk_obj)
 
rk_obj->dma_addr = rk_obj->mm.start;
 
-   ret = iommu_map_sg(private->domain, rk_obj->dma_addr, rk_obj->sgt->sgl,
-  rk_obj->sgt->nents, prot);
+   ret = iommu_map_sg_sync(private->domain, rk_obj->dma_addr,
+   rk_obj->sgt->sgl, rk_obj->sgt->nents, prot);
if (ret < rk_obj->base.size) {
DRM_ERROR("failed to map buffer: size=%zd request_size=%zd\n",
  ret, rk_obj->base.size);
@@ -67,7 +67,7 @@ static int rockchip_gem_iommu_unmap(struct 
rockchip_gem_object *rk_obj)
struct drm_device *drm = rk_obj->base.dev;
struct rockchip_drm_private *private = drm->dev_private;
 
-   iommu_unmap(private->domain, rk_obj->dma_addr, rk_obj->size);
+   iommu_unmap_sync(private->domain, rk_obj->dma_addr, rk_obj->size);
 
mutex_lock(&private->mm_lock);
 
-- 
2.7.4



[PATCH 04/13] iommu/dma: Use synchronized interface of the IOMMU-API

2017-08-17 Thread Joerg Roedel
From: Joerg Roedel 

The map and unmap functions of the IOMMU-API changed their
semantics: they no longer guarantee that the hardware
TLBs are synchronized with the page-table updates they made.

To make conversion easier, new synchronized functions have
been introduced which give these guarantees again until the
code is converted to use the new TLB-flush interface of the
IOMMU-API, which allows certain optimizations.

But for now, just convert this code to use the synchronized
functions so that it will behave as before.

Cc: Robin Murphy 
Cc: Will Deacon 
Cc: Nate Watterson 
Cc: Eric Auger 
Cc: Mitchel Humpherys 
Signed-off-by: Joerg Roedel 
---
 drivers/iommu/dma-iommu.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 9d1cebe..38c41a2 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -417,7 +417,7 @@ static void __iommu_dma_unmap(struct iommu_domain *domain, dma_addr_t dma_addr,
dma_addr -= iova_off;
size = iova_align(iovad, size + iova_off);
 
-   WARN_ON(iommu_unmap(domain, dma_addr, size) != size);
+   WARN_ON(iommu_unmap_sync(domain, dma_addr, size) != size);
iommu_dma_free_iova(cookie, dma_addr, size);
 }
 
@@ -572,7 +572,7 @@ struct page **iommu_dma_alloc(struct device *dev, size_t size, gfp_t gfp,
		sg_miter_stop(&miter);
}
 
-   if (iommu_map_sg(domain, iova, sgt.sgl, sgt.orig_nents, prot)
+   if (iommu_map_sg_sync(domain, iova, sgt.sgl, sgt.orig_nents, prot)
< size)
goto out_free_sg;
 
@@ -631,7 +631,7 @@ static dma_addr_t __iommu_dma_map(struct device *dev, phys_addr_t phys,
if (!iova)
return IOMMU_MAPPING_ERROR;
 
-   if (iommu_map(domain, iova, phys - iova_off, size, prot)) {
+   if (iommu_map_sync(domain, iova, phys - iova_off, size, prot)) {
iommu_dma_free_iova(cookie, iova, size);
return IOMMU_MAPPING_ERROR;
}
@@ -791,7 +791,7 @@ int iommu_dma_map_sg(struct device *dev, struct scatterlist *sg,
 * We'll leave any physical concatenation to the IOMMU driver's
 * implementation - it knows better than we do.
 */
-   if (iommu_map_sg(domain, iova, sg, nents, prot) < iova_len)
+   if (iommu_map_sg_sync(domain, iova, sg, nents, prot) < iova_len)
goto out_free_iova;
 
return __finalise_sg(dev, sg, nents, iova);
-- 
2.7.4



[PATCH 00/13] Introduce IOMMU-API TLB Flushing Interface

2017-08-17 Thread Joerg Roedel
Hi,

here is a patch-set to introduce an explicit interface to
the IOMMU-API to flush IOMMU and device IO/TLBs. Currently
the iommu_map(), iommu_map_sg(), and iommu_unmap() functions
have to make sure all IO/TLBs in the system are synchronized
with the page-table updates they made.

This is very inefficient in some scenarios, for example when
a large address space is unmapped and an IO/TLB flush has to
be done in every call of iommu_unmap(), or when it makes
sense to queue up several page-table changes and flush them
together.

To optimize these scenarios, the need to synchronize with
the IOMMU and device TLBs has been removed from the
map/unmap functions of the IOMMU-API and an interface to
explicitly do the flushes has been introduced.

To make the conversion of existing users of the IOMMU-API
easier, new functions - iommu_map_sync(), iommu_map_sg_sync(),
and iommu_unmap_sync() - have been introduced. These
functions guarantee that the IO/TLBs are synchronized with
any page-table update when they return. The optimizations
possible with the new interface are subject to separate
patch-sets.

Patch 1 just renames a few functions in the AMD-Vi driver
that would otherwise collide with the new TLB-flush
functions from the IOMMU-API.

Patch 2 introduces the new IO/TLB Flush-Interface.

Patch 3-13 convert existing users of the IOMMU-API to use
the *_sync functions for now.

Please review.

Thanks,

Joerg

Joerg Roedel (13):
  iommu/amd: Rename a few flush functions
  iommu: Introduce Interface for IOMMU TLB Flushing
  vfio/type1: Use synchronized interface of the IOMMU-API
  iommu/dma: Use synchronized interface of the IOMMU-API
  arm: dma-mapping: Use synchronized interface of the IOMMU-API
  drm/etnaviv: Use synchronized interface of the IOMMU-API
  drm/msm: Use synchronized interface of the IOMMU-API
  drm/nouveau/imem/gk20a: Use synchronized interface of the IOMMU-API
  drm/rockchip: Use synchronized interface of the IOMMU-API
  drm/tegra: Use synchronized interface of the IOMMU-API
  gpu: host1x: Use synchronized interface of the IOMMU-API
  IB/usnic: Use synchronized interface of the IOMMU-API
  remoteproc: Use synchronized interface of the IOMMU-API

 arch/arm/mm/dma-mapping.c  | 21 +++---
 drivers/gpu/drm/etnaviv/etnaviv_mmu.c  | 10 +--
 drivers/gpu/drm/msm/msm_iommu.c|  5 +-
 .../gpu/drm/nouveau/nvkm/subdev/instmem/gk20a.c| 12 ++--
 drivers/gpu/drm/rockchip/rockchip_drm_gem.c|  6 +-
 drivers/gpu/drm/tegra/drm.c|  6 +-
 drivers/gpu/drm/tegra/gem.c|  6 +-
 drivers/gpu/host1x/cdma.c  |  6 +-
 drivers/gpu/host1x/job.c   |  6 +-
 drivers/infiniband/hw/usnic/usnic_uiom.c   | 10 +--
 drivers/iommu/amd_iommu.c  | 16 ++---
 drivers/iommu/dma-iommu.c  |  8 +--
 drivers/iommu/iommu.c  | 26 +++
 drivers/remoteproc/remoteproc_core.c   | 10 +--
 drivers/vfio/vfio_iommu_type1.c| 38 +-
 include/linux/iommu.h  | 80 +-
 16 files changed, 189 insertions(+), 77 deletions(-)

-- 
2.7.4



[PATCH 01/13] iommu/amd: Rename a few flush functions

2017-08-17 Thread Joerg Roedel
From: Joerg Roedel 

Rename a few iommu cache-flush functions that start with
iommu_ to start with amd_iommu now. This is to prevent name
collisions with generic iommu code later on.

Signed-off-by: Joerg Roedel 
---
 drivers/iommu/amd_iommu.c | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index 354cbd6..5469457 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -1165,7 +1165,7 @@ static int iommu_flush_dte(struct amd_iommu *iommu, u16 devid)
	return iommu_queue_command(iommu, &cmd);
 }
 
-static void iommu_flush_dte_all(struct amd_iommu *iommu)
+static void amd_iommu_flush_dte_all(struct amd_iommu *iommu)
 {
u32 devid;
 
@@ -1179,7 +1179,7 @@ static void iommu_flush_dte_all(struct amd_iommu *iommu)
  * This function uses heavy locking and may disable irqs for some time. But
  * this is no issue because it is only called during resume.
  */
-static void iommu_flush_tlb_all(struct amd_iommu *iommu)
+static void amd_iommu_flush_tlb_all(struct amd_iommu *iommu)
 {
u32 dom_id;
 
@@ -1193,7 +1193,7 @@ static void iommu_flush_tlb_all(struct amd_iommu *iommu)
iommu_completion_wait(iommu);
 }
 
-static void iommu_flush_all(struct amd_iommu *iommu)
+static void amd_iommu_flush_all(struct amd_iommu *iommu)
 {
struct iommu_cmd cmd;
 
@@ -1212,7 +1212,7 @@ static void iommu_flush_irt(struct amd_iommu *iommu, u16 devid)
	iommu_queue_command(iommu, &cmd);
 }
 
-static void iommu_flush_irt_all(struct amd_iommu *iommu)
+static void amd_iommu_flush_irt_all(struct amd_iommu *iommu)
 {
u32 devid;
 
@@ -1225,11 +1225,11 @@ static void iommu_flush_irt_all(struct amd_iommu *iommu)
 void iommu_flush_all_caches(struct amd_iommu *iommu)
 {
if (iommu_feature(iommu, FEATURE_IA)) {
-   iommu_flush_all(iommu);
+   amd_iommu_flush_all(iommu);
} else {
-   iommu_flush_dte_all(iommu);
-   iommu_flush_irt_all(iommu);
-   iommu_flush_tlb_all(iommu);
+   amd_iommu_flush_dte_all(iommu);
+   amd_iommu_flush_irt_all(iommu);
+   amd_iommu_flush_tlb_all(iommu);
}
 }
 
-- 
2.7.4



RE: [virtio-dev] [RFC] virtio-iommu version 0.4

2017-08-17 Thread Bharat Bhushan
Hi,

> -Original Message-
> From: Jean-Philippe Brucker [mailto:jean-philippe.bruc...@arm.com]
> Sent: Thursday, August 17, 2017 3:42 PM
> To: Adam Tao 
> Cc: iommu@lists.linux-foundation.org; k...@vger.kernel.org;
> virtualizat...@lists.linux-foundation.org; virtio-...@lists.oasis-open.org;
> will.dea...@arm.com; robin.mur...@arm.com; lorenzo.pieral...@arm.com;
> m...@redhat.com; jasow...@redhat.com; marc.zyng...@arm.com;
> eric.au...@redhat.com; eric.auger@gmail.com; Bharat Bhushan
> ; pet...@redhat.com; kevin.t...@intel.com
> Subject: Re: [virtio-dev] [RFC] virtio-iommu version 0.4
> 
> Hi Adam,
> 
> On 16/08/17 05:08, Adam Tao wrote:
> >> * There is a working Qemu prototype [3], thanks to Eric Auger and Bharat
> >>   Bhushan.
> > Hi, Brucker
> > I read the related spec for virtio IOMMU, I am wondering if we support
> > both the virtual and physical devices in the guest to use the virtio
> > IOMMU, how can we config the related address translation for the
> > different type of devices.
> > e.g.
> > 1. for Virtio devs(e.g. virtio-net), we can change the memmap table
> > used by the vhost device.
> > 2. for physical devices we can directly config the IOMMU/SMMU on the
> > host.
> >
> > So do you know are there any RFCs for the back-end implementation for
> > the virtio-IOMMU thanks Adam
> 
> Eric's series [3] adds support for virtual devices (I've only had time to test
> virtio-net so far, but as vhost-iotlb seems to be supported by Qemu, it should
> work with vhost-net as well). Bharat is working on the VFIO backend for
> physical devices. As far as I'm aware, his latest version is:
> 
> [7] https://lists.gnu.org/archive/html/qemu-devel/2017-07/msg04094.html

I am working on a newer version which will be based on Eric's v3.
Actually I wanted to include the fix for the issue reported by Eric: it
fails if we assign more than one interface. I reproduced the issue on my
side and know the root cause; I am working on fixing this.

Thanks
-Bharat

> 
> Thanks,
> Jean
> 
> 
> >> [1] http://www.spinics.net/lists/kvm/msg147990.html
> >> [2] git://linux-arm.org/virtio-iommu.git branch viommu/v0.4
> >> http://www.linux-arm.org/git?p=virtio-
> iommu.git;a=blob;f=dist/v0.4/virtio-iommu-v0.4.pdf
> >> I reiterate the disclaimers: don't use this document as a reference,
> >> it's a draft. It's also not an OASIS document yet. It may be riddled
> >> with mistakes. As this is a working draft, it is unstable and I do not
> >> guarantee backward compatibility of future versions.
> >> [3] https://lists.gnu.org/archive/html/qemu-arm/2017-08/msg4.html
> >> [4] git://linux-arm.org/linux-jpb.git virtio-iommu/v0.4
> >> Warning: UAPI headers have changed! They didn't follow the spec,
> >> please update. (Use branch v0.1, that has the old headers, for the
> >> Qemu prototype [3])
> >> [5] git://linux-arm.org/kvmtool-jpb.git virtio-iommu/v0.4
> >> Warning: command-line has changed! Use --viommu vfio[,opts] and
> >> --viommu virtio[,opts] to instantiate a device.
> >> [6]
> >> http://www.linux-arm.org/git?p=virtio-iommu.git;a=tree;f=dist/diffs


RE: [RFC PATCH v5 0/5] vfio-pci: Add support for mmapping MSI-X table

2017-08-17 Thread David Laight
From: Alex Williamson
> Sent: 16 August 2017 17:56
...
> Firmware pissing match...  Processors running with 8k or less page size
> fall within the recommendations of the PCI spec for register alignment
> of MMIO regions of the device and this whole problem becomes less of an
> issue.

Actually if qemu is causing the MSI-X table accesses to fault, why doesn't
it just lie to the guest about the physical address of the MSI-X table?
Then mmio access to anything in the same physical page will just work.

It has already been pointed out that you can't actually police the
interrupts that are raised without host hardware support.

Actually, putting other vectors in the MSI-X table is boring, most
drivers will ignore unexpected interrupts.
Much more interesting are physical memory addresses and accessible IO
addresses.
Of course, a lot of boards have PCI master capability and can probably
be persuaded to do writes to specific locations anyway.

David



Re: [PATCH] iommu: Avoid NULL group dereference

2017-08-17 Thread Marc Zyngier
On 17/08/17 11:40, Robin Murphy wrote:
> The recently-removed FIXME in iommu_get_domain_for_dev() turns out to
> have been a little misleading, since that check is still worthwhile even
> when groups *are* universal. We have a few IOMMU-aware drivers which
> only care whether their device is already attached to an existing domain
> or not, for which the previous behaviour of iommu_get_domain_for_dev()
> was ideal, and which now crash if their device does not have an IOMMU.
> 
> With IOMMU groups now serving as a reliable indicator of whether a
> device has an IOMMU or not (barring false-positives from VFIO no-IOMMU
> mode), drivers could arguably do this:
> 
>   group = iommu_group_get(dev);
>   if (group) {
>   domain = iommu_get_domain_for_dev(dev);
>   iommu_group_put(group);
>   }
> 
> However, rather than duplicate that code across multiple callsites,
> particularly when it's still only the domain they care about, let's skip
> straight to the next step and factor out the check into the common place
> it applies - in iommu_get_domain_for_dev() itself. Sure, it ends up
> looking rather familiar, but now it's backed by the reasoning of having
> a robust API able to do the expected thing for all devices regardless.
> 
> Fixes: 05f80300dc8b ("iommu: Finish making iommu_group support mandatory")
> Reported-by: Shawn Lin 
> Signed-off-by: Robin Murphy 

Acked-by: Marc Zyngier 

Thanks,

M.
-- 
Jazz is not dead. It just smells funny...


[PATCH] iommu: Avoid NULL group dereference

2017-08-17 Thread Robin Murphy
The recently-removed FIXME in iommu_get_domain_for_dev() turns out to
have been a little misleading, since that check is still worthwhile even
when groups *are* universal. We have a few IOMMU-aware drivers which
only care whether their device is already attached to an existing domain
or not, for which the previous behaviour of iommu_get_domain_for_dev()
was ideal, and which now crash if their device does not have an IOMMU.

With IOMMU groups now serving as a reliable indicator of whether a
device has an IOMMU or not (barring false-positives from VFIO no-IOMMU
mode), drivers could arguably do this:

group = iommu_group_get(dev);
if (group) {
domain = iommu_get_domain_for_dev(dev);
iommu_group_put(group);
}

However, rather than duplicate that code across multiple callsites,
particularly when it's still only the domain they care about, let's skip
straight to the next step and factor out the check into the common place
it applies - in iommu_get_domain_for_dev() itself. Sure, it ends up
looking rather familiar, but now it's backed by the reasoning of having
a robust API able to do the expected thing for all devices regardless.

Fixes: 05f80300dc8b ("iommu: Finish making iommu_group support mandatory")
Reported-by: Shawn Lin 
Signed-off-by: Robin Murphy 
---

As well as dma-iommu, there are at least the Cavium ThunderX and
Freescale DPAA2 ethernet drivers expecting this to work too.

 drivers/iommu/iommu.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index af69bf7e035a..5499a0387349 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1352,6 +1352,8 @@ struct iommu_domain *iommu_get_domain_for_dev(struct device *dev)
struct iommu_group *group;
 
group = iommu_group_get(dev);
+   if (!group)
+   return NULL;
 
domain = group->domain;
 
-- 
2.13.4.dirty



Re: [virtio-dev] [RFC] virtio-iommu version 0.4

2017-08-17 Thread Jean-Philippe Brucker
Hi Adam,

On 16/08/17 05:08, Adam Tao wrote:
>> * There is a working Qemu prototype [3], thanks to Eric Auger and Bharat
>>   Bhushan.
> Hi, Brucker
> I read the related spec for virtio IOMMU,
> I am wondering if we support both the virtual and physical devices in
> the guest to use the virtio IOMMU, how can we config the related address
> translation for the different type of devices.
> e.g.
> 1. for Virtio devs(e.g. virtio-net), we can change the memmap table used
> by the vhost device.
> 2. for physical devices we can directly config the IOMMU/SMMU on the
> host.
> 
> So do you know are there any RFCs for the back-end implementation for the 
> virtio-IOMMU
> thanks
> Adam

Eric's series [3] adds support for virtual devices (I've only had time to
test virtio-net so far, but as vhost-iotlb seems to be supported by Qemu,
it should work with vhost-net as well). Bharat is working on the VFIO
backend for physical devices. As far as I'm aware, his latest version is:

[7] https://lists.gnu.org/archive/html/qemu-devel/2017-07/msg04094.html

Thanks,
Jean


>> [1] http://www.spinics.net/lists/kvm/msg147990.html
>> [2] git://linux-arm.org/virtio-iommu.git branch viommu/v0.4
>> 
>> http://www.linux-arm.org/git?p=virtio-iommu.git;a=blob;f=dist/v0.4/virtio-iommu-v0.4.pdf
>> I reiterate the disclaimers: don't use this document as a reference,
>> it's a draft. It's also not an OASIS document yet. It may be riddled
>> with mistakes. As this is a working draft, it is unstable and I do not
>> guarantee backward compatibility of future versions.
>> [3] https://lists.gnu.org/archive/html/qemu-arm/2017-08/msg4.html
>> [4] git://linux-arm.org/linux-jpb.git virtio-iommu/v0.4
>> Warning: UAPI headers have changed! They didn't follow the spec,
>> please update. (Use branch v0.1, that has the old headers, for the
>> Qemu prototype [3])
>> [5] git://linux-arm.org/kvmtool-jpb.git virtio-iommu/v0.4
>> Warning: command-line has changed! Use --viommu vfio[,opts] and
>> --viommu virtio[,opts] to instantiate a device.
>> [6] http://www.linux-arm.org/git?p=virtio-iommu.git;a=tree;f=dist/diffs


Re: [PATCH] irqchip/gic-{v2m, v3-its}: check IOMMU capability before doing IOMMU map

2017-08-17 Thread Robin Murphy
On 17/08/17 10:22, Marc Zyngier wrote:
> On 17/08/17 10:01, Shawn Lin wrote:
>> Hi Marc
>>
>> On 2017/8/17 16:52, Marc Zyngier wrote:
>>> On 17/08/17 09:28, Shawn Lin wrote:
If a PCIe RC uses gic-v2m or gic-v3-its as its MSI domain but doesn't
have IOMMU support, we don't need iommu_dma_map_msi_msg() to
obtain a mapped IOMMU address, as all we need is the physical address.
Otherwise we see the oops shown below, because we can't get an iommu_group
for a device that is not IOMMU-capable.

 Unable to handle kernel NULL pointer dereference at virtual address
 00d0
 [00d0] user address but active_mm is swapper
 Internal error: Oops: 9604 [#1] PREEMPT SMP
 Modules linked in:
 CPU: 3 PID: 1 Comm: swapper/0 Not tainted
 4.13.0-rc5-next-20170815-1-g0744890-dirty #53
 Hardware name: Firefly-RK3399 Board (DT)
 task: 80007bc7 task.stack: 80007bc6c000
 PC is at iommu_get_domain_for_dev+0x38/0x50
 LR is at iommu_dma_map_msi_msg+0x3c/0x1b8
 pc : [] lr : [] pstate:
 a045

 ...

 [] iommu_get_domain_for_dev+0x38/0x50
 [] iommu_dma_map_msi_msg+0x3c/0x1b8
 [] its_irq_compose_msi_msg+0x44/0x50
 [] irq_chip_compose_msi_msg+0x40/0x58
 [] msi_domain_activate+0x1c/0x48
 [] __irq_domain_activate_irq+0x40/0x58
 [] irq_domain_activate_irq+0x24/0x40
 [] msi_domain_alloc_irqs+0x104/0x190
 [] pci_msi_setup_msi_irqs+0x3c/0x48
 [] __pci_enable_msi_range+0x21c/0x408
 [] pci_alloc_irq_vectors_affinity+0x104/0x168
 [] pcie_port_device_register+0x200/0x488
 [] pcie_portdrv_probe+0x34/0xd0
 [] local_pci_probe+0x3c/0xb8
 [] pci_device_probe+0x138/0x170
 [] driver_probe_device+0x21c/0x2d8
 [] __device_attach_driver+0x9c/0xf8
 [] bus_for_each_drv+0x5c/0x98
 [] __device_attach+0xc4/0x138
 [] device_attach+0x10/0x18
 [] pci_bus_add_device+0x4c/0xa8
 [] pci_bus_add_devices+0x44/0x90
 [] rockchip_pcie_probe+0xc70/0xcc8
 [] platform_drv_probe+0x58/0xc0
 [] driver_probe_device+0x21c/0x2d8
 [] __driver_attach+0xac/0xb0
 [] bus_for_each_dev+0x64/0xa0
 [] driver_attach+0x20/0x28
 [] bus_add_driver+0x110/0x230
 [] driver_register+0x60/0xf8
 [] __platform_driver_register+0x40/0x48
 [] rockchip_pcie_driver_init+0x18/0x20
 [] do_one_initcall+0xb0/0x120
 [] kernel_init_freeable+0x184/0x224
 [] kernel_init+0x10/0x100
 [] ret_from_fork+0x10/0x18

This patch fixes the problem exposed by the commit mentioned below.
Before that commit, the IOMMU code had a workaround for this, but it
no longer does. So we could fix this in the GIC code, but it probably
still needs a Fixes tag.

 Fixes: 05f80300dc8b ("iommu: Finish making iommu_group support mandatory")
 Signed-off-by: Shawn Lin 
>>>
>>> That looks pretty horrible. Please see the patch I posted in response to
>>> your initial report:
>>>
>>> http://marc.info/?l=linux-pci&m=150295809430903&w=2
>>
>> Aha, I saw it but I didn't get your point of "I'm not convinced  that
>> playing with the group refcount in an irqchip is the right approach..."
>>
>> So, I'm not sure whether you prefer your patch?
> I really dislike both:
> 
> - yours: because it reinvents the wheel (parsing DT one more time on each
> MSI message creation), and is DT specific while the rest of the code is
> completely firmware agnostic (what about ACPI?).
> 
> - mine: because it relies on an intimate knowledge of the internals of
> the iommu stuff, which is not what was originally intended.
> 
> My comment was an invitation to Joerg to speak his mind.
> 
> The original intent of iommu_dma_map_msi_msg() was that it could silently
> fail without exploding in your face. We can change that, but a minimum of
> coordination wouldn't hurt.
> 
> My personal preference would be to address this in the iommu layer,
> restoring a non-exploding iommu_dma_map_msi_msg():

Indeed, but that still doesn't fix other breakages that simply haven't
been found yet ;)

Thanks for the reports - I see exactly what I've done here. Gimme a
moment to write it up convincingly...

Robin.

> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> index 44eca1e3243f..5440eae21bea 100644
> --- a/drivers/iommu/dma-iommu.c
> +++ b/drivers/iommu/dma-iommu.c
> @@ -883,12 +883,18 @@ static struct iommu_dma_msi_page *iommu_dma_get_msi_page(struct device *dev,
>  void iommu_dma_map_msi_msg(int irq, struct msi_msg *msg)
>  {
>   struct device *dev = msi_desc_to_dev(irq_get_msi_desc(irq));
> - struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
> + struct iommu_domain *domain;
> + struct iommu_group *group;
>   struct iommu_dma_cookie *cookie;
>   struct iommu_dma_msi_page *msi_page;
>   phys_addr_t msi_addr = (u64)msg->address_hi << 32 | msg->address_lo;
>   unsigned long flags;
>  
> + group = iommu_group_get(dev);
> + 

Re: [PATCH] irqchip/gic-{v2m, v3-its}: check IOMMU capability before doing IOMMU map

2017-08-17 Thread Marc Zyngier
On 17/08/17 10:01, Shawn Lin wrote:
> Hi Marc
> 
> On 2017/8/17 16:52, Marc Zyngier wrote:
>> On 17/08/17 09:28, Shawn Lin wrote:
>>> If a PCIe RC uses gic-v2m or gic-v3-its as its MSI domain but doesn't
>>> have IOMMU support, we don't need iommu_dma_map_msi_msg() to
>>> obtain a mapped IOMMU address, as all we need is the physical address.
>>> Otherwise we see the oops shown below, because we can't get an iommu_group
>>> for a device that is not IOMMU-capable.
>>>
>>> Unable to handle kernel NULL pointer dereference at virtual address
>>> 00d0
>>> [00d0] user address but active_mm is swapper
>>> Internal error: Oops: 9604 [#1] PREEMPT SMP
>>> Modules linked in:
>>> CPU: 3 PID: 1 Comm: swapper/0 Not tainted
>>> 4.13.0-rc5-next-20170815-1-g0744890-dirty #53
>>> Hardware name: Firefly-RK3399 Board (DT)
>>> task: 80007bc7 task.stack: 80007bc6c000
>>> PC is at iommu_get_domain_for_dev+0x38/0x50
>>> LR is at iommu_dma_map_msi_msg+0x3c/0x1b8
>>> pc : [] lr : [] pstate:
>>> a045
>>>
>>> ...
>>>
>>> [] iommu_get_domain_for_dev+0x38/0x50
>>> [] iommu_dma_map_msi_msg+0x3c/0x1b8
>>> [] its_irq_compose_msi_msg+0x44/0x50
>>> [] irq_chip_compose_msi_msg+0x40/0x58
>>> [] msi_domain_activate+0x1c/0x48
>>> [] __irq_domain_activate_irq+0x40/0x58
>>> [] irq_domain_activate_irq+0x24/0x40
>>> [] msi_domain_alloc_irqs+0x104/0x190
>>> [] pci_msi_setup_msi_irqs+0x3c/0x48
>>> [] __pci_enable_msi_range+0x21c/0x408
>>> [] pci_alloc_irq_vectors_affinity+0x104/0x168
>>> [] pcie_port_device_register+0x200/0x488
>>> [] pcie_portdrv_probe+0x34/0xd0
>>> [] local_pci_probe+0x3c/0xb8
>>> [] pci_device_probe+0x138/0x170
>>> [] driver_probe_device+0x21c/0x2d8
>>> [] __device_attach_driver+0x9c/0xf8
>>> [] bus_for_each_drv+0x5c/0x98
>>> [] __device_attach+0xc4/0x138
>>> [] device_attach+0x10/0x18
>>> [] pci_bus_add_device+0x4c/0xa8
>>> [] pci_bus_add_devices+0x44/0x90
>>> [] rockchip_pcie_probe+0xc70/0xcc8
>>> [] platform_drv_probe+0x58/0xc0
>>> [] driver_probe_device+0x21c/0x2d8
>>> [] __driver_attach+0xac/0xb0
>>> [] bus_for_each_dev+0x64/0xa0
>>> [] driver_attach+0x20/0x28
>>> [] bus_add_driver+0x110/0x230
>>> [] driver_register+0x60/0xf8
>>> [] __platform_driver_register+0x40/0x48
>>> [] rockchip_pcie_driver_init+0x18/0x20
>>> [] do_one_initcall+0xb0/0x120
>>> [] kernel_init_freeable+0x184/0x224
>>> [] kernel_init+0x10/0x100
>>> [] ret_from_fork+0x10/0x18
>>>
>>> This patch fixes the problem exposed by the commit mentioned below.
>>> Before that commit, the IOMMU code had a workaround for this, but it
>>> no longer does. So we could fix this in the GIC code, but it probably
>>> still needs a Fixes tag.
>>>
>>> Fixes: 05f80300dc8b ("iommu: Finish making iommu_group support mandatory")
>>> Signed-off-by: Shawn Lin 
>>
>> That looks pretty horrible. Please see the patch I posted in response to
>> your initial report:
>>
>> http://marc.info/?l=linux-pci&m=150295809430903&w=2
> 
> Aha, I saw it but I didn't get your point of "I'm not convinced  that
> playing with the group refcount in an irqchip is the right approach..."
> 
> So, I'm not sure whether you prefer your patch?
I really dislike both:

- yours: because it reinvents the wheel (parsing DT one more time on each
MSI message creation), and is DT specific while the rest of the code is
completely firmware agnostic (what about ACPI?).

- mine: because it relies on an intimate knowledge of the internals of
the iommu stuff, which is not what was originally intended.

My comment was an invitation to Joerg to speak his mind.

The original intent of iommu_dma_map_msi_msg() was that it could silently
fail without exploding in your face. We can change that, but a minimum of
coordination wouldn't hurt.

My personal preference would be to address this in the iommu layer,
restoring a non-exploding iommu_dma_map_msi_msg():

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 44eca1e3243f..5440eae21bea 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -883,12 +883,18 @@ static struct iommu_dma_msi_page *iommu_dma_get_msi_page(struct device *dev,
 void iommu_dma_map_msi_msg(int irq, struct msi_msg *msg)
 {
struct device *dev = msi_desc_to_dev(irq_get_msi_desc(irq));
-   struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
+   struct iommu_domain *domain;
+   struct iommu_group *group;
struct iommu_dma_cookie *cookie;
struct iommu_dma_msi_page *msi_page;
phys_addr_t msi_addr = (u64)msg->address_hi << 32 | msg->address_lo;
unsigned long flags;
 
+   group = iommu_group_get(dev);
+   if (!group)
+   return;
+   domain = iommu_get_domain_for_dev(dev);
+   iommu_group_put(group);
if (!domain || !domain->iova_cookie)
return;
 
Comments welcome.

M.
-- 
Jazz is not dead. It just smells funny...

Re: [PATCH] irqchip/gic-{v2m, v3-its}: check IOMMU capability before doing IOMMU map

2017-08-17 Thread Shawn Lin

Hi Marc

On 2017/8/17 16:52, Marc Zyngier wrote:

On 17/08/17 09:28, Shawn Lin wrote:

If a PCIe RC uses gic-v2m or gic-v3-its as its MSI domain but doesn't
have IOMMU support, we don't need iommu_dma_map_msi_msg() to
obtain a mapped IOMMU address, as all we need is the physical address.
Otherwise we see the oops shown below, because we can't get an iommu_group
for a device that is not IOMMU-capable.

Unable to handle kernel NULL pointer dereference at virtual address
00d0
[00d0] user address but active_mm is swapper
Internal error: Oops: 9604 [#1] PREEMPT SMP
Modules linked in:
CPU: 3 PID: 1 Comm: swapper/0 Not tainted
4.13.0-rc5-next-20170815-1-g0744890-dirty #53
Hardware name: Firefly-RK3399 Board (DT)
task: 80007bc7 task.stack: 80007bc6c000
PC is at iommu_get_domain_for_dev+0x38/0x50
LR is at iommu_dma_map_msi_msg+0x3c/0x1b8
pc : [] lr : [] pstate:
a045

...

[] iommu_get_domain_for_dev+0x38/0x50
[] iommu_dma_map_msi_msg+0x3c/0x1b8
[] its_irq_compose_msi_msg+0x44/0x50
[] irq_chip_compose_msi_msg+0x40/0x58
[] msi_domain_activate+0x1c/0x48
[] __irq_domain_activate_irq+0x40/0x58
[] irq_domain_activate_irq+0x24/0x40
[] msi_domain_alloc_irqs+0x104/0x190
[] pci_msi_setup_msi_irqs+0x3c/0x48
[] __pci_enable_msi_range+0x21c/0x408
[] pci_alloc_irq_vectors_affinity+0x104/0x168
[] pcie_port_device_register+0x200/0x488
[] pcie_portdrv_probe+0x34/0xd0
[] local_pci_probe+0x3c/0xb8
[] pci_device_probe+0x138/0x170
[] driver_probe_device+0x21c/0x2d8
[] __device_attach_driver+0x9c/0xf8
[] bus_for_each_drv+0x5c/0x98
[] __device_attach+0xc4/0x138
[] device_attach+0x10/0x18
[] pci_bus_add_device+0x4c/0xa8
[] pci_bus_add_devices+0x44/0x90
[] rockchip_pcie_probe+0xc70/0xcc8
[] platform_drv_probe+0x58/0xc0
[] driver_probe_device+0x21c/0x2d8
[] __driver_attach+0xac/0xb0
[] bus_for_each_dev+0x64/0xa0
[] driver_attach+0x20/0x28
[] bus_add_driver+0x110/0x230
[] driver_register+0x60/0xf8
[] __platform_driver_register+0x40/0x48
[] rockchip_pcie_driver_init+0x18/0x20
[] do_one_initcall+0xb0/0x120
[] kernel_init_freeable+0x184/0x224
[] kernel_init+0x10/0x100
[] ret_from_fork+0x10/0x18

This patch fixes the problem exposed by the commit mentioned below.
Before that commit, the IOMMU code had a workaround for this, but it
no longer does. So we could fix this in the GIC code, but it probably
still needs a Fixes tag.

Fixes: 05f80300dc8b ("iommu: Finish making iommu_group support mandatory")
Signed-off-by: Shawn Lin 


That looks pretty horrible. Please see the patch I posted in response to
your initial report:

http://marc.info/?l=linux-pci&m=150295809430903&w=2


Aha, I saw it but I didn't get your point of "I'm not convinced  that
playing with the group refcount in an irqchip is the right approach..."

So, I'm not sure whether you prefer your patch?



Thanks,

M.





Re: [PATCH] irqchip/gic-{v2m, v3-its}: check IOMMU capability before doing IOMMU map

2017-08-17 Thread Marc Zyngier
On 17/08/17 09:28, Shawn Lin wrote:
> If a PCIe RC uses gic-v2m or gic-v3-its as its MSI domain but doesn't
> have IOMMU support, we don't need iommu_dma_map_msi_msg() to
> obtain a mapped IOMMU address, as all we need is the physical address.
> Otherwise we see the oops shown below, because we can't get an iommu_group
> for a device that is not IOMMU-capable.
> 
> Unable to handle kernel NULL pointer dereference at virtual address
> 00d0
> [00d0] user address but active_mm is swapper
> Internal error: Oops: 9604 [#1] PREEMPT SMP
> Modules linked in:
> CPU: 3 PID: 1 Comm: swapper/0 Not tainted
> 4.13.0-rc5-next-20170815-1-g0744890-dirty #53
> Hardware name: Firefly-RK3399 Board (DT)
> task: 80007bc7 task.stack: 80007bc6c000
> PC is at iommu_get_domain_for_dev+0x38/0x50
> LR is at iommu_dma_map_msi_msg+0x3c/0x1b8
> pc : [] lr : [] pstate:
> a045
> 
> ...
> 
> [] iommu_get_domain_for_dev+0x38/0x50
> [] iommu_dma_map_msi_msg+0x3c/0x1b8
> [] its_irq_compose_msi_msg+0x44/0x50
> [] irq_chip_compose_msi_msg+0x40/0x58
> [] msi_domain_activate+0x1c/0x48
> [] __irq_domain_activate_irq+0x40/0x58
> [] irq_domain_activate_irq+0x24/0x40
> [] msi_domain_alloc_irqs+0x104/0x190
> [] pci_msi_setup_msi_irqs+0x3c/0x48
> [] __pci_enable_msi_range+0x21c/0x408
> [] pci_alloc_irq_vectors_affinity+0x104/0x168
> [] pcie_port_device_register+0x200/0x488
> [] pcie_portdrv_probe+0x34/0xd0
> [] local_pci_probe+0x3c/0xb8
> [] pci_device_probe+0x138/0x170
> [] driver_probe_device+0x21c/0x2d8
> [] __device_attach_driver+0x9c/0xf8
> [] bus_for_each_drv+0x5c/0x98
> [] __device_attach+0xc4/0x138
> [] device_attach+0x10/0x18
> [] pci_bus_add_device+0x4c/0xa8
> [] pci_bus_add_devices+0x44/0x90
> [] rockchip_pcie_probe+0xc70/0xcc8
> [] platform_drv_probe+0x58/0xc0
> [] driver_probe_device+0x21c/0x2d8
> [] __driver_attach+0xac/0xb0
> [] bus_for_each_dev+0x64/0xa0
> [] driver_attach+0x20/0x28
> [] bus_add_driver+0x110/0x230
> [] driver_register+0x60/0xf8
> [] __platform_driver_register+0x40/0x48
> [] rockchip_pcie_driver_init+0x18/0x20
> [] do_one_initcall+0xb0/0x120
> [] kernel_init_freeable+0x184/0x224
> [] kernel_init+0x10/0x100
> [] ret_from_fork+0x10/0x18
> 
> This patch fixes the problem exposed by the commit referenced in the
> Fixes tag below. Before that commit, the IOMMU core had a workaround for
> this, but now it doesn't. So we could fix this in the GIC code, but we
> probably still need a Fixes tag here.
> 
> Fixes: 05f80300dc8b ("iommu: Finish making iommu_group support mandatory")
> Signed-off-by: Shawn Lin 

That looks pretty horrible. Please see the patch I posted in response to
your initial report:

http://marc.info/?l=linux-pci&m=150295809430903&w=2

Thanks,

M.
-- 
Jazz is not dead. It just smells funny...


Re: PCIe oops for NULL pointer dereference during next-20170807 and next-20170815

2017-08-17 Thread Shawn Lin

On 2017/8/17 15:58, Joerg Roedel wrote:

On Thu, Aug 17, 2017 at 03:02:31PM +0800, Shawn Lin wrote:

So should we revert this commit or maybe we could add some checking
into gic-v2m and gic-v3-its to see if the dev is iommu-capable? If not,
we should create another routine to map MSI msg.


Yes, fixing this in gic code is the right approach. I usually don't
revert patches to hide problems somewhere else.



Ok, then I'll post a patch fixing that in the GIC code. Thanks.




Regards,

Joerg








[PATCH] irqchip/gic-{v2m, v3-its}: check iommu capable before doing iommu map

2017-08-17 Thread Shawn Lin
If a PCIe RC uses gic-v2m or gic-v3-its as an MSI domain but doesn't
have IOMMU support, we don't need to call iommu_dma_map_msi_msg() to
get a mapped IOMMU address, as all we need is the physical address.
Otherwise we see the oops shown below, since we can't get an iommu_group
for a device without IOMMU capability.

Unable to handle kernel NULL pointer dereference at virtual address
00d0
[00d0] user address but active_mm is swapper
Internal error: Oops: 9604 [#1] PREEMPT SMP
Modules linked in:
CPU: 3 PID: 1 Comm: swapper/0 Not tainted
4.13.0-rc5-next-20170815-1-g0744890-dirty #53
Hardware name: Firefly-RK3399 Board (DT)
task: 80007bc7 task.stack: 80007bc6c000
PC is at iommu_get_domain_for_dev+0x38/0x50
LR is at iommu_dma_map_msi_msg+0x3c/0x1b8
pc : [] lr : [] pstate:
a045

...

[] iommu_get_domain_for_dev+0x38/0x50
[] iommu_dma_map_msi_msg+0x3c/0x1b8
[] its_irq_compose_msi_msg+0x44/0x50
[] irq_chip_compose_msi_msg+0x40/0x58
[] msi_domain_activate+0x1c/0x48
[] __irq_domain_activate_irq+0x40/0x58
[] irq_domain_activate_irq+0x24/0x40
[] msi_domain_alloc_irqs+0x104/0x190
[] pci_msi_setup_msi_irqs+0x3c/0x48
[] __pci_enable_msi_range+0x21c/0x408
[] pci_alloc_irq_vectors_affinity+0x104/0x168
[] pcie_port_device_register+0x200/0x488
[] pcie_portdrv_probe+0x34/0xd0
[] local_pci_probe+0x3c/0xb8
[] pci_device_probe+0x138/0x170
[] driver_probe_device+0x21c/0x2d8
[] __device_attach_driver+0x9c/0xf8
[] bus_for_each_drv+0x5c/0x98
[] __device_attach+0xc4/0x138
[] device_attach+0x10/0x18
[] pci_bus_add_device+0x4c/0xa8
[] pci_bus_add_devices+0x44/0x90
[] rockchip_pcie_probe+0xc70/0xcc8
[] platform_drv_probe+0x58/0xc0
[] driver_probe_device+0x21c/0x2d8
[] __driver_attach+0xac/0xb0
[] bus_for_each_dev+0x64/0xa0
[] driver_attach+0x20/0x28
[] bus_add_driver+0x110/0x230
[] driver_register+0x60/0xf8
[] __platform_driver_register+0x40/0x48
[] rockchip_pcie_driver_init+0x18/0x20
[] do_one_initcall+0xb0/0x120
[] kernel_init_freeable+0x184/0x224
[] kernel_init+0x10/0x100
[] ret_from_fork+0x10/0x18

This patch fixes the problem exposed by the commit referenced in the
Fixes tag below. Before that commit, the IOMMU core had a workaround for
this, but now it doesn't. So we could fix this in the GIC code, but we
probably still need a Fixes tag here.

Fixes: 05f80300dc8b ("iommu: Finish making iommu_group support mandatory")
Signed-off-by: Shawn Lin 
---

 drivers/irqchip/irq-gic-common.c | 10 ++
 drivers/irqchip/irq-gic-common.h |  2 +-
 drivers/irqchip/irq-gic-v2m.c|  6 +-
 drivers/irqchip/irq-gic-v3-its.c |  4 +++-
 4 files changed, 19 insertions(+), 3 deletions(-)

diff --git a/drivers/irqchip/irq-gic-common.c b/drivers/irqchip/irq-gic-common.c
index 9ae7180..c340d70 100644
--- a/drivers/irqchip/irq-gic-common.c
+++ b/drivers/irqchip/irq-gic-common.c
@@ -14,6 +14,7 @@
  * along with this program.  If not, see .
  */
 
+#include 
 #include 
 #include 
 #include 
@@ -146,3 +147,12 @@ void gic_cpu_config(void __iomem *base, void (*sync_access)(void))
if (sync_access)
sync_access();
 }
+
+bool gic_check_iommu_capable(struct device *dev)
+{
+   struct device_node *np = dev->of_node;
+   int ret;
+
+   ret = of_count_phandle_with_args(np, "iommus", "#iommu-cells");
+   return (ret > 0);
+}
diff --git a/drivers/irqchip/irq-gic-common.h b/drivers/irqchip/irq-gic-common.h
index 205e5fd..61c5db3 100644
--- a/drivers/irqchip/irq-gic-common.h
+++ b/drivers/irqchip/irq-gic-common.h
@@ -37,5 +37,5 @@ void gic_enable_quirks(u32 iidr, const struct gic_quirk 
*quirks,
void *data);
 
 void gic_set_kvm_info(const struct gic_kvm_info *info);
-
+bool gic_check_iommu_capable(struct device *dev);
 #endif /* _IRQ_GIC_COMMON_H */
diff --git a/drivers/irqchip/irq-gic-v2m.c b/drivers/irqchip/irq-gic-v2m.c
index 993a842..87235d6 100644
--- a/drivers/irqchip/irq-gic-v2m.c
+++ b/drivers/irqchip/irq-gic-v2m.c
@@ -27,6 +27,8 @@
 #include 
 #include 
 
+#include "irq-gic-common.h"
+
 /*
 * MSI_TYPER:
 * [31:26] Reserved
@@ -101,6 +103,7 @@ static void gicv2m_unmask_msi_irq(struct irq_data *d)
 static void gicv2m_compose_msi_msg(struct irq_data *data, struct msi_msg *msg)
 {
struct v2m_data *v2m = irq_data_get_irq_chip_data(data);
+   struct device *dev = msi_desc_to_dev(irq_get_msi_desc(data->irq));
phys_addr_t addr = v2m->res.start + V2M_MSI_SETSPI_NS;
 
msg->address_hi = upper_32_bits(addr);
@@ -110,7 +113,8 @@ static void gicv2m_compose_msi_msg(struct irq_data *data, struct msi_msg *msg)
if (v2m->flags & GICV2M_NEEDS_SPI_OFFSET)
msg->data -= v2m->spi_offset;
 
-   iommu_dma_map_msi_msg(data->irq, msg);
+   if (gic_check_iommu_capable(dev))
+   iommu_dma_map_msi_msg(data->irq, msg);
 }
 
 static struct irq_chip gicv2m_irq_chip = {
diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
index 6893287..843d56a 100644
--- 
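
The "iommus" check in gic_check_iommu_capable() above keys off the standard
device-tree IOMMU binding: of_count_phandle_with_args() returns the number of
"iommus" phandles on the node, so the ITS/v2m code would only take the
iommu_dma_map_msi_msg() path when that count is positive. A minimal,
hypothetical illustration (node names and the SMMU stream ID are made up):

```dts
/* Only pcie@f8000000 would pass gic_check_iommu_capable(). */
pcie@f8000000 {
        compatible = "rockchip,rk3399-pcie";
        iommus = <&smmu 0x100>;   /* one "iommus" phandle: count > 0 */
};

pcie@fa000000 {
        compatible = "rockchip,rk3399-pcie";
        /* no "iommus" property: of_count_phandle_with_args() <= 0,
         * so the MSI doorbell stays at its physical address */
};
```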

Re: PCIe oops for NULL pointer dereference during next-20170807 and next-20170815

2017-08-17 Thread Marc Zyngier
On 17/08/17 08:58, Joerg Roedel wrote:
> On Thu, Aug 17, 2017 at 03:02:31PM +0800, Shawn Lin wrote:
>> So should we revert this commit or maybe we could add some checking
>> into gic-v2m and gic-v3-its to see if the dev is iommu-capable? If not,
>> we should create another routine to map MSI msg.
> 
> Yes, fixing this in gic code is the right approach. I usually don't
> revert patches to hide problems somewhere else.
Here's what it would look like:

diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
index b47097a3e4b4..5e81ad62c801 100644
--- a/drivers/irqchip/irq-gic-v3-its.c
+++ b/drivers/irqchip/irq-gic-v3-its.c
@@ -1125,6 +1125,8 @@ static void its_irq_compose_msi_msg(struct irq_data *d, struct msi_msg *msg)
 {
struct its_device *its_dev = irq_data_get_irq_chip_data(d);
struct its_node *its;
+   struct iommu_group *group;
+   struct device *dev;
u64 addr;
 
its = its_dev->its;
@@ -1134,7 +1136,12 @@ static void its_irq_compose_msi_msg(struct irq_data *d, struct msi_msg *msg)
msg->address_hi = upper_32_bits(addr);
msg->data   = its_get_event_id(d);
 
-   iommu_dma_map_msi_msg(d->irq, msg);
+   dev = msi_desc_to_dev(irq_get_msi_desc(d->irq));
+   group = iommu_group_get(dev);
+   if (group) {
+   iommu_dma_map_msi_msg(d->irq, msg);
+   iommu_group_put(group);
+   }
 }
 
 static int its_irq_set_irqchip_state(struct irq_data *d,

I'm not convinced that playing with the group refcount in an irqchip
is the right approach...

Thanks,

M.
-- 
Jazz is not dead. It just smells funny...


Re: PCIe oops for NULL pointer dereference during next-20170807 and next-20170815

2017-08-17 Thread Joerg Roedel
On Thu, Aug 17, 2017 at 03:02:31PM +0800, Shawn Lin wrote:
> So should we revert this commit or maybe we could add some checking
> into gic-v2m and gic-v3-its to see if the dev is iommu-capable? If not,
> we should create another routine to map MSI msg.

Yes, fixing this in gic code is the right approach. I usually don't
revert patches to hide problems somewhere else.


Regards,

Joerg



Re: PCIe oops for NULL pointer dereference during next-20170807 and next-20170815

2017-08-17 Thread Shawn Lin

+ Marc

On 2017/8/17 11:34, Shawn Lin wrote:

Hi all,

I see this NULL pointer dereference in linux-next these days, which
breaks PCIe on my system, but not in linux-next-20170807. Is the
following commit causing this problem?

05f80300dc8bc ("iommu: Finish making iommu_group support mandatory")



The problem is that pcie-rockchip uses gic-v3-its as an MSI domain,
which assumes all of its users have IOMMU support and therefore always
calls iommu_dma_map_msi_msg(). But unfortunately we don't have an IOMMU.

Before this commit, the IOMMU core had a workaround for this. Robin's
patch then exposed the problem by removing all the workarounds.

So should we revert this commit or maybe we could add some checking
into gic-v2m and gic-v3-its to see if the dev is iommu-capable? If not,
we should create another routine to map MSI msg.



Checking the oops dump and the disassembly shows that we got a NULL
iommu_group pointer in iommu_get_domain_for_dev().


Full dump:

[0.990401] pci :00:00.0: PCI bridge to [bus 01]
[0.990886] pci :00:00.0:   bridge window [mem 
0xfa00-0xfa0f]
[0.991553] pci :00:00.0: Max Payload Size set to  128/ 256 (was 
256), Max Read Rq  512
[0.992391] pci :01:00.0: Max Payload Size set to  128/ 128 (was 
128), Max Read Rq  512

[0.994084] pcieport :00:00.0: enabling device ( -> 0002)
[0.995009] Unable to handle kernel NULL pointer dereference at 
virtual address 00d0

[0.995775] Mem abort info:
[0.996048]   Exception class = DABT (current EL), IL = 32 bits
[0.996610]   SET = 0, FnV = 0
[0.996952]   EA = 0, S1PTW = 0
[0.997254] Data abort info:
[0.997532]   ISV = 0, ISS = 0x0004
[0.997900]   CM = 0, WnR = 0
[0.998187] [00d0] user address but active_mm is swapper
[0.998797] Internal error: Oops: 9604 [#1] PREEMPT SMP
[0.999334] Modules linked in:
[0.999660] CPU: 3 PID: 1 Comm: swapper/0 Not tainted 
4.13.0-rc5-next-20170815-1-g0744890-dirty #53

[1.000547] Hardware name: Firefly-RK3399 Board (DT)
[1.001022] task: 80007bc7 task.stack: 80007bc6c000
[1.001637] PC is at iommu_get_domain_for_dev+0x38/0x50
[1.002143] LR is at iommu_dma_map_msi_msg+0x3c/0x1b8
[1.002626] pc : [] lr : [] 
pstate: a045

[1.003325] sp : 80007bc6f6d0
[1.003644] x29: 80007bc6f6d0 x28: 80007b654400
[1.004159] x27: 00d7 x26: 
[1.004673] x25: 08f097a0 x24: 80007bacf0a0
[1.005188] x23: 80007bacf0a0 x22: 80007bc5af00
[1.005702] x21: 80007bacf2a0 x20: 80007bc6f790
[1.006216] x19:  x18: 0001
[1.006729] x17: 0001 x16: 0019
[1.007243] x15:  x14: 80007b62fd04
[1.007758] x13: 80007b62fd03 x12: 0038
[1.008271] x11: 0040 x10: 80007c0ab930
[1.008785] x9 : 80007c0ab9e8 x8 : 
[1.009298] x7 : 80007b61b200 x6 : 80007b61b200
[1.009812] x5 : 80007c0ab908 x4 : 80007b61b200
[1.010325] x3 :  x2 : 
[1.010839] x1 : 80007b61b220 x0 : 80007bacf0a0
[1.011355] Process swapper/0 (pid: 1, stack limit = 0x80007bc6c000)
[1.011993] Stack: (0x80007bc6f6d0 to 0x80007bc7)
[1.012544] f6c0:   80007bc6f6f0 
0856c1e4
[1.013293] f6e0: 80007b61b220 80007bc6f790 80007bc6f750 
083d4d8c
[1.014039] f700: 80007b61b220 80007bc5af00 80007bacf2a0 
80007bc5af00
[1.014786] f720: 80007bacf0a0 08f09970 08f097a0 

[1.015531] f740: 00d7 80007b654400 80007bc6f760 
0811be50
[1.016278] f760: 80007bc6f770 08120e34 80007bc6f7a0 
0811d9f8
[1.017026] f780: 80007b61b220 0811d9e0 fee30040 
8000
[1.017774] f7a0: 80007bc6f7c0 0811f724 80007b61b220 
80007bacf2a0
[1.018520] f7c0: 80007bc6f7e0 0812139c 80007b62fe80 
0040
[1.019267] f7e0: 80007bc6f860 08445abc 80007b62fe80 
80007bacf000
[1.020013] f800: 0001 0001 0001 

[1.020760] f820: 80007bacf0a0 08cbaf10 0001 

[1.021507] f840: 80007b62fe80  80007b62ff00 
08446564
[1.022254] f860: 80007bc6f870 0844658c 80007bc6f8e0 
084468a4
[1.023001] f880:  fff4 0001 
80007bacf000
[1.023748] f8a0: 0006 0020 08c9d070 
08cbc000
[1.024495] f8c0: 80007ba4fc80 80007b654400 80007bc6f900 
0180c620
[1.025242] f8e0: 80007bc6f920 0843cd98 80007bacf000 

[1.025988] f900: