Re: dmaengine for sh7760 (was Re: use the generic dma-noncoherent code for sh V2)

2018-08-17 Thread Arnd Bergmann
On Fri, Aug 17, 2018 at 7:04 PM Rob Landley  wrote:
> On 07/31/2018 07:56 AM, Arnd Bergmann wrote:
> > On Fri, Jul 27, 2018 at 6:20 PM, Rob Landley  wrote:
> >> On 07/24/2018 03:21 PM, Christoph Hellwig wrote:
> >>> On Tue, Jul 24, 2018 at 02:01:42PM +0200, Christoph Hellwig wrote:
>  Hi all,
> > If you hack on it, please convert the dmaengine platform data to use
> > a dma_slave_map array to pass the data into the dmaengine driver,
>
> The dmatest module didn't need it? I don't see why the ethernet driver would?
> (Isn't the point of an allocator to allocate from a request?)

I guess you have hit two of the special cases here:

- dmatest uses the memory-to-memory DMA engine interface, not the slave
  API, so you don't have to configure a slave at all

- smc91x (and its smc911x.c relative) are apparently special in that they
  use the DMA slave API but (AFAICT) require programming
  the dmaengine hardware into a memory-to-memory transfer with no
  DMA slave request signal and completely synchronous operation
  (the IRQ handler polls for the DMA descriptor to be complete),
  see also https://lkml.org/lkml/2018/4/3/464 for the discussion about
  the recent rework of that driver's implementation.

> > mapping the settings from a (pdev-name, channel-id) tuple to a pointer
> > that describes the channel configuration rather than having the
> > mapping from a numerical slave_id to a struct sh_dmae_slave_config
> > in the setup files. It should be a fairly mechanical conversion.
>
> I think all 8 channels are generic. Drivers should be able to grab them and
> release them at will, why does it need a table?
>
> (I say this not having made the smc91x.c driver use this yet, its "conversion"
> to device tree left it full of PXA #ifdefs and constants, and I've tried the

Another point about smc91x is that it only uses DMA on the PXA platform,
which is not part of the "multiplatform" ARM setup. It's likely that no
other platform actually has a DMA engine that can talk to this device in
the absence of a request signal, or that on more modern CPU cores
a readsl() is actually just as fast and avoids the setup cost of talking
to the dma engine. Possibly both of the above.

> last half-dozen kernel releases and qemu releases and have yet to find an arm
> mainstone board under qemu that _doesn't_ time out trying to use DMA with this
> card. But that's another post...)

Is smc91x the only driver that you want to make use of the DMA engine?
I suspect that every other one currently relies on passing a slave ID
and shdma_chan_filter into dma_request_slave_channel_compat() or
dma_request_channel(), which are some of the interfaces we want to
remove in the future, to make everything work the same across
all platforms.

shdma_chan_filter() is one of those that expects its pointer argument to
be a number that is in turn associated with an sh_dmae_slave_config
structure in the platform data of the dma engine. What the newer
dma_request_chan() interface does is to pass a pointer to the
slave device and a string as identifier for the same data, which then
gets associated through the dma_slave_map. On smc91x, both
the device and name arguments are NULL, which triggers the special
case in the pxa dmaengine driver.
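
To make that concrete, here is a minimal sketch of the general pattern
described above (deliberately ignoring the smc91x NULL/NULL special case).
All names below are purely illustrative: "board_slave_map", "board_dma_cfg_rx/tx"
and the "rx" channel name are made up for the example, and the cfg structures
stand in for whatever per-channel data the engine driver expects in .param:

```c
/*
 * Provider side (board file): the dmaengine platform data carries a
 * dma_slave_map table mapping (device name, channel name) to a channel
 * configuration, instead of exposing numeric slave IDs to clients.
 */
static const struct dma_slave_map board_slave_map[] = {
	{ "smc91x", "rx", &board_dma_cfg_rx },	/* hypothetical cfg */
	{ "smc91x", "tx", &board_dma_cfg_tx },	/* hypothetical cfg */
};

/*
 * Client side (slave driver probe): request the channel by name only;
 * the dmaengine core resolves it through the dma_slave_map above or
 * through DT, so the driver code is the same either way.
 */
struct dma_chan *chan = dma_request_chan(&pdev->dev, "rx");

if (IS_ERR(chan))
	return PTR_ERR(chan);
```

The point of the conversion is that the slave driver no longer needs two
code paths for DT vs. platform data.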

> > The other part I noticed is arch/sh/drivers/dma/*, which appears to
> > be entirely unused, and should probably be removed.
>
> I had to switch that off to get this to work, yes. I believe it predates
> dmaengine and was obsoleted by it.

Ok. Have you found any reason to keep it around though?

   Arnd
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


dmaengine for sh7760 (was Re: use the generic dma-noncoherent code for sh V2)

2018-08-17 Thread Rob Landley



On 07/31/2018 07:56 AM, Arnd Bergmann wrote:
> On Fri, Jul 27, 2018 at 6:20 PM, Rob Landley  wrote:
>> On 07/24/2018 03:21 PM, Christoph Hellwig wrote:
>>> On Tue, Jul 24, 2018 at 02:01:42PM +0200, Christoph Hellwig wrote:
 Hi all,

 can you review these patches to switch sh to use the generic
 dma-noncoherent code?  All the requirements are in mainline already
 and we've switched various architectures over to it already.
>>>
>>> Ok, there is one more issue with this version.   Wait for a new one
>>> tomorrow.
>>
>> Speaking of DMA:
>>
>> I'm trying to wire up DMAEngine to an sh7760 board that uses platform data (and
>> fix the smc91x.c driver to use DMAEngine without #ifdef arm), so I've been
>> reading through all that stuff, but the docs seem kinda... thin?
>>
>> Is there something I should have read other than
>> Documentation/driver-model/platform.txt,
>> Documentation/dmaengine/{provider,client}.txt, then trying to pick through the
>> source code and the sh7760 hardware pdf? (And watching the youtube video of
>> Laurent Pinchart's 2014 ELC talk on DMA, Maxime Ripard's 2015 ELC overview of
>> DMAEngine, the Xilinx video on DMAEngine...)
>>
>> At first I thought the SH_DMAE could initialize itself, but the probe function
>> needs platform data, and although arch/sh/kernel/cpu/sh4a/setup-sh7722.c looks
>> _kind_ of like a model I can crib from:
> 
>> B) That platform data is supplying sh_dmae_slave_config preallocating slave
>> channels to devices? (Does it have to? The docs gave me the impression the
>> driver would dynamically request them and devices could even share. Wasn't that
>> sort of the point of DMAEngine? Can my new board data _not_ do that? What's the
>> minimum amount of micromanaging I have to do?)
> 
> The thing here is that arch/sh is way behind on the API use, and it
> has prevented us from cleaning up drivers as well. A slave driver
> should have to just call dma_request_chan() with a constant
> string to identify its channel rather than going two different ways
> depending on whether it's used with DT or platform data.

I got the DMA controller hooked up to DMAEngine and the dmatest module is happy
with the result on all 8 channels. (Finding
arch/sh/kernel/cpu/sh4a/setup-sh7722.c helped a lot, finding it earlier would
have helped more. :)

The config symbols are:

CONFIG_SH_DMA=y
CONFIG_DMADEVICES=y
CONFIG_SH_DMAE_BASE=y
CONFIG_SH_DMAE=y
CONFIG_DMATEST=y #optional

The platform data is:

#include 
#include 
#include 

/* DMA */
static struct resource sh7760_dma_resources[] = {
	{
		.start	= SH_DMAC_BASE0,
		.end	= SH_DMAC_BASE0 + 9*16 - 1,
		.flags	= IORESOURCE_MEM,
	}, {
		.start	= DMTE0_IRQ,
		.end	= DMTE0_IRQ,
		.flags	= IORESOURCE_IRQ,
	}
};

static struct sh_dmae_channel dma_chan[] = {
	{
		.offset = 0,
		.dmars = 0,
		.dmars_bit = 0,
	}, {
		.offset = 0x10,
		.dmars = 0,
		.dmars_bit = 8,
	}, {
		.offset = 0x20,
		.dmars = 4,
		.dmars_bit = 0,
	}, {
		.offset = 0x30,
		.dmars = 4,
		.dmars_bit = 8,
	}, {
		.offset = 0x50,
		.dmars = 8,
		.dmars_bit = 0,
	}, {
		.offset = 0x60,
		.dmars = 8,
		.dmars_bit = 8,
	}, {
		.offset = 0x70,
		.dmars = 12,
		.dmars_bit = 0,
	}, {
		.offset = 0x80,
		.dmars = 12,
		.dmars_bit = 8,
	}
};

static const unsigned int ts_shift[] = TS_SHIFT;

static struct sh_dmae_pdata sh7760_dma_pdata = {
	.channel	= dma_chan,
	.channel_num	= ARRAY_SIZE(dma_chan),
	.ts_low_shift	= CHCR_TS_LOW_SHIFT,
	.ts_low_mask	= CHCR_TS_LOW_MASK,
	.ts_high_shift	= CHCR_TS_HIGH_SHIFT,
	.ts_high_mask	= CHCR_TS_HIGH_MASK,
	.ts_shift	= ts_shift,
	.ts_shift_num	= ARRAY_SIZE(ts_shift),
	.dmaor_init	= DMAOR_INIT,
	.dmaor_is_32bit	= 1,
};

struct platform_device sh7760_dma_device = {
	.name		= "sh-dma-engine",
	.id		= -1,
	.num_resources	= ARRAY_SIZE(sh7760_dma_resources),
	.resource	= sh7760_dma_resources,
	.dev		= { .platform_data = &sh7760_dma_pdata },
};


And then add sh7760_dma_device to your struct platform_device array.
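
For completeness, the registration step looks something like the following;
"myboard_devices" and the initcall name are hypothetical, but the pattern
follows the other sh board files:

```c
/* Board device table: add the DMA engine alongside the other devices. */
static struct platform_device *myboard_devices[] __initdata = {
	&sh7760_dma_device,
	/* ...other platform devices for the board... */
};

static int __init myboard_devices_setup(void)
{
	/* Register everything in one go at device_initcall time. */
	return platform_add_devices(myboard_devices,
				    ARRAY_SIZE(myboard_devices));
}
device_initcall(myboard_devices_setup);
```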

> If you hack on it, please convert the dmaengine platform data to use
> a dma_slave_map array to pass the data into the dmaengine driver,

The dmatest module didn't need it? I don't see why the ethernet driver would?
(Isn't the point of an allocator to allocate from a request?)

> mapping the settings from a (pdev-name, channel-id) tuple to a pointer
> that 

[PATCH] iommu/arm-smmu-v3: Fix a couple of minor comment typos

2018-08-17 Thread John Garry
Fix a couple of minor comment typos.

Signed-off-by: John Garry 

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index 5059d09..feef122 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -708,7 +708,7 @@ static void queue_inc_prod(struct arm_smmu_queue *q)
 }
 
 /*
- * Wait for the SMMU to consume items. If drain is true, wait until the queue
+ * Wait for the SMMU to consume items. If sync is true, wait until the queue
  * is empty. Otherwise, wait until there is at least one free slot.
  */
 static int queue_poll_cons(struct arm_smmu_queue *q, bool sync, bool wfe)
@@ -2353,8 +2353,8 @@ static int arm_smmu_setup_irqs(struct arm_smmu_device *smmu)
irq = smmu->combined_irq;
if (irq) {
/*
-* Cavium ThunderX2 implementation doesn't not support unique
-* irq lines. Use single irq line for all the SMMUv3 interrupts.
+* Cavium ThunderX2 implementation doesn't support unique irq
+* lines. Use a single irq line for all the SMMUv3 interrupts.
 */
ret = devm_request_threaded_irq(smmu->dev, irq,
arm_smmu_combined_irq_handler,
-- 
1.9.1



Re: [PATCH 1/3] iommu: Add fast hook for getting DMA domains

2018-08-17 Thread Laurentiu Tudor
Hi Robin,

On 14.08.2018 16:04, Robin Murphy wrote:
> While iommu_get_domain_for_dev() is the robust way for arbitrary IOMMU
> API callers to retrieve the domain pointer, for DMA ops domains it
> doesn't scale well for large systems and multi-queue devices, since the
> momentary refcount adjustment will lead to exclusive cacheline contention
> when multiple CPUs are operating in parallel on different mappings for
> the same device.
> 
> In the case of DMA ops domains, however, this refcounting is actually
> unnecessary, since they already imply that the group exists and is
> managed by platform code and IOMMU internals (by virtue of
> iommu_group_get_for_dev()) such that a reference will already be held
> for the lifetime of the device. Thus we can avoid the bottleneck by
> providing a fast lookup specifically for the DMA code to retrieve the
> default domain it already knows it has set up - a simple read-only
> dereference plays much nicer with cache-coherency protocols.
> 
> Signed-off-by: Robin Murphy 
> ---
>   drivers/iommu/iommu.c | 9 +
>   include/linux/iommu.h | 1 +
>   2 files changed, 10 insertions(+)
> 
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index 63b37563db7e..63c586875df5 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -1379,6 +1379,15 @@ struct iommu_domain *iommu_get_domain_for_dev(struct device *dev)
>   }
>   EXPORT_SYMBOL_GPL(iommu_get_domain_for_dev);
>   
> +/*
> + * For IOMMU_DOMAIN_DMA implementations which already provide their own
> + * guarantees that the group and its default domain are valid and correct.
> + */
> +struct iommu_domain *iommu_get_dma_domain(struct device *dev)
> +{
> + return dev->iommu_group->default_domain;
> +}

After some preliminary tests I'm seeing a ~10% performance improvement 
on one of our chips (nxp ls1046a) which is pretty nice. :-)
Any chance of making the function inline?
If not, shouldn't an EXPORT_SYMBOL_GPL be added?

---
Thanks & Best Regards, Laurentiu


[PATCH 2/2] dt-bindings: iommu: ipmmu-vmsa: Add r8a774a1 support

2018-08-17 Thread Fabrizio Castro
Document RZ/G2M (R8A774A1) SoC bindings.

Signed-off-by: Fabrizio Castro 
Reviewed-by: Biju Das 
---
This patch applies on top of next-20180817

 Documentation/devicetree/bindings/iommu/renesas,ipmmu-vmsa.txt | 1 +
 1 file changed, 1 insertion(+)

diff --git a/Documentation/devicetree/bindings/iommu/renesas,ipmmu-vmsa.txt b/Documentation/devicetree/bindings/iommu/renesas,ipmmu-vmsa.txt
index ffadb7c..68446dd 100644
--- a/Documentation/devicetree/bindings/iommu/renesas,ipmmu-vmsa.txt
+++ b/Documentation/devicetree/bindings/iommu/renesas,ipmmu-vmsa.txt
@@ -13,6 +13,7 @@ Required Properties:
 - "renesas,ipmmu-r8a73a4" for the R8A73A4 (R-Mobile APE6) IPMMU.
 - "renesas,ipmmu-r8a7743" for the R8A7743 (RZ/G1M) IPMMU.
 - "renesas,ipmmu-r8a7745" for the R8A7745 (RZ/G1E) IPMMU.
+- "renesas,ipmmu-r8a774a1" for the R8A774A1 (RZ/G2M) IPMMU.
 - "renesas,ipmmu-r8a7790" for the R8A7790 (R-Car H2) IPMMU.
 - "renesas,ipmmu-r8a7791" for the R8A7791 (R-Car M2-W) IPMMU.
 - "renesas,ipmmu-r8a7793" for the R8A7793 (R-Car M2-N) IPMMU.
-- 
2.7.4



[PATCH 1/2] iommu/ipmmu-vmsa: Hook up R8A774A1 DT matching code

2018-08-17 Thread Fabrizio Castro
Add support for RZ/G2M (R8A774A1) SoC IPMMUs.

Signed-off-by: Fabrizio Castro 
Reviewed-by: Biju Das 
---
This patch applies on top of renesas-drivers-2018-07-31-v4.18-rc7

 drivers/iommu/ipmmu-vmsa.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/drivers/iommu/ipmmu-vmsa.c b/drivers/iommu/ipmmu-vmsa.c
index 51af2c5..cf5cfcf 100644
--- a/drivers/iommu/ipmmu-vmsa.c
+++ b/drivers/iommu/ipmmu-vmsa.c
@@ -761,6 +761,7 @@ static bool ipmmu_slave_whitelist(struct device *dev)
 }
 
 static const struct soc_device_attribute soc_rcar_gen3[] = {
+   { .soc_id = "r8a774a1", },
{ .soc_id = "r8a7795", },
{ .soc_id = "r8a7796", },
{ .soc_id = "r8a77965", },
@@ -942,6 +943,9 @@ static const struct of_device_id ipmmu_of_ids[] = {
		.compatible = "renesas,ipmmu-vmsa",
		.data = &ipmmu_features_default,
	}, {
+		.compatible = "renesas,ipmmu-r8a774a1",
+		.data = &ipmmu_features_rcar_gen3,
+	}, {
		.compatible = "renesas,ipmmu-r8a7795",
		.data = &ipmmu_features_rcar_gen3,
	}, {
@@ -1143,6 +1147,7 @@ subsys_initcall(ipmmu_init);
 module_exit(ipmmu_exit);
 
 IOMMU_OF_DECLARE(ipmmu_vmsa_iommu_of, "renesas,ipmmu-vmsa");
+IOMMU_OF_DECLARE(ipmmu_r8a774a1_iommu_of, "renesas,ipmmu-r8a774a1");
 IOMMU_OF_DECLARE(ipmmu_r8a7795_iommu_of, "renesas,ipmmu-r8a7795");
 IOMMU_OF_DECLARE(ipmmu_r8a7796_iommu_of, "renesas,ipmmu-r8a7796");
 IOMMU_OF_DECLARE(ipmmu_r8a77965_iommu_of, "renesas,ipmmu-r8a77965");
-- 
2.7.4



[PATCH 0/2] Add IPMMU compatibility for r8a774a1

2018-08-17 Thread Fabrizio Castro
Dear All,

This series adds IPMMU compatibility for the RZ/G2M (a.k.a. R8A774A1).

Cheers,
Fab

Fabrizio Castro (2):
  iommu/ipmmu-vmsa: Hook up R8A774A1 DT matching code
  dt-bindings: iommu: ipmmu-vmsa: Add r8a774a1 support

 Documentation/devicetree/bindings/iommu/renesas,ipmmu-vmsa.txt | 1 +
 drivers/iommu/ipmmu-vmsa.c | 5 +
 2 files changed, 6 insertions(+)

-- 
2.7.4



Re: [PATCH 0/3] iommu: Avoid DMA ops domain refcount contention

2018-08-17 Thread John Garry

On 14/08/2018 14:04, Robin Murphy wrote:
> John raised the issue[1] that we have some unnecessary refcount contention
> in the DMA ops path which shows scalability problems now that we have more
> real high-performance hardware using iommu-dma. The x86 IOMMU drivers are
> sidestepping this by stashing domain references in archdata, but since
> that's not very nice for architecture-agnostic code, I think it's time to
> look at a generic API-level solution.
>
> These are a couple of quick patches based on the idea I had back when
> first implementing this lot, but didn't have any way to justify at the
> time. The third patch can be ignored for the sake of API discussion, but
> is included for completeness.
>
> Robin.



Some results:

PCIe NIC iperf test (128 processes, small packets):
Without patchset:
289232.00 rxpck/s

With patchset:
367283 rxpck/s

JFYI, with Leizhen's non-strict mode patchset + Robin's patchset:
1215539 rxpck/s

Leizhen can share non-strict mode results in his own patchset however.

We did also try the storage controller fio test with a lot of SAS SSD 
disks (24 disks, 24 fio processes) for Robin's patchset only, but did 
not see a significant change.


Thanks to Dongdong + chenxiang for testing.

Let me know if you require more info.

Thanks again,
John




[1] https://lists.linuxfoundation.org/pipermail/iommu/2018-August/029303.html

Robin Murphy (3):
  iommu: Add fast hook for getting DMA domains
  iommu/dma: Use fast DMA domain lookup
  arm64/dma-mapping: Mildly optimise non-coherent IOMMU ops

 arch/arm64/mm/dma-mapping.c | 10 +-
 drivers/iommu/dma-iommu.c   | 23 ---
 drivers/iommu/iommu.c   |  9 +
 include/linux/iommu.h   |  1 +
 4 files changed, 27 insertions(+), 16 deletions(-)






Re: [PATCH 0/3] iommu: Avoid DMA ops domain refcount contention

2018-08-17 Thread Christoph Hellwig
On Fri, Aug 17, 2018 at 12:30:31PM +0100, Robin Murphy wrote:
> On 17/08/18 08:24, Christoph Hellwig wrote:
> > I plan to make the arm iommu dma ops generic and move them to
> > drivers/iommu for the 4.20 merge window.
> 
> You mean 32-bit arm?

Sorry, I meant the arm64 wrappers for dma-iommu of course.

> The only place that code should move to is /dev/null ;)
> - the plan has always been to convert it to use groups and default domains
> so it can just plug iommu-dma in. The tricky bit is either weaning the users
> of the private arm_iommu_*() API onto generic IOMMU API usage, or at least
> implementing transitional wrappers in a way that doesn't break anything.

Agreed on the arm32 iommu code.


Re: [PATCH 0/3] iommu: Avoid DMA ops domain refcount contention

2018-08-17 Thread Robin Murphy

On 17/08/18 08:24, Christoph Hellwig wrote:
> I plan to make the arm iommu dma ops generic and move them to
> drivers/iommu for the 4.20 merge window.

You mean 32-bit arm? The only place that code should move to is
/dev/null ;) - the plan has always been to convert it to use groups and
default domains so it can just plug iommu-dma in. The tricky bit is
either weaning the users of the private arm_iommu_*() API onto generic
IOMMU API usage, or at least implementing transitional wrappers in a way
that doesn't break anything.

> Because of that it would
> be great to create a stable branch or even pull this in through
> the dma-mapping tree entirely.


FWIW I might still need to tweak the first patch depending on how we 
choose to resolve the interaction between DMA ops and unmanaged 
domains[1] - if we want to add fallback paths to iommu-dma instead of 
swizzling device ops then the assumptions for this hook change and it 
will need to reference group->domain rather than group->default_domain.
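
In other words, under that fallback design the hook body would become
something like the following untested sketch (same function as in patch 1,
only the dereferenced field changes):

```c
struct iommu_domain *iommu_get_dma_domain(struct device *dev)
{
	/*
	 * group->domain tracks the currently-attached domain, which under
	 * the fallback scheme may differ from group->default_domain when
	 * a caller has attached an unmanaged domain.
	 */
	return dev->iommu_group->domain;
}
```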


Robin.


[1] 
https://www.mail-archive.com/dri-devel@lists.freedesktop.org/msg230275.html



Re: [PATCH 1/3] iommu: Add fast hook for getting DMA domains

2018-08-17 Thread Robin Murphy

On 17/08/18 10:36, John Garry wrote:
> On 14/08/2018 14:04, Robin Murphy wrote:
>> While iommu_get_domain_for_dev() is the robust way for arbitrary IOMMU
>> API callers to retrieve the domain pointer, for DMA ops domains it
>> doesn't scale well for large systems and multi-queue devices, since the
>> momentary refcount adjustment will lead to exclusive cacheline contention
>> when multiple CPUs are operating in parallel on different mappings for
>> the same device.
>>
>> In the case of DMA ops domains, however, this refcounting is actually
>> unnecessary, since they already imply that the group exists and is
>> managed by platform code and IOMMU internals (by virtue of
>> iommu_group_get_for_dev()) such that a reference will already be held
>> for the lifetime of the device. Thus we can avoid the bottleneck by
>> providing a fast lookup specifically for the DMA code to retrieve the
>> default domain it already knows it has set up - a simple read-only
>> dereference plays much nicer with cache-coherency protocols.
>>
>> Signed-off-by: Robin Murphy 
>> ---
>>  drivers/iommu/iommu.c | 9 +
>>  include/linux/iommu.h | 1 +
>>  2 files changed, 10 insertions(+)
>>
>> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
>> index 63b37563db7e..63c586875df5 100644
>> --- a/drivers/iommu/iommu.c
>> +++ b/drivers/iommu/iommu.c
>> @@ -1379,6 +1379,15 @@ struct iommu_domain *iommu_get_domain_for_dev(struct device *dev)
>>  }
>>  EXPORT_SYMBOL_GPL(iommu_get_domain_for_dev);
>>
>> +/*
>> + * For IOMMU_DOMAIN_DMA implementations which already provide their own
>> + * guarantees that the group and its default domain are valid and correct.
>> + */
>> +struct iommu_domain *iommu_get_dma_domain(struct device *dev)
>> +{
>> +	return dev->iommu_group->default_domain;
>> +}
>> +
>>  /*
>>   * IOMMU groups are really the natrual working unit of the IOMMU, but
>>   * the IOMMU API works on domains and devices.  Bridge that gap by
>> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
>> index 19938ee6eb31..16f2172698e5 100644
>> --- a/include/linux/iommu.h
>> +++ b/include/linux/iommu.h
>> @@ -297,6 +297,7 @@ extern int iommu_attach_device(struct iommu_domain *domain,
>>  extern void iommu_detach_device(struct iommu_domain *domain,
>>  				struct device *dev);
>>  extern struct iommu_domain *iommu_get_domain_for_dev(struct device *dev);
>> +extern struct iommu_domain *iommu_get_dma_domain(struct device *dev);
>
> Hi Robin,
>
> I was wondering whether it's standard to provide a stubbed version of
> this function for !CONFIG_IOMMU_API?

Nope, that's deliberate - this is one of those hooks where any
legitimate caller will already be dependent on IOMMU_API itself.

Robin.

> Cheers,
> John
>
>>  extern int iommu_map(struct iommu_domain *domain, unsigned long iova,
>>  		     phys_addr_t paddr, size_t size, int prot);
>>  extern size_t iommu_unmap(struct iommu_domain *domain, unsigned long iova,








Re: [PATCH 1/3] iommu: Add fast hook for getting DMA domains

2018-08-17 Thread John Garry

On 14/08/2018 14:04, Robin Murphy wrote:
> While iommu_get_domain_for_dev() is the robust way for arbitrary IOMMU
> API callers to retrieve the domain pointer, for DMA ops domains it
> doesn't scale well for large systems and multi-queue devices, since the
> momentary refcount adjustment will lead to exclusive cacheline contention
> when multiple CPUs are operating in parallel on different mappings for
> the same device.
>
> In the case of DMA ops domains, however, this refcounting is actually
> unnecessary, since they already imply that the group exists and is
> managed by platform code and IOMMU internals (by virtue of
> iommu_group_get_for_dev()) such that a reference will already be held
> for the lifetime of the device. Thus we can avoid the bottleneck by
> providing a fast lookup specifically for the DMA code to retrieve the
> default domain it already knows it has set up - a simple read-only
> dereference plays much nicer with cache-coherency protocols.
>
> Signed-off-by: Robin Murphy 
> ---
>  drivers/iommu/iommu.c | 9 +
>  include/linux/iommu.h | 1 +
>  2 files changed, 10 insertions(+)
>
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index 63b37563db7e..63c586875df5 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -1379,6 +1379,15 @@ struct iommu_domain *iommu_get_domain_for_dev(struct device *dev)
>  }
>  EXPORT_SYMBOL_GPL(iommu_get_domain_for_dev);
>
> +/*
> + * For IOMMU_DOMAIN_DMA implementations which already provide their own
> + * guarantees that the group and its default domain are valid and correct.
> + */
> +struct iommu_domain *iommu_get_dma_domain(struct device *dev)
> +{
> +	return dev->iommu_group->default_domain;
> +}
> +
>  /*
>   * IOMMU groups are really the natrual working unit of the IOMMU, but
>   * the IOMMU API works on domains and devices.  Bridge that gap by
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index 19938ee6eb31..16f2172698e5 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -297,6 +297,7 @@ extern int iommu_attach_device(struct iommu_domain *domain,
>  extern void iommu_detach_device(struct iommu_domain *domain,
>  				struct device *dev);
>  extern struct iommu_domain *iommu_get_domain_for_dev(struct device *dev);
> +extern struct iommu_domain *iommu_get_dma_domain(struct device *dev);

Hi Robin,

I was wondering whether it's standard to provide a stubbed version of
this function for !CONFIG_IOMMU_API?

Cheers,
John

>  extern int iommu_map(struct iommu_domain *domain, unsigned long iova,
>  		     phys_addr_t paddr, size_t size, int prot);
>  extern size_t iommu_unmap(struct iommu_domain *domain, unsigned long iova,






Re: [PATCH 0/3] iommu: Avoid DMA ops domain refcount contention

2018-08-17 Thread Will Deacon
On Fri, Aug 17, 2018 at 12:24:15AM -0700, Christoph Hellwig wrote:
> I plan to make the arm iommu dma ops generic and move them to
> drivers/iommu for the 4.20 merge window.  Because of that it would
> be great to create a stable branch or even pull this in through
> the dma-mapping tree entirely.

In which case, for the arm64 bits:

Acked-by: Will Deacon 

Will


Re: [PATCH 0/3] iommu: Avoid DMA ops domain refcount contention

2018-08-17 Thread Christoph Hellwig
I plan to make the arm iommu dma ops generic and move them to
drivers/iommu for the 4.20 merge window.  Because of that it would
be great to create a stable branch or even pull this in through
the dma-mapping tree entirely.