Re: [PATCH 09/12] dt-bindings: arm: fsl: Add msi-map device-tree binding for fsl-mc bus

2020-05-22 Thread Diana Craciun OSS

On 5/22/2020 12:42 PM, Robin Murphy wrote:

On 2020-05-22 00:10, Rob Herring wrote:

On Thu, May 21, 2020 at 7:00 AM Lorenzo Pieralisi wrote:


From: Laurentiu Tudor 

The existing bindings cannot be used to specify the relationship
between fsl-mc devices and GIC ITSes.

Add a generic binding for mapping fsl-mc devices to GIC ITSes, using
msi-map property.

Signed-off-by: Laurentiu Tudor 
Cc: Rob Herring 
---
  .../devicetree/bindings/misc/fsl,qoriq-mc.txt | 30 +--
  1 file changed, 27 insertions(+), 3 deletions(-)

diff --git a/Documentation/devicetree/bindings/misc/fsl,qoriq-mc.txt b/Documentation/devicetree/bindings/misc/fsl,qoriq-mc.txt

index 9134e9bcca56..b0813b2d0493 100644
--- a/Documentation/devicetree/bindings/misc/fsl,qoriq-mc.txt
+++ b/Documentation/devicetree/bindings/misc/fsl,qoriq-mc.txt
@@ -18,9 +18,9 @@ same hardware "isolation context" and a 10-bit value called an ICID

  the requester.

  The generic 'iommus' property is insufficient to describe the relationship
-between ICIDs and IOMMUs, so an iommu-map property is used to define
-the set of possible ICIDs under a root DPRC and how they map to
-an IOMMU.
+between ICIDs and IOMMUs, so the iommu-map and msi-map properties are used
+to define the set of possible ICIDs under a root DPRC and how they map to
+an IOMMU and a GIC ITS respectively.

  For generic IOMMU bindings, see
  Documentation/devicetree/bindings/iommu/iommu.txt.
@@ -28,6 +28,9 @@ Documentation/devicetree/bindings/iommu/iommu.txt.
  For arm-smmu binding, see:
  Documentation/devicetree/bindings/iommu/arm,smmu.yaml.

+For GICv3 and GIC ITS bindings, see:
+Documentation/devicetree/bindings/interrupt-controller/arm,gic-v3.yaml.
+
  Required properties:

  - compatible
@@ -119,6 +122,15 @@ Optional properties:
    associated with the listed IOMMU, with the iommu-specifier
    (i - icid-base + iommu-base).

+- msi-map: Maps an ICID to a GIC ITS and associated iommu-specifier
+  data.
+
+  The property is an arbitrary number of tuples of
+  (icid-base,iommu,iommu-base,length).


I'm confused because the example has GIC ITS phandle, not an IOMMU.

What is an iommu-base?


Right, I was already halfway through writing a reply to say that all 
the copy-pasted "iommu" references here should be using the 
terminology from the pci-msi.txt binding instead.


Right, will change it.




+
+  Any ICID in the interval [icid-base, icid-base + length) is
+  associated with the listed GIC ITS, with the iommu-specifier
+  (i - icid-base + iommu-base).
  Example:

  smmu: iommu@500 {
@@ -128,6 +140,16 @@ Example:
 ...
  };

+   gic: interrupt-controller@600 {
+   compatible = "arm,gic-v3";
+   ...
+   its: gic-its@602 {
+   compatible = "arm,gic-v3-its";
+   msi-controller;
+   ...
+   };
+   };
+
  fsl_mc: fsl-mc@80c00 {
  compatible = "fsl,qoriq-mc";
  reg = <0x0008 0x0c00 0 0x40>,    /* MC portal base */

@@ -135,6 +157,8 @@ Example:
  msi-parent = <&its>;


Side note: is it right to keep msi-parent here? It rather implies that 
the MC itself has a 'native' Device ID rather than an ICID, which I 
believe is not strictly true. Plus it's extra-confusing that it 
doesn't specify an ID either way, since that makes it look like the 
legacy PCI case that gets treated implicitly as an identity msi-map, 
which makes no sense at all to combine with an actual msi-map.


Before adding msi-map, the fsl-mc code assumed that ICID and streamID 
are equal and used msi-parent just to get the reference to the ITS node. 
Removing msi-parent will break the backward compatibility of the already 
existing systems. Maybe we should mention that this is legacy and not to 
be used for newer device trees.






  /* define map for ICIDs 23-64 */
  iommu-map = <23 &smmu 23 41>;
+    /* define msi map for ICIDs 23-64 */
+    msi-map = <23 &its 23 41>;


Seeing 23 twice is odd. The numbers to the right of 'its' should be an
ITS number space.


On about 99% of systems the values in the SMMU Stream ID and ITS 
Device ID spaces are going to be the same. Nobody's going to bother 
carrying *two* sets of sideband data across the interconnect if they 
don't have to ;)


Robin.


Diana





  #address-cells = <3>;
  #size-cells = <1>;

--
2.26.1


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
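
The (id-base, target, out-base, length) lookup being debated in this thread can be sketched as a small model. This is illustrative only, not kernel code; the function and tuple names are mine, following the rule stated in the binding text.

```python
# A toy model (illustrative only, not kernel code) of how an msi-map or
# iommu-map entry resolves an input ID. Tuple layout follows the binding
# text: (id_base, target, out_base, length); the names are invented here.

def map_id(entries, in_id):
    """Resolve in_id through a list of (id_base, target, out_base, length)
    tuples. Any ID i in [id_base, id_base + length) maps to the output
    specifier (i - id_base + out_base) on the listed target; IDs outside
    every window resolve to None."""
    for id_base, target, out_base, length in entries:
        if id_base <= in_id < id_base + length:
            return target, in_id - id_base + out_base
    return None

# The msi-map from the patch example: ICIDs in [23, 64) map 1:1 to the ITS.
msi_map = [(23, "its", 23, 41)]
```

Even with identical bases on the left and right (an identity mapping, as in the example), the entry still defines a bounded window of valid IDs, which is what distinguishes it from having no map at all.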




RE: [PATCH v5 0/5] Nvidia Arm SMMUv2 Implementation

2020-05-22 Thread Krishna Reddy
>For the record: I don't think we should apply these because we don't have a 
>good way of testing them. We currently have three problems that prevent us 
>from enabling SMMU on Tegra194:

Out of the three issues pointed out here, I see that only issue 2) is a real
blocker for enabling the SMMU HW by default upstream.

>That said, I have tested earlier versions of this patchset on top of my local 
>branch with fixes for the above and they do seem to work as expected.
>So I'll leave it up to the IOMMU maintainers whether they're willing to merge 
>the driver patches as is.
> But I want to clarify that I won't be applying the DTS patches until we've 
> solved all of the above issues and therefore it should be clear that these 
> won't be runtime tested until then.

The SMMU driver patches as such are complete and can be used by NVIDIA with a
local config change (CONFIG_ARM_SMMU_DISABLE_BYPASS_BY_DEFAULT=n) to disable
bypass, and merging them protects the driver patches against kernel changes.
This config disable option has already been tested by Nicolin Chen and me.

Robin/Will, can you comment on whether the SMMU driver patches alone (patches
1, 2 and 3 out of 5) can be merged without the DT enable patches? Is it
reasonable to merge the driver patches alone?

>1) If we enable SMMU support, then the DMA API will automatically try
> to use SMMU domains for allocations. This means that translations
> will happen as soon as a device's IOMMU operations are initialized
> and that is typically a long time (in kernel time at least) before
> a driver is bound and has a chance of configuring the device.

> This causes problems for non-quiesced devices like display
> controllers that the bootloader might have set up to scan out a
> boot splash.

> What we're missing here is a way to:

> a) advertise reserved memory regions for boot splash framebuffers
> b) map reserved memory regions early during SMMU setup

> Patches have been floating on the public mailing lists for b) but
> a) requires changes to the bootloader (both proprietary ones and
> U-Boot for SoCs prior to Tegra194).

This happens only if SMMU translation is enabled for display before the
reserved memory regions issue is fixed. This issue is not a real blocker for
enabling the SMMU.


>  2) Even if we don't enable SMMU for a given device (by not hooking up
> the iommus property), with a default kernel configuration we get a
> bunch of faults during boot because the ARM SMMU driver faults by
> default (rather than bypass) for masters which aren't hooked up to
> the SMMU.

> We could work around that by changing the default configuration or
> overriding it on the command-line, but that's not really an option
> because it decreases security and means that Tegra194 won't work
> out-of-the-box.

This is the real issue that blocks enabling the SMMU. The USF faults for
devices that don't have SMMU translations enabled should be fixed or worked
around before the SMMU can be enabled. We should look at keeping the SID as
0x7F for the devices that can't have SMMU enabled yet; SID 0x7F bypasses the
SMMU externally.

>  3) We don't properly describe the DMA hierarchy, which causes the DMA
> masks to be improperly set. As a bit of background: Tegra194 has a
> special address bit (bit 39) that causes some swizzling to happen
> within the memory controller. As a result, any I/O virtual address
> that has bit 39 set will cause this swizzling to happen on access.
> The DMA/IOMMU allocator always starts allocating from the top of
> the IOVA space, which means that the first couple of gigabytes of
> allocations will cause most devices to fail because of the
> undesired swizzling that occurs.

> We had an initial patch for SDHCI merged that hard-codes the DMA
> mask to DMA_BIT_MASK(39) on Tegra194 to work around that. However,
> the devices all do support addressing 40 bits and the restriction
> on bit 39 is really a property of the bus rather than a capability
> of the device. This means that we would have to work around this
> for every device driver by adding similar hacks. A better option is
> to properly describe the DMA hierarchy (using dma-ranges) because
> that will then automatically be applied as a constraint on each
> device's DMA mask.

> I have been working on patches to address this, but they are fairly
> involved because they require device tree bindings changes and so
> on.

The dma_mask issue is again outside the SMMU driver, and as long as the
clients with the dma_mask issue don't have SMMU enabled, it would be fine.
SDHCI can have SMMU enabled upstream as soon as issue 2 is taken care of.

>So before we solve all of the above issues we can't really enable SMMU on 
>Tegra194 and hence won't be able to test it. As such we don't know if these 
>patches even work, nor can we validate that they continue to work.
>As such, I don't think there's any use in applying these patches upstream 
>since they 

Re: [RFC] Use SMMU HTTU for DMA dirty page tracking

2020-05-22 Thread Jean-Philippe Brucker
Hi,

On Tue, May 19, 2020 at 05:42:55PM +0800, Xiang Zheng wrote:
> Hi all,
> 
> Is there any plan for enabling SMMU HTTU?

Not outside of SVA, as far as I know.

> I have seen the patch locates in the SVA series patch, which adds
> support for HTTU:
> https://www.spinics.net/lists/arm-kernel/msg798694.html
> 
> HTTU reduces the number of access faults on the SMMU fault queue
> (permission faults also benefit from it).
> 
> Besides reducing the faults, HTTU also helps to track dirty pages for
> device DMA. Is it feasible to utilize HTTU to get dirty pages on device
> DMA during VFIO live migration?

As you know there is a VFIO interface for this under discussion:
https://lore.kernel.org/kvm/1589781397-28368-1-git-send-email-kwankh...@nvidia.com/
It doesn't implement an internal API to communicate with the IOMMU driver
about dirty pages.

> If SMMU can track dirty pages, devices are not required to implement
> additional dirty pages tracking to support VFIO live migration.

It seems feasible, though tracking it in the device might be more
efficient. I might have misunderstood but I think for live migration of
the Intel NIC they trap guest accesses to the device and introspect its
state to figure out which pages it is accessing.

With HTTU I suppose (without much knowledge about live migration) that
you'd need several new interfaces to the IOMMU drivers:

* A way for VFIO to query HTTU support in the SMMU. There are some
  discussions about communicating more IOMMU capabilities through VFIO but
  no implementation yet. When HTTU isn't supported the DIRTY_PAGES bitmap
  would report all pages as they do now.

* VFIO_IOMMU_DIRTY_PAGES_FLAG_START/STOP would clear the dirty bit
  for all VFIO mappings (which is going to take some time). There is a
  walker in io-pgtable for iova_to_phys() which could be extended. I
  suppose it's also possible to atomically switch the HA and HD bits in
  context descriptors.

* VFIO_IOMMU_DIRTY_PAGES_FLAG_GET_BITMAP would query the dirty bit for all
  VFIO mappings.

Thanks,
Jean
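
The dirty-tracking flow sketched above (START clears, hardware updates set, GET_BITMAP reports) can be modelled in miniature. This is a toy model, not the real VFIO/SMMU API; every name in it is invented for illustration.

```python
# A toy model (not the real VFIO/SMMU API) of the dirty-tracking flow
# sketched above: START clears the dirty state, DMA writes set per-page
# dirty bits (as HTTU hardware updates would), and GET_BITMAP reports
# them. All names here are invented for illustration.

class DirtyTracker:
    def __init__(self, num_pages):
        self.dirty = [False] * num_pages

    def start(self):
        # Analogue of VFIO_IOMMU_DIRTY_PAGES_FLAG_START: clear the dirty
        # bit for all mappings before tracking begins.
        self.dirty = [False] * len(self.dirty)

    def dma_write(self, page):
        # What a hardware dirty-bit (HD) update would record when the
        # device writes to a page through the SMMU.
        self.dirty[page] = True

    def get_bitmap(self):
        # Analogue of VFIO_IOMMU_DIRTY_PAGES_FLAG_GET_BITMAP: report
        # which pages were written since start().
        return list(self.dirty)
```

Without HTTU support, the equivalent of get_bitmap() would have to report every page as dirty, as the message above notes.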


Re: [PATCH 09/12] dt-bindings: arm: fsl: Add msi-map device-tree binding for fsl-mc bus

2020-05-22 Thread Laurentiu Tudor
On 5/22/2020 5:02 PM, Rob Herring wrote:
> On Fri, May 22, 2020 at 3:42 AM Robin Murphy  wrote:
>>
>> On 2020-05-22 00:10, Rob Herring wrote:
>>> On Thu, May 21, 2020 at 7:00 AM Lorenzo Pieralisi wrote:

 From: Laurentiu Tudor 

 The existing bindings cannot be used to specify the relationship
 between fsl-mc devices and GIC ITSes.

 Add a generic binding for mapping fsl-mc devices to GIC ITSes, using
 msi-map property.

 Signed-off-by: Laurentiu Tudor 
 Cc: Rob Herring 
 ---
   .../devicetree/bindings/misc/fsl,qoriq-mc.txt | 30 +--
   1 file changed, 27 insertions(+), 3 deletions(-)

 diff --git a/Documentation/devicetree/bindings/misc/fsl,qoriq-mc.txt b/Documentation/devicetree/bindings/misc/fsl,qoriq-mc.txt
 index 9134e9bcca56..b0813b2d0493 100644
 --- a/Documentation/devicetree/bindings/misc/fsl,qoriq-mc.txt
 +++ b/Documentation/devicetree/bindings/misc/fsl,qoriq-mc.txt
 @@ -18,9 +18,9 @@ same hardware "isolation context" and a 10-bit value 
 called an ICID
   the requester.

   The generic 'iommus' property is insufficient to describe the 
 relationship
 -between ICIDs and IOMMUs, so an iommu-map property is used to define
 -the set of possible ICIDs under a root DPRC and how they map to
 -an IOMMU.
 +between ICIDs and IOMMUs, so the iommu-map and msi-map properties are used
 +to define the set of possible ICIDs under a root DPRC and how they map to
 +an IOMMU and a GIC ITS respectively.

   For generic IOMMU bindings, see
   Documentation/devicetree/bindings/iommu/iommu.txt.
 @@ -28,6 +28,9 @@ Documentation/devicetree/bindings/iommu/iommu.txt.
   For arm-smmu binding, see:
   Documentation/devicetree/bindings/iommu/arm,smmu.yaml.

 +For GICv3 and GIC ITS bindings, see:
 +Documentation/devicetree/bindings/interrupt-controller/arm,gic-v3.yaml.
 +
   Required properties:

   - compatible
 @@ -119,6 +122,15 @@ Optional properties:
 associated with the listed IOMMU, with the iommu-specifier
 (i - icid-base + iommu-base).

 +- msi-map: Maps an ICID to a GIC ITS and associated iommu-specifier
 +  data.
 +
 +  The property is an arbitrary number of tuples of
 +  (icid-base,iommu,iommu-base,length).
>>>
>>> I'm confused because the example has GIC ITS phandle, not an IOMMU.
>>>
>>> What is an iommu-base?
>>
>> Right, I was already halfway through writing a reply to say that all the
>> copy-pasted "iommu" references here should be using the terminology from
>> the pci-msi.txt binding instead.
>>
 +
 +  Any ICID in the interval [icid-base, icid-base + length) is
 +  associated with the listed GIC ITS, with the iommu-specifier
 +  (i - icid-base + iommu-base).
   Example:

   smmu: iommu@500 {
 @@ -128,6 +140,16 @@ Example:
  ...
   };

 +   gic: interrupt-controller@600 {
 +   compatible = "arm,gic-v3";
 +   ...
 +   its: gic-its@602 {
 +   compatible = "arm,gic-v3-its";
 +   msi-controller;
 +   ...
 +   };
 +   };
 +
   fsl_mc: fsl-mc@80c00 {
   compatible = "fsl,qoriq-mc";
    reg = <0x0008 0x0c00 0 0x40>,/* MC portal base */
 @@ -135,6 +157,8 @@ Example:
   msi-parent = <&its>;
>>
>> Side note: is it right to keep msi-parent here? It rather implies that
>> the MC itself has a 'native' Device ID rather than an ICID, which I
>> believe is not strictly true. Plus it's extra-confusing that it doesn't
>> specify an ID either way, since that makes it look like the legacy PCI
>> case that gets treated implicitly as an identity msi-map, which makes no
>> sense at all to combine with an actual msi-map.
> 
> No, it doesn't make sense from a binding perspective.
> 
>>
   /* define map for ICIDs 23-64 */
   iommu-map = <23 &smmu 23 41>;
 +/* define msi map for ICIDs 23-64 */
 +msi-map = <23 &its 23 41>;
>>>
>>> Seeing 23 twice is odd. The numbers to the right of 'its' should be an
>>> ITS number space.
>>
>> On about 99% of systems the values in the SMMU Stream ID and ITS Device
>> ID spaces are going to be the same. Nobody's going to bother carrying
>> *two* sets of sideband data across the interconnect if they don't have to ;)
> 
> I'm referring to the 23 on the left and right, not between the msi and
> iommu. If the left and right are the same, then what are we remapping
> exactly?
> 

I also insisted a lot on keeping things simple and not doing any kind of
translation, but Robin convinced me that this is not such a great idea.
The truth is that the hardware 

Re: [PATCH v5 0/5] Nvidia Arm SMMUv2 Implementation

2020-05-22 Thread Thierry Reding
On Thu, May 21, 2020 at 04:31:02PM -0700, Krishna Reddy wrote:
> Changes in v5:
> Rebased on top of 
> git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu.git next
> 
> v4 - https://lkml.org/lkml/2019/10/30/1054
> v3 - https://lkml.org/lkml/2019/10/18/1601
> v2 - https://lkml.org/lkml/2019/9/2/980
> v1 - https://lkml.org/lkml/2019/8/29/1588
> 
> Krishna Reddy (5):
>   iommu/arm-smmu: add NVIDIA implementation for dual ARM MMU-500 usage
>   dt-bindings: arm-smmu: Add binding for Tegra194 SMMU
>   iommu/arm-smmu: Add global/context fault implementation hooks

For the record: I don't think we should apply these because we don't
have a good way of testing them. We currently have three problems that
prevent us from enabling SMMU on Tegra194:

  1) If we enable SMMU support, then the DMA API will automatically try
 to use SMMU domains for allocations. This means that translations
 will happen as soon as a device's IOMMU operations are initialized
 and that is typically a long time (in kernel time at least) before
 a driver is bound and has a chance of configuring the device.

 This causes problems for non-quiesced devices like display
 controllers that the bootloader might have set up to scan out a
 boot splash.

 What we're missing here is a way to:

 a) advertise reserved memory regions for boot splash framebuffers
 b) map reserved memory regions early during SMMU setup

 Patches have been floating on the public mailing lists for b) but
 a) requires changes to the bootloader (both proprietary ones and
 U-Boot for SoCs prior to Tegra194).

  2) Even if we don't enable SMMU for a given device (by not hooking up
 the iommus property), with a default kernel configuration we get a
 bunch of faults during boot because the ARM SMMU driver faults by
 default (rather than bypass) for masters which aren't hooked up to
 the SMMU.

 We could work around that by changing the default configuration or
 overriding it on the command-line, but that's not really an option
 because it decreases security and means that Tegra194 won't work
 out-of-the-box.

  3) We don't properly describe the DMA hierarchy, which causes the DMA
 masks to be improperly set. As a bit of background: Tegra194 has a
 special address bit (bit 39) that causes some swizzling to happen
 within the memory controller. As a result, any I/O virtual address
 that has bit 39 set will cause this swizzling to happen on access.
 The DMA/IOMMU allocator always starts allocating from the top of
 the IOVA space, which means that the first couple of gigabytes of
 allocations will cause most devices to fail because of the
 undesired swizzling that occurs.

 We had an initial patch for SDHCI merged that hard-codes the DMA
 mask to DMA_BIT_MASK(39) on Tegra194 to work around that. However,
 the devices all do support addressing 40 bits and the restriction
 on bit 39 is really a property of the bus rather than a capability
 of the device. This means that we would have to work around this
 for every device driver by adding similar hacks. A better option is
 to properly describe the DMA hierarchy (using dma-ranges) because
 that will then automatically be applied as a constraint on each
 device's DMA mask.

 I have been working on patches to address this, but they are fairly
 involved because they require device tree bindings changes and so
 on.

So before we solve all of the above issues we can't really enable SMMU
on Tegra194 and hence won't be able to test it. As such we don't know if
these patches even work, nor can we validate that they continue to work.

As such, I don't think there's any use in applying these patches
upstream since they will be effectively dead code until all of the above
issues are resolved.

>   arm64: tegra: Add DT node for T194 SMMU
>   arm64: tegra: enable SMMU for SDHCI and EQOS on T194

This one is going to cause EQOS to break because of 3) above. It might
work for SDHCI because of the workaround we currently have in that
driver. However, I do have a local patch that reverts the workaround
and replaces it with the proper fix, which uses dma-ranges as mentioned
above.

That said, I have tested earlier versions of this patchset on top of my
local branch with fixes for the above and they do seem to work as
expected.

So I'll leave it up to the IOMMU maintainers whether they're willing to
merge the driver patches as is. But I want to clarify that I won't be
applying the DTS patches until we've solved all of the above issues and
therefore it should be clear that these won't be runtime tested until
then.

I expect it will take at least until v5.9-rc1 before we have all the
changes merged that would allow us to enable SMMU support.

Thierry

>  .../devicetree/bindings/iommu/arm,smmu.yaml   |   5 +
>  MAINTAINERS   |   2 +
>  arch/arm64/boot/dts/nvidia/teg
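
The dma-ranges argument in point 3) above boils down to simple arithmetic: the usable DMA mask should follow from the bus window, not from the device's own addressing capability. A rough model, with illustrative numbers:

```python
# A rough model of the point above: the usable DMA mask follows from the
# highest bus address reachable through the dma-ranges window, not from
# the device's own capability. The values below are illustrative only.

def mask_bits_for_window(bus_base, size):
    """Number of address bits needed to reach the highest bus address in
    the window [bus_base, bus_base + size)."""
    highest = bus_base + size - 1
    return highest.bit_length()

# A bus window that stops just below bit 39 constrains every device on
# that bus to a 39-bit DMA mask, even if the device itself, like the
# Tegra194 peripherals discussed above, can drive 40 address bits.
device_capability_bits = 40
effective_bits = min(device_capability_bits, mask_bits_for_window(0, 1 << 39))
```

Describing the window once in the device tree lets the core apply this constraint to every device's mask, instead of hard-coding DMA_BIT_MASK(39) per driver.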

Re: arm-smmu-v3 high cpu usage for NVMe

2020-05-22 Thread John Garry

On 20/03/2020 10:41, John Garry wrote:

+ Barry, Alexandru

    PerfTop:   85864 irqs/sec  kernel:89.6%  exact:  0.0% lost: 0/34434 drop: 0/40116 [4000Hz cycles],  (all, 96 CPUs)
-- 



  27.43%  [kernel]  [k] arm_smmu_cmdq_issue_cmdlist
  11.71%  [kernel]  [k] _raw_spin_unlock_irqrestore
   6.35%  [kernel]  [k] _raw_spin_unlock_irq
   2.65%  [kernel]  [k] get_user_pages_fast
   2.03%  [kernel]  [k] __slab_free
   1.55%  [kernel]  [k] tick_nohz_idle_exit
   1.47%  [kernel]  [k] arm_lpae_map
   1.39%  [kernel]  [k] __fget
   1.14%  [kernel]  [k] __lock_text_start
   1.09%  [kernel]  [k] _raw_spin_lock
   1.08%  [kernel]  [k] bio_release_pages.part.42
   1.03%  [kernel]  [k] __sbitmap_get_word
   0.97%  [kernel]  [k] arm_smmu_atc_inv_domain.constprop.42
   0.91%  [kernel]  [k] fput_many
   0.88%  [kernel]  [k] __arm_lpae_map



Hi Will, Robin,

I'm just getting around to look at this topic again. Here's the current 
picture for my NVMe test:


perf top -C 0 *
Samples: 808 of event 'cycles:ppp', Event count (approx.): 469909024
Overhead Shared Object Symbol
75.91% [kernel] [k] arm_smmu_cmdq_issue_cmdlist
3.28% [kernel] [k] arm_smmu_tlb_inv_range
2.42% [kernel] [k] arm_smmu_atc_inv_domain.constprop.49
2.35% [kernel] [k] _raw_spin_unlock_irqrestore
1.32% [kernel] [k] __arm_smmu_cmdq_poll_set_valid_map.isra.41
1.20% [kernel] [k] aio_complete_rw
0.96% [kernel] [k] enqueue_task_fair
0.93% [kernel] [k] gic_handle_irq
0.86% [kernel] [k] _raw_spin_lock_irqsave
0.72% [kernel] [k] put_reqs_available
0.72% [kernel] [k] sbitmap_queue_clear

* only certain CPUs run the dma unmap for my scenario, cpu0 being one of 
them.


Colleague Barry has similar findings for some other scenarios.

So we tried the latest perf NMI support wip patches, and noticed a few 
hotspots (see 
https://raw.githubusercontent.com/hisilicon/kernel-dev/fee69c8ca3784b9dd3912703cfcd4985a00f6bbb/perf%20annotate 
and 
https://raw.githubusercontent.com/hisilicon/kernel-dev/fee69c8ca3784b9dd3912703cfcd4985a00f6bbb/report.txt) 
when running some NVMe traffic:


- initial cmpxchg to get a place in the queue
- when more CPUs get involved, we start failing at an exponential rate
0.00 :8000107a3500:   cas x4, x2, [x27]
26.52 :8000107a3504:   mov x0, x4 : arm_smmu_cmdq_issue_cmdlist():


- the queue locking
- polling cmd_sync

Some ideas to optimise:

a. initial cmpxchg
So this cmpxchg could be considered unfair. In addition, with all the 
contention on arm_smmu_cmdq.q, that cacheline would be constantly pinged 
around the system.
Maybe we can implement something similar to the idea of queued/ticketed 
spinlocks, making a CPU spin on its own copy of arm_smmu_cmdq.q after the 
initial cmpxchg fails, to be released by its leader, and in turn releasing 
subsequent followers.


b. Drop the queue_full checking in certain circumstances
If we cannot theoretically fill the queue, then stop checking for queue 
full or similar. This should also help the current problem in a., as the 
less time spent between cmpxchg attempts, the lower the chance of failing 
(as we check for available queue space between cmpxchg attempts).


So if cmdq depth > nr_available_cpus * (max batch size + 1) AND we 
always issue a cmd_sync for a batch (regardless of whether requested), 
then we should never fill (I think).


c. Don't do queue locking in certain circumstances
If we implement (and support) b. and support MSI polling, then I don't 
think that this is required.


d. More minor ideas are to move forward when the "owner" stops gathering, 
to reduce the time spent advancing the prod pointer and hopefully the 
cmd_sync polling time; and to use a smaller word size for the valid-bitmap 
operations, as 32-bit atomic operations may be more efficient overall than 
64-bit ones - the valid range checked is mostly under 16 bits from my 
observation.


Let me know your thoughts or any other ideas.

Thanks,
John
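
The bound in idea b. is worth checking numerically: if the queue is deeper than nr_cpus * (max batch size + 1), each CPU contributing at most one batch plus its CMD_SYNC, it can never fill. A quick sketch, with hypothetical figures:

```python
# A quick arithmetic check of idea b. above: the queue can only fill if
# the worst-case number of outstanding commands -- nr_cpus batches of
# max_batch commands, each followed by a CMD_SYNC -- reaches the queue
# depth. The figures used below are hypothetical.

def queue_can_fill(cmdq_depth, nr_cpus, max_batch):
    """True if the worst-case number of outstanding commands could reach
    the queue depth, i.e. the queue-full check is still needed."""
    worst_case = nr_cpus * (max_batch + 1)
    return worst_case >= cmdq_depth
```

For example, with 96 CPUs and a batch limit of 31 commands, a 4096-entry queue could never fill, so the queue-full check could be dropped there, while a 1024-entry queue still could fill.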


Re: [PATCH 09/15] device core: Add ability to handle multiple dma offsets

2020-05-22 Thread Jim Quinlan via iommu
Hi Nicolas,

On Wed, May 20, 2020 at 7:28 AM Nicolas Saenz Julienne wrote:
>
> Hi Jim,
> thanks for having a go at this! My two cents.
>
> On Tue, 2020-05-19 at 16:34 -0400, Jim Quinlan wrote:
> > The device variable 'dma_pfn_offset' is used to do a single
> > linear map between cpu addrs and dma addrs.  The variable
> > 'dma_map' is added to struct device to point to an array
> > of multiple offsets which is required for some devices.
> >
> > Signed-off-by: Jim Quinlan 
> > ---
>
> [...]
>
> > --- a/include/linux/device.h
> > +++ b/include/linux/device.h
> > @@ -493,6 +493,8 @@ struct dev_links_info {
> >   * @bus_dma_limit: Limit of an upstream bridge or bus which imposes a 
> > smaller
> >   *   DMA limit than the device itself supports.
> >   * @dma_pfn_offset: offset of DMA memory range relatively of RAM
> > + * @dma_map: Like dma_pfn_offset but used when there are multiple
> > + *   pfn offsets for multiple dma-ranges.
> >   * @dma_parms:   A low level driver may set these to teach IOMMU code
> > about
> >   *   segment limitations.
> >   * @dma_pools:   Dma pools (if dma'ble device).
> > @@ -578,7 +580,12 @@ struct device {
> >allocations such descriptors. */
> >   u64 bus_dma_limit;  /* upstream dma constraint */
> >   unsigned long   dma_pfn_offset;
> > -
> > +#ifdef CONFIG_DMA_PFN_OFFSET_MAP
> > + const void *dma_offset_map; /* Like dma_pfn_offset, but for
> > +  * the unlikely case of multiple
> > +  * offsets. If non-null, 
> > dma_pfn_offset
> > +  * will be 0. */
>
> I get a bad feeling about separating the DMA offset handling into two distinct
> variables. Albeit generally frowned upon, there is a fair amount of trickery
> around dev->dma_pfn_offset all over the kernel. usb_alloc_dev() comes to mind
> for example. And this obviously doesn't play well with it.

The trickery should only be present when CONFIG_DMA_PFN_OFFSET_MAP=y**.
Otherwise it does no harm. Also, I feel that if dev->dma_pfn_offset is
valid then so is dev->dma_pfn_offset_map -- they both use the same
mechanism in the same places. I am merely extending something that has
been in Linux for a long time.

Further, I could have had dma_pfn_offset_map subsume dma_pfn_offset, but I
wanted to leave it alone since folks would complain that a simple addition
would turn into an if-clause and an inline function call. But if I did go
that way, there would only be one mechanism covering both cases.

> I feel a potential
> solution to multiple DMA ranges should completely integrate with the current
> device DMA handling code, without special cases, on top of that, be 
> transparent
> to the user.

Having dma_pfn_offset_map subsume  dma_pfn_offset would integrate the
current  code too.  And I am not sure what you mean by being
"transparent to the user" -- the writer of the PCIe endpoint driver is
going to do some DMA calls and they have no idea if this mechanism is
in play or not.

>
> In more concrete terms, I'd repackage dev->bus_dma_limit and
> dev->dma_pfn_offset into a list/array of DMA range structures

This is sort of what I am doing, except I defined my own structure. Using
the of_range structure would require doing the same extra calculations over
and over for each DMA call; this is why I defined a structure that holds
all of the needed precomputed values.

> and adapt/create
> the relevant getter/setter functions so as for DMA users not to have to worry
> about the specifics of a device's DMA constraints.

I'm not sure I understand where these getter/setter functions would
exist or what they would do.

> editing dev->dma_pfn_offset, you'd be passing a DMA range structure to the
> device core, and let it take the relevant decisions on how to handle it

How and where would the device core operate for these getter/setters?
In how many places in the code?  The way I see it, any solution has to
adjust the value when doing dma2phys and phys2dma conversions, and the
most efficient place to do that is in the two DMA header files (the
second one is for ARM).

> internally (overwrite, add a new entry, merge them, etc...).
I'm concerned that this would be overkill; I am just trying to get a driver
upstream for some baroque PCIe RC HW. I'm not sure if we should set up
something elaborate when the demand is not there.

I'll be posting a v2. Christoph H. has sent me some personal feedback which
I am incorporating, so feel free to discuss your ideas with him as well,
because I really want consensus on any large changes in direction.

Thanks,
Jim

** CONFIG_DMA_OF_PFN_OFFSET_MAP=y only occurs when building for
ARCH_BRCMSTB. However, ARCH_BRCMSTB is set by the ARM64 defconfig and
the ARM multi_v7_defconfig, so it would be activated for those
defconfigs. This may (a) get us kicked out of those defconfigs, or
(b) we may have to keep

Re: [PATCH 09/12] dt-bindings: arm: fsl: Add msi-map device-tree binding for fsl-mc bus

2020-05-22 Thread Robin Murphy

On 2020-05-22 15:08, Rob Herring wrote:

On Fri, May 22, 2020 at 3:57 AM Diana Craciun OSS
 wrote:


On 5/22/2020 12:42 PM, Robin Murphy wrote:

On 2020-05-22 00:10, Rob Herring wrote:

On Thu, May 21, 2020 at 7:00 AM Lorenzo Pieralisi wrote:


From: Laurentiu Tudor 

The existing bindings cannot be used to specify the relationship
between fsl-mc devices and GIC ITSes.

Add a generic binding for mapping fsl-mc devices to GIC ITSes, using
msi-map property.

Signed-off-by: Laurentiu Tudor 
Cc: Rob Herring 
---
   .../devicetree/bindings/misc/fsl,qoriq-mc.txt | 30 +--
   1 file changed, 27 insertions(+), 3 deletions(-)

diff --git a/Documentation/devicetree/bindings/misc/fsl,qoriq-mc.txt b/Documentation/devicetree/bindings/misc/fsl,qoriq-mc.txt
index 9134e9bcca56..b0813b2d0493 100644
--- a/Documentation/devicetree/bindings/misc/fsl,qoriq-mc.txt
+++ b/Documentation/devicetree/bindings/misc/fsl,qoriq-mc.txt
@@ -18,9 +18,9 @@ same hardware "isolation context" and a 10-bit value called an ICID
   the requester.

   The generic 'iommus' property is insufficient to describe the relationship
-between ICIDs and IOMMUs, so an iommu-map property is used to define
-the set of possible ICIDs under a root DPRC and how they map to
-an IOMMU.
+between ICIDs and IOMMUs, so the iommu-map and msi-map properties are used
+to define the set of possible ICIDs under a root DPRC and how they map to
+an IOMMU and a GIC ITS respectively.

   For generic IOMMU bindings, see
   Documentation/devicetree/bindings/iommu/iommu.txt.
@@ -28,6 +28,9 @@ Documentation/devicetree/bindings/iommu/iommu.txt.
   For arm-smmu binding, see:
   Documentation/devicetree/bindings/iommu/arm,smmu.yaml.

+For GICv3 and GIC ITS bindings, see:
+Documentation/devicetree/bindings/interrupt-controller/arm,gic-v3.yaml.
+
   Required properties:

   - compatible
@@ -119,6 +122,15 @@ Optional properties:
 associated with the listed IOMMU, with the iommu-specifier
 (i - icid-base + iommu-base).

+- msi-map: Maps an ICID to a GIC ITS and associated iommu-specifier
+  data.
+
+  The property is an arbitrary number of tuples of
+  (icid-base,iommu,iommu-base,length).


I'm confused because the example has GIC ITS phandle, not an IOMMU.

What is an iommu-base?


Right, I was already halfway through writing a reply to say that all
the copy-pasted "iommu" references here should be using the
terminology from the pci-msi.txt binding instead.


Right, will change it.




+
+  Any ICID in the interval [icid-base, icid-base + length) is
+  associated with the listed GIC ITS, with the iommu-specifier
+  (i - icid-base + iommu-base).
   Example:

   smmu: iommu@500 {
@@ -128,6 +140,16 @@ Example:
  ...
   };

+   gic: interrupt-controller@600 {
+   compatible = "arm,gic-v3";
+   ...
+   its: gic-its@602 {
+   compatible = "arm,gic-v3-its";
+   msi-controller;
+   ...
+   };
+   };
+
   fsl_mc: fsl-mc@80c00 {
   compatible = "fsl,qoriq-mc";
   reg = <0x0008 0x0c00 0 0x40>,/* MC
portal base */
@@ -135,6 +157,8 @@ Example:
   msi-parent = <&its>;


Side note: is it right to keep msi-parent here? It rather implies that
the MC itself has a 'native' Device ID rather than an ICID, which I
believe is not strictly true. Plus it's extra-confusing that it
doesn't specify an ID either way, since that makes it look like the
legacy PCI case that gets treated implicitly as an identity msi-map,
which makes no sense at all to combine with an actual msi-map.


Before adding msi-map, the fsl-mc code assumed that ICID and streamID
are equal and used msi-parent just to get the reference to the ITS node.
Removing msi-parent will break the backward compatibility of the already
existing systems. Maybe we should mention that this is legacy and not to
be used for newer device trees.


If ids are 1:1, then the DT should use msi-parent. If there is
remapping, then use msi-map. A given system should use one or the
other. I suppose if some ids are 1:1 and the msi-map was added to add
additional support for ids not 1:1, then you could end up with both.
That's fine in dts files, but examples should reflect the 'right' way.
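[Editor's note: the msi-map tuple translation being debated above can be modeled with a small userspace sketch. This is purely illustrative, not kernel code; the structure and helper names are invented for the example.]

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

/* One (rid-base, msi-controller, msi-base, length) tuple from an msi-map. */
struct msi_map_entry {
    uint32_t rid_base;   /* first input ID covered by this entry */
    const char *its;     /* stand-in for the ITS phandle */
    uint32_t msi_base;   /* output ID corresponding to rid_base */
    uint32_t length;     /* number of IDs covered */
};

/*
 * Translate an input ID (an ICID here) through the map, mirroring the
 * binding's "(i - icid-base + msi-base)" rule. Returns 0 on success.
 */
static int msi_map_xlate(const struct msi_map_entry *map, size_t n,
                         uint32_t id, const char **its, uint32_t *out)
{
    for (size_t i = 0; i < n; i++) {
        if (id >= map[i].rid_base &&
            id < map[i].rid_base + map[i].length) {
            *its = map[i].its;
            *out = id - map[i].rid_base + map[i].msi_base;
            return 0;
        }
    }
    return -1; /* no translation: the device cannot generate MSIs */
}
```

With the example's "msi-map = <23 &its 23 41>;", an ICID of 30 translates to Device ID 30, i.e. an identity mapping over the covered range, which is exactly the point being questioned above.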


Is that defined anywhere? The generic MSI binding just has some weaselly 
wording about buses:


"When #msi-cells is non-zero, busses with an msi-parent will require 
additional properties to describe the relationship between devices on 
the bus and the set of MSIs they can potentially generate."


which appears at odds with its own definition of msi-parent as including 
an msi-specifier (or at best very unclear about what value that 
specifier should take in this case).


The PCI MSI binding goes even further and specifically reserves 
msi-parent for cases where there is no sideband data. As far as I'm 
aware, the fact that the IT

Re: [PATCH 09/12] dt-bindings: arm: fsl: Add msi-map device-tree binding for fsl-mc bus

2020-05-22 Thread Rob Herring
On Fri, May 22, 2020 at 3:57 AM Diana Craciun OSS
 wrote:
>
> On 5/22/2020 12:42 PM, Robin Murphy wrote:
> > On 2020-05-22 00:10, Rob Herring wrote:
> >> On Thu, May 21, 2020 at 7:00 AM Lorenzo Pieralisi
> >>  wrote:
> >>>
> >>> From: Laurentiu Tudor 
> >>>
> >>> The existing bindings cannot be used to specify the relationship
> >>> between fsl-mc devices and GIC ITSes.
> >>>
> >>> Add a generic binding for mapping fsl-mc devices to GIC ITSes, using
> >>> msi-map property.
> >>>
> >>> Signed-off-by: Laurentiu Tudor 
> >>> Cc: Rob Herring 
> >>> ---
> >>>   .../devicetree/bindings/misc/fsl,qoriq-mc.txt | 30
> >>> +--
> >>>   1 file changed, 27 insertions(+), 3 deletions(-)
> >>>
> >>> diff --git a/Documentation/devicetree/bindings/misc/fsl,qoriq-mc.txt
> >>> b/Documentation/devicetree/bindings/misc/fsl,qoriq-mc.txt
> >>> index 9134e9bcca56..b0813b2d0493 100644
> >>> --- a/Documentation/devicetree/bindings/misc/fsl,qoriq-mc.txt
> >>> +++ b/Documentation/devicetree/bindings/misc/fsl,qoriq-mc.txt
> >>> @@ -18,9 +18,9 @@ same hardware "isolation context" and a 10-bit
> >>> value called an ICID
> >>>   the requester.
> >>>
> >>>   The generic 'iommus' property is insufficient to describe the
> >>> relationship
> >>> -between ICIDs and IOMMUs, so an iommu-map property is used to define
> >>> -the set of possible ICIDs under a root DPRC and how they map to
> >>> -an IOMMU.
> >>> +between ICIDs and IOMMUs, so the iommu-map and msi-map properties
> >>> are used
> >>> +to define the set of possible ICIDs under a root DPRC and how they
> >>> map to
> >>> +an IOMMU and a GIC ITS respectively.
> >>>
> >>>   For generic IOMMU bindings, see
> >>>   Documentation/devicetree/bindings/iommu/iommu.txt.
> >>> @@ -28,6 +28,9 @@ Documentation/devicetree/bindings/iommu/iommu.txt.
> >>>   For arm-smmu binding, see:
> >>>   Documentation/devicetree/bindings/iommu/arm,smmu.yaml.
> >>>
> >>> +For GICv3 and GIC ITS bindings, see:
> >>> +Documentation/devicetree/bindings/interrupt-controller/arm,gic-v3.yaml.
> >>>
> >>> +
> >>>   Required properties:
> >>>
> >>>   - compatible
> >>> @@ -119,6 +122,15 @@ Optional properties:
> >>> associated with the listed IOMMU, with the iommu-specifier
> >>> (i - icid-base + iommu-base).
> >>>
> >>> +- msi-map: Maps an ICID to a GIC ITS and associated iommu-specifier
> >>> +  data.
> >>> +
> >>> +  The property is an arbitrary number of tuples of
> >>> +  (icid-base,iommu,iommu-base,length).
> >>
> >> I'm confused because the example has GIC ITS phandle, not an IOMMU.
> >>
> >> What is an iommu-base?
> >
> > Right, I was already halfway through writing a reply to say that all
> > the copy-pasted "iommu" references here should be using the
> > terminology from the pci-msi.txt binding instead.
>
> Right, will change it.
>
> >
> >>> +
> >>> +  Any ICID in the interval [icid-base, icid-base + length) is
> >>> +  associated with the listed GIC ITS, with the iommu-specifier
> >>> +  (i - icid-base + iommu-base).
> >>>   Example:
> >>>
> >>>   smmu: iommu@500 {
> >>> @@ -128,6 +140,16 @@ Example:
> >>>  ...
> >>>   };
> >>>
> >>> +   gic: interrupt-controller@600 {
> >>> +   compatible = "arm,gic-v3";
> >>> +   ...
> >>> +   its: gic-its@602 {
> >>> +   compatible = "arm,gic-v3-its";
> >>> +   msi-controller;
> >>> +   ...
> >>> +   };
> >>> +   };
> >>> +
> >>>   fsl_mc: fsl-mc@80c00 {
> >>>   compatible = "fsl,qoriq-mc";
> >>>   reg = <0x0008 0x0c00 0 0x40>,/* MC
> >>> portal base */
> >>> @@ -135,6 +157,8 @@ Example:
> >>>   msi-parent = <&its>;
> >
> > Side note: is it right to keep msi-parent here? It rather implies that
> > the MC itself has a 'native' Device ID rather than an ICID, which I
> > believe is not strictly true. Plus it's extra-confusing that it
> > doesn't specify an ID either way, since that makes it look like the
> > legacy PCI case that gets treated implicitly as an identity msi-map,
> > which makes no sense at all to combine with an actual msi-map.
>
> Before adding msi-map, the fsl-mc code assumed that ICID and streamID
> are equal and used msi-parent just to get the reference to the ITS node.
> Removing msi-parent will break the backward compatibility of the already
> existing systems. Maybe we should mention that this is legacy and not to
> be used for newer device trees.

If ids are 1:1, then the DT should use msi-parent. If there is
remapping, then use msi-map. A given system should use one or the
other. I suppose if some ids are 1:1 and the msi-map was added to add
additional support for ids not 1:1, then you could end up with both.
That's fine in dts files, but examples should reflect the 'right' way.

Rob
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH 09/12] dt-bindings: arm: fsl: Add msi-map device-tree binding for fsl-mc bus

2020-05-22 Thread Rob Herring
On Fri, May 22, 2020 at 3:42 AM Robin Murphy  wrote:
>
> On 2020-05-22 00:10, Rob Herring wrote:
> > On Thu, May 21, 2020 at 7:00 AM Lorenzo Pieralisi
> >  wrote:
> >>
> >> From: Laurentiu Tudor 
> >>
> >> The existing bindings cannot be used to specify the relationship
> >> between fsl-mc devices and GIC ITSes.
> >>
> >> Add a generic binding for mapping fsl-mc devices to GIC ITSes, using
> >> msi-map property.
> >>
> >> Signed-off-by: Laurentiu Tudor 
> >> Cc: Rob Herring 
> >> ---
> >>   .../devicetree/bindings/misc/fsl,qoriq-mc.txt | 30 +--
> >>   1 file changed, 27 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/Documentation/devicetree/bindings/misc/fsl,qoriq-mc.txt 
> >> b/Documentation/devicetree/bindings/misc/fsl,qoriq-mc.txt
> >> index 9134e9bcca56..b0813b2d0493 100644
> >> --- a/Documentation/devicetree/bindings/misc/fsl,qoriq-mc.txt
> >> +++ b/Documentation/devicetree/bindings/misc/fsl,qoriq-mc.txt
> >> @@ -18,9 +18,9 @@ same hardware "isolation context" and a 10-bit value 
> >> called an ICID
> >>   the requester.
> >>
> >>   The generic 'iommus' property is insufficient to describe the 
> >> relationship
> >> -between ICIDs and IOMMUs, so an iommu-map property is used to define
> >> -the set of possible ICIDs under a root DPRC and how they map to
> >> -an IOMMU.
> >> +between ICIDs and IOMMUs, so the iommu-map and msi-map properties are used
> >> +to define the set of possible ICIDs under a root DPRC and how they map to
> >> +an IOMMU and a GIC ITS respectively.
> >>
> >>   For generic IOMMU bindings, see
> >>   Documentation/devicetree/bindings/iommu/iommu.txt.
> >> @@ -28,6 +28,9 @@ Documentation/devicetree/bindings/iommu/iommu.txt.
> >>   For arm-smmu binding, see:
> >>   Documentation/devicetree/bindings/iommu/arm,smmu.yaml.
> >>
> >> +For GICv3 and GIC ITS bindings, see:
> >> +Documentation/devicetree/bindings/interrupt-controller/arm,gic-v3.yaml.
> >> +
> >>   Required properties:
> >>
> >>   - compatible
> >> @@ -119,6 +122,15 @@ Optional properties:
> >> associated with the listed IOMMU, with the iommu-specifier
> >> (i - icid-base + iommu-base).
> >>
> >> +- msi-map: Maps an ICID to a GIC ITS and associated iommu-specifier
> >> +  data.
> >> +
> >> +  The property is an arbitrary number of tuples of
> >> +  (icid-base,iommu,iommu-base,length).
> >
> > I'm confused because the example has GIC ITS phandle, not an IOMMU.
> >
> > What is an iommu-base?
>
> Right, I was already halfway through writing a reply to say that all the
> copy-pasted "iommu" references here should be using the terminology from
> the pci-msi.txt binding instead.
>
> >> +
> >> +  Any ICID in the interval [icid-base, icid-base + length) is
> >> +  associated with the listed GIC ITS, with the iommu-specifier
> >> +  (i - icid-base + iommu-base).
> >>   Example:
> >>
> >>   smmu: iommu@500 {
> >> @@ -128,6 +140,16 @@ Example:
> >>  ...
> >>   };
> >>
> >> +   gic: interrupt-controller@600 {
> >> +   compatible = "arm,gic-v3";
> >> +   ...
> >> +   its: gic-its@602 {
> >> +   compatible = "arm,gic-v3-its";
> >> +   msi-controller;
> >> +   ...
> >> +   };
> >> +   };
> >> +
> >>   fsl_mc: fsl-mc@80c00 {
> >>   compatible = "fsl,qoriq-mc";
> >>   reg = <0x0008 0x0c00 0 0x40>,/* MC portal 
> >> base */
> >> @@ -135,6 +157,8 @@ Example:
> >>   msi-parent = <&its>;
>
> Side note: is it right to keep msi-parent here? It rather implies that
> the MC itself has a 'native' Device ID rather than an ICID, which I
> believe is not strictly true. Plus it's extra-confusing that it doesn't
> specify an ID either way, since that makes it look like the legacy PCI
> case that gets treated implicitly as an identity msi-map, which makes no
> sense at all to combine with an actual msi-map.

No, it doesn't make sense from a binding perspective.

>
> >>   /* define map for ICIDs 23-64 */
> >>   iommu-map = <23 &smmu 23 41>;
> >> +/* define msi map for ICIDs 23-64 */
> >> +msi-map = <23 &its 23 41>;
> >
> > Seeing 23 twice is odd. The numbers to the right of 'its' should be an
> > ITS number space.
>
> On about 99% of systems the values in the SMMU Stream ID and ITS Device
> ID spaces are going to be the same. Nobody's going to bother carrying
> *two* sets of sideband data across the interconnect if they don't have to ;)

I'm referring to the 23 on the left and right, not between the msi and
iommu. If the left and right are the same, then what are we remapping
exactly?

Rob


[PATCH] iommu: Fix group refcount in iommu_alloc_default_domain()

2020-05-22 Thread Sai Prakash Ranjan
Since the change that moved default domain allocation to probe
time, iommu_alloc_default_domain() has been missing a refcount
decrement for the group. Because of this missing decrement, the
device is never released from the group, as the devices_kobj
refcount never reaches 0 in iommu_group_remove_device(), leading
to a number of issues. One such case: a different group is
allocated on every reload of a module that configures the IOMMU,
such as ath10k, which then finally fails to attach the device to
the SMMU with an -ENOSPC error in __arm_smmu_alloc_bitmap() once
the number of module reloads exceeds the number of context banks.
The next reload of the module then leads to a NULL pointer
dereference. Add the missing refcount decrement
(iommu_group_put()) in iommu_alloc_default_domain() to fix this
issue.
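[Editor's note: the get/put imbalance this patch fixes can be shown with a toy refcount model. This is an illustrative userspace sketch, not the kernel's iommu_group implementation; all names here are invented.]

```c
#include <assert.h>

/* Toy refcounted object standing in for struct iommu_group. */
struct group { int refcount; };

static struct group *group_get(struct group *g) { g->refcount++; return g; }
static void group_put(struct group *g) { g->refcount--; }

/* Buggy shape: takes a reference and returns without dropping it. */
static int alloc_default_domain_buggy(struct group *g)
{
    group_get(g);
    /* ... allocate the default domain ... */
    return 0;            /* leaks the reference taken above */
}

/* Fixed shape: every get is balanced by a put before returning. */
static int alloc_default_domain_fixed(struct group *g)
{
    int ret;

    group_get(g);
    /* ... allocate the default domain ... */
    ret = 0;
    group_put(g);        /* mirrors the iommu_group_put() added here */
    return ret;
}
```

In the buggy variant the group's refcount can never drop back to its baseline, so the group (and the device's membership in it) is never released, which is what produced the -ENOSPC after repeated module reloads.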

Call trace:

...
  platform wifi-firmware.0: Adding to iommu group 82
  ath10k_snoc 1880.wifi: could not attach device: -28
  platform wifi-firmware.0: Removing from iommu group 82
  ath10k_snoc 1880.wifi: failed to initialize firmware: -28
  ath10k_snoc: probe of 1880.wifi failed with error -28
  platform wifi-firmware.0: Adding to iommu group 83
  Unable to handle kernel NULL pointer dereference at virtual address 

  Mem abort info:
ESR = 0x9606
EC = 0x25: DABT (current EL), IL = 32 bits
SET = 0, FnV = 0
EA = 0, S1PTW = 0
  Data abort info:
ISV = 0, ISS = 0x0006
CM = 0, WnR = 0
  user pgtable: 4k pages, 39-bit VAs, pgdp=000177a53000
  [] pgd=0001e74f5003, pud=0001e74f5003, 
pmd=
  Internal error: Oops: 9606 [#1] PREEMPT SMP
  pstate: 6049 (nZCv daif +PAN -UAO)
  arm_smmu_flush_iotlb_all+0x20/0x6c
  iommu_create_device_direct_mappings+0x17c/0x1d8
  iommu_probe_device+0xc0/0x100
  of_iommu_configure+0x108/0x240
  of_dma_configure+0x130/0x1d0
  ath10k_fw_init+0xc4/0x1c4 [ath10k_snoc]
  ath10k_snoc_probe+0x5cc/0x678 [ath10k_snoc]
  platform_drv_probe+0x90/0xb0
  really_probe+0x134/0x2ec
  driver_probe_device+0x64/0xfc
  device_driver_attach+0x4c/0x6c
  __driver_attach+0xac/0xc0
  bus_for_each_dev+0x8c/0xd4
  driver_attach+0x2c/0x38
  bus_add_driver+0xfc/0x1d0
  driver_register+0x64/0xf8
  __platform_driver_register+0x4c/0x58
  init_module+0x20/0x1000 [ath10k_snoc]
  do_one_initcall+0x13c/0x2d0
  do_init_module+0x58/0x1dc
  load_module+0xde0/0xf10
  __arm64_sys_finit_module+0xb0/0xe0
  el0_svc_common+0xa4/0x154
  el0_svc_compat_handler+0x2c/0x38
  el0_svc_compat+0x8/0x10
 Code: d503201f f85b8268 b4000248 f8560e74 (f9400280)
 ---[ end trace e5c1470a584952a0 ]---
 Kernel panic - not syncing: Fatal exception

Signed-off-by: Sai Prakash Ranjan 
---
 drivers/iommu/iommu.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index a4c2f122eb8b..05f7b77c432f 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1491,6 +1491,7 @@ static int iommu_alloc_default_domain(struct device *dev)
 {
struct iommu_group *group;
unsigned int type;
+   int ret;
 
group = iommu_group_get(dev);
if (!group)
@@ -1501,7 +1502,11 @@ static int iommu_alloc_default_domain(struct device *dev)
 
type = iommu_get_def_domain_type(dev);
 
-   return iommu_group_alloc_default_domain(dev->bus, group, type);
+   ret = iommu_group_alloc_default_domain(dev->bus, group, type);
+
+   iommu_group_put(group);
+
+   return ret;
 }
 
 /**
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation


[PATCH v5 39/38] drm: xen: fix common struct sg_table related issues

2020-05-22 Thread Marek Szyprowski
The Documentation/DMA-API-HOWTO.txt states that the dma_map_sg() function
returns the number of entries created in the DMA address space.
However, the subsequent calls to dma_sync_sg_for_{device,cpu}() and
dma_unmap_sg() must be made with the original number of entries
passed to dma_map_sg().

struct sg_table is a common structure used for describing a non-contiguous
memory buffer, used commonly in the DRM and graphics subsystems. It
consists of a scatterlist with memory pages and DMA addresses (sgl entry),
as well as the number of scatterlist entries: CPU pages (orig_nents entry)
and DMA mapped pages (nents entry).

It turned out that it was a common mistake to misuse nents and orig_nents
entries, calling DMA-mapping functions with a wrong number of entries or
ignoring the number of mapped entries returned by the dma_map_sg()
function.

Fix the code to refer to the proper nents or orig_nents entries. This driver
reports the number of pages in the imported scatterlist, so it should
refer to the sg_table->orig_nents entry.
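[Editor's note: the nents vs. orig_nents invariant this series enforces can be sketched with a mock of the two counters. This is an illustrative userspace model, not the kernel scatterlist API; the mock_ names are invented.]

```c
#include <assert.h>

/* Mock of the two counters kept in struct sg_table. */
struct sg_table_mock {
    unsigned int orig_nents;  /* CPU-side scatterlist entries */
    unsigned int nents;       /* DMA-mapped entries, set by mapping */
};

/*
 * Model dma_map_sg(): an IOMMU may coalesce entries, so the returned
 * (and stored) nents can be smaller than orig_nents. DMA accesses must
 * use this returned value.
 */
static unsigned int mock_dma_map_sg(struct sg_table_mock *sgt)
{
    sgt->nents = sgt->orig_nents / 2 + 1;  /* pretend coalescing happened */
    return sgt->nents;
}

/* Model dma_unmap_sg(): must be passed the ORIGINAL entry count. */
static void mock_dma_unmap_sg(struct sg_table_mock *sgt, unsigned int nents)
{
    assert(nents == sgt->orig_nents);      /* the invariant being fixed */
}
```

The xen patch above is the debug-output case of the same rule: the message reports how many pages were imported, which is a CPU-side quantity, hence sgt->orig_nents rather than sgt->nents.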

Signed-off-by: Marek Szyprowski 
Acked-by: Oleksandr Andrushchenko 
---
For more information, see '[PATCH v5 00/38] DRM: fix struct sg_table nents
vs. orig_nents misuse' thread:
https://lore.kernel.org/linux-iommu/20200513132114.6046-1-m.szyprow...@samsung.com/T/

This patch has been resurrected on Oleksandr Andrushchenko request. The
patch was a part of v2 patchset, but then I've dropped it as it only
fixes the debug output, thus I didn't consider it so critical.
---
 drivers/gpu/drm/xen/xen_drm_front_gem.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/xen/xen_drm_front_gem.c 
b/drivers/gpu/drm/xen/xen_drm_front_gem.c
index f0b85e0..ba4bdc5 100644
--- a/drivers/gpu/drm/xen/xen_drm_front_gem.c
+++ b/drivers/gpu/drm/xen/xen_drm_front_gem.c
@@ -215,7 +215,7 @@ struct drm_gem_object *
return ERR_PTR(ret);
 
DRM_DEBUG("Imported buffer of size %zu with nents %u\n",
- size, sgt->nents);
+ size, sgt->orig_nents);
 
return &xen_obj->base;
 }
-- 
1.9.1



Re: [PATCH v7 13/24] iommu/arm-smmu-v3: Enable broadcast TLB maintenance

2020-05-22 Thread Jean-Philippe Brucker
[+Eric]

On Thu, May 21, 2020 at 03:38:35PM +0100, Marc Zyngier wrote:
> On 2020-05-21 15:17, Will Deacon wrote:
> > [+Marc]
> > 
> > On Tue, May 19, 2020 at 07:54:51PM +0200, Jean-Philippe Brucker wrote:
> > > The SMMUv3 can handle invalidation targeted at TLB entries with shared
> > > ASIDs. If the implementation supports broadcast TLB maintenance,
> > > enable it
> > > and keep track of it in a feature bit. The SMMU will then be
> > > affected by
> > > inner-shareable TLB invalidations from other agents.
> > > 
> > > A major side-effect of this change is that stage-2 translation
> > > contexts
> > > are now affected by all invalidations by VMID. VMIDs are all shared
> > > and
> > > the only ways to prevent over-invalidation, since the stage-2 page
> > > tables
> > > are not shared between CPU and SMMU, are to either disable BTM or
> > > allocate
> > > different VMIDs. This patch does not address the problem.
> > 
> > This sounds like a potential performance issue, particularly as we
> > expose
> > stage-2 contexts via VFIO directly.

Yes it's certainly going to affect SMMU performance, though I haven't
measured it. QEMU and kvmtool currently use stage-1 translations instead
of stage-2, so it won't be a problem until they start using nested
translation (and unless the SMMU only supports stage-2).

In the coming month I'd like to have a look at coordinating VMID
allocation between KVM and SMMU, for guest SVA. If the guest wants to
share page tables with the SMMU, the SMMU has to use the same VMIDs as the
VM to receive broadcast TLBI.

Similarly to patch 06 ("arm64: mm: Pin down ASIDs for sharing mm with
devices") the SMMU would request a VMID allocated by KVM, when setting up
a nesting VFIO container. One major downside is that the VMID is pinned
and cannot be recycled on rollover while it's being used for DMA.

I wonder if we could use this even when page tables aren't shared between
CPU and SMMU, to avoid splitting the VMID space.

> > Maybe we could reserve some portion
> > of
> > VMID space for the SMMU? Marc, what do you reckon?
> 
> Certainly doable when we have 16bits VMIDs. With smaller VMID spaces (like
> on
> v8.0), this is a bit more difficult (we do have pretty large v8.0 systems
> around).

It's only an issue if those systems have an SMMUv3 supporting DVM. With
any luck that doesn't exist?

> How many VMID bits are we talking about?

That's anyone's guess... One passed-through device per VM would halve the
VMID space. But the SMMU allocates one VMID for each device assigned to a
guest, not one per VM (well one per domain, or VFIO container, but I think
it boils down to one per device with QEMU). So with SR-IOV for example it
should be pretty easy to reach 256 VMIDs in the SMMU.

Thanks,
Jean


Re: [PATCH 12/12] bus: fsl-mc: Add ACPI support for fsl-mc

2020-05-22 Thread Lorenzo Pieralisi
On Fri, May 22, 2020 at 05:32:02AM +, Makarand Pawagi wrote:

[...]

> > Subject: Re: [PATCH 12/12] bus: fsl-mc: Add ACPI support for fsl-mc
> > 
> > Hi Lorenzo,
> > 
> > On 5/21/2020 4:00 PM, Lorenzo Pieralisi wrote:
> > > From: Diana Craciun 
> > >
> > > Add ACPI support in the fsl-mc driver. Driver parses MC DSDT table to
> > > extract memory and other resources.
> > >
> > > Interrupt (GIC ITS) information is extracted from the MADT table by
> > > drivers/irqchip/irq-gic-v3-its-fsl-mc-msi.c.
> > >
> > > IORT table is parsed to configure DMA.
> > >
> > > Signed-off-by: Makarand Pawagi 
> > > Signed-off-by: Diana Craciun 
> > > Signed-off-by: Laurentiu Tudor 
> > > ---
> > 
> > The author of this patch should be Makarand. I think I accidentally broke it 
> > when
> > we exchanged the patches. Very sorry about it.
> > 
>  
> Will you be able to correct this or should I post another patch?

I will update it.

Lorenzo


Re: [PATCH 09/12] dt-bindings: arm: fsl: Add msi-map device-tree binding for fsl-mc bus

2020-05-22 Thread Robin Murphy

On 2020-05-22 00:10, Rob Herring wrote:

On Thu, May 21, 2020 at 7:00 AM Lorenzo Pieralisi
 wrote:


From: Laurentiu Tudor 

The existing bindings cannot be used to specify the relationship
between fsl-mc devices and GIC ITSes.

Add a generic binding for mapping fsl-mc devices to GIC ITSes, using
msi-map property.

Signed-off-by: Laurentiu Tudor 
Cc: Rob Herring 
---
  .../devicetree/bindings/misc/fsl,qoriq-mc.txt | 30 +--
  1 file changed, 27 insertions(+), 3 deletions(-)

diff --git a/Documentation/devicetree/bindings/misc/fsl,qoriq-mc.txt 
b/Documentation/devicetree/bindings/misc/fsl,qoriq-mc.txt
index 9134e9bcca56..b0813b2d0493 100644
--- a/Documentation/devicetree/bindings/misc/fsl,qoriq-mc.txt
+++ b/Documentation/devicetree/bindings/misc/fsl,qoriq-mc.txt
@@ -18,9 +18,9 @@ same hardware "isolation context" and a 10-bit value called 
an ICID
  the requester.

  The generic 'iommus' property is insufficient to describe the relationship
-between ICIDs and IOMMUs, so an iommu-map property is used to define
-the set of possible ICIDs under a root DPRC and how they map to
-an IOMMU.
+between ICIDs and IOMMUs, so the iommu-map and msi-map properties are used
+to define the set of possible ICIDs under a root DPRC and how they map to
+an IOMMU and a GIC ITS respectively.

  For generic IOMMU bindings, see
  Documentation/devicetree/bindings/iommu/iommu.txt.
@@ -28,6 +28,9 @@ Documentation/devicetree/bindings/iommu/iommu.txt.
  For arm-smmu binding, see:
  Documentation/devicetree/bindings/iommu/arm,smmu.yaml.

+For GICv3 and GIC ITS bindings, see:
+Documentation/devicetree/bindings/interrupt-controller/arm,gic-v3.yaml.
+
  Required properties:

  - compatible
@@ -119,6 +122,15 @@ Optional properties:
associated with the listed IOMMU, with the iommu-specifier
(i - icid-base + iommu-base).

+- msi-map: Maps an ICID to a GIC ITS and associated iommu-specifier
+  data.
+
+  The property is an arbitrary number of tuples of
+  (icid-base,iommu,iommu-base,length).


I'm confused because the example has GIC ITS phandle, not an IOMMU.

What is an iommu-base?


Right, I was already halfway through writing a reply to say that all the 
copy-pasted "iommu" references here should be using the terminology from 
the pci-msi.txt binding instead.



+
+  Any ICID in the interval [icid-base, icid-base + length) is
+  associated with the listed GIC ITS, with the iommu-specifier
+  (i - icid-base + iommu-base).
  Example:

  smmu: iommu@500 {
@@ -128,6 +140,16 @@ Example:
 ...
  };

+   gic: interrupt-controller@600 {
+   compatible = "arm,gic-v3";
+   ...
+   its: gic-its@602 {
+   compatible = "arm,gic-v3-its";
+   msi-controller;
+   ...
+   };
+   };
+
  fsl_mc: fsl-mc@80c00 {
  compatible = "fsl,qoriq-mc";
  reg = <0x0008 0x0c00 0 0x40>,/* MC portal base */
@@ -135,6 +157,8 @@ Example:
  msi-parent = <&its>;


Side note: is it right to keep msi-parent here? It rather implies that 
the MC itself has a 'native' Device ID rather than an ICID, which I 
believe is not strictly true. Plus it's extra-confusing that it doesn't 
specify an ID either way, since that makes it look like the legacy PCI 
case that gets treated implicitly as an identity msi-map, which makes no 
sense at all to combine with an actual msi-map.



  /* define map for ICIDs 23-64 */
  iommu-map = <23 &smmu 23 41>;
+/* define msi map for ICIDs 23-64 */
+msi-map = <23 &its 23 41>;


Seeing 23 twice is odd. The numbers to the right of 'its' should be an
ITS number space.


On about 99% of systems the values in the SMMU Stream ID and ITS Device 
ID spaces are going to be the same. Nobody's going to bother carrying 
*two* sets of sideband data across the interconnect if they don't have to ;)


Robin.


  #address-cells = <3>;
  #size-cells = <1>;

--
2.26.1




Re: [PATCH] iommu/dma: limit iova free size to unmmaped iova

2020-05-22 Thread Robin Murphy

On 2020-05-22 07:25, gup...@codeaurora.org wrote:

On 2020-05-22 01:46, Robin Murphy wrote:

On 2020-05-21 12:30, Prakash Gupta wrote:

Limit the IOVA size freed based on the unmapped size. Without this,
an invalid IOVA is pushed into the IOVA rcache even when the unmap
fails, which can subsequently cause a panic when the rcache magazine
is freed.


Can you elaborate on that panic?


We have seen couple of stability issues around this.
Below is one such example:

kernel BUG at kernel/msm-4.19/drivers/iommu/iova.c:904!
iova_magazine_free_pfns
iova_rcache_insert
free_iova_fast
__iommu_unmap_page
iommu_dma_unmap_page


OK, so it's not some NULL dereference or anything completely unexpected, 
that's good.



It turned out that an IOVA pfn of 0 got into the iova_rcache. One
possibility I see is a client calling unmap with an invalid dma_addr.
The unmap call will fail and warn, but still try to free the IOVA.
This inserts an invalid pfn into the rcache, and when the magazine
containing that pfn is freed, private_find_iova() returns NULL for
the invalid IOVA and hits the BUG condition.


That would indeed be a bug in whatever driver made the offending 
dma_unmap call.



Signed-off-by: Prakash Gupta 

:100644 100644 4959f5df21bd 098f7d377e04 M    drivers/iommu/dma-iommu.c

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 4959f5df21bd..098f7d377e04 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -472,7 +472,8 @@ static void __iommu_dma_unmap(struct device *dev, 
dma_addr_t dma_addr,

    if (!cookie->fq_domain)
  iommu_tlb_sync(domain, &iotlb_gather);
-    iommu_dma_free_iova(cookie, dma_addr, size);
+    if (unmapped)
+    iommu_dma_free_iova(cookie, dma_addr, unmapped);


Frankly, if any part of the unmap fails then things have gone
catastrophically wrong already, but either way this isn't right. The
IOVA API doesn't support partial freeing - an IOVA *must* be freed
with its original size, or not freed at all, otherwise it will corrupt
the state of the rcaches and risk a cascade of further misbehaviour
for future callers.


I agree, we shouldn't be freeing a partial IOVA. Instead, just making
sure the unmap was successful before freeing the IOVA should be
sufficient. So the change can instead be something like this:

-    iommu_dma_free_iova(cookie, dma_addr, size);
+    if (unmapped)
+    iommu_dma_free_iova(cookie, dma_addr, size);
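[Editor's note: the guarded pattern proposed above, i.e. free the IOVA only when the unmap fully succeeded, and always with its original size, can be modeled in a few lines. This is an illustrative userspace sketch with invented mock_ names, not the dma-iommu code itself.]

```c
#include <assert.h>
#include <stddef.h>

static size_t freed_size;   /* records what was handed to the IOVA free */

/* Model iommu_unmap(): fails (returns 0) for an invalid address. */
static size_t mock_iommu_unmap(unsigned long addr, size_t size)
{
    return addr ? size : 0;
}

static void mock_free_iova(unsigned long addr, size_t size)
{
    (void)addr;
    freed_size = size;  /* the rcache only stays sane for original-size frees */
}

/* Free all of it with the original size, or free none of it. */
static void dma_unmap_guarded(unsigned long addr, size_t size)
{
    size_t unmapped = mock_iommu_unmap(addr, size);

    if (unmapped != size)
        return;                   /* bad unmap: don't poison the rcache */
    mock_free_iova(addr, size);   /* always the original size */
}
```

This captures both constraints from the thread: the IOVA API does not support partial frees, and an IOVA from a failed unmap must never reach the rcache at all.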


TBH my gut feeling here is that you're really just trying to treat a
symptom of another bug elsewhere, namely some driver calling
dma_unmap_* or dma_free_* with the wrong address or size in the first
place.


This condition would arise only if a driver calls dma_unmap/free_* with
an IOVA pfn of 0. That is flagged with a warning during the unmap, but
triggers a panic later on during an unrelated dma_map/unmap_*. If the
unmap has already failed for an invalid IOVA, there is no reason to
consider it a valid IOVA and free it. That part should be fixed.


I disagree. In general, if drivers call the DMA API incorrectly it is 
liable to lead to data loss, memory corruption, and various other 
unpleasant misbehaviour - it is not the DMA layer's job to attempt to 
paper over driver bugs.


There *is* an argument for downgrading the BUG_ON() in 
iova_magazine_free_pfns() to a WARN_ON(), since frankly it isn't a 
sufficiently serious condition to justify killing the whole machine 
immediately, but NAK to bodging the iommu-dma mid-layer to "fix" that. A 
serious bug already happened elsewhere, so trying to hide the fallout 
really doesn't help anyone.


Robin.


On 2020-05-22 00:19, Andrew Morton wrote:

I think we need a cc:stable here?


Added now.

Thanks,
Prakash
