[PATCH 1/1] arm64: dts: marvell: keep SMMU disabled by default for Armada 7040 and 8040

2020-11-05 Thread Tomasz Nowicki
FW has to configure devices' StreamIDs so that the SMMU is able to look up
the context and do proper translation later on. For Armada 7040 & 8040 and
the publicly available FW, most of the devices are configured properly,
but some, like ap_sdhci0, PCIe and the NICs, still remain unassigned, which
results in SMMU faults about unmatched StreamIDs (assuming
ARM_SMMU_DISABLE_BYPASS_BY_DEFAULT=y).

Since there is a dependency on custom FW, let the SMMU be disabled by default.
People who are still willing to use the SMMU need to enable it manually and
use ARM_SMMU_DISABLE_BYPASS_BY_DEFAULT=n (or the equivalent kernel command
line option) with extra caution.

Fixes: 83a3545d9c37 ("arm64: dts: marvell: add SMMU support")
Cc:  # 5.9+
Signed-off-by: Tomasz Nowicki 
---
 arch/arm64/boot/dts/marvell/armada-7040.dtsi | 4 ----
 arch/arm64/boot/dts/marvell/armada-8040.dtsi | 4 ----
 2 files changed, 8 deletions(-)

diff --git a/arch/arm64/boot/dts/marvell/armada-7040.dtsi 
b/arch/arm64/boot/dts/marvell/armada-7040.dtsi
index 7a3198cd7a07..2f440711d21d 100644
--- a/arch/arm64/boot/dts/marvell/armada-7040.dtsi
+++ b/arch/arm64/boot/dts/marvell/armada-7040.dtsi
@@ -15,10 +15,6 @@ / {
 "marvell,armada-ap806";
 };
 
-&smmu {
-	status = "okay";
-};
-
 &cp0_pcie0 {
 	iommu-map =
 		<0x0   0x480 0x20>,
diff --git a/arch/arm64/boot/dts/marvell/armada-8040.dtsi 
b/arch/arm64/boot/dts/marvell/armada-8040.dtsi
index 79e8ce59baa8..22c2d6ebf381 100644
--- a/arch/arm64/boot/dts/marvell/armada-8040.dtsi
+++ b/arch/arm64/boot/dts/marvell/armada-8040.dtsi
@@ -15,10 +15,6 @@ / {
 "marvell,armada-ap806";
 };
 
-&smmu {
-	status = "okay";
-};
-
 &cp0_pcie0 {
 	iommu-map =
 		<0x0   0x480 0x20>,
-- 
2.25.1



Re: [PATCH v4 0/4] Add system mmu support for Armada-806

2020-10-23 Thread Tomasz Nowicki

On 10/23/20 12:33 PM, Robin Murphy wrote:

On 2020-10-23 13:19, Tomasz Nowicki wrote:

Hi Denis,

Sorry for the late response, we had to check a few things. Please see the
comments inline.


On 10/6/20 3:16 PM, Denis Odintsov wrote:

Hi,


Am 15.07.2020 um 09:06 schrieb Tomasz Nowicki :

The series is meant to support SMMU for AP806 and a workaround
for accessing ARM SMMU 64bit registers is the gist of it.

For the record, AP-806 can't access SMMU registers with 64bit width.
This patches split the readq/writeq into two 32bit accesses instead
and update DT bindings.

The series was successfully tested on a vanilla v5.8-rc3 kernel and
Intel e1000e PCIe NIC. The same for platform devices like SATA and USB.

For reference, previous versions are listed below:
V1: https://lkml.org/lkml/2018/10/15/373
V2: https://lkml.org/lkml/2019/7/11/426
V3: https://lkml.org/lkml/2020/7/2/1114



1) After enabling the SMMU on Armada 8040, with
ARM_SMMU_DISABLE_BYPASS_BY_DEFAULT=y being the kernel default since
954a03be033c7cef80ddc232e7cbdb17df735663,
the internal eMMC is prevented from being initialised (as there is no
iommus property for ap_sdhci0).
Disabling "Disable bypass by default" makes it work, but that patch
strongly suggests doing it properly.
I wasn't able to find the correct iommus value for ap_sdhci in any
publicly available documentation,
so having this addressed properly would be highly appreciated, thank you!

2) The second issue I hit (btw, I have a ClearFog GT 8K Armada 8040 based
board) is an mPCIe ath10k card.
It is found, it is enumerated, it is visible in lspci, but it fails
to be initialised. Here is the log:


Firmware has to configure and assign the device StreamIDs. Most of the
devices are configured properly and supported in the public FW. However,
for both of these cases (ap_sdhci0 and PCIe) some extra (U-Boot/UEFI/ATF)
patches are required which are not available yet. Sorry we let that
happen.


Since we have a dependency on custom FW and we cannot force people to
patch their FW, we will send a follow-up fix patch (v5.9+) and revert
the respective DTS changes.


Note that it should be sufficient to simply keep the SMMU node disabled, 
rather than fully revert everything. For example, the PCIe SMMU for Arm 
Juno boards has been in that state for a long time - there are reasons 
why it isn't (yet) 100% usable for everyone, but it can easily be 
enabled locally for development (as I do).




Actually that was our plan :) but then we decided to keep DTS clean if 
something is not used. Your reasoning, however, does make sense and we 
will go for it.


Thanks,
Tomasz



The most important Armada 806 SMMU driver enhancements were merged, so
people who are still willing to use the SMMU need to provide a proper DTB
and use ARM_SMMU_DISABLE_BYPASS_BY_DEFAULT=n (or the equivalent kernel
command line option) with extra caution.


Thanks,
Tomasz



[    1.743754] armada8k-pcie f260.pcie: host bridge /cp0/pcie@f260 ranges:
[    1.751116] armada8k-pcie f260.pcie:  MEM 0x00f600..0x00f6ef -> 0x00f600
[    1.964690] armada8k-pcie f260.pcie: Link up
[    1.969379] armada8k-pcie f260.pcie: PCI host bridge to bus :00
[    1.976026] pci_bus :00: root bus resource [bus 00-ff]
[    1.981537] pci_bus :00: root bus resource [mem 0xf600-0xf6ef]
[    1.988462] pci :00:00.0: [11ab:0110] type 01 class 0x060400
[    1.994504] pci :00:00.0: reg 0x10: [mem 0x-0x000f]
[    2.000843] pci :00:00.0: supports D1 D2
[    2.005132] pci :00:00.0: PME# supported from D0 D1 D3hot
[    2.011853] pci :01:00.0: [168c:003c] type 00 class 0x028000
[    2.018001] pci :01:00.0: reg 0x10: [mem 0x-0x001f 64bit]
[    2.025002] pci :01:00.0: reg 0x30: [mem 0x-0x pref]
[    2.032111] pci :01:00.0: supports D1 D2
[    2.049409] pci :00:00.0: BAR 14: assigned [mem 0xf600-0xf61f]
[    2.056322] pci :00:00.0: BAR 0: assigned [mem 0xf620-0xf62f]
[    2.063142] pci :00:00.0: BAR 15: assigned [mem 0xf630-0xf63f pref]
[    2.070484] pci :01:00.0: BAR 0: assigned [mem 0xf600-0xf61f 64bit]
[    2.077880] pci :01:00.0: BAR 6: assigned [mem 0xf630-0xf630 pref]
[    2.085135] pci :00:00.0: PCI bridge to [bus 01-ff]
[    2.090384] pci :00:00.0:   bridge window [mem 0xf600-0xf61f]
[    2.097202] pci :00:00.0:   bridge window [mem 0xf630-0xf63f pref]
[    2.104539] pcieport :00:00.0: Adding to iommu group 4
[    2.110232] pcieport :00:00.0: PME: Signaling with IRQ 38
[    2.116141] pcieport :00:00.0: AER: enabled with IRQ 38
[    8.131135] ath10k_pci :01:00.0: Adding to iommu group 4
[    8.131874] ath10k_pci :01:00.0: enabling device ( -> 0002)
[    8.132203] ath10k_pci :01:00.0: pci irq msi oper_irq_mode 2 irq_mode 0 reset_mode 0

up to that point the log is the same as without SMMU enabled, except
"Adding to iommu group N" lines, and IRQ being 37


[   

Re: [PATCH v4 0/4] Add system mmu support for Armada-806

2020-10-23 Thread Tomasz Nowicki

Hi Denis,

Sorry for the late response, we had to check a few things. Please see the
comments inline.


On 10/6/20 3:16 PM, Denis Odintsov wrote:

Hi,


Am 15.07.2020 um 09:06 schrieb Tomasz Nowicki :

The series is meant to support SMMU for AP806 and a workaround
for accessing ARM SMMU 64bit registers is the gist of it.

For the record, AP-806 can't access SMMU registers with 64bit width.
This patches split the readq/writeq into two 32bit accesses instead
and update DT bindings.

The series was successfully tested on a vanilla v5.8-rc3 kernel and
Intel e1000e PCIe NIC. The same for platform devices like SATA and USB.

For reference, previous versions are listed below:
V1: https://lkml.org/lkml/2018/10/15/373
V2: https://lkml.org/lkml/2019/7/11/426
V3: https://lkml.org/lkml/2020/7/2/1114



1) After enabling the SMMU on Armada 8040, with ARM_SMMU_DISABLE_BYPASS_BY_DEFAULT=y
being the kernel default since 954a03be033c7cef80ddc232e7cbdb17df735663,
the internal eMMC is prevented from being initialised (as there is no iommus
property for ap_sdhci0).
Disabling "Disable bypass by default" makes it work, but that patch strongly
suggests doing it properly.
I wasn't able to find the correct iommus value for ap_sdhci in any publicly
available documentation,
so having this addressed properly would be highly appreciated, thank you!

2) The second issue I hit (btw, I have a ClearFog GT 8K Armada 8040 based board) is
an mPCIe ath10k card.
It is found, it is enumerated, it is visible in lspci, but it fails to be
initialised. Here is the log:


Firmware has to configure and assign the device StreamIDs. Most of the
devices are configured properly and supported in the public FW. However, for
both of these cases (ap_sdhci0 and PCIe) some extra (U-Boot/UEFI/ATF)
patches are required which are not available yet. Sorry we let that happen.


Since we have a dependency on custom FW and we cannot force people to
patch their FW, we will send a follow-up fix patch (v5.9+) and revert
the respective DTS changes.


The most important Armada 806 SMMU driver enhancements were merged, so
people who are still willing to use the SMMU need to provide a proper DTB
and use ARM_SMMU_DISABLE_BYPASS_BY_DEFAULT=n (or the equivalent kernel
command line option) with extra caution.
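
For reference, the "kernel command line" knob mentioned above is the arm-smmu
driver's "disable_bypass" module parameter, whose default value comes from that
Kconfig option; passing arm-smmu.disable_bypass=0 on the command line flips it
at boot without rebuilding. A rough sketch of the relevant lines (paraphrased,
not a verbatim copy of drivers/iommu/arm-smmu.c):

#include <linux/module.h>
#include <linux/moduleparam.h>

/*
 * Sketch: the Kconfig option only seeds the default of the
 * "disable_bypass" parameter; arm-smmu.disable_bypass=0 on the
 * kernel command line overrides it at boot.
 */
static bool disable_bypass =
	IS_ENABLED(CONFIG_ARM_SMMU_DISABLE_BYPASS_BY_DEFAULT);
module_param(disable_bypass, bool, S_IRUGO);
MODULE_PARM_DESC(disable_bypass,
	"Abort transactions from devices not attached to an IOMMU domain instead of letting them bypass the SMMU");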


Thanks,
Tomasz



[1.743754] armada8k-pcie f260.pcie: host bridge /cp0/pcie@f260 ranges:
[1.751116] armada8k-pcie f260.pcie:  MEM 0x00f600..0x00f6ef -> 0x00f600
[1.964690] armada8k-pcie f260.pcie: Link up
[1.969379] armada8k-pcie f260.pcie: PCI host bridge to bus :00
[1.976026] pci_bus :00: root bus resource [bus 00-ff]
[1.981537] pci_bus :00: root bus resource [mem 0xf600-0xf6ef]
[1.988462] pci :00:00.0: [11ab:0110] type 01 class 0x060400
[1.994504] pci :00:00.0: reg 0x10: [mem 0x-0x000f]
[2.000843] pci :00:00.0: supports D1 D2
[2.005132] pci :00:00.0: PME# supported from D0 D1 D3hot
[2.011853] pci :01:00.0: [168c:003c] type 00 class 0x028000
[2.018001] pci :01:00.0: reg 0x10: [mem 0x-0x001f 64bit]
[2.025002] pci :01:00.0: reg 0x30: [mem 0x-0x pref]
[2.032111] pci :01:00.0: supports D1 D2
[2.049409] pci :00:00.0: BAR 14: assigned [mem 0xf600-0xf61f]
[2.056322] pci :00:00.0: BAR 0: assigned [mem 0xf620-0xf62f]
[2.063142] pci :00:00.0: BAR 15: assigned [mem 0xf630-0xf63f pref]
[2.070484] pci :01:00.0: BAR 0: assigned [mem 0xf600-0xf61f 64bit]
[2.077880] pci :01:00.0: BAR 6: assigned [mem 0xf630-0xf630 pref]
[2.085135] pci :00:00.0: PCI bridge to [bus 01-ff]
[2.090384] pci :00:00.0:   bridge window [mem 0xf600-0xf61f]
[2.097202] pci :00:00.0:   bridge window [mem 0xf630-0xf63f pref]
[2.104539] pcieport :00:00.0: Adding to iommu group 4
[2.110232] pcieport :00:00.0: PME: Signaling with IRQ 38
[2.116141] pcieport :00:00.0: AER: enabled with IRQ 38
[8.131135] ath10k_pci :01:00.0: Adding to iommu group 4
[8.131874] ath10k_pci :01:00.0: enabling device ( -> 0002)
[8.132203] ath10k_pci :01:00.0: pci irq msi oper_irq_mode 2 irq_mode 0 reset_mode 0

up to that point the log is the same as without SMMU enabled, except "Adding to iommu group N" lines, and IRQ being 37

[8.221328] ath10k_pci :01:00.0: failed to poke copy engine: -16
[8.313362] ath10k_pci :01:00.0: failed to poke copy engine: -16
[8.409373] ath10k_pci :01:00.0: failed to poke copy engine: -16
[8.553433] ath10k_pci :01:00.0: failed to poke copy engine: -16
[8.641370] ath10k_pci :01:00.0: failed to poke copy engine: -16
[8.737979] ath10k_pci :01:00.0: failed to poke copy engine: -16
[8.807356] ath10k_pci :01:00.0: Failed to get pcie state addr: -16
[8.814032] ath10k_pci :01:00.0: failed to setup init config: -16
[8.820605] ath10k_pci :01:00.0: 

Re: [PATCH v4 3/4] dt-bindings: arm-smmu: add compatible string for Marvell Armada-AP806 SMMU-500

2020-07-16 Thread Tomasz Nowicki

On 15.07.2020 12:36, Robin Murphy wrote:

On 2020-07-15 08:06, Tomasz Nowicki wrote:

Add specific compatible string for Marvell usage due to errata of
accessing 64bits registers of ARM SMMU, in AP806.

AP806 SoC uses the generic ARM-MMU500, and there's no specific
implementation of Marvell, this compatible is used for errata only.


Reviewed-by: Robin Murphy 

Presumably Will can pick up these first 3 patches for 5.9 and #4 can go 
via arm-soc.


Thanks Robin for review and valuable comments.

Tomasz


Re: [PATCH v4 2/4] iommu/arm-smmu: Workaround for Marvell Armada-AP806 SoC erratum #582743

2020-07-16 Thread Tomasz Nowicki

On 15.07.2020 12:32, Robin Murphy wrote:

On 2020-07-15 08:06, Tomasz Nowicki wrote:

From: Hanna Hawa 

Due to erratum #582743, the Marvell Armada-AP806 can't access 64bit to
ARM SMMUv2 registers.

Provide implementation relevant hooks:
- split the writeq/readq to two accesses of writel/readl.
- mask the MMU_IDR2.PTFSv8 fields to not use AArch64 format (but
only AARCH32_L) since with AArch64 format 32 bits access is not 
supported.


Note that most 64-bit registers like TTBRn can be accessed as two 32-bit
halves without issue, and AArch32 format ensures that the register writes
which must be atomic (for TLBI etc.) need only be 32-bit.


Thanks Tomasz, this has ended up as clean as I'd hoped it could, and 
there's still room to come back and play more complicated games later if 
a real need for AARCH64_64K at stage 2 crops up.


Based on your implementation infrastructure rework, indeed the code 
looks much cleaner :)




Reviewed-by: Robin Murphy 



Thanks!

Tomasz


[PATCH v4 4/4] arm64: dts: marvell: add SMMU support

2020-07-15 Thread Tomasz Nowicki
From: Marcin Wojtas 

Add IOMMU node for Marvell AP806 based SoCs together with platform
and PCI device Stream ID mapping.

Signed-off-by: Marcin Wojtas 
Signed-off-by: Tomasz Nowicki 
---
 arch/arm64/boot/dts/marvell/armada-7040.dtsi  | 28 +
 arch/arm64/boot/dts/marvell/armada-8040.dtsi  | 40 +++
 arch/arm64/boot/dts/marvell/armada-ap80x.dtsi | 18 +
 3 files changed, 86 insertions(+)

diff --git a/arch/arm64/boot/dts/marvell/armada-7040.dtsi 
b/arch/arm64/boot/dts/marvell/armada-7040.dtsi
index 47247215770d..7a3198cd7a07 100644
--- a/arch/arm64/boot/dts/marvell/armada-7040.dtsi
+++ b/arch/arm64/boot/dts/marvell/armada-7040.dtsi
@@ -14,3 +14,31 @@
compatible = "marvell,armada7040", "marvell,armada-ap806-quad",
 "marvell,armada-ap806";
 };
+
+&smmu {
+	status = "okay";
+};
+
+&cp0_pcie0 {
+	iommu-map =
+		<0x0   0x480 0x20>,
+		<0x100 0x4a0 0x20>,
+		<0x200 0x4c0 0x20>;
+	iommu-map-mask = <0x031f>;
+};
+
+&cp0_sata0 {
+	iommus = <&smmu 0x444>;
+};
+
+&cp0_sdhci0 {
+	iommus = <&smmu 0x445>;
+};
+
+&cp0_usb3_0 {
+	iommus = <&smmu 0x440>;
+};
+
+&cp0_usb3_1 {
+	iommus = <&smmu 0x441>;
+};
diff --git a/arch/arm64/boot/dts/marvell/armada-8040.dtsi 
b/arch/arm64/boot/dts/marvell/armada-8040.dtsi
index 7699b19224c2..79e8ce59baa8 100644
--- a/arch/arm64/boot/dts/marvell/armada-8040.dtsi
+++ b/arch/arm64/boot/dts/marvell/armada-8040.dtsi
@@ -15,6 +15,18 @@
 "marvell,armada-ap806";
 };
 
+&smmu {
+	status = "okay";
+};
+
+&cp0_pcie0 {
+	iommu-map =
+		<0x0   0x480 0x20>,
+		<0x100 0x4a0 0x20>,
+		<0x200 0x4c0 0x20>;
+	iommu-map-mask = <0x031f>;
+};
+
 /* The RTC requires external oscillator. But on Aramda 80x0, the RTC clock
  * in CP master is not connected (by package) to the oscillator. So
  * disable it. However, the RTC clock in CP slave is connected to the
@@ -23,3 +35,31 @@
 &cp0_rtc {
status = "disabled";
 };
+
+&cp0_sata0 {
+	iommus = <&smmu 0x444>;
+};
+
+&cp0_sdhci0 {
+	iommus = <&smmu 0x445>;
+};
+
+&cp0_usb3_0 {
+	iommus = <&smmu 0x440>;
+};
+
+&cp0_usb3_1 {
+	iommus = <&smmu 0x441>;
+};
+
+&cp1_sata0 {
+	iommus = <&smmu 0x454>;
+};
+
+&cp1_usb3_0 {
+	iommus = <&smmu 0x450>;
+};
+
+&cp1_usb3_1 {
+	iommus = <&smmu 0x451>;
+};
diff --git a/arch/arm64/boot/dts/marvell/armada-ap80x.dtsi 
b/arch/arm64/boot/dts/marvell/armada-ap80x.dtsi
index 7f9b9a647717..12e477f1aeb9 100644
--- a/arch/arm64/boot/dts/marvell/armada-ap80x.dtsi
+++ b/arch/arm64/boot/dts/marvell/armada-ap80x.dtsi
@@ -56,6 +56,24 @@
compatible = "simple-bus";
ranges = <0x0 0x0 0xf000 0x100>;
 
+   smmu: iommu@500 {
+   compatible = "marvell,ap806-smmu-500", 
"arm,mmu-500";
+   reg = <0x10 0x10>;
+   dma-coherent;
+   #iommu-cells = <1>;
+   #global-interrupts = <1>;
+   interrupts = ,
+,
+,
+,
+,
+,
+,
+,
+;
+   status = "disabled";
+   };
+
gic: interrupt-controller@21 {
compatible = "arm,gic-400";
#interrupt-cells = <3>;
-- 
2.17.1



[PATCH v4 3/4] dt-bindings: arm-smmu: add compatible string for Marvell Armada-AP806 SMMU-500

2020-07-15 Thread Tomasz Nowicki
Add a specific compatible string for Marvell usage, due to the erratum
in accessing 64-bit registers of the ARM SMMU in AP806.

The AP806 SoC uses the generic ARM MMU-500 and there is no Marvell-specific
implementation; this compatible is used for the erratum only.

Reviewed-by: Rob Herring 
Signed-off-by: Hanna Hawa 
Signed-off-by: Gregory CLEMENT 
Signed-off-by: Tomasz Nowicki 
---
 Documentation/devicetree/bindings/iommu/arm,smmu.yaml | 4 
 1 file changed, 4 insertions(+)

diff --git a/Documentation/devicetree/bindings/iommu/arm,smmu.yaml 
b/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
index d7ceb4c34423..156b38924a00 100644
--- a/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
+++ b/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
@@ -38,6 +38,10 @@ properties:
   - qcom,sc7180-smmu-500
   - qcom,sdm845-smmu-500
   - const: arm,mmu-500
+      - description: Marvell SoCs implementing "arm,mmu-500"
+        items:
+          - const: marvell,ap806-smmu-500
+          - const: arm,mmu-500
   - items:
   - const: arm,mmu-500
   - const: arm,smmu-v2
-- 
2.17.1



[PATCH v4 1/4] iommu/arm-smmu: Call configuration impl hook before consuming features

2020-07-15 Thread Tomasz Nowicki
The 'cfg_probe' hook is called at the very end of the configuration probing
procedure, and therefore feature overrides and workarounds may become
complex, as is the case for ID register fixups. In preparation for adding
the Marvell errata, move 'cfg_probe' a bit earlier so it has a chance to
adjust the detected features before we start consuming them.

Since the Cavium quirk (the only user so far) does not alter features,
it is safe to do so.

Suggested-by: Robin Murphy 
Signed-off-by: Tomasz Nowicki 
---
 drivers/iommu/arm-smmu.c | 11 +++
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 243bc4cb2705..19f906de6420 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -1728,7 +1728,7 @@ static int arm_smmu_device_cfg_probe(struct 
arm_smmu_device *smmu)
unsigned int size;
u32 id;
bool cttw_reg, cttw_fw = smmu->features & ARM_SMMU_FEAT_COHERENT_WALK;
-   int i;
+   int i, ret;
 
dev_notice(smmu->dev, "probing hardware configuration...\n");
dev_notice(smmu->dev, "SMMUv%d with:\n",
@@ -1891,6 +1891,12 @@ static int arm_smmu_device_cfg_probe(struct 
arm_smmu_device *smmu)
smmu->features |= ARM_SMMU_FEAT_FMT_AARCH64_64K;
}
 
+   if (smmu->impl && smmu->impl->cfg_probe) {
+   ret = smmu->impl->cfg_probe(smmu);
+   if (ret)
+   return ret;
+   }
+
/* Now we've corralled the various formats, what'll it do? */
if (smmu->features & ARM_SMMU_FEAT_FMT_AARCH32_S)
smmu->pgsize_bitmap |= SZ_4K | SZ_64K | SZ_1M | SZ_16M;
@@ -1918,9 +1924,6 @@ static int arm_smmu_device_cfg_probe(struct 
arm_smmu_device *smmu)
dev_notice(smmu->dev, "\tStage-2: %lu-bit IPA -> %lu-bit PA\n",
   smmu->ipa_size, smmu->pa_size);
 
-   if (smmu->impl && smmu->impl->cfg_probe)
-   return smmu->impl->cfg_probe(smmu);
-
return 0;
 }
 
-- 
2.17.1



[PATCH v4 2/4] iommu/arm-smmu: Workaround for Marvell Armada-AP806 SoC erratum #582743

2020-07-15 Thread Tomasz Nowicki
From: Hanna Hawa 

Due to erratum #582743, the Marvell Armada-AP806 can't perform 64-bit
accesses to the ARM SMMUv2 registers.

Provide the relevant implementation hooks:
- split writeq/readq into two writel/readl accesses.
- mask the SMMU_IDR2.PTFSv8 fields so the AArch64 format is not used
(only AARCH32_L), since 32-bit access is not supported with the AArch64
format.

Note that most 64-bit registers like TTBRn can be accessed as two 32-bit
halves without issue, and AArch32 format ensures that the register writes
which must be atomic (for TLBI etc.) need only be 32-bit.

Signed-off-by: Hanna Hawa 
Signed-off-by: Gregory CLEMENT 
Signed-off-by: Tomasz Nowicki 
---
 Documentation/arm64/silicon-errata.rst |  3 ++
 drivers/iommu/arm-smmu-impl.c  | 45 ++
 2 files changed, 48 insertions(+)

diff --git a/Documentation/arm64/silicon-errata.rst 
b/Documentation/arm64/silicon-errata.rst
index 936cf2a59ca4..157214d3abe1 100644
--- a/Documentation/arm64/silicon-errata.rst
+++ b/Documentation/arm64/silicon-errata.rst
@@ -125,6 +125,9 @@ stable kernels.
 | Cavium | ThunderX2 Core  | #219| CAVIUM_TX2_ERRATUM_219  
|
 
++-+-+-+
 
++-+-+-+
+| Marvell| ARM-MMU-500 | #582743 | N/A 
|
+++-+-+-+
+++-+-+-+
 | Freescale/NXP  | LS2080A/LS1043A | A-008585| FSL_ERRATUM_A008585 
|
 
++-+-+-+
 
++-+-+-+
diff --git a/drivers/iommu/arm-smmu-impl.c b/drivers/iommu/arm-smmu-impl.c
index c75b9d957b70..59422cb92488 100644
--- a/drivers/iommu/arm-smmu-impl.c
+++ b/drivers/iommu/arm-smmu-impl.c
@@ -147,6 +147,48 @@ static const struct arm_smmu_impl arm_mmu500_impl = {
.reset = arm_mmu500_reset,
 };
 
+static u64 mrvl_mmu500_readq(struct arm_smmu_device *smmu, int page, int off)
+{
+   /*
+* Marvell Armada-AP806 erratum #582743.
+* Split all the readq to double readl
+*/
+   return hi_lo_readq_relaxed(arm_smmu_page(smmu, page) + off);
+}
+
+static void mrvl_mmu500_writeq(struct arm_smmu_device *smmu, int page, int off,
+  u64 val)
+{
+   /*
+* Marvell Armada-AP806 erratum #582743.
+* Split all the writeq to double writel
+*/
+   hi_lo_writeq_relaxed(val, arm_smmu_page(smmu, page) + off);
+}
+
+static int mrvl_mmu500_cfg_probe(struct arm_smmu_device *smmu)
+{
+
+   /*
+* Armada-AP806 erratum #582743.
+* Hide the SMMU_IDR2.PTFSv8 fields to sidestep the AArch64
+* formats altogether and allow using 32 bits access on the
+* interconnect.
+*/
+   smmu->features &= ~(ARM_SMMU_FEAT_FMT_AARCH64_4K |
+   ARM_SMMU_FEAT_FMT_AARCH64_16K |
+   ARM_SMMU_FEAT_FMT_AARCH64_64K);
+
+   return 0;
+}
+
+static const struct arm_smmu_impl mrvl_mmu500_impl = {
+   .read_reg64 = mrvl_mmu500_readq,
+   .write_reg64 = mrvl_mmu500_writeq,
+   .cfg_probe = mrvl_mmu500_cfg_probe,
+   .reset = arm_mmu500_reset,
+};
+
 
 struct arm_smmu_device *arm_smmu_impl_init(struct arm_smmu_device *smmu)
 {
@@ -175,5 +217,8 @@ struct arm_smmu_device *arm_smmu_impl_init(struct 
arm_smmu_device *smmu)
of_device_is_compatible(np, "qcom,sc7180-smmu-500"))
return qcom_smmu_impl_init(smmu);
 
+   if (of_device_is_compatible(np, "marvell,ap806-smmu-500"))
+		smmu->impl = &mrvl_mmu500_impl;
+
return smmu;
 }
-- 
2.17.1



[PATCH v4 0/4] Add system mmu support for Armada-806

2020-07-15 Thread Tomasz Nowicki
The series is meant to support SMMU for AP806 and a workaround
for accessing ARM SMMU 64bit registers is the gist of it.

For the record, AP-806 can't access SMMU registers with 64-bit width.
These patches split the readq/writeq into two 32-bit accesses instead
and update the DT bindings.

The series was successfully tested on a vanilla v5.8-rc3 kernel and
Intel e1000e PCIe NIC. The same for platform devices like SATA and USB.

For reference, previous versions are listed below:
V1: https://lkml.org/lkml/2018/10/15/373
V2: https://lkml.org/lkml/2019/7/11/426
V3: https://lkml.org/lkml/2020/7/2/1114

v3 -> v4
- call cfg_probe() impl hook a bit earlier which simplifies errata handling
- use hi_lo_readq_relaxed() and hi_lo_writeq_relaxed() for register accessors
- keep SMMU status disabled by default and enable where possible (DTS changes)
- commit logs improvements and other minor fixes

Hanna Hawa (1):
  iommu/arm-smmu: Workaround for Marvell Armada-AP806 SoC erratum
#582743

Marcin Wojtas (1):
  arm64: dts: marvell: add SMMU support

Tomasz Nowicki (2):
  iommu/arm-smmu: Call configuration impl hook before consuming features
  dt-bindings: arm-smmu: add compatible string for Marvell Armada-AP806
SMMU-500

 Documentation/arm64/silicon-errata.rst|  3 ++
 .../devicetree/bindings/iommu/arm,smmu.yaml   |  4 ++
 arch/arm64/boot/dts/marvell/armada-7040.dtsi  | 28 
 arch/arm64/boot/dts/marvell/armada-8040.dtsi  | 40 +
 arch/arm64/boot/dts/marvell/armada-ap80x.dtsi | 18 
 drivers/iommu/arm-smmu-impl.c | 45 +++
 drivers/iommu/arm-smmu.c  | 11 +++--
 7 files changed, 145 insertions(+), 4 deletions(-)

-- 
2.17.1



Re: [PATCH v3 0/4] Add system mmu support for Armada-806

2020-07-14 Thread Tomasz Nowicki

Hi Will,

On 14.07.2020 10:19, Will Deacon wrote:

Hi Tomasz,

On Thu, Jul 02, 2020 at 10:16:29PM +0200, Tomasz Nowicki wrote:

There were already two versions of series to support SMMU for AP806,
including workaround for accessing ARM SMMU 64bit registers.
First [1] by Hanna Hawa and second [2] by Gregory CLEMENT.
Since it got stuck this is yet another try. I incorporated the V2 comments,
mainly by moving workaround code to arm-smmu-impl.c as requested.

For the record, AP-806 can't access SMMU registers with 64bit width,
this patches split the readq/writeq into two 32bit accesses instead
and update DT bindings.

The series was successfully tested on a vanilla v5.8-rc3 kernel and
Intel e1000e PCIe NIC. The same for platform devices like SATA and USB.

[1]: https://lkml.org/lkml/2018/10/15/373
[2]: https://lkml.org/lkml/2019/7/11/426


Do you have a v4 of this series? It looks like there were a few comments
left to address, but with that I can pick it up for 5.9.


Yes, I will send it out today.

Thanks,
Tomasz


Re: [PATCH v3 2/4] iommu/arm-smmu: Workaround for Marvell Armada-AP806 SoC erratum #582743

2020-07-03 Thread Tomasz Nowicki



On 03.07.2020 11:03, Robin Murphy wrote:

On 2020-07-02 21:16, Tomasz Nowicki wrote:

From: Hanna Hawa 

Due to erratum #582743, the Marvell Armada-AP806 can't access 64bit to
ARM SMMUv2 registers.

Provide implementation relevant hooks:
- split the writeq/readq to two accesses of writel/readl.
- mask the MMU_IDR2.PTFSv8 fields to not use AArch64 format (but
only AARCH32_L) since with AArch64 format 32 bits access is not 
supported.


Note that separate writes/reads to 2 is not problem regards to
atomicity, because the driver use the readq/writeq while initialize
the SMMU, report for SMMU fault, and use spinlock in one
case (iova_to_phys).


The comment about the spinlock seems to be out of date, and TBH that 
whole sentence is a bit unclear - how about something like:


"Note that most 64-bit registers like TTBRn can be accessed as two 
32-bit halves without issue, and AArch32 format ensures that the 
register writes which must be atomic (for TLBI etc.) need only be 32-bit."



Signed-off-by: Hanna Hawa 
Signed-off-by: Gregory CLEMENT 
Signed-off-by: Tomasz Nowicki 
---
  Documentation/arm64/silicon-errata.rst |  3 ++
  drivers/iommu/arm-smmu-impl.c  | 52 ++
  2 files changed, 55 insertions(+)

diff --git a/Documentation/arm64/silicon-errata.rst 
b/Documentation/arm64/silicon-errata.rst

index 936cf2a59ca4..157214d3abe1 100644
--- a/Documentation/arm64/silicon-errata.rst
+++ b/Documentation/arm64/silicon-errata.rst
@@ -125,6 +125,9 @@ stable kernels.
  | Cavium | ThunderX2 Core  | #219    | 
CAVIUM_TX2_ERRATUM_219  |
  
++-+-+-+ 

  
++-+-+-+ 

+| Marvell    | ARM-MMU-500 | #582743 | 
N/A |
+++-+-+-+ 

+++-+-+-+ 

  | Freescale/NXP  | LS2080A/LS1043A | A-008585    | 
FSL_ERRATUM_A008585 |
  
++-+-+-+ 

  
++-+-+-+ 

diff --git a/drivers/iommu/arm-smmu-impl.c 
b/drivers/iommu/arm-smmu-impl.c

index c75b9d957b70..c1fc5e1b8193 100644
--- a/drivers/iommu/arm-smmu-impl.c
+++ b/drivers/iommu/arm-smmu-impl.c
@@ -147,6 +147,53 @@ static const struct arm_smmu_impl arm_mmu500_impl 
= {

  .reset = arm_mmu500_reset,
  };
+static u64 mrvl_mmu500_readq(struct arm_smmu_device *smmu, int page, 
int off)

+{
+    u64 val;
+
+    /*
+ * Marvell Armada-AP806 erratum #582743.
+ * Split all the readq to double readl
+ */
+    val = (u64)readl_relaxed(arm_smmu_page(smmu, page) + off + 4) << 32;
+    val |= readl_relaxed(arm_smmu_page(smmu, page) + off);


Even though io-64-nonatomic-hi-lo.h doesn't override readq() etc. for 
64-bit builds, you can still use hi_lo_readq_relaxed() directly.



+
+    return val;
+}
+
+static void mrvl_mmu500_writeq(struct arm_smmu_device *smmu, int 
page, int off,

+   u64 val)
+{
+    /*
+ * Marvell Armada-AP806 erratum #582743.
+ * Split all the writeq to double writel
+ */
+    writel_relaxed(upper_32_bits(val), arm_smmu_page(smmu, page) + 
off + 4);

+    writel_relaxed(lower_32_bits(val), arm_smmu_page(smmu, page) + off);


Similarly, hi_lo_writeq_relaxed().


+}
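
(For reference, the hi_lo helpers in include/linux/io-64-nonatomic-hi-lo.h
amount to roughly the following, i.e. the same high/low split done by hand in
the hunks above; this is an approximation, not the exact kernel source:)

/* Approximate behaviour of the hi_lo accessors. */
static inline u64 hi_lo_readq_relaxed(const volatile void __iomem *addr)
{
	u32 high = readl_relaxed(addr + 4);
	u32 low = readl_relaxed(addr);

	return ((u64)high << 32) | low;
}

static inline void hi_lo_writeq_relaxed(u64 val, volatile void __iomem *addr)
{
	writel_relaxed(upper_32_bits(val), addr + 4);
	writel_relaxed(lower_32_bits(val), addr);
}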
+
+static u32 mrvl_mmu500_cfg_id2_fixup(u32 id)
+{
+
+    /*
+ * Armada-AP806 erratum #582743.
+ * Hide the SMMU_IDR2.PTFSv8 fields to sidestep the AArch64
+ * formats altogether and allow using 32 bits access on the
+ * interconnect.
+ */
+    id &= ~(ARM_SMMU_ID2_PTFS_4K | ARM_SMMU_ID2_PTFS_16K |
+    ARM_SMMU_ID2_PTFS_64K);
+
+    return id;
+}
+
+static const struct arm_smmu_impl mrvl_mmu500_impl = {
+    .read_reg64 = mrvl_mmu500_readq,
+    .write_reg64 = mrvl_mmu500_writeq,
+    .cfg_id2_fixup = mrvl_mmu500_cfg_id2_fixup,
+    .reset = arm_mmu500_reset,
+};
+
  struct arm_smmu_device *arm_smmu_impl_init(struct arm_smmu_device 
*smmu)

  {
@@ -160,6 +207,11 @@ struct arm_smmu_device *arm_smmu_impl_init(struct 
arm_smmu_device *smmu)

   */
  switch (smmu->model) {
  case ARM_MMU500:
+    if (of_device_is_compatible(smmu->dev->of_node,


Nit: there's a local "np" variable now.


+    "marvell,ap806-smmu-500")) {
+    smmu->impl = &mrvl_mmu500_impl;
+    return smmu;
+    }


Please put this with the other integration checks below the switch 
statement. Yes, it means we'll end up assigning smmu->impl twice for 
this particular case, but that's the intended pattern.




Thanks, all above comments do make sense and will be fixed in next spin.

Thanks,
Tomasz

Re: [PATCH v3 4/4] arm64: dts: marvell: add SMMU support

2020-07-03 Thread Tomasz Nowicki

On 03.07.2020 11:16, Robin Murphy wrote:

On 2020-07-02 21:16, Tomasz Nowicki wrote:

From: Marcin Wojtas 

Add IOMMU node for Marvell AP806 based SoCs together with platform
and PCI device Stream ID mapping.

Signed-off-by: Marcin Wojtas 
Signed-off-by: Tomasz Nowicki 
---
  arch/arm64/boot/dts/marvell/armada-8040.dtsi  | 36 +++
  arch/arm64/boot/dts/marvell/armada-ap80x.dtsi | 17 +
  2 files changed, 53 insertions(+)

diff --git a/arch/arm64/boot/dts/marvell/armada-8040.dtsi 
b/arch/arm64/boot/dts/marvell/armada-8040.dtsi

index 7699b19224c2..25c1df709f72 100644
--- a/arch/arm64/boot/dts/marvell/armada-8040.dtsi
+++ b/arch/arm64/boot/dts/marvell/armada-8040.dtsi
@@ -23,3 +23,39 @@
  _rtc {
  status = "disabled";
  };
+
+&cp0_usb3_0 {
+    iommus = <&smmu 0x440>;
+};
+
+&cp0_usb3_1 {
+    iommus = <&smmu 0x441>;
+};
+
+&cp0_sata0 {
+    iommus = <&smmu 0x444>;
+};
+
+&cp0_sdhci0 {
+    iommus = <&smmu 0x445>;
+};
+
+&cp1_sata0 {
+    iommus = <&smmu 0x454>;
+};
+
+&cp1_usb3_0 {
+    iommus = <&smmu 0x450>;
+};
+
+&cp1_usb3_1 {
+    iommus = <&smmu 0x451>;
+};
+
+&cp0_pcie0 {
+    iommu-map =
+    <0x0    0x480 0x20>,
+    <0x100  0x4a0 0x20>,
+    <0x200  0x4c0 0x20>;
+    iommu-map-mask = <0x031f>;


Nice! I do like a good compressed mapping :D


+};
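
(To make the compression concrete: per the generic PCI iommu-map binding, the
requester ID is first ANDed with iommu-map-mask and then matched against the
<rid-base stream-base length> ranges, giving StreamID = stream-base +
(masked RID - rid-base). A small user-space model of the table above, for
illustration only:)

#include <stdint.h>
#include <stdio.h>

struct map_entry { uint32_t rid_base, sid_base, len; };

/* The three ranges from the iommu-map property quoted above. */
static const struct map_entry iommu_map[] = {
	{ 0x000, 0x480, 0x20 },
	{ 0x100, 0x4a0, 0x20 },
	{ 0x200, 0x4c0, 0x20 },
};

static int rid_to_sid(uint16_t rid, uint32_t *sid)
{
	uint32_t masked = rid & 0x031f;	/* iommu-map-mask */
	unsigned int i;

	for (i = 0; i < 3; i++) {
		if (masked >= iommu_map[i].rid_base &&
		    masked < iommu_map[i].rid_base + iommu_map[i].len) {
			*sid = iommu_map[i].sid_base + (masked - iommu_map[i].rid_base);
			return 0;
		}
	}
	return -1;	/* unmatched RID: faults once bypass is disabled */
}

int main(void)
{
	uint32_t sid;

	/* e.g. an endpoint at 01:00.0 -> RID 0x0100 -> StreamID 0x4a0 */
	if (!rid_to_sid(0x0100, &sid))
		printf("StreamID = 0x%x\n", sid);
	return 0;
}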
diff --git a/arch/arm64/boot/dts/marvell/armada-ap80x.dtsi 
b/arch/arm64/boot/dts/marvell/armada-ap80x.dtsi

index 7f9b9a647717..ded8b8082d79 100644
--- a/arch/arm64/boot/dts/marvell/armada-ap80x.dtsi
+++ b/arch/arm64/boot/dts/marvell/armada-ap80x.dtsi
@@ -56,6 +56,23 @@
  compatible = "simple-bus";
  ranges = <0x0 0x0 0xf000 0x100>;
+    smmu: iommu@500 {
+    compatible = "marvell,ap806-smmu-500", "arm,mmu-500";
+    reg = <0x10 0x10>;
+    dma-coherent;
+    #iommu-cells = <1>;
+    #global-interrupts = <1>;
+    interrupts = ,
+ ,
+ ,
+ ,
+ ,
+ ,
+ ,
+ ,
+ ;


I'd recommend you have the node disabled by default here, then 
explicitly enable it in armada-8040.dtsi where you add the Stream IDs. 
Otherwise it will also end up enabled for 8020, 70x0, etc. where 
disable_bypass will then catastrophically break everything.




Good point! I will fix this.

Thanks,
Tomasz

Re: [PATCH v3 3/4] dt-bindings: arm-smmu: add compatible string for Marvell Armada-AP806 SMMU-500

2020-07-03 Thread Tomasz Nowicki

On 03.07.2020 11:05, Robin Murphy wrote:

On 2020-07-02 21:16, Tomasz Nowicki wrote:

Add specific compatible string for Marvell usage due to errata of
accessing 64bits registers of ARM SMMU, in AP806.

AP806 SoC uses the generic ARM-MMU500, and there's no specific
implementation of Marvell, this compatible is used for errata only.

Signed-off-by: Hanna Hawa 
Signed-off-by: Gregory CLEMENT 
Signed-off-by: Tomasz Nowicki 
---
  Documentation/devicetree/bindings/iommu/arm,smmu.yaml | 5 +
  1 file changed, 5 insertions(+)

diff --git a/Documentation/devicetree/bindings/iommu/arm,smmu.yaml 
b/Documentation/devicetree/bindings/iommu/arm,smmu.yaml

index d7ceb4c34423..7beca9c00b12 100644
--- a/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
+++ b/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
@@ -38,6 +38,11 @@ properties:
    - qcom,sc7180-smmu-500
    - qcom,sdm845-smmu-500
    - const: arm,mmu-500
+  - description: Marvell SoCs implementing "arm,mmu-500"
+    items:
+  - enum:
+  - marvell,ap806-smmu-500


Isn't a single-valued enum just a constant? :P


That's how copy-paste engineering ends up :)

Thanks,
Tomasz

Re: [PATCH v3 1/4] iommu/arm-smmu: Add SMMU ID2 register fixup hook

2020-07-03 Thread Tomasz Nowicki

On 03.07.2020 10:24, Robin Murphy wrote:

On 2020-07-02 21:16, Tomasz Nowicki wrote:

We already have 'cfg_probe' hook which meant to override and apply
workarounds while probing ID registers. However, 'cfg_probe' is called
at the very end and therefore for some cases fixing up things becomes 
complex
or requires exporting of SMMU driver structures. Hence, seems it is 
better and

cleaner to do ID fixup right away. In preparation for adding Marvell
errata add an extra ID2 fixup hook.


Hmm, the intent of ->cfg_probe was very much to give impl a chance to 
adjust the detected features before we start consuming them, with this 
exact case in mind. Since the Cavium quirk isn't actually doing that - 
it just needs to run *anywhere* in the whole probe process - I'm under 
no illusion that I put the hook in exactly the right place first time 
around ;)


The diff below should be more on the mark...

Robin.

->8-
diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 243bc4cb2705..884ffca5b1eb 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -1891,6 +1891,9 @@ static int arm_smmu_device_cfg_probe(struct 
arm_smmu_device *smmu)

  smmu->features |= ARM_SMMU_FEAT_FMT_AARCH64_64K;
  }

+    if (smmu->impl && smmu->impl->cfg_probe)
+    return smmu->impl->cfg_probe(smmu);
+
  /* Now we've corralled the various formats, what'll it do? */
  if (smmu->features & ARM_SMMU_FEAT_FMT_AARCH32_S)
  smmu->pgsize_bitmap |= SZ_4K | SZ_64K | SZ_1M | SZ_16M;
@@ -1909,7 +1912,6 @@ static int arm_smmu_device_cfg_probe(struct 
arm_smmu_device *smmu)

  dev_notice(smmu->dev, "\tSupported page sizes: 0x%08lx\n",
     smmu->pgsize_bitmap);

-
  if (smmu->features & ARM_SMMU_FEAT_TRANS_S1)
  dev_notice(smmu->dev, "\tStage-1: %lu-bit VA -> %lu-bit IPA\n",
     smmu->va_size, smmu->ipa_size);
@@ -1918,9 +1920,6 @@ static int arm_smmu_device_cfg_probe(struct 
arm_smmu_device *smmu)

  dev_notice(smmu->dev, "\tStage-2: %lu-bit IPA -> %lu-bit PA\n",
     smmu->ipa_size, smmu->pa_size);

-    if (smmu->impl && smmu->impl->cfg_probe)
-    return smmu->impl->cfg_probe(smmu);
-
  return 0;
  }



I was under the impression that ->cfg_probe was meant for Cavium-like
errata (position independent). Then I will move ->cfg_probe as you
suggested. I prefer not to add yet another impl hook either :)


Thanks,
Tomasz

[PATCH v3 1/4] iommu/arm-smmu: Add SMMU ID2 register fixup hook

2020-07-02 Thread Tomasz Nowicki
We already have the 'cfg_probe' hook, which is meant to override and apply
workarounds while probing ID registers. However, 'cfg_probe' is called at the
very end, and therefore in some cases fixing things up becomes complex
or requires exporting SMMU driver structures. Hence it seems better and
cleaner to do the ID fixup right away. In preparation for adding the Marvell
errata, add an extra ID2 fixup hook.

Signed-off-by: Tomasz Nowicki 
---
 drivers/iommu/arm-smmu.c | 3 +++
 drivers/iommu/arm-smmu.h | 1 +
 2 files changed, 4 insertions(+)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 243bc4cb2705..17c92e319754 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -1857,6 +1857,9 @@ static int arm_smmu_device_cfg_probe(struct 
arm_smmu_device *smmu)
 
/* ID2 */
id = arm_smmu_gr0_read(smmu, ARM_SMMU_GR0_ID2);
+   if (smmu->impl && smmu->impl->cfg_id2_fixup)
+   id = smmu->impl->cfg_id2_fixup(id);
+
size = arm_smmu_id_size_to_bits(FIELD_GET(ARM_SMMU_ID2_IAS, id));
smmu->ipa_size = size;
 
diff --git a/drivers/iommu/arm-smmu.h b/drivers/iommu/arm-smmu.h
index d172c024be61..f4c8bd7d0b34 100644
--- a/drivers/iommu/arm-smmu.h
+++ b/drivers/iommu/arm-smmu.h
@@ -382,6 +382,7 @@ struct arm_smmu_impl {
void (*write_reg64)(struct arm_smmu_device *smmu, int page, int offset,
u64 val);
int (*cfg_probe)(struct arm_smmu_device *smmu);
+   u32 (*cfg_id2_fixup)(u32 id);
int (*reset)(struct arm_smmu_device *smmu);
int (*init_context)(struct arm_smmu_domain *smmu_domain);
void (*tlb_sync)(struct arm_smmu_device *smmu, int page, int sync,
-- 
2.17.1



[PATCH v3 4/4] arm64: dts: marvell: add SMMU support

2020-07-02 Thread Tomasz Nowicki
From: Marcin Wojtas 

Add IOMMU node for Marvell AP806 based SoCs together with platform
and PCI device Stream ID mapping.

Signed-off-by: Marcin Wojtas 
Signed-off-by: Tomasz Nowicki 
---
 arch/arm64/boot/dts/marvell/armada-8040.dtsi  | 36 +++
 arch/arm64/boot/dts/marvell/armada-ap80x.dtsi | 17 +
 2 files changed, 53 insertions(+)

diff --git a/arch/arm64/boot/dts/marvell/armada-8040.dtsi 
b/arch/arm64/boot/dts/marvell/armada-8040.dtsi
index 7699b19224c2..25c1df709f72 100644
--- a/arch/arm64/boot/dts/marvell/armada-8040.dtsi
+++ b/arch/arm64/boot/dts/marvell/armada-8040.dtsi
@@ -23,3 +23,39 @@
 &cp0_rtc {
status = "disabled";
 };
+
+&cp0_usb3_0 {
+	iommus = <&smmu 0x440>;
+};
+
+&cp0_usb3_1 {
+	iommus = <&smmu 0x441>;
+};
+
+&cp0_sata0 {
+	iommus = <&smmu 0x444>;
+};
+
+&cp0_sdhci0 {
+	iommus = <&smmu 0x445>;
+};
+
+&cp1_sata0 {
+	iommus = <&smmu 0x454>;
+};
+
+&cp1_usb3_0 {
+	iommus = <&smmu 0x450>;
+};
+
+&cp1_usb3_1 {
+	iommus = <&smmu 0x451>;
+};
+
+&cp0_pcie0 {
+	iommu-map =
+		<0x0   0x480 0x20>,
+		<0x100 0x4a0 0x20>,
+		<0x200 0x4c0 0x20>;
+	iommu-map-mask = <0x031f>;
+};
diff --git a/arch/arm64/boot/dts/marvell/armada-ap80x.dtsi 
b/arch/arm64/boot/dts/marvell/armada-ap80x.dtsi
index 7f9b9a647717..ded8b8082d79 100644
--- a/arch/arm64/boot/dts/marvell/armada-ap80x.dtsi
+++ b/arch/arm64/boot/dts/marvell/armada-ap80x.dtsi
@@ -56,6 +56,23 @@
compatible = "simple-bus";
ranges = <0x0 0x0 0xf000 0x100>;
 
+   smmu: iommu@500 {
+   compatible = "marvell,ap806-smmu-500", 
"arm,mmu-500";
+   reg = <0x10 0x10>;
+   dma-coherent;
+   #iommu-cells = <1>;
+   #global-interrupts = <1>;
+   interrupts = ,
+,
+,
+,
+,
+,
+,
+,
+;
+   };
+
gic: interrupt-controller@21 {
compatible = "arm,gic-400";
#interrupt-cells = <3>;
-- 
2.17.1



[PATCH v3 0/4] Add system mmu support for Armada-806

2020-07-02 Thread Tomasz Nowicki
There were already two versions of series to support SMMU for AP806,
including workaround for accessing ARM SMMU 64bit registers.
First [1] by Hanna Hawa and second [2] by Gregory CLEMENT.
Since it got stuck this is yet another try. I incorporated the V2 comments,
mainly by moving workaround code to arm-smmu-impl.c as requested.

For the record, AP-806 can't access SMMU registers with 64-bit width;
these patches split the readq/writeq into two 32-bit accesses instead
and update the DT bindings.

The series was successfully tested on a vanilla v5.8-rc3 kernel and
Intel e1000e PCIe NIC. The same for platform devices like SATA and USB.

[1]: https://lkml.org/lkml/2018/10/15/373
[2]: https://lkml.org/lkml/2019/7/11/426

Hanna Hawa (1):
  iommu/arm-smmu: Workaround for Marvell Armada-AP806 SoC erratum
#582743

Marcin Wojtas (1):
  arm64: dts: marvell: add SMMU support

Tomasz Nowicki (2):
  iommu/arm-smmu: Add SMMU ID2 register fixup hook
  dt-bindings: arm-smmu: add compatible string for Marvell Armada-AP806
SMMU-500

 Documentation/arm64/silicon-errata.rst|  3 ++
 .../devicetree/bindings/iommu/arm,smmu.yaml   |  5 ++
 arch/arm64/boot/dts/marvell/armada-8040.dtsi  | 36 +
 arch/arm64/boot/dts/marvell/armada-ap80x.dtsi | 17 ++
 drivers/iommu/arm-smmu-impl.c | 52 +++
 drivers/iommu/arm-smmu.c  |  3 ++
 drivers/iommu/arm-smmu.h  |  1 +
 7 files changed, 117 insertions(+)

-- 
2.17.1



[PATCH v3 3/4] dt-bindings: arm-smmu: add compatible string for Marvell Armada-AP806 SMMU-500

2020-07-02 Thread Tomasz Nowicki
Add a specific compatible string for Marvell usage, due to the erratum
in accessing 64-bit registers of the ARM SMMU in AP806.

The AP806 SoC uses the generic ARM MMU-500 and there is no Marvell-specific
implementation; this compatible is used for the erratum only.

Signed-off-by: Hanna Hawa 
Signed-off-by: Gregory CLEMENT 
Signed-off-by: Tomasz Nowicki 
---
 Documentation/devicetree/bindings/iommu/arm,smmu.yaml | 5 +
 1 file changed, 5 insertions(+)

diff --git a/Documentation/devicetree/bindings/iommu/arm,smmu.yaml 
b/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
index d7ceb4c34423..7beca9c00b12 100644
--- a/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
+++ b/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
@@ -38,6 +38,11 @@ properties:
   - qcom,sc7180-smmu-500
   - qcom,sdm845-smmu-500
   - const: arm,mmu-500
+      - description: Marvell SoCs implementing "arm,mmu-500"
+        items:
+          - enum:
+              - marvell,ap806-smmu-500
+          - const: arm,mmu-500
   - items:
   - const: arm,mmu-500
   - const: arm,smmu-v2
-- 
2.17.1



[PATCH v3 2/4] iommu/arm-smmu: Workaround for Marvell Armada-AP806 SoC erratum #582743

2020-07-02 Thread Tomasz Nowicki
From: Hanna Hawa 

Due to erratum #582743, the Marvell Armada-AP806 can't perform 64-bit
accesses to the ARM SMMUv2 registers.

Provide the relevant implementation hooks:
- split writeq/readq into two writel/readl accesses.
- mask the SMMU_IDR2.PTFSv8 fields so the AArch64 format is not used
(only AARCH32_L), since 32-bit access is not supported with the AArch64
format.

Note that separate writes/reads to 2 is not problem regards to
atomicity, because the driver use the readq/writeq while initialize
the SMMU, report for SMMU fault, and use spinlock in one
case (iova_to_phys).

Signed-off-by: Hanna Hawa 
Signed-off-by: Gregory CLEMENT 
Signed-off-by: Tomasz Nowicki 
---
 Documentation/arm64/silicon-errata.rst |  3 ++
 drivers/iommu/arm-smmu-impl.c  | 52 ++
 2 files changed, 55 insertions(+)

diff --git a/Documentation/arm64/silicon-errata.rst 
b/Documentation/arm64/silicon-errata.rst
index 936cf2a59ca4..157214d3abe1 100644
--- a/Documentation/arm64/silicon-errata.rst
+++ b/Documentation/arm64/silicon-errata.rst
@@ -125,6 +125,9 @@ stable kernels.
 | Cavium | ThunderX2 Core  | #219| CAVIUM_TX2_ERRATUM_219  
|
 
++-+-+-+
 
++-+-+-+
+| Marvell| ARM-MMU-500 | #582743 | N/A 
|
+++-+-+-+
+++-+-+-+
 | Freescale/NXP  | LS2080A/LS1043A | A-008585| FSL_ERRATUM_A008585 
|
 
++-+-+-+
 
++-+-+-+
diff --git a/drivers/iommu/arm-smmu-impl.c b/drivers/iommu/arm-smmu-impl.c
index c75b9d957b70..c1fc5e1b8193 100644
--- a/drivers/iommu/arm-smmu-impl.c
+++ b/drivers/iommu/arm-smmu-impl.c
@@ -147,6 +147,53 @@ static const struct arm_smmu_impl arm_mmu500_impl = {
.reset = arm_mmu500_reset,
 };
 
+static u64 mrvl_mmu500_readq(struct arm_smmu_device *smmu, int page, int off)
+{
+   u64 val;
+
+   /*
+* Marvell Armada-AP806 erratum #582743.
+* Split all the readq to double readl
+*/
+   val = (u64)readl_relaxed(arm_smmu_page(smmu, page) + off + 4) << 32;
+   val |= readl_relaxed(arm_smmu_page(smmu, page) + off);
+
+   return val;
+}
+
+static void mrvl_mmu500_writeq(struct arm_smmu_device *smmu, int page, int off,
+  u64 val)
+{
+   /*
+* Marvell Armada-AP806 erratum #582743.
+* Split all the writeq to double writel
+*/
+   writel_relaxed(upper_32_bits(val), arm_smmu_page(smmu, page) + off + 4);
+   writel_relaxed(lower_32_bits(val), arm_smmu_page(smmu, page) + off);
+}
+
+static u32 mrvl_mmu500_cfg_id2_fixup(u32 id)
+{
+
+   /*
+* Armada-AP806 erratum #582743.
+* Hide the SMMU_IDR2.PTFSv8 fields to sidestep the AArch64
+* formats altogether and allow using 32 bits access on the
+* interconnect.
+*/
+   id &= ~(ARM_SMMU_ID2_PTFS_4K | ARM_SMMU_ID2_PTFS_16K |
+   ARM_SMMU_ID2_PTFS_64K);
+
+   return id;
+}
+
+static const struct arm_smmu_impl mrvl_mmu500_impl = {
+   .read_reg64 = mrvl_mmu500_readq,
+   .write_reg64 = mrvl_mmu500_writeq,
+   .cfg_id2_fixup = mrvl_mmu500_cfg_id2_fixup,
+   .reset = arm_mmu500_reset,
+};
+
 
 struct arm_smmu_device *arm_smmu_impl_init(struct arm_smmu_device *smmu)
 {
@@ -160,6 +207,11 @@ struct arm_smmu_device *arm_smmu_impl_init(struct 
arm_smmu_device *smmu)
 */
switch (smmu->model) {
case ARM_MMU500:
+   if (of_device_is_compatible(smmu->dev->of_node,
+   "marvell,ap806-smmu-500")) {
+   smmu->impl = &mrvl_mmu500_impl;
+   return smmu;
+   }
	smmu->impl = &arm_mmu500_impl;
break;
case CAVIUM_SMMUV2:
-- 
2.17.1



Re: [PATCH v7 0/7] Add virtio-iommu driver

2019-02-25 Thread Tomasz Nowicki
Hi Jean,

On 15.01.2019 13:19, Jean-Philippe Brucker wrote:
> Implement the virtio-iommu driver, following specification v0.9 [1].
> 
> This is a simple rebase onto Linux v5.0-rc2. We now use the
> dev_iommu_fwspec_get() helper introduced in v5.0 instead of accessing
> dev->iommu_fwspec, but there aren't any functional change from v6 [2].
> 
> Our current goal for virtio-iommu is to get a paravirtual IOMMU working
> on Arm, and enable device assignment to guest userspace. In this
> use-case the mappings are static, and don't require optimal performance,
> so this series tries to keep things simple. However there is plenty more
> to do for features and optimizations, and having this base in v5.1 would
> be good. Given that most of the changes are to drivers/iommu, I believe
> the driver and future changes should go via the IOMMU tree.
> 
> You can find Linux driver and kvmtool device on v0.9.2 branches [3],
> module and x86 support on virtio-iommu/devel. Also tested with Eric's
> QEMU device [4]. Please note that the series depends on Robin's
> probe-deferral fix [5], which will hopefully land in v5.0.
> 
> [1] Virtio-iommu specification v0.9, sources and pdf
>  git://linux-arm.org/virtio-iommu.git virtio-iommu/v0.9
>  http://jpbrucker.net/virtio-iommu/spec/v0.9/virtio-iommu-v0.9.pdf
> 
> [2] [PATCH v6 0/7] Add virtio-iommu driver
>  
> https://lists.linuxfoundation.org/pipermail/iommu/2018-December/032127.html
> 
> [3] git://linux-arm.org/linux-jpb.git virtio-iommu/v0.9.2
>  git://linux-arm.org/kvmtool-jpb.git virtio-iommu/v0.9.2
> 
> [4] [RFC v9 00/17] VIRTIO-IOMMU device
>  https://www.mail-archive.com/qemu-devel@nongnu.org/msg575578.html
> 
> [5] [PATCH] iommu/of: Fix probe-deferral
>  https://www.spinics.net/lists/arm-kernel/msg698371.html
> 
> Jean-Philippe Brucker (7):
>dt-bindings: virtio-mmio: Add IOMMU description
>dt-bindings: virtio: Add virtio-pci-iommu node
>of: Allow the iommu-map property to omit untranslated devices
>PCI: OF: Initialize dev->fwnode appropriately
>iommu: Add virtio-iommu driver
>iommu/virtio: Add probe request
>iommu/virtio: Add event queue
> 
>   .../devicetree/bindings/virtio/iommu.txt  |   66 +
>   .../devicetree/bindings/virtio/mmio.txt   |   30 +
>   MAINTAINERS   |7 +
>   drivers/iommu/Kconfig |   11 +
>   drivers/iommu/Makefile|1 +
>   drivers/iommu/virtio-iommu.c  | 1158 +
>   drivers/of/base.c |   10 +-
>   drivers/pci/of.c  |7 +
>   include/uapi/linux/virtio_ids.h   |1 +
>   include/uapi/linux/virtio_iommu.h |  161 +++
>   10 files changed, 1449 insertions(+), 3 deletions(-)
>   create mode 100644 Documentation/devicetree/bindings/virtio/iommu.txt
>   create mode 100644 drivers/iommu/virtio-iommu.c
>   create mode 100644 include/uapi/linux/virtio_iommu.h
> 

I have tested the whole series and Eric's QEMU patchset [4] for 
virtio-net-pci and virtio-blk-pci devices and faced no issues on 
ThunderX2. The multiqueue mode for both devices is working fine too so:

Tested-by: Tomasz Nowicki 

Thanks,
Tomasz


Re: [PATCH 1/4] iommu: Add virtio-iommu driver

2018-02-19 Thread Tomasz Nowicki

Hi Jean,

On 14.02.2018 15:53, Jean-Philippe Brucker wrote:

The virtio IOMMU is a para-virtualized device, allowing to send IOMMU
requests such as map/unmap over virtio-mmio transport without emulating
page tables. This implementation handles ATTACH, DETACH, MAP and UNMAP
requests.

The bulk of the code transforms calls coming from the IOMMU API into
corresponding virtio requests. Mappings are kept in an interval tree
instead of page tables.

Signed-off-by: Jean-Philippe Brucker 
---
  MAINTAINERS   |   6 +
  drivers/iommu/Kconfig |  11 +
  drivers/iommu/Makefile|   1 +
  drivers/iommu/virtio-iommu.c  | 960 ++
  include/uapi/linux/virtio_ids.h   |   1 +
  include/uapi/linux/virtio_iommu.h | 116 +
  6 files changed, 1095 insertions(+)
  create mode 100644 drivers/iommu/virtio-iommu.c
  create mode 100644 include/uapi/linux/virtio_iommu.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 3bdc260e36b7..2a181924d420 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -14818,6 +14818,12 @@ S: Maintained
  F:drivers/virtio/virtio_input.c
  F:include/uapi/linux/virtio_input.h
  
+VIRTIO IOMMU DRIVER

+M: Jean-Philippe Brucker 
+S: Maintained
+F: drivers/iommu/virtio-iommu.c
+F: include/uapi/linux/virtio_iommu.h
+
  VIRTUAL BOX GUEST DEVICE DRIVER
  M:Hans de Goede 
  M:Arnd Bergmann 
diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index f3a21343e636..1ea0ec74524f 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -381,4 +381,15 @@ config QCOM_IOMMU
help
  Support for IOMMU on certain Qualcomm SoCs.
  
+config VIRTIO_IOMMU

+   bool "Virtio IOMMU driver"
+   depends on VIRTIO_MMIO
+   select IOMMU_API
+   select INTERVAL_TREE
+   select ARM_DMA_USE_IOMMU if ARM
+   help
+ Para-virtualised IOMMU driver with virtio.
+
+ Say Y here if you intend to run this kernel as a guest.
+
  endif # IOMMU_SUPPORT
diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
index 1fb695854809..9c68be1365e1 100644
--- a/drivers/iommu/Makefile
+++ b/drivers/iommu/Makefile
@@ -29,3 +29,4 @@ obj-$(CONFIG_EXYNOS_IOMMU) += exynos-iommu.o
  obj-$(CONFIG_FSL_PAMU) += fsl_pamu.o fsl_pamu_domain.o
  obj-$(CONFIG_S390_IOMMU) += s390-iommu.o
  obj-$(CONFIG_QCOM_IOMMU) += qcom_iommu.o
+obj-$(CONFIG_VIRTIO_IOMMU) += virtio-iommu.o
diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
new file mode 100644
index ..a9c9245e8ba2
--- /dev/null
+++ b/drivers/iommu/virtio-iommu.c
@@ -0,0 +1,960 @@
+/*
+ * Virtio driver for the paravirtualized IOMMU
+ *
+ * Copyright (C) 2018 ARM Limited
+ * Author: Jean-Philippe Brucker 
+ *
+ * SPDX-License-Identifier: GPL-2.0
+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+
+#define MSI_IOVA_BASE  0x800
+#define MSI_IOVA_LENGTH0x10
+
+struct viommu_dev {
+   struct iommu_device iommu;
+   struct device   *dev;
+   struct virtio_device*vdev;
+
+   struct ida  domain_ids;
+
+   struct virtqueue*vq;
+   /* Serialize anything touching the request queue */
+   spinlock_t  request_lock;
+
+   /* Device configuration */
+   struct iommu_domain_geometrygeometry;
+   u64 pgsize_bitmap;
+   u8  domain_bits;
+};
+
+struct viommu_mapping {
+   phys_addr_t paddr;
+   struct interval_tree_node   iova;
+   union {
+   struct virtio_iommu_req_map map;
+   struct virtio_iommu_req_unmap unmap;
+   } req;
+};
+
+struct viommu_domain {
+   struct iommu_domain domain;
+   struct viommu_dev   *viommu;
+   struct mutexmutex;
+   unsigned intid;
+
+   spinlock_t  mappings_lock;
+   struct rb_root_cached   mappings;
+
+   /* Number of endpoints attached to this domain */
+   unsigned long   endpoints;
+};
+
+struct viommu_endpoint {
+   struct viommu_dev   *viommu;
+   struct viommu_domain*vdomain;
+};
+
+struct viommu_request {
+   struct scatterlist  top;
+   struct scatterlist  bottom;
+
+   int written;
+   struct list_headlist;
+};
+
+#define to_viommu_domain(domain)   \
+   container_of(domain, struct viommu_domain, domain)
+
+/* 

Re: [PATCH V1 0/1] Fix kernel panic caused by device ID duplication presented to the IOMMU

2017-12-22 Thread Tomasz Nowicki

Hi Robin,

On 19.12.2017 17:34, Robin Murphy wrote:

Hi Tomasz,

On 19/12/17 15:13, Tomasz Nowicki wrote:
Here is my lspci output of ThunderX2 for which I am observing a kernel
panic coming from the SMMUv3 driver -> arm_smmu_write_strtab_ent() ->
BUG_ON(ste_live):

# lspci -vt
-[:00]-+-00.0-[01-1f]--+ [...]
    + [...]
    \-00.0-[1e-1f]00.0-[1f]00.0  ASPEED Technology, Inc. ASPEED Graphics Family

ASP device -> 1f:00.0 VGA compatible controller: ASPEED Technology, Inc. ASPEED Graphics Family
PCI-Express to PCI/PCI-X Bridge -> 1e:00.0 PCI bridge: ASPEED Technology, Inc. AST1150 PCI-to-PCI Bridge

While setting up the ASP device SID in the IORT driver:
iort_iommu_configure() -> pci_for_each_dma_alias()
we need to walk up and iterate over each device which aliases transactions
from downstream devices.

AST device (1f:00.0) gets BDF=0x1f00 and the corresponding SID=0x1f00 from
IORT. Bridge (1e:00.0) is the first alias. Following the PCI Express to
PCI/PCI-X Bridge spec, PCIe-to-PCI/X bridges alias transactions from
downstream devices using the subordinate bus number. For bridge (1e:00.0),
the subordinate is equal to 0x1f. This gives BDF=0x1f00 and SID=0x1f00,
which is the same as the downstream device's. So it is possible to have
two identical SIDs. The question is what we should do about such a case.
The presented patch prevents registering the same ID twice so that SMMUv3
does not complain later on.
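
(As a quick illustration of the collision, assuming the usual Requester ID
encoding of (bus << 8) | devfn, i.e. what the kernel's PCI_DEVID() produces,
and the subordinate-bus aliasing rule described above:)

#include <stdint.h>
#include <stdio.h>

/* Same layout as the kernel's PCI_DEVID(bus, devfn). */
static uint16_t pci_devid(uint8_t bus, uint8_t devfn)
{
	return ((uint16_t)bus << 8) | devfn;
}

int main(void)
{
	/* The ASPEED VGA endpoint itself: 1f:00.0 */
	uint16_t ep = pci_devid(0x1f, 0x00);
	/*
	 * The AST1150 bridge (1e:00.0) aliases its children as
	 * (subordinate bus, devfn 0), i.e. also 1f:00.0.
	 */
	uint16_t alias = pci_devid(0x1f, 0x00);

	/* Both print 0x1f00, so the same StreamID gets reported twice. */
	printf("endpoint 0x%04x, bridge alias 0x%04x\n", ep, alias);
	return 0;
}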


Ooh, subtle :( There is logic in arm_smmu_attach_device() to tolerate
grouped devices aliasing to the same ID, but I guess I overlooked the
distinction of a device sharing an alias ID with itself. I'm not sure
I really like trying to work around this in generic code, since
fwspec->ids is essentially opaque data in a driver-specific format - in
theory a driver is free to encode a single logical ID into multiple
fwspec elements (I think I did that in an early draft of SMMUv2 SMR
support), at which point this approach might corrupt things massively.

Does the (untested) diff below suffice?

Robin.

->8-
diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c

index f122071688fd..d8a730d83401 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -1731,7 +1731,7 @@ static __le64 *arm_smmu_get_step_for_sid(struct 
arm_smmu_device *smmu, u32 sid)


  static void arm_smmu_install_ste_for_dev(struct iommu_fwspec *fwspec)
  {
-    int i;
+    int i, j;
  struct arm_smmu_master_data *master = fwspec->iommu_priv;
  struct arm_smmu_device *smmu = master->smmu;

@@ -1739,6 +1739,13 @@ static void arm_smmu_install_ste_for_dev(struct 
iommu_fwspec *fwspec)

  u32 sid = fwspec->ids[i];
  __le64 *step = arm_smmu_get_step_for_sid(smmu, sid);

+    /* Bridged PCI devices may end up with duplicated IDs */
+    for (j = 0; j < i; j++)
+    if (fwspec->ids[j] == sid)
+    break;
+    if (j < i)
+    continue;
+
  arm_smmu_write_strtab_ent(smmu, sid, step, &master->ste);
  }
  }


The patch works for me:

Tested-by: Tomasz Nowicki <tomasz.nowi...@cavium.com>

The bug causes a regression on our platform and it would be nice to fix it
for 4.15. Since there have been no more comments so far, IMO we can put the fix
in the SMMUv3 driver. Do you prefer to send the patch yourself, or would you
like me to do it?


Thanks,
Tomasz

Re: [PATCH V1 1/1] iommu: Make sure device's ID array elements are unique

2017-12-20 Thread Tomasz Nowicki

On 19.12.2017 17:37, Alex Williamson wrote:

On Tue, 19 Dec 2017 16:20:21 +0100
Tomasz Nowicki <tomasz.nowi...@caviumnetworks.com> wrote:


While iterating over DMA aliases for a PCI device, in some rare cases
(i.e. PCIe-to-PCI/X bridges) we may get exactly the same ID as the initial
child device. In turn, the same ID may get registered for a device multiple
times. Eventually the IOMMU driver may try to configure the same ID within a
domain multiple times too, which for some IOMMU drivers is illegal and causes
a kernel panic.

Rule out ID duplication prior to device ID array registration.

CC: sta...@vger.kernel.org  # v4.14+


You've identified a release, is there a specific commit this fixes?


Yes, it was triggered by converting drm_pci_init() to 
pci_register_driver() in ast_drv.c


Fixes: 10631d724def ("drm/pci: Deprecate drm_pci_init/exit completely")




Signed-off-by: Tomasz Nowicki <tomasz.nowi...@caviumnetworks.com>
---
  drivers/iommu/iommu.c | 28 
  1 file changed, 28 insertions(+)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 3de5c0b..9b2c138 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1945,6 +1945,31 @@ void iommu_fwspec_free(struct device *dev)
  }
  EXPORT_SYMBOL_GPL(iommu_fwspec_free);
  
+static void iommu_fwspec_remove_ids_dup(struct device *dev, u32 *ids,

+   int *num_ids)
+{
+   struct iommu_fwspec *fwspec = dev->iommu_fwspec;
+   int i, j, k, valid_ids = *num_ids;
+
+   for (i = 0; i < valid_ids; i++) {
+   for (j = 0; j < fwspec->num_ids; j++) {
+   if (ids[i] != fwspec->ids[j])
+   continue;
+
+   dev_info(dev, "found 0x%x ID duplication, skipped\n",
+ids[i]);
+
+   for (k = i + 1; k < valid_ids; k++)
+   ids[k - 1] = ids[k];


Use memmove()?


Right.




+
+   valid_ids--;
+   break;


At this point ids[i] is not the ids[i] that we tested for dupes, it's
what was ids[i + 1], but we're going to i++ on the next iteration and
we therefore never test that entry.


Good point.
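
For reference, here is a sketch of how the helper might look with both comments
addressed (untested; it closes the gap with memmove() and steps i back so the
element moved into slot i is re-tested on the next pass):

	static void iommu_fwspec_remove_ids_dup(struct device *dev, u32 *ids,
						int *num_ids)
	{
		struct iommu_fwspec *fwspec = dev->iommu_fwspec;
		int i, j, valid_ids = *num_ids;

		for (i = 0; i < valid_ids; i++) {
			for (j = 0; j < fwspec->num_ids; j++) {
				if (ids[i] != fwspec->ids[j])
					continue;

				dev_info(dev, "found 0x%x ID duplication, skipped\n",
					 ids[i]);
				/* close the gap and re-test the moved element */
				memmove(&ids[i], &ids[i + 1],
					(valid_ids - i - 1) * sizeof(*ids));
				valid_ids--;
				i--;
				break;
			}
		}

		*num_ids = valid_ids;
	}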

Now the fundamental question is where we should put the patch: here, or
in the SMMUv3 driver as per Robin's suggestion.


Thanks,
Tomasz


Re: [PATCH V1 0/1] Fix kernel panic caused by device ID duplication presented to the IOMMU

2017-12-20 Thread Tomasz Nowicki

Hi Robin,

On 19.12.2017 17:34, Robin Murphy wrote:

Hi Tomasz,

On 19/12/17 15:13, Tomasz Nowicki wrote:
Here is my lspci output of ThunderX2 for which I am observing kernel 
panic coming from

SMMUv3 driver -> arm_smmu_write_strtab_ent() -> BUG_ON(ste_live):

# lspci -vt
-[:00]-+-00.0-[01-1f]--+ [...]
    + [...]
    \-00.0-[1e-1f]00.0-[1f]00.0  
ASPEED Technology, Inc. ASPEED Graphics Family


ASP device -> 1f:00.0 VGA compatible controller: ASPEED Technology, 
Inc. ASPEED Graphics Family
PCI-Express to PCI/PCI-X Bridge -> 1e:00.0 PCI bridge: ASPEED 
Technology, Inc. AST1150 PCI-to-PCI Bridge

While setting up the ASP device SID in the IORT driver:
iort_iommu_configure() -> pci_for_each_dma_alias()
we need to walk up and iterate over each device which aliases
transactions from downstream devices.

The AST device (1f:00.0) gets BDF=0x1f00 and the corresponding SID=0x1f00 from
IORT. The bridge (1e:00.0) is the first alias. Following the PCI Express to
PCI/PCI-X Bridge spec, PCIe-to-PCI/X bridges alias transactions from downstream
devices using the subordinate bus number. For bridge (1e:00.0), the subordinate
is equal to 0x1f. This gives BDF=0x1f00 and SID=0x1f00, which is the same as the
downstream device. So it is possible to have two identical SIDs. The question is
what we should do about such a case. The presented patch prevents registering
the same ID so that SMMUv3 does not complain later on.


Ooh, subtle :( There is logic in arm_smmu_attach_device() to tolerate
grouped devices aliasing to the same ID, but I guess I overlooked the
distinction of a device sharing an alias ID with itself. I'm not sure
I really like trying to work around this in generic code, since
fwspec->ids is essentially opaque data in a driver-specific format - in
theory a driver is free to encode a single logical ID into multiple
fwspec elements (I think I did that in an early draft of SMMUv2 SMR
support), at which point this approach might corrupt things massively.


I don't have a strong favourite here; the fix in the SMMUv3 driver would work
too. Initially we can fix things for SMMUv3 only and, if we ever face the
same issue with another IOMMU driver, move it to the generic layer.




Does the (untested) diff below suffice?

Robin.

----->8-----
diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c

index f122071688fd..d8a730d83401 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -1731,7 +1731,7 @@ static __le64 *arm_smmu_get_step_for_sid(struct 
arm_smmu_device *smmu, u32 sid)


  static void arm_smmu_install_ste_for_dev(struct iommu_fwspec *fwspec)
  {
-    int i;
+    int i, j;
  struct arm_smmu_master_data *master = fwspec->iommu_priv;
  struct arm_smmu_device *smmu = master->smmu;

@@ -1739,6 +1739,13 @@ static void arm_smmu_install_ste_for_dev(struct 
iommu_fwspec *fwspec)

  u32 sid = fwspec->ids[i];
  __le64 *step = arm_smmu_get_step_for_sid(smmu, sid);

+    /* Bridged PCI devices may end up with duplicated IDs */
+    for (j = 0; j < i; j++)
+    if (fwspec->ids[j] == sid)
+    break;
+    if (j < i)
+    continue;
+
  arm_smmu_write_strtab_ent(smmu, sid, step, &master->ste);
  }
  }


Yes, worked for me.

Thanks,
Tomasz

[PATCH V1 1/1] iommu: Make sure device's ID array elements are unique

2017-12-19 Thread Tomasz Nowicki
While iterating over DMA aliases for a PCI device, in some rare cases
(i.e. PCIe-to-PCI/X bridges) we may get exactly the same ID as the initial
child device. In turn, the same ID may get registered for a device multiple
times. Eventually the IOMMU driver may try to configure the same ID within a
domain multiple times too, which for some IOMMU drivers is illegal and causes
a kernel panic.

Rule out ID duplication prior to device ID array registration.

CC: sta...@vger.kernel.org  # v4.14+
Signed-off-by: Tomasz Nowicki <tomasz.nowi...@caviumnetworks.com>
---
 drivers/iommu/iommu.c | 28 
 1 file changed, 28 insertions(+)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 3de5c0b..9b2c138 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1945,6 +1945,31 @@ void iommu_fwspec_free(struct device *dev)
 }
 EXPORT_SYMBOL_GPL(iommu_fwspec_free);
 
+static void iommu_fwspec_remove_ids_dup(struct device *dev, u32 *ids,
+   int *num_ids)
+{
+   struct iommu_fwspec *fwspec = dev->iommu_fwspec;
+   int i, j, k, valid_ids = *num_ids;
+
+   for (i = 0; i < valid_ids; i++) {
+   for (j = 0; j < fwspec->num_ids; j++) {
+   if (ids[i] != fwspec->ids[j])
+   continue;
+
+   dev_info(dev, "found 0x%x ID duplication, skipped\n",
+ids[i]);
+
+   for (k = i + 1; k < valid_ids; k++)
+   ids[k - 1] = ids[k];
+
+   valid_ids--;
+   break;
+   }
+   }
+
+   *num_ids = valid_ids;
+}
+
 int iommu_fwspec_add_ids(struct device *dev, u32 *ids, int num_ids)
 {
struct iommu_fwspec *fwspec = dev->iommu_fwspec;
@@ -1954,6 +1979,9 @@ int iommu_fwspec_add_ids(struct device *dev, u32 *ids, 
int num_ids)
if (!fwspec)
return -EINVAL;
 
+   /* Rule out IDs already registered */
+   iommu_fwspec_remove_ids_dup(dev, ids, &num_ids);
+
size = offsetof(struct iommu_fwspec, ids[fwspec->num_ids + num_ids]);
if (size > sizeof(*fwspec)) {
fwspec = krealloc(dev->iommu_fwspec, size, GFP_KERNEL);
-- 
2.7.4



[PATCH V1 0/1] Fix kernel panic caused by device ID duplication presented to the IOMMU

2017-12-19 Thread Tomasz Nowicki
Here is my lspci output of ThunderX2 for which I am observing kernel panic 
coming from
SMMUv3 driver -> arm_smmu_write_strtab_ent() -> BUG_ON(ste_live):

# lspci -vt
-[:00]-+-00.0-[01-1f]--+ [...]
   + [...]
   \-00.0-[1e-1f]00.0-[1f]00.0  ASPEED 
Technology, Inc. ASPEED Graphics Family

ASP device -> 1f:00.0 VGA compatible controller: ASPEED Technology, Inc. ASPEED 
Graphics Family
PCI-Express to PCI/PCI-X Bridge -> 1e:00.0 PCI bridge: ASPEED Technology, Inc. 
AST1150 PCI-to-PCI Bridge
   
While setting up the ASP device SID in the IORT driver:
iort_iommu_configure() -> pci_for_each_dma_alias()
we need to walk up and iterate over each device which aliases transactions from
downstream devices.

The AST device (1f:00.0) gets BDF=0x1f00 and the corresponding SID=0x1f00 from
IORT. The bridge (1e:00.0) is the first alias. Following the PCI Express to
PCI/PCI-X Bridge spec, PCIe-to-PCI/X bridges alias transactions from downstream
devices using the subordinate bus number. For bridge (1e:00.0), the subordinate
is equal to 0x1f. This gives BDF=0x1f00 and SID=0x1f00, which is the same as the
downstream device. So it is possible to have two identical SIDs. The question is
what we should do about such a case. The presented patch prevents registering
the same ID so that SMMUv3 does not complain later on.
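
To make the collision concrete, a small illustration using the kernel's BDF
helpers (a standalone sketch; the bus/devfn numbers are the ones from the lspci
output above):

	/* SID construction used here for PCI: SID = (bus << 8) | devfn */
	u16 ast_sid   = PCI_DEVID(0x1f, PCI_DEVFN(0, 0));	/* endpoint 1f:00.0      -> 0x1f00 */
	u16 alias_sid = PCI_DEVID(0x1f, PCI_DEVFN(0, 0));	/* bridge 1e:00.0 aliasing
								   with subordinate bus 0x1f -> 0x1f00 */

Both expressions evaluate to 0x1f00, hence the duplicate entry presented to the
SMMU.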

Tomasz Nowicki (1):
  iommu: Make sure device's ID array elements are unique

 drivers/iommu/iommu.c | 28 
 1 file changed, 28 insertions(+)

-- 
2.7.4



Re: [PATCH V2 0/1] Optimise IOVA allocations for PCI devices

2017-10-12 Thread Tomasz Nowicki

Hi Joerg,

On 12.10.2017 12:04, Joerg Roedel wrote:

Hi Tomasz,

On Thu, Oct 12, 2017 at 11:40:27AM +0200, Tomasz Nowicki wrote:

Can you please have a look and see if you are fine with this patch?


How do these changes relate to Robin's IOVA optimizations already in the
iommu-tree?



This is another optimization patch which does not conflict with Robin's 
patches.


Robin, please correct me if I am wrong.

Thanks,
Tomasz


Re: [PATCH V2 0/1] Optimise IOVA allocations for PCI devices

2017-10-12 Thread Tomasz Nowicki

Hi Joerg,

Can you please have a look and see if you are fine with this patch?

Thanks in advance,
Tomasz

On 20.09.2017 10:52, Tomasz Nowicki wrote:

Here is my test setup where I have started performance measurements.

 ------------   PCIe   -------------    TX     -------------   PCIe    -----
| ThunderX2  |--------| Intel XL710 |--------->| Intel XL710 |--------| X86 |
| (128 cpus) |        |    40GbE    |          |    40GbE    |         -----
 ------------          -------------            -------------

As the reference, let's take a v4.13 host, SMMUv3 off and a 1-thread iperf
pinned with taskset to one CPU. The performance results I got:

SMMU off -> 100%
SMMU on  -> 0.02%

I followed the DMA mapping path down and found the IOVA 32-bit space
full, so the kernel was flushing the rcaches of all CPUs in (1).
For 128 CPUs, this kills the performance. Furthermore, in my case the rcaches
mostly contained PFNs > 32-bit, so the second round of IOVA allocation failed
as well. As a consequence, the IOVA had to be allocated outside of 32-bit (2)
from scratch, since all rcaches had been flushed in (1).

 if (dma_limit > DMA_BIT_MASK(32) && dev_is_pci(dev))
(1)-->  iova = alloc_iova_fast(iovad, iova_len, DMA_BIT_MASK(32) >> shift);

 if (!iova)
(2)-->  iova = alloc_iova_fast(iovad, iova_len, dma_limit >> shift);

My fix simply introduces a parameter for alloc_iova_fast() to decide whether
the rcache flush has to be done or not. All users follow the scenario mentioned
above, so they should leave the flush as the last resort, to avoid the costly
iteration over all CPUs.

This brings my iperf performance back to 100% with SMMU on.

My bad feeling regarding this solution is that machines with relatively
small numbers of CPUs may get DAC addresses more frequently for PCI
devices. Please let me know your thoughts.

Changelog:

v1 --> v2
- add missing documentation
- fix typo

Tomasz Nowicki (1):
   iommu/iova: Make rcache flush optional on IOVA allocation failure

  drivers/iommu/amd_iommu.c   |  5 +++--
  drivers/iommu/dma-iommu.c   |  6 --
  drivers/iommu/intel-iommu.c |  5 +++--
  drivers/iommu/iova.c| 11 ++-
  include/linux/iova.h|  5 +++--
  5 files changed, 19 insertions(+), 13 deletions(-)





Re: [PATCH v5 3/6] iommu/iova: Extend rbtree node caching

2017-09-22 Thread Tomasz Nowicki

Hi Robin,

On 21.09.2017 17:52, Robin Murphy wrote:

The cached node mechanism provides a significant performance benefit for
allocations using a 32-bit DMA mask, but in the case of non-PCI devices
or where the 32-bit space is full, the loss of this benefit can be
significant - on large systems there can be many thousands of entries in
the tree, such that walking all the way down to find free space every
time becomes increasingly awful.

Maintain a similar cached node for the whole IOVA space as a superset of
the 32-bit space so that performance can remain much more consistent.

Inspired by work by Zhen Lei .

Tested-by: Ard Biesheuvel 
Tested-by: Zhen Lei 
Tested-by: Nate Watterson 
Signed-off-by: Robin Murphy 
---

v5: Fixed __cached_rbnode_delete_update() logic to update both nodes
 when necessary

  drivers/iommu/iova.c | 60 
  include/linux/iova.h |  3 ++-
  2 files changed, 30 insertions(+), 33 deletions(-)

diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
index 20be9a8b3188..c6f5a22f8d20 100644
--- a/drivers/iommu/iova.c
+++ b/drivers/iommu/iova.c
@@ -48,6 +48,7 @@ init_iova_domain(struct iova_domain *iovad, unsigned long 
granule,
  
	spin_lock_init(&iovad->iova_rbtree_lock);

iovad->rbroot = RB_ROOT;
+   iovad->cached_node = NULL;
iovad->cached32_node = NULL;
iovad->granule = granule;
iovad->start_pfn = start_pfn;
@@ -110,48 +111,44 @@ EXPORT_SYMBOL_GPL(init_iova_flush_queue);
  static struct rb_node *
  __get_cached_rbnode(struct iova_domain *iovad, unsigned long *limit_pfn)
  {
-   if ((*limit_pfn > iovad->dma_32bit_pfn) ||
-   (iovad->cached32_node == NULL))
+   struct rb_node *cached_node = NULL;
+   struct iova *curr_iova;
+
+   if (*limit_pfn <= iovad->dma_32bit_pfn)
+   cached_node = iovad->cached32_node;
+   if (!cached_node)
+   cached_node = iovad->cached_node;
+   if (!cached_node)
	return rb_last(&iovad->rbroot);
-   else {
-   struct rb_node *prev_node = rb_prev(iovad->cached32_node);
-   struct iova *curr_iova =
-   rb_entry(iovad->cached32_node, struct iova, node);
-   *limit_pfn = curr_iova->pfn_lo;
-   return prev_node;
-   }
+
+   curr_iova = rb_entry(cached_node, struct iova, node);
+   *limit_pfn = min(*limit_pfn, curr_iova->pfn_lo);


I guess this is the fix for the stale pointer in iovad->cached32_node from
the previous series, but I think there is more to it.

With this min() here we have real control over the limit_pfn we would
like to allocate at most. In other words, without your series two
subsequent calls can give us:

iova () = alloc_iova_fast(iovad, 1, DMA_BIT_MASK(32) >> shift);

iova (fffe)= alloc_iova_fast(iovad, 1, DMA_BIT_MASK(16) >> shift);

We do not see this since nobody uses a limit_pfn below DMA_BIT_MASK(32)
now. That might be intentional, so I am asking for your opinion.

Also, with your patch, two identical alloc_iova_fast() calls in an
iteration may fill the 32-bit space much faster. Please correct me if I
messed things up.


Thanks,
Tomasz



+
+   return rb_prev(cached_node);
  }
  
  static void

-__cached_rbnode_insert_update(struct iova_domain *iovad,
-   unsigned long limit_pfn, struct iova *new)
+__cached_rbnode_insert_update(struct iova_domain *iovad, struct iova *new)
  {
-   if (limit_pfn != iovad->dma_32bit_pfn)
-   return;
-   iovad->cached32_node = &new->node;
+   if (new->pfn_hi < iovad->dma_32bit_pfn)
+   iovad->cached32_node = &new->node;
+   else
+   iovad->cached_node = &new->node;
  }
  
  static void

  __cached_rbnode_delete_update(struct iova_domain *iovad, struct iova *free)
  {
struct iova *cached_iova;
-   struct rb_node *curr;
  
-	if (!iovad->cached32_node)

-   return;
-   curr = iovad->cached32_node;
-   cached_iova = rb_entry(curr, struct iova, node);
+   cached_iova = rb_entry(iovad->cached32_node, struct iova, node);
+   if (free->pfn_hi < iovad->dma_32bit_pfn &&
+   iovad->cached32_node && free->pfn_lo >= cached_iova->pfn_lo)
+   iovad->cached32_node = rb_next(&free->node);
  
-	if (free->pfn_lo >= cached_iova->pfn_lo) {

-   struct rb_node *node = rb_next(&free->node);
-   struct iova *iova = rb_entry(node, struct iova, node);
-
-   /* only cache if it's below 32bit pfn */
-   if (node && iova->pfn_lo < iovad->dma_32bit_pfn)
-   iovad->cached32_node = node;
-   else
-   iovad->cached32_node = NULL;
-   }
+   cached_iova = rb_entry(iovad->cached_node, struct iova, node);
+   if (iovad->cached_node && free->pfn_lo >= cached_iova->pfn_lo)
+  

Re: [PATCH v4 5/6] iommu/iova: Extend rbtree node caching

2017-09-20 Thread Tomasz Nowicki

Hi Robin,

On 19.09.2017 18:31, Robin Murphy wrote:

The cached node mechanism provides a significant performance benefit for
allocations using a 32-bit DMA mask, but in the case of non-PCI devices
or where the 32-bit space is full, the loss of this benefit can be
significant - on large systems there can be many thousands of entries in
the tree, such that walking all the way down to find free space every
time becomes increasingly awful.

Maintain a similar cached node for the whole IOVA space as a superset of
the 32-bit space so that performance can remain much more consistent.

Inspired by work by Zhen Lei .

Tested-by: Ard Biesheuvel 
Tested-by: Zhen Lei 
Tested-by: Nate Watterson 
Signed-off-by: Robin Murphy 
---

v4:
  - Adjust to simplified __get_cached_rbnode() behaviour
  - Cosmetic tweaks

  drivers/iommu/iova.c | 43 +--
  include/linux/iova.h |  3 ++-
  2 files changed, 23 insertions(+), 23 deletions(-)

diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
index c93a6c46bcb1..a125a5786dbf 100644
--- a/drivers/iommu/iova.c
+++ b/drivers/iommu/iova.c
@@ -51,6 +51,7 @@ init_iova_domain(struct iova_domain *iovad, unsigned long 
granule,
  
	spin_lock_init(&iovad->iova_rbtree_lock);

iovad->rbroot = RB_ROOT;
+   iovad->cached_node = NULL;
iovad->cached32_node = NULL;
iovad->granule = granule;
iovad->start_pfn = start_pfn;
@@ -119,39 +120,38 @@ __get_cached_rbnode(struct iova_domain *iovad, unsigned 
long limit_pfn)
if (limit_pfn <= iovad->dma_32bit_pfn && iovad->cached32_node)
return iovad->cached32_node;
  
+	if (iovad->cached_node)

+   return iovad->cached_node;
+
+   return &iovad->anchor.node;
  }
  
  static void

-__cached_rbnode_insert_update(struct iova_domain *iovad,
-   unsigned long limit_pfn, struct iova *new)
+__cached_rbnode_insert_update(struct iova_domain *iovad, struct iova *new)
  {
-   if (limit_pfn != iovad->dma_32bit_pfn)
-   return;
-   iovad->cached32_node = &new->node;
+   if (new->pfn_hi < iovad->dma_32bit_pfn)
+   iovad->cached32_node = &new->node;
+   else
+   iovad->cached_node = &new->node;
  }
  
  static void

  __cached_rbnode_delete_update(struct iova_domain *iovad, struct iova *free)
  {
struct iova *cached_iova;
-   struct rb_node *curr;
+   struct rb_node **curr;
  
-	if (!iovad->cached32_node)

+   if (free->pfn_hi < iovad->dma_32bit_pfn)
+   curr = &iovad->cached32_node;
+   else
+   curr = &iovad->cached_node;
+
+   if (!*curr)
return;
-   curr = iovad->cached32_node;
-   cached_iova = rb_entry(curr, struct iova, node);
  
-	if (free->pfn_lo >= cached_iova->pfn_lo) {

-   struct rb_node *node = rb_next(&free->node);
-   struct iova *iova = rb_entry(node, struct iova, node);
-
-   /* only cache if it's below 32bit pfn */
-   if (node && iova->pfn_lo < iovad->dma_32bit_pfn)
-   iovad->cached32_node = node;
-   else
-   iovad->cached32_node = NULL;
-   }
+   cached_iova = rb_entry(*curr, struct iova, node);
+   if (free->pfn_lo >= cached_iova->pfn_lo)
+   *curr = rb_next(&free->node);


IMO, we may incorrectly update iovad->cached32_node here.

                           32-bit boundary
                                  |
  ---------    ---------         |    ---------    ----------
 |  IOVA0  |--|  IOVA1  |--------|---|  IOVA3  |--|  anchor  |
  ---------    ---------         |    ---------    ----------
                                  |

If we free the last IOVA from the 32-bit space (IOVA1), we will update
iovad->cached32_node to IOVA3, which is above the boundary.
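
One way to avoid that would be to keep the boundary check from the old code
when advancing the 32-bit cached pointer -- an untested sketch of the 32-bit
half of __cached_rbnode_delete_update() only:

	cached_iova = rb_entry(iovad->cached32_node, struct iova, node);
	if (free->pfn_hi < iovad->dma_32bit_pfn &&
	    iovad->cached32_node && free->pfn_lo >= cached_iova->pfn_lo) {
		struct rb_node *node = rb_next(&free->node);
		struct iova *iova = rb_entry(node, struct iova, node);

		/* only keep the cache if it still points below the boundary */
		if (node && iova->pfn_lo < iovad->dma_32bit_pfn)
			iovad->cached32_node = node;
		else
			iovad->cached32_node = NULL;
	}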


Thanks,
Tomasz


[PATCH V2 0/1] Optimise IOVA allocations for PCI devices

2017-09-20 Thread Tomasz Nowicki
Here is my test setup where I have started performance measurements.

 ------------   PCIe   -------------    TX     -------------   PCIe    -----
| ThunderX2  |--------| Intel XL710 |--------->| Intel XL710 |--------| X86 |
| (128 cpus) |        |    40GbE    |          |    40GbE    |         -----
 ------------          -------------            -------------

As the reference, let's take a v4.13 host, SMMUv3 off and a 1-thread iperf
pinned with taskset to one CPU. The performance results I got:

SMMU off -> 100%
SMMU on  -> 0.02%

I followed the DMA mapping path down and found the IOVA 32-bit space
full, so the kernel was flushing the rcaches of all CPUs in (1).
For 128 CPUs, this kills the performance. Furthermore, in my case the rcaches
mostly contained PFNs > 32-bit, so the second round of IOVA allocation failed
as well. As a consequence, the IOVA had to be allocated outside of 32-bit (2)
from scratch, since all rcaches had been flushed in (1).

if (dma_limit > DMA_BIT_MASK(32) && dev_is_pci(dev))
(1)-->  iova = alloc_iova_fast(iovad, iova_len, DMA_BIT_MASK(32) >> shift);

if (!iova)
(2)-->  iova = alloc_iova_fast(iovad, iova_len, dma_limit >> shift);

My fix simply introduces a parameter for alloc_iova_fast() to decide whether
the rcache flush has to be done or not. All users follow the scenario mentioned
above, so they should leave the flush as the last resort, to avoid the costly
iteration over all CPUs.
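
With that parameter in place, the call sites end up looking roughly like this
(a sketch of the pattern the 1/1 patch applies to all of its users):

	/* try the 32-bit space first, without defeating the rcaches */
	if (dma_limit > DMA_BIT_MASK(32) && dev_is_pci(dev))
		iova = alloc_iova_fast(iovad, iova_len,
				       DMA_BIT_MASK(32) >> shift, false);

	/* full range: flushing the rcaches is now the genuine last resort */
	if (!iova)
		iova = alloc_iova_fast(iovad, iova_len, dma_limit >> shift, true);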

This brings my iperf performance back to 100% with SMMU on.

My bad feeling regarding this solution is that machines with relatively
small numbers of CPUs may get DAC addresses more frequently for PCI
devices. Please let me know your thoughts.

Changelog:

v1 --> v2
- add missing documentation
- fix typo

Tomasz Nowicki (1):
  iommu/iova: Make rcache flush optional on IOVA allocation failure

 drivers/iommu/amd_iommu.c   |  5 +++--
 drivers/iommu/dma-iommu.c   |  6 --
 drivers/iommu/intel-iommu.c |  5 +++--
 drivers/iommu/iova.c| 11 ++-
 include/linux/iova.h|  5 +++--
 5 files changed, 19 insertions(+), 13 deletions(-)

-- 
2.7.4



[PATCH V2 1/1] iommu/iova: Make rcache flush optional on IOVA allocation failure

2017-09-20 Thread Tomasz Nowicki
Since IOVA allocation failure is not an unusual case, we need to flush the
CPUs' rcaches in the hope that we will succeed in the next round.

However, it is useful to decide whether we need the rcache flush step, for
two reasons:
- Poor scalability. On a large system with ~100 CPUs, iterating over and
  flushing the rcache of each CPU becomes a serious bottleneck, so we may want
  to defer it.
- free_cpu_cached_iovas() does not care about the max PFN we are interested in.
  Thus we may flush our rcaches and still get no new IOVA, as in the
  commonly used scenario:

if (dma_limit > DMA_BIT_MASK(32) && dev_is_pci(dev))
iova = alloc_iova_fast(iovad, iova_len, DMA_BIT_MASK(32) >> shift);

if (!iova)
iova = alloc_iova_fast(iovad, iova_len, dma_limit >> shift);

   1. First alloc_iova_fast() call is limited to DMA_BIT_MASK(32) to get
  PCI devices a SAC address
   2. alloc_iova() fails due to full 32-bit space
   3. rcaches contain PFNs out of 32-bit space so free_cpu_cached_iovas()
  throws entries away for nothing and alloc_iova() fails again
   4. Next alloc_iova_fast() call cannot take advantage of rcache since we
  have just defeated caches. In this case we pick the slowest option
  to proceed.

This patch reworks the flushed_rcache local flag into an additional function
argument that controls the rcache flush step. Also, it updates all users
to do the flush only as a last resort.

Signed-off-by: Tomasz Nowicki <tomasz.nowi...@caviumnetworks.com>
Reviewed-by: Robin Murphy <robin.mur...@arm.com>
Tested-by: Nate Watterson <nwatt...@codeaurora.org>
---
 drivers/iommu/amd_iommu.c   |  5 +++--
 drivers/iommu/dma-iommu.c   |  6 --
 drivers/iommu/intel-iommu.c |  5 +++--
 drivers/iommu/iova.c| 11 ++-
 include/linux/iova.h|  5 +++--
 5 files changed, 19 insertions(+), 13 deletions(-)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index 8d2ec60..ce68986 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -1604,10 +1604,11 @@ static unsigned long dma_ops_alloc_iova(struct device 
*dev,
 
if (dma_mask > DMA_BIT_MASK(32))
		pfn = alloc_iova_fast(&dma_dom->iovad, pages,
- IOVA_PFN(DMA_BIT_MASK(32)));
+ IOVA_PFN(DMA_BIT_MASK(32)), false);
 
if (!pfn)
-   pfn = alloc_iova_fast(&dma_dom->iovad, pages, 
IOVA_PFN(dma_mask));
+   pfn = alloc_iova_fast(&dma_dom->iovad, pages,
+ IOVA_PFN(dma_mask), true);
 
return (pfn << PAGE_SHIFT);
 }
diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 191be9c..25914d3 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -370,10 +370,12 @@ static dma_addr_t iommu_dma_alloc_iova(struct 
iommu_domain *domain,
 
/* Try to get PCI devices a SAC address */
if (dma_limit > DMA_BIT_MASK(32) && dev_is_pci(dev))
-   iova = alloc_iova_fast(iovad, iova_len, DMA_BIT_MASK(32) >> 
shift);
+   iova = alloc_iova_fast(iovad, iova_len,
+  DMA_BIT_MASK(32) >> shift, false);
 
if (!iova)
-   iova = alloc_iova_fast(iovad, iova_len, dma_limit >> shift);
+   iova = alloc_iova_fast(iovad, iova_len, dma_limit >> shift,
+  true);
 
return (dma_addr_t)iova << shift;
 }
diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 05c0c3a..75c8320 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -3460,11 +3460,12 @@ static unsigned long intel_alloc_iova(struct device 
*dev,
 * from higher range
 */
		iova_pfn = alloc_iova_fast(&domain->iovad, nrpages,
-  IOVA_PFN(DMA_BIT_MASK(32)));
+  IOVA_PFN(DMA_BIT_MASK(32)), false);
if (iova_pfn)
return iova_pfn;
}
-   iova_pfn = alloc_iova_fast(&domain->iovad, nrpages, IOVA_PFN(dma_mask));
+   iova_pfn = alloc_iova_fast(&domain->iovad, nrpages,
+  IOVA_PFN(dma_mask), true);
if (unlikely(!iova_pfn)) {
pr_err("Allocating %ld-page iova for %s failed",
   nrpages, dev_name(dev));
diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
index f88acad..e8c140c 100644
--- a/drivers/iommu/iova.c
+++ b/drivers/iommu/iova.c
@@ -353,14 +353,15 @@ EXPORT_SYMBOL_GPL(free_iova);
  * @iovad: - iova domain in question
  * @size: - size of page frames to allocate
  * @limit_pfn: - max limit address
+ * @flush_rcache: - set to flush rcache on regular allocation failure
  * This function tries to satisfy an iova allocation from the rcache,
- * and falls back to regular allocation on failure.
+ 

Re: [PATCH 1/1] iommu/iova: Make rcache flush optional on IOVA allocation failure

2017-09-19 Thread Tomasz Nowicki

Hi Nate,

On 19.09.2017 04:57, Nate Watterson wrote:

Hi Tomasz,

On 9/18/2017 12:02 PM, Robin Murphy wrote:

Hi Tomasz,

On 18/09/17 11:56, Tomasz Nowicki wrote:

Since IOVA allocation failure is not unusual case we need to flush
CPUs' rcache in hope we will succeed in next round.

However, it is useful to decide whether we need rcache flush step 
because

of two reasons:
- Not scalability. On large system with ~100 CPUs iterating and flushing
   rcache for each CPU becomes serious bottleneck so we may want to 
deffer it.

s/deffer/defer

- free_cpu_cached_iovas() does not care about max PFN we are 
interested in.

   Thus we may flush our rcaches and still get no new IOVA like in the
   commonly used scenario:

 if (dma_limit > DMA_BIT_MASK(32) && dev_is_pci(dev))
 iova = alloc_iova_fast(iovad, iova_len, DMA_BIT_MASK(32) >> 
shift);


 if (!iova)
 iova = alloc_iova_fast(iovad, iova_len, dma_limit >> shift);

1. First alloc_iova_fast() call is limited to DMA_BIT_MASK(32) to 
get

   PCI devices a SAC address
2. alloc_iova() fails due to full 32-bit space
3. rcaches contain PFNs out of 32-bit space so 
free_cpu_cached_iovas()

   throws entries away for nothing and alloc_iova() fails again
4. Next alloc_iova_fast() call cannot take advantage of rcache 
since we
   have just defeated caches. In this case we pick the slowest 
option

   to proceed.

This patch reworks flushed_rcache local flag to be additional function
argument instead and control rcache flush step. Also, it updates all 
users

to do the flush as the last chance.


Looks like you've run into the same thing Nate found[1] - I came up with
almost the exact same patch, only with separate alloc_iova_fast() and
alloc_iova_fast_noretry() wrapper functions, but on reflection, just
exposing the bool to callers is probably simpler. One nit, can you
document it in the kerneldoc comment too? With that:

Reviewed-by: Robin Murphy <robin.mur...@arm.com>

Thanks,
Robin.

[1]:https://www.mail-archive.com/iommu@lists.linux-foundation.org/msg19758.html 


This patch completely resolves the issue I reported in [1]!!


I somehow missed your observations in [1] :/
Anyway, it's great it fixes performance for you too.


Tested-by: Nate Watterson <nwatt...@codeaurora.org>


Thanks!
Tomasz


Re: [PATCH 1/1] iommu/iova: Make rcache flush optional on IOVA allocation failure

2017-09-19 Thread Tomasz Nowicki

Hi Robin,

On 18.09.2017 18:02, Robin Murphy wrote:

Hi Tomasz,

On 18/09/17 11:56, Tomasz Nowicki wrote:

Since IOVA allocation failure is not unusual case we need to flush
CPUs' rcache in hope we will succeed in next round.

However, it is useful to decide whether we need rcache flush step because
of two reasons:
- Not scalability. On large system with ~100 CPUs iterating and flushing
   rcache for each CPU becomes serious bottleneck so we may want to deffer it.
- free_cpu_cached_iovas() does not care about max PFN we are interested in.
   Thus we may flush our rcaches and still get no new IOVA like in the
   commonly used scenario:

 if (dma_limit > DMA_BIT_MASK(32) && dev_is_pci(dev))
 iova = alloc_iova_fast(iovad, iova_len, DMA_BIT_MASK(32) >> shift);

 if (!iova)
 iova = alloc_iova_fast(iovad, iova_len, dma_limit >> shift);

1. First alloc_iova_fast() call is limited to DMA_BIT_MASK(32) to get
   PCI devices a SAC address
2. alloc_iova() fails due to full 32-bit space
3. rcaches contain PFNs out of 32-bit space so free_cpu_cached_iovas()
   throws entries away for nothing and alloc_iova() fails again
4. Next alloc_iova_fast() call cannot take advantage of rcache since we
   have just defeated caches. In this case we pick the slowest option
   to proceed.

This patch reworks flushed_rcache local flag to be additional function
argument instead and control rcache flush step. Also, it updates all users
to do the flush as the last chance.


Looks like you've run into the same thing Nate found[1] - I came up with
almost the exact same patch, only with separate alloc_iova_fast() and
alloc_iova_fast_noretry() wrapper functions, but on reflection, just
exposing the bool to callers is probably simpler. One nit, can you
document it in the kerneldoc comment too? With that:

Reviewed-by: Robin Murphy <robin.mur...@arm.com>


Thanks! I will add the missing comment.

Tomasz


[PATCH 1/1] iommu/iova: Make rcache flush optional on IOVA allocation failure

2017-09-18 Thread Tomasz Nowicki
Since IOVA allocation failure is not unusual case we need to flush
CPUs' rcache in hope we will succeed in next round.

However, it is useful to decide whether we need rcache flush step because
of two reasons:
- Not scalability. On large system with ~100 CPUs iterating and flushing
  rcache for each CPU becomes serious bottleneck so we may want to deffer it.
- free_cpu_cached_iovas() does not care about max PFN we are interested in.
  Thus we may flush our rcaches and still get no new IOVA like in the
  commonly used scenario:

if (dma_limit > DMA_BIT_MASK(32) && dev_is_pci(dev))
iova = alloc_iova_fast(iovad, iova_len, DMA_BIT_MASK(32) >> shift);

if (!iova)
iova = alloc_iova_fast(iovad, iova_len, dma_limit >> shift);

   1. First alloc_iova_fast() call is limited to DMA_BIT_MASK(32) to get
  PCI devices a SAC address
   2. alloc_iova() fails due to full 32-bit space
   3. rcaches contain PFNs out of 32-bit space so free_cpu_cached_iovas()
  throws entries away for nothing and alloc_iova() fails again
   4. Next alloc_iova_fast() call cannot take advantage of rcache since we
  have just defeated caches. In this case we pick the slowest option
  to proceed.

This patch reworks flushed_rcache local flag to be additional function
argument instead and control rcache flush step. Also, it updates all users
to do the flush as the last chance.

Signed-off-by: Tomasz Nowicki <tomasz.nowi...@caviumnetworks.com>
---
 drivers/iommu/amd_iommu.c   | 5 +++--
 drivers/iommu/dma-iommu.c   | 6 --
 drivers/iommu/intel-iommu.c | 5 +++--
 drivers/iommu/iova.c| 7 +++
 include/linux/iova.h| 5 +++--
 5 files changed, 16 insertions(+), 12 deletions(-)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index 8d2ec60..ce68986 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -1604,10 +1604,11 @@ static unsigned long dma_ops_alloc_iova(struct device 
*dev,
 
if (dma_mask > DMA_BIT_MASK(32))
		pfn = alloc_iova_fast(&dma_dom->iovad, pages,
- IOVA_PFN(DMA_BIT_MASK(32)));
+ IOVA_PFN(DMA_BIT_MASK(32)), false);
 
if (!pfn)
-   pfn = alloc_iova_fast(&dma_dom->iovad, pages, 
IOVA_PFN(dma_mask));
+   pfn = alloc_iova_fast(&dma_dom->iovad, pages,
+ IOVA_PFN(dma_mask), true);
 
return (pfn << PAGE_SHIFT);
 }
diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 191be9c..25914d3 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -370,10 +370,12 @@ static dma_addr_t iommu_dma_alloc_iova(struct 
iommu_domain *domain,
 
/* Try to get PCI devices a SAC address */
if (dma_limit > DMA_BIT_MASK(32) && dev_is_pci(dev))
-   iova = alloc_iova_fast(iovad, iova_len, DMA_BIT_MASK(32) >> 
shift);
+   iova = alloc_iova_fast(iovad, iova_len,
+  DMA_BIT_MASK(32) >> shift, false);
 
if (!iova)
-   iova = alloc_iova_fast(iovad, iova_len, dma_limit >> shift);
+   iova = alloc_iova_fast(iovad, iova_len, dma_limit >> shift,
+  true);
 
return (dma_addr_t)iova << shift;
 }
diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 05c0c3a..75c8320 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -3460,11 +3460,12 @@ static unsigned long intel_alloc_iova(struct device 
*dev,
 * from higher range
 */
		iova_pfn = alloc_iova_fast(&domain->iovad, nrpages,
-  IOVA_PFN(DMA_BIT_MASK(32)));
+  IOVA_PFN(DMA_BIT_MASK(32)), false);
if (iova_pfn)
return iova_pfn;
}
-   iova_pfn = alloc_iova_fast(&domain->iovad, nrpages, IOVA_PFN(dma_mask));
+   iova_pfn = alloc_iova_fast(&domain->iovad, nrpages,
+  IOVA_PFN(dma_mask), true);
if (unlikely(!iova_pfn)) {
pr_err("Allocating %ld-page iova for %s failed",
   nrpages, dev_name(dev));
diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
index f88acad..1a18b14 100644
--- a/drivers/iommu/iova.c
+++ b/drivers/iommu/iova.c
@@ -358,9 +358,8 @@ EXPORT_SYMBOL_GPL(free_iova);
 */
 unsigned long
 alloc_iova_fast(struct iova_domain *iovad, unsigned long size,
-   unsigned long limit_pfn)
+   unsigned long limit_pfn, bool flush_rcache)
 {
-   bool flushed_rcache = false;
unsigned long iova_pfn;
struct iova *new_iova;
 
@@ -373,11 +372,11 @@ alloc_iova_fast(struct iova_domain *iovad, unsigned long 
size,
if (!new_iova) {
unsigned int cpu

[PATCH 0/1] Optimise IOVA allocations for PCI devices

2017-09-18 Thread Tomasz Nowicki
Here is my test setup where I have started performance measurements.

 ------------   PCIe   -------------    TX     -------------   PCIe    -----
| ThunderX2  |--------| Intel XL710 |--------->| Intel XL710 |--------| X86 |
| (128 cpus) |        |    40GbE    |          |    40GbE    |         -----
 ------------          -------------            -------------

As the reference, let's take a v4.13 host, SMMUv3 off and a 1-thread iperf
pinned with taskset to one CPU. The performance results I got:

SMMU off -> 100%
SMMU on  -> 0.02%

I followed the DMA mapping path down and found the IOVA 32-bit space
full, so the kernel was flushing the rcaches of all CPUs in (1).
For 128 CPUs, this kills the performance. Furthermore, in my case the rcaches
mostly contained PFNs > 32-bit, so the second round of IOVA allocation failed
as well. As a consequence, the IOVA had to be allocated outside of 32-bit (2)
from scratch, since all rcaches had been flushed in (1).

if (dma_limit > DMA_BIT_MASK(32) && dev_is_pci(dev))
(1)-->  iova = alloc_iova_fast(iovad, iova_len, DMA_BIT_MASK(32) >> shift);

if (!iova)
(2)-->  iova = alloc_iova_fast(iovad, iova_len, dma_limit >> shift);
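
For reference, the retry path hidden behind (1) boils down to walking every
online CPU's cached magazines before trying the rbtree again, roughly (a
simplified sketch, not the exact v4.13 code):

	/* inside alloc_iova_fast(), on allocation failure */
	unsigned int cpu;

	for_each_online_cpu(cpu)
		free_cpu_cached_iovas(cpu, iovad);

which is what turns a single failed allocation into an O(num_cpus) operation.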

My fix simply introduces a parameter for alloc_iova_fast() to decide whether
the rcache flush has to be done or not. All users follow the scenario mentioned
above, so they should leave the flush as the last resort, to avoid the costly
iteration over all CPUs.

This brings my iperf performance back to 100% with SMMU on.

My bad feeling regarding this solution is that machines with relatively
small numbers of CPUs may get DAC addresses more frequently for PCI
devices. Please let me know your thoughts.

Tomasz Nowicki (1):
  iommu/iova: Make rcache flush optional on IOVA allocation failure

 drivers/iommu/amd_iommu.c   | 5 +++--
 drivers/iommu/dma-iommu.c   | 6 --
 drivers/iommu/intel-iommu.c | 5 +++--
 drivers/iommu/iova.c| 7 +++
 include/linux/iova.h| 5 +++--
 5 files changed, 16 insertions(+), 12 deletions(-)

-- 
2.7.4



Re: [RFC PATCH 07/30] iommu/arm-smmu-v3: Add second level of context descriptor table

2017-05-15 Thread Tomasz Nowicki

Hi Jean,

On 27.02.2017 20:54, Jean-Philippe Brucker wrote:

@@ -1213,17 +1356,59 @@ static void arm_smmu_free_cd_tables(struct 
arm_smmu_master_data *master)
 __maybe_unused
 static int arm_smmu_alloc_cd(struct arm_smmu_master_data *master)
 {
+   int ssid;
+   int i, ret;
	struct arm_smmu_cd_cfg *cfg = &master->ste.cd_cfg;

-   return arm_smmu_bitmap_alloc(cfg->context_map, ilog2(cfg->num_entries));
+   if (cfg->linear)
+   return arm_smmu_bitmap_alloc(cfg->table.context_map,
+ilog2(cfg->num_entries));
+
+   /* Find first leaf table with an empty slot, or allocate a new leaf */
+   for (i = cfg->l1.cur_table; i < cfg->num_entries; i++) {
+   struct arm_smmu_cd_table *table = &cfg->l1.tables[i];
+
+   if (!table->cdptr) {
+   __le64 *l1ptr = cfg->l1.ptr + i * CTXDESC_L1_DESC_DWORD;
+
+   ret = arm_smmu_alloc_cd_leaf_table(master->smmu, table,
+  
CTXDESC_NUM_L2_ENTRIES);
+   if (ret)
+   return ret;
+
+   arm_smmu_write_cd_l1_desc(l1ptr, table);
+   arm_smmu_sync_cd(master, i << CTXDESC_SPLIT, false);
+   }
+
+   ssid = arm_smmu_bitmap_alloc(table->context_map, CTXDESC_SPLIT);
+   if (ssid < 0)
+   continue;
+
+   cfg->l1.cur_table = i;
+   return i << CTXDESC_SPLIT | ssid;
+   }
+
+   return -ENOSPC;
 }

 __maybe_unused
 static void arm_smmu_free_cd(struct arm_smmu_master_data *master, u32 ssid)
 {
+   unsigned long l1_idx, idx;
	struct arm_smmu_cd_cfg *cfg = &master->ste.cd_cfg;

-   arm_smmu_bitmap_free(cfg->context_map, ssid);
+   if (cfg->linear) {
+   arm_smmu_bitmap_free(cfg->table.context_map, ssid);
+   return;
+   }
+
+   l1_idx = ssid >> CTXDESC_SPLIT;
+   idx = ssid & ((1 << CTXDESC_SPLIT) - 1);
+   arm_smmu_bitmap_free(cfg->l1.tables[l1_idx].context_map, idx);
+
+   /* Prepare next allocation */
+   if (cfg->l1.cur_table > idx)
+   cfg->l1.cur_table = idx;
 }


I am not sure what the logic is here. idx ends up being used (via cur_table) as
an index into l1.tables in arm_smmu_alloc_cd(), which in turn may be outside
the allocated memory. I may be missing something. Can you please elaborate on
this?
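
If the intent is to remember the lowest L1 table with a free slot, the
comparison presumably wants the L1 index rather than the within-table index,
i.e. something like (an untested guess at the intent):

	/* Prepare next allocation */
	if (cfg->l1.cur_table > l1_idx)
		cfg->l1.cur_table = l1_idx;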


Thanks,
Tomasz


Re: [RFC PATCH 04/30] iommu/arm-smmu-v3: Add support for PCI ATS

2017-05-10 Thread Tomasz Nowicki

Hi Jean,

On 27.02.2017 20:54, Jean-Philippe Brucker wrote:

+/*
+ * Returns -ENOSYS if ATS is not supported either by the device or by the SMMU
+ */
+static int arm_smmu_enable_ats(struct arm_smmu_master_data *master)
+{
+   int ret;
+   size_t stu;
+   struct pci_dev *pdev;
+   struct arm_smmu_device *smmu = master->smmu;
+
+   if (!(smmu->features & ARM_SMMU_FEAT_ATS) || !dev_is_pci(master->dev))
+   return -ENOSYS;
+
+   pdev = to_pci_dev(master->dev);
+
+#ifdef CONFIG_PCI_ATS
+   if (!pdev->ats_cap)
+   return -ENOSYS;
+#else
+   return -ENOSYS;
+#endif


Nit: This deserves to be another helper in ats.c like:

int pci_ats_supported(struct pci_dev *pdev)
{
	if (!pdev->ats_cap)
		return 0;

	return 1;
}
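
With such a helper (plus a stub returning 0 when CONFIG_PCI_ATS is not set),
the #ifdef block above could collapse to something like this sketch:

	pdev = to_pci_dev(master->dev);

	if (!pci_ats_supported(pdev))
		return -ENOSYS;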

Thanks,
Tomasz


Re: [RFC PATCH 29/30] vfio: Add support for Shared Virtual Memory

2017-04-26 Thread Tomasz Nowicki

On 26.04.2017 12:08, Jean-Philippe Brucker wrote:

On 26/04/17 07:53, Tomasz Nowicki wrote:

+mutex_lock(&device->tasks_lock);
+list_for_each_entry(vfio_task, &device->tasks, list) {
+if (vfio_task->pasid != svm.pasid)
+continue;
+
+ret = iommu_unbind_task(device->dev, svm.pasid, flags);
+if (ret)
+dev_warn(device->dev, "failed to unbind PASID %u\n",
+ vfio_task->pasid);
+
+list_del(&vfio_task->list);
+kfree(vfio_task);


Please use list_for_each_entry_safe.


There is:

+break;

right after kfree, so we'd never follow vfio_task->list.next after freeing
vfio_task. The code searches for the _only_ task matching the PASID,
removes it and leaves the loop.



Ah right. Sorry for the noise.

Tomasz


Re: [RFC PATCH 30/30] vfio: Allow to bind foreign task

2017-04-26 Thread Tomasz Nowicki

On 27.02.2017 20:54, Jean-Philippe Brucker wrote:

Let the process that owns the device create an address space bond on
behalf of another process. We add a pid argument to the BIND_TASK ioctl,
allowing the caller to bind a foreign task. The expected program flow in
this case is:

* Process A creates the VFIO context and initializes the device.
* Process B asks A to bind its address space.
* Process A issues an ioctl to the VFIO device fd with BIND_TASK(pid).
  It may communicate the given PASID back to process B or keep track of it
  internally.
* Process B asks A to perform transactions on its virtual address.
* Process A launches transaction tagged with the given PASID.
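
Seen from userspace, the BIND_TASK(pid) step in that flow would look roughly
like this (a sketch against the uapi proposed below; error handling omitted and
pid_of_process_b is a hypothetical variable):

	struct vfio_device_svm svm = {
		.argsz = sizeof(svm),
		.flags = VFIO_SVM_PID,
		.pid   = pid_of_process_b,
	};

	ioctl(device_fd, VFIO_DEVICE_BIND_TASK, &svm);
	/* svm.pasid now holds the PASID to hand back to process B */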

Signed-off-by: Jean-Philippe Brucker 
---
 drivers/vfio/vfio.c   | 35 +--
 include/uapi/linux/vfio.h | 15 +++
 2 files changed, 48 insertions(+), 2 deletions(-)

diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index c4505d8f4c61..ecc5d07e3dbb 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -26,6 +26,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -1660,7 +1661,7 @@ static long vfio_svm_ioctl(struct vfio_device *device, 
unsigned int cmd,
struct vfio_device_svm svm;
struct vfio_task *vfio_task;

-   minsz = offsetofend(struct vfio_device_svm, pasid);
+   minsz = offsetofend(struct vfio_device_svm, pid);

	if (copy_from_user(&svm, (void __user *)arg, minsz))
return -EFAULT;
@@ -1669,9 +1670,39 @@ static long vfio_svm_ioctl(struct vfio_device *device, 
unsigned int cmd,
return -EINVAL;

if (cmd == VFIO_DEVICE_BIND_TASK) {
-   struct task_struct *task = current;
+   struct mm_struct *mm;
+   struct task_struct *task;
+
+   if (svm.flags & ~VFIO_SVM_PID)
+   return -EINVAL;
+
+   if (svm.flags & VFIO_SVM_PID) {
+   rcu_read_lock();
+   task = find_task_by_vpid(svm.pid);
+   if (task)
+   get_task_struct(task);
+   rcu_read_unlock();
+   if (!task)
+   return -ESRCH;
+
+   /*
+* Ensure process has RW access on the task's mm
+* FIXME:
+* - I think this ought to be in the IOMMU API
+* - I'm assuming permission is never revoked during the
+*   task's lifetime. Might be mistaken.
+*/
+   mm = mm_access(task, PTRACE_MODE_ATTACH_REALCREDS);
+   if (!mm || IS_ERR(mm))


I know this is RFC patch but considering we will keep this as is, we 
need here:

+put_task_struct(task);



+   return IS_ERR(mm) ? PTR_ERR(mm) : -ESRCH;
+   mmput(mm);
+   } else {
+   get_task_struct(current);
+   task = current;
+   }

ret = iommu_bind_task(device->dev, task, , 0, NULL);
+   put_task_struct(task);
if (ret)
return ret;



Thanks,
Tomasz


Re: [RFC PATCH 29/30] vfio: Add support for Shared Virtual Memory

2017-04-26 Thread Tomasz Nowicki

Hi Jean,

On 27.02.2017 20:54, Jean-Philippe Brucker wrote:

Add two new ioctl for VFIO devices. VFIO_DEVICE_BIND_TASK creates a bond
between a device and a process address space, identified by a
device-specific ID named PASID. This allows the device to target DMA
transactions at the process virtual addresses without a need for mapping
and unmapping buffers explicitly in the IOMMU. The process page tables are
shared with the IOMMU, and mechanisms such as PCI ATS/PRI may be used to
handle faults. VFIO_DEVICE_UNBIND_TASK removes a bond identified by a
PASID.

Also add a capability flag in device info to detect whether the system and
the device support SVM.

Users need to specify the state of a PASID when unbinding, with flags
VFIO_PASID_RELEASE_FLUSHED and VFIO_PASID_RELEASE_CLEAN. Even for PCI,
PASID invalidation is specific to each device and only partially covered
by the specification:

* Device must have an implementation-defined mechanism for stopping the
  use of a PASID. When this mechanism finishes, the device has stopped
  issuing transactions for this PASID and all transactions for this PASID
  have been flushed to the IOMMU.

* Device may either wait for all outstanding PRI requests for this PASID
  to finish, or issue a Stop Marker message, a barrier that separates PRI
  requests affecting this instance of the PASID from PRI requests
  affecting the next instance. In the first case, we say that the PASID is
  "clean", in the second case it is "flushed" (and the IOMMU has to wait
  for the Stop Marker before reassigning the PASID.)

We expect similar distinctions for platform devices. Ideally there should
be a callback for each PCI device, allowing the IOMMU to ask the device to
stop using a PASID. When the callback returns, the PASID is either flushed
or clean and the return value tells which.

For the moment I don't know how to implement this callback for PCI, so if
the user forgets to call unbind with either "clean" or "flushed", the
PASID is never reused. For platform devices, it might be simpler to
implement since we could associate an invalidate_pasid callback to a DT
compatible string, as is currently done for reset.

Signed-off-by: Jean-Philippe Brucker 
---
 drivers/vfio/pci/vfio_pci.c |  24 ++
 drivers/vfio/vfio.c | 104 
 include/uapi/linux/vfio.h   |  55 +++
 3 files changed, 183 insertions(+)

diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index 324c52e3a1a4..3d7733f94891 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -22,6 +22,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -623,6 +624,26 @@ int vfio_pci_register_dev_region(struct vfio_pci_device 
*vdev,
return 0;
 }



[...]



kfree(device);
@@ -1622,6 +1651,75 @@ static int vfio_device_fops_release(struct inode *inode, 
struct file *filep)
return 0;
 }

+static long vfio_svm_ioctl(struct vfio_device *device, unsigned int cmd,
+  unsigned long arg)
+{
+   int ret;
+   unsigned long minsz;
+
+   struct vfio_device_svm svm;
+   struct vfio_task *vfio_task;
+
+   minsz = offsetofend(struct vfio_device_svm, pasid);
+
+   if (copy_from_user(&svm, (void __user *)arg, minsz))
+   return -EFAULT;
+
+   if (svm.argsz < minsz)
+   return -EINVAL;
+
+   if (cmd == VFIO_DEVICE_BIND_TASK) {
+   struct task_struct *task = current;
+
+   ret = iommu_bind_task(device->dev, task, , 0, NULL);
+   if (ret)
+   return ret;
+
+   vfio_task = kzalloc(sizeof(*vfio_task), GFP_KERNEL);
+   if (!vfio_task) {
+   iommu_unbind_task(device->dev, svm.pasid,
+ IOMMU_PASID_CLEAN);
+   return -ENOMEM;
+   }
+
+   vfio_task->pasid = svm.pasid;
+
+   mutex_lock(&device->tasks_lock);
+   list_add(&vfio_task->list, &device->tasks);
+   mutex_unlock(&device->tasks_lock);
+
+   } else {
+   int flags = 0;
+
+   if (svm.flags & ~(VFIO_SVM_PASID_RELEASE_FLUSHED |
+ VFIO_SVM_PASID_RELEASE_CLEAN))
+   return -EINVAL;
+
+   if (svm.flags & VFIO_SVM_PASID_RELEASE_FLUSHED)
+   flags = IOMMU_PASID_FLUSHED;
+   else if (svm.flags & VFIO_SVM_PASID_RELEASE_CLEAN)
+   flags = IOMMU_PASID_CLEAN;
+
+   mutex_lock(&device->tasks_lock);
+   list_for_each_entry(vfio_task, &device->tasks, list) {
+   if (vfio_task->pasid != svm.pasid)
+   continue;
+
+   ret = iommu_unbind_task(device->dev, svm.pasid, flags);
+   if (ret)
+   

Re: [PATCH V7 01/11] iommu/of: Refactor of_iommu_configure() for error handling

2017-01-25 Thread Tomasz Nowicki

Hi Robin,

On 25.01.2017 18:35, Robin Murphy wrote:

Hi Tomasz,

On 25/01/17 17:17, Tomasz Nowicki wrote:

Hi Sricharan,

On 23.01.2017 17:18, Sricharan R wrote:

From: Robin Murphy <robin.mur...@arm.com>

In preparation for some upcoming cleverness, rework the control flow in
of_iommu_configure() to minimise duplication and improve the propogation
of errors. It's also as good a time as any to switch over from the
now-just-a-compatibility-wrapper of_iommu_get_ops() to using the generic
IOMMU instance interface directly.

Signed-off-by: Robin Murphy <robin.mur...@arm.com>
---
 drivers/iommu/of_iommu.c | 83
+++-
 1 file changed, 53 insertions(+), 30 deletions(-)

diff --git a/drivers/iommu/of_iommu.c b/drivers/iommu/of_iommu.c
index 0f57ddc..ee49081 100644
--- a/drivers/iommu/of_iommu.c
+++ b/drivers/iommu/of_iommu.c
@@ -96,6 +96,28 @@ int of_get_dma_window(struct device_node *dn, const
char *prefix, int index,
 }
 EXPORT_SYMBOL_GPL(of_get_dma_window);

+static const struct iommu_ops
+*of_iommu_xlate(struct device *dev, struct of_phandle_args *iommu_spec)
+{
+const struct iommu_ops *ops;
+struct fwnode_handle *fwnode = &iommu_spec->np->fwnode;
+int err;
+
+ops = iommu_get_instance(fwnode);
+if (!ops || !ops->of_xlate)
+return NULL;
+
+err = iommu_fwspec_init(dev, &iommu_spec->np->fwnode, ops);
+if (err)
+return ERR_PTR(err);
+
+err = ops->of_xlate(dev, iommu_spec);
+if (err)
+return ERR_PTR(err);
+
+return ops;
+}
+
 static int __get_pci_rid(struct pci_dev *pdev, u16 alias, void *data)
 {
 struct of_phandle_args *iommu_spec = data;
@@ -105,10 +127,11 @@ static int __get_pci_rid(struct pci_dev *pdev,
u16 alias, void *data)
 }

 static const struct iommu_ops
-*of_pci_iommu_configure(struct pci_dev *pdev, struct device_node
*bridge_np)
+*of_pci_iommu_init(struct pci_dev *pdev, struct device_node *bridge_np)
 {
 const struct iommu_ops *ops;
 struct of_phandle_args iommu_spec;
+int err;

 /*
  * Start by tracing the RID alias down the PCI topology as
@@ -123,56 +146,56 @@ static int __get_pci_rid(struct pci_dev *pdev,
u16 alias, void *data)
  * bus into the system beyond, and which IOMMU it ends up at.
  */
 iommu_spec.np = NULL;
-if (of_pci_map_rid(bridge_np, iommu_spec.args[0], "iommu-map",
-   "iommu-map-mask", _spec.np, iommu_spec.args))
-return NULL;
+err = of_pci_map_rid(bridge_np, iommu_spec.args[0], "iommu-map",
+ "iommu-map-mask", _spec.np,
+ iommu_spec.args);
+if (err)
+return ERR_PTR(err);

-ops = of_iommu_get_ops(iommu_spec.np);
-if (!ops || !ops->of_xlate ||
-iommu_fwspec_init(&pdev->dev, &iommu_spec.np->fwnode, ops) ||
-ops->of_xlate(&pdev->dev, &iommu_spec))
-ops = NULL;
+ops = of_iommu_xlate(&pdev->dev, &iommu_spec);

 of_node_put(iommu_spec.np);
 return ops;
 }

-const struct iommu_ops *of_iommu_configure(struct device *dev,
-   struct device_node *master_np)
+static const struct iommu_ops
+*of_platform_iommu_init(struct device *dev, struct device_node *np)
 {
 struct of_phandle_args iommu_spec;
-struct device_node *np;
 const struct iommu_ops *ops = NULL;
 int idx = 0;

-if (dev_is_pci(dev))
-return of_pci_iommu_configure(to_pci_dev(dev), master_np);
-
 /*
  * We don't currently walk up the tree looking for a parent IOMMU.
  * See the `Notes:' section of
  * Documentation/devicetree/bindings/iommu/iommu.txt
  */
-while (!of_parse_phandle_with_args(master_np, "iommus",
-   "#iommu-cells", idx,
-   &iommu_spec)) {
-np = iommu_spec.np;
-ops = of_iommu_get_ops(np);
-
-if (!ops || !ops->of_xlate ||
-iommu_fwspec_init(dev, &np->fwnode, ops) ||
-ops->of_xlate(dev, &iommu_spec))
-goto err_put_node;
-
-of_node_put(np);
+while (!of_parse_phandle_with_args(np, "iommus", "#iommu-cells",
+   idx, &iommu_spec)) {
+ops = of_iommu_xlate(dev, &iommu_spec);
+of_node_put(iommu_spec.np);
 idx++;
+if (IS_ERR_OR_NULL(ops))
+break;
 }

 return ops;
+}
+
+const struct iommu_ops *of_iommu_configure(struct device *dev,
+   struct device_node *master_np)
+{
+const struct iommu_ops *ops;
+
+if (!master_np)
+return NULL;
+
+if (dev_is_pci(dev))
+ops = of_pci_iommu_init(to_pci_dev(dev), master_np);

I gave the whole patch set a try on ThunderX. really_probe() is failing
on dma_configure()->of_pci_iommu_init() for each PCI device.

When you say "failing", do you mean cleanly, or with a crash? I've
managed to hit __of_match_node() dereferencing NULL from
of_iommu_xlate() in a horribly complicated 

Re: [PATCH V7 10/11] iommu/arm-smmu: Clean up early-probing workarounds

2017-01-25 Thread Tomasz Nowicki

On 23.01.2017 17:18, Sricharan R wrote:

From: Robin Murphy 

Now that the appropriate ordering is enforced via probe-deferral of
masters in core code, rip it all out and bask in the simplicity.

Acked-by: Will Deacon 
Signed-off-by: Robin Murphy 
[Sricharan: Rebased on top of ACPI IORT SMMU series]
Signed-off-by: Sricharan R 


[...]


-
-#ifdef CONFIG_ACPI
-static int __init acpi_smmu_v3_init(struct acpi_table_header *table)
-{
-   if (iort_node_match(ACPI_IORT_NODE_SMMU_V3))
-   return arm_smmu_init();


Since we are cleaning up the code, we can remove iort_node_match() from iort.c.
There is no user for it any more, unless Lorenzo has some plans for it.


Thanks,
Tomasz


Re: [PATCH V7 01/11] iommu/of: Refactor of_iommu_configure() for error handling

2017-01-25 Thread Tomasz Nowicki

Hi Sricharan,

On 23.01.2017 17:18, Sricharan R wrote:

From: Robin Murphy 

In preparation for some upcoming cleverness, rework the control flow in
of_iommu_configure() to minimise duplication and improve the propagation
of errors. It's also as good a time as any to switch over from the
now-just-a-compatibility-wrapper of_iommu_get_ops() to using the generic
IOMMU instance interface directly.
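
For readers following along, the refactor boils down to a three-way return
convention from the new of_iommu_xlate() helper; a minimal sketch of how a
caller is expected to treat it (illustrative only, not part of the patch):

	const struct iommu_ops *ops;

	ops = of_iommu_xlate(dev, &iommu_spec);
	if (!ops)		/* no IOMMU, or it provides no of_xlate() */
		return NULL;
	if (IS_ERR(ops))	/* fwspec init or of_xlate() failed - propagate */
		return ops;
	/* otherwise the fwspec is initialised and translation is configured */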

Signed-off-by: Robin Murphy 
---
 drivers/iommu/of_iommu.c | 83 +++-
 1 file changed, 53 insertions(+), 30 deletions(-)

diff --git a/drivers/iommu/of_iommu.c b/drivers/iommu/of_iommu.c
index 0f57ddc..ee49081 100644
--- a/drivers/iommu/of_iommu.c
+++ b/drivers/iommu/of_iommu.c
@@ -96,6 +96,28 @@ int of_get_dma_window(struct device_node *dn, const char 
*prefix, int index,
 }
 EXPORT_SYMBOL_GPL(of_get_dma_window);

+static const struct iommu_ops
+*of_iommu_xlate(struct device *dev, struct of_phandle_args *iommu_spec)
+{
+   const struct iommu_ops *ops;
+   struct fwnode_handle *fwnode = &iommu_spec->np->fwnode;
+   int err;
+
+   ops = iommu_get_instance(fwnode);
+   if (!ops || !ops->of_xlate)
+   return NULL;
+
+   err = iommu_fwspec_init(dev, &iommu_spec->np->fwnode, ops);
+   if (err)
+   return ERR_PTR(err);
+
+   err = ops->of_xlate(dev, iommu_spec);
+   if (err)
+   return ERR_PTR(err);
+
+   return ops;
+}
+
 static int __get_pci_rid(struct pci_dev *pdev, u16 alias, void *data)
 {
struct of_phandle_args *iommu_spec = data;
@@ -105,10 +127,11 @@ static int __get_pci_rid(struct pci_dev *pdev, u16 alias, 
void *data)
 }

 static const struct iommu_ops
-*of_pci_iommu_configure(struct pci_dev *pdev, struct device_node *bridge_np)
+*of_pci_iommu_init(struct pci_dev *pdev, struct device_node *bridge_np)
 {
const struct iommu_ops *ops;
struct of_phandle_args iommu_spec;
+   int err;

/*
 * Start by tracing the RID alias down the PCI topology as
@@ -123,56 +146,56 @@ static int __get_pci_rid(struct pci_dev *pdev, u16 alias, 
void *data)
 * bus into the system beyond, and which IOMMU it ends up at.
 */
iommu_spec.np = NULL;
-   if (of_pci_map_rid(bridge_np, iommu_spec.args[0], "iommu-map",
-  "iommu-map-mask", &iommu_spec.np, iommu_spec.args))
-   return NULL;
+   err = of_pci_map_rid(bridge_np, iommu_spec.args[0], "iommu-map",
+"iommu-map-mask", &iommu_spec.np,
+iommu_spec.args);
+   if (err)
+   return ERR_PTR(err);

-   ops = of_iommu_get_ops(iommu_spec.np);
-   if (!ops || !ops->of_xlate ||
-   iommu_fwspec_init(&pdev->dev, &iommu_spec.np->fwnode, ops) ||
-   ops->of_xlate(&pdev->dev, &iommu_spec))
-   ops = NULL;
+   ops = of_iommu_xlate(&pdev->dev, &iommu_spec);

of_node_put(iommu_spec.np);
return ops;
 }

-const struct iommu_ops *of_iommu_configure(struct device *dev,
-  struct device_node *master_np)
+static const struct iommu_ops
+*of_platform_iommu_init(struct device *dev, struct device_node *np)
 {
struct of_phandle_args iommu_spec;
-   struct device_node *np;
const struct iommu_ops *ops = NULL;
int idx = 0;

-   if (dev_is_pci(dev))
-   return of_pci_iommu_configure(to_pci_dev(dev), master_np);
-
/*
 * We don't currently walk up the tree looking for a parent IOMMU.
 * See the `Notes:' section of
 * Documentation/devicetree/bindings/iommu/iommu.txt
 */
-   while (!of_parse_phandle_with_args(master_np, "iommus",
-  "#iommu-cells", idx,
-  &iommu_spec)) {
-   np = iommu_spec.np;
-   ops = of_iommu_get_ops(np);
-
-   if (!ops || !ops->of_xlate ||
-   iommu_fwspec_init(dev, &np->fwnode, ops) ||
-   ops->of_xlate(dev, &iommu_spec))
-   goto err_put_node;
-
-   of_node_put(np);
+   while (!of_parse_phandle_with_args(np, "iommus", "#iommu-cells",
+  idx, &iommu_spec)) {
+   ops = of_iommu_xlate(dev, &iommu_spec);
+   of_node_put(iommu_spec.np);
idx++;
+   if (IS_ERR_OR_NULL(ops))
+   break;
}

return ops;
+}
+
+const struct iommu_ops *of_iommu_configure(struct device *dev,
+  struct device_node *master_np)
+{
+   const struct iommu_ops *ops;
+
+   if (!master_np)
+   return NULL;
+
+   if (dev_is_pci(dev))
+   ops = of_pci_iommu_init(to_pci_dev(dev), master_np);


I gave the whole patch set a try on ThunderX. really_probe() is failing 
on dma_configure()->of_pci_iommu_init() for each PCI device. 

Re: [PATCH v3] IOMMU: SMMUv2: Support for Extended Stream ID (16 bit)

2017-01-19 Thread Tomasz Nowicki

On 19.01.2017 11:57, Aleksey Makarov wrote:

Hi Tomasz,

On 01/19/2017 11:55 AM, Tomasz Nowicki wrote:

Hi Aleksey,

On 17.01.2017 16:14, Aleksey Makarov wrote:

Enable the Extended Stream ID feature when available.

This patch on top of series "KVM PCIe/MSI passthrough on ARM/ARM64
and IOVA reserved regions" by Eric Auger [1] allows to passthrough
an external PCIe network card on a ThunderX server successfully.

Without this patch that card caused a warning like

pci 0006:90:00.0: stream ID 0x9000 out of range for SMMU (0x7fff)

during boot.

[1] 
https://lkml.kernel.org/r/1484127714-3263-1-git-send-email-eric.au...@redhat.com

Signed-off-by: Aleksey Makarov <aleksey.maka...@linaro.org>


I do not think this is related to the PCIe network card. It is rather common to all 
devices whose bus number is > 127



iommu/arm-smmu: Support for Extended Stream ID (16 bit)

It is the time we have the real 16-bit Stream ID user, which is the ThunderX. 
Its IO topology uses 1:1 map for requester to stream ID translation:

RC no. | Requester ID | Stream ID
-------|--------------|----------
RC_0   | 0-0xffff --> | 0-0xffff

which allows to get full 16-bit stream ID. Currently all devices with bus number 
>= 128 (0x80) get non-zero 16th bit of BDF and stream ID (due to 1:1 map). 
Eventually SMMU drops such device because the stream ID is out of range. This is 
the case for all devices connected as external endpoints on ThunderX.


Technically the last sentence is not correct as far as I can see. There exists 
a device connected to PEM (external endpoint?) with BDF (== Stream ID) <= 
0x7fff:

0004:21:00.0 VGA compatible controller: ASPEED Technology, Inc. ASPEED 
Graphics Family (rev 30)


Right. You’ve got a point.








Thank you for review.  May I change the commit message this way and keep your 
'Reviewed-by'?:


Yes, the commit message is accurate now. Add my R-b please.



---
iommu/arm-smmu: Support for Extended Stream ID (16 bit)

It is the time we have the real 16-bit Stream ID user, which is the
ThunderX. Its IO topology uses 1:1 map for requester ID to stream ID
translation for each root complex which allows to get full 16-bit
stream ID.  Firmware assigns bus IDs that are greater than 128 (0x80)
to some buses under PEM (external PCIe interface).  Eventually SMMU
drops devices on that buses because their stream ID is out of range:

  pci 0006:90:00.0: stream ID 0x9000 out of range for SMMU (0x7fff)

To fix above issue enable the Extended Stream ID optional feature when 
available.
---
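
For illustration, the requester-ID arithmetic behind the quoted warning
(a sketch; the BDF comes from the log line above, PCI_DEVID()/PCI_DEVFN()
are the standard kernel helpers):

	/* 0006:90:00.0 -> bus 0x90, device 0, function 0 */
	u16 rid = PCI_DEVID(0x90, PCI_DEVFN(0, 0));	/* (0x90 << 8) | 0x00 = 0x9000 */

	/* 0x9000 > 0x7fff: it does not fit the advertised 15-bit stream ID
	 * mask unless sCR0.EXIDENABLE switches the SMMU to 16-bit stream IDs. */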



Thanks,
Tomasz
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH v3] IOMMU: SMMUv2: Support for Extended Stream ID (16 bit)

2017-01-19 Thread Tomasz Nowicki

Hi Aleksey,

On 17.01.2017 16:14, Aleksey Makarov wrote:

Enable the Extended Stream ID feature when available.

This patch on top of series "KVM PCIe/MSI passthrough on ARM/ARM64
and IOVA reserved regions" by Eric Auger [1] allows to passthrough
an external PCIe network card on a ThunderX server successfully.

Without this patch that card caused a warning like

pci 0006:90:00.0: stream ID 0x9000 out of range for SMMU (0x7fff)

during boot.

[1] 
https://lkml.kernel.org/r/1484127714-3263-1-git-send-email-eric.au...@redhat.com

Signed-off-by: Aleksey Makarov <aleksey.maka...@linaro.org>


I do not think this is related to the PCIe network card. It is rather common 
to all devices whose bus number is > 127




iommu/arm-smmu: Support for Extended Stream ID (16 bit)

It is the time we have the real 16-bit Stream ID user, which is the 
ThunderX. Its IO topology uses 1:1 map for requester to stream ID 
translation:


RC no. | Requester ID | Stream ID
-------|--------------|----------
RC_0   | 0-0xffff --> | 0-0xffff

which allows to get full 16-bit stream ID. Currently all devices with 
bus number >= 128 (0x80) get non-zero 16th bit of BDF and stream ID (due 
to 1:1 map). Eventually SMMU drops such device because the stream ID is 
out of range. This is the case for all devices connected as external 
endpoints on ThunderX.


To fix above issue enable the Extended Stream ID optional feature when 
available.




Please add my:
Reviewed-by: Tomasz Nowicki <tomasz.nowi...@caviumnetworks.com>
Tested-by: Tomasz Nowicki <tomasz.nowi...@caviumnetworks.com>

Thanks,
Tomasz



---
v3:
- keep formatting of the comment
- fix printk message after the deleted chunk.  "[Do] not print a mask
  here, since it's not overly interesting in itself, and add_device will
  still show the offending mask in full if it ever actually matters (as in
  the commit message)." (Robin Murphy)

v2:
https://lkml.kernel.org/r/2017011614.29444-1-aleksey.maka...@linaro.org
- remove unnecessary parentheses (Robin Murphy)
- refactor testing SMR fields to after setting sCR0 as their width
  depends on sCR0_EXIDENABLE (Robin Murphy)

v1 (rfc):
https://lkml.kernel.org/r/20170110115755.19102-1-aleksey.maka...@linaro.org

 drivers/iommu/arm-smmu.c | 69 +---
 1 file changed, 48 insertions(+), 21 deletions(-)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 13d26009b8e0..861cc135722f 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -24,6 +24,7 @@
  * - v7/v8 long-descriptor format
  * - Non-secure access to the SMMU
  * - Context fault reporting
+ * - Extended Stream ID (16 bit)
  */

 #define pr_fmt(fmt) "arm-smmu: " fmt
@@ -87,6 +88,7 @@
 #define sCR0_CLIENTPD  (1 << 0)
 #define sCR0_GFRE  (1 << 1)
 #define sCR0_GFIE  (1 << 2)
+#define sCR0_EXIDENABLE(1 << 3)
 #define sCR0_GCFGFRE   (1 << 4)
 #define sCR0_GCFGFIE   (1 << 5)
 #define sCR0_USFCFG(1 << 10)
@@ -126,6 +128,7 @@
 #define ID0_NUMIRPT_MASK   0xff
 #define ID0_NUMSIDB_SHIFT  9
 #define ID0_NUMSIDB_MASK   0xf
+#define ID0_EXIDS  (1 << 8)
 #define ID0_NUMSMRG_SHIFT  0
 #define ID0_NUMSMRG_MASK   0xff

@@ -169,6 +172,7 @@
 #define ARM_SMMU_GR0_S2CR(n)   (0xc00 + ((n) << 2))
 #define S2CR_CBNDX_SHIFT   0
 #define S2CR_CBNDX_MASK0xff
+#define S2CR_EXIDVALID (1 << 10)
 #define S2CR_TYPE_SHIFT16
 #define S2CR_TYPE_MASK 0x3
 enum arm_smmu_s2cr_type {
@@ -354,6 +358,7 @@ struct arm_smmu_device {
 #define ARM_SMMU_FEAT_FMT_AARCH64_64K  (1 << 9)
 #define ARM_SMMU_FEAT_FMT_AARCH32_L(1 << 10)
 #define ARM_SMMU_FEAT_FMT_AARCH32_S(1 << 11)
+#define ARM_SMMU_FEAT_EXIDS(1 << 12)
u32 features;

 #define ARM_SMMU_OPT_SECURE_CFG_ACCESS (1 << 0)
@@ -1051,7 +1056,7 @@ static void arm_smmu_write_smr(struct arm_smmu_device 
*smmu, int idx)
struct arm_smmu_smr *smr = smmu->smrs + idx;
u32 reg = smr->id << SMR_ID_SHIFT | smr->mask << SMR_MASK_SHIFT;

-   if (smr->valid)
+   if (!(smmu->features & ARM_SMMU_FEAT_EXIDS) && smr->valid)
reg |= SMR_VALID;
writel_relaxed(reg, ARM_SMMU_GR0(smmu) + ARM_SMMU_GR0_SMR(idx));
 }
@@ -1063,6 +1068,9 @@ static void arm_smmu_write_s2cr(struct arm_smmu_device 
*smmu, int idx)
  (s2cr->cbndx & S2CR_CBNDX_MASK) << S2CR_CBNDX_SHIFT |
  (s2cr->privcfg & S2CR_PRIVCFG_MASK) << S2CR_PRIVCFG_SHIFT;

+   if (smmu->features & A

Re: [PATCH v8 18/18] iommu/arm-smmu: Do not advertise IOMMU_CAP_INTR_REMAP anymore

2017-01-18 Thread Tomasz Nowicki

On 11.01.2017 10:41, Eric Auger wrote:

IOMMU_CAP_INTR_REMAP has been advertised in arm-smmu(-v3) although
on ARM this property is not attached to the IOMMU but rather is
implemented in the MSI controller (GICv3 ITS).

Now vfio_iommu_type1 checks MSI remapping capability at MSI controller
level, let's correct this.

Signed-off-by: Eric Auger <eric.au...@redhat.com>
Acked-by: Will Deacon <will.dea...@arm.com>


For patches [15-18]:
Reviewed-by: Tomasz Nowicki <tomasz.nowi...@caviumnetworks.com>

Thanks,
Tomasz



---

v7 -> v8:
- added Will's A-b
---
 drivers/iommu/arm-smmu-v3.c | 2 --
 drivers/iommu/arm-smmu.c| 2 --
 2 files changed, 4 deletions(-)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index 6c4111c..d9cf6cb 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -1375,8 +1375,6 @@ static bool arm_smmu_capable(enum iommu_cap cap)
switch (cap) {
case IOMMU_CAP_CACHE_COHERENCY:
return true;
-   case IOMMU_CAP_INTR_REMAP:
-   return true; /* MSIs are just memory writes */
case IOMMU_CAP_NOEXEC:
return true;
default:
diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index a354572..13d2600 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -1374,8 +1374,6 @@ static bool arm_smmu_capable(enum iommu_cap cap)
 * requests.
 */
return true;
-   case IOMMU_CAP_INTR_REMAP:
-   return true; /* MSIs are just memory writes */
case IOMMU_CAP_NOEXEC:
return true;
default:


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v8 14/18] irqdomain: irq_domain_check_msi_remap

2017-01-18 Thread Tomasz Nowicki

On 17.01.2017 15:06, Tomasz Nowicki wrote:

On 17.01.2017 14:53, Auger Eric wrote:

Hi Tomasz,

On 17/01/2017 14:40, Tomasz Nowicki wrote:

On 11.01.2017 10:41, Eric Auger wrote:

This new function checks whether all MSI irq domains
implement IRQ remapping. This is useful to understand
whether VFIO passthrough is safe with respect to interrupts.

On ARM typically an MSI controller can sit downstream
to the IOMMU without preventing VFIO passthrough.
As such any assigned device can write into the MSI doorbell.
In case the MSI controller implements IRQ remapping, assigned
devices will not be able to trigger interrupts towards the
host. On the contrary, the assignment must be emphasized as
unsafe with respect to interrupts.
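
For context, a sketch of how the check is consumed on the VFIO side (names
follow the vfio_iommu_type1 change later in this series; treat the exact
call site and wording as assumptions):

	bool msi_remap;

	/* 'bus' and 'allow_unsafe_interrupts' are assumed to be in scope */
	msi_remap = irq_domain_check_msi_remap() ||
		    iommu_capable(bus, IOMMU_CAP_INTR_REMAP);

	if (!allow_unsafe_interrupts && !msi_remap) {
		pr_warn("No interrupt remapping support, device assignment is unsafe\n");
		ret = -EPERM;
		goto out_detach;
	}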

Signed-off-by: Eric Auger <eric.au...@redhat.com>
Reviewed-by: Marc Zyngier <marc.zyng...@arm.com>

---
v7 -> v8:
- remove goto in irq_domain_check_msi_remap
- Added Marc's R-b

v5 -> v6:
- use irq_domain_hierarchical_is_msi_remap()
- comment rewording

v4 -> v5:
- Handle DOMAIN_BUS_FSL_MC_MSI domains
- Check parents
---
 include/linux/irqdomain.h |  1 +
 kernel/irq/irqdomain.c| 22 ++
 2 files changed, 23 insertions(+)

diff --git a/include/linux/irqdomain.h b/include/linux/irqdomain.h
index bc2f571..188eced 100644
--- a/include/linux/irqdomain.h
+++ b/include/linux/irqdomain.h
@@ -222,6 +222,7 @@ struct irq_domain *irq_domain_add_legacy(struct
device_node *of_node,
  void *host_data);
 extern struct irq_domain *irq_find_matching_fwspec(struct irq_fwspec
*fwspec,
enum irq_domain_bus_token bus_token);
+extern bool irq_domain_check_msi_remap(void);
 extern void irq_set_default_host(struct irq_domain *host);
 extern int irq_domain_alloc_descs(int virq, unsigned int nr_irqs,
   irq_hw_number_t hwirq, int node,
diff --git a/kernel/irq/irqdomain.c b/kernel/irq/irqdomain.c
index 876e131..d889751 100644
--- a/kernel/irq/irqdomain.c
+++ b/kernel/irq/irqdomain.c
@@ -278,6 +278,28 @@ struct irq_domain
*irq_find_matching_fwspec(struct irq_fwspec *fwspec,
 EXPORT_SYMBOL_GPL(irq_find_matching_fwspec);

 /**
+ * irq_domain_check_msi_remap - Check whether all MSI
+ * irq domains implement IRQ remapping
+ */
+bool irq_domain_check_msi_remap(void)
+{
+struct irq_domain *h;
+bool ret = true;
+
+mutex_lock(&irq_domain_mutex);
+list_for_each_entry(h, &irq_domain_list, link) {
+if (irq_domain_is_msi(h) &&
+!irq_domain_hierarchical_is_msi_remap(h)) {
+ret = false;
+break;
+}
+}
+mutex_unlock(&irq_domain_mutex);
+return ret;
+}


The above function returns true even when there are no MSI irq domains. Is
that intentional?

From the VFIO integration point of view this is what we want. If there
is no MSI controller in the system, we have no vulnerability with
respect to IRQ assignment and we consider the system as safe. If
requested I can add a comment?



I see. Yes, a comment would be helpful then :) Thanks!
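
Something along these lines would do (a sketch of the requested comment,
not the final wording):

/*
 * irq_domain_check_msi_remap - Check whether all MSI irq domains
 * implement IRQ remapping.
 *
 * Also returns true when there is no MSI irq domain at all: with no
 * MSI controller in the system there is no MSI doorbell an assigned
 * device could attack, so the configuration is considered safe.
 */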



Anyway:

Reviewed-by: Tomasz Nowicki <tomasz.nowi...@caviumnetworks.com>

Thanks,
Tomasz
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v8 13/18] genirq/msi: Set IRQ_DOMAIN_FLAG_MSI on MSI domain creation

2017-01-18 Thread Tomasz Nowicki

On 11.01.2017 10:41, Eric Auger wrote:

Now we have a flag value indicating an IRQ domain implements MSI,
let's set it on msi_create_irq_domain().

Signed-off-by: Eric Auger <eric.au...@redhat.com>
Reviewed-by: Marc Zyngier <marc.zyng...@arm.com>


Reviewed-by: Tomasz Nowicki <tomasz.nowi...@caviumnetworks.com>

Thanks,
Tomasz



---

v7 -> v8
- Added Marc's R-b

v6: new
---
 kernel/irq/msi.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/irq/msi.c b/kernel/irq/msi.c
index ee23006..ddc2f54 100644
--- a/kernel/irq/msi.c
+++ b/kernel/irq/msi.c
@@ -270,8 +270,8 @@ struct irq_domain *msi_create_irq_domain(struct 
fwnode_handle *fwnode,
if (info->flags & MSI_FLAG_USE_DEF_CHIP_OPS)
msi_domain_update_chip_ops(info);

-   return irq_domain_create_hierarchy(parent, 0, 0, fwnode,
-  _domain_ops, info);
+   return irq_domain_create_hierarchy(parent, IRQ_DOMAIN_FLAG_MSI, 0,
+  fwnode, _domain_ops, info);
 }

 int msi_domain_prepare_irqs(struct irq_domain *domain, struct device *dev,


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v8 12/18] irqdomain: Add irq domain MSI and MSI_REMAP flags

2017-01-18 Thread Tomasz Nowicki

On 11.01.2017 10:41, Eric Auger wrote:

We introduce two new enum values for the irq domain flag:
- IRQ_DOMAIN_FLAG_MSI indicates the irq domain corresponds to
  an MSI domain
- IRQ_DOMAIN_FLAG_MSI_REMAP indicates the irq domain has MSI
  remapping capabilities.

Those values will be useful to check all MSI irq domains have
MSI remapping support when assessing the safety of IRQ assignment
to a guest.

irq_domain_hierarchical_is_msi_remap() allows to check if an
irq domain or any parent implements MSI remapping.
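
For illustration, the producer side: an MSI controller with hardware
remapping (an ITS-like driver, say) advertises the capability on its inner
domain roughly as below. Only the flag comes from this patch; the function,
ops and variable names here are assumptions:

	struct irq_domain *inner;

	/* its_like_domain_ops / its_data / parent_domain are placeholders */
	inner = irq_domain_create_tree(fwnode, &its_like_domain_ops, its_data);
	if (!inner)
		return -ENOMEM;

	inner->parent = parent_domain;
	/* picked up by irq_domain_check_msi_remap() in the following patches */
	inner->flags |= IRQ_DOMAIN_FLAG_MSI_REMAP;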

Signed-off-by: Eric Auger <eric.au...@redhat.com>
Reviewed-by: Marc Zyngier <marc.zyng...@arm.com>


Reviewed-by: Tomasz Nowicki <tomasz.nowi...@caviumnetworks.com>

Thanks,
Tomasz



---

v7 -> v8:
- remove h variable in irq_domain_hierarchical_is_msi_remap
- add Marc's R-b

v6:
- add IRQ_DOMAIN_FLAG_MSI as suggested by Marc
- add irq_domain_is_msi, irq_domain_is_msi_remap and
  irq_domain_hierarchical_is_msi_remap
---
 include/linux/irqdomain.h | 35 +++
 kernel/irq/irqdomain.c| 14 ++
 2 files changed, 49 insertions(+)

diff --git a/include/linux/irqdomain.h b/include/linux/irqdomain.h
index ffb8460..bc2f571 100644
--- a/include/linux/irqdomain.h
+++ b/include/linux/irqdomain.h
@@ -183,6 +183,12 @@ enum {
/* Irq domain is an IPI domain with single virq */
IRQ_DOMAIN_FLAG_IPI_SINGLE  = (1 << 3),

+   /* Irq domain implements MSIs */
+   IRQ_DOMAIN_FLAG_MSI = (1 << 4),
+
+   /* Irq domain implements MSI remapping */
+   IRQ_DOMAIN_FLAG_MSI_REMAP   = (1 << 5),
+
/*
 * Flags starting from IRQ_DOMAIN_FLAG_NONCORE are reserved
 * for implementation specific purposes and ignored by the
@@ -446,6 +452,19 @@ static inline bool irq_domain_is_ipi_single(struct 
irq_domain *domain)
 {
return domain->flags & IRQ_DOMAIN_FLAG_IPI_SINGLE;
 }
+
+static inline bool irq_domain_is_msi(struct irq_domain *domain)
+{
+   return domain->flags & IRQ_DOMAIN_FLAG_MSI;
+}
+
+static inline bool irq_domain_is_msi_remap(struct irq_domain *domain)
+{
+   return domain->flags & IRQ_DOMAIN_FLAG_MSI_REMAP;
+}
+
+extern bool irq_domain_hierarchical_is_msi_remap(struct irq_domain *domain);
+
 #else  /* CONFIG_IRQ_DOMAIN_HIERARCHY */
 static inline void irq_domain_activate_irq(struct irq_data *data) { }
 static inline void irq_domain_deactivate_irq(struct irq_data *data) { }
@@ -477,6 +496,22 @@ static inline bool irq_domain_is_ipi_single(struct 
irq_domain *domain)
 {
return false;
 }
+
+static inline bool irq_domain_is_msi(struct irq_domain *domain)
+{
+   return false;
+}
+
+static inline bool irq_domain_is_msi_remap(struct irq_domain *domain)
+{
+   return false;
+}
+
+static inline bool
+irq_domain_hierarchical_is_msi_remap(struct irq_domain *domain)
+{
+   return false;
+}
 #endif /* CONFIG_IRQ_DOMAIN_HIERARCHY */

 #else /* CONFIG_IRQ_DOMAIN */
diff --git a/kernel/irq/irqdomain.c b/kernel/irq/irqdomain.c
index 8c0a0ae..876e131 100644
--- a/kernel/irq/irqdomain.c
+++ b/kernel/irq/irqdomain.c
@@ -1392,6 +1392,20 @@ static void irq_domain_check_hierarchy(struct irq_domain 
*domain)
if (domain->ops->alloc)
domain->flags |= IRQ_DOMAIN_FLAG_HIERARCHY;
 }
+
+/**
+ * irq_domain_hierarchical_is_msi_remap - Check if the domain or any
+ * parent has MSI remapping support
+ * @domain: domain pointer
+ */
+bool irq_domain_hierarchical_is_msi_remap(struct irq_domain *domain)
+{
+   for (; domain; domain = domain->parent) {
+   if (irq_domain_is_msi_remap(domain))
+   return true;
+   }
+   return false;
+}
 #else  /* CONFIG_IRQ_DOMAIN_HIERARCHY */
 /**
  * irq_domain_get_irq_data - Get irq_data associated with @virq and @domain


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v8 11/18] iommu/arm-smmu-v3: Implement reserved region get/put callbacks

2017-01-17 Thread Tomasz Nowicki

On 11.01.2017 10:41, Eric Auger wrote:

iommu/arm-smmu: Implement reserved region get/put callbacks

The get() populates the list with the MSI IOVA reserved window.

At the moment an arbitrary MSI IOVA window is set at 0x8000000
of size 1MB. This will allow to report those info in iommu-group
sysfs.

Signed-off-by: Eric Auger <eric.au...@redhat.com>
Acked-by: Will Deacon <will.dea...@arm.com>


Reviewed-by: Tomasz Nowicki <tomasz.nowi...@caviumnetworks.com>

Thanks,
Tomasz



---

v7 -> v8:
- added Will's A-b

v4: creation
---
 drivers/iommu/arm-smmu-v3.c | 28 
 1 file changed, 28 insertions(+)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index 4d6ec44..6c4111c 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -412,6 +412,9 @@
 /* High-level queue structures */
 #define ARM_SMMU_POLL_TIMEOUT_US   100

+#define MSI_IOVA_BASE			0x8000000
+#define MSI_IOVA_LENGTH			0x100000
+
 static bool disable_bypass;
 module_param_named(disable_bypass, disable_bypass, bool, S_IRUGO);
 MODULE_PARM_DESC(disable_bypass,
@@ -1883,6 +1886,29 @@ static int arm_smmu_of_xlate(struct device *dev, struct 
of_phandle_args *args)
return iommu_fwspec_add_ids(dev, args->args, 1);
 }

+static void arm_smmu_get_resv_regions(struct device *dev,
+ struct list_head *head)
+{
+   struct iommu_resv_region *region;
+   int prot = IOMMU_WRITE | IOMMU_NOEXEC | IOMMU_MMIO;
+
+   region = iommu_alloc_resv_region(MSI_IOVA_BASE, MSI_IOVA_LENGTH,
+prot, IOMMU_RESV_MSI);
+   if (!region)
+   return;
+
+   list_add_tail(&region->list, head);
+}
+
+static void arm_smmu_put_resv_regions(struct device *dev,
+ struct list_head *head)
+{
+   struct iommu_resv_region *entry, *next;
+
+   list_for_each_entry_safe(entry, next, head, list)
+   kfree(entry);
+}
+
 static struct iommu_ops arm_smmu_ops = {
.capable= arm_smmu_capable,
.domain_alloc   = arm_smmu_domain_alloc,
@@ -1898,6 +1924,8 @@ static int arm_smmu_of_xlate(struct device *dev, struct 
of_phandle_args *args)
.domain_get_attr= arm_smmu_domain_get_attr,
.domain_set_attr= arm_smmu_domain_set_attr,
.of_xlate   = arm_smmu_of_xlate,
+   .get_resv_regions   = arm_smmu_get_resv_regions,
+   .put_resv_regions   = arm_smmu_put_resv_regions,
.pgsize_bitmap  = -1UL, /* Restricted during device attach */
 };



___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v8 14/18] irqdomain: irq_domain_check_msi_remap

2017-01-17 Thread Tomasz Nowicki

On 17.01.2017 14:53, Auger Eric wrote:

Hi Tomasz,

On 17/01/2017 14:40, Tomasz Nowicki wrote:

On 11.01.2017 10:41, Eric Auger wrote:

This new function checks whether all MSI irq domains
implement IRQ remapping. This is useful to understand
whether VFIO passthrough is safe with respect to interrupts.

On ARM typically an MSI controller can sit downstream
to the IOMMU without preventing VFIO passthrough.
As such any assigned device can write into the MSI doorbell.
In case the MSI controller implements IRQ remapping, assigned
devices will not be able to trigger interrupts towards the
host. On the contrary, the assignment must be emphasized as
unsafe with respect to interrupts.

Signed-off-by: Eric Auger <eric.au...@redhat.com>
Reviewed-by: Marc Zyngier <marc.zyng...@arm.com>

---
v7 -> v8:
- remove goto in irq_domain_check_msi_remap
- Added Marc's R-b

v5 -> v6:
- use irq_domain_hierarchical_is_msi_remap()
- comment rewording

v4 -> v5:
- Handle DOMAIN_BUS_FSL_MC_MSI domains
- Check parents
---
 include/linux/irqdomain.h |  1 +
 kernel/irq/irqdomain.c| 22 ++
 2 files changed, 23 insertions(+)

diff --git a/include/linux/irqdomain.h b/include/linux/irqdomain.h
index bc2f571..188eced 100644
--- a/include/linux/irqdomain.h
+++ b/include/linux/irqdomain.h
@@ -222,6 +222,7 @@ struct irq_domain *irq_domain_add_legacy(struct
device_node *of_node,
  void *host_data);
 extern struct irq_domain *irq_find_matching_fwspec(struct irq_fwspec
*fwspec,
enum irq_domain_bus_token bus_token);
+extern bool irq_domain_check_msi_remap(void);
 extern void irq_set_default_host(struct irq_domain *host);
 extern int irq_domain_alloc_descs(int virq, unsigned int nr_irqs,
   irq_hw_number_t hwirq, int node,
diff --git a/kernel/irq/irqdomain.c b/kernel/irq/irqdomain.c
index 876e131..d889751 100644
--- a/kernel/irq/irqdomain.c
+++ b/kernel/irq/irqdomain.c
@@ -278,6 +278,28 @@ struct irq_domain
*irq_find_matching_fwspec(struct irq_fwspec *fwspec,
 EXPORT_SYMBOL_GPL(irq_find_matching_fwspec);

 /**
+ * irq_domain_check_msi_remap - Check whether all MSI
+ * irq domains implement IRQ remapping
+ */
+bool irq_domain_check_msi_remap(void)
+{
+struct irq_domain *h;
+bool ret = true;
+
+mutex_lock(&irq_domain_mutex);
+list_for_each_entry(h, &irq_domain_list, link) {
+if (irq_domain_is_msi(h) &&
+!irq_domain_hierarchical_is_msi_remap(h)) {
+ret = false;
+break;
+}
+}
+mutex_unlock(&irq_domain_mutex);
+return ret;
+}


The above function returns true even when there are no MSI irq domains. Is
that intentional?

From the VFIO integration point of view this is what we want. If there
is no MSI controller in the system, we have no vulnerability with
respect to IRQ assignment and we consider the system as safe. If
requested I can add a comment?



I see. Yes, a comment would be helpful then :) Thanks!

Tomasz
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v8 14/18] irqdomain: irq_domain_check_msi_remap

2017-01-17 Thread Tomasz Nowicki

On 11.01.2017 10:41, Eric Auger wrote:

This new function checks whether all MSI irq domains
implement IRQ remapping. This is useful to understand
whether VFIO passthrough is safe with respect to interrupts.

On ARM typically an MSI controller can sit downstream
to the IOMMU without preventing VFIO passthrough.
As such any assigned device can write into the MSI doorbell.
In case the MSI controller implements IRQ remapping, assigned
devices will not be able to trigger interrupts towards the
host. On the contrary, the assignment must be emphasized as
unsafe with respect to interrupts.

Signed-off-by: Eric Auger 
Reviewed-by: Marc Zyngier 

---
v7 -> v8:
- remove goto in irq_domain_check_msi_remap
- Added Marc's R-b

v5 -> v6:
- use irq_domain_hierarchical_is_msi_remap()
- comment rewording

v4 -> v5:
- Handle DOMAIN_BUS_FSL_MC_MSI domains
- Check parents
---
 include/linux/irqdomain.h |  1 +
 kernel/irq/irqdomain.c| 22 ++
 2 files changed, 23 insertions(+)

diff --git a/include/linux/irqdomain.h b/include/linux/irqdomain.h
index bc2f571..188eced 100644
--- a/include/linux/irqdomain.h
+++ b/include/linux/irqdomain.h
@@ -222,6 +222,7 @@ struct irq_domain *irq_domain_add_legacy(struct device_node 
*of_node,
 void *host_data);
 extern struct irq_domain *irq_find_matching_fwspec(struct irq_fwspec *fwspec,
   enum irq_domain_bus_token 
bus_token);
+extern bool irq_domain_check_msi_remap(void);
 extern void irq_set_default_host(struct irq_domain *host);
 extern int irq_domain_alloc_descs(int virq, unsigned int nr_irqs,
  irq_hw_number_t hwirq, int node,
diff --git a/kernel/irq/irqdomain.c b/kernel/irq/irqdomain.c
index 876e131..d889751 100644
--- a/kernel/irq/irqdomain.c
+++ b/kernel/irq/irqdomain.c
@@ -278,6 +278,28 @@ struct irq_domain *irq_find_matching_fwspec(struct 
irq_fwspec *fwspec,
 EXPORT_SYMBOL_GPL(irq_find_matching_fwspec);

 /**
+ * irq_domain_check_msi_remap - Check whether all MSI
+ * irq domains implement IRQ remapping
+ */
+bool irq_domain_check_msi_remap(void)
+{
+   struct irq_domain *h;
+   bool ret = true;
+
+   mutex_lock(&irq_domain_mutex);
+   list_for_each_entry(h, &irq_domain_list, link) {
+   if (irq_domain_is_msi(h) &&
+   !irq_domain_hierarchical_is_msi_remap(h)) {
+   ret = false;
+   break;
+   }
+   }
+   mutex_unlock(&irq_domain_mutex);
+   return ret;
+}


The above function returns true even when there are no MSI irq domains. Is 
that intentional?


Thanks,
Tomasz
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v8 10/18] iommu/arm-smmu: Implement reserved region get/put callbacks

2017-01-17 Thread Tomasz Nowicki

On 11.01.2017 10:41, Eric Auger wrote:

The get() populates the list with the MSI IOVA reserved window.

At the moment an arbitrary MSI IOVA window is set at 0x8000000
of size 1MB. This will allow to report those info in iommu-group
sysfs.
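
For reference, the window defined below works out as follows (simple
arithmetic, assuming 4K pages):

	/* MSI_IOVA_BASE   = 0x8000000 (128 MiB)
	 * MSI_IOVA_LENGTH = 0x100000  (1 MiB)
	 * => reserved IOVA range [0x8000000, 0x80fffff], i.e.
	 *    0x100000 / 0x1000 = 256 page-sized MSI doorbell slots. */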

Signed-off-by: Eric Auger <eric.au...@redhat.com>


Reviewed-by: Tomasz Nowicki <tomasz.nowi...@caviumnetworks.com>

Thanks,
Tomasz



---

v3 -> v4:
- do not handle PCI host bridge windows anymore
- encode prot

RFC v2 -> v3:
- use existing get/put_resv_regions

RFC v1 -> v2:
- use defines for MSI IOVA base and length
---
 drivers/iommu/arm-smmu.c | 28 
 1 file changed, 28 insertions(+)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index a60cded..a354572 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -281,6 +281,9 @@ enum arm_smmu_s2cr_privcfg {

 #define FSYNR0_WNR (1 << 4)

+#define MSI_IOVA_BASE			0x8000000
+#define MSI_IOVA_LENGTH			0x100000
+
 static int force_stage;
 module_param(force_stage, int, S_IRUGO);
 MODULE_PARM_DESC(force_stage,
@@ -1549,6 +1552,29 @@ static int arm_smmu_of_xlate(struct device *dev, struct 
of_phandle_args *args)
return iommu_fwspec_add_ids(dev, &fwid, 1);
 }

+static void arm_smmu_get_resv_regions(struct device *dev,
+ struct list_head *head)
+{
+   struct iommu_resv_region *region;
+   int prot = IOMMU_WRITE | IOMMU_NOEXEC | IOMMU_MMIO;
+
+   region = iommu_alloc_resv_region(MSI_IOVA_BASE, MSI_IOVA_LENGTH,
+prot, IOMMU_RESV_MSI);
+   if (!region)
+   return;
+
+   list_add_tail(&region->list, head);
+}
+
+static void arm_smmu_put_resv_regions(struct device *dev,
+ struct list_head *head)
+{
+   struct iommu_resv_region *entry, *next;
+
+   list_for_each_entry_safe(entry, next, head, list)
+   kfree(entry);
+}
+
 static struct iommu_ops arm_smmu_ops = {
.capable= arm_smmu_capable,
.domain_alloc   = arm_smmu_domain_alloc,
@@ -1564,6 +1590,8 @@ static int arm_smmu_of_xlate(struct device *dev, struct 
of_phandle_args *args)
.domain_get_attr= arm_smmu_domain_get_attr,
.domain_set_attr= arm_smmu_domain_set_attr,
.of_xlate   = arm_smmu_of_xlate,
+   .get_resv_regions   = arm_smmu_get_resv_regions,
+   .put_resv_regions   = arm_smmu_put_resv_regions,
.pgsize_bitmap  = -1UL, /* Restricted during device attach */
 };



___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v8 06/18] iommu: iommu_get_group_resv_regions

2017-01-17 Thread Tomasz Nowicki

On 11.01.2017 10:41, Eric Auger wrote:

Introduce iommu_get_group_resv_regions whose role consists in
enumerating all devices from the group and collecting their
reserved regions. The list is sorted and overlaps between
regions of the same type are handled by merging the regions.
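
A sketch of the intended caller (e.g. the sysfs show path added later in
the series); the printing is purely illustrative and the caller owns
freeing the copied entries:

	struct iommu_resv_region *region, *next;
	LIST_HEAD(group_resv_regions);

	if (!iommu_get_group_resv_regions(group, &group_resv_regions)) {
		list_for_each_entry(region, &group_resv_regions, list)
			pr_info("resv region: start %pa length 0x%zx type %d\n",
				&region->start, region->length, region->type);
	}

	list_for_each_entry_safe(region, next, &group_resv_regions, list)
		kfree(region);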

Signed-off-by: Eric Auger <eric.au...@redhat.com>


Reviewed-by: Tomasz Nowicki <tomasz.nowi...@caviumnetworks.com>

Thanks,
Tomasz



---
v6 -> v7:
- avoid merge of regions of different type

v3 -> v4:
- take the iommu_group lock in iommu_get_group_resv_regions
- the list now is sorted and overlaps are checked

NOTE:
- we do not move list elements from device to group list since
  the iommu_put_resv_regions() could not be called.
- at the moment I did not introduce any iommu_put_group_resv_regions
  since it simply consists in voiding/freeing the list
---
 drivers/iommu/iommu.c | 98 +++
 include/linux/iommu.h |  8 +
 2 files changed, 106 insertions(+)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 41c1906..640056b 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -133,6 +133,104 @@ static ssize_t iommu_group_show_name(struct iommu_group 
*group, char *buf)
return sprintf(buf, "%s\n", group->name);
 }

+/**
+ * iommu_insert_resv_region - Insert a new region in the
+ * list of reserved regions.
+ * @new: new region to insert
+ * @regions: list of regions
+ *
+ * The new element is sorted by address with respect to the other
+ * regions of the same type. In case it overlaps with another
+ * region of the same type, regions are merged. In case it
+ * overlaps with another region of different type, regions are
+ * not merged.
+ */
+static int iommu_insert_resv_region(struct iommu_resv_region *new,
+   struct list_head *regions)
+{
+   struct iommu_resv_region *region;
+   phys_addr_t start = new->start;
+   phys_addr_t end = new->start + new->length - 1;
+   struct list_head *pos = regions->next;
+
+   while (pos != regions) {
+   struct iommu_resv_region *entry =
+   list_entry(pos, struct iommu_resv_region, list);
+   phys_addr_t a = entry->start;
+   phys_addr_t b = entry->start + entry->length - 1;
+   int type = entry->type;
+
+   if (end < a) {
+   goto insert;
+   } else if (start > b) {
+   pos = pos->next;
+   } else if ((start >= a) && (end <= b)) {
+   if (new->type == type)
+   goto done;
+   else
+   pos = pos->next;
+   } else {
+   if (new->type == type) {
+   phys_addr_t new_start = min(a, start);
+   phys_addr_t new_end = max(b, end);
+
+   list_del(&entry->list);
+   entry->start = new_start;
+   entry->length = new_end - new_start + 1;
+   iommu_insert_resv_region(entry, regions);
+   } else {
+   pos = pos->next;
+   }
+   }
+   }
+insert:
+   region = iommu_alloc_resv_region(new->start, new->length,
+new->prot, new->type);
+   if (!region)
+   return -ENOMEM;
+
+   list_add_tail(&region->list, pos);
+done:
+   return 0;
+}
+
+static int
+iommu_insert_device_resv_regions(struct list_head *dev_resv_regions,
+struct list_head *group_resv_regions)
+{
+   struct iommu_resv_region *entry;
+   int ret;
+
+   list_for_each_entry(entry, dev_resv_regions, list) {
+   ret = iommu_insert_resv_region(entry, group_resv_regions);
+   if (ret)
+   break;
+   }
+   return ret;
+}
+
+int iommu_get_group_resv_regions(struct iommu_group *group,
+struct list_head *head)
+{
+   struct iommu_device *device;
+   int ret = 0;
+
+   mutex_lock(&group->mutex);
+   list_for_each_entry(device, &group->devices, list) {
+   struct list_head dev_resv_regions;
+
+   INIT_LIST_HEAD(&dev_resv_regions);
+   iommu_get_resv_regions(device->dev, &dev_resv_regions);
+   ret = iommu_insert_device_resv_regions(&dev_resv_regions, head);
+   iommu_put_resv_regions(device->dev, &dev_resv_regions);
+   if (ret)
+   break;
+   }
+   mutex_unlock(&group->mutex);
+   return ret;
+}
+EXPORT_SYMBOL_GPL(iommu_get_group_resv_regions);
+
 static IOMMU_GROUP_ATTR(name, S_IRUGO, iommu_group_show_name, NULL);

 static 

Re: [PATCH v8 05/18] iommu: Only map direct mapped regions

2017-01-17 Thread Tomasz Nowicki

On 11.01.2017 10:41, Eric Auger wrote:

As we introduced new reserved region types which do not require
mapping, let's make sure we only map direct mapped regions.

Signed-off-by: Eric Auger <eric.au...@redhat.com>


Reviewed-by: Tomasz Nowicki <tomasz.nowi...@caviumnetworks.com>

Thanks,
Tomasz



---

v3 -> v4:
- use region's type and reword commit message and title
---
 drivers/iommu/iommu.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 927878d..41c1906 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -343,6 +343,9 @@ static int iommu_group_create_direct_mappings(struct 
iommu_group *group,
start = ALIGN(entry->start, pg_size);
end   = ALIGN(entry->start + entry->length, pg_size);

+   if (entry->type != IOMMU_RESV_DIRECT)
+   continue;
+
for (addr = start; addr < end; addr += pg_size) {
phys_addr_t phys_addr;



___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v8 04/18] iommu: iommu_alloc_resv_region

2017-01-17 Thread Tomasz Nowicki

On 11.01.2017 10:41, Eric Auger wrote:

Introduce a new helper serving the purpose to allocate a reserved
region. This will be used in iommu driver implementing reserved
region callbacks.

Signed-off-by: Eric Auger <eric.au...@redhat.com>


Reviewed-by: Tomasz Nowicki <tomasz.nowi...@caviumnetworks.com>

Thanks,
Tomasz



---

v3 -> v4:
- add INIT_LIST_HEAD(>list)
- use int for prot param and add int type param
- remove implementation outside of CONFIG_IOMMU_API
---
 drivers/iommu/iommu.c | 18 ++
 include/linux/iommu.h |  2 ++
 2 files changed, 20 insertions(+)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 1cee5c3..927878d 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1575,6 +1575,24 @@ void iommu_put_resv_regions(struct device *dev, struct 
list_head *list)
ops->put_resv_regions(dev, list);
 }

+struct iommu_resv_region *iommu_alloc_resv_region(phys_addr_t start,
+ size_t length,
+ int prot, int type)
+{
+   struct iommu_resv_region *region;
+
+   region = kzalloc(sizeof(*region), GFP_KERNEL);
+   if (!region)
+   return NULL;
+
+   INIT_LIST_HEAD(&region->list);
+   region->start = start;
+   region->length = length;
+   region->prot = prot;
+   region->type = type;
+   return region;
+}
+
 /* Request that a device is direct mapped by the IOMMU */
 int iommu_request_dm_for_dev(struct device *dev)
 {
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 233a6bf..f6bb55d3 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -244,6 +244,8 @@ extern void iommu_set_fault_handler(struct iommu_domain 
*domain,
 extern void iommu_get_resv_regions(struct device *dev, struct list_head *list);
 extern void iommu_put_resv_regions(struct device *dev, struct list_head *list);
 extern int iommu_request_dm_for_dev(struct device *dev);
+extern struct iommu_resv_region *
+iommu_alloc_resv_region(phys_addr_t start, size_t length, int prot, int type);

 extern int iommu_attach_group(struct iommu_domain *domain,
  struct iommu_group *group);


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v8 02/18] iommu: Rename iommu_dm_regions into iommu_resv_regions

2017-01-17 Thread Tomasz Nowicki

On 11.01.2017 10:41, Eric Auger wrote:

We want to extend the callbacks used for dm regions and
use them for reserved regions. Reserved regions can be
- directly mapped regions
- regions that cannot be iommu mapped (PCI host bridge windows, ...)
- MSI regions (because they belong to another address space or because
  they are not translated by the IOMMU and need special handling)

So let's rename the struct and also the callbacks.
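
For orientation, the renamed descriptor the callbacks operate on looks
roughly like this (field names are taken from the diffs in this series;
the exact set of type constants is introduced in a later patch, so treat
IOMMU_RESV_RESERVED as an assumption):

struct iommu_resv_region {
	struct list_head	list;
	phys_addr_t		start;
	size_t			length;
	int			prot;	/* IOMMU_READ/WRITE/NOEXEC/MMIO */
	int			type;	/* IOMMU_RESV_DIRECT, IOMMU_RESV_MSI, ... */
};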

Signed-off-by: Eric Auger <eric.au...@redhat.com>
Acked-by: Robin Murphy <robin.mur...@arm.com>


Reviewed-by: Tomasz Nowicki <tomasz.nowi...@caviumnetworks.com>

Thanks,
Tomasz



---

v3 -> v4:
- add Robin's A-b
---
 drivers/iommu/amd_iommu.c | 20 ++--
 drivers/iommu/iommu.c | 22 +++---
 include/linux/iommu.h | 29 +++--
 3 files changed, 36 insertions(+), 35 deletions(-)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index 3ef0f42..f7a024f 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -3161,8 +3161,8 @@ static bool amd_iommu_capable(enum iommu_cap cap)
return false;
 }

-static void amd_iommu_get_dm_regions(struct device *dev,
-struct list_head *head)
+static void amd_iommu_get_resv_regions(struct device *dev,
+  struct list_head *head)
 {
struct unity_map_entry *entry;
int devid;
@@ -3172,7 +3172,7 @@ static void amd_iommu_get_dm_regions(struct device *dev,
return;

list_for_each_entry(entry, &amd_iommu_unity_map, list) {
-   struct iommu_dm_region *region;
+   struct iommu_resv_region *region;

if (devid < entry->devid_start || devid > entry->devid_end)
continue;
@@ -3195,18 +3195,18 @@ static void amd_iommu_get_dm_regions(struct device *dev,
}
 }

-static void amd_iommu_put_dm_regions(struct device *dev,
+static void amd_iommu_put_resv_regions(struct device *dev,
 struct list_head *head)
 {
-   struct iommu_dm_region *entry, *next;
+   struct iommu_resv_region *entry, *next;

list_for_each_entry_safe(entry, next, head, list)
kfree(entry);
 }

-static void amd_iommu_apply_dm_region(struct device *dev,
+static void amd_iommu_apply_resv_region(struct device *dev,
  struct iommu_domain *domain,
- struct iommu_dm_region *region)
+ struct iommu_resv_region *region)
 {
struct dma_ops_domain *dma_dom = to_dma_ops_domain(to_pdomain(domain));
unsigned long start, end;
@@ -3230,9 +3230,9 @@ static void amd_iommu_apply_dm_region(struct device *dev,
.add_device = amd_iommu_add_device,
.remove_device = amd_iommu_remove_device,
.device_group = amd_iommu_device_group,
-   .get_dm_regions = amd_iommu_get_dm_regions,
-   .put_dm_regions = amd_iommu_put_dm_regions,
-   .apply_dm_region = amd_iommu_apply_dm_region,
+   .get_resv_regions = amd_iommu_get_resv_regions,
+   .put_resv_regions = amd_iommu_put_resv_regions,
+   .apply_resv_region = amd_iommu_apply_resv_region,
.pgsize_bitmap  = AMD_IOMMU_PGSIZES,
 };

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index dbe7f65..1cee5c3 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -318,7 +318,7 @@ static int iommu_group_create_direct_mappings(struct 
iommu_group *group,
  struct device *dev)
 {
struct iommu_domain *domain = group->default_domain;
-   struct iommu_dm_region *entry;
+   struct iommu_resv_region *entry;
struct list_head mappings;
unsigned long pg_size;
int ret = 0;
@@ -331,14 +331,14 @@ static int iommu_group_create_direct_mappings(struct 
iommu_group *group,
pg_size = 1UL << __ffs(domain->pgsize_bitmap);
INIT_LIST_HEAD(&mappings);

-   iommu_get_dm_regions(dev, &mappings);
+   iommu_get_resv_regions(dev, &mappings);

/* We need to consider overlapping regions for different devices */
list_for_each_entry(entry, &mappings, list) {
dma_addr_t start, end, addr;

-   if (domain->ops->apply_dm_region)
-   domain->ops->apply_dm_region(dev, domain, entry);
+   if (domain->ops->apply_resv_region)
+   domain->ops->apply_resv_region(dev, domain, entry);

start = ALIGN(entry->start, pg_size);
end   = ALIGN(entry->start + entry->length, pg_size);
@@ -358,7 +358,7 @@ static int iommu_group_create_direct_mappings(struct 
iommu_group *group,
}

 out:
-   iommu_put_dm_regions(dev, &mappings);
+   iommu_put_resv_regions(dev, &mappings);

return ret;
 }
@@ -1559,20 +1559,20 @@ int iommu_domain_set

Re: [PATCH v8 01/18] iommu/dma: Allow MSI-only cookies

2017-01-17 Thread Tomasz Nowicki

On 11.01.2017 10:41, Eric Auger wrote:

From: Robin Murphy <robin.mur...@arm.com>

IOMMU domain users such as VFIO face a similar problem to DMA API ops
with regard to mapping MSI messages in systems where the MSI write is
subject to IOMMU translation. With the relevant infrastructure now in
place for managed DMA domains, it's actually really simple for other
users to piggyback off that and reap the benefits without giving up
their own IOVA management, and without having to reinvent their own
wheel in the MSI layer.

Allow such users to opt into automatic MSI remapping by dedicating a
region of their IOVA space to a managed cookie, and extend the mapping
routine to implement a trivial linear allocator in such cases, to avoid
the needless overhead of a full-blown IOVA domain.
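
A sketch of the intended opt-in from a VFIO-style user (the reserved-region
lookup providing resv_msi / resv_msi_base is an assumption for illustration):

	if (resv_msi) {
		/* dedicate the reserved MSI window to the trivial allocator */
		ret = iommu_get_msi_cookie(domain, resv_msi_base);
		if (ret && ret != -ENODEV)
			goto out_detach;
	}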

Signed-off-by: Robin Murphy <robin.mur...@arm.com>


Reviewed-by: Tomasz Nowicki <tomasz.nowi...@caviumnetworks.com>

Thanks,
Tomasz


---
 drivers/iommu/dma-iommu.c | 119 +-
 include/linux/dma-iommu.h |   6 +++
 2 files changed, 102 insertions(+), 23 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 2db0d64..de41ead 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -37,15 +37,50 @@ struct iommu_dma_msi_page {
phys_addr_t phys;
 };

+enum iommu_dma_cookie_type {
+   IOMMU_DMA_IOVA_COOKIE,
+   IOMMU_DMA_MSI_COOKIE,
+};
+
 struct iommu_dma_cookie {
-   struct iova_domain  iovad;
-   struct list_headmsi_page_list;
-   spinlock_t  msi_lock;
+   enum iommu_dma_cookie_type  type;
+   union {
+   /* Full allocator for IOMMU_DMA_IOVA_COOKIE */
+   struct iova_domain  iovad;
+   /* Trivial linear page allocator for IOMMU_DMA_MSI_COOKIE */
+   dma_addr_t  msi_iova;
+   };
+   struct list_headmsi_page_list;
+   spinlock_t  msi_lock;
 };

+static inline size_t cookie_msi_granule(struct iommu_dma_cookie *cookie)
+{
+   if (cookie->type == IOMMU_DMA_IOVA_COOKIE)
+   return cookie->iovad.granule;
+   return PAGE_SIZE;
+}
+
 static inline struct iova_domain *cookie_iovad(struct iommu_domain *domain)
 {
-   return &((struct iommu_dma_cookie *)domain->iova_cookie)->iovad;
+   struct iommu_dma_cookie *cookie = domain->iova_cookie;
+
+   if (cookie->type == IOMMU_DMA_IOVA_COOKIE)
+   return &cookie->iovad;
+   return NULL;
+}
+
+static struct iommu_dma_cookie *cookie_alloc(enum iommu_dma_cookie_type type)
+{
+   struct iommu_dma_cookie *cookie;
+
+   cookie = kzalloc(sizeof(*cookie), GFP_KERNEL);
+   if (cookie) {
+   spin_lock_init(&cookie->msi_lock);
+   INIT_LIST_HEAD(&cookie->msi_page_list);
+   cookie->type = type;
+   }
+   return cookie;
 }

 int iommu_dma_init(void)
@@ -62,25 +97,53 @@ int iommu_dma_init(void)
  */
 int iommu_get_dma_cookie(struct iommu_domain *domain)
 {
+   if (domain->iova_cookie)
+   return -EEXIST;
+
+   domain->iova_cookie = cookie_alloc(IOMMU_DMA_IOVA_COOKIE);
+   if (!domain->iova_cookie)
+   return -ENOMEM;
+
+   return 0;
+}
+EXPORT_SYMBOL(iommu_get_dma_cookie);
+
+/**
+ * iommu_get_msi_cookie - Acquire just MSI remapping resources
+ * @domain: IOMMU domain to prepare
+ * @base: Start address of IOVA region for MSI mappings
+ *
+ * Users who manage their own IOVA allocation and do not want DMA API support,
+ * but would still like to take advantage of automatic MSI remapping, can use
+ * this to initialise their own domain appropriately. Users should reserve a
+ * contiguous IOVA region, starting at @base, large enough to accommodate the
+ * number of PAGE_SIZE mappings necessary to cover every MSI doorbell address
+ * used by the devices attached to @domain.
+ */
+int iommu_get_msi_cookie(struct iommu_domain *domain, dma_addr_t base)
+{
struct iommu_dma_cookie *cookie;

+   if (domain->type != IOMMU_DOMAIN_UNMANAGED)
+   return -EINVAL;
+
if (domain->iova_cookie)
return -EEXIST;

-   cookie = kzalloc(sizeof(*cookie), GFP_KERNEL);
+   cookie = cookie_alloc(IOMMU_DMA_MSI_COOKIE);
if (!cookie)
return -ENOMEM;

-   spin_lock_init(&cookie->msi_lock);
-   INIT_LIST_HEAD(&cookie->msi_page_list);
+   cookie->msi_iova = base;
domain->iova_cookie = cookie;
return 0;
 }
-EXPORT_SYMBOL(iommu_get_dma_cookie);
+EXPORT_SYMBOL(iommu_get_msi_cookie);

 /**
  * iommu_put_dma_cookie - Release a domain's DMA mapping resources
- * @domain: IOMMU domain previously prepared by iommu_get_dma_cookie()
+ * @domain: IOMMU domain previously prepared by iommu_get_dma_cookie() or
+ *  iommu_get_msi_cookie()
  *
  * IOMMU drivers should normally call this from their

[PATCH 1/1] iommu/arm-smmu: Fix for ThunderX erratum #27704

2017-01-15 Thread Tomasz Nowicki
The goal of erratum #27704 workaround was to make sure that ASIDs and VMIDs
are unique across all SMMU instances on affected Cavium systems.

Currently, the workaround code partitions ASIDs and VMIDs by increasing
global cavium_smmu_context_count which in turn becomes the base ASID and VMID
value for the given SMMU instance upon the context bank initialization.

For systems with multiple SMMU instances this approach implies the risk
of crossing 8-bit ASID, like for 1-socket CN88xx capable of 4 SMMUv2,
128 context banks each:
SMMU_0 (0-127 ASID RANGE)
SMMU_1 (128-255 ASID RANGE)
SMMU_2 (256-383 ASID RANGE) <--- crossing 8-bit ASID
SMMU_3 (384-511 ASID RANGE) <--- crossing 8-bit ASID

Since now we use 8-bit ASID (SMMU_CBn_TCR2.AS = 0) we effectively misconfigure
ASID[15:8] bits of SMMU_CBn_TTBRm register for SMMU_2/3. Moreover, we still
assume non-zero ASID[15:8] bits upon context invalidation. In the end,
except for devices under SMMU_0/1, all devices under the other SMMUs will fail on
guest power off/on, since we try to invalidate the TLB with a 16-bit ASID while
the entries were actually tagged with an 8-bit, zero-padded value.

This patch adds 16-bit ASID support for stage-1 AArch64 contexts so that
we use ASIDs consistently for all SMMU instances.
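
To make the failure mode concrete, a worked example (not part of the patch):

	/* SMMU_2, context bank 0: cfg->asid = 256 = 0x100
	 *
	 * With SMMU_CBn_TCR2.AS = 0 the TLB entry is effectively tagged
	 * with ASID[7:0] = 0x100 & 0xff = 0x00, yet the TLBIASID issued on
	 * context teardown carries the full 16-bit 0x100, so it matches
	 * nothing and stale translations survive a guest restart. Enabling
	 * TCR2.AS makes both sides use 16-bit ASIDs. */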

Signed-off-by: Tomasz Nowicki <t...@semihalf.com>
Reviewed-by: Robin Murphy <robin.mur...@arm.com>
Reviewed-by: Tirumalesh Chalamarla  <tirumalesh.chalama...@cavium.com>
---
 drivers/iommu/arm-smmu.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index a60cded..476fab9 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -260,6 +260,7 @@ enum arm_smmu_s2cr_privcfg {
 
 #define TTBCR2_SEP_SHIFT   15
 #define TTBCR2_SEP_UPSTREAM(0x7 << TTBCR2_SEP_SHIFT)
+#define TTBCR2_AS  (1 << 4)
 
 #define TTBRn_ASID_SHIFT   48
 
@@ -778,6 +779,8 @@ static void arm_smmu_init_context_bank(struct 
arm_smmu_domain *smmu_domain,
reg = pgtbl_cfg->arm_lpae_s1_cfg.tcr;
reg2 = pgtbl_cfg->arm_lpae_s1_cfg.tcr >> 32;
reg2 |= TTBCR2_SEP_UPSTREAM;
+   if (cfg->fmt == ARM_SMMU_CTX_FMT_AARCH64)
+   reg2 |= TTBCR2_AS;
}
if (smmu->version > ARM_SMMU_V1)
writel_relaxed(reg2, cb_base + ARM_SMMU_CB_TTBCR2);
-- 
2.7.4

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v8 00/18] KVM PCIe/MSI passthrough on ARM/ARM64 and IOVA reserved regions

2017-01-13 Thread Tomasz Nowicki

Hello Eric,

On 11.01.2017 10:41, Eric Auger wrote:

Following LPC discussions, we now report reserved regions through
the iommu-group sysfs reserved_regions attribute file.

Reserved regions are populated through the IOMMU get_resv_region
callback (former get_dm_regions), now implemented by amd-iommu,
intel-iommu and arm-smmu:
- the intel-iommu reports the [0xfee00000 - 0xfeefffff] MSI window
  as a reserved region and RMRR regions as direct-mapped regions.
- the amd-iommu reports device direct mapped regions, the MSI region
  and HT regions.
- the arm-smmu reports the MSI window (arbitrarily located at
  0x8000000 and 1MB large).

Unsafe interrupt assignment is tested by enumerating all MSI irq
domains and checking MSI remapping is supported in the above hierarchy.
This check is done in case we detect the iommu translates MSI
(an IOMMU_RESV_MSI window exists). Otherwise the IRQ remapping
capability is checked at IOMMU level. Obviously this is a defensive
IRQ safety assessment: Assuming there are several MSI controllers
in the system and at least one does not implement IRQ remapping,
the assignment will be considered as unsafe (even if this controller
is not accessible from the assigned devices).

The series first patch stems from Robin's branch:
http://linux-arm.org/git?p=linux-rm.git;a=shortlog;h=refs/heads/iommu/misc

Best Regards

Eric

Git: complete series available at
https://github.com/eauger/linux/tree/v4.10-rc3-reserved-v8


I tested the series on ThunderX with internal 10G VNIC and Intel IXGBE 
NIC. Please feel free to add my:

Tested-by: Tomasz Nowicki <tomasz.nowi...@caviumnetworks.com>

Thanks,
Tomasz

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 1/1] iommu/arm-smmu: Fix for ThunderX erratum #27704

2017-01-13 Thread Tomasz Nowicki

On 12.01.2017 07:41, Tomasz Nowicki wrote:

On 11.01.2017 13:19, Robin Murphy wrote:

On 11/01/17 11:51, Tomasz Nowicki wrote:

The goal of erratum #27704 workaround was to make sure that ASIDs and
VMIDs
are unique across all SMMU instances on affected Cavium systems.

Currently, the workaround code partitions ASIDs and VMIDs by increasing
global cavium_smmu_context_count which in turn becomes the base ASID
and VMID
value for the given SMMU instance upon the context bank initialization.

For systems with multiple SMMU instances this approach implies the risk
of crossing 8-bit ASID, like for CN88xx capable of 4 SMMUv2, 128
context banks each:
SMMU_0 (0-127 ASID RANGE)
SMMU_1 (128-255 ASID RANGE)
SMMU_2 (256-383 ASID RANGE) <--- crossing 8-bit ASID
SMMU_3 (384-511 ASID RANGE) <--- crossing 8-bit ASID


I could swear that at some point in the original discussion it was said
that the TLBs were only shared between pairs of SMMUs, so in fact 0/1
and 2/3 are independent of each other


Indeed TLBs are only shared between pairs of SMMUs but the workaround
makes sure ASIDs are unique across all SMMU instances so we do not have
to bother about SMMUs probe order.

 - out of interest, have you

managed to hit an actual problem in practice or is this patch just by
inspection?


Except for devices under SMMU_0/1, all devices under the other SMMUs will fail on
guest power off/on, since we try to invalidate the TLB with a 16-bit ASID but
the entry is actually tagged with an 8-bit, zero-padded value.



Of course, depending on the SMMUs to probe in the right order isn't
particularly robust, so it's still probably a worthwhile change.


Since we use 8-bit ASID now we effectively misconfigure ASID[15:8]
bits for
SMMU_CBn_TTBRm register. Also, we still use non-zero ASID[15:8] bits
upon context invalidation. This patch adds 16-bit ASID support for
stage-1
AArch64 contexts for Cavium SMMUv2 model so that we use ASIDs
consistently.

Signed-off-by: Tomasz Nowicki <t...@semihalf.com>
---
 drivers/iommu/arm-smmu.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index a60cded..ae8f059 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -260,6 +260,7 @@ enum arm_smmu_s2cr_privcfg {

 #define TTBCR2_SEP_SHIFT15
 #define TTBCR2_SEP_UPSTREAM(0x7 << TTBCR2_SEP_SHIFT)
+#define TTBCR2_AS(1 << 4)

 #define TTBRn_ASID_SHIFT48

@@ -778,6 +779,9 @@ static void arm_smmu_init_context_bank(struct
arm_smmu_domain *smmu_domain,
 reg = pgtbl_cfg->arm_lpae_s1_cfg.tcr;
 reg2 = pgtbl_cfg->arm_lpae_s1_cfg.tcr >> 32;
 reg2 |= TTBCR2_SEP_UPSTREAM;
+if (smmu->model == CAVIUM_SMMUV2 &&


I'd be inclined to say "smmu->version == ARM_SMMU_V2" there, rather than
make it Cavium-specific - we enable 16-bit VMID unconditionally where
supported, so I don't see any reason not to handle 16-bit ASIDs in the
same manner.


I agree, I will enable 16-bit ASID for ARM_SMMU_V2.



Actually, the ARM_SMMU_CTX_FMT_AARCH64 context check is all we need here:

+   if (cfg->fmt == ARM_SMMU_CTX_FMT_AARCH64)
+   reg2 |= TTBCR2_AS;

Thanks,
Tomasz
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 1/1] iommu/arm-smmu: Fix for ThunderX erratum #27704

2017-01-11 Thread Tomasz Nowicki

On 11.01.2017 13:19, Robin Murphy wrote:

On 11/01/17 11:51, Tomasz Nowicki wrote:

The goal of erratum #27704 workaround was to make sure that ASIDs and VMIDs
are unique across all SMMU instances on affected Cavium systems.

Currently, the workaround code partitions ASIDs and VMIDs by increasing
global cavium_smmu_context_count which in turn becomes the base ASID and VMID
value for the given SMMU instance upon the context bank initialization.

For systems with multiple SMMU instances this approach implies the risk
of crossing 8-bit ASID, like for CN88xx capable of 4 SMMUv2, 128 context banks
each:
SMMU_0 (0-127 ASID RANGE)
SMMU_1 (128-255 ASID RANGE)
SMMU_2 (256-383 ASID RANGE) <--- crossing 8-bit ASID
SMMU_3 (384-511 ASID RANGE) <--- crossing 8-bit ASID


I could swear that at some point in the original discussion it was said
that the TLBs were only shared between pairs of SMMUs, so in fact 0/1
and 2/3 are independent of each other


Indeed TLBs are only shared between pairs of SMMUs but the workaround 
makes sure ASIDs are unique across all SMMU instances so we do not have 
to bother about SMMUs probe order.


 - out of interest, have you

managed to hit an actual problem in practice or is this patch just by
inspection?


Except for devices under SMMU_0/1, all devices under the other SMMUs will fail on 
guest power off/on, since we try to invalidate the TLB with a 16-bit ASID but 
the entry is actually tagged with an 8-bit, zero-padded value.




Of course, depending on the SMMUs to probe in the right order isn't
particularly robust, so it's still probably a worthwhile change.


Since we use an 8-bit ASID now, we effectively misconfigure the ASID[15:8]
bits of the SMMU_CBn_TTBRm register. Also, we still use non-zero ASID[15:8]
bits upon context invalidation. This patch adds 16-bit ASID support for
stage-1 AArch64 contexts on the Cavium SMMUv2 model so that ASIDs are used
consistently.

Signed-off-by: Tomasz Nowicki <t...@semihalf.com>
---
 drivers/iommu/arm-smmu.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index a60cded..ae8f059 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -260,6 +260,7 @@ enum arm_smmu_s2cr_privcfg {

 #define TTBCR2_SEP_SHIFT   15
 #define TTBCR2_SEP_UPSTREAM(0x7 << TTBCR2_SEP_SHIFT)
+#define TTBCR2_AS  (1 << 4)

 #define TTBRn_ASID_SHIFT   48

@@ -778,6 +779,9 @@ static void arm_smmu_init_context_bank(struct 
arm_smmu_domain *smmu_domain,
reg = pgtbl_cfg->arm_lpae_s1_cfg.tcr;
reg2 = pgtbl_cfg->arm_lpae_s1_cfg.tcr >> 32;
reg2 |= TTBCR2_SEP_UPSTREAM;
+   if (smmu->model == CAVIUM_SMMUV2 &&


I'd be inclined to say "smmu->version == ARM_SMMU_V2" there, rather than
make it Cavium-specific - we enable 16-bit VMID unconditionally where
supported, so I don't see any reason not to handle 16-bit ASIDs in the
same manner.


I agree, I will enable 16-bit ASID for ARM_SMMU_V2.




+   cfg->fmt == ARM_SMMU_CTX_FMT_AARCH64)
+   reg2 |= TTBCR2_AS;
}
if (smmu->version > ARM_SMMU_V1)
writel_relaxed(reg2, cb_base + ARM_SMMU_CB_TTBCR2);



Either way:

Reviewed-by: Robin Murphy <robin.mur...@arm.com>

Thanks Robin!

Tomasz
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 1/1] iommu/arm-smmu: Fix for ThunderX erratum #27704

2017-01-11 Thread Tomasz Nowicki
The goal of the erratum #27704 workaround was to make sure that ASIDs and
VMIDs are unique across all SMMU instances on affected Cavium systems.

Currently, the workaround code partitions ASIDs and VMIDs by increasing
global cavium_smmu_context_count which in turn becomes the base ASID and VMID
value for the given SMMU instance upon the context bank initialization.

For systems with multiple SMMU instances this approach implies the risk
of crossing the 8-bit ASID boundary, as for CN88xx, which is capable of 4
SMMUv2 instances with 128 context banks each:
SMMU_0 (0-127 ASID RANGE)
SMMU_1 (128-255 ASID RANGE)
SMMU_2 (256-383 ASID RANGE) <--- crossing 8-bit ASID
SMMU_3 (384-511 ASID RANGE) <--- crossing 8-bit ASID

Since we use an 8-bit ASID now, we effectively misconfigure the ASID[15:8]
bits of the SMMU_CBn_TTBRm register. Also, we still use non-zero ASID[15:8]
bits upon context invalidation. This patch adds 16-bit ASID support for
stage-1 AArch64 contexts on the Cavium SMMUv2 model so that ASIDs are used
consistently.

Signed-off-by: Tomasz Nowicki <t...@semihalf.com>
---
 drivers/iommu/arm-smmu.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index a60cded..ae8f059 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -260,6 +260,7 @@ enum arm_smmu_s2cr_privcfg {
 
 #define TTBCR2_SEP_SHIFT   15
 #define TTBCR2_SEP_UPSTREAM(0x7 << TTBCR2_SEP_SHIFT)
+#define TTBCR2_AS  (1 << 4)
 
 #define TTBRn_ASID_SHIFT   48
 
@@ -778,6 +779,9 @@ static void arm_smmu_init_context_bank(struct 
arm_smmu_domain *smmu_domain,
reg = pgtbl_cfg->arm_lpae_s1_cfg.tcr;
reg2 = pgtbl_cfg->arm_lpae_s1_cfg.tcr >> 32;
reg2 |= TTBCR2_SEP_UPSTREAM;
+   if (smmu->model == CAVIUM_SMMUV2 &&
+   cfg->fmt == ARM_SMMU_CTX_FMT_AARCH64)
+   reg2 |= TTBCR2_AS;
}
if (smmu->version > ARM_SMMU_V1)
writel_relaxed(reg2, cb_base + ARM_SMMU_CB_TTBCR2);
-- 
2.7.4

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v7 00/16] ACPI IORT ARM SMMU support

2016-11-16 Thread Tomasz Nowicki

On 09.11.2016 15:19, Lorenzo Pieralisi wrote:

This patch series is v7 of a previous posting:

https://lkml.org/lkml/2016/10/18/506



[...]



The ACPI IORT table provides information that allows instantiating
ARM SMMU devices and carrying out id mappings between components on
ARM based systems (devices, IOMMUs, interrupt controllers).

http://infocenter.arm.com/help/topic/com.arm.doc.den0049b/DEN0049B_IO_Remapping_Table.pdf

Building on basic IORT support, this patchset enables ARM SMMU support
on ACPI systems.

Most of the code is aimed at building the required generic ACPI
infrastructure to create and enable IOMMU components and to bring
the IOMMU infrastructure for ACPI on par with DT, which is going to
make future ARM SMMU components easier to integrate.
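
As background for how the ID mappings mentioned above are used, here is a
rough sketch based on my reading of the IORT specification rather than on
this series (struct and function names are illustrative): a single IORT ID
mapping translates an input ID, e.g. a PCI requester ID, into an output ID,
e.g. an SMMU StreamID, for the node named by its output reference.

#include <stdbool.h>
#include <stdint.h>

/* Simplified view of one IORT ID mapping entry. */
struct id_mapping {
	uint32_t input_base;	/* first input ID covered by the mapping */
	uint32_t number_of_ids;	/* IDs in the range, minus one (per spec) */
	uint32_t output_base;	/* first output ID */
};

/* Translate an input ID to an output ID; returns false if out of range. */
bool id_map_translate(const struct id_mapping *map, uint32_t in, uint32_t *out)
{
	if (in < map->input_base ||
	    in > map->input_base + map->number_of_ids)
		return false;

	*out = in - map->input_base + map->output_base;
	return true;
}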



[...]



This patchset is provided for review/testing purposes here:

git://git.kernel.org/pub/scm/linux/kernel/git/lpieralisi/linux.git 
acpi/iort-smmu-v7

Tested on Juno and FVP models for ARM SMMU v1 and v3 probing path.



For all series:
Reviewed-by: Tomasz Nowicki <t...@semihalf.com>

Thanks,
Tomasz
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v6 13/16] drivers: iommu: arm-smmu: add IORT configuration

2016-11-09 Thread Tomasz Nowicki

Hi Lorenzo,

On 18.10.2016 18:04, Lorenzo Pieralisi wrote:

In ACPI based systems, in order to be able to create platform
devices and initialize them for ARM SMMU components, the IORT
kernel implementation requires a set of static functions to be
used by the IORT kernel layer to configure platform devices for
ARM SMMU components.

Add static configuration functions to the IORT kernel layer for
the ARM SMMU components, so that the ARM SMMU driver can
initialize its respective platform device by relying on the IORT
kernel infrastructure and by adding a corresponding ACPI device
early probe section entry.

Signed-off-by: Lorenzo Pieralisi <lorenzo.pieral...@arm.com>
Cc: Will Deacon <will.dea...@arm.com>
Cc: Robin Murphy <robin.mur...@arm.com>
Cc: Joerg Roedel <j...@8bytes.org>
---
 drivers/acpi/arm64/iort.c | 81 +
 drivers/iommu/arm-smmu.c  | 84 ++-
 include/linux/acpi_iort.h |  3 ++
 3 files changed, 167 insertions(+), 1 deletion(-)

diff --git a/drivers/acpi/arm64/iort.c b/drivers/acpi/arm64/iort.c
index ea90bc8..04cc5f7 100644
--- a/drivers/acpi/arm64/iort.c
+++ b/drivers/acpi/arm64/iort.c
@@ -548,6 +548,78 @@ static bool __init arm_smmu_v3_is_coherent(struct 
acpi_iort_node *node)
return smmu->flags & ACPI_IORT_SMMU_V3_COHACC_OVERRIDE;
 }


[...]

+
+static int arm_smmu_device_acpi_probe(struct platform_device *pdev,
+ struct arm_smmu_device *smmu)
+{
+   struct device *dev = smmu->dev;
+   struct acpi_iort_node *node =
+   *(struct acpi_iort_node **)dev_get_platdata(dev);
+   struct acpi_iort_smmu *iort_smmu;
+   u64 *glb_irq;
+   int ret;
+
+   /* Retrieve SMMU1/2 specific data */
+   iort_smmu = (struct acpi_iort_smmu *)node->node_data;
+
+   ret = acpi_smmu_get_data(iort_smmu->model, &smmu->version,
+  &smmu->model);
+   if (ret < 0)
+   return ret;
+
+   glb_irq = ACPI_ADD_PTR(u64, iort_smmu,
+   iort_smmu->global_interrupt_offset);


One bug that I found:

-   glb_irq = ACPI_ADD_PTR(u64, iort_smmu,
-   iort_smmu->global_interrupt_offset);
+   glb_irq = ACPI_ADD_PTR(u64, node,
+   iort_smmu->global_interrupt_offset);

With this fix, I ran a VM with several PCI devices (NIC, SATA) in
passthrough mode successfully on an ACPI host using a ThunderX 1-socket board.
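
To spell out why the base pointer matters here (a simplified model, not the
actual ACPICA definitions or IORT structures): ACPI_ADD_PTR() boils down to a
byte-offset addition, and the global interrupt array offset in an IORT SMMU
node is counted from the start of the node, not from the SMMU-specific data,
so using iort_smmu as the base overshoots the array by the node header size.

#include <stddef.h>
#include <stdint.h>

/* Roughly what ACPI_ADD_PTR(type, base, offset) does. */
#define ADD_PTR(type, base, offset) \
	((type *)((uint8_t *)(base) + (size_t)(offset)))

/* Toy layout standing in for an IORT SMMU node: a header followed by node
 * data; global_interrupt_offset is relative to the start of the node.
 */
struct toy_iort_node {
	uint8_t  header[16];			/* IORT node header */
	uint32_t global_interrupt_offset;	/* counted from the node start */
	uint8_t  other_node_data[12];
	uint64_t glb_irq[2];			/* what the offset points at */
};

int main(void)
{
	struct toy_iort_node node = {
		.global_interrupt_offset =
			offsetof(struct toy_iort_node, glb_irq),
	};

	/* Right: offset applied from the start of the node. */
	uint64_t *ok = ADD_PTR(uint64_t, &node, node.global_interrupt_offset);

	/* Wrong: offset applied from the SMMU-specific data, landing past
	 * the interrupt array by the size of the node header.
	 */
	uint64_t *bad = ADD_PTR(uint64_t, &node.global_interrupt_offset,
				node.global_interrupt_offset);

	return (ok == node.glb_irq && bad != node.glb_irq) ? 0 : 1;
}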


Also, for my tests I used Eric's patches:
https://github.com/eauger/linux/commits/v4.9-rc3-reserved-rfc-v2

Including bug fix above:
Tested-by: Tomasz Nowicki <t...@semihalf.com> for all series.


+
+   if (!IORT_IRQ_MASK(glb_irq[1])) /* 0 means not implemented */
+   smmu->num_global_irqs = 1;
+   else
+   smmu->num_global_irqs = 2;
+
+   if (iort_smmu->flags & ACPI_IORT_SMMU_COHERENT_WALK)
+   smmu->features |= ARM_SMMU_FEAT_COHERENT_WALK;
+
+   return 0;
+}
+#else
+static inline int arm_smmu_device_acpi_probe(struct platform_device *pdev,
+struct arm_smmu_device *smmu)
+{
+   return -ENODEV;
+}
+#endif
+


Thanks,
Tomasz
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [RFC PATCH v2 13/15] drivers: acpi: iort: introduce iort_iommu_configure

2016-06-10 Thread Tomasz Nowicki

On 07.06.2016 15:31, Lorenzo Pieralisi wrote:

+static int __get_pci_rid(struct pci_dev *pdev, u16 alias, void *data)
+{
+   u32 *rid = data;
+
+   *rid = alias;
+   return 0;
+}
+
+/**
+ * iort_iommu_configure - Set-up IOMMU configuration for a device.
+ *
+ * @dev: device to configure
+ *
+ * Returns: iommu_ops pointer on configuration success
+ *  NULL on configuration failure
+ */
+const struct iommu_ops *iort_iommu_configure(struct device *dev)
+{
+   struct acpi_iort_node *node, *parent;
+   const struct iort_ops_node *iort_ops;
+   u32 rid = 0, devid = 0;
+
+   if (dev_is_pci(dev)) {
+   struct pci_bus *bus = to_pci_dev(dev)->bus;
+
+   pci_for_each_dma_alias(to_pci_dev(dev), __get_pci_rid,
+  &rid);


I think we should find the root bus here, i.e. the one that is connected to the RC IORT node.
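
One way to do that (a sketch under the assumption that the RC IORT node
should be matched against the host bridge rather than the endpoint's
immediate parent bus; find_root_bus() is just an illustrative name, not an
existing kernel function):

#include <linux/pci.h>

/* Illustrative helper: resolve the root bus for a PCI endpoint before the
 * root complex node lookup; pci_is_root_bus() and bus->parent are standard
 * PCI core facilities.
 */
static struct pci_bus *find_root_bus(struct pci_dev *pdev)
{
	struct pci_bus *bus = pdev->bus;

	while (!pci_is_root_bus(bus))
		bus = bus->parent;

	return bus;
}

The ACPI_IORT_NODE_PCI_ROOT_COMPLEX scan would then be keyed on the root bus
(host bridge) device rather than on to_pci_dev(dev)->bus.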


+
+   node = iort_scan_node(ACPI_IORT_NODE_PCI_ROOT_COMPLEX,
+ iort_find_dev_callback, &bus->dev);
+   } else {
+   node = iort_scan_node(ACPI_IORT_NODE_NAMED_COMPONENT,
+ iort_find_dev_callback, dev);
+   }
+
+   if (!node)
+   return NULL;
+
+   parent = iort_find_parent_node(node);
+
+   if (!parent)
+   return NULL;
+
+   iort_ops = iort_smmu_get_ops_node(parent);
+
+   if (iort_ops && iort_ops->iommu_xlate) {
+   iort_dev_map_rid(node, rid, &devid, parent->type);
+   iort_ops->iommu_xlate(dev, devid, parent);
+   return iort_ops->ops;
+   }
+
+   return NULL;
+}
+


Thanks,
Tomasz
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [RFC PATCH 09/11] drivers: acpi: implement acpi_dma_configure

2016-05-17 Thread Tomasz Nowicki



On 17.05.2016 10:07, Tomasz Nowicki wrote:

Hi Lorenzo,

On 14.04.2016 19:25, Lorenzo Pieralisi wrote:

On DT based systems, the of_dma_configure() API implements DMA
configuration
for a given device. On ACPI systems an API equivalent to
of_dma_configure()
is missing which implies that it is currently not possible to set-up DMA
operations for devices through the ACPI generic kernel layer.

This patch fills the gap by introducing acpi_dma_configure/deconfigure()
calls, that carry out IOMMU configuration through IORT (on systems where
it is present) and call arch_setup_dma_ops(...) with the retrieved
parameters.

The DMA range size passed to arch_setup_dma_ops() is sized according
to the device coherent_dma_mask (starting at address 0x0), mirroring the
DT probing path behaviour when a dma-ranges property is not provided
for the device being probed; this changes the current
arch_setup_dma_ops()
call parameters in the ACPI probing case, but since arch_setup_dma_ops()
is a NOP on all architectures but ARM/ARM64 this patch does not change
the current kernel behaviour on them.

This patch updates ACPI and PCI core code to use the newly introduced
acpi_dma_configure function, providing the same functionality
as of_dma_configure on ARM systems and leaving behaviour unchanged
for all other arches.

Signed-off-by: Lorenzo Pieralisi <lorenzo.pieral...@arm.com>
Cc: Bjorn Helgaas <bhelg...@google.com>
Cc: Robin Murphy <robin.mur...@arm.com>
Cc: Tomasz Nowicki <t...@semihalf.com>
Cc: Joerg Roedel <j...@8bytes.org>
Cc: "Rafael J. Wysocki" <r...@rjwysocki.net>
---
  drivers/acpi/glue.c |  4 +--
  drivers/acpi/iort.c | 85
+
  drivers/acpi/scan.c | 29 +
  drivers/pci/probe.c |  3 +-
  include/acpi/acpi_bus.h |  2 ++
  include/linux/acpi.h|  5 +++
  include/linux/iort.h|  9 ++
  7 files changed, 133 insertions(+), 4 deletions(-)

diff --git a/drivers/acpi/glue.c b/drivers/acpi/glue.c
index 5ea5dc2..f8d6564 100644
--- a/drivers/acpi/glue.c
+++ b/drivers/acpi/glue.c
@@ -227,8 +227,7 @@ int acpi_bind_one(struct device *dev, struct
acpi_device *acpi_dev)

  attr = acpi_get_dma_attr(acpi_dev);
  if (attr != DEV_DMA_NOT_SUPPORTED)
-arch_setup_dma_ops(dev, 0, 0, NULL,
-   attr == DEV_DMA_COHERENT);
+acpi_dma_configure(dev, attr);

  acpi_physnode_link_name(physical_node_name, node_id);
   retval = sysfs_create_link(&acpi_dev->dev.kobj, &dev->kobj,
@@ -251,6 +250,7 @@ int acpi_bind_one(struct device *dev, struct
acpi_device *acpi_dev)
  return 0;

   err:
+acpi_dma_deconfigure(dev);
  ACPI_COMPANION_SET(dev, NULL);
  put_device(dev);
   put_device(&acpi_dev->dev);
diff --git a/drivers/acpi/iort.c b/drivers/acpi/iort.c
index 2b5ce65..b1bb8fb 100644
--- a/drivers/acpi/iort.c
+++ b/drivers/acpi/iort.c
@@ -72,6 +72,31 @@ int iort_iommu_set_node(struct iommu_ops *ops,
struct acpi_iort_node *node,
  return 0;
  }

+/**
+ * iort_iommu_get_node - Retrieve iort_iommu_node associated with an
IORT node.
+ *
+ * @node: IORT table node to be looked-up
+ *
+ * Returns: iort_iommu_node pointer on success
+ *  NULL on failure
+ */
+static struct iort_iommu_node *iort_iommu_get_node(struct
acpi_iort_node *node)
+{
+struct iort_iommu_node *iommu_node;
+
+spin_lock(_iommu_lock);
+list_for_each_entry(iommu_node, _iommu_list, list) {
+if (iommu_node->node == node)
+goto found;
+}
+
+iommu_node = NULL;
+found:
+spin_unlock(_iommu_lock);
+
+return iommu_node;
+}
+
  typedef acpi_status (*iort_find_node_callback)
  (struct acpi_iort_node *node, void *context);

@@ -405,6 +430,66 @@ iort_pci_get_domain(struct pci_dev *pdev, u32
req_id)
  return domain_handle;
  }

+static int __get_pci_rid(struct pci_dev *pdev, u16 alias, void *data)
+{
+u32 *rid = data;
+
+*rid = alias;
+return 0;
+}
+
+/**
+ * iort_iommu_configure - Set-up IOMMU configuration for a device.
+ *
+ * @dev: device that requires IOMMU set-up
+ *
+ * Returns: iommu_ops pointer on configuration success
+ *  NULL on configuration failure
+ */
+struct iommu_ops *iort_iommu_configure(struct device *dev)
+{
+struct acpi_iort_node *node, *parent;
+struct iommu_ops *ops = NULL;
+struct iommu_fwspec fwspec;
+struct iort_iommu_node *iommu_node;
+u32 rid = 0, devid = 0;
+
+if (dev_is_pci(dev)) {
+struct pci_bus *bus = to_pci_dev(dev)->bus;
+
+pci_for_each_dma_alias(to_pci_dev(dev), __get_pci_rid,
+   &rid);
+
+node = iort_scan_node(ACPI_IORT_NODE_PCI_ROOT_COMPLEX,
+  iort_find_dev_callback, &bus->dev);
+} else
+node = iort_scan_node(ACPI_IORT_NODE_NAMED_COMPONENT,
+  iort_find_dev_callback, dev);


I think this will not work for finding the host bridge StreamID. We still
need to use ACPI_IORT_NODE_PCI_

Re: [RFC PATCH 09/11] drivers: acpi: implement acpi_dma_configure

2016-05-16 Thread Tomasz Nowicki



On 16.05.2016 17:15, Tomasz Nowicki wrote:

On 18.04.2016 12:43, Robin Murphy wrote:

On 15/04/16 19:29, Timur Tabi wrote:

On Thu, Apr 14, 2016 at 12:25 PM, Lorenzo Pieralisi
<lorenzo.pieral...@arm.com> wrote:

+void acpi_dma_configure(struct device *dev, enum dev_dma_attr attr)
+{
+   struct iommu_ops *iommu;
+
+   iommu = iort_iommu_configure(dev);
+
+   /*
+* Assume dma valid range starts at 0 and covers the whole
+* coherent_dma_mask.
+*/
+   arch_setup_dma_ops(dev, 0, dev->coherent_dma_mask + 1, iommu,
+  attr == DEV_DMA_COHERENT);
+}


I have a network driver that is impacted by this code, so thank you
for posting this. (See
https://www.mail-archive.com/netdev@vger.kernel.org/msg106249.html).

One one SOC, the driver needs to set the mask to 32 bits.  On another
SOC, it needs to set it to 64 bits.  On device tree, the driver will
use dma-ranges.


That's the wrong way to look at it - the driver isn't _using_
dma-ranges, you're merely relying on the OF code setting the _default_
DMA mask differently based on the property. If your driver is in the
minority of those which actually care about DMA masks, then it should be
calling dma_set_mask_and_coherent() appropriately and not relying on the
default.


I don't see a clear strategy for setting the DMA mask either.

Let's consider a DT boot example:
1. SMMUv2 supports 48-bit translation and a 1:1 address map
dma-ranges = <0x0 0x0 0x0 0x0 0x0001 0x0>;
and we are adding a PCI device:

pci_device_add -> DMA_BIT_MASK(32) by default
   pci_dma_configure
 of_dma_configure -> reads dma-ranges and calculates a 48-bit DMA mask,
but it picks the minimum, so we stay with DMA_BIT_MASK(32)

Now the PCI device turns out to be an e1000e NIC:
e1000_probe
   dma_set_mask_and_coherent -> tries to set DMA_BIT_MASK(64)
 dma_set_mask -> there is no set_dma_mask ops for SMMUv2 so we let


Sorry, there is no .set_dma_mask op for ARM64 (not SMMUv2).

Tomasz
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [RFC PATCH 09/11] drivers: acpi: implement acpi_dma_configure

2016-05-16 Thread Tomasz Nowicki

On 18.04.2016 12:43, Robin Murphy wrote:

On 15/04/16 19:29, Timur Tabi wrote:

On Thu, Apr 14, 2016 at 12:25 PM, Lorenzo Pieralisi
 wrote:

+void acpi_dma_configure(struct device *dev, enum dev_dma_attr attr)
+{
+   struct iommu_ops *iommu;
+
+   iommu = iort_iommu_configure(dev);
+
+   /*
+* Assume dma valid range starts at 0 and covers the whole
+* coherent_dma_mask.
+*/
+   arch_setup_dma_ops(dev, 0, dev->coherent_dma_mask + 1, iommu,
+  attr == DEV_DMA_COHERENT);
+}


I have a network driver that is impacted by this code, so thank you
for posting this. (See
https://www.mail-archive.com/netdev@vger.kernel.org/msg106249.html).

One one SOC, the driver needs to set the mask to 32 bits.  On another
SOC, it needs to set it to 64 bits.  On device tree, the driver will
use dma-ranges.


That's the wrong way to look at it - the driver isn't _using_
dma-ranges, you're merely relying on the OF code setting the _default_
DMA mask differently based on the property. If your driver is in the
minority of those which actually care about DMA masks, then it should be
calling dma_set_mask_and_coherent() appropriately and not relying on the
default.
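
For reference, the driver-side idiom Robin is describing looks roughly like
this (a generic sketch; example_setup_dma() is just an illustrative name, not
an existing function):

#include <linux/dma-mapping.h>
#include <linux/pci.h>

/* Probe-time fragment: ask for 64-bit DMA and fall back to 32-bit if the
 * platform or IOMMU cannot provide it.
 */
static int example_setup_dma(struct pci_dev *pdev)
{
	int err;

	err = dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(64));
	if (err)
		err = dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(32));

	return err;
}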


I don't see a clear strategy for setting the DMA mask either.

Let's consider a DT boot example:
1. SMMUv2 supports 48-bit translation and a 1:1 address map
dma-ranges = <0x0 0x0 0x0 0x0 0x0001 0x0>;
and we are adding a PCI device:

pci_device_add -> DMA_BIT_MASK(32) by default
  pci_dma_configure
of_dma_configure -> reads dma-ranges and calculates a 48-bit DMA mask,
but it picks the minimum, so we stay with DMA_BIT_MASK(32)


Now the PCI device turns out to be an e1000e NIC:
e1000_probe
  dma_set_mask_and_coherent -> tries to set DMA_BIT_MASK(64)
dma_set_mask -> there is no set_dma_mask op for SMMUv2, so we let it be
DMA_BIT_MASK(64). From that point on, we allow the device to use memory
which the SMMUv2 cannot handle.


Is the lack of set_dma_mask the only missing thing here?

Thanks,
Tomasz
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu