Re: [PATCH 00/21] SMMU enablement for NXP LS1043A and LS1046A

2018-09-19 Thread Robin Murphy

On 19/09/18 15:18, Laurentiu Tudor wrote:

Hi Robin,

On 19.09.2018 16:25, Robin Murphy wrote:

Hi Laurentiu,

On 19/09/18 13:35, laurentiu.tu...@nxp.com wrote:

From: Laurentiu Tudor 

This patch series adds SMMU support for NXP LS1043A and LS1046A chips
and consists mostly in important driver fixes and the required device
tree updates. It touches several subsystems and consists of three main
parts:
   - changes in soc/drivers/fsl/qbman drivers adding iommu mapping of
     reserved memory areas, fixes and defered probe support
   - changes in drivers/net/ethernet/freescale/dpaa_eth drivers
     consisting in misc dma mapping related fixes and probe ordering
   - addition of the actual arm smmu device tree node together with
     various adjustments to the device trees

Performance impact

  Running iperf benchmarks in a back-to-back setup (both sides
  having smmu enabled) on a 10GBps port show an important
  networking performance degradation of around %40 (9.48Gbps
  linerate vs 5.45Gbps). If you need performance but without
  SMMU support you can use "iommu.passthrough=1" to disable
  SMMU.

USB issue and workaround

  There's a problem with the usb controllers in these chips
  generating smaller, 40-bit wide dma addresses instead of the 48-bit
  supported at the smmu input. So you end up in a situation where the
  smmu is mapped with 48-bit address translations, but the device
  generates transactions with clipped 40-bit addresses, thus smmu
  context faults are triggered. I encountered a similar situation for
  mmc that I  managed to fix in software [1] however for USB I did not
  find a proper place in the code to add a similar fix. The only
  workaround I found was to add this kernel parameter which limits the
  usb dma to 32-bit size: "xhci-hcd.quirks=0x80".
  This workaround if far from ideal, so any suggestions for a code
  based workaround in this area would be greatly appreciated.


If you have a nominally-64-bit device with a
narrower-than-the-main-interconnect link in front of it, that should
already be fixed in 4.19-rc by bus_dma_mask picking up DT dma-ranges,
provided the interconnect hierarchy can be described appropriately (or
at least massaged sufficiently to satisfy the binding), e.g.:

/ {
  ...

  soc {
      ranges;
      dma-ranges = <0 0 1 0>;

      dev_48bit { ... };

      periph_bus {
      ranges;
      dma-ranges = <0 0 100 0>;

      dev_40bit { ... };
      };
  };
};

and if that fails to work as expected (except for PCI hosts where
handling dma-ranges properly still needs sorting out), please do let us
know ;)



Just to confirm, Is this [1] the change I was supposed to test?


Not quite - dma-ranges is only valid for nodes representing a bus, so 
putting it directly in the USB device nodes doesn't work (FWIW that's 
why PCI is broken, because the parser doesn't expect the 
bus-as-leaf-node case). That's teh point of that intermediate simple-bus 
node represented by "periph_bus" in my example (sorry, I should have put 
compatibles in to make it clearer) - often that's actually true to life 
(i.e. "soc" is something like a CCI and "periph_bus" is something like 
an AXI NIC gluing a bunch of lower-bandwidth DMA masters to one of the 
CCI ports) but at worst it's just a necessary evil to make the binding 
happy (if it literally only represents the point-to-point link between 
the device master port and interconnect slave port).



Because if so, I'm still seeing context faults [2] with what looks like
clipped to 40-bits addresses. :-(
IIRC, the usb subsystem explicitly set 64-bit dma masks which in turn
will be limited to the SMMU input size of 48-bit. Won't that overwrite
the default dma mask derived from dma-ranges?


Indeed it will, but those default masks were effectively only ever a 
best-effort thing anyway - it's an ease-of-implementation detail that 
bus_dma_mask is not currently reflected in the device masks, although we 
may eventually change that; the crucial part is that the DMA ops 
implementations know about it and should now enforce it properly 
regardless of whether drivers set something wider.


Robin.



---
Best Regards, Laurentiu

[1] -

diff --git a/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi
b/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi
index 3bdea0470f69..a214c3df37fd 100644
--- a/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi
+++ b/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi
@@ -612,6 +612,7 @@
  compatible = "snps,dwc3";
  reg = <0x0 0x2f0 0x0 0x1>;
  interrupts = ;
+   dma-ranges = <0x0 0x0 0x0 0x0 0x100 0x>;
  dr_mode = "host";
  snps,quirk-frame-length-adjustment = <0x20>;
  

Re: [PATCH 00/21] SMMU enablement for NXP LS1043A and LS1046A

2018-09-19 Thread Robin Murphy

On 19/09/18 15:18, Laurentiu Tudor wrote:

Hi Robin,

On 19.09.2018 16:25, Robin Murphy wrote:

Hi Laurentiu,

On 19/09/18 13:35, laurentiu.tu...@nxp.com wrote:

From: Laurentiu Tudor 

This patch series adds SMMU support for NXP LS1043A and LS1046A chips
and consists mostly in important driver fixes and the required device
tree updates. It touches several subsystems and consists of three main
parts:
   - changes in soc/drivers/fsl/qbman drivers adding iommu mapping of
     reserved memory areas, fixes and defered probe support
   - changes in drivers/net/ethernet/freescale/dpaa_eth drivers
     consisting in misc dma mapping related fixes and probe ordering
   - addition of the actual arm smmu device tree node together with
     various adjustments to the device trees

Performance impact

  Running iperf benchmarks in a back-to-back setup (both sides
  having smmu enabled) on a 10GBps port show an important
  networking performance degradation of around %40 (9.48Gbps
  linerate vs 5.45Gbps). If you need performance but without
  SMMU support you can use "iommu.passthrough=1" to disable
  SMMU.

USB issue and workaround

  There's a problem with the usb controllers in these chips
  generating smaller, 40-bit wide dma addresses instead of the 48-bit
  supported at the smmu input. So you end up in a situation where the
  smmu is mapped with 48-bit address translations, but the device
  generates transactions with clipped 40-bit addresses, thus smmu
  context faults are triggered. I encountered a similar situation for
  mmc that I  managed to fix in software [1] however for USB I did not
  find a proper place in the code to add a similar fix. The only
  workaround I found was to add this kernel parameter which limits the
  usb dma to 32-bit size: "xhci-hcd.quirks=0x80".
  This workaround if far from ideal, so any suggestions for a code
  based workaround in this area would be greatly appreciated.


If you have a nominally-64-bit device with a
narrower-than-the-main-interconnect link in front of it, that should
already be fixed in 4.19-rc by bus_dma_mask picking up DT dma-ranges,
provided the interconnect hierarchy can be described appropriately (or
at least massaged sufficiently to satisfy the binding), e.g.:

/ {
  ...

  soc {
      ranges;
      dma-ranges = <0 0 1 0>;

      dev_48bit { ... };

      periph_bus {
      ranges;
      dma-ranges = <0 0 100 0>;

      dev_40bit { ... };
      };
  };
};

and if that fails to work as expected (except for PCI hosts where
handling dma-ranges properly still needs sorting out), please do let us
know ;)



Just to confirm, Is this [1] the change I was supposed to test?


Not quite - dma-ranges is only valid for nodes representing a bus, so 
putting it directly in the USB device nodes doesn't work (FWIW that's 
why PCI is broken, because the parser doesn't expect the 
bus-as-leaf-node case). That's teh point of that intermediate simple-bus 
node represented by "periph_bus" in my example (sorry, I should have put 
compatibles in to make it clearer) - often that's actually true to life 
(i.e. "soc" is something like a CCI and "periph_bus" is something like 
an AXI NIC gluing a bunch of lower-bandwidth DMA masters to one of the 
CCI ports) but at worst it's just a necessary evil to make the binding 
happy (if it literally only represents the point-to-point link between 
the device master port and interconnect slave port).



Because if so, I'm still seeing context faults [2] with what looks like
clipped to 40-bits addresses. :-(
IIRC, the usb subsystem explicitly set 64-bit dma masks which in turn
will be limited to the SMMU input size of 48-bit. Won't that overwrite
the default dma mask derived from dma-ranges?


Indeed it will, but those default masks were effectively only ever a 
best-effort thing anyway - it's an ease-of-implementation detail that 
bus_dma_mask is not currently reflected in the device masks, although we 
may eventually change that; the crucial part is that the DMA ops 
implementations know about it and should now enforce it properly 
regardless of whether drivers set something wider.


Robin.



---
Best Regards, Laurentiu

[1] -

diff --git a/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi
b/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi
index 3bdea0470f69..a214c3df37fd 100644
--- a/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi
+++ b/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi
@@ -612,6 +612,7 @@
  compatible = "snps,dwc3";
  reg = <0x0 0x2f0 0x0 0x1>;
  interrupts = ;
+   dma-ranges = <0x0 0x0 0x0 0x0 0x100 0x>;
  dr_mode = "host";
  snps,quirk-frame-length-adjustment = <0x20>;
  

Re: [PATCH 00/21] SMMU enablement for NXP LS1043A and LS1046A

2018-09-19 Thread Laurentiu Tudor
Hi Robin,

On 19.09.2018 16:25, Robin Murphy wrote:
> Hi Laurentiu,
> 
> On 19/09/18 13:35, laurentiu.tu...@nxp.com wrote:
>> From: Laurentiu Tudor 
>>
>> This patch series adds SMMU support for NXP LS1043A and LS1046A chips
>> and consists mostly in important driver fixes and the required device
>> tree updates. It touches several subsystems and consists of three main
>> parts:
>>   - changes in soc/drivers/fsl/qbman drivers adding iommu mapping of
>>     reserved memory areas, fixes and defered probe support
>>   - changes in drivers/net/ethernet/freescale/dpaa_eth drivers
>>     consisting in misc dma mapping related fixes and probe ordering
>>   - addition of the actual arm smmu device tree node together with
>>     various adjustments to the device trees
>>
>> Performance impact
>>
>>  Running iperf benchmarks in a back-to-back setup (both sides
>>  having smmu enabled) on a 10GBps port show an important
>>  networking performance degradation of around %40 (9.48Gbps
>>  linerate vs 5.45Gbps). If you need performance but without
>>  SMMU support you can use "iommu.passthrough=1" to disable
>>  SMMU.
>>
>> USB issue and workaround
>>
>>  There's a problem with the usb controllers in these chips
>>  generating smaller, 40-bit wide dma addresses instead of the 48-bit
>>  supported at the smmu input. So you end up in a situation where the
>>  smmu is mapped with 48-bit address translations, but the device
>>  generates transactions with clipped 40-bit addresses, thus smmu
>>  context faults are triggered. I encountered a similar situation for
>>  mmc that I  managed to fix in software [1] however for USB I did not
>>  find a proper place in the code to add a similar fix. The only
>>  workaround I found was to add this kernel parameter which limits the
>>  usb dma to 32-bit size: "xhci-hcd.quirks=0x80".
>>  This workaround if far from ideal, so any suggestions for a code
>>  based workaround in this area would be greatly appreciated.
> 
> If you have a nominally-64-bit device with a 
> narrower-than-the-main-interconnect link in front of it, that should 
> already be fixed in 4.19-rc by bus_dma_mask picking up DT dma-ranges, 
> provided the interconnect hierarchy can be described appropriately (or 
> at least massaged sufficiently to satisfy the binding), e.g.:
> 
> / {
>  ...
> 
>  soc {
>      ranges;
>      dma-ranges = <0 0 1 0>;
> 
>      dev_48bit { ... };
> 
>      periph_bus {
>      ranges;
>      dma-ranges = <0 0 100 0>;
> 
>      dev_40bit { ... };
>      };
>  };
> };
> 
> and if that fails to work as expected (except for PCI hosts where 
> handling dma-ranges properly still needs sorting out), please do let us 
> know ;)
> 

Just to confirm, Is this [1] the change I was supposed to test?
Because if so, I'm still seeing context faults [2] with what looks like 
clipped to 40-bits addresses. :-(
IIRC, the usb subsystem explicitly set 64-bit dma masks which in turn 
will be limited to the SMMU input size of 48-bit. Won't that overwrite 
the default dma mask derived from dma-ranges?

---
Best Regards, Laurentiu

[1] -

diff --git a/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi 
b/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi
index 3bdea0470f69..a214c3df37fd 100644
--- a/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi
+++ b/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi
@@ -612,6 +612,7 @@
 compatible = "snps,dwc3";
 reg = <0x0 0x2f0 0x0 0x1>;
 interrupts = ;
+   dma-ranges = <0x0 0x0 0x0 0x0 0x100 0x>;
 dr_mode = "host";
 snps,quirk-frame-length-adjustment = <0x20>;
 snps,dis_rxdet_inp3_quirk;
@@ -621,6 +622,7 @@
 compatible = "snps,dwc3";
 reg = <0x0 0x300 0x0 0x1>;
 interrupts = ;
+   dma-ranges = <0x0 0x0 0x0 0x0 0x100 0x>;
 dr_mode = "host";
 snps,quirk-frame-length-adjustment = <0x20>;
 snps,dis_rxdet_inp3_quirk;
@@ -630,6 +632,7 @@
 compatible = "snps,dwc3";
 reg = <0x0 0x310 0x0 0x1>;
 interrupts = ;
+   dma-ranges = <0x0 0x0 0x0 0x0 0x100 0x>;
 dr_mode = "host";
 snps,quirk-frame-length-adjustment = <0x20>;
 snps,dis_rxdet_inp3_quirk;

[2] -
[2.090577] xhci-hcd xhci-hcd.0.auto: xHCI Host Controller
[2.096064] xhci-hcd xhci-hcd.0.auto: new USB bus registered, 
assigned bus number 2
[

Re: [PATCH 00/21] SMMU enablement for NXP LS1043A and LS1046A

2018-09-19 Thread Laurentiu Tudor
Hi Robin,

On 19.09.2018 16:25, Robin Murphy wrote:
> Hi Laurentiu,
> 
> On 19/09/18 13:35, laurentiu.tu...@nxp.com wrote:
>> From: Laurentiu Tudor 
>>
>> This patch series adds SMMU support for NXP LS1043A and LS1046A chips
>> and consists mostly in important driver fixes and the required device
>> tree updates. It touches several subsystems and consists of three main
>> parts:
>>   - changes in soc/drivers/fsl/qbman drivers adding iommu mapping of
>>     reserved memory areas, fixes and defered probe support
>>   - changes in drivers/net/ethernet/freescale/dpaa_eth drivers
>>     consisting in misc dma mapping related fixes and probe ordering
>>   - addition of the actual arm smmu device tree node together with
>>     various adjustments to the device trees
>>
>> Performance impact
>>
>>  Running iperf benchmarks in a back-to-back setup (both sides
>>  having smmu enabled) on a 10GBps port show an important
>>  networking performance degradation of around %40 (9.48Gbps
>>  linerate vs 5.45Gbps). If you need performance but without
>>  SMMU support you can use "iommu.passthrough=1" to disable
>>  SMMU.
>>
>> USB issue and workaround
>>
>>  There's a problem with the usb controllers in these chips
>>  generating smaller, 40-bit wide dma addresses instead of the 48-bit
>>  supported at the smmu input. So you end up in a situation where the
>>  smmu is mapped with 48-bit address translations, but the device
>>  generates transactions with clipped 40-bit addresses, thus smmu
>>  context faults are triggered. I encountered a similar situation for
>>  mmc that I  managed to fix in software [1] however for USB I did not
>>  find a proper place in the code to add a similar fix. The only
>>  workaround I found was to add this kernel parameter which limits the
>>  usb dma to 32-bit size: "xhci-hcd.quirks=0x80".
>>  This workaround if far from ideal, so any suggestions for a code
>>  based workaround in this area would be greatly appreciated.
> 
> If you have a nominally-64-bit device with a 
> narrower-than-the-main-interconnect link in front of it, that should 
> already be fixed in 4.19-rc by bus_dma_mask picking up DT dma-ranges, 
> provided the interconnect hierarchy can be described appropriately (or 
> at least massaged sufficiently to satisfy the binding), e.g.:
> 
> / {
>  ...
> 
>  soc {
>      ranges;
>      dma-ranges = <0 0 1 0>;
> 
>      dev_48bit { ... };
> 
>      periph_bus {
>      ranges;
>      dma-ranges = <0 0 100 0>;
> 
>      dev_40bit { ... };
>      };
>  };
> };
> 
> and if that fails to work as expected (except for PCI hosts where 
> handling dma-ranges properly still needs sorting out), please do let us 
> know ;)
> 

Just to confirm, Is this [1] the change I was supposed to test?
Because if so, I'm still seeing context faults [2] with what looks like 
clipped to 40-bits addresses. :-(
IIRC, the usb subsystem explicitly set 64-bit dma masks which in turn 
will be limited to the SMMU input size of 48-bit. Won't that overwrite 
the default dma mask derived from dma-ranges?

---
Best Regards, Laurentiu

[1] -

diff --git a/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi 
b/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi
index 3bdea0470f69..a214c3df37fd 100644
--- a/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi
+++ b/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi
@@ -612,6 +612,7 @@
 compatible = "snps,dwc3";
 reg = <0x0 0x2f0 0x0 0x1>;
 interrupts = ;
+   dma-ranges = <0x0 0x0 0x0 0x0 0x100 0x>;
 dr_mode = "host";
 snps,quirk-frame-length-adjustment = <0x20>;
 snps,dis_rxdet_inp3_quirk;
@@ -621,6 +622,7 @@
 compatible = "snps,dwc3";
 reg = <0x0 0x300 0x0 0x1>;
 interrupts = ;
+   dma-ranges = <0x0 0x0 0x0 0x0 0x100 0x>;
 dr_mode = "host";
 snps,quirk-frame-length-adjustment = <0x20>;
 snps,dis_rxdet_inp3_quirk;
@@ -630,6 +632,7 @@
 compatible = "snps,dwc3";
 reg = <0x0 0x310 0x0 0x1>;
 interrupts = ;
+   dma-ranges = <0x0 0x0 0x0 0x0 0x100 0x>;
 dr_mode = "host";
 snps,quirk-frame-length-adjustment = <0x20>;
 snps,dis_rxdet_inp3_quirk;

[2] -
[2.090577] xhci-hcd xhci-hcd.0.auto: xHCI Host Controller
[2.096064] xhci-hcd xhci-hcd.0.auto: new USB bus registered, 
assigned bus number 2
[