Re: [PATCH 22/22] iommu/vt-d: Add a quirk flag for scope mismatched devices

2020-01-08 Thread Christoph Hellwig
On Thu, Jan 09, 2020 at 07:28:41AM +0800, Lu Baolu wrote:
> This patch has been replaced with this one.
> 
> https://lkml.org/lkml/2020/1/5/103

That still mentions a "nvme host device", which, despite the different
spelling, still doesn't make any sense.


[PATCH 0/2] iommu/amd: Optimization and cleanup for unused variables

2020-01-08 Thread Adrian Huang
This series optimizes register reading by using one readq instead of two
consecutive readl calls, and cleans up unused variables.

Adrian Huang (2):
  iommu/amd: Replace two consecutive readl calls with one readq
  iommu/amd: Remove unused struct member

 drivers/iommu/amd_iommu_init.c  | 6 +-
 drivers/iommu/amd_iommu_types.h | 1 -
 2 files changed, 1 insertion(+), 6 deletions(-)

-- 
2.17.1



[PATCH 1/2] iommu/amd: Replace two consecutive readl calls with one readq

2020-01-08 Thread Adrian Huang
From: Adrian Huang 

Optimize the register reading by using one readq instead of the two
consecutive readl calls.

Signed-off-by: Adrian Huang 
---
 drivers/iommu/amd_iommu_init.c | 6 +-
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
index c5167fe0bd5f..cfdc4b60ccbe 100644
--- a/drivers/iommu/amd_iommu_init.c
+++ b/drivers/iommu/amd_iommu_init.c
@@ -1718,7 +1718,6 @@ static const struct attribute_group *amd_iommu_groups[] = {
 static int __init iommu_init_pci(struct amd_iommu *iommu)
 {
int cap_ptr = iommu->cap_ptr;
-   u32 low, high;
int ret;
 
iommu->dev = pci_get_domain_bus_and_slot(0, PCI_BUS_NUM(iommu->devid),
@@ -1736,10 +1735,7 @@ static int __init iommu_init_pci(struct amd_iommu *iommu)
amd_iommu_iotlb_sup = false;
 
/* read extended feature bits */
-   low  = readl(iommu->mmio_base + MMIO_EXT_FEATURES);
-   high = readl(iommu->mmio_base + MMIO_EXT_FEATURES + 4);
-
-   iommu->features = ((u64)high << 32) | low;
+   iommu->features = readq(iommu->mmio_base + MMIO_EXT_FEATURES);
 
if (iommu_feature(iommu, FEATURE_GT)) {
int glxval;
-- 
2.17.1



[PATCH 2/2] iommu/amd: Remove unused struct member

2020-01-08 Thread Adrian Huang
From: Adrian Huang 

Commit c805b428f206 ("iommu/amd: Remove amd_iommu_pd_list") removes
the global list for the allocated protection domains. The
corresponding member 'list' of the protection_domain struct is
not used anymore, so it can be removed.

Signed-off-by: Adrian Huang 
---
 drivers/iommu/amd_iommu_types.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/iommu/amd_iommu_types.h b/drivers/iommu/amd_iommu_types.h
index 500f0b78879d..f8d01d6b00da 100644
--- a/drivers/iommu/amd_iommu_types.h
+++ b/drivers/iommu/amd_iommu_types.h
@@ -463,7 +463,6 @@ struct amd_irte_ops;
  * independent of their use.
  */
 struct protection_domain {
-   struct list_head list;  /* for list of all protection domains */
struct list_head dev_list; /* List of all devices in this domain */
struct iommu_domain domain; /* generic domain handle used by
   iommu core code */
-- 
2.17.1



[PATCH RFC] ARM: dts: qcom: msm8974: add mdp5 iommu support

2020-01-08 Thread Brian Masney
This adds preliminary IOMMU support for the MDP5 on msm8974. It appears
that the qcom-iommu driver in upstream can be used on this SoC. I marked
this patch as an RFC since the frame buffer becomes corrupted when I boot
the Nexus 5 phone with this patch:

https://raw.githubusercontent.com/masneyb/nexus-5-upstream/master/images/broken-mdp5-iommu.jpg

A quick note about the ranges of the context banks below: Based on the
downstream sources, I believe that the memory addresses should be mapped
out like this:

mdp_iommu: iommu@fd928000 {
reg = <0xfd928000 0x8000>;
ranges = <0 0xfd930000 0x8000>;
...

iommu-ctx@0 {
reg = <0x0 0x1000>;
...
};

iommu-ctx@1000 {
reg = <0x1000 0x1000>;
...
};

iommu-ctx@2000 {
reg = <0x2000 0x1000>;
...
};
};

However, the qcom-iommu driver in upstream expects the first context
bank to exist at address 0x1000, and the address space identifier
(asid) to be greater than 0. See get_asid() and qcom_iommu_of_xlate()
in the upstream qcom-iommu.c driver. I put in the patch below what the
driver expects. I modified the driver in my local tree to allow the
mapping that I have above, so that the extra 0x1000 of memory is mapped
into the global address space, and still experienced the same screen
corruption issue.
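
For reference, the asid derivation in the upstream driver is roughly
the following (paraphrased from memory rather than copied verbatim, so
check qcom_iommu.c for the authoritative version):

	static int get_asid(const struct device_node *np)
	{
		u32 reg;

		/* Context banks sit 0x1000 apart; the bank at offset 0x1000
		 * gets asid 1, and an asid of 0 is rejected by of_xlate. */
		if (of_property_read_u32_index(np, "reg", 0, &reg))
			return -ENODEV;

		return reg / 0x1000;
	}

This is why the downstream-style context bank at offset 0 cannot be
described as-is.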

Downstream MSM 3.4 IOMMU dts snippet for reference:
https://github.com/AICP/kernel_lge_hammerhead/blob/n7.1/arch/arm/boot/dts/msm-iommu-v1.dtsi#L110

I'm hoping that someone who's more familiar with this hardware has a
suggestion for something to try.

Signed-off-by: Brian Masney 
---
 arch/arm/boot/dts/qcom-msm8974.dtsi | 44 +
 1 file changed, 44 insertions(+)

diff --git a/arch/arm/boot/dts/qcom-msm8974.dtsi 
b/arch/arm/boot/dts/qcom-msm8974.dtsi
index 4b161b809dd5..2515a3bd4aa7 100644
--- a/arch/arm/boot/dts/qcom-msm8974.dtsi
+++ b/arch/arm/boot/dts/qcom-msm8974.dtsi
@@ -1305,6 +1305,46 @@ etm3_out: endpoint {
};
};
 
+   mdp_iommu: iommu@fd928000 {
+   compatible = "qcom,msm8974-iommu",
+"qcom,msm-iommu-v1";
+   reg = <0xfd928000 0x7000>;
+   #address-cells = <1>;
+   #size-cells = <1>;
+   #iommu-cells = <1>;
+   ranges = <0 0xfd92f000 0x9000>;
+
+   clocks = <&mmcc MDSS_AHB_CLK>,
+<&mmcc MDSS_AXI_CLK>;
+   clock-names = "iface",
+ "bus";
+
+   qcom,iommu-secure-id = <1>;
+
+   // mdp_0
+   iommu-ctx@1000 {
+   compatible = "qcom,msm-iommu-v1-ns";
+   reg = <0x1000 0x1000>;
+   interrupts = ;
+   };
+
+   // mdp_1
+   iommu-ctx@2000 {
+   compatible = "qcom,msm-iommu-v1-sec";
+   reg = <0x2000 0x1000>;
+   interrupts = ,
+;
+   };
+
+   // mdp_2
+   iommu-ctx@3000 {
+   compatible = "qcom,msm-iommu-v1-sec";
+   reg = <0x3000 0x1000>;
+   interrupts = ,
+;
+   };
+};
+
ocmem@fdd00000 {
compatible = "qcom,msm8974-ocmem";
reg = <0xfdd00000 0x2000>,
@@ -1427,6 +1467,10 @@ mdp: mdp@fd900000 {
interconnects = <&mmssnoc MNOC_MAS_MDP_PORT0 &bimc BIMC_SLV_EBI_CH0>;
interconnect-names = "mdp0-mem";
 
+   iommus = <&mdp_iommu 1
+ &mdp_iommu 2
+ &mdp_iommu 3>;
+
ports {
#address-cells = <1>;
#size-cells = <0>;
-- 
2.24.1



Re: [PATCH 22/22] iommu/vt-d: Add a quirk flag for scope mismatched devices

2020-01-08 Thread Roland Dreier via iommu
> Are you willing to add your reviewed-by for Jim's v2 patch? I will
> queue it for v5.6 if you both agree.

Sure:

Reviewed-by: Roland Dreier 


Re: [PATCH 22/22] iommu/vt-d: Add a quirk flag for scope mismatched devices

2020-01-08 Thread Lu Baolu

Hi Christoph,

On 1/8/20 10:16 PM, Christoph Hellwig wrote:
>> +/*
>> + * We expect devices with endpoint scope to have normal PCI
>> + * headers, and devices with bridge scope to have bridge PCI
>> + * headers.  However some PCI devices may be listed in the
>> + * DMAR table with bridge scope, even though they have a
>> + * normal PCI header. We don't declare a scope mismatch for
>> + * the special cases below.
>> + */
>
> Please use up all 80 columns for comments.
>
>> +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x2f0d,  /* NTB devices  */
>> +                         quirk_dmar_scope_mismatch);
>> +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x2020,  /* NVME host */
>> +                         quirk_dmar_scope_mismatch);
>
> As said before about "NVME host": besides the wrong spelling of NVMe,
> the NVMe host is the Linux kernel, so describing a device as such seems
> rather bogus.

This patch has been replaced with this one:

https://lkml.org/lkml/2020/1/5/103

Best regards,
baolu


Re: [PATCH v2 2/2] iommu/vt-d: skip invalid RMRR entries

2020-01-08 Thread Barret Rhoden via iommu

On 1/7/20 8:27 PM, Lu Baolu wrote:
>> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
>> index a8bb458845bc..32c3c6338a3d 100644
>> --- a/drivers/iommu/intel-iommu.c
>> +++ b/drivers/iommu/intel-iommu.c
>> @@ -4315,13 +4315,25 @@ static void __init init_iommu_pm_ops(void)
>>   static inline void init_iommu_pm_ops(void) {}
>>   #endif    /* CONFIG_PM */
>> +static int rmrr_validity_check(struct acpi_dmar_reserved_memory *rmrr)
>> +{
>> +    if ((rmrr->base_address & PAGE_MASK) ||
>> +        (rmrr->end_address <= rmrr->base_address) ||
>> +        ((rmrr->end_address - rmrr->base_address + 1) & PAGE_MASK)) {
>> +        pr_err(FW_BUG "Broken RMRR base: %#018Lx end: %#018Lx\n",
>> +               rmrr->base_address, rmrr->end_address);
>
> Since you will WARN_TAINT below, do you still want an error message
> here?

I'm fine either way.

I put it in since arch_rmrr_sanity_check() also has a pr_err():

	pr_err(FW_BUG "No firmware reserved region can cover this RMRR
	       [%#018Lx-%#018Lx], contact BIOS vendor for fixes\n",
	       start, end - 1);

Thanks,

Barret


Re: [PATCH 1/2] dt-bindings: iommu: Add binding for MediaTek MT8167 IOMMU

2020-01-08 Thread Rob Herring
On Fri,  3 Jan 2020 17:26:31 +0100, Fabien Parent wrote:
> This commit adds IOMMU binding documentation for the MT8167 SoC.
> 
> Signed-off-by: Fabien Parent 
> ---
>  Documentation/devicetree/bindings/iommu/mediatek,iommu.txt | 1 +
>  1 file changed, 1 insertion(+)
> 

Acked-by: Rob Herring 


Re: [PATCH v10 0/4] Add uacce module for Accelerator

2020-01-08 Thread Dave Jiang

On 12/15/19 8:08 PM, Zhangfei Gao wrote:

Uacce (Unified/User-space-access-intended Accelerator Framework) aims to
provide Shared Virtual Addressing (SVA) between accelerators and processes,
so an accelerator can access any data structure of the main CPU.
This differs from data sharing between CPU and IO device, which shares
data content rather than addresses.
Because of the unified address space, the hardware and the user space of
a process can share the same virtual addresses when communicating.

Uacce is intended to be used with Jean-Philippe Brucker's SVA
patchset [1], which enables IO-side page faults and PASID support.
We keep verifying with Jean's SVA patchset [2]
and with Eric's SMMUv3 nested stage patches [3].


Looking forward to this common framework going upstream. I'm currently 
in the process of upstreaming the device driver (idxd) [1] for the 
recently announced Intel Data Streaming Accelerator [2] [3] that also 
supports SVA.


[1] https://lkml.org/lkml/2020/1/7/905
[2] https://01.org/blogs/2019/introducing-intel-data-streaming-accelerator
[3] 
https://software.intel.com/en-us/download/intel-data-streaming-accelerator-preliminary-architecture-specification


And I think with this framework upstream I can potentially drop the
in-driver exported char device support code and use this framework directly.





This series and the related zip & qm driver:
https://github.com/Linaro/linux-kernel-warpdrive/tree/v5.5-rc1-uacce-v10

The library and user application:
https://github.com/Linaro/warpdrive/tree/wdprd-upstream-v10

References:
[1] http://jpbrucker.net/sva/
[2] http://jpbrucker.net/git/linux/log/?h=sva/zip-devel
[3] https://github.com/eauger/linux/tree/v5.3.0-rc0-2stage-v9

Change History:
v10:
Modify the include header to fix a kbuild test error on other arches.

v9:
Suggested by Jonathan
1. Remove sysfs: numa_distance, node_id, id, also add is_visible callback
2. Split the api to solve the potential race (usage sketch below)
struct uacce_device *uacce_alloc(struct device *parent,
 struct uacce_interface *interface)
int uacce_register(struct uacce_device *uacce)
void uacce_remove(struct uacce_device *uacce)
3. Split clean up patch 03
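
For illustration of item 2, a minimal sketch of how a provider driver
might call the split API (the parent device, ops table and interface
name are hypothetical placeholders, not taken from this series):

	struct uacce_interface interface = {
		.name	= "example",		/* hypothetical chrdev name */
		.flags	= UACCE_DEV_SVA,
		.ops	= &example_uacce_ops,	/* hypothetical ops table */
	};
	struct uacce_device *uacce;
	int ret;

	/* alloc and register are split so the driver can finish its own
	 * setup in between without racing against user space */
	uacce = uacce_alloc(parent, &interface);
	if (IS_ERR(uacce))
		return PTR_ERR(uacce);

	ret = uacce_register(uacce);
	if (ret) {
		uacce_remove(uacce);
		return ret;
	}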

v8:
Address some comments from Jonathan
Merge Jean's patch, using uacce_mm instead of pid for sva_exit

v7:
As suggested by Jean and Jerome
Only consider sva case and remove unused dma apis for the first patch.
Also add mm_exit for sva and vm_ops.close etc


v6: https://lkml.org/lkml/2019/10/16/231
Change sysfs qfrs_size to a different file, suggested by Jonathan
Fix the crypto daily build issue; based on the crypto code base, also 5.4-rc1.

v5: https://lkml.org/lkml/2019/10/14/74
Add an example patch using the uacce interface, suggested by Greg
0003-crypto-hisilicon-register-zip-engine-to-uacce.patch

v4: https://lkml.org/lkml/2019/9/17/116
Based on 5.4-rc1
Considering other drivers integrating uacce:
if uacce is not compiled, uacce_register returns an error and
uacce_unregister is empty.
Simplify uacce flag: UACCE_DEV_SVA.
Address Greg's comments:
Fix state machine, remove potential syslog triggered from user space etc.

v3: https://lkml.org/lkml/2019/9/2/990
Recommended by Greg: use struct uacce_device instead of struct uacce,
and use struct *cdev in struct uacce_device; as a result,
cdev can be released by itself when its refcount drops to 0.
So the two structures are decoupled and maintain themselves.
Also add dev.release for put_device.

v2: https://lkml.org/lkml/2019/8/28/565
Address comments from Greg and Jonathan
Modify interface uacce_register
Drop noiommu mode first

v1: https://lkml.org/lkml/2019/8/14/277
1. Rebase to 5.3-rc1
2. Build on iommu interface
3. Verifying with Jean's sva and Eric's nested mode iommu.
4. User library has developed a lot: support zlib, openssl etc.
5. Move to misc first

RFC3:
https://lkml.org/lkml/2018/11/12/1951

RFC2:
https://lwn.net/Articles/763990/


Background of why Uacce:
A Von Neumann processor is not good at general data manipulation.
It is designed for control-bound rather than data-bound applications.
The latter need less control-path facility and more/specific ALUs.
So there are more and more heterogeneous processors, such as
encryption/decryption accelerators, TPUs, or
EDGE (Explicit Data Graph Execution) processors, introduced to gain
better performance or power efficiency for particular applications
these days.

There are generally two ways to make use of these heterogeneous processors:

The first is to make them co-processors, just like the FPU.
This is good for some applications but it has its own cons:
It changes the ISA set permanently.
You must save all state elements when the process is switched out,
but most data-bound processors have a huge set of state elements.
It makes the kernel scheduler more complex.

The second is the Accelerator.
It is taken as an IO device from the CPU's point of view
(but it need not be physically). The process, running on the CPU,
holds a context of the accelerator and send 

Re: [PATCH 2/2] arm: use swiotlb for bounce buffer on LPAE configs

2020-01-08 Thread Robin Murphy

On 08/01/2020 2:00 pm, Peter Ujfalusi wrote:
> Robin,
>
> On 08/01/2020 14.21, Robin Murphy wrote:
>> On 08/01/2020 8:28 am, Peter Ujfalusi via iommu wrote:
>>> Hi Christoph,
>>>
>>> On 19/12/2019 17.20, Peter Ujfalusi wrote:
>>>> Hi Christoph,
>>>>
>>>> On 19/12/2019 17.02, Christoph Hellwig wrote:
>>>>> Hi Peter,
>>>>>
>>>>> can you try the patch below (it will need to be split into two):
>>>>
>>>> Thank you!
>>>>
>>>> Unfortunately it does not help:
>>>> [    0.596208] edma: probe of 270.edma failed with error -5
>>>> [    0.596626] edma: probe of 2728000.edma failed with error -5
>>>> ...
>>>> [    2.108602] sdhci-omap 2300.mmc: Got CD GPIO
>>>> [    2.113899] mmc0: Failed to set 32-bit DMA mask.
>>>> [    2.118592] mmc0: No suitable DMA available - falling back to PIO
>>>> [    2.159038] mmc0: SDHCI controller on 2300.mmc [2300.mmc] using PIO
>>>> [    2.167531] mmc1: Failed to set 32-bit DMA mask.
>>>> [    2.172192] mmc1: No suitable DMA available - falling back to PIO
>>>> [    2.213841] mmc1: SDHCI controller on 2310.mmc [2310.mmc] using PIO
>>>
>>> Do you have an idea on how to fix this in a proper way?
>>>
>>> IMHO when drivers are setting the dma_mask and coherent_mask the
>>> dma_pfn_offset should not be applied to the mask at all.
>>>
>>> If I understand it correctly for EDMA as example:
>>>
>>> I set dma_set_mask_and_coherent(dev, DMA_BIT_MASK(32));
>>> since it can only address memory in this range.
>>>
>>> It does not matter if dma_pfn_offset is 0 or not 0 (like in k2g, where
>>> it is 0x780000) the EDMA still can only address within 32 bits.
>>>
>>> The dma_pfn_offset will tell us that the memory location's physical
>>> address is seen by the DMA at (phys_pfn - dma_pfn_offset) -> dma_pfn.
>>>
>>> The dma_mask should be checked against the dma_pfn.


>> That's exactly what dma_direct_supported() does, though.
>
> Yes, this is my understanding as well.
>
>> What it doesn't
>> do, AFAICS, is account for weird cases where the DMA zone *starts* way,
>> way above 32 physical address bits because of an implicit assumption
>> that *all* devices have a dma_pfn_offset equal to min_low_pfn.
>
> The problem - I think - is that the DMA_BIT_MASK(32) from
> dma_set_mask_and_coherent(dev, DMA_BIT_MASK(32)) is treated as a physical
> address along the call path, so the dma_pfn_offset is applied to it and
> the check will fail, saying that DMA_BIT_MASK(32) cannot be supported.


But that's the thing - in isolation, that is entirely correct.
Considering ZONE_DMA32 for simplicity, in general the zone is expected
to cover the physical address range 0x0000_0000 - 0xffff_ffff (because
DMA offsets are relatively rare), and a device with a dma_pfn_offset of
more than (0x1_0000_0000 >> PAGE_SHIFT) *cannot* support that range with
any mask, because the DMA address itself would have to be negative.
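
To put a number on the negative-address case (my arithmetic, purely
illustrative): with PAGE_SHIFT == 12, a dma_pfn_offset of 0x100000
corresponds to a 4GB offset, so for every physical address in that zone

	dma = phys - (0x100000 << 12) = phys - 0x1_0000_0000 < 0

and no valid DMA address exists for it, whatever the mask.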


The problem is that platforms with esoteric memory maps have no right
thing to do. If the base of RAM is at 0x1_0000_0000 or higher, the
"correct" ZONE_DMA32 would be empty while ZONE_NORMAL above it would
not, and last time I looked that makes the page allocator break badly.
So the standard bodge on such platforms is to make ZONE_DMA32 cover not
the first 4GB of *PA space*, but the first 4GB of *RAM*, wherever that
happens to be. That then brings different problems - now the page
allocator is happy and successfully returns GFP_DMA32 allocations from
the range 0x8_0000_0000 - 0x8_ffff_ffff that are utterly useless to
32-bit devices with zero dma_pfn_offset - see the AMD Seattle SoC for
the prime example of that. If on the other hand all devices are
guaranteed to have a dma_pfn_offset that puts the base of RAM at DMA
address 0, then GFP_DMA32 allocations do end up working as expected, but
now the original assumption of where ZONE_DMA32 actually sits is broken,
so generic code unaware of the platform/architecture-specific bodge will
be misled - that's the case you're running into.


Having thought this far, if there's a non-hacky way to reach in and grab 
ZONE_DMA{32} such that dma_direct_supported() could use zone_end_pfn() 
instead of trying to assume either way, that might be the most robust 
general solution.


Robin.


> Fyi, this is what I have gathered on k2g via prints:
>
> dma_set_mask_and_coherent(dev, DMA_BIT_MASK(32));
>
> Prints inside dma_direct_supported():
> sdhci-omap 2300.mmc: max_pfn: 88
> sdhci-omap 2300.mmc: min_mask #1: ff
> sdhci-omap 2300.mmc: min_mask #2: ff
> sdhci-omap 2300.mmc: dev->dma_pfn_offset: 78
> sdhci-omap 2300.mmc: PAGE_SHIFT: 12
> sdhci-omap 2300.mmc: __phys_to_dma(dev, min_mask): ff880ff
> sdhci-omap 2300.mmc: mask: 
>
> Print in dma_supported() after returning from dma_direct_supported():
> sdhci-omap 2300.mmc: dma_is_direct, ret = 0
>
> sdhci-omap 2310.mmc: DMA is not supported for the device
>
> keystone-k2g has this in the soc0 node:
> dma-ranges = <0x80000000 0x8 0x00000000 0x80000000>;
>
> DDR starts at 0x8 0000 0000 (8G) and 2G is aliased at 0x8000 0000.
>
> This gives the 0x780000 for dma_pfn_offset for all devices underneath it.
>
> The DMA_BIT_MASK(24) is passed to __phys_to_dma() because
> CONFIG_ZONE_DMA is

Re: [PATCH 2/2] arm: use swiotlb for bounce buffer on LPAE configs

2020-01-08 Thread Peter Ujfalusi via iommu
Robin,

On 08/01/2020 14.21, Robin Murphy wrote:
> On 08/01/2020 8:28 am, Peter Ujfalusi via iommu wrote:
>> Hi Christoph,
>>
>> On 19/12/2019 17.20, Peter Ujfalusi wrote:
>>> Hi Christoph,
>>>
>>> On 19/12/2019 17.02, Christoph Hellwig wrote:
>>>> Hi Peter,
>>>>
>>>> can you try the patch below (it will need to be split into two):
>>>
>>> Thank you!
>>>
>>> Unfortunately it does not help:
>>> [    0.596208] edma: probe of 270.edma failed with error -5
>>> [    0.596626] edma: probe of 2728000.edma failed with error -5
>>> ...
>>> [    2.108602] sdhci-omap 2300.mmc: Got CD GPIO
>>> [    2.113899] mmc0: Failed to set 32-bit DMA mask.
>>> [    2.118592] mmc0: No suitable DMA available - falling back to PIO
>>> [    2.159038] mmc0: SDHCI controller on 2300.mmc [2300.mmc]
>>> using PIO
>>> [    2.167531] mmc1: Failed to set 32-bit DMA mask.
>>> [    2.172192] mmc1: No suitable DMA available - falling back to PIO
>>> [    2.213841] mmc1: SDHCI controller on 2310.mmc [2310.mmc]
>>> using PIO
>>
>> Do you have an idea on how to fix this in a proper way?
>>
>> IMHO when drivers are setting the dma_mask and coherent_mask the
>> dma_pfn_offset should not be applied to the mask at all.
>>
>> If I understand it correctly for EDMA as example:
>>
>> I set dma_set_mask_and_coherent(dev, DMA_BIT_MASK(32));
>> since it can only address memory in this range.
>>
>> It does not matter if dma_pfn_offset is 0 or not 0 (like in k2g, where
>> it is 0x780000) the EDMA still can only address within 32 bits.
>>
>> The dma_pfn_offset will tell us that the memory location's physical
>> address is seen by the DMA at (phys_pfn - dma_pfn_offset) -> dma_pfn.
>>
>> The dma_mask should be checked against the dma_pfn.
> 
> That's exactly what dma_direct_supported() does, though.

Yes, this is my understanding as well.

> What it doesn't
> do, AFAICS, is account for weird cases where the DMA zone *starts* way,
> way above 32 physical address bits because of an implicit assumption
> that *all* devices have a dma_pfn_offset equal to min_low_pfn.

The problem - I think - is that the DMA_BIT_MASK(32) from
dma_set_mask_and_coherent(dev, DMA_BIT_MASK(32)) is treated as a physical
address along the call path, so the dma_pfn_offset is applied to it and
the check will fail, saying that DMA_BIT_MASK(32) cannot be supported.

Fyi, this is what I have gathered on k2g via prints:

dma_set_mask_and_coherent(dev, DMA_BIT_MASK(32));

Prints inside dma_direct_supported():
sdhci-omap 2300.mmc: max_pfn: 88
sdhci-omap 2300.mmc: min_mask #1: ff
sdhci-omap 2300.mmc: min_mask #2: ff
sdhci-omap 2300.mmc: dev->dma_pfn_offset: 78
sdhci-omap 2300.mmc: PAGE_SHIFT: 12
sdhci-omap 2300.mmc: __phys_to_dma(dev, min_mask): ff880ff
sdhci-omap 2300.mmc: mask: 

Print in dma_supported() after returning from dma_direct_supported():
sdhci-omap 2300.mmc: dma_is_direct, ret = 0

sdhci-omap 2310.mmc: DMA is not supported for the device

keystone-k2g have this in soc0 node:
dma-ranges = <0x80000000 0x8 0x00000000 0x80000000>;

DDR starts at 0x8 0000 0000 (8G) and 2G is aliased at 0x8000 0000.

This gives the 0x780000 for dma_pfn_offset for all devices underneath it.

The DMA_BIT_MASK(24) is passed to __phys_to_dma() because
CONFIG_ZONE_DMA is enabled.

SWIOTLB is enabled in kconfig.
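
Working those numbers through by hand (PAGE_SHIFT is 12 here; the
arithmetic below is illustrative, not from the prints):

	dma_pfn_offset = (0x8_0000_0000 - 0x8000_0000) >> 12 = 0x780000
	dma(0x8_0000_0000) = 0x8_0000_0000 - (0x780000 << 12)
	                   = 0x8000_0000

so the first 2GB of RAM is visible to these devices at DMA addresses
0x8000_0000 - 0xffff_ffff, i.e. a 32-bit mask really is sufficient and
the failure comes from the mask check, not from the hardware.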

> I'm not sure how possible it is to cope with that generically.

Me neither, but k2g is failing since v5.3-rc3 and the bisect pointed to
this commit.

> 
> Robin.
> 
>> We cannot 'move' the dma_mask with dma_pfn_offset when setting the mask
>> since it is not correct. The DMA can only access a 32-bit range and we have
>> the peripherals under 0x8000 0000.
>>
>> I might be missing something, but it looks to me that the way we set the
>> dma_mask and the coherent_mask is the place where this can be fixed.
>>
>> Best regards,
>> - Péter
>>
>>>
>>> - Péter
>>>
>>>
>>>> diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
>>>> index e822af0d9219..30b9c6786ce3 100644
>>>> --- a/arch/arm/mm/dma-mapping.c
>>>> +++ b/arch/arm/mm/dma-mapping.c
>>>> @@ -221,7 +221,8 @@ EXPORT_SYMBOL(arm_coherent_dma_ops);
>>>>   static int __dma_supported(struct device *dev, u64 mask, bool warn)
>>>>   {
>>>> -    unsigned long max_dma_pfn = min(max_pfn, arm_dma_pfn_limit);
>>>> +    unsigned long max_dma_pfn =
>>>> +    min_t(unsigned long, max_pfn, zone_dma_limit >> PAGE_SHIFT);
>>>>    /*
>>>>   * Translate the device's DMA mask to a PFN limit.  This
>>>> diff --git a/arch/arm/mm/init.c b/arch/arm/mm/init.c
>>>> index 3ef204137e73..dd0e169a1bb1 100644
>>>> --- a/arch/arm/mm/init.c
>>>> +++ b/arch/arm/mm/init.c
>>>> @@ -19,6 +19,7 @@
>>>>   #include 
>>>>   #include 
>>>>   #include 
>>>> +#include 
>>>>   #include 
>>>>   #include 
>>>>   #include 
>>>> @@ -84,15 +85,6 @@ static void __init find_limits(unsigned long *min, unsigned long *max_low,
>>>>   phys_addr_t arm_dma_zone_size __read_mostly;

Re: [PATCH 22/22] iommu/vt-d: Add a quirk flag for scope mismatched devices

2020-01-08 Thread Christoph Hellwig
> +/*
> + * We expect devices with endpoint scope to have normal PCI
> + * headers, and devices with bridge scope to have bridge PCI
> + * headers.  However some PCI devices may be listed in the
> + * DMAR table with bridge scope, even though they have a
> + * normal PCI header. We don't declare a scope mismatch for
> + * the special cases below.
> + */

Please use up all 80 columns for comments.

> +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x2f0d,  /* NTB devices */
> +                         quirk_dmar_scope_mismatch);
> +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x2020,  /* NVME host */
> +                         quirk_dmar_scope_mismatch);

As said before about "NVME host": besides the wrong spelling of NVMe,
the NVMe host is the Linux kernel, so describing a device as such seems
rather bogus.


Re: [PATCH 2/2] arm: use swiotlb for bounce buffer on LPAE configs

2020-01-08 Thread Robin Murphy

On 08/01/2020 8:28 am, Peter Ujfalusi via iommu wrote:

Hi Christoph,

On 19/12/2019 17.20, Peter Ujfalusi wrote:

Hi Christoph,

On 19/12/2019 17.02, Christoph Hellwig wrote:

Hi Peter,

can you try the patch below (it will need to be split into two):


Thank you!

Unfortunately it does not help:
[0.596208] edma: probe of 270.edma failed with error -5
[0.596626] edma: probe of 2728000.edma failed with error -5
...
[2.108602] sdhci-omap 2300.mmc: Got CD GPIO
[2.113899] mmc0: Failed to set 32-bit DMA mask.
[2.118592] mmc0: No suitable DMA available - falling back to PIO
[2.159038] mmc0: SDHCI controller on 2300.mmc [2300.mmc]
using PIO
[2.167531] mmc1: Failed to set 32-bit DMA mask.
[2.172192] mmc1: No suitable DMA available - falling back to PIO
[2.213841] mmc1: SDHCI controller on 2310.mmc [2310.mmc]
using PIO


Do you have an idea on how to fix this in a proper way?

IMHO when drivers are setting the dma_mask and coherent_mask the
dma_pfn_offset should not be applied to the mask at all.

If I understand it correctly for EDMA as example:

I set dma_set_mask_and_coherent(dev, DMA_BIT_MASK(32));
since it can only address memory in this range.

It does not matter if dma_pfn_offset is 0 or not 0 (like in k2g, where
it is 0x780000) the EDMA still can only address within 32 bits.

The dma_pfn_offset will tell us that the memory location's physical
address is seen by the DMA at (phys_pfn - dma_pfn_offset) -> dma_pfn.

The dma_mask should be checked against the dma_pfn.


That's exactly what dma_direct_supported() does, though. What it doesn't 
do, AFAICS, is account for weird cases where the DMA zone *starts* way, 
way above 32 physical address bits because of an implicit assumption 
that *all* devices have a dma_pfn_offset equal to min_low_pfn. I'm not 
sure how possible it is to cope with that generically.


Robin.


We cannot 'move' the dma_mask with dma_pfn_offset when setting the mask
since it is not correct. The DMA can only access a 32-bit range and we have
the peripherals under 0x8000 0000.

I might be missing something, but it looks to me that the way we set the
dma_mask and the coherent_mask is the place where this can be fixed.

Best regards,
- Péter



- Péter



diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index e822af0d9219..30b9c6786ce3 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -221,7 +221,8 @@ EXPORT_SYMBOL(arm_coherent_dma_ops);
  
  static int __dma_supported(struct device *dev, u64 mask, bool warn)

  {
-   unsigned long max_dma_pfn = min(max_pfn, arm_dma_pfn_limit);
+   unsigned long max_dma_pfn =
+   min_t(unsigned long, max_pfn, zone_dma_limit >> PAGE_SHIFT);
  
  	/*

 * Translate the device's DMA mask to a PFN limit.  This
diff --git a/arch/arm/mm/init.c b/arch/arm/mm/init.c
index 3ef204137e73..dd0e169a1bb1 100644
--- a/arch/arm/mm/init.c
+++ b/arch/arm/mm/init.c
@@ -19,6 +19,7 @@
  #include 
  #include 
  #include 
+#include 
  #include 
  #include 
  #include 
@@ -84,15 +85,6 @@ static void __init find_limits(unsigned long *min, unsigned 
long *max_low,
  phys_addr_t arm_dma_zone_size __read_mostly;
  EXPORT_SYMBOL(arm_dma_zone_size);
  
-/*

- * The DMA mask corresponding to the maximum bus address allocatable
- * using GFP_DMA.  The default here places no restriction on DMA
- * allocations.  This must be the smallest DMA mask in the system,
- * so a successful GFP_DMA allocation will always satisfy this.
- */
-phys_addr_t arm_dma_limit;
-unsigned long arm_dma_pfn_limit;
-
  static void __init arm_adjust_dma_zone(unsigned long *size, unsigned long 
*hole,
unsigned long dma_size)
  {
@@ -108,14 +100,14 @@ static void __init arm_adjust_dma_zone(unsigned long 
*size, unsigned long *hole,
  
  void __init setup_dma_zone(const struct machine_desc *mdesc)

  {
-#ifdef CONFIG_ZONE_DMA
-   if (mdesc->dma_zone_size) {
+   if (!IS_ENABLED(CONFIG_ZONE_DMA)) {
+   zone_dma_limit = ((phys_addr_t)~0);
+   } else if (mdesc->dma_zone_size) {
arm_dma_zone_size = mdesc->dma_zone_size;
-   arm_dma_limit = PHYS_OFFSET + arm_dma_zone_size - 1;
-   } else
-   arm_dma_limit = 0x;
-   arm_dma_pfn_limit = arm_dma_limit >> PAGE_SHIFT;
-#endif
+   zone_dma_limit = PHYS_OFFSET + arm_dma_zone_size - 1;
+   } else {
+   zone_dma_limit = 0x;
+   }
  }
  
  static void __init zone_sizes_init(unsigned long min, unsigned long max_low,

@@ -279,7 +271,7 @@ void __init arm_memblock_init(const struct machine_desc 
*mdesc)
early_init_fdt_scan_reserved_mem();
  
  	/* reserve memory for DMA contiguous allocations */

-   dma_contiguous_reserve(arm_dma_limit);
+   dma_contiguous_reserve(zone_dma_limit);
  
  	arm_memblock_steal_permitted = false;

memblock_dump_all();
diff --git a/arch/arm/mm/mm.h b/arch/arm/mm/mm.h
index 

Re: [PATCH 0/3] iommu/arm-smmu: Qualcomm bootsplash/efifb

2020-01-08 Thread Will Deacon
On Thu, Dec 26, 2019 at 02:17:06PM -0800, Bjorn Andersson wrote:
> These patches implement the stream mapping inheritance that's necessary in
> order to not hit a security violation as the display hardware loses its
> stream mapping during initialization of arm-smmu in various Qualcomm
> platforms.
> 
> This was previously posted as an RFC [1]; changes since then involve the
> rebase and migration of the read-back code to the Qualcomm-specific
> implementation, the mapping now being maintained indefinitely - to handle
> probe-deferring clients - and rewritten commit messages.

I don't think we should solve this in a Qualcomm-specific manner. Please can
you take a look at the proposal from Thierry [1] and see whether or not it
works for you?

Thanks,

Will

[1] 
https://lore.kernel.org/lkml/20191209150748.2471814-1-thierry.red...@gmail.com


Re: [PATCH 2/2] arm: use swiotlb for bounce buffer on LPAE configs

2020-01-08 Thread Peter Ujfalusi via iommu
Hi Christoph,

On 19/12/2019 17.20, Peter Ujfalusi wrote:
> Hi Christoph,
> 
> On 19/12/2019 17.02, Christoph Hellwig wrote:
>> Hi Peter,
>>
>> can you try the patch below (it will need to be split into two):
> 
> Thank you!
> 
> Unfortunately it does not help:
> [0.596208] edma: probe of 270.edma failed with error -5
> [0.596626] edma: probe of 2728000.edma failed with error -5
> ...
> [2.108602] sdhci-omap 2300.mmc: Got CD GPIO
> [2.113899] mmc0: Failed to set 32-bit DMA mask.
> [2.118592] mmc0: No suitable DMA available - falling back to PIO
> [2.159038] mmc0: SDHCI controller on 2300.mmc [2300.mmc]
> using PIO
> [2.167531] mmc1: Failed to set 32-bit DMA mask.
> [2.172192] mmc1: No suitable DMA available - falling back to PIO
> [2.213841] mmc1: SDHCI controller on 2310.mmc [2310.mmc]
> using PIO

Do you have an idea on how to fix this in a proper way?

IMHO when drivers are setting the dma_mask and coherent_mask the
dma_pfn_offset should not be applied to the mask at all.

If I understand it correctly for EDMA as example:

I set dma_set_mask_and_coherent(dev, DMA_BIT_MASK(32));
since it can only address memory in this range.

It does not matter if dma_pfn_offset is 0 or not 0 (like in k2g, where
it is 0x780000) the EDMA still can only address within 32 bits.

The dma_pfn_offset will tell us that the memory location's physical
address is seen by the DMA at (phys_pfn - dma_pfn_offset) -> dma_pfn.

The dma_mask should be checked against the dma_pfn.
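
In code terms, the check being argued for is roughly this (a sketch
against the dev->dma_pfn_offset scheme in use at the time, since
replaced by dma_range_map; illustrative only):

	static unsigned long phys_pfn_to_dma_pfn(struct device *dev,
						 unsigned long phys_pfn)
	{
		/* the device sees RAM shifted down by dma_pfn_offset pages */
		return phys_pfn - dev->dma_pfn_offset;
	}

	static bool mask_reaches(struct device *dev, u64 mask, phys_addr_t phys)
	{
		/* compare the mask against the DMA pfn, not the physical pfn */
		return phys_pfn_to_dma_pfn(dev, PHYS_PFN(phys)) <= PFN_DOWN(mask);
	}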

We cannot 'move' the dma_mask with dma_pfn_offset when setting the mask
since it is not correct. The DMA can only access a 32-bit range and we have
the peripherals under 0x8000 0000.

I might be missing something, but it looks to me that the way we set the
dma_mask and the coherent_mask is the place where this can be fixed.

Best regards,
- Péter

> 
> - Péter
> 
> 
>> diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
>> index e822af0d9219..30b9c6786ce3 100644
>> --- a/arch/arm/mm/dma-mapping.c
>> +++ b/arch/arm/mm/dma-mapping.c
>> @@ -221,7 +221,8 @@ EXPORT_SYMBOL(arm_coherent_dma_ops);
>>  
>>  static int __dma_supported(struct device *dev, u64 mask, bool warn)
>>  {
>> -unsigned long max_dma_pfn = min(max_pfn, arm_dma_pfn_limit);
>> +unsigned long max_dma_pfn =
>> +min_t(unsigned long, max_pfn, zone_dma_limit >> PAGE_SHIFT);
>>  
>>  /*
>>   * Translate the device's DMA mask to a PFN limit.  This
>> diff --git a/arch/arm/mm/init.c b/arch/arm/mm/init.c
>> index 3ef204137e73..dd0e169a1bb1 100644
>> --- a/arch/arm/mm/init.c
>> +++ b/arch/arm/mm/init.c
>> @@ -19,6 +19,7 @@
>>  #include 
>>  #include 
>>  #include 
>> +#include 
>>  #include 
>>  #include 
>>  #include 
>> @@ -84,15 +85,6 @@ static void __init find_limits(unsigned long *min, 
>> unsigned long *max_low,
>>  phys_addr_t arm_dma_zone_size __read_mostly;
>>  EXPORT_SYMBOL(arm_dma_zone_size);
>>  
>> -/*
>> - * The DMA mask corresponding to the maximum bus address allocatable
>> - * using GFP_DMA.  The default here places no restriction on DMA
>> - * allocations.  This must be the smallest DMA mask in the system,
>> - * so a successful GFP_DMA allocation will always satisfy this.
>> - */
>> -phys_addr_t arm_dma_limit;
>> -unsigned long arm_dma_pfn_limit;
>> -
>>  static void __init arm_adjust_dma_zone(unsigned long *size, unsigned long 
>> *hole,
>>  unsigned long dma_size)
>>  {
>> @@ -108,14 +100,14 @@ static void __init arm_adjust_dma_zone(unsigned long 
>> *size, unsigned long *hole,
>>  
>>  void __init setup_dma_zone(const struct machine_desc *mdesc)
>>  {
>> -#ifdef CONFIG_ZONE_DMA
>> -if (mdesc->dma_zone_size) {
>> +if (!IS_ENABLED(CONFIG_ZONE_DMA)) {
>> +zone_dma_limit = ((phys_addr_t)~0);
>> +} else if (mdesc->dma_zone_size) {
>>  arm_dma_zone_size = mdesc->dma_zone_size;
>> -arm_dma_limit = PHYS_OFFSET + arm_dma_zone_size - 1;
>> -} else
>> -arm_dma_limit = 0x;
>> -arm_dma_pfn_limit = arm_dma_limit >> PAGE_SHIFT;
>> -#endif
>> +zone_dma_limit = PHYS_OFFSET + arm_dma_zone_size - 1;
>> +} else {
>> +zone_dma_limit = 0x;
>> +}
>>  }
>>  
>>  static void __init zone_sizes_init(unsigned long min, unsigned long max_low,
>> @@ -279,7 +271,7 @@ void __init arm_memblock_init(const struct machine_desc 
>> *mdesc)
>>  early_init_fdt_scan_reserved_mem();
>>  
>>  /* reserve memory for DMA contiguous allocations */
>> -dma_contiguous_reserve(arm_dma_limit);
>> +dma_contiguous_reserve(zone_dma_limit);
>>  
>>  arm_memblock_steal_permitted = false;
>>  memblock_dump_all();
>> diff --git a/arch/arm/mm/mm.h b/arch/arm/mm/mm.h
>> index 88c121ac14b3..7dbd77554273 100644
>> --- a/arch/arm/mm/mm.h
>> +++ b/arch/arm/mm/mm.h
>> @@ -82,14 +82,6 @@ extern __init void add_static_vm_early(struct static_vm 
>> *svm);
>>  
>>  #endif
>>  
>> -#ifdef CONFIG_ZONE_DMA
>> -extern