Re: [RFC] Use SMMU HTTU for DMA dirty page tracking
On 2020/5/27 11:27, Tian, Kevin wrote:
>> From: Xiang Zheng
>> Sent: Monday, May 25, 2020 7:34 PM
>>
>> [+cc Kirti, Yan, Alex]
>>
>> On 2020/5/23 1:14, Jean-Philippe Brucker wrote:
>>> Hi,
>>>
>>> On Tue, May 19, 2020 at 05:42:55PM +0800, Xiang Zheng wrote:
>>>> Hi all,
>>>>
>>>> Is there any plan for enabling SMMU HTTU?
>>>
>>> Not outside of SVA, as far as I know.
>>>
>>>> I have seen the patch located in the SVA series, which adds support
>>>> for HTTU:
>>>> https://www.spinics.net/lists/arm-kernel/msg798694.html
>>>>
>>>> HTTU reduces the number of access faults on the SMMU fault queue
>>>> (permission faults also benefit from it).
>>>>
>>>> Besides reducing faults, HTTU also helps to track dirty pages for
>>>> device DMA. Is it feasible to utilize HTTU to get dirty pages on
>>>> device DMA during VFIO live migration?
>>>
>>> As you know there is a VFIO interface for this under discussion:
>>> https://lore.kernel.org/kvm/1589781397-28368-1-git-send-email-kwankh...@nvidia.com/
>>> It doesn't implement an internal API to communicate with the IOMMU
>>> driver about dirty pages.
>
> We plan to add such an API later, e.g. to utilize the A/D bit in VT-d
> second-level page tables (Rev 3.0).
>

Thank you, Kevin. When will you send this patch series? Maybe (hopefully)
we can also support hardware-based dirty page tracking via common APIs
based on your patches. :)

>>>> If SMMU can track dirty pages, devices are not required to implement
>>>> additional dirty page tracking to support VFIO live migration.
>>>
>>> It seems feasible, though tracking it in the device might be more
>>> efficient. I might have misunderstood but I think for live migration of
>>> the Intel NIC they trap guest accesses to the device and introspect its
>>> state to figure out which pages it is accessing.
>
> Does HTTU implement an A/D-like mechanism in SMMU page tables, or just
> report dirty pages in a log buffer? Either way, tracking dirty pages on
> the IOMMU side is generic and thus doesn't require device-specific
> tweaks like in the Intel NIC.
Currently HTTU just implements an A/D-like mechanism in SMMU page
tables. We certainly expect SMMU to also implement a PML-like feature so
that we can avoid walking the whole page table to get the dirty pages.

By the way, I'm not sure whether HTTU or SLAD can help for mediated
devices.

--
Thanks,
Xiang

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
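The exchange above contrasts two hardware designs: an A/D-style scheme, where dirty state lives in the page-table entries themselves and software must walk the tables to collect it, versus a PML-style log buffer that reports dirty pages directly. A minimal user-space sketch of the walk-and-collect step is below. Everything here is invented for illustration: the flat one-level "table", the `PTE_DIRTY` bit position, and the function names do not correspond to the real SMMU io-pgtable code, which uses multi-level tables.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Illustrative model only: a flat "page table" where bit 0 of each
 * entry stands in for the SMMU HTTU dirty (HD-updated) flag. */
#define PTE_DIRTY (1u << 0)

/* Scan nents entries, set the corresponding bit in bitmap for each
 * dirty entry, and clear the dirty flag (read-and-clear semantics,
 * which is what reporting dirty pages to a migration bitmap needs).
 * Returns the number of dirty entries found. */
static size_t harvest_dirty(uint32_t *ptes, size_t nents, uint8_t *bitmap)
{
	size_t i, ndirty = 0;

	memset(bitmap, 0, (nents + 7) / 8);
	for (i = 0; i < nents; i++) {
		if (ptes[i] & PTE_DIRTY) {
			bitmap[i / 8] |= 1u << (i % 8);
			ptes[i] &= ~PTE_DIRTY;
			ndirty++;
		}
	}
	return ndirty;
}
```

The cost of this full walk on every query is exactly what a PML-like log buffer would avoid, which is why the thread wishes for one.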
[PATCH] iommu/iova: Free global iova rcache on iova alloc failure
From: Vijayanand Jitta

Whenever an IOVA alloc request fails, we free the IOVA ranges present in
the per-CPU IOVA rcaches and then retry, but the global IOVA rcache is
not freed. As a result, we can still see IOVA alloc failures even after
the retry, because the global rcache is still holding the IOVAs, which
can cause fragmentation. So, free the global IOVA rcache as well and
then go for the retry.

Signed-off-by: Vijayanand Jitta
---
 drivers/iommu/iova.c | 22 ++
 include/linux/iova.h |  6 ++
 2 files changed, 28 insertions(+)

diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
index 0e6a953..5ae0328 100644
--- a/drivers/iommu/iova.c
+++ b/drivers/iommu/iova.c
@@ -431,6 +431,7 @@ struct iova *find_iova(struct iova_domain *iovad, unsigned long pfn)
 		flush_rcache = false;
 		for_each_online_cpu(cpu)
 			free_cpu_cached_iovas(cpu, iovad);
+		free_global_cached_iovas(iovad);
 		goto retry;
 	}
 
@@ -1044,5 +1045,26 @@ void free_cpu_cached_iovas(unsigned int cpu, struct iova_domain *iovad)
 	}
 }
 
+/*
+ * free all the IOVA ranges of global cache
+ */
+void free_global_cached_iovas(struct iova_domain *iovad)
+{
+	struct iova_rcache *rcache;
+	int i, j;
+
+	for (i = 0; i < IOVA_RANGE_CACHE_MAX_SIZE; ++i) {
+		rcache = &iovad->rcaches[i];
+		spin_lock(&rcache->lock);
+		for (j = 0; j < rcache->depot_size; ++j) {
+			iova_magazine_free_pfns(rcache->depot[j], iovad);
+			iova_magazine_free(rcache->depot[j]);
+			rcache->depot[j] = NULL;
+		}
+		rcache->depot_size = 0;
+		spin_unlock(&rcache->lock);
+	}
+}
+
 MODULE_AUTHOR("Anil S Keshavamurthy ");
 MODULE_LICENSE("GPL");
diff --git a/include/linux/iova.h b/include/linux/iova.h
index a0637ab..a905726 100644
--- a/include/linux/iova.h
+++ b/include/linux/iova.h
@@ -163,6 +163,7 @@ int init_iova_flush_queue(struct iova_domain *iovad,
 struct iova *split_and_remove_iova(struct iova_domain *iovad,
 	struct iova *iova, unsigned long pfn_lo, unsigned long pfn_hi);
 void free_cpu_cached_iovas(unsigned int cpu, struct iova_domain *iovad);
+void free_global_cached_iovas(struct iova_domain *iovad);
 #else
 static inline int iova_cache_get(void)
 {
@@ -270,6 +271,11 @@ static inline void free_cpu_cached_iovas(unsigned int cpu,
 					 struct iova_domain *iovad)
 {
 }
+
+static inline void free_global_cached_iovas(struct iova_domain *iovad)
+{
+}
+
 #endif
 #endif
--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation
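The retry path this patch modifies can be pictured with a toy model: an allocation that fails only because IOVAs are parked in the caches succeeds once both the per-CPU rcaches and the global depot are returned to the allocator. All names and structures below are invented; this is a sketch of the control flow, not the kernel code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model of the alloc-fail/flush/retry path.  The "caches" are just
 * counters of how many IOVAs they hold back from the allocator. */
struct toy_domain {
	size_t capacity;	/* total IOVAs in the domain */
	size_t allocated;	/* live allocations */
	size_t percpu_cached;	/* held in per-CPU rcaches */
	size_t global_cached;	/* held in the global rcache depot */
};

static bool toy_try_alloc(struct toy_domain *d)
{
	size_t avail = d->capacity - d->allocated -
		       d->percpu_cached - d->global_cached;
	if (!avail)
		return false;
	d->allocated++;
	return true;
}

static bool toy_alloc_iova_fast(struct toy_domain *d)
{
	if (toy_try_alloc(d))
		return true;
	/* Before the patch, only the per-CPU caches were flushed... */
	d->percpu_cached = 0;
	/* ...the patch also returns the global depot to the allocator. */
	d->global_cached = 0;
	return toy_try_alloc(d);	/* retry */
}
```

In the model, a domain whose remaining space is held entirely by the global depot reproduces the failure the commit message describes: flushing only the per-CPU caches would not be enough.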
RE: [RFC] Use SMMU HTTU for DMA dirty page tracking
> From: Xiang Zheng
> Sent: Monday, May 25, 2020 7:34 PM
>
> [+cc Kirti, Yan, Alex]
>
> On 2020/5/23 1:14, Jean-Philippe Brucker wrote:
> > Hi,
> >
> > On Tue, May 19, 2020 at 05:42:55PM +0800, Xiang Zheng wrote:
> >> Hi all,
> >>
> >> Is there any plan for enabling SMMU HTTU?
> >
> > Not outside of SVA, as far as I know.
> >
> >> I have seen the patch located in the SVA series, which adds support
> >> for HTTU:
> >> https://www.spinics.net/lists/arm-kernel/msg798694.html
> >>
> >> HTTU reduces the number of access faults on the SMMU fault queue
> >> (permission faults also benefit from it).
> >>
> >> Besides reducing faults, HTTU also helps to track dirty pages for
> >> device DMA. Is it feasible to utilize HTTU to get dirty pages on
> >> device DMA during VFIO live migration?
> >
> > As you know there is a VFIO interface for this under discussion:
> > https://lore.kernel.org/kvm/1589781397-28368-1-git-send-email-kwankh...@nvidia.com/
> > It doesn't implement an internal API to communicate with the IOMMU
> > driver about dirty pages.

We plan to add such an API later, e.g. to utilize the A/D bit in VT-d
second-level page tables (Rev 3.0).

> >
> >> If SMMU can track dirty pages, devices are not required to implement
> >> additional dirty page tracking to support VFIO live migration.
> >
> > It seems feasible, though tracking it in the device might be more
> > efficient. I might have misunderstood but I think for live migration of
> > the Intel NIC they trap guest accesses to the device and introspect its
> > state to figure out which pages it is accessing.

Does HTTU implement an A/D-like mechanism in SMMU page tables, or just
report dirty pages in a log buffer? Either way, tracking dirty pages on
the IOMMU side is generic and thus doesn't require device-specific
tweaks like in the Intel NIC.

Thanks
Kevin

> >
> > With HTTU I suppose (without much knowledge about live migration) that
> > you'd need several new interfaces to the IOMMU drivers:
> >
> > * A way for VFIO to query HTTU support in the SMMU. There are some
> >   discussions about communicating more IOMMU capabilities through VFIO
> >   but no implementation yet. When HTTU isn't supported the DIRTY_PAGES
> >   bitmap would report all pages as they do now.
> >
> > * VFIO_IOMMU_DIRTY_PAGES_FLAG_START/STOP would clear the dirty bit
> >   for all VFIO mappings (which is going to take some time). There is a
> >   walker in io-pgtable for iova_to_phys() which could be extended. I
> >   suppose it's also possible to atomically switch the HA and HD bits in
> >   context descriptors.
>
> Maybe we need not switch HA and HD bits, just turn them on all the time?
>
> >
> > * VFIO_IOMMU_DIRTY_PAGES_FLAG_GET_BITMAP would query the dirty bit
> >   for all VFIO mappings.
> >
>
> I think we need to consider the case of IOMMU dirty page logging. We
> want to test Kirti's VFIO migration patches combined with SMMU HTTU;
> any suggestions?
>
> --
> Thanks,
> Xiang
Re: [PATCH v7 18/24] iommu/arm-smmu-v3: Add support for Hardware Translation Table Update
Hi Jean,

This patch only enables HTTU bits in CDs. Is it also necessary to
enable HTTU bits in STEs in this patch?

On 2020/5/20 1:54, Jean-Philippe Brucker wrote:
> If the SMMU supports it and the kernel was built with HTTU support,
> enable hardware update of access and dirty flags. This is essential for
> shared page tables, to reduce the number of access faults on the fault
> queue. Normal DMA with io-pgtables doesn't currently use the access or
> dirty flags.
>
> We can enable HTTU even if CPUs don't support it, because the kernel
> always checks for HW dirty bit and updates the PTE flags atomically.
>
> Signed-off-by: Jean-Philippe Brucker
> ---
>  drivers/iommu/arm-smmu-v3.c | 24 +++-
>  1 file changed, 23 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
> index 1386d4d2bc60..6a368218f54c 100644
> --- a/drivers/iommu/arm-smmu-v3.c
> +++ b/drivers/iommu/arm-smmu-v3.c
> @@ -58,6 +58,8 @@
>  #define IDR0_ASID16		(1 << 12)
>  #define IDR0_ATS		(1 << 10)
>  #define IDR0_HYP		(1 << 9)
> +#define IDR0_HD			(1 << 7)
> +#define IDR0_HA			(1 << 6)
>  #define IDR0_BTM		(1 << 5)
>  #define IDR0_COHACC		(1 << 4)
>  #define IDR0_TTF		GENMASK(3, 2)
> @@ -311,6 +313,9 @@
>  #define CTXDESC_CD_0_TCR_IPS	GENMASK_ULL(34, 32)
>  #define CTXDESC_CD_0_TCR_TBI0	(1ULL << 38)
>
> +#define CTXDESC_CD_0_TCR_HA	(1UL << 43)
> +#define CTXDESC_CD_0_TCR_HD	(1UL << 42)
> +
>  #define CTXDESC_CD_0_AA64	(1UL << 41)
>  #define CTXDESC_CD_0_S		(1UL << 44)
>  #define CTXDESC_CD_0_R		(1UL << 45)
> @@ -663,6 +668,8 @@ struct arm_smmu_device {
>  #define ARM_SMMU_FEAT_E2H	(1 << 16)
>  #define ARM_SMMU_FEAT_BTM	(1 << 17)
>  #define ARM_SMMU_FEAT_SVA	(1 << 18)
> +#define ARM_SMMU_FEAT_HA	(1 << 19)
> +#define ARM_SMMU_FEAT_HD	(1 << 20)
>  	u32			features;
>
>  #define ARM_SMMU_OPT_SKIP_PREFETCH	(1 << 0)
> @@ -1718,10 +1725,17 @@ static int arm_smmu_write_ctx_desc(struct arm_smmu_domain *smmu_domain,
>  		 * this substream's traffic
>  		 */
>  	} else { /* (1) and (2) */
> +		u64 tcr = cd->tcr;
> +
>  		cdptr[1] = cpu_to_le64(cd->ttbr & CTXDESC_CD_1_TTB0_MASK);
>  		cdptr[2] = 0;
>  		cdptr[3] = cpu_to_le64(cd->mair);
>
> +		if (!(smmu->features & ARM_SMMU_FEAT_HD))
> +			tcr &= ~CTXDESC_CD_0_TCR_HD;
> +		if (!(smmu->features & ARM_SMMU_FEAT_HA))
> +			tcr &= ~CTXDESC_CD_0_TCR_HA;
> +
>  		/*
>  		 * STE is live, and the SMMU might read dwords of this CD in any
>  		 * order. Ensure that it observes valid values before reading
> @@ -1729,7 +1743,7 @@ static int arm_smmu_write_ctx_desc(struct arm_smmu_domain *smmu_domain,
>  		 */
>  		arm_smmu_sync_cd(smmu_domain, ssid, true);
>
> -		val = cd->tcr |
> +		val = tcr |
>  #ifdef __BIG_ENDIAN
>  			CTXDESC_CD_0_ENDI |
>  #endif
> @@ -1958,10 +1972,12 @@ static struct arm_smmu_ctx_desc *arm_smmu_alloc_shared_cd(struct mm_struct *mm)
>  		return old_cd;
>  	}
>
> +	/* HA and HD will be filtered out later if not supported by the SMMU */
>  	tcr = FIELD_PREP(CTXDESC_CD_0_TCR_T0SZ, 64ULL - VA_BITS) |
>  	      FIELD_PREP(CTXDESC_CD_0_TCR_IRGN0, ARM_LPAE_TCR_RGN_WBWA) |
>  	      FIELD_PREP(CTXDESC_CD_0_TCR_ORGN0, ARM_LPAE_TCR_RGN_WBWA) |
>  	      FIELD_PREP(CTXDESC_CD_0_TCR_SH0, ARM_LPAE_TCR_SH_IS) |
> +	      CTXDESC_CD_0_TCR_HA | CTXDESC_CD_0_TCR_HD |
>  	      CTXDESC_CD_0_TCR_EPD1 | CTXDESC_CD_0_AA64;
>
>  	switch (PAGE_SIZE) {
> @@ -4454,6 +4470,12 @@ static int arm_smmu_device_hw_probe(struct arm_smmu_device *smmu)
>  		smmu->features |= ARM_SMMU_FEAT_E2H;
>  	}
>
> +	if (reg & (IDR0_HA | IDR0_HD)) {
> +		smmu->features |= ARM_SMMU_FEAT_HA;
> +		if (reg & IDR0_HD)
> +			smmu->features |= ARM_SMMU_FEAT_HD;
> +	}
> +
>  	/*
>  	 * If the CPU is using VHE, but the SMMU doesn't support it, the SMMU
>  	 * will create TLB entries for NH-EL1 world and will miss the
>

--
Thanks,
Xiang
Re: [PATCH] iommu: Relax ACS requirement for Intel RCiEP devices.
On 5/26/20 3:17 PM, Ashok Raj wrote:

All Intel platforms guarantee that all root complex implementations
must send transactions up to the IOMMU for address translation. Hence
RCiEP devices with an Intel vendor ID can claim an exception for their
lack of ACS support.

3.16 Root-Complex Peer to Peer Considerations
When DMA remapping is enabled, peer-to-peer requests through the
Root-Complex must be handled as follows:

• The input address in the request is translated (through first-level,
  second-level or nested translation) to a host physical address (HPA).
  The address decoding for peer addresses must be done only on the
  translated HPA. Hardware implementations are free to further limit
  peer-to-peer accesses to specific host physical address regions (or
  to completely disallow peer-forwarding of translated requests).

• Since address translation changes the contents (address field) of the
  PCI Express Transaction Layer Packet (TLP), for PCI Express
  peer-to-peer requests with ECRC, the Root-Complex hardware must use
  the new ECRC (re-computed with the translated address) if it decides
  to forward the TLP as a peer request.

• Root-ports, and multi-function root-complex integrated endpoints, may
  support additional peer-to-peer control features by supporting PCI
  Express Access Control Services (ACS) capability. Refer to ACS
  capability in PCI Express specifications for details.

Since Linux didn't give special treatment to allow this exception,
certain RCiEP MFD devices were getting grouped in a single IOMMU group.
This doesn't permit a single device to be assigned to a guest, for
instance.

In one vendor system, devices 14.x were grouped in a single IOMMU group:

/sys/kernel/iommu_groups/5/devices/:00:14.0
/sys/kernel/iommu_groups/5/devices/:00:14.2
/sys/kernel/iommu_groups/5/devices/:00:14.3

After the patch:

/sys/kernel/iommu_groups/5/devices/:00:14.0
/sys/kernel/iommu_groups/5/devices/:00:14.2
/sys/kernel/iommu_groups/6/devices/:00:14.3 <<< new group

14.0 and 14.2 are integrated devices, but legacy endpoints, whereas
14.3 is a PCIe-compliant RCiEP:

00:14.3 Network controller: Intel Corporation Device 9df0 (rev 30)
	Capabilities: [40] Express (v2) Root Complex Integrated Endpoint, MSI 00

This permits assigning this device to a guest VM.

Fixes: f096c061f552 ("iommu: Rework iommu_group_get_for_pci_dev()")
Signed-off-by: Ashok Raj
To: Joerg Roedel
To: Bjorn Helgaas
Cc: linux-ker...@vger.kernel.org
Cc: iommu@lists.linux-foundation.org
Cc: Lu Baolu
Cc: Alex Williamson
Cc: Darrel Goeddel
Cc: Mark Scott ,
Cc: Romil Sharma
Cc: Ashok Raj
---
 drivers/iommu/iommu.c | 13 -
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 2b471419e26c..31b595dfedde 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1187,7 +1187,18 @@ static struct iommu_group *get_pci_function_alias_group(struct pci_dev *pdev,
 	struct pci_dev *tmp = NULL;
 	struct iommu_group *group;
 
-	if (!pdev->multifunction || pci_acs_enabled(pdev, REQ_ACS_FLAGS))
+	/*
+	 * Intel VT-d Specification Section 3.16, Root-Complex Peer to Peer
+	 * Considerations mandates that all transactions in RCiEPs and
+	 * even integrated MFDs *must* be sent up to the IOMMU. P2P is
+	 * only possible on translated addresses. This gives enough
+	 * guarantee that such devices can be forgiven for lack of ACS
+	 * support.
+	 */
+	if (!pdev->multifunction ||
+	    (pdev->vendor == PCI_VENDOR_ID_INTEL &&
+	     pci_pcie_type(pdev) == PCI_EXP_TYPE_RC_END) ||
+	    pci_acs_enabled(pdev, REQ_ACS_FLAGS))

Looks good to me.

Reviewed-by: Kuppuswamy Sathyanarayanan

 		return NULL;
 
 	for_each_pci_dev(tmp) {
Re: [PATCH] iommu: Relax ACS requirement for Intel RCiEP devices.
On Tue, 26 May 2020 15:17:35 -0700 Ashok Raj wrote:

> All Intel platforms guarantee that all root complex implementations
> must send transactions up to the IOMMU for address translation. Hence
> RCiEP devices with an Intel vendor ID can claim an exception for their
> lack of ACS support.
>
> 3.16 Root-Complex Peer to Peer Considerations
> When DMA remapping is enabled, peer-to-peer requests through the
> Root-Complex must be handled as follows:
>
> • The input address in the request is translated (through first-level,
>   second-level or nested translation) to a host physical address (HPA).
>   The address decoding for peer addresses must be done only on the
>   translated HPA. Hardware implementations are free to further limit
>   peer-to-peer accesses to specific host physical address regions
>   (or to completely disallow peer-forwarding of translated requests).
>
> • Since address translation changes the contents (address field) of
>   the PCI Express Transaction Layer Packet (TLP), for PCI Express
>   peer-to-peer requests with ECRC, the Root-Complex hardware must use
>   the new ECRC (re-computed with the translated address) if it
>   decides to forward the TLP as a peer request.
>
> • Root-ports, and multi-function root-complex integrated endpoints, may
>   support additional peer-to-peer control features by supporting PCI
>   Express Access Control Services (ACS) capability. Refer to ACS
>   capability in PCI Express specifications for details.
>
> Since Linux didn't give special treatment to allow this exception,
> certain RCiEP MFD devices were getting grouped in a single IOMMU group.
> This doesn't permit a single device to be assigned to a guest, for
> instance.
>
> In one vendor system, devices 14.x were grouped in a single IOMMU group:
>
> /sys/kernel/iommu_groups/5/devices/:00:14.0
> /sys/kernel/iommu_groups/5/devices/:00:14.2
> /sys/kernel/iommu_groups/5/devices/:00:14.3
>
> After the patch:
> /sys/kernel/iommu_groups/5/devices/:00:14.0
> /sys/kernel/iommu_groups/5/devices/:00:14.2
> /sys/kernel/iommu_groups/6/devices/:00:14.3 <<< new group
>
> 14.0 and 14.2 are integrated devices, but legacy endpoints, whereas
> 14.3 is a PCIe-compliant RCiEP:
>
> 00:14.3 Network controller: Intel Corporation Device 9df0 (rev 30)
> 	Capabilities: [40] Express (v2) Root Complex Integrated Endpoint, MSI 00
>
> This permits assigning this device to a guest VM.
>
> Fixes: f096c061f552 ("iommu: Rework iommu_group_get_for_pci_dev()")
> Signed-off-by: Ashok Raj
> To: Joerg Roedel
> To: Bjorn Helgaas
> Cc: linux-ker...@vger.kernel.org
> Cc: iommu@lists.linux-foundation.org
> Cc: Lu Baolu
> Cc: Alex Williamson
> Cc: Darrel Goeddel
> Cc: Mark Scott ,
> Cc: Romil Sharma
> Cc: Ashok Raj
> ---
>  drivers/iommu/iommu.c | 13 -
>  1 file changed, 12 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index 2b471419e26c..31b595dfedde 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -1187,7 +1187,18 @@ static struct iommu_group *get_pci_function_alias_group(struct pci_dev *pdev,
>  	struct pci_dev *tmp = NULL;
>  	struct iommu_group *group;
>
> -	if (!pdev->multifunction || pci_acs_enabled(pdev, REQ_ACS_FLAGS))
> +	/*
> +	 * Intel VT-d Specification Section 3.16, Root-Complex Peer to Peer
> +	 * Considerations mandates that all transactions in RCiEPs and
> +	 * even integrated MFDs *must* be sent up to the IOMMU. P2P is
> +	 * only possible on translated addresses. This gives enough
> +	 * guarantee that such devices can be forgiven for lack of ACS
> +	 * support.
> +	 */
> +	if (!pdev->multifunction ||
> +	    (pdev->vendor == PCI_VENDOR_ID_INTEL &&
> +	     pci_pcie_type(pdev) == PCI_EXP_TYPE_RC_END) ||
> +	    pci_acs_enabled(pdev, REQ_ACS_FLAGS))
>  		return NULL;
>
>  	for_each_pci_dev(tmp) {

Hi Ashok,

As this is an Intel/VT-d standard, not a PCIe standard, why not
implement this in pci_dev_specific_acs_enabled() with all the other
quirks?

Thanks,
Alex
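Alex's suggestion is to express the exception as a device quirk consulted by `pci_dev_specific_acs_enabled()` instead of open-coding it in iommu.c. A simplified stand-alone sketch of the predicate such a quirk would implement is below; the struct and constants are toy stand-ins for the kernel's `struct pci_dev`, `PCI_VENDOR_ID_INTEL`, and `PCI_EXP_TYPE_RC_END`, and the function name is invented.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-ins for kernel constants. */
#define TOY_VENDOR_INTEL	0x8086
#define TOY_TYPE_RC_END		0x9	/* models PCI_EXP_TYPE_RC_END */

struct toy_pci_dev {
	uint16_t vendor;
	uint16_t device;
	uint8_t pcie_type;	/* PCIe device/port type */
};

/* Returns true when the device is known to be isolated even though it
 * advertises no ACS capability: on Intel platforms, RCiEP transactions
 * are always routed through the IOMMU, so peer-to-peer only happens on
 * translated addresses. */
static bool toy_quirk_intel_rciep_acs(const struct toy_pci_dev *dev)
{
	return dev->vendor == TOY_VENDOR_INTEL &&
	       dev->pcie_type == TOY_TYPE_RC_END;
}
```

Putting the check in a quirk table keeps platform-specific isolation knowledge in one place (alongside the other per-device ACS quirks) rather than scattering vendor tests through the generic grouping code.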
AMD IOMMU perf counters on Zen2
Hello,

I'd like to use IOMMU perf counters on a Zen 2 CPU (Ryzen 4500U, Renoir
SoC). Unfortunately, init_iommu_perf_ctr() fails because val2 != val,
i.e. the counter appears not to be writable. However, if I patch the
kernel to skip this check, the counters seem to increment when
configured with the perf tool.

Do you know why the counter might appear non-writable on newer CPUs?

Thanks,
Alexander
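The check being described works by writing a test value to a counter register, reading it back, and declaring the counters writable only when the readback matches. Below is a toy model of that probe; the "register" is plain memory, and a write mask stands in for hardware that silently ignores writes to some bits, which is one plausible way to end up with val2 != val even though the counters later increment. All names are invented for illustration.

```c
#include <stdint.h>
#include <stdbool.h>

/* Toy model of a write-then-readback writability probe.  wmask models
 * which register bits actually accept writes; real hardware that
 * drops or masks counter writes would fail this probe the same way. */
static bool toy_counter_writable(uint64_t *reg, uint64_t wmask)
{
	const uint64_t val = 0xabcd;	/* arbitrary test pattern */
	uint64_t val2;

	*reg = (*reg & ~wmask) | (val & wmask);	/* write through mask */
	val2 = *reg;				/* read back */
	return val2 == val;
}
```

If the hardware only accepts part of the pattern, the probe reports "not writable" even though the accepted bits do count, matching the symptom in the report above.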
[PATCH] iommu: Relax ACS requirement for Intel RCiEP devices.
All Intel platforms guarantee that all root complex implementations
must send transactions up to the IOMMU for address translation. Hence
RCiEP devices with an Intel vendor ID can claim an exception for their
lack of ACS support.

3.16 Root-Complex Peer to Peer Considerations
When DMA remapping is enabled, peer-to-peer requests through the
Root-Complex must be handled as follows:

• The input address in the request is translated (through first-level,
  second-level or nested translation) to a host physical address (HPA).
  The address decoding for peer addresses must be done only on the
  translated HPA. Hardware implementations are free to further limit
  peer-to-peer accesses to specific host physical address regions (or
  to completely disallow peer-forwarding of translated requests).

• Since address translation changes the contents (address field) of the
  PCI Express Transaction Layer Packet (TLP), for PCI Express
  peer-to-peer requests with ECRC, the Root-Complex hardware must use
  the new ECRC (re-computed with the translated address) if it decides
  to forward the TLP as a peer request.

• Root-ports, and multi-function root-complex integrated endpoints, may
  support additional peer-to-peer control features by supporting PCI
  Express Access Control Services (ACS) capability. Refer to ACS
  capability in PCI Express specifications for details.

Since Linux didn't give special treatment to allow this exception,
certain RCiEP MFD devices were getting grouped in a single IOMMU group.
This doesn't permit a single device to be assigned to a guest, for
instance.

In one vendor system, devices 14.x were grouped in a single IOMMU group:

/sys/kernel/iommu_groups/5/devices/:00:14.0
/sys/kernel/iommu_groups/5/devices/:00:14.2
/sys/kernel/iommu_groups/5/devices/:00:14.3

After the patch:

/sys/kernel/iommu_groups/5/devices/:00:14.0
/sys/kernel/iommu_groups/5/devices/:00:14.2
/sys/kernel/iommu_groups/6/devices/:00:14.3 <<< new group

14.0 and 14.2 are integrated devices, but legacy endpoints, whereas
14.3 is a PCIe-compliant RCiEP:

00:14.3 Network controller: Intel Corporation Device 9df0 (rev 30)
	Capabilities: [40] Express (v2) Root Complex Integrated Endpoint, MSI 00

This permits assigning this device to a guest VM.

Fixes: f096c061f552 ("iommu: Rework iommu_group_get_for_pci_dev()")
Signed-off-by: Ashok Raj
To: Joerg Roedel
To: Bjorn Helgaas
Cc: linux-ker...@vger.kernel.org
Cc: iommu@lists.linux-foundation.org
Cc: Lu Baolu
Cc: Alex Williamson
Cc: Darrel Goeddel
Cc: Mark Scott ,
Cc: Romil Sharma
Cc: Ashok Raj
---
 drivers/iommu/iommu.c | 13 -
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 2b471419e26c..31b595dfedde 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1187,7 +1187,18 @@ static struct iommu_group *get_pci_function_alias_group(struct pci_dev *pdev,
 	struct pci_dev *tmp = NULL;
 	struct iommu_group *group;
 
-	if (!pdev->multifunction || pci_acs_enabled(pdev, REQ_ACS_FLAGS))
+	/*
+	 * Intel VT-d Specification Section 3.16, Root-Complex Peer to Peer
+	 * Considerations mandates that all transactions in RCiEPs and
+	 * even integrated MFDs *must* be sent up to the IOMMU. P2P is
+	 * only possible on translated addresses. This gives enough
+	 * guarantee that such devices can be forgiven for lack of ACS
+	 * support.
+	 */
+	if (!pdev->multifunction ||
+	    (pdev->vendor == PCI_VENDOR_ID_INTEL &&
+	     pci_pcie_type(pdev) == PCI_EXP_TYPE_RC_END) ||
+	    pci_acs_enabled(pdev, REQ_ACS_FLAGS))
 		return NULL;
 
 	for_each_pci_dev(tmp) {
--
2.7.4
Re: [PATCH v2 09/14] device core: Add ability to handle multiple dma offsets
Hello Andy,

On Tue, May 26, 2020 at 4:54 PM Andy Shevchenko wrote:
>
> On Tue, May 26, 2020 at 03:12:48PM -0400, Jim Quinlan wrote:
> > The new field in struct device 'dma_pfn_offset_map' is used to facilitate
> > the use of multiple pfn offsets between cpu addrs and dma addrs. It is
> > similar to 'dma_pfn_offset' except that the offset chosen depends on the
> > cpu or dma address involved.
> >
> > Signed-off-by: Jim Quinlan
> > ---
> >  drivers/of/address.c        | 65 +++--
> >  drivers/usb/core/message.c  |  3 ++
> >  drivers/usb/core/usb.c      |  3 ++
> >  include/linux/device.h      | 10 +-
> >  include/linux/dma-direct.h  | 10 --
> >  include/linux/dma-mapping.h | 46 ++
> >  kernel/dma/Kconfig          | 13 
> >  7 files changed, 144 insertions(+), 6 deletions(-)
> >
> > diff --git a/drivers/of/address.c b/drivers/of/address.c
> > index 96d8cfb14a60..a01afffcde7d 100644
> > --- a/drivers/of/address.c
> > +++ b/drivers/of/address.c
> > @@ -918,6 +918,47 @@ void __iomem *of_io_request_and_map(struct device_node *np, int index,
> >  }
> >  EXPORT_SYMBOL(of_io_request_and_map);
> >
> > +#ifdef CONFIG_DMA_PFN_OFFSET_MAP
> > +static int attach_dma_pfn_offset_map(struct device *dev,
> > +				     struct device_node *node, int num_ranges)
> > +{
> > +	struct of_range_parser parser;
> > +	struct of_range range;
> > +	size_t r_size = (num_ranges + 1)
> > +		* sizeof(struct dma_pfn_offset_region);
> > +	struct dma_pfn_offset_region *r;
> >
> > +	r = devm_kzalloc(dev, r_size, GFP_KERNEL);
>
> devm_?!

Yes, otherwise if the device gets unbound/bound repeatedly then there
would be a memory leak.

> Looking at r_size it should be rather kcalloc().

Yep.

> > +	if (!r)
> > +		return -ENOMEM;
> > +	dev->dma_pfn_offset_map = r;
> > +	of_dma_range_parser_init(&parser, node);
> > +
> > +	/*
> > +	 * Record all info for DMA ranges array. We could
> > +	 * just use the of_range struct, but if we did that it
> > +	 * would require more calculations for phys_to_dma and
> > +	 * dma_to_phys conversions.
> > +	 */
> > +	for_each_of_range(&parser, &range) {
> > +		r->cpu_beg = range.cpu_addr;
> > +		r->cpu_end = r->cpu_beg + range.size;
> > +		r->dma_beg = range.bus_addr;
> > +		r->dma_end = r->dma_beg + range.size;
> > +		r->pfn_offset = PFN_DOWN(range.cpu_addr)
> > +			- PFN_DOWN(range.bus_addr);
> > +		r++;
> > +	}
> > +	return 0;
> > +}
> > +#else
> > +static int attach_dma_pfn_offset_map(struct device *dev,
> > +				     struct device_node *node, int num_ranges)
> > +{
> > +	return 0;
> > +}
> > +#endif
> > +
> >  /**
> >   * of_dma_get_range - Get DMA range info
> >   * @dev:	device pointer; only needed for a corner case.
> > @@ -947,6 +988,8 @@ int of_dma_get_range(struct device *dev, struct device_node *np, u64 *dma_addr,
> >  	struct of_range_parser parser;
> >  	struct of_range range;
> >  	u64 dma_start = U64_MAX, dma_end = 0, dma_offset = 0;
> > +	bool dma_multi_pfn_offset = false;
> > +	int num_ranges = 0;
> >
> >  	while (node) {
> >  		ranges = of_get_property(node, "dma-ranges", &len);
> > @@ -977,10 +1020,19 @@ int of_dma_get_range(struct device *dev, struct device_node *np, u64 *dma_addr,
> >  		pr_debug("dma_addr(%llx) cpu_addr(%llx) size(%llx)\n",
> >  			 range.bus_addr, range.cpu_addr, range.size);
> >
> > +		num_ranges++;
> >  		if (dma_offset && range.cpu_addr - range.bus_addr != dma_offset) {
> > -			pr_warn("Can't handle multiple dma-ranges with different offsets on node(%pOF)\n", node);
> > -			/* Don't error out as we'd break some existing DTs */
> > -			continue;
> > +			if (!IS_ENABLED(CONFIG_DMA_PFN_OFFSET_MAP)) {
> > +				pr_warn("Can't handle multiple dma-ranges with different offsets on node(%pOF)\n", node);
> > +				pr_warn("Perhaps set DMA_PFN_OFFSET_MAP=y?\n");
> > +				/*
> > +				 * Don't error out as we'd break some existing
> > +				 * DTs that are using configs w/o
> > +				 * CONFIG_DMA_PFN_OFFSET_MAP set.
> > +				 */
> > +				continue;
> > +			}
> > +			dma_multi_pfn_offset = true;
> >  		}
> >  		dma_offset = range.cpu_addr - range.bus_addr;
> >
> > @@ -991,6 +1043,13 @@ int of_dma_get_range(struct device *dev, struct device_node *np, u64 *dma_addr,
> >  		dma_end = range.bus_addr + range.size;
>
Re: [PATCH v2 09/14] device core: Add ability to handle multiple dma offsets
On Tue, May 26, 2020 at 03:12:48PM -0400, Jim Quinlan wrote:
> The new field in struct device 'dma_pfn_offset_map' is used to facilitate
> the use of multiple pfn offsets between cpu addrs and dma addrs. It is
> similar to 'dma_pfn_offset' except that the offset chosen depends on the
> cpu or dma address involved.
>
> Signed-off-by: Jim Quinlan
> ---
>  drivers/of/address.c        | 65 +++--
>  drivers/usb/core/message.c  |  3 ++
>  drivers/usb/core/usb.c      |  3 ++
>  include/linux/device.h      | 10 +-
>  include/linux/dma-direct.h  | 10 --
>  include/linux/dma-mapping.h | 46 ++
>  kernel/dma/Kconfig          | 13 
>  7 files changed, 144 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/of/address.c b/drivers/of/address.c
> index 96d8cfb14a60..a01afffcde7d 100644
> --- a/drivers/of/address.c
> +++ b/drivers/of/address.c
> @@ -918,6 +918,47 @@ void __iomem *of_io_request_and_map(struct device_node *np, int index,
>  }
>  EXPORT_SYMBOL(of_io_request_and_map);
>
> +#ifdef CONFIG_DMA_PFN_OFFSET_MAP
> +static int attach_dma_pfn_offset_map(struct device *dev,
> +				     struct device_node *node, int num_ranges)
> +{
> +	struct of_range_parser parser;
> +	struct of_range range;
> +	size_t r_size = (num_ranges + 1)
> +		* sizeof(struct dma_pfn_offset_region);
> +	struct dma_pfn_offset_region *r;
> +
> +	r = devm_kzalloc(dev, r_size, GFP_KERNEL);

devm_?!

Looking at r_size it should be rather kcalloc().

> +	if (!r)
> +		return -ENOMEM;
> +	dev->dma_pfn_offset_map = r;
> +	of_dma_range_parser_init(&parser, node);
> +
> +	/*
> +	 * Record all info for DMA ranges array. We could
> +	 * just use the of_range struct, but if we did that it
> +	 * would require more calculations for phys_to_dma and
> +	 * dma_to_phys conversions.
> +	 */
> +	for_each_of_range(&parser, &range) {
> +		r->cpu_beg = range.cpu_addr;
> +		r->cpu_end = r->cpu_beg + range.size;
> +		r->dma_beg = range.bus_addr;
> +		r->dma_end = r->dma_beg + range.size;
> +		r->pfn_offset = PFN_DOWN(range.cpu_addr)
> +			- PFN_DOWN(range.bus_addr);
> +		r++;
> +	}
> +	return 0;
> +}
> +#else
> +static int attach_dma_pfn_offset_map(struct device *dev,
> +				     struct device_node *node, int num_ranges)
> +{
> +	return 0;
> +}
> +#endif
> +
>  /**
>   * of_dma_get_range - Get DMA range info
>   * @dev:	device pointer; only needed for a corner case.
> @@ -947,6 +988,8 @@ int of_dma_get_range(struct device *dev, struct device_node *np, u64 *dma_addr,
>  	struct of_range_parser parser;
>  	struct of_range range;
>  	u64 dma_start = U64_MAX, dma_end = 0, dma_offset = 0;
> +	bool dma_multi_pfn_offset = false;
> +	int num_ranges = 0;
>
>  	while (node) {
>  		ranges = of_get_property(node, "dma-ranges", &len);
> @@ -977,10 +1020,19 @@ int of_dma_get_range(struct device *dev, struct device_node *np, u64 *dma_addr,
>  		pr_debug("dma_addr(%llx) cpu_addr(%llx) size(%llx)\n",
>  			 range.bus_addr, range.cpu_addr, range.size);
>
> +		num_ranges++;
>  		if (dma_offset && range.cpu_addr - range.bus_addr != dma_offset) {
> -			pr_warn("Can't handle multiple dma-ranges with different offsets on node(%pOF)\n", node);
> -			/* Don't error out as we'd break some existing DTs */
> -			continue;
> +			if (!IS_ENABLED(CONFIG_DMA_PFN_OFFSET_MAP)) {
> +				pr_warn("Can't handle multiple dma-ranges with different offsets on node(%pOF)\n", node);
> +				pr_warn("Perhaps set DMA_PFN_OFFSET_MAP=y?\n");
> +				/*
> +				 * Don't error out as we'd break some existing
> +				 * DTs that are using configs w/o
> +				 * CONFIG_DMA_PFN_OFFSET_MAP set.
> +				 */
> +				continue;
> +			}
> +			dma_multi_pfn_offset = true;
>  		}
>  		dma_offset = range.cpu_addr - range.bus_addr;
>
> @@ -991,6 +1043,13 @@ int of_dma_get_range(struct device *dev, struct device_node *np, u64 *dma_addr,
>  		dma_end = range.bus_addr + range.size;
>  	}
>
> +	if (dma_multi_pfn_offset) {
> +		dma_offset = 0;
> +		ret = attach_dma_pfn_offset_map(dev, node, num_ranges);
> +		if (ret)
> +			return ret;
> +	}
> +
>  	if (dma_start >= dma_end) {
>  		ret = -EINVAL;
>  		pr_debug("Invalid DMA ranges configuration on node(%pOF)\n",
> diff --git a/drivers/usb/core/
Re: [RFC 0/2] iommu: arm-smmu: Add support for early direct mappings
On Thu, May 14, 2020 at 12:34 PM wrote: > > On Thu 27 Feb 18:57 PST 2020, Bjorn Andersson wrote: > > Rob, Will, we're reaching the point where upstream has enough > functionality that this is becoming a critical issue for us. > > E.g. the Lenovo Yoga C630 is lacking only this and a single dts patch to boot > mainline with display, GPU, WiFi and audio working, and the story is > similar on several devboards. > > As previously described, the only thing I want is the stream mapping > related to the display controller in place, either with the CB with > translation disabled or possibly with a way to specify the framebuffer > region (although this turns out to mess things up in the display > driver...) > > I did pick this up again recently and concluded that omitting the > streams for the USB controllers causes an instability issue seen on one > of the controllers to disappear. So I would prefer if we somehow could > have a mechanism to only pick the display streams and the context > allocation for this. > > > Can you please share some pointers/insights/wishes for how we can > conclude on this subject? Ping? I just wanted to follow up on this discussion, as this small series is crucial for booting mainline on the Dragonboard 845c devboard. It would be really valuable to get some solution upstream so we can test mainline w/o adding additional patches. The rest of the db845c series has been moving forward smoothly, but this set seems to be very stuck, with no visible progress since Dec. Are there any pointers for what folks would prefer to see? thanks -john ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v2 09/14] device core: Add ability to handle multiple dma offsets
The new field in struct device 'dma_pfn_offset_map' is used to facilitate the use of multiple pfn offsets between cpu addrs and dma addrs. It is similar to 'dma_pfn_offset' except that the offset chosen depends on the cpu or dma address involved. Signed-off-by: Jim Quinlan --- drivers/of/address.c| 65 +++-- drivers/usb/core/message.c | 3 ++ drivers/usb/core/usb.c | 3 ++ include/linux/device.h | 10 +- include/linux/dma-direct.h | 10 -- include/linux/dma-mapping.h | 46 ++ kernel/dma/Kconfig | 13 7 files changed, 144 insertions(+), 6 deletions(-) diff --git a/drivers/of/address.c b/drivers/of/address.c index 96d8cfb14a60..a01afffcde7d 100644 --- a/drivers/of/address.c +++ b/drivers/of/address.c @@ -918,6 +918,47 @@ void __iomem *of_io_request_and_map(struct device_node *np, int index, } EXPORT_SYMBOL(of_io_request_and_map); +#ifdef CONFIG_DMA_PFN_OFFSET_MAP +static int attach_dma_pfn_offset_map(struct device *dev, +struct device_node *node, int num_ranges) +{ + struct of_range_parser parser; + struct of_range range; + size_t r_size = (num_ranges + 1) + * sizeof(struct dma_pfn_offset_region); + struct dma_pfn_offset_region *r; + + r = devm_kzalloc(dev, r_size, GFP_KERNEL); + if (!r) + return -ENOMEM; + dev->dma_pfn_offset_map = r; + of_dma_range_parser_init(&parser, node); + + /* +* Record all info for DMA ranges array. We could +* just use the of_range struct, but if we did that it +* would require more calculations for phys_to_dma and +* dma_to_phys conversions. 
+*/ + for_each_of_range(&parser, &range) { + r->cpu_beg = range.cpu_addr; + r->cpu_end = r->cpu_beg + range.size; + r->dma_beg = range.bus_addr; + r->dma_end = r->dma_beg + range.size; + r->pfn_offset = PFN_DOWN(range.cpu_addr) + - PFN_DOWN(range.bus_addr); + r++; + } + return 0; +} +#else +static int attach_dma_pfn_offset_map(struct device *dev, +struct device_node *node, int num_ranges) +{ + return 0; +} +#endif + /** * of_dma_get_range - Get DMA range info * @dev: device pointer; only needed for a corner case. @@ -947,6 +988,8 @@ int of_dma_get_range(struct device *dev, struct device_node *np, u64 *dma_addr, struct of_range_parser parser; struct of_range range; u64 dma_start = U64_MAX, dma_end = 0, dma_offset = 0; + bool dma_multi_pfn_offset = false; + int num_ranges = 0; while (node) { ranges = of_get_property(node, "dma-ranges", &len); @@ -977,10 +1020,19 @@ int of_dma_get_range(struct device *dev, struct device_node *np, u64 *dma_addr, pr_debug("dma_addr(%llx) cpu_addr(%llx) size(%llx)\n", range.bus_addr, range.cpu_addr, range.size); + num_ranges++; if (dma_offset && range.cpu_addr - range.bus_addr != dma_offset) { - pr_warn("Can't handle multiple dma-ranges with different offsets on node(%pOF)\n", node); - /* Don't error out as we'd break some existing DTs */ - continue; + if (!IS_ENABLED(CONFIG_DMA_PFN_OFFSET_MAP)) { + pr_warn("Can't handle multiple dma-ranges with different offsets on node(%pOF)\n", node); + pr_warn("Perhaps set DMA_PFN_OFFSET_MAP=y?\n"); + /* +* Don't error out as we'd break some existing +* DTs that are using configs w/o +* CONFIG_DMA_PFN_OFFSET_MAP set. 
+*/ + continue; + } + dma_multi_pfn_offset = true; } dma_offset = range.cpu_addr - range.bus_addr; @@ -991,6 +1043,13 @@ int of_dma_get_range(struct device *dev, struct device_node *np, u64 *dma_addr, dma_end = range.bus_addr + range.size; } + if (dma_multi_pfn_offset) { + dma_offset = 0; + ret = attach_dma_pfn_offset_map(dev, node, num_ranges); + if (ret) + return ret; + } + if (dma_start >= dma_end) { ret = -EINVAL; pr_debug("Invalid DMA ranges configuration on node(%pOF)\n", diff --git a/drivers/usb/core/message.c b/drivers/usb/core/message.c index 6197938dcc2d..aaa3e58f5eb4 100644 --- a/drivers/usb/core/message.c +++ b/drivers/usb/core/message.c @@ -1960,6 +1960,9 @@ int usb_set_configuration(struct usb_device *dev, int configur
[PATCH v2 00/14] PCI: brcmstb: enable PCIe for STB chips
v2: Commit: "device core: Add ability to handle multiple dma offsets" o Added helper func attach_dma_pfn_offset_map() in address.c (Christoph) o Helper funcs added to __phys_to_dma() & __dma_to_phys() (Christoph) o Added warning when multiple offsets are needed and !DMA_PFN_OFFSET_MAP o dev->dma_pfn_map => dev->dma_pfn_offset_map o s/frm/from/ for dma_pfn_offset_frm_{phys,dma}_addr() (Christoph) o In device.h: s/const void */const struct dma_pfn_offset_region */ o removed 'unlikely' from unlikely(dev->dma_pfn_offset_map) since guarded by CONFIG_DMA_PFN_OFFSET_MAP (Christoph) o Since dev->dma_pfn_offset is copied in usb/core/{usb,message}.c, now dev->dma_pfn_offset_map is copied as well. o Merged two of the DMA commits into one (Christoph). Commit "arm: dma-mapping: Invoke dma offset func if needed": o Use helper functions instead of #if CONFIG_DMA_PFN_OFFSET Other commits' changes: o Removed need for carrying of_id var in priv (Nicolas) o Commit message rewordings (Bjorn) o Commit log messages filled to 75 chars (Bjorn) o devm_reset_control_get_shared() => devm_reset_control_get_optional_shared() (Philipp) o Add call to reset_control_assert() in PCIe remove routines (Philipp) v1: This patchset expands the usefulness of the Broadcom Set-top Box PCIe controller by building upon the PCIe driver currently used by the Raspberry Pi. Other forms of this patchset were submitted by me years ago and not accepted; the major sticking point was the code required for the DMA remapping needed for the PCIe driver to work [1]. There have been many changes to the DMA and OF subsystems since that time, making a cleaner and less intrusive patchset possible. This patchset implements a generalization of "dev->dma_pfn_offset", except that instead of a single scalar offset it provides for multiple offsets via a function which depends upon the "dma-ranges" property of the PCIe host controller.
This is required for proper functionality of the BrcmSTB PCIe controller and possibly some other devices. [1] https://lore.kernel.org/linux-arm-kernel/1516058925-46522-5-git-send-email-jim2101...@gmail.com/ Jim Quinlan (14): PCI: brcmstb: PCIE_BRCMSTB depends on ARCH_BRCMSTB ata: ahci_brcm: Fix use of BCM7216 reset controller dt-bindings: PCI: Add bindings for more Brcmstb chips PCI: brcmstb: Add bcm7278 register info PCI: brcmstb: Add suspend and resume pm_ops PCI: brcmstb: Add bcm7278 PERST support PCI: brcmstb: Add control of rescal reset of: Include a dev param in of_dma_get_range() device core: Add ability to handle multiple dma offsets arm: dma-mapping: Invoke dma offset func if needed PCI: brcmstb: Set internal memory viewport sizes PCI: brcmstb: Accommodate MSI for older chips PCI: brcmstb: Set bus max burst size by chip type PCI: brcmstb: Add bcm7211, bcm7216, bcm7445, bcm7278 to match list .../bindings/pci/brcm,stb-pcie.yaml | 40 +- arch/arm/include/asm/dma-mapping.h| 13 +- drivers/ata/ahci_brcm.c | 14 +- drivers/of/address.c | 69 ++- drivers/of/device.c | 2 +- drivers/of/of_private.h | 8 +- drivers/pci/controller/Kconfig| 3 +- drivers/pci/controller/pcie-brcmstb.c | 408 +++--- drivers/usb/core/message.c| 3 + drivers/usb/core/usb.c| 3 + include/linux/device.h| 10 +- include/linux/dma-direct.h| 10 +- include/linux/dma-mapping.h | 46 ++ kernel/dma/Kconfig| 13 + 14 files changed, 559 insertions(+), 83 deletions(-) -- 2.17.1
[PATCH 5.6 014/126] iommu/amd: Fix over-read of ACPI UID from IVRS table
From: Alexander Monakov [ Upstream commit e461b8c991b9202b007ea2059d953e264240b0c9 ] IVRS parsing code always tries to read 255 bytes from memory when retrieving ACPI device path, and makes an assumption that firmware provides a zero-terminated string. Both of those are bugs: the entry is likely to be shorter than 255 bytes, and zero-termination is not guaranteed. With Acer SF314-42 firmware these issues manifest visibly in dmesg: AMD-Vi: ivrs, add hid:AMDI0020, uid:\_SB.FUR0\xf0\xa5, rdevid:160 AMD-Vi: ivrs, add hid:AMDI0020, uid:\_SB.FUR1\xf0\xa5, rdevid:160 AMD-Vi: ivrs, add hid:AMDI0020, uid:\_SB.FUR2\xf0\xa5, rdevid:160 AMD-Vi: ivrs, add hid:AMDI0020, uid:\_SB.FUR3>\x83e\x8d\x9a\xd1... The first three lines show how the code over-reads adjacent table entries into the UID, and in the last line it even reads garbage data beyond the end of the IVRS table itself. Since each entry has the length of the UID (uidl member of ivhd_entry struct), use that for memcpy, and manually add a zero terminator. Avoid zero-filling hid and uid arrays up front, and instead ensure the uid array is always zero-terminated. No change needed for the hid array, as it was already properly zero-terminated. 
Fixes: 2a0cb4e2d423c ("iommu/amd: Add new map for storing IVHD dev entry type HID") Signed-off-by: Alexander Monakov Cc: Joerg Roedel Cc: iommu@lists.linux-foundation.org Link: https://lore.kernel.org/r/20200511102352.1831-1-amona...@ispras.ru Signed-off-by: Joerg Roedel Signed-off-by: Sasha Levin --- drivers/iommu/amd_iommu_init.c | 9 + 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c index 2b9a67ecc6ac..5b81fd16f5fa 100644 --- a/drivers/iommu/amd_iommu_init.c +++ b/drivers/iommu/amd_iommu_init.c @@ -1329,8 +1329,8 @@ static int __init init_iommu_from_acpi(struct amd_iommu *iommu, } case IVHD_DEV_ACPI_HID: { u16 devid; - u8 hid[ACPIHID_HID_LEN] = {0}; - u8 uid[ACPIHID_UID_LEN] = {0}; + u8 hid[ACPIHID_HID_LEN]; + u8 uid[ACPIHID_UID_LEN]; int ret; if (h->type != 0x40) { @@ -1347,6 +1347,7 @@ static int __init init_iommu_from_acpi(struct amd_iommu *iommu, break; } + uid[0] = '\0'; switch (e->uidf) { case UID_NOT_PRESENT: @@ -1361,8 +1362,8 @@ static int __init init_iommu_from_acpi(struct amd_iommu *iommu, break; case UID_IS_CHARACTER: - memcpy(uid, (u8 *)(&e->uid), ACPIHID_UID_LEN - 1); - uid[ACPIHID_UID_LEN - 1] = '\0'; + memcpy(uid, &e->uid, e->uidl); + uid[e->uidl] = '\0'; break; default: -- 2.25.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH 5.4 012/111] iommu/amd: Fix over-read of ACPI UID from IVRS table
From: Alexander Monakov [ Upstream commit e461b8c991b9202b007ea2059d953e264240b0c9 ] IVRS parsing code always tries to read 255 bytes from memory when retrieving ACPI device path, and makes an assumption that firmware provides a zero-terminated string. Both of those are bugs: the entry is likely to be shorter than 255 bytes, and zero-termination is not guaranteed. With Acer SF314-42 firmware these issues manifest visibly in dmesg: AMD-Vi: ivrs, add hid:AMDI0020, uid:\_SB.FUR0\xf0\xa5, rdevid:160 AMD-Vi: ivrs, add hid:AMDI0020, uid:\_SB.FUR1\xf0\xa5, rdevid:160 AMD-Vi: ivrs, add hid:AMDI0020, uid:\_SB.FUR2\xf0\xa5, rdevid:160 AMD-Vi: ivrs, add hid:AMDI0020, uid:\_SB.FUR3>\x83e\x8d\x9a\xd1... The first three lines show how the code over-reads adjacent table entries into the UID, and in the last line it even reads garbage data beyond the end of the IVRS table itself. Since each entry has the length of the UID (uidl member of ivhd_entry struct), use that for memcpy, and manually add a zero terminator. Avoid zero-filling hid and uid arrays up front, and instead ensure the uid array is always zero-terminated. No change needed for the hid array, as it was already properly zero-terminated. 
Fixes: 2a0cb4e2d423c ("iommu/amd: Add new map for storing IVHD dev entry type HID") Signed-off-by: Alexander Monakov Cc: Joerg Roedel Cc: iommu@lists.linux-foundation.org Link: https://lore.kernel.org/r/20200511102352.1831-1-amona...@ispras.ru Signed-off-by: Joerg Roedel Signed-off-by: Sasha Levin --- drivers/iommu/amd_iommu_init.c | 9 + 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c index ef14b00fa94b..135ae5222cf3 100644 --- a/drivers/iommu/amd_iommu_init.c +++ b/drivers/iommu/amd_iommu_init.c @@ -1331,8 +1331,8 @@ static int __init init_iommu_from_acpi(struct amd_iommu *iommu, } case IVHD_DEV_ACPI_HID: { u16 devid; - u8 hid[ACPIHID_HID_LEN] = {0}; - u8 uid[ACPIHID_UID_LEN] = {0}; + u8 hid[ACPIHID_HID_LEN]; + u8 uid[ACPIHID_UID_LEN]; int ret; if (h->type != 0x40) { @@ -1349,6 +1349,7 @@ static int __init init_iommu_from_acpi(struct amd_iommu *iommu, break; } + uid[0] = '\0'; switch (e->uidf) { case UID_NOT_PRESENT: @@ -1363,8 +1364,8 @@ static int __init init_iommu_from_acpi(struct amd_iommu *iommu, break; case UID_IS_CHARACTER: - memcpy(uid, (u8 *)(&e->uid), ACPIHID_UID_LEN - 1); - uid[ACPIHID_UID_LEN - 1] = '\0'; + memcpy(uid, &e->uid, e->uidl); + uid[e->uidl] = '\0'; break; default: -- 2.25.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH 4.19 12/81] iommu/amd: Fix over-read of ACPI UID from IVRS table
From: Alexander Monakov [ Upstream commit e461b8c991b9202b007ea2059d953e264240b0c9 ] IVRS parsing code always tries to read 255 bytes from memory when retrieving ACPI device path, and makes an assumption that firmware provides a zero-terminated string. Both of those are bugs: the entry is likely to be shorter than 255 bytes, and zero-termination is not guaranteed. With Acer SF314-42 firmware these issues manifest visibly in dmesg: AMD-Vi: ivrs, add hid:AMDI0020, uid:\_SB.FUR0\xf0\xa5, rdevid:160 AMD-Vi: ivrs, add hid:AMDI0020, uid:\_SB.FUR1\xf0\xa5, rdevid:160 AMD-Vi: ivrs, add hid:AMDI0020, uid:\_SB.FUR2\xf0\xa5, rdevid:160 AMD-Vi: ivrs, add hid:AMDI0020, uid:\_SB.FUR3>\x83e\x8d\x9a\xd1... The first three lines show how the code over-reads adjacent table entries into the UID, and in the last line it even reads garbage data beyond the end of the IVRS table itself. Since each entry has the length of the UID (uidl member of ivhd_entry struct), use that for memcpy, and manually add a zero terminator. Avoid zero-filling hid and uid arrays up front, and instead ensure the uid array is always zero-terminated. No change needed for the hid array, as it was already properly zero-terminated. 
Fixes: 2a0cb4e2d423c ("iommu/amd: Add new map for storing IVHD dev entry type HID") Signed-off-by: Alexander Monakov Cc: Joerg Roedel Cc: iommu@lists.linux-foundation.org Link: https://lore.kernel.org/r/20200511102352.1831-1-amona...@ispras.ru Signed-off-by: Joerg Roedel Signed-off-by: Sasha Levin --- drivers/iommu/amd_iommu_init.c | 9 + 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c index 2557ed112bc2..c7d0bb3b4a30 100644 --- a/drivers/iommu/amd_iommu_init.c +++ b/drivers/iommu/amd_iommu_init.c @@ -1334,8 +1334,8 @@ static int __init init_iommu_from_acpi(struct amd_iommu *iommu, } case IVHD_DEV_ACPI_HID: { u16 devid; - u8 hid[ACPIHID_HID_LEN] = {0}; - u8 uid[ACPIHID_UID_LEN] = {0}; + u8 hid[ACPIHID_HID_LEN]; + u8 uid[ACPIHID_UID_LEN]; int ret; if (h->type != 0x40) { @@ -1352,6 +1352,7 @@ static int __init init_iommu_from_acpi(struct amd_iommu *iommu, break; } + uid[0] = '\0'; switch (e->uidf) { case UID_NOT_PRESENT: @@ -1366,8 +1367,8 @@ static int __init init_iommu_from_acpi(struct amd_iommu *iommu, break; case UID_IS_CHARACTER: - memcpy(uid, (u8 *)(&e->uid), ACPIHID_UID_LEN - 1); - uid[ACPIHID_UID_LEN - 1] = '\0'; + memcpy(uid, &e->uid, e->uidl); + uid[e->uidl] = '\0'; break; default: -- 2.25.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH 4.14 10/59] iommu/amd: Fix over-read of ACPI UID from IVRS table
From: Alexander Monakov [ Upstream commit e461b8c991b9202b007ea2059d953e264240b0c9 ] IVRS parsing code always tries to read 255 bytes from memory when retrieving ACPI device path, and makes an assumption that firmware provides a zero-terminated string. Both of those are bugs: the entry is likely to be shorter than 255 bytes, and zero-termination is not guaranteed. With Acer SF314-42 firmware these issues manifest visibly in dmesg: AMD-Vi: ivrs, add hid:AMDI0020, uid:\_SB.FUR0\xf0\xa5, rdevid:160 AMD-Vi: ivrs, add hid:AMDI0020, uid:\_SB.FUR1\xf0\xa5, rdevid:160 AMD-Vi: ivrs, add hid:AMDI0020, uid:\_SB.FUR2\xf0\xa5, rdevid:160 AMD-Vi: ivrs, add hid:AMDI0020, uid:\_SB.FUR3>\x83e\x8d\x9a\xd1... The first three lines show how the code over-reads adjacent table entries into the UID, and in the last line it even reads garbage data beyond the end of the IVRS table itself. Since each entry has the length of the UID (uidl member of ivhd_entry struct), use that for memcpy, and manually add a zero terminator. Avoid zero-filling hid and uid arrays up front, and instead ensure the uid array is always zero-terminated. No change needed for the hid array, as it was already properly zero-terminated. 
Fixes: 2a0cb4e2d423c ("iommu/amd: Add new map for storing IVHD dev entry type HID") Signed-off-by: Alexander Monakov Cc: Joerg Roedel Cc: iommu@lists.linux-foundation.org Link: https://lore.kernel.org/r/20200511102352.1831-1-amona...@ispras.ru Signed-off-by: Joerg Roedel Signed-off-by: Sasha Levin --- drivers/iommu/amd_iommu_init.c | 9 + 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c index 6c228144b3da..ec9a20e06941 100644 --- a/drivers/iommu/amd_iommu_init.c +++ b/drivers/iommu/amd_iommu_init.c @@ -1317,8 +1317,8 @@ static int __init init_iommu_from_acpi(struct amd_iommu *iommu, } case IVHD_DEV_ACPI_HID: { u16 devid; - u8 hid[ACPIHID_HID_LEN] = {0}; - u8 uid[ACPIHID_UID_LEN] = {0}; + u8 hid[ACPIHID_HID_LEN]; + u8 uid[ACPIHID_UID_LEN]; int ret; if (h->type != 0x40) { @@ -1335,6 +1335,7 @@ static int __init init_iommu_from_acpi(struct amd_iommu *iommu, break; } + uid[0] = '\0'; switch (e->uidf) { case UID_NOT_PRESENT: @@ -1349,8 +1350,8 @@ static int __init init_iommu_from_acpi(struct amd_iommu *iommu, break; case UID_IS_CHARACTER: - memcpy(uid, (u8 *)(&e->uid), ACPIHID_UID_LEN - 1); - uid[ACPIHID_UID_LEN - 1] = '\0'; + memcpy(uid, &e->uid, e->uidl); + uid[e->uidl] = '\0'; break; default: -- 2.25.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH 4.9 09/64] iommu/amd: Fix over-read of ACPI UID from IVRS table
From: Alexander Monakov [ Upstream commit e461b8c991b9202b007ea2059d953e264240b0c9 ] IVRS parsing code always tries to read 255 bytes from memory when retrieving ACPI device path, and makes an assumption that firmware provides a zero-terminated string. Both of those are bugs: the entry is likely to be shorter than 255 bytes, and zero-termination is not guaranteed. With Acer SF314-42 firmware these issues manifest visibly in dmesg: AMD-Vi: ivrs, add hid:AMDI0020, uid:\_SB.FUR0\xf0\xa5, rdevid:160 AMD-Vi: ivrs, add hid:AMDI0020, uid:\_SB.FUR1\xf0\xa5, rdevid:160 AMD-Vi: ivrs, add hid:AMDI0020, uid:\_SB.FUR2\xf0\xa5, rdevid:160 AMD-Vi: ivrs, add hid:AMDI0020, uid:\_SB.FUR3>\x83e\x8d\x9a\xd1... The first three lines show how the code over-reads adjacent table entries into the UID, and in the last line it even reads garbage data beyond the end of the IVRS table itself. Since each entry has the length of the UID (uidl member of ivhd_entry struct), use that for memcpy, and manually add a zero terminator. Avoid zero-filling hid and uid arrays up front, and instead ensure the uid array is always zero-terminated. No change needed for the hid array, as it was already properly zero-terminated. 
Fixes: 2a0cb4e2d423c ("iommu/amd: Add new map for storing IVHD dev entry type HID") Signed-off-by: Alexander Monakov Cc: Joerg Roedel Cc: iommu@lists.linux-foundation.org Link: https://lore.kernel.org/r/20200511102352.1831-1-amona...@ispras.ru Signed-off-by: Joerg Roedel Signed-off-by: Sasha Levin --- drivers/iommu/amd_iommu_init.c | 9 + 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c index e6ae8d123984..a3279f303b49 100644 --- a/drivers/iommu/amd_iommu_init.c +++ b/drivers/iommu/amd_iommu_init.c @@ -1171,8 +1171,8 @@ static int __init init_iommu_from_acpi(struct amd_iommu *iommu, } case IVHD_DEV_ACPI_HID: { u16 devid; - u8 hid[ACPIHID_HID_LEN] = {0}; - u8 uid[ACPIHID_UID_LEN] = {0}; + u8 hid[ACPIHID_HID_LEN]; + u8 uid[ACPIHID_UID_LEN]; int ret; if (h->type != 0x40) { @@ -1189,6 +1189,7 @@ static int __init init_iommu_from_acpi(struct amd_iommu *iommu, break; } + uid[0] = '\0'; switch (e->uidf) { case UID_NOT_PRESENT: @@ -1203,8 +1204,8 @@ static int __init init_iommu_from_acpi(struct amd_iommu *iommu, break; case UID_IS_CHARACTER: - memcpy(uid, (u8 *)(&e->uid), ACPIHID_UID_LEN - 1); - uid[ACPIHID_UID_LEN - 1] = '\0'; + memcpy(uid, &e->uid, e->uidl); + uid[e->uidl] = '\0'; break; default: -- 2.25.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH] iommu: Relax ACS requirement for RCiEP devices.
On Tue, May 26, 2020 at 12:26:54PM -0600, Alex Williamson wrote: > > > > > > I don't think the language in the spec is anything sufficient to handle > > > RCiEP uniquely. We've previously rejected kernel command line opt-outs > > > for ACS, and the extent to which those patches still float around the > > > user community and are blindly used to separate IOMMU groups are a > > > testament to the failure of this approach. Users do not have a basis > > > for enabling this sort of opt-out. The benefit is obvious in the IOMMU > > > grouping, but the risk is entirely unknown. A kconfig option is even > > > worse as that means if you consume a downstream kernel, the downstream > > > maintainers might have decided universally that isolation is less > > > important than functionality. > > > > We discussed this internally, and the Intel vt-d spec does spell out clearly > > in Section 3.16 Root-Complex Peer to Peer Considerations. The spec clearly > > calls out that all p2p must be done on translated addresses and therefore > > must go through the IOMMU. > > > > I suppose they should also have some similar platform guaranteed behavior > > for RCiEP's or MFD's *Must* behave as follows. The language is strict and > > when IOMMU is enabled in the platform, everything is sent up north to the > > IOMMU agent. > > > > 3.16 Root-Complex Peer to Peer Considerations > > When DMA remapping is enabled, peer-to-peer requests through the > > Root-Complex must be handled > > as follows: > > • The input address in the request is translated (through first-level, > > second-level or nested translation) to a host physical address (HPA). > > The address decoding for peer addresses must be done only on the > > translated HPA. Hardware implementations are free to further limit > > peer-to-peer accesses to specific host physical address regions > > (or to completely disallow peer-forwarding of translated requests).
> > • Since address translation changes the contents (address field) of the PCI > > Express Transaction Layer Packet (TLP), for PCI Express peer-to-peer > > requests with ECRC, the Root-Complex hardware must use the new ECRC > > (re-computed with the translated address) if it decides to forward > > the TLP as a peer request. > > • Root-ports, and multi-function root-complex integrated endpoints, may > > support additional peer-to-peer control features by supporting PCI Express > > Access Control Services (ACS) capability. Refer to ACS capability in > > PCI Express specifications for details. > That sounds like it might be a reasonable basis for quirking all RCiEPs > on VT-d platforms if Intel is willing to stand behind it. Thanks, > Sounds good, that's what I hear from our platform teams. If there is a violation it would be a bug in silicon.
Re: [PATCH] iommu: Relax ACS requirement for RCiEP devices.
On Tue, 26 May 2020 11:06:48 -0700 "Raj, Ashok" wrote: > Hi Alex, > > I was able to find better language in the IOMMU spec that guarantees > the behavior we need. See below. > > > On Tue, May 05, 2020 at 09:34:14AM -0600, Alex Williamson wrote: > > On Tue, 5 May 2020 07:56:06 -0700 > > "Raj, Ashok" wrote: > > > > > On Tue, May 05, 2020 at 08:05:14AM -0600, Alex Williamson wrote: > > > > On Mon, 4 May 2020 23:11:07 -0700 > > > > "Raj, Ashok" wrote: > > > > > > > > > Hi Alex > > > > > > > > > > + Joerg, accidentally missed in the Cc. > > > > > > > > > > On Mon, May 04, 2020 at 11:19:36PM -0600, Alex Williamson wrote: > > > > > > On Mon, 4 May 2020 21:42:16 -0700 > > > > > > Ashok Raj wrote: > > > > > > > > > > > > > PCIe Spec recommends we can relax ACS requirement for RCIEP > > > > > > > devices. > > > > > > > > > > > > > > PCIe 5.0 Specification. > > > > > > > 6.12 Access Control Services (ACS) > > > > > > > Implementation of ACS in RCiEPs is permitted but not required. It > > > > > > > is > > > > > > > explicitly permitted that, within a single Root Complex, some > > > > > > > RCiEPs > > > > > > > implement ACS and some do not. It is strongly recommended that > > > > > > > Root Complex > > > > > > > implementations ensure that all accesses originating from RCiEPs > > > > > > > (PFs and VFs) without ACS capability are first subjected to > > > > > > > processing by > > > > > > > the Translation Agent (TA) in the Root Complex before further > > > > > > > decoding and > > > > > > > processing. The details of such Root Complex handling are outside > > > > > > > the scope > > > > > > > of this specification. > > > > > > > > > > > > > > > > > > Is the language here really strong enough to make this change? ACS > > > > > > is > > > > > > an optional feature, so being permitted but not required is rather > > > > > > meaningless.
The spec is also specifically avoiding the words > > > > > > "must" > > > > > > or "shall" and even when emphasized with "strongly", we still only > > > > > > have > > > > > > a recommendation that may or may not be honored. This seems like a > > > > > > weak basis for assuming that RCiEPs universally honor this > > > > > > recommendation. Thanks, > > > > > > > > > > > > > > > > We are speaking about the PCIe spec, where people write it about 5 years > > > > > ahead > > > > > and every vendor tries to massage their product behavior with vague > > > > > words like this.. :) > > > > > > > > > > But honestly for any RCiEP, or even integrated endpoints, there > > > > > is no way to send them except up north. These aren't behind an RP. > > > > > > > > But they are multi-function devices and the spec doesn't define routing > > > > within multifunction packages. A single function RCiEP will already be > > > > assumed isolated within its own group. > > > > > > That's right. The other two devices only have legacy PCI headers. So > > > they can't claim to be RCiEP's but just integrated endpoints. The legacy > > > devices don't even have a PCIe header. > > > > > > I honestly don't know why these are grouped as MFDs in the first place. > > > > > > > > > > > > I did check with a couple of folks who are part of the SIG, and seem to > > > > > agree > > > > > that ACS treatment for RCiEP's doesn't mean much. > > > > > > > > > > I understand the language isn't strong, but it doesn't seem like ACS > > > > > should > > > > > be a strong requirement for RCiEP's and reasonable to relax. > > > > > > > > > > What are your thoughts? > > > > I think hardware vendors have ACS at their disposal to clarify when > > > > isolation is provided, otherwise vendors can submit quirks, but I don't > > > > see that the "strongly recommended" phrasing is sufficient to assume > > > > isolation between multifunction RCiEPs.
Thanks, > > > > > > Your point is that integrated MFD endpoints, without ACS, there is no > > > guarantee to SW that they are isolated. > > > > > > As far as a quirk, do you think: > > > - a cmdline opt-out for integrated endpoints, and RCiEP's suffice? > > > along with a compile time default that is strict enforcement > > > - typical vid/did type exception list? > > > > > > A more generic way to ask for exception would be scalable until we can > > > stop > > > those type of integrated devices. Or we need to maintain these device > > > lists > > > for eternity. > > > > I don't think the language in the spec is anything sufficient to handle > > RCiEP uniquely. We've previously rejected kernel command line opt-outs > > for ACS, and the extent to which those patches still float around the > > user community and are blindly used to separate IOMMU groups are a > > testament to the failure of this approach. Users do not have a basis > > for enabling this sort of opt-out. The benefit is obvious in the IOMMU > > grouping, but the risk is entirely unknown. A kconfig option is even > > worse as
[PATCH v3] dma: Fix max PFN arithmetic overflow on 32 bit systems
The intermediate result of the old term (4UL * 1024 * 1024 * 1024) is 4 294 967 296 or 0x100000000, which is no problem on 64 bit systems. The patch does not change the later overall result of 0x100000 for MAX_DMA32_PFN. The new calculation yields the same result, but does not require 64 bit arithmetic. On 32 bit systems the old calculation suffers from an arithmetic overflow in that intermediate term in parentheses: 4UL aka unsigned long int is 4 byte wide and an arithmetic overflow happens (the 0x100000000 does not fit in 4 bytes), the result in parentheses is truncated to zero, the following right shift does not alter that, so MAX_DMA32_PFN evaluates to 0 on 32 bit systems. That wrong value is a problem in a comparison against MAX_DMA32_PFN in the init code for swiotlb in 'pci_swiotlb_detect_4gb()' to decide if swiotlb should be active. That comparison yields the opposite result, when compiling on 32 bit systems. This was not possible before 1b7e03ef7570 ("x86, NUMA: Enable emulation on 32bit too") when that MAX_DMA32_PFN was first made visible to x86_32 (and which landed in v3.0). In practice this wasn't a problem, unless you activated CONFIG_SWIOTLB on x86 (32 bit). However for ARCH=x86 (32 bit) and if you have set CONFIG_IOMMU_INTEL, since c5a5dc4cbbf4 ("iommu/vt-d: Don't switch off swiotlb if bounce page is used") there's a dependency on CONFIG_SWIOTLB, which was not necessarily active before. That landed in v5.4, where we noticed it in the fli4l Linux distribution. We have CONFIG_IOMMU_INTEL active on both 32 and 64 bit kernel configs there (I could not find out why, so let's just say historical reasons). The effect is that at boot time 64 MiB (default size) were allocated for bounce buffers now, which is a noticeable amount of memory on small systems like the pcengines ALIX 2D3 with 256 MiB memory, which are still frequently used as home routers.
We noticed this effect when migrating from kernel v4.19 (LTS) to v5.4 (LTS) in fli4l and got kernel messages like these:

Linux version 5.4.22 (buildroot@buildroot) (gcc version 7.3.0 (Buildroot 2018.02.8)) #1 SMP Mon Nov 26 23:40:00 CET 2018
…
Memory: 183484K/261756K available (4594K kernel code, 393K rwdata, 1660K rodata, 536K init, 456K bss, 78272K reserved, 0K cma-reserved, 0K highmem)
…
PCI-DMA: Using software bounce buffering for IO (SWIOTLB)
software IO TLB: mapped [mem 0x0bb78000-0x0fb78000] (64MB)

The initial analysis and the suggested fix were done by user 'sourcejedi' on Stack Exchange and explicitly marked as GPLv2 for inclusion in the Linux kernel: https://unix.stackexchange.com/a/520525/50007

The new calculation, which does not suffer from that overflow, is now the same as for arch/mips, as suggested by Robin Murphy. The fix was tested by fli4l users on about two dozen different systems, including both 32 and 64 bit archs, bare metal and virtualized machines.

Fixes: 1b7e03ef7570 ("x86, NUMA: Enable emulation on 32bit too")
Fixes: https://web.nettworks.org/bugs/browse/FFL-2560
Fixes: https://unix.stackexchange.com/q/520065/50007
Reported-by: Alan Jenkins
Suggested-by: Robin Murphy
Signed-off-by: Alexander Dahl
Cc: sta...@vger.kernel.org
---
Notes:
    v3:
    - rewritten commit message to better explain the arithmetic overflow and added Fixes tag (Greg Kroah-Hartman)
    - rebased on v5.7-rc7
    v2:
    - use the same calculation as with arch/mips (Robin Murphy)

 arch/x86/include/asm/dma.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/dma.h b/arch/x86/include/asm/dma.h
index 00f7cf45e699..8e95aa4b0d17 100644
--- a/arch/x86/include/asm/dma.h
+++ b/arch/x86/include/asm/dma.h
@@ -74,7 +74,7 @@
 #define MAX_DMA_PFN ((16UL * 1024 * 1024) >> PAGE_SHIFT)

 /* 4GB broken PCI/AGP hardware bus master zone */
-#define MAX_DMA32_PFN ((4UL * 1024 * 1024 * 1024) >> PAGE_SHIFT)
+#define MAX_DMA32_PFN (1UL << (32 - PAGE_SHIFT))

 #ifdef
CONFIG_X86_32

 /* The maximum address that we can perform a DMA transfer to on this platform */
--
2.20.1
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH] iommu: Relax ACS requirement for RCiEP devices.
Hi Alex, I was able to find better language in the IOMMU spec that guarantees the behavior we need. See below. On Tue, May 05, 2020 at 09:34:14AM -0600, Alex Williamson wrote: > On Tue, 5 May 2020 07:56:06 -0700 > "Raj, Ashok" wrote: > > > On Tue, May 05, 2020 at 08:05:14AM -0600, Alex Williamson wrote: > > > On Mon, 4 May 2020 23:11:07 -0700 > > > "Raj, Ashok" wrote: > > > > > > > Hi Alex > > > > > > > > + Joerg, accidentally missed in the Cc. > > > > > > > > On Mon, May 04, 2020 at 11:19:36PM -0600, Alex Williamson wrote: > > > > > On Mon, 4 May 2020 21:42:16 -0700 > > > > > Ashok Raj wrote: > > > > > > > > > > > PCIe Spec recommends we can relax ACS requirement for RCiEP devices. > > > > > > > > > > > > PCIe 5.0 Specification. > > > > > > 6.12 Access Control Services (ACS) > > > > > > Implementation of ACS in RCiEPs is permitted but not required. It is > > > > > > explicitly permitted that, within a single Root Complex, some RCiEPs > > > > > > implement ACS and some do not. It is strongly recommended that Root > > > > > > Complex > > > > > > implementations ensure that all accesses originating from RCiEPs > > > > > > (PFs and VFs) without ACS capability are first subjected to > > > > > > processing by > > > > > > the Translation Agent (TA) in the Root Complex before further > > > > > > decoding and > > > > > > processing. The details of such Root Complex handling are outside > > > > > > the scope > > > > > > of this specification. > > > > > > > > > > > > > > > > Is the language here really strong enough to make this change? ACS is > > > > > an optional feature, so being permitted but not required is rather > > > > > meaningless. The spec is also specifically avoiding the words "must" > > > > > or "shall" and even when emphasized with "strongly", we still only > > > > > have > > > > > a recommendation that may or may not be honored. This seems like a > > > > > weak basis for assuming that RCiEPs universally honor this > > > > > recommendation.
Thanks, > > > > > > > > > > > > > We are speaking about the PCIe spec, where people write it about 5 years > > > > ahead > > > > and every vendor tries to massage their product behavior with vague > > > > words like this.. :) > > > > > > > > But honestly for any RCiEP, or even integrated endpoints, there > > > > is no way to send them except up north. These aren't behind an RP. > > > > > > But they are multi-function devices and the spec doesn't define routing > > > within multifunction packages. A single function RCiEP will already be > > > assumed isolated within its own group. > > > > That's right. The other two devices only have legacy PCI headers. So > > they can't claim to be RCiEPs but just integrated endpoints. The legacy > > devices don't even have a PCIe header. > > > > I honestly don't know why these are grouped as MFDs in the first place. > > > > > > > > > I did check with a couple of folks who are part of the SIG, and they seem to agree > > > > that ACS treatment for RCiEPs doesn't mean much. > > > > > > > > I understand the language isn't strong, but it doesn't seem like ACS > > > > should > > > > be a strong requirement for RCiEPs, and it seems reasonable to relax it. > > > > > > > > What are your thoughts? > > > > > > I think hardware vendors have ACS at their disposal to clarify when > > > isolation is provided, otherwise vendors can submit quirks, but I don't > > > see that the "strongly recommended" phrasing is sufficient to assume > > > isolation between multifunction RCiEPs. Thanks, > > > > Your point is that for integrated MFD endpoints without ACS, there is no > > guarantee to SW that they are isolated. > > > > As far as a quirk, do you think: > > - a cmdline opt-out for integrated endpoints and RCiEPs would suffice? > > along with a compile time default that is strict enforcement > > - a typical vid/did type exception list? > > > > A more generic way to ask for exception would be scalable until we can stop > > those types of integrated devices.
Or we need to maintain these device lists > > for eternity. > > I don't think the language in the spec is anything sufficient to handle > RCiEP uniquely. We've previously rejected kernel command line opt-outs > for ACS, and the extent to which those patches still float around the > user community and are blindly used to separate IOMMU groups is a > testament to the failure of this approach. Users do not have a basis > for enabling this sort of opt-out. The benefit is obvious in the IOMMU > grouping, but the risk is entirely unknown. A kconfig option is even > worse as that means if you consume a downstream kernel, the downstream > maintainers might have decided universally that isolation is less > important than functionality. We discussed this internally, and the Intel VT-d spec does spell this out clearly in Section 3.16, Root-Complex Peer to Peer Considerations. The spec clearly calls out that all p2p must be done on translated addresses and there
Re: [PATCH 1/2] PCI: Introduce PCI_FIXUP_IOMMU
Hi, Christoph On 2020/5/26 at 10:46 PM, Christoph Hellwig wrote: On Tue, May 26, 2020 at 07:49:08PM +0800, Zhangfei Gao wrote: Some platform devices appear as PCI but are actually on the AMBA bus, and they need fixup in drivers/pci/quirks.c handling iommu_fwnode. Here introducing PCI_FIXUP_IOMMU, which is called after iommu_fwnode is allocated, instead of reusing PCI_FIXUP_FINAL since it will slow down iommu probing as all devices in the fixup final list will be reprocessed. Who is going to use this? I don't see a single user in the series. We will add an iommu fixup in drivers/pci/quirks.c, handling fwspec->can_stall, which is introduced in https://www.spinics.net/lists/linux-pci/msg94559.html Unfortunately, the patch did not make v5.8, so we have to wait. And we want to check whether this is the right method to solve this issue. Thanks
Re: [PATCH 1/2] PCI: Introduce PCI_FIXUP_IOMMU
On Tue, May 26, 2020 at 07:49:08PM +0800, Zhangfei Gao wrote: > Some platform devices appear as PCI but are actually on the AMBA bus, > and they need fixup in drivers/pci/quirks.c handling iommu_fwnode. > Here introducing PCI_FIXUP_IOMMU, which is called after iommu_fwnode > is allocated, instead of reusing PCI_FIXUP_FINAL since it will slow > down iommu probing as all devices in fixup final list will be > reprocessed. Who is going to use this? I don't see a single user in the series.
Re: [PATCH 0/2] Let pci_fixup_final access iommu_fwnode
On 2020/5/25 at 9:43 PM, Joerg Roedel wrote: On Tue, May 12, 2020 at 12:08:29PM +0800, Zhangfei Gao wrote: Some platform devices appear as PCI but are actually on the AMBA bus, and they need fixup in drivers/pci/quirks.c handling iommu_fwnode. So call pci_fixup_final after iommu_fwnode is allocated. For example: Hisilicon platform devices need fixup in drivers/pci/quirks.c +static void quirk_huawei_pcie_sva(struct pci_dev *pdev) +{ + struct iommu_fwspec *fwspec; + + pdev->eetlp_prefix_path = 1; + fwspec = dev_iommu_fwspec_get(&pdev->dev); + if (fwspec) + fwspec->can_stall = 1; +} + +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_HUAWEI, 0xa250, quirk_huawei_pcie_sva); +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_HUAWEI, 0xa251, quirk_huawei_pcie_sva); I don't think it is a great idea to hook this into PCI_FIXUP_FINAL. The fixup list needs to be processed for every device, which will slow down probing. So either we introduce something like PCI_FIXUP_IOMMU, if this is entirely PCI specific. If it needs to be generic, we need some fixup infrastructure in the IOMMU code itself. Thanks Joerg for the good suggestion. I am trying to introduce PCI_FIXUP_IOMMU in https://lkml.org/lkml/2020/5/26/366 Thanks
[PATCH 2/2] iommu: calling pci_fixup_iommu in iommu_fwspec_init
Calling pci_fixup_iommu in iommu_fwspec_init, which allocates iommu_fwnode. Some platform devices appear as PCI but are actually on the AMBA bus, and they need fixup in drivers/pci/quirks.c handling iommu_fwnode. So call pci_fixup_iommu after iommu_fwnode is allocated. Signed-off-by: Zhangfei Gao ---
 drivers/iommu/iommu.c | 4 ++++
 1 file changed, 4 insertions(+)
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c index 7b37542..fb84c42 100644 --- a/drivers/iommu/iommu.c +++ b/drivers/iommu/iommu.c @@ -2418,6 +2418,10 @@ int iommu_fwspec_init(struct device *dev, struct fwnode_handle *iommu_fwnode, fwspec->iommu_fwnode = iommu_fwnode; fwspec->ops = ops; dev_iommu_fwspec_set(dev, fwspec); + + if (dev_is_pci(dev)) + pci_fixup_device(pci_fixup_iommu, to_pci_dev(dev)); + return 0; } EXPORT_SYMBOL_GPL(iommu_fwspec_init); -- 2.7.4
[PATCH 1/2] PCI: Introduce PCI_FIXUP_IOMMU
Some platform devices appear as PCI but are actually on the AMBA bus, and they need fixup in drivers/pci/quirks.c handling iommu_fwnode. Here introducing PCI_FIXUP_IOMMU, which is called after iommu_fwnode is allocated, instead of reusing PCI_FIXUP_FINAL since it will slow down iommu probing as all devices in fixup final list will be reprocessed. Suggested-by: Joerg Roedel Signed-off-by: Zhangfei Gao ---
 drivers/pci/quirks.c | 7 +++++++
 include/asm-generic/vmlinux.lds.h | 3 +++
 include/linux/pci.h | 8 ++++++++
 3 files changed, 18 insertions(+)
diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c index ca9ed57..b037034 100644 --- a/drivers/pci/quirks.c +++ b/drivers/pci/quirks.c @@ -83,6 +83,8 @@ extern struct pci_fixup __start_pci_fixups_header[]; extern struct pci_fixup __end_pci_fixups_header[]; extern struct pci_fixup __start_pci_fixups_final[]; extern struct pci_fixup __end_pci_fixups_final[]; +extern struct pci_fixup __start_pci_fixups_iommu[]; +extern struct pci_fixup __end_pci_fixups_iommu[]; extern struct pci_fixup __start_pci_fixups_enable[]; extern struct pci_fixup __end_pci_fixups_enable[]; extern struct pci_fixup __start_pci_fixups_resume[]; @@ -118,6 +120,11 @@ void pci_fixup_device(enum pci_fixup_pass pass, struct pci_dev *dev) end = __end_pci_fixups_final; break; + case pci_fixup_iommu: + start = __start_pci_fixups_iommu; + end = __end_pci_fixups_iommu; + break; + case pci_fixup_enable: start = __start_pci_fixups_enable; end = __end_pci_fixups_enable; diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h index 71e387a..3da32d8 100644 --- a/include/asm-generic/vmlinux.lds.h +++ b/include/asm-generic/vmlinux.lds.h @@ -411,6 +411,9 @@ __start_pci_fixups_final = .; \ KEEP(*(.pci_fixup_final)) \ __end_pci_fixups_final = .; \ + __start_pci_fixups_iommu = .; \ + KEEP(*(.pci_fixup_iommu)) \ + __end_pci_fixups_iommu = .; \ __start_pci_fixups_enable = .; \ KEEP(*(.pci_fixup_enable)) \ __end_pci_fixups_enable = .;\ diff --git
a/include/linux/pci.h b/include/linux/pci.h index 83ce1cd..0d5fbf8 100644 --- a/include/linux/pci.h +++ b/include/linux/pci.h @@ -1892,6 +1892,7 @@ enum pci_fixup_pass { pci_fixup_early,/* Before probing BARs */ pci_fixup_header, /* After reading configuration header */ pci_fixup_final,/* Final phase of device fixups */ + pci_fixup_iommu,/* After iommu_fwspec_init */ pci_fixup_enable, /* pci_enable_device() time */ pci_fixup_resume, /* pci_device_resume() */ pci_fixup_suspend, /* pci_device_suspend() */ @@ -1934,6 +1935,10 @@ enum pci_fixup_pass { class_shift, hook) \ DECLARE_PCI_FIXUP_SECTION(.pci_fixup_final, \ hook, vendor, device, class, class_shift, hook) +#define DECLARE_PCI_FIXUP_CLASS_IOMMU(vendor, device, class, \ +class_shift, hook) \ + DECLARE_PCI_FIXUP_SECTION(.pci_fixup_iommu, \ + hook, vendor, device, class, class_shift, hook) #define DECLARE_PCI_FIXUP_CLASS_ENABLE(vendor, device, class, \ class_shift, hook) \ DECLARE_PCI_FIXUP_SECTION(.pci_fixup_enable,\ @@ -1964,6 +1969,9 @@ enum pci_fixup_pass { #define DECLARE_PCI_FIXUP_FINAL(vendor, device, hook) \ DECLARE_PCI_FIXUP_SECTION(.pci_fixup_final, \ hook, vendor, device, PCI_ANY_ID, 0, hook) +#define DECLARE_PCI_FIXUP_IOMMU(vendor, device, hook) \ + DECLARE_PCI_FIXUP_SECTION(.pci_fixup_iommu, \ + hook, vendor, device, PCI_ANY_ID, 0, hook) #define DECLARE_PCI_FIXUP_ENABLE(vendor, device, hook) \ DECLARE_PCI_FIXUP_SECTION(.pci_fixup_enable,\ hook, vendor, device, PCI_ANY_ID, 0, hook) -- 2.7.4
[PATCH 0/2] Introduce PCI_FIXUP_IOMMU
Some platform devices appear as PCI but are actually on the AMBA bus, and they need fixup in drivers/pci/quirks.c handling iommu_fwnode. Here introducing PCI_FIXUP_IOMMU, which is called after iommu_fwnode is allocated, instead of reusing PCI_FIXUP_FINAL since it will slow down iommu probing as all devices in the fixup final list will be reprocessed, as suggested by Joerg. [1] For example: Hisilicon platform devices need fixup in drivers/pci/quirks.c handling fwspec->can_stall, which is introduced in [2] +static void quirk_huawei_pcie_sva(struct pci_dev *pdev) +{ + struct iommu_fwspec *fwspec; + + pdev->eetlp_prefix_path = 1; + fwspec = dev_iommu_fwspec_get(&pdev->dev); + if (fwspec) + fwspec->can_stall = 1; +} + +DECLARE_PCI_FIXUP_IOMMU(PCI_VENDOR_ID_HUAWEI, 0xa250, quirk_huawei_pcie_sva); +DECLARE_PCI_FIXUP_IOMMU(PCI_VENDOR_ID_HUAWEI, 0xa251, quirk_huawei_pcie_sva); [1] https://www.spinics.net/lists/iommu/msg44591.html [2] https://www.spinics.net/lists/linux-pci/msg94559.html Zhangfei Gao (2): PCI: Introduce PCI_FIXUP_IOMMU iommu: calling pci_fixup_iommu in iommu_fwspec_init
 drivers/iommu/iommu.c | 4 ++++
 drivers/pci/quirks.c | 7 +++++++
 include/asm-generic/vmlinux.lds.h | 3 +++
 include/linux/pci.h | 8 ++++++++
 4 files changed, 22 insertions(+)
-- 2.7.4
Re: [PATCH v5 00/38] DRM: fix struct sg_table nents vs. orig_nents misuse
On Tue, May 26, 2020 at 9:01 AM Marek Szyprowski wrote: > > Hi > > On 13.05.2020 15:47, Christoph Hellwig wrote: > > I've pushed out a branch with the first three patches here: > > > > git://git.infradead.org/users/hch/dma-mapping.git dma-sg_table-helper > > > > Gitweb: > > > > > > http://git.infradead.org/users/hch/dma-mapping.git/shortlog/refs/heads/dma-sg_table-helper > > > > and merged it into the dma-mapping for-next tree. Unless someone shouts > > the branch should be considered immutable in 24 hours. > > David & Daniel: could you merge all the DRM related changes on top of > the provided branch? Merging those changes separately would take a lot > of time because of the dependencies on the sgtable helpers and changes > in the DRM core. We generally freeze drm for big rework past -rc6 (small drivers tend to be a bit later). I think it's simpler if we just land this in the merge window and then smash the drm patches on top for the 5.9 merge window. Or all in the same topic branch, but it feels a bit late for that and making sure nothing breaks. -Daniel > > Best regards > -- > Marek Szyprowski, PhD > Samsung R&D Institute Poland > -- Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch
Re: [PATCH 1/2] iommu/hyper-v: Constify hyperv_ir_domain_ops
On Mon, May 25, 2020 at 11:49:57PM +0200, Rikard Falkeborn wrote: > The struct hyperv_ir_domain_ops is not modified and can be made const to > allow the compiler to put it in read-only memory. > > Before: >    text    data     bss     dec     hex filename >    2916    1180    1120    5216    1460 drivers/iommu/hyperv-iommu.o > > After: >    text    data     bss     dec     hex filename >    3044    1052    1120    5216    1460 drivers/iommu/hyperv-iommu.o > > Signed-off-by: Rikard Falkeborn Acked-by: Wei Liu > --- > drivers/iommu/hyperv-iommu.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/iommu/hyperv-iommu.c b/drivers/iommu/hyperv-iommu.c > index a386b83e0e34..3c0c67a99c7b 100644 > --- a/drivers/iommu/hyperv-iommu.c > +++ b/drivers/iommu/hyperv-iommu.c > @@ -131,7 +131,7 @@ static int hyperv_irq_remapping_activate(struct > irq_domain *domain, > return 0; > } > > -static struct irq_domain_ops hyperv_ir_domain_ops = { > +static const struct irq_domain_ops hyperv_ir_domain_ops = { > .alloc = hyperv_irq_remapping_alloc, > .free = hyperv_irq_remapping_free, > .activate = hyperv_irq_remapping_activate, > -- > 2.26.2 >
Re: [PATCH 2/2] iommu/sun50i: Constify sun50i_iommu_ops
On Mon, May 25, 2020 at 11:49:58PM +0200, Rikard Falkeborn wrote: > The struct sun50i_iommu_ops is not modified and can be made const to > allow the compiler to put it in read-only memory. > > Before: >    text    data     bss     dec     hex filename >   14358    2501      64   16923    421b drivers/iommu/sun50i-iommu.o > > After: >    text    data     bss     dec     hex filename >   14726    2117      64   16907    420b drivers/iommu/sun50i-iommu.o > > Signed-off-by: Rikard Falkeborn Acked-by: Maxime Ripard Thanks! Maxime
Re: [PATCH] iommu/dma: limit iova free size to unmmaped iova
On 2020-05-22 14:54, Robin Murphy wrote: On 2020-05-22 07:25, gup...@codeaurora.org wrote: On 2020-05-22 01:46, Robin Murphy wrote: On 2020-05-21 12:30, Prakash Gupta wrote: I agree, we shouldn't be freeing the partial iova. Instead just making sure the unmap was successful should be sufficient before freeing the iova. So the change can instead be something like this: - iommu_dma_free_iova(cookie, dma_addr, size); + if (unmapped) + iommu_dma_free_iova(cookie, dma_addr, size); TBH my gut feeling here is that you're really just trying to treat a symptom of another bug elsewhere, namely some driver calling dma_unmap_* or dma_free_* with the wrong address or size in the first place. This condition would arise only if a driver calls dma_unmap/free_* with a 0 iova_pfn. This will be flagged with a warning during unmap but will trigger a panic later on while doing an unrelated dma_map/unmap_*. If unmap has already failed for an invalid iova, there is no reason we should consider this a valid iova and free it. This part should be fixed. I disagree. In general, if drivers call the DMA API incorrectly it is liable to lead to data loss, memory corruption, and various other unpleasant misbehaviour - it is not the DMA layer's job to attempt to paper over driver bugs. There *is* an argument for downgrading the BUG_ON() in iova_magazine_free_pfns() to a WARN_ON(), since frankly it isn't a sufficiently serious condition to justify killing the whole machine immediately, but NAK to bodging the iommu-dma mid-layer to "fix" that. A serious bug already happened elsewhere, so trying to hide the fallout really doesn't help anyone. Sorry for the delayed response, it was a long weekend. I agree that an invalid DMA API call can result in unexpected issues and the client should fix it, but then the present behavior makes it difficult to catch cases where a driver is making wrong DMA API calls. When an invalid iova pfn is passed it doesn't fail then and there, though the DMA layer is aware of the iova being invalid.
It fails much later, in the context of a valid map/unmap, with a BUG_ON(). Downgrading BUG_ON() to WARN_ON() in iova_magazine_free_pfns() will not help much, as the invalid iova will cause a NULL pointer dereference. I see no reason why the DMA layer wants to free an iova for which unmap failed. IMHO queuing an invalid iova (which already failed unmap) to the rcache, which is eventually going to crash the system, looks like an iommu-dma layer issue. Thanks, Prakash
Re: [PATCH v5 00/38] DRM: fix struct sg_table nents vs. orig_nents misuse
Hi On 13.05.2020 15:47, Christoph Hellwig wrote: > I've pushed out a branch with the first three patches here: > > git://git.infradead.org/users/hch/dma-mapping.git dma-sg_table-helper > > Gitweb: > > > http://git.infradead.org/users/hch/dma-mapping.git/shortlog/refs/heads/dma-sg_table-helper > > and merged it into the dma-mapping for-next tree. Unless someone shouts > the branch should be considered immutable in 24 hours. David & Daniel: could you merge all the DRM related changes on top of the provided branch? Merging those changes separately would take a lot of time because of the dependencies on the sgtable helpers and changes in the DRM core. Best regards -- Marek Szyprowski, PhD Samsung R&D Institute Poland