Re: [RFC] Use SMMU HTTU for DMA dirty page tracking

2020-05-26 Thread Xiang Zheng


On 2020/5/27 11:27, Tian, Kevin wrote:
>> From: Xiang Zheng
>> Sent: Monday, May 25, 2020 7:34 PM
>>
>> [+cc Kirti, Yan, Alex]
>>
>> On 2020/5/23 1:14, Jean-Philippe Brucker wrote:
>>> Hi,
>>>
>>> On Tue, May 19, 2020 at 05:42:55PM +0800, Xiang Zheng wrote:
 Hi all,

 Is there any plan for enabling SMMU HTTU?
>>>
>>> Not outside of SVA, as far as I know.
>>>
>>
 I have seen the patch in the SVA series which adds
 support for HTTU:
 https://www.spinics.net/lists/arm-kernel/msg798694.html

 HTTU reduces the number of access faults on the SMMU fault queue
 (permission faults also benefit from it).

 Besides reducing the faults, HTTU also helps to track dirty pages for
 device DMA. Is it feasible to utilize HTTU to get dirty pages on device
 DMA during VFIO live migration?
>>>
>>> As you know there is a VFIO interface for this under discussion:
>>> https://lore.kernel.org/kvm/1589781397-28368-1-git-send-email-
>> kwankh...@nvidia.com/
>>> It doesn't implement an internal API to communicate with the IOMMU
>> driver
>>> about dirty pages.
> 
> We plan to add such an API later, e.g. to utilize the A/D bit in VT-d
> 2nd-level page tables (Rev 3.0). 
> 

Thank you, Kevin.

When will you send this patch series? We hope we can also support
hardware-based dirty page tracking via common APIs based on your patches. :)
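
(Editorial illustration: such a common API between VFIO and the IOMMU
drivers might take roughly the shape below; VT-d A/D bits and SMMU HTTU
would both sit behind it. None of these names exist upstream at the time
of writing, they are made up purely to make the idea concrete.)

struct iommu_dirty_log_ops {
        /* does this domain's IOMMU support HW dirty tracking? */
        bool (*supported)(struct iommu_domain *domain);
        /* clear dirty state and enable HW updates (e.g. set HD) */
        int (*start)(struct iommu_domain *domain);
        int (*stop)(struct iommu_domain *domain);
        /* copy and clear dirty bits for [iova, iova + size) */
        int (*sync_bitmap)(struct iommu_domain *domain,
                           unsigned long iova, size_t size,
                           unsigned long *bitmap, unsigned long pgshift);
};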

>>
>>>
 If SMMU can track dirty pages, devices are not required to implement
 additional dirty page tracking to support VFIO live migration.
>>>
>>> It seems feasible, though tracking it in the device might be more
>>> efficient. I might have misunderstood but I think for live migration of
>>> the Intel NIC they trap guest accesses to the device and introspect its
>>> state to figure out which pages it is accessing.
> 
> Does HTTU implement an A/D-like mechanism in SMMU page tables, or just
> report dirty pages in a log buffer? Either way, tracking dirty pages on the
> IOMMU side is generic and thus doesn't require a device-specific tweak like
> in the Intel NIC.
> 

Currently HTTU just implements an A/D-like mechanism in SMMU page tables. We
certainly expect SMMU to also implement a PML-like feature so that we can
avoid walking the whole page table to get the dirty pages.
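
(Editorial sketch of the page-table walk being described; the helpers
io_pgtable_walk_leaf(), iopte_dirty() and iopte_mkclean() are assumed,
not existing kernel API. With HTTU the dirty state lives in the DBM/AP
bits of each leaf PTE, so "mkclean" would atomically flip the entry back
to writable-clean.)

static void sketch_sync_dirty_bitmap(struct io_pgtable_ops *ops,
                                     unsigned long iova, size_t size,
                                     unsigned long *bitmap,
                                     unsigned long pgshift)
{
        size_t off;

        for (off = 0; off < size; off += 1UL << pgshift) {
                u64 *ptep = io_pgtable_walk_leaf(ops, iova + off);

                if (ptep && iopte_dirty(*ptep)) {
                        set_bit(off >> pgshift, bitmap);
                        /* clean it so the next pass only reports new writes */
                        iopte_mkclean(ptep);
                }
        }
}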

By the way, I'm not sure whether HTTU or SLAD can help for mediated devices.

-- 
Thanks,
Xiang



[PATCH] iommu/iova: Free global iova rcache on iova alloc failure

2020-05-26 Thread vjitta
From: Vijayanand Jitta 

Whenever an IOVA alloc request fails, we free the IOVA ranges present
in the per-CPU IOVA rcaches and then retry, but the global IOVA rcache
is not freed. As a result, we could still see IOVA alloc failures even
after the retry, since the global rcache is still holding the IOVAs,
which can cause fragmentation. So, free the global IOVA rcache as well
and then go for the retry.

Signed-off-by: Vijayanand Jitta 
---
 drivers/iommu/iova.c | 22 ++++++++++++++++++++++
 include/linux/iova.h |  6 ++++++
 2 files changed, 28 insertions(+)

diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
index 0e6a953..5ae0328 100644
--- a/drivers/iommu/iova.c
+++ b/drivers/iommu/iova.c
@@ -431,6 +431,7 @@ struct iova *find_iova(struct iova_domain *iovad, unsigned long pfn)
flush_rcache = false;
for_each_online_cpu(cpu)
free_cpu_cached_iovas(cpu, iovad);
+   free_global_cached_iovas(iovad);
goto retry;
}
 
@@ -1044,5 +1045,26 @@ void free_cpu_cached_iovas(unsigned int cpu, struct iova_domain *iovad)
}
 }
 
+/*
+ * free all the IOVA ranges in the global cache
+ */
+void free_global_cached_iovas(struct iova_domain *iovad)
+{
+   struct iova_rcache *rcache;
+   int i, j;
+
+   for (i = 0; i < IOVA_RANGE_CACHE_MAX_SIZE; ++i) {
+   rcache = &iovad->rcaches[i];
+   spin_lock(&rcache->lock);
+   for (j = 0; j < rcache->depot_size; ++j) {
+   iova_magazine_free_pfns(rcache->depot[j], iovad);
+   iova_magazine_free(rcache->depot[j]);
+   rcache->depot[j] = NULL;
+   }
+   rcache->depot_size = 0;
+   spin_unlock(&rcache->lock);
+   }
+}
+
 MODULE_AUTHOR("Anil S Keshavamurthy ");
 MODULE_LICENSE("GPL");
diff --git a/include/linux/iova.h b/include/linux/iova.h
index a0637ab..a905726 100644
--- a/include/linux/iova.h
+++ b/include/linux/iova.h
@@ -163,6 +163,7 @@ int init_iova_flush_queue(struct iova_domain *iovad,
 struct iova *split_and_remove_iova(struct iova_domain *iovad,
struct iova *iova, unsigned long pfn_lo, unsigned long pfn_hi);
 void free_cpu_cached_iovas(unsigned int cpu, struct iova_domain *iovad);
+void free_global_cached_iovas(struct iova_domain *iovad);
 #else
 static inline int iova_cache_get(void)
 {
@@ -270,6 +271,11 @@ static inline void free_cpu_cached_iovas(unsigned int cpu,
 struct iova_domain *iovad)
 {
 }
+
+static inline void free_global_cached_iovas(struct iova_domain *iovad)
+{
+}
+
 #endif
 
 #endif
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of 
Code Aurora Forum, hosted by The Linux Foundation


RE: [RFC] Use SMMU HTTU for DMA dirty page tracking

2020-05-26 Thread Tian, Kevin
> From: Xiang Zheng
> Sent: Monday, May 25, 2020 7:34 PM
> 
> [+cc Kirti, Yan, Alex]
> 
> On 2020/5/23 1:14, Jean-Philippe Brucker wrote:
> > Hi,
> >
> > On Tue, May 19, 2020 at 05:42:55PM +0800, Xiang Zheng wrote:
> >> Hi all,
> >>
> >> Is there any plan for enabling SMMU HTTU?
> >
> > Not outside of SVA, as far as I know.
> >
> 
> >> I have seen the patch in the SVA series which adds
> >> support for HTTU:
> >> https://www.spinics.net/lists/arm-kernel/msg798694.html
> >>
> >> HTTU reduces the number of access faults on SMMU fault queue
> >> (permission faults also benefit from it).
> >>
> >> Besides reducing the faults, HTTU also helps to track dirty pages for
> >> device DMA. Is it feasible to utilize HTTU to get dirty pages on device
> >> DMA during VFIO live migration?
> >
> > As you know there is a VFIO interface for this under discussion:
> > https://lore.kernel.org/kvm/1589781397-28368-1-git-send-email-
> kwankh...@nvidia.com/
> > It doesn't implement an internal API to communicate with the IOMMU
> driver
> > about dirty pages.

We plan to add such an API later, e.g. to utilize the A/D bit in VT-d
2nd-level page tables (Rev 3.0). 

> 
> >
> >> If SMMU can track dirty pages, devices are not required to implement
> >> additional dirty page tracking to support VFIO live migration.
> >
> > It seems feasible, though tracking it in the device might be more
> > efficient. I might have misunderstood but I think for live migration of
> > the Intel NIC they trap guest accesses to the device and introspect its
> > state to figure out which pages it is accessing.

Does HTTU implement an A/D-like mechanism in SMMU page tables, or just
report dirty pages in a log buffer? Either way, tracking dirty pages on the
IOMMU side is generic and thus doesn't require a device-specific tweak like
in the Intel NIC.

Thanks,
Kevin

> >
> > With HTTU I suppose (without much knowledge about live migration) that
> > you'd need several new interfaces to the IOMMU drivers:
> >
> > * A way for VFIO to query HTTU support in the SMMU. There are some
> >   discussions about communicating more IOMMU capabilities through VFIO
> but
> >   no implementation yet. When HTTU isn't supported the DIRTY_PAGES
> bitmap
> >   would report all pages as they do now.
> >
> > * VFIO_IOMMU_DIRTY_PAGES_FLAG_START/STOP would clear the dirty bit
> >   for all VFIO mappings (which is going to take some time). There is a
> >   walker in io-pgtable for iova_to_phys() which could be extended. I
> >   suppose it's also possible to atomically switch the HA and HD bits in
> >   context descriptors.
> 
> Maybe we don't need to switch the HA and HD bits, and can just turn them on all the time?
> 
> >
> > * VFIO_IOMMU_DIRTY_PAGES_FLAG_GET_BITMAP would query the dirty
> bit for all
> >   VFIO mappings.
> >
> 
> I think we need to consider the case of IOMMU dirty page logging. We want
> to test Kirti's VFIO migration patches combined with SMMU HTTU; any
> suggestions?
> 
> --
> Thanks,
> Xiang
> 


Re: [PATCH v7 18/24] iommu/arm-smmu-v3: Add support for Hardware Translation Table Update

2020-05-26 Thread Xiang Zheng
Hi Jean,

This patch only enables the HTTU bits in CDs. Is it also necessary to
enable the HTTU bits in STEs in this patch?

On 2020/5/20 1:54, Jean-Philippe Brucker wrote:
> If the SMMU supports it and the kernel was built with HTTU support,
> enable hardware update of access and dirty flags. This is essential for
> shared page tables, to reduce the number of access faults on the fault
> queue. Normal DMA with io-pgtables doesn't currently use the access or
> dirty flags.
> 
> We can enable HTTU even if CPUs don't support it, because the kernel
> always checks for HW dirty bit and updates the PTE flags atomically.
> 
> Signed-off-by: Jean-Philippe Brucker 
> ---
>  drivers/iommu/arm-smmu-v3.c | 24 +++++++++++++++++++++++-
>  1 file changed, 23 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
> index 1386d4d2bc60..6a368218f54c 100644
> --- a/drivers/iommu/arm-smmu-v3.c
> +++ b/drivers/iommu/arm-smmu-v3.c
> @@ -58,6 +58,8 @@
>  #define IDR0_ASID16  (1 << 12)
>  #define IDR0_ATS (1 << 10)
>  #define IDR0_HYP (1 << 9)
> +#define IDR0_HD  (1 << 7)
> +#define IDR0_HA  (1 << 6)
>  #define IDR0_BTM (1 << 5)
>  #define IDR0_COHACC  (1 << 4)
>  #define IDR0_TTF GENMASK(3, 2)
> @@ -311,6 +313,9 @@
>  #define CTXDESC_CD_0_TCR_IPS GENMASK_ULL(34, 32)
>  #define CTXDESC_CD_0_TCR_TBI0(1ULL << 38)
>  
> +#define CTXDESC_CD_0_TCR_HA  (1UL << 43)
> +#define CTXDESC_CD_0_TCR_HD  (1UL << 42)
> +

>  #define CTXDESC_CD_0_AA64(1UL << 41)
>  #define CTXDESC_CD_0_S   (1UL << 44)
>  #define CTXDESC_CD_0_R   (1UL << 45)
> @@ -663,6 +668,8 @@ struct arm_smmu_device {
>  #define ARM_SMMU_FEAT_E2H(1 << 16)
>  #define ARM_SMMU_FEAT_BTM(1 << 17)
>  #define ARM_SMMU_FEAT_SVA(1 << 18)
> +#define ARM_SMMU_FEAT_HA (1 << 19)
> +#define ARM_SMMU_FEAT_HD (1 << 20)
>   u32 features;
>  
>  #define ARM_SMMU_OPT_SKIP_PREFETCH   (1 << 0)
> @@ -1718,10 +1725,17 @@ static int arm_smmu_write_ctx_desc(struct arm_smmu_domain *smmu_domain,
>* this substream's traffic
>*/
>   } else { /* (1) and (2) */
> + u64 tcr = cd->tcr;
> +
>   cdptr[1] = cpu_to_le64(cd->ttbr & CTXDESC_CD_1_TTB0_MASK);
>   cdptr[2] = 0;
>   cdptr[3] = cpu_to_le64(cd->mair);
>  
> + if (!(smmu->features & ARM_SMMU_FEAT_HD))
> + tcr &= ~CTXDESC_CD_0_TCR_HD;
> + if (!(smmu->features & ARM_SMMU_FEAT_HA))
> + tcr &= ~CTXDESC_CD_0_TCR_HA;
> +
>   /*
>* STE is live, and the SMMU might read dwords of this CD in any
>* order. Ensure that it observes valid values before reading
> @@ -1729,7 +1743,7 @@ static int arm_smmu_write_ctx_desc(struct arm_smmu_domain *smmu_domain,
>*/
>   arm_smmu_sync_cd(smmu_domain, ssid, true);
>  
> - val = cd->tcr |
> + val = tcr |
>  #ifdef __BIG_ENDIAN
>   CTXDESC_CD_0_ENDI |
>  #endif
> @@ -1958,10 +1972,12 @@ static struct arm_smmu_ctx_desc *arm_smmu_alloc_shared_cd(struct mm_struct *mm)
>   return old_cd;
>   }
>  
> + /* HA and HD will be filtered out later if not supported by the SMMU */
>   tcr = FIELD_PREP(CTXDESC_CD_0_TCR_T0SZ, 64ULL - VA_BITS) |
> FIELD_PREP(CTXDESC_CD_0_TCR_IRGN0, ARM_LPAE_TCR_RGN_WBWA) |
> FIELD_PREP(CTXDESC_CD_0_TCR_ORGN0, ARM_LPAE_TCR_RGN_WBWA) |
> FIELD_PREP(CTXDESC_CD_0_TCR_SH0, ARM_LPAE_TCR_SH_IS) |
> +   CTXDESC_CD_0_TCR_HA | CTXDESC_CD_0_TCR_HD |
> CTXDESC_CD_0_TCR_EPD1 | CTXDESC_CD_0_AA64;
>  
>   switch (PAGE_SIZE) {
> @@ -4454,6 +4470,12 @@ static int arm_smmu_device_hw_probe(struct arm_smmu_device *smmu)
>   smmu->features |= ARM_SMMU_FEAT_E2H;
>   }
>  
> + if (reg & (IDR0_HA | IDR0_HD)) {

> + smmu->features |= ARM_SMMU_FEAT_HA;
> + if (reg & IDR0_HD)
> + smmu->features |= ARM_SMMU_FEAT_HD;
> + }
> +

>   /*
>* If the CPU is using VHE, but the SMMU doesn't support it, the SMMU
>* will create TLB entries for NH-EL1 world and will miss the
> 

-- 
Thanks,
Xiang



Re: [PATCH] iommu: Relax ACS requirement for Intel RCiEP devices.

2020-05-26 Thread Kuppuswamy, Sathyanarayanan



On 5/26/20 3:17 PM, Ashok Raj wrote:

All Intel platforms guarantee that all root complex implementations
must send transactions up to the IOMMU for address translation. Hence
RCiEP devices whose Vendor ID is Intel can claim an exception for the
lack of ACS support.


3.16 Root-Complex Peer to Peer Considerations
When DMA remapping is enabled, peer-to-peer requests through the
Root-Complex must be handled as follows:
• The input address in the request is translated (through first-level,
  second-level or nested translation) to a host physical address (HPA).
  The address decoding for peer addresses must be done only on the
  translated HPA. Hardware implementations are free to further limit
  peer-to-peer accesses to specific host physical address regions
  (or to completely disallow peer-forwarding of translated requests).
• Since address translation changes the contents (address field) of
  the PCI Express Transaction Layer Packet (TLP), for PCI Express
  peer-to-peer requests with ECRC, the Root-Complex hardware must use
  the new ECRC (re-computed with the translated address) if it
  decides to forward the TLP as a peer request.
• Root-ports, and multi-function root-complex integrated endpoints, may
  support additional peer-to-peer control features by supporting PCI
  Express Access Control Services (ACS) capability. Refer to ACS
  capability in PCI Express specifications for details.

Since Linux didn't give special treatment to allow this exception,
certain RCiEP MFD devices were getting grouped in a single IOMMU group.
This doesn't permit a single device to be assigned to a guest, for
instance.

In one vendor system: Devices 14.x were grouped in a single IOMMU group.

/sys/kernel/iommu_groups/5/devices/0000:00:14.0
/sys/kernel/iommu_groups/5/devices/0000:00:14.2
/sys/kernel/iommu_groups/5/devices/0000:00:14.3

After the patch:
/sys/kernel/iommu_groups/5/devices/0000:00:14.0
/sys/kernel/iommu_groups/5/devices/0000:00:14.2
/sys/kernel/iommu_groups/6/devices/0000:00:14.3 <<< new group

14.0 and 14.2 are integrated devices, but legacy endpoints,
whereas 14.3 is a PCIe-compliant RCiEP.

00:14.3 Network controller: Intel Corporation Device 9df0 (rev 30)
Capabilities: [40] Express (v2) Root Complex Integrated Endpoint, MSI 00

This permits assigning this device to a guest VM.

Fixes: f096c061f552 ("iommu: Rework iommu_group_get_for_pci_dev()")
Signed-off-by: Ashok Raj 
To: Joerg Roedel 
To: Bjorn Helgaas 
Cc: linux-ker...@vger.kernel.org
Cc: iommu@lists.linux-foundation.org
Cc: Lu Baolu 
Cc: Alex Williamson 
Cc: Darrel Goeddel 
Cc: Mark Scott ,
Cc: Romil Sharma 
Cc: Ashok Raj 
---
  drivers/iommu/iommu.c | 13 ++++++++++++-
  1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 2b471419e26c..31b595dfedde 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1187,7 +1187,18 @@ static struct iommu_group *get_pci_function_alias_group(struct pci_dev *pdev,
struct pci_dev *tmp = NULL;
struct iommu_group *group;
  
-	if (!pdev->multifunction || pci_acs_enabled(pdev, REQ_ACS_FLAGS))

+   /*
+* Intel VT-d Specification Section 3.16, Root-Complex Peer to Peer
+* Considerations mandate that all transactions in RCiEPs and
+* even integrated MFDs *must* be sent up to the IOMMU. P2P is
+* only possible on translated addresses. This gives enough
+* guarantee that such devices can be forgiven for lack of ACS
+* support.
+*/
+   if (!pdev->multifunction ||
+   (pdev->vendor == PCI_VENDOR_ID_INTEL &&
+pci_pcie_type(pdev) == PCI_EXP_TYPE_RC_END) ||
+pci_acs_enabled(pdev, REQ_ACS_FLAGS))

Looks good to me.
Reviewed-by: Kuppuswamy Sathyanarayanan 


return NULL;
  
  	for_each_pci_dev(tmp) {




Re: [PATCH] iommu: Relax ACS requirement for Intel RCiEP devices.

2020-05-26 Thread Alex Williamson
On Tue, 26 May 2020 15:17:35 -0700
Ashok Raj  wrote:

> All Intel platforms guarantee that all root complex implementations
> must send transactions up to the IOMMU for address translation. Hence
> RCiEP devices whose Vendor ID is Intel can claim an exception for the
> lack of ACS support.
> 
> 
> 3.16 Root-Complex Peer to Peer Considerations
> When DMA remapping is enabled, peer-to-peer requests through the
> Root-Complex must be handled as follows:
> • The input address in the request is translated (through first-level,
>   second-level or nested translation) to a host physical address (HPA).
>   The address decoding for peer addresses must be done only on the
>   translated HPA. Hardware implementations are free to further limit
>   peer-to-peer accesses to specific host physical address regions
>   (or to completely disallow peer-forwarding of translated requests).
> • Since address translation changes the contents (address field) of
>   the PCI Express Transaction Layer Packet (TLP), for PCI Express
>   peer-to-peer requests with ECRC, the Root-Complex hardware must use
>   the new ECRC (re-computed with the translated address) if it
>   decides to forward the TLP as a peer request.
> • Root-ports, and multi-function root-complex integrated endpoints, may
>   support additional peer-to-peer control features by supporting PCI Express
>   Access Control Services (ACS) capability. Refer to ACS capability in
>   PCI Express specifications for details.
> 
> Since Linux didn't give special treatment to allow this exception,
> certain RCiEP MFD devices were getting grouped in a single IOMMU group.
> This doesn't permit a single device to be assigned to a guest, for
> instance.
> 
> In one vendor system: Devices 14.x were grouped in a single IOMMU group.
> 
> /sys/kernel/iommu_groups/5/devices/0000:00:14.0
> /sys/kernel/iommu_groups/5/devices/0000:00:14.2
> /sys/kernel/iommu_groups/5/devices/0000:00:14.3
> 
> After the patch:
> /sys/kernel/iommu_groups/5/devices/0000:00:14.0
> /sys/kernel/iommu_groups/5/devices/0000:00:14.2
> /sys/kernel/iommu_groups/6/devices/0000:00:14.3 <<< new group
> 
> 14.0 and 14.2 are integrated devices, but legacy endpoints,
> whereas 14.3 is a PCIe-compliant RCiEP.
> 
> 00:14.3 Network controller: Intel Corporation Device 9df0 (rev 30)
> Capabilities: [40] Express (v2) Root Complex Integrated Endpoint, MSI 00
> 
> This permits assigning this device to a guest VM.
> 
> Fixes: f096c061f552 ("iommu: Rework iommu_group_get_for_pci_dev()")
> Signed-off-by: Ashok Raj 
> To: Joerg Roedel 
> To: Bjorn Helgaas 
> Cc: linux-ker...@vger.kernel.org
> Cc: iommu@lists.linux-foundation.org
> Cc: Lu Baolu 
> Cc: Alex Williamson 
> Cc: Darrel Goeddel 
> Cc: Mark Scott ,
> Cc: Romil Sharma 
> Cc: Ashok Raj 
> ---
>  drivers/iommu/iommu.c | 13 ++++++++++++-
>  1 file changed, 12 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index 2b471419e26c..31b595dfedde 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -1187,7 +1187,18 @@ static struct iommu_group *get_pci_function_alias_group(struct pci_dev *pdev,
>   struct pci_dev *tmp = NULL;
>   struct iommu_group *group;
>  
> - if (!pdev->multifunction || pci_acs_enabled(pdev, REQ_ACS_FLAGS))
> + /*
> +  * Intel VT-d Specification Section 3.16, Root-Complex Peer to Peer
> +  * Considerations mandate that all transactions in RCiEPs and
> +  * even integrated MFDs *must* be sent up to the IOMMU. P2P is
> +  * only possible on translated addresses. This gives enough
> +  * guarantee that such devices can be forgiven for lack of ACS
> +  * support.
> +  */
> + if (!pdev->multifunction ||
> + (pdev->vendor == PCI_VENDOR_ID_INTEL &&
> +  pci_pcie_type(pdev) == PCI_EXP_TYPE_RC_END) ||
> +  pci_acs_enabled(pdev, REQ_ACS_FLAGS))
>   return NULL;
>  
>   for_each_pci_dev(tmp) {

Hi Ashok,

As this is an Intel/VT-d standard, not a PCIe standard, why not
implement this in pci_dev_specific_acs_enabled() with all the other
quirks?  Thanks,

Alex
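
(For reference, a device-specific quirk of the kind suggested here might
look roughly like the sketch below; this is illustrative only, not the
patch that was eventually merged. It would be wired into the quirk table
that pci_dev_specific_acs_enabled() consults, matched against Intel
vendor IDs.)

static int pci_quirk_intel_rciep_acs(struct pci_dev *dev, u16 acs_flags)
{
        if (pci_pcie_type(dev) != PCI_EXP_TYPE_RC_END)
                return -ENOTTY; /* quirk does not apply to this device */

        /*
         * Per VT-d spec section 3.16, RCiEP requests are always routed
         * upstream to the IOMMU, so peer-to-peer is only possible on
         * translated addresses; report the isolation-relevant controls
         * as effectively enabled.
         */
        return (acs_flags & ~(PCI_ACS_SV | PCI_ACS_RR |
                              PCI_ACS_CR | PCI_ACS_UF)) ? 0 : 1;
}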


AMD IOMMU perf counters on Zen2

2020-05-26 Thread Alexander Monakov
Hello,

I'd like to use the IOMMU perf counters on a Zen 2 CPU (Ryzen 4500U, Renoir
SoC). Unfortunately, init_iommu_perf_ctr() fails because val2 != val, i.e.
the counter appears not to be writable. However, if I patch the kernel to
skip this check, the counters seem to increment when configured with the
perf tool.

Do you know why the counters might appear non-writable on newer CPUs?

Thanks.
Alexander
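
(For readers following along, the check in question is roughly the
following, simplified here from drivers/iommu/amd_iommu_init.c of that
era:)

static void init_iommu_perf_ctr(struct amd_iommu *iommu)
{
        u64 val = 0xabcd, val2 = 0;

        if (!iommu_feature(iommu, FEATURE_PC))
                return;

        /* write a test value to bank 0 / counter 0, then read it back */
        if (iommu_pc_get_set_reg(iommu, 0, 0, 0, &val, true) ||
            iommu_pc_get_set_reg(iommu, 0, 0, 0, &val2, false) ||
            val != val2)
                return; /* read-back mismatch: counters deemed unwritable */

        amd_iommu_pc_present = true;
}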


[PATCH] iommu: Relax ACS requirement for Intel RCiEP devices.

2020-05-26 Thread Ashok Raj
All Intel platforms guarantee that all root complex implementations
must send transactions up to the IOMMU for address translation. Hence
RCiEP devices whose Vendor ID is Intel can claim an exception for the
lack of ACS support.


3.16 Root-Complex Peer to Peer Considerations
When DMA remapping is enabled, peer-to-peer requests through the
Root-Complex must be handled as follows:
• The input address in the request is translated (through first-level,
  second-level or nested translation) to a host physical address (HPA).
  The address decoding for peer addresses must be done only on the
  translated HPA. Hardware implementations are free to further limit
  peer-to-peer accesses to specific host physical address regions
  (or to completely disallow peer-forwarding of translated requests).
• Since address translation changes the contents (address field) of
  the PCI Express Transaction Layer Packet (TLP), for PCI Express
  peer-to-peer requests with ECRC, the Root-Complex hardware must use
  the new ECRC (re-computed with the translated address) if it
  decides to forward the TLP as a peer request.
• Root-ports, and multi-function root-complex integrated endpoints, may
  support additional peer-to-peer control features by supporting PCI Express
  Access Control Services (ACS) capability. Refer to ACS capability in
  PCI Express specifications for details.

Since Linux didn't give special treatment to allow this exception,
certain RCiEP MFD devices were getting grouped in a single IOMMU group.
This doesn't permit a single device to be assigned to a guest, for
instance.

In one vendor system: Devices 14.x were grouped in a single IOMMU group.

/sys/kernel/iommu_groups/5/devices/0000:00:14.0
/sys/kernel/iommu_groups/5/devices/0000:00:14.2
/sys/kernel/iommu_groups/5/devices/0000:00:14.3

After the patch:
/sys/kernel/iommu_groups/5/devices/0000:00:14.0
/sys/kernel/iommu_groups/5/devices/0000:00:14.2
/sys/kernel/iommu_groups/6/devices/0000:00:14.3 <<< new group

14.0 and 14.2 are integrated devices, but legacy endpoints,
whereas 14.3 is a PCIe-compliant RCiEP.

00:14.3 Network controller: Intel Corporation Device 9df0 (rev 30)
Capabilities: [40] Express (v2) Root Complex Integrated Endpoint, MSI 00

This permits assigning this device to a guest VM.

Fixes: f096c061f552 ("iommu: Rework iommu_group_get_for_pci_dev()")
Signed-off-by: Ashok Raj 
To: Joerg Roedel 
To: Bjorn Helgaas 
Cc: linux-ker...@vger.kernel.org
Cc: iommu@lists.linux-foundation.org
Cc: Lu Baolu 
Cc: Alex Williamson 
Cc: Darrel Goeddel 
Cc: Mark Scott ,
Cc: Romil Sharma 
Cc: Ashok Raj 
---
 drivers/iommu/iommu.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 2b471419e26c..31b595dfedde 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1187,7 +1187,18 @@ static struct iommu_group *get_pci_function_alias_group(struct pci_dev *pdev,
struct pci_dev *tmp = NULL;
struct iommu_group *group;
 
-   if (!pdev->multifunction || pci_acs_enabled(pdev, REQ_ACS_FLAGS))
+   /*
+* Intel VT-d Specification Section 3.16, Root-Complex Peer to Peer
+* Considerations mandate that all transactions in RCiEPs and
+* even integrated MFDs *must* be sent up to the IOMMU. P2P is
+* only possible on translated addresses. This gives enough
+* guarantee that such devices can be forgiven for lack of ACS
+* support.
+*/
+   if (!pdev->multifunction ||
+   (pdev->vendor == PCI_VENDOR_ID_INTEL &&
+pci_pcie_type(pdev) == PCI_EXP_TYPE_RC_END) ||
+pci_acs_enabled(pdev, REQ_ACS_FLAGS))
return NULL;
 
for_each_pci_dev(tmp) {
-- 
2.7.4


Re: [PATCH v2 09/14] device core: Add ability to handle multiple dma offsets

2020-05-26 Thread Jim Quinlan via iommu
Hello Andy,

On Tue, May 26, 2020 at 4:54 PM Andy Shevchenko
 wrote:
>
> On Tue, May 26, 2020 at 03:12:48PM -0400, Jim Quinlan wrote:
> > The new field in struct device 'dma_pfn_offset_map' is used to facilitate
> > the use of multiple pfn offsets between cpu addrs and dma addrs.  It is
> > similar to 'dma_pfn_offset' except that the offset chosen depends on the
> > cpu or dma address involved.
> >
> > Signed-off-by: Jim Quinlan 
> > ---
> >  drivers/of/address.c| 65 +++--
> >  drivers/usb/core/message.c  |  3 ++
> >  drivers/usb/core/usb.c  |  3 ++
> >  include/linux/device.h  | 10 +-
> >  include/linux/dma-direct.h  | 10 --
> >  include/linux/dma-mapping.h | 46 ++
> >  kernel/dma/Kconfig  | 13 
> >  7 files changed, 144 insertions(+), 6 deletions(-)
> >
> > diff --git a/drivers/of/address.c b/drivers/of/address.c
> > index 96d8cfb14a60..a01afffcde7d 100644
> > --- a/drivers/of/address.c
> > +++ b/drivers/of/address.c
> > @@ -918,6 +918,47 @@ void __iomem *of_io_request_and_map(struct device_node 
> > *np, int index,
> >  }
> >  EXPORT_SYMBOL(of_io_request_and_map);
> >
> > +#ifdef CONFIG_DMA_PFN_OFFSET_MAP
> > +static int attach_dma_pfn_offset_map(struct device *dev,
> > +  struct device_node *node, int num_ranges)
> > +{
> > + struct of_range_parser parser;
> > + struct of_range range;
> > + size_t r_size = (num_ranges + 1)
> > + * sizeof(struct dma_pfn_offset_region);
> > + struct dma_pfn_offset_region *r;
> > +
>
> > + r = devm_kzalloc(dev, r_size, GFP_KERNEL);
>
> devm_?!

Yes, otherwise if the device gets unbound/bound repeatedly then there
would be a memory leak.

>
>
> Looking at r_size it should be rather kcalloc().

Yep.
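
Presumably the combination would then be devm_kcalloc(), keeping the
unbind-safe lifetime (a sketch, not the posted code):

        r = devm_kcalloc(dev, num_ranges + 1, sizeof(*r), GFP_KERNEL);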
>
>
> > + if (!r)
> > + return -ENOMEM;
> > + dev->dma_pfn_offset_map = r;
> > + of_dma_range_parser_init(&parser, node);
> > +
> > + /*
> > +  * Record all info for DMA ranges array.  We could
> > +  * just use the of_range struct, but if we did that it
> > +  * would require more calculations for phys_to_dma and
> > +  * dma_to_phys conversions.
> > +  */
> > + for_each_of_range(&parser, &range) {
> > + r->cpu_beg = range.cpu_addr;
> > + r->cpu_end = r->cpu_beg + range.size;
> > + r->dma_beg = range.bus_addr;
> > + r->dma_end = r->dma_beg + range.size;
> > + r->pfn_offset = PFN_DOWN(range.cpu_addr)
> > + - PFN_DOWN(range.bus_addr);
> > + r++;
> > + }
> > + return 0;
> > +}
> > +#else
> > +static int attach_dma_pfn_offset_map(struct device *dev,
> > +  struct device_node *node, int num_ranges)
> > +{
> > + return 0;
> > +}
> > +#endif
> > +
> >  /**
> >   * of_dma_get_range - Get DMA range info
> >   * @dev: device pointer; only needed for a corner case.
> > @@ -947,6 +988,8 @@ int of_dma_get_range(struct device *dev, struct 
> > device_node *np, u64 *dma_addr,
> >   struct of_range_parser parser;
> >   struct of_range range;
> >   u64 dma_start = U64_MAX, dma_end = 0, dma_offset = 0;
> > + bool dma_multi_pfn_offset = false;
> > + int num_ranges = 0;
> >
> >   while (node) {
> >   ranges = of_get_property(node, "dma-ranges", &len);
> > @@ -977,10 +1020,19 @@ int of_dma_get_range(struct device *dev, struct 
> > device_node *np, u64 *dma_addr,
> >   pr_debug("dma_addr(%llx) cpu_addr(%llx) size(%llx)\n",
> >range.bus_addr, range.cpu_addr, range.size);
> >
> > + num_ranges++;
> >   if (dma_offset && range.cpu_addr - range.bus_addr != 
> > dma_offset) {
> > - pr_warn("Can't handle multiple dma-ranges with 
> > different offsets on node(%pOF)\n", node);
> > - /* Don't error out as we'd break some existing DTs */
> > - continue;
> > + if (!IS_ENABLED(CONFIG_DMA_PFN_OFFSET_MAP)) {
> > + pr_warn("Can't handle multiple dma-ranges 
> > with different offsets on node(%pOF)\n", node);
> > + pr_warn("Perhaps set 
> > DMA_PFN_OFFSET_MAP=y?\n");
> > + /*
> > +  * Don't error out as we'd break some existing
> > +  * DTs that are using configs w/o
> > +  * CONFIG_DMA_PFN_OFFSET_MAP set.
> > +  */
> > + continue;
> > + }
> > + dma_multi_pfn_offset = true;
> >   }
> >   dma_offset = range.cpu_addr - range.bus_addr;
> >
> > @@ -991,6 +1043,13 @@ int of_dma_get_range(struct device *dev, struct 
> > device_node *np, u64 *dma_addr,
> >   dma_end = range.bus_addr + range.size;
> >  

Re: [PATCH v2 09/14] device core: Add ability to handle multiple dma offsets

2020-05-26 Thread Andy Shevchenko
On Tue, May 26, 2020 at 03:12:48PM -0400, Jim Quinlan wrote:
> The new field in struct device 'dma_pfn_offset_map' is used to facilitate
> the use of multiple pfn offsets between cpu addrs and dma addrs.  It is
> similar to 'dma_pfn_offset' except that the offset chosen depends on the
> cpu or dma address involved.
> 
> Signed-off-by: Jim Quinlan 
> ---
>  drivers/of/address.c| 65 +++--
>  drivers/usb/core/message.c  |  3 ++
>  drivers/usb/core/usb.c  |  3 ++
>  include/linux/device.h  | 10 +-
>  include/linux/dma-direct.h  | 10 --
>  include/linux/dma-mapping.h | 46 ++
>  kernel/dma/Kconfig  | 13 
>  7 files changed, 144 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/of/address.c b/drivers/of/address.c
> index 96d8cfb14a60..a01afffcde7d 100644
> --- a/drivers/of/address.c
> +++ b/drivers/of/address.c
> @@ -918,6 +918,47 @@ void __iomem *of_io_request_and_map(struct device_node 
> *np, int index,
>  }
>  EXPORT_SYMBOL(of_io_request_and_map);
>  
> +#ifdef CONFIG_DMA_PFN_OFFSET_MAP
> +static int attach_dma_pfn_offset_map(struct device *dev,
> +  struct device_node *node, int num_ranges)
> +{
> + struct of_range_parser parser;
> + struct of_range range;
> + size_t r_size = (num_ranges + 1)
> + * sizeof(struct dma_pfn_offset_region);
> + struct dma_pfn_offset_region *r;
> +

> + r = devm_kzalloc(dev, r_size, GFP_KERNEL);

devm_?!

Looking at r_size it should be rather kcalloc().

> + if (!r)
> + return -ENOMEM;
> + dev->dma_pfn_offset_map = r;
> + of_dma_range_parser_init(&parser, node);
> +
> + /*
> +  * Record all info for DMA ranges array.  We could
> +  * just use the of_range struct, but if we did that it
> +  * would require more calculations for phys_to_dma and
> +  * dma_to_phys conversions.
> +  */
> + for_each_of_range(&parser, &range) {
> + r->cpu_beg = range.cpu_addr;
> + r->cpu_end = r->cpu_beg + range.size;
> + r->dma_beg = range.bus_addr;
> + r->dma_end = r->dma_beg + range.size;
> + r->pfn_offset = PFN_DOWN(range.cpu_addr)
> + - PFN_DOWN(range.bus_addr);
> + r++;
> + }
> + return 0;
> +}
> +#else
> +static int attach_dma_pfn_offset_map(struct device *dev,
> +  struct device_node *node, int num_ranges)
> +{
> + return 0;
> +}
> +#endif
> +
>  /**
>   * of_dma_get_range - Get DMA range info
>   * @dev: device pointer; only needed for a corner case.
> @@ -947,6 +988,8 @@ int of_dma_get_range(struct device *dev, struct 
> device_node *np, u64 *dma_addr,
>   struct of_range_parser parser;
>   struct of_range range;
>   u64 dma_start = U64_MAX, dma_end = 0, dma_offset = 0;
> + bool dma_multi_pfn_offset = false;
> + int num_ranges = 0;
>  
>   while (node) {
>   ranges = of_get_property(node, "dma-ranges", &len);
> @@ -977,10 +1020,19 @@ int of_dma_get_range(struct device *dev, struct 
> device_node *np, u64 *dma_addr,
>   pr_debug("dma_addr(%llx) cpu_addr(%llx) size(%llx)\n",
>range.bus_addr, range.cpu_addr, range.size);
>  
> + num_ranges++;
>   if (dma_offset && range.cpu_addr - range.bus_addr != 
> dma_offset) {
> - pr_warn("Can't handle multiple dma-ranges with 
> different offsets on node(%pOF)\n", node);
> - /* Don't error out as we'd break some existing DTs */
> - continue;
> + if (!IS_ENABLED(CONFIG_DMA_PFN_OFFSET_MAP)) {
> + pr_warn("Can't handle multiple dma-ranges with 
> different offsets on node(%pOF)\n", node);
> + pr_warn("Perhaps set DMA_PFN_OFFSET_MAP=y?\n");
> + /*
> +  * Don't error out as we'd break some existing
> +  * DTs that are using configs w/o
> +  * CONFIG_DMA_PFN_OFFSET_MAP set.
> +  */
> + continue;
> + }
> + dma_multi_pfn_offset = true;
>   }
>   dma_offset = range.cpu_addr - range.bus_addr;
>  
> @@ -991,6 +1043,13 @@ int of_dma_get_range(struct device *dev, struct 
> device_node *np, u64 *dma_addr,
>   dma_end = range.bus_addr + range.size;
>   }
>  
> + if (dma_multi_pfn_offset) {
> + dma_offset = 0;
> + ret = attach_dma_pfn_offset_map(dev, node, num_ranges);
> + if (ret)
> + return ret;
> + }
> +
>   if (dma_start >= dma_end) {
>   ret = -EINVAL;
>   pr_debug("Invalid DMA ranges configuration on node(%pOF)\n",
> diff --git a/drivers/usb/core/

Re: [RFC 0/2] iommu: arm-smmu: Add support for early direct mappings

2020-05-26 Thread John Stultz
On Thu, May 14, 2020 at 12:34 PM  wrote:
>
> On Thu 27 Feb 18:57 PST 2020, Bjorn Andersson wrote:
>
> Rob, Will, we're reaching the point where upstream has enough
> functionality that this is becoming a critical issue for us.
>
> E.g. the Lenovo Yoga C630 is lacking only this and a single dts patch
> to boot mainline with display, GPU, WiFi and audio working, and the
> story is similar on several devboards.
>
> As previously described, the only thing I want is the stream mapping
> related to the display controller in place, either with the CB with
> translation disabled or possibly with a way to specify the framebuffer
> region (although this turns out to mess things up in the display
> driver...)
>
> I did pick this up again recently and concluded that omitting the
> streams for the USB controllers causes an instability issue seen on
> one of the controllers to disappear. So I would prefer if we somehow
> could have a mechanism to only pick the display streams and the
> context allocation for this.
>
>
> Can you please share some pointers/insights/wishes for how we can
> conclude on this subject?

Ping? I just wanted to follow up on this discussion as this small
series is crucial for booting mainline on the Dragonboard 845c
devboard. It would be really valuable to be able to get some solution
upstream so we can test mainline w/o adding additional patches.

The rest of the db845c series has been moving forward smoothly, but
this set seems to be very stuck with no visible progress since Dec.

Are there any pointers for what folks would prefer to see?

thanks
-john


[PATCH v2 09/14] device core: Add ability to handle multiple dma offsets

2020-05-26 Thread Jim Quinlan via iommu
The new field in struct device 'dma_pfn_offset_map' is used to facilitate
the use of multiple pfn offsets between cpu addrs and dma addrs.  It is
similar to 'dma_pfn_offset' except that the offset chosen depends on the
cpu or dma address involved.

Signed-off-by: Jim Quinlan 
---
 drivers/of/address.c| 65 +++--
 drivers/usb/core/message.c  |  3 ++
 drivers/usb/core/usb.c  |  3 ++
 include/linux/device.h  | 10 +-
 include/linux/dma-direct.h  | 10 --
 include/linux/dma-mapping.h | 46 ++
 kernel/dma/Kconfig  | 13 
 7 files changed, 144 insertions(+), 6 deletions(-)

diff --git a/drivers/of/address.c b/drivers/of/address.c
index 96d8cfb14a60..a01afffcde7d 100644
--- a/drivers/of/address.c
+++ b/drivers/of/address.c
@@ -918,6 +918,47 @@ void __iomem *of_io_request_and_map(struct device_node *np, int index,
 }
 EXPORT_SYMBOL(of_io_request_and_map);
 
+#ifdef CONFIG_DMA_PFN_OFFSET_MAP
+static int attach_dma_pfn_offset_map(struct device *dev,
+struct device_node *node, int num_ranges)
+{
+   struct of_range_parser parser;
+   struct of_range range;
+   size_t r_size = (num_ranges + 1)
+   * sizeof(struct dma_pfn_offset_region);
+   struct dma_pfn_offset_region *r;
+
+   r = devm_kzalloc(dev, r_size, GFP_KERNEL);
+   if (!r)
+   return -ENOMEM;
+   dev->dma_pfn_offset_map = r;
+   of_dma_range_parser_init(&parser, node);
+
+   /*
+* Record all info for DMA ranges array.  We could
+* just use the of_range struct, but if we did that it
+* would require more calculations for phys_to_dma and
+* dma_to_phys conversions.
+*/
+   for_each_of_range(&parser, &range) {
+   r->cpu_beg = range.cpu_addr;
+   r->cpu_end = r->cpu_beg + range.size;
+   r->dma_beg = range.bus_addr;
+   r->dma_end = r->dma_beg + range.size;
+   r->pfn_offset = PFN_DOWN(range.cpu_addr)
+   - PFN_DOWN(range.bus_addr);
+   r++;
+   }
+   return 0;
+}
+#else
+static int attach_dma_pfn_offset_map(struct device *dev,
+struct device_node *node, int num_ranges)
+{
+   return 0;
+}
+#endif
+
 /**
  * of_dma_get_range - Get DMA range info
  * @dev:   device pointer; only needed for a corner case.
@@ -947,6 +988,8 @@ int of_dma_get_range(struct device *dev, struct device_node *np, u64 *dma_addr,
struct of_range_parser parser;
struct of_range range;
u64 dma_start = U64_MAX, dma_end = 0, dma_offset = 0;
+   bool dma_multi_pfn_offset = false;
+   int num_ranges = 0;
 
while (node) {
ranges = of_get_property(node, "dma-ranges", &len);
@@ -977,10 +1020,19 @@ int of_dma_get_range(struct device *dev, struct device_node *np, u64 *dma_addr,
pr_debug("dma_addr(%llx) cpu_addr(%llx) size(%llx)\n",
 range.bus_addr, range.cpu_addr, range.size);
 
+   num_ranges++;
if (dma_offset && range.cpu_addr - range.bus_addr != dma_offset) {
-   pr_warn("Can't handle multiple dma-ranges with different offsets on node(%pOF)\n", node);
-   /* Don't error out as we'd break some existing DTs */
-   continue;
+   if (!IS_ENABLED(CONFIG_DMA_PFN_OFFSET_MAP)) {
+   pr_warn("Can't handle multiple dma-ranges with different offsets on node(%pOF)\n", node);
+   pr_warn("Perhaps set DMA_PFN_OFFSET_MAP=y?\n");
+   /*
+* Don't error out as we'd break some existing
+* DTs that are using configs w/o
+* CONFIG_DMA_PFN_OFFSET_MAP set.
+*/
+   continue;
+   }
+   dma_multi_pfn_offset = true;
}
dma_offset = range.cpu_addr - range.bus_addr;
 
@@ -991,6 +1043,13 @@ int of_dma_get_range(struct device *dev, struct device_node *np, u64 *dma_addr,
dma_end = range.bus_addr + range.size;
}
 
+   if (dma_multi_pfn_offset) {
+   dma_offset = 0;
+   ret = attach_dma_pfn_offset_map(dev, node, num_ranges);
+   if (ret)
+   return ret;
+   }
+
if (dma_start >= dma_end) {
ret = -EINVAL;
pr_debug("Invalid DMA ranges configuration on node(%pOF)\n",
diff --git a/drivers/usb/core/message.c b/drivers/usb/core/message.c
index 6197938dcc2d..aaa3e58f5eb4 100644
--- a/drivers/usb/core/message.c
+++ b/drivers/usb/core/message.c
@@ -1960,6 +1960,9 @@ int usb_set_configuration(struct usb_device *dev, int 
configur

[PATCH v2 00/14] PCI: brcmstb: enable PCIe for STB chips

2020-05-26 Thread Jim Quinlan via iommu
v2:
Commit: "device core: Add ability to handle multiple dma offsets"
  o Added helper func attach_dma_pfn_offset_map() in address.c (Christoph)
  o Helpers funcs added to __phys_to_dma() & __dma_to_phys() (Christoph)
  o Added warning when multiple offsets are needed and !DMA_PFN_OFFSET_MAP
  o dev->dma_pfn_map => dev->dma_pfn_offset_map
  o s/frm/from/ for dma_pfn_offset_frm_{phys,dma}_addr() (Christoph)
  o In device.h: s/const void */const struct dma_pfn_offset_region */
  o removed 'unlikely' from unlikely(dev->dma_pfn_offset_map) since
guarded by CONFIG_DMA_PFN_OFFSET_MAP (Christoph)
  o Since dev->dma_pfn_offset is copied in usb/core/{usb,message}.c, now
dev->dma_pfn_offset_map is copied as well.
  o Merged two of the DMA commits into one (Christoph).

Commit "arm: dma-mapping: Invoke dma offset func if needed":
  o Use helper functions instead of #if CONFIG_DMA_PFN_OFFSET

Other commits' changes:
  o Removed need for carrying of_id var in priv (Nicolas)
  o Commit message rewordings (Bjorn)
  o Commit log messages filled to 75 chars (Bjorn)
  o devm_reset_control_get_shared()
    => devm_reset_control_get_optional_shared() (Philipp)
  o Add call to reset_control_assert() in PCIe remove routines (Philipp)

v1:
This patchset expands the usefulness of the Broadcom Set-top Box PCIe
controller by building upon the PCIe driver currently used by the
Raspberry Pi.  Other forms of this patchset were submitted by me years
ago and not accepted; the major sticking point was the code required
for the DMA remapping needed for the PCIe driver to work [1].

There have been many changes to the DMA and OF subsystems since that
time, making a cleaner and less intrusive patchset possible.  This
patchset implements a generalization of "dev->dma_pfn_offset", except
that instead of a single scalar offset it provides for multiple
offsets via a function which depends upon the "dma-ranges" property of
the PCIe host controller.  This is required for proper functionality
of the BrcmSTB PCIe controller and possibly some other devices.

[1] 
https://lore.kernel.org/linux-arm-kernel/1516058925-46522-5-git-send-email-jim2101...@gmail.com/
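
To make the idea concrete, the lookup helper the series calls
dma_pfn_offset_from_phys_addr() could look roughly like this. This is a
sketch based on the struct fields shown in patch 09/14, not the literal
submitted code; the extra zero-filled element from the num_ranges + 1
allocation serves as the terminator:

static unsigned long dma_pfn_offset_from_phys_addr(struct device *dev,
                                                   phys_addr_t paddr)
{
        const struct dma_pfn_offset_region *m = dev->dma_pfn_offset_map;

        if (!m)
                return dev->dma_pfn_offset;  /* single scalar offset */

        /* walk the regions; the zeroed sentinel entry ends the array */
        for (; m->cpu_end != 0; m++)
                if (paddr >= m->cpu_beg && paddr < m->cpu_end)
                        return m->pfn_offset;

        return 0;
}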

Jim Quinlan (14):
  PCI: brcmstb: PCIE_BRCMSTB depends on ARCH_BRCMSTB
  ata: ahci_brcm: Fix use of BCM7216 reset controller
  dt-bindings: PCI: Add bindings for more Brcmstb chips
  PCI: brcmstb: Add bcm7278 register info
  PCI: brcmstb: Add suspend and resume pm_ops
  PCI: brcmstb: Add bcm7278 PERST support
  PCI: brcmstb: Add control of rescal reset
  of: Include a dev param in of_dma_get_range()
  device core: Add ability to handle multiple dma offsets
  arm: dma-mapping: Invoke dma offset func if needed
  PCI: brcmstb: Set internal memory viewport sizes
  PCI: brcmstb: Accommodate MSI for older chips
  PCI: brcmstb: Set bus max burst size by chip type
  PCI: brcmstb: Add bcm7211, bcm7216, bcm7445, bcm7278 to match list

 .../bindings/pci/brcm,stb-pcie.yaml   |  40 +-
 arch/arm/include/asm/dma-mapping.h|  13 +-
 drivers/ata/ahci_brcm.c   |  14 +-
 drivers/of/address.c  |  69 ++-
 drivers/of/device.c   |   2 +-
 drivers/of/of_private.h   |   8 +-
 drivers/pci/controller/Kconfig|   3 +-
 drivers/pci/controller/pcie-brcmstb.c | 408 +++---
 drivers/usb/core/message.c|   3 +
 drivers/usb/core/usb.c|   3 +
 include/linux/device.h|  10 +-
 include/linux/dma-direct.h|  10 +-
 include/linux/dma-mapping.h   |  46 ++
 kernel/dma/Kconfig|  13 +
 14 files changed, 559 insertions(+), 83 deletions(-)

-- 
2.17.1



[PATCH 5.6 014/126] iommu/amd: Fix over-read of ACPI UID from IVRS table

2020-05-26 Thread Greg Kroah-Hartman
From: Alexander Monakov 

[ Upstream commit e461b8c991b9202b007ea2059d953e264240b0c9 ]

IVRS parsing code always tries to read 255 bytes from memory when
retrieving ACPI device path, and makes an assumption that firmware
provides a zero-terminated string. Both of those are bugs: the entry
is likely to be shorter than 255 bytes, and zero-termination is not
guaranteed.

With Acer SF314-42 firmware these issues manifest visibly in dmesg:

AMD-Vi: ivrs, add hid:AMDI0020, uid:\_SB.FUR0\xf0\xa5, rdevid:160
AMD-Vi: ivrs, add hid:AMDI0020, uid:\_SB.FUR1\xf0\xa5, rdevid:160
AMD-Vi: ivrs, add hid:AMDI0020, uid:\_SB.FUR2\xf0\xa5, rdevid:160
AMD-Vi: ivrs, add hid:AMDI0020, uid:\_SB.FUR3>\x83e\x8d\x9a\xd1...

The first three lines show how the code over-reads adjacent table
entries into the UID, and in the last line it even reads garbage data
beyond the end of the IVRS table itself.

Since each entry has the length of the UID (uidl member of ivhd_entry
struct), use that for memcpy, and manually add a zero terminator.

Avoid zero-filling hid and uid arrays up front, and instead ensure
the uid array is always zero-terminated. No change needed for the hid
array, as it was already properly zero-terminated.

Fixes: 2a0cb4e2d423c ("iommu/amd: Add new map for storing IVHD dev entry type HID")

Signed-off-by: Alexander Monakov 
Cc: Joerg Roedel 
Cc: iommu@lists.linux-foundation.org
Link: https://lore.kernel.org/r/20200511102352.1831-1-amona...@ispras.ru
Signed-off-by: Joerg Roedel 
Signed-off-by: Sasha Levin 
---
 drivers/iommu/amd_iommu_init.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
index 2b9a67ecc6ac..5b81fd16f5fa 100644
--- a/drivers/iommu/amd_iommu_init.c
+++ b/drivers/iommu/amd_iommu_init.c
@@ -1329,8 +1329,8 @@ static int __init init_iommu_from_acpi(struct amd_iommu *iommu,
}
case IVHD_DEV_ACPI_HID: {
u16 devid;
-   u8 hid[ACPIHID_HID_LEN] = {0};
-   u8 uid[ACPIHID_UID_LEN] = {0};
+   u8 hid[ACPIHID_HID_LEN];
+   u8 uid[ACPIHID_UID_LEN];
int ret;
 
if (h->type != 0x40) {
@@ -1347,6 +1347,7 @@ static int __init init_iommu_from_acpi(struct amd_iommu *iommu,
break;
}
 
+   uid[0] = '\0';
switch (e->uidf) {
case UID_NOT_PRESENT:
 
@@ -1361,8 +1362,8 @@ static int __init init_iommu_from_acpi(struct amd_iommu *iommu,
break;
case UID_IS_CHARACTER:
 
-   memcpy(uid, (u8 *)(&e->uid), ACPIHID_UID_LEN - 1);
-   uid[ACPIHID_UID_LEN - 1] = '\0';
+   memcpy(uid, &e->uid, e->uidl);
+   uid[e->uidl] = '\0';
 
break;
default:
-- 
2.25.1





[PATCH 5.4 012/111] iommu/amd: Fix over-read of ACPI UID from IVRS table

2020-05-26 Thread Greg Kroah-Hartman
From: Alexander Monakov 

[ Upstream commit e461b8c991b9202b007ea2059d953e264240b0c9 ]

IVRS parsing code always tries to read 255 bytes from memory when
retrieving ACPI device path, and makes an assumption that firmware
provides a zero-terminated string. Both of those are bugs: the entry
is likely to be shorter than 255 bytes, and zero-termination is not
guaranteed.

With Acer SF314-42 firmware these issues manifest visibly in dmesg:

AMD-Vi: ivrs, add hid:AMDI0020, uid:\_SB.FUR0\xf0\xa5, rdevid:160
AMD-Vi: ivrs, add hid:AMDI0020, uid:\_SB.FUR1\xf0\xa5, rdevid:160
AMD-Vi: ivrs, add hid:AMDI0020, uid:\_SB.FUR2\xf0\xa5, rdevid:160
AMD-Vi: ivrs, add hid:AMDI0020, uid:\_SB.FUR3>\x83e\x8d\x9a\xd1...

The first three lines show how the code over-reads adjacent table
entries into the UID, and in the last line it even reads garbage data
beyond the end of the IVRS table itself.

Since each entry has the length of the UID (uidl member of ivhd_entry
struct), use that for memcpy, and manually add a zero terminator.

Avoid zero-filling hid and uid arrays up front, and instead ensure
the uid array is always zero-terminated. No change needed for the hid
array, as it was already properly zero-terminated.

Fixes: 2a0cb4e2d423c ("iommu/amd: Add new map for storing IVHD dev entry type HID")

Signed-off-by: Alexander Monakov 
Cc: Joerg Roedel 
Cc: iommu@lists.linux-foundation.org
Link: https://lore.kernel.org/r/20200511102352.1831-1-amona...@ispras.ru
Signed-off-by: Joerg Roedel 
Signed-off-by: Sasha Levin 
---
 drivers/iommu/amd_iommu_init.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
index ef14b00fa94b..135ae5222cf3 100644
--- a/drivers/iommu/amd_iommu_init.c
+++ b/drivers/iommu/amd_iommu_init.c
@@ -1331,8 +1331,8 @@ static int __init init_iommu_from_acpi(struct amd_iommu *iommu,
}
case IVHD_DEV_ACPI_HID: {
u16 devid;
-   u8 hid[ACPIHID_HID_LEN] = {0};
-   u8 uid[ACPIHID_UID_LEN] = {0};
+   u8 hid[ACPIHID_HID_LEN];
+   u8 uid[ACPIHID_UID_LEN];
int ret;
 
if (h->type != 0x40) {
@@ -1349,6 +1349,7 @@ static int __init init_iommu_from_acpi(struct amd_iommu *iommu,
break;
}
 
+   uid[0] = '\0';
switch (e->uidf) {
case UID_NOT_PRESENT:
 
@@ -1363,8 +1364,8 @@ static int __init init_iommu_from_acpi(struct amd_iommu *iommu,
break;
case UID_IS_CHARACTER:
 
-   memcpy(uid, (u8 *)(&e->uid), ACPIHID_UID_LEN - 1);
-   uid[ACPIHID_UID_LEN - 1] = '\0';
+   memcpy(uid, &e->uid, e->uidl);
+   uid[e->uidl] = '\0';
 
break;
default:
-- 
2.25.1





[PATCH 4.19 12/81] iommu/amd: Fix over-read of ACPI UID from IVRS table

2020-05-26 Thread Greg Kroah-Hartman
From: Alexander Monakov 

[ Upstream commit e461b8c991b9202b007ea2059d953e264240b0c9 ]

IVRS parsing code always tries to read 255 bytes from memory when
retrieving ACPI device path, and makes an assumption that firmware
provides a zero-terminated string. Both of those are bugs: the entry
is likely to be shorter than 255 bytes, and zero-termination is not
guaranteed.

With Acer SF314-42 firmware these issues manifest visibly in dmesg:

AMD-Vi: ivrs, add hid:AMDI0020, uid:\_SB.FUR0\xf0\xa5, rdevid:160
AMD-Vi: ivrs, add hid:AMDI0020, uid:\_SB.FUR1\xf0\xa5, rdevid:160
AMD-Vi: ivrs, add hid:AMDI0020, uid:\_SB.FUR2\xf0\xa5, rdevid:160
AMD-Vi: ivrs, add hid:AMDI0020, uid:\_SB.FUR3>\x83e\x8d\x9a\xd1...

The first three lines show how the code over-reads adjacent table
entries into the UID, and in the last line it even reads garbage data
beyond the end of the IVRS table itself.

Since each entry has the length of the UID (uidl member of ivhd_entry
struct), use that for memcpy, and manually add a zero terminator.

Avoid zero-filling hid and uid arrays up front, and instead ensure
the uid array is always zero-terminated. No change needed for the hid
array, as it was already properly zero-terminated.

Fixes: 2a0cb4e2d423c ("iommu/amd: Add new map for storing IVHD dev entry type HID")

Signed-off-by: Alexander Monakov 
Cc: Joerg Roedel 
Cc: iommu@lists.linux-foundation.org
Link: https://lore.kernel.org/r/20200511102352.1831-1-amona...@ispras.ru
Signed-off-by: Joerg Roedel 
Signed-off-by: Sasha Levin 
---
 drivers/iommu/amd_iommu_init.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
index 2557ed112bc2..c7d0bb3b4a30 100644
--- a/drivers/iommu/amd_iommu_init.c
+++ b/drivers/iommu/amd_iommu_init.c
@@ -1334,8 +1334,8 @@ static int __init init_iommu_from_acpi(struct amd_iommu *iommu,
}
case IVHD_DEV_ACPI_HID: {
u16 devid;
-   u8 hid[ACPIHID_HID_LEN] = {0};
-   u8 uid[ACPIHID_UID_LEN] = {0};
+   u8 hid[ACPIHID_HID_LEN];
+   u8 uid[ACPIHID_UID_LEN];
int ret;
 
if (h->type != 0x40) {
@@ -1352,6 +1352,7 @@ static int __init init_iommu_from_acpi(struct amd_iommu *iommu,
break;
}
 
+   uid[0] = '\0';
switch (e->uidf) {
case UID_NOT_PRESENT:
 
@@ -1366,8 +1367,8 @@ static int __init init_iommu_from_acpi(struct amd_iommu *iommu,
break;
case UID_IS_CHARACTER:
 
-   memcpy(uid, (u8 *)(&e->uid), ACPIHID_UID_LEN - 1);
-   uid[ACPIHID_UID_LEN - 1] = '\0';
+   memcpy(uid, &e->uid, e->uidl);
+   uid[e->uidl] = '\0';
 
break;
default:
-- 
2.25.1





[PATCH 4.14 10/59] iommu/amd: Fix over-read of ACPI UID from IVRS table

2020-05-26 Thread Greg Kroah-Hartman
From: Alexander Monakov 

[ Upstream commit e461b8c991b9202b007ea2059d953e264240b0c9 ]

IVRS parsing code always tries to read 255 bytes from memory when
retrieving ACPI device path, and makes an assumption that firmware
provides a zero-terminated string. Both of those are bugs: the entry
is likely to be shorter than 255 bytes, and zero-termination is not
guaranteed.

With Acer SF314-42 firmware these issues manifest visibly in dmesg:

AMD-Vi: ivrs, add hid:AMDI0020, uid:\_SB.FUR0\xf0\xa5, rdevid:160
AMD-Vi: ivrs, add hid:AMDI0020, uid:\_SB.FUR1\xf0\xa5, rdevid:160
AMD-Vi: ivrs, add hid:AMDI0020, uid:\_SB.FUR2\xf0\xa5, rdevid:160
AMD-Vi: ivrs, add hid:AMDI0020, uid:\_SB.FUR3>\x83e\x8d\x9a\xd1...

The first three lines show how the code over-reads adjacent table
entries into the UID, and in the last line it even reads garbage data
beyond the end of the IVRS table itself.

Since each entry has the length of the UID (uidl member of ivhd_entry
struct), use that for memcpy, and manually add a zero terminator.

Avoid zero-filling hid and uid arrays up front, and instead ensure
the uid array is always zero-terminated. No change needed for the hid
array, as it was already properly zero-terminated.

Fixes: 2a0cb4e2d423c ("iommu/amd: Add new map for storing IVHD dev entry type HID")

Signed-off-by: Alexander Monakov 
Cc: Joerg Roedel 
Cc: iommu@lists.linux-foundation.org
Link: https://lore.kernel.org/r/20200511102352.1831-1-amona...@ispras.ru
Signed-off-by: Joerg Roedel 
Signed-off-by: Sasha Levin 
---
 drivers/iommu/amd_iommu_init.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
index 6c228144b3da..ec9a20e06941 100644
--- a/drivers/iommu/amd_iommu_init.c
+++ b/drivers/iommu/amd_iommu_init.c
@@ -1317,8 +1317,8 @@ static int __init init_iommu_from_acpi(struct amd_iommu *iommu,
}
case IVHD_DEV_ACPI_HID: {
u16 devid;
-   u8 hid[ACPIHID_HID_LEN] = {0};
-   u8 uid[ACPIHID_UID_LEN] = {0};
+   u8 hid[ACPIHID_HID_LEN];
+   u8 uid[ACPIHID_UID_LEN];
int ret;
 
if (h->type != 0x40) {
@@ -1335,6 +1335,7 @@ static int __init init_iommu_from_acpi(struct amd_iommu *iommu,
break;
}
 
+   uid[0] = '\0';
switch (e->uidf) {
case UID_NOT_PRESENT:
 
@@ -1349,8 +1350,8 @@ static int __init init_iommu_from_acpi(struct amd_iommu *iommu,
break;
case UID_IS_CHARACTER:
 
-   memcpy(uid, (u8 *)(&e->uid), ACPIHID_UID_LEN - 1);
-   uid[ACPIHID_UID_LEN - 1] = '\0';
+   memcpy(uid, &e->uid, e->uidl);
+   uid[e->uidl] = '\0';
 
break;
default:
-- 
2.25.1





[PATCH 4.9 09/64] iommu/amd: Fix over-read of ACPI UID from IVRS table

2020-05-26 Thread Greg Kroah-Hartman
From: Alexander Monakov 

[ Upstream commit e461b8c991b9202b007ea2059d953e264240b0c9 ]

IVRS parsing code always tries to read 255 bytes from memory when
retrieving ACPI device path, and makes an assumption that firmware
provides a zero-terminated string. Both of those are bugs: the entry
is likely to be shorter than 255 bytes, and zero-termination is not
guaranteed.

With Acer SF314-42 firmware these issues manifest visibly in dmesg:

AMD-Vi: ivrs, add hid:AMDI0020, uid:\_SB.FUR0\xf0\xa5, rdevid:160
AMD-Vi: ivrs, add hid:AMDI0020, uid:\_SB.FUR1\xf0\xa5, rdevid:160
AMD-Vi: ivrs, add hid:AMDI0020, uid:\_SB.FUR2\xf0\xa5, rdevid:160
AMD-Vi: ivrs, add hid:AMDI0020, uid:\_SB.FUR3>\x83e\x8d\x9a\xd1...

The first three lines show how the code over-reads adjacent table
entries into the UID, and in the last line it even reads garbage data
beyond the end of the IVRS table itself.

Since each entry has the length of the UID (uidl member of ivhd_entry
struct), use that for memcpy, and manually add a zero terminator.

Avoid zero-filling hid and uid arrays up front, and instead ensure
the uid array is always zero-terminated. No change needed for the hid
array, as it was already properly zero-terminated.

Fixes: 2a0cb4e2d423c ("iommu/amd: Add new map for storing IVHD dev entry type HID")

Signed-off-by: Alexander Monakov 
Cc: Joerg Roedel 
Cc: iommu@lists.linux-foundation.org
Link: https://lore.kernel.org/r/20200511102352.1831-1-amona...@ispras.ru
Signed-off-by: Joerg Roedel 
Signed-off-by: Sasha Levin 
---
 drivers/iommu/amd_iommu_init.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
index e6ae8d123984..a3279f303b49 100644
--- a/drivers/iommu/amd_iommu_init.c
+++ b/drivers/iommu/amd_iommu_init.c
@@ -1171,8 +1171,8 @@ static int __init init_iommu_from_acpi(struct amd_iommu *iommu,
}
case IVHD_DEV_ACPI_HID: {
u16 devid;
-   u8 hid[ACPIHID_HID_LEN] = {0};
-   u8 uid[ACPIHID_UID_LEN] = {0};
+   u8 hid[ACPIHID_HID_LEN];
+   u8 uid[ACPIHID_UID_LEN];
int ret;
 
if (h->type != 0x40) {
@@ -1189,6 +1189,7 @@ static int __init init_iommu_from_acpi(struct amd_iommu *iommu,
break;
}
 
+   uid[0] = '\0';
switch (e->uidf) {
case UID_NOT_PRESENT:
 
@@ -1203,8 +1204,8 @@ static int __init init_iommu_from_acpi(struct amd_iommu *iommu,
break;
case UID_IS_CHARACTER:
 
-   memcpy(uid, (u8 *)(&e->uid), ACPIHID_UID_LEN - 1);
-   uid[ACPIHID_UID_LEN - 1] = '\0';
+   memcpy(uid, &e->uid, e->uidl);
+   uid[e->uidl] = '\0';
 
break;
default:
-- 
2.25.1





Re: [PATCH] iommu: Relax ACS requirement for RCiEP devices.

2020-05-26 Thread Raj, Ashok
On Tue, May 26, 2020 at 12:26:54PM -0600, Alex Williamson wrote:
> > > 
> > > I don't think the language in the spec is anything sufficient to handle
> > > RCiEP uniquely.  We've previously rejected kernel command line opt-outs
> > > for ACS, and the extent to which those patches still float around the
> > > user community and are blindly used to separate IOMMU groups are a
> > > testament to the failure of this approach.  Users do not have a basis
> > > for enabling this sort of opt-out.  The benefit is obvious in the IOMMU
> > > grouping, but the risk is entirely unknown.  A kconfig option is even
> > > worse as that means if you consume a downstream kernel, the downstream
> > > maintainers might have decided universally that isolation is less
> > > important than functionality.  
> > 
> > We discussed this internally, and Intel vt-d spec does spell out clearly 
> > in Section 3.16 Root-Complex Peer to Peer Considerations. The spec clearly
> > calls out that all p2p must be done on translated addresses and therefore
> > must go through the IOMMU.
> > 
> > I suppose they should also have some similar platform-guaranteed behavior
> > for RCiEPs or MFDs: they *must* behave as follows. The language is strict and
> > when IOMMU is enabled in the platform, everything is sent up north to the
> > IOMMU agent.
> > 
> > 3.16 Root-Complex Peer to Peer Considerations
> > When DMA remapping is enabled, peer-to-peer requests through the
> > Root-Complex must be handled
> > as follows:
> > • The input address in the request is translated (through first-level,
> >   second-level or nested translation) to a host physical address (HPA).
> >   The address decoding for peer addresses must be done only on the 
> >   translated HPA. Hardware implementations are free to further limit 
> >   peer-to-peer accesses to specific host physical address regions 
> >   (or to completely disallow peer-forwarding of translated requests).
> > • Since address translation changes the contents (address field) of the PCI
> >   Express Transaction Layer Packet (TLP), for PCI Express peer-to-peer 
> >   requests with ECRC, the Root-Complex hardware must use the new ECRC 
> >   (re-computed with the translated address) if it decides to forward 
> >   the TLP as a peer request.
> > • Root-ports, and multi-function root-complex integrated endpoints, may
> >   support additional peerto-peer control features by supporting PCI Express
> >   Access Control Services (ACS) capability. Refer to ACS capability in 
> >   PCI Express specifications for details.
> 
> That sounds like it might be a reasonable basis for quirking all RCiEPs
> on VT-d platforms if Intel is willing to stand behind it.  Thanks,
> 

Sounds good, that's what I hear from our platform teams. If there is a
violation it would be a bug in silicon.
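
If that holds, the quirk could be as small as keying off the PCIe device
type (a sketch of the idea only, in the style of the existing ACS quirks
in drivers/pci/quirks.c; a real patch would also need to restrict this to
Intel platforms):

/*
 * Sketch: RCiEPs on VT-d platforms have their requests translated by the
 * root complex (VT-d spec 3.16), so report them as isolated.
 */
static int pci_quirk_rciep_acs(struct pci_dev *dev, u16 acs_flags)
{
	if (pci_pcie_type(dev) != PCI_EXP_TYPE_RC_END)
		return -ENOTTY;	/* not an RCiEP: fall back to normal checks */

	return 1;	/* treat the requested ACS controls as enabled */
}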

Re: [PATCH] iommu: Relax ACS requirement for RCiEP devices.

2020-05-26 Thread Alex Williamson
On Tue, 26 May 2020 11:06:48 -0700
"Raj, Ashok"  wrote:

> Hi Alex,
> 
> I was able to find better language in the IOMMU spec that guarantees
> the behavior we need. See below.
> 
> 
> On Tue, May 05, 2020 at 09:34:14AM -0600, Alex Williamson wrote:
> > On Tue, 5 May 2020 07:56:06 -0700
> > "Raj, Ashok"  wrote:
> >   
> > > On Tue, May 05, 2020 at 08:05:14AM -0600, Alex Williamson wrote:  
> > > > On Mon, 4 May 2020 23:11:07 -0700
> > > > "Raj, Ashok"  wrote:
> > > > 
> > > > > Hi Alex
> > > > > 
> > > > > + Joerg, accidentally missed in the Cc.
> > > > > 
> > > > > On Mon, May 04, 2020 at 11:19:36PM -0600, Alex Williamson wrote:
> > > > > > On Mon,  4 May 2020 21:42:16 -0700
> > > > > > Ashok Raj  wrote:
> > > > > >   
> > > > > > > PCIe Spec recommends we can relax ACS requirement for RCIEP 
> > > > > > > devices.
> > > > > > > 
> > > > > > > PCIe 5.0 Specification.
> > > > > > > 6.12 Access Control Services (ACS)
> > > > > > > Implementation of ACS in RCiEPs is permitted but not required. It 
> > > > > > > is
> > > > > > > explicitly permitted that, within a single Root Complex, some 
> > > > > > > RCiEPs
> > > > > > > implement ACS and some do not. It is strongly recommended that 
> > > > > > > Root Complex
> > > > > > > implementations ensure that all accesses originating from RCiEPs
> > > > > > > (PFs and VFs) without ACS capability are first subjected to 
> > > > > > > processing by
> > > > > > > the Translation Agent (TA) in the Root Complex before further 
> > > > > > > decoding and
> > > > > > > processing. The details of such Root Complex handling are outside 
> > > > > > > the scope
> > > > > > > of this specification.
> > > > > > > 
> > > > > > 
> > > > > > Is the language here really strong enough to make this change?  ACS 
> > > > > > is
> > > > > > an optional feature, so being permitted but not required is rather
> > > > > > meaningless.  The spec is also specifically avoiding the words 
> > > > > > "must"
> > > > > > or "shall" and even when emphasized with "strongly", we still only 
> > > > > > have
> > > > > > a recommendation that may or may not be honored.  This seems like a
> > > > > > weak basis for assuming that RCiEPs universally honor this
> > > > > > recommendation.  Thanks,
> > > > > >   
> > > > > 
> > > > > We are speaking about PCIe spec, where people write it about 5 years 
> > > > > ahead
> > > > > and every vendor tries to massage their product behavior with vague
> > > > > words like this..  :)
> > > > > 
> > > > > But honestly for any RCiEP, or even integrated endpoints, there
> > > > > is no way to send them except up north. These aren't behind a RP.
> > > > 
> > > > But they are multi-function devices and the spec doesn't define routing
> > > > within multifunction packages.  A single function RCiEP will already be
> > > > assumed isolated within its own group.
> > > 
> > > That's right. The other two devices only have legacy PCI headers. So 
> > > they can't claim to be RCiEP's but just integrated endpoints. The legacy
> > > devices don't even have a PCIe header.
> > > 
> > > I honestly don't know why these are grouped as MFDs in the first place.
> > >   
> > > >  
> > > > > I did check with couple folks who are part of the SIG, and seem to 
> > > > > agree
> > > > > that ACS treatment for RCiEP's doesn't mean much. 
> > > > > 
> > > > > I understand the language isn't strong, but it doesn't seem like ACS 
> > > > > should
> > > > > be a strong requirement for RCiEP's and reasonable to relax.
> > > > > 
> > > > > What are your thoughts? 
> > > > 
> > > > I think hardware vendors have ACS at their disposal to clarify when
> > > > isolation is provided, otherwise vendors can submit quirks, but I don't
> > > > see that the "strongly recommended" phrasing is sufficient to assume
> > > > isolation between multifunction RCiEPs.  Thanks,
> > > 
> > > Your point is that for integrated MFD endpoints without ACS, there is no
> > > guarantee to SW that they are isolated.
> > > 
> > > As far as a quirk, do you think:
> > >   - a cmdline opt-out for integrated endpoints and RCiEPs; would that suffice?
> > > along with a compile time default that is strict enforcement
> > >   - typical vid/did type exception list?
> > > 
> > > A more generic way to ask for an exception would be scalable until we
> > > can stop those types of integrated devices. Or we need to maintain
> > > these device lists for eternity.
> > 
> > I don't think the language in the spec is anything sufficient to handle
> > RCiEP uniquely.  We've previously rejected kernel command line opt-outs
> > for ACS, and the extent to which those patches still float around the
> > user community and are blindly used to separate IOMMU groups are a
> > testament to the failure of this approach.  Users do not have a basis
> > for enabling this sort of opt-out.  The benefit is obvious in the IOMMU
> > grouping, but the risk is entirely unknown.  A kconfig option is even
> > worse as that means if you consume a downstream kernel, the downstream
> > maintainers might have decided universally that isolation is less
> > important than functionality.

[PATCH v3] dma: Fix max PFN arithmetic overflow on 32 bit systems

2020-05-26 Thread Alexander Dahl
The intermediate result of the old term (4UL * 1024 * 1024 * 1024) is
4 294 967 296 or 0x100000000 which is no problem on 64 bit systems.  The
patch does not change the later overall result of 0x100000 for
MAX_DMA32_PFN.  The new calculation yields the same result, but does not
require 64 bit arithmetic.

On 32 bit systems the old calculation suffers from an arithmetic
overflow in that intermediate term in braces: 4UL aka unsigned long int
is 4 bytes wide and an arithmetic overflow happens (the 0x100000000 does
not fit in 4 bytes), the result in braces is truncated to zero, the
following right shift does not alter that, so MAX_DMA32_PFN evaluates to
0 on 32 bit systems.
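
The wrap-around is easy to reproduce in isolation (a minimal userspace
sketch, assuming the usual PAGE_SHIFT of 12; not part of the patch):

#include <stdio.h>

#define PAGE_SHIFT 12

/* Old form: the product wraps to 0 when unsigned long is 32 bits wide. */
#define MAX_DMA32_PFN_OLD ((4UL * 1024 * 1024 * 1024) >> PAGE_SHIFT)
/* New form: shift first, so no intermediate value needs more than 32 bits. */
#define MAX_DMA32_PFN_NEW (1UL << (32 - PAGE_SHIFT))

int main(void)
{
	/* ILP32: old prints 0, new prints 0x100000. LP64: both print 0x100000. */
	printf("old: %#lx\nnew: %#lx\n", MAX_DMA32_PFN_OLD, MAX_DMA32_PFN_NEW);
	return 0;
}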

That wrong value is a problem in a comparison against MAX_DMA32_PFN in
the init code for swiotlb in 'pci_swiotlb_detect_4gb()' to decide if
swiotlb should be active.  That comparison yields the opposite result
when compiling on 32 bit systems.

This was not possible before 1b7e03ef7570 ("x86, NUMA: Enable emulation
on 32bit too") when that MAX_DMA32_PFN was first made visible to x86_32
(and which landed in v3.0).

In practice this wasn't a problem, unless you activated CONFIG_SWIOTLB
on x86 (32 bit).

However for ARCH=x86 (32 bit) and if you have set CONFIG_IOMMU_INTEL,
since c5a5dc4cbbf4 ("iommu/vt-d: Don't switch off swiotlb if bounce page
is used") there's a dependency on CONFIG_SWIOTLB, which was not
necessarily active before.  That landed in v5.4, where we noticed it in
the fli4l Linux distribution.  We have CONFIG_IOMMU_INTEL active on both
32 and 64 bit kernel configs there (I could not find out why, so let's
just say historical reasons).

The effect is at boot time 64 MiB (default size) were allocated for
bounce buffers now, which is a noticeable amount of memory on small
systems like pcengines ALIX 2D3 with 256 MiB memory, which are still
frequently used as home routers.

We noticed this effect when migrating from kernel v4.19 (LTS) to v5.4
(LTS) in fli4l and got that kernel messages for example:

  Linux version 5.4.22 (buildroot@buildroot) (gcc version 7.3.0 (Buildroot 2018.02.8)) #1 SMP Mon Nov 26 23:40:00 CET 2018
  …
  Memory: 183484K/261756K available (4594K kernel code, 393K rwdata, 1660K rodata, 536K init, 456K bss, 78272K reserved, 0K cma-reserved, 0K highmem)
  …
  PCI-DMA: Using software bounce buffering for IO (SWIOTLB)
  software IO TLB: mapped [mem 0x0bb78000-0x0fb78000] (64MB)

The initial analysis and the suggested fix was done by user 'sourcejedi'
at stackoverflow and explicitly marked as GPLv2 for inclusion in the
Linux kernel:

  https://unix.stackexchange.com/a/520525/50007

The new calculation, which does not suffer from that overflow, is the
same as for arch/mips now as suggested by Robin Murphy.

The fix was tested by fli4l users on round about two dozen different
systems, including both 32 and 64 bit archs, bare metal and virtualized
machines.

Fixes: 1b7e03ef7570 ("x86, NUMA: Enable emulation on 32bit too")
Fixes: https://web.nettworks.org/bugs/browse/FFL-2560
Fixes: https://unix.stackexchange.com/q/520065/50007
Reported-by: Alan Jenkins 
Suggested-by: Robin Murphy 
Signed-off-by: Alexander Dahl 
Cc: sta...@vger.kernel.org
---

Notes:
v3:
  - rewritten commit message to better explain that arithmetic overflow
and added Fixes tag (Greg Kroah-Hartman)
  - rebased on v5.7-rc7

v2:
  - use the same calculation as with arch/mips (Robin Murphy)

 arch/x86/include/asm/dma.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/dma.h b/arch/x86/include/asm/dma.h
index 00f7cf45e699..8e95aa4b0d17 100644
--- a/arch/x86/include/asm/dma.h
+++ b/arch/x86/include/asm/dma.h
@@ -74,7 +74,7 @@
 #define MAX_DMA_PFN   ((16UL * 1024 * 1024) >> PAGE_SHIFT)
 
 /* 4GB broken PCI/AGP hardware bus master zone */
-#define MAX_DMA32_PFN ((4UL * 1024 * 1024 * 1024) >> PAGE_SHIFT)
+#define MAX_DMA32_PFN (1UL << (32 - PAGE_SHIFT))
 
 #ifdef CONFIG_X86_32
/* The maximum address that we can perform a DMA transfer to on this platform */
-- 
2.20.1


Re: [PATCH] iommu: Relax ACS requirement for RCiEP devices.

2020-05-26 Thread Raj, Ashok
Hi Alex,

I was able to find better language in the IOMMU spec that guarantees
the behavior we need. See below.


On Tue, May 05, 2020 at 09:34:14AM -0600, Alex Williamson wrote:
> On Tue, 5 May 2020 07:56:06 -0700
> "Raj, Ashok"  wrote:
> 
> > On Tue, May 05, 2020 at 08:05:14AM -0600, Alex Williamson wrote:
> > > On Mon, 4 May 2020 23:11:07 -0700
> > > "Raj, Ashok"  wrote:
> > >   
> > > > Hi Alex
> > > > 
> > > > + Joerg, accidentally missed in the Cc.
> > > > 
> > > > On Mon, May 04, 2020 at 11:19:36PM -0600, Alex Williamson wrote:  
> > > > > On Mon,  4 May 2020 21:42:16 -0700
> > > > > Ashok Raj  wrote:
> > > > > 
> > > > > > PCIe Spec recommends we can relax ACS requirement for RCIEP devices.
> > > > > > 
> > > > > > PCIe 5.0 Specification.
> > > > > > 6.12 Access Control Services (ACS)
> > > > > > Implementation of ACS in RCiEPs is permitted but not required. It is
> > > > > > explicitly permitted that, within a single Root Complex, some RCiEPs
> > > > > > implement ACS and some do not. It is strongly recommended that Root 
> > > > > > Complex
> > > > > > implementations ensure that all accesses originating from RCiEPs
> > > > > > (PFs and VFs) without ACS capability are first subjected to 
> > > > > > processing by
> > > > > > the Translation Agent (TA) in the Root Complex before further 
> > > > > > decoding and
> > > > > > processing. The details of such Root Complex handling are outside 
> > > > > > the scope
> > > > > > of this specification.
> > > > > >   
> > > > > 
> > > > > Is the language here really strong enough to make this change?  ACS is
> > > > > an optional feature, so being permitted but not required is rather
> > > > > meaningless.  The spec is also specifically avoiding the words "must"
> > > > > or "shall" and even when emphasized with "strongly", we still only 
> > > > > have
> > > > > a recommendation that may or may not be honored.  This seems like a
> > > > > weak basis for assuming that RCiEPs universally honor this
> > > > > recommendation.  Thanks,
> > > > > 
> > > > 
> > > > We are speaking about PCIe spec, where people write it about 5 years 
> > > > ahead
> > > > and every vendor tries to massage their product behavior with vague
> > > > words like this..  :)
> > > > 
> > > > But honestly for any RCiEP, or even integrated endpoints, there
> > > > is no way to send them except up north. These aren't behind a RP.  
> > > 
> > > But they are multi-function devices and the spec doesn't define routing
> > > within multifunction packages.  A single function RCiEP will already be
> > > assumed isolated within its own group.  
> > 
> > That's right. The other two devices only have legacy PCI headers. So 
> > they can't claim to be RCiEP's but just integrated endpoints. The legacy
> > devices don't even have a PCIe header.
> > 
> > I honestly don't know why these are grouped as MFDs in the first place.
> > 
> > >
> > > > I did check with couple folks who are part of the SIG, and seem to agree
> > > > that ACS treatment for RCiEP's doesn't mean much. 
> > > > 
> > > > I understand the language isn't strong, but it doesn't seem like ACS 
> > > > should
> > > > be a strong requirement for RCiEP's and reasonable to relax.
> > > > 
> > > > What are your thoughts?   
> > > 
> > > I think hardware vendors have ACS at their disposal to clarify when
> > > isolation is provided, otherwise vendors can submit quirks, but I don't
> > > see that the "strongly recommended" phrasing is sufficient to assume
> > > isolation between multifunction RCiEPs.  Thanks,  
> > 
> > Your point is that for integrated MFD endpoints without ACS, there is no
> > guarantee to SW that they are isolated.
> > 
> > As far as a quirk, do you think:
> > - a cmdline opt-out for integrated endpoints and RCiEPs; would that suffice?
> >   along with a compile time default that is strict enforcement
> > - typical vid/did type exception list?
> > 
> > A more generic way to ask for an exception would be scalable until we can
> > stop those types of integrated devices. Or we need to maintain these device
> > lists for eternity.
> 
> I don't think the language in the spec is anything sufficient to handle
> RCiEP uniquely.  We've previously rejected kernel command line opt-outs
> for ACS, and the extent to which those patches still float around the
> user community and are blindly used to separate IOMMU groups are a
> testament to the failure of this approach.  Users do not have a basis
> for enabling this sort of opt-out.  The benefit is obvious in the IOMMU
> grouping, but the risk is entirely unknown.  A kconfig option is even
> worse as that means if you consume a downstream kernel, the downstream
> maintainers might have decided universally that isolation is less
> important than functionality.

We discussed this internally, and Intel vt-d spec does spell out clearly 
in Section 3.16 Root-Complex Peer to Peer Considerations. The spec clearly
calls out that all p2p must be done on translated addresses and therefore
must go through the IOMMU.

Re: [PATCH 1/2] PCI: Introduce PCI_FIXUP_IOMMU

2020-05-26 Thread Zhangfei Gao

Hi, Christoph

On 2020/5/26 10:46 PM, Christoph Hellwig wrote:
> On Tue, May 26, 2020 at 07:49:08PM +0800, Zhangfei Gao wrote:
>> Some platform devices appear as PCI but are actually on the AMBA bus,
>> and they need a fixup in drivers/pci/quirks.c handling iommu_fwnode.
>> Here introduce PCI_FIXUP_IOMMU, which is called after the iommu_fwnode
>> is allocated, instead of reusing PCI_FIXUP_FINAL, since that would slow
>> down iommu probing as all devices in the fixup final list would be
>> reprocessed.
>
> Who is going to use this?  I don't see a single user in the series.

We will add an iommu fixup in drivers/pci/quirks.c, handling
fwspec->can_stall, which is introduced in
https://www.spinics.net/lists/linux-pci/msg94559.html

Unfortunately, the patch did not make v5.8, so we have to wait.
And we want to check whether this is the right method to solve this issue.

Thanks


Re: [PATCH 1/2] PCI: Introduce PCI_FIXUP_IOMMU

2020-05-26 Thread Christoph Hellwig
On Tue, May 26, 2020 at 07:49:08PM +0800, Zhangfei Gao wrote:
> Some platform devices appear as PCI but are actually on the AMBA bus,
> and they need a fixup in drivers/pci/quirks.c handling iommu_fwnode.
> Here introduce PCI_FIXUP_IOMMU, which is called after the iommu_fwnode
> is allocated, instead of reusing PCI_FIXUP_FINAL, since that would slow
> down iommu probing as all devices in the fixup final list would be
> reprocessed.

Who is going to use this?  I don't see a single user in the series.


Re: [PATCH 0/2] Let pci_fixup_final access iommu_fwnode

2020-05-26 Thread Zhangfei Gao



On 2020/5/25 9:43 PM, Joerg Roedel wrote:
> On Tue, May 12, 2020 at 12:08:29PM +0800, Zhangfei Gao wrote:
>> Some platform devices appear as PCI but are
>> actually on the AMBA bus, and they need a fixup in
>> drivers/pci/quirks.c handling iommu_fwnode.
>> So call pci_fixup_final after the iommu_fwnode is allocated.
>>
>> For example:
>> Hisilicon platform devices need a fixup in
>> drivers/pci/quirks.c
>>
>> +static void quirk_huawei_pcie_sva(struct pci_dev *pdev)
>> +{
>> +   struct iommu_fwspec *fwspec;
>> +
>> +   pdev->eetlp_prefix_path = 1;
>> +   fwspec = dev_iommu_fwspec_get(&pdev->dev);
>> +   if (fwspec)
>> +   fwspec->can_stall = 1;
>> +}
>> +
>> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_HUAWEI, 0xa250, quirk_huawei_pcie_sva);
>> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_HUAWEI, 0xa251, quirk_huawei_pcie_sva);
>
> I don't think it is a great idea to hook this into PCI_FIXUP_FINAL. The
> fixup list needs to be processed for every device, which will slow down
> probing.
>
> So either we introduce something like PCI_FIXUP_IOMMU, if this is
> entirely PCI specific. If it needs to be generic, we need some fixup
> infrastructure in the IOMMU code itself.


Thanks Joerg for the good suggestion.
I am trying to introduce PCI_FIXUP_IOMMU in
https://lkml.org/lkml/2020/5/26/366

Thanks

[PATCH 2/2] iommu: calling pci_fixup_iommu in iommu_fwspec_init

2020-05-26 Thread Zhangfei Gao
Call pci_fixup_iommu in iommu_fwspec_init(), which allocates the
iommu_fwnode. Some platform devices appear as PCI but are
actually on the AMBA bus, and they need a fixup in
drivers/pci/quirks.c handling the iommu_fwnode.
So call pci_fixup_iommu after the iommu_fwnode is allocated.

Signed-off-by: Zhangfei Gao 
---
 drivers/iommu/iommu.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 7b37542..fb84c42 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -2418,6 +2418,10 @@ int iommu_fwspec_init(struct device *dev, struct fwnode_handle *iommu_fwnode,
fwspec->iommu_fwnode = iommu_fwnode;
fwspec->ops = ops;
dev_iommu_fwspec_set(dev, fwspec);
+
+   if (dev_is_pci(dev))
+   pci_fixup_device(pci_fixup_iommu, to_pci_dev(dev));
+
return 0;
 }
 EXPORT_SYMBOL_GPL(iommu_fwspec_init);
-- 
2.7.4



[PATCH 1/2] PCI: Introduce PCI_FIXUP_IOMMU

2020-05-26 Thread Zhangfei Gao
Some platform devices appear as PCI but are actually on the AMBA bus,
and they need a fixup in drivers/pci/quirks.c handling iommu_fwnode.
Here introduce PCI_FIXUP_IOMMU, which is called after the iommu_fwnode
is allocated, instead of reusing PCI_FIXUP_FINAL, since that would slow
down iommu probing as all devices in the fixup final list would be
reprocessed.

Suggested-by: Joerg Roedel 
Signed-off-by: Zhangfei Gao 
---
 drivers/pci/quirks.c  | 7 +++
 include/asm-generic/vmlinux.lds.h | 3 +++
 include/linux/pci.h   | 8 
 3 files changed, 18 insertions(+)

diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index ca9ed57..b037034 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -83,6 +83,8 @@ extern struct pci_fixup __start_pci_fixups_header[];
 extern struct pci_fixup __end_pci_fixups_header[];
 extern struct pci_fixup __start_pci_fixups_final[];
 extern struct pci_fixup __end_pci_fixups_final[];
+extern struct pci_fixup __start_pci_fixups_iommu[];
+extern struct pci_fixup __end_pci_fixups_iommu[];
 extern struct pci_fixup __start_pci_fixups_enable[];
 extern struct pci_fixup __end_pci_fixups_enable[];
 extern struct pci_fixup __start_pci_fixups_resume[];
@@ -118,6 +120,11 @@ void pci_fixup_device(enum pci_fixup_pass pass, struct pci_dev *dev)
end = __end_pci_fixups_final;
break;
 
+   case pci_fixup_iommu:
+   start = __start_pci_fixups_iommu;
+   end = __end_pci_fixups_iommu;
+   break;
+
case pci_fixup_enable:
start = __start_pci_fixups_enable;
end = __end_pci_fixups_enable;
diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
index 71e387a..3da32d8 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -411,6 +411,9 @@
__start_pci_fixups_final = .;   \
KEEP(*(.pci_fixup_final))   \
__end_pci_fixups_final = .; \
+   __start_pci_fixups_iommu = .;   \
+   KEEP(*(.pci_fixup_iommu))   \
+   __end_pci_fixups_iommu = .; \
__start_pci_fixups_enable = .;  \
KEEP(*(.pci_fixup_enable))  \
__end_pci_fixups_enable = .;\
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 83ce1cd..0d5fbf8 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1892,6 +1892,7 @@ enum pci_fixup_pass {
pci_fixup_early,/* Before probing BARs */
pci_fixup_header,   /* After reading configuration header */
pci_fixup_final,/* Final phase of device fixups */
+   pci_fixup_iommu,/* After iommu_fwspec_init */
pci_fixup_enable,   /* pci_enable_device() time */
pci_fixup_resume,   /* pci_device_resume() */
pci_fixup_suspend,  /* pci_device_suspend() */
@@ -1934,6 +1935,10 @@ enum pci_fixup_pass {
 class_shift, hook) \
DECLARE_PCI_FIXUP_SECTION(.pci_fixup_final, \
hook, vendor, device, class, class_shift, hook)
+#define DECLARE_PCI_FIXUP_CLASS_IOMMU(vendor, device, class,   \
+class_shift, hook) \
+   DECLARE_PCI_FIXUP_SECTION(.pci_fixup_iommu, \
+   hook, vendor, device, class, class_shift, hook)
 #define DECLARE_PCI_FIXUP_CLASS_ENABLE(vendor, device, class,  \
 class_shift, hook) \
DECLARE_PCI_FIXUP_SECTION(.pci_fixup_enable,\
@@ -1964,6 +1969,9 @@ enum pci_fixup_pass {
 #define DECLARE_PCI_FIXUP_FINAL(vendor, device, hook)  \
DECLARE_PCI_FIXUP_SECTION(.pci_fixup_final, \
hook, vendor, device, PCI_ANY_ID, 0, hook)
+#define DECLARE_PCI_FIXUP_IOMMU(vendor, device, hook)  \
+   DECLARE_PCI_FIXUP_SECTION(.pci_fixup_iommu, \
+   hook, vendor, device, PCI_ANY_ID, 0, hook)
 #define DECLARE_PCI_FIXUP_ENABLE(vendor, device, hook) \
DECLARE_PCI_FIXUP_SECTION(.pci_fixup_enable,\
hook, vendor, device, PCI_ANY_ID, 0, hook)
-- 
2.7.4
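
As a usage sketch, a hook registered with the new pass runs once per
matching device, right after iommu_fwspec_init() has allocated the fwspec
(vendor/device ID taken from the cover letter; the hook body here is
hypothetical):

static void quirk_example_iommu(struct pci_dev *pdev)
{
	struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(&pdev->dev);

	if (fwspec)	/* the fwspec exists by the time this pass runs */
		dev_info(&pdev->dev, "IOMMU fixup applied\n");
}
DECLARE_PCI_FIXUP_IOMMU(PCI_VENDOR_ID_HUAWEI, 0xa250, quirk_example_iommu);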



[PATCH 0/2] Introduce PCI_FIXUP_IOMMU

2020-05-26 Thread Zhangfei Gao
Some platform devices appear as PCI but are actually on the AMBA bus,
and they need a fixup in drivers/pci/quirks.c handling iommu_fwnode.
Here introduce PCI_FIXUP_IOMMU, which is called after the iommu_fwnode
is allocated, instead of reusing PCI_FIXUP_FINAL, since that would slow
down iommu probing as all devices in the fixup final list would be
reprocessed, as suggested by Joerg [1].

For example:
Hisilicon platform devices need a fixup in
drivers/pci/quirks.c handling fwspec->can_stall, which is introduced in [2]

+static void quirk_huawei_pcie_sva(struct pci_dev *pdev)
+{
+   struct iommu_fwspec *fwspec;
+
+   pdev->eetlp_prefix_path = 1;
+   fwspec = dev_iommu_fwspec_get(&pdev->dev);
+   if (fwspec)
+      fwspec->can_stall = 1;
+}
+
+DECLARE_PCI_FIXUP_IOMMU(PCI_VENDOR_ID_HUAWEI, 0xa250, quirk_huawei_pcie_sva);
+DECLARE_PCI_FIXUP_IOMMU(PCI_VENDOR_ID_HUAWEI, 0xa251, quirk_huawei_pcie_sva);

[1] https://www.spinics.net/lists/iommu/msg44591.html
[2] https://www.spinics.net/lists/linux-pci/msg94559.html

Zhangfei Gao (2):
  PCI: Introduce PCI_FIXUP_IOMMU
  iommu: calling pci_fixup_iommu in iommu_fwspec_init

 drivers/iommu/iommu.c | 4 
 drivers/pci/quirks.c  | 7 +++
 include/asm-generic/vmlinux.lds.h | 3 +++
 include/linux/pci.h   | 8 
 4 files changed, 22 insertions(+)

-- 
2.7.4



Re: [PATCH v5 00/38] DRM: fix struct sg_table nents vs. orig_nents misuse

2020-05-26 Thread Daniel Vetter
On Tue, May 26, 2020 at 9:01 AM Marek Szyprowski wrote:
>
> Hi
>
> On 13.05.2020 15:47, Christoph Hellwig wrote:
> > I've pushed out a branch with the first three patches here:
> >
> > git://git.infradead.org/users/hch/dma-mapping.git dma-sg_table-helper
> >
> > Gitweb:
> >
> > 
> > http://git.infradead.org/users/hch/dma-mapping.git/shortlog/refs/heads/dma-sg_table-helper
> >
> > and merged it into the dma-mapping for-next tree.  Unless someone shouts
> > the branch should be considered immutable in 24 hours.
>
> David & Daniel: could you merge all the DRM related changes on top of
> the provided branch? Merging those changes separately would take a lots
> of time because of the dependencies on the sgtable helpers and changes
> in the DRM core.

We generally freeze drm for big reworks past -rc6 (small drivers tend
to be a bit later). I think it's simpler if we just land this in the
merge window and then smash the drm patches on top for the 5.9 merge
window.

Or all in the same topic branch, but it feels a bit late for that and
for making sure nothing breaks.
-Daniel

>
> Best regards
> --
> Marek Szyprowski, PhD
> Samsung R&D Institute Poland
>


-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch


Re: [PATCH 1/2] iommu/hyper-v: Constify hyperv_ir_domain_ops

2020-05-26 Thread Wei Liu
On Mon, May 25, 2020 at 11:49:57PM +0200, Rikard Falkeborn wrote:
> The struct hyperv_ir_domain_ops is not modified and can be made const to
> allow the compiler to put it in read-only memory.
> 
> Before:
>textdata bss dec hex filename
>29161180112052161460 drivers/iommu/hyperv-iommu.o
> 
> After:
>textdata bss dec hex filename
>30441052112052161460 drivers/iommu/hyperv-iommu.o
> 
> Signed-off-by: Rikard Falkeborn 

Acked-by: Wei Liu 

> ---
>  drivers/iommu/hyperv-iommu.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/iommu/hyperv-iommu.c b/drivers/iommu/hyperv-iommu.c
> index a386b83e0e34..3c0c67a99c7b 100644
> --- a/drivers/iommu/hyperv-iommu.c
> +++ b/drivers/iommu/hyperv-iommu.c
> @@ -131,7 +131,7 @@ static int hyperv_irq_remapping_activate(struct irq_domain *domain,
>   return 0;
>  }
>  
> -static struct irq_domain_ops hyperv_ir_domain_ops = {
> +static const struct irq_domain_ops hyperv_ir_domain_ops = {
>   .alloc = hyperv_irq_remapping_alloc,
>   .free = hyperv_irq_remapping_free,
>   .activate = hyperv_irq_remapping_activate,
> -- 
> 2.26.2
> 


Re: [PATCH 2/2] iommu/sun50i: Constify sun50i_iommu_ops

2020-05-26 Thread Maxime Ripard
On Mon, May 25, 2020 at 11:49:58PM +0200, Rikard Falkeborn wrote:
> The struct sun50i_iommu_ops is not modified and can be made const to
> allow the compiler to put it in read-only memory.
> 
> Before:
>    text    data     bss     dec     hex filename
>   14358    2501      64   16923    421b drivers/iommu/sun50i-iommu.o
> 
> After:
>    text    data     bss     dec     hex filename
>   14726    2117      64   16907    420b drivers/iommu/sun50i-iommu.o
> 
> Signed-off-by: Rikard Falkeborn 

Acked-by: Maxime Ripard 

Thanks!
Maxime


Re: [PATCH] iommu/dma: limit iova free size to unmmaped iova

2020-05-26 Thread guptap

On 2020-05-22 14:54, Robin Murphy wrote:
> On 2020-05-22 07:25, gup...@codeaurora.org wrote:
>> On 2020-05-22 01:46, Robin Murphy wrote:
>>> On 2020-05-21 12:30, Prakash Gupta wrote:
>>>> I agree, we shouldn't be freeing the partial iova. Instead, just
>>>> making sure the unmap was successful should be sufficient before
>>>> freeing the iova. So the change can instead be something like this:
>>>>
>>>> -    iommu_dma_free_iova(cookie, dma_addr, size);
>>>> +    if (unmapped)
>>>> +        iommu_dma_free_iova(cookie, dma_addr, size);
>>>
>>> TBH my gut feeling here is that you're really just trying to treat a
>>> symptom of another bug elsewhere, namely some driver calling
>>> dma_unmap_* or dma_free_* with the wrong address or size in the first
>>> place.
>>
>> This condition would arise only if a driver calls dma_unmap/free_* with
>> a 0 iova_pfn. This will be flagged with a warning during unmap but will
>> trigger a panic later on while doing an unrelated dma_map/unmap_*. If
>> unmap has already failed for an invalid iova, there is no reason we
>> should consider this a valid iova and free it. This part should be
>> fixed.
>
> I disagree. In general, if drivers call the DMA API incorrectly it is
> liable to lead to data loss, memory corruption, and various other
> unpleasant misbehaviour - it is not the DMA layer's job to attempt to
> paper over driver bugs.
>
> There *is* an argument for downgrading the BUG_ON() in
> iova_magazine_free_pfns() to a WARN_ON(), since frankly it isn't a
> sufficiently serious condition to justify killing the whole machine
> immediately, but NAK to bodging the iommu-dma mid-layer to "fix" that.
> A serious bug already happened elsewhere, so trying to hide the
> fallout really doesn't help anyone.

Sorry for the delayed response, it was a long weekend.

I agree that an invalid DMA API call can result in unexpected issues and
the client should fix it, but the present behavior makes it difficult to
catch cases where a driver makes wrong DMA API calls. When an invalid
iova pfn is passed, it doesn't fail then and there, even though the DMA
layer is aware of the iova being invalid. It fails much later, in the
context of a valid map/unmap, with a BUG_ON().

Downgrading BUG_ON() to WARN_ON() in iova_magazine_free_pfns() will not
help much, as the invalid iova will cause a NULL pointer dereference.

I see no reason why the DMA layer wants to free an iova for which unmap
failed. IMHO queuing an invalid iova (which already failed unmap) to the
rcache, which eventually crashes the system, looks like an iommu-dma
layer issue.
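
For reference, the proposal amounts to the following in __iommu_dma_unmap()
(a sketch against the v5.7-era shape of the function, abbreviated; Robin's
suggested BUG_ON()-to-WARN_ON() downgrade would live separately in
iova_magazine_free_pfns()):

	unmapped = iommu_unmap_fast(domain, dma_addr, size, &iotlb_gather);
	WARN_ON(unmapped != size);	/* flag the broken caller loudly */

	if (!cookie->fq_domain)
		iommu_tlb_sync(domain, &iotlb_gather);

	if (unmapped)			/* the proposed guard */
		iommu_dma_free_iova(cookie, dma_addr, size);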

Thanks,
Prakash

Re: [PATCH v5 00/38] DRM: fix struct sg_table nents vs. orig_nents misuse

2020-05-26 Thread Marek Szyprowski
Hi

On 13.05.2020 15:47, Christoph Hellwig wrote:
> I've pushed out a branch with the first three patches here:
>
> git://git.infradead.org/users/hch/dma-mapping.git dma-sg_table-helper
>
> Gitweb:
>
> 
> http://git.infradead.org/users/hch/dma-mapping.git/shortlog/refs/heads/dma-sg_table-helper
>
> and merged it into the dma-mapping for-next tree.  Unless someone shouts
> the branch should be considered immutable in 24 hours.

David & Daniel: could you merge all the DRM related changes on top of 
the provided branch? Merging those changes separately would take a lots 
of time because of the dependencies on the sgtable helpers and changes 
in the DRM core.

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland
