Re: [PATCH v1 1/2] iommu/vt-d: Move intel_iommu_ops to header file

2021-10-07 Thread Lu Baolu

Hi Andy,

On 10/7/21 12:14 AM, Andy Shevchenko wrote:

On Fri, Jul 30, 2021 at 10:20:08AM +0800, Lu Baolu wrote:

Hi Andy,

On 7/30/21 12:35 AM, Andy Shevchenko wrote:

The compiler is not happy about the hidden declaration of intel_iommu_ops.

.../drivers/iommu/intel/iommu.c:414:24: warning: symbol 'intel_iommu_ops' was not declared. Should it be static?

Move the declaration to the header file to make the compiler happy.


Thanks for the cleanup. Sharing data structures between different files
doesn't seem to be a good design. How about adding a helper so that
intel_iommu_ops could remain static?


I don't see any change upstream yet. What's the plan?
Can we take my patch as a quick fix?



Thanks for the reminder.

Can you please tell me in which kernel configuration the above warning
can be triggered?

Best regards,
baolu
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v3 2/2] iommu/vt-d: avoid duplicated removing in __domain_mapping

2021-10-07 Thread Lu Baolu

On 10/8/21 10:07 AM, Lu Baolu wrote:

On 10/8/21 8:04 AM, Longpeng(Mike) wrote:

__domain_mapping() always removes the pages in the range from
'iov_pfn' to 'end_pfn', but 'end_pfn' is always the last pfn
of the whole range that the caller wants to map.

This introduces many duplicated removals and makes the
map operation take too long, for example:

   Map iova=0x10,nr_pages=0x7d61800
 iov_pfn: 0x10, end_pfn: 0x7e617ff
 iov_pfn: 0x14, end_pfn: 0x7e617ff
 iov_pfn: 0x18, end_pfn: 0x7e617ff
 iov_pfn: 0x1c, end_pfn: 0x7e617ff
 iov_pfn: 0x20, end_pfn: 0x7e617ff
 ...
   it takes about 50ms in total.

We can reduce the cost by recalculating 'end_pfn' and limiting it
to the boundary of the current pte page.

   Map iova=0x10,nr_pages=0x7d61800
 iov_pfn: 0x10, end_pfn: 0x13
 iov_pfn: 0x14, end_pfn: 0x17
 iov_pfn: 0x18, end_pfn: 0x1b
 iov_pfn: 0x1c, end_pfn: 0x1f
 iov_pfn: 0x20, end_pfn: 0x23
 ...
   it only needs 9ms now.

Signed-off-by: Longpeng(Mike) 
---
  drivers/iommu/intel/iommu.c | 11 ++-
  include/linux/intel-iommu.h |  6 ++
  2 files changed, 12 insertions(+), 5 deletions(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index d75f59a..46edae6 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -2354,12 +2354,17 @@ static void switch_to_super_page(struct dmar_domain *domain,

  return -ENOMEM;
  first_pte = pte;
+    lvl_pages = lvl_to_nr_pages(largepage_lvl);
+
  /* It is large page*/
  if (largepage_lvl > 1) {
  unsigned long end_pfn;
+    unsigned long pages_to_remove;
  pteval |= DMA_PTE_LARGE_PAGE;
-    end_pfn = ((iov_pfn + nr_pages) & level_mask(largepage_lvl)) - 1;

+    pages_to_remove = min_t(unsigned long, nr_pages,
+    nr_pte_to_next_page(pte) * lvl_pages);
+    end_pfn = iov_pfn + pages_to_remove - 1;
  switch_to_super_page(domain, iov_pfn, end_pfn, largepage_lvl);

  } else {
  pteval &= ~(uint64_t)DMA_PTE_LARGE_PAGE;
@@ -2381,10 +2386,6 @@ static void switch_to_super_page(struct dmar_domain *domain,

  WARN_ON(1);
  }
-    lvl_pages = lvl_to_nr_pages(largepage_lvl);
-
-    BUG_ON(nr_pages < lvl_pages);
-
  nr_pages -= lvl_pages;
  iov_pfn += lvl_pages;
  phys_pfn += lvl_pages;
diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index 9bcabc7..b29b2a3 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -713,6 +713,12 @@ static inline bool first_pte_in_page(struct dma_pte *pte)

  return IS_ALIGNED((unsigned long)pte, VTD_PAGE_SIZE);
  }
+static inline int nr_pte_to_next_page(struct dma_pte *pte)
+{
+    return first_pte_in_page(pte) ? BIT_ULL(VTD_STRIDE_SHIFT) :
+    (struct dma_pte *)ALIGN((unsigned long)pte, VTD_PAGE_SIZE) - pte;


We should make it like this to avoid the 0day warning:

 (struct dma_pte *)(uintptr_t)VTD_PAGE_ALIGN((unsigned long)pte) - pte;

Can you please test this line of change? No need to send a new version.
I will handle it if it passes your test.


Just realized that ALIGN() has already done the type cast. Please ignore
above comment. Sorry for the noise.

Best regards,
baolu

Re: [PATCH v3 2/2] iommu/vt-d: avoid duplicated removing in __domain_mapping

2021-10-07 Thread Lu Baolu

On 10/8/21 8:04 AM, Longpeng(Mike) wrote:

__domain_mapping() always removes the pages in the range from
'iov_pfn' to 'end_pfn', but 'end_pfn' is always the last pfn
of the whole range that the caller wants to map.

This introduces many duplicated removals and makes the
map operation take too long, for example:

   Map iova=0x10,nr_pages=0x7d61800
 iov_pfn: 0x10, end_pfn: 0x7e617ff
 iov_pfn: 0x14, end_pfn: 0x7e617ff
 iov_pfn: 0x18, end_pfn: 0x7e617ff
 iov_pfn: 0x1c, end_pfn: 0x7e617ff
 iov_pfn: 0x20, end_pfn: 0x7e617ff
 ...
   it takes about 50ms in total.

We can reduce the cost by recalculating 'end_pfn' and limiting it
to the boundary of the current pte page.

   Map iova=0x10,nr_pages=0x7d61800
 iov_pfn: 0x10, end_pfn: 0x13
 iov_pfn: 0x14, end_pfn: 0x17
 iov_pfn: 0x18, end_pfn: 0x1b
 iov_pfn: 0x1c, end_pfn: 0x1f
 iov_pfn: 0x20, end_pfn: 0x23
 ...
   it only needs 9ms now.

Signed-off-by: Longpeng(Mike) 
---
  drivers/iommu/intel/iommu.c | 11 ++-
  include/linux/intel-iommu.h |  6 ++
  2 files changed, 12 insertions(+), 5 deletions(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index d75f59a..46edae6 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -2354,12 +2354,17 @@ static void switch_to_super_page(struct dmar_domain *domain,
return -ENOMEM;
first_pte = pte;
  
+			lvl_pages = lvl_to_nr_pages(largepage_lvl);

+
/* It is large page*/
if (largepage_lvl > 1) {
unsigned long end_pfn;
+   unsigned long pages_to_remove;
  
  pteval |= DMA_PTE_LARGE_PAGE;

-			end_pfn = ((iov_pfn + nr_pages) & level_mask(largepage_lvl)) - 1;
+			pages_to_remove = min_t(unsigned long, nr_pages,
+						nr_pte_to_next_page(pte) * lvl_pages);
+			end_pfn = iov_pfn + pages_to_remove - 1;
			switch_to_super_page(domain, iov_pfn, end_pfn, largepage_lvl);
} else {
pteval &= ~(uint64_t)DMA_PTE_LARGE_PAGE;
@@ -2381,10 +2386,6 @@ static void switch_to_super_page(struct dmar_domain *domain,
WARN_ON(1);
}
  
-		lvl_pages = lvl_to_nr_pages(largepage_lvl);

-
-   BUG_ON(nr_pages < lvl_pages);
-
nr_pages -= lvl_pages;
iov_pfn += lvl_pages;
phys_pfn += lvl_pages;
diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index 9bcabc7..b29b2a3 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -713,6 +713,12 @@ static inline bool first_pte_in_page(struct dma_pte *pte)
return IS_ALIGNED((unsigned long)pte, VTD_PAGE_SIZE);
  }
  
+static inline int nr_pte_to_next_page(struct dma_pte *pte)

+{
+   return first_pte_in_page(pte) ? BIT_ULL(VTD_STRIDE_SHIFT) :
+		(struct dma_pte *)ALIGN((unsigned long)pte, VTD_PAGE_SIZE) - pte;


We should make it like this to avoid the 0day warning:

(struct dma_pte *)(uintptr_t)VTD_PAGE_ALIGN((unsigned long)pte) - pte;

Can you please test this line of change? No need to send a new version.
I will handle it if it passes your test.


+}
+
  extern struct dmar_drhd_unit * dmar_find_matched_drhd_unit(struct pci_dev *dev);
  extern int dmar_find_matched_atsr_unit(struct pci_dev *dev);
  



Best regards,
baolu


[PATCH v3 2/2] iommu/vt-d: avoid duplicated removing in __domain_mapping

2021-10-07 Thread Longpeng(Mike)
__domain_mapping() always removes the pages in the range from
'iov_pfn' to 'end_pfn', but 'end_pfn' is always the last pfn
of the whole range that the caller wants to map.

This introduces many duplicated removals and makes the
map operation take too long, for example:

  Map iova=0x10,nr_pages=0x7d61800
iov_pfn: 0x10, end_pfn: 0x7e617ff
iov_pfn: 0x14, end_pfn: 0x7e617ff
iov_pfn: 0x18, end_pfn: 0x7e617ff
iov_pfn: 0x1c, end_pfn: 0x7e617ff
iov_pfn: 0x20, end_pfn: 0x7e617ff
...
  it takes about 50ms in total.

We can reduce the cost by recalculating 'end_pfn' and limiting it
to the boundary of the current pte page.

  Map iova=0x10,nr_pages=0x7d61800
iov_pfn: 0x10, end_pfn: 0x13
iov_pfn: 0x14, end_pfn: 0x17
iov_pfn: 0x18, end_pfn: 0x1b
iov_pfn: 0x1c, end_pfn: 0x1f
iov_pfn: 0x20, end_pfn: 0x23
...
  it only needs 9ms now.

Signed-off-by: Longpeng(Mike) 
---
 drivers/iommu/intel/iommu.c | 11 ++-
 include/linux/intel-iommu.h |  6 ++
 2 files changed, 12 insertions(+), 5 deletions(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index d75f59a..46edae6 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -2354,12 +2354,17 @@ static void switch_to_super_page(struct dmar_domain *domain,
return -ENOMEM;
first_pte = pte;
 
+   lvl_pages = lvl_to_nr_pages(largepage_lvl);
+
/* It is large page*/
if (largepage_lvl > 1) {
unsigned long end_pfn;
+   unsigned long pages_to_remove;
 
pteval |= DMA_PTE_LARGE_PAGE;
-			end_pfn = ((iov_pfn + nr_pages) & level_mask(largepage_lvl)) - 1;
+			pages_to_remove = min_t(unsigned long, nr_pages,
+						nr_pte_to_next_page(pte) * lvl_pages);
+			end_pfn = iov_pfn + pages_to_remove - 1;
			switch_to_super_page(domain, iov_pfn, end_pfn, largepage_lvl);
} else {
pteval &= ~(uint64_t)DMA_PTE_LARGE_PAGE;
@@ -2381,10 +2386,6 @@ static void switch_to_super_page(struct dmar_domain *domain,
WARN_ON(1);
}
 
-   lvl_pages = lvl_to_nr_pages(largepage_lvl);
-
-   BUG_ON(nr_pages < lvl_pages);
-
nr_pages -= lvl_pages;
iov_pfn += lvl_pages;
phys_pfn += lvl_pages;
diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index 9bcabc7..b29b2a3 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -713,6 +713,12 @@ static inline bool first_pte_in_page(struct dma_pte *pte)
return IS_ALIGNED((unsigned long)pte, VTD_PAGE_SIZE);
 }
 
+static inline int nr_pte_to_next_page(struct dma_pte *pte)
+{
+   return first_pte_in_page(pte) ? BIT_ULL(VTD_STRIDE_SHIFT) :
+		(struct dma_pte *)ALIGN((unsigned long)pte, VTD_PAGE_SIZE) - pte;
+}
+
 extern struct dmar_drhd_unit * dmar_find_matched_drhd_unit(struct pci_dev *dev);
 extern int dmar_find_matched_atsr_unit(struct pci_dev *dev);
 
-- 
1.8.3.1
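For readers following the patch, the boundary arithmetic can be modelled outside the kernel. The sketch below is a standalone approximation, not the kernel code: it assumes VT-d's 4 KiB pte pages with 8-byte entries (so 512 slots per page, matching BIT_ULL(VTD_STRIDE_SHIFT) for VTD_STRIDE_SHIFT == 9), and represents the pte position as a byte offset into its page instead of a real pointer.

```c
#include <assert.h>

/* Standalone model of the VT-d constants: a 4 KiB page table holds 512
 * eight-byte ptes (512 == BIT_ULL(VTD_STRIDE_SHIFT) with shift 9). */
#define MODEL_PAGE_SIZE 4096UL
#define MODEL_STRIDE    512UL

/* Analogue of nr_pte_to_next_page(): how many pte slots remain from the
 * current pte (given here as a byte offset into its page) up to the end
 * of that pte page. */
static unsigned long model_nr_pte_to_next_page(unsigned long pte_off)
{
	if (pte_off % MODEL_PAGE_SIZE == 0)	/* first_pte_in_page() */
		return MODEL_STRIDE;
	return (MODEL_PAGE_SIZE - pte_off % MODEL_PAGE_SIZE) / 8;
}

/* Analogue of the new end_pfn computation in __domain_mapping():
 * clamp the removal span to the end of the current pte page instead of
 * always using the last pfn of the whole requested range. */
static unsigned long model_end_pfn(unsigned long iov_pfn, unsigned long nr_pages,
				   unsigned long pte_off, unsigned long lvl_pages)
{
	unsigned long span = model_nr_pte_to_next_page(pte_off) * lvl_pages;
	unsigned long pages_to_remove = nr_pages < span ? nr_pages : span;

	return iov_pfn + pages_to_remove - 1;
}
```

This is what turns the repeated full-range end_pfn values in the first trace into the per-pte-page values in the second one.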



[PATCH v3 1/2] iommu/vt-d: convert the return type of first_pte_in_page to bool

2021-10-07 Thread Longpeng(Mike)
first_pte_in_page() returns a boolean value, so let's convert its
return type to bool. In addition, use 'IS_ALIGNED' to make the
code more readable and neater.

Signed-off-by: Longpeng(Mike) 
---
 include/linux/intel-iommu.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index 05a65eb..9bcabc7 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -708,9 +708,9 @@ static inline bool dma_pte_superpage(struct dma_pte *pte)
return (pte->val & DMA_PTE_LARGE_PAGE);
 }
 
-static inline int first_pte_in_page(struct dma_pte *pte)
+static inline bool first_pte_in_page(struct dma_pte *pte)
 {
-   return !((unsigned long)pte & ~VTD_PAGE_MASK);
+   return IS_ALIGNED((unsigned long)pte, VTD_PAGE_SIZE);
 }
 
 extern struct dmar_drhd_unit * dmar_find_matched_drhd_unit(struct pci_dev *dev);
-- 
1.8.3.1
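The equivalence this patch relies on can be checked with a small standalone model. The macros below imitate the kernel's VTD_PAGE_MASK and IS_ALIGNED() definitions for a 4 KiB page; they are illustrative stand-ins, not the kernel headers:

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative stand-ins for the kernel definitions (4 KiB VT-d page). */
#define MODEL_VTD_PAGE_SIZE 4096UL
#define MODEL_VTD_PAGE_MASK (~(MODEL_VTD_PAGE_SIZE - 1))
#define MODEL_IS_ALIGNED(x, a) (((x) & ((a) - 1)) == 0)

/* Old spelling: no address bits set below the page boundary. */
static bool first_pte_in_page_old(unsigned long pte)
{
	return !(pte & ~MODEL_VTD_PAGE_MASK);
}

/* New spelling from the patch: the same predicate via IS_ALIGNED(). */
static bool first_pte_in_page_new(unsigned long pte)
{
	return MODEL_IS_ALIGNED(pte, MODEL_VTD_PAGE_SIZE);
}
```

Both forms test exactly the low log2(VTD_PAGE_SIZE) bits, so the change is purely cosmetic plus the bool return type.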



[PATCH v3 0/2] iommu/vt-d: boost the mapping process

2021-10-07 Thread Longpeng(Mike)
Hi guys,

We found that __domain_mapping() takes too long when the memory
region is large, so this patchset tries to make it faster. The
performance numbers can be found in PATCH 2; please review when
you are free, thanks.

Changes v2 -> v3:
 - make first_pte_in_page() neater  [Baolu]
 - remove meaningless BUG_ON() in __domain_mapping()  [Baolu]

Changes v1 -> v2:
 - Fix compile warning on i386  [Baolu]

Longpeng(Mike) (2):
  iommu/vt-d: convert the return type of first_pte_in_page to bool
  iommu/vt-d: avoid duplicated removing in __domain_mapping

 drivers/iommu/intel/iommu.c | 11 ++-
 include/linux/intel-iommu.h | 10 --
 2 files changed, 14 insertions(+), 7 deletions(-)

-- 
1.8.3.1



Re: [PATCH v6 6/6] iommu/tegra-smmu: Add pagetable mappings to debugfs

2021-10-07 Thread Nicolin Chen
On Thu, Oct 07, 2021 at 07:13:25PM +0200, Thierry Reding wrote:
> > @@ -496,6 +506,8 @@ static void tegra_smmu_as_unprepare(struct tegra_smmu *smmu,
> > 	mutex_unlock(&smmu->lock);
> >  }
> >  
> > +static const struct file_operations tegra_smmu_debugfs_mappings_fops;
> 
> Could the implementation be moved up here to avoid the forward
> declaration?

I thought that keeping all debugfs fops together would be preferable.
But yes, I will move it if you prefer to avoid the forward declaration.

> > +   seq_printf(s, "\nSWGROUP: %s\n", swgrp->name);
> > +	seq_printf(s, "as->id: %d\nas->attr: %c|%c|%s\nas->pd_dma: %pad\n", as->id,
> > +		   as->attr & SMMU_PD_READABLE ? 'R' : '-',
> > +		   as->attr & SMMU_PD_WRITABLE ? 'W' : '-',
> > +		   as->attr & SMMU_PD_NONSECURE ? "NS" : "S",
> > +		   &as->pd_dma);
> > +   seq_puts(s, "{\n");
> 
> Maybe this can be more compact by putting the name, ID, attributes and
> base address onto a single line? Maybe also use "'-' : 'S'" for the
> non-secure attribute to keep in line with what you've done for readable
> and writable attributes.

Okay. Will change that.

> Then again, this is going to be very verbose output anyway, so maybe it
> isn't worth it.

Do you mean the whole debugfs feature or just the attributes? In
either case, I don't think so: from our past experience the mappings
info would definitely help, while the attributes are just one line...
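As a side note on the formatting question above, the flag-to-character rendering can be sketched in isolation. The bit values below are hypothetical placeholders for the real SMMU_PD_* definitions in drivers/iommu/tegra-smmu.c; the helper just shows the compact one-character-per-flag style discussed in the review:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical bit positions, for illustration only; the real SMMU_PD_*
 * values live in drivers/iommu/tegra-smmu.c. */
#define MODEL_PD_READABLE  (1u << 31)
#define MODEL_PD_WRITABLE  (1u << 30)
#define MODEL_PD_NONSECURE (1u << 29)

/* One character per flag, '-' when the flag is clear, in the compact
 * style suggested in the review ('N' for non-secure, 'S' otherwise). */
static void format_attr(unsigned int attr, char out[6])
{
	snprintf(out, 6, "%c|%c|%c",
		 attr & MODEL_PD_READABLE  ? 'R' : '-',
		 attr & MODEL_PD_WRITABLE  ? 'W' : '-',
		 attr & MODEL_PD_NONSECURE ? 'N' : 'S');
}
```

Whether the non-secure flag reads better as 'N'/'S' or '-'/'S' is exactly the cosmetic choice being debated; the helper only shows the mechanism.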


Re: [PATCH v6 3/6] iommu/tegra-smmu: Rename struct tegra_smmu_swgroup *group to *swgrp

2021-10-07 Thread Nicolin Chen
On Thu, Oct 07, 2021 at 06:57:31PM +0200, Thierry Reding wrote:
> On Mon, Sep 13, 2021 at 06:38:55PM -0700, Nicolin Chen wrote:
> > There are both tegra_smmu_swgroup and tegra_smmu_group structs
> > using "group" for their pointer instances. This makes the
> > driver confusing to read sometimes.
> > 
> > So this patch renames "group" of struct tegra_smmu_swgroup to
> > "swgrp" as a cleanup. Also renames its "find" function.
> > 
> > Note that we already have "swgroup" being used for an unsigned
> > int type variable inside struct tegra_smmu_swgroup, so we
> > cannot use "swgroup" here and have to use something like "swgrp".
> > 
> > Signed-off-by: Nicolin Chen 
> > ---
> >  drivers/iommu/tegra-smmu.c | 34 +-
> >  1 file changed, 17 insertions(+), 17 deletions(-)
> > 
> > diff --git a/drivers/iommu/tegra-smmu.c b/drivers/iommu/tegra-smmu.c
> > index a32ed347e25d..0f3883045ffa 100644
> > --- a/drivers/iommu/tegra-smmu.c
> > +++ b/drivers/iommu/tegra-smmu.c
> > @@ -334,35 +334,35 @@ static void tegra_smmu_domain_free(struct iommu_domain *domain)
> >  }
> >  
> >  static const struct tegra_smmu_swgroup *
> > -tegra_smmu_find_swgroup(struct tegra_smmu *smmu, unsigned int swgroup)
> > +tegra_smmu_find_swgrp(struct tegra_smmu *smmu, unsigned int swgroup)
> 
> This makes things inconsistent now. The tegra_smmu_find_swgroup() name
> indicates that we're looking for some "swgroup" entity within an "smmu"
> object. The entity that we're looking for is a struct tegra_smmu_swgroup
> so I think it makes sense to use that full name in the function name.

This is more like an indirect change to keep consistency between
function name and pointer name.

> >  {
> > -   const struct tegra_smmu_swgroup *group = NULL;
> > +   const struct tegra_smmu_swgroup *swgrp = NULL;
> 
> I don't think the existing naming is confusing. The variable name
> "group" is consistently used for tegra_smmu_swgroup structures and there
> are no cases where we would confuse them with struct tegra_smmu_group
> instances.

If we don't rename it, then PATCH-4 adds to struct tegra_smmu_group
a "struct tegra_smmu_swgroup *group", which results in a confusing
group->group...

> However, I don't feel strongly about it, so I'm fine with changing the
> variable names to "swgrp" if you think that makes things less confusing.

Yea, I'd like to keep this change. I will respin it in the next
version after fixing the other comments.


Re: [PATCH v6 2/6] iommu/tegra-smmu: Rename struct tegra_smmu_group_soc *soc to *group_soc

2021-10-07 Thread Nicolin Chen
On Thu, Oct 07, 2021 at 06:50:52PM +0200, Thierry Reding wrote:

> >  static const struct tegra_smmu_group_soc *
> > -tegra_smmu_find_group(struct tegra_smmu *smmu, unsigned int swgroup)
> > +tegra_smmu_find_group_soc(struct tegra_smmu *smmu, unsigned int swgroup)
> 
> This one might be okay to disambiguate, but even here I think this isn't
> really necessary. It's already clear from the return value what's being
> returned.

The point here is to disambiguate "group", as there are quite a few
places using the same naming for different structures. You may argue
that it's clear by looking at the return value/type. But it is still
hard to tell when reading the caller's code, right?

> > @@ -921,9 +922,9 @@ static struct iommu_group *tegra_smmu_device_group(struct device *dev)
> > }
> >  
> > 	INIT_LIST_HEAD(&group->list);
> > +   group->group_soc = group_soc;
> > group->swgroup = swgroup;
> > group->smmu = smmu;
> > -   group->soc = soc;
> 
> As another example, it's pretty evident that group->soc refers to the
> group SoC data rather than the SMMU SoC data. The latter can be obtained
> from group->smmu->soc, which again is enough context to make it clear
> what this is.
> 
> So I don't think this makes things any clearer. It only makes the names
> more redundant and awkward to write.

Okay. I will drop the part of s/soc/group_soc.


Re: [RFC 0/7] Support in-kernel DMA with PASID and SVA

2021-10-07 Thread Jason Gunthorpe via iommu
On Thu, Oct 07, 2021 at 12:11:27PM -0700, Jacob Pan wrote:
> Hi Barry,
> 
> On Thu, 7 Oct 2021 18:43:33 +1300, Barry Song <21cn...@gmail.com> wrote:
> 
> > > > Security-wise, KVA respects kernel mapping. So permissions are better
> > > > enforced than pass-through and identity mapping.  
> > >
> > > Is this meaningful? Isn't the entire physical map still in the KVA and
> > > isn't it entirely RW ?  
> > 
> > Some areas are RX, for example, ARM64 supports KERNEL_TEXT_RDONLY.
> > But the difference is really minor.
> That brings up a good point: if we were to use the DMA API to give out
> KVA as dma_addr for trusted devices, we cannot satisfy DMA direction
> requirements since we can't change the kernel mapping. It will be
> similar to DMA direct, where dir is ignored AFAICT.

Right.

Using the DMA API to DMA to read only kernel memory is a bug in the
first place.

> Or are we saying that if the device is trusted, using pass-through,
> i.e. physical addresses, is allowed?

I don't see trusted being relevant here beyond the usual decision to
use the trusted map or not.

Jason


Re: [RFC 0/7] Support in-kernel DMA with PASID and SVA

2021-10-07 Thread Jacob Pan
Hi Barry,

On Thu, 7 Oct 2021 18:43:33 +1300, Barry Song <21cn...@gmail.com> wrote:

> > > Security-wise, KVA respects kernel mapping. So permissions are better
> > > enforced than pass-through and identity mapping.  
> >
> > Is this meaningful? Isn't the entire physical map still in the KVA and
> > isn't it entirely RW ?  
> 
> Some areas are RX, for example, ARM64 supports KERNEL_TEXT_RDONLY.
> But the difference is really minor.
That brings up a good point: if we were to use the DMA API to give out
KVA as dma_addr for trusted devices, we cannot satisfy DMA direction
requirements since we can't change the kernel mapping. It will be
similar to DMA direct, where dir is ignored AFAICT.

Or are we saying that if the device is trusted, using pass-through,
i.e. physical addresses, is allowed?

Thoughts?

Thanks,

Jacob


Re: [RFC] iommu: Use put_pages_list

2021-10-07 Thread Matthew Wilcox
ping?

On Thu, Sep 30, 2021 at 05:20:42PM +0100, Matthew Wilcox (Oracle) wrote:
> page->freelist is for the use of slab.  We already have the ability
> to free a list of pages in the core mm, but it requires the use of a
> list_head and for the pages to be chained together through page->lru.
> Switch the iommu code over to using put_pages_list().
> 
> Signed-off-by: Matthew Wilcox (Oracle) 
> ---
>  drivers/iommu/amd/io_pgtable.c | 99 +++---
>  drivers/iommu/dma-iommu.c  | 11 +---
>  drivers/iommu/intel/iommu.c| 89 +++---
>  include/linux/iommu.h  |  3 +-
>  4 files changed, 77 insertions(+), 125 deletions(-)
> 
> diff --git a/drivers/iommu/amd/io_pgtable.c b/drivers/iommu/amd/io_pgtable.c
> index 182c93a43efd..8dfa6ee58b76 100644
> --- a/drivers/iommu/amd/io_pgtable.c
> +++ b/drivers/iommu/amd/io_pgtable.c
> -static u64 *first_pte_l7(u64 *pte, unsigned long *page_size,
>   *
>   ****/
>  
> -static void free_page_list(struct page *freelist)
> -{
> - while (freelist != NULL) {
> - unsigned long p = (unsigned long)page_address(freelist);
> -
> - freelist = freelist->freelist;
> - free_page(p);
> - }
> -}
> -
> -static struct page *free_pt_page(unsigned long pt, struct page *freelist)
> +static void free_pt_page(unsigned long pt, struct list_head *list)
>  {
>   struct page *p = virt_to_page((void *)pt);
>  
> - p->freelist = freelist;
> -
> - return p;
> +	list_add_tail(&p->lru, list);
>  }
>  
>  #define DEFINE_FREE_PT_FN(LVL, FN)					\
> -static struct page *free_pt_##LVL (unsigned long __pt, struct page *freelist)\
> -{									\
> -	unsigned long p;						\
> -	u64 *pt;							\
> -	int i;								\
> -									\
> -	pt = (u64 *)__pt;						\
> -									\
> -	for (i = 0; i < 512; ++i) {					\
> -		/* PTE present? */					\
> -		if (!IOMMU_PTE_PRESENT(pt[i]))				\
> -			continue;					\
> -									\
> -		/* Large PTE? */					\
> -		if (PM_PTE_LEVEL(pt[i]) == 0 ||				\
> -		    PM_PTE_LEVEL(pt[i]) == 7)				\
> -			continue;					\
> -									\
> -		p = (unsigned long)IOMMU_PTE_PAGE(pt[i]);		\
> -		freelist = FN(p, freelist);				\
> -	}								\
> -									\
> -	return free_pt_page((unsigned long)pt, freelist);		\
> +static void free_pt_##LVL (unsigned long __pt, struct list_head *list)	\
> +{\
> + unsigned long p;\
> + u64 *pt;\
> + int i;  \
> + \
> + pt = (u64 *)__pt;   \
> + \
> + for (i = 0; i < 512; ++i) { \
> + /* PTE present? */  \
> + if (!IOMMU_PTE_PRESENT(pt[i]))  \
> + continue;   \
> + \
> + /* Large PTE? */\
> + if (PM_PTE_LEVEL(pt[i]) == 0 || \
> + PM_PTE_LEVEL(pt[i]) == 7)   \
> + continue;   \
> +   
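The conversion proposed in the RFC swaps a hand-rolled singly-linked chain through page->freelist for the generic list_head threaded through page->lru. A minimal userspace model of the two styles, using toy stand-ins for the kernel's struct page and list helpers (the real ones live in include/linux/list.h and include/linux/mm_types.h), looks roughly like this:

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for the kernel's doubly-linked list head. */
struct list_head { struct list_head *next, *prev; };

static void init_list_head(struct list_head *h) { h->next = h->prev = h; }

static void list_add_tail(struct list_head *n, struct list_head *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

/* Toy stand-in for struct page, carrying both linkage styles. */
struct page {
	struct page *freelist;	/* old style: dedicated single-link pointer */
	struct list_head lru;	/* new style: generic list membership */
};

/* Old style, as in the removed free_pt_page(): prepend to a chain. */
static struct page *free_pt_page_old(struct page *p, struct page *freelist)
{
	p->freelist = freelist;
	return p;
}

/* New style from the RFC: append to a caller-provided list head. */
static void free_pt_page_new(struct page *p, struct list_head *list)
{
	list_add_tail(&p->lru, list);
}
```

The new style lets the core mm free the whole accumulated list with one generic call instead of each caller walking a bespoke chain, which is the point of the RFC.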

Re: [RFC 0/7] Support in-kernel DMA with PASID and SVA

2021-10-07 Thread Jacob Pan
Hi Jason,

On Thu, 7 Oct 2021 14:48:22 -0300, Jason Gunthorpe  wrote:

> On Thu, Oct 07, 2021 at 10:50:10AM -0700, Jacob Pan wrote:
> 
> > On platforms that are DMA snooped, this barrier is not needed. But I
> > think your point is that once we convert to DMA API, the sync/barrier
> > is covered by DMA APIs if !dev_is_dma_coherent(dev). Then all archs are
> > good.  
> 
> No.. my point is that a CPU store release is not necessarily a
> DMA-visible event on all platforms and things like dma_wmb/rmb() may
> still be necessary. This all needs to be architected before anyone
> starts writing drivers that assume a coherent DMA model without using
> a coherent DMA allocation.
> 
Why is that specific to SVA? Or are you talking about things in general?

Can we ensure coherency at the API level where the SVA device bind
happens, i.e. fail the bind if the coherency check does not pass?

Thanks,

Jacob


Re: [PATCH] iommu/tegra-smmu: Use devm_bitmap_zalloc when applicable

2021-10-07 Thread Thierry Reding
On Sun, Sep 26, 2021 at 03:07:18PM +0200, Christophe JAILLET wrote:
> 'smmu->asids' is a bitmap. So use 'devm_bitmap_zalloc()' to simplify code,
> improve the semantic of the code and avoid some open-coded arithmetic in
> allocator arguments.
> 
> Signed-off-by: Christophe JAILLET 
> ---
>  drivers/iommu/tegra-smmu.c | 5 +
>  1 file changed, 1 insertion(+), 4 deletions(-)

Acked-by: Thierry Reding 



Re: [RFC 0/7] Support in-kernel DMA with PASID and SVA

2021-10-07 Thread Jason Gunthorpe via iommu
On Thu, Oct 07, 2021 at 10:50:10AM -0700, Jacob Pan wrote:

> On platforms that are DMA snooped, this barrier is not needed. But I think
> your point is that once we convert to DMA API, the sync/barrier is covered
> by DMA APIs if !dev_is_dma_coherent(dev). Then all archs are good.

No.. my point is that a CPU store release is not necessarily a
DMA-visible event on all platforms and things like dma_wmb/rmb() may
still be necessary. This all needs to be architected before anyone
starts writing drivers that assume a coherent DMA model without using
a coherent DMA allocation.

Jason


Re: [RFC 0/7] Support in-kernel DMA with PASID and SVA

2021-10-07 Thread Jacob Pan
Hi Jason,

On Thu, 7 Oct 2021 08:59:18 -0300, Jason Gunthorpe  wrote:

> On Fri, Oct 08, 2021 at 12:54:52AM +1300, Barry Song wrote:
> > On Fri, Oct 8, 2021 at 12:32 AM Jason Gunthorpe  wrote:
> >  
> > >
> > > On Thu, Oct 07, 2021 at 06:43:33PM +1300, Barry Song wrote:
> > >  
> > > > So do we have a case where devices can directly access the kernel's
> > > > data structure such as a list/graph/tree with pointers to a kernel
> > > > virtual address? then devices don't need to translate the address
> > > > of pointers in a structure. I assume this is one of the most useful
> > > > features userspace SVA can provide.  
> > >
> > > AFIACT that is the only good case for KVA, but it is also completely
> > > against the endianess, word size and DMA portability design of the
> > > kernel.
> > >
> > > Going there requires some new set of portable APIs for gobally
> > > coherent KVA dma.  
> > 
> > yep. I agree. it would be very weird if accelerators/gpu are sharing
> > kernel' data struct, but for each "DMA" operation - reading or writing
> > the data struct, we have to call dma_map_single/sg or
> > dma_sync_single_for_cpu/device etc. It seems once devices and cpus
> > are sharing virtual address(SVA), code doesn't need to do explicit
> > map/sync each time.  
> 
That is what we have today with sva_bind_device.
> No, it still needs to do something to manage visibility from the
> current CPU to the DMA - it might not be flushing a cache, but it is
> probably an arch-specific CPU barrier instruction.
> 
Are you talking about iommu_dma_sync_single_for_cpu()? That is not SVA
specific, right?

On platforms that are DMA snooped, this barrier is not needed. But I think
your point is that once we convert to DMA API, the sync/barrier is covered
by DMA APIs if !dev_is_dma_coherent(dev). Then all archs are good.

We could also add a check for dev_is_dma_coherent(dev) before using SVA.

> Jason


Thanks,

Jacob


Re: [PATCH v6 6/6] iommu/tegra-smmu: Add pagetable mappings to debugfs

2021-10-07 Thread Thierry Reding
On Mon, Sep 13, 2021 at 06:38:58PM -0700, Nicolin Chen wrote:
> This patch dumps all active mapping entries from pagetable
> to a debugfs directory named "mappings".
> 
> Attaching an example:
> 
> SWGROUP: hc
> as->id: 0
> as->attr: R|W|N
> as->pd_dma: 0x80c03000
> {
> [index: 1023] 0xf0080c3e (count: 2)
> {
> PTE RANGE  | ATTR | PHYS   | IOVA 
>   | SIZE
> [#1022, #1023] | 0x5  | 0x00010bbf1000 | 
> 0xe000 | 0x2000
> }
> }
> Total PDE count: 1
> Total PTE count: 2
> 
> Signed-off-by: Nicolin Chen 
> ---
>  drivers/iommu/tegra-smmu.c | 145 +
>  1 file changed, 145 insertions(+)
> 
> diff --git a/drivers/iommu/tegra-smmu.c b/drivers/iommu/tegra-smmu.c
> index 68c34a4a0ecc..aac977e181f6 100644
> --- a/drivers/iommu/tegra-smmu.c
> +++ b/drivers/iommu/tegra-smmu.c
> @@ -46,6 +46,7 @@ struct tegra_smmu {
>   struct list_head list;
>  
>   struct dentry *debugfs;
> + struct dentry *debugfs_mappings;
>  
>   struct iommu_device iommu;  /* IOMMU Core code handle */
>  };
> @@ -153,6 +154,9 @@ static inline u32 smmu_readl(struct tegra_smmu *smmu, unsigned long offset)
>  
>  #define SMMU_PDE_ATTR		(SMMU_PDE_READABLE | SMMU_PDE_WRITABLE | \
> 				 SMMU_PDE_NONSECURE)
> +#define SMMU_PTE_ATTR		(SMMU_PTE_READABLE | SMMU_PTE_WRITABLE | \
> +				 SMMU_PTE_NONSECURE)
> +#define SMMU_PTE_ATTR_SHIFT	29
>  
>  static unsigned int iova_pd_index(unsigned long iova)
>  {
> @@ -164,6 +168,12 @@ static unsigned int iova_pt_index(unsigned long iova)
>   return (iova >> SMMU_PTE_SHIFT) & (SMMU_NUM_PTE - 1);
>  }
>  
> +static unsigned long pd_pt_index_iova(unsigned int pd_index, unsigned int pt_index)
> +{
> + return ((dma_addr_t)pd_index & (SMMU_NUM_PDE - 1)) << SMMU_PDE_SHIFT |
> +((dma_addr_t)pt_index & (SMMU_NUM_PTE - 1)) << SMMU_PTE_SHIFT;
> +}
> +
>  static bool smmu_dma_addr_valid(struct tegra_smmu *smmu, dma_addr_t addr)
>  {
>   addr >>= 12;
> @@ -496,6 +506,8 @@ static void tegra_smmu_as_unprepare(struct tegra_smmu *smmu,
> 	mutex_unlock(&smmu->lock);
>  }
>  
> +static const struct file_operations tegra_smmu_debugfs_mappings_fops;

Could the implementation be moved up here to avoid the forward
declaration?

> +
>  static void tegra_smmu_attach_as(struct tegra_smmu *smmu,
>struct tegra_smmu_as *as,
>unsigned int swgroup)
> @@ -517,6 +529,12 @@ static void tegra_smmu_attach_as(struct tegra_smmu *smmu,
> 		dev_warn(smmu->dev,
> 			 "overwriting group->as for swgroup: %s\n", swgrp->name);
>   group->as = as;
> +
> + if (smmu->debugfs_mappings)
> +		debugfs_create_file(group->swgrp->name, 0444,
> +				    smmu->debugfs_mappings, group,
> +				    &tegra_smmu_debugfs_mappings_fops);
> +
>   break;
>   }
>  
> @@ -541,6 +559,12 @@ static void tegra_smmu_detach_as(struct tegra_smmu *smmu,
>   if (group->swgrp != swgrp)
>   continue;
>   group->as = NULL;
> +
> + if (smmu->debugfs_mappings) {
> +		d = debugfs_lookup(group->swgrp->name, smmu->debugfs_mappings);
> +		debugfs_remove(d);
> + }
> +
>   break;
>   }
>  
> @@ -1124,6 +1148,125 @@ static int tegra_smmu_clients_show(struct seq_file *s, void *data)
>  
>  DEFINE_SHOW_ATTRIBUTE(tegra_smmu_clients);
>  
> +static int tegra_smmu_debugfs_mappings_show(struct seq_file *s, void *data)
> +{
> + struct tegra_smmu_group *group = s->private;
> + const struct tegra_smmu_swgroup *swgrp;
> + struct tegra_smmu_as *as;
> + struct tegra_smmu *smmu;
> + unsigned int pd_index;
> + unsigned int pt_index;
> + unsigned long flags;
> + u64 pte_count = 0;
> + u32 pde_count = 0;
> + u32 *pd, val;
> +
> + if (!group || !group->as || !group->swgrp)
> + return 0;
> +
> + swgrp = group->swgrp;
> + smmu = group->smmu;
> + as = group->as;
> +
> +	mutex_lock(&smmu->lock);
> +
> + val = smmu_readl(smmu, swgrp->reg) & SMMU_ASID_ENABLE;
> + if (!val)
> + goto unlock;
> +
> + pd = page_address(as->pd);
> + if (!pd)
> + goto unlock;
> +
> + seq_printf(s, "\nSWGROUP: %s\n", swgrp->name);
> + seq_printf(s, "as->id: %d\nas->attr: %c|%c|%s\nas->pd_dma: %pad\n", 
> as->id,
> +as->attr & SMMU_PD_READABLE ? 'R' : '-',
> +as->attr & SMMU_PD_WRITABLE ? 'W' : '-',
> +as->attr & SMMU_PD_NONSECURE ? "NS" : "S",
> +&as->pd_dma);
> + seq_puts(s, "{\n");

Maybe this can be more compact by putting the name, ID, 

Re: [PATCH v6 5/6] iommu/tegra-smmu: Attach as pointer to tegra_smmu_group

2021-10-07 Thread Thierry Reding
On Mon, Sep 13, 2021 at 06:38:57PM -0700, Nicolin Chen wrote:
> This eases access to the corresponding as pointer when only
> a tegra_smmu_group pointer is available, which helps the new
> mappings debugfs nodes.
> 
> Also move tegra_smmu_find_group_soc() upward so it can be
> used in the new tegra_smmu_attach_as(); it's also better to
> have all tegra_smmu_find_* functions together.
> 
> Signed-off-by: Nicolin Chen 
> ---
>  drivers/iommu/tegra-smmu.c | 94 +++---
>  1 file changed, 78 insertions(+), 16 deletions(-)

Acked-by: Thierry Reding 


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH v6 4/6] iommu/tegra-smmu: Use swgrp pointer instead of swgroup id

2021-10-07 Thread Thierry Reding
On Mon, Sep 13, 2021 at 06:38:56PM -0700, Nicolin Chen wrote:
> This patch changes struct tegra_smmu_group to use a swgrp
> pointer instead of a swgroup ID, as a preparatory change for
> the "mappings" debugfs feature.
> 
> Signed-off-by: Nicolin Chen 
> ---
>  drivers/iommu/tegra-smmu.c | 12 
>  1 file changed, 8 insertions(+), 4 deletions(-)

Seems fine:

Acked-by: Thierry Reding 



Re: [PATCH v6 3/6] iommu/tegra-smmu: Rename struct tegra_smmu_swgroup *group to *swgrp

2021-10-07 Thread Thierry Reding
On Mon, Sep 13, 2021 at 06:38:55PM -0700, Nicolin Chen wrote:
> Both the tegra_smmu_swgroup and tegra_smmu_group structs use
> "group" for their pointer instances, which sometimes makes the
> driver confusing to read.
> 
> So this patch renames the "group" pointers of struct
> tegra_smmu_swgroup to "swgrp" as a cleanup, and renames the
> corresponding "find" function.
> 
> Note that "swgroup" is already used for an unsigned int
> variable inside struct tegra_smmu_swgroup, so we cannot reuse
> "swgroup" and have to settle for something like "swgrp".
> 
> Signed-off-by: Nicolin Chen 
> ---
>  drivers/iommu/tegra-smmu.c | 34 +-
>  1 file changed, 17 insertions(+), 17 deletions(-)
> 
> diff --git a/drivers/iommu/tegra-smmu.c b/drivers/iommu/tegra-smmu.c
> index a32ed347e25d..0f3883045ffa 100644
> --- a/drivers/iommu/tegra-smmu.c
> +++ b/drivers/iommu/tegra-smmu.c
> @@ -334,35 +334,35 @@ static void tegra_smmu_domain_free(struct iommu_domain 
> *domain)
>  }
>  
>  static const struct tegra_smmu_swgroup *
> -tegra_smmu_find_swgroup(struct tegra_smmu *smmu, unsigned int swgroup)
> +tegra_smmu_find_swgrp(struct tegra_smmu *smmu, unsigned int swgroup)

This makes things inconsistent now. The tegra_smmu_find_swgroup() name
indicates that we're looking for some "swgroup" entity within an "smmu"
object. The entity that we're looking for is a struct tegra_smmu_swgroup
so I think it makes sense to use that full name in the function name.

>  {
> - const struct tegra_smmu_swgroup *group = NULL;
> + const struct tegra_smmu_swgroup *swgrp = NULL;

I don't think the existing naming is confusing. The variable name
"group" is consistently used for tegra_smmu_swgroup structures and there
are no cases where we would confuse them with struct tegra_smmu_group
instances.

However, I don't feel strongly about it, so I'm fine with changing the
variable names to "swgrp" if you think that makes things less confusing.

Thierry



Re: [PATCH v6 2/6] iommu/tegra-smmu: Rename struct tegra_smmu_group_soc *soc to *group_soc

2021-10-07 Thread Thierry Reding
On Mon, Sep 13, 2021 at 06:38:54PM -0700, Nicolin Chen wrote:
> There are both tegra_smmu_soc and tegra_smmu_group_soc using "soc"
> for their pointer instances. This patch renames the one of struct
> tegra_smmu_group_soc from "soc" to "group_soc" to distinguish it.
> 
> Signed-off-by: Nicolin Chen 
> ---
>  drivers/iommu/tegra-smmu.c | 17 +
>  1 file changed, 9 insertions(+), 8 deletions(-)

I think the context makes it clear which one this is. The "soc" field in
struct tegra_smmu_group clearly refers to the group SoC data, whereas
the "soc" field in struct tegra_smmu refers to the SMMU SoC data.

> 
> diff --git a/drivers/iommu/tegra-smmu.c b/drivers/iommu/tegra-smmu.c
> index 6ebae635d3aa..a32ed347e25d 100644
> --- a/drivers/iommu/tegra-smmu.c
> +++ b/drivers/iommu/tegra-smmu.c
> @@ -22,7 +22,7 @@
>  struct tegra_smmu_group {
>   struct list_head list;
>   struct tegra_smmu *smmu;
> - const struct tegra_smmu_group_soc *soc;
> + const struct tegra_smmu_group_soc *group_soc;
>   struct iommu_group *grp;
>   unsigned int swgroup;
>  };
> @@ -870,7 +870,7 @@ static struct iommu_device 
> *tegra_smmu_probe_device(struct device *dev)
>  static void tegra_smmu_release_device(struct device *dev) {}
>  
>  static const struct tegra_smmu_group_soc *
> -tegra_smmu_find_group(struct tegra_smmu *smmu, unsigned int swgroup)
> +tegra_smmu_find_group_soc(struct tegra_smmu *smmu, unsigned int swgroup)

This one might be okay to disambiguate, but even here I think this isn't
really necessary. It's already clear from the return value what's being
returned.

>  {
>   unsigned int i, j;
>  
> @@ -896,19 +896,20 @@ static struct iommu_group 
> *tegra_smmu_device_group(struct device *dev)
>  {
>   struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
>   struct tegra_smmu *smmu = dev_iommu_priv_get(dev);
> - const struct tegra_smmu_group_soc *soc;
> + const struct tegra_smmu_group_soc *group_soc;
>   unsigned int swgroup = fwspec->ids[0];
>   struct tegra_smmu_group *group;
>   struct iommu_group *grp;
>  
>   /* Find group_soc associating with swgroup */
> - soc = tegra_smmu_find_group(smmu, swgroup);
> + group_soc = tegra_smmu_find_group_soc(smmu, swgroup);
>  
>   mutex_lock(&smmu->lock);
>  
>   /* Find existing iommu_group associating with swgroup or group_soc */
>   list_for_each_entry(group, &smmu->groups, list)
> - if ((group->swgroup == swgroup) || (soc && group->soc == soc)) {
> + if ((group->swgroup == swgroup) ||
> + (group_soc && group->group_soc == group_soc)) {
>   grp = iommu_group_ref_get(group->grp);
>   mutex_unlock(&smmu->lock);
>   return grp;
> @@ -921,9 +922,9 @@ static struct iommu_group *tegra_smmu_device_group(struct 
> device *dev)
>   }
>  
>   INIT_LIST_HEAD(>list);
> + group->group_soc = group_soc;
>   group->swgroup = swgroup;
>   group->smmu = smmu;
> - group->soc = soc;

As another example, it's pretty evident that group->soc refers to the
group SoC data rather than the SMMU SoC data. The latter can be obtained
from group->smmu->soc, which again is enough context to make it clear
what this is.

So I don't think this makes things any clearer. It only makes the names
more redundant and awkward to write.

Thierry



Re: [PATCH v6 1/6] iommu/tegra-smmu: Rename struct iommu_group *group to *grp

2021-10-07 Thread Thierry Reding
On Mon, Sep 13, 2021 at 06:38:53PM -0700, Nicolin Chen wrote:
> A few structs use "group" for their pointer instances, which
> gets confusing sometimes. The struct iommu_group instances used
> in local functions get the alias "grp", which separates them
> from the others.
> 
> So this patch simply renames "group" to "grp" as a cleanup.
> 
> Signed-off-by: Nicolin Chen 
> ---
>  drivers/iommu/tegra-smmu.c | 16 
>  1 file changed, 8 insertions(+), 8 deletions(-)

Acked-by: Thierry Reding 



Re: DPAA2 triggers, [PATCH] dma debug: report -EEXIST errors in add_dma_entry

2021-10-07 Thread Gerald Schaefer
On Thu, 7 Oct 2021 12:59:32 +0200
Karsten Graul  wrote:

[...]
> > 
> >>> BTW, there is already a WARN in the add_dma_entry() path, related
> >>> to cacheline overlap and -EEXIST:
> >>>
> >>> add_dma_entry() -> active_cacheline_insert() -> -EEXIST ->
> >>> active_cacheline_inc_overlap()
> >>>
> >>> That will only trigger when "overlap > ACTIVE_CACHELINE_MAX_OVERLAP".
> >>> Not familiar with that code, but it seems that there are now two
> >>> warnings for more or less the same, and the new warning is much more
> >>> prone to false-positives.
> >>>
> >>> How do these 2 warnings relate, are they both really necessary?
> >>> I think the new warning was only introduced because of some old
> >>> TODO comment in add_dma_entry(), see commit 2b4bbc6231d78
> >>> ("dma-debug: report -EEXIST errors in add_dma_entry").
> > 
> > AFAICS they are different things. I believe the new warning is supposed to 
> > be for the fundamentally incorrect API usage (as above) of mapping 
> > different regions overlapping within the same cacheline. The existing one 
> > is about dma-debug losing internal consistency when tracking the *same* 
> > region being mapped multiple times, which is a legal thing to do - e.g. 
> > buffer sharing between devices - but if anyone's doing it to excess that's 
> > almost certainly a bug (i.e. they probably intended to unmap it in between 
> > but missed that out).
> 
> Thanks for the explanation Robin. 
> 
> In our case its really that a buffer is mapped twice for 2 different devices 
> which we use in SMC to provide failover capabilities. We see that -EEXIST is 
> returned when a buffer is mapped for the second device. Since there is a 
> maximum of 2 parallel mappings we never see the warning shown by 
> active_cacheline_inc_overlap() because we don't exceed 
> ACTIVE_CACHELINE_MAX_OVERLAP.
> 
> So how to deal with this kind of "legal thing", looks like there is no way to 
> suppress the newly introduced EEXIST warning for that case?

Thanks Karsten, very interesting. We assumed so far that we hit the
same case as Ioana, i.e. having multiple sg elements in one cacheline.
With debug output it now seems that we hit a completely different
case, not at all related to any cacheline or coherency issues.

So it really seems that the new warning is basically the same
as the already present one, with the difference that it already
triggers on the first occurrence. Looking at the code again, it
also seems rather obvious now...

IIUC, from what Robin described, this means that the "legal thing
to do - e.g. buffer sharing between devices" will now immediately
trigger the new warning? Not sure if I missed something (again),
because then I would expect much more reports on this, and of
course it would then obviously be false-positive.


[PATCH v2 2/2] qcom_scm: hide Kconfig symbol

2021-10-07 Thread Arnd Bergmann
From: Arnd Bergmann 

Now that SCM can be a loadable module, we have to add another
dependency to avoid link failures when ipa or adreno-gpu are
built-in:

aarch64-linux-ld: drivers/net/ipa/ipa_main.o: in function `ipa_probe':
ipa_main.c:(.text+0xfc4): undefined reference to `qcom_scm_is_available'

ld.lld: error: undefined symbol: qcom_scm_is_available
>>> referenced by adreno_gpu.c
>>>   gpu/drm/msm/adreno/adreno_gpu.o:(adreno_zap_shader_load) in 
>>> archive drivers/built-in.a

This can happen when CONFIG_ARCH_QCOM is disabled and we don't select
QCOM_MDT_LOADER, but some other module selects QCOM_SCM. Ideally we'd
use a similar dependency here to what we have for QCOM_RPROC_COMMON,
but that causes dependency loops from other things selecting QCOM_SCM.

This appears to be an endless problem, so try something different this
time:

 - CONFIG_QCOM_SCM becomes a hidden symbol that nothing 'depends on'
   but that is simply selected by all of its users

 - All the stubs in include/linux/qcom_scm.h can go away

 - arm-smccc.h needs to provide a stub for __arm_smccc_smc() to
   allow compile-testing QCOM_SCM on all architectures.

 - To avoid a circular dependency chain involving RESET_CONTROLLER
   and PINCTRL_SUNXI, drop the 'select RESET_CONTROLLER' statement.
   According to my testing this still builds fine, and the QCOM
   platform selects this symbol already.
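Distilled to its essence, the proposed Kconfig pattern looks like this (FOO_DRIVER is a placeholder for the real users such as ipa or drm/msm):

```kconfig
# A hidden symbol: no prompt, so it never appears in menuconfig on its
# own and nothing needs to 'depends on' it.
config QCOM_SCM
	tristate

# Every user simply selects it; the symbol becomes =y or =m as needed.
config FOO_DRIVER
	tristate "Foo driver"
	select QCOM_SCM
```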

Acked-by: Kalle Valo 
Acked-by: Alex Elder 
Signed-off-by: Arnd Bergmann 
---
Changes in v2:
- fix the iommu dependencies

I've queued this version as a bugfix along with patch 1/2
in my asm-generic tree.

 drivers/firmware/Kconfig   |  5 +-
 drivers/gpu/drm/msm/Kconfig|  4 +-
 drivers/iommu/Kconfig  |  3 +-
 drivers/iommu/arm/arm-smmu/Makefile|  3 +-
 drivers/iommu/arm/arm-smmu/arm-smmu-impl.c |  3 +-
 drivers/media/platform/Kconfig |  2 +-
 drivers/mmc/host/Kconfig   |  2 +-
 drivers/net/ipa/Kconfig|  1 +
 drivers/net/wireless/ath/ath10k/Kconfig|  2 +-
 drivers/pinctrl/qcom/Kconfig   |  3 +-
 include/linux/arm-smccc.h  | 10 +++
 include/linux/qcom_scm.h   | 71 --
 12 files changed, 24 insertions(+), 85 deletions(-)

diff --git a/drivers/firmware/Kconfig b/drivers/firmware/Kconfig
index 220a58cf0a44..cda7d7162cbb 100644
--- a/drivers/firmware/Kconfig
+++ b/drivers/firmware/Kconfig
@@ -203,10 +203,7 @@ config INTEL_STRATIX10_RSU
  Say Y here if you want Intel RSU support.
 
 config QCOM_SCM
-   tristate "Qcom SCM driver"
-   depends on ARM || ARM64
-   depends on HAVE_ARM_SMCCC
-   select RESET_CONTROLLER
+   tristate
 
 config QCOM_SCM_DOWNLOAD_MODE_DEFAULT
bool "Qualcomm download mode enabled by default"
diff --git a/drivers/gpu/drm/msm/Kconfig b/drivers/gpu/drm/msm/Kconfig
index e9c6af78b1d7..3ddf739a6f9b 100644
--- a/drivers/gpu/drm/msm/Kconfig
+++ b/drivers/gpu/drm/msm/Kconfig
@@ -17,7 +17,7 @@ config DRM_MSM
select DRM_SCHED
select SHMEM
select TMPFS
-   select QCOM_SCM if ARCH_QCOM
+   select QCOM_SCM
select WANT_DEV_COREDUMP
select SND_SOC_HDMI_CODEC if SND_SOC
select SYNC_FILE
@@ -55,7 +55,7 @@ config DRM_MSM_GPU_SUDO
 
 config DRM_MSM_HDMI_HDCP
bool "Enable HDMI HDCP support in MSM DRM driver"
-   depends on DRM_MSM && QCOM_SCM
+   depends on DRM_MSM
default y
help
  Choose this option to enable HDCP state machine
diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index 124c41adeca1..c5c71b7ab7e8 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -308,7 +308,6 @@ config APPLE_DART
 config ARM_SMMU
tristate "ARM Ltd. System MMU (SMMU) Support"
depends on ARM64 || ARM || (COMPILE_TEST && !GENERIC_ATOMIC64)
-   depends on QCOM_SCM || !QCOM_SCM #if QCOM_SCM=m this can't be =y
select IOMMU_API
select IOMMU_IO_PGTABLE_LPAE
select ARM_DMA_USE_IOMMU if ARM
@@ -438,7 +437,7 @@ config QCOM_IOMMU
# Note: iommu drivers cannot (yet?) be built as modules
bool "Qualcomm IOMMU Support"
depends on ARCH_QCOM || (COMPILE_TEST && !GENERIC_ATOMIC64)
-   depends on QCOM_SCM=y
+   select QCOM_SCM
select IOMMU_API
select IOMMU_IO_PGTABLE_LPAE
select ARM_DMA_USE_IOMMU
diff --git a/drivers/iommu/arm/arm-smmu/Makefile 
b/drivers/iommu/arm/arm-smmu/Makefile
index e240a7bcf310..b0cc01aa20c9 100644
--- a/drivers/iommu/arm/arm-smmu/Makefile
+++ b/drivers/iommu/arm/arm-smmu/Makefile
@@ -1,4 +1,5 @@
 # SPDX-License-Identifier: GPL-2.0
 obj-$(CONFIG_QCOM_IOMMU) += qcom_iommu.o
 obj-$(CONFIG_ARM_SMMU) += arm_smmu.o
-arm_smmu-objs += arm-smmu.o arm-smmu-impl.o arm-smmu-nvidia.o arm-smmu-qcom.o
+arm_smmu-objs += arm-smmu.o arm-smmu-impl.o arm-smmu-nvidia.o
+arm_smmu-$(CONFIG_ARM_SMMU_QCOM) += arm-smmu-qcom.o
diff --git 

[PATCH v2 1/2] firmware: include drivers/firmware/Kconfig unconditionally

2021-10-07 Thread Arnd Bergmann
From: Arnd Bergmann 

Compile-testing drivers that require access to a firmware layer
fails when that firmware symbol is unavailable. This happened
twice this week:

 - My proposed change to rework the QCOM_SCM firmware symbol
   broke on ppc64 and others.

 - The cs_dsp firmware patch added a device-specific firmware loader
   into drivers/firmware, which broke on the same set of
   architectures.

We should probably do the same thing for other subsystems as well,
but fix this one first as this is a dependency for other patches
getting merged.

Reviewed-by: Bjorn Andersson 
Reviewed-by: Charles Keepax 
Acked-by: Will Deacon 
Acked-by: Bjorn Andersson 
Cc: Mark Brown 
Cc: Liam Girdwood 
Cc: Charles Keepax 
Cc: Simon Trimmer 
Cc: Michael Ellerman 
Reviewed-by: Mark Brown 
Signed-off-by: Arnd Bergmann 
---
No changes in v2, but it's now queued in my asm-generic
tree for v5.15

 arch/arm/Kconfig| 2 --
 arch/arm64/Kconfig  | 2 --
 arch/ia64/Kconfig   | 2 --
 arch/mips/Kconfig   | 2 --
 arch/parisc/Kconfig | 2 --
 arch/riscv/Kconfig  | 2 --
 arch/x86/Kconfig| 2 --
 drivers/Kconfig | 2 ++
 8 files changed, 2 insertions(+), 14 deletions(-)

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index fc196421b2ce..59baf6c132a7 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -1989,8 +1989,6 @@ config ARCH_HIBERNATION_POSSIBLE
 
 endmenu
 
-source "drivers/firmware/Kconfig"
-
 if CRYPTO
 source "arch/arm/crypto/Kconfig"
 endif
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 077f2ec4eeb2..407b4addea36 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -1931,8 +1931,6 @@ source "drivers/cpufreq/Kconfig"
 
 endmenu
 
-source "drivers/firmware/Kconfig"
-
 source "drivers/acpi/Kconfig"
 
 source "arch/arm64/kvm/Kconfig"
diff --git a/arch/ia64/Kconfig b/arch/ia64/Kconfig
index 045792cde481..1e33666fa679 100644
--- a/arch/ia64/Kconfig
+++ b/arch/ia64/Kconfig
@@ -388,8 +388,6 @@ config CRASH_DUMP
  help
Generate crash dump after being started by kexec.
 
-source "drivers/firmware/Kconfig"
-
 endmenu
 
 menu "Power management and ACPI options"
diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig
index 771ca53af06d..6b8f591c5054 100644
--- a/arch/mips/Kconfig
+++ b/arch/mips/Kconfig
@@ -3316,8 +3316,6 @@ source "drivers/cpuidle/Kconfig"
 
 endmenu
 
-source "drivers/firmware/Kconfig"
-
 source "arch/mips/kvm/Kconfig"
 
 source "arch/mips/vdso/Kconfig"
diff --git a/arch/parisc/Kconfig b/arch/parisc/Kconfig
index 4742b6f169b7..27a8b49af11f 100644
--- a/arch/parisc/Kconfig
+++ b/arch/parisc/Kconfig
@@ -384,6 +384,4 @@ config KEXEC_FILE
 
 endmenu
 
-source "drivers/firmware/Kconfig"
-
 source "drivers/parisc/Kconfig"
diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index c3f3fd583e04..8bc71ab143e3 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -561,5 +561,3 @@ menu "Power management options"
 source "kernel/power/Kconfig"
 
 endmenu
-
-source "drivers/firmware/Kconfig"
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 4e001425..4dca39744ee9 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -2828,8 +2828,6 @@ config HAVE_ATOMIC_IOMAP
def_bool y
depends on X86_32
 
-source "drivers/firmware/Kconfig"
-
 source "arch/x86/kvm/Kconfig"
 
 source "arch/x86/Kconfig.assembler"
diff --git a/drivers/Kconfig b/drivers/Kconfig
index 30d2db37cc87..0d399ddaa185 100644
--- a/drivers/Kconfig
+++ b/drivers/Kconfig
@@ -17,6 +17,8 @@ source "drivers/bus/Kconfig"
 
 source "drivers/connector/Kconfig"
 
+source "drivers/firmware/Kconfig"
+
 source "drivers/gnss/Kconfig"
 
 source "drivers/mtd/Kconfig"
-- 
2.29.2



Re: [RFC 0/7] Support in-kernel DMA with PASID and SVA

2021-10-07 Thread Jason Gunthorpe via iommu
On Fri, Oct 08, 2021 at 12:54:52AM +1300, Barry Song wrote:
> On Fri, Oct 8, 2021 at 12:32 AM Jason Gunthorpe  wrote:
> >
> > On Thu, Oct 07, 2021 at 06:43:33PM +1300, Barry Song wrote:
> >
> > > So do we have a case where devices can directly access the kernel's data
> > > structure such as a list/graph/tree with pointers to a kernel virtual 
> > > address?
> > > then devices don't need to translate the address of pointers in a 
> > > structure.
> > > I assume this is one of the most useful features userspace SVA can 
> > > provide.
> >
> > AFAICT that is the only good case for KVA, but it is also completely
> > against the endianness, word size and DMA portability design of the
> > kernel.
> >
> > Going there requires some new set of portable APIs for globally
> > coherent KVA dma.
> 
> yep. I agree. it would be very weird if accelerators/gpus are sharing
> the kernel's data structs, but for each "DMA" operation - reading or
> writing the data struct - we have to call dma_map_single/sg or
> dma_sync_single_for_cpu/device etc. It seems that once devices and CPUs
> share a virtual address space (SVA), code doesn't need to do an explicit
> map/sync each time.

No, it still need to do something to manage visibility from the
current CPU to the DMA - it might not be flushing a cache, but it is
probably a arch specific CPU barrier instruction.

Jason


Re: [RFC 0/7] Support in-kernel DMA with PASID and SVA

2021-10-07 Thread Barry Song
On Fri, Oct 8, 2021 at 12:32 AM Jason Gunthorpe  wrote:
>
> On Thu, Oct 07, 2021 at 06:43:33PM +1300, Barry Song wrote:
>
> > So do we have a case where devices can directly access the kernel's data
> > structure such as a list/graph/tree with pointers to a kernel virtual 
> > address?
> > then devices don't need to translate the address of pointers in a structure.
> > I assume this is one of the most useful features userspace SVA can provide.
>
> AFAICT that is the only good case for KVA, but it is also completely
> against the endianness, word size and DMA portability design of the
> kernel.
>
> Going there requires some new set of portable APIs for globally
> coherent KVA dma.

yep. I agree. it would be very weird if accelerators/gpus are sharing
the kernel's data structs, but for each "DMA" operation - reading or
writing the data struct - we have to call dma_map_single/sg or
dma_sync_single_for_cpu/device etc. It seems that once devices and CPUs
share a virtual address space (SVA), code doesn't need to do an explicit
map/sync each time.

>
> Jason

Thanks
barry


RE: [PATCH v2 1/2] iommu/vt-d: convert the return type of first_pte_in_page to bool

2021-10-07 Thread Longpeng (Mike, Cloud Infrastructure Service Product Dept.)



> -Original Message-
> From: Lu Baolu [mailto:baolu...@linux.intel.com]
> Sent: Thursday, October 7, 2021 2:18 PM
> To: Longpeng (Mike, Cloud Infrastructure Service Product Dept.)
> ; dw...@infradead.org; w...@kernel.org;
> j...@8bytes.org
> Cc: baolu...@linux.intel.com; iommu@lists.linux-foundation.org;
> linux-ker...@vger.kernel.org; Gonglei (Arei) 
> Subject: Re: [PATCH v2 1/2] iommu/vt-d: convert the return type of
> first_pte_in_page to bool
> 
> On 2021/10/5 23:23, Longpeng(Mike) wrote:
> > first_pte_in_page() returns a boolean value, so let's convert its
> > return type to bool.
> >
> > Signed-off-by: Longpeng(Mike) 
> > ---
> >   include/linux/intel-iommu.h | 2 +-
> >   1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
> > index 05a65eb..a590b00 100644
> > --- a/include/linux/intel-iommu.h
> > +++ b/include/linux/intel-iommu.h
> > @@ -708,7 +708,7 @@ static inline bool dma_pte_superpage(struct dma_pte 
> > *pte)
> > return (pte->val & DMA_PTE_LARGE_PAGE);
> >   }
> >
> > -static inline int first_pte_in_page(struct dma_pte *pte)
> > +static inline bool first_pte_in_page(struct dma_pte *pte)
> >   {
> > return !((unsigned long)pte & ~VTD_PAGE_MASK);
> >   }
> >
> 
> Probably,
> 
>   return IS_ALIGNED((unsigned long)pte, VTD_PAGE_SIZE);
> 
> looks neater?
> 

Looks better! I'll include this optimization in v3.

> Best regards,
> baolu


Re: [RFC 07/20] iommu/iommufd: Add iommufd_[un]bind_device()

2021-10-07 Thread Jason Gunthorpe via iommu
On Thu, Oct 07, 2021 at 12:23:13PM +1100, David Gibson wrote:
> On Fri, Oct 01, 2021 at 09:43:22AM -0300, Jason Gunthorpe wrote:
> > On Thu, Sep 30, 2021 at 01:10:29PM +1000, David Gibson wrote:
> > > On Wed, Sep 29, 2021 at 09:24:57AM -0300, Jason Gunthorpe wrote:
> > > > On Wed, Sep 29, 2021 at 03:25:54PM +1000, David Gibson wrote:
> > > > 
> > > > > > +struct iommufd_device {
> > > > > > +   unsigned int id;
> > > > > > +   struct iommufd_ctx *ictx;
> > > > > > +   struct device *dev; /* always be the physical device */
> > > > > > +   u64 dev_cookie;
> > > > > 
> > > > > Why do you need both an 'id' and a 'dev_cookie'?  Since they're both
> > > > > unique, couldn't you just use the cookie directly as the index into
> > > > > the xarray?
> > > > 
> > > > ID is the kernel value in the xarray - xarray is much more efficient &
> > > > safe with small kernel controlled values.
> > > > 
> > > > dev_cookie is a user assigned value that may not be unique. It's
> > > > purpose is to allow userspace to receive and event and go back to its
> > > > structure. Most likely userspace will store a pointer here, but it is
> > > > also possible userspace could not use it.
> > > > 
> > > > It is a pretty normal pattern
> > > 
> > > Hm, ok.  Could you point me at an example?
> > 
> > For instance user_data vs fd in io_uring
> 
> Ok, but one of those is an fd, which is an existing type of handle.
> Here we're introducing two different unique handles that aren't an
> existing kernel concept.

I'm not sure how that matters; the kernel has many handles - and we
get to make more of them. Look at xarray/idr users in the kernel; many of
those are making userspace handles.

> That said... is there any strong reason why user_data needs to be
> unique?  I can imagine userspace applications where you don't care
> which device the notification is coming from - or at least don't care
> down to the same granularity that /dev/iommu is using.  In which case
> having the kernel provided unique handle and the
> not-necessarily-unique user_data would make perfect sense.

I don't think the user_data 64 bit value should be unique, it is just
transported from user to kernel and back again. It is *not* a handle,
it is a cookie.

Handles for the kernel/user boundary should come from xarrays that
have nice lookup properties - not from user provided 64 bit values
that have to be stored in red black trees..

Jason


Re: [RFC 0/7] Support in-kernel DMA with PASID and SVA

2021-10-07 Thread Jason Gunthorpe via iommu
On Thu, Oct 07, 2021 at 06:43:33PM +1300, Barry Song wrote:

> So do we have a case where devices can directly access the kernel's data
> structure such as a list/graph/tree with pointers to a kernel virtual address?
> then devices don't need to translate the address of pointers in a structure.
> I assume this is one of the most useful features userspace SVA can provide.

AFAICT that is the only good case for KVA, but it is also completely
against the endianness, word size and DMA portability design of the
kernel.

Going there requires some new set of portable APIs for globally
coherent KVA dma.

Jason


Re: DPAA2 triggers, [PATCH] dma debug: report -EEXIST errors in add_dma_entry

2021-10-07 Thread Karsten Graul
On 06/10/2021 16:23, Robin Murphy wrote:
> On 2021-10-06 14:10, Gerald Schaefer wrote:
>> On Fri, 1 Oct 2021 14:52:56 +0200
>> Gerald Schaefer  wrote:
>>
>>> On Thu, 30 Sep 2021 15:37:33 +0200
>>> Karsten Graul  wrote:
>>>
 On 14/09/2021 17:45, Ioana Ciornei wrote:
> On Wed, Sep 08, 2021 at 10:33:26PM -0500, Jeremy Linton wrote:
>> +DPAA2, netdev maintainers
>> Hi,
>>
>> On 5/18/21 7:54 AM, Hamza Mahfooz wrote:
>>> Since, overlapping mappings are not supported by the DMA API we should
>>> report an error if active_cacheline_insert returns -EEXIST.
>>
>> It seems this patch found a victim. I was trying to run iperf3 on a
>> honeycomb (5.14.0, fedora 35) and the console is blasting this error 
>> message
>> at 100% cpu. So, I changed it to a WARN_ONCE() to get the call trace, 
>> which
>> is attached below.
>>
>
> These frags are allocated by the stack, transformed into a scatterlist
> by skb_to_sgvec and then DMA mapped with dma_map_sg. It was not the
> dpaa2-eth's decision to use two fragments from the same page (that will
> also end up in the same cacheline) in two different in-flight skbs.
>
> Is this behavior normal?
>

 We see the same problem here and it started with 5.15-rc2 in our nightly 
 CI runs.
 The CI has panic_on_warn enabled so we see the panic every day now.
>>>
>>> Adding a WARN for a case that can be a false positive seems not
>>> acceptable, exactly for this reason (kernel panic on unaffected
>>> systems).
>>>
>>> So I guess it boils down to the question if the behavior that Ioana
>>> described is legit behavior, on a system that is dma coherent. We
>>> are apparently hitting the same scenario, although it could not yet be
>>> reproduced with debug printks for some reason.
>>>
>>> If the answer is yes, then please remove at least the WARN, so that
>>> it will not crash systems that behave validly and have
>>> panic_on_warn set. Even a normal printk feels wrong to me in that
>>> case, it really sounds rather like you want to fix / better refine
>>> the overlap check, if you want to report anything here.
>>
>> Dan, Christoph, any opinion?
>>
>> So far it all looks a lot like a false positive, so could you please
>> see that those patches get reverted? I do wonder a bit why this is
>> not an issue for others, we surely cannot be the only ones running
>> CI with panic_on_warn.
> 
> What convinces you it's a false-positive? I'm hardly familiar with most of 
> that callstack, but it appears to be related to mlx5, and I know that exists 
> on expansion cards which could be plugged into a system with non-coherent 
> PCIe where partial cacheline overlap *would* be a real issue. Of course it's 
> dubious that there are many real use-cases for plugging a NIC with a 4-figure 
> price tag into a little i.MX8 or whatever, but the point is that it *should* 
> still work correctly.
> 
>> We would need to disable DEBUG_DMA if this WARN stays in, which
>> would be a shame. Of course, in theory, this might also indicate
>> some real bug, but there really is no sign of that so far.
> 
> The whole point of DMA debug is to flag up things that you *do* get away with 
> on the vast majority of systems, precisely because most testing happens on 
> those systems rather than more esoteric embedded setups. Say your system only 
> uses dma-direct and a driver starts triggering the warning for not calling 
> dma_mapping_error(), would you argue for removing that warning as well since 
> dma_map_single() can't fail on your machine so it's "not a bug"?
> 
>> Having multiple sg elements in the same page (or cacheline) is
>> valid, correct? And this is also not a decision of the driver
>> IIUC, so if it was bug, it should be addressed in common code,
>> correct?
> 
> According to the streaming DMA API documentation, it is *not* valid:
> 
> ".. warning::
> 
>   Memory coherency operates at a granularity called the cache
>   line width.  In order for memory mapped by this API to operate
>   correctly, the mapped region must begin exactly on a cache line
>   boundary and end exactly on one (to prevent two separately mapped
>   regions from sharing a single cache line).  Since the cache line size
>   may not be known at compile time, the API will not enforce this
>   requirement.  Therefore, it is recommended that driver writers who
>   don't take special care to determine the cache line size at run time
>   only map virtual regions that begin and end on page boundaries (which
>   are guaranteed also to be cache line boundaries)."
> 
>>> BTW, there is already a WARN in the add_dma_entry() path, related
>>> to cacheline overlap and -EEXIST:
>>>
>>> add_dma_entry() -> active_cacheline_insert() -> -EEXIST ->
>>> active_cacheline_inc_overlap()
>>>
>>> That will only trigger when "overlap > ACTIVE_CACHELINE_MAX_OVERLAP".
>>> Not familiar with that code, but it seems that there are now two
>>> warnings for more 

Re: [PATCH] iommu: intel: remove flooding of non-error logs, when new-DMA-PTE is the same as old-DMA-PTE.

2021-10-07 Thread Ajay Garg
Thanks Alex for the reply.


Lu, Alex :

I got my diagnosis regarding the host-driver wrong, my apologies.
There is no issue with the pci-device's host-driver (confirmed by
preventing the loading of host-driver at host-bootup). Thus, nothing
to be fixed at the host-driver side.

Rather, it seems some DMA mapping/unmapping inconsistency is happening
when kvm/qemu boots up with the pci-device attached to the guest.

I added debug logs in the vfio_iommu_type1_ioctl() function in
vfio_iommu_type1.c (on the host machine).
When the guest boots up, the host logs show repeated DMA mappings
for the same address, without a corresponding DMA unmapping first:

##
ajay@ajay-Latitude-E6320:~$ tail -f /var/log/syslog | grep "ajay: "
Oct  7 14:12:32 ajay-Latitude-E6320 kernel: [  146.202297] ajay:
_MAP_DMA for [0x7ffe724a8670] status [0]
Oct  7 14:12:32 ajay-Latitude-E6320 kernel: [  146.583179] ajay:
_MAP_DMA for [0x7ffe724a8670] status [0]
Oct  7 14:12:32 ajay-Latitude-E6320 kernel: [  146.583253] ajay:
_MAP_DMA for [0x7ffe724a8670] status [0]
Oct  7 14:12:36 ajay-Latitude-E6320 kernel: [  150.105584] ajay:
_MAP_DMA for [0x7ffe724a8670] status [0]
Oct  7 14:13:07 ajay-Latitude-E6320 kernel: [  180.986499] ajay:
_UNMAP_DMA for [0x7ffe724a9840] status [0]
Oct  7 14:13:07 ajay-Latitude-E6320 kernel: [  180.986559] ajay:
_MAP_DMA for [0x7ffe724a97d0] status [0]
Oct  7 14:13:07 ajay-Latitude-E6320 kernel: [  180.986638] ajay:
_MAP_DMA for [0x7ffe724a97d0] status [0]
Oct  7 14:13:07 ajay-Latitude-E6320 kernel: [  181.087359] ajay:
_MAP_DMA for [0x7ffe724a97d0] status [0]
Oct  7 14:13:13 ajay-Latitude-E6320 kernel: [  187.271232] ajay:
_UNMAP_DMA for [0x7fde7b7fcfa0] status [0]
Oct  7 14:13:13 ajay-Latitude-E6320 kernel: [  187.271320] ajay:
_UNMAP_DMA for [0x7fde7b7fcfa0] status [0]

##


I'll try and backtrack to the userspace process that is sending these ioctls.
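In case it helps anyone doing the same: the debug print itself can name
the caller, since inside the ioctl handler 'current' is the task that
issued the ioctl. A rough, untested kernel-side fragment (assuming the
print lives in vfio_iommu_type1_ioctl(), as above):

```c
/* Untested sketch: log which task issued the map request alongside
 * the existing trace; current->comm and current->pid identify the
 * userspace caller. 'map.vaddr' is the _MAP_DMA argument. */
pr_info("ajay: _MAP_DMA for [0x%llx] by %s (pid %d)\n",
	map.vaddr, current->comm, current->pid);
```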


Thanks and Regards,
Ajay






On Tue, Oct 5, 2021 at 4:01 AM Alex Williamson
 wrote:
>
> On Sat, 2 Oct 2021 22:48:24 +0530
> Ajay Garg  wrote:
>
> > Thanks Lu for the reply.
> >
> > >
> > > Isn't the domain should be switched from a default domain to an
> > > unmanaged domain when the device is assigned to the guest?
> > >
> > > Even you want to r-setup the same mappings, you need to un-map all
> > > existing mappings, right?
> > >
> >
> > Hmm, I guess that's a (design) decision the KVM/QEMU/VFIO communities
> > need to take.
> > Maybe the patch could suppress the flooding till then?
>
> No, this is wrong.  The pte values should not exist, it doesn't matter
> that they're the same.  Is the host driver failing to remove mappings
> and somehow they persist in the new vfio owned domain?  There's
> definitely a bug beyond logging going on here.  Thanks,
>
> Alex
>


Re: [PATCH v2 1/2] iommu/vt-d: convert the return type of first_pte_in_page to bool

2021-10-07 Thread Lu Baolu

On 2021/10/5 23:23, Longpeng(Mike) wrote:

first_pte_in_page() returns a boolean value, so let's convert its
return type to bool.

Signed-off-by: Longpeng(Mike) 
---
  include/linux/intel-iommu.h | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index 05a65eb..a590b00 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -708,7 +708,7 @@ static inline bool dma_pte_superpage(struct dma_pte *pte)
return (pte->val & DMA_PTE_LARGE_PAGE);
  }
  
-static inline int first_pte_in_page(struct dma_pte *pte)
+static inline bool first_pte_in_page(struct dma_pte *pte)
  {
return !((unsigned long)pte & ~VTD_PAGE_MASK);
  }



Probably,

return IS_ALIGNED((unsigned long)pte, VTD_PAGE_SIZE);

looks neater?

Best regards,
baolu