Re: [PATCH] iommu/vt-d: Force to flush iotlb before creating superpage

2021-04-08 Thread Lu Baolu

Hi Longpeng,

On 4/8/21 3:37 PM, Longpeng (Mike, Cloud Infrastructure Service Product 
Dept.) wrote:

Hi Baolu,


-Original Message-
From: Lu Baolu [mailto:baolu...@linux.intel.com]
Sent: Thursday, April 8, 2021 12:32 PM
To: Longpeng (Mike, Cloud Infrastructure Service Product Dept.)
; io...@lists.linux-foundation.org;
linux-kernel@vger.kernel.org
Cc: baolu...@linux.intel.com; David Woodhouse ; Nadav
Amit ; Alex Williamson ;
Kevin Tian ; Gonglei (Arei) ;
sta...@vger.kernel.org
Subject: Re: [PATCH] iommu/vt-d: Force to flush iotlb before creating superpage

Hi Longpeng,

On 4/7/21 2:35 PM, Longpeng (Mike, Cloud Infrastructure Service Product
Dept.) wrote:

Hi Baolu,


-Original Message-
From: Lu Baolu [mailto:baolu...@linux.intel.com]
Sent: Friday, April 2, 2021 12:44 PM
To: Longpeng (Mike, Cloud Infrastructure Service Product Dept.)
; io...@lists.linux-foundation.org;
linux-kernel@vger.kernel.org
Cc: baolu...@linux.intel.com; David Woodhouse ;
Nadav Amit ; Alex Williamson
; Kevin Tian ;
Gonglei (Arei) ; sta...@vger.kernel.org
Subject: Re: [PATCH] iommu/vt-d: Force to flush iotlb before creating
superpage

Hi Longpeng,

On 4/1/21 3:18 PM, Longpeng(Mike) wrote:

diff --git a/drivers/iommu/intel/iommu.c
b/drivers/iommu/intel/iommu.c index ee09323..cbcb434 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -2342,9 +2342,20 @@ static inline int
hardware_largepage_caps(struct

dmar_domain *domain,

 * removed to make room for superpage(s).
 * We're adding new large pages, so make sure
 * we don't remove their parent tables.
+*
+* We also need to flush the iotlb before creating
+* superpage to ensure it does not preserve any
+* obsolete info.
 */
-   dma_pte_free_pagetable(domain, iov_pfn, end_pfn,
-  largepage_lvl + 1);
+   if (dma_pte_present(pte)) {


dma_pte_free_pagetable() clears a batch of PTEs, so checking only the
current PTE is insufficient. How about removing this check and always
performing cache invalidation?



Um... the PTE here may be present (e.g. 4K mapping --> superpage mapping) or
not present (e.g. creating a totally new superpage mapping), but we only need
to call free_pagetable and flush_iotlb in the former case, right?

But this code covers multiple PTEs and perhaps crosses the page boundary.

How about moving this code into a separate function and checking PTE presence
there? Sample code could look like this: [compiled but not tested!]

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c index
d334f5b4e382..0e04d450c38a 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -2300,6 +2300,41 @@ static inline int hardware_largepage_caps(struct
dmar_domain *domain,
  return level;
   }

+/*
+ * Ensure that old small page tables are removed to make room for superpage(s).
+ * We're going to add new large pages, so make sure we don't remove their parent
+ * tables. The IOTLB/devTLBs should be flushed if any PDE/PTEs are cleared.
+ */
+static void switch_to_super_page(struct dmar_domain *domain,
+                                 unsigned long start_pfn,
+                                 unsigned long end_pfn, int level)
+{


Maybe "switch_to" will lead people to think "remove old and then set up new", so how about something
like "remove_room_for_super_page" or "prepare_for_super_page"?


I named it like this because we also want to have an opposite operation,
split_from_super_page(), which switches a PDE or PDPE from a super page
setting to small pages. That is needed to optimize dirty bit
tracking during VM live migration.




+   unsigned long lvl_pages = lvl_to_nr_pages(level);
+   struct dma_pte *pte = NULL;
+   int i;
+
+   while (start_pfn <= end_pfn) {


start_pfn < end_pfn ?


end_pfn is inclusive.
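
To make the inclusive-bound point concrete, here is a minimal user-space sketch (not kernel code; both helper names are hypothetical). It counts how many times a loop of the shape used in switch_to_super_page() runs with `<=` and with `<`. For superpage-aligned ranges the two comparisons happen to give the same count, but only `<=` is correct in general when end_pfn is the last pfn of the range, for example when the range is a single page.

```c
/* Hypothetical user-space helpers, not part of the kernel patch. */

/* Iterations of: while (start_pfn <= end_pfn) start_pfn += lvl_pages;
 * Here end_pfn is the LAST pfn of the range (inclusive). */
static unsigned long steps_inclusive(unsigned long start_pfn,
                                     unsigned long end_pfn,
                                     unsigned long lvl_pages)
{
    unsigned long steps = 0;

    while (start_pfn <= end_pfn) {
        steps++;
        start_pfn += lvl_pages;
    }
    return steps;
}

/* Same loop with a strict comparison, as asked about in the review. */
static unsigned long steps_exclusive(unsigned long start_pfn,
                                     unsigned long end_pfn,
                                     unsigned long lvl_pages)
{
    unsigned long steps = 0;

    while (start_pfn < end_pfn) {
        steps++;
        start_pfn += lvl_pages;
    }
    return steps;
}
```

With two 2M superpages (pfns 0..1023, 512 pages per step) both helpers report two steps; with a one-page range (start_pfn == end_pfn, one page per step) the strict version reports zero and would skip the entry.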




+   if (!pte)
+   pte = pfn_to_dma_pte(domain, start_pfn, &level);
+
+   if (dma_pte_present(pte)) {
+   dma_pte_free_pagetable(domain, start_pfn,
+  start_pfn + lvl_pages - 1,
+  level + 1);
+
+   for_each_domain_iommu(i, domain)
+       iommu_flush_iotlb_psi(g_iommus[i], domain,
+                             start_pfn, lvl_pages,
+                             0, 0);
+   }
+
+   pte++;
+   start_pfn += lvl_pages;
+   if (first_pte_in_page(pte))
+ 

RE: [PATCH] iommu/vt-d: Force to flush iotlb before creating superpage

2021-04-08 Thread Longpeng (Mike, Cloud Infrastructure Service Product Dept.)
Hi Baolu,

> -Original Message-
> From: Lu Baolu [mailto:baolu...@linux.intel.com]
> Sent: Thursday, April 8, 2021 12:32 PM
> To: Longpeng (Mike, Cloud Infrastructure Service Product Dept.)
> ; io...@lists.linux-foundation.org;
> linux-kernel@vger.kernel.org
> Cc: baolu...@linux.intel.com; David Woodhouse ; Nadav
> Amit ; Alex Williamson ;
> Kevin Tian ; Gonglei (Arei) ;
> sta...@vger.kernel.org
> Subject: Re: [PATCH] iommu/vt-d: Force to flush iotlb before creating 
> superpage
> 
> Hi Longpeng,
> 
> On 4/7/21 2:35 PM, Longpeng (Mike, Cloud Infrastructure Service Product
> Dept.) wrote:
> > Hi Baolu,
> >
> >> -Original Message-
> >> From: Lu Baolu [mailto:baolu...@linux.intel.com]
> >> Sent: Friday, April 2, 2021 12:44 PM
> >> To: Longpeng (Mike, Cloud Infrastructure Service Product Dept.)
> >> ; io...@lists.linux-foundation.org;
> >> linux-kernel@vger.kernel.org
> >> Cc: baolu...@linux.intel.com; David Woodhouse ;
> >> Nadav Amit ; Alex Williamson
> >> ; Kevin Tian ;
> >> Gonglei (Arei) ; sta...@vger.kernel.org
> >> Subject: Re: [PATCH] iommu/vt-d: Force to flush iotlb before creating
> >> superpage
> >>
> >> Hi Longpeng,
> >>
> >> On 4/1/21 3:18 PM, Longpeng(Mike) wrote:
> >>> diff --git a/drivers/iommu/intel/iommu.c
> >>> b/drivers/iommu/intel/iommu.c index ee09323..cbcb434 100644
> >>> --- a/drivers/iommu/intel/iommu.c
> >>> +++ b/drivers/iommu/intel/iommu.c
> >>> @@ -2342,9 +2342,20 @@ static inline int
> >>> hardware_largepage_caps(struct
> >> dmar_domain *domain,
> >>>* removed to make room for 
> >>> superpage(s).
> >>>* We're adding new large pages, so 
> >>> make sure
> >>>* we don't remove their parent tables.
> >>> +  *
> >>> +  * We also need to flush the iotlb before creating
> >>> +  * superpage to ensure it does not preserve any
> >>> +  * obsolete info.
> >>>*/
> >>> - dma_pte_free_pagetable(domain, iov_pfn, end_pfn,
> >>> -largepage_lvl + 1);
> >>> + if (dma_pte_present(pte)) {
> >>
> >> The dma_pte_free_pagetable() clears a batch of PTEs. So checking
> >> current PTE is insufficient. How about removing this check and always
> >> performing cache invalidation?
> >>
> >
> > Um... the PTE here may be present (e.g. 4K mapping --> superpage mapping) or
> > not present (e.g. creating a totally new superpage mapping), but we only need
> > to call free_pagetable and flush_iotlb in the former case, right?
> 
> But this code covers multiple PTEs and perhaps crosses the page boundary.
> 
> How about moving this code into a separate function and checking PTE presence
> there? Sample code could look like this: [compiled but not tested!]
> 
> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c index
> d334f5b4e382..0e04d450c38a 100644
> --- a/drivers/iommu/intel/iommu.c
> +++ b/drivers/iommu/intel/iommu.c
> @@ -2300,6 +2300,41 @@ static inline int hardware_largepage_caps(struct
> dmar_domain *domain,
>  return level;
>   }
> 
> +/*
> + * Ensure that old small page tables are removed to make room for
> superpage(s).
> + * We're going to add new large pages, so make sure we don't remove
> their parent
> + * tables. The IOTLB/devTLBs should be flushed if any PDE/PTEs are cleared.
> + */
> +static void switch_to_super_page(struct dmar_domain *domain,
> +unsigned long start_pfn,
> +unsigned long end_pfn, int level) {

Maybe "switch_to" will lead people to think "remove old and then set up new", so
how about something like "remove_room_for_super_page" or
"prepare_for_super_page"?

> +   unsigned long lvl_pages = lvl_to_nr_pages(level);
> +   struct dma_pte *pte = NULL;
> +   int i;
> +
> +   while (start_pfn <= end_pfn) {

start_pfn < end_pfn ?

> +   if (!pte)
> +   pte = pfn_to_dma_pte(domain, start_pfn, &level);
> +
> +   if (dma_pte_present(pte)) {
> +   dma_pte_free_pagetable(domain, start_

Re: [PATCH] iommu/vt-d: Force to flush iotlb before creating superpage

2021-04-07 Thread Lu Baolu

Hi Longpeng,

On 4/7/21 2:35 PM, Longpeng (Mike, Cloud Infrastructure Service Product 
Dept.) wrote:

Hi Baolu,


-Original Message-
From: Lu Baolu [mailto:baolu...@linux.intel.com]
Sent: Friday, April 2, 2021 12:44 PM
To: Longpeng (Mike, Cloud Infrastructure Service Product Dept.)
; io...@lists.linux-foundation.org;
linux-kernel@vger.kernel.org
Cc: baolu...@linux.intel.com; David Woodhouse ; Nadav
Amit ; Alex Williamson ;
Kevin Tian ; Gonglei (Arei) ;
sta...@vger.kernel.org
Subject: Re: [PATCH] iommu/vt-d: Force to flush iotlb before creating superpage

Hi Longpeng,

On 4/1/21 3:18 PM, Longpeng(Mike) wrote:

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index ee09323..cbcb434 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -2342,9 +2342,20 @@ static inline int hardware_largepage_caps(struct

dmar_domain *domain,

 * removed to make room for superpage(s).
 * We're adding new large pages, so make sure
 * we don't remove their parent tables.
+*
+* We also need to flush the iotlb before creating
+* superpage to ensure it does not preserve any
+* obsolete info.
 */
-   dma_pte_free_pagetable(domain, iov_pfn, end_pfn,
-  largepage_lvl + 1);
+   if (dma_pte_present(pte)) {


The dma_pte_free_pagetable() clears a batch of PTEs. So checking current PTE is
insufficient. How about removing this check and always performing cache
invalidation?



Um... the PTE here may be present (e.g. 4K mapping --> superpage mapping) or
not present (e.g. creating a totally new superpage mapping), but we only need
to call free_pagetable and flush_iotlb in the former case, right?


But this code covers multiple PTEs and perhaps crosses the page
boundary.

How about moving this code into a separate function and checking PTE
presence there? Sample code could look like this: [compiled but not
tested!]
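
As an aside, the difference between checking only the first entry and checking every entry in the range can be sketched in user space. This is an illustration under simplifying assumptions, not the driver code: the entries are a plain array, "present" is taken to mean that the read or write bit is set, and both function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_PRESENT 0x3ULL /* assumed: read | write bits mark an entry present */

/* Decide from the first entry only. If it is not present, nothing in the
 * range is cleared or flushed. Returns the number of entries flushed. */
static size_t clear_if_first_present(uint64_t *ent, size_t n)
{
    size_t i, flushed = 0;

    if (n == 0 || !(ent[0] & TOY_PRESENT))
        return 0;
    for (i = 0; i < n; i++) {
        if (ent[i] & TOY_PRESENT)
            flushed++;
        ent[i] = 0;
    }
    return flushed;
}

/* Check each entry on its own, which is the idea behind the sample patch. */
static size_t clear_each_present(uint64_t *ent, size_t n)
{
    size_t i, flushed = 0;

    for (i = 0; i < n; i++) {
        if (ent[i] & TOY_PRESENT) {
            ent[i] = 0;
            flushed++;
        }
    }
    return flushed;
}
```

For a range such as { 0, 0x1a30a72003, 0 } the first helper leaves the middle entry in place and flushes nothing, while the second clears and counts it.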

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index d334f5b4e382..0e04d450c38a 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -2300,6 +2300,41 @@ static inline int hardware_largepage_caps(struct dmar_domain *domain,
 return level;
 }

+/*
+ * Ensure that old small page tables are removed to make room for superpage(s).
+ * We're going to add new large pages, so make sure we don't remove their parent
+ * tables. The IOTLB/devTLBs should be flushed if any PDE/PTEs are cleared.
+ */
+static void switch_to_super_page(struct dmar_domain *domain,
+unsigned long start_pfn,
+unsigned long end_pfn, int level)
+{
+   unsigned long lvl_pages = lvl_to_nr_pages(level);
+   struct dma_pte *pte = NULL;
+   int i;
+
+   while (start_pfn <= end_pfn) {
+   if (!pte)
+   pte = pfn_to_dma_pte(domain, start_pfn, &level);
+
+   if (dma_pte_present(pte)) {
+   dma_pte_free_pagetable(domain, start_pfn,
+  start_pfn + lvl_pages - 1,
+  level + 1);
+
+   for_each_domain_iommu(i, domain)
+   iommu_flush_iotlb_psi(g_iommus[i], domain,
+ start_pfn, lvl_pages,
+ 0, 0);
+   }
+
+   pte++;
+   start_pfn += lvl_pages;
+   if (first_pte_in_page(pte))
+   pte = NULL;
+   }
+}
+
 static int
 __domain_mapping(struct dmar_domain *domain, unsigned long iov_pfn,
 unsigned long phys_pfn, unsigned long nr_pages, int prot)
@@ -2341,22 +2376,11 @@ __domain_mapping(struct dmar_domain *domain, unsigned long iov_pfn,
 return -ENOMEM;
 /* It is large page*/
 if (largepage_lvl > 1) {
-   unsigned long nr_superpages, end_pfn;
+   unsigned long end_pfn;

 pteval |= DMA_PTE_LARGE_PAGE;
-   lvl_pages = lvl_to_nr_pages(largepage_lvl);
-
-   nr_superpages = nr_pages / lvl_pages;
-   end_pfn = iov_pfn + nr_superpages * lvl_pages - 1;
-
-   /*
-* Ensure that old small page tables are
-* removed to make room for superpage(s).
-* We're

RE: [PATCH] iommu/vt-d: Force to flush iotlb before creating superpage

2021-04-06 Thread Longpeng (Mike, Cloud Infrastructure Service Product Dept.)
Hi Baolu,

> -Original Message-
> From: Lu Baolu [mailto:baolu...@linux.intel.com]
> Sent: Friday, April 2, 2021 12:44 PM
> To: Longpeng (Mike, Cloud Infrastructure Service Product Dept.)
> ; io...@lists.linux-foundation.org;
> linux-kernel@vger.kernel.org
> Cc: baolu...@linux.intel.com; David Woodhouse ; Nadav
> Amit ; Alex Williamson ;
> Kevin Tian ; Gonglei (Arei) ;
> sta...@vger.kernel.org
> Subject: Re: [PATCH] iommu/vt-d: Force to flush iotlb before creating 
> superpage
> 
> Hi Longpeng,
> 
> On 4/1/21 3:18 PM, Longpeng(Mike) wrote:
> > diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> > index ee09323..cbcb434 100644
> > --- a/drivers/iommu/intel/iommu.c
> > +++ b/drivers/iommu/intel/iommu.c
> > @@ -2342,9 +2342,20 @@ static inline int hardware_largepage_caps(struct
> dmar_domain *domain,
> >  * removed to make room for superpage(s).
> >  * We're adding new large pages, so make sure
> >  * we don't remove their parent tables.
> > +*
> > +* We also need to flush the iotlb before creating
> > +* superpage to ensure it does not preserve any
> > +* obsolete info.
> >  */
> > -   dma_pte_free_pagetable(domain, iov_pfn, end_pfn,
> > -  largepage_lvl + 1);
> > +   if (dma_pte_present(pte)) {
> 
> The dma_pte_free_pagetable() clears a batch of PTEs. So checking current PTE 
> is
> insufficient. How about removing this check and always performing cache
> invalidation?
> 

Um... the PTE here may be present (e.g. 4K mapping --> superpage mapping) or
not present (e.g. creating a totally new superpage mapping), but we only need
to call free_pagetable and flush_iotlb in the former case, right?

> > +   int i;
> > +
> > +   dma_pte_free_pagetable(domain, iov_pfn, 
> > end_pfn,
> > +  largepage_lvl + 
> > 1);
> > +   for_each_domain_iommu(i, domain)
> > +   
> > iommu_flush_iotlb_psi(g_iommus[i], domain,
> > + iov_pfn, 
> > nr_pages, 0, 0);
> > +
> 
> Best regards,
> baolu


Re: [PATCH] iommu/vt-d: Force to flush iotlb before creating superpage

2021-04-01 Thread Lu Baolu
On 4/2/21 11:41 AM, Longpeng (Mike, Cloud Infrastructure Service Product 
Dept.) wrote:

Hi Baolu,

On 2021/4/2 11:06, Lu Baolu wrote:

Hi Longpeng,

On 4/1/21 3:18 PM, Longpeng(Mike) wrote:

The translation caches may preserve obsolete data when the
mapping size is changed, suppose the following sequence which
can reveal the problem with high probability.

1.mmap(4GB,MAP_HUGETLB)
2.
    while (1) {
     (a)    DMA MAP   0,0xa
     (b)    DMA UNMAP 0,0xa
     (c)    DMA MAP   0,0xc000
   * DMA read IOVA 0 may failure here (Not present)
   * if the problem occurs.
     (d)    DMA UNMAP 0,0xc000
    }

The page table(only focus on IOVA 0) after (a) is:
   PML4: 0x19db5c1003   entry:0x899bdcd2f000
    PDPE: 0x1a1cacb003  entry:0x89b35b5c1000
     PDE: 0x1a30a72003  entry:0x89b39cacb000
  PTE: 0x21d200803  entry:0x89b3b0a72000

The page table after (b) is:
   PML4: 0x19db5c1003   entry:0x899bdcd2f000
    PDPE: 0x1a1cacb003  entry:0x89b35b5c1000
     PDE: 0x1a30a72003  entry:0x89b39cacb000
  PTE: 0x0  entry:0x89b3b0a72000

The page table after (c) is:
   PML4: 0x19db5c1003   entry:0x899bdcd2f000
    PDPE: 0x1a1cacb003  entry:0x89b35b5c1000
     PDE: 0x21d200883   entry:0x89b39cacb000 (*)

Because the PDE entry after (b) is present, it won't be
flushed even if the iommu driver flush cache when unmap,
so the obsolete data may be preserved in cache, which
would cause the wrong translation at end.

However, we can see the PDE entry is finally switch to
2M-superpage mapping, but it does not transform
to 0x21d200883 directly:

1. PDE: 0x1a30a72003
2. __domain_mapping
   dma_pte_free_pagetable
     Set the PDE entry to ZERO
   Set the PDE entry to 0x21d200883

So we must flush the cache after the entry switch to ZERO
to avoid the obsolete info be preserved.

Cc: David Woodhouse 
Cc: Lu Baolu 
Cc: Nadav Amit 
Cc: Alex Williamson 
Cc: Kevin Tian 
Cc: Gonglei (Arei) 

Fixes: 6491d4d02893 ("intel-iommu: Free old page tables before creating
superpage")
Cc:  # v3.0+
Link:
https://lore.kernel.org/linux-iommu/670baaf8-4ff8-4e84-4be3-030b95ab5...@huawei.com/

Suggested-by: Lu Baolu 
Signed-off-by: Longpeng(Mike) 
---
   drivers/iommu/intel/iommu.c | 15 +--
   1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index ee09323..cbcb434 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -2342,9 +2342,20 @@ static inline int hardware_largepage_caps(struct
dmar_domain *domain,
    * removed to make room for superpage(s).
    * We're adding new large pages, so make sure
    * we don't remove their parent tables.
+ *
+ * We also need to flush the iotlb before creating
+ * superpage to ensure it does not preserve any
+ * obsolete info.
    */
-    dma_pte_free_pagetable(domain, iov_pfn, end_pfn,
-   largepage_lvl + 1);
+    if (dma_pte_present(pte)) {
+    int i;
+
+    dma_pte_free_pagetable(domain, iov_pfn, end_pfn,
+   largepage_lvl + 1);
+    for_each_domain_iommu(i, domain)
+    iommu_flush_iotlb_psi(g_iommus[i], domain,
+  iov_pfn, nr_pages, 0, 0);


Thanks for the patch!

How about making the flushed page size accurate? For example,

@@ -2365,8 +2365,8 @@ __domain_mapping(struct dmar_domain *domain, unsigned long iov_pfn,
     dma_pte_free_pagetable(domain, iov_pfn, end_pfn,
                            largepage_lvl + 1);
     for_each_domain_iommu(i, domain)
-        iommu_flush_iotlb_psi(g_iommus[i], domain,
-                              iov_pfn, nr_pages, 0, 0);
+        iommu_flush_iotlb_psi(g_iommus[i], domain, iov_pfn,
+                              ALIGN_DOWN(nr_pages, lvl_pages), 0, 0);


Yes, that makes sense.

Another alternative is 'end_pfn - iov_pfn + 1', which is readable because we
free the page table with (iov_pfn, end_pfn) above. Which one do you prefer?


Yours looks better.
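
For reference, the two candidate flush lengths are arithmetically the same whenever lvl_pages is a power of two. The sketch below is user-space code with hypothetical helper names and a local TOY_ALIGN_DOWN macro standing in for the kernel's ALIGN_DOWN; it computes end_pfn the same way __domain_mapping() does in the quoted hunk (nr_superpages = nr_pages / lvl_pages).

```c
/* Hypothetical stand-in for the kernel's ALIGN_DOWN(); 'a' must be a power of two. */
#define TOY_ALIGN_DOWN(x, a) ((x) & ~((a) - 1UL))

/* Flush length as suggested first: round nr_pages down to whole superpages. */
static unsigned long flush_len_align(unsigned long nr_pages,
                                     unsigned long lvl_pages)
{
    return TOY_ALIGN_DOWN(nr_pages, lvl_pages);
}

/* Flush length as 'end_pfn - iov_pfn + 1', with end_pfn derived the way
 * the quoted __domain_mapping() code derives it. */
static unsigned long flush_len_range(unsigned long iov_pfn,
                                     unsigned long nr_pages,
                                     unsigned long lvl_pages)
{
    unsigned long nr_superpages = nr_pages / lvl_pages;
    unsigned long end_pfn = iov_pfn + nr_superpages * lvl_pages - 1;

    return end_pfn - iov_pfn + 1;
}
```

For example, with nr_pages = 1300 and lvl_pages = 512 both helpers return 1024, so the choice between them is about readability rather than behaviour.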

By the way, if you are willing to prepare a v2, please make sure to add
Joerg (IOMMU subsystem maintainer) to the list.

Best regards,
baolu


Re: [PATCH] iommu/vt-d: Force to flush iotlb before creating superpage

2021-04-01 Thread Lu Baolu

Hi Longpeng,

On 4/1/21 3:18 PM, Longpeng(Mike) wrote:

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index ee09323..cbcb434 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -2342,9 +2342,20 @@ static inline int hardware_largepage_caps(struct 
dmar_domain *domain,
 * removed to make room for superpage(s).
 * We're adding new large pages, so make sure
 * we don't remove their parent tables.
+*
+* We also need to flush the iotlb before creating
+* superpage to ensure it does not preserve any
+* obsolete info.
 */
-   dma_pte_free_pagetable(domain, iov_pfn, end_pfn,
-  largepage_lvl + 1);
+   if (dma_pte_present(pte)) {


dma_pte_free_pagetable() clears a batch of PTEs, so checking only the current
PTE is insufficient. How about removing this check and always performing
cache invalidation?


+   int i;
+
+   dma_pte_free_pagetable(domain, iov_pfn, end_pfn,
+                          largepage_lvl + 1);
+   for_each_domain_iommu(i, domain)
+       iommu_flush_iotlb_psi(g_iommus[i], domain,
+                             iov_pfn, nr_pages, 0, 0);
+   


Best regards,
baolu


Re: [PATCH] iommu/vt-d: Force to flush iotlb before creating superpage

2021-04-01 Thread Longpeng (Mike, Cloud Infrastructure Service Product Dept.)
Hi Baolu,

On 2021/4/2 11:06, Lu Baolu wrote:
> Hi Longpeng,
> 
> On 4/1/21 3:18 PM, Longpeng(Mike) wrote:
>> The translation caches may preserve obsolete data when the
>> mapping size is changed, suppose the following sequence which
>> can reveal the problem with high probability.
>>
>> 1.mmap(4GB,MAP_HUGETLB)
>> 2.
>>    while (1) {
>>     (a)    DMA MAP   0,0xa
>>     (b)    DMA UNMAP 0,0xa
>>     (c)    DMA MAP   0,0xc000
>>   * DMA read IOVA 0 may failure here (Not present)
>>   * if the problem occurs.
>>     (d)    DMA UNMAP 0,0xc000
>>    }
>>
>> The page table(only focus on IOVA 0) after (a) is:
>>   PML4: 0x19db5c1003   entry:0x899bdcd2f000
>>    PDPE: 0x1a1cacb003  entry:0x89b35b5c1000
>>     PDE: 0x1a30a72003  entry:0x89b39cacb000
>>  PTE: 0x21d200803  entry:0x89b3b0a72000
>>
>> The page table after (b) is:
>>   PML4: 0x19db5c1003   entry:0x899bdcd2f000
>>    PDPE: 0x1a1cacb003  entry:0x89b35b5c1000
>>     PDE: 0x1a30a72003  entry:0x89b39cacb000
>>  PTE: 0x0  entry:0x89b3b0a72000
>>
>> The page table after (c) is:
>>   PML4: 0x19db5c1003   entry:0x899bdcd2f000
>>    PDPE: 0x1a1cacb003  entry:0x89b35b5c1000
>>     PDE: 0x21d200883   entry:0x89b39cacb000 (*)
>>
>> Because the PDE entry after (b) is present, it won't be
>> flushed even if the iommu driver flush cache when unmap,
>> so the obsolete data may be preserved in cache, which
>> would cause the wrong translation at end.
>>
>> However, we can see the PDE entry is finally switch to
>> 2M-superpage mapping, but it does not transform
>> to 0x21d200883 directly:
>>
>> 1. PDE: 0x1a30a72003
>> 2. __domain_mapping
>>   dma_pte_free_pagetable
>>     Set the PDE entry to ZERO
>>   Set the PDE entry to 0x21d200883
>>
>> So we must flush the cache after the entry switch to ZERO
>> to avoid the obsolete info be preserved.
>>
>> Cc: David Woodhouse 
>> Cc: Lu Baolu 
>> Cc: Nadav Amit 
>> Cc: Alex Williamson 
>> Cc: Kevin Tian 
>> Cc: Gonglei (Arei) 
>>
>> Fixes: 6491d4d02893 ("intel-iommu: Free old page tables before creating
>> superpage")
>> Cc:  # v3.0+
>> Link:
>> https://lore.kernel.org/linux-iommu/670baaf8-4ff8-4e84-4be3-030b95ab5...@huawei.com/
>>
>> Suggested-by: Lu Baolu 
>> Signed-off-by: Longpeng(Mike) 
>> ---
>>   drivers/iommu/intel/iommu.c | 15 +--
>>   1 file changed, 13 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
>> index ee09323..cbcb434 100644
>> --- a/drivers/iommu/intel/iommu.c
>> +++ b/drivers/iommu/intel/iommu.c
>> @@ -2342,9 +2342,20 @@ static inline int hardware_largepage_caps(struct
>> dmar_domain *domain,
>>    * removed to make room for superpage(s).
>>    * We're adding new large pages, so make sure
>>    * we don't remove their parent tables.
>> + *
>> + * We also need to flush the iotlb before creating
>> + * superpage to ensure it does not preserve any
>> + * obsolete info.
>>    */
>> -    dma_pte_free_pagetable(domain, iov_pfn, end_pfn,
>> -   largepage_lvl + 1);
>> +    if (dma_pte_present(pte)) {
>> +    int i;
>> +
>> +    dma_pte_free_pagetable(domain, iov_pfn, end_pfn,
>> +   largepage_lvl + 1);
>> +    for_each_domain_iommu(i, domain)
>> +    iommu_flush_iotlb_psi(g_iommus[i], domain,
>> +  iov_pfn, nr_pages, 0, 0);
> 
> Thanks for patch!
> 
> How about making the flushed page size accurate? For example,
> 
> @@ -2365,8 +2365,8 @@ __domain_mapping(struct dmar_domain *domain, unsigned 
> long
> iov_pfn,
>     dma_pte_free_pagetable(domain, 
> iov_pfn,
> end_pfn,
> 
> largepage_lvl + 1);
>     for_each_domain_iommu(i, domain)
> - iommu_flush_iotlb_psi(g_iommus[i], domain,
> - iov_pfn, nr_pages, 0, 0);
> + iommu_flush_iotlb_psi(g_iommus[i], domain, iov_pfn,
> + ALIGN_DOWN(nr_pages, lvl_pages), 0, 0);
> 
Yes, that makes sense.

Another alternative is 'end_pfn - iov_pfn + 1', which is readable because we
free the page table with (iov_pfn, end_pfn) above. Which one do you prefer?

> 
>> +    }
>>   } else {
>>   pteval &= ~(uint64_t)DMA_PTE_LARGE_PAGE;
>>   }
>>
> 
> Best regards,
> baolu
> .


Re: [PATCH] iommu/vt-d: Force to flush iotlb before creating superpage

2021-04-01 Thread Lu Baolu

Hi Longpeng,

On 4/1/21 3:18 PM, Longpeng(Mike) wrote:

The translation caches may preserve obsolete data when the
mapping size is changed, suppose the following sequence which
can reveal the problem with high probability.

1.mmap(4GB,MAP_HUGETLB)
2.
   while (1) {
(a)DMA MAP   0,0xa
(b)DMA UNMAP 0,0xa
(c)DMA MAP   0,0xc000
  * DMA read IOVA 0 may failure here (Not present)
  * if the problem occurs.
(d)DMA UNMAP 0,0xc000
   }

The page table(only focus on IOVA 0) after (a) is:
  PML4: 0x19db5c1003   entry:0x899bdcd2f000
   PDPE: 0x1a1cacb003  entry:0x89b35b5c1000
PDE: 0x1a30a72003  entry:0x89b39cacb000
 PTE: 0x21d200803  entry:0x89b3b0a72000

The page table after (b) is:
  PML4: 0x19db5c1003   entry:0x899bdcd2f000
   PDPE: 0x1a1cacb003  entry:0x89b35b5c1000
PDE: 0x1a30a72003  entry:0x89b39cacb000
 PTE: 0x0  entry:0x89b3b0a72000

The page table after (c) is:
  PML4: 0x19db5c1003   entry:0x899bdcd2f000
   PDPE: 0x1a1cacb003  entry:0x89b35b5c1000
PDE: 0x21d200883   entry:0x89b39cacb000 (*)

Because the PDE entry after (b) is present, it won't be
flushed even if the iommu driver flush cache when unmap,
so the obsolete data may be preserved in cache, which
would cause the wrong translation at end.

However, we can see the PDE entry is finally switch to
2M-superpage mapping, but it does not transform
to 0x21d200883 directly:

1. PDE: 0x1a30a72003
2. __domain_mapping
  dma_pte_free_pagetable
Set the PDE entry to ZERO
  Set the PDE entry to 0x21d200883

So we must flush the cache after the entry switch to ZERO
to avoid the obsolete info be preserved.

Cc: David Woodhouse 
Cc: Lu Baolu 
Cc: Nadav Amit 
Cc: Alex Williamson 
Cc: Kevin Tian 
Cc: Gonglei (Arei) 

Fixes: 6491d4d02893 ("intel-iommu: Free old page tables before creating 
superpage")
Cc:  # v3.0+
Link: 
https://lore.kernel.org/linux-iommu/670baaf8-4ff8-4e84-4be3-030b95ab5...@huawei.com/
Suggested-by: Lu Baolu 
Signed-off-by: Longpeng(Mike) 
---
  drivers/iommu/intel/iommu.c | 15 +--
  1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index ee09323..cbcb434 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -2342,9 +2342,20 @@ static inline int hardware_largepage_caps(struct 
dmar_domain *domain,
 * removed to make room for superpage(s).
 * We're adding new large pages, so make sure
 * we don't remove their parent tables.
+*
+* We also need to flush the iotlb before creating
+* superpage to ensure it does not preserve any
+* obsolete info.
 */
-   dma_pte_free_pagetable(domain, iov_pfn, end_pfn,
-  largepage_lvl + 1);
+   if (dma_pte_present(pte)) {
+   int i;
+
+   dma_pte_free_pagetable(domain, iov_pfn, end_pfn,
+                          largepage_lvl + 1);
+   for_each_domain_iommu(i, domain)
+       iommu_flush_iotlb_psi(g_iommus[i], domain,
+                             iov_pfn, nr_pages, 0, 0);


Thanks for the patch!

How about making the flushed page size accurate? For example,

@@ -2365,8 +2365,8 @@ __domain_mapping(struct dmar_domain *domain, unsigned long iov_pfn,
     dma_pte_free_pagetable(domain, iov_pfn, end_pfn,
                            largepage_lvl + 1);
     for_each_domain_iommu(i, domain)
-        iommu_flush_iotlb_psi(g_iommus[i], domain,
-                              iov_pfn, nr_pages, 0, 0);
+        iommu_flush_iotlb_psi(g_iommus[i], domain, iov_pfn,
+                              ALIGN_DOWN(nr_pages, lvl_pages), 0, 0);




+   }
} else {
pteval &= ~(uint64_t)DMA_PTE_LARGE_PAGE;
}



Best regards,
baolu


[PATCH] iommu/vt-d: Force to flush iotlb before creating superpage

2021-04-01 Thread Longpeng(Mike)
The translation caches may preserve obsolete data when the
mapping size is changed. Consider the following sequence, which
reproduces the problem with high probability.

1.mmap(4GB,MAP_HUGETLB)
2.
  while (1) {
   (a)DMA MAP   0,0xa
   (b)DMA UNMAP 0,0xa
   (c)DMA MAP   0,0xc000
 * A DMA read of IOVA 0 may fail here (Not present)
 * if the problem occurs.
   (d)DMA UNMAP 0,0xc000
  }

The page table(only focus on IOVA 0) after (a) is:
 PML4: 0x19db5c1003   entry:0x899bdcd2f000
  PDPE: 0x1a1cacb003  entry:0x89b35b5c1000
   PDE: 0x1a30a72003  entry:0x89b39cacb000
PTE: 0x21d200803  entry:0x89b3b0a72000

The page table after (b) is:
 PML4: 0x19db5c1003   entry:0x899bdcd2f000
  PDPE: 0x1a1cacb003  entry:0x89b35b5c1000
   PDE: 0x1a30a72003  entry:0x89b39cacb000
PTE: 0x0  entry:0x89b3b0a72000

The page table after (c) is:
 PML4: 0x19db5c1003   entry:0x899bdcd2f000
  PDPE: 0x1a1cacb003  entry:0x89b35b5c1000
   PDE: 0x21d200883   entry:0x89b39cacb000 (*)
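
As a reading aid for the dumps above, the low bits of each entry can be decoded with a few lines of user-space C. This is a sketch, not driver code: the TOY_* names are hypothetical, and the bit positions (bit 0 read, bit 1 write, bit 7 large page) are assumed to match the driver's DMA_PTE_READ, DMA_PTE_WRITE and DMA_PTE_LARGE_PAGE definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit layout; mirrors the driver's DMA_PTE_* flags as understood here. */
#define TOY_PTE_READ  (1ULL << 0)
#define TOY_PTE_WRITE (1ULL << 1)
#define TOY_PTE_LARGE (1ULL << 7)

/* An entry is treated as present if it allows read or write access. */
static bool toy_entry_present(uint64_t e)
{
    return (e & (TOY_PTE_READ | TOY_PTE_WRITE)) != 0;
}

/* True if the entry maps a superpage instead of pointing to a lower table. */
static bool toy_entry_large(uint64_t e)
{
    return (e & TOY_PTE_LARGE) != 0;
}

/* Strip the low 12 flag bits to get the 4K-aligned address field. */
static uint64_t toy_entry_addr(uint64_t e)
{
    return e & ~0xfffULL;
}
```

Applied to the dumps: the PDE 0x1a30a72003 after (a) and (b) is present and not large, so it points to a page table; the PTE value 0x0 after (b) is not present; the PDE 0x21d200883 after (c) is present with the large-page bit set, and its address field 0x21d200000 matches the one in the old 4K PTE 0x21d200803.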

Because the PDE after (b) is still present, it is not
flushed even though the iommu driver flushes the cache on unmap,
so obsolete data may be preserved in the cache, which
would eventually cause a wrong translation.

However, although the PDE is finally switched to a
2M superpage mapping, it does not change
to 0x21d200883 directly:

1. PDE: 0x1a30a72003
2. __domain_mapping
 dma_pte_free_pagetable
   Set the PDE entry to ZERO
 Set the PDE entry to 0x21d200883

So we must flush the cache after the entry is switched to ZERO,
to avoid preserving obsolete info.
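
The failure described above can be illustrated with a deliberately simplified user-space model. Everything in it is an assumption made for illustration: a single cached copy of the PDE stands in for the hardware translation caches, the unmap in step (b) is modelled as leaving that copy in place, and all toy_* names are hypothetical. Real hardware caching behaviour is more involved than this.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_PRESENT 0x3ULL          /* assumed: read | write bits */
#define TOY_LARGE   (1ULL << 7)     /* assumed: large-page bit */
#define TOY_ADDR(e) ((e) & ~0xfffULL)

struct toy_iommu {
    uint64_t pde;         /* PDE as stored in memory */
    uint64_t pte;         /* first PTE of the old 4K page table */
    uint64_t cached_pde;  /* toy stand-in for a cached copy of the PDE */
    bool cache_valid;
};

/* Translate IOVA 0. Returns the page base address, or 0 on a
 * "not present" fault. A valid cached PDE wins over memory. */
static uint64_t toy_walk(struct toy_iommu *m)
{
    uint64_t pde = m->cache_valid ? m->cached_pde : m->pde;

    if (!(pde & TOY_PRESENT))
        return 0;
    m->cached_pde = pde;            /* only present entries get cached */
    m->cache_valid = true;
    if (pde & TOY_LARGE)
        return TOY_ADDR(pde);
    return (m->pte & TOY_PRESENT) ? TOY_ADDR(m->pte) : 0;
}

/* Replay steps (a)..(c) and report what a DMA read of IOVA 0 sees. */
static uint64_t toy_run(bool flush_after_clear)
{
    struct toy_iommu m = { .pde = 0x1a30a72003ULL, .pte = 0x21d200803ULL };

    toy_walk(&m);                   /* (a) 4K mapping used, PDE gets cached */
    m.pte = 0;                      /* (b) unmap clears the PTE only */
    m.pde = 0;                      /* (c) free_pagetable zeroes the PDE */
    if (flush_after_clear)
        m.cache_valid = false;      /* the flush this patch adds */
    m.pde = 0x21d200883ULL;         /* (c) superpage entry is written */
    return toy_walk(&m);
}
```

In this model toy_run(false) returns 0, because the stale PDE still leads to the cleared PTE, while toy_run(true) returns 0x21d200000, the superpage base.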

Cc: David Woodhouse 
Cc: Lu Baolu 
Cc: Nadav Amit 
Cc: Alex Williamson 
Cc: Kevin Tian 
Cc: Gonglei (Arei) 

Fixes: 6491d4d02893 ("intel-iommu: Free old page tables before creating 
superpage")
Cc:  # v3.0+
Link: 
https://lore.kernel.org/linux-iommu/670baaf8-4ff8-4e84-4be3-030b95ab5...@huawei.com/
Suggested-by: Lu Baolu 
Signed-off-by: Longpeng(Mike) 
---
 drivers/iommu/intel/iommu.c | 15 +--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index ee09323..cbcb434 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -2342,9 +2342,20 @@ static inline int hardware_largepage_caps(struct dmar_domain *domain,
                         * removed to make room for superpage(s).
                         * We're adding new large pages, so make sure
                         * we don't remove their parent tables.
+                        *
+                        * We also need to flush the iotlb before creating
+                        * superpage to ensure it does not preserve any
+                        * obsolete info.
                         */
-                       dma_pte_free_pagetable(domain, iov_pfn, end_pfn,
-                                              largepage_lvl + 1);
+                       if (dma_pte_present(pte)) {
+                               int i;
+
+                               dma_pte_free_pagetable(domain, iov_pfn, end_pfn,
+                                                      largepage_lvl + 1);
+                               for_each_domain_iommu(i, domain)
+                                       iommu_flush_iotlb_psi(g_iommus[i], domain,
+                                                             iov_pfn, nr_pages, 0, 0);
+                       }
                } else {
                        pteval &= ~(uint64_t)DMA_PTE_LARGE_PAGE;
                }
-- 
1.8.3.1