[PATCH v2] nouveau/hmm: map pages after migration

2020-03-02 Thread Ralph Campbell
When memory is migrated to the GPU, it is likely to be accessed by GPU
code soon afterwards. Instead of waiting for a GPU fault, map the
migrated memory into the GPU page tables with the same access permissions
as the source CPU page table entries. This preserves copy on write
semantics.

Signed-off-by: Ralph Campbell 
Cc: Christoph Hellwig 
Cc: Jason Gunthorpe 
Cc: "Jérôme Glisse" 
Cc: Ben Skeggs 
---

Originally this patch was targeted for Jason's rdma tree since other HMM
related changes were queued there. Now that those have been merged, this
patch just contains changes to nouveau so it could go through any tree.
I guess Ben Skeggs' tree would be appropriate.

Changes since v1:
 Rebase to linux-5.6.0-rc4
 Address Christoph Hellwig's comments

 drivers/gpu/drm/nouveau/nouveau_dmem.c | 44 -
 drivers/gpu/drm/nouveau/nouveau_svm.c  | 85 ++
 drivers/gpu/drm/nouveau/nouveau_svm.h  |  5 ++
 3 files changed, 118 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_dmem.c b/drivers/gpu/drm/nouveau/nouveau_dmem.c
index 0ad5d87b5a8e..172e0c98cec5 100644
--- a/drivers/gpu/drm/nouveau/nouveau_dmem.c
+++ b/drivers/gpu/drm/nouveau/nouveau_dmem.c
@@ -25,11 +25,13 @@
 #include "nouveau_dma.h"
 #include "nouveau_mem.h"
 #include "nouveau_bo.h"
+#include "nouveau_svm.h"
 
 #include 
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -558,10 +560,11 @@ nouveau_dmem_init(struct nouveau_drm *drm)
 }
 
 static unsigned long nouveau_dmem_migrate_copy_one(struct nouveau_drm *drm,
-   unsigned long src, dma_addr_t *dma_addr)
+   unsigned long src, dma_addr_t *dma_addr, u64 *pfn)
 {
struct device *dev = drm->dev->dev;
struct page *dpage, *spage;
+   unsigned long paddr;
 
spage = migrate_pfn_to_page(src);
if (!spage || !(src & MIGRATE_PFN_MIGRATE))
@@ -569,17 +572,21 @@ static unsigned long nouveau_dmem_migrate_copy_one(struct 
nouveau_drm *drm,
 
dpage = nouveau_dmem_page_alloc_locked(drm);
if (!dpage)
-   return 0;
+   goto out;
 
*dma_addr = dma_map_page(dev, spage, 0, PAGE_SIZE, DMA_BIDIRECTIONAL);
if (dma_mapping_error(dev, *dma_addr))
goto out_free_page;
 
+   paddr = nouveau_dmem_page_addr(dpage);
if (drm->dmem->migrate.copy_func(drm, 1, NOUVEAU_APER_VRAM,
-   nouveau_dmem_page_addr(dpage), NOUVEAU_APER_HOST,
-   *dma_addr))
+   paddr, NOUVEAU_APER_HOST, *dma_addr))
goto out_dma_unmap;
 
+   *pfn = NVIF_VMM_PFNMAP_V0_V | NVIF_VMM_PFNMAP_V0_VRAM |
+   ((paddr >> PAGE_SHIFT) << NVIF_VMM_PFNMAP_V0_ADDR_SHIFT);
+   if (src & MIGRATE_PFN_WRITE)
+   *pfn |= NVIF_VMM_PFNMAP_V0_W;
return migrate_pfn(page_to_pfn(dpage)) | MIGRATE_PFN_LOCKED;
 
 out_dma_unmap:
@@ -587,18 +594,19 @@ static unsigned long nouveau_dmem_migrate_copy_one(struct 
nouveau_drm *drm,
 out_free_page:
nouveau_dmem_page_free_locked(drm, dpage);
 out:
+   *pfn = NVIF_VMM_PFNMAP_V0_NONE;
return 0;
 }
 
 static void nouveau_dmem_migrate_chunk(struct nouveau_drm *drm,
-   struct migrate_vma *args, dma_addr_t *dma_addrs)
+   struct migrate_vma *args, dma_addr_t *dma_addrs, u64 *pfns)
 {
struct nouveau_fence *fence;
unsigned long addr = args->start, nr_dma = 0, i;
 
for (i = 0; addr < args->end; i++) {
args->dst[i] = nouveau_dmem_migrate_copy_one(drm, args->src[i],
-   dma_addrs + nr_dma);
+   dma_addrs + nr_dma, pfns + i);
if (args->dst[i])
nr_dma++;
addr += PAGE_SIZE;
@@ -607,15 +615,12 @@ static void nouveau_dmem_migrate_chunk(struct nouveau_drm 
*drm,
	nouveau_fence_new(drm->dmem->migrate.chan, false, &fence);
migrate_vma_pages(args);
	nouveau_dmem_fence_done(&fence);
+   nouveau_pfns_map(drm, args->vma->vm_mm, args->start, pfns, i);
 
while (nr_dma--) {
dma_unmap_page(drm->dev->dev, dma_addrs[nr_dma], PAGE_SIZE,
DMA_BIDIRECTIONAL);
}
-   /*
-* FIXME optimization: update GPU page table to point to newly migrated
-* memory.
-*/
migrate_vma_finalize(args);
 }
 
@@ -632,7 +637,8 @@ nouveau_dmem_migrate_vma(struct nouveau_drm *drm,
.vma= vma,
.start  = start,
};
-   unsigned long c, i;
+   unsigned long i;
+   u64 *pfns;
int ret = -ENOMEM;
 
args.src = kcalloc(max, sizeof(*args.src), GFP_KERNEL);
@@ -646,19 +652,25 @@ nouveau_dmem_migrate_vma(struct nouveau_drm *drm,
if (!dma_addrs)
 

Re: [PATCH hmm 9/8] mm/hmm: do not check pmd_protnone twice in hmm_vma_handle_pmd()

2020-03-12 Thread Ralph Campbell



On 3/12/20 12:33 PM, Jason Gunthorpe wrote:

pmd_to_hmm_pfn_flags() already checks it and makes the cpu flags 0. If no
fault is requested then the pfns should be returned with the not valid
flags.

It should not unconditionally fault if faulting is not requested.

Fixes: 2aee09d8c116 ("mm/hmm: change hmm_vma_fault() to allow write fault on page basis")
Signed-off-by: Jason Gunthorpe 


Looks good to me.
Reviewed-by: Ralph Campbell 


---
  mm/hmm.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

Bonus patch, this one got found after I made the series..

diff --git a/mm/hmm.c b/mm/hmm.c
index ca33d086bdc190..6d9da4b0f0a9f8 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -226,7 +226,7 @@ static int hmm_vma_handle_pmd(struct mm_walk *walk, 
unsigned long addr,
hmm_range_need_fault(hmm_vma_walk, pfns, npages, cpu_flags,
 &fault, &write_fault);
  
-	if (pmd_protnone(pmd) || fault || write_fault)

+   if (fault || write_fault)
return hmm_vma_walk_hole_(addr, end, fault, write_fault, walk);
  
  	pfn = pmd_pfn(pmd) + ((addr & ~PMD_MASK) >> PAGE_SHIFT);





Re: [PATCH hmm 1/8] mm/hmm: add missing unmaps of the ptep during hmm_vma_handle_pte()

2020-03-11 Thread Ralph Campbell



On 3/11/20 11:34 AM, Jason Gunthorpe wrote:

From: Jason Gunthorpe 

Many of the direct returns of error skipped doing the pte_unmap(). All non
zero exit paths must unmap the pte.

The pte_unmap() is split unnaturally like this because some of the error
exit paths trigger a sleep and must release the lock before sleeping.

Fixes: 992de9a8b751 ("mm/hmm: allow to mirror vma of a file on a DAX backed filesystem")
Fixes: 53f5c3f489ec ("mm/hmm: factor out pte and pmd handling to simplify hmm_vma_walk_pmd()")
Signed-off-by: Jason Gunthorpe 


The changes look OK to me but one issue noted below.
In any case, you can add:
Reviewed-by: Ralph Campbell 


---
  mm/hmm.c | 8 ++--
  1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/mm/hmm.c b/mm/hmm.c
index 72e5a6d9a41756..35f85424176d14 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -325,6 +325,7 @@ static int hmm_vma_handle_pte(struct mm_walk *walk, 
unsigned long addr,
}
  
  		/* Report error for everything else */

+   pte_unmap(ptep);
*pfn = range->values[HMM_PFN_ERROR];
return -EFAULT;
} else {
@@ -339,10 +340,13 @@ static int hmm_vma_handle_pte(struct mm_walk *walk, 
unsigned long addr,
if (pte_devmap(pte)) {
hmm_vma_walk->pgmap = get_dev_pagemap(pte_pfn(pte),
  hmm_vma_walk->pgmap);
-   if (unlikely(!hmm_vma_walk->pgmap))
+   if (unlikely(!hmm_vma_walk->pgmap)) {
+   pte_unmap(ptep);
return -EBUSY;
+   }
	} else if (IS_ENABLED(CONFIG_ARCH_HAS_PTE_SPECIAL) && pte_special(pte)) {
if (!is_zero_pfn(pte_pfn(pte))) {
+   pte_unmap(ptep);
*pfn = range->values[HMM_PFN_SPECIAL];
return -EFAULT;
}
@@ -437,7 +441,7 @@ static int hmm_vma_walk_pmd(pmd_t *pmdp,
  
  		r = hmm_vma_handle_pte(walk, addr, end, pmdp, ptep, &pfns[i]);

if (r) {
-   /* hmm_vma_handle_pte() did unmap pte directory */
+   /* hmm_vma_handle_pte() did pte_unmap() */
hmm_vma_walk->last = addr;
return r;
}



I think there is a case where hmm_vma_handle_pte() is called with a fault
requested, it calls pte_unmap() and then hmm_vma_walk_hole_(), and the latter
returns zero (the fault was handled OK). But now the page table is unmapped
for the successive loop iterations, and pte_unmap(ptep - 1) is called again at
the end of the loop.
Since this isn't an issue on x86_64 but is on x86_32, I can see how it could
be missed.
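
To spell out the flow I mean, roughly (a paraphrased sketch of the
hmm_vma_walk_pmd() loop, not the verbatim kernel code):

	ptep = pte_offset_map(pmdp, addr);
	for (; addr < end; addr += PAGE_SIZE, ptep++, i++) {
		int r;

		/*
		 * If a fault is needed, hmm_vma_handle_pte() calls
		 * pte_unmap(ptep) and then hmm_vma_walk_hole_().
		 */
		r = hmm_vma_handle_pte(walk, addr, end, pmdp, ptep, &pfns[i]);
		if (r)
			return r;
		/*
		 * If hmm_vma_walk_hole_() returned 0, r is 0 and we keep
		 * looping with a ptep that was already unmapped ...
		 */
	}
	pte_unmap(ptep - 1);	/* ... and unmap it a second time here. */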


Re: [PATCH hmm 2/8] mm/hmm: don't free the cached pgmap while scanning

2020-03-11 Thread Ralph Campbell



On 3/11/20 11:35 AM, Jason Gunthorpe wrote:

From: Jason Gunthorpe 

The pgmap is held in the hmm_vma_walk variable in hope of speeding up
future get_dev_pagemap() calls by hitting the same pointer. The algorithm
doesn't actually care about how long the pgmap is held for.

Move the put of the cached pgmap to after the walk is completed and delete
all the other now redundant puts.

This solves a possible leak of the reference in hmm_vma_walk_pmd() if a
hmm_vma_handle_pte() fails while looping.

Fixes: 992de9a8b751 ("mm/hmm: allow to mirror vma of a file on a DAX backed filesystem")
Signed-off-by: Jason Gunthorpe 


Reviewed-by: Ralph Campbell 


---
  mm/hmm.c | 31 +--
  1 file changed, 9 insertions(+), 22 deletions(-)

We talked about just deleting this stuff, but I think it makes a lot of sense
for hmm_range_fault() to trigger a fault on devmap pages that are not
compatible with the caller - so let's just fix the leak on the error path for now.

diff --git a/mm/hmm.c b/mm/hmm.c
index 35f85424176d14..9e8f68eb83287a 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -239,10 +239,6 @@ static int hmm_vma_handle_pmd(struct mm_walk *walk, 
unsigned long addr,
}
pfns[i] = hmm_device_entry_from_pfn(range, pfn) | cpu_flags;
}
-   if (hmm_vma_walk->pgmap) {
-   put_dev_pagemap(hmm_vma_walk->pgmap);
-   hmm_vma_walk->pgmap = NULL;
-   }
hmm_vma_walk->last = end;
return 0;
  }
@@ -360,10 +356,6 @@ static int hmm_vma_handle_pte(struct mm_walk *walk, 
unsigned long addr,
return 0;
  
  fault:

-   if (hmm_vma_walk->pgmap) {
-   put_dev_pagemap(hmm_vma_walk->pgmap);
-   hmm_vma_walk->pgmap = NULL;
-   }
pte_unmap(ptep);
/* Fault any virtual address we were asked to fault */
return hmm_vma_walk_hole_(addr, end, fault, write_fault, walk);
@@ -446,16 +438,6 @@ static int hmm_vma_walk_pmd(pmd_t *pmdp,
return r;
}
}
-   if (hmm_vma_walk->pgmap) {
-   /*
-* We do put_dev_pagemap() here and not in hmm_vma_handle_pte()
-* so that we can leverage get_dev_pagemap() optimization which
-* will not re-take a reference on a pgmap if we already have
-* one.
-*/
-   put_dev_pagemap(hmm_vma_walk->pgmap);
-   hmm_vma_walk->pgmap = NULL;
-   }
pte_unmap(ptep - 1);
  
  	hmm_vma_walk->last = addr;

@@ -529,10 +511,6 @@ static int hmm_vma_walk_pud(pud_t *pudp, unsigned long 
start, unsigned long end,
pfns[i] = hmm_device_entry_from_pfn(range, pfn) |
  cpu_flags;
}
-   if (hmm_vma_walk->pgmap) {
-   put_dev_pagemap(hmm_vma_walk->pgmap);
-   hmm_vma_walk->pgmap = NULL;
-   }
hmm_vma_walk->last = end;
goto out_unlock;
}
@@ -694,6 +672,15 @@ long hmm_range_fault(struct hmm_range *range, unsigned int 
flags)
return -EBUSY;
ret = walk_page_range(mm, hmm_vma_walk.last, range->end,
  &hmm_walk_ops, &hmm_vma_walk);
+   /*
+* A pgmap is kept cached in the hmm_vma_walk to avoid expensive
+* searching in the probably common case that the pgmap is the
+* same for the entire requested range.
+*/
+   if (hmm_vma_walk.pgmap) {
+   put_dev_pagemap(hmm_vma_walk.pgmap);
+   hmm_vma_walk.pgmap = NULL;
+   }
} while (ret == -EBUSY);
  
  	if (ret)





Re: [PATCH hmm 3/8] mm/hmm: do not call hmm_vma_walk_hole() while holding a spinlock

2020-03-11 Thread Ralph Campbell



On 3/11/20 11:35 AM, Jason Gunthorpe wrote:

From: Jason Gunthorpe 

This eventually calls into handle_mm_fault() which is a sleeping function.
Release the lock first.

hmm_vma_walk_hole() does not touch the contents of the PUD, so it does not
need the lock.

Fixes: 3afc423632a1 ("mm: pagewalk: add p4d_entry() and pgd_entry()")
Cc: Steven Price 
Signed-off-by: Jason Gunthorpe 


Reviewed-by: Ralph Campbell 


---
  mm/hmm.c | 14 +++---
  1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/mm/hmm.c b/mm/hmm.c
index 9e8f68eb83287a..32dcbfd3908315 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -473,8 +473,8 @@ static int hmm_vma_walk_pud(pud_t *pudp, unsigned long 
start, unsigned long end,
  
  	pud = READ_ONCE(*pudp);

if (pud_none(pud)) {
-   ret = hmm_vma_walk_hole(start, end, -1, walk);
-   goto out_unlock;
+   spin_unlock(ptl);
+   return hmm_vma_walk_hole(start, end, -1, walk);
}
  
  	if (pud_huge(pud) && pud_devmap(pud)) {

@@ -483,8 +483,8 @@ static int hmm_vma_walk_pud(pud_t *pudp, unsigned long 
start, unsigned long end,
bool fault, write_fault;
  
  		if (!pud_present(pud)) {

-   ret = hmm_vma_walk_hole(start, end, -1, walk);
-   goto out_unlock;
+   spin_unlock(ptl);
+   return hmm_vma_walk_hole(start, end, -1, walk);
}
  
  		i = (addr - range->start) >> PAGE_SHIFT;

@@ -495,9 +495,9 @@ static int hmm_vma_walk_pud(pud_t *pudp, unsigned long 
start, unsigned long end,
hmm_range_need_fault(hmm_vma_walk, pfns, npages,
 cpu_flags, &fault, &write_fault);
if (fault || write_fault) {
-   ret = hmm_vma_walk_hole_(addr, end, fault,
-write_fault, walk);
-   goto out_unlock;
+   spin_unlock(ptl);
+   return hmm_vma_walk_hole_(addr, end, fault, write_fault,
+ walk);
}
  
  		pfn = pud_pfn(pud) + ((addr & ~PUD_MASK) >> PAGE_SHIFT);





Re: [PATCH hmm 5/8] mm/hmm: add missing call to hmm_range_need_fault() before returning EFAULT

2020-03-11 Thread Ralph Campbell



On 3/11/20 11:35 AM, Jason Gunthorpe wrote:

From: Jason Gunthorpe 

All return paths that do EFAULT must call hmm_range_need_fault() to
determine if the user requires this page to be valid.

If the page cannot be made valid if the user later requires it, due to vma
flags in this case, then the return should be HMM_PFN_ERROR.

Fixes: a3e0d41c2b1f ("mm/hmm: improve driver API to work and wait over a range")
Signed-off-by: Jason Gunthorpe 


Reviewed-by: Ralph Campbell 


---
  mm/hmm.c | 19 ---
  1 file changed, 8 insertions(+), 11 deletions(-)

diff --git a/mm/hmm.c b/mm/hmm.c
index 5f5ccf13dd1e85..e10cd0adba7b37 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -582,18 +582,15 @@ static int hmm_vma_walk_test(unsigned long start, 
unsigned long end,
struct vm_area_struct *vma = walk->vma;
  
  	/*

-* Skip vma ranges that don't have struct page backing them or
-* map I/O devices directly.
-*/
-   if (vma->vm_flags & (VM_IO | VM_PFNMAP | VM_MIXEDMAP))
-   return -EFAULT;
-
-   /*
+* Skip vma ranges that don't have struct page backing them or map I/O
+* devices directly.
+*
 * If the vma does not allow read access, then assume that it does not
-* allow write access either. HMM does not support architectures
-* that allow write without read.
+* allow write access either. HMM does not support architectures that
+* allow write without read.
 */
-   if (!(vma->vm_flags & VM_READ)) {
+   if ((vma->vm_flags & (VM_IO | VM_PFNMAP | VM_MIXEDMAP)) ||
+   !(vma->vm_flags & VM_READ)) {
bool fault, write_fault;
  
  		/*

@@ -607,7 +604,7 @@ static int hmm_vma_walk_test(unsigned long start, unsigned 
long end,
if (fault || write_fault)
return -EFAULT;
  
-		hmm_pfns_fill(start, end, range, HMM_PFN_NONE);

+   hmm_pfns_fill(start, end, range, HMM_PFN_ERROR);
hmm_vma_walk->last = end;
  
  		/* Skip this vma and continue processing the next vma. */





Re: [PATCH hmm 4/8] mm/hmm: add missing pfns set to hmm_vma_walk_pmd()

2020-03-11 Thread Ralph Campbell



On 3/11/20 11:35 AM, Jason Gunthorpe wrote:

From: Jason Gunthorpe 

All success exit paths from the walker functions must set the pfns array.

A migration entry with no required fault is a HMM_PFN_NONE return, just
like the pte case.

Fixes: d08faca018c4 ("mm/hmm: properly handle migration pmd")
Signed-off-by: Jason Gunthorpe 


Reviewed-by: Ralph Campbell 


---
  mm/hmm.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/hmm.c b/mm/hmm.c
index 32dcbfd3908315..5f5ccf13dd1e85 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -394,7 +394,7 @@ static int hmm_vma_walk_pmd(pmd_t *pmdp,
pmd_migration_entry_wait(walk->mm, pmdp);
return -EBUSY;
}
-   return 0;
+   return hmm_pfns_fill(start, end, range, HMM_PFN_NONE);
} else if (!pmd_present(pmd))
return hmm_pfns_fill(start, end, range, HMM_PFN_ERROR);
  




Re: [PATCH hmm 6/8] mm/hmm: reorganize how !pte_present is handled in hmm_vma_handle_pte()

2020-03-11 Thread Ralph Campbell



On 3/11/20 11:35 AM, Jason Gunthorpe wrote:

From: Jason Gunthorpe 

The intention with this code is to determine if the caller required the
pages to be valid, and if so, then take some action to make them valid.
The action varies depending on the page type.

In all cases, if the caller doesn't ask for the page, then
hmm_range_fault() should not return an error.

Revise the implementation to be clearer, and fix some bugs:

  - hmm_pte_need_fault() must always be called before testing fault or
write_fault otherwise the defaults of false apply and the if()'s don't
work. This was missed on the is_migration_entry() branch

  - -EFAULT should not be returned unless hmm_pte_need_fault() indicates
fault is required - ie snapshotting should not fail.

  - For !pte_present() the cpu_flags are always 0, except in the special
case of is_device_private_entry(), calling pte_to_hmm_pfn_flags() is
confusing.

Reorganize the flow so that it always follows the pattern of calling
hmm_pte_need_fault() and then checking fault || write_fault.

Fixes: 2aee09d8c116 ("mm/hmm: change hmm_vma_fault() to allow write fault on page basis")
Signed-off-by: Jason Gunthorpe 


Reviewed-by: Ralph Campbell 


---
  mm/hmm.c | 35 +++
  1 file changed, 15 insertions(+), 20 deletions(-)

diff --git a/mm/hmm.c b/mm/hmm.c
index e10cd0adba7b37..bf676cfef3e8ee 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -282,15 +282,6 @@ static int hmm_vma_handle_pte(struct mm_walk *walk, 
unsigned long addr,
if (!pte_present(pte)) {
swp_entry_t entry = pte_to_swp_entry(pte);
  
-		if (!non_swap_entry(entry)) {

-   cpu_flags = pte_to_hmm_pfn_flags(range, pte);
-   hmm_pte_need_fault(hmm_vma_walk, orig_pfn, cpu_flags,
-  &fault, &write_fault);
-   if (fault || write_fault)
-   goto fault;
-   return 0;
-   }
-
/*
 * This is a special swap entry, ignore migration, use
 * device and report anything else as error.
@@ -310,26 +301,30 @@ static int hmm_vma_handle_pte(struct mm_walk *walk, 
unsigned long addr,
return 0;
}
  
-		if (is_migration_entry(entry)) {

-   if (fault || write_fault) {
-   pte_unmap(ptep);
-   hmm_vma_walk->last = addr;
-   migration_entry_wait(walk->mm, pmdp, addr);
-   return -EBUSY;
-   }
+   hmm_pte_need_fault(hmm_vma_walk, orig_pfn, 0, &fault,
+  &write_fault);
+   if (!fault && !write_fault)
return 0;
+
+   if (!non_swap_entry(entry))
+   goto fault;
+
+   if (is_migration_entry(entry)) {
+   pte_unmap(ptep);
+   hmm_vma_walk->last = addr;
+   migration_entry_wait(walk->mm, pmdp, addr);
+   return -EBUSY;
}
  
  		/* Report error for everything else */

pte_unmap(ptep);
*pfn = range->values[HMM_PFN_ERROR];
return -EFAULT;
-   } else {
-   cpu_flags = pte_to_hmm_pfn_flags(range, pte);
-   hmm_pte_need_fault(hmm_vma_walk, orig_pfn, cpu_flags,
-  &fault, &write_fault);
}
  
+	cpu_flags = pte_to_hmm_pfn_flags(range, pte);

+   hmm_pte_need_fault(hmm_vma_walk, orig_pfn, cpu_flags, &fault,
+  &write_fault);
if (fault || write_fault)
goto fault;
  




Re: [PATCH hmm 7/8] mm/hmm: return -EFAULT when setting HMM_PFN_ERROR on requested valid pages

2020-03-11 Thread Ralph Campbell



On 3/11/20 11:35 AM, Jason Gunthorpe wrote:

From: Jason Gunthorpe 

hmm_range_fault() should never return 0 if the caller requested a valid
page, but the pfns output for that page would be HMM_PFN_ERROR.

hmm_pte_need_fault() must always be called before setting HMM_PFN_ERROR to
detect if the page is in faulting mode or not.

Fix two cases in hmm_vma_walk_pmd() and reorganize some of the duplicated
code.

Fixes: d08faca018c4 ("mm/hmm: properly handle migration pmd")
Fixes: da4c3c735ea4 ("mm/hmm/mirror: helper to snapshot CPU page table")
Signed-off-by: Jason Gunthorpe 


Reviewed-by: Ralph Campbell 


---
  mm/hmm.c | 38 +-
  1 file changed, 21 insertions(+), 17 deletions(-)

diff --git a/mm/hmm.c b/mm/hmm.c
index bf676cfef3e8ee..f61fddf2ef6505 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -363,8 +363,10 @@ static int hmm_vma_walk_pmd(pmd_t *pmdp,
  {
struct hmm_vma_walk *hmm_vma_walk = walk->private;
struct hmm_range *range = hmm_vma_walk->range;
-   uint64_t *pfns = range->pfns;
-   unsigned long addr = start, i;
+   uint64_t *pfns = &range->pfns[(start - range->start) >> PAGE_SHIFT];
+   unsigned long npages = (end - start) >> PAGE_SHIFT;
+   unsigned long addr = start;
+   bool fault, write_fault;
pte_t *ptep;
pmd_t pmd;
  
@@ -374,14 +376,6 @@ static int hmm_vma_walk_pmd(pmd_t *pmdp,

return hmm_vma_walk_hole(start, end, -1, walk);
  
  	if (thp_migration_supported() && is_pmd_migration_entry(pmd)) {

-   bool fault, write_fault;
-   unsigned long npages;
-   uint64_t *pfns;
-
-   i = (addr - range->start) >> PAGE_SHIFT;
-   npages = (end - addr) >> PAGE_SHIFT;
-   pfns = &range->pfns[i];
-
hmm_range_need_fault(hmm_vma_walk, pfns, npages,
 0, &fault, &write_fault);
if (fault || write_fault) {
@@ -390,8 +384,15 @@ static int hmm_vma_walk_pmd(pmd_t *pmdp,
return -EBUSY;
}
return hmm_pfns_fill(start, end, range, HMM_PFN_NONE);
-   } else if (!pmd_present(pmd))
+   }
+
+   if (!pmd_present(pmd)) {
+   hmm_range_need_fault(hmm_vma_walk, pfns, npages, 0, &fault,
+    &write_fault);
+   if (fault || write_fault)
+   return -EFAULT;
return hmm_pfns_fill(start, end, range, HMM_PFN_ERROR);


Shouldn't this fill with HMM_PFN_NONE instead of HMM_PFN_ERROR?
Otherwise, when a THP is swapped out, you will get a different
value than if a PTE is swapped out and you are prefetching/snapshotting.


+   }
  
  	if (pmd_devmap(pmd) || pmd_trans_huge(pmd)) {

/*
@@ -408,8 +409,7 @@ static int hmm_vma_walk_pmd(pmd_t *pmdp,
if (!pmd_devmap(pmd) && !pmd_trans_huge(pmd))
goto again;
  
-		i = (addr - range->start) >> PAGE_SHIFT;

-   return hmm_vma_handle_pmd(walk, addr, end, &pfns[i], pmd);
+   return hmm_vma_handle_pmd(walk, addr, end, pfns, pmd);
}
  
  	/*

@@ -418,15 +418,19 @@ static int hmm_vma_walk_pmd(pmd_t *pmdp,
 * entry pointing to pte directory or it is a bad pmd that will not
 * recover.
 */
-   if (pmd_bad(pmd))
+   if (pmd_bad(pmd)) {
+   hmm_range_need_fault(hmm_vma_walk, pfns, npages, 0, &fault,
+    &write_fault);
+   if (fault || write_fault)
+   return -EFAULT;
return hmm_pfns_fill(start, end, range, HMM_PFN_ERROR);
+   }
  
  	ptep = pte_offset_map(pmdp, addr);

-   i = (addr - range->start) >> PAGE_SHIFT;
-   for (; addr < end; addr += PAGE_SIZE, ptep++, i++) {
+   for (; addr < end; addr += PAGE_SIZE, ptep++, pfns++) {
int r;
  
-		r = hmm_vma_handle_pte(walk, addr, end, pmdp, ptep, &pfns[i]);

+   r = hmm_vma_handle_pte(walk, addr, end, pmdp, ptep, pfns);
if (r) {
/* hmm_vma_handle_pte() did pte_unmap() */
hmm_vma_walk->last = addr;




Re: [PATCH hmm 8/8] mm/hmm: add missing call to hmm_pte_need_fault in HMM_PFN_SPECIAL handling

2020-03-11 Thread Ralph Campbell



On 3/11/20 11:35 AM, Jason Gunthorpe wrote:

From: Jason Gunthorpe 

Currently if a special PTE is encountered hmm_range_fault() immediately
returns EFAULT and sets the HMM_PFN_SPECIAL error output (which nothing
uses).

EFAULT should only be returned after testing with hmm_pte_need_fault().

Also pte_devmap() and pte_special() are exclusive, and there is no need to
check IS_ENABLED, pte_special() is stubbed out to return false on
unsupported architectures.

Fixes: 992de9a8b751 ("mm/hmm: allow to mirror vma of a file on a DAX backed filesystem")
Signed-off-by: Jason Gunthorpe 


Reviewed-by: Ralph Campbell 


---
  mm/hmm.c | 19 ---
  1 file changed, 12 insertions(+), 7 deletions(-)

diff --git a/mm/hmm.c b/mm/hmm.c
index f61fddf2ef6505..ca33d086bdc190 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -335,16 +335,21 @@ static int hmm_vma_handle_pte(struct mm_walk *walk, 
unsigned long addr,
pte_unmap(ptep);
return -EBUSY;
}
-   } else if (IS_ENABLED(CONFIG_ARCH_HAS_PTE_SPECIAL) && pte_special(pte)) {
-   if (!is_zero_pfn(pte_pfn(pte))) {
+   }
+
+   /*
+* Since each architecture defines a struct page for the zero page, just
+* fall through and treat it like a normal page.
+*/
+   if (pte_special(pte) && !is_zero_pfn(pte_pfn(pte))) {
+   hmm_pte_need_fault(hmm_vma_walk, orig_pfn, 0, &fault,
+  &write_fault);
+   if (fault || write_fault) {
pte_unmap(ptep);
-   *pfn = range->values[HMM_PFN_SPECIAL];
return -EFAULT;
}
-   /*
-* Since each architecture defines a struct page for the zero
-* page, just fall through and treat it like a normal page.
-*/
+   *pfn = range->values[HMM_PFN_SPECIAL];
+   return 0;
}
  
  	*pfn = hmm_device_entry_from_pfn(range, pte_pfn(pte)) | cpu_flags;





Re: [PATCH 2/2] mm: remove device private page support from hmm_range_fault

2020-03-16 Thread Ralph Campbell



On 3/16/20 10:52 AM, Christoph Hellwig wrote:

No driver has actually properly wired up and supported this feature.
There is various code related to it in nouveau, but as far as I can tell
it never actually got turned on, and the only changes since the initial
commit are global cleanups.


This is not actually true. OpenCL 2.x does support SVM with nouveau and
device private memory via clEnqueueSVMMigrateMem().
Also, Ben Skeggs has accepted a set of patches to map GPU memory after being
migrated and this change would conflict with that.
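
For reference, the user-visible path is roughly the following (an illustrative
OpenCL 2.x host-side sketch, error handling omitted; it assumes a context and
queue already created on the nouveau device):

	size_t size = 1 << 20;
	/* Shared virtual memory allocation, visible to both CPU and GPU. */
	void *buf = clSVMAlloc(ctx, CL_MEM_READ_WRITE, size, 0);

	/* Ask the runtime to migrate the allocation to device (VRAM) memory. */
	clEnqueueSVMMigrateMem(queue, 1, (const void **)&buf, &size, 0,
			       0, NULL, NULL);

As far as I understand the mesa plumbing, that is what ends up calling
nouveau_dmem_migrate_vma(), and a later CPU access to the buffer is what
exercises the device private fault path back to system memory.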



Signed-off-by: Christoph Hellwig 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c |  1 -
  drivers/gpu/drm/nouveau/nouveau_dmem.c  | 37 -
  drivers/gpu/drm/nouveau/nouveau_dmem.h  |  2 --
  drivers/gpu/drm/nouveau/nouveau_svm.c   |  3 --
  include/linux/hmm.h |  2 --
  mm/hmm.c| 28 ---
  6 files changed, 73 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index dee446278417..90821ce5e6ca 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -776,7 +776,6 @@ struct amdgpu_ttm_tt {
  static const uint64_t hmm_range_flags[HMM_PFN_FLAG_MAX] = {
(1 << 0), /* HMM_PFN_VALID */
(1 << 1), /* HMM_PFN_WRITE */
-   0 /* HMM_PFN_DEVICE_PRIVATE */
  };
  
  static const uint64_t hmm_range_values[HMM_PFN_VALUE_MAX] = {

diff --git a/drivers/gpu/drm/nouveau/nouveau_dmem.c b/drivers/gpu/drm/nouveau/nouveau_dmem.c
index 7605c4c48985..42808efceaf2 100644
--- a/drivers/gpu/drm/nouveau/nouveau_dmem.c
+++ b/drivers/gpu/drm/nouveau/nouveau_dmem.c
@@ -671,40 +671,3 @@ nouveau_dmem_migrate_vma(struct nouveau_drm *drm,
  out:
return ret;
  }
-
-static inline bool
-nouveau_dmem_page(struct nouveau_drm *drm, struct page *page)
-{
-   return is_device_private_page(page) && drm->dmem == page_to_dmem(page);
-}
-
-void
-nouveau_dmem_convert_pfn(struct nouveau_drm *drm,
-struct hmm_range *range)
-{
-   unsigned long i, npages;
-
-   npages = (range->end - range->start) >> PAGE_SHIFT;
-   for (i = 0; i < npages; ++i) {
-   struct page *page;
-   uint64_t addr;
-
-   page = hmm_device_entry_to_page(range, range->pfns[i]);
-   if (page == NULL)
-   continue;
-
-   if (!(range->pfns[i] & range->flags[HMM_PFN_DEVICE_PRIVATE])) {
-   continue;
-   }
-
-   if (!nouveau_dmem_page(drm, page)) {
-   WARN(1, "Some unknown device memory !\n");
-   range->pfns[i] = 0;
-   continue;
-   }
-
-   addr = nouveau_dmem_page_addr(page);
-   range->pfns[i] &= ((1UL << range->pfn_shift) - 1);
-   range->pfns[i] |= (addr >> PAGE_SHIFT) << range->pfn_shift;
-   }
-}
diff --git a/drivers/gpu/drm/nouveau/nouveau_dmem.h b/drivers/gpu/drm/nouveau/nouveau_dmem.h
index 92394be5d649..1ac620b3d4fb 100644
--- a/drivers/gpu/drm/nouveau/nouveau_dmem.h
+++ b/drivers/gpu/drm/nouveau/nouveau_dmem.h
@@ -38,8 +38,6 @@ int nouveau_dmem_migrate_vma(struct nouveau_drm *drm,
 unsigned long start,
 unsigned long end);
  
-void nouveau_dmem_convert_pfn(struct nouveau_drm *drm,

- struct hmm_range *range);
  #else /* IS_ENABLED(CONFIG_DRM_NOUVEAU_SVM) */
  static inline void nouveau_dmem_init(struct nouveau_drm *drm) {}
  static inline void nouveau_dmem_fini(struct nouveau_drm *drm) {}
diff --git a/drivers/gpu/drm/nouveau/nouveau_svm.c b/drivers/gpu/drm/nouveau/nouveau_svm.c
index df9bf1fd1bc0..7e0376dca137 100644
--- a/drivers/gpu/drm/nouveau/nouveau_svm.c
+++ b/drivers/gpu/drm/nouveau/nouveau_svm.c
@@ -367,7 +367,6 @@ static const u64
  nouveau_svm_pfn_flags[HMM_PFN_FLAG_MAX] = {
[HMM_PFN_VALID ] = NVIF_VMM_PFNMAP_V0_V,
[HMM_PFN_WRITE ] = NVIF_VMM_PFNMAP_V0_W,
-   [HMM_PFN_DEVICE_PRIVATE] = NVIF_VMM_PFNMAP_V0_VRAM,
  };
  
  static const u64

@@ -558,8 +557,6 @@ static int nouveau_range_fault(struct nouveau_svmm *svmm,
break;
}
  
-	nouveau_dmem_convert_pfn(drm, &range);

-
svmm->vmm->vmm.object.client->super = true;
	ret = nvif_object_ioctl(&svmm->vmm->vmm.object, data, size, NULL);
svmm->vmm->vmm.object.client->super = false;
diff --git a/include/linux/hmm.h b/include/linux/hmm.h
index 4bf8d6997b12..5e6034f105c3 100644
--- a/include/linux/hmm.h
+++ b/include/linux/hmm.h
@@ -74,7 +74,6 @@
   * Flags:
   * HMM_PFN_VALID: pfn is valid. It has, at least, read permission.
   * HMM_PFN_WRITE: CPU page table has write permission set
- * HMM_PFN_DEVICE_PRIVATE: private device memory (ZONE_DEVICE)
   *
   * The driver provides a flags array for mapping page protections to device
   * PTE bits. If the 

Re: [PATCH v2 hmm 0/9] Small hmm_range_fault() cleanups

2020-03-26 Thread Ralph Campbell



On 3/23/20 6:14 PM, Jason Gunthorpe wrote:

From: Jason Gunthorpe 

This is v2 of the first simple series with a few additional patches of little
adjustments.

This needs an additional patch to the hmm tester:

diff --git a/tools/testing/selftests/vm/hmm-tests.c b/tools/testing/selftests/vm/hmm-tests.c
index 033a12c7ab5b6d..da15471a2bbf9a 100644
--- a/tools/testing/selftests/vm/hmm-tests.c
+++ b/tools/testing/selftests/vm/hmm-tests.c
@@ -1274,7 +1274,7 @@ TEST_F(hmm2, snapshot)
/* Check what the device saw. */
m = buffer->mirror;
ASSERT_EQ(m[0], HMM_DMIRROR_PROT_ERROR);
-   ASSERT_EQ(m[1], HMM_DMIRROR_PROT_NONE);
+   ASSERT_EQ(m[1], HMM_DMIRROR_PROT_ERROR);
ASSERT_EQ(m[2], HMM_DMIRROR_PROT_ZERO | HMM_DMIRROR_PROT_READ);
ASSERT_EQ(m[3], HMM_DMIRROR_PROT_READ);
ASSERT_EQ(m[4], HMM_DMIRROR_PROT_WRITE);

v2 changes:
  - Simplify and rename the flags, rework hmm_vma_walk_test in patch 2 (CH)
  - Adjust more comments in patch 3 (CH, Ralph)
  - Put the ugly boolean logic into a function in patch 3 (CH)
  - Update commit message of patch 4 (CH)
  - Adjust formatting in patch 5 (CH)
  Patches 6, 7, 8 are new

v1: https://lore.kernel.org/r/20200320164905.21722-1-...@ziepe.ca

Jason Gunthorpe (9):
   mm/hmm: remove pgmap checking for devmap pages
   mm/hmm: return the fault type from hmm_pte_need_fault()
   mm/hmm: remove unused code and tidy comments
   mm/hmm: remove HMM_FAULT_SNAPSHOT
   mm/hmm: remove the CONFIG_TRANSPARENT_HUGEPAGE #ifdef
   mm/hmm: use device_private_entry_to_pfn()
   mm/hmm: do not unconditionally set pfns when returning EBUSY
   mm/hmm: do not set pfns when returning an error code
   mm/hmm: return error for non-vma snapshots

  Documentation/vm/hmm.rst|  12 +-
  drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c |   2 +-
  drivers/gpu/drm/nouveau/nouveau_svm.c   |   2 +-
  include/linux/hmm.h | 109 +
  mm/hmm.c| 312 ++--
  5 files changed, 133 insertions(+), 304 deletions(-)



I was able to recompile Karol Herbst's mesa tree and Jerome's SVM tests to
test this with nouveau so for the series you can add,
Tested-by: Ralph Campbell 


Re: [PATCH hmm v2 5/5] mm/hmm: remove the customizable pfn format from hmm_range_fault

2020-05-01 Thread Ralph Campbell



On 5/1/20 11:20 AM, Jason Gunthorpe wrote:

From: Jason Gunthorpe 

Presumably the intent here was that hmm_range_fault() could put the data
into some HW specific format and thus avoid some work. However, nothing
actually does that, and it isn't clear how anything actually could do that
as hmm_range_fault() provides CPU addresses which must be DMA mapped.

Perhaps there is some special HW that does not need DMA mapping, but we
don't have any examples of this, and the theoretical performance win of
avoiding an extra scan over the pfns array doesn't seem worth the
complexity. Plus pfns needs to be scanned anyhow to sort out any
DEVICE_PRIVATE pages.

This version replaces the uint64_t with an unsigned long containing a pfn
and fixed flags. On input flags is filled with the HMM_PFN_REQ_* values,
on successful output it is filled with HMM_PFN_* values, describing the
state of the pages.
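
As a rough illustration of the reworked interface (a sketch pieced together
from the description above, with locking and notifier sequence handling
omitted; the surrounding variable names are made up, and the HMM_PFN_REQ_*
and hmm_pfn_to_page() names should be checked against the header in this
patch):

	unsigned long hmm_pfns[16];
	struct hmm_range range = {
		.notifier = &notifier,
		.start = start,
		.end = start + 16 * PAGE_SIZE,
		.hmm_pfns = hmm_pfns,
		.dev_private_owner = owner,
	};
	int i, ret;

	/* Input: ask for each page to be faulted in with write permission. */
	for (i = 0; i < 16; i++)
		hmm_pfns[i] = HMM_PFN_REQ_FAULT | HMM_PFN_REQ_WRITE;

	ret = hmm_range_fault(&range);
	if (ret)
		return ret;

	/* Output: each entry holds a pfn plus HMM_PFN_VALID/HMM_PFN_WRITE. */
	for (i = 0; i < 16; i++) {
		struct page *page = hmm_pfn_to_page(hmm_pfns[i]);
		/* ... DMA map "page" and write the device PTE ... */
	}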

amdgpu is simple to convert, it doesn't use snapshot and doesn't use
per-page flags.

nouveau uses only 16 hmm_pte entries at most (ie fits in a few cache
lines), and it sweeps over its pfns array a couple of times anyhow. It
also has a nasty call chain before it reaches the dma map and hardware
suggesting performance isn't important:

nouveau_svm_fault():
  args.i.m.method = NVIF_VMM_V0_PFNMAP
  nouveau_range_fault()
   nvif_object_ioctl()
client->driver->ioctl()
  struct nvif_driver nvif_driver_nvkm:
.ioctl = nvkm_client_ioctl
   nvkm_ioctl()
nvkm_ioctl_path()
  nvkm_ioctl_v0[type].func(..)
  nvkm_ioctl_mthd()
   nvkm_object_mthd()
  struct nvkm_object_func nvkm_uvmm:
.mthd = nvkm_uvmm_mthd
   nvkm_uvmm_mthd()
nvkm_uvmm_mthd_pfnmap()
 nvkm_vmm_pfn_map()
  nvkm_vmm_ptes_get_map()
   func == gp100_vmm_pgt_pfn
struct nvkm_vmm_desc_func gp100_vmm_desc_spt:
  .pfn = gp100_vmm_pgt_pfn
 nvkm_vmm_iter()
  REF_PTES == func == gp100_vmm_pgt_pfn()
dma_map_page()

Acked-by: Felix Kuehling 
Tested-by: Ralph Campbell 
Signed-off-by: Jason Gunthorpe 
Signed-off-by: Christoph Hellwig 
---
  Documentation/vm/hmm.rst|  26 ++--
  drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c |  35 ++
  drivers/gpu/drm/nouveau/nouveau_dmem.c  |  27 +---
  drivers/gpu/drm/nouveau/nouveau_dmem.h  |   3 +-
  drivers/gpu/drm/nouveau/nouveau_svm.c   |  87 -
  include/linux/hmm.h |  99 ++-
  mm/hmm.c| 160 +++-
  7 files changed, 192 insertions(+), 245 deletions(-)



...snip...

  
+static void nouveau_hmm_convert_pfn(struct nouveau_drm *drm,

+   struct hmm_range *range, u64 *ioctl_addr)
+{
+   unsigned long i, npages;
+
+   /*
+* The ioctl_addr prepared here is passed through nvif_object_ioctl()
+* to an eventual DMA map in something like gp100_vmm_pgt_pfn()
+*
+* This is all just encoding the internal hmm reprensetation into a


s/reprensetation/representation/

Looks good and still tests OK with nouveau.


Re: [PATCH hmm 0/5] Adjust hmm_range_fault() API

2020-04-22 Thread Ralph Campbell



On 4/21/20 5:21 PM, Jason Gunthorpe wrote:

From: Jason Gunthorpe 

The API is a bit complicated for the uses we actually have, and
discussions for simplifying have come up a number of times.

This small series removes the customizable pfn format and simplifies the
return code of hmm_range_fault()

All the drivers are adjusted to process in the simplified format.
I would appreciated tested-by's for the two drivers, thanks!


For nouveau you can add:
Tested-by: Ralph Campbell 


Re: [PATCH 4/4] mm: check the device private page owner in hmm_range_fault

2020-03-16 Thread Ralph Campbell



On 3/16/20 12:32 PM, Christoph Hellwig wrote:

Hmm range fault will succeed for any kind of device private memory,
even if it doesn't belong to the calling entity.  While nouveau
has some crude checks for that, they are broken because they assume
nouveau is the only user of device private memory.  Fix this by
passing in an expected pgmap owner in the hmm_range_fault structure.

Signed-off-by: Christoph Hellwig 
Fixes: 4ef589dc9b10 ("mm/hmm/devmem: device memory hotplug using ZONE_DEVICE")


Looks good.
Reviewed-by: Ralph Campbell 


---
  drivers/gpu/drm/nouveau/nouveau_dmem.c | 12 
  include/linux/hmm.h|  2 ++
  mm/hmm.c   | 10 +-
  3 files changed, 11 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_dmem.c b/drivers/gpu/drm/nouveau/nouveau_dmem.c
index edfd0805fba4..ad89e09a0be3 100644
--- a/drivers/gpu/drm/nouveau/nouveau_dmem.c
+++ b/drivers/gpu/drm/nouveau/nouveau_dmem.c
@@ -672,12 +672,6 @@ nouveau_dmem_migrate_vma(struct nouveau_drm *drm,
return ret;
  }
  
-static inline bool

-nouveau_dmem_page(struct nouveau_drm *drm, struct page *page)
-{
-   return is_device_private_page(page) && drm->dmem == page_to_dmem(page);
-}
-
  void
  nouveau_dmem_convert_pfn(struct nouveau_drm *drm,
 struct hmm_range *range)
@@ -696,12 +690,6 @@ nouveau_dmem_convert_pfn(struct nouveau_drm *drm,
if (!is_device_private_page(page))
continue;
  
-		if (!nouveau_dmem_page(drm, page)) {

-   WARN(1, "Some unknown device memory !\n");
-   range->pfns[i] = 0;
-   continue;
-   }
-
addr = nouveau_dmem_page_addr(page);
range->pfns[i] &= ((1UL << range->pfn_shift) - 1);
range->pfns[i] |= (addr >> PAGE_SHIFT) << range->pfn_shift;
diff --git a/include/linux/hmm.h b/include/linux/hmm.h
index 5e6034f105c3..bb6be4428633 100644
--- a/include/linux/hmm.h
+++ b/include/linux/hmm.h
@@ -132,6 +132,7 @@ enum hmm_pfn_value_e {
   * @pfn_flags_mask: allows to mask pfn flags so that only default_flags matter
   * @pfn_shifts: pfn shift value (should be <= PAGE_SHIFT)
   * @valid: pfns array did not change since it has been fill by an HMM function
+ * @dev_private_owner: owner of device private pages
   */
  struct hmm_range {
struct mmu_interval_notifier *notifier;
@@ -144,6 +145,7 @@ struct hmm_range {
uint64_tdefault_flags;
uint64_tpfn_flags_mask;
uint8_t pfn_shift;
+   void*dev_private_owner;
  };
  
  /*

diff --git a/mm/hmm.c b/mm/hmm.c
index cfad65f6a67b..b75b3750e03d 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -216,6 +216,14 @@ int hmm_vma_handle_pmd(struct mm_walk *walk, unsigned long 
addr,
unsigned long end, uint64_t *pfns, pmd_t pmd);
  #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
  
+static inline bool hmm_is_device_private_entry(struct hmm_range *range,

+   swp_entry_t entry)
+{
+   return is_device_private_entry(entry) &&
+   device_private_entry_to_page(entry)->pgmap->owner ==
+   range->dev_private_owner;
+}
+
  static inline uint64_t pte_to_hmm_pfn_flags(struct hmm_range *range, pte_t pte)
  {
if (pte_none(pte) || !pte_present(pte) || pte_protnone(pte))
@@ -254,7 +262,7 @@ static int hmm_vma_handle_pte(struct mm_walk *walk, 
unsigned long addr,
 * Never fault in device private pages pages, but just report
 * the PFN even if not present.
 */
-   if (is_device_private_entry(entry)) {
+   if (hmm_is_device_private_entry(range, entry)) {
*pfn = hmm_device_entry_from_pfn(range,
swp_offset(entry));
*pfn |= range->flags[HMM_PFN_VALID];




Re: [PATCH 2/4] mm: handle multiple owners of device private pages in migrate_vma

2020-03-16 Thread Ralph Campbell



On 3/16/20 12:32 PM, Christoph Hellwig wrote:

Add a new src_owner field to struct migrate_vma.  If the field is set,
only device private pages with page->pgmap->owner equal to that field
are migrated.  If the field is not set only "normal" pages are migrated.

Signed-off-by: Christoph Hellwig 
Fixes: df6ad69838fc ("mm/device-public-memory: device memory cache coherent with CPU")


When migrating to device private memory, setting the src_owner lets the caller
know about source pages that are already migrated and skips pages migrated to a
different device, similar to pages swapped out to an actual swap device.
But, it prevents normal pages from being migrated to device private memory.
It can still be useful for the driver to know that a page is owned by a
different device if it has a device to device way of migrating data.
nouveau_dmem_migrate_vma() isn't setting args.src_owner so only normal pages
will be migrated which I guess is OK for now since nouveau doesn't handle
direct GPU to GPU migration currently.

When migrating device private memory to system memory due to a CPU fault,
the source page should be the device's device private struct page so if it
isn't, then it does make sense to not migrate whatever normal page is there.
nouveau_dmem_migrate_to_ram() sets src_owner so this case looks OK.
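
In other words, the two directions end up configured roughly like this (an
illustrative sketch only, restating the rule above and the hunks below; the
variable names are made up):

	/* System memory -> device private: only "normal" pages are collected. */
	struct migrate_vma to_vram = {
		.vma		= vma,
		.start		= start,
		.end		= end,
		.src		= src_pfns,
		.dst		= dst_pfns,
		/* .src_owner left NULL */
	};

	/* Device private -> system memory on CPU fault: only our own pages. */
	struct migrate_vma to_ram = {
		.vma		= vmf->vma,
		.start		= vmf->address,
		.end		= vmf->address + PAGE_SIZE,
		.src		= &src,
		.dst		= &dst,
		.src_owner	= drm->dev,
	};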

Just had to think this through.
Reviewed-by: Ralph Campbell 


---
  arch/powerpc/kvm/book3s_hv_uvmem.c | 1 +
  drivers/gpu/drm/nouveau/nouveau_dmem.c | 1 +
  include/linux/migrate.h| 8 
  mm/migrate.c   | 9 ++---
  4 files changed, 16 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c b/arch/powerpc/kvm/book3s_hv_uvmem.c
index 67fefb03b9b7..f44f6b27950f 100644
--- a/arch/powerpc/kvm/book3s_hv_uvmem.c
+++ b/arch/powerpc/kvm/book3s_hv_uvmem.c
@@ -563,6 +563,7 @@ kvmppc_svm_page_out(struct vm_area_struct *vma, unsigned 
long start,
mig.end = end;
	mig.src = &src_pfn;
	mig.dst = &dst_pfn;
+   mig.src_owner = &kvmppc_uvmem_pgmap;
  
	mutex_lock(&kvm->arch.uvmem_lock);

/* The requested page is already paged-out, nothing to do */
diff --git a/drivers/gpu/drm/nouveau/nouveau_dmem.c b/drivers/gpu/drm/nouveau/nouveau_dmem.c
index a4682272586e..0e36345d395c 100644
--- a/drivers/gpu/drm/nouveau/nouveau_dmem.c
+++ b/drivers/gpu/drm/nouveau/nouveau_dmem.c
@@ -176,6 +176,7 @@ static vm_fault_t nouveau_dmem_migrate_to_ram(struct 
vm_fault *vmf)
.end= vmf->address + PAGE_SIZE,
	.src= &src,
	.dst= &dst,
+   .src_owner  = drm->dev,
};
  
  	/*

diff --git a/include/linux/migrate.h b/include/linux/migrate.h
index 72120061b7d4..3e546cbf03dd 100644
--- a/include/linux/migrate.h
+++ b/include/linux/migrate.h
@@ -196,6 +196,14 @@ struct migrate_vma {
unsigned long   npages;
unsigned long   start;
unsigned long   end;
+
+   /*
+* Set to the owner value also stored in page->pgmap->owner for
+* migrating out of device private memory.  If set only device
+* private pages with this owner are migrated.  If not set
+* device private pages are not migrated at all.
+*/
+   void*src_owner;
  };
  
  int migrate_vma_setup(struct migrate_vma *args);

diff --git a/mm/migrate.c b/mm/migrate.c
index b1092876e537..7605d2c23433 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -2241,7 +2241,7 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp,
arch_enter_lazy_mmu_mode();
  
  	for (; addr < end; addr += PAGE_SIZE, ptep++) {

-   unsigned long mpfn, pfn;
+   unsigned long mpfn = 0, pfn;
struct page *page;
swp_entry_t entry;
pte_t pte;
@@ -2255,8 +2255,6 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp,
}
  
  		if (!pte_present(pte)) {

-   mpfn = 0;
-
/*
 * Only care about unaddressable device page special
 * page table entry. Other special swap entries are not
@@ -2267,11 +2265,16 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp,
goto next;
  
  			page = device_private_entry_to_page(entry);

+   if (page->pgmap->owner != migrate->src_owner)
+   goto next;
+
mpfn = migrate_pfn(page_to_pfn(page)) |
MIGRATE_PFN_MIGRATE;
if (is_write_device_private_entry(entry))
mpfn |= MIGRATE_PFN_WRITE;
} else {
+   if (migrate->src_owner)
+   goto next;
pfn = pte_pfn(pte);
if (is_zero_pfn(pfn)) {
  

Re: [PATCH 3/4] mm: simplify device private page handling in hmm_range_fault

2020-03-17 Thread Ralph Campbell


On 3/17/20 5:59 AM, Christoph Hellwig wrote:

On Tue, Mar 17, 2020 at 09:47:55AM -0300, Jason Gunthorpe wrote:

I've been using v7 of Ralph's tester and it is working well - it has
DEVICE_PRIVATE support so I think it can test this flow too. Ralph are
you able?

This hunk seems trivial enough to me, can we include it now?


I can send a separate patch for it once the tester covers it.  I don't
want to add it to the original patch as it is a significant behavior
change compared to the existing code.



Attached is an updated version of my HMM tests based on linux-5.6.0-rc6.
I ran this OK with Jason's 8+1 HMM patches, Christoph's 1-5 misc HMM clean ups,
and Christoph's 1-4 device private page changes applied.

I'm working on getting my nouveau tests running again on a different test
machine and will report on that when ready.
From d499fb343bfa9764695ecdcd759fb16bc1ca3c93 Mon Sep 17 00:00:00 2001
From: Ralph Campbell 
Date: Tue, 17 Mar 2020 11:10:38 -0700
Subject: [PATCH] mm/hmm/test: add self tests for HMM
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Add some basic stand alone self tests for HMM.

Signed-off-by: Ralph Campbell 
Signed-off-by: Jérôme Glisse 
---
 MAINTAINERS|3 +
 include/uapi/linux/test_hmm.h  |   59 ++
 lib/Kconfig.debug  |   12 +
 lib/Makefile   |1 +
 lib/test_hmm.c | 1210 +
 tools/testing/selftests/vm/.gitignore  |1 +
 tools/testing/selftests/vm/Makefile|3 +
 tools/testing/selftests/vm/config  |2 +
 tools/testing/selftests/vm/hmm-tests.c | 1353 
 tools/testing/selftests/vm/run_vmtests |   16 +
 tools/testing/selftests/vm/test_hmm.sh |   97 ++
 11 files changed, 2757 insertions(+)
 create mode 100644 include/uapi/linux/test_hmm.h
 create mode 100644 lib/test_hmm.c
 create mode 100644 tools/testing/selftests/vm/hmm-tests.c
 create mode 100755 tools/testing/selftests/vm/test_hmm.sh

diff --git a/MAINTAINERS b/MAINTAINERS
index cc1d18cb5d18..98c80a589a52 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -7608,7 +7608,10 @@ L:	linux...@kvack.org
 S:	Maintained
 F:	mm/hmm*
 F:	include/linux/hmm*
+F:	include/uapi/linux/test_hmm*
 F:	Documentation/vm/hmm.rst
+F:	lib/test_hmm*
+F:	tools/testing/selftests/vm/*hmm*
 
 HOST AP DRIVER
 M:	Jouni Malinen 
diff --git a/include/uapi/linux/test_hmm.h b/include/uapi/linux/test_hmm.h
new file mode 100644
index ..8c5f70c160bf
--- /dev/null
+++ b/include/uapi/linux/test_hmm.h
@@ -0,0 +1,59 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+/*
+ * This is a module to test the HMM (Heterogeneous Memory Management) API
+ * of the kernel. It allows a userspace program to expose its entire address
+ * space through the HMM test module device file.
+ */
+#ifndef _UAPI_LINUX_HMM_DMIRROR_H
+#define _UAPI_LINUX_HMM_DMIRROR_H
+
+#include 
+#include 
+
+/*
+ * Structure to pass to the HMM test driver to mimic a device accessing
+ * system memory and ZONE_DEVICE private memory through device page tables.
+ *
+ * @addr: (in) user address the device will read/write
+ * @ptr: (in) user address where device data is copied to/from
+ * @npages: (in) number of pages to read/write
+ * @cpages: (out) number of pages copied
+ * @faults: (out) number of device page faults seen
+ */
+struct hmm_dmirror_cmd {
+	__u64		addr;
+	__u64		ptr;
+	__u64		npages;
+	__u64		cpages;
+	__u64		faults;
+};
+
+/* Expose the address space of the calling process through hmm device file */
+#define HMM_DMIRROR_READ		_IOWR('H', 0x00, struct hmm_dmirror_cmd)
+#define HMM_DMIRROR_WRITE		_IOWR('H', 0x01, struct hmm_dmirror_cmd)
+#define HMM_DMIRROR_MIGRATE		_IOWR('H', 0x02, struct hmm_dmirror_cmd)
+#define HMM_DMIRROR_SNAPSHOT		_IOWR('H', 0x03, struct hmm_dmirror_cmd)
+
+/*
+ * Values returned in hmm_dmirror_cmd.ptr for HMM_DMIRROR_SNAPSHOT.
+ * HMM_DMIRROR_PROT_ERROR: no valid mirror PTE for this page
+ * HMM_DMIRROR_PROT_NONE: unpopulated PTE or PTE with no access
+ * HMM_DMIRROR_PROT_READ: read-only PTE
+ * HMM_DMIRROR_PROT_WRITE: read/write PTE
+ * HMM_DMIRROR_PROT_ZERO: special read-only zero page
+ * HMM_DMIRROR_PROT_DEV_PRIVATE_LOCAL: Migrated device private page on the
+ *	device the ioctl() is made
+ * HMM_DMIRROR_PROT_DEV_PRIVATE_REMOTE: Migrated device private page on some
+ *	other device
+ */
+enum {
+	HMM_DMIRROR_PROT_ERROR			= 0xFF,
+	HMM_DMIRROR_PROT_NONE			= 0x00,
+	HMM_DMIRROR_PROT_READ			= 0x01,
+	HMM_DMIRROR_PROT_WRITE			= 0x02,
+	HMM_DMIRROR_PROT_ZERO			= 0x10,
+	HMM_DMIRROR_PROT_DEV_PRIVATE_LOCAL	= 0x20,
+	HMM_DMIRROR_PROT_DEV_PRIVATE_REMOTE	= 0x30,
+};
+
+#endif /* _UAPI_LINUX_HMM_DMIRROR_H */
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 69def4a9df00..4d22ce7879a7 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -2162,6 +2162,18 @@ config TEST_MEMINIT
 
 	  If unsure, say N.
 
+config TEST_HMM
+	tristate "Test HMM (Heterogeneous Memory Management)"

Re: [PATCH 2/2] mm: remove device private page support from hmm_range_fault

2020-03-17 Thread Ralph Campbell



On 3/17/20 4:56 AM, Jason Gunthorpe wrote:

On Mon, Mar 16, 2020 at 01:24:09PM -0700, Ralph Campbell wrote:


The reason for it being backwards is that "normally" a device doesn't want
the device private page to be faulted back to system memory; it wants to
get the device private struct page so it can update its page table to point
to the memory already in the device.


The "backwards" is you set the flag on input and never get it on
output, clear the flag in input and maybe get it on output.

Compare with valid or write which don't work that way.


Also, a device like an NVIDIA GPU may have an alternate path for copying
one GPU's memory to another (NVLink) without going through system memory,
so getting a device private struct page/PFN from hmm_range_fault() that isn't
"owned" by the faulting GPU is useful.
I agree that the current code isn't well tested or thought out for multiple 
devices
(rdma, NVMe drives, GPUs, etc.) but it also ties in with peer-to-peer access 
via PCIe.


I think the series here is a big improvement. The GPU driver can set
owners that match the hidden cluster interconnect.

Jason



I agree this is an improvement. I was just thinking about possible future use 
cases.


Re: [PATCH 3/4] mm: simplify device private page handling in hmm_range_fault

2020-03-17 Thread Ralph Campbell



On 3/17/20 12:34 AM, Christoph Hellwig wrote:

On Mon, Mar 16, 2020 at 03:49:51PM -0700, Ralph Campbell wrote:

On 3/16/20 12:32 PM, Christoph Hellwig wrote:

Remove the code to fault device private pages back into system memory
that has never been used by any driver.  Also replace the usage of the
HMM_PFN_DEVICE_PRIVATE flag in the pfns array with a simple
is_device_private_page check in nouveau.

Signed-off-by: Christoph Hellwig 


Getting rid of HMM_PFN_DEVICE_PRIVATE seems reasonable to me since a driver can
look at the struct page but what if a driver needs to fault in a page from
another device's private memory? Should it call handle_mm_fault()?


Obviously no driver cared for that so far.  Once we have test cases
for that and thus testable code we can add code to fault it in from
hmm_vma_handle_pte.



I'm OK with the series. I think I would have been less confused if I looked at
patch 4 then 3.


Re: [PATCH 3/4] mm: simplify device private page handling in hmm_range_fault

2020-03-19 Thread Ralph Campbell

Adding linux-kselft...@vger.kernel.org for the test config question.

On 3/19/20 11:17 AM, Jason Gunthorpe wrote:

On Tue, Mar 17, 2020 at 04:14:31PM -0700, Ralph Campbell wrote:


On 3/17/20 5:59 AM, Christoph Hellwig wrote:

On Tue, Mar 17, 2020 at 09:47:55AM -0300, Jason Gunthorpe wrote:

I've been using v7 of Ralph's tester and it is working well - it has
DEVICE_PRIVATE support so I think it can test this flow too. Ralph are
you able?

This hunk seems trivial enough to me, can we include it now?


I can send a separate patch for it once the tester covers it.  I don't
want to add it to the original patch as it is a significant behavior
change compared to the existing code.



Attached is an updated version of my HMM tests based on linux-5.6.0-rc6.
I ran this OK with Jason's 8+1 HMM patches, Christoph's 1-5 misc HMM clean ups,
and Christoph's 1-4 device private page changes applied.


I'd like to get this to a mergeable state, it looks pretty good now, but I have
no idea about selftests - and I'm struggling to even compile the tools
dir


diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 69def4a9df00..4d22ce7879a7 100644
+++ b/lib/Kconfig.debug
@@ -2162,6 +2162,18 @@ config TEST_MEMINIT
  
  	  If unsure, say N.
  
+config TEST_HMM

+   tristate "Test HMM (Heterogeneous Memory Management)"
+   depends on DEVICE_PRIVATE
+   select HMM_MIRROR
+select MMU_NOTIFIER


extra spaces


Will fix in v8.


In general I wonder if it even makes sense that DEVICE_PRIVATE is user
selectable?


Should tests enable the feature or the feature enable the test?
IMHO, if the feature is being compiled into the kernel, that should
enable the menu item for the test. If the feature isn't selected,
no need to test it :-)


+static int dmirror_fops_open(struct inode *inode, struct file *filp)
+{
+   struct cdev *cdev = inode->i_cdev;
+   struct dmirror *dmirror;
+   int ret;
+
+   /* Mirror this process address space */
+   dmirror = kzalloc(sizeof(*dmirror), GFP_KERNEL);
+   if (dmirror == NULL)
+   return -ENOMEM;
+
+   dmirror->mdevice = container_of(cdev, struct dmirror_device, cdevice);
+   mutex_init(&dmirror->mutex);
+   xa_init(&dmirror->pt);
+
+   ret = mmu_interval_notifier_insert(&dmirror->notifier, current->mm,
+   0, ULONG_MAX & PAGE_MASK, &dmirror_min_ops);
+   if (ret) {
+   kfree(dmirror);
+   return ret;
+   }
+
+   /* Pairs with the mmdrop() in dmirror_fops_release(). */
+   mmgrab(current->mm);
+   dmirror->mm = current->mm;


The notifier holds a mmgrab, no need for another one


OK. I'll replace dmirror->mm with dmirror->notifier.mm.


+   /* Only the first open registers the address space. */
+   filp->private_data = dmirror;


Not sure what this comment means


I'll change the comment to:
/*
 * The first open of the device character file registers the address
 * space of the process doing the open() system call with the device.
 * Subsequent file opens by other processes will have access to the
 * first process' address space.
 */


+static inline struct dmirror_device *dmirror_page_to_device(struct page *page)
+
+{
+   struct dmirror_chunk *devmem;
+
+   devmem = container_of(page->pgmap, struct dmirror_chunk, pagemap);
+   return devmem->mdevice;
+}


extra devmem var is not really needed


I'll change this to:
return container_of(page->pgmap, struct dmirror_chunk,
pagemap)->mdevice;


+
+static bool dmirror_device_is_mine(struct dmirror_device *mdevice,
+  struct page *page)
+{
+   if (!is_zone_device_page(page))
+   return false;
+   return page->pgmap->ops == &dmirror_devmem_ops &&
+   dmirror_page_to_device(page) == mdevice;
+}


Use new owner stuff, right? Actually this is redundant now, the check
should be just WARN_ON pageowner != self owner


I'll clean this up. dmirror_device_is_mine() isn't needed now.


+static int dmirror_do_fault(struct dmirror *dmirror, struct hmm_range *range)
+{
+   uint64_t *pfns = range->pfns;
+   unsigned long pfn;
+
+   for (pfn = (range->start >> PAGE_SHIFT);
+pfn < (range->end >> PAGE_SHIFT);
+pfn++, pfns++) {
+   struct page *page;
+   void *entry;
+
+   /*
+* HMM_PFN_ERROR is returned if it is accessing invalid memory
+* either because of memory error (hardware detected memory
+* corruption) or more likely because of truncate on mmap
+* file.
+*/
+   if (*pfns == range->values[HMM_PFN_ERROR])
+   return -EFAULT;


Unless snapshot is in use, hmm_range_fault() never returns success
and sets PFN_ERROR, so this should be

Re: [PATCH 3/4] mm: simplify device private page handling in hmm_range_fault

2020-03-19 Thread Ralph Campbell



On 3/19/20 5:14 PM, Jason Gunthorpe wrote:

On Tue, Mar 17, 2020 at 04:14:31PM -0700, Ralph Campbell wrote:


+static int dmirror_fault(struct dmirror *dmirror, unsigned long start,
+unsigned long end, bool write)
+{
+   struct mm_struct *mm = dmirror->mm;
+   unsigned long addr;
+   uint64_t pfns[64];
+   struct hmm_range range = {
+   .notifier = &dmirror->notifier,
+   .pfns = pfns,
+   .flags = dmirror_hmm_flags,
+   .values = dmirror_hmm_values,
+   .pfn_shift = DPT_SHIFT,
+   .pfn_flags_mask = ~(dmirror_hmm_flags[HMM_PFN_VALID] |
+   dmirror_hmm_flags[HMM_PFN_WRITE]),


Since pfns is not initialized pfn_flags_mask should be 0.


Good point.


+   .default_flags = dmirror_hmm_flags[HMM_PFN_VALID] |
+   (write ? dmirror_hmm_flags[HMM_PFN_WRITE] : 0),
+   .dev_private_owner = dmirror->mdevice,
+   };
+   int ret = 0;



+static int dmirror_snapshot(struct dmirror *dmirror,
+   struct hmm_dmirror_cmd *cmd)
+{
+   struct mm_struct *mm = dmirror->mm;
+   unsigned long start, end;
+   unsigned long size = cmd->npages << PAGE_SHIFT;
+   unsigned long addr;
+   unsigned long next;
+   uint64_t pfns[64];
+   unsigned char perm[64];
+   char __user *uptr;
+   struct hmm_range range = {
+   .pfns = pfns,
+   .flags = dmirror_hmm_flags,
+   .values = dmirror_hmm_values,
+   .pfn_shift = DPT_SHIFT,
+   .pfn_flags_mask = ~0ULL,


Same here, especially since this is snapshot

Jason


Actually, snapshot ignores pfn_flags_mask and default_flags.
In hmm_pte_need_fault(), HMM_FAULT_SNAPSHOT is checked and returns early before
checking pfn_flags_mask and default_flags since no faults are being requested.
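
The relevant early return looks roughly like this (paraphrased from the
mm/hmm.c of that era, with the rest of the function elided):

static inline void hmm_pte_need_fault(const struct hmm_vma_walk *hmm_vma_walk,
				      uint64_t pfns, uint64_t cpu_flags,
				      bool *fault, bool *write_fault)
{
	struct hmm_range *range = hmm_vma_walk->range;

	/* Snapshot mode never requests faults, so the default_flags /
	 * pfn_flags_mask computation below is never reached. */
	if (hmm_vma_walk->flags & HMM_FAULT_SNAPSHOT)
		return;

	/* ... default_flags / pfn_flags_mask handling follows ... */
}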



Re: [PATCH hmm 0/6] Small hmm_range_fault() cleanups

2020-03-20 Thread Ralph Campbell



On 3/20/20 9:48 AM, Jason Gunthorpe wrote:

From: Jason Gunthorpe 

I've had these in my work queue for a bit, nothing profound here, just some
small edits for clarity.


The hmm tester changes are clear enough, but I'm having a bit of trouble
figuring out what this series applies cleanly to since I'm trying to apply it
on top of the other patches you and Christoph have sent out.
Is there a private git tree/branch where everything is applied?



Ralph's hmm tester will need a small diff to work after this - which
illustrates how setting default_flags == 0 is the same as what was called
SNAPSHOT:

diff --git a/lib/test_hmm.c b/lib/test_hmm.c
index 6ca953926dc13f..5f31f5b3e64cb9 100644
--- a/lib/test_hmm.c
+++ b/lib/test_hmm.c
@@ -300,7 +300,7 @@ static int dmirror_range_fault(struct dmirror *dmirror,
  
  		range->notifier_seq = mmu_interval_read_begin(range->notifier);

down_read(>mmap_sem);
-   count = hmm_range_fault(range, 0);
+   count = hmm_range_fault(range);
up_read(>mmap_sem);
if (count <= 0) {
if (count == 0 || count == -EBUSY)
@@ -337,8 +337,7 @@ static int dmirror_fault(struct dmirror *dmirror, unsigned 
long start,
.flags = dmirror_hmm_flags,
.values = dmirror_hmm_values,
.pfn_shift = DPT_SHIFT,
-   .pfn_flags_mask = ~(dmirror_hmm_flags[HMM_PFN_VALID] |
-   dmirror_hmm_flags[HMM_PFN_WRITE]),
+   .pfn_flags_mask = 0,
.default_flags = dmirror_hmm_flags[HMM_PFN_VALID] |
(write ? dmirror_hmm_flags[HMM_PFN_WRITE] : 0),
.dev_private_owner = dmirror->mdevice,
@@ -872,7 +871,7 @@ static int dmirror_range_snapshot(struct dmirror *dmirror,
range->notifier_seq = mmu_interval_read_begin(range->notifier);
  
  		down_read(>mmap_sem);

-   count = hmm_range_fault(range, HMM_FAULT_SNAPSHOT);
+   count = hmm_range_fault(range);
up_read(>mmap_sem);
if (count <= 0) {
if (count == 0 || count == -EBUSY)
@@ -916,7 +915,7 @@ static int dmirror_snapshot(struct dmirror *dmirror,
.flags = dmirror_hmm_flags,
.values = dmirror_hmm_values,
.pfn_shift = DPT_SHIFT,
-   .pfn_flags_mask = ~0ULL,
+   .pfn_flags_mask = 0,
.dev_private_owner = dmirror->mdevice,
};
int ret = 0;

Jason Gunthorpe (6):
   mm/hmm: remove pgmap checking for devmap pages
   mm/hmm: return the fault type from hmm_pte_need_fault()
   mm/hmm: remove unused code and tidy comments
   mm/hmm: remove HMM_FAULT_SNAPSHOT
   mm/hmm: remove the CONFIG_TRANSPARENT_HUGEPAGE #ifdef
   mm/hmm: use device_private_entry_to_pfn()

  Documentation/vm/hmm.rst|  12 +-
  drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c |   2 +-
  drivers/gpu/drm/nouveau/nouveau_svm.c   |   2 +-
  include/linux/hmm.h |  55 +-
  mm/hmm.c| 238 +---
  5 files changed, 98 insertions(+), 211 deletions(-)




Re: [PATCH hmm 0/6] Small hmm_range_fault() cleanups

2020-03-20 Thread Ralph Campbell



On 3/20/20 9:48 AM, Jason Gunthorpe wrote:

From: Jason Gunthorpe 

I've had these in my work queue for a bit, nothing profound here, just some
small edits for clarity.

Ralph's hmm tester will need a small diff to work after this - which
illustrates how setting default_flags == 0 is the same as what was called
SNAPSHOT:

diff --git a/lib/test_hmm.c b/lib/test_hmm.c
index 6ca953926dc13f..5f31f5b3e64cb9 100644
--- a/lib/test_hmm.c
+++ b/lib/test_hmm.c
@@ -300,7 +300,7 @@ static int dmirror_range_fault(struct dmirror *dmirror,
  
  		range->notifier_seq = mmu_interval_read_begin(range->notifier);

down_read(>mmap_sem);
-   count = hmm_range_fault(range, 0);
+   count = hmm_range_fault(range);
up_read(>mmap_sem);
if (count <= 0) {
if (count == 0 || count == -EBUSY)
@@ -337,8 +337,7 @@ static int dmirror_fault(struct dmirror *dmirror, unsigned 
long start,
.flags = dmirror_hmm_flags,
.values = dmirror_hmm_values,
.pfn_shift = DPT_SHIFT,
-   .pfn_flags_mask = ~(dmirror_hmm_flags[HMM_PFN_VALID] |
-   dmirror_hmm_flags[HMM_PFN_WRITE]),
+   .pfn_flags_mask = 0,
.default_flags = dmirror_hmm_flags[HMM_PFN_VALID] |
(write ? dmirror_hmm_flags[HMM_PFN_WRITE] : 0),
.dev_private_owner = dmirror->mdevice,
@@ -872,7 +871,7 @@ static int dmirror_range_snapshot(struct dmirror *dmirror,
range->notifier_seq = mmu_interval_read_begin(range->notifier);
  
  		down_read(>mmap_sem);

-   count = hmm_range_fault(range, HMM_FAULT_SNAPSHOT);
+   count = hmm_range_fault(range);
up_read(>mmap_sem);
if (count <= 0) {
if (count == 0 || count == -EBUSY)
@@ -916,7 +915,7 @@ static int dmirror_snapshot(struct dmirror *dmirror,
.flags = dmirror_hmm_flags,
.values = dmirror_hmm_values,
.pfn_shift = DPT_SHIFT,
-   .pfn_flags_mask = ~0ULL,
+   .pfn_flags_mask = 0,
.dev_private_owner = dmirror->mdevice,
};
int ret = 0;

Jason Gunthorpe (6):
   mm/hmm: remove pgmap checking for devmap pages
   mm/hmm: return the fault type from hmm_pte_need_fault()
   mm/hmm: remove unused code and tidy comments
   mm/hmm: remove HMM_FAULT_SNAPSHOT
   mm/hmm: remove the CONFIG_TRANSPARENT_HUGEPAGE #ifdef
   mm/hmm: use device_private_entry_to_pfn()

  Documentation/vm/hmm.rst|  12 +-
  drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c |   2 +-
  drivers/gpu/drm/nouveau/nouveau_svm.c   |   2 +-
  include/linux/hmm.h |  55 +-
  mm/hmm.c| 238 +---
  5 files changed, 98 insertions(+), 211 deletions(-)


The series looks good to me so,
Reviewed-by: Ralph Campbell 


Re: [PATCH hmm 3/6] mm/hmm: remove unused code and tidy comments

2020-03-20 Thread Ralph Campbell



On 3/20/20 9:49 AM, Jason Gunthorpe wrote:

From: Jason Gunthorpe 

Delete several functions that are never called, fix some desync between
comments and structure content, remove an unused ret, and move one
function only used by hmm.c into hmm.c

Signed-off-by: Jason Gunthorpe 


Reviewed-by: Ralph Campbell 


---
  include/linux/hmm.h | 50 -
  mm/hmm.c| 12 +++
  2 files changed, 12 insertions(+), 50 deletions(-)

diff --git a/include/linux/hmm.h b/include/linux/hmm.h
index bb6be4428633a8..184a8633260f9d 100644
--- a/include/linux/hmm.h
+++ b/include/linux/hmm.h
@@ -120,9 +120,6 @@ enum hmm_pfn_value_e {
   *
   * @notifier: a mmu_interval_notifier that includes the start/end
   * @notifier_seq: result of mmu_interval_read_begin()
- * @hmm: the core HMM structure this range is active against
- * @vma: the vm area struct for the range
- * @list: all range lock are on a list
   * @start: range virtual start address (inclusive)
   * @end: range virtual end address (exclusive)
   * @pfns: array of pfns (big enough for the range)
@@ -131,7 +128,6 @@ enum hmm_pfn_value_e {
   * @default_flags: default flags for the range (write, read, ... see hmm doc)
   * @pfn_flags_mask: allows to mask pfn flags so that only default_flags matter
   * @pfn_shifts: pfn shift value (should be <= PAGE_SHIFT)


s/pfn_shifts/pfn_shift


- * @valid: pfns array did not change since it has been fill by an HMM function
   * @dev_private_owner: owner of device private pages
   */
  struct hmm_range {
@@ -171,52 +167,6 @@ static inline struct page *hmm_device_entry_to_page(const 
struct hmm_range *rang
return pfn_to_page(entry >> range->pfn_shift);
  }
  
-/*

- * hmm_device_entry_to_pfn() - return pfn value store in a device entry
- * @range: range use to decode device entry value
- * @entry: device entry to extract pfn from
- * Return: pfn value if device entry is valid, -1UL otherwise
- */
-static inline unsigned long
-hmm_device_entry_to_pfn(const struct hmm_range *range, uint64_t pfn)
-{
-   if (pfn == range->values[HMM_PFN_NONE])
-   return -1UL;
-   if (pfn == range->values[HMM_PFN_ERROR])
-   return -1UL;
-   if (pfn == range->values[HMM_PFN_SPECIAL])
-   return -1UL;
-   if (!(pfn & range->flags[HMM_PFN_VALID]))
-   return -1UL;
-   return (pfn >> range->pfn_shift);
-}
-
-/*
- * hmm_device_entry_from_page() - create a valid device entry for a page
- * @range: range use to encode HMM pfn value
- * @page: page for which to create the device entry
- * Return: valid device entry for the page
- */
-static inline uint64_t hmm_device_entry_from_page(const struct hmm_range 
*range,
- struct page *page)
-{
-   return (page_to_pfn(page) << range->pfn_shift) |
-   range->flags[HMM_PFN_VALID];
-}
-
-/*
- * hmm_device_entry_from_pfn() - create a valid device entry value from pfn
- * @range: range use to encode HMM pfn value
- * @pfn: pfn value for which to create the device entry
- * Return: valid device entry for the pfn
- */
-static inline uint64_t hmm_device_entry_from_pfn(const struct hmm_range *range,
-unsigned long pfn)
-{
-   return (pfn << range->pfn_shift) |
-   range->flags[HMM_PFN_VALID];
-}
-
  /* Don't fault in missing PTEs, just snapshot the current state. */
  #define HMM_FAULT_SNAPSHOT(1 << 1)
  
diff --git a/mm/hmm.c b/mm/hmm.c

index b4f662eadb7a7c..687d21c675ee60 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -37,6 +37,18 @@ enum {
NEED_WRITE_FAULT = 1 << 1,
  };
  
+/*

+ * hmm_device_entry_from_pfn() - create a valid device entry value from pfn
+ * @range: range use to encode HMM pfn value
+ * @pfn: pfn value for which to create the device entry
+ * Return: valid device entry for the pfn
+ */
+static uint64_t hmm_device_entry_from_pfn(const struct hmm_range *range,
+ unsigned long pfn)
+{
+   return (pfn << range->pfn_shift) | range->flags[HMM_PFN_VALID];
+}
+
  static int hmm_pfns_fill(unsigned long addr, unsigned long end,
struct hmm_range *range, enum hmm_pfn_value_e value)
  {




Re: [PATCH 2/2] mm: remove device private page support from hmm_range_fault

2020-03-16 Thread Ralph Campbell



On 3/16/20 1:09 PM, Jason Gunthorpe wrote:

On Mon, Mar 16, 2020 at 07:49:35PM +0100, Christoph Hellwig wrote:

On Mon, Mar 16, 2020 at 11:42:19AM -0700, Ralph Campbell wrote:


On 3/16/20 10:52 AM, Christoph Hellwig wrote:

No driver has actually properly wired up and supported this feature.
There is various code related to it in nouveau, but as far as I can tell
it never actually got turned on, and the only changes since the initial
commit are global cleanups.


This is not actually true. OpenCL 2.x does support SVM with nouveau and
device private memory via clEnqueueSVMMigrateMem().
Also, Ben Skeggs has accepted a set of patches to map GPU memory after being
migrated and this change would conflict with that.


Can you explain to me how we actually invoke this code?

For that we'd need HMM_PFN_DEVICE_PRIVATE NVIF_VMM_PFNMAP_V0_VRAM
set in ->pfns before calling hmm_range_fault, which isn't happening.


Oh, I got tripped on this too

The logic is backwards from what you'd think. If you *set*
HMM_PFN_DEVICE_PRIVATE then this triggers:

hmm_pte_need_fault():
if ((cpu_flags & range->flags[HMM_PFN_DEVICE_PRIVATE])) {
/* Do we fault on device memory ? */
if (pfns & range->flags[HMM_PFN_DEVICE_PRIVATE]) {
*write_fault = pfns & range->flags[HMM_PFN_WRITE];
*fault = true;
}
return;
}

Ie if the cpu page is a DEVICE_PRIVATE and the caller sets
HMM_PFN_DEVICE_PRIVATE in the input flags (pfns) then it always faults
it and never sets HMM_PFN_DEVICE_PRIVATE in the output flags.

So setting 0 enabled device_private support, and nouveau is Ok.

AMDGPU is broken because it can't handle device private and can't set
the flag to suppress it.

I was going to try and sort this out as part of getting rid of range->flags

Jason



The reason for it being backwards is that "normally" a device doesn't want
the device private page to be faulted back to system memory; it wants to
get the device private struct page so it can update its page table to point
to the memory already in the device.
Also, a device like Nvidia's GPUs may have an alternate path for copying
one GPU's memory to another (NVLink) without going through system memory,
so getting a device private struct page/PFN from hmm_range_fault() that isn't
"owned" by the faulting GPU is useful.
I agree that the current code isn't well tested or thought out for multiple
devices (rdma, NVMe drives, GPUs, etc.), but it also ties in with
peer-to-peer access via PCIe.


Re: [PATCH 2/2] mm: remove device private page support from hmm_range_fault

2020-03-16 Thread Ralph Campbell



On 3/16/20 11:49 AM, Christoph Hellwig wrote:

On Mon, Mar 16, 2020 at 11:42:19AM -0700, Ralph Campbell wrote:


On 3/16/20 10:52 AM, Christoph Hellwig wrote:

No driver has actually properly wired up and supported this feature.
There is various code related to it in nouveau, but as far as I can tell
it never actually got turned on, and the only changes since the initial
commit are global cleanups.


This is not actually true. OpenCL 2.x does support SVM with nouveau and
device private memory via clEnqueueSVMMigrateMem().
Also, Ben Skeggs has accepted a set of patches to map GPU memory after being
migrated and this change would conflict with that.


Can you explain to me how we actually invoke this code?


GPU memory is allocated when the device private memory "struct page" is
allocated. See where nouveau_dmem_chunk_alloc() calls nouveau_bo_new().
Then when a page is migrated to the GPU, the GPU memory physical address
is just the offset into the "fake" PFN range allocated by
devm_request_free_mem_region().

I'm looking into allocating GPU memory at the time of migration instead of when
the device private memory struct pages are allocated but that is a future
improvement.

System memory is migrated to GPU memory:
# mesa
clEnqueueSVMMigrateMem()
  svm_migrate_op()
q.svm_migrate()
  pipe->svm_migrate() // really nvc0_svm_migrate()
drmCommandWrite() // in libdrm
  drmIoctl()
ioctl()
  nouveau_drm_ioctl() // nouveau_drm.c
drm_ioctl()
  nouveau_svmm_bind()
nouveau_dmem_migrate_vma()
  migrate_vma_setup()
  nouveau_dmem_migrate_chunk()
nouveau_dmem_migrate_copy_one()
  // allocate device private struct page
  dpage = nouveau_dmem_page_alloc_locked()
dpage = nouveau_dmem_pages_alloc()
// Get GPU VRAM physical address
nouveau_dmem_page_addr(dpage)
// This does the DMA to GPU memory
drm->dmem->migrate.copy_func()
  migrate_vma_pages()
  migrate_vma_finalize()

Without my recent patch set, there is no GPU page table entry created for
this migrated memory so there will be a GPU fault which is handled in a
worker thread:
nouveau_svm_fault()
  // examine fault buffer entries and compute range of pages
  nouveau_range_fault()
// This will fill in the pfns array with a device private entry PFN
hmm_range_fault()
// This sees the range->flags[HMM_PFN_DEVICE_PRIVATE] flag
// and converts the HMM PFN to a GPU physical address
nouveau_dmem_convert_pfn()
// This sets up the GPU page tables
nvif_object_ioctl()


For that we'd need HMM_PFN_DEVICE_PRIVATE NVIF_VMM_PFNMAP_V0_VRAM
set in ->pfns before calling hmm_range_fault, which isn't happening.



It is set by hmm_range_fault() via the range->flags[HMM_PFN_DEVICE_PRIVATE]
entry when hmm_range_fault() sees a device private struct page. The call to
nouveau_dmem_convert_pfn() is just replacing the "fake" PFN with the real PFN
but not clearing/changing the read/write or VRAM/system memory PTE bits.


Re: [PATCH 1/4] memremap: add an owner field to struct dev_pagemap

2020-03-16 Thread Ralph Campbell



On 3/16/20 12:32 PM, Christoph Hellwig wrote:

Add a new opaque owner field to struct dev_pagemap, which will allow
the hmm and migrate_vma code to identify who owns ZONE_DEVICE memory,
and refuse to work on mappings not owned by the calling entity.

Signed-off-by: Christoph Hellwig 


This looks like a reasonable approach to take.
Reviewed-by: Ralph Campbell 


---
  arch/powerpc/kvm/book3s_hv_uvmem.c | 2 ++
  drivers/gpu/drm/nouveau/nouveau_dmem.c | 1 +
  include/linux/memremap.h   | 4 
  mm/memremap.c  | 4 
  4 files changed, 11 insertions(+)

diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c 
b/arch/powerpc/kvm/book3s_hv_uvmem.c
index 79b1202b1c62..67fefb03b9b7 100644
--- a/arch/powerpc/kvm/book3s_hv_uvmem.c
+++ b/arch/powerpc/kvm/book3s_hv_uvmem.c
@@ -779,6 +779,8 @@ int kvmppc_uvmem_init(void)
kvmppc_uvmem_pgmap.type = MEMORY_DEVICE_PRIVATE;
kvmppc_uvmem_pgmap.res = *res;
kvmppc_uvmem_pgmap.ops = _uvmem_ops;
+   /* just one global instance: */
+   kvmppc_uvmem_pgmap.owner = _uvmem_pgmap;
addr = memremap_pages(_uvmem_pgmap, NUMA_NO_NODE);
if (IS_ERR(addr)) {
ret = PTR_ERR(addr);
diff --git a/drivers/gpu/drm/nouveau/nouveau_dmem.c 
b/drivers/gpu/drm/nouveau/nouveau_dmem.c
index 0ad5d87b5a8e..a4682272586e 100644
--- a/drivers/gpu/drm/nouveau/nouveau_dmem.c
+++ b/drivers/gpu/drm/nouveau/nouveau_dmem.c
@@ -526,6 +526,7 @@ nouveau_dmem_init(struct nouveau_drm *drm)
drm->dmem->pagemap.type = MEMORY_DEVICE_PRIVATE;
drm->dmem->pagemap.res = *res;
drm->dmem->pagemap.ops = _dmem_pagemap_ops;
+   drm->dmem->pagemap.owner = drm->dev;
if (IS_ERR(devm_memremap_pages(device, >dmem->pagemap)))
goto out_free;
  
diff --git a/include/linux/memremap.h b/include/linux/memremap.h

index 6fefb09af7c3..60d97e8fd3c0 100644
--- a/include/linux/memremap.h
+++ b/include/linux/memremap.h
@@ -103,6 +103,9 @@ struct dev_pagemap_ops {
   * @type: memory type: see MEMORY_* in memory_hotplug.h
   * @flags: PGMAP_* flags to specify defailed behavior
   * @ops: method table
+ * @owner: an opaque pointer identifying the entity that manages this
+ * instance.  Used by various helpers to make sure that no
+ * foreign ZONE_DEVICE memory is accessed.
   */
  struct dev_pagemap {
struct vmem_altmap altmap;
@@ -113,6 +116,7 @@ struct dev_pagemap {
enum memory_type type;
unsigned int flags;
const struct dev_pagemap_ops *ops;
+   void *owner;
  };
  
  static inline struct vmem_altmap *pgmap_altmap(struct dev_pagemap *pgmap)

diff --git a/mm/memremap.c b/mm/memremap.c
index 09b5b7adc773..9b2c97ceb775 100644
--- a/mm/memremap.c
+++ b/mm/memremap.c
@@ -181,6 +181,10 @@ void *memremap_pages(struct dev_pagemap *pgmap, int nid)
WARN(1, "Missing migrate_to_ram method\n");
return ERR_PTR(-EINVAL);
}
+   if (!pgmap->owner) {
+   WARN(1, "Missing owner\n");
+   return ERR_PTR(-EINVAL);
+   }
break;
case MEMORY_DEVICE_FS_DAX:
if (!IS_ENABLED(CONFIG_ZONE_DEVICE) ||




Re: [PATCH 3/4] mm: simplify device private page handling in hmm_range_fault

2020-03-16 Thread Ralph Campbell



On 3/16/20 12:32 PM, Christoph Hellwig wrote:

Remove the code to fault device private pages back into system memory
that has never been used by any driver.  Also replace the usage of the
HMM_PFN_DEVICE_PRIVATE flag in the pfns array with a simple
is_device_private_page check in nouveau.

Signed-off-by: Christoph Hellwig 


Getting rid of HMM_PFN_DEVICE_PRIVATE seems reasonable to me since a driver can
look at the struct page but what if a driver needs to fault in a page from
another device's private memory? Should it call handle_mm_fault()?



---
  drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c |  1 -
  drivers/gpu/drm/nouveau/nouveau_dmem.c  |  5 +++--
  drivers/gpu/drm/nouveau/nouveau_svm.c   |  1 -
  include/linux/hmm.h |  2 --
  mm/hmm.c| 25 +
  5 files changed, 8 insertions(+), 26 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index dee446278417..90821ce5e6ca 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -776,7 +776,6 @@ struct amdgpu_ttm_tt {
  static const uint64_t hmm_range_flags[HMM_PFN_FLAG_MAX] = {
(1 << 0), /* HMM_PFN_VALID */
(1 << 1), /* HMM_PFN_WRITE */
-   0 /* HMM_PFN_DEVICE_PRIVATE */
  };
  
  static const uint64_t hmm_range_values[HMM_PFN_VALUE_MAX] = {

diff --git a/drivers/gpu/drm/nouveau/nouveau_dmem.c 
b/drivers/gpu/drm/nouveau/nouveau_dmem.c
index 0e36345d395c..edfd0805fba4 100644
--- a/drivers/gpu/drm/nouveau/nouveau_dmem.c
+++ b/drivers/gpu/drm/nouveau/nouveau_dmem.c
@@ -28,6 +28,7 @@
  
  #include 

  #include 
+#include 
  #include 
  #include 
  
@@ -692,9 +693,8 @@ nouveau_dmem_convert_pfn(struct nouveau_drm *drm,

if (page == NULL)
continue;
  
-		if (!(range->pfns[i] & range->flags[HMM_PFN_DEVICE_PRIVATE])) {

+   if (!is_device_private_page(page))
continue;
-   }
  
  		if (!nouveau_dmem_page(drm, page)) {

WARN(1, "Some unknown device memory !\n");
@@ -705,5 +705,6 @@ nouveau_dmem_convert_pfn(struct nouveau_drm *drm,
addr = nouveau_dmem_page_addr(page);
range->pfns[i] &= ((1UL << range->pfn_shift) - 1);
range->pfns[i] |= (addr >> PAGE_SHIFT) << range->pfn_shift;
+   range->pfns[i] |= NVIF_VMM_PFNMAP_V0_VRAM;
}
  }
diff --git a/drivers/gpu/drm/nouveau/nouveau_svm.c 
b/drivers/gpu/drm/nouveau/nouveau_svm.c
index df9bf1fd1bc0..39c731a99937 100644
--- a/drivers/gpu/drm/nouveau/nouveau_svm.c
+++ b/drivers/gpu/drm/nouveau/nouveau_svm.c
@@ -367,7 +367,6 @@ static const u64
  nouveau_svm_pfn_flags[HMM_PFN_FLAG_MAX] = {
[HMM_PFN_VALID ] = NVIF_VMM_PFNMAP_V0_V,
[HMM_PFN_WRITE ] = NVIF_VMM_PFNMAP_V0_W,
-   [HMM_PFN_DEVICE_PRIVATE] = NVIF_VMM_PFNMAP_V0_VRAM,
  };
  
  static const u64

diff --git a/include/linux/hmm.h b/include/linux/hmm.h
index 4bf8d6997b12..5e6034f105c3 100644
--- a/include/linux/hmm.h
+++ b/include/linux/hmm.h
@@ -74,7 +74,6 @@
   * Flags:
   * HMM_PFN_VALID: pfn is valid. It has, at least, read permission.
   * HMM_PFN_WRITE: CPU page table has write permission set
- * HMM_PFN_DEVICE_PRIVATE: private device memory (ZONE_DEVICE)
   *
   * The driver provides a flags array for mapping page protections to device
   * PTE bits. If the driver valid bit for an entry is bit 3,
@@ -86,7 +85,6 @@
  enum hmm_pfn_flag_e {
HMM_PFN_VALID = 0,
HMM_PFN_WRITE,
-   HMM_PFN_DEVICE_PRIVATE,
HMM_PFN_FLAG_MAX
  };
  
diff --git a/mm/hmm.c b/mm/hmm.c

index 180e398170b0..cfad65f6a67b 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -118,15 +118,6 @@ static inline void hmm_pte_need_fault(const struct 
hmm_vma_walk *hmm_vma_walk,
/* We aren't ask to do anything ... */
if (!(pfns & range->flags[HMM_PFN_VALID]))
return;
-   /* If this is device memory then only fault if explicitly requested */
-   if ((cpu_flags & range->flags[HMM_PFN_DEVICE_PRIVATE])) {
-   /* Do we fault on device memory ? */
-   if (pfns & range->flags[HMM_PFN_DEVICE_PRIVATE]) {
-   *write_fault = pfns & range->flags[HMM_PFN_WRITE];
-   *fault = true;
-   }
-   return;
-   }
  
  	/* If CPU page table is not valid then we need to fault */

*fault = !(cpu_flags & range->flags[HMM_PFN_VALID]);
@@ -260,21 +251,15 @@ static int hmm_vma_handle_pte(struct mm_walk *walk, 
unsigned long addr,
swp_entry_t entry = pte_to_swp_entry(pte);
  
  		/*

-* This is a special swap entry, ignore migration, use
-* device and report anything else as error.
+* Never fault in device private pages, but just report
+* the PFN even if not present.
 */

Re: linux-next: manual merge of the hmm tree with the drm tree

2020-07-30 Thread Ralph Campbell



On 7/30/20 5:03 AM, Jason Gunthorpe wrote:

On Thu, Jul 30, 2020 at 07:21:10PM +1000, Stephen Rothwell wrote:

Hi all,

Today's linux-next merge of the hmm tree got a conflict in:

   drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmmgp100.c

between commit:

   7763d24f3ba0 ("drm/nouveau/vmm/gp100-: fix mapping 2MB sysmem pages")

from the drm tree and commits:

   4725c6b82a48 ("nouveau: fix mapping 2MB sysmem pages")
   1a77decd0cae ("nouveau: fix storing invalid ptes")

from the hmm tree.

7763d24f3ba0 and 4725c6b82a48 are exactly the same patch.


Oh? Ralph? What happened here?


Ben did email me saying he was planning to take this patch into
his nouveau tree and I did reply to him saying you had also taken it
into your tree and that I had more nouveau/SVM patches for you on the way.
So, I'm not sure what happened.


Ben can you drop 7763d24f3ba0 ?

Jason




Re: [PATCH v3 2/8] mm: remove extra ZONE_DEVICE struct page refcount

2021-06-17 Thread Ralph Campbell



On 6/17/21 8:16 AM, Alex Sierra wrote:

From: Ralph Campbell 

ZONE_DEVICE struct pages have an extra reference count that complicates the
code for put_page() and several places in the kernel that need to check the
reference count to see that a page is not being used (gup, compaction,
migration, etc.). Clean up the code so the reference count doesn't need to
be treated specially for ZONE_DEVICE.

v2:
AS: merged this patch in linux 5.11 version

Signed-off-by: Ralph Campbell 
Signed-off-by: Alex Sierra 
---
  arch/powerpc/kvm/book3s_hv_uvmem.c |  2 +-
  drivers/gpu/drm/nouveau/nouveau_dmem.c |  2 +-
  fs/dax.c   |  4 +-
  include/linux/dax.h|  2 +-
  include/linux/memremap.h   |  7 +--
  include/linux/mm.h | 44 -
  lib/test_hmm.c |  2 +-
  mm/internal.h  |  8 +++
  mm/memremap.c  | 68 +++---
  mm/migrate.c   |  5 --
  mm/page_alloc.c|  3 ++
  mm/swap.c  | 45 ++---
  12 files changed, 45 insertions(+), 147 deletions(-)


I think it is great that you are picking this up and trying to revive it.

However, I have a number of concerns about how it affects existing ZONE_DEVICE
MEMORY_DEVICE_GENERIC and MEMORY_DEVICE_FS_DAX users and I don't see this
addressing them. For example, dev_dax_probe() allocates MEMORY_DEVICE_GENERIC
struct pages and then:
  dev_dax_fault()
dev_dax_huge_fault()
  __dev_dax_pte_fault()
vmf_insert_mixed()
which just inserts the PFN into the CPU page tables without increasing the page
refcount so it is zero (whereas it was one before). But using get_page() will
trigger VM_BUG_ON_PAGE() if it is enabled. There isn't any current notion of
free versus allocated for these struct pages. I suppose init_page_count()
could be called on all the struct pages in dev_dax_probe() to fix that though.
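
Something along these lines is what I have in mind (purely a sketch, not
tested; the field names are taken from struct dev_pagemap):

/* Hypothetical fixup: give MEMORY_DEVICE_GENERIC pages a conventional
 * starting refcount so later get_page()/put_page() pairs balance after
 * vmf_insert_mixed(). */
static void dev_dax_init_page_counts(struct dev_pagemap *pgmap)
{
	int i;

	for (i = 0; i < pgmap->nr_range; i++) {
		unsigned long pfn = PHYS_PFN(pgmap->ranges[i].start);
		unsigned long end = pfn + PHYS_PFN(range_len(&pgmap->ranges[i]));

		for (; pfn < end; pfn++)
			init_page_count(pfn_to_page(pfn));
	}
}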

I'm even less clear about how to fix MEMORY_DEVICE_FS_DAX. File systems have
clear allocate and free states for backing storage, but there are the
complications with the page cache references, etc. to consider. The >1 to 1
reference count seems to be used to tell when a page is idle (no I/O, reclaim
scanners) rather than free (not allocated to any file), but I'm not 100% sure
about that since I don't really understand all the issues around why a file
system needs to have a DAX mount option besides knowing that the storage
block size has to be a multiple of the page size.
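
For reference, the current "idle" test that relies on that >1 to 1 transition
looks roughly like this (quoted from memory, so treat it as approximate):

/* A FS_DAX page is considered idle once only the extra ZONE_DEVICE
 * reference remains. */
static inline bool dax_page_unused(struct page *page)
{
	return page_ref_count(page) == 1;
}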



Re: [PATCH v3 2/8] mm: remove extra ZONE_DEVICE struct page refcount

2021-06-29 Thread Ralph Campbell

On 6/28/21 9:46 AM, Felix Kuehling wrote:

Am 2021-06-17 um 3:16 p.m. schrieb Ralph Campbell:

On 6/17/21 8:16 AM, Alex Sierra wrote:

From: Ralph Campbell 

ZONE_DEVICE struct pages have an extra reference count that
complicates the
code for put_page() and several places in the kernel that need to
check the
reference count to see that a page is not being used (gup, compaction,
migration, etc.). Clean up the code so the reference count doesn't
need to
be treated specially for ZONE_DEVICE.

v2:
AS: merged this patch in linux 5.11 version

Signed-off-by: Ralph Campbell 
Signed-off-by: Alex Sierra 
---
   arch/powerpc/kvm/book3s_hv_uvmem.c |  2 +-
   drivers/gpu/drm/nouveau/nouveau_dmem.c |  2 +-
   fs/dax.c   |  4 +-
   include/linux/dax.h    |  2 +-
   include/linux/memremap.h   |  7 +--
   include/linux/mm.h | 44 -
   lib/test_hmm.c |  2 +-
   mm/internal.h  |  8 +++
   mm/memremap.c  | 68 +++---
   mm/migrate.c   |  5 --
   mm/page_alloc.c    |  3 ++
   mm/swap.c  | 45 ++---
   12 files changed, 45 insertions(+), 147 deletions(-)


I think it is great that you are picking this up and trying to revive it.

However, I have a number of concerns about how it affects existing
ZONE_DEVICE
MEMORY_DEVICE_GENERIC and MEMORY_DEVICE_FS_DAX users and I don't see this
addressing them. For example, dev_dax_probe() allocates
MEMORY_DEVICE_GENERIC
struct pages and then:
   dev_dax_fault()
     dev_dax_huge_fault()
   __dev_dax_pte_fault()
     vmf_insert_mixed()
which just inserts the PFN into the CPU page tables without increasing
the page
refcount so it is zero (whereas it was one before). But using
get_page() will
trigger VM_BUG_ON_PAGE() if it is enabled. There isn't any current
notion of
free verses allocated for these struct pages. I suppose init_page_count()
could be called on all the struct pages in dev_dax_probe() to fix that
though.

Hi Ralph,

For DEVICE_GENERIC pages free_zone_device_page doesn't do anything. So
I'm not sure what the reference counting is good for in this case.

Alex is going to add free_zone_device_page support for DEVICE_GENERIC
pages (patch 8 of this series). However, even then, it only does
anything if there is an actual call to put_page. Where would that call
come from in the dev_dax driver case?


Correct, the drivers/dax/device.c driver allocates MEMORY_DEVICE_GENERIC
struct pages and doesn't seem to allocate/free the page nor increment/decrement
the reference count but it does call vmf_insert_mixed() if the /dev/file
is mmap()'ed into a user process' address space. If devm_memremap_pages()
returns the array of ZONE_DEVICE struct pages initialized with a reference
count of zero, then the CPU page tables will have a PTE/PFN that points to
a struct page with a zero reference count. This goes against the normal
expectation in the rest of the mm code that assumes a page mapped by a CPU
has a non-zero reference count.
So yes, nothing "bad" happens because put_page() isn't called but the
reference count will differ from other drivers that call vmf_insert_mixed()
or vm_insert_page() where the page was allocated with alloc_pages() or
similar.


I'm even less clear about how to fix MEMORY_DEVICE_FS_DAX. File
systems have clear
allocate and free states for backing storage but there are the
complications with
the page cache references, etc. to consider. The >1 to 1 reference
count seems to
be used to tell when a page is idle (no I/O, reclaim scanners) rather
than free
(not allocated to any file) but I'm not 100% sure about that since I
don't really
understand all the issues around why a file system needs to have a DAX
mount option
besides knowing that the storage block size has to be a multiple of
the page size.

The only thing that happens in free_zone_device_page for FS_DAX pages is
wake_up_var(&page->_refcount). I guess whoever is waiting for this
wake-up will need to be prepared to see a refcount 0 instead of 1 now. I
see these callers that would need to be updated:

./fs/ext4/inode.c:        error = ___wait_var_event(&page->_refcount,
./fs/ext4/inode.c-                atomic_read(&page->_refcount) == 1,
./fs/ext4/inode.c-                TASK_INTERRUPTIBLE, 0, 0,
./fs/ext4/inode.c-                ext4_wait_dax_page(ei));
--
./fs/fuse/dax.c:    return ___wait_var_event(&page->_refcount,
./fs/fuse/dax.c-            atomic_read(&page->_refcount) == 1, TASK_INTERRUPTIBLE,
./fs/fuse/dax.c-            0, 0, fuse_wait_dax_page(inode));
--
./fs/xfs/xfs_file.c:    return ___wait_var_event(&page->_refcount,
./fs/xfs/xfs_file.c-            atomic_read(&page->_refcount) == 1, TASK_INTERRUPTIBLE,
./fs/xfs/xfs_file.c-            0, 0, xfs_wait_dax_page(inode));
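
A sketch of what that adjustment might look like in the ext4 caller, assuming
the zero-based refcount scheme from this series (hypothetical, not part of the
posted patches):

	/* "Idle" becomes a refcount of 0 once the extra ZONE_DEVICE
	 * reference is gone. */
	error = ___wait_var_event(&page->_refcount,
			atomic_read(&page->_refcount) == 0,
			TASK_INTERRUPTIBLE, 0, 0,
			ext4_wait_dax_page(ei));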

Regarding your page-cache comment, doesn't DAX mean that those file
pages a

RE: [PATCH v3 6/8] mm: Selftests for exclusive device memory

2021-03-01 Thread Ralph Campbell
> From: Alistair Popple 
> Sent: Thursday, February 25, 2021 11:19 PM
> To: linux...@kvack.org; nouv...@lists.freedesktop.org;
> bske...@redhat.com; a...@linux-foundation.org
> Cc: linux-...@vger.kernel.org; linux-ker...@vger.kernel.org; dri-
> de...@lists.freedesktop.org; John Hubbard ; Ralph
> Campbell ; jgli...@redhat.com; Jason Gunthorpe
> ; h...@infradead.org; dan...@ffwll.ch; Alistair Popple
> 
> Subject: [PATCH v3 6/8] mm: Selftests for exclusive device memory
> 
> Adds some selftests for exclusive device memory.
> 
> Signed-off-by: Alistair Popple 

One minor nit below, but you can add
Tested-by: Ralph Campbell 
Reviewed-by: Ralph Campbell 

> +static int dmirror_exclusive(struct dmirror *dmirror,
> +  struct hmm_dmirror_cmd *cmd)
> +{
> + unsigned long start, end, addr;
> + unsigned long size = cmd->npages << PAGE_SHIFT;
> + struct mm_struct *mm = dmirror->notifier.mm;
> + struct page *pages[64];
> + struct dmirror_bounce bounce;
> + unsigned long next;
> + int ret;
> +
> + start = cmd->addr;
> + end = start + size;
> + if (end < start)
> + return -EINVAL;
> +
> + /* Since the mm is for the mirrored process, get a reference first. */
> + if (!mmget_not_zero(mm))
> + return -EINVAL;
> +
> + mmap_read_lock(mm);
> + for (addr = start; addr < end; addr = next) {
> + int i, mapped;
> +
> + if (end < addr + (64 << PAGE_SHIFT))
> + next = end;
> + else
> + next = addr + (64 << PAGE_SHIFT);

I suggest using ARRAY_SIZE(pages) instead of '64' to make the meaning clear.
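
That is, something like (sketch):

		/* Derive the chunk size from the array itself. */
		if (end < addr + (ARRAY_SIZE(pages) << PAGE_SHIFT))
			next = end;
		else
			next = addr + (ARRAY_SIZE(pages) << PAGE_SHIFT);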



RE: [PATCH v3 5/8] mm: Device exclusive memory access

2021-03-01 Thread Ralph Campbell
> From: Alistair Popple 
> Sent: Thursday, February 25, 2021 11:18 PM
> To: linux...@kvack.org; nouv...@lists.freedesktop.org;
> bske...@redhat.com; a...@linux-foundation.org
> Cc: linux-...@vger.kernel.org; linux-ker...@vger.kernel.org; dri-
> de...@lists.freedesktop.org; John Hubbard ; Ralph
> Campbell ; jgli...@redhat.com; Jason Gunthorpe
> ; h...@infradead.org; dan...@ffwll.ch; Alistair Popple
> 
> Subject: [PATCH v3 5/8] mm: Device exclusive memory access
> 
> Some devices require exclusive write access to shared virtual memory (SVM)
> ranges to perform atomic operations on that memory. This requires CPU page
> tables to be updated to deny access whilst atomic operations are occurring.
> 
> In order to do this introduce a new swap entry type (SWP_DEVICE_EXCLUSIVE).
> When a SVM range needs to be marked for exclusive access by a device all page
> table mappings for the particular range are replaced with device exclusive 
> swap
> entries. This causes any CPU access to the page to result in a fault.
> 
> Faults are resovled by replacing the faulting entry with the original 
> mapping. This
> results in MMU notifiers being called which a driver uses to update access
> permissions such as revoking atomic access. After notifiers have been called 
> the
> device will no longer have exclusive access to the region.
> 
> Signed-off-by: Alistair Popple 
> ---
>  Documentation/vm/hmm.rst |  15 
>  include/linux/rmap.h |   3 +
>  include/linux/swap.h |   4 +-
>  include/linux/swapops.h  |  44 ++-
>  mm/hmm.c |   5 ++
>  mm/memory.c  | 108 +-
>  mm/mprotect.c|   8 ++
>  mm/page_vma_mapped.c |   9 ++-
>  mm/rmap.c| 163 +++
>  9 files changed, 352 insertions(+), 7 deletions(-)
...
> +int make_device_exclusive_range(struct mm_struct *mm, unsigned long start,
> + unsigned long end, struct page **pages) {
> + long npages = (end - start) >> PAGE_SHIFT;
> + long i;

Nit: you should use unsigned long for 'i' and 'npages' to match start/end.
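
That is (sketch):

	/* Match the types of start/end so the page count can't truncate. */
	unsigned long npages = (end - start) >> PAGE_SHIFT;
	unsigned long i;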


Re: [PATCH v4 2/8] mm/swapops: Rework swap entry manipulation code

2021-03-08 Thread Ralph Campbell



On 3/3/21 10:16 PM, Alistair Popple wrote:

Both migration and device private pages use special swap entries that
are manipluated by a range of inline functions. The arguments to these
are somewhat inconsitent so rework them to remove flag type arguments
and to make the arguments similar for both read and write entry
creation.

Signed-off-by: Alistair Popple 
Reviewed-by: Christoph Hellwig 
Reviewed-by: Jason Gunthorpe 


Looks good to me too.
Reviewed-by: Ralph Campbell 


Re: [PATCH v4 1/8] mm: Remove special swap entry functions

2021-03-08 Thread Ralph Campbell



On 3/3/21 10:16 PM, Alistair Popple wrote:

Remove the migration and device private entry_to_page() and
entry_to_pfn() inline functions and instead open code them directly.
This results in shorter code which is easier to understand.

Signed-off-by: Alistair Popple 


Looks OK to me.
Reviewed-by: Ralph Campbell 


---

v4:
* Added pfn_swap_entry_to_page()
* Reinstated check that migration entries point to locked pages
* Removed #define swapcache_prepare which isn't needed for CONFIG_SWAP=0
   builds
---
  arch/s390/mm/pgtable.c  |  2 +-
  fs/proc/task_mmu.c  | 23 +-
  include/linux/swap.h|  4 +--
  include/linux/swapops.h | 69 ++---
  mm/hmm.c|  5 ++-
  mm/huge_memory.c|  4 +--
  mm/memcontrol.c |  2 +-
  mm/memory.c | 10 +++---
  mm/migrate.c|  6 ++--
  mm/page_vma_mapped.c|  6 ++--
  10 files changed, 50 insertions(+), 81 deletions(-)

diff --git a/arch/s390/mm/pgtable.c b/arch/s390/mm/pgtable.c
index 18205f851c24..aae001096c46 100644
--- a/arch/s390/mm/pgtable.c
+++ b/arch/s390/mm/pgtable.c
@@ -691,7 +691,7 @@ static void ptep_zap_swap_entry(struct mm_struct *mm, 
swp_entry_t entry)
if (!non_swap_entry(entry))
dec_mm_counter(mm, MM_SWAPENTS);
else if (is_migration_entry(entry)) {
-   struct page *page = migration_entry_to_page(entry);
+   struct page *page = pfn_swap_entry_to_page(entry));
  
  		dec_mm_counter(mm, mm_counter(page));

}
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 602e3a52884d..7c75af0fc423 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -514,10 +514,8 @@ static void smaps_pte_entry(pte_t *pte, unsigned long addr,
} else {
mss->swap_pss += (u64)PAGE_SIZE << PSS_SHIFT;
}
-   } else if (is_migration_entry(swpent))
-   page = migration_entry_to_page(swpent);
-   else if (is_device_private_entry(swpent))
-   page = device_private_entry_to_page(swpent);
+   } else if (is_pfn_swap_entry(swpent))
+   page = pfn_swap_entry_to_page(swpent);
} else if (unlikely(IS_ENABLED(CONFIG_SHMEM) && mss->check_shmem_swap
&& pte_none(*pte))) {
page = xa_load(>vm_file->f_mapping->i_pages,
@@ -549,7 +547,7 @@ static void smaps_pmd_entry(pmd_t *pmd, unsigned long addr,
swp_entry_t entry = pmd_to_swp_entry(*pmd);
  
  		if (is_migration_entry(entry))

-   page = migration_entry_to_page(entry);
+   page = pfn_swap_entry_to_page(entry);
}
if (IS_ERR_OR_NULL(page))
return;
@@ -691,10 +689,8 @@ static int smaps_hugetlb_range(pte_t *pte, unsigned long 
hmask,
} else if (is_swap_pte(*pte)) {
swp_entry_t swpent = pte_to_swp_entry(*pte);
  
-		if (is_migration_entry(swpent))

-   page = migration_entry_to_page(swpent);
-   else if (is_device_private_entry(swpent))
-   page = device_private_entry_to_page(swpent);
+   if (is_pfn_swap_entry(swpent))
+   page = pfn_swap_entry_to_page(swpent);
}
if (page) {
int mapcount = page_mapcount(page);
@@ -1382,11 +1378,8 @@ static pagemap_entry_t pte_to_pagemap_entry(struct 
pagemapread *pm,
frame = swp_type(entry) |
(swp_offset(entry) << MAX_SWAPFILES_SHIFT);
flags |= PM_SWAP;
-   if (is_migration_entry(entry))
-   page = migration_entry_to_page(entry);
-
-   if (is_device_private_entry(entry))
-   page = device_private_entry_to_page(entry);
+   if (is_pfn_swap_entry(entry))
+   page = pfn_swap_entry_to_page(entry);
}
  
  	if (page && !PageAnon(page))

@@ -1443,7 +1436,7 @@ static int pagemap_pmd_range(pmd_t *pmdp, unsigned long 
addr, unsigned long end,
if (pmd_swp_soft_dirty(pmd))
flags |= PM_SOFT_DIRTY;
VM_BUG_ON(!is_pmd_migration_entry(pmd));
-   page = migration_entry_to_page(entry);
+   page = pfn_swap_entry_to_page(entry);
}
  #endif
  
diff --git a/include/linux/swap.h b/include/linux/swap.h

index 596bc2f4d9b0..57a7690966a4 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -519,8 +519,8 @@ static inline void show_swap_cache_info(void)
  {
  }
  
-#define free_swap_and_cache(e) ({(is_migration_entry(e) || is_device_private_entry(e));})

-#define swapcache_prepare(e) ({(is_migration_entry(e) || 
is_device_private_entry(e)

Re: [PATCH v4 5/8] mm: Device exclusive memory access

2021-03-08 Thread Ralph Campbell



On 3/3/21 10:16 PM, Alistair Popple wrote:

Some devices require exclusive write access to shared virtual
memory (SVM) ranges to perform atomic operations on that memory. This
requires CPU page tables to be updated to deny access whilst atomic
operations are occurring.

In order to do this introduce a new swap entry
type (SWP_DEVICE_EXCLUSIVE). When a SVM range needs to be marked for
exclusive access by a device all page table mappings for the particular
range are replaced with device exclusive swap entries. This causes any
CPU access to the page to result in a fault.

Faults are resovled by replacing the faulting entry with the original
mapping. This results in MMU notifiers being called which a driver uses
to update access permissions such as revoking atomic access. After
notifiers have been called the device will no longer have exclusive
access to the region.

Signed-off-by: Alistair Popple 


I see in the next two patches how make_device_exclusive_entry() and
check_device_exclusive_range() are used. This points out a similar
problem that migrate_vma_setup() had before I added the
mmu_notifier_range_init_migrate() helper to pass a cookie from
migrate_vma_setup() to the invalidation callback so the device driver
could ignore an invalidation callback triggered by the caller, which would
otherwise result in a deadlock or in having to invalidate device PTEs that
wouldn't be migrating.

I think you can eliminate the need for check_device_exclusive_range() in
the same way by adding a "void *" pointer to make_device_exclusive_entry()
and passing that through to try_to_protect(), setting rmap_walk_control rwc.arg
and then passing arg to mmu_notifier_range_init_migrate().
Although, maybe it would be better to define a new
mmu_notifier_range_init_exclusive() and event type MMU_NOTIFY_EXCLUSIVE so
that a device driver can revoke atomic/exclusive access but keep read/write
access to other parts of the page.
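
Roughly what I have in mind (hypothetical helper and event type, modeled on
mmu_notifier_range_init_migrate(); reusing the migrate_pgmap_owner field here
is only for illustration):

/* Hypothetical: lets a driver recognize invalidations triggered by its own
 * make_device_exclusive_entry() call and skip needless device PTE
 * invalidation, while other MMU_NOTIFY_* events are handled as usual. */
static inline void
mmu_notifier_range_init_exclusive(struct mmu_notifier_range *range,
				  unsigned int flags,
				  struct vm_area_struct *vma,
				  struct mm_struct *mm,
				  unsigned long start, unsigned long end,
				  void *owner)
{
	mmu_notifier_range_init(range, MMU_NOTIFY_EXCLUSIVE, flags, vma, mm,
				start, end);
	range->migrate_pgmap_owner = owner;
}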

I thought about how make_device_exclusive_entry() is similar to 
hmm_range_fault()
and whether it would be possible to add a new HMM_PFN_REQ_EXCLUSIVE flag but I
see that make_device_exclusive_entry() returns the pages locked and with an
additional get_page() reference. This doesn't fit well with the other
hmm_range_fault() entries being returned as a "snapshot" so having a different
API makes sense. I think it would be useful to add a HMM_PFN_EXCLUSIVE flag so
that snapshots of the page tables can at least report that a page is exclusively
being accessed by *some* device. Unfortunately, there is no pgmap pointer to be
able to tell which device has exclusive access (since any struct page could be
exclusively accessed, not just device private ones).


Re: [PATCH v4 3/8] mm/rmap: Split try_to_munlock from try_to_unmap

2021-03-08 Thread Ralph Campbell



On 3/3/21 10:16 PM, Alistair Popple wrote:

The behaviour of try_to_unmap_one() is difficult to follow because it
performs different operations based on a fairly large set of flags used
in different combinations.

TTU_MUNLOCK is one such flag. However it is exclusively used by
try_to_munlock() which specifies no other flags. Therefore rather than
overload try_to_unmap_one() with unrelated behaviour split this out into
it's own function and remove the flag.

Signed-off-by: Alistair Popple 


Looks good to me.
Reviewed-by: Ralph Campbell 


Re: [PATCH v4 4/8] mm/rmap: Split migration into its own function

2021-03-08 Thread Ralph Campbell



On 3/3/21 10:16 PM, Alistair Popple wrote:

Migration is currently implemented as a mode of operation for
try_to_unmap_one() generally specified by passing the TTU_MIGRATION flag
or in the case of splitting a huge anonymous page TTU_SPLIT_FREEZE.

However it does not have much in common with the rest of the unmap
functionality of try_to_unmap_one() and thus splitting it into a
separate function reduces the complexity of try_to_unmap_one() making it
more readable.

Several simplifications can also be made in try_to_migrate_one() based
on the following observations:

  - All users of TTU_MIGRATION also set TTU_IGNORE_MLOCK.
  - No users of TTU_MIGRATION ever set TTU_IGNORE_HWPOISON.
  - No users of TTU_MIGRATION ever set TTU_BATCH_FLUSH.

TTU_SPLIT_FREEZE is a special case of migration used when splitting an
anonymous page. This is most easily dealt with by calling the correct
function from unmap_page() in mm/huge_memory.c  - either
try_to_migrate() for PageAnon or try_to_unmap().

Signed-off-by: Alistair Popple 
Reviewed-by: Christoph Hellwig 


Looks reasonable to me. I do worry a bit about code duplication.
At some point in the future, it might be discovered that other combinations
of TTU_XXX flags are needed, in which case a careful check of try_to_migrate()
and try_to_unmap() will be needed.

Reviewed-by: Ralph Campbell 


RE: [PATCH v3 6/8] mm: Selftests for exclusive device memory

2021-03-01 Thread Ralph Campbell

> From: Jason Gunthorpe 
> Sent: Monday, March 1, 2021 9:56 AM
> To: Alistair Popple 
> Cc: linux...@kvack.org; nouv...@lists.freedesktop.org;
> bske...@redhat.com; a...@linux-foundation.org; linux-...@vger.kernel.org;
> linux-ker...@vger.kernel.org; dri-devel@lists.freedesktop.org; John Hubbard
> ; Ralph Campbell ;
> jgli...@redhat.com; h...@infradead.org; dan...@ffwll.ch
> Subject: Re: [PATCH v3 6/8] mm: Selftests for exclusive device memory
> 
> On Fri, Feb 26, 2021 at 06:18:30PM +1100, Alistair Popple wrote:
> > Adds some selftests for exclusive device memory.
> >
> > Signed-off-by: Alistair Popple 
> > ---
> >  lib/test_hmm.c | 124 ++
> >  lib/test_hmm_uapi.h|   2 +
> >  tools/testing/selftests/vm/hmm-tests.c | 219 +
> >  3 files changed, 345 insertions(+)
> 
> Please get Ralph to review this, otherwise:
> 
> Acked-by: Jason Gunthorpe 
> 
> Jason

I'm working on it. Thanks for encouragement. 



Re: [PATCH v6 02/13] mm: remove extra ZONE_DEVICE struct page refcount

2021-08-18 Thread Ralph Campbell

On 8/17/21 5:35 PM, Felix Kuehling wrote:

Am 2021-08-17 um 8:01 p.m. schrieb Ralph Campbell:

On 8/12/21 11:31 PM, Alex Sierra wrote:

From: Ralph Campbell 

ZONE_DEVICE struct pages have an extra reference count that
complicates the
code for put_page() and several places in the kernel that need to
check the
reference count to see that a page is not being used (gup, compaction,
migration, etc.). Clean up the code so the reference count doesn't
need to
be treated specially for ZONE_DEVICE.

v2:
AS: merged this patch in linux 5.11 version

v5:
AS: add condition at try_grab_page to check for the zone device type,
while
page ref counter is checked less/equal to zero. In case of device
zone, pages
ref counter are initialized to zero.

Signed-off-by: Ralph Campbell 
Signed-off-by: Alex Sierra 
---
   arch/powerpc/kvm/book3s_hv_uvmem.c |  2 +-
   drivers/gpu/drm/nouveau/nouveau_dmem.c |  2 +-
   fs/dax.c   |  4 +-
   include/linux/dax.h    |  2 +-
   include/linux/memremap.h   |  7 +--
   include/linux/mm.h | 13 +
   lib/test_hmm.c |  2 +-
   mm/internal.h  |  8 +++
   mm/memremap.c  | 68 +++---
   mm/migrate.c   |  5 --
   mm/page_alloc.c    |  3 ++
   mm/swap.c  | 45 ++---
   12 files changed, 46 insertions(+), 115 deletions(-)


I haven't seen a response to the issues I raised back at v3 of this
series.
https://lore.kernel.org/linux-mm/4f6dd918-d79b-1aa7-3a4c-caa67ddc2...@nvidia.com/


Did I miss something?

I think part of the response was that we did more testing. Alex added
support for DEVICE_GENERIC pages to test_hmm and he ran DAX tests
recommended by Theodore Tso. In that testing he ran into a WARN_ON_ONCE
about a zero page refcount in try_get_page. The fix is in the latest
version of patch 2. But it's already obsolete because John Hubbard is
about to remove that function altogether.

I think the issues you raised were more uncertainty than known bugs. It
seems the fact that you can have DAX pages with 0 refcount is a feature
more than a bug.

Regards,
   Felix


Did you test on a system without CONFIG_ARCH_HAS_PTE_SPECIAL defined?
In that case, mmap() of a DAX device will call insert_page() which calls
get_page() which would trigger VM_BUG_ON_PAGE().

I can believe it is OK for PTE_SPECIAL page table entries to have no
struct page or that MEMORY_DEVICE_GENERIC struct pages be mapped with
a zero reference count using insert_pfn().

I find it hard to believe that other MM developers don't see an issue
with a struct page with refcount == 0 and mapcount == 1.

I don't see where init_page_count() is being called for the
MEMORY_DEVICE_GENERIC or MEMORY_DEVICE_PRIVATE struct pages the AMD
driver allocates and passes to migrate_vma_setup().
Looks like svm_migrate_get_vram_page() needs to call init_page_count()
instead of get_page(). (I'm looking at branch origin/alexsierrag/device_generic
https://github.com/RadeonOpenCompute/ROCK-Kernel-Driver.git)
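
That is, something along these lines (a sketch that assumes zero-based
ZONE_DEVICE refcounts and elides the driver's other bookkeeping):

static void svm_migrate_get_vram_page(struct svm_range *prange,
				      unsigned long pfn)
{
	struct page *page = pfn_to_page(pfn);

	/* Device private pages now start life with a zero refcount, so take
	 * the initial reference explicitly instead of calling get_page(). */
	init_page_count(page);
	lock_page(page);
}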

Also, what about the other places where is_device_private_page() is called?
Don't they need to be updated to call is_device_page() instead?
One of my goals for this patch was to remove special casing reference counts
for ZONE_DEVICE pages in rmap.c, etc.

I still think this patch needs an ACK from a FS/DAX maintainer.



Re: [PATCH v6 02/13] mm: remove extra ZONE_DEVICE struct page refcount

2021-08-17 Thread Ralph Campbell

On 8/12/21 11:31 PM, Alex Sierra wrote:

From: Ralph Campbell 

ZONE_DEVICE struct pages have an extra reference count that complicates the
code for put_page() and several places in the kernel that need to check the
reference count to see that a page is not being used (gup, compaction,
migration, etc.). Clean up the code so the reference count doesn't need to
be treated specially for ZONE_DEVICE.

v2:
AS: merged this patch in linux 5.11 version

v5:
AS: add condition at try_grab_page to check for the zone device type, while
page ref counter is checked less/equal to zero. In case of device zone, pages
ref counter are initialized to zero.

Signed-off-by: Ralph Campbell 
Signed-off-by: Alex Sierra 
---
  arch/powerpc/kvm/book3s_hv_uvmem.c |  2 +-
  drivers/gpu/drm/nouveau/nouveau_dmem.c |  2 +-
  fs/dax.c   |  4 +-
  include/linux/dax.h|  2 +-
  include/linux/memremap.h   |  7 +--
  include/linux/mm.h | 13 +
  lib/test_hmm.c |  2 +-
  mm/internal.h  |  8 +++
  mm/memremap.c  | 68 +++---
  mm/migrate.c   |  5 --
  mm/page_alloc.c|  3 ++
  mm/swap.c  | 45 ++---
  12 files changed, 46 insertions(+), 115 deletions(-)


I haven't seen a response to the issues I raised back at v3 of this series.
https://lore.kernel.org/linux-mm/4f6dd918-d79b-1aa7-3a4c-caa67ddc2...@nvidia.com/

Did I miss something?



Re: [PATCH v1 02/14] mm: remove extra ZONE_DEVICE struct page refcount

2021-08-25 Thread Ralph Campbell



On 8/25/21 4:15 AM, Vlastimil Babka wrote:

On 8/25/21 05:48, Alex Sierra wrote:

From: Ralph Campbell 

ZONE_DEVICE struct pages have an extra reference count that complicates the
code for put_page() and several places in the kernel that need to check the
reference count to see that a page is not being used (gup, compaction,
migration, etc.). Clean up the code so the reference count doesn't need to
be treated specially for ZONE_DEVICE.

That's certainly welcome. I just wonder what was the reason to use 1 in the
first place and why it's no longer necessary?


I'm sure this is a long story, and I don't know most of the history.
I'm guessing that ZONE_DEVICE started out with a reference count of
one since that is what most "normal" struct page pages start with.
Then put_page() is used to free newly initialized struct pages to the
slab/slob/slub page allocator.
This made it easy to call get_page()/put_page() on ZONE_DEVICE pages
since get_page() asserts that the caller has a reference.
However, most drivers that create ZONE_DEVICE struct pages just insert
a PTE into the user page tables and don't increment/decrement the
reference count. MEMORY_DEVICE_FS_DAX used the >1 to 1 reference count
transition to signal that a page was idle so that made put_page() a
bit more complex. Then MEMORY_DEVICE_PRIVATE pages were added and this
special casing of what "idle" meant got more complicated and more parts
of mm had to check for is_device_private_page().
My goal was to make ZONE_DEVICE struct pages reference counts be zero
based and allocated/freed similar to the page allocator so that more
of the mm code could be used, like THP ZONE_DEVICE pages, without special
casing ZONE_DEVICE.



Re: [PATCH v2 02/12] mm: remove extra ZONE_DEVICE struct page refcount

2021-09-14 Thread Ralph Campbell

On 9/13/21 9:15 AM, Alex Sierra wrote:

From: Ralph Campbell 

ZONE_DEVICE struct pages have an extra reference count that complicates the
code for put_page() and several places in the kernel that need to check the
reference count to see that a page is not being used (gup, compaction,
migration, etc.). Clean up the code so the reference count doesn't need to
be treated specially for ZONE_DEVICE.

Signed-off-by: Ralph Campbell 
Signed-off-by: Alex Sierra 
Reviewed-by: Christoph Hellwig 
---
v2:
AS: merged this patch in linux 5.11 version

v5:
AS: add condition at try_grab_page to check for the zone device type, while
page ref counter is checked less/equal to zero. In case of device zone, pages
ref counter are initialized to zero.

v7:
AS: fix condition at try_grab_page added at v5, is invalid. It supposed
to fix xfstests/generic/413 test, however, there's a known issue on
this test where DAX mapped area DIO to non-DAX expect to fail.
https://patchwork.kernel.org/project/fstests/patch/1489463960-3579-1-git-send-email-xz...@redhat.com
This condition was removed after rebase over patch series
https://lore.kernel.org/r/20210813044133.1536842-4-jhubb...@nvidia.com
---
  arch/powerpc/kvm/book3s_hv_uvmem.c |  2 +-
  drivers/gpu/drm/nouveau/nouveau_dmem.c |  2 +-
  fs/dax.c   |  4 +-
  include/linux/dax.h|  2 +-
  include/linux/memremap.h   |  7 +--
  include/linux/mm.h | 11 
  lib/test_hmm.c |  2 +-
  mm/internal.h  |  8 +++
  mm/memremap.c  | 69 +++---
  mm/migrate.c   |  5 --
  mm/page_alloc.c|  3 ++
  mm/swap.c  | 45 ++---
  12 files changed, 45 insertions(+), 115 deletions(-)


I don't see mm/memcontrol.c listed here so I think you will need this fragment 
too:

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 389b5766e74f..0cc9c7c71e79 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -5541,11 +5541,7 @@ static struct page *mc_handle_swap_pte(struct 
vm_area_struct *vma,
 */
if (is_device_private_entry(ent)) {
page = pfn_swap_entry_to_page(ent);
-   /*
-* MEMORY_DEVICE_PRIVATE means ZONE_DEVICE page and which have
-* a refcount of 1 when free (unlike normal page)
-*/
-   if (!page_ref_add_unless(page, 1, 1))
+   if (!get_page_unless_zero(page))
return NULL;
return page;
}




Re: [PATCH v1 2/2] mm: remove extra ZONE_DEVICE struct page refcount

2021-10-14 Thread Ralph Campbell



On 10/14/21 10:06 AM, Jason Gunthorpe wrote:

On Thu, Oct 14, 2021 at 10:39:28AM -0500, Alex Sierra wrote:

From: Ralph Campbell 

ZONE_DEVICE struct pages have an extra reference count that complicates the
code for put_page() and several places in the kernel that need to check the
reference count to see that a page is not being used (gup, compaction,
migration, etc.). Clean up the code so the reference count doesn't need to
be treated specially for ZONE_DEVICE.

Signed-off-by: Ralph Campbell 
Signed-off-by: Alex Sierra 
Reviewed-by: Christoph Hellwig 
---
v2:
AS: merged this patch in linux 5.11 version

v5:
AS: add condition at try_grab_page to check for the zone device type, while
page ref counter is checked less/equal to zero. In case of device zone, pages
ref counter are initialized to zero.

v7:
AS: fix condition at try_grab_page added at v5, is invalid. It supposed
to fix xfstests/generic/413 test, however, there's a known issue on
this test where DAX mapped area DIO to non-DAX expect to fail.
https://patchwork.kernel.org/project/fstests/patch/1489463960-3579-1-git-send-email-xz...@redhat.com
This condition was removed after rebase over patch series
https://lore.kernel.org/r/20210813044133.1536842-4-jhubb...@nvidia.com
---
  arch/powerpc/kvm/book3s_hv_uvmem.c |  2 +-
  drivers/gpu/drm/nouveau/nouveau_dmem.c |  2 +-
  fs/dax.c   |  4 +-
  include/linux/dax.h|  2 +-
  include/linux/memremap.h   |  7 +--
  include/linux/mm.h | 11 
  lib/test_hmm.c |  2 +-
  mm/internal.h  |  8 +++
  mm/memcontrol.c|  6 +--
  mm/memremap.c  | 69 +++---
  mm/migrate.c   |  5 --
  mm/page_alloc.c|  3 ++
  mm/swap.c  | 45 ++---
  13 files changed, 46 insertions(+), 120 deletions(-)

Has anyone tested this with FSDAX? Does get_user_pages() on fsdax
backed memory still work?


I ran xfstests-dev using the kernel boot option to "fake" a pmem device
when I first posted this patch. The tests ran OK (or at least the same
tests passed with and without my patch). However, I could never really
convince myself the changes were "OK" for fsdax since I didn't understand
the code that well. I would still like to see an fsdax maintainer or
expert ACK this change.

https://git.kernel.org/pub/scm/fs/xfs/xfstests-dev.git


What refcount value does the struct pages have when they are installed
in the PTEs? Remember a 0 refcount will make all the get_user_pages()
fail.

I'm looking at the call path starting in ext4_punch_hole() and I would
expect to see something manipulating the page ref count before
the ext4_break_layouts() call path gets to the dax_page_unused() test.

All I see is we go into unmap_mapping_pages() - that would normally
put back the page references held by PTEs but insert_pfn() has this:

if (pfn_t_devmap(pfn))
entry = pte_mkdevmap(pfn_t_pte(pfn, prot));

And:

static inline pte_t pte_mkdevmap(pte_t pte)
{
return pte_set_flags(pte, _PAGE_SPECIAL|_PAGE_DEVMAP);
}

Which interacts with vm_normal_page():

if (pte_devmap(pte))
return NULL;

To disable that refcounting?

So... I have a feeling this will have PTEs pointing to 0 refcount
pages? Unless FSDAX is !pte_devmap which is not the case, right?

This seems further confirmed by this comment:

/*
 * If we race get_user_pages_fast() here either we'll see the
 * elevated page count in the iteration and wait, or
 * get_user_pages_fast() will see that the page it took a reference
 * against is no longer mapped in the page tables and bail to the
 * get_user_pages() slow path.  The slow path is protected by
 * pte_lock() and pmd_lock(). New references are not taken without
 * holding those locks, and unmap_mapping_pages() will not zero the
 * pte or pmd without holding the respective lock, so we are
 * guaranteed to either see new references or prevent new
 * references from being established.
 */

Which seems to explain this scheme relies on unmap_mapping_pages() to
fence GUP_fast, not on GUP_fast observing 0 refcounts when it should
stop.

This seems like it would be properly fixed by using normal page
refcounting for PTEs - ie stop using special for these pages?

Does anyone know why devmap is pte_special anyhow?


+void free_zone_device_page(struct page *page)
+{
+   switch (page->pgmap->type) {
+   case MEMORY_DEVICE_PRIVATE:
+   free_device_page(page);
+   return;
+   case MEMORY_DEVICE_FS_DAX:
+   /* notify page idle */
+   wake_up_var(>_refcount);
+   return;

It is not for this series, but I wonder if we should ju

Re: [PATCH v1 2/2] mm: remove extra ZONE_DEVICE struct page refcount

2021-10-14 Thread Ralph Campbell



On 10/14/21 11:01 AM, Jason Gunthorpe wrote:

On Thu, Oct 14, 2021 at 10:35:27AM -0700, Ralph Campbell wrote:


I ran xfstests-dev using the kernel boot option to "fake" a pmem device
when I first posted this patch. The tests ran OK (or at least the same
tests passed with and without my patch).

Hmm. I know nothing of xfstests but

tests/generic/413

Looks kind of like it might cover this situation?

Did it run for you?

Jason


I don't remember. I'll have to rerun the test which might take a day or two
to set up again.



Re: [PATCH] mm/migrate.c: Remove MIGRATE_PFN_LOCKED

2021-10-25 Thread Ralph Campbell



On 10/24/21 21:16, Alistair Popple wrote:

MIGRATE_PFN_LOCKED is used to indicate to migrate_vma_prepare() that a
source page was already locked during migrate_vma_collect(). If it
wasn't, then a second attempt is made to lock the page. However, if
the first attempt failed it's unlikely a second attempt will succeed,
and the retry adds complexity. So clean this up by removing the retry
and MIGRATE_PFN_LOCKED flag.

Destination pages are also meant to have the MIGRATE_PFN_LOCKED flag
set, but nothing actually checks that.

Signed-off-by: Alistair Popple 


You can add:
Reviewed-by: Ralph Campbell 
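
Before the quoted diff, a quick aside on what these src/dst entries encode, since
the patch only drops one of the flag bits. The sketch below is a toy userspace
model with illustrative bit positions (not the actual <linux/migrate.h>
definitions): a migrate PFN entry packs the page frame number together with flags
such as MIGRATE_PFN_VALID, MIGRATE_PFN_WRITE and, until this patch,
MIGRATE_PFN_LOCKED.

/* Toy model of migrate_vma src/dst entry packing; bit positions illustrative. */
#include <stdio.h>

#define MIGRATE_PFN_VALID	(1UL << 0)
#define MIGRATE_PFN_MIGRATE	(1UL << 1)
#define MIGRATE_PFN_LOCKED	(1UL << 2)	/* the bit removed by this patch */
#define MIGRATE_PFN_WRITE	(1UL << 3)
#define MIGRATE_PFN_SHIFT	6

static unsigned long migrate_pfn(unsigned long pfn)
{
	return (pfn << MIGRATE_PFN_SHIFT) | MIGRATE_PFN_VALID;
}

int main(void)
{
	unsigned long pfn = 0x12345;

	/* With this patch applied, drivers no longer OR in MIGRATE_PFN_LOCKED. */
	unsigned long entry = migrate_pfn(pfn) | MIGRATE_PFN_WRITE;

	printf("entry = %#lx, pfn = %#lx, writable = %d\n",
	       entry, entry >> MIGRATE_PFN_SHIFT,
	       (entry & MIGRATE_PFN_WRITE) ? 1 : 0);
	return 0;
}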


---
  Documentation/vm/hmm.rst |   2 +-
  arch/powerpc/kvm/book3s_hv_uvmem.c   |   4 +-
  drivers/gpu/drm/amd/amdkfd/kfd_migrate.c |   2 -
  drivers/gpu/drm/nouveau/nouveau_dmem.c   |   4 +-
  include/linux/migrate.h  |   1 -
  lib/test_hmm.c   |   5 +-
  mm/migrate.c | 145 +--
  7 files changed, 35 insertions(+), 128 deletions(-)

diff --git a/Documentation/vm/hmm.rst b/Documentation/vm/hmm.rst
index a14c2938e7af..f2a59ed82ed3 100644
--- a/Documentation/vm/hmm.rst
+++ b/Documentation/vm/hmm.rst
@@ -360,7 +360,7 @@ between device driver specific code and shared common code:
 system memory page, locks the page with ``lock_page()``, and fills in the
 ``dst`` array entry with::
  
- dst[i] = migrate_pfn(page_to_pfn(dpage)) | MIGRATE_PFN_LOCKED;

+ dst[i] = migrate_pfn(page_to_pfn(dpage));
  
 Now that the driver knows that this page is being migrated, it can

 invalidate device private MMU mappings and copy device private memory
diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c 
b/arch/powerpc/kvm/book3s_hv_uvmem.c
index a7061ee3b157..28c436df9935 100644
--- a/arch/powerpc/kvm/book3s_hv_uvmem.c
+++ b/arch/powerpc/kvm/book3s_hv_uvmem.c
@@ -560,7 +560,7 @@ static int __kvmppc_svm_page_out(struct vm_area_struct *vma,
  gpa, 0, page_shift);
  
  	if (ret == U_SUCCESS)

-   *mig.dst = migrate_pfn(pfn) | MIGRATE_PFN_LOCKED;
+   *mig.dst = migrate_pfn(pfn);
else {
unlock_page(dpage);
__free_page(dpage);
@@ -774,7 +774,7 @@ static int kvmppc_svm_page_in(struct vm_area_struct *vma,
}
}
  
-	*mig.dst = migrate_pfn(page_to_pfn(dpage)) | MIGRATE_PFN_LOCKED;

+   *mig.dst = migrate_pfn(page_to_pfn(dpage));
migrate_vma_pages();
  out_finalize:
migrate_vma_finalize();
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
index 4a16e3c257b9..41d9417f182b 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
@@ -300,7 +300,6 @@ svm_migrate_copy_to_vram(struct amdgpu_device *adev, struct 
svm_range *prange,
migrate->dst[i] = svm_migrate_addr_to_pfn(adev, dst[i]);
svm_migrate_get_vram_page(prange, migrate->dst[i]);
migrate->dst[i] = migrate_pfn(migrate->dst[i]);
-   migrate->dst[i] |= MIGRATE_PFN_LOCKED;
src[i] = dma_map_page(dev, spage, 0, PAGE_SIZE,
  DMA_TO_DEVICE);
r = dma_mapping_error(dev, src[i]);
@@ -580,7 +579,6 @@ svm_migrate_copy_to_ram(struct amdgpu_device *adev, struct 
svm_range *prange,
  dst[i] >> PAGE_SHIFT, page_to_pfn(dpage));
  
  		migrate->dst[i] = migrate_pfn(page_to_pfn(dpage));

-   migrate->dst[i] |= MIGRATE_PFN_LOCKED;
j++;
}
  
diff --git a/drivers/gpu/drm/nouveau/nouveau_dmem.c b/drivers/gpu/drm/nouveau/nouveau_dmem.c

index 92987daa5e17..3828aafd3ac4 100644
--- a/drivers/gpu/drm/nouveau/nouveau_dmem.c
+++ b/drivers/gpu/drm/nouveau/nouveau_dmem.c
@@ -166,7 +166,7 @@ static vm_fault_t nouveau_dmem_fault_copy_one(struct 
nouveau_drm *drm,
goto error_dma_unmap;
mutex_unlock(>mutex);
  
-	args->dst[0] = migrate_pfn(page_to_pfn(dpage)) | MIGRATE_PFN_LOCKED;

+   args->dst[0] = migrate_pfn(page_to_pfn(dpage));
return 0;
  
  error_dma_unmap:

@@ -602,7 +602,7 @@ static unsigned long nouveau_dmem_migrate_copy_one(struct 
nouveau_drm *drm,
((paddr >> PAGE_SHIFT) << NVIF_VMM_PFNMAP_V0_ADDR_SHIFT);
if (src & MIGRATE_PFN_WRITE)
*pfn |= NVIF_VMM_PFNMAP_V0_W;
-   return migrate_pfn(page_to_pfn(dpage)) | MIGRATE_PFN_LOCKED;
+   return migrate_pfn(page_to_pfn(dpage));
  
  out_dma_unmap:

dma_unmap_page(dev, *dma_addr, PAGE_SIZE, DMA_BIDIRECTIONAL);
diff --git a/include/linux/migrate.h b/include/linux/migrate.h
index c8077e936691..479b861ae490 100644
--- a/include/linux/migrate.h
+++ b/include/linux/migrate.h
@@ -119,7 +119,6 @@ static inline int migrate_misplaced_page(

Re: [PATCH 7/8] mm: remove the extra ZONE_DEVICE struct page refcount

2022-02-07 Thread Ralph Campbell

On 2/6/22 22:32, Christoph Hellwig wrote:

ZONE_DEVICE struct pages have an extra reference count that complicates
the code for put_page() and several places in the kernel that need to
check the reference count to see that a page is not being used (gup,
compaction, migration, etc.). Clean up the code so the reference count
doesn't need to be treated specially for ZONE_DEVICE pages.

Note that this excludes the special idle page wakeup for fsdax pages,
which still happens at refcount 1.  This is a separate issue and will
be sorted out later.  Given that only fsdax pages require the
notification when the refcount hits 1 now, the PAGEMAP_OPS Kconfig
symbol can go away and be replaced with a FS_DAX check for this hook
in the put_page fastpath.

Based on an earlier patch from Ralph Campbell .

Signed-off-by: Christoph Hellwig 


Thanks for working on this, definite step forward.

Reviewed-by: Ralph Campbell 



Re: [PATCH] nouveau/svm: Fix to migrate all requested pages

2022-07-28 Thread Ralph Campbell

On 7/19/22 23:27, Alistair Popple wrote:


Users may request that pages from an OpenCL SVM allocation be migrated
to the GPU with clEnqueueSVMMigrateMem(). In Nouveau this will call into
nouveau_dmem_migrate_vma() to do the migration. If the total range to be
migrated exceeds SG_MAX_SINGLE_ALLOC the pages will be migrated in
chunks of size SG_MAX_SINGLE_ALLOC. However a typo in updating the
starting address means that only the first chunk will get migrated.

Fix the calculation so that the entire range will get migrated if
possible.

Signed-off-by: Alistair Popple 
Fixes: e3d8b0890469 ("drm/nouveau/svm: map pages after migration")


Thanks for fixing this!
Reviewed-by: Ralph Campbell 
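
For context, a minimal sketch of the bug pattern being fixed, written as a
hypothetical stand-alone program rather than the actual nouveau loop: when a
range larger than one scatterlist allocation is migrated chunk by chunk, the
chunk start address has to advance each iteration, otherwise only the first
chunk's worth of pages is ever migrated.

/* Hypothetical illustration of chunked migration; not the nouveau code. */
#include <stdio.h>

#define PAGE_SHIFT		12
#define SG_MAX_SINGLE_ALLOC	128UL	/* illustrative; really PAGE_SIZE / sizeof(struct scatterlist) */

static void migrate_chunk(unsigned long start, unsigned long end)
{
	printf("migrate [%#lx, %#lx)\n", start, end);
}

static void migrate_range(unsigned long start, unsigned long end)
{
	unsigned long max = SG_MAX_SINGLE_ALLOC << PAGE_SHIFT;

	while (start < end) {
		unsigned long next = (end - start > max) ? start + max : end;

		migrate_chunk(start, next);
		start = next;	/* the essential step: advance to the next chunk */
	}
}

int main(void)
{
	/* A 2 MiB range is migrated in four SG_MAX_SINGLE_ALLOC-sized chunks. */
	migrate_range(0x100000000UL, 0x100000000UL + (512UL << PAGE_SHIFT));
	return 0;
}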



Re: [PATCH] mm/migrate_device: Return number of migrating pages in args->cpages

2022-11-14 Thread Ralph Campbell



On 11/10/22 16:51, Alistair Popple wrote:

migrate_vma->cpages originally contained a count of the number of
pages migrating including non-present pages which can be poluated


"populated"


directly on the target.

Commit 241f68859656 ("mm/migrate_device.c: refactor migrate_vma and
migrate_deivce_coherent_page()") inadvertently changed this to contain
just the number of pages that were unmapped. Usage of
migrate_vma->cpages isn't documented, but most drivers use it to see
if all the requested addresses can be migrated so restore the original
behaviour.

Fixes: 241f68859656 ("mm/migrate_device.c: refactor migrate_vma and 
migrate_deivce_coherent_page()")
Signed-off-by: Alistair Popple 
Reported-by: Ralph Campbell 


You can add
Reviewed-by: Ralph Campbell 

Thanks!
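
For reference, a schematic of how a driver typically consumes cpages
(hypothetical code, not taken from any particular driver, assuming the usual
<linux/migrate.h> interface): the check only works as intended when cpages
counts every collected page, including non-present ones, which is what this
patch restores.

#include <linux/migrate.h>
#include <linux/mm.h>

/* Hypothetical driver helper showing the usual cpages check. */
static int example_migrate_range(struct vm_area_struct *vma,
				 unsigned long start, unsigned long end,
				 unsigned long *src_pfns, unsigned long *dst_pfns,
				 void *pgmap_owner)
{
	unsigned long npages = (end - start) >> PAGE_SHIFT;
	struct migrate_vma args = {
		.vma		= vma,
		.start		= start,
		.end		= end,
		.src		= src_pfns,
		.dst		= dst_pfns,
		.pgmap_owner	= pgmap_owner,
		.flags		= MIGRATE_VMA_SELECT_SYSTEM,
	};
	int ret;

	ret = migrate_vma_setup(&args);
	if (ret)
		return ret;

	/* Can the whole requested range move? cpages counts collected pages. */
	if (args.cpages != npages) {
		migrate_vma_finalize(&args);
		return -EBUSY;
	}

	/* ... allocate destination pages, copy data, fill dst_pfns ... */

	migrate_vma_pages(&args);
	migrate_vma_finalize(&args);
	return 0;
}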




[PATCH] drm/edid: Add quirk for OSVR HDK 2.0

2023-06-08 Thread Ralph Campbell
The OSVR virtual reality headset HDK 2.0 uses a different EDID
vendor and device identifier than the HDK 1.1 - 1.4 headsets.
Add the HDK 2.0 vendor and device identifier to the quirks table so
that window managers do not try to display the desktop screen on the
headset display.

Signed-off-by: Ralph Campbell 
Tested-by: Ralph Campbell 
---
 drivers/gpu/drm/drm_edid.c | 1 +
 1 file changed, 1 insertion(+)

I don't know how many of these VR headsets are still around but I have a
working one and I saw an entry for HDK 1.x so I thought it would be good
to add HDK 2.0.

diff --git a/drivers/gpu/drm/drm_edid.c b/drivers/gpu/drm/drm_edid.c
index 0454da505687..3b8cc1fe05e8 100644
--- a/drivers/gpu/drm/drm_edid.c
+++ b/drivers/gpu/drm/drm_edid.c
@@ -230,6 +230,7 @@ static const struct edid_quirk {
 
/* OSVR HDK and HDK2 VR Headsets */
EDID_QUIRK('S', 'V', 'R', 0x1019, EDID_QUIRK_NON_DESKTOP),
+   EDID_QUIRK('A', 'O', 'U', 0x, EDID_QUIRK_NON_DESKTOP),
 };
 
 /*
-- 
2.40.1



Re: [PATCH] drm/edid: Add quirk for OSVR HDK 2.0

2023-06-09 Thread Ralph Campbell



On 6/9/23 02:03, Jani Nikula wrote:

On Thu, 08 Jun 2023, Ralph Campbell  wrote:

The OSVR virtual reality headset HDK 2.0 uses a different EDID
vendor and device identifier than the HDK 1.1 - 1.4 headsets.
Add the HDK 2.0 vendor and device identifier to the quirks table so
that window managers do not try to display the desktop screen on the
headset display.

At some point in time we requested bugs to be filed about quirks, with
EDIDs attached, so we could look at them later, and maybe remove the
quirks.

The headset non-desktop thing started off as a quirk, but since then
we've added both Microsoft VSDB and DisplayID primary use as ways to
indicate this without quirks.

BR,
Jani.


If you want me to file a bug, I can do that and I have the EDID too.
Where would I file it?

I did see the DisplayID 2.0 code. This headset is no longer being
manufactured so updating the EDID is not practical.


Signed-off-by: Ralph Campbell 
Tested-by: Ralph Campbell 
---
  drivers/gpu/drm/drm_edid.c | 1 +
  1 file changed, 1 insertion(+)

I don't know how many of these VR headsets are still around but I have a
working one and I saw an entry for HDK 1.x so I thought it would be good
to add HDK 2.0.

diff --git a/drivers/gpu/drm/drm_edid.c b/drivers/gpu/drm/drm_edid.c
index 0454da505687..3b8cc1fe05e8 100644
--- a/drivers/gpu/drm/drm_edid.c
+++ b/drivers/gpu/drm/drm_edid.c
@@ -230,6 +230,7 @@ static const struct edid_quirk {
  
  	/* OSVR HDK and HDK2 VR Headsets */

EDID_QUIRK('S', 'V', 'R', 0x1019, EDID_QUIRK_NON_DESKTOP),
+   EDID_QUIRK('A', 'O', 'U', 0x, EDID_QUIRK_NON_DESKTOP),
  };
  
  /*


Re: [PATCH] drm/edid: Add quirk for OSVR HDK 2.0

2023-06-11 Thread Ralph Campbell



On 6/10/23 00:22, Jani Nikula wrote:

On Fri, 09 Jun 2023, Ralph Campbell  wrote:

On 6/9/23 02:03, Jani Nikula wrote:

On Thu, 08 Jun 2023, Ralph Campbell  wrote:

The OSVR virtual reality headset HDK 2.0 uses a different EDID
vendor and device identifier than the HDK 1.1 - 1.4 headsets.
Add the HDK 2.0 vendor and device identifier to the quirks table so
that window managers do not try to display the desktop screen on the
headset display.

At some point in time we requested bugs to be filed about quirks, with
EDIDs attached, so we could look at them later, and maybe remove the
quirks.

The headset non-desktop thing started off as a quirk, but since then
we've added both Microsoft VSDB and DisplayID primary use as ways to
indicate this without quirks.

BR,
Jani.

If you want me to file a bug, I can do that and I have the EDID too.
Where would I file it?

I suppose at https://gitlab.freedesktop.org/drm/misc/-/issues

We should then reference the issue in the commit message (no need to
resend, this can be added while applying).


I did see the DisplayID 2.0 code. This headset is no longer being
manufactured so updating the EDID is not practical.

I'm not saying the EDID should be updated, just that we might drop the
quirk if we find another generic way to identify non-desktops that
covers the EDID in question.

If the device isn't covered by the existing mechanisms, I'm not opposed
to merging as-is, with the issue reference added.


Thanks,
Jani.


Thanks, I created issue #30
"OSVR virtual reality headset HDK 2.0 not recognized as a non-desktop display"




Signed-off-by: Ralph Campbell 
Tested-by: Ralph Campbell 
---
   drivers/gpu/drm/drm_edid.c | 1 +
   1 file changed, 1 insertion(+)

I don't know how many of these VR headsets are still around but I have a
working one and I saw an entry for HDK 1.x so I thought it would be good
to add HDK 2.0.

diff --git a/drivers/gpu/drm/drm_edid.c b/drivers/gpu/drm/drm_edid.c
index 0454da505687..3b8cc1fe05e8 100644
--- a/drivers/gpu/drm/drm_edid.c
+++ b/drivers/gpu/drm/drm_edid.c
@@ -230,6 +230,7 @@ static const struct edid_quirk {
   
   	/* OSVR HDK and HDK2 VR Headsets */

EDID_QUIRK('S', 'V', 'R', 0x1019, EDID_QUIRK_NON_DESKTOP),
+   EDID_QUIRK('A', 'O', 'U', 0x, EDID_QUIRK_NON_DESKTOP),
   };
   
   /*


[PATCH v2] drm/edid: Add quirk for OSVR HDK 2.0

2023-06-21 Thread Ralph Campbell
The OSVR virtual reality headset HDK 2.0 uses a different EDID
vendor and device identifier than the HDK 1.1 - 1.4 headsets.
Add the HDK 2.0 vendor and device identifier to the quirks table so
that window managers do not try to display the desktop screen on the
headset display.

Signed-off-by: Ralph Campbell 
Tested-by: Ralph Campbell 
---
 drivers/gpu/drm/drm_edid.c | 1 +
 1 file changed, 1 insertion(+)

I don't know how many of these VR headsets are still around but I have a
working one and I saw an entry for HDK 1.x so I thought it would be good
to add HDK 2.0.

v2: The vendor ID was byte swapped.
I'm not sure how I missed that in v1.

diff --git a/drivers/gpu/drm/drm_edid.c b/drivers/gpu/drm/drm_edid.c
index 0454da505687..3b8cc1fe05e8 100644
--- a/drivers/gpu/drm/drm_edid.c
+++ b/drivers/gpu/drm/drm_edid.c
@@ -230,6 +230,7 @@ static const struct edid_quirk {
 
/* OSVR HDK and HDK2 VR Headsets */
EDID_QUIRK('S', 'V', 'R', 0x1019, EDID_QUIRK_NON_DESKTOP),
+   EDID_QUIRK('A', 'U', 'O', 0x, EDID_QUIRK_NON_DESKTOP),
 };
 
 /*
-- 
2.40.1
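
As a footnote on the byte swap mentioned above, here is a small stand-alone
decoder (my own illustration, not the drm_edid.c code) for the EDID manufacturer
ID: the two ID bytes form a big-endian 16-bit value holding three 5-bit letters
with 'A' encoded as 1, so reading the bytes in the wrong order scrambles the
vendor letters.

/* Toy EDID manufacturer (PNP) ID decoder; illustrative only. */
#include <stdint.h>
#include <stdio.h>

static void decode_pnp_id(uint8_t hi, uint8_t lo, char out[4])
{
	uint16_t v = ((uint16_t)hi << 8) | lo;	/* big-endian in the EDID */

	out[0] = '@' + ((v >> 10) & 0x1f);	/* '@' + 1 == 'A' */
	out[1] = '@' + ((v >> 5) & 0x1f);
	out[2] = '@' + (v & 0x1f);
	out[3] = '\0';
}

int main(void)
{
	char id[4];

	/* 0x06 0xaf decodes to "AUO" (AU Optronics). */
	decode_pnp_id(0x06, 0xaf, id);
	printf("bytes in order:  %s\n", id);

	/* Reading the same bytes swapped decodes to something else entirely. */
	decode_pnp_id(0xaf, 0x06, id);
	printf("bytes swapped:   %s\n", id);

	return 0;
}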


