Re: [PATCH 6/7] nouveau/dmem: Evict device private memory during release

2022-09-26 Thread Felix Kuehling



On 2022-09-26 17:35, Lyude Paul wrote:

On Mon, 2022-09-26 at 16:03 +1000, Alistair Popple wrote:

When the module is unloaded or a GPU is unbound from the module it is
possible for device private pages to be left mapped in currently running
processes. This leads to a kernel crash when the pages are either freed
or accessed from the CPU because the GPU and associated data structures
and callbacks have all been freed.

Fix this by migrating any mappings back to normal CPU memory prior to
freeing the GPU memory chunks and associated device private pages.

Signed-off-by: Alistair Popple 

---

I assume the AMD driver might have a similar issue. However I can't see
where device private (or coherent) pages actually get unmapped/freed
during teardown as I couldn't find any relevant calls to
devm_memunmap(), memunmap(), devm_release_mem_region() or
release_mem_region(). So it appears that ZONE_DEVICE pages are not being
properly freed during module unload, unless I'm missing something?

I've got no idea, will poke Ben to see if they know the answer to this


I guess we're relying on devm to release the region. Isn't the whole 
point of using devm_request_free_mem_region that we don't have to 
remember to explicitly release it when the device gets destroyed? I 
believe we had an explicit free call at some point by mistake, and that 
caused a double-free during module unload. See this commit for reference:


commit 22f4f4faf337d5fb2d2750aff13215726814273e
Author: Philip Yang 
Date:   Mon Sep 20 17:25:52 2021 -0400

drm/amdkfd: fix svm_migrate_fini warning

Device manager releases device-specific resources when a driver
disconnects from a device, devm_memunmap_pages and
devm_release_mem_region calls in svm_migrate_fini are redundant.

It causes below warning trace after patch "drm/amdgpu: Split
amdgpu_device_fini into early and late", so remove function
svm_migrate_fini.

BUG: https://gitlab.freedesktop.org/drm/amd/-/issues/1718

WARNING: CPU: 1 PID: 3646 at drivers/base/devres.c:795 devm_release_action+0x51/0x60
Call Trace:
? memunmap_pages+0x360/0x360
svm_migrate_fini+0x2d/0x60 [amdgpu]
kgd2kfd_device_exit+0x23/0xa0 [amdgpu]
amdgpu_amdkfd_device_fini_sw+0x1d/0x30 [amdgpu]
amdgpu_device_fini_sw+0x45/0x290 [amdgpu]
amdgpu_driver_release_kms+0x12/0x30 [amdgpu]
drm_dev_release+0x20/0x40 [drm]
release_nodes+0x196/0x1e0
device_release_driver_internal+0x104/0x1d0
driver_detach+0x47/0x90
bus_remove_driver+0x7a/0xd0
pci_unregister_driver+0x3d/0x90
amdgpu_exit+0x11/0x20 [amdgpu]

Signed-off-by: Philip Yang 

Reviewed-by: Felix Kuehling 
Signed-off-by: Alex Deucher 
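
For reference, the devm-managed setup that makes those explicit calls
redundant looks roughly like this (a minimal sketch of the common
ZONE_DEVICE pattern with made-up names, not the actual amdkfd code;
example_pgmap_ops is assumed to be defined elsewhere):

  /*
   * Sketch of a devm-managed ZONE_DEVICE setup. The names are
   * illustrative; only devm_request_free_mem_region() and
   * devm_memremap_pages() are real kernel APIs.
   */
  static int example_pgmap_init(struct device *dev,
                                struct dev_pagemap *pgmap,
                                unsigned long size)
  {
          struct resource *res;
          void *addr;

          res = devm_request_free_mem_region(dev, &iomem_resource, size);
          if (IS_ERR(res))
                  return PTR_ERR(res);

          pgmap->type = MEMORY_DEVICE_PRIVATE;
          pgmap->range.start = res->start;
          pgmap->range.end = res->end;
          pgmap->nr_range = 1;
          pgmap->ops = &example_pgmap_ops;  /* assumed defined elsewhere */

          addr = devm_memremap_pages(dev, pgmap);
          if (IS_ERR(addr))
                  return PTR_ERR(addr);

          /*
           * Note: no explicit devm_memunmap_pages() or
           * devm_release_mem_region() in a teardown path. Devres undoes
           * both automatically when the device is unbound, which is why
           * calling them again in a fini function double-frees.
           */
          return 0;
  }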

Furthermore, I guess we are assuming that nobody is using the GPU when 
the module is unloaded. As long as any processes have /dev/kfd open, you 
won't be able to unload the module (except by force-unload). I suppose 
with ZONE_DEVICE memory, we can have references to device memory pages 
even when user mode has closed /dev/kfd. We do have a cleanup handler 
that runs in an MMU free notifier. In theory that should run after all 
the pages in the mm_struct have been freed. It releases all sorts of 
other device resources and needs the driver to still be there. I'm not 
sure if there is anything preventing a module unload before the 
free notifier runs. I'll look into that.


Regards,
  Felix





---
  drivers/gpu/drm/nouveau/nouveau_dmem.c | 48 +++-
  1 file changed, 48 insertions(+)

diff --git a/drivers/gpu/drm/nouveau/nouveau_dmem.c b/drivers/gpu/drm/nouveau/nouveau_dmem.c
index 66ebbd4..3b247b8 100644
--- a/drivers/gpu/drm/nouveau/nouveau_dmem.c
+++ b/drivers/gpu/drm/nouveau/nouveau_dmem.c
@@ -369,6 +369,52 @@ nouveau_dmem_suspend(struct nouveau_drm *drm)
 	mutex_unlock(&drm->dmem->mutex);
  }
  
+/*
+ * Evict all pages mapping a chunk.
+ */
+void
+nouveau_dmem_evict_chunk(struct nouveau_dmem_chunk *chunk)
+{
+	unsigned long i, npages = range_len(&chunk->pagemap.range) >> PAGE_SHIFT;
+	unsigned long *src_pfns, *dst_pfns;
+	dma_addr_t *dma_addrs;
+	struct nouveau_fence *fence;
+
+	src_pfns = kcalloc(npages, sizeof(*src_pfns), GFP_KERNEL);
+	dst_pfns = kcalloc(npages, sizeof(*dst_pfns), GFP_KERNEL);
+	dma_addrs = kcalloc(npages, sizeof(*dma_addrs), GFP_KERNEL);
+
+	migrate_device_range(src_pfns, chunk->pagemap.range.start >> PAGE_SHIFT,
+			npages);
+
+	for (i = 0; i < npages; i++) {
+		if (src_pfns[i] & MIGRATE_PFN_MIGRATE) {
+			struct page *dpage;
+
+			/*
+			 * __GFP_NOFAIL because the GPU is going away and there
+			 * is nothing sensible we can do if we can't copy the
+			 * data back.
+			 */

You'll have 

Re: [PATCH] powerpc: export cpu_smallcore_map for modules

2022-08-19 Thread Felix Kuehling



On 2022-08-19 17:01, Randy Dunlap wrote:

Fix build error when CONFIG_DRM_AMDGPU=m:

ERROR: modpost: "cpu_smallcore_map" [drivers/gpu/drm/amd/amdgpu/amdgpu.ko] undefined!

by exporting 'cpu_smallcore_map' just as other per_cpu
symbols are exported.

drivers/gpu/drm/amd/amdkfd/kfd_device.c calls cpu_smt_mask().
This is an inline function on powerpc which references
cpu_smallcore_map.

Fixes: 425752c63b6f ("powerpc: Detect the presence of big-cores via "ibm,thread-groups"")
Fixes: 7bc913085765 ("drm/amdkfd: Try to schedule bottom half on same core")
Signed-off-by: Randy Dunlap 
Cc: Gautham R. Shenoy 
Cc: Michael Ellerman 
Cc: Nicholas Piggin 
Cc: Christophe Leroy 
Cc: linuxppc-dev@lists.ozlabs.org
Cc: amd-...@lists.freedesktop.org
Cc: Felix Kuehling 
Cc: Alex Deucher 
Cc: Christian König 
Cc: "Pan, Xinhui" 


Acked-by: Felix Kuehling 



---
  arch/powerpc/kernel/smp.c |1 +
  1 file changed, 1 insertion(+)

--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -86,6 +86,7 @@ DEFINE_PER_CPU(cpumask_var_t, cpu_core_m
  static DEFINE_PER_CPU(cpumask_var_t, cpu_coregroup_map);
  
  EXPORT_PER_CPU_SYMBOL(cpu_sibling_map);
+EXPORT_PER_CPU_SYMBOL(cpu_smallcore_map);
  EXPORT_PER_CPU_SYMBOL(cpu_l2_cache_map);
  EXPORT_PER_CPU_SYMBOL(cpu_core_map);
  EXPORT_SYMBOL_GPL(has_big_cores);
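
For context, the powerpc inline that drags in the per-cpu symbol looks
roughly like this (paraphrased from arch/powerpc/include/asm/smp.h; the
exact code may differ):

  static inline const struct cpumask *cpu_smt_mask(int cpu)
  {
          if (has_big_cores)
                  return per_cpu(cpu_smallcore_map, cpu);

          return per_cpu(cpu_sibling_map, cpu);
  }

Because the function is inline, the reference to cpu_smallcore_map is
compiled into amdgpu.ko itself, so the symbol has to be exported for the
modular build to link.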


Re: Build regressions/improvements in v5.17-rc1

2022-01-24 Thread Felix Kuehling



On 2022-01-24 at 14:11, Randy Dunlap wrote:

On 1/24/22 10:55, Geert Uytterhoeven wrote:

Hi Alex,

On Mon, Jan 24, 2022 at 7:52 PM Alex Deucher  wrote:

On Mon, Jan 24, 2022 at 5:25 AM Geert Uytterhoeven  wrote:

On Sun, 23 Jan 2022, Geert Uytterhoeven wrote:

  + /kisskb/src/drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_topology.c: error: control reaches end of non-void function [-Werror=return-type]:  => 1560:1

I don't really see what's going on here:

#ifdef CONFIG_X86_64
return cpu_data(first_cpu_of_numa_node).apicid;
#else
return first_cpu_of_numa_node;
#endif

Ah, the actual failure causing this was not included:

In file included from /kisskb/src/arch/x86/um/asm/processor.h:41:0,
  from /kisskb/src/include/linux/mutex.h:19,
  from /kisskb/src/include/linux/kernfs.h:11,
  from /kisskb/src/include/linux/sysfs.h:16,
  from /kisskb/src/include/linux/kobject.h:20,
  from /kisskb/src/include/linux/pci.h:35,
  from /kisskb/src/drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_topology.c:25:
/kisskb/src/drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_topology.c: In function 'kfd_cpumask_to_apic_id':
/kisskb/src/arch/um/include/asm/processor-generic.h:103:18: error: called object is not a function or function pointer
  #define cpu_data (&boot_cpu_data)
   ^
/kisskb/src/drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_topology.c:1556:9: note: in expansion of macro 'cpu_data'
   return cpu_data(first_cpu_of_numa_node).apicid;
  ^
/kisskb/src/drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_topology.c:1560:1: error: control reaches end of non-void function [-Werror=return-type]
 }
 ^

ah yes, UML.
I have a bunch of UML fixes that I have been hesitant to post.

This is one of them.
What do people think about this?


Does it make sense to configure a UML kernel with a real device driver 
in the first place? Or should we just prevent enabling amdgpu for UML 
with a Kconfig dependency?
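
Such a dependency would be a one-liner along these lines (a sketch
against the existing DRM_AMDGPU Kconfig entry; the real entry has more
depends/select lines, abbreviated here):

  config DRM_AMDGPU
          tristate "AMD GPU"
          depends on DRM && PCI && MMU
          depends on !UML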


Regards,
  Felix




thanks.

---
From: Randy Dunlap 


../drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_topology.c:1556:9: note: in expansion of macro ‘cpu_data’
   return cpu_data(first_cpu_of_numa_node).apicid;
  ^~~~
../drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_topology.c:1560:1: error: control reaches end of non-void function [-Werror=return-type]

../drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_crat.c: In function ‘kfd_fill_iolink_info_for_cpu’:
../arch/um/include/asm/processor-generic.h:103:19: error: called object is not a function or function pointer
  #define cpu_data (&boot_cpu_data)
   ~^~~
../drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_crat.c:1688:27: note: in expansion of macro ‘cpu_data’
   struct cpuinfo_x86 *c = &cpu_data(0);
   ^~~~
../drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_crat.c:1691:7: error: dereferencing pointer to incomplete type ‘struct cpuinfo_x86’
   if (c->x86_vendor == X86_VENDOR_AMD)
   ^~
../drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_crat.c:1691:23: error: ‘X86_VENDOR_AMD’ undeclared (first use in this function); did you mean ‘X86_VENDOR_ANY’?
   if (c->x86_vendor == X86_VENDOR_AMD)
   ^~
   X86_VENDOR_ANY

../drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_crat.c: In function ‘kfd_create_vcrat_image_cpu’:
../drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_crat.c:1742:11: warning: unused variable ‘entries’ [-Wunused-variable]
   uint32_t entries = 0;

Signed-off-by: Randy Dunlap 
---
  drivers/gpu/drm/amd/amdkfd/kfd_crat.c |6 +++---
  drivers/gpu/drm/amd/amdkfd/kfd_topology.c |2 +-
  2 files changed, 4 insertions(+), 4 deletions(-)

--- linux-next-20220107.orig/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
+++ linux-next-20220107/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
@@ -1552,7 +1552,7 @@ static int kfd_cpumask_to_apic_id(const
first_cpu_of_numa_node = cpumask_first(cpumask);
if (first_cpu_of_numa_node >= nr_cpu_ids)
return -1;
-#ifdef CONFIG_X86_64
+#if defined(CONFIG_X86_64) && !defined(CONFIG_UML)
return cpu_data(first_cpu_of_numa_node).apicid;
  #else
return first_cpu_of_numa_node;
--- linux-next-20220107.orig/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
+++ linux-next-20220107/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
@@ -1679,7 +1679,7 @@ static int kfd_fill_mem_info_for_cpu(int
return 0;
  }
  
-#ifdef CONFIG_X86_64
+#if defined(CONFIG_X86_64) && !defined(CONFIG_UML)
  static int kfd_fill_iolink_info_for_cpu(int numa_node_id, int *avail_size,
uint32_t *num_entries,
struct crat_subtype_iolink *sub_type_hdr)
@@ -1738,7 +1738,7 @@ static int kfd_create_vcrat_image_cpu(vo
struct crat_subtype_generic *sub_type_hdr;
int avail_size = *size;
int numa_node_id;
-#ifdef CONFIG_X86_64
+#if defined(CONFIG_X86_64) && !defined(CONFIG_UML)
uint32_t entries = 0;
  #endif
   

Re: [PATCH] mm/migrate.c: Remove MIGRATE_PFN_LOCKED

2021-10-28 Thread Felix Kuehling
On 2021-10-27 at 9:42 p.m., Alistair Popple wrote:
> On Wednesday, 27 October 2021 3:09:57 AM AEDT Felix Kuehling wrote:
>> On 2021-10-25 at 12:16 a.m., Alistair Popple wrote:
>>> MIGRATE_PFN_LOCKED is used to indicate to migrate_vma_prepare() that a
>>> source page was already locked during migrate_vma_collect(). If it
>>> wasn't then a second attempt is made to lock the page. However if
>>> the first attempt failed it's unlikely a second attempt will succeed,
>>> and the retry adds complexity. So clean this up by removing the retry
>>> and MIGRATE_PFN_LOCKED flag.
>>>
>>> Destination pages are also meant to have the MIGRATE_PFN_LOCKED flag
>>> set, but nothing actually checks that.
>>>
>>> Signed-off-by: Alistair Popple 
>> It makes sense to me. Do you have any empirical data on how much more
>> likely migrations are going to fail with this change due to contested
>> page locks?
> Thanks Felix. I do not have any empirical data on this but I've mostly seen
> migrations fail due to the reference count check failing rather than failure
> to lock the page. Even then it's mostly been due to thrashing on the same
> page, so I would be surprised if this change made any noticeable difference.

We have seen more page locking contention on NUMA systems; it disappears
when we disable NUMA balancing. NUMA balancing migrations probably
leave the page lock more contended, which can cause HMM migration of
some pages to fail.

Also, for migrations to system memory, multiple threads page faulting
concurrently can cause contention. I was just helping debug such an
issue. Having migrations to system memory be only partially successful
is problematic. We'll probably have to implement some retry logic in the
driver to handle this.
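
Roughly the shape that retry logic might take (a hypothetical sketch,
not actual amdkfd code; migrate_vma_setup(), migrate_vma_pages() and
migrate_vma_finalize() are the real APIs, the function and the retry
bound are made up):

  /*
   * Hypothetical retry for partially successful migrations to system
   * memory. Assumes the caller passed zeroed dst entries and will do
   * the copy plus migrate_vma_pages()/migrate_vma_finalize() on success.
   */
  static int svm_migrate_retry_collect(struct migrate_vma *args, int max_tries)
  {
          int tries, r;

          for (tries = 0; tries < max_tries; tries++) {
                  r = migrate_vma_setup(args);
                  if (r)
                          return r;
                  if (args->cpages == args->npages)
                          return 0;  /* collected every page */
                  /*
                   * Some pages were locked or referenced elsewhere.
                   * Leave dst[] empty so nothing migrates, back out,
                   * then try again.
                   */
                  migrate_vma_pages(args);
                  migrate_vma_finalize(args);
                  cond_resched();
          }
          return -EBUSY;
  }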

Regards,
  Felix


>
>> Either way, the patch is
>>
>> Acked-by: Felix Kuehling 
>>
>>
>>> ---
>>>  Documentation/vm/hmm.rst |   2 +-
>>>  arch/powerpc/kvm/book3s_hv_uvmem.c   |   4 +-
>>>  drivers/gpu/drm/amd/amdkfd/kfd_migrate.c |   2 -
>>>  drivers/gpu/drm/nouveau/nouveau_dmem.c   |   4 +-
>>>  include/linux/migrate.h  |   1 -
>>>  lib/test_hmm.c   |   5 +-
>>>  mm/migrate.c | 145 +--
>>>  7 files changed, 35 insertions(+), 128 deletions(-)
>>>
>>> diff --git a/Documentation/vm/hmm.rst b/Documentation/vm/hmm.rst
>>> index a14c2938e7af..f2a59ed82ed3 100644
>>> --- a/Documentation/vm/hmm.rst
>>> +++ b/Documentation/vm/hmm.rst
>>> @@ -360,7 +360,7 @@ between device driver specific code and shared common code:
>>> system memory page, locks the page with ``lock_page()``, and fills in the
>>> ``dst`` array entry with::
>>>  
>>> - dst[i] = migrate_pfn(page_to_pfn(dpage)) | MIGRATE_PFN_LOCKED;
>>> + dst[i] = migrate_pfn(page_to_pfn(dpage));
>>>  
>>> Now that the driver knows that this page is being migrated, it can
>>> invalidate device private MMU mappings and copy device private memory
>>> diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c b/arch/powerpc/kvm/book3s_hv_uvmem.c
>>> index a7061ee3b157..28c436df9935 100644
>>> --- a/arch/powerpc/kvm/book3s_hv_uvmem.c
>>> +++ b/arch/powerpc/kvm/book3s_hv_uvmem.c
>>> @@ -560,7 +560,7 @@ static int __kvmppc_svm_page_out(struct vm_area_struct *vma,
>>>   gpa, 0, page_shift);
>>>  
>>> if (ret == U_SUCCESS)
>>> -   *mig.dst = migrate_pfn(pfn) | MIGRATE_PFN_LOCKED;
>>> +   *mig.dst = migrate_pfn(pfn);
>>> else {
>>> unlock_page(dpage);
>>> __free_page(dpage);
>>> @@ -774,7 +774,7 @@ static int kvmppc_svm_page_in(struct vm_area_struct *vma,
>>> }
>>> }
>>>  
>>> -   *mig.dst = migrate_pfn(page_to_pfn(dpage)) | MIGRATE_PFN_LOCKED;
>>> +   *mig.dst = migrate_pfn(page_to_pfn(dpage));
>>> 	migrate_vma_pages(&mig);
>>>  out_finalize:
>>> 	migrate_vma_finalize(&mig);
>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
>>> index 4a16e3c257b9..41d9417f182b 100644
>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
>>> @@ -300,7 +300,6 @@ svm_migrate_copy_to_vram(struct amdgpu_device *adev, struct svm_range *prange,

Re: [PATCH] mm/migrate.c: Remove MIGRATE_PFN_LOCKED

2021-10-26 Thread Felix Kuehling
On 2021-10-25 at 12:16 a.m., Alistair Popple wrote:
> MIGRATE_PFN_LOCKED is used to indicate to migrate_vma_prepare() that a
> source page was already locked during migrate_vma_collect(). If it
> wasn't then a second attempt is made to lock the page. However if
> the first attempt failed it's unlikely a second attempt will succeed,
> and the retry adds complexity. So clean this up by removing the retry
> and MIGRATE_PFN_LOCKED flag.
>
> Destination pages are also meant to have the MIGRATE_PFN_LOCKED flag
> set, but nothing actually checks that.
>
> Signed-off-by: Alistair Popple 

It makes sense to me. Do you have any empirical data on how much more
likely migrations are going to fail with this change due to contested
page locks?

Either way, the patch is

Acked-by: Felix Kuehling 


> ---
>  Documentation/vm/hmm.rst |   2 +-
>  arch/powerpc/kvm/book3s_hv_uvmem.c   |   4 +-
>  drivers/gpu/drm/amd/amdkfd/kfd_migrate.c |   2 -
>  drivers/gpu/drm/nouveau/nouveau_dmem.c   |   4 +-
>  include/linux/migrate.h  |   1 -
>  lib/test_hmm.c   |   5 +-
>  mm/migrate.c | 145 +--
>  7 files changed, 35 insertions(+), 128 deletions(-)
>
> diff --git a/Documentation/vm/hmm.rst b/Documentation/vm/hmm.rst
> index a14c2938e7af..f2a59ed82ed3 100644
> --- a/Documentation/vm/hmm.rst
> +++ b/Documentation/vm/hmm.rst
> @@ -360,7 +360,7 @@ between device driver specific code and shared common code:
> system memory page, locks the page with ``lock_page()``, and fills in the
> ``dst`` array entry with::
>  
> - dst[i] = migrate_pfn(page_to_pfn(dpage)) | MIGRATE_PFN_LOCKED;
> + dst[i] = migrate_pfn(page_to_pfn(dpage));
>  
> Now that the driver knows that this page is being migrated, it can
> invalidate device private MMU mappings and copy device private memory
> diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c b/arch/powerpc/kvm/book3s_hv_uvmem.c
> index a7061ee3b157..28c436df9935 100644
> --- a/arch/powerpc/kvm/book3s_hv_uvmem.c
> +++ b/arch/powerpc/kvm/book3s_hv_uvmem.c
> @@ -560,7 +560,7 @@ static int __kvmppc_svm_page_out(struct vm_area_struct *vma,
> gpa, 0, page_shift);
>  
>   if (ret == U_SUCCESS)
> - *mig.dst = migrate_pfn(pfn) | MIGRATE_PFN_LOCKED;
> + *mig.dst = migrate_pfn(pfn);
>   else {
>   unlock_page(dpage);
>   __free_page(dpage);
> @@ -774,7 +774,7 @@ static int kvmppc_svm_page_in(struct vm_area_struct *vma,
>   }
>   }
>  
> - *mig.dst = migrate_pfn(page_to_pfn(dpage)) | MIGRATE_PFN_LOCKED;
> + *mig.dst = migrate_pfn(page_to_pfn(dpage));
> 	migrate_vma_pages(&mig);
>  out_finalize:
> 	migrate_vma_finalize(&mig);
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
> index 4a16e3c257b9..41d9417f182b 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
> @@ -300,7 +300,6 @@ svm_migrate_copy_to_vram(struct amdgpu_device *adev, struct svm_range *prange,
>   migrate->dst[i] = svm_migrate_addr_to_pfn(adev, dst[i]);
>   svm_migrate_get_vram_page(prange, migrate->dst[i]);
>   migrate->dst[i] = migrate_pfn(migrate->dst[i]);
> - migrate->dst[i] |= MIGRATE_PFN_LOCKED;
>   src[i] = dma_map_page(dev, spage, 0, PAGE_SIZE,
> DMA_TO_DEVICE);
>   r = dma_mapping_error(dev, src[i]);
> @@ -580,7 +579,6 @@ svm_migrate_copy_to_ram(struct amdgpu_device *adev, struct svm_range *prange,
> dst[i] >> PAGE_SHIFT, page_to_pfn(dpage));
>  
>   migrate->dst[i] = migrate_pfn(page_to_pfn(dpage));
> - migrate->dst[i] |= MIGRATE_PFN_LOCKED;
>   j++;
>   }
>  
> diff --git a/drivers/gpu/drm/nouveau/nouveau_dmem.c b/drivers/gpu/drm/nouveau/nouveau_dmem.c
> index 92987daa5e17..3828aafd3ac4 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_dmem.c
> +++ b/drivers/gpu/drm/nouveau/nouveau_dmem.c
> @@ -166,7 +166,7 @@ static vm_fault_t nouveau_dmem_fault_copy_one(struct nouveau_drm *drm,
>   	goto error_dma_unmap;
>   	mutex_unlock(&svmm->mutex);
>  
> - args->dst[0] = migrate_pfn(page_to_pfn(dpage)) | MIGRATE_PFN_LOCKED;
> + args->dst[0] = migrate_pfn(page_to_pfn(dpage));
>   return 0;
>  
>  error_dma_unmap:
> @@ -602,7 +602,7 @@ static unsigned long nouveau_dmem_migrate_copy_one(struct nouveau_drm *drm,