Re: [PATCH 6/7] nouveau/dmem: Evict device private memory during release
On 2022-09-26 17:35, Lyude Paul wrote:
> On Mon, 2022-09-26 at 16:03 +1000, Alistair Popple wrote:
>> When the module is unloaded or a GPU is unbound from the module it is
>> possible for device private pages to be left mapped in currently running
>> processes. This leads to a kernel crash when the pages are either freed
>> or accessed from the CPU because the GPU and associated data structures
>> and callbacks have all been freed.
>>
>> Fix this by migrating any mappings back to normal CPU memory prior to
>> freeing the GPU memory chunks and associated device private pages.
>>
>> Signed-off-by: Alistair Popple
>>
>> ---
>>
>> I assume the AMD driver might have a similar issue. However I can't see
>> where device private (or coherent) pages actually get unmapped/freed
>> during teardown as I couldn't find any relevant calls to
>> devm_memunmap(), memunmap(), devm_release_mem_region() or
>> release_mem_region(). So it appears that ZONE_DEVICE pages are not being
>> properly freed during module unload, unless I'm missing something?
>
> I've got no idea, will poke Ben to see if they know the answer to this

I guess we're relying on devm to release the region. Isn't the whole point
of using devm_request_free_mem_region that we don't have to remember to
explicitly release it when the device gets destroyed?

I believe we had an explicit free call at some point by mistake, and that
caused a double-free during module unload. See this commit for reference:

commit 22f4f4faf337d5fb2d2750aff13215726814273e
Author: Philip Yang
Date:   Mon Sep 20 17:25:52 2021 -0400

    drm/amdkfd: fix svm_migrate_fini warning

    Device manager releases device-specific resources when a driver
    disconnects from a device, devm_memunmap_pages and
    devm_release_mem_region calls in svm_migrate_fini are redundant.

    It causes below warning trace after patch "drm/amdgpu: Split
    amdgpu_device_fini into early and late", so remove function
    svm_migrate_fini.
    BUG: https://gitlab.freedesktop.org/drm/amd/-/issues/1718

    WARNING: CPU: 1 PID: 3646 at drivers/base/devres.c:795 devm_release_action+0x51/0x60
    Call Trace:
     ? memunmap_pages+0x360/0x360
     svm_migrate_fini+0x2d/0x60 [amdgpu]
     kgd2kfd_device_exit+0x23/0xa0 [amdgpu]
     amdgpu_amdkfd_device_fini_sw+0x1d/0x30 [amdgpu]
     amdgpu_device_fini_sw+0x45/0x290 [amdgpu]
     amdgpu_driver_release_kms+0x12/0x30 [amdgpu]
     drm_dev_release+0x20/0x40 [drm]
     release_nodes+0x196/0x1e0
     device_release_driver_internal+0x104/0x1d0
     driver_detach+0x47/0x90
     bus_remove_driver+0x7a/0xd0
     pci_unregister_driver+0x3d/0x90
     amdgpu_exit+0x11/0x20 [amdgpu]

    Signed-off-by: Philip Yang
    Reviewed-by: Felix Kuehling
    Signed-off-by: Alex Deucher

Furthermore, I guess we are assuming that nobody is using the GPU when the
module is unloaded. As long as any processes have /dev/kfd open, you won't
be able to unload the module (except by force-unload). I suppose with
ZONE_DEVICE memory, we can have references to device memory pages even when
user mode has closed /dev/kfd. We do have a cleanup handler that runs in an
MMU-free-notifier. In theory that should run after all the pages in the
mm_struct have been freed. It releases all sorts of other device resources
and needs the driver to still be there. I'm not sure if there is anything
preventing a module unload before the free-notifier runs. I'll look into
that.

Regards,
  Felix

---
 drivers/gpu/drm/nouveau/nouveau_dmem.c | 48 +++-
 1 file changed, 48 insertions(+)

diff --git a/drivers/gpu/drm/nouveau/nouveau_dmem.c b/drivers/gpu/drm/nouveau/nouveau_dmem.c
index 66ebbd4..3b247b8 100644
--- a/drivers/gpu/drm/nouveau/nouveau_dmem.c
+++ b/drivers/gpu/drm/nouveau/nouveau_dmem.c
@@ -369,6 +369,52 @@ nouveau_dmem_suspend(struct nouveau_drm *drm)
 	mutex_unlock(&drm->dmem->mutex);
 }
 
+/*
+ * Evict all pages mapping a chunk.
+ */
+void
+nouveau_dmem_evict_chunk(struct nouveau_dmem_chunk *chunk)
+{
+	unsigned long i, npages = range_len(&chunk->pagemap.range) >> PAGE_SHIFT;
+	unsigned long *src_pfns, *dst_pfns;
+	dma_addr_t *dma_addrs;
+	struct nouveau_fence *fence;
+
+	src_pfns = kcalloc(npages, sizeof(*src_pfns), GFP_KERNEL);
+	dst_pfns = kcalloc(npages, sizeof(*dst_pfns), GFP_KERNEL);
+	dma_addrs = kcalloc(npages, sizeof(*dma_addrs), GFP_KERNEL);
+
+	migrate_device_range(src_pfns, chunk->pagemap.range.start >> PAGE_SHIFT,
+			npages);
+
+	for (i = 0; i < npages; i++) {
+		if (src_pfns[i] & MIGRATE_PFN_MIGRATE) {
+			struct page *dpage;
+
+			/*
+			 * __GFP_NOFAIL because the GPU is going away and there
+			 * is nothing sensible we can do if we can't copy the
+			 * data back.
+			 */

You'll have
Re: [PATCH] powerpc: export cpu_smallcore_map for modules
On 2022-08-19 17:01, Randy Dunlap wrote:
> Fix build error when CONFIG_DRM_AMDGPU=m:
>
> ERROR: modpost: "cpu_smallcore_map" [drivers/gpu/drm/amd/amdgpu/amdgpu.ko] undefined!
>
> by exporting 'cpu_smallcore_map' just as other per_cpu symbols are
> exported.
>
> drivers/gpu/drm/amd/amdkfd/kfd_device.c calls cpu_smt_mask(). This is an
> inline function on powerpc which references cpu_smallcore_map.
>
> Fixes: 425752c63b6f ("powerpc: Detect the presence of big-cores via "ibm,thread-groups"")
> Fixes: 7bc913085765 ("drm/amdkfd: Try to schedule bottom half on same core")
> Signed-off-by: Randy Dunlap
> Cc: Gautham R. Shenoy
> Cc: Michael Ellerman
> Cc: Nicholas Piggin
> Cc: Christophe Leroy
> Cc: linuxppc-dev@lists.ozlabs.org
> Cc: amd-...@lists.freedesktop.org
> Cc: Felix Kuehling
> Cc: Alex Deucher
> Cc: Christian König
> Cc: "Pan, Xinhui"

Acked-by: Felix Kuehling

> ---
>  arch/powerpc/kernel/smp.c | 1 +
>  1 file changed, 1 insertion(+)
>
> --- a/arch/powerpc/kernel/smp.c
> +++ b/arch/powerpc/kernel/smp.c
> @@ -86,6 +86,7 @@ DEFINE_PER_CPU(cpumask_var_t, cpu_core_m
>  static DEFINE_PER_CPU(cpumask_var_t, cpu_coregroup_map);
>
>  EXPORT_PER_CPU_SYMBOL(cpu_sibling_map);
> +EXPORT_PER_CPU_SYMBOL(cpu_smallcore_map);
>  EXPORT_PER_CPU_SYMBOL(cpu_l2_cache_map);
>  EXPORT_PER_CPU_SYMBOL(cpu_core_map);
>  EXPORT_SYMBOL_GPL(has_big_cores);
Re: Build regressions/improvements in v5.17-rc1
Am 2022-01-24 um 14:11 schrieb Randy Dunlap:
> On 1/24/22 10:55, Geert Uytterhoeven wrote:
>> Hi Alex,
>>
>> On Mon, Jan 24, 2022 at 7:52 PM Alex Deucher wrote:
>>> On Mon, Jan 24, 2022 at 5:25 AM Geert Uytterhoeven wrote:
>>>> On Sun, 23 Jan 2022, Geert Uytterhoeven wrote:
>>>>> + /kisskb/src/drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_topology.c: error: control reaches end of non-void function [-Werror=return-type]: => 1560:1
>>>
>>> I don't really see what's going on here:
>>>
>>> #ifdef CONFIG_X86_64
>>> 	return cpu_data(first_cpu_of_numa_node).apicid;
>>> #else
>>> 	return first_cpu_of_numa_node;
>>> #endif
>>
>> Ah, the actual failure causing this was not included:
>>
>> In file included from /kisskb/src/arch/x86/um/asm/processor.h:41:0,
>>                  from /kisskb/src/include/linux/mutex.h:19,
>>                  from /kisskb/src/include/linux/kernfs.h:11,
>>                  from /kisskb/src/include/linux/sysfs.h:16,
>>                  from /kisskb/src/include/linux/kobject.h:20,
>>                  from /kisskb/src/include/linux/pci.h:35,
>>                  from /kisskb/src/drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_topology.c:25:
>> /kisskb/src/drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_topology.c: In function 'kfd_cpumask_to_apic_id':
>> /kisskb/src/arch/um/include/asm/processor-generic.h:103:18: error: called object is not a function or function pointer
>>  #define cpu_data (_cpu_data)
>>                   ^
>> /kisskb/src/drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_topology.c:1556:9: note: in expansion of macro 'cpu_data'
>>   return cpu_data(first_cpu_of_numa_node).apicid;
>>          ^
>> /kisskb/src/drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_topology.c:1560:1: error: control reaches end of non-void function [-Werror=return-type]
>>  }
>>  ^
>
> ah yes, UML. I have a bunch of UML fixes that I have been hesitant to
> post. This is one of them.

What do people think about this? Does it make sense to configure a UML
kernel with a real device driver in the first place? Or should we just
prevent enabling amdgpu for UML with a Kconfig dependency?

Regards,
  Felix

> thanks.
---
From: Randy Dunlap

../drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_topology.c:1556:9: note: in expansion of macro 'cpu_data'
   return cpu_data(first_cpu_of_numa_node).apicid;
          ^~~~
../drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_topology.c:1560:1: error: control reaches end of non-void function [-Werror=return-type]
../drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_crat.c: In function 'kfd_fill_iolink_info_for_cpu':
../arch/um/include/asm/processor-generic.h:103:19: error: called object is not a function or function pointer
 #define cpu_data (_cpu_data)
          ~^~~
../drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_crat.c:1688:27: note: in expansion of macro 'cpu_data'
  struct cpuinfo_x86 *c = &cpu_data(0);
                          ^~~~
../drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_crat.c:1691:7: error: dereferencing pointer to incomplete type 'struct cpuinfo_x86'
  if (c->x86_vendor == X86_VENDOR_AMD)
      ^~
../drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_crat.c:1691:23: error: 'X86_VENDOR_AMD' undeclared (first use in this function); did you mean 'X86_VENDOR_ANY'?
  if (c->x86_vendor == X86_VENDOR_AMD)
                       ^~
                       X86_VENDOR_ANY
../drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_crat.c: In function 'kfd_create_vcrat_image_cpu':
../drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_crat.c:1742:11: warning: unused variable 'entries' [-Wunused-variable]
  uint32_t entries = 0;

Signed-off-by: Randy Dunlap
---
 drivers/gpu/drm/amd/amdkfd/kfd_crat.c     | 6 +++---
 drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 2 +-
 2 files changed, 4 insertions(+), 4 deletions(-)

--- linux-next-20220107.orig/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
+++ linux-next-20220107/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
@@ -1552,7 +1552,7 @@ static int kfd_cpumask_to_apic_id(const
 	first_cpu_of_numa_node = cpumask_first(cpumask);
 	if (first_cpu_of_numa_node >= nr_cpu_ids)
 		return -1;
-#ifdef CONFIG_X86_64
+#if defined(CONFIG_X86_64) && !defined(CONFIG_UML)
 	return cpu_data(first_cpu_of_numa_node).apicid;
 #else
 	return first_cpu_of_numa_node;
--- linux-next-20220107.orig/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
+++ linux-next-20220107/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
@@ -1679,7 +1679,7 @@ static int kfd_fill_mem_info_for_cpu(int
 	return 0;
 }
 
-#ifdef CONFIG_X86_64
+#if defined(CONFIG_X86_64) && !defined(CONFIG_UML)
 static int kfd_fill_iolink_info_for_cpu(int numa_node_id, int *avail_size,
 		uint32_t *num_entries,
 		struct crat_subtype_iolink *sub_type_hdr)
@@ -1738,7 +1738,7 @@ static int kfd_create_vcrat_image_cpu(vo
 	struct crat_subtype_generic *sub_type_hdr;
 	int avail_size = *size;
 	int numa_node_id;
-#ifdef CONFIG_X86_64
+#if defined(CONFIG_X86_64) && !defined(CONFIG_UML)
 	uint32_t entries = 0;
 #endif
Re: [PATCH] mm/migrate.c: Remove MIGRATE_PFN_LOCKED
Am 2021-10-27 um 9:42 p.m. schrieb Alistair Popple:
> On Wednesday, 27 October 2021 3:09:57 AM AEDT Felix Kuehling wrote:
>> Am 2021-10-25 um 12:16 a.m. schrieb Alistair Popple:
>>> MIGRATE_PFN_LOCKED is used to indicate to migrate_vma_prepare() that a
>>> source page was already locked during migrate_vma_collect(). If it
>>> wasn't then a second attempt is made to lock the page. However if the
>>> first attempt failed it's unlikely a second attempt will succeed, and
>>> the retry adds complexity. So clean this up by removing the retry and
>>> the MIGRATE_PFN_LOCKED flag.
>>>
>>> Destination pages are also meant to have the MIGRATE_PFN_LOCKED flag
>>> set, but nothing actually checks that.
>>>
>>> Signed-off-by: Alistair Popple
>>
>> It makes sense to me. Do you have any empirical data on how much more
>> likely migrations are going to fail with this change due to contested
>> page locks?
>
> Thanks Felix. I do not have any empirical data on this but I've mostly
> seen migrations fail due to the reference count check failing rather
> than failure to lock the page. Even then it's mostly been due to
> thrashing on the same page, so I would be surprised if this change made
> any noticeable difference.

We have seen more page locking contention on NUMA systems that disappears
when we disable NUMA balancing. Probably NUMA balancing migrations make
the page lock more contended, which can cause HMM migration of some pages
to fail.

Also, for migrations to system memory, multiple threads page faulting
concurrently can cause contention. I was just helping debug such an issue.
Having migrations to system memory be only partially successful is
problematic. We'll probably have to implement some retry logic in the
driver to handle this.
Regards,
  Felix

>
>> Either way, the patch is
>>
>> Acked-by: Felix Kuehling
>>
>>> ---
>>>  Documentation/vm/hmm.rst                 |   2 +-
>>>  arch/powerpc/kvm/book3s_hv_uvmem.c       |   4 +-
>>>  drivers/gpu/drm/amd/amdkfd/kfd_migrate.c |   2 -
>>>  drivers/gpu/drm/nouveau/nouveau_dmem.c   |   4 +-
>>>  include/linux/migrate.h                  |   1 -
>>>  lib/test_hmm.c                           |   5 +-
>>>  mm/migrate.c                             | 145 +--
>>>  7 files changed, 35 insertions(+), 128 deletions(-)
>>>
>>> diff --git a/Documentation/vm/hmm.rst b/Documentation/vm/hmm.rst
>>> index a14c2938e7af..f2a59ed82ed3 100644
>>> --- a/Documentation/vm/hmm.rst
>>> +++ b/Documentation/vm/hmm.rst
>>> @@ -360,7 +360,7 @@ between device driver specific code and shared common code:
>>>    system memory page, locks the page with ``lock_page()``, and fills in the
>>>    ``dst`` array entry with::
>>>
>>> -     dst[i] = migrate_pfn(page_to_pfn(dpage)) | MIGRATE_PFN_LOCKED;
>>> +     dst[i] = migrate_pfn(page_to_pfn(dpage));
>>>
>>>  Now that the driver knows that this page is being migrated, it can
>>>  invalidate device private MMU mappings and copy device private memory
>>> diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c b/arch/powerpc/kvm/book3s_hv_uvmem.c
>>> index a7061ee3b157..28c436df9935 100644
>>> --- a/arch/powerpc/kvm/book3s_hv_uvmem.c
>>> +++ b/arch/powerpc/kvm/book3s_hv_uvmem.c
>>> @@ -560,7 +560,7 @@ static int __kvmppc_svm_page_out(struct vm_area_struct *vma,
>>>  			gpa, 0, page_shift);
>>>
>>>  	if (ret == U_SUCCESS)
>>> -		*mig.dst = migrate_pfn(pfn) | MIGRATE_PFN_LOCKED;
>>> +		*mig.dst = migrate_pfn(pfn);
>>>  	else {
>>>  		unlock_page(dpage);
>>>  		__free_page(dpage);
>>> @@ -774,7 +774,7 @@ static int kvmppc_svm_page_in(struct vm_area_struct *vma,
>>>  		}
>>>  	}
>>>
>>> -	*mig.dst = migrate_pfn(page_to_pfn(dpage)) | MIGRATE_PFN_LOCKED;
>>> +	*mig.dst = migrate_pfn(page_to_pfn(dpage));
>>>  	migrate_vma_pages(&mig);
>>> out_finalize:
>>>  	migrate_vma_finalize(&mig);
>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
>>> index 4a16e3c257b9..41d9417f182b 100644
>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
>>> @@ -300,7 +300,6 @@ svm_migrate_copy_to_vram(struct amdgpu_device *adev, struct svm_range *prange,
>&
Re: [PATCH] mm/migrate.c: Remove MIGRATE_PFN_LOCKED
Am 2021-10-25 um 12:16 a.m. schrieb Alistair Popple:
> MIGRATE_PFN_LOCKED is used to indicate to migrate_vma_prepare() that a
> source page was already locked during migrate_vma_collect(). If it
> wasn't then a second attempt is made to lock the page. However if the
> first attempt failed it's unlikely a second attempt will succeed, and
> the retry adds complexity. So clean this up by removing the retry and
> the MIGRATE_PFN_LOCKED flag.
>
> Destination pages are also meant to have the MIGRATE_PFN_LOCKED flag
> set, but nothing actually checks that.
>
> Signed-off-by: Alistair Popple

It makes sense to me. Do you have any empirical data on how much more
likely migrations are going to fail with this change due to contested
page locks?

Either way, the patch is

Acked-by: Felix Kuehling

> ---
>  Documentation/vm/hmm.rst                 |   2 +-
>  arch/powerpc/kvm/book3s_hv_uvmem.c       |   4 +-
>  drivers/gpu/drm/amd/amdkfd/kfd_migrate.c |   2 -
>  drivers/gpu/drm/nouveau/nouveau_dmem.c   |   4 +-
>  include/linux/migrate.h                  |   1 -
>  lib/test_hmm.c                           |   5 +-
>  mm/migrate.c                             | 145 +--
>  7 files changed, 35 insertions(+), 128 deletions(-)
>
> diff --git a/Documentation/vm/hmm.rst b/Documentation/vm/hmm.rst
> index a14c2938e7af..f2a59ed82ed3 100644
> --- a/Documentation/vm/hmm.rst
> +++ b/Documentation/vm/hmm.rst
> @@ -360,7 +360,7 @@ between device driver specific code and shared common code:
>    system memory page, locks the page with ``lock_page()``, and fills in the
>    ``dst`` array entry with::
>
> -     dst[i] = migrate_pfn(page_to_pfn(dpage)) | MIGRATE_PFN_LOCKED;
> +     dst[i] = migrate_pfn(page_to_pfn(dpage));
>
>  Now that the driver knows that this page is being migrated, it can
>  invalidate device private MMU mappings and copy device private memory
> diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c b/arch/powerpc/kvm/book3s_hv_uvmem.c
> index a7061ee3b157..28c436df9935 100644
> --- a/arch/powerpc/kvm/book3s_hv_uvmem.c
> +++ b/arch/powerpc/kvm/book3s_hv_uvmem.c
> @@ -560,7 +560,7 @@ static int __kvmppc_svm_page_out(struct vm_area_struct *vma,
>  			gpa, 0, page_shift);
>
>  	if (ret == U_SUCCESS)
> -		*mig.dst = migrate_pfn(pfn) | MIGRATE_PFN_LOCKED;
> +		*mig.dst = migrate_pfn(pfn);
>  	else {
>  		unlock_page(dpage);
>  		__free_page(dpage);
> @@ -774,7 +774,7 @@ static int kvmppc_svm_page_in(struct vm_area_struct *vma,
>  		}
>  	}
>
> -	*mig.dst = migrate_pfn(page_to_pfn(dpage)) | MIGRATE_PFN_LOCKED;
> +	*mig.dst = migrate_pfn(page_to_pfn(dpage));
>  	migrate_vma_pages(&mig);
> out_finalize:
>  	migrate_vma_finalize(&mig);
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
> index 4a16e3c257b9..41d9417f182b 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
> @@ -300,7 +300,6 @@ svm_migrate_copy_to_vram(struct amdgpu_device *adev, struct svm_range *prange,
>  		migrate->dst[i] = svm_migrate_addr_to_pfn(adev, dst[i]);
>  		svm_migrate_get_vram_page(prange, migrate->dst[i]);
>  		migrate->dst[i] = migrate_pfn(migrate->dst[i]);
> -		migrate->dst[i] |= MIGRATE_PFN_LOCKED;
>  		src[i] = dma_map_page(dev, spage, 0, PAGE_SIZE,
>  				      DMA_TO_DEVICE);
>  		r = dma_mapping_error(dev, src[i]);
> @@ -580,7 +579,6 @@ svm_migrate_copy_to_ram(struct amdgpu_device *adev, struct svm_range *prange,
>  			dst[i] >> PAGE_SHIFT, page_to_pfn(dpage));
>
>  		migrate->dst[i] = migrate_pfn(page_to_pfn(dpage));
> -		migrate->dst[i] |= MIGRATE_PFN_LOCKED;
>  		j++;
>  	}
>
> diff --git a/drivers/gpu/drm/nouveau/nouveau_dmem.c b/drivers/gpu/drm/nouveau/nouveau_dmem.c
> index 92987daa5e17..3828aafd3ac4 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_dmem.c
> +++ b/drivers/gpu/drm/nouveau/nouveau_dmem.c
> @@ -166,7 +166,7 @@ static vm_fault_t nouveau_dmem_fault_copy_one(struct nouveau_drm *drm,
>  		goto error_dma_unmap;
>  	mutex_unlock(&svmm->mutex);
>
> -	args->dst[0] = migrate_pfn(page_to_pfn(dpage)) | MIGRATE_PFN_LOCKED;
> +	args->dst[0] = migrate_pfn(page_to_pfn(dpage));
>  	return 0;
>
>  error_dma_unmap:
> @@ -602,7 +602,7 @@ static unsigned long nouveau_dmem_migrate_copy_one(struct nouveau_drm *drm,