Re: [PATCH] drm/amdgpu: Init zone device and drm client after mode-1 reset on reload

2024-03-05 Thread Felix Kuehling
On 2024-03-04 19:20, Rehman, Ahmad wrote: [AMD Official Use Only - General] Hey, Due to mode-1 reset (pending_reset), the amdgpu_amdkfd_device_init will not be called and hence adev->kfd.init_complete will not be set. The function amdgpu_amdkfd_drm_client_create has condition: if

Re: [PATCH 2/3] drm/amdgpu: sdma support for sriov cpx mode

2024-03-04 Thread Felix Kuehling
On 2024-03-04 10:19, Samir Dhume wrote: Signed-off-by: Samir Dhume Please add a meaningful commit description to all the patches in the series. See one more comment below. --- drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c | 34 +++- 1 file changed, 27 insertions(+), 7

Re: [PATCH] drm/amdgpu: Init zone device and drm client after mode-1 reset on reload

2024-03-04 Thread Felix Kuehling
On 2024-03-04 17:05, Ahmad Rehman wrote: In a passthrough environment, when amdgpu is reloaded after unload, mode-1 is triggered after initializing the necessary IPs. That init does not include KFD, and KFD init waits until the reset is completed. KFD init is called in the reset handler, but in

Re: [PATCH V3] Revert "drm/amdgpu: remove vm sanity check from amdgpu_vm_make_compute" for Raven

2024-03-01 Thread Felix Kuehling
On 2024-02-29 01:04, Jesse.Zhang wrote: Fix the issue: "amdgpu: Failed to create process VM object". [Why] When amdgpu is initialized, seq64 does its mapping and updates the BO mapping in the VM page table. But when clinfo runs, it also initializes a VM for a process device through the function

Re: [PATCH v3] drm/amdgpu: change vm->task_info handling

2024-03-01 Thread Felix Kuehling
put last in vm_fini() Cc: Christian Koenig Cc: Alex Deucher Cc: Felix Kuehling Signed-off-by: Shashank Sharma One nit-pick and one bug inline. With those fixed, the patch Reviewed-by: Felix Kuehling --- drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c | 9 +- drivers/gpu/drm/a

Re: [PATCH] Revert "drm/amdgpu: remove vm sanity check from amdgpu_vm_make_compute" for Raven

2024-02-28 Thread Felix Kuehling
On 2024-02-28 01:41, Christian König wrote: On 28.02.24 at 06:04, Jesse.Zhang wrote: Fix the issue when running clinfo: "amdgpu: Failed to create process VM object". When amdgpu is initialized, seq64 does its mapping and updates the BO mapping in the VM page table. But when clinfo runs, it also initializes a VM

Re: [PATCH] drm/amdkfd: Increase the size of the memory reserved for the TBA

2024-02-23 Thread Felix Kuehling
+TMA reserved memory size to two pages. Signed-off-by: Laurent Morichetti Reviewed-by: Felix Kuehling --- drivers/gpu/drm/amd/amdkfd/kfd_device.c | 23 --- drivers/gpu/drm/amd/amdkfd/kfd_priv.h | 6 +++--- 2 files changed, 19 insertions(+), 10 deletions(-) diff

Re: [PATCH] drm/amdkfd: fix process reference drop on debug ioctl

2024-02-21 Thread Felix Kuehling
On 2024-02-21 05:54, Jonathan Kim wrote: Prevent dropping the KFD process reference at the end of a debug IOCTL call where the acquired process value is an error. Signed-off-by: Jonathan Kim Reviewed-by: Felix Kuehling --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 1 + 1 file

Re: [PATCH 1/2] drm/amdkfd: Document and define SVM event tracing macro

2024-02-16 Thread Felix Kuehling
On 2024-02-15 10:18, Philip Yang wrote: Document how to use the SMI (system management interface) to receive SVM events. Define an SVM event message string format macro that user mode can use with sscanf to parse the event. Add it to the uAPI header file to make it obvious that is changing uAPI in

[PATCH v3] drm/amdkfd: Relocate TBA/TMA to opposite side of VM hole

2024-02-13 Thread Felix Kuehling
Signed-off-by: Felix Kuehling --- drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c | 3 +- drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c| 6 +--- drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h | 11 +++- drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c | 29 ++-- 4 files changed, 27

Re: [Patch v2 1/2] drm/amdkfd: update SIMD distribution algo for GFXIP 9.4.2 onwards

2024-02-13 Thread Felix Kuehling
Signed-off-by: Rajneesh Bhardwaj Reviewed-by: Felix Kuehling --- * Change the enum bitfield to 4 to avoid ORing condition of previous member flags. * Incorporate review feedback from Felix from https://www.mail-archive.com/amd-gfx@lists.freedesktop.org/msg102840.html and split one

Re: [PATCH 2/2] drm/amdgpu: Fix implicit assumtion in gfx11 debug flags

2024-02-13 Thread Felix Kuehling
-by: Felix Kuehling --- drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v11.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v11.c b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v11.c index d722cbd31783..826bc4f6c8a7 100644 --- a/drivers

Re: [PATCH 1/2] drm/amdkfd: update SIMD distribution algo for GFXIP 9.4.2 onwards

2024-02-13 Thread Felix Kuehling
On 2024-02-09 20:49, Rajneesh Bhardwaj wrote: In certain cooperative group dispatch scenarios the default SPI resource allocation may cause reduced per-CU workgroup occupancy. Set COMPUTE_RESOURCE_LIMITS.FORCE_SIMD_DIST=1 to mitigate soft hang scenarios. Suggested-by: Joseph Greathouse

Re: [Patch v2] drm/amdkfd: update SIMD distribution algo for GFXIP 9.4.2 onwards

2024-02-08 Thread Felix Kuehling
On 2024-02-08 15:01, Bhardwaj, Rajneesh wrote: On 2/8/2024 2:41 PM, Felix Kuehling wrote: On 2024-02-07 23:14, Rajneesh Bhardwaj wrote: In certain cooperative group dispatch scenarios the default SPI resource allocation may cause reduced per-CU workgroup occupancy. Set

Re: [Patch v2] drm/amdkfd: update SIMD distribution algo for GFXIP 9.4.2 onwards

2024-02-08 Thread Felix Kuehling
On 2024-02-07 23:14, Rajneesh Bhardwaj wrote: In certain cooperative group dispatch scenarios the default SPI resource allocation may cause reduced per-CU workgroup occupancy. Set COMPUTE_RESOURCE_LIMITS.FORCE_SIMD_DIST=1 to mitigate soft hang scenarios. Suggested-by: Joseph Greathouse

Re: [PATCH v2] drm/amdkfd: Initialize kfd_gpu_cache_info for KFD topology

2024-02-06 Thread Felix Kuehling
the kfd_gpu_cache_info before asking the remaining fields to be filled in by lower-level functions. Fixes: 04756ac9a24c ("drm/amdkfd: Add cache line sizes to KFD topology") Signed-off-by: Joseph Greathouse Reviewed-by: Felix Kuehling --- drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 1 + 1 file

Re: [PATCH] drm/amdkfd: Don't divide L2 cache by partition mode

2024-02-06 Thread Felix Kuehling
On 2024-02-06 16:24, Kent Russell wrote: Partition mode only affects L3 cache size. After removing the L2 check in the previous patch, make sure we aren't dividing all cache sizes by partition mode, just L3. Fixes: a75bfb3c4045 ("drm/amdkfd: Fix L2 cache size reporting in GFX9.4.3") The fixes
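The rule described in this message (only the L3 size depends on partition mode; other cache levels are reported undivided) can be sketched as below. This is a minimal illustration, not the driver code: the function name, the numeric cache-level parameter, and the reading of "partition mode" as a partition count to divide by are all assumptions.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not from kfd_topology.c): returns the cache size
 * to report for one cache level. Only level 3 is divided by the number
 * of partitions; L1 and L2 sizes are passed through unchanged.
 */
static uint32_t reported_cache_size(uint32_t level, uint32_t size_kb,
                                    uint32_t num_partitions)
{
    /* Guard against 0 or 1 partitions so we never divide by zero. */
    if (level == 3 && num_partitions > 1)
        return size_kb / num_partitions;
    return size_kb;
}
```

Keeping the level check in one place makes it hard to reintroduce a blanket division over all cache levels.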

Re: [PATCH] drm/amdkfd: Initialize kfd_gpu_cache_info for KFD topology

2024-02-06 Thread Felix Kuehling
On 2024-02-06 15:55, Joseph Greathouse wrote: The current kfd_gpu_cache_info structure is only partially filled in for some architectures. This means that for devices where we do not fill in some fields, we can return uninitialized values through the KFD topology. Zero out the
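The pattern described here, zeroing a structure before handing it to code that fills in only some fields, can be sketched as follows. The struct layout and both function names are hypothetical stand-ins chosen for illustration; they are not the real kfd_gpu_cache_info definition or the actual patch.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in; not the real kfd_gpu_cache_info layout. */
struct cache_info_sketch {
    uint32_t cache_size;
    uint32_t cache_level;
    uint32_t cache_line_size;   /* assumed to be filled only on some parts */
    uint32_t flags;
};

/* Hypothetical lower-level helper that sets only a subset of the fields. */
static void fill_partial(struct cache_info_sketch *info)
{
    info->cache_size = 16;
    info->cache_level = 1;
    /* cache_line_size and flags are deliberately left untouched. */
}

/* Zero the whole structure first so unfilled fields read back as 0. */
static void get_cache_info(struct cache_info_sketch *info)
{
    memset(info, 0, sizeof(*info));
    fill_partial(info);
}
```

Without the memset, a caller passing stack or heap memory would see whatever bytes happened to be there in the fields the helper skipped.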

Re: [PATCH 1/2] drm/amdgpu: Unmap only clear the page table leaves

2024-02-02 Thread Felix Kuehling
On 2024-02-01 11:50, Philip Yang wrote: SVM migration unmaps pages from the GPU and then updates the mapping to the GPU to recover from the page fault. Currently unmap clears the PDE entry for range length >= huge page and frees the PTB bo; update mapping then allocates a new PT bo. There is a race bug in that the freed entry bo

Re: [PATCH] drm/amdkfd: update SIMD distribution algo for GFXIP 9.4.2 onwards

2024-02-02 Thread Felix Kuehling
Thanks for checking. The patch is Reviewed-by: Felix Kuehling Thanks, -Joe Regards, Felix + m->compute_resource_limits = q->is_gws ? + COMPUTE_RESOURCE_LIMITS__FORCE_SIMD_DIST_MASK : 0; + q->is_active = QUEUE_IS_ACTIVE(*q); }

Re: [PATCH] drm/amdkfd: update SIMD distribution algo for GFXIP 9.4.2 onwards

2024-02-02 Thread Felix Kuehling
On 2024-02-01 13:54, Rajneesh Bhardwaj wrote: In certain cooperative group dispatch scenarios the default SPI resource allocation may cause reduced per-CU workgroup occupancy. Set COMPUTE_RESOURCE_LIMITS.FORCE_SIMD_DIST=1 to mitigate soft hang scenarios. Suggested-by: Joseph Greathouse

Re: [PATCH v3] drm/amdkfd: reserve the BO before validating it

2024-01-30 Thread Felix Kuehling
_64+0x3f/0x90 [ 41.709973] entry_SYSCALL_64_after_hwframe+0x6e/0xd8 Signed-off-by: Lang Yu Reviewed-by: Felix Kuehling --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h| 2 +- .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 20 --- drivers/gpu/drm/amd/amdkfd/kfd_charde

[PATCH 2/2] drm/amdkfd: Relocate TBA/TMA to opposite side of VM hole (v2)

2024-01-30 Thread Felix Kuehling
by a NULL access with a small offset. v2: - Move it to the reserved space to avoid conflicts with Mesa - Add macros to make reserved space management easier Cc: Arunpravin Paneer Selvam Cc: Christian Koenig Signed-off-by: Jay Cornwall Signed-off-by: Felix Kuehling --- drivers/gpu/drm/amd/amdgpu

[PATCH 1/2] drm/amdgpu: Reduce VA_RESERVED_BOTTOM to 64KB

2024-01-30 Thread Felix Kuehling
/vm/mmap_min_addr. Signed-off-by: Felix Kuehling --- drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h index 98a57192..2c4053b29bb3 100644 --- a/drivers

Re: [PATCH] drm/amdkfd: Relocate TBA/TMA to opposite side of VM hole (v2)

2024-01-29 Thread Felix Kuehling
On 2024-01-29 11:50, Arunpravin Paneer Selvam wrote: @@ -339,18 +346,19 @@ static void kfd_init_apertures_v9(struct kfd_process_device *pdd, uint8_t id)    pdd->lds_base = MAKE_LDS_APP_BASE_V9();    pdd->lds_limit = MAKE_LDS_APP_LIMIT(pdd->lds_base); -    /* Raven needs SVM to

Re: drm/amdkfd: Relocate TBA/TMA to opposite side of VM hole (v2)

2024-01-29 Thread Felix Kuehling
, Felix Kuehling wrote: On 2024-01-29 8:58, Shengyu Qu wrote: Hi, Seems rocm-opengl interop hang problem still exists[1]. Btw have you discovered into this problem? Best regards, Shengyu [1] https://projects.blender.org/blender/blender/issues/100353#issuecomment-599 Maybe you're having a different

Re: [PATCH v2] drm/amdkfd: reserve the BO before validating it

2024-01-29 Thread Felix Kuehling
On 2024-01-28 21:30, Yu, Lang wrote: [AMD Official Use Only - General] -Original Message- From: Kuehling, Felix Sent: Saturday, January 27, 2024 3:22 AM To: Yu, Lang ; amd-gfx@lists.freedesktop.org Cc: Francis, David Subject: Re: [PATCH v2] drm/amdkfd: reserve the BO before

Re: [PATCH] drm/amdkfd: Relocate TBA/TMA to opposite side of VM hole (v2)

2024-01-29 Thread Felix Kuehling
On 2024-01-29 3:45, Yu, Lang wrote: [AMD Official Use Only - General] -Original Message- From: amd-gfx On Behalf Of Felix Kuehling Sent: Friday, January 26, 2024 6:28 AM To: amd-gfx@lists.freedesktop.org Cc: Cornwall, Jay ; Koenig, Christian ; Paneer Selvam, Arunpravin Subject

Re: drm/amdkfd: Relocate TBA/TMA to opposite side of VM hole (v2)

2024-01-29 Thread Felix Kuehling
be we need more users to test it. Besides, Tested-by: Shengyu Qu Best Regards, Shengyu On 2024/1/26 06:27, Felix Kuehling wrote: The TBA and TMA, along with an unused IB allocation, reside at low addresses in the VM address space. A stray VM fault which hits these pages must be serviced

Re: [PATCH v2] drm/amdkfd: reserve the BO before validating it

2024-01-26 Thread Felix Kuehling
On 2024-01-25 20:59, Yu, Lang wrote: [AMD Official Use Only - General] -Original Message- From: Kuehling, Felix Sent: Thursday, January 25, 2024 5:41 AM To: Yu, Lang ; amd-gfx@lists.freedesktop.org Cc: Francis, David Subject: Re: [PATCH v2] drm/amdkfd: reserve the BO before

[PATCH] drm/amdkfd: Relocate TBA/TMA to opposite side of VM hole (v2)

2024-01-25 Thread Felix Kuehling
by a NULL access with a small offset. v2: - Move it to the reserved space to avoid conflicts with Mesa - Add macros to make reserved space management easier Cc: Arunpravin Paneer Selvam Cc: Christian Koenig Signed-off-by: Jay Cornwall Signed-off-by: Felix Kuehling --- drivers/gpu/drm/amd/amdgpu

Re: [PATCH v2] drm/amdgpu: move the drm client creation behind drm device registration

2024-01-25 Thread Felix Kuehling
ight place for this to ensure it only gets called once. The fix looks reasonable to me. Reviewed-by: Felix Kuehling This looks fine to me, needs to be checked by Felix anyway. Thanks, Lijo And re-locating the drm client creation following after drm_dev_register looks like a more proper flow. v2: wr

Re: [PATCH v2] drm/amdkfd: reserve the BO before validating it

2024-01-24 Thread Felix Kuehling
On 2024-01-22 4:08, Lang Yu wrote: Fixes: 410f08516e0f ("drm/amdkfd: Move dma unmapping after TLB flush") v2: Avoid unmapping attachment twice when ERESTARTSYS. [ 41.708711] WARNING: CPU: 0 PID: 1463 at drivers/gpu/drm/ttm/ttm_bo.c:846 ttm_bo_validate+0x146/0x1b0 [ttm] [ 41.708989] Call

Re: [PATCH v2] drm/amdgpu: change vm->task_info handling

2024-01-24 Thread Felix Kuehling
On 2024-01-24 9:32, Shashank Sharma wrote: On 19/01/2024 21:23, Felix Kuehling wrote: On 2024-01-18 14:21, Shashank Sharma wrote: This patch changes the handling and lifecycle of vm->task_info object. The major changes are: - vm->task_info is a dynamically allocated ptr now, and its

Re: [PATCH] drm/amdkfd: Add cache line sizes to KFD topology

2024-01-22 Thread Felix Kuehling
* On various Navis, most cache lines are 128 except L1 scalar data and instruction caches as well as L3 cache * You fixed L1 scalar data and instruction cache sizes for Carrizo. Was that intentional? If that sounds correct and how it's meant to be, you can add my Reviewed-by: Felix Kuehling

Re: [PATCH v2] drm/amdgpu: change vm->task_info handling

2024-01-19 Thread Felix Kuehling
of vm->task_info. V2: Do not block all the prints when task_info not found (Felix) Cc: Christian Koenig Cc: Alex Deucher Cc: Felix Kuehling Signed-off-by: Shashank Sharma Nit-picks inline. --- drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c | 7 +- drivers/gpu/drm/amd/amdgpu/amdgpu_job.c

Re: [PATCH 1/2] drm/amdgpu: Reset IH OVERFLOW_CLEAR bit after writing rptr

2024-01-19 Thread Felix Kuehling
On 2024-01-18 07:07, Christian König wrote: On 18.01.24 at 00:44, Friedrich Vock wrote: On 18.01.24 00:00, Alex Deucher wrote: [SNIP] Right now, IH overflows, even if they occur repeatedly, only get registered once. If not registering IH overflows can trivially lead to system crashes, it's

Re: [pull] amdgpu, amdkfd drm-fixes-6.8

2024-01-15 Thread Felix Kuehling
ras_ctrl debugfs Charlene Liu (1): drm/amd/display: Update z8 latency Dafna Hirschfeld (1): drm/amdkfd: fixes for HMM mem allocation Daniel Miess (1): Revert "drm/amd/display: Fix conversions between bytes and KB" Felix Kuehling (4): drm/amdkfd: Fix lock

[PATCH] drm/amdgpu: Remove unnecessary NULL check

2024-01-15 Thread Felix Kuehling
A static checker pointed out that bo_va->base.bo was already dereferenced earlier in the same scope. Therefore this check is unnecessary here. Reported-by: Dan Carpenter Fixes: 79e7fdec71f2 ("drm/amdgpu: Auto-validate DMABuf imports in compute VMs") Signed-off-by: Felix Kuehling ---

Re: Proposal to add CRIU support to DRM render nodes

2024-01-15 Thread Felix Kuehling
be generalized later if there is interest then. Regards,   Felix On 2023-12-06 16:23, Felix Kuehling wrote: Executive Summary: We need to add CRIU support to DRM render nodes in order to maintain CRIU support for ROCm application once they start relying on render nodes for more GPU memory management

Re: [PATCH] drm/amdkfd: init drm_client with funcs hook

2024-01-15 Thread Felix Kuehling
On 2024-01-12 3:05, Flora Cui wrote: otherwise drm_client_dev_unregister() would try to kfree(>kfd.client). Signed-off-by: Flora Cui Thank you for finding and fixing this bug. You can add: Fixes: 1819200166ce ("drm/amdkfd: Export DMABufs from KFD using GEM handles") Revie

Re: [PATCH] Revert "drm/amdkfd: Relocate TBA/TMA to opposite side of VM hole"

2024-01-11 Thread Felix Kuehling
won't be able to start. The VA range allocator is in libdrm. Marek On Fri, Jan 5, 2024, 15:20 Felix Kuehling wrote: TBA/TMA were relocated to the upper half of the canonical address space. I don't think that qualifies as 32-bit by definition. But maybe you're using a different definition

Re: [PATCH v2] drm/amdkfd: Fix variable dereferenced before NULL check in 'kfd_dbg_trap_device_snapshot()'

2024-01-11 Thread Felix Kuehling
On 2024-01-10 10:56, Srinivasan Shanmugam wrote: Fixes the below: drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_debug.c:1024 kfd_dbg_trap_device_snapshot() warn: variable dereferenced before check 'entry_size' (see line 1021) Cc: Felix Kuehling Cc: Christian König Cc: Alex Deucher Signed-off

Re: [PATCH v3] amd/amdkfd: Set correct svm range actual loc after spliting

2024-01-11 Thread Felix Kuehling
On 2024-01-10 17:01, Philip Yang wrote: While an svm range is partially migrating to system memory, clear the dma_addr vram domain flag; otherwise a future split will get incorrect vram_pages and actual loc. After range splitting, set the new range and old range actual_loc: new range actual_loc is 0 if

Re: [PATCH] drm/amdkfd: reserve the BO before validating it

2024-01-11 Thread Felix Kuehling
On 2024-01-11 02:22, Lang Yu wrote: Fixes: 410f08516e0f ("drm/amdkfd: Move dma unmapping after TLB flush") [ 41.708711] WARNING: CPU: 0 PID: 1463 at drivers/gpu/drm/ttm/ttm_bo.c:846 ttm_bo_validate+0x146/0x1b0 [ttm] [ 41.708989] Call Trace: [ 41.708992] [ 41.708996] ?

Re: [PATCH] drm/amdkfd: Fix the shift-out-of-bounds warning

2024-01-11 Thread Felix Kuehling
[+Jon] On 2024-01-11 01:05, Ma, Jun wrote: Hi Felix, On 1/10/2024 11:57 PM, Felix Kuehling wrote: On 2024-01-10 04:39, Ma Jun wrote: There is the following shift-out-of-bounds warning if ecode=0. "shift exponent 4294967295 is too large for 64-bit type 'long long unsigned int'"

Re: [PATCH v2] amd/amdkfd: Set correct svm range actual loc after spliting

2024-01-10 Thread Felix Kuehling
On 2024-01-09 15:05, Philip Yang wrote: After an svm range is partially migrated to system memory, unmap to clean up the corresponding dma_addr vram domain flag; otherwise a future split will get incorrect vram_pages and actual loc. After range splitting, set the new range and old range actual_loc: new

Re: [PATCH] drm/amdkfd: Fix the shift-out-of-bounds warning

2024-01-10 Thread Felix Kuehling
On 2024-01-10 04:39, Ma Jun wrote: There is the following shift-out-of-bounds warning if ecode=0. "shift exponent 4294967295 is too large for 64-bit type 'long long unsigned int'" Signed-off-by: Ma Jun --- include/uapi/linux/kfd_ioctl.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
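The quoted warning is consistent with a mask computed as a shift by (ecode - 1): with an unsigned ecode of 0, the shift count wraps to 4294967295. The snippet above is cut off before the actual change, so the following is only a sketch of one way to guard such a shift. The helper name is hypothetical, and the 1-based mapping from code to bit is an assumption rather than something taken from kfd_ioctl.h.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not the macro from kfd_ioctl.h: maps a 1-based
 * exception code to a single-bit 64-bit mask. Code 0 and codes above 64
 * yield an empty mask instead of an out-of-range shift, which would be
 * undefined behavior in C.
 */
static uint64_t ec_mask_sketch(uint32_t ecode)
{
    if (ecode == 0 || ecode > 64)
        return 0;
    return UINT64_C(1) << (ecode - 1);
}
```

Returning an empty mask for code 0 means callers can OR or test the result without special-casing the "no exception" value.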

Re: [PATCH] drm/amdkfd: Fix 'node' NULL check in 'svm_range_get_range_boundaries()'

2024-01-10 Thread Felix Kuehling
mdkfd/kfd_svm.c:2691 svm_range_get_range_boundaries() warn: can 'node' even be NULL? Suggested-by: Philip Yang Cc: Felix Kuehling Cc: Christian König Cc: Alex Deucher Signed-off-by: Srinivasan Shanmugam Reviewed-by: Felix Kuehling --- drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 10 +- 1 file changed, 5 insert

Re: [PATCH v2] drm/amdkfd: fixes for HMM mem allocation

2024-01-08 Thread Felix Kuehling
On 2024-01-07 08:07, Dafna Hirschfeld wrote: Fix err return value and reset pgmap->type after checking it. Fixes: c83dee9b6394 ("drm/amdkfd: add SPM support for SVM") Reviewed-by: Felix Kuehling Signed-off-by: Dafna Hirschfeld --- v2: remove unrelated DOC fix and add 'Fixes'

[PATCH v2] drm/amdkfd: Fix sparse __rcu annotation warnings

2024-01-05 Thread Felix Kuehling
Properly mark kfd_process->ef as __rcu and consistently use the right accessor functions. Reported-by: kernel test robot Closes: https://lore.kernel.org/oe-kbuild-all/202312052245.yfpbsgnh-...@intel.com/ Signed-off-by: Felix Kuehling --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h

Re: [PATCH] Revert "drm/amdkfd: Relocate TBA/TMA to opposite side of VM hole"

2024-01-05 Thread Felix Kuehling
why UMDs can't allocate anything in that range. Marek On Wed, Jan 3, 2024 at 2:50 PM Jay Cornwall wrote: On 1/3/2024 12:58, Felix Kuehling wrote: A segfault in Mesa seems to be a different issue from what's mentioned in the commit message. I'd let Christian or Marek comment on compatibility

Re: [PATCH v5 1/2] drm/amdgpu: Auto-validate DMABuf imports in compute VMs

2024-01-04 Thread Felix Kuehling
On 2024-01-04 4:33, Christian König wrote: On 04.01.24 at 00:15, Felix Kuehling wrote: DMABuf imports in compute VMs are not wrapped in a kgd_mem object on the process_info->kfd_bo_list. There is no explicit KFD API call to validate them or add eviction fences to them. This pa

[PATCH v5 1/2] drm/amdgpu: Auto-validate DMABuf imports in compute VMs

2024-01-03 Thread Felix Kuehling
s long as all imports are from KFD, with the exports already reserved, validated and fenced by the KFD restore worker. v5: Reintroduced separate evicted_user state to simplify the state machine and CS error handling when amdgpu_vm_validate is called without a ticket. Signed-off-by: Felix Ku

[PATCH v5 2/2] drm/amdkfd: Bump KFD ioctl version

2024-01-03 Thread Felix Kuehling
This is not strictly a change in the IOCTL API. This version bump is meant to indicate to user mode the presence of a number of changes and fixes that enable the management of VA mappings in compute VMs using the GEM_VA ioctl for DMABufs exported from KFD. Signed-off-by: Felix Kuehling

Re: [PATCH] drm/amdkfd: Fix lock dependency warning with srcu

2024-01-03 Thread Felix Kuehling
letion)(>deferred_list_work)); sync(srcu); Signed-off-by: Philip Yang Reviewed-by: Felix Kuehling --- drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdk

Re: [PATCH] Revert "drm/amdkfd: Relocate TBA/TMA to opposite side of VM hole"

2024-01-03 Thread Felix Kuehling
't see how Jay's patch could have caused that. I made another change in that code recently that could make a difference for this issue: commit 8f08c5b24ced1be7eb49692e4816c1916233c79b Author: Felix Kuehling Date:   Fri Oct 27 18:21:55 2023 -0400     drm/amdkfd: Run restore_workers on fr

[PATCH v2] drm/amdkfd: Fix lock dependency warning

2024-01-02 Thread Felix Kuehling
t are already done in amdkfd_fence_enable_signaling. Signed-off-by: Felix Kuehling --- drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 26 ++ 1 file changed, 10 insertions(+), 16 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkf

Re: [PATCH] drm/amdkfd: fixes for HMM mem allocation

2024-01-02 Thread Felix Kuehling
/* Disable SVM support capability */ + pgmap->type = 0; Ooff, thanks for catching that. For the KFD driver changes you can add Fixes: c83dee9b6394 ("drm/amdkfd: add SPM support for SVM") Reviewed-by: Felix Kuehling return PTR_ERR(r);

Re: [PATCH] drm/amdgpu: change vm->task_info handling

2024-01-02 Thread Felix Kuehling
ogistical changes required for existing usage of vm->task_info. Cc: Christian Koenig Cc: Alex Deucher Cc: Felix Kuehling Signed-off-by: Shashank Sharma --- drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c | 7 +- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 15 ++- drivers/gpu/drm/amd/amdgpu/

Re: [PATCH 4/5] drm/amdgpu: Query boot status if discovery failed

2024-01-02 Thread Felix Kuehling
On 2024-01-02 09:07, Hawking Zhang wrote: Check and report boot status if discovery failed. Signed-off-by: Hawking Zhang --- drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c

Re: [PATCH v2] drm/amdkfd: Confirm list is non-empty before utilizing list_first_entry in kfd_topology.c

2024-01-02 Thread Felix Kuehling
ology to surface peer-to-peer links")' Suggested-by: Lijo Lazar Suggested-by: Felix Kuehling Cc: Felix Kuehling Cc: Christian König Cc: Alex Deucher Signed-off-by: Srinivasan Shanmugam Reviewed-by: Felix Kuehling --- v2: Changed to "if (list_empty(>io_link_props)) retur

Re: [PATCH] drm/amdkfd: Fix lock dependency warning

2024-01-02 Thread Felix Kuehling
On 2023-12-28 18:11, Philip Yang wrote: On 2023-12-21 15:40, Felix Kuehling wrote: == WARNING: possible circular locking dependency detected 6.5.0-kfd-fkuehlin #276 Not tainted -- kworker

Re: [PATCH] drm/amdkfd: Fix iterator used outside loop in 'kfd_add_peer_prop()'

2024-01-02 Thread Felix Kuehling
On 2023-12-29 04:43, Srinivasan Shanmugam wrote: Fix the following about iterator use: drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_topology.c:1456 kfd_add_peer_prop() warn: iterator used outside loop: 'iolink3' Cc: Felix Kuehling Cc: Christian König Cc: Alex Deucher Signed-off-by: Srinivasan

Re: [PATCH v3] drm/amdkfd: Prefer kernel data types u8, u16, u32, u64 in amdkfd/kfd_priv.h

2024-01-02 Thread Felix Kuehling
as well? I also see a bunch of unrelated indentation changes in this patch. Regards,   Felix Cc: Felix Kuehling Cc: Christian König Cc: Alex Deucher Signed-off-by: Srinivasan Shanmugam --- v3: - updated u32, u16, u64 for missed variables in v2 drivers/gpu/drm/amd/amdkfd/kfd_priv.h | 448

Re: [PATCH] drm/amdgpu: Drop 'fence' check in 'to_amdgpu_amdkfd_fence()'

2024-01-02 Thread Felix Kuehling
? Cc: Felix Kuehling Cc: Christian König Cc: Alex Deucher Signed-off-by: Srinivasan Shanmugam Reviewed-by: Felix Kuehling --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu

[PATCH] drm/amdkfd: Fix lock dependency warning

2023-12-21 Thread Felix Kuehling
ange_schedule_evict_svm_bo instead of in the worker. That way it's impossible for a BO to get freed while eviction work is pending and the cancel_work_sync call in svm_range_bo_release can be eliminated. Signed-off-by: Felix Kuehling --- drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 13 - 1

Re: [PATCH] drm/amdkfd: Drop redundant NULL pointer check in kfd_topology.c

2023-12-21 Thread Felix Kuehling
/gpu/drm/amd/amdgpu/../amdkfd/kfd_topology.c:1428 kfd_add_peer_prop() warn: can 'iolink1' even be NULL? drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_topology.c:1433 kfd_add_peer_prop() warn: can 'iolink2' even be NULL? Cc: Felix Kuehling Cc: Christian König Cc: Alex Deucher Signed-off-by: Srinivasan

Re: [PATCH] drm/amdkfd: Fix type of 'dbg_flags' in 'struct kfd_process'

2023-12-21 Thread Felix Kuehling
() warn: maybe use && instead of & Please add a Fixes-tag: Fixes: 0de4ec9a0353 ("drm/amdgpu: prepare map process for multi-process debug devices") Suggested-by: Lijo Lazar Cc: Felix Kuehling Cc: Christian König Cc: Alex Deucher Signed-off-by: Srinivasan Shanmugam ---

Re: [PATCH] drm/amdkfd: Fix sparse __rcu annotation warnings

2023-12-20 Thread Felix Kuehling
On 2023-12-11 10:56, Felix Kuehling wrote: On 2023-12-08 05:11, Christian König wrote: On 07.12.23 at 20:14, Felix Kuehling wrote: On 2023-12-05 17:20, Felix Kuehling wrote: Properly mark kfd_process->ef as __rcu and consistently access it with rcu_dereference_protected. Repor

Re: [PATCH v3 1/2] drm/amdgpu: Auto-validate DMABuf imports in compute VMs

2023-12-20 Thread Felix Kuehling
On 2023-12-20 8:58, Christian König wrote: On 19.12.23 at 23:43, Felix Kuehling wrote: On 2023-12-19 3:10, Christian König wrote: On 15.12.23 at 16:19, Felix Kuehling wrote: On 2023-12-15 07:30, Christian König wrote: @@ -1425,11 +1451,21 @@ int amdgpu_vm_handle_moved(struct amdgpu_device

Re: [PATCH v3 1/2] drm/amdgpu: Auto-validate DMABuf imports in compute VMs

2023-12-19 Thread Felix Kuehling
On 2023-12-19 3:10, Christian König wrote: On 15.12.23 at 16:19, Felix Kuehling wrote: On 2023-12-15 07:30, Christian König wrote: @@ -1425,11 +1451,21 @@ int amdgpu_vm_handle_moved(struct amdgpu_device *adev,   }     r = amdgpu_vm_bo_update(adev, bo_va, clear

[PATCH] drm/amdgpu: Let KFD sync with VM fences

2023-12-18 Thread Felix Kuehling
Change the rules for amdgpu_sync_resv to let KFD synchronize with VM fences on page table reservations. This fixes intermittent memory corruption after evictions when using amdgpu_vm_handle_moved to update page tables for VM mappings managed through render nodes. Signed-off-by: Felix Kuehling

Re: [PATCH v3 1/2] drm/amdgpu: Auto-validate DMABuf imports in compute VMs

2023-12-15 Thread Felix Kuehling
On 2023-12-15 07:30, Christian König wrote: @@ -1425,11 +1451,21 @@ int amdgpu_vm_handle_moved(struct amdgpu_device *adev,   }     r = amdgpu_vm_bo_update(adev, bo_va, clear); -    if (r) -    return r;     if (unlock)   dma_resv_unlock(resv); + 

[PATCH v3 1/2] drm/amdgpu: Auto-validate DMABuf imports in compute VMs

2023-12-14 Thread Felix Kuehling
s long as all imports are from KFD, with the exports already reserved, validated and fenced by the KFD restore worker. Signed-off-by: Felix Kuehling --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h| 10 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 39 -- drivers/gpu/drm/amd/

[PATCH v3 2/2] drm/amdkfd: Bump KFD ioctl version

2023-12-14 Thread Felix Kuehling
This is not strictly a change in the IOCTL API. This version bump is meant to indicate to user mode the presence of a number of changes and fixes that enable the management of VA mappings in compute VMs using the GEM_VA ioctl for DMABufs exported from KFD. Signed-off-by: Felix Kuehling

Re: [PATCH v2 1/2] drm/amdgpu: Auto-validate DMABuf imports in compute VMs

2023-12-14 Thread Felix Kuehling
On 2023-12-14 16:40, Felix Kuehling wrote: Fence slot reservation should be done by the caller and not here. The caller doesn't necessarily have the BO list to create all those fences. The whole point of doing this in the VM code was to use the "BO lists" maintained by th

Re: [PATCH v2 1/2] drm/amdgpu: Auto-validate DMABuf imports in compute VMs

2023-12-14 Thread Felix Kuehling
On 2023-12-13 09:30, Christian König wrote: On 06.12.23 at 22:44, Felix Kuehling wrote: DMABuf imports in compute VMs are not wrapped in a kgd_mem object on the process_info->kfd_bo_list. There is no explicit KFD API call to validate them or add eviction fences to them. This pa

Re: [PATCH v3 2/2] drm/amdgpu: Enable clear page functionality

2023-12-14 Thread Felix Kuehling
the DRM_BUDDY_CLEARED flag. - Remove ! from amdgpu_res_cleared() check. Signed-off-by: Arunpravin Paneer Selvam Suggested-by: Christian König Acked-by: Felix Kuehling --- drivers/gpu/drm/amd/amdgpu/amdgpu_object.c| 22 --- .../gpu/drm/amd/amdgpu/amdgpu_res_cursor.h| 25

Re: [PATCH] drm/amd: Add a workaround for GFX11 systems that fail to flush TLB

2023-12-14 Thread Felix Kuehling
On 2023-12-14 10:06, Alex Deucher wrote: On Thu, Dec 14, 2023 at 9:24 AM Liu, Shaoyun wrote: [AMD Official Use Only - General] The gmc flush tlb function is used on both baremetal and sriov. But the function amdgpu_virt_kiq_reg_write_reg_wait is defined in amdgpu_virt.c with name

Re: [PATCH] drm/amdkfd: svm range always mapped flag not working on APU

2023-12-14 Thread Felix Kuehling
be mapped to all GPUs after this change. This side effect will be fixed with Thunk change to set CWSR svm range with ACCESS_IN_PLACE attribute on the GPU that user queue is created. Signed-off-by: Philip Yang With the commit description fixed, this patch is Reviewed-by: Felix Kuehling ---

Re: [PATCH 1/2] drm: update drm_show_memory_stats() for dma-bufs

2023-12-13 Thread Felix Kuehling
On 2023-12-07 13:02, Alex Deucher wrote: Show buffers as shared if they are shared via dma-buf as well (e.g., shared with v4l or some other subsystem). You can add KFD to that list. With the in-progress CUDA11 VM changes and improved interop between KFD and render nodes, sharing DMABufs

Re: [PATCH 1/2] drm/amdgpu: increase hmm range get pages timeout

2023-12-13 Thread Felix Kuehling
in those cases. There are also some FIXMEs in this code that should be addressed at the same time. That said, as a short-term fix, this patch is Acked-by: Felix Kuehling ---   drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c | 4 ++--   1 file changed, 2 insertions(+), 2 deletions(-) diff --git

Re: [PATCH 2/2] drm/amdgpu: Enable clear page functionality

2023-12-13 Thread Felix Kuehling
On 2023-12-13 9:20, Christian König wrote: On 12.12.23 at 00:32, Felix Kuehling wrote: On 2023-12-11 04:50, Christian König wrote: On 08.12.23 at 20:53, Alex Deucher wrote: [SNIP] You also need a functionality which resets all cleared blocks to uncleared after suspend/resume. No idea how

Re: [PATCH 2/2] drm/amdgpu: make an improvement on amdgpu_hmm_range_get_pages

2023-12-11 Thread Felix Kuehling
On 2023-12-11 05:38, Christian König wrote: Am 09.12.23 um 00:01 schrieb James Zhu: Needn't do schedule for each hmm_range_fault, and use cond_resched to replace schedule. cond_resched() is usually NAKed upstream since it is a NO-OP in most situations. That's weird, because

Re: [PATCH 2/2] drm/amdgpu: Enable clear page functionality

2023-12-11 Thread Felix Kuehling
On 2023-12-11 04:50, Christian König wrote: On 08.12.23 at 20:53, Alex Deucher wrote: [SNIP] You also need functionality that resets all cleared blocks to uncleared after suspend/resume. No idea how to do this, maybe Alex knows offhand. Since the buffers are cleared on creation, is

Re: [PATCH] drm/amdkfd: Fix sparse __rcu annotation warnings

2023-12-11 Thread Felix Kuehling
On 2023-12-08 05:11, Christian König wrote: Am 07.12.23 um 20:14 schrieb Felix Kuehling: On 2023-12-05 17:20, Felix Kuehling wrote: Properly mark kfd_process->ef as __rcu and consistently access it with rcu_dereference_protected. Reported-by: kernel test robot Closes: ht

Re: [PATCH] drm/amdkfd: Fix sparse __rcu annotation warnings

2023-12-07 Thread Felix Kuehling
On 2023-12-05 17:20, Felix Kuehling wrote: Properly mark kfd_process->ef as __rcu and consistently access it with rcu_dereference_protected. Reported-by: kernel test robot Closes: https://lore.kernel.org/oe-kbuild-all/202312052245.yfpbsgnh-...@intel.com/ Signed-off-by: Felix Kuehl

[PATCH v2 1/2] drm/amdgpu: Auto-validate DMABuf imports in compute VMs

2023-12-06 Thread Felix Kuehling
ces for amdgpu_vm_fence_imports into amdgpu_vm_validate, outside the vm->status_lock * Added dummy version of amdgpu_amdkfd_bo_validate_and_fence for builds without KFD Signed-off-by: Felix Kuehling --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h| 10 ++ .../gpu/drm/amd/

[PATCH v2 2/2] drm/amdkfd: Bump KFD ioctl version

2023-12-06 Thread Felix Kuehling
This is not strictly a change in the IOCTL API. This version bump is meant to indicate to user mode the presence of a number of changes and fixes that enable the management of VA mappings in compute VMs using the GEM_VA ioctl for DMABufs exported from KFD. Signed-off-by: Felix Kuehling

Proposal to add CRIU support to DRM render nodes

2023-12-06 Thread Felix Kuehling
Executive Summary: We need to add CRIU support to DRM render nodes in order to maintain CRIU support for ROCm applications once they start relying on render nodes for more GPU memory management. In this email I'm providing some background on why we are doing this, and outlining some of the

Re: [PATCH 2/2] drm/amdgpu: Auto-validate DMABuf imports in compute VMs

2023-12-05 Thread Felix Kuehling
On 2023-12-04 03:40, Christian König wrote: @@ -416,6 +423,28 @@ int amdgpu_vm_validate_pt_bos(struct amdgpu_device *adev, struct amdgpu_vm *vm, } spin_lock(&vm->status_lock); } + while (ticket && !list_empty(&vm->evicted_user)) { + bo_base =

[PATCH] drm/amdkfd: Fix sparse __rcu annotation warnings

2023-12-05 Thread Felix Kuehling
Properly mark kfd_process->ef as __rcu and consistently access it with rcu_dereference_protected. Reported-by: kernel test robot Closes: https://lore.kernel.org/oe-kbuild-all/202312052245.yfpbsgnh-...@intel.com/ Signed-off-by: Felix Kuehling --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkf

Re: [PATCH 1/6] Revert "drm/prime: Unexport helpers for fd/handle conversion"

2023-12-04 Thread Felix Kuehling
6. Cheers,   Felix Alex Thanks, Felix On 2023-12-01 18:34, Felix Kuehling wrote: This reverts commit 71a7974ac7019afeec105a54447ae1dc7216cbb3. These helper functions are needed for KFD to export and import DMABufs the right way without duplicating the tracking of DMABufs associated with G

Re: [PATCH 1/6] Revert "drm/prime: Unexport helpers for fd/handle conversion"

2023-12-01 Thread Felix Kuehling
On 2023-12-01 18:34, Felix Kuehling wrote: This reverts commit 71a7974ac7019afeec105a54447ae1dc7216cbb3. These helper functions are needed for KFD to export and import DMABufs the right way without duplicating the tracking of DMABufs associated with GEM objects while ensuring that move notifier

[PATCH 5/6] drm/amdgpu: Auto-validate DMABuf imports in compute VMs

2023-12-01 Thread Felix Kuehling
VM. Revalidation after evictions is handled in the VM code. Signed-off-by: Felix Kuehling --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h| 3 + .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 45 --- drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c| 6 +- drivers/gpu/drm/amd/amdgpu/amdgpu_dma_bu

[PATCH 3/6] drm/amdkfd: Import DMABufs for interop through DRM

2023-12-01 Thread Felix Kuehling
Use drm_gem_prime_fd_to_handle to import DMABufs for interop. This ensures that a GEM handle is created on import and that obj->dma_buf will be set and remain set as long as the object is imported into KFD. Signed-off-by: Felix Kuehling Reviewed-by: Ramesh Errabolu Reviewed-by: Xiaogang.C

[PATCH 2/6] drm/amdkfd: Export DMABufs from KFD using GEM handles

2023-12-01 Thread Felix Kuehling
Create GEM handles for exporting DMABufs using GEM-Prime APIs. The GEM handles are created in a drm_client_dev context to avoid exposing them in user mode contexts through a DMABuf import. Signed-off-by: Felix Kuehling Reviewed-by: Ramesh Errabolu --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c

[PATCH 6/6] drm/amdkfd: Bump KFD ioctl version

2023-12-01 Thread Felix Kuehling
This is not strictly a change in the IOCTL API. This version bump is meant to indicate to user mode the presence of a number of changes and fixes that enable the management of VA mappings in compute VMs using the GEM_VA ioctl for DMABufs exported from KFD. Signed-off-by: Felix Kuehling

[PATCH 4/6] drm/amdgpu: New VM state for evicted user BOs

2023-12-01 Thread Felix Kuehling
Create a new VM state to track user BOs that are in the system domain. In the next patch this will be used to conditionally re-validate them in amdgpu_vm_handle_moved. Signed-off-by: Felix Kuehling --- drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 17 + drivers/gpu/drm/amd/amdgpu
